
WO2025149083A1 - IscB polypeptides and uses thereof - Google Patents

IscB polypeptides and uses thereof

Info

Publication number
WO2025149083A1
Authority
WO
WIPO (PCT)
Prior art keywords
iscb
sequence
activity
polypeptide
domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CN2025/072127
Other languages
English (en)
Inventor
Hainan ZHANG
Qingquan XIAO
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huidagene Therapeutics Co Ltd
Huidagene Therapeutics Singapore Pte Ltd
Original Assignee
Huidagene Therapeutics Co Ltd
Huidagene Therapeutics Singapore Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huidagene Therapeutics Co Ltd, Huidagene Therapeutics Singapore Pte Ltd

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases [RNase]; Deoxyribonucleases [DNase]
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/78 Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]

Definitions

  • CRISPR-Cas systems, such as the type II Cas9 and type V Cas12 systems, serve as prokaryotic adaptive immune systems against viruses and have been developed into genome editing tools for basic research and gene therapy 1-3.
  • Engineered Cas9 nickase (nCas9) or deactivated Cas9 (dCas9) versions fused with various domains have established base editing, prime editing, and epigenome editing technologies 4-6.
  • nCas9 nickase
  • dCas9 deactivated Cas9
  • the large size of Cas9 and Cas12, particularly of nCas9-based gene editing tools, hinders the application of gene editing based on adeno-associated virus (AAV) vectors.
  • AAV adeno-associated virus
  • IscB proteins serve as compact RNA-guided DNA endonucleases, making them strong candidates for base editing.
  • the inventor identified 10 out of 19 uncharacterized IscB proteins from uncultured microbes showing RNA-guided DNA endonuclease activity in mammalian cells.
  • the inventor further enhanced the RNA-guided DNA endonuclease activity of the IscB ortholog IscB.m16 and expanded its target-adjacent motif (TAM) scope from MRNRAA to NNNGNA, resulting in an enhanced IscB system named IscB.m16*.
  • TAM target-adjacent motif
  • the invention of the disclosure is not and shall not be used to edit any human germ cell (i.e., an embryonic cell, an egg cell, a sperm cell) containing any genetic material in any jurisdiction unless it is allowed by applicable laws and regulations in the jurisdiction.
  • human germ cell i.e., an embryonic cell, an egg cell, a sperm cell
  • the IscB polypeptide is a mutant of SEQ ID NO: 16 and comprises an amino acid mutation (e.g., substitution) relative to (compared to) SEQ ID NO: 16 at a position selected from the group consisting of K30, K37, H38, N39, P50, V53, E74, E77, V79, H83, E85, K93, A96, K99, Q103, A104, H107, V133, E142, E159, P160, L172, T179, K180, E182, K196, E201, D204, H205, H214, L217, F218, E221, S222, D224, D225, Y228, A229, E232, G233, K234, G239, I250, H253, E254, A261, S268, G269, D272, L273, A278, A280, D282, K283, A285, V287, K292, K307, T310, A313, D318, E326, S
  • the amino acid mutation is a substitution with R, S, H, L, V, or E.
  • the amino acid mutation leads to a decreased guide sequence-independent (off-target) endonuclease activity, or wherein the IscB polypeptide comprising said amino acid mutation has a decreased guide sequence-independent (off-target) endonuclease activity compared to the reference or wild type IscB polypeptide of any one of SEQ ID NOs: 1-19, e.g., a decrease by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%.
  • the amino acid substitution is a substitution with a positively charged amino acid residue, such as arginine (R).
  • the amino acid mutation comprises a substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of T459E, P460S, T462H, T462L, T465V, and a combination of any two or more residues thereof, wherein the position is numbered according to SEQ ID NO: 16.
  • the amino acid mutation comprises a substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of E326R, T459E, P460S, T462H, and a combination of any two or more residues thereof, wherein the position is numbered according to SEQ ID NO: 16.
  • the amino acid mutation comprises a combination substitution corresponding to a combination substitution of D61A, E326R, P460S, T462H, and T459E, wherein the position is numbered according to SEQ ID NO: 16.
  • the amino acid mutation comprises a combination substitution corresponding to a combination substitution of D61A, H248A, E326R, P460S, T462H, and T459E, wherein the position is numbered according to SEQ ID NO: 16.
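The substitution labels used above (for example D61A, E326R, T462H) follow the usual pattern of wild-type residue, 1-based position, replacement residue. A minimal sketch of how such labels could be parsed and applied is given here. The function names and the five-residue toy sequence are hypothetical and chosen only for illustration; the real reference sequence (SEQ ID NO: 16) is not reproduced on this page.

```python
import re


def parse_substitution(label):
    """Split a label such as 'E326R' into (wild-type residue, 1-based position, new residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    wild_type, position, new = match.groups()
    return wild_type, int(position), new


def apply_substitutions(sequence, labels):
    """Apply each substitution after checking that the expected residue is present."""
    residues = list(sequence)
    for label in labels:
        wild_type, position, new = parse_substitution(label)
        if position > len(residues) or residues[position - 1] != wild_type:
            raise ValueError(f"{label}: expected {wild_type} at position {position}")
        residues[position - 1] = new
    return "".join(residues)


# Hypothetical toy sequence, NOT a sequence from the disclosure.
toy_reference = "MDKTE"
toy_mutant = apply_substitutions(toy_reference, ["D2A", "E5R"])
print(toy_mutant)
```

Checking the wild-type residue before substituting guards against numbering a mutation against the wrong reference sequence, which matters here because every position is stated relative to SEQ ID NO: 16.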
  • the IscB polypeptide comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, or 99.9%) and less than 100% to the amino acid sequence of SEQ ID NO: 239 or an N-terminal truncation of the amino acid sequence of SEQ ID NO: 239 lacking the most N-terminal methionine (M) (coded by the start codon ATG).
  • M N-terminal Methionine
  • the IscB polypeptide comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, or 99.9%) and less than 100% to the amino acid sequence of SEQ ID NO: 240 or 241 or an N-terminal truncation of the amino acid sequence of SEQ ID NO: 240 or 241 lacking the most N-terminal methionine (M) (coded by the start codon ATG).
  • the IscB polypeptide is an endonuclease or has guide sequence-specific endonuclease activity.
  • the IscB polypeptide is a nickase or has guide sequence-specific nickase activity.
  • the IscB polypeptide is endonuclease deficient.
  • the IscB polypeptide is catalytically inactive.
  • the IscB polypeptide is fused to a functional domain to form a fusion protein.
  • the functional domain has transposase activity, methylase activity, demethylase activity, translation activation activity, translation repression activity, transcription activation activity, transcription repression activity, transcription release factor activity, chromatin modifying or remodeling activity, histone modification activity, nuclease activity, single-strand RNA cleavage activity, double-strand RNA cleavage activity, single-strand DNA cleavage activity, double-strand DNA cleavage activity, nucleic acid binding activity, detectable activity, or any combination thereof.
  • the adenine deaminase or a catalytic domain thereof is TadA8E V106W (SEQ ID NO: 253).
  • the functional domain comprises a methylpurine glycosylase (MPG) .
  • MPG methylpurine glycosylase
  • the functional domain comprises a methylase or a catalytic domain thereof.
  • the disclosure provides a system comprising:
  • IscB polypeptide or method of the disclosure or a polynucleotide (e.g., a DNA, an RNA) encoding the IscB polypeptide, and
  • a guide nucleic acid or a polynucleotide (e.g., a DNA, an RNA) encoding the guide nucleic acid,
  • the guide nucleic acid comprising:
  • the target sequence comprises about or at least about 14 contiguous nucleotides of the target DNA, e.g., about or at least about 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, or more contiguous nucleotides of the target DNA, or in a numerical range between any two of the preceding values, e.g., from about 14 to about 20 contiguous nucleotides of the target DNA, from about 14 to about 50 contiguous nucleotides of the target DNA; optionally, wherein the target sequence comprises about 14 contiguous nucleotides of the target DNA.
  • the target sequence is immediately 3’ to a target adjacent motif (TAM), or wherein the reversely complementary sequence of the target sequence (i.e., the protospacer sequence) is immediately 5’ to a target adjacent motif (TAM).
  • TAM target adjacent motif
  • the TAM is 5’-NNNGNA-3’, wherein N is A, T, G, or C; and optionally, wherein the TAM is 5’-NNNGAN-3’, wherein N is A, T, G, or C.
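The TAM rule above can be read as a simple scan: on the nontarget strand, a protospacer qualifies when the bases immediately 3’ of it match the motif, with N standing for any base. The sketch below assumes that reading and uses standard IUPAC codes (M = A or C, R = A or G) so that the wild-type MRNRAA motif can also be tested. The function names, the default protospacer length of 14, and the example strand are illustrative assumptions, not taken from the disclosure.

```python
# Subset of the IUPAC nucleotide codes needed for the motifs named in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "M": "AC", "R": "AG"}


def matches_motif(site, motif):
    """Return True if every base of `site` is allowed by the IUPAC code at that position."""
    return len(site) == len(motif) and all(
        base in IUPAC[code] for base, code in zip(site, motif)
    )


def find_protospacers(nontarget_strand, motif="NNNGNA", length=14):
    """List (start, protospacer, TAM) for protospacers immediately 5' of a matching TAM.

    `nontarget_strand` is written 5' to 3'; `start` is a 0-based index.
    """
    hits = []
    tam_len = len(motif)
    for start in range(len(nontarget_strand) - length - tam_len + 1):
        tam = nontarget_strand[start + length : start + length + tam_len]
        if matches_motif(tam, motif):
            hits.append((start, nontarget_strand[start : start + length], tam))
    return hits


# Hypothetical strand: a 14-nt protospacer, a TAM, and two trailing bases.
example_strand = "ACGTACGTACGTAC" + "TTTGCA" + "CC"
print(find_protospacers(example_strand))
```

The guide sequence for a hit is then the protospacer with U in place of T, as described later in the definitions.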
  • (1) the guide sequence is at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 100% (fully), optionally about 100% (fully), reversely complementary to the target sequence; (2) the guide sequence contains no more than 5, 4, 3, 2, or 1 mismatch or contains no mismatch with the target sequence; or (3) the guide sequence comprises no mismatch with the target sequence in the first 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, or 70 nucleotides at the 5’ end of the guide sequence.
  • the guide sequence comprises a polynucleotide sequence having a sequence identity of at least about 60% (e.g., at least about 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to any one of SEQ ID NOs: 261-403 or comprises a polynucleotide sequence having no more than 1, 2, 3, 4, 5, 6, 7, or 8 nucleotide difference from any one of SEQ ID NOs: 261-403.
  • the disclosure provides a polynucleotide encoding the IscB polypeptide or method of the disclosure (e.g., SEQ ID NOs: 39-57) .
  • the disclosure provides a polynucleotide encoding the guide nucleic acid of the disclosure.
  • the disclosure provides a lipid nanoparticle (LNP) comprising an RNA (e.g., mRNA) encoding the IscB polypeptide or method of the disclosure and a guide nucleic acid optionally as defined in the disclosure.
  • LNP lipid nanoparticle
  • the disclosure provides a cell comprising the IscB polypeptide or method of the disclosure, the system of the disclosure, the polynucleotide of the disclosure, the vector of the disclosure, the rAAV particle of the disclosure, the RNP of the disclosure, or the LNP of the disclosure.
  • the cell is not a human germ cell (i.e., an embryonic cell, an egg cell, a sperm cell) .
  • the disclosure provides a method for modifying a target DNA, comprising contacting the target DNA with the system of the disclosure, the vector of the disclosure, the ribonucleoprotein of the disclosure, or the lipid nanoparticle of the disclosure, wherein the guide sequence is capable of hybridizing to a target sequence of the target DNA, wherein the target DNA is modified by the complex.
  • the method is ex vivo, in vivo, or in vitro.
  • the method is non-therapeutical.
  • the target DNA is in a cell
  • the cell is a eukaryotic cell (e.g., an animal cell, a vertebrate cell, a mammalian cell, a non-human mammalian cell, a non-human primate cell, a rodent (e.g., mouse or rat) cell, a human cell, a plant cell, or a yeast cell) or a prokaryotic cell (e.g., a bacterial cell);
  • the cell is from a plant or an animal
  • the plant is a dicotyledon; optionally selected from the group consisting of soybean, cabbage (e.g., Chinese cabbage), rapeseed, brassica, watermelon, melon, potato, tomato, tobacco, eggplant, pepper, cucumber, cotton, alfalfa, grape;
  • the plant is a monocotyledon; optionally selected from the group consisting of rice, corn, wheat, barley, oat, sorghum, millet, grasses, Poaceae, Zizania, Avena, Coix, Hordeum, Oryza, Panicum (e.g., Panicum miliaceum), Secale, Setaria (e.g., Setaria italica), Sorghum, Triticum, Zea, Cymbopogon, Saccharum (e.g., Saccharum officinarum), Phyllostachys, Dendrocalamus, Bambusa, Yushania; and/or
  • the animal is selected from the group consisting of pig, ox, sheep, goat, mouse, rat, alpaca, monkey, rabbit, chicken, duck, goose, fish (e.g., zebrafish).
  • the disclosure provides a cell modified by the method of the disclosure.
  • the cell is not a human embryonic stem cell.
  • the disclosure provides a method of increasing guide sequence-specific binding ability (e.g., represented by the guide sequence-specific endonuclease activity of the IscB system or guide sequence-specific base editing efficiency of the IscB system) of a guide nucleic acid for use in an IscB system comprising (1) an IscB polypeptide (e.g., the IscB polypeptide or method of the disclosure), or a polynucleotide (e.g., a DNA, an RNA) encoding the IscB polypeptide, and (2) the guide nucleic acid or a polynucleotide (e.g., a DNA, an RNA) encoding the guide nucleic acid, the guide nucleic acid comprising:
  • the scaffold sequence is 3’ to the guide sequence.
  • the guide nucleic acid is a guide RNA (gRNA) (used interchangeably with omega RNA (ωRNA)).
  • gRNA guide RNA
  • ωRNA omega RNA
  • the guide nucleic acid is capable of directing guide sequence specific binding of the complex to the target sequence of the target DNA.
  • the method comprises introducing a nucleotide mutation into the scaffold sequence relative to a reference or wild type scaffold sequence compatible to the IscB polypeptide.
  • the method comprises introducing a nucleotide mutation into the scaffold sequence relative to a reference or wild type scaffold sequence of any one of SEQ ID NOs: 20-38.
  • the nucleotide mutation comprises a deletion of about, at least about, or at most about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100 nucleotides in a stem-loop region of the reference scaffold sequence.
  • the nucleotide mutation comprises a substitution of about, at least about, or at most about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more thermodynamically unstable base pairs in a stem-loop region of the reference scaffold sequence with a G-C or C-G base pair.
  • the thermodynamically unstable base pair is an A-U or U-A base pair, an A-G or G-A base pair, or a U-G or G-U base pair.
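The scaffold modification described above, replacing thermodynamically unstable base pairs in a stem with G-C or C-G pairs, can be sketched as follows. This is an illustration under stated assumptions: the caller supplies the paired positions of the stem (no structure prediction is attempted), every replaced pair becomes G-C rather than C-G, and the function name and the 12-nt hairpin are hypothetical rather than any scaffold of SEQ ID NOs: 20-38.

```python
# Base pairs the text lists as thermodynamically unstable (5' base, 3' base).
UNSTABLE_PAIRS = {("A", "U"), ("U", "A"), ("A", "G"), ("G", "A"), ("U", "G"), ("G", "U")}


def stabilize_stem(scaffold, stem_pairs, max_changes=None):
    """Replace unstable pairs with G-C and return (new scaffold, number of pairs changed).

    `stem_pairs` is a list of (i, j) 0-based index pairs that face each other in the stem.
    `max_changes` optionally caps how many pairs are replaced.
    """
    bases = list(scaffold)
    changed = 0
    for i, j in stem_pairs:
        if max_changes is not None and changed >= max_changes:
            break
        if (bases[i], bases[j]) in UNSTABLE_PAIRS:
            bases[i], bases[j] = "G", "C"
            changed += 1
    return "".join(bases), changed


# Hypothetical hairpin: 4-bp stem (indices 0-3 paired with 11-8) around a GAAA loop.
toy_hairpin = "GAUGGAAACGUC"
toy_pairs = [(0, 11), (1, 10), (2, 9), (3, 8)]
print(stabilize_stem(toy_hairpin, toy_pairs))
```

In the toy hairpin, the A-U and U-G pairs are replaced while the two existing G-C pairs and the loop are left untouched.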
  • the stem-loop region is selected from the first 5’ stem loop region and the first 3’ stem loop region, wherein the first 5’ stem loop region is counted from the 5’ end of the reference scaffold sequence, and wherein the first 3’ stem loop region is counted from the 3’ end of the reference scaffold sequence.
  • an IscB-based dsDNA base editing system comprising:
  • a uracil glycosylase inhibitor (UGI) domain capable of inhibiting a uracil-DNA glycosylase (UDG);
  • RNA or “ωRNA” throughout the disclosure
  • guide RNA comprising:
  • the cytidine deamination domain is capable of deaminating a cytosine base of a target nucleotide of a protospacer sequence on the nontarget strand of the target dsDNA, wherein the protospacer sequence is complementary to the target sequence.
  • the IscB domain is an IscB nickase (nIscB) domain capable of nicking the target strand.
  • the IscB domain is an IscB nickase of the disclosure.
  • the IscB domain is an IscB nickase (nIscB) domain that comprises a D-to-A mutation corresponding to the D61A mutation in the amino acid sequence of SEQ ID NO: 444.
  • nIscB IscB nickase
  • the IscB domain comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to the amino acid sequence of SEQ ID NO: 444.
  • the cytidine deamination domain is a deaminase from an apolipoprotein B mRNA-editing complex (APOBEC) family deaminase.
  • APOBEC apolipoprotein B mRNA-editing complex
  • the APOBEC family deaminase is selected from the group consisting of APOBEC1 deaminase, APOBEC2 deaminase, APOBEC3A deaminase, APOBEC3B deaminase, APOBEC3C deaminase, APOBEC3D deaminase, APOBEC3F deaminase, APOBEC3G deaminase, and APOBEC3H deaminase.
  • the cytidine deamination domain comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to the amino acid sequence of SEQ ID NO: 254.
  • the cytidine deaminase domain is an activation-induced deaminase (AID).
  • the cytidine deaminase domain is a cytidine deaminase 1 from Petromyzon marinus (pmCDA1).
  • the base editor comprises one UGI domain, two UGI domains, or three UGI domains; and optionally one UGI domain.
  • the UGI domain comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to the amino acid sequence of SEQ ID NO: 255.
  • the base editor comprises one IscB domain and one cytidine deamination domain.
  • the base editor is a fusion protein comprising the IscB domain, the cytidine deamination domain, and the UGI domain, wherein any adjacent two of the IscB domain, the cytidine deamination domain, and the UGI domain are connected to each other with or without a linker.
  • the IscB domain is at the N- or C-terminal, optionally C-terminal, of the cytidine deamination domain.
  • the base editor comprises a nuclear localization sequence (NLS), e.g., one, two, three, or four NLSs.
  • NLS nuclear localization sequence
  • the base editor comprises an NLS N- or C-terminally fused to the cytidine deaminase domain.
  • the base editor comprises an NLS N- or C-terminally fused to the IscB domain.
  • the cytidine deaminase domain and the IscB domain are linked to each other via a linker comprising the amino acid sequence (GGGS) n, (GGGGS) n, (G) n, (EAAAK) n, (GGS) n, (SGGS) n, SGSETPGTSESATPES (XTEN linker, SEQ ID NO: 448) , GGGGGSGGGGSGGGGSGGGGS (SEQ ID NO: 449) , a GS linker containing a XTEN linker (SEQ ID NO: 448) (e.g., SGGSSGGSSGSETPGTSESATPESSGGSSGGS (GS-XTEN-GS linker, SEQ ID NO: 459) ) , a GS linker containing a NLS (e.g., bpNLS, such as SEQ ID NO: 445 or 446) (e.g., SGGSSGGSKRTADGSEFESPKKKRKVSGGSSGGSGSGSGSGS
  • the cytidine deaminase domain and the IscB domain are linked to each other via a linker comprising an XTEN linker (e.g., SEQ ID NO: 448) or a GS linker (e.g., SEQ ID NO: 449).
  • the fusion protein comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to the amino acid sequence of any one of SEQ ID NOs: 406-415.
  • an IscB-based dsDNA base editing system comprising:
  • a base editor or a polynucleotide encoding the base editor, the base editor comprising:
  • the adenine deamination domain is capable of deaminating an adenine base of a target nucleotide of a protospacer sequence on the nontarget strand of the target dsDNA, wherein the protospacer sequence is complementary to the target sequence.
  • the IscB domain is an IscB nickase (nIscB) domain capable of nicking the target strand.
  • the IscB domain is an IscB nickase (nIscB) domain that comprises a D-to-A mutation corresponding to the D61A mutation in the amino acid sequence of SEQ ID NO: 444.
  • the IscB domain comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to the amino acid sequence of SEQ ID NO: 444.
  • the adenine deamination domain is a deaminase from a Bacterial tRNA adenosine deaminase (TadA) family deaminase.
  • TadA Bacterial tRNA adenosine deaminase
  • the base editor comprises one IscB domain and one adenine deamination domain.
  • the IscB domain is at the N-terminal or the C-terminal, optionally the N-terminal, of the adenine deamination domain.
  • the base editor comprises one IscB domain and two adenine deamination domains.
  • the two adenine deamination domains are separated by the IscB domain.
  • the fusion protein comprises the structure:
  • the base editor comprises a nuclear localization sequence (NLS), e.g., one, two, three, or four NLSs.
  • the base editor comprises an NLS N- or C-terminally fused to the adenine deaminase domain.
  • the base editor comprises an NLS N- or C-terminally fused to the IscB domain.
  • the NLS comprises the amino acid sequence PKKKRKV, MDSLLMNRRKFLYQFKNVRWAKGRRETYLC, KRTADGSEFESPKKKRKV (bpNLS 1, SEQ ID NO: 445), KRTADGSESEPKKKRKV (bpNLS 1, SEQ ID NO: 446), or KRPAATKKAGQAKKKK (npNLS, SEQ ID NO: 447).
  • the adenine deaminase domain and the IscB domain are linked to each other via a linker comprising an XTEN linker (e.g., SEQ ID NO: 448) or a bpNLS (e.g., SEQ ID NO: 445 or 446), for example, a GS-XTEN-GS linker (SEQ ID NO: 459) or a GS-bpNLS-GS linker (SEQ ID NO: 450).
  • the two adenine deaminase domains are in tandem and linked to each other via a linker comprising a bpNLS (e.g., SEQ ID NO: 445 or 446), for example, a GS-bpNLS-GS linker (SEQ ID NO: 450), and the two adenine deaminase domains in tandem are linked to the IscB domain via a linker comprising a bpNLS (e.g., SEQ ID NO: 445 or 446), for example, a GS-bpNLS-GS linker (SEQ ID NO: 450).
  • the fusion protein comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to the amino acid sequence of any one of SEQ ID NOs: 416-427.
  • the scaffold sequence comprises a polynucleotide sequence having a sequence identity of at least about 60% (e.g., at least about 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to the polynucleotide sequence of SEQ ID NO: 442.
  • FIG. 7 shows functional properties of IscB systems: the distribution of conserved residues in 19 newly identified IscB proteins and 2 reported IscB proteins, OgeuIscB and AwaIscB. The conserved residues in the RuvC, HNH, P1D and TID domains are marked with asterisks and blue lines in the bottom bar.
  • FIG. 10 shows engineering of various IscB ωRNAs to improve their editing efficiency in mammalian cells.
  • IscB.m1 (a) and IscB.m8 (b) improved the EGFP signal by truncation of the stem loop of the ωRNA
  • the red dotted lines represent the value of WT.
  • FIG. 13 shows TAM profiling of IscB.m16 and IscB.m16 RESH in HEK293T cells using a fluorescence reporter system.
  • a-b TAM profiling of IscB.m16 and IscB.m16 RESH for 72 reporters containing the same spacer and 5’-NNNGAA-3’ (a) or 5’-AAAGNN-3’ (b) TAM sequences, along with ωRNA-v2.27. Data are shown as the mean of three independent biological replicates.
  • FIG. 14 shows characterization of IscB.m16 and IscB.m16* editing activities in HEK293T cells.
  • a Effect of the spacer length on cleavage activity generated by IscB.m16 WT and IscB.m16* at 2 sites on GFxxFP reporters.
  • FIG. 19 shows the gRNA-independent off-target levels of five R loops formed by dSaCas9 in IscB- and SpG-derived adenine base editors in HEK293T cells.
  • a-c The gRNA-independent off-target levels of base editors at ALDH1A3-S1 (a), EMX1-S2 (b), and VEGFA-S1 (c).
  • FIG. 23 shows an exemplified target DNA and an exemplified IscB system comprising a guide nucleic acid and an IscB polypeptide.
  • an exemplary dsDNA is depicted to comprise a 5’ to 3’ single DNA strand and a 3’ to 5’ single DNA strand
  • the 5’ to 3’ single DNA strand comprises an exemplary first deoxyribonucleotide dA
  • the 3’ to 5’ single DNA strand comprises an exemplary second deoxyribonucleotide dC that base pairs with the dT.
  • An exemplary guide nucleic acid is depicted to comprise a guide sequence and a scaffold sequence.
  • the guide sequence is designed to hybridize to a part of the 3’ to 5’ single DNA strand, and so the guide sequence “targets” that part.
  • the 3’ to 5’ single DNA strand is referred to as a “target strand (TS)” of the dsDNA
  • the opposite 5’ to 3’ single DNA strand is referred to as a “nontarget strand (NTS)” of the dsDNA.
  • nucleic acid sequence e.g., a DNA sequence
  • a nucleic acid sequence is written in 5’ to 3’ direction/orientation unless explicitly indicated otherwise.
  • a DNA sequence written as ATGC is usually understood as 5’-ATGC-3’ unless otherwise indicated. Its reverse sequence is 5’-CGTA-3’. Its fully complementary sequence is 5’-TACG-3’. Its fully reverse complementary sequence is 5’-GCAT-3’. Note that the fully complementary sequence usually does not have the ability to base-pair/hybridize with the original sequence.
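The three derived sequences in the ATGC example can be reproduced with a few string operations. This is a minimal sketch of the naming convention stated above, in which every result is written left to right and labelled 5’ to 3’; the function names are illustrative.

```python
# Watson-Crick partner of each DNA base.
DNA_COMPLEMENT = str.maketrans("ATGC", "TACG")


def reverse(seq):
    """Same bases, read in the opposite order."""
    return seq[::-1]


def complement(seq):
    """Each base swapped for its partner, order unchanged."""
    return seq.translate(DNA_COMPLEMENT)


def reverse_complement(seq):
    """Partner bases in reversed order: the sequence that can hybridize with `seq`."""
    return complement(seq)[::-1]


original = "ATGC"
print(reverse(original), complement(original), reverse_complement(original))
```

Only the reverse complement pairs with the original strand, because the two strands of a duplex run antiparallel; this is why the text notes that the "fully complementary" sequence, read in the same 5’ to 3’ direction, usually cannot hybridize with the original.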
  • the guide sequence of a guide nucleic acid is designed to have a sequence of 5’-GCAU-3’ that is fully reversely complementary to the 5’ to 3’ strand of the dsDNA (5’-ATGC-3’), which would be set forth as GCAT in the electronic sequence listing and marked as an RNA sequence according to WIPO Standard ST.26.
  • the guide sequence of a guide nucleic acid is fully reversely complementary to the target sequence and the target sequence is fully reversely complementary to the protospacer sequence
  • the guide sequence is identical to the protospacer sequence except for the difference between the U in the guide sequence due to its RNA nature and the corresponding T in the protospacer sequence due to its DNA nature.
  • the symbol “t” is used to denote both T in DNA and U in RNA (see “Table 1: List of nucleotides symbols”, in which the definition of symbol “t” is “thymine in DNA/uracil in RNA (t/u)”).
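The relationships stated above (guide is the reverse complement of the target sequence, target sequence is the reverse complement of the protospacer, and "t" stands for U in a sequence listing) can be chained as in the sketch below. The function names are illustrative assumptions, and the listing form is shown in upper case to match the GCAT example in the text.

```python
_DNA_COMPLEMENT = str.maketrans("ATGC", "TACG")


def protospacer_from_target(target_dna):
    """The protospacer is the reverse complement of the target sequence (both DNA)."""
    return target_dna.translate(_DNA_COMPLEMENT)[::-1]


def guide_from_target(target_dna):
    """The guide RNA equals the protospacer with U in place of T."""
    return protospacer_from_target(target_dna).replace("T", "U")


def listing_form(guide_rna):
    """Write an RNA guide with T for U, as the text describes for the sequence listing."""
    return guide_rna.replace("U", "T")


target = "ATGC"
guide = guide_from_target(target)
print(protospacer_from_target(target), guide, listing_form(guide))
```

Because the listing form of the guide is letter-for-letter identical to the protospacer, the molecule type recorded alongside the sequence is what distinguishes the RNA guide from the DNA it was derived from.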
  • guide nucleic acid refers to a nucleic acid-based molecule capable of forming a complex with a nucleic acid programmable protein, for example, an IscB polypeptide of the disclosure (e.g., via a scaffold sequence of the guide nucleic acid), and comprises a sequence (e.g., a guide sequence) that is sufficient to hybridize to a target nucleic acid and guides the complex to the target nucleic acid, which includes but is not limited to RNA-based molecules, e.g., a guide RNA.
  • As used herein, the terms “guide RNA (gRNA)”, “omega RNA”, “ωRNA”, and “RNA guide” are used interchangeably. As used in the disclosure, the term “guide sequence” is used interchangeably with the term “spacer sequence”.
  • the term “complex” refers to a grouping of two or more molecules.
  • the complex comprises a nucleic acid and a polypeptide interacting with (e.g., binding to, coming into contact with, adhering to) one another.
  • the term “complex” can refer to a grouping of a guide nucleic acid and a polypeptide (e.g., an IscB polypeptide).
  • the term “complex” can refer to a grouping of a guide nucleic acid, a polypeptide (e.g., an IscB polypeptide), and a target nucleic acid (e.g., a target DNA).
  • adjacent includes instances wherein there is no nucleotide between the protospacer sequence and the PAM and also instances wherein there are a small number (e.g., 1, 2, 3, 4, or 5) of nucleotides between the protospacer sequence and the PAM.
  • A “immediately adjacent (to)” B, A “immediately 5’ to” B, and A “immediately 3’ to” B mean that there is no nucleotide between A and B.
  • the guide sequence is so designed to be substantially capable of hybridizing to a target sequence.
  • the term “hybridize” refers to a reaction in which one or more polynucleotide sequences react to form a complex that is stabilized via hydrogen bonding between the bases of the one or more polynucleotide sequences. The hydrogen bonding may occur by Watson-Crick base pairing, Hoogsteen binding, or in any other sequence-specific manner.
  • a polynucleotide sequence capable of hybridizing to a given polynucleotide sequence is referred to as the “complement” of the given polynucleotide sequence.
  • the term “substantially complementary” refers to a polynucleotide sequence (e.g., a guide sequence) that has a certain level of complementarity to a second polynucleotide sequence (e.g., a target sequence) (e.g., at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% of the guide sequence can base-pair with the polynucleotide sequence of the target sequence, or at most 1, 2, 3, 4, or 5 contiguous or non-contiguous nucleotides of the guide sequence mismatch the nucleotides of the target sequence).
  • a guide sequence that is substantially complementary to a target sequence has 100% or less than 100% complementarity to the target sequence. In some embodiments, a guide sequence that is substantially complementary to a target sequence has at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% complementarity to the target sequence, and/or has at most 1, 2, 3, 4, or 5 contiguous or non-contiguous nucleotide mismatches from the target sequence.
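As a rough illustration of how the percentage and mismatch criteria above could be evaluated, the following sketch counts the guide bases that pair with a target strand of equal length. It assumes Watson-Crick pairing only and no gaps, which is narrower than the definition of “hybridize” in the text; the function name is hypothetical.

```python
# Watson-Crick pairs only (an assumption of this sketch); keys are RNA guide
# bases, values are the DNA bases they pair with.
_PAIRS_WITH = {"A": "T", "U": "A", "G": "C", "C": "G"}


def complementarity(guide_rna: str, target_dna: str):
    """Return (percent of guide bases that pair, number of mismatches).

    Both sequences are written 5' to 3' and must have the same length.
    """
    guide = guide_rna.upper()
    target = target_dna.upper()
    if not guide or len(guide) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    # Antiparallel pairing: guide base i faces the i-th base counted from the
    # 3' end of the target strand.
    mismatches = sum(
        _PAIRS_WITH.get(g) != t for g, t in zip(guide, reversed(target))
    )
    percent = 100.0 * (len(guide) - mismatches) / len(guide)
    return percent, mismatches


print(complementarity("GCAU", "ATGC"))  # fully complementary toy pair
print(complementarity("GCAU", "ATGG"))  # one mismatch
```

The caller can then compare the two returned values against whichever threshold from the ranges above applies (for example, at least about 80% pairing, or at most 5 mismatches).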
  • identity refers to the overall relatedness between polymeric molecules, e.g., between nucleic acids (e.g., DNA and/or RNA) and/or between polypeptides.
  • polymeric molecules are considered to be “substantially identical” to one another if their sequences are at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 99% identical.
  • variant refers to an entity that shows significant structural identity with a reference entity (e.g., a wild-type sequence) but differs structurally from the reference entity in the presence or level of one or more chemical moieties as compared with the reference entity. In many embodiments, a variant also differs functionally from its reference entity. In general, whether a particular entity is properly considered to be a “variant” of a reference entity is based on its degree of structural identity with the reference entity. As will be appreciated by those skilled in the art, any biological or chemical reference entity has certain characteristic structural elements. A variant, by definition, is a distinct chemical entity that shares one or more such characteristic structural elements.
  • a variant polypeptide shows an overall sequence identity with a reference polypeptide (e.g., a nuclease described herein) that is at least about 60%, 65%, 70%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%.
  • a variant polypeptide does not share at least one characteristic sequence element with a reference polypeptide, for example, an IscB nickase as a variant of a reference IscB polypeptide does not share an active RuvC domain or an active HNH domain with the reference IscB polypeptide.
  • the reference polypeptide has one or more biological activities.
  • a variant polypeptide shares one or more of the biological activities of the reference polypeptide, e.g., nuclease activity.
  • a variant polypeptide lacks one or more of the biological activities of the reference polypeptide, for example, an IscB nickase as a variant of a reference IscB polypeptide does not share the endonuclease activity of the reference IscB polypeptide.
  • a variant polypeptide shows a reduced level of one or more biological activities (e.g., nuclease activity, e.g., off-target nuclease activity) as compared with the reference polypeptide.
  • a polypeptide of interest is considered to be a “variant” of a reference polypeptide if the polypeptide of interest has an amino acid sequence that is identical to that of the reference polypeptide but for a small number of sequence alterations at particular positions. Typically, fewer than about 20%, 15%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, or 1% of the residues in the variant are substituted as compared with the reference polypeptide.
  • any additions or deletions are typically fewer than about 25, about 20, about 19, about 18, about 17, about 16, about 15, about 14, about 13, about 12, about 11, about 10, about 9, about 8, about 7, about 6, and commonly are fewer than about 5, about 4, about 3, or about 2 residues.
  • the reference polypeptide is a wild type polypeptide.
  • a variant of a polynucleotide may be naturally occurring such as an allelic variant, or it may be a variant that is not known to occur naturally.
  • Non-naturally occurring variants of a polynucleotide may be made by mutagenesis techniques, by direct synthesis, and by other recombinant methods known to skilled artisans.
  • As used herein, the terms “non-naturally occurring” and “engineered” are used interchangeably and refer to artificial participation. When these terms are used to describe a nucleic acid or a polypeptide, it is meant that the nucleic acid or polypeptide is at least substantially freed from at least one other component of its association in nature or as found in nature.
  • a “conservative substitution” refers to a substitution of an amino acid made among amino acids within one of the following four groups:
  • non-polar amino acids including Glycine (Gly/G), Alanine (Ala/A), Valine (Val/V), Cysteine (Cys/C), Proline (Pro/P), Leucine (Leu/L), Isoleucine (Ile/I), Methionine (Met/M), Tryptophan (Trp/W), and Phenylalanine (Phe/F);
  • polar amino acids including Serine (Ser/S), Threonine (Thr/T), Tyrosine (Tyr/Y), Asparagine (Asn/N), and Glutamine (Gln/Q);
  • positively charged amino acids including Lysine (Lys/K), Arginine (Arg/R), and Histidine (His/H); and
  • negatively charged amino acids including Aspartic Acid (Asp/D) and Glutamic Acid (Glu/E).
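The four-group definition of a conservative substitution lends itself to a simple lookup. The sketch below assumes the four groups are non-polar, polar, positively charged, and negatively charged, with the memberships given in this disclosure; the names `RESIDUE_GROUPS`, `residue_group`, and `is_conservative` are hypothetical.

```python
# Residue groups written with one-letter codes; group membership follows the
# classes named in the text (an assumption where the extracted list is incomplete).
RESIDUE_GROUPS = {
    "non-polar": set("GAVCPLIMWF"),
    "polar": set("STYNQ"),
    "positively charged": set("KRH"),
    "negatively charged": set("DE"),
}


def residue_group(amino_acid: str) -> str:
    """Return the name of the group that a one-letter amino acid belongs to."""
    for name, members in RESIDUE_GROUPS.items():
        if amino_acid.upper() in members:
            return name
    raise ValueError(f"not one of the 20 standard amino acids: {amino_acid!r}")


def is_conservative(original: str, replacement: str) -> bool:
    """True when both residues fall within the same group."""
    return residue_group(original) == residue_group(replacement)


print(is_conservative("K", "R"))  # both positively charged
print(is_conservative("D", "A"))  # negatively charged to non-polar
```

Under this grouping, a substitution such as K30R is conservative, whereas D61A is non-conservative.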
  • wild type has the meaning commonly understood by those skilled in the art to mean a typical form of an organism, a strain, a gene, or a feature that distinguishes it from a mutant or variant when it exists in nature. It can be isolated from sources in nature and not intentionally modified.
  • a mutant/engineered polypeptide (e.g., of WT OgeuIscB) “comprising an amino acid mutation (e.g., substitution) at a position corresponding to a given position (e.g., D61) of a given polypeptide (e.g., WT OgeuIscB of SEQ ID NO: 1)” or similar description means that the given polypeptide serves as a parent or reference polypeptide that does not comprise an amino acid mutation at the given position, and that the mutant is a mutant of the parent or reference polypeptide and comprises an amino acid mutation at the position of its own amino acid sequence that corresponds to the given position of the amino acid sequence of the given polypeptide.
  • the mutant comprising an amino acid mutation at a position corresponding to D61 of a given polypeptide means that the mutant comprises an amino acid mutation at position 41 of the mutant when position 41 in the mutant corresponds to D61 in the given polypeptide, as determined by alignment of the mutant and the given polypeptide.
  • an engineered IscB polypeptide comprising an amino acid substitution corresponding to D61A relative to SEQ ID NO: 1 refers to the fact that the parent or reference polypeptide of SEQ ID NO: 1 comprises amino acid D (Asp) at position 61, and the engineered IscB polypeptide comprises amino acid A (Ala) at a position corresponding to D61 of SEQ ID NO: 1.
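The notion of a “position corresponding to” a reference position, as determined by alignment, can be illustrated with a short sketch. It assumes the two sequences have already been aligned by some external tool and are supplied as equal-length strings with `-` marking gaps; the function name and the toy sequences are hypothetical and are not sequences of the disclosure.

```python
def corresponding_position(aligned_ref: str, aligned_query: str, ref_pos: int):
    """Map a 1-based position in the reference to the 1-based position in the query.

    Returns None when the reference residue is aligned to a gap in the query.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")
    ref_index = 0
    query_index = 0
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char != "-":
            ref_index += 1
        if query_char != "-":
            query_index += 1
        if ref_char != "-" and ref_index == ref_pos:
            return query_index if query_char != "-" else None
    raise ValueError("ref_pos lies outside the reference sequence")


# Toy alignment: the query lacks the first three residues of the reference
# and carries A where the reference has D at position 6.
aligned_reference = "MKTAYDDLG"
aligned_mutant = "---AYADLG"

print(corresponding_position(aligned_reference, aligned_mutant, 6))
```

In the toy alignment, reference position 6 corresponds to position 3 of the shorter sequence, in the same way that position 41 of a mutant can correspond to D61 of a longer reference.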
  • upstream and downstream refer to relative positions within a single nucleic acid (e.g., DNA) sequence in a nucleic acid. “Upstream” and “downstream” are defined relative to the 5’ to 3’ direction in which transcription occurs.
  • the first sequence is upstream of the second sequence when the 3’ end of the first sequence is on the left side of the 5’ end of the second sequence, and the first sequence is downstream of the second sequence when the 5’ end of the first sequence is on the right side of the 3’ end of the second sequence.
  • a promoter is usually upstream of a sequence under the regulation of the promoter; conversely, a sequence under the regulation of a promoter is usually downstream of the promoter.
  • regulatory element refers to a DNA sequence that controls or impacts one or more aspects of transcription and/or expression and is intended to include promoters, enhancers, silencers, termination signals, internal ribosome entry sites (IRES) , and other expression control elements (e.g., transcription termination signals such as polyadenylation signals and poly-U sequences) .
  • Regulatory elements include those that direct constitutive expression of a nucleotide sequence in many types of host cells and those that direct expression of a nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences) . Regulatory elements may also direct expression in a time-dependent manner, e.g., in a cell cycle-dependent or developmental stage-dependent manner, which may or may not be tissue or cell type specific.
  • operably linked refers to a juxtaposition wherein the components described are in a relationship permitting them to function in their intended manner.
  • a regulatory element “operably linked” to a functional element is associated in such a way that transcription, expression, and/or activity of the functional element is achieved under conditions compatible with the regulatory element.
  • “operably linked” regulatory elements are contiguous (e.g., covalently linked) with the functional elements of interest; in some embodiments, regulatory elements act in trans to or otherwise at a distance from the functional elements of interest.
  • the term “treat” , “treatment” , or “treating” is an approach for obtaining beneficial or desired results including clinical results.
  • the beneficial or desired clinical results include, but are not limited to, one or more of the following: alleviating one or more symptoms resulting from a disease, diminishing the extent of a disease, stabilizing a disease (e.g., delaying the worsening of a disease), delaying the spread (e.g., metastasis) of a disease, delaying the recurrence of a disease, reducing the recurrence rate of a disease, delaying or slowing the progression of a disease, ameliorating a disease state, providing a remission (partial or total) of a disease, decreasing the dose of one or more other medications required to treat a disease, increasing the quality of life, and prolonging survival.
  • a reduction of pathological consequence of a disease such as cancer
  • the disclosure provides, in part, engineered IscB polypeptides that exhibit improved specificity for targeting a DNA, e.g., relative to a wild type IscB polypeptide. Improved specificity can be, e.g., (i) increased on-target binding, cleavage, and/or editing of DNA and/or (ii) decreased off-target binding, cleavage, and/or editing of DNA, e.g., relative to a wild type IscB polypeptide and/or to another engineered IscB polypeptide.
  • the disclosure also provides, in part, optimal configurations of IscB-based base editors.
  • scaffold sequence is 3’ to the guide sequence
  • said method comprising introducing an amino acid mutation into the IscB polypeptide.
  • the IscB polypeptide comprises an amino acid mutation relative to (compared to) a reference or wild type IscB polypeptide of any one of SEQ ID NOs: 1-19.
  • the RuvC-I domain comprises, consists essentially of, or consists of amino acid residues of the reference or wild type IscB polypeptide at positions corresponding to positions 55-85 of SEQ ID NO: 16.
  • the Bridge Helix domain comprises, consists essentially of, or consists of amino acid residues of the reference or wild type IscB polypeptide at positions corresponding to positions 86-122 of SEQ ID NO: 16.
  • the HNH domain comprises, consists essentially of, or consists of amino acid residues of the reference or wild type IscB polypeptide at positions corresponding to positions 197-297 of SEQ ID NO: 16.
  • the P1D domain comprises, consists essentially of, or consists of amino acid residues of the reference or wild type IscB polypeptide at positions corresponding to positions 375-429 of SEQ ID NO: 16.
  • the TID domain comprises, consists essentially of, or consists of amino acid residues of the reference or wild type IscB polypeptide at positions corresponding to positions 430-513 of SEQ ID NO: 16.
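The domain boundaries listed above can be captured in a small lookup table. The sketch below includes only the domains and ranges stated in this section (numbered according to SEQ ID NO: 16); positions outside those ranges return `None`, which here means only “not in a listed domain”. The names `DOMAINS_SEQ16` and `domain_of` are hypothetical.

```python
# (domain name, first position, last position), numbered per SEQ ID NO: 16.
# Only the domains listed in this section are included.
DOMAINS_SEQ16 = [
    ("RuvC-I", 55, 85),
    ("Bridge Helix", 86, 122),
    ("HNH", 197, 297),
    ("P1D", 375, 429),
    ("TID", 430, 513),
]


def domain_of(position: int):
    """Return the listed domain containing a 1-based position, or None."""
    for name, first, last in DOMAINS_SEQ16:
        if first <= position <= last:
            return name
    return None


for pos in (61, 248, 326, 460):
    print(pos, domain_of(pos))
```

For instance, position 61 falls in the RuvC-I range and position 248 in the HNH range, while position 326 falls outside all of the ranges listed here.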
  • the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of M1, A2, N3, V4, I5, Y6, V7, I8, N9, K10, D11, G12, K13, P14, L15, M16, P17, T18, T19, R20, R21, G22, H23, V24, G25, Y26, L27, L28, R29, K30, K31, Q32, A33, R34, V35, V36, K37, H38, N39, P40, F41, T42, V43, Q44, L45, S46, Y47, E48, T49, P50, D51, K52, V53, Q54, E55, L56, T57, L58, G59, I60, D61, P62, G63, R64, T65, N66, I67, G68, I69, A70, V71, V72, D73, E
  • the amino acid mutation leads to an increased guide sequence-specific endonuclease activity, or wherein the IscB polypeptide comprising said amino acid mutation has an increased guide sequence-specific endonuclease activity compared to the reference or wild type IscB polypeptide of any one of SEQ ID NOs: 1-19, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%
  • the IscB polypeptide comprising said amino acid mutation leads to increased guide sequence-specific base editing efficiency compared to the reference or wild type IscB polypeptide of any one of SEQ ID NOs: 1-19, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
  • the IscB polypeptide has decreased guide sequence-independent (off-target) endonuclease activity or substantially lacks guide sequence-independent (off-target) endonuclease activity.
  • the amino acid mutation leads to a decreased guide sequence-independent (off-target) endonuclease activity, or wherein the IscB polypeptide comprising said amino acid mutation has a decreased guide sequence-independent (off-target) endonuclease activity compared to the reference or wild type IscB polypeptide of any one of SEQ ID NOs: 1-19, e.g., a decrease by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%.
  • the IscB polypeptide is capable of recognizing a target adjacent motif (TAM) comprising, consisting essentially of, or consisting of 5’-NNNGNA-3’ immediately 3’ adjacent to a protospacer sequence of a target DNA, wherein N is A, T, G, or C.
  • the IscB polypeptide has an increased guide-sequence specific endonuclease activity compared to that of SEQ ID NO: 16 for a protospacer sequence of a target DNA immediately 5’ adjacent to a target adjacent motif (TAM) comprising, consisting essentially of, or consisting of 5’-NNNGNA-3’, wherein N is A, T, G, or C.
  • the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of K30, K37, H38, N39, P50, V53, E74, E77, V79, H83, E85, K93, A96, K99, Q103, A104, H107, V133, E142, E159, P160, L172, T179, K180, E182, K196, E201, D204, H205, H214, L217, F218, E221, S222, D224, D225, Y228, A229, E232, G233, K234, G239, I250, H253, E254, A261, S268, G269, D272, L273, A278, A280, D282, K283, A285, V287, K292, K307, T310, A313, D318, E326, S328, F329, I330, S333, A335, P337, S350,
  • the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of D61, E193, and H248 of SEQ ID NO: 16.
  • the amino acid substitution is a conservative amino acid substitution or a non-conservative amino acid substitution.
  • the amino acid substitution is an amino acid substitution with
  • a polar amino acid residue (such as Serine (Ser/S), Threonine (Thr/T), Tyrosine (Tyr/Y), Asparagine (Asn/N), or Glutamine (Gln/Q)),
  • a positively charged amino acid residue (such as Lysine (Lys/K), Arginine (Arg/R), or Histidine (His/H)), or
  • a negatively charged amino acid residue (such as Aspartic Acid (Asp/D) or Glutamic Acid (Glu/E)).
  • the amino acid substitution is an amino acid substitution with a positively charged amino acid residue, such as, Arginine (R) .
  • the amino acid substitution is an amino acid substitution with a non-polar amino acid residue, such as, Alanine (A) .
  • the amino acid mutation comprises a substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of K30R, K37R, H38R, N39R, P50R, V53R, E74R, E77R, V79R, H83R, E85R, K93R, A96R, K99R, Q103R, A104R, H107R, V133R, E142R, E159R, P160R, L172R, T179R, K180R, E182R, K196R, E201R, D204R, H205R, H214R, L217R, F218R, E221R, S222R, D224R, D225R, Y228R, A229R, E232R, G233R, K234R, G239R, I250R, H253R, E254R, A261R, S268R, G269R, D272R, L273R, A278R,
  • the amino acid mutation comprises a substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of T459E, P460S, T462H, T462L, T465V, and a combination of any two or more residues thereof, wherein the position is numbered according to SEQ ID NO: 16.
  • the amino acid mutation comprises a substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of E326R, T459E, P460S, T462H, and a combination of any two or more residues thereof, wherein the position is numbered according to SEQ ID NO: 16.
  • the IscB polypeptide comprising said amino acid mutation comprises a substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of D61A, E193A, and H248A, wherein the position is numbered according to SEQ ID NO: 16.
  • the amino acid mutation comprises a combination substitution corresponding to a combination substitution of E326R, P460S, T462H, and T459E, wherein the position is numbered according to SEQ ID NO: 16.
  • the amino acid mutation comprises a combination substitution corresponding to a combination substitution of D61A, E326R, P460S, T462H, and T459E, wherein the position is numbered according to SEQ ID NO: 16.
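Substitution labels such as D61A or E326R encode the reference residue, the 1-based position, and the replacement residue. A minimal sketch of parsing such labels and applying them to a reference sequence follows. The function name is hypothetical, and the five-residue sequence is a toy stand-in, since the actual reference sequences are not reproduced in this section.

```python
import re

# <reference residue><1-based position><replacement residue>, e.g. "D61A".
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(reference: str, substitutions):
    """Return a copy of `reference` carrying the listed substitutions.

    Raises ValueError if a label is malformed, out of range, or names a
    reference residue that does not match the sequence.
    """
    residues = list(reference)
    for label in substitutions:
        match = _SUBSTITUTION.match(label)
        if match is None:
            raise ValueError(f"malformed substitution: {label!r}")
        wild_type, position, replacement = (
            match.group(1), int(match.group(2)), match.group(3)
        )
        if not 1 <= position <= len(residues):
            raise ValueError(f"position out of range: {label!r}")
        if reference[position - 1] != wild_type:
            raise ValueError(f"reference residue mismatch: {label!r}")
        residues[position - 1] = replacement
    return "".join(residues)


toy_reference = "MADKE"  # toy sequence, not a sequence of the disclosure
print(apply_substitutions(toy_reference, ["D3A", "E5R"]))
```

Checking the stated reference residue against the sequence guards against applying a label numbered according to a different reference.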
  • the IscB polypeptide comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, or 99.9%) and less than 100% to the amino acid sequence of SEQ ID NO: 16.
  • the IscB polypeptide comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, or 99.9%) and less than 100% to the amino acid sequence of SEQ ID NO: 239 or an N-terminal truncation of the amino acid sequence of SEQ ID NO: 239 lacking the most N-terminal Methionine (M) (coded by start codon ATG).
  • the IscB polypeptide comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, or 99.9%) and less than 100% to the amino acid sequence of SEQ ID NO: 240 or 241 or an N-terminal truncation of the amino acid sequence of SEQ ID NO: 240 or 241 lacking the most N-terminal Methionine (M) (coded by start codon ATG).
  • the IscB polypeptide is a nickase or has guide sequence-specific nickase activity.
  • the IscB polypeptide is endonuclease deficient.
  • the IscB polypeptide is fused to a functional domain to form a fusion protein.
  • the functional domain comprises a uracil glycosylase (UNG).
  • the functional domain comprises a methylase or a catalytic domain thereof.
  • the scaffold sequence comprises a nucleotide mutation relative to a reference or wild type scaffold sequence of any one of SEQ ID NOs: 20-38.
  • the thermodynamically unstable base pair is an A-U or U-A base pair, an A-G or G-A base pair, or a U-G or G-U base pair.
  • the stem-loop region is selected from the first 5’ stem loop region and the first 3’ stem loop region, wherein the first 5’ stem loop region is counted from the 5’ end of the reference scaffold sequence, and wherein the first 3’ stem loop region is counted from the 3’ end of the reference scaffold sequence.
  • the plant is a dicotyledon, optionally selected from the group consisting of soybean, cabbage (e.g., Chinese cabbage), rapeseed, brassica, watermelon, melon, potato, tomato, tobacco, eggplant, pepper, cucumber, cotton, alfalfa, and grape.
  • the disclosure provides a cell modified by the method of the disclosure.
  • the disclosure provides a pharmaceutical composition comprising (1) the system of the disclosure, the vector of the disclosure, the rAAV particle of the disclosure, the ribonucleoprotein of the disclosure, the lipid nanoparticle of the disclosure, or the cell of the disclosure; and (2) a pharmaceutically acceptable excipient.
  • the method comprises introducing a nucleotide mutation into the scaffold sequence relative to a reference or wild type scaffold sequence of any one of SEQ ID NOs: 20-38.
  • the nucleotide mutation comprises a deletion of about, at least about, or at most about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100 nucleotides in a stem-loop region of the reference scaffold sequence.
  • the stem-loop region is selected from the first 5’ stem loop region, the second 5’ stem loop region, the third 5’ stem loop region, the fourth 5’ stem loop region, the fifth 5’ stem loop region, or the sixth 5’ stem loop region of the reference scaffold sequence, wherein the first, the second, the third, the fourth, the fifth, and the sixth 5’ stem loop region are counted from the 5’ end of the reference scaffold sequence.
  • the IscB polypeptide of the disclosure may be combined/associated with one or more functional domains for additional functions other than cleavage, e.g., deamination, base editing, or prime editing, enabling various modifications of target DNA.
  • the IscB polypeptide further comprises a functional domain fused to the IscB polypeptide with or without a linker to form a fusion protein.
  • the linker is a GS linker containing multiple glycine (G) and serine (S) residues, XTEN linker (SEQ ID NO: 100), XTEN&GS linker (SEQ ID NO: 99) containing XTEN linker (SEQ ID NO: 100), or bpSV40 NLS&GS linker (SEQ ID NO: 111) containing bpSV40 NLS (SEQ ID NO: 110).
  • the NLS is SV40 NLS (such as, SEQ ID NO: 11) , bpSV40 NLS (BP NLS; bpNLS; such as, SEQ ID NO: 110) , or NP NLS (Xenopus laevis Nucleoplasmin NLS; nucleoplasmin NLS) (such as, SEQ ID NO: 12) .
  • the deaminase or catalytic domain thereof is a cytosine deaminase or a catalytic domain thereof (e.g., an apolipoprotein B mRNA-editing complex (APOBEC) family deaminase, an activation induced deaminase (AID), a cytidine deaminase 1 from Petromyzon marinus (pmCDA1), or a functional variant thereof, e.g., APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, APOBEC3H, hAPOBEC3-W104A (SEQ ID NO: 113)).
  • the fusion protein comprises the amino acid sequence of SEQ ID NO: 98, or an amino acid sequence having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to the amino acid sequence of SEQ ID NO: 98.
  • the IscB polypeptide or fusion protein of the disclosure may be used in combination with a guide nucleic acid as described herein to constitute a system comprising the IscB polypeptide or fusion protein and the guide nucleic acid, i.e., an IscB system.
  • the disclosure provides a system comprising:
  • the IscB polypeptide or fusion protein of the disclosure or a polynucleotide (e.g., a DNA, an RNA) encoding the IscB polypeptide or fusion protein, and
  • the system is a complex comprising the IscB polypeptide or fusion protein complexed with the guide nucleic acid.
  • the complex further comprises the target DNA hybridized with the guide sequence.
  • the scaffold sequence is 3’ to the guide sequence.
  • the system further comprises a donor polynucleotide for integration or insertion into the target DNA.
  • the disclosure provides a guide nucleic acid described herein.
  • the scaffold sequence leads to an increased guide sequence-specific (on-target) endonuclease activity compared to that led by SEQ ID NO: 2 when both are used in otherwise identical guide nucleic acids in combination with a same IscB polypeptide (e.g., the engineered IscB polypeptide of the disclosure), e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, or more.
  • the scaffold sequence comprises the polynucleotide sequence of any one of SEQ ID NOs: 21-26, 30-31, 33, 36-38, 40, and 43-45, or a polynucleotide sequence having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to the polynucleotide sequence of any one of SEQ ID NOs: 21-26, 30-31, 33, 36-38, 40, and 43-45.
  • the disclosure provides a guide nucleic acid comprising the scaffold sequence of the disclosure.
  • the protospacer sequence comprises about or at least about 14 contiguous nucleotides of the target DNA, e.g., about or at least about 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, or more contiguous nucleotides of the target DNA, or in a numerical range between any two of the preceding values, e.g., from about 14 to about 50, or from about 17 to about 22 contiguous nucleotides of the target DNA.
  • the protospacer sequence comprises about 16 contiguous nucleotides of the target DNA.
  • the protospacer sequence is immediately 5’ to a target adjacent motif (TAM) .
  • the TAM is 5’-NNNNNN-3’, wherein N is A, T, G, or C.
  • the TAM is 5’-NNNGAN-3’, wherein N is A, T, G, or C.
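To make the protospacer and TAM geometry concrete, the following sketch scans one DNA strand (written 5’ to 3’) for candidate protospacers that are immediately 5’ to a TAM matching a pattern in which N stands for A, T, G, or C. The function names, the default pattern, the default length of 16, and the toy strand are illustrative assumptions; the sketch inspects only the strand it is given and does not predict activity.

```python
def tam_matches(tam: str, pattern: str) -> bool:
    """True when `tam` fits `pattern`, where N in the pattern is A, T, G, or C."""
    if len(tam) != len(pattern):
        return False
    return all(
        (p == "N" and base in "ATGC") or p == base
        for p, base in zip(pattern, tam)
    )


def find_protospacers(strand: str, pattern: str = "NNNGAN", length: int = 16):
    """List (0-based start, protospacer, TAM) for each site on the given strand.

    The TAM must start at the nucleotide right after the protospacer's 3' end.
    """
    strand = strand.upper()
    hits = []
    for start in range(len(strand) - length - len(pattern) + 1):
        protospacer = strand[start:start + length]
        tam = strand[start + length:start + length + len(pattern)]
        if tam_matches(tam, pattern):
            hits.append((start, protospacer, tam))
    return hits


toy_strand = "ACGTACGTACGTACGT" + "CCCGAT"  # 16-nt protospacer followed by a TAM
print(find_protospacers(toy_strand))
```

Passing a different `pattern` (for example, one with no fixed positions) or a different `length` covers the other protospacer lengths and TAM forms recited above.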
  • the target sequence is immediately 3’ to a target adjacent motif (TAM) , or the reversely complementary sequence of the target sequence (i.e., the protospacer sequence) is immediately 5’ to a target adjacent motif (TAM) .
  • the TAM is 5’-NNNNNN-3’, wherein N is A, T, G, or C.
  • the TAM is 5’-NNNGAN-3’, wherein N is A, T, G, or C.
  • the guide sequence is about or at least about 14 nucleotides in length, e.g., about or at least about 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, or more nucleotides in length, or in a length of a numerical range between any two of the preceding values, e.g., in a length of from about 14 to about 50 nucleotides, or from about 17 to about 22 nucleotides.
  • the guide sequence is about 16 nucleotides in length.
  • the guide sequence comprises a sequence having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) to the sequence of any one of SEQ ID NOs: 50-72, 116-125, 136-151, 168-212, and 258-302; or a sequence having at most 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotide differences, whether consecutive or not, compared to the sequence of any one of SEQ ID NOs: 50-72, 116-125, 136-151, 168-212, and 258-302.
  • the guide sequence comprises any one of SEQ ID NOs: 50-72, 116-125, 136-151, 168-212, and 258-302.
  • the system comprises two or more guide nucleic acids comprising two or more guide sequences capable of hybridizing to two or more target sequences of the same target DNA or different target DNAs, wherein the two or more guide sequences are the same or different, and wherein the two or more target sequences are the same or different.
  • the disclosure provides a guide nucleic acid comprising the guide sequence of the disclosure.
  • the target DNA is a target dsDNA, such as, a eukaryotic dsDNA, e.g., a gene in a eukaryotic cell.
  • the target dsDNA comprises a protospacer sequence on a nontarget strand of the target dsDNA, wherein the dsDNA comprises a target deoxyribonucleotide (e.g., dA, dT, dC, dG) at a position of the protospacer sequence selected from the group consisting of position 2, position 3, position 4, position 5, position 6, position 7, position 8, position 9, position 10, position 11, position 12, position 13, position 14, and a combination thereof; or wherein the target deoxyribonucleotide is at a position of the protospacer sequence between position 2 and position 12 or between position 3 and position 14, both inclusive.
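The position ranges above can be checked with a short helper that lists where a given deoxyribonucleotide occurs inside a window of the protospacer. This is a sketch under stated assumptions: positions are counted from 1 at the 5’ end of the protospacer, the choice of C as the target base is only an example, and the function name and toy sequence are hypothetical.

```python
def positions_in_window(protospacer: str, base: str, first: int = 2, last: int = 12):
    """Return the 1-based positions of `base` within [first, last], both inclusive."""
    return [
        index
        for index, nucleotide in enumerate(protospacer.upper(), start=1)
        if nucleotide == base.upper() and first <= index <= last
    ]


toy_protospacer = "ACGTACGTACGTACGT"
print(positions_in_window(toy_protospacer, "C"))                    # window 2 to 12
print(positions_in_window(toy_protospacer, "C", first=3, last=14))  # window 3 to 14
```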
  • the polynucleotide encoding the polypeptide of the disclosure is operably linked to or under the regulation of a promoter.
  • Adeno-associated virus, when engineered to deliver, e.g., a protein-encoding sequence of interest, may be termed a (r)AAV vector, a (r)AAV vector particle, or a (r)AAV particle, where “r” stands for “recombinant”.
  • the nucleic acid packaged in AAV vectors for delivery may be termed a (r)AAV vector genome, vector genome, or vg for short, while viral genome may refer to the original viral genome of natural AAVs.
  • the rAAV particle comprises a capsid with a serotype suitable for delivery into ear cells (e.g., inner hair cells).
  • the rAAV particle comprises a capsid with a serotype of AAV1, AAV2, AAV3A, AAV3B, AAV4, AAV5, AAV6, AAV7, AAVrh74, AAV8, AAV9, AAV10, AAV11, AAV12, AAV13, AAV-DJ, or AAV.PHP.
  • the serotype of the capsid is AAV9 or a functional variant thereof.
  • rAAV particles may be produced using the triple transfection method (described in detail in U.S. Pat. No. 6,001,650) .
  • the vector titers are usually expressed as vector genomes per ml (vg/ml) .
  • the vector titer is above 1 × 10^9, above 5 × 10^10, above 1 × 10^11, above 5 × 10^11, above 1 × 10^12, above 5 × 10^12, or above 1 × 10^13 vg/ml.
  • systems and methods of packaging an RNA as a vector genome into a rAAV particle have recently been developed and are applicable herein. See PCT/CN2022/075366, which is incorporated herein by reference in its entirety.
  • sequence elements described herein for DNA vector genomes when present in RNA vector genomes, should generally be considered to be applicable for the RNA vector genomes except that the deoxyribonucleotides in the DNA sequence are the corresponding ribonucleotides in the RNA sequence (e.g., dT is equivalent to U, and dA is equivalent to A) and/or the element in the DNA sequence is replaced with the corresponding element with a corresponding function in the RNA sequence or omitted because its function is unnecessary in the RNA sequence and/or an additional element necessary for the RNA vector genome is introduced.
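The nucleotide correspondence described above (dT is equivalent to U, while dA, dC, and dG map to A, C, and G) can be sketched as a simple alphabet conversion. This is only an illustration of the letter mapping; it does not model the replacement or omission of whole sequence elements, and the function name is hypothetical.

```python
# Only T/t changes when rewriting a DNA sequence in the RNA alphabet.
DNA_TO_RNA = str.maketrans({"T": "U", "t": "u"})


def dna_to_rna(dna_sequence: str) -> str:
    """Return the same-strand RNA-alphabet equivalent of a DNA sequence."""
    return dna_sequence.translate(DNA_TO_RNA)


print(dna_to_rna("GATTACA"))
```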
  • a coding sequence, e.g., as a sequence element of rAAV vector genomes herein, is construed as covering both a DNA coding sequence and an RNA coding sequence.
  • an RNA sequence can be transcribed from the DNA coding sequence, and optionally further a protein can be translated from the transcribed RNA sequence as necessary.
  • the RNA coding sequence per se can be a functional RNA sequence for use, or an RNA sequence can be produced from the RNA coding sequence, e.g., by RNA processing, or a protein can be translated from the RNA coding sequence.
  • an IscB polypeptide coding sequence encoding an IscB polypeptide covers either an IscB polypeptide DNA coding sequence from which an IscB polypeptide is expressed (indirectly via transcription and translation) or an IscB polypeptide RNA coding sequence from which an IscB polypeptide is translated (directly) .
  • a gRNA coding sequence encoding a gRNA covers either a gRNA DNA coding sequence from which a gRNA is transcribed or a gRNA RNA coding sequence (1) which per se is the functional gRNA for use, or (2) from which a gRNA is produced, e.g., by RNA processing.
  • RNA packaging signals may be unnecessary and can be omitted at least partly, while RNA packaging signals can be introduced.
  • a promoter to drive transcription of DNA sequences may be unnecessary and can be omitted at least partly.
  • a sequence encoding a polyA signal may be unnecessary and can be omitted at least partly, while a polyA tail can be introduced.
  • other DNA elements of rAAV DNA vector genomes can be either omitted or replaced with corresponding RNA elements and/or additional RNA elements can be introduced, in order to adapt to the strategy of delivering an RNA vector genome by rAAV particles.
  • the disclosure provides a ribonucleoprotein (RNP) comprising the engineered IscB polypeptide of the disclosure and a guide nucleic acid.
  • the guide nucleic acid is as described herein.
  • the disclosure provides a lipid nanoparticle (LNP) comprising an RNA (e.g., mRNA) encoding the engineered IscB polypeptide of the disclosure and a guide nucleic acid.
  • the guide nucleic acid is as described herein.
  • the disclosure provides a cell comprising the engineered IscB polypeptide of the disclosure, the system of the disclosure, the polynucleotide of the disclosure, the vector of the disclosure, the rAAV particle of the disclosure, the RNP of the disclosure, or the LNP of the disclosure.
  • the system of the disclosure comprising the IscB polypeptide or fusion protein of the disclosure has a wide variety of utilities, including modifying (e.g., cleaving, deleting, inserting, base editing, translocating, inactivating, or activating) a target DNA in a multiplicity of cell types.
  • the system has a broad spectrum of applications requiring high activity/efficiency and small sizes, e.g., drug screening, disease diagnosis and prognosis, and treating various genetic disorders.
  • the method and/or the system of the disclosure can be used to modify a target DNA, for example, to modify the translation and/or transcription of one or more genes of the cells.
  • the modification may lead to increased transcription/translation/expression of a gene.
  • the modification may lead to decreased transcription/translation/expression of a gene.
  • the disclosure provides a method for modifying a target DNA, comprising contacting the target DNA with the system of the disclosure, the vector of the disclosure, the ribonucleoprotein of the disclosure, or the lipid nanoparticle of the disclosure, wherein the guide sequence is capable of hybridizing to a target sequence of the target DNA, wherein the target DNA is modified by the complex.
  • the modification includes an indel event, double-strand cleavage (double-stranded break, DSB), single-strand cleavage (nick) (e.g., on either the target strand or the nontarget strand of a dsDNA), base editing (e.g., single base editing), prime editing, and integration or insertion of an exogenous donor (e.g., by homologous recombination).
  • the method is in vitro, in vivo, or ex vivo.
  • the target DNA is in a cell.
  • the disclosure provides a cell comprising the system of the disclosure.
  • the disclosure provides a cell modified by the system of the disclosure or the method of the disclosure.
  • the cell is modified in vitro, in vivo, or ex vivo.
  • the cell is a eukaryotic cell (e.g., an animal cell, a vertebrate cell, a mammalian cell, a non-human mammalian cell, a non-human primate cell, a rodent (e.g., mouse or rat) cell, a human cell, a plant cell, or a yeast cell) or a prokaryotic cell (e.g., a bacterial cell).
  • the cell is from a plant or an animal. In some embodiments, the cell is not from a plant.
  • the cell is a non-human mammalian cell, such as a cell from a non-human primate (e.g., monkey), an ox/cow/bull/cattle, sheep, goat, pig, horse, dog, cat, rodent (such as rabbit, mouse, rat, hamster, etc.), or alpaca.
  • the cell is from fish (such as salmon, zebra fish) , bird (such as poultry bird, including chick, duck, goose) , reptile, shellfish (e.g., oyster, clam, lobster, shrimp) , insect, worm, yeast, etc.
  • the plant is a dicotyledon.
  • the dicotyledon is selected from the group consisting of soybean, cabbage (e.g., Chinese cabbage), rapeseed, brassica, watermelon, melon, potato, tomato, tobacco, eggplant, pepper, cucumber, cotton, alfalfa, and grape.
  • the plant is a monocotyledon.
  • the cell is from a plant, such as monocot or dicot.
  • the plant is a food crop such as barley, cassava, cotton, groundnuts or peanuts, maize, millet, oil palm fruit, potatoes, pulses, rapeseed or canola, rice, rye, sorghum, soybeans, sugar cane, sugar beets, sunflower, and wheat.
  • the plant is a cereal (barley, maize, millet, rice, rye, sorghum, and wheat) .
  • the plant is a tuber (cassava and potatoes) .
  • the plant is a sugar crop (sugar beets and sugar cane) .
  • the disclosure provides a pharmaceutical composition comprising (1) the system of the disclosure, the vector of the disclosure, the rAAV particle of the disclosure, the ribonucleoprotein of the disclosure, the lipid nanoparticle of the disclosure, or the cell of the disclosure; and (2) a pharmaceutically acceptable excipient.
  • the disease is selected from the group consisting of Angelman syndrome (AS) , Alzheimer's disease (AD) , transthyretin amyloidosis (ATTR) , transthyretin amyloid cardiomyopathy (ATTR-CM) , cystic fibrosis (CF) , hereditary angioedema, diabetes, progressive pseudohypertrophic muscular dystrophy, Duchenne muscular dystrophy (DMD) , Becker muscular dystrophy (BMD) , spinal muscular atrophy (SMA) , alpha-1-antitrypsin deficiency, Pompe disease, myotonic dystrophy, Huntington’s disease (HTT) , fragile X syndrome, Friedreich ataxia, amyotrophic lateral sclerosis (ALS) , frontotemporal dementia, hereditary chronic kidney disease, hyperlipidemia, Leber congenital amaurosis (LCA) , sickle cell disease, thalassemia (e.g., ⁇ -thalassemia)
  • the target DNA encodes a mRNA, a tRNA, a ribosomal RNA (rRNA) , a microRNA (miRNA) , a non-coding RNA, a long non-coding (lnc) RNA, a nuclear RNA, an interfering RNA (iRNA) , a small interfering RNA (siRNA) , a ribozyme, a riboswitch, a satellite RNA, a microswitch, a microzyme, or a viral RNA.
  • the target DNA is in a eukaryotic cell, for example, a human cell, a non-human primate cell, or a mouse cell.
  • the administering comprises local administration or systemic administration.
  • the administering comprises intrathecal administration, intramuscular administration, intravenous administration, transdermal administration, intranasal administration, oral administration, mucosal administration, intraperitoneal administration, intracranial administration, intracerebroventricular administration, or stereotaxic administration.
  • the administration is injection or infusion.
  • the subject is a human, a non-human primate, or a mouse.
  • the level of the transcript (e.g., mRNA) of the target DNA is decreased in the subject by at least about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, or more compared to the level of the transcript (e.g., mRNA) of the target DNA in the subject prior to the administration.
  • the level of the transcript (e.g., mRNA) of the target DNA is increased in the subject by at least about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, or more compared to the level of the transcript (e.g., mRNA) of the target DNA in the subject prior to the administration.
  • the level of the expression product (e.g., protein) of the target DNA is decreased in the subject by at least about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, or more compared to the level of the expression product (e.g., protein) of the target DNA in the subject prior to the administration.
  • the level of the expression product (e.g., protein) of the target DNA is increased in the subject by at least about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, or more compared to the level of the expression product (e.g., protein) of the target DNA in the subject prior to the administration.
  • the expression product is a functional mutant of the expression product of the target DNA.
  • the median survival of the subject suffering from the disease but receiving the administration is 5 days, 10 days, 20 days, 30 days, 2 months, 3 months, 4 months, 5 months, 6 months, 7 months, 8 months, 9 months, 10 months, 11 months, 12 months, 1.5 years, 2 years, 2.5 years, 3 years, 4 years, 5 years, 6 years, 7 years, 8 years, 9 years, 10 years, or more, longer than that of a subject or a population of subjects suffering from the disease and not receiving the administration.
  • the therapeutically effective dose may be administered either as a single dose or as multiple doses.
  • the actual dose may vary greatly depending upon a variety of factors, such as the vector choices, the target cells, organisms, tissues, the general conditions of the subject to be treated, the degrees of transformation/modification sought, the administration routes, the administration modes, the types of transformation/modification sought, etc.
  • the therapeutically effective dose of the rAAV particle may be about 1.0E+8, 2.0E+8, 3.0E+8, 4.0E+8, 6.0E+8, 8.0E+8, 1.0E+9, 2.0E+9, 3.0E+9, 4.0E+9, 6.0E+9, 8.0E+9, 1.0E+10, 2.0E+10, 3.0E+10, 4.0E+10, 6.0E+10, 8.0E+10, 1.0E+11, 2.0E+11, 3.0E+11, 4.0E+11, 6.0E+11, 8.0E+11, 1.0E+12, 2.0E+12, 3.0E+12, 4.0E+12, 6.0E+12, 8.0E+12, 1.0E+13, 2.0E+13, 3.0E+13, 4.0E+13, 6.0E+13, 8.0E+13, 1.0E+14, 2.0E+14, 3.0E+14, 4.0E+14, 6.0E+14, 8.0E+14, 1.0E+15, 2.0E+15, 2.0
  • the disclosure provides a kit comprising the IscB polypeptide or fusion protein of the disclosure, the system of the disclosure, the polynucleotide of the disclosure, the vector of the disclosure, the RNP of the disclosure, the LNP of the disclosure, the delivery system of the disclosure, the cell of the disclosure, or the pharmaceutical composition of the disclosure, or any one, two, or all components of the same.
  • the kit further comprises instructions to use the component(s) contained therein, and/or instructions for combining with additional component(s) that may be available or necessary elsewhere.
  • the kit further comprises one or more buffers that may be used to dissolve any of the component(s) contained therein, and/or to provide suitable reaction conditions for one or more of the component(s).
  • buffers may include one or more of PBS, HEPES, Tris, MOPS, Na2CO3, NaHCO3, NaB, or combinations thereof.
  • the reaction condition includes a proper pH, such as a basic pH. In some embodiments, the pH is between 7 and 10.
  • any one or more of the kit components may be stored in a suitable container or at a suitable temperature, e.g., 4 °C.
  • An orthogonal R-loop assay was performed to detect gRNA-independent off-target editing as described previously [27]. 0.8 μg of plasmids encoding IscB.m16*-ABE, enOgeuIscB-ABE, or SpG-ABE with their respective ωRNA or single-guide RNA (sgRNA), and 0.8 μg of dSaCas9 plasmids with the corresponding sgRNA targeting five previously reported R-loop sites, were co-transfected into HEK293T cells using PEI.
  • transfected cells were analyzed by FACS followed by genomic DNA extraction with 25 μl of freshly prepared lysis buffer (Vazyme) containing proteinase K. Amplification and targeted deep sequencing were performed at ABE on-target sites and dSaCas9 R-loop off-target sites.
  • AAVs were manufactured by HuidaGene Therapeutics Inc (Shanghai, China). Briefly, cells were grown in culture until they reached a confluency of 70-90%. Before transfection, the growth media was replaced with pre-warmed growth media. For each 15-cm dish, a mixture of 20 μg of pHelper, 10 μg of pRepCap, and 10 μg of GOI plasmid was prepared and added dropwise to the cell media. After a three-day incubation period, AAVs were harvested and purified using iodixanol density gradient centrifugation.
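As a small worked example of the per-dish plasmid mixture stated above (20 μg pHelper, 10 μg pRepCap, and 10 μg GOI plasmid per 15-cm dish), the following sketch scales the amounts to a number of dishes. The linear scaling and the function name are assumptions for illustration only; the source states only the per-dish amounts.

```python
# Per-dish plasmid amounts in micrograms, as stated for one 15-cm dish.
PLASMID_UG_PER_DISH = {"pHelper": 20.0, "pRepCap": 10.0, "GOI plasmid": 10.0}


def transfection_mix(n_dishes: int) -> dict:
    """Scale the per-dish plasmid amounts linearly to n_dishes (assumed scaling)."""
    if n_dishes < 1:
        raise ValueError("n_dishes must be at least 1")
    return {name: ug * n_dishes for name, ug in PLASMID_UG_PER_DISH.items()}


mix = transfection_mix(4)
print(mix, "total ug:", sum(mix.values()))
```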
  • Tissue samples were homogenized using RIPA buffer supplemented with a protease inhibitor cocktail.
  • the supernatants of the lysates were quantified using a Pierce BCA protein assay kit (Thermo Fisher Scientific, 23225) and adjusted to a uniform concentration using H2O.
  • Equal volumes of the samples were mixed with NuPAGE LDS sample buffer (Invitrogen, NP0007) and 10% β-mercaptoethanol, then heated at 70 °C for 10 min.
  • a total of 10 μg of protein per lane was loaded into 3% to 8% Tris-acetate gels (Invitrogen, EA03752BOX) and underwent electrophoresis for 1 hour at 200 V.
  • the amino acid sequences of the wild-type IscB proteins (named IscB.m1 to IscB.m19, respectively) of these IscB systems are set forth in SEQ ID NOs: 1-19, respectively.
  • the human codon-optimized coding sequences of these IscB proteins are set forth in SEQ ID NOs: 39-57, respectively.
  • the scaffold sequences of the ωRNA in the IscB systems, corresponding one-to-one to these IscB proteins, are set forth in SEQ ID NOs: 20-38, respectively.
  • IscB systems were phylogenetically clustered into three subgroups based on sequence alignment of the IscB proteins (FIG. 1a, FIG. 6) .
  • the inventor identified the conserved residues within the RuvC, HNH, P1D and TID domains of the IscB proteins, suggesting the possibility of endonuclease and nickase activity of the identified IscB proteins (FIG. 7) .
  • the inventor performed a bacterial depletion assay.
  • the inventor co-transformed E. coli cells with plasmids carrying the IscB and ωRNA with a spacer (a.k.a. a spacer sequence or guide sequence) and a TAM library plasmid carrying target sequences complementary to the spacer and an 8-base-pair (bp) randomized sequence (as potential TAM) (FIG. 8a).
  • the inventor employed a fluorescence reporter system. This system involved co-transfecting a reporter plasmid encoding unactivated GFxxFP and an expression plasmid expressing one of the IscB proteins and its corresponding GFxxFP-targeting ωRNA into cultured HEK293T cells. The inventor then measured the EGFP signal intensity of the GFxxFP reporter, which was activated by IscB-mediated double-strand breaks (DSB) [21] (FIG. 1c). Using the GFxxFP reporter with the experimentally determined TAM for each IscB, 10 (marked with “*” in FIG.
  • IscB.m16 (SEQ ID NO: 16; encoded by SEQ ID NO: 54; with ωRNA scaffold sequence of SEQ ID NO: 35)
  • IscB.m16 exhibited the highest signal intensity, indicating the highest endonuclease activity (FIG. 1d).
  • Example 2 Engineering scaffold sequence of ωRNA to improve editing efficiency
  • the inventor engineered the wild-type scaffold sequence (SEQ ID NO: 35) of the ωRNA identified along with IscB.m16 by truncation or mutagenesis, generating multiple scaffold sequence variants (SEQ ID NOs: 58-97; FIG. 2b) with a change in one of five stem-loop regions, namely R1, R2, R3, R4, and R5 (FIG. 2a, FIG. 9a).
  • the five stem-loop regions are parts of the secondary structure of the scaffold sequence of ωRNA.
  • the secondary structure of the scaffold sequence of an IscB ωRNA can be depicted, like any other RNA, by well-known methods, e.g., the online tool RNAfold (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi).
  • the inventor then generated scaffold sequence variants based on R1-Δ59 (SEQ ID NO: 136) by the replacement of A-U with C-G base pairs in the R1 stem loop and identified a variant (R1-Δ59-M2; SEQ ID NO: 153) with even higher endonuclease activity than R1-Δ59 (FIG. 2e).
  • Scaffold sequence variants SEQ ID NOs: 168-195 based on SEQ ID NO: 37, scaffold sequence variants SEQ ID NOs: 196-222 based on SEQ ID NO: 34, scaffold sequence variants SEQ ID NOs: 223-232 based on SEQ ID NO: 20, and scaffold sequence variants SEQ ID NOs: 233-238 based on SEQ ID NO: 27 were generated and tested with the respective IscB.m18, IscB.m15, IscB.m1, and IscB.m8 for endonuclease activity.
  • IscB.m16 (SEQ ID NO: 16) is composed of, from N- to C-terminus, PLMP domain (positions 1-54), RuvC-I domain (positions 55-85), Bridge Helix (positions 86-122), Linker (positions 123-160), RuvC-II domain (positions 161-196), HNH domain (positions 197-297), RuvC-III domain (positions 298-374), P1D domain (positions 375-429), and TID domain (positions 430-513).
  • the domain architecture of other IscB can be determined by sequence alignment with IscB. m16.
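The domain boundaries listed above for IscB.m16 (SEQ ID NO: 16) can be encoded as a lookup table, so that a substitution written in the usual notation (original residue, position, new residue, e.g., D61A) can be mapped to the domain it falls in. This is a minimal sketch; the function names are hypothetical, and only the position ranges come from the text.

```python
import re

# (domain name, first position, last position), 1-based and inclusive.
ISCB_M16_DOMAINS = [
    ("PLMP", 1, 54), ("RuvC-I", 55, 85), ("Bridge Helix", 86, 122),
    ("Linker", 123, 160), ("RuvC-II", 161, 196), ("HNH", 197, 297),
    ("RuvC-III", 298, 374), ("P1D", 375, 429), ("TID", 430, 513),
]


def domain_at(position: int) -> str:
    """Return the name of the domain containing a 1-based residue position."""
    for name, start, end in ISCB_M16_DOMAINS:
        if start <= position <= end:
            return name
    raise ValueError(f"position {position} is outside 1-513")


def parse_substitution(label: str) -> tuple:
    """Split a label such as 'T462H' into (original, position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a single-substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def domain_of_substitution(label: str) -> str:
    """Map a substitution label to the IscB.m16 domain it falls in."""
    return domain_at(parse_substitution(label)[1])


for label in ("D61A", "H248A", "E326R", "P460S", "T462H"):
    print(label, "->", domain_of_substitution(label))
```

For another IscB, the same table would first have to be re-derived by sequence alignment with IscB.m16, as the text notes; the ranges here apply to IscB.m16 only.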
  • According to the predicted structural analysis of IscB.m16 (SEQ ID NO: 16), the inventor performed arginine-scanning mutagenesis, in which a single arginine (R) residue was introduced to substitute the original amino acid residue at one indicated position of IscB.m16 (SEQ ID NO: 16) (Table A2).
  • A2R denotes the IscB.m16-A2R mutant containing a single substitution of A2R relative to IscB.m16.
  • IscB.m16 using six GFxxFP reporters with different TAMs to broaden TAM recognition (FIG. 11). These reporters had the same target sequences but different 6-base TAMs (AAAGAA, CAAGAA, ACAGAA, AACGAA, AAAGCA, AAAGAC).
  • 7 (seven) variants (each with single substitution M424R, T462R, N463R, T465R, Q475R, K478R, or I504R) (indicated by red arrows in FIG.
  • TAM pool 1 consisted of ACAGAA, AATGAA, AAACAA, AAAGCA, and AAAGAC, which were TAMs for which IscB.m16 showed relatively low endonuclease activity (FIG. 12a-b).
  • IscB.m16 mutants with a single substitution, especially the P460S, T462H, T462L, or T465V mutant, showed greatly enhanced endonuclease activity for TAM pool 1 (FIG. 3b, FIG. 12c).
  • the inventor combined the single substitutions of E326R, P460S, T462H, T462L, and T465V in various ways, and obtained the best-performing combination mutant of E326R + P460S + T462H, named IscB.m16 RSH (SEQ ID NO: 405) (FIG. 3c, FIG. 12e).
  • the inventor detected EGFP activation using 64 TAM reporters with 5’-NNNGAA-3’ TAM, and IscB.m16 RSH showed high endonuclease activity (FIG. 12f).
  • the inventor designed three additional TAM pools: pool 2 (TTTGAA, TTGGAA, TCAGAA, CTAGAA, and CTGGAA), pool 3 (GTAGAA, GTTGAA, GTCGAA, GTGGAA, and GCAGAA), and pool 4 (ATAGAA, TGTGAA, CTCGAA, and GAGGAA) as a positive pool (FIG. 12f).
  • the inventor separately combined a single substitution at a site of T459, N463, Q475, L481, or I504 with IscB.m16 RSH, and evaluated the resulting mutants using reporters with TAM pool 2, pool 3, or pool 4 (FIG. 3d; “3M” refers to IscB.m16 RSH).
  • the inventor found that the combination of T459E with IscB.m16 RSH, named IscB.m16 RESH (SEQ ID NO: 239), exhibited increased editing efficiency at pool 2 and pool 3 reporters relative to IscB.m16 RSH (FIG. 3d).
  • the inventor further optimized IscB.m16 scaffold sequence variant v2.27 (SEQ ID NO: 107) based on IscB.m16 RESH, and found that variant v2.27-M21 (R1-Δ13, R5-Δ10, T24G, G25C, T57C, T79C, A117G) (SEQ ID NO: 242), hereafter named “enωRNA”, showed significantly enhanced endonuclease activity compared to v2.27 (SEQ ID NO: 107) (FIG. 3e). The IscB system composed of IscB.m16 RESH and an ωRNA composed of a guide sequence and the scaffold sequence variant enωRNA was then designated “IscB.m16*”. Unless otherwise indicated, enωRNA was used in all the subsequent experiments.
  • IscB.m16 scaffold sequence variant v2.27-M21 (R1-Δ13, R5-Δ10, T24G, G25C, T57C, T79C, A117G) (hereafter named “enωRNA”), wherein the g-c base pair at positions 57 and 117 (numbered according to the WT IscB.m16 scaffold sequence of SEQ ID NO: 35) of IscB.m16 scaffold sequence variant v2.27 was further replaced with a c-g base pair (FIG. 22).
  • the inventor explored the optimal guide sequence length (spacer length) using the GFxxFP fluorescence reporter, and found that IscB.m16* achieved maximum endonuclease activity with a 14-nt spacer length at two different targets (FIG. 14a).
  • the inventor examined the indel efficiency of IscB.m16* at five endogenous loci in cultured HEK293T cells and found that IscB.m16* showed the highest endonuclease activity and the broadest range of deletion (FIG. 3f, FIG. 14b) compared to the other combinations of wild-type IscB.m16 scaffold sequence or scaffold sequence variant enωRNA and wild-type IscB.m16 or IscB.m16 RESH mutant. Furthermore, TAM identification of IscB.m16* using bacterial depletion indicated that IscB.m16* recognized a TAM of 5’-NNNGNA-3’, while IscB.m16 recognized a TAM of 5’-MRNRAA-3’ (R denotes A or G) (FIG. 3g).
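The TAM patterns reported above use degenerate nucleotide codes. As a minimal sketch, the standard IUPAC code table (in which N is any base, R is A or G, and M is A or C) can be used to check whether a concrete 6-base TAM satisfies a pattern, and to enumerate every TAM a pattern covers. The function names are hypothetical, and the check is purely a string match; it says nothing about measured activity.

```python
from itertools import product

# Standard IUPAC degenerate nucleotide codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_tam(sequence: str, pattern: str) -> bool:
    """True if each base of sequence is allowed by the pattern at that position."""
    if len(sequence) != len(pattern):
        return False
    return all(base in IUPAC[code]
               for base, code in zip(sequence.upper(), pattern.upper()))


def expand_tam(pattern: str) -> list:
    """List every concrete sequence covered by a degenerate pattern."""
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in pattern.upper()))]


print(len(expand_tam("NNNGAA")))          # size of the NNNGAA reporter set
print(matches_tam("TTTGAA", "MRNRAA"))    # a pool 2 TAM against the IscB.m16 pattern
print(matches_tam("TTTGAA", "NNNGNA"))    # the same TAM against the IscB.m16* pattern
```

Under this string match, every pool 2 and pool 3 TAM listed earlier fits 5’-NNNGNA-3’, while none of them fits 5’-MRNRAA-3’ because their first base is not A or C.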
  • Example 4 IscB. m16*-mediated base editing in mammalian cells
  • the inventor constructed endonuclease-deficient IscB.m16 mutants, namely D61A in the RuvC-I domain, H248A in the HNH domain, and D61A+H248A, on the basis of IscB.m16 and IscB.m16*, respectively.
  • the amino acid sequence of IscB.m16 RESH-D61A (nickase) is set forth in SEQ ID NO: 240, and the amino acid sequence of IscB.m16 RESH-D61A+H248A (dead) is set forth in SEQ ID NO: 241.
  • To comprehensively evaluate the base editing performance of IscB.m16*-ABE, the inventor designed dozens of TAM/PAM-matched endogenous loci for testing IscB.m16-ABE, IscB.m16*-ABE, enOgeuIscB-ABE [19], and SpG-ABE [22] (FIG. 4a-b). The inventor found that the base editing window of IscB.m16*-ABE ranged from positions 1 to 10 (counting the TAM as positions 15-20), while the optimal base editing occurred within positions 2-5 (FIG. 4c). At these matched G-containing TAM/PAM sites in HEK293T cells, IscB.m16*-ABE showed significantly higher A-to-G base editing efficiency (46.15 ± 4.08%) than IscB.m16-ABE (9.19 ± 2.34%) and enOgeuIscB-ABE (31.34 ± 4.90%), and base editing efficiency comparable to that of SpG-ABE (50.77 ± 4.13%) but with a much smaller base editor size (FIG. 4a and 4d, FIG. 16).
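The editing window described above (positions 1 to 10, with positions 2-5 being optimal, counting the TAM as positions 15-20) can be illustrated by listing which adenines of a protospacer fall inside a window. This sketch assumes that position 1 is the TAM-distal end of a 14-nt protospacer written 5’ to 3’ with the TAM immediately following position 14; the protospacer below is hypothetical, as is the function name.

```python
def bases_in_window(protospacer: str, base: str = "A", window: tuple = (1, 10)) -> list:
    """Return 1-based positions of `base` that fall inside the inclusive window."""
    start, end = window
    return [pos for pos, nt in enumerate(protospacer.upper(), start=1)
            if nt == base and start <= pos <= end]


# Hypothetical 14-nt protospacer; the TAM would occupy positions 15-20.
protospacer = "ACAGATTCAAGGCC"
print(bases_in_window(protospacer, "A", (1, 10)))  # adenines in the full window
print(bases_in_window(protospacer, "A", (2, 5)))   # adenines in the optimal window
```

The same helper can be called with base "C" when considering a cytidine base editor, although the window reported above was measured for the ABE.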
  • the indel activity of IscB.m16*-ABE was similar to that of enOgeuIscB-ABE but lower than that of SpG-ABE (FIG. 4e, FIG. 17a).
  • For IscB.m16*-ABE in HEK293T cells, the inventor assessed gRNA-dependent off-target DNA editing at sites predicted using Cas-OFFinder [26], and gRNA-independent off-target DNA editing using the orthogonal R-loop assay [27], at ALDH1A3-S1, VEGFA-S1, and EMX1-S2 target sites, respectively.
  • Targeted deep sequencing analysis revealed that IscB.m16*-ABE exhibited gRNA-dependent off-target effects similar to those of enOgeuIscB-ABE and SpG-ABE at predicted off-target sites (FIG. 18).
  • IscB.m16*-ABE showed comparably low gRNA-independent off-target events to enOgeuIscB-ABE and SpG-ABE (FIG. 4g-h, FIG. 19).
  • IscB.m16*-CBE exhibited base editing efficiency and indel efficiency comparable to those of enOgeuIscB-CBE and SpG-CBE, with base editing efficiencies of 60.01 ± 8.08, 63.72 ± 5.33, and 68.06 ± 5.88, respectively (FIG. 4g-h, FIG. 17b-c).
  • the IscB.m16*-based base editors can be packaged with ωRNA into a single rAAV vector, making them promising candidates for the treatment of certain genetic diseases, such as Duchenne muscular dystrophy (DMD) [28, 29].
  • A previous study has shown that exon 50 skipping of the dystrophin gene can restore dystrophin expression in a mouse model with an exon 51 deletion, a mutation occurring in nearly 8% of DMD patients [30, 31].
  • the inventor devised a strategy to disrupt the splicing signal with IscB.m16*-CBE by converting the G within the splice acceptor site (‘AG’) to other bases (A/C/T), resulting in exon skipping (FIG. 5a).
  • the inventor first tested IscB.m16*-CBE with ωRNA targeting the AG site adjacent to exon 50 in HEK293T cells. The inventor observed that IscB.m16*-CBE displayed an approximately 25% conversion rate at position 10, which is the splice acceptor site, while enOgeuIscB-CBE and SpG-CBE showed nearly no base editing activity at that position (FIG. 5b). To conveniently package the IscB.
  • IscB.m16*-CBE-v1 carried one copy of BpNLS (bpSV40 NLS) (SEQ ID NO: 256) at the N-terminus of the CBE and one copy of NpNLS (SEQ ID NO: 257) at the C-terminus of the CBE, and IscB.m16*-CBE-v2 carried two copies of BpNLS at the N-terminus of the CBE and one copy of NpNLS followed by one copy of BpNLS at the C-terminus of the CBE.
  • IscB.m16*-CBE-v2 achieved approximately 7% G-to-H (G-to-A, G-to-T, and G-to-C) conversion and up to a 30% level of exon 51 skipping (FIG. 5d-e).
  • the IscB.m16*-based base editor, as a highly effective and broad-TAM miniature base editing tool, provides a promising approach for basic research and therapeutic applications.
  • the inventor identified 19 natural IscB orthologs with various TAM recognition, and 10 of those IscBs showed endonuclease activity in mammalian cells, highlighting the diversity of IscB family.
  • the inventor found that truncation of the first (R1) and the last (e.g., R5/R6) stem loops of the ωRNA scaffold sequence usually enhanced the endonuclease activity of IscBs.
  • RuvC domains of IscB the inventor developed IscB.
  • IscB.m16*-based base editors showed base editing efficiency comparable to SpG-BE but with a much smaller size, and even higher than SpG-BE and enOgeuIscB-BE at some disease-related loci, such as DMD. Therefore, considering their compact size and extended editing scope, IscB.m16*-based base editors have high potential to be alternatives to enOgeuIscB- and Cas9-based base editors, especially for AAV-based therapeutic applications.
  • This Example demonstrates the C-to-T base editing efficiency of the miCBE of the disclosure to design an optimal IscB-based CBE configuration.
  • a miCBE expression plasmid and a guide RNA expression plasmid were constructed for the detection of the C-to-T base editing efficiency of the miCBE of the disclosure.
  • the miCBE expression plasmid comprised, from 5’ to 3’, CMV enhancer 1 (SEQ ID NO: 428) , CAG promoter (SEQ ID NO: 429) , hybrid intron (SEQ ID NO: 430) , a sequence encoding a miCBE based on OgeuIscB, a bGH polyA signal coding sequence (SEQ ID NO: 431) , CMV enhancer 2 (SEQ ID NO: 432) , CMV promoter (SEQ ID NO: 433) , and a mCherry (SEQ ID NO: 434) coding sequence indicative of successful transfection and expression of the miCBE expression plasmid.
  • the guide RNA expression plasmid comprised, from 5’ to 3’, U6 promoter (SEQ ID NO: 435), a sequence encoding an EMX1-S1-targeting or PCSK9-S4-targeting guide RNA (SEQ ID NO: 436 or 439) composed of an EMX1-S1-targeting or PCSK9-S4-targeting guide sequence (SEQ ID NO: 437 or 440) and a scaffold sequence (SEQ ID NO: 442) 3’ to the guide sequence, CMV enhancer 2 (SEQ ID NO: 432), CMV promoter (SEQ ID NO: 433), and an EGFP (SEQ ID NO: 443) coding sequence indicative of successful transfection and expression of the guide RNA expression plasmid.
  • The structures of the miCBEs (miCBE-v1 to v10) tested in this Example are shown in FIG. 24.
  • A3A W104A: APOBEC3A-W104A mutant.
  • OgeuIscB D61A: OgeuIscB-D61A+E85R+H369R+S387R+S457R (IscB nickase).
  • the amino acid sequences of miCBE-v1 to v10 fusions are set forth in SEQ ID NOs: 406-415, respectively.
  • HEK293T cells were cultured in 24-well tissue culture plates according to standard methods for 12 hours, before the miCBE expression plasmid and the guide RNA expression plasmid (synthesized by GenScript Co., Ltd.) were co-transfected into the cells using standard polyethyleneimine (PEI) transfection. The transfected cells were then cultured at 37 °C under CO2 for 48 hours. mCherry and EGFP dual-positive cells were sorted from the cultured cells by flow cytometry.
  • Table 1 shows the base editing efficiency (%) of each tested miCBE at each indicated site for target EMX1-S1.
  • Sites C9, C11, C12, and C13 refer to the cytidines at positions 9, 11, 12, and 13 of the EMX1-S1 protospacer sequence (SEQ ID NO: 438), respectively.
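The site labels above are 1-based positions within the protospacer, and the tabulated values are per-site percentages. The excerpt does not state how the percentages were computed, so the following is only a sketch under an assumption: that efficiency at a site is the share of sequencing reads carrying T where the reference has C. The protospacer and reads are made up for illustration (they are not SEQ ID NO: 438), and the reads are assumed to be pre-aligned, gap-free, and the same length as the protospacer.

```python
def c_to_t_efficiency(protospacer, reads, sites):
    """Percent of reads showing T at each listed reference cytidine.

    `sites` holds 1-based protospacer positions, e.g. 9 for site C9.
    """
    if not reads:
        raise ValueError("no reads supplied")
    result = {}
    for pos in sites:
        ref_base = protospacer[pos - 1]  # convert 1-based site to 0-based index
        if ref_base != "C":
            raise ValueError(f"position {pos} is {ref_base}, not C")
        edited = sum(1 for read in reads if read[pos - 1] == "T")
        result[f"C{pos}"] = 100.0 * edited / len(reads)
    return result


# Hypothetical protospacer with cytidines at positions 9, 11, 12, and 13.
PROTOSPACER = "GATGAAGGCACCCGTA"

# Hypothetical aligned reads covering the protospacer.
READS = [
    "GATGAAGGCACCCGTA",  # unedited
    "GATGAAGGTACCCGTA",  # C9 -> T
    "GATGAAGGTATCCGTA",  # C9 -> T and C11 -> T
    "GATGAAGGCACTCGTA",  # C12 -> T
]

if __name__ == "__main__":
    print(c_to_t_efficiency(PROTOSPACER, READS, [9, 11, 12, 13]))
```

The same counting applies to A-to-G editing by checking for G at a reference A.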
  • Untreated blank: HEK293 cells with no transfection of the two expression plasmids.
  • miCBE-v10, miCBE-v1, and miCBE-v9 differ only in the number of UGI domains contained therein.
  • miCBE-v10 contains one UGI domain.
  • miCBE-v1 contains two UGI domains.
  • miCBE-v9 contains three UGI domains.
  • As shown in Tables 3 and 4, for either the EMX1 or PCSK9 target gene, miCBE-v1 with two UGI domains achieved significantly higher base editing efficiency than miCBE-v9 with three UGI domains at all the indicated sites, and miCBE-v10 with just one UGI domain in turn achieved significantly higher base editing efficiency than miCBE-v1 with two UGI domains at all the indicated sites. It is thus concluded that, for the miCBE of the disclosure, one UGI domain is superior to two UGI domains, and two UGI domains are superior to three UGI domains.
  • miCBE-v10 achieved a higher base editing efficiency than not only miCBE-v1 but also all of miCBE-v1 to v8 that contain two UGI domains at all four sites. It is thus concluded that miCBE-v10 is superior to all the tested miCBEs with two UGI domains.
  • miCBE-v10 achieved a higher base editing efficiency than not only miCBE-v1 but also all of miCBE-v1 to v8 that contain two UGI domains at sites C6, C10, and C11, and achieved a base editing efficiency comparable to the best two-UGI-containing miCBE at sites C4 (slightly lower than miCBE-v6 by about 3%), C5 (slightly lower than miCBE-v5 by about 3%), C8 (slightly lower than miCBE-v6 by about 1%), and C9 (slightly lower than miCBE-v6 by about 1%).
  • Considering the benefit of saving one UGI domain (e.g., reduced tool size)
  • The N-cytidine deaminase domain-IscB nickase-C configuration is superior to the N-IscB nickase-cytidine deaminase domain-C configuration for the miCBE of the disclosure.
  • the data of miCBE-v1, miCBE-v2, and miCBE-v6 in Tables 1 and 2 are rearranged into Tables 11 and 12, respectively, for direct comparison.
  • the N-cytidine deaminase domain-IscB nickase-C orientation was constant across miCBE-v1, miCBE-v2, and miCBE-v6, and the UGI domains were placed at the C-terminus, in the middle, and at the N-terminus of the base editor, respectively.
  • the comparison shows that, for the EMX1 target gene, placing the UGI domains at the C-terminus of the miCBE gave higher base editing efficiency than placing them either in the middle or at the N-terminus; for the PCSK9 target gene, placing the UGI domains at the C-terminus gave higher base editing efficiency than placing them in the middle, and placing them at the N-terminus gave higher base editing efficiency than placing them at the C-terminus.
  • UGI domains may be at either the N-terminus or the C-terminus of the miCBE to adapt to specific targets, and it is generally not desired to place UGI domains in the middle of the miCBE between the cytidine deaminase domain and the IscB nickase.
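The three UGI placements compared above can be sketched as ordered domain lists read from N-terminus to C-terminus. This is a simplified illustration: linkers and any NLS are omitted, the helper names are hypothetical, and the default of two UGI domains for miCBE-v2 is an assumption inferred from the plural "UGI domains" in the text (two are stated explicitly for miCBE-v1, and miCBE-v6 is described as two-UGI-containing).

```python
from typing import List

DEAMINASE = "cytidine deaminase"
NICKASE = "IscB nickase"
UGI = "UGI"


def build_micbe(ugi_position: str, n_ugi: int = 2) -> List[str]:
    """Return the N-to-C domain order for a given UGI placement."""
    ugis = [UGI] * n_ugi
    if ugi_position == "C-terminal":
        return [DEAMINASE, NICKASE] + ugis
    if ugi_position == "middle":
        return [DEAMINASE] + ugis + [NICKASE]
    if ugi_position == "N-terminal":
        return ugis + [DEAMINASE, NICKASE]
    raise ValueError(f"unknown UGI position: {ugi_position}")


def splits_core(domains: List[str]) -> bool:
    """True if anything sits between the deaminase and the IscB nickase."""
    return domains.index(NICKASE) - domains.index(DEAMINASE) != 1


ARCHITECTURES = {
    "miCBE-v1": build_micbe("C-terminal"),
    "miCBE-v2": build_micbe("middle"),
    "miCBE-v6": build_micbe("N-terminal"),
}

if __name__ == "__main__":
    for name, domains in ARCHITECTURES.items():
        print(name, "N-" + "-".join(domains) + "-C", splits_core(domains))
```

Only the "middle" placement separates the deaminase from the nickase, which is the arrangement the comparison found generally undesirable.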
  • the comparison shows that the XTEN linker achieved higher base editing efficiency than the GS-bpNLS-GS linker for both EMX1 and PCSK9 target genes at all indicated sites.
  • the comparison also shows that the GS linker achieved significantly higher base editing efficiency than the GS-bpNLS-GS linker for EMX1 target gene and slightly higher or lower base editing efficiency than the GS-bpNLS-GS linker for PCSK9 target gene.
  • the introduction of the additional bpNLS did not bring about a substantial advantage over either the XTEN linker or the GS linker but undesirably increased the overall size of the miCBE.
  • Example 7 Evaluation of A-to-G base editing efficiency of mini IscB adenine base editor (miABE)
  • a miABE expression plasmid and a guide RNA expression plasmid were constructed for the detection of the A-to-G base editing efficiency of the miABE of the disclosure.
  • the miABE expression plasmid comprised, from 5’ to 3’, CMV enhancer 1 (SEQ ID NO: 428), CAG promoter (SEQ ID NO: 429), hybrid intron (SEQ ID NO: 430), a sequence encoding a miABE based on OgeuIscB, a bGH polyA signal coding sequence (SEQ ID NO: 431), CMV enhancer 2 (SEQ ID NO: 432), CMV promoter (SEQ ID NO: 433), and an mCherry (SEQ ID NO: 434) coding sequence indicative of successful transfection and expression of the miABE expression plasmid.
  • Tables 17 and 18 show the base editing efficiency (%) of each tested miABE at each indicated site for target VEGFA-S3.
  • Sites A3, A4, A5, A7, A9, and A11 refer to the adenines at positions 3, 4, 5, 7, 9, and 11 of the VEGFA-S3 protospacer sequence (SEQ ID NO: 456), respectively.
  • miABE-v1, -v3, and -v4 differ only in the linker between the IscB nickase and the adenine deaminase domain (TadA8e-V106W), which is a 32 aa GS-XTEN-GS linker (SEQ ID NO: 459) in v1, a 21 aa GS linker (SEQ ID NO: 449) in v3, and a 34 aa GS-bpNLS-GS linker (SEQ ID NO: 450) in v4.
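Because these three variants differ only in the linker, the size cost of each choice follows directly from the stated linker lengths. The lookup below is a small sketch using only the lengths and SEQ ID NOs given in the text; the `Linker` type and the `extra_residues` helper are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Linker(NamedTuple):
    name: str
    length_aa: int
    seq_id_no: int


# Linker between the IscB nickase and TadA8e-V106W, per variant (from the text).
LINKERS = {
    "miABE-v1": Linker("GS-XTEN-GS", 32, 459),
    "miABE-v3": Linker("GS", 21, 449),
    "miABE-v4": Linker("GS-bpNLS-GS", 34, 450),
}


def extra_residues(version: str, baseline: str = "miABE-v3") -> int:
    """Linker residues a variant carries beyond the baseline variant."""
    return LINKERS[version].length_aa - LINKERS[baseline].length_aa


if __name__ == "__main__":
    for version in LINKERS:
        print(version, LINKERS[version].name, extra_residues(version))
```

Relative to the 21 aa GS linker, the GS-XTEN-GS and GS-bpNLS-GS linkers add 11 and 13 residues, respectively.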
  • both miABE-v1 and -v4 achieved significantly higher base editing efficiency than miABE-v3 for both VEGFa-S1 and VEGFa-S3 targets.
  • the data of miABE-v6 and -v7 in Tables 16 and 18 are rearranged into Tables 23 and 24, respectively, for direct comparison.
  • miABE-v6 and -v7 differ only in the linker between the N-terminal adenine deaminase domain (TadA8e-V106W) and the IscB nickase, which is a 32 aa GS-XTEN-GS linker (SEQ ID NO: 459) in v6 and a 34 aa GS-bpNLS-GS linker (SEQ ID NO: 450) in v7.
  • miABE-v6 achieved a higher or lower base editing efficiency than miABE-v7 depending on the target site of the target sequence. It was thus further confirmed that the choice between the GS-XTEN-GS linker (SEQ ID NO: 459) and the GS-bpNLS-GS linker (SEQ ID NO: 450) for the miABEs of the disclosure may depend on the target sequence.
  • miABE-v10 achieved significantly higher base editing efficiency than miABE-v7 at A2, A3, A4, and A6 for the VEGFa-S1 target and at A3, A4, A5, A7, and A9 for the VEGFa-S3 target, and comparable base editing efficiency at A11. It is thus generally concluded that, for the miABEs of the disclosure, it is more advantageous to use two adenine deaminase domains in tandem (not separated by the IscB nickase) than two adenine deaminase domains separated by the IscB nickase.
  • miABE-v8 and -v9 differ only in the orientation of the IscB nickase and the two adenine deaminase domains (TadA8e-V106W) in tandem. As shown by the comparison, miABE-v8 achieved significantly higher base editing efficiency than miABE-v9 for both VEGFa-S1 and VEGFa-S3 targets at all indicated sites except for site A11 of the VEGFa-S1 target.
  • miABE-v10 and -v11 differ only in the orientation of the IscB nickase and the two adenine deaminase domains (TadA8e-V106W) in tandem. As shown by the comparison, miABE-v10 achieved significantly higher base editing efficiency than miABE-v11 for both VEGFa-S1 and VEGFa-S3 targets at all indicated sites.
  • miABE-v8 and -v10 differ only in the linker combination, that is, the first linker between the IscB nickase and the two tandem adenine deaminase domains (TadA8e-V106W) and the second linker between the two tandem adenine deaminase domains: two GS-XTEN-GS linkers (SEQ ID NO: 459) in miABE-v8 and two GS-bpNLS-GS linkers (SEQ ID NO: 450) in miABE-v10.
  • miABE-v10 achieved significantly higher base editing efficiency than miABE-v8 for both VEGFa-S1 and VEGFa-S3 targets at all indicated sites.
  • T. et al. Transposon-associated TnpB is a programmable RNA-guided DNA endonuclease. Nature 599, 692-696 (2021) .

Abstract

The invention relates to various IscB polypeptides, fusion proteins, IscB systems, and uses thereof.
PCT/CN2025/072127 2024-01-11 2025-01-13 IscB polypeptides and uses thereof Pending WO2025149083A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CNPCT/CN2024/071744 2024-01-11
CN2024071744 2024-01-11

Publications (1)

Publication Number Publication Date
WO2025149083A1 WO2025149083A1 (fr) 2025-07-17

Family

ID=96386415

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2025/072127 Pending WO2025149083A1 (fr) 2024-01-11 2025-01-13 IscB polypeptides and uses thereof

Country Status (1)

Country Link
WO (1) WO2025149083A1 (fr)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022087494A1 (fr) * 2020-10-23 2022-04-28 The Broad Institute, Inc. Reprogrammable IscB nucleases and uses thereof
WO2023097228A1 (fr) * 2021-11-23 2023-06-01 The Broad Institute, Inc. Reprogrammable IscB nucleases and uses thereof
CN116656649A (zh) * 2023-06-02 2023-08-29 Nanjing Medical University An is200/is60s transposon IscB mutant protein and application thereof
WO2023215915A1 (fr) * 2022-05-06 2023-11-09 Cornell University Use of IscB in genome editing
WO2023230483A2 (fr) * 2022-05-23 2023-11-30 The Broad Institute, Inc. Modified chimeric IscB polypeptides and related uses
CN117247938A (zh) * 2023-08-14 2023-12-19 Shanghai University of Traditional Chinese Medicine An IscB fusion protein expression vector, construction method thereof, and application in gene editing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 25738610

Country of ref document: EP

Kind code of ref document: A1