
WO2024131786A1 - System for inserting large fragment DNA into genome

Info

Publication number
WO2024131786A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
utr
seq
retrotransposase
rna
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2023/139871
Other languages
French (fr)
Chinese (zh)
Inventor
李伟
周琪
陈阳灿
骆胜球
胡艳萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Zoology of CAS
Institute for Stem Cell and Regeneration of CAS
Original Assignee
Institute of Zoology of CAS
Institute for Stem Cell and Regeneration of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Zoology of CAS and Institute for Stem Cell and Regeneration of CAS
Priority to CN202380087116.0A (patent CN120418275A)
Publication of WO2024131786A1
Legal status: Ceased

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/10 Transferases (2.)
    • C12N9/12 Transferases (2.) transferring phosphorus containing groups, e.g. kinases (2.7)
    • C12N9/1241 Nucleotidyltransferases (2.7.7)
    • C12N9/1276 RNA-directed DNA polymerase (2.7.7.49), i.e. reverse transcriptase or telomerase
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K48/00 Medicinal preparations containing genetic material which is inserted into cells of the living body to treat genetic diseases; Gene therapy
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K48/00 Medicinal preparations containing genetic material which is inserted into cells of the living body to treat genetic diseases; Gene therapy
    • A61K48/005 Medicinal preparations containing genetic material which is inserted into cells of the living body to treat genetic diseases; Gene therapy characterised by an aspect of the 'active' part of the composition delivered, i.e. the nucleic acid delivered
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/85 Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/10 Transferases (2.)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/10 Transferases (2.)
    • C12N9/12 Transferases (2.) transferring phosphorus containing groups, e.g. kinases (2.7)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases [RNase]; Deoxyribonucleases [DNase]
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y207/00 Transferases transferring phosphorus-containing groups (2.7)
    • C12Y207/07 Nucleotidyltransferases (2.7.7)
    • C12Y207/07049 RNA-directed DNA polymerase (2.7.7.49), i.e. telomerase or reverse-transcriptase
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2800/00 Nucleic acids vectors
    • C12N2800/90 Vectors containing a transposable element

Definitions

  • The present application belongs to the field of biotechnology. More specifically, the present application relates to a retrotransposase capable of inserting a large DNA fragment into a human genome at a specific site, its use, and a system using this enzyme.
  • CRISPR clustered regularly interspaced short palindromic repeats
  • Some other methods of site-directed insertion of large DNA fragments have their own shortcomings. For example, the efficiency of large fragment insertion achieved by homologous recombination is low, and the introduced double-stranded DNA breaks pose a safety risk; recombinase systems such as Cre/loxP often require the pre-insertion of loxP sites before the second step of integration can be performed; in addition, most current technologies rely on DNA donors, making it difficult to solve problems such as in vivo delivery.
  • the retrotransposase fills a gap in large-fragment site-specific integration technology and provides a more versatile and convenient tool for biological research.
  • the new retrotransposase system has two main components: first, the retrotransposase polypeptide itself; second, the donor RNA carrying the new genetic information. The retrotransposase recognizes and binds the donor RNA sequence to form a protein-RNA complex, which is then directed to the target gene, where the reverse transcriptase activity converts the RNA into DNA and integrates it into a specific genomic site.
  • the retrotransposase-based DNA integration method avoids generating double-stranded DNA breaks and the risks they bring, and it removes the need to prepare or depend on DNA donors in practical applications. A single reverse transcription reaction completes the integration of a large DNA fragment, which makes the method convenient for a wide range of applications.
  • a retrotransposase comprising a target DNA binding domain containing a zinc finger binding motif, a reverse transcriptase domain, and an endonuclease domain, capable of reverse transcribing RNA into DNA.
  • the retrotransposase according to item 1 comprising 1 to 3 zinc finger domains (ZF), 1 Myb-like domain, 1 reverse transcriptase domain (RT) and 1 restriction endonuclease-like nuclease domain (RLE); further, mutations, deletions and insertions may optionally occur in the N-terminus of the protein and in the non-conserved or non-structural amino acid sequences between the above four domains.
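The domain composition in item 2 can be written down as a small check. The following is only an illustrative sketch: the class name `DomainArchitecture` and the method `matches_item_2` are hypothetical names introduced here, and the domain counts are assumed to come from some upstream annotation step that is not part of the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainArchitecture:
    """Counts of annotated domains in a candidate protein (hypothetical container)."""
    zinc_fingers: int            # ZF domains
    myb_like: int                # Myb-like domains
    reverse_transcriptase: int   # RT domains
    rle_nuclease: int            # restriction endonuclease-like nuclease (RLE) domains

    def matches_item_2(self) -> bool:
        """True if the counts fit item 2: 1 to 3 ZF, 1 Myb-like, 1 RT and 1 RLE."""
        return (
            1 <= self.zinc_fingers <= 3
            and self.myb_like == 1
            and self.reverse_transcriptase == 1
            and self.rle_nuclease == 1
        )


# Example: two zinc fingers plus one of each remaining domain fits the description.
candidate = DomainArchitecture(zinc_fingers=2, myb_like=1,
                               reverse_transcriptase=1, rle_nuclease=1)
print(candidate.matches_item_2())
```

The check covers domain counts only; it says nothing about sequence identity to the listed SEQ ID numbers.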
  • the retrotransposase according to item 1 or 2 whose amino acid sequence is as shown in any one of SEQ ID No.1 to 6 or SEQ ID No.32 to 43 or SEQ ID No.68 to 71, or has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity with the amino acid sequence of any one of SEQ ID No.1 to 6 or SEQ ID No.32 to 43 or SEQ ID No.68 to 71.
  • a system for modifying DNA comprising:
  • a donor RNA or a nucleic acid encoding the donor RNA wherein the donor RNA comprises: a sequence that binds to the retrotransposase and a heterologous sequence,
  • the heterologous sequence is at least 1-50000 bases, for example, 1 nt or more, 10 nt or more, 50 nt or more, 60 nt or more, 70 nt or more, 80 nt or more, 90 nt or more, 100 nt or more, 150 nt or more, 200 nt or more, 250 nt or more, 300 nt or more, 350 nt or more, 400 nt or more, 450 nt or more, 500 nt or more, 550 nt or more, 600 nt or more, 650 nt or more, 700 nt or more, 750 nt or more, 800 nt or more, 850 nt or more, 900 nt or more, 950 nt or more, 1000 nt or more, 1100 nt or more, 1200 nt or more, 1300 nt or more, 1400 nt or more, 1500 nt or more, 1600 nt or more, 1700 nt or more, 1800 nt or more, 1900 nt or more, 2000 nt or more, 2100 nt or more, 2200 nt or more, 2300 nt or more, 2400 nt or more, 2500 nt or more, 2600 nt or more, 2700 nt or more, 2800 nt or more, 2900 nt or more, 3000 nt or more, 3500 nt or more, 4000 nt or more, 4500 nt or more, 5000 nt or more, 5500 nt or more, 6000 nt or more, 6500 nt or more, 7000 nt or more, 7500 nt or more, 8000 nt or more, 8500 nt or more, 9000 nt or more, 9500 nt or more, 10000 nt or more, 15000 nt or more, 20000 nt or more, 25000 nt or more, 30000 nt or more, 35000
  • heterologous sequence comprises one or more of the following: a sequence encoding a polypeptide or a non-coding RNA sequence, a sequence comprising a promoter or an enhancer, a sequence encoding one or more introns, and a transcription termination sequence;
  • the polypeptide is a therapeutic polypeptide or a mammalian polypeptide; further preferably, the polypeptide is a therapeutic protein, a membrane protein, an intracellular protein, an extracellular protein, a structural protein, a signal transduction protein, a regulatory protein, a transport protein, an organelle protein, a sensory protein, a motor protein, a defense protein, a storage protein, a reporter protein, an antibody, an enzyme, a coagulation factor, and further preferably, the number of amino acids in the polypeptide is 20 to 10000, for example, the number of amino acids is 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440,
  • the intracellular protein is selected from cytoplasmic protein, nuclear protein, organelle protein, mitochondrial protein or lysosomal protein,
  • sequence encoding the polypeptide contains one or more introns.
  • the donor RNA further comprises a homology domain, preferably the homology domain comprises a first homology domain and a second homology domain,
  • the first homology domain consists of 5 or more bases located at the 5' end of the donor RNA that have 100% identity with the target DNA strand
  • the second homology domain consists of 5 or more bases located at the 3' end of the donor RNA that have 100% identity with the target DNA strand
  • the target DNA is a genomic safe harbor (GSH) site or the target DNA is a genomic Natural Harbor™ site.
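The homology-domain requirement above (5 or more bases at each end of the donor RNA with 100% identity to the target DNA strand) can be illustrated with a short check. This is a minimal sketch under stated assumptions: identity is taken to mean an exact substring match after converting U to T, the comparison is made against the one strand supplied, and the function names and example sequences are hypothetical.

```python
def rna_to_dna(rna: str) -> str:
    """Write an RNA sequence in the DNA alphabet (U -> T) so it can be compared."""
    return rna.upper().replace("U", "T")


def longest_homology(donor_rna: str, target_dna: str, end: str) -> int:
    """Length of the longest stretch at the donor's 5' or 3' end found exactly in the target."""
    if end not in ("5'", "3'"):
        raise ValueError("end must be \"5'\" or \"3'\"")
    donor = rna_to_dna(donor_rna)
    target = target_dna.upper()
    best = 0
    for n in range(1, len(donor) + 1):
        arm = donor[:n] if end == "5'" else donor[-n:]
        if arm in target:
            best = n
        else:
            break  # a longer arm cannot match once a shorter one fails
    return best


def has_homology_domains(donor_rna: str, target_dna: str, min_len: int = 5) -> bool:
    """True if both ends of the donor carry at least min_len perfectly matching bases."""
    return (longest_homology(donor_rna, target_dna, "5'") >= min_len
            and longest_homology(donor_rna, target_dna, "3'") >= min_len)


# Made-up sequences: 7-base arms flanking a poly-U stand-in for the payload.
target = "GGATCCAAGCTTGGTACCGAGCTC"
donor = "GGAUCCA" + "UUUUUUUUUU" + "CGAGCUC"
print(longest_homology(donor, target, "5'"), longest_homology(donor, target, "3'"))
print(has_homology_domains(donor, target))
```

The sketch does not check where in the target the two arms match or their relative order, which a real design would also need to consider.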
  • nucleic acid encoding the retrotransposase according to any one of items 1 to 3 and the donor RNA or the nucleic acid encoding the donor RNA are separate nucleic acids, preferably the donor RNA does not encode the retrotransposase, and further preferably the donor RNA comprises one or more chemical modifications; or
  • nucleic acid encoding the retrotransposase described in any one of items 1 to 3 and the donor RNA or the nucleic acid encoding the donor RNA are covalently linked.
  • the nucleic acid encoding the retrotransposase described in any one of items 1 to 3 and the donor RNA or the nucleic acid encoding the donor RNA form a fusion nucleic acid.
  • the fusion nucleic acid comprises RNA or DNA.
  • a 5' untranslated sequence (5'UTR) to which the retrotransposase binds,
  • the promoter is located between the 5' untranslated sequence (5'UTR) to which the retrotransposase binds and the heterologous sequence, or preferably, the promoter is located between the 3' untranslated sequence (3'UTR) to which the retrotransposase binds and the heterologous sequence.
  • heterologous sequence comprises an open reading frame or its reverse complement sequence oriented in a 5' to 3' direction on the donor RNA; or the heterologous sequence comprises an open reading frame or its reverse complement sequence oriented in a 3' to 5' direction on the donor RNA.
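Placing an open reading frame "or its reverse complement" on the donor RNA comes down to a reverse-complement operation on the RNA sequence. A minimal sketch follows; the function name and the example sequence are illustrative and not taken from the application.

```python
# Translation table mapping each RNA base to its Watson-Crick partner.
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence, written 5' to 3'."""
    return seq.upper().translate(_RNA_COMPLEMENT)[::-1]


orf = "AUGGCC"                               # made-up fragment written 5' to 3'
antisense = reverse_complement_rna(orf)
print(antisense)                             # GGCCAU
print(reverse_complement_rna(antisense))     # AUGGCC, the original sequence
```

Applying the function twice returns the original sequence, which is a quick sanity check when preparing either orientation.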
  • the donor RNA further comprises a nuclear localization signal or the nucleic acid encoding the retrotransposase according to any one of items 1 to 3 comprises a nuclear localization signal and/or a nucleolar localization signal and/or a nuclear export signal.
  • the nucleic acid encoding the retrotransposase and the nucleic acid encoding the donor RNA are present in a ratio of 10:1 to 1:10, for example, 10:1, 9:1, 8:1, 7:1, 6:1, 5:1, 4:1, 3:1, 2:1, 1:1, 1:2, 1:3, 1:4, 1:5, 1:6, 1:7, 1:8, 1:9, or 1:10.
  • the donor RNA comprises a stem-loop sequence or helix 5' of the pseudoknot sequence, preferably comprises one or more (e.g. 2, 3 or more) stem-loop sequences or helices 3' of the pseudoknot sequence, such as 3' of the pseudoknot sequence and 5' of the heterologous sequence, and further preferably the donor RNA of the pseudoknot has catalytic activity, such as RNA cleavage activity, such as cis-RNA cleavage activity, or
  • the donor RNA comprises, e.g., at least one stem-loop sequence or helix, e.g., 1, 2, 3, 4, 5 or more stem-loop sequences, hairpin or helix sequences, e.g., 3' to the heterologous sequence.
  • the 5' untranslated sequence (5'UTR) in the donor RNA to which the retrotransposase binds has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, 100% identity with the nucleotide sequence of any one of SEQ ID No.7 to 12 or SEQ ID No.44 to 55;
  • the 3’ untranslated sequence (3’UTR) in the donor RNA that binds to the retrotransposase has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identity with the nucleotide sequence described in any one of SEQ ID No.13 to 18 or SEQ ID No.56 to 67.
  • 3' untranslated sequence (3'UTR) to which the retrotransposase binds
  • the first homology domain is 10 or more, 20 or more, 30 or more, 40 or more, 50 or more, 60 or more, 70 or more, 80 or more, 90 or more, or 100 or more bases located at the 5' end of the donor RNA and having 100% identity with the target DNA strand
  • the second homology domain is 10 or more, 20 or more, 30 or more, 40 or more, 50 or more, 60 or more, 70 or more, 80 or more, 90 or more, or 100 or more bases located at the 3' end of the donor RNA and having 100% identity with the target DNA strand;
  • the 5' untranslated sequence (5'UTR) in the donor RNA that binds to the retrotransposase has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, 100% identity with the nucleotide sequence of any one of SEQ ID No.7 to 12;
  • the 3’ untranslated sequence (3’UTR) in the donor RNA that binds to the retrotransposase has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identity with the nucleotide sequence described in any one of SEQ ID No. 13 to 18.
  • the 5' untranslated sequence (5'UTR) to which the retrotransposase binds is a non-natural 5' untranslated sequence (5'UTR); or
  • the 3' untranslated sequence (3'UTR) to which the retrotransposase binds is a non-natural 3' untranslated sequence (3'UTR);
  • non-native 5' untranslated sequences having additions, deletions and/or substitutions of nucleotides relative to the native 5'UTR sequences;
  • non-native 3' untranslated sequences having additions, deletions and/or substitutions of nucleotides relative to the native 3'UTR sequences;
  • non-native 5' untranslated sequence has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, 100% identity to the nucleotide sequence of SEQ ID No.19-21;
  • further preferably, the non-natural 3’ untranslated sequence (3’UTR) has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, 100% identity with the nucleotide sequence described in any one of SEQ ID No.22-23.
  • the heterologous sequence is inserted into the target site at a copy number of 1 insertion in about 1%-80% of the cells (e.g., about 1%-10%, 10%-20%, 20%-30%, 30%-40%, 40%-50%, 50%-60%, 60%-70%, or 70%-80% of the cells) in a population of cells contacted with the system, for example, as measured using colony isolation and ddPCR.
  • a non-natural 5' untranslated sequence having additions, deletions and/or substitutions of nucleotides relative to a natural 5'UTR sequence, preferably having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, 100% identity with the nucleotide sequence of SEQ ID No. 19-21.
  • a non-natural 3’ untranslated sequence (3’UTR) having additions, deletions and/or substitutions of nucleotides relative to a natural 3’UTR sequence, preferably having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, 100% identity with the nucleotide sequence of SEQ ID No. 22-23.
  • An engineered transposable element comprising, from 5' to 3':
  • 5' untranslated sequence (5'UTR), heterologous sequence and 3' untranslated sequence (3'UTR),
  • the 5' untranslated sequence comprises a nucleotide sequence selected from SEQ ID No. 19-21 having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, 100% identity;
  • further preferably, the non-natural 3’ untranslated sequence (3’UTR) has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, 100% identity with the nucleotide sequence described in any one of SEQ ID No.22-23.
  • a host cell comprising the system according to any one of items 4 to 18 or the element according to item 21 or 22, wherein the host cell is preferably a mammalian cell or a plant cell, and more preferably a human cell.
  • a method for modifying a target DNA strand in a cell, tissue or subject comprising applying the system of any one of items 4 to 18 to the cell, tissue or subject, wherein the system reverse transcribes the donor RNA sequence into the target DNA strand, thereby modifying the target DNA strand in the cell, tissue or subject.
  • the cells and tissues are mammalian cells and tissues, preferably human cells and tissues, and the subject is a mammal, preferably a human.
  • a method for modifying the genome of a mammalian cell or inserting DNA into the genome of a mammal comprising applying the system of any one of items 4 to 18 to the cell, preferably the mammal is a human.
  • the method comprises contacting a cell, a tissue or a subject with a retrotransposase as described in any one of items 1 to 3 or a nucleic acid encoding the retrotransposase as described in any one of items 1 to 3 and a donor RNA or a nucleic acid encoding the donor RNA,
  • the contacting comprises contacting the cell, tissue or subject with a plasmid, virus, virus-like particle, virosome, liposome, vesicle, exosome or lipid nanoparticle;
  • said contacting comprises the use of non-viral delivery, such as electroporation.
  • the contacting comprises intravenously administering to the subject, preferably at least twice, the retrotransposase shown in any one of Items 1 to 3 or a nucleic acid encoding the retrotransposase shown in any one of Items 1 to 3 and a donor RNA or a nucleic acid encoding the donor RNA.
  • the retrotransposase described in any one of Items 1 to 3 or the nucleic acid encoding the retrotransposase described in any one of Items 1 to 3 and the donor RNA or the nucleic acid encoding the donor RNA are administered separately; or
  • the retrotransposase described in any one of Items 1 to 3 or a nucleic acid encoding the retrotransposase described in any one of Items 1 to 3 is administered together with a donor RNA or a nucleic acid encoding the donor RNA.
  • a vector comprising the nucleic acid described in item 34.
  • a host cell comprising the vector of item 35.
  • a pharmaceutical composition comprising the system described in any one of items 4 to 18, or the nucleic acid described in item 34, or the vector described in item 35, or the host cell described in item 23 or 36.
  • the system is placed in a pharmaceutically acceptable carrier, and further preferably, the carrier is a vesicle (including liposomes, natural or synthetic lipid bilayers, exosomes), lipid nanoparticles, viruses or plasmid vectors.
  • the system and method constructed by the present application can realize gene writing at the DNA level by using only RNA donors, which is a technical innovation.
  • the system and method of the present application can meet but are not limited to:
  • Treatment needs, for example, by providing expression of a therapeutic transgene in an individual with a loss-of-function mutation, by replacing a gain-of-function mutation with a normal transgene, by providing a regulatory sequence to eliminate expression of a gain-of-function mutation, and/or by controlling the expression of operably linked genes, transgenes, and systems thereof.
  • the RNA sequence template encodes a promoter region specific to the host cell's therapeutic needs, such as a tissue-specific promoter or enhancer.
  • the promoter can be operably linked to a coding sequence.
  • plants can be given new economic traits (such as stress resistance, insect resistance, etc.).
  • FIG. 1 shows the structures of the vector expressing the retrotransposase protein and the vector expressing the donor RNA.
  • FIG. 2 shows the activity results of various retrotransposase systems in mammals using the system constructed in Example 2. The results showed that, compared with the negative control, the six novel retrotransposase systems (#3, #21, #23, #24, #31, #33) had significant activity.
  • FIG. 3 shows the GFP expression results after the #21 retrotransposase achieves GFP gene integration in mammalian cells; the proportion of GFP-positive cells was quantified by flow cytometry.
  • FIG. 4 shows the GFP expression results after the #21 retrotransposase achieves GFP gene integration in mammalian cells, and the GFP expression is observed by fluorescence microscopy.
  • FIG. 5 shows that #21 retrotransposase achieves GFP gene integration in mammalian cells, and explores the effect of donor RNA containing non-natural 5’UTR and/or non-natural 3’UTR on the efficiency of gene integration of #21 retrotransposase in mammalian cells.
  • FIG. 6 shows the positions of the PCR primers used in the examples.
  • FIG. 7 shows the PCR amplification results of the junction between the 3’ end of the integrated DNA sequence and the genome (inside the 28s rDNA gene) in mammalian cells using different retrotransposase systems.
  • FIG. 8 shows the PCR amplification results of sequences spanning both ends of the introns of the integration sequences in mammalian cells using different retrotransposase systems.
  • FIG. 9 shows the expression results of GFP after the GFP gene was integrated into mammalian cells by four types of #21 retrotransposases, and the expression level of GFP was observed by fluorescence microscopy.
  • nucleic acid refers to a polymeric form of nucleotides of any length, including deoxyribonucleotides, ribonucleotides, combinations thereof, and their analogs.
  • Oligonucleotide and “oligonucleotide” are used interchangeably and refer to short polynucleotides having no more than about 50 nucleotides.
  • complementarity refers to the ability of a nucleic acid to form hydrogen bonds with another nucleic acid via traditional Watson-Crick base pairing.
  • the complementarity percentage indicates the percentage of residues in a nucleic acid molecule that can form hydrogen bonds (i.e., Watson-Crick base pairing) with a second nucleic acid (e.g., 5, 6, 7, 8, 9, 10 out of 10, which are about 50%, 60%, 70%, 80%, 90%, 100% complementary, respectively).
  • “Completely complementary” means that all consecutive residues of a nucleic acid sequence form hydrogen bonds with the same number of consecutive residues in a second nucleic acid sequence.
  • substantially complementary means that the degree of complementarity is at least about any of 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99% or 100% over a region of about 40, 50, 60, 70, 80, 100, 150, 200, 250 or more nucleotides, or refers to two nucleic acids that hybridize under stringent conditions.
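The complementarity percentage defined above can be computed directly for two strands of equal length. The sketch below makes several assumptions that are not stated in the source: both strands are written 5' to 3', they pair antiparallel over their full length with no gaps, and only Watson-Crick pairs (A-T, A-U, G-C) are counted. The function name is hypothetical.

```python
# Watson-Crick pairs, covering both DNA (T) and RNA (U) alphabets.
WC_PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(first: str, second: str) -> float:
    """Percentage of residues in `first` that pair with `second` (antiparallel, ungapped)."""
    if not first or len(first) != len(second):
        raise ValueError("sequences must be non-empty and of equal length")
    # Reverse the second strand so position i of `first` faces its antiparallel partner.
    partners = reversed(second.upper())
    paired = sum((a, b) in WC_PAIRS for a, b in zip(first.upper(), partners))
    return 100.0 * paired / len(first)


print(percent_complementarity("AAAAACCCCC", "GGGGGTTTTT"))  # 100.0 (10 of 10 pair)
print(percent_complementarity("AAAAACCCCC", "GGGGGAAAAA"))  # 50.0 (5 of 10 pair)
```

The two printed values correspond to the "10 out of 10" and "5 out of 10" cases in the definition.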
  • Percentage (%) sequence identity for nucleic acid sequences is defined as the percentage of nucleotides in a candidate sequence that are identical to the nucleotides in a particular nucleic acid sequence after alignment of the sequences (if necessary) by allowing gaps to achieve the maximum percentage of sequence identity.
  • Percentage (%) sequence identity for peptide, polypeptide or protein sequences is the percentage of amino acid residues in a candidate sequence that are identical to the amino acid residues in a particular peptide or amino acid sequence after alignment of the sequences (if necessary) by allowing gaps to achieve the maximum percentage of sequence identity.
  • alignment can be achieved in various ways within the technical scope of the art, for example, using publicly available computer software such as mafft, muscle, Clustal, needle, BLAST, BLAST-2, ALIGN or MEGALIGN™ (DNASTAR) software, preferably using mafft, muscle, Clustal or needle.
  • those skilled in the art can determine suitable parameters for measuring alignment, including any algorithm required for achieving maximum alignment over the full length of the compared sequences.
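The percent identity definition above can be illustrated on two sequences that have already been aligned. This is a sketch under stated assumptions: the alignment itself is taken to come from one of the tools named above and is passed in as two equal-length strings with "-" marking gaps, and the denominator is the number of residues in the candidate sequence, which is one of several conventions in use. The function names are hypothetical.

```python
def percent_identity(candidate_aln: str, reference_aln: str, gap: str = "-") -> float:
    """Percentage of candidate residues identical to the aligned reference residue."""
    if len(candidate_aln) != len(reference_aln):
        raise ValueError("aligned sequences must have the same length")
    cand = candidate_aln.upper()
    ref = reference_aln.upper()
    # Denominator: residues of the candidate sequence (gap columns excluded).
    candidate_len = sum(c != gap for c in cand)
    if candidate_len == 0:
        raise ValueError("candidate sequence contains no residues")
    matches = sum(c == r and c != gap for c, r in zip(cand, ref))
    return 100.0 * matches / candidate_len


def meets_identity_threshold(pct: float, threshold: float = 70.0) -> bool:
    """Check an identity value against a cut-off such as the 70% lower bound in the items."""
    return pct >= threshold


pct = percent_identity("ACGTACGTAC", "ACGTACGTAA")
print(pct, meets_identity_threshold(pct))        # 90.0 True
print(percent_identity("ACGT--AC", "ACGTTTAC"))  # 100.0, gap columns are not counted
```

The same function works for aligned amino acid sequences, since it only compares characters column by column.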
  • polypeptide and “peptide” are used interchangeably herein and refer to polymers of amino acids of any length.
  • the polymer may be linear or branched, it may contain modified amino acids, and may be interrupted by non-amino acids.
  • a protein may have one or more polypeptides.
  • the term also encompasses amino acid polymers that have been modified; for example, disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation (such as conjugation with a labeling component).
  • variant is interpreted as a polynucleotide or polypeptide that is different from a reference polynucleotide or polypeptide, respectively, but retains the necessary properties.
  • a typical variant of a polynucleotide differs from the nucleic acid sequence of another reference polynucleotide. Changes in the variant nucleic acid sequence may or may not change the amino acid sequence of the polypeptide encoded by the reference polynucleotide. Nucleotide changes may result in amino acid substitutions, additions, deletions, fusions and truncations in the polypeptide encoded by the reference sequence, as described below.
  • a typical variant of a polypeptide differs from another reference polypeptide in amino acid sequence.
  • the differences are limited so that the sequences of the reference polypeptide and the variant are very similar overall and identical in many regions.
  • the amino acid sequences of the variant and the reference polypeptide may differ by any combination of one or more substitutions, additions, deletions.
  • the substituted or inserted amino acid residues may or may not be amino acid residues encoded by the genetic code.
  • Variants of polynucleotides or polypeptides may be naturally occurring (such as allelic variants), or may be variants that are not known to occur naturally.
  • Non-naturally occurring variants of nucleotides and polypeptides can be prepared by mutagenesis techniques, by direct synthesis, and by other recombinant methods known to those skilled in the art.
  • wild type has a meaning generally understood by those skilled in the art, and refers to an organism, strain, gene or characteristic in a typical form that distinguishes it from a mutant or variant when it exists in nature. It can be isolated from resources in nature and has not been intentionally modified.
  • As used herein, the terms “non-naturally occurring” or “engineered” are used interchangeably and refer to human involvement. When these terms are used to describe a nucleic acid molecule or polypeptide, it means that the nucleic acid molecule or polypeptide is at least substantially free of at least one other component with which it is naturally associated or naturally occurring.
  • Cell as used herein should be understood to refer not only to a particular individual cell, but also to the progeny or potential progeny of that cell. Because certain modifications may occur in progeny due to mutation or environmental influences, such progeny may not in fact be the same as the parent cell, but are still included within the scope of the term herein.
  • “Transduction” and “transfection” include methods known in the art for introducing DNA into cells using infectious agents (such as viruses) or other means to express a protein or molecule of interest.
  • in addition to virus-like agents, there are chemical-based transfection methods, such as transfection methods using calcium phosphate, dendrimers, liposomes, or cationic polymers (e.g., DEAE-dextran or polyethyleneimine); non-chemical methods, such as electroporation, cell squeezing, sonoporation, optical transfection, impalefection, protoplast fusion, plasmid delivery, or transposons; particle-based methods, such as using a gene gun, magnetofection or magnet-assisted transfection, or particle bombardment; and hybrid methods (such as nucleofection).
  • transfected refers to the process of transferring or introducing exogenous nucleic acid into a host cell.
  • a “transfected,” “transformed,” or “transduced” cell is a cell that has been transfected, transformed, or transduced with exogenous nucleic acid.
  • in vivo refers to inside the organism from which the cell was obtained.
  • “Ex vivo” or “in vitro” refers to outside the organism from which the cell was obtained.
  • treatment is a method for obtaining beneficial or desired results (including clinical results).
  • beneficial or desired clinical results include, but are not limited to, one or more of the following: alleviating one or more symptoms caused by the disease, alleviating the extent of the disease, stabilizing the disease (e.g., preventing or delaying the worsening of the disease), preventing or delaying the spread of the disease (e.g., metastasis), preventing or delaying the recurrence of the disease, reducing the recurrence rate of the disease, delaying or slowing the progression of the disease, improving the disease state, providing (partial or complete) remission of the disease, reducing the dose of one or more other drugs required to treat the disease, improving the quality of life, and/or prolonging survival.
  • Treatment also includes reducing the pathological consequences of a disorder, condition, or disease. The methods of the present invention contemplate any one or more of these aspects of treatment.
  • the term “effective amount” refers to an amount of a compound or composition sufficient to treat a particular disorder, condition or disease (such as improving, alleviating and/or delaying one or more symptoms thereof).
  • an “effective amount” can be administered in one or more doses, i.e., a single dose or multiple doses may be required to achieve a desired treatment endpoint.
  • “Subject,” “individual,” or “patient” are used interchangeably herein for purposes of treatment and refer to any animal classified as a mammal, including humans, livestock and farm animals, and zoo, farm, or pet animals such as dogs, horses, cats, cows, etc.
  • the individual is a human individual.
  • reference to “not” a value or parameter generally means and describes “other than” that value or parameter.
  • for example, stating that the method is not used to treat cancer of type X means that the method is used to treat cancers of types other than X.
  • the term “and/or” in phrases such as "A and/or B” is intended to include both A and B; A or B; A (alone); and B (alone).
  • the term “and/or” in phrases such as "A, B, and/or C” is intended to include each of the following embodiments: A, B, and C; A, B, or C; A or C; A or B; B or C; A and C; A and B; B and C; A (alone); B (alone); and C (alone).
  • the mutations described herein may include one or more of: insertion, deletion, substitution, and may be mutations of a single amino acid or multiple amino acids.
  • a “vector” is a composition of matter that contains an isolated nucleic acid and can be used to deliver the isolated nucleic acid to the interior of a cell.
  • Many vectors are known in the art, including but not limited to linear polynucleotides, polynucleotides associated with ions or amphiphilic compounds, plasmids, and viruses.
  • in general, a suitable vector comprises a replication origin, a promoter sequence, a convenient restriction endonuclease site and one or more selective markers that function in at least one organism.
  • the term “vector” should also be interpreted as including non-plasmid and non-viral compounds that facilitate nucleic acid transfer into cells, such as, for example, polylysine compounds, liposomes, etc.
  • the vector is a viral vector.
  • viral vectors include, but are not limited to, adenoviral vectors, adeno-associated viral vectors, lentiviral vectors, retroviral vectors, vaccinia vectors, herpes simplex virus vectors, and derivatives thereof.
  • the vector is a bacteriophage vector. Viral vector technology is well known in the art and is described, for example, in Sambrook et al. (2001, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, New York), and other virology and molecular biology manuals.
  • the rAAV construct can be administered to a subject enterally. In some embodiments, the rAAV construct can be administered to a subject parenterally. In some embodiments, the rAAV particles can be administered subcutaneously, intraocularly, intravitreally, subretinally, intravenously (IV), intraventricularly, intramuscularly, intrathecally (IT), intracisternal, intraperitoneally, via inhalation, topically, or by direct injection into one or more cells, tissues, or organs. In some embodiments, the rAAV particles can be administered to a subject by injection into the hepatic artery or portal vein.
  • Vectors can be transferred into host cells by physical, chemical or biological methods.
  • methods for introducing vectors into host cells include: calcium phosphate precipitation, lipofection, particle bombardment, microinjection, electroporation, etc.
  • Methods for producing cells containing vectors and/or exogenous nucleic acids are well known in the art. See, for example, Sambrook et al. (2001) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, New York.
  • the vector is introduced into the cell by electroporation.
  • the cell is a bacterium, a yeast cell, a fungal cell, an algae cell, a plant cell or an animal cell (e.g., a mammalian cell, such as a human cell).
  • the cell is a cell of natural origin, such as a cell isolated by a tissue biopsy.
  • the cell is a cell isolated from a cell line cultured in vitro.
  • the cell is from a primary cell line.
  • the cell is from an immortalized cell line.
  • the cell is a genetically engineered cell.
  • the nuclear localization signal herein is a domain of a protein, usually a short amino acid sequence, which can interact with a nuclear import carrier to enable the protein to be transported into the cell nucleus.
  • the nuclear localization signal may also be an RNA sequence.
  • the nuclear localization signal is located on the donor RNA.
  • the retrotransposase polypeptide is encoded on the first RNA, and the donor RNA is a second separate RNA, and the nuclear localization signal is located on the donor RNA instead of on the RNA encoding the retrotransposase polypeptide.
  • the RNA encoding the retrotransposase is mainly targeted to the cytoplasm to promote its translation, while the donor RNA is mainly targeted to the nucleus to promote its retrotransposition into the genome.
  • the nuclear localization signal is at the 3' end, 5' end or inside of the donor RNA. In some embodiments, the nuclear localization signal is at the 3' end of the heterologous sequence (e.g., directly at the 3' end of the heterologous sequence) or at the 5' end of the heterologous sequence (e.g., directly at the 5' end of the heterologous sequence).
  • the nuclear localization signal is placed outside the 5' UTR of the donor RNA or outside the 3' UTR. In some embodiments, the nuclear localization signal is placed between the 5'UTR and the 3'UTR, wherein optionally, the nuclear localization signal is not transcribed with the transgene (e.g., the nuclear localization signal is in an antisense orientation or downstream of a transcription termination signal or a polyadenylation signal). In some embodiments, the nuclear localization sequence is located within an intron. In some embodiments, a plurality of identical or different nuclear localization signals are in RNA, such as in a donor RNA.
  • the length of the nuclear localization signal is less than 5, 10, 25, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, or 1000 bp.
  • RNA nuclear localization sequences can be used.
  • domain refers to the structure of a biomolecule that contributes to a specific function of a biomolecule.
  • a domain can comprise a continuous region (e.g., a continuous sequence) or different non-continuous regions (e.g., a non-continuous sequence) of a biomolecule.
  • protein domains include, but are not limited to, endonuclease domains, target DNA binding domains, reverse transcription domains; examples of domains of nucleic acids are regulatory domains, such as transcription factor binding domains.
  • the present application relates to a target DNA binding domain containing a zinc finger binding motif, a reverse transcriptase domain, and an endonuclease domain.
  • the reverse transcriptase domain refers to a domain with reverse transcription function, and those skilled in the art can use conventional tools such as the Basic Local Alignment Search Tool (BLAST) to identify the reverse transcription domain based on homology with other known reverse transcription domains.
  • the reverse transcriptase domain is modified, for example, by site-specific mutation.
  • the reverse transcriptase domain is engineered to bind to a heterologous sequence.
  • the endonuclease domain refers to a domain with endonuclease function.
  • the endonuclease element is a heterologous endonuclease element, such as FokI nuclease, type II restriction endonuclease (RLE type nuclease) or another RLE type endonuclease (also referred to as REL).
  • the heterologous endonuclease activity has nickase activity and does not form double-strand breaks.
  • those skilled in the art can use conventional tools such as the Basic Local Alignment Search Tool (BLAST), or websites or software for domain prediction (e.g., the InterPro website, the HHpred website, the CDD website, PSI-BLAST, blastp or HH-suite), to identify endonuclease domains based on homology with other known endonuclease domains.
  • the target DNA binding domain is a target DNA binding domain containing a zinc finger binding motif, wherein the zinc finger binding motif is an amino acid sequence responsible for binding to a target DNA of a specific sequence.
  • exogenous when used with respect to a biomolecule (e.g., a nucleic acid sequence or a polypeptide), refers to the artificial introduction of the biomolecule into a host genome, cell, or organism.
  • a nucleic acid added to an existing genome, cell, tissue, or subject using recombinant DNA technology or other methods is exogenous to the existing nucleic acid sequence, cell, tissue, or subject.
  • the term “heterologous”, when used to describe a first element with reference to a second element, means that the first element and the second element do not exist in nature in the arrangement described.
  • a heterologous polypeptide, nucleic acid molecule, construct or sequence refers to (a) a polypeptide, nucleic acid molecule or a portion of a polypeptide or nucleic acid molecule sequence that is not natural for the cell expressing it, (b) a polypeptide or nucleic acid molecule or a portion of a polypeptide or nucleic acid molecule that has been altered or mutated relative to its natural state, or (c) a polypeptide or nucleic acid molecule with expression that is altered compared to the natural expression level under similar conditions.
  • a heterologous regulatory sequence (e.g., a promoter, an enhancer)
  • a heterologous domain of a polypeptide or nucleic acid sequence (e.g., a DNA binding domain of a polypeptide or a nucleic acid encoding a DNA binding domain of a polypeptide) can be arranged differently relative to other domains, or can be a different sequence or from a different source relative to other domains or portions of a polypeptide or its encoding nucleic acid.
  • the heterologous nucleic acid molecule may be present in the native host cell genome, but may have an altered expression level or a different sequence, or both.
  • the heterologous nucleic acid molecule may not be endogenous to the host cell or the host genome, but may have been introduced into the host cell by transformation (e.g., transfection, electroporation), wherein the added molecule may be integrated into the host genome or may exist transiently (e.g., mRNA) or semi-stably for more than one generation (e.g., episomal viral vectors, plasmids, or other self-replicating vectors) as extrachromosomal genetic material.
  • the term "gene expression unit” is a nucleic acid sequence comprising at least one regulatory nucleic acid sequence operably linked to at least one effector sequence.
  • a first nucleic acid sequence is operably linked to a second nucleic acid sequence when the first nucleic acid sequence is placed in a functional relationship with the second nucleic acid sequence.
  • the promoter or enhancer is operably linked to the coding sequence.
  • Operably linked DNA sequences can be continuous or non-continuous. In the case where it is necessary to connect two protein coding regions, the operably linked sequences can be in the same reading frame.
  • the term "host genome or host cell” refers to a cell and/or its genome into which proteins and/or genetic material have been introduced. It should be understood that these terms are intended to refer not only to specific subject cells and/or genomes, but also to the offspring of such cells and/or the genomes of the offspring of such cells. Because some modifications may occur in offspring due to mutations or environmental influences, such offspring may actually be different from parental cells, but are still included in the scope of the term "host cell” used herein.
  • the host genome or host cell can be an isolated cell or cell line grown in culture, or genomic material isolated from such a cell or cell line, or can be a host cell or host genome that constitutes a living tissue or organism.
  • the host cell can be an animal cell or a plant cell, for example, as described herein.
  • the host cell can be a cattle cell, a horse cell, a pig cell, a goat cell, a sheep cell, a chicken cell or a turkey cell.
  • the host cell can be a corn cell, a soybean cell, a wheat cell or a rice cell.
  • genomic safe harbor sites are sites in the host genome that can accommodate the integration of new genetic material, for example, so that the inserted genetic elements do not cause significant changes in the host genome that pose a risk to the host cell or organism.
  • GSH sites typically meet 1, 2, 3, 4, 5, 6, 7, 8 or 9 of the following conditions: (i) >300 kb from cancer-related genes; (ii) >300 kb from miRNA/other functional small RNAs; (iii) >50 kb from the 5' gene end; (iv) >50 kb from the replication origin; (v) >50 kb from any extremely conserved element; (vi) low transcriptional activity (i.e., no mRNA +/- 25 kb); (vii) not in a copy number variable region; (viii) in open chromatin; and/or (ix) is unique, with 1 copy in the human genome.
  • GSH sites in the human genome that meet some or all of these criteria include: (i) adeno-associated virus site 1 (AAVS1), which is the natural site of integration of the AAV virus on chromosome 19; (ii) the chemokine (CC motif) receptor 5 (CCR5) gene, a chemokine receptor gene known as the HIV-1 co-receptor; (iii) the human ortholog of the mouse Rosa26 locus; (iv) the rDNA locus. Additional GSH sites are known and described, for example, in Pellenz et al., electronic publication on August 20, 2018 (https://doi.org/10.1101/396390).
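  • The nine criteria listed above can be turned into a simple count, as in the following minimal Python sketch. The class name `SiteAnnotation`, its field names and the example values are hypothetical and are not part of the application; the thresholds are those stated in criteria (i) to (ix).

```python
from dataclasses import dataclass


@dataclass
class SiteAnnotation:
    """Hypothetical annotation of a candidate integration site (distances in kb)."""
    kb_to_cancer_gene: float
    kb_to_small_rna: float
    kb_to_gene_5prime_end: float
    kb_to_replication_origin: float
    kb_to_extremely_conserved_element: float
    mrna_within_25kb: bool
    in_copy_number_variable_region: bool
    in_open_chromatin: bool
    copies_in_genome: int


def gsh_criteria_met(site: SiteAnnotation) -> int:
    """Count how many of criteria (i) to (ix) the site satisfies."""
    checks = [
        site.kb_to_cancer_gene > 300,                  # (i)
        site.kb_to_small_rna > 300,                    # (ii)
        site.kb_to_gene_5prime_end > 50,               # (iii)
        site.kb_to_replication_origin > 50,            # (iv)
        site.kb_to_extremely_conserved_element > 50,   # (v)
        not site.mrna_within_25kb,                     # (vi)
        not site.in_copy_number_variable_region,       # (vii)
        site.in_open_chromatin,                        # (viii)
        site.copies_in_genome == 1,                    # (ix)
    ]
    return sum(checks)


# Made-up example values, for illustration only.
example_site = SiteAnnotation(450, 320, 80, 60, 75, False, False, True, 1)
print(gsh_criteria_met(example_site))
```

  • Returning a count rather than a yes/no answer mirrors the wording that a GSH site typically meets 1 to 9 of the conditions.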
  • the genomic safe harbor site is a Natural Harbor™ site.
  • the Natural Harbor™ site is a ribosomal DNA (rDNA).
  • the Natural Harbor™ site is a 5S rDNA, 18S rDNA, 5.8S rDNA, or 28S rDNA.
  • the Natural Harbor™ site is a Mutsu site in 5S rDNA.
  • the Natural Harbor™ site is an R2 site in 28S rDNA.
  • a “pseudoknot sequence” refers to a nucleic acid (e.g., RNA) having a sequence with appropriate self-complementarity to form a pseudoknot structure.
  • a “stem-loop sequence” refers to a nucleic acid sequence (e.g., an RNA sequence) having sufficient self-complementarity to form a stem-loop, e.g., having a stem comprising at least two (e.g., 3, 4, 5, 6, 7, 8, 9, or 10) base pairs, and having a loop of at least three (e.g., four) nucleotides.
  • the stem may contain mismatches or bulges.
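  • As a rough illustration of the stem-loop definition, the Python sketch below searches an RNA string for a perfect hairpin with a minimum stem length of two base pairs and a minimum loop length of three. It is a simplification under stated assumptions: only Watson-Crick pairs are accepted, and the mismatches and bulges that a stem may contain are not modelled. The function name and the example sequences are hypothetical.

```python
# Watson-Crick pairs only; G-U wobble pairs are deliberately left out of this sketch.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def find_stem_loop(rna: str, min_stem: int = 2, min_loop: int = 3):
    """Return (start, stem_length, loop_length) of the first perfect hairpin, or None."""
    rna = rna.upper()
    n = len(rna)
    for start in range(n):
        # The hairpin needs two stem arms plus the loop to fit after `start`.
        max_stem = (n - start - min_loop) // 2
        for stem in range(min_stem, max_stem + 1):
            for loop in range(min_loop, n - start - 2 * stem + 1):
                left = rna[start:start + stem]
                right_begin = start + stem + loop
                right = rna[right_begin:right_begin + stem]
                # The stem is antiparallel: first base of the left arm pairs
                # with the last base of the right arm.
                if all((a, b) in PAIRS for a, b in zip(left, reversed(right))):
                    return (start, stem, loop)
    return None


print(find_stem_loop("GCAAAGC"))  # 2-bp stem (GC/GC) around a 3-nt loop (AAA)
print(find_stem_loop("GCAAGC"))   # loop would be only 2 nt long
```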
  • the cell is an animal cell of an organism selected from the group consisting of cow, sheep, goat, horse, pig, deer, chicken, duck, goose, rabbit, and fish.
  • the cell is a mammalian cell. In some embodiments, the cell is a human cell. In some embodiments, the human cell is a human embryonic kidney 293T (HEK293T or 293T) cell or a HeLa cell. In some embodiments, the cell is a human embryonic kidney (HEK293T) cell. In some embodiments, the cell is a mouse Hepa1-6 cell. In some embodiments, the mammalian cell is selected from the group consisting of an immune cell, a hepatocyte, a tumor cell, a stem cell, a blood cell, a neural cell, a zygote, a muscle cell (such as a cardiomyocyte) and a skin cell.
  • the cell is an immune cell selected from the group consisting of cytotoxic T cells, helper T cells, natural killer (NK) T cells, iNK-T cells, NK-T-like cells, γδ T cells, tumor-infiltrating T cells, and dendritic cell (DC)-activated T cells.
  • the method produces modified immune cells, such as CAR-T cells or TCR-T cells.
  • the cell is an embryonic stem (ES) cell, an induced pluripotent stem (iPS) cell, a gamete progenitor cell, a gamete, a zygote, or a cell in an embryo.
  • the retrotransposase of the present application comprises a target DNA binding domain containing a zinc finger binding motif, a reverse transcriptase domain, and an endonuclease domain, and can reverse transcribe RNA into DNA.
  • the amino acid sequence of the retrotransposase of the present application is as shown in any one of SEQ ID No.1-6 or SEQ ID No.32-43 or SEQ ID No.68-71, or has at least 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identity with the amino acid sequence of any one of SEQ ID No.1-6 or SEQ ID No.32-43 or SEQ ID No.68-71.
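  • To make the identity thresholds concrete, the following Python sketch computes percent identity over a pairwise alignment that has already been produced by some alignment tool. The formula used here (identical columns divided by alignment length, with '-' as the gap character) is an assumption for illustration; the application does not fix a particular formula. The function names and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over two equal-length aligned strings ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column counts as identical only if both residues match and neither is a gap.
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 70.0) -> bool:
    """True if the alignment reaches the given percent identity (default 70%)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy alignment: 6 identical columns out of 8.
print(percent_identity("MKTA-IAK", "MKTAYIGK"))
print(meets_identity_threshold("MKTA-IAK", "MKTAYIGK", 70.0))
```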
  • the amino acid sequence of the retrotransposase of the present application is a conservative mutant of the amino acid sequence shown in any one of SEQ ID No. 1 to 6 or SEQ ID No. 32 to 43 or SEQ ID No. 68 to 71.
  • the retrotransposase of the present application comprises 1 to 3 zinc finger domains (ZF), 1 Myb-like domain, 1 reverse transcriptase domain (RT) and 1 restriction endonuclease-like nuclease domain (RLE).
  • the N-terminus of the protein and the amino acid sequences of different lengths between the domains are neither obviously structured nor conserved. Therefore, the conservative mutants of the amino acid sequences shown in any one of SEQ ID No. 1 to 6 or SEQ ID No. 32 to 43 or SEQ ID No. 68 to 71 of the present application include truncated sequences obtained by removing the N-terminus of the protein and the non-conserved, unstructured amino acid sequences between the four kinds of domains, namely 1 to 3 zinc finger domains (ZF), 1 Myb-like domain, 1 reverse transcriptase domain (RT) and 1 restriction endonuclease-like nuclease domain (RLE), provided that they still have retrotransposase activity. They also include conservative mutants that are obtained by mutation, deletion, insertion, etc. and still have retrotransposase activity.
  • the structural property of a region in a protein refers to the region having an obvious rigid structure (α-helix or β-sheet) in the resolved three-dimensional structure of the protein (which can be obtained from databases such as PDB) or in the three-dimensional structure predicted by protein three-dimensional structure prediction software (e.g., AlphaFold).
  • it may be a truncated form of protein #21, for example one lacking amino acids 401-467, which still has the retrotransposase activity required by the present application.
  • it may be a truncated form of protein #21, for example one lacking amino acids 401-467 and further having 32 linker amino acids added after amino acid 400, which still has the retrotransposase activity required by the present application.
  • it may be a truncated form of protein #21, for example one lacking amino acids 1-100, which still has the retrotransposase activity required by the present application.
  • it may be a truncated form of protein #21, for example one lacking amino acids 1-200, which still has the retrotransposase activity required by the present application.
  • it may be a truncated form of protein #21, for example one lacking amino acids 1-200 and amino acids 401-467, which still has the retrotransposase activity required by the present application.
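  • The truncations above are given in 1-based, inclusive residue numbering, which is easy to get wrong when slicing sequences. The Python sketch below shows one way to apply such deletions. The sequence of protein #21 and the 32-residue linker are not reproduced here, so `PLACEHOLDER_PROTEIN` and `HYPOTHETICAL_LINKER` are made-up stand-ins and the function name is hypothetical.

```python
def delete_residues(protein: str, first: int, last: int, linker: str = "") -> str:
    """Remove residues first..last (1-based, inclusive); optionally put a linker in their place."""
    if not (1 <= first <= last <= len(protein)):
        raise ValueError("residue range is outside the protein")
    return protein[:first - 1] + linker + protein[last:]


# Stand-ins only: NOT the real protein #21 or the real linker.
PLACEHOLDER_PROTEIN = "M" + "A" * 499
HYPOTHETICAL_LINKER = "GGGS" * 8  # 32 residues

# Deletion of amino acids 401-467.
internal_deletion = delete_residues(PLACEHOLDER_PROTEIN, 401, 467)
# Same deletion with 32 linker residues inserted after amino acid 400.
with_linker = delete_residues(PLACEHOLDER_PROTEIN, 401, 467, HYPOTHETICAL_LINKER)
# Combined deletion: remove the later range first so the original numbering stays valid.
double_deletion = delete_residues(delete_residues(PLACEHOLDER_PROTEIN, 401, 467), 1, 200)

print(len(internal_deletion), len(with_linker), len(double_deletion))
```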
  • the present application relates to a system for modifying DNA, the system comprising: the retrotransposase of the present application or a nucleic acid encoding the retrotransposase of the present application; and a donor RNA or a nucleic acid encoding the donor RNA, the donor RNA comprising: a sequence that binds to the retrotransposase and a heterologous sequence, preferably the heterologous sequence is 1 to 50,000 bases long, for example, more than 1nt, more than 10nt, more than 50nt, more than 60nt, more than 70nt, more than 80nt, more than 90nt, 100nt or more, 150nt or more, 200nt or more, 250nt or more, 300nt or more, 350nt or more, 400nt or more, 450nt or more, 500nt or more, 550nt or more, 600nt or more, 650nt or more, 700nt or more, 750nt or more, 800nt or more, 850nt or more, 900nt or more, 950nt or more, 1000nt or more, 1100nt or more, 1200nt or more, 1300nt or more, 1400nt or more, 1500nt or more, 1600nt or more, 1700nt or more, 1800nt or more, 1900nt or more, 2000nt or more, 2100nt or more, 2200nt or more, 2300nt or more, 2400nt or more, 2500nt or more, 2600nt or more, 2700nt or more, 2800nt or more, 2900nt or more, 3000nt or more
  • the retrotransposase and the donor RNA are encoded separately on two plasmids, and the retrotransposase gene and the donor RNA are each expressed from a CAG promoter.
  • the expression cassette of the donor RNA also contains an embedded CMV promoter. In the expression cassette driven by the CMV promoter, CMV drives GFP, but the GFP coding sequence is interrupted by an intron sequence inserted in the opposite orientation.
  • the mode of action of the system of the present application is as follows: after the donor RNA is transcribed from the CAG promoter, the intron sequence contained in the donor RNA is spliced out of the transcript, so that the GFP sequence in the CMV-driven expression cassette is restored to an uninterrupted form at the RNA level.
  • at this stage the GFP sequence is an antisense RNA strand and cannot be translated into GFP protein.
  • after the donor RNA has been reverse transcribed and inserted into the genome, the reverse-oriented expression cassette driven by the CMV promoter can express normal, intron-free GFP mRNA, which is then translated into green-fluorescent GFP protein.
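  • The reporter logic described above can be traced with toy sequences, as in the Python sketch below. It is an illustration only: the sequences are made-up placeholders (not real GFP or intron sequences), the DNA alphabet is used for both DNA and RNA, splicing is modelled as removal of one exact copy of the intron in sense orientation, and integration is assumed to follow reverse transcription of the spliced donor RNA.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA-alphabet sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def splice(transcript: str, intron: str) -> str:
    """Toy splicing: remove one copy of the intron if it is present in sense orientation."""
    return transcript.replace(intron, "", 1)


# Made-up placeholder sequences.
GFP_5P = "ATGGTGAGCAAG"
GFP_3P = "GGCGAGGAGTAA"
INTRON = "GTAAGTCTCTCTCAG"
INTACT_GFP = GFP_5P + GFP_3P

# Plasmid, read in the CMV direction: GFP interrupted by the intron in reverse orientation.
plasmid_cassette = GFP_5P + reverse_complement(INTRON) + GFP_3P

# A CMV-driven transcript from the plasmid cannot lose the intron (wrong orientation).
cmv_transcript_from_plasmid = splice(plasmid_cassette, INTRON)

# The CAG-driven donor RNA reads the opposite strand, so the intron is in sense orientation.
donor_rna = reverse_complement(plasmid_cassette)
spliced_donor_rna = splice(donor_rna, INTRON)  # GFP is now contiguous but antisense

# After reverse transcription and integration, the CMV-direction strand carries intact GFP.
integrated_cassette = reverse_complement(spliced_donor_rna)

print(cmv_transcript_from_plasmid == INTACT_GFP)  # interrupted on the plasmid
print(spliced_donor_rna == INTACT_GFP)            # antisense, not translatable as GFP
print(integrated_cassette == INTACT_GFP)          # intact only after integration
```

  • In this model, GFP fluorescence reports on the whole sequence of events (splicing, reverse transcription and integration) rather than on expression from the plasmid alone.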
  • the heterologous sequence of the present application is selected from one or more of the following: a sequence encoding a polypeptide, a sequence containing a tissue-specific promoter or enhancer, and a sequence encoding one or more introns.
  • the polypeptide is a therapeutic polypeptide or a mammalian polypeptide; further preferably, the polypeptide is a therapeutic protein, a membrane protein, an intracellular protein, an extracellular protein, a structural protein, a signal transduction protein, a regulatory protein, a transport protein, an organelle protein, a sensory protein, a motor protein, a defense protein, a storage protein, a reporter protein, an antibody, an enzyme, or a coagulation factor.
  • the number of amino acids in the polypeptide is 20 to 10,000, for example, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 610, 620, 630, 640, 650, 660, 670, 680, 690, 700, 710, 720, 730, 740, 750, 760, 770, 780, 790, 800
  • the intracellular protein is selected from cytoplasmic proteins, nuclear proteins, organelle proteins, mitochondrial proteins or lysosomal proteins.
  • one or more introns are included in the sequence encoding the polypeptide.
  • the systems described herein can be used in vitro or in vivo.
  • the system or system components are delivered to cells (e.g., mammalian cells, such as human cells), for example, in vitro or in vivo.
  • the cells are eukaryotic cells, such as cells of multicellular organisms, such as animals, such as mammals (e.g., humans, pigs, cattle), birds (e.g., poultry, such as chickens, turkeys, or ducks), or fish.
  • the cells are non-human animal cells (e.g., experimental animals, livestock, or companion animals).
  • the cells are stem cells (e.g., hematopoietic stem cells), fibroblasts, or T cells.
  • the cells are non-dividing cells, such as non-dividing fibroblasts or non-dividing T cells. In some embodiments, the cells are plant cells.
  • the components of the Gene Writer system can be delivered in the form of polypeptides, nucleic acids (e.g., DNA, RNA), and combinations thereof.
  • delivery can use any combination of the following to deliver the retrotransposase (e.g., as DNA encoding the retrotransposase protein, as RNA encoding the retrotransposase protein, or as the protein itself) and the donor RNA (e.g., as DNA encoding RNA, or as RNA):
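  • The delivery forms named above (retrotransposase as DNA, RNA or protein; donor RNA as DNA or RNA) can be enumerated mechanically. The short Python sketch below lists the resulting pairings; the string labels and the function name are illustrative only.

```python
from itertools import product

# Forms named in the text for each component.
RETROTRANSPOSASE_FORMS = (
    "DNA encoding the retrotransposase",
    "RNA encoding the retrotransposase",
    "retrotransposase protein",
)
DONOR_FORMS = (
    "DNA encoding the donor RNA",
    "donor RNA",
)


def delivery_combinations():
    """Return every pairing of a retrotransposase form with a donor RNA form."""
    return list(product(RETROTRANSPOSASE_FORMS, DONOR_FORMS))


for enzyme_form, donor_form in delivery_combinations():
    print(f"{enzyme_form} + {donor_form}")
```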
  • a virus is used to deliver DNA encoding a retrotransposase protein.
  • in some embodiments, a virus is used to deliver the donor RNA (or DNA encoding the donor RNA).
  • the system and/or components of the system are delivered in the form of nucleic acids.
  • the retrotransposase polypeptide can be delivered in the form of DNA or RNA encoding the polypeptide, and the donor RNA can be delivered in the form of RNA or its complementary DNA to be transcribed into RNA.
  • the system or components of the system are delivered on 1, 2, 3, 4 or more different nucleic acid molecules.
  • the system or components of the system are delivered as a combination of DNA and RNA.
  • the system or components of the system are delivered as a combination of DNA and protein.
  • the system or components of the system are delivered as a combination of RNA and protein.
  • the retrotransposase polypeptide is delivered as a protein.
  • a system or a component of a system is delivered to a cell, such as a mammalian cell or a human cell, using a vector.
  • the vector can be, for example, a plasmid or a virus.
  • delivery is in vivo, in vitro, ex vivo or in situ.
  • the virus is an adeno-associated virus (AAV), a lentivirus, or an adenovirus.
  • a system or a component of a system is delivered to a cell together with a virus-like particle or a virion. In some embodiments, delivery uses more than one virus, virus-like particle or virion.
  • compositions and systems described herein can be formulated in liposomes or other similar vesicles.
  • Liposomes are spherical vesicle structures composed of a unilamellar or multilamellar lipid bilayer surrounding an internal aqueous compartment and a relatively impermeable outer lipophilic phospholipid bilayer. Liposomes can be anionic, neutral or cationic. Liposomes are biocompatible and nontoxic, can deliver hydrophilic and lipophilic drug molecules, protect their cargo from degradation by plasma enzymes, and transport their loads across biological membranes and the blood-brain barrier (BBB).
  • Vesicles can be made of several different types of lipids; however, phospholipids are most commonly used to produce liposomes as drug carriers.
  • Methods for preparing multilamellar vesicle lipids are known in the art (see, e.g., U.S. Patent No. 6,693,086, which is incorporated herein by reference for its teachings on the preparation of multilamellar vesicle lipids).
  • Extruded lipids can be prepared by extrusion through a filter with a reduced size.
  • Lipid nanoparticles are another example of carriers that provide a biocompatible and biodegradable delivery system for the pharmaceutical compositions described herein.
  • Nanostructured lipid carriers are modified solid lipid nanoparticles (SLNs) that retain the properties of SLNs, improve drug stability and drug loading, and prevent drug leakage.
  • Polymer nanoparticles (PNPs) are an important component of drug delivery. These nanoparticles can effectively guide drug delivery to specific targets and improve drug stability and controlled drug release.
  • Lipid-polymer nanoparticles (PLNs), a new type of carrier that combines liposomes and polymers, can also be used. These nanoparticles have the complementary advantages of polymer nanoparticles (PNPs) and liposomes.
  • PLN is composed of a core-shell structure; the polymer core provides a stable structure, and the phospholipid shell provides good biocompatibility. In this way, these two components improve the drug encapsulation efficiency, promote surface modification, and prevent the leakage of water-soluble drugs.
  • the 5' untranslated sequence (5'UTR) in the donor RNA that binds to the retrotransposase is a natural sequence or a non-natural sequence.
  • the non-natural 5' untranslated sequence (5'UTR) has a sequence obtained by adding, deleting and/or replacing nucleotides relative to the natural 5'UTR sequence.
  • the 3' untranslated sequence (3'UTR) in the donor RNA that binds to the retrotransposase is a natural sequence or a non-natural sequence.
  • the non-natural 3' untranslated sequence (3'UTR) has a sequence obtained by adding, deleting and/or replacing nucleotides relative to the natural 3'UTR sequence.
  • the 5' untranslated sequence (5'UTR) in the donor RNA to which the retrotransposase binds has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identity with the nucleotide sequence of any one of SEQ ID No. 7 to 12 or SEQ ID No. 44 to 55.
  • the 3' untranslated sequence (3'UTR) in the donor RNA to which the retrotransposase binds has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identity with the nucleotide sequence of any one of SEQ ID No. 13 to 18 or SEQ ID No. 56 to 67.
  • the non-natural 5' untranslated sequence has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identity with the nucleotide sequence of any one of SEQ ID No.19-21.
  • the non-natural 3' untranslated sequence has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identity with the nucleotide sequence of any one of SEQ ID No.22-23.
  • the present application also relates to the use of the above-mentioned non-native 3' untranslated sequence (3'UTR) and non-native 5' untranslated sequence (5'UTR).
  • the protein sequence of the retrotransposase of the present application is codon-optimized (human) and synthesized, and the DNA coding fragment is loaded between the XmaI and NheI restriction sites of the pCAG-SV40poly(A) vector by Gibson cloning.
  • for the plasmid expressing the donor RNA, the GFP(N)-intron-GFP(C) sequence (Seq ID No.24), the 5'-UTR (as described in any one of Seq ID No.7 to 12), the 3'-UTR (as described in any one of Seq ID No.13 to 18), and the CMV promoter (Seq ID No.25) are amplified and joined by overlapping PCR, and the resulting DNA fragment is loaded into the pSV40-mCherry vector by Gibson cloning to construct a plasmid expressing the donor RNA.
  • HEK293T cells (sourced from ATCC) were cultured in DMEM (Gibco) containing 1% penicillin-streptomycin (Gibco) and 10% fetal bovine serum (Gibco). The cells were seeded in 24-well cell culture plates (Corning) and grown for 16 hours until the cell density reached 70%-90%.
  • Using Lipofectamine 3000 (Invitrogen), 250 ng of a plasmid encoding a retrotransposase protein as described in any one of Seq ID No. 1 to 6 and 250 ng of a plasmid expressing a donor RNA were transfected into each well of the 24-well plate.
  • This embodiment designs a reporter system that can accurately reflect whether the retrotransposase system of the present application can work in mammalian cells ( Figure 1).
  • the retrotransposase protein and the donor RNA are encoded by two separate plasmids, reflecting the modularity of the novel retrotransposase system.
  • the donor RNA is expressed from the CAG promoter. Notably, the expression cassette driven by the CAG promoter also contains, in the reverse orientation, an expression cassette driven by the CMV promoter. In the CMV-driven cassette, CMV drives GFP, but the GFP sequence is interrupted by an intron sequence inserted in the reverse direction.
  • the mode of action of the system of the present application is as follows: after the donor RNA is expressed from the CAG promoter, the intron sequence contained in the donor RNA is spliced out of the expressed RNA, and the GFP sequence in the CMV-driven cassette is restored to an uninterrupted sequence at the RNA level.
  • at this point the GFP sequence is an antisense RNA strand and cannot be translated into GFP protein.
  • once the donor RNA has been reverse-transcribed and integrated into the genome by the retrotransposase, the reverse expression cassette driven by the CMV promoter can express normal, intron-free GFP mRNA and then green-fluorescent GFP protein. Therefore, by detecting the presence and proportion of GFP-positive cells by fluorescence microscopy and flow cytometry, it can be judged whether the new retrotransposase functions in mammalian cells and how high its activity is.
  • Example 2 Novel retrotransposases are able to function in mammalian cells
  • the donor RNA of the novel retrotransposase system generally includes 5 parts, homology arm-left (Seq ID No. 26), 5'UTR sequence, 3'UTR sequence, a sequence between the 5'UTR sequence and the 3'UTR sequence carrying new gene information, and homology arm-right (Seq ID No. 27) ( Figure 1).
  • the protein polypeptide sequence of the novel retrotransposase tested in this example was codon-optimized (human) and synthesized, and the DNA coding fragment was loaded between the XmaI and NheI restriction sites of the pCAG-SV40poly(A) vector by Gibson cloning.
  • for the plasmid expressing the donor RNA, the GFP sequence, intron sequence, 5'-UTR, 3'-UTR sequence, and CMV promoter were amplified and joined by overlapping PCR, and the resulting DNA fragment was loaded into the pSV40-mCherry vector by Gibson cloning to construct a plasmid expressing the donor RNA.
  • HEK293T cells were cultured in DMEM (Gibco) containing 1% penicillin-streptomycin (Gibco) and 10% fetal bovine serum (Gibco). Cells were seeded in 24-well cell culture plates (Corning) and grown for 16 hours until the cell density reached 70%-90%.
  • 250 ng of plasmid encoding retrotransposase protein and 250 ng of plasmid expressing donor RNA were transfected into each well of the 24-well plate. After 24 hours of transfection, cells were digested with trypsin-EDTA (0.05%) (Gibco).
  • using the reporter system constructed in Example 1, we tested 33 different retrotransposase systems.
  • the sequences of the retrotransposases of some systems and the corresponding 5'UTR sequences and 3'UTR sequences are shown in Table 1 below.
  • 6 new systems had GFP signals that were significantly higher than the negative control ( Figure 2).
  • the protein sequences, 5'UTR and 3'UTR sequences of these 6 new retrotransposase systems (#3, #21, #23, #24, #31, #33) are shown in the sequence table.
  • Example 3 Use of donor RNAs containing a non-native 5'UTR and/or a non-native 3'UTR boosts the efficiency of the new retrotransposase
  • this example tests the effects of multiple donor RNAs containing non-natural 5'UTR and/or non-natural 3'UTR on the efficiency of the novel retrotransposase (#21).
  • the donor RNA of the novel retrotransposase system generally contains 5 parts: homology arm-left (Seq ID No.26), 5'UTR sequence (Seq ID No.8, Seq ID No.19-21), 3'UTR sequence (Seq ID No.14, Seq ID No.22-23), a sequence between the 5'UTR sequence and the 3'UTR sequence carrying new gene information, and homology arm-right (Seq ID No.27) (Figure 1).
  • the protein polypeptide sequence of the novel retrotransposase tested in this example was codon optimized (human) and synthesized, and the DNA coding fragment was loaded between the XmaI and NheI restriction sites of the pCAG-SV40poly(A) vector by Gibson cloning.
  • for the plasmid expressing the donor RNA, the GFP(N)-intron-GFP(C) sequence, non-natural 5'-UTR, non-natural 3'-UTR sequence, and CMV promoter were amplified and joined by overlapping PCR, and the resulting DNA fragment was loaded into the pSV40-mCherry vector by Gibson cloning to construct a plasmid expressing donor RNA.
  • HEK293T cells were cultured in DMEM (Gibco) containing 1% penicillin-streptomycin (Gibco) and 10% fetal bovine serum (Gibco). The cells were seeded in 24-well cell culture plates (Corning) and grown for 16 hours until the cell density reached 70%-90%.
  • 250 ng of the plasmid encoding the retrotransposase protein and 250 ng of the plasmid expressing the donor RNA were transfected into each well of the 24-well plate. After 24 hours of transfection, the cells were digested with trypsin-EDTA (0.05%) (Gibco).
  • the cells with mCherry signals were then sorted using a MoFlo XDP (Beckman Coulter) instrument and re-seeded into a 12-well plate. After a further 6 days of culture, the cells were digested with trypsin-EDTA (0.05%) (Gibco), and the proportion of GFP-positive cells was analyzed using a BD FACSAria™ Fusion Cell Sorter (BD). This proportion was compared with that of the negative control and combined with the results observed under a fluorescence microscope to confirm whether the new retrotransposase system can function in mammalian cells.
  • Example 4 The novel retrotransposase can amplify the corresponding DNA writing fragment
  • FIG. 7 shows the PCR amplification results of different retrotransposases at the junction of the 3' end of the integrated DNA sequence and the genome (inside the 28s rDNA gene) in mammalian cells.
  • the donor RNA of the novel retrotransposase system generally comprises five parts, homology arm-left (Seq ID No. 26), 5'UTR sequence, 3'UTR sequence, a sequence between the 5'UTR sequence and the 3'UTR sequence carrying new gene information, and homology arm-right (Seq ID No. 27) ( Figure 1).
  • the protein polypeptide sequence of the novel retrotransposase tested in this example was codon optimized (human) and synthesized, and the DNA coding fragment was loaded between the XmaI and NheI restriction sites of the pCAG-SV40poly(A) vector by Gibson cloning.
  • for the plasmid expressing the donor RNA, the GFP sequence, intron sequence, 5'-UTR, 3'-UTR sequence, and CMV promoter were amplified and joined by overlapping PCR, and the resulting DNA fragment was loaded into the pSV40-mCherry vector by Gibson cloning to construct a plasmid expressing the donor RNA.
  • HEK293T cells were cultured in DMEM (Gibco) containing 1% penicillin-streptomycin (Gibco) and 10% fetal bovine serum (Gibco). The cells were seeded in 24-well cell culture plates (Corning) and grown for 16 hours until the cell density reached 70%-90%.
  • 250 ng of the plasmid encoding the retrotransposase protein and 250 ng of the plasmid expressing the donor RNA were transfected into each well of the 24-well plate. After 24 hours of transfection, the cells were digested with trypsin-EDTA (0.05%) (Gibco).
  • the cells were then sorted using a MoFlo XDP (Beckman Coulter) instrument.
  • the cells with mCherry signal were re-seeded into 12-well plates and cultured for 6 days before being digested with trypsin-EDTA (0.05%) (Gibco).
  • in the gel images, the left side is the experimental group transfected with plasmids expressing the donor and the retrotransposase protein at the same time, the right side is the control group transfected with only the donor plasmid, and the black triangle indicates the corresponding PCR-amplified positive fragment.
  • some of the retrotransposase systems are able to amplify the corresponding written fragments.
  • Example 5 A novel retrotransposase can achieve integration of the GFP gene
  • FIG. 8 shows the results of PCR amplification of sequences spanning both ends of introns of integration sequences in mammalian cells using different retrotransposase systems.
  • the donor RNA of the novel retrotransposase system generally comprises five parts, homology arm-left (Seq ID No. 26), 5'UTR sequence, 3'UTR sequence, a sequence between the 5'UTR sequence and the 3'UTR sequence carrying new gene information, and homology arm-right (Seq ID No. 27) ( Figure 1).
  • the protein polypeptide sequence of the novel retrotransposase tested in this example was codon-optimized (human) and synthesized, and the DNA coding fragment was loaded into the pCAG-SV40 poly (A) vector between the XmaI and NheI restriction sites by Gibson cloning.
  • for the plasmid expressing the donor RNA, the GFP sequence, intron sequence, 5'-UTR, 3'-UTR sequence, and CMV promoter sequence were amplified separately and joined by overlapping PCR, and the resulting DNA fragment was loaded into the pSV40-mCherry vector by Gibson cloning to construct a plasmid expressing the donor RNA.
  • HEK293T cells were cultured in DMEM (Gibco) containing 1% penicillin-streptomycin (Gibco) and 10% fetal bovine serum (Gibco). The cells were seeded in 24-well cell culture plates (Corning) and grown for 16 hours until the cell density reached 70%-90%.
  • Using Lipofectamine 3000 (Invitrogen), 250 ng of plasmid encoding retrotransposase protein and 250 ng of plasmid expressing donor RNA were transfected into each well of the 24-well plate. After 24 hours of transfection, the cells were digested with trypsin-EDTA (0.05%) (Gibco).
  • the cells with mCherry signal were then sorted using a MoFlo XDP (Beckman Coulter) instrument and re-seeded into 12-well plates. After 6 days of continuous culture, the cells were digested with trypsin-EDTA (0.05%) (Gibco).
  • in Figure 8, each group of gel images has two lanes: the left lane is the experimental group co-transfected with the plasmids expressing the donor and the retrotransposase protein, and the right lane is the control group transfected with the donor plasmid only.
  • some retrotransposase systems (proteins #3, #4, #5, #8, #10, #11, #14, #21, #25, #29, #31, #32) can amplify the corresponding written fragments.
  • #3, #4, #5, #8, #10, #11, #14, #17, #21, #24, #25, #27, #29, #31, and #32 proteins can achieve site-specific integration of DNA in mammalian cells in combination with their respective donors.
  • the fluorescence flow sorting experiment combined with GFP shows that #3, #21, #23, #24, #31, and #33 can achieve complete integration of the GFP gene.
  • Example 6 Exploring the activity of truncated retrotransposase protein in mammalian cells
  • the sequence of the retrotransposase protein was aligned using the mafft software (muscle software, Clustal software, and blast software also have similar functions), and then the needle software (blast software also has similar functions) was used to calculate the similarity between proteins.
  • the retrotransposase protein was predicted using the InterPro website (hhpred website, NCBI CDD website, psi-blast software, blastp software, and hh-suite software also have similar functions).
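The align-then-score step described above (alignment with a tool such as mafft, followed by a pairwise similarity calculation) can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned and are supplied as equal-length gapped strings; the function name `percent_identity`, the toy sequences, and the convention of dividing identical columns by the number of alignment columns are illustrative assumptions, not the applicant's actual pipeline or the exact scoring of the needle software.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences (hypothetical helper).

    Identity is counted as identical, non-gap columns divided by the number of
    alignment columns; columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which at least one sequence has a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment contains no residues")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the pair reaches a stated identity threshold such as 70 or 95."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Toy aligned pair: 5 identical columns out of 6.
    print(round(percent_identity("ACGT-A", "ACGTTA"), 2))
    print(meets_threshold("ACGT-A", "ACGTTA", 70.0))
```

Other identity conventions exist (for example, dividing by the length of the shorter sequence), so a value computed this way should be compared with a threshold only when the same convention is used throughout.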


Abstract

The present application provides a retrotransposase. The retrotransposase comprises a target DNA-binding domain containing a zinc finger binding motif, a reverse transcriptase domain, and an endonuclease domain, and can reverse transcribe an RNA into a DNA. An amino acid sequence of the retrotransposase of the present application is as shown in any one of SEQ ID NOs: 1-6, or SEQ ID NOs: 32-43, or SEQ ID NOs: 68-71, or has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity with the amino acid sequence in any one of SEQ ID NOs: 1-6, or SEQ ID NOs: 32-43, or SEQ ID NOs: 68-71. The present application further relates to a system for modifying a DNA. The system comprises: the retrotransposase of the present application or a nucleic acid encoding the retrotransposase of the present application; and a donor RNA or a nucleic acid encoding the donor RNA, the donor RNA comprising a sequence that binds to the retrotransposase and a heterologous sequence.

Description

A system for inserting large fragments of DNA into genomes

Related Applications

This patent application claims priority to Chinese patent application No. 2022116339212, filed on December 19, 2022.

Sequence listing file submitted concurrently

The entire contents of the following XML file are incorporated herein by reference: Sequence Listing in Computer Readable Format (CRF) (Name: PFG00948PCT-PG03564-序列表.xml, Date: 20231219, Size: 99KB).

Technical Field

The present application belongs to the field of biotechnology. More specifically, the present application relates to a retrotransposase capable of site-specific insertion of large DNA fragments into the human genome, its use, and a system using the enzyme.

Background

Site-specific insertion of large DNA fragments into the genome is a very important technique in genetic engineering research.

The integration of large DNA fragments into the genome of mammalian cells is generally inefficient and non-specific. Several existing genome editing methods, such as the clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system, are not suitable for the integration of long DNA fragments. Other methods for site-specific insertion of large DNA fragments each have their own shortcomings. For example, large-fragment insertion by homologous recombination is inefficient, and the double-stranded DNA breaks it introduces pose a safety risk; recombinase systems such as Cre/loxP often require loxP sites to be inserted in advance before the second integration step can be performed; in addition, most current technologies rely on a DNA donor, which makes problems such as in vivo delivery difficult to solve.

Retrotransposases fill this gap in large-fragment site-specific integration technology and provide a more versatile and convenient tool for biological research. The novel retrotransposase system has two main components: first, the retrotransposase protein polypeptide itself; second, a donor RNA carrying new genetic information. The retrotransposase recognizes and binds the RNA donor sequence to form a protein-RNA complex and, through its reverse transcriptase activity, converts the RNA into DNA that is integrated into a specific genomic site. The retrotransposase-based DNA integration method avoids the generation of double-stranded DNA breaks and the risks they bring, and also avoids the preparation of, and dependence on, DNA donors in practical applications. Only a one-step reverse transcription reaction is required to complete the integration of large DNA fragments, which facilitates wide application in a variety of scenarios.

Summary of the Invention

1. A retrotransposase comprising a target DNA-binding domain containing a zinc finger binding motif, a reverse transcriptase domain, and an endonuclease domain, and capable of reverse transcribing RNA into DNA.

2. The retrotransposase according to item 1, comprising 1 to 3 zinc finger domains (ZF), 1 Myb-like domain, 1 reverse transcriptase domain (RT), and 1 restriction enzyme-like endonuclease domain (RLE); optionally, mutations, deletions, or insertions further occur at the N-terminus of the protein and in the non-conserved, non-structural amino acid sequences between the above four domains.

3. The retrotransposase according to item 1 or 2, whose amino acid sequence is as shown in any one of SEQ ID No.1 to 6 or SEQ ID No.32 to 43 or SEQ ID No.68 to 71, or has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity with the amino acid sequence of any one of SEQ ID No.1 to 6 or SEQ ID No.32 to 43 or SEQ ID No.68 to 71.

4. A system for modifying DNA, the system comprising:

the retrotransposase according to any one of items 1 to 3, or a nucleic acid encoding the retrotransposase according to any one of items 1 to 3; and

a donor RNA or a nucleic acid encoding the donor RNA, wherein the donor RNA comprises a sequence that binds to the retrotransposase and a heterologous sequence,

preferably, the heterologous sequence is at least 1-50000 bases, for example, 1 nt or more, 10 nt or more, 50 nt or more, 60 nt or more, 70 nt or more, 80 nt or more, 90 nt or more, 100 nt or more, 150 nt or more, 200 nt or more, 250 nt or more, 300 nt or more, 350 nt or more, 400 nt or more, 450 nt or more, 500 nt or more, 550 nt or more, 600 nt or more, 650 nt or more, 700 nt or more, 750 nt or more, 800 nt or more, 850 nt or more, 900 nt or more, 950 nt or more, 1000 nt or more, 1100 nt or more, 1200 nt or more, 1300 nt or more, 1400 nt or more, 1500 nt or more, 1600 nt or more, 1700 nt or more, 1800 nt or more, 1900 nt or more, 2000 nt or more, 2100 nt or more, 2200 nt or more, 2300 nt or more, 2400 nt or more, 2500 nt or more, 2600 nt or more, 2700 nt or more, 2800 nt or more, 2900 nt or more, 3000 nt or more, 3500 nt or more, 4000 nt or more, 4500 nt or more, 5000 nt or more, 5500 nt or more, 6000 nt or more, 6500 nt or more, 7000 nt or more, 7500 nt or more, 8000 nt or more, 8500 nt or more, 9000 nt or more, 9500 nt or more, 10000 nt or more, 15000 nt or more, 20000 nt or more, 25000 nt or more, 30000 nt or more, 35000 nt or more, 40000 nt or more, or 45000 nt or more.

5. The system according to item 4, wherein the heterologous sequence comprises one or more of the following: a sequence encoding a polypeptide or a non-coding RNA sequence, a sequence comprising a promoter or an enhancer, a sequence encoding one or more introns, and a transcription termination sequence;

preferably, the polypeptide is a therapeutic polypeptide or a mammalian polypeptide; further preferably, the polypeptide is a therapeutic protein, a membrane protein, an intracellular protein, an extracellular protein, a structural protein, a signal transduction protein, a regulatory protein, a transport protein, an organelle protein, a sensory protein, a motor protein, a defense protein, a storage protein, a reporter protein, an antibody, an enzyme, or a coagulation factor; further preferably, the number of amino acids in the polypeptide is 20 to 10000, for example 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, 3000, 3100, 3200, 3300, 3400, 3500, 3600, 3700, 3800, 3900, 4000, 4100, 4200, 4300, 4400, 4500, 4600, 4700, 4800, 4900, 5000, 5100, 5200, 5300, 5400, 5500, 5600, 5700, 5800, 5900, 6000, 6100, 6200, 6300, 6400, 6500, 6600, 6700, 6800, 6900, 7000, 7100, 7200, 7300, 7400, 7500, 7600, 7700, 7800, 7900, 8000, 8100, 8200, 8300, 8400, 8500, 8600, 8700, 8800, 8900, 9000, 9100, 9200, 9300, 9400, 9500, 9600, 9700, 9800, or 9900;

preferably, the intracellular protein is selected from a cytoplasmic protein, a nuclear protein, an organelle protein, a mitochondrial protein, or a lysosomal protein,

further preferably, the sequence encoding the polypeptide contains one or more introns.

6. The system according to item 4 or 5, wherein the donor RNA further comprises a homology domain, preferably the homology domain comprises a first homology domain and a second homology domain,

further preferably, the first homology domain is 5 or more bases located at the 5' end of the donor RNA that have 100% identity with the target DNA strand, and the second homology domain is 5 or more bases located at the 3' end of the donor RNA that have 100% identity with the target DNA strand; preferably, the target DNA is a genomic safe harbor (GSH) site or the target DNA is a genomic Natural Harbor™ site.

7. The system according to any one of items 4 to 6, wherein the nucleic acid encoding the retrotransposase according to any one of items 1 to 3 and the donor RNA or the nucleic acid encoding the donor RNA are separate nucleic acids, preferably the donor RNA does not encode the retrotransposase, and further preferably the donor RNA comprises one or more chemical modifications; or

the nucleic acid encoding the retrotransposase according to any one of items 1 to 3 and the donor RNA or the nucleic acid encoding the donor RNA are covalently linked, preferably the nucleic acid encoding the retrotransposase according to any one of items 1 to 3 and the donor RNA or the nucleic acid encoding the donor RNA form a fusion nucleic acid, and further preferably the fusion nucleic acid comprises RNA or DNA.

8. The system according to any one of items 4 to 7, wherein the donor RNA comprises:

optionally, a 5' untranslated sequence (5'UTR) that binds to the retrotransposase,

a 3' untranslated sequence (3'UTR) that binds to the retrotransposase,

a heterologous sequence, and

a promoter operably linked to the heterologous sequence,

preferably, the promoter is located between the 5' untranslated sequence (5'UTR) that binds to the retrotransposase and the heterologous sequence, or preferably, the promoter is located between the 3' untranslated sequence (3'UTR) that binds to the retrotransposase and the heterologous sequence.

9. The system according to any one of items 4 to 8, wherein the heterologous sequence comprises an open reading frame, or its reverse complement, oriented 5' to 3' on the donor RNA; or the heterologous sequence comprises an open reading frame, or its reverse complement, oriented 3' to 5' on the donor RNA.

10. The system according to any one of items 4 to 9, wherein the donor RNA further comprises a nuclear localization signal, or the nucleic acid encoding the retrotransposase according to any one of items 1 to 3 comprises a nuclear localization signal and/or a nucleolar localization signal and/or a nuclear export signal.

11. The system according to any one of items 4 to 10, wherein the nucleic acid encoding the retrotransposase according to any one of items 1 to 3 and the nucleic acid encoding the donor RNA are present in a ratio of 10:1 to 1:10, for example, 10:1, 9:1, 8:1, 7:1, 6:1, 5:1, 4:1, 3:1, 2:1, 1:1, 1:2, 1:3, 1:4, 1:5, 1:6, 1:7, 1:8, 1:9, or 1:10.
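As an illustration of the ratio range in item 11, the following minimal sketch splits a total amount of nucleic acid between the retrotransposase-encoding nucleic acid and the donor-encoding nucleic acid. It assumes the ratio is read as a mass ratio, which the source does not state; the function name `split_by_ratio` and the 500 ng total (250 ng + 250 ng, as used in the examples) are illustrative only.

```python
from fractions import Fraction


def split_by_ratio(total_ng: float, enzyme_parts: int, donor_parts: int) -> tuple[float, float]:
    """Split a total amount (ng) into enzyme-encoding and donor-encoding amounts.

    Hypothetical helper: the ratio is treated as a mass ratio (an assumption)
    and must lie within the 10:1 to 1:10 range given in item 11.
    """
    if enzyme_parts <= 0 or donor_parts <= 0:
        raise ValueError("ratio parts must be positive")
    ratio = Fraction(enzyme_parts, donor_parts)
    if not Fraction(1, 10) <= ratio <= Fraction(10, 1):
        raise ValueError("ratio must be between 10:1 and 1:10")
    enzyme_ng = total_ng * enzyme_parts / (enzyme_parts + donor_parts)
    return enzyme_ng, total_ng - enzyme_ng


if __name__ == "__main__":
    # 1:1 split of 500 ng, matching the 250 ng + 250 ng used in the examples.
    print(split_by_ratio(500, 1, 1))
```

If the ratio were instead intended as a molar ratio, the plasmid lengths would also be needed to convert between moles and mass.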

12. The system according to any one of items 4 to 11, wherein the donor RNA comprises a stem-loop sequence or helix 5' of a pseudoknot sequence, preferably comprises one or more (e.g., 2, 3, or more) stem-loop sequences or helices 3' of the pseudoknot sequence, for example 3' of the pseudoknot sequence and 5' of the heterologous sequence; further preferably, the donor RNA comprising the pseudoknot has catalytic activity, for example RNA cleavage activity, for example cis-RNA cleavage activity; or

the donor RNA comprises at least one stem-loop sequence or helix, for example 3' of the heterologous sequence, for example 1, 2, 3, 4, 5 or more stem-loop, hairpin, or helix sequences.

13. The system according to any one of items 4 to 12, wherein:

the 5' untranslated sequence (5'UTR) in the donor RNA that binds to the retrotransposase has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identity with the nucleotide sequence of any one of SEQ ID No. 7 to 12 or SEQ ID No. 44 to 55;

the 3' untranslated sequence (3'UTR) in the donor RNA that binds to the retrotransposase has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identity with the nucleotide sequence of any one of SEQ ID No. 13 to 18 or SEQ ID No. 56 to 67.

14. The system according to any one of items 4 to 13, wherein the donor RNA comprises the following structures in order from its 5' end to its 3' end:

a first homology domain,

a 5' untranslated sequence (5'UTR) that binds to the retrotransposase,

a heterologous sequence,

a 3' untranslated sequence (3'UTR) that binds to the retrotransposase, and

a second homology domain;

preferably, the first homology domain consists of 10 or more, 20 or more, 30 or more, 40 or more, 50 or more, 60 or more, 70 or more, 80 or more, 90 or more, or 100 or more bases located at the 5' end of the donor RNA and having 100% identity with the target DNA strand, and the second homology domain consists of 10 or more, 20 or more, 30 or more, 40 or more, 50 or more, 60 or more, 70 or more, 80 or more, 90 or more, or 100 or more bases located at the 3' end of the donor RNA and having 100% identity with the target DNA strand;

further preferably, the 5' untranslated sequence (5'UTR) in the donor RNA that binds to the retrotransposase has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identity with the nucleotide sequence of any one of SEQ ID No. 7 to 12;

the 3' untranslated sequence (3'UTR) in the donor RNA that binds to the retrotransposase has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identity with the nucleotide sequence of any one of SEQ ID No. 13 to 18.
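The 5'-to-3' layout recited in item 14 can be illustrated with a minimal Python sketch. It is not part of the claimed subject matter. The class name `DonorRNA`, the helper names, and the toy sequences are hypothetical, and the sketch assumes that "100% identity with the target DNA strand" can be checked as a verbatim match once T is treated as equivalent to U.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DonorRNA:
    """Hypothetical container for the five donor RNA segments, listed 5' to 3'."""
    homology_5p: str  # first homology domain
    utr_5p: str       # 5'UTR bound by the retrotransposase
    payload: str      # heterologous sequence
    utr_3p: str       # 3'UTR bound by the retrotransposase
    homology_3p: str  # second homology domain

    def sequence(self) -> str:
        # Concatenate the segments in the order recited in item 14.
        return (self.homology_5p + self.utr_5p + self.payload
                + self.utr_3p + self.homology_3p)


def to_rna(seq: str) -> str:
    """Upper-case a sequence and write T as U (assumption: T and U are equivalent)."""
    return seq.upper().replace("T", "U")


def arms_match_target(donor: DonorRNA, target_strand: str, min_len: int = 10) -> bool:
    """True if both homology domains have at least min_len bases and each
    occurs verbatim (100% identity) somewhere in the target strand."""
    target = to_rna(target_strand)
    left = to_rna(donor.homology_5p)
    right = to_rna(donor.homology_3p)
    if len(left) < min_len or len(right) < min_len:
        return False
    return left in target and right in target
```

For example, a donor with 10-base arms `GAUUACAGAU` and `UUGACCAUGG` passes the check against the target strand `GATTACAGATCCTTGACCATGG` with the default `min_len=10`, and fails if `min_len` is raised to 20.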

15. The system according to any one of items 8 to 14, wherein:

the 5' untranslated sequence (5'UTR) that binds to the retrotransposase is a non-natural 5' untranslated sequence (5'UTR); or

the 3' untranslated sequence (3'UTR) that binds to the retrotransposase is a non-natural 3' untranslated sequence (3'UTR);

further preferably, the non-natural 5' untranslated sequence (5'UTR) has nucleotide additions, deletions and/or substitutions relative to the natural 5'UTR sequence;

further preferably, the non-natural 3' untranslated sequence (3'UTR) has nucleotide additions, deletions and/or substitutions relative to the natural 3'UTR sequence;

further preferably, the non-natural 5' untranslated sequence (5'UTR) has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identity with the nucleotide sequence of SEQ ID No. 19-21;

further preferably, the non-natural 3' untranslated sequence (3'UTR) has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identity with the nucleotide sequence of any one of SEQ ID No. 22-23.

16. The system according to any one of items 4 to 15, wherein the heterologous sequence is inserted into a target site in the genome of a subject at an average copy number of at least 0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.5, 3, 4 or 5 copies per genome, preferably at only one target site in the genome.

17. The system according to any one of items 4 to 16, wherein the system results in insertion of the heterologous sequence into the target site (e.g., at a copy number of one insertion or more than one insertion) in about 1%-80% of the cells (e.g., about 1%-10%, 10%-20%, 20%-30%, 30%-40%, 40%-50%, 50%-60%, 60%-70% or 70%-80% of the cells) in a cell population contacted with the system, for example as measured by single-cell ddPCR; or

the system results in insertion of the heterologous sequence into the target site at a copy number of one insertion in about 1%-80% of the cells (e.g., about 1%-10%, 10%-20%, 20%-30%, 30%-40%, 40%-50%, 50%-60%, 60%-70% or 70%-80% of the cells) in a cell population contacted with the system, for example as measured by colony isolation and ddPCR.
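The quantities recited in items 16 and 17 (average copy number per genome, and the fraction of cells carrying an insertion) can be illustrated with a short Python sketch. The function name `insertion_stats` and the input format are hypothetical. The sketch assumes that a per-cell count of on-target insertions is already available (for example from a single-cell assay) and that each cell is counted as one genome. It does not model ddPCR itself.

```python
from typing import Dict, Sequence


def insertion_stats(copies_per_cell: Sequence[int]) -> Dict[str, float]:
    """Summarize on-target insertion counts for a cell population.

    copies_per_cell holds one non-negative integer per cell: the number of
    copies of the heterologous sequence found at the target site in that cell.
    Assumption: each cell is counted as one genome.
    """
    n_cells = len(copies_per_cell)
    if n_cells == 0:
        raise ValueError("at least one cell is required")
    if any(c < 0 for c in copies_per_cell):
        raise ValueError("copy numbers must be non-negative")
    with_insertion = sum(1 for c in copies_per_cell if c >= 1)
    single_copy = sum(1 for c in copies_per_cell if c == 1)
    return {
        "mean_copies_per_genome": sum(copies_per_cell) / n_cells,
        "fraction_with_insertion": with_insertion / n_cells,
        "fraction_single_copy": single_copy / n_cells,
    }
```

With ten cells whose counts are `[0, 0, 1, 1, 2, 0, 0, 0, 0, 0]`, the average is 0.4 copies per genome, 30% of the cells carry at least one insertion, and 20% carry exactly one.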

18. The system according to any one of items 4 to 17, wherein the system results in insertion of the heterologous sequence in a cell population into the target site (on-target insertion) at a higher rate than into non-target sites (off-target insertion), wherein the ratio of on-target insertions to off-target insertions is greater than 10:1, 20:1, 30:1, 40:1, 50:1, 60:1, 70:1, 80:1, 90:1, 100:1, 200:1, 500:1 or 1,000:1.
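The on-target to off-target ratio in item 18 is simple arithmetic, sketched below in Python. The function names are hypothetical, and the insertion counts are assumed to come from whatever mapping assay is used. The text does not prescribe one.

```python
def on_off_ratio(on_target: int, off_target: int) -> float:
    """Ratio of on-target to off-target insertion events.

    Returns infinity when no off-target insertion was observed.
    """
    if on_target < 0 or off_target < 0:
        raise ValueError("counts must be non-negative")
    if off_target == 0:
        return float("inf")
    return on_target / off_target


def exceeds_ratio(on_target: int, off_target: int, threshold: float = 10.0) -> bool:
    """True if the on:off ratio is strictly greater than threshold:1."""
    return on_off_ratio(on_target, off_target) > threshold
```

For 950 on-target and 50 off-target insertions the ratio is 19:1, which is greater than 10:1 but not greater than 20:1.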

19. A non-natural 5' untranslated sequence (5'UTR) having nucleotide additions, deletions and/or substitutions relative to a natural 5'UTR sequence, preferably having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identity with the nucleotide sequence of SEQ ID No. 19-21.

20. A non-natural 3' untranslated sequence (3'UTR) having nucleotide additions, deletions and/or substitutions relative to a natural 3'UTR sequence, preferably having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identity with the nucleotide sequence of SEQ ID No. 22-23.

21. An engineered transposable element comprising, from 5' to 3':

a 5' untranslated sequence (5'UTR), a heterologous sequence and a 3' untranslated sequence (3'UTR),

wherein the 5' untranslated sequence comprises a nucleotide sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identity with a sequence selected from SEQ ID No. 19-21;

further preferably, the 3' untranslated sequence is a non-natural 3' untranslated sequence (3'UTR) having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identity with the nucleotide sequence of any one of SEQ ID No. 22-23.

22. The element according to item 21, wherein the element is the donor RNA mentioned in items 4 to 18 or a nucleic acid encoding the donor RNA.

23. A host cell comprising the system according to any one of items 4 to 18 or the element according to item 21 or 22, wherein the host cell is preferably a mammalian cell or a plant cell, and more preferably a human cell.

24. A method for modifying a target DNA strand in a cell, tissue or subject, the method comprising applying the system according to any one of items 4 to 18 to the cell, tissue or subject, wherein the system reverse transcribes the donor RNA sequence into the target DNA strand, thereby modifying the target DNA strand in the cell, tissue or subject.

25. The method according to item 24, wherein the cell or tissue is a mammalian cell or tissue, preferably a human cell or tissue, and the subject is a mammal, preferably a human.

26. The method according to item 24 or 25, wherein the cell is a fibroblast, a primary cell, or a cell that has not been immortalized.

27. The method according to any one of items 24 to 26, wherein the method is performed in vivo or in vitro.

28. A method for modifying the genome of a mammalian cell or inserting DNA into a mammalian genome, the method comprising applying the system according to any one of items 4 to 18 to the cell, wherein the mammal is preferably a human.

29. The method according to item 28, wherein the method results in the addition of an exogenous DNA sequence of at least 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000 or 10000 base pairs to the genome of the mammal.

30. The method according to any one of items 24 to 29, wherein the cell is part of a tissue; or the mammalian cell is euploid, is not immortalized, is part of an organism, is a primary cell, is non-dividing, is a hepatocyte, or is from a subject with a genetic disease.

31. The method according to any one of items 24 to 30, wherein

the method comprises contacting the cell, tissue or subject with the retrotransposase according to any one of items 1 to 3, or a nucleic acid encoding that retrotransposase, and with the donor RNA or a nucleic acid encoding the donor RNA,

preferably, the contacting comprises contacting the cell, tissue or subject with a plasmid, virus, virus-like particle, virosome, liposome, vesicle, exosome or lipid nanoparticle;

further preferably, the contacting comprises non-viral delivery, such as electroporation.

32. The method according to item 31, wherein:

the contacting comprises intravenous administration to the subject; preferably, the retrotransposase according to any one of items 1 to 3, or a nucleic acid encoding that retrotransposase, and the donor RNA, or a nucleic acid encoding the donor RNA, are administered to the subject at least twice.

33. The method according to any one of items 24 to 32, wherein

the retrotransposase according to any one of items 1 to 3, or a nucleic acid encoding that retrotransposase, and the donor RNA, or a nucleic acid encoding the donor RNA, are administered separately; or

the retrotransposase according to any one of items 1 to 3, or a nucleic acid encoding that retrotransposase, and the donor RNA, or a nucleic acid encoding the donor RNA, are administered together.

34. A nucleic acid encoding the retrotransposase according to any one of items 1 to 3.

35. A vector comprising the nucleic acid according to item 34.

36. A host cell comprising the vector according to item 35.

37. A pharmaceutical composition comprising the system according to any one of items 4 to 18, or the nucleic acid according to item 34, or the vector according to item 35, or the host cell according to item 23 or 36, wherein the system is preferably placed in a pharmaceutically acceptable carrier, and the carrier is further preferably a vesicle (including a liposome, a natural or synthetic lipid bilayer, or an exosome), a lipid nanoparticle, a virus or a plasmid vector.

Effects of the Invention

The system and method of the present application achieve gene writing at the DNA level using only an RNA donor, which is a technical innovation. By integrating a coding gene into the RNA sequence template, the system and method of the present application can meet needs including, but not limited to, the following:

Therapeutic needs, for example by providing expression of a therapeutic transgene in an individual with a loss-of-function mutation, by replacing a gain-of-function mutation with a normal transgene, by providing a regulatory sequence to eliminate expression of a gain-of-function mutation, and/or by controlling the expression of operably linked genes, transgenes and systems thereof. In certain embodiments, the RNA sequence template encodes a promoter region specific to the therapeutic needs of the host cell, such as a tissue-specific promoter or enhancer. In other embodiments, the promoter may be operably linked to a coding sequence.

Needs in preparing functional cells, for example integrating a biological macromolecule with a specific function (such as a chimeric antigen receptor (CAR)) into immune cells to give them a new tumor-killing function.

New needs in crop breeding, for example integrating a specific gene into plant callus tissue to give the plant new economic traits (such as stress resistance or insect resistance).

Brief Description of the Drawings

Figure 1: Structures of the vector expressing the retrotransposase protein and the vector expressing the donor RNA.

Figure 2: Activity of multiple retrotransposase systems in mammals, tested using the system constructed in Example 2. Relative to the negative control, six novel retrotransposase systems (#3, #21, #23, #24, #31, #33) showed clear activity.

Figure 3: GFP expression after the #21 retrotransposase integrated the GFP gene in mammalian cells, with the proportion of GFP-positive cells quantified by flow cytometry.

Figure 4: GFP expression after the #21 retrotransposase integrated the GFP gene in mammalian cells, observed by fluorescence microscopy.

Figure 5: Integration of the GFP gene in mammalian cells by the #21 retrotransposase, examining the effect of donor RNA containing a non-natural 5'UTR and/or a non-natural 3'UTR on the efficiency of gene integration by the #21 retrotransposase in mammalian cells.

Figure 6: Positions of the PCR primers used in the Examples.

Figure 7: PCR amplification of the junction between the 3' end of the integrated DNA sequence and the genome (within the 28S rDNA gene) for different retrotransposase systems in mammalian cells.

Figure 8: PCR amplification of the sequence spanning both ends of the intron in the integrated sequence for different retrotransposase systems in mammalian cells.

Figure 9: GFP expression after four #21 retrotransposases integrated the GFP gene in mammalian cells, with the GFP expression level observed by fluorescence microscopy.

Specific Embodiments

It should be noted that certain terms are used in the specification and claims to refer to specific components. Those skilled in the art will understand that different names may be used for the same component. The specification and claims distinguish components by their function, not by their names. "Comprising" or "including", as used throughout the specification and claims, is an open-ended term and should be interpreted as "including but not limited to". The description that follows sets out preferred embodiments for carrying out the invention; it is intended to illustrate the general principles of the specification and not to limit the scope of the invention. The scope of protection of the invention is defined by the appended claims. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by a person of ordinary skill in the art to which this disclosure belongs.

The terms "nucleic acid", "polynucleotide" and "nucleotide sequence" are used interchangeably and refer to a polymeric form of nucleotides of any length, including deoxyribonucleotides, ribonucleotides, combinations thereof, and analogs thereof. "Oligonucleotide" (寡核苷酸 or 低聚核苷酸; the two Chinese terms are used interchangeably) refers to a short polynucleotide of no more than about 50 nucleotides. As used herein, "complementarity" refers to the ability of a nucleic acid to form hydrogen bonds with another nucleic acid through conventional Watson-Crick base pairing. Percent complementarity is the percentage of residues in a nucleic acid molecule that can form hydrogen bonds (i.e., Watson-Crick base pairs) with a second nucleic acid (e.g., 5, 6, 7, 8, 9 or 10 out of 10 correspond to about 50%, 60%, 70%, 80%, 90% and 100% complementarity, respectively). "Fully complementary" means that all contiguous residues of a nucleic acid sequence form hydrogen bonds with the same number of contiguous residues in a second nucleic acid sequence. As used herein, "substantially complementary" means a degree of complementarity of at least about any of 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99% or 100% over a region of about 40, 50, 60, 70, 80, 100, 150, 200, 250 or more nucleotides, or refers to two nucleic acids that hybridize under stringent conditions.
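The "5 out of 10 is about 50%" arithmetic in the definition of percent complementarity can be sketched in Python as follows. The function name is hypothetical. The sketch assumes two ungapped strands of equal length, both written 5' to 3' and paired antiparallel, and counts only Watson-Crick pairs (A-T, A-U, G-C), so G-U wobble pairs are not counted.

```python
# Watson-Crick pairs only; DNA (T) and RNA (U) letters are both accepted.
WATSON_CRICK = {
    ("A", "T"), ("T", "A"),
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
}


def percent_complementarity(strand: str, other: str) -> float:
    """Percentage of positions in `strand` that pair with `other`.

    Both strands are written 5'->3' and must have the same length;
    `other` is reversed so that the two are compared antiparallel.
    """
    if not strand or len(strand) != len(other):
        raise ValueError("strands must be non-empty and of equal length")
    a = strand.upper()
    b = other.upper()[::-1]
    paired = sum(1 for x, y in zip(a, b) if (x, y) in WATSON_CRICK)
    return 100.0 * paired / len(a)
```

For the strand `AAGGCCTTAG`, its exact reverse complement `CTAAGGCCTT` scores 100%, while `CTAAGAAAAA`, which pairs at only 5 of 10 positions, scores 50%.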

"Percent (%) sequence identity" for a nucleic acid sequence is defined as the percentage of nucleotides in a candidate sequence that are identical to the nucleotides in a specific nucleic acid sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity. "Percent (%) sequence identity" for a peptide, polypeptide or protein sequence is the percentage of amino acid residues in a candidate sequence that are identical to the amino acid residues in a specific peptide or amino acid sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity. For the purpose of determining percent amino acid sequence identity, alignment can be achieved in various ways within the skill of the art, for example using publicly available computer software such as mafft, muscle, Clustal, needle, BLAST, BLAST-2, ALIGN or MEGALIGN™ (DNASTAR), preferably mafft, muscle, Clustal or needle. Those skilled in the art can determine suitable parameters for measuring alignment, including any algorithm needed to achieve maximal alignment over the full length of the sequences being compared.
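Once an alignment has been produced by one of the tools named above, percent identity reduces to counting identical columns. The Python sketch below is illustrative only. The function names are hypothetical, the input is assumed to be a pair of already aligned, equal-length strings using `-` for gaps, and the denominator is taken to be the number of aligned columns, which is one common convention among several and is not specified in the text.

```python
def percent_identity(aligned_candidate: str, aligned_reference: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumption: the denominator is the number of alignment columns,
    ignoring columns where both sequences have a gap.
    """
    if len(aligned_candidate) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_candidate.upper(), aligned_reference.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity(aligned_candidate: str, aligned_reference: str,
                   threshold: float = 70.0) -> bool:
    """True if percent identity is at least `threshold` (e.g. 70, 80, 90)."""
    return percent_identity(aligned_candidate, aligned_reference) >= threshold
```

For the aligned pair `ACGT-ACGTA` and `ACGTTACGTT`, 8 of 10 columns are identical, giving 80% identity, which satisfies a 70% threshold but not a 90% threshold.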

The terms "polypeptide" and "peptide" are used interchangeably herein and refer to a polymer of amino acids of any length. The polymer may be linear or branched, may contain modified amino acids, and may be interrupted by non-amino acids. A protein may have one or more polypeptides. The terms also encompass amino acid polymers that have been modified, for example by disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation (such as conjugation with a labeling component).

As used herein, a "variant" is a polynucleotide or polypeptide that differs from a reference polynucleotide or polypeptide, respectively, but retains its essential properties. A typical variant of a polynucleotide differs in nucleic acid sequence from a reference polynucleotide. Changes in the nucleic acid sequence of the variant may or may not alter the amino acid sequence of the polypeptide encoded by the reference polynucleotide. Nucleotide changes may result in amino acid substitutions, additions, deletions, fusions and truncations in the polypeptide encoded by the reference sequence, as described below. A typical variant of a polypeptide differs in amino acid sequence from a reference polypeptide. Generally, the differences are limited, so that the sequences of the reference polypeptide and the variant are closely similar overall and identical in many regions. The amino acid sequences of the variant and the reference polypeptide may differ by one or more substitutions, additions or deletions in any combination. A substituted or inserted amino acid residue may or may not be one encoded by the genetic code. A variant of a polynucleotide or polypeptide may be naturally occurring (such as an allelic variant), or may be a variant that is not known to occur naturally. Non-naturally occurring variants of polynucleotides and polypeptides can be prepared by mutagenesis techniques, by direct synthesis, and by other recombinant methods known to those skilled in the art.

As used herein, the term "wild type" has the meaning commonly understood by those skilled in the art and refers to the typical form of an organism, strain, gene or characteristic as it occurs in nature, as distinguished from mutant or variant forms. It can be isolated from a natural source and has not been intentionally modified.

As used herein, the terms "non-naturally occurring" and "engineered" are used interchangeably and indicate human involvement. When used to describe a nucleic acid molecule or polypeptide, these terms mean that the nucleic acid molecule or polypeptide is at least substantially free of at least one other component with which it is naturally associated or with which it occurs in nature.

"Cell" as used herein should be understood to refer not only to a particular individual cell but also to the progeny or potential progeny of that cell. Because certain modifications may occur in progeny as a result of mutation or environmental influences, such progeny may not in fact be identical to the parent cell, but they are still included within the scope of the term as used herein.

As used herein, the terms "transduction" and "transfection" include methods known in the art for introducing DNA into cells, using an infectious agent (such as a virus) or other means, in order to express a protein or molecule of interest. In addition to viruses or virus-like agents, there are chemical-based transfection methods, such as those using calcium phosphate, dendrimers, liposomes or cationic polymers (e.g., DEAE-dextran or polyethyleneimine); non-chemical methods, such as electroporation, cell squeezing, sonoporation, optical transfection, impalefection, protoplast fusion, plasmid delivery or transposons; particle-based methods, such as the use of a gene gun, magnetofection or magnet-assisted transfection, and particle bombardment; and hybrid methods (such as nucleofection).

As used herein, the terms "transfected", "transformed" and "transduced" refer to the process of transferring or introducing an exogenous nucleic acid into a host cell. A "transfected", "transformed" or "transduced" cell is a cell that has been transfected, transformed or transduced with an exogenous nucleic acid.

The term "in vivo" refers to inside the organism from which the cell was obtained. "Ex vivo" or "in vitro" refers to outside the organism from which the cell was obtained.

As used herein, "treatment" or "treating" is an approach for obtaining beneficial or desired results, including clinical results. For the purposes of the present invention, beneficial or desired clinical results include, but are not limited to, one or more of the following: alleviating one or more symptoms caused by the disease, reducing the extent of the disease, stabilizing the disease (e.g., preventing or delaying its worsening), preventing or delaying the spread of the disease (e.g., metastasis), preventing or delaying recurrence of the disease, reducing the recurrence rate of the disease, delaying or slowing the progression of the disease, ameliorating the disease state, providing remission (partial or total) of the disease, reducing the dose of one or more other drugs required to treat the disease, delaying the progression of the disease, improving quality of life, and/or prolonging survival. "Treatment" also includes reducing the pathological consequences of a disorder, condition or disease. The methods of the present invention contemplate any one or more of these aspects of treatment.

As used herein, the term "effective amount" refers to an amount of a compound or composition sufficient to treat a particular disorder, condition or disease (e.g., to ameliorate, relieve, lessen and/or delay one or more of its symptoms). As understood in the art, an "effective amount" may be given in one or more administrations; that is, a single administration or multiple administrations may be required to reach the desired treatment endpoint.

"Subject", "individual" and "patient" are used interchangeably herein and, for purposes of treatment, refer to any animal classified as a mammal, including humans, domestic and farm animals, and zoo, farm or pet animals such as dogs, horses, cats and cattle. In some embodiments, the individual is a human.

应理解,本文所述的本发明的实施方案包括“由...组成”和/或“基本上由...组成”的实施方案。在本文中对“约”值或参数的提及包括(并描述了)针对该值或参数本身的变化。例如,提及“大约X”的描述,包括对“X”的描述。It is to be understood that embodiments of the invention described herein include "consisting of" and/or "consisting essentially of" embodiments. Reference herein to "about" a value or parameter includes (and describes) variations with respect to that value or parameter itself. For example, a description referring to "about X" includes a description of "X".

如本文所用,对“不”值或参数的提及通常意指并描述了“除…外”值或参数。例如,所述方法不用于治疗X型癌症,意味着所述方法用于治疗除X型以外的癌症。As used herein, reference to "not" a value or parameter generally means and describes an "except" value or parameter. For example, the method is not used to treat cancer type X, meaning that the method is used to treat cancers other than type X.

如本文所用,术语“大约X-Y”具有与“大约X至大约Y”相同的含义。As used herein, the term "about X-Y" has the same meaning as "about X to about Y."

如本文和所附权利要求书中所使用的,单数形式“一个/一种(a/an)”和“所述”包括复数对象,除非上下文另外明确指出。还应注意,权利要求可以被撰写为排除任何可选的要素。因此此陈述旨在作为与权利要求要素的叙述结合使用诸如“只”、“仅”等排他性术语的先行基础,或使用“否”的限制。As used herein and in the appended claims, the singular forms "a", "an", and "the" include plural referents unless the context clearly dictates otherwise. It should also be noted that the claims may be drafted to exclude any optional elements. This statement is therefore intended to serve as antecedent basis for the use of exclusive terminology such as "only", "only" and the like in connection with the recitation of claim elements, or the use of a limitation of "no".

As used herein, the term "and/or" in a phrase such as "A and/or B" is intended to include A and B; A or B; A (alone); and B (alone). Likewise, the term "and/or" in a phrase such as "A, B, and/or C" is intended to include each of the following embodiments: A, B, and C; A, B, or C; A or C; A or B; B or C; A and C; A and B; B and C; A (alone); B (alone); and C (alone).

Unless otherwise specified, a mutation described herein may include one or more of an insertion, a deletion, and a substitution, and may affect a single amino acid or multiple amino acids.

A "vector" is a composition of matter that comprises an isolated nucleic acid and can be used to deliver the isolated nucleic acid to the interior of a cell. Many vectors are known in the art, including but not limited to linear polynucleotides, polynucleotides associated with ionic or amphiphilic compounds, plasmids, and viruses. In general, a suitable vector comprises an origin of replication functional in at least one organism, a promoter sequence, convenient restriction endonuclease sites, and one or more selectable markers. The term "vector" should also be construed to include non-plasmid and non-viral compounds that facilitate the transfer of nucleic acid into cells, such as, for example, polylysine compounds, liposomes, and the like.

In some embodiments, the vector is a viral vector. Examples of viral vectors include, but are not limited to, adenoviral vectors, adeno-associated viral vectors, lentiviral vectors, retroviral vectors, vaccinia vectors, herpes simplex virus vectors, and derivatives thereof. In some embodiments, the vector is a bacteriophage vector. Viral vector technology is well known in the art and is described, for example, in Sambrook et al. (2001, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, New York) and in other virology and molecular biology manuals.

In some embodiments, the rAAV construct can be administered to a subject enterally. In some embodiments, the rAAV construct can be administered to a subject parenterally. In some embodiments, the rAAV particles can be administered subcutaneously, intraocularly, intravitreally, subretinally, intravenously (IV), intracerebroventricularly, intramuscularly, intrathecally (IT), intracisternally, intraperitoneally, via inhalation, topically, or by direct injection into one or more cells, tissues, or organs. In some embodiments, the rAAV particles can be administered to a subject by injection into the hepatic artery or portal vein.

Methods for introducing vectors into mammalian cells are known in the art. A vector can be transferred into a host cell by physical, chemical, or biological methods.

Physical methods for introducing a vector into a host cell include calcium phosphate precipitation, lipofection, particle bombardment, microinjection, electroporation, and the like. Methods for producing cells comprising vectors and/or exogenous nucleic acids are well known in the art. See, for example, Sambrook et al. (2001) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, New York. In some embodiments, the vector is introduced into the cell by electroporation.

The methods described herein are applicable to any suitable cell type. In some embodiments, the cell is a bacterial cell, a yeast cell, a fungal cell, an algal cell, a plant cell, or an animal cell (e.g., a mammalian cell, such as a human cell). In some embodiments, the cell is of natural origin, such as a cell isolated from a tissue biopsy. In some embodiments, the cell is isolated from a cell line cultured in vitro. In some embodiments, the cell is from a primary cell line. In some embodiments, the cell is from an immortalized cell line. In some embodiments, the cell is a genetically engineered cell.

A nuclear localization signal (NLS), as used herein, is a domain of a protein, usually a short amino acid sequence, that interacts with a nuclear import carrier so that the protein can be transported into the cell nucleus. A nuclear localization signal may also be an RNA sequence. In some embodiments, the nuclear localization signal is located on the donor RNA. In some embodiments, the retrotransposase polypeptide is encoded on a first RNA, the donor RNA is a second, separate RNA, and the nuclear localization signal is located on the donor RNA rather than on the RNA encoding the retrotransposase polypeptide. Without wishing to be bound by theory, in some embodiments the RNA encoding the retrotransposase is targeted primarily to the cytoplasm to promote its translation, while the donor RNA is targeted primarily to the nucleus to promote its retrotransposition into the genome. In some embodiments, the nuclear localization signal is at the 3' end, at the 5' end, or in the interior of the donor RNA. In some embodiments, the nuclear localization signal is 3' of the heterologous sequence (e.g., directly 3' of the heterologous sequence) or 5' of the heterologous sequence (e.g., directly 5' of the heterologous sequence). In some embodiments, the nuclear localization signal is placed outside the 5' UTR or outside the 3' UTR of the donor RNA. In some embodiments, the nuclear localization signal is placed between the 5' UTR and the 3' UTR, wherein optionally the nuclear localization signal is not transcribed with the transgene (e.g., the nuclear localization signal is in an antisense orientation or is downstream of a transcription termination signal or a polyadenylation signal). In some embodiments, the nuclear localization sequence is located within an intron. In some embodiments, a plurality of identical or different nuclear localization signals are present in an RNA, for example in the donor RNA. In some embodiments, the length of the nuclear localization signal is less than 5, 10, 25, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, or 1000 bp. Various RNA nuclear localization sequences can be used.

As used herein, the term "domain" refers to a structure of a biomolecule that contributes to a specific function of that biomolecule. A domain may comprise a contiguous region (e.g., a contiguous sequence) or distinct non-contiguous regions (e.g., non-contiguous sequences) of the biomolecule. Examples of protein domains include, but are not limited to, an endonuclease domain, a target DNA binding domain, and a reverse transcription domain; an example of a nucleic acid domain is a regulatory domain, such as a transcription factor binding domain.

In some embodiments, the present application relates to a target DNA binding domain containing a zinc finger binding motif, a reverse transcriptase domain, and an endonuclease domain.

In some embodiments, the reverse transcriptase domain refers to a domain having reverse transcription function. Those skilled in the art can identify a reverse transcription domain based on homology with other known reverse transcription domains using conventional tools such as the Basic Local Alignment Search Tool (BLAST). In some embodiments, the reverse transcriptase domain is modified, for example by site-specific mutation. In some embodiments, the reverse transcriptase domain is engineered to bind a heterologous sequence.

In some embodiments, the endonuclease domain refers to a domain having endonuclease function, and the endonuclease element is a heterologous endonuclease element, for example FokI nuclease, a type II restriction-like endonuclease (RLE-type nuclease), or another RLE-type endonuclease (also referred to as REL). In some embodiments, the heterologous endonuclease has nickase activity and does not form double-strand breaks. Those skilled in the art can identify an endonuclease domain using a tool such as the Basic Local Alignment Search Tool (BLAST), by domain prediction with websites or software (for example, the InterPro, HHpred, or CDD websites, or the PSI-BLAST, blastp, or HH-suite software), or based on homology with other known endonuclease domains.

In some embodiments, the target DNA binding domain is a target DNA binding domain containing a zinc finger binding motif, wherein the zinc finger binding motif is a stretch of amino acid sequence responsible for binding target DNA of a specific sequence.

As used herein, the term "exogenous," when used with respect to a biomolecule (e.g., a nucleic acid sequence or a polypeptide), means that the biomolecule has been artificially introduced into a host genome, cell, or organism. For example, a nucleic acid added to an existing genome, cell, tissue, or subject using recombinant DNA technology or other methods is exogenous to the existing nucleic acid sequence, cell, tissue, or subject.

As used herein, the term "heterologous," when used to describe a first element with reference to a second element, means that the first element and the second element do not exist in nature in the arrangement described. For example, a heterologous polypeptide, nucleic acid molecule, construct, or sequence refers to (a) a polypeptide, a nucleic acid molecule, or a portion of a polypeptide or nucleic acid molecule sequence that is not native to the cell in which it is expressed, (b) a polypeptide or nucleic acid molecule, or a portion thereof, that has been altered or mutated relative to its native state, or (c) a polypeptide or nucleic acid molecule whose expression is altered compared to the native expression level under similar conditions. For example, a heterologous regulatory sequence (e.g., a promoter or an enhancer) may be used to regulate expression of a gene or nucleic acid molecule in a manner different from the manner in which the gene or nucleic acid molecule is normally expressed in nature. In another example, a heterologous domain of a polypeptide or nucleic acid sequence (e.g., a DNA binding domain of a polypeptide, or a nucleic acid encoding a DNA binding domain of a polypeptide) may be arranged differently relative to other domains, or may be a different sequence or come from a different source relative to other domains or portions of the polypeptide or its encoding nucleic acid. In some embodiments, a heterologous nucleic acid molecule may be present in the native host cell genome but may have an altered expression level, a different sequence, or both. In other embodiments, a heterologous nucleic acid molecule may not be endogenous to the host cell or host genome but may instead have been introduced into the host cell by transformation (e.g., transfection or electroporation), wherein the added molecule may integrate into the host genome or may exist as extrachromosomal genetic material either transiently (e.g., mRNA) or semi-stably for more than one generation (e.g., an episomal viral vector, a plasmid, or another self-replicating vector).

As used herein, the term "gene expression unit" refers to a nucleic acid sequence comprising at least one regulatory nucleic acid sequence operably linked to at least one effector sequence. A first nucleic acid sequence is operably linked to a second nucleic acid sequence when the first nucleic acid sequence is placed in a functional relationship with the second nucleic acid sequence. For example, a promoter or enhancer is operably linked to a coding sequence if it affects the transcription or expression of that coding sequence. Operably linked DNA sequences may be contiguous or non-contiguous. Where two protein-coding regions need to be joined, the operably linked sequences may be in the same reading frame.

As used herein, the terms "host genome" and "host cell" refer to a cell and/or its genome into which protein and/or genetic material has been introduced. It should be understood that these terms are intended to refer not only to the particular subject cell and/or genome but also to the progeny of such a cell and/or the genome of such progeny. Because certain modifications may occur in succeeding generations due to mutation or environmental influences, such progeny may in fact differ from the parent cell but are still included within the scope of the term "host cell" as used herein. A host genome or host cell may be an isolated cell or cell line grown in culture, or genomic material isolated from such a cell or cell line, or may be a host cell or host genome that makes up a living tissue or organism. In some cases, the host cell may be an animal cell or a plant cell, for example as described herein. In some cases, the host cell may be a cow, horse, pig, goat, sheep, chicken, or turkey cell. In some cases, the host cell may be a corn, soybean, wheat, or rice cell.

Genomic safe harbor site (GSH site), as used herein: a genomic safe harbor site is a site in a host genome that is able to accommodate the integration of new genetic material, for example such that the inserted genetic element does not cause significant alterations of the host genome that pose a risk to the host cell or organism. A GSH site generally meets 1, 2, 3, 4, 5, 6, 7, 8, or 9 of the following criteria: (i) it is located >300 kb from a cancer-related gene; (ii) it is located >300 kb from a miRNA or other functional small RNA; (iii) it is located >50 kb from a 5' gene end; (iv) it is located >50 kb from a replication origin; (v) it is located >50 kb from any ultraconserved element; (vi) it has low transcriptional activity (i.e., no mRNA within +/- 25 kb); (vii) it is not in a copy number variable region; (viii) it is in open chromatin; and/or (ix) it is unique, with 1 copy in the human genome. Examples of GSH sites in the human genome that meet some or all of these criteria include: (i) the adeno-associated virus site 1 (AAVS1), a natural integration site of AAV on chromosome 19; (ii) the chemokine (C-C motif) receptor 5 (CCR5) gene, a chemokine receptor gene known as an HIV-1 co-receptor; (iii) the human ortholog of the mouse Rosa26 locus; and (iv) the rDNA locus. Additional GSH sites are known and are described, for example, in Pellenz et al., published electronically on August 20, 2018 (https://doi.org/10.1101/396390).
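The nine GSH criteria listed above can be expressed as a simple checklist. The following is a minimal Python sketch, not a method described in the application: the `CandidateSite` record, its field names, and the example values are hypothetical, and the annotation values would in practice have to come from external genome annotations.

```python
from dataclasses import dataclass


@dataclass
class CandidateSite:
    """Hypothetical annotation record for one genomic position (distances in kb)."""
    dist_cancer_gene_kb: float
    dist_small_rna_kb: float
    dist_gene_5prime_end_kb: float
    dist_replication_origin_kb: float
    dist_ultraconserved_kb: float
    mrna_within_25kb: bool
    in_copy_number_variable_region: bool
    in_open_chromatin: bool
    copies_in_genome: int


def gsh_criteria(site: CandidateSite) -> dict:
    """Map each criterion label (i)-(ix) from the text to True/False."""
    return {
        "i": site.dist_cancer_gene_kb > 300,
        "ii": site.dist_small_rna_kb > 300,
        "iii": site.dist_gene_5prime_end_kb > 50,
        "iv": site.dist_replication_origin_kb > 50,
        "v": site.dist_ultraconserved_kb > 50,
        "vi": not site.mrna_within_25kb,
        "vii": not site.in_copy_number_variable_region,
        "viii": site.in_open_chromatin,
        "ix": site.copies_in_genome == 1,
    }


def count_met(site: CandidateSite) -> int:
    """Number of the nine criteria that the site satisfies."""
    return sum(gsh_criteria(site).values())


# Made-up example: every criterion is met except (v).
example = CandidateSite(450, 320, 80, 60, 40, False, False, True, 1)
print(count_met(example), gsh_criteria(example)["v"])
```

Because the text allows a GSH site to meet any number of the criteria from 1 to 9, the sketch reports a count rather than a pass/fail verdict.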

In some embodiments, the genomic safe harbor site is a Natural Harbor™ site. In some embodiments, the Natural Harbor™ site is ribosomal DNA (rDNA). In some embodiments, the Natural Harbor™ site is 5S rDNA, 18S rDNA, 5.8S rDNA, or 28S rDNA. In some embodiments, the Natural Harbor™ site is the Mutsu site in 5S rDNA. In some embodiments, the Natural Harbor™ site is the R2 site in 28S rDNA.

As used herein, a "pseudoknot sequence" refers to a nucleic acid (e.g., RNA) having a sequence with suitable self-complementarity to form a pseudoknot structure.

As used herein, a "stem-loop sequence" refers to a nucleic acid sequence (e.g., an RNA sequence) having sufficient self-complementarity to form a stem-loop, for example one having a stem comprising at least two (e.g., 3, 4, 5, 6, 7, 8, 9, or 10) base pairs and a loop having at least three (e.g., four) base pairs. The stem may contain mismatches or bulges.
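To make the stem-loop definition concrete, here is a minimal Python sketch that searches an RNA string for a hairpin with at least two stem base pairs and a loop of at least three positions. It is deliberately simplified and rests on several assumptions: only Watson-Crick pairs are counted (no G-U wobble), mismatches and bulges in the stem are not handled, loop length is counted in nucleotides, and the input contains only A, C, G, and U. The function name and defaults are illustrative.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def find_stem_loop(rna, min_stem=2, min_loop=3):
    """Return (start, stem_len, loop_len) for the first perfect hairpin found.

    Start positions are scanned left to right; at each start the longest
    possible stem is tried first. Returns None if no hairpin is found.
    """
    n = len(rna)
    for start in range(n):
        max_stem = (n - start - min_loop) // 2
        for stem in range(max_stem, min_stem - 1, -1):
            # The loop may be at most what is left after both stem arms.
            for loop in range(min_loop, n - start - 2 * stem + 1):
                five_arm = rna[start:start + stem]
                three_start = start + stem + loop
                three_arm = rna[three_start:three_start + stem]
                # The 5' arm pairs with the 3' arm read in reverse.
                if all(COMPLEMENT[a] == b
                       for a, b in zip(five_arm, reversed(three_arm))):
                    return start, stem, loop
    return None


print(find_stem_loop("GGGAAAACCC"))  # a GGG/CCC stem around an AAAA loop
print(find_stem_loop("AAAAAAAA"))    # no complementary arms
```

A structure-prediction package would be the appropriate tool for real sequences; the sketch only illustrates the minimum stem and loop sizes named in the definition.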

In some embodiments, the cell is an animal cell of an organism selected from the group consisting of cattle, sheep, goat, horse, pig, deer, chicken, duck, goose, rabbit, and fish.

In some embodiments, the cell is a mammalian cell. In some embodiments, the cell is a human cell. In some embodiments, the human cell is a human embryonic kidney 293T (HEK293T or 293T) cell or a HeLa cell. In some embodiments, the cell is a human embryonic kidney (HEK293T) cell. In some embodiments, the cell is a mouse Hepa1-6 cell. In some embodiments, the mammalian cell is selected from the group consisting of an immune cell, a hepatocyte, a tumor cell, a stem cell, a blood cell, a neural cell, a zygote, a muscle cell (such as a cardiomyocyte), and a skin cell.

In some embodiments, the cell is an immune cell selected from the group consisting of a cytotoxic T cell, a helper T cell, a natural killer (NK) T cell, an iNK-T cell, an NK-T-like cell, a γδ T cell, a tumor-infiltrating T cell, and a dendritic cell (DC)-activated T cell. In some embodiments, the method produces a modified immune cell, such as a CAR-T cell or a TCR-T cell.

In some embodiments, the cell is an embryonic stem (ES) cell, an induced pluripotent stem (iPS) cell, a progenitor cell of a gamete, a gamete, a zygote, or a cell in an embryo.

The retrotransposase of the present application comprises a target DNA binding domain containing a zinc finger binding motif, a reverse transcriptase domain, and an endonuclease domain, and is capable of reverse transcribing RNA into DNA.

In a specific embodiment, the amino acid sequence of the retrotransposase of the present application is as shown in any one of SEQ ID No. 1-6, SEQ ID No. 32-43, or SEQ ID No. 68-71, or has at least 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to the amino acid sequence of any one of SEQ ID No. 1-6, SEQ ID No. 32-43, or SEQ ID No. 68-71.
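As an illustration of how a percent-identity threshold such as "at least 70%" could be checked, the following Python sketch computes identity over two sequences that have already been aligned. The convention used here (identical non-gap columns divided by total alignment columns) is an assumption; the text above does not say which identity definition or alignment tool applies, and other conventions give different values. The toy sequences are placeholders, not any SEQ ID of the application.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumed convention: identical non-gap columns / alignment length * 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b)
                  if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a, aligned_b, threshold=70.0):
    """True if the identity is at least the given percentage."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy 10-residue alignment with one substitution.
print(percent_identity("MKTAYIAKQR", "MKTAYLAKQR"))
print(meets_threshold("MKTAYIAKQR", "MKTAYLAKQR", 95.0))
```

The alignment itself would normally be produced by a dedicated tool before such a calculation.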

In a specific embodiment, the amino acid sequence of the retrotransposase of the present application is a conservative mutant of the amino acid sequence shown in any one of SEQ ID No. 1-6, SEQ ID No. 32-43, or SEQ ID No. 68-71.

Each retrotransposase of the present application comprises 1 to 3 zinc finger domains (ZF), 1 Myb-like domain, 1 reverse transcriptase domain (RT), and 1 restriction enzyme-like endonuclease domain (RLE). At the N-terminus of the protein and between the domains there are amino acid sequences of varying lengths that have no obvious structure or conservation. Accordingly, the conservative mutants of the amino acid sequence shown in any one of SEQ ID No. 1-6, SEQ ID No. 32-43, or SEQ ID No. 68-71 include truncated sequences of that amino acid sequence that still have retrotransposase activity, obtained by removing the non-conserved or unstructured amino acid sequences at the N-terminus of the protein and between the four kinds of domains (the 1 to 3 ZF domains, the Myb-like domain, the RT domain, and the RLE domain). They also include conservative mutants that still have retrotransposase activity, obtained by mutation, deletion, insertion, or similar operations within the non-conserved or unstructured amino acid sequences at the N-terminus of the protein and between the four kinds of domains. They further include conservative mutants that still have retrotransposase activity, obtained by starting from such a truncated sequence and carrying out mutation, deletion, insertion, or similar operations within the remaining unstructured or non-conserved regions.

A region of a protein is considered structured if it has an obvious rigid structure (α-helix or β-sheet) in the resolved three-dimensional structure of the protein (obtainable from databases such as the PDB) or in a three-dimensional structure predicted by protein structure prediction software (e.g., AlphaFold).

For example, it may be a truncated form of protein #21, such as a truncated form in which amino acids 401-467 are deleted, which still has the retrotransposase activity required by the present application.

For example, it may be a variant of a truncated form of protein #21, such as a truncated form in which amino acids 401-467 are deleted and 32 linker amino acids are further added after amino acid 400, which still has the retrotransposase activity required by the present application.

For example, it may be a truncated form of protein #21, such as a truncated form in which amino acids 1-100 are deleted, which still has the retrotransposase activity required by the present application.

For example, it may be a truncated form of protein #21, such as a truncated form in which amino acids 1-200 are deleted, which still has the retrotransposase activity required by the present application.

For example, it may also be a truncated form of protein #21, such as a truncated form in which both amino acids 1-200 and amino acids 401-467 are deleted, which still has the retrotransposase activity required by the present application.
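The truncation examples for protein #21 use 1-based, inclusive residue numbering. A minimal Python sketch of the corresponding string operations follows. Everything concrete in it is a placeholder: the 600-residue `toy_protein` is not the sequence of protein #21, and the 32-residue `LINKER` is a hypothetical glycine-serine linker, since this passage does not give the linker sequence.

```python
def delete_residues(protein, start, end, linker=""):
    """Delete residues start..end (1-based, inclusive), optionally
    inserting a linker at the position of the deletion."""
    if not (1 <= start <= end <= len(protein)):
        raise ValueError("residue range lies outside the protein")
    return protein[:start - 1] + linker + protein[end:]


# Placeholder 600-residue sequence (NOT protein #21).
toy_protein = "ACDEFGHIKLMNPQRSTVWY" * 30
# Hypothetical 32-residue linker (the real linker is not given here).
LINKER = "GGGGS" * 6 + "GG"

internal_cut = delete_residues(toy_protein, 401, 467)
internal_cut_linked = delete_residues(toy_protein, 401, 467, LINKER)
n_terminal_cut = delete_residues(toy_protein, 1, 200)
# For a double truncation, delete the downstream range first so that
# the upstream residue numbers are still valid.
double_cut = delete_residues(delete_residues(toy_protein, 401, 467), 1, 200)

print(len(internal_cut), len(internal_cut_linked),
      len(n_terminal_cut), len(double_cut))
```

Applying deletions from the C-terminal side first avoids renumbering; whether a given truncation retains activity is an experimental question that the sketch does not address.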

The present application relates to a system for modifying DNA, the system comprising: the retrotransposase of the present application or a nucleic acid encoding the retrotransposase of the present application; and a donor RNA or a nucleic acid encoding the donor RNA, the donor RNA comprising a sequence that binds to the retrotransposase and a heterologous sequence. Preferably, the heterologous sequence is 1 to 50000 bases long, for example at least 1, 10, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000, 9500, 10000, 15000, 20000, 25000, 30000, 35000, 40000, or 45000 nt.

In a specific embodiment, as shown in FIG. 1, the retrotransposase and the donor RNA are encoded separately on two plasmids; the retrotransposase gene is expressed from a CAG promoter, and the donor RNA is also expressed from a CAG promoter. Embedded within the donor RNA expression cassette is a reverse-oriented expression cassette driven by a CMV promoter. In this CMV-driven cassette, CMV drives expression of GFP, but the GFP sequence is interrupted by an intron inserted in the opposite orientation. The system of the present application therefore works as follows. After the donor RNA is expressed from the CAG promoter, the intron contained in the donor RNA is spliced out of the transcript, so that the GFP sequence of the CMV-driven cassette becomes uninterrupted at the RNA level. At this stage, however, the GFP sequence is on an antisense RNA strand and cannot be translated into GFP protein. If and only if the retrotransposase permanently integrates the intron-free donor RNA into the DNA of the cell through its reverse transcriptase activity can the reverse-oriented, CMV-driven cassette express a normal, intron-free GFP mRNA and thereby produce green-fluorescent GFP protein.
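The logic of this intron-interrupted reporter can be traced with toy strings. The Python sketch below is only an illustration under stated assumptions: the sequences `GFP_5`, `GFP_3`, and `INTRON` are short placeholders rather than real GFP or intron sequences, splicing is modelled as simply removing the intron, and the DNA alphabet is used for RNA as well to keep strand bookkeeping simple.

```python
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA-alphabet string."""
    return seq.translate(_COMP)[::-1]


# Placeholder sequences (hypothetical, for illustration only).
GFP_5 = "ATGGTGAGCAAG"      # 5' part of the toy reporter ORF
GFP_3 = "GGCGAGTAA"         # 3' part of the toy reporter ORF
INTRON = "GTAAGTTTTTTTCAG"  # toy intron, written in donor-sense orientation
REPORTER = GFP_5 + GFP_3    # intact toy reporter


def donor_sense(spliced):
    """Donor-strand sequence: the reporter lies in antisense orientation,
    while the intron is in sense orientation and is spliced from the donor."""
    middle = "" if spliced else INTRON
    return revcomp(GFP_3) + middle + revcomp(GFP_5)


def cmv_read(donor_strand):
    """Sequence read by the reverse-oriented CMV cassette from DNA whose
    donor-sense strand is given."""
    return revcomp(donor_strand)


from_plasmid = cmv_read(donor_sense(spliced=False))
after_integration = cmv_read(donor_sense(spliced=True))

print(from_plasmid == REPORTER)       # intron still interrupts the ORF
print(after_integration == REPORTER)  # intact ORF only after retrotransposition
```

In this toy model the plasmid-derived CMV transcript still carries the reverse-complemented intron, whereas DNA copied from the spliced donor RNA yields the uninterrupted reporter, which mirrors why fluorescence reports integration.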

The heterologous sequence of the present application is selected from one or more of the following: a sequence encoding a polypeptide, a sequence comprising a tissue-specific promoter or enhancer, and a sequence encoding one or more introns.

In a specific embodiment, the polypeptide is a therapeutic polypeptide or a mammalian polypeptide; further preferably, the polypeptide is a therapeutic protein, a membrane protein, an intracellular protein, an extracellular protein, a structural protein, a signal transduction protein, a regulatory protein, a transport protein, an organelle protein, a sensory protein, a motor protein, a defense protein, a storage protein, a reporter protein, an antibody, an enzyme, or a coagulation factor.

In a specific embodiment, the polypeptide has 20 to 10,000 amino acids, for example any multiple of 10 from 30 to 500, any multiple of 50 from 550 to 1,000, or any multiple of 100 from 1,100 to 9,900 amino acids.

In a specific embodiment, the intracellular protein is selected from a cytoplasmic protein, a nuclear protein, an organelle protein, a mitochondrial protein, or a lysosomal protein. In a specific embodiment, the sequence encoding the polypeptide contains one or more introns.

The systems described herein can be used in vitro or in vivo. In some embodiments, the system or components of the system are delivered to cells (e.g., mammalian cells, such as human cells), for example in vitro or in vivo. In some embodiments, the cells are eukaryotic cells, for example cells of a multicellular organism such as an animal, for example a mammal (e.g., human, pig, or cattle), a bird (e.g., poultry such as chicken, turkey, or duck), or a fish. In some embodiments, the cells are non-human animal cells (e.g., from a laboratory animal, livestock, or a companion animal). In some embodiments, the cells are stem cells (e.g., hematopoietic stem cells), fibroblasts, or T cells. In some embodiments, the cells are non-dividing cells, such as non-dividing fibroblasts or non-dividing T cells. In some embodiments, the cells are plant cells. Those skilled in the art will appreciate that the components of the Gene Writer system can be delivered in the form of polypeptides, nucleic acids (e.g., DNA, RNA), and combinations thereof.

In the present application, the retrotransposase (e.g., as DNA encoding the retrotransposase protein, as RNA encoding the retrotransposase protein, or as the protein itself) and the donor RNA (e.g., as DNA encoding the RNA, or as RNA) can be delivered using any of the following combinations:

1. Retrotransposase DNA + donor DNA

2. Retrotransposase RNA + donor DNA

3. Retrotransposase DNA + donor RNA

4. Retrotransposase RNA + donor RNA

5. Retrotransposase protein + donor DNA

6. Retrotransposase protein + donor RNA

7. Retrotransposase virus + virus containing donor RNA or DNA

8. Retrotransposase virus + donor DNA

9. Retrotransposase virus + donor RNA

10. Retrotransposase DNA + virus containing donor RNA or DNA

11. Retrotransposase RNA + virus containing donor RNA or DNA

12. Retrotransposase protein + virus containing donor RNA or DNA
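The twelve combinations listed above are every pairing of four retrotransposase forms with three donor forms. The following is a minimal sketch that enumerates them; the variable and function names are illustrative assumptions, and the output order differs from the numbering used in the list.

```python
from itertools import product

# Forms taken from the list above; the names are illustrative only.
RETROTRANSPOSASE_FORMS = ["DNA", "RNA", "protein", "virus"]
DONOR_FORMS = ["DNA", "RNA", "virus containing donor RNA or DNA"]


def delivery_combinations():
    """Return every (retrotransposase form, donor form) pairing."""
    return list(product(RETROTRANSPOSASE_FORMS, DONOR_FORMS))


if __name__ == "__main__":
    for enzyme_form, donor_form in delivery_combinations():
        print(f"retrotransposase {enzyme_form} + donor {donor_form}")
```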

As described above, in some embodiments a virus is used to deliver the DNA or RNA encoding the retrotransposase protein, and in some embodiments a virus is used to deliver the donor RNA (or DNA encoding the donor RNA).

In one embodiment, the system and/or components of the system are delivered in the form of nucleic acids. For example, the retrotransposase polypeptide can be delivered in the form of DNA or RNA encoding the polypeptide, and the donor RNA can be delivered in the form of RNA or of a complementary DNA to be transcribed into RNA. In some embodiments, the system or components of the system are delivered on 1, 2, 3, 4, or more different nucleic acid molecules. In some embodiments, the system or components of the system are delivered as a combination of DNA and RNA. In some embodiments, the system or components of the system are delivered as a combination of DNA and protein. In some embodiments, the system or components of the system are delivered as a combination of RNA and protein. In some embodiments, the retrotransposase polypeptide is delivered as a protein.

In some embodiments, a vector is used to deliver the system or components of the system to a cell, such as a mammalian cell or a human cell. The vector can be, for example, a plasmid or a virus. In some embodiments, delivery is in vivo, in vitro, ex vivo, or in situ. In some embodiments, the virus is an adeno-associated virus (AAV), a lentivirus, or an adenovirus. In some embodiments, the system or components of the system are delivered to the cell with a virus-like particle or a virosome. In some embodiments, delivery uses more than one virus, virus-like particle, or virosome.

In one embodiment, the compositions and systems described herein can be formulated in liposomes or other similar vesicles. Liposomes are spherical vesicle structures composed of a unilamellar or multilamellar lipid bilayer surrounding an internal aqueous compartment and a relatively impermeable outer lipophilic phospholipid bilayer. Liposomes can be anionic, neutral, or cationic. Liposomes are biocompatible and non-toxic, can deliver both hydrophilic and lipophilic drug molecules, protect their cargo from degradation by plasma enzymes, and transport their load across biological membranes and the blood-brain barrier (BBB).

Vesicles can be made from several different types of lipids; however, phospholipids are most commonly used to produce liposomes as drug carriers. Methods for preparing multilamellar vesicle lipids are known in the art (see, e.g., U.S. Patent No. 6,693,086, whose teachings on the preparation of multilamellar vesicle lipids are incorporated herein by reference). Although vesicles form spontaneously when a lipid film is mixed with an aqueous solution, their formation can also be accelerated by applying force in the form of shaking using a homogenizer, sonicator, or extrusion apparatus. Extruded lipids can be prepared by extrusion through filters of decreasing size.

Lipid nanoparticles are another example of a carrier that provides a biocompatible and biodegradable delivery system for the pharmaceutical compositions described herein. Nanostructured lipid carriers (NLCs) are modified solid lipid nanoparticles (SLNs) that retain the properties of SLNs, improve drug stability and drug loading, and prevent drug leakage. Polymer nanoparticles (PNPs) are an important component of drug delivery. These nanoparticles can effectively direct drug delivery to specific targets and improve drug stability and controlled drug release. Lipid-polymer nanoparticles (PLNs), a newer type of carrier that combines liposomes and polymers, can also be used. These nanoparticles have the complementary advantages of PNPs and liposomes. A PLN has a core-shell structure: the polymer core provides a stable structure, and the phospholipid shell provides good biocompatibility. Together, the two components improve drug encapsulation efficiency, facilitate surface modification, and prevent the leakage of water-soluble drugs.

In a specific embodiment, the 5' untranslated sequence (5'UTR) of the donor RNA that binds the retrotransposase is a natural sequence or a non-natural sequence. The non-natural 5' untranslated sequence (5'UTR) is a sequence obtained by adding, deleting, and/or substituting nucleotides relative to the natural 5'UTR sequence.

In a specific embodiment, the 3' untranslated sequence (3'UTR) of the donor RNA that binds the retrotransposase is a natural sequence or a non-natural sequence. The non-natural 3' untranslated sequence (3'UTR) is a sequence obtained by adding, deleting, and/or substituting nucleotides relative to the natural 3'UTR sequence.

In a specific embodiment, the 5' untranslated sequence (5'UTR) of the donor RNA that binds the retrotransposase has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the nucleotide sequence of any one of SEQ ID No. 7 to 12 or SEQ ID No. 44 to 55.

In a specific embodiment, the 3' untranslated sequence (3'UTR) of the donor RNA that binds the retrotransposase has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the nucleotide sequence of any one of SEQ ID No. 13 to 18 or SEQ ID No. 56 to 67.

In a specific embodiment, the non-natural 5' untranslated sequence (5'UTR) has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the nucleotide sequence of any one of SEQ ID No. 19 to 21.

In a specific embodiment, the non-natural 3' untranslated sequence (3'UTR) has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the nucleotide sequence of any one of SEQ ID No. 22 to 23.
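The identity thresholds above can be illustrated with a short calculation. The sketch below assumes the two sequences have already been aligned (gaps written as `-`) and counts identical columns over the alignment length; the application does not state which alignment method or identity definition it uses, so both the function names and the definition are assumptions made for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same base.

    Assumes a precomputed pairwise alignment with '-' for gaps. Columns that
    are gaps in both sequences are ignored; a gap against a base is a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 70.0) -> bool:
    """True if the pair reaches the given identity threshold (70% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Placeholder sequences, not taken from the sequence listing.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
```

Other conventions (for example, dividing by the length of the shorter sequence) give different values for gapped alignments, so the definition should be fixed before comparing against a threshold.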

The present application also relates to the use of the above non-natural 3' untranslated sequence (3'UTR) and non-natural 5' untranslated sequence (5'UTR).

Plasmid construction

The protein sequence of the retrotransposase of the present application was codon-optimized (human) and synthesized, and the coding DNA fragment was inserted between the XmaI and NheI restriction sites of the pCAG-SV40poly(A) vector by Gibson cloning. For the donor RNA expression plasmid, the segments comprising GFP(N)-intron-GFP(C) (SEQ ID No. 24), the 5'-UTR (any one of SEQ ID No. 7 to 12), the 3'-UTR (any one of SEQ ID No. 13 to 18), and the CMV promoter (SEQ ID No. 25) were each amplified by overlap PCR, and the resulting DNA fragments were then assembled into the pSV40-mCherry vector by Gibson cloning to construct the plasmid expressing the donor RNA.
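Overlap PCR and Gibson cloning both join fragments through identical terminal sequences shared by neighbouring fragments. As a rough illustration only, the sketch below joins an ordered list of fragments in silico and fails if two neighbours share no sufficient overlap. The overlap lengths actually used in the application are not stated; the default of 20 bases and the function name are assumptions.

```python
def assemble(fragments, min_overlap=20):
    """Join fragments in the given order through shared terminal overlaps.

    Each fragment must begin with the same bases that end the growing product,
    over at least `min_overlap` bases; the longest such overlap is used.
    """
    if not fragments:
        raise ValueError("no fragments given")
    product = fragments[0]
    for nxt in fragments[1:]:
        longest = min(len(product), len(nxt))
        for k in range(longest, min_overlap - 1, -1):
            if product[-k:] == nxt[:k]:
                product += nxt[k:]  # append only the non-overlapping part
                break
        else:
            raise ValueError("adjacent fragments share no sufficient overlap")
    return product


if __name__ == "__main__":
    # Toy fragments with a 6-base overlap; not sequences from the application.
    print(assemble(["AAAACCCCGGGG", "CCGGGGTTTT"], min_overlap=4))
```

A check of this kind can catch a mis-ordered or mis-designed fragment before synthesis, but it says nothing about melting temperature or secondary structure, which real primer design also has to consider.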

Cell culture, transfection, and fluorescence-activated cell sorting (FACS)

HEK293T cells (obtained from ATCC) were cultured in DMEM (Gibco) containing 1% penicillin-streptomycin (Gibco) and 10% fetal bovine serum (Gibco). The cells were seeded in 24-well cell culture plates (Corning) and grown for 16 hours, until the cell density reached 70%-90%. Using Lipofectamine 3000 (Invitrogen), 250 ng of the plasmid encoding the retrotransposase protein (any one of SEQ ID No. 1 to 6) and 250 ng of the plasmid expressing the donor RNA were transfected into each well of the 24-well plate. Twenty-four hours after transfection, the cells were dissociated with trypsin-EDTA (0.05%) (Gibco). Cells with mCherry signal were then sorted on a MoFlo XDP (Beckman Coulter) instrument and re-seeded into 12-well plates. After a further 6 days of culture, the cells were dissociated with trypsin-EDTA (0.05%) (Gibco), and the proportion of GFP-positive cells was analyzed on a BD FACSAria™ Fusion Cell Sorter (BD). Whether the novel retrotransposase system functions in mammalian cells was determined by comparing this proportion with that of the negative control, together with observation under a fluorescence microscope.
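When several wells are transfected with the same 250 ng + 250 ng plasmid pair, the plasmid volumes are usually computed for a master mix. The helper below is a minimal sketch of that arithmetic; the stock concentrations, the 10% overage, and the function name are hypothetical and are not taken from the application.

```python
def plasmid_volume_ul(ng_per_well, wells, stock_ng_per_ul, overage=0.10):
    """Volume of plasmid stock (in microlitres) for a master mix.

    ng_per_well     : plasmid mass per well (250 ng per plasmid in the protocol above)
    wells           : number of wells to transfect
    stock_ng_per_ul : concentration of the plasmid stock (hypothetical input)
    overage         : extra fraction prepared to cover pipetting loss (assumed)
    """
    if stock_ng_per_ul <= 0:
        raise ValueError("stock concentration must be positive")
    if wells < 0 or ng_per_well < 0 or overage < 0:
        raise ValueError("inputs must be non-negative")
    return ng_per_well * wells * (1 + overage) / stock_ng_per_ul


if __name__ == "__main__":
    # Hypothetical stocks: retrotransposase plasmid 500 ng/uL, donor plasmid 400 ng/uL.
    for name, stock in [("retrotransposase plasmid", 500.0), ("donor plasmid", 400.0)]:
        print(name, round(plasmid_volume_ul(250, 6, stock), 2), "uL for 6 wells")
```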

Examples

Specific embodiments of the present invention are described in more detail below with reference to the accompanying drawings. Although the drawings show specific embodiments of the invention, it should be understood that the invention can be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the invention can be understood more thoroughly and its scope conveyed fully to those skilled in the art.

Example 1: Construction of the GFP reporter system

This example designs a reporter system that accurately reflects whether the retrotransposase system of the present application functions in mammalian cells (Figure 1). Specifically, the retrotransposase protein and the donor RNA are encoded separately on two plasmids, which reflects the modularity of the novel retrotransposase system. The donor RNA is expressed from a CAG promoter. Notably, the CAG-driven donor RNA expression cassette also carries an embedded reverse-orientation expression cassette driven by a CMV promoter. In this CMV-driven cassette, CMV drives GFP, but the GFP coding sequence is interrupted by an intron inserted in the opposite orientation.

The mode of action of the system of the present application is therefore as follows. After the donor RNA is transcribed from the CAG promoter, the intron it contains is spliced out of the transcript, so that the GFP sequence of the CMV-driven cassette is restored to its uninterrupted form at the RNA level. At this stage, however, the GFP sequence is an antisense RNA strand and cannot be translated into GFP protein. Only after the novel retrotransposase, through its reverse transcriptase activity, has permanently integrated the intron-free donor RNA into the DNA of the cell can the CMV-driven reverse cassette express a normal, intron-free GFP mRNA and, in turn, green-fluorescent GFP protein. Therefore, by using fluorescence microscopy and flow cytometry to detect whether GFP-positive cells are present and in what proportion, we can determine whether the novel retrotransposase works in mammalian cells and how active it is.
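The logic of this reporter can be traced with a toy string model. In the sketch below, all sequences are short placeholders (not the sequences of SEQ ID No. 24), RNA is written in the DNA alphabet for simplicity, and splicing is reduced to removing the intron only when it is read in its spliceable orientation. These simplifications, and all names, are assumptions made for illustration.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA-alphabet sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Toy placeholders for the two GFP halves and the intron.
GFP_N = "ATGGTGAGCAAGGGC"
GFP_C = "GAGCTGTACAAGTAA"
INTRON = "GTAAGTCCCCCCTTTTAG"  # spliceable only when read in this orientation
GFP_ORF = GFP_N + GFP_C

# Strand read from the CAG promoter: GFP is antisense, the intron is sense.
DONOR_SENSE = revcomp(GFP_C) + INTRON + revcomp(GFP_N)


def splice(transcript: str) -> str:
    """Remove the intron if it is present in its spliceable orientation."""
    return transcript.replace(INTRON, "", 1)


def cmv_transcript(cag_strand: str) -> str:
    """The CMV cassette is read from the opposite strand, then spliced if possible."""
    return splice(revcomp(cag_strand))


# 1. From the plasmid, the CMV transcript sees the intron backwards: GFP stays interrupted.
plasmid_gfp = cmv_transcript(DONOR_SENSE)
# 2. The CAG transcript loses the intron, but carries GFP as antisense.
donor_rna = splice(DONOR_SENSE)
# 3. After reverse transcription and integration, CMV reads an intact GFP ORF.
integrated_gfp = cmv_transcript(donor_rna)

if __name__ == "__main__":
    print("plasmid CMV transcript is GFP:   ", plasmid_gfp == GFP_ORF)
    print("spliced donor RNA is GFP:        ", donor_rna == GFP_ORF)
    print("integrated CMV transcript is GFP:", integrated_gfp == GFP_ORF)
```

Only the third case yields the intact reading frame, which is why GFP fluorescence in this design reports integration rather than mere plasmid expression.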

Example 2: Novel retrotransposases function in mammalian cells

Based on the GFP reporter system designed in Example 1, this example tests multiple novel retrotransposase systems. Note that the donor RNA of a novel retrotransposase system generally comprises five parts: the left homology arm (SEQ ID No. 26), the 5'UTR sequence, the 3'UTR sequence, the sequence carrying the new genetic information located between the 5'UTR and 3'UTR sequences, and the right homology arm (SEQ ID No. 27) (Figure 1).
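The five-part donor layout can be written down as a small data structure. The sketch below assumes the linear order left homology arm, 5'UTR, payload, 3'UTR, right homology arm (the text states only that the payload lies between the two UTRs; placing the arms outermost is an assumption based on the description of Figure 1). The class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DonorRNA:
    """Five-part donor layout; the part order is an assumption (see lead-in)."""

    left_arm: str
    utr5: str
    payload: str
    utr3: str
    right_arm: str

    def parts(self):
        """Named parts in their assumed 5'-to-3' order."""
        return [
            ("homology arm-left", self.left_arm),
            ("5'UTR", self.utr5),
            ("payload", self.payload),
            ("3'UTR", self.utr3),
            ("homology arm-right", self.right_arm),
        ]

    def sequence(self) -> str:
        """Concatenated donor sequence."""
        return "".join(seq for _, seq in self.parts())

    def payload_span(self):
        """0-based, end-exclusive coordinates of the payload in the donor."""
        start = len(self.left_arm) + len(self.utr5)
        return start, start + len(self.payload)


if __name__ == "__main__":
    # Placeholder sequences only.
    donor = DonorRNA("AA", "CC", "GGGG", "TT", "AA")
    print(donor.sequence(), donor.payload_span())
```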

Plasmid construction

The protein sequences of the novel retrotransposases tested in this example were codon-optimized (human) and synthesized, and the coding DNA fragments were inserted between the XmaI and NheI restriction sites of the pCAG-SV40poly(A) vector by Gibson cloning. For the donor RNA expression plasmid, the segments comprising the GFP sequence, the intron sequence, the 5'-UTR, the 3'-UTR, and the CMV promoter were each amplified by overlap PCR, and the resulting DNA fragments were then assembled into the pSV40-mCherry vector by Gibson cloning to construct the plasmid expressing the donor RNA.

Cell culture, transfection, and fluorescence-activated cell sorting (FACS)

HEK293T cells were cultured in DMEM (Gibco) containing 1% penicillin-streptomycin (Gibco) and 10% fetal bovine serum (Gibco). The cells were seeded in 24-well cell culture plates (Corning) and grown for 16 hours, until the cell density reached 70%-90%. Using Lipofectamine 3000 (Invitrogen), 250 ng of the plasmid encoding the retrotransposase protein and 250 ng of the plasmid expressing the donor RNA were transfected into each well of the 24-well plate. Twenty-four hours after transfection, the cells were dissociated with trypsin-EDTA (0.05%) (Gibco). Cells with mCherry signal were then sorted on a MoFlo XDP (Beckman Coulter) instrument and re-seeded into 12-well plates. After a further 6 days of culture, the cells were dissociated with trypsin-EDTA (0.05%) (Gibco), and the proportion of GFP-positive cells was analyzed on a BD FACSAria™ Fusion Cell Sorter (BD). Whether the novel retrotransposase system functions in mammalian cells was determined by comparing this proportion with that of the negative control, together with observation under a fluorescence microscope.

In this example, we used the reporter system constructed in Example 1 to test 33 different retrotransposase systems. The retrotransposase sequences of some of these systems, together with the corresponding 5'UTR and 3'UTR sequences, are listed in Table 1 below. Six novel systems gave a GFP signal clearly above that of the negative control (Figure 2). The protein sequences and the 5'UTR and 3'UTR sequences of these six novel retrotransposase systems (#3, #21, #23, #24, #31, #33) are given in the sequence listing. A detailed analysis of novel retrotransposase system #21 showed that, relative to the negative control, it exhibited higher large-fragment gene integration activity (Figure 3), and GFP-positive cells were also clearly visible under the fluorescence microscope (Figure 4). This example therefore demonstrates novel retrotransposase tools capable of integrating large gene fragments in mammalian cells.

Table 1. Some of the retrotransposase systems verified in the examples, with some of the corresponding 5'UTR and 3'UTR sequences

Example 3: Donor RNA containing a non-natural 5'UTR and/or a non-natural 3'UTR can increase the efficiency of the novel retrotransposase

Based on the GFP reporter system designed in Example 1, this example tests the effect of multiple donor RNAs containing a non-natural 5'UTR and/or a non-natural 3'UTR on the efficiency of the novel retrotransposase (#21). Note that the donor RNA of a novel retrotransposase system generally comprises five parts: the left homology arm (SEQ ID No. 26), the 5'UTR sequence (SEQ ID No. 8, SEQ ID No. 19-21), the 3'UTR sequence (SEQ ID No. 14, SEQ ID No. 22-23), the sequence carrying the new genetic information located between the 5'UTR and 3'UTR sequences, and the right homology arm (SEQ ID No. 27) (Figure 1).

Plasmid construction

The protein sequence of the novel retrotransposase tested in this example was codon-optimized (human) and synthesized, and the coding DNA fragment was inserted between the XmaI and NheI restriction sites of the pCAG-SV40poly(A) vector by Gibson cloning. For the donor RNA expression plasmid, the segments comprising GFP(N)-intron-GFP(C), the non-natural 5'-UTR, the non-natural 3'-UTR, and the CMV promoter were each amplified by overlap PCR, and the resulting DNA fragments were then assembled into the pSV40-mCherry vector by Gibson cloning to construct the plasmid expressing the donor RNA.

Cell culture, transfection, and fluorescence-activated cell sorting (FACS)

HEK293T cells were cultured in DMEM (Gibco) containing 1% penicillin-streptomycin (Gibco) and 10% fetal bovine serum (Gibco). The cells were seeded in 24-well cell culture plates (Corning) and grown for 16 hours, until the cell density reached 70%-90%. Using Lipofectamine 3000 (Invitrogen), 250 ng of the plasmid encoding the retrotransposase protein and 250 ng of the plasmid expressing the donor RNA were transfected into each well of the 24-well plate. Twenty-four hours after transfection, the cells were dissociated with trypsin-EDTA (0.05%) (Gibco). Cells with mCherry signal were then sorted on a MoFlo XDP (Beckman Coulter) instrument and re-seeded into 12-well plates. After a further 6 days of culture, the cells were dissociated with trypsin-EDTA (0.05%) (Gibco), and the proportion of GFP-positive cells was analyzed on a BD FACSAria™ Fusion Cell Sorter (BD). Whether the novel retrotransposase system functions in mammalian cells was determined by comparing this proportion with that of the negative control, together with observation under a fluorescence microscope.

In this example, we used the reporter system constructed in Example 1 to test the effect of multiple donor RNAs containing a non-natural 5'UTR and/or a non-natural 3'UTR on the efficiency of the novel retrotransposase. We found that several donor RNAs containing a non-natural 5'UTR and/or a non-natural 3'UTR increased the efficiency of the novel retrotransposase (Figure 5).
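A comparison of this kind is typically expressed as the percentage of GFP-positive cells for each donor, divided by the percentage obtained with a reference donor. The sketch below shows that arithmetic; the event counts are hypothetical placeholders and are not results from the application, and the choice of the natural-UTR donor as the reference is an assumption.

```python
def percent_positive(gfp_positive_events: int, total_events: int) -> float:
    """Percentage of analysed events that fall in the GFP-positive gate."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 <= gfp_positive_events <= total_events:
        raise ValueError("gfp_positive_events must lie between 0 and total_events")
    return 100.0 * gfp_positive_events / total_events


def fold_over_reference(sample_percent: float, reference_percent: float) -> float:
    """Ratio of a sample's GFP-positive percentage to that of a reference donor."""
    if reference_percent <= 0:
        raise ValueError("reference percentage must be positive")
    return sample_percent / reference_percent


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    natural = percent_positive(50, 10_000)
    variant = percent_positive(120, 10_000)
    print(f"natural UTR donor: {natural:.2f}%")
    print(f"variant UTR donor: {variant:.2f}% ({fold_over_reference(variant, natural):.1f}x)")
```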

Example 4: The corresponding written DNA fragment can be amplified after treatment with the novel retrotransposases

Using the system of Example 2, different retrotransposase systems were tested in mammalian cells; the cells were then harvested, genomic DNA was extracted, and PCR amplification was performed (primer sequence 1). Figure 7 shows the PCR amplification results for the junction between the 3' end of the DNA sequence integrated by the different retrotransposases and the genome (within the 28S rDNA gene) in mammalian cells.

In this example, multiple novel retrotransposase systems were tested. The donor RNA of a novel retrotransposase system generally comprises five parts: the left homology arm (SEQ ID No. 26), the 5'UTR sequence, the 3'UTR sequence, the sequence carrying the new genetic information located between the 5'UTR and 3'UTR sequences, and the right homology arm (SEQ ID No. 27) (Figure 1).

Plasmid construction

The protein sequences of the novel retrotransposases tested in this example were codon-optimized (human) and synthesized, and the coding DNA fragments were inserted between the XmaI and NheI restriction sites of the pCAG-SV40poly(A) vector by Gibson cloning. For the donor RNA expression plasmid, the segments comprising the GFP sequence, the intron sequence, the 5'-UTR, the 3'-UTR, and the CMV promoter were each amplified by overlap PCR, and the resulting DNA fragments were then assembled into the pSV40-mCherry vector by Gibson cloning to construct the plasmid expressing the donor RNA.

Cell culture and transfection

HEK293T cells were cultured in DMEM (Gibco) containing 1% penicillin-streptomycin (Gibco) and 10% fetal bovine serum (Gibco). The cells were seeded in 24-well cell culture plates (Corning) and grown for 16 hours, until the cell density reached 70%-90%. Using Lipofectamine 3000 (Invitrogen), 250 ng of the plasmid encoding the retrotransposase protein and 250 ng of the plasmid expressing the donor RNA were transfected into each well of the 24-well plate. Twenty-four hours after transfection, the cells were dissociated with trypsin-EDTA (0.05%) (Gibco). Cells with mCherry signal were then sorted on a MoFlo XDP (Beckman Coulter) instrument and re-seeded into 12-well plates. After a further 6 days of culture, the cells were dissociated with trypsin-EDTA (0.05%) (Gibco).

In this example, we used the reporter system constructed in Example 1 to test 33 different retrotransposase systems, and used primer 1 (see Table 2 for the primer 1 sequence) to PCR-amplify, for each retrotransposase system, the junction between the 3' end of the integrated DNA sequence and the genome (within the 28S rDNA gene) in mammalian cells; the primer positions are shown in Figure 6. Compared with the negative control group transfected only with the donor-expressing plasmid, 16 of the experimental groups transfected with both the donor-expressing plasmid and the retrotransposase-expressing plasmid yielded the corresponding positive PCR fragment (Figure 7). In each pair of gel lanes in Figure 7, the left lane is the experimental group co-transfected with the plasmids expressing the donor and the retrotransposase protein, and the right lane is the control group transfected only with the donor-expressing plasmid. Black triangles mark the corresponding positive PCR fragments. As shown in Figure 7, some of the retrotransposase systems (proteins #3, #4, #5, #8, #10, #11, #14, #17, #21, #22, #24, #25, #27, #29, #31, and #32) yielded the corresponding written fragment.
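The reasoning behind a junction PCR is that one primer binds inside the integrated sequence and the other binds in the flanking genome, so a product can form only if the two are physically joined. The following toy in-silico PCR illustrates this; all sequences and primers are short placeholders, not the primer 1 sequences of Table 2 or real 28S rDNA sequence, and exact string matching stands in for primer annealing.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA-alphabet sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon(template: str, forward: str, reverse: str):
    """Return the product amplified from the top strand, or None.

    The forward primer must match the template; the reverse primer must match
    the bottom strand downstream of the forward primer.
    """
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = revcomp(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Placeholder sequences (hypothetical).
GENOME_LEFT = "GGATCCGGAT"
GENOME_RIGHT = "TTGACCTTGA"
INSERT = "ACGTACGTCAGT"

UNMODIFIED = GENOME_LEFT + GENOME_RIGHT
INTEGRATED = GENOME_LEFT + INSERT + GENOME_RIGHT

FORWARD = INSERT[-6:]                # binds the 3' end of the insert
REVERSE = revcomp(GENOME_RIGHT[-6:]) # binds the downstream genomic flank

if __name__ == "__main__":
    print("integrated locus:", amplicon(INTEGRATED, FORWARD, REVERSE))
    print("unmodified locus:", amplicon(UNMODIFIED, FORWARD, REVERSE))
```

In the toy model, only the integrated locus gives a product, which mirrors the comparison between the co-transfected lanes and the donor-only control lanes described above.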

Table 2

Example 5: Novel retrotransposases can integrate the GFP gene

Using the system of Example 2, different retrotransposase systems were tested in mammalian cells. The cells were then harvested, genomic DNA was extracted, and PCR amplification was performed (primer sequence 2). Figure 8 shows the PCR results for the region spanning the intron in the sequences integrated by the different retrotransposase systems in mammalian cells.

In this example, multiple novel retrotransposase systems were tested. The donor RNA of a novel retrotransposase system generally comprises five parts: a left homology arm (SEQ ID No.26), a 5'UTR sequence, a 3'UTR sequence, a sequence carrying the new genetic information located between the 5'UTR and 3'UTR sequences, and a right homology arm (SEQ ID No.27) (Figure 1).
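The five-part donor layout described above can be pictured with a minimal sketch. The class name `DonorRNA`, its field names, and the placeholder sequences are illustrative assumptions and are not taken from the application; the 5'-to-3' order follows the arrangement recited in the claims (first homology domain, 5'UTR, heterologous sequence, 3'UTR, second homology domain).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DonorRNA:
    """Hypothetical container for the five donor parts, listed 5' to 3'."""
    left_arm: str   # left homology arm (SEQ ID No.26 in the application)
    utr5: str       # 5'UTR bound by the retrotransposase
    payload: str    # heterologous sequence carrying the new genetic information
    utr3: str       # 3'UTR bound by the retrotransposase
    right_arm: str  # right homology arm (SEQ ID No.27 in the application)

    def parts(self):
        """Return the parts as (name, sequence) pairs in 5'-to-3' order."""
        return [
            ("left_arm", self.left_arm),
            ("utr5", self.utr5),
            ("payload", self.payload),
            ("utr3", self.utr3),
            ("right_arm", self.right_arm),
        ]

    def sequence(self) -> str:
        """Concatenate all parts into the full donor sequence."""
        return "".join(seq for _, seq in self.parts())

    def layout(self):
        """Return (name, start, end) using 1-based inclusive coordinates."""
        out, pos = [], 1
        for name, seq in self.parts():
            out.append((name, pos, pos + len(seq) - 1))
            pos += len(seq)
        return out


# Toy placeholder sequences only; the real sequences are in the sequence listing.
donor = DonorRNA("AAAA", "CC", "GGGGGG", "UU", "AAA")
print(donor.sequence())
print(donor.layout())
```

Keeping the parts separate until the final concatenation makes it straightforward to swap in a different payload or UTR pair while leaving the homology arms unchanged.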

Plasmid construction

The polypeptide sequences of the novel retrotransposases tested in this example were codon-optimized (human) and synthesized, and the coding DNA fragments were inserted between the XmaI and NheI restriction sites of the pCAG-SV40 poly(A) vector by Gibson cloning. For the donor RNA expression plasmid, the individual segments containing the GFP sequence, the intron sequence, the 5'UTR and 3'UTR sequences, and the CMV promoter were amplified separately by overlap PCR, and the resulting DNA fragments were then assembled into the pSV40-mCherry vector by Gibson cloning to give the plasmid expressing the donor RNA.

Cell culture and transfection

HEK293T cells were cultured in DMEM (Gibco) containing 1% penicillin-streptomycin (Gibco) and 10% fetal bovine serum (Gibco). Cells were seeded in 24-well cell culture plates (Corning) and grown for 16 hours until they reached 70%-90% confluence. Using Lipofectamine 3000 (Invitrogen), 250 ng of the plasmid encoding the retrotransposase protein and 250 ng of the plasmid expressing the donor RNA were transfected into each well. Twenty-four hours after transfection, the cells were digested with trypsin-EDTA (0.05%) (Gibco). mCherry-positive cells were then sorted on a MoFlo XDP instrument (Beckman Coulter), re-seeded into 12-well plates, cultured for a further 6 days, and then digested with trypsin-EDTA (0.05%) (Gibco).

In this example, we used the reporter system constructed in Example 1 to test 33 different retrotransposase systems. Using primer 2 (sequence shown in Table 3), we PCR-amplified the region spanning the intron in the sequences integrated by the different retrotransposase systems in mammalian cells. Compared with the negative control group transfected only with the donor-expressing plasmid, 12 of the novel systems in the experimental groups transfected with both the donor-expressing and retrotransposase-expressing plasmids yielded the small, intron-free DNA band (251 bp) (Figure 8). In each pair of gel lanes in Figure 8, the left lane is the experimental group co-transfected with the donor and retrotransposase plasmids, and the right lane is the control group transfected with the donor plasmid only. As shown in Figure 8, a subset of the retrotransposase systems (proteins #3, #4, #5, #8, #10, #11, #14, #21, #25, #29, #31 and #32) yielded the corresponding written-in fragment. Taken together, these results indicate that proteins #3, #4, #5, #8, #10, #11, #14, #17, #21, #24, #25, #27, #29, #31 and #32, combined with their respective donors, can achieve site-specific DNA integration in mammalian cells. However, when combined with the GFP fluorescence flow-sorting experiment (Example 2), the results indicate that #3, #21, #23, #24, #31 and #33 can achieve complete integration of the GFP gene.
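The logic of the intron-spanning PCR can be illustrated with a small in-silico sketch: a copy that passed through an RNA intermediate has lost the intron, so the same primer pair gives a shorter product than it would on the intron-containing plasmid. The function names, the exact-match primer search, and all sequences below are toy assumptions for illustration; they are not the primers or templates of the application and do not reproduce the 251 bp band.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def amplicon_length(template: str, fwd: str, rev: str):
    """Length of the product of fwd/rev on the top strand, or None.

    Uses exact matching only: the first forward-primer site, then the first
    site downstream of it that the reverse primer anneals to.
    """
    template = template.upper()
    start = template.find(fwd.upper())
    if start == -1:
        return None
    rev_site = reverse_complement(rev)  # reverse primer as seen on the top strand
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return end + len(rev_site) - start


# Toy reporter: two exon halves separated by an intron (all hypothetical).
exon1 = "ATGGTGAGCAAGGGC"
intron = "GTAAGT" + "T" * 20 + "CAG"
exon2 = "GAGGAGCTGTTCACC"

fwd_primer = "ATGGTGAGC"
rev_primer = reverse_complement("CTGTTCACC")

unspliced = exon1 + intron + exon2  # plasmid-like template, intron present
spliced = exon1 + exon2             # integrated copy after splicing

print(amplicon_length(unspliced, fwd_primer, rev_primer))
print(amplicon_length(spliced, fwd_primer, rev_primer))
```

In this toy case the two products differ by exactly the intron length, which is why a smaller band is read as evidence that the integrated sequence was copied from spliced RNA rather than amplified from residual plasmid.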

Table 3

Example 6: Exploring the activity of truncated retrotransposase proteins in mammalian cells

The retrotransposase protein sequences were aligned using MAFFT (MUSCLE, Clustal and BLAST offer similar functionality), and pairwise protein similarity was then calculated using needle (BLAST offers similar functionality). In parallel, domain prediction was performed for the retrotransposase proteins using the InterPro website (the HHpred website, the NCBI CDD website, PSI-BLAST, BLASTP and HH-suite offer similar functionality). Based on the multiple sequence alignment and the domain predictions, the proteins of this application were all found to contain 1 to 3 zinc finger domains (ZF), one Myb-like domain, one reverse transcriptase domain (RT) and one restriction enzyme-like endonuclease domain (RLE). At the N-terminus of the protein and between the domains there are amino acid stretches of varying length that show no obvious structure or conservation, and we infer that these stretches are not decisive for protein activity. We therefore hypothesized that proteins in which these stretches are deleted or replaced would still be active in site-specific gene integration in mammalian cells. To test this, taking protein #21 as an example and using exactly the same system as in Example 2, we constructed the following four mutants (sequences shown in Table 4), with the 5'UTR and 3'UTR being the sequences shown in SEQ ID No.8 and SEQ ID No.14, respectively. As shown in Figure 9, all four truncated retrotransposase proteins retained site-specific gene integration activity in mammalian cells. This indicates that the proteins explored in this application have a minimal architecture that retains activity and that, based on structure prediction for sequences of the same class, this finding can be extended to other proteins of the same type.
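As a rough illustration of the pairwise similarity step, the sketch below computes percent identity from a simple global (Needleman-Wunsch) alignment. It is not the needle program itself: needle uses a substitution matrix with affine gap penalties, whereas this sketch assumes a flat match/mismatch/gap score. The function name `global_identity` and the toy sequences are hypothetical.

```python
def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Percent identity over the columns of one optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back one optimal path, counting identical columns and all columns.
    i, j = n, m
    matches = columns = 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return 100.0 * matches / columns if columns else 0.0


print(global_identity("MKTAYIAK", "MKTAYIAK"))
print(global_identity("ACGT", "ACGA"))
print(global_identity("ACGT", "AGT"))
```

Identity is reported here as identical columns divided by alignment length, gaps included; other conventions (for example dividing by the shorter sequence length) give different percentages, so the convention should be stated whenever a threshold such as "at least 70% identity" is applied.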

Table 4

Although embodiments of the present invention have been described above with reference to the accompanying drawings, the invention is not limited to the specific embodiments and fields of application described; these specific embodiments are merely illustrative and instructive, not restrictive. In light of this specification, and without departing from the scope protected by the claims of the present invention, those of ordinary skill in the art may devise many further forms, all of which fall within the protection of the present invention.

The sequences involved in this application are as follows:

>#3 protein (SEQ ID No.1)

>#21 protein (SEQ ID No.2)

>#23 protein (SEQ ID No.3)

>#24 protein (SEQ ID No.4)

>#31 protein (SEQ ID No.5)

>#33 protein (SEQ ID No.6)

>#3 5'UTR (SEQ ID No.7): gtcccggggtcagcagccctcggacgtctcggagcttag

>#21 5'UTR (SEQ ID No.8)

>#23 5'UTR (SEQ ID No.9)

>#24 5'UTR (SEQ ID No.10)

>#31 5'UTR (SEQ ID No.11)

>#33 5'UTR (SEQ ID No.12)

>#3 3'UTR (SEQ ID No.13)

>#21 3'UTR (SEQ ID No.14)

>#23 3'UTR (SEQ ID No.15)

>#24 3'UTR (SEQ ID No.16)

>#31 3'UTR (SEQ ID No.17)

>#33 3'UTR (SEQ ID No.18)

SEQ ID No.19

SEQ ID No.20

SEQ ID No.21

SEQ ID No.22

SEQ ID No.23

SEQ ID No.24

SEQ ID No.25

SEQ ID No.26

SEQ ID No.27

SEQ ID No.28: ATGGGGCGGAGTTGTTACG

SEQ ID No.29: CTTCACCGTGCCAGACTAGA

SEQ ID No.30: CGCTATGTCCTGATAGCGGT

SEQ ID No.31: CAAGATCCGCCACAACATCG

>#4 protein (SEQ ID No.32)

>#5 protein (SEQ ID No.33)

>#8 protein (SEQ ID No.34)

>#10 protein (SEQ ID No.35)

>#11 protein (SEQ ID No.36)

>#14 protein (SEQ ID No.37)

>#17 protein (SEQ ID No.38)

>#22 protein (SEQ ID No.39)

>#25 protein (SEQ ID No.40)

>#27 protein (SEQ ID No.41)

>#29 protein (SEQ ID No.42)

>#32 protein (SEQ ID No.43)

>#4 5'UTR (SEQ ID No.44)

>#5 5'UTR (SEQ ID No.45)

>#8 5'UTR (SEQ ID No.46)

>#10 5'UTR (SEQ ID No.47)

>#11 5'UTR (SEQ ID No.48)

>#14 5'UTR (SEQ ID No.49)

>#17 5'UTR (SEQ ID No.50)

>#22 5'UTR (SEQ ID No.51)

>#25 5'UTR (SEQ ID No.52)

>#27 5'UTR (SEQ ID No.53)

>#29 5'UTR (SEQ ID No.54)

>#32 5'UTR (SEQ ID No.55)

>#4 3'UTR (SEQ ID No.56)

>#5 3'UTR (SEQ ID No.57)

>#8 3'UTR (SEQ ID No.58)

>#10 3'UTR (SEQ ID No.59)

>#11 3'UTR (SEQ ID No.60)

>#14 3'UTR (SEQ ID No.61)

>#17 3'UTR (SEQ ID No.62)

>#22 3'UTR (SEQ ID No.63)

>#25 3'UTR (SEQ ID No.64)

>#27 3'UTR (SEQ ID No.65)

>#29 3'UTR (SEQ ID No.66)

>#32 3'UTR (SEQ ID No.67)

>#21 (Δ401-467) (SEQ ID No.68)

>#21 (Δ401-467, +32aa linker) (SEQ ID No.69)

>#21 (Δ1-100) (SEQ ID No.70)

>#21 (Δ1-200) (SEQ ID No.71)
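The variant names in the listing above (Δ401-467, Δ401-467 with a 32 aa linker, Δ1-100, Δ1-200) can be read as deletions of a residue range, optionally with a linker placed at the deletion site. The sketch below shows that reading under two assumptions that the text does not state explicitly: positions are 1-based and inclusive, and the linker replaces the deleted stretch. The function name `delete_residues`, the 500-residue toy protein, and the all-glycine linker are placeholders and not sequences from the application.

```python
def delete_residues(protein: str, start: int, end: int, linker: str = "") -> str:
    """Remove residues start..end (1-based, inclusive), inserting `linker` in their place."""
    if not (1 <= start <= end <= len(protein)):
        raise ValueError("residue range must lie within the protein")
    return protein[: start - 1] + linker + protein[end:]


# Toy stand-in for protein #21: 400 + 67 + 33 = 500 placeholder residues.
toy_protein = "A" * 400 + "B" * 67 + "C" * 33
toy_linker = "G" * 32  # hypothetical 32-residue linker

variants = {
    "d401-467": delete_residues(toy_protein, 401, 467),
    "d401-467+linker": delete_residues(toy_protein, 401, 467, toy_linker),
    "d1-100": delete_residues(toy_protein, 1, 100),
    "d1-200": delete_residues(toy_protein, 1, 200),
}

for name, seq in variants.items():
    print(name, len(seq))
```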

Claims (37)

1. A retrotransposase comprising a target DNA binding domain containing a zinc finger binding motif, a reverse transcriptase domain, and an endonuclease domain, and capable of reverse transcribing RNA into DNA.

2. The retrotransposase according to claim 1, comprising 1 to 3 zinc finger domains (ZF), one Myb-like domain, one reverse transcriptase domain (RT) and one restriction enzyme-like endonuclease domain (RLE).

3. The retrotransposase according to claim 1 or 2, having an amino acid sequence as shown in any one of SEQ ID No.1-6, SEQ ID No.32-43 or SEQ ID No.68-71, or having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identity to the amino acid sequence of any one of SEQ ID No.1-6, SEQ ID No.32-43 or SEQ ID No.68-71.
4. A system for modifying DNA, the system comprising:
the retrotransposase according to any one of claims 1 to 3 or a nucleic acid encoding the retrotransposase according to any one of claims 1 to 3; and
a donor RNA or a nucleic acid encoding the donor RNA, the donor RNA comprising a sequence that binds the retrotransposase and a heterologous sequence,
preferably the heterologous sequence is at least 1-50000 bases, for example at least 1, 10, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000, 9500, 10000, 15000, 20000, 25000, 30000, 35000, 40000 or 45000 nt.

5. The system according to claim 4, wherein the heterologous sequence comprises one or two or more of the following: a sequence encoding a polypeptide or a non-coding RNA sequence, a sequence comprising a promoter or enhancer, a sequence encoding one or more introns, and a transcription termination sequence;
preferably the polypeptide is a therapeutic polypeptide or a mammalian polypeptide; further preferably the polypeptide is a therapeutic protein, membrane protein, intracellular protein, extracellular protein, structural protein, signal transduction protein, regulatory protein, transport protein, organelle protein, sensory protein, motor protein, defense protein, storage protein, reporter protein, antibody, enzyme or coagulation factor; further preferably the polypeptide has 20 to 10000 amino acids, for example 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, 3000, 3100, 3200, 3300, 3400, 3500, 3600, 3700, 3800, 3900, 4000, 4100, 4200, 4300, 4400, 4500, 4600, 4700, 4800, 4900, 5000, 5100, 5200, 5300, 5400, 5500, 5600, 5700, 5800, 5900, 6000, 6100, 6200, 6300, 6400, 6500, 6600, 6700, 6800, 6900, 7000, 7100, 7200, 7300, 7400, 7500, 7600, 7700, 7800, 7900, 8000, 8100, 8200, 8300, 8400, 8500, 8600, 8700, 8800, 8900, 9000, 9100, 9200, 9300, 9400, 9500, 9600, 9700, 9800 or 9900 amino acids;
preferably the intracellular protein is selected from a cytoplasmic protein, nuclear protein, organelle protein, mitochondrial protein or lysosomal protein,
further preferably the sequence encoding the polypeptide contains one or more introns.
6. The system according to claim 4 or 5, wherein the donor RNA further comprises a homology domain, preferably the homology domain comprises a first homology domain and a second homology domain,
further preferably, the first homology domain is 5 or more bases located at the 5' end of the donor RNA and having 100% identity with the target DNA strand, and the second homology domain is 5 or more bases located at the 3' end of the donor RNA and having 100% identity with the target DNA strand, preferably the target DNA is a genomic safe harbor (GSH) site or the target DNA is a genomic Natural Harbor™ site.

7. The system according to any one of claims 4 to 6, wherein the nucleic acid encoding the retrotransposase according to any one of claims 1 to 3 and the donor RNA or the nucleic acid encoding the donor RNA are separate nucleic acids, preferably the donor RNA does not encode a retrotransposase, further preferably the donor RNA comprises one or more chemical modifications; or
the nucleic acid encoding the retrotransposase according to any one of claims 1 to 3 and the donor RNA or the nucleic acid encoding the donor RNA are covalently linked, preferably the nucleic acid encoding the retrotransposase according to any one of claims 1 to 3 and the donor RNA or the nucleic acid encoding the donor RNA form a fusion nucleic acid, further preferably the fusion nucleic acid comprises RNA or DNA.
8. The system according to any one of claims 4 to 7, wherein the donor RNA comprises:
optionally, a 5' untranslated sequence (5'UTR) that binds the retrotransposase,
a 3' untranslated sequence (3'UTR) that binds the retrotransposase,
a heterologous sequence, and
a promoter operably linked to the heterologous sequence,
preferably the promoter is located between the 5' untranslated sequence (5'UTR) that binds the retrotransposase and the heterologous sequence, or preferably the promoter is located between the 3' untranslated sequence (3'UTR) that binds the retrotransposase and the heterologous sequence.

9. The system according to any one of claims 4 to 8, wherein the heterologous sequence comprises an open reading frame, or its reverse complement, in a 5' to 3' orientation on the donor RNA; or the heterologous sequence comprises an open reading frame, or its reverse complement, in a 3' to 5' orientation on the donor RNA.

10. The system according to any one of claims 4 to 9, wherein the donor RNA further comprises a nuclear localization signal, or the nucleic acid encoding the retrotransposase according to claim 1 comprises a nuclear localization signal and/or a nucleolar localization signal and/or a nuclear export signal.
11. The system according to any one of claims 4 to 10, wherein the nucleic acid encoding the retrotransposase according to any one of claims 1 to 3 and the nucleic acid encoding the donor RNA are present in a ratio of 10:1 to 1:10, for example 10:1, 9:1, 8:1, 7:1, 6:1, 5:1, 4:1, 3:1, 2:1, 1:1, 1:2, 1:3, 1:4, 1:5, 1:6, 1:7, 1:8, 1:9 or 1:10.

12. The system according to any one of claims 4 to 11, wherein the donor RNA comprises a stem-loop sequence or helix 5' of a pseudoknot sequence, preferably comprises one or more (e.g. 2, 3 or more) stem-loop sequences or helices 3' of the pseudoknot sequence, e.g. 3' of the pseudoknot sequence and 5' of the heterologous sequence, further preferably the donor RNA comprising the pseudoknot has catalytic activity, e.g. RNA cleavage activity, e.g. cis-RNA cleavage activity; or
the donor RNA comprises at least one stem-loop sequence or helix, e.g. 3' of the heterologous sequence, e.g. 1, 2, 3, 4, 5 or more stem-loop, hairpin or helix sequences.
13. The system according to any one of claims 4 to 12, wherein
the 5' untranslated sequence (5'UTR) in the donor RNA that binds the retrotransposase has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identity to the nucleotide sequence of any one of SEQ ID No.7-12 or SEQ ID No.44-55;
the 3' untranslated sequence (3'UTR) in the donor RNA that binds the retrotransposase has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identity to the nucleotide sequence of any one of SEQ ID No.13-18 or SEQ ID No.56-67.

14. The system according to any one of claims 4 to 13, wherein the donor RNA comprises, in order from its 5' end to its 3' end:
a first homology domain,
a 5' untranslated sequence (5'UTR) that binds the retrotransposase,
a heterologous sequence,
a 3' untranslated sequence (3'UTR) that binds the retrotransposase, and
a second homology domain;
preferably the first homology domain is 10 or more, 20 or more, 30 or more, 40 or more, 50 or more, 60 or more, 70 or more, 80 or more, 90 or more, or 100 or more bases located at the 5' end of the donor RNA and having 100% identity with the target DNA strand, and the second homology domain is 10 or more, 20 or more, 30 or more, 40 or more, 50 or more, 60 or more, 70 or more, 80 or more, 90 or more, or 100 or more bases located at the 3' end of the donor RNA and having 100% identity with the target DNA strand;
further preferably the 5' untranslated sequence (5'UTR) in the donor RNA that binds the retrotransposase has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identity to the nucleotide sequence of any one of SEQ ID No.7-12;
the 3' untranslated sequence (3'UTR) in the donor RNA that binds the retrotransposase has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identity to the nucleotide sequence of any one of SEQ ID No.13-18.

15. The system according to any one of claims 8 to 14, wherein
the 5' untranslated sequence (5'UTR) that binds the retrotransposase is a non-natural 5' untranslated sequence (5'UTR); or
the 3' untranslated sequence (3'UTR) that binds the retrotransposase is a non-natural 3' untranslated sequence (3'UTR);
further preferably the non-natural 5' untranslated sequence (5'UTR) has additions, deletions and/or substitutions of nucleotides relative to the natural 5'UTR sequence;
further preferably the non-natural 3' untranslated sequence (3'UTR) has additions, deletions and/or substitutions of nucleotides relative to the natural 3'UTR sequence;
further preferably the non-natural 5' untranslated sequence (5'UTR) has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identity to the nucleotide sequence of SEQ ID No.19-21;
further preferably the non-natural 3' untranslated sequence (3'UTR) has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identity to the nucleotide sequence of any one of SEQ ID No.22-23.

16. The system according to any one of claims 4 to 15, wherein the heterologous sequence is inserted into a target site in the genome of a subject at an average copy number of at least 0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.5, 3, 4 or 5 copies per genome, preferably at only one target site in the genome.

17. The system according to any one of claims 4 to 16, wherein the system results in insertion of the heterologous sequence into the target site (e.g. at a copy number of 1 insertion or more than one insertion) in about 1%-80% of the cells (e.g. about 1%-10%, 10%-20%, 20%-30%, 30%-40%, 40%-50%, 50%-60%, 60%-70% or 70%-80% of the cells) in a population of cells contacted with the system, e.g. as measured using single-cell ddPCR, or
results in insertion of the heterologous sequence into the target site at a copy number of 1 insertion in about 1%-80% of the cells (e.g. about 1%-10%, 10%-20%, 20%-30%, 30%-40%, 40%-50%, 50%-60%, 60%-70% or 70%-80% of the cells) in a population of cells contacted with the system, e.g. as measured using colony isolation and ddPCR.
18. The system according to any one of claims 4 to 17, wherein the system results in insertion of the heterologous sequence into the target site (on-target insertion) at a higher rate than into non-target sites (off-target insertion) in a population of cells, wherein the ratio of on-target insertion to off-target insertion is greater than 10:1, 20:1, 30:1, 40:1, 50:1, 60:1, 70:1, 80:1, 90:1, 100:1, 200:1, 500:1 or 1,000:1.

19. A non-natural 5' untranslated sequence (5'UTR) having additions, deletions and/or substitutions of nucleotides relative to a natural 5'UTR sequence, preferably having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identity to the nucleotide sequence of SEQ ID No.19-21.

20. A non-natural 3' untranslated sequence (3'UTR) having additions, deletions and/or substitutions of nucleotides relative to a natural 3'UTR sequence, preferably having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identity to the nucleotide sequence of SEQ ID No.22-23.

21. An engineered transposable element comprising, from 5' to 3':
a 5' untranslated sequence (5'UTR), a heterologous sequence and a 3' untranslated sequence (3'UTR),
wherein the 5' untranslated sequence comprises a nucleotide sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identity to a sequence selected from SEQ ID No.19-21;
wherein the 3' untranslated sequence comprises a nucleotide sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identity to a sequence selected from SEQ ID No.22-23.

22. The element according to claim 21, wherein the element is the donor RNA referred to in any one of claims 4 to 18 or a nucleic acid encoding the donor RNA.

23. A host cell comprising the system according to any one of claims 4 to 18 or the element according to claim 21 or 22, preferably the host cell is a mammalian cell or a plant cell, further preferably a human cell.

24. A method of modifying a target DNA strand in a cell, tissue or subject, the method comprising applying the system according to any one of claims 4 to 18 to the cell, tissue or subject, wherein the system reverse transcribes the donor RNA sequence into the target DNA strand, thereby modifying the target DNA strand in the cell, tissue or subject.

25. The method according to claim 24, wherein the cell or tissue is a mammalian cell or tissue, preferably a human cell or tissue, and the subject is a mammal, preferably a human.

26. The method according to claim 24 or 25, wherein the cell is a fibroblast, a primary cell or a cell that has not been immortalized.

27. The method according to any one of claims 24 to 26, wherein the method is performed in vivo or in vitro.
A method for modifying the genome of a mammalian cell or inserting DNA into a mammalian genome, the method comprising applying the system of any one of claims 4 to 18 to the cell, wherein the mammal is preferably a human.

The method according to claim 28, wherein the method results in the addition of at least 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, or 10000 base pairs of exogenous DNA sequence to the genome of the mammal.

The method according to any one of claims 24 to 29, wherein the cell is part of a tissue; or the mammalian cell is euploid, is not immortalized, is part of an organism, is a primary cell, is non-dividing, is a hepatocyte, or is from a subject with a genetic disease.

The method according to any one of claims 24 to 30, wherein the method comprises contacting the cell, tissue or subject with the retrotransposase of any one of claims 1 to 3, or a nucleic acid encoding the retrotransposase of any one of claims 1 to 3, and a donor RNA or a nucleic acid encoding the donor RNA; preferably, the contacting comprises contacting the cell, tissue or subject with a plasmid, virus, virus-like particle, virosome, liposome, vesicle, exosome or lipid nanoparticle; more preferably, the contacting comprises non-viral delivery, such as electroporation.
The method according to claim 31, wherein the contacting comprises intravenous administration to the subject, preferably administering to the subject at least twice the retrotransposase of any one of claims 1 to 3, or a nucleic acid encoding the retrotransposase of any one of claims 1 to 3, and a donor RNA or a nucleic acid encoding the donor RNA.

The method according to any one of claims 24 to 32, wherein the retrotransposase of any one of claims 1 to 3, or a nucleic acid encoding the retrotransposase of any one of claims 1 to 3, and the donor RNA or a nucleic acid encoding the donor RNA are administered separately; or the retrotransposase of any one of claims 1 to 3, or a nucleic acid encoding the retrotransposase of any one of claims 1 to 3, and the donor RNA or a nucleic acid encoding the donor RNA are administered together.

A nucleic acid encoding the retrotransposase according to any one of claims 1 to 3.

A vector comprising the nucleic acid of claim 34.

A host cell comprising the vector of claim 35.

A pharmaceutical composition comprising the system of any one of claims 4 to 18, or the nucleic acid of claim 34, or the vector of claim 35, or the host cell of claim 23 or 36, wherein the system is preferably placed in a pharmaceutically acceptable carrier, and the carrier is more preferably a vesicle (including a liposome, a natural or synthetic lipid bilayer, or an exosome), a lipid nanoparticle, a virus or a plasmid vector.
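
The claims above recite two quantitative criteria: percent identity of a 5'UTR or 3'UTR to a reference sequence, and the ratio of on-target to off-target insertions. The following Python sketch illustrates how such figures might be computed. It is not part of the application and rests on assumptions the claims do not state: the two sequences are taken to be already aligned to equal length (with `-` marking gaps), identity is counted as matching columns divided by alignment length, and insertion counts are taken as plain integers. The function names and the toy sequences are hypothetical and are not SEQ ID Nos. 19-23.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Assumption: '-' marks an alignment gap, and identity is the number of
    matching non-gap columns divided by the total number of columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def on_target_ratio(on_target: int, off_target: int) -> float:
    """Ratio of on-target to off-target insertion counts in a cell population."""
    if on_target < 0 or off_target < 0:
        raise ValueError("counts must be non-negative")
    if off_target == 0:
        # No off-target events observed: the ratio is unbounded.
        return float("inf")
    return on_target / off_target


def assemble_element(utr5: str, heterologous: str, utr3: str) -> str:
    """Concatenate the three parts in the claimed 5'-to-3' order."""
    return utr5 + heterologous + utr3


if __name__ == "__main__":
    # Toy sequences for illustration only.
    reference_utr = "ACGUACGUAC"
    candidate_utr = "ACGUACGAAC"
    identity = percent_identity(reference_utr, candidate_utr)
    print(f"identity: {identity:.1f}% (meets 70% threshold: {identity >= 70})")

    ratio = on_target_ratio(on_target=500, off_target=10)
    print(f"on-target : off-target = {ratio:.0f}:1")

    print(assemble_element("GGAUC", "AUGGCC", "UUAAC"))
```

In practice the alignment itself would come from a dedicated alignment tool, and the result depends on the scoring scheme and on whether gap columns count toward the denominator, so the threshold check here should be read only as an illustration of the arithmetic.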
PCT/CN2023/139871 2022-12-19 2023-12-19 System for inserting large fragment dna into genome Ceased WO2024131786A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202380087116.0A CN120418275A (en) 2022-12-19 2023-12-19 System for inserting large fragment DNA into genome

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211633921.2 2022-12-19
CN202211633921 2022-12-19

Publications (1)

Publication Number Publication Date
WO2024131786A1 (en) 2024-06-27

Family

ID=91500211

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/139871 Ceased WO2024131786A1 (en) 2022-12-19 2023-12-19 System for inserting large fragment dna into genome

Country Status (2)

Country Link
CN (2) CN120418275A (en)
WO (1) WO2024131786A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110312800A (en) * 2016-11-11 2019-10-08 Bio-Rad Laboratories, Inc. Method for processing nucleic acid samples
CN113286880A (en) * 2018-08-28 2021-08-20 Flagship Pioneering Innovations VI, LLC Methods and compositions for regulating a genome
WO2021178720A2 (en) * 2020-03-04 2021-09-10 Flagship Pioneering Innovations Vi, Llc Methods and compositions for modulating a genome
WO2021178717A2 (en) * 2020-03-04 2021-09-10 Flagship Pioneering Innovations Vi, Llc Improved methods and compositions for modulating a genome
CN114981409A (en) * 2019-09-03 2022-08-30 Myeloid Therapeutics, Inc. Methods and compositions for genomic integration

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DATABASE Protein 24 July 2016 (2016-07-24), "reverse transcriptase domain protein, partial [Drosophila mercatorum]", XP093183494, Database accession no. AAB94032.1 *
DATABASE Protein 26 July 2016 (2016-07-26), "unnamed protein product, partial [Drosophila melanogaster]", XP093183496, Database accession no. CAA36225.1 *

Also Published As

Publication number Publication date
CN118222533A (en) 2024-06-21
CN120418275A (en) 2025-08-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 23905949; Country of ref document: EP; Kind code of ref document: A1)
WWE Wipo information: entry into national phase (Ref document number: 202380087116.0; Country of ref document: CN)
NENP Non-entry into the national phase (Ref country code: DE)
WWP Wipo information: published in national office (Ref document number: 202380087116.0; Country of ref document: CN)
122 Ep: pct application non-entry in european phase (Ref document number: 23905949; Country of ref document: EP; Kind code of ref document: A1)