
WO2025007020A1 - Multiplexed retron genome editing in prokaryotic and eukaryotic genomes

Info

Publication number
WO2025007020A1
Authority
WO
WIPO (PCT)
Prior art keywords
retron
gene
engineered
cell
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2024/036205
Other languages
French (fr)
Inventor
Seth SHIPMAN
Santiago C. LOPEZ
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
J David Gladstone Institutes
University of California Berkeley
University of California San Diego UCSD
Original Assignee
J David Gladstone Institutes
University of California Berkeley
University of California San Diego UCSD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by J David Gladstone Institutes, University of California Berkeley, University of California San Diego UCSD
Publication of WO2025007020A1

Classifications

    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/63Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/85Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/10Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/102Mutagenizing nucleic acids
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/11DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/11DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex- forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/63Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/70Vectors or expression systems specially adapted for E. coli
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/63Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/80Vectors or expression systems specially adapted for eukaryotic hosts for fungi
    • C12N15/81Vectors or expression systems specially adapted for eukaryotic hosts for fungi for yeasts
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/87Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90Stable introduction of foreign DNA into chromosome
    • C12N15/902Stable introduction of foreign DNA into chromosome using homologous recombination
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/87Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90Stable introduction of foreign DNA into chromosome
    • C12N15/902Stable introduction of foreign DNA into chromosome using homologous recombination
    • C12N15/907Stable introduction of foreign DNA into chromosome using homologous recombination in mammalian cells
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/10Transferases (2.)
    • C12N9/12Transferases (2.) transferring phosphorus containing groups, e.g. kinases (2.7)
    • C12N9/1241Nucleotidyltransferases (2.7.7)
    • C12N9/1276RNA-directed DNA polymerase (2.7.7.49), i.e. reverse transcriptase or telomerase
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14Hydrolases (3)
    • C12N9/16Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22Ribonucleases [RNase]; Deoxyribonucleases [DNase]
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00Structure or type of the nucleic acid
    • C12N2310/10Type of nucleic acid
    • C12N2310/12Type of nucleic acid catalytic nucleic acids, e.g. ribozymes
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00Structure or type of the nucleic acid
    • C12N2310/10Type of nucleic acid
    • C12N2310/20Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2800/00Nucleic acids vectors
    • C12N2800/10Plasmid DNA
    • C12N2800/101Plasmid DNA for bacteria
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2800/00Nucleic acids vectors
    • C12N2800/10Plasmid DNA
    • C12N2800/102Plasmid DNA for yeast
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2800/00Nucleic acids vectors
    • C12N2800/10Plasmid DNA
    • C12N2800/106Plasmid DNA for vertebrates
    • C12N2800/107Plasmid DNA for vertebrates for mammalian
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12YENZYMES
    • C12Y207/00Transferases transferring phosphorus-containing groups (2.7)
    • C12Y207/07Nucleotidyltransferases (2.7.7)
    • C12Y207/07049RNA-directed DNA polymerase (2.7.7.49), i.e. telomerase or reverse-transcriptase

Definitions

  • Retrons are reverse transcribed elements found in nearly all myxobacteria (Dhundale et al. Journal of Bacteriology 164, 914-917 (1985)) and sparsely in E. coli (Lampson et al. Science 243, 1033-1038 (1989)), V. cholerae (Inouye et al. Microbiology and Immunology 55, 510-513), and other bacteria.
  • the retron operon encodes an RNA primer (multicopy single-stranded RNA, msr), an RNA sequence to be reverse-transcribed (multicopy single-stranded DNA, msd), and a reverse transcriptase, in that order.
  • the retron transcript folds up upon itself and is partially reverse transcribed to generate a single stranded DNA (ssDNA) of about 80 bases.
  • ssDNA single stranded DNA
  • Although the retron-derived DNA is single stranded, it contains a hairpin of double-stranded DNA.
  • Multiple retron ssDNAs can also complement each other to form larger double-stranded elements. Retron variants have different DNA lengths and base content, but broadly share this overall format.
  • the ssDNA generated by the retron has been used for genome engineering, both in bacteria, with the λ Red Beta recombinase for recombineering (Farzadfard et al. Science 346, 1256272, (2014)), and in eukaryotes, as a homology-directed repair (HDR) template for Cas9 editing in yeast (Sharon et al. Cell 175, 544-557. e516, (2016)).
  • HDR homology-directed repair
  • compositions and methods for modification of the retron editing system to enable multiple simultaneous edits in prokaryotic and/or eukaryotic cells find use in, for example, therapeutic editing and cell engineering, such as for bioproduction.
  • One embodiment provides a multiplex engineered retron comprising: a) at least one msr gene encoding multicopy single-stranded RNA (msRNA); b) at least one msd gene encoding multicopy single-stranded DNA (msDNA); c) two or more heterologous sequences of interest; and d) a ret gene encoding a reverse transcriptase. The retron can include two to five or more different heterologous sequences of interest.
  • the retron can comprise at least two msr genes and at least two msd genes or the retron can comprise one msr gene and at least two msd genes.
  • the heterologous sequences can be inserted into the msr gene and/or the msd gene, including where each of the at least two msd genes independently comprise at least one heterologous sequence.
  • the retron can comprise single-stranded DNA (msDNA) encoded by the msd gene which comprises a msd stem loop, and where the loop comprises the heterologous sequence(s) of interest.
  • each heterologous sequence independently encodes a donor polynucleotide comprising a 5' homology arm that hybridizes to a 5' target sequence and a 3' homology arm that hybridizes to a 3' target sequence flanking a donor nucleotide sequence comprising an intended edit to be integrated at a target locus by homology directed repair (HDR) or recombineering.
  • the edit can be one or more gene replacements, gene knockouts, deletions, nested deletions, insertions, inversions, or point mutations.
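
The donor layout described above (a 5' homology arm, the intended edit, and a 3' homology arm) can be illustrated with a minimal Python sketch. The function name `build_donor`, the toy reference sequence, and the 10-base arm length are hypothetical choices made for illustration and do not come from the application; the sketch also ignores strand orientation and any retron-specific sequence context.

```python
def build_donor(reference: str, edit_start: int, edit_end: int,
                replacement: str, arm_len: int) -> str:
    """Return 5' arm + replacement + 3' arm for an edit of reference[edit_start:edit_end].

    Hypothetical helper: coordinates are 0-based and end-exclusive.
    """
    if not 0 <= edit_start <= edit_end <= len(reference):
        raise ValueError("edit coordinates fall outside the reference")
    if edit_start < arm_len or len(reference) - edit_end < arm_len:
        raise ValueError("not enough flanking sequence for the requested arm length")
    five_prime_arm = reference[edit_start - arm_len:edit_start]
    three_prime_arm = reference[edit_end:edit_end + arm_len]
    return five_prime_arm + replacement + three_prime_arm


# Toy reference sequence (illustrative only, not a real locus).
reference = "ATGGCTAGCTTGACCGGTACGATCGTACCGGATTAGCCA"

# Point mutation: replace the single base at position 19 with T.
point = build_donor(reference, 19, 20, "T", arm_len=10)
# Deletion: remove positions 15..24 by joining the two arms directly.
deletion = build_donor(reference, 15, 25, "", arm_len=10)
# Insertion: add GGG between positions 19 and 20.
insertion = build_donor(reference, 20, 20, "GGG", arm_len=10)
```

The same helper covers several of the listed edit types because each one differs only in what sits between the two arms: a substituted base, nothing at all, or extra bases.
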
  • a retron further comprising a modification which results in enhanced production of msDNA formed from the retron, as compared to a retron without any of the modifications described herein.
  • the heterologous sequence comprises a CRISPR protospacer DNA sequence, including where the CRISPR protospacer DNA sequence comprises a modified AAG protospacer adjacent motif (PAM).
  • the retron further comprising a barcode sequence, including wherein the barcode sequence is located in a hairpin loop of the msDNA.
  • the msr gene and the msd genes are provided in a trans arrangement or a cis arrangement.
  • the ret gene is provided in a trans arrangement with respect to the msr gene and/or the msd gene.
  • the msr gene, msd gene, and ret gene are a modified bacterial retron msr gene, msd gene, and ret gene, such as wherein the msr gene, msd gene, and ret gene are independently a modified myxobacteria retron, a modified Escherichia coli retron, a modified Salmonella enterica retron, or a modified Vibrio cholerae retron.
  • the modified Escherichia coli retron is a modified EC83 or a modified EC86.
  • One embodiment provides a vector system comprising one or more vectors comprising the engineered retron described herein.
  • the msr gene and the msd gene are provided by the same vector or different vectors.
  • the msr gene, the msd gene, and the ret gene are provided by the same vector.
  • the same vector comprises a promoter operably linked to the msr gene and the msd gene, including where the promoter is operably linked to the ret gene.
  • One embodiment provides for a second promoter operably linked to the ret gene.
  • the msr gene, the msd gene, and the ret gene are provided by different vectors.
  • the one or more vectors are viral vectors or nonviral vectors, such as plasmids.
  • the engineered retron comprises two or more heterologous sequences, wherein each heterologous sequence independently encodes a donor polynucleotide comprising a 5' homology arm that hybridizes to a 5' target sequence and a 3' homology arm that hybridizes to a 3' target sequence flanking a nucleotide sequence comprising an intended edit to be integrated at a target locus by homology directed repair (HDR) or recombineering.
  • One embodiment further comprises a vector encoding an RNA-guided nuclease, including wherein the RNA-guided nuclease is a Cas nuclease or an engineered RNA-guided FokI-nuclease, such as Cas9 or Cpf1.
  • the engineered retron comprises a CRISPR protospacer DNA sequence.
  • Another embodiment further comprises a vector encoding a Cas1 and/or Cas2 protein and/or a vector comprising a CRISPR array sequence and/or a vector encoding bacteriophage homologous recombination proteins, such as where the vector encoding the bacteriophage homologous recombination proteins is a replication-defective λ prophage comprising the exo, bet, and gam genes.
  • the host cell is a prokaryotic, archaeal, or eukaryotic host cell.
  • the eukaryotic host cell is a mammalian host cell, such as a human host cell.
  • the eukaryotic host cell is a non-human host cell.
  • the host cell endogenously expresses or has been modified to express one or more single-strand annealing proteins (SSAPs), one or more single-stranded DNA binding proteins (SSBs), one or more mutant mismatch repair proteins, or a combination thereof.
  • SSAPs single-strand annealing proteins
  • SSBs single stranded DNA binding proteins
  • kits comprising the engineered retron described herein, the vector system described herein, or the host cell described herein.
  • the kit further comprises instructions for genetically modifying a cell with the engineered retron.
  • One embodiment provides a multiplex method of genetically modifying a cell comprising: a) transfecting a cell with the engineered retron described herein; and b) introducing an RNA-guided nuclease and guide RNA into the cell, wherein the RNA-guided nuclease forms a complex with the guide RNA, the guide RNA directing the complex to the genomic target locus, wherein the RNA-guided nuclease creates a double-stranded break in the genomic DNA at the genomic target locus, and the donor polynucleotide generated by the engineered retron is integrated at the genomic target locus recognized by its 5' homology arm and 3' homology arm by homology directed repair (HDR) to produce a genetically modified cell.
  • the RNA-guided nuclease is a Cas nuclease, such as Cas9 or Cpf1, or an engineered RNA-guided FokI-nuclease.
  • the RNA-guided nuclease is provided by a vector or a recombinant polynucleotide integrated into the genome of the cell.
  • the engineered retron is provided by a vector.
  • the donor polynucleotide is used to create two or more independent gene replacements, gene knockouts, deletions, nested deletions, insertions, inversions, or point mutations.
  • One embodiment provides a multiplex method of genetically modifying a cell by recombineering, the method comprising: a) transfecting the cell with the engineered retron described herein; and b) introducing bacteriophage recombination proteins into the cell, wherein the bacteriophage recombination proteins mediate homologous recombination at a target locus such that the donor polynucleotide generated by the engineered retron is integrated at the target locus recognized by its 5' homology arm and 3' homology arm to produce a genetically modified cell.
  • the donor polynucleotide is used to modify a plasmid, bacterial artificial chromosome (BAC), or a bacterial chromosome in the bacterial cell by recombineering.
  • each donor polynucleotide can create a gene replacement, gene knockout, deletion, nested deletion, insertion, inversion, or point mutation.
  • said introducing bacteriophage recombination proteins into the cell comprises insertion of a replication-defective λ prophage into the bacterial genome, wherein the prophage comprises the exo, bet, and gam genes.
  • One embodiment provides a method of barcoding a cell comprising transfecting a cell with the engineered retron described herein.
  • One embodiment provides a multiplex method of producing an in vivo molecular recording system comprising: a) introducing a Cas1 protein or a Cas2 protein of a CRISPR adaptation system into a host cell; b) introducing a CRISPR array nucleic acid sequence comprising a leader sequence and at least one repeat sequence into the host cell, wherein the CRISPR array nucleic acid sequence is integrated into genomic DNA or a vector in the host cell; and c) introducing a plurality of engineered retrons described herein into the host cell, wherein each retron comprises a different protospacer DNA sequence that can be processed and inserted into the CRISPR array nucleic acid sequence.
  • the Cas1 protein or the Cas2 protein is provided by a vector.
  • the engineered retron is provided by a vector.
  • the plurality of engineered retrons comprises at least three different protospacer DNA sequences.
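
The recording idea in this method, in which different retron-derived protospacers are inserted into a CRISPR array, can be pictured with a toy Python model. This is only a sketch: it assumes, based on general knowledge of CRISPR adaptation rather than on the text above, that each new spacer is added next to the leader together with a fresh repeat, so that array position reflects order of acquisition. The names `acquire_spacer`, `spacers_oldest_first`, and the string labels are hypothetical.

```python
def acquire_spacer(array, protospacer, repeat="REPEAT"):
    """Insert a new repeat and spacer directly after the leader (assumed behaviour)."""
    leader, *rest = array
    return [leader, repeat, protospacer, *rest]


def spacers_oldest_first(array):
    """Read spacers back in chronological order (leader-distal first)."""
    # Layout is [leader, repeat, spacer, repeat, spacer, ..., repeat],
    # so spacers sit at indices 2, 4, 6, ...
    return list(reversed(array[2::2]))


# Minimal starting array: a leader and one repeat, as in step b) above.
array = ["LEADER", "REPEAT"]

# Three different protospacers, matching the "at least three" embodiment.
for protospacer in ["protospacer_A", "protospacer_B", "protospacer_C"]:
    array = acquire_spacer(array, protospacer)

recorded_order = spacers_oldest_first(array)
```

Under that assumption the newest spacer always sits closest to the leader, which is what lets the array act as an ordered record.
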
  • an engineered cell comprising an in vivo molecular recording system comprising: a) a Cas1 protein or a Cas2 protein of a CRISPR adaptation system; b) a CRISPR array nucleic acid sequence comprising a leader sequence and at least one repeat sequence, wherein the CRISPR array nucleic acid sequence is integrated into genomic DNA or a vector in the engineered cell; and c) a plurality of engineered retrons described herein, wherein each retron comprises a different protospacer DNA sequence that can be processed and inserted into the CRISPR array nucleic acid sequence.
  • the Cas1 protein or the Cas2 protein is provided by a vector.
  • the engineered retron is provided by a vector.
  • the plurality of engineered retrons comprises at least three different protospacer DNA sequences.
  • One embodiment provides a kit comprising the engineered cell described herein and instructions for in vivo molecular recording.
  • One embodiment provides a multiplex method of producing recombinant msDNA comprising: a) transfecting a host cell with the engineered retron described herein or the vector system described herein; and b) culturing the host cell under suitable conditions, wherein the msDNA is produced.
  • an engineered retron ncRNA comprising: a) at least one msr gene encoding multicopy single-stranded RNA (msRNA); b) at least one msd gene encoding multicopy single-stranded DNA (msDNA); c) at least one guide RNA; and d) two or more repair templates.
  • the retron comprises two to five different repair templates.
  • the retron comprises at least two msr genes and at least two msd genes.
  • the retron comprises one msr gene and at least two msd genes.
  • the repair templates are inserted into the msr gene and/or the msd gene.
  • each of the at least two msd genes comprise at least one repair template.
  • the at least one guide RNA is fused to the end of each msd gene and each of the msd genes can be separated by a Csy4 site.
  • single-stranded DNA (msDNA) encoded by the msd gene comprises a msd stem loop, and where the loop comprises the repair template.
  • the retron comprises: one msr gene; two to five msd genes; at least one guide RNA fused to the end of each msd gene; at least one repair template in each msd gene; and at least one Csy4 site.
  • the msd genes are separated by the Csy4 site.
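
The arrayed architecture described above (one msr gene, two to five msd genes each carrying a repair template with a guide RNA fused to its end, and Csy4 sites between msd genes) can be sketched as an ordered list of parts. This is a minimal illustration using symbolic labels instead of real sequences. The `MsdUnit` class, the `assemble_cassette` function, the label strings, and the placement of msr first are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MsdUnit:
    """One msd gene: a repair template plus the guide RNA fused to its end."""
    repair_template: str
    guide_rna: str


def assemble_cassette(units, msr="msr", csy4_site="Csy4_site"):
    """Return the ordered part labels for an arrayed msd cassette."""
    if not 2 <= len(units) <= 5:
        raise ValueError("this embodiment uses two to five msd genes")
    parts = [msr]
    for index, unit in enumerate(units):
        if index > 0:
            # A Csy4 site separates consecutive msd genes.
            parts.append(csy4_site)
        parts.append(f"msd[{unit.repair_template}]")
        parts.append(f"gRNA[{unit.guide_rna}]")
    return parts


# Hypothetical two-target design using locus names as labels.
cassette = assemble_cassette([
    MsdUnit(repair_template="rpoB_donor", guide_rna="rpoB"),
    MsdUnit(repair_template="gyrA_donor", guide_rna="gyrA"),
])
```

With n msd genes the sketch emits n - 1 Csy4 sites, which satisfies the "at least one Csy4 site" wording for any valid array of two to five msd genes.
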
  • the guide RNA binds to a target genomic DNA.
  • the guide RNA binds to a target genomic DNA in a bacterial, yeast, or mammalian cell, such as a human cell.
  • the guide RNA binds to a target genomic DNA in a non-human cell.
  • the repair template binds to a target genomic DNA, including wherein the repair template binds to a target genomic DNA in a bacterial, yeast, or mammalian cell. In one embodiment, the repair template binds to a target genomic DNA having at least one allele with a mutation or polymorphism. In one embodiment, the repair template comprises one or more non-complementary nucleotides to the repair template's target genomic DNA. In one embodiment, the repair template comprises two or more, or three or more non-complementary nucleotides to the repair template's target genomic DNA. In one embodiment, the non-complementary nucleotides are ‘repair’ nucleotides that can substitute for mutant, variant, or polymorphism nucleotides in the target genomic DNA.
  • composition comprising a carrier and an engineered retron described herein.
  • One embodiment provides a multiplex method comprising administering an engineered retron described herein, or a composition described herein to a subject or to cell(s) from the subject.
  • the subject has, or is suspected of having or developing a disease or condition, such as cystic fibrosis, thalassemia, sickle cell anemia, Huntington's disease, diabetes, Duchenne's Muscular Dystrophy, Tay-Sachs Disease, Marfan syndrome, Alzheimer’s disease, Leber's hereditary optic atrophy (LHON), myoclonic epilepsy with ragged red fibers (MERRF), mitochondrial encephalopathy, lactic acidosis, and stroke-like episodes (MELAS; a type of dementia), obesity, cancers, brain ischemia, coronary disease, myocardial infarction, reperfusion hindrance of ischemic diseases, atopic dermatitis, psoriasis vulgaris, contact dermatitis, keloid, decubital ulcer, ulcerative colitis,
  • nucleotide sequence encoding the engineered retron further comprises at least one promoter, such as an RNA polymerase III (pol III) promoter.
  • the pol III promoter is a constitutive promoter, such as SNR52, 7SK, U6, or H1.
  • the msr gene is expressed from the pol III promoter.
  • the at least one promoter is an RNA polymerase II (pol II) promoter, such as an inducible promoter.
  • the msd gene is expressed from the pol II promoter.
  • One embodiment provides a vector comprising an expression cassette described herein.
  • composition comprising a carrier and the expression cassette described herein or a vector described herein.
  • One embodiment provides a multiplex method comprising administering an expression cassette described herein or a vector described herein, or a composition described herein to a subject or to cell(s) from the subject.
  • the subject has, or is suspected of having or developing a disease or condition, such as cystic fibrosis, thalassemia, sickle cell anemia, Huntington's disease, diabetes, Duchenne's Muscular Dystrophy, Tay-Sachs Disease, Marfan syndrome, Alzheimer’s disease, Leber's hereditary optic atrophy (LHON), myoclonic epilepsy with ragged red fibers (MERRF), mitochondrial encephalopathy, lactic acidosis, and stroke-like episodes (MELAS; a type of dementia), obesity, cancers, brain ischemia, coronary disease, myocardial infarction, reperfusion hindrance of ischemic diseases, atopic dermatitis, psoriasis vulgaris, contact dermatitis, keloid, decubital ulcer,
  • One embodiment provides a multiplex gene editing system comprising: one or more vectors comprising one or more nucleotide sequences encoding an engineered retron described herein, a retron reverse transcriptase, and a Cas nuclease.
  • the retron reverse transcriptase and Cas nuclease are encoded as a fusion protein.
  • the one or more vectors comprise one or more promoters.
  • the guide RNA of the retron binds to a target genomic DNA.
  • the guide RNA of the retron binds to a target genomic DNA in a bacterial, yeast, or mammalian cell.
  • the guide RNA of the retron binds to a target genomic DNA in a mammalian cell, such as a human cell. In one embodiment, the guide RNA of the retron binds to a target genomic DNA in a nonhuman cell. In one embodiment, the repair template of the retron binds to a target genomic DNA. In one embodiment, the repair template of the retron binds to a target genomic DNA in a bacterial, yeast, or mammalian cell. In another embodiment, the repair template of the retron binds to a target genomic DNA having at least one allele with a mutation or polymorphism. In one embodiment, the repair template of the retron comprises one or more non-complementary nucleotides to the repair template's target genomic DNA.
  • the repair template of the retron comprises two or more, or three or more non-complementary nucleotides to the repair template's target genomic DNA.
  • the non-complementary nucleotides are ‘repair’ nucleotides that can substitute for mutant, variant, or polymorphism nucleotides in the target genomic DNA.
  • the one or more promoters include an RNA polymerase III (pol III) promoter, such as a constitutive promoter, including SNR52, 7SK, U6, or H1.
  • the msr gene is expressed from the pol III promoter.
  • the one or more promoters include an RNA polymerase II (pol II) promoter.
  • the pol II promoter is an inducible promoter. In one embodiment, the msd gene is expressed from the pol II promoter. In one embodiment, a first vector encodes the retron and a second vector encodes the retron reverse transcriptase and Cas nuclease, including a Cas9 or Cpf1. In one embodiment, the Cas nuclease is SpCas9.
  • composition comprising a carrier and a gene editing system described herein.
  • One embodiment provides a multiplex method comprising administering a gene editing system described herein, or a composition described herein to a subject or to cell(s) from the subject.
  • the subject has, or is suspected of having or developing a disease or condition, such as cystic fibrosis, thalassemia, sickle cell anemia, Huntington's disease, diabetes, Duchenne's Muscular Dystrophy, Tay-Sachs Disease, Marfan syndrome, Alzheimer’s disease, Leber's hereditary optic atrophy (LHON), myoclonic epilepsy with ragged red fibers (MERRF), mitochondrial encephalopathy, lactic acidosis, and stroke-like episodes (MELAS; a type of dementia), obesity, cancers, brain ischemia, coronary disease, myocardial infarction, reperfusion hindrance of ischemic diseases, atopic dermatitis, psoriasis vulgaris, contact dermatitis, keloid, decubital ulcer, ulcerative colitis,
  • One embodiment provides a multiplex method of genetically editing one or more target sites in one or more cells, comprising: a) transfecting a population of cells with an expression cassette described herein, or a gene editing system described herein to generate a population of transfected cells; and b) selecting one or more cells from the population of transfected cells as genetically edited cells.
  • selecting one or more cells comprises generating colonies from individual transfected cells to provide isogenic individual colonies and selecting one or more precisely edited cells from at least one isogenic colony.
  • One embodiment further comprises sequencing one or more genomic target sites in cells from one or more isogenic individual colonies to confirm that the genomic target sites in at least one of the isogenic individual colonies are precisely edited, thereby generating precisely edited cells.
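
The colony-level confirmation step described above can be sketched as a simple comparison of sequenced target sites against the intended edited sequences. This is a minimal illustration under stated assumptions: the function name `precisely_edited_colonies`, the colony names, the site labels, and the short toy sequences are all hypothetical, and a real analysis would work from aligned sequencing reads.

```python
def precisely_edited_colonies(colony_sites, intended_sites):
    """Return colonies whose sequence matches the intended edit at every target site."""
    return [
        colony
        for colony, observed in colony_sites.items()
        if all(observed.get(site) == sequence
               for site, sequence in intended_sites.items())
    ]


# Intended post-edit sequence at each target site (toy values).
intended_sites = {"site_1": "ACGTTA", "site_2": "GGCATC"}

# Consensus sequence observed per isogenic colony (toy values).
colony_sites = {
    "colony_A": {"site_1": "ACGTTA", "site_2": "GGCATC"},  # both sites edited
    "colony_B": {"site_1": "ACGTTA", "site_2": "GGCCTC"},  # second site unedited
    "colony_C": {"site_1": "ACGATA", "site_2": "GGCATC"},  # first site unedited
}

selected = precisely_edited_colonies(colony_sites, intended_sites)
```

Requiring a match at every site reflects the multiplex setting, where a colony counts as precisely edited only if all targeted loci carry the intended change.
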
  • Another embodiment further comprises administering a population of the precisely edited cells to a subject.
  • the subject has, or is suspected of having or developing a disease or condition, such as cystic fibrosis, thalassemia, sickle cell anemia, Huntington's disease, diabetes, Duchenne's Muscular Dystrophy, Tay-Sachs Disease, Marfan syndrome, Alzheimer’s disease, Leber's hereditary optic atrophy (LHON), myoclonic epilepsy with ragged red fibers (MERRF), mitochondrial encephalopathy, lactic acidosis, and stroke-like episodes (MELAS; a type of dementia), obesity, cancers, brain ischemia, coronary disease, myocardial infarction, reperfusion hindrance of ischemic diseases, atopic dermatitis, psoriasis vulgaris, contact dermatitis, keloid, decubital ulcer, ulcerative colitis, Crohn's disease, nephropathy, glomerulosclerosis, albuminuria
  • FIGS. 1A-1G Encoding several donors in a retron msd enables multiplexed retron recombineering in bacteria.
  • D. Top schematic of the retron recombineering cassette with 3 donors encoded in the msd region. The numbers above the donors indicate the order of reverse transcription.
  • Bottom quantification of precise editing rates of bacterial rpoB, gyrA, and lacZ loci.
  • FIGS. 2A-2F Improved multiplexed editing using donors in arrayed retron msds.
  • Bottom: quantification of precise editing rates for gyrA or rpoB edited alone or simultaneously (unpaired t-test, singleplex versus multiplex, rpoB P < 0.0001, gyrA P = 0.0006).
  • FIGS. 3A-3C Increasing limits of deletion size using nested deletion donor arrays.
  • Top Schematic of arrayed msd retron cassette with two donors to make 25 and 50 bp deletions.
  • Middle Schematic of a nested deletion strategy using two donors to delete 25 bp and 50 bp. If the 25 bp deletion occurs first, the 50 bp deletion becomes a 25 bp deletion.
  • Bottom Quantification of precise editing rates for single 25 and 50 bp deletions, and for the nested 50 bp deletion.
  • Middle Schematic of a nested deletion strategy using three donors to delete 25 bp, 50 bp and 100 bp.
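
The nested deletion logic in these panels can be illustrated with a toy string model. This is only a sketch under stated assumptions: the locus, the arm sequences, and the helper `apply_deletion_donor` are hypothetical, and a donor is modeled simply as a pair of homology arms that removes whatever lies between them.

```python
def apply_deletion_donor(genome, left_arm, right_arm):
    """Remove the bases lying between two homology arms (toy model).

    Returns (edited_genome, number_of_bases_removed). If either arm is
    not found, the genome is returned unchanged with 0 bases removed.
    """
    i = genome.find(left_arm)
    if i < 0:
        return genome, 0
    start = i + len(left_arm)
    j = genome.find(right_arm, start)
    if j < 0:
        return genome, 0
    return genome[:start] + genome[j:], j - start


# Hypothetical locus: the 25 bp target lies inside the 50 bp target.
OUTER_L, OUTER_R = "GGGGCCCC", "CCCCGGGG"            # arms of the 50 bp donor
INNER_L, INNER_R = "ACACACACACAC", "GAGAGAGAGAGAG"   # arms of the 25 bp donor (12 + 13 bp)
CORE = "T" * 25                                      # bases removed by the 25 bp donor
locus = OUTER_L + INNER_L + CORE + INNER_R + OUTER_R

# Case 1: the 25 bp deletion happens first, then the 50 bp donor acts.
after_25, removed_first = apply_deletion_donor(locus, INNER_L, INNER_R)
after_50, removed_second = apply_deletion_donor(after_25, OUTER_L, OUTER_R)

# Case 2: the 50 bp donor acts directly on the unedited locus.
direct_50, removed_direct = apply_deletion_donor(locus, OUTER_L, OUTER_R)

print(removed_first, removed_second, removed_direct)
print(after_50 == direct_50)
```

In this model the 50 bp donor only has to remove the remaining 25 bp once the inner deletion is in place, and both routes end at the same final sequence.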
  • FIGS. 4A-4D Multisite editing of individual bacterial genomes using multitrons.
  • FIGS. 5A-5F Metabolic engineering using multitrons.
  • A. Top architecture of the multiplexed retron recombineering cassette in the temperature sensitive plasmid. The operon is composed of a single msr followed by 5x arrayed msds with donors and the genes encoding the RT, the CspRecT and the dominant negative MutLE32K.
  • Bottom quantification of precise editing rates using a 5x arrayed msd to edit hda, fbaH, priB, rpoB and gyrA by Illumina sequencing after 24h and 48h of editing (two-way ANOVA, effect of expression time P < 0.0001).
  • Circles show each of the three biological replicates, bars are mean ± SD. The order of the donors in the arrayed msd is indicated.
  • FIGS. 6A-6J Arrayed retron msds enable multiplexed editing in eukaryotic cells.
  • A. Top Schematic of the donor encoding retron ncRNA/gRNA expression cassette expressed from a Gal7 Pol II promoter and flanked by ribozymes versus a new construction replacing ribozymes with Csy4 sequences.
  • Bottom left schematic of a retron ncRNA-Cas9 gRNA hybrid for genome editing in yeast, depicted above the protein-coding expression cassette which is inserted into the yeast genome.
  • Bottom right quantification of precise editing of the ADE2 locus in yeast by Illumina sequencing after 48h of editing.
  • C Editing rates for each indicated site at each time point from bulk (Illumina amplicon sequencing) and individual colony sequencing. Circles show each of the three biological replicates, bars are mean ± SD. Mean colony sequencing rates are indicated with a bar.
  • D Quantification of expected (product of bulk rates at each indicated site) and real precise editing rates of double edits in individual genomes.
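
The "expected" rate in panel D is described as the product of the bulk rates at each site, which corresponds to assuming the sites are edited independently. A minimal sketch of that arithmetic follows; the function name and the example rates are hypothetical and are not taken from the figures.

```python
from math import prod


def expected_multi_edit_rate(bulk_rates):
    """Expected fraction of genomes carrying every edit, assuming
    each site is edited independently of the others."""
    rates = list(bulk_rates)
    for r in rates:
        if not 0.0 <= r <= 1.0:
            raise ValueError("rates must be fractions between 0 and 1")
    return prod(rates)


# Hypothetical bulk precise-editing rates (illustrative values only).
bulk = {"site_1": 0.5, "site_2": 0.4}
expected = expected_multi_edit_rate(bulk.values())
print(f"expected double-edit rate: {expected:.2f}")
```

An observed double-edit rate above this product would suggest that edits co-occur in the same genomes more often than independence predicts; a rate below it would suggest the opposite.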
  • G-I top: schematic of 2x, 3x or 5x arrayed retron msdRNA-Cas9 gRNA expression cassettes, as shown in c.
  • the editors target ADE2 and FAA1 (E); ADE2, CAN1 and FAA1 (F); and ADE2, CAN1, TRP2, SGS1 and FAA1 (G).
  • J. Arrayed retron msds enable multiplexed editing in human cells.
  • Top Schematic of the donor encoding retron ncRNA/gRNA expression cassette expressed from an H1 promoter and flanked by tRNA-Cys-GCA (hCtRNA) sequences.
  • FIG. S1 Trans msr multitron architecture enables precise genome editing.
  • Top Schematic of retron recombineering using an msd array with a single msr sequence in trans including a terminator (T) between the msd array and msr.
  • Bottom quantification of precise editing rates for precise editing of rpoB or gyrA simultaneously by Illumina sequencing after 24h of editing. Circles show each of the three biological replicates, bars are mean ± SD.
  • FIGS. S2A-S2F Optimization of retron recombineering using a single plasmid.
  • B. Quantification of precise editing rates for the rpoB target site at 30 and 37°C. Circles show each of the three biological replicates, lines are mean ± SD.
  • C. Quantification of OD600 using increasing concentrations of m-toluic acid after 16h of bacterial growth (n = 1).
  • FIGS. S3A-S3B Local off-target mutations.
  • A Quantification of precise editing rates for fbaH and hda genes using a live or dead version of Eco1 RT. Circles show each of the three biological replicates, bars are mean ± SD.
  • B Local off-target mutation frequency in the 70 bp region of the chromosome homologous to fbaH and hda editing donors using a live or dead version of Eco1 RT. Circles show each of the three biological replicates, bars are mean ± SD. All data were quantified using Illumina MiSeq after 24h of editing.
  • FIGS. S4A-S4E Undesired on-target mutation rates caused by arrayed retron multiplexed editing in yeast cells.
  • A. Top Schematic of the donor encoding retron ncRNA/gRNA expression cassette expressed from a Gal7 Pol II promoter and flanked by ribozymes versus a new construction replacing ribozymes with Csy4 sequences.
  • Bottom left schematic of a retron ncRNA-Cas9 gRNA hybrid for genome editing in yeast, depicted above the protein-coding expression cassette which is inserted into the yeast genome.
  • Circles show each of the three biological replicates, bars are mean ± SD; absence/presence of Csy4 in the protein-coding expression cassette is shown below the graph.
  • B. Top schematic of an arrayed retron ncRNA-Cas9 gRNA expression cassette, expressed from a Gal7 Pol II promoter, flanked by ribozymes, and separated by a Csy4 sequence. The retron editors in positions 1 and 2 target the ADE2 and FAA1 locus, respectively.
  • Bottom quantification of indel rates of the ADE2 and FAA1 loci in yeast by Illumina sequencing after 48h of editing.
  • Circles show each of the three biological replicates, bars are mean ± SD; absence/presence of Csy4 in the protein-coding expression cassette is shown below the graph.
  • C-E top: schematic of 2x, 3x or 5x arrayed retron msdRNA-Cas9 gRNA expression cassettes.
  • the editors target ADE2 and FAA1 (e); ADE2, CAN1 and FAA1 (f); and ADE2, CAN1, TRP2, SGS1 and FAA1 (g).
  • Retrons are tripartite systems composed of a reverse transcriptase (RT), a contiguous non-coding RNA (ncRNA) with two regions, msr and msd, and an additional protein or RT-fused domain with diverse enzymatic functions (3).
  • ssDNA single-stranded DNA
  • retron-derived donors have been shown to edit genomes efficiently across kingdoms of life (4,5). For many biotechnological and therapeutic applications, editing of multiple DNA loci in a single genome is desired. Therefore, herein a multiplexed retron-based genome editing tool has been developed. New retron architectures were designed that coded for several ssDNA donors, showing that this group of prokaryotic RTs represents a versatile bioengineering tool for editing up to 5 loci simultaneously in both prokaryotic and eukaryotic cells with efficiencies >90%. This technology was used to engineer the lycopene metabolic pathway in E. coli and S. cerevisiae, showing that retron-based genome editing could be used to increase the production of compounds of interest.
  • SSAPs single-stranded annealing proteins
  • references in the specification to "one embodiment,” “an embodiment,” etc., indicate that the embodiment described may include a particular aspect, feature, structure, moiety, or characteristic, but not every embodiment necessarily includes that aspect, feature, structure, moiety, or characteristic. Moreover, such phrases may, but do not necessarily, refer to the same embodiment referred to in other portions of the specification. Further, when a particular aspect, feature, structure, moiety, or characteristic is described in connection with an embodiment, it is within the knowledge of one skilled in the art to affect or connect such aspect, feature, structure, moiety, or characteristic with other embodiments, whether or not explicitly described.
  • one or more substituents on a phenyl ring refers to one to five, or one to four, for example if the phenyl ring is di-substituted.
  • the term “about” can refer to a variation of ± 5%, ± 10%, ± 20%, or ± 25% of the value specified. For example, “about 50” percent can in some embodiments carry a variation from 45 to 55 percent.
  • the term “about” can include one or two integers greater than and/or less than a recited integer at each end of the range. Unless indicated otherwise herein, the term “about” is intended to include values, e.g., weight percentages, proximate to the recited range that are equivalent in terms of the functionality of the individual ingredient, the composition, or the embodiment.
  • the term “about” can also modify the endpoints of a recited range as discussed above in this paragraph.
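
The arithmetic behind the "about 50 percent" example (45 to 55 percent, i.e., a ± 10% variation) can be sketched as follows. The helper name `about_range` is hypothetical and only illustrates the percentage-based reading of "about"; it does not cover the integer-based or functional-equivalence readings also given in the text.

```python
def about_range(value, percent_variation):
    """Return the (low, high) bounds for `value` varied by
    plus or minus `percent_variation` percent of the value itself."""
    delta = abs(value) * percent_variation / 100.0
    return value - delta, value + delta


# "about 50" percent with a 10% variation spans 45 to 55.
low, high = about_range(50, 10)
print(low, high)
```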
  • ranges recited herein also encompass any and all possible sub-ranges and combinations of sub-ranges thereof, as well as the individual values making up the range, particularly integer values.
  • a recited range e.g., weight percentages or carbon groups
  • Any listed range can be easily recognized as sufficiently describing and enabling the same range being broken down into at least equal halves, thirds, quarters, fifths, or tenths. As a non-limiting example, each range discussed herein can be readily broken down into a lower third, middle third and upper third, etc.
  • the invention encompasses not only the main group, but also the main group absent one or more of the group members.
  • the invention therefore envisages the explicit exclusion of any one or more of members of a recited group. Accordingly, provisos may apply to any of the disclosed categories or embodiments whereby any one or more of the recited elements, species, or embodiments, may be excluded from such categories or embodiments, for example, for use in an explicit negative limitation.
  • Recombinant as used herein to describe a nucleic acid molecule means a polynucleotide of genomic, cDNA, bacterial, semi synthetic, or synthetic origin which, by virtue of its origin or manipulation, is not associated with all or a portion of the polynucleotide with which it is associated in nature.
  • recombinant as used with respect to a protein or polypeptide means a polypeptide produced by expression of a recombinant polynucleotide.
  • the gene of interest is cloned and then expressed in transformed organisms, as described further below.
  • the host organism expresses the foreign gene to produce the protein under expression conditions.
  • a "cell” refers to any type of cell isolated from a prokaryotic, eukaryotic, or archaeon organism, including bacteria, archaea, fungi, protists, plants, and animals, including cells from tissues, organs, and biopsies, as well as recombinant cells, cells from cell lines cultured in vitro, and cellular fragments, cell components, or organelles comprising nucleic acids.
  • the term also encompasses artificial cells, such as nanoparticles, liposomes, polymersomes, or microcapsules encapsulating nucleic acids.
  • the methods described herein can be performed, for example, on a sample comprising a single cell or a population of cells.
  • the term also includes genetically modified cells.
  • transformation refers to the insertion of an exogenous polynucleotide (e.g., an engineered retron) into a host cell, irrespective of the method used for the insertion. For example, direct uptake, transduction or f-mating are included.
  • exogenous polynucleotide may be maintained as a non-integrated vector, for example, a plasmid, or alternatively, may be integrated into the host genome.
  • Recombinant host cells refer to cells which can be, or have been, used as recipients for recombinant vector or other transferred DNA, and include the original progeny of the original cell which has been transfected.
  • a “coding sequence” or a sequence which "encodes” a selected polypeptide is a nucleic acid molecule which is transcribed (in the case of DNA) and translated (in the case of mRNA) into a polypeptide in vivo when placed under the control of appropriate regulatory sequences (or “control elements”).
  • the boundaries of the coding sequence can be determined by a start codon at the 5' (amino) terminus and a translation stop codon at the 3' (carboxy) terminus.
  • a coding sequence can include, but is not limited to, cDNA from viral, prokaryotic or eukaryotic mRNA, genomic DNA sequences from viral or prokaryotic DNA, and even synthetic DNA sequences.
  • a transcription termination sequence may be located 3' to the coding sequence.
  • control elements include, but are not limited to, transcription promoters, transcription enhancer elements, transcription termination signals, polyadenylation sequences (located 3' to the translation stop codon), sequences for optimization of initiation of translation (located 5’ to the coding sequence), and translation termination sequences.
  • “Operably linked” refers to an arrangement of elements wherein the components so described are configured so as to perform their usual function.
  • a given promoter operably linked to a coding sequence is capable of effecting the expression of the coding sequence when the proper enzymes are present.
  • the promoter need not be contiguous with the coding sequence, so long as it functions to direct the expression thereof.
  • intervening untranslated yet transcribed sequences can be present between the promoter sequence and the coding sequence and the promoter sequence can still be considered “operably linked" to the coding sequence.
  • Encoded by or “coded by” refers to a nucleic acid sequence which codes for a polypeptide or RNA sequence.
  • the polypeptide sequence or a portion thereof contains an amino acid sequence of at least 3 to 5 amino acids, more preferably at least 8 to 10 amino acids, and even more preferably at least 15 to 20 amino acids from a polypeptide encoded by the nucleic acid sequence.
  • the RNA sequence or a portion thereof contains a nucleotide sequence of at least 3 to 5 nucleotides, more preferably at least 8 to 10 nucleotides, and even more preferably at least 15 to 20 nucleotides.
  • isolated refers to material that is free to varying degrees from components which normally accompany it as found in its native state.
  • Isolate denotes a degree of separation from original source or surroundings.
  • Purify denotes a degree of separation that is higher than isolation.
  • a “purified” or “biologically pure” protein is sufficiently free of other materials such that any impurities do not materially affect the biological properties of the protein or cause other adverse consequences. That is, a nucleic acid or peptide of this invention is purified if it is substantially free of cellular material, viral material, or culture medium when produced by recombinant DNA techniques, or chemical precursors or other chemicals when chemically synthesized.
  • Purity and homogeneity are typically determined using analytical chemistry techniques, for example, polyacrylamide gel electrophoresis or high-performance liquid chromatography.
  • the term "purified" can denote that a nucleic acid or protein gives rise to essentially one band in an electrophoretic gel.
  • modifications for example, phosphorylation or glycosylation, different modifications may give rise to different isolated proteins, which can be separately purified.
  • substantially purified generally refers to isolation of a substance (compound, polynucleotide, protein, polypeptide, peptide composition) such that the substance comprises the majority percent of the sample in which it resides.
  • a substantially purified component comprises 50%, preferably 80%-85%, more preferably 90-95% of the sample.
  • Techniques for purifying polynucleotides and polypeptides of interest are well-known in the art and include, for example, ion-exchange chromatography, affinity chromatography and sedimentation according to density.
  • “Expression” refers to detectable production of a gene product by a cell.
  • the gene product may be a transcription product (i.e., RNA), which may be referred to as “gene expression”, or the gene product may be a translation product of the transcription product (i.e., a protein), depending on the context.
  • “Purified polynucleotide” refers to a polynucleotide of interest or fragment thereof which is essentially free, e.g., contains less than about 50%, preferably less than about 70%, and more preferably less than about at least 90%, of the protein and/or nucleic acids with which the polynucleotide is naturally associated.
  • Techniques for purifying polynucleotides of interest include, for example, disruption of the cell containing the polynucleotide with a chaotropic agent and separation of the polynucleotide(s) and proteins by ion-exchange chromatography, affinity chromatography and sedimentation according to density.
  • transfection is used to refer to the uptake of foreign DNA by a cell.
  • a cell has been "transfected” when exogenous DNA has been introduced inside the cell membrane.
  • transfection techniques are generally known in the art. See, e.g., Graham et al. (1973) Virology, 52:456, Sambrook et al. (2001) Molecular Cloning, a laboratory manual, 3rd edition, Cold Spring Harbor Laboratories, New York, Davis et al. (1995) Basic Methods in Molecular Biology, 2nd edition, McGraw-Hill, and Chu et al. (1981) Gene 13: 197.
  • Such techniques can be used to introduce one or more exogenous DNA moieties into suitable host cells.
  • the term refers to both stable and transient uptake of the genetic material and includes uptake of peptide-linked or antibody-linked DNAs.
  • a “vector” is capable of transferring nucleic acid sequences to target cells (e.g., viral vectors, non-viral vectors, particulate carriers, and liposomes).
  • expression vector e transfer vector
  • the term includes cloning and expression vehicles, as well as viral vectors.
  • “Mammalian cell” refers to any cell derived from a mammalian subject suitable for transfection with an engineered retron or vector system comprising an engineered retron, as described herein.
  • the cell may be xenogeneic, autologous, or allogeneic.
  • the cell can be a primary cell obtained directly from a mammalian subject.
  • the cell may also be a cell derived from the culture and expansion of a cell obtained from a mammalian subject. Immortalized cells are also included within this definition.
  • the cell has been genetically engineered to express a recombinant protein and/or nucleic acid.
  • subject includes animals, including both vertebrates and invertebrates, including, without limitation, invertebrates such as arthropods, mollusks, annelids, and cnidarians; and vertebrates such as amphibians, including frogs, salamanders, and caecilians; reptiles, including lizards, snakes, turtles, crocodiles, and alligators; fish; mammals, including human and non-human mammals such as non-human primates, including chimpanzees and other apes and monkey species; laboratory animals such as mice, rats, rabbits, hamsters, guinea pigs, and chinchillas; domestic animals such as dogs and cats; farm animals such as sheep, goats, pigs, horses and cows; and birds such as domestic, wild and game birds, including chickens, turkeys and other gallinaceous birds, ducks, geese, and the like.
  • the disclosed methods find use of the disclosed methods, find
  • Gene transfer refers to methods or systems for reliably inserting DNA or RNA of interest into a host cell. Such methods can result in transient expression of non-integrated transferred DNA, extrachromosomal replication and expression of transferred replicons (e.g., episomes), or integration of transferred genetic material into the genomic DNA of host cells.
  • Gene delivery expression vectors include, but are not limited to, vectors derived from bacterial plasmid vectors, viral vectors, non-viral vectors, alphaviruses, pox viruses and vaccinia viruses.
  • derived from is used herein to identify the original source of a molecule but is not meant to limit the method by which the molecule is made which can be, for example, by chemical synthesis or recombinant means.
  • a polynucleotide "derived from" a designated sequence refers to a polynucleotide sequence which comprises a contiguous sequence of approximately at least about 6 nucleotides, preferably at least about 8 nucleotides, more preferably at least about 10-12 nucleotides, and even more preferably at least about 15-20 nucleotides corresponding, i.e., identical or complementary to, a region of the designated nucleotide sequence.
  • the derived polynucleotide will not necessarily be derived physically from the nucleotide sequence of interest, but may be generated in any manner, including, but not limited to, chemical synthesis, replication, reverse transcription or transcription, which is based on the information provided by the sequence of bases in the region(s) from which the polynucleotide is derived. As such, it may represent either a sense or an antisense orientation of the original polynucleotide.
  • a "barcode” refers to one or more nucleotide sequences that are used to identify a nucleic acid or cell with which the barcode is associated. Barcodes can be 3-1000 or more nucleotides in length, preferably 10-250 nucleotides in length, and more preferably 10-30 nucleotides in length, including any length within these ranges, such as 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 nucleotides in length.
  • Barcodes may be used, for example, to identify a single cell, subpopulation of cells, colony, or sample from which a nucleic acid originated. Barcodes may also be used to identify the position (i.e., positional barcode) of a cell, colony, or sample from which a nucleic acid originated, such as the position of a colony in a cellular array, the position of a well in a multi-well plate, or the position of a tube, flask, or other container in a rack. For example, a barcode may be used to identify a genetically modified cell from which a nucleic acid originated. In some embodiments, a barcode is used to identify a particular type of genome edit or a particular type of donor nucleic acid.
  • hybridize and “hybridization” refer to the formation of complexes between nucleotide sequences which are sufficiently complementary to form complexes via Watson-Crick base pairing.
  • homologous region refers to a region of a nucleic acid with homology to another nucleic acid region. Thus, whether a "homologous region” is present in a nucleic acid molecule is determined with reference to another nucleic acid region in the same or a different molecule. Further, since a nucleic acid is often double-stranded, the term “homologous region,” as used herein, refers to the ability of nucleic acid molecules to hybridize to each other. For example, a single-stranded nucleic acid molecule can have two homologous regions which are capable of hybridizing to each other. Thus, the term “homologous region” includes nucleic acid segments with complementary sequences.
  • Homologous regions may vary in length, but will typically be between 4 and 500 nucleotides (e.g., from about 4 to about 40, from about 40 to about 80, from about 80 to about 120, from about 120 to about 160, from about 160 to about 200, from about 200 to about 240, from about 240 to about 280, from about 280 to about 320, from about 320 to about 360, from about 360 to about 400, from about 400 to about 440, etc.).
  • complementary refers to polynucleotides that are able to form base pairs with one another. Base pairs are typically formed by hydrogen bonds between nucleotide units in an anti-parallel orientation between polynucleotide strands. Complementary polynucleotide strands can base pair in a Watson-Crick manner (e.g., A to T, A to U, C to G), or in any other manner that allows for the formation of duplexes. As persons skilled in the art are aware, when using RNA as opposed to DNA, uracil (U) rather than thymine (T) is the base that is considered to be complementary to adenosine.
  • uracil when uracil is denoted in the context of the present invention, the ability to substitute a thymine is implied, unless otherwise stated.
  • “Complementarity” may exist between two RNA strands, two DNA strands, or between an RNA strand and a DNA strand. It is generally understood that two or more polynucleotides may be “complementary” and able to form a duplex despite having less than perfect or less than 100% complementarity. Two sequences are "perfectly complementary” or "100% complementary” if at least a contiguous portion of each polynucleotide sequence, comprising a region of complementarity, perfectly base pairs with the other polynucleotide without any mismatches or interruptions within such region.
  • Two or more sequences are considered “perfectly complementary” or “100% complementary” even if either or both polynucleotides contain additional non-complementary sequences as long as the contiguous region of complementarity within each polynucleotide is able to perfectly hybridize with the other.
  • "Less than perfect” complementarity refers to situations where less than all of the contiguous nucleotides within such region of complementarity are able to base pair with each other. Determining the percentage of complementarity between two polynucleotide sequences is a matter of ordinary skill in the art.
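
As a rough illustration of how a percentage of complementarity might be computed, the sketch below compares two equal-length strands in anti-parallel orientation using Watson-Crick pairs only (A to T, A to U, C to G). The function names are hypothetical, and the model is deliberately simplified: it does no alignment, ignores non-Watson-Crick pairing, and does not search for a contiguous complementary region inside longer sequences.

```python
# Watson-Crick pairs, allowing U in place of T for RNA strands.
_PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def percent_complementarity(strand_a, strand_b):
    """Percent of positions that base pair when strand_a (5'->3') is
    laid against strand_b (5'->3') in anti-parallel orientation."""
    a, b = strand_a.upper(), strand_b.upper()
    if not a or len(a) != len(b):
        raise ValueError("strands must be non-empty and of equal length")
    paired = sum((x, y) in _PAIRS for x, y in zip(a, reversed(b)))
    return 100.0 * paired / len(a)


def is_perfectly_complementary(strand_a, strand_b):
    """True when every position pairs without mismatch."""
    return percent_complementarity(strand_a, strand_b) == 100.0


print(percent_complementarity("ACGU", "ACGT"))   # RNA strand against DNA strand
print(percent_complementarity("AAGG", "CCTA"))   # one mismatch out of four
```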
  • Cas9 encompasses type II clustered regularly interspaced short palindromic repeats (CRISPR) system Cas9 endonucleases from any species, and also includes biologically active fragments, variants, analogs, and derivatives thereof that retain Cas9 endonuclease activity (i.e., catalyze site-directed cleavage of DNA to generate double-strand breaks).
  • a gRNA may comprise a sequence "complementary" to a target sequence (e.g., major or minor allele), capable of sufficient base-pairing to form a duplex (i.e., the gRNA hybridizes with the target sequence). Additionally, the gRNA may comprise a sequence complementary to a PAM sequence, wherein the gRNA also hybridizes with the PAM sequence in a target DNA.
  • donor polynucleotide refers to a polynucleotide that provides a sequence of an intended edit to be integrated into the genome at a target locus by HDR or recombineering.
  • a "target site” or “target sequence” is the nucleic acid sequence recognized (i.e., sufficiently complementary for hybridization) by a guide RNA (gRNA) or a homology arm of a donor polynucleotide.
  • the target site may be allele-specific (e.g., a major or minor allele).
  • a target site can be a genomic site that is intended to be modified such as by insertion of one or more nucleotides, replacement of one or more nucleotides, deletion of one or more nucleotides, or a combination thereof.
  • homology arm is meant a portion of a donor polynucleotide that is responsible for targeting the donor polynucleotide to the genomic sequence to be edited in a cell.
  • the donor polynucleotide typically comprises a 5' homology arm that hybridizes to a 5' genomic target sequence and a 3' homology arm that hybridizes to a 3' genomic target sequence flanking a nucleotide sequence comprising the intended edit to the genomic DNA.
  • the homology arms are referred to herein as 5' and 3' (i.e., upstream and downstream) homology arms, which relates to the relative position of the homology arms to the nucleotide sequence comprising the intended edit within the donor polynucleotide.
  • the 5' and 3' homology arms hybridize to regions within the target locus in the genomic DNA to be modified, which are referred to herein as the "5' target sequence” and "3' target sequence,” respectively.
  • the nucleotide sequence comprising the intended edit can be integrated into the genomic DNA by HDR or recombineering at the genomic target locus recognized (i.e., sufficiently complementary for hybridization) by the 5' and 3' homology arms.
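
The donor layout described above (5' homology arm, intended edit, 3' homology arm) can be sketched as simple string assembly. The function `build_donor`, the toy locus, and the arm length are all hypothetical, and the sketch returns only the sequence on the same strand as the input; strand choice and arm-length optimization are outside its scope.

```python
def build_donor(genome, edit_start, edit_end, edit, arm_len):
    """Assemble a donor as 5' arm + intended edit + 3' arm.

    genome[edit_start:edit_end] is the stretch to be replaced:
      - replacement: non-empty stretch, non-empty edit
      - insertion:   edit_start == edit_end, non-empty edit
      - deletion:    non-empty stretch, edit == ""
    """
    if not 0 <= edit_start <= edit_end <= len(genome):
        raise ValueError("edit coordinates fall outside the genome")
    if edit_start - arm_len < 0 or edit_end + arm_len > len(genome):
        raise ValueError("not enough flanking sequence for the homology arms")
    five_prime_arm = genome[edit_start - arm_len:edit_start]
    three_prime_arm = genome[edit_end:edit_end + arm_len]
    return five_prime_arm + edit + three_prime_arm


locus = "AAACCCGGGTTT"                            # toy 12 bp locus
replacement = build_donor(locus, 6, 9, "A", 3)    # replace GGG with A
deletion = build_donor(locus, 6, 9, "", 3)        # delete GGG
insertion = build_donor(locus, 6, 6, "TAG", 3)    # insert TAG before GGG
print(replacement, deletion, insertion)
```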
  • a CRISPR adaptation system refers collectively to transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated (“Cas") genes, including sequences encoding a Cas gene, and a CRISPR array nucleic acid sequence including a leader sequence and at least one repeat sequence.
  • one or more elements of a CRISPR adaptation system are derived from a type I, type II, or type III CRISPR system.
  • Cas1 and Cas2 are found in all three types of CRISPR-Cas systems, and they are involved in spacer acquisition. In the I-E system of E. coli, Cas1 and Cas2 form a complex where a Cas2 dimer bridges two Cas1 dimers.
  • Cas2 performs a non-enzymatic scaffolding role, binding double-stranded fragments of invading DNA, while Cas1 binds the single-stranded flanks of the DNA and catalyzes their integration into CRISPR arrays.
  • one or more elements of a CRISPR system is derived from a particular organism comprising an endogenous CRISPR system, such as Streptococcus pyogenes.
  • a CRISPR system is characterized by elements that promote the formation of a CRISPR complex at the site of a target sequence (also referred to as a protospacer in the context of an endogenous CRISPR system).
  • a CRISPR system refers collectively to transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated (“Cas”) genes, including sequences encoding a Cas gene, and a CRISPR array nucleic acid sequence including a leader sequence and at least one repeat sequence.
  • a CRISPR system can be a type I, type II, or type III CRISPR system.
  • a vector comprises a regulatory element operably linked to an enzyme-coding sequence encoding a CRISPR enzyme, such as a Cas protein.
  • Cas proteins include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (also known as Csn1 and Csx12), Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, homo
  • Cas1 and Cas2 are found in type I, type II, or type III CRISPR systems, and they are involved in spacer acquisition.
  • Cas1 and Cas2 form a complex where a Cas2 dimer bridges two Cas1 dimers.
  • Cas2 performs a non-enzymatic scaffolding role, binding double-stranded fragments of invading (phage) DNA, while Cas1 binds the single-stranded flanks of the DNA and catalyzes their integration into CRISPR arrays.
  • the disclosure provides protospacers that are adjacent to short (3 - 5 bp) DNA sequences termed protospacer adjacent motifs (PAM).
  • PAMs are important for type I and type II systems during acquisition.
  • in type I and type II systems, protospacers are excised at positions adjacent to a PAM sequence, with the other end of the spacer cut using a ruler mechanism, thus maintaining the regularity of the spacer size in the CRISPR array.
  • the conservation of the PAM sequence differs between CRISPR-Cas systems and may be evolutionarily linked to Cas1 and the leader sequence.
  • the disclosure provides for integration of defined synthetic DNA that is produced within a cell such as by using an engineered retron system within the cell into a CRISPR array in a directional manner, occurring preferentially, but not exclusively, adjacent to the leader sequence.
  • the protospacer is a defined synthetic DNA.
  • the defined synthetic DNA is at least 3, 5, 10, 20, 30, 40, or 50 nucleotides, or between 3-50, or between 10-100, or between 20-90, or between 30-80, or between 40-70, or between 50-60, nucleotides in length.
  • the oligonucleotide sequence or the defined synthetic DNA includes a modified "AAG" protospacer adjacent motif (PAM).
  • a regulatory element is operably linked to one or more elements of a CRISPR system so as to drive expression of the one or more elements of the CRISPR system.
  • CRISPRs Clustered Regularly Interspaced Short Palindromic Repeats
  • SPIDRs Spacer Interspersed Direct Repeats
  • the CRISPR locus comprises a distinct class of interspersed short sequence repeats (SSRs) that were recognized in E. coli (Ishino et al., J. Bacteriol., 169:5429-5433 (1987); and Nakata et al., J.
  • the CRISPR loci typically differ from other SSRs by the structure of the repeats, which have been termed short regularly spaced repeats (SRSRs) (Jansen et al., OMICS J. Integ. Biol., 6:23-33 (2002); and Mojica et al., Mol. Microbiol., 36:244-246 (2000)).
  • SRSRs short regularly spaced repeats
  • the repeats are short elements that occur in clusters that are regularly spaced by unique intervening sequences with a substantially constant length (Mojica et al., (2000), supra).
  • although the repeat sequences are highly conserved between strains, the number of interspersed repeats and the sequences of the spacer regions typically differ from strain to strain (van Embden et al., J.
  • CRISPR loci have been identified in more than 40 prokaryotes (See e.g., Jansen et al., Mol. Microbiol., 43:1565-1575 (2002); and Mojica et al., (2005)) including, but not limited to Aeropyrum, Pyrobaculum, Sulfolobus, Archaeoglobus, Haloarcula, Methanobacterium, Methanococcus, Methanosarcina, Methanopyrus, Pyrococcus, Picrophilus, Thermoplasma, Corynebacterium, Mycobacterium, Streptomyces, Aquifex, Porphyromonas, Chlorobium, Thermus, Bacillus, Listeria, Staphylococcus, Clostridium, Thermoanaerobacter, Mycoplasma, Fusobacterium, Azoarcus, Chromobacterium, Neisseria,
  • an enzyme coding sequence encoding a CRISPR enzyme is codon optimized for expression in particular cells, such as eukaryotic cells.
  • the eukaryotic cells may be those of or derived from a particular organism, such as a mammal, including but not limited to human, mouse, rat, rabbit, dog, or non-human primate.
  • codon optimization refers to a process of modifying a nucleic acid sequence for enhanced expression in the host cells of interest by replacing at least one codon (e.g. about one or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more codons) of the native sequence with codons that are more frequently or most frequently used in the genes of that host cell while maintaining the native amino acid sequence.
  • Codon bias (differences in codon usage between organisms)
  • mRNA messenger RNA
  • tRNA transfer RNA
  • the predominance of selected tRNAs in a cell is generally a reflection of the codons used most frequently in peptide synthesis. Accordingly, genes can be tailored for optimal gene expression in a given organism based on codon optimization. Codon usage tables are readily available, for example, at the "Codon Usage Database", and these tables can be adapted in a number of ways.
  • tools for codon optimizing a particular sequence for expression in a particular host cell, such as Gene Forge (Aptagen; Jacobus, Pa.), are also available.
  • one or more codons e.g., 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more, or all codons
  • one or more codons in a sequence encoding a CRISPR enzyme correspond to the most frequently used codon for a particular amino acid.
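The codon replacement described above can be sketched in a few lines. This is a minimal illustration, not the method of the disclosure: the codon table is deliberately truncated, and the usage frequencies in `USAGE` are hypothetical placeholders rather than values from any real codon usage table. The function names are likewise illustrative.

```python
# Truncated codon table (only the codons used in this toy example).
CODON_TABLE = {
    "ATG": "M", "AAA": "K", "AAG": "K",
    "CTG": "L", "TTA": "L", "TAA": "*", "TGA": "*",
}

# HYPOTHETICAL host codon frequencies, for illustration only.
USAGE = {
    "ATG": 1.0, "AAA": 0.4, "AAG": 0.6,
    "CTG": 0.7, "TTA": 0.3, "TAA": 0.6, "TGA": 0.4,
}


def preferred_codons(codon_table, usage):
    """Map each amino acid to its most frequently used codon."""
    best = {}
    for codon, aa in codon_table.items():
        if aa not in best or usage[codon] > usage[best[aa]]:
            best[aa] = codon
    return best


def translate(cds, codon_table=CODON_TABLE):
    """Translate a coding sequence codon by codon."""
    return "".join(codon_table[cds[i:i + 3]] for i in range(0, len(cds), 3))


def codon_optimize(cds, codon_table=CODON_TABLE, usage=USAGE):
    """Replace every codon with the host-preferred synonymous codon."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    best = preferred_codons(codon_table, usage)
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return "".join(best[codon_table[c]] for c in codons)


native = "ATGAAATTATGA"
optimized = codon_optimize(native)
# The encoded amino acid sequence is unchanged by the substitution.
assert translate(native) == translate(optimized)
```

Replacing only a subset of codons (for example 1, 2, 5, or 10 of them) would follow the same pattern with a filter on which positions are rewritten.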
  • Some bacterial cells have a Class II CRISPR system where endoribonucleases (cas nucleases) are expressed that can preferentially cleave specific sequences, including certain repeat sequences in DNA, various U-rich regions in RNAs, and sites near a protospacer adjacent motif (PAM).
  • Class II CRISPR systems, for example, can include a cluster of four genes, Cas9, Cas1, Cas2, and Csn1, that employ a tracrRNA and a crispr RNA (crRNA).
  • crRNA crispr RNA
  • a targeted DNA double-strand break (DSB) may be generated in four sequential steps. First, the pre-crRNA and tracrRNA may be expressed.
  • tracrRNA may hybridize to the direct repeats of pre-CRISPR guide RNA (pre-crRNA), which is then processed into mature crRNAs containing individual spacer sequences.
  • pre-crRNA pre-CRISPR guide RNA
  • the mature crRNA:tracrRNA complex can direct a cas nuclease to the DNA target consisting of the protospacer and the corresponding PAM sequence via heteroduplex formation between the spacer region of the crRNA and the protospacer DNA.
  • the cas nuclease may then cleave target DNA upstream of the PAM site to create a double-stranded break within the protospacer. Such cleavage can undermine or destroy a phage.
  • Cas nucleases bind to nucleic acids only in presence of a specific sequence, called protospacer adjacent motif (PAM), on the non-targeted DNA strand. Therefore, the locations in the genome that can be targeted by different Cas proteins are limited by the locations of these PAM sequences.
  • the cas nuclease cuts 3-4 nucleotides upstream of the PAM sequence.
  • Table 1 Examples of Cas nucleases and their PAM sequences.
  • an “N” in a PAM sequence means that any nucleotide is present; an R means that an A or a G is present; a W means that an A or a T is present; a Y means that a T or a C is present; and a V means that an A, C or G is present.
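The degenerate-base codes listed above translate directly into a pattern search. The sketch below expands a PAM written with those codes into a regular expression, scans one strand of a sequence for matches, and reports a putative cut position a fixed number of nucleotides upstream of each PAM. The example PAM `NGG`, the default offset of 3, and the function names are assumptions chosen for illustration. Only the given strand is scanned; the reverse strand is not modeled.

```python
import re

# Degenerate-base codes as defined in the text above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "R": "[AG]", "W": "[AT]", "Y": "[CT]", "V": "[ACG]",
}


def pam_to_regex(pam):
    """Expand a PAM with degenerate codes into a regex fragment."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_pam_sites(seq, pam):
    """Return 0-based start positions of every (possibly overlapping) PAM match."""
    # A lookahead is used so that overlapping matches are all reported.
    pattern = re.compile("(?=(" + pam_to_regex(pam) + "))")
    return [m.start() for m in pattern.finditer(seq.upper())]


def cut_positions(seq, pam, offset=3):
    """Putative cut positions `offset` nucleotides upstream of each PAM start."""
    return [p - offset for p in find_pam_sites(seq, pam) if p - offset >= 0]


sites = find_pam_sites("ACGTTGGAAGGCC", "NGG")
cuts = cut_positions("ACGTTGGAAGGCC", "NGG")
```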
  • "Administering" a nucleic acid, such as an engineered retron construct or vector comprising an engineered retron construct to a cell comprises transducing, transfecting, electroporating, translocating, fusing, phagocytosing, shooting or ballistic methods, etc., i.e., any means by which a nucleic acid can be transported across a cell membrane.
  • Retrons are defined by their unique ability to produce an unusual satellite DNA known as msDNA (multicopy single-stranded DNA).
  • a typical retron operon consists of a gene encoding a retron reverse transcriptase (RT) (encoded by the ret gene), a region encoding a non-coding RNA (ncRNA), which includes two contiguous and inverted non-coding sequences referred to as the msr and msd, and a gene encoding an accessory protein.
  • the ncRNA serves both as a primer site (i.e., the msr region) for binding of the retron RT and as a template for the reverse transcriptase (i.e., the msd region).
  • RNA transcripts including the msr and msd
  • the ncRNA then becomes folded into a specific secondary structure.
  • the 5' and 3' ends of ncRNA are referred to generally as the a1 and a2 complementary regions and can hybridize to one another to form a stem or duplex region referred to as the “a1/a2 stem” or the “a1/a2 duplex” of the ncRNA.
  • the retron RT, once translated, binds the ncRNA downstream from the msd locus (without being bound by theory, the binding may involve the a1/a2 duplex) and initiates reverse transcription of the msd region as a template sequence, thereby generating a single strand DNA reverse transcriptase product (i.e., the RT-DNA, with a characteristic hairpin structure, which in wild type retrons varies in length from about 48 to 163 bases).
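The pairing between the 5' and 3' ends of the ncRNA can be checked computationally. The following sketch measures how many terminal bases of an RNA sequence form a perfect Watson-Crick stem when the two ends fold back on each other. It is a simplification under stated assumptions: G-U wobble pairs, bulges, and thermodynamic stability are ignored, and the example sequence is invented for illustration rather than taken from any retron.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def terminal_stem_length(ncrna, max_len=None):
    """Count consecutive Watson-Crick pairs between the 5' and 3' termini.

    Position i from the 5' end is compared with position i from the 3' end,
    which is how the two ends would pair in an antiparallel duplex.
    """
    seq = ncrna.upper()
    limit = len(seq) // 2
    if max_len is not None:
        limit = min(limit, max_len)
    n = 0
    while n < limit and RNA_COMPLEMENT[seq[n]] == seq[-1 - n]:
        n += 1
    return n


# Invented example: 5'-GGCAU ... AUGCC-3' forms a 5 bp terminal stem.
stem = terminal_stem_length("GGCAUAAAAUGCC")
```

A check of this kind could be used, for example, to confirm that an extended stem design still pairs along its full intended length.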
  • the RT-DNA as part of the priming event, is covalently attached to a 2’ OH group present in a conserved branching guanosine residue. Reverse transcription halts before reaching the msr locus.
  • RNA the remaining portions of the ncRNA not removed by processing
  • DNA the single stranded RT-DNA product covalently attached to the ncRNA
  • retrons A large number of retrons have been identified and can be modified or engineered as described herein.
  • Eco1 (previously called Ec86)
  • in BL21 E. coli cells, this retron is present and active, producing reverse-transcribed DNA that can be detected at the population level.
  • the wild type Eco1 retron can be eliminated from BL21 E. coli cells by removing the retron operon from the genome. In the absence of this native operon, the ncRNA and reverse transcriptase can be expressed from a plasmid lacking the accessory protein.
  • because the accessory protein is a core component of the phage defense conferred by retrons, this reduced system would have reduced phage defense capacity, yet cells with ncRNA-reverse transcriptase encoding plasmids continue to produce abundant reverse transcribed DNA.
  • the accessory protein coding region is not included in the engineered retrons.
  • ncRNA Eco1 wild type retron non-coding RNA
  • RT reverse transcriptase
  • An example of an Eco2 human-codon optimized reverse transcriptase (RT) sequence is shown below as SEQ ID NO: 4.
  • An example of an Eco1 wild type retron reverse transcriptase sequence is shown below as SEQ ID NO: 5.
  • An example of an Eco2 wild type retron reverse transcriptase sequence is shown below as SEQ ID NO: 3.
  • An example of an Eco2 wild type retron reverse transcriptase sequence is shown below as SEQ ID NO: 6.
  • An example of a sequence for an Eco4 retron reverse transcriptase is shown below as SEQ ID NO: 7.
  • An example of a sequence for a Sen2 retron reverse transcriptase is shown below as SEQ ID NO: 8.
  • variants and homologs of any of the sequences described here can also be used in the methods and systems described herein.
  • such variants and homologs can have less than 100% sequence identity to any of the sequences described herein.
  • the variants and homologs can have at least about 40% sequence identity, or at least 50% sequence identity, or at least 60% sequence identity, or at least 70% sequence identity, or at least 80% sequence identity, or at least 90% sequence identity, or at least 95% sequence identity, or at least 96% sequence identity, or at least 97% sequence identity, or at least 98% sequence identity, or at least 99% sequence identity, or 60-99% sequence identity, or 70-99% sequence identity, or 80-99% sequence identity, or 90-95% sequence identity, or 90-99% sequence identity, or 95-97% sequence identity, or 97-99% sequence identity, or 100% sequence identity with any of the sequences described herein.
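One simple way to evaluate a candidate variant against thresholds such as these is to compute percent identity over a pairwise alignment. The sketch below assumes the two sequences have already been aligned (equal length, with `-` marking gaps) and uses alignment columns as the denominator. That denominator is one common convention among several, and it is an assumption here, not a definition taken from the disclosure.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Columns where both sequences have a gap are skipped. A column counts as
    a match only when both sequences carry the same non-gap residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def meets_identity(aligned_a, aligned_b, threshold):
    """True when the alignment reaches at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


identity = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
```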
  • any of the retrons described in Mestre et al., Systematic Prediction of Genes Functionally Associated with Bacterial Retrons and Classification of The Encoded Tripartite Systems, Nucleic Acids Research, Volume 48, Issue 22, 16 December 2020, Pages 12632-12647, may be used as a starting point by which to introduce the modifications described herein to result in the engineered retrons, ncRNAs, msDNAs, and RT-DNAs described herein.
  • These retron sequences are provided as follows in Table A:
  • Engineered retrons are provided in which the retron ncRNA is modified to encode multiple editing donors/repair templates, such that two or more donor/template sequences are provided, including two, three, four, five, six, seven, eight, nine, or ten donor/template sequences.
  • vector systems encoding such engineered retrons and methods of using engineered retrons and vector systems encoding them in various applications such as CRISPR/Cas-mediated genome editing, recombineering, cellular barcoding, and molecular recording are also provided.
  • some embodiments comprise:
  • modified retron non-coding RNAs (ncRNAs) that contain homology to multiple sites (e.g., two or more sites) in the bacterial genome with the intended mutations (deletion, addition, substitution, etc.) in these sites;
  • a ret gene coding for a reverse transcriptase such as a retron reverse-transcriptase (RT) protein to reverse-transcribe the retron ncRNA into a single-stranded DNA template for recombination
  • SSAP recT/single-stranded annealing protein
  • SSB single-stranded binding protein
  • some embodiments comprise:
  • modified retron ncRNAs that contain both homology to multiple sites in the eukaryotic genome and the intended mutations (deletion, addition, substitution, etc.) to those sites;
  • a retron reverse-transcriptase (RT) protein to reverse-transcribe the retron ncRNA into a single-stranded DNA donor for precisely repairing the yeast genome with the desired edit;
  • a DNA repair template can comprise a single strand DNA product of reverse transcription which comprises a nucleotide sequence having a sequence modification (e.g., a desired one or more mutations, insertion, deletion, or inversion) that is flanked by regions of homology to a target genomic site.
  • Such engineered retrons provide both the guide RNA (as part of the ncRNA) and the DNA repair template (encoded as part of the msd region, which is converted by the retron RT to a single strand RT-DNA which operates as the DNA repair template), thereby providing a vehicle to make the desired nucleotide changes at genomic sites.
  • the engineered retron ncRNA can be modified from its endogenous sequence (e.g., the endogenous ncRNA from retron sequences of Table A) in various ways, including but not limited to: (1) the ncRNA can be fused to a guide sequence (e.g., a CRISPR crRNA-tracrRNA), allowing the transcribed ncRNA to serve as a targeting molecule for a trans-expressed RNA-guided nuclease (e.g., a CRISPR nuclease); (2) the msd region (reverse transcribed region of the retron ncRNA) can be modified to contain a sequence that is reverse transcribed to provide a DNA donor/repair template; and (3) the a1/a2 duplex can be modified in length to facilitate increased production of the RT-DNA.
  • a DNA donor/repair template can comprise a single strand DNA product of reverse transcription which comprises a nucleotide sequence having a sequence modification (e.g., a desired one or more mutations, insertions, deletions, or inversions) that is flanked by regions of homology to a target genomic site.
  • engineered retrons provide both the guide RNA (as part of the ncRNA) and the DNA donor/repair template (encoded as part of the msd region, which is converted by the retron RT to a single strand RT-DNA which operates as the DNA donor/repair template), thereby providing a vehicle to make the desired nucleotide changes at genomic sites
  • Retron msr, msd, and/or reverse transcriptases used in the engineered retrons may be derived from a bacterial retron operon.
  • Representative retrons are available such as those from gram-negative bacteria including, without limitation, myxobacteria retrons such as Myxococcus xanthus retrons (e.g., Mx65, Mx162) and Stigmatella aurantiaca retrons (e.g., Sa163); Escherichia coli retrons (e.g., Ec48, Ec67, Ec73, Ec78, Ec83, Ec86, and Ec107); Salmonella enterica; Vibrio cholerae retrons (e.g., Vc81, Vc95, Vc137); Vibrio parahaemolyticus (e.g., Vc96); and Nannocystis exedens retrons (e.g., Ne144), or those retrons
  • Retron msr gene, msd gene, and ret gene nucleic acid sequences as well as retron reverse transcriptase protein sequences may be derived from any source, including those of Table A. Representative retron sequences, including msr gene, msd gene, and ret gene nucleic acid sequences and reverse transcriptase protein sequences are listed in the National Center for Biotechnology Information (NCBI) database. See, for example, NCBI entries: Accession Nos.
  • any of these retron sequences or a variant thereof comprising a sequence can include variant nucleotides, added nucleotides, or fewer nucleotides.
  • the retrons can have at least about 80-100% sequence identity thereto, including any percent identity within this range, such as 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99% sequence identity to any of the retron sequences described herein (including those defined by accession number), and can be used to construct an engineered retron or vector system comprising an engineered retron, as described herein.
  • recombinant retron constructs can have a non-native configuration with a non-native spacing between the msr gene, msd gene, and ret gene.
  • the msr gene and the msd gene may be separated in a trans arrangement rather than provided in the endogenous cis arrangement.
  • the ret gene may be provided in a trans arrangement with respect to either the msr gene or the msd gene, or both.
  • the ret gene is provided in a trans arrangement that eliminates a cryptic stop signal for the reverse transcriptase, which allows the generation of longer single stranded DNAs from the engineered retron construct.
  • the retron construct is modified with respect to the native retron to include one or more donor/repair templates of interest, such as two or more donor/repair templates of interest.
  • the retrons can be engineered with donor/repair templates for use in a variety of applications.
  • donor/repair templates can be added to retron constructs to provide a cell with a nucleic acid encoding a protein or regulatory RNA of interest, a donor polynucleotide suitable for use in gene editing, e.g., by homology directed repair (HDR) or recombination-mediated genetic engineering (recombineering), or a CRISPR protospacer DNA sequence for use in molecular recording, as discussed further herein.
  • HDR homology directed repair
  • CRISPR protospacer DNA sequence for use in molecular recording
  • multiple copies of donor DNAs can be generated in vivo from retron templates.
  • modified retron non-coding RNAs ncRNAs
  • Retron ncRNAs are naturally partially reverse transcribed into ssDNA.
  • the portion of the ncRNA that is partially reverse transcribed can provide the donor DNA for editing genomes, prokaryotic or eukaryotic. Such reverse transcription provides multiple copies of single stranded donor DNA, which is ideal for editing genomes.
  • the donor DNAs can be generated in host cells that also provide one or more types of single strand annealing proteins (SSAPs) and/or one or more single-stranded DNA binding proteins (SSBs).
  • the SSAPs can facilitate recombination (editing) and in some cases the SSAP is a RecT recombinase.
  • Single-stranded DNA binding proteins (SSBs) bind and stabilize single-stranded DNA (ssDNA).
  • the SSAP and/or SSB proteins can be expressed endogenously, or the bacterial host cells can be modified to include an expression cassette from which the SSAP and/or SSB proteins can be expressed.
  • the bacterial host cells can have, or be modified to express, CspRecT as an SSAP.
  • RecT binds to single-stranded DNA and promotes the renaturation of complementary single-stranded DNAs to facilitate recombination.
  • RecT has a function similar to that of lambda RedB.
  • Constructs can also be used to express the different ncRNAs, reverse transcriptases, along with the SSAP, SSB, mutant mismatch repair proteins (e.g., mutL mutants), or combinations thereof.
  • the engineered retrons can include a unique barcode.
  • Barcodes may comprise one or more nucleotide sequences that are used to identify a nucleic acid or cell with which the barcode is associated. Such barcodes may be inserted, for example, into the loop region of the msd-encoded DNA.
  • Barcodes can be 3-1000 or more nucleotides in length, preferably 10-250 nucleotides in length, and more preferably 10-30 nucleotides in length, including any length within these ranges, such as 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 nucleotides in length.
  • barcodes are also used to identify the position (i.e., positional barcode) of a cell, colony, or sample from which a retron originated, such as the position of a colony in a cellular array, the position of a well in a multiwell plate, the position of a tube in a rack, or the location of a sample in a laboratory.
  • a barcode may be used to identify the position of a genetically modified cell containing a retron. The use of barcodes allows retrons from different cells to be pooled in a single reaction mixture for sequencing while still being able to trace a particular retron back to the colony from which it originated.
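Tracing pooled sequencing reads back to their colony or well of origin amounts to a lookup on the barcode embedded in each read. The sketch below is a minimal illustration that assumes the barcode sits at a fixed offset and has a fixed length in every read, and that only exact matches are accepted. The barcode sequences, well names, and read layout are all hypothetical.

```python
def assign_reads(reads, barcode_to_position, barcode_start, barcode_len):
    """Count reads per source position using an exact-match barcode lookup.

    Reads whose barcode is not in the table are tallied as 'unassigned'.
    """
    counts = {}
    for read in reads:
        barcode = read[barcode_start:barcode_start + barcode_len]
        position = barcode_to_position.get(barcode, "unassigned")
        counts[position] = counts.get(position, 0) + 1
    return counts


# HYPOTHETICAL positional barcodes mapped to wells of a multiwell plate.
barcodes = {"ACGTAC": "plate1_A1", "TTGCAA": "plate1_A2"}

# HYPOTHETICAL reads: 2-nt prefix, 6-nt barcode, then the retron-derived insert.
reads = [
    "GGACGTACTTTTCCCC",
    "GGTTGCAATTTTCCCC",
    "GGACGTACAAAAGGGG",
    "GGNNNNNNTTTTCCCC",
]

counts = assign_reads(reads, barcodes, barcode_start=2, barcode_len=6)
```

Real pipelines often tolerate one or two mismatches in the barcode; that refinement is omitted here for brevity.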
  • expression cassettes with segments encoding any of the ncRNAs, donor DNAs, and/or reverse transcriptases, and/or other proteins that can facilitate editing can be linked to a barcode that is inserted into a genome and can be recovered by sequencing. In this way, many variables can be identified and evaluated in the same population of phage to assess relative integration frequency.
  • adapter sequences can be added to retron constructs to facilitate high-throughput amplification or sequencing.
  • a pair of adapter sequences can be added at the 5’ and 3’ ends of a retron construct to allow amplification or sequencing of multiple retron constructs simultaneously by the same set of primers.
  • Amplification of retron constructs may be performed, for example, before transfection of cells or ligation into vectors. Any method for amplifying the retron constructs may be used, including, but not limited to polymerase chain reaction (PCR), isothermal amplification, nucleic acid sequence-based amplification (NASBA), transcription mediated amplification (TMA), strand displacement amplification (SDA), and ligase chain reaction (LCR).
  • PCR polymerase chain reaction
  • NASBA nucleic acid sequence-based amplification
  • TMA transcription mediated amplification
  • SDA strand displacement amplification
  • LCR ligase chain reaction
  • the retron constructs comprise common 5’ and 3’ priming sites to allow amplification of retron sequences in parallel with a set of universal primers.
  • a set of selective primers is used to selectively amplify a subset of retron sequences from a pooled mixture.
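Selective amplification from a pooled mixture can be previewed in silico by asking which constructs contain both priming sites in the correct orientation. The sketch below treats a construct as amplifiable when the forward primer matches the given strand and the reverse complement of the reverse primer occurs downstream of it. Exact matching only is assumed; annealing temperature, mismatches, and product length limits are not modeled, and all sequences are invented for illustration.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def amplifies(construct, fwd_primer, rev_primer):
    """True when both primers bind and face each other on the construct."""
    start = construct.find(fwd_primer)
    end = construct.rfind(reverse_complement(rev_primer))
    return start != -1 and end != -1 and start + len(fwd_primer) <= end


def select_constructs(pool, fwd_primer, rev_primer):
    """Names of pooled constructs predicted to amplify with this primer pair."""
    return [name for name, seq in pool.items() if amplifies(seq, fwd_primer, rev_primer)]


# HYPOTHETICAL pool: both constructs share the 3' site, only r1 has the 5' site.
pool = {"r1": "AAGGTTTTCCGA", "r2": "AACCTTTTCCGA"}
selected = select_constructs(pool, fwd_primer="AAGG", rev_primer="TCGG")
```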
  • the engineered ncRNA may comprise one or more guide sequences.
  • the guide RNA can be inserted into the a1/a2 complementarity region of the retron, which is the region of the ncRNA structure where the 5’ and 3’ ends of the ncRNA fold back upon themselves.
  • the guide RNA can be coupled to the 3’ end of the ncRNA in the a1/a2 region. In another embodiment, the guide RNA can be coupled to the 5’ end of the ncRNA in the a1/a2 region. In one embodiment, a guide RNA can be coupled to the 5’ and 3’ end of the ncRNA. In various embodiments, a linker may separate the 3’ or 5’ retron end, as the case may be, and the guide RNA.
  • the linker may be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 or more nucleotides in length.
  • the gRNA is coupled to the 3’ end of the a1/a2 region.
  • the guide RNA may include a nucleotide sequence that is complementary to a genomic target sequence (i.e., a “spacer” sequence), and thereby mediates binding of the RNA-guided nuclease to which it is complexed (e.g., a Cas9 nuclease-gRNA complex) by hybridization between the spacer sequence and a complementary strand of the genomic target site.
  • the gRNA can be designed with a sequence complementary to the sequence of a mutant genomic allele to target the nuclease-gRNA complex to the site of a mutation.
  • the mutation may comprise an insertion, a deletion, or a substitution.
  • the mutation may include a single nucleotide variation, gene fusion, translocation, inversion, duplication, frameshift, missense, nonsense, or other mutation associated with a phenotype or disease of interest.
  • the targeted allele may be a common genetic variant or a rare genetic variant.
  • the gRNA is designed to selectively bind to an allele with single basepair discrimination, for example, to allow binding of the nuclease-gRNA complex to a single nucleotide polymorphism (SNP) and modification of the SNP.
  • the gRNA may be designed to target disease-relevant mutations of interest for the purpose of genome editing to remove the mutation from a gene.
  • the guide RNA can include a trans-activating crRNA (tracrRNA) scaffold recognized by a catalytically active RNA-guided nuclease (e.g., Cas9 nuclease).
  • a guide RNA has the complementary sequence to the target DNA site, often referred to as a CRISPR RNA (crRNA), and a trans-activating crRNA (tracrRNA) scaffold that is recognized by a catalytically active Cas9 protein.
  • the tracrRNA is made up of a longer stretch of bases that are constant and provide the “stem loop” structure bound by the CRISPR nuclease.
  • the crRNA can anneal to the tracrRNA through a direct repeat sequence to form a dual-guide RNA (dgRNA), or the crRNA-tracrRNA can be expressed as a single RNA transcript.
  • dgRNA dual-guide RNA
  • the guide RNA may be a single guide RNA comprising crRNA and tracrRNA sequences in a single RNA molecule, or the guide RNA may comprise two RNA molecules with crRNA and tracrRNA sequences residing in separate RNA molecules.
  • the gRNA is 5-50 nucleotides, 10-30 nucleotides, 15-25 nucleotides, 18-22 nucleotides, or 18-21 nucleotides in length, or any length between the stated ranges, including, for example, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 nucleotides in length.
  • 20 base gRNAs can be useful for editing human cells, whereas 18 base gRNAs were used in many experiments for editing yeast cells.
  • examples of CRISPR/Cas guide RNAs (as well as information regarding requirements related to protospacer adjacent motif (PAM) sequences present in targeted nucleic acids) can be found in the art, for example, see Jinek et al., Science. 2012 Aug 17;337(6096):816-21; Chylinski et al., RNA Biol. 2013 May;10(5):726-37; Ma et al., Biomed Res Int. 2013;2013:270805; Hou et al., Proc Natl Acad Sci U S A. 2013 Sep 24;110(39):15644-9; Jinek et al., Elife.
  • PAM protospacer adjacent motif
  • the ncRNA may also be modified to include a nucleotide sequence that is reverse transcribed to form the donor/repair template.
  • the repair template has a sequence that binds to a genomic DNA locus.
  • the repair template sequence can be complementary to at least one chromosomal DNA strand.
  • the repair template is an HDR donor sequence which conducts repair of a DNA break by way of the homology-dependent repair pathway.
  • the donor/repair template has at least one nucleotide that is different from the complementary target sequence.
  • the donor/repair template has at least two nucleotides, or at least three nucleotides, or at least four nucleotides, or at least five nucleotides, or more that are different from the complementary target sequence.
  • These ‘different’ nucleotides are the repair nucleotides that can replace nucleotides or sequences (e.g., mutations) in the target chromosomal site.
  • the donor/repair template segment of the ncRNA can have repair nucleotides that are adjacent to each other, or repair nucleotides that are separate from each other within the repair template segment. Such separations are warranted, for example, when the target chromosomal locus has two or more mutations that are not adjacent to each other.
  • a homology arm may comprise a nucleotide sequence having at least about 80-100% sequence identity/complementarity to the corresponding genomic target sequence, including any percent identity within this range, such as at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity thereto, wherein the nucleotide sequence comprising the intended edit can be integrated into the genomic DNA by HDR at the genomic target locus recognized (i.e., having sufficient complementarity for hybridization) by the 5' and 3' arms of the repair template.
  • the corresponding homologous nucleotide sequences in the genomic target sequence flank a specific site for cleavage and/or a specific site for introducing the intended edit.
  • the distance between the specific cleavage site and the homologous nucleotide sequences can be several hundred nucleotides. In some embodiments, the distance between a homology arm and the cleavage site is 200 nucleotides or less (e.g., 0, 10, 20, 30, 50, 75, 100, 125, 150, 175, or 200 nucleotides). In most cases, a smaller distance may give rise to a higher gene targeting rate.
  • the repair template is substantially identical to the target genomic sequence, across its entire length except for the sequence changes to be introduced to a portion of the genome that encompasses both the specific cleavage site and the portions of the genomic target sequence to be altered.
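The structure of such a repair template, an intended edit flanked by 5' and 3' homology arms copied from the target locus, can be sketched as a simple string operation. The function below is illustrative only: the coordinates, arm length, and sequences are hypothetical, both arms are given the same length for simplicity, and the step of encoding the template in the msd region (including strand orientation) is not modeled.

```python
def build_repair_template(genome, edit_start, edit_end, new_seq, arm_len):
    """Assemble 5' arm + edited sequence + 3' arm for a target locus.

    `genome[edit_start:edit_end]` is the region to be replaced by `new_seq`.
    An empty `new_seq` gives a deletion; `edit_start == edit_end` gives an insertion.
    """
    if not 0 <= edit_start <= edit_end <= len(genome):
        raise ValueError("edit coordinates fall outside the genome sequence")
    if edit_start - arm_len < 0 or edit_end + arm_len > len(genome):
        raise ValueError("not enough flanking sequence for the requested arm length")
    left_arm = genome[edit_start - arm_len:edit_start]
    right_arm = genome[edit_end:edit_end + arm_len]
    return left_arm + new_seq + right_arm


# HYPOTHETICAL locus: replace the G at index 8 with a T, using 4-nt arms.
locus = "AAAACCCCGGGGTTTT"
substitution = build_repair_template(locus, 8, 9, "T", arm_len=4)
deletion = build_repair_template(locus, 8, 12, "", arm_len=4)
```

Apart from the edited bases, the resulting template is identical to the target across its whole length, which matches the description above.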
  • a homology arm of the repair template can be of any length, e.g., 10 nucleotides or more, 15 nucleotides or more, 20 nucleotides or more, 50 nucleotides or more, 100 nucleotides or more, 250 nucleotides or more, 300 nucleotides or more, 350 nucleotides or more, 400 nucleotides or more, 450 nucleotides or more, 500 nucleotides or more, 1000 nucleotides (1 kb) or more, 5000 nucleotides (5 kb) or more, 10000 nucleotides (10 kb) or more, etc. In some instances, the 5' and 3' homology arms are substantially equal in length to one another.
  • the 5' and 3' homology arms are not necessarily equal in length to one another.
  • one homology arm may be 30% shorter or less than the other homology arm, 20% shorter or less than the other homology arm, 10% shorter or less than the other homology arm, 5% shorter or less than the other homology arm, 2% shorter or less than the other homology arm, or only a few nucleotides less than the other homology arm.
  • the 5' and 3' homology arms are substantially different in length from one another, e.g., one may be 40% shorter or more, 50% shorter or more, sometimes 60% shorter or more, 70% shorter or more, 80% shorter or more, 90% shorter or more, or 95% shorter or more than the other homology arm.
  • the donor/repair template segment of the ncRNA can therefore be of various lengths.
  • the repair template segment is at least 15 nucleotides, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 110, at least 120, at least 130, at least 140, at least 150, at least 160, at least 170, at least 180, at least 190, or at least 200 nucleotides in length.
  • the donor/repair template comprises or encodes a donor / template sequence, wherein the donor / template corrects / repairs / removes a mutation at the target genome site.
  • the mutation may be a mutated exon in a disease gene.
  • the donor/repair template may encode or comprise a functional DNA element, such as a promoter, an enhancer, a protein binding sequence, a methylation site, or a homology region for assisting gene editing, etc.
  • by “donor DNA” or “donor DNA template” it is meant a single-stranded DNA to be inserted at a site cleaved by a programmable nuclease (e.g., a CRISPR/Cas effector protein or otherwise RNA-guided nuclease; a TALEN; a ZFN) (e.g., after dsDNA cleavage, after nicking a target DNA, after dual nicking a target DNA, and the like).
  • the donor DNA template can contain sufficient homology to a genomic sequence at the target site, e.g., 70%, 80%, 85%, 90%, 95%, or 100% homology with the nucleotide sequences flanking the target site, e.g. within about 50 bases or less of the target site, e.g. within about 30 bases, within about 15 bases, within about 10 bases, within about 5 bases, or immediately flanking the target site, to support homology-directed repair between it and the genomic sequence to which it bears homology.
  • Donor DNA template can be of any length, e.g., 50 nucleotides or more, 100 nucleotides or more, 250 nucleotides or more, 500 nucleotides or more, 1000 nucleotides or more, 5000 nucleotides or more, etc.
  • a suitable donor DNA template can be from 50 nucleotides to 100 nucleotides, from 100 nucleotides to 500 nucleotides, from 500 nucleotides to 1000 nucleotides, from 1000 nucleotides to 5000 nucleotides, or from 5000 nucleotides to 10,000 nucleotides, or more than 10,000 nucleotides, in length.
  • a donor DNA template can comprise a first homology arm and a second homology arm.
  • the first homology arm is at or near the 5’ end of the donor DNA; and comprises a nucleotide sequence that is at least partially complementary to a first nucleotide sequence in a target nucleic acid.
  • the second homology arm is at or near the 3’ end of the donor DNA; and comprises a nucleotide sequence that is at least partially complementary to a second nucleotide sequence in the target nucleic acid.
  • the first and second homology arms can each independently have a length of from about 10 nucleotides to 400 nucleotides; e.g., from 10 nucleotides (nt) to 15 nt, from 15 nt to 20 nt, from 20 nt to 25 nt, from 25 nt to 30 nt, from 30 nt to 35 nt, from 35 nt to 40 nt, from 40 nt to 45 nt, from 45 nt to 50 nt, from 50 nt to 75 nt, from 75 nt to 100 nt, from 100 nt to 125 nt, from 125 nt to 150 nt, from 150 nt to 175 nt, from 175 nt to 200 nt, from 200 nt to 225 nt, from 225 nt to 250 nt, from 250 nt to 275 nt, from 275 nt to 300 nt, from 325 n
  • the donor DNA template is used for editing the target nucleotide sequence.
  • the donor DNA template comprises one or more mutations to be introduced into the target polynucleotide. Examples of such mutations include substitutions, deletions, insertions, or a combination thereof.
  • the mutation causes a shift in an open reading frame on the target polynucleotide.
  • the donor polynucleotide alters a stop codon in the target polynucleotide.
  • the donor polynucleotide corrects a premature stop codon. The correction can be achieved by deleting the stop codon, or by introducing one or more sequence changes to alter the stop codon to a codon.
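  • A minimal sketch of how a premature stop codon might be located in a coding sequence before designing such a donor. This is an illustration, not a method from the source: it assumes the standard genetic code, a DNA coding-strand sequence whose reading frame starts at index 0, and that the final codon is the natural stop. The function name `first_premature_stop` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def first_premature_stop(cds: str):
    """Return the 0-based codon index of the first in-frame stop codon
    that occurs before the final codon, or None if there is none."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    # Skip the last codon, which is assumed to be the natural stop
    for i in range(n_codons - 1):
        codon = cds[3 * i:3 * i + 3]
        if codon in STOP_CODONS:
            return i
    return None
```

A donor could then either delete the codon at the reported index or change it to a sense codon, as described above.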
  • the donor polynucleotide addresses loss of function mutations, deletions, or translocations that may occur, for example, in certain disease contexts by inserting or restoring a functional copy of a gene, or functional fragment thereof, or a functional regulatory sequence or functional fragment of a regulatory sequence.
  • a functional fragment includes a fragment less than the entire copy of a gene but otherwise provides sufficient nucleotide sequence to restore the functionality of a wild type gene or noncoding regulatory sequence (e.g., sequences encoding long non-coding RNA).
  • the donor DNA template may be used to replace a single allele of a defective gene or defective fragment thereof. In another embodiment, the donor DNA template is used to replace both alleles of a defective gene or defective gene fragment.
  • a “defective gene” or “defective gene fragment” is a gene or portion of a gene that when expressed, fails to generate a functioning protein or non-coding RNA with functionality of the corresponding wild-type gene.
  • these defective genes may be associated with one or more disease phenotypes.
  • the defective gene or gene fragment is not replaced but the heterologous nucleic acid is used to insert donor polynucleotides that encode gene or gene fragments that compensate for or override defective gene expression such that cell phenotypes associated with defective gene expression are eliminated or changed to a different or desired cellular phenotype. This can be achieved by including the coding sequence of a therapeutic protein, such as a therapeutic antibody or functional fragment thereof, or a wild-type version of a defective protein associated with one or more disease phenotypes.
  • the donor may include, but not be limited to, genes or gene fragments, encoding proteins or RNA transcripts to be expressed, regulatory elements, repair templates, and the like.
  • the donor polynucleotides may comprise left end and right end sequence elements that function with transposition components that mediate insertion.
  • the donor DNA template manipulates a splicing site on the target polynucleotide.
  • the donor DNA template disrupts a splicing site. The disruption may be achieved by inserting the polynucleotide to a splicing site and/or introducing one or more mutations to the splicing site.
  • the donor polynucleotide may restore a splicing site.
  • the polynucleotide may comprise a splicing site sequence.
  • the donor DNA template to be inserted has a size from 10 bp to 50 kb in length, e.g., from 50 bp to 40 kb, from 100 bp to 30 kb, from 100 bp to 10 kb, from 100 bp to 300 bp, from 200 bp to 400 bp, from 300 bp to 500 bp, from 400 bp to 600 bp, from 500 bp to 700 bp, from 600 bp to 800 bp, from 700 bp to 900 bp, from 800 bp to 1000 bp, from 900 bp to 1100 bp, from 1000 bp to 1200 bp, from 1100 bp to 1300 bp, from 1200 bp to 1400 bp, from 1300 bp to 1500 bp, from 1400 bp to 1600 bp, from 1500 bp to 1700 bp, from 1600 bp
  • the homologous arm on one or both ends of the sequence to be inserted is independently about 20 bp, 40 bp, 60 bp, 80 bp, 100 bp, 120 bp, or 150 bp.
  • the first homology arm and the second homology arm of the donor DNA flank a nucleotide sequence (“a nucleotide sequence of interest” or “an intervening nucleotide sequence”) that is to be introduced into a target nucleic acid.
  • the nucleotide sequence of interest can comprise: i) a nucleotide sequence encoding a polypeptide of interest; ii) a nucleotide sequence encoding an exon of a gene; iii) a promoter sequence; iv) an enhancer sequence; v) a nucleotide sequence encoding a non-coding RNA; or vi) any combination of the foregoing.
  • the donor DNA can provide for gene correction, gene replacement, gene tagging, transgene insertion, nucleotide deletion, gene disruption, gene mutation, etc.
  • the donor DNA can be used to add, e.g., insert or replace, nucleic acid material to a target DNA (e.g. to “knock in” a nucleic acid that encodes a protein, an siRNA, an miRNA, etc.), to add a tag (e.g., 6xHis, a fluorescent protein (e.g., a green fluorescent protein; a yellow fluorescent protein, etc.), hemagglutinin (HA), FLAG, etc.), to add a regulatory sequence to a gene (e.g.
  • the donor DNA can be used to modify DNA in a site-specific, i.e. “targeted”, way; for example gene knock-out, gene knock-in, gene editing, gene tagging, etc., as used in, for example, gene therapy, e.g. to treat a disease or as an antiviral, antipathogenic, or anticancer therapeutic, the production of genetically modified organisms in agriculture, the large scale production of proteins by cells for therapeutic, diagnostic, or research purposes, the induction of pluripotent stem cells, biological research, the targeting of genes of pathogens for deletion or replacement, etc.
  • the donor DNA comprises a nucleotide sequence encoding a polypeptide of interest.
  • Polypeptides of interest include, e.g., a) functional versions of a polypeptide that comprises one or more amino acid substitutions, insertions, and/or deletions and that exhibits reduced function, e.g., where the reduced function is associated with or causes a pathological condition; b) fluorescent polypeptides; c) hormones; d) receptors for ligands; e) ion channels; f) neurotransmitters; g) and the like.
  • Non-limiting examples of polypeptides that can be encoded by a donor DNA include, e.g., IL1B (interleukin 1, beta), XDH (xanthine dehydrogenase), TP53 (tumor protein p53), PTGIS (prostaglandin I2 (prostacyclin) synthase), MB (myoglobin), IL4 (interleukin 4), ANGPT1 (angiopoietin 1), ABCG8 (ATP-binding cassette, sub-family G (WHITE), member 8), CTSK (cathepsin K), PTGIR (prostaglandin I2 (prostacyclin) receptor (IP)), KCNJ11 (potassium inwardly-rectifying channel, subfamily J, member 11), INS (insulin), CRP (C-reactive protein, pentraxin-related), PDGFRB (platelet-derived growth factor receptor, beta polypeptide), CCNA2 (cyclin A2), PDGFB (platelet-
  • ACE angiotensin I converting enzyme (peptidyl-dipeptidase A) 1
  • TNF tumor necrosis factor
  • IL6 interleukin 6 (interferon, beta 2)
  • STN statin
  • SERPINE1 serpin peptidase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 1
  • ALB albumin
  • ADIPOQ adiponectin, C1Q and collagen domain containing
  • APOB apolipoprotein B (including Ag(x) antigen)
  • APOE apolipoprotein E
  • LEP leptin
  • MTHFR 5,10-methylenetetrahydrofolate reductase (NADPH)
  • APOA1 apolipoprotein A-I
  • EDN1 endothelin 1
  • NPPB natriuretic peptide precursor B
  • NOS3 nitric oxide synthase 3
  • GNRH1 gonadotropin-releasing hormone 1 (luteinizing-releasing hormone)
  • PAPPA pregnancy-associated plasma protein A, pappalysin 1
  • ARR3 arrestin 3, retinal (X-arrestin)
  • NPPC natriuretic peptide precursor C
  • AHSP alpha hemoglobin stabilizing protein
  • PTK2 PTK2 protein tyrosine kinase 2
  • IL13 interleukin 13
  • MTOR mechanistic target of rapamycin (serine/threonine kinase)
  • ITGB2 integrin, beta 2 (complement component 3 receptor 3 and 4 subunit)
  • GSTT1 glutathione S-transferase theta 1
  • IL6ST interleukin 6 signal transducer (gp130, oncostatin M receptor)
  • CPB2 carboxypeptidase B2 (plasma)
  • CYP1A2 cytochrome P
  • CAMP cathelicidin antimicrobial peptide
  • ZC3H12A zinc finger CCCH-type containing 12A
  • AKR1B1 aldo-keto reductase family 1, member B1 (aldose reductase)
  • DES desmin
  • MMP7 matrix metallopeptidase 7 (matrilysin, uterine)
  • AHR aryl hydrocarbon receptor
  • CSF1 colony stimulating factor 1 (macrophage)
  • HDAC9 histone deacetylase 9
  • CTGF connective tissue growth factor
  • KCNMA1 potassium large conductance calcium-activated channel, subfamily M, alpha member 1
  • UGT1A UDP glucuronosyltransferase 1 family, polypeptide A complex locus
  • PRKCA protein kinase C, alpha
  • COMT catechol-O-methyltransferase
  • S100B S100 calcium binding protein B
  • the donor DNA encodes a wildtype version of any of the foregoing polypeptides; i.e., the donor DNA can encode a “normal” version that does not include a mutation(s) that results in reduced function, lack of function, or pathogenesis.
  • the donor DNA comprises a nucleotide sequence encoding a fluorescent polypeptide.
  • Suitable fluorescent proteins include, but are not limited to, green fluorescent protein (GFP) or variants thereof, blue fluorescent variant of GFP (BFP), cyan fluorescent variant of GFP (CFP), yellow fluorescent variant of GFP (YFP), enhanced GFP (EGFP), enhanced CFP (ECFP), enhanced YFP (EYFP), GFPS65T, Emerald, Topaz (TYFP), Venus, Citrine, mCitrine, GFPuv, destabilized EGFP (dEGFP), destabilized ECFP (dECFP), destabilized EYFP (dEYFP), mCFPm, Cerulean, T-Sapphire, CyPet, YPet, mKO, HcRed, t-HcRed, DsRed, DsRed2, DsRed-monomer, J-Red, dimer2, t-dimer2(12), mRFP
  • fluorescent proteins include mHoneydew, mBanana, mOrange, dTomato, tdTomato, mTangerine, mStrawberry, mCherry, mGrape1, mRaspberry, mGrape2, mPlum (Shaner et al. (2005) Nat. Methods 2:905-909), and the like. Any of a variety of fluorescent and colored proteins from Anthozoan species, as described in, e.g., Matz et al. (1999) Nature Biotechnol. 17:969-973, can be encoded.
  • the donor DNA encodes an RNA, e.g., an siRNA, a microRNA, a short hairpin RNA (shRNA), an anti-sense RNA, a riboswitch, a ribozyme, an aptamer, a ribosomal RNA, a transfer RNA, and the like.
  • a donor DNA can include, in addition to a nucleotide sequence encoding one or more gene products (e.g., an RNA and/or a polypeptide), one or more transcriptional control elements, e.g., a promoter, an enhancer, and the like.
  • the transcriptional control element is inducible.
  • the promoter is reversible.
  • the transcriptional control element is constitutive.
  • the promoter is functional in a eukaryotic cell.
  • the promoter is a cell type-specific promoter.
  • the promoter is a tissue-specific promoter.
  • the nucleotide sequence of the donor DNA is typically not identical to the target nucleic acid (e.g., genomic sequence) that it replaces. Rather, the donor DNA may contain at least one or more single base changes, insertions, deletions, inversions or rearrangements with respect to the target nucleic acid (e.g., genomic sequence), so long as sufficient homology is present to support homology-directed repair (e.g., for gene correction, e.g., to convert a disease-causing base pair to a non-disease-causing base pair).
  • the donor DNA comprises a non-homologous sequence flanked by two regions of homology, such that homology-directed repair between the target DNA region and the two flanking sequences results in insertion of the non-homologous sequence at the target region.
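  • The layout described above (a non-homologous sequence flanked by two regions of homology) can be sketched as follows. This is an illustration under stated assumptions: the function name `build_donor`, the use of a single cut position, and equal-length arms are hypothetical choices for the example and are not specified by the source.

```python
def build_donor(target: str, cut_index: int, insert: str, arm_len: int) -> str:
    """Assemble left arm + insert + right arm, where each arm is copied from
    the target sequence immediately flanking the cut position."""
    if arm_len <= 0:
        raise ValueError("arm_len must be positive")
    if cut_index - arm_len < 0 or cut_index + arm_len > len(target):
        raise ValueError("target is too short for the requested arm length")
    left_arm = target[cut_index - arm_len:cut_index]    # homology upstream of the cut
    right_arm = target[cut_index:cut_index + arm_len]   # homology downstream of the cut
    return left_arm + insert + right_arm
```

With the toy target `AAAACCCCGGGGTTTT`, a cut at index 8, 4-nucleotide arms, and the insert `ATG`, the assembled donor is `CCCC` + `ATG` + `GGGG`.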
  • Donor DNA may also comprise a vector backbone containing sequences that are not homologous to the DNA region of interest (the target nucleic acid) and that are not intended for insertion into the DNA region of interest (the target nucleic acid).
  • the homologous region(s) of a donor sequence will have at least 50% sequence identity to a target nucleic acid (e.g., a genomic sequence) with which recombination is desired. In certain cases, 60%, 70%, 80%, 90%, 95%, 98%, 99%, or 99.9% sequence identity is present. Any value between 1% and 100% sequence identity can be present, depending upon the length of the donor polynucleotide.
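  • As a rough illustration of the percentages above, sequence identity between a homology region and the corresponding target sequence can be computed position by position. The sketch assumes two pre-aligned, equal-length, ungapped sequences; alignments with gaps need a proper alignment tool, which is outside this example. The function name `percent_identity` is hypothetical.

```python
def percent_identity(donor_region: str, target_region: str) -> float:
    """Percent of positions that match between two equal-length, ungapped sequences."""
    if len(donor_region) != len(target_region):
        raise ValueError("sequences must be the same length (ungapped comparison)")
    if not donor_region:
        raise ValueError("sequences must be non-empty")
    matches = sum(
        a == b for a, b in zip(donor_region.upper(), target_region.upper())
    )
    return 100.0 * matches / len(donor_region)
```

For instance, two 10-nucleotide sequences that differ at one position are 90% identical, which is above the 50% minimum mentioned above.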
  • the donor DNA may comprise certain nucleotide sequence differences as compared to the target nucleic acid (e.g., genomic sequence), where such difference includes, e.g. restriction sites, nucleotide polymorphisms, selectable markers (e.g., drug resistance genes, fluorescent proteins, enzymes etc.), etc., which may be used to assess for successful insertion of the donor DNA at the cleavage site or in some cases may be used for other purposes (e.g., to signify expression at the targeted genomic locus).
  • nucleotide sequence differences will not change the amino acid sequence, or will make silent amino acid changes (i.e., changes which do not affect the structure or function of the protein).
  • these sequence differences may include flanking recombination sequences such as FLPs, loxP sequences, or the like, that can be activated at a later time for removal of the marker sequence.
  • the donor DNA will include one or more nucleotide sequences to aid in localization of the donor to the nucleus of the recipient cell or to aid in the integration of the donor DNA into the target nucleic acid.
  • the donor DNA may comprise one or more nucleotide sequences encoding one or more nuclear localization signals and the like.
  • the donor DNA will include nucleotide sequences to recruit DNA repair enzymes to increase insertion efficiency.
  • Human enzymes involved in homology directed repair include MRN-CtIP, BLM-DNA2, Exo1, ERCC1, Rad51, Rad52, Ligase 1, Polθ, PARP1, Ligase 3, BRCA2, RecQ/BLM-TopoIIIα, RTEL, Polδ, and Polη (Verma and Greenberg (2016) Genes Dev. 30(10):1138-1154).
  • the donor DNA is delivered as reconstituted chromatin (Cruz-Becerra and Kadonaga (2020) eLife 2020;9:e55780 DOI: 10.7554/eLife.55780).
  • the ends of the donor DNA are protected (e.g., from exonucleolytic degradation) by any convenient method and such methods are known to those of skill in the art.
  • one or more dideoxynucleotide residues can be added to the 3' terminus of a linear molecule and/or self-complementary oligonucleotides can be ligated to one or both ends. See, for example, Chang et al. (1987) Proc. Natl. Acad Sci USA 84:4959-4963; Nehls et al. (1996) Science 272:886-889.
  • Additional methods for protecting exogenous polynucleotides from degradation include, but are not limited to, addition of terminal amino group(s) and the use of modified internucleotide linkages such as, for example, phosphorothioates, phosphoramidates, and O-methyl ribose or deoxyribose residues.
  • additional lengths of sequence may be included outside of the regions of homology that can be degraded without impacting recombination.
  • compositions, systems, and methods include use of two components, (1) a programmable nuclease (e.g., an RNA-guided CRISPR nuclease), and (2) a retron reverse transcriptase for synthesis of the msd DNA from the ncRNA.
  • the programmable nuclease is targeted to a site in the genome by a guide RNA which can be fused or coupled to a retron non-coding RNA (ncRNA), which then generates a cut in the genome.
  • This chromosomal break is then precisely repaired by the endogenous cellular machinery, using retron-derived reverse transcribed DNA (RT-DNA) as a repair template.
  • the programmable nuclease used for genome modification is a Cas nuclease.
  • Any RNA-guided Cas nuclease capable of catalyzing site-directed cleavage of DNA to allow integration of donor polynucleotides can be used in genome editing, including CRISPR system type I, type II, or type III Cas nucleases.
  • Cas proteins include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), Cas10, Cas10d, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Cs
  • a type II CRISPR system Cas9 endonuclease is used.
  • Cas9 nucleases from any species, or biologically active fragments, variants, analogs, or derivatives thereof that retain Cas9 endonuclease activity (i.e., catalyze site-directed cleavage of DNA to generate double-strand breaks), can be used.
  • the Cas9 need not be physically derived from an organism but may be synthetically or recombinantly produced.
  • Cas9 sequences from a number of bacterial species are well known in the art and listed in the National Center for Biotechnology Information (NCBI) database.
  • Cpf1 is another class II CRISPR/Cas system RNA-guided nuclease with similarities to Cas9 and may be used analogously. Unlike Cas9, Cpf1 does not require a tracrRNA and only depends on a crRNA in its guide RNA, which provides the advantage that shorter guide RNAs can be used with Cpf1 for targeting than Cas9. Cpf1 is capable of cleaving either DNA or RNA.
  • the PAM sites recognized by Cpf1 have the sequences 5'-YTN-3' (where “Y” is a pyrimidine and “N” is any nucleobase) or 5'-TTN-3', in contrast to the G-rich PAM site recognized by Cas9.
  • Cpf1 cleavage of DNA produces double-stranded breaks with sticky ends having a 4 or 5 nucleotide overhang.
  • See, e.g., Ledford et al. (2015) Nature 526(7571):17-17, Zetsche et al. (2015) Cell 163(3):759-771, Murovec et al. (2017) Plant Biotechnol. J. 15(8):917-926, Zhang et al. (2017) Front. Plant Sci. 8:177, Fernandes et al. (2016) Postepy Biochem. 62(3):315-326; herein incorporated by reference.
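  • The PAM motifs quoted above (5'-YTN-3' and 5'-TTN-3') can be located in a sequence with a simple pattern scan. The sketch below is an illustration only: it searches just the strand it is given, supports only the IUPAC codes needed here, and the function name `find_pam_sites` is hypothetical. It uses a regular-expression lookahead so that overlapping matches are reported.

```python
import re

# Minimal IUPAC lookup covering only the symbols used in the motifs above
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "[CT]", "N": "[ACGT]"}


def find_pam_sites(seq: str, pam: str = "YTN") -> list:
    """Return 0-based start positions of every match to `pam` on the given strand."""
    pattern = "".join(IUPAC[ch] for ch in pam.upper())
    # Zero-width lookahead lets overlapping motif occurrences all be found
    return [m.start() for m in re.finditer(f"(?=({pattern}))", seq.upper())]
```

Because every TTN site is also a YTN site, a scan with `pam="TTN"` returns a subset of the positions found with the default motif.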
  • C2c1 is another class II CRISPR/Cas system RNA-guided nuclease that may be used.
  • C2c1, similarly to Cas9, depends on both a crRNA and tracrRNA for guidance to target sites.
  • RNA-guided FokI nucleases comprise fusions of inactive Cas9 (dCas9) and the FokI endonuclease (FokI-dCas9), wherein the dCas9 portion confers guide RNA-dependent targeting on FokI.
  • for a description of engineered RNA-guided FokI nucleases, see, e.g., Havlicek et al. (2017) Mol. Ther. 25(2):342-355, Pan et al. (2016) Sci Rep. 6:35794, Tsai et al. (2014) Nat Biotechnol. 32(6):569-576; herein incorporated by reference.
  • the reverse transcriptase is expressed in cells to synthesize the msd DNA from the ncRNA.
  • the msd DNA includes the repair template within the msd loop.
  • the retron reverse transcriptase can be expressed from the same expression cassette as the Cas nuclease, or the reverse transcriptase can be expressed from a different expression cassette than the Cas nuclease.
  • a variety of expression cassettes and/or expression vectors can be used to express the retron reverse transcriptase and the Cas nuclease.
  • the engineered retrons may be introduced into any type of cell, including any cell from a prokaryotic, eukaryotic, or archaeon organism, including bacteria, archaea, fungi, protists, plants (e.g., monocotyledonous and dicotyledonous plants); and animals (e.g., vertebrates and invertebrates).
  • animals that may be transfected with an engineered retron include, without limitation, vertebrates such as fish, birds, mammals (e.g., human and non-human primates, farm animals, pets, and laboratory animals), reptiles, and amphibians.
  • Examples of plants that may be transfected with an engineered retron include, without limitation, crops including cereals such as wheat, oats, and rice, legumes such as soybeans and peas, corn, grasses such as alfalfa, and cotton.
  • the engineered retrons can be introduced into a single cell or a population of cells of interest. Cells from tissues, organs, and biopsies, as well as recombinant cells, genetically modified cells, cells from cell lines cultured in vitro, and artificial cells (e.g., nanoparticles, liposomes, polymersomes, or microcapsules encapsulating nucleic acids) may all be transfected with the engineered retrons.
  • the subject methods are also applicable to cellular fragments, cell components, or organelles (e.g., mitochondria in animal and plant cells, plastids (e.g., chloroplasts) in plant cells and algae).
  • Cells may be cultured or expanded after transfection with the engineered retron constructs.
  • methods of introducing nucleic acids into a host cell are well known in the art. Commonly used methods include chemically induced transformation, typically using divalent cations (e.g., CaCl2), dextran-mediated transfection, polybrene mediated transfection, lipofectamine and LT-1 mediated transfection, electroporation, protoplast fusion, encapsulation of nucleic acids in liposomes, and direct microinjection of the nucleic acids comprising engineered retrons into nuclei.
  • the engineered retrons, retron components, and retron editing systems are produced by a vector system comprising one or more vectors.
  • the retron msr gene, msd gene, and ret gene are expressed in vivo from a vector within a cell.
  • a "vector” is a composition of matter which can be used to deliver a nucleic acid of interest to the interior of a cell.
  • the retron msr gene, msd gene, and ret gene can be introduced into a cell with a single vector or in multiple separate vectors to produce msDNA in a host subject.
  • Vectors typically include control elements operably linked to the retron sequences, which allow for the production of msDNA in vivo in the subject species.
  • the retron msr gene, msd gene, and ret gene can be operably linked to a promoter to allow expression of the retron reverse transcriptase and msDNA product.
  • heterologous sequences encoding desired products of interest e.g., polynucleotide encoding polypeptide or regulatory RNA, donor polynucleotide for gene editing, or protospacer DNA for molecular recording
  • Any eukaryotic, archaeal, or prokaryotic cell capable of being transfected with a vector comprising the engineered retron sequences may be used to produce the msDNA.
  • the ability of constructs to produce the msDNA along with other retron-encoded products can be empirically determined.
  • the engineered retron is produced by a vector system comprising one or more vectors.
  • the msr gene, the msd gene, and the ret gene may be provided by the same vector (i.e., cis arrangement of all such retron elements), wherein the vector comprises a promoter operably linked to the msr gene and the msd gene.
  • the promoter is further operably linked to the ret gene.
  • the vector further comprises a second promoter operably linked to the ret gene.
  • the ret gene may be provided by a second vector that does not include the msr gene and the msd gene (i.e., trans arrangement of msr-msd and ret).
  • the msr gene, the msd gene, and the ret gene are each provided by different vectors (i.e., trans arrangement of all retron elements).
  • Numerous vectors are available including, but not limited to, linear polynucleotides, polynucleotides associated with ionic or amphiphilic compounds, plasmids, and viruses.
  • the term "vector" includes an autonomously replicating plasmid or a virus.
  • viral vectors include, but are not limited to, adenoviral vectors, adeno-associated virus vectors, retroviral vectors, lentiviral vectors, and the like.
  • An expression construct can be replicated in a living cell, or it can be made synthetically.
  • the terms "expression construct,” “expression vector,” and “vector,” are used interchangeably to demonstrate the application of the invention in a general, illustrative sense, and are not intended to limit the invention.
  • the nucleic acid comprising an engineered retron sequence is under transcriptional control of a promoter.
  • a "promoter” refers to a DNA sequence recognized by the synthetic machinery of the cell, or introduced synthetic machinery, required to initiate the specific transcription of a gene.
  • the term promoter will be used here to refer to a group of transcriptional control modules that are clustered around the initiation site for RNA polymerase I, II, or III, including RNA polymerase III (Pol III) promoters to express the ncRNA/gRNA in mammalian (e.g., human) cells.
  • RNA polymerase III is responsible for the synthesis of a large variety of small nuclear and cytoplasmic non-coding RNAs.
  • PolIII promoters include the 7SK, U6 and H1 promoters.
  • PolIII promoters can provide expression in a variety of cell types. PolIII promoters are typically compact, for example, providing expression from 5'-flanking sequences as short as 100 bp. In other cases, the PolIII promoter has more than 100 nucleotides.
  • the DNA elements for transcription of the H1 RNA gene are composed of the octamer, Staf transcription factor binding site, proximal sequence element (PSE) and TATA motifs.
  • An example of a sequence for an H1 promoter is shown below as SEQ ID NO: 9.
  • transcription terminator/polyadenylation signals will also be present in the expression construct.
  • PolIII terminates transcription at a short polyU stretch.
  • a hairpin loop is not required, but may enhance termination efficiency in humans.
  • Typical promoters for mammalian cell expression include the SV40 early promoter, a CMV promoter such as the CMV immediate early promoter (see, U.S. Patent Nos. 5,168,062 and 5,385,839, incorporated herein by reference in their entireties), the mouse mammary tumor virus LTR promoter, the adenovirus major late promoter (Ad MLP), and the herpes simplex virus promoter, among others.
  • Other nonviral promoters such as a promoter derived from the murine metallothionein gene, will also find use for mammalian expression.
  • These and other promoters can be obtained from commercially available plasmids, using techniques well known in the art.
  • Enhancer elements may be used in association with the promoter to increase expression levels of the constructs. Examples include the SV40 early gene enhancer, as described in Dijkema et al., EMBO J. (1985) 4:761, the enhancer/promoter derived from the long terminal repeat (LTR) of the Rous Sarcoma Virus, as described in Gorman et al., Proc. Natl. Acad. Sci. USA (1982b) 79:6777, and elements derived from human CMV, as described in Boshart et al., Cell (1985) 41:521, such as elements included in the CMV intron A sequence.
  • the phrase "operably linked” or “under transcriptional control” as used herein means that the promoter is in the correct location and orientation in relation to a polynucleotide to control the initiation of transcription by RNA polymerase and expression of the msr gene, msd gene, and ret gene.
  • transcription terminator/polyadenylation signals will also be present in the expression construct.
  • sequences include, but are not limited to, those derived from SV40, as described in Sambrook et al., supra, as well as a bovine growth hormone terminator sequence (see, e.g., U.S. Patent No. 5,122,458).
  • 5'-UTR sequences can be placed adjacent to the coding sequence in order to enhance expression of the same.
  • Such sequences may include UTRs comprising an internal ribosome entry site (IRES).
  • the IRES element attracts a eukaryotic ribosomal translation initiation complex and promotes translation initiation. See, e.g., Kaufman et al., Nuc. Acids Res. (1991) 19:4485-4490; Gurtu et al., Biochem. Biophys. Res. Comm. (1996) 229:295-298; Rees et al., BioTechniques (1996) 20:102-110; Kobayashi et al., BioTechniques (1996) 21:399-402; and Mosser et al., BioTechniques (1997) 22:150-161.
  • IRES sequences include sequences derived from a wide variety of viruses, such as from leader sequences of picornaviruses such as the encephalomyocarditis virus (EMCV) UTR (Jang et al. J. Virol. (1989) 63:1651-1660), the polio leader sequence, the hepatitis A virus leader, the hepatitis C virus IRES, human rhinovirus type 2 IRES (Dobrikova et al., Proc. Natl. Acad. Sci. (2003) 100(25):15125-15130), an IRES element from the foot and mouth disease virus (Ramesh et al., Nucl. Acid Res.
  • EMCV encephalomyocarditis virus
  • IRES giardiavirus IRES
  • yeast angiotensin II type 1 receptor IRES
  • FGF-1 IRES and FGF-2 IRES fibroblast growth factor IRES
  • vascular endothelial growth factor IRES Baranick et al. (2008) Proc. Natl. Acad. Sci. U.S.A. 105(12):4733-4738, Stein et al. (1998) Mol. Cell. Biol. 18(6):3112-3119, Bert et al. (2006) RNA 12(6): 1074-1083
  • insulin-like growth factor 2 IRES Pedersen et al. (2002) Biochem. J. 363(Pt 1):37-44.
  • An IRES sequence may be included in a vector, for example, to express multiple bacteriophage recombination proteins for recombineering or an RNA-guided nuclease (e.g., Cas9) for HDR in combination with a retron reverse transcriptase from an expression cassette.
  • a polynucleotide encoding a viral T2A peptide can be used to allow production of multiple protein products (e.g., Cas9, bacteriophage recombination proteins, retron reverse transcriptase) from a single vector.
  • One or more 2A linker peptides can be inserted between the coding sequences in the multicistronic construct.
  • the 2A peptide, which is self-cleaving, allows co-expressed proteins from the multicistronic construct to be produced at equimolar levels.
  • 2A peptides from various viruses may be used, including, but not limited to, 2A peptides derived from the foot-and-mouth disease virus, equine rhinitis A virus, Thosea asigna virus and porcine teschovirus-1. See, e.g., Kim et al. (2011) PLoS One 6(4):e18556, Trichas et al. (2008) BMC Biol. 6:40, Provost et al. (2007) Genesis 45(10):625-629, Furler et al. (2001) Gene Ther. 8(11):864-873; herein incorporated by reference in their entireties.
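  • As a minimal, illustrative sketch of how coding sequences might be joined with 2A linkers into a single open reading frame, the following Python snippet places a linker between each pair of coding sequences. The function name `join_with_2a` and the linker string in the usage note are hypothetical placeholders, not sequences from this disclosure. The sketch assumes that each upstream coding sequence has its stop codon removed so that translation can continue through the linker, and that all parts are in frame.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def _check_in_frame(seq: str, label: str) -> None:
    """Raise if a DNA sequence is not a whole number of codons."""
    if len(seq) % 3 != 0:
        raise ValueError(f"{label} length is not a multiple of 3")


def join_with_2a(coding_sequences: list[str], linker_2a: str) -> str:
    """Join coding sequences into one ORF with a 2A linker between each pair.

    Stop codons are removed from every coding sequence except the last,
    so the ribosome reads through each linker (assumption of this sketch).
    """
    if not coding_sequences:
        raise ValueError("at least one coding sequence is required")
    linker = linker_2a.upper()
    _check_in_frame(linker, "linker")

    parts = []
    last_index = len(coding_sequences) - 1
    for i, cds in enumerate(coding_sequences):
        cds = cds.upper()
        _check_in_frame(cds, f"coding sequence {i}")
        if i < last_index:
            # Drop a terminal stop codon from upstream coding sequences.
            if cds[-3:] in STOP_CODONS:
                cds = cds[:-3]
            parts.append(cds)
            parts.append(linker)
        else:
            # The final coding sequence keeps its stop codon.
            parts.append(cds)
    return "".join(parts)
```

  • For instance, `join_with_2a(["ATGAAATAA", "ATGCCCTGA"], "GGATCC")` joins two toy coding sequences with a six-nucleotide placeholder standing in for a real 2A-encoding sequence.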
  • the expression construct comprises a plasmid suitable for transforming a bacterial host.
  • Bacterial expression vectors include, but are not limited to, pACYC177, pASK75, pBAD, pBADM, pBAT, pCal, pET, pETM, pGAT, pGEX, pHAT, pKK223, pMal, pProEx, pQE, and pZA31.
  • Bacterial plasmids may contain antibiotic selection markers (e.g., ampicillin, kanamycin, erythromycin, carbenicillin, streptomycin, or tetracycline resistance), a lacZ gene (β-galactosidase produces blue pigment from X-gal substrate), fluorescent markers (e.g., GFP, mCherry), or other markers for selection of transformed bacteria. See,
  • the expression construct comprises a plasmid suitable for transforming a yeast cell.
  • Yeast expression plasmids typically contain a yeast-specific origin of replication (ORI) and nutritional selection markers (e.g., HIS3, URA3, LYS2, LEU2, TRP1, MET15, ura4+, leu1+, ade6+), antibiotic selection markers (e.g., kanamycin resistance), fluorescent markers (e.g., mCherry), or other markers for selection of transformed yeast cells.
  • the yeast plasmid may further contain components to allow shuttling between a bacterial host (e.g., E. coli) and yeast cells.
  • A number of different types of yeast plasmids are available, including yeast integrating plasmids (YIp), which lack an ORI and are integrated into host chromosomes by homologous recombination; yeast replicating plasmids (YRp), which contain an autonomously replicating sequence (ARS) and can replicate independently; yeast centromere plasmids (YCp), which are low copy vectors containing a part of an ARS and part of a centromere sequence (CEN); and yeast episomal plasmids (YEp), which are high copy number plasmids comprising a fragment from a 2 micron circle (a natural yeast plasmid) that allows for 50 or more copies to be stably propagated per cell.
  • YIp yeast integrating plasmids
  • ARS autonomously replicating sequence
  • YCp yeast centromere plasmids
  • CEN centromere sequence
  • YEp yeast episomal plasmids
  • the expression construct comprises a virus or engineered construct derived from a viral genome.
  • viral based systems have been developed for gene transfer into mammalian cells. These include adenoviruses, retroviruses (γ-retroviruses and lentiviruses), poxviruses, adeno-associated viruses, baculoviruses, and herpes simplex viruses (see e.g., Warnock et al. (2011) Methods Mol. Biol. 737:1-25; Walther et al. (2000) Drugs 60(2):249-271; and Lundstrom (2003) Trends Biotechnol. 21(3):117-122; herein incorporated by reference in their entireties).
  • the ability of certain viruses to enter cells via receptor-mediated endocytosis, to integrate into host cell genomes, and to express viral genes stably and efficiently has made them attractive candidates for the transfer of foreign genes into mammalian cells.
  • retroviruses provide a convenient platform for gene delivery systems. Selected sequences can be inserted into a vector and packaged in retroviral particles using techniques known in the art. The recombinant virus can then be isolated and delivered to cells of the subject either in vivo or ex vivo.
  • retroviral systems have been described (U.S. Pat. No. 5,219,740; Miller and Rosman (1989) BioTechniques 7:980-990; Miller, A. D. (1990) Human Gene Therapy 1:5-14; Scarpa et al. (1991) Virology 180:849-852; Burns et al. (1993) Proc. Natl. Acad. Sci.
  • Lentiviruses are a class of retroviruses that are particularly useful for delivering polynucleotides to mammalian cells because they are able to infect both dividing and nondividing cells (see e.g., Lois et al. (2002) Science 295:868-872; Durand et al. (2011) Viruses 3(2):132-159; herein incorporated by reference).
  • adenovirus vectors have also been described. Unlike retroviruses, which integrate into the host genome, adenoviruses persist extrachromosomally, thus minimizing the risks associated with insertional mutagenesis (Haj-Ahmad and Graham, J. Virol. (1986) 57:267-274; Bett et al., J. Virol. (1993) 67:5911-5921; Mittereder et al., Human Gene Therapy (1994) 5:717-729; Seth et al., J. Virol. (1994) 68:933-940; Barr et al., Gene Therapy (1994) 1:51-58; Berkner, K. L.
  • AAV vector systems have been developed for gene delivery.
  • AAV vectors can be readily constructed using techniques well known in the art. See, e.g., U.S. Pat. Nos. 5,173,414 and 5,139,941; International Publication Nos. WO 92/01070 (published 23 January 1992) and WO 93/03769 (published 4 March 1993); Lebkowski et al., Molec. Cell. Biol.
  • Another vector system useful for delivering nucleic acids encoding the engineered retrons is the enterically administered recombinant poxvirus vaccines described by Small, Jr., P. A., et al. (U.S. Pat. No. 5,676,950, issued Oct. 14, 1997, herein incorporated by reference).
  • vaccinia virus recombinants expressing a nucleic acid molecule of interest can be constructed as follows. The DNA encoding the particular nucleic acid sequence is first inserted into an appropriate vector so that it is adjacent to a vaccinia promoter and flanking vaccinia DNA sequences, such as the sequence encoding thymidine kinase (TK). This vector is then used to transfect cells which are simultaneously infected with vaccinia.
  • TK thymidine kinase
  • Homologous recombination serves to insert the vaccinia promoter plus the gene encoding the sequences of interest into the viral genome.
  • the resulting TK⁻ recombinant can be selected by culturing the cells in the presence of 5-bromodeoxyuridine and picking viral plaques resistant thereto.
  • avipoxviruses, such as the fowlpox and canarypox viruses, can also be used to deliver the nucleic acid molecules of interest.
  • the use of an avipox vector is particularly desirable in human and other mammalian species since members of the avipox genus can only productively replicate in susceptible avian species and therefore are not infective in mammalian cells.
  • Methods for producing recombinant avipoxviruses are known in the art and employ genetic recombination, as described above with respect to the production of vaccinia viruses. See, e.g., WO 91/12882; WO 89/03429; and WO 92/03545.
  • Molecular conjugate vectors such as the adenovirus chimeric vectors described in Michael et al., J. Biol. Chem. (1993) 268:6866-6869 and Wagner et al., Proc. Natl. Acad. Sci. USA (1992) 89:6099-6103, can also be used for gene delivery.
  • For Sindbis-virus derived vectors useful for the practice of the instant methods, see Dubensky et al. (1996) J. Virol. 70:508-519; and International Publication Nos. WO 95/07995, WO 96/17072; as well as Dubensky, Jr., T. W., et al., U.S. Pat. No. 5,843,723, issued Dec.
  • chimeric alphavirus vectors comprised of sequences derived from Sindbis virus and Venezuelan equine encephalitis virus. See, e.g., Perri et al. (2003) J. Virol. 77:10394-10403 and International Publication Nos. WO 02/099035, WO 02/080982, WO 01/81609, and WO 00/61772; herein incorporated by reference in their entireties.
  • a vaccinia-based infection/transfection system can be conveniently used to provide for inducible, transient expression of the nucleic acids of interest (e.g., engineered retron) in a host cell.
  • cells are first infected in vitro with a vaccinia virus recombinant that encodes the bacteriophage T7 RNA polymerase.
  • This polymerase displays extraordinary specificity in that it only transcribes templates bearing T7 promoters.
  • cells are transfected with the nucleic acid of interest, driven by a T7 promoter.
  • the polymerase expressed in the cytoplasm from the vaccinia virus recombinant transcribes the transfected DNA into RNA.
  • Elroy-Stein and Moss Proc. Natl. Acad. Sci. USA (1990) 87:6743-6747; Fuerst et al., Proc. Natl. Acad. Sci. USA (1986) 83:8122-8126.
  • an amplification system can be used that will lead to high level expression following introduction into host cells.
  • a T7 RNA polymerase promoter preceding the coding region for T7 RNA polymerase can be engineered. Translation of RNA derived from this template will generate T7 RNA polymerase which in turn will transcribe more templates. Concomitantly, there will be a cDNA whose expression is under the control of the T7 promoter. Thus, some of the T7 RNA polymerase generated from translation of the amplification template RNA will lead to transcription of the desired gene.
  • T7 RNA polymerase can be introduced into cells along with the template(s) to prime the transcription reaction.
  • the polymerase can be introduced as a protein or on a plasmid encoding the RNA polymerase.
  • Insect cell expression systems, such as baculovirus systems, can also be used and are known to those of skill in the art and described in, e.g., Baculovirus and Insect Cell Expression Protocols (Methods in Molecular Biology, D.W. Murhammer ed., Humana Press, 2nd edition, 2007) and L. King, The Baculovirus Expression System: A laboratory guide (Springer, 1992).
  • Materials and methods for baculovirus/insect cell expression systems are commercially available in kit form from, inter alia, Thermo Fisher Scientific (Waltham, MA) and Clontech (Mountain View, CA).
  • Plant expression systems can also be used for transforming plant cells. Generally, such systems use virus-based vectors to transfect plant cells with heterologous genes. For a description of such systems see, e.g., Porta et al., Mol. Biotech. (1996) 5:209-221; and Hackland et al., Arch. Virol. (1994) 139:1-22.
  • In order to effect expression of engineered retron constructs, the expression construct must be delivered into a cell. This delivery may be accomplished in vitro, as in laboratory procedures for transforming cell lines, or in vivo or ex vivo, as in the treatment of certain disease states.
  • One mechanism for delivery is via viral infection where the expression construct is encapsulated in an infectious viral particle.
  • Non-viral methods for the transfer of expression constructs into cultured cells include the use of calcium phosphate precipitation, DEAE-dextran, electroporation, direct microinjection, DNA-loaded liposomes, lipofectamine-DNA complexes, cell sonication, gene bombardment using high velocity microprojectiles, and receptor-mediated transfection (see, e.g., Graham and Van Der Eb (1973) Virology 52:456-467; Chen and Okayama (1987) Mol. Cell Biol. 7:2745-2752; Rippe et al. (1990) Mol. Cell Biol. 10:689-695; Gopal (1985) Mol. Cell Biol.
  • the nucleic acid comprising the engineered retron sequence may be positioned and expressed at different sites.
  • the nucleic acid comprising the engineered retron sequence may be stably integrated into the genome of the cell. This integration may be in the cognate location and orientation via homologous recombination (gene replacement) or it may be integrated in a random, non-specific location (gene augmentation).
  • the nucleic acid may be stably maintained in the cell as a separate, episomal segment of DNA. Such nucleic acid segments or "episomes" encode sequences sufficient to permit maintenance and replication independent of or in synchronization with the host cell cycle. How the expression construct is delivered to a cell and where in the cell the nucleic acid remains is dependent on the type of expression construct employed.
  • the expression construct may simply consist of naked recombinant DNA or plasmids comprising the engineered retron. Transfer of the construct may be performed by any of the methods mentioned above which physically or chemically permeabilize the cell membrane. This is particularly applicable for transfer in vitro but it may be applied to in vivo use as well.
  • Dubensky et al. (Proc. Natl. Acad. Sci. USA (1984) 81:7529-7533) injected polyomavirus DNA in the form of calcium phosphate precipitates into liver and spleen of adult and newborn mice, demonstrating active viral replication and acute infection.
  • Benvenisty & Neshif Proc. Natl. Acad. Sci.
  • a naked DNA expression construct may be transferred into cells by particle bombardment.
  • This method depends on the ability to accelerate DNA-coated microprojectiles to a high velocity allowing them to pierce cell membranes and enter cells without killing them (Klein et al. (1987) Nature 327:70-73).
  • Several devices for accelerating small particles have been developed.
  • One such device relies on a high voltage discharge to generate an electrical current, which in turn provides the motive force (Yang et al. (1990) Proc. Natl. Acad. Sci. USA 87:9568-9572).
  • the microprojectiles may consist of biologically inert substances, such as tungsten or gold beads.
  • the expression construct may be delivered using liposomes.
  • Liposomes are vesicular structures characterized by a phospholipid bilayer membrane and an inner aqueous medium. Multilamellar liposomes have multiple lipid layers separated by aqueous medium. They form spontaneously when phospholipids are suspended in an excess of aqueous solution. The lipid components undergo self-rearrangement before the formation of closed structures and entrap water and dissolved solutes between the lipid bilayers (Ghosh & Bachhawat (1991) Liver Diseases, Targeted Diagnosis and Therapy Using Specific Receptors and Ligands, Wu et al. (Eds.), Marcel Dekker, NY, 87-104).
  • the liposome may be complexed with a hemagglutinating virus (HVJ). This has been shown to facilitate fusion with the cell membrane and promote cell entry of liposome-encapsulated DNA (Kaneda et al. (1989) Science 243:375-378).
  • HVJ hemagglutinating virus
  • the liposome may be complexed or employed in conjunction with nuclear nonhistone chromosomal proteins (HMG-I) (Kato et al. (1991) J. Biol. Chem. 266(6):3361-3364).
  • HMG-I nuclear nonhistone chromosomal proteins
  • the liposome may be complexed or employed in conjunction with both HVJ and HMG-I.
  • Receptor-mediated delivery vehicles can also be employed to deliver a nucleic acid into cells. These take advantage of the selective uptake of macromolecules by receptor-mediated endocytosis in almost all eukaryotic cells. Because of the cell type-specific distribution of various receptors, the delivery can be highly specific (Wu and Wu (1993) Adv. Drug Delivery Rev. 12:159-167).
  • Receptor-mediated gene targeting vehicles generally consist of two components: a cell receptor-specific ligand and a DNA-binding agent.
  • Several ligands have been used for receptor-mediated gene transfer. The most extensively characterized ligands are asialoorosomucoid (ASOR) and transferrin (see, e.g., Wu and Wu (1987), supra, Wagner et al. (1990) Proc. Natl. Acad. Sci. USA 87(9):3410-3414).
  • a synthetic neoglycoprotein, which recognizes the same receptor as ASOR, has been used as a gene delivery vehicle (Ferkol et al. (1993) FASEB J. 7:1081-1091; Perales et al. (1994) Proc. Natl. Acad. Sci. USA 91(9):4086-4090), and epidermal growth factor (EGF) has also been used to deliver genes to squamous carcinoma cells (Myers, EPO 0273085).
  • the delivery vehicle may comprise a ligand and a liposome.
  • For example, Nicolau et al. (Methods Enzymol. (1987) 149:157-176) employed lactosyl-ceramide, a galactose-terminal asialoganglioside, incorporated into liposomes and observed an increase in the uptake of the insulin gene by hepatocytes.
  • a nucleic acid encoding a particular gene also may be specifically delivered into a cell by any number of receptor-ligand systems with or without liposomes.
  • antibodies to surface antigens on cells can similarly be used as targeting moieties.
  • a recombinant polynucleotide comprising an engineered retron may be administered in combination with a cationic lipid.
  • cationic lipids include, but are not limited to, lipofectin, DOTMA, DOPE, and DOTAP.
  • WO/0071096, which is specifically incorporated by reference, describes different formulations, such as a DOTAP:cholesterol or cholesterol derivative formulation that can effectively be used for gene therapy.
  • Other disclosures also discuss different lipid or liposomal formulations including nanoparticles and methods of administration; these include, but are not limited to, U.S. Patent Publication Nos. 20030203865, 20020150626, 20030032615, and 20040048787, which are specifically incorporated by reference to the extent they disclose formulations and other related aspects of administration and delivery of nucleic acids.
  • Methods used for forming particles are also disclosed in U.S. Pat. Nos. 5,844,107, 5,877,302, 6,008,336, 6,077,835, 5,972,901, 6,200,801, and 5,972,900, which are incorporated by reference for those aspects.
  • gene transfer may more easily be performed under ex vivo conditions.
  • Ex vivo gene therapy refers to the isolation of cells from a subject, the delivery of a nucleic acid into cells in vitro, and then the return of the modified cells back into the subject. This may involve the collection of a biological sample comprising cells from the subject. For example, blood can be obtained by venipuncture, and solid tissue samples can be obtained by surgical techniques according to methods well known in the art.
  • the subject who receives the cells is also the subject from whom the cells are harvested or obtained, which provides the advantage that the donated cells are autologous.
  • cells can be obtained from another subject (i.e., donor), a culture of cells from a donor, or from established cell culture lines. Cells may be obtained from the same or a different species than the subject to be treated, but preferably are of the same species, and more preferably of the same immunological profile as the subject.
  • Such cells can be obtained, for example, from a biological sample comprising cells from a close relative or matched donor, then transfected with nucleic acids (e.g., comprising an engineered retron), and administered to a subject in need of genome modification, for example, for treatment of a disease or condition.
  • the bacterial host cells can be any bacterial cells that have phage receptor binding proteins (RBP) for the phage type(s) of interest.
  • RBP phage receptor binding proteins
  • bacteriophages can be species-specific with regard to their hosts and may only infect a single bacterial species or even some specific strains within a species.
  • the bacterial hosts include Escherichia coli.
  • other bacterial species can be used as host cells.
  • the bacterial host cells can be one or more strains of Escherichia coli (often linked to gastrointestinal distress), Salmonella (often linked to food poisoning), Mycobacterium (causes tuberculosis), Bacillus anthracis (anthrax), Citrobacter freundii (gastroenteritis, neonatal meningitis, and septicemia), Clostridium tetani (tetanus), Clostridium botulinum (botulism), Clostridium difficile (gastrointestinal problems, especially in those with a weak immune system), Enterobacter hormaechei (nosocomial infections), Haemophilus influenzae (meningitis), Haemophilus influenzae Type B (ear, throat, lung infections), Helicobacter pylori (stomach ulcers), Klebsiella pneumoniae (infections, especially lung infections), Leptospira (Leptospirosis), Listeria monocytogenes (meningitis), Salmon
  • the bacterial host cells can be modified to include retron nucleic acids (ncRNAs) that encode donor DNAs adapted for editing phage genomes.
  • ncRNAs retron nucleic acids
  • the bacterial host cells can also include components to facilitate editing of the bacterial or phage genomes, including one or more types of reverse transcriptases, single strand annealing proteins (SSAPs), single-stranded DNA binding proteins (SSBs), mismatch repair (e.g., mutL) mutants, or combinations thereof.
  • the bacterial host cells can be modified to include a dominant-negative mutant mutL gene (e.g., with an E32K mutation). Sequences for such protein components are available in the database provided by National Center for Biotechnology Information (NCBI; see website at ncbi.nlm.nih.gov). Some examples of sequences for these types of proteins are provided herein but other sequences for these types of proteins can also be used.
  • one or more of a variety of single strand annealing proteins (SSAPs) can be expressed, either endogenously or recombinantly, to facilitate recombination during editing of the bacterial or phage genomes.
  • the one or more single strand annealing proteins (SSAPs) so expressed are compatible with one or more single-stranded binding proteins (SSBs) to promote recombination during editing.
  • the SSAPs can be bacterial or phage SSAPs - either the bacterial host cell or the infecting phage can express such SSAPs.
  • One example of an SSAP is a bacteriophage lambda bet SSAP protein that has the following protein sequence (NCBI NP_040617.1; SEQ ID NO: 11).
  • An example of a protein sequence for an Escherichia phage vB_EcoP_24B bet SSAP protein is shown below (NCBI ADN68402.1; SEQ ID NO: 15).
  • the SSAPs expressed by the bacterial host cells include a RecT recombinase.
  • Such recombination facilitating RecT proteins are of the Pfam family: PF03837.
  • An example of a RecT protein sequence is the following Enterobacteriaceae RecT protein sequence (NCBI WP_000166319.1; SEQ ID NO: 17).
  • a nucleotide sequence that encodes an Enterobacteriaceae RecT protein is shown below
  • RecT protein sequence is the following Escherichia coli RecT protein sequence (NCBI QTN08202.1; SEQ ID NO: 19).
  • Clostridium tetani RecT protein sequence is shown below (NCBI SUY55099.1; SEQ ID NO: 24).
  • Clostridium difficile RecT protein sequence is shown below (NCBI AXU84523.1; SEQ ID NO: 25).
  • A protein sequence for a RecT from a Collinsella stercoris phage (CspRecT; NCBI WP_006720782.1; SEQ ID NO: 29) is shown below.
  • the bacterial host cells can also have modified mismatch repair functions.
  • genes encoding mismatch repair enzymes can be modified to reduce mismatch repair.
  • one or more mismatch repair genes can be modified so that the encoded protein may bind to a mismatch site but be unable to correct the mismatch, resulting in unrepaired sites that are blocked from repair by other repair mechanisms.
  • A protein sequence for an Escherichia coli DSM 30083 MutL is shown below (NCBI ACZ50725.1; SEQ ID NO: 30).
  • a mutant E. coli mutL protein with a replacement of the glutamic acid (E) at position 32 with a lysine (K) is a dominant negative mutL (E32K) mutant protein, which, even in the presence of wild type mutL, inhibits the overall mismatch repair reaction as well as MutH activation.
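  • As an illustrative sketch only, a point substitution such as E32K can be represented in code as a position-checked replacement in a protein sequence string. The helper name `substitute_residue` and the toy sequence below are hypothetical; the toy sequence is not a MutL sequence and merely has a glutamic acid at position 32. Positions are 1-based, matching the numbering used above.

```python
def substitute_residue(protein: str, position: int,
                       expected: str, replacement: str) -> str:
    """Return a copy of `protein` with the residue at a 1-based position replaced.

    The residue currently at that position must equal `expected`, which
    guards against applying a substitution to the wrong homolog or numbering.
    """
    index = position - 1  # convert 1-based position to 0-based index
    if not 0 <= index < len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    found = protein[index]
    if found != expected:
        raise ValueError(
            f"expected {expected} at position {position}, found {found}"
        )
    return protein[:index] + replacement + protein[index + 1:]


# Toy placeholder sequence (NOT a real MutL sequence): E at position 32.
toy_protein = "M" + "A" * 30 + "E" + "G" * 5
toy_e32k = substitute_residue(toy_protein, 32, "E", "K")
```

  • For a homolog in which the corresponding glutamic acid sits at a different position (for example position 33), the same helper would be called with that position, and the check on `expected` would flag a mismatch if the numbering were wrong.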
  • a nucleotide sequence for wild type Escherichia coli mutL is shown below (NCBI GU134327.1; SEQ ID NO: 31).
  • Salmonella enterica mutL protein sequence is shown below (NCBI ACL55048.1; SEQ ID NO: 33). A glutamic acid at position 32 is also highlighted below.
  • Bacillus anthracis mutL protein sequence is shown below (NCBI WP_000516478.1; SEQ ID NO: 34).
  • a glutamic acid at a position homologous to position 32 (position 33) is also highlighted below.
  • Clostridium difficile mutL protein sequence is shown below (NCBI).
  • Haemophilus influenzae mutL protein sequence is shown below (NCBI AVJ09575.1; SEQ ID NO: 37).
  • a glutamic acid at position 32 is also highlighted below.
  • Streptococcus pneumoniae mutL protein sequence is shown below (NCBI ABO44018.1; SEQ ID NO: 39).
  • SSB that can be expressed by bacteria during editing of phage genomes
  • a nucleotide sequence for the above Escherichia coli str. K-12 substr. MG1655 ssDNA-binding protein is shown below (NCBI NC_000913.3; SEQ ID NO: 41)
      1 ATGGCCAGCA GAGGCGTAAA CAAGGTTATT CTCGTTGGTA
     41 ATCTGGGTCA GGACCCGGAA GTACGCTACA TGCCAAATGG
     81 TGGCGCAGTT GCCAACATTA CGCTGGCTAC TTCCGAATCC
    121 TGGCGTGATA AAGCGACCGG CGAGATGAAA GAACAGACTG
    161 AATGGCACCG CGTTGTGCTG TTCGGCAAAC TGGCAGAAGT
    201 GGCGAGCGAA TATCTGCGTA AAGGTTCTCA GGTTTATATC
    241 GAAGGTCAGC TGCGTACCCG TAAATGGACC GATCAATCCG
    281 GTCAGGATCG CTACACCACA GAAGTCGTGG TGAACGTTGG
    321 CGGCACCATG CAGATG
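  • Sequence listings of this kind interleave position numbers with blocks of ten nucleotides. As a minimal sketch, assuming each number gives the 1-based position of the nucleotide that follows it, the following Python helper (the name `parse_numbered_sequence` is hypothetical) strips the numbering, checks it against the running length, and returns one contiguous sequence.

```python
def parse_numbered_sequence(text: str) -> str:
    """Convert a numbered, space-separated DNA listing into one string.

    Numeric tokens are treated as 1-based positions of the next nucleotide
    and are validated against the number of nucleotides read so far.
    """
    blocks = []
    length_so_far = 0
    for token in text.split():
        if token.isdigit():
            # A position marker must equal the next nucleotide's position.
            if int(token) != length_so_far + 1:
                raise ValueError(
                    f"position marker {token} does not match "
                    f"expected position {length_so_far + 1}"
                )
        else:
            block = token.upper()
            if set(block) - set("ACGT"):
                raise ValueError(f"unexpected characters in block {token!r}")
            blocks.append(block)
            length_so_far += len(block)
    return "".join(blocks)


# First rows of the listing above, used here only as example input.
example = (
    "1 ATGGCCAGCA GAGGCGTAAA CAAGGTTATT CTCGTTGGTA "
    "41 ATCTGGGTCA GGACCCGGAA GTACGCTACA TGCCAAATGG"
)
contiguous = parse_numbered_sequence(example)
```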
  • Klebsiella pneumoniae SSB protein sequence is shown below (NCBI ANI75733.1; SEQ ID NO: 43).
  • Klebsiella pneumoniae SSB protein sequence is shown below (NCBI WP_102017779.1; SEQ ID NO: 44).
  • Salmonella enterica SSB protein sequence is shown below (NCBI EEC4048472.1; SEQ ID NO: 47).
  • Another example of a Salmonella enterica SSB protein sequence is shown below
  • Citrobacter freundii SSB protein sequence is shown below (NCBI EHL7056105.1; SEQ ID NO: 50).
  • variants and homologs of any of the sequences described here can also be used in the methods and systems described herein.
  • such variants and homologs can have less than 100% sequence identity to any of the sequences described herein.
  • the variants and homologs can have at least about 40% sequence identity, or at least 50% sequence identity, or at least 60% sequence identity, or at least 70% sequence identity, or at least 80% sequence identity, or at least 90% sequence identity, or at least 95% sequence identity, or at least 96% sequence identity, or at least 97% sequence identity, or at least 98% sequence identity, or at least 99% sequence identity, or 60-99% sequence identity, or 70-99% sequence identity, or 80-99% sequence identity, or 90-95% sequence identity, or 90-99% sequence identity, or 95-97% sequence identity, or 97-99% sequence identity, or 100% sequence identity with any of the sequences described herein.
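  • As a hedged illustration of how a percent sequence identity threshold might be checked, the following Python sketch compares two sequences that have already been aligned to equal length (with "-" marking gaps). The function names and the choice of denominator (all alignment columns except those where both sequences have a gap) are assumptions of this sketch; alignment tools differ in how they define the denominator, and this disclosure does not prescribe one.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a gap opposite
    a residue counts as a mismatch (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             min_percent: float) -> bool:
    """True if the pair reaches at least `min_percent` identity."""
    return percent_identity(aligned_a, aligned_b) >= min_percent
```

  • For example, two ten-residue toy sequences differing at one position give 90% identity under this definition, which satisfies an "at least 90%" threshold but not an "at least 95%" threshold.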
  • the host cell is a prokaryotic cell, an archaeal cell, or a eukaryotic host cell.
  • the eukaryotic host cell is a mammalian cell, such as a human cell, a non-human cell, or a non-human mammalian cell.
  • the host cell is an artificial cell or genetically modified cell.
  • the host cell is in vitro, such as a tissue culture cell. In some embodiments, the host cell is within a living host organism.
  • Cells that may contain any of the compositions described herein.
  • the methods described herein are used to deliver recombinant retrons or components thereof into a eukaryotic cell (e.g., a mammalian cell, such as a human cell).
  • the cell is in vitro (e.g., a cultured cell).
  • the cell is in vivo (e.g., in a subject such as a human subject).
  • the cell is ex vivo (e.g., isolated from a subject and may be administered back to the same or a different subject).
  • the cell host can be a mammalian cell.
  • Mammalian cells of the present disclosure include human cells, primate cells (e.g., Vero cells), rat cells (e.g., GH3 cells, OC23 cells) or mouse cells (e.g., MC3T3 cells).
  • Human cell lines include, without limitation, human embryonic kidney (HEK) cells, HeLa cells, cancer cells from the National Cancer Institute's 60 cancer cell lines (NCI60), DU145 (prostate cancer) cells, LNCaP (prostate cancer) cells, MCF-7 (breast cancer) cells, MDA-MB-438 (breast cancer) cells, PC3 (prostate cancer) cells, T47D (breast cancer) cells, THP-1 (acute myeloid leukemia) cells, U87 (glioblastoma) cells, SHSY5Y human neuroblastoma cells (cloned from a myeloma) and Saos-2 (bone cancer) cells.
  • the cells can be human embryonic kidney (HEK) cells (e.g., HEK 293 or HEK 293T cells).
  • the cells can be stem cells (e.g., human stem cells) such as, for example, pluripotent stem cells (e.g., human pluripotent stem cells including human induced pluripotent stem cells (hiPSCs)).
  • stem cell refers to a cell with the ability to divide for indefinite periods in culture and to give rise to specialized cells.
  • a pluripotent stem cell refers to a type of stem cell that is capable of differentiating into all tissues of an organism, but not alone capable of sustaining full organismal development.
  • a human induced pluripotent stem cell refers to a somatic (e.g., mature or adult) cell that has been reprogrammed to an embryonic stem cell-like state by being forced to express genes and factors important for maintaining the defining properties of embryonic stem cells (see, e.g., Takahashi and Yamanaka, Cell 126 (4): 663-76, 2006, incorporated by reference herein).
  • Human induced pluripotent stem cells express stem cell markers and are capable of generating cells characteristic of all three germ layers (ectoderm, endoderm, mesoderm).
  • a host cell is transiently or non-transiently transfected with one or more delivery systems described herein, including virus-based systems, virus-like particle systems, and non-virus-based delivery systems, including LNPs and liposomes.
  • a cell is transfected as it naturally occurs in a subject.
  • a cell that is transfected is taken from a subject, i.e., ex vivo transfection.
  • the cell is derived from cells taken from a subject, such as a cell line.
  • a wide variety of cell lines for tissue culture are known in the art.
  • cell lines include, but are not limited to, C8161, CCRF-CEM, MOLT, mIMCD-3, NHDF, HeLa-S3, Huh1, Huh4, Huh7, HUVEC, HASMC, HEKn, HEKa, MiaPaCell, Panc1, PC-3, TF1, CTLL-2, C1R, Rat6, CV1, RPTE, A10, T24, J82, A375, ARH-77, Calu1, SW480, SW620, SKOV3, SK-UT, CaCo2, P388D1, SEM-K2, WEHI-231, HB56, TIB55, Jurkat, J45.01, LRMB, Bcl-1, BC-3, IC21, DLD2, Raw264.7, NRK, NRK-52E, MRC5, MEF, Hep G2, HeLa B, HeLa T4, COS, COS-1, COS-6, COS-M6A, BS-C-1 monkey kidney epithelial, BA
  • a cell transfected with one or more retron delivery systems described herein is used to establish a new cell line comprising one or more nucleic acid molecules encoding the recombinant retron-based gene editing systems described herein or encoding at least a component of said systems (e.g., a recombinant ncRNA or a recombinant retron RT).
  • ATCC American Type Culture Collection
  • bacteriophages can be modified using the editing components, bacterial host cells, and methods described herein.
  • Bacteriophages are ubiquitous viruses, found wherever bacteria exist. It is estimated there are more than 10^31 bacteriophages on the planet, more than every other organism on Earth, including bacteria.
  • Many types of bacteriophages can be modified by the methods and editing systems described herein.
  • the phages to be modified are DNA phages.
  • the phages to be modified can have double-stranded genomes.
  • the phages with the genomes to be edited can be lytic phages, which are easier to isolate than temperate phages.
  • the phages with the genomes that will be edited can be temperate phages.
  • one type of editing that can be performed using the methods described herein can be converting temperate phages or lysogenic phages into lytic phages.
  • phages belong to the order of Caudovirales, which are tailed phages that have dsDNA and an isometric capsid.
  • Caudovirales is comprised of three phylogenetically-related families that are discriminated by tail morphology: Myoviridae (long contractile tails), Siphoviridae (long non-contractile tails), and Podoviridae (short tails) (Ackermann, 2007; Krupovic, Prangishvili, Hendrix, & Bamford, 2011).
  • The most well-studied tailed phages are the coliphages λ (Siphoviridae), T4 (Myoviridae), and T7 (Podoviridae), which infect Escherichia coli. Any such phage species can be genomically modified using the methods described herein.
  • the bacteriophage database at the website phagesdb.org provides information and sequences for bacteriophages that can be used to identify target sites for editing.
  • the NCBI database also provides sequences for bacteriophages that can be used to identify target sites for editing.
  • bacteriophages that can be modified include: bacteriophage lambda, T2, T5, T7, PDX, vB_EcoS-2862I, vB_EcoS-2862II, vB_EcoS-2862III, vB_EcoS-2862IV, vB_EcoS-2862V, vB_EcoS-26020I, vB_EcoS-26020II, vB_EcoS-26020III, vB_EcoS-26020IV, vB_EcoS-26020V; bacteriophage Pa1 (ATCC 12175-B1), Pa2 (ATCC 14203-B1), and Pa11 (ATCC 14205-B1) that can inhibit P. aeruginosa strain PAO1 bacteriophage (
  • bacteriophage KP DP1, SA DP1, PA DP4, and EC DP3 isolated from wastewater against multi-drug resistant bacteria including K. pneumoniae, S. aureus, P. aeruginosa, and E.
  • Some pathogenic bacterial toxins are encoded by bacteriophage genomes such that the host bacteria are only pathogenic when lysogenized by the toxin-encoding phage.
  • Examples of toxins that can be encoded by bacteriophages are cholera toxin in Vibrio cholerae, diphtheria toxin in Corynebacterium diphtheriae, botulinum neurotoxin in Clostridium botulinum, the binary toxin of Clostridium difficile, and Shiga toxin of Shigella species. Without their phage-encoded toxins, these bacterial species are either much less pathogenic or not pathogenic at all.
  • toxin-encoding genes in bacteriophage genomes can be deleted or knocked out using the methods described herein before those phages are further modified.
  • Bacterial cells can have several mechanisms that interfere with phage infection, including receptor/adsorption blocking; abortive infection; clustered, regularly interspaced short palindromic repeats (CRISPR) with CRISPR-associated (Cas) proteins (CRISPR-Cas); and restriction modification (RM). Phages can be modified to make them less vulnerable to these bacterial cell defense mechanisms.
  • the engineered retron or one or more components thereof (e.g., engineered ncRNAs, engineered msDNA, engineered RT, nucleic acid molecules encoding the engineered retrons and/or retron components, guide RNAs, programmable nucleases) may be provided as pharmaceutical compositions.
  • one or more LNPs or other non-virus-based delivery system comprising one or more circular or linear RNA molecules encoding each of the components of the retron-based genome editing system may be formulated as a pharmaceutical composition for administering to a subject in need (e.g., a human in need of gene editing).
  • Formulations can include, without limitation, saline, liposomes, lipid nanoparticles, polymers, peptides, proteins, cells transfected with viral vectors (e.g., for transfer or transplantation into a subject), and combinations thereof.
  • Formulations of the pharmaceutical compositions described herein may be prepared by any method known or hereafter developed in the art of pharmacology.
  • pharmaceutical composition refers to compositions comprising at least one active ingredient and optionally one or more pharmaceutically acceptable excipients.
  • Such preparatory methods include the step of associating the active ingredient with an excipient and/or one or more other accessory ingredients.
  • active ingredient generally refers to an engineered retron as described herein.
  • a pharmaceutical composition in accordance with the present disclosure may be prepared, packaged, and/or sold in bulk, as a single unit dose, and/or as a plurality of single unit doses.
  • a “unit dose” refers to a discrete amount of the pharmaceutical composition comprising a predetermined amount of the active ingredient.
  • the amount of the active ingredient is generally equal to the dosage of the active ingredient which would be administered to a subject and/or a convenient fraction of such a dosage such as, for example, one-half or one-third of such a dosage.
  • compositions comprising any of the various components of the recombinant retron-based genome editing systems described herein, including, but not limited to, engineered retrons and/or retron components, engineered ncRNAs, engineered msDNA, engineered RT, nucleic acid molecules encoding the engineered retrons and/or retron components, programmable nucleases (e.g., RNA-guided nucleases), guide RNAs, and vector or vector systems encoding the engineered retrons and/or retron components, and any combinations thereof.
  • the term “pharmaceutical composition”, as used herein, refers to a composition formulated for pharmaceutical use.
  • the pharmaceutical composition further comprises a pharmaceutically acceptable carrier.
  • the pharmaceutical composition comprises additional agents (e.g., for specific delivery, increasing half-life, or other therapeutic compounds).
  • the term “pharmaceutically-acceptable carrier” means a pharmaceutically-acceptable material, composition or vehicle, such as a liquid or solid filler, diluent, excipient, manufacturing aid (e.g., lubricant, talc magnesium, calcium or zinc stearate, or stearic acid), or solvent encapsulating material, involved in carrying or transporting the compound from one site (e.g., the delivery site) of the body, to another site (e.g., organ, tissue or portion of the body).
  • a pharmaceutically acceptable carrier is “acceptable” in the sense of being compatible with the other ingredients of the formulation and not injurious to the tissue of the subject (e.g., physiologically compatible, sterile, physiologic pH, etc.).
  • materials which can serve as pharmaceutically-acceptable carriers include: (1) sugars, such as lactose, glucose and sucrose; (2) starches, such as corn starch and potato starch; (3) cellulose, and its derivatives, such as sodium carboxymethyl cellulose, methylcellulose, ethyl cellulose, microcrystalline cellulose and cellulose acetate; (4) powdered tragacanth; (5) malt; (6) gelatin; (7) lubricating agents, such as magnesium stearate, sodium lauryl sulfate and talc; (8) excipients, such as cocoa butter and suppository waxes; (9) oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil and soybean oil; (10) glycols, such as propylene glycol; (11) polyols, such as glycerin, sorbitol, mannitol and polyethylene glycol (PEG); (12) esters, such as ethyl
  • the pharmaceutical composition is formulated for delivery to a subject, e.g., for gene editing.
  • Suitable routes of administering the pharmaceutical composition described herein include, without limitation: topical, subcutaneous, transdermal, intradermal, intralesional, intraarticular, intraperitoneal, intravesical, transmucosal, gingival, intradental, intracochlear, transtympanic, intraorgan, epidural, intrathecal, intramuscular, intravenous, intravascular, intraosseous, periocular, intratumoral, intracerebral, and intracerebroventricular administration.
  • the pharmaceutical composition described herein is administered locally to a diseased site (e.g., tumor site).
  • the pharmaceutical composition described herein is administered to a subject by injection, by means of a catheter, by means of a suppository, or by means of an implant, the implant being of a porous, non-porous, or gelatinous material, including a membrane, such as a sialastic membrane, or a fiber.
  • the pharmaceutical composition described herein is delivered in a controlled release system.
  • a pump may be used (see, e.g., Langer, 1990, Science 249:1527-1533; Sefton, 1989, CRC Crit. Ref. Biomed. Eng. 14:201; Buchwald et al., 1980, Surgery 88:507; Saudek et al., 1989, N. Engl. J. Med. 321:574).
  • polymeric materials can be used.
  • the pharmaceutical composition is formulated in accordance with routine procedures as a composition adapted for intravenous or subcutaneous administration to a subject, e.g., a human.
  • pharmaceutical compositions for administration by injection are solutions in sterile isotonic aqueous buffer.
  • the pharmaceutical can also include a solubilizing agent and a local anesthetic such as lignocaine to ease pain at the site of the injection.
  • the ingredients are supplied either separately or mixed together in unit dosage form, for example, as a dry lyophilized powder or water free concentrate in a hermetically sealed container such as an ampoule or sachette indicating the quantity of active agent.
  • where the pharmaceutical is to be administered by infusion, it can be dispensed with an infusion bottle containing sterile pharmaceutical grade water or saline.
  • an ampoule of sterile water for injection or saline can be provided so that the ingredients can be mixed prior to administration.
  • a pharmaceutical composition for systemic administration may be a liquid, e.g., sterile saline, lactated Ringer’s or Hank’s solution.
  • the pharmaceutical composition can be in solid forms and re-dissolved or suspended immediately prior to use. Lyophilized forms are also contemplated.
  • the pharmaceutical composition can be contained within a lipid particle or vesicle, such as a liposome or microcrystal or LNP, which is also suitable for parenteral administration.
  • the particles can be of any suitable structure, such as unilamellar or plurilamellar, so long as compositions are contained therein.
  • Compounds can be entrapped in “stabilized plasmid-lipid particles” (SPLP) containing the fusogenic lipid dioleoylphosphatidylethanolamine (DOPE), low levels (5-10 mol%) of cationic lipid, and stabilized by a polyethyleneglycol (PEG) coating (Zhang Y. P. et al., Gene Ther. 1999, 6:1438-47).
  • lipids such as N-[1-(2,3-dioleoyloxy)propyl]-N,N,N-trimethylammonium methylsulfate, or “DOTAP,” are particularly preferred for such particles and vesicles.
  • the preparation of such lipid particles is well known. See, e.g., U.S. Patent Nos. 4,880,635; 4,906,477; 4,911,928; 4,917,951; 4,920,016; and 4,921,757; each of which is incorporated herein by reference.
  • the pharmaceutical composition can be provided as a pharmaceutical kit comprising (a) a container containing a recombinant retron-based genome editing system or one or more components thereof in lyophilized form and (b) a second container containing a pharmaceutically acceptable diluent (e.g., sterile water) for injection.
  • the pharmaceutically acceptable diluent can be used for reconstitution or dilution of the lyophilized system of the invention.
  • Optionally associated with such container(s) can be a notice in the form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological products, which notice reflects approval by the agency of manufacture, use or sale for human administration.
  • an article of manufacture containing materials useful for the treatment of the diseases described above comprises a container and a label.
  • suitable containers include, for example, bottles, vials, syringes, and test tubes.
  • the containers may be formed from a variety of materials such as glass or plastic.
  • the container holds a composition that is effective for treating a disease described herein and may have a sterile access port.
  • the container may be an intravenous solution bag or a vial having a stopper pierceable by a hypodermic injection needle.
  • the active agent in the composition is a compound of the invention.
  • the label on or associated with the container indicates that the composition is used for treating the disease of choice.
  • the article of manufacture may further comprise a second container comprising a pharmaceutically acceptable buffer, such as phosphate-buffered saline, Ringer's solution, or dextrose solution. It may further include other materials desirable from a commercial and user standpoint, including other buffers, diluents, filters, needles, syringes, and package inserts with instructions for use.
  • kits comprising engineered retron constructs as described herein.
  • the kit provides an engineered retron construct or a vector system comprising such a retron construct.
  • the engineered retron construct included in the kit comprises a heterologous sequence capable of providing a cell with a nucleic acid encoding a protein or regulatory RNA of interest, a cellular barcode, a donor polynucleotide suitable for use in gene editing, e.g., by homology directed repair (HDR) or recombination-mediated genetic engineering (recombineering), or a CRISPR protospacer DNA sequence for use in molecular recording.
  • Other agents may also be included in the kit such as transfection agents, host cells, suitable media for culturing cells, buffers, and the like.
  • agents can be provided in liquid or solid form in any convenient packaging (e.g., stick pack, dose pack, etc.).
  • the agents of a kit can be present in the same or separate containers.
  • the agents may also be present in the same container.
  • the subject kits may further include (in certain embodiments) instructions for practicing the subject methods. These instructions may be present in the subject kits in a variety of forms, one or more of which may be present in the kit.
  • One form in which these instructions may be present is as printed information on a suitable medium or substrate, e.g., a piece or pieces of paper on which the information is printed, in the packaging of the kit, in a package insert, and the like.
  UTILITY
  • Retrons can be engineered with heterologous sequences for use in a variety of applications.
  • heterologous sequences can be added to retron constructs to provide a cell with a heterologous nucleic acid encoding a protein or regulatory RNA of interest, a cellular barcode, a donor polynucleotide/repair template suitable for use in gene editing, including therapeutic editing, e.g., by homology directed repair (HDR) or recombination-mediated genetic engineering (recombineering), or a CRISPR protospacer DNA sequence for use in molecular recording, as discussed further herein.
  • compositions and methods described herein find use in, for example, (1) metabolic engineering of bacteria and/or yeast or other eukaryotic cells to increase the production of products of interest such as functional molecules (e.g., chemicals, fuels, materials, and proteins) or for their use in medical research (e.g., gene therapy, discovery of new drugs); (2) improved molecular recording technologies that use different mutations to sense one or several cellular events simultaneously; (3) design of cellular chassis as recipients of engineered biological systems in synthetic biology or bioengineering (e.g., bacterial chassis to improve phage editing technologies); (4) multiplexed gene therapy in mammalian cells, such as human cells; and (5) multiplexed editing of phage genomes for engineered phage therapy.
  • the engineered retrons, components, and systems described herein may be used for research tools, such as kits, functional genomics assays, and generating engineered cell lines and animal models for research and drug screening.
  • the kit may comprise one or more reagents in addition to the engineered retron, such as a buffer, a control reagent, a control vector, a control RNA polynucleotide, a reagent for in vitro production of the polypeptide from DNA, and adaptors for sequencing.
  • a buffer can be, for example, a stabilization buffer, a reconstituting buffer, a diluting buffer, a wash buffer, or a buffer for introducing a polypeptide and/or polynucleotide of the kit into a cell.
  • a kit can comprise one or more additional reagents specific for plants.
  • One or more additional reagents for plants can include, for example, soil, nutrients, plants, seeds, spores, Agrobacterium, a T-DNA vector, and a pBINAR vector.
  • the single-stranded DNA generated by an engineered retron can be used to produce a desired product of interest in cells.
  • the retron is engineered with a heterologous sequence encoding a polypeptide of interest to allow production of the polypeptide from the retron msDNA generated in a cell.
  • the polypeptide of interest may be any type of protein/peptide including, without limitation, an enzyme, an extracellular matrix protein, a receptor, transporter, ion channel, or other membrane protein, a hormone, a neuropeptide, an antibody, or a cytoskeletal protein; or a fragment thereof, or a biologically active domain of interest.
  • the protein is a therapeutic protein or therapeutic antibody for use in treatment of a disease.
  • Non-limiting examples of polypeptides of interest include: growth hormones, insulin-like growth factors (IGF-1), Fat-1, phytase, xylanase, beta-glucanase, lysozyme or lysostaphin, histone deacetylase such as HDAC6, CD163, etc.
  • the retron is engineered with a heterologous sequence encoding an RNA of interest to allow production of the RNA from the retron in a cell.
  • the RNA of interest may be any type of RNA including, without limitation, an RNA interference (RNAi) nucleic acid or regulatory RNA such as, but not limited to, a microRNA (miRNA), a small interfering RNA (siRNA), a short hairpin RNA (shRNA), a small nuclear RNA (snRNA), a long non-coding RNA (lncRNA), an antisense nucleic acid, and the like.
  • the retron is used for genome editing a desired site.
  • a retron is engineered with a heterologous nucleic acid sequence encoding a donor polynucleotide suitable for use with a nuclease genome editing system.
  • the nuclease is designed to specifically target a location proximal to the desired edit (the nuclease should be designed such that it will not cut the target once the edit is properly installed).
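  • The design constraint above (the nuclease should no longer cut once the edit is installed) can be checked in silico. The following is a minimal sketch, not a method taken from this disclosure: it assumes a Cas9-style target (a protospacer immediately followed by an NGG PAM), scans only the given strand, and uses hypothetical sequences and a hypothetical helper name, has_cas9_site.

```python
import re


def has_cas9_site(seq: str, protospacer: str) -> bool:
    """Return True if `protospacer` is immediately followed by an NGG PAM in `seq`.

    Assumption: Cas9-style targeting; only the given strand is scanned.
    """
    pattern = re.escape(protospacer.upper()) + "[ACGT]GG"
    return re.search(pattern, seq.upper()) is not None


# Hypothetical 20-nt protospacer and flanking sequence, for illustration only.
protospacer = "GACGTTAGCCTAGGATCCAT"
wild_type = "TTT" + protospacer + "TGG" + "CAAGT"  # intact NGG PAM
edited = "TTT" + protospacer + "TGC" + "CAAGT"     # edit disrupts the PAM

print(has_cas9_site(wild_type, protospacer))  # True: unedited target can be cut
print(has_cas9_site(edited, protospacer))     # False: edited target is protected
```

  • In this sketch the edit protects the locus by mutating the PAM; an edit that changes the protospacer itself would serve the same purpose.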
  • the nuclease e.g., Cas or non-Cas
  • a heterologous nucleic acid sequence is inserted into the retron msd.
  • the heterologous nucleic acid sequence has 10-100 or more bp of homologous nucleic acid sequence to the genome on both sides of the desired edit.
  • the desired edit (insertion, deletion, or mutation) is in between the homologous sequence.
  • donor polynucleotides comprise a sequence comprising an intended genome edit flanked by a pair of homology arms responsible for targeting the donor polynucleotide to the target locus to be edited in a cell.
  • the donor polynucleotide typically comprises a 5' homology arm that hybridizes to a 5' genomic target sequence and a 3' homology arm that hybridizes to a 3' genomic target sequence.
  • the homology arms are referred to herein as 5' and 3' (i.e., upstream and downstream) homology arms, which relate to the relative position of the homology arms to the nucleotide sequence comprising the intended edit within the donor polynucleotide.
  • the 5' and 3' homology arms hybridize to regions within the target locus in the genomic DNA to be modified, which are referred to herein as the “5' target sequence” and “3' target sequence,” respectively.
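  • The donor layout described above (5' homology arm, intended edit, 3' homology arm) can be sketched as a short Python function. This is an illustrative sketch only: the function name build_donor, the 0-based half-open coordinates, the default arm length, and the toy genome are assumptions made for this example and do not come from the disclosure.

```python
def build_donor(genome: str, start: int, end: int, replacement: str,
                arm_len: int = 50) -> str:
    """Return 5' arm + replacement + 3' arm for an edit of genome[start:end].

    Coordinates are 0-based and half-open. Use start == end for an insertion
    and replacement == "" for a deletion.
    """
    if not (0 <= start <= end <= len(genome)):
        raise ValueError("edit coordinates out of range")
    if start < arm_len or len(genome) - end < arm_len:
        raise ValueError("not enough flanking sequence for the requested arms")
    five_prime_arm = genome[start - arm_len:start]   # matches the 5' target sequence
    three_prime_arm = genome[end:end + arm_len]      # matches the 3' target sequence
    return five_prime_arm + replacement + three_prime_arm


# Toy 40-nt genome; substitute the base at position 20 (A -> T) with 10-nt arms.
genome = "ACGT" * 10
donor = build_donor(genome, start=20, end=21, replacement="T", arm_len=10)
print(donor)  # GTACGTACGTTCGTACGTACG
```

  • The same function covers insertions and deletions because the edit is defined by the interval it replaces rather than by edit type.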
  • a homology arm must be sufficiently complementary for hybridization to the target sequence to mediate homologous recombination between the donor polynucleotide and genomic DNA at the target locus.
  • a homology arm may comprise a nucleotide sequence having at least about 80-100% sequence identity to the corresponding genomic target sequence, including any percent identity within this range, such as at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity thereto, wherein the nucleotide sequence comprising the intended edit can be integrated into the genomic DNA by HDR at the genomic target locus recognized (i.e., having sufficient complementarity for hybridization) by the 5' and 3' homology arms.
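  • As a rough illustration of the percent-identity threshold above, the sketch below compares a homology arm with its genomic target position by position. It assumes an ungapped comparison of two equal-length sequences, which is a simplification; alignment-based identity as computed by standard tools can differ. The function name and example sequences are hypothetical.

```python
def percent_identity(arm: str, target: str) -> float:
    """Ungapped percent identity between two equal-length nucleotide sequences."""
    if not arm or len(arm) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(arm.upper(), target.upper()))
    return 100.0 * matches / len(arm)


# Hypothetical 10-nt arm with one mismatch against its genomic target.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
print(identity)           # 90.0
print(identity >= 80.0)   # True: within the 80-100% range stated above
```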
  • the corresponding homologous nucleotide sequences in the genomic target sequence flank a specific site for cleavage and/or a specific site for introducing the intended edit.
  • the distance between the specific cleavage site and the homologous nucleotide sequences can be several hundred nucleotides. In some embodiments, the distance between a homology arm and the cleavage site is 200 nucleotides or less (e.g., 0, 10, 20, 30, 50, 75, 100, 125, 150, 175, and 200 nucleotides). In most cases, a smaller distance may give rise to a higher gene targeting rate.
  • the donor polynucleotide is substantially identical to the target genomic sequence, across its entire length except for the sequence changes to be introduced to a portion of the genome that encompasses both the specific cleavage site and the portions of the genomic target sequence to be altered.
  • a homology arm can be of any length, e.g. 10 nucleotides or more, 15 nucleotides or more, 20 nucleotides or more, 50 nucleotides or more, 100 nucleotides or more, 250 nucleotides or more, 300 nucleotides or more, 350 nucleotides or more, 400 nucleotides or more, 450 nucleotides or more, 500 nucleotides or more, 1000 nucleotides (1 kb) or more, 5000 nucleotides (5 kb) or more, 10000 nucleotides (10 kb) or more, etc. In some instances, the 5' and 3' homology arms are substantially equal in length to one another.
  • the 5' and 3' homology arms are not necessarily equal in length to one another.
  • one homology arm may be 30% shorter or less than the other homology arm, 20% shorter or less than the other homology arm, 10% shorter or less than the other homology arm, 5% shorter or less than the other homology arm, 2% shorter or less than the other homology arm, or only a few nucleotides less than the other homology arm.
  • the 5' and 3' homology arms are substantially different in length from one another, e.g., one may be 40% shorter or more, 50% shorter or more, sometimes 60% shorter or more, 70% shorter or more, 80% shorter or more, 90% shorter or more, or 95% shorter or more than the other homology arm.
  • the donor polynucleotide may be used in combination with an RNA-guided nuclease, which is targeted to a particular genomic sequence (i.e., genomic target sequence to be modified) by a guide RNA.
  • a target-specific guide RNA comprises a nucleotide sequence that is complementary to a genomic target sequence, and thereby mediates binding of the nuclease-gRNA complex by hybridization at the target site.
  • the gRNA can be designed with a sequence complementary to the sequence of a minor allele to target the nuclease-gRNA complex to the site of a mutation.
  • the mutation may comprise an insertion, a deletion, or a substitution.
  • the mutation may include a single nucleotide variation, gene fusion, translocation, inversion, duplication, frameshift, missense, nonsense, or other mutation associated with a phenotype or disease of interest.
  • the targeted minor allele may be a common genetic variant or a rare genetic variant.
  • the gRNA is designed to selectively bind to a minor allele with single base-pair discrimination, for example, to allow binding of the nuclease-gRNA complex to a single nucleotide polymorphism (SNP).
  • the gRNA may be designed to target disease-relevant mutations of interest for the purpose of genome editing to remove the mutation from a gene.
  • the gRNA can be designed with a sequence complementary to the sequence of a major or wild-type allele to target the nuclease-gRNA complex to the allele for the purpose of genome editing to introduce a mutation into a gene in the genomic DNA of the cell, such as an insertion, deletion, or substitution.
  • Such genetically modified cells can be used, for example, to alter phenotype, confer new properties, or produce disease models for drug screening.
  • the RNA-guided nuclease used for genome modification is a clustered regularly interspaced short palindromic repeats (CRISPR) system Cas nuclease.
  • Any RNA-guided Cas nuclease capable of catalyzing site-directed cleavage of DNA to allow integration of donor polynucleotides by the HDR mechanism can be used in genome editing, including CRISPR system Class 1, Type I or III Cas nucleases; a Class 2, Type II nuclease (such as Cas9); a Class 2, Type V nuclease (such as Cpf1); or a Class 2, Type VI nuclease (such as C2c2).
  • Cas proteins include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), Cas10, Cas10d, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Cs
  • a Class 2, Type II CRISPR system Cas9 endonuclease is used.
  • Cas9 nucleases from any species, or biologically active fragments, variants, analogs, or derivatives thereof that retain Cas9 endonuclease activity (i.e., catalyze site-directed cleavage of DNA to generate double-strand breaks)
  • the Cas9 need not be physically derived from an organism but may be synthetically or recombinantly produced.
  • Cas9 sequences from a number of bacterial species are well known in the art and listed in the National Center for Biotechnology Information (NCBI) database.
  • sequences or a variant thereof comprising a sequence having at least about 70-100% sequence identity thereto, including any percent identity within this range, such as 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99% sequence identity thereto, can be used for genome editing, as described herein. See also Fonfara et al. (2014) Nucleic Acids Res. 42(4):2577-90; Kapitonov et al. (2015) J. Bacteriol.
  • the genomic target site will typically comprise a nucleotide sequence that is complementary to the gRNA and may further comprise a protospacer adjacent motif (PAM).
  • the target site comprises 20-30 base pairs in addition to a 3 or more base pair PAM.
  • the first nucleotide of a PAM can be any nucleotide, while the two or more other nucleotides will depend on the specific Cas9 protein that is chosen.
  • Exemplary PAM sequences are known to those of skill in the art and include, without limitation, NNG, NGN, NAG, and NGG, wherein N represents any nucleotide.
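  • The target-site structure described above (a protospacer followed by a PAM) lends itself to a simple sequence scan. The sketch below is illustrative only: it assumes a 20-nt protospacer and the NGG PAM from the exemplary list, scans only the given strand, and uses a hypothetical function name and toy sequence. A real design workflow would also scan the reverse complement and score off-target matches.

```python
import re


def find_ngg_sites(seq: str, spacer_len: int = 20):
    """Return (start, protospacer, PAM) for every protospacer + NGG site in `seq`.

    A zero-width lookahead is used so that overlapping candidate sites are
    all reported. Only the given strand is scanned.
    """
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    return [(m.start(), m.group(1), m.group(2))
            for m in pattern.finditer(seq.upper())]


# Toy sequence: 20 A's followed by a TGG PAM and two extra bases.
for start, spacer, pam in find_ngg_sites("A" * 20 + "TGG" + "CC"):
    print(start, spacer, pam)  # 0 AAAAAAAAAAAAAAAAAAAA TGG
```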
  • the allele targeted by a gRNA comprises a mutation that creates a PAM within the allele, wherein the PAM promotes binding of the Cas9-gRNA complex to the allele.
  • the gRNA is 5-50 nucleotides, 10-30 nucleotides, 15-25 nucleotides, 18-22 nucleotides, or 19-21 nucleotides in length, or any length between the stated ranges, including, for example, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 nucleotides in length.
  • the guide RNA may be a single guide RNA comprising crRNA and tracrRNA sequences in a single RNA molecule, or the guide RNA may comprise two RNA molecules with crRNA and tracrRNA sequences residing in separate RNA molecules.
  • Cpf1 is another Class 2 CRISPR/Cas system RNA-guided nuclease with similarities to Cas9 and may be used analogously. Unlike Cas9, Cpf1 does not require a tracrRNA and only depends on a crRNA in its guide RNA, which provides the advantage that shorter guide RNAs can be used with Cpf1 for targeting than Cas9. Cpf1 is capable of cleaving either DNA or RNA.
  • the PAM sites recognized by Cpf1 have the sequences 5'-YTN-3' (where “Y” is a pyrimidine and “N” is any nucleobase) or 5'-TTN-3', in contrast to the G-rich PAM site recognized by Cas9.
  • Cpf1 cleavage of DNA produces double-stranded breaks with sticky ends having a 4 or 5 nucleotide overhang.
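  • The 5'-YTN-3' PAM described above can be scanned for in the same way as a Cas9 PAM. The sketch below is an illustration under stated assumptions rather than a method from this disclosure: it assumes the PAM lies immediately 5' of the protospacer on the scanned strand, uses a 23-nt protospacer length chosen only for the example, and scans a single strand. The function name and sequence are hypothetical.

```python
import re


def find_ytn_sites(seq: str, spacer_len: int = 23):
    """Return (start, PAM, protospacer) for every YTN PAM followed by a protospacer.

    Y is a pyrimidine (C or T) and N is any base. Assumption: the PAM sits
    immediately 5' of the protospacer; only the given strand is scanned.
    """
    pattern = re.compile(r"(?=([CT]T[ACGT])([ACGT]{%d}))" % spacer_len)
    return [(m.start(), m.group(1), m.group(2))
            for m in pattern.finditer(seq.upper())]


# Toy sequence: a TTA PAM followed by 23 G's.
for start, pam, spacer in find_ytn_sites("TTA" + "G" * 23):
    print(start, pam, spacer)  # 0 TTA GGGGGGGGGGGGGGGGGGGGGGG
```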
  • Ledford et al. (2015) Nature 526(7571):17-17; Zetsche et al. (2015) Cell 163(3):759-771; Murovec et al. (2017) Plant Biotechnol. J. 15(8):917-926; Zhang et al. (2017) Front. Plant Sci. 8:177; Fernandes et al. (2016) Postepy Biochem. 62(3):315-326; herein incorporated by reference.
  • C2c1 (Cas12b) is another Class 2 CRISPR/Cas system RNA-guided nuclease that may be used.
  • C2c1, similarly to Cas9, depends on both a crRNA and tracrRNA for guidance to target sites. See, e.g., Shmakov et al. (2015) Mol Cell. 60(3):385-397, Zhang et al. (2017) Front Plant Sci. 8:177; herein incorporated by reference.
  • RNA-guided FokI nucleases comprise fusions of inactive Cas9 (dCas9) and the FokI endonuclease (FokI-dCas9), wherein the dCas9 portion confers guide RNA-dependent targeting on FokI.
  • for engineered RNA-guided FokI nucleases, see, e.g., Havlicek et al. (2017) Mol. Ther. 25(2):342-355, Pan et al. (2016) Sci Rep. 6:35794, Tsai et al. (2014) Nat Biotechnol. 32(6):569-576; herein incorporated by reference.
  • any other Cas enzymes and variants described in other sections of the application can be used similarly.
  • the RNA-guided nuclease is provided in the form of a protein, optionally where the nuclease is complexed with a gRNA to form a ribonucleoprotein (RNP) complex.
  • the RNA-guided nuclease is provided by a nucleic acid encoding the RNA-guided nuclease, such as an RNA (e.g., messenger RNA) or DNA (expression vector).
  • the RNA-guided nuclease and the gRNA are both provided by vectors, such as the vectors and the vector system described in other parts of the application (all incorporated herein by reference). Both can be expressed by a single vector or separately on different vectors.
  • the vectors encoding the RNA-guided nuclease and gRNA may be included in the vector system comprising the engineered retron msr gene, msd gene and ret gene sequences.
  • the RNA-guided nuclease is fused to the RT and/or the msDNA.
  • the RNP complex may be administered to a subject or delivered into a cell by methods known in the art, such as those described in U.S. Pat. No. 11,390,884, which is incorporated by reference herein in its entirety.
  • the endonuclease/gRNA ribonucleoprotein (RNP) complexes are delivered to cells by electroporation. Direct delivery of the RNP complex to a subject or cell eliminates the need for expression from nucleic acids (e.g., transfection of plasmids encoding Cas9 and gRNA). It also eliminates unwanted integration of DNA segments derived from nucleic acid delivery (e.g., transfection of plasmids encoding Cas9 and gRNA). An endonuclease/gRNA ribonucleoprotein (RNP) complex usually is formed prior to administration.
  • Codon usage may be optimized to further improve production of an RNA-guided nuclease and/or reverse transcriptase (RT) in a particular cell or organism.
  • a nucleic acid encoding an RNA-guided nuclease or reverse transcriptase can be modified to substitute codons having a higher frequency of usage in a yeast cell, a bacterial cell, a human cell, a non-human cell, a mammalian cell, a rodent cell, a mouse cell, a rat cell, or any other host cell of interest, as compared to the naturally occurring polynucleotide sequence.
  • the protein can be transiently, conditionally, or constitutively expressed in the cell.
  • the engineered retron used for genome editing with nuclease genome editing systems can further include accessory or enhancer proteins for recombination.
  • recombination enhancers can include nonhomologous end joining (NHEJ) inhibitors (e.g., an inhibitor of DNA ligase IV, a KU inhibitor (e.g., KU70 or KU80), a DNA-PKc inhibitor, or an Artemis inhibitor), homology-directed repair (HDR) promoters, or both, that can improve the precision of genome editing and/or the efficiency of homologous recombination.
  • the recombination accessory proteins or enhancers can comprise C-terminal binding protein interacting protein (CtIP), cyclinB2, and Rad family members (e.g., Rad50, Rad51, Rad52, etc.).
  • CtIP is a transcription factor containing C2H2 zinc fingers that are involved in early steps of homologous recombination. Mammalian CtIP and its orthologs in other eukaryotes promote the resection of DNA double-strand breaks and are essential for meiotic recombination.
  • HDR may be enhanced by using Cas9 nuclease associated (e.g., fused) to an N-terminal domain of CtIP, an approach that forces CtIP to the cleavage site and increases transgene integration by HDR.
  • an N-terminal fragment of CtIP, called HE (for HDR enhancer), may be sufficient for HDR stimulation and requires the CtIP multimerization domain and CDK phosphorylation sites to be active.
  • HDR stimulation by the Cas9-HE fusion depends on the guide RNA used, and therefore the guide RNA will be designed accordingly.
  • any target gene or sequence in a host cell can be edited or modified for a desired trait, including but not limited to: Myostatin (e.g., GDF8) to increase muscle growth; Pc POLLED to induce hairlessness; KISS1R to induce boar taint; Dead end protein (dnd) to induce sterility; Nano2 and DDX to induce sterility; CD163 to induce PRRSV resistance; RELA to induce ASFV resilience; CD18 to induce Mannheimia (Pasteurella) haemolytica resilience; NRAMP1 to induce tuberculosis resilience; Negative regulators of muscle mass (e.g., Myostatin) to increase muscle mass.
  • bacteriophages, which naturally shape bacterial communities, can be co-opted as a biological technology to help eliminate pathogenic bacteria from our bodies and food supply.
  • Phage genome editing is a critical tool to engineer more effective phage technologies.
  • editing phage genomes has traditionally been a low efficiency process that requires laborious screening, counter selection, or in vitro construction of modified genomes. These requirements impose limitations on the type and throughput of phage modifications, which in turn limit our knowledge and potential for innovation.
  • a scalable approach for engineering phage genomes using recombitrons: modified bacterial retrons that generate recombineering donor DNA, paired with single-stranded binding and annealing proteins that integrate those donors into phage genomes.
  • This system can efficiently create genome modifications in multiple distinct phages without the need for counterselection.
  • the process is continuous, with edits accumulating in the phage genome the longer the phage is cultured with the host, and multiplexable, with different editing hosts contributing distinct mutations along the genome of a phage in a mixed culture.
  • recombitrons yield single-base substitutions at up to 99% efficiency, short ( ⁇ 20 base pair) insertions and deletions at 5-50%, and up to 5 distinct mutations installed on a single phage genome, all without counterselection and only a few hours of hands-on time.
  • compositions and methods described herein provide genomic editing of phage genomes by supplying donor DNA and by using endogenously or recombinantly expressed proteins that facilitate transfer of the edited sequences from the donor DNA into the phage genomes during phage replication.
  • ncRNAs retron noncoding RNAs
  • Editing of phage genomes generally is done during phage replication. Once a bacteriophage attaches to a susceptible host, it pursues one of two replication strategies: lytic or lysogenic. During a lytic replication cycle, a phage attaches to a susceptible host bacterium, introduces its genome into the host cell cytoplasm, and utilizes the ribosomes of the host to manufacture its proteins. The host cell resources are rapidly converted to phage genomes and capsid proteins, which assemble into multiple copies of the original phage. As the host cell dies, it is either actively or passively lysed, releasing the new bacteriophage to infect another host cell.
  • In the lysogenic replication cycle, the phage also attaches to a susceptible host bacterium and introduces its genome into the host cell cytoplasm. However, the phage genome is instead integrated into the bacterial cell chromosome or maintained as an episomal element where, in both cases, it is replicated and passed on to daughter bacterial cells without killing them.
  • the phage with the genomes that will be edited can be lytic, temperate, or lysogenic phage.
  • one type of editing that can be performed using the methods described here is converting temperate or lysogenic phages into lytic phages.
  • the donor DNA includes a sequence having a sequence identity of about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% to a target phage genomic DNA sequence (or a complement thereof).
  • the donor DNA, or complement thereof includes a sequence having a sequence identity of at least about 90%, 95%, 96%, 97%, 98%, or 99% to a target nucleic acid.
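The sequence identity thresholds above can be illustrated with a minimal sketch. It assumes the donor and target sequences have already been aligned to equal length without gaps, which is a simplification; the function names `percent_identity` and `meets_threshold` are hypothetical and are not part of the described methods. In practice, identity is usually computed from a proper pairwise alignment.

```python
def percent_identity(donor: str, target: str) -> float:
    """Percent of matching positions between two equal-length, pre-aligned sequences.

    Assumption: no gaps; both sequences are written 5'->3' on the same strand.
    """
    if len(donor) != len(target):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not donor:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(donor.upper(), target.upper()))
    return 100.0 * matches / len(donor)


def meets_threshold(donor: str, target: str, threshold: float = 90.0) -> bool:
    """True if the donor reaches the chosen identity threshold (e.g., at least about 90%)."""
    return percent_identity(donor, target) >= threshold
```

For example, a 10-nucleotide donor that differs from its target at one position has 90% identity and would satisfy a 90% threshold but not a 95% threshold.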
  • the target phage sequences can be any site in the phage genome.
  • the donor DNA does not edit target phage sequences involved in phage cellular entry or phage replication. Instead, the donor DNA can, for example, target phage sequences that bacterial cells defensively target.
  • the target sites in phage genomes are selected to improve phage killing or increase phage inhibition of bacterial growth.
  • the endogenously or recombinantly expressed proteins facilitate transfer of the editing sequences from the donor DNA into the phage genomes during phage replication.
  • These proteins can include one or more single strand annealing proteins (SSAPs), single-stranded DNA binding proteins (SSBs), mutant mismatch repair proteins, or a combination thereof.
  • CRISPR clustered regularly interspaced short palindromic repeats
  • Cas CRISPR-associated systems
  • the retron is engineered with a heterologous sequence encoding a donor polynucleotide suitable for use with a CRISPR/Cas genome editing system.
  • Donor polynucleotides comprise a sequence comprising an intended genome edit flanked by a pair of homology arms responsible for targeting the donor polynucleotide to the target locus to be edited in a cell.
  • the donor polynucleotide typically comprises a 5' homology arm that hybridizes to a 5' genomic target sequence and a 3' homology arm that hybridizes to a 3' genomic target sequence.
  • the homology arms are referred to herein as 5' and 3' (i.e., upstream and downstream) homology arms, which relate to the relative position of the homology arms to the nucleotide sequence comprising the intended edit within the donor polynucleotide.
  • the 5' and 3' homology arms hybridize to regions within the target locus in the genomic DNA to be modified, which are referred to herein as the "5' target sequence" and "3' target sequence," respectively.
  • a homology arm must be sufficiently complementary for hybridization to the target sequence to mediate homologous recombination between the donor polynucleotide and genomic DNA at the target locus.
  • a homology arm may comprise a nucleotide sequence having at least about 80-100% sequence identity to the corresponding genomic target sequence, including any percent identity within this range, such as at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity thereto, wherein the nucleotide sequence comprising the intended edit can be integrated into the genomic DNA by HDR at the genomic target locus recognized (i.e., having sufficient complementarity for hybridization) by the 5' and 3' homology arms.
  • the corresponding homologous nucleotide sequences in the genomic target sequence flank a specific site for cleavage and/or a specific site for introducing the intended edit.
  • the distance between the specific cleavage site and the homologous nucleotide sequences can be several hundred nucleotides. In some embodiments, the distance between a homology arm and the cleavage site is 200 nucleotides or less (e.g., 0, 10, 20, 30, 50, 75, 100, 125, 150, 175, and 200 nucleotides). In most cases, a smaller distance may give rise to a higher gene targeting rate.
  • the donor polynucleotide is substantially identical to the target genomic sequence, across its entire length except for the sequence changes to be introduced to a portion of the genome that encompasses both the specific cleavage site and the portions of the genomic target sequence to be altered.
  • a homology arm can be of any length, e.g., 10 nucleotides or more, 15 nucleotides or more, 20 nucleotides or more, 50 nucleotides or more, 100 nucleotides or more, 250 nucleotides or more, 300 nucleotides or more, 350 nucleotides or more, 400 nucleotides or more, 450 nucleotides or more, 500 nucleotides or more, 1000 nucleotides (1 kb) or more, 5000 nucleotides (5 kb) or more, 10000 nucleotides (10 kb) or more, etc. In some instances, the 5' and 3' homology arms are substantially equal in length to one another.
  • the 5' and 3' homology arms are not necessarily equal in length to one another.
  • one homology arm may be 30% shorter or less than the other homology arm, 20% shorter or less than the other homology arm, 10% shorter or less than the other homology arm, 5% shorter or less than the other homology arm, 2% shorter or less than the other homology arm, or only a few nucleotides less than the other homology arm.
  • the 5' and 3' homology arms are substantially different in length from one another, e.g., one may be 40% shorter or more, 50% shorter or more, sometimes 60% shorter or more, 70% shorter or more, 80% shorter or more, 90% shorter or more, or 95% shorter or more than the other homology arm.
  • the donor polynucleotide is used in combination with an RNA-guided nuclease, which is targeted to a particular genomic sequence (i.e., genomic target sequence to be modified) by a guide RNA.
  • a target-specific guide RNA comprises a nucleotide sequence that is complementary to a genomic target sequence, and thereby mediates binding of the nuclease-gRNA complex by hybridization at the target site.
  • the gRNA can be designed with a sequence complementary to the sequence of a minor allele to target the nuclease-gRNA complex to the site of a mutation.
  • the mutation may comprise an insertion, a deletion, or a substitution.
  • the mutation may include a single nucleotide variation, gene fusion, translocation, inversion, duplication, frameshift, missense, nonsense, or other mutation associated with a phenotype or disease of interest.
  • the targeted minor allele may be a common genetic variant or a rare genetic variant.
  • the gRNA is designed to selectively bind to a minor allele with single base-pair discrimination, for example, to allow binding of the nuclease-gRNA complex to a single nucleotide polymorphism (SNP).
  • the gRNA may be designed to target disease-relevant mutations of interest for the purpose of genome editing to remove the mutation from a gene.
  • the gRNA can be designed with a sequence complementary to the sequence of a major or wild-type allele to target the nuclease-gRNA complex to the allele for the purpose of genome editing to introduce a mutation into a gene in the genomic DNA of the cell, such as an insertion, deletion, or substitution.
  • Such genetically modified cells can be used, for example, to alter phenotype, confer new properties, or produce disease models for drug screening.
  • the RNA-guided nuclease used for genome modification is a clustered regularly interspaced short palindromic repeats (CRISPR) system Cas nuclease.
  • Any RNA-guided Cas nuclease capable of catalyzing site-directed cleavage of DNA to allow integration of donor polynucleotides by the HDR mechanism can be used in genome editing, including CRISPR system type I, type II, or type III Cas nucleases.
  • Cas proteins include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), Cas10, Cas10d, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Cs
  • a type II CRISPR system Cas9 endonuclease is used.
  • Cas9 nucleases from any species, or biologically active fragments, variants, analogs, or derivatives thereof that retain Cas9 endonuclease activity (i.e., catalyze site-directed cleavage of DNA to generate double-strand breaks), can be used.
  • the Cas9 need not be physically derived from an organism but may be synthetically or recombinantly produced.
  • Cas9 sequences from a number of bacterial species are well known in the art and listed in the National Center for Biotechnology Information (NCBI) database.
  • sequences or a variant thereof comprising a sequence having at least about 70-100% sequence identity thereto, including any percent identity within this range, such as 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99% sequence identity thereto, can be used for genome editing, as described herein. See also Fonfara et al. (2014) Nucleic Acids Res. 42(4):2577-90; Kapitonov et al. (2015) J. Bacteriol.
  • the CRISPR-Cas system naturally occurs in bacteria and archaea where it plays a role in RNA-mediated adaptive immunity against foreign DNA.
  • the bacterial type II CRISPR system uses the endonuclease, Cas9, which forms a complex with a guide RNA (gRNA) that specifically hybridizes to a complementary genomic target sequence, where the Cas9 endonuclease catalyzes cleavage to produce a double-stranded break.
  • Targeting of Cas9 typically further relies on the presence of a 5' protospacer-adjacent motif (PAM) in the DNA at or near the gRNA-binding site.
  • the genomic target site will typically comprise a nucleotide sequence that is complementary to the gRNA and may further comprise a protospacer adjacent motif (PAM).
  • the target site comprises 20-30 base pairs in addition to a 3 base pair PAM.
  • the first nucleotide of a PAM can be any nucleotide, while the two other nucleotides will depend on the specific Cas9 protein that is chosen.
  • Exemplary PAM sequences are known to those of skill in the art and include, without limitation, NNG, NGN, NAG, and NGG, wherein N represents any nucleotide.
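The relationship between a target site and its PAM can be sketched as a simple sequence scan. The example below is illustrative only: it assumes the common SpCas9 convention in which the PAM lies immediately 3' of a 20-nucleotide protospacer on the scanned strand, it searches only the strand supplied, and the names `IUPAC` and `find_target_sites` are hypothetical. Degenerate PAMs such as NGG or NGN are expanded from standard IUPAC codes.

```python
import re

# Subset of IUPAC nucleotide codes needed for the exemplary PAMs.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "Y": "[CT]", "R": "[AG]"}


def find_target_sites(seq: str, pam: str = "NGG", protospacer_len: int = 20):
    """Return (start, protospacer, pam_sequence) for each protospacer followed by a PAM match.

    Assumption: the PAM is immediately 3' of the protospacer on the given strand.
    A lookahead is used so that overlapping candidate sites are all reported.
    """
    seq = seq.upper()
    pam_re = "".join(IUPAC[base] for base in pam.upper())
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})({pam_re}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]
```

To cover both strands, the same function can be applied to the reverse complement of the sequence. Changing `pam` and `protospacer_len` adapts the scan to other nucleases, although nucleases whose PAM lies 5' of the protospacer would need the two capture groups swapped.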
  • the allele targeted by a gRNA comprises a mutation that creates a PAM within the allele, wherein the PAM promotes binding of the Cas9-gRNA complex to the allele.
  • the gRNA is 5-50 nucleotides, 10-30 nucleotides, 15-25 nucleotides, 18-22 nucleotides, or 19-21 nucleotides in length, or any length between the stated ranges, including, for example, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 nucleotides in length.
  • the guide RNA may be a single guide RNA comprising crRNA and tracrRNA sequences in a single RNA molecule, or the guide RNA may comprise two RNA molecules with crRNA and tracrRNA sequences residing in separate RNA molecules.
  • Cpf1 is another class II CRISPR/Cas system RNA-guided nuclease with similarities to Cas9 and may be used analogously. Unlike Cas9, Cpf1 does not require a tracrRNA and only depends on a crRNA in its guide RNA, which provides the advantage that shorter guide RNAs can be used with Cpf1 for targeting than with Cas9. Cpf1 is capable of cleaving either DNA or RNA.
  • the PAM sites recognized by Cpf1 have the sequences 5'-YTN-3' (where "Y" is a pyrimidine and "N" is any nucleobase) or 5'-TTN-3', in contrast to the G-rich PAM site recognized by Cas9.
  • Cpf1 cleavage of DNA produces double-stranded breaks with sticky ends having a 4- or 5-nucleotide overhang.
  • Ledford et al. (2015) Nature. 526(7571):17-17, Zetsche et al. (2015) Cell. 163(3):759-771, Murovec et al. (2017) Plant Biotechnol. J. 15(8):917-926, Zhang et al. (2017) Front. Plant Sci. 8:177, Fernandes et al. (2016) Postepy Biochem. 62(3):315-326; herein incorporated by reference.
  • C2c1 is another class II CRISPR/Cas system RNA-guided nuclease that may be used.
  • C2c1, similarly to Cas9, depends on both a crRNA and tracrRNA for guidance to target sites.
  • RNA-guided FokI nucleases comprise fusions of inactive Cas9 (dCas9) and the FokI endonuclease (FokI-dCas9), wherein the dCas9 portion confers guide RNA-dependent targeting on FokI.
  • for a description of engineered RNA-guided FokI nucleases, see, e.g., Havlicek et al. (2017) Mol. Ther. 25(2):342-355, Pan et al. (2016) Sci Rep. 6:35794, Tsai et al. (2014) Nat Biotechnol. 32(6):569-576; herein incorporated by reference.
  • the RNA-guided nuclease can be provided in the form of a protein, optionally where the nuclease is complexed with a gRNA, or provided by a nucleic acid encoding the RNA-guided nuclease, such as an RNA (e.g., messenger RNA) or DNA (expression vector).
  • the RNA-guided nuclease and the gRNA are both provided by vectors. Both can be expressed by a single vector or separately on different vectors.
  • the vector(s) encoding the RNA-guided nuclease and gRNA may be included in the vector system comprising the engineered retron msr gene, msd gene and ret gene sequences.
  • Codon usage may be optimized to improve production of an RNA-guided nuclease and/or retron reverse transcriptase in a particular cell or organism.
  • a nucleic acid encoding an RNA-guided nuclease or reverse transcriptase can be modified to substitute codons having a higher frequency of usage in a yeast cell, a bacterial cell, a human cell, a nonhuman cell, a mammalian cell, a rodent cell, a mouse cell, a rat cell, or any other host cell of interest, as compared to the naturally occurring polynucleotide sequence.
  • the protein can be transiently, conditionally, or constitutively expressed in the cell.
  • RECOMBINEERING Recombineering can be used in modifying chromosomal as well as episomal replicons in cells, for example, to create gene replacements, gene knockouts, deletions, insertions, inversions, or point mutations.
  • Recombineering can also be used to modify a plasmid or bacterial artificial chromosome (BAC), for example, to clone a gene or insert markers or tags.
  • the engineered retrons described herein can be used in recombineering applications to provide linear single-stranded or double-stranded DNA for recombination.
  • Homologous recombination may be mediated by bacteriophage proteins such as RecE/RecT from Rac prophage or Redαβδ from bacteriophage lambda.
  • the linear DNA should have sufficient homology at the 5' and 3' ends to a target DNA molecule present in a cell (e.g., plasmid, BAC, or chromosome) to allow recombination.
  • the linear double-stranded or single-stranded DNA molecule used in recombineering comprises a sequence having the intended edit to be inserted flanked by two homology arms that target the linear DNA molecule to a target site for homologous recombination.
  • Homology arms for recombineering typically range in length from 13-300 nucleotides, or 20 to 200 nucleotides, including any length within this range such as 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, or 200 nucleotides in length.
  • a homology arm is at least 15, at least 20, at least 30, at least 40, or at least 50 or more nucleotides in length.
  • Homology arms ranging from 40-50 nucleotides in length generally have sufficient targeting efficiency for recombination; however, longer homology arms ranging from 150 to 200 bases or more may further improve targeting efficiency.
  • the 5' homology arm and the 3' homology arm differ in length.
  • the linear DNA may have about 50 bases at the 5' end and about 20 bases at the 3' end with homology to the region to be targeted.
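The layout of a recombineering donor (a 5' homology arm, the intended edit, and a 3' homology arm) can be sketched as follows. This is a minimal illustration under stated assumptions: the reference sequence and coordinates are supplied by the user, coordinates are 0-based and half-open, and the function name `build_donor` and its parameters are hypothetical. The default arm lengths mirror the asymmetric example of about 50 bases at the 5' end and about 20 bases at the 3' end.

```python
def build_donor(reference: str, edit_start: int, edit_end: int, edit: str,
                left_arm: int = 50, right_arm: int = 20) -> str:
    """Return 5' arm + edit + 3' arm for replacing reference[edit_start:edit_end].

    An empty `edit` encodes a deletion; edit_start == edit_end encodes an insertion.
    Arms are copied directly from the reference, so they are 100% identical to it.
    """
    if not 0 <= edit_start <= edit_end <= len(reference):
        raise ValueError("edit coordinates fall outside the reference")
    if edit_start < left_arm or len(reference) - edit_end < right_arm:
        raise ValueError("reference is too short for the requested homology arms")
    five_prime_arm = reference[edit_start - left_arm:edit_start]
    three_prime_arm = reference[edit_end:edit_end + right_arm]
    return five_prime_arm + edit + three_prime_arm
```

A single-base substitution is expressed by passing a one-nucleotide `edit` over a one-nucleotide interval; longer arms (for example 150 to 200 bases) can be requested through `left_arm` and `right_arm` when the reference is long enough.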
  • the bacteriophage homologous recombination proteins can be provided to a cell as proteins or by one or more vectors encoding the recombination proteins.
  • one or more vectors encoding the bacteriophage recombination proteins are included in the vector system comprising the engineered retron msr gene, msd gene and ret gene sequences.
  • a number of bacterial strains containing prophage recombination systems are available for recombineering, including, without limitation, DY380, containing a defective λ prophage with recombination proteins exo, bet, and gam; EL250, derived from DY380, which in addition to the recombination genes found in DY380, also contains a tightly controlled arabinose-inducible flpe gene (flpe mediates recombination between two identical frt sites); EL350, also derived from DY380, which in addition to the recombination genes found in DY380, also contains a tightly controlled arabinose-inducible cre gene (cre mediates recombination between two identical loxP sites); SW102, derived from DY380, which is designed for BAC recombineering using a galK positive/negative selection; SW105, derived from EL250, which can also be used for galK positive/negative selection, but like EL250
  • Recombineering can be carried out by transfecting bacterial cells of such strains with an engineered retron comprising a heterologous sequence encoding a linear DNA suitable for recombineering.
  • the heterologous sequence in the engineered retron construct comprises a synthetic CRISPR protospacer DNA sequence to allow molecular recording.
  • the endogenous CRISPR Cas1-Cas2 system is normally utilized by bacteria and archaea to keep track of foreign DNA sequences originating from viral infections by storing short sequences (i.e., protospacers) that confer sequence-specific resistance to invading viral nucleic acids within genome-based arrays. These arrays not only preserve the spacer sequences but also record the order in which the sequences are acquired, generating a temporal record of acquisition events.
  • This system can be adapted to record arbitrary DNA sequences into a genomic CRISPR array in the form of "synthetic protospacers" that are introduced into cells using engineered retrons.
  • Engineered retrons carrying the protospacer sequences can be used for integration of synthetic CRISPR protospacer sequences at a specific genomic locus by utilizing the CRISPR system Cas1-Cas2 complex.
  • Molecular recording can be used to keep track of certain biological events by producing a stable genetic memory tracking code. See, e.g., Shipman et al. (2016) Science 353(6298):aaf1175 and International Patent Application Publication No. WO/2018/191525; herein incorporated by reference in their entireties.
  • the CRISPR-Cas system is harnessed to record specific and arbitrary DNA sequences into a bacterial genome.
  • the DNA sequences can be produced by an engineered retron within the cell.
  • the engineered retron can be used to produce the protospacers within the cell, which are inserted into a CRISPR array within the cell.
  • the cell may be modified to include one or more engineered retrons (or vector systems encoding them) that can produce one or more synthetic protospacers in the cell, wherein the synthetic protospacers are added to the CRISPR array.
  • a record of defined sequences, recorded over many days, and in multiple modalities can be generated.
  • the engineered retron comprises an msd protospacer nucleic acid region or an msr protospacer nucleic acid region.
  • the protospacer sequence is first incorporated into the msr RNA, which is reverse transcribed into protospacer DNA.
  • Double-stranded protospacer DNA is produced when two protospacer DNA strands having complementary sequences hybridize, or when a double-stranded structure (such as a hairpin) is formed in a single-stranded protospacer DNA (e.g., a single msDNA can form an appropriate hairpin structure to provide the double-stranded DNA protospacer).
  • a single stranded DNA produced in vivo from a first engineered retron may be hybridized with a complementary single-stranded DNA produced in vivo from the same retron or a second engineered retron or may form a hairpin structure and then used as a protospacer sequence to be inserted into a CRISPR array as a spacer sequence.
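The two routes to a double-stranded protospacer described above (two complementary strands, or a single strand that folds into a hairpin) can be checked computationally with a simple sketch. It considers perfect Watson-Crick pairing only and ignores thermodynamics; the function names and the minimum loop length of 4 nucleotides are illustrative assumptions, not parameters taken from this disclosure.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def can_form_duplex(strand_a: str, strand_b: str) -> bool:
    """True if two single strands (both written 5'->3') are perfectly complementary."""
    return strand_a.upper() == reverse_complement(strand_b)


def hairpin_stem_length(seq: str, loop_len: int = 4) -> int:
    """Length of the perfectly paired stem formed when the 5' end pairs with the 3' end.

    At least `loop_len` central bases are left unpaired (assumed minimum loop size).
    """
    seq = seq.upper()
    max_stem = max((len(seq) - loop_len) // 2, 0)
    stem = 0
    for i in range(max_stem):
        # Base i from the 5' end must be the complement of base i from the 3' end.
        if seq[i].translate(_COMPLEMENT) != seq[-1 - i]:
            break
        stem += 1
    return stem
```

A design tool built on this idea might require a minimum stem length before accepting an msDNA as a candidate hairpin protospacer, but any such cutoff would have to be determined empirically.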
  • the engineered retron(s) should provide sufficient levels of the protospacer sequence within a cell for incorporation into the CRISPR array.
  • the use of protospacers generated within the cell extends the in vivo molecular recording system from only capturing information known to a user, to capturing biological or environmental information that may be previously unknown to a user.
  • an msDNA protospacer sequence in an engineered retron construct may be driven by a promoter that is downstream of a sensor pathway for a biological phenomenon or environmental toxin.
  • the capture and storage of the protospacer sequence in the CRISPR array records the event. If multiple msDNA protospacers are driven by different promoters, the activity of those promoters is recorded (along with anything that may be upstream of the promoters) as well as the relative order of promoter activity (based on the relative position of spacer sequences in the CRISPR array).
  • the CRISPR array may be sequenced to determine whether a given biological or environmental event has taken place and the order of multiple events, given by the presence and relative position of msDNA-derived spacers in the CRISPR array.
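Reading out a recording from a sequenced CRISPR array can be sketched as a lookup followed by an ordering step. The example assumes that new spacers are integrated at the leader-proximal end of the array (as in the E. coli type I-E system), so that spacers closer to the leader were acquired more recently. The function name `decode_events`, the spacer sequences, and the event labels are hypothetical placeholders.

```python
def decode_events(array_spacers, spacer_to_event):
    """Return recorded events in chronological order (oldest first).

    array_spacers: spacer sequences listed from leader-proximal (newest) to
                   leader-distal (oldest), as read from the sequenced array.
    spacer_to_event: maps each msDNA-derived spacer sequence to the event whose
                     promoter drives that protospacer.
    Spacers not present in the mapping (e.g., pre-existing spacers) are ignored.
    """
    recorded = [spacer_to_event[s] for s in array_spacers if s in spacer_to_event]
    return list(reversed(recorded))


# Hypothetical example: two sensor-driven protospacers and one unrelated spacer.
example_map = {"ACGTACGT": "event_A", "TTGGCCAA": "event_B"}
example_array = ["TTGGCCAA", "GGGGCCCC", "ACGTACGT"]
example_order = decode_events(example_array, example_map)
```

Here `example_order` lists `event_A` before `event_B`, because the `event_A` spacer sits farther from the leader and was therefore acquired earlier under the stated assumption.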
  • the synthetic protospacer further comprises an AAG PAM sequence at its 5' end. Protospacers including the 5' AAG PAM are acquired by the CRISPR array with greater efficiency than those that do not include a PAM sequence.
  • Cas1 and Cas2 are provided by a vector that expresses the Cas1 and Cas2 at a level sufficient to allow the synthetic protospacer sequences produced by engineered retrons to be acquired by a CRISPR array in a cell.
  • a vector system can be used to allow molecular recording in a cell that lacks endogenous Cas proteins.
  • the engineered ncRNAs, reverse transcriptases, Cas nucleases, and the expression systems described herein and/or cells containing the engineered ncRNAs, reverse transcriptases, Cas nucleases, or expression systems can be administered to a subject.
  • a subject may suffer from a disease or condition or be suspected of suffering from a disease or condition. Symptoms of the disease or condition can be reduced by such administration. In some cases, progression of the disease or condition can be prevented or reduced by such administration. In some cases, the subject may be asymptomatic but be genetically predisposed to developing the disease or condition.
  • described herein are methods of administering one or more engineered ncRNAs, reverse transcriptases, Cas nucleases, and/or expression systems therefor, and/or cells containing the engineered ncRNAs, reverse transcriptases, or Cas nucleases, to a subject.
  • the methods can provide prophylaxis, amelioration and/or therapy for a variety of diseases or conditions, including cystic fibrosis, thalassemia, sickle cell anemia, Huntington's disease, diabetes, Duchenne's Muscular Dystrophy, Tay-Sachs Disease, Marfan syndrome, Alzheimer’s disease, Leber's hereditary optic atrophy (LHON), myoclonic epilepsy with ragged red fibers (MERRF), mitochondrial encephalopathy, lactic acidosis, and stroke-like episodes (MELAS; a type of dementia), obesity, cancers, brain ischemia, coronary disease, myocardial infarction, reperfusion hindrance of ischemic diseases, atopic dermatitis, psoriasis vulgaris, contact dermatitis, keloid, decubital ulcer, ulcerative colitis, Crohn's disease, nephropathy, glomerulosclerosis, albuminuria, nephritis, renal failure, rhe
  • the methods of diagnosing, prognosing, treating, and/or preventing a disease, state, or condition in or of a subject can include modifying a polynucleotide in a subject or cell thereof using a composition, system, or component thereof of the engineered retron as described herein, and/or include detecting a diseased or healthy polynucleotide in a subject or cell thereof using a composition, system, or component thereof of the engineered retron as described herein.
  • the method of treatment or prevention can include using a composition, system, or component of the engineered retron to modify a polynucleotide of an infectious organism (e.g., a bacterium or virus) within a subject or cell thereof.
  • the method of treatment or prevention can include using a composition, system, or component of the engineered retron to modify a polynucleotide of an infectious organism or symbiotic organism within a subject.
  • composition, system, and components of the engineered retron can be used to develop models of diseases, states, or conditions.
  • composition, system, and components of the engineered retron can be used to detect a disease state or correction thereof, such as by a method of treatment or prevention described herein.
  • compositions, system, and components of the engineered retron can be used to screen and select cells that can be used, for example, as treatments or preventions described herein.
  • composition, system, and components thereof can be used to develop biologically active agents that can be used to modify one or more biologic functions or activities in a subject or a cell thereof.
  • the method can include delivering a composition, system, and/or component of the engineered retron to a subject or cell thereof, or to an infectious or symbiotic organism by a suitable delivery technique and/or composition.
  • the components can operate as described elsewhere herein to elicit a nucleic acid modification event.
  • the nucleic acid modification event can occur at the genomic, epigenomic, and/or transcriptomic level. DNA and/or RNA cleavage, gene activation, and/or gene deactivation can occur.
  • compositions, system, and components of the engineered retron as described elsewhere herein can be used to treat and/or prevent a disease, such as a genetic and/or epigenetic disease, in a subject; to treat and/or prevent genetic infectious diseases in a subject, such as bacterial infections, viral infections, fungal infections, parasite infections, and combinations thereof; to modify the composition or profile of a microbiome in a subject, which can in turn modify the health status of the subject; to modify cells ex vivo, which can then be administered to the subject whereby the modified cells can treat or prevent a disease or symptom thereof; or to treat mitochondrial diseases, where the mitochondrial disease etiology involves a mutation in the mitochondrial DNA.
  • a method of treating a subject comprising inducing gene editing by transforming the subject with the Cas effector(s), and encoding and expressing in vivo the remaining portions of the composition, system, (e.g., RNA, guides), complex or component of the engineered retron.
  • a suitable repair template may also be provided by the engineered retron as described herein elsewhere.
  • a method of inducing one or more polynucleotide modifications in a eukaryotic or prokaryotic cell or component thereof (e.g., a mitochondrion) of a subject, infectious organism, and/or organism of the microbiome of the subject can include the introduction, deletion, or substitution of one or more nucleotides at a target sequence of a polynucleotide of one or more cell(s).
  • the modification can occur in vitro, ex vivo, in situ, or in vivo.
  • the method of treating or inhibiting a condition or a disease caused by one or more mutations in a genomic locus in a eukaryotic organism or a non-human organism can include manipulation of a target sequence within a coding, non-coding or regulatory element of said genomic locus in a subject or a non-human subject in need thereof, comprising modifying the subject or non-human subject by manipulation of the target sequence, wherein the condition or disease is susceptible to treatment or inhibition by manipulation of the target sequence, including providing treatment comprising delivering a composition comprising the particle delivery system or the delivery system or the virus particle of any one of the above embodiments or the cell of any one of the above embodiments.
  • particle delivery system or the delivery system or the virus vector (in viral particle) of any one of the above embodiments or the cell of any one of the above embodiments in ex vivo or in vivo gene or genome editing; or for use in in vitro, ex vivo or in vivo gene therapy.
  • target polynucleotide modification using the subject engineered retron and the associated composition, vectors, system and methods comprises addition, deletion, or substitution of 1 to about 10,000 nucleotides at each target sequence of said polynucleotide of said cell(s).
  • the modification can include the addition, deletion, or substitution of at least 1, 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, 100, 200, 250, 300, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 5000, 6000, 7000, 8000, 9000, 10,000 or more nucleotides at each target sequence.
  • formation of system or complex results in cleavage, nicking, and/or another modification of one or both strands in or near (e.g., within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, or more base pairs from) the target sequence.
  • a method of modifying a target polynucleotide in a cell to treat or prevent a disease can include allowing a composition, system, or component of the subject engineered retron to bind to the target polynucleotide, e.g., to effect cleavage, nicking, or other modification as the composition, system, is capable of said target polynucleotide, thereby modifying the target polynucleotide, wherein the composition, system, or component thereof, complex with a guide sequence, and hybridize said guide sequence to a target sequence within the target polynucleotide, wherein said guide sequence is optionally linked to a tracr mate sequence, which in turn can hybridize to a tracr sequence.
  • modification can include cleaving or nicking one or two strands at the location of the target sequence by one or more components of the composition, system, or component thereof.
  • the engineered retron and the associated compositions, systems, vectors, uses, and methods of use can be used to treat diseases of the circulatory system.
  • the treatment can be carried out by using an AAV or a lentiviral vector to deliver the engineered retron, composition, system, and/or vector described herein to modify hematopoietic stem cells (HSCs) or iPSCs in vivo or ex vivo.
  • iPSCs induced pluripotent stem cells
  • the treatment can be carried out by correcting HSCs or iPSCs as to the disease using a composition, system, herein or a component thereof, wherein the composition, system, optionally includes a suitable HDR repair template (e.g., a template in the msDNA of the engineered retron).
  • the treatment or prevention for treating a circulatory system or blood disease can include modifying a human cord blood cell.
  • the treatment or prevention for treating a circulatory system or blood disease can include modifying a granulocyte colony-stimulating factor-mobilized peripheral blood cell (mPB) with any modification described herein.
  • the human cord blood cell or mPB can be CD34+.
  • the cord blood cells or mPB cells modified are autologous.
  • the cord blood cells or mPB cells are allogenic. In addition to the modification of the disease genes, allogenic cells can be further modified using the composition, system, described herein to reduce the immunogenicity of the cells when delivered to the recipient.
  • the modified cord blood cells or mPB cells can be optionally expanded in vitro.
  • the modified cord blood cell(s) or mPB cells can be delivered to a subject in need thereof using any suitable delivery technique.
  • the composition and system may be engineered to target a genetic locus or loci in HSCs.
  • the components of the systems can be codon-optimized for a eukaryotic cell and especially a mammalian cell, e.g., a human cell, for instance, HSC, or iPSC and sgRNA targeting a locus or loci in HSC, such as circulatory disease, can be prepared.
  • These may be delivered via particles, such as the lipid nanoparticle delivery system described herein.
  • the particles may be formed by the components of the systems herein being admixed.
  • the HSCs or iPSCs can be expanded prior to administration to the subject.
  • Expansion of HSCs can be via any suitable method such as that described by Lee, “Improved ex vivo expansion of adult hematopoietic stem cells by overcoming CUL4-mediated degradation of HOXB4.” Blood. 2013 May 16;121(20):4082-9. doi: 10.1182/blood-2012-09-455204. Epub 2013 Mar 21.
  • the HSCs or iPSCs modified are autologous. In some embodiments, the HSCs or iPSCs are allogenic. In addition to the modification of the disease genes, allogenic cells can be further modified using the composition, system, described herein to reduce the immunogenicity of the cells when delivered to the recipient.
  • the engineered retron and the associated compositions, systems, vectors, uses, and methods of use can be used to treat neurological diseases.
  • the neurological diseases comprise diseases of the brain and CNS.
  • Delivery options for the diseases in the brain include encapsulation of the systems in the form of either DNA or RNA into liposomes and conjugating to molecular Trojan horses for trans-blood brain barrier (BBB) delivery.
  • Molecular Trojan horses have been shown to be effective for delivery of β-gal expression vectors into the brain of non-human primates.
  • the same approach can be used to deliver vectors or vector systems of the invention.
  • an artificial virus can be generated for CNS and/or brain delivery.
  • the engineered retron and the associated compositions, systems, vectors, uses, and methods of use can be used to treat hearing diseases or hearing loss in one or both ears. Deafness is often caused by lost or damaged hair cells that cannot relay signals to auditory neurons.
  • the composition, system, or modified cells can be delivered to one or both ears for treating or preventing hearing disease or loss by any suitable method or technique known in the art, such as US20120328580 (e.g., auricular administration), by intratympanic injection (e.g., into the middle ear), and/or injections into the outer, middle, and/or inner ear; administration in situ, via a catheter or pump (U.S. 2006/0030837) and Jacobsen (U.S. Pat. No. 7,206,639). Also see US20120328580. Cells resulting from such methods can then be transplanted or implanted into a patient in need of such treatment.
  • the engineered retron and the associated compositions, systems, vectors, uses, and methods of use can be used to treat diseases in non-dividing cells.
  • exemplary non-dividing cells include muscle cells or neurons.
  • homologous recombination (HR) is generally suppressed in the G1 cell-cycle phase but can be turned back on using art-recognized methods, such as Orthwein et al. (Nature. 2015 Dec 17; 528(7582): 422-426).
  • the engineered retron and the associated compositions, systems, vectors, uses, and methods of use can be used to treat diseases of the eye. In some embodiments, the engineered retron and the associated compositions, systems, vectors, uses, and methods of use, can be used to treat muscle diseases and cardiovascular diseases.
  • the engineered retron and the associated compositions, systems, vectors, uses, and methods of use can be used to treat diseases of the liver and kidney.
  • the engineered retron and the associated compositions, systems, vectors, uses, and methods of use can be used to treat epithelial and lung diseases.
  • the engineered retron and the associated compositions, systems, vectors, uses, and methods of use can be used to treat diseases of the skin.
  • the engineered retron and the associated compositions, systems, vectors, uses, and methods of use can be used to treat cancer.
  • the engineered retron and the associated compositions, systems, vectors, uses, and methods of use can be used in adoptive cell therapy.
  • the engineered retron and the associated compositions, systems, vectors, uses, and methods of use can be used to treat infectious diseases.
  • the engineered retron and the associated compositions, systems, vectors, uses, and methods of use can be used to treat mitochondrial diseases.
  • gene transfer may more easily be performed under ex vivo conditions.
  • Ex vivo gene therapy refers to the isolation of cells from a subject, the delivery of a nucleic acid into cells in vitro, and then the return of the modified cells back into the subject. This may involve the collection of a biological sample comprising cells from the subject. For example, blood can be obtained by venipuncture, cells can be obtained by scrapings, and solid tissue samples can be obtained by surgical techniques etc. according to methods available in the art.
  • the subject who receives the cells is also the subject from whom the cells are harvested or obtained, which provides the advantage that the donated cells are autologous.
  • cells can be obtained from another subject (i.e., donor), a culture of cells from a donor, or from established cell culture lines. Cells may be obtained from the same or a different species than the subject to be treated, but preferably are of the same species, and more preferably of the same immunological profile as the subject.
  • Such cells can be obtained, for example, from a biological sample comprising cells from a close relative or matched donor, then transfected with nucleic acids (e.g., comprising an engineered retron), and administered to a subject in need of genome modification, for example, for treatment of a disease or condition.
  • Multiplexing - the act of consolidating multiple discrete elements into a single composite channel - has enabled genomic technologies to scale toward the complexity of the biology we hope to understand.
  • This now-standard multiplexed gRNA workflow has allowed scientists to run experiments across every gene in parallel with barely more effort than they might have previously put into determining the effect of a single gene.
  • the typical multiplexing of a gRNA library precludes an important level of analysis: it is implemented across cells, where a single edit is made per genome, and thus cannot be used to study the interaction of mutations within a genome.
  • MAGE multiplexed automated genome engineering
  • ssDNA single stranded DNA
  • a eukaryotic version of this technology has been developed to extend this approach to yeast (6).
  • MAGE is limited by the numerous labor-intensive recombineering cycles required to attain efficient combinatorial editing rates, and by its reliance on exogenously delivered oligonucleotides that leave no trackable plasmid element for phenotyping by proxy (7).
  • Base editing (BE) and prime editing (PE) (8-10) are two other precise editing approaches that can be multiplexed (11-16).
  • Base editors are the simplest to multiplex using tandem gRNAs but are limited to single base mutations of a defined type (either A•T-to-G•C or C•G-to-T•A) (11,13-14).
  • Prime-editors have also been multiplexed, but the complexity of the editing elements grows quickly with additional sites.
  • multiplexed prime editing requires a three plasmid system, and multiple edits occur on the same genome in less than 1% of cells (12), while systems built for human and plant cells require two gRNAs per site in addition to the editing template, which can create issues with the assembly of multiplexed plasmids (14-16).
  • Retrons are bacterial tripartite systems that have been shown to provide phage defense (17-20).
  • Two of the components of the retron operon are a reverse transcriptase and a small (200-300 base), structured non-coding RNA (ncRNA).
  • the reverse transcriptase recognizes and partially reverse transcribes the ncRNA into a single-stranded DNA fragment that is present at the abundance of a cellular transcript (18,21-24).
  • retron ncRNA can be modified to encode an editing donor to precisely edit the genomes of bacteria, phage, plant, yeast, and even human cells (25-31).
  • these retron-derived editors have only been used to edit genomic positions one at a time.
  • these multiplexed, arrayed retron elements - termed multitrons - can be paired with single-stranded annealing proteins to edit prokaryotic genomes and with CRISPR components to edit eukaryotic genomes (26-29).
  • the changes include a broad range of different kinds of edits from single point mutations to large insertions, deletions and replacements/substitutions enabling large-scale genome engineering in prokaryotic and eukaryotic genomes.
  • These edits to the host genome are introduced by a reverse-transcribed editing donor, which edits the bacterial genome during replication and the yeast genome after a double-strand break (DSB) generated by a CRISPR derived nuclease. Editing of multiple DNA loci in a single genome is desired for many applications in biotechnology such as rewiring metabolic pathways to improve the production of compounds of interest, creating molecular recorders, new genetic circuits or improving the use of different cells as biosensors.
  • pSLS.492 (29) plasmid containing a rpoB donor was used as backbone.
  • a donor upstream of the rpoB donor a 60 bp reverse oligo annealing (25bp) with the 5’ region of the msd and containing 35 bp of the new donor, and a 60 bp forward oligo annealing (25bp) with the 5’ end of rpoB donor and harboring the other half of the new donor were used.
  • a 60 bp forward oligo annealing (25 bp) with the 3’ region of the msd and containing 35 bp of the new donor, and a 60 bp reverse oligo annealing (25 bp) with the 3’ end of rpoB donor and harboring the other half of the new donor were used.
  • a KLD reaction (NEB) was carried out to self-ligate the plasmid encoding an additional donor.
  • the mentioned cycle was repeated again.
  • the pCDF-DUET-1 vector (Novagen) was used as a backbone.
  • a parental plasmid (pAGD159; Table 3) containing a whole ncRNA with a gyrA donor downstream of the first T7 promoter, and Eco1-RT downstream of the second T7 promoter was constructed.
  • the ncRNA harboring the rpoB donor from pSLS.492 was amplified and cloned upstream and downstream of the gyrA -containing ncRNA by Gibson Assembly.
  • the msr was deleted from pAGD159 and subsequently cloned between the second T7 promoter and Eco1-RT using a Gibson Assembly approach.
  • the msd harboring the rpoB donor from pSLS.492 was amplified and cloned upstream and downstream of the gyrA -containing ncRNA by Gibson Assembly.
  • a T7 terminator was cloned between the msd array and the second T7 promoter.
  • the Golden Gate protocol was carried out in 20 uL reactions as follows: 1 uL pAGD236, 5 uL of each gBlock (3 uL for 5x msd arrays), 1.5 uL BsaI (NEB), 2 uL T4 DNA ligase buffer, 0.5 uL T4 DNA ligase (NEB).
  • the reaction consists of 30 or 60 cycles (depending on the complexity) of 5 min at 16°C and 5 min at 37°C and a final cycle of 10 min at 60°C.
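  • As a rough planning aid for the Golden Gate setup described above, the following Python sketch checks the listed reagent volumes against the 20 uL reaction volume and totals the programmed cycling time. The function names are hypothetical, topping up the remaining volume with water is an assumption that the protocol does not state, and thermocycler ramp times are ignored.

```python
def water_volume_ul(n_gblocks, gblock_ul, total_ul=20.0):
    """Volume left in the reaction after adding the listed reagents.

    Fixed volumes come from the protocol text: 1 uL pAGD236, 1.5 uL BsaI,
    2 uL T4 DNA ligase buffer, 0.5 uL T4 DNA ligase. Making up the rest
    with water is an assumption of this sketch.
    """
    fixed_ul = 1.0 + 1.5 + 2.0 + 0.5
    remaining = total_ul - fixed_ul - n_gblocks * gblock_ul
    if remaining < 0:
        raise ValueError("reagents exceed the total reaction volume")
    return remaining


def cycling_minutes(cycles, min_16c=5, min_37c=5, final_60c=10):
    """Programmed time: `cycles` rounds of 16°C then 37°C, plus one final 60°C step."""
    return cycles * (min_16c + min_37c) + final_60c


# Illustrative usage: five gBlocks at 3 uL each leave no spare volume.
print(water_volume_ul(n_gblocks=5, gblock_ul=3.0))
print(cycling_minutes(30), cycling_minutes(60))
```

  • Under these assumptions, the 30-cycle program runs for 310 min and the 60-cycle program for 610 min, before ramp times.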
  • the retron cassette (ncRNA and RT) from pSLS.492 was cloned into pORTMAGE-Ec1 (33) upstream of the CspRecT gene (Figure S2).
  • RBS optimization of the Eco1 RT and CspRecT genes was carried out using primers that contain the optimized RBS and self-ligating the plasmids using KLD reaction mix.
  • the recombineering operon was cloned into the pKD-46 (35) backbone to obtain the parental temperature-sensitive multitron plasmid (pAGD248).
  • pSCL.390, a derivative of pZS.157 (Addgene #114454), was generated with a yeast codon-optimized P2A-Csy4 CDS gblock (IDT) cloned downstream of the SpCas9 CDS by Gibson Assembly.
  • pSCL.396 a derivative of pSCL.39 (Addgene #184973), was generated with the 5’ Hammerhead ribozyme and 3’ HDV ribozyme replaced by Csy4 recognition sites by amplification of the editron and backbone from pSCL.39 and assembled via Gibson Assembly.
  • pSCL.391 is a derivative of pSCL.39 in which a second editron, targeting the S. cerevisiae FAA1 locus, was added on the 3’ end of the ADE2-targeting editron by Gibson Assembly.
  • the cassette thus consists of two editrons, separated by a Csy4 recognition site, and flanked by a Hammerhead ribozyme and a HDV ribozyme on the 5’ and 3’ of the expression cassette, respectively.
  • pSCL.452 is a derivative of a derivative of pSCL.39, generated by Gibson Assembly of the pSCL.39 backbone, amplified to replace the recombitron with inverted PaqCI sites for Golden Gate assembly, with a gblock (IDT) encoding pSNR52p-msr-SUP4t.
  • plasmids carrying retron msd arrays for the editing of multiple loci in the yeast genome were generated by Golden Gate cloning of pre PaqCI-digested pSCL.452 with gBlocks (IDT) that encoded a PaqCI cut site, a retron msd-encoded donor and paired gRNA for editing, a Csy4 recognition sequence, and a PaqCI cut site (Fig 6E).
  • gBlocks were ordered with compatible nucleotide overhangs to enable random cloning of all combinations of gblocks into the entry plasmid, after PaqCI digestion.
  • All human vectors are derivatives of pSCL.273, itself a derivative of pCAGGS.
  • pCAGGS was modified by replacing the MCS and rb_glob_polyA sequence with an IDT gblock containing inverted BbsI restriction sites and a SpCas9 tracrRNA, using Gibson Assembly.
  • the resulting plasmid, pSCL.273, contains an SV40 ori for plasmid maintenance in HEK293T cells.
  • the strong CAG promoter is followed by the BbsI sites and SpCas9 tracrRNA.
  • BbsI-mediated digestion of pSCL.273 yields a backbone for single or library cloning of plasmids with inserts that contain [retron RT - H1 promoter - hCtRNA_n_msdRNA_n_gRNA_n], by Gibson Assembly or Golden Gate cloning (see Fig 6E for an illustration of this principle).
  • the retron RT (or its catalytically dead counterpart) and H1 promoter fragments were synthesized through IDT, as were the hCtRNA_n_msdRNA_n_gRNA_n units.
  • the E. coli strains used in this study were DH5α (New England Biolabs) for cloning purposes and bMS.346 (DE3) for retron recombineering assays. Bacteria were grown in LB medium (10 g/l tryptone, 5 g/l yeast extract, 5 g/l NaCl). Antibiotics were added as required (carbenicillin, spectinomycin, kanamycin and chloramphenicol).
  • yeast strains were created by LiAc/SS carrier DNA/PEG transformation (52) of BY4742 (26).
  • Strains for evaluating the effect of Csy4 on genome editing efficiency were created by BY4742 integration of plasmids pZS.157 (Addgene #114454) or pSCL.390.
  • the plasmids were KpnI-linearized and inserted into the genome by homologous recombination into the HIS3 locus. Transformants were isolated on SC-HIS plates.
  • the retron cassette encoded in a pET-21(+) plasmid (Novagen) and the CspRecT and mutL E32K in the plasmid pORTMAGE-Ec1 (33) were overexpressed using 1 mM IPTG, 1 mM m-toluic acid and 0.2% arabinose for 16 h with shaking at 37°C.
  • bMS.346 electrocompetent cells containing the pAC-LYC (42) plasmid were transformed with different multitron plasmid versions (Tables 3 and 4) and grown for 16 h at 30°C.
  • Single colonies from the transformation plate were inoculated into 500 uL of LB in triplicate in 1 mL deep-well plates and incubated at 30°C for 24 h with vigorous shaking to prevent the cells from settling.
  • a 1:1000 dilution of the cultures was passaged into LB with 1% arabinose and incubated at 30°C for 24 h with vigorous shaking. This last step was repeated for a total of 72 h of editing.
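  • To give a sense of scale for the 1:1000 passaging step, the Python sketch below estimates how many population doublings each passage allows. It assumes, as a simplification not stated in the protocol, that the culture regrows to its pre-dilution density before the next passage; the function names are illustrative.

```python
import math


def doublings_per_passage(dilution_factor):
    """Doublings needed to regrow to the pre-dilution density (assumed)."""
    if dilution_factor < 1:
        raise ValueError("dilution factor must be at least 1")
    return math.log2(dilution_factor)


def total_doublings(dilution_factor, passages):
    """Cumulative doublings over a number of identical passages."""
    return passages * doublings_per_passage(dilution_factor)


# A 1:1000 dilution corresponds to roughly ten doublings per passage.
print(round(doublings_per_passage(1000), 2))  # 9.97
```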
  • the parental strains (-Csy4: HIS3::pZS.157; +Csy4: HIS3::pSCL.390) were transformed with variants of the editron expression cassettes by LiAc/SS carrier DNA/PEG transformation.
  • Single colonies from the transformation plate were inoculated into 500 uL of SC-HIS-URA 2% raffinose in triplicate in 1 mL deep-well plates and incubated at 30°C for 24 h with vigorous shaking to prevent the cells from settling. Cultures were passaged into SC-HIS-URA 2% galactose and incubated at 30°C for 24 h with vigorous shaking.
  • genomic DNA was extracted by (1) resuspending the cell pellets in 120 uL of lysis buffer (100 mM EDTA pH 8, 50 mM Tris-HCl pH 8, 2% SDS) and heating them to 95°C for 15 min; (2) cooling the lysate on ice and adding 60 uL of protein precipitation buffer (7.5 M ammonium acetate), then inverting gently and placing samples at -20°C for 10 min; (3) centrifuging the samples at maximum speed for 2 min (or until a clear supernatant forms) and collecting the supernatant (~100 uL) in new 1.5 mL tubes; (4) precipitating the nucleic acids by adding equal parts of ice-cold isopropanol to the samples, mixing the samples thoroughly and incubating the mix at -20°C for 10 min (or overnight for higher yield), followed by pelleting by centrifugation at maximum speed for 2 min;
  • 0.5 uL of gDNA was used as template in 20-µl PCR reactions with primers flanking the edit site of the target locus, which additionally contained adapters for Illumina sequencing preparation (Table 6). The primers do not bind to the retron msd donor sequence. These amplicons were indexed and sequenced on an Illumina MiSeq instrument and processed with custom Python software to quantify the percentage of precise edits using the retron-derived RT-DNA template.
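  • The custom Python software is not reproduced in this document. As a minimal sketch of how a precise-edit percentage could be computed from amplicon reads, the example below classifies each read by exact match to a short wild-type or edited sequence spanning the edit site. The function names, the exact-match rule, and the choice of all reads as the denominator are assumptions made for illustration and may differ from the authors' pipeline.

```python
from collections import Counter


def classify_read(read, wt_site, edited_site):
    """Label a read by which site sequence it contains (exact match only)."""
    if edited_site in read:
        return "edited"
    if wt_site in read:
        return "wild_type"
    return "other"


def percent_precise_edits(reads, wt_site, edited_site):
    """Percentage of all reads carrying the exact edited site sequence."""
    counts = Counter(classify_read(r, wt_site, edited_site) for r in reads)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * counts["edited"] / total


# Toy reads (hypothetical sequences): one wild-type, two edited, one unmatched.
reads = ["AAACCTGG", "AAACGTGG", "AAACGTGG", "TTTT"]
print(percent_precise_edits(reads, wt_site="ACCT", edited_site="ACGT"))  # 50.0
```

  • Because the primers flank the edit site and do not bind the msd donor sequence, reads in such an analysis derive from the genomic locus rather than from the plasmid-encoded donor.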
  • HEK293T cells expressing SpCas9 from a piggyBac-integrated, TRE3G-driven, doxycycline-inducible (1 µg/ml) cassette (18) were seeded at 7 × 10^5 live cells/well in coated 6-well plates and grown in DMEM + GlutaMax supplement (Thermo Fisher #10566016) overnight.
  • Lipofectamine 3000 transfection mixes were prepared in independent triplicates and cells were transfected with 5 µg of plasmid per well (3 wells per plasmid). Cells were passaged the next day and doxycycline was refreshed at passaging. Cells were grown for an additional 48 h, for a total of 72 h of editing.
  • gDNA was extracted using a QIAamp DNA mini kit according to the manufacturer’s instructions. DNA was eluted in 150 µl of ultra-pure, nuclease-free water. 0.5 uL of gDNA was used as template in 20-µl PCR reactions with primers flanking the edit site of the target locus, which additionally contained adapters for Illumina sequencing preparation (Table 6). The primers do not bind to the retron msd donor sequence. These amplicons were indexed and sequenced on an Illumina MiSeq instrument and processed with custom Python software to quantify the percentage of precise edits using the retron-derived RT-DNA template.
  • gDNA was tagmented with Tn5 transposase using the following reaction (50 uL): 25 uL 2x TD Buffer (20 mM Tris-HCl pH 7.6, 10 mM MgCl2 and 20% dimethyl formamide), 2.5 uL Tn5 (in-house prepared) and 50 ng gDNA.
  • the reaction was incubated for 1 h 30 min at 37°C.
  • the gDNA was cleaned-up and eluted in 15uL using the DNA Clean & Concentrator (Zymo Research).
  • Tagmented gDNAs were indexed and sequenced on an Illumina MiSeq instrument.
  • E. coli strain bMS.346 whole genome variants were called against the E. coli K-12 substr. MG1655 genome (accession no. NC_000913) using Geneious Prime® 2023.2.1 software alignment tools.
  • Variants appearing in the genome of the wild-type and dead RT isolates were called against the bMS.346 parental strain.
  • For lycopene extraction, 1 ml of cells was centrifuged at 16,000g for 30 s, the supernatant was removed, and the cell pellet was resuspended in 1 mL water. Cells were re-centrifuged at 16,000g for 30 s, the supernatant was removed, and the cells were resuspended in 200 µl acetone and incubated in the dark for 15 min at 55°C with intermittent vortexing. The mixture was centrifuged at 16,000g for 1 min and the supernatant containing the lycopene was transferred to a 96-well white/clear-bottom plate.
  • Lycopene yield of the different colonies was calculated as the fold change in lycopene production relative to the control. Cells from different clusters of lycopene production were re-streaked on LB-chloramphenicol agar plates and grown for 24 h at 30°C and for another 48 h at room temperature. Between 3 and 8 colonies from each re-streaking were selected to quantify lycopene production following the described protocol and for Sanger sequencing across the dxs/idi targets.
  • retrons in bacterial recombineering was originally developed for applications in molecular recording (25) and has more recently been optimized to install single targeted edits and interrogate biology (26-29).
  • a retron ncRNA - which can be divided into two regions: an msr (multicopy single-stranded RNA) that is not reverse transcribed and an msd (multicopy single-stranded DNA) that is reverse transcribed - is modified to encode an editing donor within the msd region.
  • This modified ncRNA is expressed in cells along with a retron reverse transcriptase (e.g., retron Eco1-RT) that reverse transcribes the retron msd to produce an editing donor (RT-Donor).
  • SSAP single-stranded annealing protein
  • SSB host single-stranded binding protein
  • retron RTs are capable of reverse transcribing much longer RT-Donors, even up to an entire gene length (26).
  • ncRNA donor unit 229 bp
  • arrayed design adds 109 bp of direct repeat for each additional editor due to msr duplication, both of which pose challenges for the synthesis and assembly of new multitron plasmids.
  • each msd encodes a distinct donor as in the previous version, but the msr is expressed in trans as a separate transcript (Fig 2B).
  • This trans msr arrangement was previously shown to be a tolerated modification for reverse transcription of endogenous retron msds (34). In practice, this reduces the editing unit to 149 bp and reduces the length of the longest direct repeat to 74 bases.
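  • The size benefit of the trans msr arrangement can be illustrated with simple arithmetic. The Python sketch below uses the unit sizes quoted above (229 bp for the ncRNA donor unit, 149 bp for the editing unit with the msr in trans) and assumes, for illustration only, that an array is n units placed end to end; promoters, terminators, spacers and the separately expressed msr transcript are not counted, and the function names are hypothetical.

```python
CIS_UNIT_BP = 229    # ncRNA donor unit size quoted in the text
TRANS_UNIT_BP = 149  # editing unit size with the msr expressed in trans


def array_length_bp(n_donors, unit_bp):
    """Length of n editing units placed end to end (simplifying assumption)."""
    if n_donors < 1:
        raise ValueError("an array needs at least one donor")
    return n_donors * unit_bp


def bp_saved_by_trans_msr(n_donors):
    """Difference in array length between the two architectures."""
    return (array_length_bp(n_donors, CIS_UNIT_BP)
            - array_length_bp(n_donors, TRANS_UNIT_BP))


for n in (1, 3, 5):
    print(n, array_length_bp(n, CIS_UNIT_BP),
          array_length_bp(n, TRANS_UNIT_BP), bp_saved_by_trans_msr(n))
```

  • Under this simplification a five-donor array shrinks from 1145 bp to 745 bp, in addition to the shorter direct repeats noted above.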
  • the trans msr can interact with any of the arrayed msds, again keeping the donor at a constant distance from the site of RT priming (Fig 2C).
  • retron-derived donors support a broad range of precise mutations, including insertions, deletions and replacements.
  • With retron RT-Donor or oligonucleotide donors, the efficiency of inserting and deleting base pairs is inversely related to the size of the edit (3,31). This is presumably intrinsic to the mechanism of recombineering, a result that we replicated here using RT-Donors to delete 1 to 100 bp, finding a declining efficiency with deletion size whether using an endogenous ncRNA architecture or the trans msr architecture (Fig 3A).
  • a nested deletion series consists of multiple donors intended to make deletions of progressively increasing size at the same locus. If the smallest deletion succeeds, it creates a smaller target size for a previously disfavored large deletion.
  • the 50 bp deletion was not significantly less efficient than the 25 bp deletion using singleplex retron donors so, unsurprisingly, the rate of 50 bp deletions by the multitron version was not significantly increased. However, the rate of the 25 bp deletion was decreased by the multitron, suggesting that 25 bp deletions were being converted into 50 bp deletions.
  • To test multitrons in the context of metabolic engineering, we chose to focus on increasing production of lycopene by modifying genes in its biosynthetic pathway (Fig 5B).
  • the other three genes gmpA, gdhA, fdhF were specifically targeted for inactivation by the introduction of premature stop codons within their open reading frames.
  • the multitron plasmid was generated using a one-pot golden gate approach (41) to clone arrayed msds encoding different donors.
  • the MP was next transformed into the bacterial host harboring the lycopene plasmid (LP), a plasmid containing three essential genes (crtE, crtI, crtB) required for lycopene production (42). Editing cycles were carried out at the permissive temperature (30°C), with dilutions of the culture after every cycle. Editing targets were sequenced in bulk using Illumina MiSeq to determine overall efficiencies. In parallel, cells were plated at 37°C to cure the MP.
  • Retron RT-Donors have been used in S. cerevisiae in combination with CRISPR Cas9 and gRNAs to install precise mutations via templated repair of a cut site (Fig 6A).
  • the architecture of the donor element in yeast is typically a retron ncRNA fused to a CRISPR gRNA and scaffold, all surrounded by ribozymes to excise the editing elements from an mRNA.
  • the relatively large, structured ribozymes present a potential engineering hurdle if they need to be multiply duplicated.
  • base editing which can also be multiplexed.
  • Base Editors can reach efficiencies of over 80% (13) and can be multiplexed to more than 30 loci (14).
  • MAGE is a more relevant comparison to the bacterial work and can achieve similar efficiencies, although with dramatically more hands-on time to complete the multiple electroporation cycles.
  • MAGE cannot be used to make the edits that we show in yeast or human cells (new to the revision) so as a technology, we would argue that the multitrons are a more universal platform.
  • the existing yeast genome editing toolbox is vast and spans from simple HR-based editing to more nuanced, multiplexed approaches that have enabled both trackable, genomewide phenotypic screens and targeted, saturation mutagenesis of individual ORFs (44-50).
  • “trackable and multiplex” in this context has usually meant many changes across many genomes, with ≤1 change per genome, rather than >1 changes on an individual genome; and tools that do enable multiple changes per single genome typically do not support trackability of precise and varied edits or require involved and time-consuming workflows. In this sense, we believe that multitrons, in their ability to support multiple trackable and precise edits per individual genome, will naturally fit into the toolbox of yeast biologists in years to come.

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Biotechnology (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Plant Pathology (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Mycology (AREA)
  • Medicinal Chemistry (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Cell Biology (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)

Abstract

Provided herein are compositions and methods for modification of the retron editing system to enable multiple simultaneous edits in prokaryotic and/or eukaryotic cells. Such compositions and methods find use in, for example, therapeutic editing and cell engineering, such as for bioproduction.

Description

MULTIPLEXED RETRON GENOME EDITING IN PROKARYOTIC AND EUKARYOTIC GENOMES
PRIORITY
This application claims the benefit of priority of the filing date of U.S. provisional application No. 63/524,317, filed on June 30, 2023, the disclosure of which is incorporated by reference herein in its entirety.
BACKGROUND
Retrons are reverse-transcribed elements found in nearly all myxobacteria (Dhundale et al. Journal of Bacteriology 164, 914-917 (1985)) and sparsely in E. coli (Lampson et al. Science 243, 1033-1038 (1989)), V. cholerae (Inouye et al. Microbiology and Immunology 55, 510-513), and other bacteria. The retron operon encodes an RNA primer (multicopy single-stranded RNA, msr), an RNA sequence to be reverse-transcribed (multicopy single-stranded DNA, msd), and a reverse transcriptase, in that order. The retron transcript folds upon itself and is partially reverse transcribed to generate a single-stranded DNA (ssDNA) of about 80 bases. Although the retron-derived DNA is single stranded, it contains a hairpin of double-stranded DNA. Multiple retron ssDNAs can also complement each other to form larger double-stranded elements. Retron variants have different DNA lengths and base content, but broadly share this overall format.
The ssDNA generated by the retron has been used for genome engineering, both in bacteria, with the λ Red Beta recombinase for recombineering (Farzadfard et al. Science 346, 1256272, (2014)), and in eukaryotes, as a homology-directed repair (HDR) template for Cas9 editing in yeast (Sharon et al. Cell 175, 544-557. e516, (2018)).
SUMMARY
Our understanding of genomics is limited by the scale of our genomic technologies. While libraries of genomic manipulations scaffolded on CRISPR gRNAs have been transformative, these existing approaches are typically multiplexed across genomes. Yet much of the complexity of real genomes is encoded within a genome across sites. Unfortunately, building cells with multiple, non-adjacent precise mutations remains a laborious cycle of editing, isolating an edited cell, and editing again. Herein is described a technology for precisely modifying multiple sites on a single genome simultaneously. This technology, termed a multitron, is built from a heavily modified retron, in which multiple donor-encoding msds are produced from a single transcript. The multitron architecture is compatible with both recombineering in prokaryotic cells and CRISPR editing in eukaryotic cells. Herein, applications of this approach are demonstrated in molecular recording, genetic element minimization, and metabolic engineering.
Provided herein are compositions and methods for modification of the retron editing system to enable multiple simultaneous edits in prokaryotic and/or eukaryotic cells. Such compositions and methods find use in, for example, therapeutic editing and cell engineering, such as for bioproduction.
One embodiment provides a multiplex engineered retron comprising: a) at least one msr gene encoding multicopy single-stranded RNA (msRNA); b) at least one msd gene encoding multicopy single-stranded DNA (msDNA); c) two or more heterologous sequences of interest; and d) a ret gene encoding a reverse transcriptase, including two to five or more different heterologous sequences of interest. The retron can comprise at least two msr genes and at least two msd genes or the retron can comprise one msr gene and at least two msd genes. The heterologous sequences can be inserted into the msr gene and/or the msd gene, including where each of the at least two msd genes independently comprises at least one heterologous sequence. The retron can comprise single-stranded DNA (msDNA) encoded by the msd gene which comprises a msd stem loop, and where the loop comprises the heterologous sequence(s) of interest. In some embodiments, each heterologous sequence independently encodes a donor polynucleotide comprising a 5' homology arm that hybridizes to a 5' target sequence and a 3' homology arm that hybridizes to a 3' target sequence flanking a donor nucleotide sequence comprising an intended edit to be integrated at a target locus by homology directed repair (HDR) or recombineering. The edit can be one or more gene replacements, gene knockouts, deletions, nested deletions, insertions, inversions, or point mutations. One embodiment provides for a retron further comprising a modification which results in enhanced production of msDNA formed from the retron, as compared to a retron without any of the modifications described herein. In one embodiment, the heterologous sequence comprises a CRISPR protospacer DNA sequence, including where the CRISPR protospacer DNA sequence comprises a modified AAG protospacer adjacent motif (PAM). In one embodiment, the retron further comprises a barcode sequence, including wherein the barcode sequence is located in a hairpin loop of the msDNA.
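By way of illustration only, the donor polynucleotide layout described above (a 5' homology arm, the intended edit, and a 3' homology arm) can be sketched in Python. The function names, the 0-based coordinate convention, and the toy reference sequence are assumptions made for this sketch and do not appear in the disclosure; strand choice and the surrounding msd scaffold are deliberately not modeled.

```python
VALID_BASES = set("ACGT")


def build_donor(arm_5p: str, edit: str, arm_3p: str) -> str:
    """Join a 5' homology arm, the intended edit, and a 3' homology arm.

    An empty `edit` yields a deletion donor (the arms are simply abutted).
    """
    parts = [arm_5p.upper(), edit.upper(), arm_3p.upper()]
    for part in parts:
        if set(part) - VALID_BASES:
            raise ValueError(f"non-ACGT character in {part!r}")
    if not parts[0] or not parts[2]:
        raise ValueError("both homology arms must be non-empty")
    return "".join(parts)


def deletion_donor(reference: str, start: int, length: int, arm_len: int) -> str:
    """Donor that removes reference[start:start + length] (0-based, hypothetical convention)."""
    end = start + length
    if length <= 0 or arm_len <= 0:
        raise ValueError("length and arm_len must be positive")
    if start - arm_len < 0 or end + arm_len > len(reference):
        raise ValueError("homology arms run off the reference")
    return build_donor(reference[start - arm_len:start], "", reference[end:end + arm_len])


# Toy reference, purely illustrative (not a real locus).
toy_reference = "AAAACCCCGGGGTTTT"
toy_deletion = deletion_donor(toy_reference, start=6, length=4, arm_len=4)
toy_point_edit = build_donor("AAAA", "T", "CCCC")
```

In this sketch a deletion donor is simply the two flanking arms joined with no intervening sequence, while insertions and replacements place new sequence between the arms.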
In one embodiment, the msr gene and the msd genes are provided in a trans arrangement or a cis arrangement. In another embodiment, the ret gene is provided in a trans arrangement with respect to the msr gene and/or the msd gene. In one embodiment, the msr gene, msd gene, and ret gene are a modified bacterial retron msr gene, msd gene, and ret gene, such as wherein the msr gene, msd gene, and ret gene are independently a modified myxobacteria retron, a modified Escherichia coli retron, a modified Salmonella enterica retron, or a modified Vibrio cholerae retron. In one embodiment, the modified Escherichia coli retron is a modified EC83 or a modified EC86.
One embodiment provides a vector system comprising one or more vectors comprising the engineered retron described herein. In one embodiment, the msr gene and the msd gene are provided by the same vector or different vectors. In another embodiment, the msr gene, the msd gene, and the ret gene are provided by the same vector. In one embodiment, the same vector comprises a promoter operably linked to the msr gene and the msd gene, including where the promoter is operably linked to the ret gene. One embodiment provides for a second promoter operably linked to the ret gene. In one embodiment, the msr gene, the msd gene, and the ret gene are provided by different vectors. In one embodiment, the one or more vectors are viral vectors or nonviral vectors, such as plasmids. In one embodiment, the engineered retron comprises two or more heterologous sequences, wherein each heterologous sequence independently encodes a donor polynucleotide comprising a 5' homology arm that hybridizes to a 5' target sequence and a 3' homology arm that hybridizes to a 3' target sequence flanking a nucleotide sequence comprising an intended edit to be integrated at a target locus by homology directed repair (HDR) or recombineering. One embodiment further comprises a vector encoding an RNA-guided nuclease, including wherein the RNA-guided nuclease is a Cas nuclease, such as Cas9 or Cpf1, or an engineered RNA-guided FokI-nuclease. In one embodiment, the engineered retron comprises a CRISPR protospacer DNA sequence. Another embodiment further comprises a vector encoding a Cas1 and/or Cas2 protein and/or a vector comprising a CRISPR array sequence and/or a vector encoding bacteriophage homologous recombination proteins, such as where the vector encoding the bacteriophage homologous recombination proteins is a replication-defective λ prophage comprising the exo, bet, and gam genes.
One embodiment provides an isolated host cell comprising the engineered retron described herein or the vector system described herein. In one embodiment, the host cell is a prokaryotic, archaeal, or eukaryotic host cell. In one embodiment, the eukaryotic host cell is a mammalian host cell, such as a human host cell. In one embodiment, the eukaryotic host cell is a non-human host cell. In one embodiment, the host cell endogenously expresses or has been modified to express one or more single-strand annealing proteins (SSAPs), one or more single-stranded DNA binding proteins (SSBs), one or more mutant mismatch repair proteins, or a combination thereof.
One embodiment provides a kit comprising the engineered retron described herein, the vector system described herein, or the host cell described herein. In one embodiment, the kit further comprises instructions for genetically modifying a cell with the engineered retron.
One embodiment provides a multiplex method of genetically modifying a cell comprising: a) transfecting a cell with the engineered retron described herein; and b) introducing an RNA-guided nuclease and guide RNA into the cell, wherein the RNA-guided nuclease forms a complex with the guide RNA, said guide RNA directing the complex to the genomic target locus, wherein the RNA-guided nuclease creates a double-stranded break in the genomic DNA at the genomic target locus, and the donor polynucleotide generated by the engineered retron is integrated at the genomic target locus recognized by its 5' homology arm and 3' homology arm by homology directed repair (HDR) to produce a genetically modified cell. In one embodiment, the RNA-guided nuclease is a Cas nuclease, such as Cas9 or Cpf1, or an engineered RNA-guided FokI-nuclease. In one embodiment, the RNA-guided nuclease is provided by a vector or a recombinant polynucleotide integrated into the genome of the cell. In one embodiment, the engineered retron is provided by a vector. In one embodiment, the donor polynucleotide is used to create two or more independent gene replacements, gene knockouts, deletions, nested deletions, insertions, inversions, or point mutations.
One embodiment provides a multiplex method of genetically modifying a cell by recombineering, the method comprising: a) transfecting the cell with the engineered retron described herein; and b) introducing bacteriophage recombination proteins into the cell, wherein the bacteriophage recombination proteins mediate homologous recombination at a target locus such that the donor polynucleotide generated by the engineered retron is integrated at the target locus recognized by its 5' homology arm and 3' homology arm to produce a genetically modified cell. In one embodiment, the donor polynucleotide is used to modify a plasmid, bacterial artificial chromosome (BAC), or a bacterial chromosome in the bacterial cell by recombineering. In one embodiment, each donor polynucleotide can create a gene replacement, gene knockout, deletion, nested deletion, insertion, inversion, or point mutation. In one embodiment, said introducing bacteriophage recombination proteins into the cell comprises insertion of a replication-defective λ prophage into the bacterial genome, wherein the bacteriophage comprises exo, bet, and gam genes. One embodiment provides a method of barcoding a cell comprising transfecting a cell with the engineered retron described herein.
One embodiment provides a multiplex method of producing an in vivo molecular recording system comprising: a) introducing a Cas1 protein or a Cas2 protein of a CRISPR adaptation system into a host cell; b) introducing a CRISPR array nucleic acid sequence comprising a leader sequence and at least one repeat sequence into the host cell, wherein the CRISPR array nucleic acid sequence is integrated into genomic DNA or a vector in the host cell; and c) introducing a plurality of engineered retrons described herein into the host cell, wherein each retron comprises a different protospacer DNA sequence that can be processed and inserted into the CRISPR array nucleic acid sequence. In one embodiment, the Cas1 protein or the Cas2 protein are provided by a vector. In one embodiment, the engineered retron is provided by a vector. In one embodiment, the plurality of engineered retrons comprises at least three different protospacer DNA sequences.
One embodiment provides an engineered cell comprising an in vivo molecular recording system comprising: a) a Cas1 protein or a Cas2 protein of a CRISPR adaptation system; b) a CRISPR array nucleic acid sequence comprising a leader sequence and at least one repeat sequence, wherein the CRISPR array nucleic acid sequence is integrated into genomic DNA or a vector in the engineered cell; and c) a plurality of engineered retrons described herein, wherein each retron comprises a different protospacer DNA sequence that can be processed and inserted into the CRISPR array nucleic acid sequence. In one embodiment, the Cas1 protein or the Cas2 protein are provided by a vector. In another embodiment, the engineered retron is provided by a vector. In one embodiment, the plurality of engineered retrons comprises at least three different protospacer DNA sequences.
One embodiment provides a kit comprising the engineered cell described herein and instructions for in vivo molecular recording.
One embodiment provides a multiplex method of producing recombinant msDNA comprising: a) transfecting a host cell with the engineered retron described herein or the vector system described herein; and b) culturing the host cell under suitable conditions, wherein the msDNA is produced.
Another embodiment provides an engineered retron ncRNA comprising: a) at least one msr gene encoding multicopy single-stranded RNA (msRNA); b) at least one msd gene encoding multicopy single-stranded DNA (msDNA); c) at least one guide RNA; and d) two or more repair templates. In one embodiment, the retron comprises two to five different repair templates. In one embodiment, the retron comprises at least two msr genes and at least two msd genes. In one embodiment, the retron comprises one msr gene and at least two msd genes. In one embodiment, the repair templates are inserted into the msr gene and/or the msd gene. In one embodiment, each of the at least two msd genes comprises at least one repair template. In one embodiment, the at least one guide RNA is fused to the end of each msd gene and each of the msd genes can be separated by a Csy4 site. In one embodiment, single-stranded DNA (msDNA) encoded by the msd gene comprises a msd stem loop, and where the loop comprises the repair template.
One embodiment provides a multiplex engineered retron described herein, wherein the retron comprises: one msr gene; two to five msd genes; at least one guide RNA fused to the end of each msd gene; at least one repair template in each msd gene; and at least one Csy4 site. In one embodiment, the msd genes are separated by the Csy4 site. In one embodiment, the guide RNA binds to a target genomic DNA. In one embodiment, the guide RNA binds to a target genomic DNA in a bacterial, yeast, or mammalian cell, such as a human cell. In one embodiment, the guide RNA binds to a target genomic DNA in a non-human cell. In one embodiment, the repair template binds to a target genomic DNA, including wherein the repair template binds to a target genomic DNA in a bacterial, yeast, or mammalian cell. In one embodiment, the repair template binds to a target genomic DNA having at least one allele with a mutation or polymorphism. In one embodiment, the repair template comprises one or more non-complementary nucleotides to the repair template's target genomic DNA. In one embodiment, the repair template comprises two or more, or three or more non-complementary nucleotides to the repair template's target genomic DNA. In one embodiment, the non-complementary nucleotides are ‘repair’ nucleotides that can substitute for mutant, variant, or polymorphism nucleotides in the target genomic DNA.
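As a purely schematic aid, the arrangement recited above (one msr gene, two to five msd genes each carrying a repair template and followed by a guide RNA, with Csy4 sites between msd genes) can be written out as a symbolic layout in Python. The class name, the bracketed labels, and the "<Csy4>" token are placeholders invented for this sketch; no real sequences are implied and the msr, being a single separate element, is not included in the array string.

```python
from dataclasses import dataclass
from typing import Sequence

CSY4_TOKEN = "<Csy4>"  # placeholder label, not a real Csy4 recognition sequence


@dataclass(frozen=True)
class EditingUnit:
    """One msd carrying a repair template, with a guide RNA fused to its end."""
    repair_template: str  # symbolic label for the repair template
    guide_rna: str        # symbolic label for the guide RNA


def layout_msd_array(units: Sequence[EditingUnit],
                     min_units: int = 2, max_units: int = 5) -> str:
    """Return a symbolic string for an msd array with Csy4 sites between units."""
    if not min_units <= len(units) <= max_units:
        raise ValueError(f"expected {min_units} to {max_units} msd units, got {len(units)}")
    rendered = [f"msd[{u.repair_template}]-gRNA[{u.guide_rna}]" for u in units]
    return CSY4_TOKEN.join(rendered)


# Hypothetical two-unit example; the labels carry no biological meaning.
example_array = layout_msd_array([
    EditingUnit(repair_template="template_1", guide_rna="guide_1"),
    EditingUnit(repair_template="template_2", guide_rna="guide_2"),
])
```

The default bounds of two and five units mirror the "two to five msd genes" wording above; they are parameters of the sketch and can be relaxed for embodiments with other array sizes.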
One embodiment provides a composition comprising a carrier and an engineered retron described herein.
One embodiment provides a multiplex method comprising administering an engineered retron described herein, or a composition described herein to a subject or to cell(s) from the subject. In one embodiment, the subject has, or is suspected of having or developing a disease or condition, such as cystic fibrosis, thalassemia, sickle cell anemia, Huntington's disease, diabetes, Duchenne's Muscular Dystrophy, Tay-Sachs Disease, Marfan syndrome, Alzheimer’s disease, Leber's hereditary optic atrophy (LHON), myoclonic epilepsy with ragged red fibers (MERRF), mitochondrial encephalopathy, lactic acidosis, and stroke-like episodes (MELAS; a type of dementia), obesity, cancers, brain ischemia, coronary disease, myocardial infarction, reperfusion hindrance of ischemic diseases, atopic dermatitis, psoriasis vulgaris, contact dermatitis, keloid, decubital ulcer, ulcerative colitis, Crohn's disease, nephropathy, glomerulosclerosis, albuminuria, nephritis, renal failure, rheumatoid arthritis, osteoarthritis, asthma, chronic obstructive pulmonary disease (COPD), and combinations thereof.
One embodiment provides an expression cassette comprising a nucleotide sequence encoding an engineered retron described herein, and optionally a nucleotide sequence encoding a retron reverse transcriptase. In one embodiment, the nucleotide sequence encoding the engineered retron further comprises at least one promoter, such as an RNA polymerase III (pol III) promoter. In one embodiment, the pol III promoter is a constitutive promoter, such as SNR52, 7SK, U6, or H1. In one embodiment, the msr gene is expressed from the pol III promoter. In one embodiment, the at least one promoter is an RNA polymerase II (pol II) promoter, such as an inducible promoter. In one embodiment, the msd gene is expressed from the pol II promoter.
One embodiment provides a vector comprising an expression cassette described herein.
One embodiment provides a composition comprising a carrier and the expression cassette described herein or a vector described herein.
One embodiment provides a multiplex method comprising administering an expression cassette described herein or a vector described herein, or a composition described herein to a subject or to cell(s) from the subject. In one embodiment, the subject has, or is suspected of having or developing a disease or condition, such as cystic fibrosis, thalassemia, sickle cell anemia, Huntington's disease, diabetes, Duchenne's Muscular Dystrophy, Tay-Sachs Disease, Marfan syndrome, Alzheimer’s disease, Leber's hereditary optic atrophy (LHON), myoclonic epilepsy with ragged red fibers (MERRF), mitochondrial encephalopathy, lactic acidosis, and stroke-like episodes (MELAS; a type of dementia), obesity, cancers, brain ischemia, coronary disease, myocardial infarction, reperfusion hindrance of ischemic diseases, atopic dermatitis, psoriasis vulgaris, contact dermatitis, keloid, decubital ulcer, ulcerative colitis, Crohn's disease, nephropathy, glomerulosclerosis, albuminuria, nephritis, renal failure, rheumatoid arthritis, osteoarthritis, asthma, chronic obstructive pulmonary disease (COPD), and combinations thereof.
One embodiment provides a multiplex gene editing system comprising: one or more vectors comprising one or more nucleotide sequences encoding an engineered retron described herein, a retron reverse transcriptase, and a Cas nuclease. In one embodiment, the retron reverse transcriptase and Cas nuclease are encoded as a fusion protein. In one embodiment, the one or more vectors comprise one or more promoters. In one embodiment, the guide RNA of the retron binds to a target genomic DNA. In one embodiment, the guide RNA of the retron binds to a target genomic DNA in a bacterial, yeast, or mammalian cell. In one embodiment, the guide RNA of the retron binds to a target genomic DNA in a mammalian cell, such as a human cell. In one embodiment, the guide RNA of the retron binds to a target genomic DNA in a nonhuman cell. In one embodiment, the repair template of the retron binds to a target genomic DNA. In one embodiment, the repair template of the retron binds to a target genomic DNA in a bacterial, yeast, or mammalian cell. In another embodiment, the repair template of the retron binds to a target genomic DNA having at least one allele with a mutation or polymorphism. In one embodiment, the repair template of the retron comprises one or more non-complementary nucleotides to the repair template's target genomic DNA. In one embodiment, the repair template of the retron comprises two or more, or three or more non-complementary nucleotides to the repair template's target genomic DNA. In one embodiment, the non-complementary nucleotides are ‘repair’ nucleotides that can substitute for mutant, variant, or polymorphism nucleotides in the target genomic DNA. In one embodiment, the one or more promoters is an RNA polymerase III (pol III) promoter, such as a constitutive promoter, including SNR52, 7SK, U6, or H1. In one embodiment, the msr gene is expressed from the pol III promoter. In one embodiment, the one or more promoters is an RNA polymerase II (pol II) promoter.
In one embodiment, the pol II promoter is an inducible promoter. In one embodiment, the msd gene is expressed from the pol II promoter. In one embodiment, a first vector encodes the retron and a second vector encodes the retron reverse transcriptase and Cas nuclease, including a Cas9 or Cpf1. In one embodiment, the Cas nuclease is SpCas9.
One embodiment provides a composition comprising a carrier and a gene editing system described herein.
One embodiment provides a multiplex method comprising administering a gene editing system described herein, or a composition described herein to a subject or to cell(s) from the subject. In one embodiment, the subject has, or is suspected of having or developing a disease or condition, such as cystic fibrosis, thalassemia, sickle cell anemia, Huntington's disease, diabetes, Duchenne's Muscular Dystrophy, Tay-Sachs Disease, Marfan syndrome, Alzheimer’s disease, Leber's hereditary optic atrophy (LHON), myoclonic epilepsy with ragged red fibers (MERRF), mitochondrial encephalopathy, lactic acidosis, and stroke-like episodes (MELAS; a type of dementia), obesity, cancers, brain ischemia, coronary disease, myocardial infarction, reperfusion hindrance of ischemic diseases, atopic dermatitis, psoriasis vulgaris, contact dermatitis, keloid, decubital ulcer, ulcerative colitis, Crohn's disease, nephropathy, glomerulosclerosis, albuminuria, nephritis, renal failure, rheumatoid arthritis, osteoarthritis, asthma, chronic obstructive pulmonary disease (COPD), and combinations thereof.
One embodiment provides a multiplex method of genetically editing one or more target sites in one or more cells, comprising: a) transfecting a population of cells with an expression cassette described herein, or a gene editing system described herein to generate a population of transfected cells; and b) selecting one or more cells from the population of transfected cells as genetically edited cells. In one embodiment, selecting one or more cells comprises generating colonies from individual transfected cells to provide isogenic individual colonies and selecting one or more precisely edited cells from at least one isogenic colony. One embodiment further comprises sequencing one or more genomic target sites in cells from one or more isogenic individual colonies to confirm that the genomic target sites in at least one of the isogenic individual colonies are precisely edited, thereby generating precisely edited cells. Another embodiment further comprises administering a population of the precisely edited cells to a subject. In one embodiment, the subject has, or is suspected of having or developing a disease or condition, such as cystic fibrosis, thalassemia, sickle cell anemia, Huntington's disease, diabetes, Duchenne's Muscular Dystrophy, Tay-Sachs Disease, Marfan syndrome, Alzheimer’s disease, Leber's hereditary optic atrophy (LHON), myoclonic epilepsy with ragged red fibers (MERRF), mitochondrial encephalopathy, lactic acidosis, and stroke-like episodes (MELAS; a type of dementia), obesity, cancers, brain ischemia, coronary disease, myocardial infarction, reperfusion hindrance of ischemic diseases, atopic dermatitis, psoriasis vulgaris, contact dermatitis, keloid, decubital ulcer, ulcerative colitis, Crohn's disease, nephropathy, glomerulosclerosis, albuminuria, nephritis, renal failure, rheumatoid arthritis, osteoarthritis, asthma, chronic obstructive pulmonary disease (COPD), and combinations thereof.
DESCRIPTION OF THE DRAWINGS
FIGS. 1A-1G. Encoding several donors in a retron msd enables multiplexed retron recombineering in bacteria. A. Top: schematic of the retron recombineering operon with two donors encoded within the msd region (light blue). Donor labels 1 (purple) and 2 (green) indicate the order in which the donor is reverse transcribed by the retron RT (grey). Bottom: schematic of the retron recombineering process, which occurs during replication in the lagging strand with the participation of CspRecT protein. Red stars represent the desired mutations to be integrated in the bacterial chromosome. B. Quantification of precise editing rates of the rpoB locus alone and both rpoB and gyrA loci in bacteria. The order in which the donors are reverse transcribed (1 or 2) is indicated for each condition. For b, c, d, e, and f, data were quantified by Illumina sequencing after 24h of editing, circles show each of the three biological replicates, bars are mean ±SD (one-way ANOVA, effect of condition on rpoB editing P=0.2616). C. Comparison of donor order for retron-encoded donors versus donors encoded on an oligonucleotide matching the retron RT-DNA. Editing is shown as percent of maximum precise editing for each condition, illustrating that the retron is influenced by position effects that are not found when using an oligonucleotide donor (one-way ANOVA effect of conditions P<0.001; Tukey’s corrected effect of retron order P<0.0001, oligo order P=0.9842). D. Top: schematic of the retron recombineering cassette with 3 donors encoded in the msd region. The numbers above the donors indicate the order of reverse transcription. Bottom: quantification of precise editing rates of bacterial rpoB, gyrA, and lacZ loci. Right: schematic of the bacterial chromosome indicating donor position and strand with respect to the origin of replication (lagging strand for rpoB and gyrA donors and leading strand for the lacZ donor). E. 
Replot of the data in d, illustrating the effect of position on editing at each site (two-way ANOVA effect of position P<0.0001). F. Quantification of precise editing rates for rpoB, gyrA and priB loci in bacteria, in the same architecture shown in d. Right: schematic of bacterial chromosome indicating donor position and strand with respect to the origin of replication. In this case, all donors are in the lagging strand (one-way ANOVA, effect of editing site P=0.0015). G. Use of multiplexed retron recombineering to improve analog molecular recording technologies. Left: increasing amounts of m-toluic acid (mTol) are recorded using a retron-derived analog recorder. Right: quantification of precise editing rates for rpoB, gyrA and priB loci using different amounts of mTol. Error bars indicate the standard deviation for three independent biological replicates.
FIGS. 2A-2F. Improved multiplexed editing using donors in arrayed retron msds. A. Top: schematic of retron recombineering using 2 independent ncRNAs. Each msd region (blue) encodes a different donor (1 and 2). Bottom: quantification of precise editing rates for precise editing of gyrA or rpoB alone or simultaneously (unpaired t-test, singleplex versus multiplex, rpoB P<0.0001, gyrA P=0.0006). B. Top: Schematic of retron recombineering using an msd array with a single msr sequence in trans. Bottom: quantification of precise editing rates for precise editing of rpoB or gyrA alone or simultaneously (unpaired t-test, singleplex versus multiplex, rpoB P=0.2312, gyrA P=0.4102). C. Top: schematic of arrayed msd and msr transcription products. Arrayed msd is transcribed as a single transcript. Bottom: schematic of RT-DNA production using an arrayed msd as template. 1 and 2 indicate the number of the msd in the arrayed msd. D. Top: schematic of 3x arrayed msd. Bottom: quantification of precise editing of rpoB, gyrA or priB edits alone or simultaneously. E. Replot of the data in d, illustrating the effect of position on editing at each site (two-way ANOVA, effect of position P=0.1138). F. Quantification of precise editing using a 5x arrayed msd to edit hda, fbaH, priB, rpoB and gyrA. Data in a, b, d, e and f were quantified by Illumina sequencing after 24h of editing, circles show each of the three biological replicates, bars are mean ±SD.
FIGS. 3A-3C. Increasing limits of deletion size using nested deletion donor arrays. A. Top: Schematic of genome deletions using retron recombineering. Middle: schematic of a standard retron cassette to make deletions (top) with the donor represented by a diamond and an arrayed msd retron cassette with the donor represented by a hexagon. Bottom: quantification of precise editing rates for single deletions of 1 bp, 10 bp, 25 bp, 50 bp or 100 bp by Illumina sequencing after 24h of editing. Diamonds show deletions from a standard architecture and hexagons show deletions using an arrayed architecture (one-way ANOVA, effect of deletion size P<0.0001). B. Top: Schematic of arrayed msd retron cassette with two donors to make 25 bp and 50 bp deletions. Middle: Schematic of a nested deletion strategy using two donors to delete 25 bp and 50 bp. If the 25 bp deletion occurs first, the 50 bp deletion becomes a 25 bp deletion. Bottom: Quantification of precise editing rates for single 25 bp and 50 bp deletions, and for the nested 50 bp deletion. C. Top: Schematic of arrayed msd retron cassette with three donors to make 25 bp, 50 bp and 100 bp deletions. Middle: Schematic of a nested deletion strategy using three donors to delete 25 bp, 50 bp and 100 bp. Bottom: Quantification of precise editing rates for single 25 bp, 50 bp and 100 bp deletions, and for each deletion using the nested strategy (unpaired t-test, singleplex versus multiplex 100 bp deletion, P=0.0485). Data in b and c were quantified by Illumina sequencing after 24h of editing, circles show each of the three biological replicates, bars are mean ±SD.
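The arithmetic behind the nested deletion strategy of FIGS. 3B-3C (once a smaller deletion is installed, the next larger donor only has to remove the remaining bases) can be illustrated with a short Python sketch. The function name is hypothetical, and the sketch assumes the best case in which every smaller deletion in the series has already been installed at the same locus.

```python
from typing import Dict, Iterable


def remaining_deletion_sizes(nested_sizes: Iterable[int]) -> Dict[int, int]:
    """Map each nested deletion size (bp) to the bases still to be removed
    if the next-smaller deletion in the series is already present."""
    ordered = sorted(nested_sizes)
    if not ordered:
        raise ValueError("at least one deletion size is required")
    if ordered[0] <= 0 or len(set(ordered)) != len(ordered):
        raise ValueError("deletion sizes must be positive and distinct")
    remaining = {}
    previous = 0
    for size in ordered:
        remaining[size] = size - previous  # effective size after the prior step
        previous = size
    return remaining


# Deletion sizes taken from the figure description (25, 50 and 100 bp).
two_step = remaining_deletion_sizes([25, 50])
three_step = remaining_deletion_sizes([25, 50, 100])
```

For the two-donor series this reproduces the statement in the caption that the 50 bp deletion becomes a 25 bp deletion once the 25 bp deletion has occurred.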
FIGS. 4A-4D. Multisite editing of individual bacterial genomes using multitrons. A. Top: Schematic of retron recombineering using an msd array encoding 3 donors with a single msr sequence in trans. Bottom: (left) schematic of the multitron recombineering process at this locus. All retron donors are able to target the lagging strand of the gyrA gene during bacterial replication in a chromosomal window of 300 bp. Arrows represent the primers used to amplify the target region. (Right) quantification of precise editing rates of individual target sites along the gyrA gene, circles show each of the three biological replicates, bars are mean ±SD. B. Quantification of expected (product of bulk rates at each indicated site) and real precise editing rates of double and triple combinatorial edits in the gyrA locus of an individual genome. Circles show each of the three biological replicates, bars are mean ±SD (two-way ANOVA, expected vs real, P=0.0765). C. Top: Schematic of single-plasmid, temperature sensitive multitron architecture. Below: Editing rates for each indicated site at each time point from bulk (Illumina amplicon sequencing) and individual colony sequencing. Circles show each of the three biological replicates, bars are mean ±SD. Mean colony sequencing rates are indicated with a bar. D. Quantification of expected (product of bulk rates at each indicated site) and real precise editing rates of double edits in individual genomes. Circles show each of the three biological replicates, bars are mean ±SD (two-way ANOVA, expected vs real, P=0.2734). Colony sequencing represented by a single point.
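The "expected" values in FIGS. 4B and 4D are described as the product of the bulk editing rates at each indicated site, which corresponds to assuming that edits at different sites occur independently. A minimal sketch of that calculation follows. The function name, site labels and rate values are hypothetical placeholders, not measured data.

```python
from itertools import combinations
from math import prod


def expected_combinatorial_rates(bulk_rates):
    """Expected rate of each multi-site combination, assuming independent edits.

    bulk_rates maps a site label to its bulk precise editing rate (fraction 0-1).
    Returns a dict keyed by tuples of site labels (pairs, triples, and so on).
    """
    sites = sorted(bulk_rates)
    expected = {}
    for k in range(2, len(sites) + 1):
        for combo in combinations(sites, k):
            # Under independence, the joint rate is the product of the bulk rates.
            expected[combo] = prod(bulk_rates[site] for site in combo)
    return expected


# Hypothetical bulk rates for three sites in one locus (placeholders only).
bulk = {"site1": 0.5, "site2": 0.4, "site3": 0.2}
expected = expected_combinatorial_rates(bulk)
```

Comparing these products with the observed frequency of genomes carrying the same combination of edits indicates whether edits co-occur more or less often than independence would predict.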
FIGS. 5A-5F. Metabolic engineering using multitrons. A. Top: architecture of the multiplexed retron recombineering cassette in the temperature sensitive plasmid. The operon is composed of a single msr followed by 5x arrayed msds with donors and the genes encoding the RT, the CspRecT and the dominant negative MutLE32K. Bottom: quantification of precise editing rates using a 5x arrayed msd to edit hda, fbaH, priB, rpoB and gyrA by Illumina sequencing after 24h and 48h of editing (two-way ANOVA, effect of expression time P<0.0001). Circles show each of the three biological replicates, bars are mean ±SD. The order of the donors in the arrayed msd is indicated. B. Top: Schematic of the lycopene biosynthesis pathway, with key genes to increase lycopene production highlighted. Bottom: Schematic of metabolic engineering of the lycopene biosynthesis pathway using multiplexed retron recombineering. C. The donors are cloned into a temperature sensitive backbone using a Golden Gate assembly protocol. Single colonies are grown for 24 h and then induced with arabinose. This cycle is repeated by making 1:1000 dilutions for several days. Editing rates are measured by Illumina sequencing and cultures are plated to select individual colonies based on color for quantification of lycopene production. D. Quantification of precise editing rates using different recombitron plasmids containing a variable number of donors to edit genes in the lycopene pathway, quantified by Illumina sequencing after 24h and 72h. Circles show each of the six biological replicates, bars are mean ±SD. E. Quantification of lycopene production in single colonies. Lycopene production was normalized against the average production of the control, which contains the pAC-LYC but was not exposed to the recombineering process. Each point represents a colony (n=12). F. Quantification of lycopene production from colonies re-isolated from samples in the low (~2X control), medium (~3X control), and high (~4X control) production clusters of the dxs/idi condition. Open circles are individual colony values and closed circles are the mean. Sanger sequencing examples to the right illustrate the genotype of each subset (all individual colonies within a condition have identical genotypes).
FIGS. 6A-6J. Arrayed retron msds enable multiplexed editing in eukaryotic cells. A. Top: Schematic of the donor encoding retron ncRNA/gRNA expression cassette expressed from a Gal7 Pol II promoter and flanked by ribozymes versus a new construction replacing ribozymes with Csy4 sequences. Bottom left: schematic of a retron ncRNA-Cas9 gRNA hybrid for genome editing in yeast, depicted above the protein-coding expression cassette which is inserted into the yeast genome. Bottom right: quantification of precise editing of the ADE2 locus in yeast by Illumina sequencing after 48h of editing. Circles show each of the three biological replicates, bars are mean ±SD; absence/presence of Csy4 in the protein-coding expression cassette is shown below the graph (Sidak’s corrected multiple comparisons, effect of Csy4 expression, ribozyme construction P=0.2779, Csy4 construction P<0.0001). B. Top: schematic of an arrayed retron ncRNA-Cas9 gRNA expression cassette, expressed from a Gal7 Pol II promoter, flanked by ribozymes, and separated by a Csy4 sequence. The retron editors in positions 1 and 2 target the ADE2 and FAA1 loci, respectively. Bottom: quantification of precise editing of the ADE2 and FAA1 loci in yeast by Illumina sequencing after 48h of editing. Circles show each of the three biological replicates, bars are mean ±SD; absence/presence of Csy4 in the protein-coding expression cassette is shown below the graph (Sidak’s corrected multiple comparisons, effect of Csy4 expression, ADE2 P<0.0001, FAA1 P=0.0012). C. Editing rates for each indicated site at each time point from bulk (Illumina amplicon sequencing) and individual colony sequencing. Circles show each of the three biological replicates, bars are mean ±SD. Mean colony sequencing rates are indicated with a bar. D. Quantification of expected (product of bulk rates at each indicated site) and real precise editing rates of double edits in individual genomes. Circles show each of the three biological replicates, bars are mean ±SD (two-way ANOVA, expected vs real, P=0.4318). E. Top: schematic of an arrayed retron msdRNA-Cas9 gRNA expression cassette, expressed from a Gal7 Pol II promoter, flanked and separated by a Csy4 sequence; the msrRNA is expressed in trans from a SNR52 Pol III promoter. Bottom: assembly schematic for one-pot Golden Gate cloning of multiple msdRNA-sgRNA editors. F. Schematic showing the presumed processing, annealing and reverse-transcription involved in the generation of editing donors from arrayed retron msdRNA-Cas9 gRNA cassettes. G-I. Top: schematic of 2x, 3x or 5x arrayed retron msdRNA-Cas9 gRNA expression cassettes, as shown in E. Bottom: quantification of precise editing of the various yeast loci targeted by the retron editors shown above, by Illumina sequencing, after 24 and 120h of editing. The editors target ADE2 and FAA1 (G); ADE2, CAN1 and FAA1 (H); and ADE2, CAN1, TRP2, SGS1 and FAA1 (I). Two-way ANOVA, effect of expression time, G P<0.0001, H P<0.0001, I P=0.0038. Circles show each of the three biological replicates, bars are mean ±SD. J. Arrayed retron msds enable multiplexed editing in human cells. Top: Schematic of the donor encoding retron ncRNA/gRNA expression cassette expressed from an H1 promoter and flanked by tRNA-Cys-GCA (hCtRNA) sequences. Bottom: quantification of precise editing of the EMX1, HEK3 and FANCF loci in HEK293T cells by Illumina sequencing 72h post-transfection. Circles show each of the three (EMX1 only), two (dRT and 3X editor EMX1) and one (3X editor HEK3 and FANCF) biological replicates, bars are mean ±SD; absence/presence of a catalytically active retron RT is shown below the graph.
FIG. S1. Trans msr multitron architecture enables precise genome editing. Top: Schematic of retron recombineering using an msd array with a single msr sequence in trans including a terminator (T) between the msd array and msr. Bottom: quantification of precise editing rates for rpoB and gyrA edited simultaneously, by Illumina sequencing after 24h of editing. Circles show each of the three biological replicates, bars are mean ±SD.
FIGS. S2A-S2F. Optimization of retron recombineering using a single plasmid. A. Left: schematic of the different retron operon architectures tested. ncRNA with donor (orange and blue), genes required (grey) and optimized ribosome binding site (RBS) regions (green) are indicated. Right: quantification of rates for precise rpoB editing, circles show each of the three biological replicates, bars are mean ±SD. B. Quantification of precise editing rates for the rpoB target site at 30 and 37°C, circles show each of the three biological replicates, lines are mean ±SD. C. Quantification of OD600 using increasing concentrations of m-toluic acid after 16h of bacterial growth (n=1). D. Quantification of precise editing rates for rpoB using different concentrations of arabinose (n=1). E. Quantification of colonies with intact msd arrays. A total of 30 colonies coming from 3 different replicates were sequenced, bars are mean ±SD. All precise editing rates were quantified using Illumina MiSeq after 24h of editing. F. Scheme of the protocol used to analyze genetic stability of the retron arrays. Briefly, the recombineering plasmid was transformed into E. coli strain bMS.346, followed by 5 days of growth and dilution in the presence or absence of arabinose. The final culture was diluted and plated. Finally, the msd arrays of 10 individual colonies per replicate (n=3) were amplified and sequenced to assess genetic stability of the multitron approach.
FIGS. S3A-S3B. Local off-target mutations. A. Quantification of precise editing rates for fbaH and hda genes using a live or dead version of Eco1 RT, circles show each of the three biological replicates, bars are mean ±SD. B. Local off-target mutation frequency in the 70 bp region of the chromosome homologous to fbaH and hda editing donors using a live or dead version of Eco1 RT, circles show each of the three biological replicates, bars are mean ±SD. All data were quantified using Illumina MiSeq after 24h of editing.
FIGS. S4A-S4E. Undesired on-target mutation rates caused by arrayed retron multiplexed editing in yeast cells. A. Top: Schematic of the donor encoding retron ncRNA/gRNA expression cassette expressed from a Gal7 Pol II promoter and flanked by ribozymes versus a new construction replacing ribozymes with Csy4 sequences. Bottom left: schematic of a retron ncRNA-Cas9 gRNA hybrid for genome editing in yeast, depicted above the protein-coding expression cassette which is inserted into the yeast genome. Bottom right: quantification of indel rates of the ADE2 locus in yeast by Illumina sequencing after 48h of editing. Circles show each of the three biological replicates, bars are mean ±SD; absence/presence of Csy4 in the protein-coding expression cassette is shown below the graph. B. Top: schematic of an arrayed retron ncRNA-Cas9 gRNA expression cassette, expressed from a Gal7 Pol II promoter, flanked by ribozymes, and separated by a Csy4 sequence. The retron editors in positions 1 and 2 target the ADE2 and FAA1 loci, respectively. Bottom: quantification of indel rates of the ADE2 and FAA1 loci in yeast by Illumina sequencing after 48h of editing. Circles show each of the three biological replicates, bars are mean ±SD; absence/presence of Csy4 in the protein-coding expression cassette is shown below the graph. C-E. Top: schematic of 2x, 3x or 5x arrayed retron msdRNA-Cas9 gRNA expression cassettes. Bottom: quantification of indel rates of the various yeast loci targeted by the retron editors shown above, by Illumina sequencing, after 24 and 120h of editing. The editors target ADE2 and FAA1 (C); ADE2, CAN1 and FAA1 (D); and ADE2, CAN1, TRP2, SGS1 and FAA1 (E).
DETAILED DESCRIPTION
In recent years, retrons, bacterial systems recently shown to be involved in phage defense (1,2), have been developed into a tool for producing template DNA inside cells. Retrons are tripartite systems composed of a reverse transcriptase (RT), a contiguous non-coding RNA (ncRNA) with two regions, msr and msd, and an additional protein or RT-fused domain with diverse enzymatic functions (3). The targeted reverse-transcription activity of retrons is being used to produce a specific single-stranded DNA (ssDNA) donor in vivo by engineering the msd region (4,5). Combined with single-stranded annealing proteins (SSAPs) in bacteria, and with Cas9 in yeast and human cells (or other eukaryotic cells), retron-derived donors have been shown to edit genomes efficiently across kingdoms of life (4,5). For many biotechnological and therapeutic applications, editing of multiple DNA loci in a single genome is desired. Therefore, a multiplexed retron-based genome editing tool has been developed herein. New retron architectures were designed that encode several ssDNA donors, showing that this group of prokaryotic RTs represents a versatile bioengineering tool for editing up to 5 loci simultaneously in both prokaryotic and eukaryotic cells with efficiencies >90%. This technology was used to engineer the lycopene metabolic pathway in E. coli and S. cerevisiae, showing that retron-based genome editing can be used to increase the production of compounds of interest.
DEFINITIONS
The following definitions are included to provide a clear and consistent understanding of the specification and claims. As used herein, the recited terms have the following meanings. All other terms and phrases used in this specification have their ordinary meanings as one of skill in the art would understand. Such ordinary meanings may be obtained by reference to technical dictionaries, such as Hawley's Condensed Chemical Dictionary 14th Edition, by R.J. Lewis, John Wiley & Sons, New York, N.Y., 2001.
References in the specification to "one embodiment," "an embodiment," etc., indicate that the embodiment described may include a particular aspect, feature, structure, moiety, or characteristic, but not every embodiment necessarily includes that aspect, feature, structure, moiety, or characteristic. Moreover, such phrases may, but do not necessarily, refer to the same embodiment referred to in other portions of the specification. Further, when a particular aspect, feature, structure, moiety, or characteristic is described in connection with an embodiment, it is within the knowledge of one skilled in the art to affect or connect such aspect, feature, structure, moiety, or characteristic with other embodiments, whether or not explicitly described.
The singular forms "a," "an," and "the" include plural reference unless the context clearly dictates otherwise. Thus, for example, a reference to "a compound" includes a plurality of such compounds, so that a compound X includes a plurality of compounds X. It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for the use of exclusive terminology, such as "solely," "only," and the like, in connection with any element described herein, and/or the recitation of claim elements or use of "negative" limitations. The term "and/or" means any one of the items, any combination of the items, or all of the items with which this term is associated. The phrase "one or more" is readily understood by one of skill in the art, particularly when read in context of its usage. For example, one or more substituents on a phenyl ring refers to one to five, or one to four, for example if the phenyl ring is di-substituted.
As used herein, “or” should be understood to have the same meaning as “and/or” as defined above. For example, when separating a listing of items, “and/or” or “or” shall be interpreted as being inclusive, e.g., the inclusion of at least one, but also including more than one of a number of items, and, optionally, additional unlisted items. Only terms clearly indicated to the contrary, such as “only one of” or “exactly one of,” or, when used in the claims, “consisting of,” will refer to the inclusion of exactly one element of a number or list of elements. In general, the term “or” as used herein shall only be interpreted as indicating exclusive alternatives (i.e., “one or the other but not both”) when preceded by terms of exclusivity, such as “either,” “one of,” “only one of,” or “exactly one of.”
As used herein, the terms “including,” “includes,” “having,” “has,” “with,” or variants thereof, are intended to be inclusive similar to the term “comprising.”
The term "about" can refer to a variation of ± 5%, ± 10%, ± 20%, or ± 25% of the value specified. For example, "about 50" percent can in some embodiments carry a variation from 45 to 55 percent. For integer ranges, the term "about" can include one or two integers greater than and/or less than a recited integer at each end of the range. Unless indicated otherwise herein, the term "about" is intended to include values, e.g., weight percentages, proximate to the recited range that are equivalent in terms of the functionality of the individual ingredient, the composition, or the embodiment. The term "about" can also modify the endpoints of a recited range as discussed above in this paragraph.
As will be understood by the skilled artisan, all numbers, including those expressing quantities of ingredients, properties such as molecular weight, reaction conditions, and so forth, are approximations and are understood as being optionally modified in all instances by the term "about." These values can vary depending upon the desired properties sought to be obtained by those skilled in the art utilizing the teachings of the descriptions herein. It is also understood that such values inherently contain variability necessarily resulting from the standard deviations found in their respective testing measurements.
As will be understood by one skilled in the art, for any and all purposes, particularly in terms of providing a written description, all ranges recited herein also encompass any and all possible sub-ranges and combinations of sub-ranges thereof, as well as the individual values making up the range, particularly integer values. A recited range (e.g., weight percentages or carbon groups) includes each specific value, integer, decimal, or identity within the range. Any listed range can be easily recognized as sufficiently describing and enabling the same range being broken down into at least equal halves, thirds, quarters, fifths, or tenths. As a non-limiting example, each range discussed herein can be readily broken down into a lower third, middle third and upper third, etc. As will also be understood by one skilled in the art, all language such as "up to," "at least," "greater than," "less than," "more than," "or more," and the like, include the number recited and such terms refer to ranges that can be subsequently broken down into sub-ranges as discussed above. In the same manner, all ratios recited herein also include all sub-ratios falling within the broader ratio. Accordingly, specific values recited for radicals, substituents, and ranges, are for illustration only; they do not exclude other defined values or other values within defined ranges for radicals and substituents.
One skilled in the art will also readily recognize that where members are grouped together in a common manner, such as in a Markush group, the invention encompasses not only the entire group listed as a whole, but each member of the group individually and all possible subgroups of the main group.
Additionally, for all purposes, the invention encompasses not only the main group, but also the main group absent one or more of the group members. The invention therefore envisages the explicit exclusion of any one or more of members of a recited group. Accordingly, provisos may apply to any of the disclosed categories or embodiments whereby any one or more of the recited elements, species, or embodiments, may be excluded from such categories or embodiments, for example, for use in an explicit negative limitation.
"Recombinant" as used herein to describe a nucleic acid molecule means a polynucleotide of genomic, cDNA, bacterial, semi synthetic, or synthetic origin which, by virtue of its origin or manipulation, is not associated with all or a portion of the polynucleotide with which it is associated in nature.
The term "recombinant" as used with respect to a protein or polypeptide means a polypeptide produced by expression of a recombinant polynucleotide. In general, the gene of interest is cloned and then expressed in transformed organisms, as described further below. The host organism expresses the foreign gene to produce the protein under expression conditions.
As used herein, a "cell" refers to any type of cell isolated from a prokaryotic, eukaryotic, or archaeal organism, including bacteria, archaea, fungi, protists, plants, and animals, including cells from tissues, organs, and biopsies, as well as recombinant cells, cells from cell lines cultured in vitro, and cellular fragments, cell components, or organelles comprising nucleic acids. The term also encompasses artificial cells, such as nanoparticles, liposomes, polymersomes, or microcapsules encapsulating nucleic acids. The methods described herein can be performed, for example, on a sample comprising a single cell or a population of cells. The term also includes genetically modified cells.
The term "transformation" refers to the insertion of an exogenous polynucleotide (e.g., an engineered retron) into a host cell, irrespective of the method used for the insertion. For example, direct uptake, transduction or f-mating are included. The exogenous polynucleotide may be maintained as a non-integrated vector, for example, a plasmid, or alternatively, may be integrated into the host genome.
"Recombinant host cells," "host cells", "cells", "cell lines", "cell cultures", and other such terms denoting microorganisms or higher eukaryotic cell lines cultured as unicellular entities refer to cells which can be, or have been, used as recipients for recombinant vector or other transferred DNA, and include the original progeny of the original cell which has been transfected.
A "coding sequence" or a sequence which "encodes" a selected polypeptide, is a nucleic acid molecule which is transcribed (in the case of DNA) and translated (in the case of mRNA) into a polypeptide in vivo when placed under the control of appropriate regulatory sequences (or "control elements"). The boundaries of the coding sequence can be determined by a start codon at the 5' (amino) terminus and a translation stop codon at the 3' (carboxy) terminus. A coding sequence can include, but is not limited to, cDNA from viral, prokaryotic or eukaryotic mRNA, genomic DNA sequences from viral or prokaryotic DNA, and even synthetic DNA sequences. A transcription termination sequence may be located 3' to the coding sequence.
Typical "control elements," include, but are not limited to, transcription promoters, transcription enhancer elements, transcription termination signals, polyadenylation sequences (located 3' to the translation stop codon), sequences for optimization of initiation of translation (located 5’ to the coding sequence), and translation termination sequences.
"Operably linked" refers to an arrangement of elements wherein the components so described are configured so as to perform their usual function. Thus, a given promoter operably linked to a coding sequence is capable of effecting the expression of the coding sequence when the proper enzymes are present. The promoter need not be contiguous with the coding sequence, so long as it functions to direct the expression thereof. Thus, for example, intervening untranslated yet transcribed sequences can be present between the promoter sequence and the coding sequence and the promoter sequence can still be considered "operably linked" to the coding sequence.
"Encoded by" or “coded by” refers to a nucleic acid sequence which codes for a polypeptide or RNA sequence. For example, the polypeptide sequence or a portion thereof contains an amino acid sequence of at least 3 to 5 amino acids, more preferably at least 8 to 10 amino acids, and even more preferably at least 15 to 20 amino acids from a polypeptide encoded by the nucleic acid sequence. The RNA sequence or a portion thereof contains a nucleotide sequence of at least 3 to 5 nucleotides, more preferably at least 8 to 10 nucleotides, and even more preferably at least 15 to 20 nucleotides.
The terms "isolated," "purified," or "biologically pure" refer to material that is free to varying degrees from components which normally accompany it as found in its native state. "Isolate" denotes a degree of separation from original source or surroundings. "Purify" denotes a degree of separation that is higher than isolation. A "purified" or "biologically pure" protein is sufficiently free of other materials such that any impurities do not materially affect the biological properties of the protein or cause other adverse consequences. That is, a nucleic acid or peptide of this invention is purified if it is substantially free of cellular material, viral material, or culture medium when produced by recombinant DNA techniques, or chemical precursors or other chemicals when chemically synthesized. Purity and homogeneity are typically determined using analytical chemistry techniques, for example, polyacrylamide gel electrophoresis or high-performance liquid chromatography. The term "purified" can denote that a nucleic acid or protein gives rise to essentially one band in an electrophoretic gel. For a protein that can be subjected to modifications, for example, phosphorylation or glycosylation, different modifications may give rise to different isolated proteins, which can be separately purified.
"Substantially purified" generally refers to isolation of a substance (compound, polynucleotide, protein, polypeptide, peptide composition) such that the substance comprises the majority percent of the sample in which it resides. Typically, in a sample, a substantially purified component comprises 50%, preferably 80%-85%, more preferably 90-95% of the sample. Techniques for purifying polynucleotides and polypeptides of interest are well-known in the art and include, for example, ion-exchange chromatography, affinity chromatography and sedimentation according to density.
"Expression" refers to detectable production of a gene product by a cell. The gene product may be a transcription product (i.e., RNA), which may be referred to as "gene expression", or the gene product may be a translation product of the transcription product (i.e., a protein), depending on the context. "Purified polynucleotide" refers to a polynucleotide of interest or fragment thereof which is essentially free, e.g., contains less than about 50%, preferably less than about 70%, and more preferably less than about at least 90%, of the protein and/or nucleic acids with which the polynucleotide is naturally associated. Techniques for purifying polynucleotides of interest are available in the art and include, for example, disruption of the cell containing the polynucleotide with a chaotropic agent and separation of the polynucleotide(s) and proteins by ion-exchange chromatography, affinity chromatography and sedimentation according to density.
The term "transfection" is used to refer to the uptake of foreign DNA by a cell. A cell has been "transfected" when exogenous DNA has been introduced inside the cell membrane. A number of transfection techniques are generally known in the art. See, e.g., Graham et al. (1973) Virology, 52:456, Sambrook et al. (2001) Molecular Cloning, a laboratory manual, 3rd edition, Cold Spring Harbor Laboratories, New York, Davis et al. (1995) Basic Methods in Molecular Biology, 2nd edition, McGraw-Hill, and Chu et al. (1981) Gene 13: 197. Such techniques can be used to introduce one or more exogenous DNA moieties into suitable host cells. The term refers to both stable and transient uptake of the genetic material and includes uptake of peptide-linked or antibody-linked DNAs.
A "vector" is capable of transferring nucleic acid sequences to target cells (e.g., viral vectors, non-viral vectors, particulate carriers, and liposomes). Typically, "vector construct," "expression vector," and "gene transfer vector," mean any nucleic acid construct capable of directing the expression of a nucleic acid of interest and which can transfer nucleic acid sequences to target cells. Thus, the term includes cloning and expression vehicles, as well as viral vectors.
"Mammalian cell" refers to any cell derived from a mammalian subject suitable for transfection with an engineered retron or vector system comprising an engineered retron, as described herein. The cell may be xenogeneic, autologous, or allogeneic. The cell can be a primary cell obtained directly from a mammalian subject. The cell may also be a cell derived from the culture and expansion of a cell obtained from a mammalian subject. Immortalized cells are also included within this definition. In some embodiments, the cell has been genetically engineered to express a recombinant protein and/or nucleic acid.
The term "subject" includes animals, including both vertebrates and invertebrates, including, without limitation, invertebrates such as arthropods, mollusks, annelids, and cnidarians; and vertebrates such as amphibians, including frogs, salamanders, and caecilians; reptiles, including lizards, snakes, turtles, crocodiles, and alligators; fish; mammals, including human and non-human mammals such as non-human primates, including chimpanzees and other apes and monkey species; laboratory animals such as mice, rats, rabbits, hamsters, guinea pigs, and chinchillas; domestic animals such as dogs and cats; farm animals such as sheep, goats, pigs, horses and cows; and birds such as domestic, wild and game birds, including chickens, turkeys and other gallinaceous birds, ducks, geese, and the like. In some cases, the disclosed methods find use in experimental animals, in veterinary application, and in the development of animal models for disease, including, but not limited to, rodents including mice, rats, and hamsters; primates, and transgenic animals.
"Gene transfer" or "gene delivery" refers to methods or systems for reliably inserting DNA or RNA of interest into a host cell. Such methods can result in transient expression of non-integrated transferred DNA, extrachromosomal replication and expression of transferred replicons (e.g., episomes), or integration of transferred genetic material into the genomic DNA of host cells. Gene delivery expression vectors include, but are not limited to, vectors derived from bacterial plasmid vectors, viral vectors, non-viral vectors, alphaviruses, pox viruses and vaccinia viruses.
The term "derived from" is used herein to identify the original source of a molecule but is not meant to limit the method by which the molecule is made which can be, for example, by chemical synthesis or recombinant means.
A polynucleotide "derived from" a designated sequence refers to a polynucleotide sequence which comprises a contiguous sequence of approximately at least about 6 nucleotides, preferably at least about 8 nucleotides, more preferably at least about 10-12 nucleotides, and even more preferably at least about 15-20 nucleotides corresponding, i.e., identical or complementary to, a region of the designated nucleotide sequence. The derived polynucleotide will not necessarily be derived physically from the nucleotide sequence of interest, but may be generated in any manner, including, but not limited to, chemical synthesis, replication, reverse transcription or transcription, which is based on the information provided by the sequence of bases in the region(s) from which the polynucleotide is derived. As such, it may represent either a sense or an antisense orientation of the original polynucleotide.
A "barcode" refers to one or more nucleotide sequences that are used to identify a nucleic acid or cell with which the barcode is associated. Barcodes can be 3-1000 or more nucleotides in length, preferably 10-250 nucleotides in length, and more preferably 10-30 nucleotides in length, including any length within these ranges, such as 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 nucleotides in length. Barcodes may be used, for example, to identify a single cell, subpopulation of cells, colony, or sample from which a nucleic acid originated. Barcodes may also be used to identify the position (i.e., positional barcode) of a cell, colony, or sample from which a nucleic acid originated, such as the position of a colony in a cellular array, the position of a well in a multi-well plate, or the position of a tube, flask, or other container in a rack. For example, a barcode may be used to identify a genetically modified cell from which a nucleic acid originated. In some embodiments, a barcode is used to identify a particular type of genome edit or a particular type of donor nucleic acid.
The terms "hybridize" and "hybridization" refer to the formation of complexes between nucleotide sequences which are sufficiently complementary to form complexes via Watson-Crick base pairing.
The term "homologous region" refers to a region of a nucleic acid with homology to another nucleic acid region. Thus, whether a "homologous region" is present in a nucleic acid molecule is determined with reference to another nucleic acid region in the same or a different molecule. Further, since a nucleic acid is often double-stranded, the term "homologous region," as used herein, refers to the ability of nucleic acid molecules to hybridize to each other. For example, a single-stranded nucleic acid molecule can have two homologous regions which are capable of hybridizing to each other. Thus, the term "homologous region" includes nucleic acid segments with complementary sequences. Homologous regions may vary in length, but will typically be between 4 and 500 nucleotides (e.g., from about 4 to about 40, from about 40 to about 80, from about 80 to about 120, from about 120 to about 160, from about 160 to about 200, from about 200 to about 240, from about 240 to about 280, from about 280 to about 320, from about 320 to about 360, from about 360 to about 400, from about 400 to about 440, etc.).
As used herein, the terms "complementary" or "complementarity" refer to polynucleotides that are able to form base pairs with one another. Base pairs are typically formed by hydrogen bonds between nucleotide units in an anti-parallel orientation between polynucleotide strands. Complementary polynucleotide strands can base pair in a Watson-Crick manner (e.g., A to T, A to U, C to G), or in any other manner that allows for the formation of duplexes. As persons skilled in the art are aware, when using RNA as opposed to DNA, uracil (U) rather than thymine (T) is the base that is considered to be complementary to adenosine. However, when uracil is denoted in the context of the present invention, the ability to substitute a thymine is implied, unless otherwise stated. "Complementarity" may exist between two RNA strands, two DNA strands, or between an RNA strand and a DNA strand. It is generally understood that two or more polynucleotides may be "complementary" and able to form a duplex despite having less than perfect or less than 100% complementarity. Two sequences are "perfectly complementary" or "100% complementary" if at least a contiguous portion of each polynucleotide sequence, comprising a region of complementarity, perfectly base pairs with the other polynucleotide without any mismatches or interruptions within such region. Two or more sequences are considered "perfectly complementary" or "100% complementary" even if either or both polynucleotides contain additional non-complementary sequences as long as the contiguous region of complementarity within each polynucleotide is able to perfectly hybridize with the other. "Less than perfect" complementarity refers to situations where less than all of the contiguous nucleotides within such region of complementarity are able to base pair with each other. Determining the percentage of complementarity between two polynucleotide sequences is a matter of ordinary skill in the art.
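By way of non-limiting illustration, the percentage of complementarity between two strands of equal length can be computed as sketched below. The function names are hypothetical and are not part of the disclosure, and the sketch assumes ungapped, anti-parallel Watson-Crick pairing only, with U treated as interchangeable with T as noted above.

```python
# Illustrative sketch only; names are hypothetical and not taken from the disclosure.
# Assumes ungapped Watson-Crick pairing between two strands of equal length.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a sequence, reading U as T."""
    dna = seq.upper().replace("U", "T")
    return "".join(PAIRS[base] for base in reversed(dna))


def percent_complementarity(strand_a: str, strand_b: str) -> float:
    """Percent of positions that base pair when the strands are aligned anti-parallel."""
    a = strand_a.upper().replace("U", "T")
    if not a or len(a) != len(strand_b):
        raise ValueError("strands must be non-empty and of equal length")
    # The reverse complement of strand_b is the sequence that would pair perfectly with it.
    ideal_partner = reverse_complement(strand_b)
    paired = sum(x == y for x, y in zip(a, ideal_partner))
    return 100.0 * paired / len(a)
```

For example, `percent_complementarity("ATGC", "GCAT")` evaluates to 100.0 under these assumptions, whereas a single mismatch in a four-nucleotide region gives 75.0.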
The term "Cas9" as used herein encompasses type II clustered regularly interspaced short palindromic repeats (CRISPR) system Cas9 endonucleases from any species, and also includes biologically active fragments, variants, analogs, and derivatives thereof that retain Cas9 endonuclease activity (i.e., catalyze site-directed cleavage of DNA to generate double-strand breaks). A Cas9 endonuclease binds to and cleaves DNA at a site comprising a sequence complementary to its bound guide RNA (gRNA). For purposes of Cas9 targeting, a gRNA may comprise a sequence "complementary" to a target sequence (e.g., major or minor allele), capable of sufficient base-pairing to form a duplex (i.e., the gRNA hybridizes with the target sequence). Additionally, the gRNA may comprise a sequence complementary to a PAM sequence, wherein the gRNA also hybridizes with the PAM sequence in a target DNA.
The term "donor polynucleotide" refers to a polynucleotide that provides a sequence of an intended edit to be integrated into the genome at a target locus by HDR or recombineering. A "target site" or "target sequence" is the nucleic acid sequence recognized (i.e., sufficiently complementary for hybridization) by a guide RNA (gRNA) or a homology arm of a donor polynucleotide. The target site may be allele-specific (e.g., a major or minor allele). For example, a target site can be a genomic site that is intended to be modified such as by insertion of one or more nucleotides, replacement of one or more nucleotides, deletion of one or more nucleotides, or a combination thereof.
By "homology arm" is meant a portion of a donor polynucleotide that is responsible for targeting the donor polynucleotide to the genomic sequence to be edited in a cell. The donor polynucleotide typically comprises a 5' homology arm that hybridizes to a 5' genomic target sequence and a 3' homology arm that hybridizes to a 3' genomic target sequence flanking a nucleotide sequence comprising the intended edit to the genomic DNA. The homology arms are referred to herein as 5' and 3' (i.e., upstream and downstream) homology arms, which relates to the relative position of the homology arms to the nucleotide sequence comprising the intended edit within the donor polynucleotide. The 5' and 3' homology arms hybridize to regions within the target locus in the genomic DNA to be modified, which are referred to herein as the "5' target sequence" and "3' target sequence," respectively. For example, the nucleotide sequence comprising the intended edit can be integrated into the genomic DNA by HDR or recombineering at the genomic target locus recognized (i.e., sufficiently complementary for hybridization) by the 5' and 3' homology arms.
In general, "a CRISPR adaptation system" refers collectively to transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated ("Cas") genes, including sequences encoding a Cas gene, and a CRISPR array nucleic acid sequence including a leader sequence and at least one repeat sequence. In some embodiments, one or more elements of a CRISPR adaptation system are derived from a type I, type II, or type III CRISPR system. Cas1 and Cas2 are found in all three types of CRISPR-Cas systems, and they are involved in spacer acquisition. In the I-E system of E. coli, Cas1 and Cas2 form a complex where a Cas2 dimer bridges two Cas1 dimers. In this complex, Cas2 performs a non-enzymatic scaffolding role, binding double-stranded fragments of invading DNA, while Cas1 binds the single-stranded flanks of the DNA and catalyzes their integration into CRISPR arrays.
In some embodiments, one or more elements of a CRISPR system are derived from a particular organism comprising an endogenous CRISPR system, such as Streptococcus pyogenes. In general, a CRISPR system is characterized by elements that promote the formation of a CRISPR complex at the site of a target sequence (also referred to as a protospacer in the context of an endogenous CRISPR system).
In general, "a CRISPR system" refers collectively to transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated ("Cas") genes, including sequences encoding a Cas gene, and a CRISPR array nucleic acid sequence including a leader sequence and at least one repeat sequence. A CRISPR system can be a type I, type II, or type III CRISPR system.
In some embodiments, a vector comprises a regulatory element operably linked to an enzyme-coding sequence encoding a CRISPR enzyme, such as a Cas protein. Non-limiting examples of Cas proteins include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (also known as Csn1 and Csx12), Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, homologs thereof, or modified versions thereof.
Cas1 and Cas2 are found in type I, type II, or type III CRISPR systems, and they are involved in spacer acquisition. In the I-E system of E. coli, Cas1 and Cas2 form a complex where a Cas2 dimer bridges two Cas1 dimers. In this complex, Cas2 performs a non-enzymatic scaffolding role, binding double-stranded fragments of invading (phage) DNA, while Cas1 binds the single-stranded flanks of the DNA and catalyzes their integration into CRISPR arrays.
In certain embodiments, the disclosure provides protospacers that are adjacent to short (3-5 bp) DNA sequences termed protospacer adjacent motifs (PAM). The PAMs are important for type I and type II systems during acquisition. In type I and type II systems, protospacers are excised at positions adjacent to a PAM sequence, with the other end of the spacer cut using a ruler mechanism, thus maintaining the regularity of the spacer size in the CRISPR array. The conservation of the PAM sequence differs between CRISPR-Cas systems and may be evolutionarily linked to Cas1 and the leader sequence.
In some embodiments, the disclosure provides for integration of defined synthetic DNA that is produced within a cell, such as by using an engineered retron system within the cell, into a CRISPR array in a directional manner, occurring preferentially, but not exclusively, adjacent to the leader sequence. In the type I-E system from E. coli, it was demonstrated that the first direct repeat, adjacent to the leader sequence, is copied, with the newly acquired spacer inserted between the first and second direct repeats. In one embodiment, the protospacer is a defined synthetic DNA. In some embodiments, the defined synthetic DNA is at least 3, 5, 10, 20, 30, 40, or 50 nucleotides, or between 3-50, or between 10-100, or between 20-90, or between 30-80, or between 40-70, or between 50-60, nucleotides in length. In one embodiment, the oligonucleotide sequence or the defined synthetic DNA includes a modified "AAG" protospacer adjacent motif (PAM).
In some embodiments, a regulatory element is operably linked to one or more elements of a CRISPR system so as to drive expression of the one or more elements of the CRISPR system. In general, CRISPRs (Clustered Regularly Interspaced Short Palindromic Repeats), also known as SPIDRs (SPacer Interspersed Direct Repeats), constitute a family of DNA loci that are usually specific to a particular bacterial species. The CRISPR locus comprises a distinct class of interspersed short sequence repeats (SSRs) that were recognized in E. coli (Ishino et al., J. Bacteriol., 169:5429-5433 (1987); and Nakata et al., J. Bacteriol., 171:3553-3556 (1989)), and associated genes. Similar interspersed SSRs have been identified in Haloferax mediterranei, Streptococcus pyogenes, Anabaena, and Mycobacterium tuberculosis (See, Groenen et al., Mol. Microbiol., 10:1057-1065 (1993); Hoe et al., Emerg. Infect. Dis., 5:254-263 (1999); Masepohl et al., Biochim. Biophys. Acta 1307:26-30 (1996); and Mojica et al., Mol. Microbiol., 17:85-93 (1995)). The CRISPR loci typically differ from other SSRs by the structure of the repeats, which have been termed short regularly spaced repeats (SRSRs) (Janssen et al., OMICS J. Integ. Biol., 6:23-33 (2002); and Mojica et al., Mol. Microbiol., 36:244-246 (2000)). In general, the repeats are short elements that occur in clusters that are regularly spaced by unique intervening sequences with a substantially constant length (Mojica et al., (2000), supra). Although the repeat sequences are highly conserved between strains, the number of interspersed repeats and the sequences of the spacer regions typically differ from strain to strain (van Embden et al., J. Bacteriol., 182:2393-2401 (2000)). CRISPR loci have been identified in more than 40 prokaryotes (See e.g., Jansen et al., Mol. Microbiol., 43:1565-1575 (2002); and Mojica et al., (2005)) including, but not limited to Aeropyrum, Pyrobaculum, Sulfolobus, Archaeoglobus, Haloarcula, Methanobacterium, Methanococcus, Methanosarcina, Methanopyrus, Pyrococcus, Picrophilus, Thermoplasma, Corynebacterium, Mycobacterium, Streptomyces, Aquifex, Porphyromonas, Chlorobium, Thermus, Bacillus, Listeria, Staphylococcus, Clostridium, Thermoanaerobacter, Mycoplasma, Fusobacterium, Azoarcus, Chromobacterium, Neisseria, Nitrosomonas, Desulfovibrio, Geobacter, Myxococcus, Campylobacter, Wolinella, Acinetobacter, Erwinia, Escherichia, Legionella, Methylococcus, Pasteurella, Photobacterium, Salmonella, Xanthomonas, Yersinia, Treponema, and Thermotoga.
In some embodiments, an enzyme coding sequence encoding a CRISPR enzyme is codon optimized for expression in particular cells, such as eukaryotic cells. The eukaryotic cells may be those of or derived from a particular organism, such as a mammal, including but not limited to human, mouse, rat, rabbit, dog, or non-human primate. In general, codon optimization refers to a process of modifying a nucleic acid sequence for enhanced expression in the host cells of interest by replacing at least one codon (e.g., about one or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more codons) of the native sequence with codons that are more frequently or most frequently used in the genes of that host cell while maintaining the native amino acid sequence. Various species exhibit particular bias for certain codons of a particular amino acid. Codon bias (differences in codon usage between organisms) often correlates with the efficiency of translation of messenger RNA (mRNA), which is in turn believed to be dependent on, among other things, the properties of the codons being translated and the availability of particular transfer RNA (tRNA) molecules. The predominance of selected tRNAs in a cell is generally a reflection of the codons used most frequently in peptide synthesis. Accordingly, genes can be tailored for optimal gene expression in a given organism based on codon optimization. Codon usage tables are readily available, for example, at the "Codon Usage Database", and these tables can be adapted in a number of ways. See Nakamura, Y., et al. "Codon usage tabulated from the international DNA sequence databases: status for the year 2000" Nucl. Acids Res. 28:292 (2000). Computer algorithms for codon optimizing a particular sequence for expression in a particular host cell, such as Gene Forge (Aptagen; Jacobus, Pa.), are also available.
In some embodiments, one or more codons (e.g., 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more, or all codons) in a sequence encoding a CRISPR enzyme correspond to the most frequently used codon for a particular amino acid.
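As a non-limiting sketch of the codon replacement described above, each codon of a coding sequence can be swapped for a host-preferred synonymous codon while the encoded amino acid sequence is left unchanged. The function names and the `preferred` mapping below are hypothetical; the example preferences are placeholders and are not taken from any codon usage table, and the standard genetic code is assumed.

```python
# Illustrative sketch only; names and example preferences are hypothetical.
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate an in-frame DNA coding sequence ('*' marks a stop codon)."""
    cds = cds.upper()
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def codon_optimize(cds: str, preferred: dict) -> str:
    """Replace codons with the preferred synonymous codon for each amino acid.

    `preferred` maps a one-letter amino acid to a codon; amino acids that are
    absent from the mapping keep their original codon.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    optimized = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        amino_acid = CODON_TO_AA[codon]
        new_codon = preferred.get(amino_acid, codon)
        if CODON_TO_AA[new_codon] != amino_acid:
            raise ValueError(f"{new_codon} does not encode {amino_acid}")
        optimized.append(new_codon)
    return "".join(optimized)
```

Checking that `translate` returns the same protein before and after optimization is a simple way to confirm that the native amino acid sequence is maintained.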
Some bacterial cells have a Class II CRISPR system where endoribonucleases (cas nucleases) are expressed that can preferentially cleave specific sequences, including certain repeat sequences in DNA, various U-rich regions in RNAs, and sites near a protospacer adjacent motif (PAM). Class II CRISPR systems, for example, can include a cluster of four genes, Cas9, Cas1, Cas2, and Csn1, that employ a tracrRNA and a CRISPR RNA (crRNA). In this system, a targeted DNA double-strand break (DSB) may be generated in four sequential steps. First, the pre-crRNA and tracrRNA may be expressed. Second, tracrRNA may hybridize to the direct repeats of pre-CRISPR guide RNA (pre-crRNA), which is then processed into mature crRNAs containing individual spacer sequences. Third, the mature crRNA:tracrRNA complex can direct a cas nuclease to the DNA target consisting of the protospacer and the corresponding PAM sequence via heteroduplex formation between the spacer region of the crRNA and the protospacer DNA. Fourth, the cas nuclease may cleave target DNA upstream of the PAM site to create a double-stranded break within the protospacer. Such cleavage can undermine or destroy a phage.
However, Cas nucleases bind to nucleic acids only in the presence of a specific sequence, called a protospacer adjacent motif (PAM), on the non-targeted DNA strand. Therefore, the locations in the genome that can be targeted by different Cas proteins are limited by the locations of these PAM sequences. The cas nuclease cuts 3-4 nucleotides upstream of the PAM sequence. Hence, one method to generate phage that are not vulnerable to CRISPR-based bacterial defense mechanisms is to modify PAM sites in phage genomes so that cas nucleases cannot bind to their genomic DNA.
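The PAM constraint described above can be illustrated with a short, non-limiting sketch that scans one strand of a sequence for a degenerate PAM and reports each candidate protospacer and cut position. The function names are hypothetical. The default PAM "NGG", the 20-nucleotide protospacer length, and the 3-nucleotide cut offset are assumptions chosen for illustration, and the degenerate letters handled are N, R, W, Y, and V.

```python
# Illustrative sketch only; names and default parameters are assumptions.
import re

# Degenerate nucleotide letters mapped to regular-expression character classes.
DEGENERATE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "R": "[AG]", "W": "[AT]", "Y": "[CT]", "V": "[ACG]",
}


def pam_regex(pam: str):
    """Compile a PAM into a lookahead pattern so overlapping sites are found."""
    body = "".join(DEGENERATE[letter] for letter in pam.upper())
    return re.compile("(?=(" + body + "))")


def find_pam_sites(seq: str, pam: str = "NGG", protospacer_len: int = 20,
                   cut_offset: int = 3) -> list:
    """List sites on the given strand with a full-length protospacer 5' of a PAM.

    cut_index is the 0-based position immediately 3' of the predicted break,
    i.e. the break lies cut_offset nucleotides upstream of the PAM.
    """
    seq = seq.upper()
    sites = []
    for match in pam_regex(pam).finditer(seq):
        pam_start = match.start()
        if pam_start < protospacer_len:
            continue  # not enough upstream sequence for a protospacer
        sites.append({
            "protospacer": seq[pam_start - protospacer_len:pam_start],
            "pam": match.group(1),
            "cut_index": pam_start - cut_offset,
        })
    return sites
```

Only the supplied strand is scanned in this sketch; a fuller treatment would also scan the reverse complement.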
Table 1: Examples of Cas nucleases and their PAM sequences.
Figure imgf000030_0001
In Table 1, an "N" in a PAM sequence means that any nucleotide is present; an R means that an A or a G is present; a W means that an A or a T is present; a Y means that a T or a C is present; and a V means that an A, C or G is present.
"Administering" a nucleic acid, such as an engineered retron construct or vector comprising an engineered retron construct to a cell comprises transducing, transfecting, electroporating, translocating, fusing, phagocytosing, shooting or ballistic methods, etc., i.e., any means by which a nucleic acid can be transported across a cell membrane.
Before the present disclosure is further described, it is to be understood that the disclosed subject matter is not limited to particular embodiments described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present disclosure will be limited only by the appended claims.
Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is encompassed within the disclosed subject matter. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the disclosed subject matter, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the disclosed subject matter.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the disclosed subject matter belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the disclosed subject matter, the preferred methods and materials are now described. All publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited.
It must be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to "a cell" includes a plurality of such cells and reference to "the nucleic acid" includes reference to one or more nucleic acids and equivalents thereof known to those skilled in the art, and so forth. It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as "solely," "only" and the like in connection with the recitation of any features or elements described herein, which includes use of a "negative" limitation. It is appreciated that certain features of the disclosed subject matter, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the disclosed subject matter, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination. All combinations of the embodiments pertaining to the disclosure are specifically embraced by the disclosed subject matter and are disclosed herein just as if each and every combination was individually and explicitly disclosed. In addition, all sub-combinations of the various embodiments and elements thereof are also specifically embraced by the present disclosure and are disclosed herein just as if each and every such sub-combination was individually and explicitly disclosed herein.
The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the disclosed subject matter is not entitled to antedate such publication. Further, the dates of publication provided may be different from the actual publication dates which may need to be independently confirmed.
RETRONS
Retrons are defined by their unique ability to produce an unusual satellite DNA known as msDNA (multicopy single-stranded DNA). A typical retron operon consists of a ret gene encoding a retron reverse transcriptase (RT), a region encoding a non-coding RNA (ncRNA), which includes two contiguous and inverted non-coding sequences referred to as the msr and msd, and a gene encoding an accessory protein. The ncRNA serves both as a primer site (i.e., the msr region) for binding of the retron RT and as a template for the reverse transcriptase (i.e., the msd region).
The ret gene and the non-coding RNA (including the msr and msd) are transcribed as a single RNA transcript, which is processed to separate the ret region and the ncRNA region into separate transcripts. The ncRNA then becomes folded into a specific secondary structure. The 5' and 3' ends of ncRNA are referred to generally as the a1 and a2 complementary regions and can hybridize to one another to form a stem or duplex region referred to as the “a1/a2 stem” or the “a1/a2 duplex” of the ncRNA.
The retron RT, once translated, binds the ncRNA downstream from the msd locus (without being bound by theory, the binding may involve the a1/a2 duplex) and initiates reverse transcription of the msd region as a template sequence, thereby generating a single strand DNA reverse transcriptase product (i.e., the RT-DNA, with a characteristic hairpin structure, which in wild type retrons varies in length from about 48 to 163 bases). The RT-DNA, as part of the priming event, is covalently attached to a 2’ OH group present in a conserved branching guanosine residue. Reverse transcription halts before reaching the msr locus. It is thought that cellular RNase H degrades the template RNA during reverse transcription. The result is the formation of a chimeric molecule of RNA (the remaining portions of the ncRNA not removed by processing) and DNA (the single stranded RT-DNA product covalently attached to the ncRNA), which is referred to as “msDNA.”
A large number of retrons have been identified and can be modified or engineered as described herein. One of the first described retrons found in E. coli is called Eco1 (previously called Ec86). In BL21 E. coli cells, this retron is present and active, producing reverse-transcribed DNA that can be detected at the population level. The wild type Eco1 retron can be eliminated from BL21 E. coli cells by removing the retron operon from the genome. In the absence of this native operon, the ncRNA and reverse transcriptase can be expressed from a plasmid lacking the accessory protein. Since the accessory protein is a core component of the phage defense conferred by retrons, this reduced system would reduce phage defense capacity, yet cells with ncRNA-reverse transcriptase encoding plasmids continue to produce abundant reverse-transcribed DNA. The accessory protein coding region is not included in the engineered retrons.
An example of an Eco1 wild type retron non-coding RNA (ncRNA) sequence is shown below as SEQ ID NO: 1.
1 ATGCGCACCC TTAGCGAGAG GTTTATCATT AAGGTCAACC
41 TCTGGATGTT GTTTCGGCAT CCTGCATTGA ATCTGAGTTA
81 CTGTCTGTTT TCCTTGTTGG AACGGAGAGC ATCGCCTGAT
121 GCTCTCCGAG CCAACCAGGA AACCCGTTAT TTCTGACGTA
161 AGGGTGCGCA
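As a non-limiting illustration of the terminal complementarity described above for retron ncRNAs, the numbered listing of SEQ ID NO: 1 can be parsed and its 5'-proximal and 3'-terminal bases compared. The function names and the `max_offset` parameter are hypothetical, and whether the pairing reported by this sketch coincides exactly with the a1/a2 stem is an assumption that is not stated in the disclosure.

```python
# Illustrative sketch only; names are hypothetical and the stem search is simplified.
LISTING = """
1 ATGCGCACCC TTAGCGAGAG GTTTATCATT AAGGTCAACC
41 TCTGGATGTT GTTTCGGCAT CCTGCATTGA ATCTGAGTTA
81 CTGTCTGTTT TCCTTGTTGG AACGGAGAGC ATCGCCTGAT
121 GCTCTCCGAG CCAACCAGGA AACCCGTTAT TTCTGACGTA
161 AGGGTGCGCA
"""

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def parse_listing(text: str) -> str:
    """Join the sequence groups of a numbered listing, dropping position numbers."""
    return "".join(token for token in text.split() if not token.isdigit())


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def terminal_stem(seq: str, max_offset: int = 3) -> tuple:
    """Return (length, offset) of the longest perfect pairing between a stretch
    starting within max_offset bases of the 5' end and the 3' terminus."""
    rc = reverse_complement(seq)  # rc[k] is the pairing partner of seq[-1 - k]
    best = (0, 0)
    for offset in range(max_offset + 1):
        k = 0
        while offset + k < len(seq) // 2 and seq[offset + k] == rc[k]:
            k += 1
        if k > best[0]:
            best = (k, offset)
    return best
```

Applied to the listing above, `parse_listing` yields a 170-nucleotide sequence, and `terminal_stem` reports a 12-base perfect pairing that begins at the second nucleotide from the 5' end.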
An example of an Eco1 human-codon optimized reverse transcriptase (RT) sequence that can be used is shown below as SEQ ID NO: 3.
1 ATGAAATCTG CAGAGTATCT GAATACGTTC CGCCTTAGGA
41 ATTTGGGCCT CCCCGTGATG AACAATCTCC ACGATATGAG
81 CAAGGCGACT CGAATATCCG TGGAAACGCT GAGACTGCTC
121 ATCTATACAG CAGACTTTCG GTACAGGATC TACACGGTCG
161 AAAAGAAGGG GCCTGAGAAA CGCATGCGAA CAATTTATCA
201 ACCTAGCCGA GAGCTCAAGG CGTTGCAGGG CTGGGTTCTT
Figure imgf000034_0001
An example of an Eco2 human-codon optimized reverse transcriptase (RT) sequence is shown below as SEQ ID NO: 4.
Figure imgf000034_0002
Figure imgf000035_0001
An example of an Eco1 wild type retron reverse transcriptase sequence is shown below as SEQ ID NO: 5.
1 KSAEYLNTFR LRNLGLPVMN NLHDMSKATR ISVETLRLLI
41 YTADFRYRIY TVEKKGPEKR MRTIYQPSRE LKALQGWVLR
81 NILDKLSSSP FSIGFEKHQS ILNNATPHIG ANFILNIDLE
121 DFFPSLTANK VFGVFHSLGY NRLISSVLTK ICCYKNLLPQ
161 GAPSSPKLAN LICSKLDYRI QGYAGSRGLI YTRYADDLTL
201 SAQSMKKVVK ARDFLFSIIP SEGLVINSKK TCISGPRSQR
241 KVTGLVISQE KVGIGREKYK EIRAKIHHIF CGKSSEIEHV
281 RGWLSFILSV DSKSHRRLIT YISKLEKKYG KNPLNKAKT
An example of an Eco2 wild type retron reverse transcriptase sequence is shown below as SEQ ID NO: 3.
1 CACGCATGTA GGCAGATTTG TTGGTTGTGA ATCGCAACCA
41 GTGGCCTTAA TGGCAGGAGG AATCGCCTCC CTAAAATCCT
81 TGATTCAGAG CTATACGGCA GGTGTGCTGT GCGAAGGAGT
121 GCCTGCATGC GT
An example of an Eco2 wild type retron reverse transcriptase sequence is shown below as SEQ ID NO: 6.
1 MTKTSKLDAL RAATSREDLA KILDIKLVFL TNVLYRIGSD
41 NQYTQFTIPK KGKGVRTISA PTDRLKDIQR RICDLLSDCR
81 DEIFAIRKIS NNYSFGFERG KSIILNAYKH RGKQIILNID
121 LKDFFESFNF GRVRGYFLSN QDFLLNPVVA TTLAKAACYN
161 GTLPQGSPCS PIISNLICNI MDMRLAKLAK KYGCTYSRYA
201 DDITISTNKN TFPLEMATVQ PEGVVLGKVL VKEIENSGFE
241 INDSKTRLTY KTSRQEVTGL TVNRIVNIDR CYYKKTRALA
281 HALYRTGEYK VPDENGVLVS GGLDKLEGMF GFIDQVDKFN
321 NIKKKLNKQP DRYVLTNATL HGFKLKLNAR EKAYSKFIYY
361 KFFHGNTCPT IITEGKTDRI YLKAALHSLE TSYPELFREK
401 TDSKKKEINL NIFKSNEKTK YFLDLSGGTA DLKKFVERYK
441 NNYASYYGSV PKQPVIMVLD NDTGPSDLLN FLRNKVKSCP
481 DDVTEMRKMK YIHVFYNLYI VLTPLSPSGE QTSMEDLFPK
521 DILDIKIDGK KFNKNNDGDS KTEYGKHIFS MRVVRDKKRK
561 IDFKAFCCIF DAIKDIKEHY KLMLNS
An example of a sequence for an Eco4 retron reverse transcriptase is shown below as SEQ ID NO: 7.
1 MSIDIETTLQ KAYPDFDVLL KSRPATHYKV YKIPKRTIGY
41 RIIAQPTPRV KAIQRDIIEI LKQHTHIHDA ATAYVDGKNI
81 LDNAKIHQSS VYLLKLDLVN FFNKITPELL FKALARQKVD
121 ISDTNKNLLK QFCFWNRTKR KNGALVLSVG APSSPFISNI
161 VMSSFDEEIS SFCKENKISY SRYADDLTFS TNERDVLGLA
201 HQKVKTTLIR FFGTRIIINN NKIVYSSKAH NRHVTGVTLT
241 NNNKLSLGRE RKRYITSLVF KFKEGKLSNV DINHLRGLIG
281 FAYNIEPAFI ERLEKKYGES TIKSIKKYSE GG
An example of a sequence for a Sen2 retron reverse transcriptase is shown below as SEQ ID NO: 8.
1 MDILQHISDL LLTKKSEIIS FSLTAPYRYK IYKIAKRNSD
41 KKRTIAHPSK ELKFIQREIT EYLTDKLPVH ECAFAYKKGS
81 SIKTNAQVHL HTKYLLKMDF ENFFPSITPR LFFSKLRLAN
121 IDLTADDKVL LENILFFKSK RNSNLRLSIG APSSPLISNF
161 VMYFWDIEVQ EICSKIGVNY TRYADDLTFS TNNKDVLFDI
201 PDMLENVLPK YSLGRIRINH EKTVFSSKGH NRHVTGITLT
241 NDNKLSIGRE RKRKISAMIH HFINGKLSTD ECNKLVGLLA
281 FAKNIEPSFY KSMVIKYGSD NIYKLQKQKD K
Variants and homologs of any of the sequences described herein can also be used in the methods and systems described herein. For example, such variants and homologs can have less than 100% sequence identity to any of the sequences described herein. The variants and homologs can have at least about 40% sequence identity, or at least 50% sequence identity, or at least 60% sequence identity, or at least 70% sequence identity, or at least 80% sequence identity, or at least 90% sequence identity, or at least 95% sequence identity, or at least 96% sequence identity, or at least 97% sequence identity, or at least 98% sequence identity, or at least 99% sequence identity, or 60-99% sequence identity, or 70-99% sequence identity, or 80-99% sequence identity, or 90-95% sequence identity, or 90-99% sequence identity, or 95-97% sequence identity, or 97-99% sequence identity, or 100% sequence identity with any of the sequences described herein.
This section is provided as a retron overview; other types of retrons are described throughout the application and can be engineered as described herein or used in the methods described herein.
In addition, any of the retrons described in Mestre et al., Systematic Prediction of Genes Functionally Associated with Bacterial Retrons and Classification of The Encoded Tripartite Systems, Nucleic Acids Research, Volume 48, Issue 22, 16 December 2020, Pages 12632-12647 (incorporated herein by reference) may be used as a starting point by which to introduce the modifications described herein to result in the engineered retrons, ncRNAs, msDNAs, and RT-DNAs described herein. These retron sequences are provided as follows in Table A:
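As a non-limiting sketch, the sequence identity thresholds recited above can be evaluated for two sequences that have already been aligned. The function names are hypothetical. The sketch assumes an existing pairwise alignment of equal length with gaps written as "-", and it counts identical positions over all alignment columns, which is only one of several conventions for computing percent identity.

```python
# Illustrative sketch only; names are hypothetical. Assumes a pre-computed
# pairwise alignment with '-' as the gap character.
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns that are gaps in both sequences.
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        raise ValueError("alignment contains no residues")
    identical = sum(x == y and x != "-" for x, y in columns)
    return 100.0 * identical / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             minimum: float = 90.0) -> bool:
    """True if the aligned pair has at least `minimum` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= minimum
```

The alignment itself would ordinarily be produced by a dedicated alignment tool; this sketch addresses only the arithmetic applied to the resulting alignment.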
Table A: Retron sequences that may be modified as described herein
[Table A is provided as images in the original publication: Figures imgf000038_0001 through imgf000098_0001.]
"Accession in PATRIC database (https://www.patricbrc.org/) or NCBI (https://www.ncbi.nlm.nih.gov/protein/)
ENGINEERED RETRON NON-CODING RNAs
Engineered retrons are provided in which the retron ncRNA is modified to encode multiple editing donors/repair templates, such that two or more donor/template sequences are provided, including two, three, four, five, six, seven, eight, nine, or ten donor/template sequences. In addition, vector systems encoding such engineered retrons and methods of using engineered retrons and vector systems encoding them in various applications such as CRISPR/Cas-mediated genome editing, recombineering, cellular barcoding, and molecular recording are also provided.
For prokaryotic use of the engineered multiplexed retron system described herein, some embodiments comprise:
(1) modified retron non-coding RNAs (ncRNAs) that contain homology to multiple sites (e.g., two or more sites) in the bacterial genome with the intended mutations (deletion, addition, substitution, etc.) in these sites;
(2) a ret gene coding for a reverse transcriptase, such as a retron reverse-transcriptase (RT) protein to reverse-transcribe the retron ncRNA into a single-stranded DNA template for recombination;
(3) a recT/single-stranded annealing protein (SSAP) to promote recombination of the single-stranded DNA template into the bacterial genome;
(4) a dominant-negative mutL to suppress mismatch recognition when making single-base or small changes; and/or
(5) a single-stranded binding protein (SSB) that is compatible with the SSAP to promote recombination. See US 62/899,625; PCT/US2020/050323; 63/328,387; and PCT/US2023/064014, which are incorporated herein by reference.
For eukaryotic use of the engineered multiplexed retron system described herein, some embodiments comprise:
(1) modified retron ncRNAs that contain both homology to multiple sites in the eukaryotic genome and the intended changes (deletion, addition, substitution, etc.) to those sites;
(2) a retron reverse-transcriptase (RT) protein to reverse-transcribe the retron ncRNA into a single-stranded DNA donor for precisely repairing the eukaryotic genome (e.g., the yeast genome) with the desired edit;
(3) a CRISPR-associated nuclease that targets several loci in the genome using guide RNAs fused to the 5’ and 3’ ends of the modified retron non-coding RNAs (ncRNAs). A DNA repair template can comprise a single strand DNA product of reverse transcription which comprises a nucleotide sequence having a sequence modification (e.g., a desired one or more mutations, insertions, deletions, or inversions) that is flanked by regions of homology to a target genomic site. Such engineered retrons provide both the guide RNA (as part of the ncRNA) and the DNA repair template (encoded as part of the msd region, which is converted by the retron RT to a single strand RT-DNA which operates as the DNA repair template), thereby providing a vehicle to make the desired nucleotide changes at genomic sites. See US 63/275,287; PCT/US2022/079220; and 63/400,434; which are incorporated herein by reference.
In various embodiments, the engineered retron ncRNA can be modified from its endogenous sequence (e.g., the endogenous ncRNA from retron sequences of Table A) in various ways, including but not limited to: (1) the ncRNA can be fused to a guide sequence (e.g., a CRISPR crRNA-tracrRNA), allowing the transcribed ncRNA to serve as a targeting molecule for a trans-expressed RNA-guided nuclease (e.g., a CRISPR nuclease); (2) the msd region (reverse transcribed region of the retron ncRNA) can be modified to contain a sequence that is reverse transcribed to provide a DNA donor/repair template; and (3) the a1/a2 duplex can be modified in length to facilitate increased production of the RT-DNA. A DNA donor/repair template can comprise a single strand DNA product of reverse transcription which comprises a nucleotide sequence having a sequence modification (e.g., a desired one or more mutations, insertions, deletions, or inversions) that is flanked by regions of homology to a target genomic site. Such engineered retrons provide both the guide RNA (as part of the ncRNA) and the DNA donor/repair template (encoded as part of the msd region, which is converted by the retron RT to a single strand RT-DNA which operates as the DNA donor/repair template), thereby providing a vehicle to make the desired nucleotide changes at genomic sites.
Sequences of the retron msr, msd, and/or reverse transcriptases used in the engineered retrons may be derived from a bacterial retron operon. Representative retrons are available such as those from gram-negative bacteria including, without limitation, myxobacteria retrons such as Myxococcus xanthus retrons (e.g., Mx65, Mx162) and Stigmatella aurantiaca retrons (e.g., Sa163); Escherichia coli retrons (e.g., Ec48, Ec67, Ec73, Ec78, Ec83, Ec86, and Ec107); Salmonella enterica; Vibrio cholerae retrons (e.g., Vc81, Vc95, Vc137); Vibrio parahaemolyticus retrons (e.g., Vp96); and Nannocystis exedens retrons (e.g., Ne144), or those retrons available in Table A. Retron msr gene, msd gene, and ret gene nucleic acid sequences as well as retron reverse transcriptase protein sequences may be derived from any source, including those of Table A. Representative retron sequences, including msr gene, msd gene, and ret gene nucleic acid sequences and reverse transcriptase protein sequences are listed in the National Center for Biotechnology Information (NCBI) database. See, for example, NCBI entries: Accession Nos. EF428983, M55249, EU250030, X60206, X62583, AB299445, AB436696, AB436695, M86352, M30609, M24392, AF427793, AQ3354, and AB079134; all of which sequences (as entered by the date of filing of this application) are herein incorporated by reference in their entireties. Any of these retron sequences, or a variant thereof, can include variant nucleotides, added nucleotides, or fewer nucleotides. For example, the retrons can have at least about 80-100% sequence identity thereto, including any percent identity within this range, such as 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99% sequence identity to any of the retron sequences described herein (including those defined by accession number), and can be used to construct an engineered retron or vector system comprising an engineered retron, as described herein.
In some embodiments, recombinant retron constructs can have a non-native configuration with a non-native spacing between the msr gene, msd gene, and ret gene. The msr gene and the msd gene may be separated in a trans arrangement rather than provided in the endogenous cis arrangement. In addition, the ret gene may be provided in a trans arrangement with respect to either the msr gene or the msd gene, or both. In some embodiments, the ret gene is provided in a trans arrangement that eliminates a cryptic stop signal for the reverse transcriptase, which allows the generation of longer single stranded DNAs from the engineered retron construct.
In some embodiments, the retron construct is modified with respect to the native retron to include one or more donor/repair templates of interest, such as two or more donor/repair templates of interest. In this context, the retrons can be engineered with donor/repair templates for use in a variety of applications. For example, donor/repair templates can be added to retron constructs to provide a cell with a nucleic acid encoding a protein or regulatory RNA of interest, a donor polynucleotide suitable for use in gene editing, e.g., by homology directed repair (HDR) or recombination-mediated genetic engineering (recombineering), or a CRISPR protospacer DNA sequence for use in molecular recording, as discussed further herein. Such donor/repair templates may be inserted, for example, into the msr gene or the msd gene such that the donor/repair template is reverse transcribed by the retron reverse transcriptase as part of the msDNA product.
In some embodiments, multiple copies of donor DNAs can be generated in vivo from retron templates. For example, modified retron non-coding RNAs (ncRNAs) can be expressed from an expression cassette within the host cells as RNA molecules. Retron ncRNAs are naturally partially reverse transcribed into ssDNA. As provided herein, in some embodiments, the portion of the ncRNA that is partially reverse transcribed can provide the donor DNA for editing genomes, prokaryotic or eukaryotic. Such reverse transcription provides multiple copies of single stranded donor DNA, which is ideal for editing genomes.
In some embodiments, the donor DNAs can be generated in host cells that also provide one or more types of single strand annealing proteins (SSAPs) and/or one or more single-stranded DNA binding proteins (SSBs). The SSAPs can facilitate recombination (editing) and in some cases the SSAP is a RecT recombinase. Single-stranded DNA binding proteins (SSBs) bind and stabilize single-stranded DNA (ssDNA). The SSAP and/or SSB proteins can be expressed endogenously, or the bacterial host cells can be modified to include an expression cassette from which the SSAP and/or SSB proteins can be expressed. For example, in some cases the bacterial host cells can have, or be modified to express, CspRecT as an SSAP. RecT binds to single-stranded DNA and promotes the renaturation of complementary single-stranded DNAs to facilitate recombination. RecT has a function similar to that of lambda RedB.
Constructs can also be used to express the different ncRNAs, reverse transcriptases, along with the SSAP, SSB, mutant mismatch repair proteins (e.g., mutL mutants), or combinations thereof.
The engineered retrons can include a unique barcode. Barcodes may comprise one or more nucleotide sequences that are used to identify a nucleic acid or cell with which the barcode is associated. Such barcodes may be inserted, for example, into the loop region of the msd-encoded DNA. Barcodes can be 3-1000 or more nucleotides in length, preferably 10-250 nucleotides in length, and more preferably 10-30 nucleotides in length, including any length within these ranges, such as 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 nucleotides in length. In some embodiments, barcodes are also used to identify the position (i.e., positional barcode) of a cell, colony, or sample from which a retron originated, such as the position of a colony in a cellular array, the position of a well in a multiwell plate, the position of a tube in a rack, or the location of a sample in a laboratory. In particular, a barcode may be used to identify the position of a genetically modified cell containing a retron. The use of barcodes allows retrons from different cells to be pooled in a single reaction mixture for sequencing while still being able to trace a particular retron back to the colony from which it originated.
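By way of illustration only, the tracing of pooled sequencing reads back to their position of origin can be sketched as follows. This is a minimal sketch under stated assumptions: the barcode sequences, the well names, the 10-nucleotide barcode length, and the placement of the barcode at the start of each read are hypothetical and are not taken from this disclosure.

```python
from collections import Counter

# Hypothetical positional barcodes mapped to multiwell-plate positions.
# Sequences and well names are illustrative assumptions only.
BARCODE_TO_WELL = {
    "ACGTACGTAC": "A1",
    "TGCATGCATG": "A2",
    "GGATCCGGAT": "B1",
}


def assign_wells(reads, barcode_to_well, barcode_len=10):
    """Count reads per well, assuming the barcode is the first
    `barcode_len` bases of each read (an assumption of this sketch)."""
    counts = Counter()
    for read in reads:
        barcode = read[:barcode_len].upper()
        # Reads whose barcode is not in the table are kept, but flagged.
        counts[barcode_to_well.get(barcode, "unassigned")] += 1
    return counts


# Pooled reads from several wells, sequenced in one reaction mixture.
reads = [
    "ACGTACGTACTTTGGG",
    "ACGTACGTACAAACCC",
    "GGATCCGGATCCCTTT",
    "NNNNNNNNNNAAAA",
]
well_counts = assign_wells(reads, BARCODE_TO_WELL)
```

In practice a barcode may sit elsewhere in the read (for example, within the msd loop region), in which case the slicing step would be replaced by a search for flanking constant sequences.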
Therefore, expression cassettes with segments encoding any of the ncRNAs, donor DNAs, and/or reverse transcriptases, and/or other proteins that can facilitate editing can be linked to a barcode that is inserted into a genome and can be recovered by sequencing. In this way, many variables can be identified and evaluated in the same population of phage to assess relative integration frequency.
In addition, adapter sequences can be added to retron constructs to facilitate high-throughput amplification or sequencing. For example, a pair of adapter sequences can be added at the 5’ and 3’ ends of a retron construct to allow amplification or sequencing of multiple retron constructs simultaneously by the same set of primers.
Amplification of retron constructs may be performed, for example, before transfection of cells or ligation into vectors. Any method for amplifying the retron constructs may be used, including, but not limited to polymerase chain reaction (PCR), isothermal amplification, nucleic acid sequence-based amplification (NASBA), transcription mediated amplification (TMA), strand displacement amplification (SDA), and ligase chain reaction (LCR). In one embodiment, the retron constructs comprise common 5’ and 3’ priming sites to allow amplification of retron sequences in parallel with a set of universal primers. In another embodiment, a set of selective primers is used to selectively amplify a subset of retron sequences from a pooled mixture.
In various embodiments, the engineered ncRNA may comprise one or more guide sequences. In certain embodiments, the guide RNA can be inserted into the a1/a2 complementarity region of the retron, which is the region of the ncRNA structure where the 5’ and 3’ ends of the ncRNA fold back upon themselves.
In one embodiment, the guide RNA can be coupled to the 3’ end of the ncRNA in the a1/a2 region. In another embodiment, the guide RNA can be coupled to the 5’ end of the ncRNA in the a1/a2 region. In one embodiment, a guide RNA can be coupled to each of the 5’ and 3’ ends of the ncRNA. In various embodiments, a linker may separate the 3’ or 5’ retron end, as the case may be, and the guide RNA. The linker may be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 or more nucleotides in length. For instance, in one non-limiting embodiment, the gRNA is coupled to the 3’ end of the a1/a2 region.
The guide RNA may include a nucleotide sequence that is complementary to a genomic target sequence (i.e., a “spacer” sequence), and thereby mediates binding of the RNA-guided nuclease to which it is complexed (e.g., a Cas9 nuclease-gRNA complex) by hybridization between the spacer sequence and a complementary strand of the genomic target site. For example, the gRNA can be designed with a sequence complementary to the sequence of a mutant genomic allele to target the nuclease-gRNA complex to the site of a mutation. The mutation may comprise an insertion, a deletion, or a substitution. For example, the mutation may include a single nucleotide variation, gene fusion, translocation, inversion, duplication, frameshift, missense, nonsense, or other mutation associated with a phenotype or disease of interest. The targeted allele may be a common genetic variant or a rare genetic variant. In certain embodiments, the gRNA is designed to selectively bind to an allele with single base-pair discrimination, for example, to allow binding of the nuclease-gRNA complex to a single nucleotide polymorphism (SNP) and modification of the SNP. In particular, the gRNA may be designed to target disease-relevant mutations of interest for the purpose of genome editing to remove the mutation from a gene.
The guide RNA can include a trans-activating crRNA (tracrRNA) scaffold recognized by a catalytically active RNA-guided nuclease (e.g., Cas9 nuclease). A guide RNA has a sequence complementary to the target DNA site, often referred to as a CRISPR RNA (crRNA), and a trans-activating crRNA (tracrRNA) scaffold that is recognized by a catalytically active Cas9 protein. The tracrRNA is made up of a longer stretch of bases that are constant and provide the “stem loop” structure bound by the CRISPR nuclease. The crRNA can anneal to the tracrRNA through a direct repeat sequence to form a dual-guide RNA (dgRNA), or the crRNA-tracrRNA can be expressed as a single RNA transcript. When these RNA components hybridize they form a guide RNA which “programmably” targets CRISPR nucleases to DNA sequences depending on the complementarity of the crRNA and the presence of other DNA features (e.g., PAM sequences recognized by the nuclease). The guide RNA may be a single guide RNA comprising crRNA and tracrRNA sequences in a single RNA molecule, or the guide RNA may comprise two RNA molecules with crRNA and tracrRNA sequences residing in separate RNA molecules. In certain embodiments, the gRNA is 5-50 nucleotides, 10-30 nucleotides, 15-25 nucleotides, 18-22 nucleotides, or 18-21 nucleotides in length, or any length between the stated ranges, including, for example, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 nucleotides in length. For example, as illustrated herein, 20-base gRNAs can be useful for editing human cells, whereas 18-base gRNAs were used in many experiments for editing yeast cells.
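The dependence of targeting on both spacer complementarity and an adjacent PAM can be illustrated with a short search routine. This is a hedged sketch, not a guide design method of this disclosure: it assumes an NGG PAM located immediately 3’ of the protospacer (the motif commonly associated with Streptococcus pyogenes Cas9), scans only the strand supplied, and uses a hypothetical function name and example sequence. The `spacer_len` parameter can be set to 20 or 18 to mirror the gRNA lengths mentioned above.

```python
import re


def find_spacers(seq, spacer_len=20, pam_regex="[ACGT]GG"):
    """Return (start, protospacer, PAM) for every site on the given strand
    where a protospacer of `spacer_len` bases is immediately followed by a
    PAM. The NGG default is an assumption of this sketch."""
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all found.
    pattern = re.compile(r"(?=([ACGT]{%d})(%s))" % (spacer_len, pam_regex))
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Hypothetical target region: 2 flanking bases, a 20-base protospacer,
# a TGG PAM, and 2 trailing bases.
target = "TT" + "ACGT" * 5 + "TGG" + "AA"
hits = find_spacers(target)
```

A fuller treatment would also scan the reverse complement and score candidate spacers for off-target matches, both of which are omitted here.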
Examples of various CRISPR/Cas guide RNAs (as well as information regarding requirements related to protospacer adjacent motif (PAM) sequences present in targeted nucleic acids) can be found in the art, for example, see Jinek et al., Science. 2012 Aug 17;337(6096):816-21; Chylinski et al., RNA Biol. 2013 May;10(5):726-37; Ma et al., Biomed Res Int. 2013;2013:270805; Hou et al., Proc Natl Acad Sci U S A. 2013 Sep 24;110(39):15644-9; Jinek et al., Elife. 2013;2:e00471; Pattanayak et al., Nat Biotechnol. 2013 Sep;31(9):839-43; Qi et al., Cell. 2013 Feb 28;152(5):1173-83; Wang et al., Cell. 2013 May 9;153(4):910-8; Auer et al., Genome Res. 2013 Oct 31; Chen et al., Nucleic Acids Res. 2013 Nov 1;41(20):e19; Cheng et al., Cell Res. 2013 Oct;23(10):1163-71; Cho et al., Genetics. 2013 Nov;195(3):1177-80; DiCarlo et al., Nucleic Acids Res. 2013 Apr;41(7):4336-43; Dickinson et al., Nat Methods. 2013 Oct;10(10):1028-34; Ebina et al., Sci Rep. 2013;3:2510; Fujii et al., Nucleic Acids Res. 2013 Nov 1;41(20):e187; Hu et al., Cell Res. 2013 Nov;23(11):1322-5; Jiang et al., Nucleic Acids Res. 2013 Nov 1;41(20):e188; Larson et al., Nat Protoc. 2013 Nov;8(11):2180-96; Mali et al., Nat Methods. 2013 Oct;10(10):957-63; Nakayama et al., Genesis. 2013 Dec;51(12):835-43; Ran et al., Nat Protoc. 2013 Nov;8(11):2281-308; Ran et al., Cell. 2013 Sep 12;154(6):1380-9; Upadhyay et al., G3 (Bethesda). 2013 Dec 9;3(12):2233-8; Walsh et al., Proc Natl Acad Sci U S A. 2013 Sep 24;110(39):15514-5; Xie et al., Mol Plant. 2013 Oct 9; Yang et al., Cell. 2013 Sep 12;154(6):1370-9; Briner et al., Mol Cell. 2014 Oct 23;56(2):333-9; and U.S.
patents and patent applications: 8,906,616; 8,895,308; 8,889,418; 8,889,356; 8,871,445; 8,865,406; 8,795,965; 8,771,945; 8,697,359; 20140068797; 20140170753; 20140179006; 20140179770; 20140186843; 20140186919; 20140186958; 20140189896; 20140227787; 20140234972; 20140242664; 20140242699; 20140242700; 20140242702; 20140248702; 20140256046; 20140273037; 20140273226; 20140273230; 20140273231; 20140273232; 20140273233; 20140273234; 20140273235; 20140287938; 20140295556; 20140295557; 20140298547; 20140304853; 20140309487; 20140310828; 20140310830; 20140315985; 20140335063; 20140335620; 20140342456; 20140342457; 20140342458; 20140349400; 20140349405; 20140356867; 20140356956; 20140356958; 20140356959; 20140357523; 20140357530; 20140364333; and 20140377868; all of which are hereby incorporated by reference in their entireties.
DONOR/REPAIR TEMPLATE MODIFICATIONS
In various other embodiments, the ncRNA may also be modified to include a nucleotide sequence that is reverse transcribed to form the donor/repair template. The repair template has a sequence that binds to a genomic DNA locus. Hence, in DNA form, the repair template sequence can be complementary to at least one chromosomal DNA strand. In some embodiments, the repair template is an HDR donor sequence which conducts repair of a DNA break by way of the homology-dependent repair pathway.
However, the donor/repair template has at least one nucleotide that is different from the complementary target sequence. In some cases, the donor/repair template has at least two nucleotides, or at least three nucleotides, or at least four nucleotides, or at least five nucleotides, or more that are different from the complementary target sequence. These ‘different’ nucleotides are the repair nucleotides that can replace nucleotides or sequences (e.g., mutations) in the target chromosomal site.
The donor/repair template segment of the ncRNA can have repair nucleotides that are adjacent to each other, or repair nucleotides that are separate from each other within the repair template segment. Such separations are warranted, for example, when the target chromosomal locus has two or more mutations that are not adjacent to each other.
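The notion of repair nucleotides, and of whether they are adjacent or separated, can be made concrete with a position-by-position comparison. The sketch below is illustrative only: it assumes the donor and the target are already aligned, equal in length, and written on the same strand, so it covers substitutions but not insertions or deletions. The function names and sequences are hypothetical.

```python
def repair_positions(target, donor):
    """Return the 0-based positions at which the donor differs from the
    aligned target. Assumes equal-length, same-strand sequences."""
    if len(target) != len(donor):
        raise ValueError("this sketch handles equal-length sequences only")
    pairs = zip(target.upper(), donor.upper())
    return [i for i, (t, d) in enumerate(pairs) if t != d]


def are_adjacent(positions):
    """True when all repair nucleotides form one contiguous run."""
    return all(b - a == 1 for a, b in zip(positions, positions[1:]))


# Hypothetical target site and a donor carrying two separated changes.
target_site = "ATGGCTAGCTAGGATCC"
donor_seq = "ATGGCTTGCTAGGCTCC"
changed = repair_positions(target_site, donor_seq)
```

Here `changed` lists two positions that are not contiguous, which corresponds to the case of a locus with two non-adjacent mutations addressed by a single repair template.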
The repair template or heterologous sequence of interest must be sufficiently complementary for hybridization to the target sequence to mediate homologous recombination between the donor polynucleotide and genomic DNA at the target locus. For example, a homology arm may comprise a nucleotide sequence having at least about 80-100% sequence identity/complementarity to the corresponding genomic target sequence, including any percent identity within this range, such as at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity thereto, wherein the nucleotide sequence comprising the intended edit can be integrated into the genomic DNA by HDR at the genomic target locus recognized (i.e., having sufficient complementarity for hybridization) by the 5' and 3' arms of the repair template.
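As a worked illustration of the percent-identity ranges recited above, the following sketch computes identity for a homology arm against its genomic counterpart and compares it with an 80% floor. It assumes an ungapped, equal-length comparison, which is a simplification; identity figures in practice are usually derived from a sequence alignment. The function name, the threshold constant, and the sequences are hypothetical.

```python
MIN_IDENTITY_PERCENT = 80.0  # lower bound of the range recited above


def percent_identity(arm, genomic):
    """Percent of matching positions between two equal-length sequences.
    Ungapped comparison only; an assumption of this sketch."""
    if len(arm) != len(genomic) or not arm:
        raise ValueError("sequences must be non-empty and equal in length")
    matches = sum(a == g for a, g in zip(arm.upper(), genomic.upper()))
    return 100.0 * matches / len(arm)


# Hypothetical 20-base homology arm with two mismatches to the genome.
genomic_arm = "ACGTACGTACGTACGTACGT"
donor_arm = "ACGTTCGTACGTACGAACGT"
identity = percent_identity(donor_arm, genomic_arm)
meets_floor = identity >= MIN_IDENTITY_PERCENT
```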
The corresponding homologous nucleotide sequences in the genomic target sequence (i.e., the "5' target sequence" and "3' target sequence") flank a specific site for cleavage and/or a specific site for introducing the intended edit. The distance between the specific cleavage site and the homologous nucleotide sequences (e.g., each homology arm of the repair template) can be several hundred nucleotides. In some embodiments, the distance between a homology arm and the cleavage site is 200 nucleotides or less (e.g., 0, 10, 20, 30, 50, 75, 100, 125, 150, 175, or 200 nucleotides). In most cases, a smaller distance may give rise to a higher gene targeting rate. In a preferred embodiment, the repair template is substantially identical to the target genomic sequence across its entire length, except for the sequence changes to be introduced to a portion of the genome that encompasses both the specific cleavage site and the portions of the genomic target sequence to be altered.
A homology arm of the repair template can be of any length, e.g., 10 nucleotides or more, 15 nucleotides or more, 20 nucleotides or more, 50 nucleotides or more, 100 nucleotides or more, 250 nucleotides or more, 300 nucleotides or more, 350 nucleotides or more, 400 nucleotides or more, 450 nucleotides or more, 500 nucleotides or more, 1000 nucleotides (1 kb) or more, 5000 nucleotides (5 kb) or more, 10000 nucleotides (10 kb) or more, etc. In some instances, the 5' and 3' homology arms are substantially equal in length to one another. However, in some instances the 5' and 3' homology arms are not necessarily equal in length to one another. For example, one homology arm may be 30% shorter or less than the other homology arm, 20% shorter or less than the other homology arm, 10% shorter or less than the other homology arm, 5% shorter or less than the other homology arm, 2% shorter or less than the other homology arm, or only a few nucleotides less than the other homology arm. In other instances, the 5' and 3' homology arms are substantially different in length from one another, e.g., one may be 40% shorter or more, 50% shorter or more, sometimes 60% shorter or more, 70% shorter or more, 80% shorter or more, 90% shorter or more, or 95% shorter or more than the other homology arm.
The donor/repair template segment of the ncRNA can therefore be of various lengths. In some cases, the repair template segment is at least 15 nucleotides, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 110, at least 120, at least 130, at least 140, at least 150, at least 160, at least 170, at least 180, at least 190, or at least 200 nucleotides in length.
In certain embodiments, the donor/repair template comprises or encodes a donor/template sequence, wherein the donor/template corrects, repairs, or removes a mutation at the target genome site. For example, the mutation may be a mutated exon in a disease gene.
In certain embodiments, the donor/repair template may encode or comprise a functional DNA element, such as a promoter, an enhancer, a protein binding sequence, a methylation site, or a homology region for assisting gene editing, etc. By “donor DNA” or “donor DNA template” it is meant a single-stranded DNA to be inserted at a site cleaved by a programmable nuclease (e.g., a CRISPR/Cas effector protein or otherwise RNA-guided nuclease; a TALEN; a ZFN) (e.g., after dsDNA cleavage, after nicking a target DNA, after dual nicking a target DNA, and the like). The donor DNA template can contain sufficient homology to a genomic sequence at the target site, e.g., 70%, 80%, 85%, 90%, 95%, or 100% homology with the nucleotide sequences flanking the target site, e.g., within about 50 bases or less of the target site, e.g., within about 30 bases, within about 15 bases, within about 10 bases, within about 5 bases, or immediately flanking the target site, to support homology-directed repair between it and the genomic sequence to which it bears homology.
Approximately 25, 50, 100, or 200 nucleotides, or more than 200 nucleotides, of sequence homology between a donor DNA template and a genomic sequence (or any integral value between 10 and 200 nucleotides, or more) can support homology-directed repair. Donor DNA template can be of any length, e.g., 50 nucleotides or more, 100 nucleotides or more, 250 nucleotides or more, 500 nucleotides or more, 1000 nucleotides or more, 5000 nucleotides or more, etc. A suitable donor DNA template can be from 50 nucleotides to 100 nucleotides, from 100 nucleotides to 500 nucleotides, from 500 nucleotides to 1000 nucleotides, from 1000 nucleotides to 5000 nucleotides, or from 5000 nucleotides to 10,000 nucleotides, or more than 10,000 nucleotides, in length.
A donor DNA template can comprise a first homology arm and a second homology arm. The first homology arm is at or near the 5’ end of the donor DNA, and comprises a nucleotide sequence that is at least partially complementary to a first nucleotide sequence in a target nucleic acid. The second homology arm is at or near the 3’ end of the donor DNA, and comprises a nucleotide sequence that is at least partially complementary to a second nucleotide sequence in the target nucleic acid. The first and second homology arms can each independently have a length of from about 10 nucleotides to 400 nucleotides; e.g., from 10 nucleotides (nt) to 15 nt, from 15 nt to 20 nt, from 20 nt to 25 nt, from 25 nt to 30 nt, from 30 nt to 35 nt, from 35 nt to 40 nt, from 40 nt to 45 nt, from 45 nt to 50 nt, from 50 nt to 75 nt, from 75 nt to 100 nt, from 100 nt to 125 nt, from 125 nt to 150 nt, from 150 nt to 175 nt, from 175 nt to 200 nt, from 200 nt to 225 nt, from 225 nt to 250 nt, from 250 nt to 275 nt, from 275 nt to 300 nt, from 300 nt to 325 nt, from 325 nt to 350 nt, from 350 nt to 375 nt, or from 375 nt to 400 nt.
In certain embodiments, the donor DNA template is used for editing the target nucleotide sequence. In certain embodiments, the donor DNA template comprises one or more mutations to be introduced into the target polynucleotide. Examples of such mutations include substitutions, deletions, insertions, or a combination thereof. In certain embodiments, the mutation causes a shift in an open reading frame on the target polynucleotide. In certain embodiments, the donor polynucleotide alters a stop codon in the target polynucleotide. In certain embodiments, the donor polynucleotide corrects a premature stop codon. The correction can be achieved by deleting the stop codon, or by introducing one or more sequence changes to alter the stop codon to a sense (amino acid-encoding) codon. In certain embodiments, the donor polynucleotide addresses loss of function mutations, deletions, or translocations that may occur, for example, in certain disease contexts by inserting or restoring a functional copy of a gene, or functional fragment thereof, or a functional regulatory sequence or functional fragment of a regulatory sequence. A functional fragment includes a fragment that is less than the entire copy of a gene but otherwise provides sufficient nucleotide sequence to restore the functionality of a wild type gene or noncoding regulatory sequence (e.g., sequences encoding long non-coding RNA).
In certain embodiments, the donor DNA template may be used to replace a single allele of a defective gene or defective fragment thereof. In another embodiment, the donor DNA template is used to replace both alleles of a defective gene or defective gene fragment. A “defective gene” or “defective gene fragment” is a gene or portion of a gene that when expressed, fails to generate a functioning protein or non-coding RNA with functionality of the corresponding wild-type gene.
In certain example embodiments, these defective genes may be associated with one or more disease phenotypes. In certain example embodiments, the defective gene or gene fragment is not replaced but the heterologous nucleic acid is used to insert donor polynucleotides that encode gene or gene fragments that compensate for or override defective gene expression such that cell phenotypes associated with defective gene expression are eliminated or changed to a different or desired cellular phenotype. This can be achieved by including the coding sequence of a therapeutic protein, such as a therapeutic antibody or functional fragment thereof, or a wild-type version of a defective protein associated with one or more disease phenotypes.
In certain embodiments, the donor may include, but not be limited to, genes or gene fragments, encoding proteins or RNA transcripts to be expressed, regulatory elements, repair templates, and the like. According to the invention, the donor polynucleotides may comprise left end and right end sequence elements that function with transposition components that mediate insertion. In certain embodiments, the donor DNA template manipulates a splicing site on the target polynucleotide. In certain embodiments, the donor DNA template disrupts a splicing site. The disruption may be achieved by inserting the polynucleotide to a splicing site and/or introducing one or more mutations to the splicing site. In certain embodiments, the donor polynucleotide may restore a splicing site. For example, the polynucleotide may comprise a splicing site sequence.
In certain embodiments, the donor DNA template to be inserted has a size from 10 bp to 50 kb in length, e.g., from 50 bp to ~40 kb, from 100 bp to ~30 kb, from 100 bp to ~10 kb, from 100 bp to 300 bp, from 200 bp to 400 bp, from 300 bp to 500 bp, from 400 bp to 600 bp, from 500 bp to 700 bp, from 600 bp to 800 bp, from 700 bp to 900 bp, from 800 bp to 1000 bp, from 900 bp to 1100 bp, from 1000 bp to 1200 bp, from 1100 bp to 1300 bp, from 1200 bp to 1400 bp, from 1300 bp to 1500 bp, from 1400 bp to 1600 bp, from 1500 bp to 1700 bp, from 1600 bp to 1800 bp, from 1700 bp to 1900 bp, or from 1800 bp to 2000 bp in length.
In certain embodiments, the homologous arm on one or both ends of the sequence to be inserted is independently about 20 bp, 40 bp, 60 bp, 80 bp, 100 bp, 120 bp, or 150 bp.
The first homology arm and the second homology arm of the donor DNA flank a nucleotide sequence (“a nucleotide sequence of interest” or “an intervening nucleotide sequence”) that is to be introduced into a target nucleic acid. The nucleotide sequence of interest can comprise: i) a nucleotide sequence encoding a polypeptide of interest; ii) a nucleotide sequence encoding an exon of a gene; iii) a promoter sequence; iv) an enhancer sequence; v) a nucleotide sequence encoding a non-coding RNA; or vi) any combination of the foregoing.
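The arrangement described above, in which two homology arms flank an intervening nucleotide sequence, can be sketched as a simple concatenation. The example is illustrative only and rests on labeled assumptions: the arm sequences are hypothetical, the intervening sequence is six CAC (histidine) codons chosen as a stand-in for a short insert, and the helper names are not taken from this disclosure. A reverse-complement helper is included because a single-stranded donor may need to be written on either strand.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing A, C, G, and T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def build_donor(five_prime_arm, intervening, three_prime_arm):
    """Concatenate 5' arm, intervening sequence, and 3' arm (5' to 3')."""
    return (five_prime_arm + intervening + three_prime_arm).upper()


# Hypothetical 10-base arms flanking an 18-base intervening sequence.
left_arm = "ACGTACGTAC"
right_arm = "TTGACCATGG"
insert_seq = "CAC" * 6
donor = build_donor(left_arm, insert_seq, right_arm)
donor_other_strand = reverse_complement(donor)
```

Arms of unequal length can be passed in the same way, since the routine places no constraint on the relative lengths of the two arms.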
The donor DNA can provide for gene correction, gene replacement, gene tagging, transgene insertion, nucleotide deletion, gene disruption, gene mutation, etc. For example, the donor DNA can be used to add, e.g., insert or replace, nucleic acid material to a target DNA (e.g. to “knock in” a nucleic acid that encodes a protein, an siRNA, an miRNA, etc.), to add a tag (e.g., 6xHis, a fluorescent protein (e.g., a green fluorescent protein; a yellow fluorescent protein, etc.), hemagglutinin (HA), FLAG, etc.), to add a regulatory sequence to a gene (e.g. promoter, polyadenylation signal, internal ribosome entry sequence (IRES), 2A peptide, start codon, stop codon, splice signal, localization signal, enhancer, etc.), to modify a nucleic acid sequence (e.g., introduce a mutation), and the like. For example, the donor DNA can be used to modify DNA in a site-specific, i.e. “targeted”, way; for example gene knock-out, gene knock-in, gene editing, gene tagging, etc., as used in, for example, gene therapy, e.g. to treat a disease; or as an antiviral, antipathogenic, or anticancer therapeutic, the production of genetically modified organisms in agriculture, the large scale production of proteins by cells for therapeutic, diagnostic, or research purposes, the induction of pluripotent stem cells, biological research, the targeting of genes of pathogens for deletion or replacement, etc.
In some cases, the donor DNA comprises a nucleotide sequence encoding a polypeptide of interest. Polypeptides of interest include, e.g., a) functional versions of a polypeptide that comprises one or more amino acid substitutions, insertions, and/or deletions and that exhibits reduced function, e.g., where the reduced function is associated with or causes a pathological condition; b) fluorescent polypeptides; c) hormones; d) receptors for ligands; e) ion channels; f) neurotransmitters; g) and the like.
Non-limiting examples of polypeptides that can be encoded by a donor DNA include, e.g., IL1B (interleukin 1, beta), XDH (xanthine dehydrogenase), TP53 (tumor protein p53), PTGIS (prostaglandin I2 (prostacyclin) synthase), MB (myoglobin), IL4 (interleukin 4), ANGPT1 (angiopoietin 1), ABCG8 (ATP-binding cassette, sub-family G (WHITE), member 8), CTSK (cathepsin K), PTGIR (prostaglandin I2 (prostacyclin) receptor (IP)), KCNJ11 (potassium inwardly-rectifying channel, subfamily J, member 11), INS (insulin), CRP (C-reactive protein, pentraxin-related), PDGFRB (platelet-derived growth factor receptor, beta polypeptide), CCNA2 (cyclin A2), PDGFB (platelet-derived growth factor beta polypeptide (simian sarcoma viral (v-sis) oncogene homolog)), KCNJ5 (potassium inwardly-rectifying channel, subfamily J, member 5), KCNN3 (potassium intermediate/small conductance calcium-activated channel, subfamily N, member 3), CAPN10 (calpain 10), PTGES (prostaglandin E synthase), ADRA2B (adrenergic, alpha-2B-, receptor), ABCG5 (ATP-binding cassette, sub-family G (WHITE), member 5), PRDX2 (peroxiredoxin 2), CAPN5 (calpain 5), PARP14 (poly (ADP-ribose) polymerase family, member 14), MEX3C (mex-3 homolog C (C.
elegans)), ACE (angiotensin I converting enzyme (peptidyl-dipeptidase A) 1), TNF (tumor necrosis factor (TNF superfamily, member 2)), IL6 (interleukin 6 (interferon, beta 2)), STN (statin), SERPINE1 (serpin peptidase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 1), ALB (albumin), ADIPOQ (adiponectin, C1Q and collagen domain containing), APOB (apolipoprotein B (including Ag(x) antigen)), APOE (apolipoprotein E), LEP (leptin), MTHFR (5,10-methylenetetrahydrofolate reductase (NADPH)), APOA1 (apolipoprotein A-I), EDN1 (endothelin 1), NPPB (natriuretic peptide precursor B), NOS3 (nitric oxide synthase 3 (endothelial cell)), PPARG (peroxisome proliferator-activated receptor gamma), PLAT (plasminogen activator, tissue), PTGS2 (prostaglandin-endoperoxide synthase 2 (prostaglandin G/H synthase and cyclooxygenase)), CETP (cholesteryl ester transfer protein, plasma), AGTR1 (angiotensin II receptor, type 1), HMGCR (3-hydroxy-3-methylglutaryl-Coenzyme A reductase), IGF1 (insulin-like growth factor 1 (somatomedin C)), SELE (selectin E), REN (renin), PPARA (peroxisome proliferator-activated receptor alpha), PON1 (paraoxonase 1), KNG1 (kininogen 1), CCL2 (chemokine (C-C motif) ligand 2), LPL (lipoprotein lipase), vWF (von Willebrand factor), F2 (coagulation factor II (thrombin)), ICAM1 (intercellular adhesion molecule 1), TGFB1 (transforming growth factor, beta 1), NPPA (natriuretic peptide precursor A), IL10 (interleukin 10), EPO (erythropoietin), SOD1 (superoxide dismutase 1, soluble), VCAM1 (vascular cell adhesion molecule 1), IFNG (interferon, gamma), LPA (lipoprotein, Lp(a)), MPO (myeloperoxidase), ESR1 (estrogen receptor 1), MAPK1 (mitogen-activated protein kinase 1), HP (haptoglobin), F3 (coagulation factor III (thromboplastin, tissue factor)), CST3 (cystatin C), COG2 (component of oligomeric Golgi complex 2), MMP9 (matrix metallopeptidase 9 (gelatinase B, 92 kDa gelatinase, 92 kDa type IV collagenase)), SERPINC1 (serpin peptidase
inhibitor, clade C (antithrombin), member 1), F8 (coagulation factor VIII, procoagulant component), HMOX1 (heme oxygenase (decycling) 1), APOC3 (apolipoprotein C-III), IL8 (interleukin 8), PROK1 (prokineticin 1), CBS (cystathionine-beta-synthase), NOS2 (nitric oxide synthase 2, inducible), TLR4 (toll-like receptor 4), SELP (selectin P (granule membrane protein 140 kDa, antigen CD62)), ABCA1 (ATP-binding cassette, sub-family A (ABC1), member 1), AGT (angiotensinogen (serpin peptidase inhibitor, clade A, member 8)), LDLR (low density lipoprotein receptor), GPT (glutamic-pyruvate transaminase (alanine aminotransferase)), VEGFA (vascular endothelial growth factor A), NR3C2 (nuclear receptor subfamily 3, group C, member 2), IL18 (interleukin 18 (interferon-gamma-inducing factor)), NOS1 (nitric oxide synthase 1 (neuronal)), NR3C1 (nuclear receptor subfamily 3, group C, member 1 (glucocorticoid receptor)), FGB (fibrinogen beta chain), HGF (hepatocyte growth factor (hepapoietin A; scatter factor)), IL1A (interleukin 1, alpha), RETN (resistin), AKT1 (v-akt murine thymoma viral oncogene homolog 1), LIPC (lipase, hepatic), HSPD1 (heat shock 60 kDa protein 1 (chaperonin)), MAPK14 (mitogen-activated protein kinase 14), SPP1 (secreted phosphoprotein 1), ITGB3 (integrin, beta 3 (platelet glycoprotein IIIa, antigen CD61)), CAT (catalase), UTS2 (urotensin 2), THBD (thrombomodulin), F10 (coagulation factor X), CP (ceruloplasmin (ferroxidase)), TNFRSF11B (tumor necrosis factor receptor superfamily, member 11b), EDNRA (endothelin receptor type A), EGFR (epidermal growth factor receptor (erythroblastic leukemia viral (v-erb-b) oncogene homolog, avian)), MMP2 (matrix metallopeptidase 2 (gelatinase A, 72 kDa gelatinase, 72 kDa type IV collagenase)), PLG (plasminogen), NPY (neuropeptide Y), RHOD (ras homolog gene family, member D), MAPK8 (mitogen-activated protein kinase 8), MYC (v-myc myelocytomatosis viral oncogene homolog (avian)), FN1 (fibronectin 1), CMA1 (chymase 1, mast cell),
PLAU (plasminogen activator, urokinase), GNB3 (guanine nucleotide binding protein (G protein), beta polypeptide 3), ADRB2 (adrenergic, beta-2-, receptor, surface), APOA5 (apolipoprotein A-V), SOD2 (superoxide dismutase 2, mitochondrial), F5 (coagulation factor V (proaccelerin, labile factor)), VDR (vitamin D (1,25-dihydroxyvitamin D3) receptor), ALOX5 (arachidonate 5-lipoxygenase), HLA-DRB1 (major histocompatibility complex, class II, DR beta 1), PARP1 (poly (ADP-ribose) polymerase 1), CD40LG (CD40 ligand), PON2 (paraoxonase 2), AGER (advanced glycosylation end product-specific receptor), IRS1 (insulin receptor substrate 1), PTGS1 (prostaglandin-endoperoxide synthase 1 (prostaglandin G/H synthase and cyclooxygenase)), ECE1 (endothelin converting enzyme 1), F7 (coagulation factor VII (serum prothrombin conversion accelerator)), IL1RN (interleukin 1 receptor antagonist), EPHX2 (epoxide hydrolase 2, cytoplasmic), IGFBP1 (insulin-like growth factor binding protein 1), MAPK10 (mitogen-activated protein kinase 10), FAS (Fas (TNF receptor superfamily, member 6)), ABCB1 (ATP-binding cassette, sub-family B (MDR/TAP), member 1), JUN (jun oncogene), IGFBP3 (insulin-like growth factor binding protein 3), CD14 (CD14 molecule), PDE5A (phosphodiesterase 5A, cGMP-specific), AGTR2 (angiotensin II receptor, type 2), CD40 (CD40 molecule, TNF receptor superfamily member 5), LCAT (lecithin-cholesterol acyltransferase), CCR5 (chemokine (C-C motif) receptor 5), MMP1 (matrix metallopeptidase 1 (interstitial collagenase)), TIMP1 (TIMP metallopeptidase inhibitor 1), ADM (adrenomedullin), DYT10 (dystonia 10), STAT3 (signal transducer and activator of transcription 3 (acute-phase response factor)), MMP3 (matrix metallopeptidase 3 (stromelysin 1, progelatinase)), ELN (elastin), USF1 (upstream transcription factor 1), CFH (complement factor H), HSPA4 (heat shock 70 kDa protein 4), MMP12 (matrix metallopeptidase 12 (macrophage elastase)), MME (membrane metallo-endopeptidase), F2R
(coagulation factor II (thrombin) receptor), SELL (selectin L), CTSB (cathepsin B), ANXA5 (annexin A5), ADRB1 (adrenergic, beta-1-, receptor), CYBA (cytochrome b-245, alpha polypeptide), FGA (fibrinogen alpha chain), GGT1 (gamma-glutamyltransferase 1), LIPG (lipase, endothelial), HIF1A (hypoxia inducible factor 1, alpha subunit (basic helix-loop-helix transcription factor)), CXCR4 (chemokine (C-X-C motif) receptor 4), PROC (protein C (inactivator of coagulation factors Va and VIIIa)), SCARB1 (scavenger receptor class B, member 1), CD79A (CD79a molecule, immunoglobulin-associated alpha), PLTP (phospholipid transfer protein), ADD1 (adducin 1 (alpha)), FGG (fibrinogen gamma chain), SAA1 (serum amyloid A1), KCNH2 (potassium voltage-gated channel, subfamily H (eag-related), member 2), DPP4 (dipeptidyl-peptidase 4), G6PD (glucose-6-phosphate dehydrogenase), NPR1 (natriuretic peptide receptor A/guanylate cyclase A (atrionatriuretic peptide receptor A)), VTN (vitronectin), KIAA0101 (KIAA0101), FOS (FBJ murine osteosarcoma viral oncogene homolog), TLR2 (toll-like receptor 2), PPIG (peptidylprolyl isomerase G (cyclophilin G)), IL1R1 (interleukin 1 receptor, type I), AR (androgen receptor), CYP1A1 (cytochrome P450, family 1, subfamily A, polypeptide 1), SERPINA1 (serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 1), MTR (5-methyltetrahydrofolate-homocysteine methyltransferase), RBP4 (retinol binding protein 4, plasma), APOA4 (apolipoprotein A-IV), CDKN2A (cyclin-dependent kinase inhibitor 2A (melanoma, p16, inhibits CDK4)), FGF2 (fibroblast growth factor 2 (basic)), EDNRB (endothelin receptor type B), ITGA2 (integrin, alpha 2 (CD49B, alpha 2 subunit of VLA-2 receptor)), CABIN1 (calcineurin binding protein 1), SHBG (sex hormone-binding globulin), HMGB1 (high-mobility group box 1), HSP90B2P (heat shock protein 90 kDa beta (Grp94), member 2 (pseudogene)), CYP3A4 (cytochrome P450, family 3, subfamily A, polypeptide 4), GJA1 (gap junction
protein, alpha 1, 43 kDa), CAV1 (caveolin 1, caveolae protein, 22 kDa), ESR2 (estrogen receptor 2 (ER beta)), LTA (lymphotoxin alpha (TNF superfamily, member 1)), GDF15 (growth differentiation factor 15), BDNF (brain-derived neurotrophic factor), CYP2D6 (cytochrome P450, family 2, subfamily D, polypeptide 6), NGF (nerve growth factor (beta polypeptide)), SP1 (Sp1 transcription factor), TGIF1 (TGFB-induced factor homeobox 1), SRC (v-src sarcoma (Schmidt-Ruppin A-2) viral oncogene homolog (avian)), EGF (epidermal growth factor (beta-urogastrone)), PIK3CG (phosphoinositide-3-kinase, catalytic, gamma polypeptide), HLA-A (major histocompatibility complex, class I, A), KCNQ1 (potassium voltage-gated channel, KQT-like subfamily, member 1), CNR1 (cannabinoid receptor 1 (brain)), FBN1 (fibrillin 1), CHKA (choline kinase alpha), BEST1 (bestrophin 1), APP (amyloid beta (A4) precursor protein), CTNNB1 (catenin (cadherin-associated protein), beta 1, 88 kDa), IL2 (interleukin 2), CD36 (CD36 molecule (thrombospondin receptor)), PRKAB1 (protein kinase, AMP-activated, beta 1 non-catalytic subunit), TPO (thyroid peroxidase), ALDH7A1 (aldehyde dehydrogenase 7 family, member A1), CX3CR1 (chemokine (C-X3-C motif) receptor 1), TH (tyrosine hydroxylase), F9 (coagulation factor IX), GH1 (growth hormone 1), TF (transferrin), HFE (hemochromatosis), IL17A (interleukin 17A), PTEN (phosphatase and tensin homolog), GSTM1 (glutathione S-transferase mu 1), DMD (dystrophin), GATA4 (GATA binding protein 4), F13A1 (coagulation factor XIII, A1 polypeptide), TTR (transthyretin), FABP4 (fatty acid binding protein 4, adipocyte), PON3 (paraoxonase 3), APOC1 (apolipoprotein C-I), INSR (insulin receptor), TNFRSF1B (tumor necrosis factor receptor superfamily, member 1B), HTR2A (5-hydroxytryptamine (serotonin) receptor 2A), CSF3 (colony stimulating factor 3 (granulocyte)), CYP2C9 (cytochrome P450, family 2, subfamily C, polypeptide 9), TXN (thioredoxin), CYP11B2 (cytochrome P450, family 11, subfamily B,
polypeptide 2), PTH (parathyroid hormone), CSF2 (colony stimulating factor 2 (granulocyte-macrophage)), KDR (kinase insert domain receptor (a type III receptor tyrosine kinase)), PLA2G2A (phospholipase A2, group IIA (platelets, synovial fluid)), B2M (beta-2-microglobulin), THBS1 (thrombospondin 1), GCG (glucagon), RHOA (ras homolog gene family, member A), ALDH2 (aldehyde dehydrogenase 2 family (mitochondrial)), TCF7L2 (transcription factor 7-like 2 (T-cell specific, HMG-box)), BDKRB2 (bradykinin receptor B2), NFE2L2 (nuclear factor (erythroid-derived 2)-like 2), NOTCH1 (Notch homolog 1, translocation-associated (Drosophila)), UGT1A1 (UDP glucuronosyltransferase 1 family, polypeptide A1), IFNA1 (interferon, alpha 1), PPARD (peroxisome proliferator-activated receptor delta), SIRT1 (sirtuin (silent mating type information regulation 2 homolog) 1 (S. cerevisiae)), GNRH1 (gonadotropin-releasing hormone 1 (luteinizing-releasing hormone)), PAPPA (pregnancy-associated plasma protein A, pappalysin 1), ARR3 (arrestin 3, retinal (X-arrestin)), NPPC (natriuretic peptide precursor C), AHSP (alpha hemoglobin stabilizing protein), PTK2 (PTK2 protein tyrosine kinase 2), IL13 (interleukin 13), MTOR (mechanistic target of rapamycin (serine/threonine kinase)), ITGB2 (integrin, beta 2 (complement component 3 receptor 3 and 4 subunit)), GSTT1 (glutathione S-transferase theta 1), IL6ST (interleukin 6 signal transducer (gp130, oncostatin M receptor)), CPB2 (carboxypeptidase B2 (plasma)), CYP1A2 (cytochrome P450, family 1, subfamily A, polypeptide 2), HNF4A (hepatocyte nuclear factor 4, alpha), SLC6A4 (solute carrier family 6 (neurotransmitter transporter, serotonin), member 4), PLA2G6 (phospholipase A2, group VI (cytosolic, calcium-independent)), TNFSF11 (tumor necrosis factor (ligand) superfamily, member 11), SLC8A1 (solute carrier family 8 (sodium/calcium exchanger), member 1), F2RL1 (coagulation factor II (thrombin) receptor-like 1), AKR1A1 (aldo-keto reductase family 1, member A1
(aldehyde reductase)), ALDH9A1 (aldehyde dehydrogenase 9 family, member A1), BGLAP (bone gamma-carboxyglutamate (gla) protein), MTTP (microsomal triglyceride transfer protein), MTRR (5-methyltetrahydrofolate-homocysteine methyltransferase reductase), SULT1A3 (sulfotransferase family, cytosolic, 1A, phenol-preferring, member 3), RAGE (renal tumor antigen), C4B (complement component 4B (Chido blood group)), P2RY12 (purinergic receptor P2Y, G-protein coupled, 12), RNLS (renalase, FAD-dependent amine oxidase), CREB1 (cAMP responsive element binding protein 1), POMC (proopiomelanocortin), RAC1 (ras-related C3 botulinum toxin substrate 1 (rho family, small GTP binding protein Rac1)), LMNA (lamin A/C), CD59 (CD59 molecule, complement regulatory protein), SCN5A (sodium channel, voltage-gated, type V, alpha subunit), CYP1B1 (cytochrome P450, family 1, subfamily B, polypeptide 1), MIF (macrophage migration inhibitory factor (glycosylation-inhibiting factor)), MMP13 (matrix metallopeptidase 13 (collagenase 3)), TIMP2 (TIMP metallopeptidase inhibitor 2), CYP19A1 (cytochrome P450, family 19, subfamily A, polypeptide 1), CYP21A2 (cytochrome P450, family 21, subfamily A, polypeptide 2), PTPN22 (protein tyrosine phosphatase, non-receptor type 22 (lymphoid)), MYH14 (myosin, heavy chain 14, non-muscle), MBL2 (mannose-binding lectin (protein C) 2, soluble (opsonic defect)), SELPLG (selectin P ligand), AOC3 (amine oxidase, copper containing 3 (vascular adhesion protein 1)), CTSL1 (cathepsin L1), PCNA (proliferating cell nuclear antigen), IGF2 (insulin-like growth factor 2 (somatomedin A)), ITGB1 (integrin, beta 1 (fibronectin receptor, beta polypeptide, antigen CD29 includes MDF2, MSK12)), CAST (calpastatin), CXCL12 (chemokine (C-X-C motif) ligand 12 (stromal cell-derived factor 1)), IGHE (immunoglobulin heavy constant epsilon), KCNE1 (potassium voltage-gated channel, Isk-related family, member 1), TFRC (transferrin receptor (p90, CD71)), COL1A1 (collagen, type I, alpha 1), COL1A2
(collagen, type I, alpha 2), IL2RB (interleukin 2 receptor, beta), PLA2G10 (phospholipase A2, group X), ANGPT2 (angiopoietin 2), PROCR (protein C receptor, endothelial (EPCR)), NOX4 (NADPH oxidase 4), HAMP (hepcidin antimicrobial peptide), PTPN11 (protein tyrosine phosphatase, non-receptor type 11), SLC2A1 (solute carrier family 2 (facilitated glucose transporter), member 1), IL2RA (interleukin 2 receptor, alpha), CCL5 (chemokine (C-C motif) ligand 5), IRF1 (interferon regulatory factor 1), CFLAR (CASP8 and FADD-like apoptosis regulator), CALCA (calcitonin-related polypeptide alpha), EIF4E (eukaryotic translation initiation factor 4E), GSTP1 (glutathione S-transferase pi 1), JAK2 (Janus kinase 2), CYP3A5 (cytochrome P450, family 3, subfamily A, polypeptide 5), HSPG2 (heparan sulfate proteoglycan 2), CCL3 (chemokine (C-C motif) ligand 3), MYD88 (myeloid differentiation primary response gene (88)), VIP (vasoactive intestinal peptide), SOAT1 (sterol O-acyltransferase 1), ADRBK1 (adrenergic, beta, receptor kinase 1), NR4A2 (nuclear receptor subfamily 4, group A, member 2), MMP8 (matrix metallopeptidase 8 (neutrophil collagenase)), NPR2 (natriuretic peptide receptor B/guanylate cyclase B (atrionatriuretic peptide receptor B)), GCH1 (GTP cyclohydrolase 1), EPRS (glutamyl-prolyl-tRNA synthetase), PPARGC1A (peroxisome proliferator-activated receptor gamma, coactivator 1 alpha), F12 (coagulation factor XII (Hageman factor)), PECAM1 (platelet/endothelial cell adhesion molecule), CCL4 (chemokine (C-C motif) ligand 4), SERPINA3 (serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 3), CASR (calcium-sensing receptor), GJA5 (gap junction protein, alpha 5, 40 kDa), FABP2 (fatty acid binding protein 2, intestinal), TTF2 (transcription termination factor, RNA polymerase II), PROS1 (protein S (alpha)), CTF1 (cardiotrophin 1), SGCB (sarcoglycan, beta (43 kDa dystrophin-associated glycoprotein)), YME1L1 (YME1-like 1 (S.
cerevisiae)), CAMP (cathelicidin antimicrobial peptide), ZC3H12A (zinc finger CCCH-type containing 12A), AKR1B1 (aldo-keto reductase family 1, member B1 (aldose reductase)), DES (desmin), MMP7 (matrix metallopeptidase 7 (matrilysin, uterine)), AHR (aryl hydrocarbon receptor), CSF1 (colony stimulating factor 1 (macrophage)), HDAC9 (histone deacetylase 9), CTGF (connective tissue growth factor), KCNMA1 (potassium large conductance calcium-activated channel, subfamily M, alpha member 1), UGT1A (UDP glucuronosyltransferase 1 family, polypeptide A complex locus), PRKCA (protein kinase C, alpha), COMT (catechol-O-methyltransferase), S100B (S100 calcium binding protein B), EGR1 (early growth response 1), PRL (prolactin), IL15 (interleukin 15), DRD4 (dopamine receptor D4), CAMK2G (calcium/calmodulin-dependent protein kinase II gamma), SLC22A2 (solute carrier family 22 (organic cation transporter), member 2), CCL11 (chemokine (C-C motif) ligand 11), PGF (placental growth factor), THPO (thrombopoietin), GP6 (glycoprotein VI (platelet)), TACR1 (tachykinin receptor 1), NTS (neurotensin), HNF1A (HNF1 homeobox A), SST (somatostatin), KCND1 (potassium voltage-gated channel, Shal-related subfamily, member 1), LOC646627 (phospholipase inhibitor), TBXAS1 (thromboxane A synthase 1 (platelet)), CYP2J2 (cytochrome P450, family 2, subfamily J, polypeptide 2), TBXA2R (thromboxane A2 receptor), ADH1C (alcohol dehydrogenase 1C (class I), gamma polypeptide), ALOX12 (arachidonate 12-lipoxygenase), AHSG (alpha-2-HS-glycoprotein), BHMT (betaine-homocysteine methyltransferase), GJA4 (gap junction protein, alpha 4, 37 kDa), SLC25A4 (solute carrier family 25 (mitochondrial carrier; adenine nucleotide translocator), member 4), ACLY (ATP citrate lyase), ALOX5AP (arachidonate 5-lipoxygenase-activating protein), NUMA1 (nuclear mitotic apparatus protein 1), CYP27B1 (cytochrome P450, family 27, subfamily B, polypeptide 1), CYSLTR2 (cysteinyl leukotriene receptor 2), SOD3 (superoxide dismutase 3,
extracellular), LTC4S (leukotriene C4 synthase), UCN (urocortin), GHRL (ghrelin/obestatin prepropeptide), APOC2 (apolipoprotein C-II), CLEC4A (C-type lectin domain family 4, member A), KBTBD10 (kelch repeat and BTB (POZ) domain containing 10), TNC (tenascin C), TYMS (thymidylate synthetase), SHC1 (SHC (Src homology 2 domain containing) transforming protein 1), LRP1 (low density lipoprotein receptor-related protein 1), SOCS3 (suppressor of cytokine signaling 3), ADH1B (alcohol dehydrogenase 1B (class I), beta polypeptide), KLK3 (kallikrein-related peptidase 3), HSD11B1 (hydroxysteroid (11-beta) dehydrogenase 1), VKORC1 (vitamin K epoxide reductase complex, subunit 1), SERPINB2 (serpin peptidase inhibitor, clade B (ovalbumin), member 2), TNS1 (tensin 1), RNF19A (ring finger protein 19A), EPOR (erythropoietin receptor), ITGAM (integrin, alpha M (complement component 3 receptor 3 subunit)), PITX2 (paired-like homeodomain 2), MAPK7 (mitogen-activated protein kinase 7), FCGR3A (Fc fragment of IgG, low affinity IIIa, receptor (CD16a)), LEPR (leptin receptor), ENG (endoglin), GPX1 (glutathione peroxidase 1), GOT2 (glutamic-oxaloacetic transaminase 2, mitochondrial (aspartate aminotransferase 2)), HRH1 (histamine receptor H1), NR1I2 (nuclear receptor subfamily 1, group I, member 2), CRH (corticotropin releasing hormone), HTR1A (5-hydroxytryptamine (serotonin) receptor 1A), VDAC1 (voltage-dependent anion channel 1), HPSE (heparanase), SFTPD (surfactant protein D), TAP2 (transporter 2, ATP-binding cassette, sub-family B (MDR/TAP)), RNF123 (ring finger protein 123), PTK2B (PTK2B protein tyrosine kinase 2 beta), NTRK2 (neurotrophic tyrosine kinase, receptor, type 2), IL6R (interleukin 6 receptor), ACHE (acetylcholinesterase (Yt blood group)), GLP1R (glucagon-like peptide 1 receptor), GHR (growth hormone receptor), GSR (glutathione reductase), NQO1 (NAD(P)H dehydrogenase, quinone 1), NR5A1 (nuclear receptor subfamily 5, group A, member 1), GJB2 (gap junction protein, beta
2, 26 kDa), SLC9A1 (solute carrier family 9 (sodium/hydrogen exchanger), member 1), MAOA (monoamine oxidase A), PCSK9 (proprotein convertase subtilisin/kexin type 9), FCGR2A (Fc fragment of IgG, low affinity IIa, receptor (CD32)), SERPINF1 (serpin peptidase inhibitor, clade F (alpha-2 antiplasmin, pigment epithelium derived factor), member 1), EDN3 (endothelin 3), DHFR (dihydrofolate reductase), GAS6 (growth arrest-specific 6), SMPD1 (sphingomyelin phosphodiesterase 1, acid lysosomal), UCP2 (uncoupling protein 2 (mitochondrial, proton carrier)), TFAP2A (transcription factor AP-2 alpha (activating enhancer binding protein 2 alpha)), C4BPA (complement component 4 binding protein, alpha), SERPINF2 (serpin peptidase inhibitor, clade F (alpha-2 antiplasmin, pigment epithelium derived factor), member 2), TYMP (thymidine phosphorylase), ALPP (alkaline phosphatase, placental (Regan isozyme)), CXCR2 (chemokine (C-X-C motif) receptor 2), SLC39A3 (solute carrier family 39 (zinc transporter), member 3), ABCG2 (ATP-binding cassette, sub-family G (WHITE), member 2), ADA (adenosine deaminase), JAK3 (Janus kinase 3), HSPA1A (heat shock 70 kDa protein 1A), FASN (fatty acid synthase), FGF1 (fibroblast growth factor 1 (acidic)), F11 (coagulation factor XI), ATP7A (ATPase, Cu++ transporting, alpha polypeptide), CR1 (complement component (3b/4b) receptor 1 (Knops blood group)), GFAP (glial fibrillary acidic protein), ROCK1 (Rho-associated, coiled-coil containing protein kinase 1), MECP2 (methyl CpG binding protein 2 (Rett syndrome)), MYLK (myosin light chain kinase), BCHE (butyrylcholinesterase), LIPE (lipase, hormone-sensitive), PRDX5 (peroxiredoxin 5), ADORA1 (adenosine A1 receptor), WRN (Werner syndrome, RecQ helicase-like), CXCR3 (chemokine (C-X-C motif) receptor 3), CD81 (CD81 molecule), SMAD7 (SMAD family member 7), LAMC2 (laminin, gamma 2), MAP3K5 (mitogen-activated protein kinase kinase kinase 5), CHGA (chromogranin A (parathyroid secretory protein 1)), IAPP (islet amyloid
polypeptide), RHO (rhodopsin), ENPP1 (ectonucleotide pyrophosphatase/phosphodiesterase 1), PTHLH (parathyroid hormone-like hormone), NRG1 (neuregulin 1), VEGFC (vascular endothelial growth factor C), ENPEP (glutamyl aminopeptidase (aminopeptidase A)), CEBPB (CCAAT/enhancer binding protein (C/EBP), beta), NAGLU (N-acetylglucosaminidase, alpha), F2RL3 (coagulation factor II (thrombin) receptor-like 3), CX3CL1 (chemokine (C-X3-C motif) ligand 1), BDKRB1 (bradykinin receptor B1), ADAMTS13 (ADAM metallopeptidase with thrombospondin type 1 motif, 13), ELANE (elastase, neutrophil expressed), ENPP2 (ectonucleotide pyrophosphatase/phosphodiesterase 2), CISH (cytokine inducible SH2-containing protein), GAST (gastrin), MYOC (myocilin, trabecular meshwork inducible glucocorticoid response), ATP1A2 (ATPase, Na+/K+ transporting, alpha 2 polypeptide), NF1 (neurofibromin 1), GJB1 (gap junction protein, beta 1, 32 kDa), MEF2A (myocyte enhancer factor 2A), VCL (vinculin), BMPR2 (bone morphogenetic protein receptor, type II (serine/threonine kinase)), TUBB (tubulin, beta), CDC42 (cell division cycle 42 (GTP binding protein, 25 kDa)), KRT18 (keratin 18), HSF1 (heat shock transcription factor 1), MYB (v-myb myeloblastosis viral oncogene homolog (avian)), PRKAA2 (protein kinase, AMP-activated, alpha 2 catalytic subunit), ROCK2 (Rho-associated, coiled-coil containing protein kinase 2), TFPI (tissue factor pathway inhibitor (lipoprotein-associated coagulation inhibitor)), PRKG1 (protein kinase, cGMP-dependent, type I), BMP2 (bone morphogenetic protein 2), CTNND1 (catenin (cadherin-associated protein), delta 1), CTH (cystathionase (cystathionine gamma-lyase)), CTSS (cathepsin S), VAV2 (vav 2 guanine nucleotide exchange factor), NPY2R (neuropeptide Y receptor Y2), IGFBP2 (insulin-like growth factor binding protein 2, 36 kDa), CD28 (CD28 molecule), GSTA1 (glutathione S-transferase alpha 1), PPIA (peptidylprolyl isomerase A (cyclophilin A)), APOH (apolipoprotein H (beta-2-glycoprotein I)),
S100A8 (S100 calcium binding protein A8), IL11 (interleukin 11), ALOX15 (arachidonate 15-lipoxygenase), FBLN1 (fibulin 1), NR1H3 (nuclear receptor subfamily 1, group H, member 3), SCD (stearoyl-CoA desaturase (delta-9-desaturase)), GIP (gastric inhibitory polypeptide), CHGB (chromogranin B (secretogranin 1)), PRKCB (protein kinase C, beta), SRD5A1 (steroid-5-alpha-reductase, alpha polypeptide 1 (3-oxo-5 alpha-steroid delta 4-dehydrogenase alpha 1)), HSD11B2 (hydroxysteroid (11-beta) dehydrogenase 2), CALCRL (calcitonin receptor-like), GALNT2 (UDP-N-acetyl-alpha-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase 2 (GalNAc-T2)), ANGPTL4 (angiopoietin-like 4), KCNN4 (potassium intermediate/small conductance calcium-activated channel, subfamily N, member 4), PIK3C2A (phosphoinositide-3-kinase, class 2, alpha polypeptide), HBEGF (heparin-binding EGF-like growth factor), CYP7A1 (cytochrome P450, family 7, subfamily A, polypeptide 1), HLA-DRB5 (major histocompatibility complex, class II, DR beta 5), BNIP3 (BCL2/adenovirus E1B 19 kDa interacting protein 3), GCKR (glucokinase (hexokinase 4) regulator), S100A12 (S100 calcium binding protein A12), PADI4 (peptidyl arginine deaminase, type IV), HSPA14 (heat shock 70 kDa protein 14), CXCR1 (chemokine (C-X-C motif) receptor 1), H19 (H19, imprinted maternally expressed transcript (non-protein coding)), KRTAP19-3 (keratin associated protein 19-3), insulin, RAC2 (ras-related C3 botulinum toxin substrate 2 (rho family, small GTP binding protein Rac2)), RYR1 (ryanodine receptor 1 (skeletal)), CLOCK (clock homolog (mouse)), NGFR (nerve growth factor receptor (TNFR superfamily, member 16)), DBH (dopamine beta-hydroxylase (dopamine beta-monooxygenase)), CHRNA4 (cholinergic receptor, nicotinic, alpha 4), CACNA1C (calcium channel, voltage-dependent, L type, alpha 1C subunit), PRKAG2 (protein kinase, AMP-activated, gamma 2 non-catalytic subunit), CHAT (choline acetyltransferase), PTGDS
(prostaglandin D2 synthase 21 kDa (brain)), NR1H2 (nuclear receptor subfamily 1, group H, member 2), TEK (TEK tyrosine kinase, endothelial), VEGFB (vascular endothelial growth factor B), MEF2C (myocyte enhancer factor 2C), MAPKAPK2 (mitogen-activated protein kinase-activated protein kinase 2), TNFRSF11A (tumor necrosis factor receptor superfamily, member 11a, NFKB activator), HSPA9 (heat shock 70 kDa protein 9 (mortalin)), CYSLTR1 (cysteinyl leukotriene receptor 1), MAT1A (methionine adenosyltransferase I, alpha), OPRL1 (opiate receptor-like 1), IMPA1 (inositol(myo)-1(or 4)-monophosphatase 1), CLCN2 (chloride channel 2), DLD (dihydrolipoamide dehydrogenase), PSMA6 (proteasome (prosome, macropain) subunit, alpha type, 6), PSMB8 (proteasome (prosome, macropain) subunit, beta type, 8 (large multifunctional peptidase 7)), CHI3L1 (chitinase 3-like 1 (cartilage glycoprotein-39)), ALDH1B1 (aldehyde dehydrogenase 1 family, member B1), PARP2 (poly (ADP-ribose) polymerase 2), STAR (steroidogenic acute regulatory protein), LBP (lipopolysaccharide binding protein), ABCC6 (ATP-binding cassette, sub-family C (CFTR/MRP), member 6), RGS2 (regulator of G-protein signaling 2, 24 kDa), EFNB2 (ephrin-B2), cystic fibrosis transmembrane conductance regulator (CFTR), GJB6 (gap junction protein, beta 6, 30 kDa), APOA2 (apolipoprotein A-II), AMPD1 (adenosine monophosphate deaminase 1), DYSF (dysferlin, limb girdle muscular dystrophy 2B (autosomal recessive)), FDFT1 (farnesyl-diphosphate farnesyltransferase 1), EDN2 (endothelin 2), CCR6 (chemokine (C-C motif) receptor 6), GJB3 (gap junction protein, beta 3, 31 kDa), IL1RL1 (interleukin 1 receptor-like 1), ENTPD1 (ectonucleoside triphosphate diphosphohydrolase 1), BBS4 (Bardet-Biedl syndrome 4), CELSR2 (cadherin, EGF LAG seven-pass G-type receptor 2 (flamingo homolog, Drosophila)), F11R (F11 receptor), RAPGEF3 (Rap guanine nucleotide exchange factor (GEF) 3), HYAL1 (hyaluronoglucosaminidase 1), ZNF259 (zinc finger protein 259), ATOX1
(ATX1 antioxidant protein 1 homolog (yeast)), ATF6 (activating transcription factor 6), KHK (ketohexokinase (fructokinase)), SAT1 (spermidine/spermine N1-acetyltransferase 1), GGH (gamma-glutamyl hydrolase (conjugase, folylpolygammaglutamyl hydrolase)), TIMP4 (TIMP metallopeptidase inhibitor 4), SLC4A4 (solute carrier family 4, sodium bicarbonate cotransporter, member 4), PDE2A (phosphodiesterase 2A, cGMP-stimulated), PDE3B (phosphodiesterase 3B, cGMP-inhibited), FADS1 (fatty acid desaturase 1), FADS2 (fatty acid desaturase 2), TMSB4X (thymosin beta 4, X-linked), TXNIP (thioredoxin interacting protein), LIMS1 (LIM and senescent cell antigen-like domains 1), RHOB (ras homolog gene family, member B), LY96 (lymphocyte antigen 96), FOXO1 (forkhead box O1), PNPLA2 (patatin-like phospholipase domain containing 2), TRH (thyrotropin-releasing hormone), GJC1 (gap junction protein, gamma 1, 45 kDa), SLC17A5 (solute carrier family 17 (anion/sugar transporter), member 5), FTO (fat mass and obesity associated), GJD2 (gap junction protein, delta 2, 36 kDa), PSRC1 (proline/serine-rich coiled-coil 1), CASP12 (caspase 12 (gene/pseudogene)), GPBAR1 (G protein-coupled bile acid receptor 1), PXK (PX domain containing serine/threonine kinase), IL33 (interleukin 33), TRIB1 (tribbles homolog 1 (Drosophila)), PBX4 (pre-B-cell leukemia homeobox 4), NUPR1 (nuclear protein, transcriptional regulator, 1), SEP15 (15 kDa selenoprotein), CILP2 (cartilage intermediate layer protein 2), TERC (telomerase RNA component), GGT2 (gamma-glutamyltransferase 2), MT-CO1 (mitochondrially encoded cytochrome c oxidase I), UOX (urate oxidase, pseudogene), a CRISPR/Cas effector polypeptide, an enzymatically active CRISPR/Cas effector polypeptide (e.g., one that is capable of cleaving a target nucleic acid), and a CRISPR/Cas effector polypeptide that is not enzymatically active (e.g., one that does not cleave a target nucleic acid, but retains binding to the target nucleic acid).
In some cases, the donor DNA encodes a wildtype version of any of the foregoing polypeptides; i.e., the donor DNA can encode a “normal” version that does not include a mutation(s) that results in reduced function, lack of function, or pathogenesis.
In some cases, the donor DNA comprises a nucleotide sequence encoding a fluorescent polypeptide. Suitable fluorescent proteins include, but are not limited to, green fluorescent protein (GFP) or variants thereof, blue fluorescent variant of GFP (BFP), cyan fluorescent variant of GFP (CFP), yellow fluorescent variant of GFP (YFP), enhanced GFP (EGFP), enhanced CFP (ECFP), enhanced YFP (EYFP), GFPS65T, Emerald, Topaz (TYFP), Venus, Citrine, mCitrine, GFPuv, destabilized EGFP (dEGFP), destabilized ECFP (dECFP), destabilized EYFP (dEYFP), mCFPm, Cerulean, T-Sapphire, CyPet, YPet, mKO, HcRed, t-HcRed, DsRed, DsRed2, DsRed-monomer, J-Red, dimer2, t-dimer2(12), mRFP1, pocilloporin, Renilla GFP, Monster GFP, paGFP, Kaede protein and kindling protein, Phycobiliproteins and Phycobiliprotein conjugates including B-Phycoerythrin, R-Phycoerythrin and Allophycocyanin. Other examples of fluorescent proteins include mHoneydew, mBanana, mOrange, dTomato, tdTomato, mTangerine, mStrawberry, mCherry, mGrape1, mRaspberry, mGrape2, mPlum (Shaner et al. (2005) Nat. Methods 2:905-909), and the like. Any of a variety of fluorescent and colored proteins from Anthozoan species, as described in, e.g., Matz et al. (1999) Nature Biotechnol. 17:969-973, can be encoded.
In some cases, the donor DNA encodes an RNA, e.g., an siRNA, a microRNA, a short hairpin RNA (shRNA), an anti-sense RNA, a riboswitch, a ribozyme, an aptamer, a ribosomal RNA, a transfer RNA, and the like.
A donor DNA can include, in addition to a nucleotide sequence encoding one or more gene products (e.g., an RNA and/or a polypeptide), one or more transcriptional control elements, e.g., a promoter, an enhancer, and the like. In some cases, the transcriptional control element is inducible. In some cases, the promoter is reversible. In some cases, the transcriptional control element is constitutive. In some cases, the promoter is functional in a eukaryotic cell. In some cases, the promoter is a cell type-specific promoter. In some cases, the promoter is a tissue-specific promoter. The nucleotide sequence of the donor DNA is typically not identical to the target nucleic acid (e.g., genomic sequence) that it replaces. Rather, the donor DNA may contain one or more single base changes, insertions, deletions, inversions or rearrangements with respect to the target nucleic acid (e.g., genomic sequence), so long as sufficient homology is present to support homology-directed repair (e.g., for gene correction, e.g., to convert a disease-causing base pair to a non-disease-causing base pair). In some cases, the donor DNA comprises a non-homologous sequence flanked by two regions of homology, such that homology-directed repair between the target DNA region and the two flanking sequences results in insertion of the non-homologous sequence at the target region. Donor DNA may also comprise a vector backbone containing sequences that are not homologous to the DNA region of interest (the target nucleic acid) and that are not intended for insertion into the DNA region of interest (the target nucleic acid). Generally, the homologous region(s) of a donor sequence will have at least 50% sequence identity to a target nucleic acid (e.g., a genomic sequence) with which recombination is desired. In certain cases, 60%, 70%, 80%, 90%, 95%, 98%, 99%, or 99.9% sequence identity is present.
Any value between 1% and 100% sequence identity can be present, depending upon the length of the donor polynucleotide.
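The sequence identity thresholds above can be illustrated with a short calculation. The following Python sketch is illustrative only and is not part of the disclosed methods: it assumes the donor homology region and the target sequence are already aligned, of equal length, and free of gaps, and the function names and the example sequences are hypothetical.

```python
def percent_identity(donor_arm: str, target: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences (no gaps).

    Assumption: alignment has been done elsewhere; this only counts
    position-by-position matches.
    """
    donor_arm, target = donor_arm.upper(), target.upper()
    if not donor_arm or len(donor_arm) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(donor_arm, target))
    return 100.0 * matches / len(donor_arm)


def meets_identity_threshold(donor_arm: str, target: str,
                             threshold: float = 50.0) -> bool:
    """True if the homology region reaches the chosen identity threshold."""
    return percent_identity(donor_arm, target) >= threshold


# Hypothetical 10-nt homology region differing from the target at one position.
example_identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
```

In practice, homology regions of differing length or containing insertions and deletions would first need a pairwise alignment step, which this sketch deliberately omits.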
The donor DNA may comprise certain nucleotide sequence differences as compared to the target nucleic acid (e.g., genomic sequence), where such differences include, e.g., restriction sites, nucleotide polymorphisms, selectable markers (e.g., drug resistance genes, fluorescent proteins, enzymes etc.), etc., which may be used to assess for successful insertion of the donor DNA at the cleavage site or in some cases may be used for other purposes (e.g., to signify expression at the targeted genomic locus). In some cases, if located in a coding region, such nucleotide sequence differences will not change the amino acid sequence, or will make silent amino acid changes (i.e., changes which do not affect the structure or function of the protein). Alternatively, these sequence differences may include flanking recombination sequences such as FLPs, loxP sequences, or the like, that can be activated at a later time for removal of the marker sequence. In some cases, the donor DNA will include one or more nucleotide sequences to aid in localization of the donor to the nucleus of the recipient cell or to aid in the integration of the donor DNA into the target nucleic acid. For example, in some cases, the donor DNA may comprise one or more nucleotide sequences encoding one or more nuclear localization signals and the like. In some cases, the donor DNA will include nucleotide sequences to recruit DNA repair enzymes to increase insertion efficiency. Human enzymes involved in homology directed repair include MRN-CtIP, BLM-DNA2, Exo1, ERCC1, Rad51, Rad52, Ligase 1, Polθ, PARP1, Ligase 3, BRCA2, RecQ/BLM-TopoIIIα, RTEL, Polδ, and Polη (Verma and Greenburg (2016) Genes Dev. 30 (10): 1138-1154). In some cases, the donor DNA is delivered as reconstituted chromatin (Cruz-Becerra and Kadonaga (2020) eLife 2020;9:e55780 DOI: 10.7554/eLife.55780).
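As an illustration of nucleotide differences that do not change the encoded amino acid sequence, the following Python sketch compares the translation of a target coding segment with that of a donor coding segment. It is a minimal example under stated assumptions: the standard genetic code is used, both segments are in frame and of equal length, and the function names and example sequences are hypothetical rather than taken from this disclosure.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest). "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate an in-frame DNA coding segment into one-letter amino acids."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding segment length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def changes_are_synonymous(target_cds: str, donor_cds: str) -> bool:
    """True if the donor differs from the target only by synonymous changes."""
    if len(target_cds) != len(donor_cds):
        raise ValueError("segments must be the same length for this check")
    return translate(target_cds) == translate(donor_cds)


# Hypothetical segments: GCT->GCC and AAA->AAG leave Met-Ala-Lys unchanged.
synonymous_example = changes_are_synonymous("ATGGCTAAA", "ATGGCCAAG")
# GCT->GAT changes Ala to Asp, so this donor is not synonymous.
nonsynonymous_example = changes_are_synonymous("ATGGCTAAA", "ATGGATAAA")
```

A check of this kind only addresses the amino acid sequence; it does not assess codon usage, splicing signals, or other effects a nucleotide change may have.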
In some cases, the ends of the donor DNA are protected (e.g., from exonucleolytic degradation) by any convenient method and such methods are known to those of skill in the art. For example, one or more dideoxynucleotide residues can be added to the 3' terminus of a linear molecule and/or self-complementary oligonucleotides can be ligated to one or both ends. See, for example, Chang et al. (1987) Proc. Natl. Acad Sci USA 84:4959-4963; Nehls et al. (1996) Science 272:886-889. Additional methods for protecting exogenous polynucleotides from degradation include, but are not limited to, addition of terminal amino group(s) and the use of modified internucleotide linkages such as, for example, phosphorothioates, phosphoramidates, and O-methyl ribose or deoxyribose residues. As an alternative to protecting the termini of a linear donor DNA, additional lengths of sequence may be included outside of the regions of homology that can be degraded without impacting recombination.
In various embodiments, the compositions, systems, and methods include use of two components, (1) a programmable nuclease (e.g., an RNA-guided CRISPR nuclease), and (2) a retron reverse transcriptase for synthesis of the msd DNA from the ncRNA. The programmable nuclease is targeted to a site in the genome by a guide RNA, which can be fused or coupled to a retron non-coding RNA (ncRNA), and the nuclease then generates a cut in the genome at that site. This chromosomal break is then precisely repaired by the endogenous cellular machinery, using retron-derived reverse transcribed DNA (RT-DNA) as a repair template.
In some cases, the programmable nuclease used for genome modification is a Cas nuclease. Any RNA-guided Cas nuclease capable of catalyzing site-directed cleavage of DNA to allow integration of donor polynucleotides can be used in genome editing, including CRISPR system type I, type II, or type III Cas nucleases. Examples of Cas proteins include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), Cas10, Cas10d, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, and Cu1966, and homologs or modified versions thereof. In certain embodiments, a type II CRISPR system Cas9 endonuclease is used. Cas9 nucleases from any species, or biologically active fragments, variants, analogs, or derivatives thereof that retain Cas9 endonuclease activity (i.e., catalyze site-directed cleavage of DNA to generate double-strand breaks) may be used to perform genome modification as described herein. The Cas9 need not be physically derived from an organism but may be synthetically or recombinantly produced. Cas9 sequences from a number of bacterial species are well known in the art and listed in the National Center for Biotechnology Information (NCBI) database.
See, for example, NCBI entries for Cas9 from: Streptococcus pyogenes (WP_002989955, WP_038434062, WP_011528583); Campylobacter jejuni (WP_022552435, YP_002344900), Campylobacter coli (WP_060786116); Campylobacter fetus (WP_059434633); Corynebacterium ulcerans (NC_015683, NC_017317); Corynebacterium diphtheria (NC_016782, NC_016786); Enterococcus faecalis (WP_033919308); Spiroplasma syrphidicola (NC_021284); Prevotella intermedia (NC_017861); Spiroplasma taiwanense (NC_021846); Streptococcus iniae (NC_021314); Belliella baltica (NC_018010); Psychroflexus torquis (NC_018721); Streptococcus thermophilus (YP_820832), Streptococcus mutans (WP_061046374, WP_024786433); Listeria innocua (NP_472073); Listeria monocytogenes (WP_061665472); Legionella pneumophila (WP_062726656); Staphylococcus aureus (WP_001573634); Francisella tularensis (WP_032729892, WP_014548420), Enterococcus faecalis (WP_033919308); Lactobacillus rhamnosus (WP_048482595, WP_032965177); and Neisseria meningitidis (WP_061704949, YP_002342100); all of which sequences (as entered by the date of filing of this application) are herein incorporated by reference in their entireties.
In another embodiment, the CRISPR nuclease from Prevotella and Francisella 1 (Cpf1) is used. Cpf1 is another class II CRISPR/Cas system RNA-guided nuclease with similarities to Cas9 and may be used analogously. Unlike Cas9, Cpf1 does not require a tracrRNA and only depends on a crRNA in its guide RNA, which provides the advantage that shorter guide RNAs can be used with Cpf1 for targeting than Cas9. Cpf1 is capable of cleaving either DNA or RNA. The PAM sites recognized by Cpf1 have the sequences 5'-YTN-3' (where "Y" is a pyrimidine and "N" is any nucleobase) or 5'-TTN-3', in contrast to the G-rich PAM site recognized by Cas9. Cpf1 cleavage of DNA produces double-stranded breaks with sticky ends having a 4 or 5 nucleotide overhang. For a discussion of Cpf1, see, e.g., Ledford et al. (2015) Nature. 526 (7571): 17-17, Zetsche et al. (2015) Cell. 163 (3):759-771, Murovec et al. (2017) Plant Biotechnol. J. 15(8):917-926, Zhang et al. (2017) Front. Plant Sci. 8: 177, Fernandes et al. (2016) Postepy Biochem. 62(3):315-326; herein incorporated by reference.
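The PAM sequences stated above (5'-YTN-3' and 5'-TTN-3') can be located in a candidate target sequence with a simple pattern search. The Python sketch below is illustrative only: it scans the given strand in the 5' to 3' direction, ignores the reverse complement, and makes no claim about where the protospacer lies relative to the PAM. The function name and the example sequence are hypothetical.

```python
import re

# Subset of IUPAC nucleotide codes needed for the PAM patterns in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "[CT]", "N": "[ACGT]"}


def find_pam_sites(seq: str, pam: str) -> list:
    """Return 0-based start positions of every (possibly overlapping) PAM match."""
    pattern = "".join(IUPAC[base] for base in pam.upper())
    # A lookahead lets overlapping matches be reported.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", seq.upper())]


# Hypothetical target fragment, written 5' to 3'.
fragment = "GGTTACTAGCTG"
ttn_sites = find_pam_sites(fragment, "TTN")
ytn_sites = find_pam_sites(fragment, "YTN")
```

Because 5'-TTN-3' is a special case of 5'-YTN-3', every position reported for the first pattern is also reported for the second.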
C2c1 is another class II CRISPR/Cas system RNA-guided nuclease that may be used. C2c1, similarly to Cas9, depends on both a crRNA and tracrRNA for guidance to target sites. For a description of C2c1, see, e.g., Shmakov et al. (2015) Mol Cell. 60(3):385-397, Zhang et al. (2017) Front Plant Sci. 8: 177; herein incorporated by reference.
In yet another embodiment, an engineered RNA-guided FokI nuclease may be used. RNA-guided FokI nucleases comprise fusions of inactive Cas9 (dCas9) and the FokI endonuclease (FokI-dCas9), wherein the dCas9 portion confers guide RNA-dependent targeting on FokI. For a description of engineered RNA-guided FokI nucleases, see, e.g., Havlicek et al. (2017) Mol. Ther. 25(2):342-355, Pan et al. (2016) Sci Rep. 6:35794, Tsai et al. (2014) Nat Biotechnol. 32(6):569-576; herein incorporated by reference.
The reverse transcriptase is expressed in cells to synthesize the msd DNA from the ncRNA. As described above, the msd DNA includes the repair template within the msd loop. The retron reverse transcriptase can be expressed from the same expression cassette as the Cas nuclease, or the reverse transcriptase can be expressed from a different expression cassette than the Cas nuclease.
A variety of expression cassettes and/or expression vectors can be used to express the retron reverse transcriptase and the Cas nuclease.
Delivery of an engineered retron to a cell will generally be accomplished with or without vectors. The engineered retrons (or vectors or expression cassettes containing them) may be introduced into any type of cell, including any cell from a prokaryotic, eukaryotic, or archaeon organism, including bacteria, archaea, fungi, protists, plants (e.g., monocotyledonous and dicotyledonous plants); and animals (e.g., vertebrates and invertebrates). Examples of animals that may be transfected with an engineered retron include, without limitation, vertebrates such as fish, birds, mammals (e.g., human and non-human primates, farm animals, pets, and laboratory animals), reptiles, and amphibians. Examples of plants that may be transfected with an engineered retron include, without limitation, crops including cereals such as wheat, oats, and rice, legumes such as soybeans and peas, corn, grasses such as alfalfa, and cotton. The engineered retrons can be introduced into a single cell or a population of cells of interest. Cells from tissues, organs, and biopsies, as well as recombinant cells, genetically modified cells, cells from cell lines cultured in vitro, and artificial cells (e.g., nanoparticles, liposomes, polymersomes, or microcapsules encapsulating nucleic acids) may all be transfected with the engineered retrons. The subject methods are also applicable to cellular fragments, cell components, or organelles (e.g., mitochondria in animal and plant cells, plastids (e.g., chloroplasts) in plant cells and algae). Cells may be cultured or expanded after transfection with the engineered retron constructs.
Methods of introducing nucleic acids into a host cell are well known in the art. Commonly used methods include chemically induced transformation, typically using divalent cations (e.g., CaCl2), dextran-mediated transfection, polybrene mediated transfection, lipofectamine and LT-1 mediated transfection, electroporation, protoplast fusion, encapsulation of nucleic acids in liposomes, and direct microinjection of the nucleic acids comprising engineered retrons into nuclei. See, e.g., Sambrook et al. (2001) Molecular Cloning, a laboratory manual, 3rd edition, Cold Spring Harbor Laboratories, New York, Davis et al. (1995) Basic Methods in Molecular Biology, 2nd edition, McGraw-Hill, and Chu et al. (1981) Gene 13: 197; herein incorporated by reference in their entireties.
VECTOR SYSTEMS COMPRISING ENGINEERED RETRONS
In some embodiments, the engineered retrons, retron components, and retron editing systems are produced by a vector system comprising one or more vectors. In certain embodiments, the retron msr gene, msd gene, and ret gene are expressed in vivo from a vector within a cell. A "vector" is a composition of matter which can be used to deliver a nucleic acid of interest to the interior of a cell. The retron msr gene, msd gene, and ret gene can be introduced into a cell with a single vector or in multiple separate vectors to produce msDNA in a host subject. Vectors typically include control elements operably linked to the retron sequences, which allow for the production of msDNA in vivo in the subject species. For example, the retron msr gene, msd gene, and ret gene can be operably linked to a promoter to allow expression of the retron reverse transcriptase and msDNA product. In some embodiments, heterologous sequences encoding desired products of interest (e.g., polynucleotide encoding polypeptide or regulatory RNA, donor polynucleotide for gene editing, or protospacer DNA for molecular recording) may be inserted in the msr gene or msd gene. Any eukaryotic, archeon, or prokaryotic cell, capable of being transfected with a vector comprising the engineered retron sequences, may be used to produce the msDNA. The ability of constructs to produce the msDNA along with other retron-encoded products can be empirically determined.
In some embodiments, the engineered retron is produced by a vector system comprising one or more vectors. In the vector system, the msr gene, the msd gene, and the ret gene may be provided by the same vector (i.e., cis arrangement of all such retron elements), wherein the vector comprises a promoter operably linked to the msr gene and the msd gene. In some embodiments, the promoter is further operably linked to the ret gene. In other embodiments, the vector further comprises a second promoter operably linked to the ret gene. Alternatively, the ret gene may be provided by a second vector that does not include the msr gene and the msd gene (i.e., trans arrangement of msr-msd and ret). In yet other embodiments, the msr gene, the msd gene, and the ret gene are each provided by different vectors (i.e., trans arrangement of all retron elements). Numerous vectors are available including, but not limited to, linear polynucleotides, polynucleotides associated with ionic or amphiphilic compounds, plasmids, and viruses. Thus, the term "vector" includes an autonomously replicating plasmid or a virus. Examples of viral vectors include, but are not limited to, adenoviral vectors, adeno-associated virus vectors, retroviral vectors, lentiviral vectors, and the like. An expression construct can be replicated in a living cell, or it can be made synthetically. For purposes of this application, the terms "expression construct," "expression vector," and "vector," are used interchangeably to demonstrate the application of the invention in a general, illustrative sense, and are not intended to limit the invention.
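The cis and trans arrangements described above can be represented as a mapping from each vector to the retron genes it carries. The following Python sketch is a hypothetical bookkeeping aid, not part of the disclosed vector systems: the vector names, the function name, and the returned labels are assumptions chosen for illustration.

```python
from typing import Dict, Set

RETRON_GENES = ("msr", "msd", "ret")


def classify_arrangement(vectors: Dict[str, Set[str]]) -> str:
    """Classify how msr, msd, and ret are distributed across vectors."""
    location = {}
    for name, genes in vectors.items():
        for gene in genes & set(RETRON_GENES):
            if gene in location:
                raise ValueError(f"{gene} appears on more than one vector")
            location[gene] = name
    missing = [g for g in RETRON_GENES if g not in location]
    if missing:
        raise ValueError(f"missing retron genes: {missing}")

    n_vectors = len(set(location.values()))
    if n_vectors == 1:
        return "cis (all elements on one vector)"
    if n_vectors == 3:
        return "trans (all elements on different vectors)"
    if location["msr"] == location["msd"]:
        return "trans (msr-msd on one vector, ret on another)"
    return "other two-vector arrangement"


# Hypothetical vector names; extra cargo such as a nuclease gene is ignored.
single = classify_arrangement({"pRetron": {"msr", "msd", "ret"}})
split = classify_arrangement({"pNcRNA": {"msr", "msd"}, "pRT": {"ret", "cas9"}})
```

Promoter assignments (one shared promoter versus a second promoter for the ret gene) are not modeled here and would need additional fields.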
In certain embodiments, the nucleic acid comprising an engineered retron sequence is under transcriptional control of a promoter. A "promoter" refers to a DNA sequence recognized by the synthetic machinery of the cell, or introduced synthetic machinery, required to initiate the specific transcription of a gene. The term promoter will be used here to refer to a group of transcriptional control modules that are clustered around the initiation site for RNA polymerase I, II, or III, including RNA polymerase III (Pol III) promoters to express the ncRNA/gRNA in mammalian (e.g., human) cells.
RNA polymerase III (Pol III) is responsible for the synthesis of a large variety of small nuclear and cytoplasmic non-coding RNAs. Examples of PolIII promoters include the 7SK, U6 and H1 promoters. PolIII promoters can provide expression in a variety of cell types. PolIII promoters are typically compact, for example, providing expression from 5'-flanking sequences as short as 100 bp. In other cases, the PolIII promoter has more than 100 nucleotides. For example, the DNA elements for transcription of the H1 RNA gene are composed of the octamer, Staf transcription factor binding site, proximal sequence element (PSE) and TATA motifs.
An example of a sequence for an H1 promoter is shown below as SEQ ID NO: 9.
1 AATTCGGAAC GCTGACGTCA TCAACCCGCT CCAAGGAATC
41 GCGGGCCCAG TGTCACTAGG CGGGAACACC CAGCGCGCGT
81 GCGCCCTGGC AGGAAGATGG CTGTGAGGGA CAGGGGAGTG
121 GCGCCCTGCA ATATTTGCAT GTCGCTATGT GTTCTGGGAA
161 ATCACCATAA ACGTGAAATG TCTTTGGATT TGGGAATCTT
201 ATAAGTTCTG TATGAGACCA C
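A numbered listing such as the one above can be checked mechanically before use. The Python sketch below is illustrative only: it reassembles the H1 promoter listing as printed above, and verifies that each position label equals the number of nucleotides already read plus one. The parser name and variable names are hypothetical.

```python
# SEQ ID NO: 9 as printed in the listing above (position label, then 10-nt blocks).
H1_LISTING = """
1 AATTCGGAAC GCTGACGTCA TCAACCCGCT CCAAGGAATC
41 GCGGGCCCAG TGTCACTAGG CGGGAACACC CAGCGCGCGT
81 GCGCCCTGGC AGGAAGATGG CTGTGAGGGA CAGGGGAGTG
121 GCGCCCTGCA ATATTTGCAT GTCGCTATGT GTTCTGGGAA
161 ATCACCATAA ACGTGAAATG TCTTTGGATT TGGGAATCTT
201 ATAAGTTCTG TATGAGACCA C
"""


def parse_listing(text: str) -> str:
    """Join a position-numbered sequence listing into one string.

    Raises ValueError if a position label does not match the running
    length, or if a non-ACGT character is encountered.
    """
    parts = []
    length = 0
    for line in text.strip().splitlines():
        tokens = line.split()
        if not tokens:
            continue
        start = int(tokens[0])
        if start != length + 1:
            raise ValueError(f"label {start} does not follow {length} nt")
        chunk = "".join(tokens[1:]).upper()
        if set(chunk) - set("ACGT"):
            raise ValueError(f"unexpected character in row {start}")
        parts.append(chunk)
        length += len(chunk)
    return "".join(parts)


h1_promoter = parse_listing(H1_LISTING)
h1_length = len(h1_promoter)
```

A consistency check of this kind catches rows that were merged, dropped, or truncated during transcription of the listing, but it cannot confirm that the sequence itself is biologically correct.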
Typically, transcription terminator/polyadenylation signals will also be present in the expression construct. PolIII terminates transcription at a small polyU stretch. In eukaryotes, a hairpin loop is not required, but may enhance termination efficiency in humans.
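Because a polyU stretch in the transcript corresponds to a run of T residues on the coding strand of the DNA, a construct intended for PolIII expression can be screened for internal T runs that might act as unintended termination sites. The Python sketch below is a hedged illustration: the minimum run length of four is an assumption chosen for the example and is not a value stated in this disclosure, and the function name and example sequence are hypothetical.

```python
import re


def find_t_runs(coding_strand: str, min_len: int = 4) -> list:
    """Return (0-based start, run length) for each run of >= min_len T residues.

    min_len is an illustrative assumption, not a value from the text.
    """
    pattern = re.compile("T{%d,}" % min_len)
    return [(m.start(), m.end() - m.start())
            for m in pattern.finditer(coding_strand.upper())]


# Hypothetical insert with two T runs (lengths 5 and 4).
t_runs = find_t_runs("ACGTTTTTGCATTTTAC")
```

Raising or lowering `min_len` changes how conservative the screen is; the appropriate value would need to be determined empirically for a given system.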
Typical promoters for mammalian cell expression include the SV40 early promoter, a CMV promoter such as the CMV immediate early promoter (see, U.S. Patent Nos. 5,168,062 and 5,385,839, incorporated herein by reference in their entireties), the mouse mammary tumor virus LTR promoter, the adenovirus major late promoter (Ad MLP), and the herpes simplex virus promoter, among others. Other nonviral promoters, such as a promoter derived from the murine metallothionein gene, will also find use for mammalian expression. These and other promoters can be obtained from commercially available plasmids, using techniques well known in the art. See, e.g., Sambrook et al., supra. Enhancer elements may be used in association with the promoter to increase expression levels of the constructs. Examples include the SV40 early gene enhancer, as described in Dijkema et al., EMBO J. (1985) 4:761, the enhancer/promoter derived from the long terminal repeat (LTR) of the Rous Sarcoma Virus, as described in Gorman et al., Proc. Natl. Acad. Sci. USA (1982b) 79:6777 and elements derived from human CMV, as described in Boshart et al., Cell (1985) 44 :521, such as elements included in the CMV intron A sequence.
In one embodiment, an expression vector for expressing an engineered retron, including the msr gene, msd gene, and ret gene comprises a promoter "operably linked" to a polynucleotide encoding the msr gene, msd gene, and ret gene. The phrase "operably linked" or "under transcriptional control" as used herein means that the promoter is in the correct location and orientation in relation to a polynucleotide to control the initiation of transcription by RNA polymerase and expression of the msr gene, msd gene, and ret gene.
Typically, transcription terminator/polyadenylation signals will also be present in the expression construct. Examples of such sequences include, but are not limited to, those derived from SV40, as described in Sambrook et al., supra, as well as a bovine growth hormone terminator sequence (see, e.g., U.S. Patent No. 5,122,458). Additionally, 5'-UTR sequences can be placed adjacent to the coding sequence in order to enhance expression of the same. Such sequences may include UTRs comprising an internal ribosome entry site (IRES).
Inclusion of an IRES permits the translation of one or more open reading frames from a vector. The IRES element attracts a eukaryotic ribosomal translation initiation complex and promotes translation initiation. See, e.g., Kaufman et al., Nuc. Acids Res. (1991) 19:4485-4490; Gurtu et al., Biochem. Biophys. Res. Comm. (1996) 229:295-298; Rees et al., BioTechniques (1996) 20:102-110; Kobayashi et al., BioTechniques (1996) 21:399-402; and Mosser et al., BioTechniques (1997) 22:150-161. A multitude of IRES sequences are known and include sequences derived from a wide variety of viruses, such as from leader sequences of picornaviruses such as the encephalomyocarditis virus (EMCV) UTR (Jang et al. J. Virol. (1989) 63:1651-1660), the polio leader sequence, the hepatitis A virus leader, the hepatitis C virus IRES, human rhinovirus type 2 IRES (Dobrikova et al., Proc. Natl. Acad. Sci. (2003) 100(25):15125-15130), an IRES element from the foot and mouth disease virus (Ramesh et al., Nucl. Acid Res. (1996) 24:2697-2700), a giardiavirus IRES (Garlapati et al., J. Biol. Chem. (2004) 279(5):3389-3397), and the like. A variety of nonviral IRES sequences will also find use herein, including, but not limited to IRES sequences from yeast, as well as the human angiotensin II type 1 receptor IRES (Martin et al., Mol. Cell Endocrinol. (2003) 212:51-61), fibroblast growth factor IRESs (FGF-1 IRES and FGF-2 IRES, Martineau et al. (2004) Mol. Cell. Biol. 24(17):7622-7635), vascular endothelial growth factor IRES (Baranick et al. (2008) Proc. Natl. Acad. Sci. U.S.A. 105(12):4733-4738, Stein et al. (1998) Mol. Cell. Biol. 18(6):3112-3119, Bert et al. (2006) RNA 12(6):1074-1083), and insulin-like growth factor 2 IRES (Pedersen et al. (2002) Biochem. J. 363(Pt 1):37-44). These elements are readily commercially available in plasmids sold, e.g., by Clontech (Mountain View, CA), Invivogen (San Diego, CA), Addgene (Cambridge, MA) and GeneCopoeia (Rockville, MD).
See also IRESite: The database of experimentally verified IRES structures (iresite.org). An IRES sequence may be included in a vector, for example, to express multiple bacteriophage recombination proteins for recombineering or an RNA-guided nuclease (e.g., Cas9) for HDR in combination with a retron reverse transcriptase from an expression cassette.
A polynucleotide encoding a viral T2A peptide can be used to allow production of multiple protein products (e.g., Cas9, bacteriophage recombination proteins, retron reverse transcriptase) from a single vector. One or more 2A linker peptides can be inserted between the coding sequences in the multicistronic construct. The 2A peptide, which is self-cleaving, allows co-expressed proteins from the multicistronic construct to be produced at equimolar levels. 2A peptides from various viruses may be used, including, but not limited to 2A peptides derived from the foot-and-mouth disease virus, equine rhinitis A virus, Thosea asigna virus and porcine teschovirus-1. See, e.g., Kim et al. (2011) PLoS One 6(4):e18556, Trichas et al. (2008) BMC Biol. 6:40, Provost et al. (2007) Genesis 45(10):625-629, Furler et al. (2001) Gene Ther. 8(11): 864-873; herein incorporated by reference in their entireties.
In certain embodiments, the expression construct comprises a plasmid suitable for transforming a bacterial host. Numerous bacterial expression vectors are known to those of skill in the art, and the selection of an appropriate vector is a matter of choice. Bacterial expression vectors include, but are not limited to, pACYC177, pASK75, pBAD, pBADM, pBAT, pCal, pET, pETM, pGAT, pGEX, pHAT, pKK223, pMal, pProEx, pQE, and pZA31. Bacterial plasmids may contain antibiotic selection markers (e.g., ampicillin, kanamycin, erythromycin, carbenicillin, streptomycin, or tetracycline resistance), a lacZ gene (β-galactosidase produces blue pigment from X-gal substrate), fluorescent markers (e.g., GFP, mCherry), or other markers for selection of transformed bacteria. See, e.g., Sambrook et al., supra.
In other embodiments, the expression construct comprises a plasmid suitable for transforming a yeast cell. Yeast expression plasmids typically contain a yeast-specific origin of replication (ORI) and nutritional selection markers (e.g., HIS3, URA3, LYS2, LEU2, TRP1, MET15, ura4+, leu1+, ade6+), antibiotic selection markers (e.g., kanamycin resistance), fluorescent markers (e.g., mCherry), or other markers for selection of transformed yeast cells. The yeast plasmid may further contain components to allow shuttling between a bacterial host (e.g., E. coli) and yeast cells. A number of different types of yeast plasmids are available including yeast integrating plasmids (YIp), which lack an ORI and are integrated into host chromosomes by homologous recombination; yeast replicating plasmids (YRp), which contain an autonomously replicating sequence (ARS) and can replicate independently; yeast centromere plasmids (YCp), which are low copy vectors containing a part of an ARS and part of a centromere sequence (CEN); and yeast episomal plasmids (YEp), which are high copy number plasmids comprising a fragment from a 2 micron circle (a natural yeast plasmid) that allows for 50 or more copies to be stably propagated per cell.
In other embodiments, the expression construct comprises a virus or engineered construct derived from a viral genome. A number of viral based systems have been developed for gene transfer into mammalian cells. These include adenoviruses, retroviruses (γ-retroviruses and lentiviruses), poxviruses, adeno-associated viruses, baculoviruses, and herpes simplex viruses (see e.g., Warnock et al. (2011) Methods Mol. Biol. 737: 1-25; Walther et al. (2000) Drugs 60(2):249-271; and Lundstrom (2003) Trends Biotechnol. 21(3): 117-122; herein incorporated by reference in their entireties). The ability of certain viruses to enter cells via receptor-mediated endocytosis, to integrate into host cell genomes and express viral genes stably and efficiently have made them attractive candidates for the transfer of foreign genes into mammalian cells.
For example, retroviruses provide a convenient platform for gene delivery systems. Selected sequences can be inserted into a vector and packaged in retroviral particles using techniques known in the art. The recombinant virus can then be isolated and delivered to cells of the subject either in vivo or ex vivo. A number of retroviral systems have been described (U.S. Pat. No. 5,219,740; Miller and Rosman (1989) BioTechniques 7:980-990; Miller, A. D. (1990) Human Gene Therapy 1:5-14; Scarpa et al. (1991) Virology 180:849-852; Burns et al. (1993) Proc. Natl. Acad. Sci. USA 90:8033-8037; Boris-Lawrie and Temin (1993) Cur. Opin. Genet. Develop. 3: 102-109; and Ferry et al. (2011) Curr. Pharm. Des. 17(24):2516-2527). Lentiviruses are a class of retroviruses that are particularly useful for delivering polynucleotides to mammalian cells because they are able to infect both dividing and nondividing cells (see e.g., Lois et al (2002) Science 295:868-872; Durand et al. (2011) Viruses 3(2): 132-159; herein incorporated by reference).
A number of adenovirus vectors have also been described. Unlike retroviruses which integrate into the host genome, adenoviruses persist extrachromosomally thus minimizing the risks associated with insertional mutagenesis (Haj-Ahmad and Graham, J. Virol. (1986) 57:267-274; Bett et al., J. Virol. (1993) 67:5911-5921; Mittereder et al., Human Gene Therapy (1994) 5:717-729; Seth et al., J. Virol. (1994) 68:933-940; Barr et al., Gene Therapy (1994) 1:51-58; Berkner, K. L. BioTechniques (1988) 6:616-629; and Rich et al., Human Gene Therapy (1993) 4:461-476). Additionally, various adeno-associated virus (AAV) vector systems have been developed for gene delivery. AAV vectors can be readily constructed using techniques well known in the art. See, e.g., U.S. Pat. Nos. 5,173,414 and 5,139,941; International Publication Nos. WO 92/01070 (published 23 January 1992) and WO 93/03769 (published 4 March 1993); Lebkowski et al., Molec. Cell. Biol. (1988) 8:3988-3996; Vincent et al., Vaccines 90 (1990) (Cold Spring Harbor Laboratory Press); Carter, B. J. Current Opinion in Biotechnology (1992) 3:533-539; Muzyczka, N. Current Topics in Microbiol. and Immunol. (1992) 158:97-129; Kotin, R. M. Human Gene Therapy (1994) 5:793-801; Shelling and Smith, Gene Therapy (1994) 1:165-169; and Zhou et al., J. Exp. Med. (1994) 179:1867-1875. Another vector system useful for delivering nucleic acids encoding the engineered retrons is the enterically administered recombinant poxvirus vaccines described by Small, Jr., P. A., et al. (U.S. Pat. No. 5,676,950, issued Oct. 14, 1997, herein incorporated by reference).
Additional viral vectors which will find use for delivering the nucleic acid molecules of interest include those derived from the pox family of viruses, including vaccinia virus and avian poxvirus. By way of example, vaccinia virus recombinants expressing a nucleic acid molecule of interest (e.g., engineered retron) can be constructed as follows. The DNA encoding the particular nucleic acid sequence is first inserted into an appropriate vector so that it is adjacent to a vaccinia promoter and flanking vaccinia DNA sequences, such as the sequence encoding thymidine kinase (TK). This vector is then used to transfect cells which are simultaneously infected with vaccinia. Homologous recombination serves to insert the vaccinia promoter plus the gene encoding the sequences of interest into the viral genome. The resulting TK-recombinant can be selected by culturing the cells in the presence of 5-bromodeoxyuridine and picking viral plaques resistant thereto.
Alternatively, avipoxviruses, such as the fowlpox and canarypox viruses, can also be used to deliver the nucleic acid molecules of interest. The use of an avipox vector is particularly desirable in human and other mammalian species since members of the avipox genus can only productively replicate in susceptible avian species and therefore are not infective in mammalian cells. Methods for producing recombinant avipoxviruses are known in the art and employ genetic recombination, as described above with respect to the production of vaccinia viruses. See, e.g., WO 91/12882; WO 89/03429; and WO 92/03545.
Molecular conjugate vectors, such as the adenovirus chimeric vectors described in Michael et al., J. Biol. Chem. (1993) 268:6866-6869 and Wagner et al., Proc. Natl. Acad. Sci. USA (1992) 89:6099-6103, can also be used for gene delivery.
Members of the alphavirus genus, such as, but not limited to, vectors derived from the Sindbis virus (SIN), Semliki Forest virus (SFV), and Venezuelan Equine Encephalitis virus (VEE), will also find use as viral vectors for delivering the polynucleotides of the present invention. For a description of Sindbis-virus derived vectors useful for the practice of the instant methods, see, Dubensky et al. (1996) J. Virol. 70:508-519; and International Publication Nos. WO 95/07995, WO 96/17072; as well as, Dubensky, Jr., T. W., et al., U.S. Pat. No. 5,843,723, issued Dec. 1, 1998, and Dubensky, Jr., T. W., U.S. Patent No. 5,789,245, issued Aug. 4, 1998, both herein incorporated by reference. Particularly preferred are chimeric alphavirus vectors comprised of sequences derived from Sindbis virus and Venezuelan equine encephalitis virus. See, e.g., Perri et al. (2003) J. Virol. 77: 10394-10403 and International Publication Nos. WO 02/099035, WO 02/080982, WO 01/81609, and WO 00/61772; herein incorporated by reference in their entireties.
A vaccinia-based infection/transfection system can be conveniently used to provide for inducible, transient expression of the nucleic acids of interest (e.g., engineered retron) in a host cell. In this system, cells are first infected in vitro with a vaccinia virus recombinant that encodes the bacteriophage T7 RNA polymerase. This polymerase displays exquisite specificity in that it only transcribes templates bearing T7 promoters. Following infection, cells are transfected with the nucleic acid of interest, driven by a T7 promoter. The polymerase expressed in the cytoplasm from the vaccinia virus recombinant transcribes the transfected DNA into RNA. The method provides for high level, transient, cytoplasmic production of large quantities of RNA. See, e.g., Elroy-Stein and Moss, Proc. Natl. Acad. Sci. USA (1990) 87:6743-6747; Fuerst et al., Proc. Natl. Acad. Sci. USA (1986) 83:8122-8126.
As an alternative approach to infection with vaccinia or avipox virus recombinants, or to the delivery of nucleic acids using other viral vectors, an amplification system can be used that will lead to high level expression following introduction into host cells. Specifically, a T7 RNA polymerase promoter preceding the coding region for T7 RNA polymerase can be engineered. Translation of RNA derived from this template will generate T7 RNA polymerase which in turn will transcribe more templates. Concomitantly, there will be a cDNA whose expression is under the control of the T7 promoter. Thus, some of the T7 RNA polymerase generated from translation of the amplification template RNA will lead to transcription of the desired gene. Because some T7 RNA polymerase is required to initiate the amplification, T7 RNA polymerase can be introduced into cells along with the template(s) to prime the transcription reaction. The polymerase can be introduced as a protein or on a plasmid encoding the RNA polymerase. For a further discussion of T7 systems and their use for transforming cells, see, e.g., International Publication No. WO 94/26911; Studier and Moffatt, J. Mol. Biol. (1986) 189: 113-130; Deng and Wolff, Gene (1994) 143:245-249; Gao et al., Biochem. Biophys. Res. Commun. (1994) 200: 1201-1206; Gao and Huang, Nuc. Acids Res. (1993) 21 :2867-2872; Chen et al., Nuc. Acids Res. (1994) 22:2114-2120; and U.S. Pat. No. 5,135,855.
Insect cell expression systems, such as baculovirus systems, can also be used and are known to those of skill in the art and described in, e.g., Baculovirus and Insect Cell Expression Protocols (Methods in Molecular Biology, D.W. Murhammer ed., Humana Press, 2nd edition, 2007) and L. King The Baculovirus Expression System: A laboratory guide (Springer, 1992). Materials and methods for baculovirus/insect cell expression systems are commercially available in kit form from, inter alia, Thermo Fisher Scientific (Waltham, MA) and Clontech (Mountain View, CA).
Plant expression systems can also be used for transforming plant cells. Generally, such systems use virus-based vectors to transfect plant cells with heterologous genes. For a description of such systems see, e.g., Porta et al., Mol. Biotech. (1996) 5:209-221; and Hackland et al., Arch. Virol. (1994) 139:1-22.
In order to effect expression of engineered retron constructs, the expression construct must be delivered into a cell. This delivery may be accomplished in vitro, as in laboratory procedures for transforming cell lines, or in vivo or ex vivo, as in the treatment of certain disease states. One mechanism for delivery is via viral infection where the expression construct is encapsulated in an infectious viral particle.
Several non-viral methods for the transfer of expression constructs into cultured cells also are contemplated. These include the use of calcium phosphate precipitation, DEAE-dextran, electroporation, direct microinjection, DNA-loaded liposomes, lipofectamine-DNA complexes, cell sonication, gene bombardment using high velocity microprojectiles, and receptor-mediated transfection (see, e.g., Graham and Van Der Eb (1973) Virology 52:456-467; Chen and Okayama (1987) Mol. Cell Biol. 7:2745-2752; Rippe et al. (1990) Mol. Cell Biol. 10:689-695; Gopal (1985) Mol. Cell Biol. 5:1188-1190; Tur-Kaspa et al. (1986) Mol. Cell. Biol. 6:716-718; Potter et al. (1984) Proc. Natl. Acad. Sci. USA 81:7161-7165; Harland and Weintraub (1985) J. Cell Biol. 101:1094-1099; Nicolau & Sene (1982) Biochim. Biophys. Acta 721:185-190; Fraley et al. (1979) Proc. Natl. Acad. Sci. USA 76:3348-3352; Fechheimer et al. (1987) Proc. Natl. Acad. Sci. USA 84:8463-8467; Yang et al. (1990) Proc. Natl. Acad. Sci. USA 87:9568-9572; Wu and Wu (1987) J. Biol. Chem. 262:4429-4432; Wu and Wu (1988) Biochemistry 27:887-892; herein incorporated by reference). Some of these techniques may be successfully adapted for in vivo or ex vivo use.
Once the expression construct has been delivered into the cell the nucleic acid comprising the engineered retron sequence may be positioned and expressed at different sites. In certain embodiments, the nucleic acid comprising the engineered retron sequence may be stably integrated into the genome of the cell. This integration may be in the cognate location and orientation via homologous recombination (gene replacement) or it may be integrated in a random, non-specific location (gene augmentation). In yet further embodiments, the nucleic acid may be stably maintained in the cell as a separate, episomal segment of DNA. Such nucleic acid segments or "episomes" encode sequences sufficient to permit maintenance and replication independent of or in synchronization with the host cell cycle. How the expression construct is delivered to a cell and where in the cell the nucleic acid remains is dependent on the type of expression construct employed.
In yet another embodiment, the expression construct may simply consist of naked recombinant DNA or plasmids comprising the engineered retron. Transfer of the construct may be performed by any of the methods mentioned above which physically or chemically permeabilize the cell membrane. This is particularly applicable for transfer in vitro but it may be applied to in vivo use as well. Dubensky et al. (Proc. Natl. Acad. Sci. USA (1984) 81:7529-7533) successfully injected polyomavirus DNA in the form of calcium phosphate precipitates into the liver and spleen of adult and newborn mice, demonstrating active viral replication and acute infection. Benvenisty & Reshef (Proc. Natl. Acad. Sci. USA (1986) 83:9551-9555) also demonstrated that direct intraperitoneal injection of calcium phosphate-precipitated plasmids results in expression of the transfected genes. It is envisioned that DNA encoding an engineered retron of interest may also be transferred in a similar manner in vivo and express retron products.
In still another embodiment, a naked DNA expression construct may be transferred into cells by particle bombardment. This method depends on the ability to accelerate DNA-coated microprojectiles to a high velocity allowing them to pierce cell membranes and enter cells without killing them (Klein et al. (1987) Nature 327:70-73). Several devices for accelerating small particles have been developed. One such device relies on a high voltage discharge to generate an electrical current, which in turn provides the motive force (Yang et al. (1990) Proc. Natl. Acad. Sci. USA 87:9568-9572). The microprojectiles may consist of biologically inert substances, such as tungsten or gold beads.
In a further embodiment, the expression construct may be delivered using liposomes. Liposomes are vesicular structures characterized by a phospholipid bilayer membrane and an inner aqueous medium. Multilamellar liposomes have multiple lipid layers separated by aqueous medium. They form spontaneously when phospholipids are suspended in an excess of aqueous solution. The lipid components undergo self-rearrangement before the formation of closed structures and entrap water and dissolved solutes between the lipid bilayers (Ghosh & Bachhawat (1991) Liver Diseases, Targeted Diagnosis and Therapy Using Specific Receptors and Ligands, Wu et al. (Eds.), Marcel Dekker, NY, 87-104). Also contemplated is the use of lipofectamine-DNA complexes. In certain embodiments, the liposome may be complexed with a hemagglutinating virus (HVJ). This has been shown to facilitate fusion with the cell membrane and promote cell entry of liposome-encapsulated DNA (Kaneda et al. (1989) Science 243:375-378). In other embodiments, the liposome may be complexed or employed in conjunction with nuclear nonhistone chromosomal proteins (HMG-I) (Kato et al. (1991) J. Biol. Chem. 266(6):3361-3364). In yet further embodiments, the liposome may be complexed or employed in conjunction with both HVJ and HMG-I. Because such expression constructs have been successfully employed in the transfer and expression of nucleic acid in vitro and in vivo, they are applicable to the present invention. Where a bacterial promoter is employed in the DNA construct, it also will be desirable to include within the liposome an appropriate bacterial polymerase.
Other expression constructs which can be employed to deliver a nucleic acid into cells are receptor-mediated delivery vehicles. These take advantage of the selective uptake of macromolecules by receptor-mediated endocytosis in almost all eukaryotic cells. Because of the cell type-specific distribution of various receptors, the delivery can be highly specific (Wu and Wu (1993) Adv. Drug Delivery Rev. 12:159-167).
Receptor-mediated gene targeting vehicles generally consist of two components: a cell receptor-specific ligand and a DNA-binding agent. Several ligands have been used for receptor-mediated gene transfer. The most extensively characterized ligands are asialoorosomucoid (ASOR) and transferrin (see, e.g., Wu and Wu (1987), supra, Wagner et al. (1990) Proc. Natl. Acad. Sci. USA 87(9):3410-3414). A synthetic neoglycoprotein, which recognizes the same receptor as ASOR, has been used as a gene delivery vehicle (Ferkol et al. (1993) FASEB J. 7:1081-1091; Perales et al. (1994) Proc. Natl. Acad. Sci. USA 91(9):4086-4090), and epidermal growth factor (EGF) has also been used to deliver genes to squamous carcinoma cells (Myers, EPO 0273085).
In other embodiments, the delivery vehicle may comprise a ligand and a liposome. For example, Nicolau et al. (Methods Enzymol. (1987) 149: 157-176) employed lactosyl-ceramide, a galactose-terminal asialoganglioside, incorporated into liposomes and observed an increase in the uptake of the insulin gene by hepatocytes. Thus, it is feasible that a nucleic acid encoding a particular gene also may be specifically delivered into a cell by any number of receptor-ligand systems with or without liposomes. Also, antibodies to surface antigens on cells can similarly be used as targeting moieties.
In a particular example, a recombinant polynucleotide comprising an engineered retron may be administered in combination with a cationic lipid. Examples of cationic lipids include, but are not limited to, lipofectin, DOTMA, DOPE, and DOTAP. The publication of WO/0071096, which is specifically incorporated by reference, describes different formulations, such as a DOTAP:cholesterol or cholesterol derivative formulation that can effectively be used for gene therapy. Other disclosures also discuss different lipid or liposomal formulations including nanoparticles and methods of administration; these include, but are not limited to, U.S. Patent Publication 20030203865, 20020150626, 20030032615, and 20040048787, which are specifically incorporated by reference to the extent they disclose formulations and other related aspects of administration and delivery of nucleic acids. Methods used for forming particles are also disclosed in U.S. Pat. Nos. 5,844,107, 5,877,302, 6,008,336, 6,077,835, 5,972,901, 6,200,801, and 5,972,900, which are incorporated by reference for those aspects.
In certain embodiments, gene transfer may more easily be performed under ex vivo conditions. Ex vivo gene therapy refers to the isolation of cells from a subject, the delivery of a nucleic acid into cells in vitro, and then the return of the modified cells back into the subject. This may involve the collection of a biological sample comprising cells from the subject. For example, blood can be obtained by venipuncture, and solid tissue samples can be obtained by surgical techniques according to methods well known in the art.
Usually, but not always, the subject who receives the cells (i.e., the recipient) is also the subject from whom the cells are harvested or obtained, which provides the advantage that the donated cells are autologous. However, cells can be obtained from another subject (i.e., donor), a culture of cells from a donor, or from established cell culture lines. Cells may be obtained from the same or a different species than the subject to be treated, but preferably are of the same species, and more preferably of the same immunological profile as the subject. Such cells can be obtained, for example, from a biological sample comprising cells from a close relative or matched donor, then transfected with nucleic acids (e.g., comprising an engineered retron), and administered to a subject in need of genome modification, for example, for treatment of a disease or condition.
BACTERIAL HOST CELLS
In some embodiments, the bacterial host cells can be any bacterial cells that have phage receptor binding proteins (RBP) for the phage type(s) of interest. However, bacteriophages can be species-specific with regard to their hosts and may only infect a single bacterial species or even some specific strains within a species. For example, in some cases the bacterial hosts include Escherichia coli. However, other bacterial species can be used as host cells. For example, the bacterial host cells can be one or more strains of Escherichia coli (often linked to gastrointestinal distress), Salmonella (often linked to food poisoning), Mycobacterium (causes tuberculosis), Bacillus anthracis (anthrax), Citrobacter freundii (gastroenteritis, neonatal meningitis, and septicemia), Clostridium tetani (tetanus), Clostridium botulinum (botulism), Clostridium difficile (gastrointestinal problems, especially in those with a weak immune system), Enterobacter hormaechei (nosocomial infections), Haemophilus influenzae (meningitis), Haemophilus influenzae Type B (ear, throat, lung infections), Helicobacter pylori (stomach ulcers), Klebsiella pneumoniae (infections, especially lung infections), Leptospira (Leptospirosis), Listeria monocytogenes (meningitis), Pseudomonas aeruginosa (infections, especially lung infections), Neisseria gonorrhoeae (gonorrhea), Neisseria meningitidis (meningitis), Serratia marcescens (nosocomial infections), Shigella dysenteriae (dysentery, shigellosis), Staphylococcus aureus (e.g., methicillin-resistant Staphylococcus aureus (MRSA) that is resistant to antibiotics), Streptococcus pneumoniae (infections, especially lung infections, and meningitis), Treponema pallidum (syphilis), Vibrio cholerae (cholera), Vibrio vulnificus (“flesh-eating” bacteria), Yersinia pestis (bubonic plague), Legionella pneumophila (Legionnaires' disease), Cutibacterium acnes (acne), and others.
The bacterial host cells can be modified to include retron nucleic acids (ncRNAs) that encode donor DNAs adapted for editing phage genomes.
The bacterial host cells can also include components to facilitate editing of the bacterial or phage genomes, including one or more types of reverse transcriptases, single strand annealing proteins (SSAPs), single-stranded DNA binding proteins (SSBs), mismatch repair (e.g., mutL) mutants, or combinations thereof. In some cases, the bacterial host cells can be modified to include a dominant-negative mutant mutL gene (e.g., with an E32K mutation). Sequences for such protein components are available in the database provided by National Center for Biotechnology Information (NCBI; see website at ncbi.nlm.nih.gov). Some examples of sequences for these types of proteins are provided herein but other sequences for these types of proteins can also be used.
One or more of a variety of single strand annealing proteins (SSAPs) can be expressed, either endogenously or recombinantly, to facilitate recombination during editing of the bacterial or phage genomes. In general, the one or more SSAPs so expressed are compatible with one or more single-stranded DNA binding proteins (SSBs) to promote recombination during editing. The SSAPs can be bacterial or phage SSAPs; either the bacterial host cell or the infecting phage can express such SSAPs.
For example, one type of SSAP that can be expressed by bacteria during editing of genomes is the bacteriophage lambda bet (also called Redβ) SSAP protein, which has the following protein sequence (NCBI NP_040617.1; SEQ ID NO: 11).
1 MSTALATLAG KLAERVGMDS VDPQE LITTL RQTAFKGDAS 41 DAQFIAL LIV ANQYGLNPWT KEIYAFPDKQ NGIVPVVGVD 81 GWSRIINENQ QFDGMDFEQD NESCTCRIYR KDRNHPICVT 121 EWMDECRREP FKTREGREIT GPWQSHPKRM LRHKAMIQCA 161 RLAFGFAGIY DKDEAERIVE NTAYTAERQP ERDITPVNDE 201 TMQEINTLLI ALDKTWDDDL LPLCSQI FRR DIRASSE LTQ 241 AEAVKALGF L KQKAAEQKVA A
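The sequence listings in this section are printed in rows of 40 residues, each row preceded by the position of its first residue. As a minimal sketch (not part of the disclosed methods), the following Python helper joins such a listing into one contiguous string and reports any position label that disagrees with the running residue count. The function name `parse_listing` is hypothetical, and the example input is the first two rows of SEQ ID NO: 11 as printed above.

```python
def parse_listing(text):
    """Join a position-numbered sequence listing into one string.

    Returns (sequence, mismatches), where each mismatch is a
    (printed_label, expected_label) pair.
    """
    residues = []
    mismatches = []
    for token in text.split():
        if token.isdigit():
            # A numeric token is the position label of the next residue.
            expected = len(residues) + 1
            if int(token) != expected:
                mismatches.append((int(token), expected))
        else:
            # Everything else is sequence; stray spaces inside a block
            # are harmless because tokens are simply concatenated.
            residues.extend(token.upper())
    return "".join(residues), mismatches


# First two printed rows of SEQ ID NO: 11 (lambda bet), copied as shown.
rows = (
    "1 MSTALATLAG KLAERVGMDS VDPQE LITTL RQTAFKGDAS "
    "41 DAQFIAL LIV ANQYGLNPWT KEIYAFPDKQ NGIVPVVGVD"
)
sequence, mismatches = parse_listing(rows)
```

A reported mismatch only flags a row for inspection: it may come from a misprinted label or from residues lost during text extraction, and the helper cannot tell which.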
An example of a protein sequence for a Shigella dysenteriae bet SSAP protein is shown below (NCBI AAF28115.1; SEQ ID NO: 12).
1 MSTALATLAG KLAERVGMDS VDPQE LITTL RQTAFKGDAS
41 DAQFIAL LIV ANQYGLNPWT KEIYAFPDKQ NGIVPVVGVD
81 GWSRIINENQ QFDGMDFEQD NESCTCRIYR KDRNHPICVT
121 EWMDECRRAP FKTREGREIT GPWQSHPKRM LRHKAMIQCA 161 RLAFGFAGIY DKDEAERIVE NTAYTTERQP ERDITPVNEE 201 TMSEINALLT FMEKTWDDDL LPLCSQI FRR NIYTSSE LTQ 241 AEAVKVLGF L KQKVTEQKVA A
An example of a protein sequence for an Escherichia phage Stx2 II bet SSAP protein is shown below (NCBI NP_859351.1; SEQ ID NO: 13).
1 MSTALATLAG KLAERVGMDS VDPQE LITTL RQTAFKGDAS 41 DAQFIAL LIV ANQYGLNPWT KEIYAFPDKQ NGIVPVVGVD 81 GWSRIINENQ QFDGMDFEQD NESCTCRIYR KDRNHPICVT 121 EWMDECRREP FKTREGREIT GPWQSHPKRM LRHKAMIQCA 161 RLAFGFAGIY DKDEAERIVE NTAYTAERQP ERDITPVNDE 201 TMQEINTLLI ALDKTWDDDL LPLCSQI FRR DIRASSE LTQ 241 AEAVKVLGF L KQKASEQKVA A
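The bet SSAP sequences listed here are closely related, so it can help to see exactly where two of them differ. The sketch below is illustrative only: the helper name `differing_positions` is an assumption, and the inputs are the final printed rows (starting at position 241) of SEQ ID NO: 11 and SEQ ID NO: 13 with the spaces removed.

```python
def differing_positions(seq_a, seq_b, start=1):
    """Return (position, residue_a, residue_b) for each mismatch.

    Both sequences must have the same length; `start` is the
    1-based position of the first residue supplied.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    return [
        (start + offset, a, b)
        for offset, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


# Rows beginning at position 241, spaces removed.
lambda_bet_tail = "AEAVKALGFLKQKAAEQKVAA"   # SEQ ID NO: 11
stx2_bet_tail = "AEAVKVLGFLKQKASEQKVAA"     # SEQ ID NO: 13

diffs = differing_positions(lambda_bet_tail, stx2_bet_tail, start=241)
```

This position-by-position comparison assumes the two sequences are already aligned without gaps, which holds for these equal-length rows but not for more divergent homologs.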
An example of a protein sequence for an Enterobacteria phage VT2-Sakai bet SSAP protein is shown below (NCBI BAA84297.1; SEQ ID NO: 14).
1 MSTALATLAG KLAERVGMDS VDPQE LITTL RQTAFKGDAS
41 DAQFIAL LIV ANQYGLNPWT KEIYAFPDKQ NGIVPVVGVD
81 GWSRIINENQ QFDGMDFEQD NESCTCRIYR KDRNHPICVT
121 EWMDECRREP FKTREGREIT GPWQSHPKRM LRHKAMIQCA 161 RLAFGFAGIY DKDEAERIVE NTAYTAERQP ERDITPVNDE 201 TMQEINTLLI ALDKTWDDDL LPLCSQI FRR DIRASSE LTQ 241 AEAVKVLGF L KQKASEQKVA A
An example of a protein sequence for an Escherichia phage vB_EcoP_24B bet SSAP protein is shown below (NCBI ADN68402.1; SEQ ID NO: 15).
1 MSTALATLAG KLAERVGMDS VDPQE LITTL RQTAFKGDAS
41 DAQFIAL LIV ANQYGLNPWT KEIYAFPDKQ NGIVPVVGVD
81 GWSRIINENQ QFDGMDFEQD NESCTCRIYR KDRNHPICVT
121 EWMDECRREP FKTREGREIT GPWQSHPKRM LRHKAMIQCA 161 RLAFGFAGIY DKDEAERIVE NTAYTAERQP ERDITPVNDE 201 TMQEINTLLI ALDKTWDDDL LPLCSQI FRR DIRASSE LTQ 241 AEAVKALGF L KQKATEQKVA A
An example of a protein sequence for a Serratia phage Eta recombination (bet) protein is shown below (NCBI YP_008130312.1; SEQ ID NO: 16).
1 MSTALAAIAQ SSGVSVDDVT DVLKGMIISA KNQHGAQVSN
41 AE LAVVSGVC AKYDLNPMVK ECAAFISGGK LQWLMIDGW
81 YRIVNRQPNF DGVE FDDHID DKSVLTAITC RMYIKGRTRP
121 VWTEYMSEC RDPKSSVWQK WPARMLRHKA YIQCARMTFG
161 ISDMIDNDEA SRITQGEKNI TQQASSVSTV DYQAIDQAMG 201 ECEDHDALNK LCAEIRAEME KRGTWNSEKV TLADMKSRHK 241 ARIDAAVVTD EF EVVEDDND GAVKSDVEDS ATDDDVPF E
In some cases, the SSAPs expressed by the bacterial host cells include a RecT recombinase. Such recombination-facilitating RecT proteins belong to Pfam family PF03837. One example of a RecT protein sequence is the following Enterobacteriaceae RecT protein sequence (NCBI WP_000166319.1; SEQ ID NO: 17).
1 MTKQPPIAKA DLQKTQGNRA PAAVKNSDVI SFINQPSMKE 41 QLAAALPRHM TAERMIRIAT TEIRKVPALG NCDTMSFVSA 81 IVQCSQLGLE PGSALGHAYL LPFGNKNEKS GKKNVQLIIG 121 YRGMIDLARR SGQIASLSAR VVREGDEFSF EFGLDEKLIH 161 RPGENEDAPV THVYAVARLK DGGTQFEVMT RKQIELVRSL 201 SKAGNNGPWV THWEEMAKKT AIRRLFKYLP VSIEIQRAVS 241 MDEKEPLTID PADSSVLTGE YSVIDNSEE
A nucleotide sequence that encodes an Enterobacteriaceae RecT protein is shown below (NCBI NC_000913.3; SEQ ID NO: 18).
1 ATGACTAAGC AACCACCAAT CGCAAAAGCC GATCTGCAAA 41 AAACTCAGGG AAACCGTGCA CCAGCAGCAG TTAAAAATAG 81 CGACGTGATT AGTTTTATTA ACCAGCCATC AATGAAAGAG 121 CAACTGGCAG CAGCTCTTCC ACGCCATATG ACGGCTGAAC 161 GTATGATCCG TATCGCCACC ACAGAAATTC GTAAAGTTCC
201 GGCGTTAGGA AACTGTGACA CTATGAGTTT TGTCAGTGCG 241 ATCGTACAGT GTTCACAGCT CGGACTTGAG CCAGGTAGCG 281 CCCTCGGTCA TGCATATTTA CTGCCTTTTG GTAATAAAAA 321 CGAAAAGAGC GGTAAAAAGA ACGTTCAGCT AATCATTGGC 361 TATCGCGGCA TGATTGATCT GGCTCGCCGT TCTGGTCAAA 401 TCGCCAGCCT GTCAGCCCGT GTTGTCCGTG AAGGTGACGA 441 GTTTAGCTTC GAATTTGGCC TTGATGAAAA GTTAATACAC 481 CGCCCGGGAG AAAACGAAGA TGCCCCGGTT ACCCACGTCT 521 ATGCTGTCGC AAGACTGAAA GACGGAGGTA CTCAGTTTGA 561 AGTTATGACG CGCAAACAGA TTGAGCTGGT GCGCAGCCTG 601 AGTAAAGCTG GTAATAACGG GCCGTGGGTA ACTCACTGGG 641 AAGAAATGGC AAAGAAAACG GCTATTCGTC GCCTGTTCAA 681 ATATTTGCCC GTATCAATTG AGATCCAGCG TGCAGTATCA 721 ATGGATGAAA AGGAACCACT GACAATCGAT CCTGCAGATT 761 CCTCTGTATT AACCGGGGAA TACAGTGTAA TCGATAATTC 810 AGAGGAATAA
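The relationship between the RecT coding sequence (SEQ ID NO: 18) and the RecT protein sequence (SEQ ID NO: 17) can be checked by translating the nucleotide sequence with the standard genetic code. The following is a minimal sketch under that assumption. The names `CODON_TABLE` and `translate` are hypothetical, and only the first printed row of SEQ ID NO: 18 is used as input.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA in frame 1, stopping at the first stop codon."""
    dna = "".join(dna.split()).upper()  # drop spaces from the listing
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First printed row (positions 1-40) of SEQ ID NO: 18.
first_row = "ATGACTAAGC AACCACCAAT CGCAAAAGCC GATCTGCAAA"
peptide = translate(first_row)
```

The 13 complete codons in this row translate to MTKQPPIAKADLQ, which agrees with the start of the RecT protein sequence listed as SEQ ID NO: 17.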
Another example of a RecT protein sequence is the following Escherichia coli RecT protein sequence (NCBI QTN08202.1; SEQ ID NO: 19).
1 MTKQPPIAKA DLQKTQGNRA PAAVKNSDVI SFINQPSMKE
41 QLAAALPRHM TAERMIRIAT TEIRKVPALG NCDTMSFVSA
81 IVQCSQLGLE PGSALGHAYL LPFGNKNEKS GKKNVQLIIG
121 YRGMIDLARR SGQIASLSAR VVREGDEFSF EFGLDEKLIH
161 RPGENEDAPV THVYAVARLK DGGTQFEVMT RKQIELVRSL
201 SKAGNNGPWV THWEEMAKKT AIRRLFKYLP VSIEIQRAVS
241 MDEKEPLTID PADSSVLTGE YSVIDNSEE
An example of an Escherichia ruysiae RecT sequence is shown below (NCBI MBY7351797.1; SEQ ID NO: 20).
1 MTKQPPIAKA DLQKTQGNRA PAAVKNSDVI SFINQPSMKE 41 QLAAALPRHM TAERMIRIAT TEIRKVPALG NCDTMSFVSA 81 IVQCSQLGLE PGSALGHAYL LPFGNKNEKS GKKNVQLIIG 121 YRGMIDLARR SGQIAS LSAR IVREGDE FSF EFGLDEKLIH 161 RPGENEDAPV THVYAVARLK DGGTQFEVMT RKQI E LVRSL 201 SKAGNNGPWV THWE EMAKKT AIRRLFKYLP VSIEIQRAVS 241 MDEKEPLTID PADSSVLTGE YSVIDNSEE
An example of a Salmonella RecT protein sequence is shown below (NCBI WP_079839509.1; SEQ ID NO: 21).
1 MPKQPPIAKA DLQKTQGARP PTAVKNNNDV ISFINQPSMK
41 EQLAAALPRH MTAERMIRIA TTEIRKVPAL GDCDTMSFVS
81 AIVQCSQLGL EPGGALGHAY L LPFGNRNEK SGKKNVQLII
121 GYRGMIDLAR RSGQIASLSA RWREGDDFS FE FGLE EKLV 161 HRPGENEDAP VTHVYAVARL KDGGTQF EVM TRKQIE LVRA
201 QSKAGNNGPW VTHWEEMAKK TAIRRLFKYL PVSI EIQRAV
241 SMDEKETLTI DPADASVITG EYSWENAGV EENVTA
An example of a Mycobacterium tuberculosis RecT protein sequence is shown below (NCBI SGD85611.1; SEQ ID NO: 22).
1 MSNPPIAQAD LQKAQGTAVK EKTKDQQLIQ FINQPGMKAQ
41 LSAALPRHIT PDRMIRIVTT EIRKTPS LAT CDMQSFIGAV
81 VQCSQLGLEP GNALGHAYLL PFGNGKATSG QPNVQLIIGY
121 RGMIDLARRS GQIISISARS VREGDSFHF E YGLNEDLTHV
161 PGENDSGPIT HVYAVARLKE GGVQF EVMSF SQIEKVRDSS 201 KAGKNGPWVS HWEEMAKKTV IRRLFKYLPV SI EMQRAVI L 241 DEKAEANVDQ EHASIF EGEY ETVSPE
An example of a Bacillus anthracis RecT protein sequence is shown below (NCBI GEU13688.1; SEQ ID NO: 23).
1 MSNDLTQITQ RS LDEQVIGN LNRLQEQGLE MPPGYSPQNA 41 LKSAF FE LTN NSGGNL LQLA ANNPETKTSI SNAL LDMVIQ 81 GLSPAKKQCY FIKYGNKVQL MRSYFGTMAV LDRVTGGAEI 121 TPVWREGDV FEIAMDGPDL VVAKHETSF E NLDNDIKAAY 161 VVIKLANGKE VTTVMTKKQI DKSWSKAKTK NVQNDFPE EM 201 AKRTVINRAA KYLINTSNDN DLFVQAAKDT LENE FERKDV 241 TPERE EQTAV LE EKIFTNNK KVI EQENDI E RITRVADVPE 281 QPDIEQAKQI EKEDLTKVAD QI LEEPVQET LDVMAGYETN 321 QKESEADVST IE EDDYPF
An example of a Clostridium tetani RecT protein sequence is shown below (NCBI SUY55099.1; SEQ ID NO: 24).
1 MATNESLKNQ LTTKKETGLG SAGNTIKGLM NSPAIKKRFE 41 EVLKQRAPQY MSSIVNLVNS DINLKKCDQM SVVASCMVAA 81 TLDLPVDKNL GYAWWPYGN KAQFQLGYKG YVQLALRTGQ 121 YKSINVI EIH EGE LIDWNPL TEE LKIDFSK KESDAVIGYA 161 GYF E L LNGFK KSTYWTKEQI TKHKNKFSKS DFGWKKDFDA 201 MARKTVLRNM LSKWGI LSIE MQNAYTADQG IIKNEIMETG 241 EVKENIEYI E ADFESYEDNS I EEGGANE
An example of a Clostridium difficile RecT protein sequence is shown below (NCBI AXU84523.1; SEQ ID NO: 25).
1 MASEKAKGAL EKKVSGANTV KVSPSKGMEQ LMNKMASQIK
41 KALPSMVSSE RFQRVALTAF SNNPRLQSCE PMSFIAAMME
81 SAQLGLEPNT PLGQAYLIPY GNKVQFQIGY KGLLE LAQRS
121 GKIKTIYAHK IRENDKFEIK YGLHQDLVHE PKLNGDRGEI
161 IGYYAVYHLD TGGHSFSFMT KEEII EFAKS KSKSYSSGPW 201 QTDFDSMAKK TVIKQL LKYA PLSIE LQKAM VGDETIKSEI
241 DEDMSMVVDE SESLEVDF EV KENMDGKVSV EEAINVD
An example of a Haemophilus influenzae RecT-like single-stranded DNA binding protein is shown below (NCBI AJO88455.1; SEQ ID NO: 26).
1 MTNQVQHQQN KQPPALKTFF ESANVQNKIK E LVGKNAATF 41 ATSVMQIANS NSMLKTADPM SIFNAACMAA TLNLPLQNGL 81 GFAYIVPFRN NKEKKTEAQF QIGYKGFIQL AQRSGQFKRL 121 VALPVYKKQL IKKDFINGFE FDWEQEPEQN ENPIGYYAYF 161 KLVNDFSAE L YMSHDDIVKH AQRYSQTFKK GYGVWHDNFE 201 AMALKTVTKL LLSKQAPLSV EMQQAVLADQ TVVKDVENQE 241 FNYTDNIQEA EF LAWDEAT F EQCKQSIAN GETTLQE LCD
281 SGAYE FSQEQ LTKLEE LENQ KAE
An example of a Staphylococcus aureus RecT protein sequence is shown below (NCBI BAV60852.1; SEQ ID NO: 27).
1 MTENNKLQTI EQQLVQEKNV SDNVLNKVRV LESQGNLE LP 41 NDYSPSNAMK QAWLQISQDN KLMSCNDTSK ANAL LDMVTQ 81 GLNPAKNQCY FIPYGNKMQL QRSYHGNVMM LKRDAGAQDV
121 VAQVIYKGDT FKQEMGGTGR IKAIKHEQDF FNIDKENIIG 161 AYCTIVFNDG RDNYIEVMTI EQIKQAWMQS SMIKDEKALQ 201 NSKTHNNFKE EMAKKTVINR AAKRYINTST DSNLFKYAQE 241 SEQRQRKEVL DAEVEENANQ EQLDF EQPVL EEAQYTE LEN 281 DKPIDVSDF E EIKEPATEKE SEE EPF
An example of a Streptococcus pneumoniae RecT protein sequence is shown below (NCBI VRD08895.1; SEQ ID NO: 28).
1 MANEIAKFDT LTPQQAFKSP AALEKFKSVL DGSETQFVAS
41 L LSIINNNSY LAQATNTSIM NAAMKAATLK LPIEPS LGMA
81 YWPYNRSEK RGNTWVKINE AQFQMGYKGF IQLAQRSGQI
121 RNINCDVVYK EE F LRYDKVY GTLHLTDEQV DSGEVEGYFA 161 S LE LINGFRK MI FWKKEKVI AHAQKYSKTY DKKTGDFKPG 201 TPWKTEFDAM AQKTLIKE LL SKYAPLSIE L QKAI LADNED 241 SNVNEVKRAK DVTPQEPENL SDL LSAPEE E QAKDVTPLED
281 DAQNSAADSV PDFVDPENGQ MDMLEGEDF
A protein sequence for a RecT from a Collinsella stercoris phage (called CspRecT; NCBI: WP_006720782.1; SEQ ID NO: 29) is shown below.
1 MNQIVKFTDD SGLAVQVTPD DVRRYICENA TEKEVGLF LQ
41 LCQTQRLNPF VKDAYLVKYG GAPASMITSY QVFNRRACRD 81 ANYDGIKSGV WLRDGDVVH KRGAACYKKA GE E LIGGWAE 121 VRFKDGRETA YAEVALDDYS TGKSNWAKMP GVMI EKCAKA 161 AAWRLAFPDT FQGMYAAE EM DQAQQPEQVR AQAEQPVDLQ 201 PIRE LFKPYC EHFGITPAEG MTAVCGAVGA EGMHSMTEQQ
241 ARRARAWME E EMAAPAVEAE YEVVDEGEVF
The bacterial host cells can also have modified mismatch repair functions. For example, genes encoding mismatch repair enzymes can be modified to reduce mismatch repair. In some cases, one or more mismatch repair genes can be modified so that the encoded protein may bind to a mismatch site but be unable to correct the mismatch, resulting in unrepaired sites that are blocked from repair by other repair mechanisms.
One example of a gene involved in mismatch repair within E. coli is the mutL gene. A protein sequence for an Escherichia coli DSM 30083 MutL is shown below (NCBI ACZ50725.1; SEQ ID NO: 30).
1 MPIQVLPPQL ANQIAAGEW ERPASWKE L VENS LDAGAT 41 RIDIDIERGG AKLIRIRDNG CGIKKDE LAL ALARHATSKI 81 ASLDDLEAII SLGFRGEALA SISSVSRLTL TSRTAEQQEA 121 WQAYAEGRDM NVTVKPAAHP VGTTLEVLDL FYNTPARRKF 161 LRTEKTE FNH IDEIIRRIAL ARFDVTINLS HNGKIVRQYR 201 AVPEGGQKER RLGAICGTAF LEQALAI EWQ HGDLTLRGWV 241 ADPNHTTPAL AEIQYCYVNG RMMRDRLINH AIRQACEDKL 281 GADQQPAFVL YLEIDPHQVD VNVHPAKHEV RFHQSRLVHD 321 FIYQGVLSVL QQQLETPLPL DDEPQPAPRS IPENRVAAGR 361 NHFAEPAARE PVAPRYTPAP ASGSRPAAPW PNAQPGYQKQ 401 QGEVYRQLLQ TPAPMQKLKA PEPQEPALAA NSQSFGRVLT 441 IVHSDCALLE RDGNIS LLSL PVAERWLRQA QLTPGEAPVC 481 AQPLLIPLRL KVSAEEKSAL EKAQSALAE L GIDFQSDAQH 521 VTIRAVPLPL RQQNLQI LIP E LIGYLAKQS VF EPGNIAQW 561 IARNLMSEHA QWSMAQAITL LADVERLCPQ LVKTPPGGLL 601 QSVDLHPAIK ALKDE
A mutant E. coli mutL protein in which the glutamic acid (E) at position 32 is replaced with a lysine (K) is a dominant-negative mutL (E32K) mutant protein. Even in the presence of wild type mutL, this mutant inhibits the overall mismatch repair reaction as well as MutH activation.
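The E32K notation names the wild-type residue (E), its 1-based position (32), and the replacement residue (K). As a minimal sketch of how such a substitution can be applied to a protein string, the hypothetical helper `substitute` below first confirms that the expected wild-type residue is present. The input is a toy placeholder sequence, not a MutL sequence.

```python
def substitute(protein, position, wild_type, replacement):
    """Apply a point substitution such as E32K to a protein string.

    `position` is 1-based, matching the numbering used in the text.
    """
    index = position - 1
    if not 0 <= index < len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    if protein[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, "
            f"found {protein[index]}"
        )
    return protein[:index] + replacement + protein[index + 1:]


# Toy placeholder: 40 residues with a glutamic acid (E) at position 32.
toy_protein = "A" * 31 + "E" + "A" * 8
mutant = substitute(toy_protein, 32, "E", "K")
```

The wild-type check matters when working from printed listings, because a residue lost during text extraction shifts every downstream position and would otherwise put the substitution in the wrong place.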
A nucleotide sequence for wild type Escherichia coli mutL is shown below (NCBI GU134327.1; SEQ ID NO: 31).
1 ATGTAGCGAC CAGTATGATC AGTCAGTTGC AACGCATTGG 41 CGAAATACAT AAACGCCGAC CAGAACACGC GAGTCTCGGC 81 GTTCTGCGTT CGCCGGATAT CCCGTCAGTA CTGGTCGAAA 121 CCGGTTTTAT CAGCAACAAC AGCGAAGAAC GTTTGCTGGC 161 GAGCGACGAT TACCAACAAC AGCTGGCAGA AGCCATTTAC 201 AAAGGCCTGC GCAATTATTT CCTTGCGCAT CCGATGCAAT
241 CTGCGCCGCA GGGGGCAACG GCACAAACTG CCAGTACGGT 281 GACGACGCCA GATCGCACGC TGCCAAACTA AGGACGATTG 321 ATGCCAATTC AGGTCTTACC GCCACAACTG GCGAACCAGA
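[The remainder of SEQ ID NO: 31 appears as an image (imgf000145_0001) in the original publication and is not reproduced here.]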
An example of a mutL protein sequence from Mycobacterium tuberculosis is shown below (NCBI SGC95817.1; SEQ ID NO: 32). A glutamic acid at position 32 is also highlighted below.
1 MAIRI LPPQL ANQIAAGEW ERPASWKE L VENS LDAGAT 41 RIDIDIERGG EKLIRIRDNG CGIAKDE LAL ALARHATSKI 81 ATLDDLEAIM SMGFRGEALA SISSVSRLTF TSRTRDQNEA 121 WQAYAEGREM AVTLKPAAHP AGSTVEVLDL FFNTPARRKF 161 LRTEKTE FGH IDEVVRRIAL SRFDVAINLT HNGKLMRQYR 201 PVKAEDQQER RLGAICGTAF MQQALAVTWS HE E LEIRGWV 241 ASPAEYNGPA DLQYCYVNGR MMRDRLINHA IRQAYEDRLS 281 GDQQPAYVLY LTIDPRQVDV NVHPAKHEVR FHQARLVHDF 321 IYQAVLSVLK QHEQPVSLFQ PDPSAPQTQH VPENRAAAGK 361 NIF EREEALT PPPHRETTGG GASHSGHSSG KPAYSADKPV 401 YSPKEAGIYQ SLMQTPAETR PQLFPEKSTF LSENRISAAS 441 EPVTEKSERV SFGKI LTLYP PCYALIETDA GIALFS LSKA 481 SHYLRCQQLI PGEQGLKSQP LMIPLQMALN AQECETFTQF 521 AGVLRTFGI E GSVSRGKATI RTVSLPLRQQ NLPHLIPE LL 561 RF LADNPEGD EKAIASRLAE MLVSEPAAQS KAQAVQLLAD 601 VERLCPQLVR RPPADL LQLI DLTEVVAALR HE
An example of a Salmonella enterica mutL protein sequence is shown below (NCBI ACL55048.1; SEQ ID NO: 33). A glutamic acid at position 32 is also highlighted below.
1 MPIQVLPPQL ANQIAAGEW ERPASWKE L VENS LDAGAT 41 RVDIDIERGG AKLIRIRDNG CGIKKEE LAL ALARHATSKI 81 ASLDDLEAII SLGFRGEALA SISSVSRLTL TSRTAEQAEA 121 WQAYAEGRDM DVTVKPAAHP VGTTLEVLDL FYNTPARRKF 161 MRTEKTE FNH IDEIIRRIAL ARFDVTLNLS HNGKLVRQYR 201 AVAKDGQKER RLGAICGTPF LEQALAI EWQ HGDLTLRGWV 241 ADPNHTTTAL TEIQYCHVNG RMMRDRLINH AIRQACEDKL 281 GADQQPAFVL YLEIDPHQVD VNVHPAKHEV RFHQSRLVHD 321 FIYQGVLSVL QQQTETTLPL EDIAPAPRHV PENRIAAGRN 361 HFAVPAEPTA AREPATPRYS GGASGGNGGR QSAGGWPHAQ 401 PGYQKQQGEV YRAL LQTPTT SPAPEAVAPA LDGHSQSFGR 441 VLTIVCGDCA LLEHAGTIQL LSLPVAERWL RQAQLTPGQS 481 PVCAQPL LIP LRLKVSADEK AALQKAQSL L GE LGIE FQSD 521 AQHVTIRAVP LPLRQQNLQI LIPE LIGYLA QQTTFATVNI 561 AQWIARNVQS EHPQWSMAQA ISL LADVERL CPQLVKAPPG 601 GLLQPVDLHS AMNALKHE
An example of a Bacillus anthracis mutL protein sequence is shown below (NCBI WP_000516478.1; SEQ ID NO: 34). A glutamic acid at a position homologous to position 32 (position 33) is also highlighted below.
1 MGKIRKLDDQ LSNLIAAGEV VERPASWKE LVENSIDANS 41 TSIEIHLEEA GLSKIRIIDN GDGIAEEDCI VAFERHATSK 81 IKDENDLFRI RTLGFRGEAL PSIASVSELE LITSTGDAPG 121 THLIIKGGDI IKQEKTASRK GTDITVQNLF FNTPARLKYM 161 KTIHTELGNI TDIVYRIAMS HPEVSLKLFH NEKKLLHTSG 201 NGDVRQVLAS IYSIQVAKKL VPIEAESLDF TIKGYVTLPE 241 VTRASRNYMS TIVNGRYVRN FVLMKAIQQG YHTLLPVGRY 281 PIGFLSIEMD PMLVDVNVHP AKLEVRFSKE QE LLKLIEET 321 LQAAFKKIQL IPDAGVTTKK KEKDESVQEQ FQFEHAKPKE 361 PSMPEIVLPT GMDEKQEEPQ AVKQPTQLWQ PSTKPIIEEP 401 IQEEKSWDSN EEGFELEE LE EVREIKEIEM NGNDLPPLYP 441 IGQMHGTYIF AQNDKGLYMI DQHAAQERIN YEYFRDKVGR 481 VAQEVQE LLV PYRIDLSLTE F LRVEEQLEE LKKVGLFLEQ 521 FGHQSFIVRS HPTWFPKGQE TEIIDEMMEQ WKLKKVDIK 561 KLREEAAIMM SCKASIKANQ YLTNDQIFAL LEELRTTTNP 601 YTCPHGRPI L VHHSTYELEK MFKRVM
An example of a Clostridium tetani mutL protein sequence is shown below (NCBI WP_011099529.1; SEQ ID NO: 35). A glutamic acid at a position homologous to position 32 (position 33) is also highlighted below.
1 MKRINILDEC TFNKIAAGEV VERPFSVVKE LVENSIDAEA 41 KNITVEVKNG GQDLIKVSDD GAGIYADDIQ KAFLTHATSK 81 I LNIDDIFSL NTMGFRGEAL PSIASISKI L LKSKPLSETS 121 GKEIYMEGGN FISFNDVGMN TGTTIKVTDL FYNVPARLKF 161 LKSSSRESSL ISDIIQRLSL ANPDIAFKLI NNGKTVLNTY 201 GSGNLEDAIR VIYGKKTLEN ISYFESHSDI ISVYGYIGNA 241 E LSRGSRNNQ SIFVNKRYIK SGLITAAVEN AFKSFLTINK 281 FPFFVIFIDI FPEYIDVNVH PTKTEIKFKE DKIVFSFVFK 321 TVHESIKKSL YKEFNEQIKE DVKEDNKEII KENPSLFQNV 361 EKVQIPIDLK SASMDIERKS LVNSVLCNEN NIVKDNINKN 401 IYIDTKENLS ENKLKNILKE NTEDMVSKLP DMKIIGQFDN 441 TYI LAESVKN LYIIDQHAAH EKI LFETYRD KIKKDEVKSQ 481 LLLQPIVLE L DSEDFSYYVD NKE LFYKTGF NIEVFGENTI 521 NIREVPFIMG KPDINNLFMD IINNIKAMGS GETIEVKYDS 561 IAMLACKSAV KAHDKLSKEE MEALINDLRF AKDPFNCPHG 601 RPTIIKITSL ELEKKFKRIQ
An example of a Clostridium difficile mutL protein sequence is shown below (NCBI WP_211652115.1; SEQ ID NO: 36). A glutamic acid at a position homologous to position 32 (position 34) is also highlighted below.
1 MKNIINI LDD LTINKIAAGE VVERPSSWK ELIENSIDAG
41 ANKISIDIID GGKSLIKITD NGIGIPSSEV EKSF LRHATS
81 KIKKIDDLYD LYSLGFRGEA LASISAVSKL EMTTKTKDEI
121 IGTKIYVEGG KIISKEPIGS TNGTTIIIKD IFFNTPARQK
161 F LKSTHAETI NISDLINKLA IGNPNIQFKY TNNNKQMLNT
201 PGDGKLVNTI RSIYGKEITE NIIDVEFKCN HFKMNGYIGN 241 NNIYRSNKNL QHIYINKRFV KSKIIIDAIT ESYKSIIPIG 281 KHVVCFLNIE VDPSCIDVNI HPNKLEIKFE KEQEVYIELR 321 DFLKVKLIHS NLIGKYATYS DKKTQPRIAI NSREKSTDYK 361 LRNNDLLEST HKNSNITKGK DEVIEWTLS SEKPINEFQS 401 VSEVLNASVE DDVKNINYLS EDSVNDNIQE EFQVDGIKNE 441 GNYYLGDSIK DSEEEYSCSS KRKFSLYGYS VIGVVFNTYI 481 ILSKDDSMYL LDQHAAHERI LYERYMEKFY RQDINMQILL 521 DPVVIEVSNV DMLQIENNLE LFMKFGFELE IFGNNHIMVR 561 CVPTIFGVPE TEKFILQIID NIEEITSNYD LKGERFASMA 601 CRSAIKANDK IYDIEIKSLL EQLEKCENPF TCPHGRPIMV 641 EISKTEIEKM FKRIM
An example of a Haemophilus influenzae mutL protein sequence is shown below (NCBI AVJ09575.1; SEQ ID NO: 37). A glutamic acid at position 32 is also highlighted below.
1 MPIRILSPQL ANQIAAGEW ERPASWKEL VENSLDAGAN 41 KIQIDIENGG ANLIRIRDNG CGIPKEELSL ALARHATSKI 81 ADLDDLEAIL SLGFRGEALA SISSVSRLTL TSRTEEQTEA 121 WQVYAQGRDM ETTIKPASHP VGTTVEVANL FFNTPARRKF 161 LRTDKTEFAH IDEVIRRIAL TKFNTAFTLT HNGKIVRQYR 201 PAFDLNQQLK RVAVICGDDF VKNALRIDWK HDDLHLSGWV 241 ATPNFSRTQN DLSYCYINGR MVRDKVISHA IRQAYAQYLP 281 TDAYPAFVLF IDLNPHDVDV NVHPTKHEVR FHQQRLIHDF 321 IYEGISYALN NQEQLNWHTE QSAVENHEEN TVREPQPNYS 361 IRPNRAAAGQ NSFAPQYHEK PQQNQPHFSN TPVLPNHVST 401 GYRDYRSDAP SKTEQRLYAE LLRTLPPTAQ KDISNTAQQN 441 ISDTAKIIST EIIECSSHLR ALSLIENRAL LLQQNQDFFL 481 LSLEKLQRLQ WQLALQQIQI EQQPLLIPIV FRLTEAQFQA 521 WQQYSDNFKK IGFEFIENQA QLRLTLNKVP NVLRTQNLQK 561 CVMAMLTRDE NSSPFLTALC AQLECKTFDA LADALNLLSE 601 TERLLTQTNR TAFTQLLKPV NWQPLLDEI
An example of a Staphylococcus aureus mutL protein sequence is shown below (NCBI YP_499806.1; SEQ ID NO: 38). A glutamic acid at a position homologous to position 32 (position 33) is also highlighted below.
1 MGKIKELQTS LANKIAAGEV VERPSSVVKE LLENAIDAGA
41 TEISIEVEES GVQSIRWDN GSGIEAEDLG LVFHRHATSK
81 LDQDEDLFHI RTLGFRGEAL ASISSVAKVT LKTCTDNANG
121 NEIYVENGEI LNHKPAKAKK GTDILVESLF YNTPARLKYI
161 KSLYTELGKI TDIVNRMAMS HPDIRIALIS DGKTMLSTNG 201 SGRTNEVMAE IYGMKVARDL VHISGDTSDY HIEGFVAKPE 241 HSRSNKHYIS IFINGRYIKN FMLNKAILEG YHTLLTIGRF
281 PICYINIEMD PILVDVNVHP TKLEVRLSKE EQLYQLIVSK
321 IQEAFKDRIL IPKNNLDYVP KKNKVLHSFE QQKIEFEQRQ 361 NTENNQEKTF SSEESNSKPF MEENQNDEIV IKEDSYNPFV 401 TKTSESLIAD DESSGYNNTR EKDEDYFKKQ QEILQEMDQT
441 FDSNDGTTVQ NYENKASDDY YDVNDIKGTK SKDPKRRIPY 481 MEIVGQVHGT YIIAQNEFGM YMIDQHAAQE RIKYEYFRDK 521 IGEVTNEVQD LLIPLTFHFS KDEQLVIDQY KNE LQQVGIM 561 LEHFGGHDYI VSSYPVWFPK DEVEEIIKDM IE LI LE EKKV 601 DIKKLREDVA IMMSCKKSIK ANHYLQKHEM SDLIDQLREA 641 EDPFTCPHGR PIIINFSKYE LEKLFKRVM
An example of a Streptococcus pneumoniae mutL protein sequence is shown below (NCBI ABO44018.1; SEQ ID NO: 39). A glutamic acid at a position homologous to position 32 (position 33) is also highlighted below.
1 MSHII E LPEM LANQIAAGEV I ERPASVVKE LVENAIDAGS
41 SQIII EI EEA GLKKVQITDN GHGIAHDEVE LALRRHATSK
81 IKNQADLFRI RTLGFRGEAL PSIASVSVLT LLTAVDGASH
121 GTKLVARGGE VE EVIPATSP VGTKVCVEDL FFNTPARLKY
161 MKSQQAE LSH IIDIVNRLGL AHPEISFSLI SDGKEMTRTA
201 GTGQLRQAIA GIYGLASAKK MIEIENSDLD FEISGFVS LP
241 E LTRANRNYI SLFINGRYIK NF L LNRAI LD GFGSKLMVGR
281 FPLAVIHIHI DPYLADVNVH PTKQEVRISK EKE LMTLVSE
321 AIANS LKEQT LIPDALENLA KSTVRNREKV EQTI LPLKEN
361 TLYYEKTEPS RPSQTEVADY QVE LTDEGQD LTLFAKETLD
401 RLTKPAKLHF AERKPANYDQ LDHPE LDLAS IDKAYDKLER
441 E EASSFPE LE FFGQMHGTYL FAQGRDGLYI IDQHAAQERV
481 KYE EYRESIG NVDQSQQQLL VPYIF EFPAD DALRLKERMP
521 L LE EVGVF LA EYGENQFI LR EHPIWMAEE E IESGIYEMCD
561 MLL LTKEVSI KKYRAE LAIM MSCKRSIKAN HRIDDHSARQ
601 L LYQLSQCDN PYNCPHGRPV LVHFTKSDME KMFRRIQENH
641 TSLRE LGKY
A variety of single-stranded binding proteins (SSBs) can be expressed, either endogenously or recombinantly, to facilitate recombination during editing of genomes. In general, the one or more single-stranded binding proteins so expressed are compatible with one or more single strand annealing proteins (SSAPs) to promote recombination during editing.
For example, one type of SSB that can be expressed by bacteria during editing of phage genomes is the Escherichia coli str. K-12 substr. MG1655 ssDNA-binding protein with the following sequence (NCBI NP_418483.1; SEQ ID NO: 40).
1 MASRGVNKVI LVGNLGQDPE VRYMPNGGAV ANITLATSES
41 WRDKATGEMK EQTEWHRVVL FGKLAEVASE YLRKGSQVYI
81 EGQLRTRKWT DQSGQDRYTT EWVNVGGTM QMLGGRQGGG
121 APAGGNIGGG QPQGGWGQPQ QPQGGNQFSG GAQSRPQQSA
161 PAAPSNEPPM DFDDDIPF
A nucleotide sequence for the above Escherichia coli str. K-12 substr. MG1655 ssDNA-binding protein is shown below (NCBI NC_000913.3; SEQ ID NO: 41).
1 ATGGCCAGCA GAGGCGTAAA CAAGGTTATT CTCGTTGGTA
41 ATCTGGGTCA GGACCCGGAA GTACGCTACA TGCCAAATGG
81 TGGCGCAGTT GCCAACATTA CGCTGGCTAC TTCCGAATCC
121 TGGCGTGATA AAGCGACCGG CGAGATGAAA GAACAGACTG
161 AATGGCACCG CGTTGTGCTG TTCGGCAAAC TGGCAGAAGT
201 GGCGAGCGAA TATCTGCGTA AAGGTTCTCA GGTTTATATC
241 GAAGGTCAGC TGCGTACCCG TAAATGGACC GATCAATCCG
281 GTCAGGATCG CTACACCACA GAAGTCGTGG TGAACGTTGG
321 CGGCACCATG CAGATGCTGG GTGGTCGTCA GGGTGGTGGC
361 GCTCCGGCAG GTGGCAATAT CGGTGGTGGT CAGCCGCAGG
401 GCGGTTGGGG TCAGCCTCAG CAGCCGCAGG GTGGCAATCA
441 GTTCAGCGGC GGCGCGCAGT CTCGCCCGCA GCAGTCCGCT
481 CCGGCAGCGC CGTCTAACGA GCCGCCGATG GACTTTGATG
521 ATGACATTCC GTTCTGA
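The relationship between the nucleotide listing (SEQ ID NO: 41) and the protein listing (SEQ ID NO: 40) can be illustrated by translating the start of the coding sequence. This is a minimal sketch, assuming the standard genetic code and a reading frame that starts at nucleotide 1. The names `CODON_TABLE` and `translate` are illustrative and do not come from the disclosure.

```python
# Standard genetic code, built from the conventional TCAG ordering of bases.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence from its first base, stopping at a stop codon."""
    cleaned = "".join(dna.split()).upper()
    protein = []
    for start in range(0, len(cleaned) - 2, 3):
        amino_acid = CODON_TABLE[cleaned[start:start + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


# First 30 nucleotides of SEQ ID NO: 41 (ten codons).
SSB_CDS_START = "ATGGCCAGCA GAGGCGTAAA CAAGGTTATT"

# Expected to match the first ten residues of SEQ ID NO: 40.
print(translate(SSB_CDS_START))
```

The ten translated residues, MASRGVNKVI, agree with the first block of the protein listing. The full 537-nucleotide listing corresponds to 178 codons plus a terminal TGA stop codon.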
Another example of an Escherichia coli SSB protein sequence is shown below (NCBI WP_222563270.1; SEQ ID NO: 42).
1 MWKRGENKVI LMGRAGKDAE VRYTPNGTAI AS LTLATEIS
41 YNDNEGKEQK ETEWHDIVIF GKKAEAAGKY FKKGMMLYFV
81 GRIRNNKWQG TDGKMRSNKE IVIDNNGEMQ MLPGAGPRNT
121 SAEGQGGIEN HDEPPFPDMN DYPQ
An example of a Klebsiella pneumoniae SSB protein sequence is shown below (NCBI ANI75733.1; SEQ ID NO: 43).
1 MINIKYMRCI MWKRGENKVI LMGRSGKDAE LRYTPNGTAI
41 ASLTLATEIS YSDNEGKEQK ETEWHDIVI F GKKAEAAGKY
81 FKKGMMLYFV GRIRNNKWQG TDGKMRSNKE IVIDNNGEMQ
121 MLPGAGPRNT SAEGQGGI EN HDEPPFPDMN DYPQ
Another example of a Klebsiella pneumoniae SSB protein sequence is shown below (NCBI WP_102017779.1; SEQ ID NO: 44).
1 MWKRGENKVI LMGRAGKDAE VRYTPNGTAI AS LTLATEIS
41 YSDNEGKEQK ETEWHDIVIF GKKAEAAGKY FKKGMMLYFV
81 GRIRNNKWQG TDGKMRSNKE IVIDNNGEMQ MLPGAGPRNT
121 SAEGQGGFEN HDEPPFPDMN DYPQ
An example of a multi-species Enterobacterales SSB protein sequence is shown below (NCBI WP_011091064.1; SEQ ID NO: 45).
1 MWKRGENKVI LMGRAGKDAE VRYTPNGTAI AS LTLATEIS
41 YSDNEGKEQK ETEWHDIVIF GKKAEAAGKY FKKGMMLYFV
81 GRIRNNKWQG TDGKMRSNKE IVIDNNGEMQ MLPGAGPRNT
121 SAEGLGGTEN HDEPPFPDMN DYPQ
Another example of a multi-species Enterobacterales SSB protein sequence is shown below (NCBI WP_004187334.1; SEQ ID NO: 46).
1 MWKRGENKVI LMGRAGKDAE VRYTPNGTAI AS LTLATEIS
41 YSDNEGKEQK ETEWHDIVIF GKKAEAAGKY FKKGMMLYFV
81 GRIRNNKWQG TDGKMRSNKE IVIDNNGEMQ MLPGAGPRNT
121 SAEGQGGIEN HDEPPFPDMN DYPQ
An example of a Salmonella enterica SSB protein sequence is shown below (NCBI EEC4048472.1; SEQ ID NO: 47).
1 MWKRGENKVI LMGRAGKDAE VRYTPNGTAI AS LTLATEIS
41 YSDNEGKEQK ETEWHDIVIF GKKAEAAGKY FKKGMMLYFV
81 GRIRNNKWQG TDGKMRSNKE IVIDNNGEMQ MLPGAGPRNT
121 STEGLGGTEN HDEPPFPDMN DYPQ
Another example of a Salmonella enterica SSB protein sequence is shown below (NCBI EGJ1037365.1; SEQ ID NO: 48).
1 MWKRGENKVI LMGRAGKDAE VRYTPNGTAI AS LTLATEIS
41 YSDNEGKEQK ETEWHDIVIF GKKAEAAGKY FKKGMMLYFV
81 GRIRNNKWQG TDGKMRSNKE IVIDNNGEMQ MLPGAGPRNT
121 SAEGHGGIEN HDEPPFPDMN DYPQ
An example of an Enterobacter hormaechei subsp. steigerwaltii SSB protein sequence is shown below (NCBI HAS0717021.1; SEQ ID NO: 49).
1 KRGENKVI LM GRAGKDAEVR YTPNRTAIAS LTLATEISYS
41 DNEGKEQKET EWHDIVIFGK KAEAAGKYFK KGMMLYFVGR 81 IRNNKWQGTD GKMRSNKEIV IDNNGEMQML PGAGPRNTSA 121 EGQGGIENHD EPPFPDMNDY PQ
An example of a Citrobacter freundii SSB protein sequence is shown below (NCBI EHL7056105.1; SEQ ID NO: 50).
1 VI LMGRSGKD AE LRYTPNGT AIASLTLATE ISYSDNEGKE
41 QKETEWHDIV IFGKKAEAAG KYFKKGMMLY FVGRIRNNKW
81 QGTDGKMRSN KEIVIDNNGE MQMLPGAGPR NTSAEGQGGI
121 ENHDEPPFPD MNDYPQ
Variants and homologs of any of the sequences described herein can also be used in the methods and systems described herein. For example, such variants and homologs can have less than 100% sequence identity to any of the sequences described herein. The variants and homologs can have at least about 40% sequence identity, or at least 50% sequence identity, or at least 60% sequence identity, or at least 70% sequence identity, or at least 80% sequence identity, or at least 90% sequence identity, or at least 95% sequence identity, or at least 96% sequence identity, or at least 97% sequence identity, or at least 98% sequence identity, or at least 99% sequence identity, or 60-99% sequence identity, or 70-99% sequence identity, or 80-99% sequence identity, or 90-95% sequence identity, or 90-99% sequence identity, or 95-97% sequence identity, or 97-99% sequence identity, or 100% sequence identity with any of the sequences described herein.
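As an illustration of how a percent sequence identity figure can be obtained, the sketch below compares two pre-aligned sequences of equal length. It assumes one common convention (identical positions divided by alignment length, with no gaps); the text above does not specify a particular alignment method, and the function name `percent_identity` is hypothetical. The inputs are the final rows (residues 121-144) of SEQ ID NO: 44 and SEQ ID NO: 45 as listed above.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of identical positions between two equal-length, ungapped sequences."""
    a = "".join(seq_a.split())
    b = "".join(seq_b.split())
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Residues 121-144 of two SSB listings; they differ at two of 24 positions.
SEQ_44_TAIL = "SAEGQGGFEN HDEPPFPDMN DYPQ"
SEQ_45_TAIL = "SAEGLGGTEN HDEPPFPDMN DYPQ"

print(f"{percent_identity(SEQ_44_TAIL, SEQ_45_TAIL):.1f}%")  # 22 of 24 positions match
```

For sequences of different lengths, or with insertions and deletions, a pairwise alignment step would be needed first, and the resulting figure depends on the alignment parameters chosen.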
EUKARYOTIC CELLS
In some embodiments, the host cell is a prokaryotic cell, an archaeal cell, or a eukaryotic host cell. In some embodiments, the eukaryotic host cell is a mammalian cell, such as a human cell, a non-human cell, or a non-human mammalian cell. In some embodiments, the host cell is an artificial cell or genetically modified cell. In some embodiments, the host cell is in vitro, such as a tissue culture cell. In some embodiments, the host cell is within a living host organism.
Cells may contain any of the compositions described herein. The methods described herein can be used to deliver recombinant retrons or components thereof into a eukaryotic cell (e.g., a mammalian cell, such as a human cell). In some embodiments, the cell is in vitro (e.g., a cultured cell). In some embodiments, the cell is in vivo (e.g., in a subject such as a human subject). In some embodiments, the cell is ex vivo (e.g., isolated from a subject and may be administered back to the same or a different subject).
The present disclosure contemplates the use of any suitable host cell. For example, the cell host can be a mammalian cell. Mammalian cells of the present disclosure include human cells, primate cells (e.g., vero cells), rat cells (e.g., GH3 cells, OC23 cells) or mouse cells (e.g., MC3T3 cells). There are a variety of human cell lines, including, without limitation, human embryonic kidney (HEK) cells, HeLa cells, cancer cells from the National Cancer Institute's 60 cancer cell lines (NCI60), DU145 (prostate cancer) cells, Lncap (prostate cancer) cells, MCF-7 (breast cancer) cells, MDA-MB-438 (breast cancer) cells, PC3 (prostate cancer) cells, T47D (breast cancer) cells, THP-1 (acute myeloid leukemia) cells, U87 (glioblastoma) cells, SHSY5Y human neuroblastoma cells (cloned from a myeloma) and Saos-2 (bone cancer) cells. In some embodiments, the cells can be human embryonic kidney (HEK) cells (e.g., HEK 293 or HEK 293T cells). In some embodiments, the cells can be stem cells (e.g., human stem cells) such as, for example, pluripotent stem cells (e.g., human pluripotent stem cells including human induced pluripotent stem cells (hiPSCs)). A stem cell refers to a cell with the ability to divide for indefinite periods in culture and to give rise to specialized cells. A pluripotent stem cell refers to a type of stem cell that is capable of differentiating into all tissues of an organism, but not alone capable of sustaining full organismal development. A human induced pluripotent stem cell refers to a somatic (e.g., mature or adult) cell that has been reprogrammed to an embryonic stem cell-like state by being forced to express genes and factors important for maintaining the defining properties of embryonic stem cells (see, e.g., Takahashi and Yamanaka, Cell 126 (4): 663-76, 2006, incorporated by reference herein). 
Human induced pluripotent stem cells express stem cell markers and are capable of generating cells characteristic of all three germ layers (ectoderm, endoderm, mesoderm).
Some aspects of this disclosure provide cells comprising any of the compositions disclosed herein, including, but not limited to, engineered retrons and/or retron components, engineered ncRNAs, engineered msDNA, engineered RT, nucleic acid molecules encoding the engineered retrons and/or retron components, and vector or vector systems encoding the engineered retrons and/or retron components, and any combinations thereof. In some embodiments, a host cell is transiently or non-transiently transfected with one or more delivery systems described herein, including virus-based systems, virus-like particle systems, and non-virus-based delivery systems, including LNPs and liposomes. In some embodiments, a cell is transfected as it naturally occurs in a subject. In some embodiments, a cell that is transfected is taken from a subject, i.e., ex vivo transfection. In some embodiments, the cell is derived from cells taken from a subject, such as a cell line. A wide variety of cell lines for tissue culture are known in the art. Examples of cell lines include, but are not limited to, C8161, CCRF-CEM, MOLT, mIMCD-3, NHDF, HeLa-S3, Huh1, Huh4, Huh7, HUVEC, HASMC, HEKn, HEKa, MiaPaCell, Panc1, PC-3, TF1, CTLL-2, C1R, Rat6, CV1, RPTE, A10, T24, J82, A375, ARH-77, Calu1, SW480, SW620, SKOV3, SK-UT, CaCo2, P388D1, SEM-K2, WEHI-231, HB56, TIB55, Jurkat, J45.01, LRMB, Bcl-1, BC-3, IC21, DLD2, Raw264.7, NRK, NRK-52E, MRC5, MEF, Hep G2, HeLa B, HeLa T4, COS, COS-1, COS-6, COS-M6A, BS-C-1 monkey kidney epithelial, BALB/3T3 mouse embryo fibroblast, 3T3 Swiss, 3T3-L1, 132-d5 human fetal fibroblasts, 10.1 mouse fibroblasts, 293-T, 3T3, 721, 9L, A2780, A2780ADR, A2780cis, A172, A20, A253, A431, A-549, ALC, B16, B35, BCP-1 cells, BEAS-2B, bEnd.3, BHK-21, BR 293, BxPC3, C3H-10T1/2, C6/36, Cal-27, CHO, CHO-7, CHO-IR, CHO-K1, CHO-K2, CHO-T, CHO Dhfr -/-, COR-L23, COR-L23/CPR, COR-L23/5010, COR-L23/R23, COS-7, COV-434, CML T1, CMT, CT26, D17, DH82, DU145, DuCaP, EL4, EM2, EM3, EMT6/AR1, EMT6/AR10.0, FM3, H1299, H69, HB54, HB55, HCA2, HEK-293, HeLa, Hepa1c1c7, HL-60, HMEC, HT-29, Jurkat, JY cells, K562 cells, Ku812, KCL22, KG1, KYO1, LNCap, Ma-Mel 1-48, MC-38, MCF-7, MCF-10A, MDA-MB-231, MDA-MB-468, MDA-MB-435, MDCK II, MOR/0.2R, MONO-MAC 6, MTD-1A, MyEnd, NCI-H69/CPR, NCI-H69/LX10, NCI-H69/LX20, NCI-H69/LX4, NIH-3T3, NALM-1, NW-145, OPCN/OPCT cell lines, Peer, PNT-1A/PNT 2, RenCa, RIN-5F, RMA/RMAS, Saos-2 cells, Sf-9, SkBr3, T2, T-47D, T84, THP1 cell line, U373, U87, U937, VCaP, Vero cells, WM39, WT-49, X63, YAC-1, YAR, and transgenic varieties thereof.
Cell lines are available from a variety of sources known to those with skill in the art (see, e.g., the American Type Culture Collection (ATCC) (Manassas, Va.)). In some embodiments, a cell transfected with one or more retron delivery systems described herein is used to establish a new cell line comprising one or more nucleic acid molecules encoding the recombinant retron-based gene editing systems described herein or encoding at least a component of said systems (e.g., a recombinant ncRNA or a recombinant retron RT).
Bacteriophages
Various types of bacteriophages (phages) can be modified using the editing components, bacterial host cells, and methods described herein. Bacteriophages are ubiquitous viruses, found wherever bacteria exist. It is estimated there are more than 10^31 bacteriophages on the planet, more than every other organism on Earth, including bacteria. Many types of bacteriophages can be modified by the methods and editing systems described herein. However, in some cases the phages to be modified are DNA phages. For example, the phages to be modified can have double-stranded genomes.
The phages whose genomes are to be edited can be lytic phages, which are easier to isolate than temperate phages. However, in some cases the phages whose genomes are to be edited can be temperate phages. For example, one type of editing that can be performed using the methods described herein is converting temperate or lysogenic phages into lytic phages.
The vast majority of phages belong to the order Caudovirales, which are tailed phages that have dsDNA and an isometric capsid. Caudovirales is comprised of three phylogenetically-related families that are discriminated by tail morphology: Myoviridae (long contractile tails), Siphoviridae (long non-contractile tails), and Podoviridae (short tails) (Ackermann, 2007; Krupovic, Prangishvili, Hendrix, & Bamford, 2011). The most well-studied tailed phages are the coliphages λ (Siphoviridae), T4 (Myoviridae), and T7 (Podoviridae), which infect Escherichia coli. Any such phage species can be genomically modified using the methods described herein. The bacteriophage database at the website phagesdb.org provides information and sequences for bacteriophages that can be used to identify target sites for editing. The NCBI database also provides sequences for bacteriophages that can be used to identify target sites for editing.
Examples of bacteriophages that can be modified include: bacteriophage lambda, T2, T5, T7, PDX, vB_EcoS-2862I, vB_EcoS-2862II, vB_EcoS-2862III, vB_EcoS-2862IV, vB_EcoS-2862V, vB_EcoS-26020I, vB_EcoS-26020II, vB_EcoS-26020III, vB_EcoS-26020IV, vB_EcoS-26020V; bacteriophage Pa1 (ATCC 12175-B1), Pa2 (ATCC 14203-B1), and Pa11 (ATCC 14205-B1) that can inhibit P. aeruginosa strain PAO1; bacteriophage φMR11 that can lyse multidrug resistant S. aureus; bacteriophage KP DP1, SA DP1, PA DP4, and EC DP3 (isolated from wastewater against multi-drug resistant bacteria including K. pneumoniae, S. aureus, P. aeruginosa, and E. coli); bacteriophage AB-Navy1, AB-Navy2, AB-Navy3, and AB-Navy4, which can inhibit the wound infection caused by multi-drug resistant Acinetobacter baumannii; bacteriophage Sb-1, MR-5 and MR-10 that can inhibit or lyse Staphylococcus aureus; bacteriophage Kpn1, Kpn2, Kpn3, Kpn4, Kpn5, K01, K02, K03, K04, K05 that can inhibit Klebsiella pneumoniae.
Some pathogenic bacterial toxins are encoded by bacteriophage genomes such that the host bacteria are only pathogenic when lysogenized by the toxin-encoding phage. Examples of toxins that can be encoded by bacteriophages are cholera toxin in Vibrio cholerae, diphtheria toxin in Corynebacterium diphtheriae, botulinum neurotoxin in Clostridium botulinum, the binary toxin of Clostridium difficile, and Shiga toxin of Shigella species. Without their phage-encoded toxins, these bacterial species are either much less pathogenic or not pathogenic at all. Hence, toxin-encoding genes in bacteriophage genomes can be deleted or knocked out using the methods described herein before those phages are further modified.
Bacterial cells can have several mechanisms that interfere with phage infection, including receptor/adsorption blocking; abortive infection; clustered, regularly interspaced short palindromic repeats (CRISPR) with CRISPR-associated (Cas) proteins (CRISPR-Cas); and restriction modification (RM). Phages can be modified to make them less vulnerable to these bacterial cell defense mechanisms.
PHARMACEUTICAL COMPOSITIONS
The engineered retron, or one or more components thereof (e.g., engineered ncRNAs, engineered msDNA, engineered RT, nucleic acid molecules encoding the engineered retrons and/or retron components, guide RNAs, programmable nucleases) may be provided as pharmaceutical compositions. For example, one or more LNPs or other non-virus-based delivery system comprising one or more circular or linear RNA molecules encoding each of the components of the retron-based genome editing system may be formulated as a pharmaceutical composition for administering to a subject in need (e.g., a human in need of gene editing).
Formulations can include, without limitation, saline, liposomes, lipid nanoparticles, polymers, peptides, proteins, cells transfected with viral vectors (e.g., for transfer or transplantation into a subject) and combinations thereof.
Formulations of the pharmaceutical compositions described herein may be prepared by any method known or hereafter developed in the art of pharmacology. As used herein the term “pharmaceutical composition” refers to compositions comprising at least one active ingredient and optionally one or more pharmaceutically acceptable excipients.
In general, such preparatory methods include the step of associating the active ingredient with an excipient and/or one or more other accessory ingredients. As used herein, the phrase “active ingredient” generally refers to an engineered retron as described herein.
A pharmaceutical composition in accordance with the present disclosure may be prepared, packaged, and/or sold in bulk, as a single unit dose, and/or as a plurality of single unit doses. As used herein, a “unit dose” refers to a discrete amount of the pharmaceutical composition comprising a predetermined amount of the active ingredient. The amount of the active ingredient is generally equal to the dosage of the active ingredient which would be administered to a subject and/or a convenient fraction of such a dosage such as, for example, one-half or one-third of such a dosage.
Other aspects of the present disclosure relate to pharmaceutical compositions comprising any of the various components of the recombinant retron-based genome editing systems described herein, including, but not limited to, engineered retrons and/or retron components, engineered ncRNAs, engineered msDNA, engineered RT, nucleic acid molecules encoding the engineered retrons and/or retron components, programmable nucleases (e.g., RNA-guided nucleases), guide RNAs, and vector or vector systems encoding the engineered retrons and/or retron components, and any combinations thereof. The term “pharmaceutical composition”, as used herein, refers to a composition formulated for pharmaceutical use. In some embodiments, the pharmaceutical composition further comprises a pharmaceutically acceptable carrier. In some embodiments, the pharmaceutical composition comprises additional agents (e.g., for specific delivery, increasing half-life, or other therapeutic compounds).
As used herein, the term “pharmaceutically-acceptable carrier” means a pharmaceutically-acceptable material, composition or vehicle, such as a liquid or solid filler, diluent, excipient, manufacturing aid (e.g., lubricant, talc, magnesium, calcium or zinc stearate, or stearic acid), or solvent encapsulating material, involved in carrying or transporting the compound from one site (e.g., the delivery site) of the body, to another site (e.g., organ, tissue or portion of the body). A pharmaceutically acceptable carrier is “acceptable” in the sense of being compatible with the other ingredients of the formulation and not injurious to the tissue of the subject (e.g., physiologically compatible, sterile, physiologic pH, etc.).
Some examples of materials which can serve as pharmaceutically-acceptable carriers include: (1) sugars, such as lactose, glucose and sucrose; (2) starches, such as corn starch and potato starch; (3) cellulose, and its derivatives, such as sodium carboxymethyl cellulose, methylcellulose, ethyl cellulose, microcrystalline cellulose and cellulose acetate; (4) powdered tragacanth; (5) malt; (6) gelatin; (7) lubricating agents, such as magnesium stearate, sodium lauryl sulfate and talc; (8) excipients, such as cocoa butter and suppository waxes; (9) oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil and soybean oil; (10) glycols, such as propylene glycol; (11) polyols, such as glycerin, sorbitol, mannitol and polyethylene glycol (PEG); (12) esters, such as ethyl oleate and ethyl laurate; (13) agar; (14) buffering agents, such as magnesium hydroxide and aluminum hydroxide; (15) alginic acid; (16) pyrogen-free water; (17) isotonic saline; (18) Ringer's solution; (19) ethyl alcohol; (20) pH buffered solutions; (21) polyesters, polycarbonates and/or polyanhydrides; (22) bulking agents, such as polypeptides and amino acids; (23) serum components, such as serum albumin, HDL and LDL; (24) C2-C12 alcohols, such as ethanol; and (25) other non-toxic compatible substances employed in pharmaceutical formulations. Wetting agents, coloring agents, release agents, coating agents, sweetening agents, flavoring agents, perfuming agents, preservatives and antioxidants can also be present in the formulation. Terms such as “excipient”, “carrier”, “pharmaceutically acceptable carrier” or the like are used interchangeably herein. In some embodiments, the pharmaceutical composition is formulated for delivery to a subject, e.g., for gene editing.
Suitable routes of administering the pharmaceutical composition described herein include, without limitation: topical, subcutaneous, transdermal, intradermal, intralesional, intraarticular, intraperitoneal, intravesical, transmucosal, gingival, intradental, intracochlear, transtympanic, intraorgan, epidural, intrathecal, intramuscular, intravenous, intravascular, intraosseous, periocular, intratumoral, intracerebral, and intracerebroventricular administration.
In some embodiments, the pharmaceutical composition described herein is administered locally to a diseased site (e.g., tumor site). In some embodiments, the pharmaceutical composition described herein is administered to a subject by injection, by means of a catheter, by means of a suppository, or by means of an implant, the implant being of a porous, non-porous, or gelatinous material, including a membrane, such as a silastic membrane, or a fiber.
In other embodiments, the pharmaceutical composition described herein is delivered in a controlled release system. In one embodiment, a pump may be used (see, e.g., Langer, 1990, Science 249:1527-1533; Sefton, 1989, CRC Crit. Ref. Biomed. Eng. 14:201; Buchwald et al., 1980, Surgery 88:507; Saudek et al., 1989, N. Engl. J. Med. 321:574). In another embodiment, polymeric materials can be used (see, e.g., Medical Applications of Controlled Release (Langer and Wise eds., CRC Press, Boca Raton, Fla., 1974); Controlled Drug Bioavailability, Drug Product Design and Performance (Smolen and Ball eds., Wiley, New York, 1984); Ranger and Peppas, 1983, Macromol. Sci. Rev. Macromol. Chem. 23:61; see also Levy et al., 1985, Science 228:190; During et al., 1989, Ann. Neurol. 25:351; Howard et al., 1989, J. Neurosurg. 71:105). Other controlled release systems are discussed, for example, in Langer, supra.
In some embodiments, the pharmaceutical composition is formulated in accordance with routine procedures as a composition adapted for intravenous or subcutaneous administration to a subject, e.g., a human. In some embodiments, pharmaceutical compositions for administration by injection are solutions in sterile isotonic aqueous buffer. Where necessary, the pharmaceutical can also include a solubilizing agent and a local anesthetic such as lignocaine to ease pain at the site of the injection. Generally, the ingredients are supplied either separately or mixed together in unit dosage form, for example, as a dry lyophilized powder or water-free concentrate in a hermetically sealed container such as an ampoule or sachet indicating the quantity of active agent. Where the pharmaceutical is to be administered by infusion, it can be dispensed with an infusion bottle containing sterile pharmaceutical grade water or saline. Where the pharmaceutical composition is administered by injection, an ampoule of sterile water for injection or saline can be provided so that the ingredients can be mixed prior to administration.
A pharmaceutical composition for systemic administration may be a liquid, e.g., sterile saline, lactated Ringer’s or Hank’s solution. In addition, the pharmaceutical composition can be in solid forms and re-dissolved or suspended immediately prior to use. Lyophilized forms are also contemplated.
The pharmaceutical composition can be contained within a lipid particle or vesicle, such as a liposome or microcrystal or LNP, which is also suitable for parenteral administration. The particles can be of any suitable structure, such as unilamellar or plurilamellar, so long as compositions are contained therein. Compounds can be entrapped in “stabilized plasmid-lipid particles” (SPLP) containing the fusogenic lipid dioleoylphosphatidylethanolamine (DOPE), low levels (5-10 mol%) of cationic lipid, and stabilized by a polyethyleneglycol (PEG) coating (Zhang Y. P. et al., Gene Ther. 1999, 6:1438-47). Positively charged lipids such as N-[1-(2,3-dioleoyloxy)propyl]-N,N,N-trimethylammonium methylsulfate, or “DOTAP,” are particularly preferred for such particles and vesicles. The preparation of such lipid particles is well known. See, e.g., U.S. Patent Nos. 4,880,635; 4,906,477; 4,911,928; 4,917,951; 4,920,016; and 4,921,757; each of which is incorporated herein by reference.
Further, the pharmaceutical composition can be provided as a pharmaceutical kit comprising (a) a container containing a recombinant retron-based genome editing system or one or more components thereof in lyophilized form and (b) a second container containing a pharmaceutically acceptable diluent (e.g., sterile water) for injection. The pharmaceutically acceptable diluent can be used for reconstitution or dilution of the lyophilized system of the invention. Optionally associated with such container(s) can be a notice in the form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological products, which notice reflects approval by the agency of manufacture, use or sale for human administration.
In another aspect, an article of manufacture containing materials useful for the treatment of the diseases described above is included. In some embodiments, the article of manufacture comprises a container and a label. Suitable containers include, for example, bottles, vials, syringes, and test tubes. The containers may be formed from a variety of materials such as glass or plastic. In some embodiments, the container holds a composition that is effective for treating a disease described herein and may have a sterile access port. For example, the container may be an intravenous solution bag or a vial having a stopper pierceable by a hypodermic injection needle. The active agent in the composition is a compound of the invention. In some embodiments, the label on or associated with the container indicates that the composition is used for treating the disease of choice. The article of manufacture may further comprise a second container comprising a pharmaceutically acceptable buffer, such as phosphate-buffered saline, Ringer's solution, or dextrose solution. It may further include other materials desirable from a commercial and user standpoint, including other buffers, diluents, filters, needles, syringes, and package inserts with instructions for use.
KITS
Also provided are kits comprising engineered retron constructs as described herein. In some embodiments, the kit provides an engineered retron construct or a vector system comprising such a retron construct. In some embodiments, the engineered retron construct, included in the kit, comprises a heterologous sequence capable of providing a cell with a nucleic acid encoding a protein or regulatory RNA of interest, a cellular barcode, a donor polynucleotide suitable for use in gene editing, e.g., by homology directed repair (HDR) or recombination-mediated genetic engineering (recombineering), or a CRISPR protospacer DNA sequence for use in molecular recording. Other agents may also be included in the kit such as transfection agents, host cells, suitable media for culturing cells, buffers, and the like.
In the context of a kit, agents can be provided in liquid or solid form in any convenient packaging (e.g., stick pack, dose pack, etc.). The agents of a kit can be present in the same or separate containers. In addition to the above components, the subject kits may further include (in certain embodiments) instructions for practicing the subject methods. These instructions may be present in the subject kits in a variety of forms, one or more of which may be present in the kit. One form in which these instructions may be present is as printed information on a suitable medium or substrate, e.g., a piece or pieces of paper on which the information is printed, in the packaging of the kit, in a package insert, and the like. Yet another form of these instructions is a computer readable medium, e.g., diskette, compact disk (CD), flash drive, and the like, on which the information has been recorded. Yet another form of these instructions that may be present is a website address which may be used via the internet to access the information at a removed site.
UTILITY
Retrons can be engineered with heterologous sequences for use in a variety of applications. For example, heterologous sequences can be added to retron constructs to provide a cell with a heterologous nucleic acid encoding a protein or regulatory RNA of interest, a cellular barcode, a donor polynucleotide/repair template suitable for use in gene editing, including therapeutic editing, e.g., by homology directed repair (HDR) or recombination-mediated genetic engineering (recombineering), or a CRISPR protospacer DNA sequence for use in molecular recording, as discussed further herein. Such heterologous sequences may be inserted, for example, into the msr gene or the msd gene such that the heterologous sequence is reverse transcribed by the retron reverse transcriptase as part of the msDNA product.
The compositions and methods described herein find use in, for example, (1) metabolic engineering of bacteria and/or yeast or other eukaryotic cells to increase the production of products of interest such as functional molecules (e.g., chemicals, fuels, materials, and proteins) or for their use in medical research (e.g., gene therapy, discovery of new drugs); (2) improved molecular recording technologies by using different mutations to sense one or several cellular events simultaneously; (3) design of cellular chassis as recipients of engineered biological systems in synthetic biology or bioengineering (e.g., bacterial chassis to improve phage editing technologies); (4) multiplexed gene therapy in mammalian cells, such as human cells; and (5) multiplexed editing of phage genomes for engineered phage therapy.
In some embodiments, the engineered retrons, components, and systems described herein may be used for research tools, such as kits, functional genomics assays, and generating engineered cell lines and animal models for research and drug screening. The kit may comprise one or more reagents in addition to the engineered retron, such as a buffer, a control reagent, a control vector, a control RNA polynucleotide, a reagent for in vitro production of the polypeptide from DNA, and adaptors for sequencing. A buffer can be, for example, a stabilization buffer, a reconstituting buffer, a diluting buffer, a wash buffer, or a buffer for introducing a polypeptide and/or polynucleotide of the kit into a cell. In some instances, a kit can comprise one or more additional reagents specific for plants. One or more additional reagents for plants can include, for example, soil, nutrients, plants, seeds, spores, Agrobacterium, a T-DNA vector, and a pBINAR vector.
PRODUCTION OF PROTEIN OR RNA
For example, the single-stranded DNA generated by an engineered retron can be used to produce a desired product of interest in cells. In some embodiments, the retron is engineered with a heterologous sequence encoding a polypeptide of interest to allow production of the polypeptide from the retron msDNA generated in a cell. The polypeptide of interest may be any type of protein/peptide including, without limitation, an enzyme, an extracellular matrix protein, a receptor, transporter, ion channel, or other membrane protein, a hormone, a neuropeptide, an antibody, or a cytoskeletal protein; or a fragment thereof, or a biologically active domain of interest. In some embodiments, the protein is a therapeutic protein or therapeutic antibody for use in treatment of a disease.
Non-limiting examples of polypeptides of interest include: growth hormones, insulin-like growth factors (IGF-1), Fat-1, phytase, xylanase, beta-glucanase, lysozyme or lysostaphin, histone deacetylase such as HDAC6, CD163, etc.
In other embodiments, the retron is engineered with a heterologous sequence encoding an RNA of interest to allow production of the RNA from the retron in a cell. The RNA of interest may be any type of RNA including, without limitation, an RNA interference (RNAi) nucleic acid or regulatory RNA such as, but not limited to, a microRNA (miRNA), a small interfering RNA (siRNA), a short hairpin RNA (shRNA), a small nuclear RNA (snRNA), a long non-coding RNA (lncRNA), an antisense nucleic acid, and the like.
GENE EDITING
In some embodiments, the retron is used for genome editing at a desired site. A retron is engineered with a heterologous nucleic acid sequence encoding a donor polynucleotide suitable for use with a nuclease genome editing system. The nuclease is designed to specifically target a location proximal to the desired edit (the nuclease should be designed such that it will not cut the target once the edit is properly installed). The nuclease (e.g., Cas or non-Cas) is linked to the retron, either by direct fusion to the RT or by fusion of the msDNA to the gRNA (only applicable for RNA-guided nucleases). A heterologous nucleic acid sequence is inserted into the retron msd.
In some embodiments, the heterologous nucleic acid sequence has 10-100 or more bp of homologous nucleic acid sequence to the genome on both sides of the desired edit. The desired edit (insertion, deletion, or mutation) is in between the homologous sequence.
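As a non-limiting illustration, the layout described above (homologous sequence on both sides of the desired edit) can be sketched in Python. The function name `build_donor`, its parameters, and the 40-nucleotide default arm length are hypothetical choices made for this sketch and are not taken from the disclosure; coordinates are 0-based and refer to one strand of the target locus.

```python
def build_donor(genome: str, edit_start: int, edit_end: int,
                replacement: str, arm_len: int = 40) -> str:
    """Return left_arm + replacement + right_arm.

    The edit replaces genome[edit_start:edit_end] with `replacement`:
      - substitution: replacement has the same length as the replaced span
      - insertion:    edit_start == edit_end
      - deletion:     replacement == ""
    """
    if not (0 <= edit_start <= edit_end <= len(genome)):
        raise ValueError("edit coordinates fall outside the sequence")
    if edit_start < arm_len or len(genome) - edit_end < arm_len:
        raise ValueError("not enough flanking sequence for the requested arm length")
    left_arm = genome[edit_start - arm_len:edit_start]   # homology 5' of the edit
    right_arm = genome[edit_end:edit_end + arm_len]      # homology 3' of the edit
    return left_arm + replacement + right_arm


# Toy locus (hypothetical sequence): 20 A, one C, 20 G.
locus = "A" * 20 + "C" + "G" * 20
substitution = build_donor(locus, 20, 21, "T", arm_len=10)
insertion = build_donor(locus, 20, 20, "TTT", arm_len=10)
deletion = build_donor(locus, 20, 21, "", arm_len=10)
```

In this sketch the arms are copied directly from the target locus, so each arm is 100% identical to its target sequence; shorter or longer arms are obtained by changing `arm_len`.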
In some embodiments, donor polynucleotides comprise a sequence comprising an intended genome edit flanked by a pair of homology arms responsible for targeting the donor polynucleotide to the target locus to be edited in a cell. The donor polynucleotide typically comprises a 5' homology arm that hybridizes to a 5' genomic target sequence and a 3' homology arm that hybridizes to a 3' genomic target sequence. The homology arms are referred to herein as 5' and 3' (i.e., upstream and downstream) homology arms, which relate to the relative position of the homology arms to the nucleotide sequence comprising the intended edit within the donor polynucleotide. The 5' and 3' homology arms hybridize to regions within the target locus in the genomic DNA to be modified, which are referred to herein as the “5' target sequence” and “3' target sequence,” respectively.
The homology arm must be sufficiently complementary for hybridization to the target sequence to mediate homologous recombination between the donor polynucleotide and genomic DNA at the target locus. For example, a homology arm may comprise a nucleotide sequence having at least about 80-100% sequence identity to the corresponding genomic target sequence, including any percent identity within this range, such as at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity thereto, wherein the nucleotide sequence comprising the intended edit can be integrated into the genomic DNA by HDR at the genomic target locus recognized (i.e., having sufficient complementarity for hybridization) by the 5' and 3' homology arms.
In some embodiments, the corresponding homologous nucleotide sequences in the genomic target sequence (i.e., the “5' target sequence” and “3' target sequence”) flank a specific site for cleavage and/or a specific site for introducing the intended edit. The distance between the specific cleavage site and the homologous nucleotide sequences (e.g., each homology arm) can be several hundred nucleotides. In some embodiments, the distance between a homology arm and the cleavage site is 200 nucleotides or less (e.g., 0, 10, 20, 30, 50, 75, 100, 125, 150, 175, and 200 nucleotides). In most cases, a smaller distance may give rise to a higher gene targeting rate. In some embodiments, the donor polynucleotide is substantially identical to the target genomic sequence, across its entire length except for the sequence changes to be introduced to a portion of the genome that encompasses both the specific cleavage site and the portions of the genomic target sequence to be altered.
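By way of example only, the percent identity between a homology arm and its genomic target sequence can be estimated as follows. This sketch assumes an ungapped, position-by-position comparison of two equal-length sequences, which is a simplification (identity is commonly computed from an alignment); the function names and the 80% default threshold parameter are hypothetical and chosen only to mirror the lower bound of the range recited above.

```python
def percent_identity(arm: str, target: str) -> float:
    """Ungapped percent identity between two equal-length nucleotide sequences."""
    if not arm or len(arm) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(arm.upper(), target.upper()))
    return 100.0 * matches / len(arm)


def meets_identity(arm: str, target: str, minimum: float = 80.0) -> bool:
    """True if the arm reaches the chosen minimum percent identity."""
    return percent_identity(arm, target) >= minimum


# Hypothetical 10-nt arm with a single mismatch at the last position.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
```

With these toy sequences, nine of ten positions match, so the arm would satisfy an 80% or 90% threshold but not a 95% threshold.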
A homology arm can be of any length, e.g. 10 nucleotides or more, 15 nucleotides or more, 20 nucleotides or more, 50 nucleotides or more, 100 nucleotides or more, 250 nucleotides or more, 300 nucleotides or more, 350 nucleotides or more, 400 nucleotides or more, 450 nucleotides or more, 500 nucleotides or more, 1000 nucleotides (1 kb) or more, 5000 nucleotides (5 kb) or more, 10000 nucleotides (10 kb) or more, etc. In some instances, the 5' and 3' homology arms are substantially equal in length to one another. However, in some instances the 5' and 3' homology arms are not necessarily equal in length to one another. For example, one homology arm may be 30% shorter or less than the other homology arm, 20% shorter or less than the other homology arm, 10% shorter or less than the other homology arm, 5% shorter or less than the other homology arm, 2% shorter or less than the other homology arm, or only a few nucleotides less than the other homology arm. In other instances, the 5' and 3' homology arms are substantially different in length from one another, e.g., one may be 40% shorter or more, 50% shorter or more, sometimes 60% shorter or more, 70% shorter or more, 80% shorter or more, 90% shorter or more, or 95% shorter or more than the other homology arm.
The donor polynucleotide may be used in combination with an RNA-guided nuclease, which is targeted to a particular genomic sequence (i.e., genomic target sequence to be modified) by a guide RNA. A target-specific guide RNA comprises a nucleotide sequence that is complementary to a genomic target sequence, and thereby mediates binding of the nuclease-gRNA complex by hybridization at the target site. For example, the gRNA can be designed with a sequence complementary to the sequence of a minor allele to target the nuclease-gRNA complex to the site of a mutation. The mutation may comprise an insertion, a deletion, or a substitution. For example, the mutation may include a single nucleotide variation, gene fusion, translocation, inversion, duplication, frameshift, missense, nonsense, or other mutation associated with a phenotype or disease of interest. The targeted minor allele may be a common genetic variant or a rare genetic variant. In some embodiments, the gRNA is designed to selectively bind to a minor allele with single base-pair discrimination, for example, to allow binding of the nuclease-gRNA complex to a single nucleotide polymorphism (SNP). In particular, the gRNA may be designed to target disease-relevant mutations of interest for the purpose of genome editing to remove the mutation from a gene. Alternatively, the gRNA can be designed with a sequence complementary to the sequence of a major or wild-type allele to target the nuclease-gRNA complex to the allele for the purpose of genome editing to introduce a mutation into a gene in the genomic DNA of the cell, such as an insertion, deletion, or substitution. Such genetically modified cells can be used, for example, to alter phenotype, confer new properties, or produce disease models for drug screening.
In some embodiments, the RNA-guided nuclease used for genome modification is a clustered regularly interspaced short palindromic repeats (CRISPR) system Cas nuclease. Any RNA-guided Cas nuclease capable of catalyzing site-directed cleavage of DNA to allow integration of donor polynucleotides by the HDR mechanism can be used in genome editing, including CRISPR system Class 1, Type I, II, or III Cas nucleases; a Class 2, Type II nuclease (such as Cas9); a Class 2, Type V nuclease (such as Cpf1); or a Class 2, Type VI nuclease (such as C2c2). Examples of Cas proteins include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), Cas10, Cas10d, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, and Cu1966, and homologs or modified versions thereof.
In some embodiments, a Class 2, type II CRISPR system Cas9 endonuclease is used. Cas9 nucleases from any species, or biologically active fragments, variants, analogs, or derivatives thereof that retain Cas9 endonuclease activity (i.e., catalyze site-directed cleavage of DNA to generate double-strand breaks) may be used to perform genome modification as described herein. The Cas9 need not be physically derived from an organism but may be synthetically or recombinantly produced. Cas9 sequences from a number of bacterial species are well known in the art and listed in the National Center for Biotechnology Information (NCBI) database. See, for example, NCBI entries for Cas9 from: Streptococcus pyogenes (WP_002989955, WP_038434062, WP_011528583); Campylobacter jejuni (WP_022552435, YP_002344900); Campylobacter coli (WP_060786116); Campylobacter fetus (WP_059434633); Corynebacterium ulcerans (NC_015683, NC_017317); Corynebacterium diphtheriae (NC_016782, NC_016786); Enterococcus faecalis (WP_033919308); Spiroplasma syrphidicola (NC_021284); Prevotella intermedia (NC_017861); Spiroplasma taiwanense (NC_021846); Streptococcus iniae (NC_021314); Belliella baltica (NC_018010); Psychroflexus torquis (NC_018721); Streptococcus thermophilus (YP_820832); Streptococcus mutans (WP_061046374, WP_024786433); Listeria innocua (NP_472073); Listeria monocytogenes (WP_061665472); Legionella pneumophila (WP_062726656); Staphylococcus aureus (WP_001573634); Francisella tularensis (WP_032729892, WP_014548420); Enterococcus faecalis (WP_033919308); Lactobacillus rhamnosus (WP_048482595, WP_032965177); and Neisseria meningitidis (WP_061704949, YP_002342100); all of which sequences (as entered by the date of filing of this application) are herein incorporated by reference in their entireties.
Any of these sequences or a variant thereof comprising a sequence having at least about 70-100% sequence identity thereto, including any percent identity within this range, such as 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99% sequence identity thereto, can be used for genome editing, as described herein. See also Fonfara et al. (2014) Nucleic Acids Res. 42(4):2577-90; Kapitonov et al. (2015) J. Bacteriol. 198(5):797-807; Shmakov et al. (2015) Mol. Cell. 60(3):385-397; and Chylinski et al. (2014) Nucleic Acids Res. 42(10):6091-6105, for sequence comparisons and a discussion of genetic diversity and phylogenetic analysis of Cas9.
The genomic target site will typically comprise a nucleotide sequence that is complementary to the gRNA and may further comprise a protospacer adjacent motif (PAM). In some embodiments, the target site comprises 20-30 base pairs in addition to a 3 or more base pair PAM. Typically, the first nucleotide of a PAM can be any nucleotide, while the two or more other nucleotides will depend on the specific Cas9 protein that is chosen. Exemplary PAM sequences are known to those of skill in the art and include, without limitation, NNG, NGN, NAG, and NGG, wherein N represents any nucleotide. In some embodiments, the allele targeted by a gRNA comprises a mutation that creates a PAM within the allele, wherein the PAM promotes binding of the Cas9-gRNA complex to the allele.
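For illustration, candidate target sites of the kind described above can be located with a simple scan. This sketch assumes an NGG PAM (one of the exemplary PAM sequences listed above) lying immediately 3' of a 20-nucleotide target sequence on the scanned strand; only the strand supplied is scanned, and the function name `find_ngg_sites` and its return format are hypothetical.

```python
import re


def find_ngg_sites(seq: str, target_len: int = 20):
    """Return (target_start, target_sequence, pam) for each NGG PAM that has
    at least `target_len` nucleotides of sequence immediately 5' of it."""
    seq = seq.upper()
    sites = []
    # Zero-width lookahead so that overlapping PAM candidates are all reported.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start >= target_len:
            target_start = pam_start - target_len
            sites.append((target_start, seq[target_start:pam_start], match.group(1)))
    return sites


# Hypothetical sequence: a 20-nt target, a TGG PAM, then four trailing bases.
example = "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"
sites = find_ngg_sites(example)
```

A different PAM or target length can be accommodated by changing the regular expression and `target_len`; scanning the reverse complement as well would be needed to find sites on the opposite strand.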
In some embodiments, the gRNA is 5-50 nucleotides, 10-30 nucleotides, 15-25 nucleotides, 18-22 nucleotides, or 19-21 nucleotides in length, or any length between the stated ranges, including, for example, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 nucleotides in length. The guide RNA may be a single guide RNA comprising crRNA and tracrRNA sequences in a single RNA molecule, or the guide RNA may comprise two RNA molecules with crRNA and tracrRNA sequences residing in separate RNA molecules.
In another embodiment, the CRISPR nuclease from Prevotella and Francisella 1 (Cpf1, or Cas12a) is used. Cpf1 is another class II CRISPR/Cas system RNA-guided nuclease with similarities to Cas9 and may be used analogously. Unlike Cas9, Cpf1 does not require a tracrRNA and only depends on a crRNA in its guide RNA, which provides the advantage that shorter guide RNAs can be used with Cpf1 for targeting than Cas9. Cpf1 is capable of cleaving either DNA or RNA. The PAM sites recognized by Cpf1 have the sequences 5'-YTN-3' (where “Y” is a pyrimidine and “N” is any nucleobase) or 5'-TTN-3', in contrast to the G-rich PAM site recognized by Cas9. Cpf1 cleavage of DNA produces double-stranded breaks with sticky ends having a 4 or 5 nucleotide overhang. For a discussion of Cpf1, see, e.g., Ledford et al. (2015) Nature. 526(7571):17-17, Zetsche et al. (2015) Cell. 163(3):759-771, Murovec et al. (2017) Plant Biotechnol. J. 15(8):917-926, Zhang et al. (2017) Front. Plant Sci. 8:177, Fernandes et al. (2016) Postepy Biochem. 62(3):315-326; herein incorporated by reference.
C2c1 (Cas12b) is another class II CRISPR/Cas system RNA-guided nuclease that may be used. C2c1, similarly to Cas9, depends on both a crRNA and tracrRNA for guidance to target sites. See, e.g., Shmakov et al. (2015) Mol Cell. 60(3):385-397, Zhang et al. (2017) Front Plant Sci. 8:177; herein incorporated by reference.
In yet another embodiment, an engineered RNA-guided FokI nuclease may be used. RNA-guided FokI nucleases comprise fusions of inactive Cas9 (dCas9) and the FokI endonuclease (FokI-dCas9), wherein the dCas9 portion confers guide RNA-dependent targeting on FokI. For a description of engineered RNA-guided FokI nucleases, see, e.g., Havlicek et al. (2017) Mol. Ther. 25(2):342-355, Pan et al. (2016) Sci Rep. 6:35794, Tsai et al. (2014) Nat Biotechnol. 32(6):569-576; herein incorporated by reference.
In other embodiments, any other Cas enzymes and variants described in other sections of the application (all incorporated herein) can be used similarly.
In some embodiments, the RNA-guided nuclease is provided in the form of a protein, optionally where the nuclease is complexed with a gRNA to form a ribonucleoprotein (RNP) complex. In some embodiments, the RNA-guided nuclease is provided by a nucleic acid encoding the RNA-guided nuclease, such as an RNA (e.g., messenger RNA) or DNA (expression vector). In some embodiments, the RNA-guided nuclease and the gRNA are both provided by vectors, such as the vectors and the vector system described in other parts of the application (all incorporated herein by reference). Both can be expressed by a single vector or separately on different vectors. The vectors encoding the RNA-guided nuclease and gRNA may be included in the vector system comprising the engineered retron msr gene, msd gene and ret gene sequences. In some embodiments, the RNA-guided nuclease is fused to the RT and/or the msDNA.
The RNP complex may be administered to a subject or delivered into a cell by methods known in the art, such as those described in U.S. Pat. No. 11,390,884, which is incorporated by reference herein in its entirety. In some embodiments, the endonuclease/gRNA ribonucleoprotein (RNP) complexes are delivered to cells by electroporation. Direct delivery of the RNP complex to a subject or cell eliminates the need for expression from nucleic acids (e.g., transfection of plasmids encoding Cas9 and gRNA). It also eliminates unwanted integration of DNA segments derived from nucleic acid delivery (e.g., transfection of plasmids encoding Cas9 and gRNA). An endonuclease/gRNA ribonucleoprotein (RNP) complex usually is formed prior to administration.
Codon usage may be optimized to further improve production of an RNA-guided nuclease and/or reverse transcriptase (RT) in a particular cell or organism. For example, a nucleic acid encoding an RNA-guided nuclease or reverse transcriptase can be modified to substitute codons having a higher frequency of usage in a yeast cell, a bacterial cell, a human cell, a non-human cell, a mammalian cell, a rodent cell, a mouse cell, a rat cell, or any other host cell of interest, as compared to the naturally occurring polynucleotide sequence. When a nucleic acid encoding the RNA-guided nuclease or reverse transcriptase is introduced into cells, the protein can be transiently, conditionally, or constitutively expressed in the cell.
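As a simplified illustration of the codon substitution described above, a coding sequence can be recoded by replacing each codon with a synonymous codon preferred in the host of interest. The preferred-codon table passed in below is a hypothetical toy example, not measured codon-usage data, and the function names are assumptions of this sketch; the standard genetic code is used for translation.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence whose length is a multiple of three."""
    cds = cds.upper()
    return "".join(STANDARD_CODE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds: str, preferred: dict) -> str:
    """Swap each codon for the host-preferred synonymous codon, if one is given."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of three")
    for aa, codon in preferred.items():
        if STANDARD_CODE.get(codon) != aa:
            raise ValueError(f"{codon} does not encode {aa}")
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    # Codons for amino acids absent from `preferred` are left unchanged.
    return "".join(preferred.get(STANDARD_CODE[c], c) for c in codons)


# Hypothetical preferences: CTG for leucine, AAA for lysine.
original = "ATGTTAAAGTAA"
recoded = recode(original, {"L": "CTG", "K": "AAA"})
```

The encoded polypeptide is unchanged by construction, which can be confirmed by comparing `translate(original)` with `translate(recoded)`. Practical codon optimization generally weighs additional factors beyond choosing one preferred codon per amino acid.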
In some embodiments, the engineered retron used for genome editing with nuclease genome editing systems can further include accessory or enhancer proteins for recombination. Examples of recombination enhancers include nonhomologous end joining (NHEJ) inhibitors (e.g., an inhibitor of DNA ligase IV, a KU inhibitor (e.g., of KU70 or KU80), a DNA-PKc inhibitor, or an Artemis inhibitor), homology directed repair (HDR) promoters, or both, which can improve the precision of genome editing and/or the efficiency of homologous recombination. In some embodiments, the recombination accessory proteins or enhancers can comprise C-terminal binding protein interacting protein (CtIP), cyclinB2, or Rad family members (e.g., Rad50, Rad51, Rad52, etc.).
CtIP is a transcription factor containing C2H2 zinc fingers that are involved in early steps of homologous recombination. Mammalian CtIP and its orthologs in other eukaryotes promote the resection of DNA double-strand breaks and are essential for meiotic recombination. HDR may be enhanced by using Cas9 nuclease associated (e.g., fused) to an N-terminal domain of CtIP, an approach that forces CtIP to the cleavage site and increases transgene integration by HDR. In some embodiments, an N-terminal fragment of CtIP, called HE for HDR enhancer, may be sufficient for HDR stimulation and requires the CtIP multimerization domain and CDK phosphorylation sites to be active. HDR stimulation by the Cas9-HE fusion depends on the guide RNA used, and therefore the guide RNA will be designed accordingly.
Using the gene editing system described herein, any target gene or sequence in a host cell can be edited or modified for a desired trait, including but not limited to: Myostatin (e.g., GDF8) to increase muscle growth; Pc POLLED to induce hairlessness; KISS1R to induce boar taint; Dead end protein (dnd) to induce sterility; Nano2 and DDX to induce sterility; CD163 to induce PRRSV resistance; RELA to induce ASFV resilience; CD18 to induce Mannheimia (Pasteurella) haemolytica resilience; NRAMP1 to induce tuberculosis resilience; and negative regulators of muscle mass (e.g., Myostatin) to increase muscle mass.
In addition to the eukaryotic and prokaryotic gene editing described herein, bacteriophages, which naturally shape bacterial communities, can be co-opted as a biological technology to help eliminate pathogenic bacteria from our bodies and food supply. Phage genome editing is a critical tool to engineer more effective phage technologies. However, editing phage genomes has traditionally been a low efficiency process that requires laborious screening, counterselection, or in vitro construction of modified genomes. These requirements impose limitations on the type and throughput of phage modifications, which in turn limit our knowledge and potential for innovation. Provided herein is a scalable approach for engineering phage genomes using recombitrons: modified bacterial retrons that generate recombineering donor DNA along with single stranded binding and annealing proteins to integrate those donors into phage genomes. This system can efficiently create genome modifications in multiple distinct phages without the need for counterselection. Moreover, the process is continuous, with edits accumulating in the phage genome the longer the phage is cultured with the host, and multiplexable, with different editing hosts contributing distinct mutations along the genome of a phage in a mixed culture. In lambda phage, as an example, recombitrons yield single-base substitutions at up to 99% efficiency, short (<20 base pair) insertions and deletions at 5-50%, and up to 5 distinct mutations installed on a single phage genome, all without counterselection and with only a few hours of hands-on time.
The compositions and methods described herein provide genomic editing of phage genomes by supplying donor DNA and by using endogenously or recombinantly expressed proteins that facilitate transfer of the edited sequences from the donor DNA into the phage genomes during phage replication.
One problem with currently available phage editing methods is that donor DNA is not provided in sufficient amounts to the host cells to provide efficient phage genomic editing. The methods and compositions described herein solve this problem by expressing multiple copies of retron noncoding RNAs (ncRNAs) from an expression cassette or expression vector as templates for donor DNAs. Multiple copies of the donor DNAs are then generated from each ncRNA by reverse transcription.
Editing of phage genomes generally is done during phage replication. Once a bacteriophage attaches to a susceptible host, it pursues one of two replication strategies: lytic or lysogenic. During a lytic replication cycle, a phage attaches to a susceptible host bacterium, introduces its genome into the host cell cytoplasm, and utilizes the ribosomes of the host to manufacture its proteins. The host cell resources are rapidly converted to phage genomes and capsid proteins, which assemble into multiple copies of the original phage. As the host cell dies, it is either actively or passively lysed, releasing the new bacteriophage to infect another host cell. In the lysogenic replication cycle, the phage also attaches to a susceptible host bacterium and introduces its genome into the host cell cytoplasm. However, the phage genome is instead integrated into the bacterial cell chromosome or maintained as an episomal element where, in both cases, it is replicated and passed on to daughter bacterial cells without killing them. The phage whose genomes will be edited can be lytic, temperate, or lysogenic phage. For example, one type of editing that can be performed using the methods described here is converting temperate or lysogenic phages into lytic phages.
In some cases, the donor DNA includes a sequence having a sequence identity of about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% to a target phage genomic DNA sequence (or a complement thereof). In some cases, the donor DNA, or complement thereof, includes a sequence having a sequence identity of at least about 90%, 95%, 96%, 97%, 98%, or 99% to a target nucleic acid.
The target phage sequences can be any site in the phage genome. However, in some cases the donor DNA does not edit target phage sequences involved in phage cellular entry or phage replication. Instead, the donor DNA can, for example, target phage sequences that bacterial cells defensively target. In other words, the target sites in phage genomes are selected to improve phage killing or increase phage inhibition of bacterial growth.
The endogenously or recombinantly expressed proteins facilitate transfer of the editing sequences from the donor DNA into the phage genomes during phage replication. These proteins can include one or more single strand annealing proteins (SSAPs), single-stranded DNA binding proteins (SSBs), mutant mismatch repair proteins, or a combination thereof.
The methods described herein can therefore perform genomic editing without using clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) systems (see e.g., Marraffini and Sontheimer. Nature Reviews Genetics 11:181-190 (2010); Sorek et al. Nature Reviews Microbiology 2008 6:181-6; Karginov and Hannon. Mol Cell 2010 1:7-19; Hale et al. Mol Cell 2010 45:292-302; Jinek et al. Science 2012 337:815-820; Bikard and Marraffini Curr Opin Immunol 2012 24:15-20; Bikard et al. Cell Host & Microbe 2012 12:177-186).
In some embodiments, the retron is engineered with a heterologous sequence encoding a donor polynucleotide suitable for use with a CRISPR/Cas genome editing system. Donor polynucleotides comprise a sequence comprising an intended genome edit flanked by a pair of homology arms responsible for targeting the donor polynucleotide to the target locus to be edited in a cell. The donor polynucleotide typically comprises a 5' homology arm that hybridizes to a 5' genomic target sequence and a 3' homology arm that hybridizes to a 3' genomic target sequence. The homology arms are referred to herein as 5' and 3' (i.e., upstream and downstream) homology arms, which relate to the relative position of the homology arms to the nucleotide sequence comprising the intended edit within the donor polynucleotide. The 5' and 3' homology arms hybridize to regions within the target locus in the genomic DNA to be modified, which are referred to herein as the "5' target sequence" and "3' target sequence," respectively.
The homology arm must be sufficiently complementary for hybridization to the target sequence to mediate homologous recombination between the donor polynucleotide and genomic DNA at the target locus. For example, a homology arm may comprise a nucleotide sequence having at least about 80-100% sequence identity to the corresponding genomic target sequence, including any percent identity within this range, such as at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity thereto, wherein the nucleotide sequence comprising the intended edit can be integrated into the genomic DNA by HDR at the genomic target locus recognized (i.e., having sufficient complementarity for hybridization) by the 5' and 3' homology arms.
In certain embodiments, the corresponding homologous nucleotide sequences in the genomic target sequence (i.e., the "5' target sequence" and "3' target sequence") flank a specific site for cleavage and/or a specific site for introducing the intended edit. The distance between the specific cleavage site and the homologous nucleotide sequences (e.g., each homology arm) can be several hundred nucleotides. In some embodiments, the distance between a homology arm and the cleavage site is 200 nucleotides or less (e.g., 0, 10, 20, 30, 50, 75, 100, 125, 150, 175, and 200 nucleotides). In most cases, a smaller distance may give rise to a higher gene targeting rate. In a preferred embodiment, the donor polynucleotide is substantially identical to the target genomic sequence, across its entire length except for the sequence changes to be introduced to a portion of the genome that encompasses both the specific cleavage site and the portions of the genomic target sequence to be altered.
A homology arm can be of any length, e.g., 10 nucleotides or more, 15 nucleotides or more, 20 nucleotides or more, 50 nucleotides or more, 100 nucleotides or more, 250 nucleotides or more, 300 nucleotides or more, 350 nucleotides or more, 400 nucleotides or more, 450 nucleotides or more, 500 nucleotides or more, 1000 nucleotides (1 kb) or more, 5000 nucleotides (5 kb) or more, 10000 nucleotides (10 kb) or more, etc. In some instances, the 5' and 3' homology arms are substantially equal in length to one another. However, in some instances the 5' and 3' homology arms are not necessarily equal in length to one another. For example, one homology arm may be 30% shorter or less than the other homology arm, 20% shorter or less than the other homology arm, 10% shorter or less than the other homology arm, 5% shorter or less than the other homology arm, 2% shorter or less than the other homology arm, or only a few nucleotides less than the other homology arm. In other instances, the 5' and 3' homology arms are substantially different in length from one another, e.g., one may be 40% shorter or more, 50% shorter or more, sometimes 60% shorter or more, 70% shorter or more, 80% shorter or more, 90% shorter or more, or 95% shorter or more than the other homology arm.
The donor polynucleotide is used in combination with an RNA-guided nuclease, which is targeted to a particular genomic sequence (i.e., genomic target sequence to be modified) by a guide RNA. A target-specific guide RNA comprises a nucleotide sequence that is complementary to a genomic target sequence, and thereby mediates binding of the nuclease-gRNA complex by hybridization at the target site. For example, the gRNA can be designed with a sequence complementary to the sequence of a minor allele to target the nuclease-gRNA complex to the site of a mutation. The mutation may comprise an insertion, a deletion, or a substitution. For example, the mutation may include a single nucleotide variation, gene fusion, translocation, inversion, duplication, frameshift, missense, nonsense, or other mutation associated with a phenotype or disease of interest. The targeted minor allele may be a common genetic variant or a rare genetic variant. In certain embodiments, the gRNA is designed to selectively bind to a minor allele with single base-pair discrimination, for example, to allow binding of the nuclease-gRNA complex to a single nucleotide polymorphism (SNP). In particular, the gRNA may be designed to target disease-relevant mutations of interest for the purpose of genome editing to remove the mutation from a gene. Alternatively, the gRNA can be designed with a sequence complementary to the sequence of a major or wild-type allele to target the nuclease-gRNA complex to the allele for the purpose of genome editing to introduce a mutation into a gene in the genomic DNA of the cell, such as an insertion, deletion, or substitution. Such genetically modified cells can be used, for example, to alter phenotype, confer new properties, or produce disease models for drug screening.
In certain embodiments, the RNA-guided nuclease used for genome modification is a clustered regularly interspersed short palindromic repeats (CRISPR) system Cas nuclease. Any RNA-guided Cas nuclease capable of catalyzing site-directed cleavage of DNA to allow integration of donor polynucleotides by the HDR mechanism can be used in genome editing, including CRISPR system type I, type II, or type III Cas nucleases. Examples of Cas proteins include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), Cas10, Cas10d, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, and Cu1966, and homologs or modified versions thereof.
In certain embodiments, a type II CRISPR system Cas9 endonuclease is used. Cas9 nucleases from any species, or biologically active fragments, variants, analogs, or derivatives thereof that retain Cas9 endonuclease activity (i.e., catalyze site-directed cleavage of DNA to generate double-strand breaks) may be used to perform genome modification as described herein. The Cas9 need not be physically derived from an organism but may be synthetically or recombinantly produced. Cas9 sequences from a number of bacterial species are well known in the art and listed in the National Center for Biotechnology Information (NCBI) database. See, for example, NCBI entries for Cas9 from: Streptococcus pyogenes (WP_002989955, WP_038434062, WP_011528583); Campylobacter jejuni (WP_022552435, YP_002344900); Campylobacter coli (WP_060786116); Campylobacter fetus (WP_059434633); Corynebacterium ulcerans (NC_015683, NC_017317); Corynebacterium diphtheria (NC_016782, NC_016786); Enterococcus faecalis (WP_033919308); Spiroplasma syrphidicola (NC_021284); Prevotella intermedia (NC_017861); Spiroplasma taiwanense (NC_021846); Streptococcus iniae (NC_021314); Belliella baltica (NC_018010); Psychroflexus torquis (NC_018721); Streptococcus thermophilus (YP_820832); Streptococcus mutans (WP_061046374, WP_024786433); Listeria innocua (NP_472073); Listeria monocytogenes (WP_061665472); Legionella pneumophila (WP_062726656); Staphylococcus aureus (WP_001573634); Francisella tularensis (WP_032729892, WP_014548420); Enterococcus faecalis (WP_033919308); Lactobacillus rhamnosus (WP_048482595, WP_032965177); and Neisseria meningitidis (WP_061704949, YP_002342100); all of which sequences (as entered by the date of filing of this application) are herein incorporated by reference in their entireties.
Any of these sequences or a variant thereof comprising a sequence having at least about 70-100% sequence identity thereto, including any percent identity within this range, such as 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99% sequence identity thereto, can be used for genome editing, as described herein. See also Fonfara et al. (2014) Nucleic Acids Res. 42(4):2577-90; Kapitonov et al. (2015) J. Bacteriol. 198(5):797-807; Shmakov et al. (2015) Mol. Cell. 60(3):385-397; and Chylinski et al. (2014) Nucleic Acids Res. 42(10):6091-6105, for sequence comparisons and a discussion of genetic diversity and phylogenetic analysis of Cas9.
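By way of illustration only, the percent identity threshold described above can be checked computationally. The following Python sketch assumes the two sequences have already been aligned by an external tool (gaps written as "-") and assumes one common definition of percent identity, namely identical columns divided by all aligned columns; other definitions use a different denominator. The function names and the example sequences are hypothetical and are not taken from any cited database entry.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Assumption: identity = identical columns / all aligned columns, where
    columns that are gaps ("-") in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no usable columns")
    # A column counts as a match only if both residues are identical non-gaps.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 70.0) -> bool:
    """True if the variant is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    reference = "MKRNYILGLD"  # hypothetical aligned fragment
    variant = "MKRNYILGLA"    # hypothetical variant, one substitution
    print(percent_identity(reference, variant))
    print(meets_identity_threshold(reference, variant, threshold=70.0))
```

The default threshold of 70 corresponds to the lower bound of the range recited above; any value within the range can be passed instead.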
The CRISPR-Cas system naturally occurs in bacteria and archaea where it plays a role in RNA-mediated adaptive immunity against foreign DNA. The bacterial type II CRISPR system uses the endonuclease, Cas9, which forms a complex with a guide RNA (gRNA) that specifically hybridizes to a complementary genomic target sequence, where the Cas9 endonuclease catalyzes cleavage to produce a double-stranded break. Targeting of Cas9 typically further relies on the presence of a 5' protospacer-adjacent motif (PAM) in the DNA at or near the gRNA-binding site.
The genomic target site will typically comprise a nucleotide sequence that is complementary to the gRNA and may further comprise a protospacer adjacent motif (PAM). In certain embodiments, the target site comprises 20-30 base pairs in addition to a 3 base pair PAM. Typically, the first nucleotide of a PAM can be any nucleotide, while the two other nucleotides will depend on the specific Cas9 protein that is chosen. Exemplary PAM sequences are known to those of skill in the art and include, without limitation, NNG, NGN, NAG, and NGG, wherein N represents any nucleotide. In certain embodiments, the allele targeted by a gRNA comprises a mutation that creates a PAM within the allele, wherein the PAM promotes binding of the Cas9-gRNA complex to the allele.
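By way of illustration only, candidate target sites can be located by scanning a sequence for a protospacer adjacent to a PAM that matches a degenerate pattern such as NGG or NAG. The following Python sketch makes several assumptions that are not specified above: a 20-nucleotide protospacer, a single scanned strand, and a PAM placed immediately 3' of the protospacer by default (a flag allows the opposite placement). The function names and the example sequence are hypothetical.

```python
# Degenerate nucleotide codes used in the PAM patterns above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "Y": "CT", "R": "AG"}


def pam_matches(pam_pattern: str, candidate: str) -> bool:
    """True if `candidate` satisfies the degenerate `pam_pattern`."""
    return len(pam_pattern) == len(candidate) and all(
        base in IUPAC[code] for code, base in zip(pam_pattern, candidate)
    )


def find_target_sites(sequence: str, pam_pattern: str = "NGG",
                      protospacer_len: int = 20,
                      pam_on_3prime_side: bool = True):
    """Return (start, protospacer, pam) tuples found on the given strand.

    Assumption: the PAM directly abuts the protospacer, on the 3' side by
    default. Only the strand supplied is scanned.
    """
    sequence = sequence.upper()
    pam_len = len(pam_pattern)
    window_len = protospacer_len + pam_len
    sites = []
    for start in range(len(sequence) - window_len + 1):
        window = sequence[start:start + window_len]
        if pam_on_3prime_side:
            protospacer, pam = window[:protospacer_len], window[protospacer_len:]
        else:
            pam, protospacer = window[:pam_len], window[pam_len:]
        if pam_matches(pam_pattern, pam):
            sites.append((start, protospacer, pam))
    return sites


if __name__ == "__main__":
    example = "TTTT" + "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"  # hypothetical
    for site in find_target_sites(example, pam_pattern="NGG"):
        print(site)
```

A complete search would also scan the reverse complement and would use the protospacer length and PAM appropriate to the specific nuclease chosen.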
In certain embodiments, the gRNA is 5-50 nucleotides, 10-30 nucleotides, 15-25 nucleotides, 18-22 nucleotides, or 19-21 nucleotides in length, or any length between the stated ranges, including, for example, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 nucleotides in length. The guide RNA may be a single guide RNA comprising crRNA and tracrRNA sequences in a single RNA molecule, or the guide RNA may comprise two RNA molecules with crRNA and tracrRNA sequences residing in separate RNA molecules.
In another embodiment, the CRISPR nuclease from Prevotella and Francisella 1 (Cpf1) is used. Cpf1 is another class II CRISPR/Cas system RNA-guided nuclease with similarities to Cas9 and may be used analogously. Unlike Cas9, Cpf1 does not require a tracrRNA and only depends on a crRNA in its guide RNA, which provides the advantage that shorter guide RNAs can be used with Cpf1 for targeting than Cas9. Cpf1 is capable of cleaving either DNA or RNA. The PAM sites recognized by Cpf1 have the sequences 5'-YTN-3' (where "Y" is a pyrimidine and "N" is any nucleobase) or 5'-TTN-3', in contrast to the G-rich PAM site recognized by Cas9. Cpf1 cleavage of DNA produces double-stranded breaks with sticky ends having a 4 or 5 nucleotide overhang. For a discussion of Cpf1, see, e.g., Ledford et al. (2015) Nature. 526 (7571): 17-17, Zetsche et al. (2015) Cell. 163 (3):759-771, Murovec et al. (2017) Plant Biotechnol. J. 15(8):917-926, Zhang et al. (2017) Front. Plant Sci. 8: 177, Fernandes et al. (2016) Postepy Biochem. 62(3):315-326; herein incorporated by reference.
C2c1 is another class II CRISPR/Cas system RNA-guided nuclease that may be used. C2c1, similarly to Cas9, depends on both a crRNA and tracrRNA for guidance to target sites. For a description of C2c1, see, e.g., Shmakov et al. (2015) Mol Cell. 60(3):385-397, Zhang et al. (2017) Front Plant Sci. 8: 177; herein incorporated by reference.
In yet another embodiment, an engineered RNA-guided FokI nuclease may be used. RNA-guided FokI nucleases comprise fusions of inactive Cas9 (dCas9) and the FokI endonuclease (FokI-dCas9), wherein the dCas9 portion confers guide RNA-dependent targeting on FokI. For a description of engineered RNA-guided FokI nucleases, see, e.g., Havlicek et al. (2017) Mol. Ther. 25(2):342-355, Pan et al. (2016) Sci Rep. 6:35794, Tsai et al. (2014) Nat Biotechnol. 32(6):569-576; herein incorporated by reference.
The RNA-guided nuclease can be provided in the form of a protein, optionally where the nuclease is complexed with a gRNA, or provided by a nucleic acid encoding the RNA-guided nuclease, such as an RNA (e.g., messenger RNA) or DNA (expression vector). In some embodiments, the RNA-guided nuclease and the gRNA are both provided by vectors. Both can be expressed by a single vector or separately on different vectors. The vector(s) encoding the RNA-guided nuclease and gRNA may be included in the vector system comprising the engineered retron msr gene, msd gene and ret gene sequences.
Codon usage may be optimized to improve production of an RNA-guided nuclease and/or retron reverse transcriptase in a particular cell or organism. For example, a nucleic acid encoding an RNA-guided nuclease or reverse transcriptase can be modified to substitute codons having a higher frequency of usage in a yeast cell, a bacterial cell, a human cell, a nonhuman cell, a mammalian cell, a rodent cell, a mouse cell, a rat cell, or any other host cell of interest, as compared to the naturally occurring polynucleotide sequence. When a nucleic acid encoding the RNA-guided nuclease or reverse transcriptase is introduced into cells, the protein can be transiently, conditionally, or constitutively expressed in the cell.
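By way of illustration only, codon substitution of the kind described above can be sketched as replacing each codon with the synonymous codon having the highest usage frequency in the intended host. The following Python sketch uses the standard genetic code; the usage frequencies in EXAMPLE_USAGE are made-up placeholder values rather than measurements for any organism, and the function names are hypothetical. Codons whose amino acid has no entry in the usage table are left unchanged.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical host usage frequencies (placeholder values for illustration).
EXAMPLE_USAGE = {
    "CTG": 0.40, "CTT": 0.13,   # leucine
    "GCC": 0.40, "GCT": 0.26,   # alanine
    "AAG": 0.57, "AAA": 0.43,   # lysine
}


def translate(cds: str) -> str:
    """Translate a coding sequence whose length is a multiple of three."""
    cds = cds.upper()
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def optimize_codons(cds: str, usage: dict) -> str:
    """Swap each codon for the most frequent synonymous codon in `usage`."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    optimized = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        amino_acid = CODON_TO_AA[codon]
        synonyms = [c for c, aa in CODON_TO_AA.items()
                    if aa == amino_acid and c in usage]
        # Keep the original codon when the table has no synonymous entry.
        optimized.append(max(synonyms, key=usage.get) if synonyms else codon)
    return "".join(optimized)


if __name__ == "__main__":
    original = "ATGCTTGCTAAATAA"  # hypothetical short coding sequence
    recoded = optimize_codons(original, EXAMPLE_USAGE)
    print(recoded)
    print(translate(original) == translate(recoded))
```

Because only synonymous codons are substituted, the encoded amino acid sequence is unchanged. Practical codon optimization commonly weighs additional factors, such as secondary structure and unwanted sequence motifs, which this sketch does not address.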
RECOMBINEERING
Recombineering (recombination-mediated genetic engineering) can be used in modifying chromosomal as well as episomal replicons in cells, for example, to create gene replacements, gene knockouts, deletions, insertions, inversions, or point mutations. Recombineering can also be used to modify a plasmid or bacterial artificial chromosome (BAC), for example, to clone a gene or insert markers or tags. The engineered retrons described herein can be used in recombineering applications to provide linear single-stranded or double-stranded DNA for recombination. Homologous recombination may be mediated by bacteriophage proteins such as RecE/RecT from Rac prophage or Redαβδ from bacteriophage lambda. The linear DNA should have sufficient homology at the 5' and 3' ends to a target DNA molecule present in a cell (e.g., plasmid, BAC, or chromosome) to allow recombination.
The linear double-stranded or single-stranded DNA molecule used in recombineering (i.e. donor polynucleotide) comprises a sequence having the intended edit to be inserted flanked by two homology arms that target the linear DNA molecule to a target site for homologous recombination. Homology arms for recombineering typically range in length from 13-300 nucleotides, or 20 to 200 nucleotides, including any length within this range such as 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, or 200 nucleotides in length. In some embodiments, a homology arm is at least 15, at least 20, at least 30, at least 40, or at least 50 or more nucleotides in length. Homology arms ranging from 40-50 nucleotides in length generally have sufficient targeting efficiency for recombination; however, longer homology arms ranging from 150 to 200 bases or more may further improve targeting efficiency. In some embodiments, the 5' homology arm and the 3' homology arm differ in length. For example, the linear DNA may have about 50 bases at the 5' end and about 20 bases at the 3' end with homology to the region to be targeted.
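By way of illustration only, a donor sequence of the form described above (an intended edit flanked by a 5' homology arm and a 3' homology arm) can be assembled from a target sequence as follows. The following Python sketch uses default arm lengths of 50 and 20 bases, mirroring the example of unequal arms given above. The function name, the zero-based coordinate convention, and the example target are hypothetical.

```python
def build_donor(target: str, edit_start: int, edit_end: int, edit_seq: str,
                arm5_len: int = 50, arm3_len: int = 20) -> str:
    """Assemble 5' arm + edit + 3' arm for a recombineering donor.

    `target[edit_start:edit_end]` (zero-based, end-exclusive) is the region
    replaced by `edit_seq`. Use edit_start == edit_end for an insertion and
    an empty `edit_seq` for a deletion.
    """
    if not 0 <= edit_start <= edit_end <= len(target):
        raise ValueError("edit coordinates fall outside the target")
    if edit_start - arm5_len < 0 or edit_end + arm3_len > len(target):
        raise ValueError("target too short for the requested homology arms")
    arm5 = target[edit_start - arm5_len:edit_start]   # homology upstream of the edit
    arm3 = target[edit_end:edit_end + arm3_len]       # homology downstream of the edit
    return arm5 + edit_seq + arm3


if __name__ == "__main__":
    # Hypothetical target: substitute the single G at position 60 with T.
    target = "A" * 60 + "G" + "C" * 30
    donor = build_donor(target, 60, 61, "T", arm5_len=50, arm3_len=20)
    print(len(donor), donor)
```

The sketch returns only the strand given; whether the donor should correspond to one strand or the other is a design choice that is not addressed here.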
The bacteriophage homologous recombination proteins can be provided to a cell as proteins or by one or more vectors encoding the recombination proteins. In some embodiments, one or more vectors encoding the bacteriophage recombination proteins are included in the vector system comprising the engineered retron msr gene, msd gene and ret gene sequences.
Additionally, a number of bacterial strains containing prophage recombination systems are available for recombineering, including, without limitation, DY380, containing a defective λ prophage with recombination proteins exo, bet, and gam; EL250, derived from DY380, which in addition to the recombination genes found in DY380, also contains a tightly controlled arabinose-inducible flpe gene (flpe mediates recombination between two identical frt sites); EL350, also derived from DY380, which in addition to the recombination genes found in DY380, also contains a tightly controlled arabinose-inducible cre gene (cre mediates recombination between two identical loxP sites); SW102, derived from DY380, which is designed for BAC recombineering using a galK positive/negative selection; SW105, derived from EL250, which can also be used for galK positive/negative selection, but like EL250, contains an ara-inducible Flpe gene; and SW106, derived from EL350, which can be used for galK positive/negative selection, but like EL350, contains an ara-inducible Cre gene. Recombineering can be carried out by transfecting bacterial cells of such strains with an engineered retron comprising a heterologous sequence encoding a linear DNA suitable for recombineering. For a discussion of recombineering systems and protocols, see, e.g., Sharan et al. (2009) Nat Protoc. 4(2): 206-223, Zhang et al. (1998) Nature Genetics 20: 123-128, Muyrers et al. (1999) Nucleic Acids Res. 27: 1555-1557, Yu et al. (2000) Proc. Natl. Acad. Sci U.S.A. 97 (11):5978-5983; herein incorporated by reference.
MOLECULAR RECORDING
In some embodiments, the heterologous sequence in the engineered retron construct comprises a synthetic CRISPR protospacer DNA sequence to allow molecular recording. The endogenous CRISPR Cas1-Cas2 system is normally utilized by bacteria and archaea to keep track of foreign DNA sequences originating from viral infections by storing short sequences (i.e., protospacers) that confer sequence-specific resistance to invading viral nucleic acids within genome-based arrays. These arrays not only preserve the spacer sequences but also record the order in which the sequences are acquired, generating a temporal record of acquisition events.
This system can be adapted to record arbitrary DNA sequences into a genomic CRISPR array in the form of "synthetic protospacers" that are introduced into cells using engineered retrons. Engineered retrons carrying the protospacer sequences can be used for integration of synthetic CRISPR protospacer sequences at a specific genomic locus by utilizing the CRISPR system Cas1-Cas2 complex. Molecular recording can be used to keep track of certain biological events by producing a stable genetic memory tracking code. See, e.g., Shipman et al. (2016) Science 353(6298):aaf1175 and International Patent Application Publication No. WO/2018/191525; herein incorporated by reference in their entireties.
In some embodiments, the CRISPR-Cas system is harnessed to record specific and arbitrary DNA sequences into a bacterial genome. The DNA sequences can be produced by an engineered retron within the cell. For example, the engineered retron can be used to produce the protospacers within the cell, which are inserted into a CRISPR array within the cell. The cell may be modified to include one or more engineered retrons (or vector systems encoding them) that can produce one or more synthetic protospacers in the cell, wherein the synthetic protospacers are added to the CRISPR array. A record of defined sequences, recorded over many days, and in multiple modalities can be generated.
In some embodiments, the engineered retron comprises an msd protospacer nucleic acid region or an msr protospacer nucleic acid region. In the case of an msr protospacer nucleic acid region, the protospacer sequence is first incorporated into the msr RNA, which is reverse transcribed into protospacer DNA. Double stranded protospacer DNA is produced when two complementary protospacer DNA sequences hybridize, or when a double-stranded structure (such as a hairpin) is formed in a single stranded protospacer DNA (e.g., a single msDNA can form an appropriate hairpin structure to provide the double stranded DNA protospacer).
In some embodiments, a single stranded DNA produced in vivo from a first engineered retron may be hybridized with a complementary single-stranded DNA produced in vivo from the same retron or a second engineered retron or may form a hairpin structure and then used as a protospacer sequence to be inserted into a CRISPR array as a spacer sequence. The engineered retron(s) should provide sufficient levels of the protospacer sequence within a cell for incorporation into the CRISPR array. The use of protospacers generated within the cell extends the in vivo molecular recording system from only capturing information known to a user, to capturing biological or environmental information that may be previously unknown to a user. For example, an msDNA protospacer sequence in an engineered retron construct may be driven by a promoter that is downstream of a sensor pathway for a biological phenomenon or environmental toxin. The capture and storage of the protospacer sequence in the CRISPR array records the event. If multiple msDNA protospacers are driven by different promoters, the activity of those promoters is recorded (along with anything that may be upstream of the promoters) as well as the relative order of promoter activity (based on the relative position of spacer sequences in the CRISPR array). At any point after the recording has taken place, the CRISPR array may be sequenced to determine whether a given biological or environmental event has taken place and the order of multiple events, given by the presence and relative position of msDNA-derived spacers in the CRISPR array. In some embodiments, the synthetic protospacer further comprises an AAG PAM sequence at its 5' end. Protospacers including the 5' AAG PAM are acquired by the CRISPR array with greater efficiency than those that do not include a PAM sequence.
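By way of illustration only, the readout step described above, in which the sequenced CRISPR array is inspected for the presence and relative position of msDNA-derived spacers, can be sketched as follows. The following Python sketch assumes that the spacers are listed in leader-proximal to leader-distal order and that newly acquired spacers are added at the leader-proximal end, so that the leader-proximal spacer is the most recent; this ordering convention is an assumption of the sketch. The function name, the spacer sequences, and the event labels are hypothetical.

```python
def infer_event_order(array_spacers, spacer_to_event):
    """Return recorded events from oldest to most recent.

    `array_spacers` lists the spacers read from the sequenced array,
    leader-proximal first (assumed to be the newest acquisition).
    Spacers absent from `spacer_to_event` (e.g., pre-existing or unrelated
    spacers) are ignored.
    """
    recorded_newest_first = [
        spacer_to_event[spacer]
        for spacer in array_spacers
        if spacer in spacer_to_event
    ]
    return list(reversed(recorded_newest_first))


if __name__ == "__main__":
    # Hypothetical msDNA-derived spacers, each tied to the promoter driving it.
    spacer_to_event = {
        "GATTACAGATTACAGATTACA": "heat-responsive promoter active",
        "CCGGTTAACCGGTTAACCGGT": "toxin-responsive promoter active",
    }
    # Hypothetical sequenced array, leader-proximal spacer first.
    sequenced_array = [
        "CCGGTTAACCGGTTAACCGGT",
        "TTTTGGGGCCCCAAAATTTTG",  # unrelated pre-existing spacer
        "GATTACAGATTACAGATTACA",
    ]
    for event in infer_event_order(sequenced_array, spacer_to_event):
        print(event)
```

An event that is absent from the returned list was not detected in the array; an event that appears more than once was recorded more than once.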
In some embodiments, Cas1 and Cas2 are provided by a vector that expresses the Cas1 and Cas2 at a level sufficient to allow the synthetic protospacer sequences produced by engineered retrons to be acquired by a CRISPR array in a cell. Such a vector system can be used to allow molecular recording in a cell that lacks endogenous Cas proteins.
THERAPEUTIC APPLICATIONS
The engineered ncRNAs, reverse transcriptases, Cas nucleases, and the expression systems described herein and/or cells containing the engineered ncRNAs, reverse transcriptases, Cas nucleases, or expression systems can be administered to a subject. Such a subject may suffer from a disease or condition or be suspected of suffering from a disease or condition. Symptoms of the disease or condition can be reduced by such administration. In some cases, progression of the disease or condition can be prevented or reduced by such administration. In some cases, the subject may be asymptomatic but be genetically predisposed to developing the disease or condition.
Hence, described herein are methods of administering one or more engineered ncRNAs, reverse transcriptases, Cas nucleases, and/or expression systems therefor and/or cells containing the engineered ncRNAs, reverse transcriptases, Cas nucleases, to a subject. The methods can provide prophylaxis, amelioration and/or therapy for a variety of diseases or conditions, including cystic fibrosis, thalassemia, sickle cell anemia, Huntington's disease, diabetes, Duchenne's Muscular Dystrophy, Tay-Sachs Disease, Marfan syndrome, Alzheimer’s disease, Leber's hereditary optic atrophy (LHON), myoclonic epilepsy with ragged red fibers (MERRF), mitochondrial encephalopathy, lactic acidosis, and stroke-like episodes (MELAS; a type of dementia), obesity, cancers, brain ischemia, coronary disease, myocardial infarction, reperfusion hindrance of ischemic diseases, atopic dermatitis, psoriasis vulgaris, contact dermatitis, keloid, decubital ulcer, ulcerative colitis, Crohn's disease, nephropathy, glomerulosclerosis, albuminuria, nephritis, renal failure, rheumatoid arthritis, osteoarthritis, asthma, chronic obstructive pulmonary disease (COPD), and combinations thereof.
Also provided herein are methods of diagnosing, prognosing, treating, and/or preventing a disease, state, or condition in or of a subject, using the engineered retron of the invention. Generally, the methods of diagnosing, prognosing, treating, and/or preventing a disease, state, or condition in or of a subject can include modifying a polynucleotide in a subject or cell thereof using a composition, system, or component thereof of the engineered retron as described herein, and/or include detecting a diseased or healthy polynucleotide in a subject or cell thereof using a composition, system, or component thereof of the engineered retron as described herein.
In some embodiments, the method of treatment or prevention can include using a composition, system, or component of the engineered retron to modify a polynucleotide of an infectious organism (e.g., a bacterium or virus) within a subject or cell thereof.
In some embodiments, the method of treatment or prevention can include using a composition, system, or component of the engineered retron to modify a polynucleotide of an infectious organism or symbiotic organism within a subject.
In some embodiments, the composition, system, and components of the engineered retron can be used to develop models of diseases, states, or conditions.
In some embodiments, the composition, system, and components of the engineered retron can be used to detect a disease state or correction thereof, such as by a method of treatment or prevention described herein.
In some embodiments, the composition, system, and components of the engineered retron can be used to screen and select cells that can be used, for example, as treatments or preventions described herein.
In some embodiments, the composition, system, and components thereof can be used to develop biologically active agents that can be used to modify one or more biologic functions or activities in a subject or a cell thereof.
In general, the method can include delivering a composition, system, and/or component of the engineered retron to a subject or cell thereof, or to an infectious or symbiotic organism by a suitable delivery technique and/or composition. Once administered, the components can operate as described elsewhere herein to elicit a nucleic acid modification event. In some embodiments, the nucleic acid modification event can occur at the genomic, epigenomic, and/or transcriptomic level. DNA and/or RNA cleavage, gene activation, and/or gene deactivation can occur.
The composition, system, and components of the engineered retron as described elsewhere herein can be used to treat and/or prevent a disease, such as a genetic and/or epigenetic disease, in a subject; to treat and/or prevent genetic infectious diseases in a subject, such as bacterial infections, viral infections, fungal infections, parasite infections, and combinations thereof; to modify the composition or profile of a microbiome in a subject, which can in turn modify the health status of the subject; to modify cells ex vivo, which can then be administered to the subject whereby the modified cells can treat or prevent a disease or symptom thereof; or to treat mitochondrial diseases, where the mitochondrial disease etiology involves a mutation in the mitochondrial DNA.
Also provided is a method of treating a subject, e.g., a subject in need thereof, comprising inducing gene editing by transforming the subject with the polynucleotide encoding one or more components of the composition, system, or complex or any of polynucleotides or vectors described herein of the engineered retron, and administering them to the subject.
Also provided is a method of treating a subject, e.g., a subject in need thereof, comprising inducing transcriptional activation or repression of multiple target gene loci by transforming the subject with the polynucleotides or vectors described herein, wherein said polynucleotide or vector encodes or comprises one or more components of composition, system, complex or component of the engineered retron, and comprising multiple Cas effectors.
Also provided is a method of treating a subject, e.g., a subject in need thereof, comprising inducing gene editing by transforming the subject with the Cas effector(s), and encoding and expressing in vivo the remaining portions of the composition, system, (e.g., RNA, guides), complex or component of the engineered retron. A suitable repair template may also be provided by the engineered retron as described herein elsewhere.
Also provided is a method of treating a subject, e.g., a subject in need thereof, comprising inducing transcriptional activation or repression by transforming the subject with the systems or compositions herein.
Also provided is a method of inducing one or more polynucleotide modifications in a eukaryotic or prokaryotic cell or component thereof (e.g., a mitochondria) of a subject, infectious organism, and/or organism of the microbiome of the subject. The modification can include the introduction, deletion, or substitution of one or more nucleotides at a target sequence of a polynucleotide of one or more cell(s). The modification can occur in vitro, ex vivo, in situ, or in vivo.
In some embodiments, the method of treating or inhibiting a condition or a disease caused by one or more mutations in a genomic locus in a eukaryotic organism or a non-human organism can include manipulation of a target sequence within a coding, non-coding or regulatory element of said genomic locus in a target sequence in a subject or a non-human subject in need thereof comprising modifying the subject or a non-human subject by manipulation of the target sequence and wherein the condition or disease is susceptible to treatment or inhibition by manipulation of the target sequence including providing treatment comprising delivering a composition comprising the particle delivery system or the delivery system or the virus particle of any one of the above embodiments or the cell of any one of the above embodiments.
Also provided herein is the use of the particle delivery system or the delivery system or the virus vector (in viral particle) of any one of the above embodiments or the cell of any one of the above embodiments in ex vivo or in vivo gene or genome editing; or for use in in vitro, ex vivo or in vivo gene therapy.
Also provided herein are particle delivery systems, non-viral delivery systems, and/or the virus particle of any one of the above embodiments or the cell of any one of the above embodiments used in the manufacture of a medicament for in vitro, ex vivo or in vivo gene or genome editing or for use in in vitro, ex vivo or in vivo gene therapy or for use in a method of modifying an organism or a non-human organism by manipulation of a target sequence in a genomic locus associated with a disease or in a method of treating or inhibiting a condition or disease caused by one or more mutations in a genomic locus in a eukaryotic organism or a non-human organism.
In some embodiments, target polynucleotide modification using the subject engineered retron and the associated composition, vectors, system and methods comprises addition, deletion, or substitution of 1 to about 10k nucleotides at each target sequence of said polynucleotide of said cell(s). The modification can include the addition, deletion, or substitution of at least 1, 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, 100, 200, 250, 300, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 5000, 6000, 7000, 8000, 9000, 10,000 or more nucleotides at each target sequence.
In some embodiments, formation of system or complex results in cleavage, nicking, and/or another modification of one or both strands in or near (e.g., within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, or more base pairs from) the target sequence.
In some embodiments, a method of modifying a target polynucleotide in a cell to treat or prevent a disease can include allowing a composition, system, or component of the subject engineered retron to bind to the target polynucleotide, e.g., to effect cleavage, nicking, or other modification as the composition, system, is capable of said target polynucleotide, thereby modifying the target polynucleotide, wherein the composition, system, or component thereof, complex with a guide sequence, and hybridize said guide sequence to a target sequence within the target polynucleotide, wherein said guide sequence is optionally linked to a tracr mate sequence, which in turn can hybridize to a tracr sequence. In some embodiments, modification can include cleaving or nicking one or two strands at the location of the target sequence by one or more components of the composition, system, or component thereof.
In some embodiments, the engineered retron and the associated compositions, systems, vectors, uses, and methods of use, can be used to treat diseases of the circulatory system. In some embodiments, the treatment can be carried out by using an AAV or a lentiviral vector to deliver the engineered retron, composition, system, and/or vector described herein to modify hematopoietic stem cells (HSCs) or iPSCs in vivo or ex vivo. In some embodiments, the treatment can be carried out by correcting HSCs or iPSCs as to the disease using a composition, system, herein or a component thereof, wherein the composition, system, optionally includes a suitable HDR repair template (e.g., a template in the msDNA of the engineered retron).
In some embodiments, the treatment or prevention for treating a circulatory system or blood disease can include modifying a human cord blood cell. In some embodiments, the treatment or prevention for treating a circulatory system or blood disease can include modifying a granulocyte colony-stimulating factor-mobilized peripheral blood cell (mPB) with any modification described herein. In some embodiments, the human cord blood cell or mPB can be CD34+. In some embodiments, the cord blood cells or mPB cells modified are autologous. In some embodiments, the cord blood cells or mPB cells are allogenic. In addition to the modification of the disease genes, allogenic cells can be further modified using the composition or system described herein to reduce the immunogenicity of the cells when delivered to the recipient. The modified cord blood cells or mPB cells can be optionally expanded in vitro. The modified cord blood cell(s) or mPB cells can be delivered to a subject in need thereof using any suitable delivery technique.
The composition and system may be engineered to target genetic locus or loci in HSCs. In some embodiments, the components of the systems can be codon-optimized for a eukaryotic cell and especially a mammalian cell, e.g., a human cell, for instance, HSC, or iPSC and sgRNA targeting a locus or loci in HSC, such as circulatory disease, can be prepared. These may be delivered via particles, such as the lipid nanoparticle delivery system described herein. The particles may be formed by the components of the systems herein being admixed.
In some embodiments, after ex vivo modification the HSCs or iPSCs can be expanded prior to administration to the subject. Expansion of HSCs can be via any suitable method such as that described by Lee, “Improved ex vivo expansion of adult hematopoietic stem cells by overcoming CUL4-mediated degradation of HOXB4.” Blood. 2013 May 16;121(20):4082-9. doi: 10.1182/blood-2012-09-455204. Epub 2013 Mar 21.
In some embodiments, the HSCs or iPSCs modified are autologous. In some embodiments, the HSCs or iPSCs are allogenic. In addition to the modification of the disease genes, allogenic cells can be further modified using the composition, system, described herein to reduce the immunogenicity of the cells when delivered to the recipient.
In some embodiments, the engineered retron and the associated compositions, systems, vectors, uses, and methods of use, can be used to treat neurological diseases. In some embodiments, the neurological diseases comprise diseases of the brain and CNS.
Delivery options for the diseases in the brain include encapsulation of the systems in the form of either DNA or RNA into liposomes and conjugating to molecular Trojan horses for trans-blood brain barrier (BBB) delivery. Molecular Trojan horses have been shown to be effective for delivery of β-gal expression vectors into the brain of non-human primates. The same approach can be used to deliver vectors or vector systems of the invention. In other embodiments, an artificial virus can be generated for CNS and/or brain delivery.
In some embodiments, the engineered retron and the associated compositions, systems, vectors, uses, and methods of use, can be used to treat hearing diseases or hearing loss in one or both ears. Deafness is often caused by lost or damaged hair cells that cannot relay signals to auditory neurons. In some embodiments, the composition, system, or modified cells can be delivered to one or both ears for treating or preventing hearing disease or loss by any suitable method or technique known in the art, such as US20120328580 (e.g., auricular administration), by intratympanic injection (e.g., into the middle ear), and/or injections into the outer, middle, and/or inner ear; administration in situ, via a catheter or pump (U.S. 2006/0030837) and Jacobsen (U.S. Pat. No. 7,206,639). Also see US20120328580. Cells resulting from such methods can then be transplanted or implanted into a patient in need of such treatment.
In some embodiments, the engineered retron and the associated compositions, systems, vectors, uses, and methods of use, can be used to treat diseases in non-dividing cells. Exemplary non-dividing cells include muscle cells or neurons. In such cells, homologous recombination (HR) is generally suppressed in the G1 cell-cycle phase but can be turned back on using art-recognized methods, such as Orthwein et al. (Nature. 2015 Dec 17; 528(7582): 422-426).
In some embodiments, the engineered retron and the associated compositions, systems, vectors, uses, and methods of use, can be used to treat diseases of the eye. In some embodiments, the engineered retron and the associated compositions, systems, vectors, uses, and methods of use, can be used to treat muscle diseases and cardiovascular diseases.
In some embodiments, the engineered retron and the associated compositions, systems, vectors, uses, and methods of use, can be used to treat diseases of the liver and kidney.
In some embodiments, the engineered retron and the associated compositions, systems, vectors, uses, and methods of use, can be used to treat epithelial and lung diseases.
In some embodiments, the engineered retron and the associated compositions, systems, vectors, uses, and methods of use, can be used to treat diseases of the skin.
In some embodiments, the engineered retron and the associated compositions, systems, vectors, uses, and methods of use, can be used to treat cancer.
In some embodiments, the engineered retron and the associated compositions, systems, vectors, uses, and methods of use, can be used in adoptive cell therapy.
In some embodiments, the engineered retron and the associated compositions, systems, vectors, uses, and methods of use, can be used to treat infectious diseases.
In some embodiments, the engineered retron and the associated compositions, systems, vectors, uses, and methods of use, can be used to treat mitochondrial diseases.
Ex Vivo Cellular Modification
In certain embodiments, gene transfer may more easily be performed under ex vivo conditions. Ex vivo gene therapy refers to the isolation of cells from a subject, the delivery of a nucleic acid into cells in vitro, and then the return of the modified cells back into the subject. This may involve the collection of a biological sample comprising cells from the subject. For example, blood can be obtained by venipuncture, cells can be obtained by scrapings, and solid tissue samples can be obtained by surgical techniques etc. according to methods available in the art.
Usually, but not always, the subject who receives the cells (i.e., the recipient) is also the subject from whom the cells are harvested or obtained, which provides the advantage that the donated cells are autologous. However, cells can be obtained from another subject (i.e., donor), a culture of cells from a donor, or from established cell culture lines. Cells may be obtained from the same or a different species than the subject to be treated, but preferably are of the same species, and more preferably of the same immunological profile as the subject. Such cells can be obtained, for example, from a biological sample comprising cells from a close relative or matched donor, then transfected with nucleic acids (e.g., comprising an engineered retron), and administered to a subject in need of genome modification, for example, for treatment of a disease or condition.
The following Examples illustrate some of the experiments performed in the development of the invention.
EXAMPLE
Introduction
Multiplexing - the act of consolidating multiple discrete elements into a single composite channel - has enabled genomic technologies to scale toward the complexity of the biology we hope to understand. Today, one might use multiplexed DNA synthesis to make a library of distinct CRISPR gRNAs on a single synthesis chip, then use multiplexed experimental design to clone and transfect that library of gRNAs across cells in a single culture, and finally use multiplexed sequencing to analyze the effect of the perturbation on a single sequencing flow-cell (1,2). This now-standard multiplexed gRNA workflow has allowed scientists to run experiments across every gene in parallel with barely more effort than they might have previously put into determining the effect of a single gene. However, the typical multiplexing of a gRNA library precludes an important level of analysis: it is implemented across cells, where a single edit is made per genome, and thus cannot be used to study the interaction of mutations within a genome.
Technologies for multiplexing within genomes - where multiple distinct, non-adjacent edits are made using a single, consolidated editor - are much more limited. Yet, applications for multiplexing within genomes abound in both fundamental biology (e.g., studying epistasis, long-range gene regulation, and genome organization) and biotechnology (e.g., metabolic engineering, molecular recording, and genome minimization). These complex applications require precise mutations, not genomic scars or transcriptional perturbations. Precision is essential to understand combinatorial genome complexity, such as probing compensatory mutations across genes in a complex or interrogating enhancer-promoter interactions, and is necessary to build nuanced technological advances, such as ribosome-dependent tuning of gene expression in a metabolic pathway.
In bacteria, the most commonly used approach to introduce combinatorial, precise mutations is MAGE (multiplexed automated genome engineering), which relies on single-stranded DNA (ssDNA) recombineering (3-5). A eukaryotic version of this technology has been developed to extend this approach to yeast (6). However, MAGE is limited by the numerous labor-intensive recombineering cycles required to attain efficient combinatorial editing rates, and by its reliance on exogenously delivered oligonucleotides that leave no trackable plasmid element for phenotyping by proxy (7). Base-editing (BE) and prime-editing (PE) (8-10) are two other precise editing approaches that can be multiplexed (11-16). Base-editors are the simplest to multiplex using tandem gRNAs but are limited to single base mutations of a defined type (either A•T-to-G•C or C•G-to-T•A) (11,13-14). Prime-editors have also been multiplexed, but the complexity of the editing elements grows quickly with additional sites. In bacteria, multiplexed prime editing requires a three-plasmid system, and multiple edits occur on the same genome in less than 1% of cells (12), while systems built for human and plant cells require two gRNAs per site in addition to the editing template, which can create issues with the assembly of multiplexed plasmids (14-16).
Another way to introduce precise mutations that is compatible with both prokaryotic and eukaryotic editing is to produce editing donors inside a cell using modified retrons. Retrons are bacterial tripartite systems that have been shown to provide phage defense (17-20). Two of the components of the retron operon are a reverse transcriptase and a small (200-300 base), structured non-coding RNA (ncRNA). The reverse transcriptase recognizes and partially reverse transcribes the ncRNA into a single-stranded DNA fragment that is present at the abundance of a cellular transcript (18,21-24).
We and others have previously shown that the retron ncRNA can be modified to encode an editing donor to precisely edit the genomes of bacteria, phage, plant, yeast, and even human cells (25-31). However, these retron-derived editors have only been used to edit genomic positions one at a time. Here, we describe a substantial modification of the retron ncRNA to produce multiple editing donors simultaneously from a single transcript after reverse transcription. We show that these multiplexed, arrayed retron elements - termed multitrons - can be paired with single-stranded annealing proteins to edit prokaryotic genomes and with CRISPR components to edit eukaryotic genomes (26-29). We demonstrate utility with proof-of-concept applications in molecular recording, multiplexed deletions, and metabolic engineering.
Provided herein is a system for generating multiple changes simultaneously in bacterial, yeast, and other eukaryotic genomes. The changes include a broad range of different kinds of edits, from single point mutations to large insertions, deletions and replacements/substitutions, enabling large-scale genome engineering in prokaryotic and eukaryotic genomes. These edits to the host genome are introduced by a reverse-transcribed editing donor, which edits the bacterial genome during replication and the yeast genome after a double-strand break (DSB) generated by a CRISPR-derived nuclease. Editing of multiple DNA loci in a single genome is desired for many applications in biotechnology, such as rewiring metabolic pathways to improve the production of compounds of interest, creating molecular recorders or new genetic circuits, or improving the use of different cells as biosensors.
Materials and Methods
Biological replicates were taken from distinct samples, not the same sample measured repeatedly.
Plasmid Construction
All the plasmids used in this work are listed in Table 3. Furthermore, all the RT-donors and the oligonucleotides containing the desired mutations for the editing experiments are listed in Table 4.
Table 3
Figure imgf000189_0001
Figure imgf000190_0001
Figure imgf000191_0001
Table 3 cont'd
Figure imgf000192_0001
Table 3 cont'd
Figure imgf000193_0001
Table 4
Figure imgf000194_0001
E. coli
To clone additional 70 bp donors in a single msd, the pSLS.492 (29) plasmid containing a rpoB donor was used as backbone. To clone a donor upstream of the rpoB donor, a 60 bp reverse oligo annealing (25 bp) with the 5’ region of the msd and containing 35 bp of the new donor, and a 60 bp forward oligo annealing (25 bp) with the 5’ end of the rpoB donor and harboring the other half of the new donor, were used. To clone a donor downstream of the rpoB donor, a 60 bp forward oligo annealing (25 bp) with the 3’ region of the msd and containing 35 bp of the new donor, and a 60 bp reverse oligo annealing (25 bp) with the 3’ end of the rpoB donor and harboring the other half of the new donor, were used. After a 30-cycle PCR reaction with Q5 hot-start high-fidelity polymerase (NEB) following the vendor's recommended protocol, a KLD reaction (NEB) was carried out to self-ligate the plasmid encoding an additional donor. To add a third donor in a single msd, this cycle was repeated.
To construct the plasmids harboring the retron arrays in their different architectures, the pCDF-DUET-1 vector (Novagen) was used as a backbone. A parental plasmid (pAGD159; Table 3) containing a whole ncRNA with a gyrA donor downstream of the first T7 promoter, and Eco1-RT downstream of the second T7 promoter, was constructed. To assess whole ncRNA retron arrays, the ncRNA harboring the rpoB donor from pSLS.492 was amplified and cloned upstream and downstream of the gyrA-containing ncRNA by Gibson Assembly. To construct the plasmids containing the msd array, firstly, the msr was deleted from pAGD159 and subsequently cloned between the second T7 promoter and Eco1-RT using a Gibson Assembly approach. Finally, the msd harboring the rpoB donor from pSLS.492 was amplified and cloned upstream and downstream of the gyrA-containing ncRNA by Gibson Assembly. To test if the msd array could act as a single transcript unit independent of the msr region, a T7 terminator was cloned between the msd array and the second T7 promoter.
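The oligo layout described above (25 nt annealing to the plasmid plus a 35 nt tail carrying half of the new 70 nt donor) can be illustrated with a minimal Python sketch. The function names, the toy sequences, and the assumption that all inputs are given as top-strand sequence are hypothetical and are not taken from the plasmids in Table 3.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def design_insertion_oligos(upstream: str, new_donor: str, downstream: str,
                            anneal_len: int = 25):
    """Design forward and reverse oligos for inserting a donor by PCR and self-ligation.

    upstream   -- top-strand plasmid sequence immediately 5' of the insertion site
    new_donor  -- top-strand sequence of the donor to insert (even length)
    downstream -- top-strand plasmid sequence immediately 3' of the insertion site
    """
    if len(new_donor) % 2:
        raise ValueError("donor length must be even to split it into two halves")
    if len(upstream) < anneal_len or len(downstream) < anneal_len:
        raise ValueError("flanking sequence is shorter than the annealing length")
    half = len(new_donor) // 2
    first_half, second_half = new_donor[:half], new_donor[half:]
    # Forward oligo: second donor half as a 5' tail, then the annealing region.
    forward = second_half + downstream[:anneal_len]
    # Reverse oligo: bottom strand, first donor half as a 5' tail.
    reverse = reverse_complement(upstream[-anneal_len:] + first_half)
    return forward, reverse


# Toy sequences (hypothetical, for illustration only).
upstream = "ACGT" * 10
donor = "G" * 35 + "C" * 35
downstream = "TTGCA" * 8
fwd, rev = design_insertion_oligos(upstream, donor, downstream)
print(len(fwd), len(rev))
```

Ligating the two ends of the linear PCR product joins the two 35 nt tails, which reconstitutes the full 70 nt donor at the insertion site.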
To construct multitrons containing more than 2 arrayed msds, a one-pot Golden Gate cloning approach was used. Firstly, a sfGFP stuffer flanked by two inverted BsaI (type IIS restriction enzyme) target sites was cloned in place of the msd array, generating plasmid pAGD236 (see Figure 4C for reference). Editing units, each based on an msd with a donor, were ordered as gBlocks (IDT) flanked by inverted BsaI target sites and compatible nucleotide overhangs to clone them in tandem. The Golden Gate protocol was carried out in 20 µL reactions as follows: 1 µL pAGD236, 5 µL of each gBlock (3 µL for 5x msd arrays), 1.5 µL BsaI (NEB), 2 µL T4 DNA ligase buffer, 0.5 µL T4 DNA ligase (NEB). Depending on the complexity of the reaction (the number of editing units), the reaction consisted of 30 or 60 cycles of 5 min at 16°C and 5 min at 37°C, followed by a final cycle of 10 min at 60°C.
To optimize multitrons for metabolic engineering, the retron cassette (ncRNA and RT) from pSLS.492 was cloned into pORTMAGE-Ec1 (33) upstream of the CspRecT gene (Figure S2). RBS optimization of the Eco1 RT and CspRecT genes was carried out using primers that contain the optimized RBS and self-ligating the plasmids using KLD reaction mix. Finally, the recombineering operon was cloned into the pKD-46 (35) backbone to obtain the parental temperature-sensitive multitron plasmid (pAGD248). The multitron msd array architecture with the sfGFP stuffer flanked by two inverted BsaI sites, described previously, was cloned into pAGD248, generating pAGD335. The above-mentioned Golden Gate reaction was used to clone gBlocks containing the required donors into the pAGD335 backbone to generate the multitron versions used in Fig 4.
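The reaction volumes and cycling times stated above can be checked with a short Python sketch. The component volumes are those given in the text. Filling any remaining volume with water, and using 5 µL per gBlock for every array size other than 5x, are assumptions made for illustration, and the function names are hypothetical.

```python
def golden_gate_mix(n_gblocks: int, total_ul: float = 20.0) -> dict:
    """Return component volumes (in microliters) for one Golden Gate reaction."""
    if n_gblocks < 1:
        raise ValueError("at least one gBlock is required")
    # The text states 3 uL per gBlock only for 5x msd arrays; 5 uL otherwise.
    per_gblock = 3.0 if n_gblocks == 5 else 5.0
    mix = {
        "pAGD236 backbone": 1.0,
        "gBlocks (total)": per_gblock * n_gblocks,
        "BsaI": 1.5,
        "T4 DNA ligase buffer": 2.0,
        "T4 DNA ligase": 0.5,
    }
    water = total_ul - sum(mix.values())
    if water < 0:
        raise ValueError("components exceed the total reaction volume")
    mix["water (assumed)"] = water  # assumption: remainder made up with water
    return mix


def cycling_minutes(n_cycles: int) -> int:
    """Total thermocycler time: n cycles of 5 min + 5 min, then one 10 min step."""
    return n_cycles * (5 + 5) + 10


for n in (2, 3, 5):
    print(n, golden_gate_mix(n))
print(cycling_minutes(30), cycling_minutes(60))
```

Under these assumptions the 3x and 5x reactions fill the 20 µL volume exactly, which is consistent with the reduced per-gBlock volume stated for 5x arrays.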
S. cerevisiae
To assess whether Csy4 could enable the processing of editrons and retron msd/Cas9 gRNA units for genome editing, pSCL.390, a derivative of pZS.157 (Addgene #114454), was generated with a yeast codon-optimized P2A-Csy4 CDS gblock (IDT) cloned downstream of the SpCas9 CDS by Gibson Assembly.
To compare the genome editing efficiencies of ribozyme-processed editrons to Csy4-processed editrons, pSCL.396, a derivative of pSCL.39 (Addgene #184973), was generated with the 5’ Hammerhead ribozyme and 3’ HDV ribozyme replaced by Csy4 recognition sites by amplification of the editron and backbone from pSCL.39 and assembly via Gibson Assembly.
To assess whether Csy4 could enable the processing of arrayed editrons, we generated pSCL.391, a derivative of pSCL.39 where a second editron, targeting the S. cerevisiae FAA1 locus, was added on the 3’ end of the ADE2-targeting editron by Gibson Assembly. The cassette thus consists of two editrons, separated by a Csy4 recognition site, and flanked by a Hammerhead ribozyme and an HDV ribozyme on the 5’ and 3’ of the expression cassette, respectively.
To construct plasmids for the expression of retron msd arrays, first, a Golden Gate compatible entry vector, pSCL.452 was generated that carries the Gal7 promoter and terminator, alongside a cassette for expression of the retron msr from a Pol III SNR52 promoter. pSCL.452 is a derivative of a derivative of pSCL.39, generated by Gibson Assembly of the pSCL.39 backbone, amplified to replace the recombitron with inverted PaqCI sites for Golden Gate assembly, with a gblock (IDT) encoding pSNR52p-msr-SUP4t.
Next, plasmids carrying retron msd arrays for the editing of multiple loci in the yeast genome were generated by Golden Gate cloning of pre-PaqCI-digested pSCL.452 with gBlocks (IDT) that encoded a PaqCI cut site, a retron msd-encoded donor and paired gRNA for editing, a Csy4 recognition sequence, and a PaqCI cut site (Fig 6E). gBlocks were ordered with compatible nucleotide overhangs to enable random cloning of all combinations of gBlocks into the entry plasmid after PaqCI digestion. We ordered gBlocks to edit the ADE2, FAA1, TRP2, SGS1 and CAN1 loci. These were cloned into the PaqCI-digested pSCL.452 backbone by Golden Gate cloning, yielding plasmids pSCL.473 (editors for ADE2 and FAA1), pSCL.475 (editors for ADE2, CAN1 and FAA1) and pSCL.672 (editors for ADE2, FAA1, TRP2, SGS1 and CAN1).
H. sapiens
All human vectors are derivatives of pSCL.273, itself a derivative of pCAGGS. pCAGGS was modified by replacing the MCS and rb_glob_polyA sequence with an IDT gblock containing inverted BbsI restriction sites and a SpCas9 tracrRNA, using Gibson Assembly. The resulting plasmid, pSCL.273, contains an SV40 ori for plasmid maintenance in HEK293T cells. The strong CAG promoter is followed by the BbsI sites and SpCas9 tracrRNA. BbsI-mediated digestion of pSCL.273 yields a backbone for single or library cloning of plasmids with inserts that contain {retron RT - H1 promoter - hCtRNAn_msdRNAn_gRNAn}, by Gibson Assembly or Golden Gate cloning (see Fig 6E for an illustration of this principle). The retron RT (or its catalytically dead counterpart) and H1 promoter fragments were synthesized through IDT, as were the hCtRNAn_msdRNAn_gRNAn units. Golden Gate cloning of these elements alongside 3 editor units (EMX1, FANCF, and HEK3) yielded plasmids pSCL.757 (CAGp-Eco3RT-TYpA // H1-msr-tRNA-Cys-GCA-EMX1_msd-gRNA); pSCL.758 (CAGp-Eco3RT-TYpA // H1-msr-tRNA-Cys-GCA-HEK3_msd-gRNA-tRNA-Cys-GCA-EMX1_msd-gRNA-tRNA-Cys-GCA-FANCF_msd-gRNA) and pSCL.760 (CAGp-dEco3RT-TYpA // H1-msr-tRNA-Cys-GCA-EMX1_msd-gRNA).
Strains and Growth Conditions
All bacterial and yeast strains are listed in Table 5.
Table 5
Bacterial Strains
Figure imgf000198_0001
The E. coli strains used in this study were DH5α (New England Biolabs) for cloning purposes and bMS.346 (DE3) for retron recombineering assays. Bacteria were grown in LB medium (10 g/L tryptone, 5 g/L yeast extract, 5 g/L NaCl). Antibiotics were added as required (carbenicillin, spectinomycin, kanamycin and chloramphenicol).
Yeast Strains
All yeast strains were created by LiAc/SS carrier DNA/PEG transformation (52) of BY4742 (26). Strains for evaluating the effect of Csy4 on genome editing efficiency were created by BY4742 integration of plasmids pZS.157 (Addgene #114454) or pSCL.390. The plasmids were KpnI-linearized and inserted into the genome by homologous recombination into the HIS3 locus. Transformants were isolated on SC-HIS plates.
Bacterial Recombineering expression and analysis
In experiments using multitrons to edit bacterial genomes, the retron cassette encoded in a pET-21 (+) plasmid (Novagen) and the CspRecT and mutLE32K in the plasmid pORTMAGE-Ec1 (33) were overexpressed using 1 mM IPTG, 1 mM m-toluic acid and 0.2% arabinose for 16 h with shaking at 37°C. For the molecular recording assay (Fig 1G), a control without m-Tol and a range of inducer concentrations, from 0.005 mM to 0.1 mM, were used. To engineer the lycopene metabolic pathway (Fig 4), bMS.346 electrocompetent cells containing the pAC-LYC (42) plasmid were transformed with different multitron plasmid versions (Tables 3 and 4) and grown for 16 h at 30°C. Single colonies from the transformation plate were inoculated into 500 µL of LB in triplicate in 1 mL deep-well plates and incubated at 30°C for 24 h with vigorous shaking to prevent the cells from settling. A 1:1000 dilution of the cultures was passaged into LB with 1% arabinose and incubated at 30°C for 24 h with vigorous shaking. This last step was repeated for a total of 72 h of editing.
After each of the assays carried out in this study, 25 µL of culture was collected, mixed with 25 µL of water and incubated at 95 °C for 10 min. A volume of 1 µL of this boiled culture was used as a template in 30-µL reactions with primers flanking the edit site, which additionally contained adapters for Illumina sequencing preparation (Table 6). These amplicons were indexed and sequenced on an Illumina MiSeq instrument and processed with custom Python software to quantify the percentage of precisely edited genomes.
Table 6
Figure imgf000200_0001
Figure imgf000201_0001
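The custom Python software used to quantify precisely edited genomes from amplicon reads is not reproduced in this document. The following is only a minimal sketch of one way such a quantification could work, assuming that each read is classified by an exact match to a short sequence spanning the edit site, and that the percentage is taken over reads matching either the edited or the wild-type sequence. The function names and toy sequences are hypothetical.

```python
from collections import Counter


def classify_read(read: str, wild_type_site: str, edited_site: str) -> str:
    """Label one amplicon read by exact match to the edited or wild-type site."""
    read = read.upper()
    if edited_site in read:
        return "edited"
    if wild_type_site in read:
        return "wild_type"
    return "other"


def percent_precise_edits(reads, wild_type_site: str, edited_site: str):
    """Return (percent of informative reads carrying the edit, class counts)."""
    counts = Counter(classify_read(r, wild_type_site, edited_site) for r in reads)
    # Assumption: reads matching neither sequence are excluded from the denominator.
    informative = counts["edited"] + counts["wild_type"]
    if informative == 0:
        raise ValueError("no reads matched either site sequence")
    return 100.0 * counts["edited"] / informative, counts


# Toy example (hypothetical sequences, not from Table 6).
wt_site = "GACCAGAACAAC"
edit_site = "GACCAGTACAAC"
reads = ["TT" + edit_site + "GG"] * 3 + ["TT" + wt_site + "GG"] + ["TTTTTTTT"]
pct, counts = percent_precise_edits(reads, wt_site, edit_site)
print(pct, dict(counts))
```

Reporting the count of reads in the "other" class alongside the percentage makes it visible when sequencing errors or imprecise edits are a substantial fraction of the data.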
Yeast editing expression and analysis
The parental strains (-Csy4: HIS3::pZS.157; +Csy4: HIS3::pSCL.390) were transformed with variants of the editron expression cassettes by LiAc/SS carrier DNA/PEG transformation. Single colonies from the transformation plate were inoculated into 500 µL of SC-HIS-URA 2% raffinose in triplicate in 1 mL deep-well plates and incubated at 30°C for 24 h with vigorous shaking to prevent the cells from settling. Cultures were passaged into SC-HIS-URA 2% galactose and incubated at 30°C for 24 h with vigorous shaking. This was repeated once more for experiments meant to compare the genome editing efficiencies of ribozyme-processed editrons to Csy4-processed editrons, for a total of 48 h of editing; and four more times for experiments meant to assess whether arrays of retron msds could be used to edit multiple loci in the yeast genome, for a total of 120 h of editing. At each timepoint of galactose-induced editing, a 250 µL aliquot of the cultures was harvested, pelleted and washed with water, and prepped for deep sequencing of the loci of interest.
To compare the bulk editing rates across sites to rates of edits in individual colonies for the Csy4-processed editrons, after 48 h of editing, dilutions were plated on SC-HIS-URA plates. For each of 3 biological replicates, 10 colonies were grown overnight in SC-HIS-URA to saturation and subjected to genomic DNA extraction and targeted PCR of the ADE2 and FAA1 loci, as described below. Amplicons were sent for Sanger sequencing, and editing rates per biological replicate were calculated by assessing the Sanger reads for the 10 colonies per biological replicate for the expected precise edit.
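The per-replicate colony calculation described above can be sketched in Python as follows. The data layout (one dictionary per colony, mapping locus to whether the expected precise edit was observed) and the additional "all loci" rate for colonies edited at every site are assumptions made for illustration; the colony calls shown are toy values, not experimental results.

```python
def colony_editing_rates(colonies, loci=("ADE2", "FAA1")):
    """Fraction of colonies carrying the precise edit at each locus.

    colonies -- list of dicts mapping locus name -> True if the edit was seen
    """
    n = len(colonies)
    if n == 0:
        raise ValueError("no colonies were scored")
    rates = {locus: sum(c[locus] for c in colonies) / n for locus in loci}
    # Assumed extra summary: colonies edited at every listed locus.
    rates["all loci"] = sum(all(c[locus] for locus in loci) for c in colonies) / n
    return rates


# Toy replicate of 10 colonies (hypothetical calls).
replicate = (
    [{"ADE2": True, "FAA1": True}] * 8
    + [{"ADE2": True, "FAA1": False}]
    + [{"ADE2": False, "FAA1": False}]
)
print(colony_editing_rates(replicate))
```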
Samples were prepped for deep sequencing of the edited loci as described previously (29). Briefly, genomic DNA was extracted by (1) resuspending the cell pellets in 120 µL of lysis buffer (100 mM EDTA pH 8, 50 mM Tris-HCl pH 8, 2% SDS) and heating them to 95 °C for 15 min; (2) cooling the lysate on ice and adding 60 µL of protein precipitation buffer (7.5 M ammonium acetate), then inverting gently and placing samples at -20°C for 10 min; (3) centrifuging the samples at maximum speed for 2 min (or until a clear supernatant forms) and collecting the supernatant (~100 µL) in new 1.5 mL tubes; (4) precipitating the nucleic acids by adding an equal volume of ice-cold isopropanol to the samples, mixing thoroughly and incubating at -20°C for 10 min (or overnight for higher yield), followed by pelleting by centrifugation at maximum speed for 2 min; (5) washing the pellet twice with 200 µL of ice-cold 70% ethanol, followed by air-drying it; and (6) resuspending the pellet in 40 µL of water. A volume of 0.5 µL of gDNA was used as template in 20-µL PCR reactions with primers flanking the edit site in the target locus, which additionally contained adapters for Illumina sequencing preparation (Table 6). The primers do not bind to the retron msd donor sequence. These amplicons were indexed and sequenced on an Illumina MiSeq instrument and processed with custom Python software to quantify the percentage of precise edits using the retron-derived RT-DNA template.
Human Cell Culture
HEK293T cells, expressing SpCas9 from a piggyBac-integrated, TRE3G-driven, doxycycline-inducible (1 µg/mL) cassette (18), were seeded at 7 × 10^5 live cells/well in coated 6-well plates and grown in DMEM +GlutaMax supplement (Thermo Fisher #10566016) overnight. Lipofectamine 3000 transfection mixes were prepared in independent triplicates and cells were transfected with 5 µg of plasmid per well (3 wells per plasmid). Cells were passaged the next day and doxycycline was refreshed at passaging. Cells were grown for an additional 48 h, for a total of 72 h of editing. Three days after transfection, cells were collected for sequencing analysis. To prepare samples for sequencing, cell pellets were collected, and gDNA was extracted using a QIAamp DNA mini kit according to the manufacturer’s instructions. DNA was eluted in 150 µL of ultra-pure, nuclease-free water. A volume of 0.5 µL of gDNA was used as template in 20-µL PCR reactions with primers flanking the edit site in the target locus, which additionally contained adapters for Illumina sequencing preparation (Table 6). The primers do not bind to the retron msd donor sequence. These amplicons were indexed and sequenced on an Illumina MiSeq instrument and processed with custom Python software to quantify the percentage of precise edits using the retron-derived RT-DNA template.
Whole-Genome Sequencing and Alignment to Measure Off-Target Mutagenesis.
A total of 7 genomes were sequenced using a shotgun approach: the E. coli bMS.346 parental strain, 3 individual colonies after one recombineering round using a wild-type Eco1 RT and 3 individual colonies after one recombineering round using a dead Eco1 RT. Prior to sequencing, a 3 mL LB liquid culture of each isolate was grown for 16 h at 37°C. The gDNA was isolated using the Quick-DNA/RNA™ Miniprep Plus Kit (Zymo Research). Extracted gDNA was measured using the Qubit™ 1X dsDNA High Sensitivity (HS) assay (Thermo Scientific). gDNA was tagmented with Tn5 transposase using the following reaction (50 µL): 25 µL 2x TD Buffer (20 mM Tris-HCl pH 7.6, 10 mM MgCl2 and 20% dimethyl formamide), 2.5 µL Tn5 (in-house prepared) and 50 ng gDNA. The reaction was incubated for 1 h 30 min at 37 °C. The gDNA was cleaned up and eluted in 15 µL using the DNA Clean & Concentrator (Zymo Research). Tagmented gDNAs were indexed and sequenced on an Illumina MiSeq instrument. E. coli strain bMS.346 whole-genome variants were called against the E. coli K-12 substr. MG1655 genome (accession no. NC_000913) using Geneious Prime® 2023.2.1 software alignment tools. Variants appearing in the genomes of the wild-type and dead RT isolates were called against the bMS.346 parental strain.
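The variant calling described above was performed in Geneious Prime. Purely as an illustration of the comparison logic, the following Python sketch shows how variants already present in the parental strain could be subtracted from each isolate to leave only newly arising mutations. The `Variant` record, the function name, and all coordinates are hypothetical toy values, and the sketch assumes every isolate is called against the same reference coordinates.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    position: int  # 1-based coordinate on the reference genome
    ref: str       # reference allele
    alt: str       # alternate allele


def variants_absent_from_parent(isolate_calls, parental_calls):
    """Return the variants found in an isolate but not in the parental strain."""
    return sorted(set(isolate_calls) - set(parental_calls))


# Toy calls (hypothetical coordinates, not experimental data).
parental = [Variant(1200, "A", "G"), Variant(50321, "C", "T")]
isolates = {
    "wild-type RT, colony 1": parental + [Variant(88010, "G", "A")],
    "dead RT, colony 1": list(parental),
}
for name, calls in isolates.items():
    new = variants_absent_from_parent(calls, parental)
    print(name, len(new), new)
```

Comparing the number of new variants between wild-type RT and dead RT isolates is what separates mutations attributable to the active editor from background mutations.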
Colorimetric screen and assay for lycopene production.
After cycle 3 (72 h) of the metabolic engineering assay, cells from the bMS.346 populations edited with different multitrons were plated on LB-chloramphenicol agar plates and grown for 1 day at 30°C and 2 more days in darkness at room temperature to produce red colonies. For each multitron-edited population, plates containing around 10^3 colonies were screened by visual inspection for increased red colour intensity. A total of 84 colonies (12 isolates from each multitron version and 12 from the control) were selected for lycopene quantification. These isolated colonies were grown in 1 mL LB-chloramphenicol in 1 mL deep-well plates for 24 h at 37°C to cure the multitron plasmid. For lycopene extraction, 1 mL of cells was centrifuged at 16,000g for 30 s, the supernatant was removed, and the cell pellet was resuspended in 1 mL water. Cells were re-centrifuged at 16,000g for 30 s, the supernatant was removed, and the cells were resuspended in 200 µL acetone and incubated in the dark for 15 min at 55 °C with intermittent vortexing. The mixture was centrifuged at 16,000g for 1 min and the supernatant containing the lycopene was transferred to a 96-well white/clear-bottom plate. Absorbance at 470 nm of the extracted lycopene solution was measured using a spectrophotometer to determine the lycopene content. Lycopene yield of the colonies from each multitron version was calculated as the fold change in lycopene production relative to the control. Cells from different clusters of lycopene production were re-streaked on LB-chloramphenicol agar plates and grown for 24 h at 30°C and for another 48 h at room temperature. Between 3 and 8 colonies from each re-streak were selected to quantify lycopene production following the described protocol and for Sanger sequencing across the dxs/idi targets.
Table 2
Figure imgf000205_0001
Figure imgf000205_0002
Figure imgf000205_0003
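The normalization of lycopene production against the control, described in the colorimetric assay above, can be sketched in Python as a fold-change calculation on the 470 nm absorbance readings. Using the mean of the control isolates as the baseline is an assumption, since the text does not state how the control value was summarized, and the readings shown are toy values rather than measured data.

```python
from statistics import mean


def lycopene_fold_change(a470_samples, a470_controls):
    """Express each sample's A470 as a fold change over the mean control A470."""
    baseline = mean(a470_controls)  # assumption: controls summarized by their mean
    if baseline <= 0:
        raise ValueError("control absorbance must be positive")
    return [a / baseline for a in a470_samples]


# Toy readings (hypothetical, not experimental data).
controls = [0.25, 0.25]
edited_isolates = [0.5, 0.25, 0.75]
print(lycopene_fold_change(edited_isolates, controls))
```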
Assessment of plasmid stability
E. coli
The recombineering plasmid was transformed into E. coli strain bMS.346, followed by 5 days of growing and diluting in the presence or absence of arabinose. A dilution of the final culture was plated. Finally, the msd array of 10 individual colonies per replicate (n=3) was amplified and sequenced to assess the genetic stability of the multitron approach (see Fig S2F for reference).
S. cerevisiae
Three individual colonies of yeast carrying 2, 3 or 5 donor arrayed retron msdRNA-Cas9 gRNA expression cassettes were inoculated in SC-URA-HIS 2% raffinose media and passaged 5 times overnight in SC-URA-HIS with 2% galactose, for a total of 120 h of editing at 30°C. After 120 h of editing, dilutions were plated on SC-HIS-URA plates and 10 colonies for each biological replicate were subjected to plasmid extraction. Plasmids were sent for whole-plasmid sequencing and consensus reads were aligned to the reference plasmid.
Results
Multiplexed recombineering from multiple donors in a retron msd
The use of retrons in bacterial recombineering was originally developed for applications in molecular recording (25) and has more recently been optimized to install single targeted edits and interrogate biology (26-29). To do so, a retron ncRNA - which can be divided into two regions: an msr (multicopy single-stranded RNA) that is not reverse transcribed and an msd (multicopy single-stranded DNA) that is reverse transcribed - is modified to encode an editing donor within the msd region. This modified ncRNA is expressed in cells along with a retron reverse transcriptase (e.g., retron Eco1-RT) that reverse transcribes the retron msd to produce an editing donor (RT-Donor). An overexpressed single-stranded annealing protein (SSAP, e.g., CspRecT) and the host single-stranded binding protein (SSB) promote annealing of the RT-Donor to the lagging strand of a replicating chromosome to install the edited sequence (32-33).
We aimed to further modify retrons to create multitron editors, capable of multiplexed editing of a single genome from a consolidated retron element generating multiple RT-Donors per transcript. Recombineering via oligonucleotide donors is most efficient with donors between 70 and 90 bases long (3), which is also the ideal range for retron recombineering donors (28,31). Yet, retron RTs are capable of reverse transcribing much longer RT-Donors, even up to an entire gene length (26). Thus, we initially tested a multitron architecture that encodes multiple 70 bp donors end-to-end within a single msd loop (Fig 1A), using two tandem donors to make point mutations in both the rpoB and gyrA genes in E. coli. We tested two versions of this multitron with the donors in each of the possible orders in the msd as well as a control rpoB singleplex editor. Both tandem multitron variants edited both sites, and editing rates for rpoB were comparable in the singleplex versus multitron configurations (Fig 1B).
When comparing the two multitron versions, we noticed that the site edited by the first donor in the multitron tended to have a higher editing rate than the site edited by the second donor. The donor in position one is reverse transcribed first, so the editing difference could be due to a small effect of RT processivity, or due to a positional effect of the donors after reverse transcription. To distinguish between these possibilities, we compared the relative editing efficiencies at each site using the multitrons versus synthetic oligonucleotides of the same sequence as the tandem RT-Donors. Unlike RT-Donors produced by multitrons, oligonucleotide donors had similar relative editing rates across the sites independent of their donor position (Fig 1C), consistent with an effect of RT processivity.
We next tested three-donor multitrons in the tandem msd architecture, using a third donor targeting lacZ on the leading strand (less effective than targeting the lagging strand). All three sites were edited in each of the three permutations of donor order (Fig 1D), with the same positional bias for higher editing at the 5’ end of the RT-Donor (Fig 1E). Although the positional bias is a bug in our intended design, we wondered whether it could be exploited to create a range of editing efficiencies for analog molecular recording. Retrons have previously been used as analog molecular recorders capable of detecting the magnitude and duration of a specific input by accumulating precise mutations in the genome (25). These analog molecular recorders are, however, limited to operating in the linear range of the interaction between reporter and editing efficacy. We reasoned that using a tandem multitron could add robustness by expanding the dynamic range of a recording across multiple sites. We constructed another multitron encoding three lagging donors (gyrA, priB, rpoB) driven by an m-toluic acid (mTol)-inducible promoter. Here too, we found that the editing rates were inversely proportional to the order of donor reverse transcription at maximal induction (Fig 1F). As a result, the editing rates for each site saturate at different mTol concentrations when used as an analog recorder of mTol (Fig 1G), effectively increasing the dynamic range of the recorder.
Improved Multiplexed Editing Using Donors in Retron Arrays
To overcome the effect of donor position inside a single msd loop, we engineered a different version of the multitron architecture composed of an ncRNA array with multiple msr-msd regions in tandem, each one containing a distinct donor to edit a unique target site (Fig 2A). With this arrayed ncRNA multitron, the retron RT has different substrates available within a transcript to generate multiple RT-Donors independently, each at the same distance from an internal RT priming site. We tested the ability of this arrayed ncRNA multitron to edit rpoB and gyrA versus singleplex retron editors and found that the arrayed ncRNA multitron performed as well as or better than the singleplex versions (Fig 2A). However, this arrayed ncRNA created a new constraint. The length of the ncRNA donor unit is 229 bp and the arrayed design adds 109 bp of direct repeat for each additional editor due to msr duplication, both of which pose challenges for the synthesis and assembly of new multitron plasmids.
Therefore, we engineered a third multitron version composed of an msd array rather than an ncRNA array. In this case, each msd encodes a distinct donor as in the previous version, but the msr is expressed in trans as a separate transcript (Fig 2B). This trans msr arrangement was previously shown to be a tolerated modification for reverse transcription of endogenous retron msds (34). In practice, this reduces the editing unit to 149 bp and reduces the length of the longest direct repeat to 74 bases. The trans msr can interact with any of the arrayed msds, again keeping the donor at a constant distance from the site of RT priming (Fig 2C).
We tested editing by the arrayed msd multitron versus singleplex editors and found no difference in editing rates at either site (Fig 2B). The trans msr arrangement in fact yielded consistently higher editing rates than the endogenous retron ncRNA architecture in both singleplex and multiplexed forms throughout this project. Although the msd array and msr/RT transcript contain no terminator between them and could potentially be transcribed as a single unit rather than the intended trans arrangement, we found both sites could be edited at a similar efficiency when using a plasmid containing a terminator between the msd array and the msr (Fig S1).
To test whether donor position inside the msd array multitron affects editing, we constructed three multitron variants with donors to edit priB, rpoB and gyrA genes in each possible order. All three sites were edited by each multitron variant (Fig 2D), and there was no effect of donor position using arrayed msds (Fig 2E). Finally, to push the limits of within-genome multiplexing, we constructed an arrayed msd multitron to simultaneously edit 5 target sites (hda, fbaH, priB, rpoB and gyrA). Editing rates ranged from 5 to 25% for each site, illustrating that arrayed msd multitrons are a potent tool for multiplexed genome editing technologies (Fig 2F).
Increasing Limits of Deletion Size Using Nested Multitrons
One benefit of using retron-derived donors is that they support a broad range of precise mutations, including insertions, deletions and replacements. However, when recombineering with either retron RT-Donor or oligonucleotide donors, the efficiency of inserting and deleting base pairs is inversely related to the size of the edit (3,31). This is presumably intrinsic to the mechanism of recombineering, a result that we replicated here using RT-Donors to delete 1 to 100 bp, finding a declining efficiency with deletion size whether using an endogenous ncRNA architecture or the trans msr architecture (Fig 3A).
We wondered whether we could overcome this limitation on deletion efficiency at larger sizes by using arrayed msd multitrons encoding a series of nested deletion donors. A nested deletion series consists of multiple donors intended to make deletions of progressively increasing size at the same locus. If the smallest deletion succeeds, it creates a smaller target size for a previously disfavored large deletion. We explored nested deletions by first comparing the editing efficiency of single 25 and 50 bp deletions in the lacZ gene with simultaneous deletions of overlapping 25 and 50 bp at the same location using a multitron (Fig 3B). The 50 bp deletion was not significantly less efficient than the 25 bp deletion using singleplex retron donors so, unsurprisingly, the rate of 50 bp deletions by the multitron version was not significantly increased. However, the rate of the 25 bp deletion was decreased by the multitron, suggesting that 25 bp deletions were being converted into 50 bp deletions.
Next, we tested a multitron containing a 25, 50, and 100 bp nested deletion donor series (Fig 3C). In this case, the previously disfavored 100 bp deletion was significantly more efficient using the multitron series than using the singleplex deletion donor. In fact, this strategy created a 100 bp deletion in ~42% of genomes, overcoming an intrinsic inefficiency in recombineering deletions. Furthermore, the multitrons generated a heterogeneous population of genetic elements with different deletion sizes that could be used to probe functional domains of a target gene or miniature versions of a protein of interest.
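The nested deletion series described above can be sketched in code. The following is a minimal illustration only, not the donor design used in this work: it assumes that every deletion in the series is centered on the same coordinate, that each donor consists of two 35-base homology arms directly flanking the deleted interval (70 bases in total), and it ignores strand choice and reverse complementation. The function names and the coordinate convention are hypothetical.

```python
def deletion_donor(ref, center, del_len, arm=35):
    """Return a donor made of the two homology arms that flank a deletion.

    The deleted interval is ref[start:end], positioned around `center`.
    Arm length and centering are assumptions made for this sketch.
    """
    start = center - del_len // 2
    end = start + del_len
    if start - arm < 0 or end + arm > len(ref):
        raise ValueError("deletion plus homology arms must fit inside ref")
    return ref[start - arm:start] + ref[end:end + arm]


def nested_deletion_donors(ref, center, sizes=(25, 50, 100), arm=35):
    """Build one donor per deletion size, all sharing the same center."""
    return {size: deletion_donor(ref, center, size, arm) for size in sizes}


def apply_deletion(ref, center, del_len):
    """Return the sequence after removing del_len bases around center."""
    start = center - del_len // 2
    return ref[:start] + ref[start + del_len:]
```

Because each smaller deletion lies entirely inside the next larger one in this layout, the homology arms of the larger donors remain intact after a smaller deletion has been installed, which is the property the nested strategy relies on.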
Multiple Edits in an Individual Genome Using Optimized Multitrons
Up to this point, editing has been quantified by bulk sequencing of each individual locus, with the assumption that edits accumulate on genomes according to the product of the rates at each site. We next aimed to explicitly test that assumption. First, we designed a multitron editor producing three non-overlapping msd donors, each targeting a single gene, gyrA, in a genome window of 300 bp (Fig 4A). All 70 bp donors target the lagging strand. With this narrow editing window, we were able to analyze recombineering efficiencies for individual sites as well as combinatorial edits from an amplicon of the locus. Sequencing revealed editing rates of 8 to 25% across the sites, comparable to previous experiments (Fig 4A). From this individual site data, we calculated the expected frequency at which we should find the various double edits and the triple edit among genomes, based on the product of rates at each site (Fig 4B). We compared this to the real frequency of each double combination and the triple edit in our sequencing data and found that the expected and real rates were matched (Fig 4B). Here, the double edits were present in 1.4-7.1% of genomes and the triple edit was present in ~0.77% of genomes.
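The expected-frequency calculation can be written out as a short sketch. It assumes only what the text states, namely that edits at different sites occur independently, so the expected frequency of a combination is the product of the per-site rates. The function name is hypothetical, and the per-site rates in the usage note are illustrative placeholders, not the measured values.

```python
from itertools import combinations
from math import prod


def expected_combination_rates(site_rates, min_edits=2):
    """Expected frequency of each multi-site edit combination.

    site_rates maps a site name to its bulk editing rate (0 to 1).
    Independence between sites is the assumption being tested in the text.
    """
    sites = sorted(site_rates)
    expected = {}
    for k in range(min_edits, len(sites) + 1):
        for combo in combinations(sites, k):
            expected[combo] = prod(site_rates[s] for s in combo)
    return expected
```

For example, calling `expected_combination_rates({"site1": 0.25, "site2": 0.15, "site3": 0.08})` with these placeholder rates returns three double-edit entries and one triple-edit entry, which can then be compared with the observed combination frequencies from amplicon sequencing.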
To test the accumulation of multiple edits on individual genomes in a more practical scenario, we decided to isolate multiply edited clones using a single editing plasmid that can be easily removed after editing. To do this, we combined the five molecular elements required for multitron recombineering - msd array, msr, RT, RecT, and dominant negative mutL (to suppress mismatch repair for single base mutations) - onto a single plasmid with an RSF1010 origin of replication (Fig S2A). However, initial testing of this architecture yielded editing rates for the rpoB gene that were ~5x lower using the single plasmid than with the previous two-plasmid system (~5% and ~25%, respectively). To increase recombineering efficiency, we added an E. coli-optimized ribosome binding site (RBS) immediately upstream of only the RT gene or both the RT and the CspRecT genes, both of which increased editing rates but still fell short of the level achieved by the two-plasmid system (Fig S2A).
We next changed the origin of replication for the single plasmid system, opting for a temperature-sensitive origin (oriR101) so that the plasmid becomes curable after editing by moving from a permissive temperature (30°C) to a non-permissive temperature (37°C) (35,36). Interestingly, the editing rates using this single plasmid finally reached levels comparable to those of the previous two-plasmid system (Fig S2A). This improvement in editing was not due to an effect of temperature, as we found similar editing rates with a temperature-insensitive version at both 30°C and 37°C (Fig S2B). An alternative possibility that is consistent with the data is an effect of the different inducers used with the different plasmid backbones: m-toluic acid for the RSF1010-derived plasmid and arabinose for the oriR101-derived plasmid. We find that increasing concentrations of m-toluic acid have a negative effect on bacterial growth (Fig S2C). We do not exclude an additional effect of the plasmid copy number. Next, we optimized arabinose concentration (Fig S2D). Finally, we also studied the stability of the genetic system with retron arrays of different lengths using a 5-day protocol in the presence or absence of the inducer (Fig S2E, F). Sequencing of the whole retron array harboring 2, 3 or 5 msds with different donors revealed that in most cases more than 80% of the colonies preserve an intact retron array after 5 days, showing the robustness of the multitron technology.
With curable, single-plasmid parameters optimized, we next attempted to isolate clones that were simultaneously edited at distant regions of an individual genome (fbaH and hda). We found substantial editing of each target (~20%) and additionally found that the efficiency of editing could be increased to ~45% with an additional day of editing, demonstrating the continuous nature of this approach (Fig 4C). Following editing, we cured the temperature-sensitive editing plasmid from 96 individual colonies (48 after 24 hours and 48 after 48 hours) and sequenced the editing loci from each colony. The overall rates of editing at both sites and time points from the individual colonies closely matched the bulk sequencing data (Fig 4C). We also calculated the expected frequency of finding doubly edited colonies based on the product of the bulk rates at each site and found that this expected frequency (~4% after 24 hours and ~22% after 48 hours) was exactly reflected in the real colony sequencing (Fig 4D).
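As a worked check of the product rule, the rounded bulk rates quoted above can be multiplied directly. This assumes that both sites edit at roughly the same rate (~20% at 24 hours, ~45% at 48 hours) and that edits at the two sites are independent; the exact per-site values are not given in this passage, so the numbers are approximate.

```latex
\begin{align*}
P_{24\,\mathrm{h}}(\text{double}) &= p_{fbaH}\, p_{hda} \approx 0.20 \times 0.20 = 0.04 \\
P_{48\,\mathrm{h}}(\text{double}) &= p_{fbaH}\, p_{hda} \approx 0.45 \times 0.45 \approx 0.20
\end{align*}
```

Both values are close to the ~4% and ~22% reported; the small difference at 48 hours could plausibly reflect rounding of the per-site rates.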
We also investigated the background mutation rate of multitrons to evaluate the usefulness of the method when fidelity is required. Specifically, we measured the accumulation of local and global off-target mutations in the E. coli bMS.346 genome in the presence or absence of RT activity. First, we constructed a dead RT version of the multitron targeting the fbaH and hda genes, which effectively eliminated precise editing (Fig S3A). Local off-target mutations were quantified by analyzing the 70 bp homology window of fbaH and hda donors in the chromosome for unintentional mutations. We found no difference in mutation frequency in the donor window in the live versus dead RT condition (approximately 5 × 10⁻⁵ errors/base, consistent with Illumina sequencing error; Fig S3B). Global off-target mutations were measured by comparing whole-genome sequencing of colonies after recombineering against the parental strain. We found three mutations across the colonies in the live RT version (one of which appears to be a longer homologous recombination event between the plasmid araC and the genome araC) versus two mutations across the colonies in the dead RT version. The number of mutations is below what has been found previously with CspRecT alone (4 off-target mutations per genome) (33), so we conclude that the retron component is not adding substantively to off-target mutations.
Metabolic Engineering in Bacterial Genomes Using Multitrons
We next pushed toward a proof-of-concept use of multitrons in metabolic engineering by modifying bacterial genomes. First, we assessed the ability of re-optimized, temperature-sensitive arrayed msd multitrons to simultaneously edit five positions (hda, fbaH, priB, rpoB and gyrA). All sites were precisely edited after 24h and editing continued to increase over the next 24h following a passage, illustrating the continuous nature of the retron-derived editing (Fig 5A).
To test multitrons in the context of metabolic engineering, we chose to focus on increasing production of lycopene by modifying genes in its biosynthetic pathway (Fig 5B). We selected eight bacterial genes which have been shown to affect lycopene yield (3,37-39) (Fig 5B). Five of them (dxs, idi, ispA, ispC, rpoS) were subjected to modification of their RBS regions to enhance their similarity to the canonical Shine-Dalgarno sequence (TAAGGAGGT) (40). The other three genes (gmpA, gdhA, fdhF) were specifically targeted for inactivation by the introduction of premature stop codons within their open reading frames.
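The two kinds of edits described above (moving an RBS toward the canonical Shine-Dalgarno sequence, and introducing a premature stop codon) can be illustrated with a small sketch. This is not the design procedure used in this work: the function names are hypothetical, similarity is scored as a simple mismatch count against TAAGGAGGT, and TAA is chosen arbitrarily as the stop codon.

```python
CANONICAL_SD = "TAAGGAGGT"  # canonical Shine-Dalgarno sequence quoted in the text
STOP_CODON = "TAA"          # assumption: TAG or TGA would serve equally well


def sd_mismatches(window):
    """Count positions where a 9-nt RBS window differs from the canonical SD."""
    window = window.upper()
    if len(window) != len(CANONICAL_SD):
        raise ValueError("window must be the same length as the canonical SD")
    return sum(a != b for a, b in zip(window, CANONICAL_SD))


def insert_premature_stop(orf, codon_index, stop=STOP_CODON):
    """Replace an internal codon (0-based index) of an ORF with a stop codon."""
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    n_codons = len(orf) // 3
    if not 0 < codon_index < n_codons - 1:
        raise ValueError("choose an internal codon, not the start or native stop")
    i = 3 * codon_index
    return orf[:i] + stop + orf[i + 3:]
```

A donor for an RBS edit would aim to reduce `sd_mismatches` for the window upstream of the start codon, while an inactivation donor would encode the sequence returned by `insert_premature_stop` around the chosen codon.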
We established a general workflow for metabolic engineering using multitrons (Fig 5C; Material and Methods). The multitron plasmid (MP) was generated using a one-pot golden gate approach (41) to clone arrayed msds encoding different donors. The MP was next transformed into the bacterial host harboring the lycopene plasmid (LP), a plasmid containing three essential genes (crtE, crtI, crtB) required for lycopene production (42). Editing cycles were carried out at the permissive temperature (30°C), with dilutions of the culture after every cycle. Editing targets were sequenced in bulk using Illumina MiSeq to determine overall efficiencies. In parallel, cells were plated at 37°C to cure the MP. Finally, red colonies (indicative of lycopene) from the plates were selected for further quantification of lycopene production levels (Fig 5C). In total, we tested six different arrayed msd multitrons across this workflow, containing target gene donors in combinations that have been shown to increase lycopene yield (3). Editing rates were measured after cycles 1 and 3 of editing (24h and 72h, respectively), showing values that increase with time (Fig 5D). After 72h of editing, the precise editing rates when making one or two mutations ranged from 10 to 40%. When making three or five mutations, editing rates were lower, which could be due to the known negative fitness effect (3) of these mutations on bacterial growth (Fig 5D).
We measured relative lycopene production from 84 isolated red colonies after plating cultures on LB agar plates after editing (Fig 5E). In each case other than the control, individual colonies produced variable amounts of lycopene, likely resulting from the intended genotypic diversity generated by the editing. As an example, the most productive isolate after RBS optimization of dxs and idi genes increased lycopene production by more than 400% of control values; there was a second production cluster around 300% of control and a final cluster around 200% of control (Fig 5E). We reasoned that these three different clusters may represent a single dxs mutation, a single idi mutation, and both together. To test that hypothesis, a representative of each cluster was selected and re-streaked for colonies, which were re-measured for lycopene and Sanger sequenced. Indeed, the best-producing isolate carried RBS mutations of both dxs and idi genes, the second-best had only the dxs mutation, and the third-best had only the idi mutation (Fig 5F). This proof-of-concept was achieved with a single cloning reaction (one-pot Golden Gate) to generate a single plasmid and one course of editing, creating both single mutants and the double mutant. To generate this same result without multiplexing would require cloning two distinct editors, one for each of the sites, and running parallel editing, genotyping, and quantification on each single edit, then curing the plasmid from an edited clone, adding the opposite plasmid to make the other edit, running another editing course, and finally quantifying the double mutant. Thus, a multiplexed experiment generates a diversity of genotypes and corresponding phenotypes across multiple sites simultaneously.
Multitrons with CRISPR Editing in Eukaryotic Cells
Given the success of the arrayed msd multitron in recombineering, we next sought to expand the utility of this technology to eukaryotic cells. Retron RT-Donors have been used in S. cerevisiae in combination with CRISPR Cas9 and gRNAs to install precise mutations via templated repair of a cut site (Fig 6A). The architecture of the donor element in yeast is typically a retron ncRNA fused to a CRISPR gRNA and scaffold, all surrounded by ribozymes to excise the editing elements from an mRNA. Given the goal of engineering a eukaryotic msd array, the relatively large, structured ribozymes present a potential engineering hurdle if they need to be multiply duplicated. Therefore, we first tested replacement of the ribozymes with Csy4 recognition sites and Csy4 nuclease expression by comparing a singleplex retron-derived precise editor of the ADE2 locus in the standard arrangement using ribozymes against an alternate version in which the flanking ribozymes were replaced by Csy4 recognition sites. In both cases, we tested editing with or without the inclusion of a Csy4 gene in an integrated, inducible, genomic cassette that also expresses the retron RT and Cas9. We found, as expected, no effect of Csy4 expression on the ribozyme version of the precise editor, but a dramatic effect of Csy4 expression on the alternate version with Csy4 sites. Precise editing nearly matched the efficiency of the ribozyme version with Csy4 expression, but was sharply reduced in its absence, indicating that processing of the non-coding elements is required and can be achieved using Csy4 (Fig 6A). We next tested a eukaryotic multitron based on an array of ncRNA/gRNAs targeting ADE2 and FAA1 for precise mutations of three base pairs each. For each site, the ncRNA encoding the donor for the site was fused to the gRNA for the same site. The two sites were separated by a Csy4 recognition site and the double ncRNA/gRNA array was surrounded by ribozymes (Fig 6B).
Both sites were edited to nearly 100% in the presence of Csy4 expression. In the absence of Csy4, in contrast, the FAA1 site was edited to nearly 100%, while the ADE2 editing was sharply reduced. In our multitron, the ADE2 donor/gRNA was in the first position, suggesting that Csy4 processing is required on the 3’ end, adjacent to the gRNA scaffold, but dispensable on the 5’ end, adjacent to the msr.
Analogously to our bacterial editors, we verified that edits accumulate on genomes according to the product of the rates at each site. To this end, we compared bulk editing rates across the ADE2 and FAA1 sites to rates of edits in individual colonies. As in the bacterial experiments, colony sequencing matched bulk sequencing for both individual sites and for the expected frequency of double edits. We found that virtually all of the colonies sequenced contained the precise edits intended, consistent with the rates inferred from bulk Illumina amplicon sequencing (Fig 6C, D).
It is preferable to minimize the donor/gRNA unit for practical reasons of construction, just as in the prokaryotic version. Therefore, in a parallel to the prokaryotic msd array multitron, we engineered a eukaryotic msd/gRNA array multitron, transferring the msr to a distinct transcript to reduce editing unit size and avoid long direct repeats (Fig 6E). This enabled construction of multitrons of arbitrary size using efficient one-step golden gate cloning. The msd encoding the donor remains fused to its matched gRNA, while a trans msr is able to function as a primer to create the RT-Donor internally (Fig 6F). We tested versions of this eukaryotic arrayed msd/gRNA multitron to precisely edit two, three, or five non-adjacent sites simultaneously (Fig 6G-I). In each case, all targeted sites were edited at a rate that increased over time.
Finally, we sought to test whether the engineered eukaryotic msd/gRNA array multitron would enable precise genome editing in human cells. We adapted an approach for multiplexing pegRNA expression (14), described initially to enable multiplexed prime and base editing, to enable the expression and processing of multiple retron msds and a single retron msr in trans. This yielded expression cassettes analogous to those developed for yeast editing, with tRNAs driving the processing of the msd/gRNA cassettes. We found that these engineered cassettes enabled the simultaneous precise edits of three non-adjacent sites in the human genome, from a single plasmid, in cultured HEK293T human cells (Fig 6J). Taken together, our data show that the arrayed msd multitron with trans msr is a generalizable strategy for multiplexing edits within a genome.
Discussion
This work demonstrates the construction, optimization, and use of multitrons for multiplexed precise editing within genomes of prokaryotic and eukaryotic cells. Final versions make use of donor-encoding retron msd arrays. Critically, we engineered the msd array format by optimizing not only for editing efficiency, but also for enabling practical cellular and molecular workflows. The compact multitron form is compatible with single-plasmid designs, one-step golden gate assembly, and plasmid removal in prokaryotic cells. These features should permit widespread adoption of the multitron editing approach. A concurrent work has shown a similar approach, providing independent validation of the utility of multiplexed retrons for recombineering (43).
We demonstrate simultaneous editing of up to five sites, with replacements of up to 8 base pairs per site, and deletions of up to 100 bases. This approach builds on previous work using oligonucleotides for MAGE by enabling efficient multisite editing without repeated transformations and by enabling a user to specify distinct combinations of donors per cell rather than relying on the random segregation of electroporated oligos. Multitrons enable a wider range of precise mutations than multiplexed base editors, and a more compact and simplified form than multiplexed prime editors.
We found that the rate of combinatorial editing on a single genome was predicted by the product of rates at each individual site. As the number of editing sites increased, the rate at each site decreased. Thus, for 4+ edits, the rate of achieving all mutations on a single genome can drop well below 1:1,000. Whether this rate is high enough will depend on the application. For instance, if edited cells are to be subjected to a selective phenotyping assay, the fact that combinatorial mutants are present in the population, even at low rates, is sufficient to enable quantification of enrichment or depletion. If, however, one needs to isolate a clone without phenotypic selection, we would recommend limiting the edits per round of editing to <3 at this time. Further development of the technology or the addition of simultaneous counter-selection will help drive the practical number of edits up in the future.
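To make the screening burden concrete, the following sketch estimates how many colonies would need to be genotyped to have a given chance of recovering at least one fully edited clone. It is a planning aid under stated assumptions, not a result from this work: colonies are treated as independent draws, each fully edited with probability equal to the product of the per-site rates, and the function names and the 95% default are hypothetical.

```python
import math


def full_edit_rate(site_rates):
    """Probability that one genome carries every edit, assuming independence."""
    return math.prod(site_rates)


def colonies_to_screen(p_full, confidence=0.95):
    """Smallest n with P(at least one fully edited colony among n) >= confidence.

    Solves 1 - (1 - p_full)**n >= confidence for n.
    """
    if not 0 < p_full < 1:
        raise ValueError("p_full must be strictly between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_full))
```

At a full-edit rate of 1:1,000, for instance, `colonies_to_screen(0.001)` comes out at roughly three thousand colonies, which illustrates why isolation without phenotypic selection becomes impractical as the number of simultaneous edits grows.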
For contextualization to other technologies, one alternative is base editing, which can also be multiplexed. On the upside, base editors can reach efficiencies of over 80% (13) and can be multiplexed to more than 30 loci (14). However, it is important to note that only 2 of the 14 edits we made in bacteria and none of the edits made to yeast are suited to base editing. The deletions and RBS modifications that we made are a particularly salient example of a place where base editors fall short. MAGE is a more relevant comparison to the bacterial work and can achieve similar efficiencies, although with dramatically more hands-on time to complete the multiple electroporation cycles. However, MAGE cannot be used to make the edits that we show in yeast or human cells so, as a technology, we would argue that the multitrons are a more universal platform.
The existing yeast genome editing toolbox is vast and spans from simple HR-based editing to more nuanced, multiplexed approaches that have enabled both trackable, genome-wide phenotypic screens and targeted, saturation mutagenesis of individual ORFs (44-50). However, “trackable and multiplex” in this context has usually meant many changes across many genomes, with <1 change per genome, rather than >1 changes on an individual genome; and tools that do enable multiple changes per single genome typically do not support trackability of precise and varied edits or require involved and time-consuming workflows. In this sense, we believe that multitrons, in their ability to support multiple trackable and precise edits per individual genome, will naturally fit into the toolbox of yeast biologists in years to come.
We demonstrate proof-of-concept uses in molecular recording, genetic element minimization, and metabolic engineering.
Bibliography
1. Shalem, O., Sanjana, N. E. & Zhang, F. High-throughput functional genomics using CRISPR-Cas9. Nat Rev Genet 16, 299-311 (2015).
2. Doench, J. G. Am I ready for CRISPR? A user’s guide to genetic screens. Nat Rev Genet 19, 67-80 (2018).
3. Wang, H. H. et al. Programming cells by multiplex genome engineering and accelerated evolution. Nature 460, 894-898 (2009).
4. Lajoie, M. J. et al. Genomically recoded organisms expand biological functions. Science 342, 357-60 (2013).
5. Nyerges, A. et al. Directed evolution of multiple genomic loci allows the prediction of antibiotic resistance. Proc Natl Acad Sci USA 115, E5726-E5735 (2018).
6. Barbieri, E. M., Muir, P., Akhuetie-Oni, B. O., Yellman, C. M. & Isaacs, F. J. Precise Editing at DNA Replication Forks Enables Multiplex Genome Engineering in Eukaryotes. Cell 171, 1453-1467.e13 (2017).
7. Isaacs, F. J. et al. Precise manipulation of chromosomes in vivo enables genome-wide codon replacement. Science 333, 348-53 (2011).
8. Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420-4 (2016).
9. Gaudelli, N. M. et al. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551, 464-471 (2017).
10. Anzalone, A. V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149-157 (2019).
11. Tong, Y. et al. Highly efficient DSB-free base editing for streptomycetes with CRISPR-BEST. Proc Natl Acad Sci USA 116, 20366-20375 (2019).
12. Tong, Y., Jorgensen, T. S., Whitford, C. M., Weber, T. & Lee, S. Y. A versatile genetic engineering toolkit for E. coli based on CRISPR-prime editing. Nat Commun 12, 5206 (2021).
13. Volke, D. C., Martino, R. A., Kozaeva, E., Smania, A. M. & Nikel, P. I. Modular (de)construction of complex bacterial phenotypes by CRISPR/nCas9-assisted, multiplex cytidine base-editing. Nat Commun 13, 3026 (2022).
14. Yuan, Q. & Gao, X. Multiplex base- and prime-editing with drive-and-process CRISPR arrays. Nat Commun 13, 2771 (2022).
15. Aulicino, F. et al. Highly efficient CRISPR-mediated large DNA docking and multiplexed prime editing using a single baculovirus. Nucleic Acids Res 50, 7783-7799 (2022).
16. Li, H. et al. Multiplex precision gene editing by a surrogate prime editor in rice. Mol Plant 15, 1077-1080 (2022).
17. Gao, L. et al. Diverse enzymatic activities mediate antiviral immunity in prokaryotes. Science 369, 1077-1084 (2020).
18. Millman, A. et al. Bacterial Retrons Function In Anti-Phage Defense. Cell 183, 1551-1561.e12 (2020).
19. Mestre, M. R., Gonzalez-Delgado, A., Gutierrez-Rus, L. I., Martinez-Abarca, F. & Toro, N. Systematic prediction of genes functionally associated with bacterial retrons and classification of the encoded tripartite systems. Nucleic Acids Res 48, 12632-12647 (2020).
20. Bobonis, J. et al. Bacterial retrons encode phage-defending tripartite toxin-antitoxin systems. Nature 609, 144-150 (2022).
21. Yee, T., Furuichi, T., Inouye, S. & Inouye, M. Multicopy single-stranded DNA isolated from a gram-negative bacterium, Myxococcus xanthus. Cell 38, 203-9 (1984).
22. Inouye, S., Hsu, M. Y., Eagle, S. & Inouye, M. Reverse transcriptase associated with the biosynthesis of the branched RNA-linked msDNA in Myxococcus xanthus. Cell 56, 709-17 (1989).
23. Lampson, B. C., Inouye, M. & Inouye, S. Reverse transcriptase with concomitant ribonuclease H activity in the cell-free synthesis of branched RNA-linked msDNA of Myxococcus xanthus. Cell 56, 701-7 (1989).
24. Simon, A. J., Ellington, A. D. & Finkelstein, I. J. Retrons and their applications in genome engineering. Nucleic Acids Res 47, 11007-11019 (2019).
25. Farzadfard, F. & Lu, T. K. Genomically encoded analog memory with precise in vivo DNA writing in living cell populations. Science 346, 1256272 (2014).
26. Sharon, E. et al. Functional Genetic Variants Revealed by Massively Parallel Precise Genome Editing. Cell 175, 544-557.e16 (2018).
27. Simon, A. J., Morrow, B. R. & Ellington, A. D. Retroelement-Based Genome Editing and Evolution. ACS Synth Biol 7, 2600-2611 (2018).
28. Schubert, M. G. et al. High-throughput functional variant screens via in vivo production of single-stranded DNA. Proc Natl Acad Sci USA 118, (2021).
29. Lopez, S. C., Crawford, K. D., Lear, S. K., Bhattarai-Kline, S. & Shipman, S. L. Precise genome editing across kingdoms of life using retron-derived DNA. Nat Chem Biol 18, 199-206 (2022).
30. Jiang, W. et al. High-efficiency retron-mediated single-stranded DNA production in plants. Synth Biol (Oxf) 7, ysac025 (2022).
31. Fishman, C. B. et al. Continuous Multiplexed Phage Genome Editing Using Recombitrons. bioRxiv (2023).
32. Mosberg, J. A., Lajoie, M. J. & Church, G. M. Lambda red recombineering in Escherichia coli occurs through a fully single-stranded intermediate. Genetics 186, 791-9 (2010).
33. Wannier, T. M. et al. Improved bacterial recombineering by parallelized protein discovery. Proc Natl Acad Sci USA 117, 13689-13698 (2020).
34. Palka, C., Fishman, C. B., Bhattarai-Kline, S., Myers, S. A. & Shipman, S. L. Retron reverse transcriptase termination and phage defense are dependent on host RNase H1. Nucleic Acids Res 50, 3490-3504 (2022).
35. Datsenko, K. A. & Wanner, B. L. One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc Natl Acad Sci USA 97, 6640-5 (2000).
36. Ellington, A. J. & Reisch, C. R. Efficient and iterative retron-mediated in vivo recombineering in Escherichia coli. Synth Biol (Oxf) 7, ysac007 (2022).
37. Alper, H., Jin, Y. S., Moxley, J. F. & Stephanopoulos, G. Identifying gene targets for the metabolic engineering of lycopene biosynthesis in Escherichia coli. Metab Eng 7, 155-64 (2005).
38. Kang, M. J. et al. Identification of genes affecting lycopene accumulation in Escherichia coli using a shot-gun method. Biotechnol Bioeng 91, 636-42 (2005).
39. Jin, Y. S. & Stephanopoulos, G. Multi-dimensional gene target search for improving lycopene biosynthesis in Escherichia coli. Metab Eng 9, 337-47 (2007).
40. Chen, H., Bjerknes, M., Kumar, R. & Jay, E. Determination of the optimal aligned spacing between the Shine-Dalgarno sequence and the translation initiation codon of Escherichia coli mRNAs. Nucleic Acids Res 22, 4953-7 (1994).
41. Engler, C., Kandzia, R. & Marillonnet, S. A one pot, one step, precision cloning method with high throughput capability. PLoS One 3, e3647 (2008).
42. Cunningham, F. X. J., Sun, Z., Chamovitz, D., Hirschberg, J. & Gantt, E. Molecular structure and enzymatic function of lycopene cyclase from the cyanobacterium Synechococcus sp strain PCC7942. Plant Cell 6, 1107-21 (1994).
43. Liu, W. et al. Retron-mediated multiplex genome editing and continuous evolution in Escherichia coli. Nucleic Acids Res 51, 8293-8307, doi: 10.1093/nar/gkad607 (2023).
44. DiCarlo, J. et al. Genome engineering in Saccharomyces cerevisiae using CRISPR-Cas systems. Nucleic Acids Res, 41 (7), 4336-4343 (2013).
45. DiCarlo, J. et al. Yeast Oligo-Mediated Genome Engineering (YOGE). ACS Synthetic Biology, 2 (12), 741-749 (2013).
46. Roy, K. et al. Multiplexed precision genome editing with trackable genomic barcodes in yeast. Nat Biotechnol 36, 512-520 (2018).
47. Guo, X., et al. High-throughput creation and functional profiling of DNA sequence variant libraries using CRISPR-Cas9 in yeast. Nat Biotechnol 36, 540-546 (2018).
48.Swiat, M. et al. /v/Cpfl : a novel and efficient genome editing tool for Saccharomyces cerevisiae, Nucleic Acids Res, 45 (21), 12585-12598 (2017).
49. Ferreira, R. et al. Multiplexed CRISPR/Cas9 Genome Editing and Gene Regulation Using Csy4 in Saccharomyces cerevisiae. ACS Synthetic Biology, 7 (1), 10-15 (2018). 50. Liang, Z. et al. Advanced eMAGE for highly efficient combinatorial editing of a stable genome. bioRxiv 2020.08.30.256743.
51. Niwa, H., Yamamura, K. & Miyazaki, J. Efficient selection for high-expression transfectants with a novel eukaryotic vector. Gene 108, 193-199 (1991).
52. Gietz, R. D. & Schiestl, R. H. High-efficiency yeast transformation using the LiAc/SS carrier DNA/PEG method. Nat Protoc 2, 31-4 (2007).
The embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. Other embodiments may be utilized and formulation and method of using changes may be made without departing from the scope of the invention. The detailed description is not to be taken in a limiting sense, and the scope of the invention is defined only by the appended claims, along with the full scope of equivalents to which such claims are entitled.
It will be appreciated by those skilled in the art that changes could be made to the embodiments described above without departing from the broad inventive concept thereof. It is understood, therefore, that this invention is not limited to the particular embodiments disclosed, but it is intended to cover modifications within the spirit and scope of the present invention as defined by the present description.
Under no circumstances may the patent be interpreted to be limited to the specific examples or embodiments or methods specifically disclosed herein. Under no circumstances may the patent be interpreted to be limited by any statement made by any Examiner or any other official or employee of the Patent and Trademark Office unless such statement is specifically and without qualification or reservation expressly adopted in a responsive writing by Applicants.
The terms and expressions that have been employed are used as terms of description and not of limitation, and there is no intent in the use of such terms and expressions to exclude any equivalent of the features shown and described or portions thereof, but it is recognized that various modifications are possible within the scope of the invention as claimed. Thus, it will be understood that although the present invention has been specifically disclosed by preferred embodiments and optional features, modification and variation of the concepts herein disclosed may be resorted to by those skilled in the art, and that such modifications and variations are considered to be within the scope of this invention as defined by the appended claims and statements of the invention. The invention has been described broadly and generically herein. Each of the narrower species and subgeneric groupings falling within the generic disclosure also form part of the invention. This includes the generic description of the invention with a proviso or negative limitation removing any subject matter from the genus, regardless of whether or not the excised material is specifically recited herein. In addition, where features or aspects of the invention are described in terms of Markush groups, those skilled in the art will recognize that the invention is also thereby described in terms of any individual member or subgroup of members of the Markush group.
All publications, patents, and patent applications, Genbank sequences, websites and other published materials referred to throughout the disclosure herein are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application, Genbank sequences, websites and other published materials was specifically and individually indicated to be incorporated by reference. In the event that the definition of a term incorporated by reference conflicts with a term defined herein, this specification shall control.

Claims

WHAT IS CLAIMED IS:
1. A multiplex engineered retron comprising: a) at least one msr gene encoding multicopy single-stranded RNA (msRNA); b) at least one msd gene encoding multicopy single-stranded DNA (msDNA); c) two or more heterologous sequences of interest; and d) a ret gene encoding a reverse transcriptase.
2. The engineered retron of claim 1, wherein the retron comprises two to five or more different heterologous sequences of interest.
3. The engineered retron of claim 1 or 2, wherein the retron comprises at least two msr genes and at least two msd genes.
4. The engineered retron of claim 1 or 2, wherein the retron comprises one msr gene and at least two msd genes.
5. The engineered retron of any of claims 1 to 4, wherein the heterologous sequences are inserted into the msr gene and/or the msd gene.
6. The engineered retron of claim 3 or 4, wherein each of the at least two msd genes independently comprise at least one heterologous sequence.
7. The engineered retron of any one of claims 1 to 6, wherein single-stranded DNA (msDNA) encoded by the msd gene comprises a msd stem loop, and where the loop comprises the heterologous sequence(s) of interest.
8. The engineered retron of any one of claims 1 to 7, wherein each heterologous sequence independently encodes a donor polynucleotide comprising a 5' homology arm that hybridizes to a 5' target sequence and a 3' homology arm that hybridizes to a 3' target sequence flanking a donor nucleotide sequence comprising an intended edit to be integrated at a target locus by homology directed repair (HDR) or recombineering.
9. The engineered retron of claim 8, wherein the edit is one or more gene replacements, gene knockouts, deletions, nested deletions, insertions, inversions, or point mutations.
10. The engineered retron of any one of claims 1 to 9, further comprising a modification which results in enhanced production of msDNA formed from the retron.
11. The engineered retron of any one of claims 1 to 10, wherein the heterologous sequence comprises a CRISPR protospacer DNA sequence.
12. The engineered retron of claim 11, wherein the CRISPR protospacer DNA sequence comprises a modified AAG protospacer adjacent motif (PAM).
13. The engineered retron of any one of claims 1 to 12, further comprising a barcode sequence.
14. The engineered retron of claim 13, wherein the barcode sequence is located in a hairpin loop of the msDNA.
15. The engineered retron of any one of claims 1 to 14, wherein the msr gene and the msd gene are provided in a trans arrangement or a cis arrangement.
16. The engineered retron of any one of claims 1 to 15, wherein the ret gene is provided in a trans arrangement with respect to the msr gene and/or the msd gene.
17. The engineered retron of any one of claims 1 to 16, wherein the msr gene, msd gene, and ret gene are a modified bacterial retron msr gene, msd gene, and ret gene.
18. The engineered retron of any one of claims 1 to 17, wherein the msr gene, msd gene, and ret gene are independently a modified myxobacteria retron, a modified Escherichia coli retron, a modified Salmonella enterica retron, or a modified Vibrio cholerae retron.
19. The engineered retron of claim 18, wherein the modified Escherichia coli retron is a modified EC83 or a modified EC86.
20. A vector system comprising one or more vectors comprising the engineered retron of any one of claims 1-19.
21. The vector system of claim 20, wherein the msr gene and the msd gene are provided by the same vector or different vectors.
22. The vector system of claim 20 or 21, wherein the msr gene, the msd gene, and the ret gene are provided by the same vector.
23. The vector system of any one of claims 20 to 22, wherein the same vector comprises a promoter operably linked to the msr gene and the msd gene.
24. The vector system of claim 23, wherein the promoter is further operably linked to the ret gene.
25. The vector system of claim 23, further comprising a second promoter operably linked to the ret gene.
26. The vector system of claim 20, wherein the msr gene, the msd gene, and the ret gene are provided by different vectors.
27. The vector system of any one of claims 20 to 26, wherein the one or more vectors are viral vectors or nonviral vectors.
28. The vector system of claim 27, wherein the nonviral vectors are plasmids.
29. The vector system of any one of claims 20 to 28, wherein the engineered retron comprises two or more heterologous sequences, wherein each heterologous sequence independently encodes a donor polynucleotide comprising a 5' homology arm that hybridizes to a 5' target sequence and a 3' homology arm that hybridizes to a 3' target sequence flanking a nucleotide sequence comprising an intended edit to be integrated at a target locus by homology directed repair (HDR) or recombineering.
30. The vector system of any one of claims 20 to 29, further comprising a vector encoding an RNA-guided nuclease.
31. The vector system of claim 30, wherein the RNA-guided nuclease is a Cas nuclease or an engineered RNA-guided FokI-nuclease.
32. The vector system of claim 31, wherein the Cas nuclease is Cas9 or Cpf1.
33. The vector system of any one of claims 20 to 32, wherein the engineered retron comprises a CRISPR protospacer DNA sequence.
34. The vector system of any one of claims 20 to 33, further comprising a vector encoding a Cas1 and/or Cas2 protein.
35. The vector system of claim 34, further comprising a vector comprising a CRISPR array sequence.
36. The vector system of any one of claims 20 to 35, further comprising a vector encoding bacteriophage homologous recombination proteins.
37. The vector system of claim 36, wherein the vector encoding the bacteriophage homologous recombination proteins is a replication defective λ prophage comprising the exo, bet, and gam genes.
38. An isolated host cell comprising the engineered retron of any one of claims 1 to 19 or the vector system of any one of claims 20 to 37.
39. The host cell of claim 38, wherein the host cell is a prokaryotic, archaeon, or eukaryotic host cell.
40. The host cell of claim 39, wherein the eukaryotic host cell is a mammalian host cell.
41. The host cell of claim 40, wherein the mammalian host cell is a human host cell.
42. The host cell of any one of claims 38 to 41, wherein the host cell endogenously expresses or has been modified to express one or more single strand annealing proteins (SSAPs), one or more single stranded DNA binding proteins (SSBs), one or more mutant mismatch repair proteins or combination thereof.
43. A kit comprising the engineered retron of any one of claims 1 to 19, the vector system of any one of claims 20 to 37, or the host cell of any one of claims 38 to 42.
44. The kit of claim 43, further comprising instructions for genetically modifying a cell with the engineered retron.
45. A multiplex method of genetically modifying a cell comprising: a) transfecting a cell with the engineered retron of any one of claims 8 to 19; b) introducing an RNA-guided nuclease and guide RNA into the cell, wherein the RNA-guided nuclease forms a complex with the guide RNA, said guide RNAs directing the complex to the genomic target locus, wherein the RNA-guided nuclease creates a double-stranded break in the genomic DNA at the genomic target locus, and the donor polynucleotide generated by the engineered retron is integrated at the genomic target locus recognized by its 5' homology arm and 3' homology arm by homology directed repair (HDR) to produce a genetically modified cell.
46. The method of claim 45, wherein the RNA-guided nuclease is a Cas nuclease or an engineered RNA-guided FokI-nuclease.
47. The method of claim 46, wherein the Cas nuclease is Cas9 or Cpf1.
48. The method of any one of claims 41 to 47, wherein the RNA-guided nuclease is provided by a vector or a recombinant polynucleotide integrated into the genome of the cell.
49. The method of any one of claims 45 to 48, wherein the engineered retron is provided by a vector.
50. The method of any one of claims 45 to 49, wherein the donor polynucleotide is used to create two or more independent gene replacements, gene knockouts, deletions, nested deletions, insertions, inversions, or point mutations.
51. A multiplex method of genetically modifying a cell by recombineering, the method comprising: a) transfecting the cell with the engineered retron of any one of claims 8 to 19; and b) introducing bacteriophage recombination proteins into the cell, wherein the bacteriophage recombination proteins mediate homologous recombination at a target locus such that the donor polynucleotide generated by the engineered retron is integrated at the target locus recognized by its 5' homology arm and 3' homology arm to produce a genetically modified cell.
52. The method of claim 51, wherein the donor polynucleotide is used to modify a plasmid, bacterial artificial chromosome (BAC), or a bacterial chromosome in the bacterial cell by recombineering.
53. The method of claim 51 or 52, wherein each donor polynucleotide can create a gene replacement, gene knockout, deletion, nested deletion, insertion, inversion, or point mutation.
54. The method of any one of claims 51 to 53, wherein said introducing bacteriophage recombination proteins into the cell comprises insertion of a replication-defective λ prophage into the bacterial genome.
55. The method of any one of claims 51 to 54, wherein the bacteriophage comprises exo, bet, and gam genes.
56. A method of barcoding a cell comprising transfecting a cell with the engineered retron of any one of claims 13 to 19.
57. A multiplex method of producing an in vivo molecular recording system comprising: a) introducing a Cas1 protein or a Cas2 protein of a CRISPR adaptation system into a host cell; b) introducing a CRISPR array nucleic acid sequence comprising a leader sequence and at least one repeat sequence into the host cell, wherein the CRISPR array nucleic acid sequence is integrated into genomic DNA or a vector in the host cell; and c) introducing a plurality of engineered retrons according to any one of claims 1 to 19 into the host cell, wherein each retron comprises a different protospacer DNA sequence that can be processed and inserted into the CRISPR array nucleic acid sequence.
58. The method of claim 57, wherein the Cas1 protein or the Cas2 protein are provided by a vector.
59. The method of claim 57 or 58, wherein the engineered retron is provided by a vector.
60. The method of any one of claims 57 to 59, wherein the plurality of engineered retrons comprises at least three different protospacer DNA sequences.
61. An engineered cell comprising an in vivo molecular recording system comprising: a) a Cas1 protein or a Cas2 protein of a CRISPR adaptation system; b) a CRISPR array nucleic acid sequence comprising a leader sequence and at least one repeat sequence into the host cell, wherein the CRISPR array nucleic acid sequence is integrated into genomic DNA or a vector in the engineered cell; and c) a plurality of engineered retrons according to any one of claims 1 to 19, wherein each retron comprises a different protospacer DNA sequence that can be processed and inserted into the CRISPR array nucleic acid sequence.
62. The engineered cell of claim 61, wherein the Cas1 protein or the Cas2 protein are provided by a vector.
63. The engineered cell of claim 61 or 62, wherein the engineered retron is provided by a vector.
64. The engineered cell of any one of claims 61 to 63, wherein the plurality of engineered retrons comprises at least three different protospacer DNA sequences.
65. A kit comprising the engineered cell of any one of claims 61 to 64 and instructions for in vivo molecular recording.
66. A multiplex method of producing recombinant msDNA comprising: a) transfecting a host cell with the engineered retron of any one of claims 1 to 19 or the vector system of any one of claims 20 to 37; and b) culturing the host cell under suitable conditions, wherein the msDNA is produced.
67. An engineered retron ncRNA comprising: a) at least one msr gene encoding multicopy single-stranded RNA (msRNA); b) at least one msd gene encoding multicopy single-stranded DNA (msDNA); c) at least one guide RNA; and d) two or more repair templates.
68. The engineered retron of claim 67, wherein the retron comprises two to five different repair templates.
69. The engineered retron of claim 67 or 68, wherein the retron comprises at least two msr genes and at least two msd genes.
70. The engineered retron of claim 67 or 68, wherein the retron comprises one msr gene and at least two msd genes.
71. The engineered retron of any one of claims 67 to 70, wherein the repair templates are inserted into the msr gene and/or the msd gene.
72. The engineered retron of any one of claims 69 or 71, wherein each of the at least two msd genes comprise at least one repair template.
73. The engineered retron of any one of claims 67 to 72, wherein the at least one guide RNA is fused to the end of each msd gene.
74. The engineered retron of any one of claims 67 to 73, wherein each of the msd genes is separated by a Csy4 site.
75. The engineered retron of any one of claims 67 to 74, wherein single-stranded DNA (msDNA) encoded by the msd gene comprises a msd stem loop, and where the loop comprises the repair template.
76. The multiplex engineered retron of any one of claims 67 to 75, wherein the retron comprises: a) one msr gene; b) two to five msd genes; c) at least one guide RNA fused to the end of each msd gene; d) at least one repair template in each msd gene; and e) at least one Csy4 site.
77. The engineered retron of claim 76, wherein the msd genes are separated by the Csy4 site.
78. The engineered retron of any one of claims 67 to 77, wherein the guide RNA binds to a target genomic DNA.
79. The engineered retron of any one of claims 67 to 78, wherein the guide RNA binds to a target genomic DNA in a bacterial, yeast, or mammalian cell.
80. The engineered retron of claim 79, wherein the mammalian cell is a human cell.
81. The engineered retron of any one of claims 67 to 80, wherein the repair template binds to a target genomic DNA.
82. The engineered retron of any one of claims 67 to 81, wherein the repair template binds to a target genomic DNA in a bacterial, yeast, or mammalian cell.
83. The engineered retron of any one of claims 67 to 82, wherein the repair template binds to a target genomic DNA having at least one allele with a mutation or polymorphism.
84. The engineered retron of any one of claims 67 to 83, wherein the repair template comprises one or more non-complementary nucleotides to the repair template's target genomic DNA.
85. The engineered retron of any one of claims 67 to 84, wherein the repair template comprises two or more, or three or more non-complementary nucleotides to the repair template's target genomic DNA.
86. The engineered retron of claim 84 or 85, wherein the non-complementary nucleotides are ‘repair’ nucleotides that can substitute for mutant, variant, or polymorphism nucleotides in the target genomic DNA.
87. A composition comprising a carrier and the engineered retron of any one of claims 67 to 86.
88. A multiplex method comprising administering the engineered retron of any one of claims 67 to 86, or the composition of claim 87 to a subject or to cell(s) from the subject.
89. The method of claim 88, wherein the subject has, or is suspected of having or developing a disease or condition.
90. The method of claim 89, wherein the disease or condition is cystic fibrosis, thalassemia, sickle cell anemia, Huntington's disease, diabetes, Duchenne's Muscular Dystrophy, Tay-Sachs Disease, Marfan syndrome, Alzheimer’s disease, Leber's hereditary optic atrophy (LHON), myoclonic epilepsy with ragged red fibers (MERRF), mitochondrial encephalopathy, lactic acidosis, and stroke-like episodes (MELAS; a type of dementia), obesity, cancers, brain ischemia, coronary disease, myocardial infarction, reperfusion hindrance of ischemic diseases, atopic dermatitis, psoriasis vulgaris, contact dermatitis, keloid, decubital ulcer, ulcerative colitis, Crohn's disease, nephropathy, glomerulosclerosis, albuminuria, nephritis, renal failure, rheumatoid arthritis, osteoarthritis, asthma, chronic obstructive pulmonary disease (COPD), and combinations thereof.
91. An expression cassette comprising a nucleotide sequence encoding the engineered retron of any one of claims 67 to 86, and optionally a nucleotide sequence encoding a retron reverse transcriptase.
92. The expression cassette of claim 91, wherein the nucleotide sequence encoding the engineered retron further comprises at least one promoter.
93. The expression cassette of claim 92, wherein the at least one promoter is a RNA polymerase III (pol III) promoter.
94. The expression cassette of claim 93, wherein the pol III promoter is a constitutive promoter.
95. The expression cassette of claim 93 or 94, wherein the pol III promoter is selected from SNR52, 7SK, U6, or H1.
96. The expression cassette of any one of claims 93 to 95, wherein the msr gene is expressed from the pol III promoter.
97. The expression cassette of any one of claims 92 to 96, wherein the at least one promoter is a RNA polymerase II (pol II) promoter.
98. The expression cassette of claim 97, wherein the pol II promoter is an inducible promoter.
99. The expression cassette of claim 97 or 98, wherein the msd gene is expressed from the pol II promoter.
100. A vector comprising the expression cassette of any one of claims 91 to 99.
101. A composition comprising a carrier and the expression cassette of one of claims 91 to 99 or the vector of claim 100.
102. A multiplex method comprising administering the expression cassette of any one of claims 91 to 99 or the vector of claim 100, or the composition of claim 101 to a subject or to cell(s) from the subject.
103. The method of claim 102, wherein the subject has, or is suspected of having or developing a disease or condition.
104. The method of claim 103, wherein the disease or condition is cystic fibrosis, thalassemia, sickle cell anemia, Huntington's disease, diabetes, Duchenne's Muscular Dystrophy, Tay-Sachs Disease, Marfan syndrome, Alzheimer’s disease, Leber's hereditary optic atrophy (LHON), myoclonic epilepsy with ragged red fibers (MERRF), mitochondrial encephalopathy, lactic acidosis, and stroke-like episodes (MELAS; a type of dementia), obesity, cancers, brain ischemia, coronary disease, myocardial infarction, reperfusion hindrance of ischemic diseases, atopic dermatitis, psoriasis vulgaris, contact dermatitis, keloid, decubital ulcer, ulcerative colitis, Crohn's disease, nephropathy, glomerulosclerosis, albuminuria, nephritis, renal failure, rheumatoid arthritis, osteoarthritis, asthma, chronic obstructive pulmonary disease (COPD), and combinations thereof.
105. A multiplex gene editing system comprising: one or more vectors comprising one or more nucleotide sequences encoding the engineered retron of any one of claims 67 to 83, a retron reverse transcriptase, and a Cas nuclease.
106. The gene editing system of claim 105, wherein the retron reverse transcriptase and Cas nuclease are encoded as a fusion protein.
107. The gene editing system of claim 105 or 106, wherein the one or more vectors comprise one or more promoters.
108. The gene editing system of any one of claims 105 to 107, wherein the guide RNA of the retron binds to a target genomic DNA.
109. The gene editing system of any one of claims 105 to 108, wherein the guide RNA of the retron binds to a target genomic DNA in a bacterial, yeast, or mammalian cell.
110. The gene editing system of any one of claims 105 to 109, wherein the guide RNA of the retron binds to a target genomic DNA in a mammalian cell.
111. The gene editing system of claim 110, wherein the mammalian cell is a human cell.
112. The gene editing system of any one of claims 105 to 111, wherein the repair template of the retron binds to a target genomic DNA.
113. The gene editing system of any one of claims 105 to 112, wherein the repair template of the retron binds to a target genomic DNA in a bacterial, yeast, or mammalian cell.
114. The gene editing system of any one of claims 105 to 113, wherein the repair template of the retron binds to a target genomic DNA having at least one allele with a mutation or polymorphism.
115. The gene editing system of any one of claims 105 to 114, wherein the repair template of the retron comprises one or more non-complementary nucleotides to the repair template's target genomic DNA.
116. The gene editing system of any one of claims 105 to 115, wherein the repair template of the retron comprises two or more, or three or more non-complementary nucleotides to the repair template's target genomic DNA.
117. The gene editing system of claim 115 or 116, wherein the non-complementary nucleotides are ‘repair’ nucleotides that can substitute for mutant, variant, or polymorphism nucleotides in the target genomic DNA.
118. The gene editing system of any one of claims 107 to 117, wherein the one or more promoters is a RNA polymerase III (pol III) promoter.
119. The gene editing system of claim 118, wherein the pol III promoter is a constitutive promoter.
120. The gene editing system of claim 118 or 119, wherein the pol III promoter is selected from SNR52, 7SK, U6, or H1.
121. The gene editing system of any one of claims 118 to 120, wherein the msr gene is expressed from the pol III promoter.
122. The gene editing system of any one of claims 107 to 121, wherein the one or more promoters is a RNA polymerase II (pol II) promoter.
123. The gene editing system of claim 122, wherein the pol II promoter is an inducible promoter.
124. The gene editing system of claim 122 or 123, wherein the msd gene is expressed from the pol II promoter.
125. The gene editing system of any one of claims 105 to 124, comprising a first vector encoding the retron and a second vector encoding the retron reverse transcriptase and Cas nuclease.
126. The gene editing system of any one of claims 105 to 125, wherein the Cas nuclease is a Cas9 or Cpf1.
127. The gene editing system of any one of claims 105 to 126, wherein the Cas nuclease is SpCas9.
128. A composition comprising a carrier and the gene editing system of any one of claims 105 to 127.
129. A multiplex method comprising administering the gene editing system of any one of claims 105 to 127, or the composition of claim 128 to a subject or to cell(s) from the subject.
130. The method of claim 129, wherein the subject has, or is suspected of having or developing a disease or condition.
131. The method of claim 130, wherein the disease or condition is cystic fibrosis, thalassemia, sickle cell anemia, Huntington's disease, diabetes, Duchenne's Muscular Dystrophy, Tay-Sachs Disease, Marfan syndrome, Alzheimer’s disease, Leber's hereditary optic atrophy (LHON), myoclonic epilepsy with ragged red fibers (MERRF), mitochondrial encephalopathy, lactic acidosis, and stroke-like episodes (MELAS; a type of dementia), obesity, cancers, brain ischemia, coronary disease, myocardial infarction, reperfusion hindrance of ischemic diseases, atopic dermatitis, psoriasis vulgaris, contact dermatitis, keloid, decubital ulcer, ulcerative colitis, Crohn's disease, nephropathy, glomerulosclerosis, albuminuria, nephritis, renal failure, rheumatoid arthritis, osteoarthritis, asthma, chronic obstructive pulmonary disease (COPD), and combinations thereof.
132. A multiplex method of genetically editing one or more target sites in one or more cells, comprising:
(a) transfecting a population of cells with the expression cassette of any one of claims 91 to 99, or the gene editing system of any one of claims 105 to 127 to generate a population of transfected cells; and
(b) selecting one or more cells from the population of transfected cells as genetically edited cells.
133. The method of claim 132, wherein selecting one or more cells comprises generating colonies from individual transfected cells to provide isogenic individual colonies and selecting one or more precisely edited cells from at least one isogenic colony.
134. The method of claim 133, further comprising sequencing one or more genomic target sites in cells from one or more isogenic individual colonies to confirm that the genomic target sites in at least one of the isogenic individual colonies are precisely edited, thereby generating precisely edited cells.
135. The method of claim 133 or 134, further comprising administering a population of the precisely edited cells to a subject.
136. The method of claim 135, wherein the subject has, or is suspected of having or developing a disease or condition.
137. The method of claim 136, wherein the disease or condition is cystic fibrosis, thalassemia, sickle cell anemia, Huntington's disease, diabetes, Duchenne's Muscular Dystrophy, Tay-Sachs Disease, Marfan syndrome, Alzheimer’s disease, Leber's hereditary optic atrophy (LHON), myoclonic epilepsy with ragged red fibers (MERRF), mitochondrial encephalopathy, lactic acidosis, and stroke-like episodes (MELAS; a type of dementia), obesity, cancers, brain ischemia, coronary disease, myocardial infarction, reperfusion hindrance of ischemic diseases, atopic dermatitis, psoriasis vulgaris, contact dermatitis, keloid, decubital ulcer, ulcerative colitis, Crohn's disease, nephropathy, glomerulosclerosis, albuminuria, nephritis, renal failure, rheumatoid arthritis, osteoarthritis, asthma, chronic obstructive pulmonary disease (COPD), and combinations thereof.
PCT/US2024/036205 2023-06-30 2024-06-28 Multiplexed retron genome editing in prokaryotic and eukaryotic genomes Pending WO2025007020A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363524317P 2023-06-30 2023-06-30
US63/524,317 2023-06-30

Publications (1)

Publication Number Publication Date
WO2025007020A1 (en) 2025-01-02

Family

ID=93939943

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/036205 Pending WO2025007020A1 (en) 2023-06-30 2024-06-28 Multiplexed retron genome editing in prokaryotic and eukaryotic genomes

Country Status (1)

Country Link
WO (1) WO2025007020A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120843522A (en) * 2025-09-23 2025-10-28 济宁医学院 A precise editing method for universal virulent phage genomes

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023081756A1 (en) * 2021-11-03 2023-05-11 The J. David Gladstone Institutes, A Testamentary Trust Established Under The Will Of J. David Gladstone Precise genome editing using retrons
WO2023225358A1 (en) * 2022-05-20 2023-11-23 The Board Of Trustees Of The Leland Stanford Junior University Generation and tracking of cells with precise edits
WO2024044673A1 (en) * 2022-08-24 2024-02-29 The J. David Gladstone Institutes, A Testamentary Trust Established Under The Will Of J. David Gladstone Dual cut retron editors for genomic insertions and deletions

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CHEN SHI-AN A., KERN ALEXANDER F., ANG ROY MOH LIK, XIE YIHUA, FRASER HUNTER B.: "Gene-by-environment interactions are pervasive among natural genetic variants", CELL GENOMICS, vol. 3, no. 4, 1 April 2023 (2023-04-01), pages 100273, XP093255953, ISSN: 2666-979X, DOI: 10.1016/j.xgen.2023.100273 *
GONZÁLEZ-DELGADO ALEJANDRO, LOPEZ SANTIAGO C., ROJAS-MONTERO MATÍAS, FISHMAN CHLOE B., SHIPMAN SETH L.: "Simultaneous multi-site editing of individual genomes using retron arrays", BIORXIV, 17 July 2023 (2023-07-17), XP093255956, Retrieved from the Internet <URL:https://pmc.ncbi.nlm.nih.gov/articles/PMC10370050/pdf/nihpp-2023.07.17.549397v1.pdf> DOI: 10.1101/2023.07.17.549397 *
LIM H ET AL.: "Multiplex Generation, Tracking, and Functional Screening of Substitution Mutants using a CRISPR/Retron System", ACS SYNTHETIC BIOLOGY, vol. 9, 2020, pages 1003 - 1009, XP093025284, DOI: 10.1021/acssynbio.0c00002 *
LIU WENQIAN, ZUO SIQI, SHAO YOURAN, BI KE, ZHAO JIARUN, HUANG LEI, XU ZHINAN, LIAN JIAZHANG: "Retron-mediated multiplex genome editing and continuous evolution in Escherichia coli", NUCLEIC ACIDS RESEARCH, OXFORD UNIVERSITY PRESS, GB, vol. 51, no. 15, 25 August 2023 (2023-08-25), pages 8293 - 8307, XP093255959, ISSN: 0305-1048, DOI: 10.1093/nar/gkad607 *

Similar Documents

Publication Publication Date Title
AU2022203759B2 (en) Delivery, engineering and optimization of systems, methods and compositions for sequence manipulation and therapeutic applications
RU2725502C2 (en) Delivery, construction and optimization of systems, methods and compositions for targeted action and modeling of diseases and disorders of postmitotic cells
EP3011030B1 (en) Optimized crispr-cas double nickase systems, methods and compositions for sequence manipulation
RU2716421C2 (en) Delivery, use and use in therapy of crispr-cas systems and compositions for targeted action on disorders and diseases using viral components
EP3011035B1 (en) Assay for quantitative evaluation of target site cleavage by one or more crispr-cas guide sequences
US20250043269A1 (en) Precise Genome Editing Using Retrons
EP2931899A1 (en) Functional genomics using crispr-cas systems, compositions, methods, knock out libraries and applications thereof
WO2015089419A9 (en) Delivery, use and therapeutic applications of the crispr-cas systems and compositions for targeting disorders and diseases using particle delivery components
CN106062197A (en) Delivery, engineering and optimization of tandem guide systems, methods and compositions for sequence manipulation
WO2023141602A2 (en) Engineered retrons and methods of use
US11866728B2 (en) Engineered retrons and methods of use
WO2025007020A1 (en) Multiplexed retron genome editing in prokaryotic and eukaryotic genomes
HK1252619B (en) Delivery, engineering and optimization of systems, methods and compositions for sequence manipulation and therapeutic applications
BR122024006902A2 (en) APPLICATION, MANIPULATION AND OPTIMIZATION OF SYSTEMS, METHODS AND COMPOSITIONS FOR SEQUENCE MANIPULATION AND THERAPEUTIC APPLICATIONS
HK1223645B (en) Optimized crispr-cas double nickase systems, methods and compositions for sequence manipulation

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 24833077; Country of ref document: EP; Kind code of ref document: A1)