WO2024044767A2 - Recruitment of donor dna from in vivo assembled plasmids for saturation genome editing - Google Patents
- Publication number
- WO2024044767A2 (PCT/US2023/072942)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequence
- binding domain
- nucleic acid
- donor
- dna
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/102—Mutagenizing nucleic acids
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61K—PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
- A61K31/00—Medicinal preparations containing organic active ingredients
- A61K31/70—Carbohydrates; Sugars; Derivatives thereof
- A61K31/7088—Compounds having three or more nucleosides or nucleotides
- A61K31/711—Natural deoxyribonucleic acids, i.e. containing only 2'-deoxyriboses attached to adenine, guanine, cytosine or thymine and having 3'-5' phosphodiester links
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/87—Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
- C12N15/90—Stable introduction of foreign DNA into chromosome
-
- C—CHEMISTRY; METALLURGY
- C40—COMBINATORIAL TECHNOLOGY
- C40B—COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
- C40B40/00—Libraries per se, e.g. arrays, mixtures
- C40B40/04—Libraries containing only organic compounds
- C40B40/06—Libraries containing nucleotides or polynucleotides, or derivatives thereof
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/87—Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
- C12N15/90—Stable introduction of foreign DNA into chromosome
- C12N15/902—Stable introduction of foreign DNA into chromosome using homologous recombination
- C12N15/907—Stable introduction of foreign DNA into chromosome using homologous recombination in mammalian cells
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2310/00—Structure or type of the nucleic acid
- C12N2310/10—Type of nucleic acid
- C12N2310/20—Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]
Definitions
- Genome engineering is a key technology for biotechnology, agriculture, and medicine, enabling the development of microbial strains and crops with desirable properties and the generation of cell-based models and therapies for studying and treating disease.
- CRISPR nucleases have revolutionized genome engineering by allowing the efficient generation of single-strand breaks (nicks) or double-strand breaks (DSBs) at nearly any desired location in the genome. These DNA breaks can be harnessed to promote sequence alterations at the target site by either relying on error-prone machinery to “break” the gene or by coaxing the cellular machinery to install defined edits through template-driven processes or direct base modifications.
- DSB-based editing by homology-directed repair exhibits high efficiency in many actively dividing cell types and in various organisms important for biotechnology and basic research, such as S. cerevisiae.
- DSB-based editing exhibits both a high overall editing efficiency and the ability to introduce virtually any sequence change of arbitrarily small or large size.
- DSB-HDR CRISPR approaches suffer from low editing survival, particularly in cases where mismatch tolerance enables a guide to re-cleave the genomic target sequence after editing, and also suffer from undesired structural variant (SV) generation at particular genomic regions.
- the present disclosure provides methods and systems for improving the editing efficiency at target sites within a host cell genome.
- BRIEF SUMMARY Provided herein are methods and compositions for site-specific editing of DNA at a target site in the genome of a host cell. The methods and compositions provide advantages over previous methods by increasing the editing efficiency, fidelity and/or survival of cells comprising site-specific (targeted) genetic edits.
- the disclosure provides a method for increasing site-specific genomic editing efficiency, fidelity, and/or survival of a cell.
- the method comprises introducing into a cell: i) a first linear double stranded polynucleotide comprising a double stranded DNA (dsDNA) donor sequence and a nucleic acid sequence encoding a guide RNA (gRNA) operably linked to a first promoter; and ii) a second linear double stranded polynucleotide; wherein the second linear double stranded polynucleotide is linked to the first linear double stranded polynucleotide by homology directed repair (HDR) or non-homologous end joining (NHEJ) to produce a circular donor-gRNA plasmid inside of the cell; wherein the cell comprises a CRISPR-associated (Cas) endonuclease or a nucleic acid sequence encoding the Cas endonuclea
- HDR homology directed repair
- the first linear double stranded polynucleotide further comprises a DNA binding domain recognition sequence.
- the DNA binding domain recognition sequence binds to a LexA DNA binding domain or a forkhead homolog 1 (FKH1) DNA binding domain.
- the first linear double stranded polynucleotide further comprises nucleotide sequences complementary to sequences adjacent to the dsDNA break site generated by the Cas endonuclease.
- the first promoter is constitutive. In some embodiments, the first promoter is inducible.
- the Cas endonuclease is a Cas9 or Cas12a endonuclease, or a modified variant thereof.
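The in vivo assembly step described above, in which two linear fragments are joined into one circular donor-gRNA plasmid, can be pictured with a toy string model. The sketch below covers only the homology-based (HDR) route and assumes each fragment ends in a sequence that matches the start of the other; the function name, the example sequences, and the `min_overlap` value are hypothetical and not taken from the disclosure. NHEJ-based joining, which needs no shared sequence, is not modeled.

```python
from typing import Optional


def assemble_circular(frag1: str, frag2: str, min_overlap: int = 20) -> Optional[str]:
    """Toy model of homology-based assembly of two linear fragments.

    Returns the circular product written as a linear string that starts at
    the first base of frag1, or None if either junction lacks homology.
    """

    def overlap(left: str, right: str) -> int:
        # Length of the longest suffix of `left` that equals a prefix of `right`.
        for k in range(min(len(left), len(right)), min_overlap - 1, -1):
            if left[-k:] == right[:k]:
                return k
        return 0

    k1 = overlap(frag1, frag2)  # junction: end of frag1 / start of frag2
    k2 = overlap(frag2, frag1)  # junction: end of frag2 / start of frag1
    if k1 == 0 or k2 == 0:
        return None
    joined = frag1 + frag2[k1:]          # merge the first junction
    return joined[: len(joined) - k2]    # close the circle at the second junction


# Hypothetical 10-nt homology arms and short payloads.
H1, H2 = "ACGTACGTAC", "TTGGCCAATT"
donor_fragment = H2 + "AAAAA" + H1      # stands in for the donor + gRNA fragment
backbone_fragment = H1 + "CCCCC" + H2   # stands in for the second fragment
plasmid = assemble_circular(donor_fragment, backbone_fragment, min_overlap=10)
print(plasmid)
```

Each shared arm appears once in the product, which is why the circular sequence is shorter than the two fragments combined.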
- the method for increasing site-specific genomic editing efficiency, fidelity, and/or survival of a cell further comprises introducing into the cell: iii) a nucleic acid sequence encoding a retron structured non-coding RNA (ncRNA) operably linked to a second promoter, the retron ncRNA comprising: a. a stabilizing 5′ ribozyme sequence; b. one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences; c. an msr sequence; d.
- ncRNA retron structured non-coding RNA
- the RNA binding domain recognition sequence is a MS2 stem loop sequence.
- the MS2 stem loop sequence binds to a MS2 coat protein (MCP) RNA binding domain.
- MCP MS2 coat protein
- the single stranded nucleic acid binding domain recognition sequence binds to a Cas endonuclease binding domain.
- the second promoter is constitutive. In some embodiments, the second promoter is inducible.
- the method for increasing site-specific genomic editing efficiency, fidelity, and/or survival of a cell further comprises introducing into the cell iv) a fusion protein comprising a) a DNA binding domain, an RNA binding domain and/or a single stranded nucleic acid binding domain and b) a dsDNA break site-localizing domain, or a nucleic acid encoding the fusion protein, wherein the fusion protein binds to the DNA binding domain recognition sequence of the circular donor-gRNA plasmid or the first linear double stranded polynucleotide prior to assembly, and/or the one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences of the retron, and binds to a dsDNA break site generated by the Cas endonuclease, thereby recruiting the dsDNA donor sequences and/or the ssDNA retron donor sequences to the dsDNA break in the genome
- the fusion protein comprises a dsDNA break site-localizing domain, wherein the dsDNA break site-localizing domain is a forkhead-associated (FHA) domain.
- the fusion protein comprises an RNA binding domain, wherein the RNA binding domain is a MCP RNA binding domain.
- the fusion protein comprises the MCP RNA binding domain and the FHA domain.
- the fusion protein comprises a DNA binding domain, wherein the DNA binding domain is a LexA DNA binding domain or an FKH1 DNA binding domain.
- the fusion protein comprises: (i) the LexA DNA binding domain or the FKH1 DNA binding domain, (ii) the MCP RNA binding domain, and (iii) the FHA domain in one of the following orders: (i), (ii), (iii); (i), (iii), (ii); (ii), (i), (iii); (ii), (iii), (i); (iii), (i), (ii); and (iii), (ii), (i).
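Three domains can be placed in six linear orders. As a quick check on that count, the sketch below enumerates them with the standard library; the dictionary labels reuse the (i)/(ii)/(iii) numbering from the text, and the descriptions are shorthand rather than claim language.

```python
from itertools import permutations

# Keys follow the (i)/(ii)/(iii) numbering used in the text.
DOMAINS = {
    "(i)": "LexA or FKH1 DNA binding domain",
    "(ii)": "MCP RNA binding domain",
    "(iii)": "FHA domain",
}

# Every N-to-C arrangement of the three domains.
orders = list(permutations(DOMAINS))
for order in orders:
    print(", ".join(order))
```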
- the cell is a eukaryotic cell.
- the second linear double stranded polynucleotide further comprises a barcode sequence.
- the barcode sequence integrates into a designated barcode locus in the host cell genome.
- integration of the barcode sequence into the barcode locus comprises cleavage of the barcode locus genomic DNA by an endonuclease, wherein the endonuclease is operably linked to an inducible promoter.
- the second linear double stranded polynucleotide comprises a selectable marker.
- the second linear double stranded polynucleotide comprises both a barcode sequence and a selectable marker.
- the endonuclease is a Cas endonuclease or a homing endonuclease.
- the circular donor-gRNA plasmid comprises sequences that are homologous to sequences flanking the endonuclease cleavage site, such that homologous recombination results in integration of the barcode sequence into the barcode locus.
- the endonuclease is a homing endonuclease, wherein the homing endonuclease is an I-SceI endonuclease operably linked to a GAL1 promoter.
- the barcode locus comprises nucleic acid sequences encoding the Cas endonuclease used for editing the target site, sequences encoding the RT, and/or sequences encoding the fusion protein flanked by the endonuclease cleavage sites, wherein expression of the endonuclease results in removal of the nucleic acid sequences encoding the Cas endonuclease, the nucleic acid sequences encoding the RT, and/or the nucleic acid sequences encoding the fusion protein, concomitant with the integration of the barcode sequence.
- a method for removing a plasmid which has integrated into an edited target locus in the genome of a cell, wherein the plasmid comprises: i) a nucleic acid sequence encoding a guide RNA operably linked to a first promoter; ii) a nucleic acid sequence encoding a homing endonuclease operably linked to a second inducible promoter; iii) a nucleic acid sequence encoding a Cas endonuclease operably linked to a third inducible promoter; iv) a nucleic acid sequence that is cleaved by the homing endonuclease; and/or v) a nucleic acid sequence that is cleaved by the Cas endonuclease; wherein the method comprises inducing expression of the homing endonuclease and/or the Cas endonuclease to cleave the
- the plasmid is the donor-gRNA plasmid disclosed above. In some embodiments, plasmid integration is accompanied by tandem repeat duplication of the donor sequence, and removal of the plasmid results in recovery of a desired edit at the target locus.
- the homing endonuclease is an I-SceI endonuclease.
- the Cas endonuclease is Cas9, or a modified variant thereof. In some embodiments, the Cas endonuclease is SaCas9, or a modified variant thereof.
- the second and/or third promoters are GAL1 promoters inducible by galactose. In some embodiments, the second and/or third promoters are inducible by tetracycline or anhydrotetracycline (aTc). In some embodiments, the second promoter is a GAL1 promoter that is inducible by galactose and the third promoter is inducible by tetracycline or anhydrotetracycline (aTc).
- the plasmid further comprises (vi) a barcode sequence that is flanked by (iv) the nucleic acid sequence that is cleaved by the homing endonuclease and/or (v) the nucleic acid sequence that is cleaved by the Cas endonuclease.
- inducing expression of the homing endonuclease and/or the Cas endonuclease results in integration of the barcode sequence at a barcode locus, while simultaneously removing the plasmid from the edited target locus.
- a method for multiplexed editing of DNA in cells comprising introducing into the cells: i) a guide RNA that binds a target site in the genomic DNA in the cells; ii) a library of linear double stranded donor polynucleotides comprising a nucleic acid sequence encoding a retron structured non-coding RNA (ncRNA) operably linked to a promoter, wherein the retron ncRNA comprises: a. an optional stabilizing 5′ ribozyme sequence; b. one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences; c. an msr sequence; d.
- an msd sequence e. a donor sequence for homology directed repair (HDR) inserted within the msd sequence, wherein each donor sequence in the library introduces a different edit at the target site that binds the guide RNA; and, f.
- a linear double stranded polynucleotide wherein the linear double stranded polynucleotide of (iii) is linked in vivo to the linear recombinant double stranded polynucleotide of (ii) by homology directed repair (HDR) or non-homologous end joining (NHEJ) to produce a circular donor plasmid; and iv) a fusion protein comprising an RNA binding domain or single stranded nucleic acid binding domain connected to a DNA break site-localizing domain, or a nucleic acid encoding the fusion protein; wherein the cells comprise a Cas endonuclease or a nucleic acid encoding the same, and a retron-specific reverse transcriptase (RT) or a nucleic acid encoding the same; wherein the fusion protein binds to the one or more RNA binding domain recognition
- the RNA binding domain recognition sequence is a MS2 stem loop sequence.
- the MS2 stem loop sequence binds to a MS2 coat protein (MCP) RNA binding domain.
- the single stranded nucleic acid binding domain recognition sequence binds to a Cas endonuclease binding domain.
- the locus surrounding the dsDNA break site accumulates phosphothreonine (pT) modified proteins recognized by forkhead-associated (FHA) domains.
- the fusion protein comprises a dsDNA break site-localizing domain, wherein the dsDNA break site-localizing domain is an FHA domain.
- the fusion protein comprises a MCP RNA binding domain and an FHA domain.
- the linear double stranded donor polynucleotide of (ii) further comprises a DNA binding domain recognition sequence.
- the DNA binding domain recognition sequence binds to a LexA DNA binding domain or a forkhead homolog 1 (FKH1) DNA binding domain.
- the fusion protein further comprises a LexA DNA binding domain or a FKH1 DNA binding domain.
- the LexA DNA binding domain or the FKH1 DNA binding domain is located between the MCP RNA binding domain and the FHA domain.
- the fusion protein forms a complex with the circular plasmid and the dsDNA break site, thereby recruiting the circular plasmid to the DNA break and enhancing HDR.
- the linear double stranded donor polynucleotide of (ii) further comprises nucleotide sequences complementary to sequences adjacent to the dsDNA break.
- the promoter is a constitutive promoter.
- the Cas endonuclease is a Cas9 or Cas12a endonuclease, or a modified variant thereof.
- the plurality of cells are eukaryotic cells.
- the editing efficiency, fidelity, and/or survival is improved compared to a method that does not include in vivo plasmid assembly to produce a circular plasmid.
- the linear double stranded polynucleotide of (iii) further comprises a barcode sequence.
- the linear double stranded polynucleotide of (iii) comprises a selectable marker.
- the disclosure provides a system for editing DNA at a target site in the genome of a cell, comprising: (i) a first linear double-stranded donor polynucleotide comprising a nucleic acid sequence encoding a guide RNA operably linked to a first promoter and a nucleic acid sequence encoding a retron structured non-coding RNA (ncRNA) operably linked to a second promoter, the retron ncRNA comprising: a. an optional stabilizing 5’ ribozyme sequence; b. one or more RNA binding domain recognition sequences or one or more single-stranded nucleic acid binding domain recognition sequences; c. an msr sequence; d.
- an msd sequence e. a donor sequence for homology directed repair (HDR) inserted within the msd sequence; and f. a first inverted repeat sequence and a second inverted repeat sequence; (ii) a second linear double-stranded polynucleotide, and (iii) a fusion protein comprising an RNA binding domain or single-stranded nucleic acid binding domain connected to a dsDNA break site-localizing domain, or a nucleic acid encoding the fusion protein.
- the second linear double stranded polynucleotide comprises a selectable marker.
- the system further comprises a cell that comprises a CRISPR-associated (Cas) endonuclease or a nucleic acid encoding the same, and a retron-specific reverse transcriptase (RT) or a nucleic acid encoding the same.
- the fusion protein binds to the one or more RNA binding domain recognition sequences or one or more single-stranded nucleic acid binding domain recognition sequences and binds to a dsDNA break site generated by the Cas endonuclease at the target site.
- retron ncRNA expressed by (i) is reverse transcribed by the RT in vivo to produce multiple ssDNA molecules comprising ssDNA retron donor sequences, wherein individual ssDNA molecules bind to the fusion protein to produce a complex between the linear ssDNA molecules, the fusion protein, and the dsDNA break site, thereby recruiting the ssDNA retron donor sequence to the dsDNA break site and promoting editing by HDR.
- the second linear double-stranded polynucleotide comprises a selectable marker.
- MAGESTIC 3.0 combines double-stranded plasmid donor recruitment by LexA-FHA, single-stranded donor DNA recruitment by retron amplification of donor DNA with MS2-FHA, and HDR-based plasmid assembly.
- the three components of MAGESTIC 3.0: plasmid donor recruitment, retron production and recruitment, and in vivo plasmid assembly.
- Editing can be performed with constitutive expression of the guide RNA, Cas9, and donor machinery, such that editing can take place immediately after transformation as cells form colonies on agar plates.
- Two different windows rich in NGG protospacer adjacent motifs (PAMs) and TTTV PAMs were chosen for SpCas9 (20-bp guides) and LbCas12a (23-bp guides), respectively.
- Each guide was paired with a library of donor DNAs with all possible SNVs across the target sequence including the PAM.
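A saturation library of this kind contains three single-nucleotide variants (SNVs) per position, one for each non-reference base. The sketch below enumerates them for a short window; the function name and the 23-nt example target (20-nt protospacer plus an NGG PAM) are hypothetical and chosen only for illustration.

```python
from typing import Iterator, Tuple

BASES = "ACGT"


def all_snvs(window: str) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (position, ref_base, alt_base, variant_sequence) for every SNV."""
    window = window.upper()
    for pos, ref in enumerate(window):
        for alt in BASES:
            if alt != ref:
                # Substitute a single base and keep the rest of the window.
                yield pos, ref, alt, window[:pos] + alt + window[pos + 1:]


# Hypothetical target: 20-nt protospacer followed by a 3-nt NGG PAM.
target = "GATTACAGATTACAGATTACAGG"
variants = list(all_snvs(target))
print(len(variants))  # three variants per position
```

A real donor library would embed each variant within homology arms; that step is omitted here.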
- NGS next-generation sequencing
- All three SNVs at each position are combined into a single column.
- the arrows at the top of the plot denote the position and directionality of the guides, with PAMs for SpCas9 and LbCas12a represented by the end and beginning of each arrow, respectively.
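The two nucleases read their PAMs on opposite sides of the protospacer: the NGG PAM for SpCas9 lies at the 3' end of a 20-nt guide target, while the TTTV PAM for LbCas12a lies at the 5' end of a 23-nt guide target. The sketch below scans only the forward strand of a sequence for both site types; the function names and test sequences are hypothetical, and a complete search would also scan the reverse complement.

```python
import re
from typing import List, Tuple

Site = Tuple[int, str, str]  # (protospacer_start, protospacer, pam)


def find_spcas9_sites(seq: str, guide_len: int = 20) -> List[Site]:
    """Forward-strand sites with an NGG PAM immediately 3' of the protospacer."""
    seq = seq.upper()
    sites = []
    # Lookahead pattern so that overlapping PAMs are all reported.
    for m in re.finditer(r"(?=([ACGT]GG))", seq):
        start = m.start() - guide_len
        if start >= 0:
            sites.append((start, seq[start:m.start()], m.group(1)))
    return sites


def find_lbcas12a_sites(seq: str, guide_len: int = 23) -> List[Site]:
    """Forward-strand sites with a TTTV PAM immediately 5' of the protospacer."""
    seq = seq.upper()
    sites = []
    for m in re.finditer(r"(?=(TTT[ACG]))", seq):
        start = m.start() + 4
        if start + guide_len <= len(seq):
            sites.append((start, seq[start:start + guide_len], m.group(1)))
    return sites


cas9_sites = find_spcas9_sites("A" * 20 + "TGG")
cas12a_sites = find_lbcas12a_sites("TTTC" + "G" * 23)
print(cas9_sites, cas12a_sites)
```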
- SpCas9-NG (PMCID: PMC6368452)
- SpG Cas9 (PMCID: PMC7297043)
- SpRY Cas9 (PMCID: PMC7297043)
- impLbCas12a (PMCID: PMC7144938) recognizes a wide array of T/C-rich PAMs, including TTTV (recognized by WT LbCas12a), TNTN, TACV, TTCV, TCCV, CTCV, CCCV, and VTTV.
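The PAMs listed for impLbCas12a use IUPAC degenerate codes, where N stands for any base and V for A, C, or G. The sketch below expands each listed PAM into a regular expression and reports which ones match a given 4-mer; the helper names are hypothetical, and only the codes that appear in the list are mapped.

```python
import re
from typing import List

# IUPAC codes used in the PAM list: N = any base, V = A, C, or G (not T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "V": "[ACG]"}

# PAMs listed in the text for impLbCas12a.
IMP_LBCAS12A_PAMS = ["TTTV", "TNTN", "TACV", "TTCV", "TCCV", "CTCV", "CCCV", "VTTV"]


def pam_regex(pam: str) -> "re.Pattern[str]":
    """Translate a degenerate PAM into a compiled regular expression."""
    return re.compile("".join(IUPAC[code] for code in pam))


def matching_pams(fourmer: str) -> List[str]:
    """Return every listed PAM that the 4-mer satisfies, in list order."""
    fourmer = fourmer.upper()
    return [pam for pam in IMP_LBCAS12A_PAMS if pam_regex(pam).fullmatch(fourmer)]


print(matching_pams("TTTA"))
print(matching_pams("GGGG"))
```

A 4-mer can satisfy more than one degenerate PAM, so the function returns a list rather than a single match.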
- the guide-donor plasmid is integrated at the target sites with introduction of a direct repeat of a donor sequence. Introducing a couple of nuclease cleavage sites within the guide-donor plasmid allows for excision of these plasmids and recovery of the desired edits.
- 3-primer PCRs were designed for both the upstream junction (USJ) and downstream junction (DSJ) of the integrated plasmid.
- USJ upstream junction
- DSJ downstream junction
- a primer internal to the guide-donor plasmid was designed to yield a 1.5 to 2 kb product for loci with an integrated guide-donor plasmid and a 3114 bp product for the intact locus.
- the DSJ internal primer was designed to yield a 3.7 to 4.2 kb product for loci with an integrated guide-donor plasmid.
- the DSJ and USJ products were designed to be shorter and longer than the intact locus product, respectively, to ensure that biases in PCR efficiency due to length of the products did not have a major impact on the interpretation of plasmid integration rates.
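Because the integrated and intact products differ clearly in length, a band on a gel can be assigned by size alone. The sketch below classifies an observed product length against the three size ranges stated above; the 100 bp tolerance, the function name, and the label strings are assumptions made for illustration and do not come from the source.

```python
# Product sizes stated in the text (bp).
INTACT_BP = 3114
SHORT_INTEGRATED_BP = (1500, 2000)  # the 1.5 to 2 kb product
DSJ_INTEGRATED_BP = (3700, 4200)    # the 3.7 to 4.2 kb DSJ product


def classify_band(length_bp: int, tolerance_bp: int = 100) -> str:
    """Assign an observed PCR product length to an expected product class.

    tolerance_bp is a hypothetical allowance for gel sizing error.
    """
    if abs(length_bp - INTACT_BP) <= tolerance_bp:
        return "intact locus"
    lo, hi = SHORT_INTEGRATED_BP
    if lo - tolerance_bp <= length_bp <= hi + tolerance_bp:
        return "integrated plasmid (1.5-2 kb product)"
    lo, hi = DSJ_INTEGRATED_BP
    if lo - tolerance_bp <= length_bp <= hi + tolerance_bp:
        return "integrated plasmid (DSJ product)"
    return "unassigned"


for observed in (1800, 3114, 4000, 2700):
    print(observed, classify_band(observed))
```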
- the data are from the 5FC-treated cultures (stage 5) to demonstrate the total SNV levels at the final stage of MAGESTIC.
- the position of each SNV (relative to the start of each window) is shown on the x-axis, and the total fraction of SNV edits at each position (i.e., the sum of the edit fractions for the 3 possible SNVs at each position) is shown on the y-axis.
- the PAMs are separated according to whether they are canonical (top) or non-canonical (bottom). The upper right corner indicates the total fraction of SNV-edited sequence observed in the sample.
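The per-position value plotted on the y-axis is the sum of the edit fractions of the three possible SNVs at that position, and the sample-level figure is the sum across positions. The sketch below computes both from read counts; the input layout, the function name, and the example counts are hypothetical, and it assumes each read carries at most one SNV.

```python
from collections import defaultdict
from typing import Dict, Tuple


def snv_fraction_by_position(
    snv_counts: Dict[Tuple[int, str], int], total_reads: int
) -> Dict[int, float]:
    """Sum the read fractions of the alternative bases at each position.

    snv_counts maps (position, alt_base) to the number of reads with that SNV.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    per_position: Dict[int, float] = defaultdict(float)
    for (pos, _alt), count in snv_counts.items():
        per_position[pos] += count / total_reads
    return dict(per_position)


# Hypothetical counts out of 1000 reads.
counts = {(0, "A"): 10, (0, "C"): 5, (0, "T"): 5, (1, "G"): 20}
fractions = snv_fraction_by_position(counts, total_reads=1000)
total_snv_fraction = sum(fractions.values())
print(fractions, total_snv_fraction)
```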
- “a” entity or “an” entity refers to one or more of that entity.
- a nucleic acid molecule refers to one or more nucleic acid molecules.
- the terms “a”, “an”, “one or more” and “at least one” can be used interchangeably.
- the terms “comprising”, “including” and “having” can be used interchangeably.
- the term “about” means a range of values including the specified value, which a person of ordinary skill in the art would consider reasonably similar to the specified value. In embodiments, about means within a standard deviation using measurements generally acceptable in the art. In embodiments, about means a range extending to +/- 10% of the specified value (e.g., +/- 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, or 10% of the specified value). In embodiments, about means the specified value.
- the term “genome editing” refers to a type of genetic engineering in which DNA is inserted, replaced, or removed from a target DNA (e.g., the genome of a cell) using one or more nucleases and/or nickases.
- the nucleases create specific double-strand breaks (DSBs) at desired locations in the genome and harness the cell's endogenous mechanisms to repair the induced break by homology-directed repair (HDR) (e.g., homologous recombination) or by nonhomologous end joining (NHEJ).
- HDR homology-directed repair
- NHEJ nonhomologous end joining
- DNA nuclease refers to an enzyme capable of cleaving the phosphodiester bonds between the nucleotide subunits of DNA and may be an endonuclease or an exonuclease.
- the DNA nuclease may be an engineered (e.g., programmable or targetable) DNA nuclease which can be used to induce genome editing of a target DNA sequence.
- Any suitable DNA nuclease can be used including, but not limited to, CRISPR-associated protein (Cas) nucleases, other endo- or exo-nucleases, variants thereof, fragments thereof, and combinations thereof.
- Cas CRISPR-associated protein
- double-strand break (DSB) or double-strand cut refers to the severing or cleavage of both strands of the DNA double helix.
- the DSB may result in cleavage of both strands at the same position leading to “blunt ends” or staggered cleavage resulting in a region of single-stranded DNA at the end of each DNA fragment, or “sticky ends”.
- a DSB may arise from the action of one or more DNA nucleases.
- retron is used in accordance with its plain ordinary meaning and refers to a DNA sequence found in the genome of many bacteria species that codes for reverse transcriptase (RT) and a unique single-stranded DNA/RNA hybrid called multicopy single-stranded DNA (msDNA).
- RT reverse transcriptase
- msDNA multicopy single-stranded DNA
- the retron msr-msd RNA is the non-coding RNA produced by retron elements and is the immediate precursor to the synthesis of msDNA.
- the retron msr RNA folds into a characteristic secondary structure that contains a conserved guanosine residue at the end of a stem loop.
- Synthesis of DNA by the retron-encoded reverse transcriptase (RT) results in a DNA/RNA chimera which is composed of small single-stranded DNA linked to small single-stranded RNA.
- the RNA strand is joined to the 5′ end of the DNA chain via a 2′–5′ phosphodiester linkage that occurs from the 2′ position of the conserved internal guanosine residue.
- the retron operon carries a promoter sequence P that controls the synthesis of an RNA transcript carrying three loci: msr, msd, and ret.
- the ret gene product processes the msd/msr portion of the RNA transcript into msDNA.
- Retron elements are about 2 kb long. They contain a single operon controlling the synthesis of an RNA transcript carrying three loci, msr, msd, and ret, that are involved in msDNA synthesis.
- the DNA portion of msDNA is encoded by the msd region, the RNA portion is encoded by the msr region, while the product of the ret open-reading frame is a reverse transcriptase (RT) similar to the RTs produced by retroviruses and other types of retroelements.
- the retron RT contains seven regions of conserved amino acids, including a highly conserved tyr-ala-asp-asp (YADD) sequence associated with the catalytic core.
- the ret gene product is responsible for processing the msd/msr portion of the RNA transcript into msDNA.
- reverse transcriptase refers to its plain and ordinary meaning as an enzyme used to generate complementary DNA (cDNA) from an RNA template, a process termed reverse transcription.
- polypeptide and “protein” refer to a polymer of amino acid residues and are not limited to a minimum length.
- peptides are included within the definition. Both full length proteins and fragments thereof are encompassed by the definition.
- the terms also include post expression modifications of the polypeptide, for example, glycosylation, acetylation, phosphorylation, hydroxylation, and the like.
- a “polypeptide” refers to a protein which includes modifications, such as deletions, additions and substitutions to the native sequence, so long as the protein maintains the desired activity. These modifications may be deliberate, as through site directed mutagenesis, or may be accidental, such as through mutations of hosts which produce the proteins or errors due to PCR amplification.
- single stranded nucleic acid binding domain refers to a polypeptide or aptamer that preferentially binds to specific sequences of single stranded DNA or single stranded RNA.
- Single stranded nucleic acid binding domain recognition domains of polypeptides include, but are not limited to, Cas endonucleases such as Cas13 or Cas14; oligonucleotide/oligosaccharide/oligopeptide-binding (OB) folds, such as in human POT1, Schizosaccharomyces pombe Pot1, Sterkiella nova TEBP, and CspB protein from Bacillus caldolyticus and Bacillus subtilis; K homology (KH) domains, such as in KH domain-containing proteins including heterogeneous ribonucleoprotein K (hnRNP K), far upstream element (FUSE)-binding protein (FBP), and poly(C)-binding proteins (PCBP) 1 and 2; RNA recognition motifs (RRMs) which bind DNA such as in FBP-interacting repressor (FIR), hnRNP A1, and hnRNP D (also known as Auf1); and whirly
- RNA binding domain refers to a polypeptide or aptamer that preferentially binds to specific sequences of a single stranded or double stranded RNA which, in the case of a polypeptide, can include the entire protein or a functional portion thereof.
- RNA binding domains include an MS2 coat protein (MCP), Pumilio (PUF), RNA Recognition Motif (RRM), Double-Stranded RNA-Binding Domain (dsRBD), Zinc finger (ZF) Domains (CCHH zinc fingers: TFIIIA, CCCH zinc fingers, CCHC zinc knuckles, RanBP2-type ZFs), Z-alpha, arginine/glycine rich (RGG) domains, or K Homology (KH) Domain, and Poly(A) Binding Proteins.
- MCP MS2 coat protein
- PUF Pumilio
- RRM RNA Recognition Motif
- dsRBD Double-Stranded RNA-Binding Domain
- ZF Zinc finger Domains
- RNA binding domain recognition sequence refers to the RNA sequence to which an RNA binding domain preferentially binds.
- DNA break localizing domain refers to a polypeptide that preferentially binds to regions of DNA damage and/or DNA repair proteins which can include the entire protein or a functional portion thereof.
- Non-limiting examples of DNA break localizing domains include 14-3-3 proteins, WW domains, Polo-box domains (in PLK1), WD40 repeats (including those in the E3 ligase SCFβTrCP), BRCT domains (including those in BRCA1) and FHA domains (such as in Fkh1p, CHK2 and MDC1). Other examples are provided in Tables 1-5 (see below).
- sequence specific endonuclease refers to an enzyme that cleaves at a specific sequence within a polynucleotide sequence.
- the nuclease activity can be partially or completely inhibited, so that only one of the two strands or neither strand is cleaved.
- sequence specific endonucleases include CRISPR associated (Cas) nuclease, a Zinc-finger nuclease, a Transcription activator-like effector nuclease (TALEN), or a meganuclease.
- Cas9 encompasses type II clustered regularly interspaced short palindromic repeats (CRISPR) system of Cas9 endonucleases from any species, and also includes biologically active fragments, variants, analogs, and derivatives thereof that retain Cas9 endonuclease activity (i.e., catalyze site-directed cleavage of DNA to generate double-strand breaks).
- CRISPR clustered regularly interspaced short palindromic repeats
- a Cas9 endonuclease binds to and cleaves DNA at a site comprising a sequence complementary to its bound guide RNA (gRNA).
- gRNA guide RNA
- a Cas9 polynucleotide, nucleic acid, oligonucleotide, protein, polypeptide, or peptide refers to a molecule derived from any source. The molecule need not be physically derived from an organism but may be synthetically or recombinantly produced. Cas9 sequences from a number of bacterial species are well known in the art and listed in the National Center for Biotechnology Information (NCBI) database.
- NCBI National Center for Biotechnology Information
- SpCas9 is a Cas9 from Streptococcus pyogenes (WP_002989955, WP_038434062, WP_011528583), or a variant thereof.
- SaCas9 is a Cas9 from Staphylococcus aureus (WP_001573634), or a variant thereof.
- sequences or a variant thereof comprising a sequence having at least about 70-100% sequence identity thereto, including any percent identity within this range, such as 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99% sequence identity thereto, can be used for genome editing, as described herein, wherein the variant retains biological activity, such as Cas9 site-directed endonuclease activity. See also Fonfara et al. (2014) Nucleic Acids Res.
- Cas12a is a subtype of Cas12 proteins, previously known as Cpf1, and an RNA-guided endonuclease that forms part of the CRISPR system in some bacteria and archaea.
- Cas12a is distinguished from Cas9 by its single RuvC endonuclease active site, its 5' protospacer adjacent motif preference, and its creation of sticky rather than blunt ends at the cut site.
- LbCas12a, a Cas12a from Lachnospiraceae bacterium ND2006, is a widely used orthologue for targeted mutagenesis.
- By “derivative” is intended any suitable modification of the native polypeptide of interest, of a fragment of the native polypeptide, or of their respective analogs, such as glycosylation, phosphorylation, polymer conjugation (such as with polyethylene glycol), or other addition of foreign moieties, as long as the desired biological activity of the native polypeptide is retained.
- By “fragment” is intended a molecule consisting of only a part of the intact full-length sequence and structure.
- the fragment can include a C-terminal deletion, an N-terminal deletion, and/or an internal deletion of the polypeptide.
- Active fragments of a particular protein or polypeptide will generally include at least about 5-10 contiguous amino acid residues of the full length molecule, preferably at least about 15-25 contiguous amino acid residues of the full length molecule, and most preferably at least about 20-50 or more contiguous amino acid residues of the full length molecule, or any integer between 5 amino acids and the full length sequence, provided that the fragment in question retains biological activity, such as Cas9 site-directed endonuclease activity.
- “Substantially purified” generally refers to isolation of a substance (compound, polynucleotide, nucleic acid, protein, polypeptide, polypeptide composition) such that the substance comprises the majority percent of the sample in which it resides.
- a substantially purified component comprises 50%, preferably 80%-85%, more preferably 90-95% of the sample.
- Techniques for purifying polynucleotides and polypeptides of interest are well-known in the art and include, for example, ion-exchange chromatography, affinity chromatography and sedimentation according to density.
- By “isolated” is meant, when referring to a polypeptide, that the indicated molecule is separate and discrete from the whole organism with which the molecule is found in nature or is present in the substantial absence of other biological macromolecules of the same type.
- An “isolated” polynucleotide is a nucleic acid molecule devoid, in whole or part, of sequences normally associated with it in nature; or a sequence, as it exists in nature, but having heterologous sequences in association therewith; or a molecule disassociated from the chromosome.
- The terms “polynucleotide,” “oligonucleotide,” “nucleic acid” and “nucleic acid molecule” are used herein to include a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. This term refers only to the primary structure of the molecule.
- the term includes triple-, double- and single-stranded DNA, as well as triple-, double- and single-stranded RNA. It also includes modifications, such as by methylation and/or by capping, and unmodified forms of the polynucleotide.
- polynucleotide examples include polydeoxyribonucleotides (containing 2-deoxy-D-ribose), polyribonucleotides (containing D-ribose), any other type of polynucleotide which is an N- or C-glycoside of a purine or pyrimidine base, and other polymers containing nonnucleotidic backbones, for example, polyamide (e.g., peptide nucleic acids (PNAs)) and polymorpholino (commercially available from the Anti-Virals, Inc., Corvallis, Oreg., as Neugene) polymers, and other synthetic sequence-specific nucleic acid polymers providing that the polymers contain nucleobases in a configuration which allows for base pairing and base stacking, such as is found in DNA and RNA.
- these terms include, for example, 3'-deoxy-2',5'-DNA, oligodeoxyribonucleotide N3' P5' phosphoramidates, 2'-O-alkyl-substituted RNA, double- and single-stranded DNA, as well as double- and single-stranded RNA, microRNA, DNA:RNA hybrids, and hybrids between PNAs and DNA or RNA, and also include known types of modifications, for example, labels which are known in the art, methylation, "caps," substitution of one or more of the naturally occurring nucleotides with an analog (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine
- the term also includes locked nucleic acids (e.g., comprising a ribonucleotide that has a methylene bridge between the 2'-oxygen atom and the 4'-carbon atom).
- hybridize and “hybridization” refer to the formation of complexes between nucleotide sequences which are sufficiently complementary to form duplexes via Watson-Crick base pairing.
- identity refers to an exact nucleotide to nucleotide or amino acid to amino acid correspondence of two polynucleotides or polypeptide sequences, respectively. Percent identity can be determined by a direct comparison of the sequence information between two molecules by aligning the sequences, counting the exact number of matches between the two aligned sequences, dividing by the length of the shorter sequence, and multiplying the result by 100. Readily available computer programs can be used to aid in the analysis, such as ALIGN, Dayhoff, M.O.
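The direct-comparison calculation described above (count exact matches, divide by the length of the shorter sequence, multiply by 100) can be sketched as follows. This is a minimal illustration only: it assumes the two sequences have already been aligned by some other tool and that gaps are written as `-`; the function name and the gap convention are hypothetical and are not taken from the disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal aligned length.

    Assumption: alignment was done elsewhere; gaps are marked with `gap`.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Count aligned positions where both sequences carry the same residue.
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    # Length of the shorter sequence, ignoring gap characters.
    shorter = min(
        len(aligned_a.replace(gap, "")), len(aligned_b.replace(gap, ""))
    )
    if shorter == 0:
        raise ValueError("sequences must contain at least one residue")
    return 100.0 * matches / shorter


if __name__ == "__main__":
    # One mismatch in eight aligned positions.
    print(percent_identity("ACGTACGT", "ACGTTCGT"))
```

A threshold such as "at least about 70% sequence identity" could then be checked by comparing the returned value against 70.0.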
- percent identity of a particular nucleotide sequence to a reference sequence can be determined using the homology algorithm of Smith and Waterman with a default scoring table and a gap penalty of six nucleotide positions.
- Another method of establishing percent identity in the context of the present disclosure is to use the MPSRCH package of programs copyrighted by the University of Edinburgh, developed by John F. Collins and Shane S. Sturrok, and distributed by IntelliGenetics, Inc. (Mountain View, CA). From this suite of packages, the Smith Waterman algorithm can be employed where default parameters are used for the scoring table (for example, gap open penalty of 12, gap extension penalty of one, and a gap of six).
- BLAST Altschul et al.
- homology can be determined by hybridization of polynucleotides under conditions which form stable duplexes between homologous regions, followed by digestion with single stranded specific nuclease(s), and size determination of the digested fragments.
- DNA sequences that are substantially homologous can be identified in a Southern hybridization experiment under, for example, stringent conditions, as defined for that particular system. Defining appropriate hybridization conditions is within the skill of the art. See, e.g., Sambrook et al., supra; DNA Cloning, supra; Nucleic Acid Hybridization, supra.
- the term “homologous region” refers to a region of a nucleic acid with homology to another nucleic acid region.
- whether a “homologous region” is present in a nucleic acid molecule is determined with reference to another nucleic acid region in the same or a different molecule.
- the term “homologous region,” as used herein, refers to the ability of nucleic acid molecules to hybridize to each other.
- a single-stranded nucleic acid molecule can have two homologous regions which are capable of hybridizing to each other.
- the term “homologous region” includes nucleic acid segments with complementary sequences.
- Homologous regions may vary in length but will typically be between 4 and 500 nucleotides (e.g., from about 4 to about 40, from about 40 to about 80, from about 80 to about 120, from about 120 to about 160, from about 160 to about 200, from about 200 to about 240, from about 240 to about 280, from about 280 to about 320, from about 320 to about 360, from about 360 to about 400, from about 400 to about 440, etc.).
- the terms “complementary” and “complementarity” refer to polynucleotides that are able to form base pairs with one another.
- Base pairs are typically formed by hydrogen bonds between nucleotide units in an anti-parallel orientation between polynucleotide strands.
- Complementary polynucleotide strands can base pair in a Watson-Crick manner (e.g., A to T, A to U, C to G), or in any other manner that allows for the formation of duplexes.
- when RNA is used as opposed to DNA, uracil (U) rather than thymine (T) is the base that is considered to be complementary to adenosine.
- when a uracil is denoted, the ability to substitute a thymine is implied, unless otherwise stated.
- “Complementarity” may exist between two RNA strands, two DNA strands, or between a RNA strand and a DNA strand. It is generally understood that two or more polynucleotides may be “complementary” and able to form a duplex despite having less than perfect or less than 100% complementarity. Two sequences are "perfectly complementary” or “100% complementary” if at least a contiguous portion of each polynucleotide sequence, comprising a region of complementarity, perfectly base pairs with the other polynucleotide without any mismatches or interruptions within such region.
- Two or more sequences are considered “perfectly complementary” or “100% complementary” even if either or both polynucleotides contain additional non-complementary sequences as long as the contiguous region of complementarity within each polynucleotide is able to perfectly hybridize with the other.
- "Less than perfect” complementarity refers to situations where less than all of the contiguous nucleotides within such region of complementarity are able to base pair with each other. Determining the percentage of complementarity between two polynucleotide sequences is a matter of ordinary skill in the art.
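As an illustration of how a percentage of complementarity might be computed, the sketch below pairs two equal-length strands in anti-parallel orientation and counts Watson-Crick pairs (A to T, A to U, C to G). It is a simplified example under stated assumptions: both strands are written 5' to 3', only the region of complementarity is passed in, and non-Watson-Crick pairing is ignored. The function names are hypothetical.

```python
# Watson-Crick pairs named in the text: A-T, A-U, C-G.
WATSON_CRICK = (frozenset("AT"), frozenset("AU"), frozenset("CG"))


def pairs(x: str, y: str) -> bool:
    """True if the two bases form a Watson-Crick pair."""
    return frozenset((x.upper(), y.upper())) in WATSON_CRICK


def percent_complementarity(strand_a: str, strand_b: str) -> float:
    """Percent of positions that base pair when the strands are anti-parallel.

    Assumption: both strands are given 5'->3' and have equal length.
    """
    if not strand_a or len(strand_a) != len(strand_b):
        raise ValueError("strands must be non-empty and of equal length")
    # Reverse strand_b so that position i of strand_a faces its partner.
    paired = sum(pairs(x, y) for x, y in zip(strand_a, reversed(strand_b)))
    return 100.0 * paired / len(strand_a)


if __name__ == "__main__":
    print(percent_complementarity("AAGG", "CCUU"))  # DNA strand vs. RNA strand
```

A result of 100.0 corresponds to "perfectly complementary" over the region supplied; any lower value corresponds to "less than perfect" complementarity.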
- a gRNA may comprise a sequence "complementary" to a target sequence (e.g., major or minor allele), capable of sufficient base-pairing to form a duplex (i.e., the gRNA hybridizes with the target sequence). Additionally, the gRNA may comprise a sequence complementary to a sequence adjacent to a PAM sequence, wherein the gRNA also hybridizes with the sequence adjacent to a PAM sequence in a target DNA.
- a "target site” or “target sequence” is the nucleic acid sequence recognized (i.e., sufficiently complementary for hybridization) by a guide RNA (gRNA) or a homology arm of a donor polynucleotide.
- the target site may be allele-specific (e.g., a major or minor allele).
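A target sequence adjacent to a PAM can be located computationally. The sketch below is illustrative only and rests on assumptions not stated in this section: a 20-nucleotide protospacer followed immediately on its 3' side by an NGG PAM (the motif commonly associated with SpCas9), and a search of the given strand only. The function names and the default length are hypothetical.

```python
import re


def find_target_sites(dna: str, protospacer_len: int = 20):
    """Return (start, protospacer, pam) for each protospacer followed by NGG.

    Assumptions: NGG PAM, fixed protospacer length, forward strand only.
    """
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all found.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


def spacer_rna(protospacer: str) -> str:
    """RNA spacer with the protospacer's sequence (T written as U)."""
    return protospacer.upper().replace("T", "U")


if __name__ == "__main__":
    example = "TT" + "ACGT" * 5 + "TGG" + "AA"
    for start, protospacer, pam in find_target_sites(example):
        print(start, protospacer, pam, spacer_rna(protospacer))
```

The spacer returned by `spacer_rna` has the same sequence as the protospacer strand and is therefore complementary to the opposite (target) strand, which is the strand the gRNA hybridizes with.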
- target edit site or “target edit locus” or “edit locus” refer to a target site in the host cell genome comprising a nucleic acid sequence recognized by a guide RNA (gRNA) or a homology arm of a donor polynucleotide that is or was edited by the methods of the disclosure.
- barcode refers to a DNA sequence used to identify a target molecule during DNA sequencing. A barcode generally is about 20 bp in length, but also can be around 10-100 bp.
- a barcode constitutes a random or pseudo-random DNA sequence within the insert fragment used for in vivo plasmid assembly. As illustrated in Figures 2b and 2c, in vivo plasmid assembly results in the insert fragment comprising the barcode linked to the guide-donor fragment, thereby allowing a simple sequencing step to identify a particular donor and/or guide sequences in each cell.
- the term “barcode locus” refers to a locus in the host cell genome where a barcode of the disclosure is integrated. The barcode locus can be at a different location in the host cell genome than the target site.
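A pseudo-random barcode set of the kind described above could be generated as in the following sketch. It is an assumption-laden illustration, not the method of the disclosure: barcodes are drawn uniformly from A, C, G and T, the default length of 20 follows the "about 20 bp" figure in the text, and the function name, seed parameter and uniqueness requirement are hypothetical choices.

```python
import random


def make_barcodes(count: int, length: int = 20, seed=None):
    """Return `count` distinct pseudo-random DNA barcodes of `length` bases."""
    if count > 4 ** length:
        raise ValueError("not enough distinct barcodes of this length")
    rng = random.Random(seed)  # seeded generator for reproducible libraries
    barcodes = set()
    while len(barcodes) < count:
        barcodes.add("".join(rng.choice("ACGT") for _ in range(length)))
    return sorted(barcodes)


if __name__ == "__main__":
    for barcode in make_barcodes(3, seed=1):
        print(barcode)
```

Once each barcode is linked to a guide-donor fragment, a dictionary keyed by barcode would be one simple way to look up the donor and/or guide from sequencing reads.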
- the term “subject expression sequence” refers to any polynucleotide of any length and any sequence that can be transcribed into RNA.
- the subject expression sequence is a polynucleotide inserted within the msd region of the retron non-coding RNA (ncRNA) which is converted to complementary DNA (cDNA) during reverse transcription.
- the subject expression sequence is a donor polynucleotide.
- donor polynucleotide or “donor sequence” refers to a polynucleotide that provides a sequence of an intended edit to be integrated into the genome at a target locus by HDR.
- By “homology arm” is meant a portion of a donor polynucleotide that is responsible for targeting the donor polynucleotide to the genomic sequence to be edited in a cell.
- the donor polynucleotide typically comprises a 5' homology arm that hybridizes to a 5' genomic target sequence and a 3' homology arm that hybridizes to a 3' genomic target sequence flanking a nucleotide sequence comprising the intended edit to the genomic DNA, with the positive or plus strand of the double helix (also called Watson strand) used arbitrarily as the reference.
- the homology arms are referred to herein as 5' and 3' (i.e., upstream and downstream) homology arms, which relates to the relative position of the homology arms to the nucleotide sequence comprising the intended edit within the donor polynucleotide.
- the 5' and 3' homology arms hybridize to regions within the target locus in the genomic DNA to be modified, which are referred to herein as the "5' target sequence” and "3' target sequence,” respectively.
- the nucleotide sequence comprising the intended edit is integrated into the genomic DNA by HDR at the genomic target locus recognized (i.e., sufficiently complementary for hybridization) by the 5' and 3' homology arms.
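The layout of a donor polynucleotide (5' homology arm, intended edit, 3' homology arm) can be illustrated with a short sketch. The example assumes the genomic sequence is given as its plus strand, that the edit replaces the half-open interval `[edit_start, edit_end)`, and that both arms have the same length; the function names and the default arm length of 50 are hypothetical and not taken from the disclosure.

```python
def build_donor(genome: str, edit_start: int, edit_end: int,
                new_sequence: str, arm_len: int = 50) -> str:
    """Return 5' arm + intended edit + 3' arm for an edit of the plus strand."""
    if not 0 <= edit_start <= edit_end <= len(genome):
        raise ValueError("edit interval lies outside the genome sequence")
    if edit_start < arm_len or len(genome) - edit_end < arm_len:
        raise ValueError("not enough flanking sequence for the homology arms")
    five_prime_arm = genome[edit_start - arm_len:edit_start]   # upstream arm
    three_prime_arm = genome[edit_end:edit_end + arm_len]      # downstream arm
    return five_prime_arm + new_sequence + three_prime_arm


def apply_edit(genome: str, edit_start: int, edit_end: int,
               new_sequence: str) -> str:
    """Genome sequence expected after the intended edit is integrated."""
    return genome[:edit_start] + new_sequence + genome[edit_end:]


if __name__ == "__main__":
    genome = "AAAACCCCGGGGTTTT"
    donor = build_donor(genome, 8, 9, "T", arm_len=4)
    print(donor)                          # arms flank the single-base edit
    print(apply_edit(genome, 8, 9, "T"))  # edited locus contains the donor
```

Because the arms are copied from the sequence flanking the edit, the full donor sequence appears verbatim in the edited genome, which gives a simple consistency check.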
- administering a nucleic acid, such as a retron, a nucleic acid encoding a fusion of an RNA binding domain or single stranded nucleic acid binding domain and DNA break localizing domain, guide RNA, or nucleic acid encoding a protein such as an endonuclease, reverse transcriptase or fusion protein of the disclosure, to a cell comprises transforming, transducing, transfecting, electroporating, translocating, fusing, phagocytosing, shooting or ballistic methods, etc., i.e., any means by which a nucleic acid can be transported across a cell membrane.
- a gRNA will bind to a substantially complementary sequence and not to unrelated sequences.
- a gRNA that "selectively binds" to a particular allele such as a particular mutant allele (e.g., allele comprising a substitution, insertion, or deletion), denotes a gRNA that binds preferentially to the particular target allele, but to a lesser extent to a wild-type allele or other sequences.
- a gRNA that selectively binds to a particular target DNA sequence will selectively direct binding of an RNA-guided nuclease (e.g., Cas9) to a substantially complementary sequence at the target site and not to unrelated sequences.
- the term “recombination target site” denotes a region of a nucleic acid molecule comprising a binding site or sequence-specific motif recognized by a site-specific recombinase that binds at the target site and catalyzes recombination of specific sequences of DNA at the target site.
- Site-specific recombinases catalyze recombination between two such target sites.
- label and “detectable label” refer to a molecule capable of detection, including, but not limited to, radioactive isotopes, fluorescers, chemiluminescers, chromophores, enzymes, enzyme substrates, enzyme cofactors, enzyme inhibitors, semiconductor nanoparticles, dyes, metal ions, metal sols, ligands (e.g., biotin, streptavidin or haptens) and the like.
- fluorescer refers to a substance or a portion thereof which is capable of exhibiting fluorescence in the detectable range.
- labels which may be used in the practice of the present disclosure include, but are not limited to, SYBR green, SYBR gold, a CAL Fluor dye such as CAL Fluor Gold 540, CAL Fluor Orange 560, CAL Fluor Red 590, CAL Fluor Red 610, and CAL Fluor Red 635, a Quasar dye such as Quasar 570, Quasar 670, and Quasar 705, an Alexa Fluor such as Alexa Fluor 350, Alexa Fluor 488, Alexa Fluor 546, Alexa Fluor 555, Alexa Fluor 594, Alexa Fluor 647, and Alexa Fluor 784, a cyanine dye such as Cy 3, Cy3.5, Cy5, Cy5.5, and Cy7, fluorescein, 2', 4', 5', 7'-tetrachloro-4-7-dichloro
- Recombinant as used herein to describe a nucleic acid molecule means a polynucleotide of genomic, cDNA, viral, semisynthetic, or synthetic origin which, by virtue of its origin or manipulation, is not associated with all or a portion of the polynucleotide with which it is associated in nature.
- the term "recombinant” as used with respect to a protein or polypeptide means a polypeptide produced by expression of a recombinant polynucleotide.
- the gene of interest is cloned and then expressed in transformed organisms, as described further below. The host organism expresses the foreign gene to produce the protein under expression conditions.
- transformation refers to the insertion of an exogenous polynucleotide into a host cell, irrespective of the method used for the insertion. For example, direct uptake, transduction or f-mating are included.
- the exogenous polynucleotide may be maintained as a non-integrated vector, for example, a plasmid, or alternatively, may be integrated into the host genome.
- Recombinant host cells refer to cells which can be, or have been, used as recipients for recombinant vector or other transferred DNA, and include the original progeny of the original cell which has been transfected.
- a "coding sequence” or a sequence which "encodes" a selected polypeptide is a nucleic acid molecule which is transcribed (in the case of DNA) and translated (in the case of mRNA) into a polypeptide in vivo when placed under the control of appropriate regulatory sequences (or "control elements").
- the boundaries of the coding sequence can be determined by a start codon at the 5' (amino) terminus and a translation stop codon at the 3' (carboxy) terminus.
- a coding sequence can include, but is not limited to, cDNA from viral, prokaryotic or eukaryotic mRNA, genomic DNA sequences from viral or prokaryotic DNA, and even synthetic DNA sequences.
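The idea that a coding sequence is bounded by a start codon and a translation stop codon can be illustrated by a simple scan of a DNA string. This sketch makes several simplifying assumptions that are not part of the disclosure: the start codon is ATG, the stop codons are those of the standard genetic code (TAA, TAG, TGA), only the given strand is searched, and introns are not handled. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def find_coding_sequence(dna: str):
    """Return (start, end) of the first ATG...stop reading frame, else None.

    `end` is exclusive, so dna[start:end] includes the stop codon.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        # Walk codon by codon in the frame set by this start codon.
        for i in range(start + 3, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return start, i + 3
        # No in-frame stop found; try the next ATG.
        start = dna.find("ATG", start + 1)
    return None


if __name__ == "__main__":
    sequence = "CCATGAAATTTTAGGG"
    bounds = find_coding_sequence(sequence)
    if bounds is not None:
        print(sequence[bounds[0]:bounds[1]])
```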
- a transcription termination sequence may be located 3' to the coding sequence.
- the coding sequence may be interrupted by introns which can be self-splicing group I or group II introns or those which are spliced out by the host cell splicing machinery.
- Typical "control elements” include, but are not limited to, transcription promoters, transcription enhancer elements, introns (located anywhere in the transcript), transcription termination signals, polyadenylation sequences (located 3' to the translation stop codon), sequences for optimization of initiation of translation (located 5' to the coding sequence), and translation termination sequences.
- "Operably linked” refers to an arrangement of elements wherein the components so described are configured so as to perform their usual function.
- a given promoter operably linked to a coding sequence is capable of effecting the expression of the coding sequence when the proper enzymes are present.
- the promoter need not be contiguous with the coding sequence, so long as it functions to direct the expression thereof.
- intervening untranslated yet transcribed sequences can be present between the promoter sequence and the coding sequence and the promoter sequence can still be considered “operably linked” to the coding sequence.
- "Expression cassette” or "expression construct” refers to an assembly which is capable of directing the expression of the sequence(s) or gene(s) of interest.
- An expression cassette generally includes control elements, as described above, such as a promoter which is operably linked to (so as to direct transcription of) the sequence(s) or gene(s) of interest, and often includes a polyadenylation sequence as well.
- the expression cassette described herein may be contained within a plasmid or viral vector construct (e.g., a vector for genome modification comprising a genome editing cassette comprising a promoter operably linked to a polynucleotide encoding a guide RNA and a donor polynucleotide).
- the construct may also include one or more selectable markers, a signal which allows the construct to exist as single stranded DNA (e.g., an M13 origin of replication), at least one multiple cloning site, and a “mammalian” origin of replication (e.g., an SV40 or adenovirus origin of replication) or “yeast” origin of replication (e.g., a 2-micron vector or centromeric vector with an autonomously replicating sequence (ARS)).
- transfection techniques are generally known in the art. See, e.g., Graham et al. (1973) Virology, 52:456, Sambrook et al. (2001) Molecular Cloning, a laboratory manual, 3rd edition, Cold Spring Harbor Laboratories, New York, Davis et al. (1995) Basic Methods in Molecular Biology, 2nd edition, McGraw-Hill, and Chu et al. (1981) Gene 13:197.
- Such techniques can be used to introduce one or more exogenous nucleic acid moieties into suitable host cells.
- the term refers to both stable and transient uptake of the genetic material and includes uptake of peptide- or antibody-linked nucleic acids.
- a “vector” is capable of transferring nucleic acid sequences to target cells (e.g., viral vectors, non-viral vectors, particulate carriers, and liposomes).
- the term includes cloning and expression vehicles, as well as plasmid and viral vectors.
- the term “variant” refers to biologically active derivatives of the reference molecule that retain desired activity, such as site-directed Cas9 endonuclease activity.
- variant and analog refer to compounds having a native polypeptide sequence and structure with one or more amino acid additions, substitutions (generally conservative in nature) and/or deletions, relative to the native molecule, so long as the modifications do not destroy biological activity and which are "substantially homologous" to the reference molecule as defined below.
- amino acid sequences of such analogs will have a high degree of sequence homology to the reference sequence, e.g., amino acid sequence homology of more than 50%, generally more than 60%-70%, even more particularly 80%-85% or more, such as at least 90%-95% or more, when the two sequences are aligned.
- the analogs will include the same number of amino acids but will include substitutions, as explained herein.
- the term "mutein” further includes polypeptides having one or more amino acid-like molecules including but not limited to compounds comprising only amino and/or imino molecules, polypeptides containing one or more analogs of an amino acid (including, for example, unnatural amino acids, etc.), polypeptides with substituted linkages, as well as other modifications known in the art, both naturally occurring and non-naturally occurring (e.g., synthetic), cyclized, branched molecules and the like.
- the term also includes molecules comprising one or more N-substituted glycine residues (a "peptoid") and other synthetic amino acids or peptides.
- amino acids are generally divided into four families: (1) acidic -- aspartate and glutamate; (2) basic -- lysine, arginine, histidine; (3) non-polar -- alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan; and (4) uncharged polar -- glycine, asparagine, glutamine, cysteine, serine, threonine, and tyrosine. Phenylalanine, tryptophan, and tyrosine are sometimes classified as aromatic amino acids.
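The four families listed above can be encoded as a lookup table, for example to flag whether a substitution stays within one family. The sketch assumes, as a labeled convention rather than a statement from this section, that a substitution within a single family is treated as conservative; the one-letter residue codes are standard, while the function names are hypothetical.

```python
# The four families from the text, written with one-letter residue codes.
FAMILIES = {
    "acidic": set("DE"),                 # aspartate, glutamate
    "basic": set("KRH"),                 # lysine, arginine, histidine
    "non-polar": set("AVLIPFMW"),        # Ala, Val, Leu, Ile, Pro, Phe, Met, Trp
    "uncharged polar": set("GNQCSTY"),   # Gly, Asn, Gln, Cys, Ser, Thr, Tyr
}


def family_of(residue: str) -> str:
    """Return the family name for a one-letter amino acid code."""
    for name, members in FAMILIES.items():
        if residue.upper() in members:
            return name
    raise ValueError(f"unknown amino acid code: {residue!r}")


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall in the same family (assumed convention)."""
    return family_of(original) == family_of(replacement)


if __name__ == "__main__":
    print(is_conservative("D", "E"))  # aspartate -> glutamate
    print(is_conservative("K", "D"))  # lysine -> aspartate
```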
- the polypeptide of interest may include up to about 5-10 conservative or non-conservative amino acid substitutions, or even up to about 15-25 conservative or non-conservative amino acid substitutions, or any integer between 5-25, so long as the desired function of the molecule remains intact.
- Gene transfer or “gene delivery” refers to methods or systems for reliably inserting DNA or RNA of interest into a host cell. Such methods can result in transient expression of non-integrated transferred DNA, extrachromosomal replication and expression of transferred replicons (e.g., episomes), or integration of transferred genetic material into the genomic DNA of host cells.
- Gene delivery expression vectors include, but are not limited to, vectors derived from bacterial plasmid vectors, viral vectors, non-viral vectors, adenoviruses, retroviruses, alphaviruses, pox viruses, and vaccinia viruses.
- the term "derived from” is used herein to identify the original source of a molecule but is not meant to limit the method by which the molecule is made which can be, for example, by chemical synthesis or recombinant means.
- a polynucleotide "derived from” a designated sequence refers to a polynucleotide sequence which comprises a contiguous sequence of approximately at least about 6 nucleotides, preferably at least about 8 nucleotides, more preferably at least about 10-12 nucleotides, and even more preferably at least about 15-20 nucleotides corresponding, i.e., identical or complementary to, a region of the designated nucleotide sequence.
- the derived polynucleotide will not necessarily be derived physically from the nucleotide sequence of interest, but may be generated in any manner, including, but not limited to, chemical synthesis, replication, reverse transcription or transcription, which is based on the information provided by the sequence of bases in the region(s) from which the polynucleotide is derived. As such, it may represent either a sense or an antisense orientation of the original polynucleotide.
- subject includes both vertebrates and invertebrates, including, without limitation, mammals, including human and non-human mammals such as non-human primates, including chimpanzees and other apes and monkey species; laboratory animals such as mice, rats, rabbits, hamsters, guinea pigs, and chinchillas; domestic animals such as dogs and cats; farm animals such as sheep, goats, pigs, horses and cows; and birds such as domestic, wild and game birds, including chickens, turkeys and other gallinaceous birds, ducks, geese, and the like.
- the methods of the present disclosure find use in experimental animals, in veterinary application, and in the development of animal models for disease, including, but not limited to, rodents including mice, rats, and hamsters; primates, and transgenic animals.
- the terms “subject,” “individual,” and “patient” are used interchangeably herein to refer to a vertebrate, such as a mammal. Mammals include, but are not limited to, murines, simians, humans, farm animals, sport animals, and pets. Tissues, cells and their progeny of a biological entity obtained in vivo or cultured in vitro are also encompassed.
- Genetic disease refers to a disease, partially or completely, directly or indirectly, caused by one or more abnormalities in the genome, especially a condition that is present from birth.
- the abnormality may be a mutation, an insertion or a deletion.
- the abnormality may affect the coding sequence of the gene or its regulatory sequence.
- the genetic disease may be selected from the group consisting of an inherited muscle disease (e.g., congenital myopathy or a muscular dystrophy), a lysosomal storage disease, a heritable disorder of connective tissue, a neurodegenerative disorder, and a skeletal dysplasia.
- the genetic disease may be, but is not limited to, Duchenne muscular dystrophy (DMD), Becker's muscular dystrophy, Limb-girdle muscular dystrophy, dysferlinopathy, dystroglycanopathy, aspartylglucosaminuria, Batten disease, cystinosis, Fabry disease, Gaucher disease, Pompe disease, Tay Sachs disease, Sandhoff disease, metachromatic leukodystrophy, mucolipidosis, mucopolysaccharide storage diseases, Niemann-Pick disease, Schindler disease, Krabbe disease, Ehlers-Danlos syndrome, epidermolysis bullosa, Marfan syndrome, neurofibromatosis, spinal muscular atrophy, amyotrophic lateral sclerosis, progressive muscular atrophy, fragile X syndrome, Charcot-Marie-Tooth disease, osteogenesis imperfecta, achondroplasia, or osteopetrosis.
- ribozyme refers to an RNA molecule that is capable of catalyzing a biochemical reaction.
- ribozymes function in protein synthesis, catalyzing the linking of amino acids in the ribosome.
- ribozymes participate in various other RNA processing functions, such as splicing, viral replication, and tRNA biosynthesis.
- ribozymes can be self-cleaving.
- Non-limiting examples of ribozymes include the HDV ribozyme, the Lariat capping ribozyme (formerly called GIR1 branching ribozyme), the glmS ribozyme, group I and group II self-splicing introns, the hairpin ribozyme, the hammerhead ribozyme, various rRNA molecules, RNase P, the twister ribozyme, the VS ribozyme, the pistol ribozyme, and the hatchet ribozyme.
- further examples include the self-cleaving ribozyme-containing R2 elements, the L1Tc retrotransposon found in Trypanosoma cruzi, short interspersed nuclear elements (SINEs) in schistosomes, Penelope-like elements, and retrozymes.
- For ribozymes, see, e.g., Doherty, et al. Ann. Rev. Biophys. Biomol. Struct. 30: 457-475 (2001) and Weinberg, et al., Nucleic Acids Research, (47) 18: 9480–9494 (2019); each incorporated herein by reference in its entirety for all purposes.
- administering includes oral administration, topical contact, administration as a suppository, intravenous, intraperitoneal, intramuscular, intralesional, intrathecal, intranasal, or subcutaneous administration to a subject. Administration is by any route, including parenteral and transmucosal (e.g., buccal, sublingual, palatal, gingival, nasal, vaginal, rectal, or transdermal). Parenteral administration includes, e.g., intravenous, intramuscular, intra-arteriole, intradermal, subcutaneous, intraperitoneal, intraventricular, and intracranial.
- Administering also refers to delivery of material, including biological material such as nucleic acids and/or proteins, into cells by transformation, transfection, transduction, ballistic methods, electroporation, or injection (e.g., intravenous, intramuscular, intra-arteriole, intradermal, subcutaneous, intraperitoneal, intraventricular, and intracranial injection).
- treating refers to an approach for obtaining beneficial or desired results including, but not limited to, a therapeutic benefit and/or a prophylactic benefit.
- By therapeutic benefit is meant any therapeutically relevant improvement in or effect on one or more diseases, conditions, or symptoms under treatment.
- compositions may be administered to a subject at risk of developing a particular disease, condition, or symptom, or to a subject reporting one or more of the physiological symptoms of a disease, even though the disease, condition, or symptom may not have yet been manifested.
- effective amount or “sufficient amount” refers to the amount of an agent that is sufficient to effect beneficial or desired results.
- the therapeutically effective amount may vary depending upon one or more of: the subject and disease condition being treated, the weight and age of the subject, the severity of the disease condition, the manner of administration and the like, which can readily be determined by one of ordinary skill in the art.
- the specific amount may vary depending on one or more of: the particular agent chosen, the host cell type, the location of the host cell in the subject, the dosing regimen to be followed, whether it is administered in combination with other compounds, timing of administration, and the physical delivery system in which it is carried.
- pharmaceutically acceptable carrier refers to a substance that aids the administration of an active agent to a cell, an organism, or a subject.
- “Pharmaceutically acceptable carrier” refers to a carrier or excipient that can be included in the compositions of the invention and that causes no significant adverse toxicological effect on the patient.
- heterologous refers to biological material that is introduced, inserted, or incorporated into a recipient (e.g., host) organism that originates from another organism. Typically, the heterologous material that is introduced into the recipient organism (e.g., a host cell) is not normally found in that organism.
- Heterologous material can include, but is not limited to, nucleic acids, amino acids, peptides, proteins, and structural elements such as genes, promoters, and cassettes.
- a host cell can be, but is not limited to, a bacterium, a yeast cell, a mammalian cell, or a plant cell.
- the introduction of heterologous material into a host cell or organism can result, in some instances, in the expression of additional heterologous material in or by the host cell or organism.
- the transformation of a yeast host cell with an expression vector that contains DNA sequences encoding a bacterial protein may result in the expression of the bacterial protein by the yeast cell.
- the incorporation of heterologous material may be permanent or transient.
- Methods for editing DNA in a cell
- Most biological traits are complex and controlled by genetic variants across the genome. To fully unravel their genetic architectures, methods that can systematically dissect the causal variants in each locus in a comprehensive and scalable fashion are preferred. Towards this goal, the inventors previously developed a donor DNA-based CRISPR system for engineering and functionally profiling thousands of genetic variants in pooled screens, termed Multiplexed Accurate Genome Editing with Short, Trackable, Integrated Cellular barcodes (MAGESTIC).
- MAGESTIC 3.0 combines three orthogonal enhancements for homology-directed repair (HDR): donor DNA recruitment with a DNA break-site binding domain, single-stranded donor DNA synthesis with the bacterial retron system, and in vivo assembly of linearized donor plasmids.
- Each system functions at a different stage in the editing process to improve editing outcomes by increasing the fraction of correctly edited cells and reducing the fraction of non-edited and aberrantly edited cells.
- the retron produces multiple copies of single-stranded DNA (ssDNA), which accumulate in the cell and improve editing outcomes over multiple generations.
- the linearized guide-donor plasmids provide an optimal donor DNA template immediately upon transformation, prior to the buildup of retron template in the cell, thereby enhancing editing survival for guides with higher cleavage efficacy.
- Donor recruitment brings either the double-stranded or single-stranded DNA templates in close proximity to the target site to improve HDR efficiency.
- MAGESTIC 3.0 improves editing efficiency to the highest overall levels of any system, nearly completely inhibiting both the toxicity associated with editing and structural variant formation at susceptible target sites.
- the target locus is in the genome of a cell.
- the methods provide the advantage of combining in vivo plasmid assembly, donor recruitment, and retron donor DNA generation.
- the inventors demonstrate that each editing system improves edit outcomes at different sites in different ways.
- the integration of all three systems results in substantially improved editing at all sites measured and in multiple distinct assays, enabling effective editing of structural variant prone regions and saturation editing across entire genomic loci for the first time.
- the disclosure provides a method for increasing editing efficiency, fidelity, and/or survival of an edited cell, the method comprising introducing into a cell: i) a linear double stranded polynucleotide comprising a double stranded DNA (dsDNA) donor sequence and a nucleic acid sequence encoding a guide RNA operably linked to a first promoter; and ii) a linear double stranded polynucleotide; wherein the linear double stranded polynucleotide of (ii) is linked in vivo to the linear double stranded polynucleotide of (i) by homology directed repair (HDR) or non-homologous end joining (NHEJ) to produce a circular donor-gRNA (or guide-donor) plasmid; wherein the cell comprises a CRISPR-associated (Cas) endonuclease or a nucleic acid sequence encoding the Cas endonuclea
- the linear double stranded polynucleotide of (ii) is linked in vivo to the linear double stranded polynucleotide of (i) by homology directed repair (HDR).
- plasmids harboring guide and donor cassettes are first linearized with a restriction enzyme, such as but not limited to HindIII or I-SceI, although any restriction enzyme site can be used.
- the plasmid fragment can be amplified by PCR.
- the linear double stranded polynucleotide of (i) is referred to as the guide-donor backbone (or the donor-gRNA backbone).
- a separate piece of the plasmid, the linear double stranded polynucleotide of (ii), also referred to as the insert, can similarly be produced by restriction digestion or by PCR amplification.
- the insert comprises a selectable marker required for cell growth.
- the ends of the linear double stranded polynucleotide of (i) (or the guide-donor backbone, or the donor-gRNA backbone) and the ends of the linear double stranded polynucleotide of (ii) (or the insert) overlap with sufficient homology for repair by HDR.
- the region of overlap comprises at least 20 base pairs, such as 20, 30, 40, or 50 base pairs, or hundreds to thousands of base pairs.
- the linear double stranded polynucleotide of (i) (or the guide-donor backbone, or the donor-gRNA backbone) and the linear double stranded polynucleotide of (ii) (or the insert) are then transformed into cells, which reconstitute the circular vectors by HDR.
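- As an illustration of the overlap requirement described above, the following minimal Python sketch checks whether a backbone and an insert share enough terminal sequence at both junctions to be closed into a circle. The function names, the toy sequences, and the use of exact string identity as a stand-in for homology are assumptions made for illustration only and are not part of the disclosure.

```python
def shared_end_overlap(upstream: str, downstream: str) -> int:
    """Length of the longest suffix of `upstream` that is identical to a
    prefix of `downstream` (a simple proxy for terminal homology)."""
    upstream, downstream = upstream.upper(), downstream.upper()
    for length in range(min(len(upstream), len(downstream)), 0, -1):
        if upstream[-length:] == downstream[:length]:
            return length
    return 0


def can_assemble_by_hdr(backbone: str, insert: str, min_overlap: int = 20) -> bool:
    """True if both junctions (backbone -> insert and insert -> backbone)
    share at least `min_overlap` identical bases, so that the two linear
    fragments could in principle be joined into a circle."""
    return (
        shared_end_overlap(backbone, insert) >= min_overlap
        and shared_end_overlap(insert, backbone) >= min_overlap
    )


if __name__ == "__main__":
    # Hypothetical 20-base homology arms and filler sequences.
    arm_a = "ACGTTGCAAGCTTGGATCCA"
    arm_b = "TTGACCGGTAACGCTAGCTA"
    backbone = arm_b + "GGGCCCAAATTTGGGCCC" + arm_a
    insert = arm_a + "ATGAAACCCGGGTTTTAA" + arm_b
    print(can_assemble_by_hdr(backbone, insert))
```

- The default threshold of 20 mirrors the minimum overlap stated above; a longer threshold can be passed through `min_overlap` when longer homology is designed into the fragments.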
- the linear double stranded polynucleotide of (ii) is linked in vivo to the linear recombinant double stranded polynucleotide of (i) by non-homologous end joining (NHEJ).
- the guide-donor plasmid (or the donor-gRNA plasmid) is cleaved by one or two restriction enzymes that leave sticky end overhangs.
- the sticky end overhangs are incompatible overhangs (i.e., do not hybridize to each other) to prevent ligation of the linear double stranded polynucleotide of (i) (or the guide-donor backbone, or the donor-gRNA backbone).
- the linear double stranded polynucleotide of (ii) (or the insert) can similarly be produced by restriction digestion of the vector or by PCR amplification followed by restriction digestion.
- the linear double stranded polynucleotide of (ii) (or the insert) comprises a selectable marker required for cell growth.
- the overhangs generated on both ends of the insert are incompatible for self-ligation, and only compatible for ligation with the guide-donor backbone.
- the linear double stranded polynucleotide of (i) (or guide-donor backbone, or the donor-gRNA backbone) and the linear double stranded polynucleotide of (ii) (or the insert) are then transformed into cells which reconstitute the circular vectors by ligation via non-homologous end joining (NHEJ).
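- The overhang design rules above (no self-ligation of either fragment, ligation only between backbone and insert) can be sketched as a small compatibility check. This is a minimal model under stated assumptions: each overhang is written 5'->3' for the protruding strand, all overhangs have the same polarity, and the four-base overhangs in the example are hypothetical rather than taken from any particular restriction enzyme.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def overhangs_anneal(a: str, b: str) -> bool:
    """Two same-polarity overhangs (each written 5'->3') can base-pair
    when one is the reverse complement of the other."""
    return a.upper() == reverse_complement(b)


def fragment_self_ligates(left: str, right: str) -> bool:
    """A fragment can self-ligate if its two ends anneal to each other, or
    if either end is palindromic and so pairs with another copy of itself."""
    return (
        overhangs_anneal(left, right)
        or overhangs_anneal(left, left)
        or overhangs_anneal(right, right)
    )


def directional_assembly_ok(backbone_ends, insert_ends) -> bool:
    """True if neither fragment can self-ligate and both backbone/insert
    junctions have annealing overhangs."""
    b_left, b_right = backbone_ends
    i_left, i_right = insert_ends
    no_self_ligation = not fragment_self_ligates(b_left, b_right) and not (
        fragment_self_ligates(i_left, i_right)
    )
    junctions_pair = overhangs_anneal(b_right, i_left) and overhangs_anneal(
        i_right, b_left
    )
    return no_self_ligation and junctions_pair


if __name__ == "__main__":
    # Hypothetical overhangs: (left end, right end) of each fragment.
    print(directional_assembly_ok(("ACCG", "GTTC"), ("GAAC", "CGGT")))
```

- A palindromic overhang such as AATT is rejected by `fragment_self_ligates`, because it would pair with the same end of another copy of the fragment.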
- the linear double stranded polynucleotide of (ii) comprises a selectable marker. In some embodiments, the linear double stranded polynucleotide of (ii) further comprises a barcode marker. In some embodiments, the barcode sequence integrates into a designated barcode locus in the host cell genome. In some embodiments, such integration is inducible. In some embodiments, the integration of the barcode sequence into the barcode locus happens after the designated barcode locus is cleaved by a recombinant endonuclease. In some embodiments, a recombinant nucleic acid sequence encoding the endonuclease is operably linked to an inducible promoter.
- the integration of the barcode sequence into the barcode locus in the host cell genome is inducible.
- the method further comprises introducing into the cell a nucleic acid sequence encoding a retron structured non-coding RNA (ncRNA) operably linked to a second promoter, the retron ncRNA comprising: one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences; an msr sequence; an msd sequence; a donor sequence for homology directed repair (HDR) inserted within the msd sequence; and a first inverted repeat sequence and a second inverted repeat sequence.
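- The components listed above can be represented as a simple record, which is sometimes convenient when designing constructs programmatically. The sketch below is an assumption-laden illustration: the class name, the field names, the 5'->3' order in which the parts are concatenated, and the toy sequences are all hypothetical, since the text above lists the components without fixing their arrangement.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RetronNcRnaTemplate:
    """Hypothetical container for the parts of a retron ncRNA template."""

    inverted_repeat_1: str
    recognition_sequences: Tuple[str, ...]  # e.g. one or more MS2 stem loops
    msr: str
    msd: str
    donor: str
    donor_position: int  # index within msd where the donor is inserted
    inverted_repeat_2: str

    def msd_with_donor(self) -> str:
        """Return the msd sequence with the donor inserted inside it."""
        if not 0 < self.donor_position < len(self.msd):
            raise ValueError("donor must be inserted within the msd sequence")
        pos = self.donor_position
        return self.msd[:pos] + self.donor + self.msd[pos:]

    def assemble(self) -> str:
        """Concatenate the parts in one assumed order (illustrative only)."""
        return "".join(
            [
                self.inverted_repeat_1,
                *self.recognition_sequences,
                self.msr,
                self.msd_with_donor(),
                self.inverted_repeat_2,
            ]
        )


if __name__ == "__main__":
    template = RetronNcRnaTemplate("AAAA", ("GGG",), "CC", "TTTT", "ACGT", 2, "TTTT")
    print(template.assemble())
```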
- the retron optionally comprises a stabilizing 5′ ribozyme sequence.
- the cell further comprises a retron-specific reverse transcriptase (RT) or a nucleic acid encoding the retron RT, and the retron RT and the retron ncRNA generate multicopy single-stranded DNA (msDNA) containing the single-stranded donor sequences (also referred to as ssDNA retron donor sequences).
- the RNA binding domain recognition sequence is an RNA sequence specifically bound by an RNA binding domain of a polypeptide or an aptamer.
- RNA binding domain recognition sequences that bind polypeptide RNA binding domains include, but are not limited to, MS2 stem loop sequence which binds to the MS2 coat protein (MCP), a Pumilio (PUF) recognition sequence, RNA Recognition Motif (RRM) recognition sequence, Double-Stranded RNA-Binding Domain (dsRBD) recognition sequence, Zinc finger (ZF) Domain recognition sequences, Z-alpha, arginine/glycine rich (RGG) domain recognition sequences, a K Homology (KH) Domain recognition sequence, or Poly(A) tail.
- an exemplary MS2 coat protein is a bacteriophage MS2 coat protein (see, for example UniProtKB - J9QBW2 (J9QBW2_BPMS2) and UniProtKB - P03612 (CAPSD_BPMS2)).
- the one or more RNA binding domain recognition sequences comprises a stem loop sequence from the bacteriophage MS2.
- the RNA binding domain recognition sequence is a MS2 stem loop sequence.
- the MS2 stem loop sequence binds to a MS2 coat protein (MCP) RNA binding domain.
- the single stranded nucleic acid binding domain recognition sequence is a single stranded DNA or RNA sequence specifically bound by a single stranded nucleic acid binding domain of a polypeptide or an aptamer.
- Non-limiting examples of single stranded nucleic acid binding domain recognition sequences are described in Dickey et al., “Single-Stranded DNA-Binding Proteins: Multiple Domains for Multiple Functions,” Structure 21(7), pgs 1074-1084, July 2, 2013, and references cited therein. As described in Dickey et al., single stranded nucleic acid binding domains include oligonucleotide/oligosaccharide/oligopeptide-binding (OB) folds, K homology (KH) domains, RNA recognition motifs (RRMs), and whirly domains.
- OB folds are formed from a five-stranded β barrel with interspersed loop and helical elements, show significant structural divergence, and are capable of binding a variety of ligands in addition to ssDNA and ssRNA (Theobald et al., 2003).
- OB folds can bind ssDNA with high sequence specificity.
- telomere-end protection (TEP) proteins utilize OB folds to sequence-specifically bind the GT-rich 3′ ssDNA overhang constitutively found at the end of eukaryotic telomeres (reviewed in Horvath, 2011; Lewis and Wuttke, 2012).
- KH domains are small domains (approximately 70 aa) characterized by three α helices packed against a three-stranded β sheet (reviewed in Valverde et al., 2008), and KH domains from proteins structurally characterized in complex with ssDNA include heterogeneous ribonucleoprotein K (hnRNP K), far upstream element (FUSE)-binding protein (FBP), and poly(C)-binding proteins (PCBP) 1 and 2.
- RRMs most often bind RNA, but have also been shown to bind ssDNA (reviewed in Cléry et al., 2008).
- RRMs are typically about 90 aa in length and form a relatively large β sheet surface (more similar to OB folds than to KH domains) packed against two α helices.
- the majority of RRMs contain two conserved sequence motifs (RNPs) on strands 1 and 3 that form the primary nucleic acid-binding surface. Residues found elsewhere in the sheet (sometimes including an additional strand) and intervening loops also contribute to nucleic acid binding.
- Whirly domains are large (approximately 180 aa) domains that contain two roughly parallel four-stranded β sheets with interspersed helical elements. Individual domains form tetramers through interaction of the helices, and these tetramers further interact to form hexamers of tetramers (Cappadocia et al., 2010, 2012). See Dickey et al., “Single-Stranded DNA-Binding Proteins: Multiple Domains for Multiple Functions,” Structure 21(7), pgs 1074-1084, July 2, 2013, and references cited therein.
- the single stranded nucleic acid binding domain recognition sequence binds to a Cas endonuclease binding domain.
- the one or more single stranded nucleic acid binding domain recognition sequences include, but are not limited to, oligonucleotide/oligosaccharide/oligopeptide-binding (OB) folds, such as in human POT1, Schizosaccharomyces pombe Pot1, Sterkiella nova TEPB, Cdc13, CspB protein from Bacillus caldolyticus and Bacillus subtilis; K homology (KH) domains, such as in KH domain-containing proteins heterogeneous ribonucleoprotein K (hnRNP K), far upstream element (FUSE)-binding protein (FBP), and poly(C)-binding proteins (PCBP) 1 and 2; RNA recognition motifs (RRMs) which bind DNA such as in FBP-interacting repressor (FIR), hnRNP A1, and hnRNP D (also known as Auf1); and whirly domains such as in the mitochondria
- the single stranded nucleic acid binding domain is a single stranded nucleic acid binding domain of a G-quadruplex binding domain including nucleolin, hnRNP, serine/arginine-rich splicing factors (SRSF) 1 and 9, splicing factor U2AF, TRF2, FRM2, and the RNA helicase associated with AU-rich element (RHAU) proteins (see V. Brázda et al., DNA and RNA quadruplex-binding proteins. Int J Mol Sci. 2014;15(10):17493-17517. doi:10.3390/ijms151017493).
- the single stranded nucleic acid binding domain is a single stranded nucleic acid binding domain of a Cas endonuclease.
- chimeric constructs encoding a retron multicopy single-stranded DNA (msDNA), which comprises an msr RNA covalently attached to an msd DNA, wherein the RNA comprises one or more RNA binding domain recognition sequences and an msr sequence; and wherein the DNA comprises an msd sequence and a subject expression sequence inserted within the msd sequence.
- the subject expression sequence comprises a donor sequence for homology directed repair (HDR).
- RNA binding domain is an RNA binding domain of a polypeptide that binds to a MS2 stem loop sequence which binds to the MS2 coat protein (MCP), a Pumilio (PUF) recognition sequence, an RNA Recognition Motif (RRM) recognition sequence, a Double-Stranded RNA-Binding Domain (dsRBD) recognition sequence, a Zinc finger (ZF) Domain recognition sequence, a Z-alpha, arginine/glycine rich (RGG) domain recognition sequence, a K Homology (KH) Domain recognition sequence, or a Poly(A) tail.
- the single stranded nucleic acid binding domain is a single stranded nucleic acid binding domain of a polypeptide that binds to a specific sequence of a single stranded DNA or RNA, such as a Cas endonuclease binding domain.
- Single stranded nucleic acid binding domain recognition domains of polypeptides include, but are not limited to, oligonucleotide/oligosaccharide/oligopeptide-binding (OB) folds, such as in human POT1, Schizosaccharomyces pombe Pot1, Sterkiella nova TEPB, CspB protein from Bacillus caldolyticus and Bacillus subtilis; K homology (KH) domains, such as in KH domain-containing proteins heterogeneous ribonucleoprotein K (hnRNP K), far upstream element (FUSE)-binding protein (FBP), and poly(C)-binding proteins (PCBP) 1 and 2; RNA recognition motifs (RRMs) which bind DNA such as in FBP-interacting repressor (FIR), hnRNP A1, and hnRNP D (also known as Auf1); and whirly domains such as in the mitochondrial whirly protein Why2 and the ma
- RNA binding proteins with well-characterized motifs can be utilized for recruiting the retron msDNA.
- an inverted LexA-LexA repeat with an intervening loop sequence could be inserted into the reverse-transcribed portion of the retron donor. Upon reverse transcription these inverted repeats would fold back on one another creating a highly stable stem loop structure and enable the LexA DNA binding domain to be utilized.
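- The fold-back behavior described above can be illustrated with a short Python sketch that builds an inverted repeat around a loop and counts the base pairs formed when the single strand folds on itself. The function names, the 10-base placeholder site, and the TTTT loop are hypothetical; the placeholder is not a LexA operator sequence, and the model considers only perfect Watson-Crick pairing.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def inverted_repeat_cassette(site: str, loop: str) -> str:
    """Binding site, intervening loop, then the site's reverse complement."""
    return site.upper() + loop.upper() + reverse_complement(site)


def stem_length(cassette: str, loop_length: int) -> int:
    """Number of consecutive base pairs formed, counted outward from the
    loop, when the strand folds back on itself."""
    arm_length = (len(cassette) - loop_length) // 2
    left_arm = cassette[:arm_length]
    right_arm = cassette[arm_length + loop_length:]
    paired = 0
    for base, partner in zip(reversed(left_arm), right_arm):
        if base.translate(_COMPLEMENT) != partner:
            break
        paired += 1
    return paired


if __name__ == "__main__":
    # Placeholder site; substitute the recognition sequence of interest.
    cassette = inverted_repeat_cassette("GATTACAGGC", "TTTT")
    print(cassette, stem_length(cassette, loop_length=4))
```

- In this simple model, a full-length stem means the folded single strand presents the site as a double-stranded segment, which is the property a double-stranded DNA binding domain would require.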
- the FHA domain could be replaced with other domains known to bind to double-strand breaks, or the MCP could be fused directly to Cas9 to have retron donor present at the cut site when Cas9 cleavage occurs.
- RNA binding domains and aptamers could be used in place of the MS2 system such as the programmable RNA-binding domains of Pumilio/fem-3 mRNA binding factors (PUF domains) (Zhao et al., Nucleic Acids Research, 2018 PMCID: PMC5961129) or using CRISPR-Cas systems, where the scaffold for a deactivated Cas nuclease could be introduced in place of MS2 loops, and the deactivated Cas enzyme fused to the FHA domain.
- the DNA break site localizing domain is a DNA break site localizing domain of a polypeptide listed in Tables 1-5 below. Table 1. Human Proteins for Recruitment to DNA Break Table 2.
- the method further comprises introducing into the cell a fusion protein comprising a) a DNA binding domain, an RNA binding domain and/or single stranded nucleic acid binding domain and b) a dsDNA break site-localizing domain, or a nucleic acid encoding the fusion protein, wherein the fusion protein binds to the DNA binding domain recognition sequence of the circular donor-gRNA plasmid or the first linear double stranded polynucleotide prior to assembly, and/or the one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences of the retron, and binds to a dsDNA break site generated by the Cas endonuclease, thereby recruiting the dsDNA donor sequences and/or the ssDNA retron donor
- the fusion protein comprises a DNA binding domain and a dsDNA break site-localizing domain.
- such fusion protein can bind to the DNA binding domain recognition sequence of the circular donor-gRNA plasmid or the first linear double stranded polynucleotide prior to assembly and the dsDNA break site in the genome of the cell, thereby recruiting the dsDNA donor sequences to the dsDNA break in the genome of the cell and promoting editing by HDR.
- the fusion protein comprises an RNA binding domain or single stranded nucleic acid binding domain and a dsDNA break site-localizing domain.
- such fusion protein can bind to the RNA binding domain or single stranded nucleic acid binding domain of the retron.
- a retron RNA expressed by the nucleic acid sequence encoding the retron is reverse transcribed by the RT in vivo to produce multiple ssDNA molecules.
- the fusion protein binds to the RNA binding domain or single stranded nucleic acid binding domain of the individual ssDNA retron molecules to produce a complex between the linear ssDNA molecules, the fusion protein, and the double strand DNA break site, thereby recruiting the ssDNA retron donor sequences to the dsDNA break in the genome of the cell and promoting editing by HDR.
- the fusion protein comprises a) a DNA binding domain, b) an RNA binding domain or single stranded nucleic acid binding domain, and c) a dsDNA break site-localizing domain.
- such fusion protein can bind to i) the circular donor-gRNA plasmid or the first linear double stranded polynucleotide prior to assembly, ii) the individual ssDNA retron molecules, and iii) the dsDNA break site in the genome of the cell, thereby recruiting the dsDNA donor sequences and the ssDNA retron donor sequences to the dsDNA break in the genome of the cell and promoting editing by HDR.
- the cell expresses a Cas endonuclease or a nucleic acid encoding the same, and a reverse transcriptase (RT) or a nucleic acid encoding the same.
- the single stranded nucleic acid binding domain is a single stranded nucleic acid binding domain of a Cas endonuclease.
- the fusion protein binds to the one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences and also binds to a double strand DNA break site generated by a Cas endonuclease.
- retron RNA expressed by the nucleic acid sequence encoding a retron is reverse transcribed by the RT in vivo to produce multiple single stranded DNA molecules, wherein individual single stranded DNA molecules bind to the fusion protein to produce a complex between the linear DNA molecules, the fusion protein, and the double strand DNA break site, thereby recruiting the retron donor sequence to the DNA break and promoting editing by HDR.
- the locus surrounding the double strand DNA break site accumulates phosphothreonine (pT) modified proteins recognized by forkhead-associated (FHA) domains.
- the RNA binding domain of the fusion protein comprises an RNA binding domain of MS2 coat protein (MCP) and the DNA break site localizing domain of the fusion protein comprises a forkhead-associated (FHA) domain.
- the fusion protein comprises a MS2 coat protein (MCP) RNA binding domain and a forkhead-associated (FHA) phosphothreonine-binding domain.
- the linear double stranded donor recruitment polynucleotide of (i) further comprises a site for a nucleic acid binding domain.
- the nucleic acid binding domain is a LexA DNA binding domain or a forkhead homolog 1 (FKH1) DNA binding domain.
- the fusion protein further comprises a LexA DNA binding domain or an FKH1 DNA binding domain.
- the fusion protein comprises three domains selected from a DNA binding domain, an RNA binding domain, and a DNA break site localizing domain.
- the fusion protein comprises (i) a LexA DNA binding domain (ii) an RNA binding domain of MCP, and (iii) an FHA DNA break site localizing domain.
- the domains in the fusion protein are arranged in one of the following configurations: (i), (ii), (iii); (i), (iii), (ii); (ii), (i), (iii); (ii), (iii), (i); (iii), (i), (ii); and (iii), (ii), (i).
- the fusion protein comprises (i) the FKH1 DNA binding domain (ii) the RNA binding domain of MCP, and (iii) an FHA DNA break site localizing domain.
- the domains in the fusion protein are arranged in one of the following configurations: (i), (ii), (iii); (i), (iii), (ii); (ii), (i), (iii); (ii), (iii), (i); (iii), (i), (ii); and (iii), (ii), (i).
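- Three distinct domains can be ordered in exactly six ways, and the configurations can be enumerated mechanically rather than by hand. The following minimal sketch does so with the Python standard library; the dictionary of domain labels and the helper names are illustrative assumptions.

```python
from itertools import permutations

# Labels follow the (i)/(ii)/(iii) numbering used in the text above.
DOMAINS = {
    "i": "DNA binding domain (LexA or FKH1)",
    "ii": "RNA binding domain of MCP",
    "iii": "FHA DNA break site localizing domain",
}


def domain_orders(labels=("i", "ii", "iii")):
    """All orderings of the given domain labels, each as a tuple."""
    return list(permutations(labels))


def format_orders(orders) -> str:
    """Render orderings in the '(i), (ii), (iii); ...' style used above."""
    return "; ".join(", ".join(f"({label})" for label in order) for order in orders)


if __name__ == "__main__":
    orders = domain_orders()
    print(len(orders))
    print(format_orders(orders))
    for order in orders:
        # Spell out each arrangement with the domain names.
        print(" - ".join(DOMAINS[label] for label in order))
```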
- the fusion protein forms a complex with the circular plasmid or linearized donor-gRNA backbone prior to assembly and the double strand DNA break site, thereby recruiting the circular plasmid or linearized donor backbone to the DNA break and enhancing HDR.
- the donor recruitment polynucleotide comprises nucleotide sequences complementary to sequences adjacent to the DNA break.
- the sequences complementary to sequences adjacent to the DNA break comprise the same sequences that are transcribed from the retron donor in the retron RNA.
- the first promoter is a constitutive or inducible promoter.
- the second promoter is a constitutive or inducible promoter.
- the Cas endonuclease is a Cas9 or Cas12a endonuclease, or a modified variant thereof.
- the cell is a eukaryotic cell.
- the eukaryotic cell is selected from a yeast cell, a vertebrate cell, or a mammalian cell.
- the linear double stranded polynucleotide of (ii) further comprises a barcode sequence.
- integration of the barcode sequence into a barcode locus in the host cell genome is inducible. It will be understood that the barcode locus can be a predetermined or designated locus that is different than the edited target site locus (the edited locus).
- integration of the barcode sequence into the barcode locus comprises cleavage of the barcode locus genomic DNA by an endonuclease, wherein the endonuclease is operably linked to an inducible promoter.
- the endonuclease is a Cas endonuclease or a homing endonuclease.
- the homing endonuclease is an I-SceI endonuclease operably linked to a GAL1 promoter that is inducible by galactose.
- the circular donor recruitment plasmid comprises sequences that are homologous to sequences flanking the homing endonuclease cleavage site, such that homologous recombination results in integration of the barcode sequence into the barcode locus.
- the barcode locus in the host cell genome comprises nucleic acid sequences encoding the Cas endonuclease used for editing the target site, or sequences encoding the reverse transcriptase (RT), and/or sequences encoding the fusion protein flanked by the homing endonuclease cleavage sites, wherein expression of the homing endonuclease results in removal of the nucleic acid sequences encoding the Cas endonuclease, or sequences encoding the reverse transcriptase (RT), and/or sequences encoding the fusion protein, concomitantly (in tandem or together) with integration of the barcode sequence.
- the disclosure provides a method for removing a plasmid which has integrated into an edited target locus in the genome of a cell.
- the plasmid is a guide-donor plasmid as disclosed herein.
- the guide-donor plasmid comprises: i) a nucleic acid sequence encoding a guide RNA operably linked to a first promoter; ii) a nucleic acid sequence encoding a homing endonuclease operably linked to a second promoter, wherein the second promoter is inducible; iii) a nucleic acid sequence encoding a Cas endonuclease operably linked to a third promoter, wherein the third promoter is inducible; iv) a nucleic acid sequence that is cleaved by the homing endonuclease; and/or v) a nucleic acid sequence that is cleaved by the Cas endonuclease.
- the method for removing the integrated plasmid from the genome of the cell comprises inducing expression of the homing endonuclease and/or the Cas endonuclease to cleave the integrated plasmid DNA, thereby removing the plasmid from the edited target locus.
- plasmid integration is accompanied by tandem repeat duplication of the donor sequence. In this case, cleavage of the integrated plasmid can result in recovery of the desired edit at the target locus by tandem repeat-mediated deletion of the integrated plasmid and one of the donor copies.
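- The tandem repeat-mediated deletion described above can be pictured with a string-level toy model. In the sketch below, integration places the plasmid between two copies of the donor, and resolution after cleavage keeps a single donor copy. The function names, the flank and donor sequences, and the treatment of recombination as simple string surgery are assumptions for illustration; the example cut site is the 18 bp I-SceI recognition sequence.

```python
I_SCEI_SITE = "TAGGGATAACAGGGTAAT"  # used here only as an example cut site


def integrate_with_duplication(left_flank, right_flank, donor, plasmid):
    """Locus after plasmid integration with tandem duplication of the donor."""
    return left_flank + donor + plasmid + donor + right_flank


def resolve_integration(locus: str, donor: str, cut_site: str) -> str:
    """Model repeat-mediated deletion: if the cut site lies between two donor
    copies, remove the first copy and the intervening plasmid sequence."""
    first = locus.find(donor)
    second = locus.find(donor, first + len(donor)) if first != -1 else -1
    if second == -1:
        raise ValueError("locus does not contain two copies of the donor")
    if cut_site not in locus[first + len(donor):second]:
        raise ValueError("cut site is not located between the donor copies")
    return locus[:first] + locus[second:]


if __name__ == "__main__":
    # Hypothetical flanks, donor, and plasmid filler sequences.
    left, right, donor = "AAAC", "CTTT", "GGTCTAGG"
    plasmid = "TTTT" + I_SCEI_SITE + "CCCC"
    integrated = integrate_with_duplication(left, right, donor, plasmid)
    print(resolve_integration(integrated, donor, I_SCEI_SITE) == left + donor + right)
```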
- the homing endonuclease is an I-SceI endonuclease.
- the second promoter is inducible by galactose. In some embodiments, the second promoter is GAL1 promoter. In some embodiments, the Cas endonuclease is Cas9, or a modified variant thereof. In some embodiments, the Cas endonuclease is SaCas9, or a modified variant thereof. In some embodiments, the third promoter is inducible by tetracycline or anhydrotetracycline (aTc).
- the guide-donor plasmid further comprises a barcode sequence that is flanked by (iv) the nucleic acid sequence that is cleaved by the homing endonuclease and/or (v) the nucleic acid sequence that is cleaved by the Cas endonuclease.
- inducing expression of the homing endonuclease and/or the Cas endonuclease results in integration of the barcode sequence at a barcode locus, while simultaneously removing the guide-donor plasmid from the edited target locus.
- a method for multiplexed editing of DNA in cells comprising introducing into the cells: i) a guide RNA that binds a target site in the genomic DNA in the cells; ii) a library of linear double stranded donor polynucleotides comprising a nucleic acid sequence encoding a retron structured non-coding RNA (ncRNA) operably linked to a promoter, wherein the retron ncRNA comprises: one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences; an msr sequence; an msd sequence; a donor sequence for homology directed repair (HDR) inserted within the msd sequence, wherein each donor sequence in the library introduces a different edit at the target site that binds the guide RNA; and a first inverted repeat sequence and a second inverted repeat sequence; iii) a linear double stranded polynu
- the retron optionally further comprises a stabilizing 5′ ribozyme sequence.
- the linear double stranded polynucleotide of (iii) comprises a selectable marker.
- the guide RNA of (i) is physically linked to the linear double stranded donor polynucleotides of (ii). In some embodiments, the same guide RNA is physically linked to different linear double stranded donor recruitment polynucleotides present in the library of (ii).
- the same first guide RNA is physically linked to a first library of linear double stranded donor recruitment polynucleotides
- a different second guide RNA is physically linked to a second library of linear double stranded donor recruitment polynucleotides, and so on.
- the guide RNA and linear double stranded donor recruitment polynucleotides are synthesized as part of the same polynucleotide, and thus covalently linked.
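- One way to picture a library in which the same guide is linked to many donors, each introducing a different edit at the target site, is to enumerate every single-base substitution across a target window. The sketch below is a minimal illustration under stated assumptions: restricting edits to single-base substitutions, the function names, the optional linker, and the toy sequences are all hypothetical and are not prescribed by the text above.

```python
def saturation_donors(left_arm: str, target: str, right_arm: str):
    """Yield (position, reference_base, alternate_base, donor) for every
    single-base substitution in `target`, with homology arms attached."""
    left_arm, target, right_arm = left_arm.upper(), target.upper(), right_arm.upper()
    for position, reference in enumerate(target):
        for alternate in "ACGT":
            if alternate == reference:
                continue
            edited = target[:position] + alternate + target[position + 1:]
            yield position, reference, alternate, left_arm + edited + right_arm


def guide_donor_library(guide, left_arm, target, right_arm, linker=""):
    """Pair one guide sequence with every donor as a single polynucleotide,
    mirroring synthesis of guide and donor on the same molecule."""
    return [
        guide.upper() + linker.upper() + donor
        for _, _, _, donor in saturation_donors(left_arm, target, right_arm)
    ]


if __name__ == "__main__":
    # Toy sequences; real guides and homology arms would be much longer.
    library = guide_donor_library("GGGG", "AA", "ACG", "TT", linker="CC")
    print(len(library))
    for oligo in library:
        print(oligo)
```

- The library size in this model is three times the length of the target window, since each position has three alternate bases.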
- the RNA binding domain recognition sequence is a MS2 stem loop sequence.
- the MS2 stem loop sequence binds to a MS2 coat protein (MCP) binding domain.
- the single stranded nucleic acid binding domain is a single stranded nucleic acid binding domain of a Cas endonuclease.
- the locus surrounding the double-strand DNA break site accumulates phosphothreonine (pT) modified proteins recognized by forkhead-associated (FHA) domains.
- the RNA binding domain of the fusion comprises an RNA binding domain of MS2 coat protein (MCP) and the DNA break site localizing domain of the fusion protein comprises a forkhead-associated (FHA) domain.
- the fusion protein comprises a MS2 coat protein (MCP) binding domain and a forkhead-associated (FHA) phosphothreonine-binding domain.
- MCP MS2 coat protein
- FHA forkhead-associated phosphothreonine-binding domain
- the linear double stranded donor recruitment polynucleotides of (ii) further comprise a nucleic acid binding domain.
- the nucleic acid binding domain is a LexA DNA binding domain or a forkhead homolog 1 (FKH1) DNA binding domain.
- the fusion protein further comprises a LexA domain or FKH1 binding domain.
- the fusion protein comprises three domains selected from a DNA binding domain, an RNA binding domain, and a DNA break site localizing domain.
- the fusion protein comprises (i) a LexA DNA binding domain (ii) an RNA binding domain of MCP, and (iii) an FHA DNA break site localizing domain.
- the domains in the fusion protein are arranged in one of the following configurations: (i), (ii), (iii); (i), (iii), (ii); (ii), (i), (iii); (ii), (iii), (i); (iii), (i), (ii); and (iii), (ii), (i).
- the fusion protein comprises (i) the FKH1 DNA binding domain (ii) the RNA binding domain of MCP, and (iii) an FHA DNA break site localizing domain.
- the domains in the fusion protein are arranged in one of the following configurations: (i), (ii), (iii); (i), (iii), (ii); (ii), (i), (iii); (ii), (iii), (i); (iii), (i), (ii); and (iii), (ii), (i).
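The configurations listed for a three-domain fusion protein are the six possible orderings of domains (i), (ii), and (iii). As a minimal sketch, the enumeration can be reproduced with the Python standard library; the descriptive names attached to each numeral are shorthand taken from the surrounding text, and the function name is hypothetical.

```python
from itertools import permutations

# Shorthand descriptions for the three domain labels used in the text.
DOMAINS = {
    "i": "DNA binding domain (LexA or FKH1)",
    "ii": "RNA binding domain of MCP",
    "iii": "FHA DNA break site localizing domain",
}

def domain_orders(labels=("i", "ii", "iii")):
    """Return every ordering of the given domain labels, N- to C-terminal."""
    return list(permutations(labels))

orders = domain_orders()
for order in orders:
    # Print each configuration in the same "(i), (ii), (iii)" style as the text.
    print(", ".join(f"({label})" for label in order))
```

With three domains this yields six orderings, which matches the number of configurations recited for both the LexA and the FKH1 fusion proteins.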
- the fusion protein forms a complex with the circular plasmid and the double strand DNA break site, thereby recruiting the circular plasmid to the DNA break and enhancing HDR.
- the donor recruitment polynucleotide comprises nucleotide sequences complementary to sequences adjacent to the DNA break.
- the sequences complementary to sequences adjacent to the DNA break comprise the same sequences that are transcribed from the retron donor in the retron RNA.
- the promoter is a constitutive or inducible promoter.
- the Cas endonuclease is a Cas9 or Cas12a endonuclease, or a modified variant thereof.
- the plurality of cells are eukaryotic cells.
- the eukaryotic cells are selected from a yeast cell, a vertebrate cell, or a mammalian cell.
- the editing efficiency, fidelity, and/or survival is improved compared to a method that does not include in vivo plasmid assembly to produce a circular donor recruitment plasmid.
- the linear double stranded polynucleotide of (iii) further comprises a barcode sequence.
- Exemplary retrons comprising msr, msd, and inverted repeat sequences that can be used in the nucleic acids of the disclosure are provided in Table 6. The retrons in Table 6 also express reverse transcriptases that can be used in the methods of the disclosure. Table 6. Exemplary retrons. (see Simon, A.J., et al., Retrons and their applications in genome engineering, Nucleic Acids Research, Volume 47, Issue 21, 02 December 2019, Pages 11007–11019).
- the retron encoded by the nucleic acids described herein is a Retron-Eco1 (Ec86) retron and reverse transcriptase system.
- the disclosure provides a system for editing DNA at a target site in the genome of a cell.
- the system comprises: (i) a first linear double-stranded donor polynucleotide comprising a nucleic acid sequence encoding a guide RNA operably linked to a first promoter and a nucleic acid sequence encoding a retron structured non-coding RNA (ncRNA) operably linked to a second promoter, the retron ncRNA comprising: one or more RNA binding domain recognition sequences or one or more single-stranded nucleic acid binding domain recognition sequences; an msr sequence; an msd sequence; a donor sequence for homology directed repair (HDR) inserted within the msd sequence; and a first inverted repeat sequence and a second inverted repeat sequence; (ii) a second linear double-stranded polynucleotide; and (iii) a fusion protein comprising an RNA binding domain or single-stranded nucleic acid binding domain connected to a dsDNA break site-localizing domain, or a nu
- the retron optionally comprises a stabilizing 5’ ribozyme sequence.
- the linear double stranded polynucleotide of (ii) comprises a selectable marker.
- the system further comprises a cell that expresses a CRISPR-associated (Cas) endonuclease or a nucleic acid encoding the same, and a retron-specific reverse transcriptase (RT) or a nucleic acid encoding the same.
- Cas CRISPR-associated
- RT retron-specific reverse transcriptase
- the fusion protein binds to the one or more RNA binding domain recognition sequences or one or more single-stranded nucleic acid binding domain recognition sequences and also binds to a double-strand DNA break site generated by the Cas endonuclease at the target site.
- the retron RNA expressed by (i) is reverse transcribed by the RT in vivo to produce multiple ssDNA molecules comprising ssDNA retron donor sequences, wherein individual ssDNA molecules bind to the fusion protein to produce a complex between the linear DNA molecules, the fusion protein, and the dsDNA break site, thereby recruiting the ssDNA retron donor sequence to the dsDNA break site and promoting editing by HDR.
- msDNA retron multicopy single-stranded DNA
- msDNA which comprises an msr RNA covalently attached to a msd DNA complexes including a chimera of an RNA hybridized to a DNA, wherein the RNA comprises one or more RNA binding domain recognition sequences and an msr sequence; and wherein the DNA comprises an msd sequence and a subject expression sequence inserted within the msd sequence, and where the chimera is non-covalently bound to a polypeptide that includes an RNA binding domain or single stranded nucleic acid binding domain bound to a DNA break site localizing domain.
- Genome editing may be performed on a single cell or a population of cells of interest and can be performed on any type of cell, including any cell from a prokaryotic, eukaryotic, or archaeon organism, including bacteria, archaea, fungi, protists, plants, and animals.
- Cells from tissues, organs, and biopsies, as well as recombinant cells, genetically modified cells, cells from cell lines cultured in vitro, and artificial cells (e.g., nanoparticles, liposomes, polymersomes, or microcapsules encapsulating nucleic acids) may all be used in the practice of the present disclosure.
- the methods of the disclosure are also applicable to editing of nucleic acids in cellular fragments, cell components, or organelles comprising nucleic acids (e.g., mitochondria in animal and plant cells, plastids (e.g., chloroplasts) in plant cells and algae).
- Cells may be cultured or expanded prior to or after performing genome editing as described herein.
- the cells are yeast cells.
- RNA-guided nuclease can be targeted to a particular genomic sequence (i.e., genomic target sequence to be modified) by altering its guide RNA sequence.
- a target-specific guide RNA comprises a nucleotide sequence that is complementary to a genomic target sequence, and thereby mediates binding of the nuclease-gRNA complex by hybridization at the target site.
- the gRNA can be designed with a sequence complementary to the sequence of a minor allele to target the nuclease-gRNA complex to the site of a mutation.
- the mutation may comprise an insertion, a deletion, or a substitution.
- the mutation may include a single nucleotide variation, gene fusion, translocation, inversion, duplication, frameshift, missense, nonsense, or other mutation associated with a phenotype or disease of interest.
- the targeted minor allele may be a common genetic variant or a rare genetic variant.
- the gRNA is designed to selectively bind to a minor allele with single base-pair discrimination, for example, to allow binding of the nuclease-gRNA complex to a single nucleotide polymorphism (SNP).
- SNP single nucleotide polymorphism
- the gRNA may be designed to target disease-relevant mutations of interest for the purpose of genome editing to remove the mutation from a gene.
- the gRNA can be designed with a sequence complementary to the sequence of a major or wild-type allele to target the nuclease-gRNA complex to the allele for the purpose of genome editing to introduce a mutation into a gene in the genomic DNA of the cell, such as an insertion, deletion, or substitution.
- Such genetically modified cells can be used, for example, to alter phenotype, confer new properties, or produce disease models for drug screening.
- the RNA-guided nuclease used for genome modification is a clustered regularly interspaced short palindromic repeats (CRISPR) system Cas nuclease.
- CRISPR clustered regularly interspaced short palindromic repeats
- RNA-guided Cas nuclease capable of catalyzing site-directed cleavage of DNA to allow integration of donor polynucleotides by the HDR mechanism can be used in genome editing, including CRISPR system type I, type II, or type III Cas nucleases.
- Cas proteins include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), Cas10, Cas10d, Cas12a/Cpf1, Mad7TM (INSCRIPTA ®), CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Cs
- a type II CRISPR system such as a Cas9 endonuclease is used.
- Cas9 nucleases from any species, or biologically active fragments, variants, analogs, or derivatives thereof that retain Cas9 endonuclease activity (i.e., catalyze site-directed cleavage of DNA to generate double-strand breaks) may be used to perform genome modification as described herein.
- the Cas9 need not be physically derived from an organism, but may be synthetically or recombinantly produced. Cas9 sequences from a number of bacterial species are well known in the art and listed in the National Center for Biotechnology Information (NCBI) database.
- NCBI National Center for Biotechnology Information
- sequences or a variant thereof comprising a sequence having at least about 70-100% sequence identity thereto, including any percent identity within this range, such as 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99% sequence identity thereto, can be used for genome editing, as described herein. See also Fonfara et al. (2014) Nucleic Acids Res. 42(4):2577-90; Kapitonov et al. (2015) J. Bacteriol.
- the bacterial type II CRISPR system uses the endonuclease, Cas9, which forms a complex with a guide RNA (gRNA) that specifically hybridizes to a complementary genomic target sequence, where the Cas9 endonuclease catalyzes cleavage to produce a double-stranded break.
- Cas9 typically further relies on the presence of a 3′ protospacer-adjacent motif (PAM) in the DNA directly downstream of the gRNA-binding site.
- the genomic target site will typically comprise a nucleotide sequence that is complementary to the gRNA and may further comprise a protospacer adjacent motif (PAM).
- the target site comprises 20-30 base pairs in addition to a 3 base pair PAM.
- the first nucleotide of a PAM can be any nucleotide, while the two other nucleotides will depend on the specific Cas9 protein that is chosen.
- Exemplary PAM sequences are known to those of skill in the art and include, without limitation, NNG, NGN, NAG, and NGG, wherein N represents any nucleotide.
- the allele targeted by a gRNA comprises a mutation that creates a PAM within the allele, wherein the PAM promotes binding of the Cas9-gRNA complex to the allele.
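The relationship between a protospacer, its 3′ PAM, and an allele that creates a PAM can be sketched as a simple sequence scan. The following is an illustrative example only: it searches the forward strand for a 20-nucleotide protospacer followed by NGG, and the two allele sequences are made-up placeholders, not sequences from the disclosure.

```python
import re

def cas9_sites(seq, spacer_len=20):
    """Return (start, protospacer, PAM) for each forward-strand site with a 3' NGG PAM.

    A lookahead is used so that overlapping candidate sites are all reported.
    """
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq.upper())]

# Hypothetical alleles differing by a single A>G change that creates an NGG PAM.
ref = "ACGTACGTACGTACGTACGT" + "TGA" + "AAAA"  # major allele: no NGG after the protospacer
alt = "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"  # minor allele: NGG present

ref_sites = cas9_sites(ref)
alt_sites = cas9_sites(alt)
print("major allele sites:", ref_sites)
print("minor allele sites:", alt_sites)
```

In this toy case only the minor allele yields a candidate site, which is the situation described above in which a mutation creates a PAM and thereby allows allele-selective targeting. A practical design tool would also scan the reverse strand and support the other PAM sequences listed.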
- the gRNA is 5-50 nucleotides, 10-30 nucleotides, 15-25 nucleotides, 18-22 nucleotides, or 19-21 nucleotides in length, or any length between the stated ranges, including, for example, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 nucleotides in length.
- the guide RNA may be a single guide RNA comprising crRNA and tracrRNA sequences in a single RNA molecule, or the guide RNA may comprise two RNA molecules with crRNA and tracrRNA sequences residing in separate RNA molecules.
- the CRISPR nuclease from Prevotella and Francisella 1 (Cpf1, also known as Cas12a) may be used.
- Cas12a is another class II CRISPR/Cas system RNA-guided nuclease with similarities to Cas9 and may be used analogously.
- Cpf1 does not require a tracrRNA and only depends on a crRNA in its guide RNA, which provides the advantage that shorter guide RNAs can be used with Cpf1 for targeting than Cas9.
- Cpf1 is capable of cleaving either DNA or RNA.
- the PAM sites recognized by Cpf1 have the sequences 5′-YTN-3′ (where “Y” is a pyrimidine and “N” is any nucleobase) or 5′-TTTV-3′ and are located 5′ to the gRNA binding site, in contrast to the G-rich PAM site recognized by Cas9, which is located 3′ to the gRNA binding site.
- Cpf1/Cas12a cleavage of DNA produces double-stranded breaks with sticky ends having a 4 or 5 nucleotide overhang.
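Because the Cpf1/Cas12a PAM lies 5′ of the gRNA binding site, a site scan places the PAM before the protospacer rather than after it. The sketch below expands the IUPAC codes used in the text (Y, N, V) into regular-expression classes. The 20-nucleotide protospacer length is an assumption made for illustration, as is the test sequence; only the forward strand is scanned.

```python
import re

# IUPAC nucleotide codes needed for the PAMs named in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "Y": "[CT]", "V": "[ACG]"}

def cas12a_sites(seq, pam="TTTV", spacer_len=20):
    """Return (start, PAM, protospacer) for each forward-strand site with a 5' PAM."""
    pam_re = "".join(IUPAC[base] for base in pam)
    # Lookahead keeps overlapping candidate sites.
    pattern = re.compile(rf"(?=({pam_re})([ACGT]{{{spacer_len}}}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq.upper())]

# Hypothetical sequence: TTTA PAM followed by a 20-nt protospacer.
seq = "GG" + "TTTA" + "ACGT" * 5 + "CC"
tttv_hits = cas12a_sites(seq)
ytn_hits = cas12a_sites(seq, pam="YTN")
print(tttv_hits)
print([hit[0] for hit in ytn_hits])
```

The more permissive YTN motif reports more candidate positions than TTTV on the same sequence, which illustrates how PAM stringency changes the number of targetable sites.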
- Cpf1 see, e.g., Ledford et al. (2015) Nature. 526 (7571):17-17, Zetsche et al.
- a class 2 type V-A CRISPR-Cas (Cas12a/Cpf1) nuclease can be used, such as MAD7™.
- MAD7™ is an engineered class 2 type V-A CRISPR-Cas (Cas12a/Cpf1) system isolated from Eubacterium rectale.
- C2c1 is another class II CRISPR/Cas system RNA-guided nuclease that may be used.
- C2c1 similarly to Cas9, depends on both a crRNA and tracrRNA for guidance to target sites.
- RNA-guided FokI nucleases comprise fusions of inactive Cas9 (dCas9) and the FokI endonuclease (FokI-dCas9), wherein the dCas9 portion confers guide RNA-dependent targeting on FokI.
- dCas9 inactive Cas9
- FokI-dCas9 FokI endonuclease
- the RNA-guided nuclease can be provided in the form of a protein, such as the nuclease complexed with a gRNA, or provided by a nucleic acid encoding the RNA-guided nuclease, such as an RNA (e.g., messenger RNA) or DNA (expression vector). Codon usage may be optimized to improve production of an RNA-guided nuclease in a particular cell or organism.
- a nucleic acid encoding an RNA-guided nuclease can be modified to substitute codons having a higher frequency of usage in a yeast cell, a bacterial cell, a human cell, a non-human cell, a mammalian cell, a rodent cell, a mouse cell, a rat cell, or any other host cell of interest, as compared to the naturally occurring polynucleotide sequence.
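Codon substitution of this kind can be sketched as a lookup that swaps each codon for a synonymous codon preferred by the host. In the example below the synonymous codon sets are from the standard genetic code, but the `PREFERRED` table is a hypothetical placeholder rather than measured codon usage for any organism, and only three amino acids are covered to keep the sketch short.

```python
# Synonymous codon sets from the standard genetic code (three amino acids only).
SYNONYMS = {
    "Leu": {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"},
    "Arg": {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"},
    "Lys": {"AAA", "AAG"},
}

# Hypothetical host preferences (illustrative placeholders, not real usage data).
PREFERRED = {"Leu": "TTG", "Arg": "AGA", "Lys": "AAG"}

CODON_TO_AA = {codon: aa for aa, codons in SYNONYMS.items() for codon in codons}

def recode(cds):
    """Replace each covered codon with the preferred synonymous codon.

    Codons outside the small table (including start and stop) are left unchanged,
    so the encoded protein sequence is preserved.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        aa = CODON_TO_AA.get(codon)
        out.append(PREFERRED[aa] if aa else codon)
    return "".join(out)

recoded = recode("ATGCTGCGTAAATAA")  # Met-Leu-Arg-Lys-stop
print(recoded)
```

A real optimization would use a full codon usage table for the chosen host cell and would typically also screen the recoded sequence for unwanted motifs such as restriction sites.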
- the protein can be transiently, conditionally, or constitutively expressed in the cell.
- Donor polynucleotides and gRNAs are readily synthesized by standard techniques, e.g., solid phase synthesis via phosphoramidite chemistry, as disclosed in U.S. Patent Nos. 4,458,066 and 4,415,732, incorporated herein by reference; Beaucage et al., Tetrahedron (1992) 48:2223-2311; and Applied Biosystems User Bulletin No. 13 (1 April 1987).
- Other chemical synthesis methods include, for example, the phosphotriester method described by Narang et al., Meth. Enzymol. (1979) 68:90 and the phosphodiester method disclosed by Brown et al., Meth. Enzymol. (1979) 68:109.
- gRNA-donor polynucleotide cassettes can be produced by standard oligonucleotide synthesis techniques and subsequently ligated into vectors. Moreover, libraries of gRNA-donor polynucleotide cassettes directed against thousands of genomic targets can be readily created using highly parallel array-based oligonucleotide library synthesis methods (see, e.g., Cleary et al. (2004) Nature Methods 1:241-248, Svensen et al. (2011) PLoS One 6(9):e24906).
- adapter sequences can be added to oligonucleotides to facilitate high-throughput amplification or sequencing.
- a pair of adapter sequences can be added at the 5′ and 3′ ends of an oligonucleotide to allow amplification or sequencing of multiple oligonucleotides simultaneously by the same set of primers.
- restriction sites can be incorporated into oligonucleotides to facilitate cloning of oligonucleotides into vectors.
- oligonucleotides comprising gRNA-donor polynucleotide cassettes can be designed with a common 5′ restriction site and a common 3′ restriction site to facilitate ligation into the genome modification vectors.
- a restriction digest that selectively cleaves each oligonucleotide at the common 5′ restriction site and the common 3′ restriction site is performed to produce restriction fragments that can be cloned into vectors (e.g., plasmids or viral vectors), followed by transformation of cells with the vectors comprising the gRNA-donor polynucleotide cassettes.
- a restriction site can also be added in between the gRNA and donor polynucleotide sequences to enable a second cloning step for the introduction of a guide RNA scaffold sequence or other constructs into the vector.
- Amplification of polynucleotides encoding gRNA-donor polynucleotide cassettes may be performed, for example, before ligation into genome modification vectors or before sequencing and after barcoding. Any method for amplifying oligonucleotides may be used, including, but not limited to polymerase chain reaction (PCR), isothermal amplification, nucleic acid sequence-based amplification (NASBA), transcription mediated amplification (TMA), strand displacement amplification (SDA), and ligase chain reaction (LCR).
- the genome editing cassettes comprise common 5′ and 3′ priming sites to allow amplification of the gRNA-donor polynucleotide sequences in parallel with a set of universal primers.
- a set of selective primers is used to selectively amplify a subset of the gRNA-donor polynucleotides from a pooled mixture.
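The oligonucleotide layout described above (priming sites, common restriction sites, and an internal site between the gRNA and donor) can be sketched as string assembly. Everything concrete in this example is an assumption for illustration: the priming-site sequences, the gRNA and donor sequences, the choice of EcoRI, BamHI, and NotI recognition sequences as cloning sites, and the idea that sub-pools are distinguished by their forward priming site. Selection by string matching stands in for selective PCR and does not model primer annealing.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cassette:
    name: str
    grna: str
    donor: str

def build_oligo(cassette, fwd_site, rev_site, site5, site3, internal_site):
    """Lay out: 5' priming site | 5' restriction site | gRNA | internal site | donor
    | 3' restriction site | 3' priming site."""
    payload = cassette.grna + internal_site + cassette.donor
    for site in (site5, site3):
        # A cloning site inside the payload would be cut during the restriction digest.
        if site in payload:
            raise ValueError(f"{cassette.name}: payload contains cloning site {site}")
    return fwd_site + site5 + payload + site3 + rev_site

def select_subpool(oligos, fwd_site, rev_site):
    """Stand-in for selective amplification: keep oligos flanked by the given sites."""
    return [o for o in oligos if o.startswith(fwd_site) and o.endswith(rev_site)]

# Illustrative sequences only (not from the source).
SITE5, SITE3, INTERNAL = "GAATTC", "GGATCC", "GCGGCCGC"
REV = "TTGGCCAATTGG"
FWD_A, FWD_B = "AAGGTTCCAAGG", "CCTTGGAACCTT"

oligo_a = build_oligo(Cassette("edit_A", "ACGTACGTACGTACGTACGT", "AAAACCCCAAAACCCC"),
                      FWD_A, REV, SITE5, SITE3, INTERNAL)
oligo_b = build_oligo(Cassette("edit_B", "TGCATGCATGCATGCATGCA", "TTTTCCCCTTTTCCCC"),
                      FWD_B, REV, SITE5, SITE3, INTERNAL)
pool = [oligo_a, oligo_b]
subpool_a = select_subpool(pool, FWD_A, REV)
print(len(oligo_a), len(subpool_a))
```

Rejecting payloads that contain a cloning site reflects the requirement that the digest cleave each oligonucleotide only at the common 5′ and 3′ sites.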
- Cells that are transformed with recombinant polynucleotides comprising the genome editing cassettes may be prokaryotic cells or eukaryotic cells and are preferably designed for high-efficiency incorporation of gRNA-donor polynucleotide libraries by transformation. Methods of introducing nucleic acids into a host cell are well known in the art. Commonly used methods of transformation include chemically-induced transformation, typically using divalent cations (e.g., CaCl2), and electroporation.
- the method for active donor recruitment comprises: a) introducing into a cell a fusion protein comprising a protein that selectively binds to the DNA break connected to a polypeptide comprising a nucleic acid binding domain; and b) introducing into the cell a donor polynucleotide comprising i) a nucleotide sequence sufficiently complementary to hybridize to a sequence adjacent to the DNA break, and ii) a nucleotide sequence comprising a binding site recognized by the nucleic acid binding domain of the fusion protein, wherein the nucleic acid binding domain selectively binds to the binding site on the donor polynucleotide to produce a complex between the donor polynucleotide and the fusion protein, thereby recruiting the donor polynucleotide to the DNA break and promoting HDR.
- the DNA break may be created by a site-specific nuclease, such as, but not limited to, a Cas nuclease (e.g., Cas9, Cpf1, or C2c1), an engineered RNA-guided FokI nuclease, a zinc finger nuclease (ZFN), a transcription activator-like effector-based nuclease (TALEN), a restriction endonuclease, a meganuclease, a homing endonuclease, and the like.
- the DNA break may be a single-stranded (nick) or double-stranded DNA break. If the DNA break is a single-stranded DNA break, the fusion protein used comprises a protein that selectively binds to the single-stranded DNA break, whereas if the DNA break is a double-stranded DNA break, the fusion protein used comprises a protein that selectively binds to the double-stranded DNA break. The fusion protein can also recognize both single-stranded and double-stranded DNA breaks.
- the protein that selectively binds to the DNA break can be, for example, an RNA-guided nuclease, such as a Cas nuclease (e.g., Cas9 or Cpf1) or an engineered RNA-guided FokI nuclease.
- Donor polynucleotides may be single-stranded or double-stranded and may be composed of RNA or DNA.
- a donor polynucleotide comprising DNA can be produced from a donor polynucleotide comprising RNA, if desired, by reverse transcription using reverse transcriptase either in the cell (e.g. by a retron reverse transcriptase) or outside the cell (e.g.
- RNA binding domain may be any protein or domain from a protein that binds a known RNA sequence. Examples of each of these proteins are well known in the art. Non-limiting examples of RNA binding domains include domains of proteins that bind to an MS2 stem loop sequence, a Pumilio (PUF) recognition sequence, an RNA Recognition Motif (RRM) recognition sequence, a Double-Stranded RNA-Binding Domain (dsRBD) recognition sequence, a Zinc finger (ZF) Domain recognition sequence, a Z-alpha, arginine/glycine rich (RGG) domain recognition sequence, a K Homology (KH) Domain recognition sequence, or a Poly(A) tail.
- PUF Pumilio
- RRM RNA Recognition Motif
- dsRBD Double-Stranded RNA-Binding Domain
- ZF Zinc finger
- Z-alpha arginine/glycine rich domain recognition sequence
- KH K Ho
- the single stranded nucleic acid binding domain may be any protein or domain from a protein that binds a known single stranded nucleic acid sequence. Examples of each of these proteins are well known in the art.
- Single stranded nucleic acid binding domain recognition domains of polypeptides include, but are not limited to, oligonucleotide/oligosaccharide/oligopeptide-binding (OB) folds, such as in human POT1, Schizosaccharomyces pombe Pot1, Sterkiella nova TEPB, CspB protein from Bacillus caldolyticus and Bacillus subtilis; K homology (KH) domains, such as in KH domain-containing proteins including heterogeneous ribonucleoprotein K (hnRNP K), far upstream element (FUSE)-binding protein (FBP), and poly(C)-binding proteins (PCBP) 1 and 2; RNA recognition motifs (RRMs) which bind DNA such as in
- the fusion protein may comprise an FHA phosphothreonine-binding domain, wherein the donor polynucleotide is selectively recruited to a DNA break having a protein comprising a phosphorylated threonine residue located sufficiently close to the DNA break for the FHA phosphothreonine-binding domain to bind to the phosphorylated threonine residue.
- the FHA phosphothreonine-binding domain may be combined with any RNA binding domain (e.g., fusion with MCP) or single stranded nucleic acid binding domain (e.g. OB-fold) for donor recruitment.
- the donor recruitment protein includes a fusion of a polypeptide domain from any protein that has an RNA binding domain or single stranded nucleic acid binding domain with a polypeptide domain from any protein that has a DNA break localizing domain.
- DNA break localizing domains include domains of proteins that bind to areas of DNA damage and/or DNA repair proteins. Phospho-Ser/Thr-binding domains have emerged as crucial regulators of cell cycle progression and DNA damage signaling.
- Such domains include 14-3-3 proteins, WW domains, Polo-box domains (in PLK1), WD40 repeats (including those in the E3 ligase SCFβTrCP), BRCT domains (including those in BRCA1) and FHA domains (such as in CHK2 and MDC1). These domains all have the potential to be used in donor recruitment systems. FHA domains are conserved between eukaryotes and bacteria and thus would have utility in bacteria as well as eukaryotes for donor recruitment. Examples of proteins or genes encoding such proteins are provided, without limitation, in Tables 1-5.
- the donor recruitment protein comprises a polypeptide sequence from a DNA break-recruiting protein from the same kingdom, phylum or division, class, order, family, genus, and/or species as the cell to be genetically modified.
- the fusion protein comprises an RNA binding domain of MS2 coat protein (MCP) joined to a forkhead-associated (FHA) domain.
- the fusion protein comprises an RNA binding domain of MS2 coat protein (MCP) joined to an FHA phosphothreonine-binding domain.
- the fusion protein comprises a LexA domain, the RNA binding domain of MCP and the FHA domain.
- the LexA domain is from the LexA repressor protein (UniProtKB - P0A7C2). It will be understood that the arrangement or order of the LexA domain, the RNA binding domain of MCP and the FHA domain in the fusion protein can be varied as described herein.
- an inhibitor of the non-homologous end joining (NHEJ) pathway is used to further increase the frequency of cells genetically modified by HDR.
- inhibitors of the NHEJ pathway include any compound (agent) that inhibits or blocks either expression or activity of any protein component in the NHEJ pathway.
- Protein components of the NHEJ pathway include, but are not limited to, Ku70, Ku86, DNA protein kinase (DNA-PK), Rad50, MRE11, NBS1, DNA ligase IV, and XRCC4.
- An exemplary inhibitor is wortmannin which inhibits at least one protein component (e.g., DNA-PK) of the NHEJ pathway.
- RNA interference or CRISPR-interference may also be used to block expression of a protein component of the NHEJ pathway (e.g., DNA-PK or DNA ligase IV).
- siRNAs small interfering RNAs
- hairpin RNAs and other RNA or RNA:DNA species which can be cleaved or dissociated in vivo to form siRNAs
- deactivated Cas9 dCas9
- sgRNAs single guide RNAs
- an HDR enhancer such as RS-1 may be used to increase the frequency of HDR in cells (Song et al. (2016) Nat. Commun. 7:10548).
- compositions or formulations comprising the nucleic acids, systems, fusion proteins and constructs described herein.
- the pharmaceutical compositions and formulations can be combined with a pharmaceutically acceptable carrier for administration to a subject or patient.
- Example 1 shows that combining in vivo plasmid assembly and donor recruitment improved the survival of yeast colonies undergoing targeted edits in the genome.
- To test the effectiveness of donor recruitment and a previously demonstrated HDR-based plasmid assembly approach on improving editing survival, we combined 24 editing cassettes randomly selected from a guide-donor library. We spiked in non-editing cassettes to simulate low efficacy guides and synthesis errors expected in guide-donor libraries, and guide-donors with edits outside the “seed” region which are expected to drop out of the library due to Cas9 mismatch tolerance and repeated cleavage of donor DNA and edited target DNA (Figure 1b).
- plasmid assembly enhanced survival for the donor-cleaving variants and a subset of the 24 editing cassettes.
- donor recruitment by LexA-FHA (Figure 1a) promoted survival for all edit types, and was more effective at preventing accumulation of non-editing cassettes in the library than plasmid assembly (Figure 1c).
- The observation that plasmid assembly specifically enhanced survival of donor-cleaving guides suggested that it was promoting a continual cycle of perfect repair at the edited sites, and that different modes of plasmid assembly might have a different impact on these cassettes.
- Example 2 This example describes how combining donor recruitment, retron donor production, and in vivo plasmid assembly resulted in improved editing at target sites in the genome.
- We tested combining plasmid assembly with the retron donor recruitment system we previously developed, where MS2-FHA or MS2-LexA-FHA fusion proteins are used to recruit either retron donor or both retron and plasmid donor, respectively.
- We termed this system MAGESTIC 3.0 to account for the three orthogonal HDR-enhancing systems involved.
- MAGESTIC 3.0 in different genome editing applications, including on several hard-to-edit regions of the genome which undergo structural variant (SV) formation, with saturation editing of a gene where single-nucleotide edits can undergo repeated cleavage and repair cycles, and in pooled editing screens involving natural variants in complex trait loci genome-wide.
- a schematic of the method is shown in Figure 2.
- MAGESTIC 3.0 at sites we previously documented to undergo high levels of SV formation upon editing, either in the form of deletions (Figure 3a) or translocation (Figure 3b).
- MAGESTIC 3.0 dramatically reduced structural variant formation in these regions, nearly completely eliminating the loss of genomic coverage as assessed by whole-genome sequencing on an edited pool.
- Example 3 This example describes a method for enhanced guide-donor plasmid cleavage.
- MAGESTIC 3.0 improves edit outcomes in four key areas which have limited the effectiveness of genome-scale editing screens, by (1) improving editing survival right after transformation, (2) maintaining survival for single-nucleotide edits which undergo repeated cycles of repair and cleavage, (3) outcompeting endogenous repair processes which regenerate unedited sequence, and (4) outcompeting aberrant, alternative repair processes at structural variant-prone regions of the genome.
- plasmid assembly has an additional benefit, as the PCR-barcoded inserts used in MAGESTIC 3.0 enable tagging each transformant with a unique barcode, giving dozens of internal replicates for each targeted variant.
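The idea that unique barcodes give internal replicates for each targeted variant can be sketched as a count of distinct barcodes observed per variant. The input format (pairs of variant identifier and barcode sequence) and all example values below are hypothetical; the source does not specify how barcode data are represented.

```python
from collections import defaultdict

def barcodes_per_variant(observations):
    """Count distinct barcodes per variant from (variant_id, barcode) pairs.

    Each distinct barcode is treated as one independently transformed lineage,
    i.e. one internal replicate of that variant.
    """
    seen = defaultdict(set)
    for variant, barcode in observations:
        seen[variant].add(barcode)
    return {variant: len(barcodes) for variant, barcodes in seen.items()}

# Hypothetical observations; repeated pairs represent repeated reads of one lineage.
observations = [
    ("variant_1", "AACCGGTT"),
    ("variant_1", "AACCGGTT"),
    ("variant_1", "TTGGCCAA"),
    ("variant_2", "ACACACAC"),
]
replicates = barcodes_per_variant(observations)
print(replicates)
```

Counting distinct barcodes rather than reads keeps a single highly represented lineage from being mistaken for many replicates.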
- Example 4 This example describes how multiple PAM variant nucleases derived from SpCas9 and LbCas12a generate saturation nucleotide editing and subsequently remove integrated guide-donor plasmids.
- desired targets often either lack traditional PAMs in their vicinity and thus cannot be edited, or they are in PAM-distal regions where they would be recut at a high rate and result in unintended mutations or cell death. Therefore, complete saturation editing of a genome requires a system that can work with nucleases recognizing a wide array of PAMs. The goal of this experiment was to show that complete saturation nucleotide editing of genomic regions is possible with MAGESTIC 3.0 by using nucleases recognizing diverse PAMs.
- the MAGESTIC 3.0 editing and barcoding system includes 5 stages: (1) colony formation after the transformation of guide-donor plasmids; (2) an outgrowth in liquid media to increase overall editing percentages across the library (this step is especially important for weaker guides); (3) induction of barcoding and guide-donor plasmid destruction by turning on the I-SceI and SaCas9 nucleases, along with the guide X1 for SaCas9, in a medium containing galactose and anhydrotetracycline (ATc); (4) a second outgrowth in the medium containing galactose and ATc; and (5) counter-selection of the cells with residual guide-donor plasmids using 5-fluorocytosine (5FC) (Figure 8a).
- 5FC 5-fluorocytosine
- Plasmid integration is an inherent feature of editing with all types of nucleases and PAM sites. Taking genomic editing at the PDR5 promoter region as an example, the MAGESTIC 3.0 editing and barcoding system can provide saturation nucleotide editing and subsequent removal of integrated guide-donor plasmids from the genomic DNA (Figure 8b-c). Assessing plasmid integration by PCR is a simple way to confirm that editing has taken place in experiments utilizing plasmid donors. Importantly, the integrated plasmids can be removed and converted back to the intended, correct edits by induction of nucleases that cleave the plasmid and promote HDR across the integrated donor repeats.
- a method for increasing site-specific genomic editing efficiency, fidelity, and/or survival of a cell comprising introducing into a cell: i) a first linear double stranded polynucleotide comprising a double stranded DNA (dsDNA) donor sequence and a nucleic acid sequence encoding a guide RNA (gRNA) operably linked to a first promoter; and ii) a second linear double stranded polynucleotide, wherein the second linear double stranded polynucleotide is linked to the first linear double stranded polynucleotide by homology directed repair (HDR) or non-homologous end joining (NHEJ) to form a circular donor-gRNA plasmid inside of the cell; wherein the cell comprises a CRISPR-associated (Cas) endonuclease or a nucleic acid sequence encoding the Cas endonuclease, and wherein the Cas endonuclea
- Embodiment 2 The method of embodiment 1, wherein the first linear double stranded polynucleotide further comprises a DNA binding domain recognition sequence.
- Embodiment 3. The method of embodiment 2, wherein the DNA binding domain recognition sequence binds to a LexA DNA binding domain or a forkhead homolog 1 (FKH1) DNA binding domain.
- Embodiment 4. The method of any one of embodiments 1 to 3, wherein the first linear double stranded polynucleotide further comprises nucleotide sequences complementary to sequences adjacent to the dsDNA break site generated by the Cas endonuclease.
- Embodiment 6 The method of any one of embodiments 1 to 5, wherein the Cas endonuclease is a Cas9 or Cas12a endonuclease, or a modified variant thereof.
- Embodiment 7. The method of any one of embodiments 1 to 6, further comprising introducing into the cell a nucleic acid sequence encoding a retron structured non-coding RNA (ncRNA) operably linked to a second promoter, the retron ncRNA comprising: a. a stabilizing 5’ ribozyme sequence; b. one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences; c. an msr sequence; d. an msd sequence; e. a donor sequence for homology directed repair (HDR) inserted within the msd sequence; and, f. a first inverted repeat sequence and a second inverted repeat sequence, wherein the cell further comprises a retron-specific reverse transcriptase (RT) or a nucleic acid encoding the retron RT, and wherein the retron RT and the retron ncRNA generate multicopy single-stranded DNA (msDNA) containing the single-stranded donor (ssDNA retron donor) sequences.
- RNA binding domain recognition sequence is a MS2 stem loop sequence.
- Embodiment 9 The method of embodiment 8, wherein the MS2 stem loop sequence binds to a MS2 coat protein (MCP) RNA binding domain.
- Embodiment 10 The method of embodiment 7, wherein the single stranded nucleic acid binding domain recognition sequence binds to a Cas endonuclease binding domain.
- Embodiment 11 The method of any one of embodiments 7 to 10, wherein the second promoter is a constitutive or inducible promoter.
- Embodiment 13 The method of embodiment 12, wherein the fusion protein comprises a dsDNA break site-localizing domain, wherein the dsDNA break site-localizing domain is a forkhead-associated (FHA) domain.
- Embodiment 14 The method of embodiment 12 or 13, wherein the fusion protein comprises an RNA binding domain, wherein the RNA binding domain is a MCP RNA binding domain.
- Embodiment 15 The method of any one of embodiments 12 to 14, wherein the fusion protein comprises the MCP RNA binding domain and the FHA domain.
- Embodiment 17 The method of any one of embodiments 12 to 16, wherein the fusion protein comprises (i) the LexA DNA binding domain or the FKH1 DNA binding domain, (ii) the MCP RNA binding domain, and (iii) the FHA domain in one of the following orders: (i), (ii), (iii); (i), (iii), (ii); (ii), (i), (iii); (ii), (iii), (i); (iii), (i), (ii); and (iii), (ii), (i).
- Embodiment 18 The method of any one of embodiments 1 to 17, wherein the cell is a eukaryotic cell.
- Embodiment 19 The method of any one of embodiments 1 to 18, wherein the second linear double stranded polynucleotide further comprises a barcode sequence.
- Embodiment 20 The method of embodiment 19, wherein the barcode sequence integrates into a designated barcode locus in the host cell genome.
- Embodiment 21 The method of embodiment 20, wherein integration of the barcode sequence into the barcode locus comprises cleavage of the barcode locus genomic DNA by an endonuclease, wherein the endonuclease is operably linked to an inducible promoter.
- Embodiment 22 The method of embodiment 21, wherein the endonuclease is a Cas endonuclease or a homing endonuclease.
- Embodiment 23 The method of any one of embodiments 19 to 22, wherein the circular donor-RNA plasmid comprises sequences that are homologous to sequences flanking the endonuclease cleavage site, such that homologous recombination results in integration of the barcode sequence into the barcode locus.
- Embodiment 24.
- the barcode locus comprises nucleic acid sequences encoding the Cas endonuclease used for editing the target site, sequences encoding the RT, and/or sequences encoding the fusion protein flanked by the endonuclease cleavage sites, wherein expression of the endonuclease results in removal of the nucleic acid sequences encoding the Cas endonuclease, the nucleic acid sequences encoding the RT, and/or the nucleic acid sequences encoding the fusion protein, concomitant with the integration of the barcode sequence.
- Embodiment 26 comprises nucleic acid sequences encoding the Cas endonuclease used for editing the target site, sequences encoding the RT, and/or sequences encoding the fusion protein flanked by the endonuclease cleavages sites, wherein expression of the endonuclease results in removal of the nucleic acid sequences encoding
- Embodiment 27 A method for removing a plasmid which has integrated into an edited target locus in the genome of a cell, wherein the plasmid comprises: i) a nucleic acid sequence encoding a guide RNA operably linked to a first promoter; ii) a nucleic acid sequence encoding a homing endonuclease operably linked to a second promoter, wherein the second promoter is inducible; iii) a nucleic acid sequence encoding a Cas endonuclease operably linked to a third promoter, wherein the third promoter is inducible; iv) a nucleic acid sequence that is cleaved by the homing endonuclease; and/or v) a nucleic acid sequence that is cleaved by
- Embodiment 28 The method of embodiment 27, wherein removal of the plasmid results in recovery of a desired edit at the target locus.
- Embodiment 29 The method of embodiment 27 or 28, wherein the homing endonuclease is an I-SceI endonuclease.
- Embodiment 30 The method of any one of embodiments 27 to 29, wherein the Cas endonuclease is Cas9, or a modified variant thereof.
- Embodiment 31 The method of embodiment 30, wherein the Cas endonuclease is SaCas9, or a modified variant thereof.
- Embodiment 32 The method of embodiment 27, wherein removal of the plasmid results in recovery of a desired edit at the target locus.
- Embodiment 33 The method of any one of embodiments 27 to 31, wherein the second and/or third promoters are inducible by tetracycline or anhydrotetracycline (aTc).
- Embodiment 34 The method of any one of embodiments 27 to 33, wherein the second promoter is GAL1 promoter that is inducible by galactose and the third promoter is inducible by tetracycline or anhydrotetracycline (aTc).
- Embodiment 35.
- the plasmid further comprises (vi) a barcode sequence that is flanked by (iv) the nucleic acid sequence that is cleaved by the homing endonuclease and/or (v) the nucleic acid sequence that is cleaved by the Cas endonuclease.
- Embodiment 36 The method of embodiment 35, wherein inducing expression of the homing endonuclease and/or the Cas endonuclease results in integration of the barcode sequence at a barcode locus, while simultaneously removing the plasmid from the edited target locus.
- a method for multiplexed editing of DNA in cells comprising introducing into the cells: i) a guide RNA that binds a target site in the genomic DNA in the cells; ii) a library of linear double stranded donor polynucleotides comprising a nucleic acid sequence encoding a retron structured non-coding RNA (ncRNA) operably linked to a promoter, wherein the retron ncRNA comprises: a. a stabilizing 5’ ribozyme sequence; b. one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences; c. an msr sequence; d. an msd sequence; e.
- a donor sequence for homology directed repair inserted within the msd sequence, wherein each donor sequence in the library introduces a different edit at the target site that binds the guide RNA; and, f. a first inverted repeat sequence and a second inverted repeat sequence; iii) a linear double stranded polynucleotide; wherein the linear double stranded polynucleotide of (iii) is linked in vivo to the linear recombinant double stranded polynucleotide of (ii) by homology directed repair (HDR) or non-homologous end joining (NHEJ) to produce a circular donor plasmid; and iv) a fusion protein comprising an RNA binding domain or single stranded nucleic acid binding domain connected to a DNA break site-localizing domain, or a nucleic acid encoding the fusion protein; wherein the cells express a Cas endonuclease or comprise a nucleot
- Embodiment 38 The method of embodiment 37, wherein the RNA binding domain recognition sequence is a MS2 stem loop sequence.
- Embodiment 39 The method of embodiment 38, wherein the MS2 stem loop sequence binds to a MS2 coat protein (MCP) RNA binding domain.
- Embodiment 40 The method of any one of embodiments 37 to 39, wherein the single stranded nucleic acid binding domain recognition sequence binds to a Cas endonuclease binding domain.
- Embodiment 41.
- Embodiment 42 The method of any one of embodiments 37 to 41, wherein the fusion protein comprises a dsDNA break site-localizing domain, wherein the dsDNA break site-localizing domain is an FHA domain.
- Embodiment 43 The method of any one of embodiments 37 to 42, wherein the fusion protein comprises a MCP RNA binding domain and an FHA domain.
- Embodiment 45 The method of embodiment 44, wherein the nucleic acid binding domain recognition sequence binds to a LexA DNA binding domain or a forkhead homolog 1 (FKH1) DNA binding domain.
- Embodiment 46 The method of any one of embodiments 37 to 45, wherein the fusion protein further comprises a LexA DNA binding domain or a FKH1 DNA binding domain.
- Embodiment 47 The method of any one of embodiments 37 to 43, wherein the linear double stranded donor polynucleotide of (ii) further comprises a DNA binding domain recognition sequence.
- Embodiment 48 The method of embodiment 46 or 47, wherein the fusion protein forms a complex with the circular plasmid and the dsDNA break site, thereby recruiting the circular plasmid to the DNA break and enhancing HDR.
- Embodiment 49 The method of any one of embodiments 37 to 48, wherein the linear double stranded donor polynucleotide of (ii) further comprises nucleotide sequences complementary to sequences adjacent to the dsDNA break.
- Embodiment 50.
- Embodiment 51 The method of any one of embodiments 37 to 50, wherein the Cas endonuclease is a Cas9 or Cas12a endonuclease, or a modified variant thereof.
- Embodiment 52 The method of any one of embodiments 37 to 51, wherein the plurality of cells are eukaryotic cells.
- Embodiment 53 The method of any one of embodiments 37 to 52, wherein the editing efficiency, fidelity, and/or survival is improved compared to a method that does not include in vivo plasmid assembly to produce a circular plasmid.
- Embodiment 54 The method of any one of embodiments 37 to 53, wherein the linear double stranded polynucleotide of (iii) further comprises a barcode sequence.
- Embodiment 55 The method of any one of embodiments 37 to 54, wherein the linear double stranded polynucleotide of (iii) comprises a selectable marker.
- Embodiment 56 The method of any one of embodiments 37 to 54, wherein the linear double stranded polynucleotide of (iii) comprises a selectable marker.
- a system for editing DNA at a target site in the genome of a cell comprising: (i) a first linear double-stranded donor polynucleotide comprising a nucleic acid sequence encoding a guide RNA operably linked to a first promoter and a nucleic acid sequence encoding a retron structured non-coding RNA (ncRNA) operably linked to a second promoter, the retron ncRNA comprising: a. a stabilizing 5’ ribozyme sequence; b. one or more RNA binding domain recognition sequences or one or more single-stranded nucleic acid binding domain recognition sequences; c. an msr sequence; d. an msd sequence; e.
- a donor sequence for homology directed repair inserted within the msd sequence; and f. a first inverted repeat sequence and a second inverted repeat sequence; (ii) a second linear double-stranded polynucleotide, and (iii) a fusion protein comprising an RNA binding domain or single-stranded nucleic acid binding domain connected to a dsDNA break site-localizing domain, or a nucleic acid encoding the fusion protein.
- Embodiment 56 further comprising a cell that comprises a CRISPR-associated (Cas) endonuclease or a nucleic acid encoding the same, and a retron-specific reverse transcriptase (RT) or a nucleic acid encoding the same.
- Embodiment 58 The system of embodiment 56 or 57, wherein the fusion protein binds to the one or more RNA binding domain recognition sequences or one or more single-stranded nucleic acid binding domain recognition sequences and binds to a dsDNA break site generated by the Cas endonuclease at the target site.
- Embodiment 59.
- retron RNA expressed by (i) is reverse transcribed by the RT in vivo to produce multiple ssDNA molecules comprising ssDNA retron donor sequences, wherein individual ssDNA molecules bind to the fusion protein to produce a complex between the linear ssDNA molecules, the fusion protein, and the dsDNA break site, thereby recruiting the ssDNA retron donor sequence to the dsDNA break site and promoting editing by HDR.
- Embodiment 60 The system of any one of embodiments 56 to 59, wherein the second linear double-stranded polynucleotide comprises a selectable marker.
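The six domain orders recited for the three-domain fusion protein (Embodiment 17) are exactly the permutations of the three domains (i), (ii) and (iii). The short sketch below enumerates them as a cross-check on the list; the dictionary and function names are hypothetical and chosen only for illustration, and the sketch says nothing about which order performs best.

```python
from itertools import permutations

# Labels follow the embodiment text; the Python names are illustrative only.
DOMAINS = {
    "i": "LexA or FKH1 DNA binding domain",
    "ii": "MCP RNA binding domain",
    "iii": "FHA domain",
}


def domain_orders():
    """Return every ordering of the three fusion-protein domains."""
    return list(permutations(DOMAINS))


if __name__ == "__main__":
    for order in domain_orders():
        print("; ".join(f"({label}) {DOMAINS[label]}" for label in order))
```

Three domains give 3! = 6 orderings, so a recited list with more or fewer entries, or with a repeated label inside one entry, signals a transcription error rather than an additional arrangement.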
Abstract
Described herein is a double-strand break, homology-directed repair based genome editing system which combines in vivo plasmid assembly, donor recruitment, and retron donor DNA generation. Each editing system improves edit outcomes at different sites in different ways. The integration of all three systems results in substantially improved editing at all sites measured and in multiple distinct assays, enabling effective editing of structural variant prone regions and saturation editing across entire genomic loci.
Description
PATENT
Attorney Docket No. 079445-011010PC-1395375
Client Ref. No. S22-339
RECRUITMENT OF DONOR DNA FROM IN VIVO ASSEMBLED PLASMIDS FOR SATURATION GENOME EDITING
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to U.S. Provisional Application No. 63/401,083, filed August 25, 2022, the disclosure of which is herein incorporated by reference in its entirety for all purposes.
STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY SPONSORED RESEARCH AND DEVELOPMENT
[0002] This invention was made with Government support under Contract Nos. R01HG012446 and R01GM121932 awarded by the National Institutes of Health. The Government has certain rights in the invention.
BACKGROUND
[0003] Genome engineering is a key technology for biotechnology, agriculture, and medicine, enabling the development of microbial strains and crops with desirable properties and the generation of cell-based models and therapies for studying and treating disease. CRISPR nucleases have revolutionized genome engineering by allowing the efficient generation of single-strand breaks (nicks) or double-strand breaks (DSBs) at nearly any desired location in the genome. These DNA breaks can be harnessed to promote sequence alterations at the target site by either relying on error-prone machinery to “break” the gene or by coaxing the cellular machinery to install defined edits through template-driven processes or direct base modifications. While initial work in the field focused on exploring potential off-target effects of double-strand break (DSB)-generating CRISPR nucleases, it has become clear that aberrant on-target editing outcomes, including insertions and deletions (indels) as well as large structural variant (SV) alterations, are a more important concern in most DSB-CRISPR applications. To avoid the undesirable repair events which can accompany editing with DSB-generating CRISPR nucleases, some approaches have turned to nickase Cas9 and fused it to various protein and RNA effectors. Examples of these methods include prime editing and base editing, which install defined edits through template-driven processes or direct base modifications, respectively. These methods are currently being deployed in various clinical and research settings. However, these tools are not efficient enough for most high-throughput screening applications, as low editing efficiencies result in an excessive level of false negatives. Furthermore, base editing suffers from a limited targeting range and a limited set of allowable edit types, and is prone to bystander edits at the target site.
[0004] DSB-based editing by homology-directed repair (HDR) exhibits high efficiency in many cell types with active cell division, and for various organisms important for biotechnology and basic research such as S. cerevisiae. In contrast to prime and base editing, DSB-based editing exhibits both a high overall editing efficiency and the ability to introduce virtually any sequence change of arbitrarily small or large size. However, at the same time, DSB-HDR CRISPR approaches suffer from low editing survival, particularly in cases where mismatch tolerance enables a guide to re-cleave the genomic target sequence after editing, and also suffer from undesired structural variant (SV) generation at particular genomic regions.
[0005] The present disclosure provides methods and systems for improving the editing efficiency at target sites within a host cell genome.
BRIEF SUMMARY
[0006] Provided herein are methods and compositions for site-specific editing of DNA at a target site in the genome of a host cell. The methods and compositions provide advantages over previous methods by increasing the editing efficiency, fidelity and/or survival of cells comprising site-specific (targeted) genetic edits.
[0007] In one aspect, the disclosure provides a method for increasing site-specific genomic editing efficiency, fidelity, and/or survival of a cell.
In some embodiments, the method comprises introducing into a cell:
i) a first linear double stranded polynucleotide comprising a double stranded DNA (dsDNA) donor sequence and a nucleic acid sequence encoding a guide RNA (gRNA) operably linked to a first promoter; and
ii) a second linear double stranded polynucleotide;
wherein the second linear double stranded polynucleotide is linked to the first linear double stranded polynucleotide by homology directed repair (HDR) or non-homologous end joining (NHEJ) to produce a circular donor-gRNA plasmid inside of the cell; wherein the cell comprises a CRISPR-associated (Cas) endonuclease or a nucleic acid sequence encoding the Cas endonuclease, and wherein the Cas endonuclease and the circular donor-gRNA plasmid and/or the first linear double stranded polynucleotide prior to assembly generate a site-specific edit in the genome of the cell, and increase the editing efficiency, fidelity, and/or survival of the cell compared to a method that does not include in vivo plasmid assembly to produce a circular donor-gRNA plasmid.
[0008] In some embodiments, the first linear double stranded polynucleotide further comprises a DNA binding domain recognition sequence. In some embodiments, the DNA binding domain recognition sequence binds to a LexA DNA binding domain or a forkhead homolog 1 (FKH1) DNA binding domain. In some embodiments, the first linear double stranded polynucleotide further comprises nucleotide sequences complementary to sequences adjacent to the dsDNA break site generated by the Cas endonuclease.
[0009] In some embodiments, the first promoter is constitutive. In some embodiments, the first promoter is inducible. In some embodiments, the Cas endonuclease is a Cas9 or Cas12a endonuclease, or a modified variant thereof.
[0010] As disclosed herein, the method for increasing site-specific genomic editing efficiency, fidelity, and/or survival of a cell further comprises introducing into the cell:
iii) a nucleic acid sequence encoding a retron structured non-coding RNA (ncRNA) operably linked to a second promoter, the retron ncRNA comprising:
a. a stabilizing 5’ ribozyme sequence;
b. one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences;
c. an msr sequence;
d. an msd sequence;
e. a donor sequence for homology directed repair (HDR) inserted within the msd sequence; and,
f. a first inverted repeat sequence and a second inverted repeat sequence,
wherein the cell further comprises a retron-specific reverse transcriptase (RT) or a nucleic acid encoding the retron RT, and wherein the retron RT and the retron ncRNA generate multicopy single-stranded DNA (msDNA) containing the single-stranded donor (also referred to as ssDNA retron donor) sequences.
[0011] In some embodiments, the RNA binding domain recognition sequence is a MS2 stem loop sequence. In some embodiments, the MS2 stem loop sequence binds to a MS2 coat protein (MCP) RNA binding domain. In some embodiments, the single stranded nucleic acid binding domain recognition sequence binds to a Cas endonuclease binding domain. In some embodiments, the second promoter is constitutive. In some embodiments, the second promoter is inducible.
[0012] As disclosed herein, the method for increasing site-specific genomic editing efficiency, fidelity, and/or survival of a cell further comprises introducing into the cell iv) a fusion protein comprising a) a DNA binding domain, an RNA binding domain and/or a single stranded nucleic acid binding domain and b) a dsDNA break site-localizing domain, or a nucleic acid encoding the fusion protein, wherein the fusion protein binds to the DNA binding domain recognition sequence of the circular donor-gRNA plasmid or the first linear double stranded polynucleotide prior to assembly, and/or the one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences of the retron, and binds to a dsDNA break site generated by the Cas endonuclease, thereby recruiting the dsDNA donor sequences and/or the ssDNA retron donor sequences to the dsDNA break in the genome of the cell and promoting editing by HDR.
[0013] In some embodiments, the fusion protein comprises a dsDNA break site-localizing domain, wherein the dsDNA break site-localizing domain is a forkhead-associated (FHA) domain. In some embodiments, the fusion protein comprises an RNA binding domain, wherein the RNA binding domain is a MCP RNA binding domain. In some embodiments, the fusion protein comprises the MCP RNA binding domain and the FHA domain. In some embodiments, the fusion protein comprises a DNA binding domain, wherein the DNA binding domain is a LexA DNA binding domain or an FKH1 DNA binding domain. In some embodiments, the fusion protein comprises: (i) the LexA DNA binding domain or the FKH1 DNA binding domain, (ii) the MCP RNA binding domain, and (iii) the FHA domain in one of the following orders: (i), (ii), (iii); (i), (iii), (ii); (ii), (i), (iii); (ii), (iii), (i); (iii), (i), (ii); and (iii), (ii), (i).
[0014] In some embodiments, the cell is a eukaryotic cell.
[0015] In some embodiments, the second linear double stranded polynucleotide further comprises a barcode sequence. In some embodiments, the barcode sequence integrates into a designated barcode locus in the host cell genome. In some embodiments, integration of the barcode sequence into the barcode locus comprises cleavage of the barcode locus genomic DNA by an endonuclease, wherein the endonuclease is operably linked to an inducible promoter. In some embodiments, the second linear double stranded polynucleotide comprises a selectable marker. In some embodiments, the second linear double stranded polynucleotide comprises both a barcode sequence and a selectable marker.
[0016] In some embodiments, the endonuclease is a Cas endonuclease or a homing endonuclease. In some embodiments, the circular donor-RNA plasmid comprises sequences that are homologous to sequences flanking the endonuclease cleavage site, such that homologous recombination results in integration of the barcode sequence into the barcode locus. In some embodiments, the endonuclease is a homing endonuclease, wherein the homing endonuclease is an I-SceI endonuclease operably linked to a GAL1 promoter.
[0017] In some embodiments, the barcode locus comprises nucleic acid sequences encoding the Cas endonuclease used for editing the target site, sequences encoding the RT, and/or sequences encoding the fusion protein flanked by the endonuclease cleavage sites, wherein expression of the endonuclease results in removal of the nucleic acid sequences encoding the Cas endonuclease, the nucleic acid sequences encoding the RT, and/or the nucleic acid sequences encoding the fusion protein, concomitant with the integration of the barcode sequence.
[0018] In another aspect, provided is a method for removing a plasmid which has integrated into an edited target locus in the genome of a cell, wherein the plasmid comprises:
i) a nucleic acid sequence encoding a guide RNA operably linked to a first promoter;
ii) a nucleic acid sequence encoding a homing endonuclease operably linked to a second inducible promoter;
iii) a nucleic acid sequence encoding a Cas endonuclease operably linked to a third inducible promoter;
iv) a nucleic acid sequence that is cleaved by the homing endonuclease; and/or
v) a nucleic acid sequence that is cleaved by the Cas endonuclease;
wherein the method comprises inducing expression of the homing endonuclease and/or the Cas endonuclease to cleave the integrated plasmid DNA, thereby removing the plasmid from the edited target locus.
[0019] In some embodiments, the plasmid is the donor-gRNA plasmid disclosed above. In some embodiments, plasmid integration is accompanied by tandem repeat duplication of the donor sequence, and removal of the plasmid results in recovery of a desired edit at the target locus.
[0020] In some embodiments, the homing endonuclease is an I-SceI endonuclease. In some embodiments, the Cas endonuclease is Cas9, or a modified variant thereof. In some embodiments, the Cas endonuclease is SaCas9, or a modified variant thereof.
[0021] In some embodiments, the second and/or third promoters are GAL1 promoters inducible by galactose. In some embodiments, the second and/or third promoters are inducible by tetracycline or anhydrotetracycline (aTc). In some embodiments, the second promoter is a GAL1 promoter that is inducible by galactose and the third promoter is inducible by tetracycline or anhydrotetracycline (aTc).
[0022] In some embodiments, the plasmid further comprises (vi) a barcode sequence that is flanked by (iv) the nucleic acid sequence that is cleaved by the homing endonuclease and/or (v) the nucleic acid sequence that is cleaved by the Cas endonuclease. In some embodiments, inducing expression of the homing endonuclease and/or the Cas endonuclease results in integration of the barcode sequence at a barcode locus, while simultaneously removing the plasmid from the edited target locus.
[0023] In another aspect, provided is a method for multiplexed editing of DNA in cells, the method comprising introducing into the cells:
i) a guide RNA that binds a target site in the genomic DNA in the cells;
ii) a library of linear double stranded donor polynucleotides comprising a nucleic acid sequence encoding a retron structured non-coding RNA (ncRNA) operably linked to a promoter, wherein the retron ncRNA comprises: a. an optional stabilizing 5’ ribozyme sequence; b. one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences; c. an msr sequence; d. an msd sequence; e. a donor sequence for homology directed repair (HDR) inserted within the msd sequence, wherein each donor sequence in the library introduces a different edit at the target site that binds the guide RNA; and, f. a first inverted repeat sequence and a second inverted repeat sequence;
iii) a linear double stranded polynucleotide; wherein the linear double stranded polynucleotide of (iii) is linked in vivo to the linear recombinant double stranded polynucleotide of (ii) by homology directed repair (HDR) or non-homologous end joining (NHEJ) to produce a circular donor plasmid; and
iv) a fusion protein comprising an RNA binding domain or single stranded nucleic acid binding domain connected to a DNA break site-localizing domain, or a nucleic acid encoding the fusion protein;
wherein the cells comprise a Cas endonuclease or a nucleic acid encoding the same, and a retron-specific reverse transcriptase (RT) or a nucleic acid encoding the same; wherein the fusion protein binds to the one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences of the retron, and binds to a dsDNA break site generated by the Cas endonuclease; wherein retron ncRNA expressed by (ii) is reverse transcribed by the RT in vivo to produce multicopy single-stranded DNA (msDNA) molecules, comprising a single-stranded DNA portion (encoded by msd) and a single-stranded RNA portion (encoded by msr) linked by a 2’-to-5’ phosphodiester moiety installed as the first nucleotide is reverse transcribed by the retron RT, wherein individual msDNA molecules bind to the fusion protein to produce a complex between the linear msDNA molecules, the fusion protein, and the double-strand DNA break site locus, thereby recruiting the retron donor sequence to the DNA break locus and promoting editing by HDR, wherein a designed edit is introduced at the target site, with a plurality of different edits produced in different cells and each cell receiving a single edit.
[0024] In some embodiments, the RNA binding domain recognition sequence is a MS2 stem loop sequence. In some embodiments, the MS2 stem loop sequence binds to a MS2 coat protein (MCP) RNA binding domain. In some embodiments, the single stranded nucleic acid binding domain recognition sequence binds to a Cas endonuclease binding domain.
[0025] In some embodiments, the locus surrounding the dsDNA break site accumulates phosphothreonine (pT) modified proteins recognized by forkhead-associated (FHA) domains. In some embodiments, the fusion protein comprises a dsDNA break site-localizing domain, wherein the dsDNA break site-localizing domain is an FHA domain. In some embodiments, the fusion protein comprises a MCP RNA binding domain and an FHA domain.
[0026] In some embodiments, the linear double stranded donor polynucleotide of (ii) further comprises a DNA binding domain recognition sequence. In some embodiments, the nucleic acid binding domain recognition sequence binds to a LexA DNA binding domain or a forkhead homolog 1 (FKH1) DNA binding domain.
[0027] In some embodiments, the fusion protein further comprises a LexA DNA binding domain or a FKH1 DNA binding domain. In some embodiments, the LexA DNA binding domain or the FKH1 DNA binding domain is located between the MCP RNA binding domain and the FHA domain.
In some embodiments, the fusion protein forms a complex with the circular plasmid and the dsDNA break site, thereby recruiting the circular plasmid to the DNA break and enhancing HDR. [0028] In some embodiments, the linear double stranded donor polynucleotide of (ii) further comprises nucleotide sequences complementary to sequences adjacent to the dsDNA break. [0029] In some embodiments, the promoter is a constitutive promoter. In some embodiments, the Cas endonuclease is a Cas9 or Cas12a endonuclease, or a modified variant thereof. In some embodiments, the plurality of cells are eukaryotic cells. In some
embodiments, the editing efficiency, fidelity, and/or survival is improved compared to a method that does not include in vivo plasmid assembly to produce a circular plasmid. In some embodiments, the linear double stranded polynucleotide of (iii) further comprises a barcode sequence. In some embodiments, the linear double stranded polynucleotide of (iii) comprises a selectable marker. [0030] In another aspect, the disclosure provides a system for editing DNA at a target site in the genome of a cell, comprising: (i) a first linear double-stranded donor polynucleotide comprising a nucleic acid sequence encoding a guide RNA operably linked to a first promoter and a nucleic acid sequence encoding a retron structured non-coding RNA (ncRNA) operably linked to a second promoter, the retron ncRNA comprising: a. an optional stabilizing 5’ ribozyme sequence; b. one or more RNA binding domain recognition sequences or one or more single- stranded nucleic acid binding domain recognition sequences; c. an msr sequence; d. an msd sequence; e. a donor sequence for homology directed repair (HDR) inserted within the msd sequence; and f. a first inverted repeat sequence and a second inverted repeat sequence; (ii) a second linear double-stranded polynucleotide, and (iii) a fusion protein comprising an RNA binding domain or single-stranded nucleic acid binding domain connected to a dsDNA break site-localizing domain, or a nucleic acid encoding the fusion protein. In some embodiments, the second linear double stranded polynucleotide comprises a selectable marker. [0031] In some embodiments, the system further comprises a cell that comprises a CRISPR- associated (Cas) endonuclease or a nucleic acid encoding the same, and a retron-specific reverse transcriptase (RT) or a nucleic acid encoding the same. 
In some embodiments, the fusion protein binds to the one or more RNA binding domain recognition sequences or one or more single-stranded nucleic acid binding domain recognition sequences and binds to a dsDNA break site generated by the Cas endonuclease at the target site.
[0032] In some embodiments, retron ncRNA expressed by (i) is reverse transcribed by the RT in vivo to produce multiple ssDNA molecules comprising ssDNA retron donor sequences, wherein individual ssDNA molecules bind to the fusion protein to produce a complex between the linear ssDNA molecules, the fusion protein, and the dsDNA break site, thereby recruiting the ssDNA retron donor sequence to the dsDNA break site and promoting editing by HDR. In some embodiments, the second linear double-stranded polynucleotide comprises a selectable marker. BRIEF DESCRIPTION OF THE DRAWINGS [0033] Figure 1. Combining plasmid assembly and donor recruitment for improved variant editing survival. (a) Schematic of LexA-FHA donor recruitment and plasmid assembly methods. (b) Overview of the mini-library plasmid editing competition assay, where the survival of editing dominates the population dynamics. (c) Percent abundance of each editing cassette in the pool as a function of generations of editing in glucose with constitutive Cas9. The donors and adjacent barcodes on each plasmid were sequenced to infer overall abundance of each strain. [0034] Figure 2. Combining donor recruitment, retron donor production, and plasmid assembly for MAGESTIC 3.0. (a) Overview of MAGESTIC 3.0 combining double-stranded plasmid donor recruitment by LexA-FHA, single-stranded donor DNA recruitment by retron- amplification of donor DNA with MS2-FHA, and HDR-based plasmid assembly. (b) The 3 different components of MAGESTIC 3.0 (plasmid donor recruitment, retron production and recruitment, and in vivo plasmid assembly) can function simultaneously to improve editing outcomes at the target site (left side). Editing can be performed with constitutive expression of the guide RNA, Cas9, and donor machinery, such that editing can take place immediately after transformation as cells form colonies on agar plates. 
Editing is allowed to proceed to completion by propagating cells for an additional ~6 generations in selective liquid media, and then barcoding can be initiated by inducing expression of a barcoding nuclease, for example by using galactose to turn on expression of the I-SceI endonuclease controlled by the GAL1 promoter. [0035] Figure 3. MAGESTIC 3.0 enables successful editing at structural variant-prone regions of the genome. (a) Levels of
deletions induced at deletion-prone loci with different editing systems. (b) Editing near the telomeric region can result in non-reciprocal translocations, as evidenced by this example on chr IX. These guides in (a) and (b) are all highly efficient such that all sequence not undergoing a deletion or translocation was found to be edited with donor DNA. [0036] Figure 4. Combining plasmid assembly with retron donor and donor recruitment for saturation genome editing. (a) A multiplexed editing assay where all possible single nucleotide variants (SNVs) across two genomic regions are attempted with each donor DNA enhancement system and either SpCas9 or LbCas12a. Two different windows rich in NGG protospacer adjacent motifs (PAMs) and TTTV PAMs were chosen for SpCas9 (20- bp guides) and LbCas12a (23-bp guides), respectively. Each guide was paired with a library of donor DNAs with all possible SNVs across the target sequence including the PAM. (b) Each region was analyzed by next-generation sequencing (NGS) to quantify the levels of each SNV along the targeted region. For visualization purposes all three SNVs at each position are combined into a single column. The arrows at the top of the plot denote the position and directionality of the guides, with PAMs for SpCas9 and LbCas12a represented by the end and beginning of each arrow, respectively. The fraction in the upper right hand corner represents the total amount of edited SNV fraction for each donor system. Two different regions of the plasmid were cut with either I-SceI or HindIII for HDR-based plasmid assembly, giving similar results. [0037] Figure 5. Unique transformant barcodes enable discovery of plasmid integration in a subset of clones across ORFs. (a) A library of natural variants was designed targeting the genes IRA1 and IRA2, along with control edits introducing premature-termination codons (PTCs) or matched synonymous substitution controls. 
The libraries were grown in 1M sorbitol for 20 generations, and then barcodes sequenced to identify which variants were enriched or depleted and infer fitness effects. The profile of synonymous edits across the gene mimicking the impacts of PTCs suggested gene disruption by plasmid integration. (b) The proposed mechanism of editing with linear donor DNA (far left), plasmid donor DNA (center), and plasmid integration. [0038] Figure 6. Validating plasmid integration across 24 distinct genomic target regions and different editing systems. (a) The 24 editing cassettes assayed in Figure 1 were assayed individually for plasmid integration behavior with a 3-primer PCR to capture both the
WT locus and either the left or right junction. Integration levels were assayed either (b) directly after transformation or (c) after 12 generations of additional editing. [0039] Figure 7. Development of a dual inducible I-SceI and SaCas9 guide X system for complete resolution of plasmid integration events and removal of residual guide-donor plasmids. (a) Cleavage of the integrated plasmids could promote rescue of the intended edit through deletion of the plasmid by tandem-repeat-mediated HDR. To facilitate this, the I-SceI endonuclease was controlled by the galactose-inducible GAL-L promoter, and the SaCas9 nuclease controlled by the aTc-inducible WTC846 system, along with a guide “X” targeting the plasmid adjacent to the short homology near the integrating barcode. (b) PCR products for assaying overall guide-donor plasmid levels (P) and plasmid integration junctions (I). (c) Two different libraries targeting PDR5 locus with SpCas9 as in Figure 3 were in vivo plasmid assembled with a barcoded insert with either no cut sites (-I-SceI, -SaCas9 X1), with a site for I-SceI only (+I-SceI, -SaCas9 X1), or with sites for both I-SceI and SaCas9 X1 (+I-SceI, +SaCas9 X1). These libraries were washed from the plate, outgrown in glucose for additional editing, and then shifted to galactose and aTc medium to induce barcoding, and then to either liquid or agar CSM-HIS+5FC medium to counter-select for FCY1. They were then harvested for DNA and analyzed qualitatively for plasmid integration by PCR. [0040] Figure 8. Saturation editing of the PDR5 promoter with multiple PAM variant nucleases from SpCas9 and LbCas12a and subsequent removal of integrated guide-donor plasmids. 
(a) The 5 stages of the MAGESTIC editing and barcoding process: (1) colony formation after the transformation of guide-donor plasmids; (2) an outgrowth in liquid media to increase overall editing percentages across the library (this step is especially important for weaker guides); (3) induction of barcoding and guide-donor plasmid destruction by turning on the I-SceI and SaCas9 nucleases, along with the guide X1 for SaCas9, in a medium containing galactose and anhydrotetracycline (aTc); (4) a 2nd outgrowth in the medium containing galactose and aTc; and (5) counter-selection of the cells with residual guide-donor plasmids using 5-fluorocytosine (5FC). (b) The PDR5 promoter region targeted in this experiment was divided into 3 windows to enable complete coverage of each region with paired-end 150 bp reads. Both SpCas9 and LbCas12a nucleases were utilized, along with multiple engineered variants that recognize a broader array of PAMs. SpCas9-NG (PMCID: PMC6368452) and SpG Cas9 (PMC7297043) recognize NG PAMs; SpRY Cas9 (PMC7297043) recognizes nearly any PAM with a preference for NRN; and impLbCas12a (PMC7144938) recognizes a wide array of T/C-rich PAMs, including TTTV (recognized by WT LbCas12a), TNTN, TACV,
TTCV, TCCV, CTCV, CCCV, and VTTV. (c) Schematic illustration of an integrated guide-donor plasmid. The guide-donor plasmid is integrated at the target sites with introduction of a direct repeat of a donor sequence. Introducing a couple of nuclease cleavage sites within the guide-donor plasmid allows for excision of these plasmids and recovery of the desired edits. To assess the overall levels of plasmid integration at each stage of MAGESTIC, 3-primer PCRs were designed for both the upstream junction (USJ) and downstream junction (DSJ) of the integrated plasmid. For the USJ, a primer internal to the guide-donor plasmid was designed to yield a 1.5 to 2 kb product for loci with an integrated guide-donor plasmid and a 3114 bp product for the intact locus. The DSJ internal primer was designed to yield a 3.7 to 4.2 kb product for loci with an integrated guide-donor plasmid. The USJ and DSJ products vary in size because the guide-donor plasmids integrate at variable sites across each window. The USJ and DSJ products were designed to be shorter and longer than the intact locus product, respectively, to ensure that biases in PCR efficiency due to length of the products did not have a major impact on the interpretation of plasmid integration rates. (d) Both WT and PAM-variant SpCas9 and LbCas12a nucleases were mixed and matched with guide-donor libraries targeting either canonical PAMs (upper panel for each window; NGG for SpCas9, TTTV for LbCas12a) or non-canonical PAMs (lower panel for each window; NGNG for Cas9 variants excluding NGG, and T/C-rich PAMs for impLbCas12a excluding TTTV). As a control, the SpRY PAM variant of SpCas9 was transformed with the LbCas12a library, to test whether plasmid integration depends on target site cleavage. For each nuclease + guide-donor library combination, cells were passaged through the 5 stages of MAGESTIC and plasmid integration levels were analyzed by the USJ and DSJ 3-primer PCRs. 
The results show that all nucleases promote on-target plasmid integration, and that the induction of I-SceI and SaCas9, along with guide X1, successfully removes all detectable integrated guide-donor plasmids. The direct repeats of the donor DNA flanking each guide-donor plasmid promote recombination and recovery of the intended edits. (e) The total levels of SNVs in each window were measured by next-generation sequencing of each sample shown in (d). The data are from the 5FC-treated cultures (stage 5) to demonstrate the total SNV levels at the final stage of MAGESTIC. The position of each SNV (relative to the start of each window) is shown on the x-axis, and the total fraction of SNV edits at each position (i.e., the sum of the edit fractions for the 3 possible SNVs at each position) is shown on the y-axis. As in (d), for each window, the PAMs are separated according to whether they are canonical (top) or non-canonical (bottom). The upper right corner indicates the total fraction of SNV-edited sequence observed in the sample. (f) All three windows with both the canonical and non-canonical PAMs were combined to show overall SNV editing levels
across the PDR5 promoter for each nuclease. The data in (e) were summed with the coordinates adjusted to be relative to the PDR5 transcription start site. For each nuclease the average SNV editing levels across all 6 samples (3 windows with both the canonical and non-canonical PAMs) are shown in the upper right of each panel. Note that SpCas9 data includes non-canonical PAMs, explaining why the average editing level is lower than for the SpCas9 variants which recognize the non-canonical PAMs. DETAILED DESCRIPTION [0041] Provided herein are methods and compositions to increase the efficiency of genetic editing at a target site in the genome of a cell. [0042] The practice of the present disclosure will employ, unless otherwise indicated, conventional methods of genome editing, biochemistry, chemistry, immunology, molecular biology and recombinant DNA techniques within the skill of the art. Such techniques are explained fully in the literature. See, e.g., Targeted Genome Editing Using Site-Specific Nucleases: ZFNs, TALENs, and the CRISPR/Cas9 System (T. Yamamoto ed., Springer, 2015); Genome Editing: The Next Step in Gene Therapy (Advances in Experimental Medicine and Biology, T. Cathomen, M. Hirsch, and M. Porteus eds., Springer, 2016); Aachen Press Genome Editing (CreateSpace Independent Publishing Platform, 2015); Handbook of Experimental Immunology, Vols. I-IV (D.M. Weir and C.C. Blackwell eds., Blackwell Scientific Publications); A.L. Lehninger, Biochemistry (Worth Publishers, Inc., current edition); Sambrook, et al., Molecular Cloning: A Laboratory Manual (3rd Edition, 2001); Methods In Enzymology (S. Colowick and N. Kaplan eds., Academic Press, Inc.). [0043] All publications, patents and patent applications cited herein, whether supra or infra, are hereby incorporated by reference in their entireties. I. 
Definitions [0044] Before the present invention is further described, it is to be understood that this invention is not strictly limited to particular embodiments described, as such may of course vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only by the claims. [0045] It must be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. It should further be understood that as used herein, the term “a” entity or “an” entity refers to one
or more of that entity. For example, a nucleic acid molecule refers to one or more nucleic acid molecules. As such, the terms “a”, “an”, “one or more” and “at least one” can be used interchangeably. Similarly, the terms “comprising”, “including” and “having” can be used interchangeably. [0046] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present invention, the preferred methods and materials are now described. All publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited. The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates, which may need to be independently confirmed. [0047] It is appreciated that certain features of the disclosure, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub- combination. All combinations of the embodiments are specifically embraced by the present invention and are disclosed herein just as if each and every combination was individually and explicitly disclosed. 
In addition, all sub-combinations are also specifically embraced by the present invention and are disclosed herein just as if each and every such sub-combination was individually and explicitly disclosed herein. [0048] It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely,” “only” and the like in connection with the recitation of claim elements or use of a “negative” limitation. [0049] As used herein, the term “about” means a range of values including the specified value, which a person of ordinary skill in the art would consider reasonably similar to the specified value. In embodiments, about means within a standard deviation using measurements generally acceptable in the art. In embodiments, about means a range extending to +/- 10% of
the specified value (e.g., +/- 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, or 10% of the specified value). In embodiments, about means the specified value. [0050] The term “genome editing” refers to a type of genetic engineering in which DNA is inserted, replaced, or removed from a target DNA (e.g., the genome of a cell) using one or more nucleases and/or nickases. The nucleases create specific double-strand breaks (DSBs) at desired locations in the genome and harness the cell's endogenous mechanisms to repair the induced break by homology-directed repair (HDR) (e.g., homologous recombination) or by nonhomologous end joining (NHEJ). The nickases create specific single-strand breaks at desired locations in the genome. In one non-limiting example, two nickases can be used to create two single-strand breaks on opposite strands of a target DNA, thereby generating a blunt or a sticky end. Any suitable DNA nucleases and/or nickases can be introduced into a cell to induce genome editing of a target DNA sequence. [0051] The term “DNA nuclease” refers to an enzyme capable of cleaving the phosphodiester bonds between the nucleotide subunits of DNA and may be an endonuclease or an exonuclease. According to the present invention, the DNA nuclease may be an engineered (e.g., programmable or targetable) DNA nuclease which can be used to induce genome editing of a target DNA sequence. Any suitable DNA nuclease can be used including, but not limited to, CRISPR-associated protein (Cas) nucleases, other endo- or exo-nucleases, variants thereof, fragments thereof, and combinations thereof. [0052] The term “double-strand break” or “DSB” or “double-strand cut” refers to the severing or cleavage of both strands of the DNA double helix. The DSB may result in cleavage of both strands at the same position leading to “blunt ends” or staggered cleavage resulting in a region of single-stranded DNA at the end of each DNA fragment, or “sticky ends”. 
A DSB may arise from the action of one or more DNA nucleases. [0053] The term “nonhomologous end joining” or “NHEJ” refers to a pathway that repairs double-strand DNA breaks in which the break ends are directly ligated without the need for a homologous template. [0054] The term “homology-directed repair” or “HDR” refers to a mechanism in cells to accurately and precisely repair double-strand DNA breaks using a homologous template to guide repair. The most common form of HDR is homologous recombination (HR), a type of genetic recombination in which nucleotide sequences are exchanged between two similar or identical molecules of DNA.
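As an illustration of the blunt versus staggered cleavage described in the definition of a double-strand break above, the following Python sketch classifies the ends produced by one cut on each strand. The function name `classify_dsb_ends` and the coordinate convention (each cut expressed in top-strand coordinates) are assumptions made for this example and are not part of the disclosure.

```python
def classify_dsb_ends(top_cut: int, bottom_cut: int) -> tuple[str, int]:
    """Classify the ends left by one cut on each strand of a duplex.

    Both positions use top-strand coordinates (hypothetical convention):
    a value of n means the cut falls immediately before base n (0-based),
    counted 5'->3' along the top strand.
    Returns (end_type, overhang_length).
    """
    if top_cut < 0 or bottom_cut < 0:
        raise ValueError("cut positions must be non-negative")
    overhang = abs(top_cut - bottom_cut)
    if overhang == 0:
        # Both strands are cut at the same position.
        return ("blunt", 0)
    if top_cut < bottom_cut:
        # The bottom strand extends past the top-strand cut and
        # protrudes with its 5' end.
        return ("5' overhang", overhang)
    # The top strand extends past the bottom-strand cut and
    # protrudes with its 3' end.
    return ("3' overhang", overhang)


if __name__ == "__main__":
    print(classify_dsb_ends(17, 17))  # cuts at the same position
    print(classify_dsb_ends(18, 23))  # staggered cuts
```

The cut positions in the usage lines are arbitrary values chosen for illustration and do not represent the cleavage geometry of any particular nuclease.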
[0055] As used herein, the term “retron” is used in accordance with its plain ordinary meaning and refers to a DNA sequence found in the genome of many bacteria species that codes for reverse transcriptase (RT) and a unique single-stranded DNA/RNA hybrid called multicopy single-stranded DNA (msDNA). The retron msr-msd RNA is the non-coding RNA produced by retron elements and is the immediate precursor to the synthesis of msDNA. The retron msr RNA folds into a characteristic secondary structure that contains a conserved guanosine residue at the end of a stem loop. Synthesis of DNA by the retron-encoded reverse transcriptase (RT) results in a DNA/RNA chimera which is composed of small single-stranded DNA linked to small single-stranded RNA. The RNA strand is joined to the 5’ end of the DNA chain via a 2’–5’ phosphodiester linkage that occurs from the 2’ position of the conserved internal guanosine residue. The retron operon carries a promoter sequence P that controls the synthesis of an RNA transcript carrying three loci: msr, msd, and ret. The ret gene product, a reverse transcriptase, processes the msd/msr portion of the RNA transcript into msDNA. Retron elements are about 2 kb long. They contain a single operon controlling the synthesis of an RNA transcript carrying three loci, msr, msd, and ret, that are involved in msDNA synthesis. The DNA portion of msDNA is encoded by the msd region, the RNA portion is encoded by the msr region, while the product of the ret open-reading frame is a reverse transcriptase (RT) similar to the RTs produced by retroviruses and other types of retroelements. Like other reverse transcriptases, the retron RT contains seven regions of conserved amino acids, including a highly conserved tyr-ala-asp-asp (YADD) sequence associated with the catalytic core. The ret gene product is responsible for processing the msd/msr portion of the RNA transcript into msDNA. 
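The reverse transcription step described above, in which the RT copies the msd portion of the RNA transcript into single-stranded DNA, can be sketched in Python as a template-directed copy. This is a deliberately simplified illustration: it ignores priming from the 2’ position of the conserved guanosine, the msr portion that remains as RNA, and any processing of the template. The function name `reverse_transcribe` and the example sequence are hypothetical.

```python
# Watson-Crick pairing of each RNA template base with its DNA complement.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna_template: str) -> str:
    """Return the cDNA (written 5'->3') complementary to an RNA template (5'->3')."""
    rna = rna_template.upper()
    unknown = set(rna) - set(RNA_TO_DNA_COMPLEMENT)
    if unknown:
        raise ValueError(f"unexpected RNA bases: {sorted(unknown)}")
    # The new strand is antiparallel to the template, so the template is read
    # from its 3' end while the DNA is written from its 5' end.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna))


if __name__ == "__main__":
    # Hypothetical four-base template, for illustration only.
    print(reverse_transcribe("AUGC"))
```

Under this simplification, a donor sequence placed within the msd template would appear in the DNA product as its reverse complement.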
[0056] As used herein, the term “reverse transcriptase” refers to its plain and ordinary meaning as an enzyme used to generate complementary DNA (cDNA) from an RNA template, a process termed reverse transcription. [0057] The terms "polypeptide" and "protein" refer to a polymer of amino acid residues and are not limited to a minimum length. Thus, peptides, oligopeptides, dimers, multimers, and the like, are included within the definition. Both full length proteins and fragments thereof are encompassed by the definition. The terms also include post expression modifications of the polypeptide, for example, glycosylation, acetylation, phosphorylation, hydroxylation, and the like. Furthermore, for purposes of the present disclosure, a "polypeptide" refers to a protein which includes modifications, such as deletions, additions and substitutions to the native sequence, so long as the protein maintains the desired activity. These modifications may be
deliberate, as through site directed mutagenesis, or may be accidental, such as through mutations of hosts which produce the proteins or errors due to PCR amplification. [0058] As used herein, the term “single stranded nucleic acid binding domain” refers to a polypeptide or aptamer that preferentially binds to specific sequences of single stranded DNA or single stranded RNA. Single stranded nucleic acid binding domains of polypeptides include, but are not limited to, Cas endonucleases such as Cas13 or Cas14; oligonucleotide/oligosaccharide/oligopeptide-binding (OB) folds, such as in human POT1, Schizosaccharomyces pombe Pot1, Sterkiella nova TEBP, and the CspB proteins from Bacillus caldolyticus and Bacillus subtilis; K homology (KH) domains, such as in the KH domain-containing proteins heterogeneous nuclear ribonucleoprotein K (hnRNP K), far upstream element (FUSE)-binding protein (FBP), and poly(C)-binding proteins (PCBP) 1 and 2; RNA recognition motifs (RRMs) which bind DNA, such as in FBP-interacting repressor (FIR), hnRNP A1, and hnRNP D (also known as Auf1); and whirly domains, such as in the mitochondrial whirly protein Why2 and the mammalian transcriptional regulator PurA. See, for example, Dickey TH et al. (2013) Structure 21(7):1074-1084. [0059] As used herein, the term “RNA binding domain” refers to a polypeptide or aptamer that preferentially binds to specific sequences of a single stranded or double stranded RNA which, in the case of a polypeptide, can include the entire protein or a functional portion thereof. Non-limiting examples of RNA binding domains include an MS2 coat protein (MCP), Pumilio (PUF), RNA Recognition Motif (RRM), Double-Stranded RNA-Binding Domain (dsRBD), Zinc finger (ZF) Domains (CCHH zinc fingers: TFIIIA, CCCH zinc fingers, CCHC zinc knuckles, RanBP2-type ZFs), Z-alpha, arginine/glycine rich (RGG) domains, or K Homology (KH) Domain, and Poly(A) Binding Proteins. 
Other examples include Fox-1, U1A, pentatricopeptide repeat proteins, hnRNP K homology domains, or antibodies engineered to bind RNA. The term “RNA binding domain recognition sequence” refers to the RNA sequence to which an RNA binding domain preferentially binds. [0060] As used herein, the term “DNA break localizing domain” refers to a polypeptide that preferentially binds to regions of DNA damage and/or DNA repair proteins which can include the entire protein or a functional portion thereof. Non-limiting examples of DNA break localizing domains include 14-3-3 proteins, WW domains, Polo-box domains (in PLK1), WD40 repeats (including those in the E3 ligase SCFβTrCP), BRCT domains (including those
in BRCA1) and FHA domains (such as in Fkh1p, CHK2 and MDC1). Other examples are provided in Tables 1-5 (see below). [0061] As used herein, “sequence specific endonuclease” refers to an enzyme that cleaves at a specific sequence within a polynucleotide sequence. In some aspects, the nuclease activity can be partially or completely inhibited, so that only one of the two strands or neither strand is cleaved. Non-limiting examples of sequence specific endonucleases include a CRISPR associated (Cas) nuclease, a Zinc-finger nuclease, a Transcription activator-like effector nuclease (TALEN), or a meganuclease. [0062] The term "Cas9" as used herein encompasses Cas9 endonucleases of type II clustered regularly interspaced short palindromic repeats (CRISPR) systems from any species, and also includes biologically active fragments, variants, analogs, and derivatives thereof that retain Cas9 endonuclease activity (i.e., catalyze site-directed cleavage of DNA to generate double-strand breaks). A Cas9 endonuclease binds to and cleaves DNA at a site comprising a sequence complementary to its bound guide RNA (gRNA). [0063] A Cas9 polynucleotide, nucleic acid, oligonucleotide, protein, polypeptide, or peptide refers to a molecule derived from any source. The molecule need not be physically derived from an organism but may be synthetically or recombinantly produced. Cas9 sequences from a number of bacterial species are well known in the art and listed in the National Center for Biotechnology Information (NCBI) database. 
See, for example, NCBI entries for Cas9 from: Streptococcus pyogenes (WP_002989955, WP_038434062, WP_011528583); Campylobacter jejuni (WP_022552435, YP_002344900); Campylobacter coli (WP_060786116); Campylobacter fetus (WP_059434633); Corynebacterium ulcerans (NC_015683, NC_017317); Corynebacterium diphtheriae (NC_016782, NC_016786); Enterococcus faecalis (WP_033919308); Spiroplasma syrphidicola (NC_021284); Prevotella intermedia (NC_017861); Spiroplasma taiwanense (NC_021846); Streptococcus iniae (NC_021314); Belliella baltica (NC_018010); Psychroflexus torquis (NC_018721); Streptococcus thermophilus (YP_820832); Streptococcus mutans (WP_061046374, WP_024786433); Listeria innocua (NP_472073); Listeria monocytogenes (WP_061665472); Legionella pneumophila (WP_062726656); Staphylococcus aureus (WP_001573634); Francisella tularensis (WP_032729892, WP_014548420); Enterococcus faecalis (WP_033919308); Lactobacillus rhamnosus (WP_048482595, WP_032965177); and Neisseria meningitidis (WP_061704949, YP_002342100); all of which sequences (as entered by the date of filing of
this application) are herein incorporated by reference. SpCas9 is a Cas9 from Streptococcus pyogenes (WP_002989955, WP_038434062, WP_011528583), or a variant thereof. SaCas9 is a Cas9 from Staphylococcus aureus (WP_001573634), or a variant thereof. Any of these sequences or a variant thereof comprising a sequence having at least about 70-100% sequence identity thereto, including any percent identity within this range, such as 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99% sequence identity thereto, can be used for genome editing, as described herein, wherein the variant retains biological activity, such as Cas9 site-directed endonuclease activity. See also Fonfara et al. (2014) Nucleic Acids Res. 42(4):2577-90; Kapitonov et al. (2015) J. Bacteriol. 198(5):797-807; Shmakov et al. (2015) Mol. Cell. 60(3):385-397; and Chylinski et al. (2014) Nucleic Acids Res. 42(10):6091-6105, for sequence comparisons and a discussion of genetic diversity and phylogenetic analysis of Cas9. [0064] The term "Cas12" as used herein encompasses a subtype of Cas12 proteins, previously known as Cpf1, that is an RNA-guided endonuclease forming part of the CRISPR system in some bacteria and archaea. Cas12a is distinguished from Cas9 by its single RuvC endonuclease active site, its 5' protospacer adjacent motif preference, and its creation of sticky rather than blunt ends at the cut site. LbCas12a, a Cas12a from Lachnospiraceae bacterium (ND2006), is a widely used orthologue for targeted mutagenesis. [0065] By "derivative" is intended any suitable modification of the native polypeptide of interest, of a fragment of the native polypeptide, or of their respective analogs, such as glycosylation, phosphorylation, polymer conjugation (such as with polyethylene glycol), or other addition of foreign moieties, as long as the desired biological activity of the native polypeptide is retained. 
Methods for making polypeptide fragments, analogs, and derivatives are generally available in the art. [0066] By "fragment" is intended a molecule consisting of only a part of the intact full-length sequence and structure. The fragment can include a C-terminal deletion, an N-terminal deletion, and/or an internal deletion of the polypeptide. Active fragments of a particular protein or polypeptide will generally include at least about 5-10 contiguous amino acid residues of the full length molecule, preferably at least about 15-25 contiguous amino acid residues of the full length molecule, and most preferably at least about 20-50 or more contiguous amino acid residues of the full length molecule, or any integer between 5 amino acids and the full length
sequence, provided that the fragment in question retains biological activity, such as Cas9 site-directed endonuclease activity. [0067] "Substantially purified" generally refers to isolation of a substance (compound, polynucleotide, nucleic acid, protein, polypeptide, polypeptide composition) such that the substance comprises the majority percent of the sample in which it resides. Typically, in a sample, a substantially purified component comprises 50%, preferably 80%-85%, more preferably 90-95% of the sample. Techniques for purifying polynucleotides and polypeptides of interest are well-known in the art and include, for example, ion-exchange chromatography, affinity chromatography and sedimentation according to density. [0068] By "isolated" is meant, when referring to a polypeptide, that the indicated molecule is separate and discrete from the whole organism with which the molecule is found in nature or is present in the substantial absence of other biological macromolecules of the same type. The term "isolated" with respect to a polynucleotide is a nucleic acid molecule devoid, in whole or part, of sequences normally associated with it in nature; or a sequence, as it exists in nature, but having heterologous sequences in association therewith; or a molecule disassociated from the chromosome. [0069] The terms "polynucleotide," "oligonucleotide," "nucleic acid" and "nucleic acid molecule" are used herein to include a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. This term refers only to the primary structure of the molecule. Thus, the term includes triple-, double- and single-stranded DNA, as well as triple-, double- and single-stranded RNA. It also includes modifications, such as by methylation and/or by capping, and unmodified forms of the polynucleotide. 
More particularly, the terms "polynucleotide," "oligonucleotide," "nucleic acid" and "nucleic acid molecule" include polydeoxyribonucleotides (containing 2-deoxy-D-ribose), polyribonucleotides (containing D- ribose), any other type of polynucleotide which is an N- or C-glycoside of a purine or pyrimidine base, and other polymers containing nonnucleotidic backbones, for example, polyamide (e.g., peptide nucleic acids (PNAs)) and polymorpholino (commercially available from the Anti-Virals, Inc., Corvallis, Oreg., as Neugene) polymers, and other synthetic sequence-specific nucleic acid polymers providing that the polymers contain nucleobases in a configuration which allows for base pairing and base stacking, such as is found in DNA and RNA. There is no intended distinction in length between the terms "polynucleotide," "oligonucleotide," "nucleic acid" and "nucleic acid molecule," and these terms will be used
interchangeably. Thus, these terms include, for example, 3'-deoxy-2',5'-DNA, oligodeoxyribonucleotide N3' P5' phosphoramidates, 2'-O-alkyl-substituted RNA, double- and single-stranded DNA, as well as double- and single-stranded RNA, microRNA, DNA:RNA hybrids, and hybrids between PNAs and DNA or RNA, and also include known types of modifications, for example, labels which are known in the art, methylation, "caps," substitution of one or more of the naturally occurring nucleotides with an analog (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, C5-propynylcytidine, C5-propynyluridine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-methylcytidine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, and 2-thiocytidine), internucleotide modifications such as, for example, those with uncharged linkages (e.g., methyl phosphonates, phosphotriesters, phosphoramidates, carbamates, etc.), with negatively charged linkages (e.g., phosphorothioates, phosphorodithioates, etc.), and with positively charged linkages (e.g., aminoalkylphosphoramidates, aminoalkylphosphotriesters), those containing pendant moieties, such as, for example, proteins (including nucleases, toxins, antibodies, signal peptides, poly-L-lysine, etc.), those with intercalators (e.g., acridine, psoralen, etc.), those containing chelators (e.g., metals, radioactive metals, boron, oxidative metals, etc.), those containing alkylators, those with modified linkages (e.g., alpha anomeric nucleic acids, etc.), as well as unmodified forms of the polynucleotide or oligonucleotide. The term also includes locked nucleic acids (e.g., comprising a ribonucleotide that has a methylene bridge between the 2'-oxygen atom and the 4'-carbon atom). See, for example, Kurreck et al. (2002) Nucleic Acids Res. 30: 1911-1918; Elayadi et al. (2001) Curr. Opinion Invest. Drugs 2: 558-561; Orum et al. (2001) Curr. Opinion Mol. Ther. 
3: 239-243; Koshkin et al. (1998) Tetrahedron 54: 3607-3630; Obika et al. (1998) Tetrahedron Lett.39: 5401-5404. [0070] The terms "hybridize" and "hybridization" refer to the formation of complexes between nucleotide sequences which are sufficiently complementary to form duplexes via Watson-Crick base pairing. [0071] In general, "identity" refers to an exact nucleotide to nucleotide or amino acid to amino acid correspondence of two polynucleotides or polypeptide sequences, respectively. Percent identity can be determined by a direct comparison of the sequence information between two molecules by aligning the sequences, counting the exact number of matches between the two aligned sequences, dividing by the length of the shorter sequence, and multiplying the result by 100. Readily available computer programs can be used to aid in the analysis, such as ALIGN, Dayhoff, M.O. in Atlas of Protein Sequence and Structure M.O. Dayhoff ed., 5 Suppl.
3:353-358, National Biomedical Research Foundation, Washington, DC, which adapts the local homology algorithm of Smith and Waterman, Advances in Appl. Math. 2:482-489, 1981, for peptide analysis. Programs for determining nucleotide sequence identity are available in the Wisconsin Sequence Analysis Package, Version 8 (available from Genetics Computer Group, Madison, WI), for example, the BESTFIT, FASTA and GAP programs, which also rely on the Smith and Waterman algorithm. These programs are readily utilized with the default parameters recommended by the manufacturer and described in the Wisconsin Sequence Analysis Package referred to above. For example, percent identity of a particular nucleotide sequence to a reference sequence can be determined using the homology algorithm of Smith and Waterman with a default scoring table and a gap penalty of six nucleotide positions. [0072] Another method of establishing percent identity in the context of the present disclosure is to use the MPSRCH package of programs copyrighted by the University of Edinburgh, developed by John F. Collins and Shane S. Sturrok, and distributed by IntelliGenetics, Inc. (Mountain View, CA). From this suite of packages, the Smith Waterman algorithm can be employed where default parameters are used for the scoring table (for example, gap open penalty of 12, gap extension penalty of one, and a gap of six). From the data generated the "Match" value reflects "sequence identity." Other suitable programs for calculating the percent identity or similarity between sequences are generally known in the art, for example, another alignment program is BLAST, used with default parameters. 
For example, BLASTN and BLASTP can be used with the following default parameters: genetic code = standard; filter = none; strand = both; cutoff = 60; expect = 10; Matrix = BLOSUM62; Descriptions = 50 sequences; sort by = HIGH SCORE; Databases = non-redundant, GenBank + EMBL + DDBJ + PDB + GenBank CDS translations + Swiss protein + Spupdate + PIR. Details of these programs are readily available. [0073] Alternatively, homology can be determined by hybridization of polynucleotides under conditions which form stable duplexes between homologous regions, followed by digestion with single stranded specific nuclease(s), and size determination of the digested fragments. DNA sequences that are substantially homologous can be identified in a Southern hybridization experiment under, for example, stringent conditions, as defined for that particular system. Defining appropriate hybridization conditions is within the skill of the art. See, e.g., Sambrook et al., supra; DNA Cloning, supra; Nucleic Acid Hybridization, supra.
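As an illustration of the direct-comparison calculation described in paragraph [0071] (count exact matches between two aligned sequences, divide by the length of the shorter sequence, multiply by 100), the following is a minimal Python sketch. It assumes the two sequences have already been aligned by some external program and are supplied as equal-length strings with "-" marking gaps; the function name `percent_identity` and the gap character are illustrative assumptions and are not part of the disclosure or of any of the named software packages.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences (hypothetical helper).

    Matches are counted column by column; the denominator is the ungapped
    length of the shorter sequence, as in the definition in [0071].
    """
    a = aligned_a.upper()
    b = aligned_b.upper()
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")

    # Count columns where both sequences carry the same residue (not a gap).
    matches = sum(1 for x, y in zip(a, b) if x == y and x != gap)

    # Ungapped length of each sequence; the shorter one is the denominator.
    len_a = sum(1 for x in a if x != gap)
    len_b = sum(1 for y in b if y != gap)
    shorter = min(len_a, len_b)
    if shorter == 0:
        raise ValueError("at least one sequence is empty")

    return 100.0 * matches / shorter


if __name__ == "__main__":
    # One mismatch in eight aligned positions.
    print(percent_identity("ACGTACGT", "ACGTTCGT"))
```

Because the denominator is the shorter sequence, a short sequence that matches perfectly within a longer one scores 100% under this definition; alignment programs such as those named above may report identity over a different denominator.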
[0074] The term "homologous region" refers to a region of a nucleic acid with homology to another nucleic acid region. Thus, whether a "homologous region" is present in a nucleic acid molecule is determined with reference to another nucleic acid region in the same or a different molecule. Further, since a nucleic acid is often double-stranded, the term "homologous region," as used herein, refers to the ability of nucleic acid molecules to hybridize to each other. For example, a single-stranded nucleic acid molecule can have two homologous regions which are capable of hybridizing to each other. Thus, the term "homologous region" includes nucleic acid segments with complementary sequences. Homologous regions may vary in length but will typically be between 4 and 500 nucleotides (e.g., from about 4 to about 40, from about 40 to about 80, from about 80 to about 120, from about 120 to about 160, from about 160 to about 200, from about 200 to about 240, from about 240 to about 280, from about 280 to about 320, from about 320 to about 360, from about 360 to about 400, from about 400 to about 440, etc.). [0075] As used herein, the terms "complementary" or "complementarity" refer to polynucleotides that are able to form base pairs with one another. Base pairs are typically formed by hydrogen bonds between nucleotide units in an anti-parallel orientation between polynucleotide strands. Complementary polynucleotide strands can base pair in a Watson-Crick manner (e.g., A to T, A to U, C to G), or in any other manner that allows for the formation of duplexes. As persons skilled in the art are aware, when using RNA as opposed to DNA, uracil (U) rather than thymine (T) is the base that is considered to be complementary to adenosine. However, when a uracil is denoted in the context of the present disclosure, the ability to substitute a thymine is implied, unless otherwise stated. 
"Complementarity" may exist between two RNA strands, two DNA strands, or between a RNA strand and a DNA strand. It is generally understood that two or more polynucleotides may be "complementary" and able to form a duplex despite having less than perfect or less than 100% complementarity. Two sequences are "perfectly complementary" or "100% complementary" if at least a contiguous portion of each polynucleotide sequence, comprising a region of complementarity, perfectly base pairs with the other polynucleotide without any mismatches or interruptions within such region. Two or more sequences are considered "perfectly complementary" or "100% complementary" even if either or both polynucleotides contain additional non-complementary sequences as long as the contiguous region of complementarity within each polynucleotide is able to perfectly hybridize with the other. "Less than perfect" complementarity refers to situations where less than all of the contiguous nucleotides within such region of complementarity are able to base pair with each other. Determining the percentage of
complementarity between two polynucleotide sequences is a matter of ordinary skill in the art. For purposes of Cas9 targeting, a gRNA may comprise a sequence "complementary" to a target sequence (e.g., major or minor allele), capable of sufficient base-pairing to form a duplex (i.e., the gRNA hybridizes with the target sequence). Additionally, the gRNA may comprise a sequence complementary to a sequence adjacent to a PAM sequence, wherein the gRNA also hybridizes with the sequence adjacent to a PAM sequence in a target DNA. [0076] A "target site" or "target sequence" is the nucleic acid sequence recognized (i.e., sufficiently complementary for hybridization) by a guide RNA (gRNA) or a homology arm of a donor polynucleotide. The target site may be allele-specific (e.g., a major or minor allele). [0077] The terms “target edit site” or “target edit locus” or “edit locus” refer to a target site in the host cell genome comprising a nucleic acid sequence recognized by a guide RNA (gRNA) or a homology arm of a donor polynucleotide that is or was edited by the methods of the disclosure. [0078] The term “barcode” refers to a DNA sequence used to identify a target molecule during DNA sequencing. A barcode generally is about 20 bp in length, but also can be around 10-100 bp. As disclosed herein, a barcode constitutes a random or pseudo-random DNA sequence within the insert fragment used for in vivo plasmid assembly. As illustrated in Figures 2b and 2c, in vivo plasmid assembly results in the insert fragment comprising the barcode being linked to the guide-donor fragment, thereby allowing a simple sequencing step to identify a particular donor and/or guide sequence in each cell. [0079] The term “barcode locus” refers to a locus in the host cell genome where a barcode of the disclosure is integrated. The barcode locus can be at a different location in the host cell genome than the target site. 
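The barcode concept in paragraph [0078] (a random or pseudo-random sequence of about 20 bp that, once linked to a guide-donor fragment, lets a sequencing read identify that guide-donor) can be sketched in Python as follows. This is only an illustrative model under stated assumptions: the function names, the use of a seeded pseudo-random generator, the exact-match lookup, and the guide-donor labels are all hypothetical and are not taken from the disclosure, which does not specify how barcodes are generated or decoded.

```python
import random

BASES = "ACGT"


def random_barcode(length: int, rng: random.Random) -> str:
    """Draw one pseudo-random DNA barcode of the given length."""
    return "".join(rng.choice(BASES) for _ in range(length))


def build_barcode_table(guide_donor_ids, length: int = 20, seed: int = 0) -> dict:
    """Assign a unique barcode to each guide-donor label (hypothetical labels).

    In the disclosure the pairing arises from in vivo plasmid assembly and is
    read out by sequencing; here it is simply assigned up front.
    """
    rng = random.Random(seed)
    table = {}
    for guide_donor in guide_donor_ids:
        barcode = random_barcode(length, rng)
        while barcode in table:  # redraw on the (unlikely) collision
            barcode = random_barcode(length, rng)
        table[barcode] = guide_donor
    return table


def identify_guide_donor(read: str, table: dict, length: int = 20):
    """Return the guide-donor label for the first known barcode in a read."""
    read = read.upper()
    for start in range(len(read) - length + 1):
        hit = table.get(read[start:start + length])
        if hit is not None:
            return hit
    return None  # no known barcode found in this read


if __name__ == "__main__":
    table = build_barcode_table(["guide_donor_A", "guide_donor_B"])
    for barcode, label in table.items():
        print(barcode, "->", identify_guide_donor(barcode, table), label)
```

A practical pipeline would typically tolerate sequencing errors (for example by allowing a small edit distance) rather than requiring an exact match; exact matching is used here only to keep the sketch short.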
[0080] As used herein, the term “subject expression sequence” refers to any polynucleotide of any length and any sequence that can be transcribed into RNA. In aspects, the subject expression sequence is a polynucleotide inserted within the msd region of the retron non-coding RNA (ncRNA) which is converted to complementary DNA (cDNA) during reverse transcription. In aspects, the subject expression sequence is a donor polynucleotide. [0081] The term "donor polynucleotide" or “donor sequence” refers to a polynucleotide that provides a sequence of an intended edit to be integrated into the genome at a target locus by HDR.
[0082] By "homology arm" is meant a portion of a donor polynucleotide that is responsible for targeting the donor polynucleotide to the genomic sequence to be edited in a cell. The donor polynucleotide typically comprises a 5' homology arm that hybridizes to a 5' genomic target sequence and a 3' homology arm that hybridizes to a 3' genomic target sequence flanking a nucleotide sequence comprising the intended edit to the genomic DNA, with the positive or plus strand of the double helix (also called Watson strand) used arbitrarily as the reference. The homology arms are referred to herein as 5' and 3' (i.e., upstream and downstream) homology arms, which relates to the relative position of the homology arms to the nucleotide sequence comprising the intended edit within the donor polynucleotide. The 5' and 3' homology arms hybridize to regions within the target locus in the genomic DNA to be modified, which are referred to herein as the "5' target sequence" and "3' target sequence," respectively. The nucleotide sequence comprising the intended edit is integrated into the genomic DNA by HDR at the genomic target locus recognized (i.e., sufficiently complementary for hybridization) by the 5' and 3' homology arms. [0083] "Administering" a nucleic acid, such as a retron, a nucleic acid encoding a fusion of an RNA binding domain or single stranded nucleic acid binding domain and DNA break localizing domain, guide RNA, or nucleic acid encoding a protein such as an endonuclease, reverse transcriptase or fusion protein of the disclosure, to a cell comprises transforming, transducing, transfecting, electroporating, translocating, fusing, phagocytosing, shooting or ballistic methods, etc., i.e., any means by which a nucleic acid can be transported across a cell membrane. 
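The donor layout defined in paragraph [0082] (a 5' homology arm, then the nucleotide sequence comprising the intended edit, then a 3' homology arm, all read on the plus strand) can be illustrated with the short Python sketch below. It is a simplified model only: the function name `build_donor`, the 0-based half-open coordinates, and the default arm length of 50 nucleotides are assumptions chosen for illustration, and the disclosure does not prescribe this procedure.

```python
def build_donor(genome: str, edit_start: int, edit_end: int,
                edit_seq: str, arm_len: int = 50) -> str:
    """Assemble a plus-strand donor: 5' arm + intended edit + 3' arm.

    genome               plus-strand reference sequence around the target locus
    edit_start, edit_end 0-based, half-open interval of genome to be replaced
    edit_seq             sequence carrying the intended edit
    arm_len              length of each homology arm (illustrative default)
    """
    if not 0 <= edit_start <= edit_end <= len(genome):
        raise ValueError("edit interval lies outside the reference sequence")
    if edit_start < arm_len or len(genome) - edit_end < arm_len:
        raise ValueError("not enough flanking sequence for the requested arms")

    # The 5' arm is copied from the sequence immediately upstream of the edit.
    five_prime_arm = genome[edit_start - arm_len:edit_start]
    # The 3' arm is copied from the sequence immediately downstream of the edit.
    three_prime_arm = genome[edit_end:edit_end + arm_len]

    return five_prime_arm + edit_seq + three_prime_arm


if __name__ == "__main__":
    reference = "AAAACCCCGGGGTTTT"
    # Replace the single base at position 8 (a G) with a T, using 4-nt arms.
    print(build_donor(reference, 8, 9, "T", arm_len=4))
```

Setting `edit_start` equal to `edit_end` models an insertion, and an empty `edit_seq` models a deletion of the interval between the two arms.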
[0084] By "selectively binds" with reference to a guide RNA is meant that the guide RNA binds preferentially to a target sequence of interest or binds with greater affinity to the target sequence than to other genomic sequences. For example, a gRNA will bind to a substantially complementary sequence and not to unrelated sequences. A gRNA that "selectively binds" to a particular allele, such as a particular mutant allele (e.g., allele comprising a substitution, insertion, or deletion), denotes a gRNA that binds preferentially to the particular target allele, but to a lesser extent to a wild-type allele or other sequences. A gRNA that selectively binds to a particular target DNA sequence will selectively direct binding of an RNA-guided nuclease (e.g., Cas9) to a substantially complementary sequence at the target site and not to unrelated sequences.
[0085] As used herein, the term “recombination target site” denotes a region of a nucleic acid molecule comprising a binding site or sequence-specific motif recognized by a site-specific recombinase that binds at the target site and catalyzes recombination of specific sequences of DNA at the target site. Site-specific recombinases catalyze recombination between two such target sites. The location and relative orientation of the target sites determines the outcome of recombination. For example, translocation occurs if the recombination target sites are on separate DNA molecules. [0086] As used herein, the terms "label" and "detectable label" refer to a molecule capable of detection, including, but not limited to, radioactive isotopes, fluorescers, chemiluminescers, chromophores, enzymes, enzyme substrates, enzyme cofactors, enzyme inhibitors, semiconductor nanoparticles, dyes, metal ions, metal sols, ligands (e.g., biotin, streptavidin or haptens) and the like. The term "fluorescer" refers to a substance or a portion thereof which is capable of exhibiting fluorescence in the detectable range. 
Particular examples of labels which may be used in the practice of the present disclosure include, but are not limited to, SYBR green, SYBR gold, a CAL Fluor dye such as CAL Fluor Gold 540, CAL Fluor Orange 560, CAL Fluor Red 590, CAL Fluor Red 610, and CAL Fluor Red 635, a Quasar dye such as Quasar 570, Quasar 670, and Quasar 705, an Alexa Fluor such as Alexa Fluor 350, Alexa Fluor 488, Alexa Fluor 546, Alexa Fluor 555, Alexa Fluor 594, Alexa Fluor 647, and Alexa Fluor 784, a cyanine dye such as Cy3, Cy3.5, Cy5, Cy5.5, and Cy7, fluorescein, 2',4',5',7'-tetrachloro-4-7-dichlorofluorescein (TET), carboxyfluorescein (FAM), 6-carboxy-4',5'-dichloro-2',7'-dimethoxyfluorescein (JOE), hexachlorofluorescein (HEX), rhodamine, carboxy-X-rhodamine (ROX), tetramethyl rhodamine (TAMRA), FITC, dansyl, umbelliferone, dimethyl acridinium ester (DMAE), Texas red, luminol, NADPH, horseradish peroxidase (HRP), and α-β-galactosidase. [0087] "Recombinant" as used herein to describe a nucleic acid molecule means a polynucleotide of genomic, cDNA, viral, semisynthetic, or synthetic origin which, by virtue of its origin or manipulation, is not associated with all or a portion of the polynucleotide with which it is associated in nature. The term "recombinant" as used with respect to a protein or polypeptide means a polypeptide produced by expression of a recombinant polynucleotide. In general, the gene of interest is cloned and then expressed in transformed organisms, as described further below. The host organism expresses the foreign gene to produce the protein under expression conditions.
[0088] The term "transformation" refers to the insertion of an exogenous polynucleotide into a host cell, irrespective of the method used for the insertion. For example, direct uptake, transduction or f-mating are included. The exogenous polynucleotide may be maintained as a non-integrated vector, for example, a plasmid, or alternatively, may be integrated into the host genome. [0089] "Recombinant host cells", "host cells," "cells", "cell lines," "cell cultures," and other such terms denoting microorganisms or higher eukaryotic cell lines cultured as unicellular entities refer to cells which can be, or have been, used as recipients for recombinant vector or other transferred DNA, and include the original progeny of the original cell which has been transfected. [0090] A "coding sequence" or a sequence which "encodes" a selected polypeptide, is a nucleic acid molecule which is transcribed (in the case of DNA) and translated (in the case of mRNA) into a polypeptide in vivo when placed under the control of appropriate regulatory sequences (or "control elements"). The boundaries of the coding sequence can be determined by a start codon at the 5' (amino) terminus and a translation stop codon at the 3' (carboxy) terminus. A coding sequence can include, but is not limited to, cDNA from viral, prokaryotic or eukaryotic mRNA, genomic DNA sequences from viral or prokaryotic DNA, and even synthetic DNA sequences. A transcription termination sequence may be located 3' to the coding sequence. 
The coding sequence may be interrupted by introns which can be self-splicing group I or group II introns or those which are spliced out by the host cell splicing machinery. [0091] Typical "control elements" include, but are not limited to, transcription promoters, transcription enhancer elements, introns (located anywhere in the transcript), transcription termination signals, polyadenylation sequences (located 3' to the translation stop codon), sequences for optimization of initiation of translation (located 5' to the coding sequence), and translation termination sequences. [0092] "Operably linked" refers to an arrangement of elements wherein the components so described are configured so as to perform their usual function. Thus, a given promoter operably linked to a coding sequence is capable of effecting the expression of the coding sequence when the proper enzymes are present. The promoter need not be contiguous with the coding sequence, so long as it functions to direct the expression thereof. Thus, for example, intervening untranslated yet transcribed sequences can be present between the promoter
sequence and the coding sequence and the promoter sequence can still be considered "operably linked" to the coding sequence. [0093] "Expression cassette" or "expression construct" refers to an assembly which is capable of directing the expression of the sequence(s) or gene(s) of interest. An expression cassette generally includes control elements, as described above, such as a promoter which is operably linked to (so as to direct transcription of) the sequence(s) or gene(s) of interest, and often includes a polyadenylation sequence as well. Within certain embodiments of the present disclosure, the expression cassette described herein may be contained within a plasmid or viral vector construct (e.g., a vector for genome modification comprising a genome editing cassette comprising a promoter operably linked to a polynucleotide encoding a guide RNA and a donor polynucleotide). In addition to the components of the expression cassette, the construct may also include, one or more selectable markers, a signal which allows the construct to exist as single stranded DNA (e.g., a M13 origin of replication), at least one multiple cloning site, and a "mammalian" origin of replication (e.g., a SV40 or adenovirus origin of replication) or “yeast” origin of replication (e.g. a 2-micron vector or centromeric vector with an autonomously replicating sequence (ARS)). [0094] The term "transfection" is used to refer to the uptake of foreign DNA by a cell. A cell has been "transfected" when exogenous nucleic acids have been introduced inside the cell membrane. A number of transfection techniques are generally known in the art. See, e.g., Graham et al. (1973) Virology, 52:456, Sambrook et al. (2001) Molecular Cloning, a laboratory manual, 3rd edition, Cold Spring Harbor Laboratories, New York, Davis et al. (1995) Basic Methods in Molecular Biology, 2nd edition, McGraw-Hill, and Chu et al. (1981) Gene 13:197. 
Such techniques can be used to introduce one or more exogenous nucleic acids moieties into suitable host cells. The term refers to both stable and transient uptake of the genetic material and includes uptake of peptide- or antibody-linked nucleic acids. [0095] A "vector" is capable of transferring nucleic acid sequences to target cells (e.g., viral vectors, non-viral vectors, particulate carriers, and liposomes). Typically, "vector construct," "expression vector," and "gene transfer vector," mean any nucleic acid construct capable of directing the expression of a nucleic acid of interest and which can transfer nucleic acid sequences to target cells. Thus, the term includes cloning and expression vehicles, as well as plasmid and viral vectors.
[0096] The terms "variant", "analog" and "mutein" refer to biologically active derivatives of the reference molecule that retain desired activity, such as site-directed Cas9 endonuclease activity. In general, the terms "variant" and "analog" refer to compounds having a native polypeptide sequence and structure with one or more amino acid additions, substitutions (generally conservative in nature) and/or deletions, relative to the native molecule, so long as the modifications do not destroy biological activity and which are "substantially homologous" to the reference molecule as defined below. In general, the amino acid sequences of such analogs will have a high degree of sequence homology to the reference sequence, e.g., amino acid sequence homology of more than 50%, generally more than 60%-70%, even more particularly 80%-85% or more, such as at least 90%-95% or more, when the two sequences are aligned. Often, the analogs will include the same number of amino acids but will include substitutions, as explained herein. The term "mutein" further includes polypeptides having one or more amino acid-like molecules including but not limited to compounds comprising only amino and/or imino molecules, polypeptides containing one or more analogs of an amino acid (including, for example, unnatural amino acids, etc.), polypeptides with substituted linkages, as well as other modifications known in the art, both naturally occurring and non-naturally occurring (e.g., synthetic), cyclized, branched molecules and the like. The term also includes molecules comprising one or more N-substituted glycine residues (a "peptoid") and other synthetic amino acids or peptides. (See, e.g., U.S. Patent Nos. 5,831,005; 5,877,278; and 5,977,301; Nguyen et al., Chem. Biol. (2000) 7:463-473; and Simon et al., Proc. Natl. Acad. Sci. USA (1992) 89:9367–9371 for descriptions of peptoids). Methods for making polypeptide analogs and muteins are known in the art and are described further below. 
[0097] As explained above, analogs generally include substitutions that are conservative in nature, i.e., those substitutions that take place within a family of amino acids that are related in their side chains. Specifically, amino acids are generally divided into four families: (1) acidic -- aspartate and glutamate; (2) basic -- lysine, arginine, histidine; (3) non-polar -- alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan; and (4) uncharged polar -- glycine, asparagine, glutamine, cysteine, serine, threonine, and tyrosine. Phenylalanine, tryptophan, and tyrosine are sometimes classified as aromatic amino acids. For example, it is reasonably predictable that an isolated replacement of leucine with isoleucine or valine, an aspartate with a glutamate, a threonine with a serine, or a similar conservative replacement of an amino acid with a structurally related amino acid, will not have a major effect on the biological activity. For example, the polypeptide of interest may include up to about 5-10
conservative or non-conservative amino acid substitutions, or even up to about 15-25 conservative or non-conservative amino acid substitutions, or any integer between 5-25, so long as the desired function of the molecule remains intact. One of skill in the art may readily determine regions of the molecule of interest that can tolerate change by reference to Hopp/Woods and Kyte-Doolittle plots, well known in the art. [0098] "Gene transfer" or "gene delivery" refers to methods or systems for reliably inserting DNA or RNA of interest into a host cell. Such methods can result in transient expression of non-integrated transferred DNA, extrachromosomal replication and expression of transferred replicons (e.g., episomes), or integration of transferred genetic material into the genomic DNA of host cells. Gene delivery expression vectors include, but are not limited to, vectors derived from bacterial plasmid vectors, viral vectors, non-viral vectors, adenoviruses, retroviruses, alphaviruses, pox viruses, and vaccinia viruses. [0099] The term "derived from" is used herein to identify the original source of a molecule but is not meant to limit the method by which the molecule is made which can be, for example, by chemical synthesis or recombinant means. [0100] A polynucleotide "derived from" a designated sequence refers to a polynucleotide sequence which comprises a contiguous sequence of approximately at least about 6 nucleotides, preferably at least about 8 nucleotides, more preferably at least about 10-12 nucleotides, and even more preferably at least about 15-20 nucleotides corresponding, i.e., identical or complementary to, a region of the designated nucleotide sequence. 
The derived polynucleotide will not necessarily be derived physically from the nucleotide sequence of interest, but may be generated in any manner, including, but not limited to, chemical synthesis, replication, reverse transcription or transcription, which is based on the information provided by the sequence of bases in the region(s) from which the polynucleotide is derived. As such, it may represent either a sense or an antisense orientation of the original polynucleotide. [0101] The term "subject" includes both vertebrates and invertebrates, including, without limitation, mammals, including human and non-human mammals such as non-human primates, including chimpanzees and other apes and monkey species; laboratory animals such as mice, rats, rabbits, hamsters, guinea pigs, and chinchillas; domestic animals such as dogs and cats; farm animals such as sheep, goats, pigs, horses and cows; and birds such as domestic, wild and game birds, including chickens, turkeys and other gallinaceous birds, ducks, geese, and the like. In some cases, the methods of the present disclosure find use in experimental animals, in
veterinary application, and in the development of animal models for disease, including, but not limited to, rodents including mice, rats, and hamsters; primates, and transgenic animals. [0102] The terms “subject,” “individual,” and “patient” are used interchangeably herein to refer to a vertebrate, such as a mammal. Mammals include, but are not limited to, murines, simians, humans, farm animals, sport animals, and pets. Tissues, cells and their progeny of a biological entity obtained in vivo or cultured in vitro are also encompassed. [0103] “Genetic disease” as used herein refers to a disease, partially or completely, directly or indirectly, caused by one or more abnormalities in the genome, especially a condition that is present from birth. The abnormality may be a mutation, an insertion or a deletion. The abnormality may affect the coding sequence of the gene or its regulatory sequence. The genetic disease may be selected from the group consisting of an inherited muscle disease (e.g., congenital myopathy or a muscular dystrophy), a lysosomal storage disease, a heritable disorder of connective tissue, a neurodegenerative disorder, and a skeletal dysplasia. For example, the genetic disease may be, but is not limited to, Duchenne muscular dystrophy (DMD), Becker's muscular dystrophy, Limb-girdle muscular dystrophy, dysferlinopathy, dystroglycanopathy, aspartylglucosaminuria, Batten disease, cystinosis, Fabry disease, Gaucher disease, Pompe disease, Tay Sachs disease, Sandhoff disease, metachromatic leukodystrophy, mucolipidosis, mucopolysaccharide storage diseases, Niemann-Pick disease, Schindler disease, Krabbe disease, Ehlers-Danlos syndrome, epidermolysis bullosa, Marfan syndrome, neurofibromatosis, spinal muscular atrophy, amyotrophic lateral sclerosis, progressive muscular atrophy, fragile X syndrome, Charcot-Marie-Tooth disease, osteogenesis imperfecta, achondroplasia, or osteopetrosis. 
Other genetic diseases include hemophilia, cystic fibrosis, Huntington's chorea, familial hypercholesterolemia (LDL receptor defect), hepatoblastoma, Wilson's disease, congenital hepatic porphyria, inherited disorders of hepatic metabolism, Lesch Nyhan syndrome, sickle cell anemia, thalassaemias, xeroderma pigmentosum, Fanconi's anemia, retinitis pigmentosa, ataxia telangiectasia, Bloom's syndrome, retinoblastoma, and Tay-Sachs disease. [0104] The term “ribozyme” refers to an RNA molecule that is capable of catalyzing a biochemical reaction. In some instances, ribozymes function in protein synthesis, catalyzing the linking of amino acids in the ribosome. In other instances, ribozymes participate in various other RNA processing functions, such as splicing, viral replication, and tRNA biosynthesis. In some instances, ribozymes can be self-cleaving. Non-limiting examples of ribozymes include
the HDV ribozyme, the Lariat capping ribozyme (formerly called GIR1 branching ribozyme), the glmS ribozyme, group I and group II self-splicing introns, the hairpin ribozyme, the hammerhead ribozyme, various rRNA molecules, RNase P, the twister ribozyme, the VS ribozyme, the pistol ribozyme, and the hatchet ribozyme. Other examples include the self-cleaving ribozyme-containing R2 elements, the L1Tc retrotransposon found in Trypanosoma cruzi, short interspersed nuclear elements (SINEs) in Schistosomes, Penelope-like elements and retrozymes. For more information regarding ribozymes, see, e.g., Doherty, et al. Ann. Rev. Biophys. Biomol. Struct. 30: 457-475 (2001) and Weinberg, et al., Nucleic Acids Research, 47(18): 9480–9494 (2019); each of which is incorporated herein by reference in its entirety for all purposes. [0105] As used herein, the term “administering” includes oral administration, topical contact, administration as a suppository, intravenous, intraperitoneal, intramuscular, intralesional, intrathecal, intranasal, or subcutaneous administration to a subject. Administration is by any route, including parenteral and transmucosal (e.g., buccal, sublingual, palatal, gingival, nasal, vaginal, rectal, or transdermal). Parenteral administration includes, e.g., intravenous, intramuscular, intra-arteriole, intradermal, subcutaneous, intraperitoneal, intraventricular, and intracranial. Other modes of delivery include, but are not limited to, the use of liposomal formulations, intravenous infusion, transdermal patches, etc. Administering also refers to delivery of material, including biological material such as nucleic acids and/or proteins, into cells by transformation, transfection, transduction, ballistic methods, electroporation, or injection (e.g., intravenous, intramuscular, intra-arteriole, intradermal, subcutaneous, intraperitoneal, intraventricular, and intracranial injection). 
[0106] The term “treating” refers to an approach for obtaining beneficial or desired results including, but not limited to, a therapeutic benefit and/or a prophylactic benefit. By therapeutic benefit is meant any therapeutically relevant improvement in or effect on one or more diseases, conditions, or symptoms under treatment. For prophylactic benefit, the compositions may be administered to a subject at risk of developing a particular disease, condition, or symptom, or to a subject reporting one or more of the physiological symptoms of a disease, even though the disease, condition, or symptom may not have yet been manifested. [0107] The term “effective amount” or “sufficient amount” refers to the amount of an agent that is sufficient to effect beneficial or desired results. The therapeutically effective amount may vary depending upon one or more of: the subject and disease condition being treated, the weight and age of the subject, the severity of the disease condition, the manner of
administration and the like, which can readily be determined by one of ordinary skill in the art. The specific amount may vary depending on one or more of: the particular agent chosen, the host cell type, the location of the host cell in the subject, the dosing regimen to be followed, whether it is administered in combination with other compounds, timing of administration, and the physical delivery system in which it is carried. [0108] The term “pharmaceutically acceptable carrier” refers to a substance that aids the administration of an active agent to a cell, an organism, or a subject. “Pharmaceutically acceptable carrier” refers to a carrier or excipient that can be included in the compositions of the invention and that causes no significant adverse toxicological effect on the patient. Non-limiting examples of pharmaceutically acceptable carriers include water, NaCl, normal saline solutions, lactated Ringer's, normal sucrose, normal glucose, cell culture media, and the like. One of skill in the art will recognize that other pharmaceutical carriers are useful in the present invention. [0109] As used herein, the term “heterologous” refers to biological material that is introduced, inserted, or incorporated into a recipient (e.g., host) organism that originates from another organism. Typically, the heterologous material that is introduced into the recipient organism (e.g., a host cell) is not normally found in that organism. Heterologous material can include, but is not limited to, nucleic acids, amino acids, peptides, proteins, and structural elements such as genes, promoters, and cassettes. A host cell can be, but is not limited to, a bacterium, a yeast cell, a mammalian cell, or a plant cell. The introduction of heterologous material into a host cell or organism can result, in some instances, in the expression of additional heterologous material in or by the host cell or organism. 
As a non-limiting example, the transformation of a yeast host cell with an expression vector that contains DNA sequences encoding a bacterial protein may result in the expression of the bacterial protein by the yeast cell. The incorporation of heterologous material may be permanent or transient. Also, the expression of heterologous material may be permanent or transient. II. Methods for editing DNA in a cell [0110] Most biological traits are complex and controlled by genetic variants across the genome. To fully unravel their genetic architectures, methods that can systematically dissect the causal variants in each locus in a comprehensive and scalable fashion are preferred. Towards this goal, the inventors previously developed a donor DNA-based CRISPR system for engineering and functionally profiling thousands of genetic variants in pooled screens
termed Multiplexed Accurate Genome Editing with Short, Trackable, Integrated Cellular barcodes (MAGESTIC). Provided herein is a substantially improved system termed MAGESTIC 3.0, which combines three orthogonal enhancements for homology-directed repair (HDR): donor DNA recruitment with a DNA break-site binding domain, single-stranded donor DNA synthesis with the bacterial retron system, and in vivo assembly of linearized donor plasmids. Each system functions at different stages in the editing process to improve editing outcomes by increasing the fraction of correctly edited cells, and reducing the fraction of non-edited and aberrantly edited cells. The retron produces multiple copies of single-stranded DNA (ssDNA), which accumulate in the cell and improve editing outcomes over multiple generations. The linearized guide-donor plasmids (or donor-gRNA plasmids) provide an optimal donor DNA template immediately upon transformation, prior to the buildup of retron template in the cell, thereby enhancing editing survival for guides with higher cleavage efficacy. Donor recruitment brings either the double-stranded or single-stranded DNA templates in close proximity to the target site to improve HDR efficiency. By combining each into a single system, MAGESTIC 3.0 improves editing efficiency to the highest overall levels of any system, nearly completely inhibiting both the toxicity associated with editing as well as structural variant formation at susceptible target sites. [0111] Thus, in one aspect, provided herein are methods for increasing the editing efficiency at a target locus in a cell. In some embodiments, the target locus is in the genome of a cell. The methods provide the advantage of combining in vivo plasmid assembly, donor recruitment, and retron donor DNA generation. The inventors demonstrate that each editing system improves edit outcomes at different sites in different ways. 
Surprisingly, the integration of all three systems results in substantially improved editing at all sites measured and in multiple distinct assays, enabling effective editing of structural variant prone regions and saturation editing across entire genomic loci for the first time. [0112] In one aspect, the disclosure provides a method for increasing editing efficiency, fidelity, and/or survival of an edited cell, the method comprising introducing into a cell: i) a linear double stranded polynucleotide comprising a double stranded DNA (dsDNA) donor sequence and a nucleic acid sequence encoding a guide RNA operably linked to a first promoter; and ii) a linear double stranded polynucleotide;
wherein the linear double stranded polynucleotide of (ii) is linked in vivo to the linear double stranded polynucleotide of (i) by homology directed repair (HDR) or non-homologous end joining (NHEJ) to produce a circular donor-gRNA (or guide-donor) plasmid; wherein the cell comprises a CRISPR-associated (Cas) endonuclease or a nucleic acid sequence encoding the Cas endonuclease, and wherein the Cas endonuclease and the circular donor-gRNA plasmid and/or the linear double stranded polynucleotide of (i) prior to assembly generate a site-specific edit in the genome of the cell, and increase the editing efficiency, fidelity, and/or survival of the cell compared to a method that does not include in vivo plasmid assembly to produce a circular donor-gRNA plasmid. [0113] In some embodiments, the linear double stranded polynucleotide of (ii) is linked in vivo to the linear double stranded polynucleotide of (i) by homology directed repair (HDR). For the HDR-dependent assembly, plasmids harboring guide and donor cassettes are first linearized with a restriction enzyme, such as but not limited to HindIII or I-SceI, although any restriction enzyme site can be used. Alternatively, the plasmid fragment can be amplified by PCR. The linear double stranded polynucleotide of (i) is referred to as the guide-donor backbone (or the donor-gRNA backbone). A separate piece of the plasmid, the linear double stranded polynucleotide of (ii), also referred to as the insert, can similarly be produced by restriction digestion or by PCR amplification. In some embodiments, the insert comprises a selectable marker required for cell growth. The ends of the linear double stranded polynucleotide of (i) (or the guide-donor backbone, or the donor-gRNA backbone) and the ends of the linear double stranded polynucleotide of (ii) (or the insert) overlap with sufficient homology for repair by HDR. 
In some embodiments, the region of overlap comprises 20 or greater than 20 base-pairs, such as 20, 30, 40, 50 or hundreds to thousands of base pairs of overlap. The linear double stranded polynucleotide of (i) (or the guide-donor backbone, or the donor-gRNA backbone) and the linear double stranded polynucleotide of (ii) (or the insert) are then transformed into cells which reconstitute the circular vectors by HDR. [0114] In some embodiments, the linear double stranded polynucleotide of (ii) is linked in vivo to the linear recombinant double stranded polynucleotide of (i) by non-homologous end joining (NHEJ). For the NHEJ assembly method, the guide-donor plasmid (or the donor-gRNA plasmid) is cleaved by one or two restriction enzymes that leave sticky end overhangs. In some embodiments, the sticky end overhangs are incompatible overhangs (i.e., do not hybridize to each other) to prevent ligation of the linear double stranded polynucleotide of (i) (or the guide-
donor backbone, or the donor-gRNA backbone). The linear double stranded polynucleotide of (ii) (or the insert) can similarly be produced by restriction digestion of the vector or by PCR amplification followed by restriction digestion. In some embodiments, the linear double stranded polynucleotide of (ii) (or the insert) comprises a selectable marker required for cell growth. In some embodiments, the overhangs generated on both ends of the insert are incompatible for self-ligation, and only compatible for ligation with the guide-donor backbone. The linear double stranded polynucleotide of (i) (or guide-donor backbone, or the donor-gRNA backbone) and the linear double stranded polynucleotide of (ii) (or the insert) are then transformed into cells which reconstitute the circular vectors by ligation via non-homologous end joining (NHEJ). [0115] In some embodiments, the linear double stranded polynucleotide of (ii) comprises a selectable marker. In some embodiments, the linear double stranded polynucleotide of (ii) further comprises a barcode marker. In some embodiments, the barcode sequence integrates into a designated barcode locus in the host cell genome. In some embodiments, such integration is inducible. In some embodiments, the integration of the barcode sequence into the barcode locus happens after the designated barcode locus is cleaved by a recombinant endonuclease. In some embodiments, a recombinant nucleic acid sequence encoding the endonuclease is operably linked to an inducible promoter. In such embodiments, the integration of the barcode sequence into the barcode locus in the host cell genome is inducible. 
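The two in vivo assembly routes described above (HDR via overlapping ends of at least 20 base pairs, and NHEJ via sticky ends that cannot self-ligate) can be checked in silico before fragments are prepared. The following is a minimal sketch under stated assumptions: the function names, the toy sequences, and the representation of each sticky end as a 5'-to-3' overhang string are hypothetical and are not part of the disclosure. The AGCT and GATC overhangs correspond to HindIII and BamHI digestion and are used purely for illustration.

```python
# Illustrative sketch only: names, sequences, and thresholds are hypothetical.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Reverse complement of a DNA string (uppercase A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def end_overlap(upstream, downstream, min_len=20):
    """Length of the longest suffix of `upstream` that equals a prefix of
    `downstream`, or 0 if no such match of at least `min_len` bases exists."""
    max_len = min(len(upstream), len(downstream))
    for k in range(max_len, min_len - 1, -1):
        if upstream[-k:] == downstream[:k]:
            return k
    return 0


def can_assemble_by_hdr(backbone, insert, min_len=20):
    """True if both junctions of the would-be circle share >= min_len bp of
    homology (backbone end -> insert start, and insert end -> backbone start)."""
    return (end_overlap(backbone, insert, min_len) >= min_len
            and end_overlap(insert, backbone, min_len) >= min_len)


def overhangs_compatible(a, b):
    """Two same-polarity overhangs, each written 5'->3', can anneal when one
    is the reverse complement of the other."""
    return a == revcomp(b)


def nhej_design_ok(backbone_ends, insert_ends):
    """Check the design rule from the text: neither fragment can ligate to
    itself, but each backbone end pairs with the matching insert end."""
    b_left, b_right = backbone_ends
    i_left, i_right = insert_ends
    no_backbone_self_ligation = not overhangs_compatible(b_left, b_right)
    no_insert_self_ligation = not overhangs_compatible(i_left, i_right)
    junctions_pair = (overhangs_compatible(b_right, i_left)
                      and overhangs_compatible(i_right, b_left))
    return no_backbone_self_ligation and no_insert_self_ligation and junctions_pair


if __name__ == "__main__":
    hom_a = "ACGTTGCAAGCTTGGATCCA"  # hypothetical 20 bp homology arm
    hom_b = "TTGACCGGTAATCGCATGCA"  # hypothetical 20 bp homology arm
    backbone = hom_b + "C" * 30 + hom_a
    insert = hom_a + "G" * 30 + hom_b
    print(can_assemble_by_hdr(backbone, insert))
    print(nhej_design_ok(("AGCT", "GATC"), ("GATC", "AGCT")))
```

The NHEJ check deliberately ignores end-to-end joining between two copies of the same fragment, which palindromic overhangs still permit; it only tests the intramolecular self-ligation rule stated in the text.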
[0116] In some embodiments, the method further comprises introducing into the cell a nucleic acid sequence encoding a retron structured non-coding RNA (ncRNA) operably linked to a second promoter, the retron ncRNA comprising: one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences; an msr sequence; an msd sequence; a donor sequence for homology directed repair (HDR) inserted within the msd sequence; and a first inverted repeat sequence and a second inverted repeat sequence. In some embodiments, the retron optionally comprises a stabilizing 5′ ribozyme sequence. In some embodiments, the cell further comprises a retron-specific reverse transcriptase (RT) or a nucleic acid encoding the retron RT, and the retron RT and the retron ncRNA generate multicopy single-stranded DNA (msDNA) containing the single-stranded donor (or named ssDNA retron donor) sequences. [0117] In some embodiments, the RNA binding domain recognition sequence is an RNA sequence specifically bound by an RNA binding domain of a polypeptide or an aptamer.
Examples of RNA binding domain recognition sequences that bind polypeptide RNA binding domains include, but are not limited to, MS2 stem loop sequence which binds to the MS2 coat protein (MCP), a Pumilio (PUF) recognition sequence, RNA Recognition Motif (RRM) recognition sequence, Double-Stranded RNA-Binding Domain (dsRBD) recognition sequence, Zinc finger (ZF) Domain recognition sequences, Z-alpha, arginine/glycine rich (RGG) domain recognition sequences, a K Homology (KH) Domain recognition sequence, or Poly(A) tail. An exemplary MS2 coat protein (MCP) is a bacteriophage MS2 coat protein (see, for example UniProtKB - J9QBW2 (J9QBW2_BPMS2) and UniProtKB - P03612 (CAPSD_BPMS2)). [0118] In some embodiments, the one or more RNA binding domain recognition sequences comprises a stem loop sequence from the bacteriophage MS2. In some embodiments, the RNA binding domain recognition sequence is a MS2 stem loop sequence. In some embodiments, the MS2 stem loop sequence binds to a MS2 coat protein (MCP) RNA binding domain. [0119] In some embodiments, the single stranded nucleic acid binding domain recognition sequence is a single stranded DNA or RNA sequence specifically bound by a single stranded nucleic acid binding domain of a polypeptide or an aptamer. Non-limiting examples of single stranded nucleic acid binding domain recognition sequences are described in Dickey et al., “Single-Stranded DNA-Binding Proteins: Multiple Domains for Multiple Functions,” Structure 21(7), pgs 1074-1084, July 2, 2013, and references cited therein. As described in Dickey et al. (2013), single stranded DNA-binding proteins have a wide range of structures and functions, but many of them contain small autonomous domains whose recognition of ssDNA has been well studied. 
These domains include four structural topologies that have been structurally characterized with ssDNA: oligonucleotide/oligosaccharide/oligopeptide-binding (OB) folds, K homology (KH) domains, RNA recognition motifs (RRMs), and whirly domains. OB folds are formed from a five-stranded β barrel with interspersed loop and helical elements, show significant structural divergence and are capable of binding a variety of ligands in addition to ssDNA and ssRNA (Theobald et al., 2003). OB folds can bind ssDNA with high sequence specificity. For example, telomere-end protection (TEP) proteins utilize OB folds to sequence specifically bind the GT-rich 3′ ssDNA overhang constitutively found at the end of eukaryotic telomeres (reviewed in Horvath, 2011; Lewis and Wuttke, 2012). Examples of proteins containing OB folds include Pot1, Cdc13, and TEBP, which are responsible for coordinating end protection and telomerase recruitment at the telomere. KH domains are small domains (approximately 70 aa) characterized by three α helices packed against a three-stranded β sheet (reviewed in Valverde et al., 2008), and KH domains from proteins structurally
characterized in complex with ssDNA include heterogeneous ribonucleoprotein K (hnRNP K), far upstream element (FUSE)-binding protein (FBP), and poly(C)-binding proteins (PCBP) 1 and 2. RRMs most often bind RNA, but have also been shown to bind ssDNA (reviewed in Cléry et al., 2008). RRMs are typically about 90 aa in length and form a relatively large β sheet surface (more similar to OB folds than to KH domains) packed against two α helices. The majority of RRMs contain two conserved sequence motifs (RNPs) on strands 1 and 3 that form the primary nucleic acid-binding surface. Residues found elsewhere in the sheet (sometimes including an additional strand) and intervening loops also contribute to nucleic acid binding. Whirly domains are large (approximately 180 aa) domains that contain two roughly parallel four-stranded β sheets with interspersed helical elements. Individual domains form tetramers through interaction of the helices, and these tetramers further interact to form hexamers of tetramers (Cappadocia et al., 2010, 2012). See Dickey et al., “Single-Stranded DNA-Binding Proteins: Multiple Domains for Multiple Functions,” Structure 21(7), pgs 1074-1084, July 2, 2013, and references cited therein. In some embodiments, the single stranded nucleic acid binding domain recognition sequence binds to a Cas endonuclease binding domain. 
[0120] Thus, in some embodiments, the one or more single stranded nucleic acid binding domain recognition sequences include, but are not limited to, oligonucleotide/oligosaccharide/oligopeptide-binding (OB) folds, such as in human POT1, Schizosaccharomyces pombe Pot1, Sterkiella nova TEBP, Cdc13, CspB protein from Bacillus caldolyticus and Bacillus subtilis; K homology (KH) domains, such as in KH domain-containing proteins heterogeneous ribonucleoprotein K (hnRNP K), far upstream element (FUSE)-binding protein (FBP), and poly(C)-binding proteins (PCBP) 1 and 2; RNA recognition motifs (RRMs) which bind DNA such as in FBP-interacting repressor (FIR), hnRNP A1, and hnRNP D (also known as Auf1); and whirly domains such as in the mitochondrial whirly protein Why2 and the mammalian transcriptional regulator PurA. In some embodiments, the single stranded nucleic acid binding domain is a single stranded nucleic acid binding domain of a G-quadruplex binding domain including nucleolin, hnRNP, serine/arginine-rich splicing factors (SRSF) 1 and 9, splicing factor U2AF, TRF2, FRM2, and the RNA helicase associated with AU-rich element (RHAU) proteins (see V. Brázda et al., DNA and RNA quadruplex-binding proteins. Int J Mol Sci. 2014;15(10):17493-17517. doi:10.3390/ijms151017493). In some embodiments, the single stranded nucleic acid binding domain is a single stranded nucleic acid binding domain of a Cas endonuclease.
[0121] Also provided herein are chimeric constructs encoding a retron multicopy single-stranded DNA (msDNA), which comprises an msr RNA covalently attached to a msd DNA, wherein the RNA comprises one or more RNA binding domain recognition sequences and an msr sequence; and wherein the DNA comprises an msd sequence and a subject expression sequence inserted within the msd sequence. In aspects, the subject expression sequence comprises a donor sequence for homology directed repair (HDR). [0122] Also provided herein are polypeptides and their encoding nucleic acids comprising an RNA binding domain or single stranded nucleic acid binding domain covalently bound to a dsDNA break site localizing domain. In aspects, the RNA binding domain is an RNA binding domain of a polypeptide that binds to a MS2 stem loop sequence which binds to the MS2 coat protein (MCP), a Pumilio (PUF) recognition sequence, an RNA Recognition Motif (RRM) recognition sequence, a Double-Stranded RNA-Binding Domain (dsRBD) recognition sequence, a Zinc finger (ZF) Domain recognition sequence, a Z-alpha, arginine/glycine rich (RGG) domain recognition sequence, a K Homology (KH) Domain recognition sequence, or a Poly(A) tail. [0123] In aspects, the single stranded nucleic acid binding domain is a single stranded nucleic acid binding domain of a polypeptide that binds to a specific sequence of a single stranded DNA or RNA, such as a Cas endonuclease binding domain. 
Single stranded nucleic acid binding domain recognition domains of polypeptides include, but are not limited to, oligonucleotide/oligosaccharide/oligopeptide-binding (OB) folds, such as in human POT1, Schizosaccharomyces pombe Pot1, Sterkiella nova TEBP, CspB protein from Bacillus caldolyticus and Bacillus subtilis; K homology (KH) domains, such as in KH domain-containing proteins heterogeneous ribonucleoprotein K (hnRNP K), far upstream element (FUSE)-binding protein (FBP), and poly(C)-binding proteins (PCBP) 1 and 2; RNA recognition motifs (RRMs) which bind DNA such as in FBP-interacting repressor (FIR), hnRNP A1, and hnRNP D (also known as Auf1); and whirly domains such as in the mitochondrial whirly protein Why2 and the mammalian transcriptional regulator PurA. [0124] It will be understood that additional RNA binding proteins with well-characterized motifs can be utilized for recruiting the retron msDNA. As an alternative mechanism to recruit the retron via the cDNA, an inverted LexA-LexA repeat with an intervening loop sequence could be inserted into the reverse-transcribed portion of the retron donor. Upon reverse transcription these inverted repeats would fold back on one another creating a highly stable
stem loop structure and enable the LexA DNA binding domain to be utilized. The FHA domain could be replaced with other domains known to bind to double-strand breaks, or the MCP could be fused directly to Cas9 to have retron donor present at the cut site when Cas9 cleavage occurs. Alternatively, other RNA binding domains and aptamers could be used in place of the MS2 system such as the programmable RNA-binding domains of Pumilio/fem-3 mRNA binding factors (PUF domains) (Zhao et al., Nucleic Acids Research, 2018 PMCID: PMC5961129) or using CRISPR-Cas systems, where the scaffold for a deactivated Cas nuclease could be introduced in place of MS2 loops, and the deactivated Cas enzyme fused to the FHA domain. [0125] In aspects, the DNA break site localizing domain is a DNA break site localizing domain of a polypeptide listed in Tables 1-5 below. Table 1. Human Proteins for Recruitment to DNA Break
Table 2. Mammalian FOX Genes
Table 3. Human DNA Damage-Binding Genes
Table 4: Human DNA Repair Genes
Table 5: Yeast DNA Repair Genes
[0126] In some embodiments, the method further comprises introducing into the cell a fusion protein comprising a) a DNA binding domain, an RNA binding domain and/or single stranded nucleic acid binding domain and b) a dsDNA break site-localizing domain, or a nucleic acid encoding the fusion protein, wherein the fusion protein binds to the DNA binding domain recognition sequence of the circular donor-gRNA plasmid or the first linear double stranded polynucleotide prior to assembly, and/or the one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences of the retron, and binds to a dsDNA break site generated by the Cas endonuclease, thereby recruiting the dsDNA donor sequences and/or the ssDNA retron donor sequences to the dsDNA break in the genome of the cell and promoting editing by HDR. [0127] In some embodiments, the fusion protein comprises a DNA binding domain and a dsDNA break site-localizing domain. In some embodiments, such fusion protein can bind to the DNA binding domain recognition sequence of the circular donor-gRNA plasmid or the first linear double stranded polynucleotide prior to assembly and the dsDNA break site in the genome of the cell, thereby recruiting the dsDNA donor sequences to the dsDNA break in the genome of the cell and promoting editing by HDR. [0128] In some embodiments, the fusion protein comprises an RNA binding domain or single stranded nucleic acid binding domain and a dsDNA break site-localizing domain. In some embodiments, such fusion protein can bind to the RNA binding domain or single stranded
nucleic acid binding domain of the retron. As disclosed herein, a retron RNA expressed by the nucleic acid sequence encoding the retron is reverse transcribed by the RT in vivo to produce multiple ssDNA molecules. In some embodiments, the fusion protein binds to the RNA binding domain or single stranded nucleic acid binding domain of the individual ssDNA retron molecules to produce a complex between the linear ssDNA molecules, the fusion protein, and the double strand DNA break site, thereby recruiting the ssDNA retron donor sequences to the dsDNA break in the genome of the cell and promoting editing by HDR. [0129] In some embodiments, the fusion protein comprises a) a DNA binding domain, b) an RNA binding domain or single stranded nucleic acid binding domain, and c) a dsDNA break site-localizing domain. In some embodiments, such fusion protein can bind to i) the circular donor-gRNA plasmid or the first linear double stranded polynucleotide prior to assembly, ii) the individual ssDNA retron molecules, and iii) the dsDNA break site in the genome of the cell, thereby recruiting the dsDNA donor sequences and the ssDNA retron donor sequences to the dsDNA break in the genome of the cell and promoting editing by HDR. [0130] In some embodiments, the cell expresses a Cas endonuclease or a nucleic acid encoding the same, and a reverse transcriptase (RT) or a nucleic acid encoding the same. In some embodiments, the single stranded nucleic acid binding domain is a single stranded nucleic acid binding domain of a Cas endonuclease. In some embodiments, the fusion protein binds to the one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences and also binds to a double strand DNA break site generated by a Cas endonuclease. 
[0131] In some embodiments, retron RNA expressed by the nucleic acid sequence encoding a retron is reverse transcribed by the RT in vivo to produce multiple single stranded DNA molecules, wherein individual single stranded DNA molecules bind to the fusion protein to produce a complex between the linear DNA molecules, the fusion protein, and the double strand DNA break site, thereby recruiting the retron donor sequence to the DNA break and promoting editing by HDR. [0132] In some embodiments, the locus surrounding the double strand DNA break site accumulates phosphothreonine (pT) modified proteins recognized by forkhead-associated (FHA) domains. As is known in the art, accumulation of phosphothreonine (pT) modified proteins is a natural property of the cell’s DNA damage response. Thus, in some embodiments, RNA binding domain of the fusion protein comprises an RNA binding domain of MS2 coat
protein (MCP) and the DNA break site localizing domain of the fusion protein comprises a forkhead-associated (FHA) domain. In some embodiments, the fusion protein comprises a MS2 coat protein (MCP) RNA binding domain and a forkhead-associated (FHA) phosphothreonine-binding domain. [0133] In some embodiments, the linear double stranded donor recruitment polynucleotide of (i) further comprises a site for a nucleic acid binding domain. In some embodiments, the nucleic acid binding domain is a LexA DNA binding domain or a forkhead homolog 1 (FKH1) DNA binding domain. Thus, in some embodiments, the fusion protein further comprises a LexA DNA binding domain or an FKH1 DNA binding domain. [0134] In some embodiments, the fusion protein comprises three domains selected from a DNA binding domain, an RNA binding domain, and a DNA break site localizing domain. In some embodiments, the fusion protein comprises (i) a LexA DNA binding domain (ii) an RNA binding domain of MCP, and (iii) an FHA DNA break site localizing domain. In some embodiments, the domains in the fusion protein are arranged in one of the following configurations: (i), (ii), (iii); (i), (iii), (ii); (ii), (i), (iii); (ii), (iii), (i); (iii), (i), (ii); and (iii), (ii), (i). [0135] In some embodiments, the fusion protein comprises (i) the FKH1 DNA binding domain (ii) the RNA binding domain of MCP, and (iii) an FHA DNA break site localizing domain. In some embodiments, the domains in the fusion protein are arranged in one of the following configurations: (i), (ii), (iii); (i), (iii), (ii); (ii), (i), (iii); (ii), (iii), (i); (iii), (i), (ii); and (iii), (ii), (i). [0136] In some embodiments, the fusion protein forms a complex with the circular plasmid or linearized donor-gRNA backbone prior to assembly and the double strand DNA break site, thereby recruiting the circular plasmid or linearized donor backbone to the DNA break and enhancing HDR. 
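The six domain arrangements recited for the three-domain fusion proteins are simply all orderings of domains (i), (ii), and (iii). As a minimal sketch, the enumeration can be reproduced with Python's standard library; the dictionary labels and function name below are hypothetical, and the code makes no assumption about linkers or about which end of each ordering is the N-terminus, since the text does not specify either.

```python
from itertools import permutations

# Hypothetical labels for the LexA-based embodiment; swapping the entry for
# "i" to "FKH1 DNA binding domain" gives the FKH1-based embodiment.
DOMAINS = {
    "i": "LexA DNA binding domain",
    "ii": "MCP RNA binding domain",
    "iii": "FHA DNA break site localizing domain",
}


def domain_configurations(labels=("i", "ii", "iii")):
    """Return every ordering of the given domain labels as a list of tuples."""
    return list(permutations(labels))


if __name__ == "__main__":
    for order in domain_configurations():
        # Print each configuration using the descriptive domain names.
        print(" - ".join(DOMAINS[label] for label in order))
```

Because `permutations` emits orderings in lexicographic order of input position, the result follows the same sequence as the list in the text, from (i), (ii), (iii) through (iii), (ii), (i).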
[0137] In some embodiments, the donor recruitment polynucleotide comprises nucleotide sequences complementary to sequences adjacent to the DNA break. In some embodiments, the sequences complementary to sequences adjacent to the DNA break comprise the same sequences that are transcribed from the retron donor in the retron RNA. [0138] In some embodiments, the first promoter is a constitutive or inducible promoter. In some embodiments, the second promoter is a constitutive or inducible promoter.
[0139] In some embodiments, the Cas endonuclease is a Cas9 or Cas12a endonuclease, or a modified variant thereof. [0140] In some embodiments, the cell is a eukaryotic cell. In some embodiments, the eukaryotic cell is selected from a yeast cell, a vertebrate cell, or a mammalian cell. [0141] In some embodiments, the linear double stranded polynucleotide of (ii) further comprises a barcode sequence. In some embodiments, integration of the barcode sequence into a barcode locus in the host cell genome is inducible. It will be understood that the barcode locus can be a predetermined or designated locus that is different than the edited target site locus (the edited locus). [0142] In some embodiments, integration of the barcode sequence into the barcode locus comprises cleavage of the barcode locus genomic DNA by an endonuclease, wherein the endonuclease is operably linked to an inducible promoter. In some embodiments, the endonuclease is a Cas endonuclease or a homing endonuclease. In some embodiments, the homing endonuclease is an I-SceI endonuclease operably linked to a GAL1 promoter that is inducible by galactose. [0143] In some embodiments, the circular donor recruitment plasmid comprises sequences that are homologous to sequences flanking the homing endonuclease cleavage site, such that homologous recombination results in integration of the barcode sequence into the barcode locus. 
[0144] In some embodiments, the barcode locus in the host cell genome comprises nucleic acid sequences encoding the Cas endonuclease used for editing the target site, or sequences encoding the reverse transcriptase (RT), and/or sequences encoding the fusion protein flanked by the homing endonuclease cleavage sites, wherein expression of the homing endonuclease results in removal of the nucleic acid sequences encoding the Cas endonuclease, or sequences encoding the reverse transcriptase (RT), and/or sequences encoding the fusion protein, concomitantly (in tandem or together) with integration of the barcode sequence. [0145] In another aspect, the disclosure provides a method for removing a plasmid which has integrated into an edited target locus in the genome of a cell. In some embodiments, the plasmid is a guide-donor plasmid as disclosed herein. In some embodiments, the guide-donor plasmid comprises: i) a nucleic acid sequence encoding a guide RNA operably linked to a first promoter;
ii) a nucleic acid sequence encoding a homing endonuclease operably linked to a second promoter, wherein the second promoter is inducible; iii) a nucleic acid sequence encoding a Cas endonuclease operably linked to a third promoter, wherein the third promoter is inducible; iv) a nucleic acid sequence that is cleaved by the homing endonuclease; and/or v) a nucleic acid sequence that is cleaved by the Cas endonuclease. In such embodiments, the method for removing the integrated plasmid from the genome of the cell comprises inducing expression of the homing endonuclease and/or the Cas endonuclease to cleave the integrated plasmid DNA, thereby removing the plasmid from the edited target locus. [0146] In some embodiments, plasmid integration is accompanied by tandem repeat duplication of the donor sequence. In this case, cleavage of the integrated plasmid can result in recovery of the desired edit at the target locus by tandem repeat-mediated deletion of the integrated plasmid and one of the donor copies. [0147] In some embodiments, the homing endonuclease is an I-SceI endonuclease. In some embodiments, the second promoter is inducible by galactose. In some embodiments, the second promoter is GAL1 promoter. In some embodiments, the Cas endonuclease is Cas9, or a modified variant thereof. In some embodiments, the Cas endonuclease is SaCas9, or a modified variant thereof. In some embodiments, the third promoter is inducible by tetracycline or anhydrotetracycline (aTc). [0148] In some embodiments, the guide-donor plasmid further comprises a barcode sequence that is flanked by (iv) the nucleic acid sequence that is cleaved by the homing endonuclease and/or (v) the nucleic acid sequence that is cleaved by the Cas endonuclease. In some embodiments, inducing expression of the homing endonuclease and/or the Cas endonuclease results in integration of the barcode sequence at a barcode locus, while simultaneously removing the guide-donor plasmid from the edited target locus. 
[0149] In another aspect, a method for multiplexed editing of DNA in cells is provided, the method comprising introducing into the cells: i) a guide RNA that binds a target site in the genomic DNA in the cells; ii) a library of linear double stranded donor polynucleotides comprising a nucleic acid sequence encoding a retron structured non-coding RNA (ncRNA) operably linked to a promoter, wherein the retron ncRNA comprises:
one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences; an msr sequence; an msd sequence; a donor sequence for homology directed repair (HDR) inserted within the msd sequence, wherein each donor sequence in the library introduces a different edit at the target site that binds the guide RNA; and a first inverted repeat sequence and a second inverted repeat sequence; iii) a linear double stranded polynucleotide; wherein the linear double stranded polynucleotide of (iii) is linked in vivo to the linear recombinant double stranded polynucleotide of (ii) by homology directed repair (HDR) or non-homologous end joining (NHEJ) to produce a circular donor plasmid; and iv) a fusion protein comprising an RNA binding domain or single stranded nucleic acid binding domain connected to a DNA break site-localizing domain, or a nucleic acid encoding the fusion protein; wherein the cells comprise a Cas endonuclease or a nucleic acid encoding the same, and a retron-specific reverse transcriptase (RT) or a nucleic acid encoding the same; wherein the fusion protein binds to the one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences of the retron, and binds to a dsDNA break site generated by the Cas endonuclease; wherein retron RNA expressed by (ii) is reverse transcribed by the RT in vivo to produce multicopy single-stranded DNA (msDNA) molecules, comprising a single-stranded DNA portion (encoded by msd) and a single-stranded RNA portion (encoded by msr) linked by a 2’-to-5’ phosphodiester moiety installed as the first nucleotide is reverse transcribed by the retron RT, wherein individual msDNA molecules bind to the fusion protein to produce a complex between the linear msDNA molecules, the fusion protein, and the double-strand DNA break site locus, thereby recruiting the ssDNA retron donor sequence to the dsDNA break locus
and promoting editing by HDR, wherein a designed edit is introduced at the target site, with a plurality of different edits produced in different cells and each cell receiving a single edit.
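As an illustration of a donor library in which each member introduces a different edit at the same target site, the following Python sketch enumerates every single-nucleotide substitution across a window of a wild-type donor sequence. This is only one possible library design: the disclosure does not limit edits to single-nucleotide substitutions, and the function name `saturation_donors`, the window coordinates, and the variant naming scheme are assumptions made for this example.

```python
def saturation_donors(wild_type, start, end):
    """Build single-nucleotide substitution variants of a donor sequence.

    wild_type: wild-type donor sequence (DNA, A/C/G/T only).
    start, end: 0-based half-open window [start, end) to saturate.
    Returns a dict mapping a variant name such as "G3A" (reference base,
    1-based position, substituted base) to the full variant sequence.
    """
    bases = "ACGT"
    wild_type = wild_type.upper()
    library = {}
    for pos in range(start, end):
        ref = wild_type[pos]
        for alt in bases:
            if alt == ref:
                continue  # skip the wild-type base at this position
            name = f"{ref}{pos + 1}{alt}"
            library[name] = wild_type[:pos] + alt + wild_type[pos + 1:]
    return library


if __name__ == "__main__":
    # Toy 8-nt sequence; real donors are typically much longer.
    for name, seq in saturation_donors("ACGTACGT", 2, 4).items():
        print(name, seq)
```

A window of n positions yields 3n variants, each of which would be placed within the msd sequence of a separate library member.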
[0150] In some embodiments, the retron optionally further comprises a stabilizing 5’ ribozyme sequence. In some embodiments, the linear double stranded polynucleotide of (iii) comprises a selectable marker. [0151] In some embodiments, the guide RNA of (i) is physically linked to the linear double stranded donor polynucleotides of (ii). In some embodiments, the same guide RNA is physically linked to different linear double stranded donor recruitment polynucleotides present in the library of (ii). In some embodiments, the same first guide RNA is physically linked to a first library of linear double stranded donor recruitment polynucleotides, and a different second guide RNA is physically linked to a second library of linear double stranded donor recruitment polynucleotides, and so on. In some embodiments, the guide RNA and linear double stranded donor recruitment polynucleotides are synthesized as part of the same polynucleotide, and thus covalently linked. [0152] In some embodiments, the RNA binding domain recognition sequence is a MS2 stem loop sequence. In some embodiments, the MS2 stem loop sequence binds to a MS2 coat protein (MCP) binding domain. In some embodiments, the single stranded nucleic acid binding domain is a single stranded nucleic acid binding domain of a Cas endonuclease. [0153] In some embodiments, the locus surrounding the double-strand DNA break site accumulates phosphothreonine (pT) modified proteins recognized by forkhead-associated (FHA) domains. In some embodiments, the RNA binding domain of the fusion comprises an RNA binding domain of MS2 coat protein (MCP) and the DNA break site localizing domain of the fusion protein comprises a forkhead-associated (FHA) domain. Thus, in some embodiments, the fusion protein comprises a MS2 coat protein (MCP) binding domain and a forkhead-associated (FHA) phosphothreonine-binding domain.
[0154] In some embodiments, the linear double stranded donor recruitment polynucleotides of (ii) further comprise a nucleic acid binding domain. In some embodiments, the nucleic acid binding domain is a LexA DNA binding domain or a forkhead homolog 1 (FKH1) DNA binding domain. Thus, in some embodiments, the fusion protein further comprises a LexA domain or FKH1 binding domain. [0155] In some embodiments, the fusion protein comprises three domains selected from a DNA binding domain, an RNA binding domain, and a DNA break site localizing domain. In some embodiments, the fusion protein comprises (i) a LexA DNA binding domain (ii) an RNA binding domain of MCP, and (iii) an FHA DNA break site localizing domain. In some
embodiments, the domains in the fusion protein are arranged in one of the following configurations: (i), (ii), (iii); (i), (iii), (ii); (ii), (i), (iii); (ii), (iii), (i); (iii), (i), (ii); and (iii), (ii), (i). [0156] In some embodiments, the fusion protein comprises (i) the FKH1 DNA binding domain (ii) the RNA binding domain of MCP, and (iii) an FHA DNA break site localizing domain. In some embodiments, the domains in the fusion protein are arranged in one of the following configurations: (i), (ii), (iii); (i), (iii), (ii); (ii), (i), (iii); (ii), (iii), (i); (iii), (i), (ii); and (iii), (ii), (i). [0157] In some embodiments, the fusion protein forms a complex with the circular plasmid and the double strand DNA break site, thereby recruiting the circular plasmid to the DNA break and enhancing HDR. [0158] In some embodiments, the donor recruitment polynucleotide comprises nucleotide sequences complementary to sequences adjacent to the DNA break. In some embodiments, the sequences complementary to sequences adjacent to the DNA break comprise the same sequences that are transcribed from the retron donor in the retron RNA. [0159] In some embodiments, the promoter is a constitutive or inducible promoter. [0160] In some embodiments, the Cas endonuclease is a Cas9 or Cas12a endonuclease, or a modified variant thereof. [0161] In some embodiments, the plurality of cells are eukaryotic cells. In some embodiments, the eukaryotic cells are selected from a yeast cell, a vertebrate cell, or a mammalian cell. [0162] In some embodiments, the editing efficiency, fidelity, and/or survival is improved compared to a method that does not include in vivo plasmid assembly to produce a circular donor recruitment plasmid. [0163] In some embodiments, the linear double stranded polynucleotide of (iii) further comprises a barcode sequence. III. 
Retrons [0164] Exemplary retrons comprising msr, msd, and inverted repeat sequences that can be used in the nucleic acids of the disclosure are provided in Table 6. The retrons in Table 6 also express reverse transcriptases that can be used in the methods of the disclosure.
Table 6. Exemplary retrons.
(see Simon, A.J., et al., Retrons and their applications in genome engineering, Nucleic Acids Research, Volume 47, Issue 21, 02 December 2019, Pages 11007–11019). [0165] In some embodiments, the retron encoded by the nucleic acids described herein is a Retron-Eco1 (Ec86) retron and reverse transcriptase system. IV. Compositions [0166] In another aspect, the disclosure provides a system for editing DNA at a target site in the genome of a cell. In some embodiments, the system comprises: (i) a first linear double-stranded donor polynucleotide comprising a nucleic acid sequence encoding a guide RNA operably linked to a first promoter and a nucleic acid sequence encoding a retron structured non-coding RNA (ncRNA) operably linked to a second promoter, the retron ncRNA comprising: one or more RNA binding domain recognition sequences or one or more single-stranded nucleic acid binding domain recognition sequences; an msr sequence; an msd sequence; a donor sequence for homology directed repair (HDR) inserted within the msd sequence; and a first inverted repeat sequence and a second inverted repeat sequence; (ii) a second linear double-stranded polynucleotide; and (iii) a fusion protein comprising an RNA binding domain or single-stranded nucleic acid binding domain connected to a dsDNA break site-localizing domain, or a nucleic acid encoding the fusion protein. [0167] In some embodiments, the retron optionally comprises a stabilizing 5’ ribozyme sequence. In some embodiments, the linear double stranded polynucleotide of (ii) comprises a selectable marker.
[0168] In some embodiments, the system further comprises a cell that expresses a CRISPR-associated (Cas) endonuclease or a nucleic acid encoding the same, and a retron-specific reverse transcriptase (RT) or a nucleic acid encoding the same. [0169] In some embodiments, the fusion protein binds to the one or more RNA binding domain recognition sequences or one or more single-stranded nucleic acid binding domain recognition sequences and also binds to a double-strand DNA break site generated by the Cas endonuclease at the target site. [0170] In some embodiments, the retron RNA expressed by (i) is reverse transcribed by the RT in vivo to produce multiple ssDNA molecules comprising ssDNA retron donor sequences, wherein individual ssDNA molecules bind to the fusion protein to produce a complex between the linear DNA molecules, the fusion protein, and the dsDNA break site, thereby recruiting the ssDNA retron donor sequence to the dsDNA break site and promoting editing by HDR. [0171] Also provided are constructs encoding a retron multicopy single-stranded DNA (msDNA), which comprises an msr RNA covalently attached to an msd DNA, and complexes including a chimera of an RNA hybridized to a DNA, wherein the RNA comprises one or more RNA binding domain recognition sequences and an msr sequence, wherein the DNA comprises an msd sequence and a subject expression sequence inserted within the msd sequence, and wherein the chimera is non-covalently bound to a polypeptide that includes an RNA binding domain or single stranded nucleic acid binding domain bound to a DNA break site localizing domain. V. Methods of Use [0172] Provided herein are methods of treating a genetic disease in a subject in need thereof comprising administering to the subject an effective amount of a nucleic acid, system, fusion protein or construct described herein.
[0173] Genome editing may be performed on a single cell or a population of cells of interest and can be performed on any type of cell, including any cell from a prokaryotic, eukaryotic, or archaeon organism, including bacteria, archaea, fungi, protists, plants, and animals. Cells from tissues, organs, and biopsies, as well as recombinant cells, genetically modified cells, cells from cell lines cultured in vitro, and artificial cells (e.g., nanoparticles, liposomes, polymersomes, or microcapsules encapsulating nucleic acids) may all be used in the practice of the present disclosure. The methods of the disclosure are also applicable to editing of nucleic acids in cellular fragments, cell components, or organelles comprising nucleic acids (e.g., mitochondria
in animal and plant cells, plastids (e.g., chloroplasts) in plant cells and algae). Cells may be cultured or expanded prior to or after performing genome editing as described herein. In one embodiment, the cells are yeast cells. [0174] An RNA-guided nuclease can be targeted to a particular genomic sequence (i.e., genomic target sequence to be modified) by altering its guide RNA sequence. A target-specific guide RNA comprises a nucleotide sequence that is complementary to a genomic target sequence, and thereby mediates binding of the nuclease-gRNA complex by hybridization at the target site. For example, the gRNA can be designed with a sequence complementary to the sequence of a minor allele to target the nuclease-gRNA complex to the site of a mutation. The mutation may comprise an insertion, a deletion, or a substitution. For example, the mutation may include a single nucleotide variation, gene fusion, translocation, inversion, duplication, frameshift, missense, nonsense, or other mutation associated with a phenotype or disease of interest. The targeted minor allele may be a common genetic variant or a rare genetic variant. In certain embodiments, the gRNA is designed to selectively bind to a minor allele with single base-pair discrimination, for example, to allow binding of the nuclease-gRNA complex to a single nucleotide polymorphism (SNP). In particular, the gRNA may be designed to target disease-relevant mutations of interest for the purpose of genome editing to remove the mutation from a gene. Alternatively, the gRNA can be designed with a sequence complementary to the sequence of a major or wild-type allele to target the nuclease-gRNA complex to the allele for the purpose of genome editing to introduce a mutation into a gene in the genomic DNA of the cell, such as an insertion, deletion, or substitution. Such genetically modified cells can be used, for example, to alter phenotype, confer new properties, or produce disease models for drug screening. 
[0175] In certain embodiments, the RNA-guided nuclease used for genome modification is a clustered regularly interspaced short palindromic repeats (CRISPR) system Cas nuclease. Any RNA-guided Cas nuclease capable of catalyzing site-directed cleavage of DNA to allow integration of donor polynucleotides by the HDR mechanism can be used in genome editing, including CRISPR system type I, type II, or type III Cas nucleases. Examples of Cas proteins include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), Cas10, Cas10d, Cas12a/Cpf1, Mad7™ (INSCRIPTA ®), CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3,
Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, and Cu1966, and homologs or modified versions thereof. [0176] In certain embodiments, a type II CRISPR system such as a Cas9 endonuclease is used. Cas9 nucleases from any species, or biologically active fragments, variants, analogs, or derivatives thereof that retain Cas9 endonuclease activity (i.e., catalyze site-directed cleavage of DNA to generate double-strand breaks) may be used to perform genome modification as described herein. The Cas9 need not be physically derived from an organism, but may be synthetically or recombinantly produced. Cas9 sequences from a number of bacterial species are well known in the art and listed in the National Center for Biotechnology Information (NCBI) database. See, for example, NCBI entries for Cas9 from: Streptococcus pyogenes (WP_002989955, WP_038434062, WP_011528583); Campylobacter jejuni (WP_022552435, YP_002344900), Campylobacter coli (WP_060786116); Campylobacter fetus (WP_059434633); Corynebacterium ulcerans (NC_015683, NC_017317); Corynebacterium diphtheriae (NC_016782, NC_016786); Enterococcus faecalis (WP_033919308); Spiroplasma syrphidicola (NC_021284); Prevotella intermedia (NC_017861); Spiroplasma taiwanense (NC_021846); Streptococcus iniae (NC_021314); Belliella baltica (NC_018010); Psychroflexus torquis (NC_018721); Streptococcus thermophilus (YP_820832), Streptococcus mutans (WP_061046374, WP_024786433); Listeria innocua (NP_472073); Listeria monocytogenes (WP_061665472); Legionella pneumophila (WP_062726656); Staphylococcus aureus (WP_001573634); Francisella tularensis (WP_032729892, WP_014548420), Enterococcus faecalis (WP_033919308); Lactobacillus rhamnosus (WP_048482595, WP_032965177); and Neisseria meningitidis (WP_061704949, YP_002342100); all of which sequences (as entered by the date of filing of this application) are herein incorporated by reference.
Any of these sequences or a variant thereof comprising a sequence having at least about 70-100% sequence identity thereto, including any percent identity within this range, such as 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99% sequence identity thereto, can be used for genome editing, as described herein. See also Fonfara et al. (2014) Nucleic Acids Res. 42(4):2577-90; Kapitonov et al. (2015) J. Bacteriol. 198(5):797-807; Shmakov et al. (2015) Mol. Cell. 60(3):385-397; and Chylinski et al. (2014) Nucleic Acids Res. 42(10):6091-6105 for sequence comparisons and a discussion of genetic diversity and phylogenetic analysis of Cas9.
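To make the percent identity threshold concrete, the following Python sketch computes identity between two sequences that have already been aligned (gaps written as "-"). It is a simplified illustration, not the method used in the disclosure: the alignment step itself is assumed to have been done elsewhere, gap-containing columns are counted as mismatches (other conventions use a different denominator), and the function names `percent_identity` and `meets_identity_threshold` are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; columns with a
    gap in only one sequence count as mismatches (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # gap-only column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a, aligned_b, threshold=70.0):
    """True if identity is at or above the threshold (70% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Toy peptide fragments, not real Cas9 sequences.
    print(percent_identity("MKRTA", "MKQTA"))
    print(meets_identity_threshold("MK-T", "MKRT"))
```

In practice a dedicated alignment tool would produce the aligned strings, and its own identity definition should be checked before comparing against the 70% lower bound.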
[0177] The CRISPR-Cas system naturally occurs in bacteria and archaea where it plays a role in RNA-mediated adaptive immunity against foreign DNA. The bacterial type II CRISPR system uses the endonuclease, Cas9, which forms a complex with a guide RNA (gRNA) that specifically hybridizes to a complementary genomic target sequence, where the Cas9 endonuclease catalyzes cleavage to produce a double-stranded break. Targeting of Cas9 typically further relies on the presence of a 3’ protospacer-adjacent motif (PAM) in the DNA directly downstream of the gRNA-binding site. [0178] The genomic target site will typically comprise a nucleotide sequence that is complementary to the gRNA and may further comprise a protospacer adjacent motif (PAM). In certain embodiments, the target site comprises 20-30 base pairs in addition to a 3 base pair PAM. Typically, the first nucleotide of a PAM can be any nucleotide, while the two other nucleotides will depend on the specific Cas9 protein that is chosen. Exemplary PAM sequences are known to those of skill in the art and include, without limitation, NNG, NGN, NAG, and NGG, wherein N represents any nucleotide. In certain embodiments, the allele targeted by a gRNA comprises a mutation that creates a PAM within the allele, wherein the PAM promotes binding of the Cas9-gRNA complex to the allele. [0179] In certain embodiments, the gRNA is 5-50 nucleotides, 10-30 nucleotides, 15-25 nucleotides, 18-22 nucleotides, or 19-21 nucleotides in length, or any length between the stated ranges, including, for example, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 nucleotides in length. The guide RNA may be a single guide RNA comprising crRNA and tracrRNA sequences in a single RNA molecule, or the guide RNA may comprise two RNA molecules with crRNA and tracrRNA sequences residing in separate RNA molecules.
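The relationship between a target site and its 3’ PAM described above can be sketched in Python as a scan for candidate protospacers. The sketch assumes an NGG PAM and a 20-nucleotide protospacer, which is only one of the PAM and length options listed in paragraphs [0178] and [0179], and it searches the given strand only; the function name `find_ngg_targets` and its parameters are illustrative assumptions.

```python
import re


def find_ngg_targets(seq, protospacer_len=20):
    """List candidate target sites followed directly by an NGG PAM.

    Scans only the strand given in `seq` (the reverse strand would need
    a separate pass over the reverse complement). Returns tuples of
    (protospacer start, protospacer sequence, PAM sequence), 0-based.
    """
    seq = seq.upper()
    hits = []
    # A lookahead is used so that overlapping PAMs are all reported.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start < protospacer_len:
            continue  # not enough upstream sequence for a full protospacer
        start = pam_start - protospacer_len
        hits.append((start, seq[start:pam_start], match.group(1)))
    return hits


if __name__ == "__main__":
    # Toy sequence: 20 A's followed by a TGG PAM and two trailing bases.
    for hit in find_ngg_targets("A" * 20 + "TGG" + "CC"):
        print(hit)
```

Changing the regular expression and `protospacer_len` would adapt the scan to other PAMs or guide lengths; on-target and off-target scoring are not addressed here.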
[0180] In another embodiment, the CRISPR nuclease from Prevotella and Francisella 1 (Cpf1) may be used. Cpf1, also known as Cas12a, is another class II CRISPR/Cas system RNA-guided nuclease with similarities to Cas9 and may be used analogously. Unlike Cas9, Cpf1 does not require a tracrRNA and only depends on a crRNA in its guide RNA, which provides the advantage that shorter guide RNAs can be used with Cpf1 for targeting than Cas9. Cpf1 is capable of cleaving either DNA or RNA. The PAM sites recognized by Cpf1 have the sequences 5’-YTN-3’ (where "Y" is a pyrimidine and "N" is any nucleobase) or 5’-TTTV-3’ and are located 5’ to the gRNA binding site, in contrast to the G-rich PAM site recognized by Cas9 which is located 3’ to the gRNA binding site. Cpf1/Cas12a cleavage of DNA produces
double-stranded breaks with sticky ends having a 4 or 5 nucleotide overhang. For a discussion of Cpf1, see, e.g., Ledford et al. (2015) Nature. 526(7571):17-17, Zetsche et al. (2015) Cell. 163(3):759-771, Murovec et al. (2017) Plant Biotechnol. J. 15(8):917-926, Zhang et al. (2017) Front. Plant Sci. 8:177, Fernandes et al. (2016) Postepy Biochem. 62(3):315-326; herein incorporated by reference. [0181] In another embodiment, a class 2 type V-A CRISPR-Cas (Cas12a/Cpf1) nuclease can be used, such as MAD7™. MAD7™ is an engineered class 2 type V-A CRISPR-Cas (Cas12a/Cpf1) system isolated from Eubacterium rectale. It is an RNA-guided nuclease with demonstrated gene editing activity in Escherichia coli, yeast, human, mouse, and rat cells. See Liu Z et al., CRISPR J. 2020 Apr;3(2):97-108. [0182] C2c1 is another class II CRISPR/Cas system RNA-guided nuclease that may be used. C2c1, similarly to Cas9, depends on both a crRNA and tracrRNA for guidance to target sites. For a description of C2c1, see, e.g., Shmakov et al. (2015) Mol Cell. 60(3):385-397, Zhang et al. (2017) Front Plant Sci. 8:177; herein incorporated by reference. [0183] In yet another embodiment, an engineered RNA-guided FokI nuclease may be used. RNA-guided FokI nucleases comprise fusions of inactive Cas9 (dCas9) and the FokI endonuclease (FokI-dCas9), wherein the dCas9 portion confers guide RNA-dependent targeting on FokI. For a description of engineered RNA-guided FokI nucleases, see, e.g., Havlicek et al. (2017) Mol. Ther. 25(2):342-355, Pan et al. (2016) Sci Rep. 6:35794, Tsai et al. (2014) Nat Biotechnol. 32(6):569-576; herein incorporated by reference. [0184] The RNA-guided nuclease can be provided in the form of a protein, such as the nuclease complexed with a gRNA, or provided by a nucleic acid encoding the RNA-guided nuclease, such as an RNA (e.g., messenger RNA) or DNA (expression vector). Codon usage may be optimized to improve production of an RNA-guided nuclease in a particular cell or organism.
For example, a nucleic acid encoding an RNA-guided nuclease can be modified to substitute codons having a higher frequency of usage in a yeast cell, a bacterial cell, a human cell, a non-human cell, a mammalian cell, a rodent cell, a mouse cell, a rat cell, or any other host cell of interest, as compared to the naturally occurring polynucleotide sequence. When a nucleic acid encoding the RNA-guided nuclease is introduced into cells, the protein can be transiently, conditionally, or constitutively expressed in the cell. [0185] Donor polynucleotides and gRNAs are readily synthesized by standard techniques, e.g., solid phase synthesis via phosphoramidite chemistry, as disclosed in U.S. Patent Nos.
4,458,066 and 4,415,732, incorporated herein by reference; Beaucage et al., Tetrahedron (1992) 48:2223-2311; and Applied Biosystems User Bulletin No. 13 (1 April 1987). Other chemical synthesis methods include, for example, the phosphotriester method described by Narang et al., Meth. Enzymol. (1979) 68:90 and the phosphodiester method disclosed by Brown et al., Meth. Enzymol. (1979) 68:109. In view of the short lengths of gRNAs (typically about 20 nucleotides in length) and donor polynucleotides (typically about 100-150 nucleotides), gRNA-donor polynucleotide cassettes can be produced by standard oligonucleotide synthesis techniques and subsequently ligated into vectors. Moreover, libraries of gRNA-donor polynucleotide cassettes directed against thousands of genomic targets can be readily created using highly parallel array-based oligonucleotide library synthesis methods (see, e.g., Cleary et al. (2004) Nature Methods 1:241-248, Svensen et al. (2011) PLoS One 6(9):e24906). [0186] In addition, adapter sequences can be added to oligonucleotides to facilitate high-throughput amplification or sequencing. For example, a pair of adapter sequences can be added at the 5’ and 3’ ends of an oligonucleotide to allow amplification or sequencing of multiple oligonucleotides simultaneously by the same set of primers. Additionally, restriction sites can be incorporated into oligonucleotides to facilitate cloning of oligonucleotides into vectors. For example, oligonucleotides comprising gRNA-donor polynucleotide cassettes can be designed with a common 5’ restriction site and a common 3’ restriction site to facilitate ligation into the genome modification vectors.
A restriction digest that selectively cleaves each oligonucleotide at the common 5’ restriction site and the common 3’ restriction site is performed to produce restriction fragments that can be cloned into vectors (e.g., plasmids or viral vectors), followed by transformation of cells with the vectors comprising the gRNA-donor polynucleotide cassettes. A restriction site can also be added in between the gRNA and donor polynucleotide sequences to enable a second cloning step for the introduction of a guide RNA scaffold sequence or other constructs into the vector. [0187] Amplification of polynucleotides encoding gRNA-donor polynucleotide cassettes may be performed, for example, before ligation into genome modification vectors or before sequencing and after barcoding. Any method for amplifying oligonucleotides may be used, including, but not limited to polymerase chain reaction (PCR), isothermal amplification, nucleic acid sequence-based amplification (NASBA), transcription mediated amplification (TMA), strand displacement amplification (SDA), and ligase chain reaction (LCR). In one embodiment, the genome editing cassettes comprise common 5’ and 3’ priming sites to allow amplification of the gRNA-donor polynucleotide sequences in parallel with a set of universal
primers. In another embodiment, a set of selective primers is used to selectively amplify a subset of the gRNA-donor polynucleotides from a pooled mixture. [0188] Cells that are transformed with recombinant polynucleotides comprising the genome editing cassettes may be prokaryotic cells or eukaryotic cells and are preferably designed for high-efficiency incorporation of gRNA-donor polynucleotide libraries by transformation. Methods of introducing nucleic acids into a host cell are well known in the art. Commonly used methods of transformation include chemically-induced transformation, typically using divalent cations (e.g., CaCl2), and electroporation. See, e.g., Sambrook et al. (2001) Molecular Cloning, a laboratory manual, 3rd edition, Cold Spring Harbor Laboratories, New York, Davis et al. (1995) Basic Methods in Molecular Biology, 2nd edition, McGraw-Hill, and Chu et al. (1981) Gene 13:197; herein incorporated by reference in their entireties. [0189] Normally, random diffusion of donor DNA to a DNA break is rate-limiting for homologous repair. Active donor recruitment may be used to increase the frequency of cells genetically modified by HDR. The method for active donor recruitment comprises: a) introducing into a cell a fusion protein comprising a protein that selectively binds to the DNA break connected to a polypeptide comprising a nucleic acid binding domain; and b) introducing into the cell a donor polynucleotide comprising i) a nucleotide sequence sufficiently complementary to hybridize to a sequence adjacent to the DNA break, and ii) a nucleotide sequence comprising a binding site recognized by the nucleic acid binding domain of the fusion protein, wherein the nucleic acid binding domain selectively binds to the binding site on the donor polynucleotide to produce a complex between the donor polynucleotide and the fusion protein, thereby recruiting the donor polynucleotide to the DNA break and promoting HDR.
[0190] The DNA break may be created by a site-specific nuclease, such as, but not limited to, a Cas nuclease (e.g., Cas9, Cpf1, or C2c1), an engineered RNA-guided FokI nuclease, a zinc finger nuclease (ZFN), a transcription activator-like effector-based nuclease (TALEN), a restriction endonuclease, a meganuclease, a homing endonuclease, and the like. Any site-specific nuclease that selectively cleaves a sequence at the target integration site for the donor polynucleotide may be used. [0191] The DNA break may be a single-stranded (nick) or double-stranded DNA break. If the DNA break is a single-stranded DNA break, the fusion protein used comprises a protein that selectively binds to the single-stranded DNA break, whereas if the DNA break is a double-stranded DNA break, the fusion protein used comprises a protein that selectively binds to the
double-stranded DNA break. The fusion protein can also recognize both single-stranded and double-stranded DNA breaks. [0192] In the fusion, the protein that selectively binds to the DNA break can be, for example, an RNA-guided nuclease, such as a Cas nuclease (e.g., Cas9 or Cpf1) or an engineered RNA-guided FokI nuclease. [0193] Donor polynucleotides may be single-stranded or double-stranded and may be composed of RNA or DNA. A donor polynucleotide comprising DNA can be produced from a donor polynucleotide comprising RNA, if desired, by reverse transcription using reverse transcriptase either in the cell (e.g., by a retron reverse transcriptase) or outside the cell (e.g., by a recombinant reverse transcriptase such as M-MLV). [0194] The RNA binding domain may be any protein or domain from a protein that binds a known RNA sequence. Examples of each of these proteins are well known in the art. Non-limiting examples of RNA binding domains include domains of proteins that bind to MS2 stem loop sequence, a Pumilio (PUF) recognition sequence, an RNA Recognition Motif (RRM) recognition sequence, a Double-Stranded RNA-Binding Domain (dsRBD) recognition sequence, a Zinc finger (ZF) Domain recognition sequence, a Z-alpha, arginine/glycine rich (RGG) domain recognition sequence, a K Homology (KH) Domain recognition sequence, or a Poly(A) tail. [0195] The single stranded nucleic acid binding domain may be any protein or domain from a protein that binds a known single stranded nucleic acid sequence. Examples of each of these proteins are well known in the art.
Single stranded nucleic acid binding domain recognition domains of polypeptides include, but are not limited to, oligonucleotide/oligosaccharide/oligopeptide-binding (OB) folds, such as in human POT1, Schizosaccharomyces pombe Pot1, Sterkiella nova TEBP, and the CspB proteins from Bacillus caldolyticus and Bacillus subtilis; K homology (KH) domains, such as in the KH domain-containing proteins heterogeneous ribonucleoprotein K (hnRNP K), far upstream element (FUSE)-binding protein (FBP), and poly(C)-binding proteins (PCBP) 1 and 2; RNA recognition motifs (RRMs) which bind DNA, such as in FBP-interacting repressor (FIR), hnRNP A1, and hnRNP D (also known as Auf1); and whirly domains, such as in the mitochondrial whirly protein Why2 and the mammalian transcriptional regulator PurA.
[0196] In another embodiment, the fusion protein may comprise a FHA phosphothreonine-binding domain, wherein the donor polynucleotide is selectively recruited to a DNA break
having a protein comprising a phosphorylated threonine residue located sufficiently close to the DNA break for the FHA phosphothreonine-binding domain to bind to the phosphorylated threonine residue. The FHA phosphothreonine-binding domain may be combined with any RNA binding domain (e.g., fusion with MCP) or single stranded nucleic acid binding domain (e.g. OB-fold) for donor recruitment.
[0197] Without being bound by theory, it is contemplated that the donor recruitment protein includes a fusion of a polypeptide domain from any protein that has an RNA binding domain or single stranded nucleic acid binding domain with a polypeptide domain from any protein that has a DNA break localizing domain.
[0198] Non-limiting examples of DNA break localizing domains include domains of proteins that bind to areas of DNA damage and/or DNA repair proteins. Phospho-Ser/Thr-binding domains have emerged as crucial regulators of cell cycle progression and DNA damage signaling. Such domains include 14-3-3 proteins, WW domains, Polo-box domains (in PLK1), WD40 repeats (including those in the E3 ligase SCFβTrCP), BRCT domains (including those in BRCA1) and FHA domains (such as in CHK2 and MDC1). These domains all have the potential to be used in donor recruitment systems. FHA domains are conserved between eukaryotes and bacteria and thus would also have utility in bacteria as well as eukaryotes for donor recruitment. Examples of proteins or genes encoding such proteins are provided, without limitation, in Tables 1-5. Additional genes/proteins are known in the art and can be found, for example, by searching public gene or protein databases for genes or proteins known to have a role in DNA repair or binding of DNA damage (e.g., gene ontology term analysis). It is contemplated that proteins from any species can be used (e.g., eukaryotic proteins, proteins from yeast, mammalian cells, including human proteins, and/or from fungus).
In embodiments, the donor recruitment protein comprises a polypeptide sequence from a DNA break-recruiting protein from the same kingdom, phylum or division, class, order, family, genus, and/or species as the cell to be genetically modified.
[0199] In some embodiments, the fusion protein comprises an RNA binding domain of MS2 coat protein (MCP) joined to a forkhead-associated (FHA) domain. In some embodiments, the fusion protein comprises an RNA binding domain of MS2 coat protein (MCP) joined to an FHA phosphothreonine-binding domain. In some embodiments, the fusion protein comprises a LexA domain, the RNA binding domain of MCP and the FHA domain. In some embodiments, the LexA domain is from the LexA repressor protein (UniProtKB - P0A7C2).
It will be understood that the arrangement or order of the LexA domain, the RNA binding domain of MCP and the FHA domain in the fusion protein can be varied as described herein.
[0200] In certain embodiments, an inhibitor of the non-homologous end joining (NHEJ) pathway is used to further increase the frequency of cells genetically modified by HDR. Examples of inhibitors of the NHEJ pathway include any compound (agent) that inhibits or blocks either expression or activity of any protein component in the NHEJ pathway. Protein components of the NHEJ pathway include, but are not limited to, Ku70, Ku86, DNA-dependent protein kinase (DNA-PK), Rad50, MRE11, NBS1, DNA ligase IV, and XRCC4. An exemplary inhibitor is wortmannin, which inhibits at least one protein component (e.g., DNA-PK) of the NHEJ pathway. Another exemplary inhibitor is Scr7 (5,6-bis((E)-benzylideneamino)-2-mercaptopyrimidin-4-ol), which inhibits joining of DSBs (Maruyama et al. (2015) Nat. Biotechnol. 33(5):538-542; Lin et al. (2016) Sci. Rep. 6:34531). RNA interference or CRISPR-interference may also be used to block expression of a protein component of the NHEJ pathway (e.g., DNA-PK or DNA ligase IV). For example, small interfering RNAs (siRNAs), hairpin RNAs, and other RNA or RNA:DNA species which can be cleaved or dissociated in vivo to form siRNAs may be used to inhibit the NHEJ pathway by RNA interference. Alternatively, deactivated Cas9 (dCas9) together with single guide RNAs (sgRNAs) complementary to the promoter or exonic sequences of genes of the NHEJ pathway can be used in transcriptional repression by CRISPR-interference. Alternatively, an HDR enhancer such as RS-1 may be used to increase the frequency of HDR in cells (Song et al. (2016) Nat. Commun. 7:10548).
VI. Pharmaceutical compositions
[0201] Also provided are pharmaceutical compositions or formulations comprising the nucleic acids, systems, fusion proteins and constructs described herein.
The pharmaceutical compositions and formulations can be combined with a pharmaceutically acceptable carrier for administration to a subject or patient.
[0202] It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All publications, patents, and patent applications cited herein are hereby incorporated by reference in their entirety for all purposes.
EXAMPLES
Example 1
[0203] This example shows that combining in vivo plasmid assembly and donor recruitment improved the survival of yeast colonies undergoing targeted edits in the genome.
[0204] To test the effectiveness of donor recruitment and a previously demonstrated HDR-based plasmid assembly approach on improving editing survival, we combined 24 editing cassettes randomly selected from a guide-donor library. We spiked in non-editing cassettes to simulate low efficacy guides and synthesis errors expected in guide-donor libraries, and guide-donors with edits outside the “seed” region which are expected to drop out of the library due to Cas9 mismatch tolerance and repeated cleavage of donor DNA and edited target DNA (Figure 1b). Strikingly, plasmid assembly enhanced survival for the donor-cleaving variants and a subset of the 24 editing cassettes. On the other hand, donor recruitment by LexA-FHA (Figure 1a) promoted survival for all edit types, and was more effective at preventing accumulation of non-editing cassettes in the library than plasmid assembly (Figure 1c). The observation that plasmid assembly specifically enhanced survival of donor-cleaving guides suggested that perhaps somehow it was promoting a continual cycle of perfect repair at the edited sites, and that different modes of plasmid assembly might have a different impact on these cassettes. To test an alternative method for assembling plasmids, we introduced fragments which would require assembly by non-homologous end-joining (NHEJ) rather than HDR (Figure 1a). Surprisingly, this resulted in a marked improvement over the HDR-based plasmid assembly (Figure 1c). Importantly, combining either HDR- or NHEJ-based plasmid assembly with donor recruitment increased editing survival overall and improved variant representation relative to any of the individual methods.
The combined system approached the abundance distribution observed in the absence of Cas9 for many targets, suggesting a near complete abrogation of editing toxicity (Figure 1c).
Example 2
[0205] This example describes how combining donor recruitment, retron donor production, and in vivo plasmid assembly resulted in improved editing at target sites in the genome.
[0206] We tested combining plasmid assembly with the retron donor recruitment system we previously developed, where MS2-FHA or MS2-LexA-FHA fusion proteins are used to recruit either retron donor or both retron and plasmid donor, respectively. We termed this system MAGESTIC 3.0 to account for the three orthogonal HDR-enhancing systems involved. We
tested MAGESTIC 3.0 in different genome editing applications, including on several hard-to-edit regions of the genome which undergo structural variant (SV) formation, with saturation editing of a gene where single-nucleotide edits can undergo repeated cleavage and repair cycles, and in pooled editing screens involving natural variants in complex trait loci genome-wide. A schematic of the method is shown in Figure 2.
[0207] First, we challenged MAGESTIC 3.0 at sites we previously documented to undergo high levels of SV formation upon editing, either in the form of deletions (Figure 3a) or translocations (Figure 3b). MAGESTIC 3.0 dramatically reduced structural variant formation in these regions, nearly completely eliminating the loss of genomic coverage as assessed by whole-genome sequencing on an edited pool.
[0208] We next tested the ability of MAGESTIC 3.0 to enable complete saturation editing across a guide and PAM region (Figure 4). We show that for both SpCas9 and LbCas12a, MAGESTIC 3.0 improves the overall edited fraction of these libraries from 79 to 84% and from 49 to 59%, respectively, demonstrating additive improvement when all three systems are operating simultaneously.
[0209] Finally, we used MAGESTIC 3.0 with SpCas9 and a PAM-relaxed version of LbCas12a to edit and phenotype all 7,186 variants residing in 112 quantitative trait loci across the genome for 32 conditions, revealing a complex genotype-phenotype map of causal natural variants. These results revealed a caveat of guide-donor plasmid editing. We found an unusual profile at several genes where disruption was previously known to confer a strong beneficial fitness effect. We found that for two such genes, IRA1 and IRA2, a fraction of barcodes for synonymous edits and promoter variants gave strong phenotypes matching the null mutant premature termination codon (PTC) controls (Figure 5).
This observation was only possible because of the barcoded plasmid assembly specific to MAGESTIC 3.0 where each transformant receives a unique tag. As editing happens very early on in many cases, these barcodes effectively serve as unique edit identifiers for a substantial fraction of target sites, enabling distinguishing aberrantly edited from correctly edited strains. We suspected that plasmid integration could underlie these effects. We assayed a large set of natural variant targets genome-wide to validate the prevalence of this phenomenon (Figure 6), and confirmed that plasmid integration is a pervasive and previously unappreciated aspect of donor plasmid editing in yeast.
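As an illustration of how per-transformant barcodes could separate aberrantly edited from correctly edited strains, the following minimal Python sketch flags barcodes whose phenotype score departs from the other barcodes carrying the same designed edit. The scoring scheme, the median-absolute-deviation cutoff, the function name `flag_outlier_barcodes`, and the example values are all assumptions made for illustration; none of them is taken from the disclosure.

```python
from statistics import median


def flag_outlier_barcodes(scores, threshold=3.0):
    """Return barcodes whose score deviates strongly from the per-target median.

    scores: dict mapping barcode -> phenotype score for ONE targeted variant
            (hypothetical data layout).
    threshold: cutoff in units of the median absolute deviation (MAD);
               an arbitrary illustrative choice.
    """
    values = list(scores.values())
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # At least half of the barcodes agree exactly; flag any that differ.
        return sorted(b for b, v in scores.items() if v != center)
    return sorted(b for b, v in scores.items() if abs(v - center) / mad > threshold)


# Hypothetical scores for one synonymous edit: most barcodes are near neutral,
# one behaves like a loss-of-function control.
example = {
    "BC01": 0.02, "BC02": -0.01, "BC03": 0.00,
    "BC04": 0.03, "BC05": 1.45, "BC06": -0.02,
}
print(flag_outlier_barcodes(example))
```

A median-based rule is used here only because it is insensitive to the few deviating barcodes it is meant to detect; any robust outlier test could be substituted.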
Example 3
[0210] This example describes a method for enhanced guide-donor plasmid cleavage.
[0211] To address the problem of plasmid integration and further improve upon the MAGESTIC 3.0 system, we developed a method for enhanced guide-donor plasmid cleavage. We reasoned that introducing cleavage sites in between the guide and donor on the plasmid would enable recovering correct edits by virtue of the direct donor DNA repeats generated during plasmid integration (Figure 7). We first included the I-SceI endonuclease controlled by the galactose-inducible GAL-L promoter. We also engineered an inducible SaCas9 nuclease system under the control of the TetR-WTC846 system, along with a guide “X” targeting the guide-donor plasmid. Remarkably, only the dual I-SceI/SaCas9 system enabled complete removal of plasmid integration events, as well as complete removal of residual guide-donor plasmid upon barcoding in galactose+aTc medium and subsequent 5FC counter selection of FCY1.
[0212] In summary, we show that MAGESTIC 3.0 improves edit outcomes in four key areas which have limited the effectiveness of genome-scale editing screens, by (1) improving editing survival right after transformation, (2) maintaining survival for single-nucleotide edits which undergo repeated cycles of repair and cleavage, (3) outcompeting endogenous repair processes which regenerate unedited sequence, and (4) outcompeting aberrant, alternative repair processes at structural variant-prone regions of the genome. We also show that plasmid assembly has an additional benefit, as the PCR-barcoded inserts used in MAGESTIC 3.0 enable tagging each transformant with a unique barcode, giving dozens of internal replicates for each targeted variant. This feature enabled us to identify plasmid integration as a previously overlooked, pervasive artifact that impacts a subset of the barcodes for each target site in guide-donor plasmid-based editing systems.
To address this issue, we engineered a dual I-SceI + SaCas9 nuclease method to recover correctly edited loci from such events through tandem-repeat mediated HDR and to completely remove plasmid integrations and residual intact guide-donor plasmid from the libraries post-editing. With these innovations, MAGESTIC 3.0 will facilitate the functional analysis of the genome at single-nucleotide resolution by accurate phenotyping of thousands of variants and should inspire further attempts to harness HDR for genome engineering and CRISPR screens across organisms.
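The recovery of a correct edit through the direct donor repeats can be pictured with a toy string model. The sketch below is an assumption-laden illustration only: it treats the integrated locus as two identical copies of the donor flanking the plasmid backbone and collapses them into one copy, which is a simplification of the repeat-mediated repair described above. The sequences, the lowercase backbone placeholder, and the function name `collapse_direct_repeat` are hypothetical.

```python
def collapse_direct_repeat(locus, repeat):
    """Collapse two direct copies of `repeat`, and everything between them,
    into a single copy. Returns `locus` unchanged if fewer than two
    non-overlapping copies are present."""
    first = locus.find(repeat)
    if first == -1:
        return locus
    second = locus.find(repeat, first + len(repeat))
    if second == -1:
        return locus
    # Keep everything up to the first copy, then resume at the second copy.
    return locus[:first] + locus[second:]


# Hypothetical sequences (not from the disclosure).
left_flank = "AAAC"
right_flank = "GTTT"
donor = "CCTGAGG"              # donor carrying the designed edit
backbone = "gggttaccatcgat"    # stand-in for plasmid backbone with cleavage sites

integrated = left_flank + donor + backbone + donor + right_flank
resolved = collapse_direct_repeat(integrated, donor)
print(resolved)
```

In this simplified picture, cleavage within the backbone is what triggers the collapse; the model does not represent cases where the two repeat copies differ.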
Example 4
[0213] This example describes how multiple PAM variant nucleases derived from SpCas9 and LbCas12a generate saturation nucleotide editing and subsequently remove integrated guide-donor plasmids.
[0214] For many genome editing applications, desired targets often either lack traditional PAMs in their vicinity and thus cannot be edited, or they are in PAM-distal regions where they would be recut at a high rate and result in unintended mutations or cell death. Therefore, complete saturation editing of a genome requires a system that can work with nucleases recognizing a wide array of PAMs. The goal of this experiment was to show that complete saturation nucleotide editing of genomic regions is possible with MAGESTIC 3.0 by using nucleases recognizing diverse PAMs. By utilizing SpCas9 and LbCas12a variants that recognize a wide range of simple motifs, including NGG, NGNG, TTTV, TNTN, and other T/C-rich motifs (TACV, TTCV, TCCV, CTCV, CCCV, and VTTV), the chances of recovering enough cells with the desired edits increase substantially.
[0215] This experiment is built on the data shown in Example 3. The MAGESTIC 3.0 editing and barcoding system includes 5 stages: (1) colony formation after the transformation of guide-donor plasmids; (2) an outgrowth in liquid media to increase overall editing percentages across the library (this step is especially important for weaker guides); (3) induction of barcoding and guide-donor plasmid destruction by turning on the I-SceI and SaCas9 nucleases, along with the guide X1 for SaCas9, in a medium containing galactose and anhydrotetracycline (aTc); (4) a 2nd outgrowth in the medium containing galactose and aTc; and (5) counter-selection of the cells with residual guide-donor plasmids using 5-fluorocytosine (5FC) (Figure 8a). Plasmid integration is an inherent feature of editing with all types of nucleases and PAM sites.
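The PAM motifs listed in paragraph [0214] are written with IUPAC degenerate codes (N = any base, V = A, C or G). As a rough sketch of how broadening the set of recognized motifs increases the number of candidate sites in a region, the Python snippet below locates motif occurrences on the forward strand of a sequence. The helper names and the example sequence are hypothetical, and the snippet only reports motif positions; it does not model the orientation of the PAM relative to the protospacer (3′ for Cas9, 5′ for Cas12a), reverse-strand sites, or guide efficacy.

```python
import re

# IUPAC codes needed for the motifs named in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]",
    "V": "[ACG]",
}


def pam_regex(motif):
    """Compile a PAM motif into a regex; the lookahead allows overlapping hits."""
    pattern = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + pattern + "))")


def find_pam_sites(sequence, motif):
    """Return 0-based start positions of `motif` on the forward strand."""
    return [m.start() for m in pam_regex(motif).finditer(sequence.upper())]


# Hypothetical 11-nt sequence, used only to exercise the functions.
seq = "ATTTACGGTGG"
for motif in ("NGG", "NGNG", "TTTV"):
    print(motif, find_pam_sites(seq, motif))
```

Running the same search with several motifs and taking the union of the hits gives a simple estimate of how many more positions become addressable when PAM-relaxed variants are used.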
Taking the genomic editing at the PDR5 promoter region as an example, the MAGESTIC 3.0 editing and barcoding system can provide saturation nucleotide editing and subsequent removal of integrated guide-donor plasmids from the genomic DNA (Figure 8b-c). Assessing plasmid integration by PCR is a simple way to confirm that editing has taken place in experiments utilizing plasmid donors. Importantly, the integrated plasmids can be removed and converted back to the intended, correct edits by induction of nucleases which will cleave the plasmid and promote HDR across the integrated donor repeats. As a control, the SpRY PAM variant of SpCas9 was transformed with the LbCas12a library, to show that plasmid integration indeed depends on target site cleavage. A separate control without
donor recruitment or retron donor shows that plasmid integration is not an artifact of donor enhancement, but rather an inherent feature of editing with donor plasmids (Figure 8d-f).
[0216] All publications and patent applications mentioned in this disclosure are herein incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually indicated to be incorporated by reference.
[0217] No admission is made that any reference cited herein constitutes prior art. The discussion of the references states what their authors assert, and the Applicant reserves the right to challenge the accuracy and pertinence of the cited documents. It will be clearly understood that, although a number of information sources, including scientific journal articles, patent documents, and textbooks, may be referred to herein, this reference does not constitute an admission that any of these documents forms part of the common general knowledge in the art.
[0218] The discussion of the general methods given herein is intended for illustrative purposes only. Other alternative methods and alternatives will be apparent to those of skill in the art upon review of this disclosure and are to be included within the scope of this application.
[0219] While particular alternatives of the present disclosure have been disclosed, it is to be understood that various modifications and combinations are possible and are contemplated within the scope of the appended claims. There is no intention, therefore, of limitations to the exact abstract and disclosure herein presented.
EXEMPLARY EMBODIMENTS
[0220] Exemplary embodiments provided in accordance with the presently disclosed subject matter include, but are not limited to, the claims and the following embodiments:
[0221] Embodiment 1.
A method for increasing site-specific genomic editing efficiency, fidelity, and/or survival of a cell, the method comprising introducing into a cell: i) a first linear double stranded polynucleotide comprising a double stranded DNA (dsDNA) donor sequence and a nucleic acid sequence encoding a guide RNA (gRNA) operably linked to a first promoter; and ii) a second linear double stranded polynucleotide, wherein the second linear double stranded polynucleotide is linked to the first linear double stranded polynucleotide by homology directed repair (HDR) or non-homologous end joining (NHEJ) to form a circular donor-gRNA plasmid inside of the cell;
wherein the cell comprises a CRISPR-associated (Cas) endonuclease or a nucleic acid sequence encoding the Cas endonuclease, and wherein the Cas endonuclease and the circular donor-gRNA plasmid and/or the first linear double stranded polynucleotide prior to assembly generate a site-specific edit in the genome of the cell and increase the editing efficiency, fidelity, and/or survival of the cell compared to a method that does not include in vivo plasmid assembly to produce a circular donor-gRNA plasmid.
[0222] Embodiment 2. The method of embodiment 1, wherein the first linear double stranded polynucleotide further comprises a DNA binding domain recognition sequence.
[0223] Embodiment 3. The method of embodiment 2, wherein the DNA binding domain recognition sequence binds to a LexA DNA binding domain or a forkhead homolog 1 (FKH1) DNA binding domain.
[0224] Embodiment 4. The method of any one of embodiments 1 to 3, wherein the first linear double stranded polynucleotide further comprises nucleotide sequences complementary to sequences adjacent to the dsDNA break site generated by the Cas endonuclease.
[0225] Embodiment 5. The method of any one of embodiments 1 to 4, wherein the first promoter is a constitutive or inducible promoter.
[0226] Embodiment 6. The method of any one of embodiments 1 to 5, wherein the Cas endonuclease is a Cas9 or Cas12a endonuclease, or a modified variant thereof.
[0227] Embodiment 7. The method of any one of embodiments 1 to 6, further comprising introducing into the cell a nucleic acid sequence encoding a retron structured non-coding RNA (ncRNA) operably linked to a second promoter, the retron ncRNA comprising: a. a stabilizing 5′ ribozyme sequence; b. one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences; c. an msr sequence; d. an msd sequence; e. a donor sequence for homology directed repair (HDR) inserted within the msd sequence; and, f.
a first inverted repeat sequence and a second inverted repeat sequence,
wherein the cell further comprises a retron-specific reverse transcriptase (RT) or a nucleic acid encoding the retron RT, and wherein the retron RT and the retron ncRNA generate multicopy single-stranded DNA (msDNA) containing the single-stranded donor (ssDNA retron donor) sequences.
[0228] Embodiment 8. The method of embodiment 7, wherein the RNA binding domain recognition sequence is a MS2 stem loop sequence.
[0229] Embodiment 9. The method of embodiment 8, wherein the MS2 stem loop sequence binds to a MS2 coat protein (MCP) RNA binding domain.
[0230] Embodiment 10. The method of embodiment 7, wherein the single stranded nucleic acid binding domain recognition sequence binds to a Cas endonuclease binding domain.
[0231] Embodiment 11. The method of any one of embodiments 7 to 10, wherein the second promoter is a constitutive or inducible promoter.
[0232] Embodiment 12. The method of any one of embodiments 1 to 11, further comprising introducing into the cell a fusion protein comprising a) a DNA binding domain, an RNA binding domain and/or a single stranded nucleic acid binding domain and b) a dsDNA break site-localizing domain, or a nucleic acid encoding the fusion protein, wherein the fusion protein binds to the DNA binding domain recognition sequence of the circular donor-gRNA plasmid or the first linear double stranded polynucleotide prior to assembly, and/or the one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences of the retron, and binds to a dsDNA break site generated by the Cas endonuclease, thereby recruiting the dsDNA donor sequences and/or the ssDNA retron donor sequences to the dsDNA break in the genome of the cell and promoting editing by HDR.
[0233] Embodiment 13. The method of embodiment 12, wherein the fusion protein comprises a dsDNA break site-localizing domain, wherein the dsDNA break site-localizing domain is a forkhead-associated (FHA) domain.
[0234] Embodiment 14. The method of embodiment 12 or 13, wherein the fusion protein comprises an RNA binding domain, wherein the RNA binding domain is a MCP RNA binding domain.
[0235] Embodiment 15. The method of any one of embodiments 12 to 14, wherein the fusion protein comprises the MCP RNA binding domain and the FHA domain.
[0236] Embodiment 16. The method of any one of embodiments 12 to 15, wherein the fusion protein comprises a DNA binding domain, wherein the DNA binding domain is a LexA DNA binding domain or an FKH1 DNA binding domain.
[0237] Embodiment 17. The method of any one of embodiments 12 to 16, wherein the fusion protein comprises (i) the LexA DNA binding domain or the FKH1 DNA binding domain, (ii) the MCP RNA binding domain, and (iii) the FHA domain in one of the following orders: (i), (ii), (iii); (i), (iii), (ii); (ii), (i), (iii); (ii), (iii), (i); (iii), (i), (ii); and (iii), (ii), (i).
[0238] Embodiment 18. The method of any one of embodiments 1 to 17, wherein the cell is a eukaryotic cell.
[0239] Embodiment 19. The method of any one of embodiments 1 to 18, wherein the second linear double stranded polynucleotide further comprises a barcode sequence.
[0240] Embodiment 20. The method of embodiment 19, wherein the barcode sequence integrates into a designated barcode locus in the host cell genome.
[0241] Embodiment 21. The method of embodiment 20, wherein integration of the barcode sequence into the barcode locus comprises cleavage of the barcode locus genomic DNA by an endonuclease, wherein the endonuclease is operably linked to an inducible promoter.
[0242] Embodiment 22. The method of embodiment 21, wherein the endonuclease is a Cas endonuclease or a homing endonuclease.
[0243] Embodiment 23. The method of any one of embodiments 19 to 22, wherein the circular donor-gRNA plasmid comprises sequences that are homologous to sequences flanking
the endonuclease cleavage site, such that homologous recombination results in integration of the barcode sequence into the barcode locus.
[0244] Embodiment 24. The method of embodiment 22 or 23, wherein the endonuclease is a homing endonuclease, wherein the homing endonuclease is an I-SceI endonuclease operably linked to a GAL1 promoter.
[0245] Embodiment 25. The method of any one of embodiments 21 to 24, wherein the barcode locus comprises nucleic acid sequences encoding the Cas endonuclease used for editing the target site, sequences encoding the RT, and/or sequences encoding the fusion protein flanked by the endonuclease cleavage sites, wherein expression of the endonuclease results in removal of the nucleic acid sequences encoding the Cas endonuclease, the nucleic acid sequences encoding the RT, and/or the nucleic acid sequences encoding the fusion protein, concomitant with the integration of the barcode sequence.
[0246] Embodiment 26. The method of any one of embodiments 1 to 25, wherein the second linear double stranded polynucleotide comprises a selectable marker.
[0247] Embodiment 27. A method for removing a plasmid which has integrated into an edited target locus in the genome of a cell, wherein the plasmid comprises: i) a nucleic acid sequence encoding a guide RNA operably linked to a first promoter; ii) a nucleic acid sequence encoding a homing endonuclease operably linked to a second promoter, wherein the second promoter is inducible; iii) a nucleic acid sequence encoding a Cas endonuclease operably linked to a third promoter, wherein the third promoter is inducible; iv) a nucleic acid sequence that is cleaved by the homing endonuclease; and/or v) a nucleic acid sequence that is cleaved by the Cas endonuclease; wherein the method comprises inducing expression of the homing endonuclease and/or the Cas endonuclease to cleave the integrated plasmid DNA, thereby removing the plasmid from the edited target locus.
[0248] Embodiment 28.
The method of embodiment 27, wherein removal of the plasmid results in recovery of a desired edit at the target locus.
[0249] Embodiment 29. The method of embodiment 27 or 28, wherein the homing endonuclease is an I-SceI endonuclease.
[0250] Embodiment 30. The method of any one of embodiments 27 to 29, wherein the Cas endonuclease is Cas9, or a modified variant thereof.
[0251] Embodiment 31. The method of embodiment 30, wherein the Cas endonuclease is SaCas9, or a modified variant thereof.
[0252] Embodiment 32. The method of any one of embodiments 27 to 31, wherein the second and/or third promoters are GAL1 promoters inducible by galactose.
[0253] Embodiment 33. The method of any one of embodiments 27 to 31, wherein the second and/or third promoters are inducible by tetracycline or anhydrotetracycline (aTc).
[0254] Embodiment 34. The method of any one of embodiments 27 to 33, wherein the second promoter is a GAL1 promoter that is inducible by galactose and the third promoter is inducible by tetracycline or anhydrotetracycline (aTc).
[0255] Embodiment 35. The method of any one of embodiments 27 to 34, wherein the plasmid further comprises (vi) a barcode sequence that is flanked by (iv) the nucleic acid sequence that is cleaved by the homing endonuclease and/or (v) the nucleic acid sequence that is cleaved by the Cas endonuclease.
[0256] Embodiment 36. The method of embodiment 35, wherein inducing expression of the homing endonuclease and/or the Cas endonuclease results in integration of the barcode sequence at a barcode locus, while simultaneously removing the plasmid from the edited target locus.
[0257] Embodiment 37. A method for multiplexed editing of DNA in cells, the method comprising introducing into the cells: i) a guide RNA that binds a target site in the genomic DNA in the cells; ii) a library of linear double stranded donor polynucleotides comprising a nucleic acid sequence encoding a retron structured non-coding RNA (ncRNA) operably linked to a promoter, wherein the retron ncRNA comprises: a. a stabilizing 5′ ribozyme sequence;
b. one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences; c. an msr sequence; d. an msd sequence; e. a donor sequence for homology directed repair (HDR) inserted within the msd sequence, wherein each donor sequence in the library introduces a different edit at the target site that binds the guide RNA; and, f. a first inverted repeat sequence and a second inverted repeat sequence; iii) a linear double stranded polynucleotide; wherein the linear double stranded polynucleotide of (iii) is linked in vivo into the linear recombinant double stranded polynucleotide of (ii) by homology directed repair (HDR) or non-homologous end joining (NHEJ) to produce a circular donor plasmid; and iv) a fusion protein comprising an RNA binding domain or single stranded nucleic acid binding domain connected to a DNA break site-localizing domain, or a nucleic acid encoding the fusion protein; wherein the cells express a Cas endonuclease or comprise a nucleic acid encoding the same, and wherein the cells express a retron-specific reverse transcriptase (RT) or comprise a nucleic acid encoding the same; wherein the fusion protein binds to the one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences of the retron, and binds to a dsDNA break site generated by the Cas endonuclease, thereby recruiting the ssDNA retron donor sequence to the dsDNA break locus and promoting editing by HDR, wherein a designed edit is introduced at the target site, with a plurality of different edits produced in different cells and each cell receiving a single edit.
[0258] Embodiment 38. The method of embodiment 37, wherein the RNA binding domain recognition sequence is a MS2 stem loop sequence.
[0259] Embodiment 39. The method of embodiment 38, wherein the MS2 stem loop sequence binds to a MS2 coat protein (MCP) RNA binding domain.
[0260] Embodiment 40. The method of any one of embodiments 37 to 39, wherein the single stranded nucleic acid binding domain recognition sequence binds to a Cas endonuclease binding domain.
[0261] Embodiment 41. The method of any one of embodiments 37 to 40, wherein the locus surrounding the dsDNA break site accumulates phosphothreonine (pT) modified proteins recognized by forkhead-associated (FHA) domains.
[0262] Embodiment 42. The method of any one of embodiments 37 to 41, wherein the fusion protein comprises a dsDNA break site-localizing domain, wherein the dsDNA break site-localizing domain is an FHA domain.
[0263] Embodiment 43. The method of any one of embodiments 37 to 42, wherein the fusion protein comprises a MCP RNA binding domain and an FHA domain.
[0264] Embodiment 44. The method of any one of embodiments 37 to 43, wherein the linear double stranded donor polynucleotide of (ii) further comprises a DNA binding domain recognition sequence.
[0265] Embodiment 45. The method of embodiment 44, wherein the nucleic acid binding domain recognition sequence binds to a LexA DNA binding domain or a forkhead homolog 1 (FKH1) DNA binding domain.
[0266] Embodiment 46. The method of any one of embodiments 37 to 45, wherein the fusion protein further comprises a LexA DNA domain or a FKH1 DNA binding domain.
[0267] Embodiment 47. The method of embodiment 46, wherein the LexA DNA domain or the FKH1 DNA binding domain is located between the MCP RNA binding domain and the FHA domain.
[0268] Embodiment 48. The method of embodiment 46 or 47, wherein the fusion protein forms a complex with the circular plasmid and the dsDNA break site, thereby recruiting the circular plasmid to the DNA break and enhancing HDR.
[0269] Embodiment 49. The method of any one of embodiments 37 to 48, wherein the linear double stranded donor polynucleotide of (ii) further comprises nucleotide sequences complementary to sequences adjacent to the dsDNA break.
[0270] Embodiment 50.
The method of any one of embodiments 37 to 49, wherein the promoter is a constitutive promoter.
[0271] Embodiment 51. The method of any one of embodiments 37 to 50, wherein the Cas endonuclease is a Cas9 or Cas12a endonuclease, or a modified variant thereof.
[0272] Embodiment 52. The method of any one of embodiments 37 to 51, wherein the plurality of cells are eukaryotic cells.
[0273] Embodiment 53. The method of any one of embodiments 37 to 52, wherein the editing efficiency, fidelity, and/or survival is improved compared to a method that does not include in vivo plasmid assembly to produce a circular plasmid.
[0274] Embodiment 54. The method of any one of embodiments 37 to 53, wherein the linear double stranded polynucleotide of (iii) further comprises a barcode sequence.
[0275] Embodiment 55. The method of any one of embodiments 37 to 54, wherein the linear double stranded polynucleotide of (iii) comprises a selectable marker.
[0276] Embodiment 56. A system for editing DNA at a target site in the genome of a cell, comprising: (i) a first linear double-stranded donor polynucleotide comprising a nucleic acid sequence encoding a guide RNA operably linked to a first promoter and a nucleic acid sequence encoding a retron structured non-coding RNA (ncRNA) operably linked to a second promoter, the retron ncRNA comprising: a. a stabilizing 5’ ribozyme sequence; b. one or more RNA binding domain recognition sequences or one or more single-stranded nucleic acid binding domain recognition sequences; c. an msr sequence; d. an msd sequence; e. a donor sequence for homology directed repair (HDR) inserted within the msd sequence; and f. a first inverted repeat sequence and a second inverted repeat sequence; (ii) a second linear double-stranded polynucleotide, and (iii) a fusion protein comprising an RNA binding domain or single-stranded nucleic acid binding domain connected to a dsDNA break site-localizing domain, or a nucleic acid encoding the fusion protein.
[0277] Embodiment 57. The system of embodiment 56, further comprising a cell that comprises a CRISPR-associated (Cas) endonuclease or a nucleic acid encoding the same, and a retron-specific reverse transcriptase (RT) or a nucleic acid encoding the same.
[0278] Embodiment 58. The system of embodiment 56 or 57, wherein the fusion protein binds to the one or more RNA binding domain recognition sequences or one or more single-stranded nucleic acid binding domain recognition sequences and binds to a dsDNA break site generated by the Cas endonuclease at the target site.
[0279] Embodiment 59. The system of any one of embodiments 56 to 58, wherein retron RNA expressed by (i) is reverse transcribed by the RT in vivo to produce multiple ssDNA molecules comprising ssDNA retron donor sequences, wherein individual ssDNA molecules bind to the fusion protein to produce a complex between the linear ssDNA molecules, the fusion protein, and the dsDNA break site, thereby recruiting the ssDNA retron donor sequence to the dsDNA break site and promoting editing by HDR.
[0280] Embodiment 60. The system of any one of embodiments 56 to 59, wherein the second linear double-stranded polynucleotide comprises a selectable marker.
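Embodiment 37 recites a donor library in which each donor sequence introduces a different edit at the target site. As a minimal illustration of what such a library could look like computationally, the sketch below enumerates every single-base substitution across a window of a reference sequence. The reference sequence, the window coordinates, and the names `Donor` and `saturation_donors` are hypothetical and do not come from the application; the restriction to single-base substitutions is likewise an assumption made only for this example.

```python
from typing import Iterator, NamedTuple

BASES = "ACGT"


class Donor(NamedTuple):
    position: int   # 0-based index of the edited base in the reference
    ref: str        # base present in the reference at that position
    alt: str        # base introduced by this donor
    sequence: str   # donor sequence: the reference with exactly one base changed


def saturation_donors(reference: str, start: int, end: int) -> Iterator[Donor]:
    """Yield one donor per single-base substitution in reference[start:end].

    Sequence outside the window is left unchanged, so it plays the role of
    the homology arms flanking each edit (an assumption of this sketch).
    """
    reference = reference.upper()
    if not 0 <= start < end <= len(reference):
        raise ValueError("window must lie inside the reference sequence")
    for pos in range(start, end):
        ref_base = reference[pos]
        for alt in BASES:
            if alt == ref_base:
                continue  # skip the unedited base
            edited = reference[:pos] + alt + reference[pos + 1:]
            yield Donor(pos, ref_base, alt, edited)


# Hypothetical 30-nt reference with a 6-nt edit window (positions 12 to 17).
reference = "ATGGCTAGCTTGACCGATCGTACGGATCCA"
library = list(saturation_donors(reference, 12, 18))
print(len(library), "donors, e.g.", library[0])
```

Each library member differs from the reference at one position, which mirrors the "one designed edit per donor" structure of the embodiment; a real design would also have to account for the guide RNA target site and the retron msd context, which this sketch ignores.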
Claims
WHAT IS CLAIMED IS: 1. A method for increasing site-specific genomic editing efficiency, fidelity, and/or survival of a cell, the method comprising introducing into a cell: i) a first linear double stranded polynucleotide comprising a double stranded DNA (dsDNA) donor sequence and a nucleic acid sequence encoding a guide RNA (gRNA) operably linked to a first promoter; and ii) a second linear double stranded polynucleotide, wherein the second linear double stranded polynucleotide is linked to the first linear double stranded polynucleotide by homology directed repair (HDR) or non-homologous end joining (NHEJ) to form a circular donor-gRNA plasmid inside of the cell; wherein the cell comprises a CRISPR-associated (Cas) endonuclease or a nucleic acid sequence encoding the Cas endonuclease, and wherein the Cas endonuclease and the circular donor-gRNA plasmid and/or the first linear double stranded polynucleotide prior to assembly generate a site-specific edit in the genome of the cell and increases the editing efficiency, fidelity, and/or survival of the cell compared to a method that does not include in vivo plasmid assembly to produce a circular donor-gRNA plasmid.
2. The method of claim 1, wherein the first linear double stranded polynucleotide further comprises a DNA binding domain recognition sequence.
3. The method of claim 2, wherein the DNA binding domain recognition sequence binds to a LexA DNA binding domain or a forkhead homolog 1 (FKH1) DNA binding domain.
4. The method of claim 1, wherein the first linear double stranded polynucleotide further comprises nucleotide sequences complementary to sequences adjacent to the dsDNA break site generated by the Cas endonuclease.
5. The method of claim 1, wherein the first promoter is a constitutive or inducible promoter.
6. The method of claim 1, wherein the Cas endonuclease is a Cas9 or Cas12a endonuclease, or a modified variant thereof.
7. The method of claim 1, further comprising introducing into the cell a nucleic acid sequence encoding a retron structured non-coding RNA (ncRNA) operably linked to a second promoter, the retron ncRNA comprising: a. a stabilizing 5′ ribozyme sequence; b. one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences; c. an msr sequence; d. an msd sequence; e. a donor sequence for homology directed repair (HDR) inserted within the msd sequence; and, f. a first inverted repeat sequence and a second inverted repeat sequence, wherein the cell further comprises a retron-specific reverse transcriptase (RT) or a nucleic acid encoding the retron RT, and wherein the retron RT and the retron ncRNA generate multicopy single-stranded DNA (msDNA) containing the single-stranded donor (ssDNA retron donor) sequences.
8. The method of claim 7, wherein the RNA binding domain recognition sequence is a MS2 stem loop sequence.
9. The method of claim 8, wherein the MS2 stem loop sequence binds to a MS2 coat protein (MCP) RNA binding domain.
10. The method of claim 7, wherein the single stranded nucleic acid binding domain recognition sequence binds to a Cas endonuclease binding domain.
11. The method of claim 7, wherein the second promoter is a constitutive or inducible promoter.
12. The method of claim 1, further comprising introducing into the cell a fusion protein comprising a) a DNA binding domain, an RNA binding domain and/or a single stranded nucleic acid binding domain and b) a dsDNA break site-localizing domain, or a nucleic acid encoding the fusion protein, wherein the fusion protein binds to the DNA binding domain recognition sequence of the circular donor-gRNA plasmid or the first linear double stranded polynucleotide
prior to assembly, and/or the one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences of the retron, and binds to a dsDNA break site generated by the Cas endonuclease, thereby recruiting the dsDNA donor sequences and/or the ssDNA retron donor sequences to the dsDNA break in the genome of the cell and promoting editing by HDR.
13. The method of claim 12, wherein the fusion protein comprises a dsDNA break site-localizing domain, wherein the dsDNA break site-localizing domain is a forkhead-associated (FHA) domain.
14. The method of claim 12 or 13, wherein the fusion protein comprises an RNA binding domain, wherein the RNA binding domain is a MCP RNA binding domain.
15. The method of claim 12, wherein the fusion protein comprises the MCP RNA binding domain and the FHA domain.
16. The method of claim 12, wherein the fusion protein comprises a DNA binding domain, wherein the DNA binding domain is a LexA DNA binding domain or an FKH1 DNA binding domain.
17. The method of claim 12, wherein the fusion protein comprises (i) the LexA DNA binding domain or the FKH1 DNA binding domain, (ii) the MCP RNA binding domain, and (iii) the FHA domain in one of the following orders: (i), (ii), (iii); (i), (iii), (ii); (ii), (i), (iii); (ii), (iii), (i); (iii), (i), (ii); and (iii), (ii), (i).
18. The method of claim 1, wherein the cell is a eukaryotic cell.
19. The method of claim 1, wherein the second linear double stranded polynucleotide further comprises a barcode sequence.
20. The method of claim 19, wherein the barcode sequence integrates into a designated barcode locus in the host cell genome.
21. The method of claim 20, wherein integration of the barcode sequence into the barcode locus comprises cleavage of the barcode locus genomic DNA by an endonuclease, wherein the endonuclease is operably linked to an inducible promoter.
22. The method of claim 21, wherein the endonuclease is a Cas endonuclease or a homing endonuclease.
23. The method of claim 19, wherein the circular donor-RNA plasmid comprises sequences that are homologous to sequences flanking the endonuclease cleavage site, such that homologous recombination results in integration of the barcode sequence into the barcode locus.
24. The method of claim 22 or 23, wherein the endonuclease is a homing endonuclease, wherein the homing endonuclease is an I-SceI endonuclease operably linked to a GAL1 promoter.
25. The method of claim 21, wherein the barcode locus comprises nucleic acid sequences encoding the Cas endonuclease used for editing the target site, sequences encoding the RT, and/or sequences encoding the fusion protein flanked by the endonuclease cleavage sites, wherein expression of the endonuclease results in removal of the nucleic acid sequences encoding the Cas endonuclease, the nucleic acid sequences encoding the RT, and/or the nucleic acid sequences encoding the fusion protein, concomitant with the integration of the barcode sequence.
26. The method of claim 1, wherein the second linear double stranded polynucleotide comprises a selectable marker.
27. A method for removing a plasmid which has integrated into an edited target locus in the genome of a cell, wherein the plasmid comprises: i) a nucleic acid sequence encoding a guide RNA operably linked to a first promoter; ii) a nucleic acid sequence encoding a homing endonuclease operably linked to a second promoter, wherein the second promoter is inducible; iii) a nucleic acid sequence encoding a Cas endonuclease operably linked to a third promoter, wherein the third promoter is inducible;
iv) a nucleic acid sequence that is cleaved by the homing endonuclease; and/or v) a nucleic acid sequence that is cleaved by the Cas endonuclease; wherein the method comprises inducing expression of the homing endonuclease and/or the Cas endonuclease to cleave the integrated plasmid DNA, thereby removing the plasmid from the edited target locus.
28. The method of claim 27, wherein removal of the plasmid results in recovery of a desired edit at the target locus.
29. The method of claim 27 or 28, wherein the homing endonuclease is an I-SceI endonuclease.
30. The method of claim 27, wherein the Cas endonuclease is Cas9, or a modified variant thereof.
31. The method of claim 30, wherein the Cas endonuclease is SaCas9, or a modified variant thereof.
32. The method of claim 27, wherein the second and/or third promoters are GAL1 promoters inducible by galactose.
33. The method of claim 27, wherein the second and/or third promoters are inducible by tetracycline or anhydrotetracycline (aTc).
34. The method of claim 27, wherein the second promoter is a GAL1 promoter that is inducible by galactose and the third promoter is inducible by tetracycline or anhydrotetracycline (aTc).
35. The method of claim 27, wherein the plasmid further comprises (vi) a barcode sequence that is flanked by (iv) the nucleic acid sequence that is cleaved by the homing endonuclease and/or (v) the nucleic acid sequence that is cleaved by the Cas endonuclease.
36. The method of claim 35, wherein inducing expression of the homing endonuclease and/or the Cas endonuclease results in integration of the barcode sequence at a barcode locus, while simultaneously removing the plasmid from the edited target locus.
37. A method for multiplexed editing of DNA in cells, the method comprising introducing into the cells:
i) a guide RNA that binds a target site in the genomic DNA in the cells; ii) a library of linear double stranded donor polynucleotides comprising a nucleic acid sequence encoding a retron structured non-coding RNA (ncRNA) operably linked to a promoter, wherein the retron ncRNA comprises: a. a stabilizing 5′ ribozyme sequence; b. one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences; c. an msr sequence; d. an msd sequence; e. a donor sequence for homology directed repair (HDR) inserted within the msd sequence, wherein each donor sequence in the library introduces a different edit at the target site that binds the guide RNA; and, f. a first inverted repeat sequence and a second inverted repeat sequence; iii) a linear double stranded polynucleotide; wherein the linear double stranded polynucleotide of (iii) is linked in vivo into the linear recombinant double stranded polynucleotide of (ii) by homology directed repair (HDR) or non-homologous end joining (NHEJ) to produce a circular donor plasmid; and iv) a fusion protein comprising an RNA binding domain or single stranded nucleic acid binding domain connected to a DNA break site-localizing domain, or a nucleic acid encoding the fusion protein; wherein the cells express a Cas endonuclease or comprise a nucleic acid encoding the same, and wherein the cells express a retron-specific reverse transcriptase (RT) or comprise a nucleic acid encoding the same; wherein the fusion protein binds to the one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences of the retron, and binds to a dsDNA break site generated by the Cas endonuclease, thereby recruiting the ssDNA retron donor sequence to the dsDNA break locus and promoting editing by HDR, wherein a designed edit is introduced at the target site, with a plurality of different edits produced in different cells and each cell receiving a single edit.
38. The method of claim 37, wherein the RNA binding domain recognition sequence is a MS2 stem loop sequence.
39. The method of claim 38, wherein the MS2 stem loop sequence binds to a MS2 coat protein (MCP) RNA binding domain.
40. The method of claim 37, wherein the single stranded nucleic acid binding domain recognition sequence binds to a Cas endonuclease binding domain.
41. The method of claim 37, wherein the locus surrounding the dsDNA break site accumulates phosphothreonine (pT) modified proteins recognized by forkhead-associated (FHA) domains.
42. The method of claim 37, wherein the fusion protein comprises a dsDNA break site-localizing domain, wherein the dsDNA break site-localizing domain is an FHA domain.
43. The method of claim 37, wherein the fusion protein comprises a MCP RNA binding domain and an FHA domain.
44. The method of claim 37, wherein the linear double stranded donor polynucleotide of (ii) further comprises a DNA binding domain recognition sequence.
45. The method of claim 44, wherein the nucleic acid binding domain recognition sequence binds to a LexA DNA binding domain or a forkhead homolog 1 (FKH1) DNA binding domain.
46. The method of claim 37, wherein the fusion protein further comprises a LexA DNA domain or a FKH1 DNA binding domain.
47. The method of claim 46, wherein the LexA DNA domain or the FKH1 DNA binding domain is located between the MCP RNA binding domain and the FHA domain.
48. The method of claim 46 or 47, wherein the fusion protein forms a complex with the circular plasmid and the dsDNA break site, thereby recruiting the circular plasmid to the DNA break and enhancing HDR.
49. The method of claim 37, wherein the linear double stranded donor polynucleotide of (ii) further comprises nucleotide sequences complementary to sequences adjacent to the dsDNA break.
50. The method of claim 37, wherein the promoter is a constitutive promoter.
51. The method of claim 37, wherein the Cas endonuclease is a Cas9 or Cas12a endonuclease, or a modified variant thereof.
52. The method of claim 37, wherein the plurality of cells are eukaryotic cells.
53. The method of claim 37, wherein the editing efficiency, fidelity, and/or survival is improved compared to a method that does not include in vivo plasmid assembly to produce a circular plasmid.
54. The method of claim 37, wherein the linear double stranded polynucleotide of (iii) further comprises a barcode sequence.
55. The method of claim 37, wherein the linear double stranded polynucleotide of (iii) comprises a selectable marker.
56. A system for editing DNA at a target site in the genome of a cell, comprising: (i) a first linear double-stranded donor polynucleotide comprising a nucleic acid sequence encoding a guide RNA operably linked to a first promoter and a nucleic acid sequence encoding a retron structured non-coding RNA (ncRNA) operably linked to a second promoter, the retron ncRNA comprising: a. a stabilizing 5’ ribozyme sequence; b. one or more RNA binding domain recognition sequences or one or more single-stranded nucleic acid binding domain recognition sequences; c. an msr sequence; d. an msd sequence; e. a donor sequence for homology directed repair (HDR) inserted within the msd sequence; and f. a first inverted repeat sequence and a second inverted repeat sequence; (ii) a second linear double-stranded polynucleotide, and
(iii) a fusion protein comprising an RNA binding domain or single-stranded nucleic acid binding domain connected to a dsDNA break site-localizing domain, or a nucleic acid encoding the fusion protein.
57. The system of claim 56, further comprising a cell that comprises a CRISPR-associated (Cas) endonuclease or a nucleic acid encoding the same, and a retron-specific reverse transcriptase (RT) or a nucleic acid encoding the same.
58. The system of claim 56 or 57, wherein the fusion protein binds to the one or more RNA binding domain recognition sequences or one or more single-stranded nucleic acid binding domain recognition sequences and binds to a dsDNA break site generated by the Cas endonuclease at the target site.
59. The system of claim 56, wherein retron RNA expressed by (i) is reverse transcribed by the RT in vivo to produce multiple ssDNA molecules comprising ssDNA retron donor sequences, wherein individual ssDNA molecules bind to the fusion protein to produce a complex between the linear ssDNA molecules, the fusion protein, and the dsDNA break site, thereby recruiting the ssDNA retron donor sequence to the dsDNA break site and promoting editing by HDR.
60. The system of claim 56, wherein the second linear double-stranded polynucleotide comprises a selectable marker.
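The six domain orders recited in claim 17 are exactly the permutations of three domains. The short sketch below, offered only as an illustration, enumerates them with Python's standard `itertools.permutations`; the dictionary keys follow the claim's (i), (ii), (iii) labels, while the names `DOMAINS` and `domain_orders` are hypothetical.

```python
from itertools import permutations

# Labels follow claim 17; the descriptions paraphrase the claim language.
DOMAINS = {
    "i": "LexA or FKH1 DNA binding domain",
    "ii": "MCP RNA binding domain",
    "iii": "FHA domain",
}


def domain_orders() -> list[tuple[str, ...]]:
    """Return every linear ordering of the three fusion-protein domains."""
    return list(permutations(DOMAINS))


orders = domain_orders()
for order in orders:
    # Print each ordering as a chain of domain descriptions.
    print(" - ".join(DOMAINS[label] for label in order))
```

Because three items have 3! = 6 orderings, the enumeration confirms that the claim covers every possible linear arrangement of the three domains rather than a selected subset.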
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202263401083P | 2022-08-25 | 2022-08-25 | |
| US63/401,083 | 2022-08-25 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2024044767A2 (en) | 2024-02-29 |
| WO2024044767A3 (en) | 2024-06-27 |
Family
ID=90014129
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2023/072942 (Ceased; published as WO2024044767A2 (en)) | Recruitment of donor dna from in vivo assembled plasmids for saturation genome editing | 2022-08-25 | 2023-08-25 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2024044767A2 (en) |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP4431607A3 (en) * | 2016-09-09 | 2024-12-11 | The Board of Trustees of the Leland Stanford Junior University | High-throughput precision genome editing |
| US12416015B2 (en) * | 2017-09-15 | 2025-09-16 | The Board Of Trustees Of The Leland Stanford Junior University | Multiplex production and barcoding of genetically engineered cells |
| WO2021046243A2 (en) * | 2019-09-03 | 2021-03-11 | Myeloid Therapeutics, Inc. | Methods and compositions for genomic integration |
| WO2022272293A1 (en) * | 2021-06-23 | 2022-12-29 | The Board Of Trustees Of The Leland Stanford Junior University | Compositions and methods for efficient retron production and genetic editing |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2024044767A3 (en) | 2024-06-27 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| JP6737974B1 (en) | Nuclease-mediated DNA assembly | |
| US20250034562A1 (en) | Compositions and methods for improving the efficacy of cas9-based knock-in strategies | |
| ES2955957T3 (en) | CRISPR hybrid DNA/RNA polynucleotides and procedures for use | |
| US11680262B2 (en) | Method for inducing exon skipping by genome editing | |
| US10526590B2 (en) | Compounds and methods for CRISPR/Cas-based genome editing by homologous recombination | |
| CN113444747B (en) | Methods and compositions for targeted genetic modification using paired guide RNAs | |
| US20230125704A1 (en) | Modified bacterial retroelement with enhanced dna production | |
| EP3872177B1 (en) | Compositions and methods for enhancing homologous recombination | |
| EP3682004A2 (en) | Multiplex production and barcoding of genetically engineered cells | |
| CN106062197A (en) | Delivery, engineering and optimization of tandem guide systems, methods and compositions for sequence manipulation | |
| US20220389415A1 (en) | Production and tracking of engineered cells with combinatorial genetic modifications | |
| US20240110163A1 (en) | Crispr-associated based-editing of the complementary strand | |
| WO2017106251A1 (en) | Cas discrimination using tuned guide rna | |
| WO2024044767A2 (en) | Recruitment of donor dna from in vivo assembled plasmids for saturation genome editing | |
| WO2022272294A1 (en) | Compositions and methods for efficient retron recruitment to dna breaks | |
| JP2025530183A (en) | Rett Syndrome Treatment | |
| HK40058696A (en) | Compositions and methods for enhancing homologous recombination | |
| WO2024023734A1 (en) | MULTI-gRNA GENOME EDITING | |
| WO2024044736A2 (en) | Enhanced mammalian crispr editing with separated retron donor and nickases | |
| HK40005602B (en) | Compositions and methods for enhancing homologous recombination | |
| HK40005602A (en) | Compositions and methods for enhancing homologous recombination | |
| HK1245839B (en) | Crispr hybrid dna/rna polynucleotides and methods of use | |
| BR122019001480B1 (en) | SET OF TWO CLASS 2 CRISPR POLYNUCLEOTIDES, CLASS 2 CRISPR SYSTEM, IN VITRO METHODS OF MODIFYING A TARGET NUCLEIC ACID MOLECULE AND FOR MODULATING THE TRANSCRIPTION OF AT LEAST ONE GENE INTO THE TARGET NUCLEIC ACID MOLECULE |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23858361; Country of ref document: EP; Kind code of ref document: A2 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 23858361; Country of ref document: EP; Kind code of ref document: A2 |