
WO2024259103A1 - Deep learning framework for predicting on-target and off-target activity of CRISPR guide RNAs - Google Patents

Deep learning framework for predicting on-target and off-target activity of CRISPR guide RNAs

Info

Publication number
WO2024259103A1
Authority
WO
WIPO (PCT)
Prior art keywords
grna
target
grnas
gene
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2024/033813
Other languages
English (en)
Inventor
Hans-Hermann WESSELS
Andrew STIRN
David A. Knowles
Neville E. SANJANA
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New York University NYU
New York Genome Center Inc
Original Assignee
New York University NYU
New York Genome Center Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New York University NYU, New York Genome Center Inc

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C12N15/902 Stable introduction of foreign DNA into chromosome using homologous recombination
    • C12N15/907 Stable introduction of foreign DNA into chromosome using homologous recombination in mammalian cells
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2320/00 Applications; Uses
    • C12N2320/10 Applications; Uses in screening processes
    • C12N2320/12 Applications; Uses in screening processes in functional genomics, i.e. for the determination of gene function
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases [RNase]; Deoxyribonucleases [DNase]
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation

Definitions

  • RNA-guided, RNA-targeting Type VI CRISPR Cas proteins enable direct manipulation of cellular RNAs with high precision compared to previous RNA-targeting technologies 1-5.
  • a growing number of RNA-engineering technologies have been developed using nuclease-active Cas13 or inactive dCas13 effector proteins 6. These methods critically rely on the ability of Cas13 to distinguish between binding sites in target RNAs and closely related secondary (off-target) binding sites based on the complementarity between the guide RNA (gRNA) sequence and the bound RNA sequence. In general, the goal is to maximize on-target gRNA activity while minimizing off-target effects.
  • Precise modulation of gene expression in mammalian systems can be achieved in multiple ways.
  • synthetic promoter sequences 23 or tetracycline-dependent promoter constructs 24 can be used to modulate gene expression.
  • insertion of cis-regulatory elements (i.e., miRNA binding sites) in the 3' UTRs of endogenous genes renders them susceptible to the recruitment of the endogenous RNA surveillance and silencing machinery 25.
  • Such approaches require a considerable amount of engineering on an individual target basis.
  • programmable nuclease-null (dCas9) CRISPR systems provide a flexible and scalable alternative for systematic titration of gene expression 26 .
  • epigenetic effector domains commonly fused to dCas9 e.g. KRAB domain
  • gRNAs guide RNAs
  • CRISPRi CRISPR interference
  • CRISPRa CRISPR activation
  • the method includes (i) providing a first perfect match (PM) gRNA that comprises a spacer sequence that is 100% complementary to a target sequence within the gene; (ii) providing a second mismatch gRNA that targets the gene, wherein the spacer sequence of the second gRNA comprises one or more mismatches with the target DNA sequence such that the CRISPRi or CRISPRa activity on the gene obtained using the second gRNA is intermediate between that obtained using the first gRNA and that obtained using a scrambled gRNA providing no CRISPRi or CRISPRa activity on the gene; and (iii) providing a third indel mismatch gRNA that targets the gene, wherein the spacer sequence of the third sgRNA comprises one or more mismatches with the target DNA sequence such that the CRISPRi or CRISPRa activity on the gene obtained using the third gRNA is intermediate between that obtained using the second gRNA and that obtained using a scrambled gRNA providing no CRISPRi or CRISPRa activity on the gene
  • the method includes providing one or more additional mismatch gRNAs, wherein each of the one or more additional gRNAs provide CRISPRi or CRISPRa activity on the gene that is intermediate between that obtained using the third gRNA and that obtained using a scrambled gRNA providing no CRISPRi or CRISPRa activity on the gene, and wherein the mismatches with the target sequence of each of the one or more additional gRNAs are selected according to rule (a).
  • a computational model termed TIGER comprises one or more of the following layers: i) 2D convolution with 32 4 x 4 filters and ReLU activations; ii) 2D convolution with 32 4 x 4 filters and ReLU activations; iii) 2D max pooling with a 1 x 2 pool size; iv) Flatten to vector; v) Dropout with 0.25 dropout rate; vi) Concatenation with a vector of non-sequence features; vii) Dense layer with 128 sigmoid outputs; viii) Dropout with 0.1 dropout rate; ix) Dense layer with 32 sigmoid outputs; x) Dropout with 0.1 dropout rate; xi) Linear output layer with scalar output; wherein the non-sequence features comprise one or more of the following: a) target accessibility; b) hybridization MFE; c) guide MFE; d) target location within transcript; e) junction proximity; f) guide secondary structure; wherein the computational model predicts
  • a set of single guide RNAs for obtaining a series of discrete expression levels of a target gene using CRISPRi or CRISPRa.
  • the set comprises (i) a first perfect match (PM) gRNA that comprises a spacer sequence that is 100% complementary to a target sequence within the gene; (ii) a second mismatch gRNA that targets the gene, wherein the spacer sequence of the second gRNA comprises one or more mismatches with the target DNA sequence such that the CRISPRi or CRISPRa activity on the gene obtained using the second gRNA is intermediate between that obtained using the first gRNA and that obtained using a scrambled gRNA providing no CRISPRi or CRISPRa activity on the gene; and (iii) a third indel mismatch gRNA that targets the gene, wherein the spacer sequence of the third sgRNA comprises one or more mismatches with the target DNA sequence such that the CRISPRi or CRISPRa activity on the gene obtained using
  • a method of selecting a gRNA for use in a Cas-based genome editing system capable of introducing a discrete level of expression of a target gene in a cell population includes transducing mammalian cells with a library of gRNAs, the library comprising a plurality of different gRNAs for one or more genes, the plurality of gRNAs comprising for each gene: i) perfect-match (PM) gRNAs that are 100% complementary to a target sequence within the gene; ii) mismatch gRNAs that contain 1, 2, or 3 nucleotide mismatches to the target sequence within the gene; iii) indel gRNAs that contain 1 or 2 nucleotide insertions or deletions to the target sequence within the gene; determining the activity of each gRNA in the library; determining the relative activity of all mismatch gRNAs and indel gRNAs relative to their cognate PM gRNAs; and selecting the gRNA based on the discrete
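The library composition recited above (perfect-match spacers plus mismatch and indel variants) can be illustrated with a minimal sketch. It enumerates only single-nucleotide substitutions and single-nucleotide deletions of one spacer. The function names and the 23-nt example spacer are hypothetical, and the actual library design procedure is not given in this text.

```python
RNA_ALPHABET = "ACGU"


def single_mismatches(spacer):
    """Return every variant of `spacer` that differs at exactly one position."""
    variants = []
    for i, ref in enumerate(spacer):
        for alt in RNA_ALPHABET:
            if alt != ref:
                variants.append(spacer[:i] + alt + spacer[i + 1:])
    return variants


def single_deletions(spacer):
    """Return the distinct sequences obtained by deleting one nucleotide.

    Deleting either nucleotide of a homopolymer run gives the same sequence,
    so results are de-duplicated.
    """
    return sorted({spacer[:i] + spacer[i + 1:] for i in range(len(spacer))})


# Hypothetical 23-nt perfect-match (PM) spacer, used only for illustration.
pm_spacer = "ACGUACGUACGUACGUACGUACG"
mismatch_variants = single_mismatches(pm_spacer)
deletion_variants = single_deletions(pm_spacer)
print(len(mismatch_variants), len(deletion_variants))
```

Double and triple mismatches, insertions, and two-nucleotide deletions would be generated the same way by composing these steps; they are omitted to keep the sketch short.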
  • a method for precise modulation of target RNA comprises providing a gRNA that allows a desired level of effector activity of a Cas13d enzyme when the gRNA and Cas13d enzyme are contacted with the target RNA to which the gRNA binds, wherein the gRNA is identified by a method as described herein.
  • a method of predicting the efficacy of a subject gRNA for use in a Cas-based genome editing system includes analyzing the subject gRNA sequence with a computational model to predict efficacy of the subject gRNA, the computational model comprising one or more of the following layers: i) 2D convolution with 32 4 x 4 filters and ReLU activations; ii) 2D convolution with 32 4 x 4 filters and ReLU activations; iii) 2D max pooling with a 1 x 2 pool size; iv) Flatten to vector; v) Dropout with 0.25 dropout rate; vi) Concatenation with a vector of non-sequence features; vii) Dense layer with 128 sigmoid outputs; viii) Dropout with 0.1 dropout rate; ix) Dense layer with 32 sigmoid outputs; x) Dropout with 0.1 dropout rate; xi) Linear output layer with scalar output; wherein the non-sequence features comprise one or
  • a system for selecting gRNA for titrating expression of a target gene in a target cell from a multiplicity of candidate guides includes a memory; and a processor coupled with the memory and configured to: analyze the subject gRNA sequence with a computational model to predict efficacy of the subject gRNA, the computational model comprising one or more of the following layers: i) 2D convolution with 32 4 x 4 filters and ReLU activations; ii) 2D convolution with 32 4 x 4 filters and ReLU activations; iii) 2D max pooling with a 1 x 2 pool size; iv) Flatten to vector; v) Dropout with 0.25 dropout rate; vi) Concatenation with a vector of non-sequence features; vii) Dense layer with 128 sigmoid outputs; viii) Dropout with 0.1 dropout rate; ix) Dense layer with 32 sigmoid outputs; x) Dropout with 0.1 dropout rate; xi) Linear output layer
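The layer list recited above can be followed as a sequence of tensor shapes. The sketch below is shape bookkeeping only, not a trained or trainable model. Several values are assumptions that this text does not state: an 8 x 23 single-channel input (two stacked 4-row one-hot encodings of a 23-nt site), "same" padding in both convolutions, pooling stride equal to the pool size, and a non-sequence vector with one entry per listed feature (six). The function name is hypothetical.

```python
def tiger_layer_shapes(height=8, width=23, n_features=6):
    """Trace output shapes through layers i) to xi).

    ASSUMPTIONS (not from the source text): input size, "same" convolution
    padding, pooling stride equal to pool size, and n_features = 6.
    """
    shapes = [("input", (height, width, 1))]
    conv = (height, width, 32)         # "same" padding keeps height and width
    shapes.append(("i) conv2d, 32 4x4 filters, ReLU", conv))
    shapes.append(("ii) conv2d, 32 4x4 filters, ReLU", conv))
    pooled = (height, width // 2, 32)  # 1 x 2 pooling halves the width only
    shapes.append(("iii) max pooling, 1x2", pooled))
    flat = pooled[0] * pooled[1] * pooled[2]
    shapes.append(("iv) flatten", (flat,)))
    shapes.append(("v) dropout 0.25", (flat,)))
    shapes.append(("vi) concatenate non-sequence features", (flat + n_features,)))
    shapes.append(("vii) dense, 128 sigmoid", (128,)))
    shapes.append(("viii) dropout 0.1", (128,)))
    shapes.append(("ix) dense, 32 sigmoid", (32,)))
    shapes.append(("x) dropout 0.1", (32,)))
    shapes.append(("xi) linear output", (1,)))
    return shapes


for name, shape in tiger_layer_shapes():
    print(f"{name}: {shape}")
```

The point of the trace is the position of step vi): the non-sequence features bypass the convolutions and join the flattened sequence representation just before the first dense layer.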
  • FIG. 1A - FIG. 1H show pooled CRISPR-Cas13 essentiality screen assaying Cas13d guide RNA (gRNA) efficacy.
  • FIG. 1A Design of pooled CRISPR-Cas13d screen for mapping gRNA variants with mismatch and insertion-deletion changes to perfect-match (PM) gRNAs.
  • FIG. 1B Composition of gRNA library containing 120,000 perfectly matching and mismatched gRNA sequences targeting the coding region of essential genes.
  • FIG. 1E Fraction of active gRNAs (log2FC < -0.5) for PM gRNAs separated by RF on quartile predictions.
  • FIG. 1F Fraction of active (log2FC < -0.5) predicted Quartile 4 (Q4) PM gRNAs for all 16 essential gene targets.
  • FIG. 2A - FIG. 2G show large-scale mapping of Cas13d guide RNA (gRNA) mismatch activity.
  • FIG. 2A Empirical cumulative distribution of gRNA depletion for all gRNAs by introduced mutation type. Vertical line indicates cutoff for active gRNAs (log2FC < -0.5).
  • FIG. 2B Fraction of active gRNAs for gRNAs as shown in FIG. 1A.
  • FIG. 2E (left) Heatmap depicting all insertion types.
  • FIG. 2E (right) Boxplot highlighting the single insertion gRNAs.
  • FIG. 2F Detailed representation of relative activity for single mismatch (SM) gRNA separated by reference guide nucleotide (bottom) or substitution identity (top) for each mismatch position.
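The "fraction of active gRNAs" reported in the figure descriptions above follows from a fixed depletion cutoff (a log2 fold-change below -0.5). A minimal sketch of that calculation is given here; the function name and the example values are hypothetical, and treating the cutoff as a strict inequality is an assumption.

```python
def fraction_active(log2fc_values, cutoff=-0.5):
    """Fraction of gRNAs whose log2 fold-change falls below the cutoff.

    ASSUMPTION: a gRNA exactly at the cutoff is counted as inactive.
    """
    if not log2fc_values:
        return 0.0
    active = sum(1 for value in log2fc_values if value < cutoff)
    return active / len(log2fc_values)


# Hypothetical log2 fold-changes grouped by introduced mutation type.
screen = {
    "perfect match": [-1.8, -1.1, -0.7, -0.2],
    "single mismatch": [-1.2, -0.4, -0.1, 0.1],
}
for mutation_type, values in screen.items():
    print(mutation_type, fraction_active(values))
```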
  • FIG. 3A - FIG. 3I show a deep learning model to predict optimal Cas13d guide RNAs.
  • FIG. 3A TIGER combines one-hot encoded guide and target sequences for sequence input, following an AlexNet architecture but allowing for non-sequence features as inputs to the first dense layer.
  • FIG. 3B Correlation of predictions with additional sequence context (5' only, 3' only, combined 5' and 3') to the 23 nt gRNA target site (10-fold cross-validation, target site-level) using a sequence-only model. H0 denotes the best performing condition and all differences between other conditions and H0 are significant (p < 0.05, Steiger's test 67).
  • FIG. 3E Receiver-operator characteristic (ROC) curve and other performance metrics for all gRNAs from a previously published screen using flow cytometry of cell surface proteins (Ref. 7 ).
  • FIG. 3E comparisons employ Steiger's test 67 for Pearson and Spearman comparisons, DeLong's test 68,69 for AUROC comparisons, and a bootstrapped Kolmogorov-Smirnov test 70 for AUPRC comparisons.
  • FIG. 3F Design of pooled CRISPR-Cas13d screen targeting 5,166 genes with 8 high efficacy gRNAs from TIGERcombined predictions.
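The one-hot sequence input described for FIG. 3A can be sketched as follows. This is an illustration only: the row order (A, C, G, U), the stacking of target rows above guide rows, and the requirement that both sequences are already aligned column by column are assumptions, and the function names are hypothetical.

```python
NUCLEOTIDES = "ACGU"


def one_hot(seq):
    """Encode an RNA sequence as a 4 x L grid (rows A, C, G, U)."""
    return [[1 if nt == base else 0 for nt in seq] for base in NUCLEOTIDES]


def encode_pair(guide, target):
    """Stack target and guide encodings into an 8 x L grid.

    ASSUMPTION: the two sequences are pre-aligned so that column i of each
    encoding refers to the same paired position.
    """
    if len(guide) != len(target):
        raise ValueError("guide and target must have equal length")
    return one_hot(target) + one_hot(guide)


grid = encode_pair("GCAU", "AUGC")
for row in grid:
    print(row)
```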
  • FIG. 4A - FIG. 4J show training TIGER using gRNAs with mismatches enables prediction of off-target activity and transcript modulation using gRNAs with single mismatches.
  • FIG. 4A The correlation between observed and TIGER-predicted gRNA abundance by gRNA design type.
  • FIG. 4B The correlation between change in observed and TIGER-predicted gRNA abundance by gRNA design type. The change in gRNA abundance is defined as the difference in fold-change for a particular gRNA with mismatches and its cognate perfect match (PM) gRNA.
  • FIG. 4C Receiver-operator characteristic (ROC) curves for each gRNA design type from 10-fold target site cross-validation.
  • FIG. 4D Averaged correlation (Pearson and Spearman) and aggregated areas under the ROC and precision-recall curves for each gRNA design type from 10-fold target-site cross-validation.
  • FIG. 4E A framework for using gRNA with single-mismatches to modulate Cas13 targeting activity.
  • FIG. 4H Design of pooled CRISPR-Cas13d screen for TIGERcombined gRNA predictions targeting 1,082 common essential genes with 4 high efficacy PM gRNAs, and 10 SM gRNAs with varying relative activity.
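The "change in gRNA abundance" used in FIG. 4B is defined above as the difference in fold-change between a gRNA with mismatches and its cognate perfect match (PM) gRNA. A minimal sketch of that subtraction follows; the guide identifiers and values are hypothetical.

```python
def changes_relative_to_pm(log2fc, cognate_pm):
    """Return log2FC(variant) - log2FC(cognate PM) for every variant gRNA.

    log2fc:     mapping of gRNA id -> observed (or predicted) log2 fold-change
    cognate_pm: mapping of variant gRNA id -> id of its perfect-match gRNA
    """
    return {
        variant: log2fc[variant] - log2fc[pm]
        for variant, pm in cognate_pm.items()
    }


# Hypothetical screen values: a strongly depleted PM guide and two variants.
log2fc = {"pm1": -2.0, "sm1": -0.5, "indel1": 0.0}
cognate_pm = {"sm1": "pm1", "indel1": "pm1"}
deltas = changes_relative_to_pm(log2fc, cognate_pm)
print(deltas)
```

A value near zero means the variant retains the activity of its PM guide, while a large positive value means the mismatch or indel has largely abolished depletion.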
  • FIG. 5 provides a schematic of the TIGER model architecture.
  • FIG. 6 provides a graphical summary of various validation folding strategies.
  • FIG. 7 provides a schematic of the numbering strategy for the gRNA (bottom) and target (top).
  • FIGs. 8A-8C show two non-sequence features used in the TIGER model.
  • FIG. 8A shows the correlation of target site accessibility and gRNA efficacy.
  • FIG. 8B shows the correlation of target site hybridization and gRNA efficacy.
  • FIG. 8C shows the schematic of how these features are assessed with the top referring to target site accessibility, and the bottom referring to target site hybridization.
  • FIGs. 9A-9N are representative figures showing the data for 10,000 perfect match (PM) gRNA, and the assessment of their non-sequence features, as used in the TIGER model.
  • FIGs. 10A-10B demonstrate that combinatorial knockdown of synthetic lethal pair HDAC1 - HDAC2 confirms lack of collateral RNA cleavage after low copy-number transduction of RfxCas13d.
  • FIG. 10A shows that combinatorial knockdown of synthetic lethal pair HDAC1 - HDAC2 confirms lack of collateral RNA cleavage after low copy-number transduction of RfxCas13d.
  • Described herein is a method of screening and selecting a clustered regularly interspaced short palindromic repeats (CRISPR) guide RNA (gRNA) suitable for forming a complex with a CRISPR-associated protein 13d (Cas13d) or a variant thereof and directing the complex to a target RNA.
  • the method includes assessing the on-target and/or off-target activity of the gRNA.
  • the inventors generated the largest Cas13d dataset to date, measuring the activity of ~250,000 gRNAs across a total of 14 individual screens, and performed a comprehensive assessment of Cas13d gRNA on-target and off-target activity. Specifically, they sought to characterize perfect match (PM) gRNA activity determinants and gRNA permutations across a large set of nucleotide mismatches and indels relative to their cognate target sites. It was determined that a gRNA's ability to trigger Cas13d nuclease activity depends on the permutation position within the gRNA, the nucleotide identity, and the target site context. Prior studies have not characterized certain gRNA permutations such as insertions or deletions.
  • the TIGER CNN model was trained for on-target activity and off-target activity.
  • TIGER has best-in-class performance for Cas13d on-target activity, compared to existing Cas13d on-target models, including those with larger training sets.
  • the TIGER model is the first compelling attempt to understand and model Cas13d off-target binding and nuclease activation.
  • the TIGER platform is applied to develop a novel approach for precise and massively parallel interrogation of gene dosage.
  • Novel CRISPR technologies hold great promise for a new generation of therapeutic agents.
  • RNA-targeting CRISPR proteins have recently been shown to provide therapeutic value in disease models 12 l 7 .
  • High precision is key to the safety of therapeutic RNA-targeting CRISPR agents.
  • TIGER predictions will enable ranking and ultimately avoidance of undesired off-target binding sites and nuclease activation, and further spur development of RNA-targeting therapeutics.
  • the ability to distinguish between closely related target sites may for example enable targeting of allelic variants and other nearly undruggable targets like fusion gene products 47 .
  • RNA-targeting CRISPR perturbations can be used to systematically study the effect of gene dosage at the RNA level.
  • This platform fundamentally extends previous microRNA-based platforms 25 that, on one hand, lack scalability due to laborious target site engineering and, on the other hand, lack target specificity if engineered microRNAs are provided exogenously due to their short target site recognition sequence.
  • tuning of gene expression at the RNA level may be beneficial compared to modulation at the DNA level, as gene expression initiation is inherently stochastic 48 and biological systems have evolved in a way to fine-tune gene expression post-transcriptionally 49,50 .
  • DNA-targeting (e.g., dCas9-based) CRISPR approaches have been proposed for gene expression modulation 26 .
  • epigenetic effector domains e.g., KRAB
  • fused dCas9 proteins are well suited as they may act more in a binary on-off fashion 27,51 , and may lack precision for closely spaced genes due to spreading of chromatin modifications and DNA methylation 52 .
  • KRAB epigenetic effector domains fused dCas9 proteins
  • compositions and methods for generating discrete, intermediate expression levels of any gene of interest when using CRISPRi or CRISPRa involve the introduction of one or more mismatches or indels into the targeting sequence of gRNAs so as to achieve a level of CRISPRi or CRISPRa activity that is, e.g., intermediate between that obtained with a gRNA sharing 100% homology (perfect match, PM) with a target RNA sequence and/or an unmodified constant region and that obtained with a non-specific gRNA providing no CRISPRi or CRISPRa activity on the gene in question.
  • a level of CRISPRi or CRISPRa activity that is, e.g., intermediate between that obtained with a gRNA sharing 100% homology (perfect match, PM) with a target RNA sequence and/or an unmodified constant region and that obtained with a non-specific gRNA providing no CRISPRi or CRISPRa activity on the gene in question.
  • rules are provided by which the specific effects of a given mismatch or mutation on CRISPRi or CRISPRa activity can be determined, allowing the design of sets of gRNAs targeting a given gene and providing a series of discrete levels of expression of the gene. As described herein, such sets can be combined to form libraries targeting multiple genes, including large libraries targeting thousands of genes in the genome.
  • CRISPRi CRISPR interference
  • CRISPRa CRISPR activation
  • dCas13 overexpression
  • sgRNA single guide RNA
  • the Cas13-gRNA complex binds to RNA (e.g., mRNA) via base-pairing between the gRNA and RNA, i.e., the targeting sequence of the sgRNA and the target RNA sequence on, e.g., the mRNA, and the fused transcriptional repressor or activator leads to downregulation or upregulation of the gene, respectively.
  • RNA e.g., mRNA
  • the present disclosure provides methods to control the activity of Cas13 at a given RNA target site, e.g., by introducing mismatches into the gRNA (e.g., within the targeting sequence of the gRNA) or by introducing mutations into the gRNA constant region.
  • the present disclosure also provides rules, factors, and parameters to determine how a given mismatch in a gRNA targeting sequence affects the extent of repression or activation of a target gene by CRISPRi or CRISPRa, allowing the design of sets of mismatched gRNAs against the gene to allow its downregulation or upregulation to varying extents.
  • the information on the expression level of the target gene is encoded in the gRNA sequence or in the vector encoding the gRNA, and can therefore be read out by, e.g., deep sequencing and matched to a resulting phenotype.
  • experiments involving systematically mismatched gRNAs can be conducted in a single pooled experiment, reducing experimental variation and enhancing reproducibility. It will be appreciated that any of the herein-described methods and compositions can be applied to both gene downregulation (using CRISPRi) and overexpression (using CRISPRa).
  • gRNA Guide RNA
  • the CRISPR system described herein has two main components: a Cas13 protein and the gRNA.
  • the Cas protein has nuclease activity, and the gRNA leads the Cas protein to the specific target.
  • a gRNA for Cas13 is a small RNA with a conserved sequence forming a stem-loop structure (also known as direct repeat or 'DR') and a spacer sequence. The spacer is sometimes referred to as a guide. Cas13 binds to the DR, and the spacer sequence is complementary to the target RNA.
  • compositions and methods provided herein utilize spacer sequences that are a perfect match (PM), i.e., 100% complementary to the target sequence, in addition to spacer sequences having one or more mismatches (mutations as compared to the PM) or indels (one or more nucleotide insertions or deletions as compared to the PM).
  • PM perfect match
  • indels one or more nucleotide insertions or deletions as compared to the PM.
  • the spacer is about 20 nucleotides (nt) to about 33 nt. In a further embodiment, the spacer is about 20 nt, about 21 nt, about 22 nt, about 23 nt, about 24 nt, about 25 nt, about 26 nt, about 27 nt, about 28 nt, about 29 nt, about 30 nt, about 31 nt, about 32 nt, or about 33 nt. In one embodiment, the spacer is about 27 nt. In one embodiment, the spacer is about 23 nt. Existence of a sequence in a target RNA similar to a protospacer or protospacer adjacent motif (PAM) was not found in the CRISPR-Cas13d system.
  • PAM protospacer adjacent motif
  • nucleotide residues in a gRNA or a portion of it are numbered as illustrated in Figure 7.
  • the numbering is based on a numbering from 5’ end of the gRNA to 3’ end recognizing the guide match start as nt 1.
  • the guide match start is the first nucleotide residue (nt) from the 5’ end of the gRNA which is capable of matching to a nt of a target RNA.
  • the nt numbering at the 3’ side of the guide match start is a positive integer positively correlated to its distance to the guide match start, while the nt numbering at the 5’ side of the guide match start is a negative integer whose absolute value is positively correlated to its distance to the guide match start.
  • One exception is the last nt of the DR stem loop contiguously preceding the first nt of the guide, which is numbered as nt 0.
  • when the order of a nt is implied, for example, by the terms “first”, “last”, “preceding”, or similar, the order is counted from the 5’ end to the 3’ end.
  • the nt numbering is from 5’ end of the target RNA to its 3’ end recognizing the nt which is capable of matching to the guide match start as nt 0.
  • the nt numbering at the 3’ side of the nt matching to the guide match start is a positive integer positively correlated to its distance to the guide match start, while the nt numbering at the 5’ side of the nt matching to the guide match start is a negative integer whose absolute value is positively correlated to its distance to the guide match start.
  • nucleotide 1 defines the guide start site (GSS), being the most 5’ guide RNA base matching the target RNA.
  • Nucleotide 2 relative to GSS is the subsequent base (moving in the 5’ to 3’ direction) in the guide RNA and so on.
  • we denote the target nucleotide opposite to the GSS as nucleotide 0.
  • target RNA nucleotide -1 is upstream to the GSS and pairs with guide nucleotide 2, while target RNA nucleotide +1 is downstream of the target site and so on.
  • a range of nt is also illustrated as nucleotide position p over the distance d to the position p+d with its cognate sequence.
  • a nt range is noted as (nt x: y), indicating nt x to nt y, wherein x and y are integers, each of which may be positive, negative, or zero.
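The numbering convention above implies a simple relation between paired positions: guide nt 1 pairs with target nt 0, guide nt 2 with target nt -1, and so on, so guide nt g pairs with target nt 1 - g. The sketch below encodes that relation and the (nt x: y) range notation. The function names are hypothetical, and reading (nt x: y) as inclusive of both ends is an assumption.

```python
def paired_target_nt(guide_nt):
    """Map a guide position (1 = guide match start) to its paired target position."""
    if guide_nt < 1:
        raise ValueError("positions below 1 lie 5' of the guide match start")
    return 1 - guide_nt


def nt_range(x, y):
    """Expand the notation (nt x: y) into positions x through y.

    ASSUMPTION: both endpoints are included.
    """
    if x > y:
        raise ValueError("x must not exceed y")
    return list(range(x, y + 1))


# Pairing partners for the first three positions of a guide.
print([(g, paired_target_nt(g)) for g in nt_range(1, 3)])
```

For a 23-nt spacer, guide nt 23 therefore pairs with target nt -22, and positive target positions refer to unpaired context downstream of the target site.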
  • nt residue which may be an RNA or a DNA
  • adenine is the complementary base of thymine in DNA and of uracil in RNA.
  • nucleotide residues matching with each other are a pair of nucleotide residues (nt), or paired nt.
  • nt nucleotide residues
  • Hybridization is the process of complementary base pairs (nucleotide residues) binding to form a double helix.
  • the term “hybridization” or any other grammatical variation thereof refers to the pairing of at least two regions, from a single nucleic acid molecule or from two or more nucleic acid molecules, in which at least one nucleotide residue in one region matches a nucleotide residue in another region.
  • each of the nt in the first region matches to a nt in the second region.
  • each of the nt in the first region matches to each of the nt in the second region.
  • one or more mismatch(es) may be found between two regions, for example one mismatch, two mismatches, two consecutive mismatches, two nonconsecutive mismatches, three or more mismatches (consecutive or nonconsecutive).
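The matching and mismatch definitions above can be made concrete by comparing a spacer against the reverse complement of its target site. This is a minimal sketch under stated assumptions: only Watson-Crick pairs (A-U, G-C) count as matches, G-U wobble pairs are treated as mismatches, both regions have equal length, and positions are reported in guide numbering (nt 1 = guide match start). The function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[nt] for nt in reversed(rna))


def mismatch_positions(spacer, target_site):
    """List guide positions (1-based) that do not pair with the target site.

    ASSUMPTION: Watson-Crick pairing only; G-U wobble counts as a mismatch.
    """
    if len(spacer) != len(target_site):
        raise ValueError("spacer and target site must have equal length")
    perfect_match = reverse_complement(target_site)
    return [
        i + 1
        for i, (guide_nt, expected) in enumerate(zip(spacer, perfect_match))
        if guide_nt != expected
    ]


target_site = "AUGC"                          # hypothetical 4-nt target site
pm_spacer = reverse_complement(target_site)   # perfect-match spacer
print(pm_spacer, mismatch_positions(pm_spacer, target_site))
print(mismatch_positions("GCAA", target_site))
```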
  • the gRNA may target any region of the target RNA.
  • Because Cas13 proteins only bind and cut ssRNA, it is essential that gRNAs for Cas13 target the loops or single-stranded regions of the RNA, and not dsRNA portions generated by secondary structure.
  • Nucleic acid secondary structure is the base pairing interactions within a single nucleic acid polymer or between two polymers. It can be represented as a list of bases which are paired in a nucleic acid molecule. Nucleic acid secondary structure can be determined from atomic coordinates (tertiary structure) obtained by X-ray crystallography, often deposited in the Protein Data Bank. Current methods include 3DNA/DSSR and MC-Annotate. Methods for nucleic acid secondary structure prediction are also available, for example those relying on a nearest neighbor thermodynamic model. A common method to determine the most probable structures given a sequence of nucleotides makes use of a dynamic programming algorithm that seeks to find structures with low free energy. The lower the free energy is, the more stable the secondary structure is.
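The dynamic programming idea mentioned above can be shown with a deliberately simplified stand-in. The sketch below is a Nussinov-style recursion that maximizes the number of base pairs. It is not a free-energy minimization, does not implement a nearest neighbor thermodynamic model, and is not how RNAfold works. The allowed pairs (Watson-Crick plus G-U wobble) and the minimum hairpin loop of three unpaired nucleotides are assumptions chosen for illustration.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in `seq` (Nussinov-style DP).

    dp[i][j] holds the best pair count for the subsequence seq[i..j].
    A pair (k, j) is allowed only if at least `min_loop` bases lie between.
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]                      # case: j is unpaired
            for k in range(i, j - min_loop):         # case: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


print(max_base_pairs("GGGAAAUCCC"))
```

Energy-based predictors use the same table-filling structure but score stacked pairs and loops with experimentally derived free-energy terms instead of counting pairs.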
  • minimum free energy has been used in characterizing a secondary structure.
  • minimum free energies (MFEs) of a gRNA secondary structure were derived using RNAfold [ --gquad ] on the full-length gRNA sequence.
  • an MFE of a secondary structure formed by two regions hybridizing to each other (for example a target RNA and its corresponding guide) is referred to as a hybridization MFE.
  • Target RNA unpaired probability was calculated using RNAplfold [ -L 40 -W 80 -u 50 ] as described previously.
  • RNA-RNA hybridization was calculated using RNAhybrid [ -s -c ] using the di-nucleotide frequency derived from the target sequence.
  • RNA-hybridization minimum free energy for each spacer RNA nucleotide position p over the distance d to the position p + d with its cognate target sequence.
  • G-quadruplex is a secondary structure formed in nucleic acid by sequences that are rich in guanine. They are helical structures containing guanine tetrads that can form from one, two or four strands. Four guanine bases can associate through Hoogsteen hydrogen bonding to form a square planar structure called a guanine tetrad (G-tetrad or G-quartet), and two or more guanine tetrads (from G-tracts, continuous runs of guanine) can stack on top of each other to form a G-quadruplex.
  • G-quadruplex structures can be computationally predicted from DNA or RNA sequence motifs or by other methods available publicly or commercially.
  • RNAfold may be used to determine a presence or absence of a G-quadruplex.
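  • As a minimal sketch of motif-based prediction, a sequence can be scanned for four runs of at least three guanines separated by short loops. The loop length of 1 to 7 nt is a commonly used convention assumed here; it is not stated in this document, and this is not how RNAfold evaluates G-quadruplexes. The function name is hypothetical.

```python
import re

# Assumed motif: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ (loop length is an assumption).
G4_MOTIF = re.compile(
    r"G{3,}[ACGTU]{1,7}G{3,}[ACGTU]{1,7}G{3,}[ACGTU]{1,7}G{3,}"
)


def has_g4_motif(sequence: str) -> bool:
    """Return True if the sequence contains a putative G-quadruplex motif."""
    return G4_MOTIF.search(sequence.upper()) is not None


# Hypothetical sequences used only to show the call pattern.
print(has_g4_motif("AUGGGAGGGAGGGAGGGCU"))
print(has_g4_motif("ACGUACGUACGUACGU"))
```

  • A motif hit only flags a candidate; whether the structure actually forms depends on the thermodynamic context.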
  • a “target RNA” refers to an RNA molecule or a nucleic acid molecule to which a spacer sequence is designed to target, e.g., have complementarity, where hybridization between a target RNA and a guide promotes the formation of a CRISPR-Cas13d complex.
  • the target RNA comprises at least 20 nt (or at least 23 nt, or at least 87 nt, or at least 100 nt) RNA residues or a modification thereof.
  • the target RNA comprises at least 20 nt contiguous RNA residues or a modification thereof.
  • the region of a target RNA which is capable of hybridizing to a spacer of a gRNA is referred to herein as a potential hybridization region.
  • Such a target RNA, a hybridization region therein, a gRNA to which the hybridization region of the target RNA may hybridize, and a spacer of the gRNA correspond to each other.
  • the term “seed” region or any other grammatical variation thereof means a critical region of the target sequence of Class 2, Type VI enzymes (e.g., Cas13d) that must be strictly complementary to the CRISPR RNA guide to ensure knock-down efficacy. Mismatches between the target and CRISPR RNA guide sequence can contribute to off-target activity.
  • the critical Cas13d seed region is defined as the region located between spacer RNA nucleotides 15 to 21.
  • the seed region is defined as the region located between spacer RNA nucleotides 15 to 21, with its center at nucleotide 18 relative to the guide RNA 5’ end. Within the seed region, single mismatches lead to diminished guide enrichment, while mismatches outside the seed region were better tolerated (see Figure 2F).
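  • The seed definition above can be expressed as a simple position filter. This is a sketch under the assumption that mismatch positions are 1-based and counted from the guide RNA 5’ end, as in the preceding definition; the names are hypothetical.

```python
SEED_START = 15  # first spacer nucleotide of the seed region (from the text)
SEED_END = 21    # last spacer nucleotide of the seed region (from the text)


def seed_mismatches(mismatch_positions: list[int]) -> list[int]:
    """Return the mismatch positions that fall inside the seed region."""
    return [p for p in mismatch_positions if SEED_START <= p <= SEED_END]


def is_seed_intact(mismatch_positions: list[int]) -> bool:
    """True if no mismatch lies between spacer nucleotides 15 and 21."""
    return not seed_mismatches(mismatch_positions)


# Hypothetical mismatch positions used only to show the call pattern.
print(seed_mismatches([3, 18, 22]), is_seed_intact([3, 22]))
```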
  • the gRNA also includes a short hairpin region, also referred to as a “direct repeat” (DR) sequence, via which the Cas13 protein complexes with the guide RNA.
  • Cas13 enzymes each recognize a direct repeat (DR) sequence containing a conserved stem loop structure within their cognate crRNA.
  • the DR sequence motifs, RNA fold, and DR position relative to the spacer sequence are each distinct.
  • the DR is located on the 5' end while the Cas13b DR is 3' of the spacer sequence.
  • the DR is 5’ to the spacer sequence.
  • the DR is 3’ to the spacer sequence.
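  • As a small sketch of the two arrangements just described, a gRNA can be assembled by placing the DR either 5’ or 3’ of the spacer. The function name, the `dr_position` values and the example sequences are hypothetical placeholders and are not sequences from this disclosure.

```python
def assemble_grna(direct_repeat: str, spacer: str, dr_position: str = "5prime") -> str:
    """Concatenate a direct repeat (DR) and a spacer in the requested order."""
    if dr_position == "5prime":
        return direct_repeat + spacer   # DR is 5' to the spacer
    if dr_position == "3prime":
        return spacer + direct_repeat   # DR is 3' to the spacer
    raise ValueError("dr_position must be '5prime' or '3prime'")


# Placeholder sequences, written 5' to 3'.
print(assemble_grna("AAAC", "GGGU", "5prime"))
print(assemble_grna("AAAC", "GGGU", "3prime"))
```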
  • Cheng et al., Structural Basis for the RNA-Guided Ribonuclease Activity of CRISPR-Cas13d, Cell. 2018 Sep 20;175(1):212-223.e17, which is incorporated herein by reference.
  • WO 2022/063314 which describes DRs and modified DRs that are useful herein. This document is incorporated herein by reference.
  • gRNAs are provided with one or more mismatches in the spacer sequence of the gRNA in order to generate intermediate levels of CRISPRi or CRISPRa activity.
  • Complementary nucleotides are, generally, A and T (or A and U), and G and C.
  • sets of gRNAs are provided with different mismatches to generate constitutive expression levels of a target gene.
  • a set typically includes at least one gRNA in which the spacer sequence is a PM, as well as one or more gRNAs that comprise one or more mismatches within the spacer sequence.
  • Mismatches in the targeting sequence selected according to the present methods reduce the CRISPRi or CRISPRa activity to an intermediate level between that of a PM gRNA with 100% homology to the target RNA (e.g., providing 100% CRISPRi or CRISPRa activity) and that of a non-targeting gRNA that does not target the target RNA (i.e., with a targeting sequence comprising insufficient homology to the target RNA sequence to promote Cas13 binding and consequent CRISPRi or CRISPRa activity).
  • a given gene can be targeted using a single set of gRNAs that recognize a single target sequence within the gene, or with multiple sets that each target a different sequence within the target RNA.
  • a gRNA comprising one or more mismatches in the spacer sequence provides about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 80%, 85%, 90%, or 95% CRISPRi or CRISPRa activity, wherein 100% CRISPRi or CRISPRa activity corresponds to the activity in the presence of a PM sgRNA, and wherein 0% CRISPRi or CRISPRa activity corresponds to the activity in the presence of a non-targeting sgRNA with no, or only insignificant amounts of, homology to the target sequence.
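  • The percentage scale defined above can be illustrated with a short calculation that places a measured activity between the non-targeting reference (0%) and the PM reference (100%). Linear rescaling between the two references is an assumption made for this sketch, and the function and parameter names are hypothetical.

```python
def percent_activity(measured: float, pm_activity: float, non_targeting_activity: float) -> float:
    """Rescale a measured activity so that non-targeting = 0% and PM = 100%."""
    span = pm_activity - non_targeting_activity
    if span == 0:
        raise ValueError("PM and non-targeting activities must differ")
    return 100.0 * (measured - non_targeting_activity) / span


# Hypothetical readouts: PM guide = -2.0, non-targeting guide = 0.0.
print(percent_activity(-1.0, pm_activity=-2.0, non_targeting_activity=0.0))
```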
  • the method comprises the following steps:
  • the gRNA is a perfect match gRNA, a mismatch gRNA, or an indel gRNA
  • the ranking is performed using deep learning, such as using the TIGER model described herein (see, e.g., Examples 4-5).
  • the inventors have trained a convolutional neural network termed “TIGER” (Targeted Inhibition of Gene Expression via gRNA design) to predict efficacy from CRISPR guide sequence and context.
  • TIGER outperforms existing models at predicting on- and off-target activity on our dataset and published datasets.
  • TIGER scoring combined with specific mismatches yields the first general framework to modulate transcript expression, enabling use of RNA-targeting CRISPRs to precisely control gene dosage.
  • Our model uses guide sequence, target sequence, and additional non-sequence features (FIG. 3A).
  • the sequence input to the convolutional neural network (CNN) consists of 23 nt target and gRNA sequences with 2 nt of upstream and downstream target context.
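  • One way such a sequence input could be encoded is sketched below. This is not the TIGER implementation: the one-hot layout, the zero padding of the guide over the context positions, and the convention that the guide is supplied in the same orientation as the target are all assumptions, and the function names are hypothetical.

```python
NUCLEOTIDES = "ACGT"


def one_hot(base: str) -> list[float]:
    """One-hot encode a base; any other symbol (e.g. padding 'N') is all zeros."""
    return [1.0 if base == n else 0.0 for n in NUCLEOTIDES]


def encode_pair(target_with_context: str, guide: str, context: int = 2) -> list[list[float]]:
    """Return one 8-channel row per position: 4 target channels, 4 guide channels."""
    target = target_with_context.upper().replace("U", "T")
    guide = guide.upper().replace("U", "T")
    if len(target) != len(guide) + 2 * context:
        raise ValueError("target must equal guide length plus flanking context")
    padded_guide = "N" * context + guide + "N" * context  # no guide over context
    return [one_hot(t) + one_hot(g) for t, g in zip(target, padded_guide)]


# Hypothetical 23-nt guide with 2 nt of target context on each side (27 positions).
matrix = encode_pair("CC" + "A" * 23 + "GG", "A" * 23)
print(len(matrix), len(matrix[0]))
```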
  • Our neural network layers are:
  • This neural network layer refers to a 2D convolutional layer that uses 32 filters with a size of 4x4 each.
  • the input to this layer would typically be a 2 dimensional image, and the layer would apply the 32 filters to different parts of the input image to extract features.
  • the ReLU activation function is then applied element-wise to the output of each filter.
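  • The layer described above can be sketched in plain Python as 32 filters of size 4x4 slid over a 2D input, with ReLU applied element-wise to each filter output. Stride 1, no padding, no bias term and random filter weights are assumptions of this sketch; trained weights and the real layer configuration are not given here, and the function names are hypothetical.

```python
import random


def relu(x: float) -> float:
    """Rectified linear unit: negative values become zero."""
    return x if x > 0 else 0.0


def conv2d_valid(image: list[list[float]], kernel: list[list[float]]) -> list[list[float]]:
    """Slide one kernel over the image (stride 1, no padding), then apply ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            relu(sum(image[i + a][j + b] * kernel[a][b]
                     for a in range(kh) for b in range(kw)))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def make_filters(n_filters: int = 32, size: int = 4, seed: int = 0) -> list[list[list[float]]]:
    """Create placeholder filters with random weights (stand-ins for trained ones)."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1, 1) for _ in range(size)] for _ in range(size)]
            for _ in range(n_filters)]


def conv_layer(image: list[list[float]], filters: list[list[list[float]]]) -> list[list[list[float]]]:
    """Apply every filter to the image, giving one feature map per filter."""
    return [conv2d_valid(image, k) for k in filters]


# Hypothetical 8 x 27 input, e.g. 8 channels by 27 sequence positions.
feature_maps = conv_layer([[1.0] * 27 for _ in range(8)], make_filters())
print(len(feature_maps), len(feature_maps[0]), len(feature_maps[0][0]))
```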
  • one or more of the following non-sequence features may be input as a vector at the first dense layer: i. target accessibility - (See, FIG. 8A, 8C (top))
  • Table 1 provides blaze marks for the non-sequence features, and the blaze marks for each quintile, with quintile 1 providing the highest efficacy levels, and quintile 5 providing the lowest. See, FIGs. 9A-9N.
  • a method of titrating activity of a gene of interest is provided. After identifying an active PM gRNA, knockdown activity is titrated using various single mismatches (SM) between the gRNA and target site. See, FIG. 4E.
  • TIGER can accurately predict the “activity ratio”, i.e., the ratio of the fold-change of a SM gRNA to the fold-change of its PM cognate gRNA (FIG. 4E), and thus can be used to generate one or more gRNAs capable of providing a desired level of knockdown/expression of the gene. In certain embodiments, this is particularly useful for conditions where gene dosage matters, e.g., haploinsufficiency. See, e.g., Huang et al, Characterising and Predicting Haploinsufficiency in the Human Genome, PLoS Genetics, 2010, Vol 6, No. 10, e1001154.
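  • The activity ratio defined above is a simple quotient, sketched here for clarity. The function name is hypothetical, and the sketch assumes both fold-changes are expressed on the same scale (for example, both as log2 fold changes from the same screen).

```python
def activity_ratio(sm_fold_change: float, pm_fold_change: float) -> float:
    """Ratio of a single-mismatch (SM) guide's fold-change to its PM cognate's."""
    if pm_fold_change == 0:
        raise ValueError("PM fold-change must be non-zero")
    return sm_fold_change / pm_fold_change


# Hypothetical values: the SM guide retains half of the PM guide's effect.
print(activity_ratio(sm_fold_change=-1.0, pm_fold_change=-2.0))
```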
  • nt 15 to nt 21 (or nt 17 to 18 or nt 18) of the gRNA matching with its corresponding hybridization region of the target RNA without mismatches ranks higher than those with mismatches.
  • a gRNA having one, two, or more mismatches to its corresponding target RNA ranks lower compared to those having zero mismatches (PM).
  • a method for predicting on-target activity of a gRNA is provided, wherein the gRNA is composed of a DR stem loop and a spacer and is capable of forming a complex with a Cas13d or a variant thereof and directing the complex to the target RNA.
  • the method comprises characterizing one or more of the features (any one or combination of the features as disclosed herein) of a plurality of gRNAs and their corresponding target RNAs; assessing on-target activity of each of the gRNAs; constructing a model using the characterization data and the on-target activity data by a modeling method.
  • the modeling method comprises TIGER modeling as described herein, and as exemplified at huggingface.co/spaces/Knowles-Lab/tiger.
  • input of the model comprises characterization(s) of one or more of features of a gRNA and its corresponding target RNA.
  • output of the model is an on-target score of the gRNA.
  • an on-target score is an assigned number (for example, an integer, rational number or irrational number) which positively correlates to on-target activity of a gRNA.
  • the predicting method further comprises applying the constructed model to a gRNA and generating an on-target score of the gRNA.
  • the predicting method comprises applying the constructed model to two or more gRNAs (such as a first gRNA and a second gRNA), and generating on-target scores of the gRNAs.
  • the gRNAs share the same target RNA.
  • the gRNA is capable of hybridizing to a different (overlapping or non-overlapping) hybridization region of the same target RNA.
  • the predicting method further comprises comparing the generated on-target scores and selecting the gRNA having the higher/highest score for directing the gRNA-Cas13d complex to the target RNA.
  • an on-target activity of a gRNA may refer to one or more of the following: cell survival; efficacy of the gRNA in forming a complex with a Cas13d protein or a variant thereof; efficacy of the gRNA in hybridizing to the corresponding target RNA; efficacy of the gRNA in directing a Cas13d-gRNA complex to the target RNA; efficacy of the gRNA in reducing the corresponding target RNA; and enrichment or abundance or depletion of the gRNA (or the guide of the gRNA or the target RNA) after applying the gRNA and a Cas13d or a variant thereof to a cell or cell culture.
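  • The comparison and selection step can be sketched as a ranking over predicted scores. The guide names and score values below are hypothetical placeholders; in practice the scores would come from the constructed model.

```python
def rank_by_score(scores: dict[str, float]) -> list[str]:
    """Return gRNA identifiers ordered from highest to lowest on-target score."""
    return sorted(scores, key=scores.get, reverse=True)


def select_best(scores: dict[str, float]) -> str:
    """Return the gRNA identifier with the highest on-target score."""
    if not scores:
        raise ValueError("no gRNAs to select from")
    return max(scores, key=scores.get)


# Placeholder scores for three gRNAs that share the same target RNA.
predicted = {"gRNA_1": 0.42, "gRNA_2": 0.87, "gRNA_3": 0.15}
print(select_best(predicted), rank_by_score(predicted))
```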
  • the gRNA efficacy was determined by quantifying gRNA abundances in sorted and unsorted cell populations. The value represents the log2 fold change of sorted divided by input (for example, unsorted) counts. Higher values depict higher efficacies/efficiencies for target knockdown owed to the screen design.
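  • The log2 fold change described above can be sketched as follows. The pseudocount, which avoids division by zero for guides with no reads, is an assumption of this sketch, and normalization for sequencing depth is omitted; the function name is hypothetical.

```python
import math


def log2_fold_change(sorted_count: float, input_count: float, pseudocount: float = 1.0) -> float:
    """log2 of sorted over input (e.g. unsorted) counts, with an assumed pseudocount."""
    return math.log2((sorted_count + pseudocount) / (input_count + pseudocount))


# Hypothetical read counts for one gRNA.
print(log2_fold_change(sorted_count=7, input_count=1))
```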
  • an on-target score may be used to quantify the on-target activity.
  • an on-target score is an efficiency quintile as used herein (Q1 to Q5, also shown as bin1 to bin5).
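  • A quintile assignment of this kind can be sketched by ranking guides on their efficacy and splitting the ranking into five equal bins, with quintile 1 holding the most effective guides. The rank-based binning rule and the names below are assumptions for illustration only.

```python
def assign_quintiles(efficacy: dict[str, float]) -> dict[str, int]:
    """Map each gRNA to a quintile, 1 = highest efficacy, 5 = lowest."""
    ordered = sorted(efficacy, key=efficacy.get, reverse=True)
    n = len(ordered)
    return {guide: (rank * 5) // n + 1 for rank, guide in enumerate(ordered)}


# Ten hypothetical guides with efficacies 10 (best) down to 1 (worst).
example = {f"g{i}": float(10 - i) for i in range(10)}
print(assign_quintiles(example))
```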
  • an on-target score is a measured or calculated efficacy, for example, a fold change of gRNA/guide/target RNA abundance before versus after applying the gRNA.
  • the gRNA is composed of a DR stem loop and a guide, and is capable of forming a complex with a Cas13d or a variant thereof and directing the complex to the target RNA.
  • the predicting method comprises characterizing one or more of the features of a plurality of gRNAs and their corresponding target RNAs; assessing off-target activity of each of the gRNAs; and constructing a model using the characterization and the off-target activity acquired by a modeling method, e.g., TIGER.
  • the predicting method further comprises applying the TIGER model to a gRNA and generating an activity ratio of the gRNA.
  • the predicting method further comprises applying the constructed model to two or more gRNA (for example, a first gRNA and a second gRNA) and generating activity ratios of the gRNAs.
  • the gRNAs share the same target RNA.
  • the gRNA is capable of hybridizing to a different (overlapping or non-overlapping) hybridization region of the same target RNA.
  • the predicting method further comprises comparing the generated activity ratio and selecting the gRNA having appropriate predicted activity ratio for the desired gene expression level.
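  • Selecting a gRNA for a desired expression level can be sketched as choosing the guide whose predicted activity ratio lies closest to the desired value. The closest-value rule, the guide names and the ratios are hypothetical; the disclosure only requires an "appropriate" predicted activity ratio.

```python
def closest_to_desired(predicted_ratios: dict[str, float], desired_ratio: float) -> str:
    """Return the gRNA whose predicted activity ratio is nearest the desired one."""
    if not predicted_ratios:
        raise ValueError("no gRNAs to select from")
    return min(predicted_ratios, key=lambda g: abs(predicted_ratios[g] - desired_ratio))


# Placeholder predictions: a PM guide and two single-mismatch variants.
ratios = {"PM": 1.0, "SM_pos5": 0.8, "SM_pos18": 0.3}
print(closest_to_desired(ratios, desired_ratio=0.75))
```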
  • an off-target activity refers to an activity in which a gRNA-Cas13d complex binds to and optionally nicks an RNA which is not the target RNA.
  • An off-target effect refers to binding of a gRNA-Cas13d complex with an RNA which is not the target RNA and any consequence(s) thereof, for example, reduction of a non-target RNA, reduction of a peptide or a protein encoded by the non-target RNA, increase or reduction of a peptide or a protein whose expression is regulated by the non-target RNA, and any physiological change(s) relating thereto.
  • a method for selecting a gRNA from two or more gRNAs for directing a complex which comprises the gRNA and a CRISPR-associated protein 13d (Cas13d) or a variant thereof (i.e., Cas13d-gRNA complex or gRNA-Cas13d complex) to a target RNA.
  • a modeling method refers to a mathematical or statistical analysis, for example, random forest models, classification and regression tree models, boosting, Bayesian networks, Markov random field, linear and generalized linear models, boosted tree models, neural networks, support vector machines, general chi-squared automatic interaction detector models, interactive tree models, multiadaptive regression spline, machine learning classifiers, a multi hypothesis testing, a principal component analysis, and any combinations thereof.
  • the analysis can be characterized by a learning style including any one or more of: supervised learning (e.g., using logistic regression, using back propagation neural networks), unsupervised learning (e.g., using an Apriori algorithm, using K-means clustering), semi-supervised learning, reinforcement learning (e.g., using a Q-learning algorithm, using temporal difference learning), and any other suitable learning style.
  • the analysis can implement any one or more of: a regression algorithm (e.g., ordinary least squares, logistic regression, stepwise regression, multivariate adaptive regression splines, locally estimated scatterplot smoothing, etc.), an instance-based method (e.g., k-nearest neighbor, learning vector quantization, self-organizing map, etc.), a regularization method (e.g., ridge regression, least absolute shrinkage and selection operator, elastic net, etc.), a decision tree learning method (e.g., classification and regression tree, iterative dichotomiser 3, C4.5, chi-squared automatic interaction detection, decision stump, random forest, multivariate adaptive regression splines, gradient boosting machines, etc.), a Bayesian method (e.g., naive Bayes, averaged one-dependence estimators, Bayesian belief network, etc.), a kernel method (e.g., a support vector machine, a radial basis function, a linear discriminant analysis, etc.), a clustering of
  • the machine learning classifier may be a discriminant analysis (DA) machine learning classifier, a nearest neighbor (NN) machine learning classifier, a random forest (RF) machine learning classifier, or a support vector machine (SVM).
  • a DA machine learning classifier may be a linear discriminant analysis (LDA) classifier, or a quadratic discriminant analysis (QDA) classifier.
  • the SVM classifier may have three kernels, including a linear kernel, a radial basis function (RBF) kernel, and a polynomial kernel.
  • the machine learning classifier may employ a convolutional neural network (CNN).
  • a modeling method may be performed on a computer.
  • characterizing a feature or a grammatical variation thereof refers to a qualitative or quantitative manner of describing the feature. For example, it may be presence or absence of the feature, a numeric range of the feature, or a parameter/number/percentage calculated.
  • the ranking and/or any of the predicting methods as disclosed herein are determined in silico in software.
  • Such software is, for example, an R language program, a Python program or similar. Other codes performing the same function may also be used.
  • target RNA which is a messenger RNA (mRNA), a mature mRNA, a primary transcript mRNA (pre-mRNA), a ribosomal RNA (rRNA), a 5.8S rRNA, a 5S rRNA, a transfer RNA (tRNA), a transfer-messenger RNA (tmRNA), an enhancer RNA (eRNA), a small interfering RNA (siRNA), a microRNA (miRNA), a small nucleolar RNA (snoRNA), a Piwi-interacting RNA (piRNA), a tRNA-derived small RNA (tsRNA), a small rDNA-derived RNA (srRNA), a non-coding RNA (ncRNA), long (intergenic) non-coding RNA (lincRNA/lncRNA), a single-stranded RNA (ssRNA), a circular RNA (circRNA), a vault RNA (vRNA/vtRNA), a SmY
  • RNA targets RNase P
  • a noncoding regulatory RNA e.g. 7SK RNA
  • RNA-viruses single stranded DNA
  • CDS coding sequence
  • UTR untranslated region
  • nucleic acid molecule comprising one or more of the gRNA(s) as disclosed, or a nucleic acid sequence complementary to the gRNA(s), or a nucleic acid sequence encoding the gRNA(s), or a nucleic acid sequence complementary to the gRNA coding sequence.
  • the nucleic acid molecule is a DNA.
  • the nucleic acid molecule is a mature RNA.
  • the nucleic acid molecule comprises a DNA sequence encoding the gRNA(s).
  • the nucleic acid molecule further comprises a first regulatory sequence directing expression of the gRNA(s).
  • the first regulatory sequence may comprise, without limitation, a Pol III promoter, for example, a U6 promoter, an H1 promoter, a T7 promoter, and a 7SK promoter.
  • a nucleic acid molecule encoding a gRNA may be in operative association with an RNA pol III promoter.
  • RNA pol III promoters which can be used are publicly or commercially available, for example the U6 promoter, the promoter fragments derived from H1 RNA genes or U6 snRNA genes of human or mouse origin or from any other species.
  • pol III promoters can be modified/engineered to incorporate other desirable properties such as the ability to be induced by small chemical molecules, either ubiquitously or in a tissue-specific manner.
  • the promoter may be activated by tetracycline.
  • the promoter may be activated by IPTG (lacI system). See, US5902880A and US7195916B2.
  • a Pol III promoter from various species might be utilized, such as human, mouse or rat.
  • the nucleic acid molecule further comprises a DNA sequence encoding a Class 2, Type VI effector protein or a variant thereof.
  • the encoded protein is any Class 2, Type VI protein.
  • the protein is a Cas13d protein.
  • the effector protein is a RfxCas13d from Ruminococcus flavefaciens strain XPD3002.
  • Other Cas13d proteins may be utilized, for example, an AdmCas13d from Anaerobic digester metagenome 15706, EsCas13d from Eubacterium siraeum DSM15702, P1E0Cas13d from Gut metagenome assembly P1E0-k21, UrCas13d from Uncultured Ruminococcus sp., RffCas13d from Ruminococcus flavefaciens FD1, and RaCas13d from Ruminococcus albus.
  • the feature(s), ranges of the features(s), and any combination thereof may be adjusted according to a Cas13d other than RfxCas13d.
  • the Cas13d or a variant thereof further comprises a nuclear localization signal (NLS) or a cytosolic signal or a nuclear-export signal (NES).
  • the Cas13d or a variant thereof is fused to an endoplasmic reticulum localization element, an Outer Mitochondrial membrane localization element, a Mitochondria localizing element, a Nucleolus localizing element (NIK3x), a Nuclear lamina localizing element (LMNA) or a Nuclear pore complex localizing element (SENP2).
  • the Cas13d or a variant thereof is capable of nicking a target RNA.
  • the Cas13d or a variant thereof has been engineered and does not have a nuclease activity, and is therefore referred to as a dead Cas13d.
  • the DNA sequence encoding the effector, e.g., Cas13d, protein is under the control of a regulatory sequence directing expression thereof in a mammalian cell.
  • the nucleic acid molecule comprises a second regulatory sequence which directs expression of the Cas13d protein or a variation thereof.
  • the second regulatory sequence comprises an RNA polymerase II (Pol II) promoter, for example, an EF-1 Alpha Short (EFS) promoter, or a Tet operator (tetO) promoter.
  • the second regulatory sequence comprises one or more of the following: a polyadenylation (poly(A)) sequence, a selectable marker, a tag, and a Woodchuck Hepatitis Virus (WHP) Posttranscriptional Regulatory Element (WPRE) sequence.
  • the tag is selected from one or more of the following: a FLAG tag, a poly(His) tag, a chitin binding protein (CBP) tag, a maltose binding protein (MBP) tag, a Strep tag, a glutathione-S-transferase (GST) tag, a thioredoxin (TRX) tag, a poly(NANP) tag, a V5 tag, a HA tag, a Spot tag, a T7 tag, a NE tag, a fluorescence tag, a Green Fluorescent Protein (GFP) tag, and a MYC tag.
  • the FLAG tag has a sequence of DYKDDDK, SEQ ID NO:47.
  • the selectable marker is a puromycin resistance gene, a kanamycin resistance gene, a chloramphenicol resistance gene, a blasticidin S resistance gene, an ampicillin resistance gene, a tetracycline resistance gene, or a G418 resistance gene.
  • the Cas13d coding sequence is operably linked to a regulatory element to ensure expression in a target cell.
  • the promoter is an inducible promoter, such as a doxycycline inducible promoter.
  • the regulatory element(s) comprises an RNA pol II promoter.
  • RNA pol II promoter is a promoter that is sufficient to direct accurate initiation of transcription by the RNA polymerase II machinery, wherein the RNA polymerase II (RNAP II and Pol II) is a RNA polymerase found in the nucleus of eukaryotic cells, catalyzing the transcription of DNA to synthesize precursors of messenger RNA (mRNA) and most small nuclear RNA (snRNA) and microRNA.
  • mRNA messenger RNA
  • snRNA small nuclear RNA
  • Polymerase II promoters that can be used within the compositions and methods described herein are publicly or commercially available to a skilled artisan, for example, viral promoters obtained from the genomes of viruses including promoters from polyoma virus, fowlpox virus (UK 2,211,504), adenovirus (such as Adenovirus 2 or 5), herpes simplex virus (thymidine kinase promoter), bovine papilloma virus, avian sarcoma virus, cytomegalovirus (CMV), a retrovirus (e.g., MoMLV, or RSV LTR), Hepatitis-B virus, Myeloproliferative sarcoma virus promoter (MPSV), VISNA, and Simian Virus 40 (SV40); other heterologous mammalian promoters including the actin promoter, β-actin promoter, immunoglobulin promoter, heat-shock protein promoters, human Ubiquitin-C promoter,
  • the promoter is a CMV promoter.
  • the promoter is an EF-1 Alpha Short (EFS) promoter, or a Tet operator (tetO) promoter.
  • regulatory element refers to expression control sequences which are contiguous with the nucleic acid sequence of interest (for example, a Cas13d coding sequence or a sequence for expressing a gRNA) and expression control sequences that act in trans or at a distance to control the nucleic acid sequence of interest.
  • regulatory elements comprise, but are not limited to: promoter; enhancer; transcription factor; transcription terminator; efficient RNA processing signals such as splicing and polyadenylation signals (poly A); sequences that stabilize cytoplasmic mRNA, for example Woodchuck Hepatitis Virus (WHP) Posttranscriptional Regulatory Element (WPRE); sequences that enhance translation efficiency (i.e., Kozak consensus sequence); sequences that enhance protein stability; and when desired, sequences that enhance secretion of the encoded product.
  • Regulatory sequences include those which direct constitutive expression of a nucleic acid sequence in many types of target cell and those which direct expression of the nucleic acid sequence only in certain target cells (e.g., tissue-specific regulatory sequences).
  • the Cas13d can be delivered by way of a vector comprising a regulatory sequence to direct synthesis of the Cas13d at specific intervals, or over a specific time period. It will be appreciated by those skilled in the art that the design of the vector can depend on such factors as the choice of the target cell, the level of expression desired, and the like.
  • operably linked sequences or sequences “in operative association” include both expression control sequences that are contiguous with the nucleic acid sequence of interest (for example, a Cas13d coding sequence or a sequence for expressing a gRNA) and expression control sequences that act in trans or at a distance to control the nucleic acid sequence of interest.
  • the nucleic acid sequence encoding a Cas13d protein further comprises a reporter gene or a nucleic acid encoding a selectable marker, which may include sequences encoding geneticin, hygromycin, ampicillin or puromycin resistance, among others.
  • a reporter gene which is used as an indication of whether the Cas13d coding sequence has been incorporated into and/or expressed as a functional protein in the target cell or not, is readily selected by one of skill in the art, including without limitation, the E. coli lacZ gene, the chloramphenicol acetyltransferase (CAT) gene, or a gene encoding a fluorescent protein such as Green fluorescent protein (GFP).
  • one nucleic acid molecule comprises the sequence for the gRNA and a separate nucleic acid molecule encodes the sequence of the Cas13d protein.
  • the vector comprises a gRNA and/or a nucleic acid molecule as disclosed.
  • the vector is a viral vector, a retrovirus vector, a lentiviral vector, an adenovirus vector, an adeno-associated virus vector, or a hybrid viral vector.
  • the vector is a non-viral vector or an analogous carrier, such as a nanoparticle, a lipid complex, a polymer, a quantum dot, a carbon nanotube, a magnetic nanoparticle, or a gold nanoparticle.
  • a vector, for example a plasmid, for producing the vector is provided.
  • a ribonucleoprotein (RNP) complex as described herein includes a Class 2, Type VI effector protein and a gRNA, as defined herein.
  • a cell which contains one or more of the gRNA, nucleic acid molecules, RNP or compositions described herein.
  • the cell may be mammalian, preferably a human cell. In other embodiments, the cell may be bacterial.
  • a library comprising a plurality of gRNAs or nucleic acid molecules or RNPs or vectors or cells as disclosed.
  • each of the gRNAs is capable of directing a Cas13d or a variant thereof to a different target RNA or a different region of one target RNA.
  • the library is a lentiviral library.
  • a composition comprising a pharmaceutically acceptable carrier and one or more gRNA(s), RNPs, or nucleic acid molecule(s) or vector(s), or cells as disclosed.
  • These compositions may be for pharmaceutical use and thus useful in the treatment of a disease associated with an abnormal RNA or misregulation of an RNA transcript.
  • Some examples of these diseases are the diseases mentioned specifically above.
  • the gRNA, RNPs, pharmaceutical compositions, cells, vectors and libraries may also comprise gRNA having guide sequences which mismatch the target and allow the Class 2, Type VI effector protein to bind the target, but not elicit target degradation when used in the methods known to those of skill in the art as well as the methods described and exemplified specifically herein.
  • One or more of the gRNAs, nucleic acid molecules, RNPs, vectors, cells, and libraries described herein are useful in a variety of methods including without limitation, treating a disease associated with an abnormal RNA; screening functional RNA(s); knocking-down, detecting, or editing a target RNA; or detecting or editing splicing, alternative isoforms, intron retention or differential UTR usage, or binding but not degrading the target.
  • the gRNA(s), nucleic acid molecule(s), RNP(s), vector(s), cell(s), or composition(s) containing one or more of them are used as a medicament, for example, in the treatment of a disease associated with an abnormal RNA such as by reducing the level of the abnormal RNA.
  • Such disease may be a cancer/tumor, a virus infection, or a genetic disorder.
  • the treatment comprises contacting a target cell, and/or a biological sample from a subject having or suspected of having the disease with the gRNA(s), nucleic acid molecule(s), RNP(s), or vector(s) described herein.
  • target RNA of the gRNA(s) is/are the abnormal RNA(s) associated with the disease.
  • the level of the abnormal RNA(s) in the target cell and/or in the biological sample is reduced.
  • the level of the abnormal RNA(s) after the treatment is reduced to at least about 10%, about 20%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, or about 95% of the level before the treatment or the level of a subject having this disease.
  • the level of the abnormal RNA(s) after the treatment is about 10%, about 20%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, about 100%, about 1.2 fold, about 1.5 fold, about 2 fold, about 3 fold or about 4 fold of a control level of a subject who is free of the disease.
  • the targets are blocked but not degraded.
  • the targets are modified temporarily.
  • the targets are modified permanently.
  • the gRNA(s), nucleic acid molecule(s), RNP(s), vector(s), cell(s), or composition(s) containing one or more of them are used as a medicament, for example, in the treatment of a disease associated with an insufficient amount of a gene or gene product, e.g., haploinsufficiency, such as by increasing the level of RNA.
  • the treatment comprises contacting a target cell, and/or a biological sample from a subject having or suspected of having the disease with the gRNA(s), nucleic acid molecule(s), RNP(s), or vector(s) described herein.
  • the target RNA(s) of the gRNA(s) is/are the RNA(s) associated with the insufficiency.
  • the level of the RNA(s) in the target cell and/or in the biological sample is increased.
  • the level of the RNA(s) after the treatment is increased by at least about 10%, about 20%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, or about 95% of the level before the treatment or the level of a subject having this disease.
  • the level of the RNA(s) after the treatment is about 10%, about 20%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, about 100%, about 1.2 fold, about 1.5 fold, about 2 fold, about 3 fold or about 4 fold of a control level of a subject who is free of the disease.
  • the level after treatment is a clinically therapeutic level.
  • a method of treating a disease associated with an abnormal RNA or misregulation of an RNA transcript comprises administering to a subject in need thereof the gRNA, nucleic acid molecules, vectors, RNPs, cells, or pharmaceutical compositions described herein.
  • the administering step involves, in one embodiment, delivering the selected or designed gRNA as a mature RNA to a cell that expresses an RNA-targeting CRISPR-associated protein, e.g., a Class 2, Type VI protein, such as Cas13d or a variant.
  • the cell has been conditioned or modified to express the Cas13d or variant, and the administering occurs ex vivo.
  • the administering step involves delivering the gRNA described herein in a vector which co-expresses the RNA-targeting CRISPR-associated protein.
  • the administering step involves delivering the gRNA and RNA-targeting CRISPR-associated protein as a ribonucleoprotein complex to the subject.
  • the administering step involves delivering the nucleotide molecule containing the gRNA with a separate nucleotide molecule that expresses the RNA-targeting CRISPR-associated protein.
  • cancer and “tumor” are used interchangeably and refer to an abnormal cell growth invading or spreading to other parts of the subject or having a potential of the invading or spreading.
  • abnormal RNAs may be present in a tumor/cancer cell.
  • the cancer/tumor includes, but is not limited to, a solid tumor (e.g., breast, colon, ovarian, lung, liver and glioma, Mesothelioma, and non-small cell lung cancer), a B cell lymphoma, a Cutaneous T cell lymphoma and a Lymphoid leukemia.
  • a target cell may generate abnormal RNA(s) in order to neutralize the virus. Additionally or alternatively, after the virus’ entry to a target cell, the virus may utilize the RNA producing machinery of the target cell producing abnormal RNA(s) in order to replicate the virus, or to lyse the target cell, or to perform other function(s) required by fulfilling the virus life cycle.
  • virus infection may include HCV infection and related liver diseases, smallpox, the common cold and different types of flu, corona virus infections, measles, mumps, rubella, chicken pox, and shingles, hepatitis (HCV, HBV, or HAV), HIV, herpes and cold sores, polio, rabies, Ebola and Hanta fever.
  • Abnormal RNA(s) may also be found in other diseases, including, without limitation, Atherosclerosis, Polycystic Kidney Disease, Cardiac disease, Cardiac stress, Myocardial infarction, Kidney fibrosis, Cardiac fibrosis, diabetes, Diabetes-related kidney complications, type 2 diabetes, non-alcoholic fatty liver diseases, mycosis fungoides, and Scleroderma.
  • RNA-causing defects associated with misregulation or defects in RNA include without limitation Prader Willi syndrome, Spinal muscular atrophy (SMA), Dyskeratosis congenita (X-linked), Dyskeratosis congenita (autosomal dominant), Dyskeratosis congenita (autosomal dominant), Diamond-Blackfan anemia, Shwachman-Diamond syndrome, Treacher-Collins syndrome, Prostate cancer, Myotonic dystrophy, type 1 (DM1), Myotonic dystrophy, type 2 (DM2), Spinocerebellar ataxia 8 (SCA8), Huntington's disease-like 2 (HDL2), Fragile X-associated tremor ataxia syndrome (FXTAS), Fragile X syndrome, X-linked mental retardation, Oculopharyngeal muscular dystrophy (OPMD), Human pigmentary genodermatosis, Retinitis pigmentosa, Cartilage-hair hypoplasia (recessive
  • the abnormal RNA(s) is/are presented in a biological sample. In a further embodiment, the abnormal RNA(s) may not be within a cell.
  • a functional screening method comprises contacting one or more gRNA(s), and/or nucleic acid molecule(s), and/or vector(s), and/or a library as disclosed with a target cell of a cell culture, a tissue, or a subject.
  • the method comprises amplifying the nucleic acid molecule or the vector in the target cell, and optionally quantifying the nucleic acid molecule or the vector.
  • a Cas13d protein is expressed by a nucleic acid molecule or a vector in the target cell.
  • the gRNA forms a complex with a Cas13d or a variation thereof, and directs the complex to a target RNA.
  • the nucleic acid molecule or vector is the same nucleic acid molecule or vector which comprises or expresses the gRNA(s).
  • the nucleic acid molecule or vector expresses the Cas13d protein but not the gRNAs and thus, is referred to as “Cas13d molecule” or “Cas13d vector” as used herein.
  • the ratio of the Cas13d molecule (or Cas13d vector) to a gRNA (or nucleic acid molecule and/or vectors providing the gRNA) is about 100 to 1 to about 1 to 100, including each ratio therebetween. In one embodiment, the ratio is about 10 to 1, about 5 to 1, about 4 to 1, about 3 to 1, about 2 to 1, about 1 to 1, about 1 to 2, about 1 to 3, about 1 to 4, about 1 to 5, or about 1 to 10. In a further embodiment, the ratio is a molar ratio.
  • the encoded Cas13d protein is RfxCas13d from Ruminococcus flavefaciens strain XPD3002.
  • Other Cas13d may also be utilized, for example, AdmCas13d from Anaerobic digester metagenome 15706, EsCas13d from Eubacterium siraeum DSM 15702, P1E0Cas13d from Gut metagenome assembly P1E0-k21, UrCas13d from Uncultured Ruminococcus sp., RffCas13d from Ruminococcus flavefaciens FD1, and RaCas13d from Ruminococcus albus.
  • the Cas13d or a variant thereof further comprises a nuclear localization signal (NLS) or a cytosolic signal or a nuclear-export signal (NES).
  • the Cas13d or a variant thereof is capable of nicking a target RNA.
  • the Cas13d or a variant thereof has been engineered and does not have a nuclease activity.
  • the Cas13d is conjugated to a reporter molecule.
  • the method reduces level of one or more of target RNA(s) in a target cell.
  • the method functionally knocks down or knocks out one or more gene(s) expressing the target RNA(s).
  • the method knocks down or knocks out one or more gene(s) in a plurality of target cells in parallel.
  • a selective pressure or a stimulus is applied to the target cells prior to, during or after the contacting step, which is referred to as a perturbation step.
  • Such selective pressure or a stimulus includes, for example, a chemical agent or a biological agent or actively physically disturbing the target cell(s).
  • the method further comprises assessing cell viability, cell proliferation, cell apoptosis, cell death, cell phenotype, existence or concentration of a molecule (for example, the target RNA(s)), protein or cell marker expression, or response to a stimulus of a target cell, or a function which may be achieved by the cell culture, tissue, or subject comprising the target cell(s).
  • a method for editing or modifying a target RNA comprising contacting a gRNA-Cas13d RNP complex with a target RNA.
  • this method or any composition used in the method is used for treatment of a disease associated with the target RNA.
  • the gRNA of the complex is as disclosed herein.
  • the complex is produced by a vector or a nucleic acid sequence disclosed.
  • the Cas13d nicks the target RNA.
  • the Cas13d has been engineered to have no nuclease activity. Other suitable Cas13d variants have been discussed in other sections of this application.
  • the Cas13d of the complex is engineered to edit or modify an RNA.
  • for example, the Cas13d may be conjugated to an RNA aminase, deaminase (e.g., ADAR, ADAR1, ADAR2), methylase, or demethylase (e.g., ALKBH5).
  • the Cas13d is conjugated to a splicing factor, for example RBFOX1 or RBM38, whereby exon inclusion in the target RNA is induced when the hybridization region is at the downstream intron (i.e., intron at the 3’ side of an exon), and whereby exon exclusion in the target RNA is induced when the hybridization region is within the target exon.
  • the Cas13d is conjugated to a polyadenylation factor, for example, Nudix hydrolase 21 (NUDT21), whereby polyadenylation of RNA is induced at the hybridization region of the target protein.
  • nucleic acid or a “nucleotide”, as described herein, can be RNA, DNA, or a modification thereof, and can be selected, for example, from a nucleic acid encoding a protein of interest, oligonucleotides, nucleic acid analogues, for example peptide-nucleic acid (PNA), pseudocomplementary PNA (pc-PNA), locked nucleic acid (LNA) etc.
  • the terms “nucleotide” “nucleic acid” “nucleotide residue” and “nucleic acid residue” are used interchangeably, referring to a nucleotide in a nucleic acid polymer.
  • consecutive nucleotide residues refer to nucleotide residues in a contiguous region of a nucleic acid polymer.
  • a nucleic acid molecule (RNA or DNA) or a nucleotide therein may be modified or edited.
  • modification or edition includes 5' capping, 3' polyadenylation, and RNA splicing.
  • the modification or edition includes methylation (for example on an A residue resulting in an m6A), demethylation (for example, on an m6A, optionally via an RNA demethylase, including but not limited to ALKBH5), deamination (for example, from adenosine (A) to inosine (I), optionally via a tRNA-specific adenosine deaminase (ADAT), or from C to U, optionally via a pentatricopeptide repeat (PPR) protein), or amination (for example, from U to C or from G to A).
  • Ribonucleic acid (RNA) is a polymeric molecule essential in various biological roles in coding, decoding, regulation and expression of genes.
  • RNA may refer to a CRISPR guide RNA, a messenger RNA (mRNA), a mitochondrial RNA, short hairpin RNAi (shRNAi), small interfering RNA (siRNA), a mature mRNA, a primary transcript mRNA (pre-mRNA), a ribosomal RNA (rRNA), a 5.8S rRNA, a 5S rRNA, a transfer RNA (tRNA), a transfer-messenger RNA (tmRNA), an enhancer RNA (eRNA), a small interfering RNA (siRNA), a microRNA (miRNA), a small nucleolar RNA (snoRNA), a Piwi-interacting RNA (piRNA), a tRNA-derived small RNA (tsRNA), a small rDNA-derived RNA (
  • the target RNA is an endogenous RNA. Additionally, or alternatively, the target RNA comprises/is a CDS. In another embodiment, the target RNA comprises/is a UTR (including a 5’ UTR or a 3’ UTR). In yet another embodiment, the target RNA comprises/is an intron.
  • deoxyribonucleic acid is a polymeric molecule formed by deoxyribonucleic acid, including, but not limited to, genomic DNA, double-strand DNA, single-strand DNA, DNA packaged with a histone protein, complementary DNA (cDNA which is reverse-transcribed from an RNA), mitochondrial DNA, and chromosomal DNA.
  • a “vector” as used herein is a biological or chemical moiety comprising a nucleic acid sequence which can be introduced into an appropriate cell for replication or expression of said nucleic acid sequence.
  • Common vectors include naked DNA, phage, transposon, plasmids, viral vectors, cosmids (Phillip McClean, www.ndsu.edu/pubweb/~mcclean/plsc731/cloning/cloning4.htm) and artificial chromosomes (Gong, Shiaoching, et al. “A gene expression atlas of the central nervous system based on bacterial artificial chromosomes.” Nature 425.6961 (2003): 917-925).
  • vector refers to a circular double stranded DNA loop into which additional nucleic acid segments can be ligated.
  • a viral vector wherein additional nucleic acid segments can be ligated into the viral genome.
  • Certain vectors are capable of autonomous replication in a cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian vectors).
  • the vector is a lentiviral vector.
  • Other vectors e.g., non-episomal mammalian vectors
  • a “viral vector” refers to a synthetic or artificial viral particle in which an expression cassette containing a nucleic acid sequence of interest is packaged in a viral capsid or envelope.
  • viral vectors include but are not limited to lentivirus, adenoviruses (Ads), retroviruses (γ-retroviruses and lentiviruses), poxviruses, adeno-associated viruses (AAV), baculoviruses, herpes simplex viruses.
  • the viral vector is replication defective.
  • replication-defective virus refers to a viral vector, wherein any viral genomic sequences also packaged within the viral capsid or envelope are replication-deficient; i.e., they cannot generate progeny virions but retain the ability to infect cells.
  • carrier includes any and all solvents, dispersion media, vehicles, coatings, diluents, antibacterial and antifungal agents, isotonic and absorption delaying agents, buffers, carrier solutions, suspensions, colloids, and the like.
  • Supplementary active ingredients can also be incorporated into the compositions.
  • pharmaceutically acceptable refers to molecular entities and compositions that do not produce an allergic or similar untoward reaction when administered to a subject.
  • Delivery vehicles such as lipid particle, liposomes, nanocapsules, nanosphere, nanoparticle, microparticles, microspheres, lipid particles, vesicles, and the like, may be used for the introduction of the compositions of the present invention into suitable target cells.
  • biological sample any biological fluids, cells or tissues of a subject that is suitable for use, such as, for example, cell-containing body fluids such as blood, sperm, cerebral spinal fluid, saliva, sputum or urine, leukocyte fractions, buffy coat, feces, swabs, puncture fluids, skin fragments, whole organisms or parts thereof, organs, organ fragments, tissues and tissue parts of a subject.
  • Suitable samples are in the form of sections, biopsies, fine needle aspirates or tissue sections, isolated cells, for example in the form of adherent or suspended cell cultures, plants, plant parts, plant tissues from the fractions may be carried out at the same time or one or plant cells, bacteria, viruses, yeasts and fungi, without being limited thereto.
  • the biological sample contains a target RNA.
  • a suitable biological sample is a tissue section from human tissue, such as a tumor.
  • Pooled viral CRISPR “libraries” are a heterogenous population of viral transfer vectors, each containing an individual gRNA targeting a single gene in a given genome.
  • the term “tag” refers to a peptide or polypeptide whose presence can be readily detected.
  • the tag is selected from one or more of the following: a FLAG tag, a poly(His) tag, a chitin binding protein (CBP) tag, a maltose binding protein (MBP) tag, a Strep tag, a glutathione-S-transferase (GST) tag, a thioredoxin (TRX) tag, a poly(NANP) tag, a V5 tag, a HA tag, a Spot tag, a T7 tag, a NE tag, a fluorescence tag, a Green Fluorescent Protein (GFP) tag, and a MYC tag.
  • the tag is a fluorescent protein such as Green fluorescent protein (GFP).
  • reporter molecule which is used to indicate the presence of a molecule to which it is conjugated (for example, a gRNA, a nucleic acid molecule, a protein, or a Cas13d), is readily known by one of skill in the art.
  • the reporter molecule may be a tag or a nucleic acid molecule encoding a tag.
  • the reporter molecule may be an enzyme or a nucleic acid molecule expressing the enzyme, such as an E. coli lacZ enzyme, or a chloramphenicol acetyltransferase (CAT), or a luciferase.
  • the term “selectable marker” refers to a molecule, a peptide or polypeptide whose presence can be readily detected in a target cell when selective pressure is applied to the cell.
  • the selectable marker is a puromycin resistance gene, a kanamycin resistance gene, a chloramphenicol resistance gene, a blasticidin S resistance gene, a geneticin resistance gene, a hygromycin resistance gene, an ampicillin resistance gene, a tetracycline resistance gene, or a G418 resistance gene.
  • target cell may refer to any cell of interest.
  • a target cell may refer to a cell having a target RNA or suspected of having a target RNA.
  • the term “target cell” refers to a cell of various mammalian species.
  • the target cell is a mammalian cell.
  • the target cell might be a eukaryotic cell, a prokaryotic cell, an embryonic stem cell, a cancer cell, a neuronal cell, an epithelial cell, an immune cell, an endocrine cell, a muscle cell, an erythrocyte, or a lymphocyte.
  • mammal or grammatical variations thereof, are intended to encompass a singular “mammal” and plural “mammals,” and includes, but is not limited to, humans; primates such as apes, monkeys, orangutans, and chimpanzees; canids such as dogs and wolves; felids such as cats, lions, and tigers; equids such as horses, donkeys, and zebras; food animals such as cows, pigs, and sheep; ungulates such as deer and giraffes; rodents such as mice, rats, hamsters and guinea pigs; wild animals, such as bears, domesticated animals, livestock and laboratory animals.
  • a mammal is a human.
  • subject includes any mammal in need of these methods or compositions, including particularly humans. The subject may be male or female.
  • the terms “therapy”, “treatment” and any grammatical variations thereof shall mean any of prevention, delay of outbreak, reducing the severity of the disease symptoms, and/or removing the disease symptoms (to cure) in a subject in need.
  • a or “an” refers to one or more.
  • an expression cassette is understood to represent one or more such cassettes.
  • the terms “a” (or “an”), “one or more,” and “at least one” are used interchangeably herein.
  • the term “one or more” refers to any integer from one to the maximum including any integer therebetween.
  • the term “about” means a variability of plus or minus 10 % from the reference given, unless otherwise specified.
  • the terms “reduce” “decrease” “alleviate” “ameliorate” “improve” “delay” “earlier” “low” “high” “mitigate”, any grammatical variation thereof, or any similar terms indicating a change mean a variation of about 5 fold, about 2 fold, about 1 fold, about 90%, about 80%, about 70%, about 60%, about 50%, about 40%, about 30%, about 20%, about 10%, about 5% compared to a reference (e.g., a guide generated without using the disclosed methods, or a non-targeting control), unless otherwise specified.
  • any range as disclosed herein includes the endpoint and every number/nt/percentage/value therebetween, unless specified.
  • any embodiment listed with respect to a gRNA, a nucleic acid molecule, a vector, a library, a composition, any other component, a method, or a use may be combined with any other embodiments with respect to a gRNA, a nucleic acid molecule, a vector, a library, a composition, any other component, a method, or a use.
  • Technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs and by reference to published texts, which provide one skilled in the art with a general guide to many of the terms used in the present application. The definitions contained in this specification are provided for clarity in describing the components and compositions herein and are not intended to limit the claimed invention.
  • Transcriptome engineering in living cells with RNA-targeting CRISPR effectors is empowering a diverse and expanding set of applications, including transcript knockdown, editing, post-transcriptional modification, labeling, isoform targeting, affinity proteomics, and high-throughput perturbations.
  • all of these applications depend on accurate prediction of on-target activity and off-target avoidance.
  • mismatches and indels have a position- and context-dependent impact on Cas13d activity: single nucleotide mismatches at the center of a sensitive seed region lead to full gRNA inactivation, while extended mismatches at gRNA ends are largely tolerated, and thus are the main reason for unexpected off-target activity.
  • TIGER: Targeted Inhibition of Gene Expression via gRNA design
  • Example 1 Methods Cell culture
  • CRISPR Cas13d gRNA libraries: 1) A 120,000 gRNA library tiling 16 essential genes with perfect match, mismatch, and indel gRNAs (Supplementary Data 1-2). 2) A 42,326 gRNA on-target library designed using our TIGERcombined model targeting 5,166 target genes with 8 PM gRNAs (Supplementary Data 5). 3) A 48,608 gRNA titration library targeting 1,082 essential genes with four perfect match and ten single mismatch gRNAs (Supplementary Data 10).
  • NT non-targeting
  • the pooled gRNA libraries were synthesized as single-stranded oligonucleotides (Twist Biosciences) and then PCR amplified in 1 reaction per 10,000 gRNAs with a 50 µl reaction volume: 0.5 µl Q5 polymerase (NEB), 10 µl 5x reaction buffer, 2 µl oligo pool (1 ng/µl), 2.5 µl of each fwd and rev primer (10 µM), 2.5 µl dNTPs (10 mM), 30 µl H2O.
  • Thermocycling conditions for the PCRs were 98C/30s, 8x or 9x[98C/10s, 63C/10s, 72C/15s], 72C/3min.
  • the PCR product was gel-purified or purified using Zymo Clean and Concentrator 25 kit and Gibson-cloned into BsmBI-digested pLentiRfxGuide-Puro (Addgene #138151) using each time eight Gibson reactions with a 20 µl reaction volume: 500 ng digested plasmid (0.088 pmol), 123.15 ng purified oligo pool (1.3245 pmol, 15:1 molar ratio), 10 µl 2x Gibson Assembly Master Mix (NEB), incubated for 1 hour at 50°C. Each gRNA was represented with >200 colonies. Complete library representation with minimal bias (90th percentile/10th percentile gRNA read ratios of ⁇ 2.5 for all libraries) was verified by Illumina sequencing (MiSeq).
  • Lentivirus was produced via transfection of library plasmid pool and appropriate packaging plasmids (psPAX2: Addgene 12260; pMD2.G: Addgene 12259) using linear polyethylenimine MW25000 (Polysciences 23966). Specifically, we seeded ten million HEK293FT cells in 10 cm dishes, transfected the cells the next evening (per dish: 60 µl PEI, 9.2 µg plasmid pool, 6.4 µg psPAX2, 4.4 µg pMD2.G) and changed the medium the morning after. At 3 days post-transfection, viral supernatant was collected and passed through a 0.45 µm filter and stored at -80°C until use.
  • Doxycycline-inducible RfxCas13d-NLS HEK293FT and HAP1 cells were transduced with the pooled library lentivirus in separate infection replicates ensuring at least 1000x guide representation in the selected cell pool per infection replicate using a standard spinfection protocol.
  • RfxCas13d expression was induced by addition of 1 µg/ml doxycycline (Sigma D9891) upon complete puromycin selection at the time of input sample collection (Day 0). Cells were passaged every two to three days (maintaining full representation) and supplemented with fresh doxycycline.
  • genomic DNA at least 1000 cells per construct representation from each sample at day 0, day 15 and day 30.
  • For the TIGER on-target and titration screens, we collected samples at day 0, day 7 and day 14.
  • We used a two-step PCR protocol (PCR1 and PCR2) to amplify the guide RNA cassette for Illumina sequencing from genomic DNA (gDNA).
  • the gDNA was extracted from screen cells using the following protocol 56 : Per 100 million cells, 12 mL of NK Lysis Buffer (50 mM Tris, 50 mM EDTA, 1% SDS, pH 8) were used for cell lysis. Once cells were resuspended, 60 mL of 20 mg/ml Proteinase K (Qiagen) was added and the sample was incubated at 55 °C overnight. The next day, 60 mL of 20 mg/ mL RNase A (Qiagen) was added, mixed, and samples were incubated at 37°C for 30 min.
  • NK Lysis Buffer 50 mM Tris, 50 mM EDTA, 1% SDS, pH 8
  • For the PCR1 reaction, we used 960 µg (screen 1) or 880 µg (screens 2 and 3) gDNA for each sample. Per sample, we performed 96 (screen 1) or 88 (screens 2 and 3) PCR1 reactions with a 100 µl reaction volume: 10 µl 10x Taq buffer, 0.02 U/µl Taq-B enzyme (Enzymatics #P7250L), 0.2 mM dNTPs, 0.2 µM forward and reverse primers and 100 ng gDNA/µl; PCR1 thermocycling conditions were: 94°C/30s, 10x (screen 1) or 18x (screens 2 and 3) [94°C/10s, 55°C/30s, 68°C/45s], 68°C/3min. Since our screen 1 library contained a large number of gRNAs with a Hamming distance of one to one another, we decided to only perform 10 cycles of PCR1. For each sample, all PCR1 products were pool
  • PCR2 thermocycling conditions were: 98°C/30s, 18x (screen 1) or 8x (screens 2 and 3) [98°C/10s, 63°C/30s, 72°C/45s], 72°C/5min.
  • For screen 1, we performed an additional PCR2 on the linearized plasmid pool sample with either Q5 or Taq-B polymerase. We found raw counts to be highly correlated with no obvious influence due to the choice of polymerase. PCR primers can be found in Supplementary Data 14.
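The gDNA input stated for PCR1 can be checked against the reaction set-up with simple arithmetic. The sketch below takes the stated 100 ng of gDNA per µl and a 100 µl reaction volume at face value; the function name and its default arguments are illustrative and not part of the disclosure.

```python
def total_gdna_ug(n_reactions, reaction_volume_ul=100.0, gdna_ng_per_ul=100.0):
    """Total gDNA (in micrograms) consumed across all PCR1 reactions of one sample."""
    total_ng = n_reactions * reaction_volume_ul * gdna_ng_per_ul
    return total_ng / 1000.0  # ng -> ug

# 96 reactions for screen 1, 88 reactions for screens 2 and 3 (values from the text)
screen1_ug = total_gdna_ug(96)
screens23_ug = total_gdna_ug(88)
print(screen1_ug, screens23_ug)
```

Under these assumptions the totals agree with the 960 µg and 880 µg per sample given for the screens.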
  • PCR2 products were pooled, followed by normalization (gel-based band densitometry quantification), before combining equal amounts of uniquely-barcoded samples.
  • the pooled product was then purified using SPRI beads.
  • First, we performed a 0.6x vol/vol SPRI to remove gDNA carryover, followed by addition of a 0.3x vol/vol SPRI (0.6 + 0.3 = 0.9x final) to the supernatant to purify the ~260 bp PCR product.
  • Oligonucleotides can be found in Supplementary Data 14.
  • the final amplicons were sequenced on Illumina NextSeq 500 - II MidOutput 1x150 v2.5 (screen 1) and Illumina NextSeq 500 - II HighOutput 1x150 v2.5 (screens 2 and 3).
  • Reads were first demultiplexed based on Illumina i7 barcodes present in PCR2 reverse primers using bcl2fastq and then by their custom in-read i5 barcode allowing for 1 mismatch. Reads were trimmed to the expected gRNA length by searching for known anchor sequences relative to the guide sequence. Reads were collapsed (FASTX-Toolkit) to count perfect duplicates followed by exact string-match intersection with the reference to retain only perfectly matching and unique alignments. The raw guide RNA counts (Supplementary Data 3, 6 and 11) were normalized using a median of ratios method 57 and then batch corrected for biological replicates using ComBat implemented in the SVA R package 58.
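The median-of-ratios normalization mentioned above can be illustrated with a short NumPy sketch. This is a generic version of the technique, not the authors' code: the function name, the handling of zero-count guides, and the toy count matrix are assumptions, and the ComBat batch correction step is not shown.

```python
import numpy as np

def median_of_ratios(counts):
    """Normalize a guides x samples count matrix by per-sample size factors.

    Each size factor is the median, over guides, of the ratio between a
    sample's count and the guide's geometric mean across samples.
    Guides with a zero count in any sample are excluded from the estimate
    (an assumption of this sketch).
    """
    counts = np.asarray(counts, dtype=float)
    usable = (counts > 0).all(axis=1)
    log_counts = np.log(counts[usable])
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    size_factors = np.exp(np.median(log_counts - log_geo_mean, axis=0))
    return counts / size_factors, size_factors

# Toy example: sample 2 was sequenced twice as deeply as sample 1
raw = np.array([[100, 200], [50, 100], [10, 20], [0, 4]])
normalized, size_factors = median_of_ratios(raw)
print(size_factors)
```

After normalization, guides that differ only by sequencing depth have equal values in both samples.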
  • Guide RNA enrichments (Supplementary Data 4, 7 and 12) were calculated building the count ratios between a timepoint and the corresponding input (Day 0) sample for each replicate followed by log2-transformation (log2FC). Consistency between replicates was estimated using Pearson correlations and robust rank aggregation (RRA) 60. For data representation and modeling, we used the mean log2FC across replicates. Delta log2FC for mismatching guides was calculated by subtracting the log2FC of the permuted gRNA from the PM reference guide. For data representations in FIG. 1 and FIG.
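The enrichment calculation described above can be sketched as follows. The toy counts are invented for illustration, no pseudocount is applied (the text does not mention one), and the sign of the delta follows a literal reading of the sentence above (PM value minus variant value); all names are hypothetical.

```python
import numpy as np

def log2_fold_change(timepoint_counts, input_counts):
    """log2 of the normalized count ratio between a timepoint and the Day 0 input."""
    timepoint = np.asarray(timepoint_counts, dtype=float)
    day0 = np.asarray(input_counts, dtype=float)
    return np.log2(timepoint / day0)

# Rows are replicates, columns are guides: [PM guide, mismatch variant]
day0_counts = np.array([[400.0, 400.0], [800.0, 800.0]])
late_counts = np.array([[100.0, 200.0], [200.0, 400.0]])

per_replicate = log2_fold_change(late_counts, day0_counts)
mean_log2fc = per_replicate.mean(axis=0)      # mean across replicates
delta_log2fc = mean_log2fc[0] - mean_log2fc[1]  # PM minus variant
print(mean_log2fc, delta_log2fc)
```

In this toy case the PM guide is depleted four-fold and the variant two-fold, so the variant retains less activity than its PM reference.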
  • gRNA secondary structure and minimum free energies was derived using RNAfold [-gquad] on the full-length gRNA (DR + guide) sequence 61 .
  • Target RNA pairing probability (accessibility) was calculated using RNAplfold [-L 40 -W 80 -u 50] as described before 7 . These parameters specify a moving window of 80 nucleotides and a maximal base pairing span of up to 40 nucleotides. We chose these parameters because prior studies 3,62,63 both in the context of Cas 13 and RNA interference have found optimal performance for a local window around the target site.
  • Target RNA accessibility features: 1. Position -11 upstream to the first spacer nucleotide with width 23 nt; 2. Position -11 with width 4 nt; 3. Position -19 with width 4 nt; 4. Position -25 with width 4 nt.
  • RNA-RNA-hybridization between the guide RNA (PM and gRNA with nucleotide substitutions, but not for indel gRNAs) and its target site was calculated using RNAhybrid [-s - c] 64 .
  • the nucleotide context of each point was then correlated with the observed log2FC gRNA enrichments for all perfect match gRNAs, either directly or using partial correlation accounting for gRNA folding MFE. In each case we used Pearson correlation.
  • We used the distribution of non-targeting gRNAs to determine which gRNAs targeting essential genes to consider as being active. We selected the most depleted 1% of non-targeting gRNAs as the threshold for activity after testing for normality of the non-targeting gRNA distribution (Lilliefors test, p ⁇ 0.001). This threshold corresponds to a log2FC < -0.50 for screen 1 (HEK293FT), log2FC < -0.44 and log2FC < -0.29 for screen 2 in HEK293FT and HAP1, respectively.
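The activity call described above amounts to taking the first percentile of the non-targeting log2FC distribution as a cutoff. A minimal sketch, assuming simulated non-targeting values (not real screen data) and hypothetical function names; the normality test is not reproduced here.

```python
import numpy as np

def activity_threshold(nontargeting_log2fc, percentile=1.0):
    """Cutoff below which only the most depleted `percentile` % of non-targeting gRNAs fall."""
    return float(np.percentile(nontargeting_log2fc, percentile))

def is_active(log2fc, threshold):
    """A targeting gRNA is called active if it is more depleted than the cutoff."""
    return np.asarray(log2fc) < threshold

rng = np.random.default_rng(0)
nontargeting = rng.normal(loc=0.0, scale=0.2, size=10_000)  # simulated values

threshold = activity_threshold(nontargeting)
calls = is_active([-1.2, -0.1, 0.3], threshold)
print(round(threshold, 2), calls)
```

Because the cutoff is derived from each screen's own non-targeting guides, it differs between screens and cell lines, as the thresholds quoted above show.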
  • the sequence input to the convolutional neural network consists of 23 nt target and gRNA sequences with 2 nt of upstream and downstream target context.
  • the input initially is processed with two consecutive convolution layers each with 32 4x4 kernels followed by a rectified linear unit (ReLU).
  • the first two dense (hidden) layers apply sigmoid activations.
  • the single output neuron does not apply any activation function. This design is similar to prior deep learning models in computer vision 38 and sequence analysis 26 .
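One way to build the sequence input described above is to one-hot encode the 27 nt target window (23 nt site plus 2 nt of context on each side) and the 23 nt guide, and stack them position by position. The tensor layout below (27 positions by 8 channels, guide zero-padded at the context positions, DNA alphabet, guide supplied in target-aligned orientation) is an assumption of this sketch; the text does not specify it, and all names are hypothetical.

```python
import numpy as np

NUCLEOTIDES = "ACGT"

def one_hot(seq):
    """Encode a nucleotide string as a (length, 4) array; unknown bases stay all-zero."""
    encoded = np.zeros((len(seq), 4), dtype=np.float32)
    for i, nt in enumerate(seq.upper()):
        if nt in NUCLEOTIDES:
            encoded[i, NUCLEOTIDES.index(nt)] = 1.0
    return encoded

def encode_pair(target_with_context, guide_aligned, context=2):
    """Stack target (with flanking context) and guide encodings into (27, 8)."""
    if len(target_with_context) != len(guide_aligned) + 2 * context:
        raise ValueError("target must be the guide length plus context on both sides")
    target = one_hot(target_with_context)
    guide = np.zeros_like(target)
    guide[context:context + len(guide_aligned)] = one_hot(guide_aligned)
    return np.concatenate([target, guide], axis=1)

site = "ACGTACGTACGTACGTACGTACG"          # 23 nt, arbitrary example
x = encode_pair("GA" + site + "TT", site)
print(x.shape)
```

An array of this shape could then be passed to the convolution layers; a mismatch guide would differ from the target channels at the mismatched positions.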
  • biopython version 1.79
  • pandas version 1.4.5
  • tensorflow-probability version 0.16.0
  • matplotlib version 3.5.1
  • seaborn version 0.11.2
  • shap version 0.41.0
  • statsmodels version 0.12.2
  • gRNA-level CV gRNA-level CV
  • related gRNAs such as mismatch gRNAs (for a particular PM gRNA in the holdout group) may still be included in the training set.
  • we observe better performance with gRNA-level CV than target- or gene-level CV.
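The difference between the cross-validation levels comes down to which records are forced into the same fold. The sketch below is a generic grouped split, not the authors' implementation; the record fields (`guide`, `target`, `gene`), fold count, and toy guide names are hypothetical.

```python
import random

def grouped_folds(records, key, n_folds=3, seed=0):
    """Assign each record to a fold so that records sharing `key` stay together."""
    groups = sorted({rec[key] for rec in records})
    random.Random(seed).shuffle(groups)
    fold_of_group = {group: i % n_folds for i, group in enumerate(groups)}
    return [fold_of_group[rec[key]] for rec in records]

def split_groups(records, folds, key):
    """Return the values of `key` whose records ended up in more than one fold."""
    seen = {}
    for rec, fold in zip(records, folds):
        seen.setdefault(rec[key], set()).add(fold)
    return {group for group, used in seen.items() if len(used) > 1}

# Three genes, two target sites each, one PM guide and two mismatch variants per site
records = [
    {"guide": f"{gene}_site{s}_{variant}", "target": f"{gene}_site{s}", "gene": gene}
    for gene in ("geneA", "geneB", "geneC")
    for s in (1, 2)
    for variant in ("PM", "SM1", "SM2")
]

for level in ("guide", "target", "gene"):
    folds = grouped_folds(records, key=level)
    n_split = len(split_groups(records, folds, "target"))
    print(level, "-> target sites split across folds:", n_split)
```

With guide-level grouping, a PM guide and its mismatch variants can land in different folds, which is the leakage noted above and a plausible reason for the higher apparent performance.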
  • BiGRU Bidirectional Gated Recurrent Unit
  • BiGRU architecture to closely mimic the dense layers of TIGER.
  • the BiGRU model uses a convolution kernel of length one to learn a 32-dimensional embedding from the one-hot-encoded target and gRNA sequences (16 unique 32-dimensional embeddings for each possible guide-target pair). This embedding feeds a bidirectional gated recurrent unit (GRU) layer, which outputs a 32-dimensional representation for each sequence position.
  • Example 2 A massively-parallel RfxCas13d screen for perfect-match and variant guide RNAs
  • We designed the mismatch gRNAs to contain 1, 2 or 3 nucleotide mismatches and the indel gRNAs to contain 1 or 2 nucleotide indels. For both mismatches and indels, we designed separate groups of gRNAs with adjacent placement of mismatches/indels or random spacing of the mismatches/indels.
  • Example 3 Indels are more deleterious than base substitutions for Cas13d gRNAs
  • For 600 perfect-match (PM) gRNAs predicted to have high activity by RFon (quartile Q3 or Q4), we designed 108,600 gRNA variants (18,100 per gene). These variant gRNAs include 83,400 gRNAs with single, double, or triple base substitutions. We also included 25,200 gRNAs containing single or double insertions or deletions (indels) (FIG. 1A, FIG. 1B). We found 66.1% of PM gRNAs to be active (log2 fold-change < -0.5, FDR < 0.01) (FIG. 2A) (see Methods).
  • Example 4 A deep learning model to predict guide RNA efficacy
  • Similar to a previous CNN for Cas9 off-target prediction 26, our model has two convolution layers followed by a max-pooling layer and interleaves three dropout and dense layers for a total of two hidden layers plus an output layer. Our model has two architectural augmentations beyond those used in prior work: additional sequence context flanking the 23nt target site and the flexibility to input a vector of non-sequence features at our first dense layer. We considered six groups of non-sequence features: 1) gRNA folding minimum free energy (MFE), 2) the RNA-RNA hybridization MFE between spacer and target site (multiple positions), 3) target accessibility (i.e.
  • MFE gRNA folding minimum free energy
  • Equation (1) defines the variable $y_{gi}$, which is the normalized LFC.
  • $m_g$ is the mean of variable $g$ across all samples in the dataset.
  • $s_g$ is the standard deviation of variable $g$ across all samples in the dataset.
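Equation (1) itself is not reproduced in this excerpt. If it is the usual standardization implied by the definitions of the mean and standard deviation above, it would plausibly take the following form, where $x_{gi}$ is a hypothetical symbol for the unnormalized LFC of variable $g$ in sample $i$:

```latex
y_{gi} = \frac{x_{gi} - m_g}{s_g}
```

Under this assumption, each variable $g$ has zero mean and unit standard deviation across the dataset after normalization.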
  • The model uses guide sequence, target sequence, and additional non-sequence features (FIG. 5).
  • the sequence input to the convolutional neural network (CNN) consists of 23 nt target and gRNA sequences with 2 nt of upstream and downstream target context.
  • Our neural network layers are:
  • FIG. 6 summarizes these layers graphically.
  • PM perfect matched
  • SM singly mismatched
  • RNA-RNA hybridization (as in target site accessibility, gRNA-RNA hybridization MFE, and gRNA folding MFE) had the largest contributions to model predictions, consistent with our earlier findings (FIG. 3C).
  • RNA-targeting RfxCas13d CRISPR screens could discriminate essential genes from non-essential genes.
  • Among the 5,166 target genes included in these screens, we embedded a set of 1,082 common essential genes and 458 non-essential genes based on DepMap classifications (see Methods).
  • TIGERcombined model could successfully discriminate essential genes from control genes (FIG. 3I; AUROC 0.86 and 0.95 in HEK293FT and HAP1 cells, respectively).
  • Gene depletion in HAP1 cells was generally more pronounced compared to that in HEK293FT cells (Supplementary Fig. 5a).
  • HAP1 cells showed higher sensitivity (AUROC) for the identification of essential genes (FIG. 3I).
  • AUROC sensitivity
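For readers unfamiliar with the metric, AUROC (area under the receiver operating characteristic curve) summarizes how well a score separates two classes across all thresholds, which is broader than sensitivity at a single cutoff. The sketch below computes it as the fraction of essential/non-essential gene pairs that are ranked correctly. The gene-level scores are invented for illustration and are not data from the screens.

```python
def auroc(scores_positive, scores_negative):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties count as one half, which makes this equal to the Mann-Whitney U
    statistic divided by the number of positive-negative pairs.
    """
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical gene-level depletion scores (larger = stronger depletion).
essential = [1.8, 1.2, 0.9, 0.4]
non_essential = [0.5, 0.1, -0.2]
result = auroc(essential, non_essential)  # 11 of 12 pairs ranked correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported 0.86 and 0.95 should be read.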
  • Example 7 Predicting off-target performance and titrating gene knockdown effects
  • TIGER can predict a PM gRNA’s efficacy and how this efficacy changes when mismatches are introduced
  • The efficacy ratio is the ratio of the fold-change of a single-mismatch gRNA to the fold-change of its PM cognate gRNA.
  • target site-level CV: We compared predicted and observed relative gRNA activity for the 23 single-mismatch gRNA variants designed for each individual target site.
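The efficacy ratio defined above reduces to a single division per variant. The following sketch uses hypothetical values treated as log2 fold-changes; the excerpt does not state whether the ratio is taken on log-scale or linear fold-changes, and the function name is illustrative.

```python
def efficacy_ratio(fc_mismatch, fc_perfect_match):
    """Ratio of a single-mismatch gRNA's fold-change to that of its PM cognate."""
    if fc_perfect_match == 0:
        raise ValueError("perfect-match fold-change must be non-zero")
    return fc_mismatch / fc_perfect_match


# Hypothetical values: the PM guide depletes strongly, the variants less so.
pm_fold_change = -2.0
variant_fold_changes = [-2.0, -1.0, -0.5]
ratios = [efficacy_ratio(fc, pm_fold_change) for fc in variant_fold_changes]
```

Under this reading, a ratio near 1 means the mismatch leaves activity largely intact, while a ratio near 0 means the mismatch abolishes most of the knockdown.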
  • Example 8 Validation of TIGER model
  • HDAC1 and HDAC2 synthetic lethality titration screen.
  • Each target site in HDAC1 was paired with each target site in HDAC2 in a dual gRNA array.
  • Because both genes are highly expressed (147 TPM for HDAC1 and 130 TPM for HDAC2), targeting either HDAC1 or HDAC2 alone should decrease cell fitness if unintended collateral activity occurs. Consistent with our previous data, we found that targeting only one gene, HDAC1 or HDAC2, from this synthetic lethal gene pair does not cause significant gRNA depletion in our lentiviral, low-copy RfxCas13d cell lines (FIG. 10B). As expected, targeting both genes in the same cell resulted in gRNA depletion. Further, we found that the level of depletion from the screen for all titration pairs correlates with the predicted activity for partial knockdown gRNAs (data not shown).
  • CRISPR-Net: A Recurrent Convolutional Network Quantifies CRISPR Off-Target Activities with Mismatches and Indels. Adv. Sci. 7, 1-17 (2020).

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Biomedical Technology (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Organic Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Chemical & Material Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Plant Pathology (AREA)
  • Biochemistry (AREA)
  • Microbiology (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Cell Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Mycology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Public Health (AREA)
  • Evolutionary Computation (AREA)
  • Epidemiology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioethics (AREA)
  • Artificial Intelligence (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to a class 2, type VI clustered regularly interspaced short palindromic repeats (CRISPR) guide RNA (gRNA) that comprises a direct repeat (DR) stem-loop sequence and a guide or spacer sequence. The invention also relates to methods for generating, selecting, characterizing, and optimizing a clustered regularly interspaced short palindromic repeats (CRISPR) guide RNA (gRNA) for use in the CRISPR-Cas13d system described herein. The invention further relates to a screening method for identifying a gRNA that is particularly suitable for use with specified targets.
PCT/US2024/033813 2023-06-13 2024-06-13 Deep learning framework for predicting on-target and off-target activity of CRISPR guide RNAs Pending WO2024259103A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363507870P 2023-06-13 2023-06-13
US63/507,870 2023-06-13

Publications (1)

Publication Number Publication Date
WO2024259103A1 true WO2024259103A1 (fr) 2024-12-19

Family

ID=93852649

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/033813 Pending WO2024259103A1 (fr) 2023-06-13 2024-06-13 Deep learning framework for predicting on-target and off-target activity of CRISPR guide RNAs

Country Status (1)

Country Link
WO (1) WO2024259103A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210308171A1 (en) * 2018-08-07 2021-10-07 The Broad Institute, Inc. Methods for combinatorial screening and use of therapeutic targets thereof
US20220001030A1 (en) * 2018-10-15 2022-01-06 Fondazione Telethon Genome editing methods and constructs
US20220259593A1 (en) * 2019-07-26 2022-08-18 The Regents Of The University Of California Control of mammalian gene dosage using crispr

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BODAI ZSOLT, BISHOP ALENA L., GANTZ VALENTINO M., KOMOR ALEXIS C.: "Targeting double-strand break indel byproducts with secondary guide RNAs improves Cas9 HDR-mediated genome editing efficiencies", NATURE COMMUNICATIONS, vol. 13, no. 1, XP093047691, DOI: 10.1038/s41467-022-29989-9 *
WESSELS HANS-HERMANN; MÉNDEZ-MANCILLA ALEJANDRO; GUO XINYI; LEGUT MATEUSZ; DANILOSKI ZHARKO; SANJANA NEVILLE E.: "Massively parallel Cas13 screens reveal principles for guide RNA design", NATURE BIOTECHNOLOGY, vol. 38, no. 6, 16 March 2020 (2020-03-16), New York, pages 722 - 727, XP037167639, ISSN: 1087-0156, DOI: 10.1038/s41587-020-0456-9 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
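CN119724349A (zh) * 2025-02-28 2025-03-28 电子科技大学长三角研究院(衢州) Method and system for predicting RNA G-quadruplexes based on a pre-trained model and RNA secondary structure
CN120412824A (zh) * 2025-07-03 2025-08-01 电子科技大学(深圳)高等研究院 Method for improving the accuracy of small-molecule hydration free energy prediction
CN120412824B (zh) * 2025-07-03 2025-09-09 电子科技大学(深圳)高等研究院 Method for improving the accuracy of small-molecule hydration free energy prediction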

Similar Documents

Publication Publication Date Title
Wessels et al. Prediction of on-target and off-target activity of CRISPR–Cas13d guide RNAs using deep learning
Shi et al. The ZSWIM8 ubiquitin ligase mediates target-directed microRNA degradation
US11913017B2 (en) Efficient genetic screening method
Meunier et al. Birth and expression evolution of mammalian microRNA genes
Guo et al. Small RNAs originated from pseudogenes: cis-or trans-acting?
Lehnert et al. Evidence for co-evolution between human microRNAs and Alu-repeats
US20230022311A1 (en) Methods and compositions involving crispr class 2, type vi guides
Zheng et al. Genome-wide double-stranded RNA sequencing reveals the functional significance of base-paired RNAs in Arabidopsis
WO2024259103A1 (fr) Deep learning framework for predicting on-target and off-target activity of CRISPR guide RNAs
Mamdani et al. Integrating mRNA and miRNA weighted gene co-expression networks with eQTLs in the nucleus accumbens of subjects with alcohol dependence
WO2019113499A1 (fr) High-throughput methods for identifying gene interactions and networks
EP2705152B1 (fr) Multiplexed reporter gene compositions and assays
Sun et al. Transcriptome exploration in Leymus chinensis under saline-alkaline treatment using 454 pyrosequencing
Shin et al. Rare variation in non-coding regions with evolutionary signatures contributes to autism spectrum disorder risk
CN114206376A (zh) Novel CRISPR DNA-targeting enzymes and systems
Garren et al. Global analysis of mouse polyomavirus infection reveals dynamic regulation of viral and host gene expression and promiscuous viral RNA editing
Cora et al. Ab initio identification of putative human transcription factor binding sites by comparative genomics
Zhang et al. Expression profile analysis of circular RNAs in BmN cells (Bombyx mori) upon BmNPV infection
KR102412631B1 (ko) System for predicting the base editing efficiency and outcomes of base editors
Márquez-Molins et al. Multiomic analisys reveals that viroid infection induces a temporal reprograming of plant-defence mechanisms at multiple regulatory levels
US20220290132A1 (en) Engineered CRISPR/Cas9 Systems for Simultaneous Long-term Regulation of Multiple Targets
Wahl et al. Evaluation of the chicken transcriptome by SAGE of B cells and the DT40 cell line
CN102203281B (zh) Replicate barcode screening assay
Kim et al. Single cell CRISPR base editor engineering and transcriptional characterization of cancer mutations
Kanuparthi et al. Sequence based prediction of cell type specific microRNA binding and mRNA degradation for therapeutic discovery

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 24824154

Country of ref document: EP

Kind code of ref document: A1