WO2022271725A1 - Detecting crispr genome modification on a cell-by-cell basis - Google Patents

Publication number
WO2022271725A1
WO2022271725A1 PCT/US2022/034376 US2022034376W WO2022271725A1 WO 2022271725 A1 WO2022271725 A1 WO 2022271725A1 US 2022034376 W US2022034376 W US 2022034376W WO 2022271725 A1 WO2022271725 A1 WO 2022271725A1
Authority
WO
WIPO (PCT)
Prior art keywords
cells
sequencing
sets
genetically modified
basis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2022/034376
Other languages
French (fr)
Inventor
Hanlee P. Ji
Heonseok KIM
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Leland Stanford Junior University
Original Assignee
Leland Stanford Junior University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1065 Preparation or screening of tagged libraries, e.g. tagged microorganisms by STM-mutagenesis, tagged polynucleotides, gene tags
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B40/00 Libraries per se, e.g. arrays, mixtures
    • C40B40/02 Libraries contained in or displayed by microorganisms, e.g. bacteria or animal cells; Libraries contained in or displayed by vectors, e.g. plasmids; Libraries containing only microorganisms or vectors
    • C40B40/04 Libraries containing only organic compounds
    • C40B40/06 Libraries containing nucleotides or polynucleotides, or derivatives thereof

Definitions

  • Various molecular tools could be designed to produce a desired genomic modification. Different molecular tools that can produce a desired genetic modification may exhibit different efficacy of achieving the desired genetic modification. Moreover, different molecular tools may exhibit different effects on the expression of the non-target genes and on the global gene expression in the genetically modified cell.
  • Provided herein is an assay that can be used to confirm that a molecular tool, such as CRISPR-Cas9, that is designed to produce a genetic modification actually produces the desired genetic modification, particularly on a cell-by-cell basis.
  • An assay is also provided for screening molecular tools that are designed for a genetic modification, particularly, on a cell-by-cell basis, to identify the molecular tool that produces a desired genomic modification. Further, an assay is provided for screening the effects of a molecular tool that is designed to generate a genetic modification on the global gene expression profile of the genetically modified cells.
  • A kit is also provided that can be used for performing the method disclosed herein.
  • FIG. 1 Schematic representation of exemplary embodiment of the disclosure.
  • (1) CRISPR-based genome editing to introduce changes into a gene’s sequence; (2) long-read sequencing to characterize the CRISPR-based alterations based on changes in the mRNA sequence; (3) cDNA barcoding to determine which cell or cell population has the CRISPR edit; (4) linkage of the CRISPR edit observed in the long-read sequence data with the short-read sequencing from the same cell or set of cells with the aforementioned CRISPR edit.
  • Long-read sequencing encompasses read lengths greater than 500-600 bases.
  • Short read sequences are defined as read lengths less than 500-600 bases.
  • FIGS. 2A-2E (A) Overview of single-cell short/long-read integration strategy.
  • FIGS. 3A-3C (A) Overview of single-cell CRISPR screen integrated with long-read sequencing. (B) Boxplot showing the ratio of short PTPRC transcript isoform (RO and RB) for cells with guide RNAs targeting indicated genes. P values are calculated in comparison with the nontarget cells. Genes which have less than 3 cells with target guide RNAs are not shown. (C) Heatmap showing proportion of each transcript isoform (x-axis) with each cell (y-axis) and clustering based on transcript isoform proportion for cells having indicated guide RNA sequence.
  • FIGS. 4A-4C (A) Overview of how splicing factors affect alternative splicing. (B) Quantification of short transcript isoform per target gene. For each gene (x-axis), cells with guide RNAs targeting the gene were grouped, and the ratio of transcript isoforms RO and RB among all PTPRC isoforms is shown as a box plot. (C)
  • FIG. 5 illustrates a process of using a base editor to introduce engineered gene mutations into single cells.
  • FIG. 6 illustrates a multiplexed sequencing approach to identify mutations from single cell RNA-seq.
  • SPEX refers to single cell prime extension (SPEX).
  • FIG. 7 illustrates single-cell level detection of CRISPR induced TP53 mutations and their effect on single cell expression.
  • polynucleotide and “nucleic acid,” used interchangeably herein, refer to a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. Thus, this term includes, but is not limited to, single-, double-, or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or a polymer comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases.
  • hybridizable or “complementary” or “substantially complementary” it is meant that a nucleic acid (e.g. RNA, DNA) comprises a sequence of nucleotides that enables it to non-covalently bind, i.e. form Watson-Crick base pairs and/or G/U base pairs, “anneal”, or “hybridize,” to another nucleic acid in a sequence-specific, antiparallel, manner (i.e., a nucleic acid specifically binds to a complementary nucleic acid) under the appropriate in vitro and/or in vivo conditions of temperature and solution ionic strength.
  • Standard Watson-Crick base-pairing includes: adenine (A) pairing with thymidine (T), adenine (A) pairing with uracil (U), and guanine (G) pairing with cytosine (C) [DNA, RNA]
  • guanine (G) can also base pair with uracil (U).
  • G/U base-pairing is at least partially responsible for the degeneracy (i.e., redundancy) of the genetic code in the context of tRNA anti-codon base-pairing with codons in mRNA.
  • when a G/U base-pair can be made at a given nucleotide position (e.g., of a dsRNA duplex of a guide RNA molecule; of a guide RNA base pairing with a target nucleic acid, etc.), the position is not considered to be non-complementary, but is instead considered to be complementary.
  • sequence of a polynucleotide need not be 100% complementary to that of its target nucleic acid to be specifically hybridizable or hybridizable. Moreover, a polynucleotide may hybridize over one or more segments such that intervening or adjacent segments are not involved in the hybridization event (e.g., a bulge, a loop structure or hairpin structure, etc.).
  • a polynucleotide can comprise 60% or more, 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, or 100% sequence complementarity to a target region within the target nucleic acid sequence to which it will hybridize.
  • an antisense nucleic acid in which 18 of 20 nucleotides of the antisense compound are complementary to a target region, and would therefore specifically hybridize would represent 90 percent complementarity.
  • the remaining noncomplementary nucleotides may be clustered or interspersed with complementary nucleotides and need not be contiguous to each other or to complementary nucleotides.
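The percent-complementarity arithmetic above (18 of 20 paired positions giving 90 percent) can be illustrated with a minimal sketch. The function name `percent_complementarity` and the restriction to equal-length, gap-free sequences are assumptions made for illustration; they are not part of the disclosure, and real tools such as BLAST handle gaps and local alignment.

```python
# Watson-Crick pairs plus the optional G/U wobble pair described above.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"),
                ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def percent_complementarity(probe, target, allow_wobble=True):
    """Percent of probe positions that pair with the target, read antiparallel.

    Both sequences are written 5'->3' and must have equal length
    (illustrative simplification: no gaps, bulges, or loops).
    """
    if len(probe) != len(target):
        raise ValueError("sequences must have the same length")
    allowed = WATSON_CRICK | WOBBLE if allow_wobble else WATSON_CRICK
    # Antiparallel pairing: probe position i faces target position n-1-i.
    pairs = zip(probe.upper(), reversed(target.upper()))
    matched = sum(1 for pair in pairs if pair in allowed)
    return 100.0 * matched / len(probe)


# A 20-nt probe with two mismatched positions against a 20-nt target.
target = "ACGTACGTACGTACGTACGT"
probe = "CAGTACGTACGTACGTACGT"
print(percent_complementarity(probe, target))
```

Setting `allow_wobble=False` counts only standard Watson-Crick pairs, which makes it easy to see how much of a score depends on G/U pairing.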
  • Percent complementarity between particular stretches of nucleic acid sequences within nucleic acids can be determined using any convenient method. Example methods include BLAST programs (basic local alignment search tools) and PowerBLAST programs (Altschul et al., J. Mol.
  • Binding refers to a non-covalent interaction between macromolecules (e.g., between a protein and a nucleic acid; between a modified CRISPR/Cas effector polypeptide/guide RNA complex and a target nucleic acid; and the like).
  • While in a state of non-covalent interaction, the macromolecules are said to be “associated” or “interacting” or “binding” (e.g., when a molecule X is said to interact with a molecule Y, it is meant the molecule X binds to molecule Y in a non-covalent manner). Not all components of a binding interaction need be sequence-specific (e.g., contacts with phosphate residues in a DNA backbone), but some portions of a binding interaction may be sequence-specific.
  • Binding interactions are generally characterized by a dissociation constant (KD) of less than 10⁻⁶ M, less than 10⁻⁷ M, less than 10⁻⁸ M, less than 10⁻⁹ M, less than 10⁻¹⁰ M, less than 10⁻¹¹ M, less than 10⁻¹² M, less than 10⁻¹³ M, less than 10⁻¹⁴ M, or less than 10⁻¹⁵ M.
  • Affinity refers to the strength of binding, increased binding affinity being correlated with a lower KD.
  • a “cell” as used herein, denotes an in vivo or in vitro eukaryotic cell or a cell line.
  • a “binding site for a guide-RNA” as used herein is a polynucleotide (e.g., DNA such as genomic DNA) that includes a site ("target site” or "target sequence") targeted by a modified CRISPR/Cas effector polypeptide.
  • the target sequence is the sequence to which the guide sequence of a guide nucleic acid (e.g., guide RNA; e.g., a dual guide RNA or a single-molecule guide RNA) will hybridize.
  • the target site (or target sequence) 5'-GAGCAUAUC-3' within a target nucleic acid is targeted by (or is bound by, or hybridizes with, or is complementary to) the sequence 5’- -3’.
  • Suitable hybridization conditions include physiological conditions normally present in a cell.
  • the strand of the target nucleic acid that is complementary to and hybridizes with the guide RNA is referred to as the “complementary strand” or “target strand”; while the strand of the target nucleic acid that is complementary to the “target strand” (and is therefore not complementary to the guide RNA) is referred to as the “non-target strand” or “non complementary strand.”
  • long-read sequencing refers to sequencing read lengths greater than 500 bases, particularly, longer than 600 bases.
  • short read sequencing refers to sequencing read lengths less than 600 bases, particularly, less than 500 bases.
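The two read-length definitions above overlap between 500 and 600 bases. A minimal sketch of one way to apply them is shown below, using the stricter ("particularly") thresholds from each definition. The function name `classify_read` and the "intermediate" label for the 500-600 base range are assumptions made for illustration.

```python
def classify_read(length, short_max=500, long_min=600):
    """Label a read by length using the stricter thresholds stated above.

    Reads longer than `long_min` bases are "long", reads shorter than
    `short_max` bases are "short", and anything in between is labelled
    "intermediate" (an illustrative label, not a term from the disclosure).
    """
    if length < 0:
        raise ValueError("read length cannot be negative")
    if length > long_min:
        return "long"
    if length < short_max:
        return "short"
    return "intermediate"


for n in (150, 550, 2400):
    print(n, classify_read(n))
```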
  • the terms “may,” “optional,” “optionally,” or “may optionally” mean that the subsequently described circumstance may or may not occur, so that the description includes instances where the circumstance occurs and instances where it does not.
  • long read sequencing such as single molecule real time (SMRT) sequencing or nanopore sequencing
  • a cell’s long read sequencing can be combined with the cell’s short read transcriptome information (FIG. 2A).
  • an assay is provided herein that allows one to evaluate cells, on the basis of single cells or sets of cells, that are genetically modified, for example, via CRISPR-mediated genetic edit.
  • the assay disclosed herein allows: (1) confirming the genomic modification, for example, CRISPR edit, based on the target gene’s mRNA; (2) assigning a desired genetic modification, for example, CRISPR-based genomic edit, to an individual cell or set of cells; and (3) determining the effects of a genetic modification on cellular phenotypes such as global gene or protein expression.
  • Certain non-limiting examples of such molecular tools include: 1) incorporation of a genetic material into a targeted site in the genome, for example, via homologous recombination; 2) random incorporation of genetic material into a target chromosome; 3) introduction of random mutations in a target genetic material, for example, via exposure to mutagens.
  • More recent tools for introducing genetic modifications in a target genome include programmable nuclease-based genome editing.
  • Programmable nucleases such as zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and clustered regularly interspaced short palindromic repeat (CRISPR)-Cas-associated nucleases, provide targeted gene editing platforms.
  • Nuclease-based genetic modification involves targeted alterations in genomic regions based on nuclease-induced double-stranded breaks (DSBs) at a specific desired locus in the target genome.
  • DSBs lead to the production of damaged DNA and stimulation of the cell’s DNA repair mechanisms, such as homology-directed repair (HDR) and nonhomologous end-joining (NHEJ).
  • CRISPR-based genome editing is used to introduce genetic modifications.
  • the assay involves: CRISPR-based genome editing to introduce changes into a gene’s sequence and long-read sequencing to characterize the CRISPR-based alterations based on the changes in the mRNA sequence.
  • the long-read sequencing can involve cDNA barcoding to determine which cell or set of cells has the desired CRISPR edit.
  • the CRISPR edit observed in the long-read sequence data can be linked with the short-read sequencing from the same cell or set of cells to determine the effects of the genetic modification on the cell, for example, on global gene expression.
  • the CRISPR system suitable for use in the methods of the present disclosure can be CRISPR-Cas9.
  • a guide nucleic acid suitable for inclusion in a CRISPR-system used in the present disclosure can include: i) a first segment (referred to herein as a “targeting segment”); and ii) a second segment (referred to herein as a “protein-binding segment”).
  • a “segment” is a region of a molecule, e.g., a contiguous stretch of nucleotides in a nucleic acid molecule.
  • a segment can also be a section of a complex such that a segment may comprise regions of more than one molecule.
  • the “targeting segment” is also referred to herein as a “variable region” of a guide RNA.
  • the “protein-binding segment” is also referred to herein as a “constant region” of a guide RNA.
  • the first segment (targeting segment) of a guide RNA includes a nucleotide sequence (a guide sequence) that is complementary to (and therefore hybridizes with) a specific sequence (a target site) within a target nucleic acid (e.g., a target ssRNA, a target ssDNA, the complementary strand of a double stranded target DNA, etc.).
  • the protein-binding segment (or “protein-binding sequence”) interacts with, for example, binds to, a CRISPR/Cas effector polypeptide.
  • the protein-binding segment of a guide RNA includes two complementary stretches of nucleotides that hybridize to one another to form a double stranded RNA duplex (dsRNA duplex).
  • Site-specific binding and/or cleavage of a target nucleic acid can occur at locations (e.g., target sequence of a target locus) determined by base-pairing complementarity between the guide RNA (the guide sequence of the guide RNA) and the target nucleic acid.
  • a guide RNA and a CRISPR/Cas effector polypeptide form a complex (e.g., bind via non-covalent interactions).
  • the guide RNA provides target specificity to the complex by including a targeting segment, which includes a guide sequence (a nucleotide sequence that is complementary to a sequence of a target nucleic acid).
  • the CRISPR/Cas effector polypeptide of the complex provides the site-specific activity (e.g., cleavage activity or an activity provided by the CRISPR/Cas effector polypeptide when the CRISPR/Cas effector polypeptide is a CRISPR/Cas effector polypeptide fusion polypeptide, i.e., has a fusion partner).
  • the CRISPR/Cas effector polypeptide is guided to a target nucleic acid sequence (e.g. a target sequence in a chromosomal nucleic acid, e.g., a chromosome; a target sequence in an extrachromosomal nucleic acid, e.g. an episomal nucleic acid, a minicircle, an ssRNA, an ssDNA, etc.; a target sequence in a mitochondrial nucleic acid; a target sequence in a chloroplast nucleic acid; a target sequence in a plasmid; a target sequence in a viral nucleic acid; etc.) by virtue of its association with the guide RNA.
  • the “guide sequence,” also referred to as the “targeting sequence,” of a guide RNA can be modified so that the guide RNA can target a CRISPR/Cas effector polypeptide to any desired sequence of any desired target nucleic acid, with the exception that the protospacer adjacent motif (PAM) sequence is taken into account.
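The PAM constraint on guide design can be sketched as a simple scan for candidate target sites. The sketch below assumes the canonical 5'-NGG-3' PAM of Streptococcus pyogenes Cas9 located immediately 3' of a 20-nt protospacer, which the text above does not specify; the function name `find_target_sites` and its parameters are hypothetical, and only the given strand is scanned.

```python
import re


def find_target_sites(dna, guide_len=20, pam_regex="[ACGT]GG", pam_len=3):
    """Return (start, protospacer, pam) tuples for the given strand.

    Assumes an NGG-type PAM directly 3' of the protospacer (an assumption
    for illustration; other effectors use other PAMs).
    """
    dna = dna.upper()
    pam = re.compile(pam_regex)
    sites = []
    for start in range(len(dna) - guide_len - pam_len + 1):
        pam_start = start + guide_len
        pam_seq = dna[pam_start:pam_start + pam_len]
        if pam.fullmatch(pam_seq):
            sites.append((start, dna[start:pam_start], pam_seq))
    return sites


# Toy sequence: one 20-nt stretch followed by a TGG PAM.
print(find_target_sites("A" * 20 + "TGG" + "CCC"))
```

A fuller design tool would also scan the reverse complement and score off-target matches; both are omitted here to keep the sketch short.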
  • a guide RNA can have a targeting segment with a sequence (a guide sequence) that has complementarity with (e.g., can hybridize to) a sequence in a nucleic acid in a eukaryotic cell, e.g., a viral nucleic acid, a eukaryotic nucleic acid (e.g., a eukaryotic chromosome, chromosomal sequence, a eukaryotic RNA, etc.), and the like.
  • a guide RNA includes two separate nucleic acid molecules: an “activator” and a “targeter” and is referred to herein as a “dual guide RNA,” a “double-molecule guide RNA,” a “two-molecule guide RNA,” or a “dgRNA.”
  • the activator and targeter are covalently linked to one another (e.g., via intervening nucleotides) and the guide RNA is referred to as a “single guide RNA,” a “Cas9 single guide RNA,” a “single-molecule Cas9 guide RNA,” or a “one-molecule Cas9 guide RNA,” or simply “sgRNA.”
  • the target DNA can be a genomic nucleic acid, a mitochondrial nucleic acid; a chloroplast nucleic acid; a plasmid; or a viral nucleic acid.
  • the target DNA can be isolated from a cell or can be within an intact cell.
  • the RNA-guided endonuclease, for example, Cas9, and the guide RNA are transfected into the cells to contact the RNA-guided endonuclease with the genomic DNA of the cell.
  • An example of such transfection method is disclosed in the “Materials and Methods” below.
  • CRISPR is used to target expressed genes.
  • the guide RNAs can be designed to target (1) exon-intron junctions, (2) exon sequences, or (3) regulatory sequences. Once guide RNAs are selected, they can be synthesized for singleplex or multiplex transduction.
  • CRISPR-mediated genetic editing can involve different cell delivery systems, including: (1) plasmid transfection; (2) viral transduction that stably integrates the gRNA sequence into the genome; (3) a gRNA (singleplex or multiplex) that is directly associated with Cas9 for ribonucleoprotein (RNP) delivery.
  • Any of these Cas9-guide RNA delivery systems can be used to genetically modify cells grown in tissue culture or directly on tissue using RNP delivery.
  • CRISPR/Cas9 and other enzymes in the class introduce double-stranded DNA breaks (DSBs); this genomic alteration leads to insertions and deletions (indels).
  • Base editors introduce point mutations without a DNA double-strand break (DSB) or a requirement for template donor DNA (Gaudelli, 2017; Komor, 2016; Nishida, 2016; Kim, 2019).
  • CBEs cytosine base editors
  • ABEs adenine base editors
  • CBEs were developed by combining APOBEC1 enzymes, which remove an amine group from cytosine, with catalytically dead Cas9 (dCas9) or Cas9 nickase (nCas9) (Komor, 2016).
  • ABEs involve fusing an adenine deaminase to the Cas9 variant. Because an adenine deaminase accepts single-stranded DNA as a substrate, researchers created new ssDNA-targetable enzymes with engineered adenine deaminases (Gaudelli, 2017; Kim, 2019).
  • Base editors allow specific point mutations to be engineered into the genome and allow their detection at single-cell resolution. Base editor technology is used to introduce the mutation, followed by single-cell long-read sequencing to determine which cells have the mutation. Single cells undergo targeted sequencing of the engineered mutation, that is, targeted sequencing of the specific gene that underwent base editing. This involves using a special primer that provides multiplexed amplification of the cDNA targets that were engineered. The targeted products then undergo long-read sequencing, and the mutation is identified.
  • the effect of a point mutation on the cell’s function can be determined by integrating the long-read sequencing, which identifies the cells with the point mutation, and the short-read sequencing, which identifies changes in that specific cell’s gene expression. By combining the long- and short-read cell barcodes, one has single-cell sequence data for both complete cDNAs and gene expression. The process of using a base editor to introduce engineered gene mutations into single cells is illustrated in FIG. 5.
  • the first step is the binding of base editor-gRNA complex to its target DNA.
  • upon base pairing of the gRNA molecule with the complementary target DNA strand, approximately 20 nt of single-stranded DNA is displaced.
  • the deaminase enzyme edits the target DNA bases within this ssDNA (i.e., R-loop).
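The outcome of the deamination step can be sketched as a substitution restricted to a window of the displaced single-stranded DNA. In the sketch below, the window of protospacer positions 4 to 8 (1-based, counted from the PAM-distal end) is an assumption chosen for illustration and is not stated in the text above; the function name `apply_base_edit` is likewise hypothetical. A cytosine base editor is modelled as C to T, and an adenine base editor as A to G.

```python
def apply_base_edit(protospacer, window=(4, 8), from_base="C", to_base="T"):
    """Return the protospacer after editing every `from_base` in the window.

    `window` is 1-based and inclusive. The default window is an illustrative
    assumption; actual editing windows depend on the editor used.
    """
    first, last = window
    edited = []
    for pos, base in enumerate(protospacer.upper(), start=1):
        if first <= pos <= last and base == from_base:
            edited.append(to_base)
        else:
            edited.append(base)
    return "".join(edited)


# Cytosine base editor (C -> T) on a toy 20-nt protospacer.
print(apply_base_edit("ACCCCCCCCCAAAAAAAAAA"))
# Adenine base editor (A -> G) on a toy 10-nt sequence.
print(apply_base_edit("AAAAAAAAAA", from_base="A", to_base="G"))
```

The expected edited sequence produced this way is what a long read from an edited cell would be compared against.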
  • Base editors work efficiently in human cells, with efficiencies comparable to those of Cas9 (Kim, 2019). Therefore, base editors are an adaptable tool for introducing various genetic substitution mutations in the genome. Using a specially designed gRNA that acts as a repair template, prime editors introduce the mutation.
  • An assay disclosed herein can be applied across a range of cell numbers.
  • a multiplexed transduction with a guide RNA library can be conducted.
  • to analyze sets of cells and assign genetic modifications, e.g., CRISPR edits, a multiplexed transduction can be conducted on sets of cells that are grouped into different partitions, separate wells, or separate plates.
  • the cells can be grown, harvested as a single cell suspension and cDNA can be prepared.
  • a cell indexing barcode can be incorporated into the cDNAs at the 5’ or 3’ end such that one can assign a set of cells to a given guide RNA library.
  • the barcode can be used on different numbers of cells, ranging from one cell to a group of cells.
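Assigning reads to a cell or set of cells by the indexing barcode can be sketched as a lookup on the read's terminal bases. The sketch below assumes a fixed-length barcode, exact matching, and reads already oriented so that the barcode sits at the chosen end; the function name `assign_reads` and the toy barcodes are hypothetical. Real long reads contain sequencing errors, so a production pipeline would likely use error-tolerant matching instead.

```python
def assign_reads(reads, barcode_to_cells, barcode_len, end="3prime"):
    """Group reads by the cell or cell set that their barcode identifies.

    Reads whose terminal `barcode_len` bases are not a known barcode are
    skipped (exact matching only, an illustrative simplification).
    """
    if end not in ("5prime", "3prime"):
        raise ValueError("end must be '5prime' or '3prime'")
    assigned = {}
    for read in reads:
        tag = read[-barcode_len:] if end == "3prime" else read[:barcode_len]
        cells = barcode_to_cells.get(tag)
        if cells is None:
            continue
        assigned.setdefault(cells, []).append(read)
    return assigned


reads = ["ACGTAAAA", "GGGGCCCC", "TTTTAAAA", "ACGTGGGG"]
barcodes = {"AAAA": "set_1", "CCCC": "set_2"}
print(assign_reads(reads, barcodes, barcode_len=4))
```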
  • cells can be grown in partitions, wells, or plates.
  • each set of cells can be transduced with a CRISPR-Cas9 involving a multiplexed pool of guide RNAs.
  • Intact cDNA can be prepared for sequencing without any fragmentation. Avoiding fragmentation retains the full length of the cDNA as an extended molecule.
  • targeted sequencing can be performed on the gene or sets of genes that were targeted for modification.
  • the targeted sequence library preparation can involve: (1) PCR amplification, (2) selective hybridization with a bait oligonucleotide or (3) single primer extension of the target gene or cDNA from the target gene.
  • the sequencing library preparation can be performed with a full-length cDNA without any fragmentation.
  • long-read sequencing can be conducted.
  • the Oxford Nanopore™ or Pacific Biosciences™ sequencing methods, which generate long-read sequences, can be used.
  • the cell indexing barcode can be first identified from the long-read sequence to determine which cells were exposed to a given guide RNA. Then, how the target cDNA sequence was changed could be determined.
  • the long read sequence can cover the entire mRNA sequence and, therefore, a specific genetic modification can be found at any location in the transcript and still be linked to the cell index barcode at the 5’ or 3’ end.
  • the cell indexing barcode from the long read with the CRISPR genotype can be matched with the same cell indexing barcode from short read data (RNA-Seq or antibody barcodes). Linking these two barcodes enables the CRISPR genotype to be assigned to a given molecular phenotype for a specific population of cells.
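The barcode-matching step can be sketched as a join between two tables keyed on the cell indexing barcode. In the sketch below, the genotype labels, the gene name `GENE_X`, the count values, and the function name `link_genotype_to_expression` are all hypothetical; the sketch only shows how a genotype called from long reads might be attached to an expression value measured from short reads for the same barcode.

```python
from collections import defaultdict
from statistics import mean


def link_genotype_to_expression(genotype_by_barcode, expression_by_barcode, gene):
    """Average expression of `gene` per genotype, joined on cell barcode.

    Barcodes seen only in the long-read data (no short-read profile)
    are skipped; a gene missing from a profile counts as zero.
    """
    groups = defaultdict(list)
    for barcode, genotype in genotype_by_barcode.items():
        profile = expression_by_barcode.get(barcode)
        if profile is None:
            continue
        groups[genotype].append(profile.get(gene, 0))
    return {genotype: mean(values) for genotype, values in groups.items()}


# Long-read calls (barcode -> genotype) and short-read counts (barcode -> gene counts).
genotypes = {"AAAC": "edited", "AAAG": "edited", "AAAT": "unedited", "CCCC": "edited"}
expression = {"AAAC": {"GENE_X": 2}, "AAAG": {"GENE_X": 4}, "AAAT": {"GENE_X": 10}}
print(link_genotype_to_expression(genotypes, expression, "GENE_X"))
```

Keeping the join keyed on the barcode alone means the same code applies whether a barcode marks one cell or a set of cells.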
  • certain embodiments of the invention disclose how single cell long read analysis and genetic modifications, e.g., CRISPR edits, can be used to directly confirm the genetic modifications and used in cellular engineering applications.
  • Certain embodiments of the disclosure provide a method for analyzing cells, comprising:
  • step (c) on the basis of single cells or sets of cells, comparing the identified modification in the target gene with the modification expected in step (a).
  • Any target gene can be genetically modified. Also, any portion of the target gene can be modified, which includes: exon-intron junction, protein-coding sequence of a gene, promoter of a gene, or 3’ untranslated region of a gene.
  • the term “on the basis of a single cell” as used herein indicates that the analysis is made on a cell-by-cell basis. For example, the coding sequence of the mRNA encoded by the target gene is identified in individual cells from the population of genetically modified cells. Similarly, the identified target modification in individual cells is compared to the modification expected in step (a).
  • the term “on the basis of sets of cells” as used herein indicates that the analysis is made on different sets of cells. For example, the coding sequence of the mRNA encoded by the target gene is identified in different sets of cells, particularly, wherein different sets of cells could be descendants from different cells from the population of genetically modified cells. Similarly, the identified target modification in sets of cells is compared to the modification expected in step (a).
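The comparison in step (c) can be sketched as a per-cell (or per-set) check of the observed target sequence against the expected modification. The sketch below assumes that one consensus sequence of the target region has already been derived for each cell or set of cells; the function name `classify_cells`, the three labels, and the toy sequences are hypothetical.

```python
def classify_cells(observed_by_cell, reference, expected):
    """Label each cell (or set of cells) by comparing its observed sequence.

    "expected_edit": matches the modification expected in step (a);
    "unmodified": matches the unedited reference;
    "other_modification": anything else (for example, an unintended indel).
    """
    calls = {}
    for cell, sequence in observed_by_cell.items():
        if sequence == expected:
            calls[cell] = "expected_edit"
        elif sequence == reference:
            calls[cell] = "unmodified"
        else:
            calls[cell] = "other_modification"
    return calls


observed = {"cell_1": "ACGTTT", "cell_2": "ACGCTT", "cell_3": "ACGTT"}
print(classify_cells(observed, reference="ACGCTT", expected="ACGTTT"))
```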
  • sequencing the mRNA encoded by the target gene from the genetically modified cells is performed on the basis of single cells. Also, in some cases, sequencing the mRNA encoded by the target gene from the genetically modified cells is performed on the basis of sets of cells.
  • step (b) can comprise: (i) separating single cells or sets of cells from the genetically modified cells,
  • step (ii) reverse transcribing the mRNAs encoded by the target genes from the single cells or sets of cells separated in step (i) to produce cDNAs, wherein the primer for the reverse transcription of the mRNAs comprises a unique barcode on the basis of single cells or sets of cells,
  • 100 unique barcodes can be incorporated in 100 reverse transcription primers, each of which contains: 1) a sequence that binds to the mRNA encoded by the target gene and 2) a unique barcode.
  • the reverse transcription primer can contain a primer binding site that could be used to subsequently amplify the cDNA.
  • the sequence that binds to the mRNA encoded by the target gene can be the same or different in the different reverse transcription primers.
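A set of barcoded reverse transcription primers such as the 100 described above can be sketched as follows. The primer layout (5' primer binding site, then barcode, then target-binding sequence at the 3' end), the 8-nt barcode length, the placeholder sequences, and the function name `make_rt_primers` are assumptions made for illustration.

```python
from itertools import islice, product


def make_rt_primers(target_binding_seq, primer_binding_site, n=100, barcode_len=8):
    """Return {barcode: primer} for `n` primers with unique barcodes.

    Each primer is written 5'->3' as primer_binding_site + barcode +
    target_binding_seq, so the target-binding sequence is at the 3' end.
    """
    if n > 4 ** barcode_len:
        raise ValueError("barcode_len is too short for n unique barcodes")
    # Enumerate barcodes in lexicographic order; uniqueness is guaranteed.
    barcodes = ("".join(bases) for bases in product("ACGT", repeat=barcode_len))
    return {
        barcode: primer_binding_site + barcode + target_binding_seq
        for barcode in islice(barcodes, n)
    }


# Placeholder sequences, not taken from the disclosure.
primers = make_rt_primers("TTTT", "GGCC", n=100)
print(len(primers))
```

Enumerating barcodes in order yields neighbours that differ by a single base, so a practical design would probably select barcodes that are several mismatches apart to tolerate sequencing errors.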
  • the cDNAs are attached to the beads and the method comprises in step (iii), pooling the beads.
  • the reverse transcription primers could be attached to the beads, thereby attaching the amplified cDNA to the beads.
  • the step of separating single cells or sets of cells from the genetically modified cells comprises separating the single cells or sets of cells in individual wells of a multi-well plate or separating the cells in individual droplets in an emulsion.
  • mRNA can be isolated from the single cells.
  • the step of producing the cDNA can be performed in the droplet.
  • single cells cultured in individual wells can be grown into multiple cells, i.e., to produce sets of cells that descend from the single cells.
  • mRNA can be isolated from the sets of cells and treated according to the methods disclosed herein.
  • the cDNAs produced from single cells or sets of cells are sequenced.
  • substantial entirety of the mRNAs encoded by the target gene is sequenced.
  • the term “substantial entirety of an mRNA” includes sequences from the first exon to the last exon of a transcript with the possible exception of the sequences at the 5’ end of the first exon and the 3’ end of the last exon.
  • the sequences at the 5’ end of the first exon and the 3’ end of the last exon could be used for primer binding and, therefore, mutations in these sequences may not be detected.
  • the method disclosed herein comprises PCR amplifying the cDNAs containing the unique barcodes.
  • the reverse transcription primer can contain a primer binding site that could be used to subsequently amplify the cDNA. Accordingly, one of the primer pairs that amplifies a cDNA can bind to the primer binding site introduced into the cDNA via the reverse transcription primer. In certain such cases, the other primer can bind to the sequence at the 3’ end of the cDNA.
  • primer pairs can be designed that specifically bind to the sequences at the 5’ and the 3’ ends of the cDNA. Such sequences can be designed based on the sequence of the target gene. Therefore, in some cases, the primer pair for amplifying the cDNAs comprises: a first primer that hybridizes with the sequence at the 5’ end of the mRNA encoded by the target gene and a second primer that hybridizes with the sequence at the 3’ of the mRNA encoded by the target gene, and wherein one or both the primers in the primer pair comprise a unique barcode.
  • the method disclosed herein involves amplifying the cDNA produced from a single cell or a set of cells using a primer pair. The amplification product so produced contains the barcode introduced into the cDNA thereby indicating the source cell or group of cells of the cDNA.
  • the amplified cDNA can be sequenced, particularly, using long-read sequencing.
  • the long-read sequencing comprises single molecule real time (SMRT) sequencing or nanopore sequencing.
  • SMRT sequencing can be circular consensus sequencing or continuous long read sequencing.
  • In SMRT sequencing, an amplicon is ligated to hairpin adapters to form a circular molecule, called a SMRTbell.
  • the SMRTbell is bound by a DNA polymerase and loaded onto a SMRT Cell for sequencing.
  • a SMRT Cell can contain up to 8 million zero-mode waveguides (ZMWs). ZMWs are chambers of picolitre volumes. Light penetrates the lower 20-30 nm of SMRT Cells. The SMRTbell template and polymerase become immobilized on the bottom of the chamber.
  • dNTPs deoxynucleoside triphosphates
  • In nanopore sequencing, a long DNA strand is tagged with sequencing adapters preloaded with a motor protein on one or both ends.
  • the DNA is combined with tethering proteins and loaded onto the flow cell for sequencing.
  • the flow cell contains protein nanopores embedded in a synthetic membrane.
  • the tethering proteins bring the molecules to be sequenced towards the nanopores and as the motor protein unwinds the DNA, an electric current is applied, which drives the negatively charged DNA through the pore.
  • the DNA is sequenced as it passes through the pore and causes characteristic changes in the current.
  • Long-read sequencing can sequence at least about 500 or at least about 600 bases. Particularly, long-read sequencing sequences at least 800, at least 1000, at least 1200, at least 1400, at least 1600, at least 1800, at least 2000, at least 2500, or at least 3,000 bases of the amplified products. Thus, the long-read sequence can be used to sequence a target mRNA of at least 500 to at least 3,000 bases in length.
  • the method comprises further sequencing the transcriptomes of the genetically modified cells on the basis of single cells or sets of cells.
  • the method comprises conducting short-read sequencing of the transcriptome on the basis of single cells or sets of cells.
  • mRNA is isolated from the single cells or sets of cells and analyzed via transcriptome analysis by short-read sequencing.
  • sequencing the transcriptomes comprises:
  • step (ii) reverse transcribing the transcriptomes from the genetically modified cells or sets of cells separated in step (i) to produce cDNAs of the transcriptomes, wherein the primers used for the reverse transcription comprise a unique barcode on the basis of single cells or sets of cells,
  • step (iv) amplifying the cDNAs and sequencing the amplification products of the cDNAs of the transcriptomes, and (v) depending on the unique barcodes in the amplification products produced in step (iv), quantifying the transcriptomes from the genetically modified cells on the basis of single cells or sets of cells.
  • the reverse transcribing the transcriptomes can be performed using primers comprising: 1) random nucleotide sequences, for example, random hexamers, or 2) oligo-dT sequence.
  • the primers can have a unique barcode on the basis of single cells or sets of cells.
  • the same barcode can be used in long-read sequencing of mRNA sequencing of the target gene from a cell or set of cells and short-read sequencing of the transcriptome of the cell or the set of cells.
  • the same barcode could be used to attribute a cDNA sequence and the transcriptome sequence to a cell or a set of cells.
  • Transcriptome from the single cells or sets of cells can be sequenced using short-read sequencing.
  • the short-read sequencing comprises paired-end sequencing. Certain details of short-read sequencing are also described by the Logsdon et al. (2020) publication.
  • the disclosure provides a method of determining efficacy of a molecular tool for editing the target gene.
  • Certain such methods comprise methods of analyzing cells as disclosed herein and further comparing, on the basis of single cells or sets of cells, the observed modification to the target gene with the modification expected in the target gene. Based on the number of single cells or the number of sets of cells that exhibit the desired modification as compared to the total number of genetically modified cells or sets of cells, the efficacy of the molecular tool for producing the genetic modification can be determined.
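The efficacy calculation described above can be sketched as follows. This is a minimal illustration, assuming each cell indexing barcode has already been assigned a single observed-edit label from its sequencing reads. The function name `editing_efficacy`, the barcode names, and the edit labels are hypothetical and are not taken from the disclosure.

```python
def editing_efficacy(observed_by_barcode, expected_edit):
    """Fraction of barcoded cells (or sets of cells) showing the expected edit.

    observed_by_barcode: dict mapping a cell indexing barcode to the edit
    label observed for that cell or set of cells (hypothetical labels).
    """
    if not observed_by_barcode:
        raise ValueError("no cells to evaluate")
    matches = sum(
        1 for edit in observed_by_barcode.values() if edit == expected_edit
    )
    return matches / len(observed_by_barcode)


# Toy data: three of four barcoded cells carry the expected modification.
observed = {
    "BC01": "exon5_skip",
    "BC02": "exon5_skip",
    "BC03": "wild_type",
    "BC04": "exon5_skip",
}
print(editing_efficacy(observed, "exon5_skip"))  # 0.75
```

Comparing this fraction across several molecular tools applied to the same target gene would give one possible ranking of their efficacy.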
  • the methods disclosed herein involve editing one or more target genes. Certain such methods comprise methods of analyzing cells as disclosed herein and further comparing:
  • the mRNA encoded by one or more additional target genes can be analyzed on the basis of single cells or sets of cells.
  • the same barcode can be incorporated in the cDNAs produced from one cell or one set of cells.
  • multiple reverse transcriptase primers can be designed, each primer directed at producing cDNA from a different target mRNA but all reverse transcriptase primers having one barcode.
  • all cDNAs from a single cell or a set of cells contain the same barcode.
  • step (b) can be performed by:
  • step (ii) reverse transcribing the mRNAs encoded by the one or more additional target genes from the cells or sets of cells separated in step (i) to produce cDNAs for the one or more additional target genes, wherein the primer for the reverse transcription of the mRNAs comprises a unique barcode on the basis of single cells or sets of cells,
  • the methods described above for sequencing cDNAs for a target gene can be similarly applied for sequencing cDNAs for the one or more additional target genes.
  • the mRNA for the one or more additional target genes can be sequenced using long-read sequencing. Certain details of long-read sequencing are described above and such are also applicable to sequencing one or more additional target genes.
  • kits having one or more components and/or reagents and/or devices, where applicable, for practicing one or more of the above-described methods.
  • the subject kits may vary greatly. Kits of interest include those having one or more reagents mentioned herein, and associated devices where applicable, with respect to the steps of:
  • step (c) on the basis of single cells or sets of cells, comparing the identified modification in the target gene with the modification expected in step (a).
  • Kits may include certain combinations of components in a single reaction vessel. Kits may include different components in different vessels.
  • a kit comprises: reagents for genetic modifications in a target gene, such as CRISPR-Cas9 and gRNA; transfection reagents, cells or cell lines, media for culturing the cells, reverse transcription primers, primer pairs for amplification of cDNAs, reagents for sequencing, etc.
  • the methods described in this disclosure find use in a variety of applications. Applications of interest include, but are not limited to: research applications and therapeutic applications. Methods of the invention find use in a variety of different applications including any convenient application where identifying effects of genetic modifications, e.g., CRISPR-mediated genomic editing is desired.
  • the method finds particular use in analyzing and/or engineering therapeutic cells, e.g., genetically engineered cells that are destined for therapeutic use, e.g., stem cells or immune cells.
  • the method may be used to analyze knockouts and/or modifications in T cells or natural killer (NK) cells.
  • the method may be used to analyze therapeutic cells that have been modified by CRISPR editing to be allogeneic.
  • the method may be used to analyze immune cells that have a knockout in an immune checkpoint inhibitor such as PD1, CTLA-4, TIM-3, VISTA, LAG-3, IDO or KIR, etc., that have a knockout in an endogenous receptor such as a knockout in TRAC or TRBC, etc., or that have a CRISPR-mediated edit that modifies the expression of a cytokine or other inflammatory molecule or a component of a signal transduction pathway, etc.
  • the cells being analyzed may be primary immune cells, or they may be expanded primary immune cells.
  • HEK293T cells and Cas9-stable HEK293T cells were maintained in Dulbecco's Modified Eagle Medium (DMEM) supplemented with 10% Fetal Bovine Serum (FBS).
  • DMEM Dulbecco's Modified Eagle Medium
  • FBS Fetal Bovine Serum
  • Jurkat ATCC TIB-152
  • RPMI Roswell Park Memorial Institute 1640 Medium supplemented with 10%
  • the oligonucleotide pool for guide RNA library cloning was synthesized. Amplified guide RNA cassettes were cloned into a plasmid for expression.
  • 2.0 × 10⁶ HEK293T cells were plated 24 h prior to transfection.
  • Cells were transfected with the lentiviral sgRNA library (2000 ng), psPAX2 (1500 ng, Addgene plasmid #12260), and pMD2.G (500 ng, Addgene plasmid #12259) using a lipofection reagent.
  • the viral supernatant was collected 48 h after transfection, filtered through a 0.45 μm filter, and used.
  • the lentiviral supernatant and 8 μg of polybrene were added and the mixture was centrifuged at 800 g for 30 minutes at 32 °C. After that, cell pellets were resuspended in fresh media and plated in a 6-well plate. After 72 hours, transduced cells were selected with puromycin.
  • Short read transcripts Basecalling for 5’ gene expression libraries was performed, followed by alignment to reference genome GRCh38 and transcript quantification. In preparation for integrated analysis, a cell transcript data matrix was processed by removing cells with fewer than 100 or more than 8000 genes and cells with more than 30% mitochondrial genes. Additionally, any genes present in 3 or fewer cells were removed. Dimension reduction was performed using principal component analysis and UMAP with 30 principal components and cluster resolution of 0.8.
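The cell and gene filtering thresholds stated above can be sketched in plain Python. This is an illustrative sketch only: the disclosure does not name the software used, the order of the cell and gene filters is an assumption, the "MT-" gene-name prefix for mitochondrial genes is an assumption, and the mitochondrial percentage is interpreted here as the fraction of a cell's counts (another assumption). The function name and the toy matrix are hypothetical.

```python
from collections import Counter


def filter_cells_and_genes(counts, min_genes=100, max_genes=8000,
                           max_mito_frac=0.30, min_cells_per_gene=4):
    """Apply the stated thresholds to a {barcode: {gene: count}} matrix.

    Cells are dropped if they have fewer than min_genes or more than
    max_genes detected genes, or more than max_mito_frac of their counts
    in mitochondrial genes (assumed to start with "MT-"). Genes detected
    in fewer than min_cells_per_gene cells (i.e., 3 or fewer) are dropped.
    """
    kept = {}
    for barcode, genes in counts.items():
        detected = {g: c for g, c in genes.items() if c > 0}
        total = sum(detected.values())
        if total == 0:
            continue
        n_genes = len(detected)
        if n_genes < min_genes or n_genes > max_genes:
            continue
        mito = sum(c for g, c in detected.items() if g.startswith("MT-"))
        if mito / total > max_mito_frac:
            continue
        kept[barcode] = detected

    # Count, for every gene, the number of retained cells expressing it.
    cells_per_gene = Counter(g for genes in kept.values() for g in genes)
    keep_genes = {g for g, n in cells_per_gene.items()
                  if n >= min_cells_per_gene}
    return {bc: {g: c for g, c in genes.items() if g in keep_genes}
            for bc, genes in kept.items()}


# Toy matrix with small thresholds so the behaviour is easy to follow.
toy = {
    "c1": {"A": 1, "B": 2, "MT-X": 1},
    "c2": {"A": 1, "B": 1, "C": 1},
    "c3": {"A": 3, "B": 1},
    "c4": {"A": 1, "B": 1, "MT-X": 5},  # too much mitochondrial signal
    "c5": {"A": 1},                     # too few genes
    "c6": {"A": 2, "B": 2, "D": 1},
}
filtered = filter_cells_and_genes(toy, min_genes=2, max_genes=5)
print(sorted(filtered))
```

With the default arguments the function uses the thresholds given in the text (100 and 8000 genes, 30%, more than 3 cells per gene).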
  • the putative long-read barcode is identified by evaluating the soft-clipped portions of the aligned long reads, which are extracted using a custom Python script.
  • a second custom Python script used a machine-learning approach to identify the barcode.
  • the list of valid short-read barcodes was vectorized using a k-mer length of 8 to create a reference list.
  • the 5’ soft-clipped region of each read was then vectorized in the same way and compared to the created reference using a cosine similarity metric.
  • the 3’ soft-clipped region of each read was evaluated by matching the reverse-complement of the soft-clipped sequence to the reference list.
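The barcode matching steps above can be sketched with the standard library. This is not the custom script referred to in the text; it is a minimal sketch assuming that "vectorized" means counting overlapping 8-mers and that the best match is the whitelist barcode with the highest cosine similarity. All function names and the toy sequences are hypothetical.

```python
from collections import Counter
from math import sqrt

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_vector(seq, k=8):
    """Sparse count vector of the overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine(a, b):
    """Cosine similarity between two sparse k-mer count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_barcode(soft_clip, whitelist, k=8, min_score=0.0):
    """Return (barcode, score) for the most similar whitelist barcode.

    Returns (None, min_score) when no barcode scores above min_score.
    """
    clip_vec = kmer_vector(soft_clip, k)
    best, best_score = None, min_score
    for barcode in whitelist:
        score = cosine(clip_vec, kmer_vector(barcode, k))
        if score > best_score:
            best, best_score = barcode, score
    return best, best_score


# Toy whitelist of two 16-base barcodes and a 5' soft clip containing the first.
whitelist = ["ACGTACGTTGCATGCA", "TTTTGGGGCCCCAAAA"]
clip_5p = "GG" + whitelist[0] + "TT"
print(best_barcode(clip_5p, whitelist)[0])

# A 3' soft clip is reverse-complemented before matching.
clip_3p = reverse_complement(clip_5p)
print(best_barcode(reverse_complement(clip_3p), whitelist)[0])
```

A similarity cutoff (`min_score`) would normally be tuned on reads with known barcodes; the value used here is only a placeholder.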
  • the sgRNA was in vitro transcribed by T7 RNA polymerase. Templates for sgRNA were generated by extension of two complementary oligonucleotides. Transcribed RNA was purified by column purification. Purified RNA was quantified by fluorimetry.
  • Example 1 Identifying CRISPR edits using long reads that cover mRNA - full-length cDNA from individual cells
  • Each cell indexing barcode represents one cell.
  • the cell indexing barcode consists of a DNA sequence that is specifically added to the cDNA extracted from an individual cell or set of cells (FIG. 1). As noted, each cell indexing barcode represents only one cell. In the case of populations of cells, the cell indexing barcode represents a group of cells (two or more).
  • the RACK1 transcript was amplified using two primers that included sequence from the 5’ adaptor for the sequencing library and the last exon of RACK1.
  • the amplified full-length cDNA underwent nanopore sequencing (Oxford Nanopore), which generates long reads. We performed base calling and aligned the long reads to the reference genome, GRCh38.
  • the full-length cDNAs had cell indexing barcodes at the 5’ end; as noted, these barcodes enable the assignment of a long read covering a transcript to its cell or cells of origin.
  • each long-read sequence did not align to the human genome.
  • This non-aligning sequence represents the cell indexing barcode which is not found in the human genome sequence.
  • the soft-clipped sequence and the whitelist of barcodes are vectorized using 8-mers, i.e., sequences of eight bases. The frequency of these short sequence tracts (i.e., k-mers) was determined from a whitelist of predesignated barcodes representing the ground truth.
  • the long sequence read covers the entire mRNA sequence. Therefore a specific genotype edit can be identified at any location in the transcript and still be linked to the cell index barcode at the 5’ or 3’ end.
  • Using long-read sequencing of target cDNAs, we characterized the sequence of the mRNA in each individual cell. This analysis identified the different RACK1 transcript isoforms, and the cell indexing barcodes identified the individual cell from which each mRNA originated. Then, we aggregated reads per cell indexing barcode and used hierarchical clustering to determine the distribution of different cDNA sequences, representing the full-length mRNA, among cell subpopulations.
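The per-barcode aggregation step can be sketched as follows, assuming each long read has already been reduced to a pair of cell indexing barcode and isoform label. The function name, barcodes, and isoform labels are hypothetical. The hierarchical clustering that follows in the text is not shown; the resulting proportions would be its input.

```python
from collections import Counter, defaultdict


def isoform_proportions(reads):
    """Aggregate (barcode, isoform) read pairs into per-cell proportions.

    Returns {barcode: {isoform: fraction of that cell's reads}}.
    """
    counts = defaultdict(Counter)
    for barcode, isoform in reads:
        counts[barcode][isoform] += 1
    return {
        barcode: {iso: n / sum(c.values()) for iso, n in c.items()}
        for barcode, c in counts.items()
    }


# Toy reads: BC1 has three full-length reads and one exon-skipping read.
reads = [
    ("BC1", "full_length"),
    ("BC1", "exon5_skip"),
    ("BC1", "full_length"),
    ("BC1", "full_length"),
    ("BC2", "exon5_skip"),
]
proportions = isoform_proportions(reads)
print(proportions["BC1"])
```

Each row of this table corresponds to one row of an isoform-by-cell heatmap such as those described for FIGS. 2C and 2D.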
  • the long read sequence covers the entire mRNA sequence and, therefore, a specific genotype edit can be found at any location in the transcript and still be linked to the cell index barcode at the 5’ or 3’ end. From the reads that covered the entire RACK1 cDNA, we determined the structure of the transcript and the composition of the cell indexing barcode.
  • the different CRISPR-generated RACK1 isoforms change the gene expression for a given cell or set of cells.
  • This sequence linkage can use any type of single cell library process where one matches the cell indexing barcode sequences between the two different sequence data sets (single cell long read and single cell short read) that come from the same cell population.
  • PTPRC transmembrane phosphatase - its pre-mRNA alternative splicing is critical for changing T cell regulatory states.
  • PTPRC has five highly expressed isoforms. These include two short ones where there is a substantial degree of exon loss and longer isoforms where the majority of exons from the variable region are retained.
  • CD4-CD8-double negative T cells and NK precursor cells preferentially express longer isoforms like RABC and RBC and when activated, T cells and NK cells preferentially express shorter isoforms like RO and RB.
  • PTPRC transcript isoform structure.
  • a guide RNA lentiviral library targeting 16 splicing factors (two guide RNAs per gene) and five non-targeting guide RNAs as negative controls (Table 1).
  • HNRNPLL and SRSF5 induce exon skipping of PTPRC and PCBP2 and HNRNPD inhibit exon skipping.
  • HNRNPLL and SRSF5 knock-outs inhibited PTPRC exon skipping, their isoform expression patterns were significantly different (data not shown). The ratio of RBC and RABC isoforms was higher in the knock-outs than in non-targeted cells.
  • HNRNPLL gene for a single knock-out experiment.
  • RNP Cas9 ribonucleoprotein
  • Methods the Cas9 ribonucleoprotein
  • HNRNPLL RNP-treated cells Most of the stimulated wild-type cells had RO and RB transcript isoforms. However, the stimulated HNRNPLL RNP-treated cells had fewer RO and RB transcript isoforms (10.32-fold, P < 1.0e-5, FIGS. 3B and 3C).
  • PTPRC we analyzed the impact of splicing factors on myosin light chain 6 (MYL6) transcript isoforms. Exon 6 skipping of MYL6 is known to be regulated by various splicing factors.
  • Table 4 List of oligonucleotides for gRNA capture. Table 4. Mutation rate detected from long-read sequencing for each gRNA target.
  • T <-> C and A <-> G were identified that are suited for ABE and CBE enzymes.
  • the C to T transition is one of the most frequent mutations in the human genome. For example, among the 10 most reported TP53 mutations in the COSMIC database, 9 mutations were transitions. Among this set, CBEs can be used to engineer eight while ABEs can engineer one. Thus, using TP53 as an example, nine out of the ten mutations were viable candidates for base editors.
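The assignment of a point mutation to a base editor class can be sketched as a simple lookup. The sketch relies on the general behaviour of these enzymes (cytosine base editors convert a C•G pair to T•A, adenine base editors convert an A•T pair to G•C), so a change read on either strand is covered. It does not check for a protospacer adjacent motif or for the position of the base within the editing window. The function name is hypothetical.

```python
# Changes reachable by each editor class, written for either DNA strand.
CBE_CHANGES = {("C", "T"), ("G", "A")}
ABE_CHANGES = {("A", "G"), ("T", "C")}


def base_editor_for(ref, alt):
    """Return "CBE", "ABE", or None for a single-base change ref -> alt."""
    change = (ref.upper(), alt.upper())
    if change in CBE_CHANGES:
        return "CBE"
    if change in ABE_CHANGES:
        return "ABE"
    return None  # transversions are outside the scope of these two editors


for ref, alt in [("C", "T"), ("G", "A"), ("A", "G"), ("G", "T")]:
    print(ref, ">", alt, base_editor_for(ref, alt))
```

A candidate list such as the ten TP53 mutations mentioned above could be screened this way before guide RNA design.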
  • the spCas9 base editor requires ‘NGG’ - this sequence is referred to as the protospacer adjacent motif (PAM).
  • PAM protospacer adjacent motif
  • Oligonucleotides were synthesized with the gRNA sequence and subcloned into plasmid or lentiviral vectors. For fewer than 100 gRNAs we will order single oligonucleotides. For larger sets, we will order oligonucleotides from array synthesis and include primer sequences to enable rapid subcloning into plasmid or lentiviral vectors.
  • Base editors were applied to introduce mutations into a cell line. Ten gRNAs that target various TP53 mutations were designed. Using electroporation, a multiplexed plasmid pool with all gRNAs and CRISPR base editors was introduced into the colon cancer cell line HCT 116. The cells underwent single cell cDNA generation and then were sequenced with both short- and long-read platforms.
  • a targeted multiplexed enrichment was done using a method based on single-primer extension.
  • primers were designed for the target transcripts and then synthesized.
  • a linear single primer extension from the cDNA library increased the yield of the target while minimizing the generation of off-target sequences.
  • the target product undergoes second-strand DNA synthesis using DNA polymerase.
  • the product is loaded onto a single molecule sequencer (Oxford Nanopore or Pacific Biosciences) and the target reads are analyzed. The mutation is identified within single cells using the cell barcode information.
  • the short read sequencing provided the gene expression profile for each cell.
  • Fig. 7 shows how long- and short-read single cell sequencing can be integrated to match the gene expression profiles from single cells with the mutation.
  • the cells with a TP53 mutation showed distinct transcriptional patterns compared to the cells with wildtype TP53. This proof-of-concept study demonstrated that this technology provides high-throughput engineering and analysis of various cancer mutations into single cells.

Abstract

The disclosure pertains to a method for analyzing cells, comprising: editing a target gene in a population of cells to produce genetically modified cells; on the basis of single cells or sets of cells, sequencing substantially the entirety of the coding sequence of the mRNA encoded by the target gene from the genetically modified cells or sets of cells to identify a modification in the target gene; and on the basis of single cells or sets of cells, comparing the identified modification in the target gene with the expected genetic modification. The disclosure also pertains to determining efficacy of a method for genetically modifying a target gene. Further provided is a method for determining the effects of a genetic modification on global gene expression. Kits for performing the methods disclosed herein are also provided.

Description

DETECTING CRISPR GENOME MODIFICATION ON A CELL-BY-CELL BASIS
CROSS-REFERENCING
This application claims the benefit of U.S. provisional application serial no. 63/214,680, filed on June 24, 2021, which application is incorporated by reference herein.
INTRODUCTION
Various molecular tools could be designed to produce a desired genomic modification. Different molecular tools that can produce a desired genetic modification may exhibit different efficacy of achieving the desired genetic modification. Moreover, different molecular tools may exhibit different effects on the expression of the non-target genes and on the global gene expression in the genetically modified cell.
SUMMARY
Provided herein is an assay that could be used to confirm that a molecular tool, such as CRISPR-Cas9, that is designed to produce a genetic modification produces the desired genetic modification, particularly, on a cell-by-cell basis. An assay is also provided for screening molecular tools that are designed for a genetic modification, particularly, on a cell-by-cell basis, to identify the molecular tool that produces a desired genomic modification. Further, an assay is provided for screening the effects of a molecular tool that is designed to generate a genetic modification on the global gene expression profile of the genetically modified cells.
A kit is also provided that could be used for performing the method disclosed herein.
BRIEF DESCRIPTION OF THE FIGURES
FIG. 1. Schematic representation of an exemplary embodiment of the disclosure. (1) CRISPR-based genome editing to introduce changes into a gene’s sequence; (2) long-read sequencing to characterize the CRISPR-based alterations based on changes in the mRNA sequence; (3) cDNA barcoding to determine which cell or cell population has the CRISPR edit; (4) linkage of the CRISPR edit observed in the long-read sequence data with the short-read sequencing from the same cell or set of cells with the aforementioned CRISPR edit. Long-read sequencing encompasses read lengths greater than 500-600 bases. Short-read sequences are defined as read lengths less than 500-600 bases.
FIGS. 2A-2E. (A) Overview of single-cell short/long-read integration strategy. (B) Structure of RACK1 transcript. The red arrows point to CRISPR target sites that disrupt splicing acceptor sequences. (C,D) Heatmaps showing the proportion of each transcript isoform (x-axis) with each cell (y-axis) and clustering based on transcript isoform proportion for RACK1 exon 5 treated cells (C) and multiple exon treated cells (D). (E) Expression level of RACK1 in each cell cluster. P values were calculated in comparison with cluster 3 (P = 5.7e-12, 2.6e-11, 9.9e-11, 1.6e-10, 1.1e-07, 9.5e-06, 2.5e-07, 1.1e-08; two-sided t-test).
FIGS. 3A-3C. (A) Overview of single-cell CRISPR screen integrated with long-read sequencing. (B) Boxplot showing the ratio of short PTPRC transcript isoform (RO and RB) for cells with guide RNAs targeting indicated genes. P values are calculated in comparison with the nontarget cells. Genes which have less than 3 cells with target guide RNAs are not shown. (C) Heatmap showing proportion of each transcript isoform (x-axis) with each cell (y-axis) and clustering based on transcript isoform proportion for cells having indicated guide RNA sequence.
FIGS. 4A-4C. (A) Overview of how splicing factors affect alternative splicing. (B) Quantification of short transcript isoform per target gene. For each gene (x-axis), cells with guide RNAs targeting the gene were grouped and the ratio of transcript isoforms RO and RB among all PTPRC isoforms is shown as a box plot. (C) Heatmap showing proportion of each transcript isoform (x-axis) with each cell (y-axis) and clustering based on transcript isoform proportion for each sample.
FIG. 5 illustrates a process of using a base editor to introduce engineered gene mutations into single cells.
FIG. 6 illustrates a multiplexed sequencing approach to identify mutations from single cell RNA-seq. SPEX refers to single primer extension.
FIG. 7 illustrates single-cell level detection of CRISPR induced TP53 mutations and their effect on single cell expression.
DEFINITIONS
Before embodiments of the present disclosure are further described, it is to be understood that this disclosure is not limited to particular embodiments described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present disclosure will be limited only by the appended claims.
The terms “polynucleotide” and “nucleic acid,” used interchangeably herein, refer to a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. Thus, this term includes, but is not limited to, single-, double-, or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or a polymer comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases.
By “hybridizable” or “complementary” or “substantially complementary” it is meant that a nucleic acid (e.g. RNA, DNA) comprises a sequence of nucleotides that enables it to non-covalently bind, i.e. form Watson-Crick base pairs and/or G/U base pairs, “anneal”, or “hybridize,” to another nucleic acid in a sequence-specific, antiparallel, manner (i.e., a nucleic acid specifically binds to a complementary nucleic acid) under the appropriate in vitro and/or in vivo conditions of temperature and solution ionic strength. Standard Watson-Crick base-pairing includes: adenine (A) pairing with thymidine (T), adenine (A) pairing with uracil (U), and guanine (G) pairing with cytosine (C) [DNA, RNA]. In addition, for hybridization between two RNA molecules (e.g., dsRNA), and for hybridization of a DNA molecule with an RNA molecule (e.g., when a DNA target nucleic acid base pairs with a guide RNA, etc.): guanine (G) can also base pair with uracil (U). For example, G/U base-pairing is at least partially responsible for the degeneracy (i.e., redundancy) of the genetic code in the context of tRNA anti-codon base-pairing with codons in mRNA. Thus, in the context of this disclosure, a guanine (G) (e.g., of dsRNA duplex of a guide RNA molecule; of a guide RNA base pairing with a target nucleic acid, etc.) is considered complementary to both a uracil (U) and to an adenine (A). For example, when a G/U base-pair can be made at a given nucleotide position of a dsRNA duplex of a guide RNA molecule, the position is not considered to be non-complementary, but is instead considered to be complementary.
It is understood that the sequence of a polynucleotide need not be 100% complementary to that of its target nucleic acid to be specifically hybridizable or hybridizable. Moreover, a polynucleotide may hybridize over one or more segments such that intervening or adjacent segments are not involved in the hybridization event (e.g., a bulge, a loop structure or hairpin structure, etc.). A polynucleotide can comprise 60% or more, 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, or 100% sequence complementarity to a target region within the target nucleic acid sequence to which it will hybridize. For example, an antisense nucleic acid in which 18 of 20 nucleotides of the antisense compound are complementary to a target region, and would therefore specifically hybridize, would represent 90 percent complementarity. In this example, the remaining noncomplementary nucleotides may be clustered or interspersed with complementary nucleotides and need not be contiguous to each other or to complementary nucleotides. Percent complementarity between particular stretches of nucleic acid sequences within nucleic acids can be determined using any convenient method. Example methods include BLAST programs (basic local alignment search tools) and PowerBLAST programs (Altschul et al., J. Mol. Biol., 1990, 215, 403-410; Zhang and Madden, Genome Res., 1997, 7, 649-656), the Gap program (Wisconsin Sequence Analysis Package, Version 8 for Unix, Genetics Computer Group, University Research Park, Madison Wis.), e.g., using default settings, which uses the algorithm of Smith and Waterman (Adv. Appl. Math., 1981, 2, 482-489), and the like.
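The 18-of-20 example above can be reproduced with a short calculation. The sketch below is not one of the alignment programs named in the text; it assumes an ungapped, antiparallel pairing of two equal-length sequences and simply counts Watson-Crick pairs, with G/U pairs counted only on request. The function name and the toy sequences are hypothetical.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"),
                ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def percent_complementarity(oligo, target, allow_gu=False):
    """Percent of positions that pair when oligo anneals antiparallel to target.

    Both sequences are written 5' to 3' and must have equal length; the
    oligo is compared base by base against the reversed target.
    """
    if len(oligo) != len(target):
        raise ValueError("sequences must have equal length")
    pairs = WATSON_CRICK | WOBBLE if allow_gu else WATSON_CRICK
    matched = sum(
        (a, b) in pairs
        for a, b in zip(oligo.upper(), reversed(target.upper()))
    )
    return 100.0 * matched / len(oligo)


target = "ATGCATGCATGCATGCATGC"        # 20-nucleotide target region
perfect = "GCATGCATGCATGCATGCAT"       # its exact reverse complement
two_mismatches = "AAATGCATGCATGCATGCAT"  # first two bases changed

print(percent_complementarity(perfect, target))         # 100.0
print(percent_complementarity(two_mismatches, target))  # 90.0
```

For sequences with gaps, bulges, or unequal lengths, an alignment tool of the kind listed in the text would be needed instead.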
"Binding" as used herein (e.g. with reference to an RNA-binding domain of a polypeptide, binding to a target nucleic acid, and the like) refers to a non-covalent interaction between macromolecules (e.g., between a protein and a nucleic acid; between a modified CRISPR/Cas effector polypeptide/guide RNA complex and a target nucleic acid; and the like). While in a state of non-covalent interaction, the macromolecules are said to be “associated” or “interacting” or “binding” (e.g., when a molecule X is said to interact with a molecule Y, it is meant the molecule X binds to molecule Y in a non-covalent manner). Not all components of a binding interaction need be sequence-specific (e.g., contacts with phosphate residues in a DNA backbone), but some portions of a binding interaction may be sequence-specific. Binding interactions are generally characterized by a dissociation constant (KD) of less than 10⁻⁶ M, less than 10⁻⁷ M, less than 10⁻⁸ M, less than 10⁻⁹ M, less than 10⁻¹⁰ M, less than 10⁻¹¹ M, less than 10⁻¹² M, less than 10⁻¹³ M, less than 10⁻¹⁴ M, or less than 10⁻¹⁵ M. "Affinity" refers to the strength of binding, increased binding affinity being correlated with a lower KD.
A “cell” as used herein, denotes an in vivo or in vitro eukaryotic cell or a cell line.
A “binding site for a guide-RNA” as used herein is a polynucleotide (e.g., DNA such as genomic DNA) that includes a site ("target site" or "target sequence") targeted by a modified CRISPR/Cas effector polypeptide. The target sequence is the sequence to which the guide sequence of a guide nucleic acid (e.g., guide RNA; e.g., a dual guide RNA or a single-molecule guide RNA) will hybridize. For example, the target site (or target sequence) 5'-GAGCAUAUC-3' within a target nucleic acid is targeted by (or is bound by, or hybridizes with, or is complementary to) the sequence 5’-GAUAUGCUC-3’. Suitable hybridization conditions include physiological conditions normally present in a cell. For a double stranded target nucleic acid, the strand of the target nucleic acid that is complementary to and hybridizes with the guide RNA is referred to as the “complementary strand” or “target strand”; while the strand of the target nucleic acid that is complementary to the “target strand” (and is therefore not complementary to the guide RNA) is referred to as the “non-target strand” or “non-complementary strand.”
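The sequence that hybridizes with a given target site is its reverse complement, which can be derived mechanically. The following minimal sketch uses standard Watson-Crick pairing for RNA only (no G/U wobble); the function name is hypothetical.

```python
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def rna_reverse_complement(seq):
    """Reverse complement of an RNA sequence written 5' to 3'."""
    return seq.upper().translate(_RNA_COMPLEMENT)[::-1]


# Target site from the definition above, written 5' to 3'.
print(rna_reverse_complement("GAGCAUAUC"))  # GAUAUGCUC
```

Applying the function twice returns the original sequence, which is a quick sanity check for any guide and target pair.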
As used herein, the term “long-read sequencing” refers to sequencing read lengths greater than 500 bases, particularly, longer than 600 bases. The term “short read sequencing” refers to sequencing read lengths less than 600 bases, particularly, less than 500 bases.
Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.
Certain ranges are presented herein with numerical values being preceded by the term "about." The term "about" is used herein to provide literal support for the exact number that it precedes, as well as a number that is near to or approximately the number that the term precedes. In determining whether a number is near to or approximately a specifically recited number, the near or approximating unrecited number may be a number which, in the context in which it is presented, provides the substantial equivalent of the specifically recited number.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present invention, representative illustrative methods and materials are now described.
All publications and patents cited in this specification are herein incorporated by reference as if each individual publication or patent were specifically and individually indicated to be incorporated by reference and are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited. The citation of any publication is for its disclosure prior to the filing date and should not be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates which may need to be independently confirmed.
It is noted that, as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely,” “only” and the like in connection with the recitation of claim elements, or use of a “negative” limitation.
As will be apparent to those of skill in the art upon reading this disclosure, each of the individual embodiments described and illustrated herein has discrete components and features which may be readily separated from or combined with the features of any of the other several embodiments without departing from the scope or spirit of the present invention. Any recited method can be carried out in the order of events recited or in any other order which is logically possible.
While the method has or will be described for the sake of grammatical fluidity with functional explanations, it is to be expressly understood that the claims, unless expressly formulated under 35 U.S.C. §112, are not to be construed as necessarily limited in any way by the construction of "means" or "steps" limitations, but are to be accorded the full scope of the meaning and equivalents of the definition provided by the claims under the judicial doctrine of equivalents, and in the case where the claims are expressly formulated under 35 U.S.C. §112 are to be accorded full statutory equivalents under 35 U.S.C. §112. In describing and claiming the present invention, certain terminology will be used in accordance with the definitions set out below. It will be appreciated that the definitions provided herein are not intended to be mutually exclusive.
As used herein, the phrases “for example,” “for instance,” “such as,” or “including” are meant to introduce examples that further clarify more general subject matter. These examples are provided only as an aid for understanding the disclosure and are not meant to be limiting in any fashion.
As used herein, the terms “may,” "optional," "optionally," or “may optionally” mean that the subsequently described circumstance may or may not occur, so that the description includes instances where the circumstance occurs and instances where it does not.
Definitions of other terms and concepts appear throughout the detailed description.
DETAILED DESCRIPTION
Disclosed herein is a new approach which characterizes genetic modification-induced alterations of full-length transcript isoforms from either individual cells or sets of cells (FIG. 1). Genetic modifications can be designed to alter specific exon-intron junctions that have the potential of altering transcript structure. Genetic modifications can also be designed to alter the sequence within an exon that undergoes transcription to produce a mutated mRNA transcript.
In certain cases, long read sequencing, such as single molecule real time (SMRT) sequencing or nanopore sequencing, can be performed, which provides the full-length transcript sequence. In addition, a cell’s long read sequencing can be combined with the cell’s short read transcriptome information (FIG. 2A). Overall, the data from long and short read sequencing from a single cell or a set of cells is used to assess the full transcriptome of the single cell or the set of cells.
Thus, an assay is provided herein that allows one to evaluate cells, on the basis of single cells or sets of cells, that are genetically modified, for example, via CRISPR-mediated genetic edit.
The assay disclosed herein allows: (1) confirming the genomic modification, for example, CRISPR edit, based on the target gene’s mRNA; (2) assigning a desired genetic modification, for example, CRISPR-based genomic edit, to an individual cell or set of cells; and (3) determining the effects of a genetic modification on cellular phenotypes such as global gene or protein expression.
TYPES OF GENETIC MODIFICATIONS
Various tools for genetic modification are routinely used in the art. Certain non-limiting examples of such molecular tools include: 1) incorporation of a genetic material into a targeted site in the genome, for example, via homologous recombination; 2) random incorporation of genetic material into a target chromosome; 3) introduction of random mutations in a target genetic material, for example, via exposure to mutagens.
More recent tools for introducing genetic modifications in a target genome include programmable nuclease-based genome editing. Programmable nucleases, such as zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and clustered regularly interspaced short palindromic repeat (CRISPR)-Cas-associated nucleases, provide targeted gene editing platforms.
Nuclease-based genetic modification involves targeted alterations in genomic regions based on nuclease-induced double-stranded breaks (DSBs) at a specific desired locus in the target genome. DSBs lead to the production of damaged DNA and stimulation of the cell’s DNA repair mechanisms, such as homology-directed repair (HDR) and nonhomologous end-joining (NHEJ). Specific genetic modifications can be introduced in desired target genomic sites using HDR or NHEJ.
The methods disclosed herein can be used to analyze genetic modifications introduced using any suitable tool for genetic modification. In certain cases, CRISPR-based genome editing is used to introduce genetic modifications.
CRISPR-based genome editing
In some embodiments, the assay involves: CRISPR-based genome editing to introduce changes into a gene’s sequence and long-read sequencing to characterize the CRISPR-based alterations based on the changes in the mRNA sequence. The long-read sequence can involve cDNA barcoding to determine which cell or set of cells has the desired CRISPR edit. Moreover, the CRISPR edit observed in the long-read sequence data can be linked with the short-read sequencing from the same cell or set of cells to determine the effects of the genetic modification on the cell, for example, on global gene expression. The CRISPR system suitable for use in the methods of the present disclosure can be CRISPR-Cas9.
A guide nucleic acid suitable for inclusion in a CRISPR-system used in the present disclosure can include: i) a first segment (referred to herein as a “targeting segment”); and ii) a second segment (referred to herein as a “protein-binding segment”).
A “segment” is a region of a molecule, e.g., a contiguous stretch of nucleotides in a nucleic acid molecule. A segment can also be a section of a complex such that a segment may comprise regions of more than one molecule. The “targeting segment” is also referred to herein as a “variable region” of a guide RNA. The “protein-binding segment” is also referred to herein as a “constant region” of a guide RNA. The first segment (targeting segment) of a guide RNA includes a nucleotide sequence (a guide sequence) that is complementary to (and therefore hybridizes with) a specific sequence (a target site) within a target nucleic acid (e.g., a target ssRNA, a target ssDNA, the complementary strand of a double stranded target DNA, etc.). The protein-binding segment (or “protein-binding sequence”) interacts with, for example, binds to, a CRISPR/Cas effector polypeptide. The protein-binding segment of a guide RNA includes two complementary stretches of nucleotides that hybridize to one another to form a double stranded RNA duplex (dsRNA duplex). Site-specific binding and/or cleavage of a target nucleic acid (e.g., genomic DNA) can occur at locations (e.g., target sequence of a target locus) determined by base-pairing complementarity between the guide RNA (the guide sequence of the guide RNA) and the target nucleic acid.
A guide RNA and a CRISPR/Cas effector polypeptide form a complex (e.g., bind via non-covalent interactions). The guide RNA provides target specificity to the complex by including a targeting segment, which includes a guide sequence (a nucleotide sequence that is complementary to a sequence of a target nucleic acid). The CRISPR/Cas effector polypeptide of the complex provides the site-specific activity (e.g., cleavage activity or an activity provided by the CRISPR/Cas effector polypeptide when the CRISPR/Cas effector polypeptide is a CRISPR/Cas effector polypeptide fusion polypeptide, i.e., has a fusion partner). In other words, the CRISPR/Cas effector polypeptide is guided to a target nucleic acid sequence (e.g. a target sequence in a chromosomal nucleic acid, e.g., a chromosome; a target sequence in an extrachromosomal nucleic acid, e.g. an episomal nucleic acid, a minicircle, an ssRNA, an ssDNA, etc.; a target sequence in a mitochondrial nucleic acid; a target sequence in a chloroplast nucleic acid; a target sequence in a plasmid; a target sequence in a viral nucleic acid; etc.) by virtue of its association with the guide RNA.
The “guide sequence” also referred to as the “targeting sequence” of a guide RNA can be modified so that the guide RNA can target a CRISPR/Cas effector polypeptide to any desired sequence of any desired target nucleic acid, with the exception that the protospacer adjacent motif (PAM) sequence can be considered. Thus, for example, a guide RNA can have a targeting segment with a sequence (a guide sequence) that has complementarity with (e.g., can hybridize to) a sequence in a nucleic acid in a eukaryotic cell, e.g., a viral nucleic acid, a eukaryotic nucleic acid (e.g., a eukaryotic chromosome, chromosomal sequence, a eukaryotic RNA, etc.), and the like.
In some cases, a guide RNA includes two separate nucleic acid molecules: an “activator” and a “targeter” and is referred to herein as a “dual guide RNA,” a “double-molecule guide RNA,” a “two-molecule guide RNA,” or a “dgRNA.” In some embodiments, the activator and targeter are covalently linked to one another (e.g., via intervening nucleotides) and the guide RNA is referred to as a “single guide RNA,” a “Cas9 single guide RNA,” a “single-molecule Cas9 guide RNA,” or a “one-molecule Cas9 guide RNA,” or simply “sgRNA.”
The target DNA can be a genomic nucleic acid, a mitochondrial nucleic acid; a chloroplast nucleic acid; a plasmid; or a viral nucleic acid. The target DNA can be isolated from a cell or can be within an intact cell.
The RNA-guided endonuclease, for example, Cas9, and the guide RNA are transfected into the cells to contact the RNA-guided endonuclease with the genomic DNA of the cell. An example of such a transfection method is disclosed in the “Materials and Methods” below. In some cases, CRISPR is used to target expressed genes. The guide RNAs can be designed to target (1) exon-intron junctions, (2) exon sequences, or (3) regulatory sequences. Once guide RNAs are selected, they can be synthesized for singleplex or multiplex transduction.
CRISPR-mediated genetic editing can involve different cell delivery systems, including: (1) plasmid transfection; (2) viral transduction that stably integrates the gRNA sequence into the genome; (3) a gRNA (singleplex or multiplex) that is directly associated with Cas9 for ribonucleoprotein (RNP) delivery. Any of these Cas9-guide RNA delivery systems can be used to genetically modify cells grown in tissue culture or directly on tissue using RNP delivery.
Base editors
Many mutations are single nucleotide variants, many of which lead to amino acid substitutions (Goodwin, 2016; Landrum, 2016). Conventional CRISPR does not generate substitutions. Rather, CRISPR/Cas9 and other enzymes in the class introduce double stranded DNA breaks (DSBs) - this genomic alteration leads to insertions and deletions (indels). Given the general nature of the Cas9 break, other types of genomic alterations can be introduced such as large deletions and rearrangements (Kosicki, 2018; Shin, 2017). Base editors introduce point mutations without a DNA double-strand break (DSB) or a requirement for template donor DNA (Gaudelli, 2017; Komor, 2016; Nishida, 2016; Kim, 2019). There are two general classes: cytosine base editors (CBEs) and adenine base editors (ABEs). CBEs were developed by combining APOBEC1 enzymes, which remove an amine group from cytosine, with catalytically dead Cas9 (dCas9) or Cas9 nickase (nCas9) (Komor, 2016). ABEs involve fusing an adenine deaminase to the Cas9 variant. Because an adenine deaminase accepts single-stranded DNA as a substrate, researchers created new ssDNA-targetable enzymes with engineered adenine deaminases (Gaudelli, 2017; Kim, 2019).
Base editors allow specific point mutations to be engineered into the genome and detected at single cell resolution. Base editor technology is used to introduce the mutation, followed by single cell long read sequencing to determine which cells have the mutation. Single cells undergo targeted sequencing of the engineered mutation, that is, targeted sequencing of the specific gene subjected to base editing. This involves using a special primer that provides multiplexed amplification of the cDNA targets that were engineered. Then, the targeted products undergo long read sequencing and the mutation is identified.
The consequences of a point mutation on the cell’s function can be determined by integrating the long-read sequencing, which identifies the cells with the point mutation, and the short read sequencing, which identifies changes in that specific cell’s gene expression. By combining the long and short read cell barcodes, one obtains single cell sequence data for both complete cDNAs and gene expression. The process of using a base editor to introduce engineered gene mutations into single cells is illustrated in FIG. 5.
The first step is the binding of the base editor-gRNA complex to its target DNA. With the base pairing of the gRNA molecule and the complementary target DNA strand, approximately 20 nt of single-stranded DNA are displaced. Subsequently, the deaminase enzyme edits the target DNA bases within this ssDNA (i.e., R-loop).
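The deamination step described above can be sketched in code for the cytosine base editor case. The following is a simplified illustration only: the function name is hypothetical, the editing window of protospacer positions 4 to 8 is an assumed default that varies between base editor variants, and every cytosine within the window is converted, whereas editing in cells is typically incomplete.

```python
# Simplified illustration; the default window (positions 4-8) is an assumption
# and is not taken from the disclosure.
def cytosine_base_edit(protospacer: str, window: tuple = (4, 8)) -> str:
    """Convert C to T at 1-based positions inside `window` (inclusive).

    `protospacer` is the displaced, non-target strand sequence written 5'->3'.
    """
    start, end = window
    edited = []
    for position, base in enumerate(protospacer.upper(), start=1):
        if base == "C" and start <= position <= end:
            edited.append("T")  # deaminated cytosine is read as thymine
        else:
            edited.append(base)
    return "".join(edited)
```

The edited sequence produced by such a model could serve as the expected sequence against which long-read data from individual cells are compared.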
Base editors work efficiently in human cells, with efficiencies comparable to those of Cas9 (Kim, 2019). Therefore, base editors are an adaptable tool for introducing various genetic substitution mutations in the genome. Using a specially designed gRNA that acts as a repair template, prime editors introduce the mutation.
TESTING A SINGLE CELL OR A SET OF CELLS
An assay disclosed herein can be applied across a range of cell numbers.
For example, if the assay is conducted on single cells for assigning genetic modification, e.g., CRISPR edits, a multiplexed transduction with a guide RNA library can be conducted. In the case of sets of cells and assigning genetic modifications, e.g., CRISPR edits, a multiplexed transduction can be conducted on sets of cells that are grouped into different partitions, separate wells, or separate plates.
Post-genetic modification, e.g., CRISPR transduction, the cells can be grown and harvested as a single cell suspension, and cDNA can be prepared. A cell indexing barcode can be incorporated into the cDNAs at the 5’ or 3’ end such that one can assign a set of cells to a given guide RNA library. The barcode can be used on different numbers of cells ranging from one cell to a group of cells.
In the case of more than one cell, cells can be grown in partitions, wells, or plates. For example, each set of cells can be transduced with a CRISPR-Cas9 system involving a multiplexed pool of guide RNAs. Each set of cells (N = 1 or greater) can then be analyzed for the presence of the CRISPR edit in the mRNA sequence.
Intact cDNA can be prepared for sequencing without any fragmentation. Avoiding fragmentation retains the full length of the cDNA as an extended molecule. To identify the genetic modification, e.g., CRISPR edit, targeted sequencing can be performed on the gene or sets of genes that were targeted for modification.
The targeted sequence library preparation can involve: (1) PCR amplification, (2) selective hybridization with a bait oligonucleotide or (3) single primer extension of the target gene or cDNA from the target gene. The sequencing library preparation can be performed with a full-length cDNA without any fragmentation.
With the targeted cDNA library, long-read sequencing can be conducted. For example, the Oxford Nanopore™ or Pacific Biosciences™ sequencing methods, which generate long-read sequences, can be used. Using the long-read sequencing, the cell indexing barcode can be first identified from the long-read sequence to determine which cells were exposed to a given guide RNA. Then, how the target cDNA sequence was changed can be determined. The long read sequence can cover the entire mRNA sequence and, therefore, a specific genetic modification can be found at any location in the transcript and still be linked to the cell index barcode at the 5’ or 3’ end.
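A minimal sketch of identifying the cell indexing barcode at either end of a long read is given here for illustration. The function name, the 30-base search window, and the use of exact string matching are assumptions; practical pipelines would typically tolerate sequencing errors and also consider reads from the opposite strand.

```python
# Hypothetical helper; exact matching and the 30-base window are assumptions.
def find_cell_barcode(read: str, known_barcodes: list, end_window: int = 30):
    """Return the single known barcode found near either read end, else None."""
    read = read.upper()
    # The "|" separator prevents a spurious match spanning the two end segments.
    ends = read[:end_window] + "|" + read[-end_window:]
    hits = [barcode for barcode in known_barcodes if barcode in ends]
    # Ambiguous reads (no barcode, or more than one) are left unassigned.
    return hits[0] if len(hits) == 1 else None
```

Because the barcode is searched only at the read ends, a modification anywhere in the body of the transcript remains associated with the cell from which the read originated.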
To link the CRISPR edit genotype to other molecular phenotype features, the cell indexing barcode from the long read with the CRISPR genotype can be matched with the same cell indexing barcode from short read data (RNA-Seq or antibody barcodes). Linking these two barcodes enables the CRISPR genotype to be assigned to a given molecular phenotype for a specific population of cells.
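The barcode matching described above can be sketched as a join between two per-barcode tables. In the following illustration, the function names, the genotype labels, and the dictionary layout are hypothetical; only barcodes present in both the long-read and short-read data are linked.

```python
from collections import defaultdict


# Hypothetical data layout: genotype_by_barcode maps barcode -> genotype label
# (from long reads); expression_by_barcode maps barcode -> {gene: count}
# (from short reads).
def link_genotype_to_phenotype(genotype_by_barcode: dict, expression_by_barcode: dict) -> dict:
    """Join long-read genotypes and short-read expression on shared barcodes."""
    shared = genotype_by_barcode.keys() & expression_by_barcode.keys()
    return {
        barcode: {
            "genotype": genotype_by_barcode[barcode],
            "expression": expression_by_barcode[barcode],
        }
        for barcode in shared
    }


def mean_expression_by_genotype(linked: dict, gene: str) -> dict:
    """Average the count of `gene` across cells sharing the same genotype."""
    values = defaultdict(list)
    for record in linked.values():
        values[record["genotype"]].append(record["expression"].get(gene, 0))
    return {genotype: sum(v) / len(v) for genotype, v in values.items()}
```

Comparing the per-genotype averages for a gene of interest is one simple way to summarize the effect of an edit on expression; more rigorous statistical testing would be needed in practice.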
Overall, certain embodiments of the invention disclose how single cell long read analysis and genetic modifications, e.g., CRISPR edits, can be used to directly confirm the genetic modifications and used in cellular engineering applications.
METHODS
Certain embodiments of the disclosure provide a method for analyzing cells, comprising:
(a) editing a target gene in a population of cells to produce genetically modified cells;
(b) on the basis of single cells or sets of cells, sequencing substantially the entirety of the coding sequence of the mRNA encoded by the target gene from the genetically modified cells or sets of cells to identify a modification in the target gene; and
(c) on the basis of single cells or sets of cells, comparing the identified modification in the target gene with the modification expected in step (a).
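The comparison in step (c) can be illustrated with a minimal sketch that classifies the target-gene cDNA sequence observed for each cell or set of cells. The function names and the three category labels are hypothetical, and exact sequence identity is assumed for simplicity; real long-read data would generally require alignment and error-tolerant comparison.

```python
# Hypothetical classification scheme; exact string identity is an assumption.
def classify_modification(observed: str, reference: str, expected: str) -> str:
    """Classify one observed cDNA sequence against reference and expected edit."""
    observed = observed.upper()
    if observed == expected.upper():
        return "expected_modification"
    if observed == reference.upper():
        return "unmodified"
    return "other_modification"


def classify_cells(observed_by_barcode: dict, reference: str, expected: str) -> dict:
    """Apply the classification to every barcode (single cell or set of cells)."""
    return {
        barcode: classify_modification(sequence, reference, expected)
        for barcode, sequence in observed_by_barcode.items()
    }
```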
Any target gene can be genetically modified. Also, any portion of the target gene can be modified, which includes: exon-intron junction, protein-coding sequence of a gene, promoter of a gene, or 3’ untranslated region of a gene.
The term “on the basis of a single cell” as used herein indicates that the analysis is made on a cell-by-cell basis. For example, the coding sequence of the mRNA encoded by the target gene is identified in individual cells from the population of genetically modified cells. Similarly, the identified target modification in individual cells is compared to the modification expected in step (a).
The term “on the basis of sets of cells” as used herein indicates that the analysis is made on different sets of cells. For example, the coding sequence of the mRNA encoded by the target gene is identified in different sets of cells, particularly, wherein different sets of cells could be descendants from different cells from the population of genetically modified cells. Similarly, the identified target modification in sets of cells is compared to the modification expected in step (a).
Thus, in some cases, sequencing the mRNA encoded by the target gene from the genetically modified cells is performed on the basis of single cells. Also, in some cases, sequencing the mRNA encoded by the target gene from the genetically modified cells is performed on the basis of sets of cells.
In some cases, step (b) can comprise:
(i) separating single cells or sets of cells from the genetically modified cells,
(ii) reverse transcribing the mRNAs encoded by the target genes from the single cells or sets of cells separated in step (i) to produce cDNAs, wherein the primer for the reverse transcription of the mRNAs comprises a unique barcode on the basis of single cells or sets of cells,
(iii) pooling the cDNAs from the genetically modified single cells or sets of cells,
(iv) amplifying the cDNAs and sequencing the amplification products, and
(v) depending on the unique barcodes in the amplification products, identifying the modifications in the mRNAs on the basis of single cells or sets of cells.
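The final step of the list above, in which pooled sequences are attributed back to their source by barcode, can be sketched as follows. The function names are hypothetical, and the use of the most frequent sequence per barcode as a simple consensus is an assumption made for illustration rather than a method taken from the disclosure.

```python
from collections import Counter, defaultdict


def group_reads_by_barcode(reads) -> dict:
    """Group pooled reads, given as (barcode, sequence) pairs, by barcode."""
    grouped = defaultdict(list)
    for barcode, sequence in reads:
        grouped[barcode].append(sequence)
    return dict(grouped)


def majority_sequence(sequences: list) -> str:
    """Return the most frequent sequence (an assumed, simplistic consensus)."""
    return Counter(sequences).most_common(1)[0][0]


def call_sequences(reads) -> dict:
    """Return one representative target sequence per barcode."""
    grouped = group_reads_by_barcode(reads)
    return {barcode: majority_sequence(seqs) for barcode, seqs in grouped.items()}
```

For a set of cells that is heterogeneous, or for a heterozygous edit, a single majority sequence would hide minority alleles, so the full per-barcode distribution may be more informative.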
Thus, different primers for reverse transcription are used in different single cells or sets of cells.
For example, to analyze 100 cells, 100 unique barcodes can be incorporated in 100 reverse transcription primers, each of which contains: 1) a sequence that binds to the mRNA encoded by the target gene and 2) a unique barcode. In addition, the reverse transcription primer can contain a primer binding site that could be used to subsequently amplify the cDNA.
The sequence that binds to the mRNA encoded by the target gene can be the same or different in the different reverse transcription primers.
Similarly, to analyze 100 sets of cells, 100 unique barcodes can be incorporated in 100 reverse transcription primers, each of which contains: 1) a sequence that binds to the mRNA encoded by the target gene and 2) a unique barcode. The sequence that binds to the mRNA encoded by the target gene can be the same or different in the different reverse transcription primers.
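The construction of barcoded reverse transcription primers described above can be sketched as follows. The function names, the 8-base barcode length, and the 5' to 3' ordering of primer binding site, barcode, and target-binding sequence are assumptions made for illustration. The barcodes are simply enumerated in order; practical designs would typically select barcodes that differ from one another at several positions so that sequencing errors do not convert one barcode into another.

```python
from itertools import islice, product


def make_barcodes(count: int, length: int = 8) -> list:
    """Enumerate `count` distinct barcodes of the given length (assumed scheme)."""
    if count > 4 ** length:
        raise ValueError("barcode length too short for the requested count")
    return ["".join(bases) for bases in islice(product("ACGT", repeat=length), count)]


def build_rt_primers(barcodes: list, primer_binding_site: str, target_binding_sequence: str) -> dict:
    """Map each barcode to a primer: binding site + barcode + target-binding part."""
    return {
        barcode: primer_binding_site + barcode + target_binding_sequence
        for barcode in barcodes
    }
```

For example, `build_rt_primers(make_barcodes(100), handle, binder)` yields 100 primers, one per cell or set of cells, where `handle` and `binder` stand for user-supplied sequences.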
In some cases, the cDNAs are attached to the beads and the method comprises in step (iii), pooling the beads. For example, the reverse transcription primers could be attached to the beads, thereby attaching the amplified cDNA to the beads.
The step of separating single cells or sets of cells from the genetically modified cells comprises separating the single cells or sets of cells in individual wells of a multi-well plate or separating the cells in individual droplets in an emulsion. When single cells are cultured in individual wells, mRNA can be isolated from the single cells.
When the single cells are separated in individual droplets in an emulsion, the step of producing the cDNA can be performed in the droplet.
Alternatively, single cells cultured in individual wells can be grown into multiple cells, i.e., to produce sets of cells that descend from the single cells. mRNA can be isolated from the sets of cells and treated according to the methods disclosed herein.
In some cases, the cDNAs produced from single cells or sets of cells are sequenced. In certain embodiments, substantial entirety of the mRNAs encoded by the target gene is sequenced.
As used herein, the term “substantial entirety of an mRNA” includes sequences from the first exon to the last exon of a transcript with the possible exception of the sequences at the 5’ end of the first exon and the 3’ end of the last exon. For example, the sequences at the 5’ end of the first exon and the 3’ end of the last exon could be used for primer binding and, therefore, mutations in these sequences may not be detected.
In some cases, the method disclosed herein comprises PCR amplifying the cDNAs containing the unique barcodes. As noted above, the reverse transcription primer can contain a primer binding site that could be used to subsequently amplify the cDNA. Accordingly, one of the primer pairs that amplifies a cDNA can bind to the primer binding site introduced into the cDNA via the reverse transcription primer. In certain such cases, the other primer can bind to the sequence at the 3’ end of the cDNA.
Alternatively, primer pairs can be designed that specifically bind to the sequences at the 5’ and the 3’ ends of the cDNA. Such sequences can be designed based on the sequence of the target gene. Therefore, in some cases, the primer pair for amplifying the cDNAs comprises: a first primer that hybridizes with the sequence at the 5’ end of the mRNA encoded by the target gene and a second primer that hybridizes with the sequence at the 3’ end of the mRNA encoded by the target gene, and wherein one or both of the primers in the primer pair comprise a unique barcode. Thus, the method disclosed herein involves amplifying the cDNA produced from a single cell or a set of cells using a primer pair. The amplification product so produced contains the barcode introduced into the cDNA, thereby indicating the source cell or group of cells of the cDNA.
LONG-READ SEQUENCING
The amplified cDNA can be sequenced, particularly, using long-read sequencing. In some cases, the long-read sequencing comprises single molecule real time (SMRT) sequencing or nanopore sequencing. The SMRT sequencing can be circular consensus sequencing or continuous long read sequencing.
Certain details of long-read sequencing, for example, SMRT (developed by Pacific Biosciences (PacBio)™) and nanopore sequencing (developed by Oxford Nanopore Technologies™) are described by the publication Logsdon et al. (2020), Long-read human genome sequencing and its applications, Nature Reviews Genetics, Vol. 21, pages 597-614, which is herein incorporated by reference in its entirety.
Briefly, in SMRT sequencing, an amplicon is ligated to hairpin adapters to form a circular molecule, called a SMRTbell. The SMRTbell is bound by a DNA polymerase and loaded onto a SMRT Cell for sequencing. A SMRT Cell can contain up to 8 million zero-mode waveguides (ZMWs). ZMWs are chambers of picolitre volumes. Light penetrates the lower 20-30 nm of SMRT Cells. The SMRTbell template and polymerase become immobilized on the bottom of the chamber.
During the sequencing reaction, fluorescently labelled deoxynucleoside triphosphates (dNTPs) are incorporated into the newly synthesized strand. As each fluorescent dNTP is held in the detection volume, a light pulse from the well excites the fluorophore. A camera detects the light emitted from the excited fluorophore and records the wavelength and the position of the incorporated base in the nascent strand. The DNA sequence is determined by the changing fluorescent emission that is recorded within each ZMW.
In nanopore sequencing, a long DNA strand is tagged with sequencing adapters preloaded with a motor protein on one or both ends. The DNA is combined with tethering proteins and loaded onto the flow cell for sequencing. The flow cell contains protein nanopores embedded in a synthetic membrane. The tethering proteins bring the molecules to be sequenced towards the nanopores and, as the motor protein unwinds the DNA, an electric current is applied, which drives the negatively charged DNA through the pore. The DNA is sequenced as it passes through the pore and causes characteristic changes in the current.
Long-read sequencing can sequence at least about 500 or at least about 600 bases. Particularly, long-read sequencing sequences at least 800, at least 1000, at least 1200, at least 1400, at least 1600, at least 1800, at least 2000, at least 2500, or at least 3,000 bases of the amplified products. Thus, the long-read sequence can be used to sequence a target mRNA of at least 500 to at least 3,000 bases in length.
TRANSCRIPTOME ANALYSIS
In some cases, the method comprises further sequencing the transcriptomes of the genetically modified cells on the basis of single cells or sets of cells. To that end, the method comprises conducting short-read sequencing of the transcriptome on the basis of single cells or sets of cells. Thus, mRNA is isolated from the single cells or sets of cells and analyzed via transcriptome analysis by short-read sequencing.
In certain such cases, sequencing the transcriptomes comprises:
(i) separating single cells or sets of cells from the genetically modified cells,
(ii) reverse transcribing the transcriptomes from the genetically modified cells or sets of cells separated in step (i) to produce cDNAs of the transcriptomes, wherein the primers used for the reverse transcription comprise a unique barcode on the basis of single cells or sets of cells,
(iii) pooling the cDNAs from the transcriptomes of the genetically modified cells or sets of cells,
(iv) amplifying the cDNAs and sequencing the amplification products of the cDNAs of the transcriptomes, and
(v) depending on the unique barcodes in the amplification products produced in step (iv), quantifying the transcriptomes from the genetically modified cells on the basis of single cells or sets of cells.
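The barcode-dependent quantification in the last step of the list above can be sketched as a simple tally. In this illustration, the function name is hypothetical, and each short read is assumed to have already been assigned both a barcode and a gene by upstream processing that is not shown.

```python
from collections import Counter, defaultdict


def count_transcripts(assigned_reads) -> dict:
    """Tally reads per gene for each barcode.

    `assigned_reads` is an iterable of (barcode, gene_name) pairs, one per read;
    this input format is an assumption made for illustration.
    """
    counts = defaultdict(Counter)
    for barcode, gene in assigned_reads:
        counts[barcode][gene] += 1
    return {barcode: dict(gene_counts) for barcode, gene_counts in counts.items()}
```

The resulting per-barcode counts correspond to one row per cell or set of cells in an expression matrix. Amplification duplicates are not collapsed in this sketch.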
The reverse transcribing of the transcriptomes can be performed using primers comprising: 1) random nucleotide sequences, for example, random hexamers, or 2) an oligo-dT sequence. The primers can have a unique barcode on the basis of single cells or sets of cells.
In some cases, the same barcode can be used in long-read sequencing of mRNA sequencing of the target gene from a cell or set of cells and short-read sequencing of the transcriptome of the cell or the set of cells. Thus, the same barcode could be used to attribute a cDNA sequence and the transcriptome sequence to a cell or a set of cells.
The transcriptome from the single cells or sets of cells can be sequenced using short-read sequencing. In some cases, the short-read sequencing comprises paired-end sequencing. Certain details of short-read sequencing are also described by the Logsdon et al. (2020) publication.
EFFICACY ANALYSIS OF METHODS OF GENETIC MODIFICATIONS
In some cases, the disclosure provides a method of determining efficacy of a molecular tool for editing the target gene. Certain such methods comprise methods of analyzing cells as disclosed herein and further comparing, on the basis of single cells or sets of cells, the observed modification to the target gene with the modification expected in the target gene. Based on the number of single cells or the number of sets of cells that exhibit the desired modification as compared to the total number of genetically modified cells or sets of cells, the efficacy of the molecular tool for producing the genetic modification can be determined.
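The ratio described above can be written as a short calculation. The function name and the input format, a mapping from each barcode to whether the desired modification was observed, are assumptions made for illustration.

```python
def editing_efficacy(desired_edit_by_barcode: dict) -> float:
    """Return the percentage of cells (or sets of cells) with the desired edit.

    `desired_edit_by_barcode` maps each barcode to True if the desired
    modification was observed for that cell or set of cells, else False.
    """
    if not desired_edit_by_barcode:
        raise ValueError("no cells or sets of cells were analyzed")
    with_edit = sum(1 for has_edit in desired_edit_by_barcode.values() if has_edit)
    return 100.0 * with_edit / len(desired_edit_by_barcode)
```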
For example, if 100 cells are genetically modified by CRISPR-mediated gene editing using a specific gRNA and 40 cells out of the 100 cells exhibit the desired genetic edit, then the efficacy of the specific gRNA to produce the desired genetic edit is 40%.
MODIFYING AND TESTING MULTIPLE TARGET GENES
In certain cases, the methods disclosed herein involve editing one or more target genes. Certain such methods comprise methods of analyzing cells as disclosed herein and further comparing:
(a) editing one or more additional target genes in the population of cells to produce the population of genetically modified cells;
(b) on the basis of single cells or sets of cells, sequencing substantially the entirety of the coding sequences of the mRNAs encoded by the one or more additional target genes from the genetically modified cells to identify one or more modifications in the one or more additional target genes; and
(c) on the basis of single cells or the sets of cells, comparing the identified modification to the one or more additional target genes to the modifications expected in the one or more additional target genes.
Like the methods of analyzing a target gene disclosed herein, the mRNA encoded by one or more additional target genes can be analyzed on the basis of single cells or sets of cells.
While amplifying the mRNAs encoded by one or more additional target genes, the same barcode can be incorporated in the cDNAs produced from one cell or one set of cells. Thus, multiple reverse transcriptase primers can be designed, each primer directed at producing cDNA from a different target mRNA but all reverse transcriptase primers having one barcode. Thus, all cDNAs from a single cell or a set of cells contain the same barcode.
Thus, in some cases, step (b) can be performed by:
(i) separating single cells or sets of cells from the genetically modified cells,
(ii) reverse transcribing the mRNAs encoded by the one or more additional target genes from the cells or sets of cells separated in step (i) to produce cDNAs for the one or more additional target genes, wherein the primer for the reverse transcription of the mRNAs comprises a unique barcode on the basis of single cells or sets of cells,
(iii) pooling the cDNAs from the genetically modified cells or sets of cells,
(iv) amplifying the cDNAs for the one or more additional target genes and sequencing the amplification products, and
(v) depending on the unique barcodes in the amplification products, identifying the modifications in the mRNAs for the one or more additional target genes on the basis of single cells or sets of cells.
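As a non-limiting sketch of the final step of this list, pooled sequencing results can be regrouped by barcode and compared, gene by gene, with the expected modification. The record layout (barcode, gene, observed modification), the modification labels, and the function names below are assumptions made for illustration; the majority-vote rule is likewise only one possible way to summarize multiple reads from one cell.

```python
from collections import Counter, defaultdict


def group_by_barcode(reads):
    """Group (barcode, gene, modification) records by barcode and then by gene."""
    per_cell = defaultdict(lambda: defaultdict(list))
    for barcode, gene, modification in reads:
        per_cell[barcode][gene].append(modification)
    return {bc: dict(genes) for bc, genes in per_cell.items()}


def compare_to_expected(per_cell, expected):
    """For each barcode and gene, report whether the most frequent observed
    modification equals the expected modification for that gene."""
    report = {}
    for barcode, genes in per_cell.items():
        report[barcode] = {}
        for gene, observed in genes.items():
            most_common = Counter(observed).most_common(1)[0][0]
            report[barcode][gene] = most_common == expected.get(gene)
    return report


# Hypothetical pooled reads carrying cell barcodes
reads = [
    ("AAAC", "RACK1", "exon5_skip"),
    ("AAAC", "RACK1", "exon5_skip"),
    ("AAAC", "PTPRC", "none"),
    ("TTTG", "RACK1", "none"),
]
expected = {"RACK1": "exon5_skip", "PTPRC": "exon4_skip"}
report = compare_to_expected(group_by_barcode(reads), expected)
print(report)
```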
The methods described above for sequencing cDNAs for a target gene can be similarly applied for sequencing cDNAs for the one or more additional target genes. Particularly, the mRNA for the one or more additional target genes can be sequenced using long-read sequencing. Certain details of long-read sequencing are described above and such are also applicable to sequencing one or more additional target genes.
KITS
Also provided are kits having one or more components and/or reagents and/or devices, where applicable, for practicing one or more of the above-described methods. The subject kits may vary greatly. Kits of interest include those having one or more reagents mentioned herein, and associated devices where applicable, with respect to the steps of:
(a) editing a target gene in a population of cells to produce genetically modified cells;
(b) on the basis of single cells or sets of cells, sequencing substantially the entirety of the coding sequence of the mRNA encoded by the target gene from the genetically modified cells or sets of cells to identify a modification in the target gene; and
(c) on the basis of single cells or sets of cells, comparing the identified modification in the target gene with the modification expected in step (a).
Kits may include certain combinations of components in a single reaction vessel. Kits may include different components in different vessels.
In some cases, a kit comprises: reagents for genetic modifications in a target gene, such as CRISPR-Cas9 and gRNA; transfection reagents, cells or cell lines, media for culturing the cells, reverse transcription primers, primer pairs for amplification of cDNAs, reagents for sequencing, etc.
A person of ordinary skill in the art can readily design a kit according to the details of the methods disclosed above and such embodiments are within the purview of the invention.
UTILITY
The methods described in this disclosure find use in a variety of applications. Applications of interest include, but are not limited to: research applications and therapeutic applications. Methods of the invention find use in a variety of different applications including any convenient application where identifying effects of genetic modifications, e.g., CRISPR-mediated genomic editing is desired.
For example, the method finds particular use in analyzing and/or engineering therapeutic cells, e.g., genetically engineered cells that are destined for therapeutic use, e.g., stem cells or immune cells. For example, the method may be used to analyze knockouts and/or modifications in T cells or natural killer (NK) cells. In some embodiments the method may be used to analyze therapeutic cells that have been modified by CRISPR editing to be allogeneic. In other embodiments, the method may be used to analyze immune cells that have a knockout in an immune checkpoint inhibitor such as PD1, CTLA-4, TIM-3, VISTA, LAG-3, IDO or KIR, etc., that have a knockout in an endogenous receptor such as a knockout in TRAC or TRBC, etc., or that have a CRISPR-mediated edit that modifies the expression of a cytokine or other inflammatory molecule or a component of a signal transduction pathway, etc. In some embodiments, the cells being analyzed may be primary immune cells, or they may be expanded primary immune cells.
MATERIALS AND METHODS
Cell culture conditions
HEK293T cells and Cas9-stable HEK293T cells were maintained in Dulbecco's Modified Eagle Medium (DMEM) supplemented with 10% Fetal Bovine Serum (FBS). Jurkat (ATCC TIB-152) and Cas9-stable Jurkat cells were maintained in Roswell Park Memorial Institute (RPMI) 1640 Medium supplemented with 10% FBS.
Transfection and electroporation condition
We used 1.2 x 10^6 HEK293T cells for transfection with the Cas9 expression plasmid (1000 ng) and sgRNA plasmids (1000 ng). For electroporation, 2 x 10^5 Jurkat cells were used. To form the RNP complex, 1250 ng of the Cas9 protein and 585 ng of sgRNA were incubated at room temperature for 10 minutes before the electroporation. RNP was added to Jurkat cells and electroporated using the Neon electroporation system. Electroporated cells were transferred to a 24-well plate containing the culture medium. Six days after transfection or electroporation, cells were harvested. For determining the CRISPR edit in one cell, the individual cells were processed to extract their RNA. From each cell, cDNA was prepared with a cell indexing barcode added to either the 5’ or 3’ end.
Long-read sequencing
From the single-cell full length cDNA, 25 ng were used to amplify transcripts. Primer sequences: Partial_R1 - CTACACGACGCTCTTCCGATCT (SEQ ID NO: 1), RACK1_ex8 - ACACTCGCACCAGGTTGTCCG (SEQ ID NO: 2), PTPRC_ex7 - CCAGAAGGGCTCAGAGTGGT (SEQ ID NO: 3), MYL6_ex7 - ACACAGGGAAAGGCACGGACTCTGG (SEQ ID NO: 4). Taq polymerase was used for amplification. Libraries were prepared with 900 fmol of each amplicon for MinION flow cell FLO-MIN106D (Oxford Nanopore Technologies, Oxford, UK) or 80 fmol for Flongle flow cell FLO-FLG001 (Oxford Nanopore Technologies) using Native Barcoding Expansion and Ligation Sequencing Kit (Oxford Nanopore Technologies) as per the manufacturer’s protocol. Libraries were sequenced on MinION over 48 h.
Lentiviral guide RNA library production
The oligonucleotide pool for guide RNA library cloning was synthesized. Amplified guide RNA cassettes were cloned into a plasmid for expression.
Lentivirus production
2.0 x 10^6 HEK293T cells were plated 24 h prior to transfection. Cells were transfected with the lentiviral sgRNA library (2000 ng), psPAX2 (1500 ng, Addgene plasmid #12260) and pMD2.G (500 ng, Addgene plasmid #12259) using a lipofectant agent. The viral supernatant was collected 48 h after transfection, filtered through a 0.45 µm filter, and used.
Lentivirus transduction
To 1.0 x 10^5 Cas9-stable Jurkat cells, the lentiviral supernatant and 8 µg of polybrene were added and the mixture was centrifuged at 800g for 30 minutes at 32 degrees. After that, cell pellets were resuspended in fresh media and plated in a 6-well plate. After 72 hours, transduced cells were selected by puromycin.
Single cell transcript isoform analysis
Short read transcripts: Basecalling for 5’ gene expression libraries was performed, followed by alignment to reference genome GRCh38 and transcript quantification. In preparation for integrated analysis, a cell transcript data matrix was processed by removing cells with fewer than 100 or more than 8000 genes and cells with more than 30% mitochondrial genes. Additionally, any genes present in 3 or fewer cells were removed. Dimension reduction was performed using principal component analysis and UMAP with 30 principal components and cluster resolution of 0.8.
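The cell and gene filters described above can be sketched as follows, for illustration only. The sketch assumes a nested dictionary of counts (cell barcode to gene to count), treats the 30% mitochondrial threshold as a fraction of counts from genes whose symbol starts with "MT-", and applies the cell filter before the gene filter; none of these choices is stated in the disclosure, and the function name is hypothetical.

```python
from collections import Counter


def filter_matrix(counts, min_genes=100, max_genes=8000,
                  max_mito_frac=0.30, min_cells_per_gene=4):
    """Apply cell-level and gene-level filters to a {cell: {gene: count}} matrix.

    Assumptions (not from the source): mitochondrial genes are recognized by
    the "MT-" prefix, and the mitochondrial percentage is computed on counts.
    """
    kept_cells = {}
    for cell, genes in counts.items():
        detected = {g: c for g, c in genes.items() if c > 0}
        n_genes = len(detected)
        if n_genes < min_genes or n_genes > max_genes:
            continue  # too few or too many detected genes
        total = sum(detected.values())
        mito = sum(c for g, c in detected.items() if g.startswith("MT-"))
        if mito / total > max_mito_frac:
            continue  # too large a mitochondrial fraction
        kept_cells[cell] = detected

    # Genes present in 3 or fewer cells are removed, i.e. keep genes seen in >= 4 cells
    cells_per_gene = Counter(g for genes in kept_cells.values() for g in genes)
    keep = {g for g, n in cells_per_gene.items() if n >= min_cells_per_gene}
    return {cell: {g: c for g, c in genes.items() if g in keep}
            for cell, genes in kept_cells.items()}
```

The defaults mirror the thresholds given in the text; smaller values can be passed when trying the function on a toy matrix.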
Long read transcripts: Basecalling was performed and reads were aligned to the reference genome GRCh38. A custom python script was used to identify reads which span exon 1 through exon 8 for RACK1, exon 3 through exon 7 for PTPRC, or exon 5 through exon 7 for MYL6. For each read, exons which were present in the transcript were identified and recorded: for example 1234-678 indicated a RACK1 transcript which skips exon 5.
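The exon-pattern notation in this paragraph can be illustrated with a short sketch. This is not the custom script referred to above; the function names are hypothetical, and the sketch assumes that each missing exon is written as a single "-" and that exon numbers are single digits (as for RACK1, which has eight exons).

```python
def spans_target(present_exons, first_exon, last_exon):
    """True if a read covers both the first and the last exon of the target range."""
    return first_exon in present_exons and last_exon in present_exons


def exon_pattern(present_exons, first_exon, last_exon):
    """Encode the exons seen in a read, writing '-' for each exon that is absent."""
    return "".join(
        str(exon) if exon in present_exons else "-"
        for exon in range(first_exon, last_exon + 1)
    )


# A RACK1 read containing every exon except exon 5
read_exons = {1, 2, 3, 4, 6, 7, 8}
if spans_target(read_exons, 1, 8):
    print(exon_pattern(read_exons, 1, 8))  # prints 1234-678
```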
Integration of long and short reads: Using knowledge of the adapter/cell indexing barcode/unique molecular barcode structure in the reads as well as the valid single cell indexing barcodes from short-read sequencing, the putative long-read barcode is identified by evaluating the soft clipped portions of the aligned long reads, which are extracted using a custom python script. A second custom python script used a machine-learning approach to identify the barcode. First, the list of valid short-read barcodes was vectorized using a k-mer length of 8 to create a reference list. The 5’ soft-clipped region of each read was then vectorized in the same way and compared to the created reference using a cosine similarity metric. Similarly, the 3’ soft-clipped region of each read was evaluated by matching the reverse-complement of the soft-clipped sequence to the reference list.
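A minimal sketch of the k-mer vectorization and cosine-similarity comparison is given below for illustration. It is not the custom script itself: the function names, the use of simple k-mer count vectors, and the example sequences are assumptions. A 3’ soft-clipped sequence would first be passed through reverse_complement and then ranked in the same way.

```python
from collections import Counter
from math import sqrt

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse-complement a DNA sequence (used for 3' soft-clipped regions)."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_vector(seq, k=8):
    """Count every overlapping k-mer in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine(a, b):
    """Cosine similarity between two k-mer count vectors."""
    dot = sum(count * b[kmer] for kmer, count in a.items() if kmer in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_barcodes(soft_clip, whitelist, k=8, top_n=5):
    """Return up to top_n (score, barcode) pairs with nonzero similarity, best first."""
    clip_vec = kmer_vector(soft_clip, k)
    scored = [(cosine(clip_vec, kmer_vector(bc, k)), bc) for bc in whitelist]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


# Hypothetical whitelist and 5' soft clip (adaptor followed by a barcode)
whitelist = ["ACGTACGTACGTACGT", "TTTTGGGGCCCCAAAA"]
soft_clip = "CTACACGACGCTCTTCCGATCT" + "TTTTGGGGCCCCAAAA"
ranked = rank_barcodes(soft_clip, whitelist)
print(ranked)
```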
For each read, the barcodes with the highest cosine similarity to the reference list were evaluated further. Edit distance to the reference barcode was calculated for each 16bp window across the search sequence for the top five barcodes with a nonzero cosine similarity score. The barcode with the lowest edit distance (and in cases of a tie, the highest cosine similarity score) was selected for final filtering. If the sequenced cDNA barcode had an edit distance < 3 from the reference short-read barcode it was considered a matching barcode; otherwise the read was not considered a match to any of the short-read barcodes and was excluded from further integrated analysis. Output from this script was then summarized using awk and bash commands to provide transcript counts per isoform exon pattern identified from the initial long read processing for each barcode. A clustered heatmap was generated showing the proportion of each isoform per cell indexing barcode.
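The windowed edit-distance step can be sketched as follows, again for illustration only. The sketch uses a standard Levenshtein distance; whether the custom script used this exact distance is an assumption, as are the function names and the input format (a list of (cosine score, barcode) candidates).

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def best_window_distance(search_seq, barcode):
    """Lowest edit distance between the barcode and any barcode-length window."""
    width = len(barcode)
    if len(search_seq) < width:
        return edit_distance(search_seq, barcode)
    return min(edit_distance(search_seq[i:i + width], barcode)
               for i in range(len(search_seq) - width + 1))


def assign_barcode(search_seq, candidates, max_distance=2):
    """Pick the candidate with the lowest edit distance, breaking ties by the
    higher cosine score; return None unless the distance is < 3 (i.e. <= 2)."""
    if not candidates:
        return None
    scored = [(best_window_distance(search_seq, bc), -score, bc)
              for score, bc in candidates]
    distance, _, barcode = min(scored)
    return barcode if distance <= max_distance else None


# Hypothetical soft clip whose barcode carries one substitution
clip = "GATCT" + "TTTTGGGGCCCAAAAA"
candidates = [(0.5, "TTTTGGGGCCCCAAAA"), (0.1, "ACGTACGTACGTACGT")]
print(assign_barcode(clip, candidates))
```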
Guide RNA in-vitro transcription
The sgRNA was in-vitro transcribed by T7 RNA polymerase. Templates for sgRNA were generated by extension of two complementary oligonucleotides. Transcribed RNA was purified by column purification. Purified RNA was quantified by fluorimetry.
Jurkat cell stimulation
2 x 10^6 Jurkat cells were stimulated using half of the recommended concentration in a 6-well plate for 24 h. Stimulated cells were subjected to cDNA preparation for single cells.
The following example(s) is/are offered by way of illustration and not by way of limitation.
EXAMPLES
The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the present invention, and are not intended to limit the scope of what the inventors regard as their invention nor are they intended to represent that the experiments below are all or the only experiments performed. Efforts have been made to ensure accuracy with respect to numbers used (e.g. amounts, temperature, etc.) but some experimental errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by weight, molecular weight is weight average molecular weight, temperature is in degrees Centigrade, and pressure is at or near atmospheric.
General methods in molecular and cellular biochemistry can be found in such standard textbooks as Molecular Cloning: A Laboratory Manual, 3rd Ed. (Sambrook et al., Harbor Laboratory Press 2001); Short Protocols in Molecular Biology, 4th Ed. (Ausubel et al. eds., John Wiley & Sons 1999); Protein Methods (Bollag et al., John Wiley & Sons 1996); Nonviral Vectors for Gene Therapy (Wagner et al. eds., Academic Press 1999); Viral Vectors (Kaplitt & Loewy eds., Academic Press 1995); Immunology Methods Manual (I. Lefkovits ed., Academic Press 1997); and Cell and Tissue Culture: Laboratory Procedures in Biotechnology (Doyle & Griffiths, John Wiley & Sons 1998), the disclosures of which are incorporated herein by reference. Reagents, cloning vectors, cells, and kits for methods referred to in, or related to, this disclosure are available from commercial vendors such as BioRad, Agilent Technologies, Thermo Fisher Scientific, Sigma-Aldrich, New England Biolabs (NEB), Takara Bio USA, Inc., and the like, as well as repositories such as, e.g., Addgene, Inc., American Type Culture Collection (ATCC), and the like.
Example 1: Identifying CRISPR edits using long reads that cover mRNA - full length cDNA from individual cells
As a proof-of-concept demonstration, we designed and conducted an experiment where we used CRISPR to target the RACK1 gene. We used Cas9 and a guide RNA that modifies the splicing acceptor of exon 5 (FIG. 2B). This gene encodes a receptor for activated C kinase 1. It plays a role in intracellular protein shuffling and anchoring. This gene was chosen because it is one of the most highly expressed genes in the human embryonic kidney 293T (HEK293T) cell line. In addition, it has eight exons that could be targeted with CRISPR. We selected a guide RNA sequence for the exon 5 target and generated a plasmid construct expressing both Cas9 and target guide RNA. This plasmid was transfected into HEK293T cells.
After six days, we prepared cDNAs with cell indexing barcodes at the 5’ end. The cell indexing barcode consists of a DNA sequence that is specifically added to the cDNA extracted from an individual cell or set of cells (FIG. 1). When single cells are processed, each cell indexing barcode represents only one cell. In the case of populations of cells, the cell indexing barcode represents a group of cells (two or more).
From these barcoded cDNAs, the RACK1 transcript was amplified using two primers that included sequence from the 5’ adaptor for the sequencing library and the last exon of RACK1. The amplified full-length cDNA underwent nanopore sequencing (Oxford Nanopore), which generates long reads. We performed base calling and aligned the long reads to the reference genome, GRCh38. Because the full length cDNAs had cell indexing barcodes at the 5’ end, these barcodes enable the assignment of a long read covering a transcript to its cell or cells of origin.
After aligning the long reads to the reference genome, a portion of each long-read sequence did not align to the human genome. This non-aligning sequence represents the cell indexing barcode, which is not found in the human genome sequence. We examined this part of the sequence to identify matches to the whitelist of cell indexing barcodes identified in short-read single cell sequencing. For the first step, the soft-clipped sequence and the whitelist of barcodes are vectorized using 8-mers, i.e., sequences of eight bps. The frequency of these short sequence tracts (i.e. k-mers) was determined from a whitelist of predesignated barcodes representing the ground truth. Using a cosine similarity metric, we compared the whitelist cell indexing barcodes with the frequency of occurrence in the soft-clipped sequence from the long reads. The five barcodes with the highest cosine similarity score were evaluated, with the remainder having a score of zero discarded. The edit distance between each 16bp region of the soft-clipped sequence and the whitelist barcode was calculated. A valid match between the two barcodes was based on an edit distance of two or less.
The long sequence read covers the entire mRNA sequence. Therefore a specific genotype edit can be identified at any location in the transcript and still be linked to the cell index barcode at the 5’ or 3’ end. Using long read sequencing of target cDNAs, we characterized the sequence of the mRNA in each individual cell. This analysis identified the different RACK1 transcript isoforms as well as the cell indexing barcodes identified the individual cell from which the mRNA originated. Then, we aggregated reads per cell indexing barcode and used hierarchical clustering to determine the distribution of different cDNA sequences, representing the full length mRNA, among cell subpopulations.
In our analysis of the wildtype cells that did not undergo CRISPR transduction, we determined that 5030 (90%) out of 5074 cells had full length RACK1 mRNA. This result demonstrated how long-read analysis combined with sequencing full length cDNA (mRNA transcripts) characterizes the CRISPR alteration of a target gene.
From the cells that were transduced with CRISPR, 6028 cells among 7548 cells had more than 33% of their RACK1 mRNA transcripts lacking exon 5, thus confirming that these specific cells had the CRISPR genome edit (FIG. 2C). We performed hierarchical clustering based on the expression levels of the different transcript isoforms present in each cell. A subset of cells (Cluster 1) contained CRISPR edits in the splicing acceptor from all 3 alleles of RACK1. Some cells had predominantly full transcripts (1234 out of 7548 cells, 82% full length transcript on average), which delineated the cell population that did not have a CRISPR edit, as grouped in Cluster 2. Using this approach, we identified the individual cells with the CRISPR edit based on the changes in the mRNA transcript sequence.
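For illustration, the per-cell call described in this example (more than 33% of a cell's RACK1 transcripts lacking exon 5) can be sketched as below. The input format, a list of (cell indexing barcode, exon pattern) pairs using patterns such as 1234-678, and the function names are assumptions; the sketch also relies on single-digit exon labels.

```python
from collections import Counter, defaultdict


def isoform_proportions(assigned_reads):
    """Proportion of each exon pattern per cell indexing barcode."""
    counts = defaultdict(Counter)
    for barcode, pattern in assigned_reads:
        counts[barcode][pattern] += 1
    return {
        barcode: {p: n / sum(c.values()) for p, n in c.items()}
        for barcode, c in counts.items()
    }


def edited_cells(proportions, skipped_exon="5", threshold=0.33):
    """Barcodes whose fraction of transcripts lacking the exon exceeds the threshold."""
    edited = set()
    for barcode, props in proportions.items():
        lacking = sum(frac for pattern, frac in props.items()
                      if skipped_exon not in pattern)
        if lacking > threshold:
            edited.add(barcode)
    return edited


# Hypothetical barcode-assigned long reads
reads = ([("c1", "1234-678")] * 2 + [("c1", "12345678")] * 2
         + [("c2", "12345678")] * 3 + [("c2", "1234-678")])
props = isoform_proportions(reads)
print(edited_cells(props))  # prints {'c1'}
```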
Example 2 - Multiplexed CRISPR assay
Next, we designed a multiplexed CRISPR assay that targeted multiple exons in RACK1. This experiment involved selecting guide RNAs that targeted splicing acceptor sites of RACK1 exons 2-7. Guide RNA sequences were subcloned into a plasmid. We transfected a pooled plasmid library containing all guide RNAs into HEK293T cells with stable Cas9 expression (Methods). After transducing the guide RNA library into cells, we incubated the cells and then harvested them. The cells were processed to generate a cDNA library with cell-specific barcodes. For this case, each cell indexing barcode represented a single cell. The cDNA with cell indexing barcodes was prepared for long read sequencing. The long read sequence covers the entire mRNA sequence and, therefore, a specific genotype edit can be found at any location in the transcript and still be linked to the cell index barcode at the 5’ or 3’ end. From the reads that covered the entire RACK1 cDNA, we determined the structure of the transcript and the composition of the cell indexing barcode.
Our results indicated that the multiplexed CRISPR guide RNAs generated edits leading to different exon-skipping events across different cells (FIG. 2D). We identified which cells had a specific guide RNA based on the changes in the mRNA sequence. For example, we determined that cells in Cluster 3 predominantly expressed full length RACK1 transcripts (83% full length transcripts per cell on average), thus indicating that this subpopulation had not undergone genome editing. Overall, we used this information to subset the cells into ones which had received a specific guide RNA.
We linked the long-read data indicating which cells had the CRISPR edit to the matching cells with additional genomic information such as gene expression. Specifically, the different CRISPR-generated RACK1 isoforms change the gene expression for a given cell or set of cells. For this analysis, we integrated the long-read sequencing of RACK1 cDNA with short-read sequencing from cDNA with the cell indexing barcodes for individual cells. This sequence linkage can use any type of single cell library process where one matches the cell indexing barcode sequences between the two different sequence data sets (single cell long read and single cell short read) that come from the same cell population. Then, we compared short-read gene expression profiles among the RACK1 isoform clusters for cells which had undergone multiplexed CRISPR editing (FIG. 2D).
First, using the short-read data for gene expression from each cell, we compared the expression levels of RACK1 between the non-edited cells versus RACK1 edited cells. As expected, RACK1 expression among the non-edited cells (Cluster 3) was significantly higher than CRISPR-edited cells (6.59 fold change, P = 1.85e-9) (FIG. 2E). Next, we compared RACK1 expression from each cluster of cells having RACK1 exon 5 CRISPR edit. The non-edited cells (cluster 2) had 1.53 fold higher (P = 1.04e-194) RACK1 expression level than the cells in other clusters (FIG. 2C). These experiments demonstrated that this approach enables the direct characterization of CRISPR edits based on the specific target changes in the mRNA sequence which include the overall level of expression for a given target gene.
Different splicing factors influence alternative splicing events. We leveraged this aspect of isoform regulation to demonstrate another application of our approach. Namely, we determined how different splicing factors affect the mRNA sequence. This mechanism of isoform generation is particularly crucial for cells involved with the immune system, where increased functional flexibility is paramount. For this experiment, we chose to study the PTPRC gene. Expressed in T cells, PTPRC is a transmembrane phosphatase - its pre-mRNA alternative splicing is critical for changing T cell regulatory states. PTPRC has five highly expressed isoforms. These include two short ones where there is a substantial degree of exon loss and longer isoforms where the majority of exons from the variable region are retained. For example, CD4-CD8-double negative T cells and NK precursor cells preferentially express longer isoforms like RABC and RBC and, when activated, T cells and NK cells preferentially express shorter isoforms like RO and RB.
We selected a series of guide RNAs targeting a set of 16 splicing factor genes (Table 1). For these experiments, we used the human Jurkat cell line, which is derived from a T cell leukemia and stably expressed Cas9. We chose splicing factors which are expressed and play a critical role in pre-mRNA processing in Jurkat cells [1, 16, 17]. We integrated this multiplexed CRISPR assay with cDNA libraries containing cell indexing barcodes for long-read sequencing. With long read sequence, we determined how each splicing factor contributed to changes in PTPRC’s transcript isoform structure. We transduced a guide RNA lentiviral library targeting 16 splicing factors (two guide RNAs per gene) and five non-targeting guide RNAs as negative controls (Table 1). After 14 days, we harvested the cells, generated single cell libraries and conducted sequencing with both short and targeted long reads of PTPRC.
Table 1
Figure imgf000034_0001
Figure imgf000035_0001
Figure imgf000036_0001
To identify whether the target gene had been subject to a CRISPR edit, we sequenced the full-length cDNA with long reads. This used the same process as previously described. For each target we identified the specific CRISPR edit genotype as well as identifying the specific cells containing the edit (data not shown). For example, the gene SRSF5 was targeted with a Cas9-guide RNA which generated a deletion within an internal exon. The long read sequence showed which cells had the specific edit of this gene while cells without the gRNA did not have this edit.
To examine the downstream effects of knocking out these splice factor genes, we used long read sequencing of the full-length cDNA to determine the structure of PTPRC transcripts across the individual cells. Similar to what we described previously, we characterized each cell’s RNA-seq profile with short read data, and the sequence of the target gene and its mRNA (for CRISPR) and the downstream PTPRC mRNA (downstream gene) using long reads. For the PTPRC gene, the isoform nomenclature includes the shorter ones, RO and RB, which are highly expressed in naive or activated T-cells. The longer ones are referred to as RAB, RBC and RABC.
We determined the expression and relative ratios of the five most abundant PTPRC transcript isoforms for these cells using long-read sequencing. For each long read, its isoform structure was identified through alignment with the exons, and the cell indexing barcodes were identified as previously described. Among the different cells with the CRISPR gene targets versus the negative control, we compared the average ratio of the short PTPRC isoform category and the fold change differences (FIG. 3B). The knock-out of HNRNPLL and SRSF5 reduced PTPRC exon skipping events resulting in lower RO and RB abundance (2.01-fold change and 1.18-fold change respectively). In comparison, the knock-out of PCBP2 and HNRNPD increased exon skipping events; this resulted in higher RO and RB abundance (1.15-fold change and 1.16-fold change respectively). Therefore, HNRNPLL and SRSF5 induce exon skipping of PTPRC and PCBP2 and HNRNPD inhibit exon skipping. Although HNRNPLL and SRSF5 knock-outs inhibited PTPRC exon skipping, their isoform expression patterns were significantly different (data not shown). The ratio of RBC and RABC isoforms was higher in the knock-outs than in non-targeted cells. The ratio of RBC was comparable between the two knock-outs but the ratio of RABC was much higher for the HNRNPLL knock-out (5.60-fold change, P = 5.90e-37). The consequences of PCBP2 and HNRNPD knock-outs were nearly equivalent with respect to the expression of the RB isoform (P = 0.1). However the PCBP2 knock-out had greater expression of the RO isoform (1.20-fold change, P = 0.017) (data not shown).
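One way such a comparison could be computed is sketched below for illustration. The disclosure does not state how the ratios or fold changes were calculated, so the definitions used here (per-cell ratio of RO plus RB counts to all PTPRC isoform counts, averaged per group, with the fold change expressed as control over knock-out) are assumptions, as are the function names and the toy counts.

```python
from statistics import mean

SHORT_ISOFORMS = {"RO", "RB"}


def short_isoform_ratio(isoform_counts):
    """Fraction of a cell's PTPRC reads assigned to the short isoforms."""
    total = sum(isoform_counts.values())
    short = sum(n for iso, n in isoform_counts.items() if iso in SHORT_ISOFORMS)
    return short / total if total else 0.0


def fold_change(control_cells, knockout_cells):
    """Mean short-isoform ratio of control cells divided by that of knock-out cells.

    A value above 1 means the knock-out lowered RO/RB abundance.
    """
    control = mean(short_isoform_ratio(c) for c in control_cells)
    knockout = mean(short_isoform_ratio(c) for c in knockout_cells)
    return control / knockout


# Hypothetical per-cell isoform read counts
control_cells = [{"RO": 8, "RABC": 2}]
knockout_cells = [{"RO": 4, "RABC": 6}]
print(fold_change(control_cells, knockout_cells))
```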
From our review of the results of the 16 gene analysis, CRISPR edits of HNRNPLL had the most significant effect on PTPRC isoform regulation. We compared the cells expressing the nontarget-1 guide RNA versus the cells expressing the HNRNPLL-1 guide RNA (FIG. 3C). In contrast to the cells with nontarget-1 guide RNA, the cells expressing the HNRNPLL-1 guide RNA had significantly different expression patterns and ratios of PTPRC isoforms. These cells had a relatively lower expression of RO and RB isoforms (2.45-fold, P = 1.16e-32).
To verify this result, we selected the HNRNPLL gene for a single knock-out experiment. We used an electroporation approach to introduce the Cas9 ribonucleoprotein (RNP) and the guide RNA into Jurkat cells (Methods). Among the HNRNPLL alleles of electroporated cells, about 84% possessed CRISPR-induced mutations based on genomic DNA genotyping. Six days after electroporation, we prepared single cell cDNA libraries from the wild-type Jurkat cells and KO pool cells. Subsequently, we performed single cell short-read sequencing to enumerate RNA expression and long-read sequencing to determine the PTPRC isoforms per cell. Similar to the previous single-cell CRISPR screen result, wild-type cells demonstrated abbreviated transcripts (i.e. RO, RB) while HNRNPLL RNP-treated cells expressed the longer transcript isoforms (i.e. RABC, RBC) (7.35-fold, P < 1.0e-5, FIGS. 3B and 3C). When comparing the RNP versus lentivirus based HNRNPLL guide RNAs, we observed similar PTPRC transcript structures and expression levels (FIGS. 3B and 3C).
This experimental system had an additional functional readout to assess PTPRC isoform expression. Exposing these cells to phorbol 12-myristate 13-acetate (PMA) and ionomycin induces Jurkat cells to express shorter isoforms of PTPRC. Taking advantage of this property, we challenged both CRISPR-treated and wildtype cells with this molecule combination then subsequently performed single-cell sequencing. Overall, our results showed that PMA and ionomycin stimulation increased the differences in isoform expression between the wild-type cells and HNRNPLL RNP-treated cells. Most of the stimulated wild-type cells had RO and RB transcript isoforms. However, the stimulated HNRNPLL RNP-treated cells had fewer RO and RB transcript isoforms (10.32-fold, P < 1.0e-5, FIGS. 3B and 3C). In addition to PTPRC, we analyzed the impact of splicing factors on myosin light chain 6 (MYL6) transcript isoforms. Exon 6 skipping of MYL6 is known to be regulated by various splicing factors. We found that CELF2 targeted cells had a higher percentage of full-length MYL6 transcript isoforms compared to cells targeted by other splicing factors (1.44-fold, P = 1.9e-14, not shown), indicating that disrupting CELF2 reduces the occurrence of exon-skipping.
Table 2.
Figure imgf000038_0001
Figure imgf000039_0001
Figure imgf000039_0002
* The number of reads depends on the available pores in Oxford Nanopore flowcells; the number of pores differs between flowcells.
** Reads not aligned to target genes are mostly derived from mispriming of gene specific primers.
*** We only used reads having both exon 1 & 8 of RACK1, exon 3 & 7 of PTPRC, or exon 5 & 7 of MYL6 for the analysis.
Table 3. List of gRNAs
Figure imgf000039_0003
Figure imgf000040_0001
Table 4. List of oligonucleotides for gRNA capture.
Figure imgf000040_0002
Table 4. Mutation rate detected from long-read sequencing for each gRNA target.
Figure imgf000040_0003
Figure imgf000041_0001
Example 3 - Base editing
(1) Mutations (T <-> C and A <-> G) that are suited for ABE and CBE enzymes were identified. The C to T transition is one of the most frequent mutations in the human genome. For example, among the 10 most frequently reported TP53 mutations in the COSMIC database, 9 mutations were transitions. Among this set, CBEs can be used to engineer eight while ABEs can engineer one. Using TP53 as an example, nine out of the ten mutations were viable candidates for base editors.
(2) The mutations that have the required sequence context for base editing were determined. The spCas9 base editor requires ‘NGG’ - this sequence is referred to as the protospacer adjacent motif (PAM). Starting from the first base of the protospacer, the sequence between the 3rd and 8th bases provides an optimal window for the current generation of base editors [10, 36]. Therefore, engineering in a cancer mutation requires the presence of the PAM segment. Among the nine examples of the TP53 substitution mutations, two can be targeted by the current NGG PAM base editors.
(3) Oligonucleotides with the gRNA sequence were synthesized and subcloned into plasmid or lentiviral vectors. For fewer than 100 gRNAs we will order single oligonucleotides. For larger sets, we will order oligonucleotides from array synthesis and include primer sequences to enable rapid subcloning into plasmid or lentiviral vectors.
(4) Base editors were applied to introduce mutations into a cell line. 10 gRNAs which target various TP53 mutations were designed. Using electroporation, a multiplexed plasmid pool with all gRNAs and CRISPR base editors was transduced into the colon cancer cell-line HCT 116. The cells underwent single cell cDNA generation and then were sequenced with both short- and long-read platforms.
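Steps (1) and (2) above can be illustrated with the following sketch, which is offered as an assumption-laden example rather than the procedure actually used. It assumes a 20-nucleotide protospacer followed immediately by an NGG PAM, searches only the strand that is passed in (a G-to-A or T-to-C change would be searched on the reverse complement), and uses hypothetical function names and a synthetic test sequence.

```python
# Transitions reachable by each editor class, including the opposite-strand view
CBE_CHANGES = {("C", "T"), ("G", "A")}
ABE_CHANGES = {("A", "G"), ("T", "C")}


def base_editor_for(ref, alt):
    """Return 'CBE', 'ABE', or None for a single-base substitution."""
    if (ref, alt) in CBE_CHANGES:
        return "CBE"
    if (ref, alt) in ABE_CHANGES:
        return "ABE"
    return None


def candidate_protospacers(seq, target_index, spacer_len=20, window=(3, 8)):
    """Find protospacers with an NGG PAM that place the target base in the window.

    target_index is 0-based in seq; the reported position is 1-based within the
    protospacer, counted from its first base.
    """
    hits = []
    for start in range(len(seq) - spacer_len - 3 + 1):
        pam = seq[start + spacer_len:start + spacer_len + 3]
        position = target_index - start + 1
        if pam[1:] == "GG" and window[0] <= position <= window[1]:
            hits.append((start, seq[start:start + spacer_len], pam, position))
    return hits


# Synthetic sequence: target C at index 5, NGG PAM right after a 20-nt protospacer
seq = "AAAAACAAAAAAAAAAAAAA" + "TGG" + "AAAA"
print(base_editor_for("C", "T"))
print(candidate_protospacers(seq, target_index=5))
```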
To identify genes which have the mutation, a targeted multiplexed enrichment was done using a single-primer extension method. For this process, primers were designed for the target transcripts and then synthesized. A linear single primer extension from the cDNA library increased the yield of the target while minimizing the generation of off-target sequences. The target product undergoes a 2nd DNA strand synthesis using DNA polymerase. Finally, the product is loaded on to a single molecule sequencer (Oxford Nanopore or Pacific Biosciences) and the target reads are analyzed. The mutation is identified within single cells using the cell barcode information.
The short-read sequencing provided the gene expression profile for each cell. Using the Oxford Nanopore system, we sequenced the TP53 cDNA transcript from the single-cell cDNA. Long- and short-read cell barcodes were matched, and then TP53 point mutations were identified from the long-read data. We identified each cell’s TP53 mutations and the matching gene expression signature. This workflow is illustrated in Fig. 6.
The cells were clustered according to their TP53 mutation genotype, and transcript profiles were compared between groups (Fig. 7). Fig. 7 shows how long- and short-read single-cell sequencing can be integrated to match the gene expression profiles from single cells with the mutation.
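By way of non-limiting illustration, the integration of the long-read genotypes with the short-read expression profiles may be sketched as matching cells on their shared barcodes and averaging expression within each genotype group. The sketch assumes per-cell genotype labels and per-cell gene counts are already available as dictionaries; the function name `mean_expression_by_genotype`, the gene names, and all values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_expression_by_genotype(genotypes, expression):
    """Average expression per gene within each genotype group.

    genotypes  -- {cell_barcode: genotype label} from the long-read data
    expression -- {cell_barcode: {gene: count}} from the short-read data
    Only barcodes present in both inputs are used.
    """
    shared = genotypes.keys() & expression.keys()
    grouped = defaultdict(lambda: defaultdict(list))
    for barcode in shared:
        for gene, value in expression[barcode].items():
            grouped[genotypes[barcode]][gene].append(value)
    return {
        label: {gene: mean(values) for gene, values in genes.items()}
        for label, genes in grouped.items()
    }


# Hypothetical data: cell c4 lacks expression data and c5 lacks a genotype.
example_genotypes = {"c1": "wildtype", "c2": "wildtype", "c3": "mutant", "c4": "mutant"}
example_expression = {
    "c1": {"GENE_A": 10, "GENE_B": 2},
    "c2": {"GENE_A": 12, "GENE_B": 4},
    "c3": {"GENE_A": 2, "GENE_B": 3},
    "c5": {"GENE_A": 100, "GENE_B": 100},
}
example_profile = mean_expression_by_genotype(example_genotypes, example_expression)
```

Cells whose barcode appears in only one of the two data sets are dropped, so each group average reflects only cells with both a genotype and an expression profile.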
The cells with a TP53 mutation showed distinct transcriptional patterns compared to the cells with wildtype TP53. This proof-of-concept study demonstrated that this technology enables high-throughput engineering of various cancer mutations into single cells and analysis of those mutations.

Claims

CLAIMS We claim:
1. A method for analyzing cells, comprising:
(a) editing a target gene in a population of cells to produce genetically modified cells;
(b) on the basis of single cells or sets of cells, sequencing substantially the entirety of the coding sequence of the mRNA encoded by the target gene from the genetically modified cells or sets of cells to identify a modification in the target gene; and
(c) on the basis of single cells or sets of cells, comparing the identified modification in the target gene with the modification expected in step (a).
2. The method of claim 1, wherein sequencing the mRNA encoded by the target gene from the genetically modified cells is performed on the basis of single cells.
3. The method of claim 1, wherein sequencing the mRNA encoded by the target gene from the genetically modified cells is performed on the basis of sets of cells.
4. The method of claim 1, wherein the editing of (a) is done using a base editor, resulting in a nucleotide substitution.
5. The method of any of claims 1 to 4, wherein step (b) is done by:
(i) separating single cells or sets of cells from the genetically modified cells,
(ii) reverse transcribing the mRNAs encoded by the target genes from the single cells or sets of cells separated in step (i) to produce cDNAs, wherein the primer for the reverse transcription of the mRNAs comprises a unique barcode on the basis of single cells or sets of cells,
(iii) pooling the cDNAs from the genetically modified single cells or sets of cells,
(iv) amplifying the cDNAs and sequencing the amplification products, and
(v) depending on the unique barcodes in the amplification products, identifying the modifications in the mRNAs on the basis of single cells or sets of cells.
6. The method of claim 5, wherein the cDNAs are attached to the beads and the method comprises in step (iii), pooling the beads.
7. The method of claim 5 or 6, wherein the step (i) of separating single cells or sets of cells from the genetically modified cells comprises separating the single cells or sets of cells in individual wells of a multi-well plate or separating the cells in individual droplets in an emulsion.
8. The method of any preceding claim, comprising: reverse transcribing the mRNAs encoded by the target genes into cDNAs using a primer that specifically amplifies substantially the entirety of the coding sequence of the mRNAs, wherein the primer for the reverse transcription of the mRNAs comprises a unique barcode, amplifying the cDNAs using a primer pair, and sequencing the amplified products.
9. The method of claim 8, wherein the primer pair for amplifying the cDNAs comprises: a first primer that hybridizes with the sequence at the 5’ end of the mRNA encoded by the target gene and a second primer that hybridizes with the sequence at the 3’ end of the mRNA encoded by the target gene, and wherein one or both of the primers in the primer pair comprise a unique barcode.
10. The method of any preceding claim, wherein sequencing the cDNA amplification products comprises long-read sequencing.
11. The method of claim 10, wherein the long-read sequencing comprises single molecule real time (SMRT) sequencing or nanopore sequencing.
12. The method of claim 11, wherein the SMRT sequencing comprises circular consensus sequencing or continuous long read sequencing.
13. The method of any preceding claim, further comprising sequencing the transcriptomes of the genetically modified cells on the basis of single cells or sets of cells.
14. The method of claim 13, wherein sequencing the transcriptomes comprises:
(i) separating single cells or sets of cells from the genetically modified cells,
(ii) reverse transcribing the transcriptomes from the genetically modified cells or sets of cells separated in step (i) to produce cDNAs of the transcriptomes, wherein the primers used for the reverse transcription comprise a unique barcode on the basis of single cells or sets of cells,
(iii) pooling the cDNAs from the transcriptomes of the genetically modified cells or sets of cells,
(iv) amplifying the cDNAs and sequencing the amplification products of the cDNAs of the transcriptomes, and
(v) depending on the unique barcodes in the amplification products produced in step (iv), quantifying the transcriptomes from the genetically modified cells on the basis of single cells or sets of cells.
15. The method of claim 14, comprising reverse transcribing the transcriptomes using primers comprising: 1) random nucleotide sequences or 2) oligo-dT sequence, wherein the primers have a unique barcode on the basis of single cells or sets of cells.
16. The method of claim 14 or 15, wherein sequencing the amplification products of the cDNAs of the transcriptomes comprises short read sequencing.
17. The method of claim 16, wherein the short-read sequencing comprises paired-end sequencing.
18. The method of any preceding claim, further comprising determining efficacy of editing the target gene by comparing, on the basis of single cells or sets of cells, the observed modification to the target gene with the modification expected in the target gene.
19. The method of any preceding claim, further comprising:
(a) editing one or more additional target genes in the population of cells to produce the population of genetically modified cells;
(b) on the basis of single cells or sets of cells, sequencing substantially the entirety of the coding sequences of the mRNAs encoded by the one or more additional target genes from the genetically modified cells to identify one or more modifications in the one or more additional target genes; and
(c) on the basis of single cells or the sets of cells, comparing the identified modification to the one or more additional target genes to the modifications expected in the one or more additional target genes.
20. The method of claim 19, wherein sequencing the mRNA encoded by the target gene from the genetically modified cells is performed on the basis of single cells.
21. The method of claim 19, wherein sequencing the mRNA encoded by the target gene from the genetically modified cells is performed on the basis of sets of cells.
22. The method of any of claims 19 to 21, wherein step (b) is done by:
(i) separating single cells or sets of cells from the genetically modified cells,
(ii) reverse transcribing the mRNAs encoded by the one or more additional target genes from the cells or sets of cells separated in step (i) to produce cDNAs for the one or more additional target genes, wherein the primer for the reverse transcription of the mRNAs comprises a unique barcode on the basis of single cells or sets of cells,
(iii) pooling the cDNAs from the genetically modified cells or sets of cells,
(iv) amplifying the cDNAs for the one or more additional target genes and sequencing the amplification products, and
(v) depending on the unique barcodes in the amplification products, identifying the modifications in the mRNAs for the one or more additional target genes on the basis of single cells or sets of cells.
23. The method of any of claims 19 to 22, further comprising sequencing the transcriptomes of the genetically modified cells on the basis of single cells or sets of cells.
24. The method of any of claims 19 to 23, wherein sequencing the amplification products of the cDNAs for the one or more additional target genes comprises long-read sequencing.
25. The method of claim 23 or 24, wherein sequencing the transcriptomes of the genetically modified cells comprises short-read sequencing.
PCT/US2022/034376 2021-06-24 2022-06-21 Detecting crispr genome modification on a cell-by-cell basis Ceased WO2022271725A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163214680P 2021-06-24 2021-06-24
US63/214,680 2021-06-24

Publications (1)

Publication Number Publication Date
WO2022271725A1 true WO2022271725A1 (en) 2022-12-29

Family

ID=84544898

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/034376 Ceased WO2022271725A1 (en) 2021-06-24 2022-06-21 Detecting crispr genome modification on a cell-by-cell basis

Country Status (1)

Country Link
WO (1) WO2022271725A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024092151A1 (en) * 2022-10-27 2024-05-02 The Board Of Trustees Of The Leland Stanford Junior University Direct measurement of engineered cancer mutations and their transcriptional phenotypes in single cells

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020056451A1 (en) * 2018-09-21 2020-03-26 Garvan Institute Of Medical Research Phenotypic and molecular characterisation of single cells
WO2020168075A1 (en) * 2019-02-13 2020-08-20 Beam Therapeutics Inc. Splice acceptor site disruption of a disease-associated gene using adenosine deaminase base editors, including for the treatment of genetic disease

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GUPTA ISHAAN, COLLIER PAUL G, HAASE BETTINA, MAHFOUZ AHMED, JOGLEKAR ANOUSHKA, FLOYD TAYLOR, KOOPMANS FRANK, BARRES BEN, SMIT AUGU: "Single-cell isoform RNA sequencing characterizes isoforms in thousands of cerebellar cells", NATURE BIOTECHNOLOGY, NATURE PUBLISHING GROUP US, NEW YORK, vol. 36, no. 12, 1 December 2018 (2018-12-01), New York, pages 1197 - 1202, XP093020954, ISSN: 1087-0156, DOI: 10.1038/nbt.4259 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22829170

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22829170

Country of ref document: EP

Kind code of ref document: A1