WO2025160253A1 - Methods for chromatin accessibility and transcriptome analysis of cells having genetic perturbations - Google Patents

Methods for chromatin accessibility and transcriptome analysis of cells having genetic perturbations

Info

Publication number
WO2025160253A1
WO2025160253A1 PCT/US2025/012714 US2025012714W WO2025160253A1 WO 2025160253 A1 WO2025160253 A1 WO 2025160253A1 US 2025012714 W US2025012714 W US 2025012714W WO 2025160253 A1 WO2025160253 A1 WO 2025160253A1
Authority
WO
WIPO (PCT)
Prior art keywords
cells
barcode
nuclei
cell
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2025/012714
Other languages
French (fr)
Inventor
Neville E. SANJANA
Rachel E. YAN
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New York University NYU
New York Genome Center Inc
Original Assignee
New York University NYU
New York Genome Center Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New York University NYU, New York Genome Center Inc

Classifications

    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/10Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034Isolating an individual clone by screening libraries
    • C12N15/1082Preparation or screening gene libraries by chromosomal integration of polynucleotide sequences, HR-, site-specific-recombination, transposons, viral vectors
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/10Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034Isolating an individual clone by screening libraries
    • C12N15/1065Preparation or screening of tagged libraries, e.g. tagged microorganisms by STM-mutagenesis, tagged polynucleotides, gene tags
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/10Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1096Processes for the isolation, preparation or purification of DNA or RNA cDNA Synthesis; Subtracted cDNA library construction, e.g. RT, RT-PCR
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/11DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex- forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00Structure or type of the nucleic acid
    • C12N2310/10Type of nucleic acid
    • C12N2310/20Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2320/00Applications; Uses
    • C12N2320/10Applications; Uses in screening processes
    • C12N2320/12Applications; Uses in screening processes in functional genomics, i.e. for the determination of gene function

Definitions

  • a method for evaluating effects of genetic perturbations on chromatin accessibility and the transcriptome of single cells in a population of cells comprising: (a) obtaining a heterogeneous population of cells having single cells with one or more genetic perturbations having been introduced by a CRISPR guide RNA that targets a gene or genomic region of interest, the single cells comprising one or more CRISPR guide RNAs; (b) obtaining cell nuclei from all or a portion of the single cells of (a) and separating the nuclei into partitions, and incubating the cell nuclei with a tagmentation buffer that comprises a transposome complex, wherein the transposome complex comprises a transposase, a transposon, and a nucleotide sequence comprising a handle sequence and a first barcode, wherein the transposase causes staggered double-stranded breaks in DNA, and wherein the handle sequence and the first barcode are linked to the double-stranded DNA at the stagg
  • the first barcode is unique to a partition and differs from another or all other first barcodes present in additional partitions.
  • the one or more genetic perturbations include CRISPR-Cas mediated editing, including CRISPR/Cas9, prime-editing, base-editing, CRISPRa, and/or CRISPRi.
  • more than one CRISPR guide RNA targets a gene or genomic region of interest, or a different gene or genomic region of interest, in a single cell.
  • the one or more partitions of (b) are individual wells of a microwell plate, optionally a 96 well plate.
  • one or more partitions of (b) contain at least about 1000, about 2000, about 5000, about 25,000, or about 50,000 nuclei per partition.
  • the method further comprises washing the nuclei of step (b) prior to step (c) to stop the tagmentation reaction without disrupting the cell nuclei, wherein the washing comprises addition of EDTA.
  • step (e) comprises partitioning at least about 2, at least about 5, at least about 10, at least about 15, or at least about 20 nuclei with the bead.
  • the PCR products generated in step (e) are separated to obtain a library comprising a combination of PCR products comprising double-stranded DNA of (b), cDNA from transcription of the CRISPR guide RNAs and cDNA generated from cellular RNA, optionally wherein the separation is based on size.
  • the PCR products generated in step (e) are separated to obtain a first library comprising double-stranded DNA of (b), and a second library comprising (i) cDNA from transcription of the CRISPR guide RNAs and (ii) cDNA generated from cellular RNA, optionally wherein the separation is based on size.
  • the population of cells of (a) have been further treated with a chemical agent or a biological agent.
  • the analysis is limited to cells (nuclei) defined as having at least 200 fragments per cell and/or perturbations wherein at least 100 cells are identified as having the perturbation.
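The two analysis thresholds stated above (at least 200 fragments per cell, at least 100 cells per perturbation) can be illustrated with a minimal sketch. The record layout (`barcode`, `n_fragments`, `perturbation`) and the function name are hypothetical, and the sketch assumes both filters are applied together, with cells counted per perturbation only after the fragment filter; the text itself says "and/or" and does not fix an order.

```python
from collections import Counter

# Thresholds taken from the text; the record layout below is hypothetical.
MIN_FRAGMENTS_PER_CELL = 200
MIN_CELLS_PER_PERTURBATION = 100


def filter_cells(cells, min_fragments=MIN_FRAGMENTS_PER_CELL,
                 min_cells=MIN_CELLS_PER_PERTURBATION):
    """Keep cells that pass both filters.

    `cells` is a list of dicts with the illustrative keys
    'barcode', 'n_fragments', and 'perturbation'.
    """
    # Step 1: per-cell filter on fragment count.
    passing = [c for c in cells if c["n_fragments"] >= min_fragments]
    # Step 2: count surviving cells per perturbation (ordering is an assumption).
    per_perturbation = Counter(c["perturbation"] for c in passing)
    # Step 3: drop perturbations represented by too few cells.
    return [c for c in passing
            if per_perturbation[c["perturbation"]] >= min_cells]
```

Applying the cell-level filter first means a perturbation is judged only on cells that would actually enter the analysis; a pipeline that counts cells before filtering would retain more perturbations.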
  • FIG. 1A - FIG. 1L show MultiPerturb-seq combines single-cell RNA-sequencing and single-cell ATAC-sequencing with pooled CRISPR perturbations for high-throughput functional genomics.
  • FIG. 1A MultiPerturb-seq combines combinatorial indexing with droplet microfluidics for trimodal capture.
  • FIG. 1B Cost comparison for various single-cell CRISPR pooled screening methods.
  • FIG. 1C Capillary electrophoresis of ATAC, RNA, and CRISPR inhibition (CRISPRi) guide RNA (gRNA) libraries from MultiPerturb-seq.
  • FIG. 1D - FIG. 1F Single-cell collision rate quantification for ATAC fragments (FIG. 1D, 11.6%), RNA transcripts (FIG. 1E, 6.2%), and CRISPR gRNAs (FIG. 1F, 6.6%) aligning to the human and mouse genomes. ATAC and RNA plots are downsampled for visualization.
  • FIG. 1G Uniform Manifold Approximation and Projection (UMAP) on RNA (transcript) data colored by species.
  • Mouse 3T3 fibroblasts (transduced with the mouse non-targeting gRNA library) constituted 20% of all cells prior to nuclei isolation.
  • FIG. 1H Open chromatin peaks (ATAC), transcripts (RNA) and gRNAs (CRISPR) detected for BT16 (human) cells and 3T3 (mouse) cells.
  • FIG. 1I Distance of ATAC peaks from transcription start sites (TSS).
  • FIG. 1J Proportion of single cells with 1, 2, or more than 2 gRNAs detected.
  • FIG. 1K Comparison between cells with histone methyltransferase perturbations (Histone MTs) and cells with non-targeting (NT) control perturbations for gene expression and open chromatin at the RFX3 locus.
  • FIG. 1L Comparison between cells with perturbations targeting H3F3A and cells with non-targeting (NT) control perturbations for gene expression and open chromatin at the PPM1B locus.
  • reads are normalized to cell number, tracks are binned in 500 bp bins for visualization and scale bars denote 25 kb.
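The species-mixing collision rates reported above for ATAC, RNA, and gRNA can be illustrated with a minimal sketch. The text does not state how a barcode is called as a collision, so the purity threshold of 0.9 and the function names here are assumptions chosen for illustration only.

```python
def classify_barcode(human_reads, mouse_reads, purity=0.9):
    """Label one cell barcode by the species its reads align to.

    `purity` is a hypothetical cutoff: a barcode is called for a species
    when at least that fraction of its reads align to that genome.
    """
    total = human_reads + mouse_reads
    if total == 0:
        return "empty"
    frac_human = human_reads / total
    if frac_human >= purity:
        return "human"
    if frac_human <= 1 - purity:
        return "mouse"
    return "collision"


def collision_rate(counts, purity=0.9):
    """Fraction of non-empty barcodes with mixed-species reads.

    `counts` is an iterable of (human_reads, mouse_reads) pairs.
    """
    labels = [classify_barcode(h, m, purity) for h, m in counts]
    called = [label for label in labels if label != "empty"]
    if not called:
        return 0.0
    return sum(label == "collision" for label in called) / len(called)
```

This counts only observable human-mouse mixtures; same-species doublets are invisible to a species-mixing experiment, so the observed rate understates the total, by an amount that depends on the mixing ratio.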
  • FIG. 2A shows MultiPerturb-seq ATAC, mRNA and CRISPR guide RNA library amplicons.
  • Color key Light blue: Illumina P5; Dark yellow: Barcode 2 (10X ATAC GEM barcode); Purple/Grey: 10X ATAC GEM capture sequence/Nextera Read 1; Light grey: variable ATAC, RNA, or gRNA region; Grey/Pink: Nextera Read 2; Light yellow: Barcode 1 (MEDS or TSO barcode); Light blue: primer binding region; Orange: Illumina P7; Brown: UMI; Blue-grey: gRNA scaffold; Navy blue: RNA handle.
  • FIG. 3A - FIG. 3D show CRISPR library design and quality control.
  • FIG. 3A Classification of targets in the AT/RT CRISPRi library by epigenetic and transcriptional functions. Filled boxes indicate that the target gene has the indicated molecular function 71 .
  • FIG. 3B Representation of guide RNAs in the plasmid library. Bias/uniformity was calculated as the ratio of counts at the 90th percentile/10th percentile.
  • FIG. 3C Viral titration of the library virus.
  • FIG. 3D Guide RNA representation in the screen based on cell number (fragments/cell threshold set at 100 fragments/cell).
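The bias/uniformity metric described for the plasmid library (counts at the 90th percentile divided by counts at the 10th percentile) can be sketched as follows. The percentile convention (linear interpolation between order statistics) and the function names are assumptions; the text does not say which convention was used, and different conventions give slightly different ratios.

```python
def percentile(values, p):
    """Return the p-th percentile (0-100) using linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Fractional position of the percentile within the sorted list.
    rank = p * (len(ordered) - 1) / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    weight = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


def library_bias(guide_counts):
    """Ratio of the 90th-percentile to the 10th-percentile guide count."""
    p10 = percentile(guide_counts, 10)
    if p10 == 0:
        raise ValueError("10th-percentile count is zero; ratio is undefined")
    return percentile(guide_counts, 90) / p10
```

A perfectly uniform library gives a ratio of 1; larger values indicate that some guides are over-represented relative to others.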
  • FIG. 4 shows a MultiPerturb-seq workflow. Nuclei are isolated, undergo tagmentation with barcoded MEDS, and then reverse transcription with matching barcoded primers. All molecular species undergo second-round barcoding via droplet microfluidics (10X ATAC kit), then ATAC and RNA fractions are separated and undergo library preparation via custom PCRs. ATAC fragments are amplified directly. The mRNA is first tagmented and then amplified for short-read sequencing. The gRNA is additionally enriched via biotin pulldown prior to amplification.
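The two rounds of barcoding in this workflow (a first barcode shared by all molecules from one partition, then a second barcode added in droplets) imply that a cell is identified by the pair of barcodes rather than by either one alone. A minimal sketch, assuming reads have already been parsed into hypothetical `(first_barcode, second_barcode, modality)` tuples, is shown below. The helper that estimates how often two nuclei in one droplet share a first barcode assumes nuclei are drawn uniformly from all partitions, which is an idealization.

```python
from collections import defaultdict


def assign_reads(reads):
    """Group reads by the (first barcode, second barcode) pair.

    `reads` is an iterable of (first_barcode, second_barcode, modality)
    tuples; the tuple layout and modality labels are illustrative.
    Returns {(first, second): {modality: read_count}}.
    """
    cells = defaultdict(lambda: defaultdict(int))
    for first_bc, second_bc, modality in reads:
        cells[(first_bc, second_bc)][modality] += 1
    return {cell: dict(per_modality) for cell, per_modality in cells.items()}


def same_partition_clash_probability(nuclei_per_droplet, n_partitions):
    """Chance that at least two nuclei in one droplet share a first barcode.

    Birthday-problem estimate under the assumption that each nucleus comes
    from a uniformly random partition.
    """
    p_all_distinct = 1.0
    for i in range(nuclei_per_droplet):
        p_all_distinct *= max(n_partitions - i, 0) / n_partitions
    return 1.0 - p_all_distinct
```

Under these assumptions, two nuclei sharing a droplet with 96 first-round partitions would collide with probability 1/96, which is why several nuclei can be loaded per droplet while still resolving individual cells.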
  • FIG. 5A - FIG. 5J show optimization of MultiPerturb-seq conditions. Optimization across the three modalities (ATAC, RNA, gRNA).
  • FIG. 5A Optimization of ATAC libraries varying the Tn amount, tagmentation buffer and PCR annealing temperature (Ta): The amount of Tn protein is indicated in µl (50 µl reactions) in Omni Lysis Buffer (10 mM Tris-HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% NP-40 (ThermoFisher 85124), 0.1% Tween-20 (Sigma P1379), 0.01% digitonin (Promega G9441)) 48 or the lysis buffer from the original ATAC protocol (10 mM Tris-HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl2, + 1% Tween-20 (Sigma P1379)) 46 .
  • annealing temperatures for PCR amplification (57-67°C) as shown.
  • FIG. 5B Optimization of ATAC libraries varying the length of nuclear lysis time (minutes) and cell number (in thousands).
  • FIG. 5C Optimization of RNA capture by amplification of a specific cDNA transcript (GAPDH, using intron-spanning primers) with different input cell numbers and template-switch oligonucleotides (TSOs). Reverse transcription and PCR was performed using either RNA extracted using TRIzol (ThermoFisher 15596026) (for inputs in ng) or nuclei (for inputs in number of cells).
  • TRIzol ThermoFisher 15596026
  • FIG. 5D Comparison of 2X KAPA HiFi Master Mix (Roche 07958935001) and PfuX7 DNA polymerase 51 for cDNA amplification.
  • FIG. 5E Optimization of gRNA capture with post-reverse transcription (2.0X) and post-PCR (2.5X) SPRI cleanups.
  • FIG. 5F Optimization of gRNA capture with post-reverse transcription SPRI (2.0X) or ExoSAP-IT cleanup with the indicated amount of RT primer (µl in a 40 µl RT reaction).
  • FIG. 5G Optimization of joint ATAC and RNA capture (mRNA and gRNA) using different TSOs as in FIG. 5C.
  • FIG. 5H Optimization of joint ATAC and RNA capture (mRNA and gRNA) for different gRNA biotin pulldown strategies. Biotinylated primers were used during ISPCR (Biotin ISPCR) and/or the intermediate gRNA PCR (Biotin Int). (FIG.
  • FIG. 6A - FIG. 6B show an overview of MultiPerturb-seq sequencing, alignment and read mapping.
  • FIG. 6A Overview of data processing for MultiPerturb-seq.
  • FIG. 6B The intersection of cell barcodes between modalities (ATAC, RNA, and CRISPR). All aligned cell barcodes were considered in this analysis. 429,139 cell barcodes (with reads in all three modalities) were then filtered based on read number and other metrics.
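The barcode intersection described for FIG. 6B can be sketched with plain set operations. This is an illustrative outline only: the function name and the returned keys are hypothetical, and the later filtering on read number and other metrics mentioned in the text is not shown because its criteria are not specified here.

```python
def barcode_overlap(atac_barcodes, rna_barcodes, crispr_barcodes):
    """Partition cell barcodes by the modalities in which they were seen.

    Each argument is an iterable of cell-barcode strings from one modality.
    Only barcodes in 'all_three' have trimodal data.
    """
    atac = set(atac_barcodes)
    rna = set(rna_barcodes)
    crispr = set(crispr_barcodes)
    return {
        "all_three": atac & rna & crispr,
        "atac_rna_only": (atac & rna) - crispr,
        "atac_crispr_only": (atac & crispr) - rna,
        "rna_crispr_only": (rna & crispr) - atac,
    }
```

The size of `all_three` corresponds to the count of barcodes with reads in all three modalities, which is the starting set before quality filtering.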
  • FIG. 7A - FIG. 7P show ATAC, RNA, and gRNA quality metrics.
  • FIG. 7A - FIG. 7C Single-cell collision rate quantification for ATAC fragments (FIG. 7A), RNA transcripts (FIG. 7B), and guide RNA transcripts (FIG. 7C) aligning to the human and mouse genomes. For visualization, outliers beyond the 99th percentiles are omitted.
  • FIG. 7D Percent reads per cell mapping to incorrect species in MultiPerturb-seq and scifi-RNA-seq 11 . Cells were annotated as human or mouse based on the dominant species.
  • MultiPerturb-seq had an 80:20 human:mouse mix and scifi-RNA-seq had a 50:50 human:mouse mix.
  • FIG. 7E Proportion of fragments within 3 kb of transcription start sites (TSSs) in MultiPerturb-seq and CRISPR-sciATAC 6 .
  • FIG. 7F Proportion of mitochondrial reads (RNA) in cells.
  • FIG. 7G Fraction of all cells with gRNAs assigned compared to other single-cell perturbation methods.
  • FIG. 7H Pseudobulk expression of CRISPRi target genes relative to non-targeting controls by perturbation.
  • FIG. 7I CRISPRi target genes detected in cells with a non-targeting control perturbation, but not in cells that received a gene-targeting perturbation, were assigned a log2(FC) of -10. All CRISPRi target genes that were detected in cells with a gene-targeting perturbation were detected in cells with a non-targeting perturbation.
  • FIG. 7I - FIG. 7P Comparison of RNA and ATAC metrics to other single-cell methods including (FIG. 7I) RNA unique molecular identifiers (UMIs) per cell 15 ' 18 , (FIG. 7J) unique genes per cell 15 ' 18 , (FIG. 7K) UMIs per cell 11 , (FIG. 7L) unique genes per cell 11 , (FIG.
  • FIG. 8A - FIG. 8J show comparison of MultiPerturb-seq and the 10X Multiome kit with CROP-seq (CROP-Multiome).
  • FIG. 8A The epigenomic remodelers library was recloned into the specialized guide RNA (gRNA) plasmid, CROP-seq 19 to perform a multiomic CRISPR screen on the 10X Multiome kit in BT16 cells.
  • FIG. 8B - FIG. 8E Comparison of differentially expressed genes (compared to non-targeting gRNA control) between MultiPerturb-seq and CROP-Multiome strategies for cells receiving (FIG. 8B) SETD5, (FIG. 8C) BRD7, (FIG.
  • FIG. 8D Fraction of high-quality cells (at least 1000 unique RNA reads per cell) with successful gRNA capture/assignment in MultiPerturb-seq and CROP-Multiome.
  • FIG. 8G RNA unique reads per cell
  • FIG. 8H RNA genes per cell
  • FIG. 8I unique ATAC fragments (mapped to peaks) per cell
  • FIG. 8J unique ATAC peaks per cell. Cells were uniformly defined as having either 1,000 RNA or 1,000 ATAC reads across comparisons of technologies.
  • For CROP-Multiome, we did not require gRNA capture due to the low capture rate.
  • FIG. 9A - FIG. 9G show MultiPerturb-seq identifies genetic perturbations that trigger differentiation in atypical teratoid/rhabdoid tumor (AT/RT).
  • FIG. 9A Overview of differentiation challenge in AT/RT brain tumors and design of pooled CRISPR library to identify chromatin remodelers for cancer reprogramming therapy.
  • FIG. 9B Correlation between gene-perturbed human AT/RT cells and gene expression over developmental stages from 4 weeks post-conception (wpc) to senior adulthood. 28 The Pearson correlation is computed on the top 1000 highly variable genes and values are normalized such that cells receiving a non-targeting perturbation display as zero on the colorscale.
  • FIG. 9D RNA and ATAC differentiation score
  • FIG. 9E ATAC differentiation score
  • FIG. 9F RNA and ATAC differentiation scores for all CRISPRi gene perturbations.
  • Marker genes of neural differentiation include markers of neurons (CCND3, GABBR1, GPM6B), astrocytes (SYNJ2), inhibitors of mesenchymal lineages (ITM2B), and genes with previously defined roles in cancer differentiation therapy (ARHGEF3 7 , SYNJ2 i1, 32 ).
  • Marker genes of stemness include markers of embryonic stem cells (TRIM24 15 , SMARCAD1 16 ), neural stem cells (EPHB4 17 ), neural progenitors (CACHD1, TACC3 79 ), mesenchymal lineages (ARID3B), and cancer stem markers (TRIM24, ARID3B 2 ).
  • FIG. 11A - FIG. 11H show changes in open chromatin at ENCODE regulatory elements and comparison with open chromatin from healthy brain.
  • FIG. 11A Correlation between gene-perturbed human AT/RT cells and open chromatin peaks in developmental 34 and adult 35 brain atlases. The Pearson correlation is computed on the top 1000 highly variable promoter-adjacent peaks.
  • FIG. 11B - FIG. 11E Rank of different CRISPRi perturbations in the MultiPerturb-seq screen by changes in open chromatin at genomic loci overlapping ENCODE 36
  • FIG. 11B promoters
  • FIG. 11C proximal
  • FIG. 11D distal enhancers (dELS), as well as (FIG. 11E) poised elements (DNase-H3K4me3).
  • dELS distal enhancers
  • FIG. 11E poised elements
  • FIG. 11F Cumulative density of fold-changes in open chromatin overlapping specific ENCODE regulatory elements in gene-perturbed cells compared to non-targeting cells.
  • FIG. 11G Sum of fold-changes (log2) at peaks overlapping ENCODE regulatory elements 36 (right) with perturbed genes grouped by protein complex. Complex types are also classified by groupings from the EpiFactors database 71 .
  • FIG. 11H RNA and ATAC differentiation scores for each chromatin modifier complex.
  • FIG. 12A - FIG. 12I show chromatin accessibility, gene expression and differentiation assays after ZNHIT1 knockdown.
  • FIG. 12A ZNHIT1 functions within the SRCAP complex to deposit histone variant H2A.Z.
  • YEATS4 and KAT5 also function in H2A.Z deposition and/or acetylation.
  • FIG. 12B - FIG. 12C Gene Ontology (GO) Biological Processes analyses for ZNHIT1-perturbed cells from MultiPerturb-seq for (FIG. 12B) the closest genes to the 10,000 most significant differential ATAC peaks (compared to cells with a non-targeting [NT] gRNA) and (FIG. 12C) upregulated genes (compared to cells with an NT gRNA).
  • FIG. 12D Flow cytometry gating for SOX2 analysis.
  • FIG. 12F Representative images of EdU labeling for cell cycle analysis. Scale bar: 5 µm.
  • NT non-targeting
  • FIG. 12H Representative immunofluorescence images of MAP2 expression in BT12 AT/RT cells with a NT or ZNHIT1-targeting gRNA. Scale bar: 50 µm.
  • FIG. 13A - FIG. 13P show ZNHIT1 loss drives AT/RT cell cycle arrest and differentiation via decreased H2A.Z deposition.
  • FIG. 13A CRISPRi validation in AT/RT cells to assess stemness, proliferation and differentiation after ZNHIT1 loss.
  • FIG. 13B SOX2 expression in cells receiving ZNHIT1, SOX2 or non-targeting (negative control, NT) guide RNAs (gRNAs).
  • FIG. 13F - FIG. 13H Expression and quantification of (FIG. 13F) ATOH8, (FIG. 13G) TUJ1, and (FIG. 13H) MAP2 in BT16 cells with ZNHIT1-targeting or NT gRNAs.
  • FIG. 13I CUT&RUN of H2A.Z, H3K4me3, and IgG (negative control) in BT16 cells with ZNHIT1-targeting or NT gRNAs.
  • FIG. 13N Quantification of S-phase cells from FIG. 13M and significance determined via χ2-test.
  • FIG. 13O Representative immunofluorescence images of MAP2 expression in BT16, BT12, and CHLA06 AT/RT cells with H2AZ1- or H2AZ2-targeting (or NT) gRNAs.
  • FIG. 14A - FIG. 14C show H2A.Z and H3K4me3 CUT&RUN after ZNHIT1 loss.
  • FIG. 14A Correlation between replicates of H2A.Z CUT&RUN for BT16 AT/RT cells receiving a non-targeting (NT) or ZNHIT1-targeting guide RNA (gRNA). For visualization, outliers beyond the 99th percentile are omitted.
  • FIG. 14C Binding of H3K4me3 and IgG (negative control) near transcription start sites in BT16 cells with ZNHIT1-targeting or NT gRNAs.
  • FIG. 15A - FIG. 15D show H2A.Z loss hinders cell cycle progression.
  • FIG. 15A Gating strategy for cell cycle analysis with propidium iodide (PI) in CHLA06 AT/RT cells (Sony SH800).
  • FIG. 15B Gating strategy for cell cycle analysis with PI in BT12 AT/RT cells (MACSQuant 10).
  • FIG. 15D Quantification of S-phase cells from FIG. 15C. Significance was determined using a χ2-test.
  • a scalable in vitro method for analyzing chromatin accessibility and screening RNA of single cells having genetic perturbations in a heterogeneous population (e.g., a library of cells).
  • the technology (termed “MultiPerturb-seq” for Multiome Perturb-seq) is useful in the research and development of new therapies by allowing interrogation of single-cell transcriptome and chromatin accessibility profiles at scale.
  • CRISPR perturbations may be used to precisely target known or novel pharmacologic or gene therapy targets.
  • Analysis of the cell transcriptome provides a view of cell state, while chromatin accessibility profiling adds further information about cell state and about the putative mechanism of action. Linking these through a pooled screen with combinatorial indexing allows hundreds to thousands of targets to be screened in a single experiment, allowing for iterative and rapid hypothesis generation and discovery. The method allows analyses to be performed in a scalable and efficient manner that provides significant cost savings in comparison to various single-cell CRISPR pooled screening methods.
  • MultiPerturb-seq utilizes methods of introducing CRISPR perturbations in combination with Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) and RNA sequencing methodologies.
  • nucleic acid can be RNA, DNA, or a modification thereof, and can be single or double stranded, and can be selected, for example, from a group including: nucleic acid encoding a protein of interest, oligonucleotides, nucleic acid analogues, for example peptide-nucleic acid (PNA), pseudocomplementary PNA (pc-PNA), locked nucleic acid (LNA) etc.
  • PNA peptide-nucleic acid
  • pc-PNA pseudocomplementary PNA
  • LNA locked nucleic acid
  • nucleic acid sequences include, for example, but are not limited to, nucleic acid sequences encoding proteins, for example those that act as transcriptional repressors, antisense molecules, ribozymes, and small inhibitory nucleic acid sequences, for example but not limited to RNA interference (RNAi), short hairpin RNAi (shRNAi), small interfering RNA (siRNA), micro RNAi (miRNAi), antisense oligonucleotides etc.
  • RNAi RNA interference
  • shRNAi short hairpin RNAi
  • siRNA small interfering RNA
  • miRNAi micro RNAi
  • antisense oligonucleotides etc.
  • “nucleotide”, “nucleic acid”, “nucleotide residue”, and “nucleic acid residue” are used interchangeably, referring to a nucleotide in a nucleic acid polymer.
  • consecutive nucleotide residues refer to nucleotide residues in a contiguous region of a nucleic acid polymer.
  • RNA Ribonucleic acid
  • RNA is a polymeric molecule essential in various biological roles in coding, decoding, regulation and expression of genes.
  • RNA may refer to a CRISPR guide RNA, a messenger RNA (mRNA), a long non-coding RNA (lncRNA), a mitochondrial RNA, a microRNA (miRNA), non-coding RNAs, transfer RNA, ribosomal RNA, short hairpin RNAi (shRNAi), or small interfering RNA (siRNA).
  • deoxyribonucleic acid (DNA) is a polymeric molecule formed of deoxyribonucleotides, including, but not limited to, genomic DNA, double-strand DNA, single-strand DNA, DNA packaged with a histone protein, complementary DNA (cDNA, which is reverse-transcribed from an RNA), mitochondrial DNA, and chromosomal DNA.
  • Nucleic acid sequences described herein can be cloned using routine molecular biology techniques, or generated de novo by DNA synthesis, which can be performed using routine procedures by service companies having business in the field of DNA synthesis and/or molecular cloning (e.g. GeneArt, GenScript, Life Technologies, Eurofins).
  • nucleic acid sequences encoding aspects of a CRISPR-Cas editing system can be assembled and placed into any suitable genetic element, e.g., naked DNA, phage, transposon, cosmid, episome, etc., which transfers the sequences carried thereon to a host cell, e.g., for generating non-viral delivery systems (e.g., RNA-based systems, naked DNA, or the like), or for generating viral vectors in a packaging host cell, and/or for delivery to host cells in a subject.
  • the methods used to make such engineered constructs are known to those with skill in nucleic acid manipulation and include genetic engineering, recombinant engineering, and synthetic techniques.
  • dNTP stands for deoxyribonucleotide triphosphate. Each dNTP is made up of a phosphate group, a deoxyribose sugar and a nitrogenous base. There are four different dNTPs, which can be split into two groups: the purines (including dATP, deoxyadenosine 5'-triphosphate, and dGTP, deoxyguanosine 5'-triphosphate) and the pyrimidines (including dTTP, deoxythymidine 5'-triphosphate, and dCTP, deoxycytidine 5'-triphosphate).
  • purines including dATP, deoxyadenosine 5'-triphosphate, and dGTP, deoxyguanosine 5'-triphosphate
  • pyrimidines including dTTP, deoxythymidine 5'-triphosphate, and dCTP, deoxycytidine 5'-triphosphate.
  • dNTP Mix is a mixture (normally in a solution containing sodium salts) of dATP, dCTP, dGTP and dTTP, suitable for use in polymerase chain reaction (PCR), sequencing, fill-in reactions, nick translation, cDNA synthesis, and TdT-tailing reactions.
  • PCR polymerase chain reaction
  • complementary DNA can refer to a synthetic DNA reverse transcribed from RNA through the action of a reverse transcriptase.
  • the cDNA may be single-stranded or double-stranded and can include strands that have either or both of a sequence that is substantially identical to a part of the RNA sequence or a complement to a part of the RNA sequence.
  • Perturbation refers to the effects on one or more target genes or genomic regions of interest, including modification in the expression of gene products (including proteins) or a target sequence.
  • Perturbations include mutations or modifications such as, e.g. small nucleotide insertions or deletions (indels) or a larger deletion, insertion, or inversion.
  • the introduction of a mutation or modification is referred to as “editing” or “gene editing”.
  • Perturbations include transcriptional silencing or repression (e.g. CRISPRi) or activation (CRISPRa) of target genes or genomic regions of interest.
  • the cells include eukaryotic cells such as plant cells, animal cells, fungal cells, protozoan cells, or algae cells.
  • the cells are mammalian cells.
  • the cells are stem cells (for example, an embryonic stem cell), cancer cells, neuronal cells, epithelial cells, immune cells (e.g., lymphocytes), endocrine cells, germ cells, somatic cells, kidney cells, liver cells, pancreatic cells, skin cells, fat cells, bone cells, or muscle cells.
  • the cells are a cell line, for example an HEK293 cell, an NIH-3T3 cell, or a K562 cell.
  • an oligo refers to short DNA or RNA molecules.
  • an oligo can be at least about 1 to 500 monomeric components, e.g., nucleotides, in length.
  • an oligo can be about 20 to about 80 nucleotides in length.
  • an oligo is formed of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,
  • an oligo refers to a template switch oligonucleotide
  • primers can refer to a short polynucleotide, generally with a free 3'-OH group, that binds to a target or template polynucleotide present in a sample by hybridizing with the target or template, and thereafter promoting extension of the primer to form a polynucleotide complementary to the target or template.
  • Primers can include polynucleotides ranging from 5 to 1000 or more nucleotides.
  • the primer has a length of at least 4 nucleotides, 5 nucleotides, 10 nucleotides, 15 nucleotides, 20 nucleotides, 25 nucleotides, 30 nucleotides, 35 nucleotides, 40 nucleotides, 45 nucleotides, 50 nucleotides, 60 nucleotides, 70 nucleotides, 80 nucleotides, 90 nucleotides, 100 nucleotides, or a length within a range of any two of the foregoing lengths.
  • a barcode describes a defined polymer, e.g., a polynucleotide, which when it is a functional element of the polymer construct, is specific for a compartment, a single cell, or cell nucleus or cellular components (for example, DNA, RNA and/or mitochondria and ribosomes) thereof.
  • the barcode is about 2 to 4 monomeric components, e.g., nucleotide bases, in length.
  • the barcode is at least about 1 to 100 monomeric components, e.g., nucleotides, in length.
  • the barcode is formed of a sequence of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
  • a barcode can be an artificial sequence or a naturally occurring sequence.
  • each barcode within a population of barcodes is different.
  • a portion of barcodes in a population of barcodes is different, e.g., at least about 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, or 99% of the barcodes in a population of barcodes is different.
  • a population of barcodes may be randomly generated or non-randomly generated.
  • a population of barcodes are error correcting barcodes.
  • Barcodes can be used to computationally deconvolute the multiplexed sequencing data and identify sequence reads derived from an individual cell, compartment, etc.
  • a barcode can also be used for deconvolution of a collection of cells or cell nuclei or cellular components thereof that have been distributed into small compartments for enhanced mapping.
  • a barcode also refers to a process of introducing a barcode to a DNA or RNA. Examples of introducing a barcode are illustrated in FIG. 1A.
  • a barcode may be located at the 3’ end of a reverse transcription (RT) primer, such as an RT primer comprising an oligo d(T)n (also termed an RT oligo, referring to a polyT oligo) at the 5’ end and a barcode at the 3’ end.
  • a barcode may be located at the 3’ end of a PCR primer. Such primer may be used in amplifying tagmented DNA or guide RNA via a PCR reaction.
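The computational deconvolution by barcode described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the method of the disclosure: the barcode is assumed to sit at the start of each read, the whitelist sequences are hypothetical, and error correction is done by accepting a single unambiguous whitelist match within one mismatch (Hamming distance).

```python
from typing import Dict, Iterable, List, Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def correct_barcode(observed: str, whitelist: Iterable[str],
                    max_mismatches: int = 1) -> Optional[str]:
    """Return the single whitelist barcode within max_mismatches, else None."""
    hits = [bc for bc in whitelist
            if len(bc) == len(observed) and hamming(observed, bc) <= max_mismatches]
    # Ambiguous or missing matches are discarded rather than guessed.
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads: Iterable[str], whitelist: List[str],
                barcode_len: int) -> Dict[str, List[str]]:
    """Group read inserts by corrected barcode.

    Assumption (hypothetical layout): the barcode occupies the first
    barcode_len bases of each read.
    """
    bins: Dict[str, List[str]] = {}
    for read in reads:
        bc = correct_barcode(read[:barcode_len], whitelist)
        if bc is not None:
            bins.setdefault(bc, []).append(read[barcode_len:])
    return bins


# Hypothetical whitelist and reads, for illustration only.
whitelist = ["AAAAAA", "CCCCCC", "GGGGGG"]
reads = ["AAAAAATTGCA", "CCCACCGGATC", "ACGTACTTTTT"]
print(demultiplex(reads, whitelist, barcode_len=6))
```

A whitelist whose members differ from each other at three or more positions allows any single-base error to be corrected unambiguously, which is one way a population of error correcting barcodes may be used.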
  • a nucleic acid such as DNA or RNA
  • UMI unique molecular identifier
  • RTT random molecular tag
  • the UMI permits identification of amplification duplicates of the polymer with which it is associated.
  • one or more UMI may be associated with a single polymer.
  • the UMI may be positioned 5’ or 3’ to the barcode in the composition.
  • the UMI may be inserted into the polymer as part of the described methods.
  • a UMI is added during the method, for example, during reverse transcription.
  • Each UMI for each polymer e.g., oligonucleotide or polynucleotide is different from any other UMI used in the compositions or methods.
  • the UMI is formed of a random sequence of DNA, RNA, modified bases or combinations of these bases or other monomers of the polymers identified above.
  • a UMI is about 8 monomeric components, e.g., nucleotides, in length.
  • each UMI can be at least about 1 to 100 monomeric components, e.g., nucleotides, in length.
  • the UMI is formed of a random sequence of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
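The use of a UMI to identify amplification duplicates can be sketched as follows. This is a minimal Python illustration, assuming each read has already been reduced to a (barcode, feature, UMI) tuple and that duplicates are collapsed by exact UMI match only; the record layout and names are hypothetical, and correction of sequencing errors within UMIs is not attempted.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def count_molecules(
    records: Iterable[Tuple[str, str, str]]
) -> Dict[Tuple[str, str], int]:
    """Count distinct UMIs per (barcode, feature).

    records: one (barcode, feature, umi) tuple per sequenced read.
    Reads sharing barcode, feature, and UMI are treated as amplification
    duplicates of one original molecule.
    """
    umis = defaultdict(set)
    for barcode, feature, umi in records:
        umis[(barcode, feature)].add(umi)
    return {key: len(values) for key, values in umis.items()}


# Hypothetical reads: the first two are duplicates of the same molecule.
records = [
    ("BC1", "GeneA", "AACCGGTT"),
    ("BC1", "GeneA", "AACCGGTT"),
    ("BC1", "GeneA", "TTGGCCAA"),
    ("BC2", "GeneA", "AACCGGTT"),
]
print(count_molecules(records))
```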
  • partition refers to a physical area or volume that separates or isolates a subset of cells and/or cell nuclei from other subsets.
  • a subset may be a single cell or a nucleus from a single cell, and the partition isolates each cell or cell nucleus.
  • a partition may be an aqueous compartment (for example, microfluidic droplet), a solid compartment (for example, a well on a plate, a tube, a vial, a particle, a microparticle, and/or a bead), or a separated region on a surface (for example, a chip, a microplate, or a slide).
  • the method comprises obtaining a heterogeneous population of cells having single cells with one or more genetic perturbations.
  • the perturbations have been introduced by one or more CRISPR guide RNAs.
  • the guide RNAs are subsequently amplified and sequenced to identify the guide RNAs themselves and the corresponding genes or genomic targets for that single cell.
  • Amplification includes obtaining a nucleic acid that includes a guide RNA sequence and a barcode.
  • a heterogeneous population of cells is obtained by transducing cells with a CRISPR-Cas vector library that includes guide RNAs that target multiple genes or genomic targets.
  • the CRISPR-Cas system is a method for functionally inactivating genes in a cell using a CRISPR-associated endonuclease (i.e., Cas, for example, Cas9, dCas) to perturb a target gene or genomic region of interest’s transcription (e.g., to disrupt or repress expression).
  • a CRISPR-associated endonuclease i.e., Cas, for example, Cas9, dCas
  • a small RNA guide RNA, gRNA
  • Perturbations can also be introduced by prime-editing, base-editing, CRISPRa, and/or CRISPRi methodologies, wherein the respective guide RNA constructs (e.g., pegRNA) are captured and sequenced according to methods described herein.
  • a genome refers to the genetic material of an organism.
  • the genome includes both the genes (the coding genomic sequences which code for protein in the organism) and the noncoding DNA (which does not encode protein in the organism, including but not limited to introns, sequences for non-coding RNAs, regulatory regions such as promoter and enhancer, and repetitive DNA), as well as mitochondrial DNA and chloroplast DNA.
  • Genome editing is a type of genetic engineering in which DNA is inserted, deleted, modified or replaced in the genome of an organism. Editing the genome can be achieved using engineered nucleases such as CRISPR-Cas9 (or other CRISPR enzymes).
  • CRISPR-Cas9 or other CRISPR enzymes.
  • the methods described herein apply to cells that are perturbed, for example, by a gain-of-function genomic editing, a loss-of-function genomic editing, an upregulation or downregulation of certain coding or non-coding genomic sequence, or epigenome editing.
  • guide RNA refers to a nucleic acid sequence which can hybridize to a unique sequence located 3’ or 5’ from a T-rich protospacer-adjacent motif (PAM) in a contiguous region of the genome or a chromosome of a cell, wherein the guide is capable of complexing with Cas protein and providing targeting specificity and binding ability for nuclease activity of Cas.
  • the guide RNA is about 18 nucleotides (nt) to about 35 nt. In certain embodiments, the guide RNA is about 23 nt.
  • CRISPR RNA spacer refers to a nucleic acid sequence which encodes a guide RNA.
  • the spacer is a DNA.
  • the spacer is about 18 nucleotides (nt) to about 35 nt. In one embodiment, the spacer is about 23 nt.
  • guide RNA sequence comprises a UMI sequence.
  • the term “a heterogeneous population of cells” refers to multiple cells, which are not identical to each other.
  • the heterogeneous population of cells includes those that are differentiated by one or more guide RNAs present in single cells.
  • the heterogeneous population of cells includes cells having different guide RNAs that target a different region of a gene or genomic region of interest.
  • a subset of cells i.e., part of but not the whole cell population
  • the heterogeneous population of cells includes cells from an experimental timepoint (e.g., a control untreated subset and one or more subsets obtained at one or more timepoints following exposure to a drug).
  • the methods provided herein comprise a perturbation step comprising transducing cells with one or more vectors and culturing the cells.
  • Each vector comprises a nucleic acid sequence encoding a Cas protein in operative association with a first promoter which controls expression of the Cas protein, and a CRISPR guide RNA coding sequence in operative association with a second promoter which controls transcription thereof.
  • the vector is a lentiviral vector.
  • the cells are incubated with the vector at a multiplicity of infection (MOI) of about 0.05, about 0.1, about 0.2, or about 0.3.
  • MOI multiplicity of infection
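A low MOI is commonly chosen so that most transduced cells carry a single vector. The following Python sketch illustrates this under the standard assumption (not stated in the source) that the number of vector integrations per cell follows a Poisson distribution with mean equal to the MOI; the function names are hypothetical.

```python
import math


def poisson_pmf(k: int, moi: float) -> float:
    """Probability that a cell receives exactly k vectors at a given MOI,
    assuming Poisson-distributed integrations."""
    return math.exp(-moi) * moi ** k / math.factorial(k)


def single_integration_fraction(moi: float) -> float:
    """Among cells with at least one vector, the fraction with exactly one."""
    p0 = poisson_pmf(0, moi)
    p1 = poisson_pmf(1, moi)
    return p1 / (1.0 - p0)


# MOI values taken from the embodiments listed above.
for moi in (0.05, 0.1, 0.2, 0.3):
    transduced = 1.0 - poisson_pmf(0, moi)
    single = single_integration_fraction(moi)
    print(f"MOI {moi}: {transduced:.1%} transduced, {single:.1%} of those with one vector")
```

Under this model, lowering the MOI raises the share of singly perturbed cells among transduced cells at the cost of fewer transduced cells overall.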
  • operably linked sequences or sequences “in operative association” include both expression control sequences that are contiguous with the nucleic acid sequence of interest and expression control sequences that act in trans or at a distance to control the nucleic acid sequence of interest.
  • a “vector” as used herein is a biological or chemical moiety comprising a nucleic acid sequence which can be introduced into an appropriate cell for replication or expression of said nucleic acid sequence.
  • Common vectors include naked DNA, phage, transposon, plasmids, viral vectors, cosmids and artificial chromosomes (Gong, Shiaoching, et al. “A gene expression atlas of the central nervous system based on bacterial artificial chromosomes.” Nature 425.6961 (2003): 917-925).
  • plasmid refers to a circular double stranded DNA loop into which additional nucleic acid segments can be ligated.
  • vectors are capable of autonomous replication in a cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian vectors).
  • the vector is a lentiviral vector.
  • Other vectors e.g., non-episomal mammalian vectors
  • a “viral vector” refers to a synthetic or artificial viral particle in which an expression cassette containing a nucleic acid sequence of interest is packaged in a viral capsid or envelope.
  • viral vectors include but are not limited to lentivirus, adenoviruses (Ads), retroviruses (γ-retroviruses and lentiviruses), poxviruses, adeno-associated viruses (AAV), baculoviruses, and herpes simplex viruses.
  • the viral vector is replication defective.
  • replication-defective virus refers to a viral vector, wherein any viral genomic sequences also packaged within the viral capsid or envelope are replication-deficient; i.e., they cannot generate progeny virions but retain the ability to infect cells.
  • the vector further comprises a reporter gene or a nucleic acid encoding a selectable marker, which may include sequences encoding geneticin, hygromycin, ampicillin or puromycin resistance, among others.
  • selectable marker refers to a peptide or polypeptide whose presence can be readily detected in a cell when a selective pressure is applied to the cell.
  • a reporter gene which is used as an indication of presence of the vector in a cell or not, is readily known by one of skill in the art.
  • the E. coli lacZ gene, the chloramphenicol acetyltransferase (CAT) gene, or a gene encoding a fluorescent protein such as Green fluorescent protein (GFP).
  • CAT chloramphenicol acetyltransferase
  • GFP Green fluorescent protein
  • the promoter is an inducible promoter, such as a doxycycline inducible promoter.
  • the first promoter is an RNA pol II promoter.
  • An RNA pol II promoter is a promoter that is sufficient to direct accurate initiation of transcription by the RNA polymerase II machinery, wherein the RNA polymerase II (RNAP II and Pol II) is an RNA polymerase found in the nucleus of eukaryotic cells, catalyzing the transcription of DNA to synthesize precursors of messenger RNA (mRNA) and most small nuclear RNA (snRNA) and microRNA.
  • mRNA messenger RNA
  • snRNA small nuclear RNA
  • Polymerase II promoters that can be used within the compositions and methods described herein are publicly or commercially available to a skilled artisan, for example, viral promoters obtained from the genomes of viruses including promoters from polyoma virus, fowlpox virus (UK 2,211,504), adenovirus (such as Adenovirus 2 or 5), herpes simplex virus (thymidine kinase promoter), bovine papilloma virus, avian sarcoma virus, cytomegalovirus (CMV), a retrovirus (e.g., MoMLV, or RSV LTR), Hepatitis-B virus, Myeloproliferative sarcoma virus promoter (MPSV), VISNA, and Simian Virus 40 (SV40); other heterologous mammalian promoters including the actin promoter, β-actin promoter, immunoglobulin promoter, heat-shock protein promoters, human Ubiquitin-C promoter,
  • the second promoter is an RNA pol III promoter.
  • an RNA pol III promoter is a promoter that is sufficient to direct accurate initiation of transcription by the RNA polymerase III machinery, wherein the RNA polymerase III (RNAP III and Pol III) is an RNA polymerase transcribing DNA to synthesize 5S ribosomal RNA (rRNA), transfer RNA (tRNA), crRNA, and other small RNAs (for example, guide RNA).
  • Polymerase III promoters which can be used with the invention are publicly or commercially available, for example the U6 promoter, the promoter fragments derived from H1 RNA genes or U6 snRNA genes of human or mouse origin or from any other species.
  • pol III promoters can be modified/engineered to incorporate other desirable properties such as the ability to be induced by small chemical molecules, either ubiquitously or in a tissue-specific manner.
  • the promoter may be activated by tetracycline.
  • the promoter may be activated by IPTG (lacI system). See, US5902880A and US7195916B2.
  • a Pol III promoter from various species might be utilized, such as human, mouse or rat.
  • more than one (i.e., multiple) CRISPR guide RNA transcribed by the vectors is targeted to each functional unit of a cell genome of interest.
  • each vector transcribes a single guide RNA.
  • each vector transcribes about 2, about 3, about 4, about 5, about 6, about 7, about 8, about 9, about 10, about 15, about 20, about 25, or more guide RNAs.
  • the functional unit of a cell genome of interest refers to a genomic sequence which serves a certain function or is suspected of having a certain function. Such function may be expressing a protein of interest, transcribing to an RNA of interest, or regulating a gene of interest.
  • a functional unit of a cell genome typically encompasses a limited region of the genome, such as a region of 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90 to 100 kb of genomic DNA.
  • the functional unit of a cell genome is a coding sequence.
  • the functional unit of a cell genome is a noncoding genomic sequence.
  • the non-coding sequence may be in regions 5' and 3' of the coding region of a gene of interest.
  • the cells of the heterogeneous population of cells or a subset thereof are cultured with a chemical agent or a biological agent, or the cell culture is actively physically disturbed.
  • chemical agent includes various small molecule drugs/compounds
  • biological agent refers to biological drugs, which are a diverse category of drugs and are generally large, complex molecules. These biological drugs may be produced through biotechnology in a living system, such as a microorganism, plant cell, or animal cell.
  • the cells may be incubated with the chemical and/or biological agent or any combinations thereof, such as a library of peptides or a library of small molecules or a library of anti-cancer drugs, which are available commercially or publicly. See, for example, www. selleckchem .
  • the cells are contacted with various chemical drugs or biological drugs for large-scale drug screens.
  • the cells are treated via CRISPR-Cas enzyme and various guide RNA.
  • the term physical disturbance refers to an active mixing, shaking, stretching, or stirring of the cells in culture.
  • a population of cells is treated separately with any one of the perturbations as described herein or with any combinations of the perturbations, resulting in a heterogeneous population of cells.
  • Chromatin accessibility is the degree to which nuclear macromolecules are able to physically contact chromatinized DNA and is determined by the occupancy and topological organization of nucleosomes as well as other chromatin-binding factors that occlude access to DNA. If such physical contact can be established in a certain region of the DNA, that DNA region is considered to be in an open chromatin state.
  • the organization of accessible chromatin across the genome reflects a network of permissible physical interactions through which enhancers, promoters, insulators, and chromatin-binding factors cooperatively regulate gene expression.
  • chromatin accessibility may refer to chromatin accessibility across the cell genome.
  • ATAC-seq transposase accessible chromatin sequencing
  • transposase for example, a hyperactive mutant Tn5 transposase
  • the transposase excises any sufficiently long DNA in a process called tagmentation: the simultaneous fragmentation and tagging of DNA performed by transposase pre-loaded with sequencing adaptors.
  • the tagged DNA fragments (referred to as fragmented DNA or tagmented DNA) can be amplified by PCR and sequenced. Sequencing reads are then used to infer regions of increased accessibility as well as to map regions of transcription-factor binding sites and nucleosome positions.
  • the methods provided include performing a tagmentation step to assess the effects of perturbations on chromatin accessibility.
  • the method comprises obtaining cell nuclei from all or a portion of the single cells that have genetic perturbations.
  • the methods include a preparation step, in which the cells are lysed in a resuspension buffer.
  • the cell membrane is lysed but the cell nuclei remain intact.
  • the lysed cells still contain mitochondria. For example, using the cell lysing method performed in the Examples, about 20% to about 50% mitochondrial reads were found in the ATAC library.
  • cell nucleus or any grammatical variation thereof may refer to a cell nucleus, the membrane-bound organelle found in eukaryotic cells which contains the cell genome. It may also include some cytosomal/cytosomic components which remain physically attached to the cell nucleus after cell lysing, for example, endoplasmic reticulum (ER) connected to the nucleus and some mitochondria.
  • ER endoplasmic reticulum
  • Isolated nuclei are present in separate partitions for the tagmentation step.
  • an individual partition contains at least about 1000, about 2000, about 5000, about 25,000, or about 50,000 nuclei per partition.
  • the partition is an individual well of a multiwell plate (e.g., a 96-well plate).
  • the tagmentation step includes separating nuclei into at least 5, 10, 20, 40, 60, or 80 partitions.
  • partitions include a nucleotide sequence that includes a unique first barcode (i.e., a barcode sequence distinguishable from a barcode sequence of another or all other partitions).
  • additional partitions have a nucleotide sequence that includes a second barcode, a third barcode, or a fourth barcode, etc. that is linked to the DNA during tagmentation.
  • Nuclei in partitions are incubated in tagmentation buffer that comprises a transposome complex, which includes a transposase, a transposon, and a nucleotide sequence comprising a handle sequence and the first barcode.
  • the transposase causes staggered double-stranded breaks in DNA, and the handle sequence and the first barcode are linked to the double-stranded DNA at the staggered breaks.
  • a “transposase” is an enzyme that binds to the end of a transposon and catalyzes its movement to another part of the genome by a cut and paste mechanism or a replicative transposition mechanism.
  • such enzyme is a member of the RNase superfamily of proteins which includes retroviral integrases.
  • Examples of transposases include Tn3, Tn5, and hyperactive mutants thereof.
  • Tn5 can be found in Shewanella and Escherichia bacteria.
  • An example of a hyperactive mutant Tn5 comprises a mutation of E54K.
  • the transposase is TnY or Tn5.
  • TnY is a hyperactive mutant of the transposase from Vibrio parahemolyticus (ViPar).
  • the inside and outside ends (IE and OE, respectively) of the ViPar transposon utilize the same sequence as the IE and OE of the Tn5 transposon, suggesting the ViPar transposon would be compatible with existing Tn5-based workflows.
  • Two mutations were introduced: (1) P50K, equivalent to the mutation E54K in Tn5, which is predicted to make the transposon hyperactive and (2) M53Q, which changes the residue that interacts with nucleotide 9 (a thymine) on the nontransferred strand of the mosaic end (ME) similar to Tn5 Q57, predicted to increase binding to the Tn5 ME.
  • The ViPar transposase with P50K and M53Q mutations, henceforth referred to as TnY, showed Tn5 ME loading and tagmentation activity.
  • the insertion site preference of TnY was characterized by performing tagmentation on NA12878 DNA and sequencing on a MiSeq Instrument (Illumina); it was found that TnY has insertion site preferences distinct from, but of a similar magnitude to those of Tn5.
  • transposon refers to a nucleic acid molecule that is capable of being incorporated into a nucleic acid by a transposase enzyme.
  • a transposon includes two transposon ends (also termed “arms” and “mosaic end” or “ME”, for example, a double-stranded mosaic end comprising a pMENT common oligo).
  • the two transposon ends are linked by a sequence that is sufficiently long to form a loop in the presence of a transposase.
  • Transposons can be double-, single-stranded, or mixed, containing single- and double-stranded region(s), depending on the transposase used to insert the transposon.
  • transposon ends are double-stranded, but the linking sequence need not be double-stranded.
  • in a transposition event, these transposons are inserted into double-stranded DNA.
  • transposon end refers to the sequence region that interacts with transposase.
  • the transposon ends are double-stranded for transposases Mu, Tn3, Tn5, Tn7, Tn10, etc.
  • the transposon ends are single-stranded for transposases IS200/IS605 and ISrad2, but form a secondary structure, just like a double-stranded region.
  • See, for example, US20150337298A1, which is incorporated herein by reference.
  • the transposome complex comprises a transposase assembled with a transposon comprising two mosaic end double-stranded (MEDS) oligos.
  • the transposome complex comprises a barcode in one or both of the MEDS oligos.
  • the transposome complex further comprises a nucleic acid sequence at the 5’ ends of the MEDS oligos, wherein the nucleic acid sequence is able to anneal to a PCR primer.
  • a T5 oligo may be annealed to MEDS A and a T7 oligo may be annealed to MEDS B.
  • handle refers to a nucleic acid sequence that is complementary to and capable of binding a capture sequence. For example, in a suitable PCR amplification, the handle and capture sequences anneal and amplification results in a sequence complementary to the handle and/or capture sequences.
  • the handle is positioned at the 3’ end of an oligonucleotide sequence. In other embodiments, the handle is positioned at the 5’ end of a construct oligonucleotide sequence.
  • the handle has a length of at least 4 nucleotides, 5 nucleotides, 10 nucleotides, 15 nucleotides, 20 nucleotides, 25 nucleotides, 30 nucleotides, 35 nucleotides, 40 nucleotides, 45 nucleotides, 50 nucleotides, 60 nucleotides, 70 nucleotides, 80 nucleotides, 90 nucleotides, 100 nucleotides, or a length within a range of any two of the foregoing lengths.
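The complementarity between a handle and a capture sequence can be checked computationally. The Python sketch below is illustrative only: it assumes DNA sequences written 5’ to 3’, treats annealing as exact reverse complementarity over the full length (real annealing tolerates mismatches and depends on reaction conditions), and uses hypothetical sequences.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def can_anneal(handle: str, capture: str) -> bool:
    """True when the handle is the exact reverse complement of the capture
    sequence (a simplifying assumption, not a thermodynamic model)."""
    return handle.upper() == reverse_complement(capture).upper()


# Hypothetical capture sequence and the handle designed against it.
capture = "ATGCCGTA"
handle = reverse_complement(capture)
print(handle, can_anneal(handle, capture))
```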
  • the tagmentation step includes the addition of a reagent to stop the tagmentation before proceeding with subsequent steps.
  • the reagent includes EDTA.
  • RNAs including, e.g., CRISPR guide RNAs and other RNA species
  • cDNA complementary DNA
  • Cell nuclei are incubated with reverse transcription primers and an oligo (i.e., a template switch oligo (TSO)) that includes a handle sequence and a barcode sequence, or the corresponding antisense sequence thereof, reverse transcriptase, and dNTPs in a reverse transcription buffer.
  • TSO template switch oligo
  • the reverse transcription reaction generates cDNA products that include the handle and barcode sequences.
  • Suitable primers can be designed for the reverse transcription.
  • the reverse transcription primers include polydT.
  • the reverse transcription primers include primers specific for CRISPR guide RNA.
  • the reverse transcription step is performed in the same partitions as the tagmentation step (i.e., nuclei present in a partition are not redistributed following tagmentation step and/or transferred to new partitions). In certain embodiments, all or a portion of nuclei present in a partition during a tagmentation reaction are transferred to a new partition. Irrespective of whether the tagmentation step and reverse transcription steps are performed in the same partition, the barcode linked to the double-stranded DNA during the tagmentation step is matched to the barcode of the template switch oligo. In certain embodiments, the barcode of the template switch oligo is a first barcode, identical to the first barcode of a tagmentation reaction.
  • the TSO of the corresponding reverse transcription includes a matching barcode identifying nuclei (or a subset of nuclei).
  • cellular RNA and CRISPR guide RNAs are reverse transcribed to cDNA comprising a handle sequence and a barcode sequence, or the corresponding reverse-complement sequence thereof.
  • the barcode sequences of the tagmentation and reverse transcription reactions can be used to assign sequences to specific cells or nuclei.
  • the TSO further comprises a “Unique Molecular Identifier” (UMI), which is a random sequence of nucleotide bases, which when it is a functional element of the polymer construct, is specific for that polymer construct.
  • UMI Unique Molecular Identifier
  • the UMI permits identification of amplification duplicates of the polymer construct/construct oligonucleotide sequence with which it is associated.
  • One or more UMI may be associated with a single polymer construct/construct oligonucleotide sequence.
  • the UMI may be positioned 5’ or 3’ to a barcode in a nucleotide construct (e.g., a TSO).
  • a UMI is added during the method.
  • in a method for RNA-sequencing, a UMI is added during the method.
  • another UMI is introduced during reverse transcription.
  • Each UMI is specific for its construct oligonucleotide sequence.
  • the UMI is about 8 nucleotides in length. In other embodiments, each UMI can be at least about 1 to 100 nucleotides in length.
  • the UMI is formed of a random sequence of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
  • cell nuclei are pooled.
  • the pooled nuclei include nuclei from multiple (i.e., more than one) partitions.
  • the pooled nuclei are in an aqueous suspension.
  • the pooled nuclei include nuclei and/or a subset of nuclei from at least 5, at least 10, at least 20, at least 40, at least 60, or at least 80 partitions.
  • nuclei are randomly partitioned.
  • the nuclei suspension is then subject to an additional, second-barcoding step utilizing droplet microfluidics (e.g., 10x Genomics ATAC kit).
  • droplet microfluidics e.g., 10x Genomics ATAC kit.
  • the random partitioning results in individual partitions (a droplet) containing a bead and at least about 2, at least about 5, at least about 10, at least about 15, at least about 20 nuclei, at least about 30, at least about 40, or at least about 50 nuclei.
  • nuclei are partitioned with a bead comprising linked nucleotide sequences comprising a bead-specific barcode sequence and a capture sequence.
  • the bead-specific barcode sequence is a 10X ATAC GEM barcode.
  • the capture sequence is complementary to and capable of binding a handle sequence present on products of the tagmentation step (e.g., a nucleic acid having a handle sequence and a first barcode linked to double-stranded DNA) or the reverse transcription step (i.e., a nucleotide sequence having a handle sequence and a first barcode in combination with a CRISPR guide RNA transcribed cDNA or cDNA generated from a nuclear and/or a cellular RNA comprising the first barcode).
  • the capture sequence is a 10X ATAC GEM capture sequence.
  • a PCR reaction results in disruption of the nuclei in a droplet, and generation of molecular species having a specific bead-barcode.
  • the resulting molecular species include products of the tagmentation and/or reverse-transcription steps, thereby generating a library of amplicons having a first barcode and a second barcode. See FIG. 2.
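Because a droplet may contain more than one nucleus, the first barcode is what distinguishes nuclei that share a bead barcode. The Python sketch below estimates how often two nuclei in the same droplet would also share a first barcode. It assumes, for illustration only, that nuclei are drawn uniformly and independently from the first-round partitions (a birthday-problem model); the function names and example values are hypothetical.

```python
def prob_all_distinct(n_first_barcodes: int, nuclei_per_droplet: int) -> float:
    """Probability that all nuclei in one droplet carry different first
    barcodes, assuming uniform, independent first-round partitioning."""
    p = 1.0
    for i in range(nuclei_per_droplet):
        # The (i+1)-th nucleus must avoid the i barcodes already present.
        p *= max(0.0, 1.0 - i / n_first_barcodes)
    return p


def collision_probability(n_first_barcodes: int, nuclei_per_droplet: int) -> float:
    """Probability that at least two nuclei in a droplet share both barcodes."""
    return 1.0 - prob_all_distinct(n_first_barcodes, nuclei_per_droplet)


# Example: 96 first-round partitions, varying droplet loading.
for k in (2, 5, 10, 20):
    print(f"{k} nuclei per droplet: collision probability "
          f"{collision_probability(96, k):.3f}")
```

Under this model, increasing the number of first-round partitions or lowering the number of nuclei per droplet reduces the chance that two nuclei receive an identical barcode combination.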
  • pooled nuclei are randomly partitioned into a set of partitions that do not include a bead for the additional barcoding step.
  • the individual partitions contain oligos comprising a second-barcode sequence and a capture sequence.
  • the pooled nuclei are randomly partitioned and then incubated with oligos comprising a second-barcode sequence and a capture sequence.
  • the oligos of each partition are unique, i.e., the second-barcode of the oligo is not present in another or any other individual partitions.
  • the pooled nuclei are randomly partitioned into at least 5, at least 10, at least 20, at least 30, at least 40, at least 50 partitions.
  • the partitions are wells of a microwell plate (e.g., a 96-well plate).
  • the random partitioning results in individual partitions (e.g., wells) containing at least about 2, at least about 5, at least about 10, at least about 15, at least about 20 nuclei, at least about 30, at least about 40, or at least about 50 nuclei.
  • the second barcode sequence is a 10X ATAC GEM barcode.
  • the capture sequence is a 10X ATAC GEM capture sequence.
  • a PCR reaction is performed with the cells in individual partitions (e.g., wells), resulting in disruption of the nuclei, and generation of molecular species having a specific bead-barcode.
  • the resulting molecular species include products of the tagmentation and/or reverse-transcription steps, thereby generating a library of amplicons having a first barcode and a second barcode.
  • DNA and/or cDNA are extracted and sequenced.
  • the methods comprise further amplification (linear or exponential) to obtain libraries with increased copy numbers of molecular species. Analysis of the sequences provides chromatin accessibility and RNA sequences (transcriptome) information for single cells that have identifiable genetic perturbations (through capture and sequencing of guide RNAs).
  • the methods comprise isolation of a molecular species from an amplification library or a subset of molecular species from an amplification library. See Examples 1 and 2 for exemplary protocols.
  • PCR amplification products are separated to obtain a library comprising a combination of PCR products comprising double-stranded DNA and cDNA from transcription of CRISPR guide RNAs and cDNA generated from cellular RNA.
  • PCR amplification products are separated to obtain a first library comprising double-stranded DNA of (b), and a second library comprising (i) cDNA from transcription of the CRISPR guide RNAs and (ii) cDNA generated from cellular RNA, optionally wherein the separation is based on size.
  • the separation is based on size.
  • separation is achieved using a streptavidin-biotin mediated method, wherein a prior PCR reaction links a biotinylation site to a molecular species.
  • DNA sequencing is the process of determining a nucleic acid sequence - the order of nucleotides in DNA. It includes any method or technology that is used to determine the order of the four bases: adenine, guanine, cytosine, and thymine.
  • Methods of sequencing may include, but are not limited to, Maxam-Gilbert sequencing, shotgun sequencing, bridge PCR, Chain-termination methods, Single-molecule real-time sequencing, Ion semiconductor (Ion Torrent sequencing), Pyrosequencing (454), Sequencing by synthesis (Illumina), Combinatorial probe anchor synthesis (cPAS-BGI/MGI), Sequencing by ligation (SOLiD sequencing), Nanopore Sequencing, Chain termination (Sanger sequencing), Massively parallel signature sequencing (MPSS), and Polony sequencing.
  • Such sequencing may be performed on a deep sequencing platform, which sequences a region multiple times, sometimes hundreds or even thousands of times, and/or via a next-generation sequencing (NGS) approach (also known as high-throughput sequencing).
  • the DNAs or cDNAs having the same first barcode and second barcode are identified as being obtained from the same cell (or nuclei).
  • the second barcode is a bead-specific barcode.
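  • As an illustration of how reads sharing the same first barcode and second barcode can be attributed to one cell (or nucleus), the following is a minimal sketch. The record layout (`bc1`, `bc2`, `seq`) and the toy barcode strings are assumptions made for this example and are not part of the described methods.

```python
from collections import defaultdict


def group_reads_by_cell(reads):
    """Group reads by their (first barcode, second barcode) pair.

    Each read is a dict with hypothetical keys 'bc1', 'bc2', and 'seq'.
    Reads sharing both barcodes are treated as coming from the same cell.
    """
    cells = defaultdict(list)
    for read in reads:
        cells[(read["bc1"], read["bc2"])].append(read["seq"])
    return dict(cells)


# Toy data: two reads from one cell, plus one read that shares the
# bead barcode (bc2) but carries a different first barcode (bc1).
reads = [
    {"bc1": "AAC", "bc2": "GGT", "seq": "read_1"},
    {"bc1": "AAC", "bc2": "GGT", "seq": "read_2"},
    {"bc1": "TTG", "bc2": "GGT", "seq": "read_3"},
]
cells = group_reads_by_cell(reads)
```

Because several nuclei may share one bead, the second barcode alone does not identify a cell; only the combination of both barcodes does.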
  • presence of certain RNA in the cell is determined through sequencing cDNAs.
  • the guide RNA may be aligned to identify a respective target gene or genomic region of interest.
  • transcriptome shown by RNA sequences may be acquired via cDNA sequencing, thus providing data available via traditional RNA-seq (RNA sequencing).
  • the genomic DNAs are analyzed as in ATAC-seq.
  • sequence reads of the fragmented genomic DNAs are acquired and aligned to a reference genome (for example, using programs available to one of skill in the art such as BWA and Bowtie2).
  • one or more parameters for quality control purposes are acquired, for example, fragment size distribution, library complexity, read start positions adjusted for the transposase (for example, sequence reads aligning to the positive strand are offset by 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 bp, and all reads aligning to the negative strand are offset by 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 bp), and promoter/transcript body score (calculated as the coverage of promoters divided by the coverage of transcript bodies, showing whether the signal is enriched in promoters).
  • sequence reads aligning to the positive strand are offset by +4 bp.
  • all reads aligning to the negative strand are offset by -5 bp.
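  • The +4 bp / -5 bp adjustment can be sketched as follows. This is an illustrative helper only: it assumes half-open genomic coordinates and assumes the offset is applied to the 5' end of each read (the `start` of a positive-strand read, the `end` of a negative-strand read). The function name and signature are hypothetical.

```python
def shift_read(start, end, strand, plus_offset=4, minus_offset=-5):
    """Return (start, end) after adjusting the read's 5' end for the transposase.

    Positive-strand reads have their start moved by plus_offset;
    negative-strand reads have their end moved by minus_offset.
    """
    if strand == "+":
        return start + plus_offset, end
    if strand == "-":
        return start, end + minus_offset
    raise ValueError(f"unknown strand: {strand!r}")


plus_read = shift_read(100, 150, "+")
minus_read = shift_read(100, 150, "-")
```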
  • mapping results are separated according to uniqueness and alignment type (concordant, discordant, and non-concordant/non-discordant). Peak-calling identifying enriched (signal) regions in ATAC-seq data is then performed using tools, such as MACS2.
  • the chromosome position is plotted on the x axis and the enrichment score is plotted on the y axis. Peaks in the plot therefore identify enriched regions in the chromosome, indicating open chromatin with high chromatin accessibility.
  • The following may be identified: (1) nucleosome-free, mononucleosome, dinucleosome, and trinucleosome regions; (2) distribution of nucleosome-free and nucleosome-bound regions; (3) transcription factor footprints; (4) sample correlations. Numbers of ATAC fragments, peaks, as well as differential peaks (for example, for comparing ATAC-seq samples from two different conditions) may be obtained using this method.
  • cells with at least about 50, about 100, about 200, about 300, about 400, about 500, about 600, about 700, about 800, about 900, about 1000, about 2000, about 3000, about 4000, about 5000, about 6000, about 7000, about 8000, or about 9000 unique ATAC-seq fragments are selected for analysis.
  • each cell is required to have at least about 10, about 20, about 30, about 40, about 50, about 60, about 70, about 80, about 90, about 100, about 200, about 300, about 400, about 500, about 600, about 700, about 800, about 900, about 1000, about 2000, about 3000, or about 4000 RNA (for example guide RNA) reads with at least about 90%, about 95%, about 96%, about 97%, about 98%, or about 99% of the reads assigned to one RNA sequence.
  • cells with at least about 2000 unique ATAC-seq fragments are selected for analyses.
  • each cell is required to have at least about 100 guide RNA reads with at least about 99% of the reads assigned to one RNA sequence.
  • analysis is limited to cells having a specific perturbation, wherein at least 25, 50, 100, 150, 200, or 250 cells are identified as having the perturbation.
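  • The cell-level and perturbation-level thresholds above can be combined into a simple filter. The sketch below uses the example values from the text (at least about 2000 unique ATAC-seq fragments, at least about 100 guide RNA reads with at least about 99% assigned to one guide, and at least 25 cells per perturbation); the function names and data layout are hypothetical.

```python
from collections import Counter


def assign_guide(n_fragments, guide_counts, min_fragments=2000,
                 min_guide_reads=100, min_top_fraction=0.99):
    """Return the dominant guide for a cell, or None if the cell fails QC.

    guide_counts maps guide name -> number of guide RNA reads in the cell.
    """
    total = sum(guide_counts.values())
    if n_fragments < min_fragments or total < min_guide_reads:
        return None
    top_guide, top_reads = max(guide_counts.items(), key=lambda kv: kv[1])
    return top_guide if top_reads / total >= min_top_fraction else None


def perturbations_with_enough_cells(assigned_guides, min_cells=25):
    """Keep only perturbations observed in at least min_cells cells."""
    counts = Counter(assigned_guides)
    return {guide for guide, n in counts.items() if n >= min_cells}


kept = perturbations_with_enough_cells(["gA"] * 30 + ["gB"] * 5)
```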
  • ChIP-seq may be used to identify enrichment or depletion in accessibility of transcription factor (TF) binding sites following chromatin modifier knockout.
  • JASPAR motifs may be used to predict TF binding sites (for example, 386 motifs from JASPAR 2016, human CORE dataset). Transcription factor motif enrichment and depletion scores may be calculated, for example, using chromVAR 20.
  • coverage per base around AP-1 motifs using mononucleosomal fragments (defined as paired-end ATAC-seq fragments with a length between 180 and 247 nt 9) was calculated, for example, using BEDTools.
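  • A per-base coverage calculation restricted to mononucleosomal fragments might look like the sketch below. It is a plain-Python stand-in for illustration, not the BEDTools workflow itself, and it assumes half-open fragment coordinates, an inclusive 180 to 247 nt length range, and a hypothetical motif center position.

```python
def is_mononucleosomal(start, end, min_len=180, max_len=247):
    """True if the fragment length falls in the mononucleosomal range."""
    return min_len <= end - start <= max_len


def coverage_around(center, fragments, flank):
    """Per-base coverage in [center - flank, center + flank] from mononucleosomal fragments."""
    window_start = center - flank
    coverage = [0] * (2 * flank + 1)
    for start, end in fragments:
        if not is_mononucleosomal(start, end):
            continue
        lo = max(start, window_start)
        hi = min(end, center + flank + 1)
        for pos in range(lo, hi):
            coverage[pos - window_start] += 1
    return coverage


fragments = [(900, 1100), (999, 1001), (1001, 1201)]  # toy (start, end) pairs
cov = coverage_around(center=1000, fragments=fragments, flank=2)
```

In this toy input the second fragment is only 2 nt long, so it is ignored; the third fragment begins one base after the motif center and therefore covers only the right side of the window.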
  • accessibility of enhancers and promoters may be determined.
  • a null peak distribution derived from non-perturbed and/or untreated cells is used as a reference, and data acquired from cells is compared to the reference.
  • each cell population per perturbation is down-sampled to a smaller cell number and the data acquired is compared to a non-perturbed cell population of a similar size.
  • Each population of cells is resampled about 100, about 200, about 500, about 600, about 700, about 800, about 900, about 1000, about 1500, about 2000, about 3000, about 5000, or more times and the coverage at transcription start sites, weak enhancers (midpoint), and strong enhancers (midpoint) is calculated.
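  • The down-sampling and resampling comparison can be sketched as below. This is a minimal illustration under stated assumptions: per-cell coverage values are held in a list, cells are drawn without replacement, and the summary statistic is the mean. The function names and the empirical fraction at the end are hypothetical choices, not a procedure taken from the text.

```python
import random


def resampled_means(values, sample_size, n_resamples=1000, seed=0):
    """Draw sample_size cells without replacement n_resamples times; return the means."""
    rng = random.Random(seed)
    return [
        sum(rng.sample(values, sample_size)) / sample_size
        for _ in range(n_resamples)
    ]


def fraction_at_least(null_means, observed):
    """Fraction of null resamples whose mean is at least the observed value."""
    return sum(m >= observed for m in null_means) / len(null_means)


# Toy control population with identical coverage in every cell.
control_coverage = [1.0] * 50
null = resampled_means(control_coverage, sample_size=10, n_resamples=100)
frac = fraction_at_least(null, observed=2.0)
```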
  • a method for evaluating effects of genetic perturbations on chromatin accessibility and the transcriptome of single cells in a population of cells comprising:
  • transposome complex comprises a transposase, a transposon, and a nucleotide sequence comprising a handle sequence and a first barcode, wherein the transposase causes staggered double-stranded breaks in DNA, and wherein the handle sequence and the first barcode are linked to the double-stranded DNA at the staggered breaks;
  • (c) performing reverse transcription on nuclei from (b), which comprises contacting and incubating the nuclei with reverse transcription primers and template switch oligos (TSOs) comprising the handle sequence and the first barcode, or the corresponding reverse-complement sequence thereof, optionally wherein the TSOs comprise a UMI, reverse transcriptase, and dNTPs in a reverse transcription buffer, whereby cellular RNA and CRISPR guide RNAs are reverse transcribed to cDNA comprising the handle sequence and the first barcode, or the corresponding reverse-complement sequence thereof;
  • any one of embodiments A1 to A3, wherein the one or more genetic perturbations include CRISPR-Cas mediated editing, including CRISPR/Cas9, prime-editing, base-editing, CRISPRa, and/or CRISPRi.
  • A5 The method of any one of embodiments A1 to A4, wherein more than one CRISPR guide RNA targets a gene or genomic region of interest or a different gene or genomic region of interest in a single cell.
  • step (e) comprises partitioning at least about 2, at least about 5, at least about 10, at least about 15, or at least about 20 nuclei with the bead.
  • A11 The method of any one of embodiments A1 to A10, wherein the PCR products generated in step (e) are separated to obtain a first library comprising double-stranded DNA of (b), and a second library comprising (i) cDNA from transcription of the CRISPR guide RNAs and (ii) cDNA generated from cellular RNA, optionally wherein the separation is based on size.
  • A12 The method of any one of embodiments A1 to A11, wherein the population of cells of (a) have been further treated with a chemical agent or a biological agent.
  • A13 The method of any one of embodiments A1 to A12, wherein the analysis is limited to cells (nuclei) defined as having at least 200 fragments per cell and/or perturbations wherein at least 100 cells are identified as having the perturbation.
  • BT16-luciferase 42 cells were a gift from Rintaro Hashizume.
  • BT12 cells were a gift from Charles Roberts. NIH-3T3 (CRL-1658) and CHLA06 (CRL-3038) were acquired from ATCC.
  • HEK293FT cells were acquired from ThermoFisher (R70007).
  • BT16 and BT12 cells were validated by STR profiling, while other lines were authenticated by the vendor. All cell lines were maintained at 37 °C and 5% CO2 in D10 medium: DMEM with high glucose and stabilized L-glutamine (Caisson DML23) supplemented with 10% Serum Plus II (Sigma 14009C).
  • Monoclonal CRISPRi-expressing BT16 cell lines were generated by transducing cells with lentiCRISPRi(v2)-Blast (Addgene 170068) 14 , selecting with 10 µg/ml Blasticidin S (ThermoFisher A1113903), and plating at a low density for colony picking. Several clones were selected and monitored for growth. A clone maintaining normal BT16 growth patterns and CRISPRi(v2) expression by Cas9 immunocytochemistry was selected for the MultiPerturb-seq screen. NIH-3T3, BT12, and CHLA06 cells were also transduced with lentiCRISPRi(v2)-Blast and selected with 10 µg/ml blasticidin for 1 week.
  • gRNAs guide RNAs
  • the AT/RT library targeted genes that encode proteins with roles in DNA modification, histone modification, histone chaperones, transcription factors, chromatin remodelers, and structural factors.
  • the library was designed using gRNAs from the Dolcetto CRISPRi library and CRISPick 43 . Three gRNAs were selected per gene and homopolymers were excluded. Oligonucleotides were ordered and synthesized by Twist Biosciences in pooled format. For the mouse spike-in, mouse non-targeting gRNAs were ordered individually through Integrated DNA Technologies (IDT) and pooled for library cloning.
  • Oligonucleotides were diluted, and a PCR cycle test was performed to ascertain the minimum cycles needed for library amplification to preserve integrity. Following this, oligonucleotides were amplified using a two-step nested PCR, then cloned in lentiGuideFE-Puro (Addgene 170069) with Gibson cloning using Gibson mix (NEB E2611L) and precipitated with ethanol. The library was then transformed into Endura cells (Biosearch 60242-2). Bacteria were then grown on plates, maxi-prepped (IBI Scientific IB47125), and then sequenced. For quality control, libraries were sequenced on Illumina MiSeq.
  • Lentiviral libraries were prepared in T225 flasks. Each flask was seeded with 27 x 10^6 cells the day before in 30 ml of antibiotic-free D10 media to achieve 80-90% confluence before transfection.
  • the transfection mix was 24.9 µg of the transfer plasmid (including the epigenetic remodelers or mouse non-targeting library), 13.7 µg pMD2.G (Addgene 12260), 19.9 µg psPAX2 (Addgene 12259), 2490 µl OptiMEM (Invitrogen 51985-091) and 138 µl 1 mg/ml polyethylenimine linear MW 25000 (Polysciences 23966). The mixture was mixed and allowed to incubate for 10 minutes at room temperature.
  • nuclei were spun down, pooled, resuspended in 450 µl PBS and combined with tagmentation mix: 240 µl 5X TD-TAPS (50 mM TAPS-NaOH buffer, pH 8.5 [Boston BioProducts BB-2375], 25 mM MgCl2, 50% DMF [Sigma 494488]), 120 µl 10% Tween-20, 300 µl dilution buffer (10 mM Tris-HCl, pH 7.4, 100 mM NaCl, 50% glycerol, 1 mM DTT), 30 µl RiboLock RNase inhibitor (ThermoFisher EO0381). The nuclei were then split among wells of barcoded transposomes for tagmentation.
  • RT reverse transcription
  • 8 µl 5X RT buffer (250 mM Tris-HCl, 375 mM KCl, 15 mM MgCl2, 50 mM DTT)
  • 2 µl dNTPs
  • 2 µl MPSprimer_06 (10 µM)
  • 4 µl MPSprimer_08
  • 2 µl Maxima RT H-minus (ThermoFisher EP0753)
  • 1 µl RiboLock (ThermoFisher EO0381)
  • Nuclei were then resuspended well by triturating with a narrowed pipette tip and all wells were pooled into 2 x 1.5mL tubes, spun down, and re-pooled in a 1.5mL tube.
  • the narrowed pipette tip was produced using a standard plastic 20 µl pipette tip (Rainin) melted to narrow gauge using an infrared sterilizer (Joanlab DS-900S). After observing nuclei to avoid clumps and counting, nuclei were resuspended in diluted nuclei buffer to achieve the desired loading amount (100,000 nuclei in 8 µl) and combined with 7 µl ATAC buffer B (10x Genomics PN2000193).
  • Part 2: 10X ATAC GEM generation, barcoding, and cleanup
  • nuclei suspension was prepared for second-round barcoding using droplet microfluidics (10X Genomics ATAC kit PN1000176) following the manufacturer’s instructions. Briefly, nuclei were mixed with the master mix (56.5 µl Barcoding reagent B (PN2000194), 1.5 µl Reducing agent A (PN2000087), and 2 µl Barcoding enzyme (PN2000125/139)) and loaded onto the Chromium Next GEM Chip H (PN1000162) with glycerol, gel beads, and partitioning oil. Following the run on the Chromium Controller, 100 µl GEMs were collected and transferred to a PCR tube for GEM incubation. 15 cycles were substituted for 12 cycles during the linear amplification. GEMs were then cleaned with Dynabeads per the manufacturer’s instructions, and libraries were split into 20 µl ATAC and 20 µl RNA libraries for final library prep. We recovered ~3.6 cells per droplet on average.
  • Part 3: Library preparation
  • the ATAC fraction (20 µl) was cleaned up with 1.2X SPRI (Illumina) and amplified with a 100 µl reaction using NEBNext: 50 µl High-Fidelity 2X Master Mix (NEB M0541S), 5 µl MPSprimer_04 (10 µM), MPSprimer_14 (10 µM), 20 µl ATAC fraction and 20 µl water (30 seconds 98°C, (10 seconds 98°C, 30 seconds 63°C, 1 minute 72°C) x 10-15 cycles, 2 minutes 72°C, hold 4°C), then cleaned with double-sided SPRI (0.45X, 1.8X) in order to isolate fragments of lengths 50-1000 bp.
  • RNA fraction (20 µl) was cleaned by incubation with 8 µl ExoSAP for 15 minutes at 37°C and then 15 minutes at 80°C.
  • the cleaned RNA product was amplified using an ISPCR 50 with a 100 µl KAPA HiFi reaction (Roche 07958935001): 50 µl 2X Master Mix, 2.5 µl MPSprimer_04 (10 µM), 2.5 µl MPSprimer_07 (10 µM), 2.5 µl MPSprimer_12 (10 µM), 28 µl cleaned RNA product, and 14.5 µl water (3 minutes 95°C, (20 seconds 95°C, 30 seconds 66°C, 1 minute 72°C) x 10 cycles, 2 minutes 72°C, hold 4°C).
  • the mRNA and gRNA fractions were split using a two-sided SPRI 4 .
  • the mRNA was collected with a 0.6X SPRI and the gRNA was isolated from the supernatant using an additional 1.4X SPRI. Each fraction was then resuspended in 10 µl water.
  • the mRNA may then be amplified with 3-9 additional cycles of a 50 µl reaction if there is less than 1 ng of product: 25 µl 2X KAPA HiFi Master Mix, 1.25 µl MPSprimer_04 (10 µM), 1.25 µl MPSprimer_07 (10 µM), 10 µl cleaned RNA product, and 12.5 µl water (3 minutes 95°C, (20 seconds 95°C, 30 seconds 66°C, 1 minute 72°C) x 3-9 cycles, 2 minutes 72°C, hold 4°C).
  • the 10 µl mRNA fraction was tagmented with Tn loaded with MPSprimer l 13 in 20 µl of tagmentation buffer for 5 minutes at 55°C. This was then cleaned with DNA Clean & Concentrator-5 (Zymo D4014), resuspended in 33.5 µl water and PCR amplified with 50 µl PfuX7 51 : 10 µl 5X GC buffer, 1 µl dNTPs, 2.5 µl MPSprimer_04 (10 µM), 2.5 µl MPSprimer_05 (10 µM), 0.5 µl X7 polymerase, and 33.5 µl mRNA fraction using the following program: 5 minutes 72°C, 30 seconds 98°C, (10 seconds 98°C, 30 seconds 61°C, 1 minute 72°C) x 10 cycles, 2 minutes 72°C, hold 4°C.
  • the 10 µl gRNA fraction was cleaned with 4 µl 0.2 U/µl ExoSAP and amplified with a 50 µl intermediate PCR: 25 µl 2X KAPA HiFi Master Mix with 1.25 µl biotinylated guide scaffold primer (MPS_primer11, 10 µM), 1.25 µl MPSprimer_04 (10 µM), 10 µl gRNA fraction, and 8.5 µl water (3 minutes 95°C, (20 seconds 95°C, 30 seconds 64°C, 1 minute 72°C) x 10 cycles, 2 minutes 72°C, hold 4°C), then cleaned again with 1.8X SPRI, resuspended in 10 µl water, and incubated with 4 µl ExoSAP.
  • gRNA was pulled down with Dynal MyOne Dynabeads Streptavidin C1 (ThermoFisher 65001), resuspended in 45 µl water, then amplified with a final inner (guide library) PCR using KAPA HiFi Master Mix: 50 µl Master Mix, 2.5 µl MPSprimer_04 (10 µM), 2.5 µl MPSprimer_13 (10 µM), and 45 µl gRNA pulldown product (3 minutes 95°C, (20 seconds 95°C, 30 seconds 57°C, 1 minute 72°C) x 10 cycles, 2 minutes 72°C, hold 4°C).
  • MultiPerturb-seq was developed incrementally, first incorporating ATAC, then mRNA and gRNA capture, ensuring preservation of each modality throughout the process (several key examples shown in FIG. 5).
  • we built off of our previous work 6 , adapting it to the 10X ATAC kit using a mock gel bead oligonucleotide (FIG. 5).
  • We then adapted the direct guide capture technique from 4 also described in 55 .
  • FIG. 5A - FIG. 5J are 1-2% with 1 kb Plus DNA ladder (NEB N3200L) unless otherwise noted.
  • the barcode 1 reference was derived from oligonucleotide sequences and the barcode 2 reference was constructed from the whitelist provided by cellranger-atac (10X Genomics). ATAC reads were aligned with bowtie2 59 (version 2.5.1) with default parameters to the joint human (hg38) and mouse (mm10) genome reference provided by 10X Genomics.
  • Open chromatin peaks were called using macs2 60 callpeak (version 2.2.7.1) with the parameters -f BED -g hs -p 0.01 --nomodel --shift 37 --extsize 73 -B --SPMR --keep-dup all --call-summits, then reads were assigned to peaks based on loci with bedtools window (version 2.30.0) with a 100 bp window around the start position.
  • mRNA reads were aligned with STAR 61 (version 2.7.3a) using the settings --quantMode GeneCounts --soloFeatures GeneFull_Ex50pAS, then annotated with subread 62 featureCounts (version 2.0.4) using a joint human (hg38) and mouse (mm10) gtf (10X Genomics, 2020-A) with the settings -t gene -R SAM. Aligned reads were then joined to create a list of cell barcodes (barcode 1 and barcode 2), unique molecular identifiers (UMIs) if applicable, and aligned/annotated reads.
  • Perturbed cells were separated (pseudo-bulk) by perturbation and compared to published transcriptomic 28 and accessible chromatin 34, 35 atlases by computing the Pearson correlation across the top 1000 highly variable genes or peaks. Correlations were computed between each perturbation-specific pseudo-bulk and previously published primary tissue gene expression or open chromatin. For all correlations and differentiation scores, we only used cells with at least 200 fragments per cell and perturbations with at least 100 cells captured.
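  • The pseudo-bulk correlation can be sketched as follows. This is an illustrative example only: the text does not state how highly variable features were chosen, so ranking by variance across pseudo-bulks is an assumption here, and the feature names, profile names, and values are made up. The text uses the top 1000 features; the toy example uses 3.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def top_variable_features(profiles, k):
    """Rank features by variance across pseudo-bulk profiles (an assumed criterion)."""
    features = list(next(iter(profiles.values())))

    def variance(feature):
        values = [profile[feature] for profile in profiles.values()]
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)

    return sorted(features, key=variance, reverse=True)[:k]


# Toy pseudo-bulk profiles (feature -> aggregated signal) and a reference atlas profile.
profiles = {
    "control": {"g1": 1.0, "g2": 5.0, "g3": 2.0, "g4": 3.0},
    "pertA": {"g1": 4.0, "g2": 5.0, "g3": 6.0, "g4": 2.0},
}
reference = {"g1": 2.0, "g2": 9.0, "g3": 3.0, "g4": 1.0}

selected = top_variable_features(profiles, k=3)
correlations = {
    name: pearson([profile[f] for f in selected], [reference[f] for f in selected])
    for name, profile in profiles.items()
}
```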
  • HVGs highly variable genes
  • HVPPs highly variable promoter-adjacent peaks
  • SCEPTRE 65 , a nonparametric tool that resamples perturbations to infer associations with gene expression, with features per cell and counts per cell as covariates.
  • SCEPTRE to other analyses beyond gene expression, such as the ATAC nearest gene (any distance), ATAC TSS (+/-2kb), and RNA transcription factor signatures from msigdb.
  • Gene Ontology (GO) enrichment analyses were performed using clusterProfiler enrichGO (version 4.6.2).
  • For CUT&RUN, we used the CUTANA ChIC/CUT&RUN Kit (EpiCypher 14-1048) with antibodies against H2A.Z (Abcam ab4174), H3K4me3 (EpiCypher 14-1048), and IgG (EpiCypher 14-1048).
  • Coordinates (chromosome, start, end) and peak pileups (height at peak summit) from macs2 outputs were used for further analysis. Scaling factors were calculated based on the percent of uniquely aligned reads from the E. coli spike-in alignment out of the total uniquely aligned reads (human and bacterial). Peak pileups were adjusted by the scaling factor.
  • peaks that were reproducibly present between replicates. To do this, peaks from each biological replicate were intersected. Overlapping peaks with peak heights within 50% of each other (between replicates) were kept for further analysis and termed reproducible peaks. For each reproducible peak, we randomly chose it from either biological replicate to avoid issues with averaging or peak merging, which may alter peak shape.
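  • One way to express the reproducible-peak selection is sketched below. The text does not define "within 50% of each other" precisely; this example assumes it means the absolute height difference is at most 50% of the larger height. The peak record layout and function names are hypothetical.

```python
import random


def overlaps(a, b):
    """True if two peaks on the same chromosome share at least one base (half-open)."""
    return a["chrom"] == b["chrom"] and a["start"] < b["end"] and b["start"] < a["end"]


def reproducible_peaks(rep1, rep2, tolerance=0.5, seed=0):
    """Keep overlapping peaks with similar heights, picking one replicate's peak at random."""
    rng = random.Random(seed)
    kept = []
    for p in rep1:
        for q in rep2:
            similar = abs(p["height"] - q["height"]) <= tolerance * max(p["height"], q["height"])
            if overlaps(p, q) and similar:
                # Choose one replicate's peak rather than averaging or merging.
                kept.append(rng.choice([p, q]))
                break
    return kept


rep1 = [
    {"chrom": "chr1", "start": 100, "end": 200, "height": 10.0},
    {"chrom": "chr1", "start": 500, "end": 600, "height": 10.0},
]
rep2 = [
    {"chrom": "chr1", "start": 150, "end": 250, "height": 8.0},
    {"chrom": "chr1", "start": 550, "end": 650, "height": 2.0},
]
kept = reproducible_peaks(rep1, rep2)
```

Picking one replicate's peak at random, rather than averaging, matches the stated goal of leaving peak shape unaltered.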
  • E. coli-normalized bigwig files were created using deeptools 68 bamCoverage (version 3.4.2) with the options --scaleFactor --extendReads --binSize 10. Heatmaps were generated using deeptools computeMatrix reference-point with the parameters --referencePoint center -a 3000 -b 3000 -p 8 --skipZeros --sortRegions descend --sortUsing mean and the blacklist file ENCODE blacklist v2 for hg38 69 as --blackListFileName to filter out reads aligning to problematic genome regions.
  • CUT&RUN signal was computed using deeptools multiBigwigSummary with transcription start site coordinates and the blacklist as above. For the pileup visualization, only one replicate per biological condition is shown.
  • BT16, BT12, and/or CHLA06 cells with lentiCRISPRi(v2)-Blast were transduced with guide RNAs (gRNAs) in lentiGuideFE-Puro (Addgene 170069).
  • the gRNAs were designed using the Dolcetto CRISPRi library and CRISPick 43 , then synthesized by Integrated DNA Technologies (IDT).
  • the backbone was digested with BsmBI (ThermoFisher FD0454) and oligos were annealed, phosphorylated and ligated into the lentiGuideFE-Puro backbone.
  • Lentivirus was produced as described in Lentivirus production above (scaled to 6-well format) and stored at -80°C. For arrayed validations, sufficient lentivirus was added to the cells to achieve 20 - 50% cell transduction. After 48 hours, cells were replated in media with puromycin (1 µg/ml) and selected for 3 days.
  • SOX2 staining and flow cytometry
  • Cells were plated in 96-well plates with 5,000 cells per well in triplicate. The next day, media was aspirated, and cells were washed and fixed with 4% paraformaldehyde (diluted 1:4 from 16%, Electron Microscopy Sciences 15710-S) for 15 minutes, and washed with PBS. Cells were then permeabilized with 0.2% Tween-20 for 5 minutes and blocked with PBS with 0.2% Tween-20 and 3% BSA for 1 hour.
  • the corresponding secondary antibody was added at a 1:800 dilution (ThermoFisher A-21202 for TUJ1 (mouse), ThermoFisher A-11073 for MAP2 (guinea pig), ThermoFisher 31572 for ATOH8 (rabbit)) with 2mM Hoechst (Sigma B2261) and incubated for 1 hour at room temperature. Cells were then washed with PBS for an additional 3 washes. All steps were performed at room temperature on a rocker unless otherwise noted. Images were acquired with a 20X objective using an epifluorescence microscope (Keyence BZ-X800). Five images were acquired per well.
  • nuclei were stained with 2mM Hoechst 3342 (Sigma 4533) for 15 minutes, washed with PBS, and images were acquired with a 20X objective using an epifluorescence microscope (Keyence BZ-X800). The images were processed for display using FIJI (version 2.1.0) and quantitative image analysis was run in CellProfiler (version 4.2). Cells were quantified based on Hoechst staining and binned into EdU positive and EdU negative cells based on the intensity of the signal, using the ClassifyObjects module.
  • 3e6 nuclei should be resuspended in 156.863 µl for 19.125k cells/µl, of which 8 µl nuclei stock should be used.
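  • The arithmetic behind this dilution can be checked with a short calculation. The helper name below is hypothetical; the numbers are taken from the line above.

```python
def resuspension_volume_ul(n_nuclei, target_nuclei_per_ul):
    """Volume (in microliters) needed to reach a target nuclei concentration."""
    return n_nuclei / target_nuclei_per_ul


target_per_ul = 19125  # 19.125k nuclei per microliter
volume_ul = resuspension_volume_ul(3e6, target_per_ul)
nuclei_in_stock = target_per_ul * 8  # nuclei contained in the 8 microliter stock

print(f"resuspend in {volume_ul:.3f} ul; 8 ul contains {nuclei_in_stock} nuclei")
```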
  • Example 3 Pooled CRISPR screens with joint single-cell chromatin accessibility and transcriptome profiling
  • MultiPerturb-seq: a high-throughput CRISPR screening platform with joint single nucleus chromatin accessibility, transcriptome, and guide RNA capture
  • AT/RT atypical teratoid/rhabdoid tumor
  • SMARCB1 SWI/SNF chromatin remodeling subunit
  • MultiPerturb-seq links pooled CRISPR perturbations with single-cell open chromatin (ATAC-sequencing) and gene expression (RNA-sequencing) profiles at scale (FIG. 1A, FIG. 2).
  • In MultiPerturb-seq, open chromatin provides a broad overview of epigenetic state, capturing many levels of gene regulation, while gene expression provides a robust view of cell state and developmental stage; together, they link CRISPR perturbations with cell states and putative mechanisms of action for transcriptional reprogramming.
  • In MultiPerturb-seq, we combine combinatorial indexing and droplet microfluidics to scale throughput 11-13 (loading 100,000 cells on a single 10X Chromium ATAC lane), which results in significant cost advantages over existing uni- and multimodal single-cell perturbation approaches (FIG. 1B).
  • We achieved robust detection of expressed genes, open chromatin peaks, and gRNAs (FIG. 1H, FIG. 7E - FIG. 7H).
  • For the ATAC, we observed characteristic open chromatin enrichment around transcriptional start sites (FIG. 1I, FIG. 7E) and, for the RNA, we found low mitochondrial reads (FIG. 7F).
  • RNA and ATAC capture compared to other single-cell RNA-seq and single-cell ATAC-seq technologies, including increased unique molecular identifiers (UMIs) and genes per cell (FIG. 7I - FIG. 7L), as well as increased ATAC fragments and peaks per cell (FIG. 7M - FIG. 7P) 6, 11, 15-18 .
  • MultiPerturb-seq outperformed CROP-Multiome along several important dimensions, including better gRNA capture (FIG. 8F) and higher RNA UMIs per cell (FIG. 8G), RNA genes per cell (FIG. 8H), ATAC fragments (FIG. 8I), and ATAC peaks per cell (FIG. 8J).
  • AT/RT central nervous system cancer
  • SMARCB1 an essential subunit of the SWI/SNF chromatin remodeling complex, which is one of the most commonly mutated protein complexes in cancer 20
  • AT/RT is extremely aggressive, and no AT/RT-specific therapies are available:
  • the current standard-of-care is high dose radiation and chemotherapy with autologous stem cell transplant 22 .
  • Despite these intensive (and toxic) therapies, the disease is still nearly uniformly fatal, with a median overall survival of four years 22 .
  • Due to the loss of SMARCB1, AT/RT are dependent on alternate epigenetic regulators, such as polycomb 23-25 , and SMARCB1-null embryonic stem cell models fail to differentiate into neurons due to altered gene regulation 26 . Therefore, using MultiPerturb-seq, we targeted ~100 epigenetic remodelers in human AT/RT cells (BT16) and sought to discover whether knockdown of specific remodelers can ameliorate the dysfunctional epigenome in AT/RT and restore differentiation (FIG. 9A).
  • AT/RT may arise from a variety of lineages, including non-neural lineages 27
  • negative control (non-targeting) perturbations we found a subset of perturbations with transcriptomes that had greater similarity to late brain stages rather than early ones, such as ZNHIT1, CTCF, GATAD2B, and others. These tended to express higher levels of genes correlated with neural differentiation such as CCND3 29 , GPM6B 30 , and SYNJ2 31, 32 (FIG. 10).
  • the chromatin landscape in AT/RT is unusual with broad changes due to loss of SMARCB1, where residual SWI/SNF complexes cannot maintain accessibility to enhancers needed for differentiation 33 .
  • We also examined ENCODE cis-regulatory elements (CREs) 36 and found that a greater number of our perturbations triggered changes in chromatin accessibility at promoters, with fewer perturbations acting at enhancers (FIG. 11B - FIG. 11F). Furthermore, when grouping target genes by complex, we found that knockdown of repressor complex (LSD-CoREST/BHC) subunits (HDAC1, HDAC2, RCOR1) tended to increase accessibility at ENCODE CREs, while knockdown of CERF complex subunits (CECR2, SMARCA1) tended to decrease accessibility (FIG. 11G).
  • RNA and ATAC differentiation scores were not always correlated (FIG. 9F).
  • most perturbations of BAF complex members led to high ATAC differentiation and low RNA differentiation scores, suggesting that loss of residual BAF complexes can reshape/restore the chromatin landscape but that these perturbations are not sufficient to differentiate cells (FIG. 11H).
  • ZNHIT1 is a subunit of the SRCAP (SNF-2 related CBP activator protein) complex, which is an INO80 family complex that mediates ATP-dependent exchange of histone H2A.Z, leading to chromatin remodeling and transcriptional modulation (FIG. 12A). ZNHIT1 has previously been shown to maintain stemness in intestinal stem cells by promoting H2A.Z incorporation 37 .
  • SRCAP SNF-2 related CBP activator protein
  • FIG. 9G, FIG. 12B - FIG. 12E differentially accessible chromatin in ZNHIT1-perturbed cells compared to non-targeting controls.
  • ZNHIT1 inhibition may be a good candidate to push AT/RT cells toward terminal differentiation.
  • the central goal of an AT/RT reprogramming therapy is cessation of cellular proliferation.
  • H2A.Z is encoded by two genes that differ only by three amino acids
  • H2A.Z.1 encoded by H2AZ1
  • H2A.Z.2 encoded by H2AZ2
  • H2AZ2 H2A.Z.2
  • MultiPerturb-seq a multiomic pooled CRISPR screening platform, which captures ATAC, mRNA, and CRISPR perturbations. This method increases throughput more than 10-fold over prior unimodal single-cell perturbation screens and does so at lower cost than other single-cell perturbation methods. Compared to performing separate pooled screens for each modality, MultiPerturb-seq can directly link changes in open chromatin and gene expression, yield multi-modal data without the need for computational integration methods, and provide a better controlled assay with fewer technical and biological confounders.
  • MultiPerturb-seq identified ZNHIT1 as a potential target for AT/RT reprogramming therapy, which we further confirmed by demonstrating that ZNHIT1 knockdown pushes cells toward terminal differentiation.
  • MultiPerturb-seq is already compatible with protein capture on the 10X ATAC kit using DNA-barcoded antibodies 41 , as well as other types of guide RNAs with a spacer near the 5’ end (e.g. CRISPR/Cas9, CRISPRa, prime-editing, base-editing).
  • CRISPR/Cas9 CRISPR/Cas9
  • CRISPRa prime-editing
  • base-editing base-editing
  • Tumor suppressor SMARCB1 suppresses super-enhancers to govern hESC lineage determination. Elife 8, e45672 (2019).
  • Domcke, S. et al. A human cell atlas of fetal chromatin accessibility. Science 370, eaba7612 (2020).
  • Valdes-Mora, F. et al. Acetylation of H2A.Z is a key epigenetic modification associated with gene deregulation and epigenetic remodeling in cancer. Genome Research 22, 307-321 (2012).

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Engineering & Computer Science (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Microbiology (AREA)
  • Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Plant Pathology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Immunology (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Virology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

Methods for high-throughput CRISPR screening combined with single nucleus chromatin accessibility, transcriptome, and guide RNA capture (MultiPerturb-seq) are provided. The methods include obtaining a heterogeneous population of cells having single cells with one or more genetic perturbations having been introduced by a CRISPR guide RNA that targets a gene or genomic region of interest.

Description

METHODS FOR CHROMATIN ACCESSIBILITY AND TRANSCRIPTOME ANALYSIS OF CELLS HAVING GENETIC PERTURBATIONS
STATEMENT OF GOVERNMENT SUPPORT
This invention was made with government support under HG010099, HG012790, CA218668, CA279135, and GM138635 awarded by the National Institutes of Health. The government has certain rights in the invention.
REFERENCE TO AN ELECTRONIC SEQUENCE LISTING
The electronic sequence listing filed herewith named “NYG-LIPP-214.PCT.xml” (3,807 bytes, created on January 21, 2025) is incorporated herein by reference in its entirety.
BACKGROUND
Recent advances in single-cell perturbation screens have enabled scalable profiling of rich cellular states and phenotypes, particularly transcriptional phenotypes. Several groups have developed methods that expand single-cell perturbation screens to capture modalities such as protein, chromatin accessibility, and 3D genome conformation. These single-cell screens have included a diverse array of genetic perturbations, including knockout using Cas9 nuclease, transcriptional modulation using CRISPRi and CRISPRa, targeting of RNA using Cas13, precise variant insertion via HDR or base-editing, and overexpression with open-reading frame (ORF) libraries. However, these methods are often costly and not easily scalable to provide higher-throughput readouts.
A continuing need in the art exists for improved methods for profiling gene expression and chromatin accessibility in single cells having genetic perturbations.
SUMMARY OF THE INVENTION
In one aspect, provided herein is a method for evaluating effects of genetic perturbations on chromatin accessibility and the transcriptome of single cells in a population of cells, the method comprising: (a) obtaining a heterogeneous population of cells having single cells with one or more genetic perturbations having been introduced by a CRISPR guide RNA that targets a gene or genomic region of interest, the single cells comprising one or more CRISPR guide RNAs; (b) obtaining cell nuclei from all or a portion of the single cells of (a) and separating the nuclei into partitions, and incubating the cell nuclei with a tagmentation buffer that comprises a transposome complex, wherein the transposome complex comprises a transposase, a transposon, and a nucleotide sequence comprising a handle sequence and a first barcode, wherein the transposase causes staggered double-stranded breaks in DNA, and wherein the handle sequence and the first barcode are linked to the double-stranded DNA at the staggered breaks; (c) performing reverse transcription on nuclei from (b), which comprises contacting and incubating the nuclei with reverse transcription primers and template switch oligos (TSOs) comprising the handle sequence and the first barcode, or the corresponding reverse-complement sequence thereof, optionally wherein the TSOs comprise a UMI, reverse transcriptase, and dNTPs in a reverse transcription buffer, whereby cellular RNA and CRISPR guide RNAs are reverse transcribed to cDNA comprising the handle sequence and the first barcode, or the corresponding reverse-complement sequence thereof; (d) pooling nuclei from multiple partitions; (e) randomly partitioning one or more nuclei of (d) with a bead, the bead having linked nucleotide sequences comprising a bead-specific barcode and a capture sequence, the capture sequence being complementary to and capable of binding the handle sequence, and disrupting the nuclei and performing PCR amplification wherein the handle sequence binds the bead capture sequence to generate PCR products comprising the bead-specific barcode in combination with each of (i) double-stranded DNA of (b) comprising the first barcode; (ii) CRISPR guide RNA transcribed cDNA of (c) comprising the first barcode; and (iii) cDNA generated from nuclear and/or cellular RNA of (c) comprising the first barcode; (f) sequencing and analyzing the PCR amplification products generated in (e) to associate the effects of a genetic perturbation with the chromatin accessibility and the transcriptome from a single cell, whereby sequences acquired with the same combination of the first barcode and the bead-specific barcode are identified as being from the same cell.
In one aspect, provided herein is a method for evaluating effects of genetic perturbations on chromatin accessibility and the transcriptome of single cells in a population of cells, the method comprising: (a) obtaining a heterogeneous population of cells having single cells with one or more genetic perturbations having been introduced by a CRISPR guide RNA that targets a gene or genomic region of interest, the single cells comprising one or more CRISPR guide RNAs; (b) obtaining cell nuclei from all or a portion of the single cells of (a) and separating the nuclei into partitions, and incubating the cell nuclei with a tagmentation buffer that comprises a transposome complex, wherein the transposome complex comprises a transposase, a transposon, and a nucleotide sequence comprising a handle sequence and a first barcode, wherein the transposase causes staggered double-stranded breaks in DNA, and wherein the handle sequence and the first barcode are linked to the double-stranded DNA at the staggered breaks; (c) performing reverse transcription on nuclei from (b), which comprises contacting and incubating the nuclei with reverse transcription primers and template switch oligos (TSOs) comprising the handle sequence and the first barcode, or the corresponding reverse-complement sequence thereof, optionally wherein the TSOs comprise a UMI, reverse transcriptase, and dNTPs in a reverse transcription buffer, whereby cellular RNA and CRISPR guide RNAs are reverse transcribed to cDNA comprising the handle sequence and the first barcode, or the corresponding reverse-complement sequence thereof; (d) pooling nuclei from multiple partitions; (e) randomly partitioning one or more nuclei of (d), and contacting the nuclei with nucleotide sequences comprising a second barcode and a capture sequence, the capture sequence being complementary to and capable of binding the handle sequence, and disrupting the nuclei and performing PCR amplification wherein the handle sequence binds the bead capture sequence to generate PCR products comprising the second barcode in combination with each of (i) double-stranded DNA of (b) comprising the first barcode; (ii) CRISPR guide RNA transcribed cDNA of (c) comprising the first barcode; and (iii) cDNA generated from nuclear and/or cellular RNA of (c) comprising the first barcode; (f) sequencing and analyzing the PCR amplification products generated in (e) to associate the effects of a genetic perturbation with the chromatin accessibility and the transcriptome from a single cell, whereby sequences acquired with the same combination of the first barcode and the second barcode are identified as being from the same cell.
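The cell-assignment rule in step (f) can be illustrated with a minimal sketch. It assumes that each sequenced read has already been parsed into a record holding its first barcode, its second (or bead-specific) barcode, its modality, and an aligned feature; the field names (`bc1`, `bc2`, `modality`, `feature`), the function name, and the example barcodes are hypothetical and are not taken from the specification.

```python
from collections import defaultdict

def group_reads_by_cell(reads):
    """Group reads that share the same (first barcode, second barcode) pair.

    Each unique pair is treated as one cell; reads are kept separate by
    modality (ATAC, RNA, gRNA) so that all three can be linked per cell.
    """
    cells = defaultdict(lambda: defaultdict(list))
    for read in reads:
        cell_id = (read["bc1"], read["bc2"])  # combinatorial index
        cells[cell_id][read["modality"]].append(read["feature"])
    return {cell_id: dict(mods) for cell_id, mods in cells.items()}

# Hypothetical parsed reads: three share both barcodes, one shares only bc2.
example_reads = [
    {"bc1": "AATG", "bc2": "CCGT", "modality": "ATAC", "feature": "chr1:1000-1200"},
    {"bc1": "AATG", "bc2": "CCGT", "modality": "RNA", "feature": "GAPDH"},
    {"bc1": "AATG", "bc2": "CCGT", "modality": "gRNA", "feature": "ZNHIT1_g1"},
    {"bc1": "TTCA", "bc2": "CCGT", "modality": "RNA", "feature": "SOX2"},
]
grouped = group_reads_by_cell(example_reads)
```

In this sketch, two nuclei that land in the same droplet (same `bc2`) are still resolved as separate cells when they carry different first barcodes, which is what allows several nuclei to be loaded per bead.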
In certain embodiments, the first barcode is unique to a partition and differs from another or all other first barcodes present in additional partitions. In certain embodiments, the one or more genetic perturbations include CRISPR-Cas mediated editing, including CRISPR/Cas9, prime-editing, base-editing, CRISPRa, and/or CRISPRi. In certain embodiments, more than one CRISPR guide RNA targets a gene or genomic region of interest or a different gene or genomic region of interest in a single cell. In certain embodiments, the one or more partitions of (b) are individual wells of a microwell plate, optionally a 96 well plate. In certain embodiments, one or more partitions of (b) contain at least about 1000, about 2000, about 5000, about 25,000, or about 50,000 nuclei per partition. In certain embodiments, the method further comprises washing the nuclei of step (b) prior to step (c) to stop the tagmentation reaction without disrupting the cell nuclei, wherein the washing comprises addition of EDTA. In certain embodiments, step (e) comprises partitioning at least about 2, at least about 5, at least about 10, at least about 15, or at least about 20 nuclei with the bead. In certain embodiments, the PCR products generated in step (e) are separated to obtain a library comprising a combination of PCR products comprising double-stranded DNA of (b), cDNA from transcription of the CRISPR guide RNAs and cDNA generated from cellular RNA, optionally wherein the separation is based on size. In certain embodiments, the PCR products generated in step (e) are separated to obtain a first library comprising double-stranded DNA of (b), and a second library comprising (i) cDNA from transcription of the CRISPR guide RNAs and (ii) cDNA generated from cellular RNA, optionally wherein the separation is based on size. In certain embodiments, the population of cells of (a) have been further treated with a chemical agent or a biological agent.
In certain embodiments, the analysis is limited to cells (nuclei) defined as having at least 200 fragments per cell and/or perturbations wherein at least 100 cells are identified as having the perturbation.
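As an illustration of these analysis thresholds, the following sketch applies both filters in sequence: cells are first required to have at least 200 fragments, and perturbations are then retained only when at least 100 passing cells carry them. The text allows the criteria to be used together or separately ("and/or"), so applying both is an assumption made here, as are the record layout (`perturbation`, `n_fragments`) and the function name.

```python
from collections import Counter

def filter_cells(cells, min_fragments=200, min_cells_per_perturbation=100):
    """Keep cells passing the fragment threshold, then keep only cells whose
    perturbation is represented by enough passing cells."""
    kept = [c for c in cells if c["n_fragments"] >= min_fragments]
    counts = Counter(c["perturbation"] for c in kept)
    return [c for c in kept
            if counts[c["perturbation"]] >= min_cells_per_perturbation]

# Hypothetical input: 120 well-covered ZNHIT1 cells, 10 low-coverage ZNHIT1
# cells, and 50 well-covered CTCF cells (too few to pass the second filter).
cells = (
    [{"perturbation": "ZNHIT1", "n_fragments": 250}] * 120
    + [{"perturbation": "ZNHIT1", "n_fragments": 50}] * 10
    + [{"perturbation": "CTCF", "n_fragments": 300}] * 50
)
filtered = filter_cells(cells)
```

Counting cells per perturbation only after the fragment filter means that low-coverage cells cannot help a perturbation reach the 100-cell minimum.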
Other aspects and advantages of the invention will be readily apparent from the following detailed description of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1A - FIG. 1L show that MultiPerturb-seq combines single-cell RNA-sequencing and single-cell ATAC-sequencing with pooled CRISPR perturbations for high-throughput functional genomics. (FIG. 1A) MultiPerturb-seq combines combinatorial indexing with droplet microfluidics for trimodal capture. (FIG. 1B) Cost comparison for various single-cell CRISPR pooled screen methods. (FIG. 1C) Capillary electrophoresis of ATAC, RNA, and CRISPR inhibition (CRISPRi) guide RNA (gRNA) libraries from MultiPerturb-seq. All three libraries show expected patterns (ATAC: Nucleosome bands; Tagmented RNA: Range of fragments centered around 400 bp; CRISPR gRNA: Distinct amplicon band at ~200 bp). (FIG. 1D - FIG. 1F) Single-cell collision rate quantification for ATAC fragments (FIG. 1D, 11.6%), RNA transcripts (FIG. 1E, 6.2%), and CRISPR gRNAs (FIG. 1F, 6.6%) aligning to the human and mouse genomes. ATAC and RNA plots are downsampled for visualization. (FIG. 1G) Uniform Manifold Approximation and Projection (UMAP) on RNA (transcript) data colored by species. Mouse 3T3 fibroblasts (transduced with the mouse non-targeting gRNA library) constituted 20% of all cells prior to nuclei isolation. (FIG. 1H) Open chromatin peaks (ATAC), transcripts (RNA) and gRNAs (CRISPR) detected for BT16 (human) cells and 3T3 (mouse) cells. (FIG. 1I) Distance of ATAC peaks from transcription start sites (TSS). (FIG. 1J) Proportion of single cells with 1, 2, or more than 2 gRNAs detected. (FIG. 1K) Comparison between cells with histone methyltransferase perturbations (Histone MTs) and cells with non-targeting (NT) control perturbations for gene expression and open chromatin at the RFX3 locus. (FIG. 1L) Comparison between cells with perturbations targeting H3F3A and cells with non-targeting (NT) control perturbations for gene expression and open chromatin at the PPM1B locus. For FIG. 1K and FIG. 1L, reads are normalized to cell number, tracks are binned in 500 bp bins for visualization and scale bars denote 25 kb.
FIG. 2A shows MultiPerturb-seq ATAC, mRNA and CRISPR guide RNA library amplicons. Molecular species at each step of the MultiPerturb-seq protocol from tagmentation and reverse transcription through library preparation. All three molecular species (ATAC, mRNA, and gRNA) are captured on the 10X ATAC Kit, and amplified with a series of custom primers and PCRs. Color key: Light blue: Illumina P5; Dark yellow: Barcode 2 (10X ATAC GEM barcode); Purple/Grey: 10X ATAC GEM capture sequence/Nextera Read 1; Light grey: variable ATAC, RNA, or gRNA region; Grey/Pink: Nextera Read 2; Light yellow: Barcode 1 (MEDS or TSO barcode); Light blue: primer binding region; Orange: Illumina P7; Brown: UMI; Blue-grey: gRNA scaffold; Navy blue: RNA handle.
FIG. 3A - FIG. 3D show CRISPR library design and quality control. (FIG. 3A) Classification of targets in the AT/RT CRISPRi library by epigenetic and transcriptional functions. Filled boxes indicate that the target gene has the indicated molecular function 71. (FIG. 3B) Representation of guide RNAs in the plasmid library. Bias/uniformity was calculated as the ratio of counts at the 90th percentile/10th percentile. (FIG. 3C) Viral titration of the library virus. (FIG. 3D) Guide RNA representation in the screen based on cell number (fragments/cell threshold set at 100 fragments/cell).
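The bias/uniformity metric described for FIG. 3B (the ratio of guide RNA counts at the 90th percentile to counts at the 10th percentile) can be sketched as follows. The function name is hypothetical, and the percentile interpolation used here (the "inclusive" method of Python's `statistics.quantiles`) is an assumption, since the specification does not state which percentile definition was used.

```python
import statistics

def library_skew_ratio(guide_counts):
    """Return the 90th/10th percentile ratio of per-guide read counts.

    A ratio of 1.0 means the 10th and 90th percentile guides are equally
    represented; larger values indicate a more skewed library.
    """
    # n=10 yields nine cut points: index 0 is the 10th percentile,
    # index 8 is the 90th percentile.
    deciles = statistics.quantiles(guide_counts, n=10, method="inclusive")
    p10, p90 = deciles[0], deciles[8]
    if p10 == 0:
        raise ValueError("10th percentile count is zero; ratio is undefined")
    return p90 / p10

# Hypothetical counts: a perfectly even library and a graded one.
even_ratio = library_skew_ratio([50] * 20)
graded_ratio = library_skew_ratio(list(range(1, 102)))
```

Raising an error when the 10th percentile is zero is a design choice in this sketch; it flags libraries with many dropout guides instead of returning an infinite ratio.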
FIG. 4 shows a MultiPerturb-seq workflow. Nuclei are isolated, undergo tagmentation with barcoded MEDS, and then reverse transcription with matching barcoded primers. All molecular species undergo second-round barcoding via droplet microfluidics (10X ATAC kit), then ATAC and RNA fractions are separated and undergo library preparation via custom PCRs. ATAC fragments are amplified directly. The mRNA is first tagmented and then amplified for short-read sequencing. The gRNA is additionally enriched via biotin pulldown prior to amplification.
FIG. 5A - FIG. 5J show optimization of MultiPerturb-seq conditions. Optimization across the three modalities (ATAC, RNA, gRNA). (FIG. 5A) Optimization of ATAC libraries varying the Tn amount, tagmentation buffer and PCR annealing temperature (Ta): The amount of Tn protein is indicated in µl (50 µl reactions) in Omni Lysis Buffer (10 mM Tris-HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% NP-40 (ThermoFisher 85124), 0.1% Tween-20 (Sigma P1379), 0.01% digitonin (Promega G9441)) 48 or the lysis buffer from the original ATAC protocol (10 mM Tris-HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl2, + 1% Tween-20 (Sigma P1379)) 46. The annealing temperatures (Ta) for PCR amplification (57-67°C) are as shown. (FIG. 5B) Optimization of ATAC libraries varying the length of nuclear lysis time (minutes) and cell number (in thousands). (FIG. 5C) Optimization of RNA capture by amplification of a specific cDNA transcript (GAPDH, using intron-spanning primers) with different input cell numbers and template-switch oligonucleotides (TSOs). Reverse transcription and PCR were performed using either RNA extracted using TRIzol (ThermoFisher 15596026) (for inputs in ng) or nuclei (for inputs in number of cells). (FIG. 5D) Comparison of 2X KAPA HiFi Master Mix (Roche 07958935001) and PfuX7 DNA polymerase 51 for cDNA amplification. (FIG. 5E) Optimization of gRNA capture with post-reverse transcription (2.0X) and post-PCR (2.5X) SPRI cleanups. (FIG. 5F) Optimization of gRNA capture with post-reverse transcription SPRI (2.0X) or ExoSAP-IT cleanup with the indicated amount of RT primer (µl in a 40 µl RT reaction). (FIG. 5G) Optimization of joint ATAC and RNA capture (mRNA and gRNA) using different TSOs as in FIG. 5C.
A: TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAATGTCCGGGrGrGrG (SEQ ID NO: 1);
B: TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAATGTCCGGGrGrGrG/3SpC3/ (SEQ ID NO: 2);
C: TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCAACCGATNNNNNNNNNNrGrGrG (SEQ ID NO: 3).
(FIG. 5H) Optimization of joint ATAC and RNA capture (mRNA and gRNA) for different gRNA biotin pulldown strategies. Biotinylated primers were used during ISPCR (Biotin ISPCR) and/or the intermediate gRNA PCR (Biotin Int). (FIG. 5I) Optimization of joint ATAC and RNA capture (polyA and gRNA) for different ATAC incubation times at 37°C (given in minutes and denoted as 37 time), inclusion of EDTA and a PBS wash after the tagmentation (denoted as EDTA/wash), and for different RT incubation times at 53°C (given in minutes and denoted as 53 time). (FIG. 5J) Optimization of ATAC for different tagmentation volumes (ATAC vol), different EDTA stop solution volumes (EDTA), and inclusion of a PBS wash after tagmentation and EDTA stop solution addition (wash).
FIG. 6A - FIG. 6B show an overview of MultiPerturb-seq sequencing, alignment and read mapping. (FIG. 6A) Overview of data processing for MultiPerturb-seq. (FIG. 6B) The intersection of cell barcodes between modalities (ATAC, RNA, and CRISPR). All aligned cell barcodes were considered in this analysis. 429,139 cell barcodes (with reads in all three modalities) were then filtered based on read number and other metrics.
FIG. 7A - FIG. 7P show ATAC, RNA, and gRNA quality metrics. (FIG. 7A - FIG. 7C) Single-cell collision rate quantification for ATAC fragments (FIG. 7A), RNA transcripts (FIG. 7B), and guide RNA transcripts (FIG. 7C) aligning to the human and mouse genomes. For visualization, outliers beyond the 99th percentiles are omitted. (FIG. 7D) Percent reads per cell mapping to incorrect species in MultiPerturb-seq and scifi-RNA-seq 11. Cells were annotated as human or mouse based on the dominant species. MultiPerturb-seq had an 80:20 human:mouse mix and scifi-RNA-seq had a 50:50 human:mouse mix. (FIG. 7E) Proportion of fragments within 3 kb of transcription start sites (TSSs) in MultiPerturb-seq and CRISPR-sciATAC 6. (FIG. 7F) Proportion of mitochondrial reads (RNA) in cells. (FIG. 7G) Fraction of all cells with gRNAs assigned compared to other single-cell perturbation methods. (FIG. 7H) Pseudobulk expression of CRISPRi target genes relative to non-targeting controls by perturbation. For visualization, CRISPRi target genes detected in cells with a non-targeting control perturbation, but not in cells that received a gene-targeting perturbation, were assigned a log2(FC) of -10. All CRISPRi target genes that were detected in cells with a gene-targeting perturbation were detected in cells with a non-targeting perturbation. (FIG. 7I - FIG. 7P) Comparison of RNA and ATAC metrics to other single-cell methods including (FIG. 7I) RNA unique molecular identifiers (UMIs) per cell 15, 18, (FIG. 7J) unique genes per cell 15, 18, (FIG. 7K) UMIs per cell 11, (FIG. 7L) unique genes per cell 11, (FIG. 7M) unique ATAC fragments per cell 15, (FIG. 7N) unique ATAC peaks per cell 15, (FIG. 7O) unique ATAC fragments (mapped to peaks) per cell 6, (FIG. 7P) ATAC peaks per cell 6. Cells were uniformly defined as having either 1,000 RNA or 1,000 ATAC reads to facilitate comparisons.
FIG. 8A - FIG. 8J show comparison of MultiPerturb-seq and the 10X Multiome kit with CROP-seq (CROP-Multiome). (FIG. 8A) The epigenomic remodelers library was recloned into the specialized guide RNA (gRNA) plasmid, CROP-seq 19, to perform a multiomic CRISPR screen on the 10X Multiome kit in BT16 cells. (FIG. 8B - FIG. 8E) Comparison of differentially expressed genes (compared to non-targeting gRNA control) between MultiPerturb-seq and CROP-Multiome strategies for cells receiving (FIG. 8B) SETD5, (FIG. 8C) BRD7, (FIG. 8D) ACTL6B, or (FIG. 8E) SSRP1-targeting gRNAs. Differential expression was computed using SCEPTRE 65. (FIG. 8F) Fraction of high-quality cells (at least 1000 unique RNA reads per cell) with successful gRNA capture/assignment in MultiPerturb-seq and CROP-Multiome. (FIG. 8G) RNA unique reads per cell, (FIG. 8H) RNA genes per cell, (FIG. 8I) unique ATAC fragments (mapped to peaks) per cell, and (FIG. 8J) unique ATAC peaks per cell. Cells were uniformly defined as having either 1,000 RNA or 1,000 ATAC reads across comparisons of technologies. In addition, for CROP-Multiome, we did not require gRNA capture due to the low capture rate.
FIG. 9A - FIG. 9G show that MultiPerturb-seq identifies genetic perturbations that trigger differentiation in atypical teratoid/rhabdoid tumor (AT/RT). (FIG. 9A) Overview of differentiation challenge in AT/RT brain tumors and design of pooled CRISPR library to identify chromatin remodelers for cancer reprogramming therapy. (FIG. 9B) Correlation between gene-perturbed human AT/RT cells and gene expression over developmental stages from 4 weeks post-conception (wpc) to senior adulthood 28. The Pearson correlation is computed on the top 1000 highly variable genes and values are normalized such that cells receiving a non-targeting perturbation display as zero on the colorscale. (FIG. 9C) Correlation between gene-perturbed human AT/RT cells and open chromatin peaks in developmental 34 and adult 35 brain atlases (left) and sum of fold-changes (log2) at peaks overlapping ENCODE regulatory elements 36 (right). The Pearson correlation is computed on the top 1000 highly variable promoter-adjacent peaks and values are normalized such that cells receiving a non-targeting perturbation display as zero on the colorscale. PLS: promoter-like sequence, pELS: proximal enhancer-like sequence, dELS: distal enhancer-like sequence, DNase-H3K4me3: poised elements 36. (FIG. 9D - FIG. 9E) Ranked CRISPRi gene perturbations by RNA differentiation score (FIG. 9D) and ATAC differentiation score (FIG. 9E). Higher values indicate greater similarity to postnatal primary brain tissues. (FIG. 9F) RNA and ATAC differentiation scores for all CRISPRi gene perturbations. (FIG. 9G) Normalized difference in correlations of gene expression between ZNHIT1-perturbed cells and cells receiving NT (negative control) perturbations. For each cell population (ZNHIT1, NT), we computed the Pearson correlation of gene expression with human brain developmental expression (n = 53 primary cerebrum samples at the indicated developmental timepoints). Line denotes LOESS fit and shaded region indicates the 95% confidence interval.
FIG. 10 shows expression of marker genes in gene-perturbed AT/RT cells with the highest and lowest RNA differentiation scores (top) and in primary cerebrum tissues 28 (bottom, n = 53 samples from 4 weeks post conception [wpc] to adulthood with 1-4 donors per developmental stage). Marker genes of neural differentiation include markers of neurons (CCND3, GABBR1, GPM6B), astrocytes (SYNJ2), inhibitors of mesenchymal lineages (ITM2B), and genes with previously defined roles in cancer differentiation therapy (ARHGEF37, SYNJ2 31, 32). Marker genes of stemness include markers of embryonic stem cells (TRIM24 15, SMARCAD1 16), neural stem cells (EPHB4 17), neural progenitors (CACHD1, TACC3 79), mesenchymal lineages (ARID3B), and cancer stem markers (TRIM24, ARID3B 2).
FIG. 11A - FIG. 11H show changes in open chromatin at ENCODE regulatory elements and comparison with open chromatin from healthy brain. (FIG. 11A) Correlation between gene-perturbed human AT/RT cells and open chromatin peaks in developmental 34 and adult 35 brain atlases. The Pearson correlation is computed on the top 1000 highly variable promoter-adjacent peaks. (FIG. 11B - FIG. 11E) Rank of different CRISPRi perturbations in the MultiPerturb-seq screen by changes in open chromatin at genomic loci overlapping ENCODE 36 (FIG. 11B) promoters (PLS), (FIG. 11C) proximal (pELS) and (FIG. 11D) distal enhancers (dELS), as well as (FIG. 11E) poised elements (DNase-H3K4me3). Each plot shows the sum of fold changes (log2) at peaks overlapping with ENCODE regulatory elements 36 in gene-perturbed cells compared to cells with a non-targeting (NT) gRNA. (FIG. 11F) Cumulative density of fold-changes in open chromatin overlapping specific ENCODE regulatory elements in gene-perturbed cells compared to non-targeting cells. The density is computed over different genetic perturbations and significance was calculated using a two-sided Kolmogorov-Smirnov test with FDR correction: n.s., not significant, **, p < 0.01 and ****, p < 0.0001. (FIG. 11G) Sum of fold-changes (log2) at peaks overlapping ENCODE regulatory elements 36 (right) with perturbed genes grouped by protein complex. Complex types are also classified by groupings from the EpiFactors database 71. (FIG. 11H) RNA and ATAC differentiation scores for each chromatin modifier complex.
FIG. 12A - FIG. 12I show chromatin accessibility, gene expression and differentiation assays after ZNHIT1 knockdown. (FIG. 12A) ZNHIT1 functions within the SRCAP complex to deposit histone variant H2A.Z. YEATS4 and KAT5 also function in H2A.Z deposition and/or acetylation. (FIG. 12B - FIG. 12C) Gene Ontology (GO) Biological Processes analyses for ZNHIT1-perturbed cells from MultiPerturb-seq for (FIG. 12B) the closest genes to the 10,000 most significant differential ATAC peaks (compared to cells with a non-targeting [NT] gRNA) and (FIG. 12C) upregulated genes (compared to cells with a NT gRNA). (FIG. 12D) Flow cytometry gating for SOX2 analysis. (FIG. 12E) Quantification of SOX2-positive cells (n = 3 biological replicates with 3 guide RNAs). (FIG. 12F) Representative images of EdU labeling for cell cycle analysis. Scale bar: 5 µm. (FIG. 12G) Quantification of EdU incorporation in YEATS4- and KAT5-perturbed cells compared to non-targeting (NT) control perturbations (n = 3 biological replicates with 3 gRNAs). One-way ANOVA with Tukey’s post-hoc test: **, p < 10^-2. (FIG. 12H) Representative immunofluorescence images of MAP2 expression in BT12 AT/RT cells with a NT or ZNHIT1-targeting gRNA. Scale bar: 50 µm. (FIG. 12I) Quantification of MAP2 immunofluorescence (n = 3 biological replicate gRNAs per condition and 3 technical replicates per gRNA). Open circles represent the median for each gRNA. n = 5 images for n = 3 technical replicates per 3 biological replicates with 3 guide RNAs. Mann-Whitney U test: ****, p < 10^-4.
FIG. 13A - FIG. 13P show that ZNHIT1 loss drives AT/RT cell cycle arrest and differentiation via decreased H2A.Z deposition. (FIG. 13A) CRISPRi validation in AT/RT cells to assess stemness, proliferation and differentiation after ZNHIT1 loss. (FIG. 13B) SOX2 expression in cells receiving ZNHIT1, SOX2 or non-targeting (negative control, NT) guide RNAs (gRNAs). (FIG. 13C) Proportion of S-phase genes 38 as a fraction of expression of all cell-cycle genes (n = 262 ZNHIT1-perturbed cells and 4,808 NT cells with at least 100 RNA UMI counts). Error bars indicate the 95% confidence interval (bootstrap resampling). (FIG. 13D) EdU incorporation in cells with ZNHIT1-targeting gRNAs compared to NT gRNAs (n = 3 biological replicates). Treatment with the topoisomerase II inhibitor doxorubicin (Doxo) serves as a positive control for cell cycle arrest. Significance was determined via a one-way ANOVA with Tukey’s post-hoc test. (FIG. 13E) ATOH8 transcription factor signature in MultiPerturb-seq. Transcription factor signatures were calculated by aggregating counts of ATOH8 target genes (n = 262 ZNHIT1-perturbed cells and 4,808 NT cells with at least 100 RNA UMI counts). (FIG. 13F - FIG. 13H) Expression and quantification of (FIG. 13F) ATOH8, (FIG. 13G) TUJ1, and (FIG. 13H) MAP2 in BT16 cells with ZNHIT1-targeting or NT gRNAs. (FIG. 13I) CUT&RUN of H2A.Z, H3K4me3, and IgG (negative control) in BT16 cells with ZNHIT1-targeting or NT gRNAs. (FIG. 13J) H2A.Z CUT&RUN peaks in BT16 cells with ZNHIT1-targeting or NT gRNAs (n = 2 biological replicates). (FIG. 13K) Change in H2A.Z peak height for shared peaks (present in both cells with ZNHIT1-targeting or NT gRNAs). For visualization, 5,000 sampled peak pairs are shown individually and outliers beyond the 99th percentile are omitted. Significance was determined with a paired t-test. (FIG. 13L) Enriched Gene Ontology Biological Processes for nearest genes to the top 10,000 H2A.Z-bound peaks with the largest decreases in ZNHIT1-perturbed cells. (FIG. 13M) Cell cycle analysis in CHLA06 AT/RT cells transduced with ZNHIT1-, H2AZ1-, or H2AZ2-targeting (or NT) gRNAs (n = 2 - 3 guide RNAs per perturbed gene). (FIG. 13N) Quantification of S-phase cells from FIG. 13M and significance determined via χ2-test. (FIG. 13O) Representative immunofluorescence images of MAP2 expression in BT16, BT12, and CHLA06 AT/RT cells with H2AZ1- or H2AZ2-targeting (or NT) gRNAs. (FIG. 13P) Quantification of MAP2 expression in BT16, BT12, and CHLA06 AT/RT cells with H2AZ1- or H2AZ2-targeting (or NT) gRNAs (n = 3 biological replicate gRNAs per condition and 3 technical replicates per gRNA). Open circles represent the median for each gRNA. For all panels, significance levels: **, p < 10^-2 and ****, p < 10^-4. Unless specified otherwise, significance was determined via a two-sided Mann-Whitney U test.
FIG. 14A - FIG. 14C show H2A.Z and H3K4me3 CUT&RUN after ZNHIT1 loss. (FIG. 14A) Correlation between replicates of H2A.Z CUT&RUN for BT16 AT/RT cells receiving a non-targeting (NT) or ZNHIT1-targeting guide RNA (gRNA). For visualization, outliers beyond the 99th percentile are omitted. (FIG. 14B) H3K4me3 CUT&RUN peaks in BT16 cells with ZNHIT1-targeting or NT gRNAs (n = 2 biological replicates). (FIG. 14C) Binding of H3K4me3 and IgG (negative control) near transcription start sites in BT16 cells with ZNHIT1-targeting or NT gRNAs.
FIG. 15A - FIG. 15D show that H2A.Z loss hinders cell cycle progression. (FIG. 15A) Gating strategy for cell cycle analysis with propidium iodide (PI) in CHLA06 AT/RT cells (Sony SH800). (FIG. 15B) Gating strategy for cell cycle analysis with PI in BT12 AT/RT cells (MACSQuant 10). (FIG. 15C) Cell cycle analysis in BT12 AT/RT cells transduced with ZNHIT1-, H2AZ1-, or H2AZ2-targeting (or NT) gRNAs (n = 2 - 3 guide RNAs per target). (FIG. 15D) Quantification of S-phase cells from FIG. 15C. Significance was determined using a χ2-test.
DETAILED DESCRIPTION OF THE INVENTION
A scalable in vitro method is provided for analyzing chromatin accessibility and screening RNA of single cells having genetic perturbations in a heterogeneous population (e.g., a library of cells). The technology (termed “MultiPerturb-seq” for Multiome Perturb-seq) is useful in the research and development of new therapies by allowing interrogation of single-cell transcriptome and chromatin accessibility profiles at scale. CRISPR perturbations may be used to precisely target known or novel pharmacologic or gene therapy targets.
Analysis of the cell transcriptome provides a view of cell state, while chromatin accessibility profiling adds further information about cell state and about putative mechanism of action. Linking these through a pooled screen with combinatorial indexing allows hundreds to thousands of targets to be screened in a single experiment, allowing for iterative and rapid hypothesis generation and discovery. The method allows for analyses to be performed in a scalable and efficient manner that provides significant cost savings in comparison to various single-cell CRISPR pooled screening methods.
Unless defined otherwise in this specification, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs and by reference to published texts, which provide one skilled in the art with a general guide to many of the terms used in the present application.
As used throughout this specification and the claims, the terms “comprising”, “containing”, “including”, and their variants are inclusive of other components, elements, integers, steps and the like. Conversely, the term “consisting” and its variants are exclusive of other components, elements, integers, steps and the like.
It is to be noted that the term “a” or “an” refers to one or more; for example, “a perturbation” is understood to represent one or more perturbations. As such, the terms “a” (or “an”), “one or more,” and “at least one” are used interchangeably herein.
As used herein, the term “about” means a variability of plus or minus 10% from the reference given, unless otherwise specified.
As used herein, the phrase “consisting essentially of” limits the scope of a described composition or method to the specified materials or steps and those that do not materially affect the basic and novel characteristics of the described or claimed method or composition.
Wherever in this specification, a method or composition is described as “comprising” certain steps or features, it is also meant to encompass the same method or composition consisting essentially of those steps or features and consisting of those steps or features.
In certain embodiments, provided herein is a method for evaluating the effects of genetic perturbations on chromatin accessibility and the transcriptome of single cells in a population of cells. In certain embodiments, the method (MultiPerturb-seq) utilizes methods of introducing CRISPR perturbations in combination with Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) and RNA sequencing methodologies.
The methods herein relate to perturbation and assessment of nucleic acids. A “nucleic acid”, “nucleic acid sequence”, or “nucleotide sequence” as described herein, can be RNA, DNA, or a modification thereof, and can be single or double stranded, and can be selected, for example, from a group including: nucleic acid encoding a protein of interest, oligonucleotides, nucleic acid analogues, for example peptide-nucleic acid (PNA), pseudocomplementary PNA (pc-PNA), locked nucleic acid (LNA) etc. Such nucleic acid sequences include, for example, but are not limited to, nucleic acid sequences encoding proteins, for example those that act as transcriptional repressors, antisense molecules, ribozymes, and small inhibitory nucleic acid sequences, for example but not limited to RNA interference (RNAi), short hairpin RNAi (shRNAi), small interfering RNA (siRNA), micro RNAi (mRNAi), antisense oligonucleotides etc.
In certain embodiments, the terms “nucleotide”, “nucleic acid”, “nucleotide residue”, and “nucleic acid residue” are used interchangeably, referring to a nucleotide in a nucleic acid polymer. In a further embodiment, consecutive nucleotide residues refer to nucleotide residues in a contiguous region of a nucleic acid polymer.
Ribonucleic acid (RNA) is a polymeric molecule essential in various biological roles in coding, decoding, regulation and expression of genes. As used herein, RNA may refer to a CRISPR guide RNA, a messenger RNA (mRNA), a long non-coding RNA (lncRNA), a mitochondrial RNA, a microRNA (miRNA), non-coding RNAs, transfer RNA, ribosomal RNA, short hairpin RNAi (shRNAi), or small interfering RNA (siRNA).
As used herein, deoxyribonucleic acid (DNA) is a polymeric molecule formed of deoxyribonucleotides, including, but not limited to, genomic DNA, double-strand DNA, single-strand DNA, DNA packaged with a histone protein, complementary DNA (cDNA, which is reverse-transcribed from an RNA), mitochondrial DNA, and chromosomal DNA.
Nucleic acid sequences described herein can be cloned using routine molecular biology techniques, or generated de novo by DNA synthesis, which can be performed using routine procedures by service companies having business in the field of DNA synthesis and/or molecular cloning (e.g. GeneArt, GenScript, Life Technologies, Eurofins). For example, nucleic acid sequences encoding aspects of a CRISPR-Cas editing system can be assembled and placed into any suitable genetic element, e.g., naked DNA, phage, transposon, cosmid, episome, etc., which transfers the sequences carried thereon to a host cell, e.g., for generating non-viral delivery systems (e.g., RNA-based systems, naked DNA, or the like), or for generating viral vectors in a packaging host cell, and/or for delivery to host cells in a subject. The methods used to make such engineered constructs are known to those with skill in nucleic acid manipulation and include genetic engineering, recombinant engineering, and synthetic techniques. See, e.g., Green and Sambrook, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press, Cold Spring Harbor, NY (2012).
dNTP stands for deoxyribonucleotide triphosphate. Each dNTP is made up of a phosphate group, a deoxyribose sugar and a nitrogenous base. There are four different dNTPs, which can be split into two groups: the purines (including dATP, deoxyadenosine 5'-triphosphate, and dGTP, deoxyguanosine 5'-triphosphate) and the pyrimidines (including dTTP, deoxythymidine 5'-triphosphate, and dCTP, deoxycytidine 5'-triphosphate). As used herein, dNTP Mix (also referred to as dNTPs herein) is a mixture (normally in a solution containing sodium salts) of dATP, dCTP, dGTP and dTTP, suitable for use in polymerase chain reaction (PCR), sequencing, fill-in reactions, nick translation, cDNA synthesis, and TdT-tailing reactions.
As used herein, “complementary DNA” or “cDNA” can refer to a synthetic DNA reverse transcribed from RNA through the action of a reverse transcriptase. The cDNA may be single-stranded or double-stranded and can include strands that have either or both of a sequence that is substantially identical to a part of the RNA sequence or a complement to a part of the RNA sequence.
As used herein, the term “perturbation” refers to the effects on one or more target genes or genomic regions of interest, including modification in the expression of gene products (including proteins) or a target sequence. Perturbations include mutations or modifications such as, e.g., small nucleotide insertions or deletions (indels) or a larger deletion, insertion, or inversion. In certain embodiments, the introduction of a mutation or modification is referred to as “editing” or “gene editing”. Perturbations include transcriptional silencing or repression (e.g. CRISPRi) or activation (CRISPRa) of target genes or genomic regions of interest.
The methods described herein are amenable to a variety of different cell types (or nuclei therefrom). In certain embodiments, the cells include eukaryotic cells such as plant cells, animal cells, fungal cells, protozoan cells, or algae cells. In one embodiment, the cells are mammalian cells. In a further embodiment, the cells are stem cells (for example, an embryonic stem cell), cancer cells, neuronal cells, epithelial cells, immune cells (e.g., lymphocytes), endocrine cells, germ cells, somatic cells, kidney cells, liver cells, pancreatic cells, skin cells, fat cells, bone cells, or muscle cells. In certain embodiments, the cells are a cell line, for example an HEK293 cell, an NIH-3T3 cell, or a K562 cell.
As used herein, the term “oligo” (i.e., oligonucleotide) refers to short DNA or RNA molecules. In one embodiment, an oligo can be at least about 1 to 500 monomeric components, e.g., nucleotides, in length. In a further embodiment, an oligo can be about 20 to about 80 nucleotides in length. Thus, in various embodiments, an oligo is formed of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100 nucleotides. In certain embodiments, an oligo refers to a template switch oligonucleotide (TSO).
Some embodiments include the use of primers. As used herein, a “primer” can refer to a short polynucleotide, generally with a free 3'-OH group, that binds to a target or template polynucleotide present in a sample by hybridizing with the target or template, and thereafter promoting extension of the primer to form a polynucleotide complementary to the target or template. Primers can include polynucleotides ranging from 5 to 1000 or more nucleotides. In some embodiments, the primer has a length of at least 4 nucleotides, 5 nucleotides, 10 nucleotides, 15 nucleotides, 20 nucleotides, 25 nucleotides, 30 nucleotides, 35 nucleotides, 40 nucleotides, 45 nucleotides, 50 nucleotides, 60 nucleotides, 70 nucleotides, 80 nucleotides, 90 nucleotides, 100 nucleotides, or a length within a range of any two of the foregoing lengths.
As used herein, a barcode describes a defined polymer, e.g., a polynucleotide, which when it is a functional element of the polymer construct, is specific for a compartment, a single cell, or cell nucleus or cellular components (for example, DNA, RNA and/or mitochondria and ribosomes) thereof. In one embodiment, the barcode is about 2 to 4 monomeric components, e.g., nucleotide bases, in length. In other embodiments, the barcode is at least about 1 to 100 monomeric components, e.g., nucleotides, in length. Thus, in various embodiments, the barcode is formed of a sequence of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or up to 100 monomeric components, e.g., nucleic acids. A barcode can be an artificial sequence or a naturally occurring sequence. In certain embodiments, each barcode within a population of barcodes is different. In other embodiments, a portion of barcodes in a population of barcodes is different, e.g., at least about 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, or 99% of the barcodes in a population of barcodes are different. A population of barcodes may be randomly generated or non-randomly generated. In certain embodiments, a population of barcodes consists of error-correcting barcodes. Barcodes can be used to computationally deconvolute the multiplexed sequencing data and identify sequence reads derived from an individual cell, compartment, etc. A barcode can also be used for deconvolution of a collection of cells or cell nuclei or cellular components thereof that have been distributed into small compartments for enhanced mapping.
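As a non-limiting illustration of how error-correcting barcodes can support computational deconvolution, the following Python sketch assigns an observed barcode read to a known barcode when exactly one known barcode lies within a small Hamming distance. The function names, the 8-nt barcodes, and the one-mismatch tolerance are hypothetical choices made for this example and are not taken from the described methods.

```python
from typing import List, Optional


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(observed: str, whitelist: List[str],
                   max_mismatch: int = 1) -> Optional[str]:
    """Return the single whitelist barcode within max_mismatch of the
    observed sequence, or None if there is no match or the match is ambiguous."""
    hits = [bc for bc in whitelist if hamming(observed, bc) <= max_mismatch]
    return hits[0] if len(hits) == 1 else None


# Hypothetical 8-nt barcodes whose pairwise Hamming distance is at least 4,
# so a single sequencing error cannot convert one barcode into another.
whitelist = ["AAAAAAAA", "CCCCAAAA", "GGGGCCCC"]

corrected = assign_barcode("AAAAAAAT", whitelist)   # one mismatch, recoverable
unassigned = assign_barcode("ACGTACGT", whitelist)  # too far from every barcode
```

Reads that cannot be assigned unambiguously are returned as None in this sketch, so that they can be discarded rather than attributed to the wrong cell or compartment.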
In certain embodiments, the terms “barcode” and “barcoded” also refer to a process of introducing a barcode to a DNA or RNA. Examples of introducing a barcode are illustrated in FIG. 1A. In certain embodiments, a barcode may be located at the 3’ end of a reverse transcription (RT) primer, such as an RT primer comprising an oligo d(T)n (also termed an RT oligo, referring to a polyT oligo) at the 5’ end and a barcode at the 3’ end. In certain embodiments, a barcode may be located at the 3’ end of a PCR primer. Such a primer may be used in amplifying tagmented DNA or guide RNA via a PCR reaction.
In certain embodiments, a nucleic acid (such as DNA or RNA) is barcoded using a “unique molecular identifier” (UMI), also called equivalently a “random molecular tag” (RMT), which is a random sequence of monomeric components of a polymer as described above, e.g., nucleotide bases, that is specific for that polymer. The UMI permits identification of amplification duplicates of the polymer with which it is associated. In the description of the methods and compositions herein, one or more UMI may be associated with a single polymer. The UMI may be positioned 5’ or 3’ to the barcode in the composition. In another embodiment, the UMI may be inserted into the polymer as part of the described methods. In one embodiment of the methods described herein, a UMI is added during the method, for example, during reverse transcription. Each UMI for each polymer, e.g., oligonucleotide or polynucleotide, is different from any other UMI used in the compositions or methods. In any embodiment, the UMI is formed of a random sequence of DNA, RNA, modified bases or combinations of these bases or other monomers of the polymers identified above. In one embodiment, a UMI is about 8 monomeric components, e.g., nucleotides, in length. In other embodiments, each UMI can be at least about 1 to 100 monomeric components, e.g., nucleotides, in length. Thus, in various embodiments, the UMI is formed of a random sequence of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or up to 100 monomeric components, e.g., nucleic acids.
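As a non-limiting illustration of how a UMI permits identification of amplification duplicates, the following Python sketch counts distinct UMIs per cell barcode and feature, so that reads sharing the same cell barcode, feature, and UMI are counted as a single original molecule. The record layout, the function name, and the example reads are hypothetical, and exact UMI matching is assumed for simplicity (no correction of sequencing errors within the UMI).

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Read = Tuple[str, str, str]  # (cell_barcode, feature, umi); hypothetical layout


def count_unique_molecules(reads: Iterable[Read]) -> Dict[Tuple[str, str], int]:
    """Collapse reads that share cell barcode, feature, and UMI, and return
    the number of distinct UMIs observed for each (cell, feature) pair."""
    umis = defaultdict(set)
    for cell, feature, umi in reads:
        umis[(cell, feature)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


# Hypothetical reads with 8-nt UMIs.
reads = [
    ("cell1", "geneA", "ACGTACGT"),
    ("cell1", "geneA", "ACGTACGT"),  # same UMI: treated as a PCR duplicate
    ("cell1", "geneA", "TTGCAAGC"),  # different UMI: a second molecule
    ("cell2", "geneA", "ACGTACGT"),  # same UMI in another cell: counted separately
]
counts = count_unique_molecules(reads)
```

In this sketch the same UMI observed under two different cell barcodes is counted once per cell, since the UMI is interpreted only in combination with the barcode.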
The terms “another,” “first,” “second,” “third,” “fourth,” “fifth,” and “sixth,” are used throughout this specification as reference terms to distinguish between various forms and components of the compositions and methods, for example, barcodes, compartment sets, or promoters.
As used herein, the term “partition” refers to a physical area or volume that separates or isolates a subset of cells and/or cell nuclei from other subsets. In certain embodiments, a subset may be a single cell or a nucleus from a single cell, and the partition isolates each cell or cell nucleus. A partition may be an aqueous compartment (for example, microfluidic droplet), a solid compartment (for example, a well on a plate, a tube, a vial, a particle, a microparticle, and/or a bead), or a separated region on a surface (for example, a chip, a microplate, or a slide).
I. Genomic Perturbations
The methods provided rely on the ability to link genetic perturbations with additional multiomic readouts for a single cell. In certain embodiments, the method comprises obtaining a heterogeneous population of cells having single cells with one or more genetic perturbations. In certain embodiments, the perturbations have been introduced by one or more CRISPR guide RNAs. The guide RNAs are subsequently amplified and sequenced to identify the guide RNAs themselves and the corresponding genes or genomic targets for that single cell. Amplification includes obtaining a nucleic acid that includes a guide RNA sequence and a barcode. In certain embodiments, a heterogeneous population of cells is obtained by transducing cells with a CRISPR-Cas vector library that includes guide RNAs that target multiple genes or genomic targets.
The CRISPR-Cas system is a method for functionally inactivating genes in a cell using a CRISPR-associated endonuclease (i.e., Cas, for example, Cas9, dCas) to perturb transcription of a target gene or genomic region of interest (e.g., to disrupt or repress expression). A small RNA (guide RNA, gRNA) is used to guide the nuclease to a defined target site. Perturbations can also be introduced by prime-editing, base-editing, CRISPRa, and/or CRISPRi methodologies, wherein the respective guide RNA constructs (e.g., pegRNA) are captured and sequenced according to methods described herein.
As used herein, a genome refers to the genetic material of an organism. The genome includes both the genes (the coding genomic sequences which code for protein in the organism) and the noncoding DNA (which does not encode protein in the organism, including but not limited to introns, sequences for non-coding RNAs, regulatory regions such as promoter and enhancer, and repetitive DNA), as well as mitochondrial DNA and chloroplast DNA.
Genome editing, or genomic editing, or gene editing, is a type of genetic engineering in which DNA is inserted, deleted, modified or replaced in the genome of an organism. Editing the genome can be achieved using engineered nucleases such as CRISPR-Cas9 (or other CRISPR enzymes). The methods described herein apply to cells that are perturbed, for example, by a gain-of-function genomic editing, a loss-of-function genomic editing, an upregulation or downregulation of certain coding or non-coding genomic sequence, or epigenome editing.
The terms “guide RNA,” “gRNA,” “guide,” or “guide sequence,” refer to a nucleic acid sequence which can hybridize to a unique sequence located 3’ or 5’ from a T-rich protospacer-adjacent motif (PAM) in a contiguous region of the genome or a chromosome of a cell, wherein the guide is capable of complexing with Cas protein and providing targeting specificity and binding ability for nuclease activity of Cas. In certain embodiments, the guide RNA is about 18 nucleotides (nt) to about 35 nt. In certain embodiments, the guide RNA is about 23 nt. The terms “CRISPR RNA spacer,” “spacer,” and “guide RNA coding sequence” are used interchangeably herein and refer to a nucleic acid sequence which encodes a guide RNA. In certain embodiments, the spacer is a DNA. In certain embodiments, the spacer is about 18 nucleotides (nt) to about 35 nt. In one embodiment, the spacer is about 23 nt. In certain embodiments, the guide RNA sequence comprises a UMI sequence.
As used herein, the term “a heterogeneous population of cells” refers to multiple cells, which are not identical to each other. In certain embodiments, the heterogeneous population of cells includes those that are differentiated by one or more guide RNAs present in single cells. In certain embodiments, the heterogeneous population of cells includes cells having different guide RNAs that target a different region of a gene or genomic region of interest. In another example of a heterogeneous population of cells, a subset of cells (i.e., part of but not the whole cell population) is treated with a drug. In certain embodiments, the heterogeneous population of cells includes cells from an experimental timepoint (e.g., a control untreated subset and one or more subsets obtained at one or more timepoints following exposure to a drug).
In certain embodiments, the methods provided herein comprise a perturbation step comprising transducing cells with one or more vectors and culturing the cells. Each vector comprises a nucleic acid sequence encoding a Cas protein in operative association with a first promoter which controls expression of the Cas protein, and a CRISPR guide RNA coding sequence in operative association with a second promoter which controls transcription thereof. In certain embodiments, the vector is a lentiviral vector. In certain embodiments, the cells are incubated with the vector at a multiplicity of infection (MOI) of about 0.05, about 0.1, about 0.2, or about 0.3.
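As a non-limiting illustration of why a low MOI may be chosen, the following Python sketch estimates, for each of the MOI values recited above, the fraction of transduced cells expected to carry exactly one vector integration. It assumes that the number of integrations per cell follows a Poisson distribution with mean equal to the MOI; this is a common modeling assumption and is not stated in the present description. The function names are hypothetical.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k events under a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def single_integration_fraction(moi: float) -> float:
    """Among cells with at least one integration, the expected fraction
    carrying exactly one (Poisson assumption, not from the source)."""
    p_none = poisson_pmf(0, moi)
    p_one = poisson_pmf(1, moi)
    return p_one / (1.0 - p_none)


for moi in (0.05, 0.1, 0.2, 0.3):
    transduced = 1.0 - poisson_pmf(0, moi)
    single = single_integration_fraction(moi)
    print(f"MOI {moi}: {transduced:.1%} of cells transduced, "
          f"{single:.1%} of those with a single integration")
```

Under this assumption, lowering the MOI raises the share of transduced cells carrying a single guide RNA, at the cost of fewer transduced cells overall; at an MOI of about 0.1, roughly 95% of transduced cells would be expected to carry one integration.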
As used herein, “operably linked” sequences or sequences “in operative association” include both expression control sequences that are contiguous with the nucleic acid sequence of interest and expression control sequences that act in trans or at a distance to control the nucleic acid sequence of interest.
A “vector” as used herein is a biological or chemical moiety comprising a nucleic acid sequence which can be introduced into an appropriate cell for replication or expression of said nucleic acid sequence. Common vectors include naked DNA, phage, transposon, plasmids, viral vectors, cosmids and artificial chromosomes (Gong, Shiaoching, et al. “A gene expression atlas of the central nervous system based on bacterial artificial chromosomes.” Nature 425.6961 (2003): 917-925). One type of vector is a “plasmid”, which refers to a circular double stranded DNA loop into which additional nucleic acid segments can be ligated. Another type of vector is a viral vector, wherein additional nucleic acid segments can be ligated into the viral genome. Certain vectors are capable of autonomous replication in a cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian vectors). In certain embodiments, the vector is a lentiviral vector. Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of a cell upon introduction into the cell, and thereby are replicated along with the cell genome.
A “viral vector” refers to a synthetic or artificial viral particle in which an expression cassette containing a nucleic acid sequence of interest is packaged in a viral capsid or envelope. Examples of viral vectors include, but are not limited to, lentiviruses, adenoviruses (Ads), retroviruses (γ-retroviruses and lentiviruses), poxviruses, adeno-associated viruses (AAV), baculoviruses, and herpes simplex viruses. In one embodiment, the viral vector is replication defective. A “replication-defective virus” refers to a viral vector, wherein any viral genomic sequences also packaged within the viral capsid or envelope are replication-deficient; i.e., they cannot generate progeny virions but retain the ability to infect cells.
Optionally, the vector further comprises a reporter gene or a nucleic acid encoding a selectable marker, which may include sequences encoding geneticin, hygromycin, ampicillin or puromycin resistance, among others. As used herein, the term “selectable marker” refers to a peptide or polypeptide whose presence can be readily detected in a cell when a selective pressure is applied to the cell. A reporter gene, which is used as an indication of the presence or absence of the vector in a cell, is readily known by one of skill in the art. Examples include the E. coli lacZ gene, the chloramphenicol acetyltransferase (CAT) gene, and a gene encoding a fluorescent protein such as green fluorescent protein (GFP).
In certain embodiments, the promoter is an inducible promoter, such as a doxycycline inducible promoter. In certain embodiments, the first promoter is an RNA pol II promoter. An RNA pol II promoter is a promoter that is sufficient to direct accurate initiation of transcription by the RNA polymerase II machinery, wherein the RNA polymerase II (RNAP II and Pol II) is an RNA polymerase found in the nucleus of eukaryotic cells, catalyzing the transcription of DNA to synthesize precursors of messenger RNA (mRNA) and most small nuclear RNA (snRNA) and microRNA.
A variety of Polymerase II promoters that can be used within the compositions and methods described herein are publicly or commercially available to a skilled artisan, for example, viral promoters obtained from the genomes of viruses including promoters from polyoma virus, fowlpox virus (UK 2,211,504), adenovirus (such as Adenovirus 2 or 5), herpes simplex virus (thymidine kinase promoter), bovine papilloma virus, avian sarcoma virus, cytomegalovirus (CMV), a retrovirus (e.g., MoMLV, or RSV LTR), Hepatitis-B virus, Myeloproliferative sarcoma virus promoter (MPSV), VISNA, and Simian Virus 40 (SV40); other heterologous mammalian promoters including the actin promoter, β-actin promoter, immunoglobulin promoter, heat-shock protein promoters, human Ubiquitin-C promoter, PGK promoter. Additional promoters are readily known and available. See, e.g., (Kadonaga, 2012), WO 2014/15134, and WO 2016/054153.
In certain embodiments, the second promoter is an RNA pol III promoter. As recognized by one of skill in the art, an RNA pol III promoter is a promoter that is sufficient to direct accurate initiation of transcription by the RNA polymerase III machinery, wherein the RNA polymerase III (RNAP III and Pol III) is an RNA polymerase transcribing DNA to synthesize 5S ribosomal RNA (rRNA), transfer RNA (tRNA), crRNA, and other small RNAs (for example, guide RNA). A variety of Polymerase III promoters which can be used with the invention are publicly or commercially available, for example the U6 promoter, the promoter fragments derived from H1 RNA genes or U6 snRNA genes of human or mouse origin or from any other species. In addition, pol III promoters can be modified/engineered to incorporate other desirable properties such as the ability to be induced by small chemical molecules, either ubiquitously or in a tissue-specific manner. For example, in one embodiment the promoter may be activated by tetracycline. In another embodiment, the promoter may be activated by IPTG (lacI system). See, US5902880A and US7195916B2. In another embodiment, a Pol III promoter from various species might be utilized, such as human, mouse or rat.
In certain embodiments, more than one (i.e., multiple) CRISPR guide RNA transcribed by the vectors is targeted to each functional unit of a cell genome of interest. In certain embodiments, there are about 1, about 2, about 3, about 4, about 5, about 6, about 7, about 8, about 9, about 10, about 11, about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 50, about 75, about 100 or more different guide RNAs targeted to each functional unit of a cell genome of interest. In certain embodiments, each vector transcribes a single guide RNA. In certain embodiments, each vector transcribes about 2, about 3, about 4, about 5, about 6, about 7, about 8, about 9, about 10, about 15, about 20, about 25, or more guide RNAs.
As used herein, the functional unit of a cell genome of interest refers to a genomic sequence which serves a certain function or is suspected of having a certain function. Such function may be expressing a protein of interest, transcribing to an RNA of interest, or regulating a gene of interest. A functional unit of a cell genome typically encompasses a limited region of the genome, such as a region of 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90 to 100 kb of genomic DNA. In certain embodiments, the functional unit of a cell genome is a coding sequence. In certain embodiments, the functional unit of a cell genome is a noncoding genomic sequence. In further embodiments, the non-coding sequence may be in regions 5' and 3' of the coding region of a gene of interest.
In certain embodiments, the cells of the heterogeneous population of cells or a subset thereof are cultured with a chemical agent or a biological agent, or the cell culture is actively physically disturbed. The term chemical agent includes various small molecule drugs/compounds, while the term biological agent refers to biological drugs, which are a diverse category of drugs and are generally large, complex molecules. These biological drugs may be produced through biotechnology in a living system, such as a microorganism, plant cell, or animal cell. Types of biological products approved for use in the United States include therapeutic proteins (such as filgrastim), monoclonal antibodies (such as adalimumab), vaccines (such as those for influenza and tetanus), cell therapy drugs (for example, CAR-T), and gene therapy drugs (for example, recombinant AAV vectors). During the perturbation step, the cells may be incubated with the chemical and/or biological agent or any combinations thereof, such as a library of peptides or a library of small molecules or a library of anti-cancer drugs, which are available commercially or publicly. See, for example, www.selleckchem.com/screening/anti-cancer-compound-library.html?gclid=CjwKCAjwOtHoBRBhEiwAvP1GFfLrUWZGJpXyE_QMr_f3NMvn9tC8433K8edIeOYkL08wUNdHzzwgFhoCquQQAvD_BwE, www.genscript.com/peptide-library.html, www.creative-biolabs.com/drug-discovery/therapeutics/whole-peptide-library.htm, phoenixpeptide.com/products/category/Peptide-Libraries/, www.selleckchem.com/screening/express-pick-library-premium-version.html?gclid=CjwKCAjw0tHoBRBhEiwAvPlGFTm7F6ezXNklpUNajAWqP8Nc4COj2NlMNTes9pEGADe8nMF7UmUgPxoCT9cQAvD_BwE, www.selleckchem.com/screening/fda-approved-drug-library.html and www.chembridge.com/screening_libraries/. In certain embodiments, the cells are contacted with various chemical drugs or biological drugs for large-scale drug screens.
In certain embodiments, the cells are treated via a CRISPR-Cas enzyme and various guide RNAs. The term physical disturbance refers to an active mixing, shaking, stretching, or stirring of the cells in culture. In certain embodiments, a population of cells is treated separately with any one of the perturbations as described herein or with any combinations of the perturbations, resulting in a heterogeneous population of cells.
II. Chromatin Accessibility/Tagmentation
Chromatin accessibility is the degree to which nuclear macromolecules are able to physically contact chromatinized DNA and is determined by the occupancy and topological organization of nucleosomes as well as other chromatin-binding factors that occlude access to DNA. If such physical contact can be established in a certain region of the DNA, that DNA region is considered to be in an open chromatin state. The organization of accessible chromatin across the genome reflects a network of permissible physical interactions through which enhancers, promoters, insulators, and chromatin-binding factors cooperatively regulate gene expression. This landscape of accessibility changes dynamically in response to both external stimuli and developmental cues, and emerging evidence suggests that homeostatic maintenance of accessibility is itself dynamically regulated through a competitive interplay between chromatin-binding factors and nucleosomes. See, for example, Klemm et al., Chromatin accessibility and the regulatory epigenome. Nat Rev Genet. 2019 Apr;20(4):207-220. doi: 10.1038/s41576-018-0089-8, which is incorporated herein by reference. Therefore, it is important to illustrate how chromatin accessibility defines regulatory elements within the genome and how these epigenetic features are dynamically established to control gene expression. As used herein, the term “chromatin accessibility” may refer to chromatin accessibility across the cell genome.
The assay for transposase-accessible chromatin using sequencing (ATAC-seq) is an efficient method for assessing genome-wide chromatin accessibility using a robust transposase to fragment the genome. Specifically, ATAC-seq identifies accessible DNA regions by probing open chromatin with a transposase (for example, a hyperactive mutant Tn5 transposase) that inserts sequencing adapters into open regions of the genome. The transposase excises any sufficiently long DNA in a process called tagmentation: the simultaneous fragmentation and tagging of DNA performed by transposase pre-loaded with sequencing adaptors. The tagged DNA fragments (referred to as fragmented DNA or tagmented DNA) can be amplified by PCR and sequenced. Sequencing reads are then used to infer regions of increased accessibility as well as to map regions of transcription-factor binding sites and nucleosome positions.
The methods provided include performing a tagmentation step to assess the effects of perturbations on chromatin accessibility. In certain embodiments, the method comprises obtaining cell nuclei from all or a portion of the single cells that have genetic perturbations. In certain embodiments, the methods include a preparation step, in which the cells are lysed in a resuspension buffer. In certain embodiments, the cell membrane is lysed but the cell nuclei remain intact. In certain embodiments, the lysed cells still contain mitochondria. For example, using the cell lysing method performed in the Examples, about 20% to about 50% mitochondrial reads were found in the ATAC library. Therefore, as used herein, the term “cell nucleus” or any grammatical variation thereof may refer to a cell nucleus, the membrane-bound organelle found in eukaryotic cells which contains the cell genome. It may also include some cytosomal/cytosomic components which remain physically attached to the cell nucleus after cell lysing, for example, endoplasmic reticulum (ER) connected to the nucleus and some mitochondria.
Isolated nuclei are present in separate partitions for the tagmentation step. In certain embodiments, an individual partition contains at least about 1000, about 2000, about 5000, about 25,000, or about 50,000 nuclei per partition. In certain embodiments, the partition is an individual well of multiwall plate (e.g., a 96-well plate). In certain embodiments, the tagementation step includes separating nuclei into at least 5, 10, 20, 40, 60, or 80 partitions. In certain embodiments, partitions include a nucleotide sequence that includes unique first barcode (i.e., a barcode sequence distinguishable from a barcode sequence of another or all other partitions). In certain embodiments, additional partitions have a nucleotide sequence that includes a second barcode, a third barcode, or a fourth barcode, etc. that is linked to the DNA during tagmentation.
Nuclei in partitions are incubated with in tagmentation buffer that comprises a transposome complex, which includes a transposase, a transposon, and a nucleotide sequence comprising a handle sequence and the first barcode. The transposase causes staggered double-stranded breaks in DNA, and the handle sequence and the first barcode are linked to the double-stranded DNA at the staggered breaks.
A “transposase” is an enzyme that binds to the end of a transposon and catalyzes its movement to another part of the genome by a cut and paste mechanism or a replicative transposition mechanism. In one embodiment, such enzyme is a member of the RNase superfamily of proteins which includes retroviral integrases. Examples of transposases include Tn3, Tn5, and hyperactive mutants thereof. Tn5 can be found in Shewanella and Escherichia bacteria. An example of a hyperactive mutant Tn5 comprises a mutation of E54K. In certain embodiments of this method, the transposase is TnY or Tn5. TnY is a hyperactive mutant of the transposase from Vibrio parahemolyticus (ViPar). The inside and outside ends (IE and OE, respectively) of the ViPar transposon utilize the same sequence as the IE and OE of the Tn5 transposon, suggesting the ViPar transposon would be compatible with existing Tn5-based workflows. Two mutations were introduced: (1) P50K, equivalent to the mutation E54K in Tn5, which is predicted to make the transposon hyperactive and (2) M53Q, which changes the residue that interacts with nucleotide 9 (a thymine) on the nontransferred strand of the mosaic end (ME) similar to Tn5 Q57, predicted to increase binding to the Tn5 ME. The ViPar transposase with P50K and M53Q mutations, henceforth referred to as TnY, showed Tn5 ME loading and tagmentation activity. The insertion site preference of TnY was characterized by performing tagmentation on NA12878 DNA and sequencing on a MiSeq Instrument (Illumina); it was found that TnY has insertion site preferences distinct from, but of a similar magnitude to those of Tn5.
As used herein, the term “transposon” refers to a nucleic acid molecule that is capable of being incorporated into a nucleic acid by a transposase enzyme. A transposon includes two transposon ends (also termed “arms” and “mosaic end” or “ME”, for example, a double-stranded mosaic end comprising a pMENT common oligo). In certain embodiments, the two transposon ends are linked by a sequence that is sufficiently long to form a loop in the presence of a transposase. Transposons can be double-, single-stranded, or mixed, containing single- and double-stranded region(s), depending on the transposase used to insert the transposon. For Mu, Tn3, Tn5, Tn7, or Tn10 transposases, the transposon ends are double-stranded, but the linking sequence need not be double-stranded. In a transposition event, these transposons are inserted into double-stranded DNA. The term “transposon end” refers to the sequence region that interacts with transposase. The transposon ends are double-stranded for transposases Mu, Tn3, Tn5, Tn7, Tn10, etc. The transposon ends are single-stranded for transposases IS200/IS605 and ISrad2, but form a secondary structure, just like a double-stranded region. In a transposition event, single-stranded transposons are inserted into single-stranded DNA by a transposase enzyme. See, for example, US20150337298A1, which is incorporated herein by reference.
In certain embodiments, the transposome complex comprises a transposase assembled with a transposon comprising two mosaic end double-stranded (MEDS) oligos. In a further embodiment, the transposome complex comprises a barcode in one or both of the MEDS oligos. In certain embodiments, the transposome complex further comprises a nucleic acid sequence at the 5’ ends of the MEDS oligos, wherein the nucleic acid sequence is able to anneal to a PCR primer. For example, a T5 oligo may be annealed to MEDS A and a T7 oligo may be annealed to MEDS B.
Methods for assembling a transposome complex and performing tagmentation are known in the art. See, e.g., WO 2021/011433 A1, which is incorporated herein by reference.
The term “handle” refers to a nucleic acid sequence that is complementary to and capable of binding a capture sequence. For example, in a suitable PCR amplification, the handle and capture sequences anneal and amplification results in a sequence complementary to the handle and/or capture sequences. In some embodiments, the handle is positioned at the 3’ end of an oligonucleotide sequence. In other embodiments, the handle is positioned at the 5’ end of a construct oligonucleotide sequence. In certain embodiments, the handle has a length of at least 4 nucleotides, 5 nucleotides, 10 nucleotides, 15 nucleotides, 20 nucleotides, 25 nucleotides, 30 nucleotides, 35 nucleotides, 40 nucleotides, 45 nucleotides, 50 nucleotides, 60 nucleotides, 70 nucleotides, 80 nucleotides, 90 nucleotides, 100 nucleotides, or a length within a range of any two of the foregoing lengths.
In certain embodiments, the tagmentation step includes the addition of a reagent to stop the tagmentation before proceeding with subsequent steps. In certain embodiments, the reagent includes EDTA.
III. Reverse Transcription / Cellular Indexing and Barcodes
The methods provided include performing reverse transcription wherein RNAs (including, e.g., CRISPR guide RNAs and other RNA species) are captured and reverse transcribed to complementary DNA (cDNA). Cell nuclei are incubated with reverse transcription primers and an oligo (i.e., a template switch oligo (TSO)) that includes a handle sequence and a barcode sequence, or the corresponding antisense sequence thereof, reverse transcriptase, and dNTPs in a reverse transcription buffer. The reverse transcription reaction generates cDNA products that comprise the handle and barcode sequences. Suitable primers can be designed for the reverse transcription. In certain embodiments, the reverse transcription primers include polydT. In certain embodiments, the reverse transcription primers include primers specific for CRISPR guide RNA.
In certain embodiments, the reverse transcription step is performed in the same partitions as the tagmentation step (i.e., nuclei present in a partition are not redistributed following the tagmentation step and/or transferred to new partitions). In certain embodiments, all or a portion of nuclei present in a partition during a tagmentation reaction are transferred to a new partition. Irrespective of whether the tagmentation step and reverse transcription steps are performed in the same partition, the barcode linked to the double-stranded DNA during the tagmentation step is matched to the barcode of the template switch oligo. In certain embodiments, the barcode of the template switch oligo is a first barcode, identical to the first barcode of a tagmentation reaction. Where nuclei are separated into multiple partitions during the tagmentation step and DNA of the nuclei present in a partition are linked to a first barcode, a second barcode, a third barcode, a fourth barcode, etc., the TSO of the corresponding reverse transcription includes a matching barcode identifying nuclei (or a subset of nuclei). As a result of the reverse transcription reaction, cellular RNA and CRISPR guide RNAs are reverse transcribed to cDNA comprising a handle sequence and a barcode sequence, or the corresponding reverse-complement sequence thereof. During subsequent sequencing and analysis steps, the barcode sequences of the tagmentation and reverse transcription reactions can be used to assign sequences to specific cells or nuclei.
In certain embodiments, the TSO further comprises a “Unique Molecular Identifier” (UMI), which is a random sequence of nucleotide bases, which when it is a functional element of the polymer construct, is specific for that polymer construct. The UMI permits identification of amplification duplicates of the polymer construct/construct oligonucleotide sequence with which it is associated. One or more UMI may be associated with a single polymer construct/construct oligonucleotide sequence. The UMI may be positioned 5’ or 3’ to a barcode in a nucleotide construct (e.g., a TSO). In one embodiment of the methods described herein, depending on which RNA-sequencing method is used, a UMI is added during the method. However, not all RNA-seq methods make use of UMIs. In the example of single cell droplet RNA-sequencing described below, another UMI is introduced during reverse transcription. Each UMI is specific for its construct oligonucleotide sequence. In certain embodiments, the UMI is about 8 nucleotides in length. In other embodiments, each UMI can be at least about 1 to 100 nucleotides in length. Thus in various embodiments, the UMI is formed of a random sequence of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or up to 100 monomeric components, e.g., nucleic acids.
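By way of illustration only, the use of a UMI to identify amplification duplicates can be sketched in Python as follows. The read layout (cell identifier, feature, UMI) and the function name `count_unique_molecules` are hypothetical and are not part of the described methods; the sketch assumes exact-match collapsing of UMIs without correction for sequencing errors.

```python
from collections import defaultdict

def count_unique_molecules(reads):
    """Collapse amplification duplicates: reads sharing cell, feature and UMI count once.

    reads: iterable of (cell_id, feature, umi) tuples (hypothetical layout).
    Returns {(cell_id, feature): number of distinct UMIs observed}.
    """
    umis = defaultdict(set)
    for cell_id, feature, umi in reads:
        umis[(cell_id, feature)].add(umi)
    return {key: len(values) for key, values in umis.items()}

example_reads = [
    ("cellA", "gRNA_1", "ACGTACGT"),
    ("cellA", "gRNA_1", "ACGTACGT"),  # amplification duplicate of the read above
    ("cellA", "gRNA_1", "TTGCAAGC"),
    ("cellB", "gRNA_2", "ACGTACGT"),  # same UMI in a different cell is a distinct molecule
]
molecule_counts = count_unique_molecules(example_reads)
```

In this sketch the two identical reads from `cellA` are counted as a single molecule, whereas the same UMI observed in `cellB` is counted separately.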
Following tagmentation and reverse transcription steps, cell nuclei are pooled. The pooled nuclei include nuclei from multiple (i.e., more than one) partitions. In certain embodiments, the pooled nuclei are in an aqueous suspension. In certain embodiments, the pooled nuclei include nuclei and/or a subset of nuclei from at least 5, at least 10, at least 20, at least 40, at least 60, or at least 80 partitions.
Following the pooling step, nuclei are randomly partitioned. In certain embodiments, the nuclei suspension is then subjected to an additional, second-barcoding step utilizing droplet microfluidics (e.g., 10x Genomics ATAC kit). An advantage of the methods described herein is the ability to combine combinatorial indexing and droplet microfluidics technology to achieve high-throughput single cell analysis. For example, the methods are compatible with super-loading a single ATAC lane (e.g., loading ~100,000 nuclei on a single lane). See, Cell ATAC Library & Gel Bead Kit, 4 rxns PN-1000176; Chromium Next GEM Chip H Single Cell Kit, 48 rxns PN-1000161. In certain embodiments, the random partitioning results in individual partitions (a droplet) containing a bead and at least about 2, at least about 5, at least about 10, at least about 15, at least about 20, at least about 30, at least about 40, or at least about 50 nuclei.
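Nuclei that share a droplet remain distinguishable only if they carry different first barcodes. As a rough illustration, the following Python sketch estimates the probability that all nuclei in one droplet carry distinct first barcodes. It assumes, purely for illustration, that each nucleus is equally likely to originate from any of the first-barcode partitions; the function name and parameters are hypothetical.

```python
def prob_all_distinct(n_first_barcodes, nuclei_per_droplet):
    """Probability that every nucleus in a droplet carries a different first barcode.

    Assumes (for illustration) that nuclei are drawn uniformly and independently
    from n_first_barcodes equally represented partitions.
    """
    if nuclei_per_droplet > n_first_barcodes:
        return 0.0  # more nuclei than barcodes: a shared barcode is unavoidable
    p = 1.0
    for i in range(nuclei_per_droplet):
        p *= (n_first_barcodes - i) / n_first_barcodes
    return p

# Example: 96 first barcodes (one per well of a 96-well plate), 2 nuclei per droplet.
collision_rate = 1.0 - prob_all_distinct(96, 2)
```

Under these assumptions, the expected fraction of droplets in which two nuclei cannot be separated falls as the number of first-barcode partitions increases and rises as more nuclei are loaded per droplet.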
In certain embodiments, nuclei are partitioned with a bead comprising linked nucleotide sequences comprising a bead-specific barcode sequence and a capture sequence. In certain embodiments, the bead-specific barcode sequence is a 10X ATAC GEM barcode. The capture sequence is complementary to and capable of binding a handle sequence present on products of the tagmentation step (e.g., a nucleic acid having a handle sequence and a first barcode linked to double-stranded DNA) or the reverse transcription step (i.e., a nucleotide sequence having a handle sequence and a first barcode in combination with a CRISPR guide RNA transcribed cDNA or cDNA generated from a nuclear and/or a cellular RNA comprising the first barcode). In certain embodiments, the capture sequence is a 10X ATAC GEM capture sequence. In certain embodiments, a PCR reaction results in disruption of the nuclei in a droplet, and generation of molecular species having a specific bead-barcode. The resulting molecular species include products of the tagmentation and/or reverse-transcription steps, thereby generating a library of amplicons having a first barcode and a second barcode. See FIG. 2.
In other embodiments, pooled nuclei are randomly partitioned into a set of partitions that do not include a bead for the additional barcoding step. In certain embodiments, the individual partitions contain oligos comprising a second-barcode sequence and a capture sequence. In certain embodiments, the pooled nuclei are randomly partitioned and then incubated with oligos comprising a second-barcode sequence and a capture sequence. In certain embodiments, the oligos of each partition are unique, i.e., the second-barcode of the oligo is not present in another or any other individual partitions. In certain embodiments, the pooled nuclei are randomly partitioned into at least 5, at least 10, at least 20, at least 30, at least 40, or at least 50 partitions. In certain embodiments, the partitions are wells of a microwell plate (e.g., a 96-well plate). In certain embodiments, the random partitioning results in individual partitions (e.g., wells) containing at least about 2, at least about 5, at least about 10, at least about 15, at least about 20, at least about 30, at least about 40, or at least about 50 nuclei. In certain embodiments, the second barcode sequence is a 10X ATAC GEM barcode. In certain embodiments, the capture sequence is a 10X ATAC GEM capture sequence. In certain embodiments, a PCR reaction is performed with the nuclei in individual partitions (e.g., wells), resulting in disruption of the nuclei, and generation of molecular species having a specific second barcode. The resulting molecular species include products of the tagmentation and/or reverse-transcription steps, thereby generating a library of amplicons having a first barcode and a second barcode.
IV. Sequencing and Analysis
During the sequencing step, DNA and/or cDNA are extracted and sequenced. In certain embodiments, the methods comprise further amplification (linear or exponential) to obtain libraries with increased copy numbers of molecular species. Analysis of the sequences provides chromatin accessibility and RNA sequences (transcriptome) information for single cells that have identifiable genetic perturbations (through capture and sequencing of guide RNAs).
In certain embodiments, the methods comprise isolation of a molecular species from an amplification library or a subset of molecular species from an amplification library. See Examples 1 and 2 for exemplary protocols. In certain embodiments, PCR amplification products are separated to obtain a library comprising a combination of PCR products comprising double-stranded DNA and cDNA from transcription of CRISPR guide RNAs and cDNA generated from cellular RNA. In certain embodiments, PCR amplification products are separated to obtain a first library comprising double-stranded DNA of (b), and a second library comprising (i) cDNA from transcription of the CRISPR guide RNAs and (ii) cDNA generated from cellular RNA, optionally wherein the separation is based on size. In certain embodiments, the separation is based on size. In certain embodiments, separation is achieved using a streptavidin-biotin mediated method, wherein a prior PCR reaction links a biotinylating site to a molecular species.
DNA sequencing is the process of determining a nucleic acid sequence - the order of nucleotides in DNA. It includes any method or technology that is used to determine the order of the four bases: adenine, guanine, cytosine, and thymine. Methods of sequencing may include, but are not limited to, Maxam-Gilbert sequencing, shotgun sequencing, bridge PCR, Chain-termination methods, Single-molecule real-time sequencing, Ion semiconductor (Ion Torrent sequencing), Pyrosequencing (454), Sequencing by synthesis (Illumina), Combinatorial probe anchor synthesis (cPAS-BGI/MGI), Sequencing by ligation (SOLiD sequencing), Nanopore Sequencing, Chain termination (Sanger sequencing), Massively parallel signature sequencing (MPSS), and Polony sequencing. Such sequencing may be performed on a deep sequencing platform, which sequences a region multiple times, sometimes hundreds or even thousands of times, and/or via a next-generation sequencing (NGS) approach (which is also known as high-throughput sequencing).
After sequencing, the DNAs or cDNAs having the same first barcode and second barcode are identified as being obtained from the same cell (or nuclei). In certain embodiments, the second barcode is a bead-specific barcode. In certain embodiments, presence of certain RNA in the cell (for example, a microRNA or a CRISPR guide RNA) is determined through sequencing cDNAs. In a further embodiment, the guide RNA may be aligned to identify a respective target gene or genomic region of interest. In certain embodiments, transcriptome shown by RNA sequences may be acquired via cDNA sequencing, thus providing data available via traditional RNA-seq (RNA sequencing).
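For illustration only, the assignment of sequences to a cell by the combination of first barcode and second barcode can be sketched in Python as follows. The record keys (`first_barcode`, `second_barcode`, `modality`, `sequence`) and the modality labels are hypothetical names chosen for this sketch and do not describe any particular file format.

```python
from collections import defaultdict

def group_reads_by_cell(reads):
    """Group reads by (first barcode, second barcode); each pair is treated as one cell.

    reads: iterable of dicts with the hypothetical keys 'first_barcode',
    'second_barcode', 'modality' ('ATAC', 'gRNA' or 'RNA') and 'sequence'.
    """
    cells = defaultdict(lambda: {"ATAC": [], "gRNA": [], "RNA": []})
    for read in reads:
        cell_key = (read["first_barcode"], read["second_barcode"])
        cells[cell_key][read["modality"]].append(read["sequence"])
    return dict(cells)

example = [
    {"first_barcode": "W01", "second_barcode": "B7", "modality": "ATAC", "sequence": "ACGT"},
    {"first_barcode": "W01", "second_barcode": "B7", "modality": "gRNA", "sequence": "GGCA"},
    # Same second barcode (same droplet) but a different first barcode: a different cell.
    {"first_barcode": "W02", "second_barcode": "B7", "modality": "ATAC", "sequence": "TTAG"},
]
cells = group_reads_by_cell(example)
```

In this sketch, two nuclei captured in the same droplet (second barcode `B7`) are resolved as separate cells because their first barcodes differ, and the ATAC and guide RNA reads of one cell are kept together under a single key.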
In certain embodiments, the genomic DNAs (fragmented by transposase in the tagmentation step) are analyzed as in ATAC-seq. For example, sequence reads of the fragmented genomic DNAs are acquired and aligned to a reference genome (for example, using programs available to one of skill in the art such as BWA and Bowtie2). In certain embodiments, one or more parameters for quality control purposes are acquired, for example, fragment size distribution, library complexity, adjusting read start position based on transposase (for example, sequence reads aligning to the positive strand are offset by ± 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 bp, and all reads aligning to the negative strand are offset by ± 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 bp), and promoter/transcript body score (which is calculated as the coverage of promoters divided by the coverage of transcript bodies, showing whether the signal is enriched in promoters). In one embodiment, sequence reads aligning to the positive strand are offset by +4 bp, and all reads aligning to the negative strand are offset by -5 bp. In certain embodiments, mapping results are separated according to uniqueness and alignment type (concordant, discordant, and non-concordant/non-discordant). Peak-calling, which identifies enriched (signal) regions in ATAC-seq data, is then performed using tools such as MACS2. In one embodiment, the chromosome position is plotted on the x axis and the enrichment score is plotted on the y axis. Therefore, peaks in the plot identify enriched regions in the chromosome, indicating open chromatin with high chromatin accessibility. One or more of the following may be identified: (1) Nucleosome free, mononucleosome, dinucleosome, and trinucleosome regions; (2) distribution of nucleosome-free and nucleosome-bound regions; (3) transcription factor footprints; (4) sample correlations.
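The read start adjustment and the promoter/transcript body score described above can be sketched in Python as follows. This is a minimal illustration only: the function names are hypothetical, the position passed in is assumed to be the 5’ end of the aligned read in the coordinate system of the user's aligner, and the default offsets are the +4 bp and -5 bp values of the one embodiment given above.

```python
def adjust_read_start(five_prime_pos, strand, plus_offset=4, minus_offset=-5):
    """Shift the 5' end of an aligned read by a strand-specific offset.

    five_prime_pos: coordinate of the read's 5' end (assumed convention).
    strand: '+' or '-'.
    """
    if strand == "+":
        return five_prime_pos + plus_offset
    if strand == "-":
        return five_prime_pos + minus_offset
    raise ValueError("strand must be '+' or '-'")

def promoter_body_score(promoter_coverage, body_coverage):
    """Coverage of promoters divided by coverage of transcript bodies."""
    if body_coverage == 0:
        raise ValueError("transcript body coverage must be non-zero")
    return promoter_coverage / body_coverage

shifted_plus = adjust_read_start(100, "+")
shifted_minus = adjust_read_start(200, "-")
score = promoter_body_score(12.0, 3.0)
```

A score above 1 in this sketch indicates that signal is enriched in promoters relative to transcript bodies.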
Numbers of ATAC fragments, peaks, as well as differential peaks (for example, for comparing ATAC-seq samples from two different conditions) may be obtained using this method.
In certain embodiments, cells with at least about 50, about 100, about 200, about 300, about 400, about 500, about 600, about 700, about 800, about 900, about 1000, about 2000, about 3000, about 4000, about 5000, about 6000, about 7000, about 8000, or about 9000 unique ATAC-seq fragments are selected for analysis. Additionally or alternatively, each cell is required to have at least about 10, about 20, about 30, about 40, about 50, about 60, about 70, about 80, about 90, about 100, about 200, about 300, about 400, about 500, about 600, about 700, about 800, about 900, about 1000, about 2000, about 3000, or about 4000 RNA (for example guide RNA) reads with at least about 90%, about 95%, about 96%, about 97%, about 98%, or about 99% of the reads assigned to one RNA sequence. In certain embodiments, cells with at least about 2000 unique ATAC-seq fragments are selected for analyses. Additionally, or alternatively, each cell is required to have at least about 100 guide RNA reads with at least about 99% of the reads assigned to one RNA sequence.
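As one possible illustration of these selection criteria, the following Python sketch applies the thresholds of about 2000 unique ATAC-seq fragments, about 100 guide RNA reads, and about 99% of guide RNA reads assigned to one sequence. The function name and input layout are hypothetical, and the sketch implements the variant in which both criteria are required together.

```python
def passes_qc(n_unique_fragments, guide_read_counts,
              min_fragments=2000, min_guide_reads=100, min_top_fraction=0.99):
    """Return True if a cell meets both the ATAC fragment and guide RNA read criteria.

    guide_read_counts: dict mapping guide RNA name -> read count for this cell
    (hypothetical layout).
    """
    total_guide_reads = sum(guide_read_counts.values())
    if n_unique_fragments < min_fragments:
        return False
    if total_guide_reads == 0 or total_guide_reads < min_guide_reads:
        return False
    # Fraction of guide RNA reads assigned to the most abundant guide.
    top_fraction = max(guide_read_counts.values()) / total_guide_reads
    return top_fraction >= min_top_fraction

keep_clean = passes_qc(2500, {"gRNA_1": 199, "gRNA_2": 1})   # 99.5% on one guide
keep_mixed = passes_qc(2500, {"gRNA_1": 90, "gRNA_2": 10})   # only 90% on one guide
keep_sparse = passes_qc(1500, {"gRNA_1": 200})               # too few fragments
```

The thresholds are exposed as parameters so that any of the other values recited above may be substituted.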
In certain embodiments, analysis is limited to cells having a specific perturbation, wherein at least 25, 50, 100, 150, 200, or 250 cells are identified as having the perturbation.
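A minimal Python sketch of limiting analysis to perturbations represented by a minimum number of cells follows. The mapping from cell to perturbation and the function name are hypothetical; the default of 100 cells is one of the values recited above.

```python
from collections import Counter

def perturbations_with_enough_cells(cell_to_perturbation, min_cells=100):
    """Return the set of perturbations identified in at least min_cells cells.

    cell_to_perturbation: dict mapping cell identifier -> perturbation label
    (hypothetical layout, one perturbation per cell).
    """
    counts = Counter(cell_to_perturbation.values())
    return {name for name, n in counts.items() if n >= min_cells}

assignments = {f"cell{i}": "geneA" for i in range(120)}
assignments.update({f"other{i}": "geneB" for i in range(40)})
retained = perturbations_with_enough_cells(assignments, min_cells=100)
```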
In a further embodiment, ChIP-seq may be used to identify enrichment or depletion in accessibility of transcription factor (TF) binding sites following chromatin modifier knockout. In another embodiment, motifs from the JASPAR database may be used to predict TF binding sites (386 motifs from JASPAR 2016, human CORE dataset). Transcription factor motif enrichment and depletion scores may be calculated, for example, using chromVAR20. In yet another embodiment, coverage per base around AP-1 motifs using mononucleosomal fragments (defined as paired-end ATAC-seq fragments with a length between 180 and 247 nt9) may be calculated, for example, using BEDTools. In one embodiment, accessibility of enhancers and promoters may be determined.
In certain embodiments, a null peak distribution derived from non-perturbed cells and/or cells not treated with a chemical agent or a biological agent is used as a reference and data acquired from cells is compared to the reference. In certain embodiments, to avoid biases that may arise when comparing coverage between different perturbations with different numbers of single cells, each cell population per perturbation is down-sampled to a smaller cell number and the data acquired is compared to a non-perturbed cell population of a similar size. Each population of cells is resampled about 100, about 200, about 500, about 600, about 700, about 800, about 900, about 1000, about 1500, about 2000, about 3000, about 5000, or more times and the coverage at transcription start sites, weak enhancers (midpoint), and strong enhancers (midpoint) is calculated.
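The down-sampling and resampling described above can be sketched in Python as follows. This is an illustration under stated assumptions only: each cell is represented by a single coverage value at a region of interest, cells are drawn without replacement in each resample, and the function name, parameters, and fixed random seed are hypothetical.

```python
import random

def resampled_mean_coverage(per_cell_coverage, sample_size, n_resamples=1000, seed=0):
    """Repeatedly down-sample a cell population and record the mean coverage.

    per_cell_coverage: list of per-cell coverage values at a region of interest
    (e.g., transcription start sites); hypothetical layout.
    sample_size: number of cells drawn, without replacement, in each resample.
    """
    if sample_size > len(per_cell_coverage):
        raise ValueError("sample_size exceeds the number of available cells")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = []
    for _ in range(n_resamples):
        sample = rng.sample(per_cell_coverage, sample_size)
        means.append(sum(sample) / sample_size)
    return means

# Perturbed and non-perturbed populations are down-sampled to the same cell number.
perturbed = [2.0] * 50
control = [1.0] * 400
perturbed_means = resampled_mean_coverage(perturbed, sample_size=25, n_resamples=100)
control_means = resampled_mean_coverage(control, sample_size=25, n_resamples=100)
```

Drawing the same number of cells from each population keeps the comparison from being driven by differences in the number of cells recovered per perturbation.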
V. Specific Embodiments
A1. A method for evaluating effects of genetic perturbations on chromatin accessibility and the transcriptome of single cells in a population of cells, the method comprising:
(a) obtaining a heterogeneous population of cells having single cells with one or more genetic perturbations having been introduced by a CRISPR guide RNA that targets a gene or genomic region of interest, the single cells comprising one or more CRISPR guide RNAs;
(b) obtaining cell nuclei from all or a portion of the single cells of (a) and separating the nuclei into partitions, and incubating the cell nuclei in a tagmentation buffer that comprises a transposome complex, wherein the transposome complex comprises a transposase, a transposon, and a nucleotide sequence comprising a handle sequence and a first barcode, wherein the transposase causes staggered double-stranded breaks in DNA, and wherein the handle sequence and the first barcode are linked to the double-stranded DNA at the staggered breaks;
(c) performing reverse transcription on nuclei from (b), which comprises contacting and incubating the nuclei with reverse transcription primers and template switch oligos (TSOs) comprising the handle sequence and the first barcode, or the corresponding reverse-complement sequence thereof, optionally wherein the TSOs comprise a UMI, reverse transcriptase, and dNTPs in a reverse transcription buffer, whereby cellular RNA and CRISPR guide RNAs are reverse transcribed to cDNA comprising the handle sequence and the first barcode, or the corresponding reverse-complement sequence thereof;
(d) pooling nuclei from multiple partitions;
(e) randomly partitioning one or more nuclei of (d) with a bead, the bead having linked nucleotide sequences comprising a bead-specific barcode and a capture sequence, the capture sequence being complementary to and capable of binding the handle sequence, and disrupting the nuclei and performing PCR amplification wherein the handle sequence binds the bead capture sequence to generate PCR products comprising the bead-specific barcode in combination with each of (i) double-stranded DNA of (b) comprising the first barcode; (ii) CRISPR guide RNA transcribed cDNA of (c) comprising the first barcode; and (iii) cDNA generated from nuclear and/or cellular RNA of (c) comprising the first barcode;
(f) sequencing and analyzing the PCR amplification products generated in (e) to associate the effects of a genetic perturbation with the chromatin accessibility and the transcriptome from a single cell, whereby sequences acquired with the same combination of the first barcode and the bead-specific barcode are identified as being from the same cell.
A2. A method for evaluating effects of genetic perturbations on chromatin accessibility and the transcriptome of single cells in a population of cells, the method comprising:
(a) obtaining a heterogeneous population of cells having single cells with one or more genetic perturbations having been introduced by a CRISPR guide RNA that targets a gene or genomic region of interest, the single cells comprising one or more CRISPR guide RNAs;
(b) obtaining cell nuclei from all or a portion of the single cells of (a) and separating the nuclei into partitions, and incubating the cell nuclei in a tagmentation buffer that comprises a transposome complex, wherein the transposome complex comprises a transposase, a transposon, and a nucleotide sequence comprising a handle sequence and a first barcode, wherein the transposase causes staggered double-stranded breaks in DNA, and wherein the handle sequence and the first barcode are linked to the double-stranded DNA at the staggered breaks;
(c) performing reverse transcription on nuclei from (b), which comprises contacting and incubating the nuclei with reverse transcription primers and template switch oligos (TSOs) comprising the handle sequence and the first barcode, or the corresponding reverse-complement sequence thereof, optionally wherein the TSOs comprise a UMI, reverse transcriptase, and dNTPs in a reverse transcription buffer, whereby cellular RNA and CRISPR guide RNAs are reverse transcribed to cDNA comprising the handle sequence and the first barcode, or the corresponding reverse-complement sequence thereof;
(d) pooling nuclei from multiple partitions;
(e) randomly partitioning one or more nuclei of (d), and contacting the nuclei with nucleotide sequences comprising a second barcode and a capture sequence, the capture sequence being complementary to and capable of binding the handle sequence, and disrupting the nuclei and performing PCR amplification wherein the handle sequence binds the capture sequence to generate PCR products comprising the second barcode in combination with each of (i) double-stranded DNA of (b) comprising the first barcode; (ii) CRISPR guide RNA transcribed cDNA of (c) comprising the first barcode; and (iii) cDNA generated from nuclear and/or cellular RNA of (c) comprising the first barcode;
(f) sequencing and analyzing the PCR amplification products generated in (e) to associate the effects of a genetic perturbation with the chromatin accessibility and the transcriptome from a single cell, whereby sequences acquired with the same combination of the first barcode and the second barcode are identified as being from the same cell.
A3. The method of A1 or A2, wherein the first barcode is unique to a partition and differs from another or all other first barcodes present in additional partitions.
A4. The method of any one of embodiments A1 to A3, wherein the one or more genetic perturbations include CRISPR-Cas mediated editing, including CRISPR/Cas9, prime-editing, base-editing, CRISPRa, and/or CRISPRi.
A5. The method of any one of embodiments A1 to A4, wherein more than one CRISPR guide RNA targets a gene or genomic region of interest or a different gene genomic region of interest in a single cell.
A6. The method of any one of embodiments A1 to A5, wherein the one or more partitions of (b) are individual wells of a microwell plate, optionally a 96 well plate.
A7. The method of any one of embodiments A1 to A6, wherein one or more partitions of (b) contain at least about 1000, about 2000, about 5000, about 25,000, or about 50,000 nuclei per partition.
A8. The method of any one of embodiments A1 to A7, further comprising washing the nuclei of step (b) prior to step (c) to stop the tagmentation reaction without disrupting the cell nuclei, wherein the washing comprises addition of EDTA.
A9. The method of any one of embodiments A1 to A8, wherein step (e) comprises partitioning at least about 2, at least about 5, at least about 10, at least about 15, or at least about 20 nuclei with the bead.
A10. The method of any one of embodiments A1 to A9, wherein the PCR products generated in step (e) are separated to obtain a library comprising a combination of PCR products comprising double-stranded DNA of (b), cDNA from transcription of the CRISPR guide RNAs and cDNA generated from cellular RNA, optionally wherein the separation is based on size.
A11. The method of any one of embodiments A1 to A10, wherein the PCR products generated in step (e) are separated to obtain a first library comprising double-stranded DNA of (b), and a second library comprising (i) cDNA from transcription of the CRISPR guide RNAs and (ii) cDNA generated from cellular RNA, optionally wherein the separation is based on size.
A12. The method of any one of embodiments A1 to A11, wherein the population of cells of (a) have been further treated with a chemical agent or a biological agent.
A13. The method of any one of embodiments A1 to A12, wherein the analysis is limited to cells (nuclei) defined as having at least 200 fragments per cell and/or perturbations wherein at least 100 cells are identified as having the perturbation.
EXAMPLES
The following examples are provided for purposes of illustration only. The protocols and methods described in the examples are not considered to be limitations on the scope of the claimed invention. Rather this specification should be construed to encompass any and all variations that become evident as a result of the teaching provided herein. One of skill in the art will understand that changes or variations can be made in the disclosed embodiments of the examples and expected similar results can be obtained. For example, the substitutions of reagents that are chemically or physiologically related for the reagents described herein are anticipated to produce the same or similar results. All such similar substitutes and modifications are apparent to those skilled in the art and fall within the scope of the invention.
Example 1: Materials and Methods
Cell lines
BT16-luciferase42 cells were a gift from Rintaro Hashizume. BT12 cells were a gift from Charles Roberts. NIH-3T3 (CRL-1658) and CHLA06 (CRL-3038) were acquired from ATCC. HEK293FT cells were acquired from ThermoFisher (R70007). BT16 and BT12 cells were validated by STR profiling, while other lines were authenticated by the vendor. All cell lines were maintained at 37 °C and 5% CO2 in D10 medium: DMEM with high glucose and stabilized L-glutamine (Caisson DML23) supplemented with 10% Serum Plus II (Sigma 14009C). Monoclonal CRISPRi-expressing BT16 cell lines were generated by transducing cells with lentiCRISPRi(v2)-Blast (Addgene 170068)14, selecting with 10 µg/ml Blasticidin S (ThermoFisher A1113903), and plating at a low density for colony picking. Several clones were selected and monitored for growth. A clone maintaining normal BT16 growth patterns and CRISPRi(v2) expression by Cas9 immunocytochemistry was selected for the MultiPerturb-seq screen. NIH-3T3, BT12, and CHLA06 cells were also transduced with lentiCRISPRi(v2)-Blast and selected with 10 µg/ml blasticidin for 1 week.
Guide RNA design for pooled library and array validation
To identify factors involved in reprogramming AT/RT cells, we constructed a library of 109 epigenomic remodelers with 3 guide RNAs (gRNAs) per gene. The AT/RT library targeted genes that encode proteins with roles in DNA modification, histone modification, histone chaperones, transcription factors, chromatin remodelers, and structural factors. We also included 17 non-targeting controls that do not target anywhere in the human genome. The library was designed using gRNAs from the Dolcetto CRISPRi library and CRISPick43. Three gRNAs were selected per gene and homopolymers were excluded. Oligonucleotides were ordered and synthesized by Twist Biosciences in pooled format. For the mouse spike-in, mouse non-targeting gRNAs were ordered individually through Integrated DNA Technologies (IDT) and pooled for library cloning.
Pooled CRISPR library cloning and quality control
Oligonucleotides were diluted, and a PCR cycle test was performed to ascertain the minimum cycles needed for library amplification to preserve integrity. Following this, oligonucleotides were amplified using a two-step nested PCR, then cloned in lentiGuideFE-Puro (Addgene 170069) with Gibson cloning using Gibson mix (NEB E2611L) and precipitated with ethanol. The library was then transformed into Endura cells (Biosearch 60242-2). Bacteria were then grown on plates, maxi-prepped (IBI Scientific IB47125), and then sequenced. For quality control, libraries were sequenced on Illumina MiSeq. Reads were demultiplexed using bcl2fastq (version 2.20), guide spacers were extracted using cutadapt44 (version 4.0), and aligned with bowtie45 (version 1.1.2). For the epigenomic remodeler library, we recovered 98% of the designed guide RNAs and, using the read distribution, we computed that the 90th:10th quantile ratio of guide RNAs was 1.8. For the non-targeting library (mouse), we recovered 100% of the designed gRNA and the 90th:10th quantile ratio was 6.5.
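The two library quality-control metrics reported above (fraction of designed guides recovered and the 90th:10th quantile ratio of guide read counts) can be sketched as follows. This is a minimal illustration, not the pipeline used in the examples: the function names, the linear-interpolation percentile, and the choice to include undetected guides as zero counts are all assumptions.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def library_qc(read_counts, designed_guides):
    """Return (fraction of designed guides recovered, 90th:10th quantile ratio).

    read_counts: dict mapping guide name -> aligned read count (hypothetical input).
    designed_guides: list of all guide names in the library design.
    Assumption: guides absent from read_counts count as zero reads.
    """
    counts = [read_counts.get(g, 0) for g in designed_guides]
    recovered = sum(1 for c in counts if c > 0) / len(designed_guides)
    p10 = percentile(counts, 10)
    p90 = percentile(counts, 90)
    # A zero 10th percentile means the ratio is undefined; report infinity.
    ratio = p90 / p10 if p10 > 0 else float("inf")
    return recovered, ratio
```

A ratio close to 1 indicates an evenly represented library; larger values indicate a more skewed guide distribution.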
Lentivirus production
Lentiviral libraries were prepared in T225 flasks. Each flask was seeded with 27 × 10^6 cells the day before in 30 ml of antibiotic-free D10 media to achieve 80-90% confluence before transfection. The transfection mix was 24.9 µg of the transfer plasmid (including the epigenetic remodelers or mouse non-targeting library), 13.7 µg pMD2.G (Addgene 12260), 19.9 µg psPAX2 (Addgene 12259), 2490 µl OptiMEM (Invitrogen 51985-091) and 138 µl 1 mg/ml polyethylenimine linear MW 25000 (Polysciences 23966). The mixture was mixed and allowed to incubate for 10 minutes at room temperature. After removing 15 ml media from the cells, the mixture was added dropwise. Six hours following transfection, an additional 15 ml fresh media with 1% BSA (VWR AAJ65097-18) was added. Viral supernatants were collected 72 hours following transfection, spun down, and filtered with a 0.45 µm filter (Millipore SE1M003M00). Lentivirus for the pooled library was concentrated using 2 ml of a 20% sucrose cushion by ultracentrifugation (24,000 rpm in a JS24.38 swinging bucket rotor, Avanti JXN30) for two hours at 4°C, then resuspended in PBS, aliquoted, and stored at -80°C.
Pooled library transduction
Pooled libraries were transduced into BT16 and NIH-3T3 cells with the corresponding libraries with variable viral volumes to determine the appropriate multiplicity of infection for a high single-infection rate, as determined by puromycin survival (p_survival). We aimed for a p_survival of 1 - 5% to ensure single-guide integration. Based on this titration, cells were infected with the appropriate volume of virus. Forty-eight hours following transduction, BT16 and NIH-3T3 cells were lifted and selected with 1 µg/ml and 2 µg/ml puromycin respectively (Invivogen ant-pr-1). At the same time, we performed in-line controls in 6-well plates and confirmed that p_survival was within the 1 - 5% target. Seven days following infection, cells were lifted, counted, and pooled with 80% BT16 (human) cells and 20% mouse cells (3T3) as a spike-in control for the MultiPerturb-seq library preparation workflow.
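The reasoning behind the 1 - 5% survival target can be illustrated with a simple Poisson model of viral integration. The model is an assumption introduced here for illustration (the text above does not state one): integrations per cell are taken to be Poisson-distributed, and the puromycin-surviving fraction is taken to equal the fraction of cells with at least one integration. Function names are hypothetical.

```python
import math


def moi_from_survival(p_survival):
    """Estimate the multiplicity of infection from the surviving fraction.

    Assumes P(at least one integration) = 1 - exp(-MOI) = p_survival.
    """
    if not 0.0 < p_survival < 1.0:
        raise ValueError("p_survival must be strictly between 0 and 1")
    return -math.log(1.0 - p_survival)


def single_integration_fraction(moi):
    """Fraction of infected cells carrying exactly one integration.

    Under the Poisson assumption: P(k = 1) / P(k >= 1).
    """
    return moi * math.exp(-moi) / (1.0 - math.exp(-moi))


if __name__ == "__main__":
    for p in (0.01, 0.05):
        moi = moi_from_survival(p)
        frac = single_integration_fraction(moi)
        print(f"p_survival={p:.2f}  MOI={moi:.4f}  single-guide fraction={frac:.4f}")
```

Under these assumptions, the lower the survival fraction, the larger the share of surviving cells that carry a single guide.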
MultiPerturb-seq library preparation
Part 1: Nuclei isolation, tagmentation, and reverse transcription
Overall, our ATAC protocol is similar to a previous, well-validated ATAC method46 and our transposomes are assembled as in Picelli et al.47 with MEDS-A (MPSprimer_01), pMENT (MPSprimer_02), and 48 barcoded MEDS-B (MPSprimer_17 - MPSprimer_64) for a 48-well barcoded transposome plate. Of note, although we used standard unsalted oligonucleotides (Integrated DNA Technologies), we found that HPLC-purified oligonucleotides can lead to increased fragments captured per cell. MultiPerturb-seq may also be performed without combinatorial indexing, in which case we advise use of HPLC-purified oligonucleotides since only one MEDS-B is required.
2.4 million human cells and 600k mouse cells were combined and lysed in 1 ml Omni lysis buffer (10 mM Tris-HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% NP-40 (ThermoFisher 85124), 0.1% Tween-20 (Sigma P1379), 0.01% digitonin (Promega G9441))48. Cells were lysed for 10 minutes on ice. After lysis, nuclei were spun down, pooled, resuspended in 450 µl PBS and combined with tagmentation mix: 240 µl 5X TD-TAPS (50 mM TAPS-NaOH buffer, pH 8.5 [Boston BioProducts BB-2375], 25 mM MgCl2, 50% DMF [Sigma 494488]), 120 µl 10% Tween-20, 300 µl dilution buffer (10 mM Tris-HCl, pH 7.4, 100 mM NaCl, 50% glycerol, 1 mM DTT), 30 µl RiboLock RNase inhibitor (ThermoFisher EO0381). The nuclei were then split among wells of barcoded transposomes for tagmentation.
Cells were then incubated for 30 minutes at 37°C in tagmentation mix while shaking at 350 rpm on a ThermoMixer. Following tagmentation, 1 µl 126 mM EDTA was added to each well and mixed to stop tagmentation. After this, 50 µl PBS was added, and nuclei were spun at 400 rcf for 4 minutes at 4°C. 53 µl of supernatant was then removed, leaving 17 µl and the nuclei pellet undisrupted. For the reverse transcription (RT), we added a master mix of 8 µl 5X RT buffer (250 mM Tris-HCl, 375 mM KCl, 15 mM MgCl2, 50 mM DTT), 2 µl dNTPs, 2 µl MPSprimer_06 (10 µM), 4 µl MPSprimer_08 (10 µM), 2 µl Maxima RT H-minus (ThermoFisher EP0753), and 1 µl Ribolock (ThermoFisher EO0381) per well. We then added 4 µl of barcoded TSO to match the ATAC barcodes to individual wells. This plate was then incubated for 90 minutes at 53°C, while shaking at 450 rpm on a ThermoMixer. An alternative reverse transcription protocol using thermal cycling (50 °C for 10 min; then three cycles of 8 °C for 12 s, 15 °C for 45 s, 20 °C for 45 s, 30 °C for 30 s, 42 °C for 2 min and 50 °C for 3 min followed by a final step at 50 °C for 5 min) as previously used in ISSAAC-seq49 improves both ATAC and RNA capture, and we recommend this cycling instead of the fixed temperature RT. Nuclei were then resuspended well by triturating with a narrowed pipette tip and all wells were pooled into 2 x 1.5 mL tubes, spun down, and re-pooled in a 1.5 mL tube. The narrowed pipette tip was produced using a standard plastic 20 µl pipette tip (Rainin) melted to narrow gauge using an infrared sterilizer (Joanlab DS-900S). After observing nuclei to avoid clumps and counting, nuclei were resuspended in diluted nuclei buffer to achieve the desired loading amount (100,000 nuclei in 8 µl) and combined with 7 µl ATAC buffer B (10x Genomics PN2000193).
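For readers programming a thermocycler, the alternative cycling reverse transcription program described above can be written out as an explicit step list. The sketch below only encodes the temperatures and hold times stated in the text; the data layout and function names are illustrative assumptions, and ramp times between steps are not included.

```python
# Each step is (temperature in degrees C, hold time in seconds), from the text above.
INITIAL = [(50, 10 * 60)]
CYCLE = [(8, 12), (15, 45), (20, 45), (30, 30), (42, 2 * 60), (50, 3 * 60)]
FINAL = [(50, 5 * 60)]
N_CYCLES = 3


def expand_program(initial, cycle, n_cycles, final):
    """Flatten the program into the ordered list of steps the block will run."""
    return list(initial) + list(cycle) * n_cycles + list(final)


def total_hold_seconds(steps):
    """Sum of programmed hold times (excludes ramping between temperatures)."""
    return sum(seconds for _, seconds in steps)


if __name__ == "__main__":
    steps = expand_program(INITIAL, CYCLE, N_CYCLES, FINAL)
    print(f"{len(steps)} steps, {total_hold_seconds(steps) / 60:.1f} min of hold time")
```

The programmed hold time of the cycling protocol is therefore well under the 90 minutes of the fixed-temperature incubation.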
Part 2: 10X ATAC GEM generation, barcoding, and cleanup
The nuclei suspension was prepared for second-round barcoding using droplet microfluidics (10X Genomics ATAC kit PN1000176) following the manufacturer’s instructions. Briefly, nuclei were mixed with the master mix (56.5 µl Barcoding reagent B (PN2000194), 1.5 µl Reducing agent A (PN2000087), and 2 µl Barcoding enzyme (PN2000125/139)) and loaded onto the Chromium Next GEM Chip H (PN1000162) with glycerol, gel beads, and partitioning oil. Following the run on the Chromium Controller, 100 µl GEMs were collected and transferred to a PCR tube for GEM incubation. 15 cycles were substituted for 12 cycles during the linear amplification. GEMs were then cleaned with Dynabeads per the manufacturer’s instructions, and libraries were split into 20 µl ATAC and 20 µl RNA libraries for final library prep. We recovered ~3.6 cells per droplet on average.
Part 3: Library preparation
The ATAC fraction (20 µl) was cleaned up with 1.2X SPRI (Illumina) and amplified in a 100 µl reaction using NEBNext: 50 µl High-Fidelity 2X Master Mix (NEB M0541S), 5 µl each of MPSprimer_04 (10 µM) and MPSprimer_14 (10 µM), 20 µl ATAC fraction and 20 µl water (30 seconds 98°C, (10 seconds 98°C, 30 seconds 63°C, 1 minute 72°C) x 10-15 cycles, 2 minutes 72°C, hold 4°C), then cleaned with double-sided SPRI (0.45X, 1.8X) in order to isolate fragments of lengths 50-1000 bp. The RNA fraction (20 µl) was cleaned by incubation with 8 µl ExoSAP for 15 minutes at 37°C and then 15 minutes at 80°C. The cleaned RNA product was amplified using an ISPCR50 with a 100 µl KAPA HiFi reaction (Roche 07958935001): 50 µl 2X Master Mix, 2.5 µl MPSprimer_04 (10 µM), 2.5 µl MPSprimer_07 (10 µM), 2.5 µl MPSprimer_12 (10 µM), 28 µl cleaned RNA product, and 14.5 µl water (3 minutes 95°C, (20 seconds 95°C, 30 seconds 66°C, 1 minute 72°C) x 10 cycles, 2 minutes 72°C, hold 4°C). Following amplification, the mRNA and gRNA fractions were split using a two-sided SPRI4. The mRNA was collected with a 0.6X SPRI and the gRNA was isolated from the supernatant using an additional 1.4X SPRI. Each fraction was then resuspended in 10 µl water. The mRNA may then be amplified with 3-9 additional cycles of a 50 µl reaction if there is less than 1 ng of product: 25 µl 2X KAPA HiFi Master Mix, 1.25 µl MPSprimer_04 (10 µM), 1.25 µl MPSprimer_07 (10 µM), 10 µl cleaned RNA product, and 12.5 µl water (3 minutes 95°C, (20 seconds 95°C, 30 seconds 66°C, 1 minute 72°C) x 3-9 cycles, 2 minutes 72°C, hold 4°C).
After this, the 10 µl mRNA fraction was tagmented with Tn loaded with MPSprimer_113 in 20 µl of tagmentation buffer for 5 minutes at 55°C. This was then cleaned with DNA Clean & Concentrator-5 (Zymo D4014), resuspended in 33.5 µl water and PCR amplified with 50 µl PfuX751: 10 µl 5X GC buffer, 1 µl dNTPs, 2.5 µl MPSprimer_04 (10 µM), 2.5 µl MPSprimer_05 (10 µM), 0.5 µl X7 polymerase, and 33.5 µl mRNA fraction using the following program: 5 minutes 72°C, 30 seconds 98°C, (10 seconds 98°C, 30 seconds 61°C, 1 minute 72°C) x 10 cycles, 2 minutes 72°C, hold 4°C. The 10 µl gRNA fraction was cleaned with 4 µl 0.2 U/µl ExoSAP and amplified with a 50 µl intermediate PCR: 25 µl 2X KAPA HiFi Master Mix with 1.25 µl biotinylated guide scaffold primer (MPS_primer11, 10 µM), 1.25 µl MPSprimer_04 (10 µM), 10 µl gRNA fraction, and 8.5 µl water (3 minutes 95°C, (20 seconds 95°C, 30 seconds 64°C, 1 minute 72°C) x 10 cycles, 2 minutes 72°C, hold 4°C), then cleaned again with 1.8X SPRI, resuspended in 10 µl water, and incubated with 4 µl ExoSAP. Following cleanup, the gRNA was pulled down with Dynal MyOne Dynabeads Streptavidin C1 (ThermoFisher 65001), resuspended in 45 µl water, then amplified with a final inner (guide library) PCR using KAPA HiFi Master Mix: 50 µl Master Mix, 2.5 µl MPSprimer_04 (10 µM), 2.5 µl MPSprimer_13 (10 µM), and 45 µl gRNA pulldown product (3 minutes 95°C, (20 seconds 95°C, 30 seconds 57°C, 1 minute 72°C) x 10 cycles, 2 minutes 72°C, hold 4°C). Samples were evaluated with Tapestation High Sensitivity D1000 ScreenTape and Reagents (Agilent 5067), quantified with Qubit (ThermoFisher Q33231), and sequenced on both Illumina MiSeq and Illumina NovaSeq 6000 v1.5 platforms with 16 bp index 1, 8 bp index 2, and 50 bp (MiSeq) or 100 bp (NovaSeq) read 1 and 2.
MultiPerturb-seq optimization
MultiPerturb-seq was developed incrementally, first incorporating ATAC, then mRNA and gRNA capture, ensuring preservation of each modality throughout the process (several key examples shown in FIG. 5). In brief, we built off of our previous work6, adapting it to the 10X ATAC kit using a mock gel bead oligonucleotide. We further optimized ATAC conditions based on protocols including46, 48, 52, 53 (FIG. 5A, FIG. 5B). Both Tn554 and TnY6 were used in these experiments. We then adapted the direct guide capture technique from 4, also described in 55. We designed a template switch oligonucleotide (TSO)56 with barcode and unique molecular identifier (UMI) (FIG. 5C), and tested PCR50, 57 and cleanup conditions to achieve mRNA and gRNA capture (FIG. 5C, FIG. 5D). We also tested several variants of TSO (FIG. 5E). We incorporated an ISPCR50 to amplify RNA, as well as biotin pulldown and nested PCR to further enrich gRNA. Finally, we ensured trimodality integrity, confirming that tagmentation was stopped before reverse transcription, to avoid tagmenting the RNA-DNA heteroduplex58 (FIG. 5J), and that conditions preserved each species (FIG. 5K). Agarose gels in FIG. 5A - FIG. 5J are 1-2% with 1 kb Plus DNA ladder (NEB N3200L) unless otherwise noted. For cost estimates, we used the method’s calculated cost when provided or estimated it based on major cost drivers (e.g. 10X Genomics Kits). Sequencing cost was not included in these estimates.
Read alignment and pre-processing
For alignment and pre-processing (FIG. 6A), we demultiplexed reads using bcl2fastq (version 2.20) with FASTQs for index reads. Reads were then trimmed with cutadapt44 (version 4.0) to extract barcode 1 (well barcode), barcode 2 (droplet barcode), ATAC reads, mRNA reads, gRNA reads, and UMIs based on position, then aligned separately (FIG. 6B). Barcodes and gRNA spacers were aligned with bowtie45 (version 1.1.2) with the settings -v 2 -m 1 --norc --best --strata. The barcode 1 reference was derived from oligonucleotide sequences and the barcode 2 reference was constructed from the whitelist provided by cellranger-atac (10X Genomics). ATAC reads were aligned with bowtie259 (version 2.5.1) with default parameters to the joint human (hg38) and mouse (mm10) genome reference provided by 10X Genomics. Open chromatin peaks were called using macs260 callpeak (version 2.2.7.1) with the parameters -f BED -g hs -p 0.01 --nomodel --shift 37 --extsize 73 -B --SPMR --keep-dup all --call-summits, then reads were assigned to peaks based on loci with bedtools window (version 2.30.0) with a 100 bp window around the start position. mRNA reads were aligned with STAR61 (version 2.7.3a) using the settings --quantMode GeneCounts --soloFeatures GeneFull_Ex50pAS, then annotated with subread62 featureCounts (version 2.0.4) using a joint human (hg38) and mouse (mm10) gtf (10X Genomics, 2020-A) with the settings -t gene -R SAM. Aligned reads were then joined to create a list of cell barcodes (barcode 1 and barcode 2), unique molecular identifiers (UMIs) if applicable, and aligned/annotated reads. These were then deduplicated using awk based on barcode, UMI, and position, then reformatted as a counts matrix using DropletUtils63. For barcode collision rate calculations, we defined a collision in any modality as having <66% of the primary species. Each modality was evaluated independently using the same threshold. Cells with at least 500 RNA or ATAC fragments were considered for barcode collision analysis.
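The barcode collision criterion stated above (a cell barcode is a collision when fewer than 66% of its fragments come from the primary species, considering only barcodes with at least 500 fragments) can be sketched for a single modality as follows. The function names and the per-barcode (human, mouse) count representation are illustrative assumptions, not part of the described pipeline.

```python
def classify_barcode(human, mouse, min_fragments=500, purity=0.66):
    """Classify one cell barcode from its human and mouse fragment counts.

    Returns "excluded" (too few fragments), "collision" (primary species
    below the purity threshold), or the primary species name.
    """
    total = human + mouse
    if total < min_fragments:
        return "excluded"
    primary_fraction = max(human, mouse) / total
    if primary_fraction < purity:
        return "collision"
    return "human" if human >= mouse else "mouse"


def collision_rate(barcode_counts):
    """Fraction of evaluated (non-excluded) barcodes called as collisions."""
    calls = [classify_barcode(h, m) for h, m in barcode_counts]
    evaluated = [c for c in calls if c != "excluded"]
    if not evaluated:
        return 0.0
    return evaluated.count("collision") / len(evaluated)
```

Because each modality is evaluated independently in the text, the same function would be applied separately to ATAC and RNA counts.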
Guide RNA assignment
Perturbed cells were separated (pseudo-bulk) by perturbation and compared to published transcriptomic28 and accessible chromatin34, 35 atlases by computing the Pearson correlation across the top 1000 highly variable genes or peaks. Correlations were computed between each perturbation-specific pseudo-bulk and previously published primary tissue gene expression or open chromatin. For all correlations and differentiation scores, we only used cells with at least 200 fragments per cell and perturbations with at least 100 cells captured.
For analysis of MultiPerturb-seq gene expression, we first identified highly variable genes (HVGs). We defined HVGs as those genes with the largest standard deviation across cerebrum samples (n = 53 samples from 4 weeks post-conception [wpc] to adulthood with 1-4 donors per developmental stage for that tissue). To compute correlations between MPS and the transcriptomic developmental atlas at specific timepoints, we take the Pearson correlations using only the top 1000 HVGs.
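The selection of highly variable genes and the restricted Pearson correlation described above can be sketched in plain Python. The input layout (a dict from gene name to a list of values across atlas samples), the function names, and the use of the population standard deviation are assumptions for illustration; with equal sample numbers per gene, the ranking is the same for sample or population standard deviation.

```python
import math
import statistics


def top_variable_features(atlas, n_top=1000):
    """Return the n_top feature names with the largest standard deviation.

    atlas: dict mapping feature name -> list of values across atlas samples.
    """
    ranked = sorted(atlas, key=lambda f: statistics.pstdev(atlas[f]), reverse=True)
    return ranked[:n_top]


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlate_on_features(pseudobulk, atlas_sample, features):
    """Correlate a perturbation pseudo-bulk with one atlas sample on chosen features.

    pseudobulk, atlas_sample: dicts mapping feature name -> expression value.
    """
    x = [pseudobulk[f] for f in features]
    y = [atlas_sample[f] for f in features]
    return pearson(x, y)
```

The same two steps apply to the open chromatin analysis if promoter-adjacent peaks are used as the features instead of genes.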
For analysis of MultiPerturb-seq open chromatin, we first identified highly variable promoter-adjacent peaks (HVPPs). We defined HVPPs as those peaks within 2 kb of a protein-coding gene transcription start site with the greatest standard deviation over a unified sample of the MPS ATAC-seq dataset (n = 11 perturbation pseudo-bulk samples) and accessible chromatin pre- or postnatal primary tissues (n = 8 prenatal samples of different brain cell types and 1 postnatal sample from frontal cortex). To compute correlations between MPS and the accessible chromatin developmental atlases, we take the Pearson correlations using only the top 1000 HVPPs.
We computed normalized differentiation scores for either gene expression or open chromatin by taking the difference between correlations (Pearson) with late (postnatal) timepoints and early (prenatal) timepoints to identify those perturbations that increased similarity to mature tissues. This difference was computed using the mean of the correlations over each post- or pre-natal timepoint. That is, we computed one mean correlation across timepoints prenatal and one mean correlation across timepoints postnatal, normalized each mean correlation, and then took the difference between these normalized means. For the normalization (over perturbations), for each stage (pre-natal or post-natal), we computed maximum and minimum values over perturbations and then assigned each perturbation i a normalized value r_i,norm = (r_i - min(r)) / (max(r) - min(r)).
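The differentiation score described above (mean correlation per stage, min-max normalization over perturbations within each stage, then postnatal minus prenatal) can be sketched as follows. The input layout and function names are illustrative assumptions; the example values in the usage note are made up and are not results from the examples.

```python
import statistics


def min_max_normalize(values):
    """Min-max normalize a dict of perturbation -> value over perturbations."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        raise ValueError("all perturbations have the same value; cannot normalize")
    return {p: (v - lo) / (hi - lo) for p, v in values.items()}


def differentiation_scores(prenatal, postnatal):
    """Normalized postnatal-minus-prenatal similarity per perturbation.

    prenatal, postnatal: dicts mapping perturbation -> list of Pearson
    correlations, one per atlas timepoint in that stage.
    """
    mean_pre = {p: statistics.fmean(r) for p, r in prenatal.items()}
    mean_post = {p: statistics.fmean(r) for p, r in postnatal.items()}
    norm_pre = min_max_normalize(mean_pre)
    norm_post = min_max_normalize(mean_post)
    return {p: norm_post[p] - norm_pre[p] for p in norm_pre}
```

For instance, with hypothetical inputs `prenatal = {"NT": [0.8, 0.6], "G1": [0.5, 0.3]}` and `postnatal = {"NT": [0.2], "G1": [0.6]}`, the perturbation `G1` receives the highest score because it is least similar to prenatal tissue and most similar to postnatal tissue. Scores range from -1 to 1 by construction.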
Differentially expressed genes, peaks, and signatures
In order to identify differentially expressed genes and peaks, we used SCEPTRE65, a nonparametric tool that resamples perturbations to infer associations with gene expression65 with features per cell and counts per cell as covariates. We included barcodes with at least 100 fragments as cells and genes with at least 10 cells captured (n = 106,424 cells). We also applied SCEPTRE to other analyses beyond gene expression, such as the ATAC nearest gene (any distance), ATAC TSS (+/-2kb), and RNA transcription factor signatures from msigdb. Gene Ontology (GO) enrichment analyses were performed using clusterProfiler enrichGO (version 4.6.2).
CROP-Multiome
We recloned our epigenomic remodeler library into CROP-seq-opti66 (Addgene 106280), a vector that places the guide RNA within a polyadenylated mRNA transcript, thus allowing capture by the 3’ polyA tail19. We then transduced the same BT16 clone expressing CRISPRi-v2 with the CROP-seq library, and prepared snATAC-seq and snRNA-seq libraries using the 10X Multiome kit (10X Genomics 1000285). Library cloning, virus production, titration, transduction, and selection were performed as described above for MultiPerturb-seq. We loaded 10,000 cells on one 10X Multiome lane, per manufacturer’s instructions. In brief, four days after infection, 200,000 cells (80% BT16 cells and 20% NIH-3T3) were trypsinized, washed, and lysed in 500 µl chilled lysis buffer (10X Genomics) with 12.5 µl Ribolock RNase inhibitor (ThermoFisher EO0381). Cells were washed 3 times with 1 mL wash buffer (10X Genomics) with 12.5 µl Ribolock, and 16,100 cells were resuspended in 10 µl transposition mix (10X Genomics) and incubated for 60 minutes at 37°C. Following tagmentation, the mix was loaded on the GEM chip as instructed and run on the Chromium Controller X (10X Genomics). Following incubation, 5 µl quenching agent was added to stop the reaction before proceeding to post-GEM cleanup and library preparation per the manufacturer’s instructions (10X Genomics). Samples were sequenced on the Illumina NovaSeq 6000 v1.5 platform with 34 bp index 1, 24 bp index 2, and 125 bp read 1 and 2, and counts matrices were generated with cellranger-arc (version 2.0.2, 10X Genomics). Polyadenylated guide RNA identities were aligned with bowtie and joined with barcodes as described above for MultiPerturb-seq with the barcode whitelist provided with cellranger-arc.
CUT&RUN
For CUT&RUN, we used the CUTANA ChIC/CUT&RUN Kit (EpiCypher 14-1048) with antibodies against H2A.Z (Abcam ab4174), H3K4me3 (EpiCypher 14-1048), and IgG (EpiCypher 14-1048).
BT16 cells were transduced with a ZNHIT1-targeting or a nontargeting (negative) control guide RNA (n = 2 biological replicate transductions per guide RNA). Two days later, cells were lifted and selected with 1 µg/ml puromycin. An in-line control was used to ensure complete selection. Five days following transduction, cells were collected for CUT&RUN. 500,000 cells were used per condition. Cells were lifted, washed, and bound to 10 µl activated Concanavalin A-conjugated paramagnetic beads (EpiCypher), then resuspended with 0.5 µg of the antibody of interest and incubated overnight at 4°C on a rotator. The next day, the beads were washed twice with permeabilization buffer and incubated with 2.5 µl pAG-MNase (EpiCypher) for 10 minutes at room temperature. Following binding, the beads were washed, and 2 mM CaCl2 was added to begin digestion. Digestion was allowed to proceed for 2 hours at 4°C, then the reaction was terminated by adding 33 µl Stop Buffer (EpiCypher) and incubating the reactions at 37°C for 10 minutes. We used a 0.5 ng E. coli DNA (EpiCypher 18-1401) spike-in to normalize samples. DNA was purified with spin columns provided (EpiCypher). Libraries were then prepared using the NEBNext Ultra II DNA Library Prep Kit (New England Biolabs E7645S), pooled, and sequenced on an Illumina NextSeq 500 with 2 x 75 bp paired-end reads.
Coordinates (chromosome, start, end) and peak pileups (height at peak summit) from macs2 outputs were used for further analysis. Scaling factors were calculated based on the percent of uniquely aligned reads from the E. coli spike-in alignment out of the total uniquely aligned reads (human and bacterial). Peak pileups were adjusted by the scaling factor. When combining biological replicates, we sought to only consider peaks that were reproducibly present between replicates. To do this, peaks from each biological replicate were intersected. Overlapping peaks with peak heights within 50% of each other (between replicates) were kept for further analysis and termed reproducible peaks. For each reproducible peak, we randomly chose it from either biological replicate to avoid issues with averaging or peak merging, which may alter peak shape. To compare changes in peak strength/height, reproducible peaks from each biological condition (ZNHIT1 or non-targeting gRNA) were then intersected and this consensus set was used for downstream analysis. Gene ontology (GO) enrichment was computed using clusterProfiler enrichGO (version 4.6.2).
For visualization, E. coli-normalized bigwig files were created using deeptools68 bamCoverage (version 3.4.2) with the options --scaleFactor --extendReads --binSize 10. Heatmaps were generated using deeptools computeMatrix reference-point with the parameters --referencePoint center -a 3000 -b 3000 -p 8 --skipZeros --sortRegions descend --sortUsing mean and the blacklist file ENCODE blacklist v2 for hg3869 as --blackListFileName to filter out reads aligning to problematic genome regions. CUT&RUN signal was computed using deeptools multiBigwigSummary with transcription start site coordinates and the blacklist as above. For the pileup visualization, only one replicate per biological condition is shown.
Arrayed validation
For arrayed validation, BT16, BT12, and/or CHLA06 cells with lentiCRISPRi(v2)-Blast were transduced with guide RNAs (gRNAs) in lentiGuideFE-Puro (Addgene 170069). The gRNAs were designed using the Dolcetto CRISPRi library and CRISPick43 then synthesized by Integrated DNA Technologies (IDT). The backbone was digested with BsmBI (ThermoFisher FD0454) and oligos were annealed, phosphorylated and ligated into the lentiGuideFE-Puro backbone. Lentivirus was produced as described in Lentivirus production above (scaled to 6-well format) and stored at -80°C. For arrayed validations, sufficient lentivirus was added to the cells to achieve 20 - 50% cell transduction. After 48 hours, cells were replated in media with puromycin (1 µg/ml) and selected for 3 days.
SOX2 staining and flow cytometry
To label and quantify SOX2+ cells, cells were lifted, washed, and stained with LIVE/DEAD Violet (Thermo Fisher L34963) (diluted 1:400, 15 µl for 1 × 10^6 cells) for 5 minutes at room temperature, then washed with PBS and fixed with 1% formaldehyde and incubated at room temperature for 10 minutes on rotator (Thermo Scientific Digital Tube Revolver 88881101)58. Following fixation, they were quenched with 0.125 M glycine (by addition of 2.5 M glycine), washed with PBS, and lysed with 100 µl of a previously optimized lysis buffer58 (10 mM Tris-HCl pH 7.5, 10 mM NaCl, 3 mM MgCl2, 0.1% NP-40, 1% BSA) on ice for 5 minutes. Then they were washed with 1 ml wash buffer (same as lysis without NP-40) and blocked in 1 ml PBS with 3% BSA for 30 minutes at room temperature. Following blocking, they were washed and resuspended in 100 µl PBS-3% BSA and antibody (1 µg for 5 × 10^6 cells, anti-SOX2 Biolegend 656104) for 60 minutes at room temperature. They were then washed twice more (PBS with 3% BSA and 1% Tween) and resuspended in PBS with 1% BSA and 2 mM EDTA for flow cytometry on the flow cytometer (Sony SH800).
Immunofluorescence
Cells were plated in 96-well plates with 5,000 cells per well in triplicate. The next day, media was aspirated, and cells were washed and fixed with 4% paraformaldehyde (diluted 1:4 from 16%, Electron Microscopy Sciences 15710-S) for 15 minutes, and washed with PBS. Cells were then permeabilized with 0.2% Tween-20 for 5 minutes and blocked with PBS with 0.2% Tween-20 and 3% BSA for 1 hour. Cells were then incubated with primary antibodies: TUJ1 at a 1:1000 dilution (BioLegend 801201), MAP2 at a 1:500 dilution (SYSY 188004), or ATOH8 (ThermoFisher PA5-65024) at a 1:400 dilution overnight at 4°C. The following day, cells were washed three times for 5 minutes with cold PBS. The corresponding secondary antibody was added at a 1:800 dilution (ThermoFisher A-21202 for TUJ1 (mouse), ThermoFisher A-11073 for MAP2 (guinea pig), ThermoFisher 31572 for ATOH8 (rabbit)) with 2mM Hoechst (Sigma B2261) and incubated for 1 hour at room temperature. Cells were then washed with PBS for an additional 3 washes. All steps were performed at room temperature on a rocker unless otherwise noted. Images were acquired with a 20X objective using an epifluorescence microscope (Keyence BZ-X800). Five images were acquired per well.
Quantitative image analysis was run in CellProfiler (version 4.2). Cells were identified based on Hoechst staining and intensity was quantified using the ClassifyObjects module. ATOH8 signal (nuclear) was quantified using integrated intensity (sum) per cell/object, while TUJ1 and MAP2 signal (cytoplasmic) were quantified using mean intensity per cell/object. Normalization was performed to the median intensity of cells/objects receiving non-targeting (NT) gRNAs. Cells/objects with an assigned intensity (integrated or mean depending on the protein) greater than 3 standard deviations from the NT mean were excluded as fluorescent debris.
EdU incorporation
Cells were labeled with 5-ethynyl-2’-deoxyuridine (EdU) using the Click-iT EdU Cell Proliferation Kit (Thermo Fisher C10337). 2,000 cells/well were plated on 96 well plates in triplicate. Cells were incubated with 10 mM EdU for 30 minutes, fixed with 4% PFA for 15 minutes, and permeabilized with 0.5% Triton X-100 for 10 minutes at room temperature. Cells were then washed and incubated with the Click-iT reaction cocktail for 30 minutes. As a positive control, untransduced BT16 cells were exposed to 1 µM doxorubicin (MedChemExpress HY-15142) to inhibit proliferation and EdU incorporation. After EdU staining, nuclei were stained with 2mM Hoechst 33342 (Sigma 4533) for 15 minutes, washed with PBS, and images were acquired with a 20X objective using an epifluorescence microscope (Keyence BZ-X800). The images were processed for display using FIJI (version 2.1.0) and quantitative image analysis was run in CellProfiler (version 4.2). Cells were quantified based on Hoechst staining and binned into EdU positive and EdU negative cells based on the intensity of the signal, using the ClassifyObjects module.
Cell cycle analysis with propidium iodide
Cells were pelleted in 1.5 mL tubes, washed once with 1 mL PBS, and resuspended well in 300 µl PBS. 700 µl of ice cold 100% ethanol was added to fix cells at a final concentration of 70%. Fixed cells were then incubated on ice at 4°C overnight. For propidium iodide (PI) staining, cells were spun down at 1000g for 4 minutes and ethanol was removed. Cells were washed with 1 mL PBS and stained with 0.5 mL FxCycle PI/RNAse solution (ThermoFisher F10797) per 1 million cells. Pellets were resuspended and incubated for 15 minutes at room temperature before being resuspended for flow cytometry (Sony SH800 or MACS Quant 10). Sequential gating was performed as follows: exclusion of debris on the basis of forward (FSC-A) and side (SSC-H) scatter cell parameters followed by gating on singlets with FSC-A versus FSC-H. The cell cycle profile was modeled, and gates were generated based on the PI-A signal of the cell population by FlowJo using a Watson model.
Example 2: MultiPerturb-seq protocol
See FIG. 6A for an experimental overview.
Part 1: Nuclei isolation, tagmentation, and reverse transcription
1. Mix 1 µl barcoded MEDS (1:1 MEDS-A:MEDS-B, 50 nM) with 6 µl diluted Tn (0.5 mg/ml), vortex, and incubate at RT for 30’, then store at -20°C. To prepare one reaction plate, plate 1 µl loaded transposomes per well for the number of desired wells (e.g. 48 wells). Of note, although we used standard unsalted oligonucleotides (Integrated DNA Technologies), we found that HPLC-purified oligonucleotides can lead to increased fragments captured per cell.
2. Trypsinize, filter, and collect 2.4m BT16 cells and 600k 3T3 cells (3e6 total), then lyse with 1 mL Omni lysis buffer 10’ on ice. In the meantime, prepare tagmentation mix. Spin down, pool, and resuspend nuclei in PBS for tagmentation. Add 19 µl of master mix to 1 µl of barcoded Tn in plate format (e.g. 48-wells). Incubate for 30’ at 37°C in tagmentation mix while shaking at 350 rpm. Add 1 µl 126 mM EDTA (to a final concentration of 6 mM) to each well and mix to stop tagmentation. Then add 50 µl PBS on top and spin down at 400 rcf for 4’ at 4°C. Remove 53 µl, leaving 17 µl and nuclei pellet undisrupted. In the meantime, prepare barcoded RT reagent plate to match the ATAC plate. Prepare the following RT master mix, and add 17 µl to each well of nuclei, then add 4 µl of the barcoded TSO from the TSO plate. Perform reverse transcription as below in a thermocycler:
9. Resuspend well, triturating with a narrow tip (may melt to narrow gauge) and pool all wells into 2 x 1.5 mL tubes. Spin down and re-pool in a 1.5 mL tube.
10. Count nuclei twice and take the average count. Resuspend nuclei in diluted nuclei buffer.
Theoretically, 3e6 nuclei should be resuspended in 156.863 µl for 19.125k cells/µl, of which 8 µl nuclei stock should be used:
8 µl (volume of nuclei stock) = [100,000 (targeted nuclei recovery) x 1.53] / [nuclei stock concentration (nuclei/µl)]
11. Resuspend in 15 µl, mixing extremely well.
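The two volume calculations in Part 1 (the EDTA quench after tagmentation and the nuclei loading volume in step 10) can be checked with a short script. This is a minimal sketch: the function names are hypothetical, and the 1.53 factor is taken as given from the formula in step 10, since the text does not state what it corrects for.

```python
def final_concentration_mm(stock_mm, stock_ul, sample_ul):
    """Concentration (mM) after adding stock_ul of stock to sample_ul of sample."""
    return stock_mm * stock_ul / (stock_ul + sample_ul)


def nuclei_stock_volume_ul(target_nuclei, stock_nuclei_per_ul, factor=1.53):
    """Volume of nuclei stock to load, following the step 10 formula.

    `factor` is the 1.53 multiplier from the protocol (its meaning is not
    stated there, so it is treated as an opaque constant here).
    """
    return target_nuclei * factor / stock_nuclei_per_ul


# EDTA quench: 1 µl of 126 mM EDTA into a 20 µl reaction
# (19 µl master mix + 1 µl barcoded Tn).
edta_mm = final_concentration_mm(126.0, 1.0, 20.0)

# Theoretical stock: 3e6 nuclei resuspended in 156.863 µl.
stock_per_ul = 3e6 / 156.863
load_ul = nuclei_stock_volume_ul(100_000, stock_per_ul)

print(f"EDTA final: {edta_mm:.2f} mM")
print(f"Stock: {stock_per_ul:.0f} nuclei/µl, load {load_ul:.2f} µl")
```

In practice the stock concentration would come from the averaged counts in step 10 rather than from the theoretical yield.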
Part 2: 10X ATAC GEM generation, barcoding, and cleanup
12. Please refer to the 10X ATAC manual pages 24-34. In brief:
a. Prepare master mix + nuclei on ice.
b. Assemble Chromium Next GEM Chip H.
i. Dispense 50% glycerol into unused chip wells.
ii. Load row 1 with master mix + nuclei.
v. Attach 10X Gasket, aligning the notch with the top left-hand corner. Do not touch the smooth surface.
c. Run the Chromium Controller X/iX.
d. Remove gasket and open the chip holder, folding the lid back until it clicks at 45°. Slowly aspirate 100 µl GEMs from the lowest points of the recovery wells in row 3, then slowly (over 20") transfer to a PCR tube.
GEM incubation
*Substitute 15 cycles for 12 cycles during linear amplification.
** Store at 15°C for up to 18h or at -20°C for up to 1 week.
Post GEM incubation cleanup with Dynabeads. Stop before proceeding to SPRIselect. Split into ATAC (20 µl) and RNA (20 µl) libraries.
Part 3: Library preparation
ATAC library
16. SPRI cleanup (1.2X) -> resuspend in 20 µl H2O for PCR
17. Library PCR:
18. Double-sided SPRI cleanup (0.45X, 1.8X)
19. QC and sequence
ISPCR
20. ExoSAP: add 8 µl ExoSAP for 20 µl sample. Incubate in thermocycler:
21. ISPCR as follows:
22. SPRI cleanup (0.6X SPRI to split) -> resuspend in 10 µl H2O. This is your mRNA fraction.
23. Set aside supernatant and add 1.4X SPRI (2.0X SPRI total) -> resuspend in 10 µl H2O. This is your gRNA fraction.
mRNA library
24. Load Tn with MEDS-B for mRNA-derived cDNA (primer 426).
25. Perform tagmentation as follows:
26. Place in incubator for 5’ at 55°C.
27. Clean with Zymo DNA Clean & Concentrator-5, Capped Columns (Cat. D4014) per the manufacturer’s instructions. Elute in 33.5 µl for PCR.
28. mRNA library PCR as follows:
gRNA library
29. ExoSAP: add 4 µl ExoSAP to 10 µl sample. Incubate in thermocycler:
30. Intermediate PCR with biotinylated guide scaffold primer as follows:
31. SPRI (1.6X) -> resuspend in 10 µl H2O.
32. ExoSAP: add 4 µl ExoSAP to 10 µl sample. Incubate in thermocycler:
33. Perform biotin pulldown with Dynal MyOne Dynabeads Streptavidin C1 (Thermo Fisher cat. 65001).
In brief:
a. Prepare 25 µl beads by washing 3X with 1 ml B&W buffer per wash. Place on magnet 1’ between washes.
b. Resuspend in 50 µl 2X B&W buffer and incubate with 50 µl sample (14 µl + 36 µl H2O).
c. Incubate 15’ RT on rocker.
d. Let sit on magnet for 3’.
e. Wash 2X with 200 µl B&W buffer, followed by 2X washes with H2O.
f. Resuspend beads in 45 µl H2O.
34. Perform inner (guide library) PCR as follows:
Recipes
Transposome annealing
1. Dissolve MEDS in oligo annealing buffer to a final concentration of 100 µM. Mix 100 µl top + 100 µl bottom oligo for both MEDS A and MEDS B.
2. Anneal in thermocycler:
Oligo annealing buffer
Omni-based Lysis Buffer
5X TD-TAPS
Dilution buffer
ExoSAP Wash Buffer
2X B&W buffer
Example 3: Pooled CRISPR screens with joint single-cell chromatin accessibility and transcriptome profiling
We developed a high-throughput CRISPR screening platform with joint single-nucleus chromatin accessibility, transcriptome, and guide RNA capture (MultiPerturb-seq), using combinatorial indexing combined with droplet microfluidics to scale throughput and integrate all three modalities. We then applied MultiPerturb-seq to identify key genes whose loss can trigger differentiation in a rare pediatric cancer, atypical teratoid/rhabdoid tumor (AT/RT), driven by loss of the SWI/SNF chromatin remodeling subunit SMARCB1. We profiled ~100,000 gene-perturbed cells to identify ZNHIT1 as a potential target for AT/RT cancer reprogramming therapy.
MultiPerturb-seq links pooled CRISPR perturbations with single-cell open chromatin (ATAC-sequencing) and gene expression (RNA-sequencing) profiles at scale (FIG. 1A, FIG. 2). We then apply this method to drive mechanism-based discovery of differentiation regulators for a rare pediatric brain cancer, atypical teratoid/rhabdoid tumor (AT/RT). While cancer reprogramming therapy (i.e. differentiation therapy) has been curative for patients with some malignancies (e.g. acute promyelocytic leukemia [10]), success has been limited in other cancers due to a lack of high-throughput methods to identify reprogramming targets. In MultiPerturb-seq, open chromatin provides a broad overview of epigenetic state, capturing many levels of gene regulation, while gene expression provides a robust view of cell state and developmental stage; together, they link CRISPR perturbations with cell states and putative mechanisms of action for transcriptional reprogramming. We also sought to reduce reagent cost and labor: Recent genome-wide single-cell perturbation screens have required ~100 lanes of commercial single-cell library preparation kits [2]. In MultiPerturb-seq, we combine combinatorial indexing and droplet microfluidics to scale throughput [11-13], loading 100,000 cells on a single 10X Chromium ATAC lane, which results in significant cost advantages over existing uni- and multimodal single-cell perturbation approaches (FIG. 1B).
After cloning CRISPR guide RNA (gRNA) libraries into lentiviral vectors and producing virus, we transduced mammalian cells that already express a second-generation CRISPR repressor [14] at a low multiplicity-of-infection (~0.05) to achieve one guide per cell and selected using puromycin for cells receiving a CRISPR perturbation (FIG. 3A - FIG. 3D). We waited 7 days to ensure sufficient time for protein depletion and then collected cells for MultiPerturb-seq library preparation (FIG. 4). After nuclei isolation and distribution into wells, we tagmented open chromatin using barcoded transposomes (FIG. 5A and FIG. 5B) [6]. Next, we performed reverse transcription with a mix of poly-dT and CRISPR gRNA-specific primers and barcoded template switch oligonucleotides (TSO) with matching barcodes (FIG. 5C - FIG. 5F). We then pooled cells for second-round barcoding via droplet microfluidics using 10X Genomics ATAC kit gel beads. Lastly, ATAC, RNA, and CRISPR gRNA libraries were amplified and prepared for sequencing (FIG. 1C, FIG. 5G - FIG. 5J, FIG. 6A - FIG. 6B).
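The rationale for a low multiplicity-of-infection can be illustrated with a short calculation. The sketch below assumes that the number of lentiviral integrations per cell follows a Poisson distribution with mean equal to the MOI; this is a common modeling assumption and is not stated in the text above. The function names are hypothetical.

```python
import math


def poisson_pmf(k, moi):
    """Probability of exactly k integrations per cell under a Poisson model."""
    return math.exp(-moi) * moi ** k / math.factorial(k)


def single_guide_fraction(moi):
    """Among transduced cells (>= 1 integration), fraction with exactly one."""
    p_zero = poisson_pmf(0, moi)
    p_one = poisson_pmf(1, moi)
    return p_one / (1.0 - p_zero)


moi = 0.05
transduced = 1.0 - poisson_pmf(0, moi)   # cells surviving puromycin selection
single = single_guide_fraction(moi)      # of those, cells with one guide

print(f"transduced: {transduced:.2%}")
print(f"single guide among transduced: {single:.2%}")
```

Under this model, roughly 4.9% of cells are transduced at an MOI of 0.05, and about 97.5% of those carry a single integration, which is the trade-off between library coverage and guide multiplicity that the low MOI is chosen for.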
To quantify single-cell isolation in MultiPerturb-seq, we performed a species-mixing experiment with 80% human (BT16) and 20% mouse (3T3) cells, and robustly captured ATAC, RNA, and gRNA molecules (FIG. 1D - FIG. 1G, FIG. 7A - FIG. 7D). We quantified the percent of barcode combinations which contained a mixture of mouse and human fragments (collisions in cell assignment) for each of the three modalities captured. We achieved low barcode collision rates for RNA (6.2%), ATAC (11.6%) and gRNA (6.6%) libraries, despite loading ~10-fold more cells than the standard for the 10X Genomics ATAC kit. We achieved robust detection of expressed genes, open chromatin peaks, and gRNAs (FIG. 1H, FIG. 7E - FIG. 7H). For the ATAC, we observed characteristic open chromatin enrichment around transcriptional start sites (FIG. 1I, FIG. 7E) and, for the RNA, we found low mitochondrial reads (FIG. 7F). The majority of cells only had one gRNA detected and decreased expression of the targeted gene when compared to cells receiving a non-targeting gRNA: 78% of high-quality cells were assigned gRNA identities (FIG. 1J, FIG. 7G, FIG. 7H). Notably, this does not require the use of any modified CRISPR plasmids or specialized bead oligonucleotides. We also found similar or better RNA and ATAC capture compared to other single-cell RNA-seq and single-cell ATAC-seq technologies, including increased unique molecular identifiers (UMIs) and genes per cell (FIG. 7I - FIG. 7L), as well as increased ATAC fragments and peaks per cell (FIG. 7M - FIG. 7P) [6, 11, 15-18].
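The collision measurement in a species-mixing experiment can be sketched as follows. This is an illustrative outline only: the 90% purity threshold, the data layout, and all names are assumptions, since the text above does not specify how a barcode combination was called mixed.

```python
from collections import namedtuple

# Per-barcode-combination counts of fragments (or UMIs) aligned to each genome.
BarcodeCounts = namedtuple("BarcodeCounts", ["human", "mouse"])


def classify(counts, purity=0.9):
    """Label one barcode combination as human, mouse, mixed, or empty.

    `purity` is an assumed threshold: a barcode is assigned to a species
    when at least this fraction of its counts comes from that genome.
    """
    total = counts.human + counts.mouse
    if total == 0:
        return "empty"
    frac_human = counts.human / total
    if frac_human >= purity:
        return "human"
    if frac_human <= 1.0 - purity:
        return "mouse"
    return "mixed"


def collision_rate(barcodes, purity=0.9):
    """Fraction of non-empty barcode combinations classified as mixed."""
    labels = [classify(c, purity) for c in barcodes.values()]
    called = [label for label in labels if label != "empty"]
    if not called:
        return 0.0
    return called.count("mixed") / len(called)


# Toy data keyed by (first-round well barcode, second-round droplet barcode).
example = {
    ("well01", "dropletA"): BarcodeCounts(human=950, mouse=10),
    ("well02", "dropletA"): BarcodeCounts(human=5, mouse=400),
    ("well03", "dropletB"): BarcodeCounts(human=300, mouse=280),
    ("well04", "dropletC"): BarcodeCounts(human=1200, mouse=30),
}
rate = collision_rate(example)
print(f"mixed-species barcode combinations: {rate:.1%}")
```

A mixed-species rate computed this way only counts human-mouse collisions; collisions between two cells of the same species are not visible in the readout.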
Though it is not compatible with barcoded superloading, we also utilized the 10X Multiome kit and the specialized guide RNA plasmid, CROP-seq [19], as an alternate method of multi-modal capture and performed a lower-throughput version of a multiomic CRISPR screen (~10,000 vs. ~100,000 cells per lane) (FIG. 8A), which we termed CROP-Multiome. Reassuringly, gene expression changes after perturbation were highly correlated between MultiPerturb-seq and CROP-Multiome, supporting the validity of the results on both platforms (FIG. 8B - FIG. 8E). However, MultiPerturb-seq outperformed CROP-Multiome along several important dimensions, including better gRNA capture (FIG. 8F) and higher RNA UMIs per cell (FIG. 8G), RNA genes per cell (FIG. 8H), ATAC fragments (FIG. 8I), and ATAC peaks per cell (FIG. 8J). Given these differences and the additional advantages of 10-fold increased cell loading, direct guide RNA capture without a specialized plasmid, and 5’ capture, we used the MultiPerturb-seq data for all subsequent analyses.
The combination of ATAC and RNA modalities allowed us to detect perturbation-linked changes in open chromatin and gene expression. Despite the sparsity of the single-cell data, we were able to see clear patterns when examining individual genes and groups of genes with shared function. For example, after knockdown of histone methyltransferases (DOT1L, EHMT2, KDM1A, KDM6A, KMT2B, KMT2D, MECOM, MLLT1, PRDM16, PRMT5, SETD2, SETD5, SETDB1, SUV39H2), we found increases in open chromatin at the RFX3 locus and increased RFX3 gene expression (FIG. 1K). We were also able to identify perturbation-specific changes: After knockdown of histone variant H3F3A, we found the opposite at the PPM1B locus, where we observed decreased chromatin accessibility and expression of PPM1B (FIG. 1L).
We next sought to apply MultiPerturb-seq to a rare pediatric central nervous system cancer, AT/RT, which is driven by a change in chromatin remodeling. In AT/RT, biallelic loss of SMARCB1 (an essential subunit of the SWI/SNF chromatin remodeling complex, which is one of the most commonly mutated protein complexes in cancer [20]) prevents complete differentiation of progenitors and drives tumor proliferation [21]. AT/RT is extremely aggressive, and no AT/RT-specific therapies are available: The current standard-of-care is high dose radiation and chemotherapy with autologous stem cell transplant [22]. Despite these intensive (and toxic) therapies, the disease is still nearly uniformly fatal with a median overall survival of four years [22]. Due to the loss of SMARCB1, AT/RT are dependent on alternate epigenetic regulators, such as polycomb [23-25], and SMARCB1-null embryonic stem cell models fail to differentiate into neurons due to altered gene regulation [26]. Therefore, using MultiPerturb-seq, we targeted ~100 epigenetic remodelers in human AT/RT cells (BT16) and sought to discover whether knockdown of specific remodelers can ameliorate the dysfunctional epigenome in AT/RT and restore differentiation (FIG. 9A).
Because AT/RT may arise from a variety of lineages, including non-neural lineages [27], we first compared the MultiPerturb-seq transcriptomes to reference developmental and adult atlases of multiple human tissues [28] (cortex, cerebellum, kidney, ovary, testis, and liver) and found the highest overall similarity with brain cortical tissue. To assess the impact of perturbations on differentiation, we measured the correlation in transcriptomic profiles between gene-perturbed cells and primary tissues from different brain developmental stages (FIG. 9B). Compared to negative control (non-targeting) perturbations, we found a subset of perturbations, such as ZNHIT1, CTCF, GATAD2B, and others, with transcriptomes that had greater similarity to late brain stages than to early ones. These tended to express higher levels of genes correlated with neural differentiation such as CCND3 [29], GPM6B [30], and SYNJ2 [31, 32] (FIG. 10).
The chromatin landscape in AT/RT is unusual with broad changes due to loss of SMARCB1, where residual SWI/SNF complexes cannot maintain accessibility to enhancers needed for differentiation [33]. To further focus our analysis, we leveraged the multimodal nature of our assay to find epigenetic remodeler perturbations that may help normalize the AT/RT chromatin landscape (FIG. 9C). Using recent ATAC-seq atlases from primary fetal [34] and adult [35] brain tissues, we sought to identify perturbations resulting in open chromatin profiles with greater correlation to mature brain tissue, and found that perturbations of ZNHIT1, MECOM, CECR2, TRRAP, and others led to genome-wide chromatin profiles that were more similar to tissue from postnatal brain than fetal brain (FIG. 9C, FIG. 11A). We also examined ENCODE cis-regulatory elements (CREs) [36] and found that a greater number of our perturbations triggered changes in chromatin accessibility at promoters, with fewer perturbations acting at enhancers (FIG. 11B - FIG. 11F). Furthermore, when grouping target genes by complex, we found that knockdown of repressor complex (LSD-CoREST/BHC) subunits (HDAC1, HDAC2, RCOR1) tended to increase accessibility at ENCODE CREs, while knockdown of CERF complex subunits (CECR2, SMARCA1) tended to decrease accessibility (FIG. 11G).
Next, we computed differentiation scores for gene expression (RNA) and open chromatin (ATAC) that captured relative similarity to postnatal versus prenatal brain tissues (see Methods) (FIG. 9D - FIG. 9E). Interestingly, we found that RNA and ATAC differentiation scores were not always correlated (FIG. 9F). For example, we found that most perturbations of BAF complex members led to high ATAC differentiation and low RNA differentiation scores, suggesting that loss of residual BAF complexes can reshape/restore the chromatin landscape but that these perturbations are not sufficient to differentiate cells (FIG. 11H).
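One way to picture a score of "relative similarity to postnatal versus prenatal tissue" is as a difference of mean correlations. The sketch below is an assumption made for illustration: the exact definition is given in the Methods, which may differ, and the function names and toy profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def differentiation_score(profile, prenatal_refs, postnatal_refs):
    """Mean correlation to postnatal references minus mean correlation to
    prenatal references (an assumed form, not the definition in Methods)."""
    post = sum(pearson(profile, r) for r in postnatal_refs) / len(postnatal_refs)
    pre = sum(pearson(profile, r) for r in prenatal_refs) / len(prenatal_refs)
    return post - pre


# Toy pseudobulk profiles over four features (genes or ATAC peaks).
prenatal = [[9.0, 7.0, 2.0, 1.0], [8.0, 8.0, 1.0, 2.0]]
postnatal = [[1.0, 2.0, 8.0, 9.0], [2.0, 1.0, 9.0, 7.0]]
control = [8.0, 7.0, 2.0, 2.0]      # stands in for non-targeting cells
perturbed = [3.0, 2.0, 7.0, 8.0]    # stands in for one gene perturbation

score_control = differentiation_score(control, prenatal, postnatal)
score_perturbed = differentiation_score(perturbed, prenatal, postnatal)
print(f"control: {score_control:.2f}, perturbed: {score_perturbed:.2f}")
```

Computing the same quantity separately on RNA and ATAC profiles gives two scores per perturbation, which is what allows the two modalities to disagree as described above.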
After examining both differentiation scores, we identified multiple genes with high RNA and ATAC differentiation scores and subsequently focused on ZNHIT1, which was the top-ranked gene perturbation for joint ATAC and RNA differentiation score (FIG. 9F). ZNHIT1 is a subunit of the SRCAP (SNF-2 related CBP activator protein) complex, which is an INO80 family complex that mediates ATP-dependent exchange of histone H2A.Z, leading to chromatin remodeling and transcriptional modulation (FIG. 12A). ZNHIT1 has previously been shown to maintain stemness in intestinal stem cells by promoting H2A.Z incorporation [37]. ZNHIT1 knock-down induced large changes at multiple regulatory elements, including promoters and enhancers, with increased transcriptomic similarity to postnatal (and specifically adult) brain tissues (FIG. 9G, FIG. 12B - FIG. 12E). To identify potential mechanisms of action, we examined differentially accessible chromatin in ZNHIT1-perturbed cells compared to non-targeting controls. We found that ZNHIT1 perturbation led to changes in accessibility near genes involved in neuronal differentiation and axonogenesis (FIG. 12B), as well as increased expression of genes for neuron projection development, cell polarity, and cell growth (FIG. 12C).
Given the broad changes in chromatin organization and more differentiated transcriptional state upon ZNHIT1 loss, we wondered whether ZNHIT1 inhibition may be a good candidate to push AT/RT cells toward terminal differentiation. We cloned individual CRISPR guide RNAs to target ZNHIT1 and measured loss of stemness, decreased proliferation, and increased expression of differentiated neuronal markers (FIG. 13A). Using intracellular antibody labeling and flow cytometry, we found diminished SOX2 protein expression after knockdown of ZNHIT1, compared to non-targeting guide RNA controls (FIG. 13B, FIG. 12D, FIG. 12E). The central goal of an AT/RT reprogramming therapy is cessation of cellular proliferation. Because cell cycle arrest occurs during G1, preventing progression to S-phase, we evaluated the relative proportion of cells in S-phase (FIG. 13C). We examined genes classified as cell cycle markers [38] and found that ZNHIT1 perturbation led to a 19% decrease in expression of S-phase genes compared to non-targeting controls. We confirmed this by assaying changes in proliferation via incorporation of the thymidine analogue 5-ethynyl-2'-deoxyuridine (EdU) after a 30 minute pulse and found that ZNHIT1 knockdown decreased progression through S-phase by 43% relative to non-targeting controls (FIG. 13D). Perturbation of related proteins (SRCAP complex co-factor YEATS4 and H2A.Z acetylase KAT5) resulted in similar decreases in EdU incorporation, suggesting that other SRCAP members and enzymes involved in H2A.Z biogenesis are required for normal cell cycle progression (FIG. 13F, FIG. 13G).
In the MultiPerturb-seq data, we also found that target genes of the transcription factor ATOH8 had increased expression in ZNHIT1-perturbed cells (~9-fold increase), compared to cells receiving a non-targeting guide RNA (FIG. 13E). ATOH8 expression promotes neuronal differentiation and supports neuronal functions [39]. To confirm these findings, we performed immunocytochemistry for ATOH8 in ZNHIT1-perturbed cells and found that ATOH8 expression was increased (FIG. 13F). We also observed increases in early (TUJ1) and more mature (MAP2) neuronal markers in ZNHIT1-perturbed cells, further supporting a role for ZNHIT1 in AT/RT differentiation (FIG. 13G, FIG. 13H, FIG. 13I).
Given that ZNHIT1 deposits histone variant H2A.Z and acetylation of H2A.Z is a key epigenetic hallmark of many cancers [40], we also sought to characterize changes in H2A.Z in AT/RT upon ZNHIT1 loss using CUT&RUN (FIG. 13I, FIG. 14A). In ZNHIT1-perturbed cells, we found a 56% decrease in the overall number of H2A.Z-bound peaks (FIG. 13J). As a control, we also measured the promoter-associated histone modification H3K4me3 using CUT&RUN and found little change in overall peak number (FIG. 14B, FIG. 14C). For peaks present in both control and ZNHIT1 knockdown cells, we found a median 29% decrease in H2A.Z (FIG. 13K). The peaks with the largest decrease in H2A.Z after ZNHIT1 loss tended to be near genes involved in cell growth, differentiation, and development (FIG. 13L).
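The two summary statistics in this paragraph (the change in total peak number and the median signal change at shared peaks) can be sketched as below. This is a simplified illustration with hypothetical names and toy values: peaks are matched by identical string identifiers, whereas a real CUT&RUN analysis would match peaks by genomic interval overlap and use normalized signal.

```python
from statistics import median


def shared_peak_changes(control, knockdown):
    """Percent change in signal for peaks present in both conditions.

    Both arguments map a peak identifier to a signal value. Matching by
    identical keys is a simplification of interval-overlap matching.
    """
    shared = control.keys() & knockdown.keys()
    return {
        peak: 100.0 * (knockdown[peak] - control[peak]) / control[peak]
        for peak in shared
        if control[peak] > 0
    }


def summarize(control, knockdown):
    """Return (percent decrease in peak count, median percent change at shared peaks)."""
    peak_loss = 100.0 * (len(control) - len(knockdown)) / len(control)
    changes = shared_peak_changes(control, knockdown)
    return peak_loss, median(changes.values())


# Toy signal values; peak coordinates are placeholders.
control_peaks = {
    "chr1:1000-1500": 100.0,
    "chr1:9000-9400": 200.0,
    "chr2:500-900": 50.0,
    "chr3:100-600": 80.0,
    "chr4:700-1100": 60.0,
}
knockdown_peaks = {
    "chr1:1000-1500": 70.0,
    "chr1:9000-9400": 150.0,
    "chr2:500-900": 40.0,
}
peak_loss, median_change = summarize(control_peaks, knockdown_peaks)
print(f"peak count decrease: {peak_loss:.0f}%, median change at shared peaks: {median_change:.0f}%")
```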
Given that H2A.Z is encoded by two genes that differ only by three amino acids, we separately perturbed H2A.Z.1 (encoded by H2AZ1) and H2A.Z.2 (encoded by H2AZ2) and measured changes in cell cycle and differentiation. We found a large reduction in cells in S-phase after knock-down of H2A.Z.2 (74% decrease) but not upon H2A.Z.1 loss and this result was consistent across different AT/RT cell lines (FIG. 13M, FIG. 13N, FIG. 14), suggesting that the cell cycle arrest mediated by ZNHIT1 perturbation may work via its role in H2A.Z deposition. We also found that loss of H2A.Z.1 and/or H2A.Z.2 increases expression of the mature neuronal marker MAP2 across 3 different AT/RT cell lines (FIG. 13O, FIG. 13P).
In sum, we have presented MultiPerturb-seq, a multiomic pooled CRISPR screening platform, which captures ATAC, mRNA, and CRISPR perturbations. This method increases throughput more than 10-fold over prior unimodal single-cell perturbation screens and does so with lower cost than other single-cell perturbation methods. Compared to performing separate pooled screens for each modality, MultiPerturb-seq can directly link changes in open chromatin and gene expression, yield multi-modal data without the need for computational integration methods, and provide a better controlled assay with fewer technical and biological confounders. Applied to a rare pediatric brain tumor model, MultiPerturb-seq identified ZNHIT1 as a potential target for AT/RT reprogramming therapy, which we further confirmed by demonstrating that ZNHIT1 knockdown pushes cells toward terminal differentiation. We demonstrate the ability of MultiPerturb-seq to perform high-throughput screens with rich phenotypic and mechanistic readout, and the promise of ZNHIT1 and H2A.Z modulation for AT/RT differentiation, though further studies will be needed to understand the therapeutic potential. From a technical viewpoint, there are several ways to further extend this platform. First, MultiPerturb-seq is already compatible with protein capture on the 10X ATAC kit using DNA-barcoded antibodies [41], as well as other types of guide RNAs with a spacer near the 5’ end (e.g. CRISPR/Cas9, CRISPRa, prime-editing, base-editing). Second, with two rounds of barcoding, there is an opportunity for a first round of arrayed barcoding in situations where DNA barcoding is challenging, such as different pharmacologic perturbations or processing multiple timepoints in a single experiment. Taken together, MultiPerturb-seq brings together epigenome and transcriptome phenotyping to study the impact of many genetic perturbations.
References
1. Dixit, A. et al. Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens. Cell 167, 1853-1866.e17 (2016).
2. Replogle, J.M. et al. Mapping information-rich genotype-phenotype landscapes with genome-scale Perturb-seq. Cell (2022).
3. Frangieh, C.J. et al. Multimodal pooled Perturb-CITE-seq screens in patient models define mechanisms of cancer immune evasion. Nature genetics 53, 332-341 (2021).
4. Mimitou, E.P. et al. Multiplexed detection of proteins, transcriptomes, clonotypes and CRISPR perturbations in single cells. Nature Methods 16, 409-412 (2019).
5. Rubin, A.J. et al. Coupled Single-Cell CRISPR Screening and Epigenomic Profiling Reveals Causal Gene Regulatory Networks. Cell 176, 361-376.e17 (2019).
6. Liscovitch-Brauer, N. et al. Profiling the genetic determinants of chromatin accessibility with scalable single-cell CRISPR screens. Nature Biotechnology (2021).
7. Pierce, S.E., Granja, J.M. & Greenleaf, W.J. High-throughput single-cell chromatin accessibility CRISPR screens enable unbiased identification of regulatory networks in cancer. Nature Communications 12, 2969 (2021).
8. Yubao, C. et al. Perturb-tracing enables high-content screening of multiscale 3D genome regulators. bioRxiv, 2023.2001.2031.525983 (2023).
9. Morris, J.A., Sun, J.S. & Sanjana, N.E. Next-generation forward genetic screens: uniting high-throughput perturbations with single-cell analysis. Trends in Genetics (2023).
10. Huang, M.E. et al. Use of all-trans retinoic acid in the treatment of acute promyelocytic leukemia. Blood 72, 567-572 (1988).
11. Datlinger, P. et al. Ultra-high-throughput single-cell RNA sequencing and perturbation screening with combinatorial fluidic indexing. Nature Methods 18, 635-642 (2021).
12. Lareau, C.A. et al. Droplet-based combinatorial indexing for massive-scale single-cell chromatin accessibility. Nature Biotechnology 37, 916-924 (2019).
13. Hao, Z. et al. txci-ATAC-seq, a massive-scale single-cell technique to profile chromatin accessibility. bioRxiv, 2023.2005.2011.540245 (2023).
14. Morris, J.A. et al. Discovery of target genes and pathways at GWAS loci by pooled single-cell CRISPR screens. Science 380, eadh7699 (2023).
15. Ma, S. et al. Chromatin Potential Identified by Shared Single-Cell Profiling of RNA and Chromatin. Cell 183, 1103-1116.e20 (2020).
16. Cao, J. et al. Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science 361, 1380-1385 (2018).
17. Chen, S., Lake, B.B. & Zhang, K. High-throughput sequencing of the transcriptome and chromatin accessibility in the same cell. Nature Biotechnology 37, 1452-1457 (2019).
18. Zhu, C. et al. An ultra high-throughput method for single-cell joint analysis of open chromatin and transcriptome. Nature Structural & Molecular Biology 26, 1063-1070 (2019).
19. Datlinger, P. et al. Pooled CRISPR screening with single-cell transcriptome readout. Nature Methods 14, 297-301 (2017).
20. Kadoch, C. & Crabtree, G.R. Mammalian SWI/SNF chromatin remodeling complexes and cancer: Mechanistic insights gained from human genomics. Science Advances 1, e1500447 (2015).
21. Jackson, E.M. et al. Genomic analysis using high-density single nucleotide polymorphism-based oligonucleotide arrays and multiplex ligation-dependent probe amplification provides a comprehensive analysis of INI1/SMARCB1 in malignant rhabdoid tumors. Clin Cancer Res 15, 1923-1930 (2009).
22. Reddy, A.T. et al. Efficacy of High-Dose Chemotherapy and Three-Dimensional Conformal Radiation for Atypical Teratoid/Rhabdoid Tumor: A Report From the Children's Oncology Group Trial ACNS0333. Journal of clinical oncology : official journal of the American Society of Clinical Oncology 38, 1175-1185 (2020).
23. Wilson, B.G. et al. Epigenetic antagonism between polycomb and SWI/SNF complexes during oncogenic transformation. Cancer Cell 18, 316-328 (2010).
24. Wang, X. et al. BRD9 defines a SWI/SNF sub-complex and constitutes a specific vulnerability in malignant rhabdoid tumors. Nat Commun 10, 1881 (2019).
25. Nakayama, R.T. et al. SMARCB1 is required for widespread BAF complex-mediated activation of enhancers and bivalent promoters. Nature genetics 49, 1613-1623 (2017).
26. Langer, L.F., Ward, J.M. & Archer, T.K. Tumor suppressor SMARCB1 suppresses super-enhancers to govern hESC lineage determination. Elife 8, e45672 (2019).
27. Jessa, S. et al. Stalled developmental programs at the root of pediatric brain tumors. Nature genetics 51, 1702-1713 (2019).
28. Cardoso-Moreira, M. et al. Gene expression across mammalian organ development. Nature 571, 505-509 (2019).
29. Lacomme, M., Liaubet, L., Pituello, F. & Bel-Vialar, S. NEUROG2 drives cell cycle exit of neuronal precursors by specifically repressing a subset of cyclins acting at the G1 and S phases of the cell cycle. Mol Cell Biol 32, 2596-2607 (2012).
30. Bayat, H. et al. CRISPR/Cas9-mediated deletion of a GA-repeat in human GPM6B leads to disruption of neural cell differentiation from NT2 cells. Scientific reports 14, 2136 (2024).
31. Jovanovic, V.M. et al. A defined roadmap of radial glia and astrocyte differentiation from human pluripotent stem cells. Stem Cell Reports 18, 1701-1720 (2023).
32. Chuang, Y.Y. et al. Role of synaptojanin 2 in glioma cell migration and invasion. Cancer research 64, 8271-8275 (2004).
33. Wang, X. et al. SMARCB1-mediated SWI/SNF complex function is essential for enhancer regulation. Nature genetics 49, 289-295 (2017).
34. Domcke, S. et al. A human cell atlas of fetal chromatin accessibility. Science 370, eaba7612 (2020).
35. Zhang, K. et al. A single-cell atlas of chromatin accessibility in the human genome. Cell 184, 5985-6001.e19 (2021).
36. Moore, J.E. et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699-710 (2020).
37. Zhao, B. et al. Znhit1 controls intestinal stem cell maintenance by regulating H2A.Z incorporation. Nature Communications 10, 1071 (2019).
38. Viner-Breuer, R., Yilmaz, A., Benvenisty, N. & Goldberg, M. The essentiality landscape of cell cycle related genes in human pluripotent and cancer cells. Cell Div 14, 15 (2019).
39. Divvela, S.S.K., Saberi, D. & Brand-Saberi, B. Atoh8 in Development and Disease. Biology (Basel) 11 (2022).
40. Valdes-Mora, F. et al. Acetylation of H2A.Z is a key epigenetic modification associated with gene deregulation and epigenetic remodeling in cancer. Genome Research 22, 307-321 (2012).
41. Mimitou, E.P. et al. Scalable, multimodal profiling of chromatin accessibility, gene expression and protein levels in single cells. Nature Biotechnology 39, 1246-1258 (2021).
42. Hashizume, R. et al. Morphologic and molecular characterization of ATRT xenografts adapted for orthotopic therapeutic testing. Neuro-Oncology 12, 366-376 (2010).
43. Sanson, K.R. et al. Optimized libraries for CRISPR-Cas9 genetic screens with multiple modalities. Nature Communications 9, 5416 (2018).
44. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet Journal 17, 3 (2011).
45. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10, R25 (2009).
46. Buenrostro, J.D., Giresi, P.G., Zaba, L.C., Chang, H.Y. & Greenleaf, W.J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nature Methods 10, 1213 (2013).
47. Picelli, S. et al. Tn5 transposase and tagmentation procedures for massively scaled sequencing projects. Genome Research 24, 2033-2040 (2014).
48. Corces, M.R. et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nature Methods 14, 959-962 (2017).
49. Xu, W. et al. ISSAAC-seq enables sensitive and flexible multimodal profiling of chromatin accessibility and gene expression in single cells. Nature Methods (2022).
50. Picelli, S. et al. Full-length RNA-seq from single cells using Smart-seq2. Nature Protocols 9, 171-181 (2014).
51. Norholm, M.H.H. A mutant Pfu DNA polymerase designed for advanced uracil-excision DNA engineering. BMC Biotechnology 10, 21 (2010).
52. Grandi, F.C., Modi, H., Kampman, L. & Corces, M.R. Chromatin accessibility profiling by ATAC-seq. Nature Protocols 17, 1518-1552 (2022).
53. Preissl, S. et al. Single-nucleus analysis of accessible chromatin in developing mouse forebrain reveals cell-type-specific transcriptional regulation. Nature Neuroscience 21, 432-439 (2018).
54. Adey, A.C. Tagmentation-based single-cell genomics. Genome Research 31, 1693-1705 (2021).
55. Replogle, J.M. et al. Combinatorial single-cell CRISPR screens by direct guide RNA capture and targeted sequencing. Nature Biotechnology 38, 954-961 (2020).
56. Zhu, Y.Y., Machleder, E.M., Chenchik, A., Li, R. & Siebert, P.D. Reverse transcriptase template switching: a SMART approach for full-length cDNA library construction. Biotechniques 30, 892-897 (2001).
57. Bagnoli, J.W. et al. Sensitive and powerful single-cell RNA sequencing using mcSCRB-seq. Nature Communications 9, 2937 (2018).
58. Di, L. et al. RNA sequencing by direct tagmentation of RNA/DNA hybrids. Proceedings of the National Academy of Sciences 117, 2886 (2020).
59. Langmead, B. & Salzberg, S.L. Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357-359 (2012).
60. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biology 9, R137 (2008).
61. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15-21 (2013).
62. Liao, Y., Smyth, G.K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923-930 (2014).
63. Griffiths, J.A., Richard, A.C., Bach, K., Lun, A.T.L. & Marioni, J.C. Detection and removal of barcode swapping in single-cell RNA-seq data. Nature Communications 9, 2667 (2018).
64. Levenshtein, V.I. Binary codes capable of correcting deletions, insertions, and reversals. Soviet physics doklady 10, 707-710 (1966).
65. Timothy, B., Kaishu, M., Kathryn, R. & Eugene, K. Robust differential expression testing for single-cell CRISPR screens at low multiplicity of infection. bioRxiv, 2023.2005.2015.540875 (2023).
66. Hill, A.J. et al. On the design of CRISPR-based single-cell molecular screens. Nature Methods 15, 271-274 (2018).
67. Skene, P.J. & Henikoff, S. An efficient targeted nuclease strategy for high-resolution mapping of DNA binding sites. eLife 6 (2017).
68. Tarasov, A., Vilella, A.J., Cuppen, E., Nijman, I.J. & Prins, P. Sambamba: fast processing of NGS alignment formats. Bioinformatics 31, 2032-2034 (2015).
69. Ramirez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic acids research 44, W160-W165 (2016).
70. Amemiya, H.M., Kundaje, A. & Boyle, A.P. The ENCODE Blacklist: Identification of Problematic Regions of the Genome. Scientific reports 9, 9354 (2019).
71. Fiskin, E. et al. Single-cell profiling of proteins and chromatin accessibility using PHAGE-ATAC. Nature Biotechnology 40, 374-381 (2022).
72. Stirling, D.R. et al. CellProfiler 4: improvements in speed, utility and usability. BMC Bioinformatics 22, 433 (2021).
73. Medvedeva, Y.A. et al. EpiFactors: a comprehensive database of human epigenetic factors and complexes. Database (Oxford) 2015, bav067 (2015).
74. D’Amato, L. et al. ARHGEF3 controls HDACi-induced differentiation via RhoA-dependent pathways in acute myeloid leukemias. Epigenetics 10, 6-18 (2015).
75. Nenasheva, V.V. & Tarantul, V.Z. Many Faces of TRIM Proteins on the Road from Pluripotency to Neurogenesis. Stem Cells and Development 29, 1-14 (2019).
76. Hong, F. et al. Dissecting Early Differentially Expressed Genes in a Mixture of Differentiating Embryonic Stem Cells. PLOS Computational Biology 5, e1000607 (2009).
77. Liu, T. et al. EphB4 Regulates Self-Renewal, Proliferation and Neuronal Differentiation of Human Embryonic Neural Stem Cells in Vitro. Cellular Physiology and Biochemistry 41, 819-834 (2017).
78. Powell, G.T. et al. Cachd1 interacts with Wnt receptors and regulates neuronal asymmetry in the zebrafish brain. Science 384, 573-579 (2024).
79. Sadek, C.M. et al. TACC3 expression is tightly regulated during early differentiation. Gene Expr Patterns 3, 203-211 (2003).
80. Takebe, A. et al. Microarray analysis of PDGFRa+ populations in ES cell differentiation culture identifies genes involved in differentiation of mesoderm and mesenchyme including ARID3b that is essential for development of embryonic mesenchymal cells. Developmental Biology 293, 25-37 (2006).
81. Zhang, L.-h. et al. TRIM24 promotes stemness and invasiveness of glioblastoma cells via activating Sox2 expression. Neuro-Oncology 22, 1797-1808 (2020).
82. Kobayashi, K., Era, T., Takebe, A., Jakt, L.M. & Nishikawa, S.-I. ARID3B Induces Malignant Transformation of Mouse Embryonic Fibroblasts and Is Strongly Associated with Malignant Neuroblastoma. Cancer research 66, 8331-8336 (2006).
All patent and non-patent publications cited in this specification are incorporated herein by reference in their entireties. US Provisional Patent Application No. 63/624,062, filed January 23, 2024, and US Provisional Patent Application No. 63/697,184, filed September 20, 2024, are incorporated herein by reference in their entireties. While the invention has been described with reference to particular embodiments, it will be appreciated that modifications can be made without departing from the spirit of the invention. Such modifications are intended to fall within the scope of the appended claims.

Claims

What is claimed is:
1. A method for evaluating effects of genetic perturbations on chromatin accessibility and the transcriptome of single cells in a population of cells, the method comprising:
(a) obtaining a heterogeneous population of cells having single cells with one or more genetic perturbations having been introduced by a CRISPR guide RNA that targets a gene or genomic region of interest, the single cells comprising one or more CRISPR guide RNAs;
(b) obtaining cell nuclei from all or a portion of the single cells of (a) and separating the nuclei into partitions, and incubating the cell nuclei with a tagmentation buffer that comprises a transposome complex, wherein the transposome complex comprises a transposase, a transposon, and a nucleotide sequence comprising a handle sequence and a first barcode, wherein the transposase causes staggered double-stranded breaks in DNA, and wherein the handle sequence and the first barcode are linked to the double-stranded DNA at the staggered breaks;
(c) performing reverse transcription on nuclei from (b), which comprises contacting and incubating the nuclei with reverse transcription primers and template switch oligos (TSOs) comprising the handle sequence and the first barcode, or the corresponding reverse-complement sequence thereof, optionally wherein the TSOs comprise a UMI, reverse transcriptase, and dNTPs in a reverse transcription buffer, whereby cellular RNA and CRISPR guide RNAs are reverse transcribed to cDNA comprising the handle sequence and the first barcode, or the corresponding reverse-complement sequence thereof;
(d) pooling nuclei from multiple partitions;
(e) randomly partitioning one or more nuclei of (d) with a bead, the bead having linked nucleotide sequences comprising a bead-specific barcode and a capture sequence, the capture sequence being complementary to and capable of binding the handle sequence, and disrupting the nuclei and performing PCR amplification wherein the handle sequence binds the bead capture sequence to generate PCR products comprising the bead-specific barcode in combination with each of (i) double-stranded DNA of (b) comprising the first barcode; (ii) CRISPR guide RNA transcribed cDNA of (c) comprising the first barcode; and (iii) cDNA generated from nuclear and/or cellular RNA of (c) comprising the first barcode;
(f) sequencing and analyzing the PCR amplification products generated in (e) to associate the effects of a genetic perturbation with the chromatin accessibility and the transcriptome from a single cell, whereby sequences acquired with the same combination of the first barcode and the bead-specific barcode are identified as being from the same cell.
2. A method for evaluating effects of genetic perturbations on chromatin accessibility and the transcriptome of single cells in a population of cells, the method comprising:
(a) obtaining a heterogeneous population of cells having single cells with one or more genetic perturbations having been introduced by a CRISPR guide RNA that targets a gene or genomic region of interest, the single cells comprising one or more CRISPR guide RNAs;
(b) obtaining cell nuclei from all or a portion of the single cells of (a) and separating the nuclei into partitions, and incubating the cell nuclei with a tagmentation buffer that comprises a transposome complex, wherein the transposome complex comprises a transposase, a transposon, and a nucleotide sequence comprising a handle sequence and a first barcode, wherein the transposase causes staggered double-stranded breaks in DNA, and wherein the handle sequence and the first barcode are linked to the double-stranded DNA at the staggered breaks;
(c) performing reverse transcription on nuclei from (b), which comprises contacting and incubating the nuclei with reverse transcription primers and template switch oligos (TSOs) comprising the handle sequence and the first barcode, or the corresponding reverse-complement sequence thereof, optionally wherein the TSOs comprise a UMI, reverse transcriptase, and dNTPs in a reverse transcription buffer, whereby cellular RNA and CRISPR guide RNAs are reverse transcribed to cDNA comprising the handle sequence and the first barcode, or the corresponding reverse-complement sequence thereof;
(d) pooling nuclei from multiple partitions;
(e) randomly partitioning one or more nuclei of (d), and contacting the nuclei with nucleotide sequences comprising a second barcode and a capture sequence, the capture sequence being complementary to and capable of binding the handle sequence, and disrupting the nuclei and performing PCR amplification wherein the handle sequence binds the bead capture sequence to generate PCR products comprising the second barcode in combination with each of (i) double-stranded DNA of (b) comprising the first barcode; (ii) CRISPR guide RNA transcribed cDNA of (c) comprising the first barcode; and (iii) cDNA generated from nuclear and/or cellular RNA of (c) comprising the first barcode;
(f) sequencing and analyzing the PCR amplification products generated in (e) to associate the effects of a genetic perturbation with the chromatin accessibility and the transcriptome from a single cell, whereby sequences acquired with the same combination of the first barcode and the second barcode are identified as being from the same cell.
3. The method of claim 1 or 2, wherein the first barcode is unique to a partition and differs from another or all other first barcodes present in additional partitions.
4. The method of any one of claims 1 to 3, wherein the one or more genetic perturbations include CRISPR-Cas mediated editing, including CRISPR/Cas9, prime-editing, base-editing, CRISPRa, and/or CRISPRi.
5. The method of any one of claims 1 to 4, wherein more than one CRISPR guide RNA targets a gene or genomic region of interest or a different gene or genomic region of interest in a single cell.
6. The method of any one of claims 1 to 5, wherein the one or more partitions of (b) are individual wells of a microwell plate, optionally a 96 well plate.
7. The method of any one of claims 1 to 6, wherein one or more partitions of (b) contain at least about 1000, about 2000, about 5000, about 25,000, or about 50,000 nuclei per partition.
8. The method of any one of claims 1 to 7, further comprising washing the nuclei of step (b) prior to step (c) to stop the tagmentation reaction without disrupting the cell nuclei, wherein the washing comprises addition of EDTA.
9. The method of any one of claims 1 to 8, wherein step (e) comprises partitioning at least about 2, at least about 5, at least about 10, at least about 15, or at least about 20 nuclei with the bead.
10. The method of any one of claims 1 to 9, wherein the PCR products generated in step (e) are separated to obtain a library comprising a combination of PCR products comprising double-stranded DNA of (b), cDNA from transcription of the CRISPR guide RNAs and cDNA generated from cellular RNA, optionally wherein the separation is based on size.
11. The method of any one of claims 1 to 10, wherein the PCR products generated in step (e) are separated to obtain a first library comprising double-stranded DNA of (b), and a second library comprising (i) cDNA from transcription of the CRISPR guide RNAs and (ii) cDNA generated from cellular RNA, optionally wherein the separation is based on size.
12. The method of any one of claims 1 to 11, wherein the population of cells of (a) have been further treated with a chemical agent or a biological agent.
13. The method of any one of claims 1 to 12, wherein the analysis is limited to cells (nuclei) defined as having at least 200 fragments per cell and/or perturbations wherein at least 100 cells are identified as having the perturbation.
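
The cell-assignment rule in step (f) of claims 1 and 2 and the thresholds in claim 13 can be illustrated with a minimal Python sketch. It is not the applicants' analysis pipeline. The record layout `(first_barcode, bead_barcode, modality, feature)`, the function names, and the rule that a cell carries every guide detected in it are assumptions made for illustration only. ATAC fragments are counted without deduplication to keep the example short.

```python
from collections import Counter, defaultdict

# Thresholds recited in claim 13.
MIN_FRAGMENTS_PER_CELL = 200
MIN_CELLS_PER_PERTURBATION = 100


def assign_cells(records):
    """Group sequenced records by (first barcode, bead barcode).

    Each record is a hypothetical tuple
    (first_barcode, bead_barcode, modality, feature), where modality is
    "atac", "rna", or "guide". Records sharing both barcodes are treated
    as coming from the same cell, as in step (f).
    """
    cells = defaultdict(
        lambda: {"atac": 0, "rna": Counter(), "guide": Counter()}
    )
    for first_bc, bead_bc, modality, feature in records:
        cell = cells[(first_bc, bead_bc)]
        if modality == "atac":
            cell["atac"] += 1              # one chromatin fragment
        else:
            cell[modality][feature] += 1   # gene or guide count
    return dict(cells)


def filter_cells(cells,
                 min_fragments=MIN_FRAGMENTS_PER_CELL,
                 min_cells=MIN_CELLS_PER_PERTURBATION):
    """Apply both claim 13 thresholds; return {guide: [cell keys]}."""
    # Keep cells with enough chromatin fragments.
    kept = {k: c for k, c in cells.items() if c["atac"] >= min_fragments}
    # Assumption: a cell carries every guide detected in it.
    per_guide = defaultdict(list)
    for key, cell in kept.items():
        for guide in cell["guide"]:
            per_guide[guide].append(key)
    # Keep perturbations observed in enough cells.
    return {g: keys for g, keys in per_guide.items() if len(keys) >= min_cells}
```

Because the key is the barcode pair, two nuclei that share a bead but came from different first-round partitions are still resolved as separate cells. This is why step (e) can place more than one nucleus with a bead.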

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202463624062P 2024-01-23 2024-01-23
US63/624,062 2024-01-23
US202463697184P 2024-09-20 2024-09-20
US63/697,184 2024-09-20

Publications (1)

Publication Number Publication Date
WO2025160253A1 (en) 2025-07-31

Family

ID=96545705

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2025/012714 Pending WO2025160253A1 (en) 2024-01-23 2025-01-23 Methods for chromatin accessibility and transcriptome analysis of cells having genetic perturbations

Country Status (1)

Country Link
WO (1) WO2025160253A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020176788A1 (en) * 2019-02-28 2020-09-03 10X Genomics, Inc. Profiling of biological analytes with spatially barcoded oligonucleotide arrays
US20220267759A1 (en) * 2019-07-12 2022-08-25 New York Genome Center, Inc. Methods and compositions for scalable pooled rna screens with single cell chromatin accessibility profiling

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LISCOVITCH-BRAUER NOA; MONTALBANO ANTONINO; DENG JIALE; MéNDEZ-MANCILLA ALEJANDRO; WESSELS HANS-HERMANN; MOSS NICHOLAS G.; KU: "Profiling the genetic determinants of chromatin accessibility with scalable single-cell CRISPR screens", NATURE BIOTECHNOLOGY, vol. 39, no. 10, 29 April 2021 (2021-04-29), New York, pages 1270 - 1277, XP037583625, ISSN: 1087-0156, DOI: 10.1038/s41587-021-00902-x *
YAN RACHEL E.; CORMAN ALBA; KATGARA LYLA; WANG XIAO; XUE XINHE; GAJIC ZORAN Z.; SAM RICHARD; FARID MICHAEL; FRIEDMAN SAMUEL M.; CH: "Pooled CRISPR screens with joint single-nucleus chromatin accessibility and transcriptome profiling", NATURE BIOTECHNOLOGY, vol. 43, no. 10, 21 November 2024 (2024-11-21), New York, pages 1628 - 1634, XP038453362, ISSN: 1087-0156, DOI: 10.1038/s41587-024-02475-x *

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 25745692; Country of ref document: EP; Kind code of ref document: A1)