WO2025207407A1 - Programmable enrichment via rna fish for single-cell rna analysis - Google Patents
Programmable enrichment via rna fish for single-cell rna analysis
- Publication number
- WO2025207407A1 (PCT/US2025/020760)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- cells
- nuclei
- cell
- probes
- rna
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
Definitions
- HHV-6+, CAR+ T cells that can occur at a frequency of 1 in 10,000 cells in infusion products was identified, which may contribute to the etiology of HHV-6 encephalitis in patients receiving cell therapies.
- these anecdotes represent diverse populations and tissue types, the conceptual mode of discovery for these populations has been consistent: the profiles of ~10^5-10^7 cells were generated, yielding ~10^1-10^3 events of interest.
- downstream analyses including the identification of transcriptional heterogeneity within these populations, inference of additional marker genes, and analyses of gene regulatory networks.
- RNA(s) are labeled using fluorescence in situ hybridization (FISH), and a subset of the cells are selected by flow cytometry and then analyzed via scRNA-seq.
- FISH fluorescence in situ hybridization
- The broad applicability of PERF-seq to enrich immune cell subsets using individual RNA markers and combinations of RNA markers, including the mRNA of transcription factors, is demonstrated. Further, the compatibility of this protocol with nuclei extracted from frozen or FFPE tissue samples is shown.
- This approach enables an efficient enrichment and high-throughput profiling of cells and nuclei populations of interest using logic-gated sorting across heterogeneous cell and tissue types.
- the method was implemented using single-cell RNA sequencing (scRNA-seq).
- scRNA-seq single-cell RNA sequencing
- the general approach could be used for other single cell RNA analysis methods (e.g., methods that are based on hybridization of probes to RNA in the cells and then analyzing the hybridized probes or ligation products of the same).
- the method may comprise obtaining a sample comprising fixed cells or fixed nuclei, hybridizing oligonucleotide probes to RNA in the fixed cells or fixed nuclei to produce labeled cells or labeled nuclei, enriching for a sub-population of the fixed cells or fixed nuclei based on their labeling by flow cytometry, treating the enriched cells or enriched nuclei with a double-stranded DNAse for a sufficient time to degrade the oligonucleotide probes, inactivating the double-stranded DNAse, and performing single cell RNA analysis on the cells.
- Figs. 1A-1F Rationale and development of PERF-seq.
- Fig. 1A Overall schematic of the PERF-seq assay.
- Target RNA(s) are bound by initiator probes.
- Hairpin amplifiers generate fluorescent signal and enable fluorescence-activated cell sorting (FACS) before single-cell profiling with the droplet-based scRNA-seq Flex kit.
- Fig. 1B Top left: Knee plot of cells profiled with standard Flex versus HCR-FISH sorted cells.
- Top right: Summary of fully mapping (blue) or half-mapping (grey) to the reference probe set.
- Fig. 1C Summary of experiments identifying the HCR polymer as the corrupting agent for data quality.
- Fig. 1D Conditions screened for polymer stripping, including DNase and formamide.
- Fig. 1E Summary of conditions analyzed for sorting buffer to improve data quality.
- Fig. 1F Overall summary of UMIs (top) and genes (bottom) detected per cell comparing the initial FlowFISH → Flex v0 experiment from (Fig. 1B) to the PERF-seq library, L, from panel (Fig. 1E). The median values for each metric and each library are reported.
- Figs. 2A-2F Benchmarking of PERF-seq.
- Fig. 2A Schematic of ACTB staining. PBMCs were isolated, stained, and analyzed for ACTB expression.
- Fig. 2B Benchmarking of lncRNA XIST by cell line mixing. A serial dilution of K562s (XX / XIST+) into Rajis (XY / XIST-) was performed and XIST FlowFISH was assessed to recover the positive population.
- Fig. 2C Schematic of PERF-seq benchmarking experiment for four libraries, including standard Flex and variable probe staining/sorting.
- Fig. 2D Flow sort strategy for CD3E+ cells for the PERF-seq library.
- Figs. 5A-5N Study of rare nuclei from fresh and FFPE tissue.
- Fig. 5A Schematic of nuclei PERF-seq experiments. Nuclei were isolated from either frozen mouse brain tissue or FFPE human glioblastoma multiforme (GBM) tissue and enriched for specific populations based on HCR-FlowFISH, showing the sort strategy.
- Fig. 5B Downsampling analysis for library saturation and UMI benchmarking for the mouse brain nuclei. The dotted line represents the mean reads per cell for a final comparison (depth of lowest sample).
- Fig. 5C Same as (Fig. 5B) but for the human FFPE tissue sample.
- Fig. 5H Reduced dimensionality representation of the human FFPE nuclei FlowFISH enriched/depleted populations profiled with PERF-seq.
- Fig. 5I Same as (Fig. 5D) but colored by marker genes used in the FlowFISH panel. The boxed population was further subclustered.
- Fig. 5J Empirical cumulative distribution plot of total UMI count for the sum of the three genes enriched via FlowFISH, stratified by the captured PERF-seq library.
- Fig. 5K Top differentially expressed genes between the two FFPE populations profiled with PERF-seq.
- Fig. 5L Gene-gene correlations of relevant marker genes, including those used in the FlowFISH enrichment panel.
- Fig. 5M Sub-clustering of the Panel+ population with cluster states noted.
- Fig. 5N Top marker genes enriched in specific sub-clusters; arrows indicate critical populations where each gene is highly expressed.
- Figs. 7A-7F Profiling somatic mosaicism with PERFF-seq.
- Fig. 7A Schematic of experiment. PBMCs from donors of different ages were sorted for a ten-gene OR-gated panel of MSY.
- Fig. 7B Mean per-cell expression of all genes detected in Flex with genes analyzed for FlowFISH noted.
- Fig. 7C Summary of the percentage of MSY- cells, with donor age labels, from the FlowFISH cytometry data.
- Fig. 7D UMAP embedding of PERFF-seq profiles from the 51-year-old donor based on MSY sorting logic.
- Fig. 8 Summary of BCL11A RNA expression across populations. Bulk RNA-seq of sorted populations of BCL11A20. Design and results of cytometry analysis of PBMCs co-stained with BCL11A mRNA (via HCR-FISH) and CD19 and CD123 protein (via antibodies). Mean fluorescence intensity (MFI) for BCL11A of each population is quantified.
- MFI Mean fluorescence intensity
- Fig. 9 Design and results of antibody and HCR FISH co-staining to evaluate CD4 RNA expression. Summary of CD4 HCR FISH signal, stratified by CD3E populations. Bulk RNA-seq expression of CD4 from FACS-isolated populations.
- the method may comprise obtaining a sample comprising fixed cells or fixed nuclei. This may be done by treating a sample comprising single cells or nuclei with a chemical cross-linker (e.g., paraformaldehyde or glutaraldehyde) or isolating fixed nuclei from a fixed tissue sample (e.g., a section of tissue that has been fixed using paraformaldehyde or glutaraldehyde).
- a chemical cross-linker e.g., paraformaldehyde or glutaraldehyde
- the next steps of the method may comprise hybridizing oligonucleotide probes to RNA in the fixed cells or fixed nuclei to produce labeled cells or labeled nuclei, enriching for a sub-population of the fixed cells or fixed nuclei based on their labeling by flow cytometry, treating the enriched cells or enriched nuclei with a double-stranded DNAse (or another enzyme or treatment that is capable of degrading double-stranded DNAs) for a sufficient time to degrade the oligonucleotide probes, inactivating the double-stranded DNAse, and performing single cell RNA analysis on the cells or nuclei.
- a double-stranded DNAse or another enzyme or treatment that is capable of degrading double-stranded DNAs
- the double-stranded DNAse should be DNA-specific and should specifically degrade double-stranded DNA molecules over single-stranded DNA molecules (meaning that the activity of the enzyme on a single-stranded DNA substrate should be less than 1%, less than 0.5%, or less than 0.1% of the activity of the enzyme on a double-stranded DNA substrate).
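The specificity criterion above is a ratio test, which can be sketched as follows. This is a minimal illustration only: the function name `is_ds_specific`, the activity units, and the example measurements are assumptions and do not come from the source; only the 1% and 0.1% cut-offs are taken from the text.

```python
def is_ds_specific(ss_activity: float, ds_activity: float, max_ratio: float = 0.01) -> bool:
    """Return True when activity on ssDNA is below max_ratio of activity on dsDNA.

    Activities may be in any unit, as long as both use the same one.
    max_ratio=0.01 corresponds to the "less than 1%" criterion.
    """
    if ds_activity <= 0:
        raise ValueError("ds_activity must be positive")
    return (ss_activity / ds_activity) < max_ratio


# Hypothetical measurements: ssDNA activity is 0.4% of dsDNA activity.
passes_1_percent = is_ds_specific(ss_activity=0.4, ds_activity=100.0)
passes_0_1_percent = is_ds_specific(ss_activity=0.4, ds_activity=100.0, max_ratio=0.001)
print(passes_1_percent, passes_0_1_percent)
```

With these made-up numbers the enzyme meets the 1% criterion but not the stricter 0.1% one.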
- dsDNAses or duplex DNAses can be purchased from a variety of vendors, including Thermo Fisher (Waltham, MA), VWR (Radnor, PA), and New England Biolabs (Ipswich, MA), among many others.
- the enzyme may be heat labile so that it can be readily inactivated by moderate heat treatment (e.g., by incubation at 55°C for at least 10 minutes).
- the double-stranded DNAse may be inactivated, e.g., by exposure to a temperature in the range of 50 °C to 75 °C (e.g., 50-60 °C) and/or by addition of a chelating agent (e.g., EDTA) and/or a reducing agent (e.g., DTT), before analyzing the RNA (e.g., by making cDNA or hybridizing probes).
- a chelating agent e.g., EDTA
- a reducing agent e.g., DTT
- the method is done without uncrosslinking the enriched cells or enriched nuclei, and without denaturing the oligonucleotide probes from the RNA.
- the method may be done in the absence of a step in which formamide, DMSO, or another chemical denaturant is added to the sample to denature the oligonucleotide probes from the RNA.
- the method may further comprise permeabilizing the cells or nuclei using a detergent (e.g. Tween).
- a detergent e.g. Tween
- the method may comprise treating a sample comprising single cells or nuclei with a chemical cross-linker (e.g., paraformaldehyde or glutaraldehyde) and a detergent such as Tween to produce the fixed cells or fixed nuclei, where the fixed cells or fixed nuclei are permeabilized.
- RNA is labeled in situ (i.e., within the cells or nuclei) and cells that have a particular labeling pattern are enriched (where a labeling pattern can be, e.g., the presence of a particular transcript, the amount of a particular transcript, the presence of a particular combination of transcripts, the lack of one transcript and the presence of another, in a cell etc.).
- This technique is commonly referred to as “FISH-Flow”, “Flow-FISH” or “FlowFISH” in the art and, as described above, comprises hybridizing fluorescently labeled oligonucleotides to mRNA in fixed cells or nuclei in situ and then selecting for a sub-population of the cells using flow cytometry-based sorting.
- Such methods are generally described in, e.g., Arrigucci et al (Nat Protoc.
- the flow cytometry may detect single RNA molecules, methods for which are described in Smith et al (ACS Nano. 2020 14: 2324-2335), Yoo et al (Anal. Chem.
- At least 500 cells or nuclei are enriched.
- This method can be used to select rare cells, e.g., hematopoietic or non-hematopoietic cells that are present at a low concentration (e.g., less than 1%, less than 0.5%, less than 0.1%, less than 0.05%, or less than 0.01% in the initial population).
- less than 10% (e.g., less than 5%, less than 1%, less than 0.5%, less than 0.1%, less than 0.05%, or less than 0.01%) of the fixed cells or fixed nuclei may be enriched.
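The labeling-pattern selection described above (presence of one transcript, absence of another, or a combination) amounts to a Boolean gate over per-cell fluorescence. The sketch below is illustrative only: the `Event` class, the `gate` function, the thresholds, and the intensity values are all hypothetical, and a real sort is configured on the cytometer rather than in code. CD3E and MS4A1 are used as marker names because they appear elsewhere in this document.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One flow cytometry event: fluorescence per marker, arbitrary units."""
    signal: dict


def is_positive(event, marker, thresholds):
    """A marker is called positive when its signal reaches the threshold."""
    return event.signal[marker] >= thresholds[marker]


def gate(event, thresholds, require=(), exclude=(), any_of=()):
    """AND over `require`, NOT over `exclude`, OR over `any_of` (skipped if empty)."""
    if not all(is_positive(event, m, thresholds) for m in require):
        return False
    if any(is_positive(event, m, thresholds) for m in exclude):
        return False
    if any_of and not any(is_positive(event, m, thresholds) for m in any_of):
        return False
    return True


# Hypothetical events and thresholds.
events = [
    Event({"CD3E": 900.0, "MS4A1": 20.0}),
    Event({"CD3E": 15.0, "MS4A1": 850.0}),
    Event({"CD3E": 10.0, "MS4A1": 12.0}),
    Event({"CD3E": 700.0, "MS4A1": 640.0}),
]
thresholds = {"CD3E": 100.0, "MS4A1": 100.0}

# Gate: CD3E present AND MS4A1 absent.
sorted_events = [e for e in events if gate(e, thresholds, require=("CD3E",), exclude=("MS4A1",))]
fraction = len(sorted_events) / len(events)
print(f"{len(sorted_events)} of {len(events)} events pass the gate ({fraction:.0%})")
```

The `fraction` value corresponds to the share of fixed cells or nuclei that is enriched; an OR-gated panel would pass the panel's markers through `any_of` instead.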
- RNA may be labeled using Z-probes (see, e.g., Tripath et al Noncoding RNA 2018 4: 20), branched DNA (see, e.g., Wang et al J Mol Diagn. 2012 14: 22-29, Player et al J Histochem Cytochem 2001 49: 603-12), SABER (see, e.g., Kishi et al Nature Methods 2019 16: 533-544) or by hybridization chain reaction (see, e.g., Schwarzkopf et al Development. 2021 148: dev199847 and Nat. Biotechnol. 2018 10.1038/nbt.4286).
- Z-probes see, e.g., Tripath et al Noncoding RNA 2018 4: 20
- branched DNA see, e.g., Wang et al J Mol Diagn. 2012 14: 22-29, Player et al J Histochem Cytochem 2001 49: 603-12
- SABER See,
- the oligonucleotide probes comprise a pair of unlabeled initiator probes that hybridize to adjacent sites in a target RNA and fluorescently labeled amplification probes that hybridize to a pair of initiator probes when the initiator probes are hybridized to their target RNA.
- the fluorescently labeled amplification probes are designed to hybridize to one another as well as to the initiator probes, thereby forming a complex comprising multiple amplification probes when the initiator probes hybridize to their target RNA.
- the amplification probes may comprise a hairpin structure.
- hybridization chain reaction approaches can be found in, e.g., Choi et al (Nature Biotechnology 2010 28: 1208-12), Yamaguchi et al (Environmental Microbiology 2015 17: 2532-2541), Schulte et al (Development 2024 151: dev202307), Zheng et al (Anal Methods. 2023 15: 1422-1430), Choi et al (Development 2018 145: dev165753) and Choi et al (ACS Nano 2014 8: 4284-4294), among many others.
- the FISH protocol used in the present methods may have a nucleic acid amplification step.
- the FISH protocol used in the present method does not involve a nucleic acid amplification step (i.e., is free of nucleic acid amplification).
- FISH methods that are capable of resolving single molecules are referred to as “single-molecule FISH” or smFISH methods. Any smFISH methods may be employed herein.
- the cells may be additionally labeled and enriched in other ways (e.g., using antibodies that bind to cell surface antigens or intracellular antigens), thereby providing a multiomics-based way to enrich for cells of interest.
- the single-cell RNA analysis can be performed by a variety of methods, some of which may employ a single-cell compartmentalization approach and others of which may employ a split-and-pool barcoding approach, examples of which are described below.
- the single cell RNA analysis may be done using a single-cell compartmentalization method that comprises: (i) compartmentalizing the cells or nuclei (e.g., in an emulsion, droplets, wells, or other containers), wherein at least some compartments receive a single cell or nucleus; (ii) making cell-specifically barcoded cDNA from the cells or nuclei in the compartments; and (iii) sequencing the cell-specifically barcoded cDNA.
- This method is an scRNA-seq approach.
- the compartments may additionally comprise a bead that comprises a cell-specifically barcoded reverse transcription primer (i.e., primer molecules that are tethered to or embedded in the bead), where the primers associated with each bead have a unique barcode (i.e., a barcode that distinguishes the beads from one another).
- the cell-specifically barcoded reverse transcription primer may be an oligo(dT) or random primer and, in addition, may comprise a unique molecular identifier (UMI), e.g., a ‘random’ sequence.
- the method may comprise: (i) releasing the primer from the beads, allowing the released primer to hybridize to RNA, and extending the primer using a reverse transcriptase, (ii) lysing the cells or nuclei, allowing the released RNA to hybridize to the primer, and extending the primer using a reverse transcriptase on the beads, or (iii) releasing the primer from the beads, lysing the cells, allowing the released primer and released RNA to hybridize, and extending the primer using a reverse transcriptase. Examples of such single-cell compartmentalization methods are described in De Simone et al (Methods Mol. Biol. 2019 1979: 87-110), Gao et al (Curr.
- the single cells are encapsulated into gel-beads-in-emulsion (GEMs).
- GEMs gel-beads-in-emulsion
- each functional GEM contains a single cell, a single gel bead, and reverse transcription reagents.
- oligonucleotide primers are composed of 4 distinct parts: a PCR primer sequence (essential for the sequencing), a bead-specific barcode (which becomes the single cell barcode), a unique molecular identifier (UMI) sequence and, at the 3’ end, an oligo(dT) sequence (that enables capture of poly-adenylated mRNA molecules), a pseudo-random sequence, a random sequence, a gene-specific sequence, etc.
- a PCR primer sequence essential for the sequencing
- a bead-specific barcode which becomes the single cell barcode
- UMI unique molecular identifier
- oligo(dT) sequence that enables capture of poly-adenylated mRNA molecules
- pseudo-random sequence a random sequence
- a gene-specific sequence etc.
- the gene expression level of each gene can be determined using the UMIs.
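UMI-based expression counting can be sketched as follows: reads that share a cell barcode, gene, and UMI are treated as copies of one original molecule and counted once. This is a simplified illustration under stated assumptions: the function `count_umis`, the 4-base barcodes and UMIs, and the read tuples are made up, and the sketch does no UMI sequencing-error correction, which production pipelines typically add.

```python
from collections import defaultdict


def count_umis(reads):
    """Count distinct UMIs per (cell barcode, gene).

    `reads` is an iterable of (cell_barcode, umi, gene) tuples. Reads sharing
    all three fields collapse into a single counted molecule.
    """
    seen = defaultdict(set)
    for cell_barcode, umi, gene in reads:
        seen[(cell_barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Hypothetical reads; sequences are shortened for readability.
reads = [
    ("AAAC", "GGTA", "CD3E"),
    ("AAAC", "GGTA", "CD3E"),  # amplification duplicate of the read above
    ("AAAC", "TTCA", "CD3E"),
    ("AAAC", "GGTA", "ACTB"),
    ("CCGT", "ACGT", "ACTB"),
]
counts = count_umis(reads)
for (cell, gene), n in sorted(counts.items()):
    print(cell, gene, n)
```

Cell AAAC has three CD3E reads but only two distinct UMIs, so its CD3E count is 2.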
- the compartments may additionally comprise a bead that comprises a pair of primers, at least one of which may be cell-specifically barcoded.
- one, the other or both of the primers may contain a cell specific barcode, a unique molecular identifier (UMI), e.g., a ‘random’ sequence and sequences that hybridize to the probes at the 3’ end, so that the hybridized probes, or ligation products thereof can be amplified by the primers.
- Downstream amplification and sequencing follow the standard Flex guidelines with no modifications.
- this workflow uses a bead oligo to barcode a ligated probe junction pair from the WTA probe set to barcode mRNA molecules for gene expression counts. These barcoding events occur inside a standard droplet microfluidic workflow for single-cell sequencing before the oil emulsion droplets are broken, and per-cell nucleic acid sequences are amplified in a bulk PCR reaction.
- the single cell RNA analysis is done using a split-and-pool barcoding method that may comprise: (i) making cDNA in the cells; (ii) compartmentalizing the cells, wherein at least some compartments receive multiple cells; and (iii) adding cell-specific barcodes to the cDNA in the cells using a split-and-pool barcoding method.
- split-and-pool barcoding methods involve partitioning a sample containing cells or nuclei into several compartments, where the compartments receive multiple cells or nuclei, adding a different building block (or “subunit”) for the cell-specific barcode to each partition, pooling the sample, then repeating the partitioning, addition and pooling steps until a sufficient number of subunits have been added and the cells or nuclei in the sample are uniquely indexed.
- a building block or “subunit”
- indexing using the split-and-pool approach (or “combinatorial barcoding” or “combinatorial indexing” as it is sometimes called) is described in a variety of publications including Kuchina et al (Science 2021 371:eaba5257), O’Huallachain et al (Commun. Biol. 2020 3: 279), Cao et al (Science 2017 357: 661-667), Rosenberg (Science 2018 360: 176-182) and WO2012106385A2, among many others.
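The split-and-pool idea can be illustrated with a small simulation: in each round every cell is assigned to a random compartment, receives that compartment's barcode subunit, and is pooled again, so the number of possible barcodes grows as compartments raised to the power of rounds. The well count (96), round count (3), cell number, and function names below are assumptions chosen for illustration and are not taken from the source.

```python
import random
from collections import Counter


def barcode_space(n_wells: int, n_rounds: int) -> int:
    """Number of distinct combinatorial barcodes that can be built."""
    return n_wells ** n_rounds


def split_and_pool(n_cells: int, n_wells: int, n_rounds: int, seed: int = 0):
    """Simulate split-and-pool indexing; returns one barcode tuple per cell."""
    rng = random.Random(seed)
    barcodes = [() for _ in range(n_cells)]
    for _ in range(n_rounds):
        # Split: each cell lands in a random well and gains that well's subunit.
        # Pool: all cells are mixed again before the next round.
        barcodes = [bc + (rng.randrange(n_wells),) for bc in barcodes]
    return barcodes


barcodes = split_and_pool(n_cells=1000, n_wells=96, n_rounds=3)
occurrences = Counter(barcodes)
unique_fraction = sum(1 for bc in barcodes if occurrences[bc] == 1) / len(barcodes)

print("possible barcodes:", barcode_space(96, 3))
print("fraction of cells with a barcode shared by no other cell:", unique_fraction)
```

Keeping the number of cells small relative to the barcode space is what makes collisions rare; adding a round multiplies the space by the number of compartments.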
- scRNA-seq methods of interest include, but are not limited to, Tang (Tang et al, Nature Methods 2009 6: 377-382), STRT (Islam et al, Genome Research 2011 21: 1160-1167), SMART-seq (Ramsköld et al, Nature Biotechnology 2012 30: 777-782), SORT-seq (Muraro et al, Cell Systems 2016 3: 385-394.e3), CEL-seq (Hashimshony et al Cell Reports 2012 2: 666-673), RAGE-seq (Singh et al Nature Communications 2019 10: 3120), Quartz-seq (Sasagawa Genome Biology 2013 14: R31), C1 CAGE (Kouno et al Nature Communications 2019 10: 360), REAP-seq (Dal Molin et al Briefings in Bioinformatics 2019 20: 1384-1394), and CITE-seq (Pet
- RNA e.g., mRNA
- other components of the cells e.g., open chromatin, genomic DNA or protein expression may be assayed in the same assay.
- the method further comprises assaying the expression of a protein in the fixed cells or fixed nuclei (e.g., via binding of labeled antibodies), and the enrichment step may comprise enriching for a sub-population of the fixed cells or fixed nuclei based on their labeling and protein expression, by flow cytometry.
- the method may comprise amplifying the products (e.g., cDNA or ligation products) en masse prior to sequencing.
- sequence reads produced from the method can be analyzed to provide gene expression profiles for individual cells, i.e., an analysis of the “transcriptome” of individual cells: which mRNAs are expressed by the cells and their abundance.
- the expression of at least 100, at least 500, at least 1,000, at least 5,000 or at least 10,000 genes may be analyzed in at least 100, at least 500, at least 1,000, at least 5,000 or at least 10,000 individual cells or nuclei.
- the sample may comprise cells that are grown as a cell suspension, disassociated cells, or blood cells, or nuclei isolated from the same, for example.
- the sample may contain cells that are in solution, e.g., cultured cells that have been grown as a cell suspension, or nuclei isolated from the same.
- disassociated cells which cells may have been produced by disassociating cultured cells or cells that are in a solid tissue, e.g., a soft tissue such as liver or spleen, etc. using trypsin or the like
- nuclei from the same may be used.
- the sample may contain blood cells, e.g., whole blood or a sub-population of cells thereof, or nuclei from the same.
- Sub-populations of cells in whole blood include platelets, red blood cells (erythrocytes) and white blood cells (i.e., peripheral blood leukocytes, which are made up of neutrophils, lymphocytes, eosinophils, basophils, and monocytes).
- the cells can be from any source.
- the cells may be obtained from a culture of cells, e.g., a cell line. In other cases, the cells may be isolated from an individual (e.g., a patient or the like).
- the cells may be isolated from a soft tissue or from a bodily fluid, or from a cell culture that is grown in vitro.
- the single cells may be isolated by digesting a soft tissue such as brain, adrenal gland, skin, lung, spleen, kidney, liver, lymph node, bone marrow, bladder, stomach, small intestine, large intestine or muscle, etc.
- Bodily fluids include blood, plasma, saliva, mucus, phlegm, cerebrospinal fluid, pleural fluid, tears, lactal duct fluid, lymph, sputum, synovial fluid, urine, amniotic fluid, and semen, etc.
- nuclei can be purified from the cells.
- nuclei can be obtained from a sample of tissue that has previously been fixed.
- Cells and nuclei from yeast, plants and animals, such as fish, birds, reptiles, amphibians and mammals may be analyzed using the subject methods.
- mammalian cells or nuclei i.e., cells or nuclei from mice, rabbits, primates, or humans, or cultured derivatives thereof, may be used.
- the method can be used to compare two samples.
- the method may comprise analyzing a first population of cells using the above-described method to produce a first data set; and analyzing a second population of cells using the above-described method to produce a second data set; and comparing the first data set to the second data set, e.g., to see if there are any changes in RNA expression between the two samples.
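One simple way to compare two such data sets is a per-gene log2 fold change of mean expression. The sketch below is only one possible comparison and is not stated in the source: the function names, the pseudocount of 1, and the toy count profiles are assumptions, and a real analysis would normally add normalization and a statistical test.

```python
import math


def mean_expression(profiles):
    """Mean count per gene across cells; `profiles` is a list of {gene: count} dicts."""
    genes = set().union(*profiles)
    n_cells = len(profiles)
    return {g: sum(p.get(g, 0) for p in profiles) / n_cells for g in genes}


def log2_fold_change(first, second, pseudocount=1.0):
    """log2 of (second mean + pseudocount) / (first mean + pseudocount) per gene."""
    m1, m2 = mean_expression(first), mean_expression(second)
    return {
        g: math.log2((m2.get(g, 0.0) + pseudocount) / (m1.get(g, 0.0) + pseudocount))
        for g in set(m1) | set(m2)
    }


# Hypothetical UMI counts for two cells in each data set.
first_data_set = [{"XIST": 0, "ACTB": 7}, {"XIST": 0, "ACTB": 7}]
second_data_set = [{"XIST": 3, "ACTB": 7}, {"XIST": 3, "ACTB": 7}]

lfc = log2_fold_change(first_data_set, second_data_set)
for gene in sorted(lfc):
    print(gene, round(lfc[gene], 3))
```

In this toy input ACTB is unchanged (fold change 0) while XIST is higher in the second data set.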
- the first population of cells and the second population of cells are collected from the same individual at different times. In other embodiments, the first population of cells and the second population of cells are different populations of cells collected from tissues or different individuals.
- Exemplary cell types that can be analyzed in the method include, for example, cells isolated from a tissue biopsy (e.g., from a tissue having a disease such as colon, breast, prostate, lung, skin cancer, or infected with a pathogen etc.) and normal cells from the same tissue, e.g., from the same patient; cells grown in tissue culture that are immortal (e.g., cells with a proliferative mutation or an immortalizing transgene), infected with a pathogen, or treated (e.g., with environmental or chemical agents such as peptides, hormones, altered temperature, growth condition, physical stress, cellular transformation, etc.), and normal cells (e.g., cells that are otherwise identical to the experimental cells except that they are not immortalized, infected, or treated, etc.); cells isolated from a mammal with a cancer, a disease, a geriatric mammal, or a mammal exposed to a condition, and cells from a mammal of the same species, e.g.
- cells of different types (e.g., neuronal and non-neuronal cells), or cells of different status (e.g., before and after a stimulus on the cells), may be compared.
- the experimental material is cells susceptible to infection by a pathogen such as a virus, e.g., human immunodeficiency virus (HIV), etc.
- the control material is cells resistant to infection by the pathogen.
- the sample pair is represented by undifferentiated cells, e.g., stem cells, and differentiated cells.
- the method may be used to identify the effect of a test agent, e.g., a drug, or to determine if there are differences in the effect of two or more different test agents.
- a test agent e.g., a drug
- two or more identical populations of cells may be prepared and, depending on how the experiment is to be performed, one or more of the populations of cells may be incubated with the test agent for a defined period of time. After incubation with the test agent, gene expression of the populations of cells can be analyzed using the methods set forth above, and the results can be compared.
- the cells may be blood cells, and the cells can be incubated with the test agent ex vivo. These methods can be used to determine the mode of action of a test agent.
- the method described above may also be used as a diagnostic (which term is intended to include methods that provide a diagnosis as well as methods that provide a prognosis). These methods may comprise, e.g., analyzing the transcriptome of a subset of cells from a patient using the method described above to produce data; and providing a diagnosis or prognosis based on the data.
- the method set forth herein may be used to provide a reliable diagnostic to any condition associated with, e.g., altered gene expression.
- the method can be applied to the characterization, classification, differentiation, grading, staging, diagnosis, or prognosis of a condition characterized by an epigenetic pattern (e.g., a pattern of gene expression).
- the method can be used to determine whether the pattern of labeling of a sample from an individual suspected of being affected by a disease or condition is the same or different compared to a pattern of labeling for a sample that is considered “normal” with respect to the disease or condition.
- the method can be directed to diagnosing an individual with a condition that is characterized by expression pattern, where the pattern is correlated with the condition.
- the methods set forth herein can also be used for predicting the susceptibility of an individual to a condition.
- Exemplary conditions that are suitable for analysis using the methods set forth herein can be, for example, cell proliferative disorder or predisposition to cell proliferative disorder; metabolic malfunction or disorder; immune malfunction, damage or disorder; CNS malfunction, damage or disease; symptoms of aggression or behavioral disturbance; clinical, psychological and social consequences of brain damage; psychotic disturbance and personality disorder; dementia or associated syndrome; cardiovascular disease, malfunction and damage; malfunction, damage or disease of the gastrointestinal tract; malfunction, damage or disease of the respiratory system; lesion, inflammation, infection, immunity and/or convalescence; malfunction, damage or disease of the body as an abnormality in the development process; malfunction, damage or disease of the skin, the muscles, the connective tissue or the bones; endocrine and metabolic malfunction, damage or disease; headache or sexual malfunction, and combinations thereof.
- the method can provide a prognosis, e.g., to determine if a patient is at risk for recurrence.
- Cancer recurrence is a concern relating to a variety of types of cancer.
- the prognostic method can be used to identify surgically treated patients likely to experience cancer recurrence so that they can be offered additional therapeutic options, including preoperative or postoperative adjuncts such as chemotherapy, radiation, biological modifiers and other suitable therapies.
- the methods are especially effective for determining the risk of metastasis in patients who demonstrate no measurable metastasis at the time of examination or surgery.
- the method can also be used as a theranostic, i.e., to provide a recommendation for a course of treatment for a patient having a disease or condition, e.g., a patient that has cancer.
- a course of treatment refers to the therapeutic measures taken for a patient after diagnosis or after treatment.
- a determination of the likelihood for recurrence, spread, or patient survival can assist in determining whether a more conservative or more radical approach to therapy should be taken, or whether treatment modalities should be combined.
- cancer recurrence it can be advantageous to precede or follow surgical treatment with chemotherapy, radiation, immunotherapy, biological modifier therapy, gene therapy, vaccines, and the like, or adjust the span of time during which the patient is treated.
- a lab will receive a sample (e.g., blood) from a remote location (e.g., a physician’s office or hospital), the lab will analyze cells in the sample as described above to produce data, and the data may be forwarded to the remote location for analysis.
- a sample e.g., blood
- a remote location e.g., a physician’s office or hospital
- Kits comprising components for performing the method, as described above, are also provided.
- the components may be in separate containers or the same container, as needed.
- many populations have been described primarily based on the presence or absence of specific marker transcripts, which limits the isolation and further profiling of populations.
- Programmable Enrichment via RNA Flow-FISH by sequencing PERF-seq
- a scalable assay that enables scRNA-seq profiles from subpopulations of complex cellular mixtures defined by the presence or absence of RNA transcripts, is used.
- PBMCs peripheral blood mononuclear cells
- Vials were thawed and viability exceeded 90% for all samples.
- PBMCs were used as the primary input for developing the assay in Fig. 1 due to ease of material availability and well-defined heterogeneity for MS4A1 and CD3E.
- For Figs. 2-4, the same vials were used but enriched for specific markers as indicated in the experimental overview schematics (Figs. 2C, 3A, 4A). All experiments started with ~10M cells, except for the TF sort experiment (Fig. 4), which began with ~25M cells to yield ample cell numbers for downstream profiling given the rare BCL11A population that was sorted.
- HCR FlowFISH Detection Stage: Permeabilized cells/nuclei were resuspended in 400uL of prewarmed hybridization buffer (Molecular Instruments) per 500,000 to 1M cells. Cells/nuclei were incubated for 30 minutes at 37°C, 300rpm in a heated-lid thermomixer. The probe solution was prepared by mixing 8uL of 1uM probe stock with hybridization buffer for a final 100uL volume per sample. Probe solution was added to each sample for a final probe concentration of 16nM, and cells were incubated at 37°C for 16-24 hours.
- HCR FlowFISH Amplification Stage: Cells/nuclei were centrifuged and resuspended in 150uL of amplification buffer and incubated at room temperature for 30 minutes. In the meantime, 5uL of 3uM h1 and h2 hairpin stock was aliquoted for each probe set and snap-cooled by performing a heat shock at 95°C for 90 seconds and cooling in the dark at room temperature for 30 minutes. To prepare the hairpin solution, snap-cooled h1 and h2 hairpins were mixed with amplification buffer to make a final volume of 100uL per sample. The hairpin solution was added to the appropriate samples for a final hairpin concentration of 60nM. Cells/nuclei were incubated at room temperature for 16-24 hours (this time can be reduced to 4 hours). After incubating, samples were washed 6x with 500uL of SSCT for each sample.
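The final concentrations quoted for the detection and amplification stages follow from simple dilution arithmetic (C1·V1 = C2·V2). The sketch below checks them in Python. The helper name `final_concentration_nM` is hypothetical, and the calculation assumes that the 100uL probe or hairpin solution is added on top of the buffer volume in which the sample is already suspended, ignoring pellet volume.

```python
def final_concentration_nM(stock_nM: float, stock_uL: float, total_uL: float) -> float:
    """Concentration after diluting stock_uL of stock into total_uL (C1*V1 = C2*V2)."""
    if total_uL <= 0:
        raise ValueError("total volume must be positive")
    return stock_nM * stock_uL / total_uL


# Detection: 8 uL of 1 uM (1000 nM) probe stock. Assumption: the 100 uL probe
# solution is added to the 400 uL of hybridization buffer holding the sample.
probe_nM = final_concentration_nM(1000, 8, 400 + 100)

# Amplification: 5 uL of 3 uM (3000 nM) of each hairpin. Assumption: the 100 uL
# hairpin solution is added to the 150 uL of amplification buffer.
hairpin_nM = final_concentration_nM(3000, 5, 150 + 100)

print(f"probe: {probe_nM:g} nM, each hairpin: {hairpin_nM:g} nM")
```

Under these assumptions the computed values match the 16nM probe and 60nM hairpin concentrations stated in the protocol.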
- HCR Polymer Disassembly: Sorted cells were pelleted and resuspended in 275uL of 1x dsDNase buffer (Thermofisher #EN0771) and incubated for 15 minutes, after which 25uL of dsDNase enzyme (Thermofisher #EN0771) was added and the sample was incubated at 37°C for 2 hours. After incubation, 3uL of 1M DTT was added to the sample to quench dsDNase activity, and the sample was incubated at 55°C for 5 minutes for heat inactivation. Samples were pelleted at 850xg for 5 minutes, resuspended in 500uL of pre-warmed wash buffer, and incubated for 10 minutes.
- the mixed cells were then fixed and permeabilized as described in the prior sections, followed by the HCR FlowFISH protocol using XIST RNA probes for detection and Alexa Fluor 647-conjugated hairpins for amplification. FACS analyses were conducted on a ThermoFisher Attune NxT Flow Cytometer.
- Mouse Tissue Sourcing: Mouse brain tissue was sourced from Zyagen Inc. as a fresh frozen whole brain stored in OCT. Upon receipt, tissue was stored at -80°C. Dissection was performed in a cryotome, and the tissue was immediately taken forward for nuclei processing.
- Mouse brain dissociation and profiling: Mouse brain dissociation was performed as described by 10x Genomics in its tissue fixation and dissociation protocol. Briefly, fresh frozen mouse cerebellum was weighed and fixed in 4% paraformaldehyde solution for 2 hours at 2mL per 25mg of tissue with periodic agitation. Then, the tissue was centrifuged and resuspended in 1x PBS twice. Washed tissue was resuspended in ice-cold 70% ethanol at 2mL per 25mg of tissue and incubated overnight at 4°C. After incubation, the tissue was centrifuged and resuspended in 1x PBS twice.
- GBM FFPE dissociation: FFPE samples were preprocessed on a prototype S2 Singulator system. The sample was automatically processed in a NIC+ cartridge (S2 Genomics #100-215-389) by three 15 min deparaffinization steps (CitriSolv, VWR), rehydrated by successive 1 mL washes of 100%, 100%, 70%, 50%, and 30% ethanol, followed by 2 washes of PBS.
- the sample was then spun at 1,000g for 3 min and resuspended in 0.5 mL Nuclei Isolation Reagent (NIR, S2 Genomics, #100-063-396) with 0.1 U/uL RNase inhibitor (Protector, Millipore Sigma, #3335399001); all subsequent solutions had RNase inhibitor.
- the sample was dissociated to single nuclei in a second NIC+ cartridge with 2 mL of NIR for 10 min followed by a 2 mL wash with Nuclei Storage Reagent (NSR, S2 Genomics, #100-063-405).
- the single nuclei suspension was spun at 500g for 5 min, resuspended in NSR, and counted.
- Bioinformatics analyses overview: All bioinformatics analyses were conducted using standard output files from the execution of CellRanger on sequencing data of the Flex libraries. Downstream analyses, including cell filtering, marker gene analyses, and visualization, were performed using Seurat v4. In brief, cells were identified via a combination of passing the CellRanger knee plot as well as meeting minimum quality control standards, including at least 1,000 UMIs detected, 500 genes detected, and no more than 5% mitochondrial RNA abundance, which are standard thresholds for scRNA-seq analyses. For all sub-clustering analyses (Figs.
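The minimum quality control standards described above can be sketched as a simple per-cell filter. This is an illustration only: the analyses themselves used CellRanger and Seurat v4, the `CellQC` record and the example cells are hypothetical, and the knee-plot step is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellQC:
    barcode: str
    n_umis: int      # total UMIs detected in the cell
    n_genes: int     # distinct genes detected in the cell
    pct_mito: float  # percent of UMIs assigned to mitochondrial genes


# Thresholds as stated in the text.
MIN_UMIS = 1_000
MIN_GENES = 500
MAX_PCT_MITO = 5.0


def passes_qc(cell: CellQC) -> bool:
    """True when the cell meets all three minimum quality standards."""
    return (
        cell.n_umis >= MIN_UMIS
        and cell.n_genes >= MIN_GENES
        and cell.pct_mito <= MAX_PCT_MITO
    )


# Hypothetical cells, for illustration only.
cells = [
    CellQC("AAAC", n_umis=4800, n_genes=2500, pct_mito=2.1),  # passes
    CellQC("AAAG", n_umis=900, n_genes=600, pct_mito=1.0),    # too few UMIs
    CellQC("AACT", n_umis=3000, n_genes=450, pct_mito=3.0),   # too few genes
    CellQC("AAGT", n_umis=5200, n_genes=2800, pct_mito=7.5),  # too much mito
]
kept = [c.barcode for c in cells if passes_qc(c)]
print(kept)
```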
- the output from classification was partitioned as “B cells” for MS4A1+, “CD4+ T cells” for CD4+, CD3E+ cells, “CD8+ T cells” and “T cells” for the CD4-, CD3E+ population, and all other labels as the negative population.
- genes were clustered individually and cell type annotations were defined based on standard practice for the presence or absence of individual marker genes. The proportions annotated as accurate classification represent the total number of high-quality cells (n > 10,000 per comparison) and were consistent between different classification methods, verifying the specificity of the enrichment via sorting strategy and preservation of transcriptomes for downstream analyses. For comparisons with other RNA-seq datasets, normalized data from flow-sorted bulk populations and single-cell annotations. A collection of 2,674 target genes of BCL11A was downloaded from Harmonizome 44
- Nuclei analyses: Mouse brain analyses, including clustering and sub-clustering analyses, proceeded as described above. The selection of plotted marker genes followed the cerebellum atlas that defined oligodendrocyte subtype markers without any prior enrichment. To compare PERF-seq performance against frozen nuclei, a public four-plex Flex library was downloaded and the dissociated eye nuclei were selected as the closest anatomical tissue to the profiled tissue, noting that this is an imperfect yet useful comparison (Fig. 5B).
- HCR polymer in the sorted population of cells was disassembled by adapting formamide and enzymatic stripping of the HCR polymer, both of which have been used in imaging and microscopy analyses.
- multiple dsDNases, including a dsDNase (Thermo) that preferentially degrades double-stranded DNA, were assessed (Methods).
- Although formamide enabled polymer stripping as expected, there was no meaningful improvement in data quality.
- the presence of formamide was sufficient to inhibit the capture of virtually any ligation product (0.5% reads mapping to the full probe set), irrespective of the presence of HCR polymer (Fig. 1D).
- PERF-seq benchmarking: To assess the feasibility, sensitivity, and specificity of potential enrichments, the efficiency of the HCR-FlowFISH probes to enrich for well-defined populations was considered.
- Staining PBMCs for ACTB with two different amplifiers allowed us to assess HCR FISH efficiency on both the 647 and 488 channels via FACS. In either channel, 97% of cells were positive for ACTB, indicating the high sensitivity of HCR-FlowFISH and compatibility with multiple fluorescent colors (Fig. 2A).
- two human cell lines, Rajis and K562s were mixed in varying abundances, including two orders of magnitude dilution of the K562 cells into the Rajis (Methods).
- the key data quality metrics per cell did not differ by more than ~10% when comparing public Flex (median 5,200 UMIs and median 2,857 genes detected per cell), Flex completed herein (median 4,885 UMIs and median 2,536 genes), and the PERF-seq libraries (median 4,789 UMIs and median 2,535 genes; Fig. 4F)
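As a quick check of the comparison above, the relative differences can be computed directly from the reported medians. The sketch below uses only the numbers stated in the text; the choice of public Flex as the reference and the helper name `percent_below` are assumptions made for illustration.

```python
# Median per-cell metrics as reported in the text.
METRICS = {
    "public Flex": {"umis": 5200, "genes": 2857},
    "Flex (this work)": {"umis": 4885, "genes": 2536},
    "PERF-seq": {"umis": 4789, "genes": 2535},
}


def percent_below(reference: float, value: float) -> float:
    """How far value falls below reference, as a percentage of reference."""
    return 100.0 * (reference - value) / reference


reference = METRICS["public Flex"]  # assumed reference for the comparison
differences = {
    name: {key: percent_below(reference[key], m[key]) for key in ("umis", "genes")}
    for name, m in METRICS.items()
    if name != "public Flex"
}

for name, d in differences.items():
    print(f"{name}: UMIs {d['umis']:.1f}% lower, genes {d['genes']:.1f}% lower")
```

With public Flex as the reference, the UMI medians fall roughly 6-8% lower and the gene medians roughly 11% lower, in line with the approximate 10% figure.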
- the benchmarking analyses verify HCR-FlowFISH as a sensitive and specific workflow for enriching populations, and PERF-seq as a protocol that yields consistent mRNA profiles with minimal, if any, loss in data quality.
- Multi-color programmable enrichment with PERF-seq: Given the successful benchmarking of the assay with an individual marker, PERF-seq was further benchmarked by designing probes against three well-described genes in different immune populations in PBMCs: CD3E, MS4A1 (CD20), and CD4 (Fig. 3A). Using one fluorophore per gene, four populations of cells were recovered using PERF-seq, including a CD4 and CD3E double-positive population that together specifically enrich for CD4+ T cells when either marker alone would not (CD4: monocytes; CD3E: CD8+ T cells; Fig. 3B).
- scRNA-seq profiling of the four populations in a single reaction using in-line hashing yielded a total of 35,220 cells across one in-line barcoded multiplexed capture (Fig. 3C).
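The combinatorial sort described above can be expressed as boolean logic over per-marker positivity calls. The following is a minimal sketch, not the actual FACS gating: the gate names, the exclusion rules between gates, and the treatment of events that fall outside all four gates are assumptions chosen for illustration.

```python
from typing import Optional


def assign_gate(ms4a1: bool, cd3e: bool, cd4: bool) -> Optional[str]:
    """Map three marker calls to one of four hypothetical sort gates.

    Returns None for events that match no gate in this sketch
    (for example CD4 single-positive events).
    """
    if ms4a1 and not cd3e:
        return "MS4A1+"        # expected to enrich B cells
    if cd3e and cd4 and not ms4a1:
        return "CD3E+CD4+"     # AND gate: expected to enrich CD4+ T cells
    if cd3e and not cd4 and not ms4a1:
        return "CD3E+CD4-"     # AND/NOT gate: expected to enrich CD8+ T cells
    if not (ms4a1 or cd3e or cd4):
        return "negative"      # no marker detected
    return None                # not collected in this sketch


# Hypothetical events as (MS4A1, CD3E, CD4) positivity calls.
events = [
    (True, False, False),
    (False, True, True),
    (False, True, False),
    (False, False, False),
    (False, False, True),
]
gates = [assign_gate(*event) for event in events]
print(gates)
```

The last event illustrates why CD4 alone is not sufficient: a CD4 single-positive event (monocyte-like) is kept out of the CD4+ T cell gate because that gate also requires CD3E.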
- three independent cell type annotation methods were performed, indicating that PERF-seq had 70-88% accuracy in recovering the expected cell type labels (Fig. 3D).
- the enrichment was lower for the negative population (i.e., B and T cells were not all sorted), indicative of potential variation in probe efficacy in the FlowFISH workflow (Methods).
- Application of standard clustering and cell type annotation workflows showed a clear skewing of cell states in either enriched library, including a consistent depletion of T cells from the sorted TF libraries (Figs. 4E, 4F).
- BCL11A is an essential regulator of lymphoid development with ‘B cell’ as part of its name
- nearly 75% of the enriched cell population from its PERF-seq library were pDCs.
- This enrichment was consistent with bulk and single-cell expression indicating that pDCs express BCL11A 1-2 orders of magnitude higher than B cells.
- cells from this sorted library had higher BCL11A expression and target gene module scores than B cells from the other two libraries, indicating that the PERF-seq workflow could enrich for cells with high TF activities within specific cell types.
- the SPI1 enrichment primarily resulted in monocytes and classical dendritic cells (cDCs), again consistent with the expectation of where this factor is expressed in PBMCs (Fig. 4F).
- PERF-seq targeting IL3RA was performed using an inclusive sort gate to include the IL3RA-low population (marked in the prior analysis with SPI1+ FlowFISH). From the 9,178 profiled cells, clustering analyses identified 95 AS DCs, and gene correlation analyses confirmed the separation of expression modules co-occurring with these two TFs.
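To put the recovery of 95 AS DCs among 9,178 profiled cells in context, it can be compared with the baseline frequency of roughly 1 in 5,000 PBMCs given in the Background. The sketch below is a rough estimate only: the function name is hypothetical, and the comparison assumes the baseline frequency applies to the input sample and that quality filtering affects all cell types equally.

```python
def fold_enrichment(observed_hits: int, observed_total: int, baseline_one_in_n: int) -> float:
    """Observed fraction divided by a baseline frequency of 1 in baseline_one_in_n."""
    if observed_total <= 0 or baseline_one_in_n <= 0:
        raise ValueError("totals must be positive")
    observed_fraction = observed_hits / observed_total
    baseline_fraction = 1 / baseline_one_in_n
    return observed_fraction / baseline_fraction


# 95 AS DCs among 9,178 profiled cells versus roughly 1 in 5,000 PBMCs.
as_dc_fold = fold_enrichment(95, 9178, 5000)
print(f"AS DC fraction: {95 / 9178:.2%}, approx. {as_dc_fold:.0f}-fold over baseline")
```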
- analyses of the PERF-seq datasets and the original Smart-seq2 profiles broadly corroborate the cDC-like and pDC-like subsets of AS DCs and nominate the lineage-defining BCL11A and SPI1 TFs as potentially critical regulators of the heterogeneity in this rare cell state.
- Subclustering of the 1,015 vascular cells from the positive sort confirmed the two major populations, including VWF+ endothelial cells and FN1+/DCN+ mural cells (Figs. 5L, 5M). While identification of these major populations within the vasculature corroborates prior scRNA-seq analyses, high-quality PERF-seq profiles allowed for further identification of 8 subclusters within the enriched vasculature (Figs. 5M, 5N).
- The subclusters included an abundant population of PLVAP+ endothelial cells (PLVAP being a vascular marker of blood-brain barrier disruption), proliferating cells marked by MKI67, and an ultra-rare population of mural cells expressing OGN that has not been well-defined in the human brain but is potentially linked to brain tumorigenesis.
- A ten-gene FlowFISH panel for the male-specific region of the Y chromosome was designed (MSY; Fig. 7A).
- the MSY genes were selected because each is lowly expressed individually (1.5 UMIs per gene per positive cell; sum 5.5 UMIs per positive cell) compared with other targets (13.4 UMIs; Fig. 7B).
- PERFF-seq can infer somatic changes as long as a sufficient fluorescent signal can be generated via FlowFISH. Although MSY genes are lowly expressed, the analyses show PERFF-seq can be applied to a range of gene expression levels with corresponding degrees of enrichment.
- CD4 FlowFISH data was re-analyzed in the context of the HCR-FlowFISH data.
- the analyses indicate that CD4 RNA expression is actually more intermediate in T cells relative to monocytes (Fig. 9), which was subsequently confirmed with additional antibody staining experiments.
- the collective applications confirm the utility of PERF-seq to enrich a variety of transcriptomic markers across distinct input material, including cells and nuclei derived from heterogeneous tissue preservation methods, including FFPE (Fig. 5).
- Because dissociations from solid tissue types often require the isolation of nuclei rather than cells for genomics profiling, the enrichment of specific cell populations is challenging due to a lack of well-described antibodies that stain the surface of the nucleus or of the ability to enrich based on appropriate intranuclear proteins.
- PERF-seq may be used to enrich for specific populations defined by genes without the need for laborious genetic engineering, including via Cre-recombinases.
- Although prior methods such as Probe-seq have coupled RNA FISH to bulk RNA sequencing, PERF-seq is distinct in its ability to profile populations of single cells at high throughput (~10^4-10^5 cells per capture).
- PERF-seq utilizes commercially available kits for FlowFISH and scRNA-seq profiling
- its adoption should be straightforward in groups proficient in either technology.
- the number of cells/nuclei recovered from the enrichment sort should be sufficient to start the 10x Flex workflow, which typically requires ~10^5-10^6 cells for cell pelleting upstream of droplet encapsulation.
- sorting for increasingly rare populations requires a concomitant abundance of starting material.
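The relationship between population rarity and required starting material can be made concrete with a back-of-the-envelope calculation. This is a sketch under stated assumptions: the function name is hypothetical, the `recovery` parameter (fraction of target cells surviving staining, sorting, and washes) is an illustrative placeholder rather than a measured value, and the gate is assumed to capture only the target population.

```python
import math


def required_input_cells(target_sorted: int, one_in_n: int, recovery: float = 1.0) -> int:
    """Input cells needed to sort target_sorted cells of a population that
    occurs at 1 in one_in_n, given a fractional recovery through the workflow."""
    if not 0 < recovery <= 1:
        raise ValueError("recovery must be in (0, 1]")
    return math.ceil(target_sorted * one_in_n / recovery)


# Lower bound of the ~10^5 cells needed for pelleting, for a 1-in-100 population.
common = required_input_cells(100_000, 100)

# Same target for a 1-in-5,000 population (the AS DC frequency in PBMCs),
# with perfect recovery and with a hypothetical 50% recovery.
rare = required_input_cells(100_000, 5_000)
rare_half = required_input_cells(100_000, 5_000, recovery=0.5)

print(f"{common:,} vs {rare:,} vs {rare_half:,} input cells")
```

The steep growth in input for rarer populations is one reason an inclusive sort gate, refined afterwards from the transcriptomic profiles, can be a practical alternative to gating narrowly on the rare population alone.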
- the HCR-FlowFISH kits have 10+ colors, conceptually allowing for sophisticated AND/NOT/OR logic-gating of populations for profiling.
- HCR-FlowFISH FACS enables distinct sorting strategies.
- PERF-seq allows for designing an inclusive sorting logic where populations can be subsequently refined using transcriptomic profiles. For example, the application to sorting BCL11A+ cells allowed for detailed reanalyses of both AS DCs and B cells in the same experiment without pre-specifying the populations with distinct markers during sorting, as these populations could be readily separated from scRNA-seq analyses (Fig. 4).
- PERF-seq may be a particularly advantageous protocol for sorting and enriching cells or nuclei in settings where antibodies against marker proteins are either not available, not applicable, or poorly defined for a population of interest.
- FlowFISH technologies have been established for viral gene expression, ribosomal RNA content, or long non-coding RNAs (Fig. 2B)
- the workflow distinctly enables studying populations defined by these markers that have been understudied in conventional genomics analyses.
- populations may be defined with existing large-scale scRNA-seq atlases, followed by rational nucleic acid cytometry, which will be accelerated by the development of the assay described herein.
- PERF-seq may aid in the etching of cell atlases with an even greater definition.
Abstract
Provided herein is a method comprising: obtaining a sample comprising fixed cells or fixed nuclei, hybridizing oligonucleotide probes to RNA in the fixed cells or fixed nuclei, to produce labeled cells or labeled nuclei, enriching for a sub-population of the fixed cells or fixed nuclei based on their labeling, by flow cytometry, treating the enriched cells or enriched nuclei with a double-stranded DNAse for a sufficient time to degrade the oligonucleotide probes, inactivating the double-stranded DNAse, and performing single cell RNA analysis on the cells or nuclei.
Description
PROGRAMMABLE ENRICHMENT VIA RNA FISH FOR SINGLE-CELL RNA ANALYSIS
GOVERNMENT SUPPORT
This invention was made with Government support under contracts R00HG012579 and UM1HG012076 awarded by the National Institutes of Health. The Government has certain rights in the invention.
CROSS-REFERENCING
This application claims the benefit of U.S. provisional application serial no. 63/569,677, filed on March 25, 2024, which application is incorporated by reference herein.
BACKGROUND
Through continued efforts to define a Human Cell Atlas, the application of single-cell genomics technologies, particularly scRNA-seq, has identified many rare cell states that may contribute to the etiology of complex diseases. For example, CFTR+ pulmonary ionocytes occur in human lung epithelium at a frequency of 1 in 200 cells and likely mediate the pathogenesis of cystic fibrosis. Similarly, enteric neurons occur at a frequency of 1 in 300 nuclei in the colon and AXL+ SIGLEC6+ dendritic cells (AS DCs) occur at a frequency of 1 in 5,000 peripheral blood mononuclear cells (PBMCs). More recently, a rare population of HHV-6+, CAR+ T cells that can occur at a frequency of 1 in 10,000 cells in infusion products was identified, which may contribute to the etiology of HHV-6 encephalitis in patients receiving cell therapies. Though these anecdotes represent diverse populations and tissue types, the conceptual mode of discovery for these populations has been consistent: the profiles of ~10^5-10^7 cells were generated, yielding ~10^1-10^3 events of interest. As a consequence of tremendous resources being used to define very rare but consequential populations of cells, there is limited power for downstream analyses, including the identification of transcriptional heterogeneity within these populations, inference of additional marker genes, and analyses of gene regulatory networks.
Even after these populations are identified via scRNA-seq, challenges persist in isolating populations defined by marker transcripts for further characterization. When possible, a frequently used approach is the enrichment or depletion of cells expressing specific surface proteins via fluorescence-activated cell sorting (FACS). However, as these rare populations are defined through transcriptomics analyses, analogous surface proteins may not be well-established, such as HHV-6+ CAR T cells that are currently defined by the presence of viral RNA. Further, many sample preparations require nuclei dissociation steps from frozen or formalin-fixed paraffin-embedded (FFPE) tissue samples that eliminate the possibility of enriching or depleting based on any non-nuclear protein. Though the profiling of intranuclear proteins and scRNA-seq has been recently demonstrated, high-quality antibodies recognizing transcription factors can be inaccessible due to a lack of highly structured antigens available for targeting. These limitations motivate an approach that enriches for either cells or nuclei using individual or arbitrary combinations of RNA markers upstream of additional genomics profiling via scRNA-seq.
The present disclosure addresses this problem.
SUMMARY
The present disclosure describes, among other things, a method referred to as “Programmable Enrichment via RNA Flow-FISH by sequencing” (or “PERF-seq” for short). In this method, RNA(s) are labeled using fluorescence in situ hybridization (FISH), and a subset of the cells are selected by flow cytometry and then analyzed via scRNA-seq. The broad applicability of PERF-seq to enrich immune cell subsets using individual and combinations of RNA markers, including the mRNA of transcription factors, is demonstrated. Further, the compatibility of this protocol with nuclei extracted from frozen or FFPE tissue samples is shown. This approach enables efficient enrichment and high-throughput profiling of cell and nuclei populations of interest using logic-gated sorting across heterogeneous cell and tissue types. The method was implemented using single-cell RNA sequencing (scRNA-seq). However, the general approach could be used for other single cell RNA analysis methods (e.g., methods that are based on hybridization of probes to RNA in the cells and then analyzing the hybridized probes or ligation products of the same).
In some embodiments, the method may comprise obtaining a sample comprising fixed cells or fixed nuclei, hybridizing oligonucleotide probes to RNA in the fixed cells or fixed nuclei to produce labeled cells or labeled nuclei, enriching for a sub-population of the fixed cells or fixed nuclei based on their labeling by flow cytometry, treating the enriched cells or enriched nuclei with a double-stranded DNAse for a sufficient time to degrade the oligonucleotide probes, inactivating the double-stranded DNAse, and performing single cell RNA analysis on the cells.
Other aspects and embodiments are described in greater detail below.
BRIEF DESCRIPTION OF THE FIGURES
The skilled artisan will understand that the drawings, described below, are for illustration purposes only. The drawings are not intended to limit the scope of the present teachings in any way.
Figs. 1A-1F. Rationale and development of PERF-seq. Fig. 1A: Overall schematic of the PERF-seq assay. Target RNA(s) are bound by initiator probes. Hairpin amplifiers generate fluorescent signal and enable fluorescence-activated cell sorting (FACS) before single-cell profiling with the droplet-based scRNA-seq Flex kit. Fig. 1B: Top left: Knee plot of cells profiled with standard Flex versus HCR-FISH sorted cells. Top right: Summary of fully mapping (blue) or half-mapping (grey) to the reference probe set. Bottom left: Bioanalyzer traces highlighting the expected product size of the full probe (blue) and half probe (grey) for a high-quality Flex library. Bottom right: Same as bottom left but for the FlowFISH -> Flex v0 experiment. Fig. 1C: Summary of experiments identifying the HCR polymer as the corrupting agent for data quality. Fig. 1D: Conditions screened for polymer stripping, including DNase and formamide. Fig. 1E: Summary of conditions analyzed for sorting buffer to improve data quality. Fig. 1F: Overall summary of UMIs (top) and genes (bottom) detected per cell comparing initial FlowFISH -> Flex v0 experiment, from (Fig. 1B) to the PERF-seq library, L, from panel (Fig. 1E). The median values for each metric and each library are reported.
Figs. 2A-2F. Benchmarking of PERF-seq. Fig. 2A: Schematic of ACTB staining. PBMCs were isolated, stained, and analyzed for ACTB expression. Fig. 2B: Benchmarking of lncRNA XIST by cell line mixing. A serial dilution of K562s (XX / XIST+) into Rajis (XY / XIST-) was performed and XIST FlowFISH was assessed to recover the positive population. Fig. 2C: Schematic of PERF-seq benchmarking experiment for four libraries, including standard Flex and variable probe staining/sorting. Fig. 2D: Flow sort strategy for CD3E+ cells for the PERF-seq library. Fig. 2E: The proportion of cells from the sorted PERF-seq library annotated as T cells using three different computational methods for classification. Fig. 2F: Downsampling analysis for library saturation and UMI benchmarking. The dotted line represents the mean reads per cell for a final comparison (depth of lowest sample).
Figs. 3A-3G. Enrichment of cells via combinatorial logic. Fig. 3A: Schematic of experimental design. Probes targeting three genes with three distinct fluorophores stained PBMCs from a healthy donor. Fig. 3B: FlowFISH signal and sort gates. Percentages represent the overall fraction of events sorted in each gate. Fig. 3C: Reduced dimensionality representation of four populations profiled with PERF-seq. Colors represent gates drawn from the FlowFISH sort in (Fig. 3B). Fig. 3D: Percent of high-quality cells from PERF-seq assigned to expected cell types using three distinct annotation methods. Colors represent gates drawn from the FlowFISH sort in (Fig. 3B). Fig. 3E: Annotation of relevant marker genes for populations in reduced dimensionality space, including genes used in the FlowFISH panel. Fig. 3F: Differential gene expression (DGE) analysis comparing CD4+ and CD4- populations from the CD3E+ sort (panel b, right). Genes corroborating annotation are highlighted. Fig. 3G: Subclustering of CD3E+/CD4+ cells, highlighting rare subclusters marked by relevant genes.
Figs. 4A-4K. Rare cell states enriched via transcriptional regulators. Fig. 4A: Schematic of experiment. Human PBMCs were stained with probes targeting BCL11A and SPI1. Fig. 4B: Reduced dimensionality representation of PERF-seq profiles from three populations based on TF FlowFISH sorting logic. Fig. 4C: Depiction of marker gene expression across all PERF-seq profiled cells. Fig. 4D: Empirical cumulative distribution plot of raw UMI count for BCL11A (left) and SPI1 (right) stratified by the captured PERF-seq library. Fig. 4E: Annotated cell states from PERF-seq profiling. Fig. 4F: Proportions of each cell type per library with major cell types labeled. Colors match (Fig. 4E). Fig. 4G: Relative enrichment of each cell type in either the BCL11A+ sort (x-axis) or SPI1+ sort (y-axis) relative to the negative population. AXL+, SIGLEC6+ dendritic cells (AS DCs) are highlighted as the only enriched population in both sorted populations. Colors match (Fig. 4E). Fig. 4H: Reduced representation of AS DCs highlighting the TF FlowFISH library and defining marker gene expression. Fig. 4I: Volcano plot comparing differentially expressed genes from the two FlowFISH sorted populations. Notable marker genes are highlighted, including known and newly characterized marker genes for AS DC subsets. Fig. 4J: Violin plots of marker genes, stratified by the FlowFISH library. All genes were significantly differentially expressed at a false discovery rate (FDR) <0.01. Fig. 4K: Gene-gene correlations of all AS DCs using the original dataset, highlighting the co-occurrence of transcription factors from the analysis with established marker genes for the AS DC subsets.
Figs. 5A-5N. Study of rare nuclei from fresh and FFPE tissue. Fig. 5A: Schematic of nuclei PERF-seq experiments. Nuclei were isolated from either frozen mouse brain tissue or FFPE human glioblastoma multiforme (GBM) tissue and enriched for specific populations based on HCR-FlowFISH, showing the sort strategy. Fig. 5B: Downsampling analysis for library saturation and UMI benchmarking for the mouse brain nuclei. The dotted line represents the mean reads per cell for a final comparison (depth of lowest sample). Fig. 5C: Same as (Fig. 5B) but for the human FFPE tissue sample. Fig. 5D: Reduced dimensionality representation of the mouse brain nuclei FlowFISH enriched/depleted populations profiled with PERF-seq. Fig. 5E: Same as (Fig. 5D) but colored by Mobp marker gene expression. The boxed population was further subclustered. Fig. 5F: Empirical cumulative distribution plot of raw UMI count for Mobp stratified by the captured PERF-seq library. Fig. 5G: Sub-clustering of the Mobp+ population with top marker genes per cluster noted where arrows highlight these populations. Fig. 5H: Reduced dimensionality representation of the human FFPE nuclei FlowFISH enriched/depleted populations profiled with PERF-seq. Fig. 5I: Same as (Fig. 5D) but colored by marker genes used in the FlowFISH panel. The boxed population was further subclustered. Fig. 5J: Empirical cumulative distribution plot of total UMI count for the sum of the three genes enriched via FlowFISH, stratified by the captured PERF-seq library. Fig. 5K: Top differentially expressed genes between the two FFPE populations profiled with PERF-seq. Fig. 5L: Gene-gene correlations of relevant marker genes, including those used in the FlowFISH enrichment panel. Fig. 5M: Sub-clustering of the Panel+ population with cluster states noted. Fig. 5N: Top marker genes enriched in specific sub-clusters; arrows indicate critical populations where each gene is highly expressed.
Figs. 6A-6F. Supporting analyses of PERF-seq development. Fig. 6A: Schematic overview of 10x Flex workflow, including probe hybridization to cells upstream of chromium and bead oligo extension of the ligation product. Fig. 6B: Representative bioanalyzer trace outlining complete versus incomplete sequencing molecules. Fig. 6C: Comparison of FlowFISH signal using either unstained cells or the hairpin only in comparison to the sorted MS4A1 positive population and/or stripped population. Fig. 6D: Same as in (Fig. 6C) but for the CD3E gene. Fig. 6E: Bioanalyzer traces for representative libraries from panels in Fig. 1, highlighting half- and fully-mapped probes. Fig. 6F: Bioanalyzer traces of library preparation where the full PERF-seq workflow was completed except for the omission of the dsDNAse stripping step.
Figs. 7A-7F. Profiling somatic mosaicism with PERFF-seq. Fig. 7A: Schematic of experiment. PBMCs from donors of different ages were sorted for a ten-gene OR-gated panel of MSY. Fig. 7B: Mean per-cell expression of all genes detected in Flex with genes analyzed for FlowFISH noted. Fig. 7C: Summary of the percentage of MSY- cells, with donor age labels, from the FlowFISH cytometry data. Fig. 7D: UMAP embedding of PERFF-seq profiles from the 51-year-old donor based on MSY sorting logic. Fig. 7E: Analyses of cell types from scRNA-seq analyses for cell types enriched (left) or depleted (right) in the MSY- library. The colors represent cell types as shown in Fig. 7C. Fig. 7F: Gene set enrichment analyses of MSY- versus MSY+ CD14 monocytes, highlighting TNF signaling by NF-κB. Statistical significance is based on a permutated enrichment score under a two-sided null. prolif., proliferative.
Fig. 8: Summary of BCL11A RNA expression across populations. Bulk RNA-seq of sorted populations of BCL11A20. Design and results of cytometry analysis of PBMCs co-stained with BCL11A mRNA (via HCR-FISH) and CD19 and CD123 protein (via antibodies). Mean fluorescence intensity (MFI) for BCL11A of each population is quantified.
Fig. 9: Design and results of antibody and HCR FISH co-staining to evaluate CD4 RNA expression. Summary of CD4 HCR FISH signal, stratified by CD3E populations. Bulk RNA-seq expression of CD4 from FACS-isolated populations.
DETAILED DESCRIPTION
Unless defined otherwise herein, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are described.
All patents and publications, including all sequences disclosed within such patents and publications, referred to herein are expressly incorporated by reference.
Numeric ranges are inclusive of the numbers defining the range. Unless otherwise indicated, nucleic acids are written left to right in 5' to 3' orientation; amino acid sequences are written left to right in amino to carboxy orientation, respectively.
The headings provided herein are not limitations of the various aspects or embodiments of the invention. Accordingly, the terms defined immediately below are more fully defined by reference to the specification as a whole.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Singleton, et al., DICTIONARY OF MICROBIOLOGY AND MOLECULAR BIOLOGY, 2D ED., John Wiley and Sons, New York (1994), and Hale & Markham, THE HARPER COLLINS DICTIONARY OF BIOLOGY, Harper Perennial, N.Y. (1991) provide one of skill with the general meaning of many of the terms used herein. Still, certain terms are defined below for the sake of clarity and ease of reference.
It must be noted that, as used in this specification and the appended claims, the singular forms "a," "an" and "the" include plural referents unless the content clearly dictates otherwise. Thus, for example, reference to "a cross-linker" includes two or more cross-linkers.
As noted above, in some embodiments the method may comprise obtaining a sample comprising fixed cells or fixed nuclei. This may be done by treating a sample comprising single cells or nuclei with a chemical cross-linker (e.g., paraformaldehyde or glutaraldehyde) or isolating fixed nuclei from a fixed tissue sample (e.g., a section of tissue that has been fixed using paraformaldehyde or glutaraldehyde). The next steps of the method may comprise hybridizing oligonucleotide probes to RNA in the fixed cells or fixed nuclei to produce labeled cells or labeled nuclei, enriching for a sub-population of the fixed cells or fixed nuclei based on their labeling by flow cytometry, treating the enriched cells or enriched nuclei with a double-stranded DNAse (or another enzyme or treatment that is capable of degrading double-stranded DNAs) for a sufficient time to degrade the oligonucleotide probes, inactivating the double-stranded DNAse, and performing single cell RNA analysis on the cells or nuclei.
In these embodiments, the double-stranded DNAse should be DNA-specific and should specifically degrade double-stranded DNA molecules over single-stranded DNA molecules (meaning that the activity of the enzyme on a single-stranded DNA substrate should be less than 1%, less than 0.5%, or less than 0.1% of the activity of the enzyme on a double-stranded DNA substrate). Such enzymes, which are referred to as dsDNAses or “duplex DNAses”, can be purchased from a variety of vendors, including Thermofisher (Waltham, MA), VWR (Radnor, PA) and New England Biolabs (Ipswich, MA), among many others. Like many of the commercial enzymes, the enzyme may be heat labile so that it can be readily inactivated by moderate heat treatment (e.g., by incubation at 55°C for at least 10 minutes). In such embodiments, the double-stranded DNAse may be inactivated, e.g., by exposure to a temperature in the range of 50 °C to 75 °C (e.g., 50-60 °C) and/or by addition of a chelating agent (e.g., EDTA) and/or a reducing agent (e.g., DTT), before analyzing the RNA (e.g., by making cDNA or hybridizing probes).
In any embodiment, the method is done without uncrosslinking the enriched cells or enriched nuclei, and without denaturing the oligonucleotide probes from the RNA. For example, the method may be done in the absence of a step in which formamide, DMSO, or another chemical denaturant is added to the sample to denature the oligonucleotide probes from the RNA.
In any embodiment, the method may further comprise permeabilizing the cells or nuclei using a detergent (e.g. Tween). As such, in some embodiments, the method may comprise treating a sample comprising single cells or nuclei with a chemical cross-linker (e.g., paraformaldehyde or glutaraldehyde) and a detergent such as Tween to produce the fixed cells or fixed nuclei, where the fixed cells or fixed nuclei are permeabilized.
Flow cytometry methods that enrich for cells are well known in the art and are commonly referred to as fluorescence activated cell sorting or FACS. In the present method, cellular RNA is labeled in situ (i.e., within the cells or nuclei) and cells that have a particular labeling pattern are enriched (where a labeling pattern can be, e.g., the presence of a particular transcript, the amount of a particular transcript, the presence of a particular combination of transcripts, the lack of one transcript and the presence of another, in a cell etc.). This technique is commonly referred to as “FISH-Flow”, “Flow-FISH” or “FlowFISH” in the art and, as described above, comprises hybridizing fluorescently labeled oligonucleotides to mRNA in fixed cells or nuclei in situ and then selecting for a sub-population of the cells using a flow cytometry-based sorting. Such methods are generally described in, e.g., Arrigucci et al (Nat Protoc. 12: 1245-1260), Freen-van Heeren et al (BioTech 2021, 10, 21), Grau-Exposito et al (mBio 2017 8: e00876-17), Hanley et al (PLoS ONE 8(2): e57002), Reilly et al (Nature Genetics 2021 53: 1166-1176) and Bushkin et al (J Immunol. 2015 194: 836-841) among many others and can be readily adapted herein. In some embodiments, the flow cytometry may detect single RNA molecules, methods for which are described in Smith et al (ACS Nano. 2020 14: 2324-2335), Yoo et al (Anal. Chem. 2022 94: 1752-175), Femino (Science 1998 280: 585-90), Choi et al (Development 2018 145: dev165753) and Raj et al (Nat. Methods 2008 5: 877-9) among many others. In some embodiments, at least 500 cells or nuclei (e.g., at least 1,000, at least 5,000, at least 10,000, or at least 50,000 cells or nuclei) are enriched. This method can be used to select rare cells, e.g., hematopoietic or non-hematopoietic cells that are present at a low concentration (e.g., less than 1%, less than 0.5%, less than 0.1%, less than 0.05%, or less than 0.01% in the initial population). As such, in some embodiments, less than 10% (e.g., less than 5%, less than 1%, less than 0.5%, less than 0.1%, less than 0.05%, or less than 0.01%) of the fixed cells or fixed nuclei may be enriched.
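The labeling patterns described above (presence of one transcript, absence of another, or combinations thereof) can be expressed as composable Boolean gates. The following is a minimal illustrative sketch only, not the sorter's actual software: the channel names `geneA` and `geneB`, the threshold of 100 and the intensity values are hypothetical, and a real gate would be drawn on cytometer data with appropriate controls.

```python
from typing import Callable, Dict, List

# Hypothetical representation: one dict per cell, mapping a FISH channel
# name to its measured fluorescence intensity (arbitrary units).
Cell = Dict[str, float]
Gate = Callable[[Cell], bool]


def positive(channel: str, threshold: float) -> Gate:
    """Gate passing cells whose signal in `channel` is at or above `threshold`."""
    return lambda cell: cell[channel] >= threshold


def not_(gate: Gate) -> Gate:
    """Gate passing cells that fail `gate` (e.g., lack of a transcript)."""
    return lambda cell: not gate(cell)


def and_(*gates: Gate) -> Gate:
    """Gate passing cells that satisfy every component gate."""
    return lambda cell: all(g(cell) for g in gates)


def or_(*gates: Gate) -> Gate:
    """Gate passing cells that satisfy at least one component gate."""
    return lambda cell: any(g(cell) for g in gates)


def enrich(cells: List[Cell], gate: Gate) -> List[Cell]:
    """Return the sub-population of cells that pass the gate."""
    return [cell for cell in cells if gate(cell)]


# Illustrative (made-up) intensities for four cells.
cells = [
    {"geneA": 950.0, "geneB": 20.0},
    {"geneA": 15.0, "geneB": 870.0},
    {"geneA": 900.0, "geneB": 910.0},
    {"geneA": 12.0, "geneB": 18.0},
]

# "Presence of one transcript and lack of another".
a_pos_b_neg = and_(positive("geneA", 100.0), not_(positive("geneB", 100.0)))
sorted_cells = enrich(cells, a_pos_b_neg)
fraction_enriched = len(sorted_cells) / len(cells)

# An OR gate over the same two channels, analogous to an OR-gated panel.
either = enrich(cells, or_(positive("geneA", 100.0), positive("geneB", 100.0)))
```

Because each gate is an ordinary function, arbitrarily nested logic can be built from the same three combinators, and the enriched fraction can be compared against the percentages recited above.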
Any suitable system may be used in the labeling step of the method. For example, the RNA may be labeled using Z-probes (see, e.g., Tripath et al Noncoding RNA 2018 4: 20), branched DNA (see, e.g., Wang et al J Mol Diagn. 2012 14: 22-29, Player et al J Histochem Cytochem 2001 49: 603-12), SABER (see, e.g., Kishi et al Nature Methods 2019 16: 533-544) or by hybridization chain reaction (see, e.g., Schwarzkopf et al Development. 2021 148: dev199847 and Nat. Biotechnol. 2018 10.1038/nbt.4286). In both the Z-probe approach and the hybridization chain reaction approach, the oligonucleotide probes comprise a pair of unlabeled initiator probes that hybridize to adjacent sites in a target RNA and fluorescently labeled amplification probes that hybridize to a pair of initiator probes when the initiator probes are hybridized to their target RNA. In the hybridization chain reaction approach, the fluorescently labeled amplification probes are designed to hybridize to one another as well as to the initiator probes, thereby forming a complex comprising multiple amplification probes when the initiator probes hybridize to their target RNA. In these embodiments, the amplification probes may comprise a hairpin structure. Further details of hybridization chain reaction approaches can be found in, e.g., Choi et al (Nature Biotechnology 2010 28: 1208-12), Yamaguchi et al (Environmental Microbiology 2015 17: 2532-2541), Schulte et al (Development 2024 151: dev202307), Zheng et al (Anal Methods. 2023 15: 1422-1430), Choi et al (Development 2018 145: dev165753) and Choi et al (ACS Nano 2014 8: 4284-4294), among many others. The FISH protocol used in the present methods may have a nucleic acid amplification step. In other embodiments, the FISH protocol used in the present method does not involve a nucleic acid amplification step (i.e., is free of nucleic acid amplification). FISH methods that are capable of resolving single molecules are referred to as “single-molecule FISH” or smFISH methods. Any smFISH methods may be employed herein.
In addition to being labeled by FISH, the cells may be additionally labeled and enriched in other ways (e.g., using antibodies that bind to cell surface antigens or intracellular antigens), thereby providing a multiomics-based way to enrich for cells of interest.
The single-cell RNA analysis can be performed by a variety of methods, some of which may employ a single-cell compartmentalization approach and others of which may employ a split-and-pool barcoding approach, examples of which are described below.
In some embodiments, the single cell RNA analysis may be done using a single-cell compartmentalization method that comprises: (i) compartmentalizing the cells or nuclei (e.g., in an emulsion, droplets, wells, or other containers), wherein at least some compartments receive a single cell or nucleus; (ii) making cell-specifically barcoded cDNA from the cells or nuclei in the compartments; and (iii) sequencing the cell-specifically barcoded cDNA. This method is an scRNA-seq approach. In these embodiments, at least some of the compartments may additionally comprise a bead that comprises a cell-specifically barcoded reverse transcription primer (i.e., primer molecules that are tethered to or embedded in the bead), where the primers associated with each bead have a unique barcode (i.e., a barcode that distinguishes the beads from one another). In these embodiments, the cell-specifically barcoded reverse transcription primer may be an oligo(dT) or random primer and, in addition, may comprise a unique molecular identifier (UMI), e.g., a ‘random’ sequence. In these embodiments, the method may comprise: (i) releasing the primer from the beads, allowing the released primer to hybridize to RNA, and extending the primer using a reverse transcriptase, (ii) lysing the cells or nuclei, allowing the released RNA to hybridize to the primer, and extending the primer using a reverse transcriptase on the beads, or (iii) releasing the primer from the beads, lysing the cells, allowing the released primer and released RNA to hybridize, and extending the primer using a reverse transcriptase. Examples of such single-cell compartmentalization methods are described in De Simone et al (Methods Mol. Biol. 2019 1979: 87-110), Gao et al (Curr. Genomics 2020 21: 602-609), Kraus et al (J. Immunol. Methods 2022 502: 113227) and See et al (Front Immunol. 2018 9: 2425) among many others. In some embodiments, the single cells are encapsulated into gel-beads-in-emulsion (GEMs). In this method, each functional GEM contains a single cell, a single gel bead, and reverse transcription reagents. On the gel bead, oligonucleotide primers are composed of 4 distinct parts: a PCR primer sequence (essential for the sequencing), a bead-specific barcode (which becomes the single cell barcode), a unique molecular identifier (UMI) sequence and, at the 3’ end, an oligo(dT) sequence (that enables capture of poly-adenylated mRNA molecules), a pseudo-random sequence, a random sequence, a gene-specific sequence, etc. Within each GEM reaction vesicle, a single cell is lysed and may undergo reverse transcription. cDNA from the same cell are identified thanks to the barcode. In addition, the gene expression level of each gene can be determined using the UMIs.
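The role of the cell barcode and the UMI in the preceding description can be illustrated with a short counting routine. This is a simplified sketch under stated assumptions: reads are assumed to have already been parsed into hypothetical `(cell_barcode, umi, gene)` tuples, the sequences shown are made up, and no correction for sequencing errors in barcodes or UMIs is attempted, which production pipelines typically perform.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Read = Tuple[str, str, str]  # (cell_barcode, umi, gene); hypothetical layout


def count_umis(reads: Iterable[Read]) -> Dict[str, Dict[str, int]]:
    """Collapse reads into per-cell, per-gene counts of distinct UMIs.

    Reads sharing the same cell barcode, gene and UMI are treated as
    amplification duplicates of one original molecule and counted once.
    """
    umis_seen = defaultdict(set)
    for barcode, umi, gene in reads:
        umis_seen[(barcode, gene)].add(umi)

    counts: Dict[str, Dict[str, int]] = defaultdict(dict)
    for (barcode, gene), umis in umis_seen.items():
        counts[barcode][gene] = len(umis)
    return dict(counts)


# Made-up reads for two cells.
reads = [
    ("AAAC", "TTGA", "CD3E"),
    ("AAAC", "TTGA", "CD3E"),   # duplicate of the read above
    ("AAAC", "GGCA", "CD3E"),
    ("AAAC", "CCAT", "MS4A1"),
    ("CGTT", "TTGA", "MS4A1"),  # same UMI, different cell: separate molecule
]
counts = count_umis(reads)
```

The barcode assigns each read to a cell, and the number of distinct UMIs per gene approximates the number of captured molecules rather than the number of amplified copies.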
In other embodiments, the single cell RNA analysis may be done using a single-cell compartmentalization method that comprises: (i) hybridizing probes to RNA in the cells or nuclei; (ii) compartmentalizing the cells or nuclei, wherein at least some compartments receive a single cell; (iii) adding single cell barcodes to the hybridized probes, or ligation products thereof, in the compartments; and (iv) sequencing the cell-specifically barcoded probes, or ligation products thereof. In some cases, pairs of probes may bind to adjacent sites in the RNA and may be ligated together in an oligonucleotide ligation assay (OLA) (see, e.g., Credle et al Nucleic Acids Res. 2017 45: e128). Again, at least some of the compartments may additionally comprise a bead that comprises a pair of primers, at least one of which may be cell-specifically barcoded. In these embodiments, one, the other or both of the primers may contain a cell specific barcode, a unique molecular identifier (UMI), e.g., a ‘random’ sequence and sequences that hybridize to the probes at the 3’ end, so that the hybridized probes, or ligation products thereof can be amplified by the primers. In these embodiments, the method may comprise: (i) releasing the primers from the beads, allowing the released primer to hybridize to the probes or ligation products of the same, and copying the probes or ligation products by primer extension. A limited number of rounds of amplification may be performed in the compartments. This part of the method can be implemented using 10x Genomics’ FLEX platform. Details of an example of this method can be found in Janesick et al (Nature Communications 2023 14: 8353). Specifically, after dsDNase digestion, the inclusion of the 10x Flex WTA probes proceeds as is standard for the workflow before droplet encapsulation on the 10x Genomics Chromium platform. Downstream amplification and sequencing follow the standard Flex guidelines with no modifications. In brief, this workflow uses a bead oligo to barcode a ligated probe junction pair from the WTA probe set to barcode mRNA molecules for gene expression counts. These barcoding events occur inside a standard droplet microfluidic workflow for single-cell sequencing before the oil emulsion droplets are broken, and per-cell nucleic acid sequences are amplified in a bulk PCR reaction.
In other embodiments, the single cell RNA analysis is done using a split-and-pool barcoding method that may comprise: (i) making cDNA in the cells; (ii) compartmentalizing the cells, wherein at least some compartments receive multiple cells; and (iii) adding cell-specific barcodes to the cDNA in the cells using a split-and-pool barcoding method. In very general terms, split-and-pool barcoding methods involve partitioning a sample containing cells or nuclei into several compartments, where the compartments receive multiple cells or nuclei, adding a different building block (or “subunit”) for the cell-specific barcode to each partition, pooling the sample, then repeating the partitioning, addition and pooling steps until a sufficient number of subunits have been added and the cells or nuclei in the sample are uniquely indexed.
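The partition, add-subunit and pool cycle described above can be simulated in a few lines, which also shows why a handful of rounds is usually sufficient. This is an illustrative sketch only: the choice of 96 wells and 3 rounds is an assumption made for the example, and cells are assumed to distribute uniformly and independently across wells in each round.

```python
import random
from typing import List, Tuple


def split_and_pool(n_cells: int, n_rounds: int, n_wells: int,
                   seed: int = 0) -> List[Tuple[int, ...]]:
    """Simulate combinatorial indexing.

    In each round every cell is placed in a random well, receives that
    well's barcode subunit (represented by the well index), and is pooled
    again. The final barcode is the tuple of subunits across rounds.
    """
    rng = random.Random(seed)
    barcodes: List[Tuple[int, ...]] = [() for _ in range(n_cells)]
    for _ in range(n_rounds):
        barcodes = [bc + (rng.randrange(n_wells),) for bc in barcodes]
    return barcodes


def expected_collision_fraction(n_cells: int, n_rounds: int,
                                n_wells: int) -> float:
    """Expected fraction of cells sharing a full barcode with another cell."""
    n_barcodes = n_wells ** n_rounds
    return 1.0 - (1.0 - 1.0 / n_barcodes) ** (n_cells - 1)


# Assumed example parameters: 1,000 cells, 3 rounds, 96 wells per round.
barcodes = split_and_pool(n_cells=1000, n_rounds=3, n_wells=96)
n_possible = 96 ** 3
collision = expected_collision_fraction(n_cells=1000, n_rounds=3, n_wells=96)
```

The number of possible barcodes grows as the number of wells raised to the number of rounds, so adding a round is generally a more effective way to reduce barcode collisions than adding wells.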
Alternatively, the single cell RNA analysis is done using a split-and-pool barcoding method comprising: (i) hybridizing probes to RNA in the cells or nuclei; (ii) compartmentalizing the cells or nuclei, wherein at least some compartments receive multiple cells; (iii) adding single cell barcodes to the hybridized probes, or ligation products thereof, using a split-and-pool barcoding method; and (iv) sequencing the cell-specifically barcoded probes, or ligation products thereof. As described above, pairs of probes may bind to adjacent sites in the RNA and may be ligated together in an oligonucleotide ligation assay (OLA).
The general principles of how indexing can be done using the split-and-pool approach (or “combinatorial barcoding” or “combinatorial indexing” as it is sometimes called) are described in a variety of publications including Kuchina et al (Science 2021 371: eaba5257), O’Huallachain et al (Commun. Biol. 2020 3: 279), Cao et al (Science 2017 357: 661-667), Rosenberg (Science 2018 360: 176-182) and WO2012106385A2, among many others. scRNA-seq methods of interest include, but are not limited to, Tang (Tang et al, Nature Methods 2009 6: 377-382), STRT (Islam et al, Genome Research 2011 21: 1160-1167), SMART-seq (Ramsköld et al, Nature Biotechnology 2012 30: 777-782), SORT-seq (Muraro et al, Cell Systems 2016 3: 385-394.e3), CEL-seq (Hashimshony et al Cell Reports 2012 2: 666-673), RAGE-seq (Singh et al Nature Communications 2019 10: 3120), Quartz-seq (Sasagawa Genome Biology 2013 14: R31), C1-CAGE (Kouno et al Nature Communications 2019 10: 360), REAP-seq (Dal Molin et al Briefings in Bioinformatics 2019 20: 1384-1394), and CITE-seq (Peterson et al Nature Biotechnology 2017 35: 936-939) among many others.
In addition to performing an analysis of RNA (e.g., mRNA) in the cells, other components of the cells, e.g., open chromatin, genomic DNA or protein expression may be assayed in the same assay. In some of these embodiments, the method further comprises assaying the expression of a protein in the fixed cells or fixed nuclei (e.g., via binding of labeled antibodies), and the enrichment step may comprise enriching for a sub-population of the fixed cells or fixed nuclei based on their labeling and protein expression, by flow cytometry.
However the cell-specifically barcoded cDNA is made, the method may comprise amplifying the products (e.g., cDNA or ligation products) en masse prior to sequencing.
As would be apparent, sequence reads produced from the method can be analyzed to provide gene expression profiles for individual cells, i.e., an analysis of the “transcriptome” of individual cells, i.e., which mRNAs are expressed by the cells and their abundance. For example, the expression of at least 100, at least 500, at least 1,000, at least 5,000 or at least 10,000 genes may be analyzed in at least 100, at least 500, at least 1,000, at least 5,000 or at least 10,000 individual cells or nuclei.
In any embodiment, the sample may comprise cells that are grown as a cell suspension, disassociated cells, or blood cells, or nuclei isolated from the same, for example. For example, the sample may contain cells that are in solution, e.g., cultured cells that have been grown as a cell suspension, or nuclei isolated from the same. In other embodiments, disassociated cells (which cells may have been produced by disassociating cultured cells or cells that are in a solid tissue, e.g., a soft tissue such as liver or spleen, etc. using trypsin or the like) or nuclei from the same may be used. In particular embodiments, the sample may contain blood cells, e.g., whole blood or a sub-population of cells thereof, or nuclei from the same. Sub-populations of cells in whole blood include platelets, red blood cells (erythrocytes) and white blood cells (i.e., peripheral blood leukocytes, which are made up of neutrophils, lymphocytes, eosinophils, basophils, and monocytes). The cells can be from any source. The cells may be obtained from a culture of cells, e.g., a cell line. In other cases, the cells may be isolated from an individual (e.g., a patient or the like). The cells may be isolated from a soft tissue or from a bodily fluid, or from a cell culture that is grown in vitro. For example, the single cells may be isolated by digesting a soft tissue such as brain, adrenal gland, skin, lung, spleen, kidney, liver, lymph node, bone marrow, bladder, stomach, small intestine, large intestine or muscle, etc. Bodily fluids include blood, plasma, saliva, mucous, phlegm, cerebral spinal fluid, pleural fluid, tears, lactal duct fluid, lymph, sputum, synovial fluid, urine, amniotic fluid, and semen, etc. As noted above, nuclei can be purified from the cells. In some embodiments, nuclei can be obtained from a sample of tissue that has previously been fixed.
Cells and nuclei from yeast, plants and animals, such as fish, birds, reptiles, amphibians and mammals may be analyzed using the subject methods. In certain embodiments, mammalian cells or nuclei, i.e., cells or nuclei from mice, rabbits, primates, or humans, or cultured derivatives thereof, may be used.
In some embodiments, the method can be used to compare two samples. In these embodiments, the method may comprise analyzing a first population of cells using the above-described method to produce a first data set; and analyzing a second population of cells using the above-described method to produce a second data set; and comparing the first data set to the second data set, e.g., to see if there are any changes in RNA expression between the two samples.
In some embodiments, the first population of cells and the second population of cells are collected from the same individual at different times. In other embodiments, the first population of cells and the second population of cells are different populations of cells collected from tissues or different individuals.
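One simple way to compare a first data set to a second data set is a per-gene log2 fold change of mean expression. The sketch below is illustrative only and is not the analysis used in the examples: the per-cell count dictionaries, the gene names and the pseudocount of 1 are assumptions, and a real comparison would typically also normalize for sequencing depth and apply a statistical test.

```python
import math
from typing import Dict, List

CellCounts = Dict[str, int]  # hypothetical: gene name -> count for one cell


def mean_expression(cells: List[CellCounts]) -> Dict[str, float]:
    """Mean count per cell for every gene observed in the population."""
    genes = set().union(*cells)
    return {g: sum(c.get(g, 0) for c in cells) / len(cells) for g in genes}


def log2_fold_change(first: List[CellCounts], second: List[CellCounts],
                     pseudocount: float = 1.0) -> Dict[str, float]:
    """log2(second / first) of mean expression, with a pseudocount so that
    genes absent from one population do not cause division by zero."""
    m1, m2 = mean_expression(first), mean_expression(second)
    return {
        g: math.log2((m2.get(g, 0.0) + pseudocount)
                     / (m1.get(g, 0.0) + pseudocount))
        for g in sorted(set(m1) | set(m2))
    }


# Made-up counts for two small populations.
first_population = [{"TNF": 1, "ACTB": 10}, {"TNF": 1, "ACTB": 10}]
second_population = [{"TNF": 7, "ACTB": 10}, {"TNF": 7, "ACTB": 10}]
lfc = log2_fold_change(first_population, second_population)
```

A positive value indicates higher mean expression in the second population, a negative value indicates lower expression, and a value near zero indicates no change.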
Exemplary cell types that can be analyzed in the method include, for example, cells isolated from a tissue biopsy (e.g., from a tissue having a disease such as colon, breast, prostate, lung, skin cancer, or infected with a pathogen etc.) and normal cells from the same tissue, e.g., from the same patient; cells grown in tissue culture that are immortal (e.g., cells with a proliferative mutation or an immortalizing transgene), infected with a pathogen, or treated (e.g., with environmental or chemical agents such as peptides, hormones, altered temperature, growth condition, physical stress, cellular transformation, etc.), and normal cells (e.g., cells that are otherwise identical to the experimental cells except that they are not immortalized, infected, or treated, etc.); cells isolated from a mammal with a cancer, a disease, a geriatric mammal, or a mammal exposed to a condition, and cells from a mammal of the same species, e.g., from the same family, that is healthy or young; and differentiated cells and non-differentiated cells from the same mammal (e.g., one cell being the progenitor of the other in a mammal, for example). In one embodiment, cells of different types, e.g., neuronal and non-neuronal cells, or cells of different status (e.g., before and after a stimulus on the cells) may be compared. In another embodiment, the experimental material is cells susceptible to infection by a pathogen such as a virus, e.g., human immunodeficiency virus (HIV), etc., and the control material is cells resistant to infection by the pathogen. In another embodiment of the invention, the sample pair is represented by undifferentiated cells, e.g., stem cells, and differentiated cells.
In some exemplary embodiments, the method may be used to identify the effect of a test agent, e.g., a drug, or to determine if there are differences in the effect of two or more different test agents. In these embodiments, two or more identical populations of cells may be prepared and, depending on how the experiment is to be performed, one or more of the populations of cells may be incubated with the test agent for a defined period of time. After incubation with the test agent, gene expression of the populations of cells can be analyzed using the methods set forth above, and the results can be compared. In a particular embodiment, the cells may be blood cells, and the cells can be incubated with the test agent ex vivo. These methods can be used to determine the mode of action of a test agent.
The method described above may also be used as a diagnostic (which term is intended to include methods that provide a diagnosis as well as methods that provide a prognosis). These methods may comprise, e.g., analyzing the transcriptome of a subset of cells from a patient using the method described above to produce data; and providing a diagnosis or prognosis based on the data.
The method set forth herein may be used to provide a reliable diagnostic to any condition associated with, e.g., altered gene expression. The method can be applied to the characterization, classification, differentiation, grading, staging, diagnosis, or prognosis of a condition characterized by an epigenetic pattern (e.g., a pattern of gene expression). For example, the method can be used to determine whether the pattern of labeling of a sample from an individual suspected of being affected by a disease or condition is the same or different compared to a pattern of labeling for a sample that is considered “normal” with respect to the disease or condition. In particular embodiments, the method can be directed to diagnosing an individual with a condition that is characterized by expression pattern, where the pattern is correlated with the condition. The methods set forth herein can also be used for predicting the susceptibility of an individual to a condition.
Exemplary conditions that are suitable for analysis using the methods set forth herein can be, for example, cell proliferative disorder or predisposition to cell proliferative disorder; metabolic malfunction or disorder; immune malfunction, damage or disorder; CNS malfunction, damage or disease; symptoms of aggression or behavioral disturbance; clinical, psychological and social consequences of brain damage; psychotic disturbance and personality disorder; dementia or associated syndrome; cardiovascular disease, malfunction and damage; malfunction, damage or disease of the gastrointestinal tract; malfunction, damage or disease of the respiratory system; lesion, inflammation, infection, immunity and/or convalescence; malfunction, damage or disease of the body as an abnormality in the development process; malfunction, damage or disease of the skin, the muscles, the connective tissue or the bones; endocrine and metabolic malfunction, damage or disease; headache or sexual malfunction, and combinations thereof.
In some embodiments, the method can provide a prognosis, e.g., to determine if a patient is at risk for recurrence. Cancer recurrence is a concern relating to a variety of types of cancer. The prognostic method can be used to identify surgically treated patients likely to experience cancer recurrence so that they can be offered additional therapeutic options, including preoperative or postoperative adjuncts such as chemotherapy, radiation, biological modifiers and other suitable therapies. The methods are especially effective for determining the risk of metastasis in patients who demonstrate no measurable metastasis at the time of examination or surgery.
The method can also be used as a theranostic, i.e., to provide a recommendation for a course of treatment for a patient having a disease or condition, e.g., a patient that has cancer. A course of treatment refers to the therapeutic measures taken for a patient after diagnosis or after treatment. For example, a determination of the likelihood for recurrence, spread, or patient survival, can assist in determining whether a more conservative or more radical approach to therapy should be taken, or whether treatment modalities should be combined. For example, when cancer recurrence is likely, it can be advantageous to precede or follow surgical treatment with chemotherapy, radiation, immunotherapy, biological modifier therapy, gene therapy, vaccines, and the like, or adjust the span of time during which the patient is treated.
In a particular embodiment, a lab will receive a sample (e.g., blood) from a remote location (e.g., a physician’s office or hospital), the lab will analyze cells in the sample as described above to produce data, and the data may be forwarded to the remote location for analysis.
Kits
Kits comprising components for performing the method, as described above, are also provided. The components may be in separate containers or the same container, as needed.
The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the present invention, and are not intended to limit the scope of what the inventors regard as their invention nor are they intended to represent that the experiments below are all or the only experiments performed. Efforts have been made to ensure accuracy with respect to numbers used (e.g. amounts, temperature, etc.) but some experimental errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by weight, molecular weight is weight average molecular weight, temperature is in degrees Centigrade, and pressure is at or near atmospheric.
EXAMPLES
The widespread application of single-cell genomics technologies has accelerated understanding of the breadth and depth of heterogeneity of cell states in diverse contexts. As single-cell RNA sequencing (scRNA-seq) has been the primary modality used for profiling, many populations have been described primarily based on the presence or absence of specific marker transcripts, which limits the isolation and further profiling of populations. To address this limitation, Programmable Enrichment via RNA Flow-FISH by sequencing (PERF-seq), a scalable assay that enables scRNA-seq profiles from subpopulations of complex cellular mixtures defined by the presence or absence of RNA transcripts, is used. Across vignettes of immune populations (n=141,227 cells) as well as nuclei from fresh frozen and formalin-fixed paraffin-embedded brain tissue (n=29,522), programmable logic to enrich for cell populations via RNA-based cytometry upstream of high-throughput scRNA-seq is demonstrated. Together, this approach provides a rational, programmable method for studying cell identities and transcriptional heterogeneity of rare populations identifiable by one or more marker transcripts.
Methods
PERF-seq method and development: First, enriching for populations via HCR-FlowFISH proceeds with minimal modifications to the protocol except for the use of RNase-inhibitor BSA buffer to preserve RNA quality. After the enriched populations are isolated, a brief dsDNase step is used to degrade the HCR-polymer, which is critical for 10x Genomics library preparation. After dsDNase digestion, the inclusion of the 10x Flex probes proceeds as is standard for the workflow before droplet encapsulation on the 10x Genomics Chromium platform. Downstream amplification and sequencing follow the standard Flex guidelines with no modifications. Thus, the assay leverages the quality-controlled aspects of both workflows with minimal modifications, but it is emphasized that these modifications (dsDNase and RNase-free BSA) can severely limit data quality if left unaccounted for. Complete details for each step are available from the manufacturer as well as the protocols.io link accompanying this manuscript.
Immune cell experiments: Protocol development and optimization were performed on cryopreserved peripheral blood mononuclear cells (PBMCs) sourced from ATCC and AllCells. Vials were thawed and viability exceeded 90% for all samples. PBMCs were used as the primary input for developing the assay in Fig. 1 due to ease of material availability and well-defined heterogeneity for MS4A1 and CD3E. For applications in Figs. 2-4, the same vials were used but enriched for specific markers as indicated in the experimental overview schematics (Fig. 2c, 3a, 4a). All experiments started with ~10M cells, except for the TF sort experiment (Fig. 4), which began with ~25M cells to yield ample cell numbers for downstream profiling given the rare BCL11A population that was sorted.
Cell Fixation and permeabilization: Fixation and permeabilization were performed as described in “HCR RNA flow cytometry protocol for mammalian cells in suspension” provided by Molecular Instruments. Briefly, cells were thawed and fixed in 4% paraformaldehyde solution (4% paraformaldehyde in 1x PBS and 0.1% Tween 20) at room temperature for 1 hour at a concentration of 1 million cells per mL (1M/mL). After fixation, cells were centrifuged at 350 xg for 5 minutes and resuspended in a PBST solution (1x PBS and 0.1% Tween 20) at 1M/mL. This step was repeated once for a total of 2 washes. After washing, cells were permeabilized with ice-cold 70% EtOH overnight at 1M/mL. After permeabilization, cells were centrifuged and resuspended with PBST solution at 1M/mL twice.
HCR FlowFISH: Probes for RNA targets of interest and complementary hairpins were purchased from Molecular Instruments at the highest number of probe pairs available for genes of interest.
Most steps were performed as described in the “HCR RNA flow cytometry protocol for mammalian cells in suspension” provided by Molecular Instruments and by Reilly et al. (ref. 15), with the following adjustments:
First, all centrifugation was performed at 850 xg for 5 minutes unless otherwise noted. Second, low-binding plasticware tubes and RNase-free molecular grade reagents were utilized when possible. Third, during the detection stage, it was found that the optimal signal-to-noise ratio during fluorescence detection was achieved at 16nM probe concentration per 500,000 to 1 million cells (8uL of probe stock per sample). Finally, 37°C incubations were performed in a heated lid thermomixer with gentle shaking.
HCR FlowFISH Detection Stage: Permeabilized cells/nuclei were resuspended in 400uL of prewarmed hybridization buffer (Molecular Instruments) per 500,000 to 1M cells. Cells/nuclei were incubated for 30 minutes at 37°C, 300rpm in a heated lid thermomixer. The probe solution was prepared by mixing 8uL of 1uM probe stock and hybridization buffer for a final 100uL volume per sample. Probe solution was added to each sample for a final probe concentration of 16nM and cells were incubated at 37°C for 16-24 hours. To pellet cells/nuclei, 500uL of SSCT solution (5X SSC, 0.1% Tween 20) was added to each sample and centrifuged at 850xg for 15 minutes. Cells/nuclei were resuspended in 500uL of prewarmed probe wash buffer (Molecular Instruments), incubated for 10 minutes at 37°C, and subsequently pelleted at 850xg for 5 minutes. This step was repeated 3 more times for a total of 4 washes. Then, cells/nuclei were resuspended in 500uL of SSCT solution and incubated at room temperature for 5 minutes.
HCR FlowFISH Amplification Stage: Cells/nuclei were centrifuged and resuspended in 150uL of amplification buffer and incubated at room temperature for 30 minutes. In the meantime, 5uL of 3uM h1 and h2 hairpin stock was aliquoted for each probe set and snap cooled by performing a heat shock at 95°C for 90 seconds and cooling in the dark at room temperature for 30 minutes. To prepare the hairpin solution, snap-cooled h1 and h2 hairpins were mixed with amplification buffer to make a final volume of 100uL per sample. The hairpin solution was added to the appropriate samples for a final hairpin concentration of 60nM. Cells/nuclei were incubated at room temperature for 16-24 hours. (This time can be reduced to 4 hours). After incubating, samples were washed 6x with 500uL of SSCT for each sample.
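The stated final concentrations (16nM probe, 60nM hairpin) follow from the usual C1·V1 = C2·V2 dilution relation. The short Python sketch below checks that arithmetic. It assumes the 100uL probe or hairpin solution is added directly to the 400uL (hybridization) or 150uL (amplification) sample volume described above; the function name is hypothetical and is not part of any vendor protocol.

```python
def final_concentration_nM(stock_uM: float, stock_uL: float, final_volume_uL: float) -> float:
    """Final concentration in nM after diluting a stock into a final volume (C1*V1 = C2*V2)."""
    if final_volume_uL <= 0:
        raise ValueError("final volume must be positive")
    return stock_uM * 1000.0 * stock_uL / final_volume_uL

# Detection stage: 8 uL of 1 uM probe stock in 100 uL of probe solution,
# added to cells already resuspended in 400 uL of hybridization buffer (assumed total: 500 uL).
probe_nM = final_concentration_nM(stock_uM=1.0, stock_uL=8.0, final_volume_uL=400.0 + 100.0)

# Amplification stage: 5 uL of 3 uM hairpin stock in 100 uL of hairpin solution,
# added to cells already resuspended in 150 uL of amplification buffer (assumed total: 250 uL).
hairpin_nM = final_concentration_nM(stock_uM=3.0, stock_uL=5.0, final_volume_uL=150.0 + 100.0)

print(f"probe: {probe_nM:.1f} nM, hairpin: {hairpin_nM:.1f} nM")
```

Under these volume assumptions the computed values agree with the 16nM and 60nM figures given in the protocol text.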
Costaining with antibodies: PBMCs were thawed and stained with anti-CD123 antibody (Biolegend S18O16F; Extended Data Fig. 4n). Antibody staining was immediately followed by
fixation/permeabilization and HCRFlowFISH as described above. It was noted that surface antibodies conjugated to synthetic dyes result in the most robust signal when used in conjunction with the HCR-FlowFISH protocol.
Sample Enrichment using FACS: Samples were resuspended in sorting and collection buffer (1x PBS, 5% BSA-Gibco #15260037, 0.13U/uL RNase inhibitor - Millipore Sigma #3335399001) and filtered through a 35um strainer. Cells were kept in the dark and on ice until sorting. Collection tubes were prepared with 300uL of collection buffer. Cells were sorted using the BD FACS Aria™ III or FACSymphony S6 with a 70um nozzle. Representative gating is shown where appropriate. For multiplexing experiments, compensation was performed with single-color controls.
HCR Polymer Disassembly: Sorted cells were pelleted and resuspended in 275uL of 1x dsDNase buffer (Thermofisher #EN0771) and incubated for 15 minutes, after which 25uL of dsDNase enzyme (Thermofisher #EN0771) was added and the sample was incubated at 37°C for 2 hours. After incubation, 3uL of 1M DTT was added to the sample to quench dsDNase activity and the sample was incubated at 55°C for 5 minutes for heat inactivation. Samples were pelleted at 850xg for 5 minutes, resuspended in 500uL of pre-warmed wash buffer, and incubated for 10 minutes. This step was repeated once for a total of 2 washes. Samples were then resuspended in 500uL of SSCT buffer and incubated for 5 minutes. Samples were centrifuged and resuspended in 1mL of 0.5x PBS with 0.02% BSA (Thermofisher #AM2616) and 0.2U/uL RNase inhibitor (Millipore Sigma #3335399001). Polymer disassembly was assessed by measurement of fluorescence intensity on the BD FACS Aria™ III.
10x Genomics Flex: All 10x Genomics Flex libraries were prepared using the manufacturer’s instructions, as all modifications for PERF-seq happen upstream of Flex probe hybridization. All libraries were sequenced on an Illumina Nextseq 550, Novaseq 6000, or Novaseq X with standard dual-indexing and demultiplexing. Raw .bcl files were processed using CellRanger v7.2, and the resulting .fastq files were quantified against the human and mouse probe sets (version 1.0.1) using default parameters for the CellRanger pipeline.
Cell line mixing and benchmarking: The XY Burkitt's lymphoma cell line Raji and the XX chronic myelogenous leukemia cell line K562 were obtained from ATCC. Raji cells were cultured in RPMI-1640, while K562 cells were cultured in Iscove's Modified Dulbecco's Medium (IMDM), both supplemented with 10% fetal bovine serum and 1% Penicillin/Streptomycin. To benchmark the recovery of populations, cells were washed with PBS, centrifuged, and counted. Raji cells were subsequently mixed with decreasing 10-fold dilutions of K562 cells. The mixed cells were then fixed and permeabilized as described in the prior sections, followed by the HCRFlowFISH protocol using XIST RNA probes for detection and Alexa Fluor 647-conjugated hairpins for amplification. FACS analyses were conducted on a ThermoFisher Attune NxT Flow Cytometer.
Mouse Tissue Sourcing: Mouse brain tissue was sourced from Zyagen Inc. as a fresh frozen whole brain stored in OCT. Upon receipt, tissue was stored at -80°C. Dissection was performed in a cryotome and the tissue was immediately processed for nuclei isolation.
Mouse brain dissociation and profiling: Mouse brain dissociation was performed as described by 10x Genomics in the tissue fixation and dissociation protocol. Briefly, fresh frozen mouse cerebellum was weighed and fixed in 4% paraformaldehyde solution for 2 hours at 2mL per 25mg of tissue with periodic agitation. Then, the tissue was centrifuged and resuspended in 1x PBS twice. Washed tissue was resuspended in ice-cold 70% ethanol at 2mL per 25mg of tissue and incubated overnight at 4°C. After incubation, the tissue was centrifuged and resuspended with 1x PBS twice. Tissue was then resuspended in 2mL dissociation buffer (160uL LiberaseTL enzyme - Millipore Sigma #5401020001 + RPMI) and dissociated using the gentleMACS OctoDissociator with heaters (Miltenyi Biotec # 130-096-427) for 30 minutes at 50rpm. Nuclei were washed with 1x PBS + 0.02% BSA (Thermofisher #AM2616) + 0.2U/uL RNase inhibitor and stained with 1uL/mL DAPI (Thermofisher #62248) for 10 minutes. To remove excess debris, DAPI+ nuclei singlets were sorted using FACS before processing by HCRFlowFISH. Nuclei were either stored for future use according to 10x Genomics recommendations or taken directly into HCRFlowFISH.
GBM FFPE dissociation: FFPE samples were preprocessed on a prototype S2 Singulator system. The sample was automatically processed in a NIC+ cartridge (S2 Genomics #100-215-389) by three 15 min deparaffinization steps (CitriSolv, VWR), rehydrated by successive 1 mL washes of 100%, 100%, 70%, 50%, and 30% ethanol, followed by 2 washes of PBS. The sample was then spun at 1,000g for 3 min and resuspended in 0.5 mL Nuclei Isolation Reagent (NIR, S2 Genomics, #100-063-396) with 0.1 U/uL RNase inhibitor (Protector, Millipore Sigma, #3335399001); all subsequent solutions had RNase inhibitor. The sample was dissociated to single nuclei in a second NIC+ cartridge with 2 mL of NIR for 10 min followed by a 2 mL wash with Nuclei Storage Reagent (NSR, S2 Genomics, #100-063-405). The single nuclei suspension was spun at 500g for 5 min, resuspended in NSR, and counted.
Bioinformatics analyses overview: All bioinformatics analyses were conducted using standard output files from the execution of CellRanger on sequencing data of the Flex libraries. Downstream analyses, including cell filtering, marker gene analyses, and visualization, were performed using Seurat v4 (ref. 17). In brief, cells were identified via a combination of passing the CellRanger knee plot as well as meeting minimum quality control standards, including at least 1,000 UMIs detected, 500 genes detected, and no more than 5% mitochondrial RNA abundance, which are standard thresholds for scRNA-seq analyses. For all sub-clustering analyses (Figs. 3G, 4H, 5G, 5M), cells were required to be both present in the enriched PERF-seq library and assigned to the Seurat cluster associated with the majority of the population. All differential expression and marker gene analyses were performed using the FindMarkers functionality in Seurat (ref. 17). All custom code to reproduce the downstream analyses, including intermediate data files, is available as part of an online repository.
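As an illustration of the quality control thresholds stated above (at least 1,000 UMIs, at least 500 genes, and no more than 5% mitochondrial RNA), a minimal Python sketch of the per-cell filter is shown below. The analyses described herein used Seurat in R; the `CellQC` record and `passes_qc` helper are hypothetical names introduced only for this example, and the knee-plot criterion from CellRanger is not modeled.

```python
from dataclasses import dataclass

# Thresholds taken from the text; everything else in this sketch is illustrative.
MIN_UMIS = 1000
MIN_GENES = 500
MAX_PCT_MITO = 5.0

@dataclass
class CellQC:
    barcode: str
    n_umis: int      # total UMIs detected in the cell
    n_genes: int     # number of genes with at least one UMI
    pct_mito: float  # percent of UMIs from mitochondrial genes

def passes_qc(cell: CellQC) -> bool:
    """Return True when a cell meets all three minimum quality control standards."""
    return (
        cell.n_umis >= MIN_UMIS
        and cell.n_genes >= MIN_GENES
        and cell.pct_mito <= MAX_PCT_MITO
    )

# Toy cells (made-up values) to show the filter in use.
cells = [
    CellQC("AAAC", n_umis=4800, n_genes=2500, pct_mito=1.2),
    CellQC("AAAG", n_umis=900, n_genes=600, pct_mito=0.8),    # too few UMIs
    CellQC("AAAT", n_umis=3000, n_genes=1500, pct_mito=7.5),  # too much mitochondrial RNA
]
kept = [c.barcode for c in cells if passes_qc(c)]
print(kept)
```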
Benchmarking analyses: To examine the loss of data quality in PERF-seq compared to analogous Flex libraries, all comparisons were made against gold-standard data generated and released by 10x Genomics. Saturation curves were drawn by downsampling the total reads in the library to 0.1%, 1%, 2.5%, 5%, 10%, 30%, 50%, 75%, and 100% of the total sequencing depth. Downsampling proceeded via the sample function in R on the per-molecule vectors encoded in the _sample_molecule_info.h5 file from the CellRanger processing. To compare median per-cell UMI and gene counts, the read depth of the shallowest library in the comparison was selected and every other library was downsampled to that depth before relevant statistics were compared. Thus, these analyses are robust to differences in sequencing depth, UMI collapsing, and barcode correction (which occur within the CellRanger processing steps upstream of the h5 output).
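The read-level downsampling behind a saturation curve can be sketched as follows. This is a simplified Python illustration, not the R code used for the analyses: it assumes the input is a plain list of read counts per molecule (as could be extracted from a molecule-info file), and it uses nested prefixes of a single shuffle so that the curve is monotonic by construction. The function name and toy data are hypothetical.

```python
import random

# Fractions of total sequencing depth listed in the text.
FRACTIONS = [0.001, 0.01, 0.025, 0.05, 0.10, 0.30, 0.50, 0.75, 1.0]

def saturation_curve(reads_per_molecule, fractions=FRACTIONS, seed=0):
    """Return (fraction, reads_kept, unique_molecules) tuples for each depth.

    Each read is labelled with the index of the molecule (UMI) it came from.
    Reads are shuffled once, and each fraction keeps a prefix of that shuffle,
    so a molecule counts as detected if at least one of its reads is kept.
    """
    reads = [mol for mol, n in enumerate(reads_per_molecule) for _ in range(n)]
    rng = random.Random(seed)
    rng.shuffle(reads)
    curve = []
    for frac in fractions:
        k = round(frac * len(reads))
        curve.append((frac, k, len(set(reads[:k]))))
    return curve

# Toy library (made-up read counts per molecule).
toy = [1, 2, 3, 5, 8] * 200
curve = saturation_curve(toy)
for frac, n_reads, n_molecules in curve:
    print(f"{frac:>6.1%}  reads={n_reads:>5}  molecules={n_molecules}")
```

Plotting unique molecules against reads kept gives the saturation curve; comparing libraries at a common read depth corresponds to reading each curve at the same x-value.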
Immune cell analyses: To benchmark the enrichment efficiency (Figs. 2E, 3D), three distinct methods were utilized. First, reference projections with Azimuth to a gold-standard PBMC atlas were utilized, noting the differences in chemistries between the reference and projections (reference: reverse-transcription-based; here: probe-based Flex chemistry). Qualitatively, the cell type assignments from Azimuth were sensible, but projection onto the two-dimensional space often failed to cover the breadth of the reference, which is attributed to differences in the fundamental sequencing chemistry. As a second reference-based method, cells were annotated using the default celldex workflow for human immune cells. For the logic-gated classification (Fig. 3), the output from classification was partitioned as “B cells” for MS4A1+, “CD4+ T cells” for CD4+,CD3E+ cells, “CD8+ T cells” and “T cells” for the CD4-,CD3E+ population, and all other labels as the negative population. Finally, cells were clustered individually and cell type annotations were defined based on standard practice for the presence or absence of individual marker genes. The proportions annotated as accurate classification represent the total number of high-quality cells (n>10,000 per comparison) and were consistent between different classification methods, verifying the specificity of the enrichment via the sorting strategy and the preservation of transcriptomes for downstream analyses. For comparisons with other RNA-seq datasets, normalized data from flow-sorted bulk populations and single-cell annotations were used. A collection of 2,674 target genes of BCL11A was downloaded from Harmonizome 3.0 (ref. 44) and scored using the standard AddModuleScore functionality in Seurat.
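The partition of reference-based labels into sort gates for the logic-gated classification can be written as a simple lookup. The Python sketch below is illustrative only: the exact label strings emitted by a given annotation reference are an assumption, and `expected_gate` and `classification_accuracy` are hypothetical helper names.

```python
# Mapping from annotated cell type label to the sort gate in which it is expected,
# following the partition described in the text. Label spellings are assumptions.
GATE_FOR_LABEL = {
    "B cells": "MS4A1+",
    "CD4+ T cells": "CD4+,CD3E+",
    "CD8+ T cells": "CD4-,CD3E+",
    "T cells": "CD4-,CD3E+",
}

def expected_gate(label: str) -> str:
    """Return the gate a label belongs to; every other label maps to the negative population."""
    return GATE_FOR_LABEL.get(label, "negative")

def classification_accuracy(cells) -> float:
    """Fraction of cells whose annotated label matches the gate they were sorted from.

    `cells` is an iterable of (sort_gate, annotated_label) pairs.
    """
    cells = list(cells)
    if not cells:
        raise ValueError("no cells supplied")
    hits = sum(1 for gate, label in cells if expected_gate(label) == gate)
    return hits / len(cells)

# Toy example (made-up cells): four of five labels agree with their sort gate.
toy_cells = [
    ("MS4A1+", "B cells"),
    ("CD4+,CD3E+", "CD4+ T cells"),
    ("CD4-,CD3E+", "T cells"),
    ("negative", "Monocytes"),
    ("CD4-,CD3E+", "B cells"),
]
print(classification_accuracy(toy_cells))
```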
Nuclei analyses: Mouse brain analyses, including clustering and sub-clustering analyses, proceeded as described above. The selection of plotted marker genes followed the cerebellum atlas that defined oligodendrocyte subtype markers without any prior enrichment. To compare PERF-seq performance against frozen nuclei, a public four-plex Flex library was downloaded and the dissociated eye nuclei were selected as the closest anatomical tissue to the profiled tissue, noting that these are an imperfect yet useful comparison (Fig. 5B).
For existing GBM FFPE Flex data, the counts matrix was downloaded from the datasets hosted on the 10x Genomics website. The two GBM samples were run as a 4-plex in-line hashing with another tumor type (colorectal), which was discarded during pre-processing. These processed data were used both in defining endothelial/mural cell markers as well as in the downsampling performance analysis (Fig. 5C). Selection of marker genes was based on prior profiles of endothelial cells from GBM (ref. 32).
Results
Assay rationale and overview: Despite a broad availability of tissues preserved in formaldehyde, including FFPE tissue blocks, RNA extraction from these sources has remained challenging. Notably, formaldehyde-associated RNA degradation is a major limiting factor in generating high-quality libraries from cells that have been fixed, and the degraded RNA molecules cannot be reliably reverse-transcribed for conventional downstream analyses. Recently, a new workflow from 10x Genomics, termed Flex, facilitates the profiling of fixed cells using whole transcriptome probe pairs, where genes are detected and quantified based on the barcoding of the ligated probe product in droplets (Fig. 6A). In parallel, advances in Fluorescence In Situ Hybridization (FISH) technologies, including hybridization chain reaction (HCR), facilitate a multiplexable method that can yield a tethered amplification of fluorophore signal when probes are bound to RNA molecules of interest, enabling higher signal-to-noise separation and detection of specific populations. Notably, all RNA FISH technologies require an irreversible fixation step, disqualifying profiling via conventional scRNA-seq technologies. Thus, a programmable (i.e., users define target genes) method was developed for enriching cells via flow cytometry-based sorting with signal generated by RNA FISH, followed by single-cell transcriptomes measured via the Flex scRNA-seq workflow (Fig. 1A). Such an approach allows for studying heterogeneity underlying cell states marked only by the presence/absence of specific marker genes, including rare cell states noted above.
PERF-seq development and quality control: To establish this potential assay, the simple concatenation of the HCR-FlowFISH workflow with the 10x Genomics Flex library preparation steps was tested. After sorting PBMCs for MS4A1, a B cell-specific marker gene, cells were captured and quality control measures were compared to a standard Flex library of only PBMCs. Despite targeting a similar number of cells (~3,000), the FlowFISH-sorted cells yielded far fewer cells with overall worse per-cell data quality than the standard Flex library (Fig. 1B). In the CellRanger quality control report, it was observed that the FlowFISH→Flex v0 library contained a high number of reads half-mapping to the probeset, which is indicative of poor ligation efficiency of the Flex probeset (Fig. 1B; Fig. 6B). This measure was corroborated by the BioAnalyzer traces for these two libraries (Fig. 1B), which verified that the extension of the truncated probes severely limited per-cell data quality, consistent with an application note from 10x Genomics (Methods). Thus, the direct integration of FlowFISH into fixed-cell scRNA-seq did not yield high-quality genomics data.
To rescue the ability to profile cells downstream of FlowFISH with scRNA-seq, the impact of various components of the upstream FISH workflow was isolated. These included the buffer component dextran sulfate, which is utilized for background suppression but has been shown to negatively impact enzymatic activity, and the reagents necessary for generating the fluorescent signal, including the combination of the probes and hairpins (HCR-polymer) or the hairpin alone (Fig. 1C). By selectively removing pieces of the FlowFISH workflow, it was observed that while dextran sulfate had no impact on library quality, the generation of the HCR polymer was sufficient to impair data quality (1.2% vs 76% full probe reads; Fig. 1C). As the formation of the gene probe-hairpin polymer is essential for FlowFISH (both HCR- and other types), direct elimination from the protocol is not feasible. Thus, it was hypothesized that removing the tethered probe-hairpin polymer after sorting but before scRNA-seq library preparation could rescue transcriptional profiling.
To assess this, HCR polymer in the sorted population of cells was disassembled by adapting formamide and enzymatic stripping of the HCR polymer, both of which have been used in imaging and microscopy analyses. For enzymatic approaches, multiple dsDNases, including a dsDNase (Thermo) that preferentially degrades double-stranded DNA, were assessed (Methods). Though formamide enabled polymer stripping as expected, there was no meaningful improvement in data quality. In fact, the presence of formamide was sufficient to inhibit the capture of virtually any ligation product (0.5% reads mapping to the full probe set), irrespective of the presence of HCR-polymer (Fig. 1D). Alternatively, the disassembly of the HCR polymer via either dsDNase yielded markedly higher library complexity and full gene probe percentages than any prior library with the HCR-polymer, resulting in library metrics on par with the original Flex libraries with a concomitant reduction in fluorescence signal (Fig. 1D; Figs. 6C, 6D). To further enhance the scRNA-seq libraries, it was found that sorting using BSA instead of FBS and including RNase inhibitor resulted in overall better library quality and transcriptional profiling (Fig. 1H; Methods), yielding a FlowFISH-sorted library with 90% of molecules mapping to a full probe set and 4.2% mapping to half-ligated probes, which was confirmed via BioAnalyzer traces (Fig. 6E). Taken together, the synthesis of these optimization steps for FlowFISH sorting and profiling via scRNA-seq produces the new assay: PERF-seq.
As a final verification that the polymer stripping was necessary for high-quality library preparation, the full PERF-seq workflow with all modifications except the dsDNase stripping step was completed (Methods). A BioAnalyzer trace of the resulting library confirmed a high quantity of incomplete molecules, again reflective of incomplete probe ligation (Fig. 6F). Thus, though various pieces of the protocol were optimized in the workflow, the polymer stripping step is required for high-quality scRNA-seq libraries.
PERF-seq benchmarking: To assess the feasibility, sensitivity, and specificity of potential enrichments, the efficiency of the HCR-FlowFISH probes to enrich for well-defined populations was considered. Staining PBMCs for ACTB with two different amplifiers allowed assessment of HCR FISH efficiency on either the 647 or 488 channel via FACS. In either channel, 97% of cells were positive for ACTB, indicating the high sensitivity of HCR-FlowFISH and compatibility with multiple fluorescent colors (Fig. 2A). Next, to verify sensitivity, two human cell lines, Rajis and K562s, were mixed in varying abundances, including two orders of magnitude dilution of the K562 cells into the Rajis (Methods). Probes were designed against XIST, a long noncoding RNA (lncRNA) expressed in cells with XX sex chromosomes, including K562s, and absent from cells with XY chromosomes, including Rajis. Flow cytometry analyses confirmed the ability to sensitively and specifically detect the XIST+ populations in each synthetic mixture, including a sensitivity of 1 in 500 cells (Fig. 2B). These analyses confirmed the feasibility of the workflow with non-coding features that may enable new methods for studying populations expressing specific RNA molecules.
Next, to quantitatively assess the performance of the scRNA-seq profiles, four libraries were multiplexed to compare the changes made in PERF-seq relative to the Flex protocol, including the probe staining/stripping and sorting (Fig. 2C; Methods). Across the four-plex profiles, 59,313 cells were profiled, recovering the major cell types expected from PBMCs. For the PERF-seq library, CD3E+ cells were sorted and profiled, and the resulting libraries yielded ~95-97% T cells and corresponding depletion of other cell types, including B cells and monocytes (Figs. 2D, 2E). Qualitatively, major unexpected population shifts from these four conditions were not observed, indicating that the steps in the PERF-seq library prep do not meaningfully bias transcript detection relative to the standard Flex workflow. In particular, minimal differences between CD3E and CD3D expression in the T cell populations were observed under the different treatment conditions, indicating that the transcript targeted by FlowFISH can still be quantified via scRNA-seq. Comparing the four-plex experiment to a gold-standard public PBMC dataset from 10x Genomics, minimal, if any, loss in data quality was observed after downsampling to a consistent mean number of reads per cell (16,750 reads / cell based on the lowest library). Explicitly, the key data quality metrics per cell did not differ by more than ~10% comparing public Flex (median 5,200 UMIs and median 2,857 genes detected per cell), Flex completed herein (median 4,885 UMIs and median 2,536 genes), and the PERF-seq libraries (median 4,789 UMIs and median 2,535 genes; Fig. 4F). Taken together, the benchmarking analyses verify HCR-FlowFISH as a sensitive and specific workflow for enriching populations, and PERF-seq as a protocol that yields consistent mRNA profiles with minimal, if any, loss in data quality.
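The per-cell metric comparison above is a relative-difference calculation against the public Flex reference. The Python sketch below reproduces that arithmetic from the medians quoted in the text; the dictionary layout and the `pct_drop` helper are hypothetical and serve only to make the comparison explicit.

```python
# Median per-cell metrics quoted in the text, after downsampling to 16,750 reads / cell.
METRICS = {
    "public Flex": {"umis": 5200, "genes": 2857},
    "Flex (this work)": {"umis": 4885, "genes": 2536},
    "PERF-seq": {"umis": 4789, "genes": 2535},
}

def pct_drop(reference: float, value: float) -> float:
    """Percent decrease of `value` relative to `reference`."""
    return 100.0 * (reference - value) / reference

reference = METRICS["public Flex"]
drops = {
    name: {metric: pct_drop(reference[metric], values[metric]) for metric in values}
    for name, values in METRICS.items()
    if name != "public Flex"
}

for name, d in drops.items():
    print(f"{name}: UMIs -{d['umis']:.1f}%, genes -{d['genes']:.1f}%")
```

Run on the quoted medians, every difference falls in the range of roughly 6% to 11%, which is the order of magnitude described by the "~10%" statement.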
Multi-color programmable enrichment with PERF-seq: Given the successful benchmarking of the assay that enriched an individual marker, PERF-seq was further benchmarked by designing probes against three well-described genes in different immune populations in PBMCs: CD3E, MS4A1 (CD20), and CD4 (Fig. 3A). Using one fluorophore per gene, four populations of cells were recovered using PERF-seq, including a CD4 and CD3E double-positive population that together specifically enrich for CD4+ T cells when either marker alone would not (CD4: monocytes; CD3E: CD8+ T cells; Fig. 3B). scRNA-seq profiling of the four populations in a single reaction using in-line hashing yielded a total of 35,220 cells across one in-line barcoded multiplexed capture (Fig. 3C). To verify the quality of the enrichments, three independent cell type annotation methods were performed, indicating that PERF-seq had 70-88% accuracy in recovering the expected cell type labels (Fig. 3D). Notably, the enrichment was lower for the negative population (i.e., B and T cells were not all sorted), indicative of potential variation in probe efficacy in the FlowFISH workflow (Methods). Analyses of standard marker genes in PBMC analyses confirmed the expected populations and confirmed recovery of the transcripts used for FlowFISH enrichment and sorting, noting the overall depletion of CD4+,CD3E- myeloid cells that were not sorted based on the strategy described herein (Fig. 3E). Again, a meaningful loss of CD3E, CD4, or MS4A1 expression in expected populations was not observed, further verifying the detection of these transcripts in scRNA-seq following the sorting and HCR polymer stripping.
To confirm the efficacy of the double positive staining, differential gene expression analyses of CD3E+ cells that either co-expressed or were negative for the CD4 transcript were performed. Reassuringly, the top differential transcripts coincided with key markers for CD4+ and CD8+ T cells, including the cognate transcripts themselves (Fig. 3F). To further verify the multiple probe enrichment, the 9,035 cells from the CD4,CD3E double-positive population were reclustered, and major CD4+ T cell subsets in peripheral blood were identified, including Tregs expressing FOXP3, naive T cells marked by LEF1, interferon-responsive cells expressing IFIT genes, and cytotoxic CD4+ cells expressing GZMH (Fig. 3G). Together, these analyses verify the compatibility of the workflow to enrich for cellular populations based on combinations of RNA markers for specific population enrichment.
Rare cell states enriched via transcriptional regulators: As many immune cell subsets have been canonically defined by the presence of surface proteins, transcriptional heterogeneity using other RNA features typically inaccessible to conventional antibody-based flow cytometry was investigated. Specific populations expressing disparate transcription factors (TFs) in the hematopoietic system were enriched, noting their importance as lineage-defining factors in differentiation. BCL11A and SPI1 were selected as transcription factors with well-described functions in the hematopoietic system and robust expression across various cell types from public scRNA-seq atlases. 37,566 cells over three libraries were profiled, including SPI1+, BCL11A+, and double-negative libraries (Fig. 4B). Cell states segregated based not only on the sorted transcription factors but also as a function of well-established (surface) marker genes from PBMCs (Fig. 4C). Specifically, robust enrichment was observed, as 96.0% and 91.1% of cells sorted for SPI1 and BCL11A had non-zero UMI counts for those genes (compared to 13.3% and 9.5% in the unsorted fraction, respectively), with over half of the positive cells having 10+ UMIs in the enriched libraries (Fig. 4D). Application of standard clustering and cell type annotation workflows showed a clear skewing of cell states in either enriched library, including a consistent depletion of T cells from the sorted TF libraries (Figs. 4E, 4F).
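The enrichment statistics in this paragraph (fraction of cells with non-zero UMIs for the sorted gene, and fraction of positive cells with 10 or more UMIs) can be computed from a per-cell vector of UMI counts. The Python sketch below shows one way to do so and derives the fold enrichment implied by the reported percentages; the helper names are hypothetical and the fold values are simple ratios of the quoted numbers, not additional results.

```python
def detection_stats(umi_counts, high_threshold=10):
    """Return (fraction of cells with >0 UMIs, fraction of positive cells with >= threshold UMIs)."""
    counts = list(umi_counts)
    if not counts:
        raise ValueError("no cells supplied")
    positive = [c for c in counts if c > 0]
    frac_nonzero = len(positive) / len(counts)
    frac_high = (
        sum(1 for c in positive if c >= high_threshold) / len(positive) if positive else 0.0
    )
    return frac_nonzero, frac_high

def fold_enrichment(frac_sorted: float, frac_unsorted: float) -> float:
    """Ratio of detection rates between the sorted and unsorted fractions."""
    return frac_sorted / frac_unsorted

# Ratios implied by the percentages quoted in the text.
spi1_fold = fold_enrichment(0.960, 0.133)
bcl11a_fold = fold_enrichment(0.911, 0.095)
print(f"SPI1: {spi1_fold:.1f}x, BCL11A: {bcl11a_fold:.1f}x")

# Toy per-cell UMI counts (made-up) to show detection_stats in use.
toy_counts = [0, 3, 12, 25, 10, 1, 0, 40]
print(detection_stats(toy_counts))
```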
Though BCL11A is an essential regulator of lymphoid development with ‘B cell’ as part of its name, nearly 75% of the enriched cell population from its PERF-seq library were pDCs. This enrichment was consistent with bulk and single-cell expression indicating that pDCs express BCL11A 1-2 orders of magnitude higher than B cells. Though there was minimal enrichment of the B cell state in the BCL11A+ PERF-seq library, cells from this sorted library had higher BCL11A expression and target gene module scores than B cells from the other two libraries, indicating that the PERF-seq workflow could enrich for cells with high TF activities within specific cell types. Further, the SPI1 enrichment primarily resulted in monocytes and classical dendritic cells (cDCs), again consistent with the expectation of where this factor is expressed in PBMCs (Fig. 4F).
Next, it was investigated whether any population was enriched in both TF+ populations profiled with PERF-seq, and indeed, it was observed that AS DCs, a population of ~1 in 5,000 PBMCs, had a nearly 5-fold enrichment in both BCL11A- and SPI1-enriched populations (Fig. 4G). Sub-clustering of the 269 cells from the sorted populations confirmed consistent expression of AXL and SIGLEC6, verifying the AS DC identities (Fig. 4H). As the authors previously reported two subclasses of AS DCs, including pDC-like and cDC-like cells, it was hypothesized that the FlowFISH-sorted lineage-defining transcription factors may be associated with heterogeneity in this cell state. Thus, differential gene expression analyses between the SPI1+ and BCL11A+ sorted populations were performed, and indeed, the key markers of the pDC-like subset (e.g., IL3RA, MZB1) vs a cDC-like subset (e.g., IFI30, ITGAX) were identified (Figs. 4I, 4J). In addition to reproducing differential expression between markers previously described in these AS DC subpopulations, the PERF-seq differential analyses further identified new molecules that may play a functional role in these populations, including surface molecules (CD5, TLR9), granzymes (GZMB), and cytokines (IL1B; Fig. 4J).
Next, reanalysis of the original Smart-seq2 data confirmed the co-expression of BCL11A and SPI1 in these AS DC populations with marker genes previously identified for the two cDC-like and pDC-like subpopulations (Fig. 4K). To further validate the identification of these TFs as markers of AS DC subpopulations, PERF-seq targeting IL3RA was performed using an inclusive sort gate to include the IL3RA-low population (marked in the prior analysis with SPI1+ FlowFISH). From the 9,178 profiled cells, clustering analyses identified 95 AS DCs, and gene correlation analyses confirmed the separation of expression modules co-occurring with these two TFs. Thus, analyses of the PERF-seq datasets and the original Smart-seq2 profiles broadly corroborate the cDC-like and pDC-like subsets of AS DCs and nominate the lineage-defining BCL11A and SPI1 TFs as potentially critical regulators of the heterogeneity in this rare cell state.
The final experiment in this setting reveals a key conceptual vignette of decoupling RNA and protein for enrichments using this platform. Specifically, although IL3RA encodes CD123, an AS DC and pDC-specific surface marker, the PERF-seq profiles included many other T cell, B cell, and other myeloid populations, which is consistent with bulk mRNA profiles in PBMCs. Further flow cytometry analyses confirmed distinct CD123+IL3RA+ populations (AS DCs and pDCs) as well as a clear CD123-IL3RA+ population (T cells, B cells, etc.). These analyses collectively verify the PERF-seq workflow for identifying transcriptional heterogeneity in rare populations but emphasize the importance of determining genes for enrichments/depletions in a data-driven manner, using RNA profiles rather than prior conventions or surface protein markers to determine the best markers for enrichment and profiling.
Enrichment and heterogeneity of rare nuclei revealed via PERF-seq: As the workflow confirmed the overall ability to enrich cellular populations based on mRNA expression, it was assessed whether PERF-seq would be similarly compatible with the enrichment and profiling of single nuclei under different preservation conditions. Such a workflow would be particularly compelling due to the lack of well-described antibodies against nuclear proteins that can be used to enrich distinct populations. To assess the possibility, nuclei were isolated from frozen adult mouse brain cerebellum and human glioblastoma multiforme (GBM) tissue preserved in FFPE (Fig. 5A). For the mouse cerebellum sample, the goal was to enrich for oligodendrocytes via the highly specific gene Mobp. The relative sparsity of these cells in the cerebellum (~2% of total cells reported) and the implication of cell state-specific disease progression in multiple neurodegenerative disorders motivated the profiling. For the human GBM samples (n=2 donors), three vasculature-associated genes, DCN, FN1, and VWF, were selected that have been individually implicated as biomarkers in the potential pathogenesis or treatment response of glioblastomas. Reanalysis of existing GBM FFPE data suggested that these three genes are principally expressed in a rare (~4.2%) population of vascular-derived cells in primary GBM tumors, motivating detailed study via PERF-seq. Similar to the immune cell benchmarking (Fig. 2), downsampling analyses confirmed that the upstream HCR-FlowFISH workflow had minimal, if any, effect on total UMI recovery as compared to public Flex profiles from 10x Genomics (Figs. 5B, 5C).
After sorting and profiling the ~4.1% of Mobp-expressing populations, it was determined that 98.6% of cells sorted from the Mobp+ FISH population expressed >1 Mobp UMI compared to 2.0% of the Mobp- population, a nearly 50x enrichment (Figs. 5A, 5D-5F). Co-embedding of the Mobp+ and Mobp- samples and annotation of canonical marker genes confirmed the expected cerebellum cell types, including Gabra6 and Rbfox3 (granule cells), Gad1 (interneurons), and Itih3 (Bergmann glia), which were detected in the Mobp- library. Subclustering of the Mobp+ oligodendrocytes recovered four major populations with distinct marker profiles. Cluster 0 (n=3,805 nuclei) was characterized by Il33 and Ptgds, known marker genes of mature, terminally differentiated oligodendrocytes, whereas cluster 1 (n=3,194 nuclei) was characterized by Klk6 and S100B, markers of maturing oligodendrocyte precursors that populate hindbrain structures and the spinal cord (Fig. 5G). Clusters 2 and 3 were rare populations (n=429 and n=212 nuclei, respectively). Cluster 2 was defined by a mix of mature oligodendrocyte, oligodendrocyte precursor, and neuronal synapse markers (Atp1b1, Scnb, Snap25), whereas cluster 3 was marked by Agt and Aqp4 and likely represents a low-frequency Mobp+ astrocyte population (Fig. 5G). Though rare, prior evidence has suggested that both subclusters may play a functional role: cluster 2 may be a population of differentiating oligodendroglia that prune synapses, and cluster 3 may represent Mobp+ astrocytes, which have previously been described in cortical astrocyte populations.
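The "nearly 50x" figure for the Mobp sort is the ratio of the two detection fractions reported above. A minimal sketch of that arithmetic follows; the function name `fold_enrichment` is illustrative only and is not part of the described workflow.

```python
def fold_enrichment(positive_fraction: float, negative_fraction: float) -> float:
    """Ratio of marker-detected fractions between the positive and negative sorts."""
    if negative_fraction <= 0:
        raise ValueError("negative_fraction must be greater than zero")
    return positive_fraction / negative_fraction


# Fractions of sorted nuclei with detected Mobp UMIs, as reported in the text.
mobp_fold = fold_enrichment(0.986, 0.020)
print(f"Mobp fold enrichment: {mobp_fold:.1f}x")
```

The result, roughly 49x, matches the enrichment stated in the text.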
Next, the PERF-seq profiles from the GBM enrichment were analyzed. A distinct cell cluster enriched primarily for cells from the vasculature OR-gated panel that expressed relevant marker genes was observed (Figs. 5H, 5I). Not only did distributions of the sum of the UMIs in the panel show clear separation between the positive and negative populations (Fig. 5J), but overall a ~23x increase in signal was detected (mean panel+: 33.0 UMIs; mean panel-: 1.4 UMIs), where most background expression was driven by promiscuous expression of FN1, as expected (Fig. 5I). Differential expression analyses confirmed that genes in the panel were among the top effects by fold-change, with other collagen-associated genes similarly enriched within the vascular-enriched cells (Fig. 5K). Subclustering of the 1,015 vascular cells from the positive sort confirmed the two major populations, VWF+ endothelial cells and FN1+/DCN+ mural cells (Figs. 5L, 5M). While identification of these major populations within the vasculature corroborates prior scRNA-seq analyses, high-quality PERF-seq profiles allowed for further identification of 8 subclusters within the enriched vasculature (Figs. 5M, 5N). These included an abundant population of endothelial cells expressing PLVAP, a vascular marker of blood-brain barrier disruption; proliferating cells marked by MKI67; and an ultra-rare population of mural cells expressing OGN, which has not been well defined in the human brain but is potentially linked to brain tumorigenesis.
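The panel-level comparison above rests on summing UMIs across the three panel genes in each cell and comparing the means of the two sorted libraries. The sketch below illustrates that calculation under stated assumptions: the per-cell counts are toy values invented for illustration, and the helper names are hypothetical. Only the reported means (33.0 and 1.4 UMIs) come from the text.

```python
from statistics import mean
from typing import Mapping, Sequence

PANEL = ("DCN", "FN1", "VWF")  # the three vasculature-associated genes named above


def panel_umi_sum(cell: Mapping[str, int]) -> int:
    """Sum UMIs over the panel genes for one cell; genes outside the panel are ignored."""
    return sum(cell.get(gene, 0) for gene in PANEL)


def mean_panel_signal(cells: Sequence[Mapping[str, int]]) -> float:
    """Mean panel UMI sum across a library."""
    return mean(panel_umi_sum(cell) for cell in cells)


# Toy per-cell UMI counts (hypothetical, not data from the experiment).
toy_positive = [{"DCN": 20, "FN1": 10}, {"VWF": 36}]
toy_negative = [{"FN1": 2}, {"FN1": 1}, {}, {"FN1": 1}]
toy_ratio = mean_panel_signal(toy_positive) / mean_panel_signal(toy_negative)

# Ratio of the means reported in the text.
reported_ratio = 33.0 / 1.4
print(f"reported panel+ / panel- ratio: {reported_ratio:.1f}x")
```

The ratio of the reported means is about 23.6, consistent with the ~23x increase stated above.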
Resolving somatic mosaicism with PERFF-seq: To further evaluate the sensitivity of PERFF-seq to lowly expressed genes, an OR-gated panel targeting multiple genes co-expressed in individual cells was built to boost sensitivity. Conceptually, if any one gene or combination of genes is expressed at sufficient levels, a fluorescent signal would be generated, allowing for the isolation of the population. The goal of this experiment was to study mosaic loss-of-Y chromosome (LOY), the most common, age-related, somatically acquired mutation in the male genome. When LOY occurs in a hematopoietic stem and progenitor cell, the descendant, terminally differentiated cells present with LOY across multiple lineages with a myeloid skew. To capture LOY events, a ten-gene FlowFISH panel for the male-specific region of the Y chromosome was designed (MSY; Fig. 7A). The MSY genes were selected because each is lowly expressed individually (1.5 UMIs per gene per positive cell; sum 5.5 UMIs per positive cell) compared with other targets (13.4 UMIs; Fig. 7B). Profiling four healthy donors aged from 20 years to 51 years via FlowFISH, the expected age-associated accumulation of MSY- cells was observed, reaching up to 1.9% in the oldest donor (Fig. 7C). Downstream analyses of PERFF-seq libraries profiling 35,283 cells confirmed a ~2-3x increase in cells with no Y chromosome UMIs and an overall shift in the cumulative distribution across the three aged donors. Using Azimuth, the expected major PBMC types were recovered, including anticipated skews between libraries from the MSY sort (Fig. 7D). Cell-type annotations confirmed anticipated skews between MSY- and MSY+ PERFF-seq libraries, including MSY+ enrichment of monocytes, CD4+ naive T cells and Treg cells (Figs. 7D and 7E). Differential gene expression and downstream gene set enrichment analyses for CD14 monocytes revealed enrichment of four hallmark gene sets, including tumor necrosis factor (TNF) signaling by nuclear factor κ-light-chain-enhancer of activated B cells (NF-κB), suggesting that these monocytes may be linked to inflammatory dysregulation implicated in aged individuals with LOY (Fig. 7F). In sum, PERFF-seq can infer somatic changes as long as a sufficient fluorescent signal can be generated via FlowFISH. Although MSY genes are lowly expressed, the analyses show PERFF-seq can be applied to a range of gene expression levels with corresponding degrees of enrichment.
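The OR-gated panel logic described above can be pictured with a small sketch. It assumes, for illustration only, that probes for all panel genes share one fluorophore so that per-gene signals add before a single gate threshold is applied. The gene identifiers are placeholders (the ten MSY genes are not listed in this section), and the threshold and signal values are hypothetical.

```python
from typing import Mapping, Sequence

# Placeholder identifiers standing in for the ten-gene MSY panel (hypothetical names).
MSY_PANEL = tuple(f"MSY_GENE_{i}" for i in range(1, 11))


def pooled_signal(signal_by_gene: Mapping[str, float]) -> float:
    """Total signal over the panel, assuming all probes report in one shared channel."""
    return sum(signal_by_gene.get(gene, 0.0) for gene in MSY_PANEL)


def is_msy_positive(signal_by_gene: Mapping[str, float], threshold: float) -> bool:
    """OR gate: any single gene or combination that reaches the threshold is positive."""
    return pooled_signal(signal_by_gene) >= threshold


def fraction_without_y_umis(cells: Sequence[Mapping[str, int]]) -> float:
    """Fraction of cells in a library with zero UMIs across the MSY panel."""
    if not cells:
        raise ValueError("cells must not be empty")
    zero = sum(1 for cell in cells if sum(cell.get(g, 0) for g in MSY_PANEL) == 0)
    return zero / len(cells)


# Four weakly expressed genes together pass a gate that no single gene would pass.
weak_combo = {f"MSY_GENE_{i}": 1.5 for i in range(1, 5)}
print(is_msy_positive(weak_combo, threshold=5.0))
print(is_msy_positive({"MSY_GENE_1": 1.5}, threshold=5.0))
```

Comparing `fraction_without_y_umis` between the MSY- and MSY+ libraries is one way to express the increase in cells with no Y chromosome UMIs described above.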
Taken together, the identification of these rare subpopulations from heterogeneous tissue inputs expands the scope of PERF-seq logic-gating to enrich nuclei upstream of single-cell profiling. These demonstrations in heterogeneous brain tissue types emphasize the power of this programmable, RNA-based method for enrichment upstream of scRNA-seq.
Co-detection of protein and RNA via fluorescence for sorting upstream of single-cell genomics. An experiment was performed that combines surface antibody staining to identify populations with HCR-FlowFISH RNA signal. The results confirm a substantial increase in BCL11A expression via HCR-FlowFISH (~6x increase in pDCs compared to B cells), which is consistent with the bulk RNA-seq data from these populations (Fig. 8). As other populations have residual BCL11A expression, including monocytes and dendritic cells, the HCR-FlowFISH data show minimal, if any, separation of B cells from these populations, indicating that it would not be possible to separate B cells from other PBMCs with high fidelity with only this marker.
As a different experiment to assess separating intermediate populations, the CD4 FlowFISH data was re-analyzed in the context of the HCR-FlowFISH data. The analyses indicate that CD4 RNA expression is actually more intermediate in T cells relative to monocytes (Fig. 9), which was subsequently confirmed with an additional antibody staining experiment.
As the primary results show that multiplexed sort logic can be successful, including in the characterization of CD4+/CD3E+ T cells, it was concluded that the lower expression of CD4 in T cells can still be useful for defining populations. Taken together, it was concluded that HCR-FlowFISH is indeed quantitative, and these vignettes indicate that more moderately expressed genes can be useful markers of specific cell types but should be used in conjunction with other markers (e.g., surface antibodies or other HCR-FlowFISH genes) in order to achieve clear separation.
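Combining a moderately expressed marker with a second marker, as recommended above, amounts to composing per-channel thresholds with Boolean logic. The following is a minimal sketch of such composable gates. The channel names reuse markers mentioned in the text, while the intensity values, thresholds, and helper names are hypothetical and are not taken from any cytometer software.

```python
from typing import Callable, Mapping

Event = Mapping[str, float]          # one cytometry event: channel name -> intensity
Gate = Callable[[Event], bool]


def above(channel: str, threshold: float) -> Gate:
    """Gate that passes events whose intensity in `channel` reaches `threshold`."""
    return lambda event: event.get(channel, 0.0) >= threshold


def AND(*gates: Gate) -> Gate:
    return lambda event: all(gate(event) for gate in gates)


def OR(*gates: Gate) -> Gate:
    return lambda event: any(gate(event) for gate in gates)


def NOT(gate: Gate) -> Gate:
    return lambda event: not gate(event)


# Hypothetical thresholds; a CD4+/CD3E+ gate requires both RNA signals.
cd4_t_cell_gate = AND(above("CD4", 500.0), above("CD3E", 500.0))

events = [
    {"CD4": 650.0, "CD3E": 900.0},   # passes both thresholds
    {"CD4": 2000.0, "CD3E": 40.0},   # high CD4 but no CD3E signal
    {"CD4": 30.0, "CD3E": 850.0},    # CD3E only
]
selected = [event for event in events if cd4_t_cell_gate(event)]
print(len(selected))
```

Requiring the second marker is what removes events that carry CD4 signal alone, which a single CD4 threshold could not exclude.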
Discussion
The collective applications confirm the utility of PERF-seq to enrich a variety of transcriptomic markers across distinct input material, including cells and nuclei derived from heterogeneous tissue preservation methods, including FFPE (Fig. 5). As dissociations from solid tissue types often require the isolation of nuclei rather than cells for genomics profiling, the enrichment of specific cell populations is challenging due to a lack of well-described antibodies that stain the surface of the nucleus and a limited ability to enrich based on appropriate intranuclear proteins. Further, as continued efforts to draw cell atlases have unveiled increasingly complex potential cell types defined by specific markers, including in the mouse cerebellum, PERF-seq may be used to enrich for specific populations defined by genes without the need for laborious genetic engineering, including via Cre recombinases. Though prior methods such as Probe-seq have coupled RNA FISH to bulk RNA sequencing, PERF-seq is distinct in its ability to profile populations of single cells at high throughput (~10^4-10^5 cells per capture).
As PERF-seq utilizes commercially available kits for FlowFISH and scRNA-seq profiling, its adoption should be straightforward in groups proficient in either technology. In this method, the number of cells/nuclei recovered from the enrichment sort should be sufficient to start the 10x Flex workflow, which typically requires ~10^5-10^6 cells for cell pelleting upstream of droplet encapsulation. In this sense, sorting for increasingly rare populations requires a concomitant abundance of starting material. Though experimental feasibility with up to three fluorophores in the same experiment was shown (Fig. 3), the HCR-FlowFISH kits have 10+ colors, conceptually allowing for sophisticated AND/NOT/OR logic-gating of populations for profiling. These limitations can be overcome via the use of new reagents for efficiently handling small numbers of cells that may extend to the PERF-seq workflow.
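The relationship between population rarity and starting material noted above can be made concrete with a back-of-the-envelope estimate. The sketch below assumes a simple model in which the required input equals the target number of sorted cells divided by the population frequency and by a sort recovery factor. The `sort_recovery` parameter and the function name are assumptions introduced for illustration, not values from the described method.

```python
import math


def required_input_cells(
    target_sorted: int,
    population_frequency: float,
    sort_recovery: float = 1.0,
) -> int:
    """Estimate the input cells needed to recover `target_sorted` cells of a rare population.

    population_frequency: fraction of input cells in the population (0 < f <= 1).
    sort_recovery: assumed fraction of target cells surviving staining and sorting.
    """
    if not 0 < population_frequency <= 1:
        raise ValueError("population_frequency must be in (0, 1]")
    if not 0 < sort_recovery <= 1:
        raise ValueError("sort_recovery must be in (0, 1]")
    return math.ceil(target_sorted / (population_frequency * sort_recovery))


# A ~2% population and a 10^5-cell target, first with ideal and then with 50% recovery.
ideal = required_input_cells(100_000, 0.02)
lossy = required_input_cells(100_000, 0.02, sort_recovery=0.5)
print(ideal, lossy)
```

Under these assumptions, a population at ~2% frequency needs on the order of 5 x 10^6 input cells to yield 10^5 sorted cells, and about twice that if half of the target cells are lost during handling.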
Noting the cell number requirement for all droplet-based scRNA-seq workflows, the coupling of HCR-FlowFISH FACS to scRNA-seq enables distinct sorting strategies. Rather than counting events on a cytometer, PERF-seq allows for designing an inclusive sorting logic where populations can be subsequently refined using transcriptomic profiles. For example, the application to sorting BCL11A+ cells allowed for detailed reanalyses of both AS DCs and B cells in the same experiment without pre-specifying the populations with distinct markers during sorting, as these populations could be readily separated from scRNA-seq analyses (Fig. 4). Though many surface markers such as CD3E and MS4A1 (CD20) were consistent between surface protein and mRNA expression, other markers such as IL3RA (CD123) had gene expression in lineages where surface protein expression is absent. Similarly, though BCL11A is required for both B cell and pDC development, the higher mRNA expression of this TF in pDCs resulted in a substantial enrichment of this cell type compared to a minimal enrichment of B cells in the PERF-seq library. Collectively, these vignettes motivate a careful data-driven exploration of appropriate marker genes using expression data rather than conventional knowledge derived from past FACS markers. Fortunately, such data-driven explorations are straightforward given the wealth of high-quality scRNA-seq profiles across many tissues, systems, and pathologies.
PERF-seq may be a particularly advantageous protocol for the sorting and enrichment of cells or nuclei in settings where antibodies against marker proteins are either not available, not applicable, or poorly defined for a population of interest. In particular, as FlowFISH technologies have been established for viral gene expression, ribosomal RNA content, or long non-coding RNAs (Fig. 2B), the workflow distinctly enables studying populations defined by these markers that have been understudied in conventional genomics analyses. Thus, an iterative process is envisioned where populations may be defined with existing large-scale scRNA-seq atlases, followed by rational nucleic acid cytometry, which will be accelerated by the development of the assay described herein. Through these iterative definitions, enrichments, and applications, PERF-seq may aid in the etching of cell atlases with an even greater definition.
References
1. Regev, A. et al. The Human Cell Atlas. Elife 6, (2017).
2. Montoro, D. T. et al. A revised airway epithelial hierarchy includes CFTR-expressing ionocytes. Nature 560, 319-324 (2018).
3. Drokhlyansky, E. et al. The Human and Mouse Enteric Nervous System at Single-Cell Resolution. Cell 182, (2020).
4. Stuart, T. et al. Comprehensive Integration of Single-Cell Data. Cell 177, 1888-1902.e21 (2019).
5. Villani, A.-C. et al. Single-cell RNA-seq reveals new types of human blood dendritic cells, monocytes, and progenitors. Science 356, (2017).
6. Lareau, C. A. et al. Latent human herpesvirus 6 is reactivated in CAR T cells. Nature 623, 608-615 (2023).
7. Chung, H. et al. Joint single-cell measurements of nuclear proteins and RNA in vivo. Nat. Methods 18, 1204-1212 (2021).
8. Evers, D. L., Fowler, C. B., Cunningham, B. R., Mason, J. T. & O’Leary, T. J. The effect of formaldehyde fixation on RNA: optimization of formaldehyde adduct removal. J. Mol. Diagn. 13, 282-288 (2011).
9. Choi, H. M. T. et al. Third-generation in situ hybridization chain reaction: multiplexed, quantitative, sensitive, versatile, robust. Development 145, (2018).
10. Choi, H. M. T., Beck, V. A. & Pierce, N. A. Next-Generation in Situ Hybridization Chain Reaction: Higher Gain, Lower Cost, Greater Durability. ACS Nano 8, 4284-4294 (2014).
11. Wang, Y. et al. EASI-FISH for thick tissue defines lateral hypothalamus spatio-molecular organization. Cell 184, 6361-6377.e24 (2021).
12. Alon, S. et al. Expansion sequencing: Spatially precise in situ transcriptomics in intact biological systems. Science 371, (2021).
13. Marshall, J. L. et al. HyPR-seq: Single-cell quantification of chosen RNAs via hybridization and sequencing of DNA probes. Proc. Natl. Acad. Sci. U. S. A. 117, 33404-33413 (2020).
14. Cano-Gamez, E. et al. Single-cell transcriptomics identifies an effectorness gradient shaping the response of CD4+ T cells to cytokines. Nat. Commun. 11, 1801 (2020).
15. Orkin, S. H. & Zon, L. I. Hematopoiesis: an evolving paradigm for stem cell biology. Cell 132, 631-644 (2008).
16. Choi, J. et al. Haemopedia RNA-seq: a database of gene expression during haematopoiesis in mice and humans. Nucleic Acids Res. 47, D780-D785 (2018).
17. Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573-3587.e29 (2021).
18. Kozareva, V. et al. A transcriptomic atlas of mouse cerebellar cortex comprehensively defines cell types. Nature 598, 214-219 (2021).
19. Siokas, V. et al. Myelin-associated oligodendrocyte basic protein rs616147 polymorphism as a risk factor for Parkinson’s disease. Acta Neurol. Scand. 145, 223-228 (2022).
20. Irwin, D. J. et al. Myelin oligodendrocyte basic protein and prognosis in behavioral-variant frontotemporal dementia. Neurology 83, 502-509 (2014).
21. Kon, T. et al. Immunoreactivity of myelin-associated oligodendrocytic basic protein in Lewy bodies. Neuropathology 39, 279-285 (2019).
22. Patel, K. S. et al. Decorin expression is associated with predictive diffusion MR phenotypes of anti-VEGF efficacy in glioblastoma. Sci. Rep. 10, 14819 (2020).
23. Serres, E. et al. Fibronectin expression in glioblastomas promotes cell cohesion, collective invasion of basement membrane in vitro and orthotopic tumor growth in mice. Oncogene 33, 3451-3462 (2014).
24. Mojiri, A. et al. Functional assessment of von Willebrand factor expression by cancer cells of non-endothelial origin. Oncotarget 8, 13015-13029 (2017).
25. Sung, H.-Y. et al. Down-regulation of interleukin-33 expression in oligodendrocyte precursor cells impairs oligodendrocyte lineage progression. J. Neurochem. 150, 691-708 (2019).
26. Huang, H.-T. & Tzeng, S.-F. Interleukin-33 has the protective effect on oligodendrocytes against impairment induced by cuprizone intoxication. Neurochem. Int. 172, 105645 (2024).
27. Floriddia, E. M. et al. Distinct oligodendrocyte populations have spatial preference and different responses to spinal cord injury. Nat. Commun. 11, 5860 (2020).
28. Langlieb, J. et al. The molecular cytoarchitecture of the adult mouse brain. Nature 624, 333-342 (2023).
29. Du, J. et al. S100B is selectively expressed by gray matter protoplasmic astrocytes and myelinating oligodendrocytes in the developing CNS. Mol. Brain 14, 154 (2021).
30. Auguste, Y. S. S. et al. Oligodendrocyte precursor cells engulf synapses during circuit remodeling in mice. Nat. Neurosci. 25, 1273-1278 (2022).
31. Morel, L. et al. Intracortical astrocyte subpopulations defined by astrocyte reporter mice in the adult brain. Glia 67, 171-181 (2019).
32. Xie, Y. et al. Key molecular alterations in endothelial cells in human glioblastoma uncovered through single-cell RNA sequencing. JCI insight 6, (2021).
33. Mei, Y. et al. Osteoglycin promotes meningioma development through downregulation of NF2 and activation of mTOR signaling. Cell Commun. Signal. 15, 34 (2017).
34. Amamoto, R. et al. Probe-Seq enables transcriptional profiling of specific cell types from heterogeneous tissue by RNA-based isolation. Elife 8, (2019).
35. Sample enrichment for single-nucleus sequencing using concanavalin A-conjugated magnetic beads. STAR Protocols 4, 102595 (2023).
36. Yu, Y. et al. Bcl11a is essential for lymphoid development and negatively regulates p53. J. Exp. Med. 209, 2467-2483 (2012).
37. Ippolito, G. C. et al. Dendritic cell fate is determined by BCL11A. Proc. Natl. Acad. Sci. U. S. A. 111, E998-1006 (2014).
38. Warren, C. J. et al. Quantification of virus-infected cells using RNA FISH-Flow. STAR Protoc 4, 102291 (2023).
39. Antony, C., Somers, P., Gray, E. M., Pimkin, M. & Paralkar, V. R. FISH-Flow to quantify nascent and mature ribosomal RNA in mouse and human cells. STAR Protoc 4, 102463 (2023).
40. Gonzalez-Vasconcellos, I., Cobos-Fernandez, M. A., Atkinson, M. J., Fernandez-Piqueras, J. & Santos, J. Quantifying telomeric lncRNAs using PNA-labelled RNA-Flow FISH (RNA-Flow). Commun Biol 5, 513 (2022).
41. Clark, I. C. et al. Identification of astrocyte regulators by nucleic acid cytometry. Nature 614, 326-333 (2023).
42. Reilly, S. K. et al. Direct characterization of cis-regulatory elements and functional dissection of complex genetic associations using HCR-FlowFISH. Nat. Genet. 53, 1166-1176 (2021).
43. Aran, D. et al. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat. Immunol. 20, 163-172 (2019).
44. Rouillard, A. D. et al. The harmonizome: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins. Database 2016, (2016).
Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it is readily apparent to those of ordinary skill in the art in light of the teachings of this invention that certain changes and modifications may be made thereto without departing from the spirit or scope of the appended claims.
Claims
1. A method comprising:
(a) obtaining a sample comprising fixed cells or fixed nuclei;
(b) hybridizing oligonucleotide probes to RNA in the fixed cells or fixed nuclei, to produce labeled cells or labeled nuclei;
(c) enriching for a sub-population of the fixed cells or fixed nuclei based on their labeling, by flow cytometry;
(d) treating the enriched cells or enriched nuclei with a double-stranded DNAse for a sufficient time to degrade the oligonucleotide probes;
(e) inactivating the double-stranded DNAse; and
(f) performing single-cell RNA analysis on the cells or nuclei.
2. The method of claim 1, wherein the method is done without uncrosslinking the enriched cells or enriched nuclei.
3. The method of claim 1 or 2, wherein the method is done without denaturing the oligonucleotide probes from the RNA.
4. The method of any prior claim, wherein the double stranded DNAse is inactivated by exposure to a temperature in the range of 50 °C to 75 °C and/or by addition of a reducing or chelating agent.
5. The method of any prior claim, wherein at least 1,000 cells or nuclei are enriched in step (c).
6. The method of any prior claim, wherein less than 10% of the fixed cells or fixed nuclei are enriched in step (c).
7. The method of any prior claim, wherein the chemical crosslinker comprises paraformaldehyde or glutaraldehyde.
8. The method of any prior claim, wherein step (a) further comprises permeabilizing the cells or nuclei using a detergent.
9. The method of any prior claim, wherein the oligonucleotide probes of (b) comprise: (i) a pair of unlabeled initiator probes that hybridize to adjacent sites in a target RNA and (ii) fluorescently labeled amplification probes that hybridize to a pair of initiator probes when the initiator probes are hybridized to their target RNA.
10. The method of claim 9, wherein the fluorescently labeled amplification probes are designed to hybridize to one another as well as to the initiator probes, thereby forming a complex comprising multiple amplification probes when the initiator probes hybridize to their target RNA.
11. The method of claim 10, wherein the amplification probes have a hairpin structure and the labeling is done via a hybridization chain reaction.
12. The method of any prior claim, wherein the single cell RNA analysis is done using a single-cell compartmentalization method or a split-and-pool barcoding method.
13. The method of claim 12, wherein the single cell RNA analysis is done using a single-cell compartmentalization method that comprises:
(i) compartmentalizing the cells or nuclei, wherein at least some compartments receive a single cell;
(ii) making cell-specifically barcoded cDNA from the cells or nuclei in the compartments; and
(iii) sequencing the cell-specifically barcoded cDNA.
14. The method of claim 12, wherein the single cell RNA analysis is done using a single-cell compartmentalization method that comprises:
(i) hybridizing probes to RNA in the cells or nuclei;
(ii) compartmentalizing the cells or nuclei, wherein at least some compartments receive a single cell;
(iii) adding single cell barcodes to the hybridized probes, or ligation products thereof, in the compartments; and
(iv) sequencing the cell-specifically barcoded probes, or ligation products thereof.
15. The method of claim 12, wherein the single cell RNA analysis is done using a split-and-pool barcoding method comprising:
(i) making cDNA in the cells or nuclei;
(ii) compartmentalizing the cells or nuclei, wherein at least some compartments receive multiple cells; and
(iii) adding cell-specific barcodes to the cDNA in the cells or nuclei using a split-and-pool-barcoding method.
16. The method of claim 12, wherein the single cell RNA analysis is done using a split-and-pool barcoding method comprising:
(i) making cDNA in the cells or nuclei;
(ii) compartmentalizing the cells or nuclei, wherein at least some compartments receive multiple cells; and
(iii) adding cell-specific barcodes to the cDNA in the cells or nuclei using a split-and-pool-barcoding method.
17. The method of claim 12, wherein the single cell RNA analysis is done using a split-and-pool barcoding method comprising:
(i) hybridizing probes to RNA in the cells or nuclei;
(ii) compartmentalizing the cells or nuclei, wherein at least some compartments receive multiple cells;
(iii) adding single cell barcodes to the hybridized probes, or ligation products thereof, using a split-and-pool-barcoding method; and
(iv) sequencing the cell-specifically barcoded probes, or ligation products thereof.
18. The method of any prior claim, wherein the method comprises making cell-specifically barcoded cDNA and sequencing the cell-specifically barcoded cDNA, or an amplification product thereof.
19. The method of any prior claim, wherein the sample comprises cells that are grown as a cell suspension, disassociated cells, optionally from a soft tissue, blood cells, or nuclei isolated from the same.
20. The method of any prior claim, wherein the method further comprises assaying the expression of a protein in the fixed cells or fixed nuclei, and step (c) comprises enriching for a sub-population of the fixed cells or fixed nuclei based on their labeling and protein expression, by flow cytometry.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202463569677P | 2024-03-25 | 2024-03-25 | |
| US63/569,677 | 2024-03-25 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025207407A1 (en) | 2025-10-02 |
Family
ID=97215886
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2025/020760 (Pending; WO2025207407A1 (en)) | Programmable enrichment via rna fish for single-cell rna analysis | 2024-03-25 | 2025-03-20 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025207407A1 (en) |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Hughes et al. | Second-strand synthesis-based massively parallel scRNA-seq reveals cellular states and molecular features of human inflammatory skin pathologies | |
| Massoni-Badosa et al. | An atlas of cells in the human tonsil | |
| US20230348971A1 (en) | Transposition into native chromatin for personal epigenomics | |
| Liu et al. | Single-cell analysis of long non-coding RNAs in the developing human neocortex | |
| MacParland et al. | Single cell RNA sequencing of human liver reveals distinct intrahepatic macrophage populations | |
| Pressl et al. | Selective vulnerability of layer 5a corticostriatal neurons in Huntington’s disease | |
| Fan et al. | Combinatorial labeling of single cells for gene expression cytometry | |
| Engel et al. | Innate-like functions of natural killer T cell subsets result from highly divergent gene programs | |
| EP3994696B1 (en) | Systems and methods for sample preparation, sample sequencing, and sequencing data bias correction and quality control | |
| Guo et al. | Single-cell transcriptome profiling and chromatin accessibility reveal an exhausted regulatory CD4+ T cell subset in systemic lupus erythematosus | |
| CN116157533A (en) | Capturing genetic targets using hybridization methods | |
| CN109777872B (en) | T cell subsets and their signature genes in lung cancer | |
| EP3274476B1 (en) | Digital analysis of circulating tumor cells in blood samples | |
| Ocañas et al. | Minimizing the ex vivo confounds of cell-isolation techniques on transcriptomic and translatomic profiles of purified microglia | |
| CN109906381B (en) | Methods of identifying, targeting and isolating human Dendritic Cell (DC) precursors, 'pre-DCs', and uses thereof | |
| US20200397828A1 (en) | Atlas of choroid plexus cell types and therapeutic and diagnostic uses thereof | |
| Chen et al. | Distinct transcriptomes and autocrine cytokines underpin maturation and survival of antibody-secreting cells in systemic lupus erythematosus | |
| Pimpalwar et al. | Methods for isolation and transcriptional profiling of individual cells from the human heart | |
| Singh et al. | In situ 10-cell RNA sequencing in tissue and tumor biopsy samples | |
| JP2018525034A (en) | Methods for providing tumor-specific T cells | |
| Amamoto et al. | FIN-Seq: transcriptional profiling of specific cell types from frozen archived tissue of the human central nervous system | |
| Kim et al. | Boosting of tau protein aggregation by CD40 and CD48 gene expression in Alzheimer's disease | |
| Abay et al. | Transcript-specific enrichment enables profiling of rare cell states via single-cell RNA sequencing | |
| Ben-Othman et al. | Systems biology methods applied to blood and tissue for a comprehensive analysis of immune response to hepatitis B vaccine in adults | |
| TW201920660A (en) | Novel cell line and uses thereof |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 25776147; Country of ref document: EP; Kind code of ref document: A1 |