
WO2025043080A2 - Systems and methods for cellular spatial analysis - Google Patents

Info

Publication number
WO2025043080A2
Authority
WO
WIPO (PCT)
Prior art keywords
cells
genes
cell
gland
matrix
Legal status
Pending
Application number
PCT/US2024/043413
Other languages
French (fr)
Other versions
WO2025043080A3 (en
Inventor
Patrick DANAHER
Joachim Helmut Schmid
Current Assignee
Bruker Spatial Biology Inc
Original Assignee
Bruker Spatial Biology Inc
Application filed by Bruker Spatial Biology Inc
Publication of WO2025043080A2
Publication of WO2025043080A3

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B45/00 ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks

Definitions

  • One class of methods designed to fish biology from the data deluge of spatial transcriptomics looks for spatially correlated sets of genes, that is, genes that tend to be expressed in the same regions. Spatial correlation between genes can arise through direct cell-cell communication, or from some underlying latent variable; both these mechanisms are of interest.
  • a method for analyzing a biological sample to identify genes having spatial correlations with one another comprising: (a) retrieving, by a computer processor and from a database: a location data indicative of relative positions of the plurality of cells in a multi-dimensional image of the biological sample; and a transcriptomic data of a plurality of genes of the plurality of cells; (b) processing, by the computer processor, at least the transcriptomic data to generate a gene expression matrix characterizing each cell of the plurality of cells by a gene expression level of the plurality of genes; (c) analyzing, by the computer processor, the location data and the transcriptomic data to generate an environment confounder matrix, the environment confounder matrix characterizes each cell of the plurality of cells by one or more environment variables of a region adjacent to or surrounding the each cell in the multi-dimensional image, the one or more environment variables are based at least in part on (i) cell classification of one or more cells in at least the region or (ii) measurement artifact
  • step (d) of the method further comprises determining a degree of the correlation.
  • step (e) of the method comprises identifying the at least one subset based on the degree of correlation.
  • the identifying in (e) is based at least in part on determining a threshold level of the degree of correlation.
  • the determining the correlation in (d) comprises analyzing covariance of the gene expression matrix conditional on the environment confounder matrix.
  • the method further comprises, in (d), generating a conditional correlation matrix of the plurality of genes based on the covariance, wherein the conditional correlation matrix is different from the correlation matrix of the gene expression matrix.
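The conditional analysis described in (d) can be illustrated with a minimal sketch. It assumes, as one possible reading, that the conditioning is linear: each gene's expression vector is residualized against the environment confounder variables (plus an intercept), and the correlation of the residuals is taken as the conditional correlation. The function names (`residualize`, `correlation_matrix`, `conditional_correlation`) and the toy values are hypothetical and do not come from the source.

```python
import math

def residualize(columns, confounders):
    """Remove from each column the part linearly explained by the confounders."""
    n = len(columns[0])
    # Orthonormal basis for span{intercept, confounders} via Gram-Schmidt.
    basis = []
    for vec in [[1.0] * n] + [list(c) for c in confounders]:
        v = list(vec)
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-12:  # skip confounders that add no new direction
            basis.append([x / norm for x in v])
    residuals = []
    for col in columns:
        r = list(col)
        for b in basis:
            dot = sum(x * y for x, y in zip(r, b))
            r = [x - dot * y for x, y in zip(r, b)]
        residuals.append(r)
    return residuals

def correlation_matrix(columns):
    """Pearson correlation between every pair of columns (0.0 if a column is constant)."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([x - mean for x in col])
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj) if ni > 0 and nj > 0 else 0.0
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]

def conditional_correlation(genes, confounders):
    """Correlation of gene vectors after regressing out the confounders (assumed linear)."""
    return correlation_matrix(residualize(genes, confounders))

# Toy data: 4 cells, one confounder, two genes that both track the confounder.
z = [-3.0, -1.0, 1.0, 3.0]       # hypothetical confounder, e.g. a cell type abundance
gene_a = [-5.0, -3.0, 1.0, 7.0]  # 2*z plus its own independent pattern
gene_b = [-7.0, 1.0, -1.0, 7.0]  # 2*z plus a different independent pattern

raw = correlation_matrix([gene_a, gene_b])
cond = conditional_correlation([gene_a, gene_b], [z])
print(raw[0][1], cond[0][1])
```

In this toy case the raw correlation between the two genes is about 0.87, while the conditional correlation is essentially zero, because everything the genes share is explained by the confounder.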
  • the at least one subset comprises a plurality of subsets that are different from one another.
  • the plurality of subsets comprises a first subset having a first plurality of genes and a second subset having a second plurality of genes, the first plurality of genes has at least one gene that is not in common with the second plurality of genes.
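As an illustration of identifying several distinct gene subsets from a threshold on the degree of correlation, the sketch below links genes whose pairwise correlation meets a threshold and reports each connected group of two or more genes. Grouping by connected components is an assumption made for illustration; the source does not state which clustering rule is used. The function name `correlated_subsets`, the gene labels, and the matrix values are hypothetical.

```python
def correlated_subsets(genes, corr, threshold):
    """Group genes into subsets linked by correlations at or above the threshold."""
    n = len(genes)
    seen = set()
    subsets = []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first walk over the thresholded correlation graph.
        stack, component = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if j not in seen and corr[i][j] >= threshold:
                    seen.add(j)
                    stack.append(j)
        if len(component) > 1:  # a lone gene is not a correlated subset
            subsets.append(sorted(genes[i] for i in component))
    return subsets

# Hypothetical (conditional) correlation matrix for five placeholder genes.
genes = ["G1", "G2", "G3", "G4", "G5"]
corr = [
    [1.0, 0.6, 0.1, 0.1, 0.1],
    [0.6, 1.0, 0.5, 0.1, 0.1],
    [0.1, 0.5, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.1, 0.7, 1.0],
]
subsets = correlated_subsets(genes, corr, threshold=0.4)
print(subsets)
```

With a threshold of 0.4 this yields two subsets, one holding G1, G2 and G3 and one holding G4 and G5, so the first subset contains genes that are not in the second.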
  • the gene expression matrix comprises an environment expression matrix, the environment expression matrix characterizes a cell of the plurality of cells by analyzing the gene expression level of the plurality of genes of nearest neighboring cells of the cell within the multi-dimensional image, and the method comprises, in (b), processing the transcriptomic data and the location data to generate the environment expression matrix.
  • a number of the nearest neighboring cells is at most about 1,000, at most about 500, at most about 100, or at most about 50.
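One way to picture an environment expression matrix is sketched below. It assumes a two-dimensional image, Euclidean distance, and a simple mean over the k nearest neighboring cells, excluding the cell itself; the source does not specify the aggregation, so these choices, the function name `environment_expression`, and the toy coordinates and counts are all hypothetical.

```python
import math

def environment_expression(positions, expression, k):
    """For each cell, average the expression of its k nearest neighbors (k >= 1).

    positions:  list of (x, y) coordinates, one per cell
    expression: list of per-cell gene-count lists (cells x genes)
    """
    n_cells = len(positions)
    n_genes = len(expression[0])
    env = []
    for i, pi in enumerate(positions):
        # Rank all other cells by Euclidean distance to cell i.
        order = sorted((j for j in range(n_cells) if j != i),
                       key=lambda j: math.dist(pi, positions[j]))
        neighbors = order[:k]
        env.append([sum(expression[j][g] for j in neighbors) / len(neighbors)
                    for g in range(n_genes)])
    return env

# Four cells on a line, two genes (toy values).
positions = [(0, 0), (1, 0), (2, 0), (10, 0)]
expression = [[1, 0], [3, 0], [5, 2], [0, 8]]
env = environment_expression(positions, expression, k=2)
print(env)
```

Each row of `env` describes the neighborhood of one cell rather than the cell itself, which is why an isolated cell (the last one here) still receives the profile of the nearest cluster.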
  • the method further comprises displaying, via a graphical user interface, the genes of the at least one subset to a user.
  • the method further comprises, based on the analyzing in (d), generating a gene cluster map comprising a plurality of shapes representing the plurality of genes, the plurality of shapes is arranged in a plurality of clusters, a cluster of the plurality of clusters corresponds to the at least one subset.
  • the method further comprises displaying, via a graphical user interface, the gene cluster map to a user.
  • the multi-dimensional image is a two-dimensional image.
  • the method further comprises, subsequent to (e), scoring each cell of the plurality of cells based on single cell expression level of the genes of the at least one subset. In some aspects, the method further comprises generating an additional multi-dimensional image of the biological sample based on the scoring. In some aspects, the method further comprises, in (c), receiving selection of the one or more environment variables from a user via a graphical user interface. In some aspects, the one or more environment variables are based on both (i) the cell classification in the at least the region and (ii) the measurement artifact of the transcriptomic data in the at least the region. In some aspects, the measurement artifact comprises detection data via a synthetic control probe sequence.
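The single-cell scoring step can be sketched as follows, under the assumption that a cell's score is the unweighted mean expression of the genes in the subset. The source mentions gene weights in connection with Fig. 3D but does not define them here, so no weighting is modeled. The function name `score_cells`, the gene labels, and the counts are hypothetical.

```python
def score_cells(expression, gene_names, subset):
    """Score each cell by the mean expression of the genes in the subset."""
    idx = [gene_names.index(g) for g in subset]
    return [sum(row[i] for i in idx) / len(idx) for row in expression]

gene_names = ["G1", "G2", "G3"]
expression = [  # cells x genes, toy counts
    [2, 4, 9],
    [0, 0, 5],
    [6, 2, 1],
]
scores = score_cells(expression, gene_names, subset=["G1", "G2"])
print(scores)
```

Plotting such scores at each cell's position would give one possible form of the additional image based on the scoring.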
  • the one or more environment variables are based at least in part on data comprising one or more of (i) a number of cells having a cell classification of interest in at least the region of the multi-dimensional image, (ii) a number of different cell classifications identified in at least the region of the multi-dimensional image, (iii) a ratio between numbers of cells of two different cell classifications of interest in at least the region of the multi-dimensional image, or (iv) a relative location between a cell having a cell classification of interest and a tissue substructure in at least a portion of the multi-dimensional image, or any combination thereof.
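Variables (i) to (iii) above can be derived from the cell classifications found in a cell's region. The sketch below is a hypothetical illustration: the function name `environment_variables`, the dictionary keys, and the example labels are assumptions, and variable (iv), which needs tissue substructure coordinates, is not covered.

```python
from collections import Counter

def environment_variables(region_labels, type_a, type_b):
    """Summarize the cell classifications observed in one cell's region."""
    counts = Counter(region_labels)
    n_a, n_b = counts[type_a], counts[type_b]
    return {
        "count_a": n_a,                               # (i) cells of a class of interest
        "n_classes": len(counts),                     # (ii) number of distinct classes
        "ratio_a_to_b": n_a / n_b if n_b else None,   # (iii) ratio, undefined if no type_b
    }

# Classifications of the cells in one region (toy example).
labels = ["fibroblast", "immune", "immune", "epithelial", "immune"]
row = environment_variables(labels, type_a="immune", type_b="fibroblast")
print(row)
```

Computing one such row per cell and stacking the rows would give a cells-by-variables table, which is one plausible shape for an environment confounder matrix.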
  • the region can be characterized by having at most about 5 cells, at most about 10 cells, at most about 20 cells, at most about 50 cells, or at most about 100 cells.
  • a number of the one or more environment variables in the environment confounder matrix is at least about 5, at least about 10, at least about 15, or at least about 20.
  • the cell classification comprises one or more of endothelial cells, epithelial cells, dermal cells, endodermal cells, mesodermal cells, fibroblasts, osteocytes, chondrocytes, immune cells, dendritic cells, hepatic cells, pancreatic cells, or stromal cells, or any combination thereof.
  • the cell classification comprises one or more of salivary gland mucous cells, salivary gland serous cells, von Ebner's gland cells, mammary gland cells, lacrimal gland cells, ceruminous gland cells, eccrine sweat gland dark cells, eccrine sweat gland clear cells, apocrine sweat gland cells, gland of Moll cells, sebaceous gland cells, Bowman's gland cells, Brunner's gland cells, seminal vesicle cells, prostate gland cells, bulbourethral gland cells, Bartholin's gland cells, gland of Littre cells, uterus endometrium cells, isolated goblet cells, stomach lining mucous cells, gastric gland zymogenic cells, gastric gland oxyntic cells, pancreatic acinar cells, Paneth cells, type II pneumocytes, Clara cells, somatotropes, lactotropes, thyrotropes, gonadotropes, corticotropes, intermediate pituitary cells, magnocellular
  • the cell classification comprises one or more of embryonic stem cells, embryonic germ cells, induced pluripotent stem cells, mesenchymal stem cells, bone marrow-derived mesenchymal stem cells, bone marrow-derived mesenchymal stromal cells, tissue plastic-adherent placental stem cells (PDACs), umbilical cord stem cells, amniotic fluid stem cells, amnion derived adherent cells (AMDACs), osteogenic placental adherent cells (OPACs), adipose stem cells, limbal stem cells, dental pulp stem cells, myoblasts, endothelial progenitor cells, neuronal stem cells, exfoliated teeth derived stem cells, hair follicle stem cells, dermal stem cells, parthenogenically derived stem cells, reprogrammed stem cells, amnion derived adherent cells, or side population stem cells, or any combination thereof.
  • the transcriptomic data comprises one or more of gene expression assays with fluorescently labeled probes, RNA sequencing (RNA-seq), microarray analysis, reverse transcription polymerase chain reaction (RT-PCR), cap analysis of gene expression, or single-cell RNA sequencing (scRNA-seq), or any combination thereof.
  • the plurality of cells comprises at least about 100 cells, at least about 200 cells, at least about 500 cells, or at least about 1,000 cells.
  • the plurality of genes comprises at least about 10 genes, at least about 20 genes, at least about 50 genes, or at least about 100 genes.
  • the plurality of genes are endogenous genes.
  • the plurality of genes comprises from about 5,000 genes to about 6,000 genes.
  • the plurality of genes comprises from about 19,000 genes to about 20,000 genes.
  • a system comprising one or more computer processors and computer memory coupled thereto, the computer memory comprising machine executable code that, upon execution by the one or more computer processors, analyzes a biological sample to identify genes having spatial correlations with one another, the system comprising: (a) a software module configured to retrieve, from a database: a location data indicative of relative positions of the plurality of cells in a multi-dimensional image of the biological sample; and a transcriptomic data of a plurality of genes of the plurality of cells; (b) a software module configured to process at least the transcriptomic data to generate a gene expression matrix characterizing each cell of the plurality of cells by a gene expression level of the plurality of genes; (c) a software module configured to analyze the location data and the transcriptomic data to generate an environment confounder matrix, the environment confounder matrix characterizes each cell of the plurality of cells by one or more environment
  • module (d) of the system further comprises software configured to determine a degree of the correlation.
  • module (e) of the system comprises software configured to identify the at least one subset based on the degree of correlation.
  • the identifying in (e) is based at least in part on determining a threshold level of the degree of correlation.
  • the determining of the correlation in (d) comprises analyzing covariance of the gene expression matrix conditional on the environment confounder matrix.
  • the system further comprises, in (d), a software configured to generate a conditional correlation matrix of the plurality of genes based on the covariance, the conditional correlation matrix is different from the gene expression matrix.
  • the at least one subset comprises a plurality of subsets that are different from one another.
  • the plurality of subsets comprises a first subset having a first plurality of genes and a second subset having a second plurality of genes, the first plurality of genes has at least one gene that is not in common with the second plurality of genes.
  • the gene expression matrix comprises an environment expression matrix, the environment expression matrix characterizes a cell of the plurality of cells by analyzing the gene expression level of the plurality of genes of nearest neighboring cells of the cell within the multi-dimensional image, and the system comprises, in (b), a software configured to process the transcriptomic data and the location data to generate the environment expression matrix.
  • a number of the nearest neighboring cells is at most about 1,000, at most about 500, at most about 100, or at most about 50.
  • the system further comprises a software configured to display, via a graphical user interface, the genes of the at least one subset to a user.
  • the system further comprises, based on the analyzing in (d), a software configured to generate a gene cluster map comprising a plurality of shapes representing the plurality of genes, the plurality of shapes is arranged in a plurality of clusters, a cluster of the plurality of clusters corresponds to the at least one subset.
  • the system further comprises a software configured to display, via a graphical user interface, the gene cluster map to a user.
  • the multi-dimensional image is a two-dimensional image.
  • the system further comprises, subsequent to (e), a software configured to score each cell of the plurality of cells based on single cell expression level of the genes of the at least one subset.
  • the system further comprises a software configured to generate an additional multi-dimensional image of the biological sample based on the scoring.
  • the system further comprises, in (c), a software configured to receive selection of the one or more environment variables from a user via a graphical user interface.
  • the one or more environment variables are based on both (i) the cell classification in the at least the region and (ii) the measurement artifact of the transcriptomic data in the at least the region.
  • the measurement artifact comprises detection data via a synthetic control probe sequence.
  • the one or more environment variables are based at least in part on one or more of (i) a number of cells having a cell classification of interest in at least the region of the multi-dimensional image, (ii) a number of different cell classifications identified in at least the region of the multi-dimensional image, (iii) a ratio between numbers of cells of two different cell classifications of interest in at least the region of the multi-dimensional image, or (iv) a relative location between a cell having a cell classification of interest and a tissue substructure in at least a portion of the multi-dimensional image, or any combination thereof.
  • the region can be characterized by having at most about 5 cells, at most about 10 cells, at most about 20 cells, at most about 50 cells, or at most about 100 cells.
  • a number of the one or more environment variables in the environment confounder matrix is at least about 5, at least about 10, at least about 15, or at least about 20.
  • the cell classification comprises one or more of endothelial cells, epithelial cells, dermal cells, endodermal cells, mesodermal cells, fibroblasts, osteocytes, chondrocytes, immune cells, dendritic cells, hepatic cells, pancreatic cells, or stromal cells, or any combination thereof.
  • the cell classification comprises one or more of salivary gland mucous cells, salivary gland serous cells, von Ebner's gland cells, mammary gland cells, lacrimal gland cells, ceruminous gland cells, eccrine sweat gland dark cells, eccrine sweat gland clear cells, apocrine sweat gland cells, gland of Moll cells, sebaceous gland cells, Bowman's gland cells, Brunner's gland cells, seminal vesicle cells, prostate gland cells, bulbourethral gland cells, Bartholin's gland cells, gland of Littre cells, uterus endometrium cells, isolated goblet cells, stomach lining mucous cells, gastric gland zymogenic cells, gastric gland oxyntic cells, pancreatic acinar cells, Paneth cells, type II pneumocytes, Clara cells, somatotropes, lactotropes, thyrotropes, gonadotropes, corticotropes, intermediate pituitary cells, magnocellular
  • the cell classification comprises one or more of embryonic stem cells, embryonic germ cells, induced pluripotent stem cells, mesenchymal stem cells, bone marrow-derived mesenchymal stem cells, bone marrow-derived mesenchymal stromal cells, tissue plastic-adherent placental stem cells (PDACs), umbilical cord stem cells, amniotic fluid stem cells, amnion derived adherent cells (AMDACs), osteogenic placental adherent cells (OPACs), adipose stem cells, limbal stem cells, dental pulp stem cells, myoblasts, endothelial progenitor cells, neuronal stem cells, exfoliated teeth derived stem cells, hair follicle stem cells, dermal stem cells, parthenogenically derived stem cells, reprogrammed stem cells, amnion derived adherent cells, or side population stem cells, or any combination thereof.
  • the transcriptomic data comprises one or more of gene expression assays with fluorescently labeled probes, RNA sequencing (RNA-seq), microarray analysis, reverse transcription polymerase chain reaction (RT-PCR), cap analysis of gene expression, or single-cell RNA sequencing (scRNA-seq), or any combination thereof.
  • the plurality of cells comprises at least about 100 cells, at least about 200 cells, at least about 500 cells, or at least about 1,000 cells.
  • the plurality of genes comprises at least about 10 genes, at least about 20 genes, at least about 50 genes, or at least about 100 genes.
  • the plurality of genes are endogenous genes.
  • the plurality of genes comprises from about 5,000 genes to about 6,000 genes.
  • the plurality of genes comprises about 19,000 genes.
  • the method and system described herein provide means for quickly identifying spatial correlations meriting attention.
  • the method and system described herein provide means for identifying gene modules with spatial correlations that cannot be explained by trivial factors like the cell type landscape or technical effects. It typically discovers dozens of such modules.
  • the method and system described herein provide means to implicate cell types in module activity and to describe module spatial patterns.
  • the method and system described herein provide a powerful and convenient way to quickly identify spatial transcriptomics trends that deserve scarce analyst attention.
  • Fig. 1A shows a non-limiting example of cell type map of a colon cancer.
  • Fig. 1B shows a non-limiting example of a cell’s nearest neighbors.
  • Fig. 1C shows a non-limiting example of subset of the environment expression matrix.
  • Fig. 1D shows a non-limiting example of raw correlation matrix of the environment expression matrix showing near-ubiquitous correlations.
  • Fig. 1E shows a non-limiting example of subset of the environment confounding matrix, encoding cell type abundance and other confounding variables in each cell’s neighborhood.
  • Fig. 1F shows a non-limiting example of correlation matrix of the environment matrix conditional on the confounding matrix, over the same subset of genes.
  • Fig. 1G shows a non-limiting example of raw vs. conditional correlation of environment gene expression. Selected pairs of marker genes are highlighted.
  • Fig. 1H shows a non-limiting example of network representation of correlation between all genes in all modules.
  • Fig. 1I shows a non-limiting example of environment scores for a “tumor-promoting inflammation” module.
  • Fig. 1J shows a non-limiting example of single-cell scores for the module.
  • Fig. 1K shows a non-limiting example of mRNA molecules of module genes.
  • Fig. 1L shows a non-limiting example of estimated involvement of each cell type in each module.
  • Fig. 1M shows a non-limiting example of estimated involvement of each cell type in each gene of the highlighted module.
  • Fig. 2A shows a non-limiting example of correlation structure of 51 ligands assigned to modules.
  • Fig. 2B shows a non-limiting example of involvement of each cell type in each module.
  • Figs. 2C-2E show non-limiting examples of environment expression of a module holding chemo-attractants (Fig. 2C), MHC2 antigen presentation genes (Fig. 2D), and MHC1 antigen presentation genes (Fig. 2E).
  • Fig. 2F shows a non-limiting example of conditional correlations of 555 ligand-receptor pairs.
  • Fig. 2G shows a non-limiting example of spatial map of single-cell expression of the ligand-receptor pair FCER2 & CR2.
  • Fig. 2H shows a non-limiting example of conditional correlation network around the FCER2-CR2 ligand-receptor pair.
  • Fig. 3A shows a non-limiting example of chart of workflow of SPARC.
  • Fig. 3B shows a non-limiting example of workflow for building the environment matrix and the conditioning matrix.
  • Fig. 3C shows a non-limiting example of workflow for deriving modules from conditional correlation matrix.
  • Fig. 3D shows a non-limiting example of workflow for calculating module scores and gene weights.
  • Fig. 3E shows a non-limiting example of workflow for scoring involvement of cell type.
  • Fig. 4 shows a non-limiting example of a computing device; in this case, a device with one or more processors, memory, storage, and a network interface.
  • Fig. 5 shows a non-limiting example of a web/mobile application provision system; in this case, a system providing browser-based and/or native mobile user interfaces.
  • Fig. 6 shows a non-limiting example of a cloud-based web/mobile application provision system; in this case, a system comprising an elastically load balanced, auto-scaling web server and application server resources as well as synchronously replicated databases.
  • the term “about” in some cases refers to an amount that is approximately the stated amount, in some cases near the stated amount by 10%, 5%, or 1%, including increments therein, and in some cases, in reference to a percentage, refers to an amount that is greater or less than the stated percentage by 10%, 5%, or 1%, including increments therein.
  • each of the expressions “at least one of A, B and C,” “at least one of A, B, or C,” “one or more of A, B, and C”, “one or more of A, B, or C” and “A, B, and/or C” means A alone, B alone, C alone, A and B together, A and C together, B and C together, or A, B and C together.
  • a method for analyzing a biological sample to identify genes having spatial correlations with one another comprising: (a) retrieving, by a computer processor and from a database: a location data indicative of relative positions of the plurality of cells in a multi-dimensional image of the biological sample; and a transcriptomic data of a plurality of genes of the plurality of cells; (b) processing, by the computer processor, at least the transcriptomic data to generate a gene expression matrix characterizing each cell of the plurality of cells by a gene expression level of the plurality of genes; (c) analyzing, by the computer processor, the location data and the transcriptomic data to generate an environment confounder matrix, the environment confounder matrix characterizes each cell of the plurality of cells by one or more environment variables of a region adjacent to or surrounding the each cell in the multi-dimensional image, the one or more environment variables are based at least in part on (i) cell classification of one or more cells in at least the region or (ii) measurement artifact
  • step (d) of the method further can comprise determining a degree of the correlation.
  • step (e) of the method can comprise identifying the at least one subset based on the degree of correlation.
  • the identifying in (e) can be based at least in part on determining a threshold level of the degree of correlation.
  • the determining the correlation in (d) can comprise analyzing covariance of the gene expression matrix conditional on the environment confounder matrix.
  • the method further can comprise, in (d), generating a conditional correlation matrix of the plurality of genes based on the covariance, the conditional correlation matrix can be different from the gene expression matrix.
  • the at least one subset can comprise a plurality of subsets that are different from one another.
  • the plurality of subsets can comprise a first subset having a first plurality of genes and a second subset having a second plurality of genes, wherein the first plurality of genes can have at least one gene that is not in common with the second plurality of genes.
  • the gene expression matrix can comprise an environment expression matrix, the environment expression matrix characterizes a cell of the plurality of cells by analyzing the gene expression level of the plurality of genes of nearest neighboring cells of the cell within the multi-dimensional image, and the method can comprise, in (b), processing the transcriptomic data and the location data to generate the environment expression matrix.
  • a number of the nearest neighboring cells can be at most about 1,000, at most about 500, at most about 100, or at most about 50. In some cases, a number of nearest neighboring cells can be more than 1,000 cells. In some cases, a number of nearest neighboring cells can be about 1 cell, 2 cells, 3 cells, 4 cells, 5 cells, 6 cells, 7 cells, 8 cells, 9 cells, 10 cells, 15 cells, 20 cells, 25 cells, 30 cells, 35 cells, 40 cells, 45 cells, or about 50 cells. In some cases, a number of nearest neighboring cells can be about 55 cells, 60 cells, 65 cells, 70 cells, 75 cells, 80 cells, 85 cells, 90 cells, 95 cells, or about 100 cells.
  • a number of nearest neighboring cells can be about 150 cells, 200 cells, 250 cells, 300 cells, 350 cells, 400 cells, 450 cells, 500 cells, 550 cells, 600 cells, 650 cells, 700 cells, 750 cells, 800 cells, 850 cells, 900 cells, 950 cells, or about 1,000 cells.
  • the method further can comprise displaying, via a graphical user interface, the genes of the at least one subset to a user.
  • the method further can comprise, based on the analyzing in (d), generating a gene cluster map comprising a plurality of shapes representing the plurality of genes, the plurality of shapes can be arranged in a plurality of clusters, a cluster of the plurality of clusters corresponds to the at least one subset.
  • the method further can comprise displaying, via a graphical user interface, the gene cluster map to a user.
  • the multi-dimensional image can be a two-dimensional image.
  • the method further can comprise, subsequent to (e), scoring each cell of the plurality of cells based on single cell expression level of the genes of the at least one subset.
  • the method further can comprise generating an additional multi-dimensional image of the biological sample based on the scoring.
  • the method further can comprise, in (c), receiving selection of the one or more environment variables from a user via a graphical user interface.
  • the one or more environment variables are based on both (i) the cell classification in the at least the region and (ii) the measurement artifact of the transcriptomic data in the at least the region.
  • the measurement artifact can comprise detection data via a synthetic control probe sequence.
  • the one or more environment variables are based at least in part on one or more of (i) a number of cells having a cell classification of interest in at least the region of the multi-dimensional image, (ii) a number of different cell classifications identified in at least the region of the multi-dimensional image, (iii) a ratio between numbers of cells of two different cell classifications of interest in at least the region of the multi-dimensional image, or (iv) a relative location between a cell having a cell classification of interest and a tissue substructure in at least a portion of the multi-dimensional image, or any combination thereof.
  • the region can be characterized by having at most about 5 cells, at most about 10 cells, at most about 20 cells, at most about 50 cells, or at most about 100 cells. In some cases, the region can be characterized by having about 1 cell, 2 cells, 3 cells, 4 cells, 5 cells, 6 cells, 7 cells, 8 cells, 9 cells, 10 cells, 15 cells, 20 cells, 25 cells, 30 cells, 35 cells, 40 cells, 45 cells, or about 50 cells. In some cases, the region can be characterized by having about 55 cells, 60 cells, 65 cells, 70 cells, 75 cells, 80 cells, 85 cells, 90 cells, 95 cells, or about 100 cells.
  • a number of the one or more environment variables in the environment confounder matrix can be at least about 5, at least about 10, at least about 15, or at least about 20. In some cases, a number of the one or more environment variables in the environment confounder matrix can be at least about 2, at least about 3, at least about 4, at least about 5, at least about 6, at least about 7, at least about 8, at least about 9, at least about 10, at least about 11, at least about 12, at least about 13, at least about 14, at least about 15, at least about 16, at least about 17, at least about 18, at least about 19, at least about 20, at least about 21, at least about 22, at least about 23, at least about 24, at least about 25, at least about 26, at least about 27, at least about 28, at least about 29, at least about 30, at least about 31, at least about 32, at least about 33, at least about 34, at least about 35, at least about 36, at least about 37, at least about 38, at least about 39, at least about 40, at least about 41, at least about 42, at least about 43, at least about 44, at least about 45, at least about
  • the cell classification can comprise one or more of endothelial cells, epithelial cells, dermal cells, endodermal cells, mesodermal cells, fibroblasts, osteocytes, chondrocytes, immune cells, dendritic cells, hepatic cells, pancreatic cells, or stromal cells, or any combination thereof.
  • the cell classification can comprise one or more of salivary gland mucous cells, salivary gland serous cells, von Ebner's gland cells, mammary gland cells, lacrimal gland cells, ceruminous gland cells, eccrine sweat gland dark cells, eccrine sweat gland clear cells, apocrine sweat gland cells, gland of Moll cells, sebaceous gland cells, Bowman's gland cells, Brunner's gland cells, seminal vesicle cells, prostate gland cells, bulbourethral gland cells, Bartholin's gland cells, gland of Littre cells, uterus endometrium cells, isolated goblet cells, stomach lining mucous cells, gastric gland zymogenic cells, gastric gland oxyntic cells, pancreatic acinar cells, Paneth cells, type II pneumocytes, Clara cells, somatotropes, lactotropes, thyrotropes, gonadotropes, corticotropes, intermediate pituitary cells, magno
  • the cell classification can comprise one or more of embryonic stem cells, embryonic germ cells, induced pluripotent stem cells, mesenchymal stem cells, bone marrow-derived mesenchymal stem cells, bone marrow-derived mesenchymal stromal cells, tissue plastic-adherent placental stem cells (PDACs), umbilical cord stem cells, amniotic fluid stem cells, amnion derived adherent cells (AMDACs), osteogenic placental adherent cells (OPACs), adipose stem cells, limbal stem cells, dental pulp stem cells, myoblasts, endothelial progenitor cells, neuronal stem cells, exfoliated teeth derived stem cells, hair follicle stem cells, dermal stem cells, parthenogenically derived stem cells, reprogrammed stem cells, amnion derived adherent cells, or side population stem cells, or any combination thereof .
  • the transcriptomic data can be selected from data comprising one or more of gene expression assays with fluorescently labeled probes, RNA sequencing (RNA-seq), microarray analysis, reverse transcription polymerase chain reaction (RT-PCR), cap analysis of gene expression, or single-cell RNA sequencing (scRNA-seq), or any combination thereof. In some aspects, the plurality of cells can comprise at least about 100 cells, at least about 200 cells, at least about 500 cells, or at least about 1,000 cells. In some aspects, the plurality of genes can comprise at least about 10 genes, at least about 20 genes, at least about 50 genes, or at least about 100 genes.
  • the plurality of genes can comprise at least about 150 genes, at least about 200 genes, at least about 250 genes, at least about 300 genes, at least about 350 genes, at least about 400 genes, at least about 450 genes, at least about 500 genes, at least about 550 genes, at least about 600 genes, at least about 650 genes, at least about 700 genes, at least about 750 genes, at least about 800 genes, at least about 850 genes, at least about 900 genes, at least about 950 genes, at least about 1,000 genes, at least about 1,050 genes, at least about 1,100 genes, at least about 1,150 genes, at least about 1,200 genes, at least about 1,250 genes, at least about 1,300 genes, at least about 1,350 genes, at least about 1,400 genes, at least about 1,450 genes, at least about 1,500 genes, at least about 1,550 genes, at least about 1,600 genes, at least about 1,650 genes, at least about 1,700 genes, at least about 1,750 genes, at least about 1,800 genes, at least about 1,850 genes,
  • the plurality of genes can comprise from about 5,000 genes to about 6,000 genes. In some aspects, the plurality of genes can comprise at least about 5,100 genes, at least about 5,200 genes, at least about 5,300 genes, at least about 5,400 genes, at least about 5,500 genes, at least about 5,600 genes, at least about 5,700 genes, at least about 5,800 genes, at least about 5,900 genes, at least about 6,000 genes, at least about 6,100 genes, at least about 6,200 genes, at least about 6,300 genes, at least about 6,400 genes, at least about 6,500 genes, at least about 6,600 genes, at least about 6,700 genes, at least about 6,800 genes, at least about 6,900 genes, at least about 7,000 genes, at least about 7,100 genes, at least about 7,200 genes, at least about 7,300 genes, at least about 7,400 genes, at least about
  • the plurality of genes can comprise about 19,000 genes.
  • the plurality of genes can comprise at least about 10,000 genes, at least about 10,500 genes, at least about 11,000 genes, at least about 11,500 genes, at least about 12,000 genes, at least about 12,500 genes, at least about 13,000 genes, at least about 13,500 genes, at least about 14,000 genes, at least about 14,500 genes, at least about 15,000 genes, at least about 15,500 genes, at least about 16,000 genes, at least about 16,500 genes, at least about 17,000 genes, at least about 17,500 genes, at least about 18,000 genes, at least about 18,500 genes, at least about 19,000 genes, at least about 19,500 genes, or at least about 20,000 genes.
  • the plurality of genes are endogenous genes.
  • a system comprising one or more computer processors and computer memory coupled thereto, the computer memory comprising a machine executable code that, upon execution by the one or more computer processors, for analyzing a biological sample to identify genes having spatial correlations with one another, comprising: (a) a software module configured to retrieve, from a database: a location data indicative of relative positions of the plurality of cells in a multi-dimensional image of the biological sample; and a transcriptomic data of a plurality of genes of the plurality of cells; (b) a software module configured to process at least the transcriptomic data to generate a gene expression matrix characterizing each cell of the plurality of cells by a gene expression level of the plurality of genes; (c) a software module configured to analyze the location data and the transcriptomic data to generate an environment confounder matrix, the environment confounder matrix characterizes each cell of the plurality of cells by one or more environment variables of a region adjacent to or surrounding each cell in the multi-dimensional image
  • the system comprising (d) further can comprise a software configured to determine a degree of the correlation
  • method comprising (e) can comprise a software configured to identify the at least one subset based on the degree of correlation.
  • to identify in (e) can be based at least in part on determining a threshold level of the degree of correlation.
  • to determine the correlation in (d) can comprise analyzing covariance of the gene expression matrix conditional on the environment confounder matrix.
  • the system further can comprise, in (d), a software configured to generate a conditional correlation matrix of the plurality of genes based on the covariance, the conditional correlation matrix can be different from the gene expression matrix.
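One way to picture the covariance analysis conditional on the environment confounder matrix is a partial-correlation computation: regress each gene on the environment variables and correlate the residuals. The sketch below assumes this ordinary least-squares approach, a cells-by-genes matrix orientation, and the hypothetical function name `conditional_correlation`; the source does not specify the estimator.

```python
import numpy as np

def conditional_correlation(expr, confounders):
    """Gene-gene correlation after removing linear effects of the confounders.

    expr:        (n_cells, n_genes) gene expression matrix (orientation assumed)
    confounders: (n_cells, n_vars) environment confounder matrix
    """
    n_cells = expr.shape[0]
    # Design matrix with an intercept column plus the environment variables.
    design = np.column_stack([np.ones(n_cells), confounders])
    # Least-squares fit of every gene on the confounders at once.
    coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
    resid = expr - design @ coef
    # Covariance of the residuals, then rescale to a correlation matrix.
    cov = resid.T @ resid / (n_cells - design.shape[1])
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

# Toy data: two genes that co-vary only because they share one confounder.
rng = np.random.default_rng(0)
env = rng.normal(size=(500, 2))
gene0 = env[:, 0] + 0.1 * rng.normal(size=500)
gene1 = env[:, 0] + 0.1 * rng.normal(size=500)
expr = np.column_stack([gene0, gene1])

raw = np.corrcoef(expr, rowvar=False)       # marginal correlation
cond = conditional_correlation(expr, env)   # correlation given the confounders
```

In this toy case the marginal correlation between the two genes is high, while the conditional correlation is close to zero, because the shared environment variable explains their co-variation.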
  • the at least one subset can comprise a plurality of subsets that are different from one another.
  • the plurality of subsets can comprise a first subset having a first plurality of genes and a second subset having a second plurality of genes, the first plurality of genes has at least one gene not in common with the second plurality of genes.
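Identifying subsets from a threshold on the degree of correlation can be illustrated with a simple graph approach: link two genes when their correlation meets the threshold and report each connected group of two or more genes. This is a minimal sketch under that assumption; the function name, the threshold value, and the use of connected components are illustrative choices, not the method stated in the source.

```python
import numpy as np

def correlated_subsets(corr, gene_names, threshold=0.5):
    """Group genes whose pairwise correlation is at or above `threshold`."""
    n = corr.shape[0]
    adjacent = corr >= threshold
    unvisited = set(range(n))
    subsets = []
    while unvisited:
        start = unvisited.pop()
        component, stack = {start}, [start]
        # Depth-first walk over genes linked by above-threshold correlations.
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adjacent[i]):
                j = int(j)
                if j in unvisited:
                    unvisited.remove(j)
                    component.add(j)
                    stack.append(j)
        if len(component) > 1:  # a single gene is not reported as a subset
            subsets.append(sorted(gene_names[k] for k in component))
    return sorted(subsets)

# Hypothetical conditional correlation matrix for four placeholder genes.
corr = np.array([
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.7],
    [0.0, 0.1, 0.7, 1.0],
])
genes = ["geneA", "geneB", "geneC", "geneD"]
modules = correlated_subsets(corr, genes, threshold=0.5)
```

Here the two resulting subsets have no genes in common, which satisfies the condition that the first plurality has at least one gene not shared with the second.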
  • the gene expression matrix can comprise an environment expression matrix, the environment expression matrix characterizes a cell of the plurality of cells by analyzing the gene expression level of the plurality of genes of nearest neighboring cells of the cell within the multi-dimensional image, and the system can comprise, in (b), a software configured to process the transcriptomic data and the location data to generate the environment expression matrix.
  • a number of the nearest neighboring cells can be at most about 1,000, at most about 500, at most about 100, or at most about 50.
  • the system further can comprise a software configured to display, via a graphical user interface, the genes of the at least one subset to a user.
  • the system further can comprise, based on the analyzing in (d), a software configured to generate a gene cluster map comprising a plurality of shapes representing the plurality of genes, the plurality of shapes can be arranged in a plurality of clusters, a cluster of the plurality of clusters corresponds to the at least one subset.
  • the system further can comprise a software configured to display, via a graphical user interface, the gene cluster map to a user.
  • the multi-dimensional image can be a two-dimensional image.
  • the system further can comprise, subsequent to (e), a software configured to score each cell of the plurality of cells based on single cell expression level of the genes of the at least one subset.
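Scoring each cell on a gene subset could be as simple as averaging the single-cell expression of the subset's genes. The sketch below assumes a mean score, a cells-by-genes matrix, and placeholder gene names; the source does not say which summary statistic is used.

```python
import numpy as np

def score_cells(expr, gene_names, subset):
    """Mean expression of the subset's genes in each cell (assumed scoring rule)."""
    idx = [gene_names.index(g) for g in subset]
    return expr[:, idx].mean(axis=1)

# Two cells by three genes; names are placeholders.
expr = np.array([[2.0, 4.0, 0.0],
                 [0.0, 0.0, 5.0]])
genes = ["geneA", "geneB", "geneC"]
scores = score_cells(expr, genes, ["geneA", "geneB"])
```

The resulting per-cell scores can then be mapped back onto the cell positions to render an additional image of the sample.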
  • the system further can comprise a software configured to generate an additional multi-dimensional image of the biological sample based on the scoring.
  • the system further can comprise, in (c), a software configured to receive selection of the one or more environment variables from a user via a graphical user interface.
  • the one or more environment variables are based on both (i) the cell classification in the at least the region and (ii) the measurement artifact of the transcriptomic data in the at least the region.
  • the measurement artifact can comprise detection data via a synthetic control probe sequence.
  • the one or more environment variables are based at least in part on one or more of (i) a number of cells having a cell classification of interest in at least the region of the multidimensional image, (ii) a number of different cell classifications identified in at least the region of the multi-dimensional image, (iii) a ratio between numbers of cells of two different cell classifications of interest in at least the region of the multi-dimensional image, or (iv) a relative location between a cell having a cell classification of interest and a tissue substructure in at least a portion of the multi-dimensional image, or any combination thereof.
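Options (i) through (iii) can be computed directly from the classifications of the cells in a region. The sketch below assumes the region is given as a plain list of neighbor labels; the function name, dictionary keys, and cell-type labels are hypothetical.

```python
from collections import Counter

def environment_variables(neighbor_labels, type_a, type_b):
    """Summarize one cell's neighborhood from its neighbors' classifications."""
    counts = Counter(neighbor_labels)
    n_a, n_b = counts[type_a], counts[type_b]
    return {
        "count_a": n_a,                  # (i) number of cells of a type of interest
        "n_classes": len(counts),        # (ii) number of distinct classifications
        # (iii) ratio between two types; undefined when type_b is absent
        "ratio_a_to_b": n_a / n_b if n_b else float("nan"),
    }

neighbors = ["T-cell", "T-cell", "fibroblast", "macrophage", "T-cell", "fibroblast"]
env = environment_variables(neighbors, "T-cell", "fibroblast")
```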
  • the region can be characterized by having at most about 5 cells, at most about 10 cells, at most about 20 cells, at most about 50 cells, or at most about 100 cells.
  • a number of the one or more environment variables in the environment confounder matrix can be at least about 5, at least about 10, at least about 15, or at least about 20.
  • the cell classification can comprise one or more of endothelial cells, epithelial cells, dermal cells, endodermal cells, mesodermal cells, fibroblasts, osteocytes, chondrocytes, immune cells, dendritic cells, hepatic cells, pancreatic cells, or stromal cells, or any combination thereof.
  • the cell classification can comprise one or more of salivary gland mucous cells, salivary gland serous cells, von Ebner's gland cells, mammary gland cells, lacrimal gland cells, ceruminous gland cells, eccrine sweat gland dark cells, eccrine sweat gland clear cells, apocrine sweat gland cells, gland of Moll cells, sebaceous gland cells, Bowman's gland cells, Brunner's gland cells, seminal vesicle cells, prostate gland cells, bulbourethral gland cells, Bartholin's gland cells, gland of Littre cells, uterus endometrium cells, isolated goblet cells, stomach lining mucous cells, gastric gland zymogenic cells, gastric gland oxyntic cells, pancreatic acinar cells, Paneth cells, type II pneumocytes, Clara cells, somatotropes, lactotropes, thyrotropes, gonadotropes, corticotropes, intermediate pituitary cells, magno
  • the cell classification can comprise one or more of embryonic stem cells, embryonic germ cells, induced pluripotent stem cells, mesenchymal stem cells, bone marrow-derived mesenchymal stem cells, bone marrow-derived mesenchymal stromal cells, tissue plastic-adherent placental stem cells (PDACs), umbilical cord stem cells, amniotic fluid stem cells, amnion derived adherent cells (AMDACs), osteogenic placental adherent cells (OPACs), adipose stem cells, limbal stem cells, dental pulp stem cells, myoblasts, endothelial progenitor cells, neuronal stem cells, exfoliated teeth derived stem cells, hair follicle stem cells, dermal stem cells, parthenogenically derived stem cells, reprogrammed stem cells, amnion derived adherent cells, or side population stem cells, or any combination thereof.
  • the plurality of genes can comprise at least about 2,100 genes, at least about 2,200 genes, at least about 2,300 genes, at least about 2,400 genes, at least about 2,500 genes, at least about 2,600 genes, at least about 2,700 genes, at least about 2,800 genes, at least about 2,900 genes, at least about 3,000 genes, at least about 3,100 genes, at least about 3,200 genes, at least about 3,300 genes, at least about 3,400 genes, at least about 3,500 genes, at least about 3,600 genes, at least about 3,700 genes, at least about 3,800 genes, at least about 3,900 genes, at least about 4,000 genes, at least about 4,100 genes, at least about 4,200 genes, at least about 4,300 genes, at least about 4,400 genes, at least about 4,500 genes, at least about 4,600 genes, at least about 4,700 genes, at least about 4,800 genes, at least about 4,900 genes, or at least about 5,000 genes. In some aspects, the plurality of genes can comprise from about 5,000 genes to about 6,000 genes. In some aspects, the plurality of genes can comprise at least about
  • the plurality of genes can comprise about 19,000 genes.
  • the plurality of genes can comprise at least about 10,000 genes, at least about 10,500 genes, at least about 11,000 genes, at least about 11,500 genes, at least about 12,000 genes, at least about 12,500 genes, at least about 13,000 genes, at least about 13,500 genes, at least about 14,000 genes, at least about 14,500 genes, at least about 15,000 genes, at least about 15,500 genes, at least about 16,000 genes, at least about 16,500 genes, at least about 17,000 genes, at least about 17,500 genes, at least about 18,000 genes, at least about 18,500 genes, at least about 19,000 genes, at least about 19,500 genes, or at least about 20,000 genes.
  • the plurality of genes are endogenous genes.
  • Method and system described herein for analyzing a biological sample to identify genes having spatial correlations with one another comprise retrieving, by a computer processor and from a database: a location data indicative of relative positions of a plurality of cells in a multi-dimensional image of the biological sample; and a transcriptomic data of a plurality of genes of the plurality of cells.
  • the biological sample can be obtained at least in part by one or more of biopsy collection, surgical resection, xenograft, animal model, fine needle aspiration, peripheral blood collection, bone marrow biopsy, healthy tissue sampling, neoplastic tissue sampling, malignant tissue sampling, diseased tissue sampling, and implanted tissue sampling.
  • the biological sample can comprise cells or tissues.
  • the cells comprise primary cells, stem cells, immune cells, carcinoma cells, sarcoma cells, lymphoma cells, melanoma cells, cancer cells, or neoplastic cells.
  • the cells comprise germ cell tumor cells, blastoma cells, bladder cancer cells, breast cancer cells, colon cancer cells, colorectal cancer cells, endocrine tumor cells, esophageal cancer cells, glioblastoma cells, Hodgkin lymphoma cells, lung cancer cells, melanoma cells, or prostate cancer cells.
  • the cells comprise endothelial cells, epithelial cells, dermal cells, endodermal cells, mesodermal cells, fibroblasts, osteocytes, chondrocytes, immune cells, dendritic cells, hepatic cells, pancreatic cells, or stromal cells.
  • the image of biological sample can comprise a plurality of cells.
  • the plurality of cells can comprise at least about 10 cells, at least about 20 cells, at least about 30 cells, at least about 50 cells, at least about 100 cells, at least about 200 cells, at least about 300 cells, at least about 400 cells, at least about 500 cells, at least about 600 cells, at least about 700 cells, at least about 800 cells, at least about 900 cells, at least about 1000 cells, at least about 1200 cells, at least about 1400 cells, at least about 1600 cells, at least about 1800 cells, or at least about 2000 cells.
  • Method and system described herein comprise retrieving transcriptomic data of a plurality of genes of the plurality of cells.
  • Transcriptomic data may be used to analyze the expression levels of genes or RNA molecules, such as messenger RNA (mRNA) or non-coding RNA, in a particular sample. Differential gene expression patterns or alternative splicing events may serve as biomarkers for specific diseases or physiological conditions.
  • Transcriptomic probes may be identified through techniques like RNA sequencing (RNA-seq) or microarray analysis.
  • Transcriptomic data encompass biomarkers derived from the analysis of gene expression patterns, RNA molecules, and other transcriptomic data.
  • biomarkers derived from transcriptomic features comprise differential gene expression, alternative splicing patterns, fusion genes and chimeric transcripts, non-coding RNA biomarkers, gene expression signatures, or regulatory networks and pathways.
  • transcriptomic data can comprise data from gene expression assays with fluorescently labeled probes, RNA sequencing (RNA-seq), microarray analysis, reverse transcription polymerase chain reaction (RT-PCR), cap analysis of gene expression, or single-cell RNA sequencing (scRNA-seq).
  • transcriptomic data can comprise data from imaging transcriptomics or spot-based spatial assays.
  • data from imaging transcriptomics can comprise data from CosMx Spatial Molecular Imager (SMI), Xenium, or MERscope.
  • data from spot-based spatial assays may comprise data from Visium, Slide-seq or GeoMx Digital Spatial Profiler (DSP).
  • transcriptomic data may comprise measurement artifact.
  • the measurement artifact can comprise detection data via a synthetic negative control probe sequence.
  • a synthetic control probe can be a reference or control RNA molecule that can be artificially synthesized and added to biological samples during the experimental process. The purpose of synthetic control probes may be to serve as internal standards for monitoring the quality and performance of various steps in the transcriptomic analysis pipeline, particularly during RNA sequencing experiments. Synthetic control probes may be designed to have known sequences and concentrations. By introducing known sequences into the experimental samples, the quality and reliability of the experimental process may be assessed. Any deviations from the expected results may be indicative of issues such as RNA degradation, contamination, or technical biases.
  • Synthetic control probes may be used as a normalization reference, helping to correct for differences in sequencing depth, library preparation efficiency, and other technical factors that can introduce bias into the data. Synthetic control probes may be designed to mimic the characteristics of endogenous transcripts, such as their length, GC content, and secondary structure. The detection limits of experimental setup may be assessed. If synthetic control probes are reliably detected at low concentrations, it may indicate the sensitivity of the assay. By introducing a known quantity of synthetic RNA into the samples, the measurements and expression values may be calibrated, making them more comparable across different experiments and platforms.
  • negative control probes for “ERCC” sequences may be used. These probes bind nothing in any known genome and may be used to measure background in the system. In some embodiments, negative control barcodes or “false-codes” may be used. The barcode sequences may not be generated by any physical probe and may perform as a component of background.
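Because negative control probes bind nothing in the genome, their counts give a direct estimate of background. A minimal sketch, assuming a cells-by-probes count matrix and a boolean mask marking the negative probes (both hypothetical inputs), takes the mean negative-probe count per cell as that cell's background level:

```python
import numpy as np

def background_per_cell(counts, is_negative_probe):
    """Mean count over negative control probes for each cell.

    counts:            (n_cells, n_probes) raw counts (orientation assumed)
    is_negative_probe: boolean sequence of length n_probes
    """
    mask = np.asarray(is_negative_probe, dtype=bool)
    return counts[:, mask].mean(axis=1)

# Two cells, two target probes followed by two negative control probes.
counts = np.array([[10, 20, 1, 3],
                   [5, 0, 0, 2]])
neg = [False, False, True, True]
bg = background_per_cell(counts, neg)
```

A per-cell or per-neighborhood background value of this kind is one candidate for the measurement-artifact environment variable described in this document.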
  • the plurality of genes can comprise endogenous genes or exogenous genes.
  • endogenous genes comprise protein-coding genes, ribosomal RNA (rRNA) genes, transfer RNA (tRNA) genes, small nuclear RNA (snRNA) genes, small nucleolar RNA (snoRNA) genes, microRNA (miRNA) genes, circular RNA (circRNA) genes, transfer-messenger RNA (tmRNA) genes, pseudogenes, transposable elements (TEs), immunoglobulin and T-cell receptor genes, or histone and other chromatin-related genes.
  • the method and system disclosed herein comprise retrieving a location data indicative of relative positions in a multi-dimensional image.
  • the method and system disclosed herein comprise retrieving the location data from multiple images depicting various cells or tissues obtained from a biological sample.
  • the image may refer to a plurality of images representing different cells or tissues derived from a biological sample. These images may be generated through various imaging techniques, such as microscopy, histology, immunohistochemistry, or other relevant methods used in the field of biological sample analysis.
  • the images comprise microscope-derived images, comprising images from optical microscopes, electron microscopes, or scanning probe microscopes.
  • the multi-dimensional image can be a two-dimensional (2D) image, a three-dimensional (3D) image, or a 2D projection of a 3D image.
  • the method and system disclosed herein comprise retrieving the location data from at least 1 image, at least 2 images, at least 3 images, at least 5 images, at least 10 images, at least 15 images, at least 20 images, at least 30 images, at least 35 images, at least 40 images, at least 45 images, at least 50 images, at least 55 images, at least 60 images, at least 65 images, at least 70 images, at least 80 images, at least 90 images, at least 100 images, at least 120 images, at least 150 images, at least 200 images or more of the biological sample.
  • the method and system disclosed herein may comprise processing, by the computer processor, at least the transcriptomic data to generate a gene expression matrix characterizing each cell of the plurality of cells by a gene expression level of the plurality of genes.
  • the plurality of genes can comprise at least about 2 genes, at least about 5 genes, at least about 10 genes, at least about 20 genes, at least about 30 genes, at least about 40 genes, at least about 50 genes, at least about 60 genes, at least about 70 genes, at least about 80 genes, at least about 90 genes, at least about 100 genes, at least about 120 genes, at least about 150 genes, at least about 200 genes, at least about 300 genes, at least about 400 genes, or at least about 500 genes.
  • the plurality of genes can comprise at least about 150 genes, at least about 200 genes, at least about 250 genes, at least about 300 genes, at least about 350 genes, at least about 400 genes, at least about 450 genes, at least about 500 genes, at least about 550 genes, at least about 600 genes, at least about 650 genes, at least about 700 genes, at least about 750 genes, at least about 800 genes, at least about 850 genes, at least about 900 genes, at least about 950 genes, at least about 1,000 genes, at least about 1,050 genes, at least about 1,100 genes, at least about 1,150 genes, at least about 1,200 genes, at least about 1,250 genes, at least about 1,300 genes, at least about 1,350 genes, at least about 1,400 genes, at least about 1,450 genes, at least about 1,500 genes, at least about 1,550 genes, at least about 1,600 genes, at least about 1,650 genes, at least about 1,700 genes, at least about 1,750 genes, at least about 1,800 genes, at least about 1,850 genes,
  • 3,500 genes, at least about 3,600 genes, at least about 3,700 genes, at least about 3,800 genes, at least about 3,900 genes, at least about 4,000 genes, at least about 4,100 genes, at least about 4,200 genes, at least about 4,300 genes, at least about 4,400 genes, at least about
  • the plurality of genes can comprise from about 5,000 genes to about 6,000 genes.
  • the plurality of genes can comprise at least about 5,100 genes, at least about 5,200 genes, at least about 5,300 genes, at least about 5,400 genes, at least about 5,500 genes, at least about 5,600 genes, at least about 5,700 genes, at least about 5,800 genes, at least about 5,900 genes, at least about 6,000 genes, at least about 6,100 genes, at least about 6,200 genes, at least about
  • the plurality of genes can comprise about 19,000 genes.
  • a gene expression matrix can be a fundamental data structure to represent the expression levels of genes across cells in different biological samples. It is a tabular data format where rows correspond to genes and columns correspond to individual cells. Each entry in the matrix contains a numerical value that represents the expression level of a specific gene in a specific cell. In some embodiments, each row in the matrix corresponds to a specific gene. The genes are identified by their unique gene symbols or identifiers. In some embodiments, each column in the matrix represents an individual biological sample. Samples can be derived from various sources, such as different tissues, experimental conditions, time points, or individuals. In some embodiments, the values in the entries of the matrix represent the expression levels of genes in the corresponding cells.
  • These expression values may be quantified using different units, such as counts, reads per kilobase of transcript per million mapped reads (RPKM), fragments per kilobase of transcript per million mapped reads (FPKM), or transcripts per million (TPM). These values provide information about the relative abundance of each gene in each cell of the sample.
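As an illustration of one of these units, TPM can be derived from raw counts by dividing each gene's count by its transcript length and rescaling each column to sum to one million. The sketch below assumes a genes-by-columns count matrix and transcript lengths in kilobases; the function name and inputs are hypothetical, and length normalization of this kind applies mainly to sequencing-based counts.

```python
import numpy as np

def counts_to_tpm(counts, lengths_kb):
    """Convert raw counts to transcripts per million (TPM).

    counts:     (n_genes, n_columns) raw counts, one column per cell or sample
    lengths_kb: (n_genes,) transcript lengths in kilobases
    """
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_kb, dtype=float)
    # Reads per kilobase for every gene in every column.
    rate = counts / lengths_kb[:, None]
    # Rescale each column so its values sum to one million.
    return rate / rate.sum(axis=0, keepdims=True) * 1e6

# Two genes, one column: the longer gene has more reads but the same rate.
tpm = counts_to_tpm([[10], [20]], [1.0, 2.0])
```

A column whose counts are all zero would produce a division by zero here, so such columns would need to be filtered out beforehand.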
  • gene expression matrices may be used to compare the expression levels of genes between different cells or conditions to identify genes that are significantly upregulated or downregulated.
  • gene expression matrices are used for clustering analysis, where genes or cells with similar expression patterns are grouped together.
  • gene expression matrices may be used to construct coexpression networks that reveal interactions and relationships among genes. Visualization techniques, such as heatmaps and dendrograms, help reveal patterns and relationships in the data.
  • Principal Component Analysis (PCA) and Dimensionality Reduction may be used to reduce the dimensionality of the gene expression matrix to highlight the most important patterns in the data, aiding in visualization and interpretation.
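A minimal sketch of PCA on a gene expression matrix, computed with a singular value decomposition of the column-centered data. The cells-by-genes orientation, the function name, and the random toy data are assumptions for illustration.

```python
import numpy as np

def pca(expr, n_components=2):
    """PCA via SVD. expr: (n_cells, n_genes), orientation assumed."""
    centered = expr - expr.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]   # cell coordinates
    loadings = vt[:n_components]                      # gene weights per component
    explained = (s ** 2) / np.sum(s ** 2)             # variance fraction
    return scores, loadings, explained[:n_components]

# Toy matrix in which one gene has far more variance than the others.
rng = np.random.default_rng(0)
expr = rng.normal(size=(100, 5))
expr[:, 0] *= 10
scores, loadings, explained = pca(expr, n_components=2)
```

In this toy case the first component is dominated by the high-variance gene, so its loading on that gene is close to 1 in absolute value.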
  • the method further comprising displaying, via a graphical user interface, the genes of the at least one subset to a user.
  • gene expression matrices may be used as input data for machine learning algorithms to build predictive models, such as classifying samples based on their expression profiles.
  • the method and system disclosed herein may comprise analyzing the gene expression level of the plurality of genes of nearest neighboring cells of the cell.
  • a number of the nearest neighboring cells can be from about 5 to about 3000.
  • the number of the nearest neighboring cells can be from about 5 to about 3000, from about 5 to about 2500, from about 5 to about 2000, from about 5 to about 1500, from about 5 to about 1200, from about 5 to about 1000, from about 5 to about 800, from about 5 to about 600, from about 5 to about 500, from about 5 to about 400, from about 5 to about 300, from about 5 to about 200, from about 5 to about 100, from about 5 to about 50, or from about 5 to about 10.
  • the number of the nearest neighboring cells can be from about 10 to about 3000, from about 10 to about 2500, from about 10 to about 2000, from about 10 to about 1500, from about 10 to about 1200, from about 10 to about 1000, from about 10 to about 800, from about 10 to about 600, from about 10 to about 500, from about 10 to about 400, from about 10 to about 300, from about 10 to about 200, from about 10 to about 100, or from about 10 to about 50.
  • the number of the nearest neighboring cells can be from about 50 to about 3000, from about 50 to about 2500, from about 50 to about 2000, from about 50 to about 1500, from about 50 to about 1200, from about 50 to about 1000, from about 50 to about 800, from about 50 to about 600, from about 50 to about 500, from about 50 to about 400, from about 50 to about 300, from about 50 to about 200, or from about 50 to about 100.
  • the number of the nearest neighboring cells can be from about 100 to about 3000, from about 100 to about 2500, from about 100 to about 2000, from about 100 to about 1500, from about 100 to about 1200, from about 100 to about 1000, from about 100 to about 800, from about 100 to about 600, from about 100 to about 500, from about 100 to about 400, from about 100 to about 300, or from about 100 to about 200.
  • the number of the nearest neighboring cells can be from about 200 to about 3000, from about 200 to about 2500, from about 200 to about 2000, from about 200 to about 1500, from about 200 to about 1200, from about 200 to about 1000, from about 200 to about 800, from about 200 to about 600, from about 200 to about 500, from about 200 to about 400, or from about 200 to about 300.
  • the number of the nearest neighboring cells can be from about 300 to about 3000, from about 300 to about 2500, from about 300 to about 2000, from about 300 to about 1500, from about 300 to about 1200, from about 300 to about 1000, from about 300 to about 800, from about 300 to about 600, from about 300 to about 500, or from about 300 to about 400.
  • the number of the nearest neighboring cells can be from about 400 to about 3000, from about 400 to about 2500, from about 400 to about 2000, from about 400 to about 1500, from about 400 to about 1200, from about 400 to about 1000, from about 400 to about 800, from about 400 to about 600, or from about 400 to about 500.
  • the number of the nearest neighboring cells can be from about 500 to about 3000, from about 500 to about 2500, from about 500 to about 2000, from about 500 to about 1500, from about 500 to about 1200, from about 500 to about 1000, from about 500 to about 800, or from about 500 to about 600.
  • the number of the nearest neighboring cells can be from about 600 to about 3000, from about 600 to about 2500, from about 600 to about 2000, from about 600 to about 1500, from about 600 to about 1200, from about 600 to about 1000, or from about 600 to about 800. In some embodiments, the number of the nearest neighboring cells can be from about 800 to about 3000, from about 800 to about 2500, from about 800 to about 2000, from about 800 to about 1500, from about 800 to about 1200, or from about 800 to about 1000. In some embodiments, the number of the nearest neighboring cells can be from about 1000 to about 3000, from about 1000 to about 2500, from about 1000 to about 2000, from about 1000 to about 1500, or from about 1000 to about 1200.
  • the number of the nearest neighboring cells can be from about 1200 to about 3000, from about 1200 to about 2500, from about 1200 to about 2000, or from about 1200 to about 1500. In some embodiments, the number of the nearest neighboring cells can be from about 1500 to about 3000, from about 1500 to about 2500, or from about 1500 to about 2000. In some embodiments, the number of the nearest neighboring cells can be from about 2000 to about 3000, or from about 2000 to about 2500. In some embodiments, the number of the nearest neighboring cells can be from about 2500 to about 3000.
  • the number of the nearest neighboring cells can be at most about 3,000, at most about 2500, at most about 2000, at most about 1500, at most about 1200, at most about 1000, at most about 800, at most about 600, at most about 500, at most about 400, at most about 300, at most about 200, at most about 100, at most about 50, at most about 10, or at most about 5.
  • the number of the nearest neighboring cells can be at least about 5, at least about 10, at least about 50, at least about 100, at least about 200, at least about 300, at least about 400, at least about 500, at least about 600, at least about 800, at least about 1000, at least about 1200, at least about 1500, at least about 2000, at least about 2500, or at least about 3000. In some embodiments, the number of the nearest neighboring cells can be about 5, about 10, about 50, about 100, about 200, about 300, about 400, about 500, about 600, about 800, about 1000, about 1200, about 1500, about 2000, about 2500, or about 3000.
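The nearest-neighbor analysis can be sketched as follows: for each cell, find its k nearest neighbors in the image coordinates and average their expression to form that cell's row of an environment expression matrix. The brute-force distance computation, the use of a mean, the exclusion of the cell itself, and the function name are all assumptions made for illustration.

```python
import numpy as np

def environment_expression(xy, expr, k):
    """Mean expression of each cell's k nearest neighbors.

    xy:   (n_cells, n_dims) cell positions from the location data
    expr: (n_cells, n_genes) gene expression matrix (orientation assumed)
    """
    xy = np.asarray(xy, dtype=float)
    # Pairwise Euclidean distances; fine for small inputs only.
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # assumption: a cell is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    # expr[neighbors] has shape (n_cells, k, n_genes); average over the k axis.
    return expr[neighbors].mean(axis=1)

xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
expr = np.array([[1.0], [2.0], [3.0], [100.0]])
env_expr = environment_expression(xy, expr, k=2)
```

The pairwise distance matrix grows quadratically with the number of cells, so neighbor counts in the hundreds or thousands over large samples would call for a spatial index instead.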
  • the method and system disclosed herein may comprise analyzing, by the computer processor, the location data and the transcriptomic data to generate an environment confounder matrix, the environment confounder matrix characterizes each cell of the plurality of cells by one or more environment variables of a region adjacent to or surrounding each cell in the multi-dimensional image, the one or more environment variables are based at least in part on (i) cell classification of one or more cells in at least the region or (ii) measurement artifact of the transcriptomic data in at least the region.
  • An environment confounder matrix can be used to summarize traits of a cell’s neighbors.
  • a single row of the environment confounder matrix would hold values describing, for a single cell, how many cells by classifications are in its neighborhood.
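A row of this kind can be built by counting, for each cell, how many of its k nearest neighbors fall into each classification. The sketch below is one possible construction; the function name, the k-nearest-neighbor definition of the neighborhood, and the example labels are assumptions.

```python
import numpy as np

def neighbor_type_counts(xy, labels, k):
    """Build a cells-by-classes matrix of neighbor counts.

    Returns (counts, classes): counts[i, c] is the number of cell i's
    k nearest neighbors whose classification is classes[c].
    """
    xy = np.asarray(xy, dtype=float)
    classes = sorted(set(labels))
    class_index = np.array([classes.index(label) for label in labels])
    diff = xy[:, None, :] - xy[None, :, :]
    dist = (diff ** 2).sum(axis=-1)   # squared distance preserves the ordering
    np.fill_diagonal(dist, np.inf)    # assumption: exclude the cell itself
    neighbors = np.argsort(dist, axis=1)[:, :k]
    counts = np.zeros((len(labels), len(classes)), dtype=int)
    for c in range(len(classes)):
        counts[:, c] = (class_index[neighbors] == c).sum(axis=1)
    return counts, classes

xy = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
labels = ["T-cell", "fibroblast", "T-cell", "fibroblast"]
confounders, classes = neighbor_type_counts(xy, labels, k=2)
```

Each row sums to k, and further columns, such as a background estimate, could be appended to the same matrix.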
  • the immune cells may comprise B-cells, T-cells, Natural Killer (NK) cells, dendritic cells, macrophages, neutrophils, monocytes, or microglia.
  • the cell classification can comprise one or more members selected from the group comprising salivary gland mucous cells, salivary gland serous cells, von Ebner's gland cells, mammary gland cells, lacrimal gland cells, ceruminous gland cells, eccrine sweat gland dark cells, eccrine sweat gland clear cells, apocrine sweat gland cells, gland of Moll cells, sebaceous gland cells, Bowman's gland cells, Brunner's gland cells, seminal vesicle cells, prostate gland cells, bulbourethral gland cells, Bartholin's gland cells, gland of Littre cells, uterus endometrium cells, isolated goblet cells, stomach lining mucous cells, gastric gland zymogenic cells, gastric gland oxyntic cells, pancreatic acinar cells, Paneth cells, type II pneumocytes, Clara cells, somatotropes, lactotropes, thyrotropes, gonadotropes, corticotropes, intermediate pitu
  • the cells comprise embryonic stem cells, embryonic germ cells, induced pluripotent stem cells, mesenchymal stem cells, bone marrow-derived mesenchymal stem cells, bone marrow-derived mesenchymal stromal cells, tissue plastic-adherent placental stem cells (PDACs), umbilical cord stem cells, amniotic fluid stem cells, amnion derived adherent cells (AMDACs), osteogenic placental adherent cells (OPACs), adipose stem cells, limbal stem cells, dental pulp stem cells, myoblasts, endothelial progenitor cells, neuronal stem cells, exfoliated teeth derived stem cells, hair follicle stem cells, dermal stem cells, parthenogenically derived stem cells, reprogrammed stem cells, amnion derived adherent cells, and side population stem cells.
  • transcriptomic data may comprise measurement artifact.
  • the measurement artifact can comprise detection data via a synthetic control probe sequence.
  • negative control probes for “ERCC” sequences may be used. These probes bind nothing in any known genome and may be used to measure background in the system.
  • negative control barcodes or “false-codes” may be used. The barcode sequences may not be generated by any physical probe and may perform as a component of background.
  • the method and system described herein comprise the one or more environment variables based at least in part on one or more members selected from the group comprising (i) a number of cells having a cell classification of interest in at least the region of the multidimensional image, (ii) a number of different cell classifications identified in at least the region of the multi-dimensional image, (iii) a ratio between numbers of cells of two different cell classifications of interest in at least the region of the multi-dimensional image, and (iv) a relative location between a cell having a cell classification of interest and a tissue substructure in at least a portion of the multi-dimensional image.
  • the region of the multi-dimensional image may be characterized by having at most about 200 cells, at most about 180 cells, at most about 160 cells, at most about 140 cells, at most about 120 cells, at most about 100 cells, at most about 80 cells, at most about 60 cells, at most about 40 cells, at most about 30 cells, at most about 20 cells, at most about 10 cells, or at most about 5 cells.
  • the region of the multi-dimensional image may be characterized by pixels in spatial coordinates.
  • the region may be at least about 1, at least about 2, at least about 5, at least about 10, at least about 15, at least about 20, at least about 30, at least about 40, at least about 50, at least about 60, at least about 80, at least about 100, at least about 120, at least about 150, or at least about 200 pixels in at least one or more dimensions of X, Y, or Z of the multi-dimensional image.
  • the region may be at most about 200, at most about 150, at most about 120, at most about 100, at most about 80, at most about 60, at most about 50, at most about 40, at most about 30, at most about 20, at most about 15, at most about 10, at most about 5, at most about 2, or at most about 1 pixels in at least one or more dimensions of X, Y, or Z of the multi-dimensional image.
  • environment variables further comprise cell type composition of a cell’s neighborhood.
  • the cellular environment in living organisms is complex, comprising various cell types that interact and influence each other's functions.
  • the composition of cell types in a neighborhood can vary significantly depending on the tissue or organ being considered. The proportions of these cell types within a cell's neighborhood can vary based on factors like developmental stage, tissue function, injury, and disease.
  • environment variables further comprise counts of negative control probes in a cell’s neighborhood.
  • environment variables further comprise total gene counts in a cell’s neighborhood. The collective number of genes expressed across all the cells within the immediate vicinity of a particular cell within a tissue or organ can be counted. Each cell in an organism's body contains a full set of genes, but not all genes are actively expressed in every cell at all times. The gene expression profile of a cell determines its function and interactions within its microenvironment.
  • the environment confounder matrix may comprise a number of environment variables.
  • the number of the one or more environment variables in the environment confounder matrix can be from 1 to 50.
  • the number of the one or more environment variables in the environment confounder matrix can be from about 1 to about 50, from about 1 to about 40, from about 1 to about 30, from about 1 to about 20, from about 1 to about 15, from about 1 to about 10, from about 1 to about 5, or from about 1 to about 2.
  • the number of the one or more environment variables in the environment confounder matrix can be from about 2 to about 50, from about 2 to about 40, from about 2 to about 30, from about 2 to about 20, from about 2 to about 15, from about 2 to about 10, or from about 2 to about 5.
  • the number of the one or more environment variables in the environment confounder matrix can be from about 20 to about 50, from about 20 to about 40, or from about 20 to about 30. In some embodiments, the number of the one or more environment variables in the environment confounder matrix can be from about 30 to about 50, or from about 30 to about 40. In some embodiments, the number of the one or more environment variables in the environment confounder matrix can be from about 40 to about 50. In some embodiments, the number of the one or more environment variables in the environment confounder matrix can be at least about 1, at least about 2, at least about 5, at least about 10, at least about 15, at least about 20, at least about 30, at least about 40, or at least about 50. In some embodiments, the method and system described herein further comprise receiving selection of the one or more environment variables from a user via a graphical user interface.
  • the method and system described herein comprise determining, by the computer processor, a correlation between the plurality of genes of the gene expression matrix conditional on the environment confounder matrix; and identifying, by the computer processor and based on the correlation determined, at least one subset of the plurality of genes, wherein genes of the at least one subset can be mutually correlated to one another in the biological sample.
  • a non-limiting example workflow of SPARC comprises four operations: generating spatial correlation conditional on cell type and other confounding variables, deriving gene modules from the conditional correlation matrix, scoring module activity, and estimating the role of each cell type in each gene/module.
  • The environment expression matrix can be computed to include, in a single row, the average expression profile of one cell’s neighbors. The correlation of the environment expression matrix conditional on the environment confounding matrix can be computed and exported, as well as the neighbor network. As shown in Fig. 3B, a non-limiting example of Operation 1, in some embodiments, cell coordinates from the image (i.e., XY positions) are used to yield neighbor relationships. The single cell expression matrix and the generated neighbor relationships are used to generate the environment expression matrix, which can comprise expression around each cell’s neighborhood, as illustrated in Fig. 1C. Further, the neighbor relationships, cell type assignments, and other environment variables are used to generate the environment confounding matrix. In some embodiments, the environment confounding matrix can comprise the average value of confounding variables in each cell’s neighborhood.
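The neighbor-averaging step described above can be sketched as follows. This is an illustration under stated assumptions: the helper names `k_nearest` and `environment_expression`, the k-nearest-neighbor definition of a neighborhood, and the toy values are all hypothetical.

```python
from math import dist


def k_nearest(coords, i, k):
    """Indices of the k cells closest to cell i (Euclidean distance on XY positions)."""
    others = [j for j in range(len(coords)) if j != i]
    return sorted(others, key=lambda j: dist(coords[i], coords[j]))[:k]


def environment_expression(coords, expr, k):
    """expr[i][g] is the expression of gene g in cell i.

    Returns a matrix of the same shape whose row i is the mean expression
    profile of cell i's k nearest neighbors (cell i itself is excluded).
    """
    n_genes = len(expr[0])
    env = []
    for i in range(len(coords)):
        nbrs = k_nearest(coords, i, k)
        env.append([sum(expr[j][g] for j in nbrs) / len(nbrs) for g in range(n_genes)])
    return env


# Hypothetical toy data: four cells, two genes.
coords = [(0, 0), (1, 0), (0, 1), (10, 10)]
expr = [[2, 0], [4, 1], [6, 3], [0, 8]]
env = environment_expression(coords, expr, k=2)
```

The same neighbor lists could be reused to average confounding variables per neighborhood, which is one way to obtain the environment confounding matrix with matching rows.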
  • conditional covariance can be generated, based on covariances of the environment expression matrix and the confounding environment matrix: $\operatorname{cov}(X \mid Y) = \operatorname{cov}(X) - \operatorname{cov}(X,Y)\,\operatorname{cov}(Y)^{-1}\,\operatorname{cov}(Y,X)$, where X can be the environment expression matrix and Y can be the confounding environment matrix.
  • the calculated conditional covariance can be rescaled to have unit diagonal, yielding a conditional correlation matrix, as illustrated in Fig. 1G.
  • in the method and system described herein, determining the correlation can comprise analyzing covariance of the gene expression matrix conditional on the environment confounder matrix.
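A minimal NumPy sketch of the conditional covariance and its rescaling to unit diagonal is shown below, using the standard identity cov(X|Y) = cov(X) - cov(X,Y) cov(Y)^-1 cov(Y,X). The function name `conditional_correlation`, the simulated data, and the use of a pseudo-inverse (to tolerate a near-singular cov(Y)) are assumptions made for illustration.

```python
import numpy as np


def conditional_correlation(X, Y):
    """X: cells x genes environment expression; Y: cells x confounders.

    Returns the gene-by-gene correlation of X conditional on Y.
    """
    n_genes = X.shape[1]
    joint = np.cov(np.hstack([X, Y]), rowvar=False)
    cov_xx = joint[:n_genes, :n_genes]
    cov_xy = joint[:n_genes, n_genes:]
    cov_yy = joint[n_genes:, n_genes:]
    # Pseudo-inverse instead of a plain inverse: an assumption for numerical robustness.
    cond_cov = cov_xx - cov_xy @ np.linalg.pinv(cov_yy) @ cov_xy.T
    # Rescale to unit diagonal (assumes every gene has nonzero conditional variance).
    sd = np.sqrt(np.diag(cond_cov))
    return cond_cov / np.outer(sd, sd)


# Simulated example: genes 0 and 1 are correlated only through confounder 0.
rng = np.random.default_rng(0)
Y = rng.normal(size=(500, 2))
loadings = np.array([[2.0, 2.0, 0.0], [0.0, 0.0, 1.0]])
X = Y @ loadings + rng.normal(size=(500, 3))

raw = np.corrcoef(X, rowvar=False)
cond = conditional_correlation(X, Y)
```

In this simulation the raw correlation between genes 0 and 1 is high, while the conditional correlation is close to zero, which is the behavior the conditioning step is meant to produce.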
  • the method and system described herein further comprise determining a degree of the correlation and identifying the at least one subset based on the degree of correlation.
  • the identifying can be based at least in part on determining a threshold level of the degree of correlation.
  • the at least one subset can comprise a plurality of subsets that are different from one another.
  • the plurality of subsets can comprise a first subset having a first plurality of genes and a second subset having a second plurality of genes, wherein the first plurality of genes has at least one gene that is not in common with the second plurality of genes.
  • gene modules can be derived from the conditional correlation matrix.
  • conditional correlation matrix and gene expression matrix can be used as input.
  • the conditional correlation matrix is classified by clusters defining gene modules.
  • gene weights can be defined for scoring module activity.
  • the module gene membership and module gene weights can be generated.
  • the original conditional correlation is transformed to the square of the original value when the original value is greater than 0.1 (as shown in Fig. 1F).
  • the original conditional correlation is rounded to zero when the original conditional correlation is less than 0.1.
  • transformed correlation is used to define network and/or graph with edge weights.
  • Clustering is performed to generate gene classification defining membership.
  • classification is performed based on one or more methods comprising K-Means clustering, hierarchical clustering, community detection in networks, Leiden clustering, graph partitioning algorithms, or dimensionality reduction and clustering.
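The thresholding and grouping steps above can be sketched as follows. Values above 0.1 are squared and the rest are set to zero, as described; genes are then grouped by connected components of the resulting weighted graph. Connected components are used here only as a simple stand-in for the clustering methods listed (such as Leiden or hierarchical clustering) and are not the disclosed method. Zeroing the diagonal and treating a value of exactly 0.1 as zero are further assumptions.

```python
def transform(corr, threshold=0.1):
    """Square correlations above the threshold; set all others (and the diagonal) to zero."""
    n = len(corr)
    return [
        [corr[i][j] ** 2 if i != j and corr[i][j] > threshold else 0.0 for j in range(n)]
        for i in range(n)
    ]


def connected_components(weights):
    """Label each gene by the connected component it falls in (edges = positive weights)."""
    n = len(weights)
    label = [-1] * n
    current = 0
    for start in range(n):
        if label[start] != -1:
            continue
        label[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(n):
                if weights[u][v] > 0 and label[v] == -1:
                    label[v] = current
                    stack.append(v)
        current += 1
    return label


# Hypothetical conditional correlation matrix for four genes.
corr = [
    [1.0, 0.6, 0.05, 0.0],
    [0.6, 1.0, 0.08, -0.3],
    [0.05, 0.08, 1.0, 0.5],
    [0.0, -0.3, 0.5, 1.0],
]
edge_weights = transform(corr)
modules = connected_components(edge_weights)
```

Here genes 0 and 1 form one module and genes 2 and 3 another; the negative and sub-threshold correlations contribute no edges.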
  • gene weights can be calculated for a given module.
  • weight of each gene: $w_g = (\text{mean neighborhood expression of gene } g)^{-1/2}$. The resulting values of weights can be rescaled to sum to 1.
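As a small worked example of the weighting rule, the sketch below raises each gene's mean neighborhood expression to the power -1/2 and rescales the weights to sum to 1. The function name `module_gene_weights` and the gene names and values are hypothetical, and strictly positive means are assumed.

```python
def module_gene_weights(mean_env_expr):
    """Map gene -> weight, where weight is proportional to mean expression ** -1/2."""
    raw = {gene: mean ** -0.5 for gene, mean in mean_env_expr.items()}
    total = sum(raw.values())
    return {gene: w / total for gene, w in raw.items()}


# Hypothetical mean neighborhood expression for three module genes.
weights = module_gene_weights({"GENE_A": 4.0, "GENE_B": 16.0, "GENE_C": 1.0})
```

Because of the negative exponent, highly expressed genes receive smaller weights, so a single abundant gene does not dominate the module score.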
  • the method and system described herein further comprise determining a degree of the correlation and identifying the at least one subset based on the degree of correlation. In some embodiments, the method and system described herein further comprise displaying, via a graphical user interface, a graphical representation of the additional degree of correlation. In some embodiments, the identifying is based at least in part on determining a threshold level of the degree of correlation.
  • the method and system described herein further comprise generating a gene cluster map comprising a plurality of shapes representing the plurality of genes.
  • the plurality of shapes is arranged in a plurality of clusters, and a cluster of the plurality of clusters corresponds to the at least one subset.
  • the method and system described herein further comprise displaying, via a graphical user interface, the gene cluster map to a user.
  • scores for module activity can be generated.
  • input data can comprise one or more of neighbor network, module gene weights, and/or gene expression matrix.
  • neighbor network may come from output of Operation 1.
  • module gene weights may come from output of Operation 2.
  • scores for single cell module can be computed by taking average of single cell expression of module genes.
  • the average may be weighted average, arithmetic average, geometric mean, harmonic mean, median, or quadratic mean.
  • scores for environment module can be computed by taking average of environment expression of module genes.
  • the average may be weighted average, arithmetic average, geometric mean, harmonic mean, median, or quadratic mean.
  • scores for module activity are generated.
  • gene weights can be calculated.
  • weight: $w_g = (\text{mean neighborhood expression of gene } g)^{-1/2}$. All weights can be rescaled to sum to 1.
  • the mean may be weighted average, arithmetic average, geometric mean, harmonic mean, median, or quadratic mean.
  • scores for the single cell module are calculated based on the single cell expression matrix and single cell module weights.
  • scores for the environment module can be calculated based on the environment expression matrix (output from Operation 1) and environment expression weights.
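The scoring step can be illustrated with a weighted average over module genes, applied once to single cell expression and once to environment expression. The function name `module_scores`, the dict-per-cell data layout, and all values are assumptions made for this sketch; a weighted average is only one of the averaging options listed above.

```python
def module_scores(expr, genes, weights):
    """expr: list of per-cell dicts mapping gene -> expression.

    Returns one weighted-average score per cell over the given module genes.
    """
    total_w = sum(weights[g] for g in genes)
    return [sum(weights[g] * cell[g] for g in genes) / total_w for cell in expr]


# Hypothetical module with two genes and precomputed weights.
genes = ["GENE_A", "GENE_B"]
weights = {"GENE_A": 0.25, "GENE_B": 0.75}

single_cell_expr = [{"GENE_A": 4.0, "GENE_B": 0.0}, {"GENE_A": 0.0, "GENE_B": 4.0}]
environment_expr = [{"GENE_A": 2.0, "GENE_B": 2.0}, {"GENE_A": 1.0, "GENE_B": 3.0}]

single_cell_scores = module_scores(single_cell_expr, genes, weights)
environment_scores = module_scores(environment_expr, genes, weights)
```

The same function serves both score types because only the input matrix differs between them.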
  • cell type attribution analysis is performed to estimate the role of each cell type in each gene/module.
  • one or more of scores of the single cell module, scores of the environment module, and/or the gene expression matrix can be used as input. A score for each cell type’s involvement with each module gene is calculated. Given a module gene and a cell type, the correlation of environment scores for the module with neighborhood expression of the gene by the cell type is generated. The role of each cell type in each module is summarized by reporting the maximum value of the above statistic that the cell type attains over the module genes. In some embodiments, for each module, a matrix of cell type vs gene attribution scores is reported.
  • a single matrix of module vs cell type attribution score is reported.
  • As shown in Fig. 3E, a non-limiting example of Operation 4, the involvement of a single cell type in a single module gene and the involvement of a cell type in the module across all genes can be scored.
  • one or more of neighbor relationships, single cell expression levels, and/or cell types can be used as input, to generate cell type specific environment expression, comprising total expression of the gene by the cell type in each cell’s neighborhood. The correlation is generated between cell type specific environment expression and cells’ environment scores for the module.
  • the correlation may be based on Pearson Correlation, Spearman’s Rank correlation, Kendall’s Tau, Point-Biserial Correlation, Distance Correlation, Partial Correlation or Bivariate Correlation.
  • the attribution score is generated for involvement of the cell type in the gene’s contribution to the module. For scoring involvement of a cell type in the module across all genes, attribution score for a cell type in a module is calculated by taking the maximum of attribution score of the cell type in the module genes.
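A minimal sketch of the attribution calculation follows: for each cell type and module gene, a Pearson correlation is taken between the cells' environment scores and the cell-type-specific environment expression, and the per-module score for a cell type is the maximum over genes. Pearson correlation is one of the options listed above; the function names, the nested-dict layout, and the toy numbers are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes nonzero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def attribution(env_scores, type_env_expr):
    """type_env_expr[cell_type][gene] is a per-cell list of the gene's total
    neighborhood expression contributed by that cell type."""
    per_gene = {
        ct: {g: pearson(env_scores, values) for g, values in genes.items()}
        for ct, genes in type_env_expr.items()
    }
    # Module-level attribution: maximum over the module genes for each cell type.
    per_module = {ct: max(scores.values()) for ct, scores in per_gene.items()}
    return per_gene, per_module


# Hypothetical environment scores for four cells and two cell types.
env_scores = [1.0, 2.0, 3.0, 4.0]
type_env_expr = {
    "T-cell": {"GENE_A": [2.0, 4.0, 6.0, 8.0], "GENE_B": [4.0, 3.0, 2.0, 1.0]},
    "B-cell": {"GENE_A": [1.0, 3.0, 2.0, 4.0], "GENE_B": [1.0, 1.0, 2.0, 2.0]},
}
per_gene, per_module = attribution(env_scores, type_env_expr)
```

`per_gene` corresponds to the cell type vs gene matrix for one module, and `per_module` to one row of the module vs cell type summary.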
  • the method and system described herein further comprise scoring each cell of the plurality of cells based on single cell expression level of the genes of the at least one subset. In some embodiments, the method and system described herein further comprise generating an additional multi-dimensional image of the biological sample based on the scoring.
  • the method and system described herein comprise clustering analysis for gene expression matrices.
  • the method and system described herein further comprise classifying conditional correlation matrices by clusters defining gene modules.
  • the methods include using a trained classifier or algorithm to analyze sample data, particularly to perform a binary classification of gene expression.
  • Clustering is widely used for community detection in network analysis, particularly in the field of single-cell RNA sequencing (scRNA-seq) data analysis. It is used to identify distinct clusters or communities of cells based on their gene expression profiles.
  • clustering algorithms consider both the feature values and the spatial proximity of the objects or features within an image of biological sample.
  • Examples of clustering algorithms commonly used in geospatial analysis include k-means clustering, hierarchical clustering, DBSCAN (Density-Based Spatial Clustering of Applications with Noise), or spatially constrained clustering algorithms.
  • the resulting clusters can be evaluated and validated using appropriate metrics to assess the quality and significance of the clustering results.
  • Clustering can reveal hidden relationships within gene expression data and provide insights into the underlying biology.
  • RNA expression data is collected using techniques like microarrays or RNA sequencing (RNA-seq). Each row represents a gene, and each column represents a sample (e.g., different cells, tissues, conditions, time points).
  • Data Preprocessing: the data is normalized to account for differences in library sizes and other technical factors.
  • methods comprise TPM (transcripts per million) normalization for RNA-seq data.
  • Feature Selection: in some cases, a subset of informative genes is selected if the dataset is large. This can reduce noise and computational complexity.
  • Distance Metric: a distance metric is chosen to quantify the similarity or dissimilarity between genes based on their expression profiles. In some cases, metrics comprise Euclidean distance, Pearson correlation, or cosine similarity.
  • Clustering Algorithm Selection: a clustering algorithm is chosen based on the dataset's characteristics.
  • algorithms comprise hierarchical clustering, K-means clustering, and more advanced methods like DBSCAN (Density-Based Spatial Clustering of Applications with Noise) or agglomerative clustering.
  • Clustering Execution: the chosen clustering algorithm is applied to the preprocessed gene expression data. The algorithm will group genes with similar expression patterns into clusters.
  • Visualization: visualizations of the clustering results are created to better understand the relationships between gene expression patterns.
  • visualization can comprise heatmaps, dendrogram plots, UMAP, or t-SNE plots.
  • Cluster Analysis: clustering results are interpreted to provide insights into the roles of co-expressed genes.
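The preprocessing step in the workflow above mentions TPM normalization. A minimal sketch of that calculation for one sample is given below: each gene's count is divided by its transcript length, and the resulting rates are rescaled to sum to one million. The function name `tpm` and the example counts and lengths are illustrative assumptions.

```python
def tpm(counts, lengths_kb):
    """Transcripts per million for one sample.

    counts: raw read counts per gene; lengths_kb: transcript lengths in kilobases.
    """
    # Length-normalize first, then scale so the values sum to 1e6.
    rates = [count / length for count, length in zip(counts, lengths_kb)]
    scale = sum(rates)
    return [rate / scale * 1e6 for rate in rates]


# Hypothetical sample: counts proportional to length give equal TPM values.
values = tpm(counts=[100, 200, 300], lengths_kb=[1.0, 2.0, 3.0])
```

Because every sample is rescaled to the same total, TPM values are comparable across samples with different library sizes, which is the technical factor the preprocessing step is meant to address.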
  • a group of samples from two or more groups can be analyzed with a statistical classification method. Differential gene or nucleic acid level data can be discovered that can be used to build a classifier that differentiates between the two or more groups. A new sample can then be analyzed so that the classifier can associate the new sample with one of the two or more groups.
  • Commonly used supervised classifiers include without limitation the neural network (multi-layer perceptron), support vector machines, k-nearest neighbors, Gaussian mixture model, Gaussian, naive Bayes, decision tree and radial basis function (RBF) classifiers.
  • Linear classification methods include Fisher's linear discriminant, LDA, logistic regression, naive Bayes classifier, perceptron, and support vector machines (SVMs).
  • Other classifiers for use with the invention include quadratic classifiers, k-nearest neighbor, boosting, decision trees, random forests, neural networks, pattern recognition, Elastic Net, Golub Classifier, Parzen-window, Iterative RELIEF, Classification Tree, Maximum Likelihood Classifier, Nearest Centroid, Prediction Analysis of Microarrays (PAM), Fuzzy C-Means Clustering, Bayesian networks and Hidden Markov models.
  • the methods described herein perform a binary classification of gene expression with at least about 70%, at least about 72%, at least about 74%, at least about 76%, at least about 78%, at least about 80%, at least about 82%, at least about 84%, at least about 86%, at least about 88%, at least about 90%, at least about 92%, at least about 94%, at least about 96%, at least about 98%, or at least about 100% sensitivity.
  • the methods described herein perform a binary classification of gene expression with at least about 70%, at least about 72%, at least about 74%, at least about 76%, at least about 78%, at least about 80%, at least about 82%, at least about 84%, at least about 86%, at least about 88%, at least about 90%, at least about 92%, at least about 94%, at least about 96%, at least about 98%, or at least about 100% specificity.
  • the methods described herein perform a binary classification of gene expression with at least about 70%, at least about 72%, at least about 74%, at least about 76%, at least about 78%, at least about 80%, at least about 82%, at least about 84%, at least about 86%, at least about 88%, at least about 90%, at least about 92%, at least about 94%, at least about 96%, at least about 98%, or at least about 100% accuracy.
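For reference, sensitivity, specificity, and accuracy of a binary classifier are computed from the confusion-matrix counts as shown in the sketch below. The function name `binary_metrics` and the example labels are hypothetical, and the example assumes at least one positive and one negative true label.

```python
def binary_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


# Hypothetical labels for eight samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
sensitivity, specificity, accuracy = binary_metrics(y_true, y_pred)
```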
  • Training of multi-dimensional classifiers may be performed using numerous samples. For example, training of the multi-dimensional classifier may be performed using at least about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200 or more samples. In some cases, training of the multi-dimensional classifier may be performed using at least about 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 350, 400, 450, 500 or more samples.
  • training of the multi-dimensional classifier may be performed using at least about 525, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 2000 or more samples.
  • the methods described herein may comprise training machine learning models.
  • the methods described herein may comprise trained machine learning models comprising a supervised machine learning model, an unsupervised machine learning model, a deep learning model, or a time-series machine learning model.
  • the trained algorithm may comprise an unsupervised machine learning algorithm.
  • the trained algorithm may comprise a supervised machine learning algorithm.
  • the trained algorithm may comprise a deep learning algorithm.
  • the trained algorithm may comprise a time-series machine learning algorithm.
  • the trained algorithm may comprise a classification and regression tree (CART) algorithm.
  • the supervised machine learning algorithm may comprise, for example, a Random Forest, a support vector machine (SVM), a neural network, or a deep learning algorithm.
  • the trained algorithm may comprise a self-supervised machine learning algorithm.
  • the time-series machine learning algorithm may comprise autoregressive integrated moving average (ARIMA), recurrent neural networks (RNN), convolutional neural networks (CNN), Gaussian processes, long short-term memory networks, gated recurrent unit networks, Hidden Markov Models, or transformer-based models.
  • a machine learning algorithm of a method as described herein utilizes one or more neural networks.
  • a neural network is a type of computational system that can learn the relationships between an input dataset and a target dataset.
  • a neural network may be a software representation of a human neural system (e.g., cognitive system), intended to capture “learning” and “generalization” abilities as used by a human.
  • the machine learning algorithm can comprise a neural network comprising a CNN.
  • Non-limiting examples of structural components of machine learning algorithms described herein include: CNNs, recurrent neural networks, dilated CNNs, fully-connected neural networks, deep generative models, and Boltzmann machines. Total number of learnable or trainable parameters;
  • the neural network can comprise artificial neural networks (ANNs).
  • ANNs may be machine learning algorithms that may be trained to map an input dataset to an output dataset, where the ANN can comprise an interconnected group of nodes organized into multiple layers of nodes.
  • the ANN architecture may comprise at least an input layer, one or more hidden layers, and an output layer.
  • the ANN may comprise any total number of layers, and any number of hidden layers, where the hidden layers function as trainable feature extractors that allow mapping of a set of input data to an output value or set of output values.
  • a deep learning algorithm (such as a deep neural network (DNN)) is an ANN comprising a plurality of hidden layers, e.g., two or more hidden layers.
  • Each layer of the neural network may comprise a number of nodes (or “neurons”).
  • a node receives input that comes either directly from the input data or the output of nodes in previous layers, and performs a specific operation, e.g., a summation operation.
  • a connection from an input to a node is associated with a weight (or weighting factor).
  • the node may sum up the products of all pairs of inputs and their associated weights.
  • the weighted sum may be offset with a bias.
  • the output of a node or neuron may be gated using a threshold or activation function.
  • the activation function may be a linear or non-linear function.
  • the activation function may be, for example, a rectified linear unit (ReLU) activation function, a Leaky ReLU activation function, or other function such as a saturating hyperbolic tangent, identity, binary step, logistic, arctan, softsign, parametric rectified linear unit, exponential linear unit, softplus, bent identity, softexponential, sinusoid, sine, Gaussian, or sigmoid function, or any combination thereof.
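The node computation described above (a weighted sum of inputs, offset by a bias, then passed through an activation function) can be sketched in a few lines. The function names and numeric values are illustrative assumptions, and ReLU is chosen only as one of the listed activation functions.

```python
def relu(z):
    """Rectified linear unit: passes positive values, clips negatives to zero."""
    return max(0.0, z)


def node_output(inputs, weights, bias, activation=relu):
    """Sum the products of inputs and their weights, add the bias, apply the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)


# Hypothetical node with two inputs.
clipped = node_output([1.0, 2.0], [0.5, -1.0], bias=0.25)  # pre-activation is -1.25
passed = node_output([1.0, 2.0], [0.5, -1.0], bias=2.0)    # pre-activation is 0.5
```

Swapping `activation` for another callable (for example a logistic or hyperbolic tangent function) changes the gating without altering the weighted-sum step.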
  • the weighting factors, bias values, and threshold values, or other computational parameters of the neural network may be “taught” or “learned” in a training phase using one or more sets of training data.
  • the parameters may be trained using the input data from a training dataset and a gradient descent or backward propagation method so that the output value(s) that the ANN computes can be consistent with the examples included in the training dataset.
  • a machine learning algorithm can comprise a neural network such as a deep CNN.
  • the network is constructed with any number of convolutional layers, dilated layers or fully-connected layers.
  • the number of convolutional layers is between 1-10 and the dilated layers between 0-10.
  • the total number of convolutional layers may be at least about 1, 2, 3, 4, 5, 10, 15, 20, or greater, and the total number of dilated layers may be at least about 1, 2, 3, 4, 5, 10, 15, 20, or greater.
  • the total number of convolutional layers may be at most about 20, 15, 10, 5, 4, 3, or less, and the total number of dilated layers may be at most about 20, 15, 10, 5, 4, 3, or less. In some embodiments, the number of convolutional layers is between 1-10 and the fully-connected layers between 0-10.
  • the total number of convolutional layers (including input and output layers) may be at least about 1, 2, 3, 4, 5, 10, 15, 20, or greater, and the total number of fully-connected layers may be at least about 1, 2, 3, 4, 5, 10, 15, 20, or greater.
  • the total number of convolutional layers may be at most about 20, 15, 10, 5, 4, 3, 2, 1, or less, and the total number of fully-connected layers may be at most about 20, 15, 10, 5, 4, 3, 2, 1, or less.
  • an attention mechanism e.g., a transformer
  • Attention mechanisms may focus on, or “attend to,” certain input regions while ignoring others. This may increase model performance because certain input regions may be less relevant.
  • an attention unit can compute a dot product of a context vector and the input at the operation, among other operations. The output of the attention unit may define where the most relevant information in the input sequence is located.
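A minimal sketch of the dot-product step is given below: the context vector is dotted with each input vector, and the scores are normalized so that the largest weight marks the most relevant input position. The softmax normalization, the function name `attention_weights`, and the example vectors are assumptions added for illustration; the text above specifies only the dot product.

```python
from math import exp


def attention_weights(context, inputs):
    """Dot-product scores between a context vector and each input, softmax-normalized."""
    scores = [sum(c * x for c, x in zip(context, vec)) for vec in inputs]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores)
    exps = [exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical context vector and three input positions.
weights = attention_weights(context=[1.0, 0.0], inputs=[[3.0, 0.0], [0.0, 3.0], [1.0, 1.0]])
most_relevant = weights.index(max(weights))
```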
  • Referring to Fig. 4, a block diagram is shown depicting an exemplary machine that includes a computer system 400 (e.g., a processing or computing system) within which a set of instructions can execute for causing a device to perform or execute any one or more of the aspects and/or methodologies for static code scheduling of the present disclosure.
  • the components in Fig. 4 are examples only and do not limit the scope of use or functionality of any hardware, software, embedded logic component, or a combination of two or more such components implementing particular embodiments.
  • Computer system 400 may include one or more processors 401, a memory 403, and a storage 408 that communicate with each other, and with other components, via a bus 440.
  • the bus 440 may also link a display 432, one or more input devices 433 (which may, for example, include a keypad, a keyboard, a mouse, a stylus, etc.), one or more output devices 434, one or more storage devices 435, and various tangible storage media 436. All of these elements may interface directly or via one or more interfaces or adaptors to the bus 440.
  • the various tangible storage media 436 can interface with the bus 440 via storage medium interface 426.
  • Computer system 400 may have any suitable physical form, including but not limited to one or more integrated circuits (ICs), printed circuit boards (PCBs), mobile handheld devices (such as mobile telephones or PDAs), laptop or notebook computers, distributed computer systems, computing grids, or servers.
  • Computer system 400 includes one or more processor(s) 401 (e.g., central processing units (CPUs), general purpose graphics processing units (GPGPUs), or quantum processing units (QPUs)) that carry out functions.
  • Processor(s) 401 optionally contain a cache memory unit 402 for temporary local storage of instructions, data, or computer addresses.
  • Processor(s) 401 are configured to assist in execution of computer readable instructions.
  • Computer system 400 may provide functionality for the components depicted in Fig. 4 as a result of the processor(s) 401 executing non-transitory, processor-executable instructions embodied in one or more tangible computer-readable storage media, such as memory 403, storage 408, storage devices 435, and/or storage medium 436.
  • the computer-readable media may store software that implements particular embodiments, and processor(s) 401 may execute the software.
  • Memory 403 may read the software from one or more other computer-readable media (such as mass storage device(s) 435, 436) or from one or more other sources through a suitable interface, such as network interface 420.
  • the software may cause processor(s) 401 to carry out one or more processes or one or more steps of one or more processes described or illustrated herein. Carrying out such processes or steps may include defining data structures stored in memory 403 and modifying the data structures as directed by the software.
  • the memory 403 may include various components (e.g., machine readable media) including, but not limited to, a random access memory component (e.g., RAM 404) (e.g., static RAM (SRAM), dynamic RAM (DRAM), ferroelectric random access memory (FRAM), phase-change random access memory (PRAM), etc.), a read-only memory component (e.g., ROM 405), and any combinations thereof.
  • ROM 405 may act to communicate data and instructions unidirectionally to processor(s) 401
  • RAM 404 may act to communicate data and instructions bidirectionally with processor(s) 401.
  • ROM 405 and RAM 404 may include any suitable tangible computer-readable media described below.
  • a basic input/output system 406 (BIOS) including basic routines that help to transfer information between elements within computer system 400, such as during startup, may be stored in the memory 403.
  • Fixed storage 408 is connected bidirectionally to processor(s) 401, optionally through storage control unit 407.
  • Fixed storage 408 provides additional data storage capacity and may also include any suitable tangible computer-readable media described herein.
  • Storage 408 may be used to store operating system 409, executable(s) 410, data 411, applications 412 (application programs), and the like.
  • Storage 408 can also include an optical disk drive, a solid-state memory device (e.g., flash-based systems), or a combination of any of the above.
  • Information in storage 408 may, in appropriate cases, be incorporated as virtual memory in memory 403.
  • storage device(s) 435 may be removably interfaced with computer system 400 (e.g., via an external port connector (not shown)) via a storage device interface 425.
  • storage device(s) 435 and an associated machine-readable medium may provide non-volatile and/or volatile storage of machine-readable instructions, data structures, program modules, and/or other data for the computer system 400.
  • software may reside, completely or partially, within a machine-readable medium on storage device(s) 435.
  • software may reside, completely or partially, within processor(s) 401
  • such architectures include an Industry Standard Architecture (ISA) bus, an Enhanced ISA (EISA) bus, a Micro Channel Architecture (MCA) bus, a Video Electronics Standards Association local bus (VLB), a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCI-X) bus, an Accelerated Graphics Port (AGP) bus, HyperTransport (HTX) bus, serial advanced technology attachment (SATA) bus, and any combinations thereof.
  • Computer system 400 may also include an input device 433.
  • a user of computer system 400 may enter commands and/or other information into computer system 400 via input device(s) 433.
  • Examples of input device(s) 433 include, but are not limited to, an alpha-numeric input device (e.g., a keyboard), a pointing device (e.g., a mouse or touchpad), a touch screen, a multi-touch screen, a joystick, a stylus, a gamepad, an audio input device (e.g., a microphone, a voice response system, etc.), an optical scanner, a video or still image capture device (e.g., a camera), and any combinations thereof.
  • the input device is a Kinect, Leap Motion, or the like.
  • Input device(s) 433 may be interfaced to bus 440 via any of a variety of input interfaces (e.g., input interface 423) including, but not limited to, serial, parallel, game port, USB, FIREWIRE, THUNDERBOLT, or any combination of the above.
  • when computer system 400 is connected to network 430, computer system 400 may communicate with other devices, specifically mobile devices and enterprise systems, distributed computing systems, cloud storage systems, cloud computing systems, and the like, connected to network 430. Communications to and from computer system 400 may be sent through network interface 420.
  • network interface 420 may receive incoming communications (such as requests or responses from other devices) in the form of one or more packets (such as Internet Protocol (IP) packets) from network 430, and computer system 400 may store the incoming communications in memory 403 for processing.
  • Computer system 400 may similarly store outgoing communications (such as requests or responses to other devices) in the form of one or more packets in memory 403, to be communicated to network 430 from network interface 420.
  • Processor(s) 401 may access these communication packets stored in memory 403 for processing.
  • Examples of the network interface 420 include, but are not limited to, a network interface card, a modem, and any combination thereof.
  • Examples of a network 430 or network segment 430 include, but are not limited to, a distributed computing system, a cloud computing system, a wide area network (WAN) (e.g., the Internet, an enterprise network), a local area network (LAN) (e.g., a network associated with an office, a building, a campus or other relatively small geographic space), a telephone network, a direct connection between two computing devices, a peer-to-peer network, and any combinations thereof.
  • a network, such as network 430 may employ a wired and/or a wireless mode of communication. In general, any network topology may be used.
  • Information and data can be displayed through a display 432.
  • Examples of a display 432 include, but are not limited to, a cathode ray tube (CRT), a liquid crystal display (LCD), a thin film transistor liquid crystal display (TFT-LCD), an organic light-emitting diode (OLED) display such as a passive-matrix OLED (PMOLED) or active-matrix OLED (AMOLED) display, a plasma display, and any combinations thereof.
  • the display 432 can interface to the processor(s) 401, memory 403, and fixed storage 408, as well as other devices, such as input device(s) 433, via the bus 440.
  • the display 432 is linked to the bus 440 via a video interface 422, and transport of data between the display 432 and the bus 440 can be controlled via the graphics control 421.
  • the display is a video projector.
  • the display is a head-mounted display (HMD) such as a VR headset.
  • suitable VR headsets include, by way of non-limiting examples, HTC Vive, Oculus Rift, Samsung Gear VR, Microsoft HoloLens, Razer OSVR, FOVE VR, Zeiss VR One, Avegant Glyph, Freefly VR headset, and the like.
  • the display is a combination of devices such as those disclosed herein.
  • computer system 400 may include one or more other peripheral output devices 434 including, but not limited to, an audio speaker, a printer, a storage device, and any combinations thereof.
  • peripheral output devices may be connected to the bus 440 via an output interface 424.
  • Examples of an output interface 424 include, but are not limited to, a serial port, a parallel connection, a USB port, a FIREWIRE port, a THUNDERBOLT port, and any combinations thereof.
  • computer system 400 may provide functionality as a result of logic hardwired or otherwise embodied in a circuit, which may operate in place of or together with software to execute one or more processes or one or more steps of one or more processes described or illustrated herein.
  • Reference to software in this disclosure may encompass logic, and reference to logic may encompass software.
  • reference to a computer-readable medium may encompass a circuit (such as an IC) storing software for execution, a circuit embodying logic for execution, or both, where appropriate.
  • the present disclosure encompasses any suitable combination of hardware, software, or both.
  • DSP digital signal processor
  • ASIC application specific integrated circuit
  • FPGA field programmable gate array
  • a general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine.
  • a processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
  • a software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
  • the storage medium may be integral to the processor.
  • the processor and the storage medium may reside in an ASIC.
  • the ASIC may reside in a user terminal.
  • the processor and the storage medium may reside as discrete components in a user terminal.
  • suitable computing devices include, by way of non-limiting examples, cloud computing platforms, distributed computing platforms, server clusters, server computers, desktop computers, laptop computers, notebook computers, sub-notebook computers, netbook computers, netpad computers, set-top computers, media streaming devices, handheld computers, Internet appliances, mobile smartphones, and tablet computers.
  • the computing device includes an operating system configured to perform executable instructions.
  • the operating system is, for example, software, including programs and data, which manages the device’s hardware and provides services for execution of applications.
  • suitable server operating systems include, by way of non-limiting examples, FreeBSD, OpenBSD, NetBSD®, Linux, Apple® Mac OS X Server®, Oracle® Solaris®, Windows Server®, and Novell® NetWare®.
  • suitable personal computer operating systems include, by way of non-limiting examples, Microsoft® Windows®, Apple® Mac OS X®, UNIX®, and UNIX-like operating systems such as GNU/Linux®.
  • the operating system is provided by cloud computing.
  • suitable mobile smartphone operating systems include, by way of non-limiting examples, Nokia® Symbian® OS, Apple® iOS®, Research In Motion® BlackBerry OS®, Google® Android®, Microsoft® Windows Phone® OS, Microsoft® Windows Mobile® OS, Linux®, and Palm® WebOS®.
  • Non-transitory computer readable storage medium
  • the platforms, systems, media, and methods disclosed herein include one or more non-transitory computer readable storage media encoded with a program including instructions executable by the operating system of an optionally networked computing device.
  • a computer readable storage medium is a tangible component of a computing device.
  • a computer readable storage medium is optionally removable from a computing device.
  • a computer readable storage medium includes, by way of non-limiting examples, CD-ROMs, DVDs, flash memory devices, solid state memory, magnetic disk drives, magnetic tape drives, optical disk drives, distributed computing systems including cloud computing systems and services, and the like.
  • the program and instructions are permanently, substantially permanently, semi-permanently, or non-transitorily encoded on the media.
  • the platforms, systems, media, and methods disclosed herein include at least one computer program, or use of the same.
  • a computer program includes a sequence of instructions, executable by one or more processor(s) of the computing device’s CPU, written to perform a specified task.
  • Computer readable instructions may be implemented as program modules, such as functions, objects, Application Programming Interfaces (APIs), computing data structures, and the like, that perform particular tasks or implement particular abstract datatypes.
  • a computer program can comprise one sequence of instructions.
  • a computer program can comprise a plurality of sequences of instructions.
  • a computer program is provided from one location. In other embodiments, a computer program is provided from a plurality of locations.
  • a computer program includes one or more software modules.
  • a computer program includes, in part or in whole, one or more web applications, one or more mobile applications, one or more standalone applications, one or more web browser plug-ins, extensions, add-ins, or add-ons, or combinations thereof.
  • a computer program includes a web application.
  • a web application in various embodiments, utilizes one or more software frameworks and one or more database systems.
  • a web application is created upon a software framework such as Microsoft® .NET or Ruby on Rails (RoR).
  • a web application utilizes one or more database systems including, by way of non-limiting examples, relational, non-relational, object oriented, associative, XML, and document oriented database systems.
  • suitable relational database systems include, by way of non-limiting examples, Microsoft® SQL Server, MySQL™, and Oracle®.
  • a web application in various embodiments, is written in one or more versions of one or more languages.
  • a web application may be written in one or more markup languages, presentation definition languages, client-side scripting languages, server-side coding languages, database query languages, or combinations thereof.
  • a web application is written to some extent in a markup language such as Hypertext Markup Language (HTML), Extensible Hypertext Markup Language (XHTML), or Extensible Markup Language (XML).
  • a web application is written to some extent in a presentation definition language such as Cascading Style Sheets (CSS).
  • a web application is written to some extent in a client-side scripting language such as Asynchronous JavaScript and XML (AJAX), Flash® ActionScript, JavaScript, or Silverlight®.
  • a web application is written to some extent in a server-side coding language such as Active Server Pages (ASP), ColdFusion®, Perl, Java™, JavaServer Pages (JSP), Hypertext Preprocessor (PHP), Python™, Ruby, Tcl, Smalltalk, WebDNA®, or Groovy.
  • a web application is written to some extent in a database query language such as Structured Query Language (SQL).
  • a web application integrates enterprise server products such as IBM® Lotus Domino®.
  • a web application includes a media player element.
  • a media player element utilizes one or more of many suitable multimedia technologies including, by way of non-limiting examples, Adobe® Flash®, HTML 5, Apple® QuickTime®, Microsoft® Silverlight®, Java™, and Unity®.
  • an application provision system can comprise one or more databases 500 accessed by a relational database management system (RDBMS) 510.
  • RDBMSs include Firebird, MySQL, PostgreSQL, SQLite, Oracle Database, Microsoft SQL Server, IBM DB2, IBM Informix, SAP Sybase, Teradata, and the like.
  • the application provision system further can comprise one or more application servers 520 (such as Java servers, .NET servers, PHP servers, and the like) and one or more web servers 530 (such as Apache, IIS, GWS and the like).
  • the web server(s) optionally expose one or more web services via application programming interfaces (APIs) 540.
  • an application provision system alternatively has a distributed, cloud-based architecture 600 and can comprise elastically load balanced, auto-scaling web server resources 610 and application server resources 620 as well as synchronously replicated databases 630.
  • a computer program includes a mobile application provided to a mobile computing device.
  • the mobile application is provided to a mobile computing device at the time it is manufactured.
  • the mobile application is provided to a mobile computing device via the computer network described herein.
  • a mobile application is created by techniques known to those of skill in the art using hardware, languages, and development environments known to the art. Those of skill in the art will recognize that mobile applications are written in several languages. Suitable programming languages include, by way of non-limiting examples, C, C++, C#, Objective-C, Java™, JavaScript, Pascal, Object Pascal, Python™, Ruby, VB.NET, WML, and XHTML/HTML with or without CSS, or combinations thereof.
  • Suitable mobile application development environments are available from several sources. Commercially available development environments include, by way of non-limiting examples, AirplaySDK, alcheMo, Appcelerator®, Celsius, Bedrock, Flash Lite, .NET Compact Framework, Rhomobile, and WorkLight Mobile Platform. Other development environments are available without cost including, by way of non-limiting examples, Lazarus, MobiFlex, MoSync, and PhoneGap. Also, mobile device manufacturers distribute software developer kits including, by way of non-limiting examples, iPhone and iPad (iOS) SDK, Android™ SDK, BlackBerry® SDK, BREW SDK, Palm® OS SDK, Symbian SDK, webOS SDK, and Windows® Mobile SDK.
  • a computer program includes a standalone application, which is a program that is run as an independent computer process, not an add-on to an existing process, e.g., not a plug-in.
  • standalone applications are often compiled.
  • a compiler is a computer program(s) that transforms source code written in a programming language into binary object code such as assembly language or machine code. Suitable compiled programming languages include, by way of non-limiting examples, C, C++, Objective-C, COBOL, Delphi, Eiffel, Java™, Lisp, Python™, Visual Basic, and VB.NET, or combinations thereof. Compilation is often performed, at least in part, to create an executable program.
  • a computer program includes one or more executable compiled applications.
  • the computer program includes a web browser plug-in (e.g., extension, etc.).
  • a plug-in is one or more software components that add specific functionality to a larger software application. Makers of software applications support plug-ins to enable third-party developers to create abilities which extend an application, to support easily adding new features, and to reduce the size of an application. When supported, plug-ins enable customizing the functionality of a software application. For example, plug-ins are commonly used in web browsers to play video, generate interactivity, scan for viruses, and display particular file types. Those of skill in the art will be familiar with several web browser plug-ins including, Adobe® Flash® Player, Microsoft® Silverlight®, and Apple® QuickTime®.
  • the toolbar can comprise one or more web browser extensions, add-ins, or add-ons. In some embodiments, the toolbar can comprise one or more explorer bars, tool bands, or desk bands.
  • plug-in frameworks are available that enable development of plug-ins in various programming languages, including, by way of non-limiting examples, C++, Delphi, Java™, PHP, Python™, and VB.NET, or combinations thereof.
  • Web browsers are software applications, designed for use with network-connected computing devices, for retrieving, presenting, and traversing information resources on the World Wide Web. Suitable web browsers include, by way of non-limiting examples, Microsoft® Internet Explorer®, Mozilla® Firefox®, Google® Chrome, Apple® Safari®, Opera Software® Opera®, and KDE Konqueror. In some embodiments, the web browser is a mobile web browser. Mobile web browsers (also called microbrowsers, mini-browsers, and wireless browsers) are designed for use on mobile computing devices including, by way of non-limiting examples, handheld computers, tablet computers, netbook computers, subnotebook computers, smartphones, music players, personal digital assistants (PDAs), and handheld video game systems.
  • Suitable mobile web browsers include, by way of non-limiting examples, Google® Android® browser, RIM BlackBerry® Browser, Apple® Safari®, Palm® Blazer, Palm® WebOS® Browser, Mozilla® Firefox® for mobile, Microsoft® Internet Explorer® Mobile, Amazon® Kindle® Basic Web, Nokia® Browser, Opera Software® Opera® Mobile, and Sony® PSP™ browser.
  • the platforms, systems, media, and methods disclosed herein include software, server, and/or database modules, or use of the same.
  • software modules are created by techniques known to those of skill in the art using machines, software, and languages known to the art.
  • the software modules disclosed herein are implemented in a multitude of ways.
  • a software module can comprise a file, a section of code, a programming object, a programming structure, a distributed computing resource, a cloud computing resource, or combinations thereof.
  • a software module can comprise a plurality of files, a plurality of sections of code, a plurality of programming objects, a plurality of programming structures, a plurality of distributed computing resources, a plurality of cloud computing resources, or combinations thereof.
  • the one or more software modules comprise, by way of non-limiting examples, a web application, a mobile application, a standalone application, and a distributed or cloud computing application.
  • software modules are in one computer program or application. In other embodiments, software modules are in more than one computer program or application. In some embodiments, software modules are hosted on one machine. In other embodiments, software modules are hosted on more than one machine. In further embodiments, software modules are hosted on a distributed computing platform such as a cloud computing platform. In some embodiments, software modules are hosted on one or more machines in one location. In other embodiments, software modules are hosted on one or more machines in more than one location.
  • the platforms, systems, media, and methods disclosed herein include one or more databases, or use of the same.
  • suitable databases include, by way of non-limiting examples, relational databases, non-relational databases, object oriented databases, object databases, entity-relationship model databases, associative databases, XML databases, document oriented databases, and graph databases. Further non-limiting examples include SQL, PostgreSQL, MySQL, Oracle, DB2, Sybase, and MongoDB.
  • a database is Internet-based.
  • a database is web-based.
  • a database is cloud computing-based.
  • a database is a distributed database.
  • a database is based on one or more local computer storage devices.
  • Example 1: SPARC in a colon cancer profiled with the CosMx SMI 6000-plex assay.
  • SPARC is a toolkit for quickly identifying spatial correlations meriting attention.
  • SPARC identifies gene modules with spatial correlations that cannot be explained by trivial factors like the cell type landscape or technical effects. In some embodiments, it discovers dozens of such modules. To steer analysts towards the most interesting clusters, the software includes tools to implicate cell types in module activity and to describe module spatial patterns. SPARC is a powerful and convenient way to quickly identify spatial transcriptomics trends that deserve scarce analyst attention.
  • CosMx SMI liver cancer data was downloaded from https://nanostring.com/products/cosmx-spatial-molecular-imager/human-liver-rna-ffpe-dataset/.
  • the CosMx SMI human liver data provides a subcellular expression map of 1,000 genes and a single cell tissue atlas that categorizes each cell in the tissue as one of 18 unique cell types.
  • the complete dataset consists of over 800,000 single cells and ~700 million transcripts, and a single-cell tissue atlas across a ~180 mm² area of liver tissue.
  • the high-plex analysis provided deep insight into the cell and tissue changes that occur in cancer, including infiltration of diverse immune cells.
  • CosMx SMI colon cancer data was generated by running a 5-micron slice from an FFPE colon cancer sample through the CosMx SMI instrument, using a 6000-plex RNA panel. Colon cancer cell typing was performed using Insitutype.
  • SPARC Algorithm: Environment expression and confounder matrices were defined by averaging gene expression (or confounder variable values) across each cell’s neighbors.
  • Neighbors were defined by K-nearest or radius-based logic. For example, the neighbors were defined as the 50 cells closest to a cell in XY space. The covariance of the former matrix (X) conditional on the latter (Y) was calculated as $\operatorname{cov}(X \mid Y) = \operatorname{cov}(X) - \operatorname{cov}(X, Y)\,\operatorname{cov}(Y)^{-1}\,\operatorname{cov}(Y, X)$.
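The neighborhood averaging and the conditional covariance described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the SPARC implementation: the function names (`knn_environment`, `cross_cov`, `conditional_cov`) are hypothetical, each cell is assumed to count as one of its own nearest neighbors, and the brute-force neighbor search and small Gauss-Jordan inverse are only practical for toy inputs.

```python
import math

def knn_environment(coords, values, k):
    """Average each column of `values` over every cell's k nearest cells in XY space.

    Assumption: the cell itself is included among its k nearest cells.
    """
    env = []
    n_cols = len(values[0])
    for cx, cy in coords:
        order = sorted(range(len(coords)),
                       key=lambda j: math.hypot(coords[j][0] - cx, coords[j][1] - cy))
        nbrs = order[:k]
        env.append([sum(values[j][c] for j in nbrs) / len(nbrs) for c in range(n_cols)])
    return env

def cross_cov(a, b):
    """Sample cross-covariance between the columns of a and the columns of b."""
    n = len(a)
    mean_a = [sum(col) / n for col in zip(*a)]
    mean_b = [sum(col) / n for col in zip(*b)]
    return [[sum((a[r][i] - mean_a[i]) * (b[r][j] - mean_b[j]) for r in range(n)) / (n - 1)
             for j in range(len(mean_b))] for i in range(len(mean_a))]

def inverse(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("confounder covariance is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def conditional_cov(x, y):
    """cov(X | Y) = cov(X) - cov(X, Y) cov(Y)^-1 cov(Y, X)."""
    cxx, cxy = cross_cov(x, x), cross_cov(x, y)
    cyy, cyx = cross_cov(y, y), cross_cov(y, x)
    adj = matmul(matmul(cxy, inverse(cyy)), cyx)
    return [[cxx[i][j] - adj[i][j] for j in range(len(cxx))] for i in range(len(cxx))]
```

Here `x` would be the environment expression matrix (cells by genes) and `y` the environment confounder matrix (cells by confounders). A gene whose neighborhood expression is fully explained by the confounders ends up with a conditional variance of zero.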
  • Gene modules may be derived from a correlation matrix in many ways.
  • SPARC created a network graph in which all genes sharing conditional correlations above some threshold were connected, then clustered this graph using the Leiden algorithm.
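The graph construction step can be illustrated with a small sketch. The thresholding follows the description above; the clustering step does not: connected components are used here purely as a dependency-free stand-in for the Leiden algorithm, which would normally be supplied by a dedicated graph library. The function names, the `min_size` parameter, and the toy correlation values are all hypothetical.

```python
def correlation_graph(genes, corr, threshold):
    """Connect every pair of genes whose conditional correlation exceeds the threshold."""
    adj = {g: set() for g in genes}
    for i, gi in enumerate(genes):
        for j in range(i + 1, len(genes)):
            if corr[i][j] > threshold:
                adj[gi].add(genes[j])
                adj[genes[j]].add(gi)
    return adj

def candidate_modules(adj, min_size=2):
    """Group genes by connected component (a simple stand-in for Leiden clustering)."""
    seen, modules = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        if len(component) >= min_size:
            modules.append(sorted(component))
    return modules

# Toy conditional correlation matrix for four placeholder genes (invented values).
genes = ["G1", "G2", "G3", "G4"]
corr = [
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.3, 0.0],
    [0.0, 0.3, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
modules = candidate_modules(correlation_graph(genes, corr, threshold=0.2))
```

Unlike connected components, a community detection method such as Leiden can split a large connected graph into several densely connected modules, so this sketch only shows where the clustering step sits in the workflow.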
  • Module scores were calculated as weighted averages of their genes; the default used inverse square root weighting to account for the Poisson-like mean-variance relation seen in count data. Module scores were calculated from single cell expression profiles (“single cell scores”) and from cells’ neighborhood profiles (“environment scores”).
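A minimal sketch of module scoring follows. The source states only that the default weighting is inverse square root; this sketch assumes the weight of each gene is proportional to one over the square root of its mean expression and that weights are normalized to sum to one. The function names and the dictionary-based profile format are hypothetical.

```python
import math

def inverse_sqrt_weights(profiles, module_genes):
    """Weight each gene by 1 / sqrt(mean expression), normalized to sum to 1.

    Assumption: every module gene has a positive mean expression.
    """
    means = {g: sum(p[g] for p in profiles) / len(profiles) for g in module_genes}
    raw = {g: 1.0 / math.sqrt(means[g]) for g in module_genes}
    total = sum(raw.values())
    return {g: w / total for g, w in raw.items()}

def module_scores(profiles, weights):
    """Score each profile as the weighted average of its module genes."""
    return [sum(w * p[g] for g, w in weights.items()) for p in profiles]
```

Passing single cell expression profiles yields "single cell scores", and passing neighborhood-averaged profiles yields "environment scores"; the same weights can be reused for both.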
  • SPARC was run on normalized data, computed by dividing each cell’s expression vector by its sum. Cell type, total counts and total negative control counts were used for confounding variables. SPARC’s default parameters were used throughout.
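The normalization described above amounts to dividing each cell's count vector by its total. A short sketch, in which the function name is hypothetical and cells with zero total counts are assumed to map to all-zero profiles:

```python
def normalize_by_total(counts):
    """Divide each cell's expression vector by its sum; also return the totals."""
    normalized, totals = [], []
    for row in counts:
        total = sum(row)
        totals.append(total)
        # Assumption: a cell with no counts is kept as an all-zero profile.
        normalized.append([v / total for v in row] if total > 0 else [0.0] * len(row))
    return normalized, totals
```

The returned per-cell totals are the same quantity that is used as one of the confounding variables.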
  • Ligands and ligand-receptor pairs were taken from CellChatDB (Jin S, Guerrero-Juarez CF, Zhang L, Chang I, Ramos R, Kuan CH, Myung P, Plikus MV, Nie Q. Inference and analysis of cell-cell communication using CellChat. Nature communications. 2021 Feb 17; 12(1): 1088).
  • Tertiary lymphoid structures were defined by clustering B-cell locations with the dbscan algorithm. The 3 largest clusters (130, 501 and 905 B-cells) were called tertiary lymphoid structures; the next-largest cluster had just 28 B-cells.
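Density-based clustering of cell locations can be sketched with a compact, brute-force DBSCAN. This is an illustration of the technique only: the `eps` and `min_pts` values are not given in the source and must be chosen by the analyst, the function names are hypothetical, and a library implementation would be preferable for real datasets.

```python
import math
from collections import Counter

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    def region(i):
        # All points within eps of point i, including point i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = region(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point joins as a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = region(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # only core points expand the cluster
    return labels

def largest_clusters(labels, n):
    """Return (cluster id, size) for the n largest clusters, ignoring noise."""
    sizes = Counter(label for label in labels if label != -1)
    return sizes.most_common(n)
```

Applied to B-cell XY coordinates, `largest_clusters(labels, 3)` would return the three largest clusters, which is the selection rule described above for calling tertiary lymphoid structures.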
  • Gene expression fold-changes were computed by T-test.
  • SPARC in a colon cancer profiled with the CosMx SMI 6000-plex assay was demonstrated as in Fig. 1A.
  • SPARC begins by taking the expression profile of a small neighborhood around each cell as shown in Fig. 1B and building an “environment expression matrix” as shown in Fig. 1C.
  • Typical methods produce results akin to taking the correlation matrix of the environment matrix as shown in Fig. 1D.
  • SPARC built an “environment confounder matrix” as shown in Fig. 1E to summarize these variables for each cell’s neighborhood.
  • SPARC defined spatial correlation as the correlation matrix of the environment matrix, conditional on the confounder matrix as shown in Fig. 1F.
  • Entries in this conditional correlation matrix measure genes’ tendency to be expressed in the same neighborhoods, beyond what cell type and other confounders can explain.
  • Genes with cell type specific expression showed strong spatial correlation that disappears in the conditional correlation matrix as shown in Fig. 1G. Correlations that remain are not explainable by the trivial factors of the confounding matrix.
  • Most of the strongest spatial correlations in the unadjusted analysis as shown in Fig. 1D were revealed as spurious by the conditional correlation matrix: among the top 5000 gene pairs found without adjusting for confounders, with a range of (0.62, 0.97), only 1018 had correlations > 0.2 in the conditional correlation matrix.
  • the conditional correlation matrix is too complex for human interpretation.
  • SPARC aided comprehension by extracting modules of mutually spatially co-expressed genes as shown in Fig. 1H.
  • These modules provided a view of spatial correlation that was small enough to be understandable and expansive enough to capture complex biology.
  • One module discovered in this example analysis consisted of 17 genes collectively suggestive of tumor-promoting inflammation as illustrated in Figs. 1H, 1I, 1J, 1K, and 1M. This included genes involved in microenvironment remodeling (CCL18, MMP2, CSTK), growth factor signaling (SFRP2, GREM1, DCN, SERPINF1), and inflammation (C3, C1R, PTGIS).
  • Each module was scored with a weighted average of its genes. Module scores were calculated both for cells’ environments and for single cell expression. A spatial map of environment scores for the tumor-promoting inflammation module showed it peaking in the stroma, with smaller hotspots in the tumor bed, as shown in Fig. 1I. As shown in Fig. 1J, single-cell scores for the module showed CAFs and macrophages driving module activity, with nearby mast cells, smooth muscle cells and stromal cells also participating. Zooming in until individual transcripts were resolvable revealed more nuanced behavior of the module genes across cell types and space, as shown in Fig. 1K.
  • Dozens of modules may be discovered in a study. To help identify the modules of greatest interest, SPARC is used to estimate the role of each cell type in each module. Cell type involvement was summarized at the module level to facilitate comparisons between modules as shown in Fig. 1L, or at the gene level to give a more nuanced view of module behavior as shown in Fig. 1M.
  • the tumor-promoting inflammation module was primarily attributed to CAFs and macrophages, with a stromal cell type playing a more minor role (Fig. 1L). Macrophages were primarily responsible for expression of CCL18, F13A1, and PLTP, and CAFs were the main contributor to the remaining genes (Fig. 1M).
  • the genes of the exemplary summary plot illustrated in Fig. 1L can be connected into one or more modules with various cell types.
  • the genes illustrated in the exemplary summary plot in Fig. 1L include, for example, CRHR2_HBA1/2_46, CEACAM3_A1BG_13, IGHG1/2_GCDH_4, IGLL1_GLL5_6, JCHAIN_MZB1_POU2AF1_3, GCG_PYY_2, FCGBP_MUC2_5, OLFM4 ITLN1 DMBT1 MSLN SPIB COL11A2_3, CA2_S100A14_HMGCS2_3, BCAS1_MLPH_2, B3GNT7_PLA2G2A_7, MT1G_MT1X_5, ITM2C_TSPAN3_4, CD24_SELENBP1_KLF5_3, PIGR_CTSS_2, PLAC8_TMPRSS2_17, KRT8_KRT19_4, MT2A_TIMP1_SERPINE1_3, C1S_SERPING1_2, COL4A1_COL4A2_7, FOS_EGR1_12, TAGLN_MYL9_10, THBS2_COL11A1_4, COL6A2_POSTN_BGN_3, LGALS1_SPARC_8, LUM_MMP11_MFAP2_3, COL1A1_COL3A1_7, MMP1_MMP3_4, VEGFA_LOXL2_20, NKD1_APCDD1_FN1_3, ID1_ID3_2, PKM_TPI1_2, IFI6_ISG15_7, LAMC2_MMP7_2, PFN1_S100A6_4, CKB_ELF3_11, APP_JUP_CDH1_3, S100A10_LGALS3_ELOB_3, BHLHE40_NEAT1_2, and HDGF_LMNA_2.
  • the cell types illustrated in the exemplary summary plot in Fig. 1L include, for example, stromal cells type 2b, epithelial cancer cells subtype 2, epithelial cancer cells subtype 1, macrophages, cancer-associated fibroblasts (CAFs), pericytes, endothelial cells type 1, mast cells, stromal cells type 3, endothelial cells, smooth muscle cells, monocytes, stromal cells type 1, CD4 naive T-cells, natural killer (NK) cells, plasmacytoid dendritic cells (pDCs), CD8 memory T-cells, neutrophils, glial cells, myeloid dendritic cells (mDCs), Tregs, stromal cells type 4, B-cells, stromal cells type 2a, epithelial normal cells in cellular crypts, plasmablasts, plasma cells, epithelial normal villi subtype 1, and epithelial normal cells of unclear origin.
  • SPARC used subsets of cells to reduce computation time when possible.
  • The complete SPARC workflow took less than 15 minutes on a 5.12xlarge EC2 instance server to analyze this dataset of 112,846 cells and 6,000 genes.
  • Example 2: A knowledge-driven (biology-first) workflow
  • The algorithm may be applied to a subset of the genes.
  • The genes may be involved in a given pathway.
  • The pathway may comprise hypoxia, apoptosis, proliferation, or almost any pathway from the GO, KEGG, or Reactome databases.
  • This re-analysis produced 18 modules containing 51 ligands, many arising from multiple cell types, as shown in Figs. 2A and 2B. Each ligand may be associated with various cell types, as shown in the exemplary summary plot of Fig. 2B. The ligands associated with the various cell types as illustrated in the exemplary summary plot of Fig. 2B include, for example, CCL19 CCL21 2, HLA-DRA HLA-DPA1 5, HLA-G HLA-F HLA-E 3, COL4A1 COL4A2 LAMA4 3, COL1A1 COL6A1 6, COMP THBS2 FN1 3, VEGFA ADM 5, CCL18 CCL13 C3 3, GAL_VIP_2, and GDF6 WNT9A 2.
  • The cell types illustrated in the exemplary summary plot of Fig. 2B include, for example, stromal cells type 3, mast cells, glial cells, B-cells, myeloid dendritic cells (mDCs), stromal cells type 4, Tregs, epithelial normal villi subtype 1, epithelial normal cells in cellular crypts, CD4 naive T-cells, CD8 memory T-cells, endothelial cells type 1, stromal cells type 2a, natural killer (NK) cells, plasmablasts, monocytes, stromal cells type 1, neutrophils, plasmacytoid dendritic cells (pDCs), plasma cells, epithelial normal cells of unclear origin, endothelial cells, pericytes, smooth muscle cells, stromal cells type 2b, epithelial cancer cells subtype 2, epithelial cancer cells subtype 1, macrophages, and cancer-associated fibroblasts (CAFs).
  • Ligand-receptor pairs may also be applied for another use case. If a ligand-receptor pair displays spatial correlation, it suggests these genes are co-regulated, presumably either by the ligand increasing the receptor’s expression or via some latent variable inducing regional expression in both genes (Li 2023). As shown in Fig. 2F, of the 555 ligand-receptor pairs in this panel, very few showed evidence for spatial co-regulation: only 11 had conditional correlation > 0.1. One highly correlated pair was FCER2 and CR2, both primarily expressed by B-cells. As shown in Fig.
  • SPARC can also be used to explore individual genes of high prior interest.
  • The correlation network around FCER2 and CR2 was examined. When looking only at conditional correlations > 0.1, FCER2 had no further connections, but CR2 belonged to a densely connected network of 10 additional genes, many involved in B-cell development and activation (citations needed). Under a strong prior that causal arrows point from the ligand FCER2 to the receptor CR2, it would be reasonable to invest time exploring the hypothesis that the additional genes connected to CR2 are activated downstream of FCER2-CR2 signaling. Thus, SPARC results can be used to hint at the downstream effects of a ligand-receptor interaction.
  • SPARC supports two workflows: data-driven hypothesis generation via clustering of panel-wide correlation results, and knowledge-driven hypothesis testing via examination of correlations among genes of prior interest. In both cases, tools are provided to aid deeper explorations.
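The knowledge-driven exploration described above, in which ligand-receptor pairs are filtered by conditional correlation > 0.1 and the network around a gene of interest is then inspected, can be illustrated with a minimal sketch. The function names `pairs_above_threshold` and `correlated_partners`, the placeholder genes `GENE_A` to `GENE_C`, and every numeric value in the toy matrix are assumptions made for illustration; none of them come from the disclosure.

```python
# Toy conditional correlation matrix; all values are illustrative only.
genes = ["FCER2", "CR2", "GENE_A", "GENE_B", "GENE_C"]
n = len(genes)
corr = [[1.0 if i == j else 0.02 for j in range(n)] for i in range(n)]
for a, b, r in [("FCER2", "CR2", 0.30), ("CR2", "GENE_A", 0.25), ("CR2", "GENE_B", 0.15)]:
    i, j = genes.index(a), genes.index(b)
    corr[i][j] = corr[j][i] = r  # keep the matrix symmetric


def pairs_above_threshold(pairs, genes, corr, threshold=0.1):
    """Keep the (ligand, receptor) pairs whose conditional correlation exceeds the threshold."""
    kept = []
    for ligand, receptor in pairs:
        i, j = genes.index(ligand), genes.index(receptor)
        if corr[i][j] > threshold:
            kept.append((ligand, receptor))
    return kept


def correlated_partners(gene, genes, corr, threshold=0.1):
    """List every other gene whose conditional correlation with `gene` exceeds the threshold."""
    i = genes.index(gene)
    return [genes[j] for j in range(len(genes)) if j != i and corr[i][j] > threshold]


candidate_pairs = [("FCER2", "CR2"), ("GENE_A", "GENE_C")]
kept_pairs = pairs_above_threshold(candidate_pairs, genes, corr)
fcer2_partners = correlated_partners("FCER2", genes, corr)
cr2_partners = correlated_partners("CR2", genes, corr)
```

In this toy matrix the ligand has no partner other than its receptor, while the receptor connects to further genes, which mirrors the shape of the FCER2 and CR2 observation without reproducing its data.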


Abstract

The present disclosure provides methods and systems for identifying gene modules with spatial correlations through clustering of panel-wide correlation results. The panel-wide correlation results can include location data and transcriptomic data.

Description

SYSTEMS AND METHODS FOR CELLULAR SPATIAL ANALYSIS
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Application No. 63/578,444, filed August 24, 2023, which application is incorporated herein by reference in its entirety and to which application we claim priority under 35 U.S.C. § 120.
BACKGROUND
[0002] Single cell spatial transcriptomics data, in which hundreds to thousands of genes are measured in situ across potentially millions of cells, poses the most daunting version yet of the oldest problem in “omics,” namely, discovery of all the interesting biology contained within a dataset. More specifically, an exploratory data analysis will search for trends that are biologically non-obvious, that are intelligible to a human mind, and that do not arise from artifacts in assay chemistry or data analysis.
SUMMARY OF THE INVENTION
[0003] One class of methods designed to fish biology from the data deluge of spatial transcriptomics looks for spatially correlated sets of genes, that is, genes that tend to be expressed in the same regions. Spatial correlation between genes can arise through direct cell-cell communication, or from some underlying latent variable; both these mechanisms are of interest.
[0004] One failing keeps spatial correlation methods from widespread use: most cell types are spatially organized, which induces spatial correlation among any genes whose expression varies across cell types. Thus, spatial correlation often provides little more than an oblique readout of a tissue’s cell type layout. Some methods avoid this pitfall by wrapping cell type into highly specific models; this approach, however, limits the diversity of trends they can discover.
[0005] In one aspect, disclosed herein is a method for analyzing a biological sample to identify genes having spatial correlations with one another, the method comprising: (a) retrieving, by a computer processor and from a database: a location data indicative of relative positions of the plurality of cells in a multi-dimensional image of the biological sample; and a transcriptomic data of a plurality of genes of the plurality of cells; (b) processing, by the computer processor, at least the transcriptomic data to generate a gene expression matrix characterizing each cell of the plurality of cells by a gene expression level of the plurality of genes; (c) analyzing, by the computer processor, the location data and the transcriptomic data to generate an environment confounder matrix, the environment confounder matrix characterizes each cell of the plurality of cells by one or more environment variables of a region adjacent to or surrounding the each cell in the multi-dimensional image, the one or more environment variables are based at least in part on (i) cell classification of one or more cells in at least the region or (ii) measurement artifact of the transcriptomic data in at least the region; (d) determining, by the computer processor, a correlation between the plurality of genes of the gene expression matrix conditional on the environment confounder matrix; and (e) identifying, by the computer processor and based on the correlation determined in (d), at least one subset of the plurality of genes, genes of the at least one subset are mutually correlated to one another in the biological sample.
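Step (d) above, a correlation between genes conditional on the environment confounder matrix, can be illustrated with a minimal sketch. It assumes one common construction, which is not necessarily the disclosed implementation: each gene is linearly regressed on the confounders and the correlation is taken between the residuals. The function names and the toy data are hypothetical.

```python
import math


def center(values):
    """Subtract the mean from a list of numbers."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def remove_projection(vector, basis):
    """Subtract from `vector` its projection onto each (mutually orthogonal) basis vector."""
    for b in basis:
        coef = dot(vector, b) / dot(b, b)
        vector = [x - coef * y for x, y in zip(vector, b)]
    return vector


def residualize(columns, confounders):
    """Return each column with the part linearly explained by the confounders removed."""
    basis = []
    for c in confounders:
        r = remove_projection(center(c), basis)  # Gram-Schmidt step
        if dot(r, r) > 1e-12:  # skip confounders that add no new direction
            basis.append(r)
    return [remove_projection(center(col), basis) for col in columns]


def correlation_matrix(columns):
    """Pearson correlation between every pair of columns."""
    cols = [center(c) for c in columns]
    norms = [math.sqrt(dot(c, c)) for c in cols]
    size = len(cols)
    return [[dot(cols[i], cols[j]) / (norms[i] * norms[j]) if norms[i] and norms[j] else 0.0
             for j in range(size)] for i in range(size)]


def conditional_correlation(genes, confounders):
    """Correlation between genes after regressing out the confounders (an assumed approach)."""
    return correlation_matrix(residualize(genes, confounders))


# Toy data for 8 cells: one confounder (e.g. a neighborhood cell type indicator) and
# two independent signals that are orthogonal to it.
confounder = [1, 1, 1, 1, -1, -1, -1, -1]
signal_1 = [1, 1, -1, -1, 1, 1, -1, -1]
signal_2 = [1, -1, 1, -1, 1, -1, 1, -1]
gene_a = [5 * z + s for z, s in zip(confounder, signal_1)]
gene_b = [5 * z + s for z, s in zip(confounder, signal_2)]
gene_c = [-3 * z + s for z, s in zip(confounder, signal_1)]

raw = correlation_matrix([gene_a, gene_b, gene_c])
cond = conditional_correlation([gene_a, gene_b, gene_c], [confounder])
```

In this toy example `gene_a` and `gene_b` are strongly correlated in `raw` only because both follow the confounder, and that correlation vanishes in `cond`; `gene_a` and `gene_c` share a signal that the confounder masks in `raw` but that is recovered in `cond`.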
[0006] In some aspects, (d) further comprises determining a degree of the correlation, and (e) comprises identifying the at least one subset based on the degree of correlation. In some aspects, the identifying in (e) is based at least in part on determining a threshold level of the degree of correlation. In some aspects, the determining the correlation in (d) comprises analyzing covariance of the gene expression matrix conditional on the environment confounder matrix. In some aspects, the method further comprises, in (d), generating a conditional correlation matrix of the plurality of genes based on the covariance, the conditional correlation matrix is different from a correlation matrix of the gene expression matrix. In some aspects, the at least one subset comprises a plurality of subsets that are different from one another. In some aspects, the plurality of subsets comprises a first subset having a first plurality of genes and a second subset having a second plurality of genes, the first plurality of genes has at least one gene that is not in common with the second plurality of genes. In some aspects, the gene expression matrix comprises an environment expression matrix, the environment expression matrix characterizes a cell of the plurality of cells by analyzing the gene expression level of the plurality of genes of nearest neighboring cells of the cell within the multi-dimensional image, and the method comprises, in (b), processing the transcriptomic data and the location data to generate the environment expression matrix. In some aspects, a number of the nearest neighboring cells is at most about 1,000, at most about 500, at most about 100, or at most about 50. In some aspects, the method further comprises displaying, via a graphical user interface, the genes of the at least one subset to a user. 
In some aspects, the method further comprises, based on the analyzing in (d), generating a gene cluster map comprising a plurality of shapes representing the plurality of genes, the plurality of shapes is arranged in a plurality of clusters, a cluster of the plurality of clusters corresponds to the at least one subset. In some aspects, the method further comprises displaying, via a graphical user interface, the gene cluster map to a user. In some aspects, the multi-dimensional image is a two-dimensional image. In some aspects, the method further comprises, subsequent to (e), scoring each cell of the plurality of cells based on single cell expression level of the genes of the at least one subset. In some aspects, the method further comprises generating an additional multi-dimensional image of the biological sample based on the scoring. In some aspects, the method further comprises, in (c), receiving selection of the one or more environment variables from a user via a graphical user interface. In some aspects, the one or more environment variables are based on both (i) the cell classification in the at least the region and (ii) the measurement artifact of the transcriptomic data in the at least the region. In some aspects, the measurement artifact comprises detection data via a synthetic control probe sequence. 
In some aspects, the one or more environment variables are based at least in part on data comprising one or more of (i) a number of cells having a cell classification of interest in at least the region of the multi-dimensional image, (ii) a number of different cell classifications identified in at least the region of the multi-dimensional image, (iii) a ratio between numbers of cells of two different cell classifications of interest in at least the region of the multi-dimensional image, or (iv) a relative location between a cell having a cell classification of interest and a tissue substructure in at least a portion of the multi-dimensional image, or any combination thereof. In some aspects, the region can be characterized by having at most about 5 cells, at most about 10 cells, at most about 20 cells, at most about 50 cells, or at most about 100 cells. In some aspects, a number of the one or more environment variables in the environment confounder matrix is at least about 5, at least about 10, at least about 15, or at least about 20. In some aspects, the cell classification comprises one or more of endothelial cells, epithelial cells, dermal cells, endodermal cells, mesodermal cells, fibroblasts, osteocytes, chondrocytes, immune cells, dendritic cells, hepatic cells, pancreatic cells, or stromal cells, or any combination thereof. 
In some aspects, the cell classification comprises one or more of salivary gland mucous cells, salivary gland serous cells, von Ebner's gland cells, mammary gland cells, lacrimal gland cells, ceruminous gland cells, eccrine sweat gland dark cells, eccrine sweat gland clear cells, apocrine sweat gland cells, gland of Moll cells, sebaceous gland cells, bowman's gland cells, Brunner's gland cells, seminal vesicle cells, prostate gland cells, bulbourethral gland cells, Bartholin's gland cells, gland of Littre cells, uterus endometrium cells, isolated goblet cells, stomach lining mucous cells, gastric gland zymogenic cells, gastric gland oxyntic cells, pancreatic acinar cells, Paneth cells, type II pneumocytes, Clara cells, somatotropes, lactotropes, thyrotropes, gonadotropes, corticotropes, intermediate pituitary cells, magnocellular neurosecretory cells, gut cells, respiratory tract cells, thyroid epithelial cells, parafollicular cells, parathyroid gland cells, parathyroid chief cell, oxyphil cell, adrenal gland cells, chromaffin cells, Leydig cells, theca interna cells, corpus luteum cells, granulosa lutein cells, theca lutein cells, juxtaglomerular cell, macula densa cells, peripolar cells, mesangial cell, blood vessel and lymphatic vascular endothelial fenestrated cells, blood vessel and lymphatic vascular endothelial continuous cells, blood vessel and lymphatic vascular endothelial splenic cells, synovial cells, serosal cell, squamous cells, columnar cells, dark cells, vestibular membrane cell, stria vascularis basal cells, stria vascularis marginal cell, cells of Claudius, cells of Boettcher, choroid plexus cells, pia-arachnoid squamous cells, pigmented ciliary epithelium cells, nonpigmented ciliary epithelium cells, corneal endothelial cells, peg cells, respiratory tract ciliated cells, oviduct ciliated cell, uterine endometrial ciliated cells, rete testis ciliated cells, ductulus efferens ciliated cells, ciliated ependymal cells, epidermal keratinocytes, 
epidermal basal cells, keratinocyte of fingernails and toenails, nail bed basal cells, medullary hair shaft cells, cortical hair shaft cells, cuticular hair shaft cells, cuticular hair root sheath cells, hair root sheath cells of Huxley's layer, hair root sheath cells of Henle's layer, external hair root sheath cells, hair matrix cells, surface epithelial cells of stratified squamous epithelium, basal cell of epithelia, urinary epithelium cells, auditory inner hair cells of organ of Corti, auditory outer hair cells of organ of Corti, basal cells of olfactory epithelium, cold-sensitive primary sensory neurons, heat-sensitive primary sensory neurons, Merkel cells of epidermis, olfactory receptor neurons, pain-sensitive primary sensory neurons, photoreceptor rod cells, photoreceptor blue-sensitive cone cells, photoreceptor green-sensitive cone cells, photoreceptor red-sensitive cone cells, proprioceptive primary sensory neurons, touch-sensitive primary sensory neurons, type I carotid body cells, type II carotid body cell, type I hair cell of vestibular apparatus of ear, type II hair cells of vestibular apparatus of ear, type I taste bud cells, cholinergic neural cells, adrenergic neural cells, peptidergic neural cells, inner pillar cells of organ of Corti, outer pillar cells of organ of Corti, inner phalangeal cells of organ of Corti, outer phalangeal cells of organ of Corti, border cells of organ of Corti, Hensen cells of organ of Corti, vestibular apparatus supporting cells, taste bud supporting cells, olfactory epithelium supporting cells, Schwann cells, satellite cells, enteric glial cells, astrocytes, neurons, oligodendrocytes, spindle neurons, anterior lens epithelial cells, crystallin-containing lens fiber cells, hepatocytes, adipocytes, white fat cells, brown fat cells, liver lipocytes, kidney glomerulus parietal cells, kidney glomerulus podocytes, kidney proximal tubule brush border cells, loop of Henle thin segment cells, kidney distal tubule cells, kidney 
collecting duct cells, type I pneumocytes, pancreatic duct cells, nonstriated duct cells, duct cells, intestinal brush border cells, exocrine gland striated duct cells, gall bladder epithelial cells, ductulus efferens nonciliated cells, epididymal principal cells, epididymal basal cells, ameloblast epithelial cells, planum semilunatum epithelial cells, organ of Corti interdental epithelial cells, loose connective tissue fibroblasts, corneal keratocytes, tendon fibroblasts, bone marrow reticular tissue fibroblasts, nonepithelial fibroblasts, pericytes, nucleus pulposus cells, cementoblast/cementocytes, odontoblasts, odontocytes, hyaline cartilage chondrocytes, fibrocartilage chondrocytes, elastic cartilage chondrocytes, osteoblasts, osteocytes, osteoclasts, osteoprogenitor cells, hyalocytes, stellate cells, hepatic stellate cells, pancreatic stellate cells, red skeletal muscle cells, white skeletal muscle cells, intermediate skeletal muscle cells, nuclear bag cells of muscle spindle, nuclear chain cells of muscle spindle, satellite cells, ordinary heart muscle cells, nodal heart muscle cells, Purkinje fiber cells, smooth muscle cells, myoepithelial cells of iris, myoepithelial cell of exocrine glands, reticulocytes, megakaryocytes, monocytes, connective tissue macrophages, epidermal Langerhans cells, dendritic cells, microglial cells, neutrophils, eosinophils, basophils, mast cell, helper T cells, suppressor T cells, cytotoxic T cell, natural Killer T cells, B cells, natural killer cells, melanocytes, retinal pigmented epithelial cells, oogonia/oocytes, spermatids, spermatocytes, spermatogonium cells, spermatozoa, ovarian follicle cells, Sertoli cells, thymus epithelial cell, and/or interstitial kidney cells. 
In some aspects, the cell classification comprises one or more of embryonic stem cells, embryonic germ cells, induced pluripotent stem cells, mesenchymal stem cells, bone marrow-derived mesenchymal stem cells, bone marrow-derived mesenchymal stromal cells, tissue plastic-adherent placental stem cells (PDACs), umbilical cord stem cells, amniotic fluid stem cells, amnion derived adherent cells (AMDACs), osteogenic placental adherent cells (OPACs), adipose stem cells, limbal stem cells, dental pulp stem cells, myoblasts, endothelial progenitor cells, neuronal stem cells, exfoliated teeth derived stem cells, hair follicle stem cells, dermal stem cells, parthenogenically derived stem cells, reprogrammed stem cells, amnion derived adherent cells, or side population stem cells, or any combination thereof. In some aspects, the transcriptomic data comprises one or more of gene expression assays with fluorescently labeled probes, RNA sequencing (RNA-seq), microarray analysis, reverse transcription polymerase chain reaction (RT-PCR), cap analysis of gene expression, or single-cell RNA sequencing (scRNA-seq), or any combination thereof. In some aspects, the plurality of cells comprises at least about 100 cells, at least about 200 cells, at least about 500 cells, or at least about 1,000 cells. In some aspects, the plurality of genes comprises at least about 10 genes, at least about 20 genes, at least about 50 genes, or at least about 100 genes. In some aspects, the plurality of genes are endogenous genes. In some aspects, the plurality of genes comprises about from 5,000 genes to 6,000 genes. In some aspects, the plurality of genes comprises about from 19,000 genes to 20,000 genes. 
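The environment expression matrix described in paragraph [0006], which characterizes each cell by the expression of its nearest neighboring cells, can be sketched as follows. This is only an illustration under stated assumptions: neighbors are found by brute-force Euclidean distance in a two-dimensional image, the cell itself is excluded from its own neighborhood, and neighbor expression is summarized by the mean. The function names are hypothetical.

```python
import math


def nearest_neighbors(positions, k):
    """For each cell, return the indices of its k nearest other cells (brute force)."""
    neighbors = []
    for i, (xi, yi) in enumerate(positions):
        distances = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(positions)
            if j != i  # assumption: a cell is not its own neighbor
        )
        neighbors.append([j for _, j in distances[:k]])
    return neighbors


def environment_expression(expression, positions, k):
    """Mean expression of each gene over each cell's k nearest neighbors.

    expression: one row per cell, one column per gene.
    positions:  one (x, y) coordinate per cell, in the same order.
    """
    n_genes = len(expression[0])
    env = []
    for nbrs in nearest_neighbors(positions, k):
        env.append([sum(expression[j][g] for j in nbrs) / len(nbrs) for g in range(n_genes)])
    return env


# Toy data: five cells on a line, two genes.
positions = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
expression = [[1, 0], [3, 0], [5, 2], [0, 10], [0, 20]]
env = environment_expression(expression, positions, k=2)
```

For panels with many cells a spatial index would normally replace the quadratic brute-force search; the sketch keeps the search explicit so that the definition of the neighborhood stays visible.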
[0007] In one aspect, disclosed herein is a system comprising one or more computer processors and computer memory coupled thereto, the computer memory comprising machine executable code that, upon execution by the one or more computer processors, analyzes a biological sample to identify genes having spatial correlations with one another, the system comprising: (a) a software module configured to retrieve, from a database: a location data indicative of relative positions of the plurality of cells in a multi-dimensional image of the biological sample; and a transcriptomic data of a plurality of genes of the plurality of cells; (b) a software module configured to process at least the transcriptomic data to generate a gene expression matrix characterizing each cell of the plurality of cells by a gene expression level of the plurality of genes; (c) a software module configured to analyze the location data and the transcriptomic data to generate an environment confounder matrix, the environment confounder matrix characterizes each cell of the plurality of cells by one or more environment variables of a region adjacent to or surrounding the each cell in the multi-dimensional image, the one or more environment variables are based at least in part on (i) cell classification of one or more cells in at least the region or (ii) measurement artifact of the transcriptomic data in at least the region; (d) a software configured to determine a correlation between the plurality of genes of the gene expression matrix conditional on the environment confounder matrix; and (e) a software configured to identify, based on the correlation determined in (d), at least one subset of the plurality of genes, genes of the at least one subset are mutually correlated to one another in the biological sample.
[0008] In some aspects, the software of (d) is further configured to determine a degree of the correlation, and the software of (e) is configured to identify the at least one subset based on the degree of correlation. In some aspects, identifying in (e) is based at least in part on determining a threshold level of the degree of correlation. In some aspects, determining the correlation in (d) comprises analyzing covariance of the gene expression matrix conditional on the environment confounder matrix. In some aspects, the system further comprises, in (d), a software configured to generate a conditional correlation matrix of the plurality of genes based on the covariance, the conditional correlation matrix is different from a correlation matrix of the gene expression matrix. In some aspects, the at least one subset comprises a plurality of subsets that are different from one another. In some aspects, the plurality of subsets comprises a first subset having a first plurality of genes and a second subset having a second plurality of genes, the first plurality of genes has at least one gene that is not in common with the second plurality of genes. In some aspects, the gene expression matrix comprises an environment expression matrix, the environment expression matrix characterizes a cell of the plurality of cells by analyzing the gene expression level of the plurality of genes of nearest neighboring cells of the cell within the multi-dimensional image, and the system comprises, in (b), a software configured to process the transcriptomic data and the location data to generate the environment expression matrix. In some aspects, a number of the nearest neighboring cells is at most about 1,000, at most about 500, at most about 100, or at most about 50. In some aspects, the system further comprises a software configured to display, via a graphical user interface, the genes of the at least one subset to a user. 
In some aspects, the system further comprises, based on the analyzing in (d), a software configured to generate a gene cluster map comprising a plurality of shapes representing the plurality of genes, the plurality of shapes is arranged in a plurality of clusters, a cluster of the plurality of clusters corresponds to the at least one subset. In some aspects, the system further comprises a software configured to display, via a graphical user interface, the gene cluster map to a user. In some aspects, the multi-dimensional image is a two-dimensional image. In some aspects, the system further comprises, subsequent to (e), a software configured to score each cell of the plurality of cells based on single cell expression level of the genes of the at least one subset. In some aspects, the system further comprises a software configured to generate an additional multi-dimensional image of the biological sample based on the scoring. In some aspects, the system further comprises, in (c), a software configured to receive selection of the one or more environment variables from a user via a graphical user interface. In some aspects, the one or more environment variables are based on both (i) the cell classification in the at least the region and (ii) the measurement artifact of the transcriptomic data in the at least the region. In some aspects, the measurement artifact comprises detection data via a synthetic control probe sequence. 
In some aspects, the one or more environment variables are based at least in part on one or more of (i) a number of cells having a cell classification of interest in at least the region of the multi-dimensional image, (ii) a number of different cell classifications identified in at least the region of the multi-dimensional image, (iii) a ratio between numbers of cells of two different cell classifications of interest in at least the region of the multi-dimensional image, or (iv) a relative location between a cell having a cell classification of interest and a tissue substructure in at least a portion of the multi-dimensional image, or any combination thereof. In some aspects, the region can be characterized by having at most about 5 cells, at most about 10 cells, at most about 20 cells, at most about 50 cells, or at most about 100 cells. In some aspects, a number of the one or more environment variables in the environment confounder matrix is at least about 5, at least about 10, at least about 15, or at least about 20. In some aspects, the cell classification comprises one or more of endothelial cells, epithelial cells, dermal cells, endodermal cells, mesodermal cells, fibroblasts, osteocytes, chondrocytes, immune cells, dendritic cells, hepatic cells, pancreatic cells, or stromal cells, or any combination thereof. 
In some aspects, the cell classification comprises one or more of salivary gland mucous cells, salivary gland serous cells, von Ebner's gland cells, mammary gland cells, lacrimal gland cells, ceruminous gland cells, eccrine sweat gland dark cells, eccrine sweat gland clear cells, apocrine sweat gland cells, gland of Moll cells, sebaceous gland cells, bowman's gland cells, Brunner's gland cells, seminal vesicle cells, prostate gland cells, bulbourethral gland cells, Bartholin's gland cells, gland of Littre cells, uterus endometrium cells, isolated goblet cells, stomach lining mucous cells, gastric gland zymogenic cells, gastric gland oxyntic cells, pancreatic acinar cells, Paneth cells, type II pneumocytes, Clara cells, somatotropes, lactotropes, thyrotropes, gonadotropes, corticotropes, intermediate pituitary cells, magnocellular neurosecretory cells, gut cells, respiratory tract cells, thyroid epithelial cells, parafollicular cells, parathyroid gland cells, parathyroid chief cell, oxyphil cell, adrenal gland cells, chromaffin cells, Leydig cells, theca interna cells, corpus luteum cells, granulosa lutein cells, theca lutein cells, juxtaglomerular cell, macula densa cells, peripolar cells, mesangial cell, blood vessel and lymphatic vascular endothelial fenestrated cells, blood vessel and lymphatic vascular endothelial continuous cells, blood vessel and lymphatic vascular endothelial splenic cells, synovial cells, serosal cell, squamous cells, columnar cells, dark cells, vestibular membrane cell, stria vascularis basal cells, stria vascularis marginal cell, cells of Claudius, cells of Boettcher, choroid plexus cells, pia-arachnoid squamous cells, pigmented ciliary epithelium cells, nonpigmented ciliary epithelium cells, corneal endothelial cells, peg cells, respiratory tract ciliated cells, oviduct ciliated cell, uterine endometrial ciliated cells, rete testis ciliated cells, ductulus efferens ciliated cells, ciliated ependymal cells, epidermal keratinocytes, 
epidermal basal cells, keratinocyte of fingernails and toenails, nail bed basal cells, medullary hair shaft cells, cortical hair shaft cells, cuticular hair shaft cells, cuticular hair root sheath cells, hair root sheath cells of Huxley's layer, hair root sheath cells of Henle's layer, external hair root sheath cells, hair matrix cells, surface epithelial cells of stratified squamous epithelium, basal cell of epithelia, urinary epithelium cells, auditory inner hair cells of organ of Corti, auditory outer hair cells of organ of Corti, basal cells of olfactory epithelium, cold-sensitive primary sensory neurons, heat-sensitive primary sensory neurons, Merkel cells of epidermis, olfactory receptor neurons, pain-sensitive primary sensory neurons, photoreceptor rod cells, photoreceptor blue-sensitive cone cells, photoreceptor green-sensitive cone cells, photoreceptor red-sensitive cone cells, proprioceptive primary sensory neurons, touch-sensitive primary sensory neurons, type I carotid body cells, type II carotid body cell, type I hair cell of vestibular apparatus of ear, type II hair cells of vestibular apparatus of ear, type I taste bud cells, cholinergic neural cells, adrenergic neural cells, peptidergic neural cells, inner pillar cells of organ of Corti, outer pillar cells of organ of Corti, inner phalangeal cells of organ of Corti, outer phalangeal cells of organ of Corti, border cells of organ of Corti, Hensen cells of organ of Corti, vestibular apparatus supporting cells, taste bud supporting cells, olfactory epithelium supporting cells, Schwann cells, satellite cells, enteric glial cells, astrocytes, neurons, oligodendrocytes, spindle neurons, anterior lens epithelial cells, crystallin-containing lens fiber cells, hepatocytes, adipocytes, white fat cells, brown fat cells, liver lipocytes, kidney glomerulus parietal cells, kidney glomerulus podocytes, kidney proximal tubule brush border cells, loop of Henle thin segment cells, kidney distal tubule cells, kidney 
collecting duct cells, type I pneumocytes, pancreatic duct cells, nonstriated duct cells, duct cells, intestinal brush border cells, exocrine gland striated duct cells, gall bladder epithelial cells, ductulus efferens nonciliated cells, epididymal principal cells, epididymal basal cells, ameloblast epithelial cells, planum semilunatum epithelial cells, organ of Corti interdental epithelial cells, loose connective tissue fibroblasts, corneal keratocytes, tendon fibroblasts, bone marrow reticular tissue fibroblasts, nonepithelial fibroblasts, pericytes, nucleus pulposus cells, cementoblast/cementocytes, odontoblasts, odontocytes, hyaline cartilage chondrocytes, fibrocartilage chondrocytes, elastic cartilage chondrocytes, osteoblasts, osteocytes, osteoclasts, osteoprogenitor cells, hyalocytes, stellate cells, hepatic stellate cells, pancreatic stellate cells, red skeletal muscle cells, white skeletal muscle cells, intermediate skeletal muscle cells, nuclear bag cells of muscle spindle, nuclear chain cells of muscle spindle, satellite cells, ordinary heart muscle cells, nodal heart muscle cells, Purkinje fiber cells, smooth muscle cells, myoepithelial cells of iris, myoepithelial cell of exocrine glands, reticulocytes, megakaryocytes, monocytes, connective tissue macrophages, epidermal Langerhans cells, dendritic cells, microglial cells, neutrophils, eosinophils, basophils, mast cell, helper T cells, suppressor T cells, cytotoxic T cell, natural Killer T cells, B cells, natural killer cells, melanocytes, retinal pigmented epithelial cells, oogonia/oocytes, spermatids, spermatocytes, spermatogonium cells, spermatozoa, ovarian follicle cells, Sertoli cells, thymus epithelial cell, and/or interstitial kidney cells. 
In some aspects, the cell classification comprises one or more of embryonic stem cells, embryonic germ cells, induced pluripotent stem cells, mesenchymal stem cells, bone marrow-derived mesenchymal stem cells, bone marrow-derived mesenchymal stromal cells, tissue plastic-adherent placental stem cells (PDACs), umbilical cord stem cells, amniotic fluid stem cells, amnion derived adherent cells (AMDACs), osteogenic placental adherent cells (OPACs), adipose stem cells, limbal stem cells, dental pulp stem cells, myoblasts, endothelial progenitor cells, neuronal stem cells, exfoliated teeth derived stem cells, hair follicle stem cells, dermal stem cells, parthenogenically derived stem cells, reprogrammed stem cells, amnion derived adherent cells, or side population stem cells, or any combination thereof. In some aspects, the transcriptomic data comprises one or more of gene expression assays with fluorescently labeled probes, RNA sequencing (RNA-seq), microarray analysis, reverse transcription polymerase chain reaction (RT-PCR), cap analysis of gene expression, or single-cell RNA sequencing (scRNA-seq), or any combination thereof. In some aspects, the plurality of cells comprises at least about 100 cells, at least about 200 cells, at least about 500 cells, or at least about 1,000 cells. In some aspects, the plurality of genes comprises at least about 10 genes, at least about 20 genes, at least about 50 genes, or at least about 100 genes. In some aspects, the plurality of genes are endogenous genes. In some aspects, the plurality of genes comprises about from 5,000 genes to 6,000 genes. In some aspects, the plurality of genes comprises about 19,000 genes.
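The environment variables listed in paragraph [0008], namely (i) counts of cell classifications of interest, (ii) the number of different classifications, and (iii) a ratio between two classifications within a cell's region, can be assembled into an environment confounder matrix along the following lines. This is a sketch under assumptions: the neighbor lists are taken as given, the function name `confounder_matrix` is hypothetical, and the pseudocount of 1 in the ratio is an illustrative choice to avoid division by zero rather than anything stated in the disclosure.

```python
def confounder_matrix(cell_types, neighbors, types_of_interest, ratio_pair):
    """Build one row of environment variables per cell.

    cell_types:        classification label of each cell.
    neighbors:         for each cell, the indices of the cells in its region.
    types_of_interest: labels to count in each region, variable (i).
    ratio_pair:        two labels (a, b) whose ratio a/b is reported, variable (iii).
    """
    a, b = ratio_pair
    rows = []
    for nbrs in neighbors:
        labels = [cell_types[j] for j in nbrs]
        counts = [labels.count(t) for t in types_of_interest]  # (i)
        n_distinct = len(set(labels))                           # (ii)
        # (iii) with a pseudocount of 1 on both sides (assumption)
        ratio = (labels.count(a) + 1) / (labels.count(b) + 1)
        rows.append(counts + [n_distinct, ratio])
    return rows


# Toy data: five cells with hand-written neighbor lists.
cell_types = ["tumor", "tumor", "fibroblast", "macrophage", "fibroblast"]
neighbors = [[1, 2], [0, 2], [1, 3], [2, 4], [3, 2]]
confounders = confounder_matrix(
    cell_types,
    neighbors,
    types_of_interest=["tumor", "fibroblast"],
    ratio_pair=("tumor", "fibroblast"),
)
```

Variable (iv), a relative location with respect to a tissue substructure, and measurement-artifact variables such as control probe counts would be appended as further columns in the same way; they are omitted here because they depend on inputs the sketch does not model.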
[0009] The method and system described herein provide means for quickly identifying spatial correlations meriting attention. The method and system described herein provide means for identifying gene modules with spatial correlations that cannot be explained by trivial factors like the cell type landscape or technical effects. It typically discovers dozens of such modules. To steer analysts towards the most interesting clusters, the method and system described herein provide means to implicate cell types in module activity and to describe module spatial patterns. The method and system described herein provide a powerful and convenient way to quickly identify spatial transcriptomics trends that deserve scarce analyst attention.
INCORPORATION BY REFERENCE
[0010] All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] The patent application file contains at least one drawing executed in color. Copies of this patent application with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
[0012] The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:
[0013] Fig. 1A shows a non-limiting example of cell type map of a colon cancer.
[0014] Fig. 1B shows a non-limiting example of a cell’s nearest neighbors.
[0015] Fig. 1C shows a non-limiting example of subset of the environment expression matrix.
[0016] Fig. 1D shows a non-limiting example of raw correlation matrix of the environment expression matrix showing near-ubiquitous correlations.
[0017] Fig. 1E shows a non-limiting example of subset of the environment confounding matrix, encoding cell type abundance and other confounding variables in each cell’s neighborhood.
[0018] Fig. 1F shows a non-limiting example of correlation matrix of the environment matrix conditional on the confounding matrix, over the same subset of genes.
[0019] Fig. 1G shows a non-limiting example of raw vs. conditional correlation of environment gene expression. Selected pairs of marker genes are highlighted.
[0020] Fig. 1H shows a non-limiting example of network representation of correlation between all genes in all modules.
[0021] Fig. 1I shows a non-limiting example of environment scores for a “tumor-promoting inflammation” module.
[0022] Fig. 1J shows a non-limiting example of single-cell scores for the module.
[0023] Fig. 1K shows a non-limiting example of mRNA molecules of module genes.
[0024] Fig. 1L shows a non-limiting example of estimated involvement of each cell type in each module.
[0025] Fig. 1M shows a non-limiting example of estimated involvement of each cell type in each gene of the highlighted module.
[0026] Fig. 2A shows a non-limiting example of correlation structure of 51 ligands assigned to modules.
[0027] Fig. 2B shows a non-limiting example of involvement of each cell type in each module.
[0028] Figs. 2C-2E show non-limiting examples of environment expression of a module holding chemo-attractants (Fig. 2C), MHC2 antigen presentation genes (Fig. 2D), and MHC1 antigen presentation genes (Fig. 2E).
[0029] Fig. 2F shows a non-limiting example of conditional correlations of 555 ligand-receptor pairs.
[0030] Fig. 2G shows a non-limiting example of spatial map of single-cell expression of the ligand-receptor pair FCER2 & CR2.
[0031] Fig. 2H shows a non-limiting example of conditional correlation network around the FCER2-CR2 ligand-receptor pair.
[0032] Fig. 3A shows a non-limiting example of chart of workflow of SPARC.
[0033] Fig. 3B shows a non-limiting example of workflow for building the environment matrix and the conditioning matrix.
[0034] Fig. 3C shows a non-limiting example of workflow for deriving modules from conditional correlation matrix.
[0035] Fig. 3D shows a non-limiting example of workflow for calculating module scores and gene weights.
[0036] Fig. 3E shows a non-limiting example of workflow for scoring involvement of cell type.
[0037] Fig. 4 shows a non-limiting example of a computing device; in this case, a device with one or more processors, memory, storage, and a network interface.
[0038] Fig. 5 shows a non-limiting example of a web/mobile application provision system; in this case, a system providing browser-based and/or native mobile user interfaces.
[0039] Fig. 6 shows a non-limiting example of a cloud-based web/mobile application provision system; in this case, a system comprising an elastically load balanced, auto-scaling web server and application server resources as well as synchronously replicated databases.
DETAILED DESCRIPTION OF THE INVENTION
[0040] While various embodiments of the invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions may occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed.
[0041] As used in the specification and claims, the singular form “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[0042] As used herein, the term “about” in some cases refers to an amount that is approximately the stated amount, in some cases near the stated amount by 10%, 5%, or 1%, including increments therein, and in some cases, in reference to a percentage, refers to an amount that is greater or less than the stated percentage by 10%, 5%, or 1%, including increments therein.
[0043] As used herein, the phrases “at least one,” “one or more,” and “and/or” are open-ended expressions that are both conjunctive and disjunctive in operation. For example, each of the expressions “at least one of A, B and C,” “at least one of A, B, or C,” “one or more of A, B, and C”, “one or more of A, B, or C” and “A, B, and/or C” means A alone, B alone, C alone, A and B together, A and C together, B and C together, or A, B and C together.
[0044] Reference throughout this specification to “some embodiments,” “further embodiments,” or “a particular embodiment,” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of the phrase “in some embodiments,” or “in further embodiments,” or “in a particular embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
[0045] In one aspect, disclosed herein is a method for analyzing a biological sample to identify genes having spatial correlations with one another, the method comprising: (a) retrieving, by a computer processor and from a database: a location data indicative of relative positions of the plurality of cells in a multi-dimensional image of the biological sample; and a transcriptomic data of a plurality of genes of the plurality of cells; (b) processing, by the computer processor, at least the transcriptomic data to generate a gene expression matrix characterizing each cell of the plurality of cells by a gene expression level of the plurality of genes; (c) analyzing, by the computer processor, the location data and the transcriptomic data to generate an environment confounder matrix, the environment confounder matrix characterizes each cell of the plurality of cells by one or more environment variables of a region adjacent to or surrounding the each cell in the multi-dimensional image, the one or more environment variables are based at least in part on (i) cell classification of one or more cells in at least the region or (ii) measurement artifact of the transcriptomic data in at least the region; (d) determining, by the computer processor, a correlation between the plurality of genes of the gene expression matrix conditional on the environment confounder matrix; and (e) identifying, by the computer processor and based on the correlation determined in (d), at least one subset of the plurality of genes, genes of the at least one subset are mutually correlated to one another in the biological sample.
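Steps (d) and (e) above can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from this specification: the conditioning is done by ordinary least-squares regression of each gene on the environment confounder matrix followed by correlation of the residuals (a partial correlation), and subsets are formed as connected components of a thresholded correlation graph. The function names `conditional_correlation` and `gene_subsets`, the threshold of 0.3, and the minimum subset size are hypothetical choices made for illustration only.

```python
import numpy as np


def conditional_correlation(expr, confounders):
    """Correlate genes after linearly regressing out the confounders.

    expr:        (n_cells, n_genes) gene expression matrix
    confounders: (n_cells, n_vars) environment confounder matrix
    Assumption: linear adjustment by least squares (partial correlation).
    """
    n_cells = expr.shape[0]
    # Design matrix with an intercept column plus the confounders.
    design = np.column_stack([np.ones(n_cells), confounders])
    coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
    resid = expr - design @ coef
    # Covariance of the residuals, i.e. covariance conditional on confounders.
    cov = resid.T @ resid / (n_cells - design.shape[1])
    sd = np.sqrt(np.diag(cov))
    sd[sd == 0] = np.inf  # genes with no residual variance get zero correlation
    corr = cov / np.outer(sd, sd)
    np.fill_diagonal(corr, 1.0)
    return corr


def gene_subsets(corr, threshold=0.3, min_size=2):
    """Group genes into connected components of the thresholded graph.

    Assumption: two genes are linked when their conditional correlation is
    at least `threshold`; components smaller than `min_size` are dropped.
    """
    adjacent = corr >= threshold
    unvisited = set(range(corr.shape[0]))
    subsets = []
    while unvisited:
        seed = unvisited.pop()
        component, stack = {seed}, [seed]
        while stack:
            gene = stack.pop()
            for other in np.flatnonzero(adjacent[gene]):
                other = int(other)
                if other in unvisited:
                    unvisited.remove(other)
                    component.add(other)
                    stack.append(other)
        if len(component) >= min_size:
            subsets.append(sorted(component))
    return sorted(subsets)
```

Connected components are a simplification: genes in one component are linked through a chain of correlated pairs, which is weaker than every pair being correlated. A stricter grouping, such as hierarchical clustering on the conditional correlation matrix, could be substituted without changing the conditioning step.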
[0046] In some aspects, the method comprising (d) further can comprise determining a degree of the correlation, and the method comprising (e) can comprise identifying the at least one subset based on the degree of correlation. In some aspects, the identifying in (e) can be based at least in part on determining a threshold level of the degree of correlation. In some aspects, the determining the correlation in (d) can comprise analyzing covariance of the gene expression matrix conditional on the environment confounder matrix. In some aspects, the method further can comprise, in (d), generating a conditional correlation matrix of the plurality of genes based on the covariance, the conditional correlation matrix can be different from the gene expression matrix. In some aspects, the at least one subset can comprise a plurality of subsets that are different from one another. In some aspects, the plurality of subsets can comprise a first subset having a first plurality of genes and a second subset having a second plurality of genes, the first plurality of genes has at least one gene that can be not in common with the second plurality of genes. In some aspects, the gene expression matrix can comprise an environment expression matrix, the environment expression matrix characterizes a cell of the plurality of cells by analyzing the gene expression level of the plurality of genes of nearest neighboring cells of the cell within the multi-dimensional image, and the method can comprise, in (b), processing the transcriptomic data and the location data to generate the environment expression matrix. In some aspects, a number of the nearest neighboring cells can be at most about 1,000, at most about 500, at most about 100, or at most about 50. In some cases, a number of nearest neighboring cells can be more than 1,000 cells.
In some cases, a number of nearest neighboring cells can be about 1 cell, 2 cells, 3 cells, 4 cells, 5 cells, 6 cells, 7 cells, 8 cells, 9 cells, 10 cells, 15 cells, 20 cells, 25 cells, 30 cells, 35 cells, 40 cells, 45 cells, or about 50 cells. In some cases, a number of nearest neighboring cells can be about 55 cells, 60 cells, 65 cells, 70 cells, 75 cells, 80 cells, 85 cells, 90 cells, 95 cells, or about 100 cells. In some cases, a number of nearest neighboring cells can be about 150 cells, 200 cells, 250 cells, 300 cells, 350 cells, 400 cells, 450 cells, 500 cells, 550 cells, 600 cells, 650 cells, 700 cells, 750 cells, 800 cells, 850 cells, 900 cells, 950 cells, or about 1,000 cells. In some aspects, the method further can comprise displaying, via a graphical user interface, the genes of the at least one subset to a user. In some aspects, the method further can comprise, based on the analyzing in (d), generating a gene cluster map comprising a plurality of shapes representing the plurality of genes, the plurality of shapes can be arranged in a plurality of clusters, a cluster of the plurality of clusters corresponds to the at least one subset. In some aspects, the method further can comprise displaying, via a graphical user interface, the gene cluster map to a user. In some aspects, the multi-dimensional image can be a two-dimensional image. In some aspects, the method further can comprise, subsequent to (e), scoring each cell of the plurality of cells based on single cell expression level of the genes of the at least one subset. In some aspects, the method further can comprise generating an additional multi-dimensional image of the biological sample based on the scoring. In some aspects, the method further can comprise, in (c), receiving selection of the one or more environment variables from a user via a graphical user interface. 
In some aspects, the one or more environment variables are based on both (i) the cell classification in the at least the region and (ii) the measurement artifact of the transcriptomic data in the at least the region. In some aspects, the measurement artifact can comprise detection data via a synthetic control probe sequence. In some aspects, the one or more environment variables are based at least in part on one or more of (i) a number of cells having a cell classification of interest in at least the region of the multi-dimensional image, (ii) a number of different cell classifications identified in at least the region of the multi-dimensional image, (iii) a ratio between numbers of cells of two different cell classifications of interest in at least the region of the multi-dimensional image, or (iv) a relative location between a cell having a cell classification of interest and a tissue substructure in at least a portion of the multi-dimensional image, or any combination thereof. In some aspects, the region can be characterized by having at most about 5 cells, at most about 10 cells, at most about 20 cells, at most about 50 cells, or at most about 100 cells. In some cases, the region can be characterized by having about 1 cell, 2 cells, 3 cells, 4 cells, 5 cells, 6 cells, 7 cells, 8 cells, 9 cells, 10 cells, 15 cells, 20 cells, 25 cells, 30 cells, 35 cells, 40 cells, 45 cells, or about 50 cells. In some cases, the region can be characterized by having about 55 cells, 60 cells, 65 cells, 70 cells, 75 cells, 80 cells, 85 cells, 90 cells, 95 cells, or about 100 cells. In some aspects, a number of the one or more environment variables in the environment confounder matrix can be at least about 5, at least about 10, at least about 15, or at least about 20.
In some cases, a number of the one or more environment variables in the environment confounder matrix can be at least about 2, at least about 3, at least about 4, at least about 5, at least about 6, at least about 7, at least about 8, at least about 9, at least about 10, at least about 11, at least about 12, at least about 13, at least about 14, at least about 15, at least about 16, at least about 17, at least about 18, at least about 19, at least about 20, at least about 21, at least about 22, at least about 23, at least about 24, at least about 25, at least about 26, at least about 27, at least about 28, at least about 29, at least about 30, at least about 31, at least about 32, at least about 33, at least about 34, at least about 35, at least about 36, at least about 37, at least about 38, at least about 39, at least about 40, at least about 41, at least about 42, at least about 43, at least about 44, at least about 45, at least about 46, at least about 47, at least about 48, at least about 49, at least about 50, at least about 51, at least about 52, at least about 53, at least about 54, at least about 55, at least about 56, at least about 57, at least about 58, at least about 59, at least about 60, at least about 61, at least about 62, at least about 63, at least about 64, at least about 65, at least about 66, at least about 67, at least about 68, at least about 69, at least about 70, at least about 71, at least about 72, at least about 73, at least about 74, at least about 75, at least about 76, at least about 77, at least about 78, at least about 79, at least about 80, at least about 81, at least about 82, at least about 83, at least about 84, at least about 85, at least about 86, at least about 87, at least about 88, at least about 89, at least about 90, at least about 91, at least about 92, at least about 93, at least about 94, at least about 95, at least about 96, at least about 97, at least about 98, at least about 99, or at least about 100.
In some aspects, the cell classification can comprise one or more of endothelial cells, epithelial cells, dermal cells, endodermal cells, mesodermal cells, fibroblasts, osteocytes, chondrocytes, immune cells, dendritic cells, hepatic cells, pancreatic cells, or stromal cells, or any combination thereof. In some aspects, the cell classification can comprise one or more of salivary gland mucous cells, salivary gland serous cells, von Ebner's gland cells, mammary gland cells, lacrimal gland cells, ceruminous gland cells, eccrine sweat gland dark cells, eccrine sweat gland clear cells, apocrine sweat gland cells, gland of Moll cells, sebaceous gland cells, Bowman's gland cells, Brunner's gland cells, seminal vesicle cells, prostate gland cells, bulbourethral gland cells, Bartholin's gland cells, gland of Littre cells, uterus endometrium cells, isolated goblet cells, stomach lining mucous cells, gastric gland zymogenic cells, gastric gland oxyntic cells, pancreatic acinar cells, Paneth cells, type II pneumocytes, Clara cells, somatotropes, lactotropes, thyrotropes, gonadotropes, corticotropes, intermediate pituitary cells, magnocellular neurosecretory cells, gut cells, respiratory tract cells, thyroid epithelial cells, parafollicular cells, parathyroid gland cells, parathyroid chief cell, oxyphil cell, adrenal gland cells, chromaffin cells, Leydig cells, theca interna cells, corpus luteum cells, granulosa lutein cells, theca lutein cells, juxtaglomerular cell, macula densa cells, peripolar cells, mesangial cell, blood vessel and lymphatic vascular endothelial fenestrated cells, blood vessel and lymphatic vascular endothelial continuous cells, blood vessel and lymphatic vascular endothelial splenic cells, synovial cells, serosal cell, squamous cells, columnar cells, dark cells, vestibular membrane cell, stria vascularis basal cells, stria vascularis marginal cell, cells of Claudius, cells of Boettcher, choroid plexus cells, pia-arachnoid squamous cells, pigmented ciliary 
epithelium cells, nonpigmented ciliary epithelium cells, corneal endothelial cells, peg cells, respiratory tract ciliated cells, oviduct ciliated cell, uterine endometrial ciliated cells, rete testis ciliated cells, ductulus efferens ciliated cells, ciliated ependymal cells, epidermal keratinocytes, epidermal basal cells, keratinocyte of fingernails and toenails, nail bed basal cells, medullary hair shaft cells, cortical hair shaft cells, cuticular hair shaft cells, cuticular hair root sheath cells, hair root sheath cells of Huxley's layer, hair root sheath cells of Henle's layer, external hair root sheath cells, hair matrix cells, surface epithelial cells of stratified squamous epithelium, basal cell of epithelia, urinary epithelium cells, auditory inner hair cells of organ of Corti, auditory outer hair cells of organ of Corti, basal cells of olfactory epithelium, cold-sensitive primary sensory neurons, heat-sensitive primary sensory neurons, Merkel cells of epidermis, olfactory receptor neurons, pain-sensitive primary sensory neurons, photoreceptor rod cells, photoreceptor blue-sensitive cone cells, photoreceptor green-sensitive cone cells, photoreceptor red-sensitive cone cells, proprioceptive primary sensory neurons, touch-sensitive primary sensory neurons, type I carotid body cells, type II carotid body cell, type I hair cell of vestibular apparatus of ear, type II hair cells of vestibular apparatus of ear, type I taste bud cells, cholinergic neural cells, adrenergic neural cells, peptidergic neural cells, inner pillar cells of organ of Corti, outer pillar cells of organ of Corti, inner phalangeal cells of organ of Corti, outer phalangeal cells of organ of Corti, border cells of organ of Corti, Hensen cells of organ of Corti, vestibular apparatus supporting cells, taste bud supporting cells, olfactory epithelium supporting cells, Schwann cells, satellite cells, enteric glial cells, astrocytes, neurons, oligodendrocytes, spindle neurons, anterior lens 
epithelial cells, crystallin-containing lens fiber cells, hepatocytes, adipocytes, white fat cells, brown fat cells, liver lipocytes, kidney glomerulus parietal cells, kidney glomerulus podocytes, kidney proximal tubule brush border cells, loop of Henle thin segment cells, kidney distal tubule cells, kidney collecting duct cells, type I pneumocytes, pancreatic duct cells, nonstriated duct cells, duct cells, intestinal brush border cells, exocrine gland striated duct cells, gall bladder epithelial cells, ductulus efferens nonciliated cells, epididymal principal cells, epididymal basal cells, ameloblast epithelial cells, planum semilunatum epithelial cells, organ of Corti interdental epithelial cells, loose connective tissue fibroblasts, corneal keratocytes, tendon fibroblasts, bone marrow reticular tissue fibroblasts, nonepithelial fibroblasts, pericytes, nucleus pulposus cells, cementoblast/cementocytes, odontoblasts, odontocytes, hyaline cartilage chondrocytes, fibrocartilage chondrocytes, elastic cartilage chondrocytes, osteoblasts, osteocytes, osteoclasts, osteoprogenitor cells, hyalocytes, stellate cells, hepatic stellate cells, pancreatic stellate cells, red skeletal muscle cells, white skeletal muscle cells, intermediate skeletal muscle cells, nuclear bag cells of muscle spindle, nuclear chain cells of muscle spindle, satellite cells, ordinary heart muscle cells, nodal heart muscle cells, Purkinje fiber cells, smooth muscle cells, myoepithelial cells of iris, myoepithelial cell of exocrine glands, reticulocytes, megakaryocytes, monocytes, connective tissue macrophages, epidermal Langerhans cells, dendritic cells, microglial cells, neutrophils, eosinophils, basophils, mast cell, helper T cells, suppressor T cells, cytotoxic T cell, natural Killer T cells, B cells, natural killer cells, melanocytes, retinal pigmented epithelial cells, oogonia/oocytes, spermatids, spermatocytes, spermatogonium cells, spermatozoa, ovarian follicle cells, Sertoli cells, thymus 
epithelial cell, or interstitial kidney cells, or any combination thereof. In some aspects, the cell classification can comprise one or more of embryonic stem cells, embryonic germ cells, induced pluripotent stem cells, mesenchymal stem cells, bone marrow-derived mesenchymal stem cells, bone marrow-derived mesenchymal stromal cells, tissue plastic-adherent placental stem cells (PDACs), umbilical cord stem cells, amniotic fluid stem cells, amnion derived adherent cells (AMDACs), osteogenic placental adherent cells (OPACs), adipose stem cells, limbal stem cells, dental pulp stem cells, myoblasts, endothelial progenitor cells, neuronal stem cells, exfoliated teeth derived stem cells, hair follicle stem cells, dermal stem cells, parthenogenically derived stem cells, reprogrammed stem cells, amnion derived adherent cells, or side population stem cells, or any combination thereof. In some aspects, the transcriptomic data can be selected from data comprising one or more of gene expression assays with fluorescently labeled probes, RNA sequencing (RNA-seq), microarray analysis, reverse transcription polymerase chain reaction (RT-PCR), cap analysis of gene expression, or single-cell RNA sequencing (scRNA-seq), or any combination thereof. In some aspects, the plurality of cells can comprise at least about 100 cells, at least about 200 cells, at least about 500 cells, or at least about 1,000 cells. In some aspects, the plurality of genes can comprise at least about 10 genes, at least about 20 genes, at least about 50 genes, or at least about 100 genes.
In some aspects, the plurality of genes can comprise at least about 150 genes, at least about 200 genes, at least about 250 genes, at least about 300 genes, at least about 350 genes, at least about 400 genes, at least about 450 genes, at least about 500 genes, at least about 550 genes, at least about 600 genes, at least about 650 genes, at least about 700 genes, at least about 750 genes, at least about 800 genes, at least about 850 genes, at least about 900 genes, at least about 950 genes, at least about 1,000 genes, at least about 1,050 genes, at least about 1,100 genes, at least about 1,150 genes, at least about 1,200 genes, at least about 1,250 genes, at least about 1,300 genes, at least about 1,350 genes, at least about 1,400 genes, at least about 1,450 genes, at least about 1,500 genes, at least about 1,550 genes, at least about 1,600 genes, at least about 1,650 genes, at least about 1,700 genes, at least about 1,750 genes, at least about 1,800 genes, at least about 1,850 genes, at least about 1,900 genes, at least about 1,950 genes, or at least about 2,000 genes. In some aspects, the plurality of genes can comprise at least about 2,100 genes, at least about 2,200 genes, at least about 2,300 genes, at least about 2,400 genes, at least about 2,500 genes, at least about 2,600 genes, at least about 2,700 genes, at least about 2,800 genes, at least about 2,900 genes, at least about 3,000 genes, at least about 3,100 genes, at least about 3,200 genes, at least about 3,300 genes, at least about 3,400 genes, at least about 3,500 genes, at least about 3,600 genes, at least about 3,700 genes, at least about 3,800 genes, at least about 3,900 genes, at least about 4,000 genes, at least about 4,100 genes, at least about 4,200 genes, at least about 4,300 genes, at least about 4,400 genes, at least about 4,500 genes, at least about 4,600 genes, at least about 4,700 genes, at least about 4,800 genes, at least about 4,900 genes, or at least about 5,000 genes. In some aspects, the plurality of genes can comprise from about 5,000 genes to about 6,000 genes. In some aspects, the plurality of genes can comprise at least about 5,100 genes, at least about 5,200 genes, at least about 5,300 genes, at least about 5,400 genes, at least about 5,500 genes, at least about 5,600 genes, at least about 5,700 genes, at least about 5,800 genes, at least about 5,900 genes, at least about 6,000 genes, at least about 6,100 genes, at least about 6,200 genes, at least about 6,300 genes, at least about 6,400 genes, at least about 6,500 genes, at least about 6,600 genes, at least about 6,700 genes, at least about 6,800 genes, at least about 6,900 genes, at least about 7,000 genes, at least about 7,100 genes, at least about 7,200 genes, at least about 7,300 genes, at least about 7,400 genes, at least about 7,500 genes, at least about 7,600 genes, at least about 7,700 genes, at least about 7,800 genes, at least about 7,900 genes, at least about 8,000 genes, at least about 8,100 genes, at least about 8,200 genes, at least about 8,300 genes, at least about 8,400 genes, at least about 8,500 genes, at least about 8,600 genes, at least about 8,700 genes, at least about 8,800 genes, at least about 8,900 genes, at least about 9,000 genes, at least about 9,100 genes, at least about 9,200 genes, at least about 9,300 genes, at least about 9,400 genes, at least about 9,500 genes, at least about 9,600 genes, at least about 9,700 genes, at least about 9,800 genes, at least about 9,900 genes, or at least about 10,000 genes. In some aspects, the plurality of genes can comprise about 19,000 genes. In some aspects, the plurality of genes can comprise at least about 10,000 genes, at least about 10,500 genes, at least about 11,000 genes, at least about 11,500 genes, at least about 12,000 genes, at least about 12,500 genes, at least about 13,000 genes, at least about 13,500 genes, at least about 14,000 genes, at least about 14,500 genes, at least about 15,000 genes, at least about 15,500 genes, at least about 16,000 genes, at least about 16,500 genes, at least about 17,000 genes, at least about 17,500 genes, at least about 18,000 genes, at least about 18,500 genes, at least about 19,000 genes, at least about 19,500 genes, or at least about 20,000 genes. In some aspects, the plurality of genes are endogenous genes.
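The environment expression matrix, the cell-type-count form of the environment confounder matrix, and the single-cell scoring described in this paragraph can be sketched as follows. This is an illustrative sketch under stated assumptions, not the disclosed implementation: neighbors are found by brute-force Euclidean distance in a two-dimensional image, a cell is excluded from its own neighborhood, environment expression is the mean over neighbors, and a cell's score for a subset is the mean expression of the subset's genes. All function names and these aggregation choices are hypothetical.

```python
import numpy as np


def nearest_neighbors(xy, k):
    """Indices of the k nearest other cells for each cell.

    xy: (n_cells, 2) cell positions. Brute force, O(n_cells^2) memory, so
    suitable only for small examples; a spatial index would be used at scale.
    """
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # assumption: exclude the cell itself
    return np.argsort(dist, axis=1)[:, :k]


def environment_expression(expr, neighbors):
    """Mean expression of each gene over each cell's neighbors.

    expr: (n_cells, n_genes); neighbors: (n_cells, k) index array.
    """
    return expr[neighbors].mean(axis=1)


def environment_confounders(cell_types, neighbors, n_types):
    """Number of neighbors of each cell classification, per cell.

    cell_types: (n_cells,) integer labels in 0..n_types-1.
    """
    neighbor_types = cell_types[neighbors]  # (n_cells, k)
    counts = np.zeros((len(cell_types), n_types))
    for t in range(n_types):
        counts[:, t] = (neighbor_types == t).sum(axis=1)
    return counts


def module_scores(expr, gene_subset):
    """Score each cell by mean single-cell expression of a gene subset."""
    return expr[:, gene_subset].mean(axis=1)
```

Other environment variables listed above, such as the number of distinct cell classifications in the region or the ratio between two classifications, could be derived from the same per-neighborhood counts and appended as further columns.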
[0047] In one aspect, disclosed herein is a system comprising one or more computer processors and computer memory coupled thereto, the computer memory comprising a machine executable code that, upon execution by the one or more computer processors, for analyzing a biological sample to identify genes having spatial correlations with one another, comprising: (a) a software module configured to retrieve, from a database: a location data indicative of relative positions of the plurality of cells in a multi-dimensional image of the biological sample; and a transcriptomic data of a plurality of genes of the plurality of cells; (b) a software module configured to process, at least the transcriptomic data to generate a gene expression matrix characterizing each cell of the plurality of cells by a gene expression level of the plurality of genes; (c) a software module configured to analyze, the location data and the transcriptomic data to generate an environment confounder matrix, the environment confounder matrix characterizes each cell of the plurality of cells by one or more environment variables of a region adjacent to or surrounding the each cell in the multi-dimensional image, the one or more environment variables are based at least in part on (i) cell classification of one or more cells in at least the region or (ii) measurement artifact of the transcriptomic data in at least the region; (d) a software configured to determine, a correlation between the plurality of genes of the gene expression matrix conditional on the environment confounder matrix; and (e) a software configured to identify, based on the correlation determined in (d), at least one subset of the plurality of genes, genes of the at least one subset are mutually correlated to one another in the biological sample.
[0048] In some aspects, the system comprising (d) further can comprise a software configured to determine a degree of the correlation, and method comprising (e) can comprise a software configured to identify the at least one subset based on the degree of correlation. In some aspects, to identify in (e) can be based at least in part on determining a threshold level of the degree of correlation. In some aspects, to determine the correlation in (d) can comprise analyzing covariance of the gene expression matrix conditional on the environment confounder matrix. In some aspects, the system further can comprise, in (d), a software configured to generate a conditional correlation matrix of the plurality of genes based on the covariance, the conditional correlation matrix can be different from the gene expression matrix. In some aspects, the at least one subset can comprise a plurality of subsets that are different from one another. In some aspects, the plurality of subsets can comprise a first subset having a first plurality of genes and a second subset having a second plurality of genes, the first plurality of genes has at least one gene not in common with the second plurality of genes. In some aspects, the gene expression matrix can comprise an environment expression matrix, the environment expression matrix characterizes a cell of the plurality of cells by analyzing the gene expression level of the plurality of genes of nearest neighboring cells of the cell within the multi-dimensional image, and the system can comprise, in (b), a software configured to process the transcriptomic data and the location data to generate the environment expression matrix. In some aspects, a number of the nearest neighboring cells can be at most about 1,000, at most about 500, at most about 100, or at most about 50. In some aspects, the system further can comprise a software configured to display, via a graphical user interface, the genes of the at least one subset to a user. 
In some aspects, the system further can comprise, based on the analyzing in (d), a software configured to generate a gene cluster map comprising a plurality of shapes representing the plurality of genes, the plurality of shapes can be arranged in a plurality of clusters, a cluster of the plurality of clusters corresponds to the at least one subset. In some aspects, the system further can comprise a software configured to display, via a graphical user interface, the gene cluster map to a user. In some aspects, the multi-dimensional image can be a two-dimensional image. In some aspects, the system further can comprise, subsequent to (e), a software configured to score each cell of the plurality of cells based on single cell expression level of the genes of the at least one subset. In some aspects, the system further can comprise a software configured to generate an additional multi-dimensional image of the biological sample based on the scoring. In some aspects, the system further can comprise, in (c), a software configured to receive selection of the one or more environment variables from a user via a graphical user interface. In some aspects, the one or more environment variables are based on both (i) the cell classification in the at least the region and (ii) the measurement artifact of the transcriptomic data in the at least the region. In some aspects, the measurement artifact can comprise detection data via a synthetic control probe sequence. 
In some aspects, the one or more environment variables are based at least in part on one or more of (i) a number of cells having a cell classification of interest in at least the region of the multi-dimensional image, (ii) a number of different cell classifications identified in at least the region of the multi-dimensional image, (iii) a ratio between numbers of cells of two different cell classifications of interest in at least the region of the multi-dimensional image, or (iv) a relative location between a cell having a cell classification of interest and a tissue substructure in at least a portion of the multi-dimensional image, or any combination thereof. In some aspects, the region can be characterized by having at most about 5 cells, at most about 10 cells, at most about 20 cells, at most about 50 cells, or at most about 100 cells. In some aspects, a number of the one or more environment variables in the environment confounder matrix can be at least about 5, at least about 10, at least about 15, or at least about 20. In some aspects, the cell classification can comprise one or more of endothelial cells, epithelial cells, dermal cells, endodermal cells, mesodermal cells, fibroblasts, osteocytes, chondrocytes, immune cells, dendritic cells, hepatic cells, pancreatic cells, or stromal cells, or any combination thereof.
In some aspects, the cell classification can comprise one or more of salivary gland mucous cells, salivary gland serous cells, von Ebner's gland cells, mammary gland cells, lacrimal gland cells, ceruminous gland cells, eccrine sweat gland dark cells, eccrine sweat gland clear cells, apocrine sweat gland cells, gland of Moll cells, sebaceous gland cells, Bowman's gland cells, Brunner's gland cells, seminal vesicle cells, prostate gland cells, bulbourethral gland cells, Bartholin's gland cells, gland of Littre cells, uterus endometrium cells, isolated goblet cells, stomach lining mucous cells, gastric gland zymogenic cells, gastric gland oxyntic cells, pancreatic acinar cells, Paneth cells, type II pneumocytes, Clara cells, somatotropes, lactotropes, thyrotropes, gonadotropes, corticotropes, intermediate pituitary cells, magnocellular neurosecretory cells, gut cells, respiratory tract cells, thyroid epithelial cells, parafollicular cells, parathyroid gland cells, parathyroid chief cell, oxyphil cell, adrenal gland cells, chromaffin cells, Leydig cells, theca interna cells, corpus luteum cells, granulosa lutein cells, theca lutein cells, juxtaglomerular cell, macula densa cells, peripolar cells, mesangial cell, blood vessel and lymphatic vascular endothelial fenestrated cells, blood vessel and lymphatic vascular endothelial continuous cells, blood vessel and lymphatic vascular endothelial splenic cells, synovial cells, serosal cell, squamous cells, columnar cells, dark cells, vestibular membrane cell, stria vascularis basal cells, stria vascularis marginal cell, cells of Claudius, cells of Boettcher, choroid plexus cells, pia-arachnoid squamous cells, pigmented ciliary epithelium cells, nonpigmented ciliary epithelium cells, corneal endothelial cells, peg cells, respiratory tract ciliated cells, oviduct ciliated cell, uterine endometrial ciliated cells, rete testis ciliated cells, ductulus efferens ciliated cells, ciliated ependymal cells, epidermal keratinocytes,
epidermal basal cells, keratinocyte of fingernails and toenails, nail bed basal cells, medullary hair shaft cells, cortical hair shaft cells, cuticular hair shaft cells, cuticular hair root sheath cells, hair root sheath cells of Huxley's layer, hair root sheath cells of Henle's layer, external hair root sheath cells, hair matrix cells, surface epithelial cells of stratified squamous epithelium, basal cell of epithelia, urinary epithelium cells, auditory inner hair cells of organ of Corti, auditory outer hair cells of organ of Corti, basal cells of olfactory epithelium, cold-sensitive primary sensory neurons, heat-sensitive primary sensory neurons, Merkel cells of epidermis, olfactory receptor neurons, pain-sensitive primary sensory neurons, photoreceptor rod cells, photoreceptor blue-sensitive cone cells, photoreceptor green-sensitive cone cells, photoreceptor red-sensitive cone cells, proprioceptive primary sensory neurons, touch-sensitive primary sensory neurons, type I carotid body cells, type II carotid body cell, type I hair cell of vestibular apparatus of ear, type II hair cells of vestibular apparatus of ear, type I taste bud cells, cholinergic neural cells, adrenergic neural cells, peptidergic neural cells, inner pillar cells of organ of Corti, outer pillar cells of organ of Corti, inner phalangeal cells of organ of Corti, outer phalangeal cells of organ of Corti, border cells of organ of Corti, Hensen cells of organ of Corti, vestibular apparatus supporting cells, taste bud supporting cells, olfactory epithelium supporting cells, Schwann cells, satellite cells, enteric glial cells, astrocytes, neurons, oligodendrocytes, spindle neurons, anterior lens epithelial cells, crystallin-containing lens fiber cells, hepatocytes, adipocytes, white fat cells, brown fat cells, liver lipocytes, kidney glomerulus parietal cells, kidney glomerulus podocytes, kidney proximal tubule brush border cells, loop of Henle thin segment cells, kidney distal tubule cells, kidney
collecting duct cells, type I pneumocytes, pancreatic duct cells, nonstriated duct cells, duct cells, intestinal brush border cells, exocrine gland striated duct cells, gall bladder epithelial cells, ductulus efferens nonciliated cells, epididymal principal cells, epididymal basal cells, ameloblast epithelial cells, planum semilunatum epithelial cells, organ of Corti interdental epithelial cells, loose connective tissue fibroblasts, corneal keratocytes, tendon fibroblasts, bone marrow reticular tissue fibroblasts, nonepithelial fibroblasts, pericytes, nucleus pulposus cells, cementoblast/cementocytes, odontoblasts, odontocytes, hyaline cartilage chondrocytes, fibrocartilage chondrocytes, elastic cartilage chondrocytes, osteoblasts, osteocytes, osteoclasts, osteoprogenitor cells, hyalocytes, stellate cells, hepatic stellate cells, pancreatic stellate cells, red skeletal muscle cells, white skeletal muscle cells, intermediate skeletal muscle cells, nuclear bag cells of muscle spindle, nuclear chain cells of muscle spindle, satellite cells, ordinary heart muscle cells, nodal heart muscle cells, Purkinje fiber cells, smooth muscle cells, myoepithelial cells of iris, myoepithelial cell of exocrine glands, reticulocytes, megakaryocytes, monocytes, connective tissue macrophages, epidermal Langerhans cells, dendritic cells, microglial cells, neutrophils, eosinophils, basophils, mast cell, helper T cells, suppressor T cells, cytotoxic T cell, natural killer T cells, B cells, natural killer cells, melanocytes, retinal pigmented epithelial cells, oogonia/oocytes, spermatids, spermatocytes, spermatogonium cells, spermatozoa, ovarian follicle cells, Sertoli cells, thymus epithelial cell, and/or interstitial kidney cells.
In some aspects, the cell classification can comprise one or more of embryonic stem cells, embryonic germ cells, induced pluripotent stem cells, mesenchymal stem cells, bone marrow-derived mesenchymal stem cells, bone marrow-derived mesenchymal stromal cells, tissue plastic-adherent placental stem cells (PDACs), umbilical cord stem cells, amniotic fluid stem cells, amnion derived adherent cells (AMDACs), osteogenic placental adherent cells (OPACs), adipose stem cells, limbal stem cells, dental pulp stem cells, myoblasts, endothelial progenitor cells, neuronal stem cells, exfoliated teeth derived stem cells, hair follicle stem cells, dermal stem cells, parthenogenically derived stem cells, reprogrammed stem cells, amnion derived adherent cells, or side population stem cells, or any combination thereof. In some aspects, the transcriptomic data can comprise one or more of gene expression assays with fluorescently labeled probes, RNA sequencing (RNA-seq), microarray analysis, reverse transcription polymerase chain reaction (RT-PCR), cap analysis of gene expression, or single-cell RNA sequencing (scRNA-seq), or any combination thereof. In some aspects, the plurality of cells can comprise at least about 100 cells, at least about 200 cells, at least about 500 cells, or at least about 1,000 cells. In some aspects, the plurality of genes can comprise at least about 10 genes, at least about 20 genes, at least about 50 genes, or at least about 100 genes.
In some aspects, the plurality of genes can comprise at least about 150 genes, at least about 200 genes, at least about 250 genes, at least about 300 genes, at least about 350 genes, at least about 400 genes, at least about 450 genes, at least about 500 genes, at least about 550 genes, at least about 600 genes, at least about 650 genes, at least about 700 genes, at least about 750 genes, at least about 800 genes, at least about 850 genes, at least about 900 genes, at least about 950 genes, at least about 1,000 genes, at least about 1,050 genes, at least about 1,100 genes, at least about 1,150 genes, at least about 1,200 genes, at least about 1,250 genes, at least about 1,300 genes, at least about 1,350 genes, at least about 1,400 genes, at least about 1,450 genes, at least about 1,500 genes, at least about 1,550 genes, at least about 1,600 genes, at least about 1,650 genes, at least about 1,700 genes, at least about 1,750 genes, at least about 1,800 genes, at least about 1,850 genes, at least about 1,900 genes, at least about 1,950 genes, or at least about 2,000 genes. In some aspects, the plurality of genes can comprise at least about 2,100 genes, at least about 2,200 genes, at least about 2,300 genes, at least about 2,400 genes, at least about 2,500 genes, at least about 2,600 genes, at least about 2,700 genes, at least about 2,800 genes, at least about 2,900 genes, at least about 3,000 genes, at least about 3,100 genes, at least about 3,200 genes, at least about 3,300 genes, at least about 3,400 genes, at least about 3,500 genes, at least about 3,600 genes, at least about 3,700 genes, at least about 3,800 genes, at least about 3,900 genes, at least about 4,000 genes, at least about 4,100 genes, at least about 4,200 genes, at least about 4,300 genes, at least about 4,400 genes, at least about 4,500 genes, at least about 4,600 genes, at least about 4,700 genes, at least about 4,800 genes, at least about 4,900 genes, or at least about 5,000 genes. In some aspects, the plurality of genes can comprise from about 5,000 genes to about 6,000 genes. In some aspects, the plurality of genes can comprise at least about 5,100 genes, at least about 5,200 genes, at least about 5,300 genes, at least about 5,400 genes, at least about 5,500 genes, at least about 5,600 genes, at least about 5,700 genes, at least about 5,800 genes, at least about 5,900 genes, at least about 6,000 genes, at least about 6,100 genes, at least about 6,200 genes, at least about 6,300 genes, at least about 6,400 genes, at least about 6,500 genes, at least about 6,600 genes, at least about 6,700 genes, at least about 6,800 genes, at least about 6,900 genes, at least about 7,000 genes, at least about 7,100 genes, at least about 7,200 genes, at least about 7,300 genes, at least about 7,400 genes, at least about 7,500 genes, at least about 7,600 genes, at least about 7,700 genes, at least about 7,800 genes, at least about 7,900 genes, at least about 8,000 genes, at least about 8,100 genes, at least about 8,200 genes, at least about 8,300 genes, at least about 8,400 genes, at least about 8,500 genes, at least about 8,600 genes, at least about 8,700 genes, at least about 8,800 genes, at least about 8,900 genes, at least about 9,000 genes, at least about 9,100 genes, at least about 9,200 genes, at least about 9,300 genes, at least about 9,400 genes, at least about 9,500 genes, at least about 9,600 genes, at least about 9,700 genes, at least about 9,800 genes, at least about 9,900 genes, or at least about 10,000 genes. In some aspects, the plurality of genes can comprise about 19,000 genes. In some aspects, the plurality of genes can comprise at least about 10,000 genes, at least about 10,500 genes, at least about 11,000 genes, at least about 11,500 genes, at least about 12,000 genes, at least about 12,500 genes, at least about 13,000 genes, at least about 13,500 genes, at least about 14,000 genes, at least about 14,500 genes, at least about 15,000 genes, at least about 15,500 genes, at least about 16,000 genes, at least about 16,500 genes, at least about 17,000 genes, at least about 17,500 genes, at least about 18,000 genes, at least about 18,500 genes, at least about 19,000 genes, at least about 19,500 genes, or at least about 20,000 genes. In some aspects, the plurality of genes are endogenous genes.
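By way of non-limiting illustration, the analysis of covariance of the gene expression matrix conditional on the environment confounder matrix, and the identification of a subset of genes based on a threshold level of the degree of correlation, may be sketched as follows. The sketch assumes that conditioning is approximated by removing the linear effect of the confounders from each gene before correlating the residuals; this is one possible approach and is not asserted to be the implementation of the disclosure. The function names, the dictionary-based data layout, and the threshold value are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residualize(values, basis):
    """Subtract the projection of `values` onto each orthonormal basis vector."""
    r = list(values)
    for q in basis:
        c = dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r


def orthonormal_basis(columns, tol=1e-12):
    """Gram-Schmidt orthonormalization; near-dependent columns are dropped."""
    basis = []
    for col in columns:
        r = residualize(col, basis)
        norm = math.sqrt(dot(r, r))
        if norm > tol:
            basis.append([x / norm for x in r])
    return basis


def conditional_correlation(expression, confounders):
    """expression: {gene: per-cell values}; confounders: list of per-cell columns.

    Assumption: "conditional on" is approximated by linear regression on an
    intercept plus the confounder columns, followed by Pearson correlation
    of the residuals.
    """
    n_cells = len(next(iter(expression.values())))
    basis = orthonormal_basis([[1.0] * n_cells] + confounders)
    resid = {g: residualize(v, basis) for g, v in expression.items()}
    corr = {}
    for a in resid:
        for b in resid:
            denom = math.sqrt(dot(resid[a], resid[a]) * dot(resid[b], resid[b]))
            corr[(a, b)] = dot(resid[a], resid[b]) / denom if denom > 0 else 0.0
    return corr


def pairs_above_threshold(corr, threshold):
    """Return gene pairs (a < b) whose conditional correlation meets the threshold."""
    return sorted((a, b) for (a, b), r in corr.items() if a < b and r >= threshold)
```

In this sketch, two genes that co-vary only because both track the same environment variable receive a conditional correlation near zero, whereas genes that co-vary beyond what the confounders explain remain correlated and may be grouped into a subset.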
Biological Sample and Transcriptomic Data
[0049] The method and system described herein for analyzing a biological sample to identify genes having spatial correlations with one another comprise retrieving, by a computer processor and from a database: a location data indicative of relative positions of a plurality of cells in a multi-dimensional image of the biological sample; and a transcriptomic data of a plurality of genes of the plurality of cells. In some embodiments, the biological sample can be obtained at least in part by one or more of biopsy collection, surgical resection, xenograft, animal model, fine needle aspiration, peripheral blood collection, bone marrow biopsy, healthy tissue sampling, neoplastic tissue sampling, malignant tissue sampling, diseased tissue sampling, and implanted tissue sampling.
[0050] In some embodiments, the biological sample can comprise cells or tissues. In some embodiments, the cells comprise primary cells, stem cells, immune cells, carcinoma cells, sarcoma cells, lymphoma cells, melanoma cells, cancer cells, or neoplastic cells. In some embodiments, the cells comprise germ cell tumor cells, blastoma cells, bladder cancer cells, breast cancer cells, colon cancer cells, colorectal cancer cells, endocrine tumor cells, esophageal cancer cells, glioblastoma cells, Hodgkin lymphoma cells, lung cancer cells, melanoma cells, or prostate cancer cells. In some embodiments, the cells comprise endothelial cells, epithelial cells, dermal cells, endodermal cells, mesodermal cells, fibroblasts, osteocytes, chondrocytes, immune cells, dendritic cells, hepatic cells, pancreatic cells, or stromal cells.
[0051] In some embodiments, the image of the biological sample can comprise a plurality of cells. In some embodiments, the plurality of cells can comprise at least about 10 cells, at least about 20 cells, at least about 30 cells, at least about 50 cells, at least about 100 cells, at least about 200 cells, at least about 300 cells, at least about 400 cells, at least about 500 cells, at least about 600 cells, at least about 700 cells, at least about 800 cells, at least about 900 cells, at least about 1000 cells, at least about 1200 cells, at least about 1400 cells, at least about 1600 cells, at least about 1800 cells, or at least about 2000 cells.
[0052] The method and system described herein comprise retrieving transcriptomic data of a plurality of genes of the plurality of cells. Transcriptomic data may be used to analyze the expression levels of genes or RNA molecules, such as messenger RNA (mRNA) or non-coding RNA, in a particular sample. Differential gene expression patterns or alternative splicing events may serve as biomarkers for specific diseases or physiological conditions. Transcriptomic probes may be identified through techniques like RNA sequencing (RNA-seq) or microarray analysis. Transcriptomic data encompass biomarkers derived from the analysis of gene expression patterns, RNA molecules, and other transcriptomic data. In some embodiments, biomarkers derived from transcriptomic features comprise differential gene expression, alternative splicing patterns, fusion genes and chimeric transcripts, non-coding RNA biomarkers, gene expression signatures, or regulatory networks and pathways. In some embodiments, transcriptomic data can comprise data from gene expression assays with fluorescently labeled probes, RNA sequencing (RNA-seq), microarray analysis, reverse transcription polymerase chain reaction (RT-PCR), cap analysis of gene expression, or single-cell RNA sequencing (scRNA-seq). In some embodiments, transcriptomic data can comprise data from imaging transcriptomics or spot-based spatial assays. In some embodiments, data from imaging transcriptomics can comprise data from CosMx Spatial Molecular Imager (SMI), Xenium, or MERscope. In some embodiments, data from spot-based spatial assays may comprise data from Visium, Slide-seq, or GeoMx Digital Spatial Profiler (DSP).
[0053] In some embodiments, transcriptomic data may comprise a measurement artifact. In some embodiments, the measurement artifact can comprise detection data via a synthetic negative control probe sequence. A synthetic control probe can be a reference or control RNA molecule that can be artificially synthesized and added to biological samples during the experimental process. The purpose of synthetic control probes may be to serve as internal standards for monitoring the quality and performance of various steps in the transcriptomic analysis pipeline, particularly during RNA sequencing experiments. Synthetic control probes may be designed to have known sequences and concentrations. By introducing known sequences into the experimental samples, the quality and reliability of the experimental process may be assessed. Any deviations from the expected results may be indicative of issues such as RNA degradation, contamination, or technical biases. Synthetic control probes may be used as a normalization reference, helping to correct for differences in sequencing depth, library preparation efficiency, and other technical factors that can introduce bias into the data. Synthetic control probes may be designed to mimic the characteristics of endogenous transcripts, such as their length, GC content, and secondary structure. The detection limits of the experimental setup may be assessed. If synthetic control probes are reliably detected at low concentrations, it may indicate the sensitivity of the assay. By introducing a known quantity of synthetic RNA into the samples, the measurements and expression values may be calibrated, making them more comparable across different experiments and platforms. In some embodiments, negative control probes for “ERCC” sequences may be used. These probes bind nothing in any known genome and may be used to measure background in the system. In some embodiments, negative control barcodes or “false-codes” may be used. The barcode sequences may not be generated by any physical probe and may perform as a component of background.
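By way of non-limiting illustration, the use of negative control probe counts to measure background may be sketched as follows. The sketch assumes that the per-cell background is estimated as the mean count across negative control probes and is then subtracted from each gene count with a floor at zero; both the estimator and the subtraction step are illustrative assumptions rather than a procedure stated in this disclosure. The function names and data layout are hypothetical.

```python
def background_per_cell(negative_counts):
    """negative_counts: one list of negative-control probe counts per cell.

    Assumption: background is summarized as the mean count per negative probe.
    """
    return [sum(counts) / len(counts) for counts in negative_counts]


def subtract_background(gene_counts, background):
    """gene_counts: one list of gene counts per cell (same cell order).

    Each gene count is reduced by that cell's background estimate and
    floored at zero so that corrected values are never negative.
    """
    return [
        [max(count - b, 0.0) for count in row]
        for row, b in zip(gene_counts, background)
    ]
```

The per-cell background estimate could alternatively be retained as a column of the environment confounder matrix instead of being subtracted, consistent with treating the measurement artifact as an environment variable.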
[0054] In some embodiments, the plurality of genes can comprise endogenous genes or exogenous genes. In some embodiments, endogenous genes comprise protein-coding genes, ribosomal RNA (rRNA) genes, transfer RNA (tRNA) genes, small nuclear RNA (snRNA) genes, small nucleolar RNA (snoRNA) genes, microRNA (miRNA) genes, circular RNA (circRNA) genes, transfer-messenger RNA (tmRNA) genes, pseudogenes, transposable elements (TEs), immunoglobulin and T-cell receptor genes, or histone and other chromatin-related genes.
[0055] In some embodiments, the method and system disclosed herein comprise retrieving a location data indicative of relative positions in a multi-dimensional image. In some embodiments, the method and system disclosed herein comprise retrieving the location data from multiple images depicting various cells or tissues obtained from a biological sample. The image may refer to a plurality of images representing different cells or tissues derived from a biological sample. These images may be generated through various imaging techniques, such as microscopy, histology, immunohistochemistry, or other relevant methods used in the field of biological sample analysis. In some embodiments, the images comprise microscope-derived images, comprising images from optical microscopes, electron microscopes, or scanning probe microscopes. In some embodiments, the multi-dimensional image can be a two-dimensional (2D) image, a three-dimensional (3D) image, or a 2D projection of a 3D image.
[0056] In some embodiments, the method and system disclosed herein comprise retrieving the location data from at least 1 image, at least 2 images, at least 3 images, at least 5 images, at least 10 images, at least 15 images, at least 20 images, at least 30 images, at least 35 images, at least 40 images, at least 45 images, at least 50 images, at least 55 images, at least 60 images, at least 65 images, at least 70 images, at least 80 images, at least 90 images, at least 100 images, at least 120 images, at least 150 images, at least 200 images or more of the biological sample.
Gene expression matrix
[0057] The method and system disclosed herein may comprise processing, by the computer processor, at least the transcriptomic data to generate a gene expression matrix characterizing each cell of the plurality of cells by a gene expression level of the plurality of genes. In some embodiments, the plurality of genes can comprise at least about 2 genes, at least about 5 genes, at least about 10 genes, at least about 20 genes, at least about 30 genes, at least about 40 genes, at least about 50 genes, at least about 60 genes, at least about 70 genes, at least about 80 genes, at least about 90 genes, at least about 100 genes, at least about 120 genes, at least about 150 genes, at least about 200 genes, at least about 300 genes, at least about 400 genes, or at least about 500 genes. In some aspects, the plurality of genes can comprise at least about 150 genes, at least about 200 genes, at least about 250 genes, at least about 300 genes, at least about 350 genes, at least about 400 genes, at least about 450 genes, at least about 500 genes, at least about 550 genes, at least about 600 genes, at least about 650 genes, at least about 700 genes, at least about 750 genes, at least about 800 genes, at least about 850 genes, at least about 900 genes, at least about 950 genes, at least about 1,000 genes, at least about 1,050 genes, at least about 1,100 genes, at least about 1,150 genes, at least about 1,200 genes, at least about 1,250 genes, at least about 1,300 genes, at least about 1,350 genes, at least about 1,400 genes, at least about 1,450 genes, at least about 1,500 genes, at least about 1,550 genes, at least about 1,600 genes, at least about 1,650 genes, at least about 1,700 genes, at least about 1,750 genes, at least about 1,800 genes, at least about 1,850 genes, at least about 1,900 genes, at least about 1,950 genes, or at least about 2,000 genes. In some aspects, the plurality of genes can comprise at least about 2,100 genes, at least about 2,200 genes, at least about 2,300 genes, at least about 2,400 genes, at least about 2,500 genes, at least about 2,600 genes, at least about 2,700 genes, at least about 2,800 genes, at least about 2,900 genes, at least about 3,000 genes, at least about 3,100 genes, at least about 3,200 genes, at least about 3,300 genes, at least about 3,400 genes, at least about 3,500 genes, at least about 3,600 genes, at least about 3,700 genes, at least about 3,800 genes, at least about 3,900 genes, at least about 4,000 genes, at least about 4,100 genes, at least about 4,200 genes, at least about 4,300 genes, at least about 4,400 genes, at least about 4,500 genes, at least about 4,600 genes, at least about 4,700 genes, at least about 4,800 genes, at least about 4,900 genes, or at least about 5,000 genes. In some aspects, the plurality of genes can comprise from about 5,000 genes to about 6,000 genes. In some aspects, the plurality of genes can comprise at least about 5,100 genes, at least about 5,200 genes, at least about 5,300 genes, at least about 5,400 genes, at least about 5,500 genes, at least about 5,600 genes, at least about 5,700 genes, at least about 5,800 genes, at least about 5,900 genes, at least about 6,000 genes, at least about 6,100 genes, at least about 6,200 genes, at least about 6,300 genes, at least about 6,400 genes, at least about 6,500 genes, at least about 6,600 genes, at least about 6,700 genes, at least about 6,800 genes, at least about 6,900 genes, at least about 7,000 genes, at least about 7,100 genes, at least about 7,200 genes, at least about 7,300 genes, at least about 7,400 genes, at least about 7,500 genes, at least about 7,600 genes, at least about 7,700 genes, at least about 7,800 genes, at least about 7,900 genes, at least about 8,000 genes, at least about 8,100 genes, at least about 8,200 genes, at least about 8,300 genes, at least about 8,400 genes, at least about 8,500 genes, at least about 8,600 genes, at least about 8,700 genes, at least about 8,800 genes, at least about 8,900 genes, at least about 9,000 genes, at least about 9,100 genes, at least about 9,200 genes, at least about 9,300 genes, at least about 9,400 genes, at least about 9,500 genes, at least about 9,600 genes, at least about 9,700 genes, at least about 9,800 genes, at least about 9,900 genes, or at least about 10,000 genes. In some aspects, the plurality of genes can comprise about 19,000 genes. In some aspects, the plurality of genes can comprise at least about 10,000 genes, at least about 10,500 genes, at least about 11,000 genes, at least about 11,500 genes, at least about 12,000 genes, at least about 12,500 genes, at least about 13,000 genes, at least about 13,500 genes, at least about 14,000 genes, at least about 14,500 genes, at least about 15,000 genes, at least about 15,500 genes, at least about 16,000 genes, at least about 16,500 genes, at least about 17,000 genes, at least about 17,500 genes, at least about 18,000 genes, at least about 18,500 genes, at least about 19,000 genes, at least about 19,500 genes, or at least about 20,000 genes.
[0058] A gene expression matrix can be a fundamental data structure to represent the expression levels of genes across cells in different biological samples. It is a tabular data format where rows correspond to genes and columns correspond to individual cells. Each entry in the matrix contains a numerical value that represents the expression level of a specific gene in a specific cell. In some embodiments, each row in the matrix corresponds to a specific gene. The genes are identified by their unique gene symbols or identifiers. In some embodiments, each column in the matrix represents an individual biological sample. Samples can be derived from various sources, such as different tissues, experimental conditions, time points, or individuals. In some embodiments, the values in the entries of the matrix represent the expression levels of genes in the corresponding cells. These expression values may be quantified using different units, such as counts, reads per kilobase of transcript per million mapped reads (RPKM), fragments per kilobase of transcript per million mapped reads (FPKM), or transcripts per million (TPM). These values provide information about the relative abundance of each gene in each cell of the sample.
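By way of non-limiting illustration, a gene expression matrix of raw counts, with rows corresponding to genes and columns corresponding to individual cells, may be assembled as sketched below. The sketch assumes the transcriptomic data is available as a list of (cell identifier, gene symbol) records, one per detected transcript; this input layout, the function name, and the example identifiers are hypothetical.

```python
from collections import Counter


def build_expression_matrix(records):
    """records: iterable of (cell_id, gene_symbol) pairs, one per detected transcript.

    Returns (genes, cells, matrix), where matrix[i][j] is the count of
    genes[i] detected in cells[j]. Genes and cells are sorted for a
    reproducible row and column order.
    """
    records = list(records)
    genes = sorted({gene for _, gene in records})
    cells = sorted({cell for cell, _ in records})
    counts = Counter((gene, cell) for cell, gene in records)
    matrix = [[counts[(gene, cell)] for cell in cells] for gene in genes]
    return genes, cells, matrix
```

For example, records of two EPCAM transcripts in a first cell, and one EPCAM transcript and one CD3E transcript in a second cell, yield a matrix with a CD3E row of 0 and 1 and an EPCAM row of 2 and 1.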
[0059] In some embodiments, gene expression matrices may be used to compare the expression levels of genes between different cells or conditions to identify genes that are significantly upregulated or downregulated. In some embodiments, gene expression matrices are used for clustering analysis, where genes or cells with similar expression patterns are grouped together. In some embodiments, gene expression matrices may be used to construct coexpression networks that reveal interactions and relationships among genes. Visualization techniques, such as heatmaps and dendrograms, help reveal patterns and relationships in the data. In some embodiments, Principal Component Analysis (PCA) and other dimensionality reduction techniques may be used to reduce the dimensionality of the gene expression matrix to highlight the most important patterns in the data, aiding in visualization and interpretation. In some embodiments, the method further comprises displaying, via a graphical user interface, the genes of the at least one subset to a user. In some embodiments, gene expression matrices may be used as input data for machine learning algorithms to build predictive models, such as classifying samples based on their expression profiles.
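By way of non-limiting illustration, dimensionality reduction of a gene expression matrix by Principal Component Analysis may be sketched as follows for the first principal component only. The sketch assumes a cells-by-genes layout (one row per cell) and uses power iteration on the sample covariance matrix; power iteration is substituted here for brevity and is not asserted to be the technique used in the disclosure. The function name and iteration count are hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """rows: one list of gene values per cell (cells x genes).

    Returns (direction, scores): the unit-length first principal direction
    and each cell's projection onto it. Note: power iteration can fail to
    converge to the leading component if the starting vector happens to be
    orthogonal to it.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix of the genes (p x p).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores
```

The returned scores give a one-dimensional summary of each cell that may be plotted or used as input to clustering; a numerical library routine would typically be preferred for matrices of realistic size.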
[0060] The method and system disclosed herein may comprise analyzing the gene expression level of the plurality of genes of nearest neighboring cells of the cell. In some embodiments, a number of the nearest neighboring cells can be from about 5 to about 3000. In some embodiments, the number of the nearest neighboring cells can be from about 5 to about 3000, from about 5 to about 2500, from about 5 to about 2000, from about 5 to about 1500, from about 5 to about 1200, from about 5 to about 1000, from about 5 to about 800, from about 5 to about 600, from about 5 to about 500, from about 5 to about 400, from about 5 to about 300, from about 5 to about 200, from about 5 to about 100, from about 5 to about 50, or from about 5 to about 10. In some embodiments, the number of the nearest neighboring cells can be from about 10 to about 3000, from about 10 to about 2500, from about 10 to about 2000, from about 10 to about 1500, from about 10 to about 1200, from about 10 to about 1000, from about 10 to about 800, from about 10 to about 600, from about 10 to about 500, from about 10 to about400, from about 10 to about 300, from about 10 to about 200, from about lO to about 100, orfrom about lO to about 50. In some embodiments, the number of the nearest neighboring cells can be from about 50 to about 3000, from about 50 to about 2500, from about 50 to about 2000, from about 50 to about 1500, from about 50 to about 1200, from about 50 to about 1000, from about 50 to about 800, from about 5 Oto about 600, from about 50 to about 500, from about 50 to about400, from about 50 to about 300, from about 50 to about 200, or from about 50 to about 100. 
In some embodiments, the number of the nearest neighboring cells can be from about 100 to about 3000, from about 100 to about 2500, from about 100 to about 2000, from about lOOto about 1500, from about lOOto about 1200, from about 100 to about 1000, from about lOOto about 800, from about 100 to about 600, from about 100 to about 500, from about 100 to about 400, from about 100 to about 300, or from about 100 to about 200. In some embodiments, the number of the nearest neighboring cells can be from about 200 to about 3000, from about 200 to about 2500, from about 200 to about 2000, from about 200 to about 1500, from about 200 to about 1200, from about 200 to about 1000, from about 200 to about 800, from about 200 to about 600, from about 200 to about 500, from about 200 to about 400, or from about 200 to about 300. In some embodiments, the number of the nearest neighboring cells can be from about 300 to about 3000, from about 300to about2500, from about 300 to about2000, from about 300 to about 1500, from about 300to about 1200, from about 300 to about 1000, from about 300 to about 800, from about 300 to about 600, from about 300 to about 500, or from about 300 to about 400. In some embodiments, the number of the nearest neighboring cells can be from about 400 to about 3000, from about 400 to about 2500, from about 400 to about 2000, from about 400 to about 1500, from about 400 to about 1200, from about 400 to about 1000, from about 400 to about 800, from about 400 to about 600, or from about 400 to about 500. In some embodiments, the number of the nearest neighboring cells can be from about 500 to about 3000, from about 500 to about 2500, from about 500 to about 2000, from about 500 to about 1500, from about 500to about 1200, from about 500 to about 1000, from about 500 to about 800, or from about 500 to about 600. 
In some embodiments, the number of the nearest neighboring cells can be from about 600 to about 3000, from about 600 to about 2500, from about 600 to about 2000, from about 600 to about 1500, from about 600 to about 1200, from about 600 to about 1000, or from about 600 to about 800. In some embodiments, the number of the nearest neighboring cells can be from about 800 to about 3000, from about 800 to about 2500, from about 800 to about 2000, from about 800 to about 1500, from about 800 to about 1200, or from about 800 to about 1000. In some embodiments, the number of the nearest neighboring cells can be from about 1000 to about 3000, from about 1000 to about 2500, from about 1000 to about 2000, from about 1000 to about 1500, or from about 1000 to about 1200. In some embodiments, the number of the nearest neighboring cells can be from about 1200 to about 3000, from about 1200 to about 2500, from about 1200 to about 2000, or from about 1200 to about 1500. In some embodiments, the number of the nearest neighboring cells can be from about 1500 to about 3000, from about 1500 to about 2500, or from about 1500 to about 2000. In some embodiments, the number of the nearest neighboring cells can be from about 2000 to about 3000, or from about 2000 to about 2500. In some embodiments, the number of the nearest neighboring cells can be from about 2500 to about 3000. In some embodiments, the number of the nearest neighboring cells can be at most about 3000, at most about 2500, at most about 2000, at most about 1500, at most about 1200, at most about 1000, at most about 800, at most about 600, at most about 500, at most about 400, at most about 300, at most about 200, at most about 100, at most about 50, at most about 10, or at most about 5.
In some embodiments, the number of the nearest neighboring cells can be at least about 5, at least about 10, at least about 50, at least about 100, at least about 200, at least about 300, at least about 400, at least about 500, at least about 600, at least about 800, at least about 1000, at least about 1200, at least about 1500, at least about 2000, at least about 2500, or at least about 3000. In some embodiments, the number of the nearest neighboring cells can be about 5, about 10, about 50, about 100, about 200, about 300, about 400, about 500, about 600, about 800, about 1000, about 1200, about 1500, about 2000, about 2500, or about 3000.
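By way of non-limiting illustration, the selection of a cell's nearest neighboring cells from XY coordinates might be sketched as follows. The function name `nearest_neighbors`, the parameter `k`, the input format, and the toy coordinates are assumptions made for this example only. A brute-force search is used for clarity.

```python
import math


def nearest_neighbors(coords, k):
    """Return, for each cell, the indices of its k nearest other cells.

    coords: list of (x, y) tuples, one per cell (hypothetical input format).
    k: number of nearest neighboring cells to keep for each cell.
    """
    neighbors = []
    for i, p in enumerate(coords):
        # Distances from cell i to every other cell, sorted ascending.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(coords) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


# Toy coordinates for four cells (illustrative values only).
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
knn = nearest_neighbors(coords, k=2)
```

The brute-force search scales quadratically with the number of cells. For large images, a spatial index such as a k-d tree would be a natural substitute.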
Environment confounder matrix
[0061] The method and system disclosed herein may comprise analyzing, by the computer processor, the location data and the transcriptomic data to generate an environment confounder matrix, wherein the environment confounder matrix characterizes each cell of the plurality of cells by one or more environment variables of a region adjacent to or surrounding that cell in the multi-dimensional image, and wherein the one or more environment variables are based at least in part on (i) cell classification of one or more cells in at least the region or (ii) measurement artifact of the transcriptomic data in at least the region.
[0062] An environment confounder matrix can be used to summarize traits of a cell’s neighbors. In some embodiments, a single row of the environment confounder matrix would hold values describing, for a single cell, how many cells of each classification are in its neighborhood, for example, how many immune cells are in its neighborhood. In some embodiments, the immune cells may comprise B-cells, T-cells, Natural Killer (NK) cells, dendritic cells, macrophages, neutrophils, monocytes, or microglia. In some embodiments, the cell classification can comprise one or more members selected from the group comprising salivary gland mucous cells, salivary gland serous cells, von Ebner's gland cells, mammary gland cells, lacrimal gland cells, ceruminous gland cells, eccrine sweat gland dark cells, eccrine sweat gland clear cells, apocrine sweat gland cells, gland of Moll cells, sebaceous gland cells, Bowman's gland cells, Brunner's gland cells, seminal vesicle cells, prostate gland cells, bulbourethral gland cells, Bartholin's gland cells, gland of Littre cells, uterus endometrium cells, isolated goblet cells, stomach lining mucous cells, gastric gland zymogenic cells, gastric gland oxyntic cells, pancreatic acinar cells, Paneth cells, type II pneumocytes, Clara cells, somatotropes, lactotropes, thyrotropes, gonadotropes, corticotropes, intermediate pituitary cells, magnocellular neurosecretory cells, gut cells, respiratory tract cells, thyroid epithelial cells, parafollicular cells, parathyroid gland cells, parathyroid chief cell, oxyphil cell, adrenal gland cells, chromaffin cells, Leydig cells, theca interna cells, corpus luteum cells, granulosa lutein cells, theca lutein cells, juxtaglomerular cell, macula densa cells, peripolar cells, mesangial cell, blood vessel and lymphatic vascular endothelial fenestrated cells, blood vessel and lymphatic vascular endothelial continuous cells, blood vessel and lymphatic vascular endothelial splenic cells, synovial cells, serosal cell, 
squamous cells, columnar cells, dark cells, vestibular membrane cell, stria vascularis basal cells, stria vascularis marginal cell, cells of Claudius, cells of Boettcher, choroid plexus cells, pia-arachnoid squamous cells, pigmented ciliary epithelium cells, nonpigmented ciliary epithelium cells, corneal endothelial cells, peg cells, respiratory tract ciliated cells, oviduct ciliated cell, uterine endometrial ciliated cells, rete testis ciliated cells, ductulus efferent ciliated cells, ciliated ependymal cells, epidermal keratinocytes, epidermal basal cells, keratinocytes of fingernails and toenails, nail bed basal cells, medullary hair shaft cells, cortical hair shaft cells, cuticular hair shaft cells, cuticular hair root sheath cells, hair root sheath cells of Huxley's layer, hair root sheath cells of Henle's layer, external hair root sheath cells, hair matrix cells, surface epithelial cells of stratified squamous epithelium, basal cell of epithelia, urinary epithelium cells, auditory inner hair cells of organ of Corti, auditory outer hair cells of organ of Corti, basal cells of olfactory epithelium, cold-sensitive primary sensory neurons, heat-sensitive primary sensory neurons, Merkel cells of epidermis, olfactory receptor neurons, pain-sensitive primary sensory neurons, photoreceptor rod cells, photoreceptor blue-sensitive cone cells, photoreceptor green-sensitive cone cells, photoreceptor red-sensitive cone cells, proprioceptive primary sensory neurons, touch-sensitive primary sensory neurons, type I carotid body cells, type II carotid body cell, type I hair cell of vestibular apparatus of ear, type II hair cells of vestibular apparatus of ear, type I taste bud cells, cholinergic neural cells, adrenergic neural cells, peptidergic neural cells, inner pillar cells of organ of Corti, outer pillar cells of organ of Corti, inner phalangeal cells of organ of Corti, outer phalangeal cells of organ of Corti, border cells of organ of Corti, Hensen’s cells of organ of 
Corti, vestibular apparatus supporting cells, taste bud supporting cells, olfactory epithelium supporting cells, Schwann cells, satellite cells, enteric glial cells, astrocytes, neurons, oligodendrocytes, spindle neurons, anterior lens epithelial cells, crystallin-containing lens fiber cells, hepatocytes, adipocytes, white fat cells, brown fat cells, liver lipocytes, kidney glomerulus parietal cells, kidney glomerulus podocytes, kidney proximal tubule brush border cells, loop of Henle thin segment cells, kidney distal tubule cells, kidney collecting duct cells, type I pneumocytes, pancreatic duct cells, nonstriated duct cells, duct cells, intestinal brush border cells, exocrine gland striated duct cells, gall bladder epithelial cells, ductulus efferent nonciliated cells, epididymal principal cells, epididymal basal cells, ameloblast epithelial cells, planum semilunatum epithelial cells, organ of Corti interdental epithelial cells, loose connective tissue fibroblasts, corneal keratocytes, tendon fibroblasts, bone marrow reticular tissue fibroblasts, nonepithelial fibroblasts, pericytes, nucleus pulposus cells, cementoblast/cementocytes, odontoblasts, odontocytes, hyaline cartilage chondrocytes, fibrocartilage chondrocytes, elastic cartilage chondrocytes, osteoblasts, osteocytes, osteoclasts, osteoprogenitor cells, hyalocytes, stellate cells, hepatic stellate cells, pancreatic stellate cells, red skeletal muscle cells, white skeletal muscle cells, intermediate skeletal muscle cells, nuclear bag cells of muscle spindle, nuclear chain cells of muscle spindle, satellite cells, ordinary heart muscle cells, nodal heart muscle cells, Purkinje fiber cells, smooth muscle cells, myoepithelial cells of iris, myoepithelial cell of exocrine glands, reticulocytes, megakaryocytes, monocytes, connective tissue macrophages, epidermal Langerhans cells, dendritic cells, microglial cells, neutrophils, eosinophils, basophils, mast cell, helper T cells, suppressor T cells, cytotoxic T 
cell, natural Killer T cells, B cells, natural killer cells, melanocytes, retinal pigmented epithelial cells, oogonia/oocytes, spermatids, spermatocytes, spermatogonium cells, spermatozoa, ovarian follicle cells, Sertoli cells, thymus epithelial cell, or interstitial kidney cells. In some embodiments, the cells comprise embryonic stem cells, embryonic germ cells, induced pluripotent stem cells, mesenchymal stem cells, bone marrow-derived mesenchymal stem cells, bone marrow-derived mesenchymal stromal cells, tissue plastic-adherent placental stem cells (PDACs), umbilical cord stem cells, amniotic fluid stem cells, amnion derived adherent cells (AMDACs), osteogenic placental adherent cells (OPACs), adipose stem cells, limbal stem cells, dental pulp stem cells, myoblasts, endothelial progenitor cells, neuronal stem cells, exfoliated teeth derived stem cells, hair follicle stem cells, dermal stem cells, parthenogenically derived stem cells, reprogrammed stem cells, amnion derived adherent cells, and side population stem cells.
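By way of non-limiting illustration, the rows of an environment confounder matrix holding per-classification neighbor counts might be assembled as sketched below. The function name `confounder_matrix`, the three example classifications, and the neighbor index lists are hypothetical values chosen for this example.

```python
from collections import Counter


def confounder_matrix(cell_types, neighbors, classes):
    """Build one row per cell: counts of each classification among its neighbors.

    cell_types: classification label of every cell.
    neighbors: for each cell, the indices of the cells in its neighborhood.
    classes: ordered list of classifications defining the matrix columns.
    """
    rows = []
    for nbrs in neighbors:
        counts = Counter(cell_types[j] for j in nbrs)
        rows.append([counts.get(c, 0) for c in classes])
    return rows


# Hypothetical toy inputs: four cells, two neighbors each.
cell_types = ["T cell", "B cell", "macrophage", "T cell"]
neighbors = [[1, 2], [0, 3], [0, 1], [1, 2]]
classes = ["T cell", "B cell", "macrophage"]
matrix = confounder_matrix(cell_types, neighbors, classes)
```

Each row then describes one cell's neighborhood. For example, the second row here records two T cells and no B cells or macrophages.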
[0063] In some embodiments, transcriptomic data may comprise measurement artifact. In some embodiments, the measurement artifact can comprise detection data via a synthetic control probe sequence. In some embodiments, negative control probes for “ERCC” sequences may be used. These probes bind nothing in any known genome and may be used to measure background in the system. In some embodiments, negative control barcodes or “false-codes” may be used. The barcode sequences may not be generated by any physical probe and may perform as a component of background.
[0064] In the method and system described herein, the one or more environment variables can be based at least in part on one or more members selected from the group comprising (i) a number of cells having a cell classification of interest in at least the region of the multi-dimensional image, (ii) a number of different cell classifications identified in at least the region of the multi-dimensional image, (iii) a ratio between numbers of cells of two different cell classifications of interest in at least the region of the multi-dimensional image, and (iv) a relative location between a cell having a cell classification of interest and a tissue substructure in at least a portion of the multi-dimensional image.
[0065] In some embodiments, the region of the multi-dimensional image may be characterized by having at most about 200 cells, at most about 180 cells, at most about 160 cells, at most about 140 cells, at most about 120 cells, at most about 100 cells, at most about 80 cells, at most about 60 cells, at most about 40 cells, at most about 30 cells, at most about 20 cells, at most about 10 cells, or at most about 5 cells.
[0066] In some embodiments, the region of the multi-dimensional image may be characterized by pixels in spatial coordinates. In some embodiments, the region may be at least about 1, at least about 2, at least about 5, at least about 10, at least about 15, at least about 20, at least about 30, at least about 40, at least about 50, at least about 60, at least about 80, at least about 100, at least about 120, at least about 150, or at least about 200 pixels in at least one or more dimensions of X, Y, or Z of the multi-dimensional image. In some embodiments, the region may be at most about 200, at most about 150, at most about 120, at most about 100, at most about 80, at most about 60, at most about 50, at most about 40, at most about 30, at most about 20, at most about 15, at most about 10, at most about 5, at most about 2, or at most about 1 pixels in at least one or more dimensions of X, Y, or Z of the multi-dimensional image.
[0067] In some embodiments, environment variables further comprise cell type composition of a cell’s neighborhood. The cellular environment in living organisms is complex, comprising various cell types that interact and influence each other's functions. The composition of cell types in a neighborhood can vary significantly depending on the tissue or organ being considered. The proportions of these cell types within a cell's neighborhood can vary based on factors like developmental stage, tissue function, injury, and disease. In some embodiments, environment variables further comprise counts of negative control probes in a cell’s neighborhood. In some embodiments, environment variables further comprise total gene counts in a cell’s neighborhood. The collective number of genes expressed across all the cells within the immediate vicinity of a particular cell within a tissue or organ can be counted. Each cell in an organism's body contains a full set of genes, but not all genes are actively expressed in every cell at all times. The gene expression profile of a cell determines its function and interactions within its microenvironment.
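By way of non-limiting illustration, neighborhood-level totals of negative control probe counts and of total gene counts might be computed as sketched below. The helper name `neighborhood_sum` and all numeric values are hypothetical.

```python
def neighborhood_sum(values, neighbors):
    """Sum a per-cell quantity over each cell's neighborhood."""
    return [sum(values[j] for j in nbrs) for nbrs in neighbors]


# Hypothetical per-cell measurements for four cells.
negprobe_counts = [0, 2, 1, 0]        # negative control probe counts per cell
total_counts = [120, 80, 200, 50]     # total gene counts per cell
neighbors = [[1, 2], [0, 3], [0, 1], [1, 2]]

# One environment variable per quantity, one value per cell.
negprobe_env = neighborhood_sum(negprobe_counts, neighbors)
total_env = neighborhood_sum(total_counts, neighbors)
```

Either vector could serve as a column of the environment confounder matrix. A mean instead of a sum would be an equally plausible choice.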
[0068] In some embodiments, the environment confounder matrix may comprise one or more environment variables. The number of the one or more environment variables in the environment confounder matrix can be from 1 to 50. In some embodiments, the number of the one or more environment variables in the environment confounder matrix can be from about 1 to about 50, from about 1 to about 40, from about 1 to about 30, from about 1 to about 20, from about 1 to about 15, from about 1 to about 10, from about 1 to about 5, or from about 1 to about 2. In some embodiments, the number of the one or more environment variables in the environment confounder matrix can be from about 2 to about 50, from about 2 to about 40, from about 2 to about 30, from about 2 to about 20, from about 2 to about 15, from about 2 to about 10, or from about 2 to about 5. In some embodiments, the number of the one or more environment variables in the environment confounder matrix can be from about 5 to about 50, from about 5 to about 40, from about 5 to about 30, from about 5 to about 20, from about 5 to about 15, or from about 5 to about 10. In some embodiments, the number of the one or more environment variables in the environment confounder matrix can be from about 10 to about 50, from about 10 to about 40, from about 10 to about 30, from about 10 to about 20, or from about 10 to about 15. In some embodiments, the number of the one or more environment variables in the environment confounder matrix can be from about 15 to about 50, from about 15 to about 40, from about 15 to about 30, or from about 15 to about 20. In some embodiments, the number of the one or more environment variables in the environment confounder matrix can be from about 20 to about 50, from about 20 to about 40, or from about 20 to about 30. In some embodiments, the number of the one or more environment variables in the environment confounder matrix can be from about 30 to about 50, or from about 30 to about 40.
In some embodiments, the number of the one or more environment variables in the environment confounder matrix can be from about 40 to about 50. In some embodiments, the number of the one or more environment variables in the environment confounder matrix can be at least about 1, at least about 2, at least about 5, at least about 10, at least about 15, at least about 20, at least about 30, at least about 40, or at least about 50. In some embodiments, the method and system described herein further comprise receiving selection of the one or more environment variables from a user via a graphical user interface.
Flow chart of SPARC
[0069] The method and system described herein comprise determining, by the computer processor, a correlation between the plurality of genes of the gene expression matrix conditional on the environment confounder matrix; and identifying, by the computer processor and based on the correlation determined, at least one subset of the plurality of genes, wherein genes of the at least one subset can be mutually correlated to one another in the biological sample.
[0070] As shown in Fig. 3A, a non-limiting example of a SPARC workflow comprises four operations: generating spatial correlation conditional on cell type and other confounding variables, deriving gene modules from the conditional correlation matrix, scoring module activity, and estimating the role of each cell type in each gene/module.
[0071] As illustrated in Operation 1 of Fig. 3A, spatial correlation conditional on cell type and other confounding variables is generated. In some embodiments, at least one or more of gene expression matrix, environment confounding variables, and/or cell coordinates (i.e., XY positions) are gathered as input. A cell nearest neighbor network can then be defined.
An environment expression matrix can be computed, to include the average expression profile of one cell’s neighbors in a single row. Correlation of the environment expression matrix conditional on the environment confounding matrix can be computed and exported, as well as the neighbor network. As shown in Fig. 3B, a non-limiting example of Operation 1, in some embodiments, cell coordinates from the image (i.e., XY positions) are used to yield neighbor relationships. Single cell expression environment matrix and the generated neighbor relationships are used to generate the environment expression matrix, which can comprise expression around each cell’s neighborhood, as illustrated in Fig. 1C. Further, neighbor relationships, cell type assignments, and other environment variables are used to generate the environment confounding matrix. In some embodiments, the environment confounding matrix can comprise the average value of confounding variables in each cell’s neighborhood. In some embodiments, a single row of the environment confounder matrix (as shown in Fig. 1E) would hold values describing, for a single cell, how many cells of each classification are in its neighborhood. In some embodiments, conditional covariance can be generated, based on covariances of the environment expression matrix and the confounding environment matrix: cov(X|Y) = cov(X) - cov(X,Y) cov(Y)^(-1) cov(Y,X), where X can be the environment expression matrix, Y can be the confounding environment matrix, and cov(Y)^(-1) denotes the matrix inverse of cov(Y). The calculated conditional covariance can be rescaled to have unit diagonal, as illustrated in Fig. 1G.
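By way of non-limiting illustration, the neighborhood averaging and the conditional covariance cov(X|Y) = cov(X) - cov(X,Y) cov(Y)^(-1) cov(Y,X) might be sketched with NumPy as follows. The function names, the use of a pseudo-inverse, and the simulated data are assumptions made for this example and are not taken from the disclosure.

```python
import numpy as np


def environment_matrix(values, neighbors):
    """Average each column of `values` over every cell's neighbors (one row per cell)."""
    values = np.asarray(values, dtype=float)
    return np.array([values[nbrs].mean(axis=0) for nbrs in neighbors])


def conditional_correlation(X, Y):
    """Covariance of X conditional on Y, rescaled to have unit diagonal.

    X: cells x genes environment expression matrix.
    Y: cells x variables environment confounder matrix.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    p = X.shape[1]
    joint = np.cov(np.hstack([X, Y]), rowvar=False)
    cov_xx, cov_xy, cov_yy = joint[:p, :p], joint[:p, p:], joint[p:, p:]
    # Assumption: a pseudo-inverse is used so collinear confounders do not fail.
    cond = cov_xx - cov_xy @ np.linalg.pinv(cov_yy) @ cov_xy.T
    sd = np.sqrt(np.diag(cond))
    return cond / np.outer(sd, sd)


# Neighborhood averaging on a tiny hypothetical expression matrix (3 cells x 2 genes).
env = environment_matrix([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
                         [[1, 2], [0, 2], [0, 1]])

# Simulated stand-in data: two genes that co-vary only through one confounder.
rng = np.random.default_rng(0)
confounder = rng.normal(size=(2000, 1))
X = 3.0 * confounder + rng.normal(size=(2000, 2))
marginal = np.corrcoef(X, rowvar=False)
conditional = conditional_correlation(X, confounder)
```

In this simulation the two genes are strongly correlated marginally, while their conditional correlation is close to zero. This is the behavior conditioning on the confounder is intended to produce.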
[0072] In the method and system described herein, determining the correlation can comprise analyzing covariance of the gene expression matrix conditional on the environment confounder matrix. The method and system described herein further comprise determining a degree of the correlation and identifying the at least one subset based on the degree of correlation. In some embodiments, the identifying can be based at least in part on determining a threshold level of the degree of correlation. In some embodiments, the at least one subset can comprise a plurality of subsets that are different from one another. In some embodiments, the plurality of subsets can comprise a first subset having a first plurality of genes and a second subset having a second plurality of genes, wherein the first plurality of genes has at least one gene that is not in common with the second plurality of genes.
[0073] As illustrated in Operation 2 of Fig. 3A, gene modules can be derived from the conditional correlation matrix. In some embodiments, the conditional correlation matrix and the gene expression matrix can be used as input. The conditional correlation matrix is classified by clusters defining gene modules. Then gene weights can be defined for scoring module activity. The module gene membership and module gene weights can be generated. As shown in Fig. 3C, a non-limiting example of Operation 2, correlation from the conditional correlation matrix can be transformed following z = x^2 · 1(x > 0.1), where x is the original conditional correlation and 1(·) is the indicator function. In some embodiments, the original conditional correlation is transformed to the square of the original value when the original value is greater than 0.1 (as shown in Fig. 1F). In some embodiments, the original conditional correlation is rounded to zero when the original conditional correlation is less than 0.1. In some embodiments, the transformed correlation is used to define a network and/or graph with edge weights. Clustering is performed to generate gene classification defining membership. In some embodiments, classification is performed based on one or more methods comprising K-Means clustering, hierarchical clustering, community detection in networks, Leiden clustering, graph partitioning algorithms, or dimensionality reduction and clustering. Then gene weights can be calculated for a given module. In some embodiments, weight of each gene = (mean neighborhood expression of the gene)^(-1/2). The resulting values of weights can be rescaled to sum to 1.
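By way of non-limiting illustration, the correlation transform z = x^2 · 1(x > 0.1) and the inverse-square-root gene weights might be sketched as follows. The function names and the toy values are hypothetical, and the clustering step itself is not shown.

```python
def transform_correlation(corr, threshold=0.1):
    """Square correlations above the threshold and set all others to zero."""
    return [[x * x if x > threshold else 0.0 for x in row] for row in corr]


def gene_weights(mean_env_expr):
    """Weight = (mean neighborhood expression)^(-1/2), rescaled to sum to 1."""
    raw = [m ** -0.5 for m in mean_env_expr]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical conditional correlation matrix for three genes.
corr = [
    [1.0, 0.5, 0.05],
    [0.5, 1.0, -0.3],
    [0.05, -0.3, 1.0],
]
edge_weights = transform_correlation(corr)

# Hypothetical mean neighborhood expression of two module genes.
weights = gene_weights([4.0, 1.0])
```

The transformed matrix can be read as a weighted adjacency matrix. Weak and negative correlations contribute no edge, and the more highly expressed gene receives the smaller weight.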
[0074] The method and system described herein further comprise determining a degree of the correlation and identifying the at least one subset based on the degree of correlation. In some embodiments, the method and system described herein further comprise displaying, via a graphical user interface, a graphical representation of the additional degree of correlation. In some embodiments, the identifying is based at least in part on determining a threshold level of the degree of correlation.
[0075] The method and system described herein further comprise generating a gene cluster map comprising a plurality of shapes representing the plurality of genes. In some embodiments, the plurality of shapes is arranged in a plurality of clusters, and a cluster of the plurality of clusters corresponds to the at least one subset. In some embodiments, the method and system described herein further comprise displaying, via a graphical user interface, the gene cluster map to a user.
[0076] As illustrated in Operation 3 of Fig. 3A, scores for module activity can be generated. In some embodiments, input data can comprise one or more of neighbor network, module gene weights, and/or gene expression matrix. In some embodiments, the neighbor network may come from the output of Operation 1. In some embodiments, the module gene weights may come from the output of Operation 2. In some embodiments, scores for a single cell module can be computed by taking an average of single cell expression of module genes. In some embodiments, the average may be weighted average, arithmetic average, geometric mean, harmonic mean, median, or quadratic mean. In some embodiments, scores for an environment module can be computed by taking an average of environment expression of module genes. In some embodiments, the average may be weighted average, arithmetic average, geometric mean, harmonic mean, median, or quadratic mean. As shown in Fig. 3D, a non-limiting example of Operation 3, scores for module activity are generated. For a given module, gene weights can be calculated. In some embodiments, the weight of each gene is defined as weight = (mean neighborhood expression of the gene)^(-1/2). All weights can be rescaled to sum to 1. In some embodiments, the mean may be weighted average, arithmetic average, geometric mean, harmonic mean, median, or quadratic mean. In some embodiments, scores for a single cell module are calculated based on the single cell expression matrix and single cell module weights. In some embodiments, scores for an environment module can be calculated based on the environment expression matrix (output from Operation 1) and environment expression weights.
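By way of non-limiting illustration, module activity scores computed as a weighted average of module-gene expression might be sketched as follows. The function name `module_scores`, the gene indices, and all numeric values are hypothetical.

```python
def module_scores(expr, gene_idx, weights):
    """Weighted average of module-gene expression for every row of `expr`.

    expr: rows are cells; columns are genes (single cell or environment expression).
    gene_idx: column indices of the genes belonging to the module.
    weights: one weight per module gene, assumed already rescaled to sum to 1.
    """
    return [sum(w * row[g] for g, w in zip(gene_idx, weights)) for row in expr]


# Hypothetical expression matrix: two cells x three genes.
expr = [
    [4.0, 1.0, 0.0],
    [2.0, 3.0, 9.0],
]
# Module made of genes 0 and 1, with hypothetical weights.
scores = module_scores(expr, gene_idx=[0, 1], weights=[0.25, 0.75])
```

The same function yields single cell module scores when passed a single cell expression matrix, and environment module scores when passed an environment expression matrix.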
[0077] As illustrated in Operation 4 of Fig. 3A, cell type attribution analysis is performed to estimate the role of each cell type in each gene/module. In some embodiments, one or more of scores of single cell module, scores of environment module, and/or gene expression matrix can be used as input. A score for each cell type’s involvement with each module gene is calculated. Given a module gene and a cell type, the correlation of environment scores for the module with neighborhood expression of the gene by the cell type is generated. The role of each cell type in each module is summarized by reporting the maximum value that the above statistic attains for the cell type over the module genes. In some embodiments, for each module, a matrix of cell type vs gene attribution scores is reported. In some embodiments, a single matrix of module vs cell type attribution score is reported. As shown in Fig. 3E, a non-limiting example of Operation 4, the involvement of a single cell type in a single module gene and the involvement of a cell type in the module across all genes can be scored. In some embodiments, one or more of neighbor relationships, single cell expression levels, and/or cell types can be used as input, to generate cell type specific environment expression, comprising total expression of the gene by the cell type in each cell’s neighborhood. The correlation is generated between cell type specific environment expression and cells’ environment scores for the module. In some embodiments, the correlation may be based on Pearson Correlation, Spearman’s Rank correlation, Kendall’s Tau, Point-Biserial Correlation, Distance Correlation, Partial Correlation or Bivariate Correlation. The attribution score is generated for involvement of the cell type in the gene’s contribution to the module. 
For scoring involvement of a cell type in the module across all genes, attribution score for a cell type in a module is calculated by taking the maximum of attribution score of the cell type in the module genes.
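By way of non-limiting illustration, attribution scores based on Pearson correlation might be sketched as follows. The function names, the two-letter cell type labels, the toy data, and the choice to return zero for a constant vector are assumptions made for this example.

```python
import math


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either vector is constant (assumption)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def celltype_env_expression(gene_expr, cell_types, neighbors, cell_type):
    """Total expression of one gene by one cell type in each cell's neighborhood."""
    return [
        sum(gene_expr[j] for j in nbrs if cell_types[j] == cell_type)
        for nbrs in neighbors
    ]


def attribution(expr_by_gene, cell_types, neighbors, env_scores, cell_type):
    """Per-gene attribution scores and their maximum over the module genes."""
    per_gene = {
        gene: pearson(
            celltype_env_expression(vals, cell_types, neighbors, cell_type),
            env_scores,
        )
        for gene, vals in expr_by_gene.items()
    }
    return per_gene, max(per_gene.values())


# Hypothetical toy data: four cells, two module genes.
cell_types = ["T", "T", "B", "B"]
neighbors = [[1, 2], [0, 3], [0, 3], [1, 2]]
expr_by_gene = {"A": [1.0, 3.0, 5.0, 7.0], "B": [2.0, 2.0, 0.0, 0.0]}
env_scores = [3.0, 1.0, 1.0, 3.0]  # environment module score of each cell

per_gene, module_attribution = attribution(
    expr_by_gene, cell_types, neighbors, env_scores, cell_type="T"
)
```

Repeating the call for every cell type would give one row per cell type of the cell type vs gene attribution matrix, and the per-module maxima would populate the module vs cell type matrix.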
[0078] The method and system described herein further comprise scoring each cell of the plurality of cells based on single cell expression level of the genes of the at least one subset. In some embodiments, the method and system described herein further comprise generating an additional multi-dimensional image of the biological sample based on the scoring.
Cluster and trained algorithm
[0079] The method and system described herein comprise clustering analysis for gene expression matrices. The method and system described herein further comprise classifying conditional correlation matrices by clusters defining gene modules. The methods include using a trained classifier or algorithm to analyze sample data, particularly to perform a binary classification of gene expression. Clustering is widely used for community detection in network analysis, particularly in the field of single-cell RNA sequencing (scRNA-seq) data analysis. It is used to identify distinct clusters or communities of cells based on their gene expression profiles.
[0080] In some embodiments, clustering algorithms consider both the feature values and the spatial proximity of the objects or features within an image of a biological sample. Examples of clustering algorithms commonly used in geospatial analysis include k-means clustering, hierarchical clustering, DBSCAN (Density-Based Spatial Clustering of Applications with Noise), or spatially constrained clustering algorithms. The resulting clusters can be evaluated and validated using appropriate metrics to assess the quality and significance of the clustering results.
[0081] Applying clustering to gene expression data involves grouping genes that have similar expression patterns across different samples or conditions. This process helps to identify coregulated genes that might be involved in similar biological processes, pathways, or functions. Clustering can reveal hidden relationships within gene expression data and provide insights into the underlying biology.
[0082] Applying clustering to gene expression data is generally performed by the following methodology:
[0083] Data Collection: gene expression data is collected using techniques like microarrays or RNA sequencing (RNA-seq). Each row represents a gene, and each column represents a sample (e.g., different cells, tissues, conditions, time points).
[0084] Data Preprocessing: the data is normalized to account for differences in library sizes and other technical factors. In some cases, methods comprise TPM (transcripts per million) normalization for RNA-seq data.
[0085] Feature Selection: in some cases, a subset of informative genes is selected if the dataset is large. This can reduce noise and computational complexity.
[0086] Distance Metric: a distance metric is chosen to quantify the similarity or dissimilarity between genes based on their expression profiles. In some cases, metrics comprise Euclidean distance, Pearson correlation, or cosine similarity.
[0087] Clustering Algorithm Selection: a clustering algorithm is chosen based on the dataset's characteristics. In some embodiments, algorithms comprise hierarchical clustering, K-means clustering, and more advanced methods like DBSCAN (Density-Based Spatial Clustering of Applications with Noise) or agglomerative clustering.
[0088] Clustering Execution: The chosen clustering algorithm is applied to preprocessed gene expression data. The algorithm will group genes with similar expression patterns into clusters.
[0089] Determine Cluster Number: in some embodiments, algorithms may require specification of the number of clusters beforehand (like K-means), while others (like hierarchical clustering) can produce a dendrogram to be cut to determine the number of clusters.
[0090] Visualization: visualizations of the clustering results are created to better understand the relationships between gene expression patterns. In some embodiments, visualization can comprise heatmaps, dendrogram plots, UMAP, or t-SNE plots.
[0091] Cluster Analysis: clustering results are interpreted to provide insights into the roles of coexpressed genes.
[0092] Validation: depending on the algorithm, the quality of the clusters may need to be validated. Internal validation methods like silhouette score or external validation methods like biological annotation can help assess the biological relevance of the clusters.
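By way of non-limiting illustration, the preprocessing, distance metric, and clustering steps listed above might be sketched with NumPy as follows. The function name `cluster_genes`, the per-million library-size scaling, the distance cutoff, and the toy counts are assumptions made for this example. The clustering shown links genes whose correlation distance falls below a cutoff and takes connected components, which corresponds to cutting a single-linkage hierarchical clustering at that height.

```python
import numpy as np


def cluster_genes(counts, max_distance=0.5):
    """Group genes (rows) with similar expression profiles across samples (columns)."""
    counts = np.asarray(counts, dtype=float)
    # Preprocessing: scale every sample to a common library size (per million).
    norm = counts / counts.sum(axis=0, keepdims=True) * 1e6
    # Distance metric: 1 - Pearson correlation between gene profiles.
    dist = 1.0 - np.corrcoef(norm)

    # Clustering: union-find over all gene pairs closer than max_distance.
    n = len(dist)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i, j] <= max_distance:
                parent[find(j)] = find(i)

    # Relabel cluster roots as 0, 1, 2, ... in order of first appearance.
    remap = {}
    return [remap.setdefault(find(i), len(remap)) for i in range(n)]


# Hypothetical counts: four genes x four samples, two opposing trends.
counts = [
    [10, 20, 30, 40],
    [12, 22, 33, 41],
    [40, 30, 20, 10],
    [41, 33, 22, 12],
]
labels = cluster_genes(counts)
```

Here the two rising genes fall into one cluster and the two falling genes into another. In practice, the cutoff and the clustering method would be selected and validated as described in the steps above.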
[0093] In supervised learning approaches, a group of samples from two or more groups can be analyzed with a statistical classification method. Differential gene or nucleic acid level data can be discovered that can be used to build a classifier that differentiates between the two or more groups. A new sample can then be analyzed so that the classifier can associate the new sample with one of the two or more groups. Commonly used supervised classifiers include without limitation the neural network (multi-layer perceptron), support vector machines, k-nearest neighbors, Gaussian mixture model, Gaussian, naive Bayes, decision tree and radial basis function (RBF) classifiers. Linear classification methods include Fisher's linear discriminant, LDA, logistic regression, naive Bayes classifier, perceptron, and support vector machines (SVMs). Other classifiers for use with the invention include quadratic classifiers, k-nearest neighbor, boosting, decision trees, random forests, neural networks, pattern recognition, Elastic Net, Golub Classifier, Parzen-window, Iterative RELIEF, Classification Tree, Maximum Likelihood Classifier, Nearest Centroid, Prediction Analysis of Microarrays (PAM), Fuzzy C-Means Clustering, Bayesian networks and Hidden Markov models. One of skill will appreciate that these or other classifiers, including improvements of any of these, can be contemplated within the scope of the invention, as well as combinations of any of the foregoing.
[0094] In some embodiments, the methods described herein perform a binary classification of gene expression with at least about 70%, at least about 72%, at least about 74%, at least about 76%, at least about 78%, at least about 80%, at least about 82%, at least about 84%, at least about 86%, at least about 88%, at least about 90%, at least about 92%, at least about 94%, at least about 96%, at least about 98%, or at least about 100% sensitivity. In some embodiments, the methods described herein perform a binary classification of gene expression with at least about 70%, at least about 72%, at least about 74%, at least about 76%, at least about 78%, at least about 80%, at least about 82%, at least about 84%, at least about 86%, at least about 88%, at least about 90%, at least about 92%, at least about 94%, at least about 96%, at least about 98%, or at least about 100% specificity. In some embodiments, the methods described herein perform a binary classification of gene expression with at least about 70%, at least about 72%, at least about 74%, at least about 76%, at least about 78%, at least about 80%, at least about 82%, at least about 84%, at least about 86%, at least about 88%, at least about 90%, at least about 92%, at least about 94%, at least about 96%, at least about 98%, or at least about 100% accuracy.
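For reference, sensitivity, specificity, and accuracy of a binary classification can be computed from the counts of true and false positives and negatives. The following is a minimal sketch; the function name and the example labels are hypothetical and do not represent results of the methods described herein.

```python
def binary_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")  # true negative rate
    accuracy = (tp + tn) / len(y_true) if len(y_true) else float("nan")
    return sensitivity, specificity, accuracy

# Hypothetical labels: 4 positive samples and 6 negative samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec, acc = binary_metrics(y_true, y_pred)
# sens = 3/4 = 0.75, spec = 5/6, acc = 8/10 = 0.8
```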
[0095] Training of multi-dimensional classifiers (e.g., algorithms) may be performed using numerous samples. For example, training of the multi-dimensional classifier may be performed using at least about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200 or more samples. In some cases, training of the multi-dimensional classifier may be performed using at least about 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 350, 400, 450, 500 or more samples. In some cases, training of the multi-dimensional classifier may be performed using at least about 525, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 2000 or more samples.
[0096] In some embodiments, the methods described herein may comprise training machine learning models. In some embodiments, the trained machine learning models comprise a supervised machine learning model, an unsupervised machine learning model, a deep learning model, or a time-series machine learning model. The trained algorithm may comprise an unsupervised machine learning algorithm. The trained algorithm may comprise a supervised machine learning algorithm. The trained algorithm may comprise a deep learning algorithm. The trained algorithm may comprise a time-series machine learning algorithm. The trained algorithm may comprise a classification and regression tree (CART) algorithm. The supervised machine learning algorithm may comprise, for example, a Random Forest, a support vector machine (SVM), a neural network, or a deep learning algorithm. The trained algorithm may comprise a self-supervised machine learning algorithm. The time-series machine learning algorithm may comprise autoregressive integrated moving average (ARIMA), recurrent neural networks (RNN), convolutional neural networks (CNN), Gaussian processes, long short-term memory networks, gated recurrent unit networks, Hidden Markov Models, or transformer-based models.
[0097] In some embodiments, a machine learning algorithm of a method as described herein utilizes one or more neural networks. In some cases, a neural network is a type of computational system that can learn the relationships between an input dataset and a target dataset. A neural network may be a software representation of a human neural system (e.g., cognitive system), intended to capture “learning” and “generalization” abilities as used by a human. In some embodiments, the machine learning algorithm can comprise a neural network comprising a CNN. Non-limiting examples of structural components of machine learning algorithms described herein include: CNNs, recurrent neural networks, dilated CNNs, fully-connected neural networks, deep generative models, and Boltzmann machines. Total number of learnable or trainable parameters;
[0098] In some embodiments, the neural network can comprise artificial neural networks (ANNs). ANNs may be machine learning algorithms that may be trained to map an input dataset to an output dataset, where the ANN can comprise an interconnected group of nodes organized into multiple layers of nodes. For example, the ANN architecture may comprise at least an input layer, one or more hidden layers, and an output layer. The ANN may comprise any total number of layers, and any number of hidden layers, where the hidden layers function as trainable feature extractors that allow mapping of a set of input data to an output value or set of output values. As used herein, a deep learning algorithm (such as a deep neural network (DNN)) is an ANN comprising a plurality of hidden layers, e.g., two or more hidden layers. Each layer of the neural network may comprise a number of nodes (or “neurons”). A node receives input that comes either directly from the input data or the output of nodes in previous layers, and performs a specific operation, e.g., a summation operation. A connection from an input to a node is associated with a weight (or weighting factor). The node may sum up the products of all pairs of inputs and their associated weights. The weighted sum may be offset with a bias. The output of a node or neuron may be gated using a threshold or activation function. The activation function may be a linear or non-linear function. The activation function may be, for example, a rectified linear unit (ReLU) activation function, a Leaky ReLU activation function, or other function such as a saturating hyperbolic tangent, identity, binary step, logistic, arctan, softsign, parametric rectified linear unit, exponential linear unit, softplus, bent identity, softexponential, sinusoid, sine, Gaussian, or sigmoid function, or any combination thereof.
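The node computation described above (a weighted sum of inputs, offset by a bias and passed through an activation function) can be sketched as follows. This is an illustrative example only; the function names, weights, and layer sizes are hypothetical.

```python
def relu(z):
    """Rectified linear unit activation."""
    return max(0.0, z)

def identity(z):
    """Linear (identity) activation."""
    return z

def node_output(inputs, weights, bias, activation=relu):
    """Weighted sum of the inputs, offset by a bias, gated by an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

def layer_output(inputs, weight_rows, biases, activation=relu):
    """Apply every node of one layer to the same input vector."""
    return [node_output(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]

# A single node: 1.0*0.5 + 2.0*(-0.25) + 0.1 = 0.1, and ReLU leaves it unchanged.
single = node_output([1.0, 2.0], [0.5, -0.25], 0.1)

# A small network: input layer (2 values), one hidden layer (2 nodes), one output node.
hidden = layer_output([1.0, 2.0], [[0.5, -0.25], [-1.0, -1.0]], [0.1, 0.0])
output = layer_output(hidden, [[2.0, 3.0]], [0.0], activation=identity)
```

In this example the second hidden node receives a negative weighted sum, so the ReLU activation gates its output to zero.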
[0099] The weighting factors, bias values, and threshold values, or other computational parameters of the neural network, may be “taught” or “learned” in a training phase using one or more sets of training data. For example, the parameters may be trained using the input data from a training dataset and a gradient descent or backward propagation method so that the output value(s) that the ANN computes can be consistent with the examples included in the training dataset.
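As one hedged illustration of the training phase described above, the following sketch uses gradient descent to learn the weights and bias of a single node with a logistic (sigmoid) activation. The cross-entropy loss, the learning rate, the number of epochs, and the toy training data are assumptions chosen for illustration and are not specified by the disclosure.

```python
import math

def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(samples, targets, lr=0.5, epochs=2000):
    """Batch gradient descent on the cross-entropy loss of one logistic node."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, targets):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # derivative of the loss with respect to the weighted sum
            for k in range(n_features):
                grad_w[k] += err * x[k]
            grad_b += err
        # Step against the average gradient.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict(weights, bias, x):
    """Threshold the node output at 0.5 to obtain a class label."""
    return 1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5 else 0

# Hypothetical training dataset: two features per sample, binary target.
samples = [(0.1, 0.2), (0.3, 0.1), (0.9, 0.8), (0.8, 1.0)]
targets = [0, 0, 1, 1]
weights, bias = train_neuron(samples, targets)
predictions = [predict(weights, bias, x) for x in samples]
```

After training, the outputs computed by the node are consistent with the examples in the training dataset. In a multi-layer network, backward propagation would supply the corresponding gradients for the hidden layers.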
[0100] In some embodiments of a machine learning algorithm as described herein, a machine learning algorithm can comprise a neural network such as a deep CNN. In some embodiments in which a CNN is used, the network is constructed with any number of convolutional layers, dilated layers or fully-connected layers. In some embodiments, the number of convolutional layers is between 1-10 and the dilated layers between 0-10. The total number of convolutional layers (including input and output layers) may be at least about 1, 2, 3, 4, 5, 10, 15, 20, or greater, and the total number of dilated layers may be at least about 1, 2, 3, 4, 5, 10, 15, 20, or greater. The total number of convolutional layers may be at most about 20, 15, 10, 5, 4, 3, or less, and the total number of dilated layers may be at most about 20, 15, 10, 5, 4, 3, or less. In some embodiments, the number of convolutional layers is between 1-10 and the fully-connected layers between 0-10. The total number of convolutional layers (including input and output layers) may be at least about 1, 2, 3, 4, 5, 10, 15, 20, or greater, and the total number of fully-connected layers may be at least about 1, 2, 3, 4, 5, 10, 15, 20, or greater. The total number of convolutional layers may be at most about 20, 15, 10, 5, 4, 3, 2, 1, or less, and the total number of fully-connected layers may be at most about 20, 15, 10, 5, 4, 3, 2, 1, or less.
[0101] Alternatively, an attention mechanism (e.g., a transformer) is applied to mimic the human cognitive process of selectively focusing on relevant information while filtering out irrelevant details. Attention mechanisms may focus on, or “attend to,” certain input regions while ignoring others. This may increase model performance because certain input regions may be less relevant. At each operation, an attention unit can compute a dot product of a context vector and the input at the operation, among other operations. The output of the attention unit may define where the most relevant information in the input sequence is located.
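The dot-product step described above can be sketched as follows. The softmax normalization and the weighted summary vector are common choices that are assumed here for illustration; the function name and the example vectors are likewise hypothetical.

```python
import math

def attention(context, inputs):
    """Score each input against the context vector and normalize the scores."""
    # Dot product of the context vector with the input at each position.
    scores = [sum(c * x for c, x in zip(context, vec)) for vec in inputs]
    # Softmax (assumed normalization), shifted by the maximum for stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted combination of the inputs, dominated by the attended positions.
    dim = len(inputs[0])
    summary = [sum(w * vec[k] for w, vec in zip(weights, inputs)) for k in range(dim)]
    return weights, summary

context = [1.0, 0.0]
inputs = [[0.1, 0.9], [2.0, 0.1], [0.5, 0.5]]
weights, summary = attention(context, inputs)
most_relevant = weights.index(max(weights))  # position the unit attends to most
```

The largest weight indicates where the most relevant information in the input sequence is located, in this example the second input.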
Computing system
[0102] Referring to Fig. 4, a block diagram is shown depicting an exemplary machine that includes a computer system 400 (e.g., a processing or computing system) within which a set of instructions can execute for causing a device to perform or execute any one or more of the aspects and/or methodologies of the present disclosure. The components in Fig. 4 are examples only and do not limit the scope of use or functionality of any hardware, software, embedded logic component, or a combination of two or more such components implementing particular embodiments.
[0103] Computer system 400 may include one or more processors 401, a memory 403, and a storage 408 that communicate with each other, and with other components, via a bus 440. The bus 440 may also link a display 432, one or more input devices 433 (which may, for example, include a keypad, a keyboard, a mouse, a stylus, etc.), one or more output devices 434, one or more storage devices 435, and various tangible storage media 436. All of these elements may interface directly or via one or more interfaces or adaptors to the bus 440. For instance, the various tangible storage media 436 can interface with the bus 440 via storage medium interface 426. Computer system 400 may have any suitable physical form, including but not limited to one or more integrated circuits (ICs), printed circuit boards (PCBs), mobile handheld devices (such as mobile telephones or PDAs), laptop or notebook computers, distributed computer systems, computing grids, or servers.
[0104] Computer system 400 includes one or more processor(s) 401 (e.g., central processing units (CPUs), general purpose graphics processing units (GPGPUs), or quantum processing units (QPUs)) that carry out functions. Processor(s) 401 optionally contain a cache memory unit 402 for temporary local storage of instructions, data, or computer addresses. Processor(s) 401 are configured to assist in execution of computer readable instructions. Computer system 400 may provide functionality for the components depicted in Fig. 4 as a result of the processor(s) 401 executing non-transitory, processor-executable instructions embodied in one or more tangible computer-readable storage media, such as memory 403, storage 408, storage devices 435, and/or storage medium 436. The computer-readable media may store software that implements particular embodiments, and processor(s) 401 may execute the software. Memory 403 may read the software from one or more other computer-readable media (such as mass storage device(s) 435, 436) or from one or more other sources through a suitable interface, such as network interface 420. The software may cause processor(s) 401 to carry out one or more processes or one or more steps of one or more processes described or illustrated herein. Carrying out such processes or steps may include defining data structures stored in memory 403 and modifying the data structures as directed by the software.
[0105] The memory 403 may include various components (e.g., machine readable media) including, but not limited to, a random access memory component (e.g., RAM 404) (e.g., static RAM (SRAM), dynamic RAM (DRAM), ferroelectric random access memory (FRAM), phase-change random access memory (PRAM), etc.), a read-only memory component (e.g., ROM 405), and any combinations thereof. ROM 405 may act to communicate data and instructions unidirectionally to processor(s) 401, and RAM 404 may act to communicate data and instructions bidirectionally with processor(s) 401. ROM 405 and RAM 404 may include any suitable tangible computer-readable media described below. In one example, a basic input/output system 406 (BIOS), including basic routines that help to transfer information between elements within computer system 400, such as during startup, may be stored in the memory 403.
[0106] Fixed storage 408 is connected bidirectionally to processor(s) 401, optionally through storage control unit 407. Fixed storage 408 provides additional data storage capacity and may also include any suitable tangible computer-readable media described herein. Storage 408 may be used to store operating system 409, executable(s) 410, data 411, applications 412 (application programs), and the like. Storage 408 can also include an optical disk drive, a solid-state memory device (e.g., flash-based systems), or a combination of any of the above. Information in storage 408 may, in appropriate cases, be incorporated as virtual memory in memory 403.
[0107] In one example, storage device(s) 435 may be removably interfaced with computer system 400 (e.g., via an external port connector (not shown)) via a storage device interface 425. Particularly, storage device(s) 435 and an associated machine-readable medium may provide non-volatile and/or volatile storage of machine-readable instructions, data structures, program modules, and/or other data for the computer system 400. In one example, software may reside, completely or partially, within a machine-readable medium on storage device(s) 435. In another example, software may reside, completely or partially, within processor(s) 401.
[0108] Bus 440 connects a wide variety of subsystems. Herein, reference to a bus may encompass one or more digital signal lines serving a common function, where appropriate. Bus 440 may be any of several types of bus structures including, but not limited to, a memory bus, a memory controller, a peripheral bus, a local bus, and any combinations thereof, using any of a variety of bus architectures. As an example and not by way of limitation, such architectures include an Industry Standard Architecture (ISA) bus, an Enhanced ISA (EISA) bus, a Micro Channel Architecture (MCA) bus, a Video Electronics Standards Association local bus (VLB), a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCI-X) bus, an Accelerated Graphics Port (AGP) bus, HyperTransport (HTX) bus, serial advanced technology attachment (SATA) bus, and any combinations thereof.
[0109] Computer system 400 may also include an input device 433. In one example, a user of computer system 400 may enter commands and/or other information into computer system 400 via input device(s) 433. Examples of an input device(s) 433 include, but are not limited to, an alpha-numeric input device (e.g., a keyboard), a pointing device (e.g., a mouse or touchpad), a touch screen, a multi-touch screen, a joystick, a stylus, a gamepad, an audio input device (e.g., a microphone, a voice response system, etc.), an optical scanner, a video or still image capture device (e.g., a camera), and any combinations thereof. In some embodiments, the input device is a Kinect, Leap Motion, or the like. Input device(s) 433 may be interfaced to bus 440 via any of a variety of input interfaces (e.g., input interface 423) including, but not limited to, serial, parallel, game port, USB, FIREWIRE, THUNDERBOLT, or any combination of the above.
[0110] In particular embodiments, when computer system 400 is connected to network 430, computer system 400 may communicate with other devices, specifically mobile devices and enterprise systems, distributed computing systems, cloud storage systems, cloud computing systems, and the like, connected to network 430. Communications to and from computer system 400 may be sent through network interface 420. For example, network interface 420 may receive incoming communications (such as requests or responses from other devices) in the form of one or more packets (such as Internet Protocol (IP) packets) from network 430, and computer system 400 may store the incoming communications in memory 403 for processing. Computer system 400 may similarly store outgoing communications (such as requests or responses to other devices) in the form of one or more packets in memory 403, from which they are communicated to network 430 via network interface 420. Processor(s) 401 may access these communication packets stored in memory 403 for processing.
[0111] Examples of the network interface 420 include, but are not limited to, a network interface card, a modem, and any combination thereof. Examples of a network 430 or network segment 430 include, but are not limited to, a distributed computing system, a cloud computing system, a wide area network (WAN) (e.g., the Internet, an enterprise network), a local area network (LAN) (e.g., a network associated with an office, a building, a campus or other relatively small geographic space), a telephone network, a direct connection between two computing devices, a peer-to-peer network, and any combinations thereof. A network, such as network 430, may employ a wired and/or a wireless mode of communication. In general, any network topology may be used.
[0112] Information and data can be displayed through a display 432. Examples of a display 432 include, but are not limited to, a cathode ray tube (CRT), a liquid crystal display (LCD), a thin film transistor liquid crystal display (TFT-LCD), an organic light-emitting diode (OLED) display such as a passive-matrix OLED (PMOLED) or active-matrix OLED (AMOLED) display, a plasma display, and any combinations thereof. The display 432 can interface to the processor(s) 401, memory 403, and fixed storage 408, as well as other devices, such as input device(s) 433, via the bus 440. The display 432 is linked to the bus 440 via a video interface 422, and transport of data between the display 432 and the bus 440 can be controlled via the graphics control 421. In some embodiments, the display is a video projector. In some embodiments, the display is a head-mounted display (HMD) such as a VR headset. In further embodiments, suitable VR headsets include, by way of non-limiting examples, HTC Vive, Oculus Rift, Samsung Gear VR, Microsoft HoloLens, Razer OSVR, FOVE VR, Zeiss VR One, Avegant Glyph, Freefly VR headset, and the like. In still further embodiments, the display is a combination of devices such as those disclosed herein.
[0113] In addition to a display 432, computer system 400 may include one or more other peripheral output devices 434 including, but not limited to, an audio speaker, a printer, a storage device, and any combinations thereof. Such peripheral output devices may be connected to the bus 440 via an output interface 424. Examples of an output interface 424 include, but are not limited to, a serial port, a parallel connection, a USB port, a FIREWIRE port, a THUNDERBOLT port, and any combinations thereof.
[0114] In addition or as an alternative, computer system 400 may provide functionality as a result of logic hardwired or otherwise embodied in a circuit, which may operate in place of or together with software to execute one or more processes or one or more steps of one or more processes described or illustrated herein. Reference to software in this disclosure may encompass logic, and reference to logic may encompass software. Moreover, reference to a computer-readable medium may encompass a circuit (such as an IC) storing software for execution, a circuit embodying logic for execution, or both, where appropriate. The present disclosure encompasses any suitable combination of hardware, software, or both.
[0115] Those of skill in the art will appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality.
[0116] The various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
[0117] The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by one or more processor(s), or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
[0118] In accordance with the description herein, suitable computing devices include, by way of non-limiting examples, cloud computing platforms, distributed computing platforms, server clusters, server computers, desktop computers, laptop computers, notebook computers, sub-notebook computers, netbook computers, netpad computers, set-top computers, media streaming devices, handheld computers, Internet appliances, mobile smartphones, and tablet computers.
[0119] In some embodiments, the computing device includes an operating system configured to perform executable instructions. The operating system is, for example, software, including programs and data, which manages the device’s hardware and provides services for execution of applications. Those of skill in the art will recognize that suitable server operating systems include, by way of non-limiting examples, FreeBSD, OpenBSD, NetBSD®, Linux, Apple® Mac OS X Server®, Oracle® Solaris®, Windows Server®, and Novell® NetWare®. Those of skill in the art will recognize that suitable personal computer operating systems include, by way of non-limiting examples, Microsoft® Windows®, Apple® Mac OS X®, UNIX®, and UNIX-like operating systems such as GNU/Linux®. In some embodiments, the operating system is provided by cloud computing. Those of skill in the art will also recognize that suitable mobile smartphone operating systems include, by way of non-limiting examples, Nokia® Symbian® OS, Apple® iOS®, Research In Motion® BlackBerry OS®, Google® Android®, Microsoft® Windows Phone® OS, Microsoft® Windows Mobile® OS, Linux®, and Palm® WebOS®.
Non-transitory computer readable storage medium
[0120] In some embodiments, the platforms, systems, media, and methods disclosed herein include one or more non-transitory computer readable storage media encoded with a program including instructions executable by the operating system of an optionally networked computing device. In further embodiments, a computer readable storage medium is a tangible component of a computing device. In still further embodiments, a computer readable storage medium is optionally removable from a computing device. In some embodiments, a computer readable storage medium includes, by way of non-limiting examples, CD-ROMs, DVDs, flash memory devices, solid state memory, magnetic disk drives, magnetic tape drives, optical disk drives, distributed computing systems including cloud computing systems and services, and the like. In some cases, the program and instructions are permanently, substantially permanently, semi-permanently, or non-transitorily encoded on the media.
Computer program
[0121] In some embodiments, the platforms, systems, media, and methods disclosed herein include at least one computer program, or use of the same. A computer program includes a sequence of instructions, executable by one or more processor(s) of the computing device’s CPU, written to perform a specified task. Computer readable instructions may be implemented as program modules, such as functions, objects, Application Programming Interfaces (APIs), computing data structures, and the like, that perform particular tasks or implement particular abstract data types. In light of the disclosure provided herein, those of skill in the art will recognize that a computer program may be written in various versions of various languages.
[0122] The functionality of the computer readable instructions may be combined or distributed as desired in various environments. In some embodiments, a computer program can comprise one sequence of instructions. In some embodiments, a computer program can comprise a plurality of sequences of instructions. In some embodiments, a computer program is provided from one location. In other embodiments, a computer program is provided from a plurality of locations. In various embodiments, a computer program includes one or more software modules. In various embodiments, a computer program includes, in part or in whole, one or more web applications, one or more mobile applications, one or more standalone applications, one or more web browser plug-ins, extensions, add-ins, or add-ons, or combinations thereof.
Web application
[0123] In some embodiments, a computer program includes a web application. In light of the disclosure provided herein, those of skill in the art will recognize that a web application, in various embodiments, utilizes one or more software frameworks and one or more database systems. In some embodiments, a web application is created upon a software framework such as Microsoft® .NET or Ruby on Rails (RoR). In some embodiments, a web application utilizes one or more database systems including, by way of non-limiting examples, relational, non-relational, object oriented, associative, XML, and document oriented database systems. In further embodiments, suitable relational database systems include, by way of non-limiting examples, Microsoft® SQL Server, mySQL™, and Oracle®. Those of skill in the art will also recognize that a web application, in various embodiments, is written in one or more versions of one or more languages. A web application may be written in one or more markup languages, presentation definition languages, client-side scripting languages, server-side coding languages, database query languages, or combinations thereof. In some embodiments, a web application is written to some extent in a markup language such as Hypertext Markup Language (HTML), Extensible Hypertext Markup Language (XHTML), or eXtensible Markup Language (XML). In some embodiments, a web application is written to some extent in a presentation definition language such as Cascading Style Sheets (CSS). In some embodiments, a web application is written to some extent in a client-side scripting language such as Asynchronous JavaScript and XML (AJAX), Flash® ActionScript, JavaScript, or Silverlight®. In some embodiments, a web application is written to some extent in a server-side coding language such as Active Server Pages (ASP), ColdFusion®, Perl, Java™, JavaServer Pages (JSP), Hypertext Preprocessor (PHP), Python™, Ruby, Tcl, Smalltalk, WebDNA®, or Groovy. In some embodiments, a web application is written to some extent in a database query language such as Structured Query Language (SQL). In some embodiments, a web application integrates enterprise server products such as IBM® Lotus Domino®. In some embodiments, a web application includes a media player element. In various further embodiments, a media player element utilizes one or more of many suitable multimedia technologies including, by way of non-limiting examples, Adobe® Flash®, HTML 5, Apple® QuickTime®, Microsoft® Silverlight®, Java™, and Unity®.
[0124] Referring to Fig. 5, in a particular embodiment, an application provision system can comprise one or more databases 500 accessed by a relational database management system (RDBMS) 510. Suitable RDBMSs include Firebird, MySQL, PostgreSQL, SQLite, Oracle Database, Microsoft SQL Server, IBM DB2, IBM Informix, SAP Sybase, Teradata, and the like. In this embodiment, the application provision system further can comprise one or more application servers 520 (such as Java servers, .NET servers, PHP servers, and the like) and one or more web servers 530 (such as Apache, IIS, GWS and the like). The web server(s) optionally expose one or more web services via application programming interfaces (APIs) 540. Via a network, such as the Internet, the system provides browser-based and/or mobile native user interfaces.
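As a non-limiting sketch of a database accessed through an RDBMS, the following example uses SQLite, one of the systems listed above, through the sqlite3 module of the Python standard library. The table name, column names, and stored values are hypothetical and serve only to illustrate the access pattern.

```python
import sqlite3

def build_demo_database():
    """Create an in-memory SQLite database holding hypothetical cluster labels."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cell_clusters (cell_id TEXT PRIMARY KEY, cluster INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO cell_clusters VALUES (?, ?)",
        [("cell_001", 0), ("cell_002", 1), ("cell_003", 1)],
    )
    conn.commit()
    return conn

def cells_per_cluster(conn):
    """Query the number of cells assigned to each cluster."""
    rows = conn.execute(
        "SELECT cluster, COUNT(*) FROM cell_clusters GROUP BY cluster ORDER BY cluster"
    ).fetchall()
    return dict(rows)

conn = build_demo_database()
counts = cells_per_cluster(conn)  # {0: 1, 1: 2}
conn.close()
```

In a deployment such as that of Fig. 5, a query of this kind would typically be issued by an application server rather than directly by a user interface.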
[0125] Referring to Fig. 6, in a particular embodiment, an application provision system alternatively has a distributed, cloud-based architecture 600 and can comprise elastically load balanced, auto-scaling web server resources 610 and application server resources 620 as well as synchronously replicated databases 630.
Mobile application
[0126] In some embodiments, a computer program includes a mobile application provided to a mobile computing device. In some embodiments, the mobile application is provided to a mobile computing device at the time it is manufactured. In other embodiments, the mobile application is provided to a mobile computing device via the computer network described herein.
[0127] In view of the disclosure provided herein, a mobile application is created by techniques known to those of skill in the art using hardware, languages, and development environments known to the art. Those of skill in the art will recognize that mobile applications are written in several languages. Suitable programming languages include, by way of non-limiting examples, C, C++, C#, Objective-C, Java™, JavaScript, Pascal, Object Pascal, Python™, Ruby, VB.NET, WML, and XHTML/HTML with or without CSS, or combinations thereof.
[0128] Suitable mobile application development environments are available from several sources. Commercially available development environments include, by way of non-limiting examples, AirplaySDK, alcheMo, Appcelerator®, Celsius, Bedrock, Flash Lite, .NET Compact Framework, Rhomobile, and WorkLight Mobile Platform. Other development environments are available without cost including, by way of non-limiting examples, Lazarus, MobiFlex, MoSync, and PhoneGap. Also, mobile device manufacturers distribute software developer kits including, by way of non-limiting examples, iPhone and iPad (iOS) SDK, Android™ SDK, BlackBerry® SDK, BREW SDK, Palm® OS SDK, Symbian SDK, webOS SDK, and Windows® Mobile SDK.
Standalone application
[0129] In some embodiments, a computer program includes a standalone application, which is a program that is run as an independent computer process, not an add-on to an existing process, e.g., not a plug-in. Those of skill in the art will recognize that standalone applications are often compiled. A compiler is a computer program(s) that transforms source code written in a programming language into binary object code such as assembly language or machine code. Suitable compiled programming languages include, by way of non-limiting examples, C, C++, Objective-C, COBOL, Delphi, Eiffel, Java™, Lisp, Python™, Visual Basic, and VB.NET, or combinations thereof. Compilation is often performed, at least in part, to create an executable program. In some embodiments, a computer program includes one or more executable compiled applications.
Web browser plug-in
[0130] In some embodiments, the computer program includes a web browser plug-in (e.g., extension, etc.). In computing, a plug-in is one or more software components that add specific functionality to a larger software application. Makers of software applications support plug-ins to enable third-party developers to create abilities which extend an application, to support easily adding new features, and to reduce the size of an application. When supported, plug-ins enable customizing the functionality of a software application. For example, plug-ins are commonly used in web browsers to play video, generate interactivity, scan for viruses, and display particular file types. Those of skill in the art will be familiar with several web browser plug-ins including, Adobe® Flash® Player, Microsoft® Silverlight®, and Apple® QuickTime®. In some embodiments, the toolbar can comprise one or more web browser extensions, add-ins, or add-ons. In some embodiments, the toolbar can comprise one or more explorer bars, tool bands, or desk bands.
[0131] In view of the disclosure provided herein, those of skill in the art will recognize that several plug-in frameworks are available that enable development of plug-ins in various programming languages, including, by way of non-limiting examples, C++, Delphi, Java™, PHP, Python™, and VB.NET, or combinations thereof.
[0132] Web browsers (also called Internet browsers) are software applications, designed for use with network-connected computing devices, for retrieving, presenting, and traversing information resources on the World Wide Web. Suitable web browsers include, by way of non-limiting examples, Microsoft® Internet Explorer®, Mozilla® Firefox®, Google® Chrome, Apple® Safari®, Opera Software® Opera®, and KDE Konqueror. In some embodiments, the web browser is a mobile web browser. Mobile web browsers (also called microbrowsers, mini-browsers, and wireless browsers) are designed for use on mobile computing devices including, by way of non-limiting examples, handheld computers, tablet computers, netbook computers, subnotebook computers, smartphones, music players, personal digital assistants (PDAs), and handheld video game systems. Suitable mobile web browsers include, by way of non-limiting examples, Google® Android® browser, RIM BlackBerry® Browser, Apple® Safari®, Palm® Blazer, Palm® WebOS® Browser, Mozilla® Firefox® for mobile, Microsoft® Internet Explorer® Mobile, Amazon® Kindle® Basic Web, Nokia® Browser, Opera Software® Opera® Mobile, and Sony® PSP™ browser.
Software modules
[0133] In some embodiments, the platforms, systems, media, and methods disclosed herein include software, server, and/or database modules, or use of the same. In view of the disclosure provided herein, software modules are created by techniques known to those of skill in the art using machines, software, and languages known to the art. The software modules disclosed herein are implemented in a multitude of ways. In various embodiments, a software module can comprise a file, a section of code, a programming object, a programming structure, a distributed computing resource, a cloud computing resource, or combinations thereof. In further various embodiments, a software module can comprise a plurality of files, a plurality of sections of code, a plurality of programming objects, a plurality of programming structures, a plurality of distributed computing resources, a plurality of cloud computing resources, or combinations thereof. In various embodiments, the one or more software modules comprise, by way of non-limiting examples, a web application, a mobile application, a standalone application, and a distributed or cloud computing application. In some embodiments, software modules are in one computer program or application. In other embodiments, software modules are in more than one computer program or application. In some embodiments, software modules are hosted on one machine. In other embodiments, software modules are hosted on more than one machine. In further embodiments, software modules are hosted on a distributed computing platform such as a cloud computing platform. In some embodiments, software modules are hosted on one or more machines in one location. In other embodiments, software modules are hosted on one or more machines in more than one location.
Databases
[0134] In some embodiments, the platforms, systems, media, and methods disclosed herein include one or more databases, or use of the same. In view of the disclosure provided herein, those of skill in the art will recognize that many databases are suitable for storage and retrieval of gene expression information and images of biological samples. In various embodiments, suitable databases include, by way of non-limiting examples, relational databases, non-relational databases, object oriented databases, object databases, entity-relationship model databases, associative databases, XML databases, document oriented databases, and graph databases. Further non-limiting examples include SQL, PostgreSQL, MySQL, Oracle, DB2, Sybase, and MongoDB. In some embodiments, a database is Internet-based. In further embodiments, a database is web-based. In still further embodiments, a database is cloud computing-based. In a particular embodiment, a database is a distributed database. In other embodiments, a database is based on one or more local computer storage devices.
EXAMPLES
Example 1. SPARC in a colon cancer profiled with the CosMx SMI 6000-plex assay.
[0135] SPARC is a toolkit for quickly identifying spatial correlations meriting attention. SPARC identifies gene modules with spatial correlations that cannot be explained by trivial factors like the cell type landscape or technical effects. In some embodiments, it discovers dozens of such modules. To steer analysts towards the most interesting clusters, the software includes tools to implicate cell types in module activity and to describe module spatial patterns. SPARC is a powerful and convenient way to quickly identify spatial transcriptomics trends that deserve scarce analyst attention.
[0136] Methods:
[0137] Data preparation and analysis: CosMx SMI liver cancer data, including cell type annotations, was downloaded from https://nanostring.com/products/cosmx-spatial-molecular-imager/human-liver-rna-ffpe-dataset/. The CosMx SMI human liver data provides a subcellular expression map of 1,000 genes and a single cell tissue atlas that categorizes each cell in the tissue as one of 18 unique cell types. The complete dataset consists of over 800,000 single cells and ~700 million transcripts, and a single-cell tissue atlas across a ~180 mm² area of liver tissue. The high-plex analysis provided deep insight into the cell and tissue changes that occur in cancer, including infiltration of diverse immune cells. CosMx SMI colon cancer data was generated by running a 5-micron slice from a FFPE colon cancer sample through the CosMx SMI instrument, using a 6000-plex RNA panel. Colon cancer cell typing was performed using Insitutype.
[0138] SPARC Algorithm: Environment expression and confounder matrices were defined by averaging gene expression (or confounder variable values) across each cell’s neighbors. Neighbors were defined by K-nearest or radius-based logic. For example, the neighbors were defined as the 50 cells closest to a cell in XY space. The covariance of the environment expression matrix X conditional on the environment confounder matrix Y was calculated with cov(X|Y) = cov(X) - cov(X,Y) cov(Y)^{-1} cov(Y,X).
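The neighbor averaging described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SPARC implementation: the function name `environment_matrix` is hypothetical, neighbors are found by brute-force distance computation, and a cell is assumed not to count as its own neighbor.

```python
import numpy as np

def environment_matrix(xy, values, k=50):
    """Average each column of `values` over every cell's k nearest neighbors.

    xy     : (n_cells, 2) array of cell coordinates in XY space.
    values : (n_cells, n_features) array (gene expression or confounder values).
    k      : number of neighbors; 50 is the example given in the text.
    """
    xy = np.asarray(xy, dtype=float)
    values = np.asarray(values, dtype=float)
    n = xy.shape[0]
    k = min(k, n - 1)  # cannot have more neighbors than other cells
    # Brute-force pairwise squared distances; adequate for a small illustration,
    # a spatial index would be needed for hundreds of thousands of cells.
    d2 = ((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # assumption: a cell is not its own neighbor
    # Indices of the k closest cells for every cell (order within the k is arbitrary).
    nbrs = np.argpartition(d2, k - 1, axis=1)[:, :k]
    return values[nbrs].mean(axis=1)
```

Applying the same function to an expression matrix and to a matrix of confounder variables yields the two environment matrices used in the conditional covariance formula.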
[0139] Conditional correlation was calculated by rescaling this covariance matrix to have unit diagonal. The above formula holds for multivariate normal variables. Because the environment matrix was produced by averaging each cell’s 50 nearest neighbors, the central limit theorem provided assurance that multivariate normality approximately holds.
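As a hedged sketch of the conditional covariance formula and the rescaling to unit diagonal, the following assumes X and Y are the environment expression and environment confounder matrices with one row per cell. The function name is hypothetical, and the use of a pseudo-inverse (to tolerate collinear confounders) is an assumption rather than something the text specifies.

```python
import numpy as np

def conditional_correlation(X, Y):
    """Correlation among the columns of X, conditional on the columns of Y.

    X : (n_cells, n_genes) environment expression matrix.
    Y : (n_cells, n_confounders) environment confounder matrix.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    p = X.shape[1]
    joint = np.cov(np.hstack([X, Y]), rowvar=False)
    cov_xx = joint[:p, :p]
    cov_xy = joint[:p, p:]
    cov_yy = joint[p:, p:]
    # cov(X|Y) = cov(X) - cov(X,Y) cov(Y)^{-1} cov(Y,X)
    # Assumption: pinv is used so that collinear confounders do not break the inverse.
    cond_cov = cov_xx - cov_xy @ np.linalg.pinv(cov_yy) @ cov_xy.T
    # Rescale to unit diagonal to obtain the conditional correlation matrix.
    sd = np.sqrt(np.diag(cond_cov))
    return cond_cov / np.outer(sd, sd)
```

In a toy case where two genes are correlated only because both track the same confounder, the unadjusted correlation is high while the conditional correlation is close to zero.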
[0140] Gene modules may be derived from a correlation matrix in many ways. SPARC created a network graph in which all genes sharing conditional correlations above some threshold were connected, then clustered this graph using the Leiden algorithm.
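The graph construction step can be illustrated with a short sketch. Note the substitution: to keep the example dependency-free, it groups genes by connected components of the thresholded graph, which is a simpler stand-in for the Leiden clustering named above and will generally yield coarser modules. The function name, the threshold value, and the minimum module size are hypothetical.

```python
import numpy as np

def threshold_modules(cond_cor, genes, threshold=0.2, min_size=2):
    """Connect genes whose conditional correlation exceeds `threshold`,
    then return connected components of that graph as candidate modules.

    Stand-in for Leiden clustering: connected components only.
    """
    cond_cor = np.asarray(cond_cor, dtype=float)
    adj = cond_cor > threshold
    np.fill_diagonal(adj, False)  # ignore self-correlation
    seen = set()
    modules = []
    for start in range(len(genes)):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            component.append(i)
            for j in np.flatnonzero(adj[i]):
                j = int(j)
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        if len(component) >= min_size:
            modules.append(sorted(genes[i] for i in component))
    return modules
```

A community detection method such as Leiden would further split a large component into densely connected sub-groups; the thresholded adjacency matrix built here is the shared starting point.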
[0141] Module scores were calculated as weighted averages of their genes; the default used inverse square root weighting to account for the Poisson-like mean-variance relation seen in count data. Module scores were calculated from single cell expression profiles (“single cell scores”) and from cells’ neighborhood profiles (“environment scores”).
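A minimal sketch of module scoring follows. The text states only that inverse square root weighting is the default; the specific choice here, weights proportional to one over the square root of each gene's mean expression and normalized to sum to one, is an assumption, as is the function name.

```python
import numpy as np

def module_scores(expr, module_idx):
    """Weighted average of a module's genes for every row of `expr`.

    expr       : (n_cells, n_genes) matrix; pass single cell profiles for
                 "single cell scores" or neighborhood profiles for
                 "environment scores".
    module_idx : column indices of the genes in the module.
    """
    sub = np.asarray(expr, dtype=float)[:, module_idx]
    means = sub.mean(axis=0)
    # Assumption: weight each gene by 1 / sqrt(mean expression), so that highly
    # expressed genes (with Poisson-like larger variance) do not dominate.
    weights = 1.0 / np.sqrt(np.maximum(means, 1e-12))
    weights /= weights.sum()
    return sub @ weights
```

The same function serves both score types because only the input matrix changes.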
[0142] Analysis of colon cancer: Cell types were defined using the Insitutype R package. As input to Insitutype, preliminary cell type reference profiles were derived by merging the Insitutype package’s 16 immune and stroma cell type profiles with 11 healthy colon cell type profiles from the Human Cell Atlas (Rozenblatt-Rosen O, Stubbington MJ, Regev A, Teichmann SA. The Human Cell Atlas: from vision to reality. Nature. 2017 Oct 26;550(7677):451-3). We further directed Insitutype to fit an additional 2 novel clusters.
[0143] SPARC was run on normalized data, computed by dividing each cell’s expression vector by its sum. Cell type, total counts and total negative control counts were used for confounding variables. SPARC’s default parameters were used throughout.
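The normalization and the per-cell confounder variables named above can be sketched as follows. Function names are hypothetical, and two details are assumptions: cells with zero total counts are left as all zeros, and cell type is encoded as one indicator column per type.

```python
import numpy as np

def normalize_by_total(counts):
    """Divide each cell's expression vector (row) by its sum."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Assumption: cells with no counts are returned as all-zero rows.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

def cell_confounders(cell_types, total_counts, negative_counts):
    """Per-cell confounder matrix: cell type indicators, total counts,
    and total negative control counts."""
    levels, codes = np.unique(np.asarray(cell_types), return_inverse=True)
    one_hot = np.eye(len(levels))[codes]
    extra = np.column_stack([total_counts, negative_counts]).astype(float)
    return levels, np.hstack([one_hot, extra])
```

Averaging the resulting confounder matrix over each cell's neighbors gives neighborhood-level summaries such as the fraction of each cell type nearby.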
[0144] Ligands and ligand-receptor pairs were taken from CellChatDB (Jin S, Guerrero-Juarez CF, Zhang L, Chang I, Ramos R, Kuan CH, Myung P, Plikus MV, Nie Q. Inference and analysis of cell-cell communication using CellChat. Nature communications. 2021 Feb 17;12(1):1088). Tertiary lymphoid structures were defined by clustering B-cell locations with the dbscan algorithm. The 3 largest clusters (130, 501 and 905 B-cells) were called tertiary lymphoid structures; the next-largest cluster had just 28 B-cells. Gene expression fold-changes were computed by t-test.
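To illustrate the clustering of B-cell locations, the sketch below implements a minimal DBSCAN directly in NumPy and ranks the resulting clusters by size. The `eps` and `min_pts` parameters are hypothetical, since the text does not state the settings used, and the function names are illustrative only.

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    # Each neighborhood includes the point itself.
    neighbors = [np.flatnonzero(row <= eps ** 2) for row in d2]
    core = np.array([len(nb) >= min_pts for nb in neighbors])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = cluster
        queue = [i]
        while queue:  # grow the cluster outward from core points only
            j = queue.pop()
            for m in neighbors[j]:
                if labels[m] == -1:
                    labels[m] = cluster
                    if core[m]:
                        queue.append(m)
        cluster += 1
    return labels

def largest_clusters(labels, top=3):
    """(cluster id, size) for the `top` largest clusters, largest first."""
    ids, sizes = np.unique(labels[labels >= 0], return_counts=True)
    order = np.argsort(sizes)[::-1][:top]
    return [(int(ids[o]), int(sizes[o])) for o in order]
```

With B-cell coordinates as input, the clusters returned by `largest_clusters` would be the candidates for tertiary lymphoid structures, and a clear size gap to the next cluster (as between 130 and 28 B-cells in the text) supports where to draw the cutoff.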
[0145] Results
[0146] SPARC in a colon cancer profiled with the CosMx SMI 6000-plex assay was demonstrated as in Fig. 1A. SPARC begins by taking the expression profile of a small neighborhood around each cell as shown in Fig. 1B and building an “environment expression matrix” as shown in Fig. 1C. Typical methods produce results akin to taking the correlation matrix of the environment matrix as shown in Fig. 1D.
[0147] To eliminate the influence of unwanted variables like cell type, signal strength and background intensity, SPARC built an “environment confounder matrix” as shown in Fig. 1E to summarize these variables for each cell’s neighborhood. SPARC defined spatial correlation as the correlation matrix of the environment matrix, conditional on the confounder matrix as shown in Fig. 1F. Entries in this conditional correlation matrix measure genes’ tendency to be expressed in the same neighborhoods, beyond what cell type and other confounders can explain. Genes with cell type specific expression showed strong spatial correlation that disappeared in the conditional correlation matrix as shown in Fig. 1G. Correlations that remain are not explainable by the trivial factors of the confounder matrix. Most of the strongest spatial correlations in the unadjusted analysis as shown in Fig. 1D were revealed as spurious by the conditional correlation matrix: among the top 5000 gene pairs found without adjusting for confounders, whose correlations ranged from 0.62 to 0.97, only 1018 had correlations > 0.2 in the conditional correlation matrix.
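The comparison between unadjusted and conditional correlations can be expressed as a small counting routine. This is a sketch with a hypothetical function name; it takes the two gene-by-gene correlation matrices as given and reports how many of the strongest unadjusted pairs remain above a threshold after conditioning.

```python
import numpy as np

def surviving_top_pairs(raw_cor, cond_cor, top_n=5000, threshold=0.2):
    """Among the `top_n` gene pairs with the highest unadjusted correlation,
    count those whose conditional correlation exceeds `threshold`.

    Returns (number surviving, number of pairs examined).
    """
    raw_cor = np.asarray(raw_cor, dtype=float)
    cond_cor = np.asarray(cond_cor, dtype=float)
    upper = np.triu_indices_from(raw_cor, k=1)  # each unordered pair once
    raw = raw_cor[upper]
    cond = cond_cor[upper]
    top = np.argsort(raw)[::-1][:top_n]
    return int((cond[top] > threshold).sum()), len(top)
```

With the figures reported above, 1018 of 5000 pairs surviving corresponds to roughly 20% of the strongest unadjusted correlations.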
[0148] The conditional correlation matrix is too complex for human interpretation. SPARC aided comprehension by extracting modules of mutually spatially co-expressed genes as shown in Fig. 1H. These modules provided a view of spatial correlation that was small enough to be understandable and expansive enough to capture complex biology. One module discovered in this example analysis consisted of 17 genes collectively suggestive of tumor-promoting inflammation as illustrated in Figs. 1H, 1I, 1J, 1K, and 1M. This included genes involved in microenvironment remodeling (CCL18, MMP2, CSTK), growth factor signaling (SFRP2, GREM1, DCN, SERPINF1), and inflammation (C3, C1R, PTGIS).
[0149] Each module was scored with a weighted average of its genes. Module scores were calculated both for cells’ environments and for single cell expression. A spatial map of environment scores for the tumor-promoting inflammation module showed the module peaking in the stroma, with smaller hotspots in the tumor bed, as shown in Fig. 1I. As shown in Fig. 1J, single-cell scores for the module showed CAFs and macrophages driving module activity, with nearby mast cells, smooth muscle cells and stromal cells also participating. Zooming in until individual transcripts were resolvable revealed more nuanced behavior of the module genes across cell types and space, as shown in Fig. 1K.
[0150] Dozens of modules may be discovered in a study. To help identify the modules of greatest interest, SPARC is used to estimate the role of each cell type in each module. Cell type involvement was summarized at the module level to facilitate comparisons between modules as shown in Fig. 1L, or at the gene level to give a more nuanced view of module behavior as shown in Fig. 1M. The tumor-promoting inflammation module was primarily attributed to CAFs and macrophages, with a stromal cell type playing a more minor role (Fig. 1L). Macrophages were primarily responsible for expression of CCL18, F13A1, and PLTP, and CAFs were the main contributor to the remaining genes (Fig. 1M).
[0151] For example, the genes of the exemplary summary plot illustrated in Fig. 1L can be connected into one or more modules with various cell types. The genes illustrated in the exemplary summary plot in Fig. 1L include, for example, CRHR2_HBA1/2_46, CEACAM3_A1BG_13, IGHG1/2_GCDH_4, IGLL1_GLL5_6, JCHAIN_MZB1_POU2AF1_3, GCG_PYY_2, FCGBP_MUC2_5, OLFM4_ITLN1_DMBT1_3, MSLN_SPIB_COL11A2_3, CA2_S100A14_HMGCS2_3, BCAS1_MLPH_2, B3GNT7_PLA2G2A_7, MT1G_MT1X_5, ITM2C_TSPAN3_4, CD24_SELENBP1_KLF5_3, PIGR_CTSS_2, PLAC8_TMPRSS2_17, KRT8_KRT19_4, MT2A_TIMP1_SERPINE1_3, C1S_SERPING1_2, COL4A1_COL4A2_7, FOS_EGR1_12, TAGLN_MYL9_10, THBS2_COL11A1_4, COL6A2_POSTN_BGN_3, LGALS1_SPARC_8, LUM_MMP11_MFAP2_3, COL1A1_COL3A1_7, MMP1_MMP3_4, VEGFA_LOXL2_20, NKD1_APCDD1_FN1_3, ID1_ID3_2, PKM_TPI1_2, IFI6_ISG15_7, LAMC2_MMP7_2, PFN1_S100A6_4, CKB_ELF3_11, APP_JUP_CDH1_3, S100A10_LGALS3_ELOB_3, BHLHE40_NEAT1_2, HDGF_LMNA_2, SLC12A2_RGMB_4, TLE5_CLTA_2, TMSB10_COX4I1_5, COX8A_ATP5F1B_4, H3C2_H3C8_12, GPX2_SLC25A6_2, MARCKSL1_PRDX5_2, TMSB4X_CALM2_6, NET1_ANP32B_2, CXCL2_CXCL3_2, KPNA2_UBE2S_2, BACE2_LCN2_2, HSP90B1_PDIA4_2, DNAJB1_HSPA1A_7, FDFT1_SCD_5, LDLR_IER3_NCOA7_3, SSTR4_HAND2_ADD1_3, SNRPD1_NDUFA3_13, NPFF_TNFSF15_2, BTG1_TXNIP_IL7R_3, CR2_CXCL13_LTF_3, CCL19_CCL21_2, CXCL12_PTGDS_2, A2M_SPARCL1_2, THBS1_DES_2, ACKR1_CLDN5_2, CD163_STAB1_2, CPA3_TPSAB1/2_2, WASHC3_EPB41L5_2, AGBL4_MSRB2_2, ARHGAP5_FOXO4_2, Custom 173_ACADM_2, AQP4_FANCC_2, CSMD3_LEP_2, ANO6_PIAS2_GRIN2B_3, LRRC4B_NEU3_2, GRID1_TACR3_2, RAB27B_TOLLIP_2, CPEB2_KLHL11_2, NEUROG3_RDH11_2, HEG1_PRPF38A_4, RGS12_CASR_2, GAPVD1_STAT6_2, VIP_GAL_4, PAK5_TNNT2_2, GTF2B_IL19_APBA1_3, GRIA1_LANCL1_2, MHC 1_B2M_6, CD74_HLA-DRA_13, CXCL14_IGFBP3_4, SFRP2_CCL18_17, CXCL8_PLAUR_2, STAT1_TYMP_PSMB9_3, APOE_APOC1_IFI30_3, and CTSD_CTSL_GPNMB_3. The cell types illustrated in the exemplary summary plot of Fig. 1L that can be correlated with expression of those exemplary genes include, for example, stromal cells type 2b, epithelial cancer cells subtype 2, epithelial cancer cells subtype 1, macrophages, cancer-associated fibroblasts (CAFs), pericytes, endothelial cells type 1, mast cells, stromal cells type 3, endothelial cells, smooth muscle cells, monocytes, stromal cells type 1, CD4 naive T-cells, natural killer (NK) cells, plasmacytoid dendritic cells (pDCs), CD8 memory T-cells, neutrophils, glial cells, myeloid dendritic cells (mDCs), Tregs, stromal cells type 4, B-cells, stromal cells type 2a, epithelial normal cells in cellular crypts, plasmablasts, plasma cells, epithelial normal villi subtype 1, and epithelial normal cells of unclear origin.
[0152] In a study, the analyst will choose confounding variables, then derive modules with a single R command. Summary plots as shown in Figs. 1L, 1M, and 1N may suggest a few modules of particular interest. For modules that merit real effort, spatial plots of module gene expression, as shown in Fig. 1K, were examined to develop a nuanced understanding of the module’s behavior. Observations of interest can be characterized with bespoke summary statistics or prioritized for independent validation.
[0153] SPARC used subsets of cells to speed computation time when possible. The complete SPARC workflow took less than 15 minutes on a 5.12xlarge EC2 instance server to analyze this dataset of 112,846 cells and 6,000 genes.
Example 2. A knowledge-driven (biology-first) workflow
[0154] In some cases, SPARC is used to examine the conditional correlation structure around genes of prior interest. One use case is to restrict analysis to a given class of genes. As an example, the 407 ligands (Jin S, Guerrero-Juarez CF, Zhang L, Chang I, Ramos R, Kuan CH, Myung P, Plikus MV, Nie Q. Inference and analysis of cell-cell communication using CellChat. Nature communications. 2021 Feb 17;12(1):1088) in the panel were re-analyzed to explore the cell signaling environment in this tumor. In some embodiments, the algorithm may be applied on a subset of the genes. In some embodiments, the genes may be involved in a given pathway. In some embodiments, the pathway may comprise hypoxia, apoptosis, proliferation, or almost any pathway from the GO, KEGG, or Reactome databases.
[0155] This re-analysis produced 18 modules containing 51 ligands, many arising from multiple cell types as shown in Figs. 2A and 2B. Each ligand may be associated with various cell types, as shown in the exemplary summary plot of Fig. 2B. The ligands associated with the various cell types as illustrated in the exemplary summary plot of Fig. 2B include, for example, CCL19_CCL21_2, HLA-DRA_HLA-DPA1_5, HLA-G_HLA-F_HLA-E_3, COL4A1_COL4A2_LAMA4_3, COL1A1_COL6A1_6, COMP_THBS2_FN1_3, VEGFA_ADM_5, CCL18_CCL13_C3_3, GAL_VIP_2, GDF6_WNT9A_2, NPFF_TNFSF15_COL4A5_3, GDNF_FSHB_2, PYY_GCG_2, GUCA2B_CEACAM1_2, APP_CDH1_2, LAMB3_LAMC2_2, GDF15_EFNA1_2, and CXCL2_CXCL3_2. The various subtypes that are illustrated in the exemplary summary plot of Fig. 2B as being associated with the ligands are, for example, stromal cells type 3, mast cells, glial cells, B-cells, myeloid dendritic cells (mDCs), stromal cells type 4, Tregs, epithelial normal villi subtype 1, epithelial normal cells in cellular crypts, CD4 naive T-cells, CD8 memory T-cells, endothelial cells type 1, stromal cells type 2a, natural killer (NK) cells, plasmablasts, monocytes, stromal cells type 1, neutrophils, plasmacytoid dendritic cells (pDCs), plasma cells, epithelial normal cells of unclear origin, endothelial cells, pericytes, smooth muscle cells, stromal cells type 2b, epithelial cancer cells subtype 2, epithelial cancer cells subtype 1, macrophages, and cancer-associated fibroblasts (CAFs).
[0156] Focusing on modules involved in the anti-tumor immune response, three modules were noted, as shown in Figs. 2C, 2D, and 2E: a module of the chemoattractants CCL19 and CCL21 concentrated in a narrow band at the tumor periphery, a module of MHC2 antigen presentation genes diffusing slightly beyond this band, and a module of MHC1 antigen presentation genes peaking in the same region but extending further into the tumor bed. This suggests an interpretation in which a small core of chemoattractant expression attracts professional antigen presenting cells, and an adaptive immune response radiates from this core, eliciting MHC1 expression from surrounding cells.
[0157] Ligand-receptor pairs provide another use case. If a ligand-receptor pair displays spatial correlation, it suggests these genes are co-regulated, presumably either by the ligand increasing the receptor’s expression or via some latent variable inducing regional expression in both genes (Li 2023). As shown in Fig. 2F, of the 555 ligand-receptor pairs in this panel, very few showed evidence for spatial co-regulation: only 11 had conditional correlation > 0.1. One highly correlated pair was FCER2 and CR2, both primarily expressed by B-cells. As shown in Fig. 2G, their expression rose sharply in 3 lymphoid structures, with B-cells in these regions having 2.57-fold higher CR2 than B-cells elsewhere. Thus SPARC flags this ligand-receptor pair as spatially co-expressed, at which point manual examination easily finds them to be up-regulated in B-cells in tertiary lymphoid structures.
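Screening ligand-receptor pairs against the conditional correlation matrix can be sketched as a simple lookup. The function name is hypothetical, and the numbers in the usage note are made-up toy values for illustration, not results from the study.

```python
def correlated_pairs(cond_cor, genes, pairs, threshold=0.1):
    """Return (ligand, receptor, correlation) for every pair whose
    conditional correlation exceeds `threshold`, strongest first.

    cond_cor : square matrix (nested lists or array) indexed like `genes`.
    genes    : gene names in matrix order.
    pairs    : iterable of (ligand, receptor) names, e.g. from a curated database.
    """
    index = {gene: i for i, gene in enumerate(genes)}
    hits = []
    for ligand, receptor in pairs:
        # Skip pairs with a member that is not in the panel.
        if ligand in index and receptor in index:
            r = float(cond_cor[index[ligand]][index[receptor]])
            if r > threshold:
                hits.append((ligand, receptor, r))
    return sorted(hits, key=lambda hit: -hit[2])
```

For example, with a toy matrix in which FCER2 and CR2 have a conditional correlation of 0.4 and another pair has 0.05, only the FCER2 and CR2 pair is returned at the 0.1 threshold, which mirrors how a pair would be flagged for manual examination.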
[0158] SPARC can also be used to explore individual genes of high prior interest. The correlation network around FCER2 and CR2 was examined. When looking only at conditional correlations > 0.1, FCER2 had no further connections, but CR2 belonged to a densely-connected network of 10 additional genes, many involved in B-cell development and activation (citations needed). Under a strong prior that causal arrows point from the ligand FCER2 to the receptor CR2, it would be reasonable to invest time exploring the hypothesis that the additional genes connected to CR2 are activated downstream of FCER2-CR2 signaling. Thus SPARC results can be used to hint at the downstream effects of a ligand-receptor interaction.
[0159] The key insight motivating SPARC is that tissues’ cell type landscapes induce strong spatial correlation between genes, even when those genes do not vary within a cell type. SPARC is designed to ignore these trivial findings and instead report only correlations that cannot be explained by the cell type landscape. In this dataset, a majority of the strongest spatial correlations proved to be uninteresting after adjusting for cell type abundance; they are false discoveries that waste analyst time.
[0160] SPARC supports two workflows: data-driven hypothesis generation via clustering of panel-wide correlation results, and knowledge-driven hypothesis testing via examination of correlations among genes of prior interest. In both cases, tools are provided to aid deeper explorations.
[0161] Visual exploration is an essential step in spatial transcriptomics analysis: human pattern-recognition capabilities, paired with domain expertise, are a powerful combination not yet replaceable by statistics alone. However, statistics have a key role: telling humans the most interesting places to look. SPARC performs this task quickly, both in computational time and analyst time.
[0162] While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

Claims

WHAT IS CLAIMED IS:
1. A method for analyzing a biological sample to identify genes having spatial correlations with one another, the method comprising:
(a) retrieving, by a computer processor and from a database: a location data indicative of relative positions of the plurality of cells in a multidimensional image of the biological sample; and a transcriptomic data of a plurality of genes of the plurality of cells;
(b) processing, by the computer processor, at least the transcriptomic data to generate a gene expression matrix characterizing each cell of the plurality of cells by a gene expression level of the plurality of genes;
(c) analyzing, by the computer processor, the location data and the transcriptomic data to generate an environment confounder matrix, wherein the environment confounder matrix characterizes each cell of the plurality of cells by one or more environment variables of a region adjacent to or surrounding the each cell in the multi-dimensional image, wherein the one or more environment variables are based at least in part on (i) cell classification of one or more cells in at least the region or (ii) measurement artifact of the transcriptomic data in at least the region;
(d) determining, by the computer processor, a correlation between the plurality of genes of the gene expression matrix conditional on the environment confounder matrix; and
(e) identifying, by the computer processor and based on the correlation determined in (d), at least one subset of the plurality of genes, wherein genes of the at least one subset are mutually correlated to one another in the biological sample.
2. The method of claim 1, wherein (d) comprises determining a degree of the correlation, and wherein (e) comprises identifying the at least one subset of the plurality of genes based on the degree of correlation.
3. The method of claim 2, wherein the identifying in (e) is based at least in part on determining a threshold level of the degree of correlation.
4. The method of any one of claims 1-3, wherein determining the correlation in (d) comprises analyzing covariance of the gene expression matrix conditional on the environment confounder matrix.
5. The method of claim 4, further comprising, in (d), generating a conditional correlation matrix of the plurality of genes based on the covariance, wherein the conditional correlation matrix is different from the gene expression matrix.
6. The method of any one of claims 1-5, wherein the at least one subset of the plurality of genes comprises a plurality of subsets that are different from one another.
7. The method of claim 6, wherein the plurality of subsets comprises a first subset having a first plurality of genes and a second subset having a second plurality of genes, wherein the first plurality of genes has at least one gene that is not in common with the second plurality of genes.
8. The method of any one of claims 1-7, wherein the gene expression matrix comprises an environment expression matrix, wherein the environment expression matrix characterizes a cell of the plurality of cells by analyzing the gene expression level of the plurality of genes of nearest neighboring cells of the cell within the multi-dimensional image, and wherein the method further comprises, in (b), processing the transcriptomic data and the location data to generate the environment expression matrix.
9. The method of claim 8, wherein a number of the nearest neighboring cells is at most about 1,000 cells, at most about 500 cells, at most about 100 cells, or at most about 50 cells.
10. The method of any one of claims 1-9, further comprising displaying, via a graphical user interface, the genes of the at least one subset of the plurality of genes to a user.
11. The method of any one of claims 1-10, further comprising, based on the analyzing in (c) and determination in (d), generating a gene cluster map comprising a plurality of shapes representing the plurality of genes, wherein the plurality of shapes is arranged in a plurality of clusters, wherein a cluster of the plurality of clusters corresponds to the at least one subset of the plurality of genes.
12. The method of claim 11, further comprising displaying, via a graphical user interface, the gene cluster map to a user.
13. The method of any one of claims 1-12, wherein the multi-dimensional image is a two- dimensional image.
14. The method of any one of claims 1-13, further comprising, subsequent to (e), scoring each cell of the plurality of cells based on single cell expression level of the genes of the at least one subset of the plurality of genes.
15. The method of claim 14, further comprising generating an additional multi-dimensional image of the biological sample based on the scoring.
16. The method of any one of claims 1-15, further comprising, in (c), receiving selection of the one or more environment variables from a user via a graphical user interface.
17. The method of any one of claims 1-16, wherein the one or more environment variables are based on both (i) the cell classification in the at least the region and (ii) the measurement artifact of the transcriptomic data in the at least the region.
18. The method of any one of claims 1-17, wherein the measurement artifact comprises detection data via a synthetic control probe sequence.
19. The method of any one of claims 1-18, wherein the one or more environment variables are based at least in part on data comprising one or more of (i) a number of cells having a cell classification of interest in at least the region of the multi-dimensional image, (ii) a number of different cell classifications identified in at least the region of the multi-dimensional image, (iii) a ratio between numbers of cells of two different cell classifications of interest in at least the region of the multi-dimensional image, and (iv) a relative location between a cell having a cell classification of interest and a tissue substructure in at least a portion of the multi-dimensional image, or any combination thereof.
20. The method of any one of claims 1-19, wherein the region can be characterized by having at most about 5 cells, at most about 10 cells, at most about 20 cells, at most about 50 cells, or at most about 100 cells.
21. The method of any one of claims 1-20, wherein a number of the one or more environment variables in the environment confounder matrix is at least about 5, at least about 10, at least about 15, or at least about 20.
22. The method of any one of claims 1-21, wherein the cell classification comprises one or more of endothelial cells, epithelial cells, dermal cells, endodermal cells, mesodermal cells, fibroblasts, osteocytes, chondrocytes, immune cells, dendritic cells, hepatic cells, pancreatic cells, and stromal cells, or any combination thereof.
23. The method of any one of claims 1-22, wherein the cell classification comprises one or more of salivary gland mucous cells, salivary gland serous cells, von Ebner's gland cells, mammary gland cells, lacrimal gland cells, ceruminous gland cells, eccrine sweat gland dark cells, eccrine sweat gland clear cells, apocrine sweat gland cells, gland of Moll cells, sebaceous gland cells, Bowman's gland cells, Brunner's gland cells, seminal vesicle cells, prostate gland cells, bulbourethral gland cells, Bartholin's gland cells, gland of Littre cells, uterus endometrium cells, isolated goblet cells, stomach lining mucous cells, gastric gland zymogenic cells, gastric gland oxyntic cells, pancreatic acinar cells, Paneth cells, type II pneumocytes, Clara cells, somatotropes, lactotropes, thyrotropes, gonadotropes, corticotropes, intermediate pituitary cells, magnocellular neurosecretory cells, gut cells, respiratory tract cells, thyroid epithelial cells, parafollicular cells, parathyroid gland cells, parathyroid chief cell, oxyphil cell, adrenal gland cells, chromaffin cells, Leydig cells, theca interna cells, corpus luteum cells, granulosa lutein cells, theca lutein cells, juxtaglomerular cell, macula densa cells, peripolar cells, mesangial cell, blood vessel and lymphatic vascular endothelial fenestrated cells, blood vessel and lymphatic vascular endothelial continuous cells, blood vessel and lymphatic vascular endothelial splenic cells, synovial cells, serosal cell, squamous cells, columnar cells, dark cells, vestibular membrane cell, stria vascularis basal cells, stria vascularis marginal cell, cells of Claudius, cells of Boettcher, choroid plexus cells, pia-arachnoid squamous cells, pigmented ciliary epithelium cells, nonpigmented ciliary epithelium cells, corneal endothelial cells, peg cells, respiratory tract ciliated cells, oviduct ciliated cell, uterine endometrial ciliated cells, rete testis ciliated cells, ductuli efferens ciliated cells, ciliated ependymal cells, epidermal keratinocytes, epidermal basal cells, keratinocyte of fingernails and toenails, nail bed basal cells, medullary hair shaft cells, cortical hair shaft cells, cuticular hair shaft cells, cuticular hair root sheath cells, hair root sheath cells of Huxley's layer, hair root sheath cells of Henle's layer, external hair root sheath cells, hair matrix cells, surface epithelial cells of stratified squamous epithelium, basal cell of epithelia, urinary epithelium cells, auditory inner hair cells of organ of Corti, auditory outer hair cells of organ of Corti, basal cells of olfactory epithelium, cold-sensitive primary sensory neurons, heat-sensitive primary sensory neurons, Merkel cells of epidermis, olfactory receptor neurons, pain-sensitive primary sensory neurons, photoreceptor rod cells, photoreceptor blue-sensitive cone cells, photoreceptor green-sensitive cone cells, photoreceptor red-sensitive cone cells, proprioceptive primary sensory neurons, touch-sensitive primary sensory neurons, type I carotid body cells, type II carotid body cell, type I hair cell of vestibular apparatus of ear, type II hair cells of vestibular apparatus of ear, type I taste bud cells, cholinergic neural cells, adrenergic neural cells, peptidergic neural cells, inner pillar cells of organ of Corti, outer pillar cells of organ of Corti, inner phalangeal cells of organ of Corti, outer phalangeal cells of organ of Corti, border cells of organ of Corti, Hensen cells of organ of Corti, vestibular apparatus supporting cells, taste bud supporting cells, olfactory epithelium supporting cells, Schwann cells, satellite cells, enteric glial cells, astrocytes, neurons, oligodendrocytes, spindle neurons, anterior lens epithelial cells, crystallin-containing lens fiber cells, hepatocytes, adipocytes, white fat cells, brown fat cells, liver lipocytes, kidney glomerulus parietal cells, kidney glomerulus podocytes, kidney proximal tubule brush border cells, loop of Henle thin segment cells, kidney distal tubule cells, kidney collecting duct cells, type I pneumocytes, pancreatic duct cells, nonstriated duct cells, duct cells, intestinal brush border cells, exocrine gland striated duct cells, gall bladder epithelial cells, ductulus efferens nonciliated cells, epididymal principal cells, epididymal basal cells, ameloblast epithelial cells, planum semilunatum epithelial cells, organ of Corti interdental epithelial cells, loose connective tissue fibroblasts, corneal keratocytes, tendon fibroblasts, bone marrow reticular tissue fibroblasts, nonepithelial fibroblasts, pericytes, nucleus pulposus cells, cementoblast/cementocytes, odontoblasts, odontocytes, hyaline cartilage chondrocytes, fibrocartilage chondrocytes, elastic cartilage chondrocytes, osteoblasts, osteocytes, osteoclasts, osteoprogenitor cells, hyalocytes, stellate cells, hepatic stellate cells, pancreatic stellate cells, red skeletal muscle cells, white skeletal muscle cells, intermediate skeletal muscle cells, nuclear bag cells of muscle spindle, nuclear chain cells of muscle spindle, satellite cells, ordinary heart muscle cells, nodal heart muscle cells, Purkinje fiber cells, smooth muscle cells, myoepithelial cells of iris, myoepithelial cell of exocrine glands, reticulocytes, megakaryocytes, monocytes, connective tissue macrophages, epidermal Langerhans cells, dendritic cells, microglial cells, neutrophils, eosinophils, basophils, mast cell, helper T cells, suppressor T cells, cytotoxic T cell, natural killer T cells, B cells, natural killer cells, melanocytes, retinal pigmented epithelial cells, oogonia/oocytes, spermatids, spermatocytes, spermatogonium cells, spermatozoa, ovarian follicle cells, Sertoli cells, thymus epithelial cell, or interstitial kidney cells, or any combination thereof.
24. The method of any one of claims 1-23, wherein the cell classification comprises one or more of embryonic stem cells, embryonic germ cells, induced pluripotent stem cells, mesenchymal stem cells, bone marrow-derived mesenchymal stem cells, bone marrow-derived mesenchymal stromal cells, tissue plastic-adherent placental stem cells (PDACs), umbilical cord stem cells, amniotic fluid stem cells, amnion derived adherent cells (AMDACs), osteogenic placental adherent cells (OPACs), adipose stem cells, limbal stem cells, dental pulp stem cells, myoblasts, endothelial progenitor cells, neuronal stem cells, exfoliated teeth derived stem cells, hair follicle stem cells, dermal stem cells, parthenogenically derived stem cells, reprogrammed stem cells, amnion derived adherent cells, or side population stem cells, or any combination thereof.
25. The method of any one of claims 1-24, wherein the transcriptomic data comprises one or more of gene expression assays with fluorescently labeled probes, RNA sequencing (RNA-seq), microarray analysis, reverse transcription polymerase chain reaction (RT-PCR), cap analysis of gene expression, or single-cell RNA sequencing (scRNA-seq), or any combination thereof.
26. The method of any one of claims 1-25, wherein the plurality of cells comprises at least about 100 cells, at least about 200 cells, at least about 500 cells, or at least about 1,000 cells.
27. The method of any one of claims 1-26, wherein the plurality of genes comprises at least about 10 genes, at least about 20 genes, at least about 50 genes, or at least about 100 genes.
28. The method of any one of claims 1-26, wherein the plurality of genes comprises from about 5,000 genes to about 6,000 genes.
29. The method of any one of claims 1-26, wherein the plurality of genes comprises about 19,000 genes.
30. The method of any one of claims 1-29, wherein the plurality of genes comprise endogenous genes.
31. A system comprising one or more computer processors and computer memory coupled thereto, the computer memory comprising a machine executable code that, upon execution by the one or more computer processors, is configured to perform a method, the method comprising:
(a) retrieving from a database: a location data indicative of relative positions of the plurality of cells in a multidimensional image of the biological sample; and a transcriptomic data of a plurality of genes of the plurality of cells;
(b) processing at least the transcriptomic data to generate a gene expression matrix characterizing each cell of the plurality of cells by a gene expression level of the plurality of genes;
(c) analyzing the location data and the transcriptomic data to generate an environment confounder matrix, wherein the environment confounder matrix characterizes each cell of the plurality of cells by one or more environment variables of a region adjacent to or surrounding the each cell in the multi-dimensional image, wherein the one or more environment variables are based at least in part on (i) cell classification of one or more cells in at least the region or (ii) measurement artifact of the transcriptomic data in at least the region;
(d) determining a correlation between the plurality of genes of the gene expression matrix conditional on the environment confounder matrix; and
(e) identifying, based on the correlation determined in (d), at least one subset of the plurality of genes, wherein genes of the at least one subset are mutually correlated to one another in the biological sample.
32. The system of claim 31, wherein (d) comprises determining a degree of the correlation, and wherein (e) comprises identifying the at least one subset of the plurality of genes based on the degree of correlation.
33. The system of claim 32, wherein the identifying in (e) is based at least in part on determining a threshold level of the degree of correlation.
34. The system of any one of claims 31-33, wherein determining the correlation in (d) comprises analyzing covariance of the gene expression matrix conditional on the environment confounder matrix.
35. The system of claim 34, wherein determining the correlation in (d) further comprises generating a conditional correlation matrix of the plurality of genes based on the covariance, wherein the conditional correlation matrix is different from the gene expression matrix.
36. The system of any one of claims 31-35, wherein the at least one subset of the plurality of genes comprises a plurality of subsets that are different from one another.
37. The system of claim 36, wherein the plurality of subsets comprises a first subset having a first plurality of genes and a second subset having a second plurality of genes, wherein the first plurality of genes has at least one gene that is not in common with the second plurality of genes.
38. The system of any one of claims 31-37, wherein the gene expression matrix comprises an environment expression matrix, wherein the environment expression matrix characterizes a cell of the plurality of cells by analyzing the gene expression level of the plurality of genes of nearest neighboring cells of the cell within the multi-dimensional image, and wherein the method further comprises, in (b), processing the transcriptomic data and the location data to generate the environment expression matrix.
39. The system of claim 38, wherein a number of the nearest neighboring cells is at most about 1,000 cells, at most about 500 cells, at most about 100 cells, or at most about 50 cells.
40. The system of any one of claims 31-39, further comprising a graphical user interface configured to display the genes of the at least one subset of the plurality of genes to a user.
41. The system of any one of claims 31-40, wherein the one or more processors are further configured to, based on the analyzing in (c) and determination in (d), generate a gene cluster map comprising a plurality of shapes representing the plurality of genes, wherein the plurality of shapes is arranged in a plurality of clusters, wherein a cluster of the plurality of clusters corresponds to the at least one subset of the plurality of genes.
42. The system of claim 41, further comprising a graphical user interface configured to display the gene cluster map to a user.
43. The system of any one of claims 31-42, wherein the multi-dimensional image is a two-dimensional image.
44. The system of any one of claims 31-43, wherein the one or more processors are further configured to, subsequent to (e), score each cell of the plurality of cells based on single cell expression level of the genes of the at least one subset of the plurality of genes.
45. The system of claim 44, wherein the one or more processors are further configured to generate an additional multi-dimensional image of the biological sample based on the scoring.
46. The system of any one of claims 31-45, wherein the one or more processors are further configured to, in (c), receive selection of the one or more environment variables from a user via a graphical user interface.
47. The system of any one of claims 31-46, wherein the one or more environment variables are based on both (i) the cell classification in the at least the region and (ii) the measurement artifact of the transcriptomic data in the at least the region.
48. The system of any one of claims 31-47, wherein the measurement artifact comprises detection data via a synthetic control probe sequence.
49. The system of any one of claims 31-48, wherein the one or more environment variables are based at least in part on data comprising one or more of (i) a number of cells having a cell classification of interest in at least the region of the multi-dimensional image, (ii) a number of different cell classifications identified in at least the region of the multi-dimensional image, (iii) a ratio between numbers of cells of two different cell classifications of interest in at least the region of the multi-dimensional image, and (iv) a relative location between a cell having a cell classification of interest and a tissue substructure in at least a portion of the multi-dimensional image, or any combination thereof.
50. The system of any one of claims 31-49, wherein the region can be characterized by having at most about 5 cells, at most about 10 cells, at most about 20 cells, at most about 50 cells, or at most about 100 cells.
51. The system of any one of claims 31-50, wherein a number of the one or more environment variables in the environment confounder matrix is at least about 5, at least about 10, at least about 15, or at least about 20.
52. The system of any one of claims 31-51, wherein the cell classification comprises one or more of endothelial cells, epithelial cells, dermal cells, endodermal cells, mesodermal cells, fibroblasts, osteocytes, chondrocytes, immune cells, dendritic cells, hepatic cells, pancreatic cells, and stromal cells, or any combination thereof.
53. The system of any one of claims 31-52, wherein the cell classification comprises one or more of salivary gland mucous cells, salivary gland serous cells, von Ebner's gland cells, mammary gland cells, lacrimal gland cells, ceruminous gland cells, eccrine sweat gland dark cells, eccrine sweat gland clear cells, apocrine sweat gland cells, gland of Moll cells, sebaceous gland cells, Bowman's gland cells, Brunner's gland cells, seminal vesicle cells, prostate gland cells, bulbourethral gland cells, Bartholin's gland cells, gland of Littre cells, uterus endometrium cells, isolated goblet cells, stomach lining mucous cells, gastric gland zymogenic cells, gastric gland oxyntic cells, pancreatic acinar cells, Paneth cells, type II pneumocytes, Clara cells, somatotropes, lactotropes, thyrotropes, gonadotropes, corticotropes, intermediate pituitary cells, magnocellular neurosecretory cells, gut cells, respiratory tract cells, thyroid epithelial cells, parafollicular cells, parathyroid gland cells, parathyroid chief cell, oxyphil cell, adrenal gland cells, chromaffin cells, Leydig cells, theca interna cells, corpus luteum cells, granulosa lutein cells, theca lutein cells, juxtaglomerular cell, macula densa cells, peripolar cells, mesangial cell, blood vessel and lymphatic vascular endothelial fenestrated cells, blood vessel and lymphatic vascular endothelial continuous cells, blood vessel and lymphatic vascular endothelial splenic cells, synovial cells, serosal cell, squamous cells, columnar cells, dark cells, vestibular membrane cell, stria vascularis basal cells, stria vascularis marginal cell, cells of Claudius, cells of Boettcher, choroid plexus cells, pia-arachnoid squamous cells, pigmented ciliary epithelium cells, nonpigmented ciliary epithelium cells, corneal endothelial cells, peg cells, respiratory tract ciliated cells, oviduct ciliated cell, uterine endometrial ciliated cells, rete testis ciliated cells, ductuli efferens ciliated cells, ciliated ependymal cells, epidermal keratinocytes, epidermal basal cells, keratinocyte of fingernails and toenails, nail bed basal cells, medullary hair shaft cells, cortical hair shaft cells, cuticular hair shaft cells, cuticular hair root sheath cells, hair root sheath cells of Huxley's layer, hair root sheath cells of Henle's layer, external hair root sheath cells, hair matrix cells, surface epithelial cells of stratified squamous epithelium, basal cell of epithelia, urinary epithelium cells, auditory inner hair cells of organ of Corti, auditory outer hair cells of organ of Corti, basal cells of olfactory epithelium, cold-sensitive primary sensory neurons, heat-sensitive primary sensory neurons, Merkel cells of epidermis, olfactory receptor neurons, pain-sensitive primary sensory neurons, photoreceptor rod cells, photoreceptor blue-sensitive cone cells, photoreceptor green-sensitive cone cells, photoreceptor red-sensitive cone cells, proprioceptive primary sensory neurons, touch-sensitive primary sensory neurons, type I carotid body cells, type II carotid body cell, type I hair cell of vestibular apparatus of ear, type II hair cells of vestibular apparatus of ear, type I taste bud cells, cholinergic neural cells, adrenergic neural cells, peptidergic neural cells, inner pillar cells of organ of Corti, outer pillar cells of organ of Corti, inner phalangeal cells of organ of Corti, outer phalangeal cells of organ of Corti, border cells of organ of Corti, Hensen cells of organ of Corti, vestibular apparatus supporting cells, taste bud supporting cells, olfactory epithelium supporting cells, Schwann cells, satellite cells, enteric glial cells, astrocytes, neurons, oligodendrocytes, spindle neurons, anterior lens epithelial cells, crystallin-containing lens fiber cells, hepatocytes, adipocytes, white fat cells, brown fat cells, liver lipocytes, kidney glomerulus parietal cells, kidney glomerulus podocytes, kidney proximal tubule brush border cells, loop of Henle thin segment cells, kidney distal tubule cells, kidney collecting duct cells, type I pneumocytes, pancreatic duct cells, nonstriated duct cells, duct cells, intestinal brush border cells, exocrine gland striated duct cells, gall bladder epithelial cells, ductulus efferens nonciliated cells, epididymal principal cells, epididymal basal cells, ameloblast epithelial cells, planum semilunatum epithelial cells, organ of Corti interdental epithelial cells, loose connective tissue fibroblasts, corneal keratocytes, tendon fibroblasts, bone marrow reticular tissue fibroblasts, nonepithelial fibroblasts, pericytes, nucleus pulposus cells, cementoblast/cementocytes, odontoblasts, odontocytes, hyaline cartilage chondrocytes, fibrocartilage chondrocytes, elastic cartilage chondrocytes, osteoblasts, osteocytes, osteoclasts, osteoprogenitor cells, hyalocytes, stellate cells, hepatic stellate cells, pancreatic stellate cells, red skeletal muscle cells, white skeletal muscle cells, intermediate skeletal muscle cells, nuclear bag cells of muscle spindle, nuclear chain cells of muscle spindle, satellite cells, ordinary heart muscle cells, nodal heart muscle cells, Purkinje fiber cells, smooth muscle cells, myoepithelial cells of iris, myoepithelial cell of exocrine glands, reticulocytes, megakaryocytes, monocytes, connective tissue macrophages, epidermal Langerhans cells, dendritic cells, microglial cells, neutrophils, eosinophils, basophils, mast cell, helper T cells, suppressor T cells, cytotoxic T cell, natural killer T cells, B cells, natural killer cells, melanocytes, retinal pigmented epithelial cells, oogonia/oocytes, spermatids, spermatocytes, spermatogonium cells, spermatozoa, ovarian follicle cells, Sertoli cells, thymus epithelial cell, or interstitial kidney cells, or any combination thereof.
54. The system of any one of claims 31-53, wherein the cell classification comprises one or more of embryonic stem cells, embryonic germ cells, induced pluripotent stem cells, mesenchymal stem cells, bone marrow-derived mesenchymal stem cells, bone marrow-derived mesenchymal stromal cells, tissue plastic-adherent placental stem cells (PDACs), umbilical cord stem cells, amniotic fluid stem cells, amnion derived adherent cells (AMDACs), osteogenic placental adherent cells (OPACs), adipose stem cells, limbal stem cells, dental pulp stem cells, myoblasts, endothelial progenitor cells, neuronal stem cells, exfoliated teeth derived stem cells, hair follicle stem cells, dermal stem cells, parthenogenically derived stem cells, reprogrammed stem cells, amnion derived adherent cells, or side population stem cells, or any combination thereof.
55. The system of any one of claims 31-54, wherein the transcriptomic data comprises one or more of gene expression assays with fluorescently labeled probes, RNA sequencing (RNA-seq), microarray analysis, reverse transcription polymerase chain reaction (RT-PCR), cap analysis of gene expression, or single-cell RNA sequencing (scRNA-seq), or any combination thereof.
56. The system of any one of claims 31-55, wherein the plurality of cells comprises at least about 100 cells, at least about 200 cells, at least about 500 cells, or at least about 1,000 cells.
57. The system of any one of claims 31-56, wherein the plurality of genes comprises at least about 10 genes, at least about 20 genes, at least about 50 genes, or at least about 100 genes.
58. The system of any one of claims 31-57, wherein the plurality of genes comprises from about 5,000 genes to about 6,000 genes.
59. The system of any one of claims 31-58, wherein the plurality of genes comprises about 19,000 genes.
60. The system of any one of claims 31-59, wherein the plurality of genes comprise endogenous genes.
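For illustration only, steps (b) through (e) of claim 31, together with the neighbor-count environment variables of claim 49 and the per-cell scoring of claim 44, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the conditional correlation is approximated by correlating residuals of a linear least-squares fit on the confounders, the gene subsets are taken as connected components of a thresholded correlation graph (which does not guarantee that every pair in a subset exceeds the threshold), and all function names, the threshold of 0.5, the neighbor count k, and the synthetic data are hypothetical.

```python
import numpy as np

def neighbor_indices(xy, k):
    """Indices of the k nearest neighbors of each cell (brute force, self excluded)."""
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    return np.argsort(dist, axis=1)[:, :k]

def environment_confounders(xy, cell_types, k):
    """One column per cell type: how many of a cell's k neighbors have that type."""
    nbr_types = cell_types[neighbor_indices(xy, k)]  # shape (n_cells, k)
    columns = [(nbr_types == t).sum(axis=1) for t in np.unique(cell_types)]
    return np.stack(columns, axis=1).astype(float)

def conditional_correlation(expr, conf):
    """Gene-gene correlation of the residuals left after a linear fit on conf."""
    design = np.column_stack([np.ones(len(conf)), conf])
    # lstsq returns a least-squares solution even when columns are collinear
    beta, *_ = np.linalg.lstsq(design, expr, rcond=None)
    return np.corrcoef(expr - design @ beta, rowvar=False)

def correlated_subsets(corr, threshold):
    """Connected components (size > 1) of the graph with an edge where corr >= threshold."""
    seen, subsets = set(), []
    for start in range(corr.shape[0]):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            g = stack.pop()
            component.append(g)
            for h in map(int, np.flatnonzero(corr[g] >= threshold)):
                if h not in seen:
                    seen.add(h)
                    stack.append(h)
        if len(component) > 1:
            subsets.append(sorted(component))
    return subsets

# Synthetic demo (hypothetical data): two cell types split along the x axis.
rng = np.random.default_rng(0)
n_cells, k = 400, 10
xy = rng.uniform(size=(n_cells, 2))
cell_types = np.where(xy[:, 0] > 0.5, "B", "A")
conf = environment_confounders(xy, cell_types, k)  # columns: count of A, count of B
env = conf[:, 1] / k                               # fraction of type-B neighbors
latent = rng.normal(size=n_cells)                  # cell-intrinsic shared program
noise = rng.normal(size=(n_cells, 6))
expr = np.column_stack([
    latent + 0.3 * noise[:, 0],   # gene 0: driven by the shared latent program
    latent + 0.3 * noise[:, 1],   # gene 1: driven by the shared latent program
    3.0 * env + 0.5 * noise[:, 2],  # gene 2: driven only by the environment
    3.0 * env + 0.5 * noise[:, 3],  # gene 3: driven only by the environment
    noise[:, 4],                  # gene 4: pure noise
    noise[:, 5],                  # gene 5: pure noise
])

marginal = correlated_subsets(np.corrcoef(expr, rowvar=False), 0.5)
conditional = correlated_subsets(conditional_correlation(expr, conf), 0.5)
scores = expr[:, conditional[0]].mean(axis=1)      # per-cell score for the first subset
print("marginal subsets:", marginal)
print("conditional subsets:", conditional)
```

In this synthetic setup, genes 2 and 3 are correlated only because both track the local cell-type composition, so the marginal grouping is expected to pair them, while the grouping conditioned on the confounder matrix should retain only the pair that shares a cell-intrinsic program. The per-cell scores could then be plotted at the cell coordinates to produce a spatial map of the kind described in claim 45.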
PCT/US2024/043413 2023-08-24 2024-08-22 Systems and methods for cellular spatial analysis Pending WO2025043080A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363578444P 2023-08-24 2023-08-24
US63/578,444 2023-08-24

Publications (2)

Publication Number Publication Date
WO2025043080A2 (en) 2025-02-27
WO2025043080A3 WO2025043080A3 (en) 2025-04-17

Family

ID=94732635

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/043413 Pending WO2025043080A2 (en) 2023-08-24 2024-08-22 Systems and methods for cellular spatial analysis

Country Status (1)

Country Link
WO (1) WO2025043080A2 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DK3511423T4 (en) * 2012-10-17 2024-07-29 Spatial Transcriptomics Ab METHODS AND PRODUCT FOR OPTIMIZING LOCALIZED OR SPATIAL DETECTION OF GENE EXPRESSION IN A TISSUE SAMPLE
EP3788146A4 (en) * 2018-05-02 2022-06-01 The General Hospital Corporation High-resolution spatial macromolecule abundance assessment
CN115346599B (en) * 2022-10-19 2023-02-17 四川大学华西医院 H & E image gene and cell heterogeneity prediction method, system and storage medium

Also Published As

Publication number Publication date
WO2025043080A3 (en) 2025-04-17

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24857304

Country of ref document: EP

Kind code of ref document: A2