WO2025229503A1 - Methods for identifying exhausted t cells and t-cell receptors thereof - Google Patents
Methods for identifying exhausted T cells and T-cell receptors thereof
- Publication number
- WO2025229503A1 (PCT/IB2025/054407)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- cell
- cells
- exhausted
- exhaustion
- tcr
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/10—Gene or protein expression profiling; Expression-ratio estimation or normalisation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/158—Expression markers
Definitions
- T-cell receptor (TCR) specificity for tumor antigens can be a key factor that determines the effectiveness of an immune response against cancer. Identifying TCRs that are reactive to tumor antigens in a patient-specific manner can significantly improve the development of personalized cancer therapies. However, conventional methods for identifying these TCRs can be labor-intensive and challenging. Thus, the development of bioinformatic tools capable of predicting tumor antigen-reactive TCRs can be of great interest, presenting a potential solution to this problem.
- Tumor antigen-reactive T cell receptors can be found on the surface of a subpopulation of CD8+ and/or CD4+ T cells obtained from a tumor microenvironment (e.g., tumor infiltrating leukocytes (TILs)) that may display an exhaustion phenotype. Recognized herein is a need for improved methods and compositions for identifying such exhausted T cells from TILs and the tumor antigen-reactive TCRs of the identified exhausted T cells.
- Prediction algorithms that harness next-generation sequencing technologies, such as single-cell transcriptome (scGEX) and TCR sequencing (scTCR), can provide improved methods for identifying tumor antigen-reactive TCRs within a tumor.
- a method of identifying exhausted T cells from a population of T cells obtained from a tumor microenvironment of a subject having a cancer comprising: (a) providing single cell transcriptome data of the population of T cells; (b) classifying each T cell of the population of T cells as a CD4+ T cell or a CD8+ T cell based on an expression level of each classification gene of a set of at least 10 classification genes from the single cell transcriptome data, thereby generating a CD4+ cluster and a CD8+ cluster; and (c) calculating (i) a CD4+ exhaustion score and/or a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 5 CD4+ exhaustion gene markers, and (ii) a CD8+ exhaustion score and/or a CD8+ GSEA score for a T cell of the CD8+ cluster
- Also provided herein is a method of identifying exhausted T cells from a population of T cells obtained from a tumor microenvironment of a subject having a cancer, the method comprising: calculating a CD4+ exhaustion score and/or a CD4+ gene set enrichment analysis (GSEA) score for a T cell classified as a CD4+ T cell, wherein the calculating is based on an expression level of each CD4+ exhaustion gene marker of a set of at least 5 CD4+ exhaustion gene markers; wherein the expression level of each CD4+ exhaustion gene marker is from single cell transcriptome data of the population of T cells from the tumor microenvironment of the subject; wherein each T cell classified as a CD4+ T cell with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell.
- the method further comprises, prior to calculating, classifying a T cell from the population of T cells as a CD4+ T cell based on an expression level of each classification gene of a set of at least 10 classification genes from the single cell transcriptome data, thereby generating a CD4+ cluster.
- Also provided herein is a method of identifying exhausted T cells from a population of T cells obtained from a tumor microenvironment of a subject having a cancer, the method comprising: calculating a CD8+ exhaustion score and/or a CD8+ gene set enrichment analysis (GSEA) score for a T cell classified as a CD8+ T cell, wherein the calculating is based on an expression level of each CD8+ exhaustion gene marker of a set of at least 5 CD8+ exhaustion gene markers; wherein the expression level of each CD8+ exhaustion gene marker is from single cell transcriptome data of the population of T cells from the tumor microenvironment of the subject; wherein each T cell classified as a CD8+ T cell with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell.
- the method further comprises, prior to calculating, classifying a T cell from the population of T cells as a CD8+ T cell based on an expression level of each classification gene of a set of at least 10 classification genes from the single cell transcriptome data, thereby generating a CD8+ cluster.
- the method further comprises classifying each T cell from the population of T cells as a CD4+ T cell or a CD8+ T cell based on an expression level of each classification gene of a set of at least 10 classification genes from the single cell transcriptome data, thereby generating a CD4+ cluster and a CD8+ cluster.
- the method further comprises calculating (i) a CD4+ exhaustion score and/or a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 5 CD4+ exhaustion gene markers, and (ii) a CD8+ exhaustion score and/or a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 5 CD8+ exhaustion gene markers.
- the set of at least 5 CD4+ exhaustion gene markers is different from the set of at least 5 CD8+ exhaustion gene markers.
- Also provided herein is a method of classifying CD8+ T cells and CD4+ T cells in a population of T cells comprising: (a) providing single cell transcriptome data of a population of T cells obtained from a tumor microenvironment of a subject having a cancer; (b) classifying each T cell of the population of T cells as a CD4+ T cell or a CD8+ T cell based on an expression level of each classification gene of a set of at least 40 classification genes selected from the group consisting of the genes of Table 2 from the single cell transcriptome data, thereby generating a CD4+ cluster and a CD8+ cluster, wherein a T cell of the CD4+ cluster is classified as CD4+ T cell, and wherein a T cell of the CD8+ cluster is classified as CD8+ T cell.
- the method further comprises calculating a CD4+ exhaustion score and/or a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 5 CD4+ exhaustion gene markers.
- the expression level of each CD4+ exhaustion gene marker is from single cell transcriptome data of a population of T cells from the tumor microenvironment of the subject.
- each T cell within the CD4+ cluster with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell.
- the method further comprises calculating a CD8+ exhaustion score and/or a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 5 CD8+ exhaustion gene markers.
- the set of at least 5 CD4+ exhaustion gene markers is different from the set of at least 5 CD8+ exhaustion gene markers.
- the expression level of each CD8+ exhaustion gene marker is from single cell transcriptome data of a population of T cells from a tumor microenvironment of a subject having a cancer.
- each T cell within the CD8+ cluster with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell.
- the method further comprises obtaining the population of T cells from the tumor microenvironment of the subject.
- obtaining comprises isolating a tumor or a tumor tissue comprising the population of T cells from the subject.
- the expression level is determined by mRNA transcripts.
- the method further comprises sequencing mRNAs from the population of T cells to obtain the single cell transcriptome data.
- the method further comprises providing single-cell T-cell receptor (scTCR) data of the population of T cells.
- the method further comprises sequencing the population of T cells to obtain the scTCR data of each T cell.
- the method further comprises identifying a TCR clonotype of an exhausted CD4+ T cell or an exhausted CD8+ T cell based on the scTCR data of exhausted CD4+ T cells and exhausted CD8+ T cells.
- the method further comprises identifying TCR clonotypes of each exhausted CD4+ T cell of the population of T cells based on the scTCR data of exhausted CD4+ T cells.
- the method further comprises identifying TCR clonotypes of each exhausted CD8+ T cell of the population of T cells based on the scTCR data of exhausted CD8+ T cells.
- the method further comprises identifying TCR clonotypes of each exhausted CD4+ T cell and each exhausted CD8+ T cell of the population of T cells based on the scTCR data of exhausted CD4+ T cells and exhausted CD8+ T cells.
- a TCR clonotype of a given exhausted CD4+ T cell is matched to the CD4+ exhaustion score and/or the CD4+ GSEA score of the same exhausted CD4+ T cell.
- the TCR clonotype of a given exhausted CD4+ T cell is matched to the single cell transcriptome data of the same exhausted CD4+ T cell.
- the TCR clonotype of a given exhausted CD4+ T cell is matched to the single cell transcriptome data of the same exhausted CD4+ T cell via a same single cell barcode.
- a TCR clonotype of a given exhausted CD8+ T cell is matched to the CD8+ exhaustion score and/or the CD8+ GSEA score of the same exhausted CD8+ T cell.
- the TCR clonotype of a given exhausted CD8+ T cell is matched to the single cell transcriptome data of the same exhausted CD8+ T cell.
- the TCR clonotype of a given exhausted CD8+ T cell is matched to the single cell transcriptome data of the same exhausted CD8+ T cell via a same single cell barcode.
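The barcode matching described above can be sketched as a simple keyed join. This is a minimal illustration, assuming each cell's score and clonotype are already stored in dictionaries keyed by the shared single cell barcode; the function name, the barcode strings, and the clonotype labels are hypothetical and do not come from the source.

```python
def match_by_barcode(scores_by_barcode, clonotype_by_barcode):
    """Pair each cell's exhaustion score with its TCR clonotype.

    Both inputs are dicts keyed by the single cell barcode (hypothetical
    layout). Cells without a recovered clonotype are skipped.
    """
    matched = {}
    for barcode, score in scores_by_barcode.items():
        clonotype = clonotype_by_barcode.get(barcode)
        if clonotype is not None:
            matched[barcode] = (clonotype, score)
    return matched


# Toy data: barcode "BC3" has a score but no TCR record, so it is dropped.
scores = {"BC1": 1.2, "BC2": 0.4, "BC3": 0.9}
clonotypes = {"BC1": "clonotype_1", "BC2": "clonotype_2"}
matched = match_by_barcode(scores, clonotypes)
```

Keying on the barcode keeps the transcriptome-derived score and the scTCR-derived clonotype tied to the same physical cell, which is what the matching step requires.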
- the method further comprises identifying a number of single cells expressing a TCR of a given TCR clonotype that is present in a group of exhausted CD4+ T cells and a number of single cells expressing the same TCR of the given TCR clonotype that is present in a group of non-exhausted CD4+ T cells.
- the method further comprises selecting a TCR of a given TCR clonotype as a candidate tumor-reactive TCR clonotype when the number of single cells expressing the TCR of the given TCR clonotype that is present in the group of exhausted CD4+ T cells is larger than the number of single cells expressing the same TCR of the given TCR clonotype that is present in the group of non-exhausted CD4+ T cells.
- the method further comprises identifying a number of single cells expressing a TCR of a given TCR clonotype that is present in a group of exhausted CD8+ T cells and a number of single cells expressing the same TCR of the given TCR clonotype that is present in a group of non-exhausted CD8+ T cells.
- the method further comprises selecting a TCR of a given TCR clonotype as a candidate tumor-reactive TCR clonotype when the number of single cells expressing the TCR of the given TCR clonotype that is present in the group of exhausted CD8+ T cells is larger than the number of single cells expressing the same TCR of the given TCR clonotype that is present in the group of non-exhausted CD8+ T cells.
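The selection rule above compares, per clonotype, the number of cells in the exhausted group with the number in the non-exhausted group. A minimal sketch follows, assuming each cell has already been labelled with a clonotype identifier and an exhausted/non-exhausted flag; the input layout and names are illustrative assumptions, not the source's data format.

```python
from collections import Counter


def select_candidate_clonotypes(cells):
    """Return clonotypes seen in more exhausted than non-exhausted cells.

    cells: iterable of (clonotype_id, is_exhausted) pairs, one per cell
    (hypothetical layout).
    """
    exhausted = Counter()
    non_exhausted = Counter()
    for clonotype, flag in cells:
        if flag:
            exhausted[clonotype] += 1
        else:
            non_exhausted[clonotype] += 1
    # Strictly "larger than", as stated in the text; ties are not selected.
    return sorted(c for c in exhausted if exhausted[c] > non_exhausted[c])


cells = [
    ("c1", True), ("c1", True), ("c1", False),  # 2 exhausted vs 1 non-exhausted
    ("c2", True), ("c2", False),                # tie, not selected
    ("c3", False),                              # never exhausted
]
candidates = select_candidate_clonotypes(cells)
```

The same function can be applied separately to the CD4+ cluster and the CD8+ cluster, since the text states the rule independently for each.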
- the method further comprises, prior to obtaining the single cell transcriptomic data, separating a subset of T cells from the population of T cells based on expression of a CD4+ and/or CD8+ exhaustion marker, thereby generating a subset of exhausted T cells and a subset of non-exhausted T cells.
- the CD4+ and/or CD8+ exhaustion marker comprises at least 5 genes selected from the group consisting of genes in Tables 3-6.
- separating comprises fluorescence activated cell sorting (FACS).
- the method further comprises sequencing the subset of exhausted T cells and the subset of non-exhausted T cells using single cell sequencing or bulk sequencing.
- the sequencing does not comprise using a barcode.
- the population of T cells is obtained from a frozen sample or a fresh sample.
- the sample is a formalin-fixed paraffin-embedded (FFPE) sample.
- the method further comprises preparing a pharmaceutical composition using the candidate tumor-reactive TCR clonotype or a cell expressing the candidate tumor-reactive TCR clonotype.
- Also provided herein is a method of identifying one or more T-cell receptors from exhausted T cells from a population of T cells obtained from a tumor microenvironment of a subject having a cancer, the method comprising: (a) providing single cell transcriptome data of the population of T cells; (b) classifying each T cell of the population of T cells as a CD4+ T cell or a CD8+ T cell based on an expression level of each classification gene of a set of at least 10 classification genes from the single cell transcriptome data, thereby generating a CD4+ cluster and a CD8+ cluster; (c) calculating (i) a CD4+ exhaustion score and/or a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 5 CD4+ exhaustion gene markers, and (ii) a CD8+ exhaustion score and/or a CD8+ GSEA score for a
- the set of at least 10 classification genes comprises at least 10 genes selected from the group consisting of PTPN13, TNFRSF4, CCR6, FOXP3, TSHZ2, MFHAS1, FAAH2, CD4, GK, IL2RA, CRADD, LTB, IRS2, KLRB1, TNFRSF25, LINC02694, THADA, BATF, TNFRSF18, SELL, IL12RB2, FURIN, HIPK2, MAP3K5, TMEM173, CTSB, SAMHD1, ADAM19, ICOS, GNA15, EPSTI1, ZC3H12D, PHTF2, MAST4, UGP2, RAPGEF6, STAM, CTLA4, RORA, SATB1, ZEB1, PIM2, CD28, LDLRAD4, PELI1, RHBDD2, SOCS3, TRAF3, ABCC1, RNASET2, SPOCK2, ITK, STK24, SNX9, GZMA, RALGAPA1, GZMB, JMJD
- classifying each T cell of the population of T cells comprises classifying each T cell of the population of T cells as a CD4+ cell and/or a CD8+ cell based on an expression level of each classification gene of a set of from 11 to 99 classification genes selected from the group consisting of PTPN13, TNFRSF4, CCR6, FOXP3, TSHZ2, MFHAS1, FAAH2, CD4, GK, IL2RA, CRADD, LTB, IRS2, KLRB1, TNFRSF25, LINC02694, THADA, BATF, TNFRSF18, SELL, IL12RB2, FURIN, HIPK2, MAP3K5, TMEM173, CTSB, SAMHD1, ADAM19, ICOS, GNA15, EPSTI1, ZC3H12D, PHTF2, MAST4, UGP2, RAPGEF6, STAM, CTLA4, RORA, SATB1, ZEB1, PIM2, CD28, LDLRAD4, PELI1, RHBDD
- the set of at least 5 CD4+ exhaustion gene markers comprises at least 5 genes selected from the group consisting of ADGRG1, CD200, CHN1, CPM, CTLA4, CXCL13, DRAIC, ENTPD1, GNG4, ICA1, IGFL2, IGFLR1, MYO7A, PDE7B, NMB, PTPN13, TIGIT, TOX, TOX2, and TSHZ2.
- the set of at least 5 CD4+ exhaustion gene markers comprises from 6 to 20 genes selected from the group consisting of ADGRG1, CD200, CHN1, CPM, CTLA4, CXCL13, DRAIC, ENTPD1, GNG4, ICA1, IGFL2, IGFLR1, MYO7A, PDE7B, NMB, PTPN13, TIGIT, TOX, TOX2, and TSHZ2.
- the set of at least 5 CD8+ exhaustion gene markers comprises at least 5 genes selected from the group consisting of CHN1, CLECL1, CTLA4, CXCL13, CXCR6, ENTPD1, HAVCR2, KRT86, LAG3, LAYN, LRRN3, MYO1E, MYO7A, PDCD1, SIRPG, SRGAP3, STAM, TIGIT, TNFRSF18, and TOX.
- the set of at least 5 CD8+ exhaustion gene markers comprises from 6 to 20 genes selected from the group consisting of CHN1, CLECL1, CTLA4, CXCL13, CXCR6, ENTPD1, HAVCR2, KRT86, LAG3, LAYN, LRRN3, MYO1E, MYO7A, PDCD1, SIRPG, SRGAP3, STAM, TIGIT, TNFRSF18, and TOX.
- calculating the CD4+ exhaustion score for the T cell of the CD4+ cluster comprises (i) obtaining a UMI count for each CD4+ exhaustion gene of the set of at least 5 CD4+ exhaustion gene markers to obtain the expression level of each CD4+ exhaustion gene of the set of at least 5 CD4+ exhaustion gene markers; (ii) scaling the UMI count by dividing the UMI count for each gene by the total number of UMIs of the single cell transcriptome data of the T cell and then multiplying the quotient by a scale factor; (iii) applying a logarithmic transformation to obtain normalized UMI counts for the set of at least 5 CD4+ exhaustion gene markers; and (iv) calculating the CD4+ exhaustion score for the T cell as a mean of the normalized UMI counts, wherein the T cell with a CD4+ exhaustion score equal to or higher than 0.65 is identified as an exhausted CD4+ T cell.
- calculating the CD8+ exhaustion score for the T cell of the CD8+ cluster comprises (i) obtaining a UMI count for each CD8+ exhaustion gene of the set of at least 5 CD8+ exhaustion gene markers to obtain the expression level of each gene of the set of at least 5 exhaustion gene markers; (ii) scaling the UMI count by dividing the UMI count for each CD8+ exhaustion gene by the total number of UMIs of the single cell transcriptome data of the T cell and then multiplying the quotient by a scale factor; (iii) applying a logarithmic transformation to obtain normalized UMI counts for the set of at least 5 CD8+ exhaustion gene markers; and (iv) calculating the CD8+ exhaustion score for the T cell as a mean of the normalized UMI counts, wherein the T cell with a CD8+ exhaustion score equal to or higher than 0.65 is identified as an exhausted CD8+ T cell.
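Steps (i) to (iv) above can be sketched for a single cell as follows. The text specifies only "a scale factor" and "a logarithmic transformation"; the scale factor of 10,000 and the natural log of (1 + x) used here are assumptions borrowed from common single-cell practice, and the function names and toy counts are hypothetical. The 0.65 cutoff is the value stated in the text.

```python
import math


def exhaustion_score(umi_counts, marker_genes, scale_factor=10_000):
    """Mean log-normalized UMI count over the marker genes of one cell.

    umi_counts: dict of gene symbol -> raw UMI count for a single cell.
    scale_factor and log1p are assumptions; the source leaves them open.
    """
    total_umis = sum(umi_counts.values())
    if total_umis == 0:
        return 0.0
    normalized = []
    for gene in marker_genes:
        # (ii) scale by the cell's total UMIs, then by the scale factor
        scaled = umi_counts.get(gene, 0) / total_umis * scale_factor
        # (iii) logarithmic transformation
        normalized.append(math.log1p(scaled))
    # (iv) the score is the mean of the normalized counts
    return sum(normalized) / len(normalized)


def is_exhausted(score, threshold=0.65):
    """Apply the 'equal to or higher than 0.65' rule from the text."""
    return score >= threshold


# Toy cell with 1,000 UMIs in total; counts are invented for illustration.
markers = ["CTLA4", "TOX", "TIGIT", "CXCL13", "ENTPD1"]
cell = {"CTLA4": 5, "TOX": 3, "TIGIT": 0, "CXCL13": 2, "ENTPD1": 0, "ACTB": 990}
score = exhaustion_score(cell, markers)
```

Because the score is a mean over the marker set, markers with zero counts pull it down, so a cell needs broad expression across the set rather than one highly expressed gene to clear the cutoff.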
- the set of at least 5 CD4+ exhaustion gene markers comprises at least 5 genes selected from the group consisting of ADD3, AGFG1, AHI1, AP3S1, ARAP2, ARHGEF3, ATP2A2, CCDC6, CD200, CD27, CH25H, CHN1, CNIH1, COTL1, CPM, CRYBG1, CTLA4, CXCL13, DUSP4, ELMO1, FABP5, FBLN7, FBXO32, FKBP5, FOXN2, FYB1, GEM, GK, GPRIN3, GRSF1, GYPC, HIPK2, HMGB2, ICA1, IL6ST, IQGAP1, ITM2A, ITPR1, JARID2, LHFPL6, LIMS1, LRMP, LRRC8D, MAGEH1, MTHFD2, NAP1L4, NCOA7, NFATC2, NMB, NR3C1, NUDT16, PDCD1, PGM2L1, PHACTR2, POR, PTPN13
- the set of at least 5 CD4+ exhaustion gene markers comprises from 6 to 88 genes selected from the group consisting of ADD3, AGFG1, AHI1, AP3S1, ARAP2, ARHGEF3, ATP2A2, CCDC6, CD200, CD27, CH25H, CHN1, CNIH1, COTL1, CPM, CRYBG1, CTLA4, CXCL13, DUSP4, ELMO1, FABP5, FBLN7, FBXO32, FKBP5, FOXN2, FYB1, GEM, GK, GPRIN3, GRSF1, GYPC, HIPK2, HMGB2, ICA1, IL6ST, IQGAP1, ITM2A, ITPR1, JARID2, LHFPL6, LIMS1, LRMP, LRRC8D, MAGEH1, MTHFD2, NAP1L4, NCOA7, NFATC2, NMB, NR3C1, NUDT16, PDCD1, PGM2L1, PHACTR2, POR, PT
- the set of at least 5 CD8+ exhaustion gene markers comprises at least 5 genes selected from the group consisting of AHSA1, ALOX5AP, BAG3, BST2, CACYBP, CARD16, CD3D, CD7, CD82, CHN1, CLECL1, CLEC2B, CLEC2D, CTLA4, CTSD, CXCL13, CXCR6, DUSP4, ENTPD1, FKBP1A, GAPDH, GEM, GZMB, HAVCR2, HLA-DRB1, HSPB1, ICOS, IQGAP1, ITGAE, KRT86, LAG3, LAYN, LSP1, NAP1L4, NR3C1, PDCD1, PELI1, PHLDA1, POLR1E, PRDM1, PTPN22, RAB11FIP1, RAB27A, RBPJ, RGS1, RGS2, RHBDD2, RUNX2, SAMSN1, SERPINH1, SH3BGRL3, SLA, SNX9, SRG
- the set of at least 5 CD8+ exhaustion gene markers comprises from 6 to 61 genes selected from the group consisting of AHSA1, ALOX5AP, BAG3, BST2, CACYBP, CARD16, CD3D, CD7, CD82, CHN1, CLECL1, CLEC2B, CLEC2D, CTLA4, CTSD, CXCL13, CXCR6, DUSP4, ENTPD1, FKBP1A, GAPDH, GEM, GZMB, HAVCR2, HLA-DRB1, HSPB1, ICOS, IQGAP1, ITGAE, KRT86, LAG3, LAYN, LSP1, NAP1L4, NR3C1, PDCD1, PELI1, PHLDA1, POLR1E, PRDM1, PTPN22, RAB11FIP1, RAB27A, RBPJ, RGS1, RGS2, RHBDD2, RUNX2, SAMSN1, SERPINH1, SH3BGRL3, SLA, SNX9,
- (A) calculating the CD4+ GSEA score for the T cell of the CD4+ cluster comprises (i) obtaining a UMI count for each gene of all genes in the single cell transcriptome data; (ii) obtaining a UMI rank of all genes by ranking UMI counts of all genes; (iii) increasing a running-sum statistic for each CD4+ exhaustion gene of all genes that appears in the set of at least 5 CD4+ exhaustion gene markers and decreasing a running-sum statistic for each CD4+ exhaustion gene of all genes that does not appear in the set of at least 5 CD4+ exhaustion gene markers; and (iv) calculating the CD4+ GSEA score based on running-sum statistics, wherein the T cell with a CD4+ GSEA score equal to or higher than a cutoff value established from data distribution is identified as an exhausted CD4+ T cell, or (B) calculating the CD4+ GSEA score for the T cell of the CD4+ cluster comprises (i) obtaining a UMI count for each gene
- the cutoff value established from data distribution in (A) or (B) is 0.2.
- calculating in (B)(iii) comprises assessing recovery of the set of at least 5 CD4+ exhaustion genes.
- the set of at least 5 CD4+ exhaustion genes are selected among the top ranked genes from the UMI rank obtained in (B)(ii).
- (A) calculating the CD8+ GSEA score for the T cell of the CD8+ cluster comprises (i) obtaining a UMI count for each gene of all genes in the single cell transcriptome data; (ii) obtaining a UMI rank of all genes by ranking UMI counts of all genes; (iii) increasing a running-sum statistic for each CD8+ exhaustion gene of all genes that appears in the set of at least 5 CD8+ exhaustion gene markers and decreasing a running-sum statistic for each CD8+ exhaustion gene of all genes that does not appear in the set of at least 5 CD8+ exhaustion gene markers; and (iv) calculating the CD8+ GSEA score based on running-sum statistics, wherein the T cell with a CD8+ GSEA score equal to or higher than a cutoff value established from data distribution is identified as an exhausted CD8+ T cell, or (B) calculating the CD8+ GSEA score for the T cell of the CD8+ cluster comprises (i) obtaining a UMI count for each gene
- the cutoff value from data distribution in (A) or (B) is 0.3.
- calculating in (B)(iii) comprises assessing recovery of the set of at least 5 CD8+ exhaustion genes.
- the set of at least 5 CD8+ exhaustion genes are selected among the top ranked genes from the UMI rank obtained in (B)(ii).
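The running-sum calculation in option (A) can be sketched for one cell as below. The text does not give step sizes or say how the final score is read off the running sum; this sketch assumes an unweighted, Kolmogorov-Smirnov-style walk (step up by 1/number of marker genes, step down by 1/number of other genes) and takes the signed maximum deviation from zero as the score. Those choices, the function name, and the toy counts are assumptions for illustration only.

```python
def running_sum_enrichment(umi_counts, marker_genes):
    """Unweighted running-sum enrichment score for one cell.

    umi_counts: dict of gene symbol -> UMI count for all genes in the cell.
    Genes are ranked by descending UMI count (ties broken by gene name).
    """
    markers = set(marker_genes)
    ranked = sorted(umi_counts, key=lambda g: (-umi_counts[g], g))
    n_hit = sum(1 for g in ranked if g in markers)
    n_miss = len(ranked) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running = 0.0
    best = 0.0
    for gene in ranked:
        if gene in markers:
            running += 1.0 / n_hit   # gene appears in the marker set
        else:
            running -= 1.0 / n_miss  # gene does not appear in the set
        if abs(running) > abs(best):
            best = running
    return best


# Toy cell: both marker genes sit at the top of the ranking.
counts = {"A": 10, "B": 8, "C": 1, "D": 0}
top_score = running_sum_enrichment(counts, ["A", "B"])
bottom_score = running_sum_enrichment(counts, ["C", "D"])
```

Under these assumptions the score lies between -1 and 1, so the cutoffs of 0.2 (CD4+) and 0.3 (CD8+) quoted above would be compared against it directly with an "equal to or higher than" test.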
- the method further comprises calculating the CD4+ exhaustion score and the CD4+ GSEA score for the T cell of the CD4+ cluster.
- the method further comprises calculating the CD8+ exhaustion score and the CD8+ GSEA score for the T cell of the CD8+ cluster.
- the method further comprises identifying TCR clonotypes of the exhausted CD4+ T cells and exhausted CD8+ cells separately based on the scTCR data of exhausted CD4+ T cells and exhausted CD8+ T cells identified in (c), wherein the exhausted CD4+ T cells have both the CD4+ exhaustion score and the CD4+ GSEA score above the threshold value, and the exhausted CD8+ T cells have both the CD8+ exhaustion score and the CD8+ GSEA score above the threshold value.
- the method further comprises identifying TCR clonotypes of the exhausted CD4+ T cells and exhausted CD8+ cells separately based on the scTCR data of exhausted CD4+ T cells and exhausted CD8+ T cells identified in (c), wherein the exhausted CD4+ T cells have the CD4+ exhaustion score or the CD4+ GSEA score above the threshold value, and the exhausted CD8+ T cells have the CD8+ exhaustion score or the CD8+ GSEA score above the threshold value.
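The two preceding variants differ only in whether a cell must pass both scores or either score. A minimal sketch of that decision is shown below; the function and parameter names are hypothetical, and the comparison uses "equal to or higher than", following the threshold wording used elsewhere in the text.

```python
def is_exhausted_cell(exhaustion_score, gsea_score,
                      score_cutoff, gsea_cutoff, require_both=True):
    """Combine the exhaustion score and the GSEA score for one cell.

    require_both=True  -> both scores must reach their cutoff (stricter).
    require_both=False -> reaching either cutoff is enough (more permissive).
    """
    passes_score = exhaustion_score >= score_cutoff
    passes_gsea = gsea_score >= gsea_cutoff
    if require_both:
        return passes_score and passes_gsea
    return passes_score or passes_gsea


# Toy cell that clears the exhaustion-score cutoff but not the GSEA cutoff.
strict = is_exhausted_cell(0.9, 0.1, score_cutoff=0.65, gsea_cutoff=0.2)
permissive = is_exhausted_cell(0.9, 0.1, score_cutoff=0.65, gsea_cutoff=0.2,
                               require_both=False)
```

The stricter variant trades sensitivity for specificity: fewer cells are called exhausted, so fewer clonotypes reach the downstream selection steps.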
- the method comprises calculating a mean or median CD4+ exhaustion score and/or exhaustion score and a mean or median CD4+ GSEA score for all CD4+ exhausted T cells having the same TCR clonotype; and/or (b) for each TCR clonotype identified in a CD8+ exhausted T cell, the method comprises calculating a mean or median CD8+ exhaustion score and/or exhaustion score and a mean or median CD8+ GSEA score for all CD8+ exhausted T cells having the same TCR clonotype.
- the method comprises identifying a maximum CD4+ exhaustion score and/or exhaustion score and a maximum CD4+ GSEA score for all CD4+ exhausted T cells having the same TCR clonotype; and/or (b) for each TCR clonotype identified in a CD8+ exhausted T cell, the method comprises identifying a maximum CD8+ exhaustion score and/or exhaustion score and a maximum CD8+ GSEA score for all CD8+ exhausted T cells having the same TCR clonotype.
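The per-clonotype summaries described above (mean, median, and maximum of each score over all exhausted cells sharing a clonotype) can be sketched as a group-by. The input layout, key names, and toy values are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_by_clonotype(cells):
    """Aggregate per-cell scores for each TCR clonotype.

    cells: iterable of (clonotype_id, exhaustion_score, gsea_score) tuples
    for exhausted cells of one cluster (hypothetical layout).
    """
    grouped = defaultdict(list)
    for clonotype, ex_score, gsea_score in cells:
        grouped[clonotype].append((ex_score, gsea_score))

    summary = {}
    for clonotype, pairs in grouped.items():
        ex_scores = [p[0] for p in pairs]
        gsea_scores = [p[1] for p in pairs]
        summary[clonotype] = {
            "clone_size": len(pairs),
            "mean_exhaustion": mean(ex_scores),
            "median_exhaustion": median(ex_scores),
            "max_exhaustion": max(ex_scores),
            "mean_gsea": mean(gsea_scores),
            "median_gsea": median(gsea_scores),
            "max_gsea": max(gsea_scores),
        }
    return summary


cells = [("c1", 1.0, 0.5), ("c1", 3.0, 0.25), ("c2", 2.0, 0.5)]
summary = summarize_by_clonotype(cells)
```

Running the function once on the exhausted CD4+ cells and once on the exhausted CD8+ cells keeps the two clusters separate, as the text describes.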
- a TCR clonotype of a given exhausted CD4+ T cell is matched to the CD4+ exhaustion score and/or the CD4+ GSEA score of the same exhausted CD4+ T cell.
- the TCR clonotype of a given exhausted CD4+ T cell is matched to the single cell transcriptome data of the same exhausted CD4+ T cell.
- the TCR clonotype of a given exhausted CD4+ T cell is matched to the single cell transcriptome data of the same exhausted CD4+ T cell via a same single cell barcode.
- a TCR clonotype of a given exhausted CD8+ T cell is matched to the CD8+ exhaustion score and/or the CD8+ GSEA score of the same exhausted CD8+ T cell.
- the TCR clonotype of a given exhausted CD8+ T cell is matched to the single cell transcriptome data of the same exhausted CD8+ T cell.
- the TCR clonotype of a given exhausted CD8+ T cell is matched to the single cell transcriptome data of the same exhausted CD8+ T cell via a same single cell barcode.
- the method further comprises identifying a number of single cells expressing a TCR of a given TCR clonotype that is present in a group of exhausted CD4+ T cells and a number of single cells expressing the same TCR of the given TCR clonotype that is present in a group of non-exhausted CD4+ T cells.
- the method further comprises selecting a TCR of a given TCR clonotype as a candidate tumor-reactive TCR clonotype when the number of single cells expressing the TCR of the given TCR clonotype that is present in the group of exhausted CD4+ T cells is larger than the number of single cells expressing the same TCR of the given TCR clonotype that is present in the group of non-exhausted CD4+ T cells.
- the method further comprises identifying a number of single cells expressing a TCR of a given TCR clonotype that is present in a group of exhausted CD8+ T cells and a number of single cells expressing the same TCR of the given TCR clonotype that is present in a group of non-exhausted CD8+ T cells.
- the method further comprises selecting a TCR of a given TCR clonotype as a candidate tumor-reactive TCR clonotype when the number of single cells expressing the TCR of the given TCR clonotype that is present in the group of exhausted CD8+ T cells is larger than the number of single cells expressing the same TCR of the given TCR clonotype that is present in the group of non-exhausted CD8+ T cells.
- quality checking comprises excluding candidate tumor-reactive TCR clonotypes which (i) have unique pairing of TCR alpha chain and TCR beta chain, (ii) match to known TCRs from a public database; and/or (iii) express innate immune cell markers.
- candidate tumor-reactive TCR clonotypes that match to a known TCR that recognizes a non-oncogenic pathogen are not selected.
- the method further comprises ranking the candidate tumor-reactive TCR clonotypes of the exhausted CD4+ T cells based on clone size.
- the method further comprises ranking the candidate tumor-reactive TCR clonotypes of the exhausted CD8+ T cells based on clone size.
- the method further comprises ranking the candidate tumor-reactive TCR clonotypes with similar clone sizes based on the mean or median CD4+ exhaustion score, the maximum CD4+ exhaustion score, the mean or median CD4+ GSEA score, and/or the maximum CD4+ GSEA score for all CD4+ exhausted T cells.
- the method further comprises ranking the candidate tumor-reactive TCR clonotypes with similar clone sizes based on the mean or median CD8+ exhaustion score, the maximum CD8+ exhaustion score, the mean or median CD8+ GSEA score, and/or the maximum CD8+ GSEA score for all CD8+ exhausted T cells.
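The ranking described above orders candidates by clone size and uses exhaustion-related scores to order clonotypes of similar size. The sketch below treats "similar" as "equal" and uses the mean exhaustion score as the tie-breaker; both are simplifying assumptions, as are the dictionary layout and names.

```python
def rank_clonotypes(summaries):
    """Rank clonotype ids: larger clones first, then higher mean score.

    summaries: dict of clonotype_id -> {"clone_size": int,
    "mean_exhaustion": float} (hypothetical layout). The clonotype id is
    the final tie-breaker so the ordering is deterministic.
    """
    return sorted(
        summaries,
        key=lambda cid: (
            -summaries[cid]["clone_size"],
            -summaries[cid]["mean_exhaustion"],
            cid,
        ),
    )


summaries = {
    "a": {"clone_size": 5, "mean_exhaustion": 1.0},
    "b": {"clone_size": 5, "mean_exhaustion": 2.0},
    "c": {"clone_size": 9, "mean_exhaustion": 0.7},
}
ranking = rank_clonotypes(summaries)
top_two = ranking[:2]
```

Slicing the ranked list, as in `top_two`, corresponds to picking candidates from the top of the ranking; any of the other summary statistics named in the text (median, maximum, or GSEA-based) could replace the mean in the sort key.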
- the same TCR clonotype is determined by having the same CDR3 sequence.
- the candidate tumor-reactive TCR clonotypes that match to known TCRs are determined by having the same CDR3 sequence.
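Matching candidates to known TCRs by identical CDR3 sequence, and leaving out those that match, can be sketched as a set lookup. The CDR3 strings below are made-up placeholders rather than real receptor sequences, and the function name and input layout are assumptions.

```python
def drop_known_tcrs(candidates, known_cdr3s):
    """Remove candidate clonotypes whose CDR3 matches a known TCR.

    candidates: dict of clonotype_id -> CDR3 amino-acid sequence.
    known_cdr3s: iterable of CDR3 sequences of known TCRs, for example
    ones reported to recognize a pathogen (hypothetical input).
    A match is an exact string match, per the 'same CDR3 sequence' rule.
    """
    known = set(known_cdr3s)
    return {cid: cdr3 for cid, cdr3 in candidates.items() if cdr3 not in known}


# Placeholder sequences for illustration only.
candidates = {"c1": "CASSPLACEHOLDERAF", "c2": "CASSEXAMPLEONLYF"}
known = ["CASSEXAMPLEONLYF"]
kept = drop_known_tcrs(candidates, known)
```

Exact matching is the simplest reading of "the same CDR3 sequence"; it does not catch near-identical sequences, which would need a separate similarity criterion that the text does not describe.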
- the candidate tumor-reactive TCR clonotype of a proliferating cell is given a higher weighting value when ranking the candidate tumor-reactive TCR clonotypes.
- the candidate tumor-reactive TCR clonotypes are predicted to be therapeutically relevant.
- a median positive predictive value is at least 0.1 for CD4+ TCR clones or at least 0.1 for CD8+ TCR clones.
- the method further comprises selecting at least one candidate tumor- reactive TCR clonotype from at least the top 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 or more of the candidate tumor-reactive TCR clonotypes ranked.
- the method further comprises delivering a nucleic acid encoding a TCR comprising the at least one candidate tumor-reactive TCR clonotype of the candidate tumor- reactive TCR clonotypes into a cell.
- the method further comprises administering the nucleic acid encoding a TCR comprising the at least one candidate tumor-reactive TCR clonotype of the candidate tumor- reactive TCR clonotypes, or a cell comprising the nucleic acid encoding a TCR comprising the at least one candidate tumor-reactive TCR clonotype of the candidate tumor-reactive TCR clonotypes into a subject.
- the subject is the same subject from whom the population of T cells is obtained.
- the population of T cells are tumor-infiltrating lymphocytes (TILs).
- TILs tumor-infiltrating lymphocytes
- the population of T cells comprises at least 100, at least 500, at least 1,000, at least 2,000, at least 5,000, at least 10,000 or more cells.
- Also provided herein is a method of identifying one or more T-cell receptors as one or more candidate tumor-reactive TCRs from exhausted T cells from a population of T cells obtained from a tumor microenvironment of a subject having a cancer, the method comprising: (a) providing single cell transcriptome data and single-cell T-cell receptor (scTCR) data of the population of T cells comprising exhausted CD4+ T cells and exhausted CD8+ T cells; and (b) identifying TCR clonotypes of the exhausted CD4+ T cells or the exhausted CD8+ cells based on the scTCR data of the exhausted CD4+ T cells or the exhausted CD8+ T cells, wherein the exhausted CD4+ T cells or the exhausted CD8+ T cells are identified based on the single cell transcriptome data.
- scTCR single-cell T-cell receptor
- the exhausted CD4+ T cells or the exhausted CD8+ T cells are identified by the method of any one of claims 1-45.
- each cell of the exhausted CD4+ T cells or the exhausted CD8+ T cells has an exhaustion score and/or a GSEA score equal to or higher than a threshold value.
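The threshold test in the preceding embodiment can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `CellScores` record, the `is_exhausted` helper, the `require_both` switch (modelling the "and/or"), and the threshold values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CellScores:
    """Per-cell scores; either score may be missing (hypothetical record)."""
    barcode: str
    exhaustion_score: Optional[float] = None
    gsea_score: Optional[float] = None


def is_exhausted(cell: CellScores,
                 exhaustion_threshold: float,
                 gsea_threshold: float,
                 require_both: bool = False) -> bool:
    """Return True if the available score(s) are equal to or higher than
    their threshold. `require_both` chooses between "and" and "or"."""
    checks = []
    if cell.exhaustion_score is not None:
        checks.append(cell.exhaustion_score >= exhaustion_threshold)
    if cell.gsea_score is not None:
        checks.append(cell.gsea_score >= gsea_threshold)
    if not checks:  # no score available: cannot call the cell exhausted
        return False
    return all(checks) if require_both else any(checks)


# Illustrative values only; the thresholds are assumptions.
cell = CellScores("AAAC-1", exhaustion_score=0.8, gsea_score=0.1)
either = is_exhausted(cell, exhaustion_threshold=0.5, gsea_threshold=0.5)
both = is_exhausted(cell, 0.5, 0.5, require_both=True)
```

With these example values, `either` is true because the exhaustion score clears its threshold, while `both` is false because the GSEA score does not.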
- the candidate tumor-reactive TCR induces activation of NFAT.
- the candidate tumor-reactive TCR induces expression of CD69, IFN-γ, TNF-α, IL-2, and/or IL-18.
- a nucleic acid encoding a TCR comprising the at least one candidate tumor-reactive TCR clonotype selected by the method of any one of the foregoing embodiments
- a cell comprising a TCR comprising the at least one candidate tumor-reactive TCR clonotype selected by the method of any one of the foregoing embodiments or the nucleic acid of any one of the foregoing embodiments.
- a pharmaceutical composition comprising a TCR comprising (a) the at least one candidate tumor-reactive TCR clonotype selected by the method of any one of the foregoing embodiments, the nucleic acid of any one of the foregoing embodiments, or the cell of any one of the foregoing embodiments, and (b) a pharmaceutically acceptable carrier.
- TCR comprising the at least one candidate tumor-reactive TCR clonotype selected by the method of any one of the foregoing embodiments, the nucleic acid of any one of the foregoing embodiments, the cell of any one of the foregoing embodiments, or the pharmaceutical composition of any one of the foregoing embodiments in the manufacturing of a medicament in treating a cancer in a subject in need thereof.
- the cancer is selected from the group consisting of bone cancer, blood cancer, lung cancer, liver cancer, pancreatic cancer, skin cancer, cancer of the head or neck, cutaneous or intraocular melanoma, uterine cancer, ovarian cancer, rectal cancer, cancer of the anal region, stomach cancer, colon cancer, breast cancer, prostate cancer, carcinoma of the sexual and reproductive organs, Hodgkin’s Disease, cancer of the esophagus, cancer of the small intestine, cancer of the endocrine system, cancer of the thyroid gland, cancer of the parathyroid gland, cancer of the adrenal gland, sarcoma of soft tissue, cancer of the bladder, cancer of the kidney, renal cell carcinoma, carcinoma of the renal pelvis, neoplasms of the central nervous system (CNS), neuroectodermal cancer, spinal axis tumors, glioma, meningioma, and pituitary adenoma.
- CNS central nervous system
- FIG. 1 depicts the mRNA expression level of the CD4 gene and CD8A gene from scGEX data using a normalized UMI unit.
- FIGs. 2A-2C depict the separation of tumor-infiltrating lymphocytes (TILs) into CD4+ or CD8+ populations. Each dot represents a single T cell.
- FIG. 2A shows Uniform Manifold Approximation and Projection (UMAP) on two clusters formed by using the 99-gene gene signature which is detailed in Table 2.
- FIG. 2B shows gene expression levels for the CD4 gene in normalized UMI counts.
- FIG. 2C shows gene expression levels for CD8A gene in normalized UMI counts accordingly.
- FIG. 3A depicts a UMAP on clustering and annotation of T cell functional subtypes.
- FIG. 3B depicts the corresponding CD4+ or CD8+ identity defined as illustrated in FIG. 2 plotted on the same UMAP.
- FIG. 4A depicts the annotated clusters which represent T cell functional subtypes as shown in FIG. 3A (left).
- An exhausted CD8 cluster i.e. CD8.EX
- Identification of exhausted CD4 i.e. CD4_EX
- exhausted CD8 CD8_EX
- FIG. 4B depicts exhaustion signature score assignment for each cell from TILs.
- the top graph shows the distribution of CD8 exhaustion signature scores across annotated functional T cell subtypes.
- the bottom graph shows the distribution of CD4 exhaustion signature scores across annotated functional T cell subtypes. Each dot represents a T cell.
- FIG. 5A depicts exhausted and non-exhausted TILs by 61-gene CD8 exhaustion signature.
- FIG. 5B depicts the correlation between the GSEA score of the 61-gene CD8 exhaustion signature and the CD8 exhaustion signature score.
- FIG. 5C depicts area under the curve (AUC) scores from the 61-gene CD8 exhaustion signature for each T cell subtype.
- FIG. 6A depicts exhausted and non-exhausted TILs by 88-gene CD4 exhaustion signature.
- FIG. 6B depicts the correlation between the GSEA score of the 88-gene CD4 exhaustion signature and the CD4 exhaustion signature score.
- FIG. 6C depicts AUC scores from the 88-gene CD4 exhaustion signature for each T cell subtype.
- FIGs. 7A-7C depict a quality control step on lists of identified exhausted CD4+ or CD8+ clonotypes.
- FIG. 7A depicts expression of CD4.
- FIG. 7B depicts expression of CD8A.
- FIG. 7C depicts expression of CD8B.
- FIG. 8 summarizes an exemplary TCR selection and validation process.
- FIGs. 9A-9B show exhaustion signature score for identified TCRs.
- FIG. 9A shows CD8 exhaustion scores.
- FIG. 9B shows CD4 exhaustion scores.
- FIGs. 10A-10D show results of TCR antigen specificity diversity, organized by cancer-type of tumor sample.
- FIG. 10A shows results for lung cancer sample.
- FIG. 10B shows results for head and neck cancer samples.
- FIG. 10C shows results for colorectal cancer samples.
- FIG. 10D shows results for ovarian and breast cancers.
- FIG. 11 summarizes an exemplary verification process of prioritizing the top 10 clones and performing functional validation to calculate a positive predictive value (PPV) for the top clones.
- FIGs. 12A-12C show PPV results calculated for top clones.
- FIG. 12A shows CD8 PPV results.
- FIG. 12B shows CD4 PPV results.
- FIG. 12C shows CD4 CD8 combined PPV results.
- FIGs. 13A-13C show antigen capture results for overall selectable and top clones.
- FIG. 13A shows results for CD8 clones.
- FIG. 13B shows results for CD4 clones.
- FIG. 13C shows results for combined CD4 and CD8 clones.
- TCRs tumor antigen-reactive T-cell receptors
- scGEX single cell transcriptome
- scTCR single-cell T-cell receptor
- TILs tumor infiltrated lymphocytes
- a tumor antigen-agnostic prediction strategy can be developed to identify tumor- specific TCRs, for both CD8+ and CD4+ T cells, using molecular signatures captured in the sequencing data.
- scGEX and scTCR data are generated with a known sequencing platform (e.g., the 10x sequencing platform) with a target coverage of 20,000 and 5,000 reads per cell for GEX and TCR, respectively.
- T cells are partitioned into CD8+ and CD4+ compartments bioinformatically using scGEX data as input.
- although this step can be done experimentally at an initial TIL sorting step, experimental sorting potentially leads to lower yields of T cells post-sort and a higher cost of goods. The lower yield is especially a concern for CD8+ T cells, given the imbalanced CD4:CD8 ratio often observed in solid tumors.
- bioinformatically sorting the CD8+ and CD4+ T cells can reduce material costs that would come from sorting. Exhaustion scores for each cell in the two compartments derived from the above step can be calculated.
- CD4+ and CD8+ T cell clones are ranked, separately, based on clone size and exhaustion scores.
- the Top N clones (for example, an N of 10 clones encompassing both CD4+ and CD8+ clones) can be selected.
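The ranking and Top N selection described above can be sketched as follows. This is a simplified illustration under stated assumptions: clonotypes are keyed by a CDR3 string, "similar clone sizes" is reduced to equal clone sizes, the tie-breaker is the mean exhaustion score, and the function name `rank_clonotypes` and the example sequences are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rank_clonotypes(cells, top_n=10):
    """Rank clonotypes of one compartment (CD4+ or CD8+).

    cells: iterable of (cdr3, exhaustion_score) pairs, one per exhausted cell.
    Returns up to top_n tuples of (cdr3, clone_size, mean_exhaustion_score),
    ordered by clone size (largest first), then by mean score (highest first).
    """
    scores_by_clone = defaultdict(list)
    for cdr3, score in cells:
        scores_by_clone[cdr3].append(score)

    ranked = sorted(
        scores_by_clone.items(),
        # Negate for descending order; CDR3 string keeps the sort deterministic.
        key=lambda item: (-len(item[1]), -mean(item[1]), item[0]),
    )
    return [(cdr3, len(s), mean(s)) for cdr3, s in ranked[:top_n]]


# Made-up example cells for a single compartment.
cd8_cells = [
    ("CASSA", 0.9), ("CASSA", 0.7),
    ("CASSB", 0.5), ("CASSB", 0.6),
    ("CASSC", 0.99),
]
top_cd8 = rank_clonotypes(cd8_cells, top_n=2)
```

Calling the function once per compartment keeps the CD4+ and CD8+ rankings separate; how the two ranked lists are merged into a single Top N is left open here.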
- gene signatures developed to identify tumor-specific TCRs, for both CD8+ and CD4+ T cells, using molecular signatures captured in the sequencing data.
- a gene signature to partition T cells into CD4+ and CD8+ can be used.
- a signature gene list can be developed for compartmentalizing with only scGEX data. This step can be pivotal due to the lower mRNA expression level of the CD4 gene and the high dropout rate in single cell sequencing.
- Two gene lists or exhaustion gene signatures for CD4+ and CD8+, respectively, can be used to estimate the exhaustion state as represented by an exhaustion score of the CD4+ or CD8+ cells.
- a short gene list (20 genes) for CD4+ can be used to calculate the exhaustion scores on CD4+ follicular helper cells (CD4.FH) relying on a sequencing depth-based normalization method on gene expression measured by scGEX using unique molecular identifiers (UMIs).
- CD4+ follicular helper cells CD4.FH
- UMIs unique molecular identifiers
- GSEA gene set enrichment analysis
- the exhaustion scores from the two lists serve to complement one another.
- a short gene list (20 genes) to compute a normalization-based score and a long gene list (61 genes) to compute a GSEA score can be used to estimate the exhaustion state of CD8+ cells.
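A normalization-based score from a short gene list could look like the sketch below. The text does not give the formula, so everything here is an assumption: each gene's UMI count is divided by the cell's total UMI count, scaled to 10,000, log-transformed, and averaged over the signature genes. The gene names are placeholders, not members of the disclosed signatures.

```python
import math


def depth_normalized_score(umi_counts, signature_genes, scale=10_000):
    """Mean log-normalized expression of the signature genes in one cell.

    umi_counts: dict mapping gene name -> raw UMI count for the cell.
    signature_genes: list of gene names in the (short) signature.
    scale: assumed scaling factor applied after dividing by total UMIs.
    """
    total = sum(umi_counts.values())
    if total == 0 or not signature_genes:
        return 0.0
    values = [
        # Genes absent from the cell (dropouts) contribute log1p(0) = 0.
        math.log1p(umi_counts.get(gene, 0) / total * scale)
        for gene in signature_genes
    ]
    return sum(values) / len(values)


# Placeholder gene names and counts, for illustration only.
cell_counts = {"GENE_A": 5, "GENE_B": 0, "OTHER": 95}
score = depth_normalized_score(cell_counts, ["GENE_A", "GENE_B"])
```

Dividing by the per-cell total makes cells sequenced at different depths comparable before the signature genes are averaged.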
- the performance of the end-to-end algorithm for selecting 10 candidate tumor-reactive TCR clonotypes has been validated experimentally and demonstrated with a combined 70% positive predictive value (PPV) as shown in Table 7.
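The positive predictive value can be computed as the fraction of selected clonotypes confirmed as tumor-reactive by functional validation. The sketch below uses made-up validation outcomes; the function name and the per-clone results are illustrative and are not taken from Table 7.

```python
def positive_predictive_value(validated):
    """PPV = confirmed tumor-reactive clonotypes / all selected clonotypes.

    validated: list of booleans, one per selected clonotype, True if the
    clonotype was confirmed as tumor-reactive in functional validation.
    """
    if not validated:
        raise ValueError("at least one selected clonotype is required")
    return sum(validated) / len(validated)


# Hypothetical outcomes for 10 selected clonotypes (7 confirmed).
outcomes = [True, True, False, True, True, False, True, True, False, True]
ppv = positive_predictive_value(outcomes)  # 7 / 10
```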
- the term can mean within an order of magnitude, preferably within 5-fold, and more preferably within 2-fold, of a value.
- the term “about” meaning within an acceptable error range for the particular value should be assumed.
- Neoantigen refers to a class of tumor antigens which arise from tumor-specific changes in proteins. Neoantigens encompass, but are not limited to, tumor antigens which arise from, for example, a substitution in a protein sequence, a frame shift mutation, a fusion polypeptide, an in-frame deletion, an insertion, and expression of an endogenous retroviral polypeptide.
- a “neoepitope” refers to an epitope that is not present in a reference, such as a non-diseased cell, e.g., a non-cancerous cell or a germline cell, but is found in a diseased cell, e.g., a cancer cell. This includes situations where a corresponding epitope is found in a normal non-diseased cell or a germline cell but, due to one or more mutations in a diseased cell, e.g., a cancer cell, the sequence of the epitope is changed so as to result in the neoepitope.
- a “mutation” refers to a change of or a difference in a nucleic acid sequence (e.g., a nucleotide substitution, addition or deletion) compared to a reference nucleic acid.
- a “somatic mutation” can occur in any of the cells of the body except the germ cells (sperm and egg) and is not passed on to children. Such alterations can (but do not always) cause cancer or other diseases.
- a mutation is a non-synonymous mutation.
- a “non-synonymous mutation” refers to a mutation (e.g., a nucleotide substitution) that results in an amino acid change, such as an amino acid substitution, in the translation product.
- a “frameshift” occurs when a mutation disrupts the normal phase of a gene’s codon periodicity (also known as “reading frame”), resulting in translation of a non-native protein sequence. It is possible for different mutations in a gene to achieve the same altered reading frame.
- An “antigen-presenting cell” refers to a cell that expresses an MHC molecule and can present an epitope in complex with the MHC molecule.
- the cell can present peptide fragments of protein antigens in association with MHC molecules on its cell surface.
- the term includes professional antigen-presenting cells (e.g., B lymphocytes, macrophages, dendritic cells) as well as any other cells that express an MHC and can present an epitope in complex with the MHC (e.g., keratinocytes, endothelial cells, astrocytes, fibroblasts, oligodendrocytes).
- the APC can be a tissue-specific APC (e.g., Langerhans cells, Kupffer cells, microglia).
- the APC can be a cell that is engineered to express an MHC molecule or a cell that expresses an endogenous MHC molecule.
- “derived,” when used to discuss an epitope, is a synonym for “prepared.”
- a derived epitope can be isolated from a natural source, or it can be synthesized according to standard protocols in the art.
- Synthetic epitopes can comprise artificial amino acid residues (“amino acid mimetics”), such as D isomers of naturally occurring L amino acid residues or non-natural amino acid residues such as cyclohexylalanine.
- a derived or prepared epitope can be an analog of a native epitope.
- the term “derived from” refers to the origin or source, and can include naturally occurring, recombinant, unpurified, purified or differentiated molecules or cells.
- an expanded or induced antigen specific T cell can be derived from a T cell.
- an expanded or induced antigen specific T cell can be derived from an antigen specific T cell in a biological sample.
- a matured APC e.g., a professional APC
- a non-matured APC e.g., an immature APC
- an APC can be derived from a monocyte (e.g., a CD14+ monocyte).
- an APC can be derived from a bone marrow cell.
- an “epitope” is the collective features of a molecule (e.g., a peptide’s charge and primary, secondary and tertiary structure) that together form a site recognized by another molecule (e.g., an immunoglobulin, T-cell receptor, HLA molecule, or chimeric antigen receptor).
- an epitope can be a set of amino acid residues involved in recognition by a particular immunoglobulin; a Major Histocompatibility Complex (MHC) receptor; or in the context of T cells, those residues recognized by a T-cell receptor protein and/or a chimeric antigen receptor.
- MHC Major Histocompatibility Complex
- Epitopes can be prepared by isolation from a natural source, or they can be synthesized according to standard protocols in the art. Synthetic epitopes can comprise artificial amino acid residues, amino acid mimetics, (such as D isomers of naturally-occurring L amino acid residues or non-naturally-occurring amino acid residues). Throughout this disclosure, epitopes can be referred to in some cases as peptides or peptide epitopes. In certain embodiments, there is a limitation on the length of a peptide of the present disclosure. The embodiment that is length-limited occurs when the protein or peptide comprising an epitope described herein comprises a region (i.e., a contiguous series of amino acid residues) having 100% identity with a native sequence.
- the region with 100% identity to a native sequence generally has a length of: less than or equal to 600 amino acid residues, less than or equal to 500 amino acid residues, less than or equal to 400 amino acid residues, less than or equal to 250 amino acid residues, less than or equal to 100 amino acid residues, less than or equal to 85 amino acid residues, less than or equal to 75 amino acid residues, less than or equal to 65 amino acid residues, and less than or equal to 50 amino acid residues.
- an “epitope” described herein is comprised by a peptide having a region with less than 51 amino acid residues that has 100% identity to a native peptide sequence, in any increment down to 5 amino acid residues; for example 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1 amino acid residues.
- a “T cell epitope” refers to a peptide sequence bound by an MHC molecule in the form of a peptide-MHC (pMHC) complex.
- a peptide-MHC complex can be recognized and bound by a TCR of a T cell (e.g., a cytotoxic T-lymphocyte or a T-helper cell).
- a “T cell” includes CD4+ T cells and CD8+ T cells.
- the term T cell also includes both T helper 1 type T cells and T helper 2 type T cells.
- T cells can be generated by the method described in the application, for a clinical application.
- T cells or adoptive T cells referred to here, such as for a clinical application are cells isolated from a biological source, manipulated and cultured ex vivo and prepared into a drug candidate for a specific therapy such as a cancer therapy.
- when candidate cells pass specific qualitative and quantitative criteria for fitness for a clinical application, the drug candidate can be designated a drug product. In some cases, a drug product is selected from a number of drug candidates.
- a drug product can be a vaccine, such as an mRNA-based vaccine, a T cell, more specifically, a population of T cells, or more specifically a population of T cells with heterogeneous characteristics and subtypes.
- a drug product, as disclosed herein can have a population of T cells comprising CD8+ T cells, CD4+ T cells, with cells at least above a certain exhibiting antigen specificity, a certain percentage of each exhibiting a memory phenotype, among others.
- TILs tumor infiltrating lymphocytes
- cytotoxic T cells lymphocytes
- Th1 and Th17 CD4+ T cells natural killer cells
- Immune cells refers to a cell that plays a role in the immune response.
- Immune cells are of hematopoietic origin, and include lymphocytes, such as B cells, T cells and natural killer cells; myeloid cells, such as monocytes, macrophages (e.g., M1 macrophages), dendritic cells, eosinophils, mast cells, basophils, and granulocytes. Immune cells can migrate into tumors.
- TCR T-cell receptor
- MHC major histocompatibility complex
- This multi-subunit immune recognition receptor associates with the CD3 complex and binds peptides presented by the MHC class I and II proteins on the surface of antigen-presenting cells (APCs). Binding of a TCR to a peptide on an APC is a central event in T cell activation.
- a “TCR clonotype” refers to a distinct T cell receptor (TCR) comprising a pair of TCR alpha chain and a TCR beta chain (or a pair of TCR gamma chain and a TCR delta chain) that is unique to a specific T cell clone.
- TCR T cell receptor
- the TCR is a complex of integral membrane proteins that participates in the activation of T cells in response to an antigen.
- Each T cell has a unique TCR, and when that cell replicates, all of its descendants will have the exact same TCR — this group of cells is referred to as a T cell clone.
- TCR clonotype can be used to identify and track these T cell clones as they respond to specific antigens and participate in immune responses.
- the diversity of TCR clonotypes in an individual's immune system can provide valuable insights into the breadth and specificity of their immune response.
- the size of a T cell clone can be determined through a process called T cell receptor (TCR) sequencing. In this process, DNA from T cells can be extracted and sequenced to identify the unique genetic arrangement that encodes the TCR of each cell. This unique sequence can be specific to each T cell clone.
- TCR T cell receptor
- By counting the frequency of each unique TCR sequence, the relative size of each T cell clone in the sample can be determined, because every T cell in a specific clone has the same unique TCR sequence. The more frequently a particular TCR sequence appears in the sequencing data, the larger that T cell clone. This type of analysis can provide valuable insights into the diversity and specificity of the immune response.
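Counting clone sizes from per-cell TCR calls can be sketched as follows. The sketch assumes each cell has already been assigned a clonotype key (here a CDR3 string, in line with the embodiment that defines the same clonotype by the same CDR3 sequence); the barcodes, sequences, and the `clone_sizes` helper are hypothetical.

```python
from collections import Counter


def clone_sizes(cell_to_clonotype):
    """Return {clonotype: (cell_count, fraction_of_cells)}, largest first.

    cell_to_clonotype: dict mapping a cell barcode to its clonotype key.
    """
    counts = Counter(cell_to_clonotype.values())
    total = sum(counts.values())
    return {
        clonotype: (n, n / total)
        for clonotype, n in counts.most_common()
    }


# Made-up barcodes and CDR3 strings.
calls = {
    "cell1": "CASSX",
    "cell2": "CASSX",
    "cell3": "CASSY",
    "cell4": "CASSX",
}
sizes = clone_sizes(calls)
```

In this example the clonotype `CASSX` is found in three of four cells, so it is the largest clone in the sample.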
- a “chimeric antigen receptor” or “CAR” refers to an antigen binding protein that includes an immunoglobulin antigen binding domain (e.g., an immunoglobulin variable domain) and a T cell receptor (TCR) constant domain.
- a “constant domain” of a TCR polypeptide includes a membrane-proximal TCR constant domain, a TCR transmembrane domain and/or a TCR cytoplasmic domain, or fragments thereof.
- a CAR is a monomer that includes a polypeptide comprising an immunoglobulin heavy chain variable domain linked to a TCRβ constant domain.
- the CAR is a dimer that includes a first polypeptide comprising an immunoglobulin heavy or light chain variable domain linked to a TCRα or TCRβ constant domain and a second polypeptide comprising an immunoglobulin heavy or light chain variable domain (e.g., a κ or λ variable domain) linked to a TCRβ or TCRα constant domain.
- a first polypeptide comprising an immunoglobulin heavy or light chain variable domain linked to a TCRα or TCRβ constant domain
- a second polypeptide comprising an immunoglobulin heavy or light chain variable domain (e.g., a κ or λ variable domain) linked to a TCRβ or TCRα constant domain.
- MHC Major Histocompatibility Complex
- HLA human leukocyte antigen
- HLA Human Leukocyte Antigen
- the major histocompatibility complex in the genome comprises the genetic region whose gene products expressed on the cell surface are important for binding and presenting endogenous and/or foreign antigens and thus for regulating immunological processes.
- MHC proteins or molecules are important for signaling between lymphocytes and antigen-presenting cells or diseased cells in immune reactions. MHC proteins or molecules bind peptides and present them for recognition by T-cell receptors.
- the proteins encoded by the MHC can be expressed on the surface of cells, and display both self-antigens (peptide fragments from the cell itself) and non-self-antigens (e.g., fragments of invading microorganisms) to a T-cell.
- MHC binding peptides can result from the proteolytic cleavage of protein antigens and represent potential lymphocyte epitopes, (e.g., T cell epitope and B cell epitope).
- MHCs can transport the peptides to the cell surface and present them there to specific cells, such as cytotoxic T-lymphocytes, T-helper cells, or B cells.
- the MHC region can be divided into three subgroups, class I, class II, and class III.
- MHC class I proteins can contain an α-chain and β2-microglobulin (not part of the MHC; encoded by chromosome 15). They can present antigen fragments to cytotoxic T-cells.
- MHC class II proteins can contain α- and β-chains and they can present antigen fragments to T-helper cells.
- MHC class III region can encode for other immune components, such as complement components and cytokines.
- the MHC can be both polygenic (there are several MHC class I and MHC class II genes) and polymorphic (there are multiple alleles of each gene).
- a “receptor” refers to a biological molecule or a molecule grouping capable of binding a ligand.
- a receptor can serve to transmit information in a cell, a cell formation or an organism.
- a receptor comprises at least one receptor unit, for example, where each receptor unit can consist of a protein molecule.
- a receptor has a structure which complements that of a ligand and can complex the ligand as a binding partner. The information is transmitted in particular by conformational changes of the receptor following complexation of the ligand on the surface of a cell.
- a receptor is to be understood as meaning in particular proteins of MHC classes I and II capable of forming a receptor/ligand complex with a ligand, in particular a peptide or peptide fragment of suitable length.
- a “ligand” refers to a molecule which has a structure complementary to that of a receptor and is capable of forming a complex with this receptor.
- a ligand is to be understood as meaning a peptide or peptide fragment which has a suitable length and suitable binding motifs in its amino acid sequence, so that the peptide or peptide fragment is capable of forming a complex with MHC proteins such as MHC class I or MHC class II proteins.
- a “receptor/ligand complex” is also to be understood as meaning a “receptor/peptide complex” or “receptor/peptide fragment complex”, including a peptide- or peptide fragment-presenting MHC molecule such as MHC class I or MHC class II molecules.
- a “native” or a “wild type” sequence refers to a sequence found in nature.
- the term “naturally occurring” as used herein refers to the fact that an object can be found in nature. For example, a peptide or nucleic acid that is present in an organism (including viruses) and can be isolated from a source in nature and which has not been intentionally modified by man in the laboratory is naturally occurring.
- naturally processed refers to the fact that the antigen is not pulsed or overexpressed in a cell by man in the laboratory but is presented by the cell as a product of endogenous pathways of antigen processing and presentation (e.g., via the transporter associated with antigen processing (TAP) pathway to present intracellular antigen on MHC I).
- TAP transporter associated with antigen processing
- motif refers to a pattern of residues in an amino acid sequence of defined length, for example, a peptide of less than about 15 amino acid residues in length, or less than about 13 amino acid residues in length, for example, from about 8 to about 13 amino acid residues (e.g., 8, 9, 10, 11, 12, or 13) for a class I HLA motif and from about 6 to about 25 amino acid residues (e.g., 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25) for a class II HLA motif, which is recognized by a particular HLA molecule. Motifs are typically different for each HLA protein encoded by a given human HLA allele.
- an MHC class I motif identifies a peptide of 7, 8, 9, 10, 11, 12 or 13 amino acid residues in length.
- an MHC class II motif identifies a peptide of 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or 26 amino acid residues in length.
- a “cross-reactive binding” peptide refers to a peptide that binds to more than one member of a class of binding pair members (e.g., a peptide bound by more than one HLA molecule, or a peptide bound by both a class I HLA molecule and a class II HLA molecule).
- residue refers to an amino acid residue or amino acid mimetic residue incorporated into a peptide or protein by an amide bond or amide bond mimetic, or that is encoded by a nucleic acid (DNA or RNA).
- the nomenclature used to describe peptides or proteins follows the conventional practice. The amino group is presented to the left (the amino- or N-terminus) and the carboxyl group to the right (the carboxy- or C-terminus) of each amino acid residue.
- amino acid residue positions are referred to in a peptide epitope, they are numbered in an amino to carboxyl direction with the first position being the residue located at the amino terminal end of the epitope, or the peptide or protein of which it can be a part.
- the amino- and carboxyl-terminal groups although not specifically shown, are in the form they would assume at physiologic pH values, unless otherwise specified.
- each residue is generally represented by standard three letter or single letter designations.
- the L-form of an amino acid residue is represented by a capital single letter or a capital first letter of a three-letter symbol
- the D-form for those amino acid residues having D-forms is represented by a lower case single letter or a lower case three letter symbol.
- Glycine has no asymmetric carbon atom and is simply referred to as “Gly” or “G”.
- the amino acid sequences of peptides set forth herein are generally designated using the standard single letter symbol.
- peptide and peptide epitope are used interchangeably with “oligopeptide” in the present specification to designate a series of residues connected one to the other, typically by peptide bonds between the α-amino and carboxyl groups of adjacent amino acid residues.
- a “synthetic peptide” refers to a peptide that is obtained from a non-natural source, e.g., is man-made. Such peptides can be produced using such methods as chemical synthesis or recombinant DNA technology. “Synthetic peptides” include “fusion proteins.”
- a “conservative amino acid substitution” is one in which one amino acid residue is replaced with another amino acid residue having a similar side chain.
- Families of amino acid residues having similar side chains have been defined in the art, including basic side chains (e.g., lysine, arginine, histidine), acidic side chains (e.g., aspartic acid, glutamic acid), uncharged polar side chains (e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), nonpolar side chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan), beta-branched side chains (e.g., threonine, valine, isoleucine) and aromatic side chains (e.g., tyrosine, phenylalanine, tryptophan, histidine).
- “Pharmaceutically acceptable” refers to a generally non-toxic, inert, and/or physiologically compatible composition or component of a composition.
- a “pharmaceutical excipient” or “excipient” comprises a material such as an adjuvant, a carrier, pH-adjusting and buffering agents, tonicity adjusting agents, wetting agents, preservatives, and the like.
- a “pharmaceutical excipient” is an excipient which is pharmaceutically acceptable.
- polynucleotide and “nucleic acid” are used interchangeably herein and refer to polymers of nucleotides of any length, and include DNA and RNA, for example, mRNA.
- the nucleotides can be deoxyribonucleotides, ribonucleotides, modified nucleotides or bases, and/or their analogs, or any substrate that can be incorporated into a polymer by DNA or RNA polymerase.
- the polynucleotide and nucleic acid can be in vitro transcribed mRNA.
- the polynucleotide that is administered using the methods of the invention is mRNA.
- isolated or “biologically pure” refer to material which is substantially or essentially free from components which normally accompany the material as it is found in its native state.
- isolated peptides described herein do not contain some or all of the materials normally associated with the peptides in their in situ environment.
- an “isolated” epitope can be an epitope that does not include the whole sequence of the protein from which the epitope was derived.
- a naturally-occurring polynucleotide or peptide present in a living animal is not isolated, but the same polynucleotide or peptide, separated from some or all of the coexisting materials in the natural system, is isolated.
- Such a polynucleotide can be part of a vector, and/or such a polynucleotide or peptide can be part of a composition, and still be “isolated” in that such vector or composition is not part of its natural environment.
- Isolated RNA molecules include in vivo or in vitro RNA transcripts of the DNA molecules described herein, and further include such molecules produced synthetically.
- a polypeptide, antibody, polynucleotide, vector, cell, or composition which is isolated is substantially pure.
- substantially pure refers to material which is at least 50% pure (i.e., free from contaminants), at least 90% pure, at least 95% pure, at least 98% pure, or at least 99% pure.
- nucleic acids or polypeptides refer to two or more sequences or subsequences that are the same or have a specified percentage of nucleotides or amino acid residues that are the same, when compared and aligned (introducing gaps, if necessary) for maximum correspondence, not considering any conservative amino acid substitutions as part of the sequence identity.
- the percent identity can be measured using sequence comparison software or algorithms or by visual inspection.
- Various algorithms and software that can be used to obtain alignments of amino acid or nucleotide sequences are well-known in the art. These include, for example, BLAST, ALIGN, Megalign, BestFit, GCG Wisconsin Package, and variations thereof.
- two nucleic acids or polypeptides described herein are substantially identical, meaning they have at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, and in some embodiments at least 95%, 96%, 97%, 98%, 99% nucleotide or amino acid residue identity, when compared and aligned for maximum correspondence, as measured using a sequence comparison algorithm or by visual inspection.
- identity exists over a region of the sequences that is at least about 10, at least about 20, at least about 40-60, or at least about 60-80 residues in length, or any integral value therebetween.
- identity exists over a longer region than 60-80 residues, such as at least about 80-100 residues, and in some embodiments the sequences are substantially identical over the full length of the sequences being compared, such as an amino acid sequence of a peptide or a coding region of a nucleotide sequence.
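The percent identity and substantial identity definitions above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned by an external tool (gaps written as `-`) and that identity is counted over columns where neither sequence has a gap; other denominator conventions exist, and the function names and default cutoffs are illustrative choices, not part of the disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over aligned columns without a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gap columns are skipped under this convention
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def is_substantially_identical(aligned_a: str, aligned_b: str,
                               min_identity: float = 70.0,
                               min_region: int = 10) -> bool:
    """True if identity meets min_identity over at least min_region residues.

    The defaults mirror the lowest values listed in the text (70%, about
    10 residues); any other listed value can be passed instead.
    """
    compared = sum(1 for a, b in zip(aligned_a, aligned_b)
                   if a != "-" and b != "-")
    return (compared >= min_region
            and percent_identity(aligned_a, aligned_b) >= min_identity)
```

For example, `percent_identity("ACGT-ACGTA", "ACGTTACGAA")` compares nine gap-free columns, eight of which match.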
- subject refers to any animal (e.g., a mammal), including, for example, humans, non-human primates, canines, felines, rodents, and the like, which is to be the recipient of a particular treatment.
- the terms “subject” and “patient” are used interchangeably herein in reference to a human subject.
- the terms “effective amount” or “therapeutically effective amount” or “therapeutic effect” refer to an amount of a therapeutic effective to “treat” a disease or disorder in a subject or mammal.
- the therapeutically effective amount of a drug has a therapeutic effect and as such can prevent the development of a disease or disorder; slow down the development of a disease or disorder; slow down the progression of a disease or disorder; relieve to some extent one or more of the symptoms associated with a disease or disorder; reduce morbidity and mortality; improve quality of life; or a combination of such effects.
- treating or “treatment” or “to treat” or “alleviating” or “to alleviate” refer to therapeutic measures that cure, slow down, lessen symptoms of, and/or halt progression of a diagnosed pathologic condition or disorder. Thus, those in need of treatment include those already with the disorder.
- treating may refer to reducing, or ameliorating a disorder and/or symptoms associated therewith (e.g., a neoplasia or tumor or infectious agent or an autoimmune disease).
- Treating can refer to administration of the therapy to a subject after the onset, or suspected onset, of a disease (e.g., cancer or infection by an infectious agent or an autoimmune disease).
- Treating includes the concepts of “alleviating”, which refers to lessening the frequency of occurrence or recurrence, or the severity, of any symptoms or other ill effects related to the disease and/or the side effects associated with therapy.
- the term “treating” may also encompass the concept of “managing” which refers to reducing the severity of a disease or disorder in a patient, e.g., extending the life or prolonging the survivability of a patient with the disease, or delaying its recurrence, e.g., lengthening the period of remission in a patient who had suffered from the disease. It is appreciated that, although not precluded, treating a disorder or condition does not require that the disorder, condition, or symptoms associated therewith be completely eliminated.
- prevent refers to prophylactic or preventative measures that slow down the development of a targeted pathologic condition or disorder.
- those in need of prevention include those prone to have the disorder or those in whom the disorder is to be prevented.
- PBMC peripheral blood mononuclear cell
- stimulation refers to a response induced by binding of a stimulatory molecule with its cognate ligand thereby mediating a signal transduction event.
- stimulation of a T cell can refer to binding of a TCR of a T cell to a peptide-MHC complex.
- stimulation of a T cell can refer to a step in which PBMCs are cultured together with peptide loaded APCs.
- enriched refers to a composition or fraction wherein an object species has been partially purified such that the concentration of the object species is substantially higher than the naturally occurring level of the species in a finished product without enrichment.
- induced cell refers to a cell that has been treated with an inducing compound, cell, or population of cells that affects the cell’s protein expression, gene expression, differentiation status, shape, morphology, viability, and the like.
- a “reference” can be used to correlate and/or compare the results obtained in the methods of the present disclosure from a diseased specimen.
- a “reference” may be obtained on the basis of one or more normal specimens, in particular specimens which are not affected by a disease, either obtained from an individual or one or more different individuals (e.g., healthy individuals), such as individuals of the same species.
- a “reference” can be determined empirically by testing a sufficiently large number of normal specimens.
- a tumor, unless otherwise mentioned, is a cancerous tumor, and the terms cancer and tumor are used interchangeably throughout the document. While a tumor is a cancer of solid tissue, several of the compositions and methods described herein are in principle applicable to cancers of the blood, such as leukemia.
- mRNA transcripts include, but are not limited to, pre-mRNA transcript(s), transcript processing intermediates, mature mRNA(s) ready for translation and transcripts of the gene or genes, or nucleic acids derived from the mRNA transcript(s). Transcript processing can include splicing, editing and degradation.
- a nucleic acid derived from an mRNA transcript refers to a nucleic acid for whose synthesis the mRNA transcript or a subsequence thereof has ultimately served as a template.
- a cDNA reverse transcribed from an mRNA, an RNA transcribed from that cDNA, a DNA amplified from the cDNA, an RNA transcribed from the amplified DNA, etc. are all derived from the mRNA transcript and detection of such derived products is indicative of the presence and/or abundance of the original transcript in a sample.
- mRNA derived samples include, but are not limited to, mRNA transcripts of the gene or genes, cDNA reverse transcribed from the mRNA, cRNA transcribed from the cDNA, DNA amplified from the genes, RNA transcribed from amplified DNA, and the like.
- barcode generally refers to a label, or identifier, that can be part of an analyte to convey information about the analyte.
- a barcode can be a tag attached to an analyte (e.g., nucleic acid molecule) or a combination of the tag in addition to an endogenous characteristic of the analyte (e.g., size of the analyte or end sequence(s)).
- the barcode may be unique. Barcodes can have a variety of different formats, for example, barcodes can include: polynucleotide barcodes; random nucleic acid and/or amino acid sequences; and synthetic nucleic acid and/or amino acid sequences.
- a barcode can be attached to an analyte in a reversible or irreversible manner.
- a barcode can be added to, for example, a fragment of a deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) sample before, during, and/or after sequencing of the sample. Barcodes can allow for identification and/or quantification of individual sequencing reads in real time.
- sequencing generally refers to methods and technologies for determining the sequence of nucleotide bases in one or more polynucleotides.
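As a rough illustration of how barcodes allow reads to be identified and counted, the sketch below groups reads by barcode. It assumes, purely for illustration, that each read carries a fixed-length nucleotide barcode as its leading bases; real library designs differ, and the function names are hypothetical.

```python
from collections import defaultdict


def group_reads_by_barcode(reads, barcode_length):
    """Split each read into (barcode, insert) and group inserts by barcode.

    Assumes the barcode is the first `barcode_length` bases of the read.
    Reads shorter than the barcode are skipped.
    """
    groups = defaultdict(list)
    for read in reads:
        if len(read) < barcode_length:
            continue
        barcode = read[:barcode_length]
        insert = read[barcode_length:]
        groups[barcode].append(insert)
    return dict(groups)


def count_reads_per_barcode(groups):
    """Number of reads observed for each barcode."""
    return {barcode: len(inserts) for barcode, inserts in groups.items()}
```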
- the polynucleotides can be, for example, deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), including variants or derivatives thereof (e.g., single stranded DNA).
- Sequencing devices may provide a plurality of sequence reads corresponding to the genetic information of a subject (e.g., human), as generated by the device from a sample comprising polynucleotides.
- next generation sequencing refers to sequencing technologies having increased throughput as compared to the traditional Sanger- and capillary electrophoresis-based approaches, for example with the ability to generate hundreds of thousands or millions of relatively short sequence reads at a time.
- next generation sequencing techniques include, but are not limited to, sequencing by synthesis, sequencing by ligation, and sequencing by hybridization.
- next generation sequencing methods include, but are not limited to, pyrosequencing as used by the GS Junior and GS FLX Systems (454 Life Sciences, Branford, Conn.); sequencing by synthesis as used by the MiSeq and Solexa systems (Illumina, Inc., San Diego, Calif.); the SOLiD™ (Sequencing by Oligonucleotide Ligation and Detection) system and Ion Torrent sequencing systems such as the Personal Genome Machine or the Proton Sequencer (Thermo Fisher Scientific, Waltham, Mass.); and nanopore sequencing systems (Oxford Nanopore Technologies, Oxford, United Kingdom).
- running-sum statistic refers to a statistical measure obtained by consecutively adding (or subtracting) the values of a data set or time series. This method can be used for moving total computations. In this form of cumulative calculation, the total sum of data values is updated whenever a new data point is added to the series, or an existing data point is subtracted. Running-sum statistics can be useful for analyzing trends over time, checking data integrity, or identifying significant shifts in data points in fields such as finance, data analysis, economics, and engineering.
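A running-sum statistic can be sketched in a few lines. The first function is the plain cumulative sum described above. The second shows one simplified, unweighted way a running sum is used in enrichment-style analyses: walking down a ranked gene list, stepping up at genes in the set and down otherwise. This walk is an assumption made for illustration; the text does not state that the GSEA score of the disclosure is computed this way, and all names are hypothetical.

```python
from itertools import accumulate


def running_sum(values):
    """Cumulative total after each value is added."""
    return list(accumulate(values))


def enrichment_walk(ranked_genes, gene_set):
    """Unweighted running sum over a ranked gene list.

    Each gene in `gene_set` adds 1/n_hits; every other gene subtracts
    1/n_misses, so the walk starts and ends at zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_misses = len(hits) - n_hits
    if n_hits == 0 or n_misses == 0:
        raise ValueError("need at least one hit and one miss")
    steps = [1.0 / n_hits if hit else -1.0 / n_misses for hit in hits]
    return running_sum(steps)


def max_deviation(sums):
    """Signed value of the running sum that lies farthest from zero."""
    return max(sums, key=abs) if sums else 0.0
```

With a ranked list `["a", "b", "c", "d"]` and the set `{"a", "b"}`, the walk rises to its peak after the second gene and returns to zero at the end.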
- pharmaceutically acceptable salt refers to salts derived from a variety of organic and inorganic counter ions known in the art.
- Pharmaceutically acceptable acid addition salts can be formed with inorganic acids and organic acids.
- Preferred inorganic acids from which salts can be derived include, for example, hydrochloric acid, hydrobromic acid, sulfuric acid, nitric acid and phosphoric acid.
- Preferred organic acids from which salts can be derived include, for example, acetic acid, propionic acid, glycolic acid, pyruvic acid, oxalic acid, maleic acid, malonic acid, succinic acid, fumaric acid, tartaric acid, citric acid, benzoic acid, cinnamic acid, mandelic acid, methanesulfonic acid, ethanesulfonic acid, p-toluenesulfonic acid and salicylic acid.
- Pharmaceutically acceptable base addition salts can be formed with inorganic and organic bases.
- Inorganic bases from which salts can be derived include, for example, sodium, potassium, lithium, ammonium, calcium, magnesium, iron, zinc, copper, manganese and aluminum.
- Organic bases from which salts can be derived include, for example, primary, secondary, and tertiary amines, substituted amines including naturally occurring substituted amines, cyclic amines and basic ion exchange resins. Specific examples include isopropylamine, trimethylamine, diethylamine, triethylamine, tripropylamine, and ethanolamine.
- the pharmaceutically acceptable base addition salt is chosen from ammonium, potassium, sodium, calcium, and magnesium salts.
- cocrystal refers to a molecular complex derived from a number of cocrystal formers known in the art.
- a cocrystal typically does not involve hydrogen transfer between the cocrystal and the drug, and instead involves intermolecular interactions, such as hydrogen bonding, aromatic ring stacking, or dispersive forces, between the cocrystal former and the drug in the crystal structure.
- pharmaceutically acceptable carrier or “pharmaceutically acceptable excipient” are intended to include any and all solvents, dispersion media, coatings, antibacterial and antifungal agents, isotonic and absorption delaying agents, and inert ingredients.
- the use of pharmaceutically acceptable carriers or pharmaceutically acceptable excipients for active pharmaceutical ingredients is well known in the art. Except insofar as any conventional pharmaceutically acceptable carrier or pharmaceutically acceptable excipient is incompatible with the active pharmaceutical ingredient, its use in the therapeutic compositions of the invention is contemplated. Additional active pharmaceutical ingredients, such as other drugs, can also be incorporated into the described compositions, processes and methods.
- a method of identifying exhausted T cells from a population of T cells obtained from a tumor microenvironment of a subject having a cancer comprising: (a) providing single cell transcriptome data of the population of T cells.
- the method comprising (b) classifying each T cell of the population of T cells as a CD4+ cell or a CD8+ cell based on an expression level of each classification gene of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58,
- the method comprising (c) calculating (i) a CD4+ exhaustion score and/or a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers.
- GSEA CD4+ gene set enrichment analysis
- the method comprising (c) calculating (i) a CD4+ exhaustion score and a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers.
- the method comprising (c) calculating (i) a CD4+ exhaustion score or a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers.
- the method comprising calculating a CD4+ exhaustion score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least
- the method comprising calculating a CD4+ GSEA score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47
- the method comprising (c) calculating (ii) a CD8+ exhaustion score and/or a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers.
- the method comprising (c) calculating (ii) a CD8+ exhaustion score and a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers.
- the method comprising (c) calculating (ii) a CD8+ exhaustion score or a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers.
- the method comprising calculating a CD8+ exhaustion score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 CD8+ exhaustion gene markers (e.g., see Table 3).
- the method comprising calculating a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least
- the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers is different from the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers.
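One way to picture a per-cell exhaustion score computed from a lineage-specific marker set is sketched below. The scoring rule (mean expression of the marker genes in the cell) is an assumption chosen for simplicity; the text here does not give the formula. The marker names are placeholders and are not the markers of Table 3 or Table 4.

```python
from statistics import mean

# Placeholder marker sets; the CD4+ and CD8+ sets are deliberately different.
MARKER_SETS = {
    "CD4+": ("MARKER_A", "MARKER_B", "MARKER_C"),
    "CD8+": ("MARKER_B", "MARKER_D"),
}


def exhaustion_score(cell_expression, marker_genes):
    """Mean expression of the marker genes in one cell.

    `cell_expression` maps gene name to expression level; genes that are
    absent from the mapping are treated as not expressed (0.0).
    """
    if not marker_genes:
        raise ValueError("marker set must not be empty")
    return mean(cell_expression.get(gene, 0.0) for gene in marker_genes)


def score_cell(cell_expression, lineage):
    """Score a cell with the marker set that matches its lineage."""
    return exhaustion_score(cell_expression, MARKER_SETS[lineage])
```

Scoring the same cell as `"CD4+"` and as `"CD8+"` generally gives different values, because each lineage uses its own marker set.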
- each T cell within the CD4+ cluster with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell.
- each T cell within the CD8+ cluster with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell.
- each T cell within the CD4+ cluster with an exhaustion score and a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell.
- each T cell within the CD8+ cluster with an exhaustion score and a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell.
- each T cell within the CD4+ cluster with an exhaustion score or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell.
- each T cell within the CD8+ cluster with an exhaustion score or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell.
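The "and", "or", and "and/or" variants above differ only in how the two per-cell scores are combined against the threshold. The sketch below makes that explicit. It assumes each score may have its own threshold, which the text does not specify; passing the same number for both recovers a single shared threshold. The function name and `mode` parameter are hypothetical.

```python
def is_exhausted(exhaustion_score, gsea_score,
                 exhaustion_threshold, gsea_threshold, mode="or"):
    """Decide whether a cell is called exhausted.

    mode="and": both scores must be equal to or higher than their threshold.
    mode="or":  at least one score must be equal to or higher than its
                threshold.
    The "and/or" embodiments cover either of these two behaviours.
    """
    passes_exhaustion = exhaustion_score >= exhaustion_threshold
    passes_gsea = gsea_score >= gsea_threshold
    if mode == "and":
        return passes_exhaustion and passes_gsea
    if mode == "or":
        return passes_exhaustion or passes_gsea
    raise ValueError("mode must be 'and' or 'or'")
```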
- the threshold value described herein can be an arbitrary value.
- the threshold value can vary based on the number of gene markers used in the set.
- the threshold value can be a cutoff that is established from data distribution.
- the arbitrary cutoff can be fixed and can be determined by analysis of samples processed by the methods described herein.
- the cutoff can be determined by analyzing score distribution at a clonotype level and selecting a fixed cutoff based upon the overall distribution of a given population of samples.
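Deriving a fixed cutoff from a clonotype-level score distribution might look like the sketch below. Two choices are assumptions made for illustration and are not stated in the text: per-cell scores are summarized per clonotype by their mean, and the cutoff is taken at a chosen percentile of the pooled clonotype scores.

```python
from collections import defaultdict
from statistics import mean, quantiles


def clonotype_scores(cells):
    """Mean score per clonotype.

    `cells` is an iterable of (clonotype_id, score) pairs, one per cell.
    """
    by_clonotype = defaultdict(list)
    for clonotype_id, score in cells:
        by_clonotype[clonotype_id].append(score)
    return {cid: mean(scores) for cid, scores in by_clonotype.items()}


def cutoff_from_distribution(scores, percentile=75):
    """Score at the given percentile (1-99) of the distribution.

    Uses statistics.quantiles with 100 bins, which returns 99 cut points;
    at least two scores are needed.
    """
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    cut_points = quantiles(scores, n=100, method="inclusive")
    return cut_points[percentile - 1]
```

Once chosen from a population of samples, the returned value can be reused unchanged as the fixed threshold for new samples.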
- a method of identifying exhausted T cells from a population of T cells obtained from a tumor microenvironment of a subject having a cancer comprising: calculating a CD4+ exhaustion score and/or a CD4+ gene set enrichment analysis (GSEA) score for a T cell classified as a CD4+ T cell.
- the method comprising: calculating a CD4+ exhaustion score and a CD4+ gene set enrichment analysis (GSEA) score for a T cell classified as a CD4+ T cell.
- the method comprising: calculating a CD4+ exhaustion score or a CD4+ gene set enrichment analysis (GSEA) score for a T cell classified as a CD4+ T cell.
- the calculating is based on an expression level of each CD4+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers.
- the method comprising calculating a CD4+ exhaustion score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 CD4+ exhaustion gene markers (e.g., see Table 4).
- the method comprising calculating a CD4+ GSEA score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58, at least
- each CD4+ exhaustion gene marker is from single cell transcriptome data of the population of T cells from the tumor microenvironment of the subject.
- each T cell classified as a CD4+ T cell with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell.
- each T cell classified as a CD4+ T cell with an exhaustion score and a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell.
- each T cell classified as a CD4+ T cell with an exhaustion score or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell.
- the method further comprises, prior to calculating, classifying a T cell from the population of T cells as a CD4+ cell based on an expression level of each classification gene of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 classification genes from the single cell transcriptome data, thereby generating a CD4+ cluster.
- a method of identifying exhausted T cells from a population of T cells obtained from a tumor microenvironment of a subject having a cancer comprising: calculating a CD8+ exhaustion score and/or a CD8+ gene set enrichment analysis (GSEA) score for a T cell classified as a CD8+ T cell.
- the method comprising: calculating a CD8+ exhaustion score and a CD8+ gene set enrichment analysis (GSEA) score for a T cell classified as a CD8+ T cell.
- the method comprising: calculating a CD8+ exhaustion score or a CD8+ gene set enrichment analysis (GSEA) score for a T cell classified as a CD8+ T cell.
- the calculating is based on an expression level of each CD8+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers.
- the method comprising calculating a CD8+ exhaustion score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 CD8+ exhaustion gene markers (e.g., see Table 3).
- the method comprising calculating a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58, at least
- each CD8+ exhaustion gene marker is from single cell transcriptome data of the population of T cells from the tumor microenvironment of the subject.
- each T cell classified as a CD8+ T cell with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell.
- each T cell classified as a CD8+ T cell with an exhaustion score and a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell.
- each T cell classified as a CD8+ T cell with an exhaustion score or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell.
- the method further comprises, prior to calculating, classifying a T cell from the population of T cells as a CD8+ cell based on an expression level of each classification gene of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 classification genes from the single cell transcriptome data, thereby generating a CD8+ cluster.
- the method further comprises classifying each T cell from the population of T cells as a CD4+ cell or a CD8+ cell based on an expression level of each classification gene of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58
- the method further comprises calculating (i) a CD4+ exhaustion score and/or a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers.
- the method further comprises calculating (i) a CD4+ exhaustion score and a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers.
- the method further comprises calculating (i) a CD4+ exhaustion score or a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers.
- the method comprising calculating a CD4+ exhaustion score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 CD4+ exhaustion gene markers (e.g., see Table 4).
- the method comprising calculating a CD4+ GSEA score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58, at least
- the method further comprises calculating (ii) a CD8+ exhaustion score and/or a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers.
- the method further comprises calculating (ii) a CD8+ exhaustion score and a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers.
- the method further comprises calculating (ii) a CD8+ exhaustion score or a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers.
- the method comprising calculating a CD8+ exhaustion score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 CD8+ exhaustion gene markers (e.g., see Table 3).
- the method comprising calculating a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58, at least
- the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers is different from the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers.
- a method of classifying CD8+ T cells and CD4+ T cells in a population of T cells comprising: (a) providing single cell transcriptome data of a population of T cells obtained from a tumor microenvironment of a subject having a cancer.
- the method further comprises (b) classifying each T cell of the population of T cells as a CD4+ cell or a CD8+ cell based on an expression level of each classification gene of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58,
- a T cell of the CD4+ cluster is classified as a CD4+ T cell.
- a T cell of the CD8+ cluster is classified as a CD8+ T cell.
- the method further comprises calculating a CD4+ exhaustion score and/or a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers.
- the method further comprises calculating a CD4+ exhaustion score and a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers.
- the method further comprises calculating a CD4+ exhaustion score or a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers.
- the method comprises calculating a CD4+ exhaustion score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 CD4+ exhaustion gene markers (e.g., see Table 4).
- the method comprises calculating a CD4+ GSEA score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58, at least
- each CD4+ exhaustion gene marker is from single cell transcriptome data of a population of T cells from the tumor microenvironment of the subject.
- each T cell within the CD4+ cluster with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell.
- each T cell within the CD4+ cluster with an exhaustion score and a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell.
- each T cell within the CD4+ cluster with an exhaustion score or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell.
- the method further comprises calculating a CD8+ exhaustion score and/or a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers.
- the method further comprises calculating a CD8+ exhaustion score or a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers.
- the method further comprises calculating a CD8+ exhaustion score and a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers.
- the method comprises calculating a CD8+ exhaustion score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 CD8+ exhaustion gene markers (e.g., see Table 3).
- the method comprises calculating a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least
- the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers is different from the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers.
- each CD8+ exhaustion gene marker is from single cell transcriptome data of a population of T cells from a tumor microenvironment of a subject having a cancer.
- each T cell within the CD8+ cluster with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell.
- each T cell within the CD8+ cluster with an exhaustion score and a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell.
- each T cell within the CD8+ cluster with an exhaustion score or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell.
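The "and", "or", and "and/or" threshold rules above can be sketched as a small predicate. This is only an illustration, not the applicants' implementation: the function name `is_exhausted`, the `mode` parameter, and the use of a separate threshold for each score are assumptions (the text refers simply to "a threshold value").

```python
def is_exhausted(exhaustion_score: float,
                 gsea_score: float,
                 exhaustion_threshold: float,
                 gsea_threshold: float,
                 mode: str = "or") -> bool:
    """Apply the threshold rule to one T cell of a CD4+ or CD8+ cluster.

    mode="and": both scores must be equal to or higher than their thresholds.
    mode="or":  at least one score must be equal to or higher than its threshold.
    Separate thresholds per score are an assumption of this sketch.
    """
    passes_exhaustion = exhaustion_score >= exhaustion_threshold
    passes_gsea = gsea_score >= gsea_threshold
    if mode == "and":
        return passes_exhaustion and passes_gsea
    if mode == "or":
        return passes_exhaustion or passes_gsea
    raise ValueError("mode must be 'and' or 'or'")
```

The same predicate would be applied separately within the CD4+ cluster and the CD8+ cluster, each with its own marker set and threshold.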
- the method further comprises obtaining the population of T cells from the tumor microenvironment of the subject.
- obtaining comprises isolating a tumor or a tumor tissue comprising the population of T cells from the subject.
- the expression level is determined by mRNA transcripts.
- the method further comprises sequencing mRNAs from the population of T cells to obtain the single cell transcriptome data.
- the method further comprises providing single-cell T-cell receptor (scTCR) data of the population of T cells. In some embodiments, the method further comprises sequencing the population of T cells to obtain the scTCR data of each T cell. In some embodiments, the method further comprises identifying a TCR clonotype of an exhausted CD4+ T cell or an exhausted CD8+ T cell based on the scTCR data of exhausted CD4+ T cells and exhausted CD8+ T cells. In some embodiments, the method further comprises identifying TCR clonotypes of each exhausted CD4+ T cell of the population of T cells based on the scTCR data of exhausted CD4+ T cells.
- the method further comprises identifying TCR clonotypes of each exhausted CD8+ T cell of the population of T cells based on the scTCR data of exhausted CD8+ T cells. In some embodiments, the method further comprises identifying TCR clonotypes of each exhausted CD4+ T cell and each exhausted CD8+ T cell of the population of T cells based on the scTCR data of exhausted CD4+ T cells and exhausted CD8+ T cells.
- a TCR clonotype of a given exhausted CD4+ T cell is matched to the CD4+ exhaustion score and/or the CD4+ GSEA score of the same exhausted CD4+ T cell. In some embodiments, a TCR clonotype of a given exhausted CD4+ T cell is matched to the CD4+ exhaustion score and the CD4+ GSEA score of the same exhausted CD4+ T cell. In some embodiments, a TCR clonotype of a given exhausted CD4+ T cell is matched to the CD4+ exhaustion score or the CD4+ GSEA score of the same exhausted CD4+ T cell.
- the TCR clonotype of a given exhausted CD4+ T cell is matched to the single cell transcriptome data of the same exhausted CD4+ T cell. In some embodiments, the TCR clonotype of a given exhausted CD4+ T cell is matched to the single cell transcriptome data of the same exhausted CD4+ T cell via a same single cell barcode.
- a TCR clonotype of a given exhausted CD8+ T cell is matched to the CD8+ exhaustion score and/or the CD8+ GSEA score of the same exhausted CD8+ T cell.
- a TCR clonotype of a given exhausted CD8+ T cell is matched to the CD8+ exhaustion score and the CD8+ GSEA score of the same exhausted CD8+ T cell.
- a TCR clonotype of a given exhausted CD8+ T cell is matched to the CD8+ exhaustion score or the CD8+ GSEA score of the same exhausted CD8+ T cell.
- the TCR clonotype of a given exhausted CD8+ T cell is matched to the single cell transcriptome data of the same exhausted CD8+ T cell. In some embodiments, the TCR clonotype of a given exhausted CD8+ T cell is matched to the single cell transcriptome data of the same exhausted CD8+ T cell via a same single cell barcode.
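Matching a TCR clonotype to the scores or transcriptome data of the same cell "via a same single cell barcode" amounts to a key join on the barcode. The sketch below assumes both data sets are already held as dictionaries keyed by barcode; the function name, dictionary layout, and barcode strings are hypothetical.

```python
def match_by_barcode(clonotype_by_barcode: dict, scores_by_barcode: dict) -> dict:
    """Join scTCR clonotype calls to per-cell scores on the shared cell barcode.

    Only barcodes present in both inputs are kept, so a cell without a
    TCR call or without transcriptome-derived scores is dropped.
    """
    matched = {}
    for barcode, clonotype in clonotype_by_barcode.items():
        if barcode in scores_by_barcode:
            matched[barcode] = {"clonotype": clonotype, **scores_by_barcode[barcode]}
    return matched
```

Keeping only the intersection is a design choice of this sketch; an outer join that flags unmatched barcodes would work equally well for quality control.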
- the method further comprises identifying a number of single cells expressing a TCR of a given TCR clonotype that is present in a group of exhausted CD4+ T cells and a number of single cells expressing the same TCR of the given TCR clonotype that is present in a group of non-exhausted CD4+ T cells. In some embodiments, the method further comprises identifying a clone size expressing a TCR of a given TCR clonotype that is present in a group of exhausted CD4+ T cells and a number of single cells expressing the same TCR of the given TCR clonotype that is present in a group of non-exhausted CD4+ T cells.
- the method further comprises identifying that the clone size of a given TCR clonotype in the group of exhausted CD4+ T cells is larger than the clone size of the same TCR clonotype in the group of non-exhausted CD4+ T cells, where the clone size is the number of single cells expressing the TCR of the given TCR clonotype.
- the method further comprises selecting a TCR of a given TCR clonotype as a candidate tumor-reactive TCR clonotype when the number of single cells expressing the TCR of the given TCR clonotype that is present in the group of exhausted CD4+ T cells is larger than the number of single cells expressing the same TCR of the given TCR clonotype that is present in the group of non-exhausted CD4+ T cells.
- the method further comprises identifying a number of single cells expressing a TCR of a given TCR clonotype that is present in a group of exhausted CD8+ T cells and a number of single cells expressing the same TCR of the given TCR clonotype that is present in a group of non-exhausted CD8+ T cells. In some embodiments, the method further comprises identifying a clone size expressing a TCR of a given TCR clonotype that is present in a group of exhausted CD8+ T cells and a number of single cells expressing the same TCR of the given TCR clonotype that is present in a group of non-exhausted CD8+ T cells.
- the method further comprises identifying that the clone size of a given TCR clonotype in the group of exhausted CD8+ T cells is larger than the clone size of the same TCR clonotype in the group of non-exhausted CD8+ T cells, where the clone size is the number of single cells expressing the TCR of the given TCR clonotype.
- the method further comprises selecting a TCR of a given TCR clonotype as a candidate tumor-reactive TCR clonotype when the number of single cells expressing the TCR of the given TCR clonotype that is present in the group of exhausted CD8+ T cells is larger than the number of single cells expressing the same TCR of the given TCR clonotype that is present in the group of non-exhausted CD8+ T cells.
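The selection rule above (a clonotype is a candidate when more of its cells are exhausted than non-exhausted) can be sketched by counting cells per clonotype in each group. The function name and the input layout, one `(clonotype_id, is_exhausted)` pair per single cell, are assumptions made for illustration.

```python
from collections import Counter

def candidate_clonotypes(cells):
    """Return clonotypes whose exhausted clone size exceeds the non-exhausted one.

    cells: iterable of (clonotype_id, is_exhausted) pairs, one per single cell,
    all taken from the same lineage (CD4+ or CD8+).
    """
    exhausted = Counter()
    non_exhausted = Counter()
    for clonotype, flag in cells:
        if flag:
            exhausted[clonotype] += 1
        else:
            non_exhausted[clonotype] += 1
    # Counter returns 0 for clonotypes absent from the non-exhausted group.
    return sorted(c for c in exhausted if exhausted[c] > non_exhausted[c])
```

The comparison is strict, matching "larger than" in the text, so a clonotype with equal counts in both groups is not selected.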
- the method further comprises, prior to obtaining the single cell transcriptomic data, separating a subset of T cells from the population of T cells based on expression of a CD4+ and/or CD8+ exhaustion marker, thereby generating a subset of exhausted T cells and a subset of non-exhausted T cells.
- the method further comprises, prior to obtaining the single cell transcriptomic data, separating a subset of T cells from the population of T cells based on expression of a CD4+ and CD8+ exhaustion marker, thereby generating a subset of exhausted T cells and a subset of non-exhausted T cells.
- the method further comprises, prior to obtaining the single cell transcriptomic data, separating a subset of T cells from the population of T cells based on expression of a CD4+ or CD8+ exhaustion marker, thereby generating a subset of exhausted T cells and a subset of non-exhausted T cells.
- the CD4+ and/or CD8+ exhaustion marker comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, at least 61, at least 62, at least 63, at least 64, at least 65,
- separating comprises fluorescence activated cell sorting (FACS).
- the method further comprises sequencing the subset of exhausted T cells and the subset of non-exhausted T cells using single cell sequencing or bulk sequencing. In some embodiments, the sequencing does not comprise using a barcode.
- the population of T cells is obtained from a frozen sample or a fresh sample.
- the sample is a formalin-fixed paraffin-embedded (FFPE) sample.
- the sample is not an FFPE sample.
- the sample is obtained from a tumor of a subject.
- the subject has been treated with a therapy.
- the subject has been treated with the therapy prior to or concurrently with obtaining the sample.
- the therapy comprises an immune checkpoint inhibitor.
- the method further comprises preparing a pharmaceutical composition using the candidate tumor-reactive TCR clonotype or a cell expressing the candidate tumor-reactive TCR clonotype.
- the method comprises (a) providing single cell transcriptome data of the population of T cells.
- the method further comprises (b) classifying each T cell of the population of T cells as a CD4+ T cell or a CD8+ T cell based on an expression level of each classification gene of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least
- the method further comprises (c) calculating (i) a CD4+ exhaustion score and/or a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers.
- the method further comprises (c) calculating (i) a CD4+ exhaustion score or a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers.
- the method further comprises (c) calculating (i) a CD4+ exhaustion score and a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers.
- the method comprises calculating a CD4+ exhaustion score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 CD4+ exhaustion gene markers (e.g., see Table 4).
- the method comprises calculating a CD4+ GSEA score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least
- the method further comprises (c) calculating (ii) a CD8+ exhaustion score and/or a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers.
- the method further comprises (c) calculating (ii) a CD8+ exhaustion score and a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers.
- the method further comprises (c) calculating (ii) a CD8+ exhaustion score or a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 2, at least
- the method comprises calculating a CD8+ exhaustion score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 CD8+ exhaustion gene markers (e.g., see Table 3).
- the method comprises calculating a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58, at least
- the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers is different from the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers.
- each T cell within the CD4+ cluster with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell.
- each T cell within the CD4+ cluster with an exhaustion score and a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell.
- each T cell within the CD4+ cluster with an exhaustion score or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell.
- the method further comprises identifying each T cell within the CD8+ cluster with an exhaustion score and/or a GSEA score equal to or higher than a threshold value as an exhausted CD8+ T cell.
- the method further comprises identifying each T cell within the CD8+ cluster with an exhaustion score or a GSEA score equal to or higher than a threshold value as an exhausted CD8+ T cell. In some embodiments, the method further comprises identifying each T cell within the CD8+ cluster with an exhaustion score and a GSEA score equal to or higher than a threshold value as an exhausted CD8+ T cell. In some embodiments, the method further comprises (d) identifying TCR clonotypes of the exhausted CD4+ T cells and exhausted CD8+ T cells separately based on the scTCR data of exhausted CD4+ T cells and exhausted CD8+ T cells identified in (c).
- Each TCR clonotype can comprise a paired TCR alpha chain and TCR beta chain from the single cell sequencing data, and each TCR clonotype can have a unique CDR3 sequence of a TCR beta chain and/or unique VDJ combination.
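One way to read the clonotype definition above is as a grouping key built from the beta-chain CDR3 sequence, optionally combined with the V, D, and J gene calls. The sketch below assumes a simple per-cell record; the record fields, function names, and the choice to key on beta-chain V/D/J calls are hypothetical and are not taken from the application.

```python
from collections import defaultdict
from typing import NamedTuple

class TcrRecord(NamedTuple):
    """One paired alpha/beta TCR call for a single cell (hypothetical layout)."""
    barcode: str
    alpha_cdr3: str
    beta_cdr3: str
    beta_v: str
    beta_d: str
    beta_j: str

def clonotype_key(rec: TcrRecord, use_vdj: bool = True) -> tuple:
    """Build the grouping key: beta CDR3 alone, or beta CDR3 plus V/D/J calls."""
    if use_vdj:
        return (rec.beta_cdr3, rec.beta_v, rec.beta_d, rec.beta_j)
    return (rec.beta_cdr3,)

def group_by_clonotype(records, use_vdj: bool = True) -> dict:
    """Map each clonotype key to the list of cell barcodes that carry it."""
    groups = defaultdict(list)
    for rec in records:
        groups[clonotype_key(rec, use_vdj)].append(rec.barcode)
    return dict(groups)
```

The length of each barcode list is then the clone size of that clonotype.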
- the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 60, at least 70, at least 80, at least 90, or at least 100 classification genes comprises at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 60, at least 70, at least 80, at least 90, or at least 100 genes selected from the group consisting of PTPN13, TNFRSF4, CCR6, FOXP3, TSHZ2, MFHAS1, FAAH2, CD4, GK, IL2RA, CRADD, LTB, IRS2, KLRB1, TNFRSF25, LINC02694, THADA, BATF, TNFRSF18, SELL, IL12RB2, FURIN, HIPK2, MAP3K5, TMEM173, C
- classifying each T cell of the population of T cells comprises classifying each T cell of the population of T cells as a CD4+ cell and/or a CD8+ cell based on an expression level of each classification gene of a set of from 11 to 99 classification genes selected from the group consisting of PTPN13, TNFRSF4, CCR6, FOXP3, TSHZ2, MFHAS1, FAAH2, CD4, GK, IL2RA, CRADD, LTB, IRS2, KLRB1, TNFRSF25, LINC02694, THADA, BATF, TNFRSF18, SELL, IL12RB2, FURIN, HIPK2, MAP3K5, TMEM173, CTSB, SAMHD1, ADAM19, ICOS, GNA15, EPSTI1, ZC3H12D, PHTF2, MAST4, UGP2, RAPGEF6, STAM, CTLA4, RORA, SATB1, ZEB1, PIM2, CD28, LDLRAD4, PELI1, RHBDD2, SOCS
- the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers comprises at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, or at least 20 genes selected from the group consisting of ADGRG1, CD200, CHN1, CPM, CTLA4, CXCL13, DRAIC, ENTPD1, GNG4, ICA1, IGFL2, IGFLR1, MYO7A, PDE7B, NMB, PTPN13, TIGIT, TOX, TOX2, and TSHZ2.
- the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers comprises from 6 to 20 genes selected from the group consisting of ADGRG1, CD200, CHN1, CPM, CTLA4, CXCL13, DRAIC, ENTPD1, GNG4, ICA1, IGFL2, IGFLR1, MYO7A, PDE7B, NMB, PTPN13, TIGIT, TOX, TOX2, and TSHZ2.
- the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers comprises at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, or at least 20 genes selected from the group consisting of ADGRG1, CD200, CHN1, CPM, CTLA4, CXCL13, DRAIC, ENTPD1, GNG4, ICA1, IGFL2, IGFLR1, MYO7A, PDE7B, NMB, PTPN13, TIGIT, TOX, TOX2, and TSHZ2.
- the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers comprises from 6 to 20 genes selected from the group consisting of ADGRG1, CD200, CHN1, CPM, CTLA4, CXCL13, DRAIC, ENTPD1, GNG4, ICA1, IGFL2, IGFLR1, MYO7A, PDE7B, NMB, PTPN13, TIGIT, TOX, TOX2, and TSHZ2.
- the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers comprises at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, or at least 20 genes selected from the group consisting of CHN1, CLECL1, CTLA4, CXCL13, CXCR6, ENTPD1, HAVCR2, KRT86, LAG3, LAYN, LRRN3, MYO1E, MYO7A, PDCD1, SIRPG, SRGAP3, STAM, TIGIT, TNFRSF18, and TOX.
- the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers comprises from 6 to 20 genes selected from the group consisting of CHN1, CLECL1, CTLA4, CXCL13, CXCR6, ENTPD1, HAVCR2, KRT86, LAG3, LAYN, LRRN3, MYO1E, MYO7A, PDCD1, SIRPG, SRGAP3, STAM, TIGIT, TNFRSF18, and TOX.
- the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers comprises at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, or at least 20 genes selected from the group consisting of CHN1, CLECL1, CTLA4, CXCL13, CXCR6, ENTPD1, HAVCR2, KRT86, LAG3, LAYN, LRRN3, MYO1E, MYO7A, PDCD1, SIRPG, SRGAP3, STAM, TIGIT, TNFRSF18, and TOX.
- the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers comprises from 6 to 20 genes selected from the group consisting of CHN1, CLECL1, CTLA4, CXCL13, CXCR6, ENTPD1, HAVCR2, KRT86, LAG3, LAYN, LRRN3, MYO1E, MYO7A, PDCD1, SIRPG, SRGAP3, STAM, TIGIT, TNFRSF18, and TOX.
- calculating the CD4+ exhaustion score for the T cell of the CD4+ cluster comprises (i) obtaining a UMI count for each CD4+ exhaustion gene of the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers to obtain the expression level of each CD4+ exhaustion gene of the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers; (ii) scaling the UMI count by dividing the UMI count for each gene by the total number of UMIs of the single cell transcriptome data of the T cell and then multiplying the quotient by a scale factor; (iii) applying a logarithmic transformation to obtain normalized UMI counts for the set of at least 2, at least 3,
- the scale factor is 10,000. In some embodiments, the scale factor is about 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000, 10000, 15000, 20000, 25000, 50000, 100000 or more.
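Steps (i) to (iii) of the score calculation (per-gene UMI count, division by the cell's total UMIs, multiplication by a scale factor, logarithmic transformation) can be sketched as follows. The text says only "a logarithmic transformation"; the natural log of one plus the scaled value (`math.log1p`) used here is an assumption, as are the function and parameter names. The later steps that combine the normalized counts into a score are cut off in the text and are deliberately not reproduced.

```python
import math

def normalize_umi_counts(marker_umis: dict, total_umis: int,
                         scale_factor: float = 10_000.0) -> dict:
    """Normalize per-gene UMI counts for one cell.

    marker_umis: UMI count per exhaustion marker gene in this cell.
    total_umis:  total number of UMIs in this cell's transcriptome data.
    Each count is divided by total_umis, multiplied by scale_factor,
    then log-transformed with log1p (an assumed choice of transformation).
    """
    if total_umis <= 0:
        raise ValueError("total_umis must be positive")
    return {
        gene: math.log1p(count / total_umis * scale_factor)
        for gene, count in marker_umis.items()
    }
```

With the default scale factor of 10,000, a gene with 2 UMIs in a cell of 5,000 total UMIs scales to 4 before the transformation, and a gene with zero UMIs stays at zero after it.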
- the threshold value can vary depending on the number of exhaustion gene markers used. In some cases, at least 5 exhaustion gene markers are used, and the threshold value can be 0.3 or 0.35. In some cases, 20 exhaustion gene markers are used, and the threshold value can be 13.
- calculating the CD4+ exhaustion score for the T cell of the CD4+ cluster comprises (i) obtaining a UMI count for each CD4+ exhaustion gene of the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers to obtain the expression level of each CD4+ exhaustion gene of the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers; (ii) scaling the UMI count by dividing the UMI count for each gene by the total number of UMIs of the single cell transcriptome data of the T cell and then multiplying the quotient by a scale factor; (iii) applying a logarithmic transformation to obtain normalized UMI counts for the set of at least 2, at least 3, at
- the scale factor is 10,000. In some embodiments, the scale factor is about 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000, 10000, 15000, 20000, 25000, 50000, 100000 or more.
- calculating the CD8+ exhaustion score for the T cell of the CD8+ cluster comprises (i) obtaining an UMI count for each CD8+ exhaustion gene of the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers to obtain the expression level of each gene of the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 exhaustion gene markers; (ii) scaling the UMI count by dividing the UMI count for each CD8+ exhaustion gene by the total number of UMIs of the single cell transcriptome data of the T cell and then multiplying the quotient by a scale factor; (iii) applying a logarithmic transformation to obtain normalized UMI counts for the set of at least 2, at least 3, at least 4,
- calculating the CD8+ exhaustion score for the T cell of the CD8+ cluster comprises (i) obtaining an UMI count for each CD8+ exhaustion gene of the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers to obtain the expression level of each gene of the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 exhaustion gene markers; (ii) scaling the UMI count by dividing the UMI count for each CD8+ exhaustion gene by the total number of UMIs of the single cell transcriptome data of the T cell and then multiplying the quotient by a scale factor; (iii) applying a logarithmic transformation to obtain normalized UMI counts for the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least
- the scale factor is 10,000. In some embodiments, the scale factor is about 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000, 10000, 15000, 20000, 25000, 50000, 100000 or more.
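The scaling and log-transformation steps (i) to (iii) can be illustrated with a minimal Python sketch. Several details are assumptions made only for illustration: the function name `exhaustion_score` is hypothetical, the logarithmic transformation is taken to be the natural log of one plus the scaled value, and the normalized counts are combined into a single score by averaging, since the text above does not specify either choice.

```python
import math

def exhaustion_score(umi_counts, marker_genes, scale_factor=10_000):
    """Illustrative per-cell score from raw UMI counts.

    umi_counts: dict mapping gene name -> UMI count for ONE cell (all genes).
    marker_genes: exhaustion gene markers to score.
    """
    total_umis = sum(umi_counts.values())
    if total_umis == 0 or not marker_genes:
        return 0.0
    normalized = []
    for gene in marker_genes:
        # (ii) divide by the cell's total UMIs, then multiply by the scale factor
        scaled = umi_counts.get(gene, 0) / total_umis * scale_factor
        # (iii) logarithmic transformation (ASSUMPTION: natural log of 1 + x)
        normalized.append(math.log1p(scaled))
    # ASSUMPTION: the score is the mean of the normalized marker counts
    return sum(normalized) / len(normalized)

# Toy cell with 20 UMIs in total; gene counts are made up for illustration.
cell = {"CTLA4": 3, "CXCL13": 1, "GAPDH": 16}
score = exhaustion_score(cell, ["CTLA4", "CXCL13"])
```

Dividing by the total UMI count of the cell makes cells sequenced to different depths comparable before the marker genes are combined.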
- the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers comprises at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 genes selected from the group consisting of ADD3, AGFG1, AHI1, AP3S1, ARAP2, ARHGEF3, ATP2A2, CCDC6, CD200, CD27, CH25H, CHN1, CNIH1, COTL1, CPM, CRYBG1, CTLA4, CXCL13, DUSP4, ELMO1, FABP5, FBLN7, FBXO32, FKBP5, FOXN2, FYB1, GEM, GK, GPRIN3, GRSF1, GYPC, HIPK2, HMGB2, ICA1, IL6ST, IQGAP1, ITM2A
- the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers comprises from 6 to 88 genes selected from the group consisting of ADD3, AGFG1, AHI1, AP3S1, ARAP2, ARHGEF3, ATP2A2, CCDC6, CD200, CD27, CH25H, CHN1, CNIH1, COTL1, CPM, CRYBG1, CTLA4, CXCL13, DUSP4, ELMO1, FABP5, FBLN7, FBXO32, FKBP5, FOXN2, FYB1, GEM, GK, GPRIN3, GRSF1, GYPC, HIPK2, HMGB2, ICA1, IL6ST, IQGAP1, ITM2A, ITPR1, JARID2, LHFPL6, LIMS1, LRMP, LRRC8D, MAGEH1, MTHFD2, NAP1L4, NCOA7
- the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers comprises at least 5 genes selected from the group consisting of AHSA1, ALOX5AP, BAG3, BST2, CACYBP, CARD16, CD3D, CD7, CD82, CHN1, CLECL1, CLEC2B, CLEC2D, CTLA4, CTSD, CXCL13, CXCR6, DUSP4, ENTPD1, FKBP1A, GAPDH, GEM, GZMB, HAVCR2, HLA-DRB1, HSPB1, ICOS, IQGAP1, ITGAE, KRT86, LAG3, LAYN, LSP1, NAP1L4, NR3C1, PDCD1, PELI1, PHLDA1, POLR1E, PRDM1, PTPN22, RAB11FIP1, RAB27A, RBPJ, RGS1,
- the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers comprises from 6 to 61 genes selected from the group consisting of AHSA1, ALOX5AP, BAG3, BST2, CACYBP, CARD16, CD3D, CD7, CD82, CHN1, CLECL1, CLEC2B, CLEC2D, CTLA4, CTSD, CXCL13, CXCR6, DUSP4, ENTPD1, FKBP1A, GAPDH, GEM, GZMB, HAVCR2, HLA-DRB1, HSPB1, ICOS, IQGAP1, ITGAE, KRT86, LAG3, LAYN, LSP1, NAP1L4, NR3C1, PDCD1, PELI1, PHLDA1, POLR1E, PRDM1, PTPN22, RAB11FIP1, RAB27A, RBPJ,
- calculating the CD4+ GSEA score for the T cell of the CD4+ cluster comprises (i) obtaining an UMI count for each gene of all genes in the single cell transcriptome data; (ii) obtaining an UMI rank of all genes by ranking UMI counts of all genes; (iii) increasing a running-sum statistic for each CD4+ exhaustion gene of all genes that appears in the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers and decreasing a running-sum statistic for each CD4+ exhaustion gene of all genes that does not appear in the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers; and (iv) calculating the CD4+ GSEA score based on running-sum
- calculating the CD8+ GSEA score for the T cell of the CD8+ cluster comprises (i) obtaining an UMI count for each gene of all genes in the single cell transcriptome data; (ii) obtaining an UMI rank of all genes by ranking UMI counts of all genes; (iii) increasing a running-sum statistic for each CD8+ exhaustion gene of all genes that appears in the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers and decreasing a running-sum statistic for each CD8+ exhaustion gene of all genes that does not appear in the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers; and (iv) calculating the CD8+ GSEA score based on running-sum
- calculating the CD4+ GSEA score for the T cell of the CD4+ cluster comprises (i) obtaining an UMI count for each gene of all genes in the single cell transcriptome data; (ii) obtaining an UMI rank of all genes by ranking UMI counts of all genes; (iii) calculating an area under the curve (AUC) value of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers; and (iv) calculating the CD4+ GSEA score based on AUC values, wherein the T cell with a CD4+ GSEA score equal to or higher than a cutoff value established from data distribution is identified as an exhausted CD4+ T cell.
- the cutoff value is equal to or higher than 0.001, equal to or higher than 0.005, equal to or higher than 0.01, equal to or higher than 0.05, equal to or higher than 0.1, equal to or higher than 0.15, equal to or higher than 0.25, equal to or higher than 0.3, equal to or higher than 0.35, equal to or higher than 0.4, equal to or higher than 0.45, equal to or higher than 0.5, equal to or higher than 0.55, equal to or higher than 0.6, equal to or higher than 0.65, equal to or higher than 0.7, equal to or higher than 0.75, equal to or higher than 0.8, equal to or higher than 0.85, or equal to or higher than 0.9.
- the cutoff value is 0.2.
- calculating in (iii) comprises assessing recovery of the set of at least 5 CD4+ exhaustion genes.
- the set of CD4+ exhaustion genes are selected among the top 5, 10, 15, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100 or more ranked genes from the UMI rank obtained in (ii).
- calculating the CD8+ GSEA score for the T cell of the CD8+ cluster comprises (i) obtaining an UMI count for each gene of all genes in the single cell transcriptome data; (ii) obtaining an UMI rank of all genes by ranking UMI counts of all genes; (iii) calculating an area under the curve (AUC) value of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers; and (iv) calculating the CD8+ GSEA score based on AUC values, wherein the T cell with a CD8+ GSEA score equal to or higher than a cutoff value established from data distribution is identified as an exhausted CD8+ T cell.
- the cutoff value is equal to or higher than 0.001, equal to or higher than 0.005, equal to or higher than 0.01, equal to or higher than 0.05, equal to or higher than 0.1, equal to or higher than 0.15, equal to or higher than 0.25, equal to or higher than 0.3, equal to or higher than 0.35, equal to or higher than 0.4, equal to or higher than 0.45, equal to or higher than 0.5, equal to or higher than 0.55, equal to or higher than 0.6, equal to or higher than 0.65, equal to or higher than 0.7, equal to or higher than 0.75, equal to or higher than 0.8, equal to or higher than 0.85, or equal to or higher than 0.9.
- the cutoff value is 0.3.
- calculating in (iii) comprises assessing recovery of the set of at least 5 CD8+ exhaustion genes.
- the set of CD8+ exhaustion genes are selected among the top 5, 10, 15, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100 or more ranked genes from the UMI rank obtained in (ii).
- the method further comprises calculating the CD4+ exhaustion score and the CD4+ GSEA score for the T cell of the CD4+ cluster. In some embodiments, the method further comprises calculating the CD8+ exhaustion score and the CD8+ GSEA score for the T cell of the CD8+ cluster. In some embodiments, the method further comprises identifying TCR clonotypes of the exhausted CD4+ T cells and exhausted CD8+ cells separately based on the scTCR data of exhausted CD4+ T cells and exhausted CD8+ T cells identified in (c). In some embodiments, the exhausted CD4+ T cells have both the CD4+ exhaustion score and the CD4+ GSEA score above the threshold value.
- the exhausted CD8+ T cells have both the CD8+ exhaustion score and the CD8+ GSEA score above the threshold value. In some embodiments, the exhausted CD4+ T cells have the CD4+ exhaustion score or the CD4+ GSEA score above the threshold value. In some embodiments, the exhausted CD8+ T cells have the CD8+ exhaustion score or the CD8+ GSEA score above the threshold value. In some embodiments, for each TCR clonotype identified in a CD4+ exhausted T cell, calculating a mean or median CD4+ exhaustion score and/or a mean or median CD4+ GSEA score for all CD4+ exhausted T cells having the same TCR clonotype.
- for each TCR clonotype identified in a CD4+ exhausted T cell, identifying a maximum CD4+ exhaustion score and/or a maximum CD4+ GSEA score for all CD4+ exhausted T cells having the same TCR clonotype.
- for each TCR clonotype identified in a CD8+ exhausted T cell, identifying a maximum CD8+ exhaustion score and/or a maximum CD8+ GSEA score for all CD8+ exhausted T cells having the same TCR clonotype.
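The per-clonotype mean, median, and maximum scores can be computed with a simple grouping step. The following is a minimal sketch only; the function name `summarize_by_clonotype`, the input layout (one `(clonotype, score)` pair per exhausted cell), and the CDR3-like strings are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

def summarize_by_clonotype(cells):
    """cells: iterable of (clonotype_id, score) pairs, one per exhausted T cell."""
    grouped = defaultdict(list)
    for clonotype, score in cells:
        grouped[clonotype].append(score)
    return {
        clonotype: {
            "clone_size": len(scores),  # number of cells sharing the clonotype
            "mean": mean(scores),
            "median": median(scores),
            "max": max(scores),
        }
        for clonotype, scores in grouped.items()
    }

# Hypothetical clonotype labels and scores, for illustration only.
cells = [("CASSL", 0.4), ("CASSL", 0.8), ("CASSL", 0.6), ("CAWSV", 0.5)]
summary = summarize_by_clonotype(cells)
```

The same function can be applied separately to exhaustion scores and GSEA scores, and separately to the CD4+ and CD8+ groups.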
- a TCR clonotype of a given exhausted CD4+ T cell is matched to the CD4+ exhaustion score and/or the CD4+ GSEA score of the same exhausted CD4+ T cell.
- the TCR clonotype of a given exhausted CD4+ T cell is matched to the single cell transcriptome data of the same exhausted CD4+ T cell.
- the TCR clonotype of a given exhausted CD4+ T cell is matched to the single cell transcriptome data of the same exhausted CD4+ T cell via a same single cell barcode.
- the method can further comprise matching the TCR clonotype of a given exhausted CD4+ T cell to the single cell transcriptome data of the same exhausted CD4+ T cell.
- the method can further comprise matching a barcode of the TCR clonotype of a given exhausted CD4+ T cell to the same barcode of the single cell transcriptome data of the same exhausted CD4+ T cell.
- a TCR clonotype of a given exhausted CD8+ T cell is matched to the CD8+ exhaustion score and/or the CD8+ GSEA score of the same exhausted CD8+ T cell. In some embodiments, a TCR clonotype of a given exhausted CD8+ T cell is matched to the CD8+ exhaustion score and the CD8+ GSEA score of the same exhausted CD8+ T cell. In some embodiments, a TCR clonotype of a given exhausted CD8+ T cell is matched to the CD8+ exhaustion score or the CD8+ GSEA score of the same exhausted CD8+ T cell.
- the TCR clonotype of a given exhausted CD8+ T cell is matched to the single cell transcriptome data of the same exhausted CD8+ T cell. In some embodiments, the TCR clonotype of a given exhausted CD8+ T cell is matched to the single cell transcriptome data of the same exhausted CD8+ T cell via a same single cell barcode. In some embodiments, the method can further comprise matching the TCR clonotype of a given exhausted CD4+ T cell to the single cell transcriptome data of the same exhausted CD4+ T cell. In some embodiments, the method can further comprise matching a barcode of the TCR clonotype of a given exhausted CD4+ T cell to the same barcode of the single cell transcriptome data of the same exhausted CD4+ T cell.
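Matching a TCR clonotype to the transcriptome-derived scores of the same cell via a shared single cell barcode amounts to a key join. A minimal sketch follows; the function name `match_by_barcode`, the dictionary layouts, the barcodes, and the clonotype strings are all assumptions made for illustration.

```python
def match_by_barcode(tcr_by_barcode, scores_by_barcode):
    """Join scTCR and transcriptome-derived records that share a cell barcode.

    tcr_by_barcode: dict barcode -> clonotype identifier (from scTCR data).
    scores_by_barcode: dict barcode -> dict of scores (from transcriptome data).
    Cells present in only one of the two inputs are dropped.
    """
    shared = tcr_by_barcode.keys() & scores_by_barcode.keys()
    return {
        barcode: {"clonotype": tcr_by_barcode[barcode], **scores_by_barcode[barcode]}
        for barcode in shared
    }

# Hypothetical barcodes and values.
tcr = {"AAAC": "CASSLGQ", "TTTG": "CAWSVGT"}
scores = {
    "AAAC": {"exhaustion": 0.5, "gsea": 0.25},
    "GGGA": {"exhaustion": 0.1, "gsea": 0.05},
}
matched = match_by_barcode(tcr, scores)
```

Only barcode `AAAC` appears in both inputs, so it is the only cell retained in `matched`.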
- the method further comprises identifying a number of single cells expressing a TCR of a given TCR clonotype that is present in a group of exhausted CD4+ T cells and a number of single cells expressing the same TCR of the given TCR clonotype that is present in a group of non-exhausted CD4+ T cells.
- the method further comprises selecting a TCR of a given TCR clonotype as a candidate tumor-reactive TCR clonotype when the number of single cells expressing the TCR of the given TCR clonotype that is present in the group of exhausted CD4+ T cells is larger than the number of single cells expressing the same TCR of the given TCR clonotype that is present in the group of non-exhausted CD4+ T cells.
- the method further comprises identifying a number of single cells expressing a TCR of a given TCR clonotype that is present in a group of exhausted CD8+ T cells and a number of single cells expressing the same TCR of the given TCR clonotype that is present in a group of non-exhausted CD8+ T cells.
- the method further comprises selecting a TCR of a given TCR clonotype as a candidate tumor-reactive TCR clonotype when the number of single cells expressing the TCR of the given TCR clonotype that is present in the group of exhausted CD8+ T cells is larger than the number of single cells expressing the same TCR of the given TCR clonotype that is present in the group of non-exhausted CD8+ T cells.
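The selection rule, in which a clonotype is a candidate when more of its cells fall in the exhausted group than in the non-exhausted group, can be sketched as follows. The function name `candidate_clonotypes` and the input layout are hypothetical; the same function would be run once on CD4+ cells and once on CD8+ cells.

```python
from collections import Counter

def candidate_clonotypes(cells):
    """cells: iterable of (clonotype_id, is_exhausted) pairs, one per single cell."""
    cells = list(cells)
    exhausted = Counter(c for c, is_exhausted in cells if is_exhausted)
    non_exhausted = Counter(c for c, is_exhausted in cells if not is_exhausted)
    # Keep clonotypes with strictly more exhausted than non-exhausted cells.
    return sorted(c for c in exhausted if exhausted[c] > non_exhausted[c])

# Hypothetical clonotype labels.
observed = [
    ("C1", True), ("C1", True), ("C1", False),
    ("C2", True), ("C2", False),
    ("C3", False),
]
candidates = candidate_clonotypes(observed)
```

Here `C1` has two exhausted cells against one non-exhausted cell and is selected, while `C2` (one against one) and `C3` (none exhausted) are not.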
- the method further comprises selecting candidate tumor-reactive TCR clonotypes from the TCR clonotypes identified for the exhausted CD4+ T cells and/or the exhausted CD8+ cells.
- the candidate tumor-reactive TCR clonotypes are further quality checked by (i) unique pairing of TCR alpha chain and TCR beta chain, (ii) match to known TCRs from a public database; and/or (iii) expression of innate immune cell markers.
- the quality checking comprises excluding candidate tumor-reactive TCR clonotypes which (i) have unique pairing of TCR alpha chain and TCR beta chain, (ii) match to known TCRs from a public database; and/or (iii) express innate immune cell markers.
- candidate tumor-reactive TCR clonotypes that match to a known TCR that recognizes a non-oncogenic pathogen are not selected.
- the method further comprises ranking the candidate tumor-reactive TCR clonotypes of the exhausted CD4+ T cells based on clone size.
- the method further comprises ranking the candidate tumor-reactive TCR clonotypes of the exhausted CD8+ T cells based on clone size. In some embodiments, the method further comprises ranking the candidate tumor-reactive TCR clonotypes with similar clone sizes based on the mean or median CD4+ exhaustion score, the maximum CD4+ exhaustion score, the mean or median CD4+ GSEA score, and/or the maximum CD4+ GSEA score for all CD4+ exhausted T cells.
- the method further comprises ranking the candidate tumor-reactive TCR clonotypes with similar clone sizes based on the mean or median CD8+ exhaustion score, the maximum CD8+ exhaustion score, the mean or median CD8+ GSEA score, and/or the maximum CD8+ GSEA score for all CD8+ exhausted T cells.
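Ranking by clone size, with a score used to order clonotypes of similar size, can be expressed as a two-level sort. This is a sketch under stated assumptions: the function name `rank_candidates` is hypothetical, "similar clone sizes" is simplified to equal clone sizes, and the mean score is used as the tie-breaker, although the text also allows the median or maximum.

```python
def rank_candidates(summary):
    """summary: dict clonotype -> {"clone_size": int, "mean": float}.

    Returns clonotype ids, largest clone first; among clonotypes of equal
    clone size (ASSUMPTION for "similar"), higher mean score comes first.
    """
    return sorted(
        summary,
        key=lambda c: (-summary[c]["clone_size"], -summary[c]["mean"]),
    )

# Hypothetical per-clonotype summaries.
summary_table = {
    "C1": {"clone_size": 5, "mean": 0.4},
    "C2": {"clone_size": 5, "mean": 0.7},
    "C3": {"clone_size": 9, "mean": 0.2},
}
ranking = rank_candidates(summary_table)
```

`C3` ranks first because it has the largest clone; `C2` precedes `C1` because the two have equal clone size and `C2` has the higher mean score.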
- the same TCR clonotype is determined by having the same CDR3 sequence.
- the candidate tumor-reactive TCR clonotypes that match to known TCRs are determined by having the same CDR3 sequence.
- the candidate tumor-reactive TCR clonotype of a proliferating cell is given a higher weighting value when ranking the candidate tumor-reactive TCR clonotypes.
- a proliferating cell is identified by gene expression.
- a proliferating cell is given a GSEA score based upon expression of genes associated with proliferation.
- a proliferating cell is identified as having a GSEA score that has a calculated area under the curve value above a cutoff value. In some embodiments, the cutoff value is 0.3.
- the genes associated with proliferation comprise one or more genes (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, or 50 genes) presented in Table 9.
- a median positive predictive value (PPV) of the prediction algorithm of the methods described herein is at least 0.001, at least 0.005, at least 0.01, at least 0.05, at least 0.1, at least 0.2, at least 0.25, at least 0.3, at least 0.35, at least 0.4, at least 0.45, at least 0.5, at least 0.55, at least 0.6, at least 0.65, at least 0.7, at least 0.75, at least 0.8, at least 0.85, or at least 0.9 for CD4+ TCR clones or the median PPV is at least 0.001, at least 0.005, at least 0.01, at least 0.05, at least 0.1, at least 0.2, at least 0.25, at least 0.3, at least 0.35, at least 0.4, at least 0.45, at least 0.5, at least 0.55, at least 0.6, at least 0.65, at least 0.7, at least
- the method further comprises selecting at least one, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 candidate tumor-reactive TCR clonotype from at least top 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500 or more of the candidate tumor-reactive TCR clonotypes ranked.
- the performance of the end-to-end algorithm for selecting at least top 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500 or more candidate tumor-reactive TCR clonotypes has a PPV value of at least 0.08, 0.09, 0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36,
- a PPV closer to 1 represents a more accurate prediction method.
- a PPV may be used to determine the accuracy of the prediction method or algorithm.
- a PPV may be used to adjust the prediction method to accommodate for false positive results that may be generated by the method.
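The positive predictive value is the fraction of predicted tumor-reactive clonotypes that are confirmed as tumor-reactive, PPV = TP / (TP + FP). A minimal sketch is given below; the function name `positive_predictive_value` and the clonotype labels are hypothetical, and how clonotypes are experimentally validated is outside the scope of the example.

```python
def positive_predictive_value(predicted, validated):
    """predicted: clonotypes selected by the prediction method.
    validated: clonotypes confirmed as tumor-reactive.
    Returns TP / (TP + FP).
    """
    predicted = set(predicted)
    if not predicted:
        raise ValueError("PPV is undefined when nothing is predicted")
    true_positives = len(predicted & set(validated))
    return true_positives / len(predicted)

# Hypothetical labels: 4 predicted clonotypes, 2 of them validated.
ppv = positive_predictive_value(["a", "b", "c", "d"], ["a", "c", "x"])
```

A value closer to 1 means fewer false positives among the selected clonotypes.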
- the method further comprises delivering a nucleic acid encoding a TCR comprising the at least one candidate tumor-reactive TCR clonotype of the candidate tumor-reactive TCR clonotypes into a cell. In some embodiments, the method further comprises administering the nucleic acid encoding a TCR comprising the at least one candidate tumor-reactive TCR clonotype of the candidate tumor-reactive TCR clonotypes, or a cell comprising the nucleic acid encoding a TCR comprising the at least one candidate tumor-reactive TCR clonotype of the candidate tumor-reactive TCR clonotypes into a subject.
- the subject is the same subject from whom the population of T cells is obtained.
- the population of T cells are tumor-infiltrating lymphocytes (TILs).
- the population of T cells comprises at least 100, at least 500, at least 1,000, at least 2,000, at least 5,000, at least 10,000 or more cells.
- Also provided herein is a method of identifying one or more T-cell receptors as one or more candidate tumor-reactive TCRs from exhausted T cells from a population of T cells obtained from a tumor microenvironment of a subject having a cancer, the method comprising: (a) providing single cell transcriptome data and single-cell T-cell receptor (scTCR) data of the population of T cells comprising exhausted CD4+ cells and exhausted CD8+ cells; and (b) identifying TCR clonotypes of the exhausted CD4+ T cells or the exhausted CD8+ cells based on the scTCR data of the exhausted CD4+ T cells or the exhausted CD8+ T cells, wherein the exhausted CD4+ T cells or the exhausted CD8+ T cells are identified based on the single cell transcriptome data.
- the exhausted CD4+ T cells or the exhausted CD8+ T cells are identified by any one of the methods disclosed herein.
- each cell of the exhausted CD4+ T cells or the exhausted CD8+ T cells has an exhaustion score and/or a GSEA score equal to or higher than a threshold value.
- each cell of the exhausted CD4+ T cells or the exhausted CD8+ T cells has an exhaustion score and a GSEA score equal to or higher than a threshold value.
- each cell of the exhausted CD4+ T cells or the exhausted CD8+ T cells has an exhaustion score or a GSEA score equal to or higher than a threshold value.
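The "and" and "or" embodiments differ only in how the two threshold tests are combined. The sketch below makes that explicit; the function name `is_exhausted`, the `require_both` flag, and the example threshold values passed in are illustrative assumptions.

```python
def is_exhausted(exhaustion_score, gsea_score, exhaustion_threshold,
                 gsea_cutoff, require_both=True):
    """Classify one cell from its two scores.

    require_both=True  -> both scores must reach their thresholds ("and").
    require_both=False -> either score reaching its threshold suffices ("or").
    """
    passes_exhaustion = exhaustion_score >= exhaustion_threshold
    passes_gsea = gsea_score >= gsea_cutoff
    if require_both:
        return passes_exhaustion and passes_gsea
    return passes_exhaustion or passes_gsea

# Illustrative values: the exhaustion score passes, the GSEA score does not.
strict = is_exhausted(0.4, 0.1, exhaustion_threshold=0.3, gsea_cutoff=0.2)
lenient = is_exhausted(0.4, 0.1, exhaustion_threshold=0.3, gsea_cutoff=0.2,
                       require_both=False)
```

Requiring both scores yields a smaller, more stringent set of exhausted cells, while accepting either score is more inclusive.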
- the candidate tumor-reactive TCR induces activation of NFAT.
- the candidate tumor-reactive TCR induces expression of CD69, IFN-γ, TNF-α, IL-2, and/or IL-18.
- nucleic acid encoding a TCR comprising the at least one candidate tumor-reactive TCR clonotype selected by any one of the methods disclosed herein.
- a cell comprising a TCR comprising the at least one candidate tumor-reactive TCR clonotype selected by any one of the methods described herein. Also provided herein is a cell comprising a TCR encoded by any one of the nucleic acids disclosed herein. Further provided herein is a pharmaceutical composition comprising a TCR comprising the at least one candidate tumor-reactive TCR clonotype selected by any one of the methods disclosed herein and a pharmaceutically acceptable carrier. Further provided herein is a pharmaceutical composition comprising a TCR encoded by any one of the nucleic acids disclosed herein and a pharmaceutically acceptable carrier. Further provided herein is a pharmaceutical composition comprising any one of the cells disclosed herein, and a pharmaceutically acceptable carrier.
- TCR comprising the at least one candidate tumor-reactive TCR clonotype selected by the methods described above, the nucleic acid described above, the cell described above, or the pharmaceutical composition described above in the manufacturing of a medicament in treating a cancer in a subject in need thereof.
- the cancer is selected from the group consisting of bone cancer, blood cancer, lung cancer, liver cancer, pancreatic cancer, skin cancer, cancer of the head or neck, cutaneous or intraocular melanoma, uterine cancer, ovarian cancer, rectal cancer, cancer of the anal region, stomach cancer, colon cancer, breast cancer, prostate cancer, carcinoma of the sexual and reproductive organs, Hodgkin’s Disease, cancer of the esophagus, cancer of the small intestine, cancer of the endocrine system, cancer of the thyroid gland, cancer of the parathyroid gland, cancer of the adrenal gland, sarcoma of soft tissue, cancer of the bladder, cancer of the kidney, renal cell carcinoma, carcinoma of the renal pelvis, neoplasms of the central nervous system (CNS), neuroectodermal cancer, spinal axis tumors, glioma, meningioma, and pituitary adenoma.
- the process of identifying exhausted T cells from a population of T cells within a tumor microenvironment can be facilitated by the development of methods for recognizing these cells based on their transcriptome data and classifying them as CD4+ or CD8+ cells.
- This classification can be determined by the expression level of each classification gene from a defined set of genes. From this point, calculations can be made to determine the exhaustion score and/or GSEA score for each T cell.
- Gene Set Enrichment Analysis (GSEA), a computational method, determines whether a predefined set of genes demonstrates statistically significant, concordant differences between two biological states.
- GSEA can play a pivotal role in further classifying and identifying CD4+ and CD8+ T cells that may be regarded as exhausted based on a certain threshold value, providing insight into the cellular dynamics at play within a tumor microenvironment.
- the GSEA score described herein can be calculated using an area under the curve (AUC) test.
- the GSEA can be conducted by using an AUC scoring method which is implemented in a tool, for example, AUCell.
- AUCell uses the “Area Under the Curve” (AUC) to calculate whether a critical subset of the input gene set is enriched within the top expressed genes for each cell.
- the distribution of AUC scores across all the cells can allow exploring the relative expression of the signature. Since the scoring method is ranking-based, AUCell can be independent of the gene expression units and the normalization procedure. In addition, since the cells are evaluated individually, it can easily be applied to bigger datasets, subsetting the expression matrix if needed.
- the first step to calculate the enrichment of a signature is to create the “rankings”.
- These rankings can be an intermediate step to calculate the AUC, but they are kept as a separate step in the workflow in order to provide more flexibility (e.g., to save them for future analyses, to merge datasets, or process them by parts).
- for each cell, the genes can be ranked from highest to lowest value. Genes with the same expression value can be shuffled. Therefore, genes with expression ‘0’ are randomly sorted at the end of the ranking. It may be important to check that most cells have at least the number of expressed/detected genes that are going to be used to calculate the AUC. In order to calculate the AUC, by default the top 5% of the genes in the ranking can be used. This can allow faster execution on bigger datasets and reduce the effect of the noise at the bottom of the ranking (e.g., where many genes might be tied at 0 counts). The percentage to be taken into account can be modified.
- the AUC can estimate the proportion of genes in the gene-set that are highly expressed in each cell. Cells expressing many genes from the gene-set can have higher AUC values than cells expressing fewer (compensating for housekeeping genes, or genes that are highly expressed in all the cells in the dataset). Because the AUC represents the proportion of expressed genes in the gene-set, the relative AUCs across the cells can be used to explore a population of cells that are present in the dataset according to the expression of the gene-set.
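The ranking and AUC steps can be illustrated with a simplified, self-contained Python sketch. It is not the AUCell implementation (AUCell is an R package); the function name `auc_score`, the tie-breaking seed, and the normalization by `max_rank * number_of_set_genes` are assumptions chosen to mirror the description above.

```python
import math
import random

def auc_score(expression, gene_set, top_fraction=0.05, seed=0):
    """Simplified rank-based recovery AUC for ONE cell.

    expression: dict gene -> expression value (any units; only ranks matter).
    gene_set: genes of the signature.
    top_fraction: fraction of the ranking used (0.05 mirrors the 5% default).
    """
    rng = random.Random(seed)
    genes = list(expression)
    rng.shuffle(genes)  # random order among ties; the sort below is stable
    genes.sort(key=lambda g: expression[g], reverse=True)
    max_rank = max(1, math.ceil(top_fraction * len(genes)))
    members = set(gene_set) & set(genes)
    if not members:
        return 0.0
    hits = 0
    area = 0
    for gene in genes[:max_rank]:
        if gene in members:
            hits += 1   # the recovery curve steps up at each signature gene
        area += hits    # accumulate the area under the recovery curve
    # Normalize by an upper bound on the area (ASSUMPTION for this sketch).
    return area / (max_rank * len(members))

# Toy cell with four genes; top_fraction=0.5 keeps the top two ranks.
toy = {"A": 4, "B": 3, "C": 2, "D": 1}
high = auc_score(toy, ["A", "B"], top_fraction=0.5)
low = auc_score(toy, ["C", "D"], top_fraction=0.5)
```

Because only ranks enter the calculation, rescaling the expression values of a cell leaves its score unchanged, which is why a ranking-based score can be independent of units and normalization.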
- the GSEA score can also be calculated by other methods, for example, the Kolmogorov-Smirnov (K-S) test.
- K-S test is a nonparametric test used to determine whether two underlying one-dimensional probability distributions differ, or to compare a sample with a reference probability distribution.
- the K-S test can be adapted to evaluate the distribution of genes within predefined sets, to see if they are randomly distributed across the ranked list of all genes in a dataset or if they tend to cluster towards the top or bottom of the list, indicating enrichment.
- the K-S test may involve the following steps. All genes in the study may be ranked based on their correlation with a phenotype or biological condition of interest.
- the ranking metric can vary but often involves measures of differential expression, such as fold change or statistical significance.
- the K-S test calculates an enrichment score (ES) that reflects the degree to which that gene set may be overrepresented at the top or bottom of the ranked list of genes.
- the ES is the maximum distance between the cumulative distribution function (CDF) of the gene set and the CDF of the background gene set. Starting from the top of the ranked list, the test moves down the list, increasing a running-sum statistic when encountering a gene in the gene set and decreasing it when encountering genes not in the set. The magnitude of the increment depends on the correlation of the genes with the phenotype.
- the ES may be the peak deviation from zero encountered in this walk - positive if the set is enriched at the top of the ranked list, and negative if enriched at the bottom.
- the ES can be normalized to yield a normalized enrichment score (NES), which allows comparison across gene sets of different sizes.
- the significance of the observed ES (or NES) can be typically assessed through permutation testing. By randomly permuting the phenotype labels or gene labels multiple times and recalculating the ES for each permutation, one can generate a null distribution of ES values against which the observed ES can be compared to estimate a p-value. Since many gene sets are tested simultaneously, correction for multiple hypothesis testing is often applied to control the false discovery rate (FDR).
- the K-S test in the context of GSEA provides a powerful way to identify gene sets that are significantly associated with a phenotype, taking into account the collective behavior of genes within sets rather than focusing on individual genes. This approach can be particularly useful in exploring the biological mechanisms underlying complex traits and diseases.
- the ES is the maximum deviation from zero of R(i) across all positions i in the ranked list: $ES = R(i^{*})$, where $i^{*} = \arg\max_{i} |R(i)|$.
- This ES reflects how much the gene set G may be overrepresented at the top or bottom of the ranked list S, with higher absolute values indicating greater enrichment.
- the sign of the ES indicates whether the set may be enriched at the top (positive ES) or bottom (negative ES) of the ranked list S.
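The running-sum walk can be sketched in a few lines of Python. This is an illustrative version under stated assumptions: the function name `enrichment_score` is hypothetical, the increment for a gene in the set is taken as its ranking metric raised to an exponent `p` and normalized over the set, and the decrement for a gene outside the set is a constant `1 / (number of genes outside the set)`.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """ranked: list of (gene, metric) pairs, already sorted from top to bottom.
    gene_set: the gene set G.
    Returns the signed peak deviation from zero of the running sum R(i).
    """
    members = set(gene_set)
    hit_weights = [abs(m) ** p if g in members else 0.0 for g, m in ranked]
    total_hit = sum(hit_weights)
    n_miss = sum(1 for g, _ in ranked if g not in members)
    if total_hit == 0 or n_miss == 0:
        return 0.0
    running = 0.0
    peak = 0.0
    for (gene, _), weight in zip(ranked, hit_weights):
        if gene in members:
            running += weight / total_hit  # step up for a gene in the set
        else:
            running -= 1.0 / n_miss        # step down for a gene outside it
        if abs(running) > abs(peak):
            peak = running                 # track the maximum deviation
    return peak

# Toy ranked list with made-up metrics.
ranked_genes = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", 0.5)]
es_top = enrichment_score(ranked_genes, {"A", "B"})
es_bottom = enrichment_score(ranked_genes, {"C", "D"})
```

A set concentrated at the top of the list gives a positive score and a set concentrated at the bottom gives a negative one; setting `p=0` makes every increment equal, which corresponds to the unweighted K-S statistic.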
- sequencing techniques can include RNA-Seq, next generation sequencing (NGS), Digital Gene Expression, Clonal Single MicroArray, shotgun sequencing, Maxam-Gilbert sequencing, and massively-parallel sequencing. The T cells can be used as input for single-cell RNA-Seq methods such as inDrop or DropSeq.
- the sequencing may use single cell barcoding (e.g., partitioning the cells into individual compartments, barcoding nucleic acids released from a single cell, sequencing the nucleic acids, and pairing the TCR chains from a single cell based on a same barcode).
- the sequencing may not comprise using a barcode if the sequence encoding the paired TCR chains within a cell has been fused or linked in a single continuous polynucleotide chain.
- Sequencing described herein can be single cell sequencing.
- Single cell sequencing refers to obtaining sequence information from individual cells. It can be used to detect the genome, transcriptome and other multi-omics of single cells.
- a population of cells can be made into single cell suspension and compartmentalized into individual partitions. Within each partition, the sequences released from a single cell can be barcoded and later sequenced.
- Various single cell sequencing methods can be used for TCR reconstruction (see De Simone M, Rossetti G and Pagani M (2016) Single Cell T Cell Receptor Sequencing: Techniques and Future Challenges. Front. Immunol. 9: 1638).
- Bulk sequencing, also known as population or conventional sequencing, can be a technique where the genetic material (DNA or RNA) from a large population of cells is collectively extracted and sequenced. This approach, unlike single-cell sequencing, may provide an averaged view of the genetic or transcriptomic profile of all the cells within the sample, hence the term “bulk”.
- the bulk sequencing process can start with the isolation of genetic material from the cell population of interest.
- DNA may be extracted, purified, and fragmented.
- total RNA can be first isolated, and then mRNA can be either directly used or converted into cDNA via reverse transcription.
- the genetic material may be used to construct a sequencing library. This may involve adapter ligation and may also include amplification steps.
- the prepared library can then be sequenced using next-generation sequencing (NGS) platforms, such as Illumina, Ion Torrent, or Pacific Biosciences.
- the resulting data may not allow for the resolution of individual cellular identities or states within the population. Instead, it offers an averaged “snapshot” of the cell population’s genetic or gene expression status. This could potentially mask the contributions of rare or highly variable cells within the population. Nevertheless, bulk sequencing can be useful for profiling large numbers of samples cost-effectively, establishing a baseline reference for a given tissue or cell type, or identifying common or dominant genetic or transcriptomic features of a population of cells.
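- A small numerical illustration of the masking effect described above, using made-up expression values for a single gene (the numbers and variable names are hypothetical):

```python
from statistics import mean

# Toy per-cell expression of one gene across 100 cells (hypothetical values):
# 95 cells do not express the gene, 5 rare cells express it strongly.
per_cell_expression = [0.0] * 95 + [20.0] * 5

# Bulk sequencing reports roughly the population average.
bulk_average = mean(per_cell_expression)

# Single-cell data can additionally reveal how many cells carry the signal.
fraction_expressing = sum(1 for x in per_cell_expression if x > 0) / len(per_cell_expression)

print(f"bulk average: {bulk_average:.2f}")
print(f"fraction of expressing cells: {fraction_expressing:.2%}")
```

The same bulk average would also arise if every cell expressed the gene at a uniformly low level, which is why the averaged value alone cannot distinguish the two situations.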
- Sequencing described herein can be single cell sequencing, which can be used for characterizing nucleic acids at a single-cell level.
- the single cell sequencing can use a droplet-based system.
- the single cell sequencing can use a droplet-based system that enables 5’ mRNA digital counting of up to tens of thousands of single cells.
- the single cell sequencing can use a droplet-based system that enables 5’ mRNA digital counting of up to hundreds of thousands of single cells, up to millions of single cells, or more.
- Various droplet-based systems can be used.
- the single cell analysis utilizes compartmentalization or partitioning of individual cells into discrete compartments or partitions (used interchangeably).
- a whole cell can be isolated in a compartment, thereby allowing that cell to remain separate from other cells of the sample.
- the nucleic acids from a whole cell can be released into the compartment, for example, by contacting the cell with a lysis agent or other stimulus. The released nucleic acids can remain in the compartment, separated from other cells of the sample and also the nucleic acids associated with other cells of the sample.
- Unique identifiers may be previously, subsequently or concurrently delivered to the compartments that hold single cells, in order to allow for the later attribution of, e.g., sequence information, to a particular cell. While in the partitions, unique identifiers, e.g., barcodes or barcode sequences, can be associated with the nucleic acid sequences of nucleic acids from the whole cell using various processes, including ligation and/or amplification techniques. These barcode sequences can be used to determine the origin of a nucleic acid and/or to identify various nucleic acid sequences as being associated with a particular cell.
- Such identification can then allow that analysis to be attributed back to the individual cell or small group of cells from which the nucleic acids were derived. This can be accomplished regardless of whether the cell population represents a 50/50 mix of cell types, a 90/10 mix of cell types, or virtually any ratio of cell types, as well as a complete heterogeneous mix of different cell types, or any mixture between these.
- Differing cell types may include cells or biologic organisms from different tissue types of an individual, from different individuals, from differing genera, species, strains, variants, or any combination of any or all of the foregoing.
- differing cell types may include normal and tumor tissue from an individual, cells from a donor and a recipient (e.g., transplant), multiple different bacterial species, strains and/or variants from environmental, forensic, microbiome or other samples, or any of a variety of other mixtures of cell types.
- compartments comprise droplets of aqueous fluid within a non-aqueous continuous phase, e.g., an oil phase.
- compartments can refer to containers or vessels (such as wells, microwells, tubes, through ports in nanoarray substrates, or other containers). These compartments may comprise, e.g., microcapsules or micro-vesicles that have an outer barrier surrounding an inner fluid center or core, or they may be a porous matrix that is capable of entraining and/or retaining materials within its matrix.
- a variety of different vessels are described in, for example, U.S. Patent Application Publication No. 20140155295, the full disclosure of which is incorporated herein by reference in its entirety for all purposes.
- allocating individual cells to discrete compartments may generally be accomplished by introducing a flowing stream of cells in an aqueous fluid into a flowing stream of a non-aqueous fluid, such that droplets are generated at the junction of the two streams.
- the level of occupancy of the resulting partitions in terms of numbers of cells can be controlled.
- it may be desirable to control the relative flow rates of the fluids such that, on average, the partitions contain less than one cell per partition, in order to ensure that those partitions that are occupied are primarily singly occupied.
- the flow rate can also be altered to provide a higher percentage of partitions that are occupied, e.g., allowing for only a small percentage of unoccupied partitions.
- the flows and channel architectures are controlled so as to ensure a desired number of singly occupied partitions, less than a certain level of unoccupied partitions and/or less than a certain level of multiply occupied partitions.
- a droplet-based system disclosed herein can capture any suitable percentage of a cell population to be analyzed into compartments, e.g., droplets. In some cases, it is desirable to capture the entire cell population into droplets. In other cases, capture of a percentage of the cell population is desired or sufficient for downstream analysis and assay. In some embodiments, at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95% of the cells of a cell sample are captured in a droplet using a droplet-based system provided herein.
- At most about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95% of the cells of a cell sample are captured in a droplet using a droplet-based system provided herein. In some embodiments, approximately 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95% of the cells of a cell sample are captured in a droplet using a droplet-based system provided herein.
- between about 10% and about 95%, between about 15% and about 90%, between about 20% and about 85%, between about 25% and about 80%, between about 30% and about 75%, between about 35% and about 70%, between about 40% and about 65%, between about 45% and about 60%, or between about 50% and about 55% of cells of a cell sample are captured in a droplet using a droplet-based system provided herein.
- the percentage of cells captured into droplets can be optimized for a particular type of assay. In some embodiments, approximately 50% of cells of a cell sample loaded into a droplet-based system are captured in a droplet.
- occupied partitions (e.g., partitions containing one or more microcapsules) formed from methods and systems disclosed herein include no more than 1 cell per occupied partition. In some cases, fewer than 25% of the occupied partitions contain more than one cell, and in many cases, fewer than 20% of the occupied partitions have more than one cell, while in some cases, fewer than 10% or even fewer than 5% of the occupied partitions include more than one cell per partition.
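- under a Poisson distribution, raising the average number of cells per partition in order to reduce the number of unoccupied partitions may also increase the number of partitions that would include multiple cells.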
- the flow of one or more of the cells, or other fluids directed into the partitioning zone, is such that, in many cases, no more than 50% of the generated partitions, 25% of partitions, or 10% of partitions are unoccupied (e.g., including less than 1 cell). Further, in some aspects, these flows are controlled so as to present a non-Poisson distribution of singly occupied partitions while providing lower levels of unoccupied partitions.
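- The trade-off between unoccupied and multiply occupied partitions can be sketched numerically, assuming cells are loaded independently so that the number of cells per partition follows a Poisson distribution with a given mean. The function name `poisson_occupancy` and the example means are illustrative assumptions.

```python
import math


def poisson_occupancy(mean_cells_per_partition):
    """Expected partition fractions when cell counts per partition are Poisson."""
    lam = mean_cells_per_partition
    if lam <= 0:
        raise ValueError("mean must be positive")
    empty = math.exp(-lam)            # P(0 cells)
    single = lam * math.exp(-lam)     # P(exactly 1 cell)
    multiple = 1.0 - empty - single   # P(2 or more cells)
    occupied = 1.0 - empty
    return {
        "empty": empty,
        "single": single,
        "multiple": multiple,
        "multiple_among_occupied": multiple / occupied,
    }


for lam in (0.05, 0.1, 0.5, 1.0):
    stats = poisson_occupancy(lam)
    print(
        f"mean={lam:<4} empty={stats['empty']:.3f} "
        f"single={stats['single']:.3f} "
        f"multi among occupied={stats['multiple_among_occupied']:.3f}"
    )
```

Under this assumption, a low mean keeps almost all occupied partitions singly occupied but leaves most partitions empty, while a higher mean reduces empty partitions at the cost of more multiply occupied ones. Flows that achieve a non-Poisson distribution, as mentioned above, are a way around this limit.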
- in some cases, it may be desirable to provide multiply occupied partitions, e.g., containing two, three, four or more cells within a single partition.
- the flow characteristics of the cell and/or bead containing fluids and partitioning fluids may be controlled to provide for such multiply occupied partitions.
- the flow parameters may be controlled to provide a desired occupancy rate at greater than 50% of the partitions, greater than 75%, and in some cases greater than 80%, 85%, 90%, 95%, or higher.
- the partitions described herein can be characterized by having extremely small volumes, e.g., less than 10 microliters (µL), 5 µL, 1 µL, 900 nanoliters (nL), 500 nL, 100 nL, 50 nL, 1 nL, 900 picoliters (pL), 800 pL, 700 pL, 600 pL, 500 pL, 400 pL, 300 pL, 200 pL, 100 pL, 50 pL, 20 pL, 10 pL, or 1 pL.
- the droplets may have overall volumes that are less than 1000 pL, 900 pL, 800 pL, 700 pL, 600 pL, 500 pL, 400 pL, 300 pL, 200 pL, 100 pL, 50 pL, 20 pL, 10 pL, or even less than 1 pL.
- the sample fluid volume, e.g., including co-partitioned cells, within the partitions may be less than 90% of the above described volumes, less than 80%, less than 70%, less than 60%, less than 50%, less than 40%, less than 30%, less than 20%, or even less than 10% of the above described volumes.
- Multiple samples can be processed in parallel using droplet-based systems. In some embodiments, at least 2, 3, 4, 5, 6, 7, 8, 9, or 10 samples are processed in parallel.
- the multiple samples processed in parallel may comprise similar numbers of cells. In some cases, the multiple samples processed in parallel do not comprise similar numbers of cells.
- a cell population for analysis can comprise any number of cells.
- a cell sample loaded on a droplet-based system of the disclosure comprises at least about 100, 1,000, 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 125,000, 150,000, 175,000, 200,000, 225,000, 250,000, 275,000, 300,000, 325,000, 350,000, 375,000, 400,000, 425,000, 450,000, 475,000, 500,000, 525,000, 550,000, 575,000, 600,000, 625,000, 650,000, 675,000, 700,000, 725,000, 750,000, 775,000, 800,000, 825,000, 850,000, 875,000, 900,000, 925,000, 950,000, 975,000, or 1,000,000 cells.
- a cell sample loaded on a droplet-based system of the disclosure comprises at most about 100, 1,000, 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 125,000, 150,000, 175,000, 200,000, 225,000, 250,000, 275,000, 300,000, 325,000, 350,000, 375,000, 400,000, 425,000, 450,000, 475,000, 500,000, 525,000, 550,000, 575,000, 600,000, 625,000, 650,000, 675,000, 700,000, 725,000, 750,000, 775,000, 800,000, 825,000, 850,000, 875,000, 900,000, 925,000, 950,000, 975,000, or 1,000,000 cells.
- a cell sample loaded on a droplet-based system of the disclosure comprises approximately 100, 1,000, 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 125,000, 150,000, 175,000, 200,000, 225,000, 250,000, 275,000, 300,000, 325,000, 350,000, 375,000, 400,000, 425,000, 450,000, 475,000, 500,000, 525,000, 550,000, 575,000, 600,000, 625,000, 650,000, 675,000, 700,000, 725,000, 750,000, 775,000, 800,000, 825,000, 850,000, 875,000, 900,000, 925,000, 950,000, 975,000, or 1,000,000 cells.
- partitioning species may generate a population of partitions.
- any suitable number of partitions can be generated to generate the population of partitions.
- a population of partitions may be generated that comprises at least about 1,000 partitions, at least about 5,000 partitions, at least about 10,000 partitions, at least about 50,000 partitions, at least about 100,000 partitions, at least about 500,000 partitions, at least about 1,000,000 partitions, at least about 5,000,000 partitions, at least about 10,000,000 partitions, at least about 50,000,000 partitions, at least about 100,000,000 partitions, at least about 500,000,000 partitions or at least about 1,000,000,000 partitions.
- the population of partitions may comprise both unoccupied partitions (e.g., empty partitions) and occupied partitions.
- Single-cell RNA sequencing (scRNA-seq) and single-cell DNA sequencing (scDNA-seq) can be powerful technologies that provide an in-depth view of the genetic material within individual cells.
- the fundamental processes involved in these techniques may be cell isolation, lysis, reverse transcription (for scRNA-seq), amplification, library preparation, and sequencing.
- in scRNA-seq, the process starts with isolating individual cells from a sample. This can be done using techniques such as FACS (Fluorescence Activated Cell Sorting), microfluidic devices or droplet-based systems. Once the cells are isolated, the cells can be lysed to release the RNAs. The RNAs can then be reverse transcribed into complementary DNA (cDNA). The cDNA can be amplified, which increases the amount of material for downstream analysis. The cDNA library can then be prepared and sequenced. Advanced bioinformatics tools may be subsequently used to analyze the resulting data and generate gene expression profiles for each individual cell. Platforms like Smart-seq2 can provide full-length transcript information, allowing for detection of splice variants, while droplet-based systems like 10x Genomics Chromium excel at processing thousands of cells at a lower read depth per cell.
- Single cell DNA sequencing can follow a similar process with several differences. Instead of isolating RNA and performing reverse transcription, the genomic DNA from the lysed cells can be directly used. Post cell lysis, the genomic DNA can be subjected to whole genome amplification (WGA) to produce sufficient DNA for sequencing. Various amplification techniques can be used, such as Multiple Displacement Amplification (MDA), Amplification via Strand Displacement Amplification (SDA), or MALBAC (Multiple Annealing and Looping Based Amplification Cycles). Following amplification, a sequencing library can be prepared and then sequenced. The resulting data can be used to identify genomic variants, TCR sequences and copy number variations at the single-cell level, unveiling cell-to-cell genomic heterogeneity, which is particularly important in cancer research.
- the nucleic acids sequenced in the methods described herein can be barcoded.
- the barcode can be a cell barcode or a molecular barcode. In some cases, a barcode may not be used and sequences are analyzed through bulk sequencing.
- Unique identifiers may be previously, subsequently or concurrently delivered to the partitions that hold the compartmentalized or partitioned cells.
- Barcodes, which comprise a barcode sequence, may be delivered, in some embodiments, on an oligonucleotide (referred to interchangeably as a “barcoded oligonucleotide” or “oligonucleotide barcode”), to a partition via any suitable mechanism.
- barcoded oligonucleotides are delivered to a partition via a microcapsule.
- barcoded oligonucleotides are initially associated with the microcapsule and then released from the microcapsule upon application of a stimulus which allows the oligonucleotides to dissociate or to be released from the microcapsule.
- a microcapsule, in some embodiments, comprises a bead.
- a bead may be porous, non-porous, solid, semi-solid, semi-fluidic, or fluidic.
- a bead may be dissolvable, disruptable, or degradable. In some cases, a bead may not be degradable.
- the bead may be a gel bead.
- a gel bead can be a hydrogel bead.
- a gel bead can be formed from molecular precursors, such as a polymeric or monomeric species.
- a semi-solid bead can be a liposomal bead.
- Solid beads can comprise metals including iron oxide, gold, and silver. In some cases, the beads are silica beads. In some cases, the beads are rigid. In some cases, the beads are flexible and/or compressible.
- the beads may contain molecular precursors (e.g., monomers or polymers), which may form a polymer network via polymerization of the precursors.
- a precursor may be an already polymerized species capable of undergoing further polymerization via, for example, a chemical cross-linkage.
- a precursor comprises one or more of an acrylamide or a methacrylamide monomer, oligomer, or polymer.
- the bead may comprise prepolymers, which are oligomers capable of further polymerization.
- polyurethane beads may be prepared using prepolymers.
- the bead may contain individual polymers that may be further polymerized together.
- beads may be generated via polymerization of different precursors, such that they comprise mixed polymers, co-polymers, and/or block co-polymers.
- a bead may comprise natural and/or synthetic materials.
- a polymer can be a natural polymer or a synthetic polymer.
- a bead comprises both natural and synthetic polymers.
- natural polymers include proteins and sugars such as deoxyribonucleic acid, rubber, cellulose, starch (e.g., amylose, amylopectin), proteins, enzymes, polysaccharides, silks, polyhydroxyalkanoates, chitosan, dextran, collagen, carrageenan, ispaghula, acacia, agar, gelatin, shellac, sterculia gum, xanthan gum, corn sugar gum, guar gum, gum karaya, agarose, alginic acid, alginate, or natural polymers thereof.
- Examples of synthetic polymers include acrylics, nylons, silicones, spandex, viscose rayon, polycarboxylic acids, polyvinyl acetate, polyacrylamide, polyacrylate, polyethylene glycol, polyurethanes, polylactic acid, silica, polystyrene, polyacrylonitrile, polybutadiene, polycarbonate, polyethylene, polyethylene terephthalate, poly(chlorotrifluoroethylene), poly(ethylene oxide), poly(ethylene terephthalate), polyethylene, polyisobutylene, poly(methyl methacrylate), poly(oxymethylene), polyformaldehyde, polypropylene, polystyrene, poly(tetrafluoroethylene), poly(vinyl acetate), poly(vinyl alcohol), poly(vinyl chloride), poly(vinylidene dichloride), poly(vinylidene difluoride), poly(vinyl fluoride) and combinations (e.g., co-polymers) thereof.
- Beads may also be formed from materials other than polymers, including lipids, micelles, ceramics, glass-ceramics, material composites, metals, other inorganic materials, and others.
- a chemical cross-linker may be a precursor used to cross-link monomers during polymerization of the monomers and/or may be used to attach oligonucleotides (e.g., barcoded oligonucleotides) to the bead.
- polymers may be further polymerized with a cross-linker species or other type of monomer to generate a further polymeric network.
- Non-limiting examples of chemical cross-linkers include cystamine, glutaraldehyde, dimethyl suberimidate, N-Hydroxysuccinimide crosslinker BS3, formaldehyde, carbodiimide (EDC), SMCC, Sulfo-SMCC, vinylsilane, N,N’-diallyltartardiamide (DATD), N,N’-Bis(acryloyl)cystamine (BAC), or homologs thereof.
- the crosslinker used in the present disclosure contains cystamine.
- Crosslinking may be permanent or reversible, depending upon the particular crosslinker used. Reversible crosslinking may allow for the polymer to linearize or dissociate under appropriate conditions. In some cases, reversible cross-linking may also allow for reversible attachment of a material bound to the surface of a bead. In some cases, a cross-linker may form disulfide linkages. In some cases, the chemical cross-linker forming disulfide linkages may be cystamine or a modified cystamine.
- disulfide linkages can be formed between molecular precursor units (e.g., monomers, oligomers, or linear polymers) or precursors incorporated into a bead and oligonucleotides.
- Cystamine (including modified cystamines), for example, is an organic agent comprising a disulfide bond that may be used as a crosslinker agent between individual monomeric or polymeric precursors of a bead.
- Polyacrylamide may be polymerized in the presence of cystamine or a species comprising cystamine (e.g., a modified cystamine) to generate polyacrylamide gel beads comprising disulfide linkages (e.g., chemically degradable beads comprising chemically-reducible cross-linkers).
- the disulfide linkages may permit the bead to be degraded (or dissolved) upon exposure of the bead to a reducing agent.
- chitosan, a linear polysaccharide polymer, may be crosslinked with glutaraldehyde via hydrophilic chains to form a bead.
- Crosslinking of chitosan polymers may be achieved by chemical reactions that are initiated by heat, pressure, change in pH, and/or radiation.
- the bead may comprise covalent or ionic bonds between polymeric precursors (e.g., monomers, oligomers, linear polymers), oligonucleotides, primers, and other entities.
- the covalent bonds comprise carbon-carbon bonds or thioether bonds.
- a bead may comprise an acrydite moiety, which in certain aspects may be used to attach one or more oligonucleotides (e.g., barcode sequence, barcoded oligonucleotide, primer, or other oligonucleotide) to the bead.
- an acrydite moiety can refer to an acrydite analogue generated from the reaction of acrydite with one or more species, such as, the reaction of acrydite with other monomers and cross-linkers during a polymerization reaction.
- Acrydite moieties may be modified to form chemical bonds with a species to be attached, such as an oligonucleotide (e.g., barcode sequence, barcoded oligonucleotide, primer, or other oligonucleotide).
- Acrydite moieties may be modified with thiol groups capable of forming a disulfide bond or may be modified with groups already comprising a disulfide bond.
- the thiol or disulfide (via disulfide exchange) may be used as an anchor point for a species to be attached or another part of the acrydite moiety may be used for attachment.
- attachment is reversible, such that when the disulfide bond is broken (e.g., in the presence of a reducing agent), the attached species is released from the bead.
- an acrydite moiety comprises a reactive hydroxyl group that may be used for attachment.
- Functionalization of beads for attachment of oligonucleotides may be achieved through a wide range of different approaches, including activation of chemical groups within a polymer, incorporation of active or activatable functional groups in the polymer structure, or attachment at the pre-polymer or monomer stage in bead production.
- precursors (e.g., monomers, cross-linkers) that are polymerized to form a bead may comprise acrydite moieties, such that when a bead is generated, the bead also comprises acrydite moieties.
- the acrydite moieties can be attached to an oligonucleotide, such as a primer (e.g., a primer for amplifying target nucleic acids, barcoded oligonucleotide, etc.) that is desired to be incorporated into the bead.
- the primer comprises a P5 sequence for attachment to a sequencing flow cell for Illumina sequencing.
- the primer comprises a P7 sequence for attachment to a sequencing flow cell for Illumina sequencing. In some cases, the primer comprises a barcode sequence. In some cases, the primer further comprises a unique molecular identifier (UMI). In some cases, the primer comprises an R1 primer sequence for Illumina sequencing. In some cases, the primer comprises an R2 primer sequence for Illumina sequencing.
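- To illustrate how a barcode sequence and a UMI carried on such a primer can support digital counting, the sketch below splits a read into a cell barcode and a UMI and counts distinct UMIs per cell and gene. The read layout (16-base barcode followed by a 10-base UMI), the constant and function names, and the toy reads are assumptions made for illustration; the disclosure does not specify these lengths.

```python
from collections import defaultdict

BARCODE_LEN = 16  # assumed cell-barcode length (hypothetical, not from the source)
UMI_LEN = 10      # assumed UMI length (hypothetical, not from the source)


def split_read1(read1):
    """Split a read into (cell_barcode, umi) under the assumed layout."""
    if len(read1) < BARCODE_LEN + UMI_LEN:
        raise ValueError("read is shorter than barcode + UMI")
    barcode = read1[:BARCODE_LEN]
    umi = read1[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    return barcode, umi


def count_molecules(records):
    """Count distinct UMIs per (cell barcode, gene).

    records: iterable of (read1_sequence, gene_name) tuples, where gene_name
    is the gene the paired cDNA read was assigned to.
    """
    umis = defaultdict(set)
    for read1, gene in records:
        barcode, umi = split_read1(read1)
        umis[(barcode, gene)].add(umi)  # PCR duplicates collapse into one UMI
    return {key: len(values) for key, values in umis.items()}


# Toy reads: two cells, one gene, with one PCR duplicate in the first cell.
cell_a, cell_b = "A" * 16, "C" * 16
umi_1, umi_2 = "G" * 10, "T" * 10
toy_records = [
    (cell_a + umi_1, "CD8A"),
    (cell_a + umi_1, "CD8A"),  # duplicate of the same molecule
    (cell_a + umi_2, "CD8A"),
    (cell_b + umi_1, "CD8A"),
]
molecule_counts = count_molecules(toy_records)
print(molecule_counts)
```

Counting distinct UMIs rather than reads is what makes the count robust to amplification bias, since reads copied from the same original molecule share a UMI.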
- precursors comprising a functional group that is reactive or capable of being activated such that it becomes reactive can be polymerized with other precursors to generate gel beads comprising the activated or activatable functional group.
- the functional group may then be used to attach additional species (e.g., disulfide linkers, primers, other oligonucleotides, etc.) to the gel beads.
- some precursors comprising a carboxylic acid (COOH) group can co-polymerize with other precursors to form a gel bead that also comprises a COOH functional group.
- acrylic acid (a species comprising free COOH groups), acrylamide, and bis(acryloyl)cystamine can be co-polymerized together to generate a gel bead comprising free COOH groups.
- the COOH groups of the gel bead can be activated (e.g., via 1-Ethyl-3-(3-dimethylaminopropyl)carbodiimide (EDC) and N-Hydroxysuccinimide (NHS) or 4-(4,6-Dimethoxy-1,3,5-triazin-2-yl)-4-methylmorpholinium chloride (DMTMM)) such that they are reactive (e.g., reactive to amine functional groups where EDC/NHS or DMTMM are used for activation).
- the activated COOH groups can then react with an appropriate species (e.g., a species comprising an amine functional group where the carboxylic acid groups are activated to be reactive with an amine functional group) comprising a moiety to be linked to the bead.
- Beads comprising disulfide linkages in their polymeric network may be functionalized with additional species via reduction of some of the disulfide linkages to free thiols.
- the disulfide linkages may be reduced via, for example, the action of a reducing agent (e.g., DTT, TCEP, etc.) to generate free thiol groups, without dissolution of the bead.
- Free thiols of the beads can then react with free thiols of a species or a species comprising another disulfide bond (e.g., via thiol-disulfide exchange) such that the species can be linked to the beads (e.g., via a generated disulfide bond).
- free thiols of the beads may react with any other suitable group.
- free thiols of the beads may react with species comprising an acrydite moiety.
- the free thiol groups of the beads can react with the acrydite via Michael addition chemistry, such that the species comprising the acrydite is linked to the bead.
- uncontrolled reactions can be prevented by inclusion of a thiol capping agent such as N-ethylmaleimide or iodoacetate.
- Activation of disulfide linkages within a bead can be controlled such that only a small number of disulfide linkages are activated. Control may be exerted, for example, by controlling the concentration of a reducing agent used to generate free thiol groups and/or concentration of reagents used to form disulfide bonds in bead polymerization. In some cases, a low concentration (e.g., molecules of reducing agent: gel bead ratios of less than about 10,000, 100,000, 1,000,000, 10,000,000, 100,000,000, 1,000,000,000, 10,000,000,000, or 100,000,000,000) of reducing agent may be used for reduction.
- Controlling the number of disulfide linkages that are reduced to free thiols may be useful in ensuring bead structural integrity during functionalization.
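- The ratio of reducing agent molecules to gel beads mentioned above can be estimated from the concentration and volume of the reducing agent solution and the number of beads. The sketch below is simple arithmetic under that assumption; the function name and the example concentration, volume, and bead count are hypothetical and chosen only to show the calculation.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def reducing_agent_per_bead(conc_molar, volume_liters, n_beads):
    """Return the number of reducing-agent molecules available per gel bead."""
    if n_beads <= 0:
        raise ValueError("n_beads must be positive")
    molecules = conc_molar * volume_liters * AVOGADRO
    return molecules / n_beads


# Hypothetical example: 1 nM reducing agent, 100 microliters, one million beads.
ratio = reducing_agent_per_bead(conc_molar=1e-9, volume_liters=1e-4, n_beads=1e6)
print(f"{ratio:.3e} molecules of reducing agent per bead")
```

With these made-up inputs the result is on the order of 60,000 molecules per bead, which would fall under the "less than about 100,000" ratio listed above.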
- optically-active agents such as fluorescent dyes may be coupled to beads via free thiol groups of the beads and used to quantify the number of free thiols present in a bead and/or track a bead.
- addition of moieties to a gel bead after gel bead formation may be advantageous.
- addition of an oligonucleotide (e.g., barcoded oligonucleotide) after gel bead formation may avoid loss of the species during chain transfer termination that can occur during polymerization.
- smaller precursors (e.g., monomers or cross linkers that do not comprise side chain groups and linked moieties)
- functionalization after gel bead synthesis can minimize exposure of species (e.g., oligonucleotides) to be loaded with potentially damaging agents (e.g., free radicals) and/or chemical environments.
- the generated gel may possess an upper critical solution temperature (UCST) that can permit temperature driven swelling and collapse of a bead.
- Such functionality may aid in oligonucleotide (e.g., a primer) infiltration into the bead during subsequent functionalization of the bead with the oligonucleotide.
- Post-production functionalization may also be useful in controlling loading ratios of species in beads, such that, for example, the variability in loading ratio is minimized.
- Species loading may also be performed in a batch process such that a plurality of beads can be functionalized with the species in a single batch.
- an acrydite moiety linked to a precursor, another species linked to a precursor, or a precursor itself comprises a labile bond, such as chemically, thermally, or photo-sensitive bonds, e.g., disulfide bonds, UV sensitive bonds, or the like.
- the bead may also comprise the labile bond.
- the labile bond may be, for example, useful in reversibly linking (e.g., covalently linking) species (e.g., barcodes, primers, etc.) to a bead.
- a thermally labile bond may include a nucleic acid hybridization based attachment, e.g., where an oligonucleotide is hybridized to a complementary sequence that is attached to the bead, such that thermal melting of the hybrid releases the oligonucleotide, e.g., a barcode containing sequence, from the bead or microcapsule.
- labile bonds may result in the generation of a bead capable of responding to varied stimuli.
- Each type of labile bond may be sensitive to an associated stimulus (e.g., chemical stimulus, light, temperature, etc.) such that release of species attached to a bead via each labile bond may be controlled by the application of the appropriate stimulus.
- Such functionality may be useful in controlled release of species from a gel bead.
- another species comprising a labile bond may be linked to a gel bead after gel bead formation via, for example, an activated functional group of the gel bead as described above.
- barcodes that are releasably, cleavably or reversibly attached to the beads described herein include barcodes that are released or releasable through cleavage of a linkage between the barcode molecule and the bead, or that are released through degradation of the underlying bead itself, allowing the barcodes to be accessed or accessible by other reagents, or both.
- the barcodes that are releasable as described herein may sometimes be referred to as being activatable, in that they are available for reaction once released.
- an activatable barcode may be activated by releasing the barcode from a bead (or other suitable type of partition described herein).
- Other activatable configurations are also envisioned in the context of the described methods and systems.
- labile bonds that may be coupled to a precursor or bead include an ester linkage (e.g., cleavable with an acid, a base, or hydroxylamine), a vicinal diol linkage (e.g., cleavable via sodium periodate), a Diels-Alder linkage (e.g., cleavable via heat), a sulfone linkage (e.g., cleavable via a base), a silyl ether linkage (e.g., cleavable via an acid), a glycosidic linkage (e.g., cleavable via an amylase), a peptide linkage (e.g., cleavable via a protease), or a phosphodiester linkage (e.g., cleavable via a nuclease).
- Species that do not participate in polymerization may also be encapsulated in beads during bead generation (e.g., during polymerization of precursors). Such species may be entered into polymerization reaction mixtures such that generated beads comprise the species upon bead formation. In some cases, such species may be added to the beads after formation.
- Such species may include, for example, oligonucleotides, reagents for a nucleic acid amplification reaction (e.g., primers, polymerases, dNTPs, co-factors (e.g., ionic co-factors)) including those described herein, reagents for enzymatic reactions (e.g., enzymes, co-factors, substrates), or reagents for nucleic acid modification reactions such as polymerization, ligation, or digestion. Trapping of such species may be controlled by the polymer network density generated during polymerization of precursors, control of ionic charge within the gel bead (e.g., via ionic species linked to polymerized species), or by the release of other species. Encapsulated species may be released from a bead upon bead degradation and/or by application of a stimulus capable of releasing the species from the bead.
- Beads may be of uniform size or heterogeneous size.
- the diameter of a bead may be about 1 µm, 5 µm, 10 µm, 20 µm, 30 µm, 40 µm, 50 µm, 60 µm, 70 µm, 80 µm, 90 µm, 100 µm, 250 µm, 500 µm, or 1 mm.
- a bead may have a diameter of at least about 1 µm, 5 µm, 10 µm, 20 µm, 30 µm, 40 µm, 50 µm, 60 µm, 70 µm, 80 µm, 90 µm, 100 µm, 250 µm, 500 µm, 1 mm, or more.
- a bead may have a diameter of less than about 1 µm, 5 µm, 10 µm, 20 µm, 30 µm, 40 µm, 50 µm, 60 µm, 70 µm, 80 µm, 90 µm, 100 µm, 250 µm, 500 µm, or 1 mm. In some cases, a bead may have a diameter in the range of about 40-75 µm, 30-75 µm, 20-75 µm, 40-85 µm, 40-95 µm, 20-100 µm, 10-100 µm, 1-100 µm, 20-250 µm, or 20-500 µm.
- beads are provided as a population or plurality of beads having a relatively monodisperse size distribution. Where it may be desirable to provide relatively consistent amounts of reagents within partitions, maintaining relatively consistent bead characteristics, such as size, can contribute to the overall consistency.
- the beads described herein may have size distributions that have a coefficient of variation in their cross-sectional dimensions of less than 50%, less than 40%, less than 30%, less than 20%, and in some cases less than 15%, less than 10%, or even less than 5%.
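The coefficient of variation mentioned above is the standard deviation of the measured dimension divided by its mean. The following is a minimal sketch of how a bead population could be checked against one of the listed thresholds. The function names, the use of the sample standard deviation, and the diameter values are illustrative assumptions and are not taken from the disclosure.

```python
import statistics


def coefficient_of_variation(diameters_um):
    """Return the coefficient of variation (sample std dev / mean) in percent."""
    if len(diameters_um) < 2:
        raise ValueError("need at least two measurements")
    mean = statistics.mean(diameters_um)
    if mean <= 0:
        raise ValueError("mean diameter must be positive")
    return 100.0 * statistics.stdev(diameters_um) / mean


def is_relatively_monodisperse(diameters_um, max_cv_percent=15.0):
    """True if the CV is below the chosen threshold (e.g., 50, 30, 15, 10, or 5)."""
    return coefficient_of_variation(diameters_um) < max_cv_percent


# Hypothetical cross-sectional diameters in micrometers (not measured data).
diameters = [52.0, 54.5, 53.1, 55.2, 51.8, 53.9]
cv = coefficient_of_variation(diameters)
print(f"CV = {cv:.2f}%  monodisperse at 5%: {is_relatively_monodisperse(diameters, 5.0)}")
```

The threshold is left as a parameter because the text lists several alternative limits rather than a single required value.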
- Beads may be of any suitable shape. Examples of bead shapes include, but are not limited to, spherical, non-spherical, oval, oblong, amorphous, circular, cylindrical, and variations thereof.
- the beads may be degradable, disruptable, or dissolvable spontaneously or upon exposure to one or more stimuli (e.g., temperature changes, pH changes, exposure to particular chemical species or phase, exposure to light, reducing agent, etc.).
- a bead may be dissolvable, such that material components of the beads are solubilized when exposed to a particular chemical species or an environmental change, such as a change in temperature or a change in pH.
- a gel bead is degraded or dissolved at elevated temperature and/or in basic conditions.
- a bead may be thermally degradable such that when the bead is exposed to an appropriate change in temperature (e.g., heat), the bead degrades.
- Degradation or dissolution of a bead bound to a species may result in release of the species from the bead.
- a degradable bead may comprise one or more species with a labile bond such that, when the bead/species is exposed to the appropriate stimuli, the bond is broken, and the bead degrades.
- the labile bond may be a chemical bond (e.g., covalent bond, ionic bond) or may be another type of physical interaction (e.g., van der Waals interactions, dipole-dipole interactions, etc.).
- a crosslinker used to generate a bead may comprise a labile bond.
- the labile bond can be broken, and the bead degraded.
- the disulfide bonds of the cystamine can be broken and the bead degraded.
- a degradable bead may be useful in more quickly releasing an attached species (e.g., an oligonucleotide, a barcode sequence, a primer, etc.) from the bead when the appropriate stimulus is applied to the bead as compared to a bead that does not degrade.
- the species may have greater mobility and accessibility to other species in solution upon degradation of the bead.
- a species may also be attached to a degradable bead via a degradable linker (e.g., disulfide linker).
- the degradable linker may respond to the same stimuli as the degradable bead, or the two degradable species may respond to different stimuli.
- a barcode sequence may be attached, via a disulfide bond, to a polyacrylamide bead comprising cystamine.
- Upon exposure of the barcoded bead to a reducing agent, the bead degrades, and the barcode sequence is released upon breakage of both the disulfide linkage between the barcode sequence and the bead and the disulfide linkages of the cystamine in the bead.
- a degradable bead may be introduced into a partition, such as a droplet of an emulsion or a well, such that the bead degrades within the partition and any associated species (e.g., oligonucleotides) are released within the droplet when the appropriate stimulus is applied.
- the free species e.g., oligonucleotides
- a polyacrylamide bead comprising cystamine and linked, via a disulfide bond, to a barcode sequence, may be combined with a reducing agent within a droplet of a water-in-oil emulsion.
- the reducing agent breaks the various disulfide bonds resulting in bead degradation and release of the barcode sequence into the aqueous, inner environment of the droplet.
- heating of a droplet comprising a bead-bound barcode sequence in basic solution may also result in bead degradation and release of the attached barcode sequence into the aqueous, inner environment of the droplet.
- degradation may refer to the disassociation of a bound or entrained species from a bead, both with and without structurally degrading the physical bead itself.
- entrained species may be released from beads through osmotic pressure differences due to, for example, changing chemical environments.
- alteration of bead pore sizes due to osmotic pressure differences can generally occur without structural degradation of the bead itself.
- an increase in pore size due to osmotic swelling of a bead can permit the release of entrained species within the bead.
- osmotic shrinking of a bead may cause a bead to better retain an entrained species due to pore size contraction.
- For degradable beads, it may be desirable to avoid exposing such beads to the stimulus or stimuli that cause such degradation prior to the desired time, in order to avoid premature bead degradation and issues that arise from such degradation, including, for example, poor flow characteristics and aggregation.
- beads comprise reducible cross-linking groups, such as disulfide groups
- reducing agents e.g., DTT or other disulfide cleaving reagents.
- treatment of the beads described herein will, in some cases, be provided free of reducing agents, such as DTT.
- reducing agent free (or DTT free) enzyme preparations in treating the beads described herein.
- enzymes include, e.g., polymerase enzyme preparations, reverse transcriptase enzyme preparations, ligase enzyme preparations, as well as many other enzyme preparations that may be used to treat the beads described herein.
- the terms “reducing agent free” or “DTT free” preparations can refer to a preparation having less than 1/10th, less than 1/50th, and even less than 1/100th of the lower ranges for such materials used in degrading the beads.
- the reducing agent free preparation will typically have less than 0.01 mM, 0.005 mM, 0.001 mM DTT, 0.0005 mM DTT, or even less than 0.0001 mM DTT. In many cases, the amount of DTT will be undetectable.
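The relationship between the "reducing agent free" fractions and the millimolar limits above can be illustrated with a short calculation. This is a sketch under stated assumptions: the 0.1 mM lower bound is taken from the lowest reducing agent concentration listed elsewhere in this description for bead degradation, and the function name and parameters are hypothetical.

```python
def is_reducing_agent_free(dtt_mm, degradation_lower_bound_mm=0.1, fold_reduction=10):
    """Return True if the DTT concentration (mM) is below 1/fold_reduction
    of the assumed lower bound used for degrading the beads."""
    if fold_reduction <= 0:
        raise ValueError("fold_reduction must be positive")
    threshold_mm = degradation_lower_bound_mm / fold_reduction
    return dtt_mm < threshold_mm


# Hypothetical enzyme preparations (concentrations in mM, not measured data).
print(is_reducing_agent_free(0.005))                       # below 1/10th of 0.1 mM
print(is_reducing_agent_free(0.005, fold_reduction=100))   # not below 1/100th of 0.1 mM
```

With a 0.1 mM lower bound, the 1/10th and 1/100th fractions correspond to limits of roughly 0.01 mM and 0.001 mM, which is consistent with the values listed in the text.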
- a stimulus may be used to trigger degradation of the bead, which may result in the release of contents from the bead.
- a stimulus may cause degradation of the bead structure, such as degradation of the covalent bonds or other types of physical interaction.
- These stimuli may be useful in inducing a bead to degrade and/or to release its contents. Examples of stimuli that may be used include chemical stimuli, thermal stimuli, optical stimuli (e.g., light) and any combination thereof, as described more fully below.
- Numerous chemical triggers may be used to trigger the degradation of beads. Examples of these chemical changes may include, but are not limited to pH-mediated changes to the integrity of a component within the bead, degradation of a component of a bead via cleavage of cross-linked bonds, and depolymerization of a component of a bead.
- a bead may be formed from materials that comprise degradable chemical crosslinkers, such as BAC or cystamine. Degradation of such degradable crosslinkers may be accomplished through a number of mechanisms.
- a bead may be contacted with a chemical degrading agent that may induce oxidation, reduction or other chemical changes.
- a chemical degrading agent may be a reducing agent, such as dithiothreitol (DTT).
- reducing agents may include β-mercaptoethanol, (2S)-2-amino-1,4-dimercaptobutane (dithiobutylamine or DTBA), tris(2-carboxyethyl) phosphine (TCEP), or combinations thereof.
- a reducing agent may degrade the disulfide bonds formed between gel precursors forming the bead, and thus, degrade the bead.
- a change in pH of a solution, such as an increase in pH, may trigger degradation of a bead.
- exposure to an aqueous solution, such as water, may trigger hydrolytic degradation, and thus degradation of the bead.
- Beads may also be induced to release their contents upon the application of a thermal stimulus.
- a change in temperature can cause a variety of changes to a bead. For example, heat can cause a solid bead to liquefy. A change in heat may cause melting of a bead such that a portion of the bead degrades. In other cases, heat may increase the internal pressure of the bead components such that the bead ruptures or explodes. Heat may also act upon heat-sensitive polymers used as materials to construct beads.
- changes in temperature or pH may be used to degrade thermo-sensitive or pH-sensitive bonds within beads.
- chemical degrading agents may be used to degrade chemical bonds within beads by oxidation, reduction or other chemical changes.
- a chemical degrading agent may be a reducing agent, such as DTT, wherein DTT may degrade the disulfide bonds formed between a crosslinker and gel precursors, thus degrading the bead.
- a reducing agent may be added to degrade the bead, which may or may not cause the bead to release its contents.
- reducing agents may include dithiothreitol (DTT), β-mercaptoethanol, (2S)-2-amino-1,4-dimercaptobutane (dithiobutylamine or DTBA), tris(2-carboxyethyl) phosphine (TCEP), or combinations thereof.
- the reducing agent may be present at a concentration of about 0.1 mM, 0.5 mM, 1 mM, 5 mM, or 10 mM.
- the reducing agent may be present at a concentration of at least about 0.1 mM, 0.5 mM, 1 mM, 5 mM, 10 mM, or greater.
- the reducing agent may be present at a concentration of at most about 0.1 mM, 0.5 mM, 1 mM, 5 mM, or 10 mM.
- nucleic acid molecules e.g., primer, e.g., barcoded oligonucleotide
- the pre-defined concentration of the primer is limited by the process of producing oligonucleotide bearing beads.
- the multiple beads within a single partition may comprise different reagents associated therewith.
- the flow and frequency of the different beads into the channel or junction may be controlled to provide for the desired ratio of microcapsules from each source, while ensuring the desired pairing or combination of such beads into a partition with the desired number of cells.
- the single-cell sequencing methods disclosed herein can include obtaining sequence information by molecularly indexing the targets from one or more of the isolated cells from the sample.
- the targets can, for example, be polynucleotides.
- the polynucleotides can be, for example, DNA or RNA (e.g., mRNA).
- Molecular indexing (sometimes referred to as molecular barcoding or molecular tagging) can be used, for example, for high-sensitivity single molecular counting.
- a collection of identical polynucleotide molecules from one or more of the isolated cells can be attached to a diverse set of labels for molecular indexing.
- Each of the labels can comprise, for example, a molecular label (also known as molecular index).
- the method comprises molecularly indexing the polynucleotides from 1, 5, 10, 15, 20, 25, 50, 75, 100, 200, 300, 400, 500, 600, 800, 1000, 2000, 5000, 10000 cells, or a number or a range between any two of these values.
- Molecular indexing can, for example, be used to identify the origin of an indexed polynucleotide (e.g., indicating the tissue, cell and/or container from which the indexed polynucleotide originates) and/or to inform the identity of the indexed polynucleotide.
- the container can be a plate, a well, a droplet, a partition, a tube, or the like.
- the indexed polynucleotide can comprise, for example, the polynucleotide to be indexed (e.g., an mRNA, a genomic DNA, or a cDNA) and a label region comprising one or more labels.
- the indexed polynucleotide can further comprise one or more of a universal PCR region and an adaptor region.
- the indexed polynucleotide can be situated in a container (e.g., a microtiter plate), and the indexed polynucleotide can further include a unique label (e.g., a sample barcode) for identifying the plate in which the indexed polynucleotide is situated.
- An example of the region for identifying the plate is a plate index.
- the label region can, in some embodiments, comprise two or more labels.
- the label region can include a molecular label (also known as a molecular index) and a sample label (also known as a sample barcode).
- the length of the labels can vary.
- the label e.g., the molecular label or the sample label
- the label can be, or be about, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20 nucleotides in length, or longer.
- the molecular label is, or is about, 5 nucleotides in length
- the sample label is, or is about, 5 nucleotides in length.
- the molecular label is, or is about, 10 nucleotides in length
- the sample label is, or is about, 10 nucleotides in length.
- molecularly indexing the polynucleotides comprises generating a molecularly indexed polynucleotide library from one or more of the isolated cells.
- Generating a molecularly indexed polynucleotide library includes generating a plurality of indexed polynucleotides from the one or more of the isolated cells.
- the label region of the first indexed polynucleotide can differ from the label region of the second indexed polynucleotide by at least one, two, three, four, or five nucleotides.
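The requirement that two label regions differ by at least a given number of nucleotides can be expressed as a minimum Hamming distance between equal-length sequences. The following is a minimal sketch, assuming labels of equal length. The function names and the example label sequences are hypothetical and are not taken from the disclosure.

```python
from itertools import combinations


def hamming_distance(label_a, label_b):
    """Count the positions at which two equal-length label sequences differ."""
    if len(label_a) != len(label_b):
        raise ValueError("labels must have the same length")
    return sum(x != y for x, y in zip(label_a, label_b))


def labels_are_separated(labels, min_differences=2):
    """True if every pair of labels differs by at least min_differences nucleotides."""
    return all(
        hamming_distance(a, b) >= min_differences
        for a, b in combinations(labels, 2)
    )


# Hypothetical 5-nucleotide label regions.
label_set = ["AAAAA", "AACCA", "GGTTA"]
print(labels_are_separated(label_set, min_differences=2))
```

A larger minimum distance makes it less likely that a single sequencing error turns one valid label into another, at the cost of fewer usable labels of a given length.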
- generating a molecularly indexed polynucleotide library includes contacting a plurality of mRNA molecules with a plurality of oligonucleotides including a poly(T) region and a label region; and conducting a first strand synthesis using a reverse transcriptase to produce single-strand labeled cDNA molecules each comprising a cDNA region and a label region, wherein the plurality of mRNA molecules includes at least two mRNA molecules of different sequences and the plurality of oligonucleotides includes at least two oligonucleotides of different sequences.
- generating the molecularly indexed polynucleotide library can further comprise amplifying the single-strand labeled cDNA molecules to produce double-strand labeled cDNA molecules; and conducting nested PCR on the double-strand labeled cDNA molecules to produce labeled amplicons.
- the method can include generating an adaptor-labeled amplicon.
- Molecular indexing uses nucleic acid barcodes or tags to label individual DNA or RNA molecules. In some embodiments, it involves adding DNA barcodes or tags to cDNA molecules as they are generated from mRNA. Nested PCR can be performed to minimize PCR amplification bias. Adaptors can be added for sequencing using, for example, NGS.
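As an illustration of how molecular labels support single-molecule counting, sequencing reads that share the same sample label, molecular label, and gene can be collapsed into one original molecule, so that PCR duplicates are not counted twice. This is a simplified sketch: the tuple layout of the reads and the gene names in the example are assumptions, and no correction for sequencing errors in the labels is attempted.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct molecular labels per (sample_label, gene).

    reads: iterable of (sample_label, molecular_label, gene) tuples.
    Reads with identical values in all three fields are treated as
    amplification copies of a single original molecule.
    """
    labels_seen = defaultdict(set)
    for sample_label, molecular_label, gene in reads:
        labels_seen[(sample_label, gene)].add(molecular_label)
    return {key: len(labels) for key, labels in labels_seen.items()}


# Hypothetical reads: (sample label, molecular label, gene).
reads = [
    ("CELL1", "AACGT", "PDCD1"),
    ("CELL1", "AACGT", "PDCD1"),  # PCR duplicate of the read above
    ("CELL1", "GGTCA", "PDCD1"),
    ("CELL2", "AACGT", "PDCD1"),
    ("CELL2", "TTAGC", "CD8A"),
]
counts = count_molecules(reads)
print(counts)
```

Here five reads collapse to four counted molecules, because the duplicated read in CELL1 carries the same molecular label as an earlier read.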
- the method provided herein can comprise, prior to obtaining the single cell transcriptomic data, separating a subset of T cells from the population of T cells based on expression of a CD4+ and/or CD8+ exhaustion marker, thereby generating a subset of exhausted T cells and a subset of non-exhausted T cells.
- the CD4+ and/or CD8+ exhaustion marker comprises at least 5 genes selected from the group consisting of genes in Tables 3-6.
- separating comprises using flow cytometry.
- the flow cytometry can be fluorescence activated cell sorting (FACS).
- isolating one or more cells of interest in the enriched cell sample can be performed with a flow cytometer.
- the flow cytometer utilizes fluorescence-activated cell sorting.
- Flow cytometry is a valuable method for the analysis and isolation of cells. As such it has a wide range of diagnostic and therapeutic applications. Flow cytometry can utilize a fluid stream to linearly segregate cells such that they can pass, single file, through a detection apparatus. Individual cells can be distinguished according to their location in the fluid stream and the presence of detectable markers. Cells flow through the focused interrogation point where at least one laser directs a laser beam to a focused point within the channel. The sample fluid containing cells is hydrodynamically focused to a very small core diameter by flowing sheath fluid around the sample stream at a very high volumetric rate.
- the small core diameter can be fewer than 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200 micrometers, or a number or a range between any two of these values.
- the volumetric rate of the sheath fluid can be on the order of at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 times, or a number or a range between any two of these values, the volumetric rate of the sample. This results in very fast linear velocities for the focused cells on the order of meters per second.
- each cell spends a very limited time in the excitation spot, for example fewer than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 microseconds, or a number or a range between any two of these values.
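The link between the linear velocity of the focused cells and the time each cell spends in the excitation spot can be shown with a simple calculation: transit time is the height of the illuminated spot divided by the cell velocity. The spot height and velocity below are hypothetical values chosen only to fall within the orders of magnitude described above, and the function name is an assumption.

```python
def transit_time_us(spot_height_um, velocity_m_per_s):
    """Return the time (microseconds) a cell needs to cross the excitation spot.

    A distance in micrometers divided by a velocity in meters per second
    gives a time in microseconds, so no unit conversion factor is needed.
    """
    if velocity_m_per_s <= 0:
        raise ValueError("velocity must be positive")
    return spot_height_um / velocity_m_per_s


# Hypothetical example: 20 micrometer spot, cells moving at 5 meters per second.
print(transit_time_us(20.0, 5.0))
```

The short transit time is what constrains how quickly the detection electronics must sample scatter and fluorescence signals for each cell.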
- Flow cytometers are analytical tools that enable the characterization of cells on the basis of optical parameters such as light scatter and fluorescence.
- In a flow cytometer, cells in a fluid suspension are passed by a detection region in which the cells are exposed to an excitation light, typically from one or more lasers, and the light scattering and fluorescence properties of the cells are measured.
- Cells or components thereof can be labeled with fluorescent dyes to facilitate detection.
- a multiplicity of different cells or components can be simultaneously detected by using spectrally distinct fluorescent dyes to label the different cells or components.
- a multiplicity of photodetectors, one for each of the scatter parameters to be measured, and one for each of the distinct dyes to be detected are included in the analyzer.
- the data obtained comprise the signals measured for each of the light scatter parameters and the fluorescence emissions.
- the medicament can be an identified TCR, a nucleic acid encoding the TCR, or a cell comprising the TCR or the nucleic acid encoding the TCR described herein.
- the identified TCR, the nucleic acid encoding the TCR, or the cell comprising the TCR or the nucleic acid encoding the TCR described herein can be formulated as a pharmaceutical composition with additional adjuvants or pharmaceutically acceptable carriers or excipients.
- the pharmaceutical composition described herein can comprise a population of cells (e.g., immune cells or T cells) comprising the identified tumor-reactive TCRs.
- a given cell of the population of cells can express a single tumor-reactive TCR of the identified tumor-reactive TCRs.
- each cell of the population of cells can express a tumor-reactive TCR of the identified tumor-reactive TCRs.
- each cell of the population of cells can express a different or a unique tumor-reactive TCR of the identified tumor-reactive TCRs.
- the population of cells can comprise at least 10^5 cells, at least 10^6 cells, at least 10^7 cells, at least 10^8 cells, at least 10^9 cells, at least 10^10 cells, at least 10^11 cells, at least 10^12 cells, at least 10^13 cells, at least 10^14 cells, at least 10^15 cells, at least 10^16 cells, at least 10^17 cells, at least 10^18 cells, at least 10^19 cells, at least 10^20 cells, or more cells.
- a pharmaceutical composition comprising an active agent such as an immune cell comprising the TCRs described herein, in combination with one or more adjuvants can be formulated in conventional manner using one or more physiologically acceptable carriers, comprising excipients, diluents, and/or auxiliaries, e.g., which facilitate processing of the active agents into preparations that can be administered. Proper formulation can depend at least in part upon the route of administration chosen.
- the agent(s) described herein can be delivered to a patient using a number of routes or modes of administration, including oral, buccal, topical, rectal, transdermal, transmucosal, subcutaneous, intravenous, and intramuscular applications, as well as by inhalation.
- the active agents can be formulated for parenteral administration (e.g., by injection, for example bolus injection or continuous infusion) and can be presented in unit dose form in ampoules, pre-filled syringes, small volume infusion or in multi-dose containers with an added preservative.
- the compositions can take such forms as suspensions, solutions, or emulsions in oily or aqueous vehicles, for example solutions in aqueous polyethylene glycol.
- a pharmaceutical composition comprised of the identified TCR, nucleic acid encoding the TCR, or a cell comprising the TCR can further comprise an acceptable additive in order to improve the stability of immune cells in the composition.
- Acceptable additives may not alter the specific activity of the immune cells. Examples of acceptable additives include, but are not limited to, a sugar such as mannitol, sorbitol, glucose, xylitol, trehalose, sorbose, sucrose, galactose, dextran, dextrose, fructose, lactose and mixtures thereof. Acceptable additives can be combined with acceptable carriers and/or excipients such as dextrose.
- examples of acceptable additives include, but are not limited to, a surfactant such as polysorbate 20 or polysorbate 80 to increase stability of the peptide and decrease gelling of the solution.
- the surfactant can be added to the composition in an amount of 0.01% to 5% of the solution. Addition of such acceptable additives increases the stability and half-life of the composition in storage.
- When compositions of the identified TCR, nucleic acid encoding the TCR, or a cell comprising the TCR are considered for use in medicaments or any of the methods provided herein, it is contemplated that the composition can be substantially free of pyrogens such that the composition will not cause an inflammatory reaction or an unsafe allergic reaction when administered to a human patient.
- Testing compositions for pyrogens and preparing compositions substantially free of pyrogens are well understood by one of ordinary skill in the art and can be accomplished using commercially available kits.
- Acceptable carriers can contain a compound that acts as a stabilizing agent, increases or delays absorption, or increases or delays clearance.
- Compounds that act as stabilizing agents include, for example, carbohydrates, such as glucose, sucrose, or dextrans; low molecular weight proteins; compositions that reduce the clearance or hydrolysis of peptides; or excipients or other stabilizers and/or buffers.
- Agents that delay absorption include, for example, aluminum monostearate and gelatin. Detergents can also be used to stabilize or to increase or decrease the absorption of the pharmaceutical composition, including liposomal carriers.
- the compound can be complexed with a composition to render it resistant to acidic and enzymatic hydrolysis, or the compound can be complexed in an appropriately resistant carrier such as a liposome.
- Means of protecting compounds from digestion are known in the art (e.g., Fix (1996) Pharm Res. 13:1760-1764; Samanen (1996) J. Pharm. Pharmacol. 48:119-135; and U.S. Pat. No. 5,391,377).
- the vehicle can be chosen from those known in the art to be suitable, including aqueous solutions or oil suspensions, or emulsions, with sesame oil, corn oil, cottonseed oil, or peanut oil, as well as elixirs, mannitol, dextrose, or a sterile aqueous solution, and similar pharmaceutical vehicles.
- the formulation can also comprise polymer compositions which are biocompatible, biodegradable, such as poly(lactic-co-glycolic)acid. These materials can be made into micro or nanospheres, loaded with drug and further coated or derivatized to provide superior sustained release performance.
- Vehicles suitable for periocular or intraocular injection include, for example, suspensions of therapeutic agent in injection grade water, liposomes and vehicles suitable for lipophilic substances.
- Other vehicles for periocular or intraocular injection are well known in the art.
- compositions for intravenous administration are solutions in sterile isotonic aqueous buffer.
- the composition can also include a solubilizing agent and a local anesthetic such as lidocaine to ease pain at the site of the injection.
- the ingredients are supplied either separately or mixed together in unit dosage form, for example, as a dry lyophilized powder or water free concentrate in a hermetically sealed container such as an ampoule or sachette indicating the quantity of active agent.
- Where the composition is to be administered by infusion, it can be dispensed with an infusion bottle containing sterile pharmaceutical grade water or saline.
- an ampoule of sterile water for injection or saline can be provided so that the ingredients can be mixed prior to administration.
- the active agent can be formulated in aqueous solutions, specifically in physiologically compatible buffers such as Hank’s solution, Ringer’s solution, or physiological saline buffer.
- the solution can contain formulatory agents such as suspending, stabilizing and/or dispersing agents.
- the pharmaceutical composition does not comprise an adjuvant or any other substance added to enhance the immune response.
- the active agents can also be formulated as a depot preparation.
- Such long-acting formulations can be administered by implantation or transcutaneous delivery (for example subcutaneously or intramuscularly), intramuscular injection or use of a transdermal patch.
- the agents can be formulated with suitable polymeric or hydrophobic materials (for example as an emulsion in an acceptable oil) or ion exchange resins, or as sparingly soluble derivatives, for example, as a sparingly soluble salt.
- a pharmaceutical composition can comprise the population of engineered immune cells produced according to any of the methods disclosed herein.
- a pharmaceutical composition can comprise the engineered immune cells comprising the TCRs or nucleic acids encoding the TCRs disclosed herein.
- a pharmaceutical composition can comprise the population of engineered immune cells disclosed herein.
- any of the pharmaceutical compositions disclosed herein can be administered to a subject.
- any of the pharmaceutical compositions disclosed herein can be administered to a subject to treat a disease or condition in a subject in need thereof.
- the subject can be the same subject from which the biological sample is obtained.
- any identified TCR, nucleic acid encoding the TCR, or a cell comprising the TCR disclosed herein can be used in the manufacture of a medicament for treating a disease or a condition (e.g., cancer, autoimmune disease, or infectious disease) in a subject.
- any identified TCR, nucleic acid encoding the TCR, or a cell comprising the TCR disclosed herein can be used in the manufacture of a medicament for treating a cancer in a subject.
- the disease or condition that can be treated with the methods disclosed herein is abnormal growth of cells.
- the disease or condition that can be treated with the methods disclosed herein is cancer.
- the cancer is a malignant cancer.
- the cancer is a benign cancer.
- the cancer is an invasive cancer.
- the cancer is a solid tumor.
- the cancer is a liquid cancer.
- the methods of the disclosure can be used to treat any type of cancer known in the art.
- cancers to be treated by the methods of the present disclosure include melanoma (e.g., metastatic malignant melanoma), renal cancer (e.g., clear cell carcinoma), prostate cancer (e.g., hormone refractory prostate adenocarcinoma), pancreatic adenocarcinoma, breast cancer, colon cancer, lung cancer (e.g., non-small cell lung cancer), esophageal cancer, squamous cell carcinoma of the head and neck, liver cancer, ovarian cancer, cervical cancer, thyroid cancer, glioblastoma, glioma, leukemia, lymphoma, and other neoplastic malignancies.
- a cancer to be treated by the methods of treatment of the present disclosure is selected from the group consisting of carcinoma, squamous carcinoma, adenocarcinoma, sarcomata, endometrial cancer, breast cancer, ovarian cancer, cervical cancer, fallopian tube cancer, primary peritoneal cancer, colon cancer, colorectal cancer, squamous cell carcinoma of the anogenital region, melanoma, renal cell carcinoma, lung cancer, non-small cell lung cancer, squamous cell carcinoma of the lung, stomach cancer, bladder cancer, gall bladder cancer, liver cancer, thyroid cancer, laryngeal cancer, salivary gland cancer, esophageal cancer, head and neck cancer, glioblastoma, glioma, squamous cell carcinoma of the head and neck, prostate cancer, pancreatic cancer
- cancers to be treated by the methods of the present disclosure further include sarcomata (for example, myogenic sarcoma), leukosis, neuroma, melanoma, and lymphoma.
- a cancer to be treated by the methods of the present disclosure is breast cancer.
- a cancer to be treated by the methods of treatment of the present disclosure is triple negative breast cancer (TNBC).
- a cancer to be treated by the methods of treatment of the present disclosure is ovarian cancer.
- a cancer to be treated by the methods of treatment of the present disclosure is colorectal cancer.
- a patient or population of patients to be treated with a pharmaceutical composition of the present disclosure has a solid tumor.
- a solid tumor is a melanoma, renal cell carcinoma, lung cancer, bladder cancer, breast cancer, cervical cancer, colon cancer, gall bladder cancer, laryngeal cancer, liver cancer, thyroid cancer, stomach cancer, salivary gland cancer, prostate cancer, pancreatic cancer, or Merkel cell carcinoma.
- a patient or population of patients to be treated with a pharmaceutical composition of the present disclosure has a hematological cancer.
- the patient has a hematological cancer such as Diffuse large B cell lymphoma (“DLBCL”), Hodgkin’s lymphoma (“HL”), Non-Hodgkin’s lymphoma (“NHL”), Follicular lymphoma (“FL”), acute myeloid leukemia (“AML”), or Multiple myeloma (“MM”).
- a patient or population of patients to be treated has a cancer selected from the group consisting of ovarian cancer, lung cancer, and melanoma.
- cancers that can be prevented and/or treated in accordance with present disclosure include, but are not limited to, the following: renal cancer, kidney cancer, glioblastoma multiforme, metastatic breast cancer; breast carcinoma; breast sarcoma; neurofibroma; neurofibromatosis; pediatric tumors; neuroblastoma; malignant melanoma; carcinomas of the epidermis; leukemias such as but not limited to, acute leukemia, acute lymphocytic leukemia, acute myelocytic leukemias such as myeloblastic, promyelocytic, myelomonocytic, monocytic, erythroleukemia leukemias and myelodysplastic syndrome, chronic leukemias such as but not limited to, chronic myelocytic (granulocytic) leukemia, chronic lymphocytic leukemia, hairy cell leukemia; polycythemia vera; lymphomas such as but not limited to Hod
- cancers include myxosarcoma, osteogenic sarcoma, endotheliosarcoma, lymphangioendotheliosarcoma, mesothelioma, synovioma, hemangioblastoma, epithelial carcinoma, cystadenocarcinoma, bronchogenic carcinoma, sweat gland carcinoma, sebaceous gland carcinoma, papillary carcinoma and papillary adenocarcinomas.
- Cancers include, but are not limited to, B cell cancer, e.g., multiple myeloma, Waldenstrom’s macroglobulinemia, the heavy chain diseases, such as, for example, alpha chain disease, gamma chain disease, and mu chain disease, benign monoclonal gammopathy, and immunocytic amyloidosis, melanomas, breast cancer, lung cancer, bronchus cancer, colorectal cancer, prostate cancer (e.g., metastatic, hormone refractory prostate cancer), pancreatic cancer, stomach cancer, ovarian cancer, urinary bladder cancer, brain or central nervous system cancer, peripheral nervous system cancer, esophageal cancer, cervical cancer, uterine or endometrial cancer, cancer of the oral cavity or pharynx, liver cancer, kidney cancer, testicular cancer, biliary tract cancer, small bowel or appendix cancer, salivary gland cancer, thyroid gland cancer, adrenal gland cancer, osteosarcoma, chondrosarcoma, cancer of hematological
- the cancer whose phenotype is determined by the method of the present disclosure is an epithelial cancer such as, but not limited to, bladder cancer, breast cancer, cervical cancer, colon cancer, gynecologic cancers, renal cancer, laryngeal cancer, lung cancer, oral cancer, head and neck cancer, ovarian cancer, pancreatic cancer, prostate cancer, or skin cancer.
- the cancer is breast cancer, prostate cancer, lung cancer, or colon cancer.
- the epithelial cancer is non-small-cell lung cancer, nonpapillary renal cell carcinoma, cervical carcinoma, ovarian carcinoma (e.g., serous ovarian carcinoma), or breast carcinoma.
- the epithelial cancers may be characterized in various other ways including, but not limited to, serous, endometrioid, mucinous, clear cell, Brenner, or undifferentiated.
- the present disclosure is used in the treatment, diagnosis, and/or prognosis of lymphoma or its subtypes, including, but not limited to, mantle cell lymphoma. Lymphoproliferative disorders are also considered to be proliferative diseases.
- Cancer refers to diseases in which abnormal cells divide out of control and are able to invade other tissues. Cancer cells can spread to other parts of the body through the blood and lymph systems. Cancer can be characterized as a group of diseases involving abnormal cell growth that may begin in any tissue with the potential to invade or spread to other parts of the body. Some cancers can be characterized by their type, e.g., solid cancers, liquid cancers, or based on cellular origin such as hematopoietic cancers, osteosarcoma or lymphoma. Some cancers are known by the tissue of their origin or prevalence, e.g., endometrial cancers are characterized as cancers of the endometrial tissue.
- Some cancers are known by the organ or site of their origin or prevalence, e.g. lung cancer, head and neck cancer. Some cancers may be known by the overproductions of certain proteins, enzymes or biomarkers compared to their counterpart cells or tissues that are not cancerous. For example, certain proteins of viral origin may be associated with certain cancers, such as HPV-16 cancers, where certain proteins, for example HPV-16 E6 and E7 are overexpressed in cancer cells of this type.
- certain antigens, such as KRAS may be highly expressed in certain cancer types, compared to non-cancer cells of the same type, and may be designated as KRAS overexpressing cancers.
- the overexpression of the antigen or the specific protein may be associated with or related to one or more mutations, and the cancer type may be associated with the mutation.
- the wild-type G residue at position 12 of the KRAS amino acid sequence may be mutated to V, D, C, or another amino acid in KRAS-specific cancer cells.
- Certain specific antigens may be specifically expressed in cancer cells of certain cancer types, and not in other cancer types.
- Various cancers are contemplated herein that may not be restricted to a specific cell type, tissue type or organ, or even a certain stage of cancer.
- the TCRs of the present invention are directed to cancer cells that express a cancer antigen, that may be patient specific, which can be found during sequencing of a subject’s genome from biological sample obtained from a cancer cell, cancer site or cancer tissue and compared to a corresponding non-cancer sample from the same subject; wherein the patient-specific antigen may be expressed in the cancer cell, and not on the non-cancer cell of the subject.
- cancer antigens may be cancer specific, where the antigen is reportedly present in the type of cancer observed in multiple patients in the human population who have been diagnosed with the specific cancer.
- certain types of cancers are associated with an antigen, a protein (e.g., a viral protein), or a gene mutation; all forms of cancer are contemplated herein.
- the cancer is a solid cancer. In some cases, the cancer is a liquid / blood cancer.
- the cancer can express or be diagnosed as expressing a tumor antigen.
- the tumor antigen can be a tumor-associated antigen or a tumor-specific antigen.
- the cancer expresses a tumor-associated antigen (TAA). In some cases, the cancer expresses a tumor-specific antigen (TSA).
- the cancer is a cancer expressing or diagnosed as expressing a TAA. In some embodiments, the cancer is a cancer expressing or diagnosed as expressing a TSA.
- the current classification of TAAs can include the following groups: a) Cancer testis (CT) antigen: Since testis cells do not express HLA class I and class II molecules, these antigens may not be recognized by T cells in normal tissues and may therefore be immunologically considered tumor specific.
- CT antigens include members of the MAGE family and NY-ESO-1;
- b) Differentiation antigen: both tumor and normal tissue (from which the tumor originates) may contain these TAAs. Differentiation antigens may be found, for example, in melanoma and normal melanocytes. Many of these melanocyte lineage-associated proteins may be involved in melanin biosynthesis, and therefore these proteins may not be tumor-specific but may still be widely used for immunotherapy of cancer.
- c) Widely expressed TAA: gene-encoded, widely expressed TAAs may be detected in histologically diverse tumors and in many normal tissues, with generally low expression levels. It is possible that many epitopes processed and potentially presented by normal tissues may be below the threshold level of T cell recognition, whereas their overexpression in tumor cells can trigger anticancer responses by breaking previously established tolerance.
- Such TAAs include Her-2/neu, survivin, telomerase, or WT1; d) Tumor-specific antigen: these can include unique TAAs resulting from mutations in normal genes (e.g., beta-catenin, CDK4). Some of these molecular changes can be associated with neoplastic transformation and/or progression. Tumor-specific antigens can generally induce strong immune responses without the risk of an autoimmune response against normal tissues. On the other hand, these TAAs may only be associated with the exact tumor on which they are confirmed and may not be commonly shared among many individual tumors.
- e) TAA resulting from aberrant post-translational modification: such TAAs may result from proteins in the tumor that are neither specific nor overexpressed, but which still have tumor relevance (this relevance is due to post-translational processing that is primarily active in tumors).
- Such TAAs may result from an altered glycosylation pattern, resulting in a tumor producing a novel epitope for MUC1, or from an event such as protein splicing during degradation, which may or may not be tumor specific; and f) Tumor virus protein: these TAAs are viral proteins that may play a key role in the oncogenic process and, because they are foreign proteins (non-human proteins), may be able to trigger T cell responses.
- Non-limiting examples of such proteins include human papilloma type 16 viral proteins, E6 and E7, which are expressed in cervical cancer.
- tumor antigens include, but are not limited to, new antigens expressed during tumorigenesis, products of oncogenes and tumor suppressor genes, overexpressed or abnormally expressed intracellular proteins (e.g., HER2, MUC1, PSA), carcinoembryonic antigen (CEA), tumor viruses (e.g., EBV, HPV, HBV, HCV, HTLV), cancer testis antigens (CTA) (e.g., MAGE family, NY-ESO), oncofetal antigens, altered surface glycolipids and glycoproteins, cell type-specific differentiation antigens (e.g., MART-1), or a derivative thereof.
- the tumor antigens can be selected from the group consisting of NY-ESO-1, Her2/neu, SSX-2, MAGE-C2, MAGE-A1, M-2433-233, MAGE-A10 254-262, KK-LC-1, p53, PRAME, Alpha fetoprotein, HPV6-E6, HPV16-E7, EBV-LMP1, RAS: G12D, RAS: G12C, RAS: G12A, RAS: G12S, RAS: G12R, RAS: G12V, RAS: Q61H, RAS: Q61L, RAS: Q61R, RAS: G13D, TP53: V157G, TP53: V157F, TP53: R248Q, TP53: R248W, TP53: G245S, TP53: Y163C, TP53: G249S, TP53: Y240C, TP53: R1
- the RAS can be KRAS, HRAS, or NRAS.
- tumor-associated antigen or tumor-specific antigen includes antigens from Human Papilloma Virus, Epstein-Barr Virus, Merkel cell polyomavirus, Human Immunodeficiency Virus, Human T-cell Leukemia Virus, Human Herpes Virus 8, Hepatitis B virus, Hepatitis C virus, HCV, HBC, Cytomegalovirus, or from the group of single-point mutated antigens derived from the group consisting of the antigens of ctnnbl gene, casp8 gene, HER2 gene, p53 gene, KRAS gene, NRAS gene, or particular tumor antigens issued or derived from the group consisting of RAS oncogene, BCR-ABL tumor antigens, ETV6-AML1 tumor antigens, melanoma-antigen encoding genes (MAGE), BAGE antigens, GAGE antigens, ssx antigens, ny-eso
- the cancer cells express the tumor antigens, including and not limited to, NY-ESO-1, Her2/neu, SSX-2, MAGE-C2, MAGE-A1, M-2433-233, MAGE-A10 254-262, KK-LC-1, p53, PRAME, Alpha fetoprotein, HPV6-E6, HPV16-E7, EBV-LMP1, RAS: G12D, RAS: G12C, RAS: G12A, RAS: G12S, RAS: G12R, RAS: G12V, RAS: Q61H, RAS: Q61L, RAS: Q61R, RAS: G13D, TP53: V157G, TP53: V157F, TP53: R248Q, TP53: R248W, TP53: G245S, TP53: Y163C, TP53: G249S, TP53: Y
- the cancer is a carcinoma, lymphoma, blastoma, sarcoma, leukemia, squamous cell cancer, lung cancer (including small cell lung cancer, non-small cell lung cancer, adenocarcinoma of the lung, and squamous carcinoma of the lung), cancer of the peritoneum, hepatocellular cancer, gastric or stomach cancer (including gastrointestinal cancer), pancreatic cancer, glioblastoma, cervical cancer, ovarian cancer, liver cancer, bladder cancer, hepatoma, breast cancer, colon cancer, melanoma, endometrial or uterine carcinoma, salivary gland carcinoma, kidney or renal cancer, liver cancer, prostate cancer, vulval cancer, thyroid cancer, hepatic carcinoma, head and neck cancer, colorectal cancer, rectal cancer, soft-tissue sarcoma, Kaposi’s sarcoma, B-cell lymphoma (including low grade/follicular non-Hodgkin’s lymphom
- any type of tumor tissue sample derived from a subject having a cancer, may be suitable.
- This can include solid tumors such as breast, lung, colorectal, prostate, and skin cancers, or hematological malignancies like lymphomas and leukemias. Additionally, fine needle aspirates, biopsies, or resected tumor tissues could be used.
- the method can also be applicable to peripheral blood mononuclear cells (PBMCs), or lymphocytes isolated from the blood, or samples obtained from other body fluids, such as pleural effusions or ascites, which may contain tumor-infiltrating T cells.
- samples from any solid tumor that can be dissociated into single cells would also be suitable for this method.
- the T cell can be obtained from a tissue sample comprising a solid tissue, with non-limiting examples including a tissue from brain, liver, lung, kidney, prostate, ovary, spleen, lymph node (e.g., tonsil), thyroid, thymus, pancreas, heart, skeletal muscle, intestine, larynx, esophagus, and stomach. Additional non-limiting sources include bone marrow, cord blood, tissue from a site of infection, ascites, pleural effusion, spleen tissue, and tumors.
- the T cells can be obtained from a solid tumor lesion from a subject.
- the T cell can be derived or obtained from a healthy donor, from a patient diagnosed with cancer or from a patient diagnosed with an infection.
- the T cells can be isolated from a sample and selected with certain properties by various methods.
- tissue e.g., isolating tumor-infiltrating T cells from tumor tissues
- the tissues may be minced or fragmented to dissociate cells before lysing the red blood cells or depleting the monocytes.
- the source T cells can be tumor-infiltrating lymphocytes (TILs), e.g., tumor-infiltrating T cells (TITs).
- a TIL can be isolated from an organ afflicted with a cancer.
- One or more cells can be isolated from an organ with a cancer that can be a brain, heart, lungs, eye, stomach, pancreas, kidneys, liver, intestines, uterus, bladder, skin, hair, nails, ears, glands, nose, mouth, lips, spleen, gums, teeth, tongue, salivary glands, tonsils, pharynx, esophagus, large intestine, small intestine, rectum, anus, thyroid gland, thymus gland, bones, cartilage, tendons, ligaments, suprarenal capsule, skeletal muscles, smooth muscles, blood vessels, blood, spinal cord, trachea, ureters
- TILs can be from a brain, heart, liver, skin, intestine, lung, kidney, eye, small bowel, or pancreas.
- TILs can be from a pancreas, kidney, eye, liver, small bowel, lung, or heart.
- the one or more cells can be pancreatic islet cells, for example, pancreatic β cells.
- a TIL can be from a gastrointestinal cancer.
- the tumor sample can be a surgically removed tumor sample (or a resection sample).
- the tumor sample can be a biopsy sample such as core biopsy, fine needle biopsy sample, or a large needle biopsy sample.
- Example 1 Predictive identification of tumor antigen-reactive T cell receptors from exhausted T cells
- An antigen-agnostic prediction algorithm was developed to identify tumor antigen-reactive TCRs for both CD8+ and CD4+ T cells within a tumor.
- the bioinformatic algorithm takes advantage of molecular signatures which are captured by single cell transcriptome (scGEX) and TCR (scTCR) sequencing on TILs as input.
- the model was trained on seven tumor samples across three cancer types: non-small cell lung cancer (NSCLC), colorectal cancer (CRC), and head and neck cancer.
- the algorithm described herein achieved a median positive predictive value (PPV) of 40% for CD8+ and 70% for CD4+ top five TCR clones, respectively.
- the algorithm achieved a median PPV of 70%.
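The PPV figures above can be read as the fraction of predicted clones that were confirmed reactive, summarized by the median across samples. The following is a minimal sketch of that calculation; the sample names and validation outcomes are hypothetical placeholders, not study data.

```python
from statistics import median

def ppv(validated):
    """Positive predictive value: confirmed-reactive clones / predicted clones."""
    if not validated:
        raise ValueError("no predicted clones to evaluate")
    return sum(validated) / len(validated)

# Hypothetical validation outcomes for the top five predicted clones per sample
# (True = confirmed tumor antigen-reactive). Illustrative values only.
top5_by_sample = {
    "sample_1": [True, True, False, False, False],
    "sample_2": [True, False, False, False, False],
    "sample_3": [True, True, True, False, False],
}

per_sample_ppv = {name: ppv(flags) for name, flags in top5_by_sample.items()}
median_ppv = median(per_sample_ppv.values())
print(per_sample_ppv, median_ppv)
```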
- the top 10 clones targeted a median of four unique somatic mutations presented by three unique HLA alleles (Table 7).
- the optimal performance of the prediction algorithm paves a way for developing a personalized adoptive T cell therapy to treat solid tumors.
- the end-to-end algorithm is mainly composed of four sequential steps.
- TILs were partitioned into either CD4+ or CD8+ population by unsupervised clustering using a 99-gene signature (Table 2) as variable genes with six principal components (PC) at 0.1 resolution.
- the 99-gene signature was derived from analyzing in-house data on two tumor samples. Due to the lower mRNA expression level of the CD4 gene and a high dropout rate in scGEX, using CD4 and CD8 gene expression alone was not sufficient to separate the two populations (FIG. 1).
- FIG. 2 depicts the separation of TILs into CD4+ and CD8+ populations as described. Each dot represents a T cell.
- FIG. 2A shows Uniform Manifold Approximation and Projection (UMAP) on two clusters formed by using the 99-gene signature.
- FIGs. 2B-2C show gene expression levels for CD4 and CD8A in normalized UMI counts, respectively. In the second step, exhausted CD8 and CD4 cells were identified and defined.
- Tumor-reactive T cells have been shown to carry an exhausted (or dysfunctional) phenotype in TILs.
- Two exhaustion gene signatures were established: one for CD8+ cells, composed of 20 genes (Table 3), and one for CD4+ cells, also composed of 20 genes (Table 4). The two exhaustion gene signatures were derived from the seven tumor samples that have been fully characterized and analyzed. Exhaustion (EX) scores were calculated for each T cell using the exhaustion gene signatures. The EX score is calculated as the sum of normalized UMI counts for the signature gene set. CD4+ cells with EX scores greater than or equal to (≥) 13 were considered exhausted follicular helper cells (CD4.EX.FH). Similarly, CD8+ cells with EX scores ≥ 13 were considered to be in an exhausted state (CD8.EX).
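The EX score described above is a plain sum over signature genes followed by a cutoff. The sketch below illustrates this under stated assumptions: the gene names are hypothetical stand-ins for the signatures in Tables 3 and 4, the cutoff of 13 is treated as inclusive in line with the "greater than or equal to" wording, and a signature gene with no detected counts contributes zero.

```python
EX_CUTOFF = 13.0  # cutoff stated in the text; treated as inclusive (assumption)

def ex_score(norm_counts, signature):
    """Sum of normalized UMI counts over the signature genes.

    norm_counts: dict mapping gene name -> normalized UMI count for one cell.
    Genes absent from the cell's profile contribute 0 (assumption).
    """
    return sum(norm_counts.get(gene, 0.0) for gene in signature)

def is_exhausted(norm_counts, signature, cutoff=EX_CUTOFF):
    """True when the cell's EX score reaches the cutoff."""
    return ex_score(norm_counts, signature) >= cutoff

# Hypothetical three-gene signature and cell profile, for illustration only.
signature = ["GENE_A", "GENE_B", "GENE_C"]
cell = {"GENE_A": 6.0, "GENE_B": 5.5, "GENE_C": 2.0, "UNRELATED": 9.0}

print(ex_score(cell, signature), is_exhausted(cell, signature))
```

Genes outside the signature (here `UNRELATED`) do not affect the score, so the same cell profile can be scored against the CD8+ or CD4+ signature independently.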
- T cells were further classified by CD8+ and CD4+ subtypes including exhausted CD8+ T cells (CD8.EX), cytotoxic CD8+ T cells (CD8.Cytotoxic), resident memory CD8+ T cells (CD8.RM), effector or CD8’s re-expressing CD45RA (CD8.EFF.EMRA), effector memory CD8’s (CD8.EM), stressed CD8+ T cells (CD8.Stressed), TH1-like CD4+ T cells (CD4.TH1-like), proliferating T cells (Prolif), regulatory T cells (CD4.Treg.1-3), naive and/or central memory CD4+ T cells (CD4.Naive.
- FIGs. 3A, 3B Exhausted T cells defined by using exhaustion scores were cross confirmed with annotated T cell subtypes (FIG. 4A). Exhaustion signature scores assigned to each T cell were compared across subtypes (FIG. 4B).
- GSEA gene set enrichment analysis
- Upon activation by an antigen, a T cell proliferates, a process known as clone expansion. If more than half of the cells with the same TCR clonotype were predicted to be exhausted by either of the calculation approaches described above, the clonotype was labeled as an EX clone and was selectable as a therapeutic candidate TCR.
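The majority rule above can be sketched as a per-clonotype count. This is an illustrative sketch only: the clonotype identifiers are hypothetical, and "more than half" is read as a strict majority, so an exact 50/50 split does not qualify.

```python
from collections import defaultdict

def label_ex_clones(cells):
    """Label each clonotype as an EX clone when a strict majority of its cells are exhausted.

    cells: iterable of (clonotype_id, is_exhausted) pairs, one pair per T cell.
    Returns a dict mapping clonotype_id -> bool.
    """
    total = defaultdict(int)
    exhausted = defaultdict(int)
    for clonotype, cell_is_exhausted in cells:
        total[clonotype] += 1
        if cell_is_exhausted:
            exhausted[clonotype] += 1
    # Integer comparison avoids floating-point issues at exactly one half.
    return {c: 2 * exhausted[c] > total[c] for c in total}

# Hypothetical per-cell calls for two clonotypes.
cells = [
    ("clonotype_1", True), ("clonotype_1", True), ("clonotype_1", False),
    ("clonotype_2", True), ("clonotype_2", False),
]
print(label_ex_clones(cells))
```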
- Table 1 above displays an output example of exhaustion score calculations for CD8+ clones described above using two signature gene sets with either sum (20-gene signature) or GSEA (61-gene signature) method.
- In step three, several quality control (QC) checks and filters were implemented for the selectable EX clonotypes (FIGs. 7A-7C). QCs included unique alpha and beta chain pairing status in a clonotype, matches to public TCRs, and expression of innate immune cell marker genes, etc.
- clonotypes with paired TCR-alpha and TCR-beta chain are retained, while clonotypes with only alpha or beta chain captured were filtered out; clonotypes are matched to public TCRs (e.g. VDJdb collections, https://vdjdb.crd3.net).
- Candidate exhausted clones should not match to public TCRs that recognize antigens derived from non-oncogenic pathogens;
- Candidate exhausted T cells should carry the least of Mucosal-Associated Invariant T (MAIT) cell features, such as expressing innate immune cell markers (e.g.
- the bioinformatics pipeline was executed with little human intervention and thus can be implemented into standard operations.
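The step-three QC filters lend themselves to automation as a single predicate per clonotype. The sketch below is one possible reading, with several assumptions: clonotypes are matched to public TCRs by their alpha/beta CDR3 pair, any expression of a flagged innate marker gene fails the clone, and all identifiers and marker names are hypothetical placeholders (the source does not list the marker genes here).

```python
def passes_qc(clone, pathogen_public_tcrs, innate_markers):
    """Apply three QC checks to one candidate EX clonotype.

    clone: dict with 'alpha' and 'beta' CDR3 strings (None if the chain was not
           captured) and 'expressed_genes', an iterable of gene names.
    pathogen_public_tcrs: set of (alpha, beta) pairs from a public TCR database
           annotated as recognizing non-oncogenic pathogen antigens.
    innate_markers: set of innate immune cell marker gene names.
    """
    # 1. Keep only clonotypes with both chains captured.
    if not clone.get("alpha") or not clone.get("beta"):
        return False
    # 2. Reject matches to public TCRs against non-oncogenic pathogens.
    if (clone["alpha"], clone["beta"]) in pathogen_public_tcrs:
        return False
    # 3. Reject clones expressing innate (MAIT-like) marker genes.
    if innate_markers & set(clone.get("expressed_genes", ())):
        return False
    return True

# Hypothetical inputs for illustration.
public = {("CDR3A_VIRAL", "CDR3B_VIRAL")}
markers = {"INNATE_MARKER_1", "INNATE_MARKER_2"}
candidates = [
    {"alpha": "CDR3A_1", "beta": "CDR3B_1", "expressed_genes": ["GENE_A"]},
    {"alpha": None, "beta": "CDR3B_2", "expressed_genes": []},
    {"alpha": "CDR3A_VIRAL", "beta": "CDR3B_VIRAL", "expressed_genes": []},
    {"alpha": "CDR3A_4", "beta": "CDR3B_4", "expressed_genes": ["INNATE_MARKER_1"]},
]
kept = [c for c in candidates if passes_qc(c, public, markers)]
print(len(kept))
```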
- Five associated gene signatures are proprietary: a 99-gene signature for computational separation of CD4+ and CD8+ cells, 20-gene and 88-gene signatures to narrow down CD4.EX.FH cells, and 20-gene and 61-gene signatures to define CD8.EX cells.
- Table 2 shows the 99 gene signature to separate CD4+ T cells and CD8+ T cells.
- Table 3 shows the CD8+ 20-gene exhaustion signature for calculating exhaustion score.
- Table 4 shows the CD4+ 20-gene exhaustion signature for calculating exhaustion score.
- Table 5 shows the CD8+ 61-gene exhaustion signature for calculating GSEA scores.
- Table 6 shows the CD4+ 88-gene exhaustion signature for calculating GSEA scores.
- Tumor samples were obtained 24 hours following surgery for single-cell analyses.
- K2EDTA blood samples were obtained from the same subjects.
- Patient peripheral blood mononuclear cells (PBMCs) were isolated by Ficoll density-gradient centrifugation and B cells were isolated by positive isolation (Miltenyi) and treated with EBV virus (ATCC) for immortalization.
- B cell cultures were expanded in flasks to harvest personalized cellular screening material.
- PBMCs were frozen for whole-exome sequencing (WES) and HLA typing.
- FFPE formalin fixation and paraffin embedding
- Tumor single cell suspensions were thawed and stained with: Live/Dead Blue, anti-CD45, anti-CD14, anti-CD19, anti-CD3, anti-CD4 and anti-CD8.
- CD3+ TILs and CD45-negative cells were sorted for single-cell 10x VDJ capture and gene expression or WES, respectively.
- Shearing conditions were 220 W peak incident power, 380 s of duration, 220 PIP, 25 DF, 50 CPB, 55 AIP at a setpoint of 10°C in 50 µL.
- samples were processed using a library preparation kit according to the manufacturer’s protocol.
- Samples were processed for total RNAseq using total RNA prep ligation and a ribosomal depletion kit.
- Tumor and normal tissue were diluted to 200 ng, PBMC and CD45-negative samples were diluted to 10 ng, and RNAseq library prep was performed according to the manufacturer’s protocol for this kit. Both WES and RNAseq libraries were pooled, normalized, and sequenced.
- tumor samples were sequenced at a depth of 250M - 300M paired reads per sample, and normal adjacent, PBMC and expanded TIL samples were sequenced at 166M - 300M paired reads per sample.
- RNAseq tumor samples were sequenced with 120M - 166M paired reads per sample.
- GEX and V(D)J libraries were sequenced for QC and library normalization and then sequenced using a targeting depth of 20,000 reads/cell and 5000 reads/cell respectively. Samples were sequenced using 200 cycles.
- Somatic variants were called on the basis of tumor and normal WES using an ensemble of seven different mutation calling algorithms: VarDict (version 1.4.6), Strelka (version 1.0.15), VarScan2 (version 2.3.9), Mutect2 (from the GATK version 3.5 bundle), Atlas Indel2 (version 1.4.3), Seurat (version 2.6), and Platypus (version 0.8.1).
- the three sequencing datasets were then realigned using HaplotypeCaller (from the GATK version 3.5 bundle), specifying the candidate mutations (union of the seven call sets) as known variants.
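The union of the seven call sets used as candidate mutations can be sketched with ordinary set operations. This is an illustrative sketch only: the variant tuples and per-caller results are hypothetical, and the per-variant support count is an extra convenience that the text does not describe.

```python
from collections import Counter

def union_with_support(call_sets):
    """Combine per-caller variant calls into one candidate set.

    call_sets: dict mapping caller name -> iterable of variants, where each
               variant is a hashable tuple such as (chrom, pos, ref, alt).
    Returns (union_of_all_calls, Counter of how many callers reported each variant).
    """
    support = Counter()
    for variants in call_sets.values():
        for variant in set(variants):  # de-duplicate within one caller
            support[variant] += 1
    return set(support), support

# Hypothetical calls from three of the callers named in the text.
call_sets = {
    "VarDict": [("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")],
    "Strelka": [("chr1", 100, "A", "T")],
    "Mutect2": [("chr1", 100, "A", "T"), ("chr3", 300, "C", "A")],
}
candidates, support = union_with_support(call_sets)
print(sorted(candidates), support[("chr1", 100, "A", "T")])
```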
- the variants were filtered according to the following features: the level of read support in the tumor WES data, the presence of variant reads in the normal WES data, read orientation bias, adequacy of coverage in the normal WES sample, the presence of neighboring (+/-30nt) variants (somatic or germline), and read quality bias observed in mutation-supporting reads.
- RNA-Seq expression levels of all genes and transcripts were quantified in transcripts per million (TPM) using RSEM (version 1.2.31). The overall expression of each somatic variant was calculated as the product of the RSEM-derived transcript expression (summing across all overlapping protein-coding transcripts) and the fraction of RNA-Seq reads supporting the variant. Variants with zero supporting RNA reads were still considered as valid mutations (and counted toward tumor mutation burden) but were not considered for inclusion in vaccine. RNA-Seq was additionally processed using STAR-Fusion (version 2.5.1.b) to identify transcript fusions (requiring both junction support and spanning read pairs).
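The variant expression described above is the summed TPM of overlapping protein-coding transcripts multiplied by the fraction of RNA-Seq reads carrying the variant. A minimal sketch follows; the function names and example numbers are hypothetical, and a site with no RNA coverage is treated as zero expression by assumption.

```python
def variant_expression(transcript_tpms, variant_reads, total_reads):
    """Summed transcript TPM x fraction of RNA-Seq reads supporting the variant."""
    if total_reads == 0:
        return 0.0  # no RNA coverage at the site (assumption: treat as unexpressed)
    return sum(transcript_tpms) * (variant_reads / total_reads)

def eligible_for_inclusion(variant_reads):
    """Variants with zero supporting RNA reads remain valid mutations but are not included."""
    return variant_reads > 0

# Hypothetical example: two overlapping transcripts, 5 of 20 reads carry the variant.
expr = variant_expression([10.0, 30.0], variant_reads=5, total_reads=20)
print(expr, eligible_for_inclusion(5), eligible_for_inclusion(0))
```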
- CTAs Cancer-Testis Antigens
- T cells were first computationally classified into two populations, CD4+ T cells or CD8+ T cells, based on an unsupervised clustering method using a set of defined gene signatures. An exhaustion score was then assigned to each CD4+ and CD8+ T cell using the exhaustion gene signature for CD4 and CD8, respectively.
- the exhaustion gene signature for CD8 comprises 20 genes; the exhaustion gene signature for CD4 also comprises 20 genes.
- The clonotype (TCR) of a given T cell was matched to its scGEX profile via a single-cell barcode. Within each clonotype, if the median exhaustion score is equal to or larger than a predefined cutoff value, the clonotype is labeled as an exhausted (EX) clone, which is selectable for downstream screening and validation.
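The barcode join and median rule above can be sketched as follows. The barcodes, clonotype identifiers, and scores are hypothetical; cells without a captured clonotype are skipped by assumption, and the cutoff is passed in as a parameter because the text only calls it predefined.

```python
from collections import defaultdict
from statistics import median

def ex_clones_by_median(cell_scores, cell_clonotype, cutoff):
    """Label clonotypes whose median per-cell exhaustion score reaches the cutoff.

    cell_scores: dict mapping cell barcode -> exhaustion score (from scGEX).
    cell_clonotype: dict mapping cell barcode -> clonotype id (from scTCR).
    """
    scores_by_clone = defaultdict(list)
    for barcode, score in cell_scores.items():
        clonotype = cell_clonotype.get(barcode)
        if clonotype is not None:  # skip cells with no TCR captured (assumption)
            scores_by_clone[clonotype].append(score)
    return {c: median(s) >= cutoff for c, s in scores_by_clone.items()}

# Hypothetical barcodes and scores.
cell_scores = {"BC1": 14.0, "BC2": 12.0, "BC3": 15.0, "BC4": 5.0, "BC5": 20.0}
cell_clonotype = {"BC1": "clonotype_1", "BC2": "clonotype_1",
                  "BC3": "clonotype_1", "BC4": "clonotype_2"}
print(ex_clones_by_median(cell_scores, cell_clonotype, cutoff=13.0))
```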
- the prioritized list of mutations was analyzed and adjusted to generate peptide sequence lists for synthesis.
- sequences were designed with the mutated amino acid centrally placed and flanked by 13 wild-type residues to create 25-mer peptides. Any 25-mer sequence with a proline residue at or near the C-terminus was elongated.
- For frameshift mutations longer than 30 amino acids, the sequence was segmented into overlapping 30-mer peptides (25 amino acid overlap), avoiding sequences with C-terminal or near C-terminal proline residues.
- the top 80 sequences from the Class I short peptides were sorted based on HLA binding preferences to ensure a variety of binding preferences would be present in the final pools.
- 15-mer peptide libraries representing the entire length of CTA proteins were prepared using overlapping sequences (11 amino acid overlap), avoiding sequences with C-terminal or near C-terminal proline residues.
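The overlapping peptide libraries described above (30-mers with a 25 amino acid overlap, 15-mers with an 11 amino acid overlap) amount to a sliding window whose step is the length minus the overlap. The sketch below illustrates only that tiling. The handling of the final window (shifted to end at the C-terminus) is an assumption, the input sequence is synthetic, and the proline-avoidance rule is not implemented.

```python
def tile_peptides(sequence, length, overlap):
    """Split a protein sequence into overlapping peptides of a fixed length."""
    if overlap >= length:
        raise ValueError("overlap must be smaller than peptide length")
    if len(sequence) <= length:
        return [sequence]
    step = length - overlap
    last_start = len(sequence) - length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # assumption: final peptide ends at the C-terminus
    return [sequence[s:s + length] for s in starts]

# Synthetic 40-residue sequence, for illustration only.
protein = "ACDEFGHIKLMNPQRSTVWY" * 2

frameshift_30mers = tile_peptides(protein, length=30, overlap=25)  # step of 5
cta_15mers = tile_peptides(protein, length=15, overlap=11)         # step of 4
print(len(frameshift_30mers), len(cta_15mers))
```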
- Peptides were synthesized using standard Fmoc-based solid-phase peptide synthesis (SPPS) with a capping strategy. For 15-mer and 25-30-mer peptides, Rink amide resin was used, while Wang or HMPB resins preloaded with the C-terminal amino acid were used for short peptides. Synthesis was performed on a library peptide synthesizer in tip-mode or a library synthesizer in plate mode at a 2.5 µmol scale, using 10 equivalents of Fmoc-protected amino acids, 10 equivalents of HCTU, and 100 equivalents of DIPEA per coupling. Each cycle from 1-10 involved two 30-minute couplings, and cycles 11-30 involved three 30-minute couplings. All cycles were followed by a 30-minute capping step with acetic anhydride. Deprotection was conducted using 20-40% piperidine. Upon completion, resins were washed with dichloromethane and dried under vacuum.
- cleavage cocktail (90% trifluoroacetic acid, 5% thioanisole, 3% ethanedithiol, and 2% anisole) was added to each well and incubated at room temperature for 4 hours. The solution was filtered from the resin and collected in deep-well plates, then combined with cold diethyl ether. After centrifugation, the diethyl ether was decanted and peptides were dissolved/suspended in water, frozen in liquid nitrogen, and lyophilized until dry.
- Mass spectra were analyzed to confirm the desired sequence. Any peptide where the desired sequence was not the most prominent species as determined from absorbance integration of the chromatogram, underwent resynthesis as described below. Peptides that passed this QC were carried on to screening without further purification.
- Failed peptides were resynthesized using a microwave synthesizer at 50 µmol scale with default methods and acetic anhydride capping. Peptides were cleaved using a cleaving oven. The cleaved peptides were separated, washed, centrifuged, and dissolved in water, then lyophilized. The completed peptide resins were washed 3 times with dichloromethane and dried under vacuum.
- the peptides were cleaved from the resin on a cleaving oven using 5 mL of cleavage cocktail (90% trifluoroacetic acid, 5% thioanisole, 3% ethanedithiol, and 2% anisole) at 40°C for 1 hour.
- the peptide solution was then separated from the resin, 40 mL of cold diethyl ether was added, and the resulting suspension was centrifuged at 4000 rpm for 5 minutes at 4°C.
- the diethyl ether was decanted and the pellet was triturated with another 40 mL of diethyl ether, centrifuged, and decanted.
- the pellet was dissolved in 5 mL of water, frozen in liquid nitrogen, and lyophilized. Once dry, a sample of the material was dissolved to 0.1 mg/mL in 10% DMSO in water with 1 mM TCEP and analyzed by LC-MS as previously described. Peptides passing the previously described QC were carried on
- Short Class I mutanome peptides were dissolved in 30 µL of 12.5 mM TCEP in DMSO and then diluted with 270 µL of serum-free RPMI media. 100 µL of 10 short peptides were combined to make a pool of 10 in which the concentration of each peptide was approximately 0.1 mg/mL. Unpooled material was retained for peptide deconvolution assays.
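The dilution arithmetic behind the pooling step can be worked through as below. This sketch assumes that 100 µL of each of the 10 peptides goes into a pool, and that volumes are additive. The unpooled concentration and peptide mass are back-calculated from the stated pooled concentration; they are implied values, not figures given in the text.

```python
# Values stated in the text
TCEP_STOCK_MM = 12.5           # mM TCEP in the DMSO used for dissolution
DMSO_UL = 30                   # µL of TCEP/DMSO per peptide
MEDIA_UL = 270                 # µL of serum-free RPMI added per peptide
ALIQUOT_UL = 100               # µL taken per peptide (assumed: from each of the 10)
PEPTIDES_PER_POOL = 10
POOLED_CONC_MG_PER_ML = 0.1    # approximate per-peptide concentration in the pool

# Per-peptide solution before pooling
peptide_volume_ul = DMSO_UL + MEDIA_UL                        # 300 µL
dmso_fraction = DMSO_UL / peptide_volume_ul                   # DMSO content (v/v)
tcep_mm = TCEP_STOCK_MM * DMSO_UL / peptide_volume_ul         # TCEP after dilution

# Pooling dilutes each peptide by the number of aliquots combined
pool_volume_ul = ALIQUOT_UL * PEPTIDES_PER_POOL               # 1000 µL
dilution_factor = pool_volume_ul / ALIQUOT_UL                 # 10-fold

# Back-calculated (implied) values, not stated in the text
implied_unpooled_mg_per_ml = POOLED_CONC_MG_PER_ML * dilution_factor
implied_peptide_mass_mg = implied_unpooled_mg_per_ml * peptide_volume_ul / 1000

print(dmso_fraction, tcep_mm, implied_unpooled_mg_per_ml, implied_peptide_mass_mg)
```

Under these assumptions each unpooled peptide sits at roughly 1 mg/mL in 10% DMSO with 1.25 mM TCEP, which corresponds to about 0.3 mg of peptide per well.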
- TCR alpha and beta variable chain regions were ordered as gene fragments from a vendor.
- the following upstream and downstream TCR constant overlaps were added to the 5’ and 3’ ends of each variable region, respectively.
- Beta upstream (5’) overlap
- the codon that is split between the variable and constant TCR region directly 3’ of the variable region was handled differently for alpha and beta sequences. Since the split beta codon is always a “GAG,” this codon is built into the beta downstream (3’) overlap above as the first three nucleotides (shown in bold). The split alpha codon varies between TCR sequences, however. For some samples, the split codon was omitted from the TCR sequence. For the remaining patients, the sequence “AAT” was included as the split codon sequence since the identity of the split amino acid is asparagine in the majority of TCRs. For these samples, the “AAT” codon was included in the alpha gene fragment sequence directly before the alpha downstream (3’) overlap sequence above.
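The split-codon handling above can be sketched as a small assembly helper. The overlap and variable-region sequences below are short hypothetical placeholders, since the actual overlap sequences are not reproduced in this section. The function encodes only the two rules stated in the text: the beta downstream overlap begins with GAG, and the alpha fragment optionally carries AAT directly before its downstream overlap.

```python
BETA_SPLIT_CODON = "GAG"    # always present as the first codon of the beta 3' overlap
ALPHA_SPLIT_CODON = "AAT"   # asparagine, used when the alpha split codon is included

def assemble_fragment(variable, upstream_overlap, downstream_overlap,
                      chain, include_alpha_split_codon=True):
    """Return upstream overlap + variable region (+ split codon) + downstream overlap."""
    if chain == "beta":
        if not downstream_overlap.startswith(BETA_SPLIT_CODON):
            raise ValueError("beta downstream overlap must start with GAG")
        return upstream_overlap + variable + downstream_overlap
    if chain == "alpha":
        split = ALPHA_SPLIT_CODON if include_alpha_split_codon else ""
        return upstream_overlap + variable + split + downstream_overlap
    raise ValueError("chain must be 'alpha' or 'beta'")

# Hypothetical placeholder sequences, far shorter than real overlaps.
beta = assemble_fragment("CCCCCC", "AAAA", "GAGTTTT", chain="beta")
alpha = assemble_fragment("CCCCCC", "AAAA", "TTTT", chain="alpha")
alpha_omitted = assemble_fragment("CCCCCC", "AAAA", "TTTT", chain="alpha",
                                  include_alpha_split_codon=False)
print(beta, alpha, alpha_omitted)
```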
- TCR alpha and beta template sequences which include primer sequences, the T7 promoter sequence, the Kozak sequence, a leader peptide sequence, and the TCR variable and constant regions
- PCR polymerase chain reaction
- alpha and beta chain gene fragment sequences (10 ng) were mixed with the corresponding upstream (3’) and downstream (5’) sequences in a ~1 : 0.25 : 0.1 (gene fragment : upstream sequence : downstream sequence) molar ratio along with buffer (final concentration: 1x), dNTPs (final concentration: 0.5 mM), and DNA Polymerase (final amount: 0.4 units) and brought to 20 µL with nuclease-free water. The reactions were mixed.
- the poly-A tail template sequence was included in the reverse amplification PCR primer and was thus incorporated at the amplification PCR step.
- the alpha and beta overlap-extension PCR products (4 µL) were mixed with the forward and reverse primers (final concentration: 100 nM each), buffer (final concentration: 1x), dNTPs (final concentration: 0.2 mM), and DNA Polymerase (final amount: 1 unit) and brought to 50 µL with nuclease-free water. After an initial denaturation step at 98°C for 30 s, 35 rounds of PCR were performed (98°C for 30 s and 72°C for 2 min) followed by a 10 min hold at 72°C. Unpurified amplification PCR product was either immediately used for in vitro transcription (IVT) or stored at -20°C overnight prior to IVT.
- IVTT in vitro transcription
- IVT was performed separately for TCR alpha and beta chains to generate RNA transcripts that are capped co-transcriptionally with a capping kit. Briefly, unpurified alpha and beta amplification PCR products (6 µL) were mixed with NTPs (final concentrations: 6 mM ATP and 5 mM each of UTP, CTP, and GTP), capping reagent (final concentration: 4 mM), reaction buffer (final concentration: 1X), and T7 RNA Polymerase Mix (amount: 2 µL) for a final reaction volume of 20 µL. The reactions were incubated in a thermocycler at 37°C for 2 h. After 2 h, 2 µL of RNase-free DNase I was added to each well and the reaction was incubated at 37°C for another 15 min. IVT products were either immediately purified or stored overnight at -20°C prior to purification.
- RNA-containing supernatant was transferred to a clean PCR plate. Representative purified RNA products were tested via electrophoresis to confirm correct sizing. TCR alpha and beta RNA chains were each brought to a final concentration of 1 µg/µL and paired TCR alpha and beta RNA chains were combined in a 1:1 ratio (1 µg/µL combined concentration; 0.5 µg/µL each chain) and stored at -80°C until further use in co-culture assays.
- TCR RNA stocks containing both the alpha and beta RNA chain were electroporated into Jurkat NFAT luciferase cells (derived from product JI 601; engineered to knock-out TRAC, TRBC, and B2M and overexpress CD8) using an electroporation system with 4 mm gap 96-well plates. Briefly, cells were washed twice with serum-free media and resuspended at 5x10^6 cells/mL in media. Cells were electroporated in 200 µL at a ratio of 4 µg mixed-chain TCR RNA to 1x10^6 cells at 280 V for 10 ms (1 pulse).
- Cells were diluted in RPMI + 10% FBS + 200 µg/mL hygromycin B and left to recover overnight. The following day, cells were stained with LIVE/DEAD Fixable Blue dye and PE anti-mouse TCR β chain antibody and run on a flow cytometer to assess TCR expression. Generally, an average of >80% expression across all TCRs for a patient was considered acceptable for screening.
- Plasmids containing class I (alpha chain) and class II (alpha and beta chain) patient HLA sequences were obtained.
- the HLA plasmids were PCR amplified with the following primer sequences using a DNA Polymerase kit from NEB. Forward primer: TGGGCGCGTTATTTATCGGAGTTGCAGTTG; Reverse primer:
- the poly-A tail template sequence was included in the reverse PCR primer and was thus incorporated in this step.
- HLA plasmids (2 µL of 50 ng/µL) were mixed with the forward and reverse primers (final concentration: 1.2 µM each), reaction buffer (final concentration: 1x), dNTPs (final concentration: 0.3 mM), and DNA Polymerase (final amount: 1 unit) and brought to 50 µL with nuclease-free water. The reactions were mixed. After an initial denaturation step at 98°C for 50 s, 28 rounds of PCR were performed (98°C for 20 s, 69.8°C for 60 s, and 72°C for 35 s) followed by a 3 min hold at 72°C.
- HLA PCR product was then purified with selection beads in a process similar to TCR RNA purification with the following exception: 27 µL of beads were added to the 50 µL of unpurified PCR product. Purified HLA DNA was either immediately used for in vitro transcription (IVT) or stored at -20°C overnight prior to IVT.
- IVT in vitro transcription
- IVT was performed with purified HLA DNA to generate RNA transcripts that are capped co-transcriptionally with anti-reverse cap analog (ARCA) using a capping kit. Briefly, purified HLA DNA (6 µL) was mixed with 2X NTP/ARCA (final concentration: 1X), reaction buffer (final concentration: 1X), and manufacturer enzyme mix (final concentration: 1X) for a final reaction volume of 20 µL. The reactions were gently mixed and then incubated in a thermocycler at 37°C for 1 h. After 1 h, 1 µL of T7 GTP (final concentration: ~1.5 mM) was added to each well and the reaction was incubated at 37°C for another 1 h.
- HLA RNA purification was performed with selection beads using the same protocol as TCR RNA purification. Representative purified RNA products were tested via electrophoresis to confirm correct sizing. HLA RNA was brought to a final concentration of 1 µg/µL and stored at -80°C until further use in co-culture assays.
- HLA RNA stocks were electroporated into K562 cells using an electroporation system with 4 mm gap 96-well plates. Briefly, cells were washed twice with serum-free media and resuspended at 5x10^6 cells/mL in media. Cells were electroporated in 200 µL at a ratio of 2 µg each HLA RNA chain (alpha chain for class I and alpha + beta chain for class II) to 1x10^6 cells at 200 V for 8 ms (3 pulses with 400 ms intervals). Cells were diluted in RPMI + 10% FBS (R10) and left to recover overnight.
- R10 10% FBS
- each donor’s TCR sequences were screened for reactivity to peptides representing the donor’s tumor mutanome.
- mixed-chain TCR RNA was electroporated into Jurkat NFAT cells as described above.
- Jurkat cells expressing individual TCRs were diluted to a concentration of 1.75x10^6 live cells/mL in R10 media in 8-channel reservoirs, assuming 30% electroporation-induced cell death. Reservoirs were sealed with semi-permeable membranes and rested at 37°C until assay setup.
- K562 cells were electroporated with donor HLA RNA and combined into pools for screening. On the day of the screening assay, HLA RNA was electroporated into K562 cells as described above. Post-electroporation, K562 cells expressing individual class I or II patient HLA were diluted to a concentration of 1.75x10^6 live cells/mL in R10 media, assuming 30% electroporation-induced cell death. Subsequently, HLA-expressing K562 cells were combined into class I or II pools at equal volumes, with ~3 different class I or II HLA constituting each pool.
- K562-HLA pools I and II were dispensed into quadrants 1/2 and 3/4 of the 384-well plates, respectively.
- Media without APC cells was pipetted into the columns where PHA-L would be added to assess TCR expression.
- 10 µL/well TCR-expressing Jurkat NFAT luciferase cells were dispensed into each well.
- TCR-expressing Jurkats were dispensed into only two quadrants
- K562-HLA cells were used, TCR-expressing Jurkats were dispensed into all 4 quadrants of the 384-well assay plates to provide for duplicate measurements for each condition.
- the peptide pools, CTA pepmixes, and PHA-L and media controls were then added to the 384-well assay plates. Briefly, 10 µL/well of peptide or control samples were dispensed into all 4 quadrants of the 384-well assay plates. Note that in cases where EBV-B cells served as the APC, tips were changed between quadrants 1/2 and 3/4 to avoid contamination between TCRs. Plates were covered and incubated overnight (~16 h) at 37°C and 5% CO2.
- Luciferase substrate was prepared according to the manufacturer’s instructions and added at a 1:1 ratio to each well. Luminescence was measured after ~5 min using a plate reader. Raw data from each well were normalized to the corresponding average “no peptide” signal for each TCR, yielding a fold-change signal for each condition. Fold-change NFAT luciferase signal was plotted in heat map format and reactive conditions were identified as those having greater than ~2-fold signal.
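The normalization described above can be sketched as follows. This is a minimal illustration, not the applicants' analysis code: the function names, the dictionary layout, and the exact 2.0 threshold (the text says only "greater than approximately 2-fold") are assumptions.

```python
from statistics import mean

def fold_changes(raw, no_peptide_key="no peptide"):
    """Normalize one TCR's wells to its average "no peptide" signal.

    raw maps a condition name to a list of replicate luminescence values.
    The dictionary layout and key name are hypothetical.
    """
    baseline = mean(raw[no_peptide_key])
    return {
        cond: mean(values) / baseline
        for cond, values in raw.items()
        if cond != no_peptide_key
    }

def reactive_conditions(raw, threshold=2.0):
    """Return the conditions whose fold-change exceeds the threshold."""
    return {c: fc for c, fc in fold_changes(raw).items() if fc > threshold}

# Made-up luminescence values for a single TCR (duplicate wells).
example = {
    "no peptide": [1000.0, 1200.0],
    "pool_1": [1100.0, 1100.0],
    "pool_2": [5500.0, 5500.0],
}
hits = reactive_conditions(example)
print(hits)
```

The same fold-change table could also serve the deconvolution step, where the individual peptide with the highest fold-change is taken as the reactive neoantigen.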
- peptide deconvolution was performed to identify the neoantigen responsible for TCR activation. Briefly, Jurkats expressing TCRs where reactivity was observed were cocultured with either EBV-B cells or the reactive K562-HLA cell pool and each peptide within the reactive peptide pool individually, with the reactive peptide pool serving as the positive control.
- Jurkat and APC concentrations were kept the same in the peptide deconvolution (~5.8x10^6 cells/mL each) and individual long and short crude peptides were tested at final concentrations of 16.6 µg/mL and 3.3 µg/mL, respectively (same as the individual concentrations within the peptide pools used in the screen). Plates were covered and incubated overnight (~16 h) at 37°C and 5% CO2. The next day, plates were removed from the incubator and luciferase activity was assayed and calculated in the same manner as for the initial screen. Specific reactivities were determined by identifying the neoantigen with the highest fold-change peptide signal.
- a head-to-head comparison of three functional avidity (FA) readouts was run in Jurkat and/or primary CD8 T cell systems to identify which assay produced the highest sensitivity in a high-throughput format.
- FA functional avidity
- NFAT activation in the form of luminescence and percentage of CD69 expressing cells were compared to primary CD8 T cell interferon gamma (IFN-γ) secretion measured via ELISA and percentage of CD69 expressing cells.
- IFN-γ primary CD8 T cell interferon gamma
- each validated CD8 TCR was assessed by evaluating its functional avidity (FA) across a dose titration of antigen presented by the HLA restriction element.
- the FA of each TCR was assessed by co-culturing Jurkat or primary T cells transiently expressing validated TCRs with T2 cells transiently expressing donor-matched HLA in the presence of a dose titration of purified cognate antigen.
- T2 cells were electroporated to express the TCR’s matched HLA restriction element.
- T2 cells were electroporated as previously described for K562 and diluted into R10 media to a concentration of 2.5x10^6 cells/mL. Cells were transferred to an appropriate flask size and rested at 37°C, 5% CO2 overnight.
- Jurkat cells were electroporated, as described above, to express an individual validated CD8 TCR. After electroporation, Jurkat cells were diluted in R10 media in an 8-channel reservoir to a concentration of 2.5x10^6 cells/mL and rested at 37°C, 5% CO2 for 3 hours. After one hour, T2 cells were counted and if needed, diluted in R10 media to a concentration of 2.5x10^6 cells/mL and transferred to an 8-channel reservoir (each T2-HLA was transferred to the row number that corresponds to the matched TCR location). 10 µL of T2 cells were dispensed into all four quadrants of a 384-well plate.
- the plate was transferred to an incubator at 37°C, 5% CO2 while peptides were being prepared.
- Each stock of peptide was dissolved in dimethyl sulfoxide (DMSO) + 0.1 M tris(2-carboxyethyl)phosphine (TCEP) to a concentration of 10 mM.
- DMSO dimethyl sulfoxide
- TCEP tris(2-carboxyethyl)phosphine
- Both mutant and wild-type versions of TCR-specific purified peptides were diluted to a concentration of 20 µM in R10 media in the first column of a 96-well U-bottom plate and titrated 1:10 for 11 points with the 12th point containing only media.
- 20 µL/well of peptide was added to the plate containing T2 cells.
- the plate was placed at 37°C, 5% CO2 for two hours. After the peptide pulse, 10 µL/well of the prepared Jurkat cells were added to the plate, resulting in a final top concentration of peptide of 10 µM. Plates were incubated for approximately 16 hours overnight at 37°C, 5% CO2.
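The dilution arithmetic in the preceding steps can be checked with a short sketch. It assumes the peptide concentrations are micromolar and uses the well volumes stated above (10 µL T2 cells, 20 µL peptide, 10 µL Jurkat cells); the function names are illustrative only.

```python
def titration_series(top_um=20.0, dilution_factor=10.0, n_points=11):
    """Concentrations (µM) in the 96-well dilution plate: 11 points at 1:10,
    plus a 12th media-only well."""
    series = [top_um / dilution_factor ** i for i in range(n_points)]
    series.append(0.0)  # 12th point contains only media
    return series

def final_concentration(stock_um, peptide_ul=20.0, t2_ul=10.0, jurkat_ul=10.0):
    """Peptide concentration in the assay well after all additions."""
    total_ul = peptide_ul + t2_ul + jurkat_ul
    return stock_um * peptide_ul / total_ul

plate = titration_series()
final = [final_concentration(c) for c in plate]
print(final[0])  # top concentration in the assay well, in µM
```

Under these assumptions the 20 µM top well is diluted twofold in the 40 µL assay volume, which agrees with the stated final top concentration of 10 µM.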
- the surface staining antibody cocktail was prepared by diluting the following antibodies in staining buffer: AF488 anti-human CD3 (clone UCHT1), PE anti-mouse TCR β chain (clone H57-597), BV785 anti-human CD8 (clone SK1) and APC anti-human CD69 (clone FN50).
- Cells were then washed twice with stain buffer and 10 µL/well of 1% PFA diluted in PBS was added and incubated at room temperature in the dark for 10 min. Following the incubation, cells were washed twice with stain buffer and resuspended in a final volume of 20 µL/well. Cells were analyzed by flow cytometry in high-throughput mode, collecting 15 µL of sample.
- SSC-A vs FSC-A population of interest
- SSC-W vs SSC-H and FSC-W vs FSC-H live cells
- SSC-A vs L/D live cells
- EC50 values were calculated by fitting a 3-parameter dose-response curve to the percentage of CD69-positive cells across the titrated peptide concentrations. In some cases where titration curves did not saturate at the upper baseline, the functional avidity assay was repeated with a top peptide concentration of 100 µM.
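A 3-parameter dose-response fit of this kind can be sketched in plain Python. The source does not name the fitting software or the exact model, so the following assumes the common form with bottom, top, and EC50 as free parameters and the Hill slope fixed at 1. For each trial EC50 on a log-spaced grid, bottom and top follow from ordinary least squares, and the EC50 with the lowest residual is kept.

```python
import math

def fit_three_parameter(conc, response, n_grid=2001):
    """Fit response = bottom + (top - bottom) * c / (c + EC50).

    Assumes a Hill slope of 1. conc may include 0 (media-only well).
    Returns a dict with ec50, bottom and top.
    """
    positive = [c for c in conc if c > 0]
    lo = math.log10(min(positive)) - 1
    hi = math.log10(max(positive)) + 1
    n = len(conc)
    mean_r = sum(response) / n
    best = None
    for i in range(n_grid):
        ec50 = 10 ** (lo + (hi - lo) * i / (n_grid - 1))
        # For a fixed EC50 the model is linear in f = c / (c + EC50).
        f = [c / (c + ec50) for c in conc]
        mean_f = sum(f) / n
        sff = sum((x - mean_f) ** 2 for x in f)
        if sff == 0:
            continue
        slope = sum((x - mean_f) * (y - mean_r) for x, y in zip(f, response)) / sff
        bottom = mean_r - slope * mean_f
        sse = sum((bottom + slope * x - y) ** 2 for x, y in zip(f, response))
        if best is None or sse < best[0]:
            best = (sse, ec50, bottom, bottom + slope)
    _, ec50, bottom, top = best
    return {"ec50": ec50, "bottom": bottom, "top": top}

# Synthetic %CD69+ data generated from known parameters (not patent data).
conc = [10.0 / 10 ** i for i in range(11)] + [0.0]
resp = [5.0 + 75.0 * c / (c + 0.01) for c in conc]
fit = fit_three_parameter(conc, resp)
print(fit)
```

A grid search is used here only to keep the sketch dependency-free; a nonlinear least-squares routine from a statistics package would be the more usual choice.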
- tumor samples were obtained from human donors and processed as described in Example 2. Briefly, the screened TCR clones were categorized based on exhaustion score and assessed for antigen specificity (FIG. 8). The exhaustion gene signature scores were calculated as described in Example 1. As shown in FIGs. 9A-9B, clones which demonstrated scores above the indicated cutoff were selected. The results of the tumor antigen reactivity screen are shown using squares (reactive) or asterisks (non-reactive). The antigen reactivity analysis of the selected clones demonstrated that the clones were specific for diverse CD8 and CD4 tumor antigens (FIGs. 10A-10D).
- top 10 clones were prioritized as shown in FIG. 11. Briefly, clones that demonstrated higher clonality, a higher exhaustion score, and positivity for proliferation gene signatures were prioritized. Clones that had dual alpha or beta chains or demonstrated a Treg phenotype were deprioritized. Following validation via NFAT activation as described in Example 2, positive predictive value (PPV) results are shown in FIGs. 12A-12C and summarized in Table 7. The PPV calculation is based on the top 10 clones ranked by prioritization criteria, which may consist of five CD8 and five CD4 clones, depending on availability. If fewer than five CD8 clones were available, additional CD4 clones were included to reach up to 10 clones.
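The top-10 selection rule and the PPV calculation can be illustrated with a small sketch. It assumes the clones have already been ranked by the prioritization criteria and that PPV is the fraction of tested clones confirmed reactive in validation; the clone identifiers and function names are hypothetical.

```python
def select_top_clones(ranked_cd8, ranked_cd4, per_lineage=5, total=10):
    """Take up to five CD8 clones, then fill with CD4 clones up to ten in total."""
    cd8 = ranked_cd8[:per_lineage]
    cd4 = ranked_cd4[: total - len(cd8)]
    return cd8 + cd4

def positive_predictive_value(selected, validated_reactive):
    """Fraction of the selected clones that were confirmed reactive."""
    if not selected:
        return None
    hits = sum(1 for clone in selected if clone in validated_reactive)
    return hits / len(selected)

# Hypothetical donor with only three CD8 clones available.
ranked_cd8 = ["cd8_a", "cd8_b", "cd8_c"]
ranked_cd4 = ["cd4_%d" % i for i in range(1, 9)]
selected = select_top_clones(ranked_cd8, ranked_cd4)
ppv = positive_predictive_value(selected, {"cd8_a", "cd4_1", "cd4_2", "cd4_7"})
print(selected, ppv)
```

In this example the three CD8 clones are topped up with seven CD4 clones, and four of the ten selected clones validate.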
- Table 8A provides detailed validation statistics on selectable clones and top CD8 clones.
- Table 8B provides detailed validation statistics on selectable clones and top CD4 clones.
- the number of mutations denotes the number of antigen mutants which were recognized among the selectable or top clones.
- the number of HLA denotes the number of HLA alleles which were recognized among the selectable or top clones.
- the scr-TCR number denotes the number of selectable or top 5 clones tested.
- Antigen capture of the TCRs was also assessed. This was performed via single cell sequencing using a library of barcoded antigens and selecting for antigen-specific TCRs for further analysis. As shown in FIGs. 13A-13C, top 5 CD8 TCR ranking captured 2 antigens and the top 5 CD8 TCR ranking captured 2 antigens. Taken together, these data demonstrate that the process described herein can effectively identify clinically relevant TCR clones that can respond effectively to tumor antigens.
- ICI-based immune therapy has been approved as frontline therapy and/or widely used as neoadjuvant therapy for multiple cancer types.
- ICI treatment can alter the tumor immune landscape in the tumor microenvironment. ICI can promote T cell function by re-invigorating exhausted T cells, specifically progenitor exhausted T cells. Thus, the exhaustion phenotype and scores of T cells can be affected by prior ICI treatment.
- the process was adjusted by using the upper quantile value of exhaustion score and/or GSEA score instead of using the median value. If the upper quantile value of exhaustion score and/or GSEA score of a TCR clonotype was equal or greater than the cut off value, then this TCR clonotype was defined as an exhausted clone.
- the fixed cutoff values (13 for CD4+ exhaustion score; 13 for CD8+ exhaustion score; 0.2 for CD4+ GSEA score; 0.3 for CD8+ GSEA score) used were the same as described in Examples 1-3.
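The adjusted clone-level rule can be sketched as below. The cutoffs are those stated above. Two points are assumptions: "upper quantile" is read as the 75th percentile, and "and/or" is exposed as a switch between requiring either score or both scores to meet the cutoff.

```python
from statistics import quantiles

# Fixed cutoffs stated in the text.
CUTOFFS = {
    "CD4": {"exhaustion": 13, "gsea": 0.2},
    "CD8": {"exhaustion": 13, "gsea": 0.3},
}

def upper_quartile(values):
    """75th percentile of per-cell scores (assumed meaning of 'upper quantile')."""
    if len(values) == 1:
        return values[0]  # quantiles() needs at least two data points
    return quantiles(values, n=4, method="inclusive")[2]

def is_exhausted_clone(lineage, exhaustion_scores, gsea_scores, require_both=False):
    """Apply the cutoffs to the upper-quartile scores of one TCR clonotype."""
    cut = CUTOFFS[lineage]
    exhaustion_ok = upper_quartile(exhaustion_scores) >= cut["exhaustion"]
    gsea_ok = upper_quartile(gsea_scores) >= cut["gsea"]
    if require_both:
        return exhaustion_ok and gsea_ok
    return exhaustion_ok or gsea_ok

# Made-up per-cell scores for one CD8 clonotype: median 12, upper quartile 14.
exhaustion = [5, 8, 12, 14, 15]
gsea = [0.05, 0.1, 0.1, 0.15, 0.2]
print(is_exhausted_clone("CD8", exhaustion, gsea))
```

The example clonotype would fail a median-based rule (median exhaustion score 12, below 13) but passes the upper-quartile rule, which is the kind of case the adjustment for prior ICI treatment appears intended to capture.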
Abstract
The present disclosure provides methods and compositions to identify exhausted T cells from a population of T cells extracted from a tumor microenvironment in cancer patients. The present disclosure also provides methods and compositions to identify tumor-reactive T-cell receptor (TCR) clonotypes. The present disclosure also provides methods and compositions to identify tumor-reactive TCR clonotypes from the exhausted T cells identified using the methods described herein. The present disclosure utilizes advanced prediction algorithms and next-generation sequencing technologies, like single-cell transcriptome (scGEX) and TCR sequencing (scTCR), to identify TCR clonotypes within a tumor. The methods can distinguish each T cell as either a CD4+ or CD8+ cell based on the expression level of classification genes. The classification can create distinct CD4+ and CD8+ clusters. A CD4+ exhaustion score and/or a CD4+ gene set enrichment analysis (GSEA) score can be calculated for T cells in the CD4+ cluster, and similarly, a CD8+ exhaustion score and/or a CD8+ GSEA score for T cells in the CD8+ cluster. These scores can be calculated based on the expression level of specific exhaustion gene markers. Any T cell meeting or exceeding a set exhaustion score and/or GSEA score threshold can be identified as exhausted. TCR clonotypes can be identified and ranked in the exhausted T cells for further applications.
Description
METHODS FOR IDENTIFYING EXHAUSTED T CELLS AND T-CELL RECEPTORS THEREOF
CROSS-REFERENCE
[0001] This application claims the benefit of U.S. Provisional Application No. 63/639,993 filed on April 29, 2024, which is incorporated herein by reference in its entirety.
BACKGROUND OF THE INVENTION
[0002] T-cell receptor (TCR) specificity for tumor antigens can be a key factor that determines the effectiveness of an immune response against cancer. Identifying TCRs that are reactive to tumor antigens in a patient-specific manner can significantly improve the development of personalized cancer therapies. However, conventional methods for identifying these TCRs can be labor-intensive and challenging. Thus, the development of bioinformatic tools capable of predicting tumor antigen-reactive TCRs can be of great interest, presenting a potential solution to this problem.
SUMMARY OF THE INVENTION
[0003] Tumor antigen-reactive T cell receptors (TCRs) can be found on the surface of a subpopulation of CD8+ and/or CD4+ T cells obtained from a tumor microenvironment (e.g., tumor infiltrating leukocytes (TILs)) that may display an exhaustion phenotype. Recognized herein is a need for improved methods and compositions for identifying such exhausted T cells from TILs and the tumor antigen-reactive TCRs of the identified exhausted T cells. Prediction algorithms that harness the power of next-generation sequencing technologies, like single-cell transcriptome (scGEX) and TCR sequencing (scTCR), can provide improved methods for tumor antigen-reactive TCR identification within a tumor.
[0004] Provided herein is a method of identifying exhausted T cells from a population of T cells obtained from a tumor microenvironment of a subject having a cancer, the method comprising: (a) providing single cell transcriptome data of the population of T cells; (b) classifying each T cell of the population of T cells as a CD4+ T cell or a CD8+ T cell based on an expression level of each classification gene of a set of at least 10 classification genes from the single cell transcriptome data, thereby generating a CD4+ cluster and a CD8+ cluster; and (c) calculating (i) a CD4+ exhaustion score and/or a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 5 CD4+ exhaustion gene markers, and (ii) a CD8+ exhaustion score and/or a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 5 CD8+ exhaustion gene markers, wherein the set of at least 5 CD4+ exhaustion gene markers is different from the set of at least 5 CD8+ exhaustion gene markers, wherein: each T cell within the CD4+ cluster with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell, and each T cell within the CD8+ cluster with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell.
[0005] Also provided herein is a method of identifying exhausted T cells from a population of T cells obtained from a tumor microenvironment of a subject having a cancer, the method comprising: calculating a CD4+ exhaustion score and/or a CD4+ gene set enrichment analysis (GSEA) score for a T cell classified as a CD4+ T cell, wherein the calculating is based on an expression level of each CD4+ exhaustion gene marker of a set of at least 5 CD4+ exhaustion gene markers; wherein the expression level of each CD4+ exhaustion gene marker is from single cell transcriptome data of the population of T cells from the tumor microenvironment of the subject; wherein each T cell classified as a CD4+ T cell with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell.
[0006] In some embodiments, the method further comprises, prior to calculating, classifying a T cell from the population of T cells as a CD4+ T cell based on an expression level of each classification gene of a set of at least 10 classification genes from the single cell transcriptome data, thereby generating a CD4+ cluster.
[0007] Also provided herein is a method of identifying exhausted T cells from a population of T cells obtained from a tumor microenvironment of a subject having a cancer, the method comprising: calculating a CD8+ exhaustion score and/or a CD8+ gene set enrichment analysis (GSEA) score for a T cell classified as a CD8+ T cell, wherein the calculating is based on an expression level of each CD8+ exhaustion gene marker of a set of at least 5 CD8+ exhaustion gene markers; wherein the expression level of each CD8+ exhaustion gene marker is from single cell transcriptome data of the population of T cells from the tumor microenvironment of the subject; wherein each T cell classified as a CD8+ T cell with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell.
[0008] In some embodiments, the method further comprises, prior to calculating, classifying a T cell from the population of T cells as a CD8+ T cell based on an expression level of each classification gene of a set of at least 10 classification genes from the single cell transcriptome data, thereby generating a CD8+ cluster.
[0009] In some embodiments, the method further comprises classifying each T cell from the population of T cells as a CD4+ T cell or a CD8+ T cell based on an expression level of each classification gene of a set of at least 10 classification genes from the single cell transcriptome data, thereby generating a CD4+ cluster and a CD8+ cluster.
[0010] In some embodiments, the method further comprises calculating (i) a CD4+ exhaustion score and/or a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 5 CD4+ exhaustion gene markers, and (ii) a CD8+ exhaustion score and/or a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 5 CD8+ exhaustion gene markers.
[0011] In some embodiments, the set of at least 5 CD4+ exhaustion gene markers is different from the set of at least 5 CD8+ exhaustion gene markers.
[0012] Also provided herein is a method of classifying CD8+ T cells and CD4+ T cells in a population of T cells, the method comprising: (a) providing single cell transcriptome data of a population of T cells obtained from a tumor microenvironment of a subject having a cancer; (b) classifying each T cell of the population of T cells as a CD4+ T cell or a CD8+ T cell based on an expression level of each classification gene of a set of at least 40 classification genes selected from the group consisting of the genes of Table 2 from the single cell transcriptome data, thereby generating a CD4+ cluster and a CD8+ cluster, wherein a T cell of the CD4+ cluster is classified as CD4+ T cell, and wherein a T cell of the CD8+ cluster is classified as CD8+ T cell.
[0013] In some embodiments, the method further comprises calculating a CD4+ exhaustion score and/or a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 5 CD4+ exhaustion gene markers.
[0014] In some embodiments, the expression level of each CD4+ exhaustion gene marker is from single cell transcriptome data of a population of T cells from the tumor microenvironment of the subject.
[0015] In some embodiments, each T cell within the CD4+ cluster with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell.
[0016] In some embodiments, the method further comprises calculating a CD8+ exhaustion score and/or a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 5 CD8+ exhaustion gene markers.
[0017] In some embodiments, the set of at least 5 CD4+ exhaustion gene markers is different from the set of at least 5 CD8+ exhaustion gene markers.
[0018] In some embodiments, the expression level of each CD8+ exhaustion gene marker is from single cell transcriptome data of a population of T cells from a tumor microenvironment of a subject having a cancer.
[0019] In some embodiments, each T cell within the CD8+ cluster with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell.
[0020] In some embodiments, the method further comprises obtaining the population of T cells from the tumor microenvironment of the subject.
[0021] In some embodiments, obtaining comprises isolating a tumor or a tumor tissue comprising the population of T cells from the subject.
[0022] In some embodiments, the expression level is determined by mRNA transcripts.
[0023] In some embodiments, the method further comprises sequencing mRNAs from the population of T cells to obtain the single cell transcriptome data.
[0024] In some embodiments, the method further comprises providing single-cell T-cell receptor (scTCR) data of the population of T cells.
[0025] In some embodiments, the method further comprises sequencing the population of T cells to obtain the scTCR data of each T cell.
[0026] In some embodiments, the method further comprises identifying a TCR clonotype of an exhausted CD4+ T cell or an exhausted CD8+ T cell based on the scTCR data of exhausted CD4+ T cells and exhausted CD8+ T cells.
[0027] In some embodiments, the method further comprises identifying TCR clonotypes of each exhausted CD4+ T cell of the population of T cells based on the scTCR data of exhausted CD4+ T cells.
[0028] In some embodiments, the method further comprises identifying TCR clonotypes of each exhausted CD8+ T cell of the population of T cells based on the scTCR data of exhausted CD8+ T cells.
[0029] In some embodiments, the method further comprises identifying TCR clonotypes of each exhausted CD4+ T cell and each exhausted CD8+ T cell of the population of T cells based on the scTCR data of exhausted CD4+ T cells and exhausted CD8+ T cells.
[0030] In some embodiments, a TCR clonotype of a given exhausted CD4+ T cell is matched to the CD4+ exhaustion score and/or the CD4+ GSEA score of the same exhausted CD4+ T cell.
[0031] In some embodiments, the TCR clonotype of a given exhausted CD4+ T cell is matched to the single cell transcriptome data of the same exhausted CD4+ T cell.
[0032] In some embodiments, the TCR clonotype of a given exhausted CD4+ T cell is matched to the single cell transcriptome data of the same exhausted CD4+ T cell via a same single cell barcode.
[0033] In some embodiments, a TCR clonotype of a given exhausted CD8+ T cell is matched to the CD8+ exhaustion score and/or the CD8+ GSEA score of the same exhausted CD8+ T cell.
[0034] In some embodiments, the TCR clonotype of a given exhausted CD8+ T cell is matched to the single cell transcriptome data of the same exhausted CD8+ T cell.
[0035] In some embodiments, the TCR clonotype of a given exhausted CD8+ T cell is matched to the single cell transcriptome data of the same exhausted CD8+ T cell via a same single cell barcode.
[0036] In some embodiments, the method further comprises identifying a number of single cells expressing a TCR of a given TCR clonotype that is present in a group of exhausted CD4+ T cells and a number of single cells expressing the same TCR of the given TCR clonotype that is present in a group of non-exhausted CD4+ T cells.
[0037] In some embodiments, the method further comprises selecting a TCR of a given TCR clonotype as a candidate tumor-reactive TCR clonotype when the number of single cells expressing the TCR of the given TCR clonotype that is present in the group of exhausted CD4+ T cells is larger than the number of single cells expressing the same TCR of the given TCR clonotype that is present in the group of non-exhausted CD4+ T cells.
[0038] In some embodiments, the method further comprises identifying a number of single cells expressing a TCR of a given TCR clonotype that is present in a group of exhausted CD8+ T cells and a number of single cells expressing the same TCR of the given TCR clonotype that is present in a group of non-exhausted CD8+ T cells.
[0039] In some embodiments, the method further comprises selecting a TCR of a given TCR clonotype as a candidate tumor-reactive TCR clonotype when the number of single cells expressing the TCR of the given TCR clonotype that is present in the group of exhausted CD8+ T cells is larger than the number of single cells expressing the same TCR of the given TCR clonotype that is present in the group of non-exhausted CD8+ T cells.
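As a non-limiting illustration of the barcode matching and count comparison described in the preceding paragraphs, the following sketch joins per-cell exhaustion calls to per-cell clonotype calls on a shared single cell barcode, then selects clonotypes with more exhausted than non-exhausted cells. The data layout, barcodes, and clonotype names are hypothetical.

```python
from collections import Counter

def candidate_clonotypes(exhausted_by_barcode, clonotype_by_barcode):
    """Return clonotypes with more cells in the exhausted group than in the
    non-exhausted group, matching the two data sets on cell barcode."""
    exhausted = Counter()
    non_exhausted = Counter()
    for barcode, is_exhausted in exhausted_by_barcode.items():
        clonotype = clonotype_by_barcode.get(barcode)
        if clonotype is None:
            continue  # cell has transcriptome data but no TCR call
        if is_exhausted:
            exhausted[clonotype] += 1
        else:
            non_exhausted[clonotype] += 1
    return sorted(c for c in exhausted if exhausted[c] > non_exhausted[c])

# Hypothetical cells from one lineage (for example, the CD8+ cluster).
exhausted_by_barcode = {
    "AAAC": True, "AAAG": True, "AACT": False,   # clonotype_1: 2 vs 1
    "ACGT": True, "AGGT": False, "ATTC": False,  # clonotype_2: 1 vs 2
    "CCGA": True,                                # no TCR call
}
clonotype_by_barcode = {
    "AAAC": "clonotype_1", "AAAG": "clonotype_1", "AACT": "clonotype_1",
    "ACGT": "clonotype_2", "AGGT": "clonotype_2", "ATTC": "clonotype_2",
}
candidates = candidate_clonotypes(exhausted_by_barcode, clonotype_by_barcode)
print(candidates)
```

The comparison would be run separately for the CD4+ and CD8+ clusters, since exhaustion is called with lineage-specific marker sets.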
[0040] In some embodiments, the method further comprises, prior to obtaining the single cell transcriptomic data, separating a subset of T cells from the population of T cells based on expression of a CD4+ and/or CD8+ exhaustion marker, thereby generating a subset of exhausted T cells and a subset of non-exhausted T cells.
[0041] In some embodiments, the CD4+ and/or CD8+ exhaustion marker comprises at least 5 genes selected from the group consisting of genes in Tables 3-6.
[0042] In some embodiments, separating comprises fluorescence activated cell sorting (FACS).
[0043] In some embodiments, the method further comprises sequencing the subset of exhausted T cells and the subset of non-exhausted T cells using single cell sequencing or bulk sequencing.
[0044] In some embodiments, the sequencing does not comprise using a barcode.
[0045] In some embodiments, the population of T cells are obtained from a frozen sample or a fresh sample.
[0046] In some embodiments, the sample is a formalin-fixed paraffin-embedded (FFPE) sample.
[0047] In some embodiments, the sample is not an FFPE sample.
[0048] In some embodiments, the method further comprises preparing a pharmaceutical composition using the candidate tumor-reactive TCR clonotype or a cell expressing the candidate tumor-reactive TCR clonotype.
[0049] Also provided herein is a method of identifying one or more T-cell receptors from exhausted T cells from a population of T cells obtained from a tumor microenvironment of a subject having a cancer, the method comprising: (a) providing single cell transcriptome data of the population of T cells; (b) classifying each T cell of the population of T cells as a CD4+ T cell or a CD8+ T cell based on an expression level of each classification gene of a set of at least 10 classification genes from the single cell transcriptome data, thereby generating a CD4+ cluster and a CD8+ cluster; (c) calculating (i) a CD4+ exhaustion score and/or a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 5 CD4+ exhaustion gene markers, and (ii) a CD8+ exhaustion score and/or a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 5 CD8+ exhaustion gene markers, wherein the set of at least 5 CD4+ exhaustion gene markers is different from the set of at least 5 CD8+ exhaustion gene markers, wherein: each T cell within the CD4+ cluster with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell, and each T cell within the CD8+ cluster with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell; and (d) identifying TCR clonotypes of the exhausted CD4+ T cells and exhausted CD8+ T cells separately based on single-cell T-cell receptor (scTCR) data of exhausted CD4+ T cells and exhausted CD8+ T cells identified in (c).
[0050] In some embodiments, the set of at least 10 classification genes comprises at least 10 genes selected from the group consisting of PTPN13, TNFRSF4, CCR6, FOXP3, TSHZ2, MFHAS1, FAAH2, CD4, GK, IL2RA, CRADD, LTB, IRS2, KLRB1, TNFRSF25, LINC02694, THADA, BATF, TNFRSF18, SELL, IL12RB2, FURIN, HIPK2, MAP3K5, TMEM173, CTSB, SAMHD1, ADAM19, ICOS, GNA15, EPSTI1, ZC3H12D, PHTF2, MAST4, UGP2, RAPGEF6, STAM, CTLA4, RORA, SATB1, ZEB1, PIM2, CD28, LDLRAD4, PELI1, RHBDD2, SOCS3, TRAF3, ABCC1, RNASET2, SPOCK2, ITK, STK24, SNX9, GZMA, RALGAPA1, GZMB, JMJD6, ZEB2, DUSP2, CLEC2B, GABARAPL1, SLA2, LITAF, AKNA, LYST, ITGA4, TUBA4A, IFNG, METRNL, CST7, IER5L, MXRA7, GGA2, AUTS2, APOBEC3G, NELL2, LYAR, GALNT11, PTMS, CMC1, AOAH, LAG3, PRF1, TNFSF9, CCL5, CCL4, CTSW, GZMH, GNLY, YBX3, GZMK, CRTAM, CD8A, KLRK1, NKG7, KLRD1, CD8B, and LINC02446.
[0051] In some embodiments, classifying each T cell of the population of T cells comprises classifying each T cell of the population of T cells as a CD4+ cell and/or a CD8+ cell based on an expression level of each classification gene of a set of from 11 to 99 classification genes selected from the group consisting of PTPN13, TNFRSF4, CCR6, FOXP3, TSHZ2, MFHAS1, FAAH2, CD4, GK, IL2RA, CRADD, LTB, IRS2, KLRB1, TNFRSF25, LINC02694, THADA, BATF, TNFRSF18, SELL, IL12RB2, FURIN, HIPK2, MAP3K5, TMEM173, CTSB, SAMHD1, ADAM19, ICOS, GNA15, EPSTI1, ZC3H12D, PHTF2, MAST4, UGP2, RAPGEF6, STAM, CTLA4, RORA, SATB1, ZEB1, PIM2, CD28, LDLRAD4, PELI1, RHBDD2, SOCS3, TRAF3, ABCC1, RNASET2, SPOCK2, ITK, STK24, SNX9, GZMA, RALGAPA1, GZMB, JMJD6, ZEB2, DUSP2, CLEC2B, GABARAPL1, SLA2, LITAF, AKNA, LYST, ITGA4, TUBA4A, IFNG, METRNL, CST7, IER5L, MXRA7, GGA2, AUTS2, APOBEC3G, NELL2, LYAR, GALNT11, PTMS, CMC1, AOAH, LAG3, PRF1, TNFSF9, CCL5, CCL4, CTSW, GZMH, GNLY, YBX3, GZMK, CRTAM, CD8A, KLRK1, NKG7, KLRD1, CD8B, and LINC02446.
[0052] In some embodiments, the set of at least 5 CD4+ exhaustion gene markers comprises at least 5 genes selected from the group consisting of ADGRG1, CD200, CHN1, CPM, CTLA4, CXCL13, DRAIC, ENTPD1, GNG4, ICA1, IGFL2, IGFLR1, MYO7A, PDE7B, NMB, PTPN13, TIGIT, TOX, TOX2, and TSHZ2.
[0053] In some embodiments, the set of at least 5 CD4+ exhaustion gene markers comprises from 6 to 20 genes selected from the group consisting of ADGRG1, CD200, CHN1, CPM, CTLA4, CXCL13, DRAIC, ENTPD1, GNG4, ICA1, IGFL2, IGFLR1, MYO7A, PDE7B, NMB, PTPN13, TIGIT, TOX, TOX2, and TSHZ2.
[0054] In some embodiments, the set of at least 5 CD8+ exhaustion gene markers comprises at least 5 genes selected from the group consisting of CHN1, CLECL1, CTLA4, CXCL13, CXCR6, ENTPD1, HAVCR2, KRT86, LAG3, LAYN, LRRN3, MYO1E, MYO7A, PDCD1, SIRPG, SRGAP3, STAM, TIGIT, TNFRSF18, and TOX.
[0055] In some embodiments, the set of at least 5 CD8+ exhaustion gene markers comprises from 6 to 20 genes selected from the group consisting of CHN1, CLECL1, CTLA4, CXCL13, CXCR6, ENTPD1, HAVCR2, KRT86, LAG3, LAYN, LRRN3, MYO1E, MYO7A, PDCD1, SIRPG, SRGAP3, STAM, TIGIT, TNFRSF18, and TOX.
[0056] In some embodiments, calculating the CD4+ exhaustion score for the T cell of the CD4+ cluster comprises (i) obtaining an UMI count for each CD4+ exhaustion gene of the set of at least 5 CD4+ exhaustion gene markers to obtain the expression level of each CD4+ exhaustion gene of the set of at least 5 CD4+ exhaustion gene markers; (ii) scaling the UMI count by dividing the UMI count for each gene by the total number of UMIs of the single cell transcriptome data of the T cell and then multiplying the quotient by a scale factor; (iii) applying a logarithmic transformation to obtain normalized UMI counts for the set of at least 5 CD4+ exhaustion gene markers; and (iv) calculating the CD4+ exhaustion score for the T cell as a mean of the normalized UMI counts, wherein the T cell with a CD4+ exhaustion score equal to or higher than 0.65 is identified as an exhausted CD4+ T cell.
[0057] In some embodiments, calculating the CD8+ exhaustion score for the T cell of the CD8+ cluster comprises (i) obtaining an UMI count for each CD8+ exhaustion gene of the set of at least 5 CD8+ exhaustion gene markers to obtain the expression level of each gene of the set of at least 5 exhaustion gene markers; (ii) scaling the UMI count by dividing the UMI count for each CD8+ exhaustion gene by the total number of UMIs of the single cell transcriptome data of the T cell and then multiplying the quotient by a scale factor; (iii) applying a logarithmic transformation to obtain normalized UMI counts for the set of at least 5 CD8+ exhaustion gene markers; and (iv) calculating the CD8+ exhaustion score for the T cell as a mean of the normalized UMI counts, wherein the T cell with a CD8+ exhaustion score equal to or higher than 0.65 is identified as an exhausted CD8+ T cell.
[0058] In some embodiments, the scale factor is 10,000.
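The four scoring steps recited above (UMI count, depth scaling, logarithmic transformation, mean) can be illustrated with a minimal Python sketch. The function names, the toy cell, and the choice of the natural logarithm of one plus the scaled count are assumptions made for illustration only; the disclosure specifies a logarithmic transformation without naming its exact form. The scale factor of 10,000 and the threshold of 0.65 are taken from the embodiments above.

```python
import math


def exhaustion_score(umi_counts, marker_genes, scale_factor=10_000):
    """Mean log-normalized UMI count over the marker genes of one cell.

    umi_counts: dict mapping gene symbol -> raw UMI count for a single cell.
    marker_genes: exhaustion gene markers (at least 5 in the embodiments).
    """
    total_umis = sum(umi_counts.values())
    if total_umis == 0:
        return 0.0
    normalized = [
        # Assumption: log(1 + x) is used as the logarithmic transformation.
        math.log1p(umi_counts.get(gene, 0) / total_umis * scale_factor)
        for gene in marker_genes
    ]
    return sum(normalized) / len(normalized)


def is_exhausted(score, threshold=0.65):
    """A cell is called exhausted when its score meets or exceeds the threshold."""
    return score >= threshold


# Hypothetical cell with 1,000 total UMIs (illustrative values only).
cell = {"CTLA4": 3, "TOX": 2, "TIGIT": 1, "PDCD1": 0, "LAG3": 4, "GAPDH": 990}
markers = ["CTLA4", "TOX", "TIGIT", "PDCD1", "LAG3"]
score = exhaustion_score(cell, markers)
print(round(score, 3), is_exhausted(score))
```

Marker genes with no detected UMIs contribute zero to the mean, so a cell expressing only a few markers receives a proportionally lower score.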
[0059] In some embodiments, the set of at least 5 CD4+ exhaustion gene markers comprises at least 5 genes selected from the group consisting of ADD3, AGFG1, AHI1, AP3S1, ARAP2, ARHGEF3, ATP2A2, CCDC6, CD200, CD27, CH25H, CHN1, CNIH1, COTL1, CPM, CRYBG1, CTLA4, CXCL13, DUSP4, ELMO1, FABP5, FBLN7, FBXO32, FKBP5, FOXN2, FYB1, GEM, GK, GPRIN3, GRSF1, GYPC, HIPK2, HMGB2, ICA1, IL6ST, IQGAP1, ITM2A, ITPR1, JARID2, LHFPL6, LIMS1, LRMP, LRRC8D, MAGEH1, MTHFD2, NAP1L4, NCOA7, NFATC2, NMB, NR3C1, NUDT16, PDCD1, PGM2L1, PHACTR2, POR, PTPN13, RBPJ, RNF19A, SESN1, SESN3, SH2D1A, SLA, SMARCA2, SMARCAD1, SMS, SNX9, SRGN, STAT3, TIAM1, TIGIT, TMEM243, TMEM64, TMEM70, TMPO, TNFAIP8, TNFRSF18, TNFSF8, TNIK, TOX, TOX2, TP53BP2, TP53INP1, TRABD2A, TSHZ2, UGCG, WNK1, YWHAQ, and CD4.
[0060] In some embodiments, the set of at least 5 CD4+ exhaustion gene markers comprises from 6 to 88 genes selected from the group consisting of ADD3, AGFG1, AHI1, AP3S1, ARAP2, ARHGEF3, ATP2A2, CCDC6, CD200, CD27, CH25H, CHN1, CNIH1, COTL1, CPM, CRYBG1, CTLA4, CXCL13, DUSP4, ELMO1, FABP5, FBLN7, FBXO32, FKBP5, FOXN2, FYB1, GEM, GK, GPRIN3, GRSF1, GYPC, HIPK2, HMGB2, ICA1, IL6ST, IQGAP1, ITM2A, ITPR1, JARID2, LHFPL6, LIMS1, LRMP, LRRC8D, MAGEH1, MTHFD2, NAP1L4, NCOA7, NFATC2, NMB, NR3C1, NUDT16, PDCD1, PGM2L1, PHACTR2, POR, PTPN13, RBPJ, RNF19A, SESN1, SESN3, SH2D1A, SLA, SMARCA2, SMARCAD1, SMS, SNX9, SRGN, STAT3, TIAM1, TIGIT, TMEM243, TMEM64, TMEM70, TMPO, TNFAIP8, TNFRSF18, TNFSF8, TNIK, TOX, TOX2, TP53BP2, TP53INP1, TRABD2A, TSHZ2, UGCG, WNK1, YWHAQ, and CD4.
[0061] In some embodiments, the set of at least 5 CD8+ exhaustion gene markers comprises at least 5 genes selected from the group consisting of AHSA1, ALOX5AP, BAG3, BST2, CACYBP, CARD16, CD3D, CD7, CD82, CHN1, CLECL1, CLEC2B, CLEC2D, CTLA4, CTSD, CXCL13, CXCR6, DUSP4, ENTPD1, FKBP1A, GAPDH, GEM, GZMB, HAVCR2, HLA-DRB1, HSPB1, ICOS, IQGAP1, ITGAE, KRT86, LAG3, LAYN, LSP1, NAP1L4, NR3C1, PDCD1, PELI1, PHLDA1, POLR1E, PRDM1, PTPN22, RAB11FIP1, RAB27A, RBPJ, RGS1, RGS2, RHBDD2, RUNX2, SAMSN1, SERPINH1, SH3BGRL3, SLA, SNX9, SRGAP3, STAM, TIGIT, TNFRSF9, TOX, TTN, CD8A, and CD8B.
[0062] In some embodiments, the set of at least 5 CD8+ exhaustion gene markers comprises from 6 to 61 genes selected from the group consisting of AHSA1, ALOX5AP, BAG3, BST2, CACYBP, CARD16, CD3D, CD7, CD82, CHN1, CLECL1, CLEC2B, CLEC2D, CTLA4, CTSD, CXCL13, CXCR6, DUSP4, ENTPD1, FKBP1A, GAPDH, GEM, GZMB, HAVCR2, HLA-DRB1, HSPB1, ICOS, IQGAP1, ITGAE, KRT86, LAG3, LAYN, LSP1, NAP1L4, NR3C1, PDCD1, PELI1, PHLDA1, POLR1E, PRDM1, PTPN22, RAB11FIP1, RAB27A, RBPJ, RGS1, RGS2, RHBDD2, RUNX2, SAMSN1, SERPINH1, SH3BGRL3, SLA, SNX9, SRGAP3, STAM, TIGIT, TNFRSF9, TOX, TTN, CD8A, and CD8B.
[0063] In some embodiments, (A) calculating the CD4+ GSEA score for the T cell of the CD4+ cluster comprises (i) obtaining an UMI count for each gene of all genes in the single cell transcriptome data; (ii) obtaining an UMI rank of all genes by ranking UMI counts of all genes; (iii) increasing a running-sum statistic for each CD4+ exhaustion gene of all genes that appears in the set of at least 5 CD4+ exhaustion gene markers and decreasing a running-sum statistic for each CD4+ exhaustion gene of all genes that does not appear in the set of at least 5 CD4+ exhaustion gene markers; and (iv) calculating the CD4+ GSEA score based on running-sum statistics, wherein the T cell with a CD4+ GSEA score equal to or higher than a cutoff value established from data distribution is identified as an exhausted CD4+ T cell, or (B) calculating the CD4+ GSEA score for the T cell of the CD4+ cluster comprises (i) obtaining an UMI count for each gene of all genes in the single cell transcriptome data; (ii) obtaining an UMI rank of all genes by ranking UMI counts of all genes; (iii) calculating an area under the curve (AUC) value of a set of at least 5 CD4+ exhaustion genes; and (iv) calculating the CD4+ GSEA score based on AUC values, wherein the T cell with a CD4+ GSEA score equal to or higher than a cutoff value established from data distribution is identified as an exhausted CD4+ T cell.
[0064] In some embodiments, the cutoff value established from data distribution in (A) or (B) is 0.2.
[0065] In some embodiments, calculating in (B)(iii) comprises assessing recovery of the set of at least 5 CD4+ exhaustion genes.
[0066] In some embodiments, the set of at least 5 CD4+ exhaustion genes are selected among the top ranked genes from the UMI rank obtained in (B)(ii).
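The running-sum variant in (A) can be sketched as follows. This is a minimal illustration that assumes the classic unweighted form of the statistic: each gene in the exhaustion set raises the running sum by 1/(number of set genes present), each other gene lowers it by 1/(number of remaining genes), and the score is the signed maximum deviation from zero. The function name, the alphabetical tie-breaking among equal UMI counts, and the toy data are assumptions; the disclosure does not fix these details.

```python
def running_sum_enrichment(umi_counts, gene_set):
    """Unweighted running-sum enrichment score for one cell.

    umi_counts: dict mapping gene symbol -> raw UMI count.
    gene_set: collection of exhaustion gene markers.
    """
    gene_set = set(gene_set)
    # Rank all genes by raw UMI count, highest first (ties broken by name).
    ranked = sorted(umi_counts, key=lambda g: (-umi_counts[g], g))
    hits = [gene in gene_set for gene in ranked]
    n_hit = sum(hits)
    n_miss = len(ranked) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running = 0.0
    best = 0.0
    for hit in hits:
        # Step up for a set gene, step down for any other gene.
        running += 1.0 / n_hit if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical cell (illustrative values only).
counts = {"A": 10, "B": 8, "C": 5, "D": 1, "E": 0, "F": 0}
es_top = running_sum_enrichment(counts, {"A", "B"})
es_bottom = running_sum_enrichment(counts, {"E", "F"})
cd4_cutoff = 0.2  # cutoff recited for the CD4+ GSEA score
print(es_top >= cd4_cutoff, es_bottom >= cd4_cutoff)
```

Because only the rank order of raw UMI counts enters the calculation, the score does not depend on how the counts are normalized.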
[0067] In some embodiments, (A) calculating the CD8+ GSEA score for the T cell of the CD8+ cluster comprises (i) obtaining an UMI count for each gene of all genes in the single cell transcriptome data; (ii) obtaining an UMI rank of all genes by ranking UMI counts of all genes; (iii) increasing a running-sum statistic for each CD8+ exhaustion gene of all genes that appears in the set of at least 5 CD8+ exhaustion gene markers and decreasing a running-sum statistic for each CD8+ exhaustion gene of all genes that does not appear in the set of at least 5 CD8+ exhaustion gene markers; and (iv) calculating the CD8+ GSEA score based on running-sum statistics, wherein the T cell with a CD8+ GSEA score equal to or higher than a cutoff value established from data distribution is identified as an exhausted CD8+ T cell, or (B) calculating the CD8+ GSEA score for the T cell of the CD8+ cluster comprises (i) obtaining an UMI count for each gene of all genes in the single cell transcriptome data; (ii) obtaining an UMI rank of all genes by ranking UMI counts of all genes; (iii) calculating an area under the curve (AUC) value of a set of at least 5 CD8+ exhaustion genes; and (iv) calculating the CD8+ GSEA score based on AUC values, wherein the T cell with a CD8+ GSEA score equal to or higher than a cutoff value established from data distribution is identified as an exhausted CD8+ T cell.
[0068] In some embodiments, the cutoff value from data distribution in (A) or (B) is 0.3.
[0069] In some embodiments, calculating in (B)(iii) comprises assessing recovery of the set of at least 5 CD8+ exhaustion genes.
[0070] In some embodiments, the set of at least 5 CD8+ exhaustion genes are selected among the top ranked genes from the UMI rank obtained in (B)(ii).
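The AUC variant in (B) can be read as a recovery curve: walking down the top-ranked genes of a cell, count how many exhaustion genes have been recovered at each rank, and normalize the area under that curve by the best achievable area. The sketch below is one plausible reading under stated assumptions; the function name, the `top_k` parameter, the normalization, and the toy data are hypothetical and are not specified in the disclosure.

```python
def recovery_auc(umi_counts, gene_set, top_k):
    """Normalized area under the recovery curve of gene_set within the top_k genes."""
    gene_set = set(gene_set)
    # Keep only the top_k genes by raw UMI count (ties broken by name).
    ranked = sorted(umi_counts, key=lambda g: (-umi_counts[g], g))[:top_k]
    recovered = 0
    area = 0
    for gene in ranked:
        if gene in gene_set:
            recovered += 1
        area += recovered  # height of the recovery curve at this rank
    # Best case: every set gene sits at the very top of the ranking.
    max_area = sum(min(i, len(gene_set)) for i in range(1, len(ranked) + 1))
    return area / max_area if max_area else 0.0


# Hypothetical cell (illustrative values only).
counts = {"A": 10, "B": 8, "C": 5, "D": 1, "E": 0, "F": 0}
auc_high = recovery_auc(counts, {"A", "B"}, top_k=4)
auc_mid = recovery_auc(counts, {"C", "D"}, top_k=4)
auc_none = recovery_auc(counts, {"E", "F"}, top_k=4)
cd8_cutoff = 0.3  # cutoff recited for the CD8+ GSEA score
print(auc_high >= cd8_cutoff, auc_mid >= cd8_cutoff, auc_none >= cd8_cutoff)
```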
[0071] In some embodiments, the method further comprises calculating the CD4+ exhaustion score and the CD4+ GSEA score for the T cell of the CD4+ cluster.
[0072] In some embodiments, the method further comprises calculating the CD8+ exhaustion score and the CD8+ GSEA score for the T cell of the CD8+ cluster.
[0073] In some embodiments, the method further comprises identifying TCR clonotypes of the exhausted CD4+ T cells and exhausted CD8+ cells separately based on the scTCR data of exhausted CD4+ T cells and exhausted CD8+ T cells identified in (c), wherein the exhausted CD4+ T cells have both the CD4+ exhaustion score and the CD4+ GSEA score above the threshold value, and the exhausted CD8+ T cells have both the CD8+ exhaustion score and the CD8+ GSEA score above the threshold value.
[0074] In some embodiments, the method further comprises identifying TCR clonotypes of the exhausted CD4+ T cells and exhausted CD8+ cells separately based on the scTCR data of exhausted CD4+ T cells and exhausted CD8+ T cells identified in (c), wherein the exhausted CD4+ T cells have the CD4+ exhaustion score or the CD4+ GSEA score above the threshold value, and the exhausted CD8+ T cells have the CD8+ exhaustion score or the CD8+ GSEA score above the threshold value.
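The two preceding embodiments differ only in whether a cell must pass both score criteria or either one. A minimal sketch follows, assuming the "equal to or higher than" comparison used elsewhere in this disclosure; the function and parameter names are hypothetical.

```python
def passes_exhaustion_rule(ex_score, gsea_score, ex_threshold, gsea_cutoff,
                           require_both):
    """Combine the exhaustion score and the GSEA score into a single call."""
    ex_ok = ex_score >= ex_threshold
    gsea_ok = gsea_score >= gsea_cutoff
    if require_both:
        return ex_ok and gsea_ok   # stricter rule: both scores must pass
    return ex_ok or gsea_ok        # permissive rule: either score suffices


# Hypothetical CD4+ cell passing the exhaustion score but not the GSEA cutoff.
strict = passes_exhaustion_rule(0.70, 0.10, 0.65, 0.2, require_both=True)
permissive = passes_exhaustion_rule(0.70, 0.10, 0.65, 0.2, require_both=False)
print(strict, permissive)
```

Requiring both scores yields a smaller, more stringent set of exhausted cells, while accepting either score retains more cells for clonotype identification.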
[0075] In some embodiments, (a) for each TCR clonotype identified in a CD4+ exhausted T cell, the method comprises calculating a mean or median CD4+ exhaustion score and/or exhaustion score and a mean or median CD4+ GSEA score for all CD4+ exhausted T cells having the same TCR clonotype; and/or (b) for each TCR clonotype identified in a CD8+ exhausted T cell, the method comprises calculating a mean or median CD8+ exhaustion score and/or exhaustion score and a mean or median CD8+ GSEA score for all CD8+ exhausted T cells having the same TCR clonotype.
[0076] In some embodiments, (a) for each TCR clonotype identified in a CD4+ exhausted T cell, the method comprises identifying a maximum CD4+ exhaustion score and/or exhaustion score and a maximum CD4+ GSEA score for all CD4+ exhausted T cells having the same TCR clonotype; and/or (b) for each TCR clonotype identified in a CD8+ exhausted T cell, the method comprises identifying a maximum CD8+ exhaustion score and/or exhaustion score and a maximum CD8+ GSEA score for all CD8+ exhausted T cells having the same TCR clonotype.
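The per-clonotype summaries described above (mean, median, and maximum of each score over all exhausted cells sharing a clonotype) can be sketched with the Python standard library. The record layout, the field names, and the toy values are assumptions for illustration; the same function would be applied separately to the CD4+ and CD8+ compartments.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_clonotypes(cells):
    """Aggregate per-cell scores by clonotype.

    cells: iterable of (clonotype_id, exhaustion_score, gsea_score) tuples,
    one tuple per exhausted cell of a single compartment.
    """
    grouped = defaultdict(list)
    for clonotype, ex_score, gsea_score in cells:
        grouped[clonotype].append((ex_score, gsea_score))

    summary = {}
    for clonotype, scores in grouped.items():
        ex = [s[0] for s in scores]
        gsea = [s[1] for s in scores]
        summary[clonotype] = {
            "clone_size": len(scores),  # exhausted cells sharing this clonotype
            "mean_exhaustion": mean(ex),
            "median_exhaustion": median(ex),
            "max_exhaustion": max(ex),
            "mean_gsea": mean(gsea),
            "median_gsea": median(gsea),
            "max_gsea": max(gsea),
        }
    return summary


# Hypothetical exhausted cells (illustrative values only).
cells = [
    ("c1", 1.0, 0.5),
    ("c1", 2.0, 0.25),
    ("c1", 3.0, 0.75),
    ("c2", 0.5, 0.25),
]
summary = summarize_clonotypes(cells)
print(summary["c1"]["clone_size"], summary["c1"]["max_exhaustion"])
```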
[0077] In some embodiments, a TCR clonotype of a given exhausted CD4+ T cell is matched to the CD4+ exhaustion score and/or the CD4+ GSEA score of the same exhausted CD4+ T cell.
[0078] In some embodiments, the TCR clonotype of a given exhausted CD4+ T cell is matched to the single cell transcriptome data of the same exhausted CD4+ T cell.
[0079] In some embodiments, the TCR clonotype of a given exhausted CD4+ T cell is matched to the single cell transcriptome data of the same exhausted CD4+ T cell via a same single cell barcode.
[0080] In some embodiments, a TCR clonotype of a given exhausted CD8+ T cell is matched to the CD8+ exhaustion score and/or the CD8+ GSEA score of the same exhausted CD8+ T cell.
[0081] In some embodiments, the TCR clonotype of a given exhausted CD8+ T cell is matched to the single cell transcriptome data of the same exhausted CD8+ T cell.
[0082] In some embodiments, the TCR clonotype of a given exhausted CD8+ T cell is matched to the single cell transcriptome data of the same exhausted CD8+ T cell via a same single cell barcode.
[0083] In some embodiments, the method further comprises identifying a number of single cells expressing a TCR of a given TCR clonotype that is present in a group of exhausted CD4+ T cells and a number of single cells expressing the same TCR of the given TCR clonotype that is present in a group of non-exhausted CD4+ T cells.
[0084] In some embodiments, the method further comprises selecting a TCR of a given TCR clonotype as a candidate tumor-reactive TCR clonotype when the number of single cells expressing the TCR of the given TCR clonotype that is present in the group of exhausted CD4+ T cells is larger than the number of single cells expressing the same TCR of the given TCR clonotype that is present in the group of non-exhausted CD4+ T cells.
[0085] In some embodiments, the method further comprises identifying a number of single cells expressing a TCR of a given TCR clonotype that is present in a group of exhausted CD8+ T cells and a number of single cells expressing the same TCR of the given TCR clonotype that is present in a group of non-exhausted CD8+ T cells.
[0086] In some embodiments, the method further comprises selecting a TCR of a given TCR clonotype as a candidate tumor-reactive TCR clonotype when the number of single cells expressing the TCR of the given TCR clonotype that is present in the group of exhausted CD8+ T cells is larger than the number of single cells expressing the same TCR of the given TCR clonotype that is present in the group of non-exhausted CD8+ T cells.
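The barcode matching and the exhausted versus non-exhausted cell counts described above can be sketched as follows. The sketch assumes that scTCR data are available as a mapping from single cell barcode to clonotype identifier and that exhausted cells are given as a set of barcodes from the transcriptome analysis; both data structures, the function name, and the toy values are hypothetical. The function is intended to be run within a single compartment (CD4+ or CD8+).

```python
from collections import Counter


def candidate_clonotypes(barcode_to_clonotype, exhausted_barcodes):
    """Select clonotypes found in more exhausted than non-exhausted cells.

    barcode_to_clonotype: dict mapping single cell barcode -> clonotype ID
        (from scTCR data); the barcode links each TCR to its transcriptome.
    exhausted_barcodes: set of barcodes called exhausted from scGEX data.
    """
    exhausted = Counter()
    non_exhausted = Counter()
    for barcode, clonotype in barcode_to_clonotype.items():
        if barcode in exhausted_barcodes:
            exhausted[clonotype] += 1
        else:
            non_exhausted[clonotype] += 1
    # Counter returns 0 for clonotypes never seen in non-exhausted cells.
    return sorted(c for c in exhausted if exhausted[c] > non_exhausted[c])


# Hypothetical barcodes and clonotypes (illustrative values only).
tcr = {"bc1": "clonoA", "bc2": "clonoA", "bc3": "clonoA",
       "bc4": "clonoB", "bc5": "clonoB"}
exhausted_cells = {"bc1", "bc2", "bc4"}
selected = candidate_clonotypes(tcr, exhausted_cells)
print(selected)
```

In this toy input, clonoA is found in two exhausted cells and one non-exhausted cell and is selected, whereas clonoB is split evenly and is not.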
[0087] In some embodiments, the method further comprises selecting candidate tumor-reactive TCR clonotypes from the TCR clonotypes identified for the exhausted CD4+ T cells and/or the exhausted CD8+ T cells, wherein the candidate tumor-reactive TCR clonotypes are further quality checked by (i) unique pairing of TCR alpha chain and TCR beta chain, (ii) match to known TCRs from a public database, and/or (iii) expression of innate immune cell markers.
[0088] In some embodiments, quality checking comprises excluding candidate tumor-reactive TCR clonotypes which (i) have unique pairing of TCR alpha chain and TCR beta chain, (ii) match to known TCRs from a public database; and/or (iii) express innate immune cell markers.
[0089] In some embodiments, candidate tumor-reactive TCR clonotypes that match to a known TCR that recognizes a non-oncogenic pathogen are not selected.
[0090] In some embodiments, the method further comprises ranking the candidate tumor-reactive TCR clonotypes of the exhausted CD4+ T cells based on clone size.
[0091] In some embodiments, the method further comprises ranking the candidate tumor-reactive TCR clonotypes of the exhausted CD8+ T cells based on clone size.
[0092] In some embodiments, the method further comprises ranking the candidate tumor-reactive TCR clonotypes with similar clone sizes based on the mean or median CD4+ exhaustion score, the maximum CD4+ exhaustion score, the mean or median CD4+ GSEA score, and/or the maximum CD4+ GSEA score for all CD4+ exhausted T cells.
[0093] In some embodiments, the method further comprises ranking the candidate tumor-reactive TCR clonotypes with similar clone sizes based on the mean or median CD8+ exhaustion score, the maximum CD8+ exhaustion score, the mean or median CD8+ GSEA score, and/or the maximum CD8+ GSEA score for all CD8+ exhausted T cells.
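The ranking described above, clone size first and exhaustion metrics as a tie-breaker, can be sketched with a composite sort key. Treating "similar clone sizes" as exactly equal clone sizes is a simplifying assumption, as are the field names, the choice of the mean exhaustion score as the tie-breaker, and the toy values.

```python
def rank_clonotypes(summary, top_n=10):
    """Rank clonotypes by clone size, then by mean exhaustion score.

    summary: dict mapping clonotype ID -> dict with at least the keys
        "clone_size" and "mean_exhaustion".
    Returns the IDs of the top_n clonotypes in ranked order.
    """
    ranked = sorted(
        summary.items(),
        # Larger clones first; among equal sizes, higher mean score first;
        # clonotype ID as a final deterministic tie-breaker.
        key=lambda item: (-item[1]["clone_size"],
                          -item[1]["mean_exhaustion"],
                          item[0]),
    )
    return [clonotype for clonotype, _ in ranked[:top_n]]


# Hypothetical per-clonotype summaries (illustrative values only).
clone_summary = {
    "c1": {"clone_size": 5, "mean_exhaustion": 1.0},
    "c2": {"clone_size": 5, "mean_exhaustion": 2.0},
    "c3": {"clone_size": 9, "mean_exhaustion": 0.7},
}
ranking = rank_clonotypes(clone_summary)
top_two = rank_clonotypes(clone_summary, top_n=2)
print(ranking, top_two)
```

A tolerance-based notion of similar clone sizes, or a different tie-breaking metric such as the maximum or median score, could be substituted by changing the sort key.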
[0094] In some embodiments, the same TCR clonotype is determined by having the same CDR3 sequence.
[0095] In some embodiments, the candidate tumor-reactive TCR clonotypes that match to known TCRs are determined by having the same CDR3 sequence.
[0096] In some embodiments, the candidate tumor-reactive TCR clonotype of a proliferating cell is given a higher weighting value when ranking the candidate tumor-reactive TCR clonotypes.
[0097] In some embodiments, the candidate tumor-reactive TCR clonotypes are predicted to be therapeutically relevant.
[0098] In some embodiments, a median positive predictive value (PPV) is at least 0.1 for CD4+ TCR clones or at least 0.1 for CD8+ TCR clones.
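Positive predictive value is the fraction of selected clonotypes that are confirmed as tumor-reactive when tested. The sketch below computes a PPV per sample and the median across samples; the function name and all counts are hypothetical and are not results from this disclosure.

```python
from statistics import median


def positive_predictive_value(n_confirmed, n_tested):
    """PPV = confirmed tumor-reactive clonotypes / clonotypes tested."""
    if n_tested <= 0:
        raise ValueError("at least one clonotype must be tested")
    return n_confirmed / n_tested


# Hypothetical validation outcomes for three samples, ten clones tested each.
per_sample_ppv = [
    positive_predictive_value(7, 10),
    positive_predictive_value(3, 10),
    positive_predictive_value(5, 10),
]
median_ppv = median(per_sample_ppv)
print(median_ppv >= 0.1)
```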
[0099] In some embodiments, the method further comprises selecting at least one candidate tumor-reactive TCR clonotype from at least the top 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 or more of the candidate tumor-reactive TCR clonotypes ranked.
[0100] In some embodiments, the method further comprises delivering a nucleic acid encoding a TCR comprising the at least one candidate tumor-reactive TCR clonotype of the candidate tumor-reactive TCR clonotypes into a cell.
[0101] In some embodiments, the method further comprises administering the nucleic acid encoding a TCR comprising the at least one candidate tumor-reactive TCR clonotype of the candidate tumor-reactive TCR clonotypes, or a cell comprising the nucleic acid encoding a TCR comprising the at least one candidate tumor-reactive TCR clonotype of the candidate tumor-reactive TCR clonotypes into a subject.
[0102] In some embodiments, the subject is the same subject from whom the population of T cells are obtained.
[0103] In some embodiments, the population of T cells are tumor-infiltrating lymphocytes (TILs).
[0104] In some embodiments, the population of T cells comprises at least 100, at least 500, at least 1,000, at least 2,000, at least 5,000, at least 10,000 or more cells.
[0105] Also provided herein is a method of identifying one or more T-cell receptors as one or more candidate tumor-reactive TCRs from exhausted T cells from a population of T cells obtained from a tumor microenvironment of a subject having a cancer, the method comprising: (a) providing single cell transcriptome data and single-cell T-cell receptor (scTCR) data of the population of T cells comprising exhausted CD4+ T cells and exhausted CD8+ T cells; and (b) identifying TCR clonotypes of the exhausted CD4+ T cells or the exhausted CD8+ cells based on the scTCR data of the exhausted CD4+ T cells or the exhausted CD8+ T cells, wherein the exhausted CD4+ T cells or the exhausted CD8+ T cells are identified based on the single cell transcriptome data.
[0106] In some embodiments, the exhausted CD4+ T cells or the exhausted CD8+ T cells are identified by the method of any one of claims 1-45.
[0107] In some embodiments, each cell of the exhausted CD4+ T cells or the exhausted CD8+ T cells has an exhaustion score and/or a GSEA score equal to or higher than a threshold value.
[0108] In some embodiments, the candidate tumor-reactive TCR induces activation of NFAT.
[0109] In some embodiments, the candidate tumor-reactive TCR induces expression of CD69, IFN-γ, TNF-α, IL-2, and/or IL-18.
[0110] Also provided herein is a nucleic acid encoding a TCR comprising the at least one candidate tumor-reactive TCR clonotype selected by the method of any one of the foregoing embodiments.
[0111] Also provided herein is a cell comprising a TCR comprising the at least one candidate tumor-reactive TCR clonotype selected by the method of any one of the foregoing embodiments or the nucleic acid of any one of the foregoing embodiments.
[0112] Also provided herein is a pharmaceutical composition comprising a TCR comprising (a) the at least one candidate tumor-reactive TCR clonotype selected by the method of any one of the foregoing embodiments, the nucleic acid of any one of the foregoing embodiments, or the cell of any one of the foregoing embodiments, and (b) a pharmaceutically acceptable carrier.
[0113] Also provided herein is use of a TCR comprising the at least one candidate tumor-reactive TCR clonotype selected by the method of any one of the foregoing embodiments, the nucleic acid of any one of the foregoing embodiments, the cell of any one of the foregoing embodiments, or the pharmaceutical composition of any one of the foregoing embodiments in the manufacturing of a medicament in treating a cancer in a subject in need thereof.
[0114] In some embodiments, the cancer is selected from the group consisting of bone cancer, blood cancer, lung cancer, liver cancer, pancreatic cancer, skin cancer, cancer of the head or neck, cutaneous or intraocular melanoma, uterine cancer, ovarian cancer, rectal cancer, cancer of the anal region, stomach cancer, colon cancer, breast cancer, prostate cancer, carcinoma of the sexual and reproductive organs, Hodgkin’s Disease, cancer of the esophagus, cancer of the small intestine, cancer of the endocrine system, cancer of the thyroid gland, cancer of the parathyroid gland, cancer of the adrenal gland, sarcoma of soft tissue, cancer of the bladder, cancer of the kidney, renal cell carcinoma, carcinoma of the renal pelvis, neoplasms of the central nervous system (CNS), neuroectodermal cancer, spinal axis tumors, glioma, meningioma, and pituitary adenoma.
INCORPORATION BY REFERENCE
[0115] All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.
BRIEF DESCRIPTION OF THE DRAWINGS
[0116] The novel features of the present disclosure are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings (also “Figure” and “FIG.” herein), of which:
[0117] FIG. 1 depicts the mRNA expression level of the CD4 gene and CD8A gene from scGEX data using a normalized UMI unit.
[0118] FIGs. 2A-2C depict the separation of tumor infiltrating lymphocytes (TILs) into CD4+ or CD8+ populations. Each dot represents a single T cell. FIG. 2A shows a Uniform Manifold Approximation and Projection (UMAP) of two clusters formed by using the 99-gene signature detailed in Table 2. FIG. 2B shows gene expression levels for the CD4 gene in normalized UMI counts. FIG. 2C shows gene expression levels for the CD8A gene in normalized UMI counts.
[0119] FIG. 3A depicts a UMAP on clustering and annotation of T cell functional subtypes. FIG. 3B depicts the corresponding CD4+ or CD8+ identity defined as illustrated in FIG. 2 plotted on the same UMAP.
[0120] FIG. 4A depicts the annotated clusters which represent T cell functional subtypes as shown in FIG. 3A (left). An exhausted CD8 cluster (i.e. CD8.EX) was identified by the clustering method. Identification of exhausted CD4 (i.e. CD4_EX) and exhausted CD8 (CD8_EX) cells by the 20-gene exhaustion signature score-based method (right). FIG. 4B depicts exhaustion signature score assignment for each cell from TILs. The top graph shows the distribution of CD8 exhaustion signature scores across annotated functional T cell subtypes. The bottom graph shows the distribution of CD4 exhaustion signature scores across annotated functional T cell subtypes. Each dot represents a T cell.
[0121] FIG. 5A depicts exhausted and non-exhausted TILs by 61-gene CD8 exhaustion signature. FIG. 5B depicts the correlation between the GSEA score of the 61-gene CD8 exhaustion signature and the CD8 exhaustion signature score. FIG. 5C depicts area under the curve (AUC) scores from the 61-gene CD8 exhaustion signature for each T cell subtype.
[0122] FIG. 6A depicts exhausted and non-exhausted TILs by 88-gene CD4 exhaustion signature. FIG. 6B depicts the correlation between the GSEA score of the 88-gene CD4 exhaustion signature and the CD4 exhaustion signature score. FIG. 6C depicts AUC scores from the 88-gene CD4 exhaustion signature for each T cell subtype.
[0123] FIGs. 7A-7C depict a quality control step on lists of identified exhausted CD4+ or CD8+ clonotypes. FIG. 7A depicts expression of CD4. FIG. 7B depicts expression of CD8A. FIG. 7C depicts expression of CD8B. Given the early computational separation of CD4+ and CD8+ populations, CD4+ or CD8+ exhausted clones going towards the final prioritization step indeed expressed CD4 or CD8 genes accordingly. In a few cases, a cell expressed both CD8 (CD8A and/or CD8B) and CD4, which may represent a doublet.
[0124] FIG. 8 summarizes an exemplary TCR selection and validation process.
[0125] FIGs. 9A-9B show exhaustion signature score for identified TCRs. FIG. 9A shows CD8 exhaustion scores. FIG. 9B shows CD4 exhaustion scores.
[0126] FIGs. 10A-10D show results of TCR antigen specificity diversity, organized by cancer type of the tumor sample. FIG. 10A shows results for lung cancer sample. FIG. 10B shows results for head and neck cancer samples. FIG. 10C shows results for colorectal cancer samples. FIG. 10D shows results for ovarian and breast cancers.
[0127] FIG. 11 summarizes an exemplary verification process of prioritizing the top 10 clones and performing functional validation to calculate a positive predictive value (PPV) for the top clones.
[0128] FIGs. 12A-12C show PPV results calculated for top clones. FIG. 12A shows CD8 PPV results. FIG. 12B shows CD4 PPV results. FIG. 12C shows CD4 CD8 combined PPV results.
[0129] FIGs. 13A-13C show antigen capture results for overall selectable and top clones. FIG. 13A shows results for CD8 clones. FIG. 13B shows results for CD4 clones. FIG. 13C shows results for combined CD4 and CD8 clones.
DETAILED DESCRIPTION
Introduction
[0130] The present disclosure provides predictive identification of tumor antigen-reactive T-cell receptors (TCRs), for example, in personalized adoptive T cell therapy in solid tumors. With single cell transcriptome (scGEX) and TCR (scTCR) sequence data generated on tumor-infiltrating lymphocytes (TILs), a tumor antigen-agnostic prediction strategy can be developed to identify tumor-specific TCRs, for both CD8+ and CD4+ T cells, using molecular signatures captured in the sequencing data. In some embodiments, scGEX and scTCR data are generated with a known sequencing platform (e.g., the 10x sequencing platform) with a target coverage of 20,000 reads per cell for GEX and 5,000 reads per cell for TCR.
[0131] Provided herein is an end-to-end algorithm with scGEX and scTCR as the algorithm’s sole input. T cells are partitioned into CD8+ and CD4+ compartments bioinformatically using scGEX data as input. Although this step can be done experimentally at an initial TIL sorting step, experimental sorting potentially leads to lower yields of T cells post-sort. This is especially a concern for CD8+ T cells, given the imbalanced CD4:CD8 ratio often observed in solid tumors and a higher cost of goods. Furthermore, bioinformatically sorting the CD8+ and CD4+ T cells can reduce material costs that would come from sorting. Exhaustion scores can then be calculated for each cell in the two compartments derived from the above step. CD4+ and CD8+ T cell clones are ranked, separately, based on clone size and exhaustion scores. The top N clones (for example, an N of 10 clones encompassing both CD4+ and CD8+ clones) can be selected.
[0132] Further provided herein are gene signatures developed to identify tumor-specific TCRs, for both CD8+ and CD4+ T cells, using molecular signatures captured in the sequencing data. A gene signature to partition T cells into CD4+ and CD8+ can be used. A signature gene list can be developed for compartmentalizing with only scGEX data. This step can be pivotal due to the lower mRNA expression level of the CD4 gene and the high dropout rate in single cell sequencing. Two gene lists or exhaustion gene signatures, for CD4+ and CD8+ cells, respectively, can be used to estimate the exhaustion state as represented by an exhaustion score of the CD4+ or CD8+ cells. For example, a short gene list (20 genes) for CD4+ can be used to calculate the exhaustion scores on CD4+ follicular helper cells (CD4.FH) relying on a sequencing depth-based normalization method on gene expression measured by scGEX using unique molecular identifiers (UMIs). In addition, gene set enrichment analysis (GSEA) can be applied on a longer list of 88 genes to calculate the enrichment score of this exhaustion signature in each cell instead; because this calculation is based on raw UMI ranks, it is independent of the normalization method. The exhaustion scores from the two lists serve to complement one another. Similarly, for CD8+, a short gene list (20 genes) to compute a normalization-based score and a long gene list (61 genes) to compute a GSEA score can be used to estimate the exhaustion state of CD8+ cells. The performance of the end-to-end algorithm for selecting 10 candidate tumor-reactive TCR clonotypes has been validated experimentally and demonstrated with a combined 70% positive predictive value (PPV) as shown in Table 7.
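By way of non-limiting illustration, the two complementary per-cell scores described above can be sketched as follows. The first function is a depth-normalized score; the counts-per-10,000 and log1p normalization is an assumption. The second function is a simplified, unweighted running-sum enrichment score computed from raw UMI ranks; it is a stand-in for GSEA, not the GSEA implementation used in the disclosure. The gene names are placeholders, not members of the disclosed signatures.

```python
import math


def depth_normalized_score(umi_counts, signature, scale=10_000):
    """Mean log-normalized expression of signature genes in one cell.

    umi_counts: dict mapping gene -> raw UMI count for a single cell.
    Assumed normalization: counts per `scale` total UMIs, then log1p.
    """
    total = sum(umi_counts.values())
    if total == 0 or not signature:
        return 0.0
    values = [math.log1p(umi_counts.get(g, 0) / total * scale) for g in signature]
    return sum(values) / len(values)


def rank_enrichment_score(umi_counts, signature):
    """Unweighted running-sum enrichment score from raw UMI ranks.

    Genes are ordered by descending raw UMI count, so the score depends
    only on ranks and not on any normalization. Ties are broken by gene
    name, which is an arbitrary choice for this sketch.
    """
    ranked = sorted(umi_counts, key=lambda g: (-umi_counts[g], g))
    hits = set(signature) & set(ranked)
    n_hit, n_miss = len(hits), len(ranked) - len(hits)
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for gene in ranked:
        running += 1.0 / n_hit if gene in hits else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the largest deviation from zero
    return best


cell = {"g1": 5, "g2": 3, "g3": 2, "g4": 0}  # placeholder genes
short_score = depth_normalized_score(cell, ["g1", "g2"])
long_score = rank_enrichment_score(cell, ["g1", "g2"])
```

Multiplying every count in a cell by a constant leaves the rank-based score unchanged, which is the sense in which it is independent of the normalization method.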
Definitions
[0133] To facilitate an understanding of the present disclosure, a number of terms and phrases are defined below. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
[0134] The term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, e.g., the limitations of the measurement system. For example, “about” can mean within 1 or more than 1 standard deviation, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, up to 10%, up to 5%, or up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude, preferably within 5-fold, and more preferably within 2-fold, of a value. Where particular values are described in the application and claims, unless otherwise stated the term “about” meaning within an acceptable error range for the particular value should be assumed.
[0135] The term “and/or” used herein to link one or more species means one of the species or any combinations of the one or more species.
[0136] An antigen is a foreign substance to the body that induces an immune response. A “neoantigen” refers to a class of tumor antigens which arise from tumor-specific changes in proteins. Neoantigens encompass, but are not limited to, tumor antigens which arise from, for example, a substitution in a protein sequence, a frame shift mutation, a fusion polypeptide, an in-frame deletion, an insertion, and expression of an endogenous retroviral polypeptide.
[0137] A “neoepitope” refers to an epitope that is not present in a reference, such as a non-diseased cell, e.g., a non-cancerous cell or a germline cell, but is found in a diseased cell, e.g., a cancer cell. This includes situations where a corresponding epitope is found in a normal non-diseased cell or a germline cell but, due to one or more mutations in a diseased cell, e.g., a cancer cell, the sequence of the epitope is changed so as to result in the neoepitope.
[0138] A “mutation” refers to a change of or a difference in a nucleic acid sequence (e.g., a nucleotide substitution, addition or deletion) compared to a reference nucleic acid. A “somatic mutation” can occur in any of the cells of the body except the germ cells (sperm and egg) and is not passed on to children. These alterations can (but do not always) cause cancer or other diseases. In some embodiments, a mutation is a non-synonymous mutation. A “non-synonymous mutation” refers to a mutation (e.g., a nucleotide substitution) that results in an amino acid change, such as an amino acid substitution, in the translation product. A “frameshift” occurs when a mutation disrupts the normal phase of a gene’s codon periodicity (also known as “reading frame”), resulting in translation of a non-native protein sequence. It is possible for different mutations in a gene to achieve the same altered reading frame.
[0139] An “antigen-presenting cell” (APC) refers to a cell that expresses an MHC molecule and can present an epitope in complex with the MHC molecule. The cell can present peptide fragments of protein antigens in association with MHC molecules on its cell surface. The term includes professional antigen-presenting cells (e.g., B lymphocytes, macrophages, dendritic cells) as well as any other cells that express an MHC and can present an epitope in complex with the MHC (e.g., keratinocytes, endothelial cells, astrocytes, fibroblasts, oligodendrocytes). The APC can be a tissue-specific APC (e.g., Langerhans cells, Kupffer cells, microglia). The APC can be a cell that is engineered to express an MHC molecule or a cell that expresses an endogenous MHC molecule.
[0140] The term “derived” when used to discuss an epitope is a synonym for “prepared.” A derived epitope can be isolated from a natural source, or it can be synthesized according to standard protocols in the art. Synthetic epitopes can comprise artificial amino acid residues (“amino acid mimetics”), such as D isomers of naturally occurring L amino acid residues or non-natural amino acid residues such as cyclohexylalanine. A derived or prepared epitope can be an analog of a native epitope. The term “derived from” refers to the origin or source, and can include naturally occurring, recombinant, unpurified, purified or differentiated molecules or cells. For example, an expanded or induced antigen specific T cell can be derived from a T cell. For example, an expanded or induced antigen specific T cell can be derived from an antigen specific T cell in a biological sample. For example, a matured APC (e.g., a professional APC) can be derived from a non-matured APC (e.g., an immature APC). For example, an APC can be derived from a monocyte (e.g., a CD14+ monocyte). For example, an APC can be derived from a bone marrow cell.
[0141] An “epitope” is the collective features of a molecule (e.g., a peptide’s charge and primary, secondary and tertiary structure) that together form a site recognized by another molecule (e.g., an immunoglobulin, T-cell receptor, HLA molecule, or chimeric antigen receptor). For example, an epitope can be a set of amino acid residues involved in recognition by a particular immunoglobulin; a Major Histocompatibility Complex (MHC) receptor; or in the context of T cells, those residues recognized by a T-cell receptor protein and/or a chimeric antigen receptor. Epitopes can be prepared by isolation from a natural source, or they can be synthesized according to standard protocols in the art. Synthetic epitopes can comprise artificial amino acid residues, amino acid mimetics, (such as D isomers of naturally-occurring L amino acid residues or non-naturally-occurring amino acid residues). Throughout this disclosure, epitopes can be referred to in some cases as peptides or peptide epitopes. In certain embodiments, there is a limitation on the length of a peptide of the present disclosure. The embodiment that is length-limited occurs when the protein or peptide comprising an epitope described herein comprises a region (i.e., a contiguous series of amino acid residues) having 100% identity with a native sequence. In order to avoid the definition of epitope from reading, e.g., on whole natural molecules, there is a limitation on the length of any region that has 100% identity with a native peptide sequence. Thus, for a peptide comprising an epitope described herein and a region with 100% identity with a native peptide sequence, the region with 100% identity to a native sequence generally has a
length of: less than or equal to 600 amino acid residues, less than or equal to 500 amino acid residues, less than or equal to 400 amino acid residues, less than or equal to 250 amino acid residues, less than or equal to 100 amino acid residues, less than or equal to 85 amino acid residues, less than or equal to 75 amino acid residues, less than or equal to 65 amino acid residues, and less than or equal to 50 amino acid residues. In certain embodiments, an “epitope” described herein is comprised by a peptide having a region with less than 51 amino acid residues that has 100% identity to a native peptide sequence, in any increment down to 5 amino acid residues; for example 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1 amino acid residues.
[0142] A “T cell epitope” refers to a peptide sequence bound by an MHC molecule in the form of a peptide-MHC (pMHC) complex. A peptide-MHC complex can be recognized and bound by a TCR of a T cell (e.g., a cytotoxic T-lymphocyte or a T-helper cell).
[0143] A “T cell” includes CD4+ T cells and CD8+ T cells. The term T cell also includes both T helper 1 type T cells and T helper 2 type T cells. T cells can be generated by the method described in the application, for a clinical application. T cells or adoptive T cells referred to here, such as for a clinical application, are cells isolated from a biological source, manipulated and cultured ex vivo and prepared into a drug candidate for a specific therapy such as a cancer therapy. When candidate cells pass specific qualitative and quantitative criteria for fitness for a clinical application, the drug candidate can be designated a drug product. In some cases, a drug product is selected from a number of drug candidates. When candidate vaccines comprising candidate antigens pass specific qualitative and quantitative criteria for fitness for a clinical application, such candidate vaccines can be designated a drug product. In the context of this application, a drug product can be a vaccine, such as an mRNA-based vaccine, a T cell, more specifically, a population of T cells, or more specifically a population of T cells with heterogeneous characteristics and subtypes. For example, a drug product as disclosed herein can have a population of T cells comprising CD8+ T cells and CD4+ T cells, with at least a certain percentage of the cells exhibiting antigen specificity, a certain percentage of each exhibiting a memory phenotype, among others.
[0144] As used herein “tumor infiltrating lymphocytes” or “TILs” refers to a population of cells originally obtained as white blood cells that have left the bloodstream of a subject and migrated into a tumor. TILs include, but are not limited to, CD8+ cytotoxic T cells (lymphocytes), Th1 and Th17 CD4+ T cells, and natural killer cells.
[0145] An “immune cell” refers to a cell that plays a role in the immune response. Immune cells are of hematopoietic origin, and include lymphocytes, such as B cells, T cells and natural killer cells; and myeloid cells, such as monocytes, macrophages (e.g., M1 macrophages), dendritic cells, eosinophils, mast cells, basophils, and granulocytes. Immune cells can migrate into tumors.
[0146] A “T-cell receptor” (“TCR”) refers to a molecule, whether natural or partly or wholly synthetically produced, found on the surface of T lymphocytes (T cells) that recognizes an antigen bound to a major histocompatibility complex (MHC) molecule. The ability of a T cell to recognize an antigen associated with various diseases (e.g., cancers) is conferred by its TCR, which is made up of both an alpha (α) chain and a beta (β) chain or a gamma (γ) and a delta (δ) chain. The proteins which make up these chains are encoded by DNA, which employs a unique mechanism for generating the tremendous diversity of the TCR. This multi-subunit immune recognition receptor associates with the CD3 complex and binds peptides presented by the MHC class I and II proteins on the surface of antigen-presenting cells (APCs). Binding of a TCR to a peptide on an APC is a central event in T cell activation.
[0147] A “TCR clonotype” refers to a distinct T cell receptor (TCR) comprising a pair of TCR alpha chain and a TCR beta chain (or a pair of TCR gamma chain and a TCR delta chain) that is unique to a specific T cell clone. The TCR is a complex of integral membrane proteins that participates in the activation of T cells in response to an antigen. Each T cell has a unique TCR, and when that cell replicates, all of its descendants will have the exact same TCR — this group of cells is referred to as a T cell clone. The term “TCR clonotype” can be used to identify and track these T cell clones as they respond to specific antigens and participate in immune responses. The diversity of TCR clonotypes in an individual's immune system can provide valuable insights into the breadth and specificity of their immune response. The size of a T cell clone can be determined through a process called T cell receptor (TCR) sequencing. In this process, DNA from T cells can be extracted and sequenced to identify the unique genetic arrangement that encodes the TCR of each cell. This unique sequence can be specific to each T cell clone. By counting the frequency of each unique TCR sequence, the relative size of each T cell clone in the sample can be determined due to the fact that every T cell in a specific clone has the same unique TCR sequence. So, the more frequently a particular TCR sequence appears in the sequencing data, the larger the size of that T cell clone. This type of analysis can provide valuable insights into the diversity and specificity of the immune response.
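By way of non-limiting illustration, determining relative clone size by counting the frequency of each unique TCR sequence can be sketched as follows. Representing a clonotype as a pair of alpha-chain and beta-chain sequence strings is an assumption made for this sketch, and the strings shown are placeholders rather than real sequences.

```python
from collections import Counter


def clone_sizes(cell_clonotypes):
    """Count cells per clonotype and report each clone's relative size.

    cell_clonotypes: iterable with one entry per sequenced T cell, each
    entry an (alpha_sequence, beta_sequence) pair identifying its TCR.
    Returns (clonotype, cell_count, fraction_of_cells) tuples, with the
    largest clone first.
    """
    counts = Counter(cell_clonotypes)
    total = sum(counts.values())
    return [(clonotype, n, n / total) for clonotype, n in counts.most_common()]


# Four cells, three of which share the same TCR (placeholder sequences).
cells = [
    ("alpha_1", "beta_1"),
    ("alpha_1", "beta_1"),
    ("alpha_2", "beta_2"),
    ("alpha_1", "beta_1"),
]
sizes = clone_sizes(cells)
```

Here the clone carrying ("alpha_1", "beta_1") accounts for three of the four cells and is therefore reported first.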
[0148] As used herein, a “chimeric antigen receptor” or “CAR” refers to an antigen binding protein that includes an immunoglobulin antigen binding domain (e.g., an immunoglobulin variable domain) and a T cell receptor (TCR) constant domain. As used herein, a “constant domain” of a TCR polypeptide includes a membrane-proximal TCR constant domain, a TCR transmembrane domain and/or a TCR cytoplasmic domain, or fragments thereof. For example, in some embodiments, a CAR is a monomer that includes a polypeptide comprising an immunoglobulin heavy chain variable domain linked to a TCRβ constant domain. In some embodiments, the CAR is a dimer that includes a first polypeptide comprising an immunoglobulin heavy or light chain variable domain linked to a TCRα or TCRβ constant domain and a second polypeptide comprising an immunoglobulin heavy or light chain variable domain (e.g., a κ or λ variable domain) linked to a TCRβ or TCRα constant domain.
[0149] “Major Histocompatibility Complex” or “MHC” is a cluster of genes or the protein products thereof that plays a role in control of the cellular interactions responsible for physiologic immune responses. The terms “major histocompatibility complex” and the abbreviation “MHC” can include any class of MHC molecule, such as MHC class I and MHC class II molecules, and relate to a complex of genes which occurs in all vertebrates. In humans, the MHC complex is also known as the human leukocyte antigen (HLA) complex. Thus, a “Human Leukocyte Antigen” or “HLA” refers to a human Major Histocompatibility Complex (MHC) protein (see, e.g., Stites, et al., Immunology, 8th Ed., Lange Publishing, Los Altos, Calif. (1994)). For a detailed description of the MHC and HLA complexes, see, Paul, Fundamental Immunology, 3rd Ed., Raven Press, New York (1993).
[0150] The major histocompatibility complex in the genome comprises the genetic region whose gene products expressed on the cell surface are important for binding and presenting endogenous and/or foreign antigens and thus for regulating immunological processes. MHC proteins or molecules are important for signaling between lymphocytes and antigen-presenting cells or diseased cells in immune reactions. MHC proteins or molecules bind peptides and present them for recognition by T-cell receptors. The proteins encoded by the MHC can be expressed on the surface of cells, and display both self-antigens (peptide fragments from the cell itself) and non-self-antigens (e.g., fragments of invading microorganisms) to a T-cell. MHC binding peptides can result from the proteolytic cleavage of protein antigens and represent potential lymphocyte epitopes (e.g., T cell epitope and B cell epitope). MHCs can transport the peptides to the cell surface and present them there to specific cells, such as cytotoxic T-lymphocytes, T-helper cells, or B cells. The MHC region can be divided into three subgroups, class I, class II, and class III. MHC class I proteins can contain an α-chain and β2-microglobulin (which is not part of the MHC and is encoded by chromosome 15). They can present antigen fragments to cytotoxic T-cells. MHC class II proteins can contain α- and β-chains and they can present antigen fragments to T-helper cells. MHC class III region can encode for other immune components, such as complement components and cytokines. The MHC can be both polygenic (there are several MHC class I and MHC class II genes) and polymorphic (there are multiple alleles of each gene).
[0151] A “receptor” refers to a biological molecule or a molecule grouping capable of binding a ligand. A receptor can serve to transmit information in a cell, a cell formation or an organism. A receptor comprises at least one receptor unit, for example, where each receptor unit can consist of a protein molecule. A receptor has a structure which complements that of a ligand and can complex the ligand
as a binding partner. The information is transmitted in particular by conformational changes of the receptor following complexation of the ligand on the surface of a cell. In some embodiments, a receptor is to be understood as meaning in particular proteins of MHC classes I and II capable of forming a receptor/ligand complex with a ligand, in particular a peptide or peptide fragment of suitable length. A “ligand” refers to a molecule which has a structure complementary to that of a receptor and is capable of forming a complex with this receptor. In some embodiments, a ligand is to be understood as meaning a peptide or peptide fragment which has a suitable length and suitable binding motifs in its amino acid sequence, so that the peptide or peptide fragment is capable of forming a complex with MHC proteins such as MHC class I or MHC class II proteins. In some embodiments, a “receptor/ligand complex” is also to be understood as meaning a “receptor/peptide complex” or “receptor/peptide fragment complex”, including a peptide- or peptide fragment-presenting MHC molecule such as MHC class I or MHC class II molecules.
[0152] A “native” or a “wild type” sequence refers to a sequence found in nature. The term “naturally occurring” as used herein refers to the fact that an object can be found in nature. For example, a peptide or nucleic acid that is present in an organism (including viruses) and can be isolated from a source in nature and which has not been intentionally modified by man in the laboratory is naturally occurring. The term “naturally processed” as used herein in the context of antigen processing or presentation, refers to the fact that the antigen is not pulsed or overexpressed in a cell by man in the laboratory but is presented by the cell as a product of endogenous pathways of antigen processing and presentation (e.g., via the transporter associated with antigen processing (TAP) pathway to present intracellular antigen on MHC I).
[0153] The term “motif” refers to a pattern of residues in an amino acid sequence of defined length, for example, a peptide of less than about 15 amino acid residues in length, or less than about 13 amino acid residues in length, for example, from about 8 to about 13 amino acid residues (e.g., 8, 9, 10, 11, 12, or 13) for a class I HLA motif and from about 6 to about 25 amino acid residues (e.g., 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25) for a class II HLA motif, which is recognized by a particular HLA molecule. Motifs are typically different for each HLA protein encoded by a given human HLA allele. These motifs differ in their pattern of the primary and secondary anchor residues. In some embodiments, an MHC class I motif identifies a peptide of 7, 8, 9, 10, 11, 12 or 13 amino acid residues in length. In some embodiments, an MHC class II motif identifies a peptide of 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or 26 amino acid residues in length. A “cross-reactive binding” peptide refers to a peptide that binds to more than one member of a class of binding pair members (e.g., a peptide bound by more than one HLA molecule, or a peptide bound by both a class I HLA molecule and a class II HLA molecule).
[0154] The term “residue” refers to an amino acid residue or amino acid mimetic residue incorporated into a peptide or protein by an amide bond or amide bond mimetic, or that is encoded by a nucleic acid (DNA or RNA). The nomenclature used to describe peptides or proteins follows the conventional practice. The amino group is presented to the left (the amino- or N-terminus) and the carboxyl group to the right (the carboxy- or C-terminus) of each amino acid residue. When amino acid residue positions are referred to in a peptide epitope, they are numbered in an amino to carboxyl direction with the first position being the residue located at the amino terminal end of the epitope, or the peptide or protein of which it can be a part. In the formulae representing selected specific embodiments of the present invention, the amino- and carboxyl-terminal groups, although not specifically shown, are in the form they would assume at physiologic pH values, unless otherwise specified. In the amino acid structure formulae, each residue is generally represented by standard three letter or single letter designations. The L-form of an amino acid residue is represented by a capital single letter or a capital first letter of a three-letter symbol, and the D-form for those amino acid residues having D-forms is represented by a lower case single letter or a lower case three letter symbol. However, when three letter symbols or full names are used without capitals, they can refer to L amino acid residues. Glycine has no asymmetric carbon atom and is simply referred to as “Gly” or “G”. The amino acid sequences of peptides set forth herein are generally designated using the standard single letter symbol. (A, Alanine; C, Cysteine; D, Aspartic Acid; E, Glutamic Acid; F, Phenylalanine; G, Glycine; H, Histidine; I, Isoleucine; K, Lysine; L, Leucine; M, Methionine; N, Asparagine; P, Proline; Q, Glutamine; R, Arginine; S, Serine; T, Threonine; V, Valine; W, Tryptophan; and Y, Tyrosine.)
[0155] The terms “peptide” and “peptide epitope” are used interchangeably with “oligopeptide” in the present specification to designate a series of residues connected one to the other, typically by peptide bonds between the α-amino and carboxyl groups of adjacent amino acid residues. A “synthetic peptide” refers to a peptide that is obtained from a non-natural source, e.g., is man-made. Such peptides can be produced using such methods as chemical synthesis or recombinant DNA technology. “Synthetic peptides” include “fusion proteins.”
[0156] A “conservative amino acid substitution” is one in which one amino acid residue is replaced with another amino acid residue having a similar side chain. Families of amino acid residues having similar side chains have been defined in the art, including basic side chains (e.g., lysine, arginine, histidine), acidic side chains (e.g., aspartic acid, glutamic acid), uncharged polar side chains (e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), nonpolar side chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan), beta-branched side chains (e.g., threonine, valine, isoleucine) and aromatic side chains (e.g., tyrosine, phenylalanine, tryptophan, histidine). For example, substitution of a phenylalanine for a tyrosine is a conservative
substitution. Methods of identifying nucleotide and amino acid conservative substitutions which do not eliminate peptide function are well-known in the art.
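By way of non-limiting illustration, the side chain families listed above can be encoded as sets of single letter symbols, and a substitution can be treated as conservative when both residues share at least one family. The membership of each set follows the families recited above; the shared-family rule, the dictionary name, and the function names are assumptions made for this sketch, and the check says nothing about whether peptide function is retained.

```python
# Side chain families as recited above, using single letter symbols.
SIDE_CHAIN_FAMILIES = {
    "basic": set("KRH"),
    "acidic": set("DE"),
    "uncharged polar": set("GNQSTYC"),
    "nonpolar": set("AVLIPFMW"),
    "beta-branched": set("TVI"),
    "aromatic": set("YFWH"),
}


def shared_families(residue_a, residue_b):
    """Return the names of all families containing both residues."""
    return [
        name
        for name, members in SIDE_CHAIN_FAMILIES.items()
        if residue_a in members and residue_b in members
    ]


def is_conservative(residue_a, residue_b):
    """True if two different residues share at least one family."""
    return residue_a != residue_b and bool(shared_families(residue_a, residue_b))


# Phenylalanine to tyrosine: both residues are in the aromatic family.
phe_to_tyr = is_conservative("F", "Y")
# Lysine to aspartic acid: basic versus acidic, no shared family.
lys_to_asp = is_conservative("K", "D")
```

Because some residues belong to more than one family (for example, threonine is both uncharged polar and beta-branched), the lookup returns every shared family rather than a single label.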
[0157] “Pharmaceutically acceptable” refers to a generally non-toxic, inert, and/or physiologically compatible composition or component of a composition. A “pharmaceutical excipient” or “excipient” comprises a material such as an adjuvant, a carrier, pH-adjusting and buffering agents, tonicity adjusting agents, wetting agents, preservatives, and the like. A “pharmaceutical excipient” is an excipient which is pharmaceutically acceptable.
[0158] The terms “polynucleotide” and “nucleic acid” are used interchangeably herein and refer to polymers of nucleotides of any length, and include DNA and RNA, for example, mRNA. The nucleotides can be deoxyribonucleotides, ribonucleotides, modified nucleotides or bases, and/or their analogs, or any substrate that can be incorporated into a polymer by DNA or RNA polymerase. In some embodiments, the polynucleotide and nucleic acid can be in vitro transcribed mRNA. In some embodiments, the polynucleotide that is administered using the methods of the invention is mRNA.
[0159] The terms “isolated” or “biologically pure” refer to material which is substantially or essentially free from components which normally accompany the material as it is found in its native state. Thus, isolated peptides described herein do not contain some or all of the materials normally associated with the peptides in their in situ environment. For example, an “isolated” epitope can be an epitope that does not include the whole sequence of the protein from which the epitope was derived. For example, a naturally-occurring polynucleotide or peptide present in a living animal is not isolated, but the same polynucleotide or peptide, separated from some or all of the coexisting materials in the natural system, is isolated. Such a polynucleotide can be part of a vector, and/or such a polynucleotide or peptide can be part of a composition, and still be “isolated” in that such vector or composition is not part of its natural environment. Isolated RNA molecules include in vivo or in vitro RNA transcripts of the DNA molecules described herein, and further include such molecules produced synthetically. In some embodiments, a polypeptide, antibody, polynucleotide, vector, cell, or composition which is isolated is substantially pure. The term “substantially pure” as used herein refers to material which is at least 50% pure (i.e., free from contaminants), at least 90% pure, at least 95% pure, at least 98% pure, or at least 99% pure.
[0160] The terms “identical” or percent “identity” in the context of two or more nucleic acids or polypeptides, refer to two or more sequences or subsequences that are the same or have a specified percentage of nucleotides or amino acid residues that are the same, when compared and aligned (introducing gaps, if necessary) for maximum correspondence, not considering any conservative amino acid substitutions as part of the sequence identity. The percent identity can be measured using sequence comparison software or algorithms or by visual inspection. Various algorithms and software that can
be used to obtain alignments of amino acid or nucleotide sequences are well-known in the art. These include, for example, BLAST, ALIGN, Megalign, BestFit, GCG Wisconsin Package, and variations thereof. In some embodiments, two nucleic acids or polypeptides described herein are substantially identical, meaning they have at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, and in some embodiments at least 95%, 96%, 97%, 98%, 99% nucleotide or amino acid residue identity, when compared and aligned for maximum correspondence, as measured using a sequence comparison algorithm or by visual inspection. In some embodiments, identity exists over a region of the sequences that is at least about 10, at least about 20, at least about 40-60 residues, at least about 60-80 residues in length or any integral value there between. In some embodiments, identity exists over a longer region than 60-80 residues, such as at least about 80-100 residues, and in some embodiments the sequences are substantially identical over the full length of the sequences being compared, such as an amino acid sequence of a peptide or a coding region of a nucleotide sequence.
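By way of non-limiting illustration, percent identity between two sequences that have already been aligned (with gaps introduced where necessary) can be sketched as follows. The choice of denominator (all alignment columns, including columns containing a gap) is an assumption, since conventions differ among alignment tools. Consistent with the definition above, conservative substitutions are not counted as matches. The function name and the gap character are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned, equal-length sequences.

    A column counts as identical only if both sequences carry the same
    non-gap residue. The denominator is the number of alignment columns,
    which is one of several conventions in use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0.0
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100 * matches / len(aligned_a)


# One gap column out of five: four identical columns.
identity_with_gap = percent_identity("AC-EF", "ACDEF")
# One substitution out of six columns.
identity_with_substitution = percent_identity("ACDEFG", "ACDQFG")
```

Under this convention, two sequences would meet a threshold of at least 80% identity in the first example and at least 70% identity in both examples.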
[0161] The term “subject” refers to any animal (e.g., a mammal), including, for example, humans, non-human primates, canines, felines, rodents, and the like, which is to be the recipient of a particular treatment. In some embodiments, the terms “subject” and “patient” are used interchangeably herein in reference to a human subject.
[0162] The terms “effective amount” or “therapeutically effective amount” or “therapeutic effect” refer to an amount of a therapeutic effective to “treat” a disease or disorder in a subject or mammal. The therapeutically effective amount of a drug has a therapeutic effect and as such can prevent the development of a disease or disorder; slow down the development of a disease or disorder; slow down the progression of a disease or disorder; relieve to some extent one or more of the symptoms associated with a disease or disorder; reduce morbidity and mortality; improve quality of life; or a combination of such effects.
[0163] The terms “treating” or “treatment” or “to treat” or “alleviating” or “to alleviate” refer to therapeutic measures that cure, slow down, lessen symptoms of, and/or halt progression of a diagnosed pathologic condition or disorder. Thus, those in need of treatment include those already with the disorder. In some cases, treating may refer to reducing, or ameliorating a disorder and/or symptoms associated therewith (e.g., a neoplasia or tumor or infectious agent or an autoimmune disease). “Treating” can refer to administration of the therapy to a subject after the onset, or suspected onset, of a disease (e.g., cancer or infection by an infectious agent or an autoimmune disease). “Treating” includes the concepts of “alleviating”, which refers to lessening the frequency of occurrence or recurrence, or the severity, of any symptoms or other ill effects related to the disease and/or the side effects associated with therapy. The term “treating” may also encompass the concept of “managing” which refers to reducing the severity of a disease or disorder in a patient, e.g., extending the life or
prolonging the survivability of a patient with the disease, or delaying its recurrence, e.g., lengthening the period of remission in a patient who had suffered from the disease. It is appreciated that, although not precluded, treating a disorder or condition does not require that the disorder, condition, or symptoms associated therewith be completely eliminated.
[0164] The terms “prevent” or “prevention” refer to prophylactic or preventative measures that slow down the development of a targeted pathologic condition or disorder. Thus, those in need of prevention include those prone to have the disorder or those in whom the disorder is to be prevented.
[0165] The term “depleted” when used to describe a cell sample (e.g., a peripheral blood mononuclear cell (PBMC) sample) refers to a cell sample in which a subpopulation of cells has been removed or depleted.
[0166] The term “stimulation” refers to a response induced by binding of a stimulatory molecule to its cognate ligand, thereby mediating a signal transduction event. For example, stimulation of a T cell can refer to binding of a TCR of the T cell to a peptide-MHC complex. As another example, stimulation of a T cell can refer to a step in which PBMCs are cultured together with peptide-loaded APCs.
[0167] The term “enriched” refers to a composition or fraction wherein an object species has been partially purified such that the concentration of the object species is substantially higher than the naturally occurring level of the species in a finished product without enrichment. The term “induced cell” refers to a cell that has been treated with an inducing compound, cell, or population of cells that affects the cell’s protein expression, gene expression, differentiation status, shape, morphology, viability, and the like.
[0168] A “reference” can be used to correlate and/or compare the results obtained in the methods of the present disclosure from a diseased specimen. Typically, a “reference” may be obtained on the basis of one or more normal specimens, in particular specimens which are not affected by a disease, either obtained from an individual or one or more different individuals (e.g., healthy individuals), such as individuals of the same species. A “reference” can be determined empirically by testing a sufficiently large number of normal specimens.
[0169] As used herein, a tumor, unless otherwise mentioned, is a cancerous tumor, and the terms “cancer” and “tumor” are used interchangeably throughout the document. While a tumor is a cancer of solid tissue, several of the compositions and methods described herein are in principle applicable to cancers of the blood, such as leukemia.
[0170] As used herein, the term “mRNA,” sometimes referred to as “mRNA transcripts,” includes, but is not limited to, pre-mRNA transcript(s), transcript processing intermediates, mature mRNA(s) ready for translation and transcripts of the gene or genes, or nucleic acids derived from the mRNA transcript(s). Transcript processing can include splicing, editing and degradation. As used herein, a nucleic acid derived from an mRNA transcript refers to a nucleic acid for whose synthesis the mRNA transcript or a subsequence thereof has ultimately served as a template. Thus, a cDNA reverse transcribed from an mRNA, an RNA transcribed from that cDNA, a DNA amplified from the cDNA, an RNA transcribed from the amplified DNA, etc., are all derived from the mRNA transcript, and detection of such derived products is indicative of the presence and/or abundance of the original transcript in a sample. Thus, mRNA derived samples include, but are not limited to, mRNA transcripts of the gene or genes, cDNA reverse transcribed from the mRNA, cRNA transcribed from the cDNA, DNA amplified from the genes, RNA transcribed from amplified DNA, and the like.
[0171] The term “barcode,” as used herein, generally refers to a label, or identifier, that can be part of an analyte to convey information about the analyte. A barcode can be a tag attached to an analyte (e.g., nucleic acid molecule) or a combination of the tag in addition to an endogenous characteristic of the analyte (e.g., size of the analyte or end sequence(s)). The barcode may be unique. Barcodes can have a variety of different formats, for example, barcodes can include: polynucleotide barcodes; random nucleic acid and/or amino acid sequences; and synthetic nucleic acid and/or amino acid sequences. A barcode can be attached to an analyte in a reversible or irreversible manner. A barcode can be added to, for example, a fragment of a deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) sample before, during, and/or after sequencing of the sample. Barcodes can allow for identification and/or quantification of individual sequencing-reads in real time.
[0172] The term “sequencing,” as used herein, generally refers to methods and technologies for determining the sequence of nucleotide bases in one or more polynucleotides. The polynucleotides can be, for example, deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), including variants or derivatives thereof (e.g., single stranded DNA). Sequencing devices may provide a plurality of sequence reads corresponding to the genetic information of a subject (e.g., human), as generated by the device from a sample comprising polynucleotides.
[0173] As used herein, the term “next generation sequencing” refers to sequencing technologies having increased throughput as compared to the traditional Sanger- and capillary electrophoresis-based approaches, for example with the ability to generate hundreds of thousands or millions of relatively short sequence reads at a time. Examples of next generation sequencing techniques include, but are not limited to, sequencing by synthesis, sequencing by ligation, and sequencing by hybridization. Examples of next generation sequencing methods include, but are not limited to, pyrosequencing as used by the GS Junior and GS FLX Systems (454 Life Sciences, Branford, Conn.); sequencing by synthesis as used by the MiSeq and Solexa systems (Illumina, Inc., San Diego, Calif.); the SOLiD™ (Sequencing by Oligonucleotide Ligation and Detection) system and Ion Torrent Sequencing systems such as the Personal Genome Machine or the Proton Sequencer (Thermo Fisher Scientific, Waltham, Mass.); and nanopore sequencing systems (Oxford Nanopore Technologies, Oxford, United Kingdom).
[0174] The term “running-sum statistic” refers to a statistical measure obtained by consecutively adding (or subtracting) the values of a data set or time series. This method can be used for moving total computations. In this form of cumulative calculation, the total sum of data values is updated whenever a new data point is added to the series, or an existing data point is subtracted. Running-sum statistics can be useful for analyzing trends over time, checking data integrity, or identifying significant shifts in data points in fields such as finance, data analysis, economics, and engineering.
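By way of non-limiting illustration, a running-sum statistic of the kind defined above can be sketched in Python as follows. The function name `running_sum` and the input values are hypothetical and chosen only for illustration; they do not come from the present disclosure.

```python
from itertools import accumulate


def running_sum(values):
    """Return the cumulative totals of `values`, one total per data point."""
    return list(accumulate(values))


# Hypothetical increments: positive values add to the total, negative values subtract.
steps = [2, -1, 3, -4, 5]
totals = running_sum(steps)

# One possible summary of such a series is its largest deviation from zero.
peak = max(totals, key=abs)
print(totals, peak)
```

Each entry of `totals` is the sum of all data points seen so far, so the series is updated as each new data point is added.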
[0175] The term “pharmaceutically acceptable salt” refers to salts derived from a variety of organic and inorganic counter ions known in the art. Pharmaceutically acceptable acid addition salts can be formed with inorganic acids and organic acids. Preferred inorganic acids from which salts can be derived include, for example, hydrochloric acid, hydrobromic acid, sulfuric acid, nitric acid and phosphoric acid. Preferred organic acids from which salts can be derived include, for example, acetic acid, propionic acid, glycolic acid, pyruvic acid, oxalic acid, maleic acid, malonic acid, succinic acid, fumaric acid, tartaric acid, citric acid, benzoic acid, cinnamic acid, mandelic acid, methanesulfonic acid, ethanesulfonic acid, p-toluenesulfonic acid and salicylic acid. Pharmaceutically acceptable base addition salts can be formed with inorganic and organic bases. Inorganic bases from which salts can be derived include, for example, sodium, potassium, lithium, ammonium, calcium, magnesium, iron, zinc, copper, manganese and aluminum. Organic bases from which salts can be derived include, for example, primary, secondary, and tertiary amines, substituted amines including naturally occurring substituted amines, cyclic amines and basic ion exchange resins. Specific examples include isopropylamine, trimethylamine, diethylamine, triethylamine, tripropylamine, and ethanolamine. In some embodiments, the pharmaceutically acceptable base addition salt is chosen from ammonium, potassium, sodium, calcium, and magnesium salts. The term “cocrystal” refers to a molecular complex derived from a number of cocrystal formers known in the art. Unlike a salt, a cocrystal typically does not involve hydrogen transfer between the cocrystal and the drug, and instead involves intermolecular interactions, such as hydrogen bonding, aromatic ring stacking, or dispersive forces, between the cocrystal former and the drug in the crystal structure.
[0176] The terms “pharmaceutically acceptable carrier” or “pharmaceutically acceptable excipient” are intended to include any and all solvents, dispersion media, coatings, antibacterial and antifungal agents, isotonic and absorption delaying agents, and inert ingredients. The use of such pharmaceutically acceptable carriers or pharmaceutically acceptable excipients for active pharmaceutical ingredients is well known in the art. Except insofar as any conventional pharmaceutically acceptable carrier or pharmaceutically acceptable excipient is incompatible with the active pharmaceutical ingredient, its use in the therapeutic compositions of the invention is contemplated. Additional active pharmaceutical ingredients, such as other drugs, can also be incorporated into the described compositions, processes and methods.
Methods for Identifying Exhausted T Cells
[0177] Provided herein is a method of identifying exhausted T cells from a population of T cells obtained from a tumor microenvironment of a subject having a cancer. In some embodiments, the method comprising: (a) providing single cell transcriptome data of the population of T cells. In some embodiments, the method comprising (b) classifying each T cell of the population of T cells as a CD4+ cell or a CD8+ cell based on an expression level of each classification gene of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, at least 61, at least 62, at least 63, at least 64, at least 65, at least 66, at least 67, at least 68, at least 69, at least 70, at least 71, at least 72, at least 73, at least 74, at least 75, at least 76, at least 77, at least 78, at least 79, at least 80, at least 81, at least 82, at least 83, at least 84, at least 85, at least 86, at least 87, at least 88, at least 89, at least 90, at least 91, at least 92, at least 93, at least 94, at least 95, at least 96, at least 97, at least 98, or at least 99 classification genes from the single cell transcriptome data, thereby generating a CD4+ cluster and a CD8+ cluster. 
In some embodiments, the method comprising (c) calculating (i) a CD4+ exhaustion score and/or a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers. In some embodiments, the method comprising (c) calculating (i) a CD4+ exhaustion score and a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers. In some embodiments, the method comprising (c) calculating (i) a CD4+ exhaustion score or a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers.
In some embodiments the method comprising calculating a CD4+ exhaustion score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 CD4+ exhaustion gene markers (e.g., see Table 4). In some embodiments the method comprising calculating a CD4+ GSEA score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, at least 61, at least 62, at least 63, at least 64, at least 65, at least 66, at least 67, at least 68, at least 69, at least 70, at least 71, at least 72, at least 73, at least 74, at least 75, at least 76, at least 77, at least 78, at least 79, at least 80, at least 81, at least 82, at least 83, at least 84, at least 85, at least 86, at least 87, or at least 88 CD4+ exhaustion gene markers (e.g., see Table 6). In some embodiments, the method comprising (c) calculating (ii) a CD8+ exhaustion score and/or a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers.
In some embodiments, the method comprising (c) calculating (ii) a CD8+ exhaustion score and a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers. In some embodiments, the method comprising (c) calculating (ii) a CD8+ exhaustion score or a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers. In some embodiments the method comprising calculating a CD8+ exhaustion score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 CD8+ exhaustion gene markers (e.g., see Table 3). In some embodiments the method comprising calculating a CD8+
GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, or at least 61 CD8+ exhaustion gene markers (e.g., see Table 5). In some embodiments, the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers is different from the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers. In some embodiments, each T cell within the CD4+ cluster with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell. In some embodiments, each T cell within the CD8+ cluster with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell. In some embodiments, each T cell within the CD4+ cluster with an exhaustion score and a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell. In some embodiments, each T cell within the CD8+ cluster with an exhaustion score and a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell. In some embodiments, each T cell within the CD4+ cluster with an exhaustion score or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell. In some embodiments, each T cell within the CD8+ cluster with an exhaustion score or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell.
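By way of non-limiting illustration, the scoring and thresholding steps described above can be sketched in Python as follows. The sketch assumes, purely for illustration, that the exhaustion score of a cell is the mean expression of the marker genes in that cell; the disclosure does not fix the score to this formula here. The gene symbols, cell identifiers, expression values and threshold are hypothetical placeholders and are not taken from Tables 3 to 6.

```python
def exhaustion_score(expression, markers):
    """Mean expression of the marker genes in one cell (missing genes count as 0)."""
    return sum(expression.get(gene, 0.0) for gene in markers) / len(markers)


def identify_exhausted(cells, markers, threshold):
    """Return IDs of cells whose score is equal to or higher than the threshold."""
    return [
        cell_id
        for cell_id, expression in cells.items()
        if exhaustion_score(expression, markers) >= threshold
    ]


# Hypothetical CD8+ cluster: cell ID -> {gene symbol: normalized expression}.
cd8_cluster = {
    "cell_a": {"PDCD1": 2.0, "LAG3": 1.0, "HAVCR2": 3.0},
    "cell_b": {"PDCD1": 0.0, "LAG3": 0.5},
    "cell_c": {"PDCD1": 1.0, "LAG3": 1.0, "HAVCR2": 1.0},
}
cd8_markers = ["PDCD1", "LAG3", "HAVCR2"]  # placeholder marker set
exhausted_cd8 = identify_exhausted(cd8_cluster, cd8_markers, threshold=1.0)
print(exhausted_cd8)
```

The comparison uses `>=` so that a cell whose score equals the threshold value is identified as exhausted, consistent with the “equal to or higher than” wording above. The same functions could be applied to a CD4+ cluster with a different marker set.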
[0178] The threshold value described herein can be an arbitrary value. The threshold value can vary based on the number of gene markers used in the set. The threshold value can be a cutoff that is established from data distribution. The arbitrary cutoff can be fixed and can be determined by analysis on samples processed by the methods described herein. The cutoff can be determined by analyzing score distribution at a clonotype level and selecting a fixed cutoff based upon the overall distribution of a given population of samples.
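By way of non-limiting illustration, one way of deriving a fixed cutoff from a clonotype-level score distribution can be sketched in Python as follows. The sketch assumes that per-cell scores are averaged within each clonotype and that the cutoff is taken as a nearest-rank percentile of the resulting distribution; both choices, the 50th percentile, and all identifiers and values are hypothetical and are not specified by the disclosure.

```python
import math
from collections import defaultdict
from statistics import mean


def clonotype_scores(cell_scores, cell_clonotype):
    """Average the per-cell scores of all cells that share a clonotype."""
    grouped = defaultdict(list)
    for cell_id, score in cell_scores.items():
        grouped[cell_clonotype[cell_id]].append(score)
    return {clone: mean(scores) for clone, scores in grouped.items()}


def percentile_cutoff(values, pct):
    """Nearest-rank percentile of `values`, used here as a fixed cutoff."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical per-cell scores and clonotype assignments.
cell_scores = {"c1": 0.25, "c2": 0.75, "c3": 1.5, "c4": 2.0, "c5": 1.0}
cell_clonotype = {"c1": "A", "c2": "A", "c3": "B", "c4": "B", "c5": "C"}

per_clone = clonotype_scores(cell_scores, cell_clonotype)
cutoff = percentile_cutoff(per_clone.values(), pct=50)
above = sorted(clone for clone, score in per_clone.items() if score >= cutoff)
print(per_clone, cutoff, above)
```

In practice the distribution would be pooled over a given population of samples, and the cutoff, once selected, would be held fixed when applied to new samples.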
[0179] Further provided herein is a method of identifying exhausted T cells from a population of T cells obtained from a tumor microenvironment of a subject having a cancer. In some embodiments, the method comprising: calculating a CD4+ exhaustion score and/or a CD4+ gene set enrichment analysis (GSEA) score for a T cell classified as a CD4+ T cell. In some embodiments, the method comprising: calculating a CD4+ exhaustion score and a CD4+ gene set enrichment analysis (GSEA) score for a T cell classified as a CD4+ T cell. In some embodiments, the method comprising: calculating a CD4+ exhaustion score or a CD4+ gene set enrichment analysis (GSEA) score for a T cell classified as a CD4+ T cell. In some embodiments, the calculating is based on an expression level of each CD4+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers. In some embodiments the method comprising calculating a CD4+ exhaustion score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 CD4+ exhaustion gene markers (e.g., see Table 4).
In some embodiments the method comprising calculating a CD4+ GSEA score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, at least 61, at least 62, at least 63, at least 64, at least 65, at least 66, at least 67, at least 68, at least 69, at least 70, at least 71, at least 72, at least 73, at least 74, at least 75, at least 76, at least 77, at least 78, at least 79, at least 80, at least 81, at least 82, at least 83, at least 84, at least 85, at least 86, at least 87, or at least 88 CD4+ exhaustion gene markers (e.g., see Table 6). In some embodiments, the expression level of each CD4+ exhaustion gene marker is from single cell transcriptome data of the population of T cells from the tumor microenvironment of the subject. In some embodiments, each T cell classified as a CD4+ T cell with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell. In some embodiments, each T cell classified as a CD4+ T cell with an exhaustion score and a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell. 
In some embodiments, each T cell classified as a CD4+ T cell with an exhaustion score or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell. In some embodiments, the method further comprises, prior to calculating, classifying a T cell from the population of T cells as a CD4+ cell based on an expression level of each classification gene of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least
15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 classification genes from the single cell transcriptome data, thereby generating a CD4+ cluster.
[0180] Provided herein is a method of identifying exhausted T cells from a population of T cells obtained from a tumor microenvironment of a subject having a cancer. In some embodiments, the method comprising: calculating a CD8+ exhaustion score and/or a CD8+ gene set enrichment analysis (GSEA) score for a T cell classified as a CD8+ T cell. In some embodiments, the method comprising: calculating a CD8+ exhaustion score and a CD8+ gene set enrichment analysis (GSEA) score for a T cell classified as a CD8+ T cell. In some embodiments, the method comprising: calculating a CD8+ exhaustion score or a CD8+ gene set enrichment analysis (GSEA) score for a T cell classified as a CD8+ T cell. In some embodiments, the calculating is based on an expression level of each CD8+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers. In some embodiments the method comprising calculating a CD8+ exhaustion score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 CD8+ exhaustion gene markers (e.g., see Table 3).
In some embodiments the method comprising calculating a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, or at least 61 CD8+ exhaustion gene markers (e.g., see Table 5). In some embodiments, the expression level of each CD8+ exhaustion gene marker is from single cell transcriptome data of the population of T cells from the tumor microenvironment of the subject. In some embodiments, each T cell classified as a CD8+ T cell with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell. In some embodiments, each T cell classified as a CD8+ T cell with an exhaustion score and a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell. In some embodiments, each T cell classified as a CD8+ T cell with an exhaustion score or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell. In some embodiments, the method further comprises, prior
to calculating, classifying a T cell from the population of T cells as a CD8+ cell based on an expression level of each classification gene of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 classification genes from the single cell transcriptome data, thereby generating a CD8+ cluster.
[0181] In some embodiments, the method further comprises classifying each T cell from the population of T cells as a CD4+ cell or a CD8+ cell based on an expression level of each classification gene of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, at least 61, at least 62, at least 63, at least 64, at least 65, at least 66, at least 67, at least 68, at least 69, at least 70, at least 71, at least 72, at least 73, at least 74, at least 75, at least 76, at least 77, at least 78, at least 79, at least 80, at least 81, at least 82, at least 83, at least 84, at least 85, at least 86, at least 87, at least 88, at least 89, at least 90, at least 91, at least 92, at least 93, at least 94, at least 95, at least 96, at least 97, at least 98, or at least 99 classification genes from the single cell transcriptome data, thereby generating a CD4+ cluster and a CD8+ cluster. 
In some embodiments, the method further comprises calculating (i) a CD4+ exhaustion score and/or a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers. In some embodiments, the method further comprises calculating (i) a CD4+ exhaustion score and a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers. In some embodiments, the method further comprises calculating (i) a CD4+ exhaustion score or a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers. In some embodiments the method comprising calculating a CD4+ exhaustion score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at
least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 CD4+ exhaustion gene markers (e.g., see Table 4). In some embodiments the method comprising calculating a CD4+ GSEA score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, at least 61, at least 62, at least 63, at least 64, at least 65, at least 66, at least 67, at least 68, at least 69, at least 70, at least 71, at least 72, at least 73, at least 74, at least 75, at least 76, at least 77, at least 78, at least 79, at least 80, at least 81, at least 82, at least 83, at least 84, at least 85, at least 86, at least 87, or at least 88 CD4+ exhaustion gene markers (e.g., see Table 6). In some embodiments, the method further comprises calculating (ii) a CD8+ exhaustion score and/or a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers. 
In some embodiments, the method further comprises calculating (ii) a CD8+ exhaustion score and a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers. In some embodiments, the method further comprises calculating (ii) a CD8+ exhaustion score or a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers. In some embodiments the method comprising calculating a CD8+ exhaustion score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 CD8+ exhaustion gene markers (e.g., see Table 3). In some embodiments the method comprising calculating a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at
least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, or at least 61 CD8+ exhaustion gene markers (e.g., see Table 5). In some embodiments, the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers is different from the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers.
[0182] Further provided herein is a method of classifying CD8+ T cells and CD4+ T cells in a population of T cells. In some embodiments, the method comprises (a) providing single cell transcriptome data of a population of T cells obtained from a tumor microenvironment of a subject having a cancer. In some embodiments, the method further comprises (b) classifying each T cell of the population of T cells as a CD4+ cell or a CD8+ cell based on an expression level of each classification gene of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, at least 61, at least 62, at least 63, at least 64, at least 65, at least 66, at least 67, at least 68, at least 69, at least 70, at least 71, at least 72, at least 73, at least 74, at least 75, at least 76, at least 77, at least 78, at least 79, at least 80, at least 81, at least 82, at least 83, at least 84, at least 85, at least 86, at least 87, at least 88, at least 89, at least 90, at least 91, at least 92, at least 93, at least 94, at least 95, at least 96, at least 97, at least 98, or at least 99 classification genes selected from the group consisting of the genes of Table 2 from the single cell transcriptome data, thereby generating a CD4+ cluster and a CD8+ cluster.
In some embodiments, a T cell of the CD4+ cluster is classified as a CD4+ T cell. In some embodiments, a T cell of the CD8+ cluster is classified as a CD8+ T cell. In some embodiments, the method further comprises calculating a CD4+ exhaustion score and/or a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers. In some embodiments,
the method further comprises calculating a CD4+ exhaustion score and a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers. In some embodiments, the method further comprises calculating a CD4+ exhaustion score or a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers. In some embodiments, the method comprises calculating a CD4+ exhaustion score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 CD4+ exhaustion gene markers (e.g., see Table 4). 
In some embodiments the method comprises calculating a CD4+ GSEA score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, at least 61, at least 62, at least 63, at least 64, at least 65, at least 66, at least 67, at least 68, at least 69, at least 70, at least 71, at least 72, at least 73, at least 74, at least 75, at least 76, at least 77, at least 78, at least 79, at least 80, at least 81, at least 82, at least 83, at least 84, at least 85, at least 86, at least 87, or at least 88 CD4+ exhaustion gene markers (e.g., see Table 6). In some embodiments, the expression level of each CD4+ exhaustion gene marker is from single cell transcriptome data of a population of T cells from the tumor microenvironment of the subject. In some embodiments, each T cell within the CD4+ cluster with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell. In some embodiments, each T cell within the CD4+ cluster with an exhaustion score and a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell. 
In some embodiments, each T cell within the CD4+ cluster with an exhaustion score or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell. In some embodiments, the method further comprises calculating a CD8+ exhaustion score and/or a CD8+ GSEA score for a T
cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers. In some embodiments, the method further comprises calculating a CD8+ exhaustion score or a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers. In some embodiments, the method further comprises calculating a CD8+ exhaustion score and a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers. In some embodiments, the method comprises calculating a CD8+ exhaustion score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 CD8+ exhaustion gene markers (e.g., see Table 3).
In some embodiments the method comprises calculating a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least
26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least
35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least
44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least
53, at least 54, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, or at least 61 CD8+ exhaustion gene markers (e.g., see Table 5). In some embodiments, the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers is different from the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers. In some embodiments, the expression level of each CD8+ exhaustion gene marker is from single cell transcriptome data of a population of T cells from a tumor microenvironment of a subject having a cancer. In some embodiments, each T cell within the CD8+ cluster with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell. In some embodiments, each T cell within the CD8+ cluster with an exhaustion score and a GSEA score equal to or higher than a threshold value is
identified as an exhausted CD8+ T cell. In some embodiments, each T cell within the CD8+ cluster with an exhaustion score or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell.
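The scoring and thresholding described in paragraph [0182] can be illustrated with a minimal sketch. The sketch assumes, purely for illustration, that an exhaustion score is the mean expression of the selected marker genes in a cell; this section does not define the score formula, and the barcodes, expression values, and threshold below are hypothetical. The `mode` argument mirrors the "and" and "or" embodiments that combine an exhaustion score with a GSEA score against a threshold value.

```python
from statistics import mean

def exhaustion_score(expression, marker_genes):
    """Mean expression of the marker genes in one cell.

    Assumption: genes absent from the cell's expression dict count as 0.
    """
    return mean(expression.get(gene, 0.0) for gene in marker_genes)

def is_exhausted(ex_score, gsea_score, threshold, mode="or"):
    """Apply the threshold to both scores, combined with 'and' or 'or'."""
    ex_pass = ex_score >= threshold
    gsea_pass = gsea_score >= threshold
    if mode == "and":
        return ex_pass and gsea_pass
    if mode == "or":
        return ex_pass or gsea_pass
    raise ValueError("mode must be 'and' or 'or'")

def exhausted_barcodes(cells, marker_genes, threshold):
    """Return barcodes whose exhaustion score is equal to or higher than the threshold."""
    return [
        barcode
        for barcode, expression in cells.items()
        if exhaustion_score(expression, marker_genes) >= threshold
    ]

# Hypothetical CD4+ cluster: barcode -> {gene: normalized expression}
cd4_cluster = {
    "AAAC-1": {"CTLA4": 2.0, "TIGIT": 3.0, "TOX": 1.0},
    "TTTG-1": {"CTLA4": 0.0, "TIGIT": 0.5},
}
cd4_markers = ["CTLA4", "TIGIT", "TOX"]  # three of the CD4+ markers named in this document
print(exhausted_barcodes(cd4_cluster, cd4_markers, threshold=1.0))
```

The same functions apply to a CD8+ cluster by passing a CD8+ marker set, which per the text above may differ from the CD4+ marker set.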
[0183] In some embodiments, the method further comprises obtaining the population of T cells from the tumor microenvironment of the subject. In some embodiments, obtaining comprises isolating a tumor or a tumor tissue comprising the population of T cells from the subject.
[0184] In some embodiments, the expression level is determined by mRNA transcripts. In some embodiments, the method further comprises sequencing mRNAs from the population of T cells to obtain the single cell transcriptome data.
[0185] In some embodiments, the method further comprises providing single-cell T-cell receptor (scTCR) data of the population of T cells. In some embodiments, the method further comprises sequencing the population of T cells to obtain the scTCR data of each T cell. In some embodiments, the method further comprises identifying a TCR clonotype of an exhausted CD4+ T cell or an exhausted CD8+ T cell based on the scTCR data of exhausted CD4+ T cells and exhausted CD8+ T cells. In some embodiments, the method further comprises identifying TCR clonotypes of each exhausted CD4+ T cell of the population of T cells based on the scTCR data of exhausted CD4+ T cells. In some embodiments, the method further comprises identifying TCR clonotypes of each exhausted CD8+ cell of the population of T cells based on the scTCR data of exhausted CD8+ T cells. In some embodiments, the method further comprises identifying TCR clonotypes of each exhausted CD4+ T cell and each exhausted CD8+ T cell of the population of T cells based on the scTCR data of exhausted CD4+ T cells and exhausted CD8+ T cells.
[0186] In some embodiments, a TCR clonotype of a given exhausted CD4+ T cell is matched to the CD4+ exhaustion score and/or the CD4+ GSEA score of the same exhausted CD4+ T cell. In some embodiments, a TCR clonotype of a given exhausted CD4+ T cell is matched to the CD4+ exhaustion score and the CD4+ GSEA score of the same exhausted CD4+ T cell. In some embodiments, a TCR clonotype of a given exhausted CD4+ T cell is matched to the CD4+ exhaustion score or the CD4+ GSEA score of the same exhausted CD4+ T cell. In some embodiments, the TCR clonotype of a given exhausted CD4+ T cell is matched to the single cell transcriptome data of the same exhausted CD4+ T cell. In some embodiments, the TCR clonotype of a given exhausted CD4+ T cell is matched to the single cell transcriptome data of the same exhausted CD4+ T cell via a same single cell barcode.
[0187] In some embodiments, a TCR clonotype of a given exhausted CD8+ T cell is matched to the CD8+ exhaustion score and/or the CD8+ GSEA score of the same exhausted CD8+ T cell. In some embodiments, a TCR clonotype of a given exhausted CD8+ T cell is matched to the CD8+ exhaustion score and the CD8+ GSEA score of the same exhausted CD8+ T cell. In some embodiments, a TCR
clonotype of a given exhausted CD8+ T cell is matched to the CD8+ exhaustion score or the CD8+ GSEA score of the same exhausted CD8+ T cell. In some embodiments, the TCR clonotype of a given exhausted CD8+ T cell is matched to the single cell transcriptome data of the same exhausted CD8+ T cell. In some embodiments, the TCR clonotype of a given exhausted CD8+ T cell is matched to the single cell transcriptome data of the same exhausted CD8+ T cell via a same single cell barcode.
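The barcode-based matching in paragraphs [0186] and [0187] amounts to a join between two per-cell tables on a shared single cell barcode. The following is a minimal sketch under that reading; the dictionary layout, field names, barcodes, and score values are hypothetical and are not taken from this document.

```python
def match_by_barcode(scores, clonotypes):
    """Join per-cell scores with per-cell TCR clonotypes on the shared barcode.

    scores:     barcode -> {"exhaustion_score": float, "gsea_score": float}
    clonotypes: barcode -> clonotype identifier
    Only barcodes present in both inputs are returned.
    """
    shared = scores.keys() & clonotypes.keys()
    return {
        barcode: {"clonotype": clonotypes[barcode], **scores[barcode]}
        for barcode in shared
    }

# Hypothetical inputs from the transcriptome and scTCR data of the same cells
scores = {
    "AAAC-1": {"exhaustion_score": 1.8, "gsea_score": 0.6},
    "CCGT-1": {"exhaustion_score": 0.2, "gsea_score": 0.1},
}
clonotypes = {"AAAC-1": "clonotype1", "GGTA-1": "clonotype2"}

matched = match_by_barcode(scores, clonotypes)
print(matched)
```

Cells that appear in only one of the two data sets are dropped here; whether to keep them with missing values is a design choice this section leaves open.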
[0188] In some embodiments, the method further comprises identifying a number of single cells expressing a TCR of a given TCR clonotype that is present in a group of exhausted CD4+ T cells and a number of single cells expressing the same TCR of the given TCR clonotype that is present in a group of non-exhausted CD4+ T cells. In some embodiments, the method further comprises identifying a clone size for a TCR of a given TCR clonotype that is present in a group of exhausted CD4+ T cells and a number of single cells expressing the same TCR of the given TCR clonotype that is present in a group of non-exhausted CD4+ T cells. In some embodiments, the method further comprises identifying that, for a TCR of a given TCR clonotype, the clone size in the group of exhausted CD4+ T cells is larger than the clone size in the group of non-exhausted CD4+ T cells. In some embodiments, the method further comprises selecting a TCR of a given TCR clonotype as a candidate tumor-reactive TCR clonotype when the number of single cells expressing the TCR of the given TCR clonotype that is present in the group of exhausted CD4+ T cells is larger than the number of single cells expressing the same TCR of the given TCR clonotype that is present in the group of non-exhausted CD4+ T cells.
[0189] In some embodiments, the method further comprises identifying a number of single cells expressing a TCR of a given TCR clonotype that is present in a group of exhausted CD8+ T cells and a number of single cells expressing the same TCR of the given TCR clonotype that is present in a group of non-exhausted CD8+ T cells. In some embodiments, the method further comprises identifying a clone size for a TCR of a given TCR clonotype that is present in a group of exhausted CD8+ T cells and a number of single cells expressing the same TCR of the given TCR clonotype that is present in a group of non-exhausted CD8+ T cells. In some embodiments, the method further comprises identifying that, for a TCR of a given TCR clonotype, the clone size in the group of exhausted CD8+ T cells is larger than the clone size in the group of non-exhausted CD8+ T cells. In some embodiments, the method further comprises selecting a TCR of a given TCR clonotype as a candidate tumor-reactive TCR clonotype when the number of single cells expressing the TCR of the given TCR clonotype that is present in the group of exhausted CD8+ T cells is larger than the number of single cells expressing the same TCR of the given TCR clonotype that is present in the group of non-exhausted CD8+ T cells.
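The selection rule in paragraphs [0188] and [0189] compares, for each TCR clonotype, the number of cells in the exhausted group with the number in the non-exhausted group. A minimal sketch of that comparison follows; the input layout (a list of clonotype and exhaustion-flag pairs) and the clonotype labels are hypothetical.

```python
from collections import Counter

def candidate_clonotypes(cells):
    """Select clonotypes with more exhausted than non-exhausted cells.

    cells: list of (clonotype_id, is_exhausted) pairs, one per single cell,
           all drawn from the same lineage (CD4+ or CD8+).
    """
    exhausted = Counter(clone for clone, flag in cells if flag)
    non_exhausted = Counter(clone for clone, flag in cells if not flag)
    # Counter returns 0 for clonotypes never seen in the non-exhausted group.
    return sorted(
        clone for clone, size in exhausted.items() if size > non_exhausted[clone]
    )

# Hypothetical CD8+ cells: clonotype A is enriched among exhausted cells
cd8_cells = [
    ("A", True), ("A", True), ("A", False),
    ("B", True), ("B", False),
    ("C", False),
]
print(candidate_clonotypes(cd8_cells))
```

A clonotype with equal counts in both groups is not selected, because the text requires the exhausted count to be larger.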
[0190] In some embodiments, the method further comprises, prior to obtaining the single cell transcriptomic data, separating a subset of T cells from the population of T cells based on expression of a CD4+ and/or CD8+ exhaustion marker, thereby generating a subset of exhausted T cells and a subset of non-exhausted T cells. In some embodiments, the method further comprises, prior to obtaining the single cell transcriptomic data, separating a subset of T cells from the population of T cells based on expression of a CD4+ and CD8+ exhaustion marker, thereby generating a subset of exhausted T cells and a subset of non-exhausted T cells. In some embodiments, the method further comprises, prior to obtaining the single cell transcriptomic data, separating a subset of T cells from the population of T cells based on expression of a CD4+ or CD8+ exhaustion marker, thereby generating a subset of exhausted T cells and a subset of non-exhausted T cells. In some embodiments, the CD4+ and/or CD8+ exhaustion marker comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, at least 61, at least 62, at least 63, at least 64, at least 65, at least 66, at least 67, at least 68, at least 69, at least 70, at least 71, at least 72, at least 73, at least 74, at least 75, at least 
76, at least 77, at least 78, at least 79, at least 80, at least 81, at least 82, at least 83, at least 84, at least 85, at least 86, at least 87, at least 88, at least 89, at least 90, at least 91, at least 92, at least 93, at least 94, at least 95, at least 96, at least 97, at least 98, or at least 99 genes selected from the group consisting of genes in Tables 3-6. In some embodiments, separating comprises fluorescence activated cell sorting (FACS). In some embodiments, the method further comprises sequencing the subset of exhausted T cells and the subset of non-exhausted T cells using single cell sequencing or bulk sequencing. In some embodiments, the sequencing does not comprise using a barcode.
[0191] In some embodiments, the population of T cells is obtained from a frozen sample or a fresh sample. In some embodiments, the sample is a formalin-fixed paraffin-embedded (FFPE) sample. In some embodiments, the sample is not an FFPE sample.
[0192] In some embodiments, the sample is obtained from a tumor of a subject. In some embodiments, the subject has been treated with a therapy. In some embodiments, the subject has been treated with
the therapy prior to or concurrently with obtaining the sample. In some embodiments, the therapy comprises an immune checkpoint inhibitor.
[0193] In some embodiments, the method further comprises preparing a pharmaceutical composition using the candidate tumor-reactive TCR clonotype or a cell expressing the candidate tumor-reactive TCR clonotype.
[0194] Further provided herein is a method of identifying one or more T-cell receptors from exhausted T cells from a population of T cells obtained from a tumor microenvironment of a subject having a cancer. In some embodiments, the method comprises (a) providing single cell transcriptome data of the population of T cells. In some embodiments, the method further comprises (b) classifying each T cell of the population of T cells as a CD4+ T cell or a CD8+ T cell based on an expression level of each classification gene of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, at least 61, at least 62, at least 63, at least 64, at least 65, at least 66, at least 67, at least 68, at least 69, at least 70, at least 71, at least 72, at least 73, at least 74, at least 75, at least 76, at least 77, at least 78, at least 79, at least 80, at least 81, at least 82, at least 83, at least 84, at least 85, at least 86, at least 87, at least 88, at least 89, at least 90, at least 91, at least 92, at least 93, at least 94, at least 95, at least 96, at least 97, at least 98, or at least 99 classification genes from the single cell transcriptome data, thereby generating a CD4+ cluster and a CD8+ cluster. 
In some embodiments, the method further comprises (c) calculating (i) a CD4+ exhaustion score and/or a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers. In some embodiments, the method further comprises (c) calculating (i) a CD4+ exhaustion score or a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers. In some embodiments, the method further comprises (c) calculating (i) a CD4+ exhaustion score and a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least
2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers. In some embodiments, the method comprises calculating a CD4+ exhaustion score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 CD4+ exhaustion gene markers (e.g., see Table 4). In some embodiments, the method comprises calculating a CD4+ GSEA score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least
26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least
35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least
44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least
53, at least 54, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, at least 61, at least
62, at least 63, at least 64, at least 65, at least 66, at least 67, at least 68, at least 69, at least 70, at least
71, at least 72, at least 73, at least 74, at least 75, at least 76, at least 77, at least 78, at least 79, at least
80, at least 81, at least 82, at least 83, at least 84, at least 85, at least 86, at least 87, or at least 88 CD4+ exhaustion gene markers (e.g., see Table 6). In some embodiments, the method further comprises (c) calculating (ii) a CD8+ exhaustion score and/or a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers. In some embodiments, the method further comprises (c) calculating (ii) a CD8+ exhaustion score and a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers. In some embodiments, the method further comprises (c) calculating (ii) a CD8+ exhaustion score or a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 2, at least
3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers. In some embodiments the method comprises calculating a CD8+ exhaustion score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least
15, at least 16, at least 17, at least 18, at least 19, or at least 20 CD8+ exhaustion gene markers (e.g., see Table 3). In some embodiments the method comprises calculating a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, or at least 61 CD8+ exhaustion gene markers (e.g., see Table 5). In some embodiments, the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers is different from the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers. In some embodiments, each T cell within the CD4+ cluster with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell. In some embodiments, each T cell within the CD4+ cluster with an exhaustion score and a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell. 
In some embodiments, each T cell within the CD4+ cluster with an exhaustion score or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell. In some embodiments, each T cell within the CD8+ cluster with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell. In some embodiments, each T cell within the CD8+ cluster with an exhaustion score or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell. In some embodiments, each T cell within the CD8+ cluster with an exhaustion score and a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell. In some embodiments, the method further comprises (d) identifying TCR clonotypes of the exhausted CD4+ T cells and exhausted CD8+ T cells separately based on the scTCR data of the exhausted CD4+ T cells and exhausted CD8+ T cells identified in (c). Each TCR clonotype can comprise a paired TCR alpha chain and TCR beta chain from the single cell sequencing data, and each TCR clonotype can have a unique CDR3 sequence of a TCR beta chain and/or a unique VDJ combination.
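Step (d) can be illustrated by grouping cells into clonotypes. The sketch below assumes a clonotype key made of the beta-chain CDR3 sequence together with the beta-chain V and J genes, and it skips cells that lack either an alpha or a beta chain so that each clonotype is paired. The key choice, the field names, and the example sequences are illustrative assumptions, not definitions from this document.

```python
from collections import defaultdict

def group_clonotypes(cells):
    """Group cell barcodes by an assumed clonotype key.

    Assumed key: (beta-chain CDR3, beta-chain V gene, beta-chain J gene).
    Cells missing an alpha-chain or beta-chain CDR3 are skipped.
    """
    groups = defaultdict(list)
    for cell in cells:
        if not cell.get("tra_cdr3") or not cell.get("trb_cdr3"):
            continue
        key = (cell["trb_cdr3"], cell["trb_v"], cell["trb_j"])
        groups[key].append(cell["barcode"])
    return dict(groups)

# Hypothetical scTCR records for exhausted CD8+ T cells (sequences are made up)
exhausted_cd8 = [
    {"barcode": "AAAC-1", "tra_cdr3": "CAVRDSNYQLIW", "trb_cdr3": "CASSLGQAYEQYF",
     "trb_v": "TRBV7-9", "trb_j": "TRBJ2-7"},
    {"barcode": "CCGT-1", "tra_cdr3": "CAVRDSNYQLIW", "trb_cdr3": "CASSLGQAYEQYF",
     "trb_v": "TRBV7-9", "trb_j": "TRBJ2-7"},
    {"barcode": "GGTA-1", "tra_cdr3": None, "trb_cdr3": "CASSPTGGYEQYF",
     "trb_v": "TRBV5-1", "trb_j": "TRBJ2-7"},
]
clonotype_groups = group_clonotypes(exhausted_cd8)
print(clonotype_groups)
```

Running the function separately on the exhausted CD4+ cells and the exhausted CD8+ cells keeps the two lineages apart, as step (d) describes.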
[0195] In some embodiments, the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 60, at least
70, at least 80, at least 90, or at least 100 classification genes comprises at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 60, at least 70, at least 80, at least 90, or at least 100 genes selected from the group consisting of PTPN13, TNFRSF4, CCR6, FOXP3, TSHZ2, MFHAS1, FAAH2, CD4, GK, IL2RA, CRADD, LTB, IRS2, KLRB1, TNFRSF25, LINC02694, THADA, BATF, TNFRSF18, SELL, IL12RB2, FURIN, HIPK2, MAP3K5, TMEM173, CTSB, SAMHD1, ADAM19, ICOS, GNA15, EPSTI1, ZC3H12D, PHTF2, MAST4, UGP2, RAPGEF6, STAM, CTLA4, RORA, SATB1, ZEB1, PIM2, CD28, LDLRAD4, PELI1, RHBDD2, SOCS3, TRAF3, ABCC1, RNASET2, SPOCK2, ITK, STK24, SNX9, GZMA, RALGAPA1, GZMB, JMJD6, ZEB2, DUSP2, CLEC2B, GABARAPL1, SLA2, LITAF, AKNA, LYST, ITGA4, TUBA4A, IFNG, METRNL, CST7, IER5L, MXRA7, GGA2, AUTS2, APOBEC3G, NELL2, LYAR, GALNT11, PTMS, CMC1, AOAH, LAG3, PRF1, TNFSF9, CCL5, CCL4, CTSW, GZMH, GNLY, YBX3, GZMK, CRTAM, CD8A, KLRK1, NKG7, KLRD1, CD8B, and LINC02446.
In some embodiments, classifying each T cell of the population of T cells comprises classifying each T cell of the population of T cells as a CD4+ cell and/or a CD8+ cell based on an expression level of each classification gene of a set of from 11 to 99 classification genes selected from the group consisting of PTPN13, TNFRSF4, CCR6, FOXP3, TSHZ2, MFHAS1, FAAH2, CD4, GK, IL2RA, CRADD, LTB, IRS2, KLRB1, TNFRSF25, LINC02694, THADA, BATF, TNFRSF18, SELL, IL12RB2, FURIN, HIPK2, MAP3K5, TMEM173, CTSB, SAMHD1, ADAM19, ICOS, GNA15, EPSTI1, ZC3H12D, PHTF2, MAST4, UGP2, RAPGEF6, STAM, CTLA4, RORA, SATB1, ZEB1, PIM2, CD28, LDLRAD4, PELI1, RHBDD2, SOCS3, TRAF3, ABCC1, RNASET2, SPOCK2, ITK, STK24, SNX9, GZMA, RALGAPA1, GZMB, JMJD6, ZEB2, DUSP2, CLEC2B, GABARAPL1, SLA2, LITAF, AKNA, LYST, ITGA4, TUBA4A, IFNG, METRNL, CST7, IER5L, MXRA7, GGA2, AUTS2, APOBEC3G, NELL2, LYAR, GALNT11, PTMS, CMC1, AOAH, LAG3, PRF1, TNFSF9, CCL5, CCL4, CTSW, GZMH, GNLY, YBX3, GZMK, CRTAM, CD8A, KLRK1, NKG7, KLRD1, CD8B, and LINC02446.
[0196] In some embodiments, the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers comprises at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, or at least 20 genes selected from the group consisting of ADGRG1, CD200, CHN1, CPM, CTLA4, CXCL13, DRAIC, ENTPD1, GNG4, ICA1, IGFL2, IGFLR1, MYO7A, PDE7B, NMB, PTPN13, TIGIT, TOX, TOX2, and TSHZ2. In some embodiments, the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers comprises from 6 to 20 genes selected from the group consisting of ADGRG1, CD200, CHN1, CPM, CTLA4, CXCL13, DRAIC, ENTPD1, GNG4, ICA1, IGFL2, IGFLR1, MYO7A, PDE7B, NMB, PTPN13, TIGIT, TOX, TOX2, and TSHZ2.
[0197] In some embodiments, the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers comprises at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, or at least 20 genes selected from the group consisting of ADGRG1, CD200, CHN1, CPM, CTLA4, CXCL13, DRAIC, ENTPD1, GNG4, ICA1, IGFL2, IGFLR1, MYO7A, PDE7B, NMB, PTPN13, TIGIT, TOX, TOX2, and TSHZ2. In some embodiments, the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers comprises from 6 to 20 genes selected from the group consisting of ADGRG1, CD200, CHN1, CPM, CTLA4, CXCL13, DRAIC, ENTPD1, GNG4, ICA1, IGFL2, IGFLR1, MYO7A, PDE7B, NMB, PTPN13, TIGIT, TOX, TOX2, and TSHZ2.
[0198] In some embodiments, the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers comprises at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, or at least 20 genes selected from the group consisting of CHN1, CLECL1, CTLA4, CXCL13, CXCR6, ENTPD1, HAVCR2, KRT86, LAG3, LAYN, LRRN3, MYO1E, MYO7A, PDCD1, SIRPG, SRGAP3, STAM, TIGIT, TNFRSF18, and TOX. In some embodiments, the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers comprises from 6 to 20 genes selected from the group consisting of CHN1, CLECL1, CTLA4, CXCL13, CXCR6, ENTPD1, HAVCR2, KRT86, LAG3, LAYN, LRRN3, MYO1E, MYO7A, PDCD1, SIRPG, SRGAP3, STAM, TIGIT, TNFRSF18, and TOX.
[0199] In some embodiments, the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers comprises at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, or at least 20 genes selected from the group consisting of CHN1, CLECL1, CTLA4, CXCL13, CXCR6, ENTPD1, HAVCR2, KRT86, LAG3, LAYN, LRRN3, MYO1E, MYO7A, PDCD1, SIRPG, SRGAP3, STAM, TIGIT, TNFRSF18, and TOX. In some embodiments, the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers comprises from 6 to 20 genes selected from the group consisting of CHN1, CLECL1, CTLA4, CXCL13, CXCR6, ENTPD1, HAVCR2, KRT86, LAG3, LAYN, LRRN3, MYO1E, MYO7A, PDCD1, SIRPG, SRGAP3, STAM, TIGIT, TNFRSF18, and TOX.
[0200] In some embodiments, calculating the CD4+ exhaustion score for the T cell of the CD4+ cluster comprises (i) obtaining an UMI count for each CD4+ exhaustion gene of the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers to obtain the expression level of each CD4+ exhaustion gene of the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers; (ii) scaling the UMI count by dividing the UMI count for each gene by the total number of UMIs of the single cell transcriptome data of the T cell and then multiplying the quotient by a scale factor; (iii) applying a logarithmic transformation to obtain normalized UMI counts for the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers; and (iv) calculating the CD4+ exhaustion score for the T cell as a sum of the normalized UMI counts, wherein the T cell with a CD4+ exhaustion score equal to or higher than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more is identified as an exhausted CD4+ T cell.
[0201] In some embodiments, the scale factor is 10,000. In some embodiments, the scale factor is about 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000, 10000, 15000, 20000, 25000, 50000, 100000 or more.
[0202] The threshold value can vary depending on the number of exhaustion gene markers used. In some cases, at least 5 exhaustion gene markers are used, and the threshold value can be 0.3 or 0.35. In some cases, 20 exhaustion gene markers are used, and the threshold value can be 13.
[0203] In some embodiments, calculating the CD4+ exhaustion score for the T cell of the CD4+ cluster comprises (i) obtaining an UMI count for each CD4+ exhaustion gene of the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers to obtain the expression level of each CD4+ exhaustion gene of the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers; (ii) scaling the UMI count by dividing the UMI count for each gene by the total number of UMIs of the single cell transcriptome data of the T cell and then multiplying the quotient by a scale factor; (iii) applying a logarithmic transformation to obtain normalized UMI counts for the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers; and (iv) calculating the CD4+ exhaustion score for the T cell as a mean of the normalized UMI counts, wherein the T cell with a CD4+ exhaustion score equal to or higher than 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more is identified as an exhausted CD4+ T cell.
[0204] In some embodiments, the scale factor is 10,000. In some embodiments, the scale factor is about 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000, 10000, 15000, 20000, 25000, 50000, 100000 or more.
[0205] In some embodiments, calculating the CD8+ exhaustion score for the T cell of the CD8+ cluster comprises (i) obtaining an UMI count for each CD8+ exhaustion gene of the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers to obtain the expression level of each gene of the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 exhaustion gene markers; (ii) scaling the UMI count by dividing the UMI count for each CD8+ exhaustion gene by the total number of UMIs of the single cell transcriptome data of the T cell and then multiplying the quotient by a scale factor; (iii) applying a logarithmic transformation to obtain normalized UMI counts for the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers; and (iv) calculating the CD8+ exhaustion score for the T cell as a sum of the normalized UMI counts, wherein the T cell with a CD8+ exhaustion score equal to or higher than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more is identified as an exhausted CD8+ T cell. 
In some embodiments, calculating the CD8+ exhaustion score for the T cell of the CD8+ cluster comprises (i) obtaining an UMI count for each CD8+ exhaustion gene of the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers to obtain the expression level of each gene of the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 exhaustion gene markers; (ii) scaling the UMI count by dividing the UMI count for each CD8+ exhaustion gene by the total number of UMIs of the single cell transcriptome data of the T cell and then multiplying the quotient by a scale factor; (iii) applying a logarithmic transformation to obtain normalized UMI counts for the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers; and (iv) calculating the CD8+ exhaustion score for the T cell as a mean of the normalized UMI counts, wherein the T cell with a CD8+ exhaustion score equal to or higher than 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more is identified as an exhausted CD8+ T cell.
[0206] In some embodiments, the scale factor is 10,000. In some embodiments, the scale factor is about 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000, 10000, 15000, 20000, 25000, 50000, 100000 or more.
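The scoring steps described above (scale each marker gene's UMI count by the cell's total UMI count, multiply by a scale factor, log-transform, then take a sum or a mean) can be sketched in Python as follows. This is an illustrative sketch only and is not presented as the applicants' implementation. The function name, the dictionary input format, and the choice of the natural logarithm of one plus the scaled value are assumptions, since the text specifies only "a logarithmic transformation".

```python
import math


def exhaustion_score(umi_counts, marker_genes, scale_factor=10_000, aggregate="sum"):
    """Score one cell from its raw UMI counts.

    umi_counts   : dict mapping gene symbol -> raw UMI count (all genes of the cell)
    marker_genes : exhaustion gene markers to score (hypothetical input format)
    aggregate    : "sum" or "mean" of the normalized marker counts
    """
    if not marker_genes:
        raise ValueError("marker_genes must not be empty")
    total_umis = sum(umi_counts.values())
    if total_umis == 0:
        return 0.0  # a cell with no UMIs carries no signal

    normalized = []
    for gene in marker_genes:
        # (ii) divide by the cell's total UMIs, then multiply by the scale factor
        scaled = umi_counts.get(gene, 0) / total_umis * scale_factor
        # (iii) log transform; log(1 + x) is an assumption that keeps zero counts at zero
        normalized.append(math.log1p(scaled))

    # (iv) aggregate the normalized counts into a single score
    if aggregate == "sum":
        return sum(normalized)
    if aggregate == "mean":
        return sum(normalized) / len(normalized)
    raise ValueError("aggregate must be 'sum' or 'mean'")


def is_exhausted(score, threshold):
    # A cell is called exhausted when its score is equal to or higher than the threshold.
    return score >= threshold
```

Because a sum grows with the number of markers while a mean does not, the threshold has to be chosen to match the aggregation and the size of the marker set, which is consistent with the different threshold ranges recited for the sum-based and mean-based embodiments.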
[0207] In some embodiments, the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers comprises at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 genes selected from the group consisting of ADD3, AGFG1, AHI1, AP3S1, ARAP2, ARHGEF3, ATP2A2, CCDC6, CD200, CD27, CH25H, CHN1, CNIH1, COTL1, CPM, CRYBG1, CTLA4, CXCL13, DUSP4, ELMO1, FABP5, FBLN7, FBXO32, FKBP5, FOXN2, FYB1, GEM, GK, GPRIN3, GRSF1, GYPC, HIPK2, HMGB2, ICA1, IL6ST, IQGAP1, ITM2A, ITPR1, JARID2, LHFPL6, LIMS1, LRMP, LRRC8D, MAGEH1, MTHFD2, NAP1L4, NCOA7, NFATC2, NMB, NR3C1, NUDT16, PDCD1, PGM2L1, PHACTR2, POR, PTPN13, RBPJ, RNF19A, SESN1, SESN3, SH2D1A, SLA, SMARCA2, SMARCAD1, SMS, SNX9, SRGN, STAT3, TIAM1, TIGIT, TMEM243, TMEM64, TMEM70, TMPO, TNFAIP8, TNFRSF18, TNFSF8, TNIK, TOX, TOX2, TP53BP2, TP53INP1, TRABD2A, TSHZ2, UGCG, WNK1, YWHAQ and CD4.
In some embodiments, the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers comprises from 6 to 88 genes selected from the group consisting of ADD3, AGFG1, AHI1, AP3S1, ARAP2, ARHGEF3, ATP2A2, CCDC6, CD200, CD27, CH25H, CHN1, CNIH1, COTL1, CPM, CRYBG1, CTLA4, CXCL13, DUSP4, ELMO1, FABP5, FBLN7, FBXO32, FKBP5, FOXN2, FYB1, GEM, GK, GPRIN3, GRSF1, GYPC, HIPK2, HMGB2, ICA1, IL6ST, IQGAP1, ITM2A, ITPR1, JARID2, LHFPL6, LIMS1, LRMP, LRRC8D, MAGEH1, MTHFD2, NAP1L4, NCOA7, NFATC2, NMB, NR3C1, NUDT16, PDCD1, PGM2L1, PHACTR2, POR, PTPN13, RBPJ, RNF19A, SESN1, SESN3, SH2D1A, SLA, SMARCA2, SMARCAD1, SMS, SNX9, SRGN, STAT3, TIAM1, TIGIT, TMEM243, TMEM64, TMEM70, TMPO, TNFAIP8, TNFRSF18, TNFSF8, TNIK, TOX, TOX2, TP53BP2, TP53INP1, TRABD2A, TSHZ2, UGCG, WNK1, YWHAQ and CD4. In some embodiments, the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers comprises at least 5 genes selected from the group consisting of AHSA1, ALOX5AP, BAG3, BST2, CACYBP, CARD16, CD3D, CD7, CD82, CHN1, CLECL1, CLEC2B, CLEC2D, CTLA4, CTSD, CXCL13, CXCR6, DUSP4, ENTPD1, FKBP1A, GAPDH, GEM, GZMB, HAVCR2, HLA-DRB1, HSPB1, ICOS, IQGAP1, ITGAE, KRT86, LAG3, LAYN, LSP1, NAP1L4, NR3C1, PDCD1, PELI1, PHLDA1, POLR1E, PRDM1, PTPN22, RAB11FIP1, RAB27A, RBPJ, RGS1, RGS2, RHBDD2, RUNX2, SAMSN1, SERPINH1, SH3BGRL3, SLA, SNX9, SRGAP3, STAM, TIGIT, TNFRSF9, TOX, TTN, CD8A, and CD8B. In some embodiments, the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers comprises from 6 to 61 genes selected from the group consisting of AHSA1, ALOX5AP, BAG3, BST2, CACYBP, CARD16, CD3D, CD7, CD82, CHN1, CLECL1, CLEC2B, CLEC2D, CTLA4, CTSD, CXCL13, CXCR6, DUSP4, ENTPD1, FKBP1A, GAPDH, GEM, GZMB, HAVCR2, HLA-DRB1, HSPB1, ICOS, IQGAP1, ITGAE, KRT86, LAG3, LAYN, LSP1, NAP1L4, NR3C1, PDCD1, PELI1, PHLDA1, POLR1E, PRDM1, PTPN22, RAB11FIP1, RAB27A, RBPJ, RGS1, RGS2, RHBDD2, RUNX2, SAMSN1, SERPINH1, SH3BGRL3, SLA, SNX9, SRGAP3, STAM, TIGIT, TNFRSF9, TOX, TTN, CD8A, and CD8B.
[0208] In some embodiments, calculating the CD4+ GSEA score for the T cell of the CD4+ cluster comprises (i) obtaining an UMI count for each gene of all genes in the single cell transcriptome data; (ii) obtaining an UMI rank of all genes by ranking UMI counts of all genes; (iii) increasing a running-sum statistic for each CD4+ exhaustion gene of all genes that appears in the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers and decreasing a running-sum statistic for each gene of all genes that does not appear in the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers; and (iv) calculating the CD4+ GSEA score based on running-sum statistics, wherein the T cell with a CD4+ GSEA score equal to or higher than 0.001, equal to or higher than 0.005, equal to or higher than 0.01, equal to or higher than 0.05, equal to or higher than 0.1, equal to or higher than 0.2, equal to or higher than 0.25, equal to or higher than 0.3, equal to or higher than 0.35, equal to or higher than 0.4, equal to or higher than 0.45, equal to or higher than 0.5, equal to or higher than 0.55, equal to or higher than 0.6, equal to or higher than 0.65, equal to or higher than 0.7, equal to or higher than 0.75, equal to or higher than 0.8, equal to or higher than 0.85, or equal to or higher than 0.9 is identified as an exhausted CD4+ T cell.
[0209] In some embodiments, calculating the CD8+ GSEA score for the T cell of the CD8+ cluster comprises (i) obtaining an UMI count for each gene of all genes in the single cell transcriptome data; (ii) obtaining an UMI rank of all genes by ranking UMI counts of all genes; (iii) increasing a running-sum statistic for each CD8+ exhaustion gene of all genes that appears in the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers and decreasing a running-sum statistic for each gene of all genes that does not appear in the set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers; and (iv) calculating the CD8+ GSEA score based on running-sum statistics, wherein the T cell with a CD8+ GSEA score equal to or higher than 0.001, equal to or higher than 0.005, equal to or higher than 0.01, equal to or higher than 0.05, equal to or higher than 0.1, equal to or higher than 0.15, equal to or higher than 0.25, equal to or higher than 0.3, equal to or higher than 0.35, equal to or higher than 0.4, equal to or higher than 0.45, equal to or higher than 0.5, equal to or higher than 0.55, equal to or higher than 0.6, equal to or higher than 0.65, equal to or higher than 0.7, equal to or higher than 0.75, equal to or higher than 0.8, equal to or higher than 0.85, or equal to or higher than 0.9 is identified as an exhausted CD8+ T cell.
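The running-sum calculation described above can be illustrated with a short Python sketch. The step sizes and the way the final score is read from the running sum are assumptions borrowed from the classic unweighted enrichment statistic (increment 1/number of marker genes present, decrement 1/number of other genes, score equal to the largest deviation from zero); the text itself states only that the running sum is increased for marker genes, decreased for other genes, and that the score is based on it. The function name and input format are hypothetical.

```python
def running_sum_score(umi_counts, marker_genes):
    """Unweighted running-sum enrichment score for one cell.

    umi_counts   : dict mapping gene symbol -> UMI count for all genes of the cell
    marker_genes : exhaustion gene markers
    """
    # (ii) rank all genes from highest to lowest UMI count (ties broken by name)
    ranked = sorted(umi_counts, key=lambda g: (-umi_counts[g], g))
    markers = set(marker_genes) & set(ranked)
    n_hits = len(markers)
    n_miss = len(ranked) - n_hits
    if n_hits == 0 or n_miss == 0:
        return 0.0  # the statistic is undefined without both hits and misses

    running = 0.0
    best = 0.0
    for gene in ranked:
        # (iii) step up on a marker gene, step down on any other gene
        if gene in markers:
            running += 1.0 / n_hits
        else:
            running -= 1.0 / n_miss
        # (iv) keep the largest deviation from zero seen so far
        if abs(running) > abs(best):
            best = running
    return best
```

With these step sizes the score lies between -1 and 1, so a cell whose marker genes are concentrated at the top of its ranking scores close to 1 and can be compared against cutoffs such as those recited above.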
[0210] In some embodiments, calculating the CD4+ GSEA score for the T cell of the CD4+ cluster comprises (i) obtaining an UMI count for each gene of all genes in the single cell transcriptome data; (ii) obtaining an UMI rank of all genes by ranking UMI counts of all genes; (iii) calculating an area under the curve (AUC) value of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD4+ exhaustion gene markers; and (iv) calculating the CD4+ GSEA score based on AUC values, wherein the T cell with a CD4+ GSEA score equal to or higher than a cutoff value established from data distribution is identified as an exhausted CD4+ T cell. In some embodiments, the cutoff value is equal to or higher than 0.001, equal to or higher than 0.005, equal to or higher than 0.01, equal to or higher than 0.05, equal to or higher than 0.1, equal to or higher than 0.15, equal to or higher than 0.25, equal to or higher than 0.3, equal to or higher than 0.35, equal to or higher than 0.4, equal to or higher than 0.45, equal to or higher than 0.5, equal to or higher than 0.55, equal to or higher than 0.6, equal to or higher than 0.65, equal to or higher than 0.7, equal to or higher than 0.75, equal to or higher than 0.8, equal to or higher than 0.85, or equal to or higher than 0.9. In some embodiments, the cutoff value is 0.2. In some embodiments, calculating in (iii) comprises assessing recovery of the set of at least 5 CD4+ exhaustion genes. In some embodiments, the set of CD4+ exhaustion genes are selected among the top 5, 10, 15, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100 or more ranked genes from the UMI rank obtained in (ii).
[0211] In some embodiments, calculating the CD8+ GSEA score for the T cell of the CD8+ cluster comprises (i) obtaining an UMI count for each gene of all genes in the single cell transcriptome data; (ii) obtaining an UMI rank of all genes by ranking UMI counts of all genes; (iii) calculating an area under the curve (AUC) value of a set of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 CD8+ exhaustion gene markers; and (iv) calculating the CD8+ GSEA score based on AUC values, wherein the T cell with a CD8+ GSEA score equal to or higher than a cutoff value established from data distribution is identified as an exhausted CD8+ T cell. In some embodiments, the cutoff value is equal to or higher than 0.001, equal to or higher than 0.005, equal to or higher than 0.01, equal to or higher than 0.05, equal to or higher than 0.1, equal to or higher than 0.15, equal to or higher than 0.25, equal to or higher than 0.3, equal to or higher than 0.35, equal to or higher than 0.4, equal to or higher than 0.45, equal to or higher than 0.5, equal to or higher than 0.55, equal to or higher than 0.6, equal to or higher than 0.65, equal to or higher than 0.7, equal to or higher than 0.75, equal to or higher than 0.8, equal to or higher than 0.85, or equal to or higher than 0.9. In some embodiments, the cutoff value is 0.3. In some embodiments, calculating in (iii) comprises assessing recovery of the set of at least 5 CD8+ exhaustion genes. In some embodiments, the set of CD8+ exhaustion genes are selected among the top 5, 10, 15, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100 or more ranked genes from the UMI rank obtained in (ii).
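One way to read the AUC-based score above is as the area under a recovery curve: walk down the top-ranked genes of a cell and count how many marker genes have been recovered at each rank. The Python sketch below follows that reading. The normalization (dividing by the area obtained if every marker sat at the very top of the ranking), the function name, and the `max_rank` parameter are assumptions made for illustration and are not taken from the text.

```python
def recovery_auc(umi_counts, marker_genes, max_rank):
    """Normalized area under the marker recovery curve for one cell.

    umi_counts   : dict mapping gene symbol -> UMI count for all genes of the cell
    marker_genes : exhaustion gene markers
    max_rank     : number of top-ranked genes over which recovery is assessed
    """
    # Rank genes from highest to lowest UMI count; ties (for example, many
    # zero-count genes) are broken by name here, which is an arbitrary choice.
    ranked = sorted(umi_counts, key=lambda g: (-umi_counts[g], g))[:max_rank]
    markers = set(marker_genes)
    if not ranked or not markers:
        return 0.0

    recovered = 0
    area = 0
    for gene in ranked:
        if gene in markers:
            recovered += 1
        area += recovered  # height of the recovery curve at this rank

    # Largest possible area: all markers occupy the first ranks.
    n = min(len(markers), len(ranked))
    max_area = sum(min(rank, n) for rank in range(1, len(ranked) + 1))
    return area / max_area


def is_exhausted_by_auc(score, cutoff):
    # The cell is called exhausted when the score is equal to or higher than the cutoff.
    return score >= cutoff
```

Because the score depends only on gene ranks within a cell, it is insensitive to differences in sequencing depth between cells, which is one reason a rank-based score can complement the normalized-count exhaustion score.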
[0212] In some embodiments, the method further comprises calculating the CD4+ exhaustion score and the CD4+ GSEA score for the T cell of the CD4+ cluster. In some embodiments, the method further comprises calculating the CD8+ exhaustion score and the CD8+ GSEA score for the T cell of the CD8+ cluster. In some embodiments, the method further comprises identifying TCR clonotypes of the exhausted CD4+ T cells and exhausted CD8+ cells separately based on the scTCR data of exhausted CD4+ T cells and exhausted CD8+ T cells identified in (c). In some embodiments, the exhausted CD4+ T cells have both the CD4+ exhaustion score and the CD4+ GSEA score above the threshold value. In some embodiments, the exhausted CD8+ T cells have both the CD8+ exhaustion score and the CD8+ GSEA score above the threshold value. In some embodiments, the exhausted CD4+ T cells have the CD4+ exhaustion score or the CD4+ GSEA score above the threshold value. In some embodiments, the exhausted CD8+ T cells have the CD8+ exhaustion score or the CD8+ GSEA score above the threshold value. In some embodiments, the method further comprises, for each TCR clonotype identified in a CD4+ exhausted T cell, calculating a mean or median CD4+ exhaustion score and/or a mean or median CD4+ GSEA score for all CD4+ exhausted T cells having the same TCR clonotype. In some embodiments, the method further comprises, for each TCR clonotype identified in a CD8+ exhausted T cell, calculating a mean or median CD8+ exhaustion score and/or a mean or median CD8+ GSEA score for all CD8+ exhausted T cells having the same TCR clonotype. In some embodiments, the method further comprises, for each TCR clonotype identified in a CD4+ exhausted T cell, identifying a maximum CD4+ exhaustion score and/or a maximum CD4+ GSEA score for all CD4+ exhausted T cells having the same TCR clonotype. In some embodiments, the method further comprises, for each TCR clonotype identified in a CD8+ exhausted T cell, identifying a maximum CD8+ exhaustion score and/or a maximum CD8+ GSEA score for all CD8+ exhausted T cells having the same TCR clonotype.
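The per-clonotype summaries described in this paragraph (mean, median, and maximum score over all exhausted cells sharing a clonotype) amount to a group-by operation. The following Python sketch shows one possible form; the function name and the `(clonotype_id, score)` input format are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_clonotypes(cells):
    """Summarize scores per TCR clonotype.

    cells : iterable of (clonotype_id, score) pairs, one pair per exhausted cell
    Returns a dict: clonotype_id -> {"n_cells", "mean", "median", "max"}.
    """
    scores_by_clone = defaultdict(list)
    for clonotype_id, score in cells:
        scores_by_clone[clonotype_id].append(score)

    return {
        clonotype_id: {
            "n_cells": len(scores),   # clone size among the exhausted cells
            "mean": mean(scores),
            "median": median(scores),
            "max": max(scores),
        }
        for clonotype_id, scores in scores_by_clone.items()
    }
```

The same function can be applied once to exhaustion scores and once to GSEA scores, and separately to the CD4+ and CD8+ groups, so that each clonotype carries one summary per score type.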
[0213] In some embodiments, a TCR clonotype of a given exhausted CD4+ T cell is matched to the CD4+ exhaustion score and/or the CD4+ GSEA score of the same exhausted CD4+ T cell. In some embodiments, the TCR clonotype of a given exhausted CD4+ T cell is matched to the single cell transcriptome data of the same exhausted CD4+ T cell. In some embodiments, the TCR clonotype of a given exhausted CD4+ T cell is matched to the single cell transcriptome data of the same exhausted CD4+ T cell via a same single cell barcode. In some embodiments, the method can further comprise matching the TCR clonotype of a given exhausted CD4+ T cell to the single cell transcriptome data of the same exhausted CD4+ T cell. In some embodiments, the method can further comprise matching a barcode of the TCR clonotype of a given exhausted CD4+ T cell to the same barcode of the single cell transcriptome data of the same exhausted CD4+ T cell.
[0214] In some embodiments, a TCR clonotype of a given exhausted CD8+ T cell is matched to the CD8+ exhaustion score and/or the CD8+ GSEA score of the same exhausted CD8+ T cell. In some embodiments, a TCR clonotype of a given exhausted CD8+ T cell is matched to the CD8+ exhaustion score and the CD8+ GSEA score of the same exhausted CD8+ T cell. In some embodiments, a TCR clonotype of a given exhausted CD8+ T cell is matched to the CD8+ exhaustion score or the CD8+ GSEA score of the same exhausted CD8+ T cell. In some embodiments, the TCR clonotype of a given exhausted CD8+ T cell is matched to the single cell transcriptome data of the same exhausted CD8+ T cell. In some embodiments, the TCR clonotype of a given exhausted CD8+ T cell is matched to the single cell transcriptome data of the same exhausted CD8+ T cell via a same single cell barcode. In some embodiments, the method can further comprise matching the TCR clonotype of a given exhausted CD8+ T cell to the single cell transcriptome data of the same exhausted CD8+ T cell. In some embodiments, the method can further comprise matching a barcode of the TCR clonotype of a given exhausted CD8+ T cell to the same barcode of the single cell transcriptome data of the same exhausted CD8+ T cell.
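Matching a clonotype to the scores of the same cell via a shared single cell barcode is, in effect, a join on the barcode. A minimal Python sketch follows; the function name, the dictionary layout, and the example barcode strings in the usage note are hypothetical placeholders rather than data from the application.

```python
def match_by_barcode(clonotype_by_barcode, scores_by_barcode):
    """Join scTCR and transcriptome-derived records on the cell barcode.

    clonotype_by_barcode : dict barcode -> clonotype identifier (from scTCR data)
    scores_by_barcode    : dict barcode -> score record (from transcriptome data)
    Returns dict barcode -> (clonotype identifier, score record) for barcodes
    present in both inputs; cells seen in only one data set are dropped.
    """
    shared = clonotype_by_barcode.keys() & scores_by_barcode.keys()
    return {
        barcode: (clonotype_by_barcode[barcode], scores_by_barcode[barcode])
        for barcode in sorted(shared)
    }
```

For example, calling `match_by_barcode({"AAAC-1": "clone_7"}, {"AAAC-1": 0.42, "TTTG-1": 0.10})` keeps only the barcode `"AAAC-1"`, because `"TTTG-1"` has no TCR record.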
[0215] In some embodiments, the method further comprises identifying a number of single cells expressing a TCR of a given TCR clonotype that is present in a group of exhausted CD4+ T cells and a number of single cells expressing the same TCR of the given TCR clonotype that is present in a group of non-exhausted CD4+ T cells. In some embodiments, the method further comprises selecting a TCR of a given TCR clonotype as a candidate tumor-reactive TCR clonotype when the number of single cells expressing the TCR of the given TCR clonotype that is present in the group of exhausted CD4+ T cells is larger than the number of single cells expressing the same TCR of the given TCR clonotype that is present in the group of non-exhausted CD4+ T cells.
[0216] In some embodiments, the method further comprises identifying a number of single cells expressing a TCR of a given TCR clonotype that is present in a group of exhausted CD8+ T cells and a number of single cells expressing the same TCR of the given TCR clonotype that is present in a group of non-exhausted CD8+ T cells. In some embodiments, the method further comprises selecting a TCR of a given TCR clonotype as a candidate tumor-reactive TCR clonotype when the number of single cells expressing the TCR of the given TCR clonotype that is present in the group of exhausted CD8+ T cells is larger than the number of single cells expressing the same TCR of the given TCR clonotype that is present in the group of non-exhausted CD8+ T cells.
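The selection rule in the two preceding paragraphs compares, for each clonotype, the number of cells found in the exhausted group with the number found in the non-exhausted group. A short Python sketch under assumed inputs is given below; the function name and the `(clonotype_id, is_exhausted)` record format are hypothetical, and the function would be run separately for the CD4+ and CD8+ groups.

```python
from collections import Counter


def select_candidate_clonotypes(cells):
    """Select clonotypes seen more often in exhausted than non-exhausted cells.

    cells : iterable of (clonotype_id, is_exhausted) pairs, one pair per cell
    Returns a sorted list of candidate clonotype identifiers.
    """
    exhausted = Counter()
    non_exhausted = Counter()
    for clonotype_id, is_exhausted in cells:
        if is_exhausted:
            exhausted[clonotype_id] += 1
        else:
            non_exhausted[clonotype_id] += 1

    # A clonotype qualifies only when its exhausted count is strictly larger;
    # a Counter returns 0 for clonotypes never seen in the other group.
    return sorted(
        clonotype_id
        for clonotype_id in exhausted
        if exhausted[clonotype_id] > non_exhausted[clonotype_id]
    )
```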
[0217] In some embodiments, the method further comprises selecting candidate tumor-reactive TCR clonotypes from the TCR clonotypes identified for the exhausted CD4+ T cells and/or the exhausted CD8+ cells. In some embodiments, the candidate tumor-reactive TCR clonotypes are further quality checked by (i) unique pairing of TCR alpha chain and TCR beta chain, (ii) match to known TCRs from a public database, and/or (iii) expression of innate immune cell markers. In some embodiments, the quality checking comprises excluding candidate tumor-reactive TCR clonotypes which (i) have unique pairing of TCR alpha chain and TCR beta chain, (ii) match to known TCRs from a public database, and/or (iii) express innate immune cell markers. In some embodiments, candidate tumor-reactive TCR clonotypes that match to a known TCR that recognizes a non-oncogenic pathogen are not selected. In some embodiments, the method further comprises ranking the candidate tumor-reactive TCR clonotypes of the exhausted CD4+ T cells based on clone size. In some embodiments, the method further comprises ranking the candidate tumor-reactive TCR clonotypes of the exhausted CD8+ T cells based on clone size. In some embodiments, the method further comprises ranking the candidate tumor-reactive TCR clonotypes with similar clone sizes based on the mean or median CD4+ exhaustion score, the maximum CD4+ exhaustion score, the mean or median CD4+ GSEA score, and/or the maximum CD4+ GSEA score for all CD4+ exhausted T cells. In some embodiments, the method further comprises ranking the candidate tumor-reactive TCR clonotypes with similar clone sizes based on the mean or median CD8+ exhaustion score, the maximum CD8+ exhaustion score, the mean or median CD8+ GSEA score, and/or the maximum CD8+ GSEA score for all CD8+ exhausted T cells.
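The filtering and ranking described in this paragraph can be sketched as a sort with a tie-breaker. In the Python sketch below, "similar clone sizes" is simplified to "equal clone sizes", which is an assumption; the record keys, the function name, and the set of known pathogen-specific CDR3 sequences are likewise hypothetical and stand in for whatever database lookup is actually used.

```python
def rank_candidates(clones, known_pathogen_cdr3s=frozenset()):
    """Filter and rank candidate clonotypes.

    clones : list of dicts with keys "cdr3_beta", "clone_size", "mean_score"
    known_pathogen_cdr3s : CDR3 sequences of known TCRs that recognize
        non-oncogenic pathogens (hypothetical input); matches are not selected
    Returns the remaining clones, largest clone first, with ties in clone size
    broken by the higher mean score.
    """
    kept = [c for c in clones if c["cdr3_beta"] not in known_pathogen_cdr3s]
    return sorted(
        kept,
        key=lambda c: (-c["clone_size"], -c["mean_score"], c["cdr3_beta"]),
    )
```

Replacing `"mean_score"` with a median or maximum score, or with a GSEA-based score, changes only the tie-breaker; the primary ordering by clone size stays the same.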
[0218] In some embodiments, the same TCR clonotype is determined by having the same CDR3 sequence. In some embodiments, the candidate tumor-reactive TCR clonotypes that match to known TCRs are determined by having the same CDR3 sequence. In some embodiments, the candidate tumor-reactive TCR clonotype of a proliferating cell is given a higher weighting value when ranking the candidate tumor-reactive TCR clonotypes. In some embodiments, a proliferating cell is identified by gene expression. In some embodiments, a proliferating cell is given a GSEA score based upon expression of genes associated with proliferation. In some embodiments, a proliferating cell is identified as having a GSEA score that has a calculated area under the curve value above a cutoff value. In some embodiments, the cutoff value is 0.3. In some embodiments, the genes associated with proliferation comprise one or more genes (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, or 50 genes) presented in Table 9.
Table 9. Proliferation Genes
[0219] In some embodiments, the candidate tumor-reactive TCR clonotypes are predicted to be therapeutically relevant. In some embodiments, a median positive predictive value (PPV) of the prediction algorithm of the methods described herein is at least 0.001, at least 0.005, at least 0.01, at least 0.05, at least 0.1, at least 0.2, at least 0.25, at least 0.3, at least 0.35, at least 0.4, at least 0.45, at least 0.5, at least 0.55, at least 0.6, at least 0.65, at least 0.7, at least 0.75, at least 0.8, at least 0.85, or at least 0.9 for CD4+ TCR clones or the median PPV is at least 0.001, at least 0.005, at least 0.01, at least 0.05, at least 0.1, at least 0.2, at least 0.25, at least 0.3, at least 0.35, at least 0.4, at least 0.45, at least 0.5, at least 0.55, at least 0.6, at least 0.65, at least 0.7, at least 0.75, at least 0.8, at least 0.85, or at least 0.9 for CD8+ TCR clones. In some embodiments, the method further comprises selecting at least one, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 candidate tumor-reactive TCR clonotypes from at least the top 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500 or more of the candidate tumor-reactive TCR clonotypes ranked. In some cases, the performance of the end-to-end algorithm for selecting at least the top 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500 or more candidate tumor-reactive TCR clonotypes has a PPV value of at least 0.08, 0.09, 0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.8, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98 or 0.99. As used herein, the PPV can be used as a precision measurement, which is the probability that a TCR clonotype predicted to be a tumor-reactive TCR through, for example, the methods or algorithm described herein, actually is a tumor-reactive TCR. It can be calculated by dividing the number of true positive results by the total number of results that returned positive (results that include false positives). PPV = True Positives / (True Positives + False Positives). For example, if in a set of 100 TCR clonotypes, the methods or algorithm identified a positive result in 50 clonotypes, of which 25 were true positives, the PPV would be 25/50 = 0.5. A PPV closer to 1 represents a more accurate prediction method. A PPV may be used to determine the accuracy of the prediction method or algorithm. A PPV may be used to adjust the prediction method to accommodate for false positive results that may be generated by the method.
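As a minimal illustration of the PPV definition above, the following Python sketch reproduces the worked example (50 predicted-positive clonotypes, 25 of them true positives). The function name and its arguments are hypothetical and are not part of the disclosed methods.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Return PPV = TP / (TP + FP) for a set of predicted-positive clonotypes."""
    predicted_positive = true_positives + false_positives
    if predicted_positive == 0:
        # PPV is undefined when nothing was predicted positive.
        raise ValueError("no clonotypes were predicted positive")
    return true_positives / predicted_positive


# Worked example from the text: 50 positive calls, 25 of which are true positives.
ppv = positive_predictive_value(true_positives=25, false_positives=25)
print(ppv)  # 0.5
```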
[0220] In some embodiments, the method further comprises delivering a nucleic acid encoding a TCR comprising the at least one candidate tumor-reactive TCR clonotype of the candidate tumor-reactive TCR clonotypes into a cell. In some embodiments, the method further comprises administering the nucleic acid encoding a TCR comprising the at least one candidate tumor-reactive TCR clonotype of the candidate tumor-reactive TCR clonotypes, or a cell comprising the nucleic acid encoding a TCR comprising the at least one candidate tumor-reactive TCR clonotype of the candidate tumor-reactive TCR clonotypes into a subject. In some embodiments, the subject is the same subject from which the population of T cells is obtained. In some embodiments, the population of T cells are tumor-infiltrating lymphocytes (TILs). In some embodiments, the population of T cells comprises at least 100, at least 500, at least 1,000, at least 2,000, at least 5,000, at least 10,000 or more cells.
[0221] Also provided herein is a method of identifying one or more T-cell receptors as one or more candidate tumor-reactive TCRs from exhausted T cells from a population of T cells obtained from a tumor microenvironment of a subject having a cancer, the method comprising: (a) providing single cell transcriptome data and single-cell T-cell receptor (scTCR) data of the population of T cells comprising exhausted CD4+ cells and exhausted CD8+ cells; and (b) identifying TCR clonotypes of the exhausted CD4+ T cells or the exhausted CD8+ cells based on the scTCR data of the exhausted CD4+ T cells or the exhausted CD8+ T cells, wherein the exhausted CD4+ T cells or the exhausted CD8+ T cells are identified based on the single cell transcriptome data.
[0222] In some embodiments, the exhausted CD4+ T cells or the exhausted CD8+ T cells are identified by any one of the methods disclosed herein. In some embodiments, each cell of the exhausted CD4+ T cells or the exhausted CD8+ T cells has an exhaustion score and/or a GSEA score equal to or higher than a threshold value. In some embodiments, each cell of the exhausted CD4+ T cells or the exhausted CD8+ T cells has an exhaustion score and a GSEA score equal to or higher than a threshold value. In some embodiments, each cell of the exhausted CD4+ T cells or the exhausted CD8+ T cells has an exhaustion score or a GSEA score equal to or higher than a threshold value. In some embodiments, the candidate tumor-reactive TCR induces activation of NFAT. In some embodiments, the candidate tumor-reactive TCR induces expression of CD69, IFN-γ, TNF-α, IL-2, and/or IL-18.
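The selection logic of steps (a) and (b) can be sketched as follows, assuming each cell record already carries a per-cell score from the transcriptome data and a CDR3 sequence from the scTCR data. The record layout, the function name, the toy CDR3 strings, and the threshold of 0.3 are all illustrative assumptions rather than values taken from the methods described herein.

```python
from collections import Counter


def exhausted_clonotypes(cells, threshold):
    """Count cells per CDR3 clonotype among cells scoring at or above threshold.

    cells: iterable of dicts with hypothetical keys "barcode", "cdr3", "score".
    Returns (cdr3, cell_count) pairs, largest clonotype first.
    """
    counts = Counter(
        cell["cdr3"]
        for cell in cells
        if cell["cdr3"] is not None and cell["score"] >= threshold
    )
    return counts.most_common()


# Toy input: invented barcodes, CDR3 strings, and scores, for illustration only.
cells = [
    {"barcode": "c1", "cdr3": "CASSLGQGYEQYF", "score": 0.42},
    {"barcode": "c2", "cdr3": "CASSLGQGYEQYF", "score": 0.35},
    {"barcode": "c3", "cdr3": "CASSPDRGNTIYF", "score": 0.10},  # below threshold
    {"barcode": "c4", "cdr3": "CASSQETQYF", "score": 0.31},
    {"barcode": "c5", "cdr3": None, "score": 0.50},  # no TCR recovered
]
ranked = exhausted_clonotypes(cells, threshold=0.3)
print(ranked)  # [('CASSLGQGYEQYF', 2), ('CASSQETQYF', 1)]
```

Ranking by clone size alone is a simplification; a weighting for proliferating cells or matches to known TCRs could be layered on top of the counts.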
[0223] Further provided herein is a nucleic acid encoding a TCR comprising the at least one candidate tumor-reactive TCR clonotype selected by any one of the methods disclosed herein.
[0224] Further provided herein is a cell comprising a TCR comprising the at least one candidate tumor-reactive TCR clonotype selected by any one of the methods described herein. Also provided herein is a cell comprising a TCR encoded by any one of the nucleic acids disclosed herein.
[0225] Further provided herein is a pharmaceutical composition comprising a TCR comprising the at least one candidate tumor-reactive TCR clonotype selected by any one of the methods disclosed herein and a pharmaceutically acceptable carrier. Further provided herein is a pharmaceutical composition comprising a TCR encoded by any one of the nucleic acids disclosed herein and a pharmaceutically acceptable carrier. Further provided herein is a pharmaceutical composition comprising any one of the cells disclosed herein, and a pharmaceutically acceptable carrier.
[0226] Further provided herein is the use of a TCR comprising the at least one candidate tumor-reactive TCR clonotype selected by the methods described above, the nucleic acid described above, the cell described above, or the pharmaceutical composition described above in the manufacture of a medicament for treating a cancer in a subject in need thereof. In some embodiments, the cancer is selected from the group consisting of bone cancer, blood cancer, lung cancer, liver cancer, pancreatic cancer, skin cancer, cancer of the head or neck, cutaneous or intraocular melanoma, uterine cancer, ovarian cancer, rectal cancer, cancer of the anal region, stomach cancer, colon cancer, breast cancer, prostate cancer, carcinoma of the sexual and reproductive organs, Hodgkin’s Disease, cancer of the esophagus, cancer of the small intestine, cancer of the endocrine system, cancer of the thyroid gland, cancer of the parathyroid gland, cancer of the adrenal gland, sarcoma of soft tissue, cancer of the bladder, cancer of the kidney, renal cell carcinoma, carcinoma of the renal pelvis, neoplasms of the central nervous system (CNS), neuroectodermal cancer, spinal axis tumors, glioma, meningioma, and pituitary adenoma.
Gene Set Enrichment Analysis (GSEA)
[0227] The process of identifying exhausted T cells from a population of T cells within a tumor microenvironment can be facilitated by the development of methods for recognizing these cells based on their transcriptome data and classifying them as CD4+ or CD8+ cells. This classification can be determined by the expression level of each classification gene from a defined set of genes. From this point, calculations can be made to determine the exhaustion score and/or GSEA score for each T cell. GSEA, a computational method, determines whether a predefined set of genes demonstrates statistically significant, concordant differences between two biological states. GSEA can play a pivotal role in further classifying and identifying CD4+ and CD8+ T cells that may be regarded as exhausted based on a certain threshold value, providing insight into the cellular dynamics at play within a tumor microenvironment. The GSEA score described herein can be calculated using an area under the curve (AUC) test.
[0228] The GSEA can be conducted by using an AUC scoring method which is implemented in a tool, for example, AUCell. AUCell uses the “Area Under the Curve” (AUC) to calculate whether a critical subset of the input gene set is enriched within the top expressed genes for each cell. The distribution of AUC scores across all the cells can allow exploring the relative expression of the signature. Since the scoring method is ranking-based, AUCell can be independent of the gene expression units and the normalization procedure. In addition, since the cells are evaluated individually, it can easily be applied to bigger datasets, subsetting the expression matrix if needed. The first step to calculate the enrichment of a signature is to create the “rankings”. These rankings can be an intermediate step to calculate the AUC, but they are kept as a separate step in the workflow in order to provide more flexibility (e.g., to save them for future analyses, to merge datasets, or process them by parts). For each cell, the genes can be ranked from highest to lowest value. The genes with same expression value can be shuffled. Therefore, genes with expression ‘0’ are randomly sorted at the end of the ranking. It may be important to check that most cells have at least the number of expressed/detected genes that are going to be used to calculate the AUC. In order to calculate the AUC, by default the top 5% of the genes in the ranking can be used. This can allow faster execution on bigger datasets and reduce the effect of the noise at the bottom of the ranking (e.g., where many genes might be tied at 0 counts). The percentage to be taken into account can be modified.
[0229] The AUC can estimate the proportion of genes in the gene-set that are highly expressed in each cell. Cells expressing many genes from the gene-set can have higher AUC values than cells expressing fewer (compensating for housekeeping genes, or genes that are highly expressed in all the cells in the dataset). Because the AUC represents the proportion of expressed genes in the gene-set, the relative AUCs across the cells can be used to explore a population of cells that are present in the dataset according to the expression of the gene-set.
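The ranking-then-AUC procedure described above can be sketched in Python as follows. This is a simplified illustration and not the AUCell implementation: the function name, the dictionary input, and the normalization (dividing by the best achievable area so that the score lies between 0 and 1) are assumptions made for the example. The 5% default and the 0.3 cutoff follow the values mentioned in the text.

```python
import random


def recovery_auc(expression, gene_set, top_fraction=0.05, seed=0):
    """Rank-based enrichment of gene_set among one cell's top-expressed genes.

    expression: dict mapping gene name -> expression value for a single cell.
    Returns a score in [0, 1]; higher means more of the set sits near the top.
    """
    rng = random.Random(seed)
    genes = list(expression)
    rng.shuffle(genes)  # random order so that ties (e.g., zeros) are broken randomly
    genes.sort(key=lambda g: expression[g], reverse=True)  # stable: keeps tie order
    max_rank = max(1, round(top_fraction * len(genes)))  # only the top ranks count
    rank = {gene: position for position, gene in enumerate(genes, start=1)}
    targets = [g for g in gene_set if g in rank]
    if not targets:
        return 0.0
    hits = [rank[g] for g in targets if rank[g] <= max_rank]
    # Area under the step curve "number of set genes recovered by rank p".
    area = sum(max_rank - r + 1 for r in hits)
    best = min(len(targets), max_rank)
    max_area = sum(max_rank - k + 1 for k in range(1, best + 1))
    return area / max_area


# Toy cell with 100 genes and distinct expression values (g0 highest).
cell = {f"g{i}": float(100 - i) for i in range(100)}
score = recovery_auc(cell, {"g0", "g1", "g50"})
print(score)        # 0.75
print(score > 0.3)  # True: above the example cutoff of 0.3
```

Because only ranks enter the calculation, rescaling the expression values of a cell by any positive factor leaves the score unchanged.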
[0230] The GSEA score can also be calculated by other methods, for example, the Kolmogorov-Smirnov (K-S) test. The K-S test is a nonparametric test used to determine whether two underlying one-dimensional probability distributions differ, or to compare a sample with a reference probability distribution. In the context of GSEA, the K-S test can be adapted to evaluate the distribution of genes within predefined sets, to see if they are randomly distributed across the ranked list of all genes in a dataset or if they tend to cluster towards the top or bottom of the list, indicating enrichment. When applied to gene set enrichment, the K-S test may involve the following steps. All genes in the study may be ranked based on their correlation with a phenotype or biological condition of interest. The ranking metric can vary but often involves measures of differential expression, such as fold change or statistical significance. For a given gene set, the K-S test calculates an enrichment score (ES) that reflects the degree to which that gene set may be overrepresented at the top or bottom of the ranked list of genes. The ES is the maximum distance between the cumulative distribution function (CDF) of the gene set and the CDF of the background gene set. Starting from the top of the ranked list, the test moves down the list, increasing a running-sum statistic when encountering a gene in the gene set and decreasing it when encountering genes not in the set. The magnitude of the increment depends on the correlation of the genes with the phenotype. The ES may be the peak deviation from zero encountered in this walk - positive if the set is enriched at the top of the ranked list, and negative if enriched at the bottom. To account for the size of the gene set, the ES can be normalized to yield a normalized enrichment score (NES), which allows comparison across gene sets of different sizes. The significance of the observed ES (or NES) can typically be assessed through permutation testing. By randomly permuting the phenotype labels or gene labels multiple times and recalculating the ES for each permutation, one can generate a null distribution of ES values against which the observed ES can be compared to estimate a p-value. Since many gene sets are tested simultaneously, correction for multiple hypothesis testing is often applied to control the false discovery rate (FDR). The K-S test in the context of GSEA provides a powerful way to identify gene sets that are significantly associated with a phenotype, taking into account the collective behavior of genes within sets rather than focusing on individual genes. This approach can be particularly useful in exploring the biological mechanisms underlying complex traits and diseases.
[0231] The specific equation for the K-S test in GSEA, which captures the essence of the enrichment score (ES) calculation, can be summarized as follows. Let S be the set of all genes, ranked based on their correlation with a phenotype. Let G be a predefined gene set whose enrichment is to be tested against S. The rank in S of each gene in G is used to calculate the ES. The ES can be calculated by walking down the list S, increasing a running-sum statistic for each gene in G and decreasing it for each gene not in G. The increase or decrease for each gene can be proportional to the gene's correlation with the phenotype. The formula for the running-sum statistic R(i) at position i in the ranked list is given by:
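$$R(i) = \sum_{j=1}^{i} X_j$$
where:
• $X_j$ is the contribution of gene $j$ to the ES. For genes in $G$, $X_j$ is a positive increment [expression missing in source], and for genes not in $G$, $X_j$ is a negative decrement [expression missing in source], where $N_R$ is the number of genes in $G$ and $N$ is the total number of genes.
• $N_H$ is the number of genes not in $G$.
• $N$ is the total number of genes in the list.
[0232] The ES is the maximum deviation from zero of R(i) across all positions i in the ranked list:
$$ES = R(i^{*}), \quad i^{*} = \arg\max_{i} \lvert R(i) \rvert$$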
[0233] This ES reflects how much the gene set G may be overrepresented at the top or bottom of the ranked list S, with higher absolute values indicating greater enrichment. The sign of the ES indicates whether the set may be enriched at the top (positive ES) or bottom (negative ES) of the ranked list S.
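The running-sum walk and the signed ES described above can be illustrated with a short Python sketch. It assumes the unweighted Kolmogorov-Smirnov step sizes, +1/N_R for each gene in G and -1/N_H for each gene not in G, so that the running sum returns to zero at the end of the list. This choice of step size is an assumption made for illustration (a correlation-weighted increment is equally compatible with the description), and the function and variable names are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Return (ES, running_sum) for gene_set over ranked_genes (top-ranked first).

    Assumed step sizes: +1/N_R for genes in the set, -1/N_H for all others.
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(in_set)                   # N_R: genes of the list that are in G
    n_miss = len(ranked_genes) - n_hit    # N_H: genes of the list that are not in G
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running, total = [], 0.0
    for hit in in_set:
        total += 1.0 / n_hit if hit else -1.0 / n_miss
        running.append(total)
    # ES is the value of R(i) with the largest absolute deviation from zero.
    es = max(running, key=abs)
    return es, running


ranked = ["A", "B", "C", "D", "E", "F", "G", "H"]
es_top, _ = enrichment_score(ranked, {"A", "B", "D"})     # set clustered at the top
es_bottom, _ = enrichment_score(ranked, {"F", "G", "H"})  # set clustered at the bottom
print(round(es_top, 3), round(es_bottom, 3))  # 0.8 -1.0
```

A permutation-based p-value, as described in the text, would repeat this calculation on shuffled gene or phenotype labels and compare the observed ES with the resulting null distribution.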
Sequencing
[0234] Various sequencing methods can be used herein. Various sequencing methods include, but are not limited to, Sanger sequencing, high-throughput sequencing, sequencing-by-synthesis, single-molecule sequencing, sequencing-by-ligation, RNA-Seq, Next generation sequencing (NGS), Digital Gene Expression, Clonal Single MicroArray, shotgun sequencing, Maxam-Gilbert sequencing, or massively-parallel sequencing. The T cells can be used as input for single-cell RNA-Seq methods such as inDrop or DropSeq. For example, the sequencing may use single cell barcoding (e.g., partitioning the cells into individual compartments, barcoding nucleic acids released from a single cell, sequencing the nucleic acids, and pairing the TCR chains from a single cell based on a same barcode). The sequencing may not comprise using a barcode if the sequence encoding the paired TCR chains within a cell has been fused or linked in a single continuous polynucleotide chain.
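Pairing TCR chains on the basis of a shared cell barcode can be sketched as below. The tuple layout, the chain labels "TRA" and "TRB", the toy barcodes and CDR3 strings, and the rule of keeping only cells with exactly one alpha and one beta chain are simplifying assumptions for the example, not requirements of the methods described herein.

```python
from collections import defaultdict


def pair_chains_by_barcode(contigs):
    """Pair alpha and beta chains that share a cell barcode.

    contigs: iterable of (barcode, chain, cdr3) tuples, with chain "TRA" or "TRB".
    Returns {barcode: (alpha_cdr3, beta_cdr3)} for cells with exactly one of each.
    """
    by_cell = defaultdict(lambda: {"TRA": [], "TRB": []})
    for barcode, chain, cdr3 in contigs:
        if chain in ("TRA", "TRB"):
            by_cell[barcode][chain].append(cdr3)
    return {
        barcode: (chains["TRA"][0], chains["TRB"][0])
        for barcode, chains in by_cell.items()
        if len(chains["TRA"]) == 1 and len(chains["TRB"]) == 1
    }


# Toy contigs with invented barcodes and CDR3 sequences.
contigs = [
    ("AAAC", "TRA", "CAVRDSNYQLIW"),
    ("AAAC", "TRB", "CASSLGQGYEQYF"),
    ("CCGT", "TRB", "CASSQETQYF"),      # beta chain only: not paired
    ("GGTA", "TRA", "CAASGGSYIPTF"),
    ("GGTA", "TRB", "CASSPDRGNTIYF"),
]
pairs = pair_chains_by_barcode(contigs)
print(pairs["AAAC"])  # ('CAVRDSNYQLIW', 'CASSLGQGYEQYF')
```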
[0235] Sequencing described herein can be single cell sequencing. Single cell sequencing refers to obtaining sequence information from individual cells. It can be used to detect the genome, transcriptome and other multi-omics of single cells. In single cell sequencing, a population of cells can be made into single cell suspension and compartmentalized into individual partitions. Within each partition, the sequences released from a single cell can be barcoded and later sequenced. Various single cell sequencing methods can be used for TCR reconstruction (see De Simone M, Rossetti G and Pagani M (2018) Single Cell T Cell Receptor Sequencing: Techniques and Future Challenges. Front. Immunol. 9: 1638).
Bulk sequencing
[0236] Bulk sequencing, also known as population or conventional sequencing, can be a technique where the genetic material (DNA or RNA) from a large population of cells is collectively extracted and sequenced. This approach, unlike single-cell sequencing, may provide an averaged view of the genetic or transcriptomic profile of all the cells within the sample, hence the term "bulk".
[0237] In practice, the bulk sequencing process can start with the isolation of genetic material from the cell population of interest. For genomic studies, DNA may be extracted, purified, and fragmented. For transcriptomic studies, total RNA can be first isolated, and then mRNA can be either directly used or converted into cDNA via reverse transcription. Once the genetic material is prepared, it may be used to construct a sequencing library. This may involve adapter ligation and may also include amplification steps. The prepared library can be then sequenced using next-generation sequencing (NGS) platforms, such as Illumina, Ion Torrent, or Pacific Biosciences.
[0238] The resulting data, however, may not allow for the resolution of individual cellular identities or states within the population. Instead, it offers an averaged “snapshot” of the cell population’s genetic or gene expression status. This could potentially mask the contributions of rare or highly variable cells within the population. Nevertheless, bulk sequencing can be useful for profiling large numbers of samples cost-effectively, establishing a baseline reference for a given tissue or cell type, or identifying common or dominant genetic or transcriptomic features of a population of cells.
Single cell analysis and compartmentalization
[0239] Sequencing described herein can be a single cell sequencing, which can be used for characterizing nucleic acids at a single-cell level. In particular, the single cell sequencing can use a droplet-based system. For example, the single cell sequencing can use a droplet-based system that enables 5’ mRNA digital counting of up to tens of thousands of single cells. In some cases, the single cell sequencing can use a droplet-based system that enables 5’ mRNA digital counting of up to hundreds of thousands of single cells, up to millions of single cells, or more. Various droplet-based systems can be used.
[0240] In some cases, the single cell analysis utilizes compartmentalization or partitioning of individual cells into discrete compartments or partitions (used interchangeably). A whole cell can be isolated in a compartment, thereby allowing that cell to remain separate from other cells of the sample. When desired, the nucleic acids from a whole cell can be released into the compartment, for example, by contacting the cell with a lysis agent or other stimulus. The released nucleic acids can remain in the compartment, separated from other cells of the sample and also the nucleic acids associated with other cells of the sample. Unique identifiers, e.g., barcodes, may be previously, subsequently or concurrently delivered to the compartments that hold single cells, in order to allow for the later attribution of, e.g., sequence information, to a particular cell. While in the partitions, unique identifiers, e.g., barcodes or barcode sequences, can be associated with the nucleic acid sequences of nucleic acids from the whole cell using various processes, including ligation and/or amplification techniques. These barcode sequences can be used to determine the origin of a nucleic acid and/or to identify various nucleic acid sequences as being associated with a particular cell. Such identification can then allow that analysis to be attributed back to the individual cell or small group of cells from which the nucleic acids were derived. This can be accomplished regardless of whether the cell population represents a 50/50 mix of cell types, a 90/10 mix of cell types, or virtually any ratio of cell types, as well as a complete heterogeneous mix of different cell types, or any mixture between these. Differing cell types may include cells or biologic organisms from different tissue types of an individual, from different individuals, from differing genera, species, strains, variants, or any combination of any or all of the foregoing. For example, differing cell types may include normal and tumor tissue from an individual, cells from a donor and a recipient (e.g., transplant), multiple different bacterial species, strains and/or variants from environmental, forensic, microbiome or other samples, or any of a variety of other mixtures of cell types.
[0241] In various embodiments, compartments comprise droplets of aqueous fluid within a non-aqueous continuous phase, e.g., an oil phase. In alternative embodiments, compartments can refer to containers or vessels (such as wells, microwells, tubes, through ports in nanoarray substrates, or other containers). These compartments may comprise, e.g., microcapsules or micro-vesicles that have an outer barrier surrounding an inner fluid center or core, or they may be a porous matrix that is capable of entraining and/or retaining materials within its matrix. A variety of different vessels are described in, for example, U.S. Patent Application Publication No. 20140155295, the full disclosure of which is incorporated herein by reference in its entirety for all purposes. Likewise, emulsion systems for creating stable droplets in non-aqueous or oil continuous phases are described in detail in, e.g., U.S. Patent Application Publication No. 20100105112, the full disclosure of which is incorporated herein by reference in its entirety for all purposes.
[0242] In the case of droplets in an emulsion, allocating individual cells to discrete compartments may generally be accomplished by introducing a flowing stream of cells in an aqueous fluid into a flowing stream of a non-aqueous fluid, such that droplets are generated at the junction of the two streams. By providing the aqueous cell-containing stream at a certain concentration level of cells, the level of occupancy of the resulting partitions in terms of numbers of cells can be controlled. In some cases, where single cell partitions are desired, it may be desirable to control the relative flow rates of the fluids such that, on average, the partitions contain less than one cell per partition, in order to ensure that those partitions which are occupied are primarily singly occupied. The flow rate can also be altered to provide a higher percentage of partitions that are occupied, e.g., allowing for only a small percentage of unoccupied partitions. In some aspects, the flows and channel architectures are controlled so as to ensure a desired number of singly occupied partitions, less than a certain level of unoccupied partitions and/or less than a certain level of multiply occupied partitions.
[0243] A droplet-based system disclosed herein can capture any suitable percentage of a cell population to be analyzed into compartments, e.g., droplets. In some cases, it is desirable to capture the entire cell population into droplets. In other cases, capture of a percentage of the cell population is desired or sufficient for downstream analysis and assay. In some embodiments, at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95% of the cells of a cell sample are captured in a droplet using a droplet-based system provided herein. In some embodiments, at most about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95% of the cells of a cell sample are captured in a droplet using a droplet-based system provided herein. In some embodiments, approximately 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95% of the cells of a cell sample are captured in a droplet using a droplet-based system provided herein. In some embodiments, between about 10% and about 95%, between about 15% and about 90%, between about 20% and about 85%, between about 25% and about 80%, between about 30% and about 75%, between about 35% and about 70%, between about 40% and about 65%, between about 45% and about 60%, or between about 50% and about 55% of cells of a cell sample are captured in a droplet using a droplet-based system provided herein. In some embodiments, the percentage of cells captured into droplets can be optimized for a particular type of assay. In some embodiments, approximately 50% of cells of a cell sample loaded into a droplet-based system are captured in a droplet.
[0244] In many cases, a substantial majority of occupied partitions (partitions containing one or more microcapsules) formed from methods and systems disclosed herein include no more than 1 cell per occupied partition. In some cases, fewer than 25% of the occupied partitions contain more than one cell, and in many cases, fewer than 20% of the occupied partitions have more than one cell, while in some cases, fewer than 10% or even fewer than 5% of the occupied partitions include more than one cell per partition.
[0245] Additionally or alternatively, in many cases, it is desirable to avoid the creation of excessive numbers of empty partitions. While this may be accomplished by providing sufficient numbers of cells into the partitioning zone, the Poisson distribution may increase the number of partitions that would include multiple cells. In some embodiments, the flow of one or more of the cells, or other fluids directed into the partitioning zone are such that, in many cases, no more than 50% of the generated partitions, 25% of partitions, or 10% of partitions are unoccupied (e.g., including less than 1 cell). Further, in some aspects, these flows are controlled so as to present non-Poisson distribution of single occupied partitions while providing lower levels of unoccupied partitions.
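The trade-off between empty partitions and multiply occupied partitions under Poisson loading can be made concrete with a short calculation. The sketch below assumes that cells arrive in partitions independently, so that the number of cells per partition follows a Poisson distribution with a given mean; the function name and the example means of 0.1 and 1.0 cells per partition are illustrative choices.

```python
import math


def poisson_occupancy(mean_cells_per_partition):
    """Fractions of empty, singly occupied, and multiply occupied partitions."""
    lam = mean_cells_per_partition
    if lam <= 0:
        raise ValueError("mean occupancy must be positive")
    p_empty = math.exp(-lam)
    p_single = lam * math.exp(-lam)
    p_multiple = 1.0 - p_empty - p_single
    return {
        "empty": p_empty,
        "single": p_single,
        "multiple": p_multiple,
        # Share of occupied partitions that hold more than one cell.
        "multiple_among_occupied": p_multiple / (1.0 - p_empty),
    }


dilute = poisson_occupancy(0.1)   # well under one cell per partition on average
dense = poisson_occupancy(1.0)    # one cell per partition on average
print(round(dilute["empty"], 3), round(dilute["multiple_among_occupied"], 3))  # 0.905 0.049
print(round(dense["empty"], 3), round(dense["multiple_among_occupied"], 3))    # 0.368 0.418
```

Under this assumption, dilute loading keeps multiplets below 5% of occupied partitions but leaves about 90% of partitions empty, whereas denser loading reduces empty partitions at the cost of many more multiplets, which is why non-Poisson loading can be desirable.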
[0246] Although described in terms of providing substantially singly occupied partitions, above, in certain cases, it is desirable to provide multiply occupied partitions, e.g., containing two, three, four or more cells within a single partition. Accordingly, as noted above, the flow characteristics of the cell and/or bead containing fluids and partitioning fluids may be controlled to provide for such multiply occupied partitions. In particular, the flow parameters may be controlled to provide a desired occupancy rate at greater than 50% of the partitions, greater than 75%, and in some cases greater than 80%, 85%, 90%, 95%, or higher.
[0247] The partitions described herein can be characterized by having extremely small volumes, e.g., less than 10 microliters (μL), 5 μL, 1 μL, 900 nanoliters (nL), 500 nL, 100 nL, 50 nL, 1 nL, 900 picoliters (pL), 800 pL, 700 pL, 600 pL, 500 pL, 400 pL, 300 pL, 200 pL, 100 pL, 50 pL, 20 pL, 10 pL, or 1 pL. For example, in the case of droplet-based partitions, the droplets may have overall volumes that are less than 1000 pL, 900 pL, 800 pL, 700 pL, 600 pL, 500 pL, 400 pL, 300 pL, 200 pL, 100 pL, 50 pL, 20 pL, 10 pL, or even less than 1 pL. Where co-partitioned with beads, it will be appreciated that the sample fluid volume, e.g., including co-partitioned cells, within the partitions may be less than 90% of the above described volumes, less than 80%, less than 70%, less than 60%, less than 50%, less than 40%, less than 30%, less than 20%, or even less than 10% of the above described volumes.
[0248] Multiple samples can be processed in parallel using droplet-based systems. In some embodiments, at least 2, 3, 4, 5, 6, 7, 8, 9, or 10 samples are processed in parallel. The multiple samples processed in parallel may comprise similar numbers of cells. In some cases, the multiple samples processed in parallel do not comprise similar numbers of cells.
[0249] A cell population for analysis can comprise any number of cells. In some embodiments, a cell sample loaded on a droplet-based system of the disclosure comprises at least about 100, 1,000, 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 125,000, 150,000, 175,000, 200,000, 225,000, 250,000, 275,000, 300,000, 325,000, 350,000, 375,000, 400,000, 425,000, 450,000, 475,000, 500,000, 525,000, 550,000, 575,000, 600,000, 625,000, 650,000, 675,000, 700,000, 725,000, 750,000, 775,000, 800,000, 825,000, 850,000, 875,000, 900,000, 925,000, 950,000, 975,000, or 1,000,000 cells. In some embodiments, a cell sample loaded on a droplet-based system of the disclosure comprises at most about 100, 1,000, 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 125,000, 150,000, 175,000, 200,000, 225,000, 250,000, 275,000, 300,000, 325,000, 350,000, 375,000, 400,000, 425,000, 450,000, 475,000, 500,000, 525,000, 550,000, 575,000, 600,000, 625,000, 650,000, 675,000, 700,000, 725,000, 750,000, 775,000, 800,000, 825,000, 850,000, 875,000, 900,000, 925,000, 950,000, 975,000, or 1,000,000 cells. In some embodiments, a cell sample loaded on a droplet-based system of the disclosure comprises approximately 100, 1,000, 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 125,000, 150,000, 175,000, 200,000, 225,000, 250,000, 275,000, 300,000, 325,000, 350,000, 375,000, 400,000, 425,000, 450,000, 475,000, 500,000, 525,000, 550,000, 575,000, 600,000, 625,000, 650,000, 675,000, 700,000, 725,000, 750,000, 775,000, 800,000, 825,000, 850,000, 875,000, 900,000, 925,000, 950,000, 975,000, or 1,000,000 cells.
[0250] As is described elsewhere herein, partitioning species may generate a population of partitions. In such cases, any suitable number of partitions can be generated to generate the population of partitions. For example, in a method described herein, a population of partitions may be generated that comprises at least about 1,000 partitions, at least about 5,000 partitions, at least about 10,000 partitions, at least about 50,000 partitions, at least about 100,000 partitions, at least about 500,000 partitions, at least about 1,000,000 partitions, at least about 5,000,000 partitions, at least about 10,000,000 partitions, at least about 50,000,000 partitions, at least about 100,000,000 partitions, at least about 500,000,000 partitions or at least about 1,000,000,000 partitions. Moreover, the population of partitions may comprise both unoccupied partitions (e.g., empty partitions) and occupied partitions.
Single cell RNA sequencing
[0251] Single-cell RNA sequencing (scRNA-seq) and single-cell DNA sequencing (scDNA-seq) can be powerful technologies that provide an in-depth view of the genetic material within individual cells. The fundamental processes involved in these techniques may be cell isolation, lysis, reverse transcription (for scRNA-seq), amplification, library preparation, and sequencing.
[0252] For scRNA-seq, the process starts with isolating individual cells from a sample. This can be done using techniques such as FACS (Fluorescence Activated Cell Sorting), microfluidic devices or droplet-based systems. Once the cells are isolated, the cells can be lysed to release the RNAs. The RNAs can then be reverse transcribed into complementary DNA (cDNA). The cDNA can be amplified, which increases the amount of material for downstream analysis. The cDNA library can be then prepared and sequenced. Advanced bioinformatics tools may be subsequently used to analyze the resulting data and generate gene expression profiles for each individual cell. Platforms like Smart-seq2 can provide full-length transcript information, allowing for detection of splice variants, while droplet-based systems like 10x Genomics Chromium excel at processing thousands of cells at a lower read depth per cell.
Single cell DNA sequencing
[0253] Single cell DNA sequencing (scDNA-seq) can follow a similar process with several differences. Instead of isolating RNA and performing reverse transcription, the genomic DNA from the lysed cells can be directly used. After cell lysis, the genomic DNA can be subjected to whole genome amplification (WGA) to produce sufficient DNA for sequencing. Various amplification techniques can be used, such as Multiple Displacement Amplification (MDA), Strand Displacement Amplification (SDA), or MALBAC (Multiple Annealing and Looping Based Amplification Cycles). Following amplification, a sequencing library can be prepared and then sequenced. The resulting data can be used to identify genomic variants, TCR sequences and copy number variations at the single-cell level, unveiling cell-to-cell genomic heterogeneity, which is particularly important in cancer research.
Barcoding
[0254] The nucleic acids sequenced in the methods described herein can be barcoded. The barcode can be a cell barcode or a molecular barcode. In some cases, a barcode may not be used and sequences are analyzed through bulk sequencing.
[0255] Unique identifiers, e.g., barcodes, may be previously, subsequently or concurrently delivered to the partitions that hold the compartmentalized or partitioned cells. Barcodes, which comprise a barcode sequence, may be delivered, in some embodiments, on an oligonucleotide (referred to interchangeably as a “barcoded oligonucleotide” or “oligonucleotide barcode”), to a partition via any suitable mechanism.
[0256] In some embodiments, barcoded oligonucleotides are delivered to a partition via a microcapsule. In some cases, barcoded oligonucleotides are initially associated with the microcapsule and then released from the microcapsule upon application of a stimulus which allows the oligonucleotides to dissociate or to be released from the microcapsule.
[0257] A microcapsule, in some embodiments, comprises a bead. In some embodiments, a bead may be porous, non-porous, solid, semi-solid, semi-fluidic, or fluidic. In some embodiments, a bead may be dissolvable, disruptable, or degradable. In some cases, a bead may not be degradable. In some embodiments, the bead may be a gel bead. A gel bead can be a hydrogel bead. A gel bead can be formed from molecular precursors, such as a polymeric or monomeric species. A semi-solid bead can be a liposomal bead. Solid beads can comprise metals including iron oxide, gold, and silver. In some cases, the beads are silica beads. In some cases, the beads are rigid. In some cases, the beads are flexible and/or compressible.
[0258] The beads may contain molecular precursors (e.g., monomers or polymers), which may form a polymer network via polymerization of the precursors. In some cases, a precursor may be an already polymerized species capable of undergoing further polymerization via, for example, a chemical cross-linkage. In some cases, a precursor comprises one or more of an acrylamide or a methacrylamide monomer, oligomer, or polymer. In some cases, the bead may comprise prepolymers, which are oligomers capable of further polymerization. For example, polyurethane beads may be prepared using prepolymers. In some cases, the bead may contain individual polymers that may be further polymerized together. In some cases, beads may be generated via polymerization of different precursors, such that they comprise mixed polymers, co-polymers, and/or block co-polymers.
[0259] A bead may comprise natural and/or synthetic materials. For example, a polymer can be a natural polymer or a synthetic polymer. In some cases, a bead comprises both natural and synthetic polymers. Examples of natural polymers include proteins and sugars such as deoxyribonucleic acid, rubber, cellulose, starch (e.g., amylose, amylopectin), proteins, enzymes, polysaccharides, silks, polyhydroxyalkanoates, chitosan, dextran, collagen, carrageenan, ispaghula, acacia, agar, gelatin, shellac, sterculia gum, xanthan gum, corn sugar gum, guar gum, gum karaya, agarose, alginic acid, alginate, or natural polymers thereof. Examples of synthetic polymers include acrylics, nylons, silicones, spandex, viscose rayon, polycarboxylic acids, polyvinyl acetate, polyacrylamide, polyacrylate, polyethylene glycol, polyurethanes, polylactic acid, silica, polystyrene, polyacrylonitrile, polybutadiene, polycarbonate, polyethylene, polyethylene terephthalate, poly(chlorotrifluoroethylene), poly(ethylene oxide), poly(ethylene terephthalate), polyethylene, polyisobutylene, poly(methyl methacrylate), poly(oxymethylene), polyformaldehyde, polypropylene, polystyrene, poly(tetrafluoroethylene), poly(vinyl acetate), poly(vinyl alcohol), poly(vinyl chloride), poly(vinylidene dichloride), poly(vinylidene difluoride), poly(vinyl fluoride) and combinations (e.g., co-polymers) thereof. Beads may also be formed from materials other than polymers, including lipids, micelles, ceramics, glass-ceramics, material composites, metals, other inorganic materials, and others.
[0260] In some cases, a chemical cross-linker may be a precursor used to cross-link monomers during polymerization of the monomers and/or may be used to attach oligonucleotides (e.g., barcoded oligonucleotides) to the bead. In some cases, polymers may be further polymerized with a cross-linker species or other type of monomer to generate a further polymeric network. Non-limiting examples of chemical cross-linkers (also referred to as a “crosslinker” or a “crosslinker agent” herein) include cystamine, glutaraldehyde, dimethyl suberimidate, N-Hydroxysuccinimide crosslinker BS3, formaldehyde, carbodiimide (EDC), SMCC, Sulfo-SMCC, vinylsilane, N,N’-diallyltartardiamide (DATD), N,N’-Bis(acryloyl)cystamine (BAC), or homologs thereof. In some cases, the crosslinker used in the present disclosure contains cystamine.
[0261] Crosslinking may be permanent or reversible, depending upon the particular crosslinker used. Reversible crosslinking may allow for the polymer to linearize or dissociate under appropriate conditions. In some cases, reversible cross-linking may also allow for reversible attachment of a material bound to the surface of a bead. In some cases, a cross-linker may form disulfide linkages. In some cases, the chemical cross-linker forming disulfide linkages may be cystamine or a modified cystamine.
[0262] In some embodiments, disulfide linkages can be formed between molecular precursor units (e.g., monomers, oligomers, or linear polymers) or precursors incorporated into a bead and oligonucleotides. Cystamine (including modified cystamines), for example, is an organic agent comprising a disulfide bond that may be used as a crosslinker agent between individual monomeric or polymeric precursors of a bead. Polyacrylamide may be polymerized in the presence of cystamine or a species comprising cystamine (e.g., a modified cystamine) to generate polyacrylamide gel beads comprising disulfide linkages (e.g., chemically degradable beads comprising chemically-reducible cross-linkers). The disulfide linkages may permit the bead to be degraded (or dissolved) upon exposure of the bead to a reducing agent.
[0263] In some embodiments, chitosan, a linear polysaccharide polymer, may be crosslinked with glutaraldehyde via hydrophilic chains to form a bead. Crosslinking of chitosan polymers may be achieved by chemical reactions that are initiated by heat, pressure, change in pH, and/or radiation.
[0264] In some embodiments, the bead may comprise covalent or ionic bonds between polymeric precursors (e.g., monomers, oligomers, linear polymers), oligonucleotides, primers, and other entities. In some cases, the covalent bonds comprise carbon-carbon bonds or thioether bonds.
[0265] In some cases, a bead may comprise an acrydite moiety, which in certain aspects may be used to attach one or more oligonucleotides (e.g., barcode sequence, barcoded oligonucleotide, primer, or other oligonucleotide) to the bead. In some cases, an acrydite moiety can refer to an acrydite analogue generated from the reaction of acrydite with one or more species, such as the reaction of acrydite with other monomers and cross-linkers during a polymerization reaction. Acrydite moieties may be modified to form chemical bonds with a species to be attached, such as an oligonucleotide (e.g., barcode sequence, barcoded oligonucleotide, primer, or other oligonucleotide). Acrydite moieties may be modified with thiol groups capable of forming a disulfide bond or may be modified with groups already comprising a disulfide bond. The thiol or disulfide (via disulfide exchange) may be used as an anchor point for a species to be attached or another part of the acrydite moiety may be used for attachment. In some cases, attachment is reversible, such that when the disulfide bond is broken (e.g., in the presence of a reducing agent), the attached species is released from the bead. In other cases, an acrydite moiety comprises a reactive hydroxyl group that may be used for attachment.
[0266] Functionalization of beads for attachment of oligonucleotides may be achieved through a wide range of different approaches, including activation of chemical groups within a polymer, incorporation of active or activatable functional groups in the polymer structure, or attachment at the pre-polymer or monomer stage in bead production.
[0267] For example, precursors (e.g., monomers, cross-linkers) that are polymerized to form a bead may comprise acrydite moieties, such that when a bead is generated, the bead also comprises acrydite moieties. The acrydite moieties can be attached to an oligonucleotide, such as a primer (e.g., a primer for amplifying target nucleic acids, barcoded oligonucleotide, etc.) that is desired to be incorporated into the bead. In some cases, the primer comprises a P5 sequence for attachment to a sequencing flow cell for Illumina sequencing. In some cases, the primer comprises a P7 sequence for attachment to a sequencing flow cell for Illumina sequencing. In some cases, the primer comprises a barcode sequence. In some cases, the primer further comprises a unique molecular identifier (UMI). In some cases, the primer comprises an R1 primer sequence for Illumina sequencing. In some cases, the primer comprises an R2 primer sequence for Illumina sequencing.
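By way of non-limiting illustration, the following Python sketch shows how a sequencing read derived from such a primer might be split into its barcode sequence and UMI. The function name `split_read1`, the 16-nucleotide barcode length, and the 12-nucleotide UMI length are assumptions made for this example only; the actual layout depends on the primer design used.

```python
def split_read1(read1, barcode_len=16, umi_len=12):
    """Split a read into (cell barcode, UMI).

    Assumes, for illustration only, that the read begins with the
    barcode sequence immediately followed by the UMI.
    """
    needed = barcode_len + umi_len
    if len(read1) < needed:
        raise ValueError(f"read shorter than {needed} nucleotides")
    barcode = read1[:barcode_len]
    umi = read1[barcode_len:needed]
    return barcode, umi


# Hypothetical read: 16 nt barcode + 12 nt UMI + start of a poly(T) tail.
example_read = "ACGTACGTACGTACGT" + "TTTTGGGGCCCC" + "TTTT"
barcode, umi = split_read1(example_read)
print(barcode, umi)
```

Any bases after the UMI are ignored by this sketch; in practice they would typically be trimmed or handled by the alignment pipeline.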
[0268] In some cases, precursors comprising a functional group that is reactive or capable of being activated such that it becomes reactive can be polymerized with other precursors to generate gel beads comprising the activated or activatable functional group. The functional group may then be used to attach additional species (e.g., disulfide linkers, primers, other oligonucleotides, etc.) to the gel beads. For example, some precursors comprising a carboxylic acid (COOH) group can co-polymerize with other precursors to form a gel bead that also comprises a COOH functional group. In some cases, acrylic acid (a species comprising free COOH groups), acrylamide, and bis(acryloyl)cystamine can be co-polymerized together to generate a gel bead comprising free COOH groups. The COOH groups of the gel bead can be activated (e.g., via 1-Ethyl-3-(3-dimethylaminopropyl)carbodiimide (EDC) and N-Hydroxysuccinimide (NHS) or 4-(4,6-Dimethoxy-1,3,5-triazin-2-yl)-4-methylmorpholinium chloride (DMTMM)) such that they are reactive (e.g., reactive to amine functional groups where EDC/NHS or DMTMM are used for activation). The activated COOH groups can then react with an appropriate species (e.g., a species comprising an amine functional group where the carboxylic acid groups are activated to be reactive with an amine functional group) comprising a moiety to be linked to the bead.
[0269] Beads comprising disulfide linkages in their polymeric network may be functionalized with additional species via reduction of some of the disulfide linkages to free thiols. The disulfide linkages may be reduced via, for example, the action of a reducing agent (e.g., DTT, TCEP, etc.) to generate free thiol groups, without dissolution of the bead. Free thiols of the beads can then react with free thiols of a species or a species comprising another disulfide bond (e.g., via thiol-disulfide exchange) such that the species can be linked to the beads (e.g., via a generated disulfide bond). In some cases, free thiols of the beads may react with any other suitable group. For example, free thiols of the beads may react with species comprising an acrydite moiety. The free thiol groups of the beads can react with the acrydite via Michael addition chemistry, such that the species comprising the acrydite is linked to the bead. In some cases, uncontrolled reactions can be prevented by inclusion of a thiol capping agent such as N-ethylmaleimide or iodoacetate.
[0270] Activation of disulfide linkages within a bead can be controlled such that only a small number of disulfide linkages are activated. Control may be exerted, for example, by controlling the concentration of a reducing agent used to generate free thiol groups and/or concentration of reagents used to form disulfide bonds in bead polymerization. In some cases, a low concentration (e.g., molecules of reducing agent:gel bead ratios of less than about 10,000, 100,000, 1,000,000, 10,000,000, 100,000,000, 1,000,000,000, 10,000,000,000, or 100,000,000,000) of reducing agent may be used for reduction. Controlling the number of disulfide linkages that are reduced to free thiols may be useful in ensuring bead structural integrity during functionalization. In some cases, optically-active agents, such as fluorescent dyes, may be coupled to beads via free thiol groups of the beads and used to quantify the number of free thiols present in a bead and/or track a bead.
[0271] In some cases, addition of moieties to a gel bead after gel bead formation may be advantageous. For example, addition of an oligonucleotide (e.g., barcoded oligonucleotide) after gel bead formation may avoid loss of the species during chain transfer termination that can occur during polymerization. Moreover, smaller precursors (e.g., monomers or cross linkers that do not comprise side chain groups and linked moieties) may be used for polymerization and can be minimally hindered from growing chain ends due to viscous effects. In some cases, functionalization after gel bead synthesis can minimize exposure of species (e.g., oligonucleotides) to be loaded with potentially damaging agents (e.g., free radicals) and/or chemical environments. In some cases, the generated gel may possess an upper critical solution temperature (UCST) that can permit temperature driven swelling and collapse of a bead. Such functionality may aid in oligonucleotide (e.g., a primer) infiltration into the bead during subsequent functionalization of the bead with the oligonucleotide. Post-production functionalization may also be useful in controlling loading ratios of species in beads, such that, for example, the variability in loading ratio is minimized. Species loading may also be performed in a batch process such that a plurality of beads can be functionalized with the species in a single batch.
[0272] In some cases, an acrydite moiety linked to a precursor, another species linked to a precursor, or a precursor itself comprises a labile bond, such as chemically, thermally, or photo-sensitive bonds, e.g., disulfide bonds, UV sensitive bonds, or the like. Once acrydite moieties or other moieties comprising a labile bond are incorporated into a bead, the bead may also comprise the labile bond. The labile bond may be, for example, useful in reversibly linking (e.g., covalently linking) species (e.g., barcodes, primers, etc.) to a bead. In some cases, a thermally labile bond may include a nucleic acid hybridization based attachment, e.g., where an oligonucleotide is hybridized to a complementary sequence that is attached to the bead, such that thermal melting of the hybrid releases the oligonucleotide, e.g., a barcode containing sequence, from the bead or microcapsule.
[0273] The addition of multiple types of labile bonds to a gel bead may result in the generation of a bead capable of responding to varied stimuli. Each type of labile bond may be sensitive to an associated stimulus (e.g., chemical stimulus, light, temperature, etc.) such that release of species attached to a bead via each labile bond may be controlled by the application of the appropriate stimulus. Such functionality may be useful in controlled release of species from a gel bead. In some cases, another species comprising a labile bond may be linked to a gel bead after gel bead formation via, for example, an activated functional group of the gel bead as described above. As will be appreciated, barcodes that are releasably, cleavably or reversibly attached to the beads described herein include barcodes that are released or releasable through cleavage of a linkage between the barcode molecule and the bead, or that are released through degradation of the underlying bead itself, allowing the barcodes to be accessed or accessible by other reagents, or both.
[0274] The barcodes that are releasable as described herein may sometimes be referred to as being activatable, in that they are available for reaction once released. Thus, for example, an activatable barcode may be activated by releasing the barcode from a bead (or other suitable type of partition described herein). Other activatable configurations are also envisioned in the context of the described methods and systems.
[0275] In addition to thermally cleavable bonds, disulfide bonds and UV sensitive bonds, other non-limiting examples of labile bonds that may be coupled to a precursor or bead include an ester linkage (e.g., cleavable with an acid, a base, or hydroxylamine), a vicinal diol linkage (e.g., cleavable via sodium periodate), a Diels-Alder linkage (e.g., cleavable via heat), a sulfone linkage (e.g., cleavable via a base), a silyl ether linkage (e.g., cleavable via an acid), a glycosidic linkage (e.g., cleavable via an amylase), a peptide linkage (e.g., cleavable via a protease), or a phosphodiester linkage (e.g., cleavable via a nuclease (e.g., DNase)).
[0276] Species that do not participate in polymerization may also be encapsulated in beads during bead generation (e.g., during polymerization of precursors). Such species may be entered into polymerization reaction mixtures such that generated beads comprise the species upon bead formation. In some cases, such species may be added to the beads after formation. Such species may include, for example, oligonucleotides, reagents for a nucleic acid amplification reaction (e.g., primers, polymerases, dNTPs, co-factors (e.g., ionic co-factors)) including those described herein, reagents for enzymatic reactions (e.g., enzymes, co-factors, substrates), or reagents for nucleic acid modification reactions such as polymerization, ligation, or digestion. Trapping of such species may be controlled by the polymer network density generated during polymerization of precursors, control of ionic charge within the gel bead (e.g., via ionic species linked to polymerized species), or by the release of other species. Encapsulated species may be released from a bead upon bead degradation and/or by application of a stimulus capable of releasing the species from the bead.
[0277] Beads may be of uniform size or heterogeneous size. In some cases, the diameter of a bead may be about 1 µm, 5 µm, 10 µm, 20 µm, 30 µm, 40 µm, 50 µm, 60 µm, 70 µm, 80 µm, 90 µm, 100 µm, 250 µm, 500 µm, or 1 mm. In some cases, a bead may have a diameter of at least about 1 µm, 5 µm, 10 µm, 20 µm, 30 µm, 40 µm, 50 µm, 60 µm, 70 µm, 80 µm, 90 µm, 100 µm, 250 µm, 500 µm, 1 mm, or more. In some cases, a bead may have a diameter of less than about 1 µm, 5 µm, 10 µm, 20 µm, 30 µm, 40 µm, 50 µm, 60 µm, 70 µm, 80 µm, 90 µm, 100 µm, 250 µm, 500 µm, or 1 mm. In some cases, a bead may have a diameter in the range of about 40-75 µm, 30-75 µm, 20-75 µm, 40-85 µm, 40-95 µm, 20-100 µm, 10-100 µm, 1-100 µm, 20-250 µm, or 20-500 µm.
[0278] In certain aspects, beads are provided as a population or plurality of beads having a relatively monodisperse size distribution. Where it may be desirable to provide relatively consistent amounts of reagents within partitions, maintaining relatively consistent bead characteristics, such as size, can contribute to the overall consistency. In particular, the beads described herein may have size distributions that have a coefficient of variation in their cross-sectional dimensions of less than 50%, less than 40%, less than 30%, less than 20%, and in some cases less than 15%, less than 10%, or even less than 5%.
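By way of non-limiting illustration, the coefficient of variation referred to above can be computed as the standard deviation of the measured bead diameters divided by their mean. The following Python sketch assumes diameters measured in µm; the function names, the example measurements, and the use of the sample (n-1) standard deviation are assumptions made for this example only.

```python
import statistics


def coefficient_of_variation(diameters_um):
    """Return the coefficient of variation (%) of bead diameters.

    CV = sample standard deviation / mean * 100.
    """
    if len(diameters_um) < 2:
        raise ValueError("need at least two measurements")
    mean = statistics.mean(diameters_um)
    if mean <= 0:
        raise ValueError("mean diameter must be positive")
    return statistics.stdev(diameters_um) / mean * 100.0


def is_monodisperse(diameters_um, max_cv_percent=10.0):
    """True if the population's CV is below the chosen threshold."""
    return coefficient_of_variation(diameters_um) < max_cv_percent


# Hypothetical measurements of bead diameter in micrometers.
diameters = [52.0, 54.5, 53.1, 55.2, 53.8, 52.9]
print(f"CV = {coefficient_of_variation(diameters):.1f}%")
print("below 5%:", is_monodisperse(diameters, max_cv_percent=5.0))
```

The threshold passed to `is_monodisperse` can be set to any of the values recited above (e.g., 50%, 20%, 10%, or 5%).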
[0279] Beads may be of any suitable shape. Examples of bead shapes include, but are not limited to, spherical, non-spherical, oval, oblong, amorphous, circular, cylindrical, and variations thereof.
[0280] In addition to, or as an alternative to, the cleavable linkages between the beads and the associated molecules, e.g., barcode containing oligonucleotides, described above, the beads may be degradable, disruptable, or dissolvable spontaneously or upon exposure to one or more stimuli (e.g., temperature changes, pH changes, exposure to particular chemical species or phase, exposure to light, reducing agent, etc.). In some cases, a bead may be dissolvable, such that material components of the beads are solubilized when exposed to a particular chemical species or an environmental change, such as a change in temperature or a change in pH. In some cases, a gel bead is degraded or dissolved at elevated temperature and/or in basic conditions. In some cases, a bead may be thermally degradable such that when the bead is exposed to an appropriate change in temperature (e.g., heat), the bead degrades. Degradation or dissolution of a bead bound to a species (e.g., an oligonucleotide, e.g., barcoded oligonucleotide) may result in release of the species from the bead.
[0281] A degradable bead may comprise one or more species with a labile bond such that, when the bead/species is exposed to the appropriate stimuli, the bond is broken, and the bead degrades. The labile bond may be a chemical bond (e.g., covalent bond, ionic bond) or may be another type of physical interaction (e.g., van der Waals interactions, dipole-dipole interactions, etc.). In some cases, a crosslinker used to generate a bead may comprise a labile bond. Upon exposure to the appropriate conditions, the labile bond can be broken, and the bead degraded. For example, upon exposure of a polyacrylamide gel bead comprising cystamine crosslinkers to a reducing agent, the disulfide bonds of the cystamine can be broken and the bead degraded.
[0282] A degradable bead may be useful in more quickly releasing an attached species (e.g., an oligonucleotide, a barcode sequence, a primer, etc.) from the bead when the appropriate stimulus is applied to the bead as compared to a bead that does not degrade. For example, for a species bound to an inner surface of a porous bead or in the case of an encapsulated species, the species may have greater mobility and accessibility to other species in solution upon degradation of the bead. In some cases, a species may also be attached to a degradable bead via a degradable linker (e.g., disulfide linker). The degradable linker may respond to the same stimuli as the degradable bead, or the two degradable species may respond to different stimuli. For example, a barcode sequence may be attached, via a disulfide bond, to a polyacrylamide bead comprising cystamine. Upon exposure of the barcoded-bead to a reducing agent, the bead degrades, and the barcode sequence is released upon breakage of both the disulfide linkage between the barcode sequence and the bead and the disulfide linkages of the cystamine in the bead.
[0283] A degradable bead may be introduced into a partition, such as a droplet of an emulsion or a well, such that the bead degrades within the partition and any associated species (e.g., oligonucleotides) are released within the droplet when the appropriate stimulus is applied. The free species (e.g., oligonucleotides) may interact with other reagents contained in the partition. For example, a polyacrylamide bead comprising cystamine and linked, via a disulfide bond, to a barcode sequence, may be combined with a reducing agent within a droplet of a water-in-oil emulsion. Within the droplet, the reducing agent breaks the various disulfide bonds resulting in bead degradation and release of the barcode sequence into the aqueous, inner environment of the droplet. In another example, heating of a droplet comprising a bead-bound barcode sequence in basic solution may also result in bead degradation and release of the attached barcode sequence into the aqueous, inner environment of the droplet.
[0284] As will be appreciated from the above disclosure, while referred to as degradation of a bead, in many instances as noted above, that degradation may refer to the disassociation of a bound or entrained species from a bead, both with and without structurally degrading the physical bead itself. For example, entrained species may be released from beads through osmotic pressure differences due to, for example, changing chemical environments. By way of example, alteration of bead pore sizes due to osmotic pressure differences can generally occur without structural degradation of the bead itself. In some cases, an increase in pore size due to osmotic swelling of a bead can permit the release of entrained species within the bead. In other cases, osmotic shrinking of a bead may cause a bead to better retain an entrained species due to pore size contraction.
[0285] Where degradable beads are provided, it may be desirable to avoid exposing such beads to the stimulus or stimuli that cause such degradation prior to the desired time, in order to avoid premature bead degradation and issues that arise from such degradation, including for example poor flow characteristics and aggregation. By way of example, where beads comprise reducible cross-linking groups, such as disulfide groups, it will be desirable to avoid contacting such beads with reducing agents, e.g., DTT or other disulfide cleaving reagents. In such cases, treatment of the beads described herein will, in some cases, be provided free of reducing agents, such as DTT. Because reducing agents are often provided in commercial enzyme preparations, it may be desirable to provide reducing agent free (or DTT free) enzyme preparations in treating the beads described herein. Examples of such enzymes include, e.g., polymerase enzyme preparations, reverse transcriptase enzyme preparations, ligase enzyme preparations, as well as many other enzyme preparations that may be used to treat the beads described herein. The terms “reducing agent free” or “DTT free” preparations can refer to a preparation having less than 1/10th, less than 1/50th, and even less than 1/100th of the lower ranges for such materials used in degrading the beads. For example, for DTT, the reducing agent free preparation will typically have less than 0.01 mM, 0.005 mM, 0.001 mM DTT, 0.0005 mM DTT, or even less than 0.0001 mM DTT. In many cases, the amount of DTT will be undetectable.
[0286] In some cases, a stimulus may be used to trigger degradation of the bead, which may result in the release of contents from the bead. Generally, a stimulus may cause degradation of the bead structure, such as degradation of the covalent bonds or other types of physical interaction. These stimuli may be useful in inducing a bead to degrade and/or to release its contents. Examples of stimuli that may be used include chemical stimuli, thermal stimuli, optical stimuli (e.g., light) and any combination thereof, as described more fully below.
[0287] Numerous chemical triggers may be used to trigger the degradation of beads. Examples of these chemical changes may include, but are not limited to pH-mediated changes to the integrity of a component within the bead, degradation of a component of a bead via cleavage of cross-linked bonds, and depolymerization of a component of a bead.
[0288] In some embodiments, a bead may be formed from materials that comprise degradable chemical crosslinkers, such as BAC or cystamine. Degradation of such degradable crosslinkers may be accomplished through a number of mechanisms. In some examples, a bead may be contacted with a chemical degrading agent that may induce oxidation, reduction or other chemical changes. For example, a chemical degrading agent may be a reducing agent, such as dithiothreitol (DTT). Additional examples of reducing agents may include β-mercaptoethanol, (2S)-2-amino-1,4-dimercaptobutane (dithiobutylamine or DTBA), tris(2-carboxyethyl) phosphine (TCEP), or combinations thereof. A reducing agent may degrade the disulfide bonds formed between gel precursors forming the bead, and thus, degrade the bead. In other cases, a change in pH of a solution, such as an increase in pH, may trigger degradation of a bead. In other cases, exposure to an aqueous solution, such as water, may trigger hydrolytic degradation, and thus degradation of the bead.
[0289] Beads may also be induced to release their contents upon the application of a thermal stimulus. A change in temperature can cause a variety of changes to a bead. For example, heat can cause a solid bead to liquefy. A change in heat may cause melting of a bead such that a portion of the bead degrades. In other cases, heat may increase the internal pressure of the bead components such that the bead ruptures or explodes. Heat may also act upon heat-sensitive polymers used as materials to construct beads.
[0290] In some embodiments, changes in temperature or pH may be used to degrade thermo-sensitive or pH-sensitive bonds within beads. In some embodiments, chemical degrading agents may be used to degrade chemical bonds within beads by oxidation, reduction or other chemical changes. For example, a chemical degrading agent may be a reducing agent, such as DTT, wherein DTT may degrade the disulfide bonds formed between a crosslinker and gel precursors, thus degrading the bead. In some embodiments, a reducing agent may be added to degrade the bead, which may or may not cause the bead to release its contents. Examples of reducing agents may include dithiothreitol (DTT), β-mercaptoethanol, (2S)-2-amino-1,4-dimercaptobutane (dithiobutylamine or DTBA), tris(2-carboxyethyl) phosphine (TCEP), or combinations thereof. The reducing agent may be present at a concentration of about 0.1 mM, 0.5 mM, 1 mM, 5 mM, or 10 mM. The reducing agent may be present at a concentration of at least about 0.1 mM, 0.5 mM, 1 mM, 5 mM, 10 mM, or greater. The reducing agent may be present at a concentration of at most about 0.1 mM, 0.5 mM, 1 mM, 5 mM, or 10 mM.
[0291] Any suitable number of nucleic acid molecules (e.g., primer, e.g., barcoded oligonucleotide) can be associated with a bead such that, upon release from the bead, the nucleic acid molecules (e.g., primer, e.g., barcoded oligonucleotide) are present in the partition at a pre-defined concentration. Such pre-defined concentration may be selected to facilitate certain reactions for generating a sequencing library, e.g., amplification, within the partition. In some cases, the pre-defined concentration of the primer is limited by the process of producing oligonucleotide bearing beads.
[0292] Additionally, in many cases, the multiple beads within a single partition may comprise different reagents associated therewith. In such cases, it may be advantageous to introduce different beads into a common channel or droplet generation junction, from different bead sources, i.e., containing different associated reagents, through different channel inlets into such common channel or droplet generation junction. In such cases, the flow and frequency of the different beads into the channel or junction may be controlled to provide for the desired ratio of microcapsules from each source, while ensuring the desired pairing or combination of such beads into a partition with the desired number of cells.
[0293] The single-cell sequencing methods disclosed herein can include obtaining sequence information by molecularly indexing the targets from one or more of the isolated cells from the sample. The targets can, for example, be polynucleotides. The polynucleotides can be, for example, DNA or RNA (e.g., mRNA). Molecular indexing (sometimes referred to as molecular barcoding or molecular tagging) can be used, for example, for high-sensitivity single molecular counting. For example, a collection of identical polynucleotide molecules from one or more of the isolated cells can be attached to a diverse set of labels for molecular indexing. Each of the labels can comprise, for example, a molecular label (also known as molecular index). In some embodiments, the method comprises molecularly indexing the polynucleotides from 1, 5, 10, 15, 20, 25, 50, 75, 100, 200, 300, 400, 500, 600, 800, 1000, 2000, 5000, 10000 cells, or a number or a range between any two of these values.
[0294] Molecular indexing can, for example, be used to identify the origin of an indexed polynucleotide (e.g., indicating which tissue, cell and/or container the indexed polynucleotide is from) and/or to inform the identity of the indexed polynucleotide. The container can be a plate, a well, a droplet, a partition, a tube, or the like. The indexed polynucleotide can comprise, for example, the polynucleotide to be indexed (e.g., an mRNA, a genomic DNA, or a cDNA) and a label region comprising one or more labels. In some embodiments, the indexed polynucleotide can further comprise one or more of a universal PCR region and an adaptor region. As an example, the indexed polynucleotide can be situated in a container (e.g., a microtiter plate), and the indexed polynucleotide can further include a unique label (e.g., a sample barcode) for identifying the plate in which the indexed polynucleotide is situated. An example of the region for identifying the plate is a plate index. The label region can, in some embodiments, comprise two or more labels. For example, the label region can include a molecular label (also known as a molecular index) and a sample label (also known as a sample barcode). The length of the labels can vary. For example, the label (e.g., the molecular label or the sample label) can be, or be about, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20 nucleotides in length, or longer. In some embodiments, the molecular label is, or is about, 5 nucleotides in length, and the sample label is, or is about, 5 nucleotides in length. In some embodiments, the molecular label is, or is about, 10 nucleotides in length, and the sample label is, or is about, 10 nucleotides in length.
[0295] In some embodiments, molecularly indexing the polynucleotides comprises generating a molecularly indexed polynucleotide library from one or more of the isolated cells. Generating a molecularly indexed polynucleotide library includes generating a plurality of indexed polynucleotides from the one or more of the isolated cells. For example, for a molecularly indexed polynucleotide library comprising a first indexed polynucleotide and a second indexed polynucleotide, the label region of the first indexed polynucleotide can differ from the label region of the second indexed polynucleotide by at least one, two, three, four, or five nucleotides. In some embodiments, generating a molecularly indexed polynucleotide library includes contacting a plurality of mRNA molecules with a plurality of oligonucleotides including a poly(T) region and a label region; and conducting a first strand synthesis using a reverse transcriptase to produce single-strand labeled cDNA molecules each comprising a cDNA region and a label region, wherein the plurality of mRNA molecules includes at least two mRNA molecules of different sequences and the plurality of oligonucleotides includes at least two oligonucleotides of different sequences. Generating a molecularly indexed polynucleotide
library can further comprise amplifying the single-strand labeled cDNA molecules to produce double-strand labeled cDNA molecules; and conducting nested PCR on the double-strand labeled cDNA molecules to produce labeled amplicons. In some embodiments, the method can include generating an adaptor-labeled amplicon.
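The requirement that two label regions differ by at least a given number of nucleotides can be checked with a Hamming distance. The following is a minimal sketch, assuming label regions of equal length; the function names and the example sequences are hypothetical and are not taken from the disclosure.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length label regions differ."""
    if len(a) != len(b):
        raise ValueError("label regions must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(labels: list[str]) -> int:
    """Smallest Hamming distance between any two label regions in a library."""
    return min(hamming(a, b) for a, b in combinations(labels, 2))


# Hypothetical 10-nucleotide label regions (illustrative sequences only).
labels = ["ACGTACGTAC", "ACGTTCGAAC", "TTGTACGTCA"]

# True when every pair of label regions differs by at least two nucleotides.
library_ok = min_pairwise_distance(labels) >= 2
```

A library designed so that every pair of label regions differs by several nucleotides tolerates a small number of sequencing errors before one label can be mistaken for another.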
[0296] Molecular indexing uses nucleic acid barcodes or tags to label individual DNA or RNA molecules. In some embodiments, it involves adding DNA barcodes or tags to cDNA molecules as they are generated from mRNA. Nested PCR can be performed to minimize PCR amplification bias. Adaptors can be added for sequencing using, for example, NGS.
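One common use of molecular labels is to count distinct labeled molecules rather than raw reads, so that PCR duplicates are collapsed. The following is a minimal sketch, assuming a label region made of a 5-nucleotide sample label followed by a 5-nucleotide molecular label; the ordering of the two labels, the function names, and the gene name `GENE_A` are illustrative assumptions and are not specified in the disclosure.

```python
from collections import defaultdict

SAMPLE_LEN = 5     # assumed sample-label length in nucleotides
MOLECULAR_LEN = 5  # assumed molecular-label length in nucleotides


def split_label_region(label_region: str) -> tuple[str, str]:
    """Split a label region into (sample_label, molecular_label).

    Assumes the sample label comes first; this ordering is illustrative.
    """
    if len(label_region) != SAMPLE_LEN + MOLECULAR_LEN:
        raise ValueError("unexpected label region length")
    return label_region[:SAMPLE_LEN], label_region[SAMPLE_LEN:]


def count_molecules(reads):
    """Count distinct molecular labels per (sample_label, gene).

    `reads` is an iterable of (label_region, gene) pairs. Reads sharing a
    sample label, gene, and molecular label are treated as copies of one
    original molecule.
    """
    seen = defaultdict(set)
    for label_region, gene in reads:
        sample, molecule = split_label_region(label_region)
        seen[(sample, gene)].add(molecule)
    return {key: len(molecules) for key, molecules in seen.items()}


# Hypothetical reads: the first two share all labels (a PCR duplicate).
reads = [
    ("AAAAACCCCC", "GENE_A"),
    ("AAAAACCCCC", "GENE_A"),
    ("AAAAAGGGGG", "GENE_A"),
    ("TTTTTCCCCC", "GENE_A"),
]
counts = count_molecules(reads)
```

Here four reads collapse to two molecules for sample label `AAAAA` and one molecule for sample label `TTTTT`.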
Flow Cytometry
[0297] The method provided herein can comprise, prior to obtaining the single cell transcriptomic data, separating a subset of T cells from the population of T cells based on expression of a CD4+ and/or CD8+ exhaustion marker, thereby generating a subset of exhausted T cells and a subset of non-exhausted T cells. In some embodiments, the CD4+ and/or CD8+ exhaustion marker comprises at least 5 genes selected from the group consisting of genes in Tables 3-6. In some embodiments, separating comprises using flow cytometry. The flow cytometry can be fluorescence activated cell sorting (FACS).
[0298] In some embodiments, isolating one or more cells of interest in the enriched cell sample can be performed with a flow cytometer. In some embodiments, the flow cytometer utilizes fluorescence-activated cell sorting.
[0299] Flow cytometry is a valuable method for the analysis and isolation of cells. As such, it has a wide range of diagnostic and therapeutic applications. Flow cytometry can utilize a fluid stream to linearly segregate cells such that they can pass, single file, through a detection apparatus. Individual cells can be distinguished according to their location in the fluid stream and the presence of detectable markers. Cells flow through the focused interrogation point, where at least one laser directs a laser beam to a focused point within the channel. The sample fluid containing cells is hydrodynamically focused to a very small core diameter by flowing sheath fluid around the sample stream at a very high volumetric rate. The small core diameter can be less than 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200 micrometers, or a number or a range between any two of these values. The volumetric rate of the sheath fluid can be on the order of at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 times, or a number or a range between any two of these values, the volumetric rate of the sample. This results in very fast linear velocities for the focused cells, on the order of meters per second. Each cell therefore spends a very limited time in the excitation spot, for example less than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 microseconds, or a number or a range between any two of these values. Once the cells pass the interrogation point, the cells cannot be redirected to the interrogation point again because the linear flow velocity cannot be reversed.
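The relationship between linear velocity and the time a cell spends in the excitation spot can be illustrated with a short calculation. This is a rough sketch under stated assumptions: the excitation spot is treated as having a fixed height along the flow axis, the cell moves at constant velocity, and the numeric values (a 20 micrometer spot, 5 meters per second, a 100-fold sheath-to-sample ratio) are hypothetical examples chosen from within the ranges described above.

```python
def transit_time_us(spot_height_um: float, velocity_m_per_s: float) -> float:
    """Time a cell spends crossing the excitation spot, in microseconds.

    Micrometers divided by meters per second gives microseconds directly,
    because 1e-6 m / (1 m/s) = 1e-6 s.
    """
    if velocity_m_per_s <= 0:
        raise ValueError("velocity must be positive")
    return spot_height_um / velocity_m_per_s


def sheath_rate(sample_rate_ul_per_min: float, ratio: float) -> float:
    """Sheath volumetric rate for a given sheath-to-sample ratio."""
    return sample_rate_ul_per_min * ratio


# Hypothetical values: 20 um spot height, 5 m/s linear velocity.
t_us = transit_time_us(20.0, 5.0)

# Hypothetical values: 60 uL/min sample rate, 100-fold sheath rate.
sheath_ul_per_min = sheath_rate(60.0, 100.0)
```

Under these assumed values the cell crosses the spot in 4 microseconds, which falls within the range of transit times listed above.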
[0300] Flow cytometers are analytical tools that enable the characterization of cells on the basis of optical parameters such as light scatter and fluorescence. In a flow cytometer, cells in a fluid suspension are passed by a detection region in which the cells are exposed to an excitation light, typically from one or more lasers, and the light scattering and fluorescence properties of the cells are measured. Cells or components thereof can be labeled with fluorescent dyes to facilitate detection. A multiplicity of different cells or components can be simultaneously detected by using spectrally distinct fluorescent dyes to label the different cells or components. In some implementations, a multiplicity of photodetectors, one for each of the scatter parameters to be measured, and one for each of the distinct dyes to be detected are included in the analyzer. The data obtained comprise the signals measured for each of the light scatter parameters and the fluorescence emissions.
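The per-cell data described above (scatter parameters plus one fluorescence signal per dye) lend themselves to simple threshold gating. The following is a minimal sketch, assuming each event is recorded as forward scatter, side scatter, and a mapping of channel name to fluorescence intensity; the `Event` class, the channel name `marker_1`, and the threshold value are hypothetical and do not describe any particular instrument or software.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """One cell's measurements (hypothetical record layout)."""
    fsc: float  # forward scatter
    ssc: float  # side scatter
    fluorescence: dict = field(default_factory=dict)  # channel -> intensity


def gate(events, channel: str, threshold: float):
    """Split events into (positive, negative) by one fluorescence channel.

    An event is positive when its intensity in `channel` is at or above
    `threshold`; a missing channel is treated as zero intensity.
    """
    positive, negative = [], []
    for event in events:
        if event.fluorescence.get(channel, 0.0) >= threshold:
            positive.append(event)
        else:
            negative.append(event)
    return positive, negative


# Hypothetical events and threshold, for illustration only.
events = [
    Event(fsc=520.0, ssc=110.0, fluorescence={"marker_1": 1500.0}),
    Event(fsc=480.0, ssc=95.0, fluorescence={"marker_1": 40.0}),
    Event(fsc=505.0, ssc=102.0, fluorescence={}),
]
marker_pos, marker_neg = gate(events, "marker_1", threshold=200.0)
```

In practice, gates are usually drawn on several parameters at once and thresholds are set against controls; the single-channel cut here only illustrates the idea of separating marker-positive from marker-negative cells.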
Pharmaceutical Composition
[0301] Provided herein are uses of the identified TCRs, nucleic acids encoding the TCRs, or cells comprising the TCRs described herein in the manufacture of a medicament. The medicament can be an identified TCR, a nucleic acid encoding the TCR, a cell comprising the TCR or the nucleic acid encoding the TCR described herein. The identified TCR, the nucleic acid encoding the TCR, or the cell comprising the TCR or the nucleic acid encoding the TCR described herein can be formulated as a pharmaceutical composition with additional adjuvants or pharmaceutically acceptable carriers or excipients. The pharmaceutical composition described herein can comprise a population of cells (e.g., immune cells or T cells) comprising the identified tumor-reactive TCRs. In some cases, a given cell of the population of cells can express a single tumor-reactive TCR of the identified tumor-reactive TCRs. In some cases, each cell of the population of cells can express a tumor-reactive TCR of the identified tumor-reactive TCRs. In some cases, each cell of the population of cells can express a different or a unique tumor-reactive TCR of the identified tumor-reactive TCRs. The population of cells can comprise at least 10^5 cells, at least 10^6 cells, at least 10^7 cells, at least 10^8 cells, at least 10^9 cells, at least 10^10 cells, at least 10^11 cells, at least 10^12 cells, at least 10^13 cells, at least 10^14 cells, at least 10^15 cells, at least 10^16 cells, at least 10^17 cells, at least 10^18 cells, at least 10^19 cells, at least 10^20 cells, or more cells.
[0302] A pharmaceutical composition comprising an active agent such as an immune cell comprising the TCRs described herein, in combination with one or more adjuvants can be formulated in conventional manner using one or more physiologically acceptable carriers, comprising excipients, diluents, and/or auxiliaries, e.g., which facilitate processing of the active agents into preparations that
can be administered. Proper formulation can depend at least in part upon the route of administration chosen. The agent(s) described herein can be delivered to a patient using a number of routes or modes of administration, including oral, buccal, topical, rectal, transdermal, transmucosal, subcutaneous, intravenous, and intramuscular applications, as well as by inhalation. The active agents can be formulated for parenteral administration (e.g., by injection, for example bolus injection or continuous infusion) and can be presented in unit dose form in ampoules, pre-filled syringes, small volume infusion or in multi-dose containers with an added preservative. The compositions can take such forms as suspensions, solutions, or emulsions in oily or aqueous vehicles, for example solutions in aqueous polyethylene glycol.
[0303] In some embodiments, a pharmaceutical composition comprised of the identified TCR, nucleic acid encoding the TCR, or a cell comprising the TCR can further comprise an acceptable additive in order to improve the stability of immune cells in the composition. Acceptable additives may not alter the specific activity of the immune cells. Examples of acceptable additives include, but are not limited to, a sugar such as mannitol, sorbitol, glucose, xylitol, trehalose, sorbose, sucrose, galactose, dextran, dextrose, fructose, lactose and mixtures thereof. Acceptable additives can be combined with acceptable carriers and/or excipients such as dextrose. Alternatively, examples of acceptable additives include, but are not limited to, a surfactant such as polysorbate 20 or polysorbate 80 to increase stability of the peptide and decrease gelling of the solution. The surfactant can be added to the composition in an amount of 0.01% to 5% of the solution. Addition of such acceptable additives increases the stability and half-life of the composition in storage.
[0304] When the compositions of the identified TCR, nucleic acid encoding the TCR, or a cell comprising the TCR are considered for use in medicaments or any of the methods provided herein, it is contemplated that the composition can be substantially free of pyrogens such that the composition will not cause an inflammatory reaction or an unsafe allergic reaction when administered to a human patient. Testing compositions for pyrogens and preparing compositions substantially free of pyrogens are well understood by one of ordinary skill in the art and can be accomplished using commercially available kits.
[0305] Acceptable carriers can contain a compound that acts as a stabilizing agent, increases or delays absorption, or increases or delays clearance. Such compounds include, for example, carbohydrates, such as glucose, sucrose, or dextrans; low molecular weight proteins; compositions that reduce the clearance or hydrolysis of peptides; or excipients or other stabilizers and/or buffers. Agents that delay absorption include, for example, aluminum monostearate and gelatin. Detergents can also be used to stabilize or to increase or decrease the absorption of the pharmaceutical composition, including liposomal carriers. To protect from digestion the compound can be complexed with a composition to
render it resistant to acidic and enzymatic hydrolysis, or the compound can be complexed in an appropriately resistant carrier such as a liposome. Means of protecting compounds from digestion are known in the art (e.g., Fix (1996) Pharm Res. 13: 1760-1764; Samanen (1996) J. Pharm. Pharmacol. 48: 119-135; and U.S. Pat. No. 5,391,377).
[0306] For injectable formulations, the vehicle can be chosen from those known in the art to be suitable, including aqueous solutions or oil suspensions, or emulsions, with sesame oil, corn oil, cottonseed oil, or peanut oil, as well as elixirs, mannitol, dextrose, or a sterile aqueous solution, and similar pharmaceutical vehicles. The formulation can also comprise polymer compositions which are biocompatible and biodegradable, such as poly(lactic-co-glycolic acid). These materials can be made into micro or nanospheres, loaded with drug and further coated or derivatized to provide superior sustained release performance. Vehicles suitable for periocular or intraocular injection include, for example, suspensions of therapeutic agent in injection grade water, liposomes and vehicles suitable for lipophilic substances. Other vehicles for periocular or intraocular injection are well known in the art.
[0307] In some instances, the pharmaceutical composition is formulated in accordance with routine procedures as a pharmaceutical composition adapted for intravenous administration to human beings. Typically, compositions for intravenous administration are solutions in sterile isotonic aqueous buffer. Where necessary, the composition can also include a solubilizing agent and a local anesthetic such as lidocaine to ease pain at the site of the injection. Generally, the ingredients are supplied either separately or mixed together in unit dosage form, for example, as a dry lyophilized powder or water-free concentrate in a hermetically sealed container such as an ampoule or sachet indicating the quantity of active agent. Where the composition is to be administered by infusion, it can be dispensed with an infusion bottle containing sterile pharmaceutical grade water or saline. Where the composition is administered by injection, an ampoule of sterile water for injection or saline can be provided so that the ingredients can be mixed prior to administration.
[0308] When administration is by injection, the active agent can be formulated in aqueous solutions, specifically in physiologically compatible buffers such as Hank’s solution, Ringer’s solution, or physiological saline buffer. The solution can contain formulatory agents such as suspending, stabilizing and/or dispersing agents. In another embodiment, the pharmaceutical composition does not comprise an adjuvant or any other substance added to enhance the immune response.
[0309] In addition to the formulations described previously, the active agents can also be formulated as a depot preparation. Such long-acting formulations can be administered by implantation or transcutaneous delivery (for example subcutaneously or intramuscularly), intramuscular injection or use of a transdermal patch. Thus, for example, the agents can be formulated with suitable polymeric
or hydrophobic materials (for example as an emulsion in an acceptable oil) or ion exchange resins, or as sparingly soluble derivatives, for example, as a sparingly soluble salt.
[0310] In some embodiments, a pharmaceutical composition can comprise the population of engineered immune cells produced according to any of the methods disclosed herein. In some embodiments, a pharmaceutical composition can comprise the engineered immune cells comprising the TCRs or nucleic acids encoding the TCRs disclosed herein. In some embodiments, a pharmaceutical composition can comprise the population of engineered immune cells disclosed herein.
[0311] In some embodiments, any of the pharmaceutical compositions disclosed herein can be administered to a subject. In some embodiments, any of the pharmaceutical compositions disclosed herein can be administered to a subject to treat a disease or condition in a subject in need thereof. In some embodiments, the subject can be the same subject from which the biological sample is obtained.
[0312] In some embodiments, any identified TCR, nucleic acid encoding the TCR, or a cell comprising the TCR disclosed herein can be used in the manufacture of a medicament for treating a disease or a condition (e.g., cancer, autoimmune disease, or infectious disease) in a subject. In some embodiments, any identified TCR, nucleic acid encoding the TCR, or a cell comprising the TCR disclosed herein can be used in the manufacture of a medicament for treating a cancer in a subject.
[0313] In some embodiments, the disease or condition that can be treated with the methods disclosed herein is abnormal growth of cells. In some embodiments, the disease or condition that can be treated with the methods disclosed herein is cancer. In some embodiments, the cancer is a malignant cancer. In some embodiments, the cancer is a benign cancer. In some embodiments, the cancer is an invasive cancer. In some embodiments, the cancer is a solid tumor. In some embodiments, the cancer is a liquid cancer.
[0314] The methods of the disclosure can be used to treat any type of cancer known in the art. Non-limiting examples of cancers to be treated by the methods of the present disclosure include melanoma (e.g., metastatic malignant melanoma), renal cancer (e.g., clear cell carcinoma), prostate cancer (e.g., hormone refractory prostate adenocarcinoma), pancreatic adenocarcinoma, breast cancer, colon cancer, lung cancer (e.g., non-small cell lung cancer), esophageal cancer, squamous cell carcinoma of the head and neck, liver cancer, ovarian cancer, cervical cancer, thyroid cancer, glioblastoma, glioma, leukemia, lymphoma, and other neoplastic malignancies.
[0315] Additionally, the disease or condition provided herein includes refractory or recurrent malignancies whose growth may be inhibited using the methods of treatment of the present disclosure. In some embodiments, a cancer to be treated by the methods of treatment of the present disclosure is selected from the group consisting of carcinoma, squamous carcinoma, adenocarcinoma, sarcomata, endometrial cancer, breast cancer, ovarian cancer, cervical cancer, fallopian tube cancer, primary
peritoneal cancer, colon cancer, colorectal cancer, squamous cell carcinoma of the anogenital region, melanoma, renal cell carcinoma, lung cancer, non-small cell lung cancer, squamous cell carcinoma of the lung, stomach cancer, bladder cancer, gall bladder cancer, liver cancer, thyroid cancer, laryngeal cancer, salivary gland cancer, esophageal cancer, head and neck cancer, glioblastoma, glioma, squamous cell carcinoma of the head and neck, prostate cancer, pancreatic cancer, mesothelioma, sarcoma, hematological cancer, leukemia, lymphoma, neuroma, and combinations thereof. In some embodiments, a cancer to be treated by the methods of the present disclosure includes, for example, carcinoma, squamous carcinoma (for example, cervical canal, eyelid, tunica conjunctiva, vagina, lung, oral cavity, skin, urinary bladder, tongue, larynx, and gullet), and adenocarcinoma (for example, prostate, small intestine, endometrium, cervical canal, large intestine, lung, pancreas, gullet, rectum, uterus, stomach, mammary gland, and ovary). In some embodiments, a cancer to be treated by the methods of the present disclosure further includes sarcomata (for example, myogenic sarcoma), leukosis, neuroma, melanoma, and lymphoma. In some embodiments, a cancer to be treated by the methods of the present disclosure is breast cancer. In some embodiments, a cancer to be treated by the methods of treatment of the present disclosure is triple negative breast cancer (TNBC). In some embodiments, a cancer to be treated by the methods of treatment of the present disclosure is ovarian cancer. In some embodiments, a cancer to be treated by the methods of treatment of the present disclosure is colorectal cancer.
[0316] In some embodiments, a patient or population of patients to be treated with a pharmaceutical composition of the present disclosure has a solid tumor. In some embodiments, a solid tumor is a melanoma, renal cell carcinoma, lung cancer, bladder cancer, breast cancer, cervical cancer, colon cancer, gall bladder cancer, laryngeal cancer, liver cancer, thyroid cancer, stomach cancer, salivary gland cancer, prostate cancer, pancreatic cancer, or Merkel cell carcinoma. In some embodiments, a patient or population of patients to be treated with a pharmaceutical composition of the present disclosure has a hematological cancer. In some embodiments, the patient has a hematological cancer such as diffuse large B cell lymphoma (“DLBCL”), Hodgkin’s lymphoma (“HL”), Non-Hodgkin’s lymphoma (“NHL”), follicular lymphoma (“FL”), acute myeloid leukemia (“AML”), or multiple myeloma (“MM”). In some embodiments, a patient or population of patients to be treated has a cancer selected from the group consisting of ovarian cancer, lung cancer and melanoma.
[0317] Specific examples of cancers that can be prevented and/or treated in accordance with the present disclosure include, but are not limited to, the following: renal cancer, kidney cancer, glioblastoma multiforme, metastatic breast cancer; breast carcinoma; breast sarcoma; neurofibroma; neurofibromatosis; pediatric tumors; neuroblastoma; malignant melanoma; carcinomas of the epidermis; leukemias such as but not limited to, acute leukemia, acute lymphocytic leukemia, acute
myelocytic leukemias such as myeloblastic, promyelocytic, myelomonocytic, monocytic, and erythroleukemia leukemias and myelodysplastic syndrome, chronic leukemias such as but not limited to, chronic myelocytic (granulocytic) leukemia, chronic lymphocytic leukemia, hairy cell leukemia; polycythemia vera; lymphomas such as but not limited to Hodgkin’s disease, non-Hodgkin’s disease; multiple myelomas such as but not limited to smoldering multiple myeloma, nonsecretory myeloma, osteosclerotic myeloma, plasma cell leukemia, solitary plasmacytoma and extramedullary plasmacytoma; Waldenstrom’s macroglobulinemia; monoclonal gammopathy of undetermined significance; benign monoclonal gammopathy; heavy chain disease; bone cancer and connective tissue sarcomas such as but not limited to bone sarcoma, myeloma bone disease, multiple myeloma, cholesteatoma-induced bone osteosarcoma, Paget’s disease of bone, osteosarcoma, chondrosarcoma, Ewing’s sarcoma, malignant giant cell tumor, fibrosarcoma of bone, chordoma, periosteal sarcoma, soft-tissue sarcomas, angiosarcoma (hemangiosarcoma), fibrosarcoma, Kaposi’s sarcoma, leiomyosarcoma, liposarcoma, lymphangiosarcoma, neurilemmoma, rhabdomyosarcoma, and synovial sarcoma; brain tumors such as but not limited to, glioma, astrocytoma, brain stem glioma, ependymoma, oligodendroglioma, nonglial tumor, acoustic neurinoma, craniopharyngioma, medulloblastoma, meningioma, pineocytoma, pineoblastoma, and primary brain lymphoma; breast cancer including but not limited to adenocarcinoma, lobular (small cell) carcinoma, intraductal carcinoma, medullary breast cancer, mucinous breast cancer, tubular breast cancer, papillary breast cancer, Paget’s disease (including juvenile Paget’s disease) and inflammatory breast cancer; adrenal cancer such as but not limited to pheochromocytoma and adrenocortical carcinoma; thyroid cancer such as but not limited to papillary or follicular thyroid cancer, medullary thyroid cancer and anaplastic thyroid cancer;
pancreatic cancer such as but not limited to, insulinoma, gastrinoma, glucagonoma, vipoma, somatostatin-secreting tumor, and carcinoid or islet cell tumor; pituitary cancers such as but not limited to Cushing’s disease, prolactin-secreting tumor, acromegaly, and diabetes insipidus; eye cancers such as but not limited to ocular melanoma such as iris melanoma, choroidal melanoma, and ciliary body melanoma, and retinoblastoma; vaginal cancers such as squamous cell carcinoma, adenocarcinoma, and melanoma; vulvar cancer such as squamous cell carcinoma, melanoma, adenocarcinoma, basal cell carcinoma, sarcoma, and Paget’s disease; cervical cancers such as but not limited to, squamous cell carcinoma, and adenocarcinoma; uterine cancers such as but not limited to endometrial carcinoma and uterine sarcoma; ovarian cancers such as but not limited to, ovarian epithelial carcinoma, borderline tumor, germ cell tumor, and stromal tumor; cervical carcinoma; esophageal cancers such as but not limited to, squamous cancer, adenocarcinoma, adenoid cystic carcinoma, mucoepidermoid carcinoma, adenosquamous carcinoma, sarcoma, melanoma, plasmacytoma, verrucous carcinoma, and oat cell (small cell) carcinoma;
stomach cancers such as but not limited to, adenocarcinoma, fungating (polypoid), ulcerating, superficial spreading, diffusely spreading, malignant lymphoma, liposarcoma, fibrosarcoma, and carcinosarcoma; colon cancers; colorectal cancer, colon carcinoma; rectal cancers; liver cancers such as but not limited to hepatocellular carcinoma and hepatoblastoma; gallbladder cancers such as adenocarcinoma; cholangiocarcinomas such as but not limited to papillary, nodular, and diffuse; lung cancers such as non-small cell lung cancer, squamous cell carcinoma (epidermoid carcinoma), adenocarcinoma, large-cell carcinoma and small-cell lung cancer; lung carcinoma; testicular cancers such as but not limited to germinal tumor, seminoma, anaplastic, classic (typical), spermatocytic, nonseminoma, embryonal carcinoma, teratoma carcinoma, choriocarcinoma (yolk-sac tumor); prostate cancers such as but not limited to, androgen-independent prostate cancer, androgen-dependent prostate cancer, adenocarcinoma, leiomyosarcoma, and rhabdomyosarcoma; penile cancers; oral cancers such as but not limited to squamous cell carcinoma; basal cancers; salivary gland cancers such as but not limited to adenocarcinoma, mucoepidermoid carcinoma, and adenoid cystic carcinoma; pharynx cancers such as but not limited to squamous cell cancer, and verrucous; skin cancers such as but not limited to, basal cell carcinoma, squamous cell carcinoma and melanoma, superficial spreading melanoma, nodular melanoma, lentigo maligna melanoma, acral lentiginous melanoma; kidney cancers such as but not limited to renal cell cancer, adenocarcinoma, hypernephroma, fibrosarcoma, transitional cell cancer (renal pelvis and/or ureter); renal carcinoma; Wilms’ tumor; bladder cancers such as but not limited to transitional cell carcinoma, squamous cell cancer, adenocarcinoma, carcinosarcoma.
In addition, cancers include myxosarcoma, osteogenic sarcoma, endotheliosarcoma, lymphangioendotheliosarcoma, mesothelioma, synovioma, hemangioblastoma, epithelial carcinoma, cystadenocarcinoma, bronchogenic carcinoma, sweat gland carcinoma, sebaceous gland carcinoma, papillary carcinoma and papillary adenocarcinomas. Cancers include, but are not limited to, B cell cancer, e.g., multiple myeloma, Waldenstrom’s macroglobulinemia, the heavy chain diseases, such as, for example, alpha chain disease, gamma chain disease, and mu chain disease, benign monoclonal gammopathy, and immunocytic amyloidosis, melanomas, breast cancer, lung cancer, bronchus cancer, colorectal cancer, prostate cancer (e.g., metastatic, hormone refractory prostate cancer), pancreatic cancer, stomach cancer, ovarian cancer, urinary bladder cancer, brain or central nervous system cancer, peripheral nervous system cancer, esophageal cancer, cervical cancer, uterine or endometrial cancer, cancer of the oral cavity or pharynx, liver cancer, kidney cancer, testicular cancer, biliary tract cancer, small bowel or appendix cancer, salivary gland cancer, thyroid gland cancer, adrenal gland cancer, osteosarcoma, chondrosarcoma, cancer of hematological tissues, and the like. Other non-limiting examples of types of cancers applicable to the methods encompassed by
the present disclosure include human sarcomas and carcinomas, e.g., fibrosarcoma, myxosarcoma, liposarcoma, chondrosarcoma, osteogenic sarcoma, chordoma, angiosarcoma, endotheliosarcoma, lymphangiosarcoma, lymphangioendotheliosarcoma, synovioma, mesothelioma, Ewing’s tumor, leiomyosarcoma, rhabdomyosarcoma, colon carcinoma, colorectal cancer, pancreatic cancer, breast cancer, ovarian cancer, squamous cell carcinoma, basal cell carcinoma, adenocarcinoma, sweat gland carcinoma, sebaceous gland carcinoma, papillary carcinoma, papillary adenocarcinomas, cystadenocarcinoma, medullary carcinoma, bronchogenic carcinoma, renal cell carcinoma, hepatoma, bile duct carcinoma, liver cancer, choriocarcinoma, seminoma, embryonal carcinoma, Wilms’ tumor, cervical cancer, bone cancer, brain tumor, testicular cancer, lung carcinoma, small cell lung carcinoma, bladder carcinoma, epithelial carcinoma, glioma, astrocytoma, medulloblastoma, craniopharyngioma, ependymoma, pinealoma, hemangioblastoma, acoustic neuroma, oligodendroglioma, meningioma, melanoma, neuroblastoma, retinoblastoma; leukemias, e.g., acute lymphocytic leukemia and acute myelocytic leukemia (myeloblastic, promyelocytic, myelomonocytic, monocytic and erythroleukemia); chronic leukemia (chronic myelocytic (granulocytic) leukemia and chronic lymphocytic leukemia); and polycythemia vera, lymphoma (Hodgkin’s disease and non-Hodgkin’s disease), multiple myeloma, Waldenstrom’s macroglobulinemia, and heavy chain disease. In some embodiments, the cancer whose phenotype is determined by the method of the present disclosure is an epithelial cancer such as, but not limited to, bladder cancer, breast cancer, cervical cancer, colon cancer, gynecologic cancers, renal cancer, laryngeal cancer, lung cancer, oral cancer, head and neck cancer, ovarian cancer, pancreatic cancer, prostate cancer, or skin cancer. In other embodiments, the cancer is breast cancer, prostate cancer, lung cancer, or colon cancer. 
In still other embodiments, the epithelial cancer is non-small-cell lung cancer, nonpapillary renal cell carcinoma, cervical carcinoma, ovarian carcinoma (e.g., serous ovarian carcinoma), or breast carcinoma. The epithelial cancers may be characterized in various other ways including, but not limited to, serous, endometrioid, mucinous, clear cell, Brenner, or undifferentiated. In some embodiments, the present disclosure is used in the treatment, diagnosis, and/or prognosis of lymphoma or its subtypes, including, but not limited to, mantle cell lymphoma. Lymphoproliferative disorders are also considered to be proliferative diseases.
[0318] Cancer refers to diseases in which abnormal cells divide out of control and are able to invade other tissues. Cancer cells can spread to other parts of the body through the blood and lymph systems. Cancer can be characterized as a group of diseases involving abnormal cell growth that may begin in any tissue with the potential to invade or spread to other parts of the body. Some cancers can be characterized by their type, e.g., solid cancers, liquid cancers, or based on cellular origin such as hematopoietic cancers, osteosarcoma or lymphoma. Some cancers are known by the tissue of their
origin or prevalence, e.g., endometrial cancers are characterized as cancers of the endometrial tissue. Some cancers are known by the organ or site of their origin or prevalence, e.g., lung cancer, head and neck cancer. Some cancers may be known by the overproduction of certain proteins, enzymes or biomarkers compared to their counterpart cells or tissues that are not cancerous. For example, certain proteins of viral origin may be associated with certain cancers, such as HPV-16 cancers, where certain proteins, for example HPV-16 E6 and E7, are overexpressed in cancer cells of this type. For example, certain antigens, such as KRAS, may be highly expressed in certain cancer types, compared to non-cancer cells of the same type, and such cancers may be designated as KRAS overexpressing cancers. Typically, the overexpression of the antigen or the specific protein may be associated with or related to one or more mutations, and the cancer type may be associated with the mutation. For example, the wild-type G residue corresponding to position 12 in the KRAS amino acid sequence may be mutated to V, D, C or other amino acids in KRAS-specific cancer cells. Certain specific antigens may be specifically expressed in cancer cells of certain cancer types, and not in other cancer types. Various cancers are contemplated herein that may not be restricted to a specific cell type, tissue type or organ, or even a certain stage of cancer. The TCRs of the present invention are directed to cancer cells that express a cancer antigen that may be patient specific, which can be found by sequencing a subject’s genome from a biological sample obtained from a cancer cell, cancer site or cancer tissue and comparing it to a corresponding non-cancer sample from the same subject; wherein the patient-specific antigen may be expressed in the cancer cell, and not in the non-cancer cell of the subject.
In some cases, cancer antigens may be cancer specific, where the antigen is reportedly present in the type of cancer observed in multiple patients in the human population who have been diagnosed with the specific cancer. In some cases, certain types of cancers are associated with an antigen, a protein (e.g., a viral protein), or a gene mutation; all forms of cancer are contemplated herein.
[0319] In some embodiments, the cancer is a solid cancer. In some cases, the cancer is a liquid / blood cancer. The cancer can express or be diagnosed as expressing a tumor antigen. The tumor antigen can be a tumor-associated antigen or a tumor-specific antigen. In some cases, the cancer expresses a tumor-associated antigen (TAA). In some cases, the cancer expresses a tumor-specific antigen (TSA).
[0320] In some embodiments, the cancer is a cancer expressing or diagnosed as expressing a TAA. In some embodiments, the cancer is a cancer expressing or diagnosed as expressing a TSA.
[0321] The current classification of TAA can include the following groups: a) Cancer testis (CT) antigen: Since testis cells do not express HLA class I and class II molecules, these antigens may not be recognized by T cells in normal tissues and may therefore be immunologically considered tumor specific. Non-limiting examples of CT antigens include members of the MAGE family and NY-ESO-1; b) Differentiation antigen: both tumor and normal tissue (from which the tumor originates) may contain TAAs. Differentiation antigens may be found, for example, in melanoma and normal melanocytes. Many of these melanocyte lineage-associated proteins may be involved in melanin biosynthesis, and therefore these proteins may not be tumor-specific but may still be widely used for immunotherapy of cancer. Examples include, but are not limited to, tyrosinase and Melan-A/MART-1 for melanoma or PSA for prostate cancer; c) Overexpressed TAA: gene-encoded widely expressed TAAs may be detected in histologically diverse tumors and in many normal tissues, with generally low expression levels. It is possible that many epitopes processed and potentially presented by normal tissues may be below the threshold level of T cell recognition, whereas their overexpression in tumor cells can trigger anticancer responses by breaking previously established tolerance. Non-limiting examples of such TAAs include Her-2/neu, survivin, telomerase or WT1; d) Tumor specific antigen: these can include unique TAAs resulting from mutations in normal genes (e.g., beta-catenin, CDK4). Some of these molecular changes can be associated with neoplastic transformation and/or progression. Tumor-specific antigens can generally induce strong immune responses without the risk of an autoimmune response against normal tissues. On the other hand, these TAAs may only be associated with the exact tumor on which they are confirmed and may not be commonly shared among many individual tumors.
In the case of tumor specific (related) isoform proteins, peptide tumor specificity (or relatedness) may also occur if the peptide is derived from tumor (related) exons; e) TAA resulting from aberrant post-translational modification: such TAAs may result from proteins in the tumor that are neither specific nor overexpressed, but which still have tumor relevance (this relevance is due to post-translational processing that is primarily active in tumors). Such TAAs may result from an altered glycosylation pattern, resulting in a tumor producing a novel epitope for MUC1, or from an event such as protein splicing during degradation, which may or may not be tumor specific; and f) Tumor virus protein: these TAAs are viral proteins that may play a key role in the oncogenic process and, because they are foreign proteins (non-human proteins), may be able to trigger T cell responses. Non-limiting examples of such proteins include the human papillomavirus type 16 proteins E6 and E7, which are expressed in cervical cancer.
[0322] Examples of tumor antigens include, but are not limited to, new antigens expressed during tumorigenesis, products of oncogenes and tumor suppressor genes, overexpressed or abnormally expressed intracellular proteins (e.g., HER2, MUC1, PSA), carcinoembryonic antigen (CEA), tumor viruses (e.g., EBV, HPV, HBV, HCV, HTLV), cancer testis antigens (CTA) (e.g., MAGE family, NY-ESO), oncofetal antigens, altered surface glycolipids and glycoproteins, cell type-specific differentiation antigens (e.g., MART-1), or a derivative thereof. The tumor antigens can be selected from the group consisting of NY-ESO-1, Her2/neu, SSX-2, MAGE-C2, MAGE-A1, M-2433-233, MAGE-A10 254-262, KK-LC-1, p53, PRAME, Alpha fetoprotein, HPV6-E6, HPV16-E7, EBV-LMP1, RAS: G12D, RAS: G12C, RAS: G12A, RAS: G12S, RAS: G12R, RAS: G12V, RAS: Q61H, RAS: Q61L, RAS: Q61R, RAS: G13D, TP53: V157G, TP53: V157F, TP53: R248Q, TP53: R248W, TP53: G245S, TP53: Y163C, TP53: G249S, TP53: Y240C, TP53: R175H, TP53: K132N, CDC73: Q254E, TPP2A6: N438Y, CTNNB1: T41A, CTNNB1: S45P, CTNNB1: S37Y, CTNNB1: S33C, EGFR: L858R, EGFR: T790M, PIK3CA: E542K, PIK3CA: H1047R, GNAS: R201H, CDK4: R24, R24C, H3.3: K28M, BRAF: V600E, CHD4: K73Rfs, NRAS: Q61R, IDH1: R132H, TVP23C: C51Y, and any combination thereof. The RAS can be KRAS, HRAS, or NRAS.
[0323] Other non-limiting examples of tumor-associated antigens or tumor-specific antigens include antigens from Human Papilloma Virus, Epstein-Barr Virus, Merkel cell polyomavirus, Human Immunodeficiency Virus, Human T-cell Leukemia Virus, Human Herpes Virus 8, Hepatitis B virus (HBV), Hepatitis C virus (HCV), Cytomegalovirus, or from the group of single-point mutated antigens derived from the group consisting of the antigens of the CTNNB1 gene, CASP8 gene, HER2 gene, p53 gene, KRAS gene, NRAS gene, or particular tumor antigens issued or derived from the group consisting of RAS oncogene, BCR-ABL tumor antigens, ETV6-AML1 tumor antigens, melanoma-antigen encoding genes (MAGE), BAGE antigens, GAGE antigens, SSX antigens, NY-ESO-1 antigens, cyclin-A1 tumor antigens, MART-1 antigen, gp100 antigen, CD19 antigen, prostate specific antigen, prostatic acidic phosphatase antigen, carcinoembryonic antigen, alpha-fetoprotein antigen, carcinoma antigen 125, mucin 16 antigen, mucin 1 antigen, human telomerase reverse transcriptase antigen, EGFR antigen, MOK antigen, RAGE-1 antigen, PRAME antigen, wild-type p53 antigen, oncogene ERBB2 antigen, sialyl-Tn tumor antigen, Wilms tumor 1 antigen, mesothelin antigen, carbohydrate antigens, β-catenin antigen, MUM-1 antigen, CDK4 antigen, ERBB2IP antigen, and Melan-A melanoma tumor-associated antigen.
[0324] In some cases, the cancer cells express tumor antigens including, but not limited to, NY-ESO-1, Her2/neu, SSX-2, MAGE-C2, MAGE-A1, M-2433-233, MAGE-A10 254-262, KK-LC-1, p53, PRAME, Alpha fetoprotein, HPV6-E6, HPV16-E7, EBV-LMP1, RAS: G12D, RAS: G12C, RAS: G12A, RAS: G12S, RAS: G12R, RAS: G12V, RAS: Q61H, RAS: Q61L, RAS: Q61R, RAS: G13D, TP53: V157G, TP53: V157F, TP53: R248Q, TP53: R248W, TP53: G245S, TP53: Y163C, TP53: G249S, TP53: Y240C, TP53: R175H, TP53: K132N, CDC73: Q254E, TPP2A6: N438Y, CTNNB1: T41A, CTNNB1: S45P, CTNNB1: S37Y, CTNNB1: S33C, EGFR: L858R, EGFR: T790M, PIK3CA: E542K, PIK3CA: H1047R, GNAS: R201H, CDK4: R24, R24C, H3.3: K28M, BRAF: V600E, CHD4: K73Rfs, NRAS: Q61R, IDH1: R132H, or TVP23C: C51Y. The RAS can be KRAS, HRAS, or NRAS.
[0325] In some embodiments, the cancer is a carcinoma, lymphoma, blastoma, sarcoma, leukemia, squamous cell cancer, lung cancer (including small cell lung cancer, non-small cell lung cancer, adenocarcinoma of the lung, and squamous carcinoma of the lung), cancer of the peritoneum, hepatocellular cancer, gastric or stomach cancer (including gastrointestinal cancer), pancreatic cancer, glioblastoma, cervical cancer, ovarian cancer, liver cancer, bladder cancer, hepatoma, breast cancer, colon cancer, melanoma, endometrial or uterine carcinoma, salivary gland carcinoma, kidney or renal cancer, prostate cancer, vulval cancer, thyroid cancer, hepatic carcinoma, head and neck cancer, colorectal cancer, rectal cancer, soft-tissue sarcoma, Kaposi’s sarcoma, B-cell lymphoma (including low grade/follicular non-Hodgkin’s lymphoma (NHL), small lymphocytic (SL) NHL, intermediate grade/follicular NHL, intermediate grade diffuse NHL, high grade immunoblastic NHL, high grade lymphoblastic NHL, high grade small non-cleaved cell NHL, bulky disease NHL, mantle cell lymphoma, AIDS-related lymphoma, and Waldenstrom’s macroglobulinemia), chronic lymphocytic leukemia (CLL), acute lymphoblastic leukemia (ALL), myeloma, hairy cell leukemia, chronic myeloblastic leukemia, post-transplant lymphoproliferative disorder (PTLD), abnormal vascular proliferation associated with phakomatoses, edema, Meigs’ syndrome, or combinations thereof.
Samples
[0326] The present disclosure can be applied to a range of samples depending on the specific context and requirements. In general, any type of tumor tissue sample, derived from a subject having a cancer, may be suitable. This can include solid tumors such as breast, lung, colorectal, prostate, and skin cancers, or hematological malignancies like lymphomas and leukemias. Additionally, fine needle aspirates, biopsies, or resected tumor tissues could be used. Beyond actual tumor tissue, the method can also be applicable to peripheral blood mononuclear cells (PBMCs), or lymphocytes isolated from the blood, or samples obtained from other body fluids, such as pleural effusions or ascites, which may contain tumor-infiltrating T cells. Moreover, samples from any solid tumor that can be dissociated into single cells would also be suitable for this method. The T cell can be obtained from a tissue sample comprising a solid tissue, with non-limiting examples including a tissue from brain, liver, lung, kidney, prostate, ovary, spleen, lymph node (e.g., tonsil), thyroid, thymus, pancreas, heart, skeletal muscle,
intestine, larynx, esophagus, and stomach. Additional non-limiting sources include bone marrow, cord blood, tissue from a site of infection, ascites, pleural effusion, spleen tissue, and tumors. In some cases, the T cells can be obtained from a solid tumor lesion from a subject. The T cell can be derived or obtained from a healthy donor, from a patient diagnosed with cancer or from a patient diagnosed with an infection.
[0327] The T cells can be isolated from a sample and selected for certain properties by various methods. When isolating T cells from tissues (e.g., isolating tumor-infiltrating T cells from tumor tissues), the tissues may be minced or fragmented to dissociate cells before lysing the red blood cells or depleting the monocytes.
[0328] The source T cells can be tumor-infiltrating lymphocytes (TILs), e.g., tumor-infiltrating T cells (TITs). A TIL can be isolated from an organ afflicted with a cancer. One or more cells can be isolated from an organ with a cancer that can be a brain, heart, lungs, eye, stomach, pancreas, kidneys, liver, intestines, uterus, bladder, skin, hair, nails, ears, glands, nose, mouth, lips, spleen, gums, teeth, tongue, salivary glands, tonsils, pharynx, esophagus, large intestine, small intestine, rectum, anus, thyroid gland, thymus gland, bones, cartilage, tendons, ligaments, suprarenal capsule, skeletal muscles, smooth muscles, blood vessels, blood, spinal cord, trachea, ureters, urethra, hypothalamus, pituitary, pylorus, adrenal glands, ovaries, oviducts, vagina, mammary glands, testes, seminal vesicles, penis, lymph, lymph nodes or lymph vessels. One or more TILs can be from a brain, heart, liver, skin, intestine, lung, kidney, eye, small bowel, or pancreas. TILs can be from a pancreas, kidney, eye, liver, small bowel, lung, or heart. The one or more cells can be pancreatic islet cells, for example, pancreatic β cells. In some cases, a TIL can be from a gastrointestinal cancer.
[0329] In some cases, the tumor sample can be a surgically removed tumor sample (or a resection sample). In some cases, the tumor sample can be a biopsy sample such as core biopsy, fine needle biopsy sample, or a large needle biopsy sample.
EXAMPLES
Example 1: Predictive identification of tumor antigen-reactive T cell receptors from exhausted T cells
[0330] An antigen-agnostic prediction algorithm was developed to identify tumor antigen-reactive TCRs for both CD8+ and CD4+ T cells within a tumor. The bioinformatic algorithm takes as input the molecular signatures captured by single-cell transcriptome (scGEX) and TCR (scTCR) sequencing of TILs. The model was trained on seven tumor samples across three cancer types: non-small cell lung cancer (NSCLC), colorectal cancer (CRC), and head and neck cancer. When applied retrospectively and prospectively to fifteen tumor samples across five cancer types (NSCLC, CRC, head and neck, ovarian, and breast cancers), the algorithm described herein achieved a median positive predictive value (PPV) of 40% for the top five CD8+ TCR clones and 70% for the top five CD4+ TCR clones. On the combined top ten selected clones, the algorithm achieved a median PPV of 70%. The top 10 clones targeted a median of four unique somatic mutations presented by three unique HLA alleles (Table 7). The performance of the prediction algorithm paves the way for developing a personalized adoptive T cell therapy to treat solid tumors.
In brief, the end-to-end algorithm is composed of four sequential steps. It starts with the gene expression matrix and TCR repertoire generated from Cell Ranger-processed scGEX and scTCR sequencing data. Taking scGEX as input, TILs were partitioned into CD4+ and CD8+ populations by unsupervised clustering using a 99-gene signature (Table 2) as variable genes, with six principal components (PCs) at a resolution of 0.1. The 99-gene signature was derived from analysis of in-house data on two tumor samples. Because of the lower mRNA expression level of the CD4 gene and the high dropout rate in scGEX, CD4 and CD8 gene expression alone was not sufficient to separate the two populations (FIG. 1). Computational separation of CD4+ and CD8+ cells can have an advantage over experimental sorting, which could lead to a lower yield of T cells, especially CD8+ cells, given the imbalanced CD4:CD8 ratio often observed in solid tumors. Furthermore, experimental sorting may be associated with a higher cost of goods. FIG. 2 depicts the separation of TILs into CD4+ and CD8+ populations as described. Each dot represents a T cell. FIG. 2A shows a Uniform Manifold Approximation and Projection (UMAP) of the two clusters formed using the 99-gene signature. FIGs. 2B-2C show gene expression levels for CD4 and CD8A in normalized UMI counts, respectively.
In the second step, exhausted CD8+ and CD4+ cells were identified and defined. Tumor-reactive T cells have been shown to carry an exhausted (or dysfunctional) phenotype in TILs. Two exhaustion gene signatures were established: one for CD8+ cells, composed of 20 genes (Table 3), and one for CD4+ cells, also composed of 20 genes (Table 4). The two exhaustion gene signatures were derived from the seven tumor samples that have been fully characterized and analyzed. An exhaustion (EX) score was calculated for each T cell as the sum of normalized UMI counts over the signature gene set. CD4+ cells with EX scores greater than or equal to (≥) 13 were considered exhausted follicular helper cells (CD4.EX.FH). Similarly, CD8+ cells with EX scores > 13 were considered to be in an exhausted state (CD8.EX).
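The EX score calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline's actual code: the gene names are placeholders (the real signatures are those of Tables 3 and 4), each cell is assumed to be a mapping from gene symbol to normalized UMI count, and the cutoff of 13 is applied inclusively, as stated for CD4+ cells.

```python
from typing import Iterable, Mapping

EX_CUTOFF = 13.0  # cutoff value stated in the text


def ex_score(normalized_umi: Mapping[str, float], signature: Iterable[str]) -> float:
    """Sum normalized UMI counts over the signature genes.

    Genes not detected in the cell contribute 0.
    """
    return sum(normalized_umi.get(gene, 0.0) for gene in signature)


def is_exhausted(score: float, cutoff: float = EX_CUTOFF) -> bool:
    """Assumption: inclusive comparison (score >= cutoff)."""
    return score >= cutoff


# Hypothetical three-gene signature and a single cell, for illustration only.
signature = ["GENE_A", "GENE_B", "GENE_C"]
cell = {"GENE_A": 6.0, "GENE_B": 7.5, "UNRELATED": 3.0}

score = ex_score(cell, signature)
print(score, is_exhausted(score))
```

Genes outside the signature are ignored, so the score depends only on the signature set chosen for the cell's lineage.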
[0331] Based on transcriptome-based single-cell clustering, T cells were further classified into CD8+ and CD4+ subtypes including exhausted CD8+ T cells (CD8.EX), cytotoxic CD8+ T cells (CD8.Cytotoxic), resident memory CD8+ T cells (CD8.RM), effector CD8+ T cells or CD8+ T cells re-expressing CD45RA (CD8.EFF.EMRA), effector memory CD8+ T cells (CD8.EM), stressed CD8+ T cells (CD8.Stressed), TH1-like CD4+ T cells (CD4.TH1-like), proliferating T cells (Prolif), regulatory T cells (CD4.Treg.1-3), naive and/or central memory CD4+ T cells (CD4.Naive.CM), and stressed CD4+ T cells (CD4.Stressed) (FIGs. 3A, 3B). Exhausted T cells defined by exhaustion scores were cross-confirmed against the annotated T cell subtypes (FIG. 4A). Exhaustion signature scores assigned to each T cell were compared across subtypes (FIG. 4B).
[0332] Additionally, a gene set enrichment analysis (GSEA) was performed to estimate exhaustion at the single-cell level using a 61-gene signature for CD8+ cells (Table 5) and an 88-gene signature for CD4+ cells (Table 6) (FIGs. 5A-5C and FIGs. 6A-6C). These two signatures were also developed from the seven fully studied tumor samples. Applying a different scoring method (GSEA vs. SUM) to independent gene signatures provided a complementary method of calling the exhaustion state for CD4.EX.FH and CD8.EX cells. CD8+ cells with a GSEA score > 0.3 were considered CD8.EX; CD4+ cells with a GSEA score > 0.2 met the criteria for CD4.EX.FH.
[0333] Upon activation by an antigen, a T cell proliferates, a process known as clonal expansion. If more than half of the cells with the same TCR clonotype were predicted to be exhausted by either of the calculation approaches described above, the clonotype was labeled as an EX clone and was selectable as a therapeutic candidate TCR.
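The majority rule for labeling EX clones might be sketched as follows. The sketch reads "either of the calculation approaches" as applying the more-than-half test separately to the SUM-based and the GSEA-based calls, which is one possible interpretation. The `Cell` record, the example clonotype names, and the inclusive SUM comparison are assumptions made for illustration; the cutoffs are those stated above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable

SUM_CUTOFF = 13.0                       # EX (sum) score cutoff from the text
GSEA_CUTOFF = {"CD8": 0.3, "CD4": 0.2}  # GSEA score cutoffs from the text


@dataclass
class Cell:
    clonotype: str
    lineage: str       # "CD8" or "CD4"
    sum_score: float   # EX score from the 20-gene signature
    gsea_score: float  # GSEA score from the 61- or 88-gene signature


def label_ex_clones(cells: Iterable[Cell]) -> Dict[str, bool]:
    """Return {clonotype: True if more than half of its cells are exhausted}."""
    by_clone = defaultdict(list)
    for cell in cells:
        by_clone[cell.clonotype].append(cell)

    labels = {}
    for clonotype, members in by_clone.items():
        n = len(members)
        n_sum = sum(c.sum_score >= SUM_CUTOFF for c in members)
        n_gsea = sum(c.gsea_score > GSEA_CUTOFF[c.lineage] for c in members)
        # Strictly more than half, by either scoring approach.
        labels[clonotype] = 2 * n_sum > n or 2 * n_gsea > n
    return labels


cells = [
    Cell("clone_A", "CD8", 14.0, 0.10),
    Cell("clone_A", "CD8", 15.0, 0.10),
    Cell("clone_A", "CD8", 2.0, 0.10),
    Cell("clone_B", "CD8", 14.0, 0.40),
    Cell("clone_B", "CD8", 1.0, 0.10),
]
print(label_ex_clones(cells))
```

In this toy input, two of three cells in `clone_A` pass the SUM cutoff, so it is labeled EX; exactly half of `clone_B` passes under either method, which does not satisfy "more than half".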
[0334] Table 1. Exemplary Exhaustion Score Calculations
[0335] Table 1 above displays an output example of exhaustion score calculations for CD8+ clones described above using two signature gene sets with either the sum (20-gene signature) or GSEA (61-gene signature) method.
[0336] In step three, several quality control (QC) checks and filters were implemented for the selectable EX clonotypes (FIGs. 7A-7C). QC checks included unique alpha and beta chain pairing status in a clonotype, matches to public TCRs, and expression of innate immune cell marker genes. Specifically: clonotypes with paired TCR-alpha and TCR-beta chains were retained, while clonotypes with only an alpha or only a beta chain captured were filtered out; clonotypes were matched against public TCRs (e.g., the VDJdb collection, https://vdjdb.cdr3.net), and candidate exhausted clones should not match public TCRs that recognize antigens derived from non-oncogenic pathogens; and candidate exhausted T cells should carry as few Mucosal-Associated Invariant T (MAIT) cell features as possible, such as expression of innate immune cell markers (e.g., SLC4A10 and KLRB1) and usage of TRAJ33, TRAJ12, or TRAJ20 in the alpha chain. Lastly, a general rule was implemented to rank TCR clones that passed the QC step from high to low priority. CD4+ and CD8+ clones were ranked separately. Clone size carries the most weight: larger clones rank higher. Among clones of similar size, a clone with a higher median exhaustion score ranks higher. Clones with proliferating cells are preferred. Clones with dual alpha or beta chains are deprioritized. These criteria provided the basis for prioritizing and eventually selecting the top 10 clonotypes, ideally including the top five CD8+ and top five CD4+ clones, to be engineered into autologous T cells as a personalized TCR therapy product.
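The QC filters and ranking rule can be illustrated with a short sketch. Several choices here are assumptions rather than details given in the text: the `Clone` record and its field names are hypothetical, a clone is treated as MAIT-like only when both a MAIT-associated TRAJ gene and innate marker expression are present, "similar clone sizes" is approximated by exact ties, and the relative order of the proliferation and dual-chain tie-breakers is arbitrary.

```python
from dataclasses import dataclass
from typing import List

MAIT_TRAJ = {"TRAJ33", "TRAJ12", "TRAJ20"}  # TRAJ genes named in the text


@dataclass
class Clone:
    clone_id: str
    size: int                  # number of cells sharing the clonotype
    median_ex: float           # median exhaustion score of the clone
    n_alpha: int               # number of alpha chains captured
    n_beta: int                # number of beta chains captured
    traj: str                  # TRAJ gene used by the alpha chain
    innate_markers: bool       # e.g. SLC4A10 / KLRB1 expression detected
    public_pathogen_hit: bool  # matches a public TCR against a non-oncogenic pathogen
    proliferating: bool        # clone contains proliferating cells


def passes_qc(c: Clone) -> bool:
    paired = c.n_alpha >= 1 and c.n_beta >= 1
    mait_like = c.traj in MAIT_TRAJ and c.innate_markers  # assumed definition
    return paired and not c.public_pathogen_hit and not mait_like


def rank_clones(clones: List[Clone]) -> List[Clone]:
    """Filter by QC, then sort from high to low priority."""
    kept = [c for c in clones if passes_qc(c)]
    return sorted(
        kept,
        key=lambda c: (
            -c.size,                        # larger clones first
            -c.median_ex,                   # then higher median exhaustion
            not c.proliferating,            # proliferating clones preferred
            c.n_alpha > 1 or c.n_beta > 1,  # dual-chain clones deprioritized
        ),
    )


clones = [
    Clone("c1", 50, 15.0, 1, 1, "TRAJ5", False, False, False),
    Clone("c2", 50, 18.0, 1, 1, "TRAJ9", False, False, True),
    Clone("c3", 80, 20.0, 1, 0, "TRAJ5", False, False, False),  # unpaired
    Clone("c4", 30, 16.0, 1, 1, "TRAJ33", True, False, False),  # MAIT-like
    Clone("c5", 10, 20.0, 2, 1, "TRAJ5", False, False, False),  # dual alpha
]
print([c.clone_id for c in rank_clones(clones)])
```

Because CD4+ and CD8+ clones are ranked separately, `rank_clones` would be called once per lineage and the top five of each list taken.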
[0337] The bioinformatics pipeline was executed with little human intervention and thus can be implemented in standard operations. Five associated gene signatures are proprietary: a 99-gene signature for computational separation of CD4+ and CD8+ cells, 20-gene and 88-gene signatures to identify CD4.EX.FH cells, and 20-gene and 61-gene signatures to define CD8.EX cells.
[0338] The performance of the prediction algorithm and associated gene signatures has been compared with other published results on TILs. First, the result was compared to prediction methods published by Lowery et al (Dr. Steven Rosenberg’s group). Under similar estimation conditions, Lowery et al ’s signature-based prospective prediction power on four CRC samples achieved a combined validation rate of 26% for CD8+ and 32% for CD4+ clones, respectively. The algorithm described above on the four CRC tumors, achieved a combined validation rate of roughly 36% for CD8+ clones and above 52% for CD4+ clones. Secondly, the gene signature published by Lowery et al. was applied to one of the CRC samples (sample ID BG0028) to predict tumor-reactive TCRs. All top ten TCRs that contribute to the positive predictive value (PPV) would have been missed by their signature-based prediction. It is possible that the published signatures are dependent on other confounding factors in the workflow used by the prediction methods.
[0339] Five gene signatures were developed for the TCR prediction algorithm and are presented in the following tables.
[0340] Table 2 shows the 99 gene signature to separate CD4+ T cells and CD8+ T cells.
[0341] Table 3 shows the CD8+ 20-gene exhaustion signature for calculating exhaustion score.
[0342] Table 4 shows the CD4+ 20-gene exhaustion signature for calculating exhaustion score.
[0343] Table 5 shows the CD8+ 61-gene exhaustion signature for calculating GSEA scores.
[0344] Table 6 shows the CD4+ 88-gene exhaustion signature for calculating GSEA scores.
Example 2. Process of TCR Screening and Validation
Tumor Processing
[0345] Tumor samples were obtained 24 hours following surgery for single-cell analyses. K2EDTA blood samples were obtained from the same subjects. Patient peripheral blood mononuclear cells (PBMCs) were isolated by Ficoll density-gradient centrifugation, and B cells were isolated by positive isolation (Miltenyi) and treated with EBV (ATCC) for immortalization. B cell cultures were expanded in flasks to harvest personalized cellular screening material. PBMCs were frozen for whole-exome sequencing (WES) and HLA typing.
[0346] Patient tumor samples were obtained following surgery. A portion of the sample was removed for formalin fixation and paraffin embedding (FFPE). The remainder of the tissue was minced
manually, and transferred to a tube for WES or suspended in DMEM media, with an available tumor enzymatic digestion kit and DNase, transferred to disposable tubes and incubated with regular agitation in a tissue dissociator for 60 minutes. After digestion, clumps were removed and the single cell suspension was recovered, washed and immediately frozen in aliquots and stored in vapor-phase liquid nitrogen.
[0347] Tumor single cell suspensions were thawed and stained with: Live/Dead Blue, anti-CD45, anti-CD14, anti-CD19, anti-CD3, anti-CD4 and anti-CD8. CD3+ TILs and CD45-negative cells were sorted for single-cell 10x VDJ capture and gene expression or for WES, respectively.
Whole Exome Sequencing
[0348] Tumor fragment samples of 15-25 mg were collected in 350 μL of lysis buffer with freshly prepared β-mercaptoethanol and homogenized using a bead mill. Between 100,000 and 500,000 PBMCs were collected and lysed by vortexing in the lysis buffer described above with poly(A) carrier RNA. Co-extraction of DNA and RNA from tumor and PBMC samples was performed using DNA/RNA mini and micro purification kits, respectively. Genomic DNA was diluted to 100 ng for tumor and normal tissue samples, 100 ng or 10 ng for PBMC samples, and 10 ng for CD45-negative flow-through. Genomic DNA was then sheared. Shearing conditions were 220 W peak incident power, 380 s duration, 220 PIP, 25 DF, 50 CPB, 55 AIP at a setpoint of 10°C in 50 μL. For whole exome sequencing (WES) library preparation, samples were processed using a library preparation kit according to the manufacturer’s protocol. Samples for total RNAseq were processed using a total RNA prep ligation kit and a ribosomal depletion kit. Tumor and normal tissue samples were diluted to 200 ng, PBMC and CD45-negative samples were diluted to 10 ng, and RNAseq library preparation was performed according to the manufacturer’s protocol for this kit. Both WES and RNAseq libraries were pooled, normalized, and sequenced. For WES, tumor samples were sequenced to a depth of 250M-300M paired reads per sample, and normal adjacent, PBMC, and expanded TIL samples were sequenced to 166M-300M paired reads per sample. For RNAseq, tumor samples were sequenced with 120M-166M paired reads per sample.
Single-Cell Sequencing
[0349] Assuming a loss of about 50% of cells from recovery, a minimum of 500 cells to a maximum of 20,000 cells per well were loaded onto a single cell sequencing chip following the manufacturer’s instructions. Gel bead emulsions (GEMs) were generated using microfluidics, and nuclei were barcoded with a UMI and cell barcode using the sequencer. cDNAs were generated by reverse transcription, followed by post-GEM RT clean-up and cDNA amplification. After cDNA amplification, the 5’ Gene Expression (GEX) library was prepared, involving fragmentation of cDNA, end repair and A-tailing, post-fragmentation sample indexing, and adaptor ligation. Concurrently, a 5’ V(D)J library was prepared by enrichment for V(D)J segments using a kit following the manufacturer’s instructions.
[0350] GEX and V(D)J libraries were sequenced for QC and library normalization and then sequenced using a targeting depth of 20,000 reads/cell and 5000 reads/cell respectively. Samples were sequenced using 200 cycles.
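The loading and depth figures above imply a simple read budget per well, which the following sketch works through. The function names are hypothetical, and the 50% loss and per-cell targets are taken from the text; actual recovery varies by sample.

```python
GEX_READS_PER_CELL = 20_000  # targeted GEX depth from the text
VDJ_READS_PER_CELL = 5_000   # targeted V(D)J depth from the text


def expected_recovered_cells(loaded_cells: int, loss_fraction: float = 0.5) -> int:
    """Cells expected after recovery, assuming the stated ~50% loss."""
    return round(loaded_cells * (1.0 - loss_fraction))


def target_reads(cells: int, reads_per_cell: int) -> int:
    """Total reads needed to reach the targeted per-cell depth."""
    return cells * reads_per_cell


for loaded in (500, 20_000):
    cells = expected_recovered_cells(loaded)
    print(
        loaded,
        cells,
        target_reads(cells, GEX_READS_PER_CELL),
        target_reads(cells, VDJ_READS_PER_CELL),
    )
```

At the maximum loading of 20,000 cells, this gives about 10,000 recovered cells, or roughly 200 million GEX reads and 50 million V(D)J reads per well.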
Mutanome Construction
[0351] All computational analysis steps were conducted using an automated analysis pipeline hosted on a cloud computing platform. Tumor and germline WES and tumor RNA-Seq results were aligned to the human genome using BWA-MEM (version 0.7.13) and STAR (version 2.5.1.b), respectively. All alignments were post-processed with the GATK 3.5 workflow including GATK Indel Realigner, GATK Base Recalibrator, and Picard Mark Duplicates (version 1.140). ContEst was used to confirm that all three samples originated from the same individual.
[0352] Somatic variants were called on the basis of tumor and normal WES using an ensemble of seven different mutation calling algorithms: VarDict (version 1.4.6), Strelka (version 1.0.15), VarScan2 (version 2.3.9), Mutect2 (from the GATK version 3.5 bundle), Atlas Indel2 (version 1.4.3), Seurat (version 2.6), and Platypus (version 0.8.1). The three sequencing datasets were then realigned using HaplotypeCaller (from the GATK version 3.5 bundle), specifying the candidate mutations (union of the seven call sets) as known variants. Finally, the variants were filtered according to the following features: the level of read support in the tumor WES data, the presence of variant reads in the normal WES data, read orientation bias, adequacy of coverage in the normal WES sample, the presence of neighboring (+/-30nt) variants (somatic or germline), and read quality bias observed in mutation- supporting reads.
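The ensemble step, taking the union of the call sets and then filtering, can be sketched as follows. Only two of the listed ideas are shown: the union across callers and the check for neighboring variants within 30 nt. The variant tuple layout and the caller labels are assumptions for illustration, and the neighbor check here only considers the variants passed in, whereas the text also counts germline neighbors.

```python
from typing import Dict, Iterable, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def union_calls(call_sets: Dict[str, Set[Variant]]) -> Set[Variant]:
    """Union of candidate mutations across all callers."""
    merged: Set[Variant] = set()
    for calls in call_sets.values():
        merged |= calls
    return merged


def has_neighbor(variant: Variant, others: Iterable[Variant], window: int = 30) -> bool:
    """True if another variant lies within +/- `window` nt on the same chromosome."""
    chrom, pos = variant[0], variant[1]
    return any(
        other != variant and other[0] == chrom and abs(other[1] - pos) <= window
        for other in others
    )


# Hypothetical call sets from two callers.
call_sets = {
    "caller_1": {("chr1", 100, "A", "T"), ("chr1", 120, "G", "C")},
    "caller_2": {("chr1", 100, "A", "T"), ("chr2", 500, "C", "G")},
}
candidates = union_calls(call_sets)
flagged = {v for v in candidates if has_neighbor(v, candidates)}
print(len(candidates), sorted(flagged))
```

The remaining filters (read support, orientation bias, coverage, read quality bias) depend on per-read information and thresholds that the text does not specify, so they are left out of the sketch.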
[0353] RNA-Seq expression levels of all genes and transcripts were quantified in transcripts per million (TPM) using RSEM (version 1.2.31). The overall expression of each somatic variant was calculated as the product of the RSEM-derived transcript expression (summing across all overlapping protein-coding transcripts) and the fraction of RNA-Seq reads supporting the variant. Variants with zero supporting RNA reads were still considered as valid mutations (and counted toward tumor mutation burden) but were not considered for inclusion in vaccine. RNA-Seq was additionally processed using STAR-Fusion (version 2.5.1.b) to identify transcript fusions (requiring both junction support and spanning read pairs).
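The variant-level expression described above is a product of two quantities, which can be written out directly. The function and argument names are hypothetical; the inputs are assumed to be the RSEM TPM values of the overlapping protein-coding transcripts and the RNA-Seq read counts at the variant position.

```python
from typing import Iterable


def variant_expression(
    transcript_tpms: Iterable[float],
    variant_reads: int,
    total_reads: int,
) -> float:
    """Summed transcript TPM multiplied by the fraction of reads supporting the variant."""
    if total_reads == 0:
        return 0.0  # no RNA coverage at the site
    return sum(transcript_tpms) * (variant_reads / total_reads)


def is_candidate(variant_reads: int) -> bool:
    """Variants with zero supporting RNA reads remain valid mutations but are not candidates."""
    return variant_reads > 0


# Two overlapping transcripts at 10 and 30 TPM; 5 of 20 reads carry the variant.
print(variant_expression([10.0, 30.0], 5, 20))
```

For the example values, the summed expression is 40 TPM and one quarter of the reads support the variant, giving a variant expression of 10 TPM.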
[0354] Up to 150 somatic mutations were selected based on variant allele frequency of mutations and expression level of mutated genes. Expression levels of Cancer-Testis Antigens (CTAs) were determined from RNA-sequencing data, and CTAs with TPM > 1 were considered expressed and screened.
Tumor Infiltrating Lymphocyte Selection
[0355] Taking scGEX as input, T cells were first computationally classified into two populations, CD4+ T cell or CD8+ T cell, based on an unsupervised clustering method using a set of defined gene signatures. Then exhaustion score is assigned to each CD4+ and CD8 T cell using exhaustion gene signature for CD4 and CD8, respectively. The exhaustion gene signature for CD8 comprises 20 genes; the exhaustion gene signature for CD4 also comprises 20 genes. Clonotype (TCR) of a given T cell was matched to its scGEX via a single cell barcode. Within each clonotype, if the median exhaustion score is equal or larger than a predefined cutoff value, the clonotype is labeled as exhausted (EX) clone which is selectable for downstream screening and validation.
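The barcode join and median rule in this paragraph might look like the following sketch. The two input mappings are assumed shapes for the scTCR and scGEX outputs, and the cutoff is passed in explicitly because the text only calls it "predefined"; the value 13 used in the example is borrowed from Example 1 as an assumption.

```python
from collections import defaultdict
from statistics import median
from typing import Dict


def exhausted_clonotypes(
    clonotype_by_barcode: Dict[str, str],
    ex_score_by_barcode: Dict[str, float],
    cutoff: float,
) -> Dict[str, bool]:
    """Label a clonotype EX when its median exhaustion score is >= cutoff."""
    scores = defaultdict(list)
    for barcode, clonotype in clonotype_by_barcode.items():
        # Match each TCR to its scGEX profile through the shared cell barcode.
        if barcode in ex_score_by_barcode:
            scores[clonotype].append(ex_score_by_barcode[barcode])
    return {clonotype: median(vals) >= cutoff for clonotype, vals in scores.items()}


tcr = {"bc1": "A", "bc2": "A", "bc3": "A", "bc4": "B", "bc5": "B"}
gex = {"bc1": 15.0, "bc2": 13.0, "bc3": 2.0, "bc4": 5.0, "bc5": 14.0, "bc6": 20.0}
print(exhausted_clonotypes(tcr, gex, cutoff=13.0))
```

Cells with expression data but no captured TCR (such as `bc6` here) do not contribute to any clonotype.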
Mutanome Peptide Library Preparation
Preparing Peptide Sequence Lists
[0356] The prioritized list of mutations was analyzed and adjusted to generate peptide sequence lists for synthesis. For prioritized missense somatic mutations, sequences were designed with the mutated amino acid centrally placed and flanked by 13 wild-type residues to create 25-mer peptides. Any 25- mer sequence with a proline residue at or near the C-terminus was elongated. For frameshift mutations longer than 30 amino acids, the sequence was segmented into overlapping 30-mer peptides (25 amino acid overlap), avoiding sequences with C-terminal or near C-terminal proline residues.
[0357] The top 80 sequences from the Class I short peptides were sorted based on HLA binding preferences to ensure a variety of binding preferences would be present in the final pools. For donors positive for CTAs, 15-mer peptide libraries representing the entire length of CTA proteins were prepared using overlapping sequences (11 amino acid overlap), avoiding sequences with C-terminal or near C-terminal proline residues.
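The peptide design rules above amount to a centered window for missense mutations and a sliding window for frameshift and CTA sequences, sketched below. Several details are assumptions: the centered window uses 12 flanking residues per side, which is what a 25-mer with a central mutated residue implies; the final tile is anchored to the C-terminus when the step does not land there exactly; and the proline-avoidance and elongation rules are omitted because the text does not specify them precisely.

```python
from typing import List


def centered_peptide(mutant_protein: str, mut_index: int, flank: int = 12) -> str:
    """Window around the mutated residue (0-based index), clipped at the protein ends."""
    start = max(0, mut_index - flank)
    end = min(len(mutant_protein), mut_index + flank + 1)
    return mutant_protein[start:end]


def tile_peptides(sequence: str, length: int, overlap: int) -> List[str]:
    """Overlapping peptides of `length` residues sharing `overlap` residues."""
    if overlap >= length:
        raise ValueError("overlap must be smaller than length")
    if len(sequence) <= length:
        return [sequence]
    step = length - overlap
    starts = list(range(0, len(sequence) - length + 1, step))
    if starts[-1] + length < len(sequence):
        starts.append(len(sequence) - length)  # assumed: cover the C-terminus
    return [sequence[s:s + length] for s in starts]


protein = "ACDEFGHIKLMNPQRSTVWY" * 2            # hypothetical 40-residue sequence
missense = centered_peptide(protein, 20)         # 25-mer centered on index 20
frameshift = tile_peptides(protein, 30, 25)      # 30-mers, 25-residue overlap
cta = tile_peptides(protein, 15, 11)             # 15-mers, 11-residue overlap
print(len(missense), len(frameshift), len(cta))
```

The same `tile_peptides` function covers both the 30-mer frameshift tiling and the 15-mer CTA libraries; only the length and overlap differ.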
Synthesizing the Peptide Libraries
[0358] Peptides were synthesized using standard Fmoc-based solid-phase peptide synthesis (SPPS) with a capping strategy. For 15-mer and 25-30-mer peptides, Rink amide resin was used, while Wang or HMPB resins preloaded with the C-terminal amino acid were used for short peptides. Synthesis was performed on a library peptide synthesizer in tip mode or a library synthesizer in plate mode at a 2.5 μmol scale, using 10 equivalents of Fmoc-protected amino acids, 10 equivalents of HCTU, and 100 equivalents of DIPEA per coupling. Cycles 1-10 each involved two 30-minute couplings, and cycles 11-30 each involved three 30-minute couplings. All cycles were followed by a 30-minute capping step with acetic anhydride. Deprotection was conducted using 20-40% piperidine. Upon completion, resins were washed with dichloromethane and dried under vacuum.
Cleaving the Peptide Libraries
[0359] Following drying, a cleavage cocktail (90% trifluoroacetic acid, 5% thioanisole, 3% ethanedithiol, and 2% anisole) was added to each well and incubated at room temperature for 4 hours. The solution was filtered from the resin and collected in deep-well plates, then combined with cold diethyl ether. After centrifugation, the diethyl ether was decanted and peptides were dissolved/suspended in water, frozen in liquid nitrogen, and lyophilized until dry.
Characterizing the Peptide Libraries
[0360] Peptides were dissolved to 0.1 mg/mL in 10% DMSO in water with 1 mM TCEP. UPLC-MS analysis was performed on a Waters Acquity H-Class UPLC equipped with an SQD2 mass spectrometer and an Acquity UPLC Peptide BEH C18 column. The LC method involved a gradient of solvent A (0.1% formic acid in water) and solvent B (0.1% formic acid in acetonitrile) at a 0.8 mL/min flow rate. Gradient information is shown in Table 10.
[0361] Table 10. Chromatography Gradient Used
[0362] Mass spectra were analyzed to confirm the desired sequence. Any peptide where the desired sequence was not the most prominent species as determined from absorbance integration of the chromatogram, underwent resynthesis as described below. Peptides that passed this QC were carried on to screening without further purification.
Resynthesizing Failed Sequences
[0363] Failed peptides were resynthesized using a microwave synthesizer at 50 μmol scale with default methods and acetic anhydride capping. Peptides were cleaved using a cleaving oven; the cleaved peptides were separated, washed, centrifuged, dissolved in water, and then lyophilized. In detail, the completed peptide resins were washed 3 times with dichloromethane and dried under vacuum. The peptides were cleaved from the resin on a cleaving oven using 5 mL of cleavage cocktail (90% trifluoroacetic acid, 5% thioanisole, 3% ethanedithiol, and 2% anisole) at 40°C for 1 hour. The peptide solution was then separated from the resin, 40 mL of cold diethyl ether was added, and the resulting suspension was centrifuged at 4000 rpm for 5 minutes at 4°C. The diethyl ether was decanted and the pellet was triturated with another 40 mL of diethyl ether, centrifuged, and decanted. The pellet was dissolved in 5 mL of water, frozen in liquid nitrogen, and lyophilized. Once dry, a sample of the material was dissolved to 0.1 mg/mL in 10% DMSO in water with 1 mM TCEP and analyzed by LC-MS as previously described. Peptides passing the previously described QC were carried on to screening without further purification.
Pooling Peptides for pTCR Co-Culture Screening
[0364] 25-30mer “long” mutanome peptides were dissolved in 40 μL of 12.5 mM TCEP in DMSO and then diluted with 160 μL of serum-free RPMI media. 100 μL of each of 10 long peptides were combined to make a pool of 10 in which the concentration of each peptide was approximately 0.5 mg/mL. Unpooled material was retained for peptide deconvolution assays.
[0365] Short Class I mutanome peptides were dissolved in 30 μL of 12.5 mM TCEP in DMSO and then diluted with 270 μL of serum-free RPMI media. 100 μL of each of 10 short peptides were combined to make a pool of 10 in which the concentration of each peptide was approximately 0.1 mg/mL. Unpooled material was retained for peptide deconvolution assays.
[0366] CTA 15mer peptide libraries were dissolved in 25 μL of 1 mM TCEP in DMSO and diluted in 100 μL of water. 75 μL of each of 10 15mer peptides were combined to make pools of 10. 500 μL of each pool of 10 were combined in a 15 mL conical vial, frozen in liquid nitrogen, and lyophilized. Once dry, the vial containing approximately 200 μg of each 15mer peptide was reconstituted in 200 μL DMSO to achieve a concentration of 1 mg/mL/peptide. Unpooled material was retained for peptide deconvolution assays.
TCR and HLA RNA Synthesis
TCR alpha and beta variable sequence constructs
[0367] To enable high-throughput screening of TIL-derived TCR sequences against patient mutanomes, paired TCR alpha and beta variable chain regions were ordered as gene fragments from a vendor. The following upstream and downstream TCR constant overlaps were added to the 5’ and 3’ ends of each variable region, respectively. Beta upstream (5’) overlap:
Beta downstream (3’) overlap:
Alpha upstream (5’) overlap:
Alpha downstream (3’) overlap:
[0368] The codon that is split between the variable and constant TCR region directly 3’ of the variable region was handled differently for alpha and beta sequences. Since the split beta codon is always a “GAG,” this codon is built into the beta downstream (3’) overlap above as the first three nucleotides (shown in bold). The split alpha codon varies between TCR sequences, however. For some samples, the split codon was omitted from the TCR sequence. For the remaining patients, the sequence “AAT” was included as the split codon sequence since the identity of the split amino acid is asparagine in the majority of TCRs. For these samples, the “AAT” codon was included in the alpha gene fragment sequence directly before the alpha downstream (3’) overlap sequence above.
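The split-codon handling can be illustrated with a short sketch. The overlap strings below are hypothetical placeholders, since the actual overlap sequences are not reproduced here, and the function name `build_fragment` is likewise an assumption made for illustration.

```python
# Placeholder overlaps (hypothetical): the real overlap sequences are not
# reproduced in this sketch.
BETA_UP, BETA_DOWN = "<beta-5p-overlap>", "<beta-3p-overlap>"
ALPHA_UP, ALPHA_DOWN = "<alpha-5p-overlap>", "<alpha-3p-overlap>"


def build_fragment(chain, variable, include_split_codon=True):
    """Assemble a gene fragment: 5' overlap + variable region + 3' overlap."""
    if chain == "beta":
        # The split "GAG" codon is already the first three nucleotides of
        # the beta 3' overlap, so nothing is inserted here.
        return BETA_UP + variable + BETA_DOWN
    if chain == "alpha":
        # "AAT" (asparagine) is placed directly before the alpha 3' overlap,
        # or omitted, as was done for some samples.
        split = "AAT" if include_split_codon else ""
        return ALPHA_UP + variable + split + ALPHA_DOWN
    raise ValueError("chain must be 'alpha' or 'beta'")


print(build_fragment("alpha", "ACGT"))
print(build_fragment("beta", "ACGT"))
```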
Overlap-extension PCR to generate full-length TCR constructs
[0369] To construct the full-length TCR alpha and beta template sequences, which include primer sequences, the T7 promoter sequence, the Kozak sequence, a leader peptide sequence, and the TCR variable and constant regions, an overlap-extension polymerase chain reaction (PCR) was performed with each alpha and beta TCR gene fragment and the following upstream (5’) and downstream (3’) sequences using a DNA Polymerase kit. The following sequences were used: Beta upstream (5’) sequence:
Beta downstream (3’) sequence:
Alpha upstream (5’) sequence:
Alpha downstream (3’) sequence:
[0370] Briefly, the alpha and beta chain gene fragment sequences (10 ng) were mixed with the corresponding upstream (5’) and downstream (3’) sequences in a ~1 : 0.25 : 0.1 (gene fragment : upstream sequence : downstream sequence) molar ratio along with buffer (final concentration: 1x), dNTPs (final concentration: 0.5 mM), and DNA Polymerase (final amount: 0.4 units) and brought to 20 μL with nuclease-free water. The reactions were mixed. After an initial denaturation step at 98°C for 2 min, 20 rounds of PCR were performed (98°C for 10 s, 71°C for 30 s, and 72°C for 1 min 30 s) followed by a 10 min hold at 72°C. Overlap-extension PCR product was either immediately used for amplification PCR or stored at -20°C overnight without purification.
Amplification PCR to generate TCR IVT template
[0371] Following overlap-extension PCR, the full-length TCR alpha and beta PCR products were PCR amplified with the following primer sequences (same forward and reverse primers for alpha and beta sequences) using a DNA Polymerase kit. Forward primer:
Reverse primer:
[0372] The poly-A tail template sequence was included in the reverse amplification PCR primer and was thus incorporated at the amplification PCR step.
[0373] Briefly, the alpha and beta overlap-extension PCR products (4 μL) were mixed with the forward and reverse primers (final concentration: 100 nM each), buffer (final concentration: 1x), dNTPs (final concentration: 0.2 mM), and DNA Polymerase (final amount: 1 unit) and brought to 50 μL with nuclease-free water. After an initial denaturation step at 98°C for 30 s, 35 rounds of PCR were performed (98°C for 30 s and 72°C for 2 min) followed by a 10 min hold at 72°C. Unpurified amplification PCR product was either immediately used for in vitro transcription (IVT) or stored at -20°C overnight prior to IVT.
In vitro transcription (IVT) of full-length TCR RNA
[0374] IVT was performed separately for TCR alpha and beta chains to generate RNA transcripts that are capped co-transcriptionally with a capping kit. Briefly, unpurified alpha and beta amplification PCR products (6 μL) were mixed with NTPs (final concentrations: 6 mM ATP and 5 mM each of UTP, CTP, and GTP), capping reagent (final concentration: 4 mM), reaction buffer (final concentration: 1X), and T7 RNA Polymerase Mix (amount: 2 μL) for a final reaction volume of 20 μL. The reactions were incubated in a thermocycler at 37°C for 2 h. After 2 h, 2 μL of RNase-free DNase I was added to each well and the reaction was incubated at 37°C for another 15 min. IVT products were either immediately purified or stored overnight at -20°C prior to purification.
[0375] Selection beads were used to purify the IVT product, with all steps performed at room temperature. The beads were thoroughly mixed and added to the IVT product at a 1 : 1 ratio. Samples were mixed and rested for 5 min. Then, the IVT product plate with beads was briefly spun and placed on a magnet for 5 min. Next, the supernatant was carefully aspirated and discarded and the plate was left to rest for 30 s. Then, 180 μL of 80% ethanol solution was added to each well and the plate was left on the magnet for another 30 s. The supernatant was carefully removed and discarded. A second ethanol wash was added and removed in the same manner and the plate was left for 30 s to allow excess ethanol to evaporate. The plate was then removed from the magnet and 32 μL of buffer was added. Samples were mixed thoroughly and rested for 5 min to elute the IVT product. After 5 min, the plate was placed back on the magnet, rested for 5 min, and the purified RNA-containing supernatant was transferred to a clean PCR plate. Representative purified RNA products were tested via electrophoresis to confirm correct sizing. TCR alpha and beta RNA chains were each brought to a final concentration of 1 μg/μL and paired TCR alpha and beta RNA chains were combined in a 1:1 ratio (1 μg/μL combined concentration; 0.5 μg/μL each chain) and stored at -80°C until further use in co-culture assays.
Confirmation of TCR expression in Jurkat NFAT luciferase cells
[0376] To confirm productive TCR expression in cells, TCR RNA stocks containing both the alpha and beta RNA chain were electroporated into Jurkat NFAT luciferase cells (derived from product J1601; engineered to knock out TRAC, TRBC, and β2M and overexpress CD8) using an electroporation system with 4 mm gap 96-well plates. Briefly, cells were washed twice with serum-free media and resuspended at 5x10^6 cells/mL in media. Cells were electroporated in 200 μL at a ratio of 4 μg mixed-chain TCR RNA to 1x10^6 cells at 280 V for 10 ms (1 pulse). Cells were diluted in RPMI + 10% FBS + 200 μg/mL hygromycin B and left to recover overnight. The following day, cells were stained with LIVE/DEAD Fixable Blue dye and PE anti-mouse TCR β chain antibody and run on a flow cytometer to assess TCR expression. Generally, an average of >80% expression across all TCRs for a patient was considered acceptable for screening.
PCR to generate HLA IVT template
[0377] Plasmids containing class I (alpha chain) and class II (alpha and beta chain) patient HLA sequences were obtained. To generate sufficient HLA DNA template for IVT, the HLA plasmids were PCR amplified with the following primer sequences using a DNA Polymerase kit from NEB. Forward primer: TGGGCGCGTTATTTATCGGAGTTGCAGTTG; Reverse primer: TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAGCGACGCGG
[0378] The poly-A tail template sequence was included in the reverse PCR primer and was thus incorporated in this step.
[0379] Briefly, HLA plasmids (2 μL of 50 ng/μL) were mixed with the forward and reverse primers (final concentration: 1.2 μM each), reaction buffer (final concentration: 1x), dNTPs (final concentration: 0.3 mM), and DNA Polymerase (final amount: 1 unit) and brought to 50 μL with nuclease-free water. The reactions were mixed. After an initial denaturation step at 98°C for 50 s, 28 rounds of PCR were performed (98°C for 20 s, 69.8°C for 60 s, and 72°C for 35 s) followed by a 3 min hold at 72°C. HLA PCR product was then purified with selection beads in a process similar to TCR RNA purification with the following exception: 27 μL of beads were added to the 50 μL of unpurified PCR product. Purified HLA DNA was either immediately used for in vitro transcription (IVT) or stored at -20°C overnight prior to IVT.
In vitro transcription (IVT) of HLA RNA
[0380] IVT was performed with purified HLA DNA to generate RNA transcripts that are capped co-transcriptionally with anti-reverse cap analog (ARCA) using a capping kit. Briefly, purified HLA DNA (6 μL) was mixed with 2X NTP/ARCA (final concentration: 1X), reaction buffer (final concentration: 1X), and manufacturer enzyme mix (final concentration: 1X) for a final reaction volume of 20 μL. The reactions were gently mixed and then incubated in a thermocycler at 37°C for 1 h. After 1 h, 1 μL of T7 GTP (final concentration: ~1.5 mM) was added to each well and the reaction was incubated at 37°C for another 1 h. Two hours after transcription initiation, 1 μL of DNase (final amount: 2 U) was added to each well and the reaction was incubated at 37°C for another 15 min. IVT products were either immediately purified or stored overnight at -20°C prior to purification. HLA RNA purification was performed with selection beads using the same protocol as TCR RNA purification. Representative purified RNA products were tested via electrophoresis to confirm correct sizing. HLA RNA was brought to a final concentration of 1 μg/μL and stored at -80°C until further use in co-culture assays.
Confirmation of HLA expression in K562 cells
[0381] To confirm HLA expression in cells, HLA RNA stocks were electroporated into K562 cells using an electroporation system with 4 mm gap 96-well plates. Briefly, cells were washed twice with serum-free media and resuspended at 5x10^6 cells/mL in media. Cells were electroporated in 200 μL at a ratio of 2 μg each HLA RNA chain (alpha chain for class I and alpha + beta chain for class II) to 1x10^6 cells at 200 V for 8 ms (3 pulses with 400 ms intervals). Cells were diluted in RPMI + 10% FBS (R10) and left to recover overnight. The following day, cells were stained with LIVE/DEAD Fixable Blue dye and either PE anti-human HLA-A,B,C antibody, AF647 anti-human HLA-DR,DP,DQ antibody, or PE anti-human HLA-DQ antibody and run on a flow cytometer to confirm HLA expression.
TCR Screening
High-throughput screening of patient TCRs against patient mutanome
[0382] Once TCR expression was confirmed in Jurkat NFAT cells, each donor’s TCR sequences were screened for reactivity to peptides representing the donor’s tumor mutanome. On the day of the screening assay, mixed-chain TCR RNA was electroporated into Jurkat NFAT cells as described above. Post-electroporation, Jurkat cells expressing individual TCRs were diluted to a concentration of 1.75x10^6 live cells/mL in R10 media in 8-channel reservoirs, assuming 30% electroporation-induced cell death. Reservoirs were sealed with semi-permeable membranes and rested at 37°C until assay setup.
[0383] For donors where EBV-B cells were not generated, K562 cells were electroporated with donor HLA RNA and combined into pools for screening. On the day of the screening assay, HLA RNA was electroporated into K562 cells as described above. Post-electroporation, K562 cells expressing individual class I or II patient HLA were diluted to a concentration of 1.75x10^6 live cells/mL in R10 media, assuming 30% electroporation-induced cell death. Subsequently, HLA-expressing K562 cells were combined into class I or II pools at equal volumes, with ~3 different class I or II HLA constituting each pool. Pools were pipetted into deep-well, single-well reservoirs, sealed with semi-permeable membranes, and rested at 37°C until assay setup. For patients where EBV-B cells were generated, EBV-B cells were brought to a concentration of 1.75x10^6 cells/mL in a deep-well, single-well reservoir and rested at 37°C until assay setup.
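The post-electroporation dilution is simple arithmetic and can be sketched as below. This is an illustration under stated assumptions: the function name `resuspension_volume_ml` is hypothetical, and the input of 1x10^6 cells corresponds to one 200 μL electroporation at 5x10^6 cells/mL as described for the expression checks.

```python
def resuspension_volume_ml(cells_electroporated,
                           target_live_per_ml=1.75e6,
                           assumed_death_fraction=0.30):
    """Volume of R10 needed to reach the target live-cell concentration."""
    # Expected survivors after the assumed electroporation-induced death.
    expected_live = cells_electroporated * (1.0 - assumed_death_fraction)
    return expected_live / target_live_per_ml


# One electroporation well: 200 uL at 5x10^6 cells/mL = 1x10^6 cells.
cells = 0.200 * 5e6
print(round(resuspension_volume_ml(cells), 3))
```

Under these assumptions, one million electroporated cells would be brought to roughly 0.4 mL.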
[0384] 10 μL/well of antigen-presenting cells (either patient EBV-B cells or HLA-expressing K562 cells) were dispensed into 384-well plates. Briefly, for donors where EBV-B cells were used, the EBV-B cells were dispensed into all 4 quadrants of the 384-well plates. For patients where HLA-expressing K562 cells were used, K562-HLA pools I and II were dispensed into quadrants 1/2 and 3/4 of the 384-well plates, respectively. Media without APC cells was pipetted into the columns where PHA-L would be added to assess TCR expression. Next, 10 μL/well TCR-expressing Jurkat NFAT luciferase cells were dispensed into each well. For screens where EBV-B cells were used, TCR-expressing Jurkats were dispensed into only two quadrants, whereas for screens where K562-HLA cells were used, TCR-expressing Jurkats were dispensed into all 4 quadrants of the 384-well assay plates to provide for duplicate measurements for each condition.
[0385] Long and short peptides covering the donor mutanome were resuspended and pooled as described previously. After pooling, 3x preparations of each long and short peptide pool were made by diluting stock pools 1:10 with R10 in 12-channel reservoirs immediately before plating. Highly expressed cancer testis antigens (CTA) for each patient were ordered as mixes or produced and diluted to a 3x concentration of 15 μg/mL in R10 in 12-channel reservoirs immediately before plating. PHA-L was prepared at a 3x concentration in the 12-channel reservoir as well to provide a control for TCR expression in Jurkat NFAT luciferase cells. Media was included in one column of each 12-channel reservoir to serve as a “no peptide” control. The peptide pools, CTA pepmixes, and PHA-L and media controls were then added to the 384-well assay plates. Briefly, 10 μL/well of peptide or control samples were dispensed into all 4 quadrants of the 384-well assay plates. Note that in cases where EBV-B cells served as the APC, tips were changed between quadrants 1/2 and 3/4 to avoid contamination between TCRs. Plates were covered and incubated overnight (~16 h) at 37°C and 5% CO2.
[0386] The following day, plates were removed from the incubator and equilibrated at room temperature for ~10 min. Luciferase substrate was prepared according to the manufacturer’s instructions and added at a 1:1 ratio to each well. Luminescence was measured after ~5 min using a plate reader. Raw data from each well was normalized to the corresponding average “no peptide” signal for each TCR, leading to a fold-change signal for each condition. Fold-change NFAT luciferase signal was plotted in heat map format and reactive conditions were identified as those having greater than ~2-fold signal.
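The normalization and hit-calling step can be sketched as follows. This is a minimal illustration, assuming replicate reads are averaged per condition before normalization and that the approximate 2-fold cutoff is applied as a strict threshold of 2.0; the function names, dictionary layout, and luminescence values are hypothetical.

```python
from statistics import mean


def fold_changes(raw, no_peptide_key="no peptide"):
    """Normalize each condition to the mean 'no peptide' signal of one TCR.

    raw: dict mapping condition name -> list of replicate luminescence reads.
    """
    baseline = mean(raw[no_peptide_key])
    return {cond: mean(vals) / baseline
            for cond, vals in raw.items() if cond != no_peptide_key}


def reactive_conditions(raw, threshold=2.0):
    """Conditions whose fold-change exceeds the (approximate) 2-fold cutoff."""
    return sorted(c for c, fc in fold_changes(raw).items() if fc > threshold)


# Illustrative reads for a single TCR (not data from the study).
tcr_reads = {
    "no peptide": [1000, 1200],
    "pool 1": [1100, 1150],
    "pool 2": [5200, 4800],
}
print(reactive_conditions(tcr_reads))  # ['pool 2']
```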
Reactive TCR peptide deconvolution
[0387] After initial validation of TCR reactivities to peptide pools, peptide deconvolution was performed to identify the neoantigen responsible for TCR activation. Briefly, Jurkats expressing TCRs where reactivity was observed were cocultured with either EBV-B cells or the reactive K562-HLA cell pool and each peptide within the reactive peptide pool individually, with the reactive peptide pool serving as the positive control. Jurkat and APC concentrations were kept the same in the peptide deconvolution (~5.8x10^6 cells/mL each) and individual long and short crude peptides were tested at final concentrations of 16.6 μg/mL and 3.3 μg/mL, respectively (same as the individual concentrations within the peptide pools used in the screen). Plates were covered and incubated overnight (~16 h) at 37°C and 5% CO2. The next day, plates were removed from the incubator and luciferase activity was assayed and calculated in the same manner as for the initial screen. Specific reactivities were determined by identifying the neoantigen with the highest fold-change peptide signal.
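The stated final peptide concentrations follow from the pooling and plating volumes, as the sketch below shows. It assumes a 30 μL well (10 μL each of APCs, Jurkat cells, and 3x peptide) and a 1:10 dilution of the pooled stock; the function name `final_conc_ug_per_ml` is hypothetical.

```python
def final_conc_ug_per_ml(pool_conc_mg_per_ml,
                         stock_dilution=10,
                         added_ul=10.0,
                         well_volume_ul=30.0):
    """Per-peptide concentration in the assay well, in ug/mL."""
    # 1:10 dilution of the pooled stock gives the 3x working preparation.
    three_x = pool_conc_mg_per_ml * 1000.0 / stock_dilution
    # 10 uL of the 3x preparation in a 30 uL well gives the 1x concentration.
    return three_x * added_ul / well_volume_ul


print(round(final_conc_ug_per_ml(0.5), 1))  # long peptides (0.5 mg/mL in pool)
print(round(final_conc_ug_per_ml(0.1), 1))  # short peptides (0.1 mg/mL in pool)
```

Under these assumptions the long and short peptides come out at about 16.7 and 3.3 per mL, which agrees with the stated values of 16.6 and 3.3 up to rounding.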
Reactive TCR HLA Deconvolution
[0388] To determine the HLA responsible for activation of each TCR hit, reactive TCRs and individual matched-class patient HLAs (either all HLA if EBV-B cells were used for screening or the HLA from the reactive K562-HLA pool) were electroporated into Jurkat NFAT luciferase and K562 cells, respectively, as described previously. After electroporation, 10 μL of both Jurkat and K562 cells were added to wells of a 384-well plate such that there were quadruplicate wells for each TCR/HLA combination (each well containing 1.75x10^4 cells/well of both APCs and Jurkats). Then, 10 μL of the reactive crude peptide for each TCR was added to two wells of each TCR/HLA combination at a final concentration of 16.6 or 3.3 μg/mL for long and short peptides, respectively. To the other two wells of each TCR/HLA combination, 10 μL of R10 without peptide was added as the “no peptide” control. Plates were covered and incubated overnight (~16 h) at 37°C and 5% CO2. The next day, plates were removed from the incubator and luciferase activity was assayed and calculated in the same manner as for the initial screen, with reactivity for each HLA normalized to the average of the corresponding “no peptide” control wells. HLA restrictions were determined by identifying the HLA with the highest fold-change peptide signal. TCRs for which peptide and HLA were successfully deconvoluted were considered to be validated TCRs.
TCR Functional Avidity Assay
Optimization of functional avidity assay for measuring TCR EC50
[0389] A head-to-head comparison of three functional avidity (FA) readouts was run in Jurkat and/or primary CD8 T cell systems to identify which assay produced the highest sensitivity in a high-throughput format. In Jurkat cells, NFAT activation in the form of luminescence and percentage of CD69 expressing cells were compared to primary CD8 T cell interferon gamma (IFN-γ) secretion measured via ELISA and percentage of CD69 expressing cells.
Measurement of validated CD8 TCR EC50 using Jurkat CD69 functional avidity assay
[0390] The potency of each validated CD8 TCR was assessed by evaluating its functional avidity (FA) across a dose titration of antigen presented by the HLA restriction element. The FA of each TCR was assessed by co-culturing Jurkat or primary T cells transiently expressing validated TCRs with T2 cells transiently expressing donor-matched HLA in the presence of a dose titration of purified cognate antigen. One day before the assay setup, T2 cells were electroporated to express the TCR’s matched HLA restriction element. T2 cells were electroporated as previously described for K562 and diluted into R10 media to a concentration of 2.5x10^6 cells/mL. Cells were transferred to an appropriate flask size and rested at 37°C, 5% CO2 overnight.
[0391] The following day, Jurkat cells were electroporated, as described above, to express an individual validated CD8 TCR. After electroporation, Jurkat cells were diluted in R10 media in an 8-channel reservoir to a concentration of 2.5x10^6 cells/mL and rested at 37°C, 5% CO2 for 3 hours. After one hour, T2 cells were counted and, if needed, diluted in R10 media to a concentration of 2.5x10^6 cells/mL and transferred to an 8-channel reservoir (each T2-HLA was transferred to the row number that corresponds to the matched TCR location). 10 μL of T2 cells were dispensed into all four quadrants of a 384-well plate. The plate was transferred to an incubator at 37°C, 5% CO2 while peptides were being prepared. Each stock of peptide was dissolved in dimethyl sulfoxide (DMSO) + 0.1 M tris(2-carboxyethyl)phosphine (TCEP) to a concentration of 10 mM. Both mutant and wild-type versions of TCR-specific purified peptides were diluted to a concentration of 20 μM in R10 media in the first column of a 96-well U-bottom plate and titrated 1:10 for 11 points, with the 12th point containing only media. Next, 20 μL/well of peptide was added to the plate containing T2 cells. The plate was placed at 37°C, 5% CO2 for two hours. After the peptide pulse, 10 μL/well of the prepared Jurkat cells were added to the plate, resulting in a final top concentration of peptide of 10 μM. Plates were incubated for approximately 16 hours overnight at 37°C, 5% CO2.
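The titration layout and the final in-well concentration can be checked with a short calculation. This is an illustrative sketch; the function names are hypothetical, and the well volumes (10 μL T2 cells, 20 μL peptide, 10 μL Jurkat cells) are taken from the paragraph above.

```python
def titration_series(top_um=20.0, fold=10.0, points=11):
    """Concentrations (uM) in the dilution plate; the last well is media only."""
    series = [top_um / fold ** i for i in range(points)]
    return series + [0.0]


def final_in_well(conc_um, peptide_ul=20.0, t2_ul=10.0, jurkat_ul=10.0):
    """Concentration after adding peptide to T2 cells and then Jurkat cells."""
    return conc_um * peptide_ul / (peptide_ul + t2_ul + jurkat_ul)


plate = titration_series()
print(len(plate), final_in_well(plate[0]))  # 12 10.0
```

The 20 μM top well is halved by the final 40 μL assay volume, which gives the 10 μM top concentration.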
[0392] The following day, cells were prepared for flow cytometry analysis. Plates were removed, 40 μL of PBS was added to the wells, and the plates were centrifuged at 400 x g for 4 min. The supernatants were flicked out of the plate and an additional wash was performed by repeating this step. Live/Dead stain was prepared by diluting 1:1000 in PBS. After the second wash, 20 μL of prepared Live/Dead stain was added to the wells and incubated in the dark for 15 min at room temperature. During the incubation, the surface staining antibody cocktail was prepared by diluting the following antibodies in staining buffer: AF488 anti-human CD3 (clone UCHT1), PE anti-mouse TCR β chain (clone H57-597), BV785 anti-human CD8 (clone SK1), and APC anti-human CD69 (clone FN50). After the live/dead incubation, the plate was washed once as previously described and 20 μL/well of the antibody cocktail was dispensed into the plate. Cells were incubated in the dark for 40 min on ice. Cells were then washed twice with stain buffer and 10 μL/well of 1% PFA diluted in PBS was added and incubated at room temperature in the dark for 10 min. Following the incubation, cells were washed twice with stain buffer and resuspended in a final volume of 20 μL/well. Cells were analyzed by flow cytometry in high-throughput mode, collecting 15 μL of sample. Data analysis was performed by gating on the population of interest (SSC-A vs FSC-A), single cells (SSC-W vs SSC-H and FSC-W vs FSC-H), live cells (SSC-A vs L/D), and then either CD3+, mTCR β+ (AF488 vs PE) or CD8+ (SSC-A vs BV785), followed by CD69+ (SSC-A vs APC). EC50 values were calculated by creating a 3-parameter dose-response curve representing the percentage of cells positively expressing CD69 across the titrated peptide concentrations. In some cases where titration curves did not saturate at the upper baseline, functional avidity was repeated with a top peptide concentration of 100 μM.
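One way to obtain an EC50 from such a titration is sketched below. It assumes the common three-parameter form with the Hill slope fixed at 1 (bottom, top, EC50); the fitting software and exact model used are not specified here, so the grid-search approach, the function name `fit_three_parameter`, and the synthetic data are all illustrative assumptions.

```python
import math


def fit_three_parameter(conc, response, grid_points=2001):
    """Fit response = bottom + (top - bottom) * c / (c + EC50).

    For each candidate EC50 on a log-spaced grid the model is linear in
    bottom and (top - bottom), so those two are solved by ordinary least
    squares and the EC50 with the smallest residual is kept.
    """
    positive = [c for c in conc if c > 0]
    lo, hi = math.log10(min(positive)), math.log10(max(positive))
    n = len(conc)
    mean_y = sum(response) / n
    best = None
    for k in range(grid_points):
        ec50 = 10 ** (lo + (hi - lo) * k / (grid_points - 1))
        f = [c / (c + ec50) for c in conc]
        mean_f = sum(f) / n
        sff = sum((fi - mean_f) ** 2 for fi in f)
        sfy = sum((fi - mean_f) * (yi - mean_y) for fi, yi in zip(f, response))
        span = sfy / sff                      # least-squares (top - bottom)
        bottom = mean_y - span * mean_f       # least-squares bottom
        sse = sum((bottom + span * fi - yi) ** 2 for fi, yi in zip(f, response))
        if best is None or sse < best[0]:
            best = (sse, bottom, bottom + span, ec50)
    _, bottom, top, ec50 = best
    return bottom, top, ec50


# Synthetic example (not study data): % CD69+ cells across a 1:10 titration.
doses = [10.0 / 10 ** i for i in range(11)] + [0.0]
cd69 = [5.0 + 75.0 * d / (d + 0.01) for d in doses]
bottom, top, ec50 = fit_three_parameter(doses, cd69)
print(round(bottom, 2), round(top, 2), round(ec50, 4))
```

The grid search trades speed for transparency; a nonlinear least-squares routine would normally be used in practice.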
Example 3. Analysis of Selected TCRs
[0393] To demonstrate the effectiveness of the selection process used to identify TCRs, tumor samples were obtained from human donors and processed as described in Example 2. The screened TCR clones were then categorized based on exhaustion score and assessed for antigen specificity (FIG. 8). The exhaustion gene signature scores were calculated as described in Example 1. As shown in FIGs. 9A-9B, clones which demonstrated scores above the indicated cutoff were selected. The results of the tumor antigen reactivity screen are shown using squares (reactive) or asterisks (non-reactive). The antigen reactivity analysis of the selected clones demonstrated that the clones were specific for diverse CD8 and CD4 tumor antigens (FIGs. 10A-10D).
[0394] Then, the top 10 clones were prioritized as shown in FIG. 11. Briefly, clones which demonstrated higher clonality, a higher exhaustion score, and positivity for proliferation gene signatures were prioritized. Clones which had dual alpha or beta chains or demonstrated a Treg phenotype were deprioritized. Following validation via NFAT activation as described in Example 2, positive predictive value (PPV) results are shown in FIGs. 12A-12C and summarized in Table 7. The PPV calculation is based on the top 10 clones ranked by prioritization criteria, which may consist of five CD8 and five CD4 clones, depending on availability. If fewer than five CD8 clones were available, additional CD4 clones were included to reach up to 10 clones. First, the PPV was calculated separately for each sample's CD8 and CD4 clones, followed by a combined PPV for the top 10 clones of each sample. The number of unique mutations (e.g., antigens) targeted, along with their associated HLA alleles, reflects the combined results from both CD8 and CD4 clones.
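The top-10 selection and PPV bookkeeping can be sketched as follows. This is a hedged illustration: it assumes PPV is the fraction of tested clones that validated as reactive, and that the input lists are already ranked by the prioritization criteria; the function names and clone labels are hypothetical.

```python
def select_top_clones(cd8_ranked, cd4_ranked, total=10, per_class=5):
    """Pick up to five CD8 clones, then fill with CD4 clones up to ten.

    Both inputs are assumed to be already ranked (best first).
    """
    cd8 = cd8_ranked[:per_class]
    # With five CD8 clones this takes five CD4 clones; with fewer CD8
    # clones, additional CD4 clones are taken to reach up to ten in total.
    cd4 = cd4_ranked[:total - len(cd8)]
    return cd8, cd4


def ppv(validated_flags):
    """Positive predictive value: validated clones / clones tested."""
    if not validated_flags:
        return None
    return sum(validated_flags) / len(validated_flags)


# Illustrative sample with only three CD8 clones available.
cd8, cd4 = select_top_clones(["c8a", "c8b", "c8c"],
                             ["c4%d" % i for i in range(9)])
validated = {"c8a": True, "c8b": False, "c8c": True,
             "c40": True, "c41": True, "c42": False, "c43": False,
             "c44": True, "c45": False, "c46": True}
print(ppv([validated[c] for c in cd8]),
      ppv([validated[c] for c in cd4]),
      ppv([validated[c] for c in cd8 + cd4]))
```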
[0395] Table 7. PPV Values
[0396] Table 8A provides detailed validation statistics on selectable clones and top CD8 clones. Table 8B provides detailed validation statistics on selectable clones and top CD4 clones. The number of mutations denotes the number of antigen mutants recognized among the selectable or top clones. The number of HLA denotes the number of HLA alleles recognized among the selectable or top clones. The scr-TCR number denotes the number of selectable or top 5 clones tested. These data demonstrate the diversity of targets when the top 5 CD8 and top 5 CD4 clones for therapy were ranked out of all screened “selectable” TCRs.
[0397] Table 8A. Selectable and Top CD8 Clones
[0398] Table 8B. Selectable and Top CD4 Clones
[0399] Antigen capture of the TCRs was also assessed. This was performed via single cell sequencing using a library of barcoded antigens and selecting for antigen-specific TCRs for further analysis. As shown in FIGs. 13A-13C, top 5 CD8 TCR ranking captured 2 antigens and the top 5 CD8 TCR ranking captured 2 antigens. Taken together, these data demonstrate that the process described herein can effectively identify clinically relevant TCR clones that can respond effectively to tumor antigens.
Example 4. Use of Immune Checkpoint Inhibitor-treated Samples
[0400] The processes described in Examples 1-3 were also applied to tumor samples obtained from donors who had undergone pre-treatment with immune checkpoint inhibitors (ICI). ICI-based immune therapy has been approved as frontline therapy and/or is widely used as neoadjuvant therapy for multiple cancer types. ICI treatment can alter the immune landscape in the tumor microenvironment. ICI can promote T cell function by re-invigorating exhausted T cells, specifically progenitor exhausted T cells. Thus, the exhaustion phenotype and scores of T cells can be affected by prior ICI treatment. To adapt the TCR prediction and selection algorithm to a tumor which is known to have received an ICI as a neoadjuvant therapy or to have recently been treated with an ICI regimen, the process was adjusted by using the upper quantile value of the exhaustion score and/or GSEA score instead of the median value. If the upper quantile value of the exhaustion score and/or GSEA score of a TCR clonotype was equal to or greater than the cutoff value, then this TCR clonotype was defined as an exhausted clone. The fixed cutoff values (13 for CD4+ exhaustion score; 13 for CD8+ exhaustion score; 0.2 for CD4 GSEA score; 0.3 for CD8+ GSEA score) used were the same as described in Examples 1-3.
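The ICI adjustment can be sketched as a change in the per-clonotype summary statistic. This illustration assumes that "upper quantile" means the 75th percentile of the per-cell scores within a clonotype and uses the inclusive quantile method; both choices, the function names, and the example scores are assumptions rather than details given above.

```python
from statistics import median, quantiles

# Fixed cutoffs as stated in the text.
CUTOFFS = {
    ("CD4", "exhaustion"): 13, ("CD8", "exhaustion"): 13,
    ("CD4", "gsea"): 0.2, ("CD8", "gsea"): 0.3,
}


def clone_summary(scores, ici_treated):
    """Median for untreated tumors; upper quartile for ICI-treated tumors."""
    if not ici_treated:
        return median(scores)
    if len(scores) < 2:
        return scores[0]
    # Assumption: "upper quantile" is read as the 75th percentile.
    return quantiles(scores, n=4, method="inclusive")[2]


def is_exhausted(scores, lineage, kind, ici_treated):
    """Compare the clonotype summary score with the fixed cutoff."""
    return clone_summary(scores, ici_treated) >= CUTOFFS[(lineage, kind)]


# Illustrative per-cell exhaustion scores for one CD8 clonotype.
cells = [8, 10, 12, 14, 16]
print(is_exhausted(cells, "CD8", "exhaustion", ici_treated=False))
print(is_exhausted(cells, "CD8", "exhaustion", ici_treated=True))
```

In this example the median (12) falls below the cutoff of 13 while the upper quartile (14) meets it, so the same clonotype is called exhausted only under the ICI-adjusted rule.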
[0401] While preferred embodiments of the present disclosure have been shown and described herein, it will be apparent to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.
Claims
1. A method of identifying exhausted T cells from a population of T cells obtained from a tumor microenvironment of a subject having a cancer, the method comprising:
(a) providing single cell transcriptome data of the population of T cells;
(b) classifying each T cell of the population of T cells as a CD4+ T cell or a CD8+ T cell based on an expression level of each classification gene of a set of at least 10 classification genes from the single cell transcriptome data, thereby generating a CD4+ cluster and a CD8+ cluster, and
(c) calculating
(i) a CD4+ exhaustion score and/or a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 5 CD4+ exhaustion gene markers, and
(ii) a CD8+ exhaustion score and/or a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 5 CD8+ exhaustion gene markers, wherein the set of at least 5 CD4+ exhaustion gene markers is different from the set of at least 5 CD8+ exhaustion gene markers, wherein: each T cell within the CD4+ cluster with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell, and each T cell within the CD8+ cluster with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell.
2. A method of identifying exhausted T cells from a population of T cells obtained from a tumor microenvironment of a subject having a cancer, the method comprising: calculating a CD4+ exhaustion score and/or a CD4+ gene set enrichment analysis (GSEA) score for a T cell classified as a CD4+ T cell, wherein the calculating is based on an expression level of each CD4+ exhaustion gene marker of a set of at least 5 CD4+ exhaustion gene markers; wherein the expression level of each CD4+ exhaustion gene marker is from single cell transcriptome data of the population of T cells from the tumor microenvironment of the subject;
wherein each T cell classified as a CD4+ T cell with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell.
3. The method of claim 2, wherein the method further comprises, prior to calculating, classifying a T cell from the population of T cells as a CD4+ T cell based on an expression level of each classification gene of a set of at least 10 classification genes from the single cell transcriptome data, thereby generating a CD4+ cluster.
4. A method of identifying exhausted T cells from a population of T cells obtained from a tumor microenvironment of a subject having a cancer, the method comprising: calculating a CD8+ exhaustion score and/or a CD8+ gene set enrichment analysis (GSEA) score for a T cell classified as a CD8+ T cell, wherein the calculating is based on an expression level of each CD8+ exhaustion gene marker of a set of at least 5 CD8+ exhaustion gene markers; wherein the expression level of each CD8+ exhaustion gene marker is from single cell transcriptome data of the population of T cells from the tumor microenvironment of the subject; wherein each T cell classified as a CD8+ T cell with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell.
5. The method of claim 4, wherein the method further comprises, prior to calculating, classifying a T cell from the population of T cells as a CD8+ T cell based on an expression level of each classification gene of a set of at least 10 classification genes from the single cell transcriptome data, thereby generating a CD8+ cluster.
6. The method of claim 3 or 5, wherein the method further comprises classifying each T cell from the population of T cells as a CD4+ T cell or a CD8+ T cell based on an expression level of each classification gene of a set of at least 10 classification genes from the single cell transcriptome data, thereby generating a CD4+ cluster and a CD8+ cluster.
7. The method of claim 6, wherein the method further comprises calculating (i) a CD4+ exhaustion score and/or a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 5 CD4+ exhaustion gene markers, and (ii) a CD8+ exhaustion score and/or a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 5 CD8+ exhaustion gene markers.
8. The method of claim 7, wherein the set of at least 5 CD4+ exhaustion gene markers is different from the set of at least 5 CD8+ exhaustion gene markers.
9. A method of classifying CD8+ T cells and CD4+ T cells in a population of T cells, the method comprising:
(a) providing single cell transcriptome data of a population of T cells obtained from a tumor microenvironment of a subject having a cancer;
(b) classifying each T cell of the population of T cells as a CD4+ T cell or a CD8+ T cell based on an expression level of each classification gene of a set of at least 40 classification genes selected from the group consisting of the genes of Table 2 from the single cell transcriptome data, thereby generating a CD4+ cluster and a CD8+ cluster, wherein a T cell of the CD4+ cluster is classified as CD4+ T cell, and wherein a T cell of the CD8+ cluster is classified as CD8+ T cell.
10. The method of claim 9, wherein the method further comprises calculating a CD4+ exhaustion score and/or a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 5 CD4+ exhaustion gene markers.
11. The method of claim 10, wherein the expression level of each CD4+ exhaustion gene marker is from single cell transcriptome data of a population of T cells from the tumor microenvironment of the subject.
12. The method of claim 10 or 11, wherein each T cell within the CD4+ cluster with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell.
13. The method of any one of claims 9-12, wherein the method further comprises calculating a CD8+ exhaustion score and/or a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 5 CD8+ exhaustion gene markers.
14. The method of claim 13, wherein the set of at least 5 CD4+ exhaustion gene markers is different from the set of at least 5 CD8+ exhaustion gene markers.
15. The method of claim 13 or 14, wherein the expression level of each CD8+ exhaustion gene marker is from single cell transcriptome data of a population of T cells from a tumor microenvironment of a subject having a cancer.
16. The method of any one of claims 13-15, wherein each T cell within the CD8+ cluster with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell.
17. The method of any one of claims 1-16, wherein the method further comprises obtaining the population of T cells from the tumor microenvironment of the subject.
18. The method of claim 17, wherein obtaining comprises isolating a tumor or a tumor tissue comprising the population of T cells from the subject.
19. The method of any one of claims 1-18, wherein the expression level is determined by mRNA transcripts.
20. The method of any one of claims 1-19, wherein the method further comprises sequencing mRNAs from the population of T cells to obtain the single cell transcriptome data.
21. The method of any one of claims 1-20, wherein the method further comprises providing single-cell T-cell receptor (scTCR) data of the population of T cells.
22. The method of claim 21, wherein the method further comprises sequencing the population of T cells to obtain the scTCR data of each T cell.
23. The method of claim 21 or 22, wherein the method further comprises identifying a TCR clonotype of an exhausted CD4+ T cell or an exhausted CD8+ T cell based on the scTCR data of exhausted CD4+ T cells and exhausted CD8+ T cells.
24. The method of claim 23, wherein the method further comprises identifying TCR clonotypes of each exhausted CD4+ T cell of the population of T cells based on the scTCR data of exhausted CD4+ T cells.
25. The method of claim 23 or 24, wherein the method further comprises identifying TCR clonotypes of each exhausted CD8+ T cell of the population of T cells based on the scTCR data of exhausted CD8+ T cells.
26. The method of claim 23, wherein the method further comprises identifying TCR clonotypes of each exhausted CD4+ T cell and each exhausted CD8+ T cell of the population of T cells based on the scTCR data of exhausted CD4+ T cells and exhausted CD8+ T cells.
27. The method of any one of claims 1-26, wherein a TCR clonotype of a given exhausted CD4+ T cell is matched to the CD4+ exhaustion score and/or the CD4+ GSEA score of the same exhausted CD4+ T cell.
28. The method of claim 27, wherein the TCR clonotype of a given exhausted CD4+ T cell is matched to the single cell transcriptome data of the same exhausted CD4+ T cell.
29. The method of claim 28, wherein the TCR clonotype of a given exhausted CD4+ T cell is matched to the single cell transcriptome data of the same exhausted CD4+ T cell via a same single cell barcode.
30. The method of any one of claims 1-26, wherein a TCR clonotype of a given exhausted CD8+ T cell is matched to the CD8+ exhaustion score and/or the CD8+ GSEA score of the same exhausted CD8+ T cell.
31. The method of claim 30, wherein the TCR clonotype of a given exhausted CD8+ T cell is matched to the single cell transcriptome data of the same exhausted CD8+ T cell.
32. The method of claim 31, wherein the TCR clonotype of a given exhausted CD8+ T cell is matched to the single cell transcriptome data of the same exhausted CD8+ T cell via a same single cell barcode.
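The barcode-based matching recited in claims 29 and 32 can be pictured as a join on a shared key. The following is a minimal sketch only, assuming that the transcriptome-derived scores and the scTCR clonotype calls are each available as a mapping keyed by the single cell barcode; the function name and data layout are hypothetical and are not taken from the specification.

```python
def match_by_barcode(score_by_barcode, clonotype_by_barcode):
    """Pair each scored cell with its TCR clonotype via the shared barcode.

    score_by_barcode:     dict of cell barcode -> exhaustion (or GSEA) score
    clonotype_by_barcode: dict of cell barcode -> clonotype identifier
    Cells lacking a clonotype call are left out of the result.
    """
    matched = {}
    for barcode, score in score_by_barcode.items():
        if barcode in clonotype_by_barcode:
            matched[barcode] = (clonotype_by_barcode[barcode], score)
    return matched


# Hypothetical toy data: three scored cells, two of which have TCR calls.
scores = {"AAAC": 0.91, "AAAG": 0.40, "AAAT": 0.72}
clonotypes = {"AAAC": "clonotype_1", "AAAT": "clonotype_2"}
matched_cells = match_by_barcode(scores, clonotypes)
```

In this sketch a cell without recovered TCR data simply drops out; a real pipeline might instead keep it with a missing clonotype.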
33. The method of any one of claims 1-26, wherein the method further comprises identifying a number of single cells expressing a TCR of a given TCR clonotype that is present in a group of exhausted CD4+ T cells and a number of single cells expressing the same TCR of the given TCR clonotype that is present in a group of non-exhausted CD4+ T cells.
34. The method of claim 33, wherein the method further comprises selecting a TCR of a given TCR clonotype as a candidate tumor-reactive TCR clonotype when the number of single cells expressing the TCR of the given TCR clonotype that is present in the group of exhausted CD4+ T cells is larger than the number of single cells expressing the same TCR of the given TCR clonotype that is present in the group of non-exhausted CD4+ T cells.
35. The method of any one of claims 1-26, 33 and 34, wherein the method further comprises identifying a number of single cells expressing a TCR of a given TCR clonotype that is present in a group of exhausted CD8+ T cells and a number of single cells expressing the same TCR of the given TCR clonotype that is present in a group of non-exhausted CD8+ T cells.
36. The method of claim 35, wherein the method further comprises selecting a TCR of a given TCR clonotype as a candidate tumor-reactive TCR clonotype when the number of single cells expressing the TCR of the given TCR clonotype that is present in the group of exhausted CD8+ T cells is larger than the number of single cells expressing the same TCR of the given TCR clonotype that is present in the group of non-exhausted CD8+ T cells.
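The comparison in claims 33 to 36 amounts to counting, for each clonotype, the cells that fall in the exhausted group and in the non-exhausted group, and keeping the clonotype when the first count is larger. A minimal sketch, assuming each cell has already been reduced to a clonotype identifier and an exhausted/non-exhausted flag (names and data layout are hypothetical), and assuming the function is called once per lineage (CD4+ or CD8+):

```python
from collections import Counter


def candidate_clonotypes(cells):
    """Return clonotypes found in more exhausted than non-exhausted cells.

    cells: iterable of (clonotype_id, is_exhausted) pairs, one per cell,
           all drawn from a single lineage (CD4+ or CD8+).
    """
    exhausted = Counter()
    non_exhausted = Counter()
    for clonotype, is_exhausted in cells:
        if is_exhausted:
            exhausted[clonotype] += 1
        else:
            non_exhausted[clonotype] += 1
    # A Counter returns 0 for clonotypes never seen in the other group.
    return sorted(c for c in exhausted if exhausted[c] > non_exhausted[c])


# Hypothetical toy data for one lineage.
toy_cells = [
    ("clonotype_1", True), ("clonotype_1", True), ("clonotype_1", False),
    ("clonotype_2", True), ("clonotype_2", False),
    ("clonotype_3", False),
]
selected = candidate_clonotypes(toy_cells)
```

A strict inequality is used here because the claims recite "larger than"; a clonotype with equal counts in both groups is not selected.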
37. The method of any one of claims 1-36, wherein the method further comprises, prior to obtaining the single cell transcriptomic data, separating a subset of T cells from the population of T cells based on expression of a CD4+ and/or CD8+ exhaustion marker, thereby generating a subset of exhausted T cells and a subset of non-exhausted T cells.
38. The method of claim 37, wherein the CD4+ and/or CD8+ exhaustion marker comprises at least 5 genes selected from the group consisting of genes in Tables 3-6.
39. The method of claim 37 or 38, wherein separating comprises fluorescence activated cell sorting (FACS).
40. The method of any one of claims 37-39, wherein the method further comprises sequencing the subset of exhausted T cells and the subset of non-exhausted T cells using single cell sequencing or bulk sequencing.
41. The method of claim 40, wherein the sequencing does not comprise using a barcode.
42. The method of any one of claims 1-41, wherein the population of T cells are obtained from a frozen sample or a fresh sample.
43. The method of claim 42, wherein the sample is a formalin-fixed paraffin-embedded (FFPE) sample.
44. The method of claim 42, wherein the sample is not a FFPE sample.
45. The method of any one of claims 34-44, wherein the method further comprises preparing a pharmaceutical composition using the candidate tumor-reactive TCR clonotype or a cell expressing the candidate tumor-reactive TCR clonotype.
46. A method of identifying one or more T-cell receptors from exhausted T cells from a population of T cells obtained from a tumor microenvironment of a subject having a cancer, the method comprising:
(a) providing single cell transcriptome data of the population of T cells;
(b) classifying each T cell of the population of T cells as a CD4+ T cell or a CD8+ T cell based on an expression level of each classification gene of a set of at least 10 classification genes from the single cell transcriptome data, thereby generating a CD4+ cluster and a CD8+ cluster;
(c) calculating
(i) a CD4+ exhaustion score and/or a CD4+ gene set enrichment analysis (GSEA) score for a T cell of the CD4+ cluster based on an expression level of each CD4+ exhaustion gene marker of a set of at least 5 CD4+ exhaustion gene markers, and
(ii) a CD8+ exhaustion score and/or a CD8+ GSEA score for a T cell of the CD8+ cluster based on an expression level of each CD8+ exhaustion gene marker of a set of at least 5 CD8+ exhaustion gene markers, wherein the set of at least 5 CD4+ exhaustion gene markers is different from the set of at least 5 CD8+ exhaustion gene markers, wherein: each T cell within the CD4+ cluster with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD4+ T cell, and each T cell within the CD8+ cluster with an exhaustion score and/or a GSEA score equal to or higher than a threshold value is identified as an exhausted CD8+ T cell; and
(d) identifying TCR clonotypes of the exhausted CD4+ T cells and exhausted CD8+ T cells separately based on single-cell T-cell receptor (scTCR) data of exhausted CD4+ T cells and exhausted CD8+ T cells identified in (c).
47. The method of any one of claims 1, 3, 5, and 6-46, wherein the set of at least 10 classification genes comprises at least 10 genes selected from the group consisting of PTPN13, TNFRSF4, CCR6, FOXP3, TSHZ2, MFHAS1, FAAH2, CD4, GK, IL2RA, CRADD, LTB, IRS2, KLRB1, TNFRSF25, LINC02694, THADA, BATF, TNFRSF18, SELL, IL12RB2, FURIN, HIPK2, MAP3K5, TMEM173, CTSB, SAMHD1, ADAM19, ICOS, GNA15, EPSTI1, ZC3H12D, PHTF2, MAST4, UGP2, RAPGEF6, STAM, CTLA4, RORA, SATB1, ZEB1, PIM2, CD28, LDLRAD4, PELI1, RHBDD2, SOCS3, TRAF3, ABCC1, RNASET2, SPOCK2, ITK, STK24, SNX9, GZMA, RALGAPA1, GZMB, JMJD6, ZEB2, DUSP2, CLEC2B, GABARAPL1, SLA2, LITAF, AKNA, LYST, ITGA4, TUBA4A, IFNG, METRNL, CST7, IER5L, MXRA7, GGA2, AUTS2, APOBEC3G, NELL2, LYAR, GALNT11, PTMS, CMC1, AOAH, LAG3, PRF1, TNFSF9, CCL5, CCL4, CTSW, GZMH, GNLY, YBX3, GZMK, CRTAM, CD8A, KLRK1, NKG7, KLRD1, CD8B, and LINC02446.
48. The method of claim 47, wherein classifying each T cell of the population of T cells comprises classifying each T cell of the population of T cells as a CD4+ cell and/or a CD8+ cell based on an expression level of each classification gene of a set of from 11 to 99 classification genes selected from the group consisting of PTPN13, TNFRSF4, CCR6, FOXP3, TSHZ2, MFHAS1, FAAH2, CD4, GK, IL2RA, CRADD, LTB, IRS2, KLRB1, TNFRSF25, LINC02694, THADA, BATF, TNFRSF18, SELL, IL12RB2, FURIN, HIPK2, MAP3K5, TMEM173, CTSB, SAMHD1, ADAM19, ICOS, GNA15, EPSTI1, ZC3H12D, PHTF2, MAST4, UGP2, RAPGEF6, STAM, CTLA4, RORA, SATB1, ZEB1, PIM2, CD28, LDLRAD4, PELI1, RHBDD2, SOCS3, TRAF3, ABCC1, RNASET2, SPOCK2, ITK, STK24, SNX9, GZMA, RALGAPA1, GZMB, JMJD6, ZEB2, DUSP2, CLEC2B, GABARAPL1, SLA2, LITAF, AKNA, LYST, ITGA4, TUBA4A, IFNG, METRNL, CST7, IER5L, MXRA7, GGA2, AUTS2, APOBEC3G, NELL2, LYAR, GALNT11, PTMS, CMC1, AOAH, LAG3, PRF1, TNFSF9, CCL5, CCL4, CTSW, GZMH, GNLY, YBX3, GZMK, CRTAM, CD8A, KLRK1, NKG7, KLRD1, CD8B, and LINC02446.
49. The method of any one of claims 1-3 and 7-48, wherein the set of at least 5 CD4+ exhaustion gene markers comprises at least 5 genes selected from the group consisting of ADGRG1, CD200, CHN1, CPM, CTLA4, CXCL13, DRAIC, ENTPD1, GNG4, ICA1, IGFL2, IGFLR1, MYO7A, PDE7B, NMB, PTPN13, TIGIT, TOX, TOX2, and TSHZ2.
50. The method of claim 49, wherein the set of at least 5 CD4+ exhaustion gene markers comprises from 6 to 20 genes selected from the group consisting of ADGRG1, CD200, CHN1, CPM, CTLA4, CXCL13, DRAIC, ENTPD1, GNG4, ICA1, IGFL2, IGFLR1, MYO7A, PDE7B, NMB, PTPN13, TIGIT, TOX, TOX2, and TSHZ2.
51. The method of any one of claims 1 and 4-50, wherein the set of at least 5 CD8+ exhaustion gene markers comprises at least 5 genes selected from the group consisting of CHN1, CLECL1, CTLA4, CXCL13, CXCR6, ENTPD1, HAVCR2, KRT86, LAG3, LAYN, LRRN3, MYO1E, MYO7A, PDCD1, SIRPG, SRGAP3, STAM, TIGIT, TNFRSF18, and TOX.
52. The method of claim 51, wherein the set of at least 5 CD8+ exhaustion gene markers comprises from 6 to 20 genes selected from the group consisting of CHN1, CLECL1, CTLA4, CXCL13, CXCR6, ENTPD1, HAVCR2, KRT86, LAG3, LAYN, LRRN3, MYO1E, MYO7A, PDCD1, SIRPG, SRGAP3, STAM, TIGIT, TNFRSF18, and TOX.
53. The method of any one of claims 1-3 and 7-52, wherein calculating the CD4+ exhaustion score for the T cell of the CD4+ cluster comprises (i) obtaining an UMI count for each CD4+ exhaustion gene of the set of at least 5 CD4+ exhaustion gene markers to obtain the expression level of each CD4+ exhaustion gene of the set of at least 5 CD4+ exhaustion gene markers; (ii) scaling the UMI count by dividing the UMI count for each gene by the total number of UMIs of the single cell transcriptome data of the T cell and then multiplying the quotient by a scale factor; (iii) applying a logarithmic transformation to obtain normalized UMI counts for the set of at least 5 CD4+ exhaustion gene markers; and (iv) calculating the CD4+ exhaustion score for the T cell as a mean of the normalized UMI counts, wherein the T cell with a CD4+ exhaustion score equal to or higher than 0.65 is identified as an exhausted CD4+ T cell.
54. The method of any one of claims 1 and 4-53, wherein calculating the CD8+ exhaustion score for the T cell of the CD8+ cluster comprises (i) obtaining an UMI count for each CD8+ exhaustion gene of the set of at least 5 CD8+ exhaustion gene markers to obtain the expression level of each gene of the set of at least 5 exhaustion gene markers; (ii) scaling the UMI count by dividing the UMI count for each CD8+ exhaustion gene by the total number of UMIs of the single cell transcriptome data of the T cell and then multiplying the quotient by a scale factor; (iii) applying a logarithmic transformation to obtain normalized UMI counts for the set of at least 5 CD8+ exhaustion gene markers; and (iv) calculating the CD8+ exhaustion score for the T cell as a mean of the normalized UMI counts, wherein the T cell with a CD8+ exhaustion score equal to or higher than 0.65 is identified as an exhausted CD8+ T cell.
55. The method of claim 53 or 54, wherein the scale factor is 10,000.
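The four-step exhaustion score calculation of claims 53 to 55 (UMI count, scaling by the cell's total UMIs and a scale factor, logarithmic transformation, mean) can be illustrated with a short sketch. This is not a definitive implementation: the claims do not specify the logarithm, so the natural logarithm of one plus the scaled count is assumed here, and the function names and the dictionary layout are hypothetical.

```python
import math


def exhaustion_score(umi_counts, marker_genes, scale_factor=10_000):
    """Mean log-normalized expression of the marker genes in one cell.

    umi_counts:   dict of gene symbol -> raw UMI count for a single cell,
                  covering all detected genes (not only the markers)
    marker_genes: the exhaustion gene markers to average over
    """
    total_umis = sum(umi_counts.values())
    if total_umis == 0 or not marker_genes:
        return 0.0
    normalized = [
        # Assumption: log(1 + x); the claims only say "logarithmic".
        math.log1p(umi_counts.get(gene, 0) / total_umis * scale_factor)
        for gene in marker_genes
    ]
    return sum(normalized) / len(normalized)


def is_exhausted(score, threshold=0.65):
    """Apply the 'equal to or higher than' rule with the claimed 0.65."""
    return score >= threshold


# Hypothetical toy cell with 100 UMIs in total.
toy_cell = {"TOX": 2, "CTLA4": 1, "ACTB": 97}
toy_markers = ["TOX", "CTLA4", "TIGIT", "PDCD1", "LAG3"]
toy_score = exhaustion_score(toy_cell, toy_markers)
```

Marker genes that are absent from a cell contribute zero to the mean in this sketch, so a cell expressing only a few markers is pulled toward a lower score.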
56. The method of any one of claims 53-55, wherein the set of at least 5 CD4+ exhaustion gene markers comprises at least 5 genes selected from the group consisting of ADD3, AGFG1, AHI1, AP3S1, ARAP2, ARHGEF3, ATP2A2, CCDC6, CD200, CD27, CH25H, CHN1, CNIH1, COTL1, CPM, CRYBG1, CTLA4, CXCL13, DUSP4, ELMO1, FABP5, FBLN7, FBXO32, FKBP5, FOXN2, FYB1, GEM, GK, GPRIN3, GRSF1, GYPC, HIPK2, HMGB2, ICA1, IL6ST, IQGAP1, ITM2A, ITPR1, JARID2, LHFPL6, LIMS1, LRMP, LRRC8D, MAGEH1, MTHFD2, NAP1L4, NCOA7, NFATC2, NMB, NR3C1, NUDT16, PDCD1, PGM2L1, PHACTR2, POR, PTPN13, RBPJ, RNF19A, SESN1, SESN3, SH2D1A, SLA, SMARCA2, SMARCAD1, SMS, SNX9, SRGN, STAT3, TIAM1, TIGIT, TMEM243, TMEM64, TMEM70, TMPO, TNFAIP8, TNFRSF18, TNFSF8, TNIK, TOX, TOX2, TP53BP2, TP53INP1, TRABD2A, TSHZ2, UGCG, WNK1, YWHAQ and CD4.
57. The method of claim 56, wherein the set of at least 5 CD4+ exhaustion gene markers comprises from 6 to 88 genes selected from the group consisting of ADD3, AGFG1, AHI1, AP3S1, ARAP2, ARHGEF3, ATP2A2, CCDC6, CD200, CD27, CH25H, CHN1, CNIH1, COTL1, CPM, CRYBG1, CTLA4, CXCL13, DUSP4, ELMO1, FABP5, FBLN7, FBXO32, FKBP5, FOXN2, FYB1, GEM, GK, GPRIN3, GRSF1, GYPC, HIPK2, HMGB2, ICA1, IL6ST, IQGAP1, ITM2A, ITPR1, JARID2, LHFPL6, LIMS1, LRMP, LRRC8D, MAGEH1, MTHFD2, NAP1L4, NCOA7, NFATC2, NMB, NR3C1, NUDT16, PDCD1, PGM2L1, PHACTR2, POR, PTPN13, RBPJ, RNF19A, SESN1, SESN3, SH2D1A, SLA, SMARCA2, SMARCAD1, SMS, SNX9, SRGN, STAT3, TIAM1, TIGIT, TMEM243, TMEM64, TMEM70, TMPO, TNFAIP8, TNFRSF18, TNFSF8, TNIK, TOX, TOX2, TP53BP2, TP53INP1, TRABD2A, TSHZ2, UGCG, WNK1, YWHAQ and CD4.
58. The method of any one of claims 53-56, wherein the set of at least 5 CD8+ exhaustion gene markers comprises at least 5 genes selected from the group consisting of AHSA1, ALOX5AP, BAG3, BST2, CACYBP, CARD16, CD3D, CD7, CD82, CHN1, CLECL1, CLEC2B, CLEC2D, CTLA4, CTSD, CXCL13, CXCR6, DUSP4, ENTPD1, FKBP1A, GAPDH, GEM, GZMB, HAVCR2, HLA-DRB1, HSPB1, ICOS, IQGAP1, ITGAE, KRT86, LAG3, LAYN, LSP1, NAP1L4, NR3C1, PDCD1, PELI1, PHLDA1, POLR1E, PRDM1, PTPN22, RAB11FIP1, RAB27A, RBPJ, RGS1, RGS2, RHBDD2, RUNX2, SAMSN1, SERPINH1, SH3BGRL3, SLA, SNX9, SRGAP3, STAM, TIGIT, TNFRSF9, TOX, TTN, CD8A, and CD8B.
59. The method of claim 58, wherein the set of at least 5 CD8+ exhaustion gene markers comprises from 6 to 61 genes selected from the group consisting of AHSA1, ALOX5AP, BAG3, BST2, CACYBP, CARD16, CD3D, CD7, CD82, CHN1, CLECL1, CLEC2B, CLEC2D, CTLA4, CTSD, CXCL13, CXCR6, DUSP4, ENTPD1, FKBP1A, GAPDH, GEM, GZMB, HAVCR2, HLA-DRB1, HSPB1, ICOS, IQGAP1, ITGAE, KRT86, LAG3, LAYN, LSP1, NAP1L4, NR3C1, PDCD1, PELI1, PHLDA1, POLR1E, PRDM1, PTPN22, RAB11FIP1, RAB27A, RBPJ, RGS1, RGS2, RHBDD2, RUNX2, SAMSN1, SERPINH1, SH3BGRL3, SLA, SNX9, SRGAP3, STAM, TIGIT, TNFRSF9, TOX, TTN, CD8A, and CD8B.
60. The method of any one of claims 1-3 and 7-59, wherein (A) calculating the CD4+ GSEA score for the T cell of the CD4+ cluster comprises (i) obtaining an UMI count for each gene of all genes in the single cell transcriptome data; (ii) obtaining an UMI rank of all genes by ranking UMI counts of all genes; (iii) increasing a running-sum statistic for each CD4+ exhaustion gene of all genes that appears in the set of at least 5 CD4+ exhaustion gene markers and decreasing a running-sum statistic for each CD4+ exhaustion gene of all genes that does not appear in the set of at least 5 CD4+ exhaustion gene markers; and (iv) calculating the CD4+ GSEA score based on running-sum statistics, wherein the T cell with a CD4+ GSEA score equal to or higher than a cutoff value established from data distribution is identified as an exhausted CD4+ T cell, or (B) calculating the CD4+ GSEA score for the T cell of the CD4+ cluster comprises (i) obtaining an UMI count for each gene of all genes in the single cell transcriptome data; (ii) obtaining an UMI rank of all genes by ranking UMI counts of all genes; (iii) calculating an area under the curve (AUC) value of a set of at least 5 CD4+ exhaustion genes; and (iv) calculating the CD4+ GSEA score based on AUC values, wherein the T cell with a CD4+ GSEA score equal to or higher than a cutoff value established from data distribution is identified as an exhausted CD4+ T cell.
61. The method of claim 60, wherein the cutoff value established from data distribution in (A) or (B) is 0.2.
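The running-sum variant in part (A) of claim 60 can be sketched as follows. The claims do not specify the step sizes or how the final score is derived from the running sum, so this sketch assumes an unweighted, Kolmogorov-Smirnov-style walk: the sum rises by 1/(number of marker genes) at each marker and falls by 1/(number of other genes) otherwise, and the score is the maximum the sum reaches. The function name and data layout are hypothetical.

```python
def running_sum_score(umi_counts, marker_genes):
    """Unweighted running-sum enrichment of marker genes in one cell.

    umi_counts:   dict of gene symbol -> UMI count for every gene in the
                  single cell transcriptome data (zeros included)
    marker_genes: the exhaustion gene markers
    """
    markers = set(marker_genes) & set(umi_counts)
    n_hit = len(markers)
    n_miss = len(umi_counts) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    # Rank genes from highest to lowest UMI count; ties broken by name.
    ranked = sorted(umi_counts, key=lambda g: (-umi_counts[g], g))
    running = 0.0
    best = 0.0
    for gene in ranked:
        if gene in markers:
            running += 1.0 / n_hit   # step up at a marker gene
        else:
            running -= 1.0 / n_miss  # step down at any other gene
        best = max(best, running)
    return best


# Hypothetical toy cell with six genes.
toy_counts = {"A": 10, "B": 8, "C": 5, "D": 1, "E": 0, "F": 0}
score_top = running_sum_score(toy_counts, ["A", "B"])
score_bottom = running_sum_score(toy_counts, ["E", "F"])
```

Under these assumptions a marker set concentrated at the top of the ranking approaches 1, while a set concentrated at the bottom stays at 0, so a cutoff such as the 0.2 recited in claim 61 separates the two cases.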
62. The method of claim 60 or 61, wherein calculating in (B)(iii) comprises assessing recovery of the set of at least 5 CD4+ exhaustion genes.
63. The method of any one of claims 60-62, wherein the set of at least 5 CD4+ exhaustion genes are selected among the top ranked genes from the UMI rank obtained in (B)(ii).
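The AUC variant in part (B) of claim 60, together with the "recovery" and "top ranked genes" language of claims 62 and 63, can be read as the area under a recovery curve: walking down the ranked gene list and counting how many marker genes have been recovered so far. The sketch below is one such reading, not the claimed method itself. The cap `top_n` on the number of ranked genes considered and the normalization by the best achievable area are assumptions, and all names are hypothetical.

```python
def auc_score(umi_counts, marker_genes, top_n):
    """Normalized area under the marker recovery curve for one cell.

    umi_counts:   dict of gene symbol -> UMI count for every gene
    marker_genes: the exhaustion gene markers
    top_n:        how many top-ranked genes to walk through (assumption)
    """
    markers = set(marker_genes)
    if not markers or top_n <= 0:
        return 0.0
    ranked = sorted(umi_counts, key=lambda g: (-umi_counts[g], g))[:top_n]
    recovered = 0
    area = 0
    for gene in ranked:
        if gene in markers:
            recovered += 1
        area += recovered  # height of the recovery curve at this rank
    # Best case: every marker sits at the very top of the ranking.
    max_area = sum(min(i, len(markers)) for i in range(1, len(ranked) + 1))
    return area / max_area if max_area else 0.0


# Hypothetical toy cell with four genes, inspecting the top three.
toy_ranking = {"A": 10, "B": 8, "C": 5, "D": 1}
auc_top = auc_score(toy_ranking, ["A", "B"], top_n=3)
auc_low = auc_score(toy_ranking, ["C", "D"], top_n=3)
```

Here the score is bounded between 0 and 1, which makes a fixed cutoff comparable across cells with different numbers of detected genes.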
64. The method of any one of claims 1 and 4-63, wherein (A) calculating the CD8+ GSEA score for the T cell of the CD8+ cluster comprises (i) obtaining an UMI count for each gene of all genes in the single cell transcriptome data; (ii) obtaining an UMI rank of all genes by ranking UMI counts of all genes; (iii) increasing a running-sum statistic for each CD8+ exhaustion gene of all genes that appears in the set of at least 5 CD8+ exhaustion gene markers and decreasing a running-sum statistic for each CD8+ exhaustion gene of all genes that does not appear in the set of at least 5 CD8+ exhaustion gene markers; and (iv) calculating the CD8+ GSEA score based on running-sum statistics, wherein the T cell with a CD8+ GSEA score equal to or higher than a cutoff value established from data distribution is identified as an exhausted CD8+ T cell, or (B) calculating the CD8+ GSEA score for the T cell of the CD8+ cluster comprises (i) obtaining an UMI count for each gene of all genes in the single cell transcriptome data; (ii) obtaining an UMI rank of all genes by ranking UMI counts of all genes; (iii) calculating an area under the curve (AUC) value of a set of at least 5 CD8+ exhaustion genes; and (iv) calculating the CD8+ GSEA score based on AUC values, wherein the T cell with a CD8+ GSEA score equal to or higher than a cutoff value established from data distribution is identified as an exhausted CD8+ T cell.
65. The method of claim 64, wherein the cutoff value from data distribution in (A) or (B) is 0.3.
66. The method of claim 64 or 65, wherein calculating in (B)(iii) comprises assessing recovery of the set of at least 5 CD8+ exhaustion genes.
67. The method of any one of claims 64-66, wherein the set of at least 5 CD8+ exhaustion genes are selected among the top ranked genes from the UMI rank obtained in (B)(ii).
68. The method of any one of claims 49-67, wherein the method further comprises calculating the CD4+ exhaustion score and the CD4+ GSEA score for the T cell of the CD4+ cluster.
69. The method of any one of claims 49-68, wherein the method further comprises calculating the CD8+ exhaustion score and the CD8+ GSEA score for the T cell of the CD8+ cluster.
70. The method of claim 69, wherein the method further comprises identifying TCR clonotypes of the exhausted CD4+ T cells and exhausted CD8+ cells separately based on the scTCR data of exhausted CD4+ T cells and exhausted CD8+ T cells identified in (c), wherein the exhausted CD4+ T cells have both the CD4+ exhaustion score and the CD4+ GSEA score above the threshold value, and the exhausted CD8+ T cells have both the CD8+ exhaustion score and the CD8+ GSEA score above the threshold value.
71. The method of claim 69, wherein the method further comprises identifying TCR clonotypes of the exhausted CD4+ T cells and exhausted CD8+ cells separately based on the scTCR data of exhausted CD4+ T cells and exhausted CD8+ T cells identified in (c), wherein the exhausted CD4+ T cells have the CD4+ exhaustion score or the CD4+ GSEA score above the threshold value, and the exhausted CD8+ T cells have the CD8+ exhaustion score or the CD8+ GSEA score above the threshold value.
72. The method of claim 70 or 71, wherein,
(a) for each TCR clonotype identified in a CD4+ exhausted T cell, the method comprises calculating a mean or median CD4+ exhaustion score and/or exhaustion score and a mean or median CD4+ GSEA score for all CD4+ exhausted T cells having the same TCR clonotype; and/or
(b) for each TCR clonotype identified in a CD8+ exhausted T cell, the method comprises calculating a mean or median CD8+ exhaustion score and/or exhaustion score and a mean or median CD8+ GSEA score for all CD8+ exhausted T cells having the same TCR clonotype.
73. The method of any one of claims 70-72, wherein,
(a) for each TCR clonotype identified in a CD4+ exhausted T cell, the method comprises identifying a maximum CD4+ exhaustion score and/or exhaustion score and a maximum CD4+ GSEA score for all CD4+ exhausted T cells having the same TCR clonotype; and/or
(b) for each TCR clonotype identified in a CD8+ exhausted T cell, the method comprises identifying a maximum CD8+ exhaustion score and/or exhaustion score and a maximum CD8+ GSEA score for all CD8+ exhausted T cells having the same TCR clonotype.
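The per-clonotype statistics of claims 72 and 73 (mean or median score, and maximum score, across all exhausted cells sharing a clonotype) reduce to a group-by over cells. A minimal sketch, assuming each exhausted cell is represented by its clonotype identifier and one score; the names and the returned layout are hypothetical, and the same function would be applied separately to exhaustion scores and GSEA scores, and separately to CD4+ and CD8+ cells.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_by_clonotype(cells):
    """Aggregate one score across all cells sharing a TCR clonotype.

    cells: iterable of (clonotype_id, score) pairs, one per exhausted cell
    """
    grouped = defaultdict(list)
    for clonotype, score in cells:
        grouped[clonotype].append(score)
    return {
        clonotype: {
            "clone_size": len(scores),
            "mean": mean(scores),
            "median": median(scores),
            "max": max(scores),
        }
        for clonotype, scores in grouped.items()
    }


# Hypothetical toy data: one clone of three cells and one singleton.
toy_scored_cells = [
    ("clonotype_1", 0.7), ("clonotype_1", 0.9), ("clonotype_1", 1.1),
    ("clonotype_2", 0.8),
]
summary = summarize_by_clonotype(toy_scored_cells)
```

The clone size is returned alongside the statistics because the later ranking claims use both.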
74. The method of any one of claims 46-73, wherein a TCR clonotype of a given exhausted CD4+ T cell is matched to the CD4+ exhaustion score and/or the CD4+ GSEA score of the same exhausted CD4+ T cell.
75. The method of claim 74, wherein the TCR clonotype of a given exhausted CD4+ T cell is matched to the single cell transcriptome data of the same exhausted CD4+ T cell.
76. The method of claim 75, wherein the TCR clonotype of a given exhausted CD4+ T cell is matched to the single cell transcriptome data of the same exhausted CD4+ T cell via a same single cell barcode.
77. The method of any one of claims 46-73, wherein a TCR clonotype of a given exhausted CD8+ T cell is matched to the CD8+ exhaustion score and/or the CD8+ GSEA score of the same exhausted CD8+ T cell.
78. The method of claim 77, wherein the TCR clonotype of a given exhausted CD8+ T cell is matched to the single cell transcriptome data of the same exhausted CD8+ T cell.
79. The method of claim 78, wherein the TCR clonotype of a given exhausted CD8+ T cell is matched to the single cell transcriptome data of the same exhausted CD8+ T cell via a same single cell barcode.
80. The method of any one of claims 46-73, wherein the method further comprises identifying a number of single cells expressing a TCR of a given TCR clonotype that is present in a group of exhausted CD4+ T cells and a number of single cells expressing the same TCR of the given TCR clonotype that is present in a group of non-exhausted CD4+ T cells.
81. The method of claim 80, wherein the method further comprises selecting a TCR of a given TCR clonotype as a candidate tumor-reactive TCR clonotype when the number of single cells expressing the TCR of the given TCR clonotype that is present in the group of exhausted CD4+ T cells is larger than the number of single cells expressing the same TCR of the given TCR clonotype that is present in the group of non-exhausted CD4+ T cells.
82. The method of any one of claims 46-73, 80 and 81, wherein the method further comprises identifying a number of single cells expressing a TCR of a given TCR clonotype that is present in a group of exhausted CD8+ T cells and a number of single cells expressing the same TCR of the given TCR clonotype that is present in a group of non-exhausted CD8+ T cells.
83. The method of claim 82, wherein the method further comprises selecting a TCR of a given TCR clonotype as a candidate tumor-reactive TCR clonotype when the number of single cells expressing the TCR of the given TCR clonotype that is present in the group of exhausted CD8+ T cells is larger than the number of single cells expressing the same TCR of the given TCR clonotype that is present in the group of non-exhausted CD8+ T cells.
84. The method of any one of claims 70-83, further comprising selecting candidate tumor-reactive TCR clonotypes from the TCR clonotypes identified for the exhausted CD4+ T cells and/or the exhausted CD8+ T cells, and wherein the candidate tumor-reactive TCR clonotypes are further quality checked by (i) unique pairing of TCR alpha chain and TCR beta chain, (ii) match to known TCRs from a public database; and/or (iii) expression of innate immune cell markers.
85. The method of claim 84, wherein quality checking comprises excluding candidate tumor-reactive TCR clonotypes which (i) have unique pairing of TCR alpha chain and TCR beta chain, (ii) match to known TCRs from a public database; and/or (iii) express innate immune cell markers.
86. The method of claim 84, wherein candidate tumor-reactive TCR clonotypes that match to a known TCR that recognizes a non-oncogenic pathogen are not selected.
87. The method of any one of claims 70-86, wherein the method further comprises ranking the candidate tumor-reactive TCR clonotypes of the exhausted CD4+ T cells based on clone size.
88. The method of any one of claims 70-87, wherein the method further comprises ranking the candidate tumor-reactive TCR clonotypes of the exhausted CD8+ T cells based on clone size.
89. The method of any one of claims 72-88, wherein the method further comprises ranking the candidate tumor-reactive TCR clonotypes with similar clone sizes based on the mean or median CD4+ exhaustion score, the maximum CD4+ exhaustion score, the mean or median CD4+ GSEA score, and/or the maximum CD4+ GSEA score for all CD4+ exhausted T cells.
90. The method of any one of claims 72-89, wherein the method further comprises ranking the candidate tumor-reactive TCR clonotypes with similar clone sizes based on the mean or median CD8+ exhaustion score, the maximum CD8+ exhaustion score, the mean or median CD8+ GSEA score, and/or the maximum CD8+ GSEA score for all CD8+ exhausted T cells.
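The ranking of claims 87 to 90 orders candidate clonotypes by clone size first and uses a per-clonotype score to order clonotypes of similar size. A minimal sketch follows; the claims do not define "similar", so clonotypes are treated here as similar only when their clone sizes are equal, and the choice of the mean score as the secondary key is likewise an assumption. All names are hypothetical.

```python
def rank_clonotypes(summary, score_key="mean"):
    """Rank clonotypes by clone size, then by score within equal sizes.

    summary: dict of clonotype_id -> dict holding "clone_size" and the
             chosen score statistic (for example "mean" or "max")
    """
    return sorted(
        summary,
        key=lambda c: (
            -summary[c]["clone_size"],  # larger clones first
            -summary[c][score_key],     # then higher score first
            c,                          # stable order for full ties
        ),
    )


# Hypothetical per-clonotype summaries.
toy_summary = {
    "clonotype_1": {"clone_size": 3, "mean": 0.9},
    "clonotype_2": {"clone_size": 3, "mean": 1.2},
    "clonotype_3": {"clone_size": 5, "mean": 0.7},
}
ranking = rank_clonotypes(toy_summary)
```

A binned or tolerance-based notion of similar clone size could replace the exact-equality tie-break without changing the overall structure.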
91. The method of any one of claims 72-90, wherein the same TCR clonotype is determined by having the same CDR3 sequence.
92. The method of any one of claims 84-91, wherein the candidate tumor-reactive TCR clonotypes that match to known TCRs are determined by having the same CDR3 sequence.
93. The method of any one of claims 87-92, wherein the candidate tumor-reactive TCR clonotype of a proliferating cell is given a higher weighting value when ranking the candidate tumor-reactive TCR clonotypes.
94. The method of any one of claims 70-93, wherein the candidate tumor-reactive TCR clonotypes are predicted to be therapeutically relevant.
95. The method of claim 94, wherein a median positive predictive value (PPV) is at least 0.1 for CD4+ TCR clones or at least 0.1 for CD8+ TCR clones.
96. The method of any one of claims 84-95, wherein the method further comprises selecting at least one candidate tumor-reactive TCR clonotype from at least the top 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 or more of the candidate tumor-reactive TCR clonotypes ranked.
97. The method of any one of claims 84-95, further comprising delivering a nucleic acid encoding a TCR comprising the at least one candidate tumor-reactive TCR clonotype of the candidate tumor-reactive TCR clonotypes into a cell.
98. The method of claim 97, further comprising administering the nucleic acid encoding a TCR comprising the at least one candidate tumor-reactive TCR clonotype of the candidate tumor-reactive TCR clonotypes, or a cell comprising the nucleic acid encoding a TCR comprising the at least one candidate tumor-reactive TCR clonotype of the candidate tumor-reactive TCR clonotypes into a subject.
99. The method of claim 98, wherein the subject is the same subject where the population of T cells are obtained.
100. The method of any one of claims 1-99, wherein the population of T cells are tumor-infiltrating lymphocytes (TILs).
101. The method of any one of claims 1-100, wherein the population of T cells comprises at least 100, at least 500, at least 1,000, at least 2,000, at least 5,000, at least 10,000 or more cells.
102. A method of identifying one or more T-cell receptors as one or more candidate tumor-reactive TCRs from exhausted T cells from a population of T cells obtained from a tumor microenvironment of a subject having a cancer, the method comprising:
(a) providing single cell transcriptome data and single-cell T-cell receptor (scTCR) data of the population of T cells comprising exhausted CD4+ T cells and exhausted CD8+ T cells; and
(b) identifying TCR clonotypes of the exhausted CD4+ T cells or the exhausted CD8+ cells based on the scTCR data of the exhausted CD4+ T cells or the exhausted CD8+ T cells, wherein the exhausted CD4+ T cells or the exhausted CD8+ T cells are identified based on the single cell transcriptome data.
103. The method of claim 102, wherein the exhausted CD4+ T cells or the exhausted CD8+ T cells are identified by the method of any one of claims 1-45.
104. The method of claim 102 or 103, wherein each cell of the exhausted CD4+ T cells or the exhausted CD8+ T cells has an exhaustion score and/or a GSEA score equal to or higher than a threshold value.
105. The method of any one of claims 46-104, wherein the candidate tumor-reactive TCR induces activation of NFAT.
106. The method of any one of claims 46-105, wherein the candidate tumor-reactive TCR induces expression of CD69, IFN-γ, TNF-α, IL-2, and/or IL-18.
107. A nucleic acid encoding a TCR comprising the at least one candidate tumor-reactive TCR clonotype selected by the method of any one of claims 96-106.
108. A cell comprising a TCR comprising the at least one candidate tumor-reactive TCR clonotype selected by the method of any one of claims 96-106 or the nucleic acid of claim 107.
109. A pharmaceutical composition comprising a TCR comprising (a) the at least one candidate tumor-reactive TCR clonotype selected by the method of any one of claims 96-106, the nucleic acid of claim 107, or the cell of claim 108, and (b) a pharmaceutically acceptable carrier.
110. Use of a TCR comprising the at least one candidate tumor-reactive TCR clonotype selected by the method of any one of claims 96-106, the nucleic acid of claim 107, the cell of claim 108, or the pharmaceutical composition of claim 109 in the manufacturing of a medicament in treating a cancer in a subject in need thereof.
111. The use of claim 110, wherein the cancer is selected from the group consisting of bone cancer, blood cancer, lung cancer, liver cancer, pancreatic cancer, skin cancer, cancer of the head or neck, cutaneous or intraocular melanoma, uterine cancer, ovarian cancer, rectal cancer, cancer of the anal region, stomach cancer, colon cancer, breast cancer, prostate cancer, carcinoma of the sexual and reproductive organs, Hodgkin’s Disease, cancer of the esophagus, cancer of the small intestine, cancer of the endocrine system, cancer of the thyroid gland, cancer of the parathyroid gland, cancer of the adrenal gland, sarcoma of soft tissue, cancer of the bladder, cancer of the kidney, renal cell carcinoma, carcinoma of the renal pelvis, neoplasms of the central nervous system (CNS), neuroectodermal cancer, spinal axis tumors, glioma, meningioma, and pituitary adenoma.
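As an informal illustration of the selection logic recited in claims 91, 93, 96 and 104, the following Python sketch groups cells into clonotypes by identical CDR3 sequence, keeps only cells whose exhaustion score meets a threshold, gives proliferating cells a higher weight, and returns the top-ranked clonotypes. It is a minimal sketch under stated assumptions and is not the claimed method: the `Cell` record, the `rank_clonotypes` function, the weighted-cell-count ranking criterion, the weight of 2.0, the threshold of 0.5, and the CDR3 strings are all hypothetical and do not appear in the claims.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One T cell with paired transcriptome and scTCR information (hypothetical record)."""
    barcode: str
    cdr3: str                # CDR3 sequence taken from the scTCR data
    exhaustion_score: float  # score derived from the single-cell transcriptome data
    proliferating: bool      # whether the cell is annotated as proliferating


def rank_clonotypes(cells, threshold, proliferation_weight=2.0, top_n=10):
    """Rank clonotypes of exhausted cells by a weighted cell count.

    Assumptions (not taken from the claims): a clonotype's rank score is the
    sum of per-cell weights, and a proliferating cell counts
    `proliferation_weight` times as much as a non-proliferating one.
    """
    scores = defaultdict(float)
    for cell in cells:
        # Keep only cells whose exhaustion score is equal to or higher than the threshold.
        if cell.exhaustion_score < threshold:
            continue
        # Cells sharing the same CDR3 sequence are treated as the same clonotype.
        scores[cell.cdr3] += proliferation_weight if cell.proliferating else 1.0
    # Sort by descending score; break ties alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Made-up example data, for illustration only.
example_cells = [
    Cell("c1", "CASSLGQGYEQYF", 0.90, False),
    Cell("c2", "CASSLGQGYEQYF", 0.80, True),
    Cell("c3", "CASSPDRGNTEAFF", 0.70, False),
    Cell("c4", "CASSPDRGNTEAFF", 0.20, False),  # below threshold, excluded
    Cell("c5", "CASRTGESYEQYF", 0.95, False),
]

ranked = rank_clonotypes(example_cells, threshold=0.5)
for cdr3, score in ranked:
    print(cdr3, score)
```

In this toy input, the clonotype containing a proliferating cell ranks first because that cell contributes a weight of 2.0 rather than 1.0. A real pipeline would likely define clonotypes on paired alpha and beta chains and use whatever ranking criterion the description specifies; neither detail is stated in this section.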
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202463639993P | 2024-04-29 | 2024-04-29 | |
| US63/639,993 | 2024-04-29 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025229503A1 (en) | 2025-11-06 |
Family
ID=95743609
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/IB2025/054407 (Pending; WO2025229503A1 (en)) | 2024-04-29 | 2025-04-28 | Methods for identifying exhausted t cells and t-cell receptors thereof |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025229503A1 (en) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5391377A (en) | 1990-10-19 | 1995-02-21 | Cortecs Limited | Biphasic release formations for lipophilic acids |
| US20100105112A1 (en) | 2006-08-07 | 2010-04-29 | Christian Holtze | Fluorocarbon emulsion stabilizing surfactants |
| US20140155295A1 (en) | 2012-08-14 | 2014-06-05 | 10X Technologies, Inc. | Capsule array devices and methods of use |
2025
- 2025-04-28 WO PCT/IB2025/054407 patent/WO2025229503A1/en active Pending
Non-Patent Citations (9)
| Title |
|---|
| FIX, PHARM RES., vol. 13, 1996, pages 1760 - 1764 |
| FOROUTAN MOMENEH ET AL: "The Ratio of Exhausted to Resident Infiltrating Lymphocytes Is Prognostic for Colorectal Cancer Patient Outcome", CANCER IMMUNOLOGY RESEARCH, vol. 9, no. 10, 1 October 2021 (2021-10-01), US, pages 1125 - 1140, XP093299402, ISSN: 2326-6066, DOI: 10.1158/2326-6066.CIR-21-0137 * |
| GUO XINYI ET AL: "Global characterization of T cells in non-small-cell lung cancer by single-cell sequencing", NATURE MEDICINE (AUTHOR MANUSCRIPT), NATURE PUBLISHING GROUP US, NEW YORK, vol. 24, no. 7, 25 June 2018 (2018-06-25), pages 978 - 985, XP036542070, ISSN: 1078-8956, [retrieved on 20180625], DOI: 10.1038/S41591-018-0045-3 * |
| HANADA KEN-ICHI ET AL: "A phenotypic signature that identifies neoantigen-reactive T cells in fresh human lung cancers", CANCER CELL, CELL PRESS, US, vol. 40, no. 5, 21 April 2022 (2022-04-21), pages 479, XP087049624, ISSN: 1535-6108, [retrieved on 20220421], DOI: 10.1016/J.CCELL.2022.03.012 * |
| HANJIE LI ET AL: "Dysfunctional CD8 T Cells Form a Proliferative, Dynamically Regulated Compartment within Human Melanoma", CELL, vol. 176, no. 4, 7 February 2019 (2019-02-07), Amsterdam NL, pages 775 - 789, XP055712906, ISSN: 0092-8674, DOI: 10.1016/j.cell.2018.11.043 * |
| LOWERY FRANK J. ET AL: "Molecular signatures of antitumor neoantigen-reactive T cells from metastatic human cancers", SCIENCE - AUTHOR MANUSCRIPT, vol. 375, no. 6583, 25 February 2022 (2022-02-25), US, pages 877 - 884, XP093090824, ISSN: 0036-8075, Retrieved from the Internet <URL:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8996692/pdf/nihms-1790595.pdf> DOI: 10.1126/science.abl5447 * |
| PAUL: "Fundamental Immunology", 1993, RAVEN PRESS |
| SAMANEN, J. PHARM. PHARMACOL., vol. 48, 1996, pages 119 - 135 |
| STITES ET AL.: "Immunology", 1994, LANGE PUBLISHING |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Kristensen et al. | Neoantigen-reactive CD8+ T cells affect clinical outcome of adoptive cell therapy with tumor-infiltrating lymphocytes in melanoma | |
| Sethna et al. | RNA neoantigen vaccines prime long-lived CD8+ T cells in pancreatic cancer | |
| Hu et al. | Personal neoantigen vaccines induce persistent memory T cell responses and epitope spreading in patients with melanoma | |
| Keskin et al. | Neoantigen vaccine generates intratumoral T cell responses in phase Ib glioblastoma trial | |
| Li et al. | Neoantigen vaccination induces clinical and immunologic responses in non-small cell lung cancer patients harboring EGFR mutations | |
| Wang et al. | Pairing of single-cell RNA analysis and T cell antigen receptor profiling indicates breakdown of T cell tolerance checkpoints in atherosclerosis | |
| Ott et al. | An immunogenic personal neoantigen vaccine for patients with melanoma | |
| Lopez et al. | Autogene cevumeran with or without atezolizumab in advanced solid tumors: a phase 1 trial | |
| JP7668226B2 (en) | Compositions and methods for preparing T cell compositions and uses thereof | |
| Gros et al. | Prospective identification of neoantigen-specific lymphocytes in the peripheral blood of melanoma patients | |
| Qiu et al. | Single‐cell landscape highlights heterogenous microenvironment, novel immune reaction patterns, potential biomarkers and unique therapeutic strategies of cervical squamous carcinoma, Human Papillomavirus‐Associated (HPVA) and Non‐HPVA Adenocarcinoma | |
| Pritchard et al. | Exome sequencing to predict neoantigens in melanoma | |
| CN110741260B (en) | Methods for predicting the availability of disease-specific amino acid modifications for immunotherapy | |
| Borgers et al. | Personalized, autologous neoantigen-specific T cell therapy in metastatic melanoma: a phase 1 trial | |
| Hou et al. | The neurotransmitter calcitonin gene-related peptide shapes an immunosuppressive microenvironment in medullary thyroid cancer | |
| Kast et al. | Advances in identification and selection of personalized neoantigen/T-cell pairs for autologous adoptive T cell therapies | |
| US20230138309A1 (en) | Methods of isolating t-cells and t-cell receptors from tumor by single-cell analysis for immunotherapy | |
| JP2024511444A (en) | antigen-reactive T cell receptor | |
| JP2024506839A (en) | How to treat cancer using kinase inhibitors | |
| Li et al. | High-throughput screening of functional neo-antigens and their specific T-cell receptors via the jurkat reporter system combined with droplet microfluidics | |
| Alburquerque-González et al. | Design of personalized neoantigen RNA vaccines against cancer based on next-generation sequencing data | |
| US20240024439A1 (en) | Administration of anti-tumor vaccines | |
| Jiang et al. | Annexin A1-FPR1 Interaction in dendritic cells promotes immune microenvironment modulation in Thyroid Cancer | |
| WO2025229503A1 (en) | Methods for identifying exhausted t cells and t-cell receptors thereof | |
| WO2023131323A1 (en) | Novel personal neoantigen vaccines and markers |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 25725577; Country of ref document: EP; Kind code of ref document: A1 |