WO2024061628A1 - Allele specific expression - Google Patents
- Publication number
- WO2024061628A1 (PCT/EP2023/074437)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- tumour
- expressed
- mutation
- specific mutation
- specific
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/10—Gene or protein expression profiling; Expression-ratio estimation or normalisation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Definitions
- the present disclosure relates to methods for determining whether a tumour-specific mutated allele is likely to be expressed and for identifying tumour-specific mutated alleles that are expressed in a tumour.
- the present disclosure also relates to methods and compositions for the treatment of cancer which make use of or target neoantigens.
- cancer antigens such as those resulting from cancer specific variants (also referred to as “tumour-specific mutations”) represent promising therapeutic targets for immunotherapy provided that they are expressed by cancer cells (see e.g. Heemskerk, Kvistborg & Schumacher, 2013).
- RNA sequencing could be used instead of WGS/WES to identify tumour-specific mutations, with the advantage that this is not limited to known genes and can identify e.g. intragenic fusions, novel transcripts, etc.
- there is a need for improved methods to determine the expression of tumour-specific variants.
- RNA expression is insufficient in the context of cancer-specific mutations because it combines information from variant and normal transcripts of the gene (both of which may be present in the tumour cells) and provides an aggregate signal from healthy and tumour cells (since tumour samples are typically mixed samples comprising both types of cells, the latter being potentially genetically heterogeneous), altogether providing very uncertain information about whether a variant is in fact expressed.
- the present inventors have recognised that there is a need for a more sensitive approach that can take into account all of the factors mentioned above to more confidently identify the presence of variants in RNA expression data.
- the inventors have devised an approach to determine the probability that a tumour-specific mutation is expressed in a tumour using RNA sequence data from one or more samples comprising tumour cells or genetic material derived therefrom, as well as an approach to determine whether the RNA sequence data that is available provides sufficient information to determine whether a tumour-specific mutation is expressed in a sample.
- This method finds particular use in identifying neoantigens, for example for the purpose of cancer therapy or prognosis.
- when assessing whether a T cell reaction can be observed in vitro for a plurality of tumour-specific mutations identified e.g. from genomic sequence data, the present inventors have identified that many mutations that do trigger a reaction are not identified as expressed in RNA-seq data. In other words, the inventors have identified a subset of mutations for which no expression is detected, but which were found to be immunogenic. There are multiple possible explanations for this. For example, it is possible that these mutations are immunogenic but not expressed due to immuno-editing. In other words, it is possible that there is truly no expression of the mutation. Another possibility is that the variant is expressed but there is low power to detect expression at this locus in the RNA-seq data.
- the present inventors devised a method to calculate the power to detect a mutation in expression data.
- This method is particularly useful to identify whether known mutations that may be expressed at low levels (where the mutation and/or the locus is expressed at low level and/or the tumour purity of the sample is low) are truly not expressed or likely false negatives (mutations that may be expressed but are not detected in the data at hand).
- the method can also be used to identify mutations directly from RNA expression data. This may be particularly useful to identify splicing variants such as e.g. retained introns and skipped exons, or any variant that cannot be straightforwardly identified using genomic data for example due to mappability problems such as fusions.
- the method uses a rigorous statistical framework to classify individual mutations as expressed, and provides a probability reflecting the confidence in the assignment.
- a method of determining whether a tumour-specific mutation is likely to be expressed in a subject, comprising: providing, or obtaining, RNA sequence data from one or more samples from the subject comprising tumour genetic material, the RNA sequence data comprising for each of the one or more samples: the number of RNA reads in the sample that show the tumour-specific mutation (b), and the total number of RNA reads at the location of the tumour-specific mutation (d); and determining the likelihoods of the sequence data if the tumour-specific mutation is (i) expressed (P(b, d | t, ρ, M1)) and (ii) not expressed (P(b, d | t, ρ, M0)).
- the likelihoods of the sequence data may also be referred to as the probability of observing the sequence data if the mutation is expressed, and the probability of observing the sequence data if the mutation is not expressed.
- the method of the present aspect may have one or more of the following features.
- the likelihoods may be the probabilities of the sequence data in view of (conditional on) a tumour fraction for each of the one or more samples, and the fraction of the total number of reads at the location of the tumour-specific mutation that originate from a population of cells in the one or more samples that does not comprise the tumour-specific mutation (e.g. normal cells), the samples further comprising a population of cells that comprises the tumour-specific mutation (e.g. tumour cells) representing a fraction of the sample equal to the tumour fraction.
- the likelihoods may depend on the probability of sampling a sequence read comprising the tumour-specific mutation from a sample if the tumour-specific mutation is expressed or not expressed, respectively, depending on the sequencing error rate, the genotypes of the tumour and normal cell populations, the tumour fraction of the sample, and the fraction of total read counts for the gene comprising the tumour-specific mutation which is due to the normal cell population.
- the likelihoods of the sequence data may be the probabilities of observing the sequence data in view of (conditional on) a tumour fraction for each of the one or more samples (t), and the fraction (ρ) of the total number of reads at the location of the tumour-specific mutation that originate from a population of cells in the one or more samples that does not comprise the tumour-specific mutation.
- the samples may therefore comprise a population of cells that does not comprise the tumour-specific mutation and a tumour population of cells that comprises the tumour-specific mutation, the latter representing a fraction of the sample equal to the tumour fraction.
- the likelihoods may be determined using equations (5) or (5’).
- the method may further comprise comparing the likelihoods obtained thereby determining whether the tumour-specific mutation is likely to be expressed in the subject.
- the posterior probability that the tumour-specific mutation is expressed, P(M1 | b, d, t, ρ), may be provided by equation (13), (14) or (14’), in particular by any of the different formulations of P(M1 | b, d, t, ρ).
- Comparing the likelihoods may comprise determining the power to detect whether the tumour-specific mutation is expressed at a predetermined false positive rate, wherein the power to detect whether the tumour-specific mutation is expressed is the area under the curve of the likelihood of the number of reads that show the tumour-specific mutation if the tumour-specific mutation is expressed (P(b, d | t, ρ, M1), P(b | M1)) above a threshold number of reads (bc), the threshold being set from the likelihood if the tumour-specific mutation is not expressed (P(b, d | t, ρ, M0), P(b | M0)) so as to meet the predetermined false positive rate.
- the power to detect a mutation as being expressed may be determined using equation (8), where P(b | M1) is the likelihood of the sequence data if the mutation is expressed, which may be defined in equation (5) or (5’).
- the choice of a false positive rate may depend on the situation and the user’s preferences. For example, a false positive rate of 0.05 or 0.01 may be chosen.
- the method may further comprise determining whether the number of RNA reads in the sample that show the tumour-specific mutation (b) is below or above the threshold number of reads (bc).
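The threshold-and-power calculation described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the likelihoods under the "not expressed" and "expressed" hypotheses are plain binomials with fixed per-read probabilities `q_not_expressed` and `q_expressed` (hypothetical names); the marginal likelihoods of equations (5) or (5’) are not reproduced here. A mutation is called expressed when b is at or above the threshold, which is also an assumption.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of k variant reads out of n total reads."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def read_threshold_and_power(d, q_not_expressed, q_expressed, alpha=0.05):
    """Return (b_c, power) for a total read depth d.

    b_c is the smallest read count whose false positive rate
    P(b >= b_c | not expressed) is at most alpha; power is
    P(b >= b_c | expressed). Both binomial models are assumptions.
    """
    for b_c in range(d + 2):  # b_c = d + 1 always satisfies the bound
        if upper_tail(b_c, d, q_not_expressed) <= alpha:
            return b_c, upper_tail(b_c, d, q_expressed)


b_c, power = read_threshold_and_power(d=20, q_not_expressed=0.001, q_expressed=0.3)
print(b_c, round(power, 4))
```

With no reads at the locus (d = 0) the sketch returns a power of 0, which corresponds to the situation where the data cannot distinguish a lack of expression from a lack of coverage.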
- the inventors have devised a Bayesian framework for determining the probability that a tumour specific mutation is expressed, and a statistical hypothesis testing framework to decide whether a tumour specific mutation is expressed and quantify the power of this test.
- Both frameworks are based on quantifying the likelihood of an observed number of RNA reads comprising the mutation (in the context of a total number of RNA reads at the location of the tumour specific mutation) if the tumour-specific mutation is expressed, and the likelihood of an observed number of RNA reads comprising the mutation (in the context of a total number of RNA reads at the location of the tumour specific mutation) if the tumour-specific mutation is not expressed. These are then compared to obtain a posterior probability (which depends on the likelihood ratio), or compared to quantify the true positive rate associated with a threshold number of reads comprising the mutation that is required to call the tumour specific mutation as expressed.
- both frameworks make it possible to rigorously determine whether a tumour specific mutation is expressed in view of the data about expression at the locus of the mutation, using quantities and parameters that are directly interpretable and linked to biological phenomena.
- the Bayesian framework additionally can incorporate an informative prior, such as e.g. knowledge of whether the gene comprising the tumour-specific mutation is expected to be expressed, whether mutations in this disease type are typically expressed, etc.
- a tumour-specific mutation may be considered to be likely to be expressed if the posterior probability that the tumour-specific mutation is expressed is above a predetermined threshold.
- a tumour-specific mutation may be considered to be unlikely to be expressed if the posterior probability that the tumour-specific mutation is expressed is below a predetermined threshold.
- the predetermined threshold may be selected as the threshold that maximises the true positive rate while keeping the false positive rate below a predetermined value (such as e.g. 0.05) in a set of tumour-specific mutations with known expression status.
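A minimal sketch of this threshold selection follows, assuming a labelled set of mutations is available as two parallel lists (posterior probabilities and known expression status). The function name, the use of "at or above" when calling a mutation expressed, and the use of "at most" for the false positive bound are assumptions made for illustration.

```python
def select_threshold(posteriors, is_expressed, max_fpr=0.05):
    """Return (tpr, threshold) maximising TPR subject to FPR <= max_fpr.

    posteriors: posterior probabilities of expression, one per mutation.
    is_expressed: known expression status (bool), same order.
    Returns None if no candidate threshold satisfies the FPR bound.
    """
    positives = sum(is_expressed)
    negatives = len(is_expressed) - positives
    best = None
    for threshold in sorted(set(posteriors)):
        # Assumption: a mutation is called expressed at or above the threshold.
        calls = [p >= threshold for p in posteriors]
        tp = sum(c and e for c, e in zip(calls, is_expressed))
        fp = sum(c and not e for c, e in zip(calls, is_expressed))
        tpr = tp / positives if positives else 0.0
        fpr = fp / negatives if negatives else 0.0
        if fpr <= max_fpr and (best is None or tpr > best[0]):
            best = (tpr, threshold)
    return best


print(select_threshold([0.9, 0.8, 0.6, 0.4, 0.2, 0.1],
                       [True, True, True, False, True, False]))
```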
- the predetermined threshold may be approximately 0.5, between 0.4 and 0.7, or between 0.5 and 0.6.
- a tumour-specific mutation may be considered to be likely to be expressed if the power to detect whether the tumour-specific mutation is expressed is above a predetermined threshold and the number of RNA reads in the sample that show the tumour-specific mutation (b) is above a threshold number of reads (bc).
- a tumour-specific mutation may be considered to be unlikely to be expressed if the power to detect whether the tumour-specific mutation is expressed is above a predetermined threshold and the number of RNA reads in the sample that show the tumour-specific mutation (b) is below a threshold number of reads (bc).
- a tumour-specific mutation may be considered to be likely to be expressed if the power to detect whether the tumour-specific mutation is expressed is below a predetermined threshold and the number of RNA reads in the sample that show the tumour-specific mutation (b) is below a threshold number of reads (bc).
- the RNA sequence data may be insufficient to determine confidently whether the tumour-specific mutation is truly expressed or not, and the tumour-specific mutation may therefore be considered likely to be expressed by default. This is by contrast to previous methods that would simply ignore tumour-specific mutations that are detected at low level in RNA sequence data, regardless of whether this low level is truly indicative of a lack of expression or instead results from limitations of the data itself.
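The three-way rule above (expressed, not expressed, or expressed by default when the data are uninformative) can be sketched as a small function. The power cut-off of 0.8 and the treatment of b at or above the threshold as "expressed" regardless of power are assumptions; the text does not fix either.

```python
def classify_expression(b, b_c, power, min_power=0.8):
    """Classify a mutation from its variant read count and detection power.

    b: number of RNA reads showing the mutation.
    b_c: threshold number of reads.
    power: power to detect expression at this locus.
    min_power: hypothetical cut-off on power (assumption).
    """
    if b >= b_c:
        return "likely expressed"
    if power >= min_power:
        # Enough power, yet too few variant reads: a credible negative.
        return "unlikely expressed"
    # Too little power to trust the negative: keep the mutation by default.
    return "likely expressed (default: insufficient power)"


print(classify_expression(b=0, b_c=2, power=0.3))
```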
- obtaining RNA sequence data comprising the number of RNA reads in the sample that show the tumour-specific mutation (b), the number of RNA reads in the sample that show the corresponding germline allele, and the total number of RNA reads at the location of the tumour-specific mutation (d) may instead comprise obtaining RNA sequence data comprising at least two of: the number of RNA reads in the sample that show the tumour-specific mutation (b), the number of RNA reads in the sample that show the corresponding germline allele(s) (or do not show the tumour-specific mutation), and the total number of RNA reads at the location of the tumour-specific mutation (d).
- the RNA sequence data may comprise RNA sequence data from a plurality of samples from the subject. The plurality of samples may be tumour samples.
- the posterior probability that the tumour specific mutation is expressed may be a posterior probability that the tumour specific mutation is expressed ubiquitously in a tumour of the subject. Such a posterior probability may be determined using equation (14’) (and in particular any of the formulations of equation (14’)).
- Each sample may be assumed to comprise: a first population of cells that each have a genotype (GV) comprising at least one copy of the tumour-specific mutation, and a second population of cells that each have a genotype that does not comprise the tumour-specific mutation (GN), the first population of cells representing a proportion equal to the tumour fraction (t) for the respective sample.
- the likelihoods may be obtained as sums over a plurality of joint genotypes comprising a genotype for the first population of cells and a genotype for the second populations of cells.
- the sum may be a weighted sum, wherein the likelihoods assuming each of the respective joint genotypes are weighted by a probability associated with the joint genotype.
- the probabilities associated with each of the respective joint genotypes may sum to 1.
- the second population of cells may be assumed to have a homozygous diploid reference allele genotype (AA).
- the joint genotypes may be determined from estimates of the major and minor copy numbers for the locus by assuming that the possible genotypes of the first population of cells include any genotype with a copy number equal to the sum of the major and minor copy number and a number of copies of the variant allele between the minor copy number and the major copy number.
- the possible genotypes of the first population of cells may include genotypes assuming that the mutation to the variant allele occurred prior to or before any copy number event (decrease or increase of copy number at the locus compared to a diploid genotype). Each of the possible genotypes considered may be associated with an equal probability.
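The enumeration of candidate tumour genotypes from major and minor copy numbers can be sketched as follows. The string notation ("A" for the reference allele, "B" for the variant allele) is an assumption for illustration, and at least one variant copy is required because the first population is defined as carrying the mutation.

```python
def candidate_tumour_genotypes(major_cn, minor_cn):
    """List (genotype, probability) pairs for the tumour cell population.

    Total copy number is major_cn + minor_cn; the number of variant copies
    ranges from the minor to the major copy number (at least 1).
    Each candidate genotype receives equal probability.
    """
    total = major_cn + minor_cn
    low, high = sorted((minor_cn, major_cn))
    variant_counts = list(range(max(low, 1), high + 1))
    if not variant_counts:
        return []
    weight = 1.0 / len(variant_counts)
    return [("A" * (total - v) + "B" * v, weight) for v in variant_counts]


# The normal population is assumed homozygous diploid reference ("AA").
print(candidate_tumour_genotypes(major_cn=2, minor_cn=1))
```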
- the tumour-specific mutation may be assumed to be ubiquitous in the one or more samples.
- the tumour-specific mutation may be a clonal mutation or a mutation assumed to be clonal in the tumour of the subject.
- the method may comprise determining whether the tumour-specific mutation is likely to be clonal in the subject.
- also described herein is a method of determining whether a clonal tumour-specific mutation is likely to be expressed in a subject.
- the likelihoods of the sequence data if the tumour-specific mutation is (i) expressed and (ii) not expressed may depend on the fraction (ρ) of the total expression of the gene comprising the tumour-specific mutation (number of reads or transcripts per million, TPM) that originates from a population of cells in the one or more samples that does not comprise the tumour-specific mutation.
- References to the total expression of the gene comprising the tumour-specific mutation may refer to the total expression of one or more transcripts derived from the gene.
- the parameter ρ may be set to a predetermined default value, or may be estimated from data comprising RNA sequence data for a plurality of samples having different tumour fractions.
- ρ may be set to a value that depends on the expression level for the gene in one or more tumours and normal samples.
- the expression level for the gene in one or more tumours and normal samples may be obtained from a database.
- expected expression of the gene in samples from the same tumour type and normal samples may be used, or differential expression between one or more samples from the same tumour type and normal samples.
- ρ may be set to 1 when determining the likelihood of the sequence data if the tumour-specific mutation is not expressed.
- the parameter ρ may be estimated from data comprising RNA sequence data for a plurality of samples having different tumour fractions, wherein ρ is derived from the slope (g) of a regression model, e.g. as a function of TPM_T and TPM_N, where TPM_T and TPM_N are the total expression values for the gene comprising the tumour-specific mutation for a tumour sample and a normal sample, respectively.
- the values of TPM_T and TPM_N may be obtained using a regression model fitted to values of the total expression for the gene (TPM) in a plurality of tumour samples with at least two different tumour fractions (t), as a function of tumour fraction.
- the parameter ρ is preferably set or estimated on a gene or transcript basis, i.e. using data or prior knowledge about the particular gene or transcript comprising the tumour-specific mutation.
- the parameter ρ may be estimated using expression data from the same subject, or from subjects having the same type of tumour as the subject.
- the parameter may be estimated individually for the gene i comprising the tumour-specific mutation.
- the regression model may be a linear regression model.
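As an illustration of the regression step, the sketch below fits total gene expression (TPM) against tumour fraction by ordinary least squares and reads off the expression expected for a pure normal sample (t = 0) and a pure tumour sample (t = 1). The final conversion into a fraction of reads originating from normal cells is an assumed mixture formula, not necessarily the formulation used in the source; all function names are hypothetical.

```python
def fit_line(tumour_fractions, tpms):
    """Ordinary least squares fit of TPM = intercept + g * t."""
    n = len(tumour_fractions)
    mean_t = sum(tumour_fractions) / n
    mean_y = sum(tpms) / n
    sxx = sum((t - mean_t) ** 2 for t in tumour_fractions)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(tumour_fractions, tpms))
    g = sxy / sxx  # requires at least two distinct tumour fractions
    return mean_y - g * mean_t, g


def normal_read_fraction(t, tpm_normal, tpm_tumour):
    """Assumed mixture formula for the share of reads from normal cells."""
    normal_part = (1.0 - t) * tpm_normal
    total = normal_part + t * tpm_tumour
    if total <= 0:
        raise ValueError("no expression expected from either population")
    return normal_part / total


intercept, g = fit_line([0.2, 0.5, 0.8], [14.0, 20.0, 26.0])
tpm_normal = intercept        # extrapolated expression at t = 0
tpm_tumour = intercept + g    # extrapolated expression at t = 1
print(normal_read_fraction(0.5, tpm_normal, tpm_tumour))
```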
- the probability of the tumour-specific mutation being expressed may be the mean μ_π of a Beta distribution (p(π)) for parameter π of a Bernoulli distribution for the variable capturing whether the tumour-specific mutation is expressed (E).
- the parameter μ_π may be set to a predetermined value.
- the parameter μ_π may be set to a first predetermined value when the total number of RNA reads at the location of the tumour-specific mutation is above, or at or above, a predetermined threshold.
- the predetermined threshold may be 0 (d>0).
- μ_π may be set to a first predetermined value or a second predetermined value depending on whether the gene comprising the tumour-specific mutation is expected to be expressed.
- the gene comprising the tumour-specific mutation may be considered as expected to be expressed when the total expression of the gene in the sample is above a predetermined threshold, and considered as expected not to be expressed otherwise.
- the first predetermined value of μ_π may be 0.5.
- the second predetermined value of μ_π may be below 0.5, such as e.g. 0.2, 0.1, 0.05, or 0.01.
- the predetermined threshold on total expression of the gene comprising the tumour-specific mutation may be 1 TPM (transcript per million).
- the predetermined threshold on the total number of RNA reads at the location of the tumour-specific mutation may be 0.
- the predetermined threshold on total expression of the gene comprising the tumour specific mutation may be set depending on the expected level of expression of the gene in the one or more tumour samples, such as e.g. based on the expression level of the gene in one or more further tumour samples, such as e.g. tumour samples from the same type of tumour as that of the subject.
- the expression level of the gene in one or more further tumour samples may be obtained e.g. from a database.
- a value for the prior probability of the mutation being expressed may depend on the subject, the tumour, the mutation, or a combination of these.
- a value may be determined using data previously acquired on a relevant cohort of patients, such as e.g. patients that suffer from the same type or subtype of cancers.
- a value may be set arbitrarily based on prior knowledge about the cancer type or mutation. For example, specific mutations that have been found across a plurality of cancer samples and have been identified as often being expressed in these samples may be assigned a higher than 0.5 probability.
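The gene-level rule for choosing the prior can be sketched as below. It implements only the variant in which the prior depends on whether the gene is expected to be expressed (total expression above a TPM threshold); the default values 0.5, 0.1 and 1 TPM are taken from the examples in the text, while the function and parameter names are hypothetical.

```python
def prior_expressed(gene_tpm, tpm_threshold=1.0,
                    expected_value=0.5, unexpected_value=0.1):
    """Prior probability that the mutation is expressed.

    gene_tpm: total expression of the gene carrying the mutation (TPM).
    The gene counts as expected to be expressed when its total expression
    is strictly above tpm_threshold.
    """
    return expected_value if gene_tpm > tpm_threshold else unexpected_value


print(prior_expressed(5.0), prior_expressed(0.2))
```

A cohort-derived or mutation-specific prior, as also contemplated above, would replace the two constants with values estimated from previously acquired data.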
- the present inventors have adapted a simple Bayesian framework with a single prior to instead include, in cases where there is insufficient evidence from the data (i.e.
- the likelihoods may be conditional on the tumour purity of the one or more samples.
- the tumour purity of the one or more samples may be estimated using genomic sequence data from the one or more samples.
- the tumour purity may be estimated using methods known in the art such as e.g. ASCAT or Sequenza.
- the tumour purity for a sample may be the purity with highest confidence obtained from a probabilistic method for estimating tumour purity from DNA sequence data, such as e.g. a maximum a posteriori estimate.
- the genotypes of the first and second populations of cells (also referred to as “joint genotype”) may be set to predetermined genotypes, such as e.g. a first population of cells considered to be heterozygous diploid, comprising a copy of the tumour-specific mutation and a copy of a non-mutated allele, and a second population of cells considered to be diploid with two non-mutated alleles.
- the genotypes of the first and second populations of cells may be estimated using methods known in the art such as e.g. ASCAT or Sequenza.
- the genotypes for a sample may be the genotypes with highest confidence obtained from a probabilistic method for estimating genotypes in a mixed population from DNA sequence data, such as e.g. a maximum a posteriori estimate joint genotype.
- the probability of observing the sequence data may depend on the genotypes of the first and second populations of cells, the total number of reads at the locus of the tumour specific mutation, the tumour fraction (t), the fraction (ρ) of the total expression of the gene comprising the tumour-specific mutation that originates from the second population of cells, and the ratio of expression of the allele(s) not comprising the tumour-specific mutation relative to the total expression at the locus of the tumour-specific mutation (λ).
- the ratio of expression of the alleles not comprising the tumour-specific mutation relative to the total expression at the locus of the tumour-specific mutation (λ) may be assumed to be a random variable with a first distribution if the tumour-specific mutation is expressed and a second distribution if the tumour-specific mutation is not expressed.
- the second distribution may be a beta distribution with parameters α0, β0.
- the likelihoods may be marginal likelihoods obtained from the probability of observing the sequence data by integrating out the ratio of expression of the alleles not comprising the tumour-specific mutation relative to the total expression at the locus of the tumour-specific mutation (λ).
- the step of determining the likelihoods may comprise a numerical integration, e.g. by a processor, of the probability of observing the sequence data over all possible values (i.e. between 0 and 1) of the ratio of expression of the alleles not comprising the tumour-specific mutation relative to the total expression at the locus of the tumour-specific mutation (λ).
- the likelihoods may be integrals over all possible values of λ of the probability of observing the sequence data multiplied by the distribution of the ratio of expression of the alleles not comprising the tumour-specific mutation relative to the total expression at the locus of the tumour-specific mutation (λ).
- the likelihoods may be obtained as sums of integrals (e.g. sums of marginal likelihoods) over a plurality of possible genotypes of the first and second population of cells. The sums may be weighted by the respective probabilities of the plurality of possible genotypes.
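The marginal likelihood described above, i.e. the probability of the read counts integrated over the allelic expression ratio weighted by its distribution, can be approximated numerically. The sketch below uses a midpoint rule with a Beta weighting density and a binomial read-count model. The mapping from the ratio to a per-read probability is passed in as `read_prob`, a hypothetical callable, because the exact form of that mapping is not reproduced here.

```python
from math import comb, exp, lgamma, log


def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x in (0, 1)."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(x) + (b - 1) * log(1 - x))


def binom_pmf(k, n, p):
    """Binomial probability of k variant reads out of n total reads."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def marginal_likelihood(b, d, read_prob, a, beta, steps=2000):
    """Midpoint-rule integral of Binomial(b; d, read_prob(x)) * Beta(x; a, beta).

    The midpoint rule never evaluates the integrand at 0 or 1, where the
    Beta density may be unbounded; accuracy degrades for a or beta below 1.
    """
    width = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * width
        total += binom_pmf(b, d, read_prob(x)) * beta_pdf(x, a, beta) * width
    return total


# Sanity check: with a uniform weighting the result is 1 / (d + 1).
print(marginal_likelihood(3, 10, lambda x: 1.0 - x, 1.0, 1.0))
```

A sum over candidate joint genotypes would call `marginal_likelihood` once per genotype and combine the results weighted by the genotype probabilities.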
- the probability of observing the sequence data conditional on the genotypes of the first and second populations of cells (GV, GN), the total number of reads at the locus of the tumour specific mutation, the tumour fraction (t), the fraction (ρ) of the total expression of the gene comprising the tumour-specific mutation that originates from the second population of cells, and the ratio of expression of the alleles not comprising the tumour-specific mutation relative to the total expression at the locus of the tumour-specific mutation (λ) may be assumed to follow a binomial distribution with a parameter q(G, ρ, λ, t) representing the probability of sampling a read with the tumour specific mutation from a tumour sample.
- q(G, ρ, λ, t) is given by any of equations (2), (2’), (2’’), optionally wherein c(GV) and c(GN) are respectively the total number of copies of the locus comprising the tumour-specific mutation in the first and second populations of cells, ε is a sequencing error rate, μ(GN, ε, λ) is the probability of sampling a read with the tumour-specific mutation from the second population of cells and μ(GV, ε, λ) is the probability of sampling a read with the tumour-specific mutation from the first population of cells.
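To make the role of the binomial parameter concrete, the following sketch uses a deliberately simplified mixture model. It is not equation (2), (2’) or (2’’): it ignores copy numbers and the explicit dependence on tumour fraction, and folds the mixture of populations into a single read-level fraction. The names `lam` (share of tumour expression from non-mutated alleles), `rho` (share of reads from normal cells) and `eps` (error rate) are assumptions chosen for illustration.

```python
from math import comb


def variant_read_probability(lam, rho, eps=0.001):
    """Simplified probability that a read at the locus shows the mutation.

    Normal cells carry no variant allele, so variant reads from them arise
    only from errors. In tumour cells a share (1 - lam) of expression comes
    from the mutated allele.
    """
    mu_normal = eps
    mu_tumour = (1.0 - lam) * (1.0 - eps) + lam * eps
    return rho * mu_normal + (1.0 - rho) * mu_tumour


def read_count_probability(b, d, lam, rho, eps=0.001):
    """Binomial probability of b variant reads out of d total reads."""
    q = variant_read_probability(lam, rho, eps)
    return comb(d, b) * q**b * (1.0 - q) ** (d - b)


# With no expression from the mutated allele (lam = 1), only errors remain.
print(variant_read_probability(lam=1.0, rho=0.5))
```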
- the posterior probability may depend on the ratio (r) of the likelihoods of the sequence data if the tumour-specific mutation is (i) expressed (P(b, d | t, ρ, M1)) and (ii) not expressed (P(b, d | t, ρ, M0)).
- the posterior probability may be given by equation (13), (14) or (14’), where π is the prior probability of the mutation being expressed.
- the prior probability of the mutation being expressed may be the mean of a Beta probability distribution for parameter π (p(π)), where the variable representing whether the variant is expressed is assumed to follow a Bernoulli distribution with parameter π.
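Given the two likelihoods and a prior, the posterior probability of expression follows from Bayes' theorem for two competing hypotheses. The sketch below is a generic illustration rather than a transcription of equations (13), (14) or (14’). It accepts one likelihood pair per sample and multiplies across samples, which assumes the samples are conditionally independent.

```python
from math import prod


def posterior_expressed(lik_expressed, lik_not_expressed, prior=0.5):
    """Posterior probability that the mutation is expressed.

    lik_expressed / lik_not_expressed: per-sample likelihoods of the read
    counts under each hypothesis (sequences of equal length).
    prior: prior probability that the mutation is expressed.
    """
    numerator = prior * prod(lik_expressed)
    denominator = numerator + (1.0 - prior) * prod(lik_not_expressed)
    return numerator / denominator  # undefined if both likelihoods are 0


# Single sample with likelihood ratio r = 0.2 / 0.05 = 4 and a flat prior.
print(posterior_expressed([0.2], [0.05], prior=0.5))
```

For a single sample this is equivalent to r·prior / (r·prior + 1 − prior), with r the likelihood ratio of "expressed" to "not expressed".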
- the method may be computer implemented.
- the step of obtaining the RNA sequence data may be performed by a processor, and the step of determining the likelihoods may be performed by said processor.
- the step of obtaining the RNA sequence data may comprise receiving RNA sequence data comprising sequence reads from one or more samples from the subject, and determining from said sequence reads at least two of: the number of RNA reads in the sample that show the tumour-specific mutation (b), the number of reads in the sample that show the corresponding germline allele(s) (or do not show the tumour-specific mutation), and the total number of reads at the location of the tumour-specific mutation (d).
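Because any two of the three read counts determine the third, the missing count can be derived as sketched below. This assumes every read at the locus either shows the tumour-specific mutation or does not; the function and argument names are hypothetical.

```python
def complete_read_counts(b=None, ref=None, d=None):
    """Return (b, ref, d) given at least two of the three counts.

    b: reads showing the tumour-specific mutation.
    ref: reads not showing the mutation.
    d: total reads at the locus (assumed to equal b + ref).
    """
    if sum(x is not None for x in (b, ref, d)) < 2:
        raise ValueError("at least two of b, ref and d are required")
    if b is None:
        b = d - ref
    elif d is None:
        d = b + ref
    elif ref is None:
        ref = d - b
    return b, ref, d


print(complete_read_counts(b=3, d=10))
```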
- the step of determining the posterior probability that the tumour-specific mutation is expressed and/or the power to detect a tumour-specific mutation that is expressed may be computer implemented.
- the step of determining the posterior probability that the tumour-specific mutation is expressed and/or the power to detect a tumour-specific mutation that is expressed may comprise a step of numerical integration to obtain the likelihoods.
- the step may comprise determining the posterior probability that the mutation is expressed in view of a prior probability of the mutation being expressed, and the probabilities of observing the sequence data if the tumour-specific mutation is (i) expressed and (ii) not expressed, by solving a plurality of one dimensional integrals (such as e.g. a pair of integrals for each sample, respectively representing the assumption that the mutation is expressed and not expressed) integrating the probability of the observed sequence data over all possible values of the proportion of RNA reads from the tumour cells in the sample that comprise the tumour-specific mutation.
- the step of determining the power to detect a tumour-specific mutation that is expressed may comprise solving a plurality of one dimensional integrals (such as e.g. a pair of integrals for each sample, respectively representing the assumption that the mutation is expressed and not expressed) integrating the probability of the observed sequence data over all possible values of the proportion of RNA reads from the tumour cells in the sample that comprise the tumour-specific mutation.
- the step of providing may comprise one or more steps, all or some of which are computer implemented.
- the method may further comprise obtaining or providing, for each sample, at least one estimate of the tumour fraction, and optionally at least one corresponding set of one or more candidate joint genotypes comprising a genotype for a population of cells that does not comprise the tumour-specific mutation (e.g. a normal cell population) and a genotype for a population of cells that does comprise the tumour-specific mutation (e.g. a tumour population).
- a tumour fraction estimate may be obtained using a method for determining allele-specific copy number profiles in samples comprising a mixture of tumour and normal cells.
- Methods for doing this using genomic sequencing or array data are known in the art, for example by expressing the allele specific data as a function of parameters including allele-specific copy numbers, tumour aneuploidy and tumour cell fraction, and identifying the value of these parameters that best fit all of the data. Examples of such methods include e.g. ASCAT (Van Loo et al., 2010), amongst others. Alternatively, a tumour fraction estimate may be determined experimentally. Thus, the method may further comprise obtaining a tumour fraction estimate for each of the one or more samples.
- obtaining, by a processor, for each sample, at least one estimate of the tumour fraction may comprise the processor determining an estimate of the tumour fraction and allele specific copy numbers using genomic sequence data from the one or more samples, and determining, by said processor, a set of one or more candidate joint genotypes associated with said allele specific copy numbers.
- a set of one or more candidate genotypes may be obtained using allele-specific copy numbers or variables derived therefrom (or conversely, from which such allele-specific copy numbers can be derived, such as B allele fraction and log R) for the tumour cells in a mixed sample.
- Allele-specific copy numbers for the tumour cells in a mixed sample may be obtained using a method for determining allele-specific copy number profiles in samples comprising a mixture of tumour and normal cells, such as e.g. ASCAT (Van Loo et al., 2010), Sequenza (Favero et al., 2015), or ascatNgs (Raine et al., 2016), amongst others.
- the method may further comprise obtaining, for each of the one or more samples, estimates for at least two of: the copy number of the major allele in the tumour cells in the sample, the copy number of the minor allele in the tumour cells in the sample, and the total copy number at the location of the tumour-specific mutation in the tumour cells in the sample.
- the estimates of copy number in the tumour cells in the sample may represent a summarised (e.g. average) estimate over the entire population of tumour cells in the sample.
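Two of the three copy number quantities are sufficient because the total copy number is the sum of the major and minor allele copy numbers. A minimal sketch of completing and checking the triplet is given below; the function name `complete_copy_numbers` is hypothetical, and non-integer inputs are allowed because the estimates may be averages over the tumour cell population.

```python
import math

def complete_copy_numbers(major=None, minor=None, total=None):
    """Fill in the missing one of (major, minor, total) from the other two."""
    if sum(v is not None for v in (major, minor, total)) < 2:
        raise ValueError("at least two copy number estimates are required")
    if total is None:
        total = major + minor
    elif major is None:
        major = total - minor
    elif minor is None:
        minor = total - major
    # Validate: the minor allele cannot exceed the major allele, and the parts must sum to the total.
    if minor < 0 or minor > major or not math.isclose(major + minor, total):
        raise ValueError("inconsistent copy number estimates")
    return major, minor, total
```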
- the method may further comprise repeating the method for a plurality of tumour-specific mutations identified in the subject.
- the method may further comprise ranking or otherwise prioritising the plurality of tumour-specific mutations at least in part based on their determined probability of being expressed in the subject.
- the method may further comprise identifying one or more tumour-specific mutations in the subject. Identifying one or more tumour-specific mutations in the subject may be performed using genomic or transcriptomic sequence data from one or more samples from the subject comprising tumour genetic material and sequence data from one or more germline samples from the subject, such as by comparing said sequence data.
- Identifying one or more tumour-specific mutations in the subject may comprise aligning sequence data from at least one sample comprising tumour genetic material to a reference sequence and identifying positions where the sequence of the sample differs from the reference sequence.
- the method may further comprise aligning sequence data from at least one germline sample to the reference sequence and identifying positions where the sequence of the sample comprising tumour genetic material differs from the germline sample.
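The comparison described above can be sketched on toy data. The example assumes that the tumour and germline sequences are already aligned to the reference and have the same length, which real pipelines achieve with dedicated aligners and variant callers; the function names are hypothetical.

```python
def differing_positions(sequence, reference):
    """Map each position where the sequence differs from the reference to (ref, alt)."""
    return {
        i: (ref, alt)
        for i, (ref, alt) in enumerate(zip(reference, sequence))
        if ref != alt
    }

def tumour_specific_mutations(tumour, germline, reference):
    """Keep tumour variants that are not also present in the germline sample."""
    tumour_vars = differing_positions(tumour, reference)
    germline_vars = differing_positions(germline, reference)
    return {
        pos: change
        for pos, change in tumour_vars.items()
        if germline_vars.get(pos) != change
    }

reference = "ACGTACGT"
germline = "ACGTTCGT"   # germline variant at position 4 (A>T)
tumour = "ACCTTCGT"     # same germline variant plus a somatic change at position 2 (G>C)
somatic = tumour_specific_mutations(tumour, germline, reference)
```

Only the change absent from the germline sample is retained, which is why a matched germline sample helps separate tumour-specific mutations from inherited variants.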
- the reference sequence may be a reference genome or a reference transcriptome.
- the step of providing sequence data from one or more samples from the subject may comprise or consist of receiving sequence data from a user (for example through a user interface), from one or more computing device(s), or from one or more data stores or databases.
- the step of providing sequence data may further comprise sequencing (or otherwise determining the sequence composition of genetic material present in a sample) one or more samples from the subject comprising tumour genetic material.
- the method may further comprise sequencing (or otherwise determining the sequence composition of genomic material present in a sample) one or more germline samples from the subject.
- the method may further comprise obtaining, from the subject, one or more samples comprising tumour genetic material and optionally one or more germline samples.
- Genetic material as used herein comprises RNA molecules (e.g. mRNA transcripts), and optionally DNA molecules (e.g. genomic DNA).
- the method may further comprise providing to a user, for example through a user interface, the determined probability of the tumour-specific mutation being expressed and/or a value derived therefrom or associated therewith, such as the likelihoods of the sequence data assuming that the tumour specific mutation is expressed or not expressed, the relative likelihood of the sequence data assuming that the tumour specific mutation is expressed or not expressed, and/or the power to detect the tumour specific mutation as expressed given the distributions of the likelihood of the sequence data assuming that the tumour specific mutation is expressed and not expressed, at one or more false positive rates.
- the method may comprise providing an “expressed status” flag or value based on the determined probability of the tumour-specific mutation being expressed, and/or the values of the likelihoods and power to detect a tumour-specific mutation that is expressed.
- the method may comprise providing information identifying the mutation (such as e.g. the sequence of the mutation and its genomic location).
- a method of identifying one or more neoantigens in a subject comprising: identifying a plurality of tumour-specific mutations in the subject (for example, using genomic and/or transcriptomic data from one or more tumour samples from said subject); determining whether one or more of the tumour-specific mutations is likely to be expressed in a tumour of the subject using the method of any embodiment of the preceding aspect; and determining whether one or more of the tumour-specific mutations is likely to give rise to a neoantigen, wherein a neoantigen is a tumour-specific mutation that satisfies one or more predetermined criteria on whether the tumour-specific mutation is likely to be expressed in the tumour and optionally one or more further criteria on whether the tumour-specific mutation is likely to give rise to a neoantigen.
- Also described according to the present aspect is a method of identifying one or more neoantigens in a subject, the method comprising: identifying, by a processor using sequence data from one or more samples from said subject, a plurality of tumour-specific mutations in the subject; determining, by a processor whether one or more of the tumour-specific mutations is likely to be expressed in the subject using the method of any preceding claim; and selecting, by said processor, one or more of the tumour-specific mutations as candidate neoantigens, wherein a candidate neoantigen is a tumour-specific mutation that satisfies at least one or more predetermined criteria on whether the tumour-specific mutation is likely to be expressed and optionally one or more criteria on whether the tumour-specific mutation is likely to give rise to a neoantigen.
- a neoantigen may be a tumour-specific mutation that satisfies at least a criterion selected from: having a probability of being expressed above a predetermined threshold, having a probability of being expressed that is above a threshold set adaptively to select a predetermined number of tumour-specific mutations with the highest probabilities of being expressed amongst the tumour-specific mutations for which a probability was determined, having a probability of being expressed that is above a threshold set adaptively to select a predetermined top percentile of tumour-specific mutations amongst the tumour-specific mutations for which a probability was determined, having a power to detect a mutation as expressed that is above a predetermined threshold and a number of RNA reads showing the tumour-specific mutation above a threshold number associated with the power to detect a mutation as being expressed, and having a power to detect a mutation as expressed that is below a predetermined threshold and a number of RNA reads showing the tumour-specific mutation below a
- the method may further comprise determining whether one or more of the tumour-specific mutations is likely to be clonal in a tumour of the subject, and identifying whether one or more of the tumour-specific mutations is likely to give rise to a clonal neoantigen.
- a clonal neoantigen may be a tumour-specific mutation that satisfies at least a criterion selected from: having a probability of being clonal above a predetermined threshold, having a probability of being clonal that is above a threshold set adaptively to select a predetermined number of tumour-specific mutations with the highest probabilities of being clonal amongst the tumour-specific mutations for which a probability was determined, and having a probability of being clonal that is above a threshold set adaptively to select a predetermined top percentile of tumour-specific mutations amongst the tumour-specific mutations for which a probability was determined.
- the one or more predetermined criteria on whether the tumour-specific mutation is likely to be clonal may be selected from: the mutation having a likelihood of being clonal above a predetermined threshold, the mutation having a likelihood of being clonal that is above a threshold set adaptively to select a predetermined number of tumour-specific mutations with the highest likelihoods of being clonal amongst the tumour-specific mutations for which a likelihood was determined, and having a likelihood of being clonal that is above a threshold set adaptively to select a predetermined top percentile of tumour-specific mutations amongst the tumour-specific mutations for which a likelihood was determined.
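The three styles of criterion listed above (a fixed threshold, an adaptive threshold selecting a predetermined number of mutations, and an adaptive threshold selecting a top percentile) apply in the same way whether the score is a probability of being expressed or of being clonal. A minimal sketch follows; the function names and the mutation identifiers are hypothetical, and ties between equal scores are not treated specially.

```python
import math

def select_fixed(scores, threshold):
    """Mutations whose score exceeds a predetermined threshold."""
    return {m for m, p in scores.items() if p > threshold}

def select_top_n(scores, n):
    """The n mutations with the highest scores (adaptive threshold)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n])

def select_top_percentile(scores, percent):
    """The top `percent` per cent of mutations by score (adaptive threshold)."""
    n = math.ceil(len(scores) * percent / 100)
    return select_top_n(scores, n)

# Toy probabilities for five hypothetical mutations.
scores = {"m1": 0.95, "m2": 0.40, "m3": 0.80, "m4": 0.10, "m5": 0.65}
above_threshold = select_fixed(scores, 0.7)
top_three = select_top_n(scores, 3)
top_fifth = select_top_percentile(scores, 20)
```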
- a neoantigen may be a tumour-specific mutation that satisfies at least a criterion selected from: being predicted to result in a protein or peptide that is not expressed in the normal cells of the subject, being predicted to result in at least one peptide that is likely to be presented by an MHC molecule, being predicted to result in at least one peptide that is likely to be presented by an MHC allele that is known to be present in the subject, and being predicted to result in a protein or peptide that is immunogenic.
- a neoantigen may be a tumour-specific mutation that satisfies a criterion that it is predicted to result in a change in the sequence of a protein (e.g.
- the one or more criteria on whether the tumour-specific mutation is likely to give rise to a neoantigen may be selected from: the mutation being associated with an expression product that is expressed in tumour cells, the mutation being predicted to result in a protein or peptide that is not expressed in the normal cells of the subject, the mutation being predicted to result in at least one peptide that is likely to be presented by an MHC molecule, the mutation being predicted to result in at least one peptide that is likely to be presented by an MHC allele that is known to be present in the subject, and the mutation being predicted to result in a protein or peptide that is immunogenic.
- the method may further comprise identifying one or more peptides associated with the one or more neoantigens (i.e.
- tumour-specific mutation satisfies one or more criteria (such as e.g. criteria related to probability of being expressed, likelihood of giving rise to a neoantigen and/or likelihood of giving rise to a clonal neoantigen) as described above.
- a method of determining whether a tumour-specific mutation is likely to give rise to a neoantigen comprising: determining whether the tumour-specific mutation is likely to be expressed in a tumour of the subject using the method of any embodiment of the first aspect; and determining whether the tumour-specific mutation satisfies one or more predetermined criteria applying to the result of the step of determining whether the tumour-specific mutation is likely to be expressed, and optionally one or more further criteria.
- the method of the present aspect may have any one or more of the features of any preceding aspect.
- a method of providing an immunotherapy for a subject that has been diagnosed as having cancer comprising: identifying one or more neoantigens that are likely to be expressed in a tumour of the subject using a method as described herein, such as a method according to any embodiment of the first or second aspect, for example by identifying one or more tumour-specific mutations in the subject; determining whether one or more of the tumour-specific mutations is likely to be expressed in a tumour of the subject using the methods of any embodiment of the first aspect; selecting one or more tumour-specific mutations from the identified tumour-specific mutations based on the result of the determining and optionally one or more further criteria, such as e.g. as described in relation to the second aspect; and designing an immunotherapy that targets one or more of the neoantigens identified.
- a method of treating a subject that has been diagnosed as having cancer comprising: identifying one or more neoantigens by: identifying a plurality of tumour-specific mutations in the subject; determining whether one or more of the tumour-specific mutations is likely to be expressed in the subject; selecting one or more of the tumour-specific mutations as candidate neoantigens, wherein a candidate neoantigen is a tumour-specific mutation that satisfies at least one or more predetermined criteria on whether the tumour-specific mutation is likely to be expressed; and treating the subject with an immunotherapy that targets one or more of the selected candidate neoantigens.
- determining whether a tumour-specific mutation is likely to be expressed in a subject comprises: obtaining, by a processor, RNA sequence data from one or more samples from the subject comprising tumour genetic material, the RNA sequence data comprising for each of the one or more samples: the number of reads in the sample that show the tumour-specific mutation (b), and the total number of reads at the location of the tumour-specific mutation (d), and determining, by the processor, the probabilities of observing the RNA sequence data if the tumour-specific mutation is (i) expressed and (ii) not expressed.
- the method may further comprise determining the posterior probability that the tumour-specific mutation is expressed depending on: a prior probability of the mutation being expressed, and the probabilities of observing the RNA sequence data if the tumour-specific mutation is (i) expressed and (ii) not expressed.
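The combination of the two likelihoods with a prior can be sketched as follows. The sketch assumes simple binomial read models that are not taken from the disclosure: under "not expressed", variant reads arise only from a sequencing error rate, and under "expressed", each read shows the mutation with a known expected fraction. Samples are assumed independent. The function names, the error rate and the expected fractions are illustrative assumptions.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def posterior_expressed(samples, prior=0.5, error_rate=0.001):
    """Posterior probability that the mutation is expressed.

    samples: list of (b, d, expected_fraction) per sample, where b is the number
    of reads showing the mutation, d the total reads at the locus, and
    expected_fraction the assumed variant read fraction if expressed.
    """
    lik_expressed = 1.0
    lik_not_expressed = 1.0
    for b, d, expected_fraction in samples:
        lik_expressed *= binom_pmf(b, d, expected_fraction)
        lik_not_expressed *= binom_pmf(b, d, error_rate)
    numerator = prior * lik_expressed
    return numerator / (numerator + (1 - prior) * lik_not_expressed)

p_supported = posterior_expressed([(8, 40, 0.2)])    # 8 of 40 reads show the mutation
p_unsupported = posterior_expressed([(0, 40, 0.2)])  # no supporting reads at good depth
p_no_coverage = posterior_expressed([(0, 0, 0.2)], prior=0.3)
```

With no reads at the locus both likelihoods equal 1, so the posterior falls back to the prior; this is the situation in which the data carry no information about expression.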
- the method may have any of the features described in relation to any preceding aspect.
- the method may have any one or more of the following features.
- the present disclosure also relates to immunotherapies that target one or more neoantigens associated with a tumour-specific mutation that has been determined to be expressed in a tumour using a method as described herein, and to methods for designing and/or providing such immunotherapies.
- an immunotherapy may be an immunogenic composition, a composition comprising immune cells or a therapeutic antibody.
- the immunogenic composition may comprise one or more neoantigens identified (such as e.g. a neoantigen peptide or protein or a cell displaying the neoantigen), or material sufficient for expression of the one or more neoantigens identified (e.g. a DNA or RNA molecule which encodes the neoantigen(s)).
- the composition comprising immune cells may comprise T cells, B cells and/or dendritic cells.
- the composition comprising a therapeutic antibody may comprise one or more antibodies that recognise at least one of the one or more of the neoantigens identified. An antibody may be a monoclonal antibody.
- the cancer may be selected from bladder cancer, gastric cancer, oesophageal cancer, breast cancer, colorectal cancer, cervical cancer, ovarian cancer, endometrial cancer, kidney cancer (renal cell), lung cancer (small cell, non-small cell and mesothelioma), brain cancer (gliomas, astrocytomas, glioblastomas), melanoma, lymphoma, small bowel cancers (duodenal and jejunal), leukemia, pancreatic cancer, hepatobiliary tumours, germ cell cancers, prostate cancer, head and neck cancers, thyroid cancer and sarcomas.
- the cancer may be lung cancer.
- the cancer may be melanoma.
- the cancer may be bladder cancer.
- the cancer may be head and neck cancer.
- the subject may be human.
- Designing an immunotherapy that targets one or more neoantigens identified may comprise designing one or more candidate peptides for each of the one or more neoantigens targeted, each peptide comprising at least a portion of a neoantigen targeted.
- Designing or providing an immunotherapy may comprise obtaining the one or more candidate peptides.
- the method may further comprise testing the one or more candidate peptides for one or more properties. Testing may be performed in vitro or in silico.
- the one or more peptides may be tested for immunogenicity, propensity to be displayed by MHC molecules (optionally by specific MHC molecule alleles, where the alleles may have been chosen depending on the MHC alleles expressed by the subject), ability to elicit proliferation of a population of immune cells, etc.
- the method may further comprise producing the immunotherapy.
- the method may further comprise obtaining a population of dendritic cells that has been pulsed with one or more of the candidate peptides.
- the immunotherapy may be a composition comprising T cells that recognise at least one of the one or more of the neoantigens identified.
- the composition may be enriched for T cells that target at least one of the one or more of the neoantigens identified.
- the method may comprise obtaining a population of T cells and expanding the population of T cells to increase the number or relative proportion of T cells that target at least one of the one or more of the neoantigens identified.
- the method may further comprise obtaining a T cell population.
- a T cell population may be isolated from the subject, for example from one or more tumour samples obtained from the subject, or from a peripheral blood sample or a sample from other tissues of the subject.
- the T cell population may comprise tumour infiltrating lymphocytes.
- T cells may be isolated using methods which are well known in the art. For example, T cells may be purified from single cell suspensions generated from samples on the basis of expression of CD3, CD4 or CD8. T cells may be enriched from samples by passage through a Ficoll-Paque gradient.
- the method may further comprise expanding the T cell population.
- T cells may be expanded by ex vivo culture in conditions which are known to provide mitogenic stimuli for T cells.
- the T cells may be cultured with cytokines such as IL-2 or with mitogenic antibodies such as anti-CD3 and/or anti-CD28.
- the T cells may be co-cultured with antigen-presenting cells (APCs), which may have been irradiated.
- APCs may be dendritic cells or B cells.
- the dendritic cells may have been pulsed with peptides containing one or more of the identified neoantigens as single stimulants or as pools of stimulating neoantigen peptides.
- Expansion of T cells may be performed using methods which are known in the art, including for example the use of artificial antigen presenting cells (aAPCs), which provide additional co-stimulatory signals, and autologous PBMCs which present appropriate peptides.
- Autologous PBMCs may be pulsed with peptides containing neoantigens as discussed herein as single stimulants, or alternatively as pools of stimulating neoantigens.
- a method for expanding a T cell population for use in the treatment of cancer in a subject comprising: identifying one or more neoantigens using a method as described herein, such as a method according to any embodiment of the second aspect; obtaining a T cell population comprising a T cell which is capable of specifically recognising one of the identified neoantigens; and co-culturing the T cell population with a composition comprising the identified neoantigens.
- the method may have one or more of the following features.
- the T cell population obtained may be assumed to comprise a T cell capable of specifically recognising one of the identified neoantigens.
- the method preferably comprises identifying a plurality of neoantigens.
- the neoantigens may be clonal neoantigens.
- the T cell population may comprise a plurality of T cells each of which is capable of specifically recognising one of the plurality of identified neoantigens, and the method may comprise co-culturing the T cell population with a composition comprising the plurality of identified neoantigens.
- the co-culture may result in expansion of the T cell population that specifically recognises the one or more neoantigens.
- the expansion may be performed by co-culture of a T cell with a neoantigen and an antigen presenting cell.
- the antigen presenting cell may be a dendritic cell.
- the expansion may be a selective expansion of T cells which are specific for the neoantigen.
- the expansion may further comprise one or more non-selective expansion steps.
- a composition comprising a population of T cells obtained or obtainable by a method according to any embodiment of the preceding aspect.
- a composition comprising a neoantigen, neoantigen specific immune cell, or an antibody that recognises a neoantigen, for use in the treatment or prevention of cancer in a subject, wherein said neoantigen has been identified as a neoantigen (e.g.
- a composition comprising a neoantigen, neoantigen specific immune cell, or an antibody that recognises a neoantigen, wherein said neoantigen has been identified as a neoantigen (e.g. identified as being derived from a tumour-specific mutation that is expressed in a tumour of the subject) using the methods described herein.
- a cell or population of cells expressing a neoantigen on its surface wherein said neoantigen has been identified as a neoantigen (e.g.
- a method of treating a subject that has been diagnosed as having cancer, the method comprising administering an immunotherapy that has been provided using the methods described herein, or a composition as described herein.
- a system comprising: a processor; and a computer readable medium comprising instructions that, when executed by the processor, cause the processor to perform the steps of any method described herein, such as a method according to any embodiment of the first, second, third or fourth aspects above.
- one or more non-transitory computer readable media comprising instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of any method described herein, such as a method according to any embodiment of the first, second, third or fourth aspects above.
- Figure 1A illustrates schematically the problem of determining allele specific expression in a homogeneous diploid cell population.
- Figure 1B illustrates schematically the problem of determining allele specific expression in a mixed sample comprising variant cells (e.g. tumour cells) and reference cells (e.g. healthy cells).
- Figures 1C and 1D illustrate schematically the features of a model for determining the power to detect the expression of a mutated allele in a mixed sample.
- Variables of the model for detection of a mutated allele (here indicated by a “C”) in a mixed sample comprising a variant population of cells that have at least one copy of the allele (e.g. tumour cells, here illustrated as having a genotype GV heterozygous for the mutated allele), and a reference population of cells that do not have any copies of the allele (e.g. healthy cells, comprising only reference alleles here illustrated as “T”, the healthy cells having a genotype GN homozygous for T in the illustrated example).
- the sample comprises a proportion t of variant cells (also referred to as “tumour purity”).
- the expression data consists of a total number d of reads covering the locus, of which b reads show the variant allele.
- the variant cells express both the reference and the variant alleles, with proportion ⁇ (fraction of the expression in the tumour cells due to the reference allele), and the healthy cells express a proportion ⁇ of the total number of reads observed at the locus.
- These variables are used to determine the likelihood of observing any number of reads b with the variant allele under a null model of no expression of the variant allele (M 0 ) and under an alternative model where the variant allele is expressed (M 1 ).
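The two proportions defined above determine the fraction of reads expected to show the variant allele, and the null and alternative read distributions then give a power to detect expression at a chosen false positive rate. The sketch below illustrates this under assumptions that are not taken from the disclosure: reads are binomially distributed, the null model produces variant reads only through a sequencing error rate, and the names `normal_share` (fraction of reads at the locus from healthy cells) and `ref_share_tumour` (fraction of tumour cell expression from the reference allele) are placeholders for the model's symbols.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); zero when k > n."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

def variant_read_fraction(normal_share, ref_share_tumour):
    """Expected fraction of reads showing the variant allele.

    Only tumour cell reads that do not come from the reference allele show the variant.
    """
    return (1 - normal_share) * (1 - ref_share_tumour)

def detection_power(d, variant_fraction, error_rate=0.001, alpha=0.05):
    """Smallest read count b* with P(b >= b* | M0) <= alpha, and P(b >= b* | M1)."""
    threshold = next(k for k in range(d + 2) if upper_tail(k, d, error_rate) <= alpha)
    return threshold, upper_tail(threshold, d, variant_fraction)

fraction = variant_read_fraction(normal_share=0.5, ref_share_tumour=0.6)
threshold, power = detection_power(d=40, variant_fraction=fraction)
```

At low depth or low variant read fraction the power drops, so the absence of variant reads is weak evidence against expression; reporting the power alongside the call makes that distinction visible.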
- Figure 2A is a flowchart illustrating schematically a method of determining whether a tumour-specific mutation is likely to be clonal, and its use in identifying clonal neoantigens.
- Figure 2B is a flowchart illustrating schematically a method of providing an immunotherapy.
- Figure 3 shows an embodiment of a system for determining whether a tumour-specific mutation is likely to be clonal and/or for identifying clonal neoantigens and/or for providing an immunotherapy.
- Figure 4 illustrates schematically a model used to evaluate sequence data according to methods disclosed herein.
- Figures 5A and 5B illustrate schematically models used to determine the probability that a variant is expressed according to methods disclosed herein, based on data from a single tumour sample (A) or a plurality of tumour samples (B).
- Figure 6 shows the behaviour of the posterior probability of a variant being expressed (y-axis) as a function of the relative likelihood of models assuming that the variant is vs. is not expressed (r, x-axis).
- the plot shows a curve for each of a plurality of values of the parameter ⁇ (mean of the Beta distribution over ⁇ used as a hyperprior for ⁇ , where ⁇ is the parameter of a Bernoulli variable prior for E and E is a binary variable capturing whether the variant is expressed or not).
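The shape of such curves can be reproduced with a short sketch. It assumes that, for a single variant, integrating a Bernoulli prior over its Beta hyperprior leaves a prior probability of expression equal to the mean of the Beta distribution, so that the posterior depends only on that mean and on the relative likelihood r. The function names and the chosen means are illustrative assumptions.

```python
def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

def posterior_from_ratio(r, prior_mean):
    """Posterior probability of expression given the relative likelihood r.

    r is the likelihood of the data if expressed divided by the likelihood if not expressed.
    """
    return prior_mean * r / (prior_mean * r + (1.0 - prior_mean))

ratios = [0.01, 0.1, 1.0, 10.0, 100.0]
curves = {
    mean: [posterior_from_ratio(r, mean) for r in ratios]
    for mean in (0.1, 0.5, 0.9)
}
```

Each curve increases with r and passes through the prior mean at r = 1, where the data favour neither model.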
- Figure 7 illustrates schematically how the fraction of total expression at a locus due to the normal cell population ( ⁇ ) can be estimated by fitting a line to data of total expression at the locus for samples with various purities (t).
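One way to read the line fit is sketched below. It assumes a linear mixing model that is an interpretation rather than a statement of the disclosed method: total expression at the locus is the purity-weighted sum of a per-cell tumour expression level and a per-cell normal expression level, so the fitted line evaluated at t = 0 and t = 1 gives the two levels. The function names and the toy numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x

def normal_expression_fraction(purities, total_expression, purity):
    """Fraction of expression at the locus due to normal cells in a sample of given purity."""
    slope, intercept = fit_line(purities, total_expression)
    normal_level = intercept           # fitted expression at t = 0 (normal cells only)
    tumour_level = intercept + slope   # fitted expression at t = 1 (tumour cells only)
    normal_part = (1 - purity) * normal_level
    return normal_part / (normal_part + purity * tumour_level)

# Toy data: normal cells express 10 units, tumour cells 30 units.
purities = [0.2, 0.4, 0.6, 0.8]
totals = [10 + 20 * t for t in purities]
fraction_normal = normal_expression_fraction(purities, totals, purity=0.5)
```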
- Figure 8 shows the true negative and false positive (first bar of each subplot – variants that are known not to be expressed) and the false negative and true positive (second bar of each subplot, variants that are known to be expressed) rates obtained using the methods described herein to identify whether variants are expressed in datasets of known dilutions of a cell line with known variants with known expression status.
- Each column shows a different dilution and each row shows results using different sources for the genotypes and purity used in the model.
- the number of mutations in each category is overlaid on the bars. In each case, the smallest part of the bar is the erroneously assigned mutations (FP or FN mutations).
- D Ground truth genotypes, estimated purity.
- E Fixed genotypes, estimated purity.
- Figure 10 shows the true positive rate as a function of the power to detect expression in the data of Figures 8 and 9, binning the mutations by power. Each plot is for a model using different sources for the genotypes and purity.
- Figure 11 shows the true negative and false positive (first bar of each subplot – variants that are known not to be expressed) and the false negative and true positive (second bar of each subplot, variants that are known to be expressed) rates obtained using the methods described herein to identify whether variants are expressed in datasets of known dilutions of a cell line with known variants with known expression status, using a value of the cell expression ratio estimated from data. Each column shows a different dilution and each row shows results using different sources for the genotypes and purity used in the model.
- Figure 12 shows the distribution of estimated power to detect an expressed variant obtained using the methods described herein for the data of Figure 11.
- Each subplot shows results using different sources for the genotypes and purity used in the model.
- Figure 13 shows the calculated (ground truth) true positive rate as a function of the estimated power for the data of Figures 11 and 12.
- Figure 15 shows the number of variants associated with values between 0 and 1 of the probability of a mutation being expressed as determined using the variant allele fraction (VAF) of the mutation in the read data, for the data of Figures 11-14.
- Figure 16 shows the receiver operating characteristic (ROC) curves for the identification of variants as expressed or not expressed using the probabilities of a mutation being expressed as determined using the methods described herein (A) or the probability of a mutation being expressed as determined using the variant allele fraction of the mutation in the read data (B), for the data of Figures 11-15.
- ROC curves show the true positive rate as a function of the false positive rate for different values of the cutoff (illustrated as the colour of the curve, with the scale on the right hand side of the plot) on probability used to call a variant as expressed or not expressed.
- Data for the model using ground truth genotypes and purity, and a value of ⁇ estimated from data at different purities using a regression model are shown for A. This data only considers variants where the total number of reads was >0.
- the plots also indicate the area under the curve (AUC), the probability threshold (cutoff) that results in the highest true positive rate while keeping the false positive rate ⁇ 0.05, and the corresponding false positive rate (FPR) and true positive rate (TPR).
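The cutoff selection described above can be sketched as follows: each candidate cutoff gives a false positive rate and a true positive rate, and the reported cutoff is the one with the highest true positive rate among those whose false positive rate stays within the bound. Treating the bound as "at most 0.05" is an assumption, and the function names and toy data are hypothetical.

```python
def roc_points(probs, labels):
    """(cutoff, FPR, TPR) for each distinct probability used as a cutoff."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for cutoff in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y)
        fp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and not y)
        points.append((cutoff, fp / negatives, tp / positives))
    return points

def best_cutoff(probs, labels, max_fpr=0.05):
    """Cutoff with the highest TPR among those with FPR <= max_fpr (None if there is none)."""
    allowed = [pt for pt in roc_points(probs, labels) if pt[1] <= max_fpr]
    return max(allowed, key=lambda pt: pt[2], default=None)

# Toy data: 1 = variant known to be expressed, 0 = known not to be expressed.
probs = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
cutoff, fpr, tpr = best_cutoff(probs, labels)
```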
- Figure 17 shows calibration curves for a model determining the probabilities of a mutation being expressed as determined using the methods described herein (A) or the probability of a mutation being expressed as determined using the variant allele fraction of the mutation in the read data (B), using the data of Figures 11-16.
- the calibration curves show the fraction of true positives (variants that are expressed) as a function of the probability bin (bins of 10% probability) obtained for each of the approaches in A and B.
- Data for the model using ground truth genotypes and purity, and a value of ⁇ estimated from data at different purities using a regression model are shown for A. This data only considers variants where the total number of reads was >0.
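A calibration curve of the kind described above can be computed by grouping variants into probability bins and taking the fraction of truly expressed variants in each bin. The sketch below uses ten bins of 10% each; the function name and the toy data are hypothetical, and empty bins are simply omitted.

```python
def calibration_curve(probs, labels, n_bins=10):
    """Fraction of expressed variants in each probability bin.

    Returns (bin_start, bin_end, fraction_expressed) for every non-empty bin.
    """
    totals = [0] * n_bins
    positives = [0] * n_bins
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)  # a probability of 1.0 goes in the last bin
        totals[index] += 1
        positives[index] += y
    return [
        (i / n_bins, (i + 1) / n_bins, positives[i] / totals[i])
        for i in range(n_bins)
        if totals[i]
    ]

probs = [0.05, 0.08, 0.55, 0.52, 0.58, 0.95, 1.0]
labels = [0, 0, 1, 0, 1, 1, 1]
curve = calibration_curve(probs, labels)
```

For a well calibrated model the fraction in each bin lies close to the probabilities that define the bin.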
- Figure 18 shows the receiver operating characteristic (ROC) curves for the identification of variants as expressed or not expressed using the probabilities of a mutation being expressed as determined using the methods described herein (A) or the probability of a mutation being expressed as determined using the variant allele fraction of the mutation in the read data (B), for the data of Figures 11-17.
- ROC curves show the true positive rate as a function of the false positive rate for different values of the cutoff (illustrated as the colour of the curve, with the scale on the right hand side of the plot) on probability used to call a variant as expressed or not expressed.
- Data for the model using ground truth genotypes and purity, and a value of ⁇ estimated from data at different purities using a regression model are shown for A.
- Figure 19 shows calibration curves for a model determining the probabilities of a mutation being expressed as determined using the methods described herein (A) or the probability of a mutation being expressed as determined using the variant allele fraction of the mutation in the read data (B), using the data of Figures 11-18.
- the calibration curves show the fraction of true positives (variants that are expressed) as a function of the probability bin (bins of 10% probability) obtained for each of the approaches in A and B.
DETAILED DESCRIPTION
- While it is generally accepted that cancer antigens such as those resulting from cancer specific variants (also referred to as “tumour-specific mutations”) represent promising therapeutic targets provided that they are expressed by cancer cells (see e.g.
- RNA sequencing data could be used to identify tumour-specific mutations, with the advantage that this is not limited to known genes and can identify e.g. intragenic fusions, novel transcripts, etc.
- RNA expression combines information from variant and normal transcripts of the gene (both of which may be present in the tumour cells). Additionally, because tumour samples are typically mixed samples comprising tumour and normal cells, this also provides an aggregate signal from healthy and tumour cells, altogether providing very uncertain information about whether a variant is in fact expressed.
- allele-specific expression refers to the amount of mRNA that is transcribed from a particular allele, or to whether a particular allele is expressed (i.e. whether any mRNA is transcribed from the particular allele).
- determining allele-specific expression refers to determining whether an allele is expressed unless indicated otherwise. Allele specific expression is typically a property of a particular cell, cell population, tissue, sample or individual. In the context of the present disclosure, the alleles of interest are alleles that are present in a tumour and not in a germline population.
- alleles of interest may be referred to interchangeably as “mutations” or “mutated alleles” or “variant alleles”.
- a mutated allele may be a tumour specific mutation, and a reference allele may be the corresponding germline allele.
- a germline allele may also be referred to as a “normal” or “healthy” or “reference” allele.
- the germline or the mutant allele may be the major allele. Determining allele-specific expression for a mutant allele in a mixed sample comprising mutant and germline cells is a considerably more complex problem than the determination of allelic imbalance.
- Allelic imbalance, which is also sometimes referred to in the prior art as “allelic expression”, quantifies expression variation between the two haplotypes of a diploid individual with heterozygous sites. This is conceptually different from the problem at hand (determining whether a variant is expressed), because allelic imbalance can be assessed using statistical tests for departure from an expected 0.5 allelic expression balance (typically Binomial tests). Indeed, the methods described herein aim to determine whether a variant allele is present in expression data, rather than whether a variant and reference allele are expressed to different extents.
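For contrast with the methods of the present disclosure, the kind of Binomial test typically used for allelic imbalance can be sketched as follows. This is a generic exact two-sided test against an expected balance of 0.5; the function names are illustrative, and the test is shown only to make the distinction concrete (it is not the method described herein).

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials (computed in log space)."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1.0 - p))


def allelic_imbalance_pvalue(alt_reads, total_reads, expected=0.5):
    """Exact two-sided Binomial p-value for departure from the expected balance.

    Sums the probabilities of all outcomes that are no more likely than the
    observed count under the expected allelic fraction.
    """
    observed = binom_pmf(alt_reads, total_reads, expected)
    tolerance = 1.0 + 1e-9  # guards against floating point ties
    total = 0.0
    for k in range(total_reads + 1):
        pk = binom_pmf(k, total_reads, expected)
        if pk <= observed * tolerance:
            total += pk
    return min(1.0, total)
```

A small p-value indicates that the two alleles are unlikely to be equally expressed; it does not by itself indicate whether a tumour-specific allele is expressed in a mixed sample.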
- Figure 1A illustrates the problem of determining allele specific expression in a homogeneous diploid cell population. Additionally, allelic imbalance refers to a germline context where a single, pure population of heterozygous cells is considered.
- Figure 1B illustrates the problem of determining allele specific expression in a mixed sample comprising variant cells and reference cells. This is a considerably more complex situation than when considering a homogeneous diploid cell population (as in the determination of allelic expression / allelic imbalance in the germline context).
- the genomic VAF is expected to be equal to 0.5, and genomic DNA reads should show an even representation of each allele.
- the fraction of RNA reads that represent each allele can be anywhere between 0 and 1 because the two alleles may not have the same expression level.
- any transcriptomic VAF between 0 and 1 could be plausible.
- Any sequencing process samples the population of molecules present in the sample. The sampling process results in a population of reads that is more or less likely to accurately represent the original population depending on the sampling depth, the amount of the molecules for a particular locus that are actually present in the sample (i.e. the expression level of the locus), and the sequencing error rate.
- Gene expression is known to have an extremely broad dynamic range, with some genes being abundantly expressed and others only being expressed at very low levels. At high levels of expression of a genomic locus, the number of reads for each allele should be sufficient to infer the expression VAF with relatively good confidence, particularly if neither allele has extremely low expression.
- By contrast, at low levels of expression, the minor allele may not be sampled at all (simply by chance) and/or the number of reads that actually sample the minor allele may be so low that the presence of the variant cannot be distinguished from sequencing errors in view of the error rate of the sequencing platform used.
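The effect of sampling depth can be made concrete with a short sketch. It assumes that reads are sampled independently and that a fixed fraction of the transcripts at the locus carry the minor allele; both functions and their parameter names are illustrative.

```python
def prob_allele_unsampled(depth, allele_fraction):
    """Probability that none of `depth` independent reads carries the allele."""
    return (1.0 - allele_fraction) ** depth


def expected_error_reads(depth, error_rate):
    """Expected number of reads showing the allele through sequencing error alone."""
    return depth * error_rate
```

For example, under these assumptions an allele that accounts for 5% of the transcripts at a locus covered by 20 reads is missed entirely in roughly 36% of cases, whereas at a depth of 200 reads it is almost always sampled at least once.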
- This is even more complicated when looking at mixed populations, such as samples comprising tumour cells and non-tumour cells, as illustrated on Figure 1B.
- the number of genomic copies from which a variant allele (tumour specific allele) can be expressed in a tumour sample depends at least on the tumour purity (percentage of the cells in the sample that are tumour cells) and the copy number of the variant allele in the tumour population.
- the expression level of the locus may differ between the tumour and non-tumour cells.
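The dependence on purity and copy number can be sketched as follows. This minimal example computes only the fraction of genomic copies of the locus that carry the variant allele in a mixed sample. It assumes a single tumour genotype and a single normal genotype, and it deliberately ignores differences in expression between alleles or between cell populations, which the model described herein additionally takes into account. All names are illustrative.

```python
def variant_copy_fraction(purity, variant_copies, tumour_total_cn, normal_total_cn=2):
    """Fraction of genomic copies at a locus that carry the variant allele.

    purity          -- fraction of cells in the sample that are tumour cells
    variant_copies  -- copies of the variant allele per tumour cell
    tumour_total_cn -- total copies of the locus per tumour cell
    normal_total_cn -- total copies of the locus per normal cell
    """
    tumour_copies = purity * tumour_total_cn
    normal_copies = (1.0 - purity) * normal_total_cn
    return purity * variant_copies / (tumour_copies + normal_copies)
```

For a heterozygous variant in a diploid tumour, this gives 0.5 in a pure sample and 0.25 at 50% purity, which illustrates why low purity reduces the variant signal even before expression is considered.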
- the present inventors have developed a method to determine the power to detect the expression of a mutated allele in a mixed sample comprising a variant population of cells that have at least one copy of the allele (e.g. tumour cells), and a reference population of cells that do not have any copies of the allele (e.g. normal cells).
- the method evaluates the likelihood of observing any number of reads b with the variant allele under a null model of no expression of the variant allele and under an alternative model where the variant allele is expressed.
- a statistical test to decide whether a variant is expressed is associated with 4 outcomes: a false positive (FP, the variant is identified as expressed when it is in fact not expressed), a false negative (FN, the variant is identified as not expressed when it is in fact expressed), a true positive (TP, a variant that is in fact expressed is identified as expressed), and a true negative (TN, a variant that is in fact not expressed is identified as not expressed).
- the FP rate can be fixed at an acceptable level (e.g. 5%), and the power (the probability of avoiding false negatives) can be calculated based on the FN rate at that chosen FP rate.
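The relationship between a fixed FP rate and power can be illustrated with a simplified sketch. It assumes point hypotheses: under the null model, variant reads arise only at an assumed sequencing error rate (`p_null`), and under the alternative model they are sampled with an assumed probability `p_alt`. The model described herein instead places priors over these quantities, so this is an illustration of the principle rather than the method itself.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials (computed in log space)."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1.0 - p))


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def detection_power(depth, p_null, p_alt, alpha=0.05):
    """Return (critical read count, power) at a false positive rate <= alpha.

    The critical count is the smallest number of variant reads b such that
    P(b or more reads | null) <= alpha; the power is P(b or more reads | alt).
    """
    for k in range(depth + 2):  # k = depth + 1 always satisfies the bound
        if binom_sf(k, depth, p_null) <= alpha:
            return k, binom_sf(k, depth, p_alt)
```

At a depth of 100 reads with an assumed error rate of 0.1% and an assumed variant read probability of 10%, two variant reads suffice to call expression and the power is above 99%; at a depth of 5 reads the power under the same assumptions falls to about 41%.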
- the likelihoods M1 and M0 are estimated based on the probability of observing the sequence data comprising a total number of reads covering the locus (d) and a number of reads covering the locus and having the mutation (b), taking into account the tumour purity (t), the genotype of the variant and reference cells (G, also denoted G_V and G_N, respectively, for the variant and reference cell populations), the fraction of expression at the locus that is due to the reference allele ( ⁇ , reference read count/total read count), and the fraction of total expression (both variant and reference) due to the reference cells ( ⁇ ).
- the model described in detail below assumes that the variant and reference cell populations are genetically homogeneous at the locus (i.e.
- the variant population comprises a population with the variant and a population without the variant (but that may not have the same genotype at the locus as the reference population).
- the mutations are assumed to be ubiquitous in the sample(s) that is/are being analysed.
- the variable ⁇ is the probability of sampling a read with the mutation from a mixed sample comprising variant cells with genotype G_V and reference cells with genotype G_N, where the variant cells represent a proportion t of the population. This is proportional to the number of copies of the variant allele in the mixed population, and its relative expression in the population.
- the probability of observing a number of variant reads b is assumed to follow a Binomial distribution conditional on the total number of reads at the locus and the value of the variable (the probability of sampling a read with the mutation).
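The Binomial assumption can be written out as a short sketch. Here `theta` stands for the probability of sampling a read with the mutation; the name, and the comparison of two fixed values of `theta`, are illustrative assumptions (the disclosure uses priors over the parameter rather than fixed values).

```python
from math import lgamma, log


def variant_read_log_likelihood(b, d, theta):
    """Log of the Binomial probability of b variant reads among d reads."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    log_coeff = lgamma(d + 1) - lgamma(b + 1) - lgamma(d - b + 1)
    return log_coeff + b * log(theta) + (d - b) * log(1.0 - theta)


def log_likelihood_ratio(b, d, theta_alt, theta_null):
    """Log likelihood ratio of an 'expressed' value of theta versus a 'null' value."""
    return (variant_read_log_likelihood(b, d, theta_alt)
            - variant_read_log_likelihood(b, d, theta_null))
```

A positive log likelihood ratio indicates that the observed count of variant reads is better explained by the alternative value of `theta` than by the null value.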
- the likelihood of the null and alternative models is obtained using Bayes rule and different prior probabilities over the parameter ⁇ (P( ⁇ )) for the null and variant model.
- E 1) and p( ⁇
- the joint posterior distribution over E and ⁇ from which the marginal distribution on E can be calculated as shown on Figure 1E by integrating over ⁇ , with r the likelihood ratio between the two models and ⁇ the mean of the Beta distribution over ⁇ (hyperprior).
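One way to read the marginalisation step is sketched below. It assumes that E is a binary indicator of expression, that the prior probability of E = 1 has a Beta distribution with a given mean, and that r is the likelihood ratio of the alternative model to the null model. Under these assumptions, integrating over the prior probability leaves only its mean in the result. Figure 1E is not reproduced here, so this is not necessarily the exact expression used in the disclosure, and the function name is illustrative.

```python
def posterior_expressed(likelihood_ratio, prior_mean):
    """Posterior probability that E = 1 given the likelihood ratio r = M1 / M0.

    With E ~ Bernoulli(pi) and pi integrated out under a prior with mean
    `prior_mean`, P(E = 1 | data) = r * mean / (r * mean + (1 - mean)).
    """
    if not 0.0 < prior_mean < 1.0:
        raise ValueError("prior_mean must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    weighted = likelihood_ratio * prior_mean
    return weighted / (weighted + (1.0 - prior_mean))
```

With a likelihood ratio of 1 the posterior equals the prior mean, and larger likelihood ratios move the posterior towards 1.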
- the output of the methods may be used in a number of ways. For example, when the power to detect expression of a particular variant is determined to be low and/or the probability that the variant is expressed is determined to be low, the variant may not be excluded from a list of potentially immunogenic variants despite there being low evidence for expression of the variant.
- This information may be combined with one or more further criteria such as likelihood of binding to one or more MHC alleles of a peptide derived from the variant, likelihood of presentation by one or more MHC alleles of a peptide derived from the variant, likelihood of processing of a peptide derived from the variant, likelihood of immunogenicity of a peptide derived from the variant, differential binding affinity and/or likelihood of presentation between a peptide derived from the variant and the corresponding germline peptide, etc.
- a variant that is identified as being associated with a high power to detect expression and/or a low probability of being expressed may be excluded from a list of potentially immunogenic variants if there is low evidence for expression of the variant.
- RNA sequencing also referred to as “RNAseq” or “RNA-seq”.
- the sample may be a cell, tissue or biological fluid sample obtained from a subject (e.g. a biopsy). Such samples may be referred to as “subject samples”.
- the sample may be a blood sample, or a tumour sample, or a sample derived therefrom.
- a “sample” as used herein may be a cell or tissue sample, or an extract (e.g. an RNA extract obtained from a subject) from which transcriptomic material can be obtained.
- a “sample” as used herein may be a cell or tissue sample, a biological fluid, an extract (e.g. a DNA extract obtained from the subject), from which genomic material can be obtained for genomic analysis, such as genomic sequencing (e.g. whole genome sequencing, whole exome sequencing).
- the sample may be one which has been freshly obtained from a subject or may be one which has been processed and/or stored prior to genomic/transcriptomic analysis (e.g. frozen, fixed or subjected to one or more purification, enrichment or extraction steps).
- the sample may be a cell or tissue culture sample.
- a sample as described herein may refer to any type of sample comprising cells or genomic and/or transcriptomic material derived therefrom, whether from a biological sample obtained from a subject, or from a sample obtained from e.g. a cell line.
- the sample is a sample obtained from a subject, such as a human subject.
- the sample is preferably from a mammal (such as e.g. a mammalian cell sample or a sample from a mammalian subject, such as a cat, dog, horse, donkey, sheep, pig, goat, cow, mouse, rat, rabbit or guinea pig), preferably from a human (such as e.g. a human cell sample or a sample from a human subject).
- sample may be transported and/or stored, and collection may take place at a location remote from the sequence data acquisition (e.g. sequencing) location, and/or any computer-implemented method steps described herein may take place at a location remote from the sample collection location and/or remote from the sequence data acquisition (e.g. sequencing) location (e.g. the computer-implemented method steps may be performed by means of a networked computer, such as by means of a “cloud” provider).
- a “mixed sample” refers to a sample that is assumed to comprise multiple cell types or genetic material derived from multiple cell types. Within the context of the present disclosure, a mixed sample is typically one that comprises tumour cells or is assumed (expected) to comprise tumour cells, or genetic material derived from tumour cells.
- Genetic material can comprise genomic material (e.g. DNA) or transcriptomic material (e.g. RNA).
- Samples obtained from subjects are typically mixed samples (unless they are subject to one or more purification and/or separation steps).
- the sample comprises tumour cells and at least one other cell type (and/or genetic material derived therefrom).
- the mixed sample may be a tumour sample.
- a “tumour sample” refers to a sample derived from or obtained from a tumour.
- Such samples may comprise tumour cells and normal (non-tumour) cells.
- the normal cells may comprise immune cells (such as e.g. lymphocytes), and/or other normal (non-tumour) cells.
- such lymphocytes may be referred to as “tumour-infiltrating lymphocytes” (TIL).
- a tumour may be a solid tumour or a non-solid or haematological tumour.
- a tumour sample may be a primary tumour sample, tumour-associated lymph node sample, or a sample from a metastatic site from the subject.
- a sample comprising tumour cells or genetic material derived from tumour cells may be a bodily fluid sample.
- the genetic material derived from tumour cells may be circulating tumour DNA or tumour DNA in exosomes.
- the sample may comprise circulating tumour cells.
- a mixed sample may be a sample of cells, tissue or bodily fluid that has been processed to extract genetic material. Methods for extracting genetic material from biological samples are known in the art.
- a mixed sample may have been subject to one or more processing steps that may modify the proportion of the multiple cell types or genetic material derived from the multiple cell types in the sample.
- a mixed sample comprising tumour cells may have been processed to enrich the sample in tumour cells.
- a sample of purified tumour cells may be referred to as a “mixed sample” on the basis that small amounts of other types of cells may be present, even if the sample may be assumed, for a particular purpose, to be pure (i.e. to have a tumour fraction of 1 or 100%).
- tumour fraction refers to the proportion of DNA containing cells within a mixed sample that are tumour cells, or to the equivalent proportion that is assumed to result in a particular mixture of genetic material from tumour and non-tumour cells in a sample.
- Methods for determining the tumour fraction in a sample are known in the art. For example, in the context of cell or tissue samples, a tumour fraction may be estimated by analysing pathology slides (e.g. hematoxylin and eosin (H&E)-stained slides or other histochemistry or immunohistochemistry slides, by counting tumour cells in one or more representative areas of a sample), or using high throughput assays such as flow cytometry.
- a tumour fraction may be estimated using sequence analysis processes that attempt to deconvolute tumour and germline genomes such as e.g. ASCAT (Van Loo et al., 2010), ABSOLUTE (Carter et al., 2012), or ichorCNA (Adalsteinsson et al., 2017).
- a “normal sample”, “healthy sample” or “germline sample” refers to a sample that is assumed not to comprise tumour cells or genetic material derived from tumour cells.
- a germline sample may be a blood sample, a tissue sample, or a purified sample such as a sample of peripheral blood mononuclear cells from a subject.
- the terms “normal”, “germline” or “wild type” when referring to sequences or genotypes refer to the sequence / genotype of cells other than tumour cells.
- a germline sample may comprise a small proportion of tumour cells or genetic material derived therefrom, and may nevertheless be assumed, for practical purposes, not to comprise said cells or genetic material.
- sequence data refers to information that is indicative of the presence and preferably also the amount of genetic material in a sample that has a particular sequence.
- Such information may be obtained using sequencing technologies, such as e.g. next generation sequencing (NGS), for example whole exome sequencing (WES), whole genome sequencing (WGS), RNA sequencing or sequencing of captured genomic loci (targeted or panel sequencing), or using array technologies, such as e.g. copy number variation arrays, expression arrays or other molecular counting assays.
- the sequence data may comprise a count of the number of sequencing reads that have a particular sequence.
- the sequence data may comprise a signal (e.g. an intensity value) that is indicative of the number of sequences in the sample that have a particular sequence, for example by comparison to an appropriate control.
- Sequence data may be mapped to a reference sequence, for example a reference genome or transcriptome, using methods known in the art (such as e.g. Bowtie (Langmead et al., 2009)).
- counts of sequencing reads or equivalent non-digital signals may be associated with a particular genomic location (where the “genomic location” refers to a location in the reference genome to which the sequence data was mapped).
- a genomic location may contain a mutation, in which case counts of sequencing reads or equivalent non-digital signals may be associated with each of the possible variants (also referred to as “alleles”) at the particular genomic location.
- The process of identifying the presence of a mutation at a particular location in a sample is referred to as “variant calling” and can be performed using methods known in the art (such as e.g. the GATK HaplotypeCaller, gatk.broadinstitute.org/hc/en-us/articles/360037225632-HaplotypeCaller).
- sequence data may comprise a count of the number of reads (or an equivalent non-digital signal) which match a germline (also sometimes referred to as “reference”) allele at a particular genomic location, and a count of the number of reads (or an equivalent non-digital signal) which match a mutated (also sometimes referred to as “alternate”) allele at the genomic location.
- genomic sequence data may be used to infer copy number profiles along a genome, using methods known in the art. Copy number profiles refer to the number of genomic copies of genomic regions. Copy number profiles may be allele specific. In the context of the present disclosure, copy number profiles are preferably allele specific and tumour / normal sample specific.
- the copy number profiles used in the present disclosure are preferably obtained using methods designed to analyse samples comprising a mixture of tumour and normal cells, and to produce allele-specific copy number profiles for the tumour cells and the normal cells in a sample.
- Allele specific copy number profiles for mixed samples may be obtained from sequence data (e.g. using read counts as described above), using e.g. ASCAT (Van Loo et al., 2010). Other methods are known and equally suitable.
- the method used to obtain allele-specific copy number profiles is one that reports a plurality of possible copy number solutions and an associated quality/confidence metric.
- ASCAT outputs a goodness-of-fit metric for each combination of values of ploidy (ploidy for a whole tumour sample, not segment-specific) and purity for which a corresponding allele-specific copy number profile was evaluated.
- tumour-specific copy number profiles generated by such methods represent an average or summary of the entire tumour cell population.
- total copy number refers to the total number of copies of a genomic region in a sample.
- major copy number refers to the number of genomic copies of the most prevalent allele in a sample.
- minor copy number refers to the number of genomic copies of the allele other than the most prevalent allele in a sample.
- normal copy number or “normal total copy number” refers to the number of copies of a genomic region in the normal cells in a sample. Normal cells typically have two copies of each chromosome (unless the cell is genetically male and the chromosome is a sex chromosome), and hence the normal copy number may in embodiments be assumed to be equal to 2 (unless the genomic region is on the X or Y chromosome and the sample under analysis is from a male subject, in which case the normal copy number may be assumed to be equal to 1). Alternatively, the normal copy number for a particular genomic region may be determined using a normal sample.
- log R value refers to a measure of normalised total signal intensity, quantifying the total genomic copy number at a genomic locus.
- the term typically refers to the log R value for a sample comprising tumour genetic material, and the normalisation is typically performed by reference to a normal sample (which is preferably a matched normal sample but may also be a process-matched normal sample or other suitable normal reference sample).
- the logR may be obtained as the normalised log transform of read depth (log(read depth tumour/ read depth normal)).
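A minimal sketch of such a log R computation is given below. The base-2 logarithm, the pseudocount used to avoid division by zero, and the centring on the median across loci (as a simple way of normalising for overall sequencing depth) are assumptions made for illustration; the disclosure does not specify them.

```python
from math import log2
from statistics import median


def log_r(tumour_depths, normal_depths, pseudocount=0.5):
    """Median-centred log2 ratio of tumour to normal read depth per locus.

    The pseudocount keeps the ratio defined at loci with zero coverage.
    """
    raw = [
        log2((t + pseudocount) / (n + pseudocount))
        for t, n in zip(tumour_depths, normal_depths)
    ]
    centre = median(raw)
    return [value - centre for value in raw]
```

Under these assumptions, a locus with twice the typical tumour-to-normal depth ratio has a log R of about 1, and a locus with half the typical ratio has a log R of about -1.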
- The mean B allele frequency (MBAF), also sometimes referred to as “B allele frequency” (BAF), is a measure of normalised allelic intensity ratio at a genomic location.
- the term typically refers to the BAF value for a sample comprising tumour genetic material, and the normalisation is typically performed by reference to a normal sample (which is preferably a matched normal sample but may also be a process-matched normal sample or other suitable normal reference sample).
- the BAF may be obtained as the ratio of the allele frequency for the tumour allele vs the normal allele.
- Copy number profiles typically comprise copy number estimates over genomic regions called “segments”.
- the BAF and logR associated with a genomic location may refer to the BAF and logR of the segment overlapping a particular genomic location (such as e.g. the genomic location of a mutation). Further, the BAF and logR can be used to obtain corresponding major and minor copy numbers. In embodiments, the values of copy number metrics may be provided for both a tumour copy number profile estimate and a normal copy number profile estimate, even if only the tumour copy number profile values are used.
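How BAF and logR translate into allele-specific copy numbers can be sketched using relationships of the kind published for ASCAT (Van Loo et al., 2010). The sketch assumes a given tumour purity and tumour ploidy, a normal copy number of 2, and a technology-dependent scaling factor `gamma` (taken as 1 here, an assumption appropriate only for some data types). The function name is illustrative, and real tools additionally fit purity and ploidy and round to integer states.

```python
def allele_specific_copy_numbers(log_r, baf, purity, ploidy, gamma=1.0):
    """Return (major, minor) tumour copy numbers, unrounded, for one segment.

    log_r  -- log R of the segment
    baf    -- B allele frequency of the segment
    purity -- fraction of tumour cells in the sample (must be > 0)
    ploidy -- average tumour ploidy
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    # Total signal expected from the mixture of normal (2 copies) and tumour cells.
    scale = 2.0 ** (log_r / gamma) * (2.0 * (1.0 - purity) + purity * ploidy)
    n_a = (purity - 1.0 + (1.0 - baf) * scale) / purity
    n_b = (purity - 1.0 + baf * scale) / purity
    return max(n_a, n_b), min(n_a, n_b)
```

For example, a segment with logR of 0 and BAF of 0.25 in a sample of 50% purity and tumour ploidy 2 corresponds to a major copy number of 2 and a minor copy number of 0, i.e. copy-neutral loss of heterozygosity in the tumour cells.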
- the terms “tumour-specific mutation”, “tumour-specific variants”, “somatic mutation” or simply “mutation” or “variant” are used interchangeably and refer to a difference in a nucleotide sequence (e.g. DNA or RNA) in a tumour cell compared to a healthy cell from the same subject.
- the difference in the nucleotide sequence can result in the expression of a protein which is not expressed by a healthy cell from the same subject.
- a mutation may be a single nucleotide variant (SNV), multiple nucleotide variant (MNV), a deletion mutation, an insertion mutation, a translocation, a missense mutation, a fusion, a splice site mutation, or any other change in the genetic material of a tumour cell.
- a mutation may result in the expression of a protein or peptide that is not present in a healthy cell from the same subject.
- Mutations may be identified by exome sequencing, RNA-sequencing, whole genome sequencing, targeted gene panel sequencing and/or routine Sanger sequencing of single genes, followed by sequence alignment and comparison of the DNA and/or RNA sequence from a tumour sample with DNA and/or RNA from a reference sample or reference sequence (e.g. the germline DNA and/or RNA sequence, or a reference sequence from a database). Suitable methods are known in the art.
- An "indel mutation” refers to an insertion and/or deletion of bases in a nucleotide sequence (e.g. DNA or RNA) of an organism. Typically, the indel mutation occurs in the DNA, preferably the genomic DNA, of an organism.
- the indel may be from 1 to 150 bases, for example 1 to 90, 1 to 50, 1 to 23 or 1 to 10 bases.
- An indel mutation may be a frameshift indel mutation.
- a frameshift indel mutation is a change in the reading frame of the nucleotide sequence caused by an insertion or deletion of one or more nucleotides. Such frameshift indel mutations may generate a novel open-reading frame which is typically highly distinct from the polypeptide encoded by the non-mutated DNA/RNA in a corresponding healthy cell in the subject.
- a “neoantigen” (or “neo-antigen”) is an antigen that arises as a consequence of a mutation within a cancer cell.
- neoantigen is not expressed (or expressed at a significantly lower level) by normal (i.e. non-tumour) cells.
- a neoantigen may be processed to generate distinct peptides which can be recognised by T cells when presented in the context of MHC molecules.
- neoantigens may be used as the basis for cancer immunotherapies. References herein to "neoantigens" are intended to include also peptides derived from neoantigens. The term "neoantigen" as used herein is intended to encompass any part of a neoantigen that is immunogenic.
- an "antigenic" molecule as referred to herein is a molecule which itself, or a part thereof, is capable of stimulating an immune response, when presented to the immune system or immune cells in an appropriate manner.
- the binding of a neoantigen to a particular MHC molecule may be predicted using methods which are known in the art. Examples of methods for predicting MHC binding include those described by Lundegaard et al., O’Donnell et al., and Bulik-Sullivan et al.
- MHC binding of neoantigens may be predicted using the netMHC-3 (Lundegaard et al.) and netMHCpan4 (Jurtz et al.) algorithms.
- a neoantigen that has been predicted to bind to a particular MHC molecule is thereby predicted to be presented by said MHC molecule on the cell surface.
- a “clonal neoantigen” (also sometimes referred to as “truncal neoantigen”) is a neoantigen that results from a mutation that is present in essentially every tumour cell in one or more samples from a subject (or that can be assumed to be present in essentially every tumour cell from which the tumour genetic material in the sample(s) is derived).
- a “clonal mutation” (sometimes referred to as “truncal mutation”) is a mutation that is present in essentially every tumour cell in one or more samples from a subject (or that can be assumed to be present in essentially every tumour cell from which the tumour genetic material in the sample(s) is derived).
- a clonal mutation may be a mutation that is present in every tumour cell in one or more samples from a subject.
- a “sub-clonal” neoantigen is a neoantigen that results from a mutation that is present in a subset or a proportion of cells in one or more tumour samples from a subject (or that can be assumed to be present in a subset of the tumour cells from which the tumour genetic material in the sample(s) is derived).
- a “sub-clonal” mutation is a mutation that is present in a subset or a proportion of cells in one or more tumour samples from a subject (or that can be assumed to be present in a subset of the tumour cells from which the tumour genetic material in the sample(s) is derived).
- a neoantigen or mutation may be clonal in the context of one or more samples from a subject while not being truly clonal in the context of the entirety of the population of tumour cells that may be present in a subject (e.g. including all regions of a primary tumour and metastasis).
- a clonal mutation may be “truly clonal” in the sense that it is a mutation that is present in essentially every tumour cell (i.e. in all tumour cells) in the subject. This is because the one or more samples may not be representative of each and every subset of cells present in the subject.
- a “clonal neoantigen” or “clonal mutation” may also be referred to as a “ubiquitous neoantigen” or “ubiquitous mutation”, to indicate that the neoantigen is present in essentially all tumour cells that have been analysed, but may not be present in all tumour cells that may exist in the subject.
- the terms “clonal” and “ubiquitous” are used interchangeably unless context indicates that reference to “true clonality” was intended.
- the expression “essentially every tumour cell” in relation to one or more samples or a subject may refer to at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% of the tumour cells in the one or more samples or the subject.
- a neoantigen/mutation that is identified as likely to be clonal (or “ubiquitous”) may be considered likely to be truly clonal, or at least more likely to be truly clonal than a neoantigen/mutation that is identified as unlikely to be clonal.
- the confidence in the probability that a clonal neoantigen/mutation identified in a subject is truly clonal increases when the sample(s) used to identify the clonal neoantigen/mutation capture a more complete picture of the genetic diversity of the tumour (e.g. by including a plurality of samples from the subject, such as e.g. samples from different regions of the tumour, and/or by including samples that inherently capture a diversity of tumour cells such as e.g. ctDNA samples).
- a neoantigen/mutation that is identified as unlikely to be clonal is unlikely to be truly clonal, because the identification that the neoantigen/mutation is unlikely to be clonal indicates that even in the restricted view afforded by the sampling process, there is evidence that the neoantigen/mutation is not present in all tumour cells.
- the process of identifying clonal neoantigens/mutations may be seen as prioritising which candidate neoantigens/mutations are most likely to be clonal, based on the restricted view of the clonal structure of the subject’s tumour available from the one or more samples.
- cancer cell fraction refers to the proportion of tumour cells that contain a mutation, such as e.g. a mutation that results in a particular neoantigen.
- a cancer cell fraction may be estimated based on one or more samples, and as such may not be equal to the true cancer cell fraction in the subject (as explained above). Nevertheless, the cancer cell fraction estimated based on one or more samples may provide a useful indication of the likely true cancer cell fraction. Further, as explained above, the accuracy of such an estimate may increase when the sample(s) used to estimate the cancer cell fraction capture a more complete picture of the genetic diversity of the tumour. Additional sources of noise and confounding factors in genomic data mean that a cancer cell fraction determined from one or more samples represents an estimate.
- a cancer cell fraction estimate may be obtained by integrating variant allele frequencies with copy numbers and purity estimates as described by Landau et al. (2013). Such a CCF estimate can also be used to identify mutations that are likely to be clonal.
- a clonal mutation may be defined as a mutation which has an estimated cancer cell fraction (CCF) ≥ 0.75, such as a CCF ≥ 0.80, 0.85, 0.90, 0.95 or 1.0.
- a subclonal mutation may be defined as a mutation which has a CCF ≤ 0.95, 0.90, 0.85, 0.80, or 0.75.
- a CCF estimate may be associated with (e.g. derived from) a distribution associating a probability with each of a plurality of possible values of CCF between 0 and 1, from which statistical estimates of confidence may be obtained.
- a mutation may be identified as clonal if there is more than a 50% chance or probability that its cancer cell fraction (CCF) reaches or exceeds the required value as defined above, for example 0.75 or 0.95, such as a chance or probability of 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95% or more.
- mutations may be classified as likely clonal or subclonal based on whether the posterior probability that their CCF exceeds 0.95 (or 0.75, or any other chosen threshold) is greater or lesser than 0.5, respectively.
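The classification described above can be illustrated with a minimal sketch in the spirit of the approach of Landau et al. (2013). It assumes a uniform prior over CCF on a grid, a Binomial likelihood for the variant read count, and an expected variant allele fraction that scales with CCF, purity and copy number; the function names, the default mutation multiplicity of 1 and the grid resolution are illustrative assumptions.

```python
from math import exp, log


def ccf_posterior(alt_reads, depth, purity, tumour_cn,
                  normal_cn=2, multiplicity=1, grid_size=1000):
    """Posterior over CCF values on a grid, under a uniform prior.

    The expected variant allele fraction for a CCF value c is
    purity * multiplicity * c / (purity * tumour_cn + (1 - purity) * normal_cn).
    """
    grid = [i / grid_size for i in range(1, grid_size + 1)]
    denom = purity * tumour_cn + (1.0 - purity) * normal_cn
    log_liks = []
    for c in grid:
        f = purity * multiplicity * c / denom
        f = min(max(f, 1e-12), 1.0 - 1e-12)  # keep the logarithms finite
        log_liks.append(alt_reads * log(f) + (depth - alt_reads) * log(1.0 - f))
    top = max(log_liks)
    weights = [exp(value - top) for value in log_liks]
    total = sum(weights)
    return grid, [w / total for w in weights]


def is_likely_clonal(alt_reads, depth, purity, tumour_cn,
                     ccf_threshold=0.95, min_probability=0.5, **kwargs):
    """True if the posterior probability that CCF >= threshold exceeds min_probability."""
    grid, posterior = ccf_posterior(alt_reads, depth, purity, tumour_cn, **kwargs)
    prob = sum(p for c, p in zip(grid, posterior) if c >= ccf_threshold)
    return prob > min_probability
```

Under these assumptions, a variant with 500 of 1000 reads in a pure diploid sample is classified as likely clonal at a CCF threshold of 0.95, whereas a variant with 100 of 1000 reads in the same sample is classified as likely subclonal.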
- the threshold may be fixed.
- the threshold may be determined for a particular set of mutations that are investigated.
- the threshold may be set based on a benchmarking data set with known clonal / non-clonal status, to reach a predetermined precision and/or recall.
- a benchmarking data set may be obtained using synthetic data and/or using a data set obtained from a population with known clonality structure (for example cell line mixture data).
- the threshold may be set such that any mutation (or a certain % of mutations) that is associated with an estimated CCF that has a confidence interval meeting the criteria described above (e.g. such that the upper bound of the 95% confidence interval of the estimated CCF is greater than or equal to 0.75) is selected as likely to be clonal.
- the threshold may be set such that any mutation (or a certain % of mutations) that is associated with an estimated CCF that has a posterior probability distribution meeting the criteria described above (e.g. a posterior probability that their CCF exceeds 0.95 (or 0.75, or any other chosen threshold) is greater than 0.5) is selected as likely to be clonal.
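The posterior-based classification described above can be sketched as follows. This is a minimal illustration only: it assumes the CCF estimate is available as a discretised probability distribution over a grid of CCF values, and the function names and example numbers are hypothetical. A posterior probability of exactly 0.5 is treated here as subclonal, which is an arbitrary choice.

```python
def prob_ccf_at_least(ccf_grid, probs, threshold):
    """Probability mass at CCF values >= threshold (distribution is renormalised)."""
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    return sum(p for c, p in zip(ccf_grid, probs) if c >= threshold) / total


def classify_clonality(ccf_grid, probs, threshold=0.95, min_prob=0.5):
    """Label a mutation 'clonal' if P(CCF >= threshold) exceeds min_prob."""
    p = prob_ccf_at_least(ccf_grid, probs, threshold)
    return "clonal" if p > min_prob else "subclonal"


# Hypothetical discretised CCF distribution for one mutation.
grid = [0.25, 0.50, 0.75, 1.00]
probs = [0.05, 0.15, 0.20, 0.60]
print(classify_clonality(grid, probs, threshold=0.95))
```

Both the CCF threshold and the probability cut-off are parameters, so the same sketch covers a fixed threshold or one tuned on a benchmarking data set.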
- a cancer immunotherapy refers to a therapeutic approach comprising administration of an immunogenic composition (e.g. a vaccine), a composition comprising immune cells, or an immunoactive drug, such as e.g. a therapeutic antibody, to a subject.
- an immunogenic composition or vaccine may comprise a neoantigen, neoantigen presenting cell or material necessary for the expression of the neoantigen.
- a composition comprising immune cells may comprise T and/or B cells that recognise a neoantigen.
- the immune cells may be isolated from tumours or other tissues (including but not limited to lymph node, blood or ascites), expanded ex vivo or in vitro and re-administered to a subject (a process referred to as “adoptive cell therapy”).
- T cells can be isolated from a subject and engineered to target a neoantigen (e.g. by insertion of a chimeric antigen receptor that binds to the neoantigen) and re-administered to the subject.
- a therapeutic antibody may be an antibody which recognises a neoantigen.
- an antibody as referred to herein will recognise the neoantigen.
- where the neoantigen is an intracellular antigen, the antibody will recognise the neoantigen peptide-MHC complex.
- an antibody which "recognises" a neoantigen encompasses both of these possibilities.
- an immunotherapy may target a plurality of neoantigens.
- an immunogenic composition may comprise a plurality of neoantigens, cells presenting a plurality of neoantigens or the material necessary for the expression of the plurality of neoantigens.
- a composition may comprise immune cells that recognise a plurality of neoantigens. Similarly, a composition may comprise a plurality of immune cells that recognise the same neoantigen. As another example, a composition may comprise a plurality of therapeutic antibodies that recognise a plurality of neoantigens. Similarly, a composition may comprise a plurality of therapeutic antibodies that recognise the same neoantigen.
- a composition as described herein may be a pharmaceutical composition which additionally comprises a pharmaceutically acceptable carrier, diluent or excipient. The pharmaceutical composition may optionally comprise one or more further pharmaceutically active polypeptides and/or compounds. Such a formulation may, for example, be in a form suitable for intravenous infusion.
- an immune cell is intended to encompass cells of the immune system, for example T cells, NK cells, NKT cells, B cells and dendritic cells.
- the immune cell is a T cell.
- An immune cell that recognises a neoantigen may be an engineered T cell.
- a neoantigen specific T cell may express a chimeric antigen receptor (CAR) or a T cell receptor (TCR) which specifically binds a neoantigen or a neoantigen peptide, or an affinity-enhanced T cell receptor (TCR) which specifically binds a neoantigen or a neoantigen peptide (as discussed further hereinbelow).
- the T cell may express a chimeric antigen receptor (CAR) or a T cell receptor (TCR) which specifically binds to a neo-antigen or a neo-antigen peptide (for example an affinity enhanced T cell receptor (TCR) which specifically binds to a neo-antigen or a neo-antigen peptide).
- a population of immune cells that recognise a neoantigen may be a population of T cells isolated from a subject with a tumour.
- the T cell population may be generated from T cells in a sample isolated from the subject, such as e.g. a tumour sample, a peripheral blood sample or a sample from other tissues of the subject.
- the T cell population may be generated from a sample from the tumour in which the neoantigen is identified.
- the T cell population may be isolated from a sample derived from the tumour of a patient to be treated, where the neoantigen was also identified from a sample from said tumour.
- the T cell population may comprise tumour infiltrating lymphocytes (TIL).
- Antibody includes monoclonal antibodies, polyclonal antibodies, multispecific antibodies (e.g., bispecific antibodies), and antibody fragments that exhibit the desired biological activity.
- an “immunogenic composition” is a composition that is capable of inducing an immune response in a subject.
- the term is used interchangeably with the term “vaccine”.
- the immunogenic composition or vaccine described herein may lead to generation of an immune response in the subject.
- An "immune response" which may be generated may be humoral and/or cell-mediated immunity, for example the stimulation of antibody production, or the stimulation of cytotoxic or killer cells, which may recognise and destroy (or otherwise eliminate) cells expressing antigens corresponding to the antigens in the vaccine on their surface.
- the immunogenic composition may comprise one or more neoantigens, or the material necessary for the expression of one or more neoantigens.
- a neoantigen may be delivered in the form of a cell, such as an antigen presenting cell, for example a dendritic cell.
- the antigen presenting cell such as a dendritic cell may be pulsed or loaded with the neo-antigen or neo-antigen peptide or genetically modified (via DNA or RNA transfer) to express one, two or more neo-antigens or neoantigen peptides, for example 2, 3, 4, 5, 6, 7, 8, 9 or 10 neo-antigens or neo-antigen peptides.
- Neoantigen peptides may be synthesised using methods which are known in the art.
- the term “peptide” is used in the normal sense to mean a series of residues, typically L-amino acids, connected one to the other typically by peptide bonds between the α-amino and carboxyl groups of adjacent amino acids.
- the term includes modified peptides and synthetic peptide analogues.
- the neoantigen peptide may comprise the cancer cell specific mutation (e.g. the non-silent amino acid substitution encoded by a single nucleotide variant (SNV)) at any residue position within the peptide.
- a peptide which is capable of binding to an MHC class I molecule is typically 7 to 13 amino acids in length.
- the amino acid substitution may be present at position 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or 13 in a peptide comprising thirteen amino acids.
- longer peptides, for example 21-31-mers, may be used, and the mutation may be at any position, for example at the centre of the peptide, e.g. at positions 10, 11, 12, 13, 14, 15 or 16.
- Such peptides can also be used to stimulate both CD4 and CD8 cells to recognise neoantigens.
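As an illustration of placing the mutated residue at any position within a peptide, the sketch below enumerates every window of a chosen length range that contains the mutated residue of a mutant protein sequence. The function name, the toy sequence and the default length range (7 to 13 residues, taken from the MHC class I range given above) are illustrative assumptions, not part of the disclosure.

```python
def peptides_covering(protein, mut_index, min_len=7, max_len=13):
    """Return (peptide, position) pairs for all windows containing the mutated residue.

    protein   : mutant protein sequence (one-letter codes)
    mut_index : 0-based index of the mutated residue in `protein`
    position  : 1-based position of the mutated residue within the peptide
    """
    if not 0 <= mut_index < len(protein):
        raise ValueError("mut_index is outside the protein sequence")
    results = []
    for length in range(min_len, max_len + 1):
        first_start = max(0, mut_index - length + 1)
        last_start = min(mut_index, len(protein) - length)
        for start in range(first_start, last_start + 1):
            results.append((protein[start:start + length], mut_index - start + 1))
    return results


# Toy 20-residue sequence with the mutated residue at index 10.
for peptide, position in peptides_covering("ACDEFGHIKLMNPQRSTVWY", 10, 9, 9):
    print(peptide, position)
```

The same function can be called with a longer length range (for example 21 to 31) to enumerate the longer peptides mentioned above.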
- treatment refers to reducing, alleviating or eliminating one or more symptoms of the disease which is being treated, relative to the symptoms prior to treatment.
- prevention refers to delaying or preventing the onset of the symptoms of the disease. Prevention may be absolute (such that no disease occurs) or may be effective only in some individuals or for a limited amount of time.
- computer system includes the hardware, software and data storage devices for embodying a system or carrying out a method according to the above described embodiments.
- a computer system may comprise a central processing unit (CPU), input means, output means and data storage, which may be embodied as one or more connected computing devices.
- the computer system has a display or comprises a computing device that has a display to provide a visual output display (for example in the design of the business process).
- the data storage may comprise RAM, disk drives or other non-transitory computer readable media.
- the computer system may include a plurality of computing devices connected by a network and able to communicate with each other over that network. It is explicitly envisaged that the computer system may consist of or comprise a cloud computer.
- computer readable media includes, without limitation, any non-transitory medium or media which can be read and accessed directly by a computer or computer system.
- the media can include, but are not limited to, magnetic storage media such as floppy discs, hard disc storage media and magnetic tape; optical storage media such as optical discs or CD-ROMs; electrical storage media such as memory, including RAM, ROM and flash memory; and hybrids and combinations of the above such as magnetic/optical storage media.
Identification of expressed mutations
- The present disclosure provides methods for determining whether a tumour-specific mutation is likely to be expressed using sequence data from one or more samples comprising tumour cells or genetic material derived therefrom. The disclosure also provides methods for identifying neoantigens comprising determining whether one or more tumour-specific mutations is/are likely to be expressed. An illustrative method will be described by reference to Figure 2A.
- the method may comprise optional step 10 of obtaining one or more samples comprising tumour genetic material (such as e.g. one or more tumour samples).
- the sample(s) may be mixed samples comprising genomic material from multiple cell types including tumour cells and non-tumour cells (also referred to as “reference”, “healthy”, “normal” or “germline” cells).
- One or more germline samples may also be obtained, which do not comprise tumour genetic material.
- Germline samples may be matched germline samples, obtained from the same subject as the subject from which the one or more tumour samples are obtained.
- a matched germline sample improves the accuracy of calling of somatic (tumour-specific) mutations, as any variant position identified in a tumour sample can be compared to variant positions in a matched germline sample to exclude germline variants.
- the same matched germline sample may be used to analyse a plurality of tumour samples from a subject. Further, the matched germline sample and one or more tumour samples may have been obtained at different times. For example, a first tumour sample and matched germline sample may have been obtained at the time of diagnosis or resection of a tumour, and a further tumour sample may be obtained and analysed together with the initial matched germline sample at a later time point.
- a reference sample or genome including common germline variants may be used.
- a process-matched normal sample may be used, which may not have been obtained from the same subject, or may have been obtained from a pool of subjects.
- the samples may be sequenced at step 12, to obtain at least RNA sequence data, and also optionally DNA sequence data.
- the RNA sequence data may be obtained by RNA sequencing.
- the DNA sequence data may be obtained using one of whole exome sequencing, or whole genome sequencing.
- Alternative methods such as e.g. allele-specific copy number arrays or expression arrays, may be used, although sequencing methods are preferred since they generate a digital output representative of the number of each particular sequence in a sample.
- the sequence data may have been previously obtained and may be received from a user interface, computing device or database.
- the sequence data may be analysed to identify one or more mutations that are likely to be present in the tumour cells but not in non-cancerous cells. These represent tumour-specific mutations. These may be used as candidate neoantigens as explained further below.
- Step 14 may comprise the steps of aligning the sequences from the one or more samples (i.e. the mixed sample(s) and the germline sample(s), if available) to a reference such as e.g.
- DNA sequence data may be used to identify tumour-specific mutations that are single nucleotide variants, multiple nucleotide variants or indels, and RNA sequence data may be used to identify tumour-specific mutations that are gene fusions or splicing variants.
- the tumour-specific mutations analysed may be somatic mutations present in a tumour of the subject from which the samples have been obtained. Any one or more of the tumour-specific mutations identified (or otherwise selected for example by a user through a user interface, or obtained from a computing device or database), may then be analysed to determine whether it is likely to be expressed.
- sequence data is obtained for each sample comprising tumour genetic material, the data comprising the number of RNA reads in the sample that show the tumour-specific mutation (b) (these may also be referred to as reads “containing” the tumour-specific mutation or reads “supporting” the tumour-specific mutation) and the total number of RNA reads at the location of the tumour-specific mutation (d).
- the sequence data may comprise any two of: the number of RNA reads that show a tumour specific mutation, the number of RNA reads that show the corresponding germline allele, and the total number of RNA reads at the location of the tumour specific mutation (as all 3 numbers can be obtained from any two of these).
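The statement that any two of the three read counts determine the third can be sketched as below. This is only an illustration: the function name is hypothetical, and it assumes that every read at the locus shows either the tumour-specific mutation or the corresponding germline allele.

```python
def complete_read_counts(mutant=None, germline=None, total=None):
    """Fill in the missing count given any two of mutant (b), germline and total (d)."""
    known = sum(x is not None for x in (mutant, germline, total))
    if known < 2:
        raise ValueError("at least two of the three counts are required")
    if mutant is None:
        mutant = total - germline
    elif germline is None:
        germline = total - mutant
    elif total is None:
        total = mutant + germline
    # Consistency check, also applied when all three counts were supplied.
    if min(mutant, germline, total) < 0 or mutant + germline != total:
        raise ValueError("inconsistent read counts")
    return mutant, germline, total


print(complete_read_counts(mutant=3, total=10))
```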
- information about at least one copy number solution compatible with each sample comprising tumour-genetic material, and about the tumour fraction in the sample may be obtained.
- This information may comprise allele-specific copy number metrics for the tumour fraction of the sample selected from the major copy number, minor copy number, total copy number, mean B allele frequency, log R value and tumour ploidy, and the normal copy number, or information derived from these metrics such as a set of candidate joint genotypes that is compatible with these allele-specific copy number metrics.
- Not all such allele-specific copy number metrics are necessary as some contain redundant information and/or can be associated with suitable default values.
- the normal copy number can be associated with a suitable default value (e.g. 2, assuming that the normal cells are diploid). Further, only two of the major copy number, total copy number and minor copy number are necessary to infer the third one.
- a copy number solution may be associated with a corresponding confidence metric. When such a metric is not available, each copy number solution may be assumed to be equally likely.
- Each candidate joint genotype comprises a genotype at the location of the tumour-specific mutation for a normal population, and for a tumour cell population that comprises the tumour-specific mutation.
- it is determined whether a tumour-specific mutation is likely to be expressed by determining the likelihoods of the sequence data (number of reads containing the tumour specific mutation and total number of reads at the locus of the tumour specific mutation) if the tumour-specific mutation is expressed and if the tumour specific mutation is not expressed.
- These likelihoods may depend on the probability of sampling a sequence read comprising the tumour-specific mutation from a sample if the tumour-specific mutation is expressed or not expressed, respectively, depending on the genotypes of the tumour and normal cell populations, the sequencing error rate, the tumour fraction of the sample and the fraction of total read counts for the gene comprising the tumour-specific mutation which is due to the normal cell population.
- the likelihoods may be compared to determine whether the tumour-specific mutation is likely to be expressed, that is, the likelihoods of the sequence data (b, d) if the tumour-specific mutation is (i) expressed (model M1) and (ii) not expressed (model M0).
- comparing the likelihoods may comprise determining whether the tumour specific mutation is expressed and determining the power to detect whether the tumour-specific mutation is expressed at a predetermined false positive rate.
- This may comprise determining a threshold number of reads (bc) as the number of reads such that the area under the curve of the likelihood of the number of reads that show the tumour-specific mutation if the tumour-specific mutation is not expressed (model M0), above the threshold number of reads, corresponds to the predetermined false positive rate.
- a tumour-specific mutation may be considered to be likely to be expressed if the number of reads showing the mutation is above this threshold number of reads.
- the power to detect whether the tumour-specific mutation is expressed may be the area under the curve of the likelihood of the number of reads that show the tumour-specific mutation if the tumour-specific mutation is expressed (model M1), above the threshold number of reads.
- Step 20 may comprise determining that the tumour-specific mutation is likely to be expressed if the posterior probability is above a predetermined threshold.
- Step 20 may comprise determining that the tumour-specific mutation is likely to be expressed if the number of reads showing the mutation is above the threshold number of reads.
- Step 20 may comprise determining that the tumour-specific mutation is likely to be expressed if the number of reads showing the mutation is below the threshold number of reads but the power to detect whether the tumour specific mutation is expressed is below a predetermined threshold.
- Step 20 may further comprise, prior to determining the likelihoods, a step of estimating the value of the fraction of total read counts for the gene comprising the tumour-specific mutation which is due to the normal cell population. This may comprise obtaining the total expression of the gene comprising the tumour-specific mutation in a plurality of samples with different tumour purities, fitting a regression model to the values of the total expression as a function of purity, and determining the value of this fraction from the fitted regression model. This may be performed using equation (31) as explained further below.
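One way to picture the regression step is the sketch below. It does not reproduce equation (31); it assumes, for illustration only, that total expression is a linear mixture of a normal-cell and a tumour-cell contribution, so that a straight line fitted to expression against purity can be extrapolated to purity 0 (normal cells only) and purity 1 (tumour cells only). Function names and the example values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("samples must have different purities")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def normal_read_fraction(purities, expressions, purity):
    """Estimated fraction of the gene's reads due to normal cells at a given purity."""
    intercept, slope = fit_line(purities, expressions)
    normal_expr = max(intercept, 0.0)          # extrapolated expression at purity 0
    tumour_expr = max(intercept + slope, 0.0)  # extrapolated expression at purity 1
    normal_part = (1 - purity) * normal_expr
    total = normal_part + purity * tumour_expr
    if total == 0:
        raise ValueError("gene has no estimated expression at this purity")
    return normal_part / total


# Hypothetical gene expression values in four samples of different purity.
print(normal_read_fraction([0.2, 0.4, 0.6, 0.8], [14.0, 18.0, 22.0, 26.0], purity=0.5))
```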
- Step 20 may further comprise determining whether the total expression of the gene comprising the tumour-specific mutation is above a predetermined threshold.
- Step 20 may further comprise obtaining a prior probability that the tumour-specific mutation is expressed.
- the prior probability may be obtained from a user, computing device or database.
- the prior probability may be selected from a plurality of values depending on: (i) whether the total expression of the gene comprising the tumour-specific mutation is above a predetermined threshold, and (ii) whether the total number of reads at the location of the tumour-specific mutation is above a predetermined threshold. For example, when the total number of reads at the location of the tumour-specific mutation is low (e.g. 0) but the total expression of the gene is not low (e.g.
- the tumour-specific mutation may be considered likely to be expressed and the prior probability may be set to 0.5.
- the tumour-specific mutation may be considered unlikely to be expressed (as the gene as a whole is not expressed) and the prior probability may be set to a value below 0.5, such as e.g. 0.05.
- the total expression of the gene may be the total expression of the gene in the one or more samples, or in one or more other samples such as e.g. samples from the same or similar tumour types, or expression data derived therefrom such as expression data from one or more databases (e.g.
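The use of a prior probability can be sketched with Bayes' rule over the two models. This is a minimal illustration: the prior is chosen here from gene-level expression only (0.5 if the gene is expressed, 0.05 otherwise, the example values given above), which simplifies the two-criterion selection described in the text, and the function names and the expression threshold are hypothetical.

```python
def choose_prior(gene_expression, expression_threshold,
                 prior_if_gene_expressed=0.5, prior_if_gene_silent=0.05):
    """Pick a prior probability of expression from the total expression of the gene."""
    if gene_expression > expression_threshold:
        return prior_if_gene_expressed
    return prior_if_gene_silent


def posterior_expressed(lik_expressed, lik_not_expressed, prior):
    """Posterior probability that the mutation is expressed, by Bayes' rule."""
    numerator = prior * lik_expressed
    denominator = numerator + (1 - prior) * lik_not_expressed
    if denominator == 0:
        raise ValueError("both likelihoods are zero")
    return numerator / denominator


prior = choose_prior(gene_expression=12.0, expression_threshold=1.0)
print(posterior_expressed(0.18, 1e-10, prior))
```

With equal likelihoods the posterior simply returns the prior, which is the behaviour one would want at a locus with no reads.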
- at step 22, it is determined whether the tumour-specific mutation is likely to give rise to a neoantigen. For example, it may be determined whether the mutation is likely to result in a peptide or protein that is not expressed by a germline cell (whose genome does not contain the mutation). As another example, it may be determined whether the mutation is likely to be clonal in the tumour. This step may be performed at any point after step 14, and in particular need not be performed after steps 16-20.
- candidate tumour-specific mutations may be filtered depending on whether they are likely to give rise to a neoantigen prior to determining whether the tumour-specific mutation is likely to be expressed.
- tumour-specific mutations that satisfy one or more criteria that apply to the results of step 20 and one or more criteria that apply to the results of step 22 may be identified. These may be considered to represent candidate neoantigens, optionally candidate clonal neoantigens.
- the results of any of the preceding steps (and in particular any of steps 20 to 24) may be provided to a user, for example through a user interface. These results may be used for example to provide an immunotherapy or prognosis for a subject, as will be described further below.
- the methods described herein may be used to determine whether a mutation is likely to be actually expressed, and whether a mutation is identified as unlikely to be expressed because the power to detect expression with the sequence data at hand is low. Further, the methods described herein may be used to detect mutations from RNA data (e.g. in particular for mutations that are only or advantageously detectable from RNA sequence data such as e.g. gene fusions and splice variants such as retained introns). Thus, also described herein are methods of detecting a candidate tumour-specific mutation, comprising determining whether the candidate tumour-specific mutation is likely to be expressed in a sample using the methods described herein.
- the methods described herein may also be used to determine the RNA sequencing depth that is necessary to be able to detect a particular mutation with a predetermined minimum power.
- the approach may be used to determine the power to detect a mutation as being expressed given a plurality of candidate sequencing depths (resulting in corresponding b and d values).
- M0) and with a power above a predetermined value may be selected for use in sequencing a sample where a mutation is suspected to be present or expressed.
- RNA sequence data comprising the number of RNA reads in the sample that show the tumour-specific mutation (b), and the total number of RNA reads at the location of the tumour-specific mutation (d) corresponding to the one or more sequencing depths if the tumour-specific mutation is truly expressed in the sample; and determining the likelihoods of the sequence data (b, d) if the tumour-specific mutation is (i) expressed (model M1) and (ii) not expressed (model M0), using the RNA sequence data, and (ii) selecting a sequence depth that is sufficient for the tumour-specific mutation to be
- a method to determine a sequencing depth to be used to sequence RNA in a tumour sample comprising performing the methods of determining whether a tumour-specific variant is likely to be expressed using RNA sequence data at one or more candidate sequencing depths, and using the results of the determining to select a candidate sequencing depth such that a tumour-specific variant that is truly expressed is determined as likely to be expressed according to the methods described herein.
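The depth selection can be sketched by computing the detection power at each candidate depth and keeping the smallest depth that reaches the required power. The sketch assumes a binomial read model, treats each candidate depth as the expected number of reads (d) at the locus, and takes the per-read mutant probabilities under the two models as inputs; all names are hypothetical.

```python
from math import comb


def upper_tail(k, n, p):
    """P(B > k) for B ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1, n + 1))


def power_at_depth(d, p_not_expressed, p_expressed, false_positive_rate):
    """Power to call an expressed mutation at locus depth d and the given error rate."""
    bc = next(k for k in range(d + 1)
              if upper_tail(k, d, p_not_expressed) <= false_positive_rate)
    return upper_tail(bc, d, p_expressed)


def select_depth(candidate_depths, p_not_expressed, p_expressed,
                 false_positive_rate, min_power):
    """Smallest candidate depth reaching min_power, or None if none does."""
    for d in sorted(candidate_depths):
        if power_at_depth(d, p_not_expressed, p_expressed,
                          false_positive_rate) >= min_power:
            return d
    return None


print(select_depth([3, 5, 8, 20], 0.001, 0.3, false_positive_rate=0.01, min_power=0.9))
```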
- the above methods find applications in the context of cancer diagnostic, prognostic and therapeutic approaches.
- the above methods may be used to provide immunotherapies that target neoantigens.
- FIG. 2B illustrates schematically an exemplary method of providing an immunotherapy.
- one or more samples comprising tumour genetic material and optionally one or more germline samples are obtained from a subject.
- the subject may be a subject that has been diagnosed as having cancer, and may be (but does not need to be) the same subject for which the immunotherapy is provided.
- a list of candidate neoantigens is obtained.
- This may comprise step 212’ of obtaining a list of candidate neoantigens from genomic sequence data and/or step 212’’ of obtaining a list of candidate neoantigens from RNA sequence data.
- a list of candidate neoantigens may be obtained from genomic sequence data from the sample(s) using methods known in the art, for example as described in WO 2016/16174085, Landau et al. (2013), Lu et al. (2016), Leko et al. (2019), Hundal et al. (2019), and others.
- the list of candidate neoantigens may comprise a single neoantigen, or a plurality of neoantigens.
- the list comprises a plurality of neoantigens.
- the neoantigens may be clonal neoantigens. Methods to identify clonal neoantigens are known in the art and include the methods described in WO 2016/16174085, Landau et al. (2013), Roth et al. (2014), McGranahan et al. (2016), and in co-pending application PCT/EP2022/058793.
- one or more candidate neoantigens may be identified from RNA sequence data from the sample(s) at step 212’’, for example by identifying one or more RNA sequence reads that include a variant.
- step 212’’ may comprise optional step 212a’’, where the RNA sequence content of the one or more mixed samples and optionally the matched sample may be determined, for example by sequencing the RNA (or mRNA) in the sample using RNA sequencing. Alternative methods such as e.g. expression arrays may be used, although sequencing methods are preferred since they generate a digital output representative of the number of each particular sequence in a sample.
- Step 212’’ may further comprise optional step 212b’’ of analysing the RNA sequence data to identify one or more mutations that are likely to be present in the tumour cells but not in non-cancerous cells.
- step 212b’’ may comprise aligning the RNA sequences from the one or more samples to a reference genome or transcriptome and identifying sequences that are not expected to be present in such a reference (e.g.
- Step 212’ may comprise optional step 212a’, where the sequence content of the one or more mixed samples and optionally the matched sample may be determined, for example by sequencing the genomic material in the sample using one of whole exome sequencing, or whole genome sequencing. Alternative methods such as e.g. allele-specific copy number arrays may be used, although sequencing methods are preferred since they generate a digital output representative of the number of each particular sequence in a sample.
- the sequence data may be analysed to identify one or more mutations that are likely to be present in the tumour cells but not in non-cancerous cells. These represent tumour-specific mutations and may be used as candidate neoantigens. This may comprise the steps of aligning the sequences from the one or more samples (i.e. the mixed sample(s) and the germline sample(s), if available), and identifying genomic locations where the sequence of the tumour differs from the germline sequence or can be assumed to differ from the germline sequence (e.g. if a germline sequence for the subject is not available).
- genomic sequence data for the mixed sample at the genomic location of a candidate tumour-specific mutation is obtained, comprising the count of reads supporting the mutated allele (also referred to as “non-reference allele”), the count of reads supporting the germline allele(s) (A, collectively referred to as “germline allele” if the locus is heterozygous in the germline population, also referred to as “reference”, “wild type” or “normal” allele) at the genomic location, and/or the total count of reads at the genomic location of the candidate tumour-specific mutation. Only two of these metrics need to be obtained as the third one can be deduced from any two of these.
- the sequence data may instead or in addition to this include read data or intensity data from which the counts can be obtained.
- information about at least one copy number solution compatible with each sample comprising tumour-genetic material may be obtained.
- This information may comprise allele-specific copy number metrics for the tumour fraction of the sample selected from the major copy number, minor copy number, total copy number, mean B allele frequency (MBAF), log R value and tumour ploidy, and the normal copy number, or information derived from these metrics such as a set of candidate joint genotypes that is compatible with these allele-specific copy number metrics.
- Not all such allele-specific copy number metrics are necessary as some contain redundant information and/or can be associated with suitable default values.
- the normal copy number can be associated with a suitable default value as explained above.
- a copy number solution may be associated with a corresponding confidence metric. When such a metric is not available, each copy number solution may be assumed to be equally likely.
- Each candidate joint genotype comprises a genotype at the location of the tumour-specific mutation for a normal population, a reference tumour population that does not comprise the tumour-specific mutation and a variant tumour cell population that comprises the tumour-specific mutation.
- a posterior probability of a tumour-specific mutation being clonal is determined depending on: a prior probability of the mutation being clonal, and the probabilities of observing the sequence data if the tumour-specific mutation is (i) clonal and (ii) non-clonal, in view of a tumour fraction for each of the one or more samples and one or more candidate joint genotypes. Methods for obtaining such a posterior probability are further described below.
- a prior probability is a probability that represents a belief about a quantity before some evidence is taken into account.
- a prior probability of a mutation being clonal may represent a probability of a mutation being clonal in the tumour, that is based on prior knowledge or assumptions, and does not take into account the sequence data from the mixed sample.
- tumour-specific mutations that satisfy one or more criteria that apply to the results of step 214, and optionally one or more further criteria, are selected. For example, tumour-specific mutations that are associated with a probability of being expressed above a predetermined threshold may be selected.
- tumour-specific mutations that are associated with a probability of being expressed below a predetermined threshold may be excluded.
- tumour-specific mutations that are associated with a power to detect expression that is above a predetermined threshold, and a likelihood of a model assuming that the mutation is not expressed that is above a predetermined threshold may be excluded.
- tumour-specific mutations that are associated with a likelihood of a model assuming that the mutation is expressed that is below a predetermined threshold may be excluded unless they are also associated with a power to detect expression that is below a predetermined threshold.
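One possible way of combining the selection and exclusion criteria above is sketched below. It is an illustrative combination rather than a prescribed rule: mutations with a high probability of expression are kept, mutations that lack support are excluded only when the power to detect expression was adequate, and under-powered mutations are retained with a separate label. The class, function and threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExpressionCall:
    mutation_id: str
    prob_expressed: float  # posterior probability that the mutation is expressed
    power: float           # power to detect expression with the data at hand


def select_candidates(calls, min_prob=0.5, min_power=0.8):
    """Return (mutation_id, label) pairs for mutations that are not excluded."""
    kept = []
    for call in calls:
        if call.prob_expressed >= min_prob:
            kept.append((call.mutation_id, "expressed"))
        elif call.power < min_power:
            # Not supported, but the data could not have shown expression anyway.
            kept.append((call.mutation_id, "low power"))
        # Otherwise: adequately powered and not supported, so excluded.
    return kept


calls = [
    ExpressionCall("mut1", 0.90, 0.95),
    ExpressionCall("mut2", 0.10, 0.95),
    ExpressionCall("mut3", 0.10, 0.20),
]
print(select_candidates(calls))
```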
- One or more further criteria may be applied, which together with the determination of step 214 provide information as to whether the candidate neoantigens are likely to represent true neoantigens.
- One or more further criteria may be applied which provide information as to whether the candidate neoantigen is likely to be clonal in the tumour. For example, tumour-specific mutations associated with a probability of being clonal (such as e.g. as determined at step 212e’) that is above a predetermined threshold may be selected. Any of these criteria may be applied in any order.
- candidate tumour-specific mutations may be filtered depending on whether they are likely to give rise to a neoantigen prior to determining whether the tumour-specific mutation is likely to be clonal, or vice-versa.
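The selection criteria above can be sketched as a simple filter. This is a minimal illustration, not the disclosed implementation: the `MutationCall` record, the function name and all threshold values are hypothetical, and the rule shown (exclude a mutation only when the power to detect expression is high but the probability of expression is low) is one possible reading of the criteria.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MutationCall:
    """Hypothetical per-mutation summary used only for this illustration."""
    mutation_id: str
    p_expressed: float                # probability that the mutation is expressed
    power: float                      # power to detect expression
    p_clonal: Optional[float] = None  # probability of being clonal, if computed


def select_candidates(
    calls: List[MutationCall],
    min_p_expressed: float = 0.5,          # assumed threshold
    min_power: float = 0.8,                # assumed threshold
    min_p_clonal: Optional[float] = None,  # optional further criterion
) -> List[MutationCall]:
    selected = []
    for call in calls:
        well_powered = call.power >= min_power
        looks_expressed = call.p_expressed >= min_p_expressed
        # Exclude only when expression could have been detected but was not;
        # low-power mutations are kept as potential false negatives.
        if well_powered and not looks_expressed:
            continue
        # Optional clonality criterion: a missing value fails the criterion.
        if min_p_clonal is not None and (
            call.p_clonal is None or call.p_clonal < min_p_clonal
        ):
            continue
        selected.append(call)
    return selected
```

Because the criteria may be applied in any order, the two checks inside the loop can be swapped without changing the selected set.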
- an immunotherapy that targets at least one (and optionally a plurality) of the selected candidate neoantigens is designed. Designing such an immunotherapy may comprise identifying at step 218A one or more candidate peptides for each of the candidate clonal neoantigens.
- a plurality of peptides may be designed for at least one of the candidate clonal neoantigens, which differ in their lengths and/or the location of a sequence variation that characterises the neoantigen compared to the corresponding germline peptide.
- the one or more peptides identified may be tested in vitro and/or in silico to evaluate one or more properties such as their immunogenicity, likelihood of being displayed by an MHC molecule, etc.
- one or more of the peptides may be selected, for example based on the results of step 218B.
- the selected peptides may be obtained.
- Peptides with selected sequences may be obtained using any method known in the art such as e.g. using an expression system or by direct synthesis.
- an immunotherapy may be produced using the one or more candidate peptides.
- the immunotherapy may comprise the one or more candidate peptides or material sufficient for their expression (e.g. in the case of an immunogenic composition or vaccine), or may comprise molecules or cells that have been obtained using the candidate peptides (e.g. in the case of therapeutic antibodies that selectively bind the candidate peptides, or immune cells that specifically recognise the candidate peptides).
- the immunotherapy may be administered to a subject, which is preferably the subject from which the samples used to identify the neoantigens have been obtained.
- a population of T cells may be obtained.
- the T cells may be obtained from the subject to be treated, but do not need to be.
- the T cells may be obtained from a tumour sample, from a blood sample, or from any other tissue sample.
- a population of dendritic cells may be obtained.
- a population of dendritic cells may be derived from mononuclear cells (e.g. peripheral blood mononuclear cells, PBMCs) from the subject to be treated.
- the population of dendritic cells may be pulsed with the candidate peptides.
- the T cell population may be selectively expanded using the population of pulsed dendritic cells. Additional expansion factors such as e.g. cytokines or stimulating antibodies may be used.
- the disclosure also provides a T cell composition comprising a T cell population selectively enriched with T cells that recognise one or more neoantigens that are likely to be expressed in a tumour, wherein the one or more neoantigens have been identified using any of the methods described herein.
- Neoantigens that are likely to be expressed may be neoantigens that are selected using the results of a method as described herein, such as e.g.
- the expanded population of neoantigen-reactive T cells may have a higher activity than the population of T cells which have not been expanded, as measured by the response of the T cell population to restimulation with a neoantigen peptide.
- Activity may be measured by cytokine production, wherein a higher activity is a 5-10 fold or greater increase in activity.
- References to a plurality of neoantigens may refer to a plurality of peptides or proteins each comprising a different tumour-specific mutation that gives rise to a neoantigen.
- Said plurality may be from 2 to 250, from 3 to 200, from 4 to 150, or from 5 to 100 tumour-specific mutations, for example from 5 to 75 or from 10 to 50 tumour-specific mutations.
- Each tumour-specific mutation may be represented by one or more neoantigen peptides.
- a plurality of neoantigens may comprise a plurality of different peptides, some of which comprise a sequence that includes the same tumour-specific mutation (for example at different positions within the sequence of the peptide, or within peptides of varying lengths).
- the tumour-specific mutations selected according to the methods described herein may be ones that are determined using a method as described herein to be likely to be expressed in a tumour of the patient to be treated.
- tumour-specific mutations selected according to the methods described herein may be ones that are determined to be likely to be clonal in a tumour of the patient to be treated.
- a T cell population that is produced in accordance with the present disclosure will have an increased number or proportion of T cells that target one or more neoantigens that are predicted to be expressed (and optionally clonal). That is to say, the composition of the T cell population will differ from that of a "native" T cell population (i.e. a population that has not undergone the expansion steps discussed herein), in that the percentage or proportion of T cells that target a neoantigen that is predicted to be expressed (and optionally clonal) will be increased.
- the T cell population according to the disclosure may have at least about 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or 100% T cells that target a neoantigen that is predicted to be expressed.
- the immunotherapies described herein may be used in the treatment of cancer.
- the disclosure also provides a method of treating cancer in a subject comprising administering an immunotherapeutic composition as described herein to the subject.
- the cancer may be ovarian cancer, breast cancer, endometrial cancer, kidney cancer (renal cell), lung cancer (small cell, non-small cell and mesothelioma), brain cancer (gliomas, astrocytomas, glioblastomas), melanoma, merkel cell carcinoma, clear cell renal cell carcinoma (ccRCC), lymphoma, small bowel cancers (duodenal and jejunal), leukemia, pancreatic cancer, hepatobiliary tumours, germ cell cancers, prostate cancer, head and neck cancers, thyroid cancer and sarcomas.
- the cancer may be lung cancer, such as lung adenocarcinoma or lung squamous-cell carcinoma.
- the cancer may be melanoma.
- the cancer may be selected from melanoma, merkel cell carcinoma, renal cancer, non-small cell lung cancer (NSCLC), urothelial carcinoma of the bladder (BLAC) and head and neck squamous cell carcinoma (HNSC) and microsatellite instability (MSI)-high cancers.
- the cancer is non-small cell lung cancer (NSCLC).
- the cancer is melanoma. Treatment using the compositions and methods of the present disclosure may also encompass targeting circulating tumour cells and/or metastases derived from the tumour.
- the methods and uses for treating cancer described herein may be performed in combination with additional cancer therapies.
- the T cell compositions described herein may be administered in combination with immune checkpoint intervention, co-stimulatory antibodies, chemotherapy and/or radiotherapy, targeted therapy or monoclonal antibody therapy.
- 'In combination' may refer to administration of the additional therapy before, at the same time as or after administration of the T cell composition as described herein.
- the disclosure also provides a method for producing an immunotherapeutic composition, the method comprising identifying a neoantigen as likely to be expressed and producing an immunotherapeutic composition that targets the neoantigen.
- Also described herein is a method of treating a subject that has been diagnosed as having cancer, the method comprising: identifying one or more neoantigens by: identifying a plurality of tumour-specific mutations in the subject; determining whether one or more of the tumour-specific mutations is likely to be expressed in the subject; optionally determining whether one or more of the tumour-specific mutations is likely to be clonal in the subject; selecting one or more of the tumour-specific mutations as candidate neoantigens, wherein a candidate neoantigen is a tumour-specific mutation that satisfies at least one or more predetermined criteria on whether the tumour-specific mutation is likely to be expressed; and treating the subject with an immunotherapy that targets one or more of the selected candidate neoantigens; wherein determining whether a tumour-specific mutation is likely to be expressed in a subject is performed using the methods described herein.
- determining whether a tumour-specific mutation is likely to be expressed in a subject may comprise: obtaining, by a processor, RNA sequence data from one or more samples from the subject comprising tumour genetic material, the sequence data comprising for each of the one or more samples, at least two of: the number of RNA sequence reads in the sample that show the tumour-specific mutation (b), the number of RNA sequence reads in the sample that show the corresponding germline allele, and the total number of RNA sequence reads at the location of the tumour-specific mutation (d), and determining, by the processor, the posterior probability that the tumour-specific mutation is expressed depending on: the mean of a prior probability of the mutation being expressed, and the likelihoods of the sequence data if the tumour-specific mutation is (i) expressed and (ii) not expressed, wherein the likelihoods of the sequence data are conditional on a tumour fraction for each of the one or more samples, and a fraction of the total expression at the locus of the tumour-specific mutation that is assumed to come from a normal population of cells that does not
- the method may further comprise determining whether a tumour-specific mutation is likely to be clonal in the subject by: obtaining, by a processor, genomic sequence data from one or more samples from the subject comprising tumour genetic material, the sequence data comprising for each of the one or more samples, at least two of: the number of reads in the sample that show the tumour-specific mutation ($d_b$), the number of reads in the sample that show the corresponding germline allele, and the total number of reads at the location of the tumour-specific mutation ($d$), and determining, by the processor, a posterior probability that the tumour-specific mutation is clonal depending on: a prior probability of the mutation being clonal, and the probabilities of observing the sequence data if the tumour-specific mutation is (i) clonal and (ii) non-clonal, in view of a tumour fraction for each of the one or more samples and one or more candidate joint genotypes each comprising a genotype at the location of the tumour-specific mutation for a normal population, a reference tumour population
- the candidate neoantigens may be selected as tumour-specific mutations that further satisfy at least one or more predetermined criteria on whether the tumour-specific mutation is likely to be clonal and/or to give rise to a neoantigen.
- the step of selecting, by said processor, one or more of the tumour-specific mutations as candidate neoantigens may comprise determining whether the one or more tumour specific mutations satisfy one or more criteria on whether the tumour-specific mutation is likely to give rise to a neoantigen selected from: the mutation being predicted to result in a protein or peptide that is not expressed in the normal cells of the subject, the mutation being predicted to result in at least one peptide that is likely to be presented by an MHC molecule, the mutation being predicted to result in at least one peptide that is likely to be presented by an MHC allele that is known to be present in the subject, the mutation being likely to be clonal, and the mutation being predicted to result in a protein or peptide that is immunogenic.
- the step of selecting, by said processor, one or more of the tumour-specific mutations as candidate neoantigens may comprise determining, by said processor, whether the one or more tumour specific mutations satisfy one or more predetermined criteria on whether the tumour-specific mutation is likely to be clonal selected from: the mutation having a likelihood of being clonal above a predetermined threshold, the mutation having a likelihood of being clonal that is above a threshold set adaptively to select a predetermined number of tumour-specific mutations with the highest likelihoods of being clonal amongst the tumour-specific mutations for which a likelihood was determined, and having a likelihood of being clonal that is above a threshold set adaptively to select a predetermined top percentile of tumour-specific mutations amongst the tumour-specific mutations for which a likelihood was determined.
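The three clonality criteria listed above (a fixed threshold, a threshold set adaptively to keep a predetermined number of mutations, and a threshold set adaptively to keep a top percentile) can be sketched as follows. The function names and the dictionary input are assumptions made for illustration only.

```python
import math
from typing import Dict, List


def select_above_threshold(p_clonal: Dict[str, float], threshold: float) -> List[str]:
    """Keep mutations whose likelihood of being clonal exceeds a fixed threshold."""
    return [mutation for mutation, p in p_clonal.items() if p > threshold]


def select_top_n(p_clonal: Dict[str, float], n: int) -> List[str]:
    """Adaptive threshold: keep the n mutations with the highest likelihoods."""
    ranked = sorted(p_clonal, key=p_clonal.get, reverse=True)
    return ranked[:n]


def select_top_fraction(p_clonal: Dict[str, float], fraction: float) -> List[str]:
    """Adaptive threshold: keep a top fraction (e.g. 0.25 for the top 25%)."""
    n = math.ceil(len(p_clonal) * fraction)
    return select_top_n(p_clonal, n)
```

The adaptive variants only consider mutations for which a likelihood was actually determined, since only those appear as keys of the input dictionary.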
- the immunotherapy that targets the one or more of the selected neoantigens may be an immunogenic composition, a composition comprising immune cells or a therapeutic antibody.
- the immunotherapy may be a composition comprising T cells that recognise at least one of the one or more of the selected neoantigens identified.
- the composition may be enriched for T cells that target at least one of the one or more of the selected neoantigens identified.
- the method may comprise obtaining a population of T cells and expanding the population of T cells to increase the number or relative proportion of T cells that target at least one of the one or more of the selected neoantigens identified.
- Determining a posterior probability that a candidate tumour-specific mutation is clonal (depending on: a prior probability of the mutation being clonal, and the probabilities of observing the sequence data if the tumour-specific mutation is (i) clonal and (ii) non-clonal, in view of a tumour fraction for each of the one or more samples and one or more candidate joint genotypes each comprising a genotype at the location of the tumour-specific mutation for a normal population, a reference tumour population that does not comprise the tumour-specific mutation and a variant tumour cell population that comprises the tumour-specific mutation) may be performed using the approach described in the following section.
Identification of clonal mutations
- Embodiments of the methods described herein may comprise determining whether a tumour-specific mutation is likely to be clonal.
- each mutation divides the set of cells that were sequenced into three sub-populations: (i) the normal cell population consisting of cells with healthy germline genomes (likely diploid in the region of the mutation); (ii) the reference cell population which consists of cancer cells without the mutation in question (may be aneuploid in the region of the mutation in question); and (iii) the variant cell population which consists of cancer cells with the mutation in question (may be aneuploid in the region of the mutation in question, may not have the same copy number in said region as the reference population).
- the term “mutation” is intended here in its broadest sense to refer to any genetic alteration that is detectable in sequence data, and particularly genomic sequence data.
- Let $\mathcal{G} = \{A, B, AA, AB, AAA, AABB, \ldots\}$ be the set of all genotypes, where A and B represent reference and variant alleles respectively.
- AB would represent a heterozygous variant (comprising one reference/normal allele A and one variant allele B) with total copy number 2.
- the normal population has the genotype AA (where both A can be the same or different, i.e. the normal population may be homozygous or heterozygous, but both alleles are normal)
- the reference population has the genotype AAA (where the A alleles are selected from the A alleles of the normal population)
- the variant population has the genotype AABB (where the A alleles are selected from the A alleles of the normal population and the B alleles are any non-reference alleles).
- the genotype of all cells within each sub-population is constant (i.e. by reference to Figure 4, all cells in the normal population have the genotype AA, all cells in the reference population have the genotype AAA, and all cells in the variant population have the genotype AABB).
- Let $G = (G_H, G_R, G_V) \in \mathcal{G}^3$ be a vector where the entries are the genotype of the normal (healthy), reference and variant populations respectively (each of these individual genotypes will be referred to generically as “G” below).
- Let $t$ be the proportion of cancer cells in the sample. This is often referred to as the tumour content, tumour purity or cellularity of the sample.
- Let $\phi$ be the proportion of cancer cells harbouring the mutation in the sample, that is the relative proportion of cancer cells in the variant population. This is often referred to as the cancer cell fraction (CCF) or cellular prevalence of the mutation.
- Let $\varepsilon$ be the assumed sequencing error rate.
- Let $\xi(G, \phi, t)$ be the probability of sampling a read with the variant allele. Assuming that we have an infinite initial population of cells which are sampled when sequencing, the probability of sampling a read with a variant allele is roughly proportional to the number of copies of the variant allele in the input pool of DNA.
- the variable $\xi(G, \phi, t)$ captures the sum of the number of copies of the variant allele originating from each genotype multiplied by the probability of sampling a read with a mutation from the genotype, normalised by the sum of the total number of copies of both alleles originating from each genotype.
- the variable $d$ is the total number of reads covering the mutation in the sample, of which $d_b$ contain the mutant allele.
- the probability of observing these numbers of reads, $P(d, d_b \mid G, \phi, t)$, can be modelled using a Binomial distribution. A Beta-binomial model with mean $\xi(G, \phi, t)$ and precision (inverse of variance) $\gamma$ (equation (4)) can be used instead, for example if the data has more variance than can be explained by a Binomial model:
$$P(d, d_b \mid G, \phi, t) = \mathrm{Binomial}(d_b \mid d, \xi(G, \phi, t))$$
$$P(d, d_b \mid G, \phi, t, \gamma) = \mathrm{BetaBinomial}(d_b \mid d, \xi(G, \phi, t), \gamma) \quad (4)$$
- the parameter $\gamma$ is set to 200 in the examples below, though other values are possible.
- $\phi$ and $t$ are associated with individual samples, so the notation above is a shorthand for $\phi_s$ and $t_s$, respectively.
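The read-count model described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the disclosed implementation: the function names are hypothetical, genotypes are encoded as strings of `A`/`B`, the per-genotype sampling probability `mu` uses the error rate for genotypes with no variant allele (and one minus the error rate for genotypes with only variant alleles), and the Beta-binomial is parameterised with `alpha = mean * precision` and `beta = (1 - mean) * precision`, which is one common convention and not confirmed by the text.

```python
import math
from typing import Tuple


def mu(genotype: str, eps: float) -> float:
    """Assumed probability of sampling a variant read from one genotype."""
    n_variant = genotype.count("B")
    if n_variant == 0:
        return eps            # variant reads arise only through sequencing error
    if n_variant == len(genotype):
        return 1.0 - eps      # reference reads arise only through sequencing error
    return n_variant / len(genotype)


def xi(joint_genotype: Tuple[str, str, str], phi: float, t: float,
       eps: float = 1e-3) -> float:
    """Probability of sampling a variant read from the mixed sample.

    joint_genotype is (normal, reference, variant); phi is the cancer cell
    fraction and t the tumour content.
    """
    g_normal, g_reference, g_variant = joint_genotype
    # Each population contributes copies in proportion to its share of cells
    # multiplied by its total copy number at the locus.
    weights = (
        (1.0 - t) * len(g_normal),
        t * (1.0 - phi) * len(g_reference),
        t * phi * len(g_variant),
    )
    genotypes = (g_normal, g_reference, g_variant)
    numerator = sum(w * mu(g, eps) for w, g in zip(weights, genotypes))
    return numerator / sum(weights)


def log_choose(n: int, k: int) -> float:
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def binomial_pmf(d_b: int, d: int, p: float) -> float:
    """Binomial(d_b | d, p); requires 0 < p < 1."""
    return math.exp(log_choose(d, d_b) + d_b * math.log(p)
                    + (d - d_b) * math.log1p(-p))


def log_beta(a: float, b: float) -> float:
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(d_b: int, d: int, mean: float,
                      precision: float = 200.0) -> float:
    """BetaBinomial(d_b | d, mean, precision); requires 0 < mean < 1."""
    a, b = mean * precision, (1.0 - mean) * precision
    return math.exp(log_choose(d, d_b)
                    + log_beta(d_b + a, d - d_b + b) - log_beta(a, b))
```

For instance, `beta_binomial_pmf(12, 80, xi(("AA", "AA", "AB"), phi=0.6, t=0.5))` evaluates the Beta-binomial density for 12 variant reads out of 80 under a heterozygous diploid variant population.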
Eliciting mutational genotype priors
- The above model uses either a known joint genotype, or prior probabilities $\pi$, where $\pi_i$ is the prior probability of the $i$-th plausible joint genotype, $G_i$, of the populations (i.e. $G_i$ is one possible combination of genotypes for the healthy, variant and reference populations).
- Various methods can be used to set potential genotype priors. Note that the same principles apply to the methods for determining whether a tumour-specific mutation is likely to be expressed, with the difference that the joint genotypes G refer to the genotypes of the tumour (variant) and normal (no variant / germline) population (i.e. there is no reference tumour population).
- one possible method can be referred to as the “major copy number” method.
- Let $c_{major}$ and $c_{minor}$ denote the major and minor allele copy number for the region overlapping the mutation in the tumour sample.
- the “major copy number” method considers two cases: (a) In the first case, the mutation occurs before the copy number event. In this case the reference population genotype matches the normal population. We consider all possible mutational genotypes for the variant population with up to $c_{major}$ chromosomes containing the variant. (b) In the second case, the mutation occurs after the copy number event. In this case the reference population has $c_{major} + c_{minor}$ reference alleles. The variant population has 1 variant allele and $c_{major} + c_{minor} - 1$ reference alleles.
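- For example, with $c_{major} = 2$ and $c_{minor} = 1$, the possible joint genotypes are $G_1 = (AA, AA, AAB)$, $G_2 = (AA, AA, ABB)$ and $G_3 = (AA, AAA, AAB)$, each with a prior probability of 1/3.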
- $c_{major}$ can be set to the total copy number and $c_{minor}$ to zero. This approach assumes that a mutation occurs only once, such that if more than one copy of the mutant allele is present in the variant population, then this occurred because the mutation preceded a copy number change at the locus and was subsequently amplified.
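The two cases of the “major copy number” method can be enumerated programmatically. The sketch below is illustrative only: the function name and the string encoding of genotypes are assumptions, the normal population is assumed to be `AA`, and a uniform prior over the distinct joint genotypes is assumed (consistent with the 1/3 priors in the three-genotype example).

```python
from typing import List, Tuple

JointGenotype = Tuple[str, str, str]  # (normal, reference, variant)


def major_copy_number_priors(
    c_major: int, c_minor: int, normal: str = "AA"
) -> List[Tuple[JointGenotype, float]]:
    total = c_major + c_minor
    joint: List[JointGenotype] = []

    # Case (a): mutation precedes the copy number event. The reference
    # population matches the normal population, and the variant population
    # carries between 1 and c_major copies of the variant allele.
    for n_variant in range(1, c_major + 1):
        variant = "A" * (total - n_variant) + "B" * n_variant
        joint.append((normal, normal, variant))

    # Case (b): mutation follows the copy number event. The reference
    # population has `total` reference alleles, and the variant population
    # has exactly one variant allele.
    joint.append((normal, "A" * total, "A" * (total - 1) + "B"))

    # Remove duplicates (cases (a) and (b) coincide for a diploid segment)
    # while preserving order, then assign a uniform prior.
    distinct = list(dict.fromkeys(joint))
    prior = 1.0 / len(distinct)
    return [(g, prior) for g in distinct]
```

Setting `c_major` to the total copy number and `c_minor` to zero, as described above, makes case (a) range over every possible number of variant copies.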
- genotype of the variant population has the predicted total copy number at the region of the mutation, with at least one mutant allele, and that the reference population is either AA or the genotype with a copy number equal to the predicted total copy number and no variant allele (with equal probability).
- total copy number prior indicates that the genotype of the variant population at the locus has the predicted total copy number and may have any number (>0) of copies of the mutant allele (i.e.
- the “major copy number” approach “trusts” the range of the possible major copies, but not the absolute value of it, by considering all values between 1 and the predicted major copy number.
Clonality estimation model
- A hierarchical Bayesian model may be used based on the above for identifying ubiquitous mutations. Let $Z$ be a Bernoulli variable which is one when a mutation is ubiquitous (assumed to be clonal) and zero otherwise. Let $\rho$ be the prior probability that the mutation is ubiquitous. This is set to 0.5 in the examples below. As above, $\phi$ is the proportion of cancer cells harbouring the mutation in the sample.
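- the model can be expressed as a hierarchy in which $Z$ follows a Bernoulli distribution with parameter $\rho$, the proportion $\phi$ follows a beta distribution conditional on $Z$ (equation (106)), and the read counts are distributed according to equation (107). The probability in equation (107) is given by equations (103)/(103a) or (104)/(104a).
- the posterior probability that the mutation is ubiquitous then follows from Bayes' rule:
$$\Pr(Z = 1 \mid d_b, d, t, \pi) = \frac{\rho \; p(d_b, d \mid Z = 1, t, \pi)}{\rho \; p(d_b, d \mid Z = 1, t, \pi) + (1 - \rho) \; p(d_b, d \mid Z = 0, t, \pi)}$$
- the proportion of cancer cells harbouring the mutation ($\phi$) is unknown, and is therefore integrated out:
$$p(d_b, d \mid Z = z, t, \pi) = \int_0^1 p(d_b, d \mid \phi, t, \pi) \; p(\phi \mid Z = z) \, d\phi$$
where $p(d_b, d \mid \phi, t, \pi)$ is given by equation (110) and $p(\phi \mid Z = z)$ is given by the beta distributions in equation (106).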
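The posterior probability of clonality can be approximated numerically by integrating the likelihood over the unknown proportion of cancer cells harbouring the mutation. The sketch below is a hedged illustration: the function names are hypothetical, the Beta shape parameters for the clonal and non-clonal cases are placeholders chosen for the example (the values used in the disclosure are not reproduced here), and the example likelihood assumes a single sample with a heterozygous diploid variant population.

```python
import math
from typing import Callable, Tuple


def beta_pdf(x: float, a: float, b: float) -> float:
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))


def binomial_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def marginal_likelihood(likelihood: Callable[[float], float],
                        a: float, b: float, n_grid: int = 2000) -> float:
    """Midpoint-rule approximation of the integral over the cancer cell fraction."""
    step = 1.0 / n_grid
    total = 0.0
    for i in range(n_grid):
        phi = (i + 0.5) * step  # midpoints avoid the endpoints 0 and 1
        total += likelihood(phi) * beta_pdf(phi, a, b) * step
    return total


def posterior_clonal(
    likelihood: Callable[[float], float],
    prior_clonal: float = 0.5,                       # value used in the examples
    clonal_shape: Tuple[float, float] = (50.0, 1.0),     # placeholder Beta shape
    subclonal_shape: Tuple[float, float] = (1.0, 1.0),   # placeholder Beta shape
) -> float:
    m_clonal = marginal_likelihood(likelihood, *clonal_shape)
    m_subclonal = marginal_likelihood(likelihood, *subclonal_shape)
    numerator = prior_clonal * m_clonal
    return numerator / (numerator + (1.0 - prior_clonal) * m_subclonal)


def example_likelihood(d_b: int, d: int, t: float) -> Callable[[float], float]:
    """Assumed likelihood: heterozygous diploid variant, no sequencing error."""
    return lambda phi: binomial_pmf(d_b, d, t * phi / 2.0)
```

With a pure tumour sample, 50 variant reads out of 100 support a cancer cell fraction near 1 and a high posterior, whereas 10 variant reads out of 100 support a low cancer cell fraction and a posterior near zero. For several samples, the per-sample marginal likelihoods would need to be combined, which this single-sample sketch does not attempt.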
- ASCAT evaluates a plurality of possible combinations of tumour ploidy and tumour fractions, based on the assumption that the associated allele-specific copy number calls should be as close as possible to nonnegative whole numbers for germline heterozygous single nucleotide polymorphisms (SNPs).
- a solution deemed optimal is then reported (estimated tumour ploidy, tumour purity and allele-specific copy number calls for the tumour and normal part of the sample) together with its goodness-of-fit (based on the above assumption).
- the model provided above can be adjusted to accommodate multiple copy number solutions and their uncertainties, by modifying $\pi$ to contain entries for the genotypes from each predicted copy number state (e.g. each proposed solution comprising a major and minor copy state), weighted by the probability associated with this state.
- where a tumour purity estimate is obtained together with these copy number states (as is the case e.g. when an approach like ASCAT is used), the associated tumour purity estimate can also be taken into account. Note that this may not be necessary when e.g. the tumour purity is estimated or measured separately and is not intrinsically associated with the copy number state estimate.
- Let $\pi_C$ be a vector where each entry is the probability for each possible such set of estimates. For each state $C$, it is possible to compute the vector $\pi_{CG}$ of possible genotypes as explained above.
- a final genotype vector can thus be obtained by multiplying $\pi_{CG}$ by the entry for state $C$ in $\pi_C$.
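- the densities above then become mixtures over the candidate joint genotypes:
$$P(d, d_b \mid \phi, \pi) = \sum_i \pi_i \, \mathrm{Binomial}(d_b \mid d, \xi(G_i, \phi, t_i))$$
$$P(d, d_b \mid \phi, \pi, \gamma) = \sum_i \pi_i \, \mathrm{BetaBinomial}(d_b \mid d, \xi(G_i, \phi, t_i), \gamma)$$
where the tumour content $t_i$ may now depend on the particular state (and the $\pi_i$ are elements of the vector $\pi$ obtained by multiplying $\pi_{CG}$ by the entry for state $C$ in $\pi_C$). These new densities can be substituted in the relevant equations above.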
- the problem solved may then be expressed as solving equation (111a), where Pr( ⁇ ⁇ , ⁇
- the values for $t_i$, $c_{major}$, $c_{minor}$ (and hence the compatible $\pi_{CG}$ according to the model used) and $\pi_C$ are provided as outputs of many methods for performing allele-specific copy number analysis of tumours, including but not limited to ASCAT, as explained above.
- the major and minor copy number overlapping the mutation for the tumour population, for a specified copy number solution can be obtained directly from ASCAT (e.g. using ascatNgs, Raine et al., 2016), or derived from the output of e.g. ASCAT such as using the mean B allele frequency of the copy number segment overlapping the mutation, the log R value of the copy number segment overlapping the mutation, and the ploidy of the solution.
- a probability of each solution may optionally be provided (this can also be obtained from the output of e.g. ASCAT which provides a negative log likelihood for a solution). If this is not provided, then all of a plurality of solutions may be treated as equally likely and receive equal weight.
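Weighting the genotype priors by copy number solution can be sketched as below. The function names and data layout are hypothetical. Turning negative log likelihoods into weights by exponentiating and normalising is an assumption about how such values might be used, not a description of ASCAT's output format or of the disclosed script.

```python
import math
from typing import List, Optional, Sequence, Tuple

JointGenotype = Tuple[str, str, str]  # (normal, reference, variant)
# One copy number solution: (tumour purity, [(joint genotype, prior), ...])
Solution = Tuple[float, List[Tuple[JointGenotype, float]]]


def solution_weights(
    neg_log_likelihoods: Optional[Sequence[float]] = None,
    n_solutions: Optional[int] = None,
) -> List[float]:
    """Return one weight per copy number solution, summing to 1."""
    if neg_log_likelihoods is None:
        # No probabilities supplied: every solution receives equal weight.
        return [1.0 / n_solutions] * n_solutions
    best = min(neg_log_likelihoods)
    # Subtract the minimum before exponentiating for numerical stability.
    unnormalised = [math.exp(-(nll - best)) for nll in neg_log_likelihoods]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def combine_genotype_priors(
    solutions: Sequence[Solution], weights: Sequence[float]
) -> List[Tuple[JointGenotype, float, float]]:
    """Flatten solutions into (joint genotype, purity, weighted prior) entries."""
    combined = []
    for (purity, genotype_priors), weight in zip(solutions, weights):
        for joint_genotype, prior in genotype_priors:
            combined.append((joint_genotype, purity, weight * prior))
    return combined
```

Each entry of the combined list carries its own purity value, so the tumour content used in the likelihood can depend on the copy number solution that produced the genotype.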
- the script may produce as output a mutation identifier and posterior probability that the mutation is clonal/ubiquitous.
- a Python script implementing the above method may be used, taking as input for each mutation: a mutation identifier, a sample identifier, a count of the number of RNA reads that match the reference allele at the mutation position, a count of the number of RNA reads that match the mutant allele at the mutation position, and, a genotype for the normal and tumour cells (e.g. obtained using ASCAT and/or any approach described herein for inferring joint genotypes for tumour and normal populations from mixed samples), and a tumour purity value (optionally associated with the specific joint genotype, such as e.g. obtained as an output of e.g. ASCAT).
- the script may produce as output a mutation identifier and posterior probability that the mutation is expressed.
Systems
- Figure 3 shows an embodiment of a system for determining whether a tumour-specific mutation is likely to be expressed, and/or identifying neoantigens and/or for providing an immunotherapy based at least in part on the identified neoantigens, according to the present disclosure.
- the system comprises a computing device 1, which comprises a processor 101 and computer readable memory 102.
- the computing device 1 also comprises a user interface 103, which is illustrated as a screen but may include any other means of conveying information to a user such as e.g. through audible or visual signals.
- the computing device 1 is communicably connected, such as e.g.
- the computing device may be a smartphone, tablet, personal computer or other computing device.
- the computing device is configured to implement a method for determining whether a tumour specific mutation is likely to be expressed, as described herein.
- the computing device 1 is configured to communicate with a remote computing device (not shown), which is itself configured to implement a method of determining whether a tumour specific mutation is likely to be expressed, as described herein.
- the remote computing device may also be configured to send the result of the method to the computing device.
- Communication between the computing device 1 and the remote computing device may be through a wired or wireless connection, and may occur over a local or public network such as e.g. over the public internet or over WiFi.
- the sequence data acquisition means 3 may be in wired connection with the computing device 1, or may be able to communicate through a wireless connection, such as e.g. through a network 6, as illustrated.
- the connection between the computing device 1 and the sequence data acquisition means 3 may be direct or indirect (such as e.g. through a remote computer).
- the sequence data acquisition means 3 are configured to acquire sequence data from nucleic acid samples, for example RNA samples and also optionally genomic DNA samples extracted from cells and/or tissue samples.
- the sample may have been subject to one or more preprocessing steps such as RNA purification, fragmentation, library preparation, target sequence capture (such as e.g. exon capture and/or panel sequence capture).
- the sample has not been subject to amplification, or when it has been subject to amplification this was done in the presence of amplification bias controlling means such as e.g. using unique molecular identifiers.
- Any sample preparation process that is suitable for use in the determination of a genomic copy number profile (whether whole genome or sequence specific) may be used within the context of the present disclosure.
- the sequence data acquisition means is preferably a next generation sequencer.
- the sequence data acquisition means 3 may be in direct or indirect connection with one or more databases 2, on which sequence data (raw or partially processed) may be stored.
- the following is presented by way of example and is not to be construed as a limitation to the scope of the claims.
EXAMPLES
- These examples describe a method of identifying clonal mutations according to the present disclosure, and demonstrate its use using simulated data and multiple types of experimental data.
Introduction
- Allele expression (AE, also referred to herein as ‘allele-specific expression’, ASE) has been reported as a predictor of immunogenicity (Gartner et al. 2021).
- the power to detect its expression depends on factors such as tumour purity and gene expression level, and without a measure of that power we cannot distinguish between a lack of ASE and a lack of power to detect ASE. Additionally, immunogenic mutations without ASE could potentially be due to immuno-editing (the tumour cells have repressed the expression to avoid immune detection). This is important because, with a measure of the power to detect expression, mutations without detected ASE can be separated into those that are 1) potentially false negatives (low power to detect) and 2) potentially immuno-edited (high power to detect).
- the inventors present a series of statistical methods called ALExA (Achilles Likelihood of an Expressed Allele) for evaluation of allele specific expression in a tumour context.
- the inventors devised a method that calculates the power of detecting a mutation as being expressed, accounting for tumour purity, genotype and gene expression (Example 1).
- the inventors further present a statistical method that estimates a probability of a mutation being expressed given RNAseq read count data (Example 2), using the concepts developed in Example 1.
Example 1 - Power to detect allele-specific expression in a tumour
- This section presents a model for the number of variant reads that are expected to be observed if a variant is expressed or is not expressed (illustrated on Figure 4).
- a prior on ⁇ for instance a Beta distribution
- the prior on ⁇ may be for example obtained from analysis of DNA sequence data as explained above.
- the variant cell population which consists of cancer cells with the mutation in question may be aneuploid in the region of the mutation in question, may not have the same copy number in said region as the normal population.
- the term “mutation” is intended here in its broadest sense to refer to any genetic alteration that is detectable in sequence data, and particularly genomic sequence data.
- examples include single nucleotide variants (SNVs), multiple nucleotide variants (MNVs), indels, etc.
- the present methods relate to the detection of variants present in a tumour population and therefore the mutations are somatic mutations.
- Let G = {A, B, AA, AB, AAA, AABB, ...} be the set of all genotypes, where A and B represent reference and variant alleles respectively.
- AB would represent a heterozygous variant (comprising one reference/normal allele A and one variant allele B) with total copy number 2.
- the normal population has the genotype AA (where both A can be the same or different, i.e. the normal population may be homozygous or heterozygous, but both alleles are normal), and a tumour population may for example have the genotype AABB (where the A alleles are selected from the A alleles of the normal population and the B alleles are any non-reference alleles).
- - d is the total number of reads covering the mutation in the sample
- - b is the number of reads containing the mutation in the sample
- - ⁇ is the assumed sequencing error rate
- - c(G): G ⁇ N is a function which maps a genotype to the total copy number at the locus (i.e.
- DNA sequence data associated with the sample(s) under investigation
- the reference ratio (reference read count / total read count) is the fraction of total read counts attributed to the reference allele; this parameter is integrated over, so its exact value is not determined
- α is the fraction of total expression of the gene comprising the tumour-specific mutation which is due to the normal cell population (e.g. TPM normal / (TPM normal + TPM tumour), where TPM is transcripts per million, a common expression metric in RNA sequencing); the value of this parameter can be assumed, known or estimated from e.g.
- i.e. the probability that a read picked from a sequenced cell population with genotype G will carry the mutation of interest (m = 1).
- RNA-seq Probability of sampling a read with the mutation from a population with genotype G
- the model presented herein models the process of sampling RNA-seq reads with a mutation of interest from a tumour sample (where a tumour sample is assumed to comprise a mixed population that may include cells with at least some copies of the mutation – i.e. tumour cells - and cells without any copies of the mutation – i.e. normal cells), allowing for different allelic expression of mutant and reference alleles.
- the probability that a read picked from a sequenced cell population with genotype G will carry the mutation of interest (m = 1)) provided by equation (1), where the first case captures a variant population that contains at least one reference allele, where it is expected that b(GV) > 0, the second case captures the normal population, and the third case captures a variant population that does not contain any reference alleles.
- the probability of sampling a read with the variant allele from a population that has at least one variant allele and at least one reference allele is equal to (1 − reference ratio) × (1 − sequencing error rate) + reference ratio × sequencing error rate
- the probability of sampling a read with the variant allele from a population that has no variant allele is equal to the sequencing error rate
- the probability of sampling a read with the variant allele from a population that has no reference allele is equal to 1 − the sequencing error rate.
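As an illustration, the three cases above can be sketched in Python. This is a minimal sketch rather than the implementation used herein: the function and argument names are hypothetical, the reference ratio is passed as a fixed number (whereas the model integrates over it), and the mixed case assumes that a variant read arises either from a correctly read variant allele or from a misread reference allele.

```python
def variant_read_probability(n_ref_alleles, n_var_alleles, ref_ratio, error_rate):
    """Probability that one sampled read shows the variant allele.

    All names are illustrative assumptions, not notation from the source.
    """
    if n_var_alleles == 0:
        # Normal population: a variant read can only arise from a sequencing error.
        return error_rate
    if n_ref_alleles == 0:
        # Variant population without reference alleles: the read shows the
        # variant unless it is misread.
        return 1.0 - error_rate
    # Mixed population: variant allele read correctly, or reference allele misread.
    return (1.0 - ref_ratio) * (1.0 - error_rate) + ref_ratio * error_rate


# Illustrative values only.
print(variant_read_probability(2, 0, 0.5, 0.01))  # normal population (AA)
print(variant_read_probability(1, 1, 0.5, 0.01))  # heterozygous variant (AB)
print(variant_read_probability(0, 2, 0.5, 0.01))  # no reference allele (BB)
```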
- transitions / transversions or individual mutations e.g. A-> C, A-> G, A->T).
- aligning and calling the variant allele may be more prone to error, compared to the reference allele.
- reads aligning to variant alleles may be undercounted in some situations such as for some indel variants. This can be addressed by adding a bias term in the model to correct the observed number of variant reads, for example depending on the expected proportion of reads that may fail to align to the variant.
- the sequencing error rate was additionally set differently for the null model (M0) and the alternative model (M1).
- a single value may alternatively be used for both models. Note that both b and d are observed variables, but the approach only models b as d is treated as a parameter of the model.
- model M0 is used to set bc (which is the value such that we reject H0 if b ≥ bc with a chosen significance level given by P(b ≥ bc
- This approach may be used to determine whether a mutation is likely to be actually expressed (and if it is determined to be unlikely to be expressed, whether that is because the power to detect expression with the sequence data at hand is low), to detect mutations from RNA data (e.g. in particular for mutations that are only or advantageously detectable from RNA sequence data such as e.g.
- the approach may be used to determine the power to detect a mutation as being expressed given a plurality of candidate sequencing depths (resulting in a corresponding coverage value d and possible values of b depending on the expected ratio of expression of the normal and variant alleles and the tumour purity and genotype), and a depth that satisfies b > bc at a significance level P(b ≥ bc
- α is the cell expression ratio
- ⁇ is the sequencing error rate
- M0 and M1 are the null and alternative models, with different priors for the reference ratio ⁇ .
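The hypothesis test described above can be sketched as follows. This is a simplified illustration under stated assumptions: the per-read probability of observing the variant is treated as a fixed value under each model (`p_null` and `p_alt`, both hypothetical names), so the likelihood is a plain binomial, whereas models M0 and M1 place priors on the reference ratio. The coverage d is treated as fixed, and the direct summation is only suitable for moderate depths.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_value(d, p_null, significance=0.05):
    """Smallest b_c such that P(b >= b_c | null) <= significance."""
    for b_c in range(d + 2):
        # At b_c = d + 1 the tail is empty (probability 0), so the loop always returns.
        if binom_upper_tail(b_c, d, p_null) <= significance:
            return b_c


def detection_power(d, p_null, p_alt, significance=0.05):
    """Return (b_c, power), where power = P(b >= b_c | alternative)."""
    b_c = critical_value(d, p_null, significance)
    return b_c, binom_upper_tail(b_c, d, p_alt)


# Illustrative values only: compare candidate sequencing depths.
for depth in (10, 50, 100):
    b_c, pw = detection_power(depth, p_null=0.01, p_alt=0.2)
    print(f"d={depth}: b_c={b_c}, power={pw:.3f}")
```

Sweeping the depth in this way mirrors the use described above, where candidate sequencing depths are compared by the power they provide at a chosen significance level.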
- the variant and normal genotypes (GV and GN) and the tumour purity t were set to their MAP estimate. The 100% sample can be used to assess the ground truth number of mutations, and whether they are expressed. Two replicate experiments were analysed in an identical manner.
- the variant genotypes are ‘ground truth’ – estimates of major and minor copy numbers for each mutation at 100% purity, using modified Sequenza. The purity is estimated.
- the purity and genotypes were jointly estimated using Sequenza (using Sequenza to determine the probability that a tumour has a particular purity/ploidy value along a grid of possible values, and assigning genotypes for each variant given the associated LRR, BAF and ploidy value). The model was not run on estimated genotypes and estimated purity because part of the aim for the different data sets above was to investigate the effect of changing one variable at a time.
- α was set to 0.5 (assuming that the tumour and normal cells contribute equal amounts to the total expression at the locus).
- - bc the critical value for the test of mutation expression: if the number of variant reads at that mutation is at least equal to bc (b ≥ bc), then the mutation is considered to be detected (i.e.
- the false positive (FP) and false negative (FN) rates were also calculated, using the ground truth data (i.e. the 221 mutations that are known to be expressed in the 100% sample and the 112 mutations that are known not to be expressed in the 100% sample).
- the TN was very similar across all cases as all use the same false positive rate ( ⁇ 0.05).
- the distribution of estimated power, grouped by the different outcomes (TP, FP, FN, TN) for each of the tumour purities assessed, is shown on Figures 9A-E for each of the data sets (replicate 2).
- the data for replicate 1 is similar and not shown for brevity. This shows that for the mutations that are truly expressed (FN+TP), the power is lower for false negatives (FN) than for true positives (TP), indicating that the method is able to identify cases where the data provides low power to detect a mutation (and hence absence of its detection does not necessarily mean absence of expression).
- Example 2 Estimation of the alpha parameter
- α the fraction of total expression at the locus which is due to the normal cell population (e.g. TPMnormal/(TPMnormal + TPMtumour)), used in Example 1.
- the value of the parameter α is typically not known for a particular sample. In the simplest implementation, this can be set to a suitable default value, such as 0.5. This assumes that the normal and tumour cell populations contribute equally to the total expression signal (e.g. number of reads) at the locus. However, intuitively there should be a relationship between this parameter and tumour purity.
- TPMi is the TPM value for gene i:
- TPMi = (ri / li) / Σj (rj / lj) × 10^6 (20)
- ri / li is the number of reads that map to gene i (ri) divided by the length of gene i (li).
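For illustration, the standard transcripts-per-million calculation can be written in a few lines of Python. This is a minimal sketch: the function name and the toy read counts and gene lengths are hypothetical.

```python
def tpm(read_counts, gene_lengths):
    """Transcripts per million from per-gene read counts and gene lengths."""
    # Length-normalised read count for each gene.
    rates = [r / l for r, l in zip(read_counts, gene_lengths)]
    total = sum(rates)
    # Scale so that the values sum to one million across genes.
    return [rate / total * 1e6 for rate in rates]


# Toy example: three genes with made-up read counts and lengths (in bases).
values = tpm([100, 300, 600], [1000, 1500, 3000])
print(values)
print(sum(values))
```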
- The superscripts T and N will be used to refer to the reads from the tumour cells and the reads from the normal cells, respectively.
- - the sample comprised n cells, with (1-t)n normal cells and tn tumour cells
- - a given number of reads for gene i is obtained for each tumour cell
- - a given number of reads for gene i is obtained for each normal cell
- total reads for gene i from tumour cells = t × n × (reads for gene i per tumour cell) (21a)
- total reads for gene i from normal cells = (1 − t) × n × (reads for gene i per normal cell) (21b)
- the numbers of reads for gene i from each tumour cell and each normal cell, respectively, are normalised by gene length.
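Under the assumptions listed above (n cells, of which a fraction t are tumour cells, and a fixed length-normalised read count per cell of each type), the relationship between α and tumour purity can be sketched as follows. This is an illustrative derivation rather than the estimator used herein: it simply takes α as the share of length-normalised reads contributed by normal cells, in which case the number of cells n cancels out. The function and argument names are hypothetical.

```python
def normal_expression_fraction(purity, reads_per_normal_cell, reads_per_tumour_cell):
    """Fraction of expression at a gene attributable to normal cells.

    purity is the tumour cell fraction t; the per-cell read counts are
    assumed to be normalised by gene length. At least one of the two
    contributions must be non-zero.
    """
    normal = (1.0 - purity) * reads_per_normal_cell
    tumour = purity * reads_per_tumour_cell
    return normal / (normal + tumour)


# Equal per-cell expression: the normal fraction falls as purity increases.
for t in (0.1, 0.2, 0.5):
    print(t, normal_expression_fraction(t, 1.0, 1.0))
```

With equal per-cell expression and a purity of 0.5, this gives 0.5, which is consistent with the default value discussed above.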
- Example 2 Replicate 2
- α parameter cell expression ratio
- the cell expression ratio was estimated for each mutation by a linear regression model using purity and gene expression values (transcripts per million, TPM), as explained above, across the titration series.
- TPM transcripts per million
- null model of no expression is rejected
- - alpha the associated significance value (probability of false positive – detecting the mutation when it is not expressed) of the test, where alpha ⁇ 0.05 as explained above
- - power the power of detecting an expressed mutation (probability of rejecting null hypothesis when alternative is true).
- the power to detect a mutation provides an indication of whether we can have confidence in calling whether a mutated allele is expressed.
- the distribution of estimated power from the model, grouped by different outcomes (TP, FN, FP, TN) and separated by tumour purity, is shown on Figure 12A-E (for each of the 5 datasets). This shows that for the mutations that are truly expressed, the power is lower for false negatives than for true positives.
- Table 2. Average error rates over the 10, 20 and 50% purity samples. The mutations were then grouped by power values (using bins of 10%). For each bin, the fraction of true positives and the fraction of true negatives were calculated. The results of this are shown on Figure 13, which shows that the TP rate correlates with the power estimated by the model, as expected.
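The binning step described above can be sketched as follows. This is a minimal illustration with hypothetical names and toy data: it groups truly expressed mutations into power bins of 10% and reports, for each bin, the fraction that were detected.

```python
def detected_fraction_by_power_bin(powers, detected, n_bins=10):
    """Fraction of truly expressed mutations detected, per power bin.

    powers: estimated power per mutation, in [0, 1].
    detected: True where the mutation was called as expressed.
    Returns a list of length n_bins; empty bins are reported as None.
    """
    totals = [0] * n_bins
    hits = [0] * n_bins
    for power, is_detected in zip(powers, detected):
        # A power of exactly 1.0 is placed in the last bin.
        idx = min(int(power * n_bins), n_bins - 1)
        totals[idx] += 1
        hits[idx] += 1 if is_detected else 0
    return [h / n if n else None for h, n in zip(hits, totals)]


# Toy data (made up): low-power mutations are detected less often.
powers = [0.05, 0.04, 0.95, 0.97, 1.0]
detected = [False, True, True, True, True]
print(detected_fraction_by_power_bin(powers, detected))
```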
- Example 3 Probability that a mutation is expressed This section introduces a unified model to determine whether a mutation is expressed or not, using the statistical hypothesis framework introduced above to estimate the power of detecting an expressed mutation.
- Example 1 describes models of variant expression and no variant expression using a binomial likelihood for the number of variant reads at a site with coverage d (Equation (3)) (illustrated on Figure 4).
- Example 2 illustrates the process of estimation of the α parameter (cell expression ratio) in the model of Example 1.
- a model to determine the probability that a mutation is expressed or not is presented, as illustrated on Figures 5A and 5B, respectively for a single sample (e.g. a single tumour region) and a plurality of samples (e.g. a plurality of tumour regions), based on the framework in Example 1.
- This approach advantageously combines the information in the model of Example 1 with prior information about a mutation being expressed, within a Bayesian framework. Additionally, the approach provides a single, interpretable probability of a tumour-specific mutation being expressed. This can be used for example to rank or otherwise prioritise tumour specific mutations for further analysis, for use as therapeutic targets, for use as diagnostic markers, etc.
- the probability advantageously combines information from the data available (RNA sequence data) in the form of the likelihood of the data under different models (expression / not expression of the variants), as well as information about prior beliefs of whether the mutation is expressed (hence the reference to a “unified” model, combining likelihood and prior probability of expression).
- the prior probability that the mutation is expressed can be set using prior knowledge of whether the mutation is likely to be expressed in the sample. This can be based on e.g. prior knowledge about the type of mutations, the type of cancer, the gene in which the mutation is located, etc.
- E the joint posterior distribution on E
- ⁇ 1
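- Defining the likelihood ratio of model M1 over model M0 as r, which is given by equation (12), we can rewrite equation (11) to obtain equation (13), or equivalently equation (14):
- P(E = 1 | b, d, α, t) = logistic(log(r) + log(π / (1 − π))) (14), where π denotes the prior probability that the mutation is expressed and logistic(x) = 1 / (1 + exp(−x)).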
- Figure 6 shows that when the prior indicates a low chance of the variant being expressed (low prior probability of expression), a lot of evidence (high likelihood ratio r) is needed to push the posterior probability to high values (high probability of expression).
- the figure also shows that in cases where there is a low power to detect variant alleles and low evidence of expression (likelihood ratio close to 1, the null and alternative models are similar) the probability will be close to the prior. With high evidence (number of reads higher than the critical value, high likelihood ratio), the probability that the variant is expressed is higher.
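The behaviour described for Figure 6 follows from Bayes' rule in odds form, where the posterior odds equal the likelihood ratio multiplied by the prior odds. A minimal sketch, with hypothetical function and argument names and illustrative values only:

```python
from math import exp, log


def posterior_expressed(likelihood_ratio, prior):
    """Posterior probability of expression from a likelihood ratio and a prior.

    likelihood_ratio: likelihood under M1 divided by likelihood under M0 (must be > 0).
    prior: prior probability that the mutation is expressed (strictly between 0 and 1).
    """
    log_odds = log(likelihood_ratio) + log(prior / (1.0 - prior))
    return 1.0 / (1.0 + exp(-log_odds))


# With no evidence either way (r = 1) the posterior equals the prior;
# with a low prior, a large likelihood ratio is needed to reach a high posterior.
for r in (1e-3, 1.0, 10.0, 1000.0):
    print(r, round(posterior_expressed(r, prior=0.05), 4))
```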
- b ⁇ bc and there is strong evidence in favour of M 0 , leading to a very small r.
- P(E = 1 | b, d, α, t) = 0.
- TPM > 1) of the gene that harbours the variant.
- the threshold used may be set to any predetermined value, preferably a low value in order to capture genes with low expression.
- the threshold may be set to a value that results in good performance of the model to detect genes with low expression, or to a value that reflects the TPM value below which a gene is unlikely to be expressed, using reference expression data such as e.g. expression data from a cohort of samples.
- the present inventors have adapted a simple Bayesian framework with a single prior to instead include a prior that has multiple possible values including at least a first value if there is evidence that the locus is expressed, and a second value if there is not enough evidence that the locus is expressed.
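A prior with two possible values, selected according to whether the gene harbouring the variant shows evidence of expression, can be sketched as follows. The TPM threshold of 1 is taken from the text above; the two prior values and all names are illustrative placeholders and should not be read as the values used herein.

```python
def prior_expressed(gene_tpm, tpm_threshold=1.0, prior_if_expressed=0.5, prior_if_low=0.05):
    """Choose the prior probability of expression from the gene-level TPM.

    prior_if_expressed and prior_if_low are hypothetical example values.
    """
    if gene_tpm > tpm_threshold:
        # Evidence that the locus is expressed.
        return prior_if_expressed
    # Not enough evidence that the locus is expressed.
    return prior_if_low


print(prior_expressed(12.3))  # gene with TPM above the threshold
print(prior_expressed(0.2))   # gene with TPM below the threshold
```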
- Extension to multi-region sequencing For multi-region sequencing, when we have sequencing data from multiple regions of the tumour, the model can be extended to use the multiple sources of information. Suppose that there are S regions, and the subscript s indicates the s-th region.
- [equation (5’’): likelihoods of the read data for each region s under models M0 and M1]
- rs is the likelihood ratio of M1 to M0 given the data covering the mutation in region s:
- the model described in the methods section can be used with a default value for α or with an estimated value. The latter is illustrated here.
- the same data used in Example 2 was used, with the same parameters for the sequencing error rate and the priors for the reference ratio under the null and alternative models.
- the posterior probabilities of each mutation being expressed were calculated as explained in the methods section above, for the data described in Example 1 (replicate 2).
- TPM>1, and ⁇ 0.05 if the gene is expected to be lowly expressed -TPM ⁇ 1– where expectation of expression was determined based on calculated TPM values for each of the data sets in the purity series), as expected.
- the mutations that were truly not expressed (top row) are assigned very low probabilities (see peak near 0), and the mutations that were truly expressed (bottom row) are mostly assigned high probabilities (see peak near 1), with only a few false negative mutations with read counts >0 and probabilities near the lower end of the scale.
- Figure 15 shows that by contrast the “probabilities” derived from the VAF are much more spread out for the mutations that are in fact expressed – with many mutations having very low VAF, such that there is no natural threshold to identify mutations as expressed or not. Thus, using this model would either result in very high false positives or very high false negatives.
- comparing Figures 14 and 15 shows that the method described here is able to integrate the data within a biologically grounded model to push the confidence in expression vs no expression from a prior belief to confident determinations of expression vs no expression, unless the data genuinely does not provide enough information to make such a determination.
- - ROC curve: TP and FP rates using varying thresholds for calling a mutation as expressed / not expressed: this shows the effect of the decision threshold on the TP and FP rates, and allows calculation of the AUC (Area Under the ROC Curve) and of the threshold that gives the best TP rate with a FP rate below 0.05;
- - The AUC: this represents the probability that a random positive example (mutation that is truly expressed) is ranked higher (has a greater probability of being expressed) than a random negative example (mutation that is truly not expressed);
- - Calibration curve: the AUC and ROC demonstrate a model’s ability to differentiate between positive and negative cases, but we would like to assess whether the probabilities used to predict whether a mutation is expressed are calibrated correctly in that they reflect the true likelihood, i.e.
- the present model provides very informative predictions (any AUC over 50% is better than random guess).
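The ranking interpretation of the AUC can be made concrete with a short sketch that compares every positive example with every negative example. The function name and the toy probabilities are hypothetical; ties are counted as half, which is a common convention.

```python
def auc_by_ranking(scores_positive, scores_negative):
    """AUC as the probability that a random positive outranks a random negative."""
    wins = 0.0
    for pos in scores_positive:
        for neg in scores_negative:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half
    return wins / (len(scores_positive) * len(scores_negative))


# Toy posterior probabilities (made up) for expressed and non-expressed mutations.
expressed = [0.99, 0.95, 0.60, 0.10]
not_expressed = [0.01, 0.02, 0.20]
print(auc_by_ranking(expressed, not_expressed))
```

A value of 0.5 corresponds to random ranking, which is why any AUC above 50% is better than a random guess.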
- the present model can likely be improved by further calibration, particularly by adapting the choice of prior.
- While a model using the VAF (proportion of reads with the variant) may appear informative in the particular cases shown, such a model has severe limitations. For example, it provides an estimate that is not comparable across samples, making it impossible to make reliable, reproducible and verifiable decisions for the same patient. Indeed, a VAF of 10% in a 5% purity sample would provide a strong indication that the variant is expressed, whereas a VAF of 10% in a 90% purity sample would be much more likely to be a false positive due to e.g. sequencing error.
- the ROC curves are shown on Figures 18A and 18B, respectively for the present model and the simpler VAF model, and the calibration curves are shown on Figures 19A and 19B, respectively for the present model and the VAF model.
- the VAFs are not well calibrated probabilities, they cannot integrate the prior assumptions (used in the edge cases to infer probability of expression), and the simpler VAF model performs poorly.
- the calibration curve for the present model is slightly better than without edge cases (see Fig.
- the cell specific expression α was identified to play a key role in the accuracy of the power estimate output by the model, and a way to estimate it when suitable data is available was also introduced and demonstrated.
- the performance of the approach using cell line data was finally evaluated, showing a good classification performance (accuracy of classification of mutations as expressed vs not expressed).
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Medical Informatics (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Biophysics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Genetics & Genomics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Bioethics (AREA)
- Molecular Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Epidemiology (AREA)
- Evolutionary Computation (AREA)
- Public Health (AREA)
- Software Systems (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Medicines That Contain Protein Lipid Enzymes And Other Medicines (AREA)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP23765269.8A EP4591310A1 (en) | 2022-09-23 | 2023-09-06 | Allele specific expression |
| JP2025517289A JP2025534267A (en) | 2022-09-23 | 2023-09-06 | Allele-specific expression |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GBGB2213928.1A GB202213928D0 (en) | 2022-09-23 | 2022-09-23 | Allele specific expression |
| GB2213928.1 | 2022-09-23 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024061628A1 (en) | 2024-03-28 |
Family
ID=83978681
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/EP2023/074437 Ceased WO2024061628A1 (en) | 2022-09-23 | 2023-09-06 | Allele specific expression |
Country Status (4)
| Country | Link |
|---|---|
| EP (1) | EP4591310A1 (en) |
| JP (1) | JP2025534267A (en) |
| GB (1) | GB202213928D0 (en) |
| WO (1) | WO2024061628A1 (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2016174085A1 (en) | 2015-04-27 | 2016-11-03 | Cancer Research Technology Limited | Method for treating cancer |
| AU2018206769A1 (en) * | 2010-05-14 | 2018-08-09 | Dana-Farber Cancer Institute, Inc. | Compositions and methods of identifying tumor specific neoantigens |
| US20190362808A1 (en) * | 2017-02-01 | 2019-11-28 | The Translational Genomics Research Institute | Methods of detecting somatic and germline variants in impure tumors |
| US20210327535A1 (en) * | 2018-08-22 | 2021-10-21 | The Regents Of The University Of California | Sensitively detecting copy number variations (cnvs) from circulating cell-free nucleic acid |
-
2022
- 2022-09-23 GB GBGB2213928.1A patent/GB202213928D0/en not_active Ceased
-
2023
- 2023-09-06 EP EP23765269.8A patent/EP4591310A1/en active Pending
- 2023-09-06 WO PCT/EP2023/074437 patent/WO2024061628A1/en not_active Ceased
- 2023-09-06 JP JP2025517289A patent/JP2025534267A/en active Pending
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| AU2018206769A1 (en) * | 2010-05-14 | 2018-08-09 | Dana-Farber Cancer Institute, Inc. | Compositions and methods of identifying tumor specific neoantigens |
| WO2016174085A1 (en) | 2015-04-27 | 2016-11-03 | Cancer Research Technology Limited | Method for treating cancer |
| US20190362808A1 (en) * | 2017-02-01 | 2019-11-28 | The Translational Genomics Research Institute | Methods of detecting somatic and germline variants in impure tumors |
| US20210327535A1 (en) * | 2018-08-22 | 2021-10-21 | The Regents Of The University Of California | Sensitively detecting copy number variations (cnvs) from circulating cell-free nucleic acid |
Non-Patent Citations (13)
| Title |
|---|
| CARTER SL, CIBULSKIS K, HELMAN E, MCKENNA A, SHEN H, ZACK T, LAIRD PW, ONOFRIO RC, WINCKLER W, WEIR BA: "Absolute quantification of somatic DNA alterations in human cancer", NAT BIOTECHNOL., vol. 30, no. 5, May 2012 (2012-05-01), pages 413 - 21, XP055563480, DOI: 10.1038/nbt.2203 |
| CASTEL SE, LEVY-MOONSHINE A, MOHAMMADI P, BANKS E, LAPPALAINEN T: "Tools and best practice for data processing in allelic expression analysis", GENOME BIOLOGY, vol. 16, 2015, pages 195, XP055642724, DOI: 10.1186/s13059-015-0762-6 |
| FAVERO F, JOSHI T, MARQUARD AM, BIRKBAK NJ, KRZYSTANEK M, LI Q, SZALLASI Z, EKLUND AC: "Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data", ANN ONCOL., vol. 26, no. 1, January 2015 (2015-01-01), pages 64 - 70 |
| GARTNER, JARED J. ET AL.: "A Machine Learning Model for Ranking Candidate HLA Class I Neoantigens Based on Known Neoepitopes from Multiple Human Tumor Types", NATURE CANCER, vol. 2, no. 5, 2021, pages 563 - 74, XP055915961, DOI: 10.1038/s43018-021-00197-6 |
| HEEMSKERK B, KVISTBORG P, SCHUMACHER TNM: "The cancer antigenome", THE EMBO JOURNAL, vol. 32, no. 2, 2013, XP055923619 |
| LANDAU DA, CARTER SL, STOJANOV P, MCKENNA A, STEVENSON K, LAWRENCE MS, SOUGNEZ C, STEWART C, SIVACHENKO A, WANG L: "Evolution and impact of subclonal mutations in chronic lymphocytic leukemia", CELL., vol. 152, no. 4, 14 February 2013 (2013-02-14), pages 714 - 26, XP028979918, DOI: 10.1016/j.cell.2013.01.019 |
| LANGMEAD, B., TRAPNELL, C., POP, M. ET AL.: "Ultrafast and memory-efficient alignment of short DNA sequences to the human genome", GENOME BIOL, vol. 10, 2009, pages R25, XP021053573, DOI: 10.1186/gb-2009-10-3-r25 |
| LUNDEGAARD C, LAMBERTH K, HARNDAHL M, BUUS S, LUND O, NIELSEN M: "NetMHC-3.0: accurate web accessible predictions of human, mouse and monkey MHC class I affinities for peptides of length 8-11", NUCLEIC ACIDS RES, vol. 36, 1 July 2008 (2008-07-01), pages W509 - 12, XP055252573, DOI: 10.1093/nar/gkn202 |
| MCGRANAHAN, N., FURNESS, A. J., ROSENTHAL, R., RAMSKOV, S., LYNGAA, R., SAINI, S. K., JAMAL-HANJANI, M., WILSON, G. A., BIRKBAK, N. J., HILEY, C. T.: "Clonal neoantigens elicit T cell immunoreactivity and sensitivity to immune checkpoint blockade", SCIENCE, vol. 351, no. 6280, 2016, pages 1463 - 1469, XP055283414, DOI: 10.1126/science.aaf1490 |
| RAINE KM, VAN LOO P, WEDGE DC, JONES D, MENZIES A, BUTLER AP, TEAGUE JW, TARPEY P, NIK-ZAINAL S, CAMPBELL PJ: "ascatNgs: Identifying Somatically Acquired Copy-Number Alterations from Whole-Genome Sequencing Data ", CURR PROTOC BIOINFORMATICS., vol. 56, 8 December 2016 (2016-12-08), pages 1 - 17 |
| THE CANCER GENOME ATLAS DATASET, Retrieved from the Internet <URL:https://portal.gdc.cancer.gov> |
| VANESSA JURTZ, SINU PAUL, MASSIMO ANDREATTA, PAOLO MARCATILI, BJOERN PETERS, MORTEN NIELSEN: "NetMHCpan-4.0: Improved Peptide-MHC Class I Interaction Predictions Integrating Eluted Ligand and Peptide Binding Affinity Data", J IMMUNOL, vol. 199, no. 9, 1 November 2017 (2017-11-01), pages 3360 - 3368, XP055634914, DOI: 10.4049/jimmunol.1700893 |
| WILSON DOUGLAS R. ET AL: "Mapping Tumor-Specific Expression QTLs in Impure Tumor Samples", JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, vol. 115, no. 529, 4 June 2019 (2019-06-04), US, pages 79 - 89, XP093101476, ISSN: 0162-1459, Retrieved from the Internet <URL:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7410098/pdf/nihms-1028292.pdf> [retrieved on 20231114], DOI: 10.1080/01621459.2019.1609968 * |
Also Published As
| Publication number | Publication date |
|---|---|
| GB202213928D0 (en) | 2022-11-09 |
| EP4591310A1 (en) | 2025-07-30 |
| JP2025534267A (en) | 2025-10-15 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Budczies et al. | Tumour mutational burden: clinical utility, challenges and emerging improvements | |
| Litchfield et al. | Meta-analysis of tumor-and T cell-intrinsic mechanisms of sensitization to checkpoint inhibition | |
| Bailey et al. | Tracking cancer evolution through the disease course | |
| Miao et al. | Genomic correlates of response to immune checkpoint blockade in microsatellite-stable solid tumors | |
| Linxweiler et al. | The immune microenvironment and neoantigen landscape of aggressive salivary gland carcinomas differ by subtype | |
| Jia et al. | Titin mutation associated with responsiveness to checkpoint blockades in solid tumors | |
| Wang et al. | Comutations in DNA damage response pathways serve as potential biomarkers for immune checkpoint blockade | |
| US11504398B2 (en) | Identification of clonal neoantigens and uses thereof | |
| Kiyotani et al. | Integrated analysis of somatic mutations and immune microenvironment in malignant pleural mesothelioma | |
| EP3576781B1 (en) | Neoantigens and uses thereof for treating cancer | |
| Liu et al. | Response and recurrence correlates in individuals treated with neoadjuvant anti-PD-1 therapy for resectable oral cavity squamous cell carcinoma | |
| Dunne et al. | Immune-derived PD-L1 gene expression defines a subgroup of stage II/III colorectal cancer patients with favorable prognosis who may be harmed by adjuvant chemotherapy | |
| Liu et al. | Predicting patient outcomes after treatment with immune checkpoint blockade: A review of biomarkers derived from diverse data modalities | |
| Lo et al. | Indication-specific tumor evolution and its impact on neoantigen targeting and biomarkers for individualized cancer immunotherapies | |
| Wang et al. | The loss of neoantigens is an important reason for immune escape in multiple myeloma patients with high intratumor heterogeneity | |
| WO2024061628A1 (en) | Allele specific expression | |
| Gaißler et al. | Dynamics of Melanoma-associated EPITOPE-specific Cd8+ T cells in the blood correlate with clinical outcome under PD-1 blockade | |
| US20250239349A1 (en) | Systems and methods for determining t-cell cross-reactivity between antigens | |
| Koh et al. | Spatially Resolved Whole-Transcriptomic and Proteomic Profiling of Lung Cancer and Its Immune Microenvironment According to PD-L1 Expression | |
| US20250218535A1 (en) | Identification of clonal neoantigens and uses thereof | |
| Boll et al. | Predicting immunotherapy response in advanced bladder cancer: a meta-analysis of six independent cohorts | |
| US20240412813A1 (en) | Methods and systems for tumour monitoring | |
| Barroux et al. | Evolutionary and immune microenvironment dynamics during neoadjuvant treatment of oesophagael adenocarcinoma | |
| Ruihan et al. | Integrated modeling to implicate evolving neoantigen-T cell interplays and immunotherapy efficacy in tumors | |
| Saha et al. | Temporal and disease-site specific transcriptional and phenotypic changes in response to neoadjuvant chemotherapy in high-grade serous ovarian cancers |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23765269; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2025517289; Country of ref document: JP; Kind code of ref document: A |
| | WWE | Wipo information: entry into national phase | Ref document number: 2025517289; Country of ref document: JP |
| | WWE | Wipo information: entry into national phase | Ref document number: 2023765269; Country of ref document: EP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2023765269; Country of ref document: EP; Effective date: 20250423 |
| | WWP | Wipo information: published in national office | Ref document number: 2023765269; Country of ref document: EP |