WO2024256485A1 - Identification of neoantigen peptides - Google Patents
- Publication number
- WO2024256485A1 (PCT/EP2024/066271)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- tumour
- sequence
- mutation
- neoantigen
- identifying
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/30—Detection of binding sites or motifs
Definitions
- the present disclosure relates to methods for identifying a neoantigen peptide from RNA sequence data.
- the present disclosure also relates to methods and compositions for the treatment of cancer which make use of or target neoantigens.
- Cancer neoantigens are antigens that result from the presence of cancer specific variants (also referred to as “tumour-specific mutations”) and are therefore not present on normal cells. These represent promising therapeutic targets for immunotherapy, and multiple neoantigen vaccine trials are currently underway.
- the process of designing a personalised neoantigen based immunotherapy typically starts with DNA and/or RNA sequencing of a patient to identify tumour-specific mutations present in the patient. It is then typically necessary to identify the peptides that result from the tumour-specific mutations identified. Indeed, the peptides may be part of the therapy or a necessary reagent to obtain or validate the therapy, or they may at least need to be identified to screen candidate neoantigens for immunogenicity, MHC binding and/or presentation.
- Identifying the peptides associated with tumour-specific variants is not a trivial task. Indeed, many factors can impact the peptides obtained, such as the haplotype-specific presence of polymorphisms and the effect of the mutations on the splicing and translation process.
- Two main types of approaches to identify neoantigen peptides from DNA or RNA sequence data can be distinguished. A first type of approach relies on modifying reference transcripts in the region of a tumour-specific mutation by introducing identified mutations and phased polymorphisms, then translating the transcripts using their canonical reading frame. These approaches can make use of DNA sequence data only.
- A second type of approach, referred to herein as “assembly-based approaches”, assembles transcripts directly from RNA sequence data. While the resulting sequences are expected to capture more accurately the transcripts that are present in the cancer cell, these approaches come with the further challenge of having to identify the peptides that are translated from these transcripts.
- One approach that has been suggested (called “Isovar”, Rubinsteyn, A.
- Identifying the resulting peptides is more problematic in the context of variants that result in tumour-specific splice junctions (also referred to herein as “neojunctions”).
- With the first type of approach, when using DNA sequence data only, it is necessary to predict the splicing effect of mutations in introns or near canonical splice junctions. This is often inaccurate.
- Neojunctions can be directly identified from RNA sequence data and spiked into reference transcripts, when available. This is what is performed in the package splice2neo (rdrr.io/github/TRON-Bioinformatics/splice2neo/).
- neojunctions can result in a change of the reading frame from the canonical reading frame of the transcript, and therefore using the canonical reading frame of the transcript may result in at least a portion of the translated sequence being erroneous.
- the present inventors have identified that mutations that result in the inclusion of intronic sequence in the transcript will fail to match to any reference transcript. Another solution is therefore needed.
- RNA reads compatible with a variant are assembled then matched with a reference transcript using only those positions of the assembled sequence that are upstream of the mutation and that are exonic.
- the present inventors showed that this was indeed the case and demonstrated on simulated data, cell line data and patient data that the methods described herein enabled a significant increase in the number of neoantigen peptides that could be identified. This is particularly significant as neojunctions can result in proteins that are very different from their wild-type counterparts and therefore have high immunogenic potential. Thus, the method leads to an improved likelihood of identifying neoantigens that are likely to be successful targets for immunotherapy.
- a method of identifying a neoantigen peptide associated with a tumour-specific mutation comprising: obtaining RNA sequence data from one or more samples comprising tumour genetic material; selecting one or more RNA sequence reads from the RNA sequence data that contain the tumour-specific mutation; assembling a sequence comprising the tumour-specific mutation and compatible with overlapping sequences of the selected one or more RNA sequence reads; identifying positions of the assembled sequence as part of intronic or exonic regions; extracting an anchor sequence from the assembled sequence as a subsequence that precedes the tumour-specific mutation and that only includes positions identified as part of exonic regions; identifying a reading frame as the reading frame of a reference transcript: (i) that overlaps the genomic position of the tumour-specific mutation, and (ii) that the anchor sequence maps to or that has a corresponding protein that a translation of the anchor sequence maps to; and identifying the neoantigen peptide as a translation of the assembled sequence or a part thereof.
- the method of the present aspect may have one or more of the following features.
- Identifying a neoantigen peptide refers to determining the sequence of the neoantigen peptide. This may also be referred to herein as “assembling” a neoantigen peptide.
- an anchor sequence that is 5’ of the tumour specific mutation but that does not comprise intronic regions ensures that a matching sequence can be found even in cases where a retained intron is present in the assembled sequence.
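The anchor extraction described above can be sketched as follows. This is a minimal illustration and not the implementation of the disclosure: the function name `extract_anchor`, the per-position boolean flags and the 0-based `mutation_index` are all hypothetical. It also assumes one particular reading of "precedes the mutation", namely the contiguous exonic run closest to the mutation on its 5’ side.

```python
from typing import Sequence


def extract_anchor(assembled: str, is_exonic: Sequence[bool], mutation_index: int) -> str:
    """Return the exonic run closest to, and upstream of, the mutation.

    `is_exonic[i]` is True when position i of the assembled sequence was
    identified as exonic (hypothetical representation).
    """
    if len(assembled) != len(is_exonic):
        raise ValueError("one exonic flag is needed per assembled position")
    end = mutation_index
    # Skip any intronic positions lying directly upstream of the mutation.
    while end > 0 and not is_exonic[end - 1]:
        end -= 1
    start = end
    # Extend backwards over the contiguous exonic run.
    while start > 0 and is_exonic[start - 1]:
        start -= 1
    return assembled[start:end]
```

With a mutation located inside a retained intron, the returned anchor stops at the last exonic base, so it can still be matched against a reference transcript.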
- the anchor sequence or translation thereof may be considered to map to a reference transcript or protein corresponding to a reference transcript when the number of mismatched positions between the anchor sequence or anchor sequence translation and the reference sequence or reference sequence protein is at most a predetermined number.
- the predetermined number may be 0, 1, 2 or 3.
- the anchor sequence or translation thereof may be considered to map to a reference transcript or protein corresponding to a reference transcript when the length of the matching sequence between the anchor sequence or anchor sequence translation and the reference sequence or reference sequence protein has at least a predetermined length.
- the predetermined length may be at least 4 amino acids, at least 5 amino acids or at least 6 amino acids, or at least 12, 15, or 18 bases.
- the anchor sequence or translation thereof may be considered to map to a reference transcript or protein corresponding to a reference transcript when the sequence of the anchor sequence or anchor sequence translation is a perfect match to the reference sequence or reference sequence protein.
- the anchor sequence or translation thereof may be considered to map to a reference transcript or protein corresponding to a reference transcript when the sequence of the anchor sequence or anchor sequence translation is a match to the reference sequence or reference sequence protein allowing up to a predetermined number of mismatches.
- the predetermined number of mismatches may be 1 or 2 bases, or 1 or 2 amino acids.
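The mapping criteria above (a minimum matched length and a maximum number of mismatches) can be combined in a short sketch. The function `maps_to` and its parameter names are hypothetical, and the comparison is an ungapped sliding window, which is an assumption made for brevity; the disclosure does not specify the alignment procedure. The same function works for base sequences or amino acid sequences.

```python
def maps_to(anchor: str, reference: str, max_mismatches: int = 0, min_length: int = 12) -> bool:
    """True if the anchor aligns, without gaps, somewhere in the reference.

    Defaults correspond to a perfect match over at least 12 bases; pass
    min_length=4 (or 5, 6) when comparing amino acid sequences.
    """
    if len(anchor) < min_length or len(anchor) > len(reference):
        return False
    for offset in range(len(reference) - len(anchor) + 1):
        window = reference[offset:offset + len(anchor)]
        mismatches = sum(a != b for a, b in zip(anchor, window))
        if mismatches <= max_mismatches:
            return True
    return False
```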
- Identifying a reading frame may comprise: translating at least the anchor sequence using all 3 possible reading frames; and identifying a protein corresponding to a reference transcript that a translated anchor sequence maps to, wherein the identified reading frame is the reading frame of the translated anchor that maps to the protein corresponding to a reference transcript.
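A minimal sketch of the three-frame translation step follows. It uses the standard genetic code and requires an exact substring match to the reference protein; the names `translate` and `find_reading_frame` and the `min_length` default of 4 amino acids are assumptions chosen for illustration.

```python
from itertools import product
from typing import Optional

# Standard genetic code, listed in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str, frame: int = 0) -> str:
    """Translate complete codons of `seq`, starting at offset `frame`."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def find_reading_frame(anchor: str, reference_protein: str, min_length: int = 4) -> Optional[int]:
    """Return the frame (0, 1 or 2) whose translation occurs in the protein."""
    for frame in range(3):
        peptide = translate(anchor, frame)
        if len(peptide) >= min_length and peptide in reference_protein:
            return frame
    return None
```

The frame returned for the anchor can then be carried over to the full assembled sequence, provided the anchor offset within that sequence is taken into account.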
- Identifying a reading frame may comprise identifying a reference transcript that overlaps the genomic position of the tumour-specific mutation and that the anchor sequence maps to, and identifying the reading frame of the identified reference transcript.
- the use of the anchor sequence and reference transcripts to identify a reading frame advantageously means that the sequences that are matched are longer than when using protein sequences (since each amino acid corresponds to 3 bases in a transcript). This may enable matching of shorter anchor sequences.
- Identifying the reading frame of the identified reference transcript may comprise obtaining the reading frame from a database of reference transcripts, or determining the reading frame of the identified reference transcript from a corresponding protein.
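The transcript-based alternative can be sketched under the assumption that the reference database provides, for each transcript, the 0-based index of the first base of the coding sequence (called `cds_start` here, a hypothetical name). The anchor is located in the cDNA by exact match and its frame follows from modular arithmetic.

```python
from typing import Optional


def frame_from_transcript(anchor: str, transcript_cdna: str, cds_start: int) -> Optional[int]:
    """Number of leading anchor bases to skip so that translation is in frame.

    Returns None when the anchor is not found in the transcript.
    """
    position = transcript_cdna.find(anchor)
    if position == -1:
        return None
    # Codons start at cds_start, cds_start + 3, ...; anchor base i sits at
    # transcript position `position + i`.
    return (cds_start - position) % 3
```

This sketch assumes the anchor lies within or overlaps the coding sequence; an anchor entirely in an untranslated region would need separate handling.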
- a reference transcript may be associated with one or both of: a reading frame and a corresponding protein sequence.
- a reading frame can be identified from a reference transcript and its corresponding protein sequence.
- a corresponding protein sequence can be identified from a reference transcript and its reading frame.
- Most reference genome databases provide reference transcripts together with corresponding protein sequences (or identifiers from which such sequences can be obtained).
- the term “reference transcript” may refer to the cDNA sequence of reference transcripts. cDNA sequences of reference transcripts are typically available from reference genome databases, or can easily be obtained from reference transcript mRNA sequences.
- Assembling a sequence comprising the tumour-specific mutation and compatible with overlapping sequences of the selected one or more RNA sequence reads may comprise assembling one or more of the selected RNA sequence reads into the longest sequence comprising the tumour-specific mutation compatible with overlapping sequences of the selected one or more RNA sequence reads.
- the use of an assembled sequence corresponding to the longest sequence that can be assembled from RNA sequence reads that contain a variant ensures that all polymorphisms are inherently phased with the variant, and will be reflected in the identified protein, while using the most available information about the transcripts comprising the variant.
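As an illustration of assembling selected reads into a longer sequence, the following toy sketch greedily merges reads by exact suffix/prefix overlap. It is not the assembler used in the disclosure: it assumes error-free reads that all carry the variant, and the names `overlap`, `assemble` and `min_overlap` are hypothetical.

```python
from typing import List


def overlap(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def assemble(reads: List[str], min_overlap: int = 3) -> str:
    """Greedily extend the first read with every read that overlaps it."""
    contig, remaining = reads[0], list(reads[1:])
    extended = True
    while extended and remaining:
        extended = False
        for read in remaining:
            right = overlap(contig, read, min_overlap)
            left = overlap(read, contig, min_overlap)
            if read in contig:
                pass  # read adds no new sequence
            elif right > 0 and right >= left:
                contig += read[right:]          # extend towards the 3' end
            elif left > 0:
                contig = read + contig[left:]   # extend towards the 5' end
            else:
                continue  # no overlap yet; try again after other merges
            remaining.remove(read)
            extended = True
            break
    return contig
```

Because every merged read was selected for carrying the variant, any polymorphism in the resulting contig is observed on the same molecules as the variant, which is the phasing property described above.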
- the tumour-specific mutation may be a splice region variant.
- the tumour-specific mutation may be a 5’ splice variant.
- 5’ splice variants can advantageously be analysed using methods as described herein using any sequencing technology including short read sequencing and polyA capture based RNA sequencing. Without wishing to be bound by theory, this is believed to be because such variants do not require knowledge of the length of any retained intron in order to identify a correct reading frame. Therefore, limited information obtained from short portions of transcripts that may be focussed on exonic sequences is believed to be sufficient to correctly deal with such variants.
- the tumour-specific mutation may be selected from: a mutation in a 5’ exonic splice region, a mutation in a splice donor site, and a mutation in a 5’ intronic splice region.
- the tumour-specific mutation is a 5’ splice variant or a 3’ splice variant.
- a 3’ splice variant may be selected from: a mutation in a 3’ exonic splice region, a mutation in a splice acceptor site, and a mutation in a 3’ intronic splice region.
- the tumour-specific mutation is a 3’ splice variant.
- the RNA sequence data may have been obtained using long read sequencing and/or ribodepleted RNA seq.
- the RNA sequence data may have been obtained using next generation sequencing.
- the NGS may be short read sequencing or long read sequencing.
- the RNA sequence data may have been obtained using polyA capture or ribodepletion RNAseq.
- the mutation may be in a gene transcribed from the positive strand and extracting an anchor sequence from the assembled sequence as a subsequence that precedes the tumour-specific mutation and that only includes positions identified as part of exonic regions may comprise extracting a sequence that is in 5’ of the tumour-specific mutation.
- the mutation may be in a gene transcribed from the negative strand and extracting an anchor sequence from the assembled sequence may comprise obtaining the reverse complement of the assembled sequence, wherein the anchor sequence is in 5’ of the tumour-specific mutation in the reverse complement sequence.
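The strand handling described above can be sketched as follows, assuming the assembled sequence is given in reference-genome orientation and the mutation is a single position with a 0-based index. The function names are hypothetical.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orient_to_transcript(assembled: str, mutation_index: int, strand: str) -> Tuple[str, int]:
    """Return the sequence and mutation index in 5'-to-3' transcript orientation."""
    if strand == "+":
        return assembled, mutation_index
    if strand == "-":
        # After reversal, position i becomes position len - 1 - i.
        return reverse_complement(assembled), len(assembled) - 1 - mutation_index
    raise ValueError("strand must be '+' or '-'")
```

After this step the anchor can always be taken from the positions that precede the mutation index, whichever strand the gene is transcribed from.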
- Information on whether a gene is transcribed from the negative or the positive strand may be obtained from a genome/transcriptome reference.
- the method may comprise determining whether the tumour-specific mutation has support in each of the plurality of samples.
- the method may comprise determining whether the tumour-specific mutation is ubiquitous in the tumour from which the plurality of samples have been obtained.
- a tumour-specific mutation that has support in each of the plurality of samples may be considered to be ubiquitous in the tumour.
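The ubiquity check can be expressed in a few lines. In this sketch, "support" is assumed to mean at least `min_reads` reads showing the mutation in a sample; that threshold and the dictionary layout are illustrative assumptions, not values from the disclosure.

```python
from typing import Dict


def is_ubiquitous(supporting_reads_per_sample: Dict[str, int], min_reads: int = 1) -> bool:
    """True when every sample has at least `min_reads` supporting reads."""
    counts = supporting_reads_per_sample.values()
    return len(counts) > 0 and all(n >= min_reads for n in counts)
```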
- the one or more samples may be samples from the same subject and/or the same tumour.
- the one or more samples may be tumour samples.
- one or more tumour samples may have been obtained from one or more biopsies and/or from a surgically resected tumour.
- the wording “same tumour” encompasses the same tumour mass as well as a primary tumour and one or more metastases, and/or samples from the same tumour obtained at different times (e.g. samples may be selected from: a diagnostic biopsy, a surgical resection, a metastasis, a relapse sample, etc.)
- the method may further comprise identifying the tumour-specific mutation using the RNA sequence data or DNA sequence data from one or more samples comprising tumour genetic material from the same tumour.
- the method may further comprise obtaining DNA sequence data from one or more samples comprising genetic material from the tumour and identifying the tumour-specific mutation by comparing the DNA sequence data from the one or more samples comprising genetic material from the tumour and a reference genome sequence and/or DNA sequence data from one or more germline samples.
- the sequence data may comprise a plurality of sequencing reads.
- the sequence data may comprise a plurality of aligned sequencing reads, for example in the form of a SAM or BAM file.
- the sequence data may comprise a plurality of sequencing reads each associated with coordinates in a reference genome or transcriptome to which the sequencing reads align.
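For illustration, one alignment line of a SAM file can be parsed with the standard library alone. The sketch relies only on the order of the mandatory tab-separated SAM fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL); the `SamRecord` type and `parse_sam_line` function are hypothetical names, and header lines or optional tags are not handled.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    read_name: str
    flag: int
    reference_name: str
    position: int  # 1-based leftmost mapping position, as defined by SAM
    cigar: str
    sequence: str


def parse_sam_line(line: str) -> SamRecord:
    """Parse the mandatory fields of one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 tab-separated fields")
    return SamRecord(fields[0], int(fields[1]), fields[2], int(fields[3]), fields[5], fields[9])
```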
- the step of obtaining sequence data from one or more samples from the subject may comprise or consist of receiving sequence data from a user (for example through a user interface), from one or more computing device(s), or from one or more data stores or databases.
- the step of obtaining sequence data may further comprise sequencing (or otherwise determining the sequence composition of genetic material present in a sample) one or more samples from the subject comprising tumour genetic material.
- the method may further comprise sequencing (or otherwise determining the sequence composition of genomic material present in a sample) one or more germline samples from the subject.
- the method may further comprise obtaining, from the subject, one or more samples comprising tumour genetic material and optionally one or more germline samples.
- Genetic material as used herein comprises RNA molecules (e.g. mRNA transcripts), and optionally DNA molecules (e.g. genomic DNA).
- Identifying the tumour-specific mutation may comprise identifying genomic coordinates associated with the tumour-specific mutation and the sequence of the mutation.
- the format of the genomic coordinates may depend on the type of mutation. For example, when the tumour-specific mutation is a SNV, the genomic coordinates may be in the form of a single coordinate (e.g. chromosome and location along the chromosome). When the tumour-specific mutation is an indel, the genomic coordinates may be in the form of a pair of coordinates (e.g. start and finish location of the indel) and/or a single coordinate and a length of indel, and optionally the sequence of the mutation.
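One possible in-memory representation of these coordinates is sketched below. The class and field names are hypothetical, and the VCF-like convention of storing reference and alternative alleles is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TumourMutation:
    chromosome: str
    start: int  # 1-based, inclusive
    end: int    # equal to `start` for a SNV
    ref: str    # reference allele
    alt: str    # tumour-specific allele (the sequence of the mutation)

    @property
    def kind(self) -> str:
        """'SNV' for a single-base substitution, otherwise 'indel'."""
        return "SNV" if len(self.ref) == 1 and len(self.alt) == 1 else "indel"

    @property
    def length(self) -> int:
        """Number of reference positions spanned by the mutation."""
        return self.end - self.start + 1
```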
- the method may further comprise providing to a user, for example through a user interface, one or more of: the sequence of an identified peptide, information identifying one or more neojunctions in the identified peptide, information characterising one or more tumour-specific mutations present in the identified peptide (such as e.g. genomic location, mutation sequence, ubiquity, likelihood of clonality, etc.), one or more properties of the identified peptide (such as e.g. a predicted binding affinity of the peptide or a part thereof to one or more MHC molecules, a likelihood of the peptide or a part thereof being presented by one or more MHC molecules, a likelihood of the peptide or a part thereof being immunogenic, etc.), and information identifying one or more amino acids and/or positions in the identified peptide that are not expected to be present in a corresponding normal peptide.
- the tumour-specific mutation may be associated with genomic coordinates and selecting one or more RNA sequence reads from the RNA sequence data that contain the tumour-specific mutation may comprise selecting one or more RNA sequence reads that overlap with the genomic coordinates of the tumour-specific mutation and that are compatible with the presence of the tumour-specific mutation.
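Read selection can be sketched for the simplest case of a SNV. The example assumes reads aligned without gaps or splicing and described by a 0-based start coordinate; real RNA alignments would need CIGAR-aware handling. `AlignedRead` and `select_supporting_reads` are hypothetical names.

```python
from typing import List, NamedTuple


class AlignedRead(NamedTuple):
    start: int     # 0-based reference coordinate of the first base
    sequence: str  # ungapped alignment assumed


def select_supporting_reads(reads: List[AlignedRead], position: int, alt_base: str) -> List[AlignedRead]:
    """Keep reads that overlap `position` and show `alt_base` there."""
    selected = []
    for read in reads:
        offset = position - read.start
        if 0 <= offset < len(read.sequence) and read.sequence[offset] == alt_base:
            selected.append(read)
    return selected
```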
- the tumour-specific mutation may be ubiquitous in the one or more tumour samples.
- the tumour-specific mutation may be a clonal mutation or a mutation assumed to be clonal in the tumour of the subject.
- the method may further comprise identifying a plurality of tumour-specific mutations, and for each of the plurality of tumour-specific mutations, determining one or more of: whether the tumour-specific mutation is likely to be clonal in the tumour, and whether the tumour-specific mutation is likely to be expressed in the tumour, and selecting a tumour-specific mutation using the result of the determining.
- the tumour-specific mutation may be one that has been determined to be likely to be clonal in the tumour and/or likely to be expressed in the tumour.
- a method of identifying one or more neoantigens in a subject comprising: identifying a plurality of tumour-specific mutations in the subject (such as for example, using genomic and/or transcriptomic data from one or more tumour samples from said subject); identifying a plurality of candidate neoantigen peptides associated with one or more of the plurality of tumour-specific mutations using the method of any embodiment of the first aspect; and identifying a neoantigen associated with at least one of the one or more of the tumour-specific mutations, wherein a neoantigen is a candidate neoantigen peptide that satisfies one or more predetermined criteria selected from: the candidate neoantigen peptide being likely to bind to an MHC molecule, the candidate neoantigen peptide being likely to be presented by an MHC molecule, the candidate neoantigen peptide being likely to be immunogenic, the candidate peptid
- Also described according to the present aspect is a method of identifying one or more neoantigens in a subject, the method comprising: identifying, by a processor using sequence data from one or more samples from said subject, a plurality of tumour-specific mutations in the subject; identifying, by a processor, a plurality of candidate neoantigen peptides associated with one or more of the plurality of tumour-specific mutations using the method of any embodiment of the first aspect; and selecting, by said processor, one or more of the candidate neoantigen peptides as neoantigens, wherein a neoantigen is a candidate neoantigen peptide that satisfies at least one or more predetermined criteria on whether the peptide is likely to be expressed and/or likely to be immunogenic.
- the one or more criteria may be selected from: the candidate neoantigen peptide being likely to bind to an MHC molecule, the candidate neoantigen peptide being likely to be presented by an MHC molecule, the candidate neoantigen peptide being likely to be immunogenic, the candidate neoantigen peptide being likely to interact with a TCR, the candidate neoantigen peptide being likely to form a stable complex with an MHC molecule, and the candidate neoantigen peptide being likely to be expressed in the tumour.
- the method of the present aspect may have any one or more of the following features.
- a neoantigen may be a peptide that is derived from a tumour-specific mutation that satisfies at least a criterion applied to the likelihood that the tumour-specific mutation is expressed in the tumour in view of the RNA sequence data available from the tumour.
- Such a criterion can be selected from: having a probability of being expressed above a predetermined threshold, having a probability of being expressed that is above a threshold set adaptively to select a predetermined number of tumour-specific mutations with the highest probabilities of being expressed amongst the tumour-specific mutations for which a probability was determined, having a probability of being expressed that is above a threshold set adaptively to select a predetermined top percentile of tumour-specific mutations amongst the tumour-specific mutations for which a probability was determined, having a power to detect a mutation as expressed that is above a predetermined threshold and a number of RNA reads showing the tumour-specific mutation above a threshold number associated with the power to detect a mutation as being expressed, and having a power to detect a mutation as expressed that is below a predetermined threshold and a number of RNA reads showing the tumour-specific mutation below a threshold number associated with the power to detect a mutation as being expressed.
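The fixed and adaptive thresholds listed above can be illustrated with three small selection functions. They assume that a probability of being expressed has already been computed for each mutation (the disclosure refers to WO2022/207925 for such calculations); the function names and the dictionary layout are hypothetical, and the power-based criteria are not covered.

```python
import math
from typing import Dict, List


def select_above(probabilities: Dict[str, float], threshold: float) -> List[str]:
    """Mutations whose probability exceeds a predetermined threshold."""
    return [m for m, p in probabilities.items() if p > threshold]


def select_top_n(probabilities: Dict[str, float], n: int) -> List[str]:
    """Adaptive threshold: keep the n mutations with the highest probabilities."""
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    return ranked[:n]


def select_top_percentile(probabilities: Dict[str, float], percentile: float) -> List[str]:
    """Adaptive threshold: keep the top `percentile` percent of mutations."""
    n = math.ceil(len(probabilities) * percentile / 100)
    return select_top_n(probabilities, n)
```

The same three functions apply unchanged to probabilities of clonality.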
- the method may further comprise determining whether one or more of the tumour-specific mutations is likely to be clonal in a tumour of the subject, and identifying whether one or more of the tumour-specific mutations is likely to give rise to a clonal neoantigen.
- a clonal neoantigen may be a tumour-specific mutation that satisfies at least a criterion selected from: having a probability of being clonal above a predetermined threshold, having a probability of being clonal that is above a threshold set adaptively to select a predetermined number of tumour-specific mutations with the highest probabilities of being clonal amongst the tumour-specific mutations for which a probability was determined, and having a probability of being clonal that is above a threshold set adaptively to select a predetermined top percentile of tumour-specific mutations amongst the tumour-specific mutations for which a probability was determined.
- the one or more predetermined criteria on whether the tumour-specific mutation is likely to be clonal may be selected from: the mutation having a likelihood of being clonal above a predetermined threshold, the mutation having a likelihood of being clonal that is above a threshold set adaptively to select a predetermined number of tumour-specific mutations with the highest likelihoods of being clonal amongst the tumour-specific mutations for which a likelihood was determined, and having a likelihood of being clonal that is above a threshold set adaptively to select a predetermined top percentile of tumour-specific mutations amongst the tumour-specific mutations for which a likelihood was determined.
- Methods for calculating each of these are described in WO2022/207925.
- a method of determining whether a tumour-specific mutation is likely to give rise to a neoantigen comprising: identifying one or more neoantigen peptides associated with the tumour-specific mutation using the method of any embodiment of the first aspect; and determining whether the tumour-specific mutation satisfies one or more predetermined criteria applying to the identified neoantigen peptide(s), selected from: at least one of the one or more neoantigen peptides being likely to bind to an MHC molecule, at least one of the one or more neoantigen peptides being likely to be presented by an MHC molecule, at least one of the one or more neoantigen peptides being likely to be immunogenic, at least one of the one or more neoantigen peptides being likely to interact with a TCR, and at least one of the one or more neoantigen peptides being likely to form a stable complex with
- the method of the present aspect may have any one or more of the features of any preceding aspect.
- a method of providing an immunotherapy for a subject that has been diagnosed as having cancer comprising: identifying one or more tumour-specific mutations in the subject; identifying one or more neoantigen peptides associated with one or more tumour-specific mutations using the method of any embodiment of the first aspect; optionally selecting one or more of the neoantigen peptides identified using one or more criteria that apply to the neoantigen peptides; and designing an immunotherapy that targets the one or more neoantigens associated with the, optionally selected, one or more neoantigen peptides.
- Selecting one or more of the neoantigen peptides may comprise selecting one or more neoantigen peptides that satisfy one or more predetermined criteria selected from: whether the neoantigen peptide is likely to bind to an MHC molecule, whether the neoantigen peptide is likely to be presented by an MHC molecule, whether the neoantigen peptide is likely to be immunogenic, whether the neoantigen peptide is likely to interact with a TCR, whether the neoantigen peptide is likely to form a stable complex with an MHC molecule, and whether the neoantigen peptide is likely to be expressed in the tumour.
- the method may further comprise manufacturing the immunotherapy.
- the identity (i.e. sequence) of the neoantigen peptides may be used to manufacture the peptides (e.g. by chemical synthesis) when this forms part of the manufacture (whether as a step to obtain the product or as part of a release assay) and/or testing and development of the immunotherapy.
- the identity (i.e. sequence) of the neoantigen peptides may be used to select and/or validate candidate neoantigens to be targeted by the immunotherapy, such as e.g. by predicting any property that is directly associated with the peptide and that is indicative of likely efficacy of the immunotherapy (e.g. immunogenicity, MHC binding, etc.)
- the present disclosure also relates to immunotherapies that target one or more neoantigen peptides that have been identified using a method as described herein, and to methods for designing and/or providing such immunotherapies.
- an immunotherapy may be an immunogenic composition, a composition comprising immune cells or a therapeutic antibody.
- the immunogenic composition may comprise one or more neoantigens identified (such as e.g. a neoantigen peptide or protein or a cell displaying the neoantigen), or material sufficient for expression of the one or more neoantigens identified (e.g. a DNA or RNA molecule which encodes the neoantigen(s)).
- the composition comprising immune cells may comprise T cells, B cells and/or dendritic cells.
- the composition comprising a therapeutic antibody may comprise one or more antibodies that recognise at least one of the one or more of the neoantigens identified.
- an antibody may be a monoclonal antibody.
- the immunotherapy may be an immunogenic composition comprising the neoantigen peptides or peptide sequences derived therefrom (such as e.g. portions of the neoantigen peptide), or one or more nucleic acids encoding the neoantigen peptides or peptide sequences derived therefrom.
- the immunotherapy may be an antibody binding to the neoantigen peptides.
- the immunotherapy may be a cell therapy, such as e.g. T cells that are selected for reactivity to the one or more neoantigens (such as e.g. by selective expansion), or engineered T cells comprising TCR targeting a neoantigen peptide.
- the cancer may be selected from bladder cancer, gastric cancer, oesophageal cancer, breast cancer, colorectal cancer, cervical cancer, ovarian cancer, endometrial cancer, kidney cancer (renal cell), lung cancer (small cell, non-small cell and mesothelioma), brain cancer (gliomas, astrocytomas, glioblastomas), melanoma, lymphoma, small bowel cancers (duodenal and jejunal), leukemia, pancreatic cancer, hepatobiliary tumours, germ cell cancers, prostate cancer, head and neck cancers, thyroid cancer and sarcomas.
- the cancer may be lung cancer.
- the cancer may be melanoma.
- the cancer may be bladder cancer.
- the cancer may be head and neck cancer.
- the subject may be human.
- Designing an immunotherapy that targets one or more neoantigens identified may comprise designing one or more candidate peptides for each of the one or more neoantigens targeted, each peptide comprising at least a portion of a neoantigen targeted.
- Designing or providing an immunotherapy may comprise obtaining the one or more candidate peptides.
- the method may further comprise testing the one or more candidate peptides for one or more properties. Testing may be performed in vitro or in silico.
- the one or more peptides may be tested for immunogenicity, propensity to be displayed by MHC molecules (optionally by specific MHC molecule alleles, where the alleles may have been chosen depending on the MHC alleles expressed by the subject), ability to elicit proliferation of a population of immune cells, etc.
- the method may further comprise producing the immunotherapy.
- the method may further comprise obtaining a population of dendritic cells that has been pulsed with one or more of the candidate peptides.
- the immunotherapy may be a composition comprising T cells that recognise at least one of the one or more of the neoantigens identified.
- the composition may be enriched for T cells that target at least one of the one or more of the neoantigens identified.
- the method may comprise obtaining a population of T cells and expanding the population of T cells to increase the number or relative proportion of T cells that target at least one of the one or more of the neoantigens identified.
- the method may further comprise obtaining a T cell population.
- a T cell population may be isolated from the subject, for example from one or more tumour samples obtained from the subject, or from a peripheral blood sample or a sample from other tissues of the subject.
- the T cell population may comprise tumour infiltrating lymphocytes.
- T cells may be isolated using methods which are well known in the art. For example, T cells may be purified from single cell suspensions generated from samples on the basis of expression of CD3, CD4 or CD8. T cells may be enriched from samples by passage through a Ficoll-Paque gradient.
- the method may further comprise expanding the T cell population.
- T cells may be expanded by ex vivo culture in conditions which are known to provide mitogenic stimuli for T cells.
- the T cells may be cultured with cytokines such as IL-2 or with mitogenic antibodies such as anti-CD3 and/or anti-CD28.
- the T cells may be co-cultured with antigen-presenting cells (APCs), which may have been irradiated.
- APCs may be dendritic cells or B cells.
- the dendritic cells may have been pulsed with peptides containing one or more of the identified neoantigens as single stimulants or as pools of stimulating neoantigen peptides.
- Expansion of T cells may be performed using methods which are known in the art, including for example the use of artificial antigen presenting cells (aAPCs), which provide additional co-stimulatory signals, and autologous PBMCs which present appropriate peptides.
- Autologous PBMCs may be pulsed with peptides containing neoantigens as discussed herein as single stimulants, or alternatively as pools of stimulating neoantigens.
- a method for expanding a T cell population for use in the treatment of cancer in a subject comprising: identifying one or more neoantigen peptides using a method as described herein; obtaining a T cell population comprising a T cell which is capable of specifically recognising one of the identified neoantigens; and co-culturing the T cell population with a composition comprising the identified neoantigens.
- the method may have one or more of the following features.
- the T cell population obtained may be assumed to comprise a T cell capable of specifically recognising one of the identified neoantigens.
- the method preferably comprises identifying a plurality of neoantigens.
- the neoantigens may be clonal neoantigens.
- the T cell population may comprise a plurality of T cells each of which is capable of specifically recognising one of the plurality of identified neoantigens, and co-culturing the T cell population with a composition comprising the plurality of identified neoantigens.
- the co-culture may result in expansion of the T cell population that specifically recognises the one or more neoantigens.
- the expansion may be performed by co-culture of a T cell with a neoantigen and an antigen presenting cell.
- the antigen presenting cell may be a dendritic cell.
- the expansion may be a selective expansion of T cells which are specific for the neoantigen.
- the expansion may further comprise one or more non-selective expansion steps.
- compositions comprising a population of T cells obtained or obtainable by a method according to any embodiment of the preceding aspect.
- a composition comprising a neoantigen, neoantigen specific immune cell, or an antibody that recognises a neoantigen, for use in the treatment or prevention of cancer in a subject, wherein said neoantigen has been identified as a neoantigen (e.g. identified as being derived from a tumour-specific mutation that is expressed in a tumour of the subject), using the methods described herein.
- a composition comprising a neoantigen, neoantigen specific immune cell, or an antibody that recognises a neoantigen, wherein said neoantigen has been identified using the methods described herein.
- a cell or population of cells expressing a neoantigen on its surface wherein said neoantigen has been identified using the methods described herein.
- a method of treating a subject that has been diagnosed as having cancer, the method comprising administering an immunotherapy that has been provided using the methods described herein, or a composition as described herein.
- a method of treating a subject that has been diagnosed as having cancer comprising: identifying one or more neoantigens by: identifying a plurality of tumour-specific mutations in the subject; identifying one or more neoantigen peptides associated with one or more tumour-specific mutations using the method of any embodiment of the first aspect; optionally selecting one or more of the neoantigen peptides using one or more predetermined criteria that apply to the neoantigen peptides; and treating the subject with an immunotherapy that targets a neoantigen associated with one or more of the optionally selected neoantigen peptides.
- a computer program comprising code which, when the code is executed on a computer, causes the computer to perform the steps of any method described herein, such as a method according to any embodiment of the first, second, third or fourth aspects above.
- Figure 1 is a flowchart illustrating schematically a method of identifying a neoantigen peptide.
- Figure 2 is a flowchart illustrating schematically a method of providing an immunotherapy (A. steps up to and including designing an immunotherapy, B. steps up to and including administering the immunotherapy to the subject).
- Figure 3 shows an embodiment of a system for identifying neoantigen peptides and/or for providing an immunotherapy.
- Figure 4 illustrates schematically the concept of neojunctions and splice region associated mutations.
- Figure 5 illustrates schematically different types of abnormal splicing that can be caused by mutations.
- Figure 6 illustrates a problem with obtaining peptide sequences from sequences associated with neojunctions.
- Figure 7 illustrates different classes of neojunction variants.
- Figure 8 shows examples of application of methods according to embodiments of the disclosure to splice donor mutations on the + strand (A, B), 5’ intronic splice region mutations on the + strand (C, D), 5’ exonic splice region variants on the + strand (E, F), and splice donor mutations on the - strand (G, H).
- “*” in amino acid sequences indicates a “stop” codon.
- Figure 9 shows results obtained using methods of the disclosure to identify mutant peptides from 5’ splice region variants in a synthetic dataset. Left: number of variants in each of the categories identified on Figure 7. Right: number of variants for which a peptide could be assembled using methods of the disclosure.
- Figure 10 shows results obtained using methods of the disclosure to identify mutant peptides from splice variants in cell line data (HCC1395).
- Figure 11 shows results obtained using methods of the disclosure to identify mutant peptides from splice variants in cell line data (HCC1395).
- Figure 12 illustrates a method to identify neoantigen peptides from sequence data from a plurality of samples (labelled R1, R2).
- the method considers the union of reads from all samples over a locus when assembling a variant sequence.
- the method also preserves the information about which sample each read came from, which enables it to determine whether there are reads supporting the mutation in each sample. This can in turn be used to determine whether the mutation is ubiquitous (present in all of the plurality of samples).
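- The pooling and ubiquity check described for Figure 12 can be sketched as follows. This is a minimal illustration only: the `Read` record, the function names and the `min_reads` cut-off are hypothetical and are not taken from the disclosure, and each read is reduced to a flag saying whether it supports the mutation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class Read(NamedTuple):
    sample: str             # sample of origin, e.g. "R1" or "R2"
    supports_variant: bool  # True if the read carries the mutation


def pool_reads(reads_by_sample: Dict[str, Iterable[bool]]) -> List[Read]:
    """Union of reads from all samples over a locus, each tagged with its sample."""
    return [
        Read(sample, flag)
        for sample, flags in reads_by_sample.items()
        for flag in flags
    ]


def supporting_samples(pooled: Iterable[Read], min_reads: int = 1) -> Set[str]:
    """Samples with at least `min_reads` reads supporting the mutation."""
    counts: Dict[str, int] = defaultdict(int)
    for read in pooled:
        if read.supports_variant:
            counts[read.sample] += 1
    return {sample for sample, n in counts.items() if n >= min_reads}


def is_ubiquitous(pooled: Iterable[Read], all_samples: Iterable[str],
                  min_reads: int = 1) -> bool:
    """True if every sample has reads supporting the mutation."""
    return set(all_samples) <= supporting_samples(pooled, min_reads)
```

Because the sample label travels with each read, the same pooled list can be used both for assembling the variant sequence and for the per-sample support check.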
- Figure 13 shows results obtained using methods of the disclosure to identify mutant peptides from tumour samples.
- A. In the top plot, each bar of the barchart represents a different cancer patient, and the height of the bar indicates the number of splice region associated variants for which peptides could be identified for each patient in each category of variant (in each bar, from top to bottom: exonic variant, splice donor variant, intronic variant). The bottom plot shows the information filtered for variants that are ubiquitously expressed.
- B. Distribution of variant allele fractions (VAF) of splice variants identified across a cohort of patients, by class of splice variant.
- although cancer antigens, such as those resulting from cancer-specific variants (also referred to as “tumour-specific mutations”), represent promising therapeutic targets provided that they are expressed by cancer cells, identifying the peptides that are produced as a result of the presence of these mutations (and which are therefore potential neoantigens) remains difficult.
- a “sample” as used herein may be a cell or tissue sample, or an extract (e.g. a RNA extract obtained from a subject) from which transcriptomic material can be obtained.
- a “sample” as used herein may be a cell or tissue sample, a biological fluid, an extract (e.g. a DNA extract obtained from the subject), from which genomic material can be obtained for genomic analysis, such as genomic sequencing (e.g. whole genome sequencing, whole exome sequencing).
- the sample may be one which has been freshly obtained from a subject or may be one which has been processed and/or stored prior to genomic/transcriptomic analysis (e.g.
- the sample may be a cell or tissue culture sample.
- a sample as described herein may refer to any type of sample comprising cells or genomic and/or transcriptomic material derived therefrom, whether from a biological sample obtained from a subject, or from a sample obtained from e.g. a cell line.
- the sample is a sample obtained from a subject, such as a human subject.
- the sample is preferably from a mammalian subject (such as e.g. a mammalian cell sample or a sample from a mammalian subject such as a cat, dog, horse, donkey, sheep, pig, goat, cow, mouse, rat, rabbit or guinea pig), or from a human (such as e.g. a human cell sample or a sample from a human subject).
- the sample may be transported and/or stored, and collection may take place at a location remote from the sequence data acquisition (e.g. sequencing) location, and/or any computer-implemented method steps described herein may take place at a location remote from the sample collection location and/or remote from the sequence data acquisition (e.g. sequencing) location (e.g. the computer-implemented method steps may be performed by means of a networked computer, such as by means of a “cloud” provider).
- a sample may comprise tumour cells (e.g. a tumour sample or sample comprising circulating tumour cells) or genetic material derived from tumour cells (such as e.g. cell free DNA, or cell DNA and/or RNA extracted from a sample comprising cells).
- sample may be a “mixed sample”.
- a “mixed sample” refers to a sample that is assumed to comprise multiple cell types or genetic material derived from multiple cell types.
- a mixed sample is typically one that comprises tumour cells or is assumed (expected) to comprise tumour cells, or genetic material derived from tumour cells, and normal cells or genetic material derived from normal cells.
- Genetic material can comprise genomic material (e.g. DNA) or transcriptomic material (e.g. RNA).
- Samples obtained from subjects are typically mixed samples (unless they are subject to one or more purification and/or separation steps).
- the sample comprises tumour cells and at least one non-tumour cell type (and/or genetic material derived therefrom).
- a “tumour sample” refers to a sample derived from or obtained from a tumour.
- Such samples may comprise tumour cells and normal (non-tumour) cells.
- the normal cells may comprise immune cells (such as e.g. lymphocytes), and/or other normal (non-tumour) cells (e.g. stromal cells).
- a tumour may be a solid tumour or a non-solid or haematological tumour.
- a tumour sample may be a primary tumour sample, tumour-associated lymph node sample, or a sample from a metastatic site from the subject.
- a sample comprising tumour cells or genetic material derived from tumour cells may be a bodily fluid sample.
- the genetic material derived from tumour cells may be circulating tumour DNA or tumour DNA in exosomes.
- the sample may comprise circulating tumour cells.
- a mixed sample may be a sample of cells, tissue or bodily fluid that has been processed to extract genetic material. Methods for extracting genetic material from biological samples are known in the art.
- a mixed sample may have been subject to one or more processing steps that may modify the proportion of the multiple cell types or genetic material derived from the multiple cell types in the sample.
- a mixed sample comprising tumour cells may have been processed to enrich the sample in tumour cells.
- a sample of purified tumour cells may be referred to as a “mixed sample” on the basis that small amounts of other types of cells may be present, even if the sample may be assumed, for a particular purpose, to be pure (i.e. to have a tumour fraction of 100%).
- tumour fraction refers to the proportion of DNA containing cells within a mixed sample that are tumour cells, or to the equivalent proportion that is assumed to result in a particular mixture of genetic material from tumour and non-tumour cells in a sample.
- Methods for determining the tumour fraction in a sample are known in the art. For example, in the context of cell or tissue samples, a tumour fraction may be estimated by analysing pathology slides (e.g.
- a tumour fraction may be estimated using sequence analysis processes that attempt to deconvolute tumour and germline genomes such as e.g. ASCAT (Van Loo et al., 2010), ABSOLUTE (Carter et al., 2012), or ichorCNA (Adalsteinsson et al., 2017).
- a “normal sample”, “healthy sample” or “germline sample” refers to a sample that is assumed not to comprise tumour cells or genetic material derived from tumour cells.
- a germline sample may be a blood sample, a tissue sample, or a purified sample such as a sample of peripheral blood mononuclear cells from a subject.
- the terms “normal”, “germline” or “wild type” when referring to sequences or genotypes refer to the sequence/genotype of cells other than tumour cells.
- a germline sample may comprise a small proportion of tumour cells or genetic material derived therefrom, and may nevertheless be assumed, for practical purposes, not to comprise said cells or genetic material. In other words, all cells or genetic material may be assumed to be normal and/or sequence data that is not compatible with the assumption may be ignored.
- sequence data refers to information that is indicative of the presence of genetic material in a sample that has a particular sequence. Such information may be obtained using sequencing technologies, such as e.g. next generation sequencing (NGS), for example whole exome sequencing (WES), whole genome sequencing (WGS), RNA sequencing or sequencing of captured genomic loci (targeted or panel sequencing), or using array technologies, such as e.g. copy number variation arrays, SNP arrays, expression arrays or other molecular counting assays.
- the sequence data is typically obtained by DNA sequencing, and particularly next generation sequencing.
- the sequence data may comprise sequencing reads.
- sequence data may comprise a signal (e.g. an intensity value) that is indicative of the number of sequences in the sample that have a particular sequence, for example by comparison to an appropriate control.
- Sequence data may be mapped to a reference sequence, for example a reference genome or transcriptome, using methods known in the art (such as e.g. Bowtie (Langmead et al., 2009)). This may result in aligned sequencing reads, for example in the form of a SAM or BAM file.
- sequencing reads or equivalent non-digital signals may be associated with a particular genomic location (where the “genomic location” refers to a location in the reference genome to which the sequence data was mapped). Further, a genomic location may contain a mutation compared to the reference sequence at the particular genomic location.
- the process of identifying the presence of a mutation at a particular location in a sample is referred to as “variant calling” and can be performed using methods known in the art (such as e.g. the GATK HaplotypeCaller, gatk.broadinstitute.org/hc/en-us/articles/360037225632-HaplotypeCaller).
- tumour-specific mutation refers to a difference in a nucleotide sequence (e.g. DNA or RNA) in a tumour cell compared to a healthy cell from the same subject.
- the difference in the nucleotide sequence can result in the expression of a protein which is not expressed by a healthy cell from the same subject.
- a mutation may be a single nucleotide variant (SNV), multiple nucleotide variant (MNV), a deletion mutation, an insertion mutation, a missense mutation, a translocation, a fusion, or any other change in the genetic material of a tumour cell.
- Mutations may be identified by exome sequencing, RNA-sequencing, whole genome sequencing and/or targeted gene panel sequencing and/or routine Sanger sequencing of single genes, followed by sequence alignment and comparing the DNA and/or RNA sequence from a tumour sample to DNA and/or RNA from a reference sample or reference sequence (e.g. the germline DNA and/or RNA sequence, or a reference sequence from a database). Suitable methods are known in the art.
- An “indel mutation” refers to an insertion and/or deletion of bases in a nucleotide sequence (e.g. DNA or RNA) of an organism.
- the indel mutation occurs in the DNA, preferably the genomic DNA, of an organism.
- the indel may be from 1 to 150 bases, for example 1 to 90, 1 to 50, 1 to 23 or 1 to 10 bases.
- a “splice region associated mutation”, “splice variant”, or “splice mutation” refers to a mutation that affects the splicing of a transcript comprising the mutation. These mutations result in the creation of at least one new junction between intronic and/or exonic sequences, also referred to as a “neojunction”.
- Figure 4 shows a gene model comprising two exons (Exon 1 and Exon 2) separated by an intron, as well as RNA sequencing read data mapped to the gene model.
- the illustrated sequencing data includes reads that map exclusively to Exons 1 and 2 (illustrated by dashed lines over the intronic sequence where the sequencing reads do not map), as expected for a sequence that is not mutated at the gene locus.
- the illustrated sequencing data also includes reads that map to Exon 1 and at least part of the following intron, and that include a mutation in the 5’ exon near the exon-intron boundary. These reads provide evidence that the mutation affected the splicing of the intron, resulting in novel sequence being included in the mature transcript downstream of the exon.
- the boundary between the exon sequence (in the illustrated case this is the normal, complete exon sequence) and the retained intronic sequence is a new boundary (or “neojunction”) that is not present in normal mature transcripts (where only a boundary between the sequence of exon 1 and the sequence of exon 2 exists).
- Figure 4 shows an example of a splice region associated mutation that results in at least partial intron retention.
- Any type of mutation above may affect the splicing of a transcript if it is located in a region that is involved in the splicing of the transcript.
- Regions involved in the splicing of transcripts can include intronic regions, including specifically the splice donor site and splice acceptor site, or exonic regions, typically near an intron/exon boundary.
- Such variants are referred to herein as “splice region variants” or “splice region mutations”.
- a splice region variant may be any variant that occurs within a predetermined distance of a splice site, where a splice site may be defined as the location of a splice acceptor site or splice donor site.
- a predetermined distance may be set symmetrically for the intron side and the exon side of a splice junction, or may be set differently for the exon side and the intron side.
- a splice region variant may include any variant that occurs within a predetermined distance of 3 bases of a splice site, in an exon.
- a splice region variant may include any variant that affects any one or more of the last 3 bases at the 3’ end of an exon, or any one or more of the first 3 bases at the 5’ end of an exon.
- a splice region variant may include any variant that occurs within a predetermined distance of 6 bases of a splice site, in an intron.
- a splice region variant may include any variant that affects any one or more of the last 8 bases at the 3’ end of an intron (i.e. any of: the bases of the splice acceptor site itself, which are the last 2 bases of the intron, and the 6 bases that precede it), or any one or more of the first 8 bases at the 5’ end of an intron (i.e. any of: the bases of the splice donor site itself, which are the first 2 bases of the intron, and the 6 bases that follow it).
- a splice donor site may be defined as the 2 base region at the 5’ end of an intron.
- a splice acceptor site may be defined as the 2 base region at the 3’ end of an intron.
- a splice donor variant or mutation may be defined as any mutation that occurs within the first or second base at the 5’ end of an intron.
- a splice acceptor variant or mutation may be defined as any mutation that occurs within the last or penultimate base at the 3’ end of an intron.
- a splice donor site is a conserved GU sequence that is normally located at the 5’ end of introns.
- a splice acceptor site is a conserved AG sequence that is normally located at the 3’ end of introns. These sites are recognised by the splicing machinery, leading to the intronic sequence between the splice donor site and the splice acceptor site (included) being excised. Mutations can cause the loss of donor or acceptor sites, or the creation of new donor or acceptor sites. These may be referred to as “cryptic splice sites”.
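- The positional definitions above (2 base donor and acceptor sites, a 6 base intronic window beyond each site, and a 3 base exonic window) can be sketched as a simple classifier. This is an illustrative sketch only: the function name, the labels and the coordinate convention (1-based, inclusive, given in transcript 5’ to 3’ orientation, so that minus-strand genes would need converting first) are assumptions and are not taken from the disclosure.

```python
def classify_splice_variant(pos: int, intron_start: int, intron_end: int,
                            exon_window: int = 3, intron_window: int = 6) -> str:
    """Classify a variant position relative to a single intron.

    intron_start is the first base of the splice donor site and intron_end
    is the last base of the splice acceptor site (transcript orientation).
    """
    # Assumption: the intron is long enough that the 5' and 3' windows do not overlap.
    if intron_end - intron_start + 1 < 2 * (2 + intron_window):
        raise ValueError("intron too short for non-overlapping windows")

    if intron_start <= pos <= intron_start + 1:
        return "splice_donor"            # first 2 bases of the intron
    if intron_end - 1 <= pos <= intron_end:
        return "splice_acceptor"         # last 2 bases of the intron
    if intron_start + 2 <= pos <= intron_start + 1 + intron_window:
        return "splice_region_intronic"  # 6 bases following the donor site
    if intron_end - 1 - intron_window <= pos <= intron_end - 2:
        return "splice_region_intronic"  # 6 bases preceding the acceptor site
    if intron_start - exon_window <= pos < intron_start:
        return "splice_region_exonic"    # last 3 bases of the upstream exon
    if intron_end < pos <= intron_end + exon_window:
        return "splice_region_exonic"    # first 3 bases of the downstream exon
    return "other"
```

The two window sizes are parameters because, as noted above, the predetermined distance may be set differently for the exon side and the intron side.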
- Figure 5 illustrates schematically different types of abnormal splicing that can be caused by splice region associated mutations.
- a pre-mRNA is shown on the top row, including a first exon (Exon 1, 5’ exon, upstream exon), an intron and a second exon (Exon 2, 3’ exon, downstream exon). This matches a gene model for the gene.
- the second row shows the corresponding mature mRNA after normal splicing, where the intron is excised, resulting in a junction between Exon 1 and Exon 2.
- the third row shows a mature mRNA in which a mutation at the 3’end of the 5’ exon (i.e.
- the fourth row shows a mature mRNA in which a mutation in Exon 2 has caused it to be “skipped”, leading to splicing of the entire sequence between Exon 1 and an exon downstream of Exon 2. This leads to one neojunction between Exon 1 and Exon 3. In the illustrated embodiment the entire Exon 2 is skipped but it is also possible for partial exon skipping to occur.
- the fifth row shows a mature mRNA in which a mutation in Exon 1 has caused the splicing at the start of the intron to fail, but a cryptic splice donor site in the intron was used instead, leading to retention of part of the intron and creation of two neojunctions (one between Exon 1 and the intron and one between the intronic cryptic site and Exon 2).
- the sixth row illustrates a mature mRNA in which a mutation in Exon 1 has caused the appearance of a new donor site (Cryptic site) in Exon 1, leading to the sequence of Exon 1 following this site being excised from the mature mRNA and creating a neojunction between the cryptic site in Exon 1 and the start of Exon 2.
- the last row represents the fact that combinations of these types of abnormal splicing can also occur. For example, a part of an intron can be retained followed by full or partial skipping of an exon.
- the methods described herein can be used to identify neoantigen peptides associated with any of these types of abnormal splicing.
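- The types of abnormal splicing described for Figure 5 can be illustrated on a toy gene with plain string operations. This is a hypothetical sketch for intuition only: the function names, the offsets and the sequences are invented for illustration, and the code is not the peptide assembly method of the disclosure.

```python
def normal_splicing(exon1: str, intron1: str, exon2: str, exon3: str) -> str:
    """All introns excised: Exon 1 joined to Exon 2 joined to Exon 3."""
    return exon1 + exon2 + exon3


def full_intron_retention(exon1: str, intron1: str, exon2: str, exon3: str) -> str:
    """The whole of intron 1 is retained in the mature transcript."""
    return exon1 + intron1 + exon2 + exon3


def exon_skipping(exon1: str, intron1: str, exon2: str, exon3: str) -> str:
    """Exon 2 is skipped, creating a neojunction between Exon 1 and Exon 3."""
    return exon1 + exon3


def cryptic_donor_in_intron(exon1: str, intron1: str, exon2: str, exon3: str,
                            offset: int) -> str:
    """A cryptic donor `offset` bases into intron 1 is used: the first
    `offset` intronic bases are retained (partial intron retention)."""
    return exon1 + intron1[:offset] + exon2 + exon3


def cryptic_donor_in_exon(exon1: str, intron1: str, exon2: str, exon3: str,
                          offset: int) -> str:
    """A new donor `offset` bases into Exon 1 is used: the rest of Exon 1
    is excised together with the intron."""
    return exon1[:offset] + exon2 + exon3
```

For example, with the toy sequences `exon1="ATGGCC"`, `intron1="GTAAGTCTTTCAG"`, `exon2="GATTAC"` and `exon3="TGCTAA"`, exon skipping yields `"ATGGCCTGCTAA"`, in which the junction between `GCC` and `TGC` is a neojunction.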
- a “neoantigen” is an antigen that arises as a consequence of a mutation within a cancer cell. Thus, a neoantigen is not expressed by normal (i.e. non-tumour) cells.
- a neoantigen may be processed to generate distinct peptides which can be recognised by T cells when presented in the context of MHC molecules. As described herein, neoantigens may be used as the basis for cancer immunotherapies. References herein to "neoantigens" are intended to include peptides derived from mutated proteins, regardless of their size, provided that they arise as a consequence of a mutation and are immunogenic.
- the term may also encompass peptides derived from mutated proteins that are believed to be likely to be immunogenic by virtue of being absent in normal cells.
- the term “neoantigen” as used herein is intended to encompass any part of a neoantigen that is likely to be immunogenic (whether or not true immunogenicity has been verified).
- An "antigenic" molecule as referred to herein is a molecule which itself, or a part thereof, is capable of stimulating an immune response, when presented to the immune system or immune cells in an appropriate manner. Immunogenicity is subject-specific and depends on multiple factors.
- Immunogenicity is believed to require presentation of the neoantigen by an MHC molecule present in the subject, which itself requires binding of the neoantigen to the MHC molecule.
- the binding of a neoantigen to a particular MHC molecule (encoded by a particular HLA allele) and/or the presentation of the neoantigen by the MHC molecule may be predicted using methods which are known in the art. Examples of methods for predicting MHC binding and/or presentation include those described by Lundegaard et al., O’Donnell et al. 2020, and Bulik-Sullivan et al. 2018.
- MHC binding of neoantigens may be predicted using the netMHC-3 (Lundegaard et al.) and netMHCpan4 (Jurtz et al.) algorithms.
- a neoantigen that has been predicted to bind to or be presented by a particular MHC molecule may be considered to be likely to be presented by said MHC molecule on the cell surface.
- Immunogenicity is further believed to require interaction of the neoantigen with a T cell receptor (TCR) present in the subject, in the context of an MHC molecule present in the subject.
- the binding of peptides or peptide-MHC complexes to T cell receptors can be predicted using methods which are known in the art, such as e.g.
- neoantigen may refer to a peptide that has been predicted as likely to be immunogenic using any of the methods described above for predicting MHC binding, MHC presentation and/or TCR interaction, or that has been verified as immunogenic using methods known in the art such as e.g. ELISpot assays. Neoantigens that have not yet been predicted or verified to be immunogenic may by contrast be referred to as “candidate neoantigens”.
- a “clonal neoantigen” is a neoantigen that results from a mutation that is present in essentially every tumour cell in one or more samples from a subject (or that can be assumed to be present in essentially every tumour cell from which the tumour genetic material in the sample(s) is derived).
- a “clonal mutation” (sometimes referred to as “truncal mutation”) is a mutation that is present in essentially every tumour cell in one or more samples from a subject (or that can be assumed to be present in essentially every tumour cell from which the tumour genetic material in the sample(s) is derived).
- a clonal mutation may be a mutation that is present in every tumour cell in one or more samples from a subject.
- a “sub-clonal” neoantigen is a neoantigen that results from a mutation that is present in a subset or a proportion of cells in one or more tumour samples from a subject (or that can be assumed to be present in a subset of the tumour cells from which the tumour genetic material in the sample(s) is derived).
- a “sub-clonal” mutation is a mutation that is present in a subset or a proportion of cells in one or more tumour samples from a subject (or that can be assumed to be present in a subset of the tumour cells from which the tumour genetic material in the sample(s) is derived).
- a neoantigen or mutation may be clonal in the context of one or more samples from a subject while not being truly clonal in the context of the entirety of the population of tumour cells that may be present in a subject (e.g. including all regions of a primary tumour and metastasis).
- a clonal mutation may be “truly clonal” in the sense that it is a mutation that is present in essentially every tumour cell (i.e.
- a “clonal neoantigen” or “clonal mutation” may also be referred to as a “ubiquitous neoantigen” or “ubiquitous mutation”, to indicate that the neoantigen is present in essentially all tumour cells or all tumour samples that have been analysed, but may not be present in all tumour cells that may exist in the subject.
- the terms “clonal” and “ubiquitous” are used interchangeably unless context indicates that reference to “true clonality” was intended.
- “essentially every tumour cell” in relation to one or more samples or a subject may refer to at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% of the tumour cells in the one or more samples or the subject.
- a neoantigen/mutation that is identified as likely to be clonal (or “ubiquitous”) may be considered likely to be truly clonal, or at least more likely to be truly clonal than a neoantigen/mutation that is identified as unlikely to be clonal.
- the confidence in the probability that a clonal neoantigen/mutation identified in a subject is truly clonal increases when the sample(s) used to identify the clonal neoantigen/mutation capture a more complete picture of the genetic diversity of the tumour (e.g. by including a plurality of samples from the subject, such as e.g. samples from different regions of the tumour, and/or by including samples that inherently capture a diversity of tumour cells such as e.g. ctDNA samples).
- a neoantigen/mutation that is identified as unlikely to be clonal is unlikely to be truly clonal, because the identification that the neoantigen/mutation is unlikely to be clonal indicates that even in the restricted view afforded by the sampling process, there is evidence that the neoantigen/mutation is not present in all tumour cells.
- the process of identifying clonal neoantigens/mutations may be seen as prioritising which candidate neoantigens/mutations are most likely to be clonal, based on the restricted view of the clonal structure of the subject’s tumour available from the one or more samples.
- “cancer cell fraction” (CCF) refers to the proportion of tumour cells that contain a mutation, such as e.g. a mutation that results in a particular neoantigen.
- a cancer cell fraction may be estimated based on one or more samples, and as such may not be equal to the true cancer cell fraction in the subject (as explained above). Nevertheless, the cancer cell fraction estimated based on one or more samples may provide a useful indication of the likely true cancer cell fraction. Further, as explained above, the accuracy of such an estimate may increase when the sample(s) used to estimate the cancer cell fraction capture a more complete picture of the genetic diversity of the tumour. Additional sources of noise and confounding factors in genomic data mean that a cancer cell fraction determined from one or more samples represents an estimate.
- mutations/neoantigens that are more likely to be clonal are expected to be associated with a higher CCF estimate (which may not be equal to 1) than mutations that are less likely to be clonal, which are expected to be associated with a lower CCF estimate.
- a cancer cell fraction estimate may be obtained by integrating variant allele frequencies with copy numbers and purity estimates as described by Landau et al. (2013). Such a CCF estimate can also be used to identify mutations that are likely to be clonal.
- a clonal mutation may be defined as a mutation which has an estimated cancer cell fraction (CCF) > 0.75, such as a CCF > 0.80, 0.85, 0.90, 0.95 or 1.0.
- a subclonal mutation may be defined as a mutation which has a CCF < 0.95, 0.90, 0.85, 0.80, or 0.75.
- a CCF estimate may be associated with (e.g. derived from) a distribution associating a probability with each of a plurality of possible values of CCF between 0 and 1, from which statistical estimates of confidence may be obtained.
- a mutation may be identified as clonal if there is more than a 50% chance or probability that its cancer cell fraction (CCF) reaches or exceeds the required value as defined above, for example 0.75 or 0.95, such as a chance or probability of 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95% or more.
- mutations may be classified as likely clonal or subclonal based on whether the posterior probability that their CCF exceeds 0.95 (or 0.75, or any other chosen threshold) is greater or lesser than 0.5, respectively.
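The CCF estimation and posterior-based classification described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of Landau et al. (2013) as such: it assumes a binomial read-count model, a uniform prior over a grid of CCF values, and an expected variant allele frequency of purity × CCF × multiplicity / (purity × tumour copy number + (1 − purity) × normal copy number). The function names and parameters are hypothetical.

```python
from math import exp, log

def ccf_posterior(alt_reads, total_reads, purity, tumour_cn,
                  normal_cn=2, multiplicity=1, grid_size=100):
    """Posterior over a grid of CCF values (uniform prior, binomial likelihood)."""
    ccfs = [i / grid_size for i in range(1, grid_size + 1)]
    log_liks = []
    for ccf in ccfs:
        # Expected variant allele frequency if the mutation has this CCF.
        vaf = purity * ccf * multiplicity / (
            purity * tumour_cn + (1 - purity) * normal_cn)
        vaf = min(max(vaf, 1e-9), 1 - 1e-9)  # keep log() defined
        log_liks.append(alt_reads * log(vaf)
                        + (total_reads - alt_reads) * log(1 - vaf))
    peak = max(log_liks)
    weights = [exp(v - peak) for v in log_liks]  # work relative to the peak
    total = sum(weights)
    return ccfs, [w / total for w in weights]

def is_likely_clonal(ccfs, probs, ccf_threshold=0.95, prob_threshold=0.5):
    """True if the posterior mass at or above ccf_threshold exceeds prob_threshold."""
    mass = sum(p for c, p in zip(ccfs, probs) if c >= ccf_threshold)
    return mass > prob_threshold

# Hypothetical mutation: 40 of 100 reads carry the variant, purity 0.8, diploid locus.
ccfs, probs = ccf_posterior(40, 100, purity=0.8, tumour_cn=2)
print(is_likely_clonal(ccfs, probs, ccf_threshold=0.75))
```

Working on a grid keeps the whole distribution available, so the same output can also be used to derive confidence intervals for the CCF estimate.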
- the threshold may be fixed.
- the threshold may be determined for a particular set of mutations that are investigated.
- the threshold may be set based on a benchmarking data set with known clonal / non-clonal status, to reach a predetermined precision and/or recall.
- a benchmarking data set may be obtained using synthetic data and/or using a data set obtained from a population with known clonality structure (for example a cell line mixture data).
- the threshold may be set such that any mutation (or a certain % of mutations) that is associated with an estimated CCF that has a confidence interval meeting the criteria described above (e.g.
- the threshold may be set such that any mutation (or a certain % of mutations) that is associated with an estimated CCF that has a posterior probability distribution meeting the criteria described above (e.g. a posterior probability that their CCF exceeds 0.95 (or 0.75, or any other chosen threshold) is greater than 0.5) is selected as likely to be clonal.
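One way to set a threshold from a benchmarking data set so as to reach a predetermined precision is sketched below. It assumes each benchmark mutation has a score (for example a posterior probability of being clonal) and a known clonal status; the function name and the choice to return the lowest qualifying threshold (which keeps recall as high as possible) are illustrative assumptions.

```python
def pick_threshold(scores, is_clonal, target_precision=0.9):
    """Lowest score threshold whose calls reach target_precision, or None."""
    best = None
    for threshold in sorted(set(scores), reverse=True):
        # Known status of every mutation called clonal at this threshold.
        called = [truth for s, truth in zip(scores, is_clonal) if s >= threshold]
        precision = sum(called) / len(called)
        if precision >= target_precision:
            best = threshold  # keep lowering while precision still holds
    return best

# Hypothetical benchmark: scores with known clonal (True) / non-clonal (False) status.
scores = [0.99, 0.9, 0.8, 0.6, 0.4, 0.2]
truth = [True, True, False, True, False, False]
print(pick_threshold(scores, truth, target_precision=0.9))
```

Every candidate threshold is checked because precision does not necessarily change monotonically as the threshold is lowered.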
- a cancer immunotherapy refers to a therapeutic approach comprising administration of an immunogenic composition (e.g. a vaccine), a composition comprising immune cells, or an immunoactive drug, such as e.g. a therapeutic antibody, to a subject.
- an immunogenic composition or vaccine may comprise a neoantigen, neoantigen presenting cell or material necessary for the expression of the neoantigen.
- a composition comprising immune cells may comprise T and/or B cells that recognise a neoantigen.
- the immune cells may be isolated from tumours or other tissues (including but not limited to lymph node, blood or ascites), expanded ex vivo or in vitro and re-administered to a subject (a process referred to as “adoptive cell therapy”).
- T cells can be isolated from a subject and engineered to target a neoantigen (e.g. by insertion of a chimeric antigen receptor that binds to the neoantigen) and re-administered to the subject.
- a therapeutic antibody may be an antibody which recognises a neoantigen.
- an antibody as referred to herein will recognise the neoantigen.
- where the neoantigen is an intracellular antigen, the antibody will recognise the neoantigen peptide-MHC complex.
- an antibody which "recognises" a neoantigen encompasses both of these possibilities.
- an immunotherapy may target a plurality of neoantigens.
- an immunogenic composition may comprise a plurality of neoantigens, cells presenting a plurality of neoantigens or the material necessary for the expression of the plurality of neoantigens.
- a composition may comprise immune cells that recognise a plurality of neoantigens. Similarly, a composition may comprise a plurality of immune cells that recognise the same neoantigen. As another example, a composition may comprise a plurality of therapeutic antibodies that recognise a plurality of neoantigens. Similarly, a composition may comprise a plurality of therapeutic antibodies that recognise the same neoantigen.
- a composition as described herein may be a pharmaceutical composition which additionally comprises a pharmaceutically acceptable carrier, diluent or excipient.
- the pharmaceutical composition may optionally comprise one or more further pharmaceutically active polypeptides and/or compounds.
- Such a formulation may, for example, be in a form suitable for intravenous infusion.
- an immune cell is intended to encompass cells of the immune system, for example T cells, NK cells, NKT cells, B cells and dendritic cells.
- the immune cell is a T cell.
- An immune cell that recognises a neoantigen may be an engineered T cell.
- a neoantigen specific T cell may express a chimeric antigen receptor (CAR) or a T cell receptor (TCR) which specifically binds a neoantigen, or an affinity-enhanced T cell receptor (TCR) which specifically binds a neoantigen (as discussed further hereinbelow).
- the T cell may express a chimeric antigen receptor (CAR) or a T cell receptor (TCR) which specifically binds to a neoantigen (for example an affinity enhanced T cell receptor (TCR) which specifically binds to a neo-antigen or a neo-antigen peptide).
- a population of immune cells that recognise a neoantigen may be a population of T cells isolated from a subject with a tumour.
- the T cell population may be generated from T cells in a sample isolated from the subject, such as e.g. a tumour sample, a peripheral blood sample or a sample from other tissues of the subject.
- the T cell population may be generated from a sample from the tumour in which the neoantigen is identified.
- the T cell population may be isolated from a sample derived from the tumour of a patient to be treated, where the neoantigen was also identified from a sample from said tumour.
- the T cell population may comprise tumour infiltrating lymphocytes (TIL).
- Antibody includes monoclonal antibodies, polyclonal antibodies, multispecific antibodies (e.g., bispecific antibodies), and antibody fragments that exhibit the desired biological activity.
- the term “immunoglobulin” (Ig) is used interchangeably with “antibody”.
- an “immunogenic composition” is a composition that is capable of inducing an immune response in a subject.
- the term is used interchangeably with the term “vaccine”.
- the immunogenic composition or vaccine described herein may lead to generation of an immune response in the subject.
- An "immune response" which may be generated may be humoral and/or cell-mediated immunity, for example the stimulation of antibody production, or the stimulation of cytotoxic or killer cells, which may recognise and destroy (or otherwise eliminate) cells expressing antigens corresponding to the antigens in the vaccine on their surface.
- the immunogenic composition may comprise one or more neoantigens, or the material necessary for the expression of one or more neoantigens.
- a neoantigen may be delivered in the form of a cell, such as an antigen presenting cell, for example a dendritic cell.
- the antigen presenting cell such as a dendritic cell may be pulsed or loaded with the neo-antigen or neoantigen peptide or genetically modified (via DNA or RNA transfer) to express one, two or more neo-antigens or neoantigen peptides, for example 2, 3, 4, 5, 6, 7, 8, 9 or 10 neo-antigens or neo-antigen peptides.
- Methods of preparing dendritic cell immunogenic compositions or vaccines are known in the art.
- Neoantigen peptides may be synthesised using methods which are known in the art.
- the term “peptide” is used in the normal sense to mean a series of residues, typically L-amino acids, connected one to the other typically by peptide bonds between the α-amino and carboxyl groups of adjacent amino acids.
- the term includes modified peptides and synthetic peptide analogues.
- the neoantigen peptide may comprise the cancer cell specific mutation (e.g. the non-silent amino acid substitution encoded by a single nucleotide variant (SNV)) at any residue position within the peptide.
- a peptide which is capable of binding to an MHC class I molecule is typically 7 to 13 amino acids in length.
- neoantigen peptides may be from 7 to 15, such as 8 to 13 amino acids in length.
- the peptides may be 7, 8, 9, 10, 11, 12, 13, 14 or 15 amino acids long.
- peptides that are 15 amino acids long may be designed as a set of overlapping sequences with 11 amino acid overlaps (i.e. a “jump” of 4 amino acids from the start of one sequence to the start of the next), each including at least one amino acid that results from the presence of a tumour specific mutation, to provide an overlapping peptide pool.
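The overlapping peptide pool described above can be sketched as follows: windows of 15 residues whose starts are 4 residues apart overlap by 11 residues, and only windows containing at least one mutant residue are kept. The function name, the 0-based indexing of mutant positions and the example sequence are assumptions made for illustration.

```python
def overlapping_peptide_pool(protein, mutant_positions, length=15, jump=4):
    """Tile `protein` with windows of `length` residues whose starts are `jump`
    apart, keeping only windows that contain a mutant residue (0-based index)."""
    mutant = set(mutant_positions)
    pool = []
    for start in range(0, len(protein) - length + 1, jump):
        if any(i in mutant for i in range(start, start + length)):
            pool.append(protein[start:start + length])
    return pool

# Hypothetical 31-residue mutant sequence with the mutant residue at index 15.
protein = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKLM"
for peptide in overlapping_peptide_pool(protein, mutant_positions=[15]):
    print(peptide)
```

In this sketch the tiling starts at the first residue of the supplied sequence; a different starting offset would shift which windows are produced.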
- longer peptides, for example 15-31-mers, may be used.
- the mutation may be at any position, for example at the centre of the peptide, e.g. at positions 7, 8, 9, 10, 11, 12, 13, 14, 15 or 16.
- Such peptides can also be used to stimulate both CD4 and CD8 cells to recognise neoantigens.
- longer peptides such as peptides that are 27, 28, 29, 30 or 31 amino acids long, may be used to stimulate both CD4+ and CD8+ cells.
- the mutation may be present at any residue position(s) within the peptide.
- the peptide may comprise one or more amino acids that are not present in a corresponding native peptide expressed by a healthy cell at any residue position(s) within the peptide.
- the mutation, for example an amino acid substitution(s) or any other new sequence resulting from the presence of a tumour specific mutation, may be present (or may start) at any one or more of positions 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or 13 in a peptide comprising 13 amino acids.
- a mutation comprising a plurality of amino acids may be present at any one or more of positions 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or 13 in a peptide comprising 13 amino acids.
- Reference to the mutation being at a particular position within the peptide refers to the residue position of the mutation relative to the amino acid sequence of the corresponding native peptide expressed by a healthy cell.
- the mutation is at or near the centre of the peptide.
- the mutation (i.e. any one or more amino acids resulting from the presence of a tumour specific mutation) may be present (or may start) at any one or more of positions 6, 7, 8 or 9 in a peptide comprising 13 amino acids.
- a mutation comprising a plurality of amino acids may start at a predetermined position within the peptide. For example, a predetermined position may be selected from position 1, position 1+j, position 1+2j, position 1+3j, etc.
- j is a predetermined “jump” value, such as e.g. 3, 4 or 5 amino acids.
- a mutation may result in a sequence of amino acids that are not present in a corresponding peptide expressed by a healthy cell, the sequence being longer than the length of the peptide.
- the peptide may consist of amino acids that result from the mutation. Further, the sequence of amino acids that result from the mutation may not be fully included in the peptide, which may instead comprise a portion of said sequence.
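The predetermined start positions 1, 1+j, 1+2j, etc. can be illustrated with a short sketch. It assumes the mutant protein sequence and the 0-based index at which the mutation-derived sequence starts are already known; the function name and the returned mapping are hypothetical.

```python
def peptides_with_jump(protein, mut_start, length=13, jump=4):
    """Map the 1-based position of the mutation start within each peptide to
    the peptide, for positions 1, 1+j, 1+2j, ... that fit within `protein`."""
    peptides = {}
    for offset in range(0, length, jump):
        start = mut_start - offset
        if start < 0 or start + length > len(protein):
            continue  # this window would run off either end of the sequence
        peptides[offset + 1] = protein[start:start + length]
    return peptides

# Hypothetical sequence; the mutation-derived residue is "S" at index 15.
protein = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKLM"
for position, peptide in peptides_with_jump(protein, mut_start=15).items():
    print(position, peptide)
```

With a 13-residue peptide and j = 4, the mutation start falls at positions 1, 5, 9 and 13 of the respective peptides.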
- treatment refers to reducing, alleviating or eliminating one or more symptoms of the disease which is being treated, relative to the symptoms prior to treatment.
- prevention refers to delaying or preventing the onset of the symptoms of the disease. Prevention may be absolute (such that no disease occurs) or may be effective only in some individuals or for a limited amount of time.
- a computer system includes the hardware, software and data storage devices for embodying a system or carrying out a method according to the above described embodiments.
- a computer system may comprise a central processing unit (CPU) and/or a graphic processing unit (GPU), input means, output means and data storage, which may be embodied as one or more connected computing devices.
- a computer system may comprise a display, or may comprise a computing device that has a display, to provide a visual output.
- the data storage may comprise RAM, disk drives or other non-transitory computer readable media.
- the computer system may include a plurality of computing devices connected by a network and able to communicate with each other over that network. It is explicitly envisaged that the computer system may consist of or comprise a cloud computer.
- computer readable media includes, without limitation, any non-transitory medium or media which can be read and accessed directly by a computer or computer system.
- the media can include, but are not limited to, magnetic storage media such as floppy discs, hard disc storage media and magnetic tape; optical storage media such as optical discs or CD-ROMs; electrical storage media such as memory, including RAM, ROM and flash memory; and hybrids and combinations of the above such as magnetic/optical storage media.
- the present disclosure provides methods for identifying a neoantigen peptide associated with a tumour-specific mutation, using RNA sequence data from one or more samples comprising tumour genetic material. An illustrative method will be described by reference to Figure 1.
- the method may comprise optional step 10 of obtaining one or more samples comprising tumour genetic material (such as e.g. one or more tumour samples).
- the sample(s) may be mixed samples comprising genomic material from multiple cell types including tumour cells and non-tumour cells (also referred to as “reference”, “healthy”, “normal” or “germline” cells).
- One or more germline samples may also be obtained, which do not comprise tumour genetic material.
- Germline samples may be matched germline samples, obtained from the same subject as the subject from which the one or more tumour samples are obtained.
- a matched germline sample improves the accuracy of calling of somatic (tumour-specific) mutations, as any variant position identified in a tumour sample can be compared to variant positions in a matched germline sample to exclude germline variants.
- the same matched germline sample may be used to analyse a plurality of tumour samples from a subject. Further, the matched germline sample and one or more tumour samples may have been obtained at different times. For example, a first tumour sample and matched germline sample may have been obtained at the time of diagnosis or resection of a tumour, and a further tumour sample may be obtained and analysed together with the initial matched germline sample at a later time point.
- a reference sample or genome including common germline variants may be used.
- a process-matched normal sample may be used, which may not have been obtained from the same subject, or may have been obtained from a pool of subjects.
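The use of a germline sample to exclude germline variants can be sketched as a simple set difference. The representation of a variant as a (chromosome, position, reference allele, alternate allele) tuple and the function name are assumptions for illustration; real variant callers apply statistical models rather than exact matching.

```python
def somatic_variants(tumour_variants, germline_variants):
    """Keep tumour variants that are absent from the germline variant set."""
    germline = set(germline_variants)
    return [v for v in tumour_variants if v not in germline]

# Hypothetical variant calls as (chromosome, position, ref, alt).
tumour = [("chr1", 1000, "A", "T"), ("chr2", 500, "G", "C"), ("chr7", 42, "C", "T")]
germline = [("chr2", 500, "G", "C")]
print(somatic_variants(tumour, germline))
```

The same germline set can be reused for several tumour samples from the subject, or replaced by a set of common germline variants when no matched sample is available.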
- the samples may be sequenced at step 12, to obtain at least RNA sequence data, and also optionally DNA sequence data.
- the RNA sequence data may be obtained by RNA sequencing.
- the DNA sequence data may be obtained using whole exome sequencing or whole genome sequencing. Alternatively, the sequence data may have been previously obtained and may be received from a user interface, computing device or database. At optional step 14, the sequence data may be analysed to identify one or more mutations that are likely to be present in the tumour cells but not in non-cancerous cells.
- Step 14 may comprise the steps of aligning the sequences from the one or more samples (i.e. the mixed sample(s) and the germline sample(s), if available) to a reference such as e.g. a reference genome or transcriptome, and identifying genomic locations (also referred to as “positions” or “loci”) where the sequence of the tumour differs from the germline sequence or can be assumed to differ from the germline sequence (e.g. if a germline sequence for the subject is not available).
- tumour-specific mutations may be somatic mutations present in a tumour of the subject from which the samples have been obtained. Any one or more of the tumour-specific mutations identified (or otherwise selected for example by a user through a user interface, or obtained from a computing device or database), may then be analysed to identify peptides associated with the mutations. The subsequent steps will be described in relation to a single tumour-specific mutation, and can be repeated for each of a plurality of tumour-specific mutations identified.
- the tumour-specific mutation may be a tumour-specific mutation that has been identified as a splice region variant.
- the tumour-specific mutation may be a tumour-specific mutation that has been identified as a 5’ splice variant. These types of variants may be identified based on the position of the mutation relative to annotated splice sites in a reference genome or transcriptome (sometimes also referred to as a gene model).
- the tumour-specific mutation may be a mutation that has been filtered or annotated for whether it has RNA sequence read support in each of a plurality of tumour samples from a patient. Such mutations may be considered ubiquitous in the tumour.
- the tumour-specific mutation may be a mutation that has been filtered or annotated for whether it is likely to be clonal in a tumour.
- the tumour-specific mutation may be a mutation that has been filtered or annotated for whether it is likely to be expressed in a tumour.
- RNA sequence reads that contain the tumour-specific mutation are selected from the RNA sequence data.
- Reads may be reads that have been mapped to a reference genome or transcriptome and therefore reads may be selected by selecting reads that overlap the genomic position of the tumour-specific mutation and where the sequence of the read is compatible with the presence of the mutation.
- the reads may be selected from sequence data from a single sample or from a plurality of samples. In other words, selecting one or more RNA sequence reads is performed across the RNA sequence data from all of a plurality of samples (e.g. a plurality of samples from different tumour regions of a patient).
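Read selection across samples can be sketched as follows for a single nucleotide variant. The sketch assumes that each mapped read is represented by its reference start position and its sequence, and that reads align without gaps; these representations and the function names are illustrative assumptions.

```python
def read_supports_snv(read_start, read_seq, snv_pos, alt_base):
    """True if the read overlaps snv_pos and carries the alternate base there."""
    offset = snv_pos - read_start
    return 0 <= offset < len(read_seq) and read_seq[offset] == alt_base

def select_reads(samples, snv_pos, alt_base):
    """samples maps a sample name to a list of (start, sequence) reads.
    Returns (sample, start, sequence) for every supporting read in any sample."""
    selected = []
    for name, reads in samples.items():
        for start, seq in reads:
            if read_supports_snv(start, seq, snv_pos, alt_base):
                selected.append((name, start, seq))
    return selected

def is_ubiquitous(samples, snv_pos, alt_base):
    """True if every sample has at least one read supporting the mutation."""
    supported = {name for name, _, _ in select_reads(samples, snv_pos, alt_base)}
    return supported == set(samples)

# Hypothetical reads from two tumour regions; the variant is a T at position 104.
samples = {
    "region_1": [(100, "ACGTACGT"), (104, "TCGT")],
    "region_2": [(98, "GGACGTTC"), (200, "AAAA")],
}
print(select_reads(samples, snv_pos=104, alt_base="T"))
```

Pooling reads across samples in this way increases the number of reads available for the subsequent assembly.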
- the reads selected at step 16 are used to assemble a sequence comprising the tumour-specific mutation and compatible with overlapping sequences of the selected one or more RNA sequence reads. This may comprise assembling one or more of the selected RNA sequence reads into the longest sequence comprising the tumour-specific mutation compatible with overlapping sequences of the selected one or more RNA sequence reads.
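A minimal sketch of such an assembly is given below. It assumes ungapped reads given as (reference start, sequence), all of which cover the mutation position (so the merged sequence is contiguous), and a non-empty read list. It merges reads greedily from longest to shortest and skips any read that disagrees with the sequence built so far; this greedy order is an illustrative simplification and is not guaranteed to yield the longest possible compatible sequence.

```python
def assemble(reads):
    """Merge mutually compatible reads into one sequence.
    reads: non-empty list of (start, sequence), all covering the mutation.
    Returns (start, sequence) of the merged sequence."""
    contig = {}  # reference position -> base
    for start, seq in sorted(reads, key=lambda r: -len(r[1])):
        bases = {start + i: base for i, base in enumerate(seq)}
        # Accept the read only if it agrees wherever it overlaps the contig.
        if all(contig.get(pos, base) == base for pos, base in bases.items()):
            contig.update(bases)
    first, last = min(contig), max(contig)
    return first, "".join(contig[pos] for pos in range(first, last + 1))

# Hypothetical reads covering position 104; the shortest read conflicts at 107.
reads = [(104, "TCGT"), (98, "GGACGTTC"), (102, "GTTCGA")]
print(assemble(reads))
```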
- an anchor sequence is extracted from the assembled sequence as a subsequence (i.e. a part of a sequence) that precedes the tumour-specific mutation and that only includes positions identified as part of exonic regions.
- positions of the assembled sequence may be identified (e.g. annotated) as intronic or exonic regions, by reference to a reference genome / transcriptome in which exonic and intronic regions are annotated.
- Extracting an anchor sequence may comprise determining whether the mutation is in a gene transcribed from the positive strand or the negative strand. Information on whether a gene is transcribed from the negative or the positive strand may be obtained from a genome / transcriptome reference.
- a mutation located in a genomic region annotated in a genome / transcriptome reference as located in a gene transcribed from the positive strand may be considered a mutation in a gene transcribed from the positive strand, and similarly for genes annotated on the negative strand.
- extracting an anchor sequence from the assembled sequence as a subsequence that precedes the tumour-specific mutation and that only includes positions identified as part of exonic regions may comprise extracting a sequence that is 5’ of the tumour-specific mutation (e.g. where the mutation is in a gene transcribed from the positive strand).
- extracting an anchor sequence from the assembled sequence may comprise obtaining the reverse complement of the assembled sequence, wherein the anchor sequence is 5’ of the tumour-specific mutation in the reverse complement sequence (e.g. where the mutation is in a gene transcribed from the negative strand).
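Anchor extraction with strand handling can be sketched as follows. The sketch assumes the assembled sequence is given in reference-genome orientation with its start coordinate, that exons are given as inclusive (start, end) intervals, and that the anchor is taken as the contiguous run of exonic positions immediately 5’ of the mutation; this last choice and the function names are assumptions for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def extract_anchor(contig_start, contig_seq, mut_pos, exons, strand):
    """Return the exonic sequence immediately 5' of the mutation, read in the
    direction of transcription ("+" or "-" strand)."""
    def is_exonic(pos):
        return any(start <= pos <= end for start, end in exons)

    contig_end = contig_start + len(contig_seq) - 1
    step = -1 if strand == "+" else 1  # 5' is to the left on "+", right on "-"
    positions = []
    pos = mut_pos + step
    while contig_start <= pos <= contig_end and is_exonic(pos):
        positions.append(pos)
        pos += step
    if not positions:
        return ""
    low, high = min(positions), max(positions)
    segment = contig_seq[low - contig_start:high - contig_start + 1]
    return segment if strand == "+" else reverse_complement(segment)

# Hypothetical assembled sequence spanning 98..107 with the mutation at 104.
print(extract_anchor(98, "GGACGTTCGA", 104, exons=[(100, 110)], strand="+"))
print(extract_anchor(98, "GGACGTTCGA", 104, exons=[(100, 110)], strand="-"))
```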
- a reading frame is identified for the assembled sequence.
- the identified reading frame is the reading frame of a transcript that satisfies at least the following conditions: (i) it overlaps the genomic position of the tumour-specific mutation, and (ii) the anchor sequence maps to the reference transcript or the reference transcript has a corresponding protein that a translation of the anchor sequence maps to.
- An anchor sequence may be considered to map to a reference sequence if it has a matching sequence of a minimum length and/or a number of mismatches below a predetermined number.
- Identifying a reading frame may comprise translating at least the anchor sequence using all 3 possible reading frames and identifying a protein corresponding to a reference transcript that a translated anchor sequence maps to, wherein the identified reading frame is the reading frame of the translated anchor that maps to the protein corresponding to a reference transcript.
- identifying a reading frame may comprise identifying a reference transcript that overlaps the genomic position of the tumour-specific mutation and that the anchor sequence maps to, and identifying the reading frame of the identified reference transcript.
- the anchor sequence may be matched to reference transcripts at the protein level (i.e. matching translated anchor sequences to protein sequences associated with reference transcripts) or at the cDNA level (i.e. matching cDNA anchor sequences to reference transcripts sequences).
- the identified reading frame is the reading frame of the mapping anchor sequence, and no separate further translation step may be needed.
- the identified reading frame is the reading frame of the peptide corresponding to the reference transcript and this can be used to translate the complete assembled sequence.
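Protein-level identification of the reading frame can be sketched as follows: the anchor is translated in all three frames with the standard genetic code, and the frame whose translation is found in the reference protein is retained. The sketch uses exact substring matching and a minimum matched length; the source also allows a number of mismatches, which is not implemented here. The function names and example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(seq, frame=0):
    """Translate a cDNA sequence from the given frame offset (0, 1 or 2)."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def find_reading_frame(anchor, reference_protein, min_length=4):
    """Return the frame whose translation of the anchor occurs in the
    reference protein, or None if no frame matches."""
    for frame in range(3):
        peptide = translate(anchor, frame)
        if (len(peptide) >= min_length and "*" not in peptide
                and peptide in reference_protein):
            return frame
    return None

# Hypothetical reference transcript and an anchor starting mid-codon.
cdna = "ATGAAAACCGCTTATATTGCGAAACAGCGT"
protein = translate(cdna)
print(find_reading_frame(cdna[4:27], protein))
```

The frame found for the anchor can then be applied to the complete assembled sequence, since the anchor precedes the mutation within it.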
- a neoantigen peptide that corresponds to the tumour-specific mutation is identified as a translation of the assembled sequence or of a part thereof (i.e. a part which contains the mutation or contains a sequence that results in a peptide that differs from the wild type sequence as a result of the presence of the mutation), where the translation uses the reading frame identified at step 22.
- the resulting peptide is considered as a candidate neoantigen.
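Selecting the parts of the translation that differ from the wild type can be sketched as follows. The sketch assumes the assembled sequence has already been translated in the identified reading frame and that the wild-type protein sequence is available; it keeps every window of the requested lengths that does not occur in the wild-type protein. The function name, the default lengths and the example sequences are assumptions.

```python
def candidate_peptides(mutant_translation, wild_type_protein,
                       lengths=(8, 9, 10, 11)):
    """Return windows of the mutant translation that are absent from the
    wild-type protein and do not span a stop codon ("*")."""
    candidates = []
    for k in lengths:
        for i in range(len(mutant_translation) - k + 1):
            peptide = mutant_translation[i:i + k]
            if "*" not in peptide and peptide not in wild_type_protein:
                candidates.append(peptide)
    return candidates

# Hypothetical wild-type protein and a translation carrying an A-to-V change.
wild_type = "MKTAYIAKQRLLE"
mutant = "TAYIVKQRL"
print(candidate_peptides(mutant, wild_type, lengths=(8,)))
```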
- the results of one or more of the preceding steps are provided to a user, for example through a user interface.
- the above methods find applications in the context of designing immunotherapies, particularly immunotherapies that use peptides or sequences encoding peptides associated with tumour-specific mutations to generate or promote an immune response. Indeed, such methods typically require the identification of peptides associated with the mutations for the purpose of manufacturing the immunotherapy and/or for the purpose of selecting peptides for inclusion in the immunotherapy based on any criteria that apply to the peptide sequence (such as e.g. predicted immunogenicity). Peptides that are predicted to be likely to be immunogenic are more promising candidates for inclusion in an immunotherapy, and it is therefore beneficial to identify neoantigen peptide sequences in order to test this even when the immunotherapy uses sequences encoding the peptides selected.
- cancer immunotherapies that target cancer-specific antigens (also referred to herein as “cancer neoantigens”, or simply “neoantigens”).
- the cancer neoantigens may be clonal neoantigens.
- methods of providing an immunotherapy for a subject comprising identifying one or more peptides associated with one or more tumour specific mutations, wherein the identifying is based on data from one or more samples from the subject and further is performed using a method as described herein. An example of such a method will be described by reference to Figure 2.
- one or more samples comprising tumour genetic material and one or more germline samples are obtained from a subject.
- the subject may be a subject that has been diagnosed as having cancer, and may be (but does not need to be) the same subject for which the immunotherapy is provided.
- a list of tumour-specific variants is obtained. This may comprise step 212’ of obtaining a list of tumour-specific variants from genomic sequence data and/or step 212” of obtaining a list of tumour-specific variants from RNA sequence data. Sequences comprising tumour-specific variants that are believed to potentially lead to the expression of a protein that is not present in a normal cell may be referred to as “candidate neoantigens”.
- tumour-specific variants may be filtered for variants that are likely to result in the expression of a protein that is not present in a normal cell, and therefore reference to tumour-specific variants may equally refer to candidate neoantigens unless context indicates otherwise.
- a list of candidate neoantigens may be obtained from genomic sequence data from the sample(s) using methods known in the art, for example as described in WO 2016/16174085, Landau et al. (2013), Lu et al. (2016), Leko et al. (2019), Hundal et al. (2019), and others.
- the list of candidate neoantigens may comprise a single neoantigen, or a plurality of neoantigens. Preferably, the list comprises a plurality of neoantigens.
- the neoantigens may be clonal neoantigens. Methods to identify clonal neoantigens are known in the art and include the methods described in WO 2016/16174085, Landau et al. (2013), Roth et al. (2014), McGranahan et al. (2016), and in WO 2022/207925.
- one or more candidate neoantigens may be identified from RNA sequence data from the sample(s) at step 212”, for example by identifying one or more RNA sequence reads that include a variant. Variants can be identified by comparison with an expected healthy sequence such as a reference genome or transcriptome or RNA/DNA sequence from a normal sample of the subject.
- step 212 may comprise optional step 212a”, where the RNA sequence content of the one or more samples comprising tumour genetic material and optionally the matched germline sample may be determined, for example by sequencing the RNA (or mRNA) in the sample using RNA sequencing.
- Alternative methods such as e.g.
- Step 212 may further comprise optional step 212b” of analysing the RNA sequence data to identify one or more mutations that are likely to be present in the tumour cells but not in non-cancerous cells. These represent tumour-specific mutations and may be used as candidate neoantigens. This may comprise the steps of aligning the RNA sequences from the one or more samples (i.e. the sample(s) comprising tumour genetic material and the germline sample(s), if available). This may further comprise identifying locations where the RNA sequence of the tumour differs from the germline sequence or can be assumed to differ from the germline sequence (e.g.
- step 212b may comprise aligning the RNA sequences from the one or more samples to a reference genome or transcriptome and identifying sequences that are not expected to be present in such a reference (e.g. novel transcripts, splicing variants, fusions, single nucleotide variants, indels). For example, a fusion or splicing variant may be identified if one or more reads align to non-contiguous sections of the reference transcriptome or genome, or fail to align to the reference genome or transcriptome.
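Identifying reads that align to non-contiguous sections of a reference can be sketched by parsing the CIGAR string of each aligned read, where an `N` operation marks a skipped reference region (such as an intron), and comparing the implied junctions with annotated ones. The read representation as (start, CIGAR), the coordinate convention and the function names are assumptions; in practice a dedicated alignment library would usually be used.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REFERENCE_CONSUMING = "MDN=X"  # operations that advance along the reference

def splice_junctions(read_start, cigar):
    """Return (first, last) reference positions of each region skipped by an
    N operation, in the same coordinate system as read_start."""
    junctions = []
    pos = read_start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            junctions.append((pos, pos + length - 1))
        if op in REFERENCE_CONSUMING:
            pos += length
    return junctions

def novel_junctions(reads, annotated):
    """Junctions seen in reads (list of (start, cigar)) but not annotated."""
    found = set()
    for start, cigar in reads:
        found.update(splice_junctions(start, cigar))
    return sorted(found - set(annotated))

# Hypothetical aligned reads and one annotated junction.
reads = [(100, "10M50N10M"), (200, "5S10M2I3M100N20M")]
print(novel_junctions(reads, annotated=[(110, 159)]))
```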
- Step 212’ may comprise optional step 212a’, where the sequence content of the one or more samples comprising tumour genetic material and optionally the matched germline sample may be determined, for example by sequencing the genomic material in the sample using one of whole exome sequencing or whole genome sequencing.
- the sequence data may be analysed to identify one or more mutations that are likely to be present in the tumour cells but not in non-cancerous cells. These represent tumour-specific mutations and may be used as candidate neoantigens. This may comprise the steps of aligning the sequences from the one or more samples (i.e.
- genomic sequence data for the sample comprising tumour genetic material may be used to determine the probability of the tumour-specific mutations being clonal, for example as described in WO 2022/207925.
- one or more candidate neoantigen peptides corresponding to the (optionally clonal) candidate neoantigens identified at step 212 are obtained, using a method as described herein. This may comprise, for each tumour-specific mutation identified using genomic sequence data (step 212’), analysing RNA sequence data obtained at step 212a” to identify one or more candidate neoantigen peptides for each tumour-specific variant (optionally restricted to clonal variants) that has support in the RNA sequence data (e.g. for which at least 1 RNA sequencing read containing the variant can be identified).
- tumour-specific variants are by definition candidate neoantigens as they lead to the expression of a protein or peptide not present in a normal cell, due to the presence of the variant.
- an immunotherapy that targets at least one (and optionally a plurality) of the candidate (optionally clonal) neoantigens is designed.
- Designing such an immunotherapy comprises identifying one or more candidate peptides for each of the candidate clonal neoantigens (step 216A).
- a plurality of peptides may be designed for at least one of the candidate clonal neoantigens, which differ in their lengths and/or the location of a sequence variation that characterises the neoantigen compared to the corresponding germline peptide.
- the one or more peptides identified are analysed to determine whether they have one or more properties, such as being likely to be immunogenic (e.g. as described in GB 2303920.9), being expressed in the subject’s tumour (e.g. as described in GB2213928.1), being expressed in reference samples or datasets, having a low similarity to a corresponding normal peptide, having a high likelihood of manufacturability (e.g. as described in application PCT/EP2023/055383), etc.
- one or more of the peptides are selected for production based on at least some of the results of step 216B.
- the selected peptides may be obtained.
- Peptides with selected sequences may be obtained using any method known in the art, but they are preferably obtained using chemical synthesis. Methods for obtaining sequences that encode peptides of interest are known. For example, tandem minigenes may be obtained which encode the selected one or more peptides.
- an immunotherapy may be produced using at least some of the one or more peptides or sequences encoding said peptides produced at step 218.
- the immunotherapy may comprise the one or more peptides (e.g. in the case of an immunogenic composition such as a synthetic long peptide vaccine), sequences encoding said peptides (e.g.
- the immunotherapy comprises cells that have been obtained using the selected peptides.
- Methods of producing an immunotherapy comprising cells that have been obtained using neoantigen peptides are known in the art, for example as described in WO 2022/207925, WO 2016/16174085, McGranahan et al. (2016), Lu et al. (2016), and Leko et al. (2019).
- the immunotherapy may be administered to a subject, which is preferably the subject from which the samples used to identify the neoantigens have been obtained.
- An example of producing an immunotherapy comprising a T cell population selectively enriched with T cells that recognise one or more neoantigens, preferably clonal neoantigens, will be described.
- a population of T cells may be obtained.
- the T cells may be obtained from the subject to be treated, but do not need to be.
- the T cells may be obtained from a tumour sample, from a blood sample, or from any other tissue sample.
- a population of antigen presenting cells (e.g. dendritic cells) may be obtained.
- a population of dendritic cells may be derived from mononuclear cells (e.g. peripheral blood mononuclear cells, PBMCs) from the subject to be treated.
- the population of dendritic cells may be pulsed with the selected peptides.
- the T cell population may be selectively expanded using the population of pulsed dendritic cells. Additional expansion factors such as e.g. cytokines or stimulating antibodies may be used.
- the disclosure provides a method of providing an immunotherapy for a subject that has been diagnosed as having cancer, the method comprising: identifying one or more neoantigen peptides associated with tumour-specific mutations present in the subject using a method described herein, and designing an immunotherapy that targets one or more of the candidate neoantigens.
- the method may have any one or more of the following features.
- the immunotherapy that targets the one or more of the neoantigens may be an immunogenic composition, a composition comprising immune cells or a therapeutic antibody.
- the immunogenic composition may comprise one or more of the neoantigen peptides (such as e.g. a neoantigen peptide or protein or a cell displaying the neoantigen).
- the composition comprising immune cells may comprise T cells, B cells and/or dendritic cells.
- the composition comprising a therapeutic antibody may comprise one or more antibodies that recognise at least one of the one or more of the neoantigen peptides.
- An antibody may be a monoclonal antibody.
- the immunogenic composition may comprise one or more nucleic acids encoding the one or more peptides, or a construct comprising such a nucleic acid.
- Designing an immunotherapy that targets one or more of the neoantigen peptides identified may comprise identifying one or more candidate peptides for each of the one or more neoantigen peptides targeted, each candidate peptide comprising at least a portion of a neoantigen peptide targeted.
- the method may further comprise obtaining the one or more candidate peptides.
- the method may further comprise testing the one or more candidate peptides for one or more further properties. Further testing may be performed in vitro or in silico.
- the one or more peptides may be tested for immunogenicity, propensity to be displayed by MHC molecules (optionally by specific MHC molecule alleles, where the alleles may have been chosen depending on the MHC alleles expressed by the subject), ability to elicit proliferation of a population of immune cells, etc.
- a method of designing or providing an immunotherapy may further comprise producing the immunotherapy.
- the method may further comprise obtaining a population of dendritic cells that has been pulsed with one or more of the candidate peptides.
- the immunotherapy may be a composition comprising T cells that recognise at least one of the one or more of the neoantigens identified.
- the composition may be enriched for T cells that target at least one of the one or more of the neoantigens identified.
- the method may comprise obtaining a population of T cells and expanding the population of T cells to increase the number or relative proportion of T cells that target at least one of the one or more of the neoantigens identified.
- the method may further comprise obtaining a T cell population.
- a T cell population may be isolated from the subject, for example from one or more tumour samples obtained from the subject, or from a peripheral blood sample or a sample from other tissues of the subject.
- the T cell population may comprise tumour infiltrating lymphocytes.
- T cells may be isolated using methods which are well known in the art. For example, T cells may be purified from single cell suspensions generated from samples on the basis of expression of CD3, CD4 or CD8. T cells may be enriched from samples by passage through a Ficoll-Paque gradient.
- the method may further comprise expanding the T cell population. For example, T cells may be expanded by ex vivo culture in conditions which are known to provide mitogenic stimuli for T cells.
- the T cells may be cultured with cytokines such as IL-2 or with mitogenic antibodies such as anti-CD3 and/or anti-CD28.
- the T cells may be co-cultured with antigen-presenting cells (APCs), which may have been irradiated.
- the APCs may be dendritic cells or B cells.
- the dendritic cells may have been pulsed with the candidate peptides (containing one or more of the identified neoantigens) as single stimulants or as pools of stimulating neoantigen peptides.
- Expansion of T cells may be performed using methods which are known in the art, including for example the use of artificial antigen presenting cells (aAPCs), which provide additional co-stimulatory signals, and autologous PBMCs which present appropriate peptides.
- Autologous PBMCs may be pulsed with peptides containing neoantigens as discussed herein as single stimulants, or alternatively as pools of stimulating neoantigens.
- Also described herein is a method for expanding a T cell population for use in the treatment of cancer in a subject, the method comprising: identifying one or more neoantigen peptides using a method as described herein; obtaining a T cell population comprising a T cell which is capable of specifically recognising one of the neoantigen peptides; and co-culturing the T cell population with a composition comprising the neoantigen peptide.
- the method may have one or more of the following features.
- the T cell population obtained may be assumed to comprise a T cell capable of specifically recognising one of the neoantigen peptides.
- the method preferably comprises identifying a plurality of neoantigen peptides.
- the neoantigen peptides may comprise one or more clonal neoantigens.
- the T cell population may comprise a plurality of T cells each of which is capable of specifically recognising one of the plurality of neoantigen peptides, and the method may comprise co-culturing the T cell population with a composition comprising the plurality of neoantigen peptides.
- the co-culture may result in expansion of the T cell population that specifically recognises one or more of the neoantigen peptides.
- the expansion may be performed by co-culture of a T cell with the one or more neoantigen peptides and an antigen presenting cell.
- the antigen presenting cell may be a dendritic cell.
- the expansion may be a selective expansion of T cells which are specific for the neoantigen peptides.
- the expansion may further comprise one or more non-selective expansion steps.
- the disclosure also provides a composition comprising a population of T cells obtained or obtainable by a method as described above.
- the disclosure also provides a T cell composition comprising a T cell population selectively enriched with T cells that recognise one or more neoantigens, preferably clonal neoantigens, wherein the T cell population has been selectively enriched using peptides that have been identified using any of the methods described herein.
- the expanded population of neoantigen-reactive T cells may have a higher activity than the population of T cells which have not been expanded, as measured by the response of the T cell population to restimulation with a neoantigen peptide.
- Activity may be measured by cytokine production, wherein a higher activity is a 5-10 fold or greater increase in activity.
- References to a plurality of neoantigens may refer to a plurality of peptides or proteins each associated with a different tumour-specific mutation that gives rise to a neoantigen.
- Said plurality may be from 2 to 250, from 3 to 200, from 4 to 150, or from 5 to 100 tumour-specific mutations, for example from 5 to 75 or from 10 to 50 tumour-specific mutations.
- Each tumour-specific mutation may be represented by one or more neoantigen peptides.
- a plurality of neoantigens may comprise a plurality of different peptides, some of which comprise a sequence that includes the same tumour-specific mutation (for example at different positions within the sequence of the peptide, or within peptides of varying lengths).
- the one or more selected peptides obtained at step 218 may comprise from 2 to multiple hundred peptides, such as e.g.
- the one or more selected peptides may comprise up to a maximum number of peptides that is set by the capacity of a synthesis process or a step thereof, such as for example the number of wells in a reaction plate used for a single synthesis run or a multiple thereof.
- the number of selected peptides may be set to a maximum of 96, 192, 288, or 384.
- the number of peptides selected may be set to a maximum corresponding to the number of tumour-specific mutations that give rise to a neoantigen identified in a subject, or to the number of different peptides of a predetermined length that comprise said tumour-specific mutations. For example, as many as 1,000 to 10,000 peptides comprising one or more coding mutations may be identified using the methods described herein.
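- The capacity-based cap on the number of selected peptides can be sketched as follows. This is a minimal illustration only: the function name `select_peptides`, the per-peptide score and the default of 96 wells per plate are assumptions made for the example, not part of the disclosed method.

```python
def select_peptides(scored_peptides, wells_per_plate=96, n_plates=1):
    """Keep the highest-scoring peptides, up to the synthesis capacity.

    scored_peptides: list of (peptide_sequence, score) tuples, where the
    score is any hypothetical ranking value (higher is better).
    """
    capacity = wells_per_plate * n_plates
    ranked = sorted(scored_peptides, key=lambda item: item[1], reverse=True)
    return [peptide for peptide, _ in ranked[:capacity]]
```

- With `n_plates` set to 1, 2, 3 or 4, the cap corresponds to the maxima of 96, 192, 288 or 384 peptides mentioned above.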
- a T cell population that is produced in accordance with the present disclosure will have an increased number or proportion of T cells that target one or more neoantigens that are represented in peptides identified using the methods described herein. That is to say, the composition of the T cell population will differ from that of a "native" T cell population (i.e. a population that has not undergone the expansion steps discussed herein), in that the percentage or proportion of T cells that target a neoantigen that is identified and produced as described herein will be increased.
- the T cell population according to the disclosure may have at least about 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or 100% T cells that target a neoantigen for which a peptide is identified and produced as described herein.
- the immunotherapies described herein may be used in the treatment of cancer.
- the disclosure also provides a method of treating cancer in a subject comprising administering an immunotherapeutic composition as described herein to the subject.
- the cancer may be ovarian cancer, breast cancer, endometrial cancer, kidney cancer (renal cell), lung cancer (small cell, non-small cell and mesothelioma), bladder cancer, gastric cancer, oesophageal cancer, colorectal cancer, cervical cancer, brain cancer (gliomas, astrocytomas, glioblastomas), melanoma, Merkel cell carcinoma, clear cell renal cell carcinoma (ccRCC), lymphoma, small bowel cancers (duodenal and jejunal), leukemia, pancreatic cancer, hepatobiliary tumours, germ cell cancers, prostate cancer, head and neck cancers, thyroid cancer and sarcomas.
- the cancer may be lung cancer, such as lung adenocarcinoma or lung squamous-cell carcinoma.
- the cancer may be melanoma.
- the cancer may be bladder cancer.
- the cancer may be head and neck cancer.
- the cancer may be selected from melanoma, Merkel cell carcinoma, renal cancer, non-small cell lung cancer (NSCLC), urothelial carcinoma of the bladder (BLAC), head and neck squamous cell carcinoma (HNSC) and microsatellite instability (MSI)-high cancers.
- the cancer is non-small cell lung cancer (NSCLC).
- the subject may be human.
- Treatment using the compositions and methods of the present disclosure may also encompass targeting circulating tumour cells and/or metastases derived from the tumour.
- Treatment according to the present disclosure targeting one or more neoantigens, preferably clonal neoantigens, may help prevent the evolution of therapy-resistant tumour cells, which may occur with standard approaches such as chemotherapy, radiotherapy, or non-specific immunotherapy.
- the methods and uses for treating cancer described herein may be performed in combination with additional cancer therapies.
- the immunotherapies (including but not limited to T cell compositions) described herein may be administered in combination with immune checkpoint intervention, co-stimulatory antibodies, chemotherapy and/or radiotherapy, targeted therapy or monoclonal antibody therapy.
- a T cell composition as described herein may be used in combination with an immunogenic composition as described herein.
- 'In combination' may refer to administration of the additional therapy (whether it is an immunotherapy or otherwise) before, at the same time as or after administration of the immunotherapy (e.g. T cell composition) as described herein.
- the invention also provides a method for producing an immunotherapeutic composition, the method comprising identifying one or more candidate neoantigen peptides associated with a tumour-specific mutation, predicting whether one or more candidate neoantigen peptides are likely to be immunogenic (based on e.g. likely MHC binding, likely MHC presentation, likely TCR binding, etc.), selecting one or more peptides from the candidate neoantigen peptides based on the predicting, and producing an immunotherapeutic composition that targets the selected neoantigens.
- Also described herein are compositions comprising a neoantigen peptide, a neoantigen peptide specific immune cell, or an antibody that recognises a neoantigen peptide, for use in the treatment or prevention of cancer in a subject, wherein said neoantigen peptide has been identified using the methods described herein.
- Also described herein is a composition comprising a neoantigen peptide, a neoantigen peptide specific immune cell, or an antibody that recognises a neoantigen peptide, wherein said neoantigen peptide has been produced using the methods described herein.
- the computing device 1 may be a smartphone, tablet, personal computer or other computing device.
- the computing device is configured to implement a method for identifying a neoantigen peptide and/or providing an immunotherapy, as described herein.
- the computing device 1 is configured to communicate with a remote computing device (not shown), which is itself configured to implement a method of identifying a neoantigen peptide and/or providing an immunotherapy, as described herein.
- Figure 6 illustrates a problem with obtaining peptide sequences from sequences associated with neojunctions.
- Figure 6A illustrates a simple sequence with no neojunction.
- a variant sequence can be assembled from RNA sequence reads that are compatible with the variant (illustrated as “Assembled Variant Sequence”). This may comprise the variant itself (labelled as sSNV - single nucleotide variant) and a single nucleotide polymorphism (SNP).
- the assembled variant sequence can be matched with a reference transcript cDNA, provided that one or two mismatches are allowed in a successful match.
- Example 1 - A new method to assemble peptides from splice region associated mutations
- Figure 7 illustrates different classes of neojunction variants considered in the methods of the present disclosure. These include: 5’ exonic splice region variants, splice donor variants, 5’ intronic splice region variants, 3’ intronic splice region variants, splice acceptor variants and 3’ exonic splice region variants.
- the figure shows two exons, labelled E1 (5’ exon) and E2 (3’ exon), and intronic sequence between the two exons.
- the dashed lines indicate the normal splicing event that would occur in the wild-type sequence, which would result in a mature transcript comprising only the sequence of E1 and E2.
- Splice acceptor variants are variants that occur at the splice acceptor site.
- the splice acceptor site is typically an AG sequence forming the last two bases at the 3’ end of the intron.
- 3’ exonic splice region variants are variants that occur in the 3’ exon (E2) and that affect the splicing event between exons E1 and E2.
- variants were categorised based on the Ensembl Variant Effect Predictor definitions, i.e.:
- 5’ exonic splice region variants are variants that change any of the last 3 bases of an exon (i.e. bases 1 to 3 located 5’ of a splice donor site),
- splice donor variants are variants that change the 2 base region at the 5’ end of an intron,
- 5’ intronic splice region variants are variants that change any of bases 3 to 8 counting from the 5’ end of an intron (i.e. bases within 6 bases of a splice donor site, in the intron),
- 3’ intronic splice region variants are variants that change any of bases 3 to 8 counting from the 3’ end of an intron (i.e. bases within 6 bases of a splice acceptor site, in the intron),
- splice acceptor variants are variants that change the 2 base region at the 3’ end of an intron, and
- 3’ exonic splice region variants are variants that change any of the first 3 bases of an exon (i.e. bases 1 to 3 located 3’ of a splice acceptor site).
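- The positional definitions above can be expressed as a small classifier. This is a sketch under stated assumptions: the function name and the returned labels are hypothetical (they are not VEP's own consequence terms), coordinates are 1-based and inclusive, the gene is on the + strand, and the intron is long enough that its 5’ and 3’ splice regions do not overlap.

```python
def classify_splice_variant(pos, intron_start, intron_end):
    """Classify a position relative to one intron on the + strand.

    pos, intron_start, intron_end: 1-based inclusive genomic coordinates.
    Returns a hypothetical category label, or None if the position lies
    outside every splice-associated window.
    """
    from_5p = pos - intron_start + 1  # 1 = first base of the intron
    from_3p = intron_end - pos + 1    # 1 = last base of the intron
    if intron_start <= pos <= intron_end:
        if from_5p <= 2:
            return "splice_donor"
        if from_3p <= 2:
            return "splice_acceptor"
        if from_5p <= 8:
            return "5p_intronic_splice_region"
        if from_3p <= 8:
            return "3p_intronic_splice_region"
        return None
    if intron_start - 3 <= pos < intron_start:
        return "5p_exonic_splice_region"  # last 3 bases of the upstream exon
    if intron_end < pos <= intron_end + 3:
        return "3p_exonic_splice_region"  # first 3 bases of the downstream exon
    return None
```

- For a gene on the - strand, the same logic would apply after swapping the roles of the intron start and end coordinates.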
- the method assembles a sequence that contains the variant of interest from RNA sequence reads, then translates this into all 3 possible reading frames, identifies an anchor sequence that comprises the coding part of the assembled sequence that precedes the variants (or its corresponding translated sequences), finds reference transcripts that overlap the assembled sequence and their protein sequences, and uses the reading frame of the reference transcript (or corresponding protein) that best matches the anchor (or conversely, the translated anchor that best matches the protein sequence of a reference transcript).
- the anchor sequences are obtained as protein sequences (3 possible anchors per assembled sequence, one for each possible reading frame) and compared to protein sequences corresponding to reference transcripts. In either case, the method captures any polymorphisms that are part of the assembled sequence because the sequence is translated directly from the assembled sequence.
- Figures 8A and 8B illustrate the use of a method of the disclosure to assemble a peptide associated with a variant of interest that is a splice donor mutation on the + strand.
- the method starts from aligned RNA sequences, illustrated here as a BAM file, and information identifying one or more variants comprising for each variant: the variant sequence (e.g. in the illustrated embodiment the variant sequence is T>G or G) and the variant genomic coordinates in the same reference genome used to create the aligned RNA sequences.
- All RNA sequence reads comprising a variant of interest are extracted from the BAM file and assembled into an assembled sequence.
- These are sequence reads that overlap with the genomic location of the variant and that include the variant sequence.
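- The read selection criterion (reads that overlap the genomic location of the variant and include the variant sequence) can be sketched as follows. This is a simplified illustration: reads are represented as hypothetical `(start, sequence)` tuples with ungapped alignments and a single-base variant, whereas reads taken from a real BAM file would also require handling of spliced alignments, insertions and deletions.

```python
def reads_supporting_variant(reads, variant_pos, alt_base):
    """Return reads that cover variant_pos and carry alt_base there.

    reads: list of (start, sequence) tuples, where start is the genomic
    coordinate of the first base and the alignment has no gaps.
    """
    supporting = []
    for start, seq in reads:
        offset = variant_pos - start  # index of the variant within the read
        if 0 <= offset < len(seq) and seq[offset] == alt_base:
            supporting.append((start, seq))
    return supporting
```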
- the assembled sequence is then translated into all 3 possible frames (labelled as Frame 1, Frame 2, Frame 3) to obtain 3 translated sequences.
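- Translation of an assembled sequence into the 3 forward reading frames can be sketched as below. The sketch assumes the standard genetic code and a DNA alphabet; the function names are illustrative, and codons containing ambiguous bases are reported as "X".

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame=0):
    """Translate seq starting at offset frame (0, 1 or 2); '*' marks a stop."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(frame, len(seq) - 2, 3)
    )


def three_frame_translations(seq):
    """Return the translations for Frame 1, Frame 2 and Frame 3."""
    return [translate(seq, frame) for frame in range(3)]
```

- Stop codons are kept in the output rather than truncating the translation, so that a later step can decide how to treat frames containing early stops.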
- a corresponding set of anchor sequences are obtained by trimming each of the 3 translated sequences to exclude all amino acids from the mutated codon (amino acid that is underlined in the translated sequences) onwards (in the 5’-3’ direction), and any remaining intronic amino acids between the mutated codon and the nearest upstream exon (i.e. intronic amino acids that are 5’ of the mutated position). In this particular case it is not necessary to remove additional intronic amino acids because the mutation occurs in the first codon of the retained intron.
- the variant information is also used to identify overlapping reference transcripts, i.e. all transcripts from a reference transcriptome that overlap with the genomic coordinates of the variant of interest. These reference transcripts are associated with corresponding reference protein sequences.
- the set of 3 anchors are compared with the reference protein sequences to identify the anchor that best matches to any of the reference protein sequences. A best match is identified as that with the highest sequence identity, allowing for a predetermined number of mismatches to allow for polymorphisms.
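- The comparison of the 3 anchors against the reference protein sequences can be sketched as an ungapped sliding-window search. This is one possible reading of the matching step rather than the disclosed implementation: the function name, the `min_len` cut-off and the tie-breaking rule (the first best hit wins) are assumptions.

```python
def best_anchor(anchors, reference_proteins, max_mismatches=1, min_len=4):
    """Find the anchor (one per reading frame) that best matches a reference.

    anchors: list of 3 translated anchor sequences, indexed by frame.
    reference_proteins: dict mapping a protein identifier to its sequence.
    Returns (identity, frame_index, protein_id), or None if nothing matches.
    """
    best = None
    for frame, anchor in enumerate(anchors):
        if len(anchor) < min_len:
            continue  # too short to be matched reliably
        for prot_id, protein in reference_proteins.items():
            for start in range(len(protein) - len(anchor) + 1):
                window = protein[start:start + len(anchor)]
                mismatches = sum(a != b for a, b in zip(anchor, window))
                if mismatches > max_mismatches:
                    continue
                identity = 1 - mismatches / len(anchor)
                if best is None or identity > best[0]:
                    best = (identity, frame, prot_id)
    return best
```

- Setting `max_mismatches` to 0 requires an exact match, while a value of 1 or 2 tolerates polymorphisms captured in the assembled sequence.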
- the full translated sequence that corresponds to the best matching anchor sequence is identified as the neoantigen peptide associated with the variant of interest. In the illustrated embodiment it can be seen that this neoantigen includes a series of intronic amino acids that would not be present in the wild-type protein.
- Figures 8C and 8D illustrate the use of a method of the disclosure to assemble a peptide associated with a variant of interest that is a 5’ intronic splice region mutation on the + strand.
- the method starts from aligned RNA sequences, illustrated here as a BAM file, and information identifying one or more variants comprising, for each variant: the variant sequence (e.g. in the illustrated embodiment the variant sequence is A) and the variant genomic coordinates in the same reference genome used to create the aligned RNA sequences. All RNA sequence reads comprising a variant of interest (illustrated as a star in the sequence reads aligned to the gene model on Figure 8C) are extracted from the BAM file and assembled into an assembled sequence.
- These are sequence reads that overlap with the genomic location of the variant and that include the variant sequence.
- the assembled sequence is then translated into all 3 possible frames (labelled as Frame 1, Frame 2, Frame 3) to obtain 3 translated sequences.
- a corresponding set of anchor sequences are obtained by trimming each of the 3 translated sequences to exclude all amino acids from the mutated codon (amino acid underlined in the translated sequences) onwards (in the 5’-3’ direction), and any intronic amino acids between the mutated codon and the nearest upstream exon (i.e. intronic amino acids that are 5’ of the mutated position). Any amino acid that is encoded by a codon that comprises at least one intronic base is considered an intronic amino acid.
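- The trimming rule can be sketched as follows: an amino acid is kept in the anchor only if its whole codon lies upstream of the variant and within the upstream exon. The parameter names are hypothetical, and the sketch assumes that the assembled sequence starts in the upstream exon, that positions are 0-based indices into the assembled sequence, and that the variant is on the + strand.

```python
def anchor_for_frame(translated, frame, variant_pos, upstream_exon_len):
    """Trim one translated frame down to its anchor sequence.

    translated: translation of the assembled sequence in this frame.
    frame: offset (0, 1 or 2) of the first codon in the assembled sequence.
    variant_pos: 0-based index of the variant in the assembled sequence.
    upstream_exon_len: number of exonic bases at the start of the
        assembled sequence (bases from this index onwards are intronic).
    """
    # A codon is excluded if it contains the variant or any intronic base.
    limit = min(variant_pos, upstream_exon_len)
    n_codons = max(0, (limit - frame) // 3)
    return translated[:n_codons]
```

- When the variant lies within the exon, `variant_pos` is the limiting value and no intronic amino acids need to be removed; when it lies in the intron, the exon boundary is the limiting value.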
- Translated anchor sequences that contain early stop codons may either not map to a reference protein sequence or may be mapped only to the stop codon, in which case they will typically be too short to map to a reference protein sequence and will therefore be excluded at the mapping stage.
- the variant information is also used to identify overlapping reference transcripts, i.e. all transcripts from a reference transcriptome that overlap with the genomic coordinates of the variant of interest. These reference transcripts are associated with corresponding reference protein sequences.
- the set of 3 anchors are compared with the reference protein sequences to identify the anchor that best matches to any of the reference protein sequences. A best match is identified as that with the highest sequence identity, allowing for a predetermined number of mismatches to allow for polymorphisms.
- the full translated sequence that corresponds to the best matching anchor sequence is identified as the neoantigen peptide associated with the variant of interest.
- this neoantigen includes a series of intronic amino acids that would not be present in the wild-type protein.
- Figures 8E and 8F illustrate the use of a method of the disclosure to assemble a peptide associated with a variant of interest that is a 5’ exonic splice region mutation on the + strand.
- the method starts from aligned RNA sequences, illustrated here as a BAM file, and information identifying one or more variants comprising, for each variant: the variant sequence (e.g. in the illustrated embodiment the variant sequence is T) and the variant genomic coordinates in the same reference genome used to create the aligned RNA sequences. All RNA sequence reads comprising a variant of interest (illustrated as a star in the sequence reads aligned to the gene model on Figure 8E) are extracted from the BAM file and assembled into an assembled sequence.
- These are sequence reads that overlap with the genomic location of the variant and that include the variant sequence.
- the assembled sequence is then translated into all 3 possible frames (labelled as Frame 1, Frame 2, Frame 3) to obtain 3 translated sequences.
- a corresponding set of anchor sequences are obtained by trimming each of the 3 translated sequences to exclude all amino acids from the mutated codon (amino acid underlined in the translated sequences) onwards (in the 5’-3’ direction). Because the variant of interest is within the exon, there are no intronic amino acids between the mutated codon and the nearest upstream exonic amino acid.
- the variant information is also used to identify overlapping reference transcripts, i.e. all transcripts from a reference transcriptome that overlap with the genomic coordinates of the variant of interest.
- the set of 3 anchors are compared with the reference protein sequences to identify the anchor that best matches to any of the reference protein sequences. A best match is identified as that with the highest sequence identity, allowing for a predetermined number of mismatches to allow for polymorphisms.
- the full translated sequence that corresponds to the best matching anchor sequence is identified as the neoantigen peptide associated with the variant of interest. In the illustrated embodiment it can be seen that this neoantigen includes a series of intronic amino acids that would not be present in the wild-type protein.
- the method starts from aligned RNA sequences, illustrated here as a BAM file, and information identifying one or more variants comprising, for each variant: the variant sequence (e.g. in the illustrated embodiment the variant sequence is A) and the variant genomic coordinates in the same reference genome used to create the aligned RNA sequences.
- All RNA sequence reads comprising a variant of interest are extracted from the BAM file and assembled into an assembled sequence. These are sequence reads that overlap with the genomic location of the variant and that include the variant sequence. The reverse complement of the assembled sequence is then obtained.
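- Obtaining the reverse complement of an assembled sequence can be sketched in a few lines. The function name is illustrative, and the sketch handles only the bases A, C, G, T and N.

```python
# Translation table mapping each base to its complement (both cases).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]
```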
- This is then translated into all 3 possible frames (labelled as Frame 1, Frame 2, Frame 3) to obtain 3 translated sequences.
- a corresponding set of anchor sequences are obtained by trimming each of the 3 translated sequences to exclude all amino acids from the mutated codon (amino acid underlined in the translated sequences) onwards (in the 5’-3’ direction). Because the variant of interest is in the first codon of the intron, there are no intronic amino acids between the mutated codon and the nearest upstream exonic amino acid.
- the variant information is also used to identify overlapping reference transcripts, i.e. all transcripts from a reference transcriptome that overlap with the genomic coordinates of the variant of interest. These reference transcripts are associated with corresponding reference protein sequences.
- the set of 3 anchors are compared with the reference protein sequences to identify the anchor that best matches to any of the reference protein sequences.
- a best match is identified as that with the highest sequence identity, allowing for a predetermined number of mismatches to allow for polymorphisms.
- the full translated sequence that corresponds to the best matching anchor sequence is identified as the neoantigen peptide associated with the variant of interest. In the illustrated embodiment it can be seen that this neoantigen includes a series of intronic amino acids that would not be present in the wild-type protein.
- the anchor sequences also have a corresponding nucleic acid sequence in the assembled sequence (illustrated as a shaded area).
- the nucleic acid sequence corresponding to the anchor sequence is aligned with reference transcripts and the assembled sequence is translated using the translation frame of the reference transcript that best matches the nucleic acid sequence corresponding to the anchor sequence.
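- The nucleic acid level alternative can be sketched as follows: the anchor nucleotides are located in a reference transcript cDNA, and the position of the match relative to the start of the coding sequence gives the translation frame for the assembled sequence. The function and parameter names are hypothetical, and the sketch uses an exact match for simplicity, whereas a tolerant alignment could be substituted to allow for polymorphisms.

```python
def frame_from_reference(assembled, anchor_start, anchor_end, cdna, cds_start):
    """Infer the translation frame of an assembled sequence.

    assembled: assembled nucleotide sequence containing the variant.
    anchor_start, anchor_end: 0-based half-open interval of the anchor
        nucleotides within the assembled sequence.
    cdna: reference transcript cDNA sequence.
    cds_start: 0-based index in cdna of the first base of the start codon.
    Returns the offset (0, 1 or 2) in the assembled sequence at which a
    codon begins, or None if the anchor is not found in the cDNA.
    """
    anchor_nt = assembled[anchor_start:anchor_end]
    match = cdna.find(anchor_nt)
    if match == -1:
        return None
    # Assembled index j maps to cDNA index match + (j - anchor_start);
    # a codon starts wherever (cDNA index - cds_start) is a multiple of 3.
    return (anchor_start + cds_start - match) % 3
```

- The returned offset can then be used as the frame in which the whole assembled sequence, including any retained intronic sequence, is translated.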
- Aligned sequences typically include, for each of a plurality of reads, at least: (i) the sequence of the read, and (ii) the genomic location to which the read is best aligned, including strand information (+/- strand) and the genomic coordinate of the start of the alignment.
- Aligned sequences can be provided in the form of a BAM or SAM file. As mentioned above, using these file formats, reads are typically represented on the forward (+) genomic strand. Therefore, sequences of reads that are mapped to the reverse (-) genomic strand are provided reverse complemented.
- the methods described in Example 1 were tested on a synthetic data set in which reads containing exonic and intronic sequence were generated and variants were randomly spiked into those reads.
- an exact match between the reference sequence and the anchor sequence was required in order to identify a reading frame.
- more permissive alternatives are possible, particularly when mapping at the reference transcript level.
- 1 or 2 mismatches in any matching sequence could be tolerated for example to allow for the presence of SNPs.
- the number of mismatches allowed is determined depending on the minimum length of matching sequence. For example, 1 or 2 mismatches may be allowed in a matching sequence of 12 or more bases, whereas 0 or 1 mismatches may be allowed in a matching sequence of 4 (or 5, 6) or more amino acids.
- the methods described in Example 1 were tested on data from a breast cancer cell line (HCC1395).
- Variants were then filtered for variants that were annotated as splice region variants or splice donor/acceptor variants (splice_donor_variant, splice_acceptor_variant, splice_region_variant) by the Ensembl variant effect predictor (VEP, www.ensembl.org/info/docs/tools/vep/index.html).
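- The filtering step can be sketched as follows, using the three consequence terms listed above. The input representation is an assumption made for illustration: each variant is a dictionary with a hypothetical `consequence` field in which multiple VEP consequence terms are joined with "&", and the variant identifiers in the usage example are placeholders.

```python
SPLICE_TERMS = {
    "splice_donor_variant",
    "splice_acceptor_variant",
    "splice_region_variant",
}


def is_splice_variant(consequence):
    """True if any '&'-separated consequence term is splice-associated."""
    return bool(SPLICE_TERMS & set(consequence.split("&")))


def filter_splice_variants(variants):
    """Keep variants annotated with at least one splice-associated term."""
    return [v for v in variants if is_splice_variant(v["consequence"])]


# Usage with placeholder variants.
example = [
    {"id": "var1", "consequence": "missense_variant"},
    {"id": "var2", "consequence": "splice_donor_variant"},
    {"id": "var3", "consequence": "splice_region_variant&intron_variant"},
]
kept_ids = [v["id"] for v in filter_splice_variants(example)]
```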
- HCC1395 is an adherent cell line. It was sub-cultured at a 1:2 ratio when it reached approximately 75-80% confluence. Cells were detached using TrypLE Express Enzyme (Gibco) and grown in RPMI complete media containing the following reagents: RPMI 1640 (containing 2 g/L glucose) + 10 mM HEPES + 1 mM sodium pyruvate + 2.5 g/L glucose + 10% FCS + 1X Pen/Strep.
- Figures 10 and 11 show examples of results obtained using methods of the disclosure to identify mutant peptides from splice variants in cell line data (HCC1395).
- Figure 10 shows a specific example of a splice donor variant (chr12:21807476_C>A).
- Figure 11 shows an example of a 5’ Exonic Splice Region Variant (chr4:119661818_C>T).
- the data on Figure 11 show, from the top and as a function of genomic coordinates: read coverage, a splice junction track, the aligned reads pile up, the transcript sequence (ctDNA), the sequence of the corresponding protein in all 3 possible reading frames (with the selected reading frame highlighted), the position and amino acid sequence of the genes known to be located at these coordinates, and the sequence of a neoantigen peptide (in this case a 30 amino acid peptide with the neojunction located approximately in the middle of the peptide, i.e. starting at position 16; other lengths of peptides and locations of the neojunction are possible).
- In Example 2, the inventors extended the approach described in Example 1 to situations where a plurality of transcriptomic datasets are available for a particular patient, and demonstrated the application of this approach to patient data.
- RNA-seq data and whole exome sequencing data for human tumour samples were obtained for all patients.
- The WES data were analysed through an in-house sequence analysis pipeline comprising alignment (using BWA, Li and Durbin, 2009), variant calling (using standard tools such as e.g. Strelka2, Kim et al., 2018 and VarDict, Lai et al. 2016) by comparison to the corresponding germline sequence (HCC1395BL), and variant annotation, which associates variants with transcripts.
- Variants were then filtered for variants that were annotated as splice region variants or splice donor/acceptor variants (splice_donor_variant, splice_acceptor_variant, splice_region_variant) by the Ensembl variant effect predictor (VEP, www.ensembl.org/info/docs/tools/vep/index.html).
- Figure 12 illustrates a method to identify neoantigen peptides from sequence data from a plurality of samples (labelled R1, R2).
- The method considers the union of reads from all samples over a locus when assembling a variant sequence.
- All RNA sequencing reads obtained from each of the samples are pooled together and aligned.
- A single BAM file combining all of this sequence data may be obtained.
- The method also preserves the information about which sample each read came from, which enables it to determine whether there are reads supporting the mutation in each sample. This can in turn be used to determine whether the mutation is ubiquitous (present in all of the plurality of samples).
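The ubiquity check on pooled reads can be sketched as follows. This is a minimal illustration, assuming each pooled read is represented as a dictionary that records its sample of origin and whether it supports the mutation; the record layout, function names and the `min_reads` threshold are hypothetical and not taken from the disclosure.

```python
def supporting_reads_per_sample(reads, samples):
    """Count mutation-supporting reads for each sample in the pooled set."""
    counts = {sample: 0 for sample in samples}
    for read in reads:
        if read["supports_mutation"]:
            counts[read["sample"]] += 1
    return counts


def is_ubiquitous(reads, samples, min_reads=1):
    """Return True if every sample has at least `min_reads` supporting reads."""
    counts = supporting_reads_per_sample(reads, samples)
    return all(counts[sample] >= min_reads for sample in samples)


# Pooled reads from two samples (labelled R1 and R2 as in Figure 12); toy data.
pooled = [
    {"sample": "R1", "supports_mutation": True},
    {"sample": "R1", "supports_mutation": False},
    {"sample": "R2", "supports_mutation": True},
]
ubiquitous = is_ubiquitous(pooled, samples=["R1", "R2"])
only_r1 = is_ubiquitous(pooled[:2], samples=["R1", "R2"])
```

Keeping the sample label on each read is what allows the pooled data to be used for assembly while still answering the per-sample question.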
- The sequencing reads are then grouped based on the distinct alleles that are present in the reads.
- In the example shown in Figure 12, a single exonic variant is present in the region considered and therefore reads are grouped in two groups: a group of reads comprising the variant allele, and a group of reads comprising the normal (wild type) allele.
- More generally, a plurality of somatic variants may be present, each of which may be supported by a plurality of reads.
- Reads that are compatible with each variant may be analysed together, i.e. reads supporting the particular variant are processed together by assembling a variant sequence. The rest of the method proceeds as explained above for each particular variant.
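The grouping of reads by allele can be sketched as follows for a single-nucleotide variant. This is a simplified illustration that assumes ungapped alignments (no indels or spliced reads), with each read given as a start coordinate and a sequence; a real implementation would need to handle spliced RNA-seq alignments. All names and the toy reads are hypothetical.

```python
def group_reads_by_allele(reads, position, variant_base, reference_base):
    """Split reads covering `position` into variant and wild-type groups.

    Each read is a dict with "name", "start" (coordinate of its first base)
    and "seq". Reads that do not cover the position, or that carry a third
    base, are left out of both groups.
    """
    groups = {"variant": [], "wild_type": []}
    for read in reads:
        offset = position - read["start"]
        if 0 <= offset < len(read["seq"]):
            base = read["seq"][offset]
            if base == variant_base:
                groups["variant"].append(read["name"])
            elif base == reference_base:
                groups["wild_type"].append(read["name"])
    return groups


# Toy reads around a hypothetical C>T variant at coordinate 103.
reads = [
    {"name": "r1", "start": 100, "seq": "ACGTA"},  # base at 103 is T
    {"name": "r2", "start": 102, "seq": "GCAAC"},  # base at 103 is C
    {"name": "r3", "start": 90, "seq": "TTTT"},    # does not cover 103
]
groups = group_reads_by_allele(reads, 103, variant_base="T", reference_base="C")
```

Only the reads in the `variant` group would then be passed on to the assembly of the variant sequence.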
- Figure 13A shows results obtained using methods of the disclosure to identify mutant peptides from tumour samples.
- Each bar of the bar chart represents a different cancer patient, and the height of the bar indicates the number of splice region associated variants for which peptides could be identified for each patient in each category of variant (in each bar, from top to bottom: exonic variant, splice donor variant, intronic variant).
- The bottom plot shows the same information filtered for variants that are ubiquitously expressed.
- The data show that many of the patients in the cohort have expressed splice region associated variants for which the methods of the present disclosure could identify peptides. These each represent candidate patient-specific neoantigens for immunotherapy.
- Figure 13B shows, for each class of 5’ splice variants analysed, the distribution of variant allele fractions (VAF, labelled as total VAF because it is obtained by merging all variant and reference reads from a plurality of samples for the same patient) of splice variants identified across the entire cohort.
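The total VAF can be sketched as follows. This is a minimal illustration, assuming that merging means summing variant-supporting and reference-supporting read counts over all samples of a patient before taking the fraction; the function name, the count layout and the numbers are hypothetical.

```python
def total_vaf(counts_per_sample):
    """Compute a pooled variant allele fraction across samples.

    `counts_per_sample` maps a sample label to a dict with "alt"
    (variant-supporting reads) and "ref" (reference-supporting reads).
    Returns None when no read covers the variant.
    """
    alt = sum(counts["alt"] for counts in counts_per_sample.values())
    ref = sum(counts["ref"] for counts in counts_per_sample.values())
    depth = alt + ref
    return alt / depth if depth else None


# Toy counts for two samples from the same patient.
counts = {
    "R1": {"alt": 3, "ref": 7},
    "R2": {"alt": 1, "ref": 9},
}
vaf = total_vaf(counts)  # (3 + 1) / (3 + 7 + 1 + 9)
```

Note that this pooled fraction weights each sample by its read depth, so it generally differs from the mean of the per-sample VAFs.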
- Figure 13C shows an example of a ubiquitously expressed splice donor variant identified in a patient (chr5:140041594_T>C). This shows a retained intron (due to a mutation in a splice donor site) followed by a novel splice donor, after which the sequence carries on to the next exon. Because the length of the retained intron was not divisible by 3, the retained intron led to a shift in reading frame, so that the start of the next exon was translated in the shifted frame until a premature stop codon was reached. This type of variant could not have been identified without an assembly-based approach as described herein.
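The effect of a retained intronic segment on the reading frame can be sketched as follows. This is a toy illustration using the standard genetic code: the sequences are invented, are not those of the variant in Figure 13C, and the function name is hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_until_stop(sequence):
    """Translate from the first base, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(sequence) - 2, 3):
        amino_acid = CODON_TABLE[sequence[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Invented sequences: two exons and a 5-nucleotide retained segment.
exon_1 = "ATGGCA"
exon_2 = "GTGACCAAA"
retained = "GTAAG"  # length 5, not divisible by 3

normal = translate_until_stop(exon_1 + exon_2)
shifted = translate_until_stop(exon_1 + retained + exon_2)
```

In this toy case the normal transcript translates to `MAVTK`, whereas the transcript with the retained segment translates to `MAVR`: the 5-nucleotide insertion shifts the frame so that the start of the second exon is read out of frame and a stop codon (`TGA`) is reached after one shifted codon.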
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Biotechnology (AREA)
- Medical Informatics (AREA)
- Biophysics (AREA)
- Theoretical Computer Science (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Bioinformatics & Computational Biology (AREA)
- Chemical & Material Sciences (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Analytical Chemistry (AREA)
- Molecular Biology (AREA)
- Genetics & Genomics (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention discloses methods for identifying a neoantigen peptide associated with a tumour-specific mutation. The method comprises selecting, from RNA sequence data from one or more samples, one or more RNA sequence reads that contain the tumour-specific mutation; assembling a sequence that comprises the tumour-specific mutation and is compatible with overlapping sequences of the selected RNA sequence read(s); extracting an anchor sequence from the assembled sequence as a subsequence that precedes the tumour-specific mutation and comprises only positions identified as part of exonic regions; and identifying a reading frame as the reading frame of a reference transcript: (i) that overlaps the genomic position of the tumour-specific mutation, and (ii) which the anchor sequence matches, or which has a corresponding protein that a translation of the anchor sequence matches. The neoantigen peptide is a translation of at least part of the assembled sequence using the identified reading frame. Related methods, systems and products are also disclosed.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB2308965.9 | 2023-06-15 | | |
| GB202308965 | 2023-06-15 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024256485A1 (fr) | 2024-12-19 |
Family
ID=91581985
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/EP2024/066271 Pending WO2024256485A1 (fr) | 2023-06-15 | 2024-06-12 | Identification de peptides néo-antigènes |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2024256485A1 (fr) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| GB2213928A (en) | 1987-12-14 | 1989-08-23 | Shimadzu Corp | Wavelength scanning spectrophotometer |
| GB2303920A (en) | 1994-04-25 | 1997-03-05 | Sensor Systems | Piezoelectric sensors |
| WO2016174085A1 (fr) | 2015-04-27 | 2016-11-03 | Cancer Research Technology Limited | Méthode de traitement du cancer |
| WO2022207925A1 (fr) | 2021-04-01 | 2022-10-06 | Achilles Therapeutics Uk Limited | Identification de néo-antigènes clonaux et leurs utilisations |
| US20230091256A1 (en) * | 2020-02-28 | 2023-03-23 | Curevac Netherlands B.V. | Hidden Frame Neoantigens |
-
2024
- 2024-06-12 WO PCT/EP2024/066271 patent/WO2024256485A1/fr active Pending
Patent Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| GB2213928A (en) | 1987-12-14 | 1989-08-23 | Shimadzu Corp | Wavelength scanning spectrophotometer |
| GB2303920A (en) | 1994-04-25 | 1997-03-05 | Sensor Systems | Piezoelectric sensors |
| WO2016174085A1 (fr) | 2015-04-27 | 2016-11-03 | Cancer Research Technology Limited | Méthode de traitement du cancer |
| US20230091256A1 (en) * | 2020-02-28 | 2023-03-23 | Curevac Netherlands B.V. | Hidden Frame Neoantigens |
| WO2022207925A1 (fr) | 2021-04-01 | 2022-10-06 | Achilles Therapeutics Uk Limited | Identification de néo-antigènes clonaux et leurs utilisations |
| US20230071113A1 (en) * | 2021-04-01 | 2023-03-09 | Achilles Therapeutics Uk Limited | Identification of clonal neoantigens and uses thereof |
Non-Patent Citations (23)
| Title |
|---|
| BULIK-SULLIVAN B, BUSBY J, PALMER CD, DAVIS MJ, MURPHY T, CLARK A, BUSBY M, DUKE F, YANG A, YOUNG L: "Deep learning using tumor HLA peptide mass spectrometry datasets improves neoantigen identification", NAT BIOTECHNOL, 17 December 2018 (2018-12-17) |
| CARTER SL, CIBULSKIS K, HELMAN E, MCKENNA A, SHEN H, ZACK T, LAIRD PW, ONOFRIO RC, WINCKLER W, WEIR BA: "Absolute quantification of somatic DNA alterations in human cancer", NAT BIOTECHNOL, vol. 30, no. 5, May 2012 (2012-05-01), pages 413 - 21, XP055563480, DOI: 10.1038/nbt.2203 |
| DOBIN A, DAVIS CA, SCHLESINGER F, DRENKOW J, ZALESKI C, JHA S, BATUT P, CHAISSON M, GINGERAS TR: "STAR: ultrafast universal RNA-seq aligner", BIOINFORMATICS, vol. 29, no. 1, 1 January 2013 (2013-01-01), pages 15 - 21, XP055500895, DOI: 10.1093/bioinformatics/bts635 |
| EWING A, HOULAHAN K, HU Y ET AL.: "Combining tumor genome simulation with crowdsourcing to benchmark somatic single-nucleotide-variant detection", NAT METHODS, vol. 12, 2015, pages 623 - 630, XP093155546, DOI: 10.1038/nmeth.3407 |
| HUNDAL J, KIWALA S, FENG YY, LIU CJ, GOVINDAN R, CHAPMAN WC, UPPALURI R, SWAMIDASS SJ, GRIFFITH OL, MARDIS ER: "Accounting for proximal variants improves neoantigen prediction", NAT GENET, vol. 51, no. 1, January 2019 (2019-01-01), pages 175 - 179, XP036927708, DOI: 10.1038/s41588-018-0283-9 |
| KIM S, SCHEFFLER K, HALPERN AL ET AL.: "Strelka2: fast and accurate calling of germline and somatic variants", NAT METHODS, vol. 15, 2018, pages 591 - 594, XP036559399, DOI: 10.1038/s41592-018-0051-x |
| LAI Z, MARKOVETS A, AHDESMAKI M, CHAPMAN B, HOFMANN O, MCEWEN R, JOHNSON J, DOUGHERTY B, BARRETT JC, DRY JR: "VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research", NUCLEIC ACIDS RES., vol. 44, no. 11, 20 June 2016 (2016-06-20), pages e108, XP055701286, DOI: 10.1093/nar/gkw227 |
| LANDAU DA, CARTER SL, STOJANOV P, MCKENNA A, STEVENSON K, LAWRENCE MS, SOUGNEZ C, STEWART C, SIVACHENKO A, WANG L: "Evolution and impact of subclonal mutations in chronic lymphocytic leukemia", CELL, vol. 152, no. 4, 14 February 2013 (2013-02-14), pages 714 - 26, XP028979918, DOI: 10.1016/j.cell.2013.01.019 |
| LANGMEAD B, TRAPNELL C, POP M ET AL.: "Ultrafast and memory-efficient alignment of short DNA sequences to the human genome", GENOME BIOL, vol. 10, 2009, pages R25, XP021053573, DOI: 10.1186/gb-2009-10-3-r25 |
| LEKO V, MCDUFFIE LA, ZHENG Z, GARTNER JJ, PRICKETT TD, APOLO AB, AGARWAL PK, ROSENBERG SA, LU YC: "Identification of Neoantigen-Reactive Tumor-Infiltrating Lymphocytes in Primary Bladder Cancer", J IMMUNOL, vol. 202, no. 12, 15 June 2019 (2019-06-15), pages 3458 - 3467 |
| LI H, DURBIN R: "Fast and accurate short read alignment with Burrows-Wheeler Transform", BIOINFORMATICS, vol. 25, 2009, pages 1754 - 60 |
| LU YC, ZHENG Z, ROBBINS PF, TRAN E, PRICKETT TD, GARTNER JJ, LI YF, RAY S, FRANCO Z, BLISKOVSKY V: "An Efficient Single-Cell RNA-Seq Approach to Identify Neoantigen-Specific T Cell Receptors", MOL THER, vol. 26, no. 2, 7 February 2018 (2018-02-07), pages 379 - 389, XP002781571 |
| LUNDEGAARD C, LAMBERTH K, HARNDAHL M, BUUS S, LUND O, NIELSEN M: "NetMHC-3.0: accurate web accessible predictions of human, mouse and monkey MHC class I affinities for peptides of length 8-11", NUCLEIC ACIDS RES., vol. 36, 1 July 2008 (2008-07-01), pages W509 - 12, XP055252573, DOI: 10.1093/nar/gkn202 |
| MCGRANAHAN N, FURNESS AJ, ROSENTHAL R, RAMSKOV S, LYNGAA R, SAINI SK, JAMAL-HANJANI M, WILSON GA, BIRKBAK NJ, HILEY CT: "Clonal neoantigens elicit T cell immunoreactivity and sensitivity to immune checkpoint blockade", SCIENCE, vol. 351, no. 6280, 25 March 2016 (2016-03-25), pages 1463 - 9, XP055283414, DOI: 10.1126/science.aaf1490 |
| MCGRANAHAN N, FURNESS AJ, ROSENTHAL R, RAMSKOV S, LYNGAA R, SAINI SK, JAMAL-HANJANI M, WILSON GA, BIRKBAK NJ, HILEY CT: "Clonal neoantigens elicit T cell immunoreactivity and sensitivity to immune checkpoint blockade", SCIENCE, vol. 351, no. 6280, 2016, pages 1463 - 1469, XP055283414, DOI: 10.1126/science.aaf1490 |
| O'DONNELL TJ, RUBINSTEYN A, LASERSON U: "MHCflurry 2.0: Improved Pan-Allele Prediction of MHC Class I-Presented Peptides by Incorporating Antigen Processing", CELL SYST, 22 July 2020 (2020-07-22) |
| PAN YAANG ET AL: "IRIS: Discovery of cancer immunotherapy targets arising from pre-mRNA alternative splicing", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES, 16 May 2023 (2023-05-16), XP093203076, ISSN: 0027-8424 * |
| PIETER MORIS, JOEY DE PAUW, ANNA POSTOVSKAYA, SOFI GIELIS, NICOLAS DE NEUTER, WOUT BITTREMIEUX, BENSON OGUNJIMI, KRIS LAUKENS, PIETER MEYSMAN: "Current challenges for epitope-agnostic TCR interaction prediction and a new perspective derived from image classification", September 2020 (2020-09-01) |
| QUIRIN MANZ: "ASimulatoR: splice-aware RNA-Seq data simulation", BIOINFORMATICS, vol. 37, 15 September 2021 (2021-09-15), pages 3008 - 3010 |
| RAINE KM, VAN LOO P, WEDGE DC, JONES D, MENZIES A, BUTLER AP, TEAGUE JW, TARPEY P, NIK-ZAINAL S, CAMPBELL PJ: "ascatNgs: Identifying Somatically Acquired Copy-Number Alterations from Whole-Genome Sequencing Data", CURR PROTOC BIOINFORMATICS, vol. 56, 8 December 2016 (2016-12-08), pages 1 - 17 |
| ROTH A, KHATTRA J, YAP D, WAN A, LAKS E, BIELE J, HA G, APARICIO S, BOUCHARD-CÔTÉ A, SHAH SP: "PyClone: statistical inference of clonal population structure in cancer", NAT METHODS, vol. 11, no. 4, April 2014 (2014-04-01), pages 396 - 8, XP055563468, DOI: 10.1038/nmeth.2883 |
| RUBINSTEYN A, KODYSH J, AKSOY BA: "ISOVAR", 2017 |
| VANESSA JURTZ, SINU PAUL, MASSIMO ANDREATTA, PAOLO MARCATILI, BJOERN PETERS, MORTEN NIELSEN: "NetMHCpan-4.0: Improved Peptide-MHC Class I Interaction Predictions Integrating Eluted Ligand and Peptide Binding Affinity Data", J IMMUNOL, vol. 199, no. 9, 1 November 2017 (2017-11-01), pages 3360 - 3368, XP055634914, DOI: 10.4049/jimmunol.1700893 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| US20220093209A1 (en) | Predicting immunogenicity of t cell epitopes | |
| Sahin et al. | Personalized RNA mutanome vaccines mobilize poly-specific therapeutic immunity against cancer | |
| Granados et al. | Impact of genomic polymorphisms on the repertoire of human MHC class I-associated peptides | |
| Pritchard et al. | Exome sequencing to predict neoantigens in melanoma | |
| CN113711239A (zh) | 利用ii类mhc模型鉴别新抗原 | |
| US11504398B2 (en) | Identification of clonal neoantigens and uses thereof | |
| AU2016319316A1 (en) | "immune checkpoint intervention" in cancer | |
| Bertrums et al. | Elevated mutational age in blood of children treated for cancer contributes to therapy-related myeloid neoplasms | |
| Lozano-Rabella et al. | Exploring the immunogenicity of noncanonical HLA-I tumor ligands identified through proteogenomics | |
| Zhang et al. | CD86 is associated with immune infiltration and immunotherapy signatures in AML and promotes its progression | |
| Marcu et al. | Natural and cryptic peptides dominate the immunopeptidome of atypical teratoid rhabdoid tumors | |
| WO2019008365A1 (fr) | Méthode de traitement du cancer par un néo-antigène indel de déphasage | |
| CN104126017A (zh) | 对蛋白酶体抑制剂的反应的生物标记 | |
| US20220313804A1 (en) | Hla tumor antigen peptides of class i and ii for treating mammary/breast carcinomas | |
| Koşaloğlu et al. | Identification of immunotherapeutic targets by genomic profiling of rectal NET metastases | |
| Campbell et al. | Spatial profiling reveals association between WNT pathway activation and T-cell exclusion in acquired resistance of synovial sarcoma to NY-ESO-1 transgenic T-cell therapy | |
| WO2024256485A1 (fr) | Identification de peptides néo-antigènes | |
| Leko et al. | Utilization of primary tumor samples for cancer neoantigen discovery | |
| Lozano-Rabella et al. | Immunogenicity of non-canonical HLA-I tumor ligands identified through proteogenomics | |
| Pospiech et al. | Features of the TCR repertoire associate with patients' clinical and molecular characteristics in acute myeloid leukemia | |
| Zhao et al. | Identification of shared neoantigens derived from frameshift mutations in the APC gene | |
| WO2021077094A1 (fr) | Découverte, validation et personnalisation de vaccins contre le cancer utilisant des éléments transposables | |
| CN113272419A (zh) | 制备治疗性t淋巴细胞的方法 | |
| Barroux et al. | Evolutionary and immune microenvironment dynamics during neoadjuvant treatment of oesophagael adenocarcinoma | |
| Papp et al. | Synergy of HLA class I and II shapes the timing of antitumor immune response |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24733538; Country of ref document: EP; Kind code of ref document: A1 |