
WO2016070230A1 - Detecting sequence mutations in leukaemic fusion genes - Google Patents

Detecting sequence mutations in leukaemic fusion genes

Info

Publication number
WO2016070230A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
umid
mutations
bcr
molecules
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/AU2015/000667
Other languages
French (fr)
Inventor
Wendy Tara PARKER
David Tak On YEUNG
Susan Branford
Hamish Steele Scott
Joel Micah GEOGHEGAN
Andreas Wolfgang SCHREIBER
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central Adelaide Local Health Network Inc
Adelaide University
Original Assignee
University of South Australia
Central Adelaide Local Health Network Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AU2014904452A
Application filed by University of South Australia, Central Adelaide Local Health Network Inc
Publication of WO2016070230A1


Classifications

    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/82 Translation products from oncogenes
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/62 DNA sequences coding for fusion proteins
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/158 Expression markers

Definitions

  • the present invention relates to methods of detecting mutations, particularly rare sequence mutations, in polynucleotide molecules. These methods involve the use of a novel technique, Single Molecule Consensus Sequencing (SMCS).
  • SMCS Single Molecule Consensus Sequencing
  • Chronic myeloid leukaemia otherwise known as chronic granulocytic leukaemia (CGL)
  • CML chronic myeloid leukaemia
  • CGL chronic granulocytic leukaemia
  • the disease is invariably fatal within 3 to 5 years.
  • the diagnosis is made during a relatively benign chronic phase (the "chronic phase”).
  • the disease progresses through an "accelerated phase" to a terminal "blastic phase" or "blast crisis" phase that is generally refractory to therapy (Goldman et al., 2003).
  • the disease is characterised by the overproduction of granulocytes (also referred to as blasts or leukaemic blasts) in the blood marrow.
  • CML occurs when a bone marrow stem cell develops a new and abnormal chromosome referred to as the Philadelphia (Ph) chromosome. What causes this chromosome to appear in some people is unknown (although in some cases it may have arisen due to radiotherapy for other cancers), and it is not familial nor can it be passed to offspring.
  • the Ph chromosome comprises the fusion gene, BCR-ABL1, which represents the central molecular pathology of CML.
  • This gene encodes the BCR-ABL1 fusion protein, which is a constitutively-activated tyrosine kinase that aberrantly activates a series of molecular pathways causing deregulated cell proliferation, differentiation, DNA repair and apoptosis (Melo et al., 2007).
  • TKI BCR-ABL1 tyrosine kinase inhibitor
  • Glivec® imatinib
  • TKI-resistant mutations More than 100 TKI-resistant mutations have now been described in CML patients (Branford et al., 2009). This has driven the development of more potent TKIs. Now available are the "second-generation" TKIs, nilotinib (Tasigna®; Novartis Pharmaceuticals Pty Ltd) and dasatinib (Sprycel®; Bristol-Myers Squibb Company, Princeton, NJ, United States of America) which, between them, are active against most imatinib-resistant BCR-ABL1 protein mutants. A notable exception, however, is the most commonly detected mutation, T315I, which confers resistance to imatinib, nilotinib and dasatinib.
  • Ponatinib (Iclusig™; ARIAD Pharmaceuticals Inc, Cambridge, MA, United States of America) is a "third-generation" TKI that inhibits all known BCR-ABL1 protein mutants, including T315I, at clinically achievable doses (O'Hare et al., 2009).
  • ponatinib therapy has been associated with a relatively high incidence of side-effects, especially arterial thrombotic events such as coronary and peripheral vascular disease (Cortes et al., 2013).
  • in vitro studies have demonstrated that certain compound mutants are still likely to cause resistance (O'Hare et al., 2009; Zabriskie et al., 2014).
  • ABL001 Novartis AG, Basel, Switzerland
  • the BCR-ABL1 fusion gene as derived from the Ph chromosome is also central to the pathogenesis of other haematological neoplasms.
  • BCR-ABL1 is not uncommonly found in a subtype of adult acute lymphoblastic leukaemia (ALL), associated with the older patient, and adverse prognosis.
  • ALL acute lymphoblastic leukaemia
  • TKIs are also increasingly recognised as a valuable addition to this standard of care (Chalandon et al., 2015).
  • the increased use of TKI for ALL is also associated with KD mutations as an evolving cause of disease resistance.
  • KD mutations in Ph+ ALL are more commonly found at treatment failure, and more frequently confer resistance to both first and second generation TKIs. BCR-ABL1 mutants that confer high level resistance, such as T315I, E255K and Y253H, are more common in Ph+ ALL as compared to CML, as are compound mutations (Zabriskie et al., 2014; Soverini et al., 2014).
  • Ph chromosome and acute myeloid leukaemia have been described, and the use of imatinib in these cases has also been reported to be associated with KD mutations as a pathway of treatment resistance (Reboursiere et al., 2015).
  • Ph+ leukaemias It is widely accepted that techniques used to interrogate the BCR-ABL1 KD in the ascertainment of treatment resistant mutations in CML are also applicable and transferable to Ph+ ALL and Ph+ AML without, or with minimal, further modification. These diseases are referred to hereinafter as Ph+ leukaemias.
  • the present invention provides a method for identifying and/or enumerating sequence mutations within the kinase domain (KD) of a fusion gene comprising ABL1 or a portion thereof encoding all or substantially all of the KD, wherein the method of the present invention comprises the steps of:
  • the present invention provides a kit for use in the method of the present invention.
  • a kit may comprise, for example, appropriate primer molecules, and/or buffer solutions, preparations of deoxyribonucleotide triphosphates (dNTPs) etc.
  • dNTPs deoxyribonucleotide triphosphates
  • Figure 1 provides: (A) a schematic diagram outlining an embodiment of the method according to the present invention, employing the use of a novel technique, Single Molecule Consensus Sequencing (SMCS).
  • the diagram depicts how, by uniquely tagging individual BCR-ABLl transcripts prior to PCR amplification, the method enables the identification and/or enumeration of those individual transcripts and hence differentiation between compound and polyclonal mutations; and
  • (B) provides tabulated results comparing compound mutations detected by the method depicted in (A) and amplicon next generation sequencing (NGS) using Ion Torrent. Only mutations within the region examined by the SMCS are summarised (nb. four (4) mutations were outside of this region).
  • the present invention provides a novel method which allows for the accurate and sensitive discrimination between compound mutants and multiple single mutants in patients with ABL1 fusion gene-driven diseases such as Ph+ leukaemias.
  • This method involves a technique herein termed as Single Molecule Consensus Sequencing (SMCS).
  • SMCS enables the identification and enumeration of sequence mutations (eg point mutations) present in RNA, cDNA or gDNA that may be associated with, for example, disease or treatment outcomes (eg TKI drug resistance).
  • sequence mutations eg point mutations
  • the technique is particularly suitable for rare sequence mutations, however the technique may also be applied to the identification and enumeration of more commonly arising sequence mutations (eg mutations that may be present in all cells of a particular cancer).
  • SMCS enables one to distinguish rare mutations that may only be present in, for example, some cells from a particular cancer from mutations that may be caused by polynucleotide molecule amplification and/or sequencing reactions (ie artefacts).
  • the SMCS technique also enables compound mutations to be distinguished from instances where there are multiple versions of the relevant polynucleotide molecule (eg within a sample of a particular cancer) each with one different mutation within a particular cancer; as will be appreciated from the above, the problem with detecting compound mutations is that artefacts introduced during the amplification and/or sequencing reactions used in present methods can make it appear that two different mutations in the same gene are present in the same cell, when in fact they are present in separate cells.
  • the SMCS technique involves the use of "unique molecular ID tags" (UMIDs) or molecular "barcode" sequences to tag all of the polynucleotide molecules of interest within a sample.
  • UMIDs unique molecular ID tags
  • barcode molecular "barcode"
  • the UMID-tagged molecules are then copied and the sequence of the copies (ie amplicons) is obtained by, preferably, NGS.
  • the sequence of the original polynucleotide molecule is inferred from the consensus of the copies with the same UMID-tag sequence to thereby overcome misleading results caused by artefact mutations introduced during the amplification and/or sequencing reactions.
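The tagging-then-grouping idea described above can be sketched in Python. This is a minimal illustration, not the patent's pipeline: the function name `group_reads_by_umid`, the toy reads, and the assumption that the UMID sits at the very start of each read are all hypothetical (the real position depends on the primer design).

```python
from collections import defaultdict


def group_reads_by_umid(reads, umid_length=18):
    """Group reads into families keyed by their UMID.

    Assumption for illustration only: the UMID occupies the first
    `umid_length` bases of every read.
    """
    families = defaultdict(list)
    for read in reads:
        umid, insert = read[:umid_length], read[umid_length:]
        families[umid].append(insert)
    return dict(families)


# Toy reads with a 4-base UMID (shortened purely to keep the example readable).
reads = [
    "AAAC" + "GATTACA",
    "AAAC" + "GATTACA",
    "AAAC" + "GATTTCA",  # one copy carrying an amplification/sequencing artefact
    "CCGT" + "GATCACA",  # a different original molecule
]
families = group_reads_by_umid(reads, umid_length=4)
```

Each key in `families` then stands for one original molecule, and the reads under that key are its copies, from which a consensus can be taken.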
  • the present invention is particularly described with reference to the BCR-ABL1 gene and chronic myeloid leukaemia (CML).
  • the invention is considered to be more broadly applicable to other disease associated ABL1 fusion genes, including but not limited to the BCR-ABL1-like fusion gene associated with some forms of acute lymphoblastic leukaemia (ALL), namely ETV6 (TEL)-ABL1 which, in one described example, is a fusion of sequences from exon 5 of the ETV6 (TEL) gene (ie the ETS variant 6 gene) and exon 2 (also known as a2) of the ABL1 gene (Yeung et al., 2015).
  • TEL ETV6
  • TEL ETS variant 6 gene
  • exon 2 also known as a2
  • the present invention provides a method for identifying and/or enumerating sequence mutations (such as rare sequence mutations) within the kinase domain (KD) of a fusion gene comprising ABL1 or a portion thereof encoding all or substantially all of the KD.
  • the fusion gene may be associated with a disease (eg BCR-ABL1 associated with CML, Ph+ ALL or Ph+ AML, or a fusion gene characteristic of another ABL1 translocation-associated neoplasm; for example the ETV6 (TEL)-ABL1 fusion gene associated with some forms of ALL).
  • the method of the present invention therefore comprises the steps of:
  • RNA transcripts ie mRNA
  • ABL1 a fusion gene comprising ABL1 or cDNA molecules produced from transcripts of a fusion gene comprising ABL1;
  • a primer extension reaction eg 2-6 cycles
  • a first primer pair comprising forward and reverse primer molecules targeted so as to generate polynucleotide molecules comprising a polynucleotide sequence corresponding to all of the fusion gene or a portion thereof comprising a target region spanning a fusion gene breakpoint, an adjacent upstream sequence (eg of at least 20 nucleotides in length) and a downstream KD-encoding portion of the ABL1 sequence (eg encoding exons 2 to 10, or 4 to 7), wherein one of said primer molecules (preferably the reverse primer or the primer closest to the KD-encoding portion) comprises a short sequence of random nucleotides providing an individual unique molecular ID tag (UMID), along with a fusion gene-specific sequence and a universal 5' tail sequence, to thereby tag each of the generated polynucleotide molecules with an individual UMID;
  • UMID individual unique molecular ID tag
  • the method enables the identification (ie detection) of, for example, mutations (including rare sequence mutations and compound mutations) present in the fusion gene transcripts, which may be associated with disease (eg a Ph+ leukaemia such as CML), or otherwise, of some disease- or treatment-associated characteristic (eg disease stage or drug resistance, particularly TKI resistance).
  • the method may also enable, for example, rare sequence mutations to be distinguished from artefact mutations arising from amplification and/or sequencing reactions. As will be apparent from the above, avoiding such misleading results can be of considerable clinical significance.
  • rare sequence mutation is to be understood as referring to a "low level" mutation that may show a frequency of <10% (eg a mutation that is present in less than 10% of individuals with an ABL1 fusion gene). Such rare sequence mutations may, for example, be present in all tumour cells in a sample or only in a portion or sub-population of the tumour cells in a sample. Further, a rare sequence mutation may consist of, for example, a point mutation (eg a single nucleotide variant; SNV), or an insertion or deletion mutation. Moreover, rare sequence mutations may represent polyclonal mutations or compound mutations.
  • the term "compound mutation" will be well understood by those skilled in the art and refers to one of two or more mutations present on the same polynucleotide molecule (ie one of multiple mutations present on the same polynucleotide molecule). These mutations may "compound" to cause, for example, an altered activity of an encoded protein, polypeptide or protein domain, which may in turn, be the cause of disease or, otherwise, of some disease- or treatment-associated characteristic (eg disease stage or drug resistance).
  • a "polyclonal mutation" will be understood by those skilled in the art as referring to one of two or more mutations present on different copies of a polynucleotide molecule (ie one of multiple single mutations found on different copies of a particular polynucleotide molecule).
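The distinction between compound and polyclonal mutations can be illustrated with a short Python sketch. It assumes, hypothetically, that each original molecule has already been reduced to the set of mutations found on it; the function name `classify_mutations` and the example data are illustrative only, with mutation names borrowed from those mentioned earlier in the text.

```python
def classify_mutations(molecules):
    """Separate compound mutations from polyclonal single mutations.

    `molecules` is a list of sets, one per original molecule, each holding
    the mutations found on that molecule (an empty set means wild-type).
    """
    # Two or more mutations on the same molecule form a compound mutant.
    compound = sorted({tuple(sorted(m)) for m in molecules if len(m) >= 2})
    # Single mutations carried by different molecules are polyclonal.
    singles = sorted({next(iter(m)) for m in molecules if len(m) == 1})
    polyclonal = singles if len(singles) >= 2 else []
    return {"compound": compound, "polyclonal": polyclonal}


example_molecules = [
    {"T315I", "E255K"},  # both mutations on one molecule
    {"T315I"},           # single mutant
    {"Y253H"},           # a different single mutant on another molecule
    set(),               # wild-type molecule
]
classification = classify_mutations(example_molecules)
```

In this toy input, T315I and E255K are reported together as a compound mutant, while T315I and Y253H also appear as polyclonal single mutants on separate molecules.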
  • the sample used in step (i) of the method comprises mRNA or complementary DNA (cDNA) prepared from any suitable body sample obtained from, for example, blood, serum, plasma, or the like (eg tumour tissue sample).
  • cDNA complementary DNA
  • the sample used in step (i) comprises cDNA.
  • cDNA refers to a DNA molecule that has a nucleotide sequence that is complementary to a molecule of messenger RNA (mRNA) which may be synthesised with reverse transcriptase using the mRNA as template.
  • mRNA messenger RNA
  • the cDNA does not contain intron sequences.
  • the sample for use in the method of the present invention may comprise cDNA as prepared by any of the methods well known to those skilled in the art.
  • the cDNA molecules present in the sample may be tagged with an individual unique molecular ID tag (UMID) in step (ii) by conducting a primer extension reaction (eg using a high-fidelity DNA polymerase enzyme) with a first primer pair comprising forward and reverse primer molecules, wherein one of said primer molecules (preferably the reverse primer) comprises a short sequence of random nucleotides providing the UMID, along with a fusion gene-specific sequence (ie a first fusion gene-specific sequence) and a universal 5' tail sequence.
  • the other primer molecules will also comprise a fusion gene-specific sequence (ie a second fusion gene-specific sequence).
  • the forward and reverse primer molecules are selected so that the primer extension reaction generates polynucleotide molecules comprising a polynucleotide sequence corresponding to at least the target region of the fusion gene.
  • the first fusion gene-specific sequence of the reverse primer may bind at the 3' end of the ABL1 sequence within exon 7 or 9, while the second fusion gene-specific sequence of the forward primer may bind within the adjacent upstream sequence of the other gene member of the fusion gene (eg within exon e1 or e13 of BCR or within exon 4 of ETV6 (TEL)).
  • the fusion gene-specific sequences of the forward and reverse primers may allow for code degeneracy (ie the primer molecules may be degenerate primers), or otherwise, the primer extension reaction may include multiple primers as required, to ensure tagging of all of the cDNA molecules.
  • the mRNA molecules may be tagged with an individual unique molecular ID tag (UMID) in step (ii) by conducting a primer extension reaction with a first primer pair comprising forward and reverse primer molecules (eg using a reverse transcriptase enzyme followed by synthesis of the second strand with a high-fidelity DNA polymerase enzyme), wherein one of said primer molecules (preferably the reverse primer) comprises a short sequence of random nucleotides providing the UMID, along with a fusion gene-specific sequence (ie a first fusion gene-specific sequence) and a universal 5' tail sequence.
  • the other primer molecules will also comprise a fusion gene-specific sequence (ie a second fusion gene-specific sequence).
  • the forward and reverse primer molecules are selected so that polynucleotide molecules are generated which comprise a polynucleotide sequence corresponding to at least the target region of the fusion gene.
  • the first fusion gene-specific sequence of the reverse primer may bind at the 3' end of the ABL1 sequence within exon 7 or 9, while the second fusion gene-specific sequence of the forward primer may bind within the adjacent upstream sequence of the other gene member of the fusion gene (eg within exon e1 or e13 of BCR or within exon 4 of ETV6 (TEL)).
  • the primer extension reaction may include multiple primers as required to ensure tagging of all of the mRNA molecules.
  • the primer extension reaction may be conducted using any one of the suitable methodologies well known to those skilled in the art.
  • the primer extension reaction will comprise a two (2) cycle reaction.
  • the UMID sequences may be provided by generating random nucleotide sequences of, for example, 10-25 nucleotides in length (preferably, 15-20 nucleotides in length, and more preferably, 18 nucleotides in length).
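As a rough illustration of why a random tag of this length is effectively unique per molecule, the following Python sketch generates an 18-nucleotide UMID and assembles a tagged primer. The layout (universal 5' tail, then UMID, then gene-specific 3' sequence) is an assumed ordering, and the tail and gene-specific strings are placeholders, not sequences from the patent.

```python
import random


def random_umid(length=18, rng=random):
    """Return a random nucleotide string of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def build_tagged_primer(universal_tail, gene_specific, length=18, rng=random):
    """Assemble 5'-tail + UMID + gene-specific-3' (assumed layout)."""
    return universal_tail + random_umid(length, rng) + gene_specific


# Placeholder sequences, for illustration only.
UNIVERSAL_TAIL = "ACACTGACGACATGGTTCTACA"
GENE_SPECIFIC = "GATTACAGATTACAGATTAC"

rng = random.Random(0)  # seeded so the example is repeatable
primer = build_tagged_primer(UNIVERSAL_TAIL, GENE_SPECIFIC, rng=rng)

# Number of distinct 18-mers: 4**18 = 68,719,476,736 possible tags.
umid_diversity = 4 ** 18
```

With tens of billions of possible tags, two molecules in a typical sample are unlikely to receive the same UMID by chance, which is the property the consensus step relies on.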
  • the primer molecules comprising the UMID bind to the cDNA/mRNA (through the complementary fusion gene-specific sequence) in a simple manner forming a regular duplex structure devoid of any significant loop structure and, as such, those skilled in the art will understand that the method of the present invention does not employ molecules such as inversion probes (eg single molecule Molecular Inversion Probes (smMIP) described by Hiatt et al., 2013).
  • the product of step (ii) is a reaction mixture wherein each one of the cDNA/mRNA molecules present in the sample is tagged with an individual UMID.
  • excess primers comprising the UMID are preferably degraded using any suitable methodology (eg by incubation with 60U of Exonuclease I at 37°C for 60 mins).
  • in step (iii), the amplification of the UMID-tagged polynucleotide molecules to generate UMID-tagged amplicons comprising a polynucleotide sequence corresponding to all or substantially all of the said KD-encoding portion, may be achieved using any of the suitable methodologies well known to those skilled in the art.
  • the amplification is performed using a standard polymerase chain reaction (PCR) amplification method (preferably a non-linear amplification method) using a pair of primers (ie forward and reverse primer molecules) defining the 5' and 3' ends of the desired polynucleotide sequence of the KD-encoding portion.
  • PCR polymerase chain reaction
  • the respective primer sequences hybridise to the 3' end of one strand (ie to thereby “define” the 3' end) and the 3' end of a complementary strand (ie to thereby “define” the 5' end) of the particular sequence so as to enable that sequence to be amplified.
  • this may be achieved by using a primer pair comprising a first primer molecule that comprises a nucleotide sequence that is complementary to the sequence at one end of the desired polynucleotide sequence and a second primer molecule that comprises a nucleotide sequence that targets a standard universal 5' tail sequence.
  • each of the 20 nucleotides is perfectly complementary to the corresponding nucleotide of the particular target nucleotide sequence
  • a lesser degree of complementarity eg 95% complementary; wherein for a primer sequence of 20 nucleotides in length there may be one "mismatch" nucleotide and 19 nucleotides that are perfectly complementary with the corresponding nucleotide of the particular target nucleotide sequence.
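The complementarity arithmetic in this example (19 matching positions out of 20 giving 95%) can be checked with a small Python sketch. The function names and the 20-nucleotide target site are hypothetical; the sketch assumes DNA sequences written 5' to 3' and a primer that anneals antiparallel to its target.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def percent_complementarity(primer, target_site):
    """Percentage of primer positions that base-pair with the target site."""
    if len(primer) != len(target_site):
        raise ValueError("primer and target site must be the same length")
    perfect = reverse_complement(target_site)
    matches = sum(p == q for p, q in zip(primer, perfect))
    return 100.0 * matches / len(primer)


target_site = "GATTACAGATTACAGATTAC"  # placeholder 20-nt site
perfect_primer = reverse_complement(target_site)
# Introduce a single mismatch at the first position of the primer.
first = "A" if perfect_primer[0] != "A" else "C"
one_mismatch_primer = first + perfect_primer[1:]

perfect_score = percent_complementarity(perfect_primer, target_site)
mismatch_score = percent_complementarity(one_mismatch_primer, target_site)
```

Here `perfect_score` is 100.0 and `mismatch_score` is 95.0, matching the one-mismatch-in-twenty case described in the text.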
  • step (iii) may employ primer molecules that allow for code degeneracy (ie the primer molecules may be degenerate primers), or otherwise, the amplification may include multiple primers as required, to ensure amplification of all of the tagged polynucleotide molecules.
  • the method will be conducted using a standard PCR amplification, in some circumstances, it may be preferred to perform the amplification step using a "nested" PCR amplification method using a further, "outside", pair of primers. Nested PCR amplification methods are well known to those skilled in the art.
  • primer molecules suitable for use in primer extension reactions and amplification reactions may be in accordance with techniques and guidelines well known to those skilled in the art (eg as described in Sambrook, J. and D. W. Russell, Molecular Cloning: a laboratory manual, Cold Spring Harbor Press, Third Edition (2001) at Chapter 8 (particularly Table 8-3), the entire disclosure of which is hereby incorporated by reference).
  • the amplicons produced in step (iii) may be no more than about 1 kb in length (although longer amplicons may also be suitable) and, perhaps preferably, will be about 650-750 nucleotides in length, comprising all or substantially all of the KD-encoding portion of the ABL1 sequence.
  • by "substantially all", it is to be understood that the amplicons comprise at least two and, more preferably, at least three exons of the KD-encoding portion of the ABL1 sequence.
  • the amplicons comprise at least four exons of the KD-encoding portion (eg exons 2 to 10, exons 4 to 10 or exons 4 to 7).
  • each of the UMID-tagged amplicons is sequenced, conveniently by using a next generation sequencing (NGS) platform such as, for example, 454 pyrosequencing (Roche Diagnostics Corporation, Branford, CT, United States of America), Illumina (Solexa) sequencing (Illumina Inc, San Diego, CA, United States of America), SOLiD sequencing (Life Technologies, Carlsbad, CA, United States of America) or, alternatively, Ion Torrent semiconductor sequencing (Life Technologies).
  • NGS next generation sequencing
  • the technique is also amenable for use with new and emerging sequencing technologies such as PacBio (Pacific Biosciences, Menlo Park, CA, United States of America), Oxford Nanopore (Oxford Science Park, Oxford, United Kingdom) or Qiagen GeneReader (Qiagen, Hilden, Germany).
  • in step (v), bioinformatic analysis is conducted on the sequences (or "reads") obtained in step (iv) to identify a consensus sequence for all sequenced amplicons comprising a common UMID (eg comprising a "Read Group" or read "family").
  • the consensus sequence information may then reveal any sequence mutations that were present in the KD-encoding portion of the polynucleotide molecules present in the sample, since mutations arising from the polynucleotide molecule amplification and/or sequencing reactions (ie artefact mutations) will only be present in a small number of the sequences (ie in a few reads only).
  • the bioinformatic analysis identifies reads derived from a single initial fusion gene transcript by virtue of the common UMID.
  • the consensus sequence of reads with a common UMID may be determined using automated variant calling and filtering algorithms, and represents the sequence of an initial cDNA/mRNA molecule present in the sample, thereby overcoming artefact mutations arising from the polynucleotide molecule amplification and/or sequencing reactions.
  • Steps (iv) and (v) may be conducted concurrently.
  • the term "consensus sequence" refers to the order of the most frequent nucleotides found in the sequences (ie reads) of the UMID-tagged amplicons (eg as produced in step (iv) of the method of the first aspect of the present invention) comprising a common UMID.
  • the consensus sequence for a given group of amplicons comprising a common UMID may reveal any sequence mutations that were present in the polynucleotide molecules present in the sample since mutations arising from the polynucleotide molecule amplification and/or sequencing reactions (ie artefact mutations) will only be present in a small number of the sequences (ie in a few reads only) and will therefore not be represented in the consensus sequence. Moreover, since the consensus sequence is produced from amplicons ultimately generated from a single polynucleotide molecule in the sample, the presence of two or more mutations in the consensus sequence may identify compound mutations (ie two or more mutations present on the same polynucleotide molecule present in the sample).
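A per-position majority vote is one simple way to realise the consensus described here. The Python sketch below is illustrative only: it assumes the reads of one UMID family are already aligned and of equal length, and the toy reference and reads are invented for the example.

```python
from collections import Counter


def consensus(reads):
    """Most frequent base at each position across equal-length, aligned reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def mutations(sequence, reference):
    """List (1-based position, reference base, observed base) differences."""
    return [
        (i + 1, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference, sequence))
        if ref != alt
    ]


reference = "GATTACAGGT"
family_reads = [
    "GACTACATGT",
    "GACTACATGT",
    "GACTAGATGT",  # position 6 differs in this read only: treated as an artefact
]

family_consensus = consensus(family_reads)
family_mutations = mutations(family_consensus, reference)
is_compound = len(family_mutations) >= 2
```

The change at position 6 occurs in only one read and so drops out of the consensus, while the changes at positions 3 and 8 are present in every read of the family and are therefore reported together on the same original molecule.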
  • Mutations present in the consensus sequence may be recognised by sequence comparison against an appropriate reference sequence (eg the "wild-type” (WT) sequence), using standard methodologies and tools well known to those skilled in the art (eg Burrows-Wheeler sequence aligner (BWA); Li et al., 2010 and Genome Analysis Toolkit (GATK) such as that available from Appistry Inc, St Louis, MO, United States of America).
  • WT wild-type sequence
  • BWA Burrows-Wheeler sequence aligner
  • GATK Genome Analysis Toolkit
  • the number of amplicons present with a common UMID can be readily ascertained. This may be done by bioinformatic analysis of the NGS reads. For example, reads comprising a common UMID are grouped into Read Groups (or "families") and aligned to a reference sequence using BWA. The variants within each Read Group are called using GATK. Variants passing bioinformatic filtering that are present in the majority of reads within each Read Group represent those in the initial amplified molecule, whereas those present in only a few reads are artefacts.
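The majority rule within a Read Group can be sketched without reference to any particular aligner or variant caller. In the Python below, the per-read variant sets stand in for calls that would come from tools such as BWA and GATK; the function name, the minimum family size of 3 and the 50% threshold are assumptions chosen for illustration, not values given in the text.

```python
from collections import Counter


def family_variants(read_variants, min_reads=3, min_fraction=0.5):
    """Keep variants seen in more than `min_fraction` of a Read Group's reads.

    `read_variants` is a list of sets, one per read in the Read Group.
    Returns None when the group has too few reads to support a consensus.
    """
    n_reads = len(read_variants)
    if n_reads < min_reads:
        return None
    counts = Counter(v for variants in read_variants for v in variants)
    return {v for v, c in counts.items() if c / n_reads > min_fraction}


read_group = [
    {"T315I"},
    {"T315I", "E255K"},  # E255K seen in one read only: treated as an artefact
    {"T315I"},
    {"T315I"},
]
called = family_variants(read_group)
too_small = family_variants([{"T315I"}, {"T315I"}])
```

In this toy Read Group, T315I is retained because it appears in every read, E255K is discarded because it appears in one read of four, and a group of only two reads is left uncalled.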
  • the ratio of the number of Read Groups with a consensus sequence bearing a WT or reference sequence versus groups bearing variants may then be expressed as a percentage or ratio to allow description of prevalence of each mutant present.
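Expressing prevalence as a percentage of Read Groups is a simple counting exercise, sketched below in Python. The input format (one tuple of mutation names per Read Group, with an empty tuple for WT) and the example counts are hypothetical.

```python
from collections import Counter


def mutant_prevalence(family_calls):
    """Percentage of Read Groups carrying each consensus genotype.

    `family_calls` holds one tuple of mutation names per Read Group;
    an empty tuple means the consensus matched the WT/reference sequence.
    """
    total = len(family_calls)
    counts = Counter(family_calls)
    return {genotype: 100.0 * n / total for genotype, n in counts.items()}


# Ten Read Groups: six WT, three single mutants, one compound mutant.
calls = [()] * 6 + [("T315I",)] * 3 + [("E255K", "T315I")] * 1
prevalence = mutant_prevalence(calls)
```

Because each Read Group represents one original molecule, these percentages describe the proportion of tagged molecules in the sample carrying each genotype rather than the proportion of raw reads.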
  • Artificial, simulated samples created from known mixtures of WT/reference and mutant sequences may be used to empirically determine, and periodically calibrate, the sensitivity of the assay, or otherwise act as quality assurance. This may be particularly helpful when samples are of poor quality and/or quantity (eg some clinical samples such as formalin-fixed paraffin embedded sections, circulating tumour DNA samples and samples taken to assess minimal residual disease following treatment).
  • the present invention provides a method for identifying and/or enumerating sequence mutations (such as rare sequence mutations) within the kinase domain (KD) of the BCR-ABL1 fusion gene associated with Ph+ leukaemias.
  • sequence mutations such as rare sequence mutations
  • the structure of BCR-ABL1 (encompassing numerous variants) has been well described in, for example, Melo and Chuah, 2007, the content of which is hereby incorporated by reference in its entirety.
  • the KD-encoding portion of BCR-ABL1 comprises six exons of the ABL1 portion of the fusion gene denoted exons 4 to 10 (also a4 to a10).
  • the chromosomal translocation point of BCR-ABL1, that is the breakpoint or junction at which the two genes are fused, commonly arises from the major breakpoint cluster region (M-bcr) between exons e12 and e16 of BCR and breakpoints of ABL within exon a2, and numerous fusion gene variants have been identified.
  • M-bcr major breakpoint cluster region
  • the "typical" transcripts of BCR-ABL1 include transcripts denoted as e1a2 (a transcript produced from a fusion between exon e1 of BCR and exon a2 of ABL1), e13a2 and e14a2 (which can also arise from alternative splicing), but other BCR-ABL1 transcripts are also well known including rare or "atypical" e1a2 (arising from the minor breakpoint cluster region (m-bcr) of BCR), e1a3, e2a2, e6a2, e13a3, e14a3 and e19a2 variant forms.
  • m-bcr minor breakpoint cluster region
  • the e13a2 and e12a2 transcripts encode a BCR-ABL1 protein of 210 kDa, while the e1a2 transcript encodes for a BCR-ABL1 protein of 190 kDa and the e19a2 transcript encodes for a BCR-ABL1 protein of 230 kDa.
  • the method of the present invention may be conducted in a manner to identify and/or enumerate sequence mutations (such as rare sequence mutations) within the kinase domain (KD) of one or more of the BCR-ABL1 fusion gene and/or transcript variants, including but not limited to those mentioned above.
  • the method of the present invention therefore comprises the steps of:
  • a primer extension reaction eg 2-6 cycles
  • a first primer pair comprising forward and reverse primer molecules targeted so as to generate polynucleotide molecules comprising a polynucleotide sequence corresponding to all of the BCR-ABL1 fusion gene or a portion thereof comprising a target region spanning a fusion gene breakpoint, an adjacent upstream BCR-derived sequence (eg of at least 20 nucleotides in length) and a downstream KD-encoding portion of the ABL1 sequence (eg encoding exons 4 to 10 or exons 4 to 7), wherein one of said primer molecules (preferably the reverse primer or the primer closest to the KD-encoding portion) comprises a short sequence of random nucleotides providing an individual unique molecular ID tag (UMID), along with a BCR-ABL1-specific sequence (3') and a universal 5' tail sequence, to thereby tag each of the generated polynucleotide molecules with an individual UMID;
  • UID individual unique molecular ID
  • the sample used in step (i) of the method of this embodiment comprises BCR-ABL1 transcripts or cDNA (produced from BCR-ABL1 transcripts) prepared from any suitable body sample obtained from, for example, blood, serum, plasma, or the like.
  • the cDNA/mRNA sample used in step (i) of the method of this embodiment is prepared from a white blood cell pellet.
  • the sample used in step (i) of the method of this embodiment comprises cDNA.
  • the cDNA molecules may be tagged with an individual unique molecular ID tag (UMID) in step (ii) by conducting a primer extension reaction with a first primer pair comprising forward and reverse primer molecules, wherein one of said primer molecules (preferably the reverse primer) comprises a short sequence of random nucleotides providing the UMID, along with a BCR-ABL1-specific sequence (ie a first BCR-ABL1-specific sequence) and a universal 5' tail sequence.
  • the other primer molecules will also comprise a BCR-ABL1-specific sequence (ie a second BCR-ABL1-specific sequence).
  • the forward and reverse primer molecules are selected so that the primer extension reaction generates polynucleotide molecules comprising a polynucleotide sequence corresponding to at least the target region of the BCR-ABL1 fusion gene.
  • the first BCR-ABL1-specific sequence of the reverse primer may bind at the 3' end of the ABL1 sequence within exon 7 or 9, while the second BCR-ABL1-specific sequence of the forward primer may bind within the BCR sequence (eg within exon e1 or e13).
  • the forward and reverse primers may allow for code degeneracy, or otherwise, the primer extension reaction may include multiple primers as required, to ensure tagging of all of the cDNA molecules.
  • the primer extension reaction may preferably comprise a two (2) cycle reaction.
  • the UMID sequences may be provided by generating random nucleotide sequences of, for example, 10-25 nucleotides in length (preferably, 15-20 nucleotides in length, and more preferably, 18 nucleotides in length).
  • the primer molecules comprising the UMID bind to the cDNA (through the complementary BCR-ABL1-specific sequence) in a simple manner forming a regular duplex structure devoid of any significant loop structure (as such, the method of this embodiment does not employ molecules such as inversion probes).
  • the product of step (ii) is a reaction mixture wherein each one of the cDNA molecules present in the sample is tagged with an individual UMID. Following the UMID tagging, excess primers comprising the UMID are preferably degraded using any suitable methodology.
  • the tagged molecules produced in step (ii) are polynucleotide molecules encoding a "typical" transcript BCR-ABL1 (eg e1a2, e13a2 and e14a2).
  • a "typical" transcript BCR-ABL1 eg e1a2, e13a2 and e14a2
  • the method is also applicable for polynucleotide molecules encoding an "atypical" transcript such as those mentioned above.
  • the target region may comprise a portion of the BCR-ABL1 fusion gene of, for example, no more than 2.5 kb (excluding intron sequences), but shorter sequences may be preferable such as a portion of no more than about 1.5 kb (excluding intron sequences) or a portion of no more than about 1 kb (excluding intron sequences).
  • step (iii) the amplification of the UMID-tagged polynucleotide molecules to generate UMID-tagged amplicons, which comprise a polynucleotide sequence corresponding to all or substantially all of the said KD-encoding portion, may be achieved with any of the suitable methodologies well known to those skilled in the art.
  • the amplification may be preferably performed using a standard polymerase chain reaction (PCR) amplification method (preferably a non-linear amplification method) using a pair of primers (ie forward and reverse primer molecules) defining the 5' and 3' ends of the desired
  • PCR polymerase chain reaction
  • primers ie forward and reverse primer molecules
  • the amplicons produced in step (iii) may be no more than about 1 kb in length (although longer amplicons may also be suitable) and, perhaps preferably, will be about 650-750 nucleotides in length, comprising all or substantially all of the KD-encoding portion of the ABL1 sequence.
  • the amplicons comprise at least two and, more preferably, at least three exons of the KD-encoding portion of the ABL1 sequence.
  • the amplicons comprise at least four exons of the KD-encoding portion (eg exons 4 to 7).
  • each of the UMID-tagged amplicons is sequenced, conveniently by using a next generation sequencing (NGS) platform.
  • NGS next generation sequencing
  • bioinformatic analysis is conducted on the sequences (or "reads") obtained in step (iv) to identify a consensus sequence for all sequenced amplicons comprising a common UMID (eg comprising a "Read Group").
  • the consensus sequence information may then reveal any sequence mutations that were present in the KD-encoding portion of the polynucleotide molecules present in the sample.
  • the bioinformatic analysis identifies reads derived from a single initial BCR-ABL1 molecule by virtue of the common UMID.
  • the consensus sequence of reads with a common UMID may be determined using automated variant calling and filtering algorithms, and represents the sequence of an initial BCR-ABL1 cDNA molecule present in the sample, thereby overcoming artefact mutations arising from the polynucleotide molecule amplification and/or sequencing reactions. Steps (iv) and (v) may be conducted concurrently.
  • the method of this embodiment of the present invention may thereby enable, for example, rare sequence mutations to be distinguished from artefact mutations arising from amplification and/or sequencing reactions.
  • the method of this embodiment of the present invention may enable the identification of the presence of compound mutations, present in the BCR-ABL1 fusion gene, associated with advanced Ph+ leukaemia (eg advanced CML disease) and inferior therapeutic outcomes (Parker et al., 2012) and/or drug resistance.
  • the method of the present invention may be varied for the detection of minimal residual disease (MRD) by enabling the accurate and sensitive detection of sequence mutations associated with the particular disease.
  • MRD minimal residual disease
  • the term "minimal residual disease” will be well understood by those skilled in the art and refers to any small amount of remnant diseased tissue or cells in a subject during or after disease therapy. MRD is the major cause of relapse in leukaemia and cancer. More particularly, in this embodiment, the method may enable the assessment of whether a subject who has been treated (eg treated for Ph+ leukaemia) is free of the disease (eg the treatment has eradicated the diseased cells) or whether remnant diseased tissue or cells remain.
  • the method of this embodiment may enable the assessment of whether the treatment is or is not being effective.
  • the method may be used so as to allow comparison of the efficacy of different treatments, as well as monitoring the subject's remission status and recurrence of the disease. As such, an informed decision may be made on whether the patient may benefit from further treatment (perhaps with an alternative drug or drug regimen).
  • in Ph+ leukaemia and other ABL1 translocation-associated neoplasms, the continual absence of residual leukaemia could identify patients where TKI treatment can be stopped with a limited probability of relapse.
  • the method further comprises the step of:
  • step (vi) where at least one disease-associated sequence mutation is identified within the consensus sequences, the method indicates the presence of minimal residual disease. Conversely, if no disease-associated sequence mutation is identified within the consensus sequences, the method indicates that the subject may be free of disease.
  • minimal residual disease may be detected by identifying and/or enumerating all sequenced amplicons comprising a common UMID.
  • the amplicons may be representative of the presence in the sample of BCR-ABL1 cDNA molecules.
  • the cDNA sample would preferably be prepared from mRNA isolated from a white blood cell pellet and, as such, the BCR-ABL1 cDNA molecules would represent transcripts of the BCR-ABL1 fusion gene.
  • the method of this further embodiment enables the detection and/or quantification of minimal residual disease.
  • the present invention provides a kit for use in a method of the present invention.
  • a kit may comprise, for example, appropriate primer molecules, and/or buffer solutions, preparations of deoxyribonucleotide triphosphates (dNTPs) etc.
  • SMCS Single Molecule Consensus Sequencing
  • Individual BCR-ABL1 cDNA molecules were tagged with a unique molecular identifier (UMID) sequence using a two (2) cycle primer extension reaction performed using the polymerase chain reaction (PCR) technique and a robust high-fidelity DNA polymerase enzyme, and a set of BCR-ABL1-specific primers (nb. the forward primer is dependent on the type of BCR-ABL1 transcript being targeted); namely:
  • the forward primer binds to BCR at nucleotides c.2645 to c.2668 (NM_021574) within exon e13.
  • the reverse primer consists of sequence complementary to ABL1 (NM_005157.5) at the 3' end (at nucleotides c.1224 to c.1243 within exon 7), flanked by the UMID and a universal sequence (ie a portion of the Illumina sequencing adaptor, underlined) at the 5' end to allow amplification using a universal primer in subsequent steps.
  • the UMID consisted of 15 or 18 randomised nucleotides, generating >1 billion or >60 billion distinct sequences, respectively.
  • the forward primer in this case binds to BCR at nucleotides c.1116 to c.1137 within exon e1.
  • Tagging with UMID was performed using BCR-ABL1 cDNA generated from ~0.5 µg of the total RNA and 400 nM of each primer in a 25 µL reaction. NEBNext High-Fidelity 2X PCR master mix (New England BioLabs Inc, Ipswich, MA, United States of America) was used for all PCRs. The UMID-tagged cDNA molecules are about 1.5 kb in length. Following UMID tagging, excess UMID-containing primers were degraded by incubation with 60U of Exonuclease I at 37°C for 60 mins. The Exonuclease I was then heat inactivated (95°C for 5 min).
  • the uniquely-tagged BCR-ABL1 molecules were amplified using 18-28 cycles of PCR with the BCR forward primer and a reverse primer complementary to the universal sequence in the UMID-containing primer (5' ACACTCTTTCCCTACACGACGCTC; SEQ ID NO: 4) in a 50 µL reaction.
  • the products were purified with 0.6X AMPure® XP beads (Agilent
  • Excess primers were degraded using Exonuclease I, and then a final five (5) cycle PCR was performed to incorporate sample indexes ([i7] and [i5] to allow sample pooling for sequencing; Illumina Inc.) and sequences for binding the Illumina flow cell:
  • haplotype within the original samples, the consensus sequences were collated into unique sequences.
  • The most frequent error was G→T nucleotide changes, which is consistent with mis-incorporation of an A opposite an 8-oxo-guanine during the first round of PCR.
  • 8-oxo-guanine is prevalent in ancient DNA and is caused by nucleic acid oxidation during sample storage.
  • G→T nucleotide changes were most frequent in samples which had been stored for long periods of time (nb. most RNA samples had been stored for over 5 years) and were collected into PAXgene® tubes (Qiagen).
  • the second most prevalent nucleotide changes were G→A, which are caused by cytosine deamination to uracil causing mis-incorporation of an A opposite the uracil during the first round of PCR.
  • This nucleotide change was commonly observed at specific genomic positions, suggesting that it is likely due to in vivo cytosine deamination by RNA editing enzymes, such as those of the APOBEC family.
  • Parent nodes were therefore interpreted as representing the clonal diversity of interest within the biological sample, with children nodes representing uninteresting changes of parent haplotypes.
  • mock samples were created by mixing compound mutant plasmids or patient samples (cDNA or RNA) with different BCR-ABL1 mutations (up to 5 replicates each of a total of 14 mock samples were examined). Examination of the raw sequencing reads of the mock samples revealed a complex spectrum of mutants, similar to previous clinical reports (Soverini et al., 2013; Khorashad et al., 2013). SMCS, however, enabled bioinformatic filtering of these artefacts, largely eliminating PCR amplification and sequencing errors, and exclusively reported the compound and polyclonal mutants known to be present in the mock samples.
  • a mock sample was generated by mixing 5 plasmids containing different compound mutations at various ratios with a plasmid containing unmutated BCR-ABL1.
  • the mock sample contained 65% unmutated BCR-ABL1 and 35%, 1%, 0.1%, 0.05% and 0.01% of each of the 5 compound mutations.
  • Six replicates of the mock sample were examined (between 1,741 and 3,721 BCR-ABL1 molecules per replicate) and it was found that it was possible to detect compound mutants present at a frequency of 0.1% or greater (ie reproducible detection of 3 of the 5 plasmids containing compound mutations).
  • the amplicon NGS method detected 36 compound mutants within the 25 patients. Of the 32/36 mutations that were present within the region examined by SMCS, only eight (8) mutations were detected. Based on observations previously published in Parker et al., 2014, 16 of the 24 compound mutants that were not detected by SMCS were considered to likely represent PCR
  • the SMCS method was further evaluated by examining samples of 91 imatinib-resistant CML patients for which extensive examination of the BCR-ABL1 kinase domain had been previously performed using Sanger sequencing (detection limit ~10%) and a mass-spectrometry based mutation assay (detection limit ~0.2%) (Parker et al., 2011). Neither Sanger sequencing nor the mass-spectrometry assay is able to distinguish compound mutations from polyclonal mutations.
  • the samples examined were collected immediately before starting (ie "baseline") treatment with a second-line TKI (nilotinib or dasatinib), and clinical and molecular follow-up data was available.
  • By using samples of BCR-ABL1 cDNA molecules (eg generated from patient mRNA), the method not only allows examination of multiple exons of sequence using a current, clinically applicable, sequencing platform, but also abrogates the need for patient-specific primers to isolate their unique BCR-ABL1 gene fusion molecules, a necessary step to enable cost-effective sequencing of the fusion molecules which may be scarce within a clinical sample.
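The UMID lengths and molecule counts quoted in the extracts above (15 or 18 randomised nucleotides; between 1,741 and 3,721 molecules per replicate; detection at 0.1% or greater) can be checked with some simple arithmetic. The following is a minimal sketch only, not part of the described method: it assumes UMIDs are drawn uniformly at random from A, C, G and T, and that molecules are sampled independently (a binomial model). The function names and the `min_molecules` parameter are hypothetical.

```python
import math


def umid_space(length):
    """Number of distinct UMIDs of a given length over the alphabet A, C, G, T."""
    return 4 ** length


def collision_probability(n_molecules, umid_length):
    """Approximate chance that at least two molecules share a UMID.

    Uses the birthday-problem approximation 1 - exp(-n(n-1) / (2N)),
    assuming UMIDs are uniformly random (an assumption, not a source claim).
    """
    expected_pairs = n_molecules * (n_molecules - 1) / (2 * umid_space(umid_length))
    return -math.expm1(-expected_pairs)


def detection_probability(n_molecules, mutant_fraction, min_molecules=1):
    """Chance that at least `min_molecules` sampled molecules carry the mutant.

    Binomial model; `min_molecules` is a hypothetical calling threshold.
    """
    p_fewer = sum(
        math.comb(n_molecules, k)
        * mutant_fraction ** k
        * (1 - mutant_fraction) ** (n_molecules - k)
        for k in range(min_molecules)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    for length in (15, 18):
        print(length, umid_space(length), collision_probability(3721, length))
    for fraction in (0.001, 0.0005, 0.0001):
        print(fraction, detection_probability(1741, fraction))
```

Under these assumptions, 4^15 and 4^18 reproduce the ">1 billion" and ">60 billion" figures, and UMID collisions are improbable at the molecule counts reported. The binomial estimate also suggests why mutants below about 0.1% would be hard to detect reproducibly when only a few thousand molecules are examined.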

Abstract

Methods of detecting mutations, particularly rare sequence mutations, within the kinase domain (KD) of a fusion gene comprising ABL1 are disclosed. These methods involve the use of a novel technique, termed Single Molecule Consensus Sequencing (SMCS). The methods particularly enable the detection of compound mutations present on the same BCR-ABL1 polynucleotide molecule that may be causative of an altered activity of an encoded protein, polypeptide or protein domain, which may in turn, be the cause of chronic myeloid leukaemia (CML) or acute lymphoblastic leukaemia (ALL) disease or, otherwise, of some disease- or treatment-associated characteristic (eg disease stage or drug resistance).

Description

DETECTING SEQUENCE MUTATIONS
IN LEUKAEMIC FUSION GENES
TECHNICAL FIELD
[0001] The present invention relates to methods of detecting mutations, particularly rare sequence mutations, in polynucleotide molecules. These methods involve the use of a novel technique, Single Molecule Consensus Sequencing (SMCS).
PRIORITY DOCUMENT
[0002] The present application claims priority from Australian Provisional Patent Application No 2014904452 titled "Detecting mutations" filed on 5 November 2014, the content of which is hereby incorporated by reference in its entirety.
BACKGROUND
[0003] Chronic myeloid leukaemia (CML), otherwise known as chronic granulocytic leukaemia (CGL), is a relatively rare disease that affects the blood and bone marrow. If left untreated, the disease is invariably fatal within 3 to 5 years. In Australia, around 330 new cases of CML are diagnosed each year (Leukaemia Foundation Australia), mostly in adult patients over the age of 50. In most instances, the diagnosis is made during a relatively benign chronic phase (the "chronic phase"). Subsequently, the disease progresses through an "accelerated phase" to a terminal "blastic phase" or "blast crisis" phase that is generally refractory to therapy (Goldman et al., 2003). The disease is characterised by the overproduction of granulocytes (also referred to as blasts or leukaemic blasts) in the bone marrow.
[0004] CML occurs when a bone marrow stem cell develops a new and abnormal chromosome referred to as the Philadelphia (Ph) chromosome. What causes this chromosome to appear in some people is unknown (although in some cases it may have arisen due to radiotherapy for other cancers), and it is not familial nor can it be passed to offspring. The Ph chromosome comprises the fusion gene, BCR-ABL1, which represents the central molecular pathology of CML. This gene encodes the BCR-ABL1 fusion protein, which is a constitutively-activated tyrosine kinase that aberrantly activates a series of molecular pathways causing deregulated cell proliferation, differentiation, DNA repair and apoptosis (Melo et al., 2007).
[0005] In the past, CML patients were commonly treated by bone marrow transplant. However, now, most CML patients are effectively treated using a BCR-ABL1 tyrosine kinase inhibitor (TKI), particularly imatinib (Glivec®; Novartis Pharmaceuticals Pty Ltd, North Ryde, NSW, Australia), with most patients able to live a normal life and reach a normal life span. However, resistance to imatinib is known to occur in 10-30% of all patients; the most commonly identified cause being the presence of somatic mutations within the kinase domain (KD) of the BCR-ABL1 protein (Branford et al., 2009) which interfere with drug binding and thereby lead to the reactivation of kinase activity.
[0006] More than 100 TKI-resistant mutations have now been described in CML patients (Branford et al., 2009). This has driven the development of more potent TKIs. Now available are the "second-generation" TKIs, nilotinib (Tasigna®; Novartis Pharmaceuticals Pty Ltd) and dasatinib (Sprycel®; Bristol-Myers Squibb Company, Princeton, NJ, United States of America) which, between them, are active against most imatinib-resistant BCR-ABL1 protein mutants. A notable exception however, is the most commonly detected mutation, T315I, which confers resistance to imatinib, nilotinib and dasatinib. Further, it has been found that approximately 25% of all imatinib-resistant patients have more than one BCR-ABL1 mutation (Branford et al., 2009). This has been shown by the present applicants to be associated with advanced CML disease and inferior therapeutic outcome (Parker et al., 2012).
Additionally, it has been found that where a patient has been treated sequentially with TKI drugs (eg treated initially with imatinib and then treated with nilotinib), the sequential TKI therapy can actually "select" for patients with "compound mutant" clones having more than one mutation within the same BCR-ABL1 molecule. Such compound mutants have been found to show altered oncogenic potency and drug resistance profiles compared to single mutants (O'Hare et al., 2009).
[0007] Ponatinib (Iclusig™; ARIAD Pharmaceuticals Inc, Cambridge, MA, United States of America) is a "third-generation" TKI that inhibits all known BCR-ABL1 protein mutants, including T315I, at clinically achievable doses (O'Hare et al., 2009). However, ponatinib therapy has been associated with a relatively high incidence of side-effects, especially arterial thrombotic events such as coronary and peripheral vascular disease (Cortes et al., 2013). Moreover, in vitro studies have demonstrated that certain compound mutants are still likely to cause resistance (O'Hare et al., 2009; Zabriskie et al., 2014).
However, the clinical significance of this finding is unclear, as there is currently no established method to sensitively and specifically ascertain compound mutations in clinical samples and differentiate them from mutations present in different clones (Zabriskie et al., 2014). Indeed, in studies conducted by the present applicants using a sensitive mass spectrometry-based mutation detection assay (Parker et al., 2013), it was demonstrated that the outcome for ponatinib-treated chronic phase patients with the T315I mutation was inferior if one or more additional mutations were detectable; however, it could not be determined whether those mutations were present in individual clones or as compound mutants. In more recent studies, it was reported that there is a clinical association between compound mutants and ponatinib resistance (Zabriskie et al., 2014). Using a laborious cloning and Sanger sequencing method with a sensitivity of ~20%, nine different compound mutants were detected in 30 patients who discontinued ponatinib therapy; six (6) of these were associated with disease relapse and three (3) were not. Of the former, five (5) patients presented with BCR-ABL1 including the T315I mutation. These clinical outcomes were in accordance with the expected sensitivities to TKIs of the compound mutants as predicted from in vitro assessments (Parker et al., 2013).
[0008] A number of strategies are now being explored to circumvent TKI resistance associated with BCR-ABL1 kinase domain mutations. One promising approach involves the use of inhibitors that bind BCR-ABL1 in a non-ATP-competitive manner. The compound denoted as ABL001 (Novartis AG, Basel, Switzerland) is an example of such an inhibitor. This compound, which is currently being tested in phase 1 clinical studies, binds to a distinct allosteric site of the BCR-ABL1 protein, exerting an auto-inhibitory effect. When used in combination with another TKI (eg nilotinib), in vitro studies have demonstrated the potential to inhibit T315I-mutant BCR-ABL1, but unfortunately, it appears that certain compound mutants are nevertheless still likely to confer resistance (Zhang et al., 2010).
[0009] Although compound mutations are emerging as an important mechanism of CML treatment resistance, there is at present no established method for identifying them sensitively and specifically in the clinical diagnostic setting. In particular, compound mutations cannot presently be distinguished from multiple single mutants using conventional direct sequencing, which is the current standard and recommended technique for BCR-ABL1 mutation analysis (Hughes et al., 2006). Present methods used to determine BCR-ABL1 compound mutation status in CML patients require examination of PCR amplicons, either by cloning and Sanger sequencing (Zabriskie et al., 2014) or using next generation sequencing (NGS) under the assumption that a single clone or sequencing read equates to a single BCR-ABL1 molecule (Soverini et al., 2013; Khorashad et al., 2013). However, studies employing such methods have consistently revealed a high prevalence of compound mutants in patients harbouring multiple mutations (around 70%). Moreover, in most of the patients examined, the clonal architecture was surprisingly (and implausibly) complex, with identical mutations found to be both components of compound mutants and present as single mutants. Subsequent attempts to reconstruct the phylogeny have suggested that the exact same nucleotide substitution has occurred independently at multiple times in distinct leukaemic lineages within individual patients; however this is unlikely, as independent acquisition of the same mutation has rarely been described, and never to the extent reported in CML.
[0010] In addition to CML, the BCR-ABL1 fusion gene as derived from the Ph chromosome is also central to the pathogenesis of other haematological neoplasms. For example, BCR-ABL1 is not uncommonly found in a subtype of adult acute lymphoblastic leukaemia (ALL), associated with the older patient, and adverse prognosis. Although stem cell transplantation after remission induction with intensive chemotherapy remains an important pillar of management of ALL, TKIs are also increasingly recognised as a valuable addition to this standard of care (Chalandon et al., 2015). Like the situation with CML, the increased use of TKI for ALL is also associated with KD mutations as an evolving cause of disease resistance. Indeed, the range of KD mutations observed at TKI therapeutic failure, and the treatment sensitivity associated with each particular mutant, are similar to that observed in CML, with an important distinction: KD mutations in Ph+ ALL are more commonly found at treatment failure, and more frequently confer resistance to both first and second generation TKIs. BCR-ABL1 mutants that confer high level resistance, such as T315I, E255K and Y253H, are more common in Ph+ ALL as compared to CML, as are compound mutations (Zabriskie et al., 2014; Soverini et al., 2014). In addition, the association of the Ph chromosome and acute myeloid leukaemia (Ph+ AML) has been described, and the use of imatinib in these cases has also been reported to be associated with KD mutations as a pathway of treatment resistance (Reboursiere et al., 2015).
[0011] It is widely accepted that techniques used to interrogate the BCR-ABL1 KD in the ascertainment of treatment resistant mutations in CML are also applicable and transferable to Ph+ ALL and Ph+ AML without, or with minimal, further modification. These diseases are referred to hereinafter as Ph+ leukaemias.
[0012] Although the fusion between BCR and ABL1 is the most commonly encountered molecular pathology that leads to a cancer phenotype associated with constitutive activation of ABL1, other gene fusions comprising ABL1 have also been observed in ALL cases. These cases are associated with inferior outcomes and, together with other aberrantly activating mutations in kinases, make up the group of what is commonly referred to as Ph-like ALL (Roberts et al., 2014). TKI treatment has been demonstrated to be associated with treatment responses in this disease (specifically a case involving a translocation between ETV6 and ABL1) (Yeung et al., 2015). Comprehensive surveys of Ph-like ALL will lead to a more comprehensive understanding of the range of molecular lesions associated with this disease. Already, fusion partners such as EML1, NUP214, ZMIZ1, RCSD1, SFPQ, FOXP1, GAG and SNX2 have been identified in association with ABL1 (Greuber et al., 2013). KD mutations, similar to those reported in the BCR-ABL1 fusion gene, are expected to be reported in this disease as TKI use becomes more widespread. Indeed, a case of T315I-mediated dasatinib resistance has already been reported (Yeung et al., 2015). Techniques used to interrogate the BCR-ABL1 KD in the ascertainment of treatment resistant mutations in CML are therefore considered to be also applicable to the study of resistance mechanisms in other ABL1 translocation-associated neoplasms.
[0013] The present applicants have now demonstrated that the mutant complexity in Ph+ leukaemias that has been reported to date is a consequence of PCR recombination artefacts that mimic compound mutants, thereby leading to inaccurate assessment of mutation status using present methods (Parker et al., 2014). This suggests that the frequency of compound mutants may have been markedly overestimated in previous clinical studies.
[0014] Accordingly, due to the limitations of the present methods, there exists a need for a novel method to determine BCR-ABL1 compound mutation status in patients with Ph+ leukaemias such as CML, wherein the method is able to accurately and sensitively discriminate between compound mutants and multiple single mutants.
SUMMARY
[0015] In one aspect, the present invention provides a method for identifying and/or enumerating sequence mutations within the kinase domain (KD) of a fusion gene comprising ABL1 or a portion thereof encoding all or substantially all of the KD, wherein the method of the present invention comprises the steps of:
(i) providing a sample comprising RNA transcripts of a fusion gene comprising ABL1 or cDNA molecules produced from transcripts of a fusion gene comprising ABL1;
(ii) performing a primer extension reaction with a first primer pair comprising forward and reverse primer molecules targeted so as to generate polynucleotide molecules comprising a polynucleotide sequence corresponding to all of the fusion gene or a portion thereof comprising a target region spanning a fusion gene breakpoint, an adjacent upstream sequence and a downstream KD-encoding portion of the ABL1 sequence, wherein one of said primer molecules comprises a short sequence of random nucleotides providing an individual unique molecular ID tag (UMID), along with a fusion gene-specific sequence and a universal 5' tail sequence, to thereby tag each of the generated polynucleotide molecules with an individual UMID;
(iii) amplifying the UMID-tagged polynucleotide molecules to generate UMID-tagged amplicons comprising a polynucleotide sequence corresponding to all or substantially all of the said KD-encoding portion;
(iv) sequencing the UMID-tagged amplicons; and
(v) identifying a consensus sequence for all sequenced amplicons comprising a common UMID.
[0016] In a further aspect, the present invention provides a kit for use in the method of the present invention. Such a kit may comprise, for example, appropriate primer molecules, and/or buffer solutions, preparations of deoxyribonucleotide triphosphates (dNTPs) etc.
BRIEF DESCRIPTION OF FIGURES
[0017] Figure 1 provides: (A) a schematic diagram outlining an embodiment of the method according to the present invention, employing the use of a novel technique, Single Molecule Consensus Sequencing (SMCS). The diagram depicts how, by uniquely tagging individual BCR-ABL1 transcripts prior to PCR amplification, the method enables the identification and/or enumeration of those individual transcripts and hence differentiation between compound and polyclonal mutations; and (B) provides tabulated results comparing compound mutations detected by the method depicted in (A) and amplicon next generation sequencing (NGS) using Ion Torrent. Only mutations within the region examined by the SMCS are summarised (nb. four (4) mutations were outside of this region).
DETAILED DESCRIPTION
[0018] Compound mutations have been clinically associated with treatment failure and drug resistance in patients suffering from Ph+ leukaemias (eg CML) and other diseases and conditions. Further, with the development of more potent drugs for use in the "salvage" setting leading to sequential treatment of patients with different drugs and/or drug combinations, the phenomenon of compound mutations is likely to become an increasingly significant clinical problem. Moreover, the potential to inaccurately classify compound mutations could have dire consequences for patients; for example, patients could miss out on life-saving therapy or, conversely, they could experience rapid disease progression and be at risk of serious side effects from non-effective therapy at great cost to the health system. The present invention provides a novel method which allows for the accurate and sensitive discrimination between compound mutants and multiple single mutants in patients with ABL1 fusion gene-driven diseases such as Ph+ leukaemias. This method involves a technique herein termed Single Molecule Consensus Sequencing (SMCS).
[0019] SMCS enables the identification and enumeration of sequence mutations (eg point mutations) present in RNA, cDNA or gDNA that may be associated with, for example, disease or treatment outcomes (eg TKI drug resistance). The technique is particularly suitable for rare sequence mutations; however, the technique may also be applied to the identification and enumeration of more commonly arising sequence mutations (eg mutations that may be present in all cells of a particular cancer). In the context of rare sequence mutations, SMCS enables one to distinguish rare mutations that may only be present in, for example, some cells from a particular cancer from mutations that may be caused by polynucleotide molecule amplification and/or sequencing reactions (ie artefacts). The SMCS technique also enables compound mutations to be distinguished from instances where there are multiple versions of the relevant polynucleotide molecule (eg within a sample of a particular cancer), each with one different mutation. As will be appreciated from the above, the problem with detecting compound mutations is that artefacts introduced during the amplification and/or sequencing reactions used in present methods can make it appear that two different mutations in the same gene are present in the same cell, when in fact they are present in separate cells. The SMCS technique involves the use of "unique molecular ID tags" (UMIDs) or molecular "barcode" sequences to tag all of the polynucleotide molecules of interest within a sample. The UMID-tagged molecules are then copied and the sequence of the copies (ie amplicons) is obtained by, preferably, NGS. The sequence of the original polynucleotide molecule is inferred from the consensus of the copies with the same UMID-tag sequence to thereby overcome misleading results caused by artefact mutations introduced during the amplification and/or sequencing reactions.
[0020] The present invention is particularly described with reference to the BCR-ABL1 gene and chronic myeloid leukaemia (CML). However, the invention is considered to be more broadly applicable to other disease associated ABL1 fusion genes, including but not limited to the BCR-ABL1-like fusion gene associated with some forms of acute lymphoblastic leukaemia (ALL), namely ETV6 (TEL)-ABL1 which, in one described example, is a fusion of sequences from exon 5 of the ETV6 (TEL) gene (ie the ETS variant 6 gene) and exon 2 (also known as a2) of the ABL1 gene (Yeung et al., 2015). Like BCR-ABL1, it has been found that mutation in the KD (particularly, T315I) has resulted in TKI resistance (Yeung et al., 2015).
[0021] In one aspect, the present invention provides a method for identifying and/or enumerating sequence mutations (such as rare sequence mutations) within the kinase domain (KD) of a fusion gene comprising ABL1 or a portion thereof encoding all or substantially all of the KD. The fusion gene may be associated with a disease (eg BCR-ABL1 associated with CML, Ph+ ALL or Ph+ AML, or a fusion gene characteristic of another ABL1 translocation-associated neoplasm; for example the ETV6 (TEL)-ABL1 fusion gene associated with some forms of ALL). The method of the present invention therefore comprises the steps of:
(i) providing a sample comprising RNA transcripts (ie mRNA) of a fusion gene comprising ABL1 or cDNA molecules produced from transcripts of a fusion gene comprising ABL1;
(ii) performing a primer extension reaction (eg 2-6 cycles) with a first primer pair comprising forward and reverse primer molecules targeted so as to generate polynucleotide molecules comprising a polynucleotide sequence corresponding to all of the fusion gene or a portion thereof comprising a target region spanning a fusion gene breakpoint, an adjacent upstream sequence (eg of at least 20 nucleotides in length) and a downstream KD-encoding portion of the ABL1 sequence (eg encoding exons 2 to 10, or 4 to 7), wherein one of said primer molecules (preferably the reverse primer or the primer closest to the KD-encoding portion) comprises a short sequence of random nucleotides providing an individual unique molecular ID tag (UMID), along with a fusion gene-specific sequence and a universal 5' tail sequence, to thereby tag each of the generated polynucleotide molecules with an individual UMID;
(iii) amplifying the UMID-tagged polynucleotide molecules (eg 18-28 cycles) to generate UMID-tagged amplicons comprising a polynucleotide sequence corresponding to all or substantially all of the said KD-encoding portion;
(iv) sequencing the UMID-tagged amplicons; and
(v) identifying a consensus sequence for all sequenced amplicons comprising a common UMID.
[0022] The method enables the identification (ie detection) of, for example, mutations (including rare sequence mutations and compound mutations) present in the fusion gene transcripts, which may be associated with disease (eg a Ph+ leukaemia such as CML) or, otherwise, with some disease- or treatment-associated characteristic (eg disease stage or drug resistance, particularly TKI resistance). The method may also enable, for example, rare sequence mutations to be distinguished from artefact mutations arising from amplification and/or sequencing reactions. As will be apparent from the above, avoiding such misleading results can be of considerable clinical significance. For example, by being certain of the particular mutations present in a Ph+ leukaemia or another ABL1 translocation-associated neoplasm (particularly one where resistance has been observed to the preferred drug(s) or treatment(s) thereof), valuable and "personalised" information may be generated to assist in the appropriate selection of available treatments (eg the selection of an appropriate treatment where particular mutations are associated with TKI drug resistance).
[0023] The term "rare sequence mutation" as used herein, is to be understood as referring to a "low level" mutation that may show a frequency of < 10% (eg a mutation that is present in less than 10% of individuals with an ABL1 fusion gene). Such rare sequence mutations may, for example, be present in all tumour cells in a sample or only in a portion or sub-population of the tumour cells in a sample. Further, a rare sequence mutation may consist of, for example, a point mutation (eg a single nucleotide variant; SNV), or an insertion or deletion mutation. Moreover, rare sequence mutations may represent polyclonal mutations or compound mutations.
[0024] The term "compound mutation" will be well understood by those skilled in the art and refers to one of two or more mutations present on the same polynucleotide molecule (ie one of multiple mutations present on the same polynucleotide molecule). These mutations may "compound" to cause, for example, an altered activity of an encoded protein, polypeptide or protein domain, which may in turn be the cause of disease or, otherwise, of some disease- or treatment-associated characteristic (eg disease stage or drug resistance). In contrast to a compound mutation, a "polyclonal mutation" will be understood by those skilled in the art as referring to one of two or more mutations present on different copies of a polynucleotide molecule (ie one of multiple single mutations found on different copies of a particular polynucleotide molecule).
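The distinction drawn above between compound and polyclonal mutations can be illustrated with a minimal Python sketch. It assumes that each original molecule has already been reduced to the set of mutations found in its consensus sequence; the function name `describe_sample` and the mutation label E255K are hypothetical and used only for illustration (T315I is taken from the description above).

```python
def describe_sample(molecule_mutations):
    """Classify a sample from per-molecule mutation sets (illustrative only).

    molecule_mutations: list of sets, one set of mutation labels per
    original (UMID-tagged) molecule; an empty set means wild-type.
    """
    # All distinct mutations seen anywhere in the sample.
    distinct = set().union(*molecule_mutations)
    # A compound mutation requires two or more mutations on ONE molecule.
    if any(len(muts) >= 2 for muts in molecule_mutations):
        return "compound"
    # Two or more mutations, each on a different molecule, are polyclonal.
    if len(distinct) >= 2:
        return "polyclonal"
    if len(distinct) == 1:
        return "single"
    return "wild_type"


# Both samples contain the same two mutations, but arranged differently.
same_molecule = [{"T315I", "E255K"}, set(), set()]
different_molecules = [{"T315I"}, {"E255K"}, set()]
print(describe_sample(same_molecule))
print(describe_sample(different_molecules))
```

A bulk assay that only reports which mutations are present in the sample could not separate these two cases; the per-molecule sets are what make the classification possible.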
[0025] The sample used in step (i) of the method comprises mRNA or complementary DNA (cDNA) prepared from any suitable body sample obtained from, for example, blood, serum, plasma, or the like (eg tumour tissue sample).
[0026] Preferably, the sample used in step (i) comprises cDNA. Those skilled in the art will readily appreciate that cDNA refers to a DNA molecule that has a nucleotide sequence that is complementary to a molecule of messenger RNA (mRNA) which may be synthesised with reverse transcriptase using the mRNA as template. The cDNA does not contain intron sequences. The sample for use in the method of the present invention may comprise cDNA as prepared by any of the methods well known to those skilled in the art. The cDNA molecules present in the sample may be tagged with an individual unique molecular ID tag (UMID) in step (ii) by conducting a primer extension reaction (eg using a high-fidelity DNA polymerase enzyme) with a first primer pair comprising forward and reverse primer molecules, wherein one of said primer molecules (preferably the reverse primer) comprises a short sequence of random nucleotides providing the UMID, along with a fusion gene-specific sequence (ie a first fusion gene-specific sequence) and a universal 5' tail sequence. The other primer molecules will also comprise a fusion gene-specific sequence (ie a second fusion gene-specific sequence). The forward and reverse primer molecules are selected so that the primer extension reaction generates polynucleotide molecules comprising a polynucleotide sequence corresponding to at least the target region of the fusion gene. For example, the first fusion gene-specific sequence of the reverse primer may bind at the 3' end of the ABL1 sequence within exon 7 or 9, while the second fusion gene-specific sequence of the forward primer may bind within the adjacent upstream sequence of the other gene member of the fusion gene (eg within exon e1 or e13 of BCR or within exon 4 of ETV6 (TEL)). It will be understood by those skilled in the art that the fusion gene-specific sequences of the forward and reverse primers may allow for code degeneracy (ie the primer molecules may be degenerate primers), or otherwise, the primer extension reaction may include multiple primers as required, to ensure tagging of all of the cDNA molecules.
[0027] On the other hand, where the sample used in step (i) comprises mRNA, the mRNA molecules may be tagged with an individual unique molecular ID tag (UMID) in step (ii) by conducting a primer extension reaction with a first primer pair comprising forward and reverse primer molecules (eg using a reverse transcriptase enzyme followed by synthesis of the second strand with a high-fidelity DNA polymerase enzyme), wherein one of said primer molecules (preferably the reverse primer) comprises a short sequence of random nucleotides providing the UMID, along with a fusion gene-specific sequence (ie a first fusion gene-specific sequence) and a universal 5' tail sequence. The other primer molecules will also comprise a fusion gene-specific sequence (ie a second fusion gene-specific sequence). The forward and reverse primer molecules are selected so that polynucleotide molecules are generated which comprise a polynucleotide sequence corresponding to at least the target region of the fusion gene. For example, the first fusion gene-specific sequence of the reverse primer may bind at the 3' end of the ABL1 sequence within exon 7 or 9, while the second fusion gene-specific sequence of the forward primer may bind within the adjacent upstream sequence of the other gene member of the fusion gene (eg within exon e1 or e13 of BCR or within exon 4 of ETV6 (TEL)). Again, the fusion gene-specific sequences of the forward and reverse primers may allow for code degeneracy, or otherwise, the primer extension reaction may include multiple primers as required to ensure tagging of all of the mRNA molecules.
[0028] The primer extension reaction may be conducted using any one of the suitable methodologies well known to those skilled in the art. Preferably, the primer extension reaction will comprise a two (2) cycle reaction. The UMID sequences may be provided by generating random nucleotide sequences of, for example, 10-25 nucleotides in length (preferably, 15-20 nucleotides in length, and more preferably, 18 nucleotides in length). The primer molecules comprising the UMID bind to the cDNA/mRNA (through the complementary fusion gene-specific sequence) in a simple manner forming a regular duplex structure devoid of any significant loop structure and, as such, those skilled in the art will understand that the method of the present invention does not employ molecules such as inversion probes (eg single molecule Molecular Inversion Probes (smMIP) described by Hiatt et al., 2013). The product of step (ii) is a reaction mixture wherein each one of the cDNA/mRNA molecules present in the sample is tagged with an individual UMID. Those skilled in the art will understand that following the UMID tagging, excess primers comprising the UMID are preferably degraded using any suitable methodology (eg by incubation with 60U of Exonuclease I at 37°C for 60 mins).
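The effect of UMID length on the number of available tags can be worked through with a short Python sketch. This is an illustration only: the function names are hypothetical, the placement of the UMID between the universal 5' tail and the 3' gene-specific sequence is an assumption about primer layout, and the collision estimate assumes that every random tag is equally likely.

```python
import random


def random_umid(length=18, rng=None):
    """Draw a random UMID of the given length over the alphabet A, C, G, T."""
    rng = rng or random.Random()
    return "".join(rng.choice("ACGT") for _ in range(length))


def build_tagging_primer(universal_tail, umid, gene_specific):
    """Assemble 5'-[universal tail][UMID][gene-specific sequence]-3'.

    The UMID position between the two other elements is an assumption.
    """
    return universal_tail + umid + gene_specific


def tag_space(length):
    """Number of distinct UMIDs of a given length (4 choices per position)."""
    return 4 ** length


def expected_colliding_pairs(n_molecules, length):
    """Expected number of molecule pairs that share a UMID by chance,
    assuming uniformly random tags: C(n, 2) / 4**length."""
    return n_molecules * (n_molecules - 1) / (2 * tag_space(length))


# Compare the shortest, preferred and longest UMID lengths mentioned above.
for n in (10, 18, 25):
    print(n, tag_space(n), expected_colliding_pairs(100_000, n))
```

An 18-nucleotide UMID gives 4^18 (about 6.9 x 10^10) possible tags, so two molecules in a typical sample are very unlikely to receive the same tag by chance, whereas a 10-nucleotide UMID gives only about 10^6 tags.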
[0029] In step (iii), the amplification of the UMID-tagged polynucleotide molecules to generate UMID-tagged amplicons comprising a polynucleotide sequence corresponding to all or substantially all of the said KD-encoding portion, may be achieved using any of the suitable methodologies well known to those skilled in the art. Preferably, the amplification is performed using a standard polymerase chain reaction (PCR) amplification method (preferably a non-linear amplification method) using a pair of primers (ie forward and reverse primer molecules) defining the 5' and 3' ends of the desired polynucleotide sequence of the KD-encoding portion. By the words "defining the 5' and 3' ends", it is to be understood that the respective primer sequences hybridise to the 3' end of one strand (ie to thereby "define" the 3' end) and the 3' end of a complementary strand (ie to thereby "define" the 5' end) of the particular sequence so as to enable that sequence to be amplified. Conveniently, this may be achieved by using a primer pair comprising a first primer molecule that comprises a nucleotide sequence that is complementary to the sequence at one end of the desired polynucleotide sequence and a second primer molecule that comprises a nucleotide sequence that targets a standard universal 5' tail sequence. Those skilled in the art will understand that the sequence of the first primer molecule of such a primer pair may be 100% complementary to the 3' end sequence of the particular target nucleotide sequence (ie for a primer sequence of 20 nucleotides in length, each of the 20 nucleotides is perfectly complementary to the corresponding nucleotide of the particular target nucleotide sequence), or show a lesser degree of complementarity (eg 95% complementary; wherein for a primer sequence of 20 nucleotides in length there may be one "mismatch" nucleotide and 19 nucleotides that are perfectly complementary with the corresponding nucleotide of the particular target nucleotide sequence). Further, those skilled in the art will understand that the amplification of step (iii) may employ primer molecules that allow for code degeneracy (ie the primer molecules may be degenerate primers), or otherwise, the amplification may include multiple primers as required, to ensure amplification of all of the tagged polynucleotide molecules. Moreover, while typically the method will be conducted using a standard PCR amplification, in some circumstances, it may be preferred to perform the amplification step using a "nested" PCR amplification method using a further, "outside", pair of primers. Nested PCR amplification methods are well known to those skilled in the art.
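The 100% and 95% complementarity figures given above for a 20-nucleotide primer can be reproduced with a small Python sketch. The target sequence used here is an arbitrary placeholder, not a sequence from BCR-ABL1, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def percent_complementarity(primer, target_site):
    """Percentage of primer positions that base-pair with the target site.

    Both sequences are written 5'->3' and must be the same length;
    target_site is the stretch of the strand to which the primer anneals.
    """
    if len(primer) != len(target_site):
        raise ValueError("primer and target site must be the same length")
    perfect_primer = reverse_complement(target_site)
    matches = sum(p == q for p, q in zip(primer, perfect_primer))
    return 100.0 * matches / len(primer)


# Placeholder 20-nucleotide target site (illustrative only).
target = "ACGTACGTACGTACGTACGT"
perfect = reverse_complement(target)
# Introduce a single mismatch at the first primer position.
one_mismatch = ("C" if perfect[0] != "C" else "G") + perfect[1:]
print(percent_complementarity(perfect, target))
print(percent_complementarity(one_mismatch, target))
```

With 20 positions, each mismatch lowers complementarity by 5 percentage points, which is why one mismatch corresponds to the 95% figure.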
[0030] The design of primer molecules suitable for use in primer extension reactions and amplification reactions that may be used in the method of the first aspect, may be in accordance with techniques and guidelines well known to those skilled in the art (eg as described in Sambrook, J. and D. W. Russell, Molecular Cloning: a laboratory manual, Cold Spring Harbor Press, Third Edition (2001) at Chapter 8 (particularly Table 8-3), the entire disclosure of which is hereby incorporated by reference).
[0031] Preferably, the amplicons produced in step (iii) may be no more than about 1 kb in length (although longer amplicons may also be suitable) and, perhaps preferably, will be about 650-750 nucleotides in length comprising all or substantially all of the KD-encoding portion of the ABL1 sequence. By the term "substantially all", it is to be understood that the amplicons comprise at least two and, more preferably, at least three exons of the KD-encoding portion of the ABL1 sequence. Most preferably, the amplicons comprise at least four exons of the KD-encoding portion (eg exons 2 to 10, exons 4 to 10 or exons 4 to 7).
[0032] In step (iv), each of the UMID-tagged amplicons is sequenced, conveniently by using a next generation sequencing (NGS) platform such as, for example, 454 pyrosequencing (Roche Diagnostics Corporation, Branford, CT, United States of America), Illumina (Solexa) sequencing (Illumina Inc, San Diego, CA, United States of America), SOLiD sequencing (Life Technologies, Carlsbad, CA, United States of America) or, alternatively, Ion Torrent semiconductor sequencing (Life Technologies). The technique is also amenable for use with new and emerging sequencing technologies such as PacBio (Pacific Biosciences, Menlo Park, CA, United States of America), Oxford Nanopore (Oxford Science Park, Oxford, United Kingdom) or Qiagen GeneReader (Qiagen, Hilden, Germany).
[0033] In step (v), bioinformatic analysis is conducted on the sequences (or "reads") obtained in step (iv) to identify a consensus sequence for all sequenced amplicons comprising a common UMID (eg comprising a "Read Group" or read "family"). The consensus sequence information may then reveal any sequence mutations that were present in the KD-encoding portion of the polynucleotide molecules present in the sample, since mutations arising from the polynucleotide molecule amplification and/or sequencing reactions (ie artefact mutations) will only be present in a small number of the sequences (ie in a few reads only). In particular, the bioinformatic analysis identifies reads derived from a single initial fusion gene transcript by virtue of the common UMID. The consensus sequence of reads with a common UMID may be determined using automated variant calling and filtering algorithms, and represents the sequence of an initial cDNA/mRNA molecule present in the sample, thereby overcoming artefact mutations arising from the polynucleotide molecule amplification and/or sequencing reactions.
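The grouping of reads into Read Groups by common UMID may be sketched in Python as follows. This is a simplified illustration, not the bioinformatic pipeline itself: it assumes that any universal tail has already been trimmed so that each read begins with its UMID, that the UMID has a fixed known length, and that the UMID itself was read without error. The function name `group_reads_by_umid` is hypothetical.

```python
from collections import defaultdict


def group_reads_by_umid(reads, umid_length=18):
    """Group reads into Read Groups keyed by their leading UMID.

    Returns a dict mapping each UMID to the list of read sequences
    (with the UMID removed) that carried it. Reads too short to contain
    a UMID plus at least one further base are skipped.
    """
    groups = defaultdict(list)
    for read in reads:
        if len(read) <= umid_length:
            continue
        umid, insert = read[:umid_length], read[umid_length:]
        groups[umid].append(insert)
    return dict(groups)


# Toy reads with a 4-nucleotide UMID for readability.
reads = ["AAAAGATTACA", "AAAAGATTACA", "CCCCGATTGCA", "AAA"]
for umid, members in group_reads_by_umid(reads, umid_length=4).items():
    print(umid, len(members))
```

Each resulting group corresponds to one originally tagged molecule, and the number of reads in a group indicates how many times that molecule was sampled after amplification.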
[0034] Steps (iv) and (v) may be conducted concurrently.
[0035] As used herein, it is to be understood that the term "consensus sequence" refers to the order of the most frequent nucleotides found in the sequences (ie reads) of the UMID-tagged amplicons (eg as produced in step (iv) of the method of the first aspect of the present invention) comprising a common UMID. As indicated above, the consensus sequence for a given group of amplicons comprising a common UMID (ie Read Group) may reveal any sequence mutations that were present in the polynucleotide molecules present in the sample, since mutations arising from the polynucleotide molecule amplification and/or sequencing reactions (ie artefact mutations) will only be present in a small number of the sequences (ie in a few reads only) and will therefore not be represented in the consensus sequence. Moreover, since the consensus sequence is produced from amplicons ultimately generated from a single polynucleotide molecule in the sample, the presence of two or more mutations in the consensus sequence may identify compound mutations (ie two or more mutations present on the same polynucleotide molecule present in the sample). Mutations present in the consensus sequence may be recognised by sequence comparison against an appropriate reference sequence (eg the "wild-type" (WT) sequence), using standard methodologies and tools well known to those skilled in the art (eg Burrows-Wheeler sequence aligner (BWA); Li et al., 2010 and Genome Analysis Toolkit (GATK) such as that available from Appistry Inc, St Louis, MO, United States of America).
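The definition of a consensus sequence as the order of the most frequent nucleotides can be made concrete with a minimal Python sketch. It is a stand-in for, not a reproduction of, the BWA and GATK based analysis mentioned above: it assumes the reads in a Read Group are already aligned, are of equal length and contain no insertions or deletions. The function names and the strict-majority threshold are assumptions made for illustration.

```python
from collections import Counter


def consensus_sequence(reads, min_fraction=0.5):
    """Per-position majority consensus of equal-length, pre-aligned reads.

    A base is called only if it is present in more than min_fraction of
    the reads at that position; otherwise "N" is written.
    """
    if not reads:
        raise ValueError("at least one read is required")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be pre-aligned to the same length")
    called = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        called.append(base if count / len(reads) > min_fraction else "N")
    return "".join(called)


def call_mutations(consensus, reference):
    """List (1-based position, reference base, consensus base) differences,
    ignoring positions where no consensus base could be called."""
    return [
        (i + 1, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference, consensus))
        if ref != alt and alt != "N"
    ]


# One read carries an artefact at position 3; it is outvoted.
group = ["ACGT", "ACGT", "ACTT"]
cons = consensus_sequence(group)
print(cons, call_mutations(cons, "ACGT"))
```

Because an artefact appears in only a minority of the reads in a group, it does not reach the consensus, whereas a mutation present on the original molecule appears in essentially every read and does. Two or more entries returned by `call_mutations` for a single Read Group would indicate a candidate compound mutation.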
[0036] The use of UMIDs in the method of the present invention allows enumeration of the polynucleotide molecules of interest present in the sample. That is, during the sequencing step (iv) and/or consensus sequence identification step (v), the number of amplicons present with a common UMID can be readily ascertained. This may be done by bioinformatic analysis of the NGS reads. For example, reads comprising a common UMID are grouped into Read Groups (or "families") and aligned to a reference sequence using BWA. The variants within each Read Group are called using GATK. Variants passing bioinformatic filtering that are present in the majority of reads within each Read Group represent those in the initial amplified molecule, whereas those present in only a few reads are artefacts. The number of Read Groups with a consensus sequence bearing a WT or reference sequence versus groups bearing variants may then be expressed as a percentage or ratio to allow description of prevalence of each mutant present. Artificial, simulated samples created from known mixtures of WT/reference and mutant sequences may be used to empirically determine, and periodically calibrate, the sensitivity of the assay, or otherwise act as quality assurance. This may be particularly helpful when samples are of poor quality and/or quantity (eg some clinical samples such as formalin-fixed paraffin embedded sections, circulating tumour DNA samples and samples taken to assess minimal residual disease following treatment).
[0037] In one particular embodiment, the present invention provides a method for identifying and/or enumerating sequence mutations (such as rare sequence mutations) within the kinase domain (KD) of the BCR-ABL1 fusion gene associated with Ph+ leukaemias. The structure of BCR-ABL1 (encompassing numerous variants) has been well described in, for example, Melo and Chuah, 2007, the content of which is hereby incorporated by reference in its entirety. The KD-encoding portion of BCR-ABL1 comprises six exons of the ABL1 portion of the fusion gene denoted exon 4 to 10 (also a4 to a10). The chromosomal translocation point of BCR-ABL1, that is the breakpoint or junction at which the two genes are fused, commonly arises from the major breakpoint cluster region (M-bcr) between exons e12 and e16 of BCR and breakpoints of ABL within exon a2, and numerous fusion gene variants have been identified. The "typical" transcripts of BCR-ABL1 include transcripts denoted as e1a2 (a transcript produced from a fusion between exon e1 of BCR and exon a2 of ABL1), e13a2 and e14a2 (which can also arise from alternative splicing), but other BCR-ABL1 transcripts are also well known including rare or "atypical" e1a2 (arising from the minor breakpoint cluster region (m-bcr) of BCR), e1a3, e2a2, e6a2, e13a3, e14a3 and e19a2 variant forms. The e13a2 and e12a2 transcripts encode a BCR-ABL1 protein of 210 kDa, while the e1a2 transcript encodes for a BCR-ABL1 protein of 190 kDa and the e19a2 transcript encodes for a BCR-ABL1 protein of 230 kDa. The method of the present invention may be conducted in a manner to identify and/or enumerate sequence mutations (such as rare sequence mutations) within the kinase domain (KD) of one or more of the BCR-ABL1 fusion gene and/or transcript variants, including but not limited to those mentioned above. The method of the present invention therefore comprises the steps of:
(i) providing a sample comprising BCR-ABL1 transcripts (ie mRNA) or cDNA molecules;
(ii) performing a primer extension reaction (eg 2-6 cycles) with a first primer pair comprising forward and reverse primer molecules targeted so as to generate polynucleotide molecules comprising a polynucleotide sequence corresponding to all of the BCR-ABL1 fusion gene or a portion thereof comprising a target region spanning a fusion gene breakpoint, an adjacent upstream BCR-derived sequence (eg of at least 20 nucleotides in length) and a downstream KD-encoding portion of the ABL1 sequence (eg encoding exons 4 to 10 or exons 4 to 7), wherein one of said primer molecules (preferably the reverse primer or the primer closest to the KD-encoding portion) comprises a short sequence of random nucleotides providing an individual unique molecular ID tag (UMID), along with a BCR-ABL1-specific sequence (3') and a universal 5' tail sequence, to thereby tag each of the generated polynucleotide molecules with an individual UMID;
(iii) amplifying the UMID-tagged polynucleotide molecules (eg 18-28 cycles) to generate UMID-tagged amplicons comprising a polynucleotide sequence corresponding to all or substantially all of the said KD-encoding portion;
(iv) sequencing the UMID-tagged amplicons; and
(v) identifying a consensus sequence for all sequenced amplicons comprising a common UMID.
[0038] The sample used in step (i) of the method of this embodiment comprises BCR-ABL1 transcripts or cDNA (produced from BCR-ABL1 transcripts) prepared from any suitable body sample obtained from, for example, blood, serum, plasma, or the like. Preferably, the cDNA/mRNA sample used in step (i) of the method of this embodiment is prepared from a white blood cell pellet.
[0039] Preferably, the sample used in step (i) of the method of this embodiment comprises cDNA. The cDNA molecules may be tagged with an individual unique molecular ID tag (UMID) in step (ii) by conducting a primer extension reaction with a first primer pair comprising forward and reverse primer molecules, wherein one of said primer molecules (preferably the reverse primer) comprises a short sequence of random nucleotides providing the UMID, along with a BCR-ABL1-specific sequence (ie a first BCR-ABL1-specific sequence) and a universal 5' tail sequence. The other primer molecules will also comprise a BCR-ABL1-specific sequence (ie a second BCR-ABL1-specific sequence). The forward and reverse primer molecules are selected so that the primer extension reaction generates polynucleotide molecules comprising a polynucleotide sequence corresponding to at least the target region of the BCR-ABL1 fusion gene. For example, the first BCR-ABL1-specific sequence of the reverse primer may bind at the 3' end of the ABL1 sequence within exon 7 or 9, while the second BCR-ABL1-specific sequence of the forward primer may bind within the BCR sequence (eg within exon e1 or e13). The forward and reverse primers may allow for code degeneracy, or otherwise, the primer extension reaction may include multiple primers as required, to ensure tagging of all of the cDNA molecules. The primer extension reaction may preferably comprise a two (2) cycle reaction. The UMID sequences may be provided by generating random nucleotide sequences of, for example, 10-25 nucleotides in length (preferably, 15-20 nucleotides in length, and more preferably, 18 nucleotides in length). The primer molecules comprising the UMID bind to the cDNA (through the complementary BCR-ABL1-specific sequence) in a simple manner forming a regular duplex structure devoid of any significant loop structure (as such, the method of this embodiment does not employ molecules such as inversion probes). The product of step (ii) is a reaction mixture wherein each one of the cDNA molecules present in the sample is tagged with an individual UMID. Following the UMID tagging, excess primers comprising the UMID are preferably degraded using any suitable methodology.
[0040] Preferably, the tagged molecules produced in step (ii) are polynucleotide molecules encoding a "typical" BCR-ABL1 transcript (eg e1a2, e13a2 and e14a2). However, the method is also applicable for polynucleotide molecules encoding an "atypical" transcript such as those mentioned above.
[0041] The target region may comprise a portion of the BCR-ABL1 fusion gene of, for example, no more than 2.5 kb (excluding intron sequences), but shorter sequences may be preferable such as a portion of no more than about 1.5 kb (excluding intron sequences) or a portion of no more than about 1 kb (excluding intron sequences).
[0042] In step (iii), the amplification of the UMID-tagged polynucleotide molecules to generate UMID-tagged amplicons which comprise a polynucleotide sequence corresponding to all or substantially all of the said KD-encoding portion, may be achieved with any of the suitable methodologies well known to those skilled in the art. As described above, the amplification may be preferably performed using a standard polymerase chain reaction (PCR) amplification method (preferably a non-linear amplification method) using a pair of primers (ie forward and reverse primer molecules) defining the 5' and 3' ends of the desired polynucleotide sequence of the KD-encoding portion.
[0043] Preferably, the amplicons produced in step (iii) may be no more than about 1 kb in length (although longer amplicons may also be suitable) and, perhaps preferably, will be about 650-750 nucleotides in length comprising all or substantially all of the KD-encoding portion of the ABL1 sequence. Preferably, the amplicons comprise at least two and, more preferably, at least three exons of the KD-encoding portion of the ABL1 sequence. Most preferably, the amplicons comprise at least four exons of the KD-encoding portion (eg exons 4 to 7).
[0044] In step (iv), each of the UMID-tagged amplicons is sequenced, conveniently by using a next generation sequencing (NGS) platform. In step (v), bioinformatic analysis is conducted on the sequences (or "reads") obtained in step (iv) to identify a consensus sequence for all sequenced amplicons comprising a common UMID (eg comprising a "Read Group"). The consensus sequence information may then reveal any sequence mutations that were present in the KD-encoding portion of the polynucleotide molecules present in the sample. In particular, the bioinformatic analysis identifies reads derived from a single initial BCR-ABL1 molecule by virtue of the common UMID. The consensus sequence of reads with a common UMID may be determined using automated variant calling and filtering algorithms, and represents the sequence of an initial BCR-ABL1 cDNA molecule present in the sample, thereby overcoming artefact mutations arising from the polynucleotide molecule amplification and/or sequencing reactions. Steps (iv) and (v) may be conducted concurrently.
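Once each Read Group has been reduced to a consensus, the molecules can be counted to express the prevalence of each mutant relative to wild-type, as described for the enumeration of UMID-tagged molecules. The following Python sketch is illustrative only: it assumes each Read Group has already been summarised as a set of mutation labels (empty for WT), and the function name and the label E255K are hypothetical.

```python
from collections import Counter


def mutation_prevalence(read_group_mutations):
    """Percentage of Read Groups (ie original molecules) per genotype.

    read_group_mutations: list of sets of mutation labels, one per
    Read Group; an empty set denotes a WT/reference consensus.
    Mutations found together in one Read Group are reported jointly,
    joined by "+", since they sit on the same molecule.
    """
    total = len(read_group_mutations)
    if total == 0:
        return {}
    counts = Counter()
    for mutations in read_group_mutations:
        key = "+".join(sorted(mutations)) if mutations else "WT"
        counts[key] += 1
    return {genotype: 100.0 * n / total for genotype, n in counts.items()}


# Ten molecules: six WT, three single mutants, one compound mutant.
groups = [set()] * 6 + [{"T315I"}] * 3 + [{"T315I", "E255K"}]
print(mutation_prevalence(groups))
```

Counting Read Groups rather than raw reads means that a molecule amplified many times contributes once, so the percentages reflect the composition of the original sample rather than amplification bias.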
[0045] The method of this embodiment of the present invention may thereby enable, for example, rare sequence mutations to be distinguished from artefact mutations arising from amplification and/or sequencing reactions. By being certain of the particular mutations present in a case of Ph+ leukaemia or other ABL1 translocation-associated neoplasm (particularly one where TKI resistance has been observed), valuable and personalised information may be generated to assist in the appropriate selection of available treatments. More particularly, the method of this embodiment of the present invention may enable the identification of the presence of compound mutations, present in the BCR-ABL1 fusion gene, associated with advanced Ph+ leukaemia (eg advanced CML disease) and inferior therapeutic outcomes (Parker et al., 2012) and/or drug resistance.
[0046] This embodiment of the method of the present invention is schematically depicted in Figure 1A.
[0047] The method of the present invention may be varied for the detection of minimal residual disease (MRD) by enabling the accurate and sensitive detection of sequence mutations associated with the particular disease. The term "minimal residual disease" will be well understood by those skilled in the art and refers to any small amount of remnant diseased tissue or cells in a subject during or after disease therapy. MRD is the major cause of relapse in leukaemia and cancer. More particularly, in this embodiment, the method may enable the assessment of whether a subject who has been treated (eg treated for Ph+ leukaemia) is free of the disease (eg the treatment has eradicated the diseased cells) or whether remnant diseased tissue or cells remain. Similarly, where the subject may be undergoing treatment for disease, the method of this embodiment may enable the assessment of whether the treatment is or is not being effective. In addition, the method may be used so as to allow comparison of the efficacy of different treatments, as well as monitoring the subject's remission status and recurrence of the disease. As such, an informed decision may be made on whether the patient may benefit from further treatment (perhaps with an alternative drug or drug regimen). In the context of Ph+ leukaemia and other ABL1 translocation-associated neoplasms, the continual absence of residual leukaemia could identify patients where TKI treatment can be stopped with a limited probability of relapse.
[0048] In this further embodiment of the method of the present invention, the method further comprises the step of:
(vi) assessing the presence, in one or more of said consensus sequences, of at least one sequence mutation associated with disease (eg Ph+ leukaemia or other ABL1 translocation-associated neoplasm); or
(vii) identifying and/or enumerating all sequenced amplicons comprising a common UMID.
[0049] Where the method of this further embodiment comprises step (vi) and, where at least one disease-associated sequence mutation is identified within the consensus sequences, the method indicates the presence of minimal residual disease. Conversely, if no disease-associated sequence mutation is identified within the consensus sequences, the method indicates that the subject may be free of disease.
[0050] Where the method of this further embodiment comprises step (vii), minimal residual disease may be detected by identifying and/or enumerating all sequenced amplicons comprising a common UMID. For example, the amplicons may be representative of the presence in the sample of BCR-ABL1 cDNA molecules. In that context, the cDNA sample would preferably be prepared from mRNA isolated from a white blood cell pellet and, as such, the BCR-ABL1 cDNA molecules would represent transcripts of the BCR-ABL1 fusion gene. Thus, by determining the number of UMID-tagged amplicons generated from BCR-ABL1 transcripts, the method of this further embodiment enables the detection and/or quantification of minimal residual disease. Steps (v) and (vi), and similarly steps (v) and (vii) of the method of this embodiment, may be conducted concurrently.
[0051] In a further aspect, the present invention provides a kit for use in a method of the present invention. Such a kit may comprise, for example, appropriate primer molecules, and/or buffer solutions, preparations of deoxyribonucleotide triphosphates (dNTPs) etc.
[0052] The invention is hereinafter described by way of the following non-limiting example(s) and accompanying figures.
EXAMPLES
Example 1 Detection of BCR-ABL1 mutations in CML patients
Methods and Materials
[0053] A novel NGS technique termed Single Molecule Consensus Sequencing (SMCS) was employed to analyse BCR-ABL1 cDNA molecules generated from patient RNA samples. The assay involved tagging individual BCR-ABL1 cDNA molecules before library amplification, enabling identification and elimination of most PCR and sequencing errors. NGS was performed on the Illumina MiSeq using redundant 2 x 300 bp paired-end reads; amino acids (aa) 244-407 of the KD (exons 4 to 7) were examined. Reads derived from an initial BCR-ABL1 cDNA molecule were identified bioinformatically by virtue of sharing the same UMID tag sequence. The consensus sequence of reads with the same UMID was determined; this consensus sequence represents the sequence of the initial BCR-ABL1 cDNA molecule (see Figure 1A).
[0054] Library Preparation
Total RNA was extracted from blood or bone marrow leukocytes and reverse transcribed using random hexamers (Branford et al., 1999; Resuehr and Spiess, 2003). Individual BCR-ABL1 cDNA molecules were tagged with a unique molecular identifier (UMID) sequence using a two (2) cycle primer extension reaction performed using the polymerase chain reaction (PCR) technique and a robust high-fidelity DNA polymerase enzyme, and a set of BCR-ABL1-specific primers (nb. the forward primer is dependent on the type of BCR-ABL1 transcript being targeted); namely:
For the p210 BCR-ABL1 transcripts (e13a2, e14a2, e13a3 and e14a3 transcripts):
Fwd 5'-TGACCAACTCGTGTGTGAAACTCC (SEQ ID NO: 1)
Rev 5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCT[UMID]TGTTGTAGGCCAGGCTCTCG (SEQ ID NO: 2)
The forward primer binds to BCR at nucleotides c.2645 to c.2668 (NM_021574) within exon e13. The reverse primer consists of sequence complementary to ABL1 (NM_005157.5) at the 3' end (at nucleotides c.1224 to c.1243 within exon 7), flanked by the UMID and a universal sequence (ie a portion of the Illumina sequencing adaptor, underlined) at the 5' end to allow amplification using a universal primer in subsequent steps. The UMID consisted of 15 or 18 randomised nucleotides, generating >1 billion or >60 billion distinct sequences, respectively.
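The stated UMID diversity follows from four possible bases at each randomised position. A short check of the arithmetic (the function name `umid_diversity` is illustrative only):

```python
def umid_diversity(length, alphabet_size=4):
    """Number of distinct random sequences of the given length."""
    return alphabet_size ** length

d15 = umid_diversity(15)  # 4**15 = 1,073,741,824 (>1 billion)
d18 = umid_diversity(18)  # 4**18 = 68,719,476,736 (>60 billion)
```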
For the p190 BCR-ABL1 transcript (e1a2 transcript):
Fwd 5'-GAACTCGCAACAGTCCTTCGAC (SEQ ID NO: 3)
Rev 5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCT[UMID]TGTTGTAGGCCAGGCTCTCG (SEQ ID NO: 2)
The forward primer in this case binds to BCR at nucleotides c.1116 to c.1137 within exon e1.
Tagging with UMID was performed using BCR-ABL1 cDNA generated from ~0.5 μg of the total RNA and 400 nM of each primer in a 25 μL reaction. NEBNext High-Fidelity 2X PCR master mix (New England BioLabs Inc, Ipswich, MA, United States of America) was used for all PCRs. The UMID-tagged cDNA molecules are about 1.5 kb in length. Following UMID tagging, excess UMID-containing primers were degraded by incubation with 60U of Exonuclease I at 37°C for 60 mins. The Exonuclease I was then heat inactivated (95°C for 5 min).
The uniquely-tagged BCR-ABL1 molecules were amplified using 18-28 cycles of PCR with the BCR forward primer and a reverse primer complementary to the universal sequence in the UMID-containing primer (5' ACACTCTTTCCCTACACGACGCTC; SEQ ID NO: 4) in a 50 μL reaction. To enable sequencing of the KD-encoding region as a single fragment using a diagnostically applicable NGS platform (Illumina MiSeq), the products were purified with 0.6X AMPure® XP beads (Agilent Technologies Inc, Santa Clara, CA, United States of America) and used as template in a further three (3) cycle PCR using a forward primer containing sequence complementary to ABL1 at its 3' end (specifically nucleotides c.706 to c.727 within exon a4) and a portion of the Illumina adaptor sequence at the 5' end (underlined):
5' GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTGAGATGGAACGCACGGACATCA (SEQ ID NO: 5) and the reverse primer from the previous PCR.
Excess primers were degraded using Exonuclease I, and then a final five (5) cycle PCR was performed to incorporate sample indexes ([i7] and [i5], to allow sample pooling for sequencing; Illumina Inc.) and sequences for binding the Illumina flow cell:
Fwd 5' CAAGCAGAAGACGGCATACGAGAT[i7]GTGACTGGAGTTCAGACGTGTGCT (SEQ ID NO: 6)
Rev 5' AATGATACGGCGACCACCGAGATCTACAC[i5]ACACTCTTTCCCTACACGACGC (SEQ ID NO: 7)
([i7] and [i5] = 8 nucleotide indexes for sample identification)
Final amplicons were purified using 0.65x AMPure® XP beads, and equal volumes of 96-196 PCR products were pooled for sequencing.
[0055] Sequencing and analysis
Amplification resulted in the attachment of the adaptors for redundant NGS and indexes for sample pooling. The PCR products were quantified by quantitative PCR (eg using KAPA HiFi Library Quantification kit; Kapa Biosystems Inc, Wilmington, MA, United States of America), pooled and sequenced by NGS. Using 2 x 300 bp paired-end reads on the Illumina MiSeq enabled examination of the vast majority of the BCR-ABL1 KD (amino acids 244 - 407; exons 4 to 7).
[0056] To generate a consensus sequence for each group of reads corresponding to the same initial BCR-ABL1 cDNA molecule, reads were grouped into "families" based on the UMID sequence (when an 18-nucleotide UMID was used, 1 mismatch in the UMID sequence was allowed when grouping reads). After adaptor trimming, reads of exactly the expected length were mapped to the ABL1 reference sequence using BWA MEM. Reads with mapping quality less than 50 were filtered out, followed by elimination of any reads that were not mapped as pairs.
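The grouping of reads into families with a one-mismatch tolerance can be sketched as follows. The specification does not state how near-identical UMIDs were merged, so the greedy rule used here (rarer UMIDs are folded into the most frequently observed UMID within one mismatch) is an assumption, as are the function names and toy UMIDs.

```python
from collections import Counter

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def group_umids(umids, max_mismatch=1):
    """Map each observed UMID to a family representative.

    Greedy assumption: the most frequently observed UMIDs are treated as
    true tags, and rarer UMIDs within max_mismatch of one are merged into it.
    """
    counts = Counter(umids)
    representatives = []
    assignment = {}
    # Visit UMIDs from most to least abundant (ties broken alphabetically)
    for umid, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for rep in representatives:
            if len(rep) == len(umid) and hamming(rep, umid) <= max_mismatch:
                assignment[umid] = rep
                break
        else:
            representatives.append(umid)
            assignment[umid] = umid
    return assignment

observed = ["ACGTACGT"] * 5 + ["ACGTACGA"] + ["TTTTCCCC"] * 3
families = group_umids(observed)
```

The pairwise comparison is quadratic in the number of distinct UMIDs, which is adequate for a sketch but would likely need indexing for samples containing tens of thousands of molecules.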
[0057] After mapping, read families with at least five (5) properly mapped read pairs were subjected to variant calling using the GATK UnifiedGenotyper with a ploidy setting of 1. Read families with a genotype quality (GQ) of less than 30 were filtered out. Taken together, these stringent quality control steps ensured that the consensus sequence of each remaining read family was extremely likely to be identical to that of the original cDNA molecule, eliminating essentially all errors introduced during the PCR amplification or sequencing process.
[0058] To determine the abundance of each BCR-ABL1 mutant clone (referred to hereafter as "haplotype") within the original samples, the consensus sequences were collated into unique sequences. To understand the relationship between haplotypes and estimate the background error rate of SMCS, a graph theoretical approach was employed, with each distinct haplotype represented by a node and those differing by one base (ie Hamming distance = 1) connected by an edge. This approach allowed the identification of clear abundant "parent" haplotypes that gave rise to "child" haplotypes. Comparing the abundance of parent to children haplotypes allowed an estimation of the per-base error rate to be around 10°. The base dependence of this rate was consistent with DNA damage causing first round PCR errors. The most frequent errors were G→T nucleotide changes, which are consistent with mis-incorporation of an A opposite an 8-oxo-guanine during the first round of PCR. 8-oxo-guanine is prevalent in ancient DNA and is caused by nucleic acid oxidation during sample storage. G→T nucleotide changes were most frequent in samples which had been stored for long periods of time (nb. most RNA samples had been stored for over 5 years) and were collected into PAXgene® tubes (Qiagen). The second most prevalent nucleotide changes were G→A, which are caused by cytosine deamination to uracil causing mis-incorporation of an A opposite the uracil during the first round of PCR. This nucleotide change was commonly observed at specific genomic positions, suggesting that it is likely due to in vivo cytosine deamination by RNA editing enzymes, such as those of the APOBEC family (Wedekind et al., 2003), and therefore, while not likely interesting biologically in this setting, these base changes should not be considered an error of the SMCS method.
[0059] Parent nodes were therefore interpreted as representing the clonal diversity of interest within the biological sample, with children nodes representing uninteresting changes of parent haplotypes. Consequently, it was determined to estimate the complete set of parent nodes for each sample. Any completely disconnected node was taken to be a parent, as were the nodes for the most abundant haplotype in each disconnected non-trivial subgraph. Further, putative parents were selected based on PageRank centralities. P-values were calculated for each remaining node under the null hypothesis that their haplotypes originated as a consequence of first round PCR errors (p=4xl ()"\ assuming independence of mis-incorporation events) from the set of nearest parent nodes. In addition to all parent nodes, those nodes with P-values less than 0.01 were considered to represent haplotypes characterising the clonal diversity in the sample.
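The first two parent-selection rules described above (isolated nodes, and the most abundant haplotype of each connected subgraph) can be sketched as below. The PageRank-based selection and the P-value test are deliberately left out, and the function name `find_parents` and the toy haplotypes and counts are illustrative assumptions rather than data from this example.

```python
from itertools import combinations

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def find_parents(abundance):
    """abundance: {haplotype_sequence: read-family count}, equal-length sequences.

    Returns the parent haplotypes: every isolated node plus the most
    abundant node of each connected component of the Hamming-1 graph.
    """
    nodes = list(abundance)
    neighbours = {h: set() for h in nodes}
    for a, b in combinations(nodes, 2):
        if hamming(a, b) == 1:  # edge between haplotypes differing by one base
            neighbours[a].add(b)
            neighbours[b].add(a)

    parents, seen = set(), set()
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal collects one connected component
        component, stack = [], [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        # An isolated node is a component of size one, so it is its own parent
        parents.add(max(component, key=lambda h: abundance[h]))
    return parents

abundance = {"ACGT": 900, "ACGA": 3, "ACTT": 2, "TTGA": 80}
parents = find_parents(abundance)
```

In the toy data, the two rare haplotypes each differ from the abundant "ACGT" by a single base and are therefore treated as its children, while "TTGA" is disconnected and is retained as a parent in its own right.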
Results
[0060] To test the validity of the SMCS method to overcome the technical artefacts associated with other amplicon-based NGS methods that have been used to detect compound mutations in CML patients, mock samples were created by mixing compound mutant plasmids or patient samples (cDNA or RNA) with different BCR-ABL1 mutations (up to 5 replicates each of a total of 14 mock samples were examined). Examination of the raw sequencing reads of the mock samples revealed a complex spectrum of mutants, similar to previous clinical reports (Soverini et al., 2013; Khorashad et al., 2013). SMCS, however, enabled bioinformatic filtering of these artefacts, largely eliminating PCR amplification and sequencing errors, and exclusively reported the compound and polyclonal mutants known to be present in the mock samples.
[0061] An important feature of the SMCS method is that, unlike Sanger sequencing and conventional amplicon-based NGS methods, it allows enumeration of the actual number of BCR-ABL1 molecules input into the reaction as well as reproducible estimation of the frequency of each mutant within the original sample. The number of read "families" generated by bioinformatic analysis of the sequencing output directly corresponds to the number of BCR-ABL1 molecules examined, with the number of input molecules dependent on CML disease burden and sample quality. Two replicates each of three (3) mock samples were analysed that were each generated by mixing RNA from two patients with different mutations (1: T315I/E255V, 2: V299L/G250E, 3: F359V/E255V). Up to 1516 BCR-ABL1 molecules were examined per replicate and no artificial recombinants were detected, which demonstrated that recombination between templates during cDNA synthesis occurs at a rate of less than 0.2%.
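The specification does not state how the 0.2% figure was derived. One standard calculation that is consistent with it is the one-sided upper confidence bound on a rate when zero events are seen among n independent molecules; the 95% confidence level and the function name below are assumptions.

```python
def upper_bound_zero_events(n, confidence=0.95):
    """Largest per-molecule rate still compatible, at the given confidence,
    with observing zero events among n independent molecules."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n)

# 1516 molecules examined, no recombinants observed
bound = upper_bound_zero_events(1516)
```

Under these assumptions the bound evaluates to just under 0.002, ie a recombination rate of less than 0.2%.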
[0062] To further examine the accuracy of SMCS, 44 samples lacking BCR-ABL1 kinase domain mutations were tested (nb. samples were collected at diagnosis of CML, or were from cell lines [K562, Molm-1, HeLa] or normal donors). Between 13 and 19577 (median, 972) read families (representing the number of ABL1 or BCR-ABL1 molecules examined for non-CML and CML samples, respectively) were examined per case, and no mutations implicated in TKI resistance were detected. The most prevalent nucleotide change in fresh samples was c.746 G→A, which was often present in ~0.5% of read families. This nucleotide change is likely caused by in vivo cytosine deamination by the APOBEC family of RNA editing enzymes.
[0063] To examine the detection limit of the SMCS method, a mock sample was generated by mixing 5 plasmids containing different compound mutations at various ratios with a plasmid containing unmutated BCR-ABL1. The mock sample contained 65% unmutated BCR-ABL1 and 35%, 1%, 0.1%, 0.05% and 0.01% of each of the 5 compound mutations. Six replicates of the mock sample were examined (between 1,741 and 3,721 BCR-ABL1 molecules per replicate) and it was found that it was possible to detect compound mutants present at a frequency of 0.1% or greater (ie reproducible detection of 3 of the 5 plasmids containing compound mutations). For some high quality patient samples with high leukaemic burden, it has been possible to examine over 40000 molecules, allowing greater sensitivity of mutation detection.
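The dependence of the detection limit on the number of molecules examined can be illustrated with a simple binomial sampling model. The model is an assumption made for illustration (each examined molecule is treated as an independent draw from the sample), and the helper name `prob_at_least` is hypothetical.

```python
from math import comb

def prob_at_least(k, n, f):
    """P(X >= k) for X ~ Binomial(n, f): chance of sampling at least k
    mutant molecules when n molecules are examined at mutant frequency f."""
    return 1.0 - sum(comb(n, i) * f**i * (1.0 - f) ** (n - i) for i in range(k))

molecules = 3721  # largest replicate in the mock-sample experiment
frequencies = (0.001, 0.0005, 0.0001)  # 0.1%, 0.05%, 0.01%

expected = {f: molecules * f for f in frequencies}
p_seen_once = {f: prob_at_least(1, molecules, f) for f in frequencies}
```

With a few thousand molecules per replicate, a mutant at 0.1% is expected to contribute several molecules, whereas a mutant at 0.01% is expected to contribute fewer than one, which is in keeping with reproducible detection only at 0.1% or greater.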
[0064] Since there is no gold standard method that can accurately detect compound mutations (particularly rare compound mutations), validation of the SMCS method was investigated by assessing samples for which compound mutations had been previously detected using an amplicon NGS method performed at another centre (Ion Torrent, depth ~10000, Deininger et al, manuscript under review with Blood), and comparing the results. Samples of 25 TKI resistant chronic phase CML patients who were enrolled in the PACE Phase II clinical trial of the third-generation TKI ponatinib were available for testing using SMCS (patients had received 1 to 4 prior TKI therapies, median 3). Using Sanger sequencing, 43 mutations were detected. Within the region examined using SMCS, there was 100% detection concordance with Sanger sequencing (2 mutations detected by Sanger sequencing were outside the region examined by SMCS). The amplicon NGS method detected 36 compound mutants within the 25 patients. Of the 32/36 mutations that were present within the region examined by SMCS, only eight (8) mutations were detected. Based on observations previously published in Parker et al., 2014, 16 of the 24 compound mutants that were not detected by SMCS were considered to likely represent PCR recombination artefacts. The other 8/24 were low level (1-4%) mutations and most (6/8) involved mutations rarely/never reported in TKI-resistant patients; so these are also suspected to represent artefact nucleotide changes. An additional three (4) compound mutants were detected by SMCS, and were consistent with the respective patient's TKI treatment history (see Figure 1B).
[0065] The SMCS method was further evaluated by examining samples of 91 imatinib-resistant CML patients for which extensive examination of the BCR-ABL1 kinase domain had been previously performed using Sanger sequencing (detection limit ~10%) and a mass-spectrometry based mutation assay (detection limit ~0.2%) (Parker et al., 2011). Neither Sanger sequencing nor the mass-spectrometry assay is able to distinguish compound mutations from polyclonal mutations. The samples examined were collected immediately before starting (ie "baseline") second-line TKI treatment (nilotinib or dasatinib), and clinical and molecular follow-up data was available. Within the baseline samples of the 91 imatinib-resistant patients, 89 mutations were detected by Sanger sequencing in 66 patients. Within the region examined using SMCS, there was 100% detection concordance with Sanger sequencing (9 mutations detected by Sanger sequencing were outside the region examined by SMCS). In addition to the mutations detectable by Sanger sequencing, 76 rare (low level) mutations had been previously detected using mass-spectrometry in 36 patients. Of these rare mutations, 66 (87%) were within the region examined using SMCS, and 59 (89%) were detected. As the SMCS method allows enumeration of the actual number of BCR-ABL1 molecules sequenced, it is considered that the likely reason for the discordance in mutation detection between SMCS and mass-spectrometry was low input of BCR-ABL1 molecules into the SMCS assay (median of 59 read families per sample). This may be due to sample degradation or low CML disease burden, resulting in a decreased sensitivity to detect rare mutations. However, in the baseline patient samples, SMCS was able to detect 4 mutations in 4 patients that were undetectable by both of the other methods and became dominant resistant clones during subsequent second-line TKI therapy, causing treatment failure in these patients. This demonstrates that the SMCS method is able to detect clinically relevant mutations with greater sensitivity than the sensitive mass-spectrometry assay (Parker et al., 2013); however, sample quality can limit detection sensitivity.
Compound mutations were detected in the samples of 6 of the 91 imatinib-resistant patients (6.6%), suggesting that compound mutations may be a relatively rare occurrence in this clinical setting.
Conclusion
[0066] It was demonstrated that BCR-ABL1 compound and polyclonal mutants in patient samples could be detected using a novel NGS-based method having the potential to overcome technical artefacts generated with other published methods. Whilst there is no gold standard method that can accurately detect low level compound mutations, in this example SMCS correctly identified PCR amplification and/or sequencing artefacts using mock samples. The method of the present invention takes an important step towards enabling a more concrete understanding of the mutation spectra in patients and their association with resistance. By using samples of BCR-ABL1 cDNA molecules (eg generated from patient mRNA), the method not only allows examination of multiple exons of sequence using a current, clinically applicable, sequencing platform, but also abrogates the need for patient-specific primers to isolate their unique BCR-ABL1 gene fusion molecules, a necessary step to enable cost-effective sequencing of the fusion molecules which may be scarce within a clinical sample.
[0067] Throughout the specification and the claims that follow, unless the context requires otherwise, the words "comprise" and "include" and variations such as "comprising" and "including" will be understood to imply the inclusion of a stated integer or group of integers, but not the exclusion of any other integer or group of integers.
[0068] The reference to any prior art in this specification is not, and should not be taken as, an acknowledgement of any form of suggestion that such prior art forms part of the common general knowledge.
[0069] It will be appreciated by those skilled in the art that the invention is not restricted in its use to the particular application described. Neither is the present invention restricted in its preferred embodiment with regard to the particular elements and/or features described or depicted herein. It will be appreciated that the invention is not limited to the embodiment or embodiments disclosed, but is capable of numerous rearrangements, modifications and substitutions without departing from the scope of the invention as set forth and defined by the following claims.
REFERENCES
Branford S et al., Br J Haematol 107(3):587 (1999).
Branford S et al., Blood 114(27):5426-5435 (2009).
Chalandon Y et al., Blood 125(24):3711-3719 (2015).
Cortes JE et al., N Engl J Med 369(19):1783-1796 (2013).
Goldman JM and JV Melo, N Engl J Med 349(15):1451-1464 (2003).
Greuber E et al., Nat Rev Cancer 13(8):559-571 (2013).
Hiatt JB et al., Genome Res 23:843-854 (2013).
Hughes T et al., Blood 108(1):28-37 (2006).
Khorashad JS et al., Blood 121(3):489-498 (2013).
Li H and R Durbin, Bioinformatics Epub, [PMID 20080505], 2010.
Melo JV and C Chuah, Cancer Lett 249(2):121-132 (2007).
Melo JV and DY Barnes, Nat Rev Cancer 7(6):441-453 (2007).
O'Hare T et al., Cancer Cell 16(5):401-412 (2009).
Parker WT et al., J Clin Oncol 29(32):4250-4259 (2011).
Parker WT et al., Blood 119(10):2234-2238 (2012).
Parker WT et al., Blood 122(21):651 (2013).
Parker WT et al., Blood 124(1):153-155 (2014).
Reboursiere E et al., Hematol Oncol Stem Cell Therap 8(1):28-33 (2015).
Resuehr D and A-N Spiess, Anal Biochem 322:287-291 (2003).
Roberts KG et al., N Engl J Med 371(11):1005-1015 (2014).
Soverini S et al., Blood 122(9):1634-1648 (2013).
Soverini S et al., Cancer 120(7):1002-1009 (2014).
Wedekind JE et al., Trends Genet 19(4):207-216 (2003).
Yeung DT et al., Leukaemia 29:230-258 (2015).
Zabriskie MS et al., 26(30):428-442 (2014).
Zhang J et al., Nature 463(7280):501-506 (2010).

Claims

1. A method for identifying and/or enumerating sequence mutations within the kinase domain (KD) of a fusion gene comprising ABL1 or a portion thereof encoding all or substantially all of the KD, wherein the method of the present invention comprises the steps of:
(i) providing a sample comprising RNA transcripts of a fusion gene comprising ABL1 or cDNA molecules produced from transcripts of a fusion gene comprising ABL1;
(ii) performing a primer extension reaction with a first primer pair comprising forward and reverse primer molecules targeted so as to generate polynucleotide molecules comprising a polynucleotide sequence corresponding to all of the fusion gene or a portion thereof comprising a target region spanning a fusion gene breakpoint, an adjacent upstream sequence and a downstream KD-encoding portion of the ABL1 sequence, wherein one of said primer molecules comprises a short sequence of random nucleotides providing an individual unique molecular ID tag (UMID), along with a fusion gene-specific sequence and a universal 5' tail sequence, to thereby tag each of the generated polynucleotide molecules with an individual UMID;
(iii) amplifying the UMID-tagged polynucleotide molecules to generate UMID-tagged amplicons comprising a polynucleotide sequence corresponding to all or substantially all of the said KD-encoding portion;
(iv) sequencing the UMID-tagged amplicons; and
(v) identifying a consensus sequence for all sequenced amplicons comprising a common UMID.
2. The method of claim 1, wherein the sample comprises cDNA molecules produced from mRNA isolated from a blood sample.
3. The method of claim 1 or 2, wherein each UMID is of 15-20 nucleotides in length.
4. The method of any one of claims 1 to 3, wherein the reverse primer of step (ii) binds to the 3' end of the ABL1 sequence within exon 7 or 9.
5. The method of any one of claims 1 to 4, wherein the UMID-tagged amplicons generated in step (iii) are of 650-750 nucleotides in length comprising all or substantially all of the KD-encoding portion of the ABL1 sequence.
6. The method of claim 5, wherein the UMID-tagged amplicons comprise at least four exons of the KD-encoding portion.
7. The method of any one of claims 1 to 6, when used for the detection of minimal residual disease (MRD).
8. The method of any one of claims 1 to 6, further comprising identifying one or more sequence mutations within the consensus sequence.
9. The method of claim 8, wherein at least one mutation is a rare sequence mutation.
10. The method of any one of claims 1 to 6, further comprising identifying compound mutations within the consensus sequence.
11. The method of claim 10, wherein the compound mutations are associated with tyrosine kinase inhibitor (TKI) drug resistance.
12. The method of any one of claims 1 to 11, wherein the fusion gene is BCR-ABL1 and the forward primer of step (ii) binds to the BCR sequence within exon e1 or e13.
13. The method of claim 11, wherein the UMID-tagged polynucleotide molecules produced in step (ii) encode a BCR-ABL1 transcript selected from e1a2, e13a2 and e14a2.
14. The method of any one of claims 1 to 10, wherein the fusion gene is ETV6 (TEL)-ABL1.
15. The method of claim 14, wherein the forward primer of step (ii) binds to the ETV6 (TEL) sequence within exon 4.
16. A method for identifying and/or enumerating sequence mutations within the kinase domain (KD) of the BCR-ABL1 fusion gene, wherein the method comprises the steps of:
(i) providing a sample comprising BCR-ABL1 transcripts or cDNA molecules;
(ii) performing a primer extension reaction with a first primer pair comprising forward and reverse primer molecules targeted so as to generate polynucleotide molecules comprising a polynucleotide sequence corresponding to all of the BCR-ABL1 fusion gene or a portion thereof comprising a target region spanning a fusion gene breakpoint, an adjacent upstream BCR-derived sequence and a downstream KD-encoding portion of the ABL1 sequence, wherein one of said primer molecules comprises a short sequence of random nucleotides providing an individual unique molecular ID tag (UMID), along with a BCR-ABL1-specific sequence and a universal 5' tail sequence, to thereby tag each of the generated polynucleotide molecules with an individual UMID;
(iii) amplifying the UMID-tagged polynucleotide molecules to generate UMID-tagged amplicons comprising a polynucleotide sequence corresponding to all or substantially all of the said KD-encoding portion;
(iv) sequencing the UMID-tagged amplicons; and
(v) identifying a consensus sequence for all sequenced amplicons comprising a common UMID.
17. The method of claim 16, wherein the sample comprises cDNA molecules produced from mRNA isolated from a blood sample.
18. The method of claim 16 or 17, wherein each UMID is of 15-20 nucleotides in length.
19. The method of any one of claims 16 to 18, wherein the reverse primer of step (ii) binds to the 3' end of the ABL1 sequence within exon 7 or 9.
20. The method of any one of claims 16 to 19, wherein the forward primer of step (ii) binds to the BCR sequence within exon e1 or e13.
21. The method of any one of claims 16 to 20, wherein the UMID-tagged polynucleotide molecules produced in step (ii) encode a BCR-ABL1 transcript selected from e1a2, e13a2 and e14a2.
22. The method of any one of claims 16 to 21, wherein the UMID-tagged amplicons generated in step (iii) are of 650-750 nucleotides in length comprising all or substantially all of the KD-encoding portion of the ABL1 sequence.
23. The method of claim 22, wherein the UMID-tagged amplicons comprise at least four exons of the KD-encoding portion.
24. The method of any one of claims 16 to 23, further comprising identifying one or more sequence mutations within a consensus sequence.
25. The method of claim 24, wherein at least one mutation is a rare sequence mutation.
26. The method of any one of claims 16 to 23, further comprising identifying compound mutations within a consensus sequence.
27. The method of claim 26, wherein the compound mutations are associated with tyrosine kinase inhibitor (TKI) drug resistance.
28. The method of claim 26, wherein the compound mutations are associated with advanced Ph+ leukaemia.
29. The method of claim 28, wherein the advanced Ph+ leukaemia is advanced CML disease.
30. The method of any one of claims 16 to 26, when used for the detection of minimal residual disease (MRD).
31. The method of claim 30, wherein the method comprises identifying and/or enumerating all sequenced amplicons comprising a common UMID.
32. A kit for use in the method of any one of claims 1 to 31, wherein said kit comprises appropriate primer molecules, and/or buffer solutions, preparations of deoxyribonucleotide triphosphates (dNTPs) etc.
PCT/AU2015/000667 2014-11-05 2015-11-05 Detecting sequence mutations in leukaemic fusion genes Ceased WO2016070230A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AU2014904452 2014-11-05
AU2014904452A AU2014904452A0 (en) 2014-11-05 Detecting mutations

Publications (1)

Publication Number Publication Date
WO2016070230A1 true WO2016070230A1 (en) 2016-05-12

Family

ID=55908276

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2015/000667 Ceased WO2016070230A1 (en) 2014-11-05 2015-11-05 Detecting sequence mutations in leukaemic fusion genes

Country Status (1)

Country Link
WO (1) WO2016070230A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013130512A2 (en) * 2012-02-27 2013-09-06 The University Of North Carolina At Chapel Hill Methods and uses for molecular tags
WO2014026031A1 (en) * 2012-08-10 2014-02-13 Sequenta, Inc. High sensitivity mutation detection using sequence tags
WO2014149134A2 (en) * 2013-03-15 2014-09-25 Guardant Health Inc. Systems and methods to detect rare mutations and copy number variation

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
DUNCAVAGE, E.J. ET AL.: "Molecular barcodes allow for discovery of low frequency variants by next-generation sequencing.", THE JOURNAL OF MOLECULAR DIAGNOSTICS, vol. 16, no. 6, November 2014 (2014-11-01), pages 777 - 778 *
HIATT, J.B. ET AL.: "Single molecule molecular inversion probes for targeted, high- accuracy detection of low-frequency variation.", GENOME RESEARCH, vol. 23, 2013, pages 843 - 854, XP055225609, DOI: doi:10.1101/gr.147686.112 *
PARKER, W.T. ET AL.: "Detection of BCR-ABL1 compound and polyclonal mutants in chronic myeloid leukemia patients using a novel next generation sequencing approach that minimises PCR and sequencing errors.", BLOOD, vol. 124, no. 21, 6 December 2014 (2014-12-06), pages 399 *
SMITH, C.C. ET AL.: "Single Molecule Real Time (SMRT) Sequencing sensitively detects the evolution of polyclonal and compound BCR-ABL mutations in patients who relapse on kinase inhibitor therapy.", BLOOD, vol. 120, 2012 *
SOVERINI, S. ET AL.: "Sensitivity, reproducibility and clinical utility of next-generation sequencing (NGS) for BCR-ABL1 kinase domain mutation screening: results from the CML work package of the Iron-II (Interlaboratory RObustness Of Next-Generation Sequencing) international study.", BLOOD, vol. 122, no. 21, 2013, pages 3824 *
SOVERINI, S. ET AL.: "Unraveling the complexity of tyrosine kinase inhibitor-resistant populations by ultra-deep sequencing of the BCR-ABL kinase domain.", BLOOD, vol. 122, no. 9, 2013, pages 1634 - 1648 *
WANG, Y. ET AL.: "Clonal evolution in breast cancer revealed by single nucleus genome sequencing.", NATURE, vol. 512, 14 August 2014 (2014-08-14), pages 155 - 160 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107723353A (en) * 2016-08-12 2018-02-23 嘉兴允英医学检验有限公司 A kind of high-flux detection method of leukaemia driving gene
CN112687341A (en) * 2021-03-12 2021-04-20 上海思路迪医学检验所有限公司 Method for identifying chromosome structure variation by taking breakpoint as center
CN112687341B (en) * 2021-03-12 2021-06-04 上海思路迪医学检验所有限公司 Method for identifying chromosome structure variation by taking breakpoint as center
CN115896038A (en) * 2023-02-02 2023-04-04 陕西师范大学 A kind of BCR-ABL mutation engineering cell and its construction method and application
CN116287162A (en) * 2023-02-14 2023-06-23 赣南医学院 Kit for detecting BCR-ABL1 fusion gene and tyrosine kinase region mutation and promoter methylation thereof and application method
CN120432016A (en) * 2025-07-08 2025-08-05 四川大学 Drug resistance prediction method, device, electronic device and storage medium

Similar Documents

Publication Publication Date Title
Bewicke-Copley et al. Applications and analysis of targeted genomic sequencing in cancer studies
CA2988674C (en) Detection of chromosome interactions
Dennis et al. Evolution of human-specific neural SRGAP2 genes by incomplete segmental duplication
Ji et al. Identification of driving ALK fusion genes and genomic landscape of medullary thyroid cancer
US20180155770A1 (en) Genomic Alterations in the Tumor and Circulation of Pancreatic Cancer Patients
WO2016070230A1 (en) Detecting sequence mutations in leukaemic fusion genes
CN116919962A (en) Methods for diagnosing and treating behavioral disorders
Alizada et al. Conserved regulatory logic at accessible and inaccessible chromatin during the acute inflammatory response in mammals
CA2932679A1 (en) Targeted screening for mutations
AU2014229108B2 (en) Biomarkers for response to rapamycin analogs
KR101638473B1 (en) Detection method of gene deletion based on next-generation sequencing
US20250333795A1 (en) Composition for amplifying flt3 gene, and uses thereof
US10619217B2 (en) Oligodendroglioma drive genes
EP4317456B1 (en) Piezo type mechanosensitive ion channel component 1 (piezo1) variants and uses thereof
Jama et al. Gene fusions during the early evolution of mesothelioma correlate with impaired DNA repair and Hippo pathways
CN106906297A (en) The detection agent of detection drug resistance of tumor variation
US20140039803A1 (en) Method for Rapid Identification of Drug Targets and Drug Mechanisms of Action in Human Cells
WO2025090956A1 (en) Methods for detecting nucleic acid variants using capture probes
US20240084389A1 (en) Use of simultaneous marker detection for assessing difuse glioma and responsiveness to treatment
Rosenberg et al. Comprehensive molecular characterization of a rare case of Philadelphia chromosome–positive acute myeloid leukemia
Borges Monroy Transposable Elements in Health and Disease
WO2025137620A1 (en) Methods for high quality and high accuracy methylation sequencing
WO2025155895A1 (en) Nucleic acid modification profiling method
WO2023220648A2 (en) Compositions and methods for detecting and treating tumors and/or cancers associated with braf and/or map2k1 variants
WO2025076425A1 (en) Genomic and methylation biomarkers for prediction of copy number loss / gene deletion

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15857200

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15857200

Country of ref document: EP

Kind code of ref document: A1