WO2023061914A2 - Methods and reagents for the differential diagnosis of uterine tumors - Google Patents
- Publication number
- WO2023061914A2 (PCT/EP2022/078052)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- gene
- uterine
- leiomyosarcoma
- sample
- expression
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/112—Disease subtyping, staging or classification
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/118—Prognosis of disease development
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/158—Expression markers
Definitions
- the present invention relates to the field of gynecological cancer diagnostics and, more particularly, to methods for the differential diagnosis of uterine leiomyoma and uterine leiomyosarcoma in a subject, as well as to reagents suitable for carrying out said methods.
- Uterine leiomyomas (LM) are benign tumors arising in the smooth muscle cells of the uterine wall. They are the most common pelvic tumors in women, with a prevalence of >80% for African American and ~70% for Caucasian women before 50 years of age. Although LM are non-malignant, a hidden, undiagnosed malignancy such as leiomyosarcoma (LMS) is present in one out of 498 uterine tumors.
- Laparoscopic myomectomy with morcellation of the tumor is the gold standard therapeutic option for uterine tumors.
- clinical symptoms as well as morphological features of LM and LMS are indistinguishable prior to surgery, introducing the risk of potential spread of undiagnosed LMS.
- the FDA issued a press release in 2014 discouraging the use of power morcellators to treat myometrial tumors, leading to the replacement of laparoscopic myomectomy with laparotomy-based procedures and thus increasing morbidity, mortality, and cost for the patient and the healthcare system.
- the invention relates to an in vitro method for the differential diagnosis of a subject suspected of suffering from uterine leiomyoma or uterine leiomyosarcoma, the method comprising:
- step (ii) identifying the subject as suffering from uterine leiomyosarcoma or from uterine leiomyoma by a predictive model which correlates the gene expression profile identified in step (i) with representative gene expression profiles from samples obtained from subjects previously identified as suffering from uterine leiomyosarcoma or from uterine leiomyoma, said predictive model having been generated by training a computer with a plurality of gene expression profiles from previously identified subjects suffering from uterine leiomyosarcoma or from uterine leiomyoma by machine learning on said plurality of gene expression profiles so as to obtain representative gene expression profiles associated with uterine leiomyosarcoma or with uterine leiomyoma.
- the invention relates to an in vitro method for the differential diagnosis of a subject suspected of suffering from uterine leiomyoma or uterine leiomyosarcoma, the method comprising:
- the invention relates to an in vitro method for differential diagnosis of a subject suspected of suffering from uterine leiomyoma or uterine leiomyosarcoma, the method comprising analyzing in a biological sample from the subject the coverage of at least one gene selected from the group consisting of the TUBB2b gene, the LRRCC1 gene, the NDRG4 gene, the HSF4 gene and the TMPRSS6 gene and wherein an increased coverage with respect to a reference sample in the HSF4 gene, in the NDRG4 gene, in the TMPRSS6 gene and/or in the TUBB2b gene and/or a decreased coverage with respect to a reference sample in the LRRCC1 gene is indicative that the patient has uterine leiomyosarcoma.
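The direction-of-change rule stated above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the dictionary-based input format and the `min_fold` margin are hypothetical and do not come from the application, which does not specify how "increased" or "decreased" coverage is thresholded.

```python
# Direction of coverage change that the text associates with leiomyosarcoma (LMS).
LMS_DIRECTION = {
    "HSF4": "increased",
    "NDRG4": "increased",
    "TMPRSS6": "increased",
    "TUBB2b": "increased",
    "LRRCC1": "decreased",
}


def lms_indicative_genes(sample_cov, reference_cov, min_fold=1.0):
    """Return the genes whose coverage changes in the LMS-associated direction.

    sample_cov / reference_cov: dict mapping gene name -> coverage value.
    min_fold: hypothetical fold-change margin (an assumption, not from the text);
              1.0 means that any increase or decrease counts.
    """
    hits = []
    for gene, direction in LMS_DIRECTION.items():
        if gene not in sample_cov or gene not in reference_cov:
            continue  # the method needs at least one gene, not all five
        s, r = sample_cov[gene], reference_cov[gene]
        if direction == "increased" and s > r * min_fold:
            hits.append(gene)
        elif direction == "decreased" and s * min_fold < r:
            hits.append(gene)
    return hits
```

A non-empty result means that at least one of the measured genes deviates from the reference in the direction the text describes as indicative of LMS; how many genes are required, and by what margin, is left open here.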
- the invention in another aspect, relates to an in vitro method for the diagnosis of a uterine tumor selected from the group consisting of uterine leiomyoma or uterine leiomyosarcoma in a subject, the method comprising determining in the whole-exome sequence of a biological sample from the subject the value of a mutational index which correlates with the number of single nucleotide variants which are characteristic of the COSMIC mutational signature 12 and/or of the COSMIC mutational signature 20, wherein an increase in said index with respect to a reference sample is indicative that the subject is suffering from uterine leiomyosarcoma or from uterine leiomyoma.
- the invention relates to an in vitro method for prognosis of a subject diagnosed with uterine leiomyosarcoma, comprising determining in a biological sample of the subject the presence of at least one of the CNVs shown in Table 5, wherein the presence of the CNV in the sample is indicative of a bad prognosis of uterine leiomyosarcoma.
- the invention relates to an in vitro method for selecting a subject suspected of suffering from uterine leiomyoma or uterine leiomyosarcoma as a candidate to receive an adequate therapy to treat uterine leiomyosarcoma or uterine leiomyoma, the method comprising:
- the invention in another aspect, relates to a method for the treatment of leiomyosarcoma in a subject in need thereof comprising the administration of a therapy adequate for the treatment of leiomyosarcoma, wherein the patient to be treated has been identified by any of the methods of the invention.
- the invention relates to a kit, package and/or device comprising reagents adequate for implementing the methods according to the invention, to a method according to the invention which is computer-implemented as well as to a computer containing instructions for carrying out any of the methods according to the invention.
- Figure 1 Transcriptional analysis and validation of the targeted gene panel on leiomyoma (LM) and leiomyosarcoma (LMS).
- Figure 3 Selection of the 5 predictive genes for the differential diagnosis of LMS and LM based on DNA sequencing results. Adjusted p-values were calculated using a Student's t-test and the Bonferroni-Hochberg correction.
- Figure 4 Kaplan-Meier plots showing the association between overall survival and alterations in at least 67% of the most frequent CNVs detected in LMS patients.
- First differential diagnostic method of the invention (method based on transcriptomic analysis)
- LM and LMS have specific transcriptomic profiles. This difference allows the differential diagnosis of one disease or the other by analyzing the RNA composition of a sample from the subject and classifying the subject with an artificial intelligence algorithm that has been trained on transcriptomic profiles from known LM and LMS samples.
- the invention relates to an in vitro method for the differential diagnosis of a subject suspected of suffering from uterine leiomyoma or uterine leiomyosarcoma (hereinafter “first method of the invention”), the method comprising: (i) measuring the expression level of gene ARHGAP11A in a biological sample obtained from the subject thereby obtaining a gene expression profile of said sample; and
- step (ii) identifying the subject as suffering from uterine leiomyosarcoma or from uterine leiomyoma by a predictive model which correlates the gene expression profile identified in step (i) with representative gene expression profiles from samples obtained from subjects previously identified as suffering from uterine leiomyosarcoma or from uterine leiomyoma, said predictive model having been generated by training a computer with a plurality of gene expression profiles from previously identified subjects suffering from uterine leiomyosarcoma or from uterine leiomyoma by machine learning on said plurality of gene expression profiles so as to obtain representative gene expression profiles associated with uterine leiomyosarcoma or with uterine leiomyoma.
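As a rough illustration of step (ii), the sketch below trains "representative gene expression profiles" as per-class averages and assigns a new profile to the closest one. The nearest-centroid classifier is an assumption chosen for brevity; the text above only requires a predictive model obtained by machine learning and does not name an algorithm. The function names and the list-based profile format are likewise hypothetical.

```python
import math


def train_centroids(profiles, labels):
    """Average the training profiles of each class into one representative profile."""
    sums, counts = {}, {}
    for profile, label in zip(profiles, labels):
        if label not in sums:
            sums[label] = [0.0] * len(profile)
            counts[label] = 0
        sums[label] = [a + b for a, b in zip(sums[label], profile)]
        counts[label] += 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def classify(profile, centroids):
    """Assign the label of the closest representative profile (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(profile, centroids[label]))


# Toy values for two genes per sample; these are made up, not data from the application.
training_profiles = [[1.0, 2.0], [3.0, 2.0], [8.0, 9.0], [10.0, 9.0]]
training_labels = ["LM", "LM", "LMS", "LMS"]
centroids = train_centroids(training_profiles, training_labels)
predicted = classify([8.5, 8.0], centroids)
```

Each profile is an ordered list of expression levels, so the same gene order must be used for training and for the sample under test. In practice a model of this kind would be validated on held-out samples before any diagnostic use.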
- the term “differential diagnosis”, as used herein, refers to the determination of which of two or more diseases with similar symptoms is likely responsible for a subject’s symptom(s), or distinguishing of a particular disease or condition from others that present similar clinical features based on an analysis of the clinical data. This determination, as it is understood by a person skilled in the art, does not claim to be correct in 100% of the analyzed samples. However, it requires that a statistically significant amount of the analyzed samples is classified correctly. The amount that is statistically significant can be established by a person skilled in the art by means of using different statistical tools; illustrative, non-limiting examples of said statistical tools include determining confidence intervals, determining the p-value, the Student’s t-test or Fisher’s discriminant functions, etc.
- the confidence intervals are preferably at least 90%, at least 95%, at least 97%, at least 98% or at least 99%.
- the p-value is preferably less than 0.1, less than 0.05, less than 0.01, less than 0.005 or less than 0.0001.
- the teachings of the present invention preferably allow correctly diagnosing in at least 60%, in at least 70%, in at least 80%, or in at least 90% of the subjects of a determined group or population analyzed.
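One way to attach a confidence interval to the fraction of correctly diagnosed subjects is the Wilson score interval, sketched below. The choice of the Wilson interval and the function name are assumptions made for illustration; the text mentions confidence intervals only in general terms and does not prescribe a particular method.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for the proportion of correctly classified samples.

    z=1.96 corresponds to a two-sided 95% confidence level.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half
```

For example, `wilson_interval(45, 50)` returns the lower and upper bounds for 45 correct classifications out of 50; comparing the lower bound with one of the thresholds listed above (60%, 70%, 80% or 90%) gives a conservative reading of the observed accuracy.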
- uterine leiomyoma also known as uterine fibroid, as used herein, refers to a benign tumor that appears in the smooth muscular layer of the uterus.
- uterine leiomyosarcoma refers to a malignant tumor which originates in the smooth muscular layer of the uterus.
- the term includes both primary tumors as well as metastasis.
- the expression levels of ARHGAP11A are determined in a sample from the subject whose diagnosis is to be determined.
- expression level refers to the measurable quantity of gene product produced by the gene in a sample of the subject, wherein the gene product can be a transcriptional product or a translational product.
- the gene expression level can be quantified by measuring the messenger RNA levels of said gene or of the protein encoded by said gene.
- the expression level of the genes used in the method according to the invention can be determined by measuring the levels of mRNA encoded by said gene, or by measuring the levels of the protein encoded by said gene, i.e. the protein or variants thereof.
- Variants of the proteins encoded by the genes which are measured according to the method of the invention include all the physiologically relevant post-translational chemical modifications forms of the protein, for example, glycosylation, phosphorylation, acetylation, etc., provided that the functionality of the protein is maintained.
- sample refers to biological material isolated from a subject.
- the biological sample contains any biological material suitable for detecting DNA, RNA or protein levels.
- the sample comprises genetic material, e.g., DNA, genomic DNA (gDNA), complementary DNA (cDNA), RNA, heterogeneous nuclear RNA (hnRNA), mRNA, etc., from the subject under study.
- the sample can be isolated from any suitable tissue or biological fluid such as, for example blood, saliva, plasma, serum, urine, cerebrospinal liquid (CSF), feces, a surgical specimen, a specimen obtained from a biopsy, and a tissue sample embedded in paraffin. Methods for isolating samples are well known to those skilled in the art.
- methods for obtaining a sample from a biopsy include gross apportioning of a mass, or micro-dissection or other art-known cell-separation methods. In order to simplify conservation and handling of the samples, these can be formalin-fixed and paraffin-embedded or first frozen and then embedded in a cryosolidifiable medium, such as OCT-compound, through immersion in a highly cryogenic medium that allows rapid freeze.
- the sample from the subject according to the methods of the present invention is a biological fluid sample.
- the sample from the subject according to the methods of the present invention is selected from the group consisting of blood, serum, plasma, and a tissue sample; more preferably from the group consisting of plasma and a tissue sample.
- the term “gene expression profile”, as used herein, refers to a dataset generated from one or more genes listed above that make up a particular gene expression pattern that may be reflective of level of expression of each gene or set of genes in the biological sample under study.
- the term “subject” or “patient” refers herein to a person in need of the analysis described herein.
- the subject is a patient.
- the subject is a human.
- the subject is a female human (a woman).
- the subject is a female presenting with pathology and/or history consistent with uterine fibroids believed to be a benign neoplasm.
- the subject is a female presenting with pathology and/or history consistent with uterine fibroids believed to be leiomyoma (LM).
- the subject is a female presenting with pathology and/or history consistent with uterine fibroids believed to be leiomyoma and desiring surgical intervention.
- the subject is a female presenting with pathology and/or history consistent with uterine fibroids believed to be leiomyoma, desiring surgical intervention, and requiring an evaluation of the neoplasm to evaluate the risk that the neoplasm is malignant in order to guide the selection of therapy.
- the subject is a female presenting with pathology and/or history consistent with uterine fibroids, desiring surgical intervention and requiring an evaluation of the neoplasm to evaluate the risk that the neoplasm is a leiomyosarcoma in order to guide the selection of therapy.
- the sample wherein the expression level of the ARHGAP11A gene is determined can be any sample containing cells from the potential tumor.
- the sample containing cells from the potential tumor is a potential tumor tissue or a portion thereof.
- said potential tumor tissue sample is a uterine tissue sample from a patient in which the differential diagnosis of a subject suspected of suffering from uterine leiomyoma or uterine leiomyosarcoma is to be carried out.
- Said sample can be obtained by conventional methods, e.g., biopsy, surgical excision, or aspiration, by using methods well known to those of ordinary skill in the related medical arts.
- Methods for obtaining the sample from the biopsy include gross apportioning of a mass, or microdissection or other art-known cell-separation methods, including partial tumorectomy.
- Tumor cells can additionally be obtained from fine needle aspiration cytology.
- the sample has been obtained by hysterectomy or laparoscopic/laparotomic myomectomy.
- to simplify conservation and handling, the samples can be formalin-fixed and paraffin-embedded or first frozen and then embedded in a cryosolidifiable medium, such as OCT-compound, through immersion in a highly cryogenic medium that allows for rapid freeze.
- the sample wherein the expression levels of the ARHGAP11A gene are determined is a tumor sample obtained by hysterectomy or laparoscopic/laparotomic myomectomy.
- step (i) of the method of the invention comprises the determination of the expression levels of one or more additional genes.
- the first method of the invention comprises in step (i) measuring the expression level of CENPE gene.
- the first method of the invention comprises in step (i) measuring the expression level of the CENPE and the COL4A5 gene.
- the first method of the invention comprises in step (i) measuring the expression level of the CENPE gene, the COL4A5 gene and the CENPF gene.
- the first method of the invention comprises in step (i) measuring the expression level of the CENPE gene, the COL4A5 gene, the CENPF gene and the MFAP5 gene.
- step (i) of the first method of the invention may also comprise the determination of any combination of two, three, four or five of the genes shown in Table 3, which include the ARHGAP11A gene, the CENPE gene, the COL4A5 gene, the CENPF gene and the MFAP5 gene.
- step (i) of the first method of the invention comprises the determination of a set of genes selected from the group consisting of: ARHGAP11A and CENPE; ARHGAP11A and COL4A5; ARHGAP11A and CENPF; ARHGAP11A and MFAP5; CENPE and COL4A5; CENPE and CENPF; CENPE and MFAP5; COL4A5 and CENPF; COL4A5 and MFAP5; CENPF and MFAP5; ARHGAP11A, CENPE and COL4A5; ARHGAP11A, CENPE and CENPF; ARHGAP11A, CENPE and MFAP5; ARHGAP11A, COL4A5 and CENPF; ARHGAP11A, COL4A5 and MFAP5; ARHGAP11A, CENPF and MFAP5; CEN
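The two- to five-gene sets described above can be enumerated programmatically rather than listed by hand. The sketch below uses the standard-library `itertools.combinations`; the names `PANEL` and `gene_sets` are illustrative and not taken from the application.

```python
from itertools import combinations

# The five genes that the text attributes to Table 3.
PANEL = ["ARHGAP11A", "CENPE", "COL4A5", "CENPF", "MFAP5"]


def gene_sets(panel=PANEL, min_size=2):
    """List every combination of min_size..len(panel) genes, smallest sets first."""
    return [
        combo
        for size in range(min_size, len(panel) + 1)
        for combo in combinations(panel, size)
    ]
```

For a five-gene panel this yields 10 pairs, 10 triples, 5 quadruples and the full set, 26 combinations in total.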
- the first step of the first diagnostic method according to the invention comprises the determination of the ARHGAP11A gene, the CENPE gene, the COL4A5 gene, the CENPF gene and the MFAP5 gene together with the determination of at least one additional gene selected from the genes listed in Table 2.
- the method comprises the determination of the expression levels of all the genes listed in Table 2.
- Gene expression levels can be quantified by measuring the messenger RNA levels of the gene or the levels of the protein encoded by said gene, i.e. the ARHGAP11A protein or variants thereof.
- ARHGAP11A protein variants include all the physiologically relevant post-translational chemical modifications forms of the protein, for example, glycosylation, phosphorylation, acetylation, etc., provided that the functionality of the protein is maintained.
- Said term encompasses the ARHGAP11A protein of any mammal species, including but not being limited to domestic and farm animals (cows, horses, pigs, sheep, goats, dogs, cats or rodents), primates and humans.
- the ARHGAP11A protein is a human protein.
- the biological sample may be treated to physically, mechanically or chemically disrupt tissue or cell structure, to release intracellular components into an aqueous or organic solution to prepare nucleic acids for further analysis.
- the nucleic acids are extracted from the sample by procedures known to the skilled person and commercially available.
- RNA is then extracted from frozen or fresh samples by any of the methods typical in the art, for example, Sambrook, J., et al., 2001. Molecular cloning: A Laboratory Manual, 3rd ed., Cold Spring Harbor Laboratory Press, N.Y., Vol. 1-3.
- the RNA is extracted from formalin-fixed, paraffin embedded tissues.
- An exemplary deparaffinization method involves washing the paraffinized sample with an organic solvent, such as xylene, for example.
- Deparaffinized samples can be rehydrated with an aqueous solution of a lower alcohol. Suitable lower alcohols include, for example, methanol, ethanol, propanols, and butanols.
- Deparaffinized samples may be rehydrated with successive washes with lower alcoholic solutions of decreasing concentration, for example.
- the sample is simultaneously deparaffinised and rehydrated.
- the sample is then lysed and RNA is extracted from the sample.
- kits may be used for RNA extraction from paraffin samples, such as the PureLink™ FFPE Total RNA Isolation Kit (Thermo Fisher Scientific Inc., US).
- Methods for RNA extraction from paraffin-embedded tissues are disclosed, for example, in Rupp and Locker (1987) Lab Invest. 56:A67, and De Andres et al., BioTechniques 18:42-44 (1995). Preferably, care is taken to avoid degradation of the RNA during the extraction process.
- Hybridization-based approaches typically involve incubating fluorescently labelled cDNA with custom-made microarrays or commercial high-density oligo microarrays. Specialized microarrays have also been designed; for example, arrays with probes spanning exon junctions can be used to detect and quantify distinct spliced isoforms. Genomic tiling microarrays that represent the genome at high density have been constructed and allow the mapping of transcribed regions to a very high resolution, from several base pairs to ~100 bp.
- Hybridization-based approaches are high throughput and relatively inexpensive, except for high-resolution tiling arrays that interrogate large genomes.
- these methods have several limitations, which include: reliance upon existing knowledge about genome sequence; high background levels owing to cross-hybridization; and a limited dynamic range of detection owing to both background and saturation of signals.
- comparing expression levels across different experiments is often difficult and can require complicated normalization methods.
- sequence-based approaches directly determine the cDNA sequence.
- Sanger sequencing of cDNA or EST libraries was used, but this approach is relatively low throughput, expensive and generally not quantitative.
- Tag-based methods were developed to overcome these limitations, including serial analysis of gene expression (SAGE), cap analysis of gene expression (CAGE), and massively parallel signature sequencing (MPSS). These tag-based sequencing approaches are high throughput and can provide precise, digital gene expression levels.
- the present methods can also involve a larger-scale analysis of mRNA levels, e.g., the detection of a plurality of biomarkers (e.g., 2-10, or 5-50, or 10-100, or 50-500 or more at one time).
- the methods described here can also involve the step of conducting a transcriptomic analysis (i.e., the analysis of the complete set of transcripts in a cell, and their quantity, for a specific developmental stage or physiological condition). Understanding the transcriptome can be important for interpreting the functional elements of the genome and revealing the molecular constituents of cells and tissues, and also for understanding development and disease and how the biomarkers disclosed herein are indicative or predictive of a particular condition (e.g., LM or LMS).
- The key aims of transcriptomics are: to catalogue all species of transcript, including mRNAs, non-coding RNAs and small RNAs; to determine the transcriptional structure of genes, in terms of their start sites, 5' and 3' ends, splicing patterns and other post-transcriptional modifications; and to quantify the changing expression levels of each transcript during development and under different conditions.
- “RNAseq” or “RNA-seq” is used to refer to a transcriptomic approach where the total complement of RNAs from a given sample is isolated and sequenced using high-throughput next generation sequencing (NGS) technologies (e.g., SOLiD, 454, Illumina, or Ion Torrent).
- RNA-Seq uses deep-sequencing technologies.
- a population of RNA (total or fractionated, such as poly(A)+) is converted to a library of cDNA fragments with adaptors attached to one or both ends.
- Each molecule, with or without amplification, is then sequenced in a high-throughput manner to obtain short sequences from one end (single-end sequencing) or both ends (paired-end sequencing).
- the reads are typically 30-400 bp, depending on the DNA-sequencing technology used.
- any high-throughput sequencing technology can be used for RNA-Seq, e.g., the Illumina IG, Applied Biosystems SOLiD and Roche 454 Life Science systems have already been applied for this purpose.
- the Helicos Biosciences tSMS system is also appropriate and has the added advantage of avoiding amplification of target cDNA.
- the resulting reads are either aligned to a reference genome or reference transcripts, or assembled de novo without the genomic sequence to produce a genome-scale transcription map that consists of both the transcriptional structure and/or level of expression for each gene.
- Transcriptome analysis by next-generation sequencing (RNA-seq) allows investigation of a transcriptome at unsurpassed resolution.
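Once reads have been aligned and assigned to genes, a per-gene expression level can be derived from the read counts. The sketch below scales counts to counts per million (CPM). CPM is only one common normalization and is an assumption here, as are the function name and the input format; the text does not state how expression levels are computed from the aligned reads.

```python
from collections import Counter


def counts_per_million(read_gene_assignments):
    """Turn per-read gene assignments into counts per million (CPM).

    read_gene_assignments: iterable with one gene name per aligned read;
    reads that could not be assigned to a gene are given as None and skipped.
    """
    counts = Counter(g for g in read_gene_assignments if g is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}
```

Scaling by the total number of assigned reads makes samples sequenced to different depths comparable, although it does not correct for gene length or composition effects.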
- RNA-seq is independent of a priori knowledge on the sequence under investigation.
- the transcriptome can be profiled by high throughput techniques including SAGE, microarray, and sequencing of clones from cDNA libraries.
- oligonucleotide microarrays have been the method of choice providing high throughput and affordable costs.
- microarray technology suffers from well-known limitations including insufficient sensitivity for quantifying lower abundant transcripts, narrow dynamic range and biases arising from non-specific hybridizations. Additionally, microarrays are limited to only measuring known/annotated transcripts and often suffer from inaccurate annotations. Sequencing-based methods such as SAGE rely upon cloning and sequencing cDNA fragments.
- Sequencing-based approaches have a number of significant technical advantages over hybridization-based microarray methods.
- the output from sequence-based protocols is digital, rather than analog, obviating the need for complex algorithms for data normalization and summarization while allowing for more precise quantification and greater ease of comparison between results obtained from different samples. Consequently, the dynamic range is essentially infinite, if one accumulates enough sequence tags.
- Sequence-based approaches do not require prior knowledge of the transcriptome and are therefore useful for discovery and annotation of novel transcripts as well as for analysis of poorly annotated genomes.
- the application of sequencing technology in transcriptome profiling has been limited by high cost, by the need to amplify DNA through bacterial cloning, and by the traditional Sanger approach of sequencing by chain termination.
- next-generation sequencing (NGS) technology eliminates some of these barriers, enabling massive parallel sequencing at a high but reasonable cost for small studies.
- the technology essentially reduces the transcriptome to a series of randomly fragmented segments of a few hundred nucleotides in length. These molecules are amplified by a process that retains spatial clustering of the PCR products, and individual clusters are sequenced in parallel by one of several technologies.
- Current NGS platforms include the Roche 454 Genome Sequencer, Illumina's Genome Analyzer, and Applied Biosystems' SOLiD. These platforms can analyze tens to hundreds of millions of DNA fragments simultaneously, generate giga-bases of sequence information from a single run, and have revolutionized SAGE and cDNA sequencing technology.
- the 3' tag Digital Gene Expression uses oligo-dT priming for first strand cDNA synthesis, generates libraries that are enriched in the 3' untranslated regions of polyadenylated mRNAs, and produces base cDNA tags.
- sequencing methods contemplated herein require the preparation of sequencing libraries.
- sequencing library preparation is described in U.S. Patent Application Publication No. US 2013/0203606, which is incorporated by reference in its entirety.
- this preparation may take the coagulated portion of the sample from the droplet actuator as an assay input.
- the library preparation process is a ligation-based process, which includes four main operations: (a) blunt-ending, (b) phosphorylating, (c) A-tailing, and (d) ligating adaptors. DNA fragments in a droplet are provided to process the sequencing library.
- nucleic acid fragments with 5'- and/or 3'-overhangs are blunt-ended using T4 DNA polymerase, which has both a 3'-5' exonuclease activity and a 5'-3' polymerase activity, removing overhangs and yielding complementary bases at both ends of the DNA fragments.
- the T4 DNA polymerase may be provided as a droplet.
- T4 polynucleotide kinase may be used to attach a phosphate to the 5'-hydroxyl terminus of the blunt-ended nucleic acid.
- the T4 polynucleotide kinase may be provided as a droplet.
- the 3' hydroxyl end of a dATP is attached to the phosphate on the 5'-hydroxyl terminus of a blunt-ended fragment catalyzed by exo-Klenow polymerase.
- sequencing adaptors are ligated to the A-tail.
- T4 DNA ligase is used to catalyze the formation of a phosphodiester bond between the A-tail and the adaptor sequence.
- sequencing library preparation can involve the production of a random collection of adapter-modified DNA fragments (e.g., polynucleotides) that are ready to be sequenced.
- Sequencing libraries of polynucleotides can be prepared from DNA or RNA, including equivalents, analogs of either DNA or cDNA, for example, DNA or cDNA that is complementary or copy DNA produced from an RNA template, by the action of reverse transcriptase.
- the polynucleotides may originate in double-stranded form (e.g., dsDNA such as genomic DNA fragments, cDNA, PCR amplification products, and the like) or, in certain embodiments, the polynucleotides may originate in single-stranded form (e.g., ssDNA, RNA, etc.) and have been converted to dsDNA form.
- single stranded mRNA molecules may be copied into double-stranded cDNAs suitable for use in preparing a sequencing library.
- the precise sequence of the primary polynucleotide molecules is generally not material to the method of library preparation, and may be known or unknown.
- the polynucleotide molecules are DNA molecules.
- the polynucleotide molecules represent the entire genetic complement of an organism or substantially the entire genetic complement of an organism, and are genomic DNA molecules (e.g., cellular DNA, cell free DNA (cfDNA), etc.), that typically include both intron sequence and exon sequence (coding sequence), as well as non-coding regulatory sequences such as promoter and enhancer sequences.
- the primary polynucleotide molecules comprise human genomic DNA molecules, e.g., cfDNA molecules present in peripheral blood of a subject.
- Preparation of sequencing libraries for some NGS sequencing platforms is facilitated by the use of polynucleotides comprising a specific range of fragment sizes.
- Preparation of such libraries typically involves the fragmentation of large polynucleotides (e.g. cellular genomic DNA) to obtain polynucleotides in the desired size range.
- the expression level can be determined using mRNA obtained from a formalin-fixed, paraffin-embedded tissue sample.
- mRNA may be isolated from an archival pathological sample or biopsy sample which is first deparaffinized.
- An exemplary deparaffinization method involves washing the paraffinized sample with an organic solvent, such as xylene.
- Deparaffinized samples can be rehydrated with an aqueous solution of a lower alcohol. Suitable lower alcohols, for example, include methanol, ethanol, propanols and butanols.
- Deparaffinized samples may be rehydrated with successive washes with lower alcoholic solutions of decreasing concentration, for example. Alternatively, the sample is simultaneously deparaffinized and rehydrated. The sample is then lysed and RNA is extracted from the sample.
- Samples can also be obtained from fresh tumor tissue such as a resected tumor.
- samples can be obtained from fresh tumor tissue or from OCT embedded frozen tissue.
- samples can be obtained by laparoscopic myomectomy and then paraffin-embedded.
- control RNA relates to RNA whose expression levels do not change or change only in limited amounts in tumor cells with respect to non-tumorigenic cells.
- the control RNA is mRNA derived from housekeeping genes and which code for proteins which are constitutively expressed and carry out essential cellular functions.
- housekeeping genes for use in the present invention include β-2-microglobulin, ubiquitin, 18-S ribosomal protein, cyclophilin, IPO8, HPRT, GAPDH, PSMB4, tubulin and β-actin.
- the relative gene expression quantification is calculated according to the comparative threshold cycle (Ct) method using GAPDH, IPO8, HPRT, β-actin or PSMB4 as an endogenous control and commercial RNA controls as calibrators.
- Ct comparative threshold cycle
- Final results are determined according to the formula 2^-(ΔCt sample - ΔCt calibrator).
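The comparative Ct calculation can be sketched as follows. This is a minimal illustration that assumes the widely used 2^-ΔΔCt form of the method, in which each ΔCt is the target-gene Ct minus the endogenous-control Ct; the function and parameter names are hypothetical and not taken from the specification.

```python
def relative_expression(ct_target_sample, ct_control_sample,
                        ct_target_calibrator, ct_control_calibrator):
    """Relative quantity by the comparative Ct method (assumed 2^-ddCt form)."""
    # dCt normalizes the target gene to the endogenous control (e.g. GAPDH).
    delta_ct_sample = ct_target_sample - ct_control_sample
    delta_ct_calibrator = ct_target_calibrator - ct_control_calibrator
    # ddCt compares the sample to the calibrator RNA.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** -delta_delta_ct


# A target that crosses threshold two cycles earlier than in the calibrator
# (after normalization) corresponds to a four-fold higher relative quantity.
fold = relative_expression(24.0, 18.0, 26.0, 18.0)
```

A value of 1.0 means the sample and the calibrator express the target at the same normalized level.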
- Suitable methods to determine gene expression levels at the mRNA level include, without limitation, standard assays for determining mRNA expression levels such as qPCR, RT-PCR, RNA protection analysis, Northern blot, RNA dot blot, in situ hybridization, microarray technology, tag-based methods such as serial analysis of gene expression (SAGE) including variants such as LongSAGE and SuperSAGE, fluorescence in situ hybridization (FISH), including variants such as Flow-FISH, qFISH and double fusion FISH (D-FISH), and the like.
- SAGE serial analysis of gene expression
- FISH fluorescence in situ hybridization
- D-FISH double fusion FISH
- the determination of the expression levels of the gene or genes is carried out by exome-wide gene expression from RNAseq.
- the biological sample is a sample containing myometrial cells or RNA derived from myometrial cells.
- the sample containing myometrial cells is a myometrial biopsy.
- the first method of the invention comprises identifying the subject as suffering from uterine leiomyosarcoma or from uterine leiomyoma by a predictive model which correlates the gene expression profile identified in step (i) with representative gene expression profiles from samples obtained from subjects previously identified as suffering from uterine leiomyosarcoma or from uterine leiomyoma, said predictive model having been generated by training a computer with a plurality of gene expression profiles from previously identified subjects suffering from uterine leiomyosarcoma or from uterine leiomyoma by machine learning on said plurality of gene expression profiles so as to obtain representative gene expression profiles associated with uterine leiomyosarcoma or with uterine leiomyoma.
- the representative data sets use at least 10, and more preferably 20, 25, 30 or more gene expression profiles from samples obtained from subjects suffering from uterine leiomyoma or uterine leiomyosarcoma.
- the data sets may derive from subjects with multiple different parameters such as gender, age, weight, national origin, etc.
- the second step of the first method of the invention is performed by a machine learning method selected from a regression method, a classification method or a combination thereof.
- machine learning generally refers to algorithms that give a computer the ability to learn without being explicitly programmed, including algorithms that learn from and make predictions about data.
- Machine learning algorithms employed by the embodiments disclosed herein may include, but are not limited to, random forest (“RF”), least absolute shrinkage and selection operator (“LASSO”) logistic regression, regularized logistic regression, XGBoost, decision tree learning, artificial neural networks (“ANN”), deep neural networks (“DNN”), support vector machines, rule-based machine learning, and/or others.
- RF random forest
- LASSO least absolute shrinkage and selection operator
- ANN artificial neural networks
- DNN deep neural networks
- linear regression For clarity, algorithms such as linear regression or logistic regression can be used as part of a machine learning process. However, it will be understood that using linear regression or another algorithm as part of a machine learning process is distinct from performing a statistical analysis such as regression with a spreadsheet program. Whereas statistical modeling relies on finding relationships between variables (e.g., mathematical equations) to predict an outcome, a machine learning process may continually update model parameters and adjust a classifier as new data becomes available, without relying on explicit or rules-based programming.
- the second step of the first method of the invention is performed by a classification method, which results in identifying the subject as suffering from uterine leiomyoma or uterine leiomyosarcoma.
- step (b) is carried out by a classification method, preferably selected from logistic regression, random forest, gradient boosting (GB), adaptive boosting (AB), extreme gradient boosting (XGB), k-nearest neighbors (kNN), artificial neural network (ANN), support vector machine (SVM), and combinations thereof.
- the predictive model is generated by training the computer with a plurality of gene expression profiles from previously identified samples from subjects suffering from uterine leiomyoma or uterine leiomyosarcoma by machine learning on said plurality of gene expression profiles so as to obtain representative multivariable data sets associated with uterine leiomyoma or uterine leiomyosarcoma; wherein the training comprises the following steps: (i) training data, from a plurality of gene expression profiles, is randomly stratified into:
- the predictive model is seeded on the calibration dataset (particularly is developed by applying a machine learning method selected from a regression method, a classification method or a combination thereof on the calibration dataset);
- the predictive model is optimized by an internal cross validation; preferably by a k-fold cross validation, wherein each of the k cases of the k-fold cross validation is used for testing only once and one at a time; and
- the predictive model is further validated by predicting new samples using the validation dataset.
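The data handling in the training steps above (a random stratified split into a calibration dataset and a validation dataset, followed by k-fold cross-validation in which each fold is tested exactly once) can be sketched as follows. This is an illustration using only the standard library; the 30% validation fraction, the seeds and the function names are assumptions, not values given in the specification.

```python
import random
from collections import defaultdict


def stratified_split(labels, validation_fraction=0.3, seed=0):
    """Randomly split sample indices into calibration and validation sets,
    keeping the class proportions (e.g. leiomyoma vs leiomyosarcoma)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    calibration, validation = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_validation = round(len(indices) * validation_fraction)
        validation.extend(indices[:n_validation])
        calibration.extend(indices[n_validation:])
    return sorted(calibration), sorted(validation)


def k_fold(indices, k, seed=0):
    """Yield (train, test) index lists; every index is tested exactly once."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


labels = ["LM"] * 10 + ["LMS"] * 10
calibration, validation = stratified_split(labels)
folds = list(k_fold(calibration, k=7))
```

The model would be fitted and tuned on the `calibration` folds only, and the held-out `validation` indices would be used once at the end to predict new samples.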
- the second step is performed by a classification method wherein the patients are assigned a probability of belonging to a given category, such as patients suffering from leiomyoma or patients suffering from leiomyosarcoma.
- the classification method is carried out by a method selected from gradient boosting, support vector machine (SVM), decision trees, K nearest neighbors, Naive Bayes or neural networks.
- the classification method is carried out by Gradient Boosting.
- Gradient Boosting is a machine learning algorithm that uses a gradient boosting framework.
- Gradient Boosting trees, a decision-tree-based ensemble model, differ fundamentally from conventional statistical techniques that aim to fit a single model using the entire dataset.
- Such an ensemble approach improves performance by combining the strengths of models that learn the data by recursive binary splits, such as trees, and of "boosting", an adaptive method for combining several simple (base) models.
- a subsample of the training data is selected at random (without replacement) from the entire training data set, and then a simple base learner is fitted on each subsample.
- the final boosted trees model is an additive tree model, constructed by sequentially fitting such base learners on different subsamples. This procedure incorporates randomization, which is known to substantially improve the predictor accuracy and also increase robustness.
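The stochastic boosting procedure described above can be sketched in plain Python. This is a simplified illustration, not the implementation used in the invention: the base learners are single-split stumps, the loss is log loss, and the learning rate, number of rounds and subsample fraction are assumed values.

```python
import math
import random


def fit_stump(rows, residuals):
    """Least-squares regression stump: one binary split on one feature."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [r for row, r in zip(rows, residuals) if row[feature] <= threshold]
            right = [r for row, r in zip(rows, residuals) if row[feature] > threshold]
            if not left or not right:
                continue
            left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
            error = (sum((r - left_mean) ** 2 for r in left)
                     + sum((r - right_mean) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, feature, threshold, left_mean, right_mean)
    return best[1:] if best else None


def predict_proba(model, row):
    """Probability of class 1: sigmoid of the additive stump scores."""
    score = sum(rate * (left if row[f] <= t else right)
                for rate, f, t, left, right in model)
    return 1.0 / (1.0 + math.exp(-score))


def fit_boosting(rows, labels, n_rounds=50, learning_rate=0.3, subsample=0.8, seed=0):
    """Sequentially fit stumps on random subsamples drawn without replacement."""
    rng = random.Random(seed)
    model = []
    n_sub = max(2, int(len(rows) * subsample))
    for _ in range(n_rounds):
        chosen = rng.sample(range(len(rows)), n_sub)  # without replacement
        # Negative gradient of the log loss: label minus current probability.
        residuals = [labels[i] - predict_proba(model, rows[i]) for i in chosen]
        stump = fit_stump([rows[i] for i in chosen], residuals)
        if stump is not None:
            model.append((learning_rate, *stump))
    return model


# Toy expression profiles (two genes); label 1 stands for leiomyosarcoma.
rows = [[1.0, 5.0], [1.2, 3.0], [0.8, 4.0], [1.1, 6.0], [0.9, 5.5],
        [3.0, 5.0], [3.2, 4.0], [2.9, 3.5], [3.1, 6.0], [2.8, 5.2]]
labels = [0] * 5 + [1] * 5
model = fit_boosting(rows, labels)
```

Each round adds one base learner to the additive model, so `predict_proba` returns the class probability that the classification step assigns to a sample.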
- the second step is performed by a regression method, preferably selected from multiple linear regression (MLR), principal component regression (PCR), partial least squares regression (PLSR), artificial neural network (ANN), support vector machine (SVM), random forest (RF), lasso regression, ridge regression and combinations thereof.
- MLR multiple linear regression
- PCR principal component regression
- PLSR partial least squares regression
- ANN artificial neural network
- SVM support vector machine
- RF random forest
- the second step is performed by a classification method, more in particular, by gradient boosting, which includes the value of one or more variables of the gene expression profile collected in step (i) and which contribute to the identification of the subject as suffering from uterine leiomyoma or uterine leiomyosarcoma.
- the second step is performed by a regression method which includes the value of one or more variables of the gene expression profile collected in step (i) and which contribute to the identification of the subject as suffering from uterine leiomyoma or uterine leiomyosarcoma.
- the second method according to the invention is carried out in a patient that has been previously identified by imaging examination, preferably by ultrasonography, as suffering from a uterine myometrial tumor, being either uterine leiomyosarcoma or uterine leiomyoma.
- Second differential diagnostic method of the invention (method based on transcriptomic analysis)
- the invention relates to an in vitro method (hereinafter second method of the invention) for the differential diagnosis of a subject suspected of suffering from uterine leiomyoma or uterine leiomyosarcoma, the method comprising:
- the second method of the invention comprises the determination of the level of expression of at least one gene selected from the list shown in Table 1 in a biological sample obtained from the subject. In some embodiments, the first step of the second method of the invention comprises the determination of the expression levels of at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 80, at least 90, at least 100, at least 110, at least 120, at least 130, at least 140, at least 150, at least 160, at least 180, at least 190, at least 200, at least 210, at least 220, at least 230, at least 240, at least 250, at least 260, at least 270, at least 280, at least 290, at least 300, at least 310, at least 320, at least 330, at least 340, at least 350, at least 360 or more of the genes shown in Table 1.
- the first step of the second method of the invention comprises the determination of the expression levels of the genes mentioned in Table 3.
- the first step of the second method of the invention comprises the determination of the expression levels of the genes mentioned in Table 2.
- the first step of the second method of the invention comprises the determination of the expression levels of the genes mentioned in Table 1.
- the biological sample is a sample containing myometrial cells, or RNA derived from myometrial cells.
- the sample containing myometrial cells is a myometrial biopsy.
- the second method of the invention comprises comparing said level of expression with a reference value.
- Reference value refers to a laboratory value used as a reference for values/data obtained by laboratory examinations of subjects or samples collected from subjects.
- the reference value or reference level can be an absolute value; a relative value; a value that has an upper and/or lower limit; a range of values; an average value; a median value, a mean value, or a value as compared to a particular control or baseline value.
- a reference value can be based on an individual sample value, such as for example, a value obtained from a sample from the subject being tested, but at an earlier point in time or from a non-cancerous tissue.
- the reference value can be based on a large number of samples, such as from a population of subjects of a chronological age-matched group, or based on a pool of samples including or excluding the sample to be tested.
- Various considerations are taken into account when determining the reference value of the marker. Among such considerations are the age, weight, sex, general physical condition of the patient and the like. For example, equal amounts of a group of at least 2, at least 10, at least 100 to preferably more than 1000 subjects, preferably classified according to the foregoing considerations, for example according to various age categories, are taken as the reference group.
- the quantity of the biomarker in a sample from a tested subject may be determined directly relative to the reference value (e.g., in terms of increase or decrease, or fold-increase or fold-decrease).
- this may allow comparison of the quantity of the biomarker in the sample from the subject with the reference value (in other words, measurement of the relative quantity of any one or more biomarkers in the sample from the subject vis-a-vis the reference value) without the need to first determine the respective absolute quantities of said biomarker.
- reference values are the expression level of the gene being compared in a reference sample.
- the “reference sample” may vary depending on whether diagnosis of uterine leiomyosarcoma or of uterine leiomyoma is desired. If the diagnosis of uterine leiomyosarcoma is desired, then the reference sample means a sample obtained from a pool of subjects suffering from uterine leiomyoma or who do not have a history of leiomyoma. Thus, in an embodiment, the reference value is the mean level of expression of the gene or genes in a pool of samples from leiomyoma patients.
- if the diagnosis of uterine leiomyoma is desired, then the reference sample means a sample obtained from a pool of subjects suffering from uterine leiomyosarcoma or who do not have a history of leiomyosarcoma.
- the reference value is the mean level of expression of the gene or genes in a pool of samples from leiomyosarcoma patients.
- the reference value for the expression level of the gene or genes of interest is the mean level of expression of said gene or genes in a pool of samples from primary tumours, preferably obtained from subjects suffering from the same type of cancer as the patient object of the study.
- the reference value is the expression levels of the gene of interest in a pool obtained from primary tumor tissue obtained from patients.
- the expression profile of the genes in the reference sample can preferably be generated from a population of two or more individuals. The population, for example, can comprise 3, 4, 5, 10, 15, 20, 30, 40, 50 or more individuals.
- the expression profile of the genes in the reference sample and in the sample of the individual that is going to be diagnosed according to the methods of the present invention can be generated from the same individual, provided that the profiles to be assayed and the reference profile are generated from biological samples taken at different times and are compared to one another. For example, a sample of an individual can be obtained at the beginning of a study period. A reference biomarker profile from this sample can then be compared with the biomarker profiles generated from subsequent samples of the same individual.
- the reference sample is a pool of samples from several individuals and corresponds to portions of tissue that are far from the tumor area and which have preferably been obtained in the same biopsy but which do not have any anatomopathological characteristic of tumor tissue.
- the level of this marker expressed in tumor tissue from subjects can be compared with this reference value, and thus be assigned a level of deviation with respect to a reference value.
- the “deviation” can be either an increase or a decrease in the expression levels. For example, an increase in expression level above the reference value of at least 1.1-fold, 1.5-fold, 2-fold, 5-fold, 10-fold, 20-fold, 30-fold, 40-fold, 50-fold, 60-fold, 70-fold, 80-fold, 90-fold, 100-fold or even more compared with the reference value is considered as “increased” expression level.
- the expression of a gene is considered increased in a sample of the subject under study when the levels increase with respect to the reference sample by at least 5%, by at least 10%, by at least 15%, by at least 20%, by at least 25%, by at least 30%, by at least 35%, by at least 40%, by at least 45%, by at least 50%, by at least 55%, by at least 60%, by at least 65%, by at least 70%, by at least 75%, by at least 80%, by at least 85%, by at least 90%, by at least 95%, by at least 100%, by at least 110%, by at least 120%, by at least 130%, by at least 140%, by at least 150%, or more.
- a decrease in expression levels below the reference value of at least 0.9-fold, 0.75-fold, 0.2-fold, 0.1-fold, 0.05-fold, 0.025-fold, 0.02-fold, 0.01-fold, 0.005-fold or even less compared with the reference value is considered as “decreased” expression level.
- the expression of a gene is considered decreased when its levels decrease with respect to the reference sample by at least 5%, by at least 10%, by at least 15%, by at least 20%, by at least 25%, by at least 30%, by at least 35%, by at least 40%, by at least 45%, by at least 50%, by at least 55%, by at least 60%, by at least 65%, by at least 70%, by at least 75%, by at least 80%, by at least 85%, by at least 90%, by at least 95%, by at least 100% (i.e., absent).
- the comparison of the expression levels of the gene or genes of interest with the reference value allows differential diagnosis between uterine leiomyoma and uterine leiomyosarcoma.
- the patient is detected as having leiomyosarcoma if the expression level of the gene or genes under examination is/are increased with respect to the expression levels found in leiomyoma samples, which is defined in Table 1 as genes having a logFC higher than 0.
- the patient is detected as having leiomyosarcoma if the expression level of the gene or genes under examination is/are decreased with respect to the expression levels found in leiomyoma samples, which is defined in Table 1 as genes having a logFC lower than 0.
- the second method according to the invention allows the differential diagnosis of uterine leiomyosarcoma when the deviation in the level of expression of the gene or genes is at least four-fold with respect to the reference value or values for said gene or genes, said reference value being the expression level of the same gene or genes determined in a sample from a patient with uterine leiomyoma.
- the second method according to the invention allows the differential diagnosis of uterine leiomyoma when the deviation in the level of expression of the gene or genes is at least four-fold with respect to the reference value or values for said gene or genes, said reference value being the expression level of the same gene or genes determined in a sample from a patient with uterine leiomyosarcoma.
- the second method according to the invention is carried out in a patient that has been previously identified by imaging examination, preferably by ultrasonography, as suffering from a uterine myometrial tumor, being either uterine leiomyosarcoma or uterine leiomyoma.
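The comparison against a reference value, including the four-fold deviation criterion, can be sketched as follows. This is a minimal illustration that assumes the logFC values of Table 1 are base-2 logarithms of the sample-to-reference expression ratio, which this passage does not state; the function names and the default threshold argument are hypothetical.

```python
import math


def log2_fold_change(sample_level, reference_level):
    """Base-2 log of the sample/reference expression ratio (assumed logFC)."""
    return math.log2(sample_level / reference_level)


def classify_deviation(sample_level, reference_level, min_fold=4.0):
    """Return 'increased', 'decreased' or 'unchanged' relative to the reference.

    A deviation counts only if it reaches min_fold in either direction.
    """
    log_fc = log2_fold_change(sample_level, reference_level)
    threshold = math.log2(min_fold)
    if log_fc >= threshold:
        return "increased"
    if log_fc <= -threshold:
        return "decreased"
    return "unchanged"


# Expression in the test sample versus the mean of a leiomyoma reference pool.
call = classify_deviation(sample_level=80.0, reference_level=10.0)
```

A gene with a positive logFC in Table 1 would support leiomyosarcoma when the call is "increased", and a gene with a negative logFC would do so when the call is "decreased".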
- the invention relates to an in vitro method for differential diagnosis (hereinafter “the third method of the invention”) of a subject suspected of suffering from uterine leiomyoma or uterine leiomyosarcoma, the method comprising analyzing in a biological sample from the subject the coverage in at least one gene selected from the group consisting of the TUBB2b gene, the LRRCC1 gene, the NDRG4 gene, the HSF4 gene and the TMPRSS6 gene, wherein an increased coverage in the HSF4 gene, in the NDRG4 gene, in the TMPRSS6 gene and/or in the TUBB2b gene and/or a decreased coverage in the LRRCC1 gene is indicative that the patient has uterine leiomyosarcoma.
- coverage refers to the abundance of sequence tags mapped to a defined sequence which is obtained after the sequence has been determined.
- a given genomic region is read, which may comprise the whole genome or regions thereof, and the number of reads which contain a given sequence tag within the genomic region is defined as the coverage for this specific sequence tag.
- sequence tag is herein used interchangeably with the term "mapped sequence tag” to refer to a sequence read that has been specifically assigned, i.e., mapped, to a larger sequence, e.g., a reference genome, by alignment.
- Mapped sequence tags are uniquely mapped to a reference genome, i.e., they are assigned to a single location on the reference genome. Unless otherwise specified, tags that map to the same sequence on a reference sequence are counted once. Tags may be provided as data structures or other assemblages of data.
- a tag contains a read sequence and associated information for that read such as the location of the sequence in the genome, e.g., the position on a chromosome.
- the location is specified for a positive strand orientation.
- a tag may be defined to allow a limited amount of mismatch in aligning to a reference genome.
- tags that can be mapped to more than one location on a reference genome, i.e., tags that do not map uniquely, may not be included in the analysis.
- Coverage can be quantitatively indicated by sequence tag density (or count of sequence tags), sequence tag density ratio, normalized coverage amount, adjusted coverage values, etc.
- the levels can be displayed graphically on a display as numeric values or proportional bars (i.e., a bar graph) or any other display method known to those skilled in the art.
- the graphic display can provide a visual representation of the amount of copy number variation in the biological sample being evaluated.
- coverage quantity refers to a modification of raw coverage and often represents the relative quantity of sequence tags (sometimes called counts) in a region of a genome such as a bin.
- a coverage quantity may be obtained by normalizing, adjusting and/or correcting the raw coverage or count for a region of the genome.
- a normalized coverage quantity for a region may be obtained by dividing the sequence tag count mapped to the region by the total number of sequence tags mapped to the entire genome. Normalized coverage quantity allows comparison of coverage of a bin across different samples, which may have different depths of sequencing. It differs from sequence dose in that the latter is typically obtained by dividing by the tag count mapped to a subset of the entire genome. The subset is one or more normalizing segments or chromosomes. Coverage quantities, whether or not normalized, may be corrected for global profile variation from region to region on the genome, G-C fraction variations, outliers in robust chromosomes, etc.
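The normalization described above can be sketched as follows. The region names and tag counts are invented for illustration, and the function name is hypothetical.

```python
def normalized_coverage(tag_counts):
    """Divide each region's tag count by the total number of mapped tags."""
    total = sum(tag_counts.values())
    if total == 0:
        raise ValueError("no mapped sequence tags")
    return {region: count / total for region, count in tag_counts.items()}


# Two samples sequenced at a ten-fold different depth.
shallow = {"bin1": 50, "bin2": 150}
deep = {"bin1": 500, "bin2": 1500}
shallow_norm = normalized_coverage(shallow)
deep_norm = normalized_coverage(deep)
```

Both samples give the same normalized values, which is what makes a bin comparable across runs of different sequencing depth.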
- step (a) described above can comprise sequencing at least a portion of the nucleic acid molecules of a test sample to obtain said sequence information for the nucleic acid molecules of the test sample.
- step (c) comprises calculating a single gene dose for each of the genes of interest as the ratio of the number of sequence tags or the other parameter identified for each of the genes of interest and the number of sequence tags or the other parameter identified for the normalizing gene sequence(s).
- gene dose is based on processed sequence coverage quantities derived from the number of sequence tags or another parameter.
- only unique, non-redundant sequence tags are used to calculate the processed sequence coverage quantities or another parameter.
- the processed sequence coverage quantity is a sequence tag density ratio, which is the number of sequence tags standardized by sequence length.
- the processed sequence coverage quantity or the other parameter is a normalized sequence tag or another normalized parameter, which is the number of sequence tags or the other parameter of a sequence of interest divided by that of all or a substantial portion of the genome.
- the processed sequence coverage quantity or the other parameter such as a fragment size parameter is adjusted according to a global profile of the sequence of interest.
- the processed sequence coverage quantity or the other parameter is adjusted according to the within-sample correlation between the GC content and the sequence coverage for the sample being tested.
- the processed sequence coverage quantity or the other parameter results from combinations of these processes, which are further described elsewhere herein.
- a gene dose is calculated as the ratio of the processed sequence coverage or the other parameter for each of the genes of interest and that for the normalizing gene sequence(s).
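The gene dose ratio can be sketched as follows. The normalizing sequence names (`REF_A`, `REF_B`) and the coverage values are placeholders; the specification does not name the normalizing genes in this passage.

```python
def gene_dose(coverage, gene, normalizing_genes):
    """Ratio of a gene's processed coverage to that of the normalizing sequence(s)."""
    denominator = sum(coverage[name] for name in normalizing_genes)
    if denominator == 0:
        raise ValueError("normalizing sequences have zero coverage")
    return coverage[gene] / denominator


# Processed coverage quantities per gene (illustrative numbers only).
coverage = {"HSF4": 120.0, "REF_A": 200.0, "REF_B": 100.0}
dose = gene_dose(coverage, "HSF4", ["REF_A", "REF_B"])
```

Because numerator and denominator come from the same sample, the dose is largely independent of sequencing depth and can be compared with the dose of the same gene in a reference sample.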
- the biological sample is a sample containing myometrial cells or a sample containing DNA from myometrial cells.
- the value is compared to the coverage of the same gene in a reference sample.
- the coverage in the sample of the patient who is to be differentially diagnosed is compared to the coverage in the genomic DNA of a uterine leiomyoma and/or to the coverage in the genomic DNA of a uterine leiomyosarcoma, thereby determining the difference in coverage for the specific gene between the genomic DNA of uterine leiomyoma and the genomic DNA of uterine leiomyosarcoma.
- the term “increased coverage” for a given gene in the context of the third method of the invention means that, when the genome containing the gene is sequenced, the number of reads which contain the sequence of the gene or of sequence tags associated with said gene is increased with respect to the number of reads containing the sequence of the gene or of sequence tags associated with said gene in a reference sample, wherein the reference sample is either a sample from a patient suffering from uterine leiomyosarcoma or a sample from a patient suffering from uterine leiomyoma.
- the coverage of the gene under consideration is at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 100% higher than the coverage of the gene in the reference sample.
- the term “decreased coverage” for a given gene in the context of the third method of the invention means that, when the genome containing the gene is sequenced, the number of reads which contain the sequence of the gene or of sequence tags associated with said gene is decreased with respect to the number of reads containing the sequence of the gene or of sequence tags associated with said gene in a reference sample, wherein the reference sample is either a sample from a patient suffering from uterine leiomyosarcoma or a sample from a patient suffering from uterine leiomyoma.
- the coverage of the gene under consideration is less than 90%, less than 80%, less than 70%, less than 60%, less than 50%, less than 40%, less than 30%, less than 20%, less than 10%, or lower, of the coverage of the gene in the reference sample.
- a subject can be diagnosed as having uterine leiomyosarcoma if the subject shows an increased coverage in the HSF4 gene, in the NDRG4 gene, in the TMPRSS6 gene and/or in the TUBB2b gene with respect to a reference sample.
- the subject can be diagnosed as having uterine leiomyosarcoma if the subject shows a decreased coverage in the LRRCC1 gene with respect to a reference sample.
- a subject can be diagnosed as having uterine leiomyosarcoma if the subject shows an increased coverage in the HSF4 gene, in the NDRG4 gene, in the TMPRSS6 gene and in the TUBB2b gene and a decreased coverage in the LRRCC1 gene with respect to a reference sample.
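The coverage-based decision rule can be sketched as follows. It implements the strictest embodiment above (all four gains plus the LRRCC1 loss) and assumes a 10% minimum change, the smallest threshold listed for increased or decreased coverage; the function names and coverage values are hypothetical.

```python
INCREASED_IN_LMS = ("HSF4", "NDRG4", "TMPRSS6", "TUBB2b")
DECREASED_IN_LMS = ("LRRCC1",)


def coverage_calls(sample, reference, min_change=0.10):
    """Call each marker gene 'increased', 'decreased' or 'unchanged'."""
    calls = {}
    for gene in INCREASED_IN_LMS + DECREASED_IN_LMS:
        ratio = sample[gene] / reference[gene]
        if ratio >= 1 + min_change:
            calls[gene] = "increased"
        elif ratio <= 1 - min_change:
            calls[gene] = "decreased"
        else:
            calls[gene] = "unchanged"
    return calls


def consistent_with_leiomyosarcoma(calls):
    """True when all four gains and the LRRCC1 loss are present."""
    return (all(calls[g] == "increased" for g in INCREASED_IN_LMS)
            and all(calls[g] == "decreased" for g in DECREASED_IN_LMS))


reference = {"HSF4": 100.0, "NDRG4": 100.0, "TMPRSS6": 100.0,
             "TUBB2b": 100.0, "LRRCC1": 100.0}
sample = {"HSF4": 150.0, "NDRG4": 130.0, "TMPRSS6": 160.0,
          "TUBB2b": 120.0, "LRRCC1": 60.0}
result = consistent_with_leiomyosarcoma(coverage_calls(sample, reference))
```

The "and/or" embodiments would relax `consistent_with_leiomyosarcoma` to accept any subset of the expected changes.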
- the second method according to the invention is carried out in a patient that has been previously identified by imaging examination, preferably by ultrasonography, as suffering from a uterine myometrial tumor, being either uterine leiomyosarcoma or uterine leiomyoma.
- First diagnostic method of the invention (method based on genomic mutational analysis)
- RNAseq whole-exome and RNA sequencing
- the invention relates to an in vitro method (hereinafter referred to interchangeably as “fourth method of the invention” or “diagnostic method of the invention”) for the diagnosis of a uterine tumor selected from the group consisting of uterine leiomyoma and uterine leiomyosarcoma in a subject, the method comprising determining in the whole-exome sequence of a biological sample from the subject the value of an index which correlates with the number of single nucleotide variants which are characteristic of the COSMIC mutational signature 12 and/or of the COSMIC mutational signature 20, wherein an increase in said index with respect to a reference sample is indicative that the subject is suffering from uterine leiomyosarcoma or from uterine leiomyoma.
- diagnosis refers both to the process of attempting to determine and/or identify a possible disease in a subject, i.e. the diagnostic procedure, and to the opinion reached by this process, i.e. the diagnostic opinion. As such, it can also be regarded as an attempt at classification of an individual's condition into separate and distinct categories that allow medical decisions about treatment and prognosis to be made.
- diagnosis of the uterine tumor relates to the capacity to identify or detect the presence of a tumor in a subject. This detection, as it is understood by a person skilled in the art, does not claim to be correct in 100% of the analyzed samples. However, it requires that a statistically significant amount of the analyzed samples are classified correctly.
- the amount that is statistically significant can be established by a person skilled in the art by means of using different statistical tools; illustrative, non-limiting examples of said statistical tools include determining confidence intervals, determining the p-value, the Student’s t-test or Fisher’s discriminant functions, etc. (see, for example, Dowdy and Wearden, Statistics for Research, John Wiley & Sons, New York 1983).
- the confidence intervals are preferably at least 90%, at least 95%, at least 97%, at least 98% or at least 99%.
- the p-value is preferably less than 0.1, less than 0.05, less than 0.01, less than 0.005 or less than 0.0001.
- the teachings of the present invention preferably allow correctly diagnosing in at least 60%, in at least 70%, in at least 80%, or in at least 90% of the subjects of a determined group or population analyzed.
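- As an illustration of how a confidence interval on the fraction of correctly diagnosed subjects might be obtained, the following minimal sketch computes a Wilson score interval. The choice of the Wilson interval, the function name `wilson_interval` and the default `z = 1.96` (roughly a 95% interval) are assumptions made for this example and are not prescribed by the text.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for the proportion of correctly classified samples.

    correct: number of samples classified correctly (hypothetical input)
    total:   number of samples analyzed
    z:       normal quantile; 1.96 corresponds to roughly 95% confidence
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    low, high = wilson_interval(45, 50)
    print(f"fraction correct: 0.90, interval: {low:.3f} to {high:.3f}")
```

- If the lower bound of such an interval exceeds the chosen threshold (for example 60%), the classification rate could be regarded as meeting that threshold at the stated confidence level.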
- “uterine leiomyoma” and “uterine leiomyosarcoma” have been defined above and are equally applicable to the fourth method according to the invention.
- the fourth method of the invention comprises determining in a whole-exome sequence of a biological sample from a subject the value of an index which correlates with the number of single nucleotide variants which are characteristic of the COSMIC mutational signature 12 and/or of the COSMIC mutational signature 20.
- whole exome sequence generally means the sequence that results from the sequencing of all the protein-coding genes in a genome (known as the exome). It consists of first selecting only the subset of DNA that encodes proteins (known as exons) and then sequencing this DNA using any high-throughput DNA sequencing technology. Humans have about 180,000 exons, constituting about 1.5 percent of the human genome, or approximately million base pairs. In particular, the exome sequencing may be carried out by next-generation sequencing.
- the fourth method according to the invention comprises determining within the whole-exome sequence data the value of an index which correlates with the number of single nucleotide variants which are characteristic of the COSMIC mutational signature 12 and/or of the COSMIC mutational signature 20.
- “mutational index which correlates with the number of single nucleotide variants”, as used herein, refers to a numeric value which defines the number of one or more types of predetermined single nucleotide variants within a given sequence dataset. In some embodiments, the index is the number of all the mutations present in the sequence dataset.
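- A minimal sketch of such a count-based index is given below. It assumes that each single nucleotide variant is available as a (reference base, alternate base) pair and that variants are grouped into the six pyrimidine-referenced substitution classes commonly used for mutational signatures; the function names and the input layout are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Return the substitution in pyrimidine-reference form, e.g. 'T>C'."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single nucleotide variant: {ref}>{alt}")
    # A purine reference base is reported on the opposite strand, so A>G becomes T>C.
    if ref in ("A", "G"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def mutational_index(snvs, classes_of_interest):
    """Count the SNVs that fall into the predetermined substitution classes."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in snvs)
    return sum(counts[c] for c in classes_of_interest)


if __name__ == "__main__":
    variants = [("A", "G"), ("T", "C"), ("C", "T"), ("G", "A")]
    print(mutational_index(variants, {"T>C"}))
```

- Passing all six classes as `classes_of_interest` yields the total number of single nucleotide variants, which corresponds to the embodiment in which the index is the number of all mutations in the dataset.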
- the mutational index corresponds to the mutational profile similarity as described in Blokzijl et al. (Genome Med., 2018, 10(1):33, doi: 10.1186/s13073-018-0539-0) which measures the similarity between two mutational profiles.
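- The similarity between two mutational profiles can be sketched as follows, assuming that it is computed as the cosine similarity between two equal-length vectors of mutation counts or frequencies (for instance, one entry per substitution type in its sequence context). The function name and the vector layout are assumptions of this example.

```python
import math


def cosine_similarity(profile, signature):
    """Cosine similarity between a sample profile and a reference signature.

    Both arguments are equal-length sequences of non-negative numbers,
    one entry per mutation type (hypothetical layout).
    """
    if len(profile) != len(signature):
        raise ValueError("profiles must have the same number of mutation types")
    dot = sum(p * s for p, s in zip(profile, signature))
    norm = math.sqrt(sum(p * p for p in profile)) * math.sqrt(sum(s * s for s in signature))
    if norm == 0:
        return 0.0  # an empty profile carries no information about similarity
    return dot / norm


if __name__ == "__main__":
    sample = [12, 3, 40, 5]
    reference = [0.2, 0.05, 0.65, 0.1]
    print(round(cosine_similarity(sample, reference), 3))
```

- A value close to 1 indicates that the sample profile has nearly the same shape as the reference signature, independently of the total number of mutations; a value of 0 indicates no shared mutation types.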
- COSMIC mutational signature refers to a list of somatic single nucleotide variants that appear in cancer cells as a result of the mutational processes that have been operative in said cells as described by Alexandrov et al. (Nature, 2013, 500(7463):415-21, doi: 10.1038/nature12477). COSMIC is the acronym of “Catalogue of Somatic Mutations in Cancer”.
- COSMIC mutational signature 12 is characterized by a transcriptional strand bias for T>C substitutions with more mutations of A than T on the untranscribed strands of genes consistent with damage to adenine and repair by transcription-coupled nucleotide excision repair.
- COSMIC mutational signature 20 results from concurrent POLD1 (polymerase delta 1) mutations and defective DNA mismatch repair, leading to microsatellite instability. Mutational profiles characterized by microsatellite instability are well known, as are methods for their identification once a genomic sequence is available (see e.g. Shah et al., Cancer Res. 2010 Jan 15; 70(2): 431-435).
- step (ii) of the fourth method of the invention comprises the determination of the mutational index for COSMIC signatures 12 and 20,
- the diagnostic method of the invention comprises diagnosing the presence of uterine leiomyosarcoma or uterine leiomyoma if said index is increased with respect to a reference sample.
- the term “reference value” has been defined above in the context of the first, second or third method of the invention and is equally applicable to the fourth method of the invention, with the exception that, in this case, the reference sample which is used for the determination of the reference value is a sample obtained from a subject or from a pool of subjects who do not suffer from uterine cancer or who do not suffer from leiomyoma or leiomyosarcoma.
- the reference value is the mean mutational index in a pool of samples from patients which do not suffer uterine cancer or which do not suffer leiomyoma or leiomyosarcoma.
- the reference value for the mutational index is the mean mutational index in a pool of samples preferably obtained from subjects suffering from the same type of cancer as the patient object of the study.
- the reference value is the mutational index in a pool from primary tumor tissues obtained from patients.
- the values can be compared with the reference value, and thus be assigned a level of increase with respect to a reference value.
- an increase of at least 1.1-fold, 1.5-fold, 2-fold, 5-fold, 10-fold, 20-fold, 30-fold, 40-fold, 50-fold, 60-fold, 70-fold, 80-fold, 90-fold, 100-fold or even more compared with the reference value is considered an “increased” level.
- the value is considered increased in a sample of the subject under study when the value increases with respect to the reference sample by at least 5%, by at least 10%, by at least 15%, by at least 20%, by at least 25%, by at least 30%, by at least 35%, by at least 40%, by at least 45%, by at least 50%, by at least 55%, by at least 60%, by at least 65%, by at least 70%, by at least 75%, by at least 80%, by at least 85%, by at least 90%, by at least 95%, by at least 100%, by at least 110%, by at least 120%, by at least 130%, by at least 140%, by at least 150%, or more.
- the comparison of the mutational values for each of the COSMIC signatures or for both COSMIC signatures with the reference value for each of the signatures allows the diagnosis of a uterine tumor selected from uterine leiomyoma or uterine leiomyosarcoma.
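- The comparison against a reference value can be sketched as below, assuming that the reference value is the mean mutational index of a pool of reference samples and that the increase is expressed both as a fold change and as a percentage. The function name, the return layout and the default threshold of 1.1-fold are illustrative assumptions.

```python
from statistics import mean


def is_increased(sample_index, reference_indices, min_fold=1.1):
    """Compare a sample's mutational index with the mean of a reference pool.

    Returns (flag, fold_change, percent_increase), where flag is True when the
    fold change reaches the chosen threshold (hypothetical default: 1.1-fold).
    """
    reference_value = mean(reference_indices)
    if reference_value <= 0:
        raise ValueError("reference value must be positive to compute a fold change")
    fold = sample_index / reference_value
    percent = (fold - 1) * 100
    return fold >= min_fold, fold, percent


if __name__ == "__main__":
    flag, fold, percent = is_increased(3.0, [1.0, 2.0, 3.0])
    print(flag, fold, percent)
```

- A stricter embodiment would simply pass a larger `min_fold`, for example 2 for a 2-fold increase, which is equivalent to a 100% increase over the reference value.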
- the biological sample is a sample containing myometrial cells or DNA derived from myometrial cells.
- the sample containing myometrial cells is a myometrial biopsy.
- the fourth method according to the invention is carried out in a patient that has been previously identified as suffering uterine leiomyosarcoma or uterine leiomyoma by imaging examination, preferably by ultrasonography.
- Prognostic method of the invention (method based on genomic mutational data)
- the invention relates to a method (hereinafter referred to indistinctly as “fifth method of the invention” or “prognostic method of the invention”) for the prognosis of a patient diagnosed with uterine leiomyosarcoma, comprising
- step (ii) comparing the number of CNVs obtained in step (i) with a reference value wherein the presence of an increased number of CNVs with respect to the reference value, is indicative of a bad prognosis of uterine leiomyosarcoma.
- the prognostic or fifth method according to the invention allows the determination of the prognosis of a patient suffering from uterine leiomyosarcoma.
- prognosis refers to the prediction in a subject having a leiomyosarcoma of the likelihood of cancer-attributable death or progression, including recurrence, metastatic spread, and drug resistance, of a leiomyosarcoma.
- Prognosis may also be referred to in terms of “aggressiveness” or “severity”: an aggressive cancer is determined to have a high risk of negative outcome (i.e., negative or poor prognosis) and a non-aggressive cancer has a low risk of negative outcome (i.e., positive or favorable prognosis).
- tumour is a cell-proliferation disorder that has the biological capability to rapidly spread outside of its primary location or organ.
- Indicators of tumour aggressiveness include, without limitation, tumour stage, tumour grade, Gleason grade, Gleason score, nodal status, and survival.
- the term “survival” is not limited to mean survival until mortality (wherein said mortality may be either irrespective of cause or related to a cell-proliferation disorder), but may also be used in combination with other terms to define clinical outcomes (e.g., “recurrence-free survival”, in which the term “recurrence” includes both localized and distant recurrence; “metastasis-free survival”; “disease-free survival”, in which the term “disease” includes cancer and diseases associated therewith).
- the length of the survival may be calculated by reference to a defined starting point (e.g., time of diagnosis or start of treatment) and a defined end point. Accordingly, a negative or poor prognosis is defined by a lower post-treatment survival term or survival rate.
- a positive or good prognosis is defined by an elevated post-treatment survival term or survival rate.
- prognosis is provided as the time of progression free survival or overall survival.
- said determination is not usually correct for all (i.e., 100%) of the patients to be identified.
- the term requires being able to identify a significant part of the subjects.
- One skilled in the art can readily determine if a part is statistically significant using several well-known statistical evaluation tools, for example, the determination of confidence intervals, the determination of p-values, Student’s t-test, Mann-Whitney test, etc. Details can be found in Dowdy and Wearden, Statistics for Research, John Wiley and Sons, New York 1983.
- Preferred confidence intervals are at least 90%, at least 95%, at least 97%, at least 98%, or at least 99%.
- the p-values are preferably 0.1, 0.05, 0.01, 0.005, or 0.0001. More preferably, at least 60%, at least 70%, at least 80%, or at least 90% of the subjects of a population can be suitably identified by the method of the present invention.
- the prognostic method according to the invention comprises determining in a biological sample of the subject the presence of at least one of the CNVs shown in Table 5.
- “CNV” or “copy number variation” refers to variation in the number of copies of a nucleic acid sequence present in a test sample in comparison with the copy number of the nucleic acid sequence present in a reference sample.
- the nucleic acid sequence is 1 kb or larger.
- the nucleic acid sequence is a whole chromosome or significant portion thereof.
- a “copy number variant” refers to the sequence of nucleic acid in which copy-number differences are found by comparison of a nucleic acid sequence of interest in a test sample with an expected level of the nucleic acid sequence of interest. For example, the level of the nucleic acid sequence of interest in the test sample is compared to that present in a qualified sample.
- Copy number variants/variations include deletions, including microdeletions, insertions, including microinsertions, duplications, multiplications, and translocations.
- CNVs encompass chromosomal aneuploidies and partial aneuploidies.
- Methods for determining CNV of a gene of interest are well-known in the art such as, for instance, the methods disclosed in U.S. patent application Ser. No. 16/913,965; Hastings et al., Nat Rev Genet; 10(8):551-64 (2009); and Shishido et al., Psychiatry Clin Neurosci, 68(2):85-95 (2014), the disclosures of which are incorporated by reference herein.
- Existing methods to determine CNVs typically include cytogenetic methods such as fluorescent in situ hybridization, comparative genomic hybridization, and/or virtual karyotyping with SNP arrays.
- Other methods include next-generation sequencing, quantitative PCR (qPCR), paralog-ratio testing (PRT) and molecular copy number counting (MCC).
- qPCR compares threshold cycles (Ct) between the target gene and a reference sequence with normal copy numbers, to generate ΔCt values which are used for CNV calculation.
- This method has been used in large-scale CNV analysis in detecting disease associations, for example, psoriasis and Crohn's disease. With the development of genome-wide CNV screening, qPCR is often used as a confirmation method for computationally identified loci.
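- The qPCR-based calculation can be sketched as follows. The sketch assumes the commonly used comparative Ct approach with near-100% amplification efficiency, in which ΔCt is the target Ct minus the reference Ct, and the copy number is obtained relative to a calibrator sample of known copy number (two copies by default). The function names, the calibrator and the efficiency assumption are not taken from the text.

```python
def delta_ct(ct_target, ct_reference):
    """ΔCt: threshold cycle of the target gene minus that of the reference sequence."""
    return ct_target - ct_reference


def estimate_copy_number(sample_dct, calibrator_dct, calibrator_copies=2.0):
    """Estimate copy number from ΔCt values, assuming perfect doubling per cycle.

    sample_dct:        ΔCt measured in the test sample
    calibrator_dct:    ΔCt measured in a calibrator with known copy number
    calibrator_copies: copy number of the calibrator (assumed diploid by default)
    """
    ddct = sample_dct - calibrator_dct
    return calibrator_copies * 2 ** (-ddct)


if __name__ == "__main__":
    sample = delta_ct(ct_target=24.0, ct_reference=25.0)
    calibrator = delta_ct(ct_target=25.0, ct_reference=25.0)
    print(estimate_copy_number(sample, calibrator))
```

- A target that crosses the threshold one cycle earlier than in the calibrator, relative to the reference, is thus estimated to be present at twice the calibrator copy number.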
- multiplex PCR-based approaches such as multiplex amplifiable probe hybridization, multiplex ligation-dependent probe amplification, multiplex PCR-based real-time invader assay, quantitative multiplex PCR of short fluorescent fragments, and multiplex amplicon quantification, have also been used for targeted screening and validation of CNVs.
- CNV variation is determined by whole genome sequencing. In some embodiments, CNV variation is determined by whole exome sequencing. Both the whole genome sequence and the whole exome sequence can be carried out by Next Generation Sequencing.
- Next Generation Sequencing refers to sequencing technologies having high-throughput sequencing as compared to traditional Sanger- and capillary electrophoresis-based approaches, wherein the sequencing process is performed in parallel, for example producing thousands or millions of relatively small sequence reads at a time.
- determining copy number variation includes the steps of: a. providing at least two sets of first polynucleotides, wherein each set maps to a different reference sequence in a genome, and, for each set of first polynucleotides; i. amplifying the polynucleotides to produce a set of amplified polynucleotides; ii. sequencing a subset of the set of amplified polynucleotides, to produce a set of sequencing reads; iii. grouping sequence reads sequenced from amplified polynucleotides into families, each family amplified from the same first polynucleotide in the set; iv. inferring a quantitative measure of families in the set; v. determining copy number variation by comparing the quantitative measure of families in each set.
- the method for determining the presence or absence of any CNV in a sample comprises (a) obtaining sequence information for nucleic acids in the sample; (b) using the sequence information and the method described above to identify a number of sequence tags, sequence coverage quantity, a fragment size parameter, or another parameter for each of the genes of interest and to identify a number of sequence tags or another parameter for one or more normalizing gene sequences; (c) using the number of sequence tags or the other parameter identified for each of the genes of interest and the number of sequence tags or the other parameter identified for each of the normalizing genes to calculate a single gene dose for each of the genes of interest; and (d) comparing each gene dose to a threshold value, and thereby determining the presence or absence of any complete CNVs in the sample.
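- Steps iii to v above can be sketched as follows. The sketch assumes that every sequencing read has already been assigned to a reference set and carries an identifier of the original polynucleotide it was amplified from (for instance a molecular barcode); this identifier, the input layout and the function names are hypothetical, and the quantitative measure is taken to be the number of distinct families.

```python
from collections import defaultdict


def count_families(reads):
    """Group reads into families and count the distinct families per reference set.

    reads: iterable of (reference_set, family_id) pairs, one pair per sequencing read.
    Reads sharing a family_id within a set derive from the same original polynucleotide.
    """
    families = defaultdict(set)
    for reference_set, family_id in reads:
        families[reference_set].add(family_id)
    return {name: len(ids) for name, ids in families.items()}


def family_ratio(reads, test_set, control_set):
    """Compare the number of families in a test set with that in a control set."""
    counts = count_families(reads)
    if counts.get(control_set, 0) == 0:
        raise ValueError("control set has no families")
    return counts.get(test_set, 0) / counts[control_set]


if __name__ == "__main__":
    reads = [
        ("geneA", "f1"), ("geneA", "f1"), ("geneA", "f2"), ("geneA", "f3"), ("geneA", "f4"),
        ("control", "f1"), ("control", "f2"), ("control", "f2"),
    ]
    print(count_families(reads), family_ratio(reads, "geneA", "control"))
```

- Counting families rather than raw reads removes the effect of unequal amplification, since duplicate reads of the same original molecule contribute only once.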
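- Steps (c) and (d) above can be sketched as below. The sketch assumes that the gene dose is the ratio of the sequence tag count of a gene of interest to the summed tag count of the normalizing genes, and that a gain or a loss is called against an upper and a lower threshold. The two-threshold scheme, the function name and the numeric values are illustrative assumptions; the text only requires comparison with a threshold value.

```python
def call_cnvs(tag_counts, genes_of_interest, normalizing_genes, lower, upper):
    """Compute one gene dose per gene of interest and compare it with thresholds.

    tag_counts:        mapping gene name -> number of sequence tags
    normalizing_genes: genes whose tag counts are used for normalization
    lower, upper:      hypothetical dose thresholds for calling a loss or a gain
    Returns a mapping gene name -> (dose, call).
    """
    norm_total = sum(tag_counts[g] for g in normalizing_genes)
    if norm_total <= 0:
        raise ValueError("normalizing genes must have a positive tag count")
    calls = {}
    for gene in genes_of_interest:
        dose = tag_counts[gene] / norm_total
        if dose > upper:
            call = "gain"
        elif dose < lower:
            call = "loss"
        else:
            call = "none"
        calls[gene] = (dose, call)
    return calls


if __name__ == "__main__":
    counts = {"G1": 300, "G2": 100, "G3": 40, "N1": 100, "N2": 100}
    print(call_cnvs(counts, ["G1", "G2", "G3"], ["N1", "N2"], lower=0.3, upper=1.0))
```

- In practice the thresholds would be derived from qualified samples with known copy number rather than fixed by hand.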
- sequence tag has been defined above in the context of the third method according to the invention and is herein used interchangeably with the term “mapped sequence tag” to refer to a sequence read that has been specifically assigned,
- Mapped sequence tags are uniquely mapped to a reference genome, i.e., they are assigned to a single location in the reference genome. Unless otherwise specified, tags that map to the same sequence on a reference sequence are counted once.
- Tags may be provided as data structures or other assemblages of data.
- a tag contains a read sequence and associated information for that read such as the location of the sequence in the genome, e.g., the position on a chromosome. In certain embodiments, the location is specified for a positive strand orientation.
- a tag may be defined to allow a limited amount of mismatch in aligning to a reference genome.
- tags that can be mapped to more than one location on a reference genome, i.e., tags that do not map uniquely may not be included in the analysis.
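- The tag-handling rules above can be sketched as follows, assuming that each read comes with the list of reference locations it aligns to. Reads with more than one candidate location are excluded, and reads that map to the same location are counted once. The tuple layout `(chromosome, position, strand)` and the function name are hypothetical.

```python
def count_unique_tags(alignments):
    """Count uniquely mapped sequence tags, counting identical locations once.

    alignments: iterable of (read_id, locations) pairs, where locations is a list of
    (chromosome, position, strand) tuples to which the read can be aligned.
    """
    seen_locations = set()
    for _read_id, locations in alignments:
        if len(locations) != 1:
            continue  # unmapped or multi-mapped reads are left out of the analysis
        seen_locations.add(locations[0])
    return len(seen_locations)


if __name__ == "__main__":
    example = [
        ("r1", [("chr1", 100, "+")]),
        ("r2", [("chr1", 100, "+")]),                    # same location as r1
        ("r3", [("chr2", 5, "+"), ("chr3", 9, "+")]),    # maps to two locations
        ("r4", [("chr1", 250, "+")]),
        ("r5", []),                                      # unmapped
    ]
    print(count_unique_tags(example))
```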
- the CNV is a deletion. In some embodiments, the CNV is a duplication. In some embodiments, the first step comprises the determination of at least
- the term “increased copy number for a gene” in the context of the third differential diagnostic method of the invention is understood as that at least one additional copy of the gene is present in the sample from the patient which is to be differentially diagnosed with respect to a reference sample, wherein the reference sample is either a sample of a patient suffering from uterine leiomyosarcoma or a sample from a patient suffering uterine leiomyoma.
- the copy number of the gene under consideration is at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or more copies of the gene in the reference sample.
- the term “decreased copy number” in the context of the fourth diagnostic method of the invention is understood that at least one less copy of the gene is present in the subject which is to be differentially diagnosed as compared to the copy number of the same gene in the reference sample, wherein the reference sample is a sample from the patient which is to be differentially diagnosed and the reference sample is either a sample of a patient suffering from uterine leiomyosarcoma or from a patient suffering uterine leiomyoma.
- the biological sample is a sample containing myometrial cells, DNA derived from myometrial cells or RNA derived from myometrial cells.
- the sample containing myometrial cells is a myometrial biopsy.
- the biological sample is a biofluid.
- the biofluid is plasma, blood, serum, urine or uterine fluid.
- the prognostic or fifth method of the invention comprises determining that the patient shows a bad prognosis if one or more of the CNVs listed in Table 5 are present in the sample from the subject.
- bad prognosis denotes a significantly less favorable probability of survival after patient treatment in the group of patients defined as “bad prognosis” compared with the group of patients defined as “good prognosis”. According to the invention, the term “bad prognosis” also denotes a significantly less favorable probability of not needing treatment to survive in the group of patients defined as “bad prognosis” compared with the group of patients defined as “good prognosis”. In one embodiment, the prognosis of the patient is measured as survival, as disease-free progression or using any other parameter which is reflective of the outcome of the patient.
- the invention relates to an in vitro method for selecting a subject suspected of suffering from uterine leiomyoma or uterine leiomyosarcoma as a candidate to receive an adequate therapy to treat uterine leiomyosarcoma or uterine leiomyoma, the method comprising:
- the in vitro method for selecting a subject suspected of suffering from uterine leiomyoma or uterine leiomyosarcoma as a candidate to receive an adequate therapy to treat uterine leiomyosarcoma or uterine leiomyoma comprises determining whether the patient is suffering from uterine leiomyoma or leiomyosarcoma by using any of the diagnostic or differential diagnostic methods according to the invention.
- the method comprises selecting a patient to be treated with either a therapy adequate for the treatment of uterine leiomyosarcoma if the patient is diagnosed as suffering from uterine leiomyosarcoma or with a therapy adequate for the treatment of uterine leiomyoma if the patient is diagnosed as suffering from uterine leiomyoma.
- when the patient is diagnosed with leiomyosarcoma, the patient is selected to be treated by a therapy selected from the group consisting of surgery, radiation therapy, chemotherapy, hormonal therapy, or targeted therapy.
- the surgery is a simple hysterectomy, radical hysterectomy, or bilateral salpingo-oophorectomy.
- the chemotherapy includes one or more drugs selected from the group consisting of dacarbazine (DTIC), docetaxel, doxorubicin, epirubicin, gemcitabine, ifosfamide, paclitaxel, temozolomide, trabectedin and vinorelbine.
- the hormonal therapy comprises:
- Gonadotropin-releasing hormone such as goserelin or leuprolide, leuprorelin acetate, leuprorelin acetate sustained release depot (ATRIGEL), triptorelin pamoate, buserelin, nafarelin, histrelin, goserelin, deslorelin, degarelix, ozarelix, ABT-620 (elagolix), TAK-385 (relugolix), EP-100, KLH-2109 or triptorelin and goserelin acetate, or
- An aromatase inhibitor which is defined as a compound which inhibits estrogen production, for instance, the conversion of the substrates androstenedione and testosterone to estrone and estradiol, respectively, and that includes, but is not limited to steroids, especially atamestane, exemestane and formestane and, in particular, non-steroids, especially aminoglutethimide, roglethimide, pyridoglutethimide, trilostane, testolactone, ketoconazole, vorozole, fadrozole, anastrozole and letrozole.
- therapy is a targeted therapy.
- targeted therapy refers to drugs which attack specific genetic mutations within cancer cells, such as leiomyosarcoma while minimizing harm to healthy cells.
- the targeted therapy comprises the use of pazopanib.
- when the patient is diagnosed with leiomyoma, the patient is selected to be treated by a morcellation procedure or other surgical methods for removing leiomyomas after they have been determined not to comprise any leiomyosarcomas.
- large tissue masses such as fibroid tissue masses (leiomyomas) are traditionally excised during a surgical procedure and removed intact from the patient through the surgical incision. These tissue masses can easily be several centimeters in diameter or larger.
- the surgery is typically conducted using incisions of less than 1 centimeter, and often 5 millimeters or less.
- Morcellation medical devices are well-known in the art.
- the instruments described in U.S. Pat. Nos. 5,037,379; 5,403,276; 5,520,634; 5,327,896 and 5,443,472 can be used herein (each patent is incorporated herein by reference).
- excised tissue is morcellated (i.e. debulked), collected and removed from the patient's body through, for example, a surgical trocar or directly through one of the surgical incisions.
- Mechanical morcellators cut tissue using, for example, sharp end-effectors such as rotating blades.
- Electrosurgical and ultrasonic morcellators use energy to morcellate tissue.
- a system for fragmenting tissue utilizing an ultrasonic surgical instrument is described in "Physics of Ultrasonic Surgery Using Tissue Fragmentation", 1995 IEEE Ultrasonics Symposium Proceedings, pages 1597-1600.
- the excised tissue can be transferred to a specimen bag prior to being morcellated.
- some morcellators are used without specimen bags. Specimen bags are, therefore, designed to hold excised tissue without spilling tissue, or tissue components, into the abdominal cavity during morcellation.
- Ultrasonic morcellation instruments may be particularly advantageous for use in certain surgical procedures and for debulking certain types of tissue.
- a blunt or rounded ultrasonic morcellator tip may reduce the possibility of unintended cutting or tearing of a specimen bag while the ultrasonic energy morcellates the tissue.
- the invention also provides methods for the treatment of subjects who have been identified as suffering from leiomyosarcomas by any of the differential diagnostic or diagnostic methods according to the invention, wherein if the subject has been diagnosed as suffering from a leiomyosarcoma, the subject is treated with a therapy adequate for the treatment of leiomyosarcoma.
- the therapy is selected from the group consisting of surgery, radiation therapy, chemotherapy, hormonal therapy or a replacement therapy.
- treatment comprises any type of therapy, which aims at terminating, preventing, ameliorating and/or reducing the susceptibility to a clinical condition as described herein.
- the term treatment relates to prophylactic treatment (i.e. a therapy to reduce the susceptibility of a clinical condition, a disorder or condition as defined herein).
- “treatment,” “treating,” and the like, as used herein, refer to obtaining a desired pharmacologic and/or physiologic effect, covering any treatment of a pathological condition or disorder in a mammal, including a human.
- treatment includes (1) preventing the disorder from occurring or recurring in a subject, (2) inhibiting the disorder, such as arresting its development, (3) stopping or terminating the disorder or at least symptoms associated therewith, so that the host no longer suffers from the disorder or its symptoms, such as causing regression of the disorder or its symptoms, for example, by restoring or repairing a lost, missing or defective function, or stimulating an inefficient process, or (4) relieving, alleviating, or ameliorating the disorder, or symptoms associated therewith, where ameliorating is used in a broad sense to refer to at least a reduction in the magnitude of a parameter, such as inflammation, pain, and/or immune deficiency.
- the therapeutic methods according to the invention are applied to patients who have been diagnosed as suffering from leiomyosarcoma by using any of the diagnostic methods for leiomyosarcoma or the differential diagnostic methods according to the invention.
- the method comprises a first step in which the diagnostic method for leiomyosarcoma or the differential diagnostic methods is applied to the patient and, a second step in which patients diagnosed as suffering leiomyosarcoma are selected and a third step in which the patients are treated with a therapy adequate for the treatment of leiomyosarcoma.
- Suitable surgical therapies, radiation therapies, chemotherapies, hormonal therapies or targeted therapies have been described in the context of the methods for selecting a therapy for a subject based on the diagnostic or differential diagnostic methods according to the invention and are equally applicable to the therapeutic methods according to the invention.
- the invention relates to a kit, package or device that contains reagents adequate for implementing any of the methods of the invention. It will be understood that, depending on the nature of the method, the reagents adequate for its implementation will vary.
- kit is understood as a product containing the different reagents required for carrying out the methods of the invention packaged such that it allows being transported and stored.
- the materials suitable for the packaging of the components of the kit include glass, plastic (polyethylene, polypropylene, polycarbonate, and the like), bottles, vials, paper, sachets, and the like. Where there is more than one component in a kit, the components may be packaged together if suitable, or the kit will generally contain a second, third or other additional container into which the additional components may be separately placed. However, in some embodiments, certain combinations of components may be packaged together in one container means.
- a kit can also include a means for containing any reagent containers in close confinement for commercial sale.
- Such containers may include injection or blow-molded plastic containers into which the desired vials are retained.
- One or more compositions of a kit can be lyophilized. In some embodiments, all compositions of a kit of the disclosure will be lyophilized. In some embodiments, a kit of the disclosure with one or more lyophilized agents will be supplied with a re-constitution buffer. Reagents and components of kits may be comprised in one or more suitable container means.
- a container means may generally comprise at least one vial, test tube, flask, bottle, syringe or other container means, into which a component may be placed, and preferably, suitably aliquoted.
- kits according to the invention can also comprise one or more reagents for preparing crude cell lysates and/or reagents for extracting, isolating and/or purification of nucleic acids from a sample.
- Additional components can comprise particles with affinity for nucleic acids and/or solid supports with affinity for nucleic acids, one or more wash buffers, binding enhancers, binding solutions, polar solvents, alcohols, elution buffers, filter membranes and/or columns for isolation of DNA/RNA.
- a kit may further comprise reagents for downstream processing of an isolated nucleic acid and may include without limitation at least one RNase inhibitor; at least one cDNA construction reagent (such as reverse transcriptase); one or more reagents for amplification of RNA, one or more reagents for amplification of DNA including primers, reagents for purification of DNA, probes for detection of specific nucleic acids.
- the kits of the invention can contain instructions for the simultaneous, sequential, or separate use of the different components that are in the kit.
- Said instructions can be in the form of printed material or in the form of an electronic medium capable of storing instructions such that they can be read by a subject, such as electronic storage media (magnetic disks, tapes, and the like), optical media (CD-ROM, DVD), and the like.
- the media may additionally or alternatively contain Internet addresses providing said instructions.
- the kit comprises
- when the diagnostic method or the differential diagnostic method according to the invention is based on the determination of the expression levels of one or more genes, the kits contain primers or probes adequate for the detection of the expression levels of said one or more genes.
- primer refers to oligonucleotides that can specifically hybridize to a target polynucleotide sequence, due to the sequence complementarity of at least part of the primer within a sequence of the target polynucleotide sequence.
- a primer can have a length of at least 8 nucleotides, typically 8 to 70 nucleotides, usually of 18 to 26 nucleotides.
- a primer can have at least 75 percent, at least 80 percent, at least 85 percent, at least 90 percent, or at least 95 percent sequence complementarity to the hybridized portion of the target polynucleotide sequence.
- Oligonucleotides useful as primers may be chemically synthesized according to the solid phase phosphoramidite triester method first described by Beaucage and Caruthers, Tetrahedron Letts. (1981) 22: 1859-1862, using an automated synthesizer, as described in Needham-Van Devanter et al, Nucleic Acids Res. (1984) 12: 6159-6168. Primers are useful in nucleic acid amplification reactions in which the primer is extended to produce a new strand of the polynucleotide.
- Primers can be readily designed by a skilled artisan using common knowledge known in the art, such that they can specifically anneal to the nucleotide sequence of the target nucleotide sequence of the at least one biomarker provided herein.
- the 3' nucleotide of the primer is designed to be complementary to the target sequence at the corresponding nucleotide position, to provide optimal primer extension by a polymerase.
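- The primer criteria above (length, percent complementarity and a matching 3' nucleotide) can be checked with a sketch such as the following. It assumes an ungapped comparison between the primer and a target site of the same length, with the target site given 5' to 3' on the strand to which the primer anneals; the function names and the default threshold of 75 percent are choices made for this example.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def primer_complementarity(primer, target_site):
    """Fraction of primer positions that base-pair with the target site (ungapped)."""
    if len(primer) != len(target_site):
        raise ValueError("primer and target site must have the same length")
    expected = reverse_complement(target_site)
    matches = sum(p == e for p, e in zip(primer.upper(), expected))
    return matches / len(primer)


def primer_is_acceptable(primer, target_site, min_complementarity=0.75):
    """Check length (8 to 70 nt), the 3' terminal base and overall complementarity."""
    if not 8 <= len(primer) <= 70:
        return False
    expected = reverse_complement(target_site)
    three_prime_ok = primer.upper()[-1] == expected[-1]
    return three_prime_ok and primer_complementarity(primer, target_site) >= min_complementarity


if __name__ == "__main__":
    site = "ACGTTGCAAGCT"
    print(primer_is_acceptable(reverse_complement(site), site))
```

- A real primer design workflow would additionally consider melting temperature, secondary structure and off-target binding, none of which are modeled here.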
- probe refers to oligonucleotides or analogs thereof that can specifically hybridize to a target polynucleotide sequence, due to the sequence complementarity of at least part of the probe within a sequence of the target polynucleotide sequence.
- exemplary probes can be, for example, DNA probes, RNA probes, or peptide nucleic acid (PNA) probes.
- a probe can have a length of at least 8 nucleotides, typically 8 to 70 nucleotides, usually of 18 to 26 nucleotides.
- a probe can have at least 75 percent, at least 80 percent, at least 85 percent, at least 90 percent, or at least 95 percent sequence complementarity to the hybridized portion of the target polynucleotide sequence. Probes can also be chemically synthesized according to the solid phase phosphoramidite triester method as described above. Methods for preparation of DNA and RNA probes, and the conditions for hybridization thereof to target nucleotide sequences, are described in Molecular Cloning: A Laboratory Manual, J. Sambrook et al., eds., 2nd edition, Cold Spring Harbor Laboratory Press, 1989, Chapters 10 and 11.
- the reagents adequate for the determination of the expression levels of one or more genes comprise at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90% or at least 100% of the total amount of reagents adequate for the determination of the expression levels of genes forming the kit.
- the kit comprises reagents suitable for determining the presence of the CNV such as at least a pair of target gene specific primers, a background sequence, at least a pair of primers specific to the background sequence, optionally, a pair of primers specific to a control or reference sequence, a DNA polymerase, dNTPs, MgCl2 and one or more buffers.
- the kit can also comprise one or more probes, wherein the probe comprises a nucleic acid sequence operable to selectively hybridize to: a target nucleic acid sequence, a reference/control nucleic acid sequence and/or an amplicon or a fragment of an amplicon, including a target gene amplicon or a fragment thereof, a reference/control amplicon or fragment thereof and, optionally in some embodiments, a background sequence or an amplicon or a fragment of an amplicon of a background sequence.
- Probes in a kit of the disclosure can include probes to perform a 5' nuclease assay and/or one or more probes to detect the products of amplification.
- one or more probes of a kit of the disclosure are labeled. In some embodiments, one or more probes of a kit of the disclosure are dual labeled probes. In some embodiments, one or more of the probes of a kit of the disclosure are labeled with a fluor and a quencher. In some embodiments, each probe of a kit is dually labeled with a different fluor and a different quencher. In some embodiments, each probe of a kit is dually labeled with a different fluor and the same quencher.
- the reagents adequate for the determination of one or more of the CNVs as defined in the present invention comprise at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90% or at least 100% of the total amount of reagents adequate for the determination of the CNVs forming the kit.
- Methods of the invention can be performed using software, hardware, firmware, hardwiring, or combinations of any of these.
- the invention relates to a computer-implemented method, wherein the method is any of the diagnostic, differential diagnostic or prognostic method according to the invention.
- the invention relates to a computer containing instructions for carrying out any of said methods.
- FFPE formalin-fixed paraffin-embedded
- DNA and RNA from 5-10-µm-thick FFPE tumor sections were isolated using the GeneRead DNA FFPE kit (Qiagen GmbH, Hilden, Germany) and the miRNeasy FFPE kit (Qiagen GmbH, Hilden, Germany) following manufacturer instructions. Quality control and quantification analysis were performed using a Qubit 2.0 Fluorometer (Thermo Scientific, Waltham, MA, USA) and a 2100 Bioanalyzer system (Agilent, Santa Clara, CA, USA). Additionally, we measured the amplification potential of nucleic acids by assessing the ΔCq value through quantitative PCR, after normalization for a fixed input mass. Only DNA samples with ΔCq ⁇ 5 and 1 pg total DNA were included.
- RNA samples with 100 ng total RNA and DV200 >30% were selected for further experiments. Accordingly, five out of 34 LMS cases in the experimental cohort were excluded for RNAseq due to insufficient quantity and/or quality of RNA.
- DNA sequencing libraries from 44 LM and 34 LMS were constructed using the KAPA Hyper Prep kit (KK8514 and KK8515; Roche, Basel, Switzerland). Next, all exons were captured by a custom-designed SeqCap EZ MedExome kit (NimbleGen; Roche), which is targeted and enriched for exons and neighboring introns (within 50 bp) of 571 hematological-associated genes. Lastly, each quantified library was loaded in a HiSeqXTen platform (Illumina, San Diego, CA, USA), and paired-end sequencing was performed according to manufacturer instructions.
- DNA sequence data were demultiplexed and formatted in FASTQ files (FASTQ 1.9 files with Phred+33) containing at least 32 million DNA reads for each sample. Reads were aligned to the human hg19 genome (GRCh37) using the Burrows-Wheeler Alignment tool software (v0.7.17; https://github.com/lh3/bwa), with the -M and -R options to mark shorter split alignments as secondary and to add read group information, respectively.
- SAM files were then converted to coordinate sorted BAM files using samtools v1.9, and Picard Tools v2.20.1 (https://github.com/broadinstitute/picard) was used to mark and remove duplicates.
- Interval reference files (design files) for the SeqCap EZ MedExome were also provided. Specifically, ⁇ 100 bases were added to the capture bed file for calling with a mean depth higher than 150X, with >80% of target regions covered. Coverage data were obtained using the bedtools software (https://github.com/arq5x/bedtools/) and normalized to counts per million (cpm) as follows:
- SnpEff (v4.3t; https://github.com/pcingola/SnpEff) was used for variant annotation. Variants were visualized and analyzed using R (v3.6.1), tidyverse (v1.3; https://github.com/tidyverse/tidyverse), and vroom (v1.2; https://github.com/VROOM-Project/vroom).
- For CNV detection, we used the CNVkit Python library (v0.9.6; https://github.com/etal/cnvkit) with default parameters for tumor analysis. Specifically, sample read depths were normalized and individually compared with the reference, using the circular binary segmentation algorithm to infer copy number segments, which were then annotated to genes. Lastly, to evaluate whether CNV data could be used to differentiate LMS and LM, we performed unsupervised hierarchical clustering based on Euclidean distance calculation of log2 values and using heatmap3 (https://github.com/slzhao/heatmap3).
- RNA sequencing libraries were prepared using Truseq RNA exome (Illumina), normalized to 10 nM, and clustered into a single pool with final concentration of ⁇ 2 pM. Paired-end sequencing (2x75 bp) was carried out in a NextSeq 500 instrument (Illumina).
- RNAseq data were demultiplexed to generate intermediate analysis FASTQ format files containing at least 20 million RNA reads per sample. Reads were aligned to the human hg19 genome (GRCh37) using STAR Alignment software (v2.7.0f). After quality filtering, we obtained an average of 47 million uniquely mapped reads to the human transcriptome per sample. Finally, gene transcript abundance was estimated using the HTseq Python package.
- RNAseq data could be used to differentiate LMS from LM
- we built a classification model using the caret R package (v6.0-86; https://github.com/topepo/caret). For this purpose, our sample cohort was randomly split, keeping balanced class distributions into a training set (75% of samples) and validation set (25% of samples).
- AdaBoost models were trained with 10-fold stratified cross-validation to obtain robust estimates of tumor classification capabilities, following a two-step approach. In the first step, five subsets of samples were created from the training data. From them, four subsets were used for fitting the model, and the other subset was used for feature pruning. Since model composition varied each time due to the probabilistic nature of classification models, fitting and feature pruning were repeated 10 times for each feature pruning subset, generating a total of 50 models.
- RNA extracted from 96 FFPE tissue sections was normalized to 100 ng total and used to prepare libraries with a PCR/amplicon-based workflow (AmpliSeq Library Plus, Illumina) following manufacturer instructions. Data from the Illumina NextSeq500 sequencer were demultiplexed and aligned to the CanTRAN hg19 reference genome using BWA mem. Coverage for each of the 20 target genes was calculated using bedtools and normalized based on total reads per sample.
- Table 1 Differentially expressed genes between LMS and LM samples.
- unsupervised hierarchical clustering grouped LMS samples in a homogeneous cluster of 29 samples, while 30 LM samples were detected in a separate cluster.
- another cluster included the remaining LM with some LMS (LMS03, LMS11, LMS26, LMS35, and LMS62).
- Table 2 List of 20 differentially expressed genes identified from RNAseq data using a machine learning approach
- the total 96 samples were randomly split into a training set to build the machine learning model and a test set to validate the model (75% and 25% class-balanced samples for training and test sets, respectively).
- the gradient boosting algorithm was used to build a new model, which achieved optimal values of sensitivity and specificity, since it was able to correctly classify all test and training samples.
- the model was used to calculate class probabilities for all samples, allowing for a more fine-tuned classification of samples, where we defined a “warning range” for those tumors where the model was not confident enough, defined as probabilities of ⁇ 75% for each group (Fig. 1 B).
- this model could correctly classify all samples with high class probabilities, even for sample LMS39 with the lowest LMS probability.
- Table 3 Sensitivity and specificity of predictive models for the differential diagnosis of LMS and LM based on gene expression (RNAseq).
- Example 3 Identification of differential somatic single nucleotide variants and insertions/deletions
- SNVs single nucleotide variants
- Indels insertions/deletions
- signature 1 results from an endogenous mutational process initiated by spontaneous deamination of 5-methylcytosine, while signature 5 exhibits transcriptional strand-bias for T>C substitutions at ApTpN context as additional mutational features.
- signature 20 is associated with defective DNA mismatch repair due to high numbers of small indels at mono/polynucleotide repeats.
- signature 12 represents a novel mutational signature only present in these uterine tumors, showing similarities to liver cancer and exhibiting a strong transcriptional strand-bias for T>C substitutions as additional mutational features.
- coverage values which can be extrapolated to copy number states (where a higher coverage is interpreted as a duplication or amplification, while lower coverage is interpreted as a deletion) were calculated and normalized for all genes.
- DNA coverage data was used to build a classification model using the xgboost algorithm. Of these, the top 5 genes with highest importance were selected to build new classification models of each individual gene and combinations of them ( Figure 3 and Table 4).
- the coverage of the TUBB2B, LRRCC1, NDRG4, HSF4 and TMPRSS6 genes was identified as predictive for the differential diagnosis of LMS and LM (Fig. 3).
- Kaplan-Meier survival curves were generated to assess the association between LMS-specific CNVs and clinical prognosis based on overall survival.
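The Kaplan-Meier analysis mentioned in the last item above can be illustrated with a minimal sketch of the product-limit estimator. This is an illustration only, not the software used in the examples; the function name `kaplan_meier` and its input layout are assumptions made here, and any follow-up times passed to it would be hypothetical rather than patient data.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) steps of the Kaplan-Meier estimator.

    times  -- follow-up time of each subject
    events -- True if death was observed at that time, False if censored
    """
    data = sorted(zip(times, events))  # order subjects by follow-up time
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects that share the same time point
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            leaving += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving this time
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored subjects leave the risk set
        at_risk -= leaving
    return curve
```

Censored subjects do not produce a step in the curve, but they reduce the number at risk for later time points. Comparing two such curves (for example with a log-rank test) is outside the scope of this sketch.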
Landscapes
- Chemical & Material Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Organic Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Engineering & Computer Science (AREA)
- Immunology (AREA)
- Pathology (AREA)
- Analytical Chemistry (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Genetics & Genomics (AREA)
- Hospice & Palliative Care (AREA)
- Biochemistry (AREA)
- Microbiology (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Physics & Mathematics (AREA)
- Oncology (AREA)
- Biotechnology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Peptides Or Proteins (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
The present invention relates to methods for the differential diagnosis of leiomyosarcoma and leiomyoma, as well as to methods for the prognosis of patients suffering from leiomyosarcoma. The invention also relates to kits and devices for carrying out any of the differential diagnostic and prognostic methods of the invention.
Description
METHODS AND REAGENTS FOR THE DIFFERENTIAL DIAGNOSIS OF UTERINE TUMORS
FIELD OF THE INVENTION
The present invention relates to the field of gynecological cancer diagnostics and, more particularly, to methods for the differential diagnosis of uterine leiomyoma and uterine leiomyosarcoma in a subject as well as to reagents suitable for carrying out said methods.
BACKGROUND OF THE INVENTION
Uterine leiomyomas (LM) are benign tumors arising in the smooth muscle cells of the uterine wall. They are the most common pelvic tumors in women, with a prevalence of >80% for African American and ~70% for Caucasian women before 50 years of age. Although LM are non-malignant tumors, a hidden undiagnosed malignancy, such as leiomyosarcoma (LMS), occurs in one among 498 uterine tumors.
Laparoscopic myomectomy with morcellation of the tumor is the gold standard therapeutic option for uterine tumors. Unfortunately, the clinical symptoms and morphological features of LM and LMS are indistinguishable prior to surgery, which introduces the risk of spreading an undiagnosed LMS. For this reason, the FDA issued a press release in 2014 discouraging the use of power morcellators to treat myometrial tumors, so that laparoscopic myomectomy was replaced by laparotomy-based procedures, thus increasing morbidity, mortality, and cost for the patient and healthcare system. Given these challenges, there is an urgent need to develop more effective strategies for preoperative differential molecular diagnosis of uterine tumors.
The search for standardized molecular criteria to differentiate uterine LMS and LM before surgery to prevent dissemination of hidden malignancies during morcellation represents an important current diagnostic challenge. In fact, available “omics” profiling for LMS has been limited due to the rare incidence of this malignancy (1 in 498 patients undergoing hysterectomy or myomectomy for presumed LM).
SUMMARY OF THE INVENTION
The authors of the present invention, using whole-exome and RNA sequencing (RNAseq), have identified the differential molecular footprint of LM versus LMS. In particular, the authors of the invention have found that unlike LM, LMS has significant mutational heterogeneity and prevalent copy number alterations at the DNA level, while a specific transcriptomic profile and multiple structural rearrangements were detected at the RNA level. With these data, an integrated molecular analysis was performed to assess the effect of copy number variants (CNVs) on gene expression across the study population. Targeted RNAseq data and machine learning were used to create a predictive model for comprehensive molecular classification of LMS and LM at the tumor tissue level.
Therefore, in a first aspect the invention relates to an in vitro method for the differential diagnosis of a subject suspected of suffering from uterine leiomyoma or uterine leiomyosarcoma, the method comprising:
(i) measuring the expression level of gene ARHGAP11A in a biological sample obtained from the subject thereby obtaining a gene expression profile of said sample; and
(ii) identifying the subject as suffering from uterine leiomyosarcoma or from uterine leiomyoma by a predictive model which correlates the gene expression profile identified in step (i) with representative gene expression profiles from samples obtained from subjects previously identified as suffering from uterine leiomyosarcoma or from uterine leiomyoma, said predictive model having been generated by training a computer with a plurality of gene expression profiles from previously identified subjects suffering from uterine leiomyosarcoma or from uterine leiomyoma by machine learning on said plurality of gene expression profiles so as to obtain representative gene expression profiles associated with uterine leiomyosarcoma or with uterine leiomyoma.
In a second aspect, the invention relates to an in vitro method for the differential diagnosis of a subject suspected of suffering from uterine leiomyoma or uterine leiomyosarcoma, the method comprising:
(i) measuring the level of expression of at least one gene selected from the list shown in Table 1 in a biological sample obtained from the subject, and
(ii) comparing said level of expression with a reference value, wherein a deviation in the level of expression of said at least one gene selected from the list shown in Table 1 with respect to said reference value, is indicative that the subject is suffering from uterine leiomyosarcoma or from uterine leiomyoma.
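The comparison in step (ii) can be sketched as a log2 fold change against the reference value. This is a minimal illustration under stated assumptions: the function name `deviation_from_reference` and the cutoff `min_abs_log2fc` are hypothetical and are not values given in this document, which leaves the size of the required deviation open.

```python
import math


def deviation_from_reference(expression: float, reference: float,
                             min_abs_log2fc: float = 1.0) -> str:
    """Label a measured expression level relative to a reference value.

    min_abs_log2fc is a hypothetical cutoff chosen for illustration;
    both inputs must be positive for the logarithm to be defined.
    """
    log2fc = math.log2(expression / reference)
    if log2fc >= min_abs_log2fc:
        return "increased"
    if log2fc <= -min_abs_log2fc:
        return "decreased"
    return "unchanged"
```

Working on the log2 scale treats a doubling and a halving symmetrically, which is why fold changes are commonly reported this way.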
In a third aspect, the invention relates to an in vitro method for differential diagnosis of a subject suspected of suffering from uterine leiomyoma or uterine leiomyosarcoma, the method comprising analyzing in a biological sample from the subject the coverage of at least one gene selected from the group consisting of the TUBB2B gene, the LRRCC1 gene, the NDRG4 gene, the HSF4 gene and the TMPRSS6 gene and wherein an increased coverage with respect to a reference sample in the HSF4 gene, in the NDRG4 gene, in the TMPRSS6 gene and/or in the TUBB2B gene and/or a decreased coverage with respect to a reference sample in the LRRCC1 gene is indicative that the patient has uterine leiomyosarcoma.
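The direction-of-change rule of the third aspect can be sketched as follows. The gene directions are taken from the paragraph above; the function name, the dictionary layout and the tolerance `min_abs_log2_ratio` are assumptions made for illustration, since no numerical cutoff for "increased" or "decreased" coverage is stated here.

```python
import math
from typing import Dict, List

# +1: increased coverage is indicative of LMS; -1: decreased coverage is.
LMS_DIRECTION = {"HSF4": 1, "NDRG4": 1, "TMPRSS6": 1, "TUBB2B": 1, "LRRCC1": -1}


def lms_supporting_genes(sample_cov: Dict[str, float],
                         reference_cov: Dict[str, float],
                         min_abs_log2_ratio: float = 0.3) -> List[str]:
    """List genes whose normalized coverage deviates in the LMS direction.

    min_abs_log2_ratio is a hypothetical tolerance, not a value from the text.
    Genes missing from either input are skipped.
    """
    supporting = []
    for gene, direction in LMS_DIRECTION.items():
        if gene not in sample_cov or gene not in reference_cov:
            continue
        log2_ratio = math.log2(sample_cov[gene] / reference_cov[gene])
        # A positive product means the change goes in the LMS direction
        if direction * log2_ratio >= min_abs_log2_ratio:
            supporting.append(gene)
    return supporting
```

The sketch only reports which genes support an LMS call; how many genes are required, or how they are weighted, would depend on the classification model actually used.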
In another aspect, the invention relates to an in vitro method for the diagnosis of a uterine tumor selected from the group consisting of uterine leiomyoma or uterine leiomyosarcoma in a subject, the method comprising determining in the whole-exome sequence of a biological sample from the subject the value of a mutational index which correlates with the number of single nucleotide variants which are characteristic of the COSMIC mutational signature 12 and/or of the COSMIC mutational signature 20, wherein an increase in said index with respect to a reference sample is indicative that the subject is suffering from uterine leiomyosarcoma or from uterine leiomyoma.
In yet another aspect, the invention relates to an in vitro method for prognosis of a subject diagnosed with uterine leiomyosarcoma, comprising determining in a biological sample of the subject the presence of at least one of the CNVs shown in Table 5, wherein the presence of the CNV in the sample is indicative of a bad prognosis of uterine leiomyosarcoma.
In another aspect, the invention relates to an in vitro method for selecting a subject suspected of suffering from uterine leiomyoma or uterine leiomyosarcoma as a candidate to receive an adequate therapy to treat uterine leiomyosarcoma or uterine leiomyoma, the method comprising:
(i) determining whether the patient is suffering from uterine leiomyoma or leiomyosarcoma following any of the differential diagnostic methods of the invention; and
(ii) selecting said patient as a candidate to receive an adequate therapy to treat uterine leiomyosarcoma if the patient is diagnosed as suffering from uterine leiomyosarcoma or to receive an adequate therapy to treat uterine leiomyoma if the patient is diagnosed as suffering from uterine leiomyoma.
In another aspect, the invention relates to a method for the treatment of leiomyosarcoma in a subject in need thereof comprising the administration of a therapy adequate for the treatment of leiomyosarcoma, wherein the patient to be treated has been identified by any of the methods of the invention.
In further aspects, the invention relates to a kit, package and/or device comprising reagents adequate for implementing the methods according to the invention, to a method according to the invention which is computer-implemented as well as to a computer containing instructions for carrying out any of the methods according to the invention.
BRIEF DESCRIPTION OF THE FIGURES
Figure 1: Transcriptional analysis and validation of the targeted gene panel on leiomyoma (LM) and leiomyosarcoma (LMS). A) Volcano plot showing the gene distribution in LMS versus LM samples. FDR values were re-scaled as -log10 (FDR) for better visualization. Lines delimit the cutoffs for logFC and -log10 (FDR). Downregulated genes in LMS (logFC < -2; FDR < 0.05) are represented by white dots, while upregulated genes in LMS (logFC > 2; FDR < 0.05) are represented by black dots. B) Class probabilities predicted by the model for the test set, with the “warning range” highlighted in light grey.
Figure 2. Expression values (normalized coverage of RNAseq reads) of 5 predictive genes for the differential diagnosis of LMS and LM. Adjusted p-values were calculated using a Student's t-test and the Bonferroni-Hochberg correction.
Figure 3: Selection of the 5 predictive genes for the differential diagnosis of LMS and LM based on DNA sequencing results. Adjusted p-values were calculated using a Student's t-test and the Bonferroni-Hochberg correction.
Figure 4. Kaplan-Meier plots showing the association between overall survival and alterations in at least 67% of the most frequent CNVs detected in LMS patients.
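The cutoffs stated for the volcano plot of Figure 1A (logFC above 2 or below -2, with FDR < 0.05) can be expressed as a short sketch. The function names are assumptions made here for illustration, and the sketch does not reproduce the differential expression analysis that produced the logFC and FDR values.

```python
import math


def classify_gene(log_fc: float, fdr: float,
                  fc_cutoff: float = 2.0, fdr_cutoff: float = 0.05) -> str:
    """Apply the Figure 1A cutoffs to one gene (LMS versus LM)."""
    if fdr < fdr_cutoff and log_fc > fc_cutoff:
        return "up in LMS"        # black dots in the figure
    if fdr < fdr_cutoff and log_fc < -fc_cutoff:
        return "down in LMS"      # white dots in the figure
    return "not significant"


def volcano_y(fdr: float) -> float:
    """Re-scale an FDR value to the -log10 axis used in the plot."""
    return -math.log10(fdr)
```

On the re-scaled axis the FDR cutoff of 0.05 corresponds to a horizontal line at about 1.3, so smaller FDR values appear higher in the plot.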
DETAILED DESCRIPTION OF THE INVENTION
First differential diagnostic method of the invention (method based on transcriptomic analysis)
The authors of the present invention have found that LM and LMS have specific transcriptomic profiles. This difference in the transcriptomic profiles allows the differential diagnosis of one disease or the other in a subject by analyzing the RNA composition in a sample from the subject and classifying the subject using artificial intelligence using an algorithm which has been trained with transcriptomic profiles from samples from known LM and LMS samples. Accordingly, in a first aspect, the invention relates to an in vitro method for the differential diagnosis of a subject suspected of suffering from uterine leiomyoma or uterine leiomyosarcoma (hereinafter “first method of the invention”), the method comprising:
(i) measuring the expression level of gene ARHGAP11A in a biological sample obtained from the subject thereby obtaining a gene expression profile of said sample; and
(ii) identifying the subject as suffering from uterine leiomyosarcoma or from uterine leiomyoma by a predictive model which correlates the gene expression profile identified in step (i) with representative gene expression profiles from samples obtained from subjects previously identified as suffering from uterine leiomyosarcoma or from uterine leiomyoma, said predictive model having been generated by training a computer with a plurality of gene expression profiles from previously identified subjects suffering from uterine leiomyosarcoma or from uterine leiomyoma by machine learning on said plurality of gene expression profiles so as to obtain representative gene expression profiles associated with uterine leiomyosarcoma or with uterine leiomyoma.
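To illustrate what correlating a sample profile with representative profiles can look like, the following minimal sketch uses a nearest-centroid rule with Pearson correlation. This is a deliberately simple stand-in chosen for illustration and is not the predictive model of the invention; all names and the layout of the training data are assumptions.

```python
from math import sqrt
from typing import Dict, List


def mean_profile(profiles: List[List[float]]) -> List[float]:
    """Average the training profiles of one class, gene by gene."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equally long, non-constant profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def classify(profile: List[float],
             training: Dict[str, List[List[float]]]) -> str:
    """Return the class whose representative profile correlates best."""
    centroids = {label: mean_profile(p) for label, p in training.items()}
    return max(centroids, key=lambda label: pearson(profile, centroids[label]))
```

Each profile is a list of expression values in a fixed gene order, and `training` maps a class label such as "LMS" or "LM" to the profiles of previously identified subjects. A trained machine learning model would replace the `classify` step.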
The term “differential diagnosis”, as used herein, refers to the determination of which of two or more diseases with similar symptoms is likely responsible for a subject’s symptom(s), or distinguishing of a particular disease or condition from others that present similar clinical features based on an analysis of the clinical data. This determination, as it is understood by a person skilled in the art, does not claim to be correct in 100% of the analyzed samples. However, it requires that a statistically significant amount of the analyzed samples is classified correctly. The amount that is statistically significant can be established by a person skilled in the art by means of using different statistical tools; illustrative, non-limiting examples of said statistical tools include determining confidence intervals, determining the p-value, the Student’s t-test or Fisher’s discriminant functions, etc. (see, for example, Dowdy and Wearden, Statistics for Research, John Wiley & Sons, New York 1983). The confidence intervals are preferably at least 90%, at least 95%, at least 97%, at least 98% or at least 99%. The p-value is preferably less than 0.1, less than 0.05, less than 0.01, less than 0.005 or less than 0.0001. The teachings of the present invention preferably allow correctly diagnosing in at least 60%, in at least 70%, in at least 80%, or in at least 90% of the subjects of a determined group or population analyzed.
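As one example of the statistical tools mentioned above, a confidence interval for the fraction of correctly classified samples can be computed with the Wilson score method. The choice of the Wilson interval and the function name are assumptions made here for illustration; the text does not prescribe a particular interval.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def wilson_interval(correct: int, total: int,
                    confidence: float = 0.95) -> Tuple[float, float]:
    """Wilson score interval for the proportion of correct classifications."""
    # Two-sided critical value of the standard normal distribution
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    p = correct / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half = z * sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    return centre - half, centre + half
```

If the lower bound of the interval lies above a chosen threshold (for instance 60%), the data support the statement that at least that fraction of subjects is diagnosed correctly at the chosen confidence level.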
The term “uterine leiomyoma”, also known as uterine fibroid, as used herein, refers to a benign tumor that appears in the smooth muscular layer of the uterus.
The term “uterine leiomyosarcoma”, as used herein, refers to a malignant tumor which originates in the smooth muscular layer of the uterus. The term includes both primary tumors as well as metastasis.
In a first step of the first method of the invention, the expression levels of ARHGAP11A are determined in a sample from the subject whose diagnosis is to be determined.
The term “expression level”, as used herein, refers to the measurable quantity of gene product produced by the gene in a sample of the subject, wherein the gene product can be a transcriptional product or a translational product. As understood by the person skilled in the art, the gene expression level can be quantified by measuring the messenger RNA levels of said gene or of the protein encoded by said gene. In the context of the present invention, the expression level of the genes used in the method according to the invention can be determined by measuring the levels of mRNA encoded by said gene, or by measuring the levels of the protein encoded by said gene, i.e. the protein or variants thereof. Variants of the proteins encoded by the genes which are measured according to the method of the invention include all the physiologically relevant post-translational chemical modification forms of the protein, for example, glycosylation, phosphorylation, acetylation, etc., provided that the functionality of the protein is maintained.
The term “sample” or “biological sample”, as used herein, refers to biological material isolated from a subject. The biological sample contains any biological material suitable for detecting DNA, RNA or protein levels. In a particular embodiment, the sample comprises genetic material, e.g., DNA, genomic DNA (gDNA), complementary DNA (cDNA), RNA, heterogeneous nuclear RNA (hnRNA), mRNA, etc., from the subject under study. The sample can be isolated from any suitable tissue or biological fluid such as, for example, blood, saliva, plasma, serum, urine, cerebrospinal fluid (CSF), feces, a surgical specimen, a specimen obtained from a biopsy, and a tissue sample embedded in paraffin. Methods for isolating samples are well known to those skilled in the art. In particular, methods for obtaining a sample from a biopsy include gross apportioning of a mass, or micro-dissection or other art-known cell-separation methods. In order to simplify conservation and handling of the samples, these can be formalin-fixed and paraffin-embedded or first frozen and then embedded in a cryosolidifiable medium, such as OCT-compound, through immersion in a highly cryogenic medium that allows rapid freeze. In a particular embodiment, the sample from the subject according to the methods of the present invention is a biological fluid sample. In a particular embodiment, the sample from the subject according to the methods of the present invention is selected from the group consisting of blood, serum, plasma, and a tissue sample; more preferably from the group consisting of plasma and a tissue sample.
The term “gene expression profile”, as used herein, refers to a dataset generated from one or more genes listed above that make up a particular gene expression pattern that may be reflective of level of expression of each gene or set of genes in the biological sample under study.
The term “subject” or “patient” refers herein to a person in need of the analysis described herein. In some embodiments, the subject is a patient. In some embodiments, the subject is a human. In some embodiments, the subject is a female human (a woman). In some embodiments the subject is a female presenting with pathology and/or history consistent with uterine fibroids believed to be a benign neoplasm. In some embodiments the subject is a female presenting with pathology and/or history consistent with uterine fibroids believed to be leiomyoma (LM). In some embodiments the subject is a female presenting with pathology and/or history consistent with uterine fibroids believed to be leiomyoma and desiring surgical intervention. In some embodiments the subject is a female presenting with pathology and/or history consistent with uterine fibroids believed to be leiomyoma, desiring surgical intervention, and requiring an evaluation of the neoplasm to evaluate the risk that the neoplasm is malignant in order to guide the selection of therapy. In some embodiments the subject is a female presenting with pathology and/or history consistent with uterine fibroids, desiring surgical intervention and requiring an evaluation of the neoplasm to evaluate the risk that the neoplasm is a leiomyosarcoma in order to guide the selection of therapy.
The sample wherein the expression level of the ARHGAP11A gene is determined can be any sample containing cells from the potential tumor. In a particular embodiment, the sample containing cells from the potential tumor is a potential tumor tissue or a portion thereof. In a more particular embodiment, said potential tumor tissue sample is a uterine tissue sample from a patient for whom the differential diagnosis between uterine leiomyoma and uterine leiomyosarcoma is to be carried out. Said sample can be obtained by conventional methods, e.g., biopsy, surgical excision, or aspiration, by using methods well known to those of ordinary skill in the related medical arts. Methods for obtaining the sample from the biopsy include gross apportioning of a mass, or microdissection or other art-known cell-separation methods including partial tumorectomy. Tumor cells can additionally be obtained from fine needle aspiration cytology. In some embodiments, the sample has been obtained by hysterectomy or laparoscopic/laparotomic myomectomy.
In order to simplify conservation and handling of the samples, these can be formalin-fixed and paraffin-embedded or first frozen and then embedded in a cryosolidifiable medium, such as OCT-compound, through immersion in a highly cryogenic medium that allows for rapid freeze.
In a particular embodiment of the diagnostic method of the invention, the sample wherein the expression levels of the ARHGAP11A gene are determined is a tumor sample obtained by hysterectomy or laparoscopic/laparotomic myomectomy.
In some embodiments, step (i) of the method of the invention comprises the determination of the expression levels of one or more additional genes. In one embodiment, the first method of the invention comprises in step (i) measuring the expression level of the CENPE gene. In one embodiment, the first method of the invention comprises in step (i) measuring the expression level of the CENPE gene and the COL4A5 gene. In one embodiment, the first method of the invention comprises in step (i) measuring the expression level of the CENPE gene, the COL4A5 gene and the CENPF gene. In yet another embodiment, the first method of the invention comprises in step (i) measuring the expression level of the CENPE gene, the COL4A5 gene, the CENPF gene and the MFAP5 gene.
It will be understood that step (i) of the first method of the invention may also comprise the determination of any combination of two, three, four or five of the genes shown in Table 3, which include the ARHGAP11A gene, the CENPE gene, the COL4A5 gene, the CENPF gene and the MFAP5 gene. Accordingly, in some embodiments, step (i) of the first method of the invention comprises the determination of a set of genes selected from the group consisting of ARHGAP11A and CENPE, ARHGAP11A and COL4A5, ARHGAP11A and CENPF, ARHGAP11A and MFAP5, CENPE and COL4A5, CENPE and CENPF, CENPE and MFAP5, COL4A5 and CENPF, COL4A5 and MFAP5, CENPF and MFAP5, ARHGAP11A, CENPE and COL4A5, ARHGAP11A, CENPE and CENPF, ARHGAP11A, CENPE and MFAP5, ARHGAP11A, COL4A5 and CENPF, ARHGAP11A, COL4A5 and MFAP5, ARHGAP11A, CENPF and MFAP5, CENPE, COL4A5 and CENPF, CENPE, COL4A5 and MFAP5, CENPE, CENPF and MFAP5, COL4A5, CENPF and MFAP5, ARHGAP11A, CENPE, COL4A5 and CENPF, ARHGAP11A, CENPE, COL4A5 and MFAP5, ARHGAP11A, CENPE, CENPF and MFAP5, ARHGAP11A, COL4A5, CENPF and MFAP5 and CENPE, COL4A5, CENPF and MFAP5.
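The gene sets enumerated above can be generated programmatically, which also makes it easy to check that none is missing. A minimal sketch, in which the function name `gene_sets` is an assumption made for illustration:

```python
from itertools import combinations
from typing import List, Sequence, Tuple

GENES = ["ARHGAP11A", "CENPE", "COL4A5", "CENPF", "MFAP5"]


def gene_sets(genes: Sequence[str],
              sizes: Sequence[int] = (2, 3, 4, 5)) -> List[Tuple[str, ...]]:
    """Enumerate every combination of the given sizes, in listing order."""
    return [combo for k in sizes for combo in combinations(genes, k)]
```

With five genes this gives 10 pairs, 10 triples and 5 quadruples, that is, the 25 sets written out in the paragraph, plus the single set containing all five genes.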
In some embodiments, the first step of the first diagnostic method according to the invention comprises the determination of the ARHGAP11A gene, the CENPE gene, the COL4A5 gene, the CENPF gene and the MFAP5 gene together with the determination of at least one additional gene selected from the genes listed in Table 2. In some embodiments, the method comprises the determination of the expression levels of all the genes listed in Table 2.
Gene expression levels can be quantified by measuring the levels of the messenger RNA of the gene or of the protein encoded by said gene, i.e. ARHGAP11A protein, or of variants thereof. ARHGAP11A protein variants include all the physiologically relevant post-translationally modified forms of the protein, for example, glycosylated, phosphorylated or acetylated forms, provided that the functionality of the protein is maintained. Said term encompasses the ARHGAP11A protein of any mammalian species, including but not limited to domestic and farm animals (cows, horses, pigs, sheep, goats, dogs, cats or rodents), primates and humans. Preferably, the ARHGAP11A protein is a human protein.
In order to measure the levels of the mRNA encoded by a given gene, the biological sample may be treated to physically, mechanically or chemically disrupt tissue or cell structure, releasing intracellular components into an aqueous or organic solution so that nucleic acids can be prepared for further analysis. The nucleic acids are extracted from the sample by procedures known to the skilled person, including commercially available ones. RNA is then extracted from frozen or fresh samples by any of the methods typical in the art, for example, Sambrook, J., et al., 2001. Molecular cloning: A Laboratory Manual, 3rd ed., Cold Spring Harbor Laboratory Press, N.Y., Vol. 1-3. In some embodiments, the RNA is extracted from formalin-fixed, paraffin-embedded tissues. An exemplary deparaffinization method involves washing the paraffinized sample with an organic solvent, such as xylene. Deparaffinized samples can be rehydrated with an aqueous solution of a lower alcohol. Suitable lower alcohols include, for example, methanol, ethanol, propanols and butanols. Deparaffinized samples may be rehydrated with successive washes with lower alcoholic solutions of decreasing concentration, for example. Alternatively, the sample is simultaneously deparaffinized and rehydrated. The sample is then lysed and RNA is extracted from the sample. Commercially available kits may be used for RNA extraction from paraffin samples, such as PureLink™ FFPE Total RNA Isolation Kit (Thermo Fisher Scientific Inc., US). Methods for RNA extraction from paraffin-embedded tissues are disclosed, for example, in Rupp and Locker (1987) Lab Invest. 56:A67, and De Andres et al., BioTechniques 18:42-44 (1995). Preferably, care is taken to avoid degradation of the RNA during the extraction process.
Various technologies are well known in the art for deducing and/or measuring and/or detecting the levels of one or more transcripts in a cell. Such methods include hybridization- or sequence-based approaches. Hybridization-based approaches typically involve incubating fluorescently labelled cDNA with custom-made microarrays or commercial high-density oligo microarrays. Specialized microarrays have also been designed; for example, arrays with probes spanning exon junctions can be used to detect and quantify distinct spliced isoforms. Genomic tiling microarrays that represent the genome at high density have been constructed and allow the mapping of transcribed regions to a very high resolution, from several base pairs to ~100 bp. Hybridization-based approaches are high throughput and relatively inexpensive, except for high-resolution tiling arrays that interrogate large genomes. However, these methods have several limitations, which include: reliance upon existing knowledge about genome sequence; high background levels owing to cross-hybridization; and a limited dynamic range of detection owing to both background and saturation of signals. Moreover, comparing expression levels across different experiments is often difficult and can require complicated normalization methods.
In contrast to microarray methods, sequence-based approaches directly determine the cDNA sequence. Initially, Sanger sequencing of cDNA or EST libraries was used, but this approach is relatively low throughput, expensive and generally not quantitative. Tag-based methods were developed to overcome these limitations, including serial analysis of gene expression (SAGE), cap analysis of gene expression (CAGE), and massively parallel signature sequencing (MPSS). These tag-based sequencing approaches are high throughput and can provide precise, digital gene expression levels. However, most are based on Sanger sequencing technology, and a significant portion of the short tags cannot be uniquely mapped to the reference genome. Moreover, only a portion of the transcript is analyzed, and isoforms are generally indistinguishable from each other. These disadvantages limit the use of traditional sequencing technology in measuring or detecting mRNA levels.
The present methods can also involve a larger-scale analysis of mRNA levels, e.g., the detection of a plurality of biomarkers (e.g., 2-10, or 5-50, or 10-100, or 50-500 or more at one time). In addition, the methods described here can also involve the step of conducting a transcriptomic analysis (i.e., the analysis of the complete set of transcripts in a cell, and their quantity, for a specific developmental stage or physiological condition). Understanding the transcriptome can be important for interpreting the functional elements of the genome and revealing the molecular constituents of cells and tissues, and also for understanding development and disease and how the biomarkers disclosed herein are indicative or predictive of a particular condition (e.g., LM or LMS). The key aims of transcriptomics are: to catalogue all species of transcript, including
mRNAs, non-coding RNAs and small RNAs; to determine the transcriptional structure of genes, in terms of their start sites, 5' and 3' ends, splicing patterns and other post-transcriptional modifications; and to quantify the changing expression levels of each transcript during development and under different conditions.
Recently, the development of novel high-throughput DNA sequencing methods has provided a new method for both mapping and quantifying transcriptomes. This method, termed RNA-Seq (RNA sequencing), has advantages over existing approaches for determining transcriptomes. Accordingly, in one embodiment, the expression level of the gene or genes used in the first method of the invention is determined by RNAseq.
As used herein, "RNAseq" or "RNA-seq" is used to refer to a transcriptomic approach where the total complement of RNAs from a given sample is isolated and sequenced using high-throughput next generation sequencing (NGS) technologies (e.g., SOLiD, 454, Illumina, or Ion Torrent).
RNA-Seq uses deep-sequencing technologies. In general, a population of RNA (total or fractionated, such as poly(A)+) is converted to a library of cDNA fragments with adaptors attached to one or both ends. Each molecule, with or without amplification, is then sequenced in a high-throughput manner to obtain short sequences from one end (single-end sequencing) or both ends (paired-end sequencing). The reads are typically 30-400 bp, depending on the DNA-sequencing technology used. In principle, any high-throughput sequencing technology can be used for RNA-Seq, e.g., the Illumina IG, Applied Biosystems SOLiD and Roche 454 Life Science systems have already been applied for this purpose. The Helicos Biosciences tSMS system is also appropriate and has the added advantage of avoiding amplification of target cDNA. Following sequencing, the resulting reads are either aligned to a reference genome or reference transcripts, or assembled de novo without the genomic sequence to produce a genome-scale transcription map that consists of the transcriptional structure and/or level of expression for each gene.
Transcriptome analysis by next-generation sequencing (RNA-seq) allows investigation of a transcriptome at unsurpassed resolution. One major benefit is that RNA-seq is independent of a priori knowledge on the sequence under investigation.
The transcriptome can be profiled by high throughput techniques including SAGE, microarray, and sequencing of clones from cDNA libraries. For more than a decade, oligonucleotide microarrays have been the method of choice, providing high throughput and affordable costs. However, microarray technology suffers from well-known limitations including insufficient sensitivity for quantifying lower abundant transcripts, narrow dynamic range and biases arising from non-specific hybridizations. Additionally, microarrays are limited to only measuring known/annotated transcripts and often suffer from inaccurate annotations. Sequencing-based methods such as SAGE rely upon cloning and sequencing cDNA fragments. This approach allows quantification of mRNA abundance by counting the number of times cDNA fragments from a corresponding transcript are represented in a given sample, assuming that cDNA fragments sequenced contain sufficient information to identify a transcript. Sequencing-based approaches have a number of significant technical advantages over hybridization-based microarray methods. The output from sequence-based protocols is digital, rather than analog, obviating the need for complex algorithms for data normalization and summarization while allowing for more precise quantification and greater ease of comparison between results obtained from different samples. Consequently, the dynamic range is essentially infinite, if one accumulates enough sequence tags. Sequence-based approaches do not require prior knowledge of the transcriptome and are therefore useful for discovery and annotation of novel transcripts as well as for analysis of poorly annotated genomes. However, until recently the application of sequencing technology in transcriptome profiling has been limited by high cost, by the need to amplify DNA through bacterial cloning, and by the traditional Sanger approach of sequencing by chain termination.
The next-generation sequencing (NGS) technology eliminates some of these barriers, enabling massively parallel sequencing at a high but reasonable cost for small studies. The technology essentially reduces the transcriptome to a series of randomly fragmented segments of a few hundred nucleotides in length. These molecules are amplified by a process that retains spatial clustering of the PCR products, and individual clusters are sequenced in parallel by one of several technologies. Current NGS platforms include the Roche 454 Genome Sequencer, Illumina's Genome Analyzer, and Applied Biosystems' SOLiD. These platforms can analyze tens to hundreds of millions of DNA fragments simultaneously, generate giga-bases of sequence information from a single run, and have revolutionized SAGE and cDNA sequencing technology. For example, the 3' tag Digital Gene Expression (DGE) uses oligo-dT priming for first strand cDNA synthesis, generates libraries that are enriched in the 3' untranslated regions of polyadenylated mRNAs, and produces base cDNA tags.
In various embodiments, the use of such sequencing technologies does not require the preparation of sequencing libraries. However, in certain embodiments, the sequencing methods contemplated herein require the preparation of sequencing libraries.
Any method for making high-throughput sequencing libraries can be used. An example of sequencing library preparation is described in U.S. Patent Application Publication No. US 2013/0203606, which is incorporated by reference in its entirety. In some embodiments, this preparation may take the coagulated portion of the sample from the droplet actuator as an assay input. The library preparation process is a ligation-based process, which includes four main operations: (a) blunt-ending, (b) phosphorylating, (c) A-tailing, and (d) ligating adaptors. DNA fragments in a droplet are provided to process the sequencing library. In the blunt-ending operation (a), nucleic acid fragments with 5'- and/or 3'-overhangs are blunt-ended using T4 DNA polymerase that has both a 3'-5' exonuclease activity and a 5'-3' polymerase activity, removing overhangs and yielding complementary bases at both ends on DNA fragments. In some embodiments, the T4 DNA polymerase may be provided as a droplet. In the phosphorylation operation (b), T4 polynucleotide kinase may be used to attach a phosphate to the 5'-hydroxyl terminus of the blunt-ended nucleic acid. In some embodiments, the T4 polynucleotide kinase may be provided as a droplet. In the A-tailing operation (c), the 3' hydroxyl end of a dATP is attached to the phosphate on the 5'-hydroxyl terminus of a blunt-ended fragment catalyzed by exo-Klenow polymerase. In the ligating operation (d), sequencing adaptors are ligated to the A-tail. T4 DNA ligase is used to catalyze the formation of a phosphate bond between the A-tail and the adaptor sequence. In some embodiments involving cfDNA, end-repairing (including blunt-ending and phosphorylation) may be skipped because the cfDNA are naturally fragmented, but the overall process upstream and downstream of end repair is otherwise comparable to processes involving longer strands of DNA.
In another example, sequencing library preparation can involve the production of a random collection of adapter-modified DNA fragments (e.g., polynucleotides) that are ready to be sequenced. Sequencing libraries of polynucleotides can be prepared from DNA or RNA, including equivalents, analogs of either DNA or cDNA, for example, DNA or cDNA that is complementary or copy DNA produced from an RNA template, by the action of reverse transcriptase. The polynucleotides may originate in double-stranded form (e.g., dsDNA such as genomic DNA fragments, cDNA, PCR amplification products, and the like) or, in certain embodiments, the polynucleotides may have originated in single-stranded form (e.g., ssDNA, RNA, etc.) and have been converted to dsDNA form.
By way of illustration, in certain embodiments, single stranded mRNA molecules may be copied into double-stranded cDNAs suitable for use in preparing a sequencing library. The precise sequence of the primary polynucleotide molecules is generally not material to the method of library preparation, and may be known or unknown. In one embodiment, the polynucleotide molecules are DNA molecules. More particularly, in certain embodiments, the polynucleotide molecules represent the entire genetic complement of an organism or substantially the entire genetic complement of an organism, and are genomic DNA molecules (e.g., cellular DNA, cell free DNA (cfDNA), etc.), that typically include both intron sequence and exon sequence (coding sequence), as well as non-coding regulatory sequences such as promoter and enhancer sequences. In certain embodiments, the primary polynucleotide molecules comprise human genomic DNA molecules, e.g., cfDNA molecules present in peripheral blood of a subject.
Preparation of sequencing libraries for some NGS sequencing platforms is facilitated by the use of polynucleotides comprising a specific range of fragment sizes. Preparation of such libraries typically involves the fragmentation of large polynucleotides (e.g. cellular genomic DNA) to obtain polynucleotides in the desired size range.
The expression level can be determined using mRNA obtained from a formalin-fixed, paraffin-embedded tissue sample. mRNA may be isolated from an archival pathological sample or biopsy sample which is first deparaffinized. An exemplary deparaffinization method involves washing the paraffinized sample with an organic solvent, such as xylene. Deparaffinized samples can be rehydrated with an aqueous solution of a lower alcohol. Suitable lower alcohols, for example, include methanol, ethanol, propanols and butanols. Deparaffinized samples may be rehydrated with successive washes with lower alcoholic solutions of decreasing concentration, for example. Alternatively, the sample is simultaneously deparaffinized and rehydrated. The sample is then lysed and RNA is extracted from the sample. Samples can also be obtained from fresh tumor tissue such as a resected tumor. In a particular embodiment, samples can be obtained from fresh tumor tissue or from OCT-embedded frozen tissue. In another preferred embodiment, samples can be obtained by laparoscopic myomectomy and then paraffin-embedded.
In order to normalize the values of mRNA expression among the different samples, it is possible to compare the expression levels of the mRNA of interest in the test samples with the expression of a control RNA. A “control RNA” as used herein, relates to RNA whose expression levels do not change or change only in limited amounts in tumor cells with respect to non-tumorigenic cells. Preferably, the control RNA is mRNA
derived from housekeeping genes which code for proteins that are constitutively expressed and carry out essential cellular functions. Preferred housekeeping genes for use in the present invention include β-2-microglobulin, ubiquitin, 18-S ribosomal protein, cyclophilin, IPO8, HPRT, GAPDH, PSMB4, tubulin and β-actin.
In one embodiment, the relative gene expression quantification is calculated according to the comparative threshold cycle (Ct) method using GAPDH, IPO8, HPRT, β-actin or PSMB4 as an endogenous control and commercial RNA controls as calibrators. Final results are determined according to the formula $2^{-(\Delta Ct_{\text{sample}} - \Delta Ct_{\text{calibrator}})}$, where ΔCt values of the calibrator and sample are determined by subtracting the Ct value of the target gene from the value of the control gene.
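The comparative Ct calculation can be sketched as follows. This is an illustrative sketch only: the function names and Ct values are hypothetical, and it assumes the widely used convention ΔCt = Ct(target) − Ct(control). If ΔCt is instead taken as control minus target, the sign of the exponent must be flipped to obtain the same result.

```python
def delta_ct(ct_target, ct_control):
    """Delta Ct, assuming the convention target Ct minus endogenous-control Ct."""
    return ct_target - ct_control

def relative_expression(sample, calibrator):
    """Relative quantity 2^-(dCt_sample - dCt_calibrator).

    `sample` and `calibrator` are (ct_target, ct_control) pairs.
    """
    ddct = delta_ct(*sample) - delta_ct(*calibrator)
    return 2.0 ** (-ddct)

# Hypothetical Ct values, for illustration only (target gene, endogenous control).
sample = (24.0, 18.0)
calibrator = (26.0, 18.0)
fold = relative_expression(sample, calibrator)
print(fold)
```

Under these assumed values the target reaches threshold two cycles earlier in the sample than in the calibrator, which corresponds to a four-fold higher relative expression.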
Suitable methods to determine gene expression levels at the mRNA level include, without limitation, standard assays for determining mRNA expression levels such as qPCR, RT-PCR, RNA protection analysis, Northern blot, RNA dot blot, in situ hybridization, microarray technology, tag-based methods such as serial analysis of gene expression (SAGE) including variants such as LongSAGE and SuperSAGE, fluorescence in situ hybridization (FISH), including variants such as Flow-FISH, qFISH and double fusion FISH (D-FISH), and the like.
In some embodiments, the determination of the expression levels of the gene or genes is carried out by exome-wide gene expression from RNAseq.
In some embodiments, the biological sample is a sample containing myometrial cells or RNA derived from myometrial cells. In yet another embodiment, the sample containing myometrial cells is a myometrial biopsy.
In a second step, the first method of the invention comprises identifying the subject as suffering from uterine leiomyosarcoma or from uterine leiomyoma by a predictive model which correlates the gene expression profile identified in step (i) with representative gene expression profiles from samples obtained from subjects previously identified as suffering from uterine leiomyosarcoma or from uterine leiomyoma, said predictive model having been generated by training a computer with a plurality of gene expression profiles from previously identified subjects suffering from uterine leiomyosarcoma or from uterine leiomyoma by machine learning on said plurality of gene expression profiles so as to obtain representative gene expression profiles associated with uterine leiomyosarcoma or with uterine leiomyoma.
Typically, the representative data sets use at least 10, and more preferably 20, 25, 30 or more gene expression profiles from samples obtained from subjects suffering from uterine leiomyoma or uterine leiomyosarcoma. The data sets may derive from subjects with multiple different parameters such as gender, age, weight, national origin, etc.
In an embodiment, the second step of the first method of the invention is performed by a machine learning method selected from a regression method, a classification method or a combination thereof.
It will be appreciated that the term "machine learning" generally refers to algorithms that give a computer the ability to learn without being explicitly programmed, including algorithms that learn from and make predictions about data. Machine learning algorithms employed by the embodiments disclosed herein may include, but are not limited to, random forest ("RF"), least absolute shrinkage and selection operator ("LASSO") logistic regression, regularized logistic regression, XGBoost, decision tree learning, artificial neural networks ("ANN"), deep neural networks ("DNN"), support vector machines, rule-based machine learning, and/or others.
For clarity, algorithms such as linear regression or logistic regression can be used as part of a machine learning process. However, it will be understood that using linear regression or another algorithm as part of a machine learning process is distinct from performing a statistical analysis such as regression with a spreadsheet program. Whereas statistical modeling relies on finding relationships between variables (e.g., mathematical equations) to predict an outcome, a machine learning process may continually update model parameters and adjust a classifier as new data becomes available, without relying on explicit or rules-based programming.
In a particular embodiment, the second step of the first method of the invention is performed by a classification method, which results in identifying the subject as suffering from uterine leiomyoma or uterine leiomyosarcoma.
In one embodiment, step (b) is carried out by a classification method, preferably selected from logistic regression, random forest, gradient boosting (GB), adaptive boosting (AB), extreme gradient boosting (XGB), k-nearest neighbors (kNN), artificial neural network (ANN), support vector machine (SVM), and combinations thereof.
In an embodiment, the predictive model is generated by training the computer with a plurality of gene expression profiles from previously identified samples from subjects suffering from uterine leiomyoma or uterine leiomyosarcoma by machine learning on said plurality of gene expression profiles so as to obtain representative multivariable data sets associated with uterine leiomyoma or uterine leiomyosarcoma; wherein the training comprises the following steps:
(i) training data, from a plurality of gene expression profiles, is randomly stratified into:
- a calibration dataset (particularly in a percentage of 75%), and
- a validation dataset (particularly in a percentage of 25%);
(ii) the predictive model is seeded on the calibration dataset (particularly is developed by applying a machine learning method selected from a regression method, a classification method or a combination thereof on the calibration dataset);
(iii) the predictive model is optimized by an internal cross validation; preferably by a k-fold cross validation, wherein each of the k cases of the k-fold cross validation is used for testing only once and one at a time; and
(iv) the predictive model is further validated by predicting new samples using the validation dataset.
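The data partitioning in steps (i) and (iii) above can be sketched as follows. This is a minimal standard-library illustration, not the implementation used by the inventors: the function names, the class sizes and the choice of k = 5 are assumptions, and the k-fold step shown here is a plain (non-stratified) variant.

```python
import random
from collections import defaultdict

def stratified_split(labels, calib_fraction=0.75, seed=0):
    """Split sample indices into calibration/validation sets, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    calib, valid = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        cut = round(len(idxs) * calib_fraction)
        calib.extend(idxs[:cut])
        valid.extend(idxs[cut:])
    return sorted(calib), sorted(valid)

def k_fold(indices, k=5, seed=0):
    """Yield (train, test) index lists; each index is used for testing exactly once."""
    rng = random.Random(seed)
    shuffled = indices[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, test

# Hypothetical labels, for illustration only.
labels = ["LM"] * 40 + ["LMS"] * 20
calib, valid = stratified_split(labels)
folds = list(k_fold(calib, k=5))
print(len(calib), len(valid), len(folds))
```

A model would be fitted on each `train` list and scored on the matching `test` list during optimization, and the `valid` indices would be held back for the final validation of step (iv).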
In some embodiments, the second step is performed by a classification method wherein the patients are assigned a probability of belonging to a given category, such as patients suffering from leiomyoma or patients suffering from leiomyosarcoma. In some embodiments, the classification method is carried out by a method selected from gradient boosting, support vector machine (SVM), decision trees, K nearest neighbors, Naive Bayes or neural networks. In a preferred embodiment, the classification method is carried out by Gradient Boosting. As used herein, Gradient Boosting is a machine learning algorithm that uses a gradient boosting framework. Gradient Boosting trees, a decision-tree-based ensemble model, differ fundamentally from conventional statistical techniques that aim to fit a single model using the entire dataset. Such an ensemble approach improves performance by combining the strengths of models that learn the data by recursive binary splits, such as trees, and of "boosting", an adaptive method for combining several simple (base) models. At each iteration of the gradient boosting algorithm, a subsample of the training data is selected at random (without replacement) from the entire training data set, and then a simple base learner is fitted on each subsample. The final boosted trees model is an additive tree model, constructed by sequentially fitting such base learners on different subsamples. This procedure incorporates randomization, which is known to substantially improve the predictor accuracy and also increase robustness. Additionally, boosted trees can fit complex nonlinear relationships and automatically handle interaction effects between predictors, in addition to other advantages of tree-based methods, such as handling features of different types and accommodating missing data. Hence, in many cases their predictive performance is superior to most traditional modelling methods.
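The boosting procedure described above can be illustrated with a deliberately simplified sketch. It is not the model of the invention: it assumes binary labels (0 for LM, 1 for LMS), log-loss, single-split trees ("stumps") as base learners, and hypothetical function names and toy data. Each round draws a subsample without replacement, fits a stump to the negative gradient, and adds it to the additive model.

```python
import math
import random

def fit_stump(X, residuals, rows):
    """Least-squares stump on `rows`: returns (feature, threshold, left, right)."""
    best = None
    for f in range(len(X[0])):
        values = sorted({X[i][f] for i in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [residuals[i] for i in rows if X[i][f] <= thr]
            right = [residuals[i] for i in rows if X[i][f] > thr]
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr, lmean, rmean)
    if best is None:  # no feature varies within the subsample: constant learner
        avg = sum(residuals[i] for i in rows) / len(rows)
        return (0, math.inf, avg, avg)
    return best[1:]

def stump_predict(stump, x):
    f, thr, left, right = stump
    return left if x[f] <= thr else right

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_boosting(X, y, n_rounds=50, learning_rate=0.3, subsample=0.8, seed=0):
    """Stochastic gradient boosting for 0/1 labels with log-loss."""
    rng = random.Random(seed)
    n = len(X)
    scores = [0.0] * n  # additive model output (log-odds) per training sample
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of log-loss with respect to the score is y - p.
        residuals = [y[i] - sigmoid(scores[i]) for i in range(n)]
        rows = rng.sample(range(n), max(2, int(subsample * n)))  # without replacement
        stump = fit_stump(X, residuals, rows)
        stumps.append(stump)
        for i in range(n):
            scores[i] += learning_rate * stump_predict(stump, X[i])
    return stumps, learning_rate

def predict_proba(model, x):
    """Probability of class 1 for one expression profile `x`."""
    stumps, lr = model
    return sigmoid(sum(lr * stump_predict(s, x) for s in stumps))

# Toy expression profiles (two hypothetical genes); label 1 stands for LMS.
X = [[1.0, 5.0], [1.2, 4.8], [0.9, 5.1], [1.1, 5.3],
     [3.0, 5.0], [3.2, 4.9], [2.9, 5.2], [3.1, 5.1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_boosting(X, y)
print(predict_proba(model, [3.0, 5.0]), predict_proba(model, [1.0, 5.0]))
```

Production implementations typically use deeper trees, regularization and optimized leaf values; the sketch only shows the subsample, fit and add loop.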
In a particular embodiment, the second step is performed by a regression method, preferably selected from multiple linear regression (MLR), principal component regression (PCR), partial least squares regression (PLSR), artificial neural network (ANN), support vector machine (SVM), random forest (RF), lasso regression, ridge regression and combinations thereof.
In a particular embodiment, the second step is performed by a classification method, more in particular, by gradient boosting, which includes the value of one or more variables of the gene expression profile collected in step (i) and which contribute to the identification of the subject as suffering from uterine leiomyoma or uterine leiomyosarcoma.
In a particular embodiment, the second step is performed by a regression method which includes the value of one or more variables of the gene expression profile collected in step (i) and which contribute to the identification of the subject as suffering from uterine leiomyoma or uterine leiomyosarcoma.
In some embodiments, the second method according to the invention is carried out in a patient that has been previously identified as suffering from a uterine myometrial tumor, being either uterine leiomyosarcoma or uterine leiomyoma, by imaging examination, preferably by ultrasonography.
Second differential diagnostic method of the invention (method based on transcriptomic analysis)
The authors of the present invention have carried out a comparative transcriptomic analysis between histologically confirmed LM (n = 52) and LMS (n = 44). The results have revealed 416 genes that were upregulated and 73 genes that were downregulated in LMS versus LM. Thus, in another aspect, the invention relates to an in vitro method (hereinafter second method of the invention) for the differential diagnosis of a subject suspected of suffering from uterine leiomyoma or uterine leiomyosarcoma, the method comprising:
(i) measuring the level of expression of at least one gene selected from the list shown in Table 1 in a biological sample obtained from the subject, and
(ii) comparing said level of expression with a reference value, wherein a deviation in the level of expression of said at least one gene selected from the list shown in Table 1 with respect to said reference value, is indicative that the subject is suffering from uterine leiomyosarcoma or from uterine leiomyoma.
The terms and expressions “differential diagnosis”, “subject”, “uterine leiomyoma”, “uterine leiomyosarcoma”, “biological sample”, “expression level” have been defined in the context of the first method according to the invention and apply equally to the second method of the invention.
In a first step, the second method of the invention comprises the determination of the level of expression of at least one gene selected from the list shown in Table 1 in a biological sample obtained from the subject. In some embodiments, the first step of the method of the invention comprises the determination of the expression levels of at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 80, at least 90, at least 100, at least 110, at least 120, at least 130, at least 140, at least 150, at least 160, at least 170, at least 180, at least 190, at least 200, at least 210, at least 220, at least 230, at least 240, at least 250, at least 260, at least 270, at least 280, at least 290, at least 300, at least 310, at least 320, at least 330, at least 340, at least 350, at least 360, at least 370, at least 380, at least 390, at least 400, at least 410, at least 420, at least 430, at least 440, at least 450, at least 460, at least 470, at least 480 or the 489 genes listed in Table 1.
In some embodiments, the first step of the second method of the invention comprises the determination of the expression levels of the genes mentioned in Table 3.
In some embodiments, the first step of the second method of the invention comprises the determination of the expression levels of the genes mentioned in Table 2.
In some embodiments, the first step of the second method of the invention comprises the determination of the expression levels of the genes mentioned in Table 1.
In some embodiments, the biological sample is a sample containing myometrial cells, or RNA derived from myometrial cells. In yet another embodiment, the sample containing myometrial cells is a myometrial biopsy.
In the second step, the second method of the invention comprises comparing said level of expression with a reference value.
“Reference value”, as used herein, refers to a laboratory value used as a reference for values/data obtained by laboratory examinations of subjects or samples collected from subjects. The reference value or reference level can be an absolute value; a relative value; a value that has an upper and/or lower limit; a range of values; an average value; a median value, a mean value, or a value as compared to a particular control or baseline value. A reference value can be based on an individual sample value, such as for example, a value obtained from a sample from the subject being tested, but
at an earlier point in time or from a non-cancerous tissue. The reference value can be based on a large number of samples, such as from a population of subjects of a chronological age-matched group, or based on a pool of samples including or excluding the sample to be tested. Various considerations are taken into account when determining the reference value of the marker. Among such considerations are the age, weight, sex, general physical condition of the patient and the like. For example, equal amounts of a group of at least 2, at least 10, at least 100 to preferably more than 1000 subjects, preferably classified according to the foregoing considerations, for example according to various age categories, are taken as the reference group. In another embodiment, the quantity of the biomarker in a sample from a tested subject may be determined directly relative to the reference value (e.g., in terms of increase or decrease, or fold-increase or fold-decrease). Advantageously, this may allow the quantity of the biomarker in the sample from the subject to be compared with the reference value (in other words, the relative quantity of any one or more biomarkers in the sample from the subject to be measured vis-a-vis the reference value) without the need to first determine the respective absolute quantities of said biomarker.
Typically, reference values are the expression level of the gene being compared in a reference sample. It will be understood that the “reference sample” may vary depending on whether diagnosis of uterine leiomyosarcoma or of uterine leiomyoma is desired. If the diagnosis of uterine leiomyosarcoma is desired, then the reference sample means a sample obtained from a pool of subjects suffering from uterine leiomyoma or who do not have a history of leiomyoma. Thus, in an embodiment, the reference value is the mean level of expression of the gene or genes in a pool of samples from leiomyoma patients. If a diagnosis of uterine leiomyoma is desired, then the reference sample means a sample obtained from a pool of subjects suffering from uterine leiomyosarcoma or who do not have a history of leiomyosarcoma. Thus, in an embodiment, the reference value is the mean level of expression of the gene or genes in a pool of samples from leiomyosarcoma patients.
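As an illustration of a reference value taken as the mean expression level in a pool of samples, the following sketch computes that mean for one gene and expresses a test sample as a fold change relative to it. The function names, the expression values and the 1.5-fold cutoff are assumptions chosen for the example only.

```python
from statistics import mean

def reference_value(pool_levels):
    """Mean expression of one gene across a pool of reference samples."""
    return mean(pool_levels)

def deviation(sample_level, reference, min_fold=1.5):
    """Return (fold change, call) for a sample level against the reference."""
    fold = sample_level / reference
    if fold >= min_fold:
        return fold, "increased"
    if fold <= 1.0 / min_fold:
        return fold, "decreased"
    return fold, "unchanged"

# Hypothetical normalized expression values from a leiomyoma pool.
lm_pool = [10.0, 12.0, 8.0, 10.0]
ref = reference_value(lm_pool)
fold, call = deviation(30.0, ref)
print(ref, fold, call)
```

Because the comparison is a ratio, the test sample and the pool only need to be measured on the same normalized scale; absolute quantities are not required.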
In another embodiment, the reference value for the expression level of the gene or genes of interest is the mean level of expression of said gene or genes in a pool of samples from primary tumours, preferably obtained from subjects suffering from the same type of cancer as the patient object of the study. In a particular embodiment, the reference value is the expression levels of the gene of interest in a pool obtained from primary tumor tissue obtained from patients.
The expression profile of the genes in the reference sample can preferably be generated from a population of two or more individuals. The population, for example, can comprise 3, 4, 5, 10, 15, 20, 30, 40, 50 or more individuals. Furthermore, the expression profile of the genes in the reference sample and in the sample of the individual who is going to be diagnosed according to the methods of the present invention can be generated from the same individual, provided that the profiles to be assayed and the reference profile are generated from biological samples taken at different times and are compared to one another. For example, a sample of an individual can be obtained at the beginning of a study period. A reference biomarker profile from this sample can then be compared with the biomarker profiles generated from subsequent samples of the same individual. In a preferred embodiment, the reference sample is a pool of samples from several individuals and corresponds to portions of tissue that are far from the tumor area and which have preferably been obtained in the same biopsy but which do not have any anatomopathological characteristic of tumor tissue.
Once this reference value is established, the level of this marker expressed in tumor tissue from subjects can be compared with this reference value, and thus be assigned a level of deviation with respect to a reference value. The “deviation” can be either an increase or a decrease in the expression levels. For example, an increase in expression level above the reference value of at least 1.1-fold, 1.5-fold, 2-fold, 5-fold, 10-fold, 20-fold, 30-fold, 40-fold, 50-fold, 60-fold, 70-fold, 80-fold, 90-fold, 100-fold or even more compared with the reference value is considered as “increased” expression level. Similarly, the expression of a gene is considered increased in a sample of the subject under study when the levels increase with respect to the reference sample by at least 5%, by at least 10%, by at least 15%, by at least 20%, by at least 25%, by at least 30%, by at least 35%, by at least 40%, by at least 45%, by at least 50%, by at least 55%, by at least 60%, by at least 65%, by at least 70%, by at least 75%, by at least 80%, by at least 85%, by at least 90%, by at least 95%, by at least 100%, by at least 110%, by at least 120%, by at least 130%, by at least 140%, by at least 150%, or more. On the other hand, a decrease in expression levels below the reference value of at least 0.9-fold, 0.75-fold, 0.2-fold, 0.1-fold, 0.05-fold, 0.025-fold, 0.02-fold, 0.01-fold, 0.005-fold or even less compared with reference value is considered as “decreased” expression level. Similarly, the expression of a gene is considered decreased when its levels decrease with respect to the reference sample by at least 5%, by at least 10%, by at least 15%, by at least 20%, by at least 25%, by at least 30%, by at least 35%, by at least 40%, by at least 45%, by at least 50%, by at least 55%, by at least 60%, by at least 65%, by at least 70%, by at
least 75%, by at least 80%, by at least 85%, by at least 90%, by at least 95%, by at least 100% (i.e., absent). The comparison of the expression levels of the gene or genes of interest with the reference value allows differential diagnosis between uterine leiomyoma or uterine leiomyosarcoma.
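The comparison against a reference value described above can be illustrated with a minimal sketch. The function names and the default cut-offs (1.1-fold for an increase, 0.9-fold for a decrease, i.e. the least stringent values listed above) are assumptions chosen for illustration only; any of the other listed thresholds could be substituted.

```python
def classify_deviation(sample_level: float, reference_level: float,
                       increase_fold: float = 1.1,
                       decrease_fold: float = 0.9) -> str:
    """Label a sample level as increased, decreased or unchanged
    relative to a reference value, using fold-change cut-offs."""
    if sample_level < 0 or reference_level <= 0:
        raise ValueError("levels must be non-negative and the reference positive")
    fold = sample_level / reference_level
    if fold >= increase_fold:
        return "increased"
    if fold <= decrease_fold:
        return "decreased"
    return "unchanged"


def percent_change(sample_level: float, reference_level: float) -> float:
    """Percentage change with respect to the reference (-100 means absent)."""
    if reference_level <= 0:
        raise ValueError("the reference level must be positive")
    return 100.0 * (sample_level - reference_level) / reference_level
```

Only the ratio to the reference is needed here, which is consistent with the remark above that absolute quantities do not have to be determined first.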
In some embodiments, the patient is detected as having leiomyosarcoma if the expression level of the gene or genes under examination is/are increased with respect to the expression levels found in leiomyoma samples, which is defined in Table 1 as genes having a logFC higher than 0.
In some embodiments, the patient is detected as having leiomyosarcoma if the expression level of the gene or genes under examination is/are decreased with respect to the expression levels found in leiomyoma samples, which is defined in Table 1 as genes having a logFC lower than 0.
In some embodiments, the second method according to the invention allows the differential diagnosis of uterine leiomyosarcoma when the deviation in the level of expression of the gene or genes is at least four-fold with respect to the reference value or values for said genes, said reference value being the expression level of the same gene or genes determined in a sample from a patient with uterine leiomyoma.
In some embodiments, the second method according to the invention allows the differential diagnosis of uterine leiomyoma when the deviation in the level of expression of the gene or genes is at least four-fold with respect to the reference value or values for said genes, said reference value being the expression level of the same gene or genes determined in a sample from a patient with uterine leiomyosarcoma.
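As an illustration of the four-fold criterion and of the sign convention of the logFC values, the sketch below computes a log fold change and tests it against a minimum fold deviation. It assumes that logFC is a base-2 logarithm of the ratio between the sample and the reference, which the text does not state explicitly; the function name is hypothetical.

```python
import math


def fold_deviation_call(sample_level: float, reference_level: float,
                        min_fold: float = 4.0):
    """Return (logFC, direction, deviates) for one gene.

    logFC is assumed to be log2(sample / reference); a four-fold
    deviation then corresponds to |logFC| >= 2.
    """
    if sample_level <= 0 or reference_level <= 0:
        raise ValueError("expression levels must be positive")
    log_fc = math.log2(sample_level / reference_level)
    if log_fc > 0:
        direction = "up"
    elif log_fc < 0:
        direction = "down"
    else:
        direction = "none"
    deviates = abs(log_fc) >= math.log2(min_fold)
    return log_fc, direction, deviates
```

For example, a gene measured at 800 units against a reference of 100 units gives a logFC of 3 (eight-fold up), which exceeds the four-fold criterion.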
In some embodiments, the second method according to the invention is carried out in a patient who has been previously identified as suffering from a uterine myometrial tumor, being either uterine leiomyosarcoma or uterine leiomyoma, by imaging examination, preferably by ultrasonography.
Third differential diagnostic method of the invention (method based on genomic coverage analysis)
The authors of the present invention, using whole-exome sequencing, have identified that, unlike LM, LMS has significant mutational heterogeneity at the DNA level. Accordingly, the invention relates to an in vitro method for differential diagnosis (hereinafter “the third method of the invention”) of a subject suspected of suffering from uterine leiomyoma or uterine leiomyosarcoma, the method comprising analyzing in a biological sample from the
subject the coverage in at least one gene selected from the group consisting of the TUBB2b gene, the LRRCC1 gene, the NDRG4 gene, the HSF4 gene and the TMPRSS6 gene, wherein an increased coverage in the HSF4 gene, in the NDRG4 gene, in the TMPRSS6 gene and/or in the TUBB2b gene and/or a decreased coverage in the LRRCC1 gene is indicative that the patient has uterine leiomyosarcoma.
The terms “differential diagnosis”, “uterine leiomyoma”, “uterine leiomyosarcoma” and “sample” have been defined above and are equally applicable to the third method of the invention.
The term “coverage” refers to the abundance of sequence tags mapped to a defined sequence which is obtained after the sequence has been determined. Typically, a given genomic region is read, which may comprise the whole genome or regions thereof, and the number of reads which contain a given sequence tag within the genomic region is defined as the coverage for this specific sequence tag.
The term “sequence tag” is herein used interchangeably with the term "mapped sequence tag" to refer to a sequence read that has been specifically assigned, i.e., mapped, to a larger sequence, e.g., a reference genome, by alignment. Mapped sequence tags are uniquely mapped to a reference genome, i.e., they are assigned to a single location to the reference genome. Unless otherwise specified, tags that map to the same sequence on a reference sequence are counted once. Tags may be provided as data structures or other assemblages of data. In certain embodiments, a tag contains a read sequence and associated information for that read such as the location of the sequence in the genome, e.g., the position on a chromosome. In certain embodiments, the location is specified for a positive strand orientation. A tag may be defined to allow a limited amount of mismatch in aligning to a reference genome. In some embodiments, tags that can be mapped to more than one location on a reference genome, i.e., tags that do not map uniquely, may not be included in the analysis.
Coverage can be quantitatively indicated by sequence tag density (or count of sequence tags), sequence tag density ratio, normalized coverage amount, adjusted coverage values, etc. Once the coverage of the gene or genes of interest has been determined, it can be displayed in a variety of ways. For example, the levels can be displayed graphically on a display as numeric values or proportional bars (i.e., a bar graph) or any other display method known to those skilled in the art. The graphic display can provide a visual representation of the amount of copy number variation in the biological sample being evaluated.
The term "coverage quantity" refers to a modification of raw coverage and often represents the relative quantity of sequence tags (sometimes called counts) in a region of a genome such as a bin. A coverage quantity may be obtained by normalizing, adjusting and/or correcting the raw coverage or count for a region of the genome. For example, a normalized coverage quantity for a region may be obtained by dividing the sequence tag count mapped to the region by the total number of sequence tags mapped to the entire genome. Normalized coverage quantity allows comparison of coverage of a bin across different samples, which may have different depths of sequencing. It differs from sequence dose in that the latter is typically obtained by dividing by the tag count mapped to a subset of the entire genome. The subset is one or more normalizing segments or chromosomes. Coverage quantities, whether or not normalized, may be corrected for global profile variation from region to region on the genome, G-C fraction variations, outliers in robust chromosomes, etc.
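The normalization just described can be sketched as follows. The function name and the bin labels are hypothetical; the only operation shown is the division of each region's tag count by the total number of tags mapped to the genome.

```python
def normalized_coverage(tag_counts: dict[str, int],
                        total_mapped_tags: int) -> dict[str, float]:
    """Divide the tag count of each region (e.g. a bin) by the total
    number of sequence tags mapped to the entire genome."""
    if total_mapped_tags <= 0:
        raise ValueError("total_mapped_tags must be positive")
    return {region: count / total_mapped_tags
            for region, count in tag_counts.items()}


# Two samples sequenced at different depths become comparable:
shallow = normalized_coverage({"bin_1": 50}, total_mapped_tags=1_000)
deep = normalized_coverage({"bin_1": 500}, total_mapped_tags=10_000)
```

In this toy example both samples give the same normalized value for `bin_1` even though the deeper sample has ten times as many raw tags.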
In some embodiments, step (a) described above can comprise sequencing at least a portion of the nucleic acid molecules of a test sample to obtain said sequence information for the nucleic acid molecules of the test sample. In some embodiments, step (c) comprises calculating a single gene dose for each of the genes of interest as the ratio of the number of sequence tags or the other parameter identified for each of the genes of interest and the number of sequence tags or the other parameter identified for the genes chromosome sequence(s). In some other embodiments, gene dose is based on processed sequence coverage quantities derived from the number of sequence tags or another parameter. In some embodiments, only unique, non-redundant sequence tags are used to calculate the processed sequence coverage quantities or another parameter. In some embodiments, the processed sequence coverage quantity is a sequence tag density ratio, which is the number of sequence tags standardized by sequence length. In some embodiments, the processed sequence coverage quantity or the other parameter is a normalized sequence tag or another normalized parameter, which is the number of sequence tags or the other parameter of a sequence of interest divided by that of all or a substantial portion of the genome. In some embodiments, the processed sequence coverage quantity or the other parameter such as a fragment size parameter is adjusted according to a global profile of the sequence of interest. In some embodiments, the processed sequence coverage quantity or the other parameter is adjusted according to the within-sample correlation between the GC content and the sequence coverage for the sample being tested. In some embodiments, the processed sequence coverage quantity or the other parameter results from combinations of these processes, which are further described elsewhere herein.
In some embodiments, a gene dose is calculated as the ratio of the processed sequence coverage or the other parameter for each of the genes of interest and that for the normalizing gene sequence(s).
In some embodiments, the biological sample is a sample containing myometrial cells or a sample containing DNA from myometrial cells.
After determining the coverage of one or more of the genes mentioned above, the value is compared to the coverage of the same gene in a reference sample. In one embodiment, the coverage in the sample of the patient who is to be differentially diagnosed is compared to the coverage in the genomic DNA of a uterine leiomyoma and/or to the coverage in the genomic DNA of a uterine leiomyosarcoma, thereby determining the variation in coverage for the specific gene with respect to the genomic DNA of uterine leiomyoma and the genomic DNA of uterine leiomyosarcoma.
The term “increased coverage” for a given gene, in the context of the third method of the invention is understood as that, when the genome containing the gene is sequenced, the number of reads which contain the sequence of the gene or of sequence tags associated with said gene is increased with respect to the number of reads containing the sequence of the gene or of sequence tags associated with said gene in a reference sample, wherein the reference sample is either a sample of a patient suffering from uterine leiomyosarcoma or a sample from a patient suffering uterine leiomyoma. In some embodiments, the coverage of the gene under consideration is at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 100% higher than the coverage of the gene in the reference sample.
The term “decreased coverage” for a given gene, in the context of the third method of the invention, is understood as that, when the genome containing the gene is sequenced, the number of reads which contain the sequence of the gene or of sequence tags associated with said gene is decreased with respect to the number of reads containing the sequence of the gene or of sequence tags associated with said gene in a reference sample, wherein the reference sample is either a sample of a patient suffering from uterine leiomyosarcoma or a sample from a patient suffering from uterine leiomyoma. In some embodiments, the coverage of the gene under consideration is less than 90%, less than 80%, less than 70%, less than 60%, less than 50%, less than 40%, less than 30%, less than 20%, less than 10%, or even lower, of the coverage of the gene in the reference sample.
Accordingly, in some embodiments, a subject can be diagnosed as having uterine leiomyosarcoma if the subject shows an increased coverage in the HSF4 gene, in the NDRG4 gene, in the TMPRSS6 gene and/or in the TUBB2b gene with respect to a reference sample.
In other embodiments, the subject can be diagnosed as having uterine leiomyosarcoma if the subject shows a decreased coverage in the LRRCC1 gene with respect to a reference sample.
In some embodiments, a subject can be diagnosed as having uterine leiomyosarcoma if the subject shows an increased coverage in the HSF4 gene, in the NDRG4 gene, in the TMPRSS6 gene and in the TUBB2b gene and a decreased coverage in the LRRCC1 gene with respect to a reference sample.
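The coverage-based rule for the five genes can be sketched as below. The gene directions are taken from the text; the 10% minimum change, the function names and the use of normalized coverage values as input are assumptions made for illustration, and the reference is assumed here to be a uterine leiomyoma sample.

```python
INCREASED_IN_LMS = ("HSF4", "NDRG4", "TMPRSS6", "TUBB2b")
DECREASED_IN_LMS = ("LRRCC1",)


def coverage_flags(sample: dict[str, float], reference: dict[str, float],
                   min_change: float = 0.10) -> dict[str, bool]:
    """For each gene, report whether its coverage deviates from the
    reference in the direction associated with leiomyosarcoma."""
    flags = {}
    for gene in INCREASED_IN_LMS:
        flags[gene] = sample[gene] >= reference[gene] * (1.0 + min_change)
    for gene in DECREASED_IN_LMS:
        flags[gene] = sample[gene] <= reference[gene] * (1.0 - min_change)
    return flags


def all_genes_consistent_with_lms(sample: dict[str, float],
                                  reference: dict[str, float],
                                  min_change: float = 0.10) -> bool:
    """Strictest embodiment: all five genes deviate as expected."""
    return all(coverage_flags(sample, reference, min_change).values())
```

The per-gene flags correspond to the "and/or" embodiments, in which a deviation in any one of the genes may be considered, while the combined function corresponds to the embodiment requiring all five deviations.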
In some embodiments, the third method according to the invention is carried out in a patient who has been previously identified as suffering from a uterine myometrial tumor, being either uterine leiomyosarcoma or uterine leiomyoma, by imaging examination, preferably by ultrasonography.
First diagnostic method of the invention (method based on genomic mutational analysis)
The authors of the present invention, using whole-exome and RNA sequencing (RNAseq), have identified that, unlike LM, LMS has significant mutational heterogeneity. This allows the creation of a predictive model for comprehensive molecular classification of LMS and LM at the tumor-tissue level based on the presence of certain mutational signatures. Accordingly, in yet another aspect, the invention relates to an in vitro method (hereinafter referred to interchangeably as “fourth method of the invention” or “diagnostic method of the invention”) for the diagnosis of a uterine tumor selected from the group consisting of uterine leiomyoma or uterine leiomyosarcoma in a subject, the method comprising determining in the whole-exome sequence of a biological sample from the subject the value of an index which correlates with the number of single nucleotide variants which are characteristic of the COSMIC mutational signature 12 and/or of the COSMIC mutational signature 20, wherein an increase in said index with respect to a reference sample is indicative that the subject is suffering from uterine leiomyosarcoma or from uterine leiomyoma.
The term “diagnosis”, as used herein, refers both to the process of attempting to determine and/or identify a possible disease in a subject, i.e. the diagnostic procedure, and to the opinion reached by this process, i.e. the diagnostic opinion. As such, it can also be regarded as an attempt at classification of an individual's condition into separate and distinct categories that allow medical decisions about treatment and prognosis to be made. In particular, the term “diagnosis of the uterine tumor” relates to the capacity to identify or detect the presence of a tumor in a subject. This detection, as it is understood by a person skilled in the art, does not claim to be correct in 100% of the analyzed samples. However, it requires that a statistically significant amount of the analyzed samples are classified correctly. The amount that is statistically significant can be established by a person skilled in the art by means of using different statistical tools; illustrative, non-limiting examples of said statistical tools include determining confidence intervals, determining the p-value, the Student’s t-test or Fisher’s discriminant functions, etc. (see, for example, Dowdy and Wearden, Statistics for Research, John Wiley & Sons, New York 1983). The confidence intervals are preferably at least 90%, at least 95%, at least 97%, at least 98% or at least 99%. The p-value is preferably less than 0.1, less than 0.05, less than 0.01, less than 0.005 or less than 0.0001. The teachings of the present invention preferably allow correctly diagnosing in at least 60%, in at least 70%, in at least 80%, or in at least 90% of the subjects of a determined group or population analyzed.
The terms “uterine leiomyoma” and “uterine leiomyosarcoma” have been defined above and are equally applicable to the fourth method according to the invention.
The fourth method of the invention comprises determining in a whole-exome sequence of a biological sample from a subject the value of an index which correlates with the number of single nucleotide variants which are characteristic of the COSMIC mutational signature 12 and/or of the COSMIC mutational signature 20.
The expression "whole exome sequence" generally means the sequence that results from the sequencing of all the protein-coding genes in a genome (known as the exome). It consists of first selecting only the subset of DNA that encodes proteins (known as exons) and then sequencing this DNA using any high-throughput DNA sequencing technology. Humans have about 180,000 exons, constituting about 1.5 percent of the human genome, or approximately 30 million base pairs. In particular, the exome sequencing may be carried out by next-generation sequencing.
The fourth method according to the invention comprises determining within the whole-exome sequence data the value of an index which correlates with the number of
single nucleotide variants which are characteristic of the COSMIC mutational signature 12 and/or of the COSMIC mutational signature 20.
The term “mutational index which correlates with the number of single nucleotide variants”, as used herein, refers to a numeric value which defines the number of one or more types of predetermined single nucleotide variants within a given sequence dataset. In some embodiments, the index is the number of all the mutations present in the sequence dataset.
In some embodiments, the mutational index corresponds to the mutational profile similarity as described in Blokzijl et al. (Genome Med., 2018, 10(1):33, doi: 10.1186/s13073-018-0539-0), which measures the similarity between two mutational profiles.
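A similarity between two mutational profiles can be sketched as a cosine similarity between two vectors of mutation counts defined over the same ordered set of mutation types. That the similarity in the cited reference is a cosine similarity is an assumption of this sketch, and the function name is hypothetical.

```python
import math
from collections.abc import Sequence


def cosine_similarity(profile_a: Sequence[float],
                      profile_b: Sequence[float]) -> float:
    """Cosine similarity between two mutational profiles, each given as
    counts (or fractions) over the same ordered list of mutation types."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same mutation types")
    dot = sum(a * b for a, b in zip(profile_a, profile_b))
    norm_a = math.sqrt(sum(a * a for a in profile_a))
    norm_b = math.sqrt(sum(b * b for b in profile_b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("profiles must contain at least one mutation")
    return dot / (norm_a * norm_b)
```

Under this measure, a value close to 1 indicates that the sample profile has nearly the same relative composition as the signature, regardless of the total number of mutations, and a value of 0 indicates no shared mutation types.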
The term “COSMIC mutational signature”, as used herein, refers to a list of somatic single nucleotide variants that appear in cancer cells as a result of the mutational processes that have been operative in said cells as described by Alexandrov et al. (Nature, 2013, 500(7463):415-21, doi: 10.1038/nature12477). COSMIC is the acronym of "Catalogue of Somatic Mutations in Cancer".
The term “COSMIC mutational signature 12” is characterized by a transcriptional strand bias for T>C substitutions with more mutations of A than T on the untranscribed strands of genes consistent with damage to adenine and repair by transcription-coupled nucleotide excision repair.
The term “COSMIC mutational signature 20” refers to a signature that results from concurrent POLD1 (polymerase delta 1) mutations and defective DNA mismatch repair, leading to microsatellite instability. Mutational profiles characterized by microsatellite instability are well known, as are methods for their identification once a genomic sequence is available (see e.g. Shah et al., Cancer Res. 2010 Jan 15; 70(2): 431-435).
A list of mutation types that are characteristic of the COSMIC mutational signature 12 and of the COSMIC mutational signature 20 is provided in Alexandrov et al. (Nature, 2013, 500(7463):415-21, doi: 10.1038/nature12477) as well as in the COSMIC SBS database accessible under https://cancer.sanger.ac.uk/signatures/sbs/, the contents of which are incorporated herein by reference in their entirety.
In some embodiments, step (ii) of the fourth method of the invention comprises the determination of the mutational index for COSMIC signatures 12 and 20.
Once the mutational index is determined, the diagnostic method of the invention comprises diagnosing the presence of uterine leiomyosarcoma or uterine leiomyoma if said index is increased with respect to a reference sample.
The term “reference value” has been defined above in the context of the first, second or third method of the invention and is equally applicable to the fourth method of the invention, with the exception that, in this case, the reference sample which is used for the determination of the reference value is a sample obtained from a subject or from a pool of subjects who do not suffer from uterine cancer or who do not suffer from leiomyoma or leiomyosarcoma. Thus, in an embodiment, the reference value is the mean mutational index in a pool of samples from patients who do not suffer from uterine cancer or who do not suffer from leiomyoma or leiomyosarcoma. In another embodiment, the reference value for the mutational index is the mean mutational index in a pool of samples preferably obtained from subjects suffering from the same type of cancer as the patient object of the study. In a particular embodiment, the reference value is the mutational index in a pool from primary tumor tissues obtained from patients.
Once the mutational index for the COSMIC signature or signatures in the subject under study is established, the values can be compared with the reference value, and thus be assigned a level of increase with respect to a reference value. The “increase” can be of at least 1.1-fold, 1.5-fold, 2-fold, 5-fold, 10-fold, 20-fold, 30-fold, 40-fold, 50-fold, 60-fold, 70-fold, 80-fold, 90-fold, 100-fold or even more compared with the reference value. Similarly, the value is considered increased in a sample of the subject under study when the value increases with respect to the reference sample by at least 5%, by at least 10%, by at least 15%, by at least 20%, by at least 25%, by at least 30%, by at least 35%, by at least 40%, by at least 45%, by at least 50%, by at least 55%, by at least 60%, by at least 65%, by at least 70%, by at least 75%, by at least 80%, by at least 85%, by at least 90%, by at least 95%, by at least 100%, by at least 110%, by at least 120%, by at least 130%, by at least 140%, by at least 150%, or more. The comparison of the mutational values for each of the COSMIC signatures or for both COSMIC signatures with the reference value for each of the signatures allows the diagnosis of a uterine tumor selected from uterine leiomyoma or uterine leiomyosarcoma.
In some embodiments, the biological sample is a sample containing myometrial cells or DNA derived from myometrial cells. In yet another embodiment, the sample containing myometrial cells is a myometrial biopsy.
In some embodiments, the fourth method according to the invention is carried out in a patient that has been previously identified as suffering uterine leiomyosarcoma or uterine leiomyoma by imaging examination, preferably by ultrasonography.
Prognostic method of the invention (method based on genomic mutational data)
The authors of the present invention have also found that the presence of CNVs in certain genes correlates with shorter survival in patients suffering from leiomyosarcoma. Accordingly, in another aspect, the invention relates to a method (hereinafter referred to interchangeably as “fifth method of the invention” or “prognostic method of the invention”) for the prognosis of a patient diagnosed with uterine leiomyosarcoma, comprising
(i) analyzing a biological sample of the subject for the presence of at least one of the CNVs shown in Table 5, and
(ii) comparing the number of CNVs obtained in step (i) with a reference value wherein the presence of an increased number of CNVs with respect to the reference value, is indicative of a bad prognosis of uterine leiomyosarcoma.
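Steps (i) and (ii) can be sketched as a count of detected CNVs that belong to a predefined list, followed by a comparison with a reference value. The CNV identifiers below are placeholders and do not reproduce Table 5; the function names and the reference value are likewise hypothetical.

```python
def count_listed_cnvs(detected_cnvs: set[str], listed_cnvs: set[str]) -> int:
    """Step (i): number of detected CNVs that belong to the predefined list."""
    return len(detected_cnvs & listed_cnvs)


def indicates_bad_prognosis(detected_cnvs: set[str], listed_cnvs: set[str],
                            reference_count: float) -> bool:
    """Step (ii): True when the count exceeds the reference value."""
    return count_listed_cnvs(detected_cnvs, listed_cnvs) > reference_count


# Placeholder identifiers standing in for the entries of the CNV table.
listed = {"CNV_A", "CNV_B", "CNV_C", "CNV_D"}
detected = {"CNV_A", "CNV_C", "CNV_D", "CNV_UNLISTED"}
```

With these placeholder sets, three listed CNVs are detected, so a reference value of 2 would give an indication of bad prognosis and a reference value of 3 would not.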
The terms “uterine leiomyosarcoma”, “sample” and “subject” have been defined above in the context of the previous methods of the invention and are equally applicable to the prognostic method of the invention.
The prognostic or fifth method according to the invention allows the determination of the prognosis of a patient suffering from uterine leiomyosarcoma.
As it is used herein, the term “prognosis” refers to the prediction in a subject having a leiomyosarcoma of the likelihood of cancer-attributable death or progression, including recurrence, metastatic spread, and drug resistance, of a leiomyosarcoma. Prognosis may also be referred to in terms of "aggressiveness" or "severity": an aggressive cancer is determined to have a high risk of negative outcome (i.e., negative or poor prognosis) and a non-aggressive cancer has a low risk of negative outcome (i.e., positive or favorable prognosis). An "aggressive" or "severe" tumour is a cell-proliferation disorder that has the biological capability to rapidly spread outside of its primary location or organ. Indicators of tumour aggressiveness that are standard in the art include, without limitation, tumour stage, tumour grade, Gleason grade, Gleason score, nodal status, and survival. In this context, the term "survival" is not limited to mean survival until mortality (wherein said mortality may be either irrespective of cause or related to a cell-proliferation disorder), but may also be used in combination with other terms to define clinical outcomes (e.g., "recurrence-free survival", in which the term "recurrence" includes both localized and distant recurrence; "metastasis-free survival"; "disease-free survival", in which the term "disease" includes cancer and diseases associated therewith). The length of the survival may be calculated by reference to a defined starting point (e.g., time of diagnosis or start of treatment) and a defined end point. Accordingly,
a negative or poor prognosis is defined by a lower post-treatment survival term or survival rate. Conversely, a positive or good prognosis is defined by an elevated post-treatment survival term or survival rate. Usually prognosis is provided as the time of progression-free survival or overall survival. As one skilled in the art will understand, said determination is not usually correct for all (i.e., 100%) of the patients to be identified. However, the term requires being able to identify a significant part of the subjects. One skilled in the art can readily determine if a part is statistically significant using several well-known statistical evaluation tools, for example, the determination of confidence intervals, the determination of p-values, Student’s t-test, Mann-Whitney test, etc. Details can be found in Dowdy and Wearden, Statistics for Research, John Wiley and Sons, New York 1983. Preferred confidence intervals are at least 90%, at least 95%, at least 97%, at least 98%, or at least 99%. The p-values are preferably 0.1, 0.05, 0.01, 0.005, or 0.0001. More preferably, at least 60%, at least 70%, at least 80%, or at least 90% of the subjects of a population can be suitably identified by the method of the present invention.
The prognostic method according to the invention comprises determining in a biological sample of the subject the presence of at least one of the CNVs shown in Table 5.
The term “CNV” or “copy number variation” refers to variation in the number of copies of a nucleic acid sequence present in a test sample in comparison with the copy number of the nucleic acid sequence present in a reference sample. In certain embodiments, the nucleic acid sequence is 1 kb or larger. In some cases, the nucleic acid sequence is a whole chromosome or significant portion thereof. A "copy number variant" refers to the sequence of nucleic acid in which copy-number differences are found by comparison of a nucleic acid sequence of interest in a test sample with an expected level of the nucleic acid sequence of interest. For example, the level of the nucleic acid sequence of interest in the test sample is compared to that present in a qualified sample. Copy number variants/variations include deletions, including microdeletions, insertions, including microinsertions, duplications, multiplications, and translocations. CNVs encompass chromosomal aneuploidies and partial aneuploidies.
Methods for determining CNV of a gene of interest are well-known in the art such as, for instance, the methods disclosed in U.S. patent application Ser. No. 16/913,965; Hastings et al., Nat Rev Genet, 10(8):551-64 (2009); and Shishido et al., Psychiatry Clin Neurosci, 68(2):85-95 (2014), the disclosures of which are incorporated by reference herein. Existing methods to determine CNVs typically include cytogenetic methods such as fluorescent in situ hybridization, comparative genomic hybridization, and/or virtual
karyotyping with SNP arrays. Other methods include next-generation sequencing and quantitative PCR (qPCR), paralog-ratio testing (PRT) and molecular copy number counting (MCC). qPCR compares threshold cycles (Ct) between the target gene and a reference sequence with normal copy numbers, to generate ΔCt values which are used for CNV calculation. This method has been used in large-scale CNV analysis in detecting disease associations, for example, psoriasis and Crohn's disease. With the development of genome-wide CNV screening, qPCR is often used as a confirmation method for computationally identified loci. Other multiplex PCR-based approaches, such as multiplex amplifiable probe hybridization, multiplex ligation-dependent probe amplification, multiplex PCR-based real-time invader assay, quantitative multiplex PCR of short fluorescent fragments, and multiplex amplicon quantification, have also been used for targeted screening and validation of CNVs.
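The use of ΔCt values for CNV calculation can be illustrated with the commonly used comparative Ct (ΔΔCt) approach. This is a sketch under stated assumptions, not a procedure taken from the text: amplification efficiency is assumed to be close to 100% (a doubling per cycle), a calibrator sample with a known copy number (two copies by default) is assumed to be available, and the function names are hypothetical.

```python
def delta_ct(ct_target: float, ct_reference: float) -> float:
    """Difference in threshold cycle between target and reference sequence."""
    return ct_target - ct_reference


def estimate_copy_number(sample_ct_target: float, sample_ct_reference: float,
                         calibrator_ct_target: float,
                         calibrator_ct_reference: float,
                         calibrator_copies: float = 2.0) -> float:
    """Estimate target copy number from Ct values, assuming the amount of
    product doubles every cycle and the calibrator carries a known number
    of copies of the target."""
    delta_delta_ct = (delta_ct(sample_ct_target, sample_ct_reference)
                      - delta_ct(calibrator_ct_target, calibrator_ct_reference))
    return calibrator_copies * 2.0 ** (-delta_delta_ct)
```

Under these assumptions, a target that reaches the threshold one cycle earlier than in the calibrator (relative to the reference sequence) corresponds to roughly twice as many copies, and one cycle later to roughly half.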
In some embodiments, copy number variation is determined by whole genome sequencing. In some embodiments, copy number variation is determined by whole exome sequencing. Both whole genome sequencing and whole exome sequencing can be carried out by Next Generation Sequencing.
The term “Next Generation Sequencing”, as used herein, refers to sequencing technologies having a higher throughput than traditional Sanger- and capillary electrophoresis-based approaches, wherein the sequencing process is performed in parallel, for example producing thousands or millions of relatively small sequence reads at a time.
For example, in some embodiments, determining copy number variation includes the steps of: a. providing at least two sets of first polynucleotides, wherein each set maps to a different reference sequence in a genome, and, for each set of first polynucleotides: i. amplifying the polynucleotides to produce a set of amplified polynucleotides; ii. sequencing a subset of the set of amplified polynucleotides, to produce a set of sequencing reads; iii. grouping sequence reads sequenced from amplified polynucleotides into families, each family amplified from the same first polynucleotide in the set; iv. inferring a quantitative measure of families in the set; v. determining copy number variation by comparing the quantitative measure of families in each set.
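The family-based steps above can be sketched as follows. This is a minimal illustration which assumes that each read carries a molecule tag identifying the first polynucleotide from which it was amplified, and which uses the number of observed families as the quantitative measure; the tag scheme, the set names, and the function names are hypothetical.

```python
from collections import defaultdict


def count_families(reads):
    """Count distinct read families per reference set.

    reads: iterable of (reference_set, molecule_tag) tuples, one per read.
    Reads sharing a set and a tag are treated as one family (assumption).
    """
    families = defaultdict(set)
    for reference_set, molecule_tag in reads:
        families[reference_set].add(molecule_tag)
    return {ref: len(tags) for ref, tags in families.items()}


def family_ratio(reads, test_set, control_set):
    """Compare the family counts of two reference sets as a simple ratio."""
    counts = count_families(reads)
    return counts[test_set] / counts[control_set]


# Hypothetical reads: three families map to "geneA", two to "control"
reads = [
    ("geneA", "m1"), ("geneA", "m1"), ("geneA", "m2"), ("geneA", "m3"),
    ("control", "m7"), ("control", "m7"), ("control", "m8"),
]
print(count_families(reads))
print(family_ratio(reads, "geneA", "control"))
```

Counting families rather than raw reads limits the influence of unequal amplification, since duplicate reads from the same original molecule are counted once.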
In some embodiments, the method for determining the presence or absence of any CNV in a sample comprises (a) obtaining sequence information for nucleic acids in the sample; (b) using the sequence information and the method described above to identify a number of sequence tags, sequence coverage quantity, a fragment size parameter, or another parameter for each of the genes of interest and to identify a
number of sequence tags or another parameter for one or more normalizing gene sequences; (c) using the number of sequence tags or the other parameter identified for each of the genes of interest and the number of sequence tags or the other parameter identified for each of the normalizing genes to calculate a single gene dose for each of the genes of interests; and (d) comparing each gene dose to a threshold value, and thereby determining the presence or absence of any complete CNVs in the sample.
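Steps (c) and (d) above can be illustrated with the following sketch. It assumes that the gene dose is the ratio of the tag count of a gene of interest to the summed tag count of the normalizing genes, and that a loss or a gain is called when the dose falls outside a lower or an upper threshold; the gene names, tag counts, and threshold values are hypothetical.

```python
def gene_doses(tag_counts, genes_of_interest, normalizing_genes):
    """Return one dose per gene of interest (assumed: tags / normalizing tags)."""
    normalizing_total = sum(tag_counts[g] for g in normalizing_genes)
    if normalizing_total == 0:
        raise ValueError("normalizing genes have no sequence tags")
    return {g: tag_counts[g] / normalizing_total for g in genes_of_interest}


def call_cnv(dose, lower_threshold, upper_threshold):
    """Compare a gene dose with thresholds (hypothetical two-sided rule)."""
    if dose < lower_threshold:
        return "loss"
    if dose > upper_threshold:
        return "gain"
    return "none"


# Hypothetical tag counts per gene
tag_counts = {"GENE_A": 300, "GENE_B": 100, "NORM_1": 100, "NORM_2": 100}
doses = gene_doses(tag_counts, ["GENE_A", "GENE_B"], ["NORM_1", "NORM_2"])
for gene, dose in doses.items():
    print(gene, dose, call_cnv(dose, lower_threshold=0.75, upper_threshold=1.25))
```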
The term “sequence tag” has been defined above in the context of the third method according to the invention and is herein used interchangeably with the term "mapped sequence tag" to refer to a sequence read that has been specifically assigned, i.e., mapped, to a larger sequence, e.g., a reference genome, by alignment. Mapped sequence tags are uniquely mapped to a reference genome, i.e., they are assigned to a single location in the reference genome. Unless otherwise specified, tags that map to the same sequence on a reference sequence are counted once. Tags may be provided as data structures or other assemblages of data. In certain embodiments, a tag contains a read sequence and associated information for that read such as the location of the sequence in the genome, e.g., the position on a chromosome. In certain embodiments, the location is specified for a positive strand orientation. A tag may be defined to allow a limited amount of mismatch in aligning to a reference genome. In some embodiments, tags that can be mapped to more than one location on a reference genome, i.e., tags that do not map uniquely, may not be included in the analysis.
Methods for the determination of the presence of a given CNV have been described above in the context of the invention and are equally applicable to the present method.
In some embodiments, the CNV is a deletion. In some embodiments, the CNV is a duplication. In some embodiments, the first step comprises the determination of at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11 or the 12 CNVs shown in Table 5.
The term “increased copy number for a gene” in the context of the third differential diagnostic method of the invention is understood to mean that at least one additional copy of the gene is present in the sample from the patient which is to be differentially diagnosed with respect to a reference sample, wherein the reference sample is either a sample from a patient suffering from uterine leiomyosarcoma or a sample from a patient suffering from uterine leiomyoma. In some embodiments, the copy number of the gene under consideration is at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or more copies of the gene in the reference sample.
The term “decreased copy number” in the context of the fourth diagnostic method of the invention is understood to mean that at least one fewer copy of the gene is present in the sample from the subject which is to be differentially diagnosed as compared to the copy number of the same gene in the reference sample, wherein the reference sample is either a sample from a patient suffering from uterine leiomyosarcoma or a sample from a patient suffering from uterine leiomyoma.
In some embodiments, the biological sample is a sample containing myometrial cells, DNA derived from myometrial cells or RNA derived from myometrial cells. In yet another embodiment, the sample containing myometrial cells is a myometrial biopsy. In some embodiments, the biological sample is a biofluid. In some embodiments, the biofluid is plasma, blood, serum, urine or uterine fluid.
The prognostic or fifth method of the invention comprises determining that the patient shows a bad prognosis if one or more of the CNVs listed in Table 5 are present in the sample from the subject.
The term “bad prognosis”, as used herein, denotes a significantly less favorable probability of survival after patient treatment in the group of patients defined as "bad prognosis" compared with the group of patients defined as "good prognosis". According to the invention, the term "bad prognosis" also denotes a significantly less favorable probability of not needing treatment to survive in the group of patients defined as "bad prognosis" compared with the group of patients defined as "good prognosis". In one embodiment, the prognosis of the patient is measured as survival, as disease-free progression or using any other parameter which is reflective of the outcome of the patient.
Methods for selecting a therapy for a subject based on the diagnostic or differential diagnostic methods according to the invention
In another aspect, the invention relates to an in vitro method for selecting a subject suspected of suffering from uterine leiomyoma or uterine leiomyosarcoma as a candidate to receive an adequate therapy to treat uterine leiomyosarcoma or uterine leiomyoma, the method comprising:
(i) determining whether the patient is suffering from uterine leiomyoma or leiomyosarcoma following any of the diagnostic or differential diagnostic methods according to the invention and
(ii) selecting said patient as a candidate to receive an adequate therapy to treat uterine leiomyosarcoma if the patient is diagnosed as suffering from uterine leiomyosarcoma or to receive an adequate therapy to treat uterine leiomyoma if the patient is diagnosed as suffering from uterine leiomyoma.
In a first step, the in vitro method for selecting a subject suspected of suffering from uterine leiomyoma or uterine leiomyosarcoma as a candidate to receive an adequate therapy to treat uterine leiomyosarcoma or uterine leiomyoma comprises determining whether the patient is suffering from uterine leiomyoma or leiomyosarcoma by using any of the diagnostic or differential diagnostic methods according to the invention.
In a second step, the method comprises selecting a patient to be treated with either a therapy adequate for the treatment of uterine leiomyosarcoma if the patient is diagnosed as suffering from uterine leiomyosarcoma or with a therapy adequate for the treatment of uterine leiomyoma if the patient is diagnosed as suffering from uterine leiomyoma.
In some embodiments, when the patient is being diagnosed with leiomyosarcoma, the patient is selected to be treated by a therapy selected from the group consisting of surgery, radiation therapy, chemotherapy, hormonal therapy, or targeted therapy.
In some embodiments, the surgery is a simple hysterectomy, radical hysterectomy, or bilateral salpingo-oophorectomy.
In some embodiments, the chemotherapy includes one or more drugs selected from the group consisting of Dacarbazine (DTIC), docetaxel, doxorubicin, epirubicin, gemcitabine, ifosfamide, Paclitaxel, temozolomide, trabectedin and vinorelbine.
In some embodiments, the hormonal therapy comprises:
- Progestin,
- An agonist of the Gonadotropin-releasing hormone such as goserelin or leuprolide, leuprorelin acetate, leuprorelin acetate sustained release depot (ATRIGEL), triptorelin pamoate, buserelin, nafarelin, histrelin, goserelin, deslorelin, degarelix, ozarelix, ABT-620 (elagolix), TAK-385 (relugolix), EP-100, KLH-2109 or triptorelin and goserelin acetate, or
- An aromatase inhibitor, which is defined as a compound which inhibits estrogen production, for instance, the conversion of the substrates androstenedione and testosterone to estrone and estradiol, respectively, and that includes, but is not limited to steroids, especially atamestane,
exemestane and formestane and, in particular, non-steroids, especially aminoglutethimide, roglethimide, pyridoglutethimide, trilostane, testolactone, ketoconazole, vorozole, fadrozole, anastrozole and letrozole.
In some embodiments, the therapy is a targeted therapy. The term “targeted therapy”, as used herein, refers to drugs which attack specific genetic mutations within cancer cells, such as leiomyosarcoma cells, while minimizing harm to healthy cells. In some embodiments, the targeted therapy comprises the use of pazopanib.
In some embodiments, when the patient is being diagnosed with leiomyoma, the patient is selected to be treated by a morcellation procedure or other surgical methods for removing leiomyomas after they have been determined not to comprise any leiomyosarcomas. It will be well known that large tissue masses, such as fibroid tissue masses (leiomyomas), are traditionally excised during a surgical procedure and removed intact from the patient through the surgical incision. These tissue masses can easily be several centimeters in diameter or larger. In minimally invasive surgery, the surgery is typically conducted using incisions of less than 1 centimeter, and often 5 millimeters or less. Thus, the trend toward the use of minimally invasive surgery has created a need to reduce large tissue masses to a size small enough to fit through an opening which may be 1 centimeter or smaller in size. It will be appreciated that one common procedure for reducing the size of large tissue masses is morcellation.
Morcellation medical devices are well-known in the art. For example, the instruments described in U.S. Pat. Nos. 5,037,379; 5,403,276; 5,520,634; 5,327,896 and 5,443,472 can be used herein (each patent is incorporated herein by reference). As those references illustrate, excised tissue is morcellated (i.e. debulked), collected and removed from the patient's body through, for example, a surgical trocar or directly through one of the surgical incisions. Mechanical morcellators cut tissue using, for example, sharp end-effectors such as rotating blades. Electrosurgical and ultrasonic morcellators use energy to morcellate tissue. For example, a system for fragmenting tissue utilizing an ultrasonic surgical instrument is described in "Physics of Ultrasonic Surgery Using Tissue Fragmentation", 1995 IEEE Ultrasonics Symposium Proceedings, pages 1597-1600. In some embodiments, it may be desirable to conduct the morcellation in conjunction with a tissue specimen bag in order to prevent morcellated tissue from spreading to other parts of the body during and after the morcellation procedure. For example, the excised tissue can be transferred to a specimen bag prior to being morcellated. However, some morcellators are used without specimen bags. Specimen bags are, therefore, designed to hold excised tissue without spilling tissue, or tissue
components, into the abdominal cavity during morcellation. It will be apparent that specimen bags used with morcellators must be strong enough to prevent tears or cuts which might spill the contents of the specimen bag. Ultrasonic morcellation instruments may be particularly advantageous for use in certain surgical procedures and for debulking certain types of tissue. A blunt or rounded ultrasonic morcellator tip may reduce the possibility of unintended cutting or tearing of a specimen bag while the ultrasonic energy morcellates the tissue.
Therapeutic methods coupled to the differential diagnostic and diagnostic methods of the invention
The invention also provides methods for the treatment of subjects which have been identified as suffering from leiomyosarcoma by any of the differential diagnostic or diagnostic methods according to the invention, wherein if the subject has been diagnosed as suffering from a leiomyosarcoma, the subject is treated with a therapy adequate for the treatment of leiomyosarcoma. In some embodiments, the therapy is selected from the group consisting of surgery, radiation therapy, chemotherapy, hormonal therapy or a replacement therapy.
The term “treatment”, as used herein comprises any type of therapy, which aims at terminating, preventing, ameliorating and/or reducing the susceptibility to a clinical condition as described herein. In a preferred embodiment, the term treatment relates to prophylactic treatment (i.e. a therapy to reduce the susceptibility of a clinical condition, a disorder or condition as defined herein). Thus, “treatment,” “treating,” and the like, as used herein, refer to obtaining a desired pharmacologic and/or physiologic effect, covering any treatment of a pathological condition or disorder in a mammal, including a human. The effect may be prophylactic in terms of completely or partially preventing a disorder or symptom thereof and/or may be therapeutic in terms of a partial or complete cure for a disorder and/or adverse effect attributable to the disorder. That is, “treatment” includes (1) preventing the disorder from occurring or recurring in a subject, (2) inhibiting the disorder, such as arresting its development, (3) stopping or terminating the disorder or at least symptoms associated therewith, so that the host no longer suffers from the disorder or its symptoms, such as causing regression of the disorder or its symptoms, for example, by restoring or repairing a lost, missing or defective function, or stimulating an inefficient process, or (4) relieving, alleviating, or ameliorating the disorder, or symptoms associated therewith, where ameliorating is used in a broad sense to refer to
at least a reduction in the magnitude of a parameter, such as inflammation, pain, and/or immune deficiency.
The therapeutic methods according to the invention are applied to patients which have been diagnosed as suffering from leiomyosarcoma by using any of the diagnostic methods for leiomyosarcoma or the differential diagnostic methods according to the invention. In some embodiments, the method comprises a first step in which the diagnostic method for leiomyosarcoma or the differential diagnostic method is applied to the patient, a second step in which patients diagnosed as suffering from leiomyosarcoma are selected, and a third step in which the patients are treated with a therapy adequate for the treatment of leiomyosarcoma.
Suitable surgical therapies, radiation therapies, chemotherapies, hormonal therapies or targeted therapies have been described in the context of the methods for selecting a therapy for a subject based on the diagnostic or differential diagnostic methods according to the invention and are equally applicable to the therapeutic methods according to the invention.
Kit of the invention and uses thereof
In another aspect, the invention relates to a kit, package or device that contains reagents adequate for implementing any of the methods of the invention. It will be understood that, depending on the nature of the method, the reagents adequate for its implementation will vary.
In the context of the present invention, “kit” is understood as a product containing the different reagents required for carrying out the methods of the invention packaged such that it allows being transported and stored. The materials suitable for the packaging of the components of the kit include glass, plastic (polyethylene, polypropylene, polycarbonate, and the like), bottles, vials, paper, sachets, and the like. Where there is more than one component in a kit, they may be packaged together if suitable or the kit will generally contain a second, third or other additional container into which the additional components may be separately placed. However, in some embodiments, certain combinations of components may be packaged together comprised in one container means. A kit can also include a means for containing any reagent containers in close confinement for commercial sale. Such containers may include injection or blow-molded plastic containers into which the desired vials are retained. One or more compositions of a kit can be lyophilized. In some embodiments, all compositions of a kit
of the disclosure will be lyophilized. In some embodiments, a kit of the disclosure with one or more lyophilized agents will be supplied with a re-constitution buffer. Reagents and components of kits may be comprised in one or more suitable container means. A container means may generally comprise at least one vial, test tube, flask, bottle, syringe or other container means, into which a component may be placed, and preferably, suitably aliquoted.
Furthermore, kits according to the invention can also comprise one or more reagents for preparing crude cell lysates and/or reagents for extracting, isolating and/or purifying nucleic acids from a sample. Additional components can comprise particles with affinity for nucleic acids and/or solid supports with affinity for nucleic acids, one or more wash buffers, binding enhancers, binding solutions, polar solvents, alcohols, elution buffers, filter membranes and/or columns for isolation of DNA/RNA. A kit may further comprise reagents for downstream processing of an isolated nucleic acid and may include without limitation at least one RNase inhibitor; at least one cDNA construction reagent (such as reverse transcriptase); one or more reagents for amplification of RNA; one or more reagents for amplification of DNA including primers; reagents for purification of DNA; and probes for detection of specific nucleic acids. Furthermore, the kits of the invention can contain instructions for the simultaneous, sequential, or separate use of the different components that are in the kit. Said instructions can be in the form of printed material or in the form of an electronic medium capable of storing instructions such that they can be read by a subject, such as electronic storage media (magnetic disks, tapes, and the like), optical media (CD-ROM, DVD), and the like. The media may additionally or alternatively contain Internet addresses providing said instructions.
In some embodiments, the kit comprises
(i) primers or probes adequate for the detection of the expression levels of one or more of the genes, the expression levels of which are determined in the first method according to the invention or in the second method according to the invention,
(ii) primers or probes adequate for the determination of the coverage of one or more genes which are analyzed in the third method according to the invention,
(iii) primers or probes adequate for the detection of one or more of the mutations which are characteristic of the COSMIC mutational signatures 12 and/or 20 which are analyzed in the fourth method according to the invention and
(iv) primers or probes adequate for the determination of the one or more of the CNVs defined in the prognostic method (fifth method) according to the invention.
In some embodiments, when the diagnostic method or the differential diagnostic method according to the invention is based on the determination of the expression levels of one or more genes, the kits contain primers or probes adequate for the detection of the expression levels of said one or more genes.
The term "primer" as used herein refers to oligonucleotides that can specifically hybridize to a target polynucleotide sequence, due to the sequence complementarity of at least part of the primer within a sequence of the target polynucleotide sequence. A primer can have a length of at least 8 nucleotides, typically 8 to 70 nucleotides, usually of 18 to 26 nucleotides. For proper hybridization to the target sequence, a primer can have at least 75 percent, at least 80 percent, at least 85 percent, at least 90 percent, or at least 95 percent sequence complementarity to the hybridized portion of the target polynucleotide sequence. Oligonucleotides useful as primers may be chemically synthesized according to the solid phase phosphoramidite triester method first described by Beaucage and Caruthers, Tetrahedron Letts. (1981) 22: 1859-1862, using an automated synthesizer, as described in Needham-Van Devanter et al, Nucleic Acids Res. (1984) 12: 6159-6168. Primers are useful in nucleic acid amplification reactions in which the primer is extended to produce a new strand of the polynucleotide. Primers can be readily designed by a skilled artisan using common knowledge known in the art, such that they can specifically anneal to the nucleotide sequence of the target nucleotide sequence of the at least one biomarker provided herein. Usually, the 3' nucleotide of the primer is designed to be complementary to the target sequence at the corresponding nucleotide position, to provide optimal primer extension by a polymerase.
The term "probe" as used herein refers to oligonucleotides or analogs thereof that can specifically hybridize to a target polynucleotide sequence, due to the sequence complementarity of at least part of the probe within a sequence of the target polynucleotide sequence. Exemplary probes can be, for example, DNA probes, RNA probes, or peptide nucleic acid (PNA) probes. A probe can have a length of at least 8 nucleotides, typically 8 to 70 nucleotides, usually of 18 to 26 nucleotides. For proper hybridization to the target sequence, a probe can have at least 75 percent, at least 80 percent, at least 85 percent, at least 90 percent, or at least 95 percent sequence complementarity to the hybridized portion of the target polynucleotide sequence. Probes can also be chemically synthesized according to the solid phase phosphoramidite triester method as described above. Methods for preparation of DNA and RNA probes, and the conditions for hybridization thereof to target nucleotide sequences, are described in Molecular Cloning: A Laboratory Manual, J. Sambrook et al., eds., 2nd edition, Cold Spring Harbor Laboratory Press, 1989, Chapters 10 and 11.
In a preferred embodiment, the reagents adequate for the determination of the expression levels of one or more genes comprise at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90% or at least 100% of the total amount of reagents adequate for the determination of the expression levels of genes forming the kit.
In some embodiments, when the diagnostic method or the differential diagnostic method according to the invention is based on the determination of the presence of one or more CNVs, the kit comprises reagents suitable for determining the presence of the CNV such as at least a pair of target gene specific primers, a background sequence, at least a pair of primers specific to the background sequence, optionally, a pair of primers specific to a control or reference sequence, a DNA polymerase, dNTPs, MgCl2 and one or more buffers. The kit can also comprise one or more probes, wherein the probe comprises a nucleic acid sequence operable to selectively hybridize to: a target nucleic acid sequence, a reference/control nucleic acid sequence and/or to an amplicon or a fragment of an amplicon, including a target gene amplicon or a fragment thereof, a reference/control amplicon or fragment, and, in some embodiments, optionally to hybridize selectively to a background sequence, an amplicon or a fragment of an amplicon of a background sequence. Probes in a kit of the disclosure can include probes to perform a 5' nuclease assay and/or one or more probes to detect the products of amplification. In some embodiments, one or more probes of a kit of the disclosure are labeled. In some embodiments, one or more probes of a kit of the disclosure are dual-labeled probes. In some embodiments, one or more of the probes of a kit of the disclosure are labeled with a fluor and a quencher. In some embodiments, each probe of a kit is dually labeled with a different fluor and a different quencher. In some embodiments, each probe of a kit is dually labeled with a different fluor and the same quencher.
In a preferred embodiment, the reagents adequate for the determination of one or more of the CNVs as defined in the present invention comprise at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90% or at least 100% of the total amount of reagents adequate for the determination of the CNVs forming the kit.
Computer Systems and Devices suitable for carrying out the methods of the invention
Methods of the invention can be performed using software, hardware, firmware, hardwiring, or combinations of any of these.
Accordingly, in another aspect, the invention relates to a computer-implemented method, wherein the method is any of the diagnostic, differential diagnostic or prognostic methods according to the invention. In another aspect, the invention relates to a computer containing instructions for carrying out any of said methods.
The invention is described below by way of the following examples, which are merely illustrative and do not limit the scope of the invention.
EXAMPLES
Materials and Methods
Patient characteristics and clinical sample collection
Epidemiological, histopathological, and clinical outcome data were collected from the 96 women participating in the present study, who underwent hysterectomy or laparoscopic/laparotomic myomectomy as surgical treatment for symptomatic LM (n = 52) or suspected LMS (n = 44).
Patients diagnosed with LM had a median age of 44 years (range: 28-55 years), while patients with LMS had a median age of 53 years (range: 35-75 years), with significant differences only for higher BMI (p = 0.019, t = -2.4039, degrees of freedom = 66) and age of diagnosis (p = 2.976e-8, t = -6.0523, degrees of freedom = 93) in the LMS versus LM group.
Formalin-fixed paraffin-embedded (FFPE) tumor samples were collected from all patients and split into experimental and validation cohorts. The experimental cohort was used for general study of global DNA and RNAseq profiles (44 for LM, 34 for LMS), and the validation cohort included additional new samples (8 LM, 10 LMS) to perform targeted sequencing and model validation. Patients with other gynecological disorders, malignancies, or diagnosed bacterial, fungal, or viral infections were excluded.
Use of human tissue samples was previously approved by the IRB of the hospitals involved: Hospital La Fe, Valencia, Spain (July 24, 2019); Hospital Virgen de la Arrixaca, Murcia, Spain (December 17, 2019); Hospital Santa Lucia, Murcia, Spain (January 28, 2020); and Fundacion Instituto Valenciano de Oncologia (IVO), Valencia, Spain (December 2, 2020). All patients provided signed written informed consent.
Hospital la Fe contributed 52 LM samples and 13 LMS tumors; Hospital Santa Lucia provided 6 LMS; Hospital Virgen de la Arrixaca provided 12 LMS; and IVO provided 3 LMS. In addition, 13 LMS were obtained from Origene Technologies Inc. (Rockville, MD, USA). Before further processing, anonymized samples were evaluated by two expert pathologists who histologically confirmed a diagnosis of LM or LMS according to WHO criteria. This study was registered on ClinicalTrials.gov with ID NCT04214457, and data were monitored by a clinical research associate.
Nucleic acid isolation
DNA and RNA from 5-10-µm-thick FFPE tumor sections were isolated using GeneRead DNA FFPE kit (Qiagen GmbH, Hilden, Germany) and miRNeasy FFPE kit (Qiagen GmbH, Hilden, Germany) following manufacturer instructions. Quality control and quantification analysis were performed using Qubit 2.0 Fluorometer (Thermo Scientific, Waltham, MA, USA) and 2100 Bioanalyzer system (Agilent, Santa Clara, CA, USA). Additionally, we measured amplification potential of nucleic acids by assessing the ΔCq value through quantitative PCR, after normalization for a fixed input mass. Only DNA samples with ΔCq < 5 and 1 µg total DNA were included. Consequently, five out of 34 LMS cases in the experimental cohort were excluded for whole exome sequencing due to insufficient quantity and/or quality of DNA. RNA samples with 100 ng total RNA and DV200 >30% (percentage of RNA fragments >200 nucleotides in length) were selected for further experiments. Accordingly, five out of 34 LMS cases in the experimental cohort were excluded for RNAseq due to insufficient quantity and/or quality of RNA.
DNA library preparation, targeted exome sequencing, and mapping
DNA sequencing libraries from 44 LM and 34 LMS were constructed using the KAPA Hyper Prep kit (KK8514 and KK8515; Roche, Basel, Switzerland). Next, all exons were captured by a custom-designed SeqCap EZ MedExome kit (NimbleGen; Roche), which is targeted and enriched for exons and neighboring introns (within 50 bp) of 571 hematological-associated genes. Lastly, each quantified library was loaded in a
HiSeqXTen platform (Illumina, San Diego, CA, USA), and paired-end sequencing was performed according to manufacturer instructions.
DNA sequence data were demultiplexed and formatted in FASTQ files (FASTQ 1.9 files with Phred+33) containing at least 32 million DNA reads for each sample. Reads were aligned to the human hg19 genome (GRCh37) using the Burrows-Wheeler Alignment tool software (v0.7.17; https://github.com/lh3/bwa), with -M and -R options to mark shorter split alignments as secondary and add read group information, respectively.
SAM files were then converted to coordinate sorted BAM files using samtools v1.9, and Picard Tools v2.20.1 (https://github.com/broadinstitute/picard) was used to mark and remove duplicates. Interval reference files (design files) for the SeqCap EZ MedExome were also provided. Specifically, ±100 bases were added to the capture bed file for calling with a mean depth higher than 150X, with >80% of target regions covered. Coverage data were obtained using the bedtools software (https://github.com/arq5x/bedtools/) and normalized to counts per million (cpm) as follows:
CPM_x = (x / z) × 1,000,000, where x = coverage of a specific gene and z = sum of coverage for all genes.
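The normalization above can be sketched in a few lines. This is a minimal illustration that assumes the per-gene coverage values have already been extracted (for example, from bedtools output) into a dictionary; the gene names and coverage values are hypothetical.

```python
def counts_per_million(coverage_by_gene):
    """Normalize per-gene coverage to counts per million (CPM).

    CPM_x = x / z * 1,000,000, with x the coverage of one gene and
    z the summed coverage of all genes in the sample.
    """
    total = sum(coverage_by_gene.values())
    if total == 0:
        raise ValueError("total coverage is zero; CPM is undefined")
    return {gene: value / total * 1_000_000
            for gene, value in coverage_by_gene.items()}


# Hypothetical coverage values for one sample
sample = {"GENE_A": 250, "GENE_B": 750}
print(counts_per_million(sample))
```

Because every sample is scaled to the same total, CPM values can be compared between samples sequenced to different depths.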
Detection of SNVs and supervised analysis of mutational signatures
Small variants, including somatic SNVs and indels, were identified from LM and LMS tumors using Freebayes software (v1.3.2; https://github.com/freebayes/freebayes) with base quality of >3 and alternate fraction of >0.12. Variants with quality <20, variants in homopolymer regions, and missing values were then filtered out with BCFtools (v1.9; https://github.com/samtools/bcftools). Variants with >5% frequency in gnomAD as well as variants with a non-PASS filter in BRAVO browser (https://www.nhlbiwgs.org/) were removed from selection.
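The filtering criteria above can be summarized in the following sketch. It is not the Freebayes or BCFtools command line used in the study; it merely assumes that each variant has been parsed into a dictionary, and the field names (`qual`, `alt_fraction`, `in_homopolymer`, `gnomad_af`) and the example records are hypothetical. The BRAVO filter is omitted.

```python
def keep_variant(variant):
    """Return True if a variant passes the thresholds described in the text."""
    # Missing or low call quality (<20) is filtered out
    if variant["qual"] is None or variant["qual"] < 20:
        return False
    # The alternate allele fraction must exceed 0.12
    if variant["alt_fraction"] <= 0.12:
        return False
    # Variants located in homopolymer regions are discarded
    if variant["in_homopolymer"]:
        return False
    # Variants with a population frequency above 5% in gnomAD are discarded
    if variant["gnomad_af"] is not None and variant["gnomad_af"] > 0.05:
        return False
    return True


# Hypothetical variant records
variants = [
    {"id": "v1", "qual": 50, "alt_fraction": 0.30, "in_homopolymer": False, "gnomad_af": 0.001},
    {"id": "v2", "qual": 10, "alt_fraction": 0.30, "in_homopolymer": False, "gnomad_af": None},
    {"id": "v3", "qual": 60, "alt_fraction": 0.05, "in_homopolymer": False, "gnomad_af": None},
    {"id": "v4", "qual": 60, "alt_fraction": 0.40, "in_homopolymer": False, "gnomad_af": 0.20},
]
kept = [v["id"] for v in variants if keep_variant(v)]
print(kept)
```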
SnpEff (version 4.3t, https://github.com/pcingola/SnpEff) was used for variant annotation. Variants were visualized and analyzed using R (v3.6.1), tidyverse (v1.3; https://github.com/tidyverse/tidyverse), and vroom (v1.2; https://github.com/VROOM-Project/vroom).
Analysis of somatic mutational signatures inferred from SNVs was also performed using the R package MutationalPatterns (v1.12; https://github.com/UMCUGenetics/MutationalPatterns). The profile of each signature was displayed using the six substitution subtypes (C>A, C>G, C>T, T>A, T>C, T>G) and determined according to the 96-trinucleotide substitution model. Comparison with predefined signatures from the COSMIC database (https://cancer.sanger.ac.uk/cosmic/signatures_v2) was also performed with the MutationalPatterns R package from Bioconductor.
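For illustration, the six substitution subtypes and the 96-channel model mentioned above may be sketched as follows. This is a plain Python illustration of the convention (mutations are reported relative to the pyrimidine of the base pair, together with the flanking 5' and 3' bases); it is not the MutationalPatterns API, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def classify_snv(context, alt):
    """Map an SNV to (substitution subtype, trinucleotide context).

    context: 3-base reference sequence centered on the mutated base.
    alt:     alternate base observed at the central position.
    Purine references (A or G) are converted to the opposite strand so that
    the reference base is always C or T.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3:
        raise ValueError("context must be exactly 3 bases")
    ref = context[1]
    if ref == alt:
        raise ValueError("alt must differ from the reference base")
    if ref in "AG":
        context = reverse_complement(context)
        ref = context[1]
        alt = COMPLEMENT[alt]
    return f"{ref}>{alt}", context


def all_channels():
    """Enumerate the 96 (subtype, context) channels: 6 subtypes x 16 contexts."""
    channels = []
    for ref in "CT":
        for alt in "ACGT":
            if alt == ref:
                continue
            for five_prime in "ACGT":
                for three_prime in "ACGT":
                    channels.append((f"{ref}>{alt}", five_prime + ref + three_prime))
    return channels
```

For example, a G>A change in the context TGA is reported on the opposite strand as C>T in the context TCA.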
Copy number analysis
For CNV detection, we used the CNVkit Python library (v0.9.6; https://github.com/etal/cnvkit) with default parameters for tumor analysis. Specifically, sample read depths were normalized and individually compared with the reference, using the circular binary segmentation algorithm to infer copy number segments, which were then annotated to genes. Lastly, to evaluate whether CNV data could be used to differentiate LMS and LM, we performed unsupervised hierarchical clustering of log2 values based on Euclidean distance, using heatmap3 (https://github.com/slzhao/heatmap3).
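The clustering step above may be illustrated with a minimal agglomerative sketch in Python. The linkage criterion is not stated in the text; complete linkage is assumed here purely for illustration, and the sample names and log2 values are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length log2 profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hierarchical_clusters(profiles, n_clusters):
    """Agglomerative clustering with complete linkage (an assumption).

    profiles:   dict mapping sample name -> list of per-gene log2 ratios.
    n_clusters: number of clusters at which merging stops.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance between the farthest members.
                d = max(
                    euclidean(profiles[a], profiles[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(cluster) for cluster in clusters]


# Hypothetical log2 copy-number ratios for four samples and three genes.
example_profiles = {
    "S1": [0.0, 0.1, -0.1],
    "S2": [0.1, 0.0, 0.0],
    "S3": [1.0, -1.2, 0.9],
    "S4": [1.1, -1.0, 1.0],
}
groups = hierarchical_clusters(example_profiles, n_clusters=2)
```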
RNA library preparation, sequencing, and mapping
RNA sequencing libraries were prepared using TruSeq RNA Exome (Illumina), normalized to 10 nM, and combined into a single pool with a final concentration of ~2 pM. Paired-end sequencing (2x75 bp) was carried out on a NextSeq 500 instrument (Illumina).
RNAseq data were demultiplexed to generate FASTQ files containing at least 20 million RNA reads per sample. Reads were aligned to the human hg19 genome (GRCh37) using the STAR aligner (v2.7.0f). After quality filtering, we obtained an average of 47 million reads per sample that mapped uniquely to the human transcriptome. Finally, gene transcript abundance was estimated using the HTSeq Python package.
Differential expression analysis and validation by qRT-PCR
Differential expression analysis between LMS and LM samples was performed using the edgeR package from the Bioconductor repository, and results were plotted with ggplot2 and heatmap3. The trimmed mean of M-values (TMM) method was used to compute normalization factors, and tagwise dispersions were calculated and subjected to a quasi-likelihood F-test. Only genes with |logFC| > 2 and FDR < 0.05 were considered differentially expressed genes (DEGs).
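The DEG selection criteria above may be sketched as follows. The sketch does not reproduce the edgeR quasi-likelihood test; it only illustrates a false discovery rate adjustment, assuming the Benjamini-Hochberg procedure (the text does not name the method), and the thresholding on |logFC| and FDR. The AURKA and MRGPRF values are taken from Table 1; GENE_X and GENE_Y are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(rows, min_abs_logfc=2.0, max_fdr=0.05):
    """Keep genes with |logFC| > min_abs_logfc and FDR < max_fdr.

    rows: iterable of (gene, logFC, FDR) tuples.
    """
    return [gene for gene, logfc, fdr in rows
            if abs(logfc) > min_abs_logfc and fdr < max_fdr]


rows = [
    ("AURKA", 3.83181898, 7.5881e-23),   # from Table 1, upregulated
    ("MRGPRF", -2.1823516, 5.5632e-12),  # from Table 1, downregulated
    ("GENE_X", 1.2, 0.001),              # hypothetical: fold change too small
    ("GENE_Y", 3.0, 0.2),                # hypothetical: not significant
]
degs = select_degs(rows)
```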
Building the classification model
To assess whether RNAseq data could be used to differentiate LMS from LM, we built a classification model using the caret R package (v6.0-86; https://github.com/topepo/caret). For this purpose, our sample cohort was randomly split, while keeping class distributions balanced, into a training set (75% of samples) and a validation set (25% of samples).
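A class-balanced random split of this kind may be sketched as follows. This is a simplified Python illustration and not the caret implementation; the function name, the seed, and the sample identifiers are hypothetical, while the class sizes (44 LM and 34 LMS) follow the cohort described in this document.

```python
import random


def stratified_split(labels, train_fraction=0.75, seed=0):
    """Split samples into training and validation sets, class by class.

    labels: dict mapping sample name -> class label.
    Each class is shuffled and cut separately so that both sets keep
    approximately the same class proportions as the full cohort.
    """
    rng = random.Random(seed)
    by_class = {}
    for sample, label in labels.items():
        by_class.setdefault(label, []).append(sample)
    train, validation = [], []
    for label in sorted(by_class):
        members = sorted(by_class[label])
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        validation.extend(members[cut:])
    return train, validation


# Hypothetical sample identifiers for 44 LM and 34 LMS tumors.
cohort = {f"LM_{i}": "LM" for i in range(44)}
cohort.update({f"LMS_{i}": "LMS" for i in range(34)})
train_set, validation_set = stratified_split(cohort)
```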
To build and train the models, we started from the previously identified DEGs. Genes with variances close or equal to 0, or with very high correlation with one another, were removed. CPM values were pre-processed using caret, which standardized them by centering and scaling the data (mean = 0 and SD = 1, respectively) to minimize the influence of different scales and variances on the final model. We chose the AdaBoost algorithm75 and performed parameter tuning using a grid search with 10-fold stratified cross-validation. The optimal hyperparameters were selected based on the lowest value for logLoss (niter = 50, method = AdaBoost.M1).
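The centering and scaling step and the logLoss criterion mentioned above may be sketched as follows. This is an illustrative Python rendering of the standard definitions, not the caret code; the sample standard deviation (n - 1 denominator) and the clipping constant `eps` are assumptions made for the sketch.

```python
import math


def standardize(values):
    """Center to mean 0 and scale to SD 1 (sample SD, n - 1 denominator)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if sd == 0:
        raise ValueError("zero-variance features should be removed beforehand")
    return [(v - mean) / sd for v in values]


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary logarithmic loss.

    y_true: list of 0/1 labels (for example, 1 = LMS and 0 = LM).
    p_pred: predicted probability of class 1 for each sample.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)
```

A lower logLoss indicates that predicted probabilities lie closer to the true labels, which is why the hyperparameter combination with the smallest value is preferred.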
AdaBoost models were trained with 10-fold stratified cross-validation to obtain robust estimates of tumor classification capabilities, following a two-step approach. In the first step, five subsets of samples were created from the training data. From them, four subsets were used for fitting the model, and the other subset was used for feature pruning. Since model composition varied each time due to the probabilistic nature of classification models, fitting and feature pruning were repeated 10 times for each feature pruning subset, generating a total of 50 models.
In the second step, these models were combined to generate a single aggregate AdaBoost model, which had the best average performance across all pruning subsets and included a total of 20 relevant genes.
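The bookkeeping of the first step (five subsets, each used once for feature pruning while the other four are used for fitting, with 10 repetitions per pruning subset) may be sketched as follows. The sketch only enumerates the 50 fit/prune assignments; it does not train any model, and the simple interleaved partition used here is an assumption that ignores class stratification.

```python
def pruning_plan(samples, n_subsets=5, n_repeats=10):
    """Enumerate fit/prune assignments for the two-step training scheme.

    samples: list of training sample names.
    Returns one entry per model to be trained (n_subsets * n_repeats entries).
    """
    # Interleaved partition into n_subsets disjoint subsets (not stratified).
    subsets = [samples[i::n_subsets] for i in range(n_subsets)]
    plan = []
    for k, prune in enumerate(subsets):
        fit = [s for j, subset in enumerate(subsets) if j != k for s in subset]
        for repeat in range(n_repeats):
            plan.append({
                "pruning_subset": k,
                "repeat": repeat,
                "fit": fit,
                "prune": prune,
            })
    return plan


# Hypothetical training sample names, for illustration only.
training_samples = [f"sample_{i}" for i in range(59)]
plan = pruning_plan(training_samples)
```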
Model validation
Validation of the model was performed by re-sequencing all LM (n = 44) and LMS (n = 34; two samples filtered due to low sequencing quality) samples and adding a new set of 8 LM and 10 LMS samples. Briefly, RNA extracted from 96 FFPE tissue sections was normalized to 100 ng total and used to prepare libraries with a PCR/amplicon-based workflow (AmpliSeq Library Plus, Illumina) following the manufacturer's instructions. Data from the Illumina NextSeq 500 sequencer were demultiplexed and aligned to the CanTRAN hg19 reference genome using BWA mem. Coverage for each of the 20 target genes was calculated using bedtools and normalized based on total reads per sample.
Normalized coverage values were introduced in the caret R package, following the same procedure described above but using the gradient boosting algorithm. Once the model was built, the test set was used to evaluate it, to construct receiver operating characteristic (ROC) curves, and to determine the score threshold that maximized both sensitivity and specificity. Finally, the gradient boosting model was capable not only of predicting the tumor type of each sample in absolute terms (LMS or LM) but also of predicting the probability that each sample belongs to one group or the other. This probability reflects the confidence of the prediction for each class; probabilities <50% may indicate mixed classes and less confident predictions that need further study.
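The threshold selection described above may be illustrated with the following sketch. The text does not specify how sensitivity and specificity were jointly maximized; Youden's J statistic (sensitivity + specificity - 1) is assumed here for illustration, and the scores and labels are hypothetical.

```python
def best_threshold(scores, labels):
    """Find the score threshold that maximizes Youden's J statistic.

    scores: predicted probability of the positive class for each sample.
    labels: 1 for positive (for example, LMS) and 0 for negative (LM).
    A sample is called positive when its score is >= the threshold.
    Returns (J, threshold, sensitivity, specificity).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    best = None
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
        sensitivity = tp / positives
        specificity = tn / negatives
        j = sensitivity + specificity - 1
        if best is None or j > best[0]:
            best = (j, threshold, sensitivity, specificity)
    return best


# Hypothetical scores and labels, for illustration only.
example_scores = [0.1, 0.2, 0.35, 0.6, 0.8, 0.9]
example_labels = [0, 0, 0, 1, 1, 1]
result = best_threshold(example_scores, example_labels)
```

Each candidate threshold corresponds to one point on the ROC curve, so scanning all observed scores is equivalent to picking the point on the curve farthest above the diagonal.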
Statistical analyses
Statistical analyses were performed using R version 3.3.2 (http://www.R-project.org). Two-tailed Student’s t-tests were performed to compare quantitative clinical variables in LMS and LM patients and for gene validation using RT-qPCR. All survival analyses were performed by Cox regression using the survival R package (v3.2-3) and corrected for age at diagnosis and minimum stage grouping of the tumor sample. Comparison of survival curves between normal and aberrant CNVs was performed using the log-rank test. P < 0.05 was considered statistically significant.
Example 1
Differential transcriptomic characterization of LMS versus LM
The aim was to identify differential gene expression footprints by RNAseq analysis of 44 LM and 34 LMS tumors. Class comparison detected a total of 489 DEGs, of which 416 were significantly upregulated and 73 downregulated in LMS versus LM (FDR < 0.05 and |logFC| > 2) (Fig. 1A, Table 1).
gene logCPM logFC F PValue FDR AURKA 4,93364659 3,83181898 282,778906 1 .2352E-26 7,5881 E-23 KIF14 4,68656406 4,38134893 282,389479 1.2851 E-26 7,5881 E-23 POLQ 4,38266883 3,8365031 281 ,274779 1 ,44E-26 7,5881 E-23 BUB1 B 5,60286999 3,98963299 262,884356 9.9268E-26 3.9233E-22 CDCA2 3,68181779 4,5048897 257,479385 1 .7867E-25 5,2531 E-22 ARHGAP11A 5,35243506 3,63041514 256,480746 1.9937E-25 5,2531 E-22 SPAG5 4,70480234 2,99489374 251 ,868316 3.3228E-25 7.1927E-22 KIF4A 5,24916422 4,37732469 250,160803 4.022E-25 7.1927E-22 CENPF 6,71608484 4,23282265 250,00085 4.0948E-25 7.1927E-22 STIL 4,86882289 2,84984057 247,197138 5,6161 E-25 8.8784E-22 NUF2 3,72750605 4,23026857 245,493828 6.8136E-25 9.7924E-22 PRC1 4,95584806 3,32652305 244,455215 7.6699E-25 1.0104E-21 LMNB1 4,61553528 3,12218551 238,857474 1.4616E-24 1.7774E-21 CKAP2L 3,59601888 4,41780572 237,831442 1 .647E-24 1.8599E-21 DEPDC1 3,87136142 4,57983389 236,601668 1.9016E-24 2,0041 E-21 TPX2 5,39162882 4,18083739 235,982005 2.0448E-24 2.0204E-21 NCAPG 4,8434555 4,20120871 233,236073 2,8261 E-24 2.5908E-21 CCNB1 4,34791022 3,08545649 232,874265 2.9499E-24 2.5908E-21 PLK4 3,81170147 2,6765673 232,130159 3.2222E-24 2.63E-21 BLM 3,8038504 2,86547241 231 ,822622 3.3422E-24 2.63E-21 FANCI 6,88515532 2,30497475 231 ,450246 3.4936E-24 2.63E-21 PRR11 3,66664588 3,35383441 228,509261 4.9673E-24 3.5695E-21 CDCA8 3,37576782 3,88899161 226,144022 6.609E-24 4.5427E-21 KIF15 4,01903401 3,04401797 225,457045 7.1836E-24 4,7319E-21 CDK1 4,15006835 3,4972071 223,615619 8,991 E-24 5.6856E-21 E2F7 4,06947878 4,31857744 222,763846 9.9793E-24 6.0678E-21 ASPM 8,00399579 4,96300199 221 ,311441 1.193E-23 6,9851 E-21 ECT2 5,38784159 2,6710641 219,521158 1 .4885E-23 8.1304E-21 KIF11 4,78112738 3,15895243 219,50503 1.4914E-23 8.1304E-21 RACGAP1 5,15752696 2,48201481 218,374236 1.7164E-23 9.0448E-21 BUB1 4,35242941 3,48071501 217,348907 1.9505E-23 9.9468E-21 CENPE 6,28365788 3,6774226 216,260274 2.2352E-23 1.1043E-20 RAD51 3,10264774 
2,44767148 215,151041 2.5694E-23 1 .2309E-20 CHAF1A 4,72160945 2,02826997 214,26242 2.874E-23 1 .3067E-20 TTK 4,18799587 4,14241496 214,210485 2.893E-23 1 .3067E-20 NUSAP1 4,56401763 3,5514457 213,906487 3.0063E-23 1 .3202E-20 ATAD2 6,69627026 2,6684476 210,387187 4,7041 E-23 2.0099E-20 EXO1 3,9669807 3,67556733 210,080395 4.8926E-23 2.0355E-20 KIFC1 3,97345717 3,18780256 205,790424 8.5145E-23 3,4514E-20 KIF23 5,28073975 2,99116325 203,60914 1.1323E-22 4.3687E-20 KIF18A 3,96096562 3,81916947 203,575417 1.1373E-22 4.3687E-20 CDCA5 1 ,25551938 3,60409058 203,420538 1.1606E-22 4.3687E-20 SGO1 2,65124747 4,3774395 202,720611 1 .2725E-22 4.6578E-20 NCAPH 3,09703167 3,59773829 202,545411 1 .3022E-22 4.6578E-20
gene logCPM logFC F PValue FDR MELK 4,23814372 4,05535972 202,408748 1 .3258E-22 4.6578E-20 SGO2 3,62641028 2,63844199 200,538938 1 .6977E-22 5.8344E-20 GAS2L3 4,4109884 3,03077494 199,129534 2.0477E-22 6.8878E-20 CCNA2 4,36957391 2,72197329 198,719926 2.1628E-22 7.1233E-20 ESCO2 2,81397145 3,74183092 196,419098 2.9448E-22 9,501 E-20 EZH2 5,24872267 2,49778232 195,582403 3.2968E-22 1.0424E-19 CDKN3 3,17319293 3,70392949 194,415436 3,8613E-22 1.1969E-19 ORC1 3,88322543 2,8814015 194,148117 4,0041 E-22 1.2173E-19 ARHGAP11 B 4,04633037 2,95298311 192,724921 4.8609E-22 1.4499E-19 HMGB2 6,13610871 2,7002294 192,501135 5.0119E-22 1.4673E-19 PLK1 3,4115553 2,91493254 191 ,294091 5.9135E-22 1.6998E-19 TOP2A 7,63535272 4,23926016 190,974261 6.1793E-22 1.7444E-19 TACC3 3,36485439 2,52455601 190,18469 6.8892E-22 1.9107E-19 KNL1 4,89662839 3,96839483 189,710078 7.3558E-22 2.005E-19 CEP55 3,2748227 3,98491982 188,827872 8.3113E-22 2.227E-19 CLSPN 5,27911955 3,79308768 187,545939 9.9327E-22 2.6171 E-19 ANLN 6,27297536 4,30126781 186,69572 1 ,1184E-21 2.8986E-19 NEIL3 3,23745281 4,29838168 186,029836 1.2277E-21 3.0808E-19 UBE2C 3,34543525 4,01705262 184,586716 1.5039E-21 3.7148E-19 FAM1 11 B 3,33203588 3,98524403 183,751822 1.692E-21 4.1153E-19 NEK2 3,13811874 4,46166226 182,95651 1.8938E-21 4.5363E-19 RAD51AP1 2,89789446 3,07210207 180,510015 2.6843E-21 6.3338E-19 HMMR 3,13297729 3,98813048 178,660968 3.502E-21 8.1416E-19 WDR62 4,67293308 2,99812529 178,541708 3.5628E-21 8.163E-19 BIRC5 2,26235468 4,42744705 177,58046 4.0945E-21 9.247E-19 TICRR 4,07049943 3,91768559 177,03024 4.4348E-21 9.8747E-19 ESPL1 3,44171869 3,54292661 175,996973 5.1548E-21 1.1318E-18 ATAD5 4,75340648 2,14008182 175,271771 5,731 E-21 1 ,2411 E-18 CENPK 3,90315764 3,16638173 174,35784 6.5527E-21 1.3827E-18 CENPU 4,23917666 2,99339633 174,350732 6.5596E-21 1.3827E-18 RAD54L 4,64586357 3,01037616 173,589161 7.3373E-21 1.5228E-18 MAD2L1 2,3917787 2,84277515 173,515717 7,4171 E-21 1.5228E-18 ORC6 3,69455043 
2,44605255 172,612191 8.4757E-21 1.7129E-18 TROAP 3,1 1496751 4,0113156 172,545512 8.5597E-21 1.7129E-18 BRIP1 5,83932577 3,23696087 170,258057 1 .2027E-20 2.3767E-18 CDC6 3,91417892 2,61421065 169,488337 1 .3495E-20 2.6339E-18 CKAP2 5,32056838 2,1741882 166,498668 2.1183E-20 4.084E-18 PTTG1 4,40076397 3,50252624 166,004629 2.2834E-20 4.2975E-18 CDC20 3,32368091 4,13293613 165,716873 2.3856E-20 4.437E-18 FANCD2 6,26534918 2,28570509 165,250888 2,5612E-20 4.7082E-18 DLGAP5 4,56620376 4,64106628 164,672867 2.7977E-20 5.0837E-18 FAM72D 2,00092069 2,87778582 164,388983 2,9219E-20 5,2491 E-18 HJURP 3,58979089 4,06704791 164,00268 3.1001 E-20 5.5067E-18 GTSE1 3,12175176 3,39860171 161 ,59926 4.4905E-20 7.8012E-18 CCNE2 3,32570606 3,29461636 161 ,460997 4.5878E-20 7.8836E-18
gene logCPM logFC F PValue FDR SKA3 2,55665243 3,66880615 160,811084 5.075E-20 8.6269E-18 ERCC6L 2,8325179 3,57342106 160,661157 5.1947E-20 8.7365E-18 POC1A 1 ,95939657 2,30164723 156,796878 9.526E-20 1.5837E-17 CIT 5,09500089 3,13333431 156,736937 9.6168E-20 1.5837E-17 MTFR2 1 ,56806724 3,1760953 154,510497 1.3704E-19 2.2107E-17 MCM10 2,7974069 3,49219879 152,222385 1.9793E-19 3,1291 E-17 HASPIN 1 ,39651205 4,12501209 151 ,875408 2.0935E-19 3.2769E-17 CENPH 2,32247177 2,13761335 151 ,759694 2.1331 E-19 3,3061 E-17 WDR76 4,60285053 2,22721792 150,562638 2.5905E-19 3.976E-17 HROB 2,06684911 2,15097671 150,38939 2.6646E-19 4.0504E-17 CDKN2A 3,99172888 4,95142988 150,135263 2.7772E-19 4.1814E-17 FAM72A 2,33858362 2,76480304 150,077162 2.8037E-19 4.1814E-17 CCNE1 2,1 1925382 2,63338387 148,763641 3.4757E-19 5.1352E-17 E2F8 2,60501084 3,96256132 148,001365 3.9395E-19 5.7667E-17 CHEK1 4,14324009 2,25575633 147,05072 4.6086E-19 6,6841 E-17 CCNF 2,58013699 2,15274803 145,695757 5,7701 E-19 8.2926E-17 PCLAF 2,38125965 3,8689723 144,312091 7.2693E-19 1 ,0261 E-16 MKI67 8,48714352 3,90209572 144,2311 7.3686E-19 1.0309E-16 NDC80 4,16184843 2,40742195 140,86765 1.3003E-18 1.7722E-16 H1-5 4,06335011 4,0936262 139,730458 1.5789E-18 2.1153E-16 FOXM1 4,20234969 3,88604656 139,562107 1.625E-18 2.1589E-16 CDC7 4,22593483 2,23000661 139,436679 1.6603E-18 2.1874E-16 UHRF1 2,81794035 3,55873958 139,3378 1.6887E-18 2.2064E-16 FAM83D 3,57435484 3,6072526 138,050481 2.1072E-18 2.7306E-16 XRCC2 1 ,30865433 2,34407295 137,703278 2.2374E-18 2.8757E-16 AURKB 2,07620336 4,25416214 137,476338 2.327E-18 2.9667E-16 IQGAP3 5,21130719 3,68035751 137,242762 2.423E-18 3.0644E-16 KIF20A 4,10713453 3,70121618 136,726589 2.6499E-18 3.3182E-16 MND1 1 ,96258828 2,82187641 136,692409 2.6656E-18 3.3182E-16 FAM72B 1 ,97554546 2,71951229 135,822336 3.1014E-18 3.8305E-16 MYBL2 3,28645448 4,08975665 135,686662 3.1757E-18 3.8619E-16 DTL 3,86741971 2,90431888 135,110481 3,5121 E-18 4.2074E-16 CENPN 3,169166 
2,14076443 135,06585 3.5396E-18 4.2074E-16 CCNB2 3,02000971 3,57831356 134,838172 3.6836E-18 4.3458E-16 CDC45 2,91372083 3,41871233 132,236217 5.8284E-18 6.8253E-16 KIF18B 2,73869123 4,08645612 131 ,608138 6.5169E-18 7,5201 E-16 SPC25 1 ,51128231 3,43621775 130,364925 8.137E-18 9.1884E-16 CENPA 0,93013219 3,92982345 130,053515 8.6042E-18 9.5792E-16 FBXO43 1 ,24866447 3,59846283 126,883424 1.5266E-17 1.6877E-15 OIP5 0,94319402 2,87966075 126,805759 1.5483E-17 1.6998E-15 RAD54B 0,81729617 2,20951722 126,43569 1.6566E-17 1 ,8061 E-15 SPC24 1 ,67461576 2,87583266 126,02717 1 ,7851 E-17 1.9198E-15 PBK 2,96424718 4,01344338 125,673172 1.9048E-17 2,021 E-15 ITPRIPL1 1 ,20466146 2,84502087 125,365907 2.0153E-17 2.124E-15 FANCA 6,02012706 2,07315707 124,683628 2.2849E-17 2.3922E-15
gene logCPM logFC F PValue FDR KIF24 3,13220078 2,12122986 123,966211 2.6087E-17 2.7133E-15 CDC25A 1 ,7501106 2,54927895 119,809901 5.677E-17 5.7902E-15 KIF20B 5,21674852 2,04191277 118,989153 6.6325E-17 6.6786E-15 PIF1 1 ,98878754 2,74021364 118,787367 6.8918E-17 6.8958E-15 RRM2 3,85056455 3,49576561 118,721812 6.9783E-17 6.9384E-15 KIF2C 3,97263142 3,37865825 118,144051 7.7903E-17 7.6973E-15 H2AC4 2,09492695 2,5994681 117,367356 9.0374E-17 8,8741 E-15 CDC25C 1 ,9933595 3,73759772 116,935482 9.818E-17 9.5223E-15 SHCBP1 3,57941735 2,83392494 114,658221 1.5245E-16 1.4606E-14 BRCA2 5,16733974 2,0447144 114,268518 1.6446E-16 1.5662E-14 SMC1 B 1 ,94314852 4,37359577 111 ,880715 2.6267E-16 2.4718E-14 DIAPH3 5,24374513 3,22022276 110,405626 3.5187E-16 3.2342E-14 STRIP2 3,88537827 2,79135057 109,711687 4.0409E-16 3.6927E-14 RIBC2 1 ,57393776 3,58386574 109,639235 4.0999E-16 3.725E-14 UBE2T 3,49809687 2,07218864 108,770843 4.879E-16 4.3825E-14 DEPDC1 B 2,17906718 2,83688019 107,824821 5.9028E-16 5.2133E-14 RELT 1 ,87008913 2,09829205 107,354043 6.4923E-16 5.6705E-14 NUP210 4,44242724 3,08962591 107,317321 6.5407E-16 5.6814E-14 SMC4 7,87911703 2,26633814 106,812583 7.2456E-16 6.2593E-14 RECQL4 2,76079816 2,57888389 105,078428 1.0322E-15 8.7732E-14 E2F1 2,0719686 2,48674271 103,496256 1.43E-15 1.2025E-13 GINS2 1 ,72965288 2,15371938 102,534554 1.7459E-15 1.4603E-13 S100A13 2,35613376 2,03163966 101 ,033092 2.3896E-15 1.9778E-13 TCF19 1 ,56625516 2,29688762 100,51139 2.6666E-15 2.1957E-13 RRAGD 3,35024655 2,63070377 100,41966 2.7187E-15 2.2269E-13 GINS1 2,63689041 2,17558635 99,149907 3.5562E-15 2.8684E-13 CCDC34 3,71847429 2,19674799 99,0670346 3.6194E-15 2.8898E-13 HK2 5,07918857 2,41454724 96,7069495 5.9964E-15 4.6698E-13 PKMYT1 0,94247349 3,20311277 95,1424796 8.4131 E-15 6.3943E-13 ZWINT 2,38279053 2,48778549 92,6931884 1.4389E-14 1.063E-12 CENPM 0,72374851 2,70028592 92,1719201 1.6147E-14 1.1818E-12 H2BC17 4,21439618 2,76743789 91 ,3733286 1.928E-14 1.3981 E-12 LAMB3 
3,3728309 3,28951628 90,6509373 2,2651 E-14 1.6203E-12 CCDC150 2,74430193 2,08700962 90,2739131 2.4646E-14 1.7394E-12 PLA2G7 3,68473802 3,04365389 89,9305025 2.662E-14 1.8704E-12 MYBL1 3,8548538 2,26467558 87,1286932 5.0229E-14 3.408E-12 MRGPRF 4,62644755 -2,1823516 84,8904406 8.4104E-14 5.5632E-12 TRIM59 3,5222197 2,4889036 84,5221715 9.1614E-14 6.0347E-12 EME1 1 ,72760874 2,24072271 84,055799 1.0212E-13 6.6167E-12 DHRS13 1 ,27838759 2,1958513 83,2548084 1.2316E-13 7.9149E-12 CKS2 3,15276522 2,0501612 81 ,6927827 1.7797E-13 1.099E-11 H3C11 3,16980727 3,27173932 81 ,4941251 1.8655E-13 1 .1476E-11 BICDL1 1 ,26797058 3,02910257 81 ,4325996 1.893E-13 1 .1599E-11 ATF5 2,69782905 2,01149037 80,3510843 2,4491 E-13 1.4778E-11 BCL2A1 0,8767524 3,40893815 79,8141878 2,7851 E-13 1 ,6741 E-11
gene logCPM logFC F PValue FDR IGF2BP3 3,30697562 5,13346462 76,7074855 5.9149E-13 3.3516E-11 MOCOS 3,40063795 2,56937695 76,2872881 6.5573E-13 3,6891 E-11 TK1 1 ,33231093 2,66052755 76,2518667 6.6146E-13 3.7082E-11 CLGN 3,54832734 4,50419003 75,9380832 7.1456E-13 3.9637E-11 H2AC16 4,00048708 2,71987895 75,3132745 8.3372E-13 4.5924E-11 P3H2 6,64830413 -2,4650362 74,487392 1.0233E-12 5.5785E-11 MYH15 1 ,93463094 3,32783005 74,2094708 1.0967E-12 5.9374E-11 SOX9 2,83749537 4,05865903 73,8780051 1.1913E-12 6.4275E-11 ITGA9 8,18121935 -2,4921256 72,7730919 1.5717E-12 8.3662E-11 BMPR1 B 3,4040097 3,93554334 72,6432209 1.624E-12 8.6155E-11 GALNT18 3,73141887 -2,16077 72,1211245 1.8529E-12 9.7642E-11 TINAGL1 4,7823776 -2,4646532 71 ,8012337 2.0093E-12 1.0449E-10 ALPK2 3,840956 4,71936414 71 ,1699525 2.359E-12 1.1953E-10 H2BC3 4,05217918 2,0014519 70,6878896 2.6678E-12 1 ,3431 E-10 CCND2 5,56070675 -2,8458191 70,4388791 2.8433E-12 1.4224E-10 COL4A5 9,85037191 -2,650044 68,6107368 4.5549E-12 2.1954E-10 H3C7 4,30255155 2,78755841 68,2725025 4.9733E-12 2.3753E-10 CDCA3 1 ,74221923 2,64591636 67,7295753 5.7294E-12 2.6557E-10 PSAT1 3,83091914 2,95130993 67,719079 5,7451 E-12 2.6557E-10 PAQR4 0,87907997 2,26820151 67,5887421 5.9442E-12 2.7317E-10 ZNF367 1 ,70879564 2,07174486 67,428411 6.1988E-12 2.8405E-10 SLC7A5 4,01014094 2,362878 65,8183661 9.4718E-12 4.2905E-10 RBM20 3,36058715 2,75501503 65,8013188 9.5147E-12 4.2976E-10 H4C13 I ,26606057 3,31779625 64,8905931 1 ,2121 E-11 5.3377E-10 PAX3 2,40657851 6,46045769 66,9045266 1.4063E-11 6.1587E-10 FBLN5 8,02603135 -2,1 157451 64,078942 1 ,5061 E-11 6.5774E-10 CDT1 1 ,8938762 2,80744538 63,6397311 1.6949E-11 7.3814E-10 PPM1 E 2,54738715 2,97425694 63,3519872 1 .8316E-11 7.8684E-10 H3C2 4,4935086 2,82771281 63,089398 1.9662E-11 8.4239E-10 KCNMA1 I I ,0290628 -2,3866539 63,0591905 1.9824E-11 8.47E-10 APOC1 2,94868088 2,6923272 62,8228405 2,1134E-11 8.8856E-10 HAS2 3,70308973 3,36713788 61 ,7809295 2.8059E-11 1.1582E-09 IL1 B 
2,66587573 3,29942832 61 ,1970346 3.2923E-11 1.338E-09 FHAD1 3,64104189 3,02591502 60,0976964 4.4572E-11 1.7704E-09 APOBEC3B 2,44175074 2,07375917 59,728221 4.9378E-11 1.9418E-09 H3C3 3,72008828 2,81862307 59,6393989 5,0611 E-11 1.9854E-09 SHISAL1 3,33458888 -2,224336 59,5858112 5.137E-11 2.0052E-09 PKIA 1 ,12854905 3,63641122 59,0383997 5.9828E-11 2.3182E-09 ACTN2 4,68980375 3,58459311 58,9117112 6.1982E-11 2.3958E-09 DNAH17 2,23876371 3,00033014 58,8969758 6.2238E-11 2.3963E-09 CFAP20DC 3,22814992 2,2620533 58,7883118 6.4157E-11 2,4618E-09 C2 3,42780082 2,15156412 58,6404881 6.6866E-11 2,5451 E-09 MTTP 2,60478004 4,15399719 58,229577 7.5032E-11 2.7714E-09 KBTBD8 0,72457419 2,14881257 58,1782748 7,6121 E-11 2,8051 E-09 SH2D4A 1 ,77872244 2,46228983 58,1691111 7,6317E-11 2.8058E-09
gene logCPM logFC F PValue FDR ANK1 3,63656386 3,08388944 58,1069095 7.7663E-11 2,8421 E-09 DGAT2 0,31009312 2,637336 57,8660905 8.3108E-11 3.0343E-09 HTRA3 5,22127761 -2,3114666 57,4348923 9.386E-11 3.4033E-09 CA9 2,37515017 3,6946718 56,1356018 1.3577E-10 4,791 E-09 PAX7 1 ,02684412 7,66845558 65,8415962 1.4889E-10 5.2306E-09 H2BC9 4,90621657 2,06775825 55,6388768 1.5651 E-10 5,4619E-09 PBX4 0,40822722 2,1 1475847 55,6140622 1.5763E-10 5.4888E-09 ZNF695 1 ,4569316 2,04803035 55,5021152 1.6277E-10 5.6307E-09 H3C1 3,32920978 2,16420882 55,4868987 1.6348E-10 5.643E-09 ZCCHC12 6,34555585 -3,8390577 55,1469154 1.8027E-10 6.1157E-09 CA2 2,83336545 2,17905772 55,0013421 1.8799E-10 6.3368E-09 ARNTL2 3,97885542 2,2044332 54,8407713 1.969E-10 6.595E-09 H2BC14 4,03605597 2,01682095 54,7512412 2.0206E-10 6.7534E-09 SCN8A 4,6193883 2,74378098 54,7149337 2,0419E-10 6,8001 E-09 ANKRD18A 4,32012609 5,07894447 54,5581352 2.1366E-10 7,0517E-09 RAMP1 4,4079508 -2,4090922 54,4656733 2.1945E-10 7,1681 E-09 H2AX 2,56535887 2,59287656 53,3605991 3.0264E-10 9.5498E-09 TPD52 1 ,94318986 2,31118717 52,3886487 4.0249E-10 1.219E-08 H2AC11 5,48796625 2,26555895 52,3711872 4.0457E-10 1 .2206E-08 RAB39A 0,3213848 2,01239168 52,1805874 4.2795E-10 1.2813E-08 ADORA2B 1 ,07087122 2,62797892 51 ,8884338 4.6653E-10 1.3837E-08 ADGRV1 3,80823047 3,4485709 51 ,5747603 5.1195E-10 1.4988E-08 TCHH 3,74057192 3,31379035 51 ,3739614 5.4339E-10 1.585E-08 FMO3 4,72152786 4,63074069 50,948109 6.1681 E-10 1.7794E-08 DPF1 0,99297465 2,10056862 50,8107385 6,4261 E-10 1 .8404E-08 NLRP14 1 ,67520088 2,92356486 50,4499368 7.158E-10 2.028E-08 KCP 2,61315125 3,32007898 50,032508 8.1128E-10 2.2422E-08 CHRDL1 8,58701928 -2,6729544 50,0252548 8.1305E-10 2.2432E-08 CELSR3 3,80202776 2,35158655 49,7940327 8.7161 E-10 2.3757E-08 ESRRB 1 ,31373421 3,17177887 49,2934399 1.0137E-09 2.7117E-08 PRRG4 0,72291909 2,77702362 48,7280639 1 .2032E-09 3.1389E-08 PALM3 2,81150084 3,09462682 48,5552516 1 .2682E-09 3.2652E-08 EYA2 
3,85393364 2,86798141 48,5297666 1.278E-09 3.2746E-08 NRN1 2,02038832 2,24571617 48,4925353 1 .2926E-09 3.3012E-08 ABCB5 5,17307443 4,85565273 47,9616162 1.52E-09 3.8142E-08 ADM 1 ,40916974 2,00434476 47,9435035 1 .5284E-09 3.8293E-08 NEURL1 3,05342193 -2,6453287 47,8646471 1.5658E-09 3,8981 E-08 APOBR 0,49137938 2,45058373 47,7880823 1 .6029E-09 3.9843E-08 DSPP 3,09532335 2,81301896 47,7686751 1.6125E-09 4,0018E-08 H3C13 2,16026596 2,3387066 47,7546673 1.6194E-09 4.0127E-08 CCNB3 1 ,23611512 2,17528264 47,5688363 1.7144E-09 4,2019E-08 ACTA1 5,21422272 5,81674582 47,544301 1.7273E-09 4,2141 E-08 CXCL10 1 ,86857171 3,16277307 47,3399149 1.8393E-09 4.4393E-08 OASL 1 ,24501746 3,02050933 46,9539207 2.0716E-09 4.9298E-08 PLP1 6,9621857 -3,6062129 46,5908809 2.3176E-09 5.396E-08
gene logCPM logFC F PValue FDR FAT2 3,32396125 3,7245329 46,5752366 2.3288E-09 5.4142E-08 RAD51 B 7,27190633 -2,2462077 46,1561253 2.6522E-09 6.0242E-08 CDKN2C 4,45298201 2,57122013 45,9540172 2.8243E-09 6.3243E-08 PITX1 3,72669214 3,6010233 45,7335887 3.0252E-09 6.717E-08 CDKN2B 0,99269155 2,52260777 45,4462394 3.3093E-09 7.226E-08 WWC1 1 ,95337961 3,50214372 45,3583185 3.4016E-09 7,4071 E-08 SLC11A1 1 ,86846148 2,62471282 45,2570158 3.5112E-09 7.6144E-08 PRELP 6,23358101 -2,4766702 44,9049672 3,9213E-09 8,4001 E-08 MEI1 1 ,78296803 2,72268236 44,736713 4.1345E-09 8.7852E-08 ARL4C 2,39193445 2,40835956 44,6474561 4.2523E-09 8.9276E-08 MYLK2 1 ,06377615 2,68811367 44,5730379 4.3532E-09 9.1152E-08 RASGEF1A 0,08894467 2,21325199 44,3285336 4.7024E-09 9.7816E-08 IGFN1 9,85155391 3,5632604 44,0616874 5.1166E-09 1 ,0601 E-07 H1-1 2,10275949 2,32674485 43,8841224 5.4128E-09 1.1128E-07 GALNT6 1 ,27186849 2,19149003 43,3048212 6.5079E-09 1.3123E-07 PDE8B 8,65499977 -2,3859672 43,1039105 6.9389E-09 1.3784E-07 BDKRB1 0,99770521 2,62921291 43,0414723 7.0787E-09 1 ,4041 E-07 TMEM108 2,49813858 2,25460349 43,0043707 7.1632E-09 1 ,4191 E-07 CSRP2 4,55116768 2,13249853 42,9022557 7,4011 E-09 1 .4607E-07 TRPA1 4,16281109 2,79925511 42,6675987 7.979E-09 1.567E-07 H3C12 1 ,03657134 2,58119256 41 ,8948652 1 .0232E-08 1.9373E-07 ALPK3 3,76510875 2,05325488 41 ,7617844 1 .0682E-08 2.008E-07 GBP5 3,64441436 2,55355643 41 ,6518203 1.1069E-08 2.066E-07 LIPG 2,43397707 2,93028665 41 ,5206772 1.155E-08 2,1481 E-07 STU 2,1 1961584 2,23968473 41 ,1995461 1.2819E-08 2.3374E-07 PID1 3,27039426 -2,0020867 41 ,1210053 1 ,3151 E-08 2.3896E-07 DNAH11 4,27397833 3,21899005 41 ,0834093 1.3313E-08 2.4107E-07 MYOC 4,89678319 3,80156862 40,8126979 1 ,4541 E-08 2.5916E-07 VEPH1 3,19093198 3,69217007 40,5564922 1 ,581 E-08 2.7989E-07 CYP2S1 1 ,15400502 2,1 1379359 40,4326169 1 .6464E-08 2.8888E-07 IGFBPL1 0,95978059 2,89774941 40,3588418 1 .6867E-08 2.9464E-07 HBA1 3,04674272 3,62178529 40,3403409 
1.697E-08 2,9611 E-07 TTC39A 0,91622823 2,36908063 40,2556679 1 .7448E-08 3.0378E-07 CDCP1 2,86265274 2,34643203 40,1439825 1.8099E-08 3.1408E-07 CD247 2,49351542 2,09483532 40,0342896 1.8763E-08 3,2431 E-07 AGBL1 3,7661208 3,17675563 40,0331287 1.877E-08 3,2431 E-07 NETO2 3,82127861 2,07850007 40,024458 1 .8824E-08 3.2488E-07 DPP6 6,58230972 -2,6094636 39,9550347 1.9259E-08 3.313E-07 RBFOX3 2,78649746 -2,9328062 39,9029465 1.9592E-08 3.352E-07 GREB1 7,96845601 -2,6273867 39,8635656 1.9847E-08 3,3811 E-07 MAP3K9 1 ,62998656 2,37498712 39,6597645 2.1226E-08 3.5585E-07 EYA1 5,54626573 3,19893672 39,4114916 2.304E-08 3,8261 E-07 HHIP 3,61749377 3,87149846 38,7792882 2.8416E-08 4,6217E-07 ETV7 0,59391061 2,15134779 38,4784997 3.1412E-08 5.0672E-07
CACNA2D4 1,38570222 2,29771833 38,462244 3.1582E-08 5.0792E-07
gene logCPM logFC F PValue FDR PRG4 5,07221266 3,26474359 38,4183049 3.2049E-08 5.1438E-07 GRID1 4,13715609 -2,2508261 38,359603 3.2684E-08 5.2297E-07 SLC12A8 3,1209774 2,60296785 37,8615712 3,8615E-08 6.0864E-07 H2AC14 3,83298494 2,6528932 37,8240144 3.9105E-08 6.1514E-07 RNF157 2,31466726 2,22903524 37,792521 3,9521 E-08 6.1983E-07 PLS1 0,63437032 2,10075085 37,7215666 4.0474E-08 6.3165E-07 PROM1 2,96715689 3,10977333 37,5739897 4.2534E-08 6.5795E-07 PGR 8,57969984 -3,1423068 37,507835 4.3492E-08 6,7211 E-07 ST6GALNAC5 2,44902742 2,75369582 37,2926594 4.6765E-08 7.1918E-07 GZMB 0,44349986 2,53557483 37,1193857 4.9584E-08 7.5884E-07 MFAP5 7,65760802 -2,527177 37,0639994 5.0522E-08 7.7169E-07 PLPP2 2,70835116 -2,2743781 36,8847122 5.3684E-08 8.1448E-07 ADRA2C 2,66915776 -3,1 125241 36,7102139 5.6957E-08 8.6151 E-07 HTR2B 6,26459691 -2,2765778 36,5863699 5.9404E-08 8.927E-07 EXTL1 2,37398675 -2,3384075 36,5760097 5.9614E-08 8,9415E-07 DMRT2 1 ,81161719 3,22455171 36,2474959 6.6669E-08 9.7866E-07 MMP11 5,43211579 -2,8795828 36,2473612 6.6672E-08 9.7866E-07 ANO1 7,49339286 -2,1405461 36,1319634 6.935E-08 1.0151 E-06 AVPR1A 4,19261712 -2,0996211 35,9706085 7,3281 E-08 1 .0658E-06 RFX8 1 ,9587437 2,81256694 35,9200866 7.4558E-08 1 .0824E-06 CFAP61 2,87590571 3,1743672 35,846304 7.6465E-08 1.106E-06 MMP1 2,59395481 4,1490206 35,7356662 7,9418E-08 1.1414E-06 PKHD1 3,04934968 3,26651997 35,2418928 9.4104E-08 1 .3096E-06 BANK1 1 ,13325925 2,25459456 35,1859505 9.5935E-08 1 .3327E-06 ADAMTSL5 1 ,71444542 -2,8612571 35,1769003 9.6235E-08 1.3357E-06 HOOK1 1 ,24344893 2,82106553 35,1524431 9.705E-08 1 .3458E-06 C2CD6 3,84472846 2,15968494 34,8910071 1 .0622E-07 1.4601 E-06 C8orf34 1 ,83455781 3,06351894 34,880102 1 .0662E-07 1 .4644E-06 COL9A1 3,12028001 3,68445313 34,8019664 1.0954E-07 1 .4993E-06 PRDM6 2,31286785 2,1 1014692 34,7327302 1.1219E-07 1.5317E-06 TRIB3 0,61559029 2,27168331 34,7050266 1.1328E-07 1 .5438E-06 CYP24A1 2,45103875 3,95706163 34,6082534 1.1714E-07 1 
.5882E-06 NEB 8,0542311 3,10640458 34,5166798 1 .2092E-07 1 .6282E-06 CNN1 8,30395078 -2,0140962 34,4324014 1.2451 E-07 1 .6667E-06 TMEM156 0,5017417 2,31910786 34,4049122 1.257E-07 1 .677E-06 ABCA13 2,36334718 2,84414747 34,0759515 1.4095E-07 1.8631 E-06 CXCL9 1 ,99343326 3,23451164 34,0577936 1.4184E-07 1.8718E-06 BMP8A 1 ,98883052 -2,286958 34,0309125 1.4318E-07 1 .8862E-06 FOSL1 1 ,20391061 2,44563006 33,5129461 1.716E-07 2.1984E-06 BAALC 2,10190228 2,46961718 33,4603974 1.7479E-07 2.2303E-06 GRIN2B 2,23466144 3,1341458 33,3676589 1.8057E-07 2.2985E-06 STEAP1 B 1 ,21210258 2,43365012 33,3199025 1.8363E-07 2,3317E-06 SALL4 3,57135946 2,60688673 33,073299 2.0027E-07 2.5227E-06 TBX18 3,18506816 2,45971029 32,8601909 2.1589E-07 2.6938E-06 TMPRSS2 -0,7678112 3,88405616 32,7700725 2.2287E-07 2.7735E-06
gene logCPM logFC F PValue FDR H2AC12 3,06871395 2,15299833 32,7592538 2.2372E-07 2.7783E-06 COL4A6 7,13497864 -2,2397277 32,2993616 2.6329E-07 3.1969E-06 STK31 0,95282291 2,38379842 32,2295675 2.699E-07 3,2713E-06 AGTR2 4,18635599 -4,0816173 31 ,9537071 2.9773E-07 3.5658E-06 DNAJC6 2,02802235 2,44784247 31 ,9392878 2.9926E-07 3.5774E-06 ASIC2 1 ,09675293 -2,9259661 31 ,936585 2.9955E-07 3.5774E-06 PWWP3B 4,50787934 -2,1638732 31 ,7695684 3.1794E-07 3.7685E-06 CHI3L1 5,21033578 3,0183314 31 ,765615 3.1838E-07 3.7703E-06 WNK4 3,49547986 2,02673987 31 ,6763936 3.2869E-07 3.8749E-06 GPC3 3,93059586 2,1810483 31 ,4993214 3,5018E-07 4.0947E-06 TACR2 4,79908063 -2,9196091 30,7264757 4.623E-07 5.193E-06 SFRP4 8,09299483 -2,9569401 30,3583102 5,2813E-07 5.8224E-06 F2RL2 4,3137238 -2,0202029 29,8836816 6,2751 E-07 6.7168E-06 STAG3 4,07587471 2,06815471 29,8798343 6.2839E-07 6.7168E-06 STEAP3 1 ,70448017 2,01480206 29,8048832 6.4578E-07 6.8702E-06 XKR9 1 ,15584965 2,0844341 29,8011135 6.4667E-07 6.875E-06 RANBP3L 3,2349957 3,14188836 29,7870312 6.5E-07 6,9011 E-06 SPP1 6,64746836 3,15171713 29,2995383 7.7677E-07 8,0517E-06 CD2 1 ,94877032 2,02333638 29,2046766 8.0427E-07 8,3211 E-06 MX1 5,73572774 2,06966225 29,1488785 8.209E-07 8.4655E-06 AMH 1 ,4138074 2,14951281 29,0787181 8.4233E-07 8.6358E-06 SYT14 3,70915938 3,98697909 28,8031074 9.3222E-07 9.3989E-06 TEX14 2,47783166 2,16744638 28,6286496 9,9417E-07 9.9223E-06 MGST1 2,16938612 2,89590982 28,3757839 1.0916E-06 1.0709E-05 COL19A1 4,10190278 3,34292926 28,374795 1 .092E-06 1.0709E-05 SMOC2 7,39395557 -2,2039746 28,0257915 1 .2429E-06 1.1967E-05 SEPTIN3 2,17524213 2,3476715 27,7190866 1 .3932E-06 1.3173E-05 TENM2 4,34878625 3,90648674 27,6379813 1 .436E-06 1.3473E-05 KLHL41 2,41270688 2,34655711 27,4968167 1.5138E-06 1 .4044E-05 MACC1 0,78817109 2,07442299 27,1752392 1.7075E-06 1 .564E-05 TNNT2 4,56465168 4,09312625 27,069432 1 .7767E-06 1.617E-05 ANPEP 3,90648955 2,32422766 27,0276538 1 .8048E-06 1 .636E-05 CLDN1 
6.0331851 -2.8897245 26.9569023 1.8534E-06 1.6729E-05
| gene | logCPM | logFC | F | PValue | FDR |
|---|---|---|---|---|---|
| DSCAML1 | 3.43659165 | -2.0279797 | 26.9561037 | 1.854E-06 | 1.6729E-05 |
| HHIPL2 | 0.76430098 | 2.12784039 | 26.8212205 | 1.9506E-06 | 1.7455E-05 |
| KBTBD12 | 0.84156362 | 2.12552718 | 26.7694132 | 1.989E-06 | 1.7725E-05 |
| SFRP1 | 6.51398184 | -2.2284646 | 26.6419614 | 2.0869E-06 | 1.8435E-05 |
| MKX | 1.48702129 | 2.44920554 | 26.6077139 | 2.1141E-06 | 1.8619E-05 |
| RGS9 | 3.39070343 | 2.05098839 | 26.3525091 | 2.3283E-06 | 2.0213E-05 |
| CD8A | 2.84404227 | 2.04155552 | 26.0255858 | 2.6357E-06 | 2.2402E-05 |
| RP1L1 | 4.02538773 | 2.4311027 | 25.8404398 | 2.8281E-06 | 2.3769E-05 |
| HPGD | 5.92018753 | -2.2716954 | 25.736767 | 2.9421E-06 | 2.4609E-05 |
| PRPH | 1.97962972 | 2.52461242 | 25.2212082 | 3.5834E-06 | 2.9216E-05 |
| WDR49 | 1.46713537 | 2.22922805 | 24.9248492 | 4.0156E-06 | 3.2225E-05 |
| MUC4 | 4.11539547 | 3.46588971 | 24.7682227 | 4.2654E-06 | 3.3953E-05 |
| NPY1R | 3.83288017 | -2.0663633 | 24.5797149 | 4.5874E-06 | 3.6188E-05 |
| POTEF | 3.84910845 | 3.09977558 | 24.0246811 | 5.689E-06 | 4.3595E-05 |
| GABBR2 | 3.61808299 | 3.25871319 | 24.0094102 | 5.7229E-06 | 4.3791E-05 |
| HBA2 | 5.11694612 | 2.90421218 | 23.920286 | 5.9249E-06 | 4.5151E-05 |
| TAF7L | 1.72067738 | 2.09479986 | 23.8036708 | 6.2004E-06 | 4.6856E-05 |
| SBSPON | 4.29045648 | 2.04586611 | 23.7771101 | 6.265E-06 | 4.7299E-05 |
| TRPV3 | 1.36025852 | 2.3462521 | 23.5214765 | 6.9232E-06 | 5.1433E-05 |
| NDP | 2.51688799 | -2.1410259 | 23.404467 | 7.2478E-06 | 5.3542E-05 |
| FER1L6 | 1.7254508 | 2.38825962 | 23.0897301 | 8.2012E-06 | 5.9529E-05 |
| MYH8 | 7.13807155 | 3.50072485 | 23.027914 | 8.4032E-06 | 6.0771E-05 |
| FZD5 | 2.35818385 | -2.3044078 | 22.7990407 | 9.1967E-06 | 6.554E-05 |
| HTR1E | 1.02136238 | -3.0801271 | 22.7883149 | 9.2357E-06 | 6.571E-05 |
| ANGPT4 | -0.2386042 | -2.1065772 | 22.7160494 | 9.5032E-06 | 6.734E-05 |
| MMP9 | 5.60759147 | 2.98384824 | 22.6362144 | 9.8079E-06 | 6.9097E-05 |
| FST | 2.42668641 | 2.25789034 | 22.631108 | 9.8277E-06 | 6.9206E-05 |
| VWDE | 1.61195693 | 2.41985815 | 22.5150882 | 1.029E-05 | 7.2041E-05 |
| GATA3 | 1.7623253 | 2.63110088 | 22.4248345 | 1.0664E-05 | 7.427E-05 |
| SDC1 | 2.26293197 | 2.04903994 | 21.417175 | 1.5943E-05 | 0.00010506 |
| GLDC | 2.21136795 | 2.41701572 | 21.3633634 | 1.6291E-05 | 0.0001069 |
| SLFN13 | 4.59772058 | 2.11809321 | 21.2765746 | 1.687E-05 | 0.00011008 |
| SH3GL2 | 1.31908391 | 2.03456012 | 21.265773 | 1.6943E-05 | 0.00011041 |
| TPSAB1 | 1.44730225 | -2.6732271 | 21.2522631 | 1.7036E-05 | 0.00011092 |
| PCP4L1 | 0.83678053 | -2.35968 | 20.8445946 | 2.0081E-05 | 0.00012768 |
| ELAVL2 | 2.181336 | 2.28975311 | 20.8277676 | 2.0218E-05 | 0.00012831 |
| SPAG17 | 4.30815621 | 2.25313793 | 20.6526747 | 2.1704E-05 | 0.00013675 |
| SHISA6 | 2.51302185 | -2.7438671 | 20.5171139 | 2.2931E-05 | 0.00014312 |
| IL1R2 | 0.19875731 | 2.02604324 | 20.486031 | 2.3222E-05 | 0.00014465 |
| BTBD11 | 2.40506547 | 2.3248636 | 20.4134659 | 2.3918E-05 | 0.0001484 |
| SLC14A1 | 3.04933978 | 2.29314832 | 20.3345249 | 2.4698E-05 | 0.00015246 |
| ACKR1 | 2.33316502 | -2.1072234 | 19.5915044 | 3.347E-05 | 0.00019862 |
| CATSPER1 | 1.19528813 | 2.01901723 | 19.3579187 | 3.6848E-05 | 0.00021519 |
| KRT8 | 4.66488184 | 2.57024947 | 19.3081535 | 3.7612E-05 | 0.0002186 |
| HBB | 6.52075327 | 2.56480384 | 19.2361051 | 3.8747E-05 | 0.0002243 |
| PTCHD1 | 5.70126387 | -2.4001447 | 19.0372328 | 4.2067E-05 | 0.00024035 |
| NLRP2 | 1.56768611 | 2.07448992 | 18.7575621 | 4.7241E-05 | 0.00026587 |
| PI15 | 7.52354012 | -2.1601694 | 18.391832 | 5.5015E-05 | 0.00030234 |
| DMBT1 | 2.89155736 | 2.27345749 | 18.3273132 | 5.6518E-05 | 0.00030874 |
| DCSTAMP | 1.00282158 | 2.49780075 | 18.1639691 | 6.0515E-05 | 0.00032763 |
| ROS1 | 0.92029248 | 2.91955056 | 18.0549086 | 6.3345E-05 | 0.00034143 |
| AK5 | 2.38739088 | 2.10421007 | 18.0135062 | 6.4455E-05 | 0.00034588 |
| LEFTY2 | 1.25277457 | -2.1836508 | 17.970881 | 6.5619E-05 | 0.00035118 |
| MUC16 | 5.83402563 | 2.78030176 | 17.9111378 | 6.7287E-05 | 0.00035913 |
| PDE11A | 3.74464787 | -2.835101 | 17.8602536 | 6.8742E-05 | 0.00036578 |
| IRF6 | 2.10876377 | 2.25252567 | 17.8140564 | 7.0091E-05 | 0.00037183 |
| HMGA1 | 5.84658966 | 2.14288921 | 17.6651537 | 7.4628E-05 | 0.00039131 |
| PTPRQ | 1.88958148 | 2.19867517 | 17.5714846 | 7.7637E-05 | 0.00040494 |
| FOXD2 | 1.06327012 | 2.16129206 | 17.2813964 | 8.7774E-05 | 0.00044834 |
| NNAT | 1.8258957 | 2.45877572 | 17.11297 | 9.4277E-05 | 0.00047595 |
| GLB1L2 | 5.04128566 | -2.7216377 | 17.0189902 | 9.8121E-05 | 0.00049166 |
| PTCHD4 | 3.715816 | -2.0987597 | 16.738816 | 0.00011057 | 0.00054234 |
| MYO18B | 3.66662595 | 2.92077316 | 16.6373508 | 0.00011547 | 0.0005618 |
| ADAMTS18 | 3.70293488 | -2.2886049 | 16.4962678 | 0.00012266 | 0.00059318 |
| COBL | 1.54385386 | 2.03027323 | 16.3491951 | 0.00013065 | 0.0006255 |
| USH1C | 1.8154103 | 2.11606573 | 16.2279755 | 0.00013764 | 0.00065224 |
| NEFL | 4.59245724 | 2.56530256 | 16.0421449 | 0.00014911 | 0.0006972 |
| PRSS12 | 4.01180213 | 2.23224826 | 15.8074692 | 0.00016502 | 0.00075771 |
| C7 | 7.3248061 | -2.3507038 | 15.670751 | 0.00017509 | 0.00079976 |
| TMEM178B | 1.77559513 | -2.0586533 | 15.5919596 | 0.00018118 | 0.00082306 |
| ERBB3 | 4.44279348 | 2.23750319 | 15.556493 | 0.00018399 | 0.00083392 |
| RYR1 | 7.23422442 | 2.02408288 | 15.4156374 | 0.00019561 | 0.00087903 |
| FGFR4 | 1.00956399 | 2.0023744 | 14.9458514 | 0.00024017 | 0.00104596 |
| MME | 6.62411392 | 2.30933359 | 14.9077346 | 0.00024422 | 0.00106154 |
| SLC24A2 | 4.57876843 | -2.0482124 | 14.866466 | 0.00024868 | 0.00107745 |
| OR2L5 | 1.03113038 | 2.07289103 | 14.3413013 | 0.00031342 | 0.0013015 |
| PTPRT | 1.90923656 | 2.28799757 | 13.782834 | 0.00040167 | 0.0016129 |
| RAB3C | 1.60241902 | -2.6140978 | 13.6028222 | 0.00043531 | 0.00172736 |
| SORCS1 | 2.00336706 | 2.04382335 | 13.5304394 | 0.00044964 | 0.00177889 |
| ZNF729 | 2.40610793 | 2.2089434 | 13.1797754 | 0.00052635 | 0.00203101 |
| COL11A1 | 8.02097389 | 2.46228393 | 13.0813705 | 0.00055022 | 0.00210802 |
| FREM2 | 5.09131326 | 2.7214513 | 12.9217237 | 0.00059137 | 0.00224034 |
| ADAMTS19 | 7.1820374 | -2.1485244 | 12.8590994 | 0.00060837 | 0.00229868 |
| MYOM3 | 2.41196829 | 2.53313409 | 12.6573265 | 0.00066666 | 0.00247982 |
| NRXN1 | 1.90171082 | 2.00003831 | 12.1461162 | 0.00084175 | 0.00302882 |
| ZNF804A | 2.44116346 | 2.30490974 | 11.8689907 | 0.00095597 | 0.00336575 |
| LGI3 | 0.32655015 | -2.1354015 | 11.3376464 | 0.0012222 | 0.00415702 |
| UNC13A | 2.8907374 | 2.20694435 | 11.1656586 | 0.00132401 | 0.00444496 |
| MSLN | 0.87475691 | -2.0187546 | 11.0117597 | 0.00142258 | 0.00472271 |
| TRDN | 3.89886031 | 2.67888129 | 10.7552912 | 0.00160411 | 0.00522335 |
| TDRD9 | 2.77795325 | 2.27266382 | 10.2344745 | 0.00205072 | 0.0064236 |
| RXFP2 | 4.13545102 | -3.0343168 | 10.0279168 | 0.00226205 | 0.00694923 |
| GLB1L3 | 3.18593226 | -2.3647644 | 9.86113188 | 0.00244916 | 0.00741172 |
| SLC6A15 | 1.26222945 | 2.08570165 | 9.64573427 | 0.00271493 | 0.0080814 |
| SYT10 | 3.72410184 | 2.19774227 | 8.36615914 | 0.00505339 | 0.01367027 |
| NTRK1 | 2.46122186 | 1.19583869 | 6.19882312 | 0.0150949 | 0.03459196 |
| CACNA1S | 3.74802173 | 2.14038257 | 6.12774265 | 0.01566287 | 0.03564335 |
Table 1: Differentially expressed genes between LMS and LM samples.
Next, unsupervised hierarchical clustering grouped LMS samples in a homogeneous cluster of 29 samples, while 30 LM samples fell in a separate cluster. Of note, a third cluster contained the remaining LM samples together with five LMS samples (LMS03, LMS11, LMS26, LMS35, and LMS62). In the heatmap/dendrogram, these LMS samples appeared closer to the LM group, suggesting that their molecular profile was more similar to LM than to LMS. This was in line with the clinical information of the corresponding patients: only one of the five LMS patients died of the disease, while the other four are still alive, reinforcing that these tumors may have intermediate characteristics but are closer to LM than to LMS.
Model creation and validation for differential molecular diagnosis of LMS and LM
To classify LM and LMS tumors, a machine learning approach was developed based on the transcriptomic signatures of each group, since we found that management, analysis, and biological comprehension were more straightforward when using RNAseq data. After feature pruning, the final model was composed of 20 DEGs and correctly classified all samples in the validation set (Table 2).
| predictors | logCPM | logFC | F | PValue | FDR | Importance | Importance unnormalized |
|---|---|---|---|---|---|---|---|
| CHAF1A | 4.721609449 | 2.028269972 | 214.2624195 | 2.9E-23 | 1.3E-20 | 0 | 0 |
| HMGB2 | 6.136108708 | 2.700229401 | 192.5011346 | 5E-22 | 1.5E-19 | 0 | 0 |
| CENPF | 6.716084843 | 4.232822647 | 250.0008498 | 4.1E-25 | 7.2E-22 | 11.82498319 | 27.3812988 |
| CENPU | 4.239176663 | 2.993396332 | 174.3507322 | 6.6E-21 | 1.4E-18 | 8.72432095 | 20.2015711 |
| PBK | 2.964247184 | 4.013443385 | 125.673172 | 1.9E-17 | 2E-15 | 3.00656961 | 6.9618518 |
| CDCA5 | 1.255519375 | 3.604090582 | 203.4205383 | 1.2E-22 | 4.4E-20 | 0 | 0 |
| E2F7 | 4.069478782 | 4.318577441 | 222.7638456 | 1E-23 | 6.1E-21 | 3.97655562 | 9.2078996 |
| ARHGAP11A | 5.352435058 | 3.630415136 | 256.480746 | 2E-25 | 5.3E-22 | 100 | 231.5546533 |
| CENPE | 6.283657885 | 3.677422597 | 216.2602742 | 2.2E-23 | 1.1E-20 | 46.93236027 | 108.6740641 |
| EXO1 | 3.966980697 | 3.675567335 | 210.0803951 | 4.9E-23 | 2E-20 | 0.0795938 | 0.1843031 |
| CCDC34 | 3.718474292 | 2.196747992 | 99.06703459 | 3.6E-15 | 2.9E-13 | 0 | 0 |
| CENPH | 2.322471771 | 2.137613351 | 151.759694 | 2.1E-19 | 3.3E-17 | 0.06739962 | 0.1560677 |
| ITPRIPL1 | 1.20466146 | 2.845020866 | 125.3659065 | 2E-17 | 2.1E-15 | | |
| S100A13 | 2.356133758 | 2.031639664 | 101.0330923 | 2.4E-15 | 2E-13 | 0.37869143 | 0.8768776 |
| ITGA9 | 8.18121935 | -2.492125591 | 72.77309195 | 1.6E-12 | 8.4E-11 | 0 | 0 |
| COL4A5 | 9.850371909 | -2.650044037 | 68.6107368 | 4.6E-12 | 2.2E-10 | 18.52564931 | 42.897003 |
| MFAP5 | 7.657608021 | -2.527177027 | 37.06399936 | 5.1E-08 | 7.7E-07 | 8.86742726 | 20.5329404 |
| SH2D4A | 1.778722439 | 2.462289829 | 58.16911111 | 7.6E-11 | 2.8E-09 | 0 | 0 |
| NRN1 | 2.020388319 | 2.245716171 | 48.49253529 | 1.3E-09 | 3.3E-08 | 0 | 0 |
| BRCA2 | 5.167339738 | 2.0447144 | 114.268518 | 1.6E-16 | 1.6E-14 | 0 | 0 |
Table 2: List of 20 differentially expressed genes identified from RNAseq data using a machine learning approach
Based on this model, we built a targeted sequencing panel using AmpliSeq technology for the 20 selected genes, which was used to re-analyze all previous LM and LMS tumors (n = 44 and n = 32, respectively, since two LMS samples were filtered due to poor sequencing quality) in addition to new samples (eight LM and ten LMS).
Next, all 96 samples were randomly split into a training set, used to build the machine learning model, and a test set, used to validate it (75% and 25% of class-balanced samples, respectively). Specifically, the gradient boosting algorithm was used to build a new model, which achieved optimal sensitivity and specificity, since it correctly classified all training and test samples. The model was also used to calculate class probabilities for all samples, allowing a finer-grained classification in which we defined a “warning range” for tumors where the model was not confident enough, namely class probabilities of <75% for each group (Fig. 1B). Interestingly, this model classified all samples correctly and with high class probabilities, even sample LMS39, which had the lowest LMS probability.
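The split and the “warning range” described above can be illustrated with a minimal sketch. It is not the implementation used in the study: the function names `stratified_split` and `call_with_warning`, the random seed and the toy labels are all hypothetical, and only the 75%/25% class-balanced split and the 75% probability threshold are taken from the text.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.75, seed=0):
    """Split sample indices into train/test sets, keeping class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


def call_with_warning(p_lms, threshold=0.75):
    """Return 'LMS' or 'LM' when that class reaches the threshold, else 'warning'."""
    p_lm = 1.0 - p_lms
    if p_lms >= threshold:
        return "LMS"
    if p_lm >= threshold:
        return "LM"
    return "warning"  # neither class is confident enough


# Hypothetical toy data: 8 LM and 8 LMS labels, and three made-up LMS probabilities.
labels = ["LM"] * 8 + ["LMS"] * 8
train_idx, test_idx = stratified_split(labels)
calls = [call_with_warning(p) for p in (0.97, 0.60, 0.10)]
print(len(train_idx), len(test_idx), calls)
```

In this sketch a sample falls in the warning range when neither class probability reaches 75%, which for a two-class model means an LMS probability strictly between 25% and 75%.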
Example 2
Differential diagnosis of LM and LMS based on a 5 gene signature
In order to reduce the number of genes needed to classify LM/LMS, the genes in Table 3 were ranked by their importance (the relative contribution of each variable) in the machine learning model, and only the five genes with the highest importance were retained. As described in the section “Model creation and validation for differential molecular diagnosis of LMS and LM”, models were then rebuilt using only targeted RNAseq data from these 5 genes and combinations of them, in order to evaluate the performance of models consisting of a reduced number of genes (Fig. 2, Table 3).
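The ranking step can be reproduced from the unnormalized importances listed in Table 2. The sketch below is illustrative only: the helper names are hypothetical, and the rescaling to a maximum of 100 is an assumption inferred from the two importance columns of Table 2 rather than a documented property of the model. Genes with zero or missing importance are left out.

```python
# Unnormalized importances copied from Table 2 (zero or missing entries omitted).
raw_importance = {
    "CENPF": 27.3812988,
    "CENPU": 20.2015711,
    "PBK": 6.9618518,
    "E2F7": 9.2078996,
    "ARHGAP11A": 231.5546533,
    "CENPE": 108.6740641,
    "EXO1": 0.1843031,
    "CENPH": 0.1560677,
    "S100A13": 0.8768776,
    "COL4A5": 42.897003,
    "MFAP5": 20.5329404,
}


def normalize(importances):
    """Rescale so that the most important gene scores 100 (assumed convention)."""
    top = max(importances.values())
    return {gene: 100.0 * value / top for gene, value in importances.items()}


def top_genes(importances, n=5):
    """Return the n genes with the highest importance, most important first."""
    return sorted(importances, key=importances.get, reverse=True)[:n]


normalized = normalize(raw_importance)
top_five = top_genes(normalized)
print(top_five)
```

With these values the five highest-ranked genes are ARHGAP11A, CENPE, COL4A5, CENPF and MFAP5, the same genes that are recited individually in the claims.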
Table 3: Sensitivity and specificity of predictive models for the differential diagnosis of LMS and LM based on gene expression (RNAseq).
Example 3
Identification of differential somatic single nucleotide variants and insertions/deletions
Whole-exome sequencing in 44 LM and 34 LMS tumors was performed to screen for single nucleotide variants (SNVs) and insertions/deletions (indels). We detected 181,354 small variants in LMS tumors, of which 171,863 were SNVs and 9,491 were small indels. In LM samples, we detected 85,250 small variants, of which 81,152 were SNVs and 4,098 were small indels.
In an attempt to relate our findings to known mutational signatures, we searched in the Catalogue of Somatic Mutations in Cancer (COSMIC) database (Forbes, S. et al., 2017, Nucleic acids research 45, doi:10.1093/nar/gkw1121) and identified four (1, 5, 12, and 20) out of 30 existing signatures. Among them, signature 1 results from an endogenous mutational process initiated by spontaneous deamination of 5-methylcytosine, while signature 5 exhibits transcriptional strand-bias for T>C substitutions at ApTpN context as additional mutational features. We also identified signature 20, which is associated with defective DNA mismatch repair due to high numbers of small indels at mono/polynucleotide repeats. While these signatures have already been identified across 40 different human cancer types, signature 12 represents a novel mutational signature only present in these uterine tumors, showing similarities to liver cancer and exhibiting a strong transcriptional strand-bias for T>C substitutions as additional mutational features.
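Signatures of this kind are described in terms of substitution classes such as T>C and their flanking bases. As a hedged illustration of the usual convention, and not of the pipeline used here, the sketch below reports each SNV on the pyrimidine strand, so that an A>G change is counted as T>C. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Report an SNV on the pyrimidine strand, e.g. A>G becomes T>C."""
    if ref in ("A", "G"):  # purine reference: switch to the opposite strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def context_class(five_prime, ref, alt, three_prime):
    """Trinucleotide context on the pyrimidine strand, written like A[T>C]G."""
    if ref in ("A", "G"):
        # Reverse-complementing the trinucleotide swaps the two flanking bases.
        five_prime, three_prime = COMPLEMENT[three_prime], COMPLEMENT[five_prime]
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{five_prime}[{ref}>{alt}]{three_prime}"


# An A>G change in the context 5'-CAT-3' is the same event as T>C in 5'-ATG-3'.
print(substitution_class("A", "G"), context_class("C", "A", "G", "T"))
```

Counting variants per class and context yields the mutational profile that is then compared against the catalogued signatures.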
Example 4
Coverage analysis for the differential diagnosis of LM and LMS
To use DNAseq data as a differential diagnostic tool, coverage values were calculated and normalized for all genes. These values can be extrapolated to copy number states: higher coverage is interpreted as a duplication or amplification, while lower coverage is interpreted as a deletion.
As described in the section “DNA library preparation, targeted exome sequencing and mapping” of Materials and Methods, DNA coverage data was used to build a classification model using the xgboost algorithm. From this model, the top 5 genes with the highest importance were selected to build new classification models based on each individual gene and on combinations of them (Figure 3 and Table 4).
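A minimal sketch of this kind of normalization is given below, assuming that coverage is expressed as counts per million (cpm), as the caption of Table 4 suggests. The gene names, read counts, pseudocount and function names are hypothetical and serve only to show how a sample can be compared with a reference.

```python
import math


def counts_per_million(gene_counts):
    """Scale per-gene read counts so that they sum to one million."""
    total = sum(gene_counts.values())
    return {gene: 1e6 * count / total for gene, count in gene_counts.items()}


def log2_ratio(sample_cpm, reference_cpm, pseudocount=1.0):
    """log2 of sample over reference coverage; > 0 suggests a gain, < 0 a loss."""
    return {
        gene: math.log2((sample_cpm[gene] + pseudocount)
                        / (reference_cpm[gene] + pseudocount))
        for gene in sample_cpm
    }


# Hypothetical read counts for three placeholder genes.
sample = {"GENE_A": 300, "GENE_B": 100, "GENE_C": 600}
reference = {"GENE_A": 100, "GENE_B": 300, "GENE_C": 600}

ratios = log2_ratio(counts_per_million(sample), counts_per_million(reference))
print(ratios)
```

In this toy example GENE_A shows higher coverage than the reference, GENE_B lower coverage, and GENE_C no change.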
Table 4. Sensitivity and specificity of predictive models for the differential diagnosis of LMS and LM based on gene coverage (cpm) of DNA sequencing
The coverage of the TUBB2B, LRRCC1, NDRG4, HSF4 and TMPRSS6 genes was identified as predictive for the differential diagnosis of LMS and LM (Fig. 3).
Example 5
Determination of the prognosis of LMS patients based on CNV
We next compared somatic copy number variants (CNVs) in LMS and LM. A total of 14,467 CNVs were detected in LM, while 14,950 CNVs were detected in LMS. Despite the similar totals, Student’s t-test showed a significant difference in the mean number of CNVs per sample between LMS (439.7) and LM (328.8) (p = 5.61e-5; t = -4.268; degrees of freedom = 76). Because some CNVs were present in more than one sample within each group, we filtered unique CNVs per group, obtaining a total of 8,390 CNVs in LMS and 5,376 CNVs in LM. In terms of their structural nature, 18.2% of CNVs in LMS were deletions, while 73.1% were duplications. Specifically, 3.5% were LMS-specific deletions present in more than one sample, and 5.2% were LMS-specific duplications present in more than one sample. In the LM group, 11.7% of CNVs were deletions and 84.5% were duplications, with only 0.1% tumor-specific deletions and 3.6% tumor-specific duplications. While the CNV profile for LMS was heterogeneous and showed alterations in most chromosomes, LM tumors had recurrent losses in chromosomes 1, 13, 14, 15, and 22 and recurrent gains in chromosomes 12 and 19.
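The per-sample means and the degrees of freedom quoted above follow directly from the CNV totals and the group sizes (34 LMS and 44 LM tumors sequenced). The short check below assumes a two-sample Student's t-test with n1 + n2 - 2 degrees of freedom; the variable names are illustrative.

```python
# Totals and group sizes as reported in the text.
cnv_totals = {"LMS": 14950, "LM": 14467}
n_samples = {"LMS": 34, "LM": 44}

# Mean number of CNVs per sample in each group.
mean_cnvs = {group: cnv_totals[group] / n_samples[group] for group in cnv_totals}

# Degrees of freedom of a two-sample Student's t-test (assumed pooled variance).
degrees_of_freedom = n_samples["LMS"] + n_samples["LM"] - 2

print({group: round(value, 1) for group, value in mean_cnvs.items()},
      degrees_of_freedom)
```

The t statistic and p-value themselves also depend on the per-sample variances, which are not given in the text, so they are not recomputed here.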
Kaplan-Meier survival curves were generated to assess the association between LMS-specific CNVs and clinical prognosis based on overall survival. We selected 12 of the most frequent LMS-specific CNVs present in at least 10 out of 34 LMS (Table 5) and found statistically significant differences between patients with disruptions in at least 67% of CNVs.
Table 5. Most frequent copy number variants (CNVs) in the leiomyosarcoma (LMS) group (detected in at least 25% of LMS samples and in no leiomyoma samples).
Remarkably, these patients had shorter survival time than those with normal copy number values (diploid) in these regions (p = 0.033) (Fig. 4).
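For reference, the Kaplan-Meier estimate multiplies, at each time with at least one death, the fraction of at-risk patients who survive it. The sketch below is a plain-Python illustration with hypothetical follow-up data, not the analysis behind Fig. 4, and it does not include the test used to compare the two curves.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each time with at least one death.

    events[i] is 1 if patient i died at times[i] and 0 if censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up times (months) and outcomes for five patients.
times = [5, 8, 8, 12, 20]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
print(curve)
```

Censored patients, such as those still alive at last follow-up, leave the at-risk set without lowering the survival estimate.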
Claims
1. An in vitro method for the differential diagnosis of a subject suspected of suffering from uterine leiomyoma or uterine leiomyosarcoma, the method comprising:
(i) measuring the expression level of gene ARHGAP11A in a biological sample obtained from the subject thereby obtaining a gene expression profile of said sample; and
(ii) identifying the subject as suffering from uterine leiomyosarcoma or from uterine leiomyoma by a predictive model which correlates the gene expression profile identified in step (i) with representative gene expression profiles from samples obtained from subjects previously identified as suffering from uterine leiomyosarcoma or from uterine leiomyoma, said predictive model having been generated by training a computer with a plurality of gene expression profiles from previously identified subjects suffering from uterine leiomyosarcoma or from uterine leiomyoma by machine learning on said plurality of gene expression profiles so as to obtain representative gene expression profiles associated with uterine leiomyosarcoma or with uterine leiomyoma.
2. The in vitro method according to claim 1, further comprising in step i) measuring the expression level of gene CENPE.
3. The in vitro method according to claim 1 or 2, further comprising in step i) measuring the expression level of gene COL4A5.
4. The in vitro method according to any one of claims 1 to 3, further comprising in step i) measuring the expression level of gene CENPF.
5. The in vitro method according to any one of claims 1 to 4, further comprising in step i) measuring the expression level of MFAP5.
6. The in vitro method according to claim 5 wherein the method comprises the determination of the expression levels of all the genes or any of the combinations of genes shown in Table 3.
7. The in vitro method according to claim 6 wherein the method comprises the determination of the expression levels of all the genes of Table 2.
8. An in vitro method for the differential diagnosis of a subject suspected of suffering from uterine leiomyoma or uterine leiomyosarcoma, the method comprising:
(i) measuring the level of expression of at least one gene selected from the list shown in Table 1 in a biological sample obtained from the subject, and
(ii) comparing said level of expression with a reference value, wherein a deviation in the level of expression of said at least one gene selected from the list shown in Table 1 with respect to said reference value, is indicative that the subject is suffering from uterine leiomyosarcoma or from uterine leiomyoma.
9. The method according to claim 8 wherein the at least one gene the expression of which is determined in step (i) is selected from the genes shown in Table 3.
10. The in vitro method according to claim 8 or 9, wherein the method comprises measuring the level of expression of all the genes comprised in Table 1 or Table 2.
11. The in vitro method according to any one of claims 8 to 10, the method comprising diagnosing uterine leiomyosarcoma when the deviation in the level of expression of the at least one gene is of at least four fold with respect to the reference value, said reference value being the expression level of the same gene or genes determined in sample from a patient with uterine leiomyoma.
12. The in vitro method according to any one of claims 8 to 11, the method comprising diagnosing uterine leiomyoma when the deviation in the level of expression of the at least one gene is of at least four fold with respect to the reference value, said reference value being the expression level of the same gene or genes determined in sample from a patient with uterine leiomyosarcoma.
13. An in vitro method for differential diagnosis of a subject suspected of suffering from uterine leiomyoma or uterine leiomyosarcoma, the method comprising analyzing in a biological sample from the subject the coverage of at least one gene selected from the group consisting of the TUBB2B gene, the LRRCC1 gene, the NDRG4 gene, the HSF4 gene and the TMPRSS6 gene and wherein an increased coverage with respect to a reference sample in the HSF4 gene, in the NDRG4 gene, in the TMPRSS6 gene and/or in the TUBB2B gene and/or a decreased coverage with respect to a reference sample in the LRRCC1 gene is indicative that the patient has uterine leiomyosarcoma.
14. The method according to claim 13 wherein the coverage of said at least one gene is determined by a method which comprises whole genome sequencing.
15. The method according to claim 14 wherein the sequencing is Next Generation Sequencing.
16. An in vitro method for the diagnosis of a uterine tumor selected from the group consisting of uterine leiomyoma or uterine leiomyosarcoma in a subject, the method comprising determining in the whole-exome sequence of a biological sample from the subject the value of a mutational index which correlates with the number of single nucleotide variants which are characteristic of the COSMIC mutational signature 12 and/or of the COSMIC mutational signature 20, wherein an increase in said index with respect to a reference sample is indicative that the subject is suffering from uterine leiomyosarcoma or from uterine leiomyoma.
17. The method according to claim 16 wherein the COSMIC mutational signature is the COSMIC mutational signature 12.
18. An in vitro method for prognosis of a subject diagnosed with uterine leiomyosarcoma, comprising determining in a biological sample of the subject the presence of at least one CNV shown in Table 5, wherein the presence of the CNV in the sample is indicative of a bad prognosis of uterine leiomyosarcoma.
19. In vitro method according to any one of claims 1 to 18, wherein the patient has been previously identified as suffering a myometrial tumor by imaging examination, preferably by ultrasonography.
20. The method of any one of the preceding claims, wherein:
a. the determination of the expression levels of one or more genes is carried out by exome-wide gene expression from RNAseq,
b. the determination of the presence of mutations is carried out by whole-exome sequencing,
c. the determination of the presence of CNV is carried out by whole-exome sequencing.
21. The method according to any one of the preceding claims, wherein the biological sample is a sample containing myometrial cells, DNA derived from myometrial cells or RNA derived from myometrial cells.
22. The method according to claim 21 wherein the sample containing myometrial cells is a myometrial biopsy.
23. In vitro method for selecting a subject suspected of suffering from uterine leiomyoma or uterine leiomyosarcoma as a candidate to receive an adequate therapy to treat uterine leiomyosarcoma or uterine leiomyoma, the method comprising:
(i) determining whether the patient is suffering from uterine leiomyoma or leiomyosarcoma following the method of any one of claims 1 to 15 or 19-22; and
(ii) selecting said patient as a candidate to receive an adequate therapy to treat uterine leiomyosarcoma if the patient is diagnosed as suffering from uterine leiomyosarcoma or to receive an adequate therapy to treat uterine leiomyoma if the patient is diagnosed as suffering from uterine leiomyoma.
24. The method according to claim 23 wherein the therapy adequate for the treatment of uterine leiomyosarcoma is selected from the group consisting of surgery, radiation therapy, chemotherapy, hormonal therapy or targeted therapy or wherein the therapy adequate for the treatment of leiomyoma is morcellation.
25. The method according to claim 24 wherein the surgery is simple hysterectomy, radical hysterectomy or bilateral salpingo-oophorectomy, wherein the chemotherapy includes one or more drugs selected from the group consisting of Dacarbazine (DTIC), docetaxel, doxorubicin, epirubicin, gemcitabine, ifosfamide, Paclitaxel, temozolomide, trabectedin and vinorelbine, wherein the hormonal therapy comprises progestin, an agonist of the Gonadotropin-releasing hormone or an aromatase inhibitor or wherein the targeted therapy includes pazopanib.
26. A method for the treatment of leiomyosarcoma in a subject in need thereof comprising the administration of a therapy adequate for the treatment of leiomyosarcoma, wherein the patient to be treated has been identified by a method according to any of claims 1 to 15.
27. The method according to claim 26 wherein the therapy adequate for the treatment of leiomyosarcoma is selected from the group consisting of surgery, radiation therapy, chemotherapy, hormonal therapy or targeted therapy.
28. The method according to claim 27 wherein the surgery is simple hysterectomy, radical hysterectomy or bilateral salpingo-oophorectomy, wherein the chemotherapy includes one or more drugs selected from the group consisting of Dacarbazine (DTIC), docetaxel, doxorubicin, epirubicin, gemcitabine, ifosfamide, Paclitaxel, temozolomide, trabectedin and vinorelbine, wherein the hormonal therapy comprises progestin, an agonist of the Gonadotropin-releasing hormone or an aromatase inhibitor or wherein the targeted therapy includes pazopanib.
29. A kit, package and/or device comprising reagents adequate for implementing the methods according to any one of claims 1 to 28.
30. The kit, package or device wherein the reagents comprise:
(i) primers or probes adequate for the detection of the expression levels of one or more of the genes, the expression levels of which are determined in the methods according to any of claims 1 to 12,
(ii) primers or probes adequate for the detection of the coverage in one or more genes which are analysed in claims 13 to 15,
(iii) primers or probes adequate for the detection of one or more of the mutations forming part of a COSMIC signature which is detected in the method of claims 16 or 17 or
(iv) primers or probes adequate for the determination of the one or more of the CNVs defined in claim 18.
31. Use of a kit according to claim 30 for differential diagnosis of a subject suspected of suffering from uterine leiomyoma or uterine leiomyosarcoma, for the prognosis of a patient diagnosed with uterine leiomyosarcoma or for identifying a subject as a candidate to receive an adequate therapy to treat uterine leiomyosarcoma.
32. A computer-implemented method, wherein the method is as defined in any of claims 1 to 28.
33. A computer containing instructions for carrying out a method as defined in any of claims 1 to 28.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP22805781.6A EP4416305A2 (en) | 2021-10-11 | 2022-10-10 | Methods and reagents for the differential diagnosis of uterine tumors |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP21382915 | 2021-10-11 | ||
| EP21382915.3 | 2021-10-11 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2023061914A2 (en) | 2023-04-20 |
| WO2023061914A3 (en) | 2023-05-25 |
Family
ID=78413951
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/EP2022/078052 Ceased WO2023061914A2 (en) | 2021-10-11 | 2022-10-10 | Methods and reagents for the differential diagnosis of uterine tumors |
Country Status (2)
| Country | Link |
|---|---|
| EP (1) | EP4416305A2 (en) |
| WO (1) | WO2023061914A2 (en) |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5037379A (en) | 1990-06-22 | 1991-08-06 | Vance Products Incorporated | Surgical tissue bag and method for percutaneously debulking tissue |
| US5327896A (en) | 1993-06-30 | 1994-07-12 | Arthrex, Inc. | Suction downbiter |
| US5403276A (en) | 1993-02-16 | 1995-04-04 | Danek Medical, Inc. | Apparatus for minimally invasive tissue removal |
| US5443472A (en) | 1993-10-08 | 1995-08-22 | Li Medical Technologies, Inc. | Morcellator system |
| US5520634A (en) | 1993-04-23 | 1996-05-28 | Ethicon, Inc. | Mechanical morcellator |
| US20130203606A1 (en) | 2010-02-25 | 2013-08-08 | Advanced Liquid Logic Inc | Method of Preparing a Nucleic Acid Library |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2020212580A1 (en) * | 2019-04-17 | 2020-10-22 | Igenomix, S.L. | Improved methods for the early diagnosis of uterine leiomyomas and leiomyosarcomas |
Non-Patent Citations (14)
| Title |
|---|
| "Molecular Cloning: A Laboratory Manual", 1989, COLD SPRING HARBOR LABORATORY PRESS |
| "Physics of Ultrasonic Surgery Using Tissue Fragmentation", 1995, IEEE ULTRASONICS SYMPOSIUM PROCEEDINGS, pages: 1597 - 1600 |
| ALEXANDROV, NATURE, vol. 500, no. 7463, 2013, pages 415 - 21 |
| BEAUCAGE; CARUTHERS, TETRAHEDRON LETTS., vol. 22, 1981, pages 1859 - 1862 |
| BLOKZIJL ET AL., GENOME MED., vol. 10, no. 1, 2018, pages 33 |
| DE ANDRES ET AL., BIOTECHNIQUES, vol. 18, 1995, pages 42 - 44 |
| DOWDY; WEARDEN: "Statistics for Research", 1983, JOHN WILEY AND SONS |
| FORBES, S. ET AL., NUCLEIC ACIDS RESEARCH, vol. 45, 2017 |
| HASTINGS ET AL., NAT REV GENET, vol. 10, no. 8, 2009, pages 551 - 64 |
| NEEDHAM-VAN DEVANTER ET AL., NUCLEIC ACIDS RES., vol. 12, 1984, pages 6159 - 6168 |
| RUPP; LOCKER: "Lab Invest.", vol. 56, 1987, pages: A67 |
| SAMBROOK, J. ET AL.: "Molecular cloning: A Laboratory Manual", vol. 1-3, 2001, COLD SPRING HARBOR LABORATORY PRESS |
| SHAH ET AL., CANCER RES., vol. 70, no. 2, 15 January 2010 (2010-01-15), pages 431 - 435 |
| SHISHIDO ET AL., PSYCHIATRY CLIN NEUROSCI, vol. 68, no. 2, 2014, pages 85 - 95 |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2023061914A3 (en) | 2023-05-25 |
| EP4416305A2 (en) | 2024-08-21 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22805781; Country of ref document: EP; Kind code of ref document: A2 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2022805781; Country of ref document: EP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2022805781; Country of ref document: EP; Effective date: 20240513 |