WO2013072346A2 - Discrete states for use as biomarkers for cancers such as renal cell cancer - Google Patents

Discrete states for use as biomarkers for cancers such as renal cell cancer

Info

Publication number
WO2013072346A2
Authority
WO
WIPO (PCT)
Prior art keywords
disease
genes
descriptors
invers
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/EP2012/072578
Other languages
French (fr)
Other versions
WO2013072346A9 (en
Inventor
Manfred Beleut
Karsten Henco
Holger Moch
Peter Schraml
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zurich Universitaet Institut fuer Medizinische Virologie
PAREQ AG
Original Assignee
Zurich Universitaet Institut fuer Medizinische Virologie
PAREQ AG
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zurich Universitaet Institut fuer Medizinische Virologie, PAREQ AG
Publication of WO2013072346A2
Publication of WO2013072346A9

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/30 Unsupervised data analysis
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/112 Disease subtyping, staging or classification
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/158 Expression markers
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations

Definitions

  • markers, which are frequently designated as biomarkers, are at hand that are characteristic of the disease in question and relate to relevant mechanisms, relevant clinical endpoints and relevant criteria for selecting proper treatment.
  • markers may be found on the DNA, the RNA or the protein level.
  • molecular markers as a diagnostic tool is relatively straightforward as one can use the aberration on the DNA level to predict whether the disease will develop with a certain probability or not. For example, trinucleotide expansions on the DNA level may be used to predict whether an individual will develop Huntington Chorea. Similarly, mutations in the Survival of Motor Neurons gene can be used to predict whether an individual will develop Spinal Muscular Atrophy.
  • markers of inflammation or ongoing apoptosis, markers of metabolic properties or molecular markers derived from mechanistic understanding of tumor induction, induced by deregulated balances between oncogenes such as Ras, Myc, CDKs and tumor suppressor genes such as p16, p27 or p53 (see e.g. Hanahan & Weinberg, "The Hallmarks of Cancer" (2000)).
  • tumor development mechanisms such as uncontrolled cellular growth, evasion of senescence and apoptosis, extravasation, invasion, and evasion of immune responses have further accentuated the tumor suppressor gene hypothesis.
  • Cancer, for example, is considered a prime example of multi-factorial diseases which arise from subtle to severe deregulation of complex molecular networks. In most cases, these diseases do not develop from a single gene mutation but rather result from the accumulation of mutations in various genes. Each single mutation may not be sufficient in itself to start disease development. Rather, accumulation of mutations over time seems to increasingly deregulate the complex molecular signaling networks within cells. In these cases, disease development has therefore usually been considered to be a gradual continuous process which cannot be characterized by key events. As a consequence, it is commonly assumed that such diseases cannot be diagnosed or classified by a single biomarker but only by a group of markers which ideally would reflect in a simplified manner the complex molecular mechanisms underlying the disease.
  • the human genome project together with all its spin-off projects such as analysis of individual genome varieties between individuals or just individual cells affected by a disease, analyses of respective transcriptomes, proteomes etc. were assumed to directly provide a large variety of useful biomarkers. Interestingly, most of these approaches have tried again to link the phenotypic differences observed for disease with distinct molecular pathways.
  • the present invention provides a strategic and direct approach to global and functional biomarkers of clinical relevance for essentially all kinds of tumors, such as for renal cell cancer or breast cancer and potentially non-tumor diseases, too.
  • tumors such as renal cell cancer or breast cancer being associated with discrete stable or meta-stable states which can be of clinical relevance
  • one is now able to define methods allowing the skilled person to not only identify and prove the existence of such discrete states for any kind of tumor such as renal cell cancer or breast cancer, but to assign such states with descriptors and signatures associated with such states.
  • the technology allows identification of a minimal set of those descriptors which unequivocally identify and discriminate each such discrete state from alternative states in a given tumor cell sample such as for renal cell cancer.
  • the invention is thus based on the surprising finding that diseases such as renal cell cancer or breast cancer can be characterized by discrete states, which reflect the underlying molecular mechanisms. Interestingly, these discrete states are distinct from one another so that disease development does not seem to be characterized by a continuous process. Rather, a discrete state seems to be maintained until a certain threshold level is reached when a switch to another discrete state occurs. Further, it seems that the discrete states may be linked to clinically and pharmacologically important parameters. However, they do not necessarily seem to coincide with standard histological classification schemes or other classification schemes.
  • a signature is a pattern reflecting the qualitative and/or quantitative appearance of at least one descriptor.
  • a signature is a pattern reflecting the qualitative and/or quantitative appearance of multiple descriptors.
  • Descriptors may in principle be any testable molecule, function, size, form or other parameter that can be linked to a cell. Descriptors may thus be e.g. genes or gene-associated molecules such as proteins and RNAs. The expression pattern of such molecules may define a signature. Such descriptors may also be designated as markers and marker sets.
  • the invention thus relates to at least one discrete disease-specific state for use as a diagnostic and/or prognostic marker in classifying samples from patients, which are suspected of being afflicted by renal cell cancer.
  • the invention further relates to at least one discrete disease-specific state for use as a diagnostic and/or prognostic marker in classifying cell lines of renal cell cancer.
  • the invention also relates to at least one discrete disease-specific state for use as a target for
  • the invention in one embodiment relates to at least one signature for use as a diagnostic and/or prognostic marker in classifying samples from patients which are suspected to be afflicted by a disease such as renal cell cancer.
  • the invention also relates to at least one signature for use as a diagnostic and/or prognostic marker in classifying cell lines of a disease such as renal cell cancer.
  • the invention further relates to at least one signature for use as a read out of a target for development, identification and/or screening of pharmaceutically active compounds.
  • the invention also relates to sets of descriptors which have been found to be predictive for a given discrete disease-specific state such as a renal cancer- or breast cancer-specific state, and to methods of identifying such sets or predictive descriptors for all states currently known for a specific disease.
  • These sets of predictive descriptors may relate to measurable properties such as determining expression by PCR, optionally by qPCR for a set of genes which is then considered to be the set of predictive descriptors. It is disclosed herein that a set of at least 6 genes for each state of renal cell cancer may be sufficient to assign a patient to one of the three known states in renal cell cancer with an accuracy of at least 65%.
  • the invention relates to methods of diagnosing a disease such as renal cell cancer or breast cancer by making use of signatures and discrete disease-specific states.
  • the invention also relates to methods of determining the responsiveness of a test population suffering from a disease such as a renal cell cancer or breast cancer towards a pharmaceutically active agent by making use of signatures and discrete disease-specific states.
  • the invention relates to methods of predicting the responsiveness of patients suffering from a disease such as renal cell cancer or breast cancer in clinical trials towards a pharmaceutically active agent by making use of signatures and discrete disease-specific states.
  • the invention also relates to methods of determining the effects of a potential pharmaceutically active compound by making use of signatures and discrete disease-specific states.
  • the invention also relates to methods for identifying signatures, discrete disease specific states and sets of predictive descriptors in samples which may be derived from patients or which may e.g. be cell lines.
  • the present invention discloses specific sets of descriptors, properties of which may be used to determine whether a specific state is present within a disease such as renal cell cancer. These properties may e.g. be the expression patterns of the descriptors which are described
  • the expression may be determined e.g. on the RNA or protein level.
  • the invention as described herein is not limited to these specific descriptors and descriptor sets. While determining the expression levels of the descriptors and descriptor sets as described hereinafter may provide a straightforward approach for classifying hyper-proliferative diseases such as renal cell cancer or breast cancer according to a new classification scheme, one can use different types of descriptors and read-outs to determine states. Methods for generally detecting states in hyper-proliferative diseases such as renal cancer, colorectal cancer etc. are described in PCT/EP2011/057691 and Beleut et al., BMC cancer (2012), 12:310.
  • All of these embodiments of the invention can be used in the context of diseases including hyper-proliferative diseases such as cancer and preferably in the context of renal cell cancer or breast cancer.
  • FIG. 12 Breast cancer-specific state-based patient stratification in combination defines responder cohort.
  • a group is defined to comprise at least a certain number of embodiments, this is also to be understood to disclose a group, which preferably consists only of these embodiments.
  • an indefinite or definite article is used when referring to a singular noun, e.g. “a”, “an” or “the”, this includes a plural of that noun unless something else is specifically stated.
  • Terms like “obtainable” or “definable” and “obtained” or “defined” are used interchangeably. This e.g. means that, unless the context clearly dictates otherwise, the term “obtained” does not mean to indicate that e.g. an embodiment must be obtained by e.g. the sequence of steps following the term “obtained” even though such a limited understanding is always included by the terms “obtained” or “defined” as a preferred embodiment.
  • the terms “about” or “approximately” denote an interval of accuracy that the person skilled in the art will understand to still ensure the technical effect of the feature in question.
  • the term typically indicates deviation from the indicated numerical value of ±10%, and preferably of ±5%.
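As a small illustration of the ±10% (preferably ±5%) convention above, the following check tests whether a measured value still counts as "about" a stated value. The function name `within_about` and its parameters are hypothetical; this is only a sketch of the definition, not something the document prescribes.

```python
def within_about(measured: float, stated: float, rel_tol: float = 0.10) -> bool:
    """Return True if `measured` deviates from `stated` by at most `rel_tol`.

    `rel_tol` is a fraction of the stated value: 0.10 mirrors the typical
    +/-10% reading of "about", 0.05 the preferred +/-5% reading.
    """
    if stated == 0:
        # A relative tolerance is undefined around zero; require an exact match.
        return measured == 0
    return abs(measured - stated) <= rel_tol * abs(stated)


# Example: is 70 "about" 65 under the two readings?
typical = within_about(70, 65)                   # 5 <= 6.5
preferred = within_about(70, 65, rel_tol=0.05)   # 5 <= 3.25 fails
```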
  • sample this always preferably refers to an extracorporeal sample.
  • histological phenotypes e.g. cancers such as lung cancer with specific expression patterns assuming that the different detectable phenotypes reflect continuous and progressive disease development.
  • Another example is renal cell cancer, where histological characterization has led to identification of clear cell, papillary and other types of renal cell cancer. The present invention is not using these standard approaches of the prior art.
  • a disease is characterized by switching to discrete disease-specific states. This suggests that de-regulation of regulatory networks within a cell can occur to a certain a threshold level without the overall discrete state being affected. However, once the threshold level has been exceeded cells seem to switch to another specific discrete state. These states can therefore be considered as stable or meta- stable in that they may allow for a certain degree of variation before they may switch. We understand a discrete state to reflect the flow and extent of interactions between and within different regulatory networks.
  • the extent and flow of interactions between and within different regulatory networks may be detectable by e.g. the expression level of e.g. proteins within such regulatory networks either on the RNA or protein level.
  • the molecular entities, which are looked at can be designated as descriptors.
  • the pattern, which is detected for a set of descriptors, can be considered as a signature.
  • the signature will be the expression pattern of proteins, which function as the descriptors.
  • One may thus look at expression levels of genes on the RNA level.
  • One may look at the regulation of miRNAs and one may even look at the qualitative distribution of descriptors such as the cellular localization of certain factors or the shape of a cell.
  • Identification of disease specific descriptors such as biomarkers may then be performed using SAM (Tusher et al, Proc Natl Acad Sci USA (2001) 98(9):5116-5121).
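For orientation, the core quantity in SAM is a per-gene "relative difference": the difference of group means divided by a pooled standard error plus a small constant s0 that damps genes with very low variance. The sketch below computes only that statistic for one gene. It omits the permutation-based false discovery rate estimation that the full SAM procedure uses, and the function name and the toy values are illustrative assumptions.

```python
from math import sqrt
from statistics import mean


def sam_relative_difference(group_a, group_b, s0=0.0):
    """Relative difference d = (mean_a - mean_b) / (s + s0) for one gene.

    group_a, group_b: expression values of the gene in two sample groups.
    s0: small positive "fudge" constant; 0.0 gives a plain pooled t-like score.
    Raises ZeroDivisionError if both groups are constant and s0 is 0.
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = mean(group_a), mean(group_b)
    # Pooled sum of squared deviations across both groups.
    ss = sum((x - mean_a) ** 2 for x in group_a) + sum((x - mean_b) ** 2 for x in group_b)
    scale = (1.0 / n_a + 1.0 / n_b) / (n_a + n_b - 2)
    s = sqrt(scale * ss)
    return (mean_a - mean_b) / (s + s0)


# Toy values, made up for illustration only.
d_plain = sam_relative_difference([5, 6, 7], [1, 2, 3])
d_damped = sam_relative_difference([5, 6, 7], [1, 2, 3], s0=1.0)
```

A positive s0 always pulls the score towards zero, which is why `d_damped` is smaller than `d_plain` here.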
  • This approach can be used to identify signatures and states not only for renal cell cancer, but also for breast cancer, colorectal cancer, lung cancer, etc.
  • the approach is thus generally applicable by subjecting expression data obtained from different patients to this unsupervised two-way hierarchical clustering approach.
  • Identification of signatures and states may be best performed by first extracting descriptors such as expressed genes for certain pathways using the Panther software as described in PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310 and subjecting these pathway-specific sets of genes to unsupervised two-way hierarchical clustering.
  • the groups of descriptors, e.g. the genes identified for the different pathways, may then be combined and again subjected to an unsupervised two-way hierarchical clustering approach against the same tumor sets. This two-fold unsupervised two-way hierarchical clustering will reveal in a straightforward manner whether a certain disease can be classified into different disease-specific states as described herein.
  • the unsupervised two-way hierarchical clustering approach, and preferably the two-fold application thereof as described in PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310, allows identification of disease-specific states in different diseases such as renal cell cancer, breast cancer, ovarian cancer, colorectal cancer, lung cancer, prostate cancer, brain cancer, hepatocellular carcinoma, acute myeloma, pheochromocytoma, Burkitt's lymphoma, myeloma or Parkinson's disease.
  • the set of descriptors such as e.g. the set of expressed genes which can be used to distinguish the different states can be determined by this approach.
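To make the idea of two-way clustering concrete, the sketch below clusters a small expression matrix along both axes: once over genes (rows) and once over samples (columns). It is a minimal stand-in under stated assumptions, not the procedure of the cited references: Euclidean distance, average linkage and cutting the tree at a fixed number of clusters are choices made here for illustration, and the matrix values are made up. It shows a single pass; the two-fold application described above would repeat the same step on the combined pathway gene groups.

```python
from math import sqrt


def euclidean(u, v):
    """Euclidean distance between two equally long vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage_clusters(vectors, n_clusters):
    """Agglomerative clustering; returns clusters as sorted lists of indices."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_i, index_j) of the closest pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dists = [euclidean(vectors[a], vectors[b])
                         for a in clusters[i] for b in clusters[j]]
                d = sum(dists) / len(dists)  # average linkage
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return sorted(clusters)


def two_way_clustering(matrix, n_gene_clusters, n_sample_clusters):
    """Cluster rows (genes) and columns (samples) of the same matrix."""
    columns = [list(col) for col in zip(*matrix)]
    gene_clusters = average_linkage_clusters(matrix, n_gene_clusters)
    sample_clusters = average_linkage_clusters(columns, n_sample_clusters)
    return gene_clusters, sample_clusters


# Toy matrix (rows = genes, columns = patient samples), illustrative only.
expression = [
    [9.0, 8.5, 1.0, 1.2],
    [8.8, 9.1, 0.9, 1.1],
    [1.0, 1.3, 7.9, 8.2],
    [0.8, 1.1, 8.4, 8.0],
]
gene_groups, sample_groups = two_way_clustering(expression, 2, 2)
```

In this toy case the sample groups play the role of candidate states and the gene groups the role of candidate descriptor sets; with real data, dedicated libraries with dendrogram output would normally be used instead.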
  • a set of predictive descriptors such as a set of genes can be identified which upon analysis by PCR analysis, optionally by quantitative PCR (qPCR) analysis allows assignment of a discrete disease-specific state in a patient sample.
  • This information may be obtainable by the unsupervised two-way hierarchical clustering approach, and preferably by the two-fold application thereof as described in PCT/EP2011/057691 and Beleut et al., BMC cancer (2012), 12:310. From the set of descriptors which are identifiable by this approach, one can then select the set of predictive descriptors for a given disease-specific state, which are e.g. testable by PCR, optionally by qPCR, following the selection criteria mentioned hereinafter.
  • papillary RCCs of different patients may be characterized by different discrete molecular states and that the patients may thus have different survival expectations even though their cancers have been classified as comparable by histological standards. It follows from the invention as laid out hereinafter that the same discrete state can be characterized through different signatures. Thus, a novel interpretation of renal cell cancer is suggested, based on the signatures described hereinafter. The finding that a hyper-proliferative disease such as renal cell cancer can be characterized by different discrete renal cell cancer-specific states has important implications.
  • the discrete disease-specific state(s) may be used to classify patients and samples thereof as falling within distinct groups. As the discrete renal cell cancer-specific state may moreover be linked to clinically important parameters such as survival time or responsiveness to distinct drugs, this will help selecting therapeutic regimens.
  • the discrete molecular state(s) may thus be used as diagnostic and/or prognostic markers providing a new way of classifying renal cell cancer into clinically relevant subgroups etc.
  • a drug can be shown to act preferentially only in a selected group of patients which suffer from e.g. a subtype of renal cell cancer or breast cancer and which are characterized by the same discrete disease-specific state of interacting molecular networks, then this drug may be tested in other patients which suffer from a different disease, but are characterized by the same discrete molecular state. Further, clinical trials, which led to ambiguous results for a disease such as renal cell cancer or breast cancer, may be reassessed by regrouping patients according to their status as described herein.
  • the discrete states thus provide a stratifying tool for the testing of pharmacological treatments as it allows grouping of patients for clinical trials. Assuming a drug candidate is identified which is expected or hoped to positively influence the critical parameter of survival time in renal cell cancer substantially, this needs to be proven by clinical trials in order to receive FDA approval. Future drugs will likely focus on mechanistic intervention. If the mechanistically active drug is successful for the clinical end point parameter "survival time", it probably interacts selectively with mechanisms linked to the parameter "survival time”. These mechanistic subgroups are exactly those defined by e.g. the discrete molecular states enabled by this invention. It is thus fair to believe, that most probably one subgroup of patients reacts positively to a different degree than another subgroup does.
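The regrouping of trial results by discrete state can be sketched as follows. The example computes a response rate per state and compares it with the pooled rate, which is where a state-specific effect can be hidden. The function name, the state labels and the patient records are all hypothetical toy data, not results from the document.

```python
from collections import defaultdict


def response_rate_by_state(patients):
    """Return {state: fraction of responders} for (state, responded) records."""
    counts = defaultdict(lambda: [0, 0])  # state -> [responders, total]
    for state, responded in patients:
        counts[state][1] += 1
        if responded:
            counts[state][0] += 1
    return {state: r / n for state, (r, n) in counts.items()}


# Made-up trial records: (assigned state, responded to drug?)
trial = [
    ("A", True), ("A", True), ("A", False),
    ("B", False), ("B", False),
    ("C", True), ("C", False),
]
rates = response_rate_by_state(trial)
overall = sum(1 for _, responded in trial if responded) / len(trial)
```

Here the pooled rate is 3 of 7, while the per-state view separates a mostly responding group from a non-responding one. With real data, group sizes and statistical testing would of course have to be considered before drawing conclusions.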
  • the knowledge about discrete disease-specific states may also allow using these states as targets during development of pharmaceutical products.
  • different renal cell cancer specific states may be linked to clinically relevant parameters such as survival time or response rate to a certain drug. If an agent is shown to switch the discrete disease-specific state in a sample or in a cell line from a state, which is linked to short survival time, into a state with long survival time, such a switch may be used as an indication that the agent may be therapeutically effective in treating the disease in question.
  • assays can be designed which make use of the correlation between a discrete renal cell cancer-specific state and e.g. the associated clinical parameter. The fact that one now knows that e.g.
  • discrete renal cell cancer- or discrete breast cancer-specific states exist and drive disease development in at least some of its aspects allows one to identify signatures of descriptors, which can then be used in a diagnostic test to classify renal cell cancer or breast cancer.
  • signatures of descriptors thus serve as a read-out for the classification of a disease or its subtype.
  • a preferred read-out for signatures and states of renal cell cancer or breast cancer may be the expression of the descriptors and descriptor sets described herein. From a practical perspective, the read out may be implemented in the form of ELISA assay, array technology, kits and all other types of devices and methods that allow determining expression of the descriptors and descriptor sets as described herein.
  • the invention thus also relates to such kits, assays, arrays etc. as well as to the use of such kits, assays, arrays etc. as mentioned herein.
  • the read out for such states may be sets of predictive descriptors such as genes which can be tested by PCR, optionally by qPCR. The assignment of a disease-specific state based on the PCR- or qPCR-measurements is then done based on the calculations described hereinafter.
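The actual calculation used for state assignment is described later in the document and is not reproduced here. Purely to illustrate the general shape of such a read-out, the sketch below assigns a sample to the state whose predictive gene set shows the highest mean expression. This scoring rule is a simple stand-in chosen for the example, and the gene names, the expression values and the function `assign_state` are hypothetical.

```python
from statistics import mean


def assign_state(expression, state_gene_sets):
    """Return (best_state, scores) using mean expression per gene set.

    expression: {gene name: normalized expression value} for one sample.
    state_gene_sets: {state label: list of predictive gene names}.
    """
    scores = {}
    for state, genes in state_gene_sets.items():
        missing = [g for g in genes if g not in expression]
        if missing:
            raise KeyError(f"no measurement for genes: {missing}")
        scores[state] = mean(expression[g] for g in genes)
    best_state = max(scores, key=scores.get)
    return best_state, scores


# Three states with six placeholder genes each (names are invented).
state_gene_sets = {s: [f"{s}_gene{i}" for i in range(1, 7)] for s in "ABC"}

# Invented sample: baseline expression everywhere, elevated in the B set.
sample = {g: 1.0 for genes in state_gene_sets.values() for g in genes}
for g in state_gene_sets["B"]:
    sample[g] = 4.0

state, scores = assign_state(sample, state_gene_sets)
```

Six genes per state are used only because the text mentions sets of at least six genes; any real assignment would rely on the descriptor sets and calculations given in the document itself.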
  • A, B, C and D are e.g. the expression patterns, i.e. the signatures of a limited set of descriptors, i.e. genes.
  • Each state may be best described by a signature arising from properties of a group of descriptors, which may also be designated as a descriptor set, such as the expression pattern of the group of genes described in Tables 1, 2, 3, and 4.
  • Each group of genes defines a descriptor set.
  • the expression pattern of each group of genes further provides a signature which is indicative of a renal cell cancer-specific disease state.
  • the expression pattern may be determined by different methods such as ELISA, Western Blotting, RNA expression analysis. It is to be understood that the nomenclature A, B and C refers to the same types of states as described in PCT/EP2011/057691, even though they are described by different signatures, namely in the present case by the expression pattern of different sets of descriptors.
  • the discrete disease-specific states may reflect the aggressiveness of the tumor.
  • the read-out for these four discrete molecular states which are designated hereinafter as A, B, C and D are e.g. the expression patterns, i.e. the signatures of a limited set of descriptors, i.e. genes.
  • Each state may be best described by a signature arising from properties of a group of descriptors, which may also be designated as a descriptor set, such as the expression pattern of the group of genes described in Tables 1, 2, 3, and 4.
  • Each group of genes defines a descriptor set.
  • the expression pattern of each group of genes further provides a signature which is indicative of a breast cancer-specific disease state.
  • the expression pattern may be determined by different methods such as ELISA, Western Blotting, RNA expression analysis.
  • State means a stable or meta-stable constellation of a cell and/or cell population which is identifiable in at least two biological samples from at least two patients and which can be described by means of a single descriptor or multiple descriptors on the cellular or molecular level referenced against a standard state. As explained hereinafter, such state can be characterized by at least one or various signatures. Such signatures may be reflected by the expression of genes relative to each other.
  • different states refer to different stable and metastable constellations of a cell, meaning that these constellations are distinct from each other in terms of the kind and extent of molecules of at least two regulatory networks interacting within a cell.
  • Different states can be characterized by a limited set of descriptors giving rise to different signatures. They may therefore also be designated a "discrete molecular state”.
  • a state is indicative of a disease, it may be designated as "disease specific molecular state" such as renal cell cancer-specific state.
  • a disease specific state may be linkable to clinically relevant parameters such as survival rate, therapy responsiveness, and the like.
  • a state which can be found in healthy human or animal subjects may be designated as "healthy state”.
  • discrete disease specific state preferably allows distinguishing different subtypes of a disease according to a new classification scheme which links the subtype being characterized by a discrete disease specific state to clinically or pharmacologically important parameters.
  • clinically or pharmacologically relevant parameters preferably relate to efficacy-related parameters as they are typically analyzed in clinical trials. They thus do not necessarily relate to a change in the histological appearance of a disease, but rather to important clinical end points such as average survival time, progression-free survival time, responsiveness to a certain drug, subjective patient- or physician-rated improvements making use of established scale systems, tolerability and adverse events. The terms also include responsiveness towards treatment.
  • Descriptor means a measurable parameter on the molecular or cellular level which can be detected in terms of, but not limited to existence, constitution, quantity, localization, co-localization, chemical derivative or other physical property.
  • a descriptor thus reports at least one qualitative and/or quantitative measuring parameter of, but not limited to existence, kinetic variation, clustering, cellular localization or co-localization of at least one specific mRNA, processing or maturation derivatives of at least one specific mRNA, specific DNA-motifs, variants or chemical derivatives of such motifs, such as but not limited to methylation pattern, miRNA motifs, variants or chemical derivatives of such miRNA motifs, proteins or peptides, processing variants or chemical derivatives of such proteins or peptides or any combination of the foregoing.
  • a descriptor may be a protein the over- or under-expression of which can be used to describe a discrete disease-specific state vs. a different discrete disease-specific state or vs. the discrete healthy state. If different proteins, i.e.
  • a set of descriptors may comprise expression data for a first set of proteins, data on post-translational modifications of a second set of proteins and data for a group of miRNAs.
  • the measurable parameter of a descriptor is the expression level of a protein and/or gene which may be determined e.g. on the protein and/or RNA level by methods known in the art such as Western Blotting, ELISA, immunoassays, Northern Blotting, array expression analysis etc.
  • Preferred descriptors include genes and gene-related molecules such as mRNAs or proteins.
  • the “qualitative” detection of a descriptor refers preferably to e.g. determining the localization of a descriptor such as a protein, an mRNA or miRNA within e.g. a cell. It may also refer to the size and/or the shape of a cell.
  • the “quantitative" detection of a descriptor refers preferably to e.g. determining the presence and preferably the amount of a descriptor within a given sample.
  • the quantitative measurement of a descriptor relates to detecting the amount of genes and gene-related molecules such as mRNAs or proteins.
  • “Signature” means a pattern of a set of at least two experimentally detectable and/or quantifiable descriptors with the pattern being a characteristic description for a discrete state.
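The relationship between the three terms can be summarized as a small data model: descriptors are measurable parameters, a signature is a pattern over at least two descriptors, and a state is described by one or more signatures. The class and field names below are hypothetical and only mirror the definitions given here; the gene names and values are placeholders.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Descriptor:
    """A measurable parameter on the molecular or cellular level."""
    name: str
    kind: str  # e.g. "mRNA", "protein", "miRNA"


@dataclass(frozen=True)
class Signature:
    """Pattern of (descriptor, measured value) pairs characteristic of a state."""
    pattern: Tuple[Tuple[Descriptor, float], ...]

    def __post_init__(self):
        # Follows the definition above: a signature spans at least two descriptors.
        if len(self.pattern) < 2:
            raise ValueError("a signature needs at least two descriptors")


@dataclass(frozen=True)
class State:
    """Discrete state, characterized by one or several signatures."""
    label: str
    signatures: Tuple[Signature, ...]


# Placeholder descriptors and values, for illustration only.
gene_x = Descriptor("GENE_X", "mRNA")
gene_y = Descriptor("GENE_Y", "mRNA")
signature = Signature(pattern=((gene_x, 2.5), (gene_y, 0.4)))
state_a = State(label="A", signatures=(signature,))
```

Allowing several signatures per state reflects the statement that the same discrete state can be characterized through different signatures.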
  • the term “diseases” relates to all types of diseases including hyper-proliferative diseases. The term reflects all stages of a disease, e.g. the formation of a disease including initial stages, the development of a disease including the spreading of a disease, the stages of manifestation, the maintenance of a disease, the surveillance of a disease etc.
  • Examples of diseases include Parkinson's disease, Alzheimer's disease, etc.
  • hyper-proliferative diseases relate to all diseases associated with the abnormal growth or multiplication of cells.
  • a hyper-proliferative disease may be a disease that manifests as lesions in a subject.
  • Hyper-proliferative diseases include benign and malignant tumors of all types, but also diseases such as hyperkeratosis and psoriasis.
  • Tumor diseases include cancers such as lung cancer (including non-small cell lung cancer), kidney cancer, bowel cancer, head and neck cancer, colo(rectal) cancer, glioblastoma, breast cancer, prostate cancer, skin cancer, melanoma, non-Hodgkin lymphoma and the like.
  • Other cancers include ovarian cancer, hepatocellular carcinoma, acute myeloid leukemia, pheochromocytoma, Burkitt's lymphoma and melanoma.
  • a preferred hyper-proliferative disease is renal cell cancer.
  • cancers considered are as defined according to the International Classification of Diseases in the field of oncology (see
  • Such cancers include epithelial carcinomas such as epithelial neoplasms; squamous cell neoplasms including squamous cell carcinoma; basal cell neoplasms including basal cell carcinoma; transitional cell papillomas and carcinomas; adenomas and adenocarcinomas (glands) including adenoma, adenocarcinoma, linitis plastica, insulinoma, glucagonoma, gastrinoma, vipoma, cholangiocarcinoma, hepatocellular carcinoma, adenoid cystic carcinoma, carcinoid tumor, prolactinoma, oncocytoma, Hürthle cell adenoma, renal cell carcinoma, Grawitz tumor, multiple endocrine adenomas, endometrioid adenoma; adnexal and skin appendage neoplasms; mucoepider
  • nevi and melanomas including melanocytic nevus, malignant melanoma, melanoma, nodular melanoma, dysplastic nevus, lentigo maligna melanoma, sarcoma and mesenchymal derived cancers, superficial spreading melanoma and acral lentiginous malignant melanoma.
  • sample typically refers to a human or individual that is suspected to suffer from e.g. a hyper-proliferative disease. Such individuals may be designated as patients. Samples may thus be tissue, cells, saliva, blood, serum, etc.
  • cell lines will designate cell lines which are either primary cell lines which were developed from patients' samples or which are typically considered to be representative for a certain type of hyper-proliferative disease. It is to be understood that all methods and uses described herein in one embodiment may be performed with at least one step and preferably all steps outside the human or animal body. If it is therefore e.g. mentioned that "a sample is obtained", this means that the sample is preferably provided in a form outside the human or animal body, i.e. as an extracorporeal sample.
  • sample, tissue etc. in the context of the present invention preferably relates to renal cell cancer tissue or breast cancer tissue. It will be first described how signatures can be identified in accordance with the invention. It is to be understood that a signature will be indicative of a discrete disease-specific state.
  • signatures and discrete disease-specific states can be identified by analyzing the quality and/or quantity of descriptors from at least two different regulatory networks for a multitude of samples from either patients with a hyper-proliferative disease such as renal cell cancer or cell lines of a hyper-proliferative disease such as renal cell cancer, as was described in PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310 for renal cell carcinoma.
  • This data is then analyzed for certain patterns by (i) grouping the data for the quality and/or quantity across descriptors and (ii) grouping samples or cell lines in a second step for similarities of the quality and/or quantity of descriptor across all potential descriptors.
  • the present invention describes yet another algorithm-based approach for identifying states, signatures and descriptors such as the expression patterns of distinct groups of genes for renal cell cancer, breast cancer or other diseases.
  • This method has led to identification of different sets of descriptors of states A, B and C known from PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310 and a new state D.
  • This method may, however, also be applied to other tumors such as breast cancer. It is to be understood that the overall group of analyzed descriptors (such as the expression of all genes) does not necessarily have to yield different signatures.
  • a chosen set of descriptors may only yield one signature. This will thus indicate that the disease examined has only one discrete disease-specific state.
  • this assumes that the analysis has been performed with a comprehensive set of samples covering all relevant types of a disease such as renal cell cancer.
  • the overall group of descriptors may also yield multiple signatures such as 2, 3, 4, 5 or more signatures.
  • the number of signatures will indicate the number of discrete disease-specific states that can be observed on this level of resolution for a disease. For example, if one analyzes a comprehensive set of samples for renal cell cancer or breast cancer and identifies e.g. four signatures, this means that renal cell cancer or breast cancer can be characterized by four discrete disease-specific states. For each state, one may then select one signature and thus one set of descriptors that allows determining the respective state.
  • tables 1, 2, 3 and 4 for example describe sets of descriptors (genes and proteins), the expression of which can be used to determine whether the renal cell cancer of a particular sample and thus e.g. patient is characterized by state A, B, C or D.
  • tables 5, 6, 7 and 8 for example further describe sets of descriptors (genes and proteins), the expression of which can be used to determine whether the breast cancer of a particular sample and thus e.g. patient is characterized by state A, B, C or D.
  • a discrete disease-specific state may be described through multiple signatures depending on what type and combination of descriptors have been used for identifying the signatures.
  • one can identify groups by grouping samples according to the similarity of a parameter which is attributable to a descriptor (such as expression) over a complete set or over a subset of genes or gene-associated molecules, wherein the similarity is preferably measured using a statistical distance measure such as Euclidean distance, Pearson correlation, Spearman correlation, or Manhattan distance.
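The four similarity measures named above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the implementation used in the cited work; the function names, the helper `nearest_sample` and the sample names in the usage note are assumptions made for the example.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Manhattan (city-block) distance between two expression profiles."""
    return sum(abs(a - b) for a, b in zip(x, y))


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def nearest_sample(query, samples, distance=euclidean):
    """Hypothetical helper: name of the sample closest to `query`.

    Only meaningful for true distances (smaller = more similar); for the
    correlation measures one would pass e.g. lambda x, y: 1 - pearson(x, y).
    """
    return min(samples, key=lambda name: distance(query, samples[name]))
```

For instance, `nearest_sample([1.1, 2.1, 2.9], {"s1": [1.0, 2.0, 3.0], "s2": [9.0, 8.0, 7.0]})` picks `"s1"`. Correlation measures compare the shape of a profile, whereas Euclidean and Manhattan distance are also sensitive to absolute expression levels, so the choice of measure can change which samples are grouped together.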
  • the invention wherever it mentions methods of identifying discrete disease-specific states, signatures etc. always considers that the quality and/or quantity of descriptors has to be tested. This testing may include technical means such as use of e.g. micro-arrays to determine expression of genes. If the invention considers applying such methods by relying on and using data which are indicative of the quality and/or quantity of descriptors and which are deposited in e.g. databases after they have been determined using technical means, these methods will be run on technical devices such as a computer. All methods as they are described herein for identifying discrete disease-specific states, signatures etc. may therefore be performed in a computer-implemented way.
  • signatures and discrete renal cell cancer- or breast cancer-specific states can be used for diagnostic, prognostic, analytical and therapeutic purposes. These aspects will be discussed in parallel for discrete renal cell- and breast cancer-specific states and signatures as if these terms were interchangeable. It has, however, to be borne in mind that a discrete renal cell cancer- or breast cancer-specific state can be described through various signatures, depending on the type and combinations of descriptors chosen. If in the following the term signature is used, this is thus meant to incorporate all signatures and descriptor types that can be used to describe a single discrete renal cell cancer- or breast cancer-specific state.
  • the invention as mentioned relates to discrete disease-specific states such as discrete renal cell cancer-specific states for use as a diagnostic and/or prognostic marker in classifying samples from patients, which are suspected of being afflicted by a disease, optionally by a hyper-proliferative disease such as renal cell cancer or breast cancer.
  • the invention also relates to discrete disease-specific states such as discrete renal cell cancer-specific states for use as a diagnostic and/or prognostic marker in classifying cell lines of a disease, optionally of a hyper-proliferative disease such as renal cell cancer or breast cancer.
  • the invention further relates to discrete disease-specific states such as discrete renal cell cancer- or breast cancer-specific states for use as a target for development of pharmaceutically active compounds.
  • the invention also relates to signatures for use as a diagnostic and/or prognostic marker in classifying samples from patients, which are suspected of being afflicted by a disease, optionally by a hyper-proliferative disease such as renal cell cancer or breast cancer, wherein the signature comprises a qualitative and/or quantitative pattern of at least one descriptor and wherein the signature is indicative of a discrete disease-specific state such as a discrete renal cell cancer- or discrete breast cancer-specific state.
  • the invention also relates to signatures for use as a diagnostic and/or prognostic marker in classifying cell lines of a disease, optionally of a hyper-proliferative disease such as renal cell cancer or breast cancer wherein the signature comprises a qualitative and/or quantitative pattern of at least one descriptor and wherein the signature is indicative of a discrete disease-specific state such as a discrete renal cell cancer- or discrete breast cancer-specific state.
  • the invention relates to signatures for use as a read out for a target in the development, identification and/or application of pharmaceutically active compounds, wherein the signature comprises a qualitative and/or quantitative pattern of at least one descriptor and wherein the signature is indicative of a discrete disease-specific state such as a discrete renal cell cancer- or discrete breast cancer-specific state.
  • the target may be the discrete disease specific state which is reflected by the signature.
  • the discrete disease-specific states such as discrete renal cell cancer- or discrete breast cancer-specific states and signatures relating thereto can be used for diagnostic purposes.
  • samples of patients suffering from a disease such as a hyper-proliferative disease, e.g. renal cell cancer or breast cancer, may be analyzed for their discrete disease-specific states and classified accordingly.
  • the importance of discrete disease-specific states for classifying samples and thus for diagnosing patients becomes clear from the experiments on RCCs as described in PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310.
  • Renal cell cancer
  • the present invention provides further evidence that the discrete renal cell cancer-specific states A, B, C and D as reflected by the expression pattern of the descriptors of tables 1, 2, 3 and 4 (see also Experiment 2) are indeed biologically relevant. It was assumed that potential differences, possibly representing functional or metabolomic irregularities among states, might become evident when the best state descriptors for each such state are analyzed by means of bioinformatics according to functional, known and predicted protein-protein interactions. To this end STRING
  • the present invention in one aspect thus relates to a method of diagnosing, stratifying and/or screening a hyper-proliferative disease such as renal cell cancer in at least one patient, which is suspected of being afflicted by said disease, or in at least one cell line of said disease, comprising at least the steps of:
  • step b. Allocating a discrete disease-specific state to said sample based on the signature determined in step b.).
  • the sample may be a tumor sample of renal cell cancer.
  • There may be different ways to test for a signature. If the signature is not known yet, one may identify it as described above. If the signature is already known, one can test for it by analyzing the quality and/or quantity of descriptors that were used for identification of the signature. One can also use optimized signatures which allow best differentiation between different states. If for example the signature is based on expression data for a set of given genes or gene-associated molecules such as RNAs or proteins, one can test for a signature by simply determining the expression pattern for this set of molecules. This may be done by standard methods such as microarray expression analysis. One way of determining a signature is to test for the expression pattern of the descriptor sets of tables 1, 2, 3 and 4.
  • If the descriptor sets show an expression profile as described below, one can allocate a signature and thus state A, B, C or D to the respective sample. If one has identified the signature, one also knows the discrete disease specific state which correlates with this signature. Using such methods one can thus classify patient samples by common molecular mechanisms that lead to the same discrete disease specific molecular states.
  • the invention preferably relates in one embodiment to identifying discrete disease specific states and preferably discrete renal cell cancer-specific states by analyzing a hyper-proliferative disease such as renal cell cancer for signatures being indicative of discrete disease specific states as described above.
  • This analysis will be performed for a specific type of hyper-proliferative disease such as e.g. renal cell cancer.
  • the diseases may be identified by common selection criteria such as the organs being affected. However, initially no attention will be given to sub-classifications of these hyper-proliferative diseases, which are based on e.g.
  • discrete disease specific states for a disease like e.g. RCC, lung cancer, breast cancer, or as in the present case renal cell cancer, etc
  • the discrete disease specific state therefore usually allows one to directly predict which sub-type of the disease in question is developing (e.g. state A, B, C or D for renal cell cancer, RCC, lung cancer (see also PCT/EP2011/057691)).
  • subtypes are correlated with e.g. clinically relevant parameters such as survival time.
  • discrete disease specific state preferably allows distinguishing different subtypes of a disease according to a new classification scheme, which links the subtype to clinically or pharmacologically important parameters.
  • discrete disease specific states exist in diseases and can be correlated with subtypes that are characterized not necessarily by their histological properties but by clinically or pharmacologically relevant parameters thus allows deciphering disease through a new code which is based on the discrete disease specific states, substates and levels.
  • the possibility of assigning a discrete disease-specific state to samples allows analyzing the effectiveness of treatments with specific drugs. For example, one can test a patient or a population of patients suffering from a hyper-proliferative disease for (i) their reaction towards treatment with a pharmaceutically active agent and (ii) for their discrete disease specific molecular state.
  • the reaction towards treatment may be measured by e.g. the quality of and quantity of clinical
  • the invention in one aspect thus relates to a method of determining the responsiveness of at least one human or animal individual which is suspected of being afflicted by a hyper-proliferative disease, preferably by renal cell cancer, towards a pharmaceutically active agent, comprising at least the steps of:
  • the signature may be tested for as described above.
  • the sample may be a tumor sample such as renal cell cancer.
  • One way of determining a signature is to test for the expression pattern of the descriptor sets of tables 1, 2, 3 and 4. If the descriptor sets show an expression profile as described below, one can allocate a signature and thus state A, B, C or D to the respective sample. Being able to predict the responsiveness of e.g. patients with a discrete disease specific state towards treatment is helpful in many aspects. For example, if such responsiveness is known, one can pre-select patients for treatment. Identification of signatures and discrete disease specific states can thus serve as companion diagnostics, which allow pre-selecting patients for effective treatment.
  • the invention in one embodiment thus relates to a method of predicting the responsiveness of at least one patient which is suspected of being afflicted by a hyper-proliferative disease, preferably by renal cell cancer, towards a pharmaceutically active agent, comprising at least the steps of: a. Determining whether a correlation exists between effects on disease symptoms and/or discrete disease-specific states and the initial discrete disease-specific states as a consequence of administration of a pharmaceutically active agent as described above;
  • step d Comparing the discrete disease-specific state of the sample in step c. vs. the discrete disease-specific state for which a correlation has been determined in step a.);
  • the signature may be tested for as described above.
  • the sample may be a tumor sample such as renal cell cancer.
  • One way of determining a signature is to test for the expression pattern of the descriptor sets of tables 1, 2, 3 and 4. If the descriptor sets show an expression profile as described below, one can allocate a signature and thus state A, B, C or D to the respective sample.
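The comparison underlying such a prediction can be sketched as follows: response rates are tabulated per initial state from previously treated individuals, and a new sample's state is then looked up in that table. This is only a sketch under stated assumptions; the function names, the record format and the cut-off `min_rate` are hypothetical and not taken from the description.

```python
from collections import defaultdict


def response_rate_by_state(records):
    """Fraction of responders per initial discrete state.

    `records` is an iterable of (initial_state, responded) pairs, where
    `responded` is True if a favorable effect was observed after treatment.
    """
    counts = defaultdict(lambda: [0, 0])  # state -> [responders, total]
    for state, responded in records:
        counts[state][1] += 1
        if responded:
            counts[state][0] += 1
    return {state: r / t for state, (r, t) in counts.items()}


def predict_responsive(state, rates, min_rate=0.5):
    """Predict responsiveness for a sample with the given state.

    Returns None if no correlation data exist for that state.
    `min_rate` is an assumed, illustrative cut-off.
    """
    if state not in rates:
        return None
    return rates[state] >= min_rate
```

With the purely illustrative records `[("A", True), ("A", True), ("A", False), ("B", False), ("B", False)]`, a new sample in state A would be predicted responsive and one in state B would not, while a sample in state D would yield `None` because no correlation has been determined for it.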
  • samples from patients can be characterized as to their discrete disease specific states. Further, cell lines of diseases may also display such discrete disease specific states. It is assumed that a pharmaceutically active agent towards which a patient with a discrete disease specific state is responsive may in some instances induce a switch to another discrete disease specific state (see in this respect PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310). This other discrete disease specific state may either be a completely new discrete disease specific state or it may be a discrete disease specific state which has been found in other patients.
  • a pharmaceutically active agent may induce a switch from a discrete disease specific state which is correlated with low average survival times to a discrete disease specific state which is correlated with a longer average survival time.
  • the discrete disease specific states and signatures relating thereto may be identified as described above.
  • the target on which the pharmaceutically active agent would act is thus the discrete disease specific state.
  • the discrete disease specific states are thus considered to be targets of pharmaceutically active agents.
  • the invention in one embodiment therefore relates to a method of determining the effects of a pharmaceutically active compound, comprising at least the steps of: a. Providing a sample of at least one human or animal individual which is suspected of being afflicted by a hyper-proliferative disease, preferably by renal cell cancer, or a cell line of said disease before a pharmaceutically active agent is applied;
  • the signature may be tested for as described above.
  • the sample may be a tumor sample such as renal cell cancer.
  • One way of determining a signature is to test for the expression pattern of the descriptor sets of tables 1, 2, 3 and 4. If the descriptor sets show an expression profile as described below, one can allocate a signature and thus state A, B, C or D to the respective sample.
  • the effects that are determined by this method may e.g. allow identification of compounds which may have a positive influence on the disease if e.g. a switch to a discrete disease specific state correlated with a more favorable clinical parameter such as increased survival time is observed.
  • the methods may, however, also allow identification of toxic compounds if these compounds induce a switch to a discrete disease specific state correlated with a less favorable clinical parameter such as decreased survival time.
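The read-out described above, i.e. whether a compound shifts a sample to a state with a more or less favorable clinical parameter, can be sketched as a simple comparison. The function name and return labels are assumptions for illustration; the survival fractions in the usage note are the approximate 60-month figures given in this description for states A and C.

```python
def classify_switch(state_before, state_after, survival):
    """Label a state switch by comparing a survival parameter of both states.

    `survival` maps a state to the fraction of patients alive at a fixed
    time point. The labels returned here are illustrative only.
    """
    if state_before == state_after:
        return "no switch"
    if survival[state_after] > survival[state_before]:
        return "favorable"
    if survival[state_after] < survival[state_before]:
        return "unfavorable"
    return "neutral"
```

For example, with `survival = {"A": 0.80, "C": 0.40}`, a switch from C to A is labeled favorable (a candidate active compound), whereas a switch from A to C is labeled unfavorable (a candidate toxic compound).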
  • These methods may thus be used as assays in the development, identification and/or screening of potential pharmaceutically active compounds, e.g. to determine the potential effectiveness of a pharmaceutically active compound in a disease such as a hyper-proliferative disease. These assays may also be used for determining the toxicity of a pharmaceutically active compound.
  • Such discrete state-related assay systems for active and/or toxic drug candidates could be of enormous value to identify new pharmaceuticals.
  • the switch in state monitored by a switch in signature marks an interesting screening system as a general "read out" for changing a tumor status. So the "read out" is related to functional efficacy rather than to blocking a certain molecular target that is not necessarily related to tumor function.
  • Such screening system would simply pick up any compound switching the state irrespective of the molecular target of interaction.
  • Such screening resembles assays interfering with virus propagation in cell cultures rather than screening for inhibitors of a certain viral enzyme such as reverse transcriptase.
  • the present invention in general thus relates to states, signatures and descriptors for use in diagnosing, stratifying, screening and/or prognosing human or animal individuals suspected of suffering from or suffering from renal cell cancer.
  • the present invention further relates to immunoassays, kits, arrays, and other types of equipment which allow determining the state of human or animal individuals being suspected of suffering from or suffering from renal cell cancer.
  • the signature may be tested for as described above.
  • the sample may be a tumor sample such as renal cell cancer.
  • One way of determining a signature is to test for the expression pattern of the descriptor sets of tables 1, 2, 3 and 4. If the descriptor sets show an expression profile as described below, one can allocate a signature and thus state A, B, C or D to the respective sample.
  • the present invention thus also relates to a microarray comprising specifically the sets of descriptors of tables 1, 2, 3 and 4 either alone or in combination.
  • the array comprises preferably at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15 or at least 20 descriptors of table 1, 2, 3, and/or 4.
  • the present invention also relates to an immunoassay or ELISA kit allowing for determining expression of specifically the sets of descriptors of tables 1, 2, 3 and 4 either alone or in combination.
  • the immunoassay or ELISA kit comprises preferably at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15 or at least 20 descriptors of table 1, 2, 3, and/or 4.
  • the expression patterns of genes 1 to 100 can be used to distinguish between the discrete renal cell cancer specific states A vs. BCD.
  • the expression patterns of genes 101 to 200 can be used to distinguish between the discrete renal cell cancer specific states B vs. ACD.
  • the expression patterns of genes 201 to 300 can be used to distinguish between the discrete renal cell cancer specific states C vs. ABD.
  • the expression patterns of genes 301 to 399 can be used to distinguish between the discrete renal cell cancer specific states D vs. ABC.
  • the implications of these results are set forth. Then, the computer-implemented, algorithm based approaches are explained in further detail.
  • the expression pattern of about 400 genes which are listed in tables 1, 2, 3 and 4 can be used to unambiguously identify the four discrete renal cell cancer specific states, which for sake of nomenclature have been named A, B, C and D herein.
  • If genes 1 to 100 ("normal") of table 1 are found to be over-expressed for a sample of a human or animal individual, the individual will be characterized as having the discrete renal cell cancer specific state A. If genes 101 to 185 of table 2 are found to be under-expressed ("invers") and if genes 186 to 200 ("normal") of table 2 are found to be over-expressed for a sample of a human or animal individual, the individual will be characterized as having the discrete renal cell cancer specific state B. If genes 201 to 300 of table 3 are found to be over-expressed ("normal") for a sample of a human or animal individual, the individual will be characterized as having the discrete renal cell cancer specific state C.
  • If genes 301 to 399 of table 4 are found to be under-expressed ("invers") for a sample of a human or animal individual, the individual will be characterized as having the discrete renal cell cancer specific state D.
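The allocation rule just described can be written down compactly. The gene ranges and expected directions below follow the text; the scoring rule (fraction of measured descriptor genes showing the expected direction) and the cut-off `min_fraction` are assumptions added for the sketch, since the description does not prescribe a particular decision rule here.

```python
# Gene index -> (state, expected direction); +1 = over-expressed ("normal"),
# -1 = under-expressed ("invers"). Ranges as given for tables 1 to 4.
EXPECTED = {}
for gene in range(1, 101):
    EXPECTED[gene] = ("A", +1)
for gene in range(101, 186):
    EXPECTED[gene] = ("B", -1)
for gene in range(186, 201):
    EXPECTED[gene] = ("B", +1)
for gene in range(201, 301):
    EXPECTED[gene] = ("C", +1)
for gene in range(301, 400):
    EXPECTED[gene] = ("D", -1)


def allocate_state(calls, min_fraction=0.8):
    """Allocate state A, B, C or D from per-gene expression calls.

    `calls` maps a gene index (1..399) to +1 (over-expressed),
    -1 (under-expressed) or 0 (unchanged). Returns the state whose measured
    descriptor genes best match the expected direction, or None if no state
    reaches `min_fraction` (an assumed, illustrative cut-off).
    """
    matched = {state: 0 for state in "ABCD"}
    measured = {state: 0 for state in "ABCD"}
    for gene, call in calls.items():
        if gene not in EXPECTED:
            continue  # not a descriptor gene of tables 1 to 4
        state, direction = EXPECTED[gene]
        measured[state] += 1
        if call == direction:
            matched[state] += 1
    scores = {s: matched[s] / measured[s] for s in measured if measured[s] > 0}
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_fraction else None
```

A sample in which genes 101 to 110 are under-expressed and genes 186 to 190 are over-expressed would be allocated state B under this rule. Requiring a minimum fraction rather than a perfect match tolerates a few discordant genes, which seems reasonable given that the signatures are described as the outcome of a statistical analysis across multiple samples.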
  • Expression levels may be determined using the Affymetrix gene chips HG-U133A, HG-U133B, HG-U133_Plus_2, etc.
  • the decision as to whether a certain gene in a specific sample is over- or under-expressed will be taken in comparison to a control. This control will either be implemented in the software, or an overall median or arithmetic mean across measurements is calculated. When a multitude of samples is included, it is also conceivable to calculate a median and/or mean for each gene individually.
  • a respective gene expression value is monitored as up or down-regulated.
  • For Affymetrix gene chip expression analysis, one may rely on the "limit value" of tables 1, 2, 3 and 4 for making a decision as to over- or under-expression. The limit value will be entered individually for each gene into the respective software, which is used for expression analysis.
  • the decision as to whether a respective gene is over- or under-expressed is made with respect to a control level which will be specific for the respective detection method and which is determined typically with respect to a value typical for healthy tissue.
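The per-gene comparison against a control level can be sketched as follows, here using a per-gene median across samples as the reference, which is one of the options mentioned above. The function names, the `tolerance` parameter and the gene identifiers `g1` and `g2` are hypothetical; a per-gene limit value from the tables could be passed as `reference` instead.

```python
from statistics import median


def per_gene_reference(samples):
    """Median expression value per gene across a list of samples.

    Each sample is a dict mapping a gene identifier to an expression value.
    """
    genes = set().union(*(sample.keys() for sample in samples))
    return {g: median(s[g] for s in samples if g in s) for g in genes}


def call_expression(value, reference, tolerance=0.0):
    """Return +1 (over-expressed), -1 (under-expressed) or 0 (unchanged).

    `tolerance` is an assumed dead band around the reference value.
    """
    if value > reference + tolerance:
        return +1
    if value < reference - tolerance:
        return -1
    return 0
```

With the illustrative samples `[{"g1": 1.0, "g2": 5.0}, {"g1": 3.0, "g2": 7.0}, {"g1": 2.0, "g2": 9.0}]`, the reference is 2.0 for `g1` and 7.0 for `g2`, so a value of 3.0 for `g1` is called over-expressed and a value of 5.0 for `g2` under-expressed.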
  • renal cell cancer signatures as they are defined by the expression patterns of the genes of tables 1, 2, 3 and 4 reflect the outcome of a statistical analysis across multiple samples.
  • the reliability of the determination increases if more than one gene is analyzed with respect to its expression.
  • the analysis of the expression of at least 10 genes will usually be sufficient to assign a discrete renal cell cancer-specific state with a reliability of at least about 90%.
  • the analysis of the expression pattern of at least 5 genes of genes 1 to 100 of table 1 will usually allow deciding whether state A is present with a reliability of about 80% or more. This reliability will increase if more genes are analyzed. Thus, the analysis of the expression pattern of at least 10 genes of genes 1 to 100 of table 1 will usually allow deciding whether state A or state BCD is present with a reliability of about 90% or more. The analysis of the expression pattern of at least 15 genes of genes 1 to 100 of table 1 will usually allow deciding whether state A or state BCD is present with a reliability of about 95% or more.
  • the analysis of the expression pattern of at least 20 genes of genes 1 to 100 of table 1 will usually allow deciding whether state A or state BCD is present with a reliability of about 98% or more and the analysis of the expression pattern of at least 25 genes of genes 1 to 100 of table 1 will usually allow deciding whether state A or state BCD is present with a reliability of about 99% or more.
  • the analysis of the expression pattern of at least 5 genes of genes 101 to 200 of table 2 will usually allow deciding whether state B is present with a reliability of about 80% or more. This reliability will increase if more genes are analyzed.
  • the analysis of the expression pattern of at least 10 genes of genes 101 to 200 of table 2 will usually allow deciding whether state B or state ACD is present with a reliability of about 90% or more.
  • the analysis of the expression pattern of at least 15 genes of genes 101 to 200 of table 2 will usually allow deciding whether state B or state ACD is present with a reliability of about 95% or more.
  • the analysis of the expression pattern of at least 20 genes of genes 101 to 200 of table 2 will usually allow deciding whether state B or state ACD is present with a reliability of about 98% or more and the analysis of the expression pattern of at least 25 genes of genes 101 to 200 of table
  • the analysis of the expression pattern of at least 10 genes of genes 201 to 300 of table 3 will usually allow deciding whether state C or state ABD is present with a reliability of about 90% or more.
  • the analysis of the expression pattern of at least 15 genes of genes 201 to 300 of table 3 will usually allow deciding whether state C or state ABD is present with a reliability of about 95% or more.
  • the analysis of the expression pattern of at least 20 genes of genes 201 to 300 of table 3 will usually allow deciding whether state C or state ABD is present with a reliability of about 98% or more and the analysis of the expression pattern of at least 25 genes of genes 201 to 300 of table
  • the analysis of the expression pattern of at least 10 genes of genes 301 to 399 of table 4 will usually allow deciding whether state D or state ABC is present with a reliability of about 90% or more.
  • the analysis of the expression pattern of at least 15 genes of genes 301 to 399 of table 4 will usually allow deciding whether state D or state ABC is present with a reliability of about 95% or more.
  • the analysis of the expression pattern of at least 20 genes of genes 301 to 399 of table 4 will usually allow deciding whether state D or state ABC is present with a reliability of about 98% or more and the analysis of the expression pattern of at least 25 genes of genes 301 to 399 of table 4 will usually allow deciding whether state D or state ABC is present with a reliability of about 99% or more.
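The relationship between the number of analyzed genes and the approximate reliability stated above can be captured in a small lookup. The figures are the "about X% or more" values given in the text (the 5-gene figure is stated for tables 1 and 2); the function names are assumptions, and the values should be read as approximate lower bounds rather than exact guarantees.

```python
# (minimum number of genes from one table, approximate reliability),
# ordered from the largest gene count to the smallest.
RELIABILITY_BY_GENE_COUNT = [
    (25, 0.99),
    (20, 0.98),
    (15, 0.95),
    (10, 0.90),
    (5, 0.80),
]


def approx_reliability(n_genes):
    """Approximate reliability for a one-vs-rest state decision.

    Returns None if fewer than 5 genes are analyzed, since no figure is
    stated for that case.
    """
    for minimum, reliability in RELIABILITY_BY_GENE_COUNT:
        if n_genes >= minimum:
            return reliability
    return None


def genes_needed(target_reliability):
    """Smallest stated gene count reaching the target, or None if none does."""
    for minimum, reliability in reversed(RELIABILITY_BY_GENE_COUNT):
        if reliability >= target_reliability:
            return minimum
    return None
```

For example, analyzing 12 genes of one table falls into the "at least 10 genes" bracket, and a target reliability of 95% calls for at least 15 genes.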
  • the set of about 4x100 genes of tables 1, 2, 3 and 4 thus serves as a reservoir for the unambiguous characterization of states A, B, C and D.
  • By analyzing the expression behavior of e.g. at least about 2, at least about 3, at least about 4, at least about 5, at least about 6, at least about 7, at least about 8, at least about 9, at least about 10, at least about 11, at least about 12, at least about 13, at least about 14, at least about 15, at least about 16, at least about 17, at least about 18, at least about 19 or at least about 20 genes of genes 1 to 400, one will be able to decide (i) whether a patient suffers from renal cell cancer and (ii) whether the patient suffers from cancer of state A or any of the other states B, C or D.
  • the present invention thus relates to a signature, which can be derived from the expression pattern of at least about 2, at least about 3, at least about 4, at least about 5, at least about 6, at least about 7, at least about 8, at least about 9, at least about 10, at least about 11, at least about 12, at least about 13, at least about 14, at least about 15, at least about 16, at least about 17, at least about 18, at least about 19 or at least about 20 genes of genes 1 to 400 of tables 1, 2, 3 and 4.
  • This signature will allow one to decide unambiguously whether one of four discrete renal cell cancer specific states, namely state A, B, C or D, is present.
  • the signature for A is defined by an over-expression of genes 1 to 100 of table 1.
  • the signature for B is defined by an under-expression of genes 101 to 185 and an over-expression of genes 186 to 200 of table 2.
  • the signature for C is defined by an over-expression of genes 201 to 300 of table 3.
  • the signature for D is defined by an under-expression of genes 301 to 399 of table 4. It is to be understood that the survival rates which have been allocated to states A, B and C in PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310 equally apply to the states A, B and C as mentioned herein.
  • the signature described herein for state A indicates an RCC type with a high average survival time where about 70 to about 90% such as about 80% of patients can be expected to live after 60 months.
  • the presence of this signature will be indicative of a discrete disease-specific state in RCC, which is indicative of an intermediate average survival time where about 60 to about 80% such as about 70% of patients can be expected to live after 90 months.
  • intermediate average survival time where about 45 to about 55% such as about 50% of patients can be expected to live after 60 months.
  • the presence of this signature will be indicative of a discrete disease-specific state in RCC, which is indicative of an intermediate average survival time where about 40 to about 50% such as about 45% of patients can be expected to live after 90 months.
  • the signature described herein for state C indicates an RCC type with a low average survival time where e.g. about 30% to about 45% such as about 40% of patients can be expected to live after 60 months.
  • the presence of this signature will be indicative of a discrete disease-specific state in RCC, which is indicative of an intermediate average survival time where about 5 to about 30% such as about 10% to 20% of patients can be expected to live after 90 months.
  • the present invention also relates to the above signatures for use as a diagnostic and/or prognostic marker in the context of renal cell cancer. By determining whether the signatures are present, one can take a decision as to whether a patient suffers from renal cell cancer as such and/or will likely develop renal cell cancer as such in the future. Further, one can distinguish between the aggressiveness of renal cell cancer development and adjust therapy accordingly. Further, the present invention relates to the above signatures for use in stratifying test populations for clinical trials for treatment of renal cell cancer. It is to be understood that determining the expression pattern of genes 1 to 400 of tables 1, 2, 3 and 4 by microarray expression analysis as described is one of various options even though it can be preferred. However, it is also contemplated to perform such expression analysis on the protein level by e.g. ELISA, immunoassay and/or Western blotting. It is further to be understood that all methods of expression analysis are preferably conducted on renal cell cancer tissue.
  • the present invention relates to the above signatures for use as a read out of a target for development, identification and/or screening of at least one
  • the present invention also relates to the above signatures for use in stratifying human or animal individuals which are suspected to suffer from ongoing or imminent renal cell cancer development. Stratification allows grouping these individuals by their discrete renal cell cancer specific states. Potential pharmaceutically active compounds which are assumed to be effective in renal cell cancer treatment can thus be analyzed in such pre-selected patient groups.
  • the present invention in one embodiment also relates to a method of diagnosing, prognosing, stratifying and/or screening renal cell cancer in at least one human or animal patient, which is suspected of being afflicted by said disease, comprising at least the steps of:
  • the present invention in one embodiment relates to a method of determining the responsiveness of at least one human or animal individual, which is suspected of being afflicted by renal cell cancer, towards a pharmaceutically active agent comprising at least the steps of:
  • the invention relates to a method of predicting the responsiveness of at least one patient which is suspected of being afflicted by renal cell cancer, towards a pharmaceutically active agent comprising at least the steps of:
  • b. Testing a sample of a human or animal individual patient which is suspected of being afflicted by renal cell cancer for a signature indicative of a discrete renal cell cancer specific state by determining expression of at least 1 gene, preferably of at least 4 genes, more preferably of at least 5, 6, 7, 8, 9 or 10 genes of genes 1 to 100, 101 to 200, 201 to 300, 301 to 399 of tables 1, 2, 3, and/or 4; c. Allocating a discrete disease-specific state to said sample based on the signature determined in step b);
  • step d Comparing the discrete renal cell cancer-specific state of the sample in step c. vs. the discrete renal cell cancer-specific state for which a correlation has been determined in step a.);
  • One embodiment of the invention relates to a method of determining the effects of a potential pharmaceutically active agent for treatment of renal cell cancer, comprising at least the steps of:
  • the present invention provides further evidence that the discrete breast cancer-specific states A, B, C and D as reflected by the expression pattern of the descriptors of tables 5, 6, 7 and 8 (see also Experiment 4) are indeed biologically relevant. It was assumed that potential differences, possibly representing functional or metabolomic irregularities among states, might become evident when the best state descriptors for each such state are analyzed by means of bioinformatics according to functional, known and predicted protein-protein interactions. To this end STRING
  • the present invention in one aspect thus relates to a method of diagnosing, stratifying and/or screening a hyper-proliferative disease such as breast cancer in at least one patient, which is suspected of being afflicted by said disease, or in at least one cell line of said disease, comprising at least the steps of:
  • the sample may be a tumor sample of breast cancer.
  • the invention preferably relates in one embodiment to identifying discrete disease specific states and preferably discrete breast cancer-specific states by analyzing a hyper-proliferative disease such as breast cancer for signatures being indicative of discrete disease specific states as described above.
  • This analysis will be performed for a specific type of hyper-proliferative disease such as e.g. breast cancer.
  • the diseases may be identified by common selection criteria such as the organs being affected. However, initially no attention will be given to sub-classifications of these hyper-proliferative diseases, which are based on e.g. histological classification schemes. Once one has identified different discrete disease specific states for a disease like e.g.
  • the disease specific state therefore usually allows one to directly predict which sub-type of the disease in question is developing (e.g. state A, B, C or D for breast cancer, RCC, lung cancer; see also PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310). These subtypes are correlated with e.g. clinically relevant parameters such as survival time.
  • the term discrete disease specific state preferably allows distinguishing different subtypes of a disease according to a new classification scheme, which links the subtype to clinically or pharmacologically important parameters.
  • the knowledge that discrete disease-specific states exist e.g. in breast cancer can also be used to stratify patient cohorts undergoing clinical trials for new treatments of breast cancer.
  • certain pharmaceutically active agents may act only on specific discrete disease-specific states. If a patient cohort which undergoes a clinical trial with such an active agent consists mainly of individuals with other discrete breast cancer-specific states, any effects of the pharmaceutically active agent on the specific discrete breast cancer-specific state may not be discernible. Such effects may become, however, statistically significant if the patient cohort is grouped according to the discrete breast cancer-specific states.
  • the knowledge on the existence of breast cancer-specific states can be used to stratify test populations undergoing clinical trials according to their discrete breast cancer-specific states.
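The effect of stratification described here can be illustrated with a minimal sketch. The cohort below is entirely hypothetical (invented counts, an agent assumed to act mainly on state A), and `response_rate_by_state` is an illustrative helper name, not part of the described method. It only shows how a response concentrated in one state can be diluted in the pooled cohort and become visible once patients are grouped by state.

```python
from collections import defaultdict

def response_rate_by_state(patients):
    """patients: iterable of (state, responded) pairs. Returns {state: response rate}."""
    counts = defaultdict(lambda: [0, 0])  # state -> [responders, total]
    for state, responded in patients:
        counts[state][0] += int(responded)
        counts[state][1] += 1
    return {state: r / n for state, (r, n) in counts.items()}

# Hypothetical cohort of 100 patients: state A is a minority but responds well.
cohort = (
    [("A", True)] * 8 + [("A", False)] * 2
    + [("B", True)] * 3 + [("B", False)] * 27
    + [("C", True)] * 3 + [("C", False)] * 27
    + [("D", True)] * 3 + [("D", False)] * 27
)

pooled = sum(responded for _, responded in cohort) / len(cohort)  # rate ignoring states
by_state = response_rate_by_state(cohort)                         # rate per discrete state
```

In this invented example the pooled rate hides the difference between state A and the remaining states, which is the situation the stratified trial design is meant to avoid.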
  • An illustration of inter alia this aspect of the invention is provided by Experiment 5, which describes that breast cancer patients having a breast cancer-specific state A as it is described hereinafter show prolonged metastasis free survival upon treatment with tamoxifen compared to patients not having this breast cancer-specific state.
  • the knowledge on breast cancer-specific states could be used to stratify patient cohorts for clinical trials involving e.g. future combination therapies including tamoxifen.
  • This knowledge may also be used for diagnostic purpose, namely to identify patients diagnosed with breast cancer which would be responsive to tamoxifen treatment.
  • the classification of samples be it of patients or cell lines for hyper-proliferative diseases such as breast cancer, for their discrete disease specific states has further implications. Given that discrete disease specific states seem to reflect decisive stages of the underlying molecular disease mechanisms, they can be linked to relevant clinical and pharmacological parameters such as average survival times or responsiveness to drugs.
  • the possibility of assigning a discrete disease-specific state to samples allows analyzing the effectiveness of treatments with specific drugs. For example, one can test a patient or a population of patients suffering from a hyper-proliferative disease for (i) their reaction towards treatment with a pharmaceutically active agent and (ii) for their discrete disease specific molecular state.
  • the reaction towards treatment may be measured by e.g. the quality of and quantity of clinical
  • responsiveness of at least one human or animal individual which is suspected of being afflicted by a hyper-proliferative disease, preferably by breast cancer, towards a pharmaceutically active agent comprising at least the steps of:
  • the signature may be tested for as described above.
  • the sample may be a tumor sample such as breast cancer.
  • One way of determining a signature is to test for the expression pattern of the descriptor sets of tables 5, 6, 7 and 8. If the descriptor sets show an expression profile as described below, one can allocate a signature and thus state A, B, C or D to the respective sample.
  • An example of this embodiment of the invention is the treatment of breast cancer patients with tamoxifen as described hereinafter, which shows that breast cancer patients with breast cancer-specific state A show a prolonged distant metastasis free survival upon treatment with tamoxifen.
  • the invention in one embodiment thus relates to a method of predicting the responsiveness of at least one patient which is suspected of being afflicted by a hyper-proliferative disease, preferably by breast cancer towards a pharmaceutically active agent comprising at least the steps of:
  • step d Comparing the discrete disease-specific state of the sample in step c. vs. the discrete disease-specific state for which a correlation has been determined in step a.);
  • the signature may be tested for as described above.
  • the sample may be a tumor sample such as breast cancer.
  • One way of determining a signature is to test for the expression pattern of the descriptor sets of tables 5, 6, 7 and 8. If the descriptor sets show an expression profile as described below, one can allocate a signature and thus state A, B, C or D to the respective sample.
  • An example of this embodiment of the invention is the treatment of breast cancer patients with tamoxifen as described hereinafter, which shows that breast cancer patients with breast cancer-specific state A show a prolonged distant metastasis free survival upon treatment with tamoxifen.
  • a pharmaceutically active agent may induce a switch from a discrete disease specific state which is correlated with low average survival times to a discrete disease specific state which is correlated with a longer average survival time.
  • the discrete disease specific states and signatures relating thereto may be identified as described above.
  • the target on which the pharmaceutically active agent would act is thus the discrete disease specific state.
  • the discrete disease specific states are thus considered to be targets of pharmaceutically active agents.
  • the invention in one embodiment therefore relates to a method of determining the effects of a pharmaceutically active compound, comprising at least the steps of: a. Providing a sample of at least one human or animal individual which is suspected of being afflicted by a hyper-proliferative disease, preferably by breast cancer, or a cell line of said disease before a pharmaceutically active agent is applied;
  • the signature may be tested for as described above.
  • the sample may be a tumor sample such as breast cancer.
  • One way of determining a signature is to test for the expression pattern of the descriptor sets of tables 5, 6, 7, and 8. If the descriptor sets show an expression profile as described below, one can allocate a signature and thus state A, B, C or D to the respective sample.
  • An example of this embodiment of the invention is the treatment of breast cancer patients with tamoxifen as described hereinafter, which shows that breast cancer patients with breast cancer-specific state A show a prolonged distant metastasis free survival upon treatment with tamoxifen.
  • the effects that are determined by this method may e.g. allow identification of compounds which may have a positive influence on the disease if e.g.
  • the methods may, however, also allow identification of toxic compounds if these compounds induce a switch to a discrete disease specific state correlated with a less favorable clinical parameter such as decreased survival time.
  • These methods may thus be used as assays in the development, identification and/or screening of potential pharmaceutically active compounds, e.g. to determine the potential effectiveness of a pharmaceutically active compound in a disease such as a hyper-proliferative disease.
  • These assays may also be used for determining the toxicity of a pharmaceutically active compound.
  • Such discrete state-related assay systems for active and/or toxic drug candidates could be of enormous value to identify new pharmaceuticals.
  • the switch in state, monitored by a switch in signature, marks an interesting screening system as a general "read out" for changing a tumor status. So the "read out" is related to functional efficacy rather than to blocking a certain molecular target not necessarily being related to tumor function.
  • Such a screening system would simply pick up any compound switching the state irrespective of the molecular target of interaction.
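A state switch used as a "read out" can be sketched as a comparison of the state allocated before and after a compound is applied. The ranking below is an assumption for illustration only: it orders states A, B and C from most to least favorable average survival, following the survival figures given for the renal cell cancer states; `SURVIVAL_RANK` and `classify_switch` are hypothetical names.

```python
# Assumed ordering, most favorable average survival first (illustrative only).
SURVIVAL_RANK = {"A": 0, "B": 1, "C": 2}

def classify_switch(state_before, state_after, rank=SURVIVAL_RANK):
    """Label the effect of a compound from the state allocated before and after treatment."""
    if state_before == state_after:
        return "no switch"
    if rank[state_after] < rank[state_before]:
        # Switch towards a state correlated with longer average survival.
        return "potentially beneficial"
    # Switch towards a state correlated with shorter average survival.
    return "potentially toxic"
```

Such a read out is agnostic about the molecular target: only the allocated states enter the comparison.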
  • the present invention in general thus relates to states, signatures and descriptors for use in diagnosing, stratifying, screening and prognosing human or animal individuals being suspected of suffering from or suffering from breast cancer.
  • the present invention further relates to immunoassays, kits, arrays, and other types of equipment which allow determining the state of human or animal individuals being suspected of suffering from or suffering from breast cancer.
  • the signature may be tested for as described above.
  • the sample may be a tumor sample such as breast cancer.
  • One way of determining a signature is to test for the expression pattern of the descriptor sets of tables 5, 6, 7 and 8. If the descriptor sets show an expression profile as described below, one can allocate a signature and thus state A, B, C or D to the respective sample.
  • the present invention thus also relates to a microarray comprising specifically the sets of descriptors of tables 5, 6, 7, and 8 either alone or in combination.
  • the array comprises preferably at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15 or at least 20 descriptors of table 5, 6, 7, and/or 8.
  • the present invention also relates to an immunoassay or ELISA kit allowing for determining expression of specifically the sets of descriptors of tables 5, 6, 7, and 8 either alone or in combination.
  • the immunoassay or ELISA kit comprises preferably at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15 or at least 20 descriptors of table 5, 6, 7, and/or 8.
  • the expression patterns of genes 1 to 100 can be used to distinguish between the discrete breast cancer specific states A vs. BCD.
  • the expression patterns of genes 101 to 200 can be used to distinguish between the discrete breast cancer specific states B vs. ACD.
  • the expression patterns of genes 201 to 300 can be used to distinguish between the discrete breast cancer specific states C vs. ABD.
  • the expression patterns of genes 301 to 399 can be used to distinguish between the discrete breast cancer specific states D vs. ABC.
  • the expression pattern of about 400 genes which are listed in table 5, 6, 7, and 8 can be used to unambiguously identify the four discrete breast cancer specific states, which for sake of nomenclature have been named A, B, C and D herein.
  • If genes 1 to 100 of table 5 are found to be over-expressed for a sample of a human or animal individual, the individual will be characterized as having the discrete breast cancer specific state A. If genes 101 to 200 of table 6 are found to be under-expressed ("invers") for a sample of a human or animal individual, the individual will be characterized as having the discrete breast cancer specific state B.
  • If genes 201 to 292 of table 7 are found to be under-expressed ("invers") and if genes 293 to 300 of table 7 are found to be over-expressed ("normal") for a sample of a human or animal individual, the individual will be characterized as having the discrete breast cancer specific state C. If genes 301 to 399 of table 8 are found to be under-expressed ("invers") for a sample of a human or animal individual, the individual will be characterized as having the discrete breast cancer specific state D. Expression levels may be determined using the Affymetrix gene chips HG-U133A, HG-U133B, HG-U133_Plus_2, etc.
  • the decision as to whether a respective gene is over- or under-expressed is made with respect to a control level which will be specific for the respective detection method and which is determined typically with respect to a value typical for healthy tissue.
  • breast cancer signatures as they are defined by the expression patterns of the genes of tables 5, 6, 7, and 8 reflect the outcome of a statistical analysis across multiple samples.
  • the reliability of the determination increases if more than one gene is analyzed with respect to its expression.
  • the analysis of the expression of at least 10 genes will usually be sufficient to assign a discrete breast cancer-specific state with a reliability of at least about 90%.
  • the analysis of the expression pattern of at least 5 genes of genes 1 to 100 of table 5 will usually allow deciding whether state A is present with a reliability of about 80% or more. This reliability will increase if more genes are analyzed. Thus, the analysis of the expression pattern of at least 10 genes of genes 1 to 100 of table 5 will usually allow deciding whether state A or state BCD is present with a reliability of about 90% or more. The analysis of the expression pattern of at least 15 genes of genes 1 to 100 of table 5 will usually allow deciding whether state A or state BCD is present with a reliability of about 95% or more.
  • the analysis of the expression pattern of at least 20 genes of genes 1 to 100 of table 5 will usually allow deciding whether state A or state BCD is present with a reliability of about 98% or more and the analysis of the expression pattern of at least 25 genes of genes 1 to 100 of table 5 will usually allow deciding whether state A or state BCD is present with a reliability of about 99% or more.
  • the analysis of the expression pattern of at least 5 genes of genes 101 to 200 of table 6 will usually allow deciding whether state B is present with a reliability of about 80% or more. This reliability will increase if more genes are analyzed.
  • the analysis of the expression pattern of at least 10 genes of genes 101 to 200 of table 6 will usually allow deciding whether state B or state ACD is present with a reliability of about 90% or more.
  • the analysis of the expression pattern of at least 15 genes of genes 101 to 200 of table 6 will usually allow deciding whether state B or state ACD is present with a reliability of about 95% or more.
  • the analysis of the expression pattern of at least 20 genes of genes 101 to 200 of table 6 will usually allow deciding whether state B or state ACD is present with a reliability of about 98% or more and the analysis of the expression pattern of at least 25 genes of genes 101 to 200 of table
  • the analysis of the expression pattern of at least 10 genes of genes 201 to 300 of table 7 will usually allow deciding whether state C or state ABD is present with a reliability of about 90% or more.
  • the analysis of the expression pattern of at least 15 genes of genes 201 to 300 of table 7 will usually allow deciding whether state C or state ABD is present with a reliability of about 95% or more.
  • the analysis of the expression pattern of at least 20 genes of genes 201 to 300 of table 7 will usually allow deciding whether state C or state ABD is present with a reliability of about 98% or more and the analysis of the expression pattern of at least 25 genes of genes 201 to 300 of table
  • the analysis of the expression pattern of at least 10 genes of genes 301 to 399 of table 8 will usually allow deciding whether state D or state ABC is present with a reliability of about 90% or more.
  • the analysis of the expression pattern of at least 15 genes of genes 301 to 399 of table 8 will usually allow deciding whether state D or state ABC is present with a reliability of about 95% or more.
  • the analysis of the expression pattern of at least 20 genes of genes 301 to 399 of table 8 will usually allow deciding whether state D or state ABC is present with a reliability of about 98% or more and the analysis of the expression pattern of at least 25 genes of genes 301 to 399 of table 8 will usually allow deciding whether state D or state ABC is present with a reliability of about 99% or more.
  • the set of about 4x100 genes of tables 5, 6, 7 and 8 thus serves as a reservoir for the unambiguous characterization of states A, B, C and D.
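Why reliability rises with the number of genes analyzed can be illustrated with a simple voting model. This is an assumption introduced here, not the statistical analysis underlying the stated percentages: each gene is treated as an independent call that is correct with probability `p_single`, and the state is decided by strict majority. The function name and parameters are hypothetical, and the model is not expected to reproduce the reliability figures given in the text.

```python
from math import comb

def majority_vote_reliability(n_genes, p_single):
    """Probability that more than half of n independent gene calls are correct.

    Assumes independent genes with identical accuracy; ties (even n) count as failures.
    """
    threshold = n_genes // 2 + 1  # smallest number of correct calls forming a strict majority
    return sum(
        comb(n_genes, k) * p_single**k * (1 - p_single) ** (n_genes - k)
        for k in range(threshold, n_genes + 1)
    )

# Illustrative comparison for 5, 15 and 25 genes at an assumed per-gene accuracy of 0.7.
reliabilities = {n: majority_vote_reliability(n, 0.7) for n in (5, 15, 25)}
```

Under these assumptions the combined reliability increases monotonically with the number of genes for odd panel sizes, in line with the qualitative trend described for the descriptor sets.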
  • the present invention thus relates to a signature, which can be derived from the expression pattern of at least about 2, at least about 3, at least about 4, at least about 5, at least about 6, at least about 7, at least about 8, at least about 9, at least about 10, at least about 11, at least about 12, at least about 13, at least about 14, at least about 15, at least about 16, at least about 17, at least about 18, at least about 19 or at least about 20 genes of genes 1 to 400 of tables 5, 6, 7 and 8.
  • This signature will allow one to unambiguously decide whether one of four discrete breast cancer specific states, namely state A, B, C or D, is present.
  • the signature for A is defined by an over-expression of genes 1 to 100 of table 5.
  • the signature for B is defined by an under-expression of genes 101 to 200 of table 6.
  • the signature for C is defined by an under-expression of genes 201 to 292 and an over-expression of genes 293 to 300 of Table 7.
  • the signature for D is defined by an under-expression of genes 301 to 399 of Table 8.
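The four breast cancer signature definitions can be written down as a small lookup of expected expression directions per gene-number range. The sketch below is a hypothetical scoring rule: the source states the directions but not how individual gene calls are aggregated, so the "fraction of measured genes matching" score, the simple greater-than comparison against a control level, and all function names are assumptions.

```python
# Expected direction per gene-number range, as stated for tables 5 to 8.
SIGNATURES = {
    "A": [(range(1, 101), "over")],
    "B": [(range(101, 201), "under")],
    "C": [(range(201, 293), "under"), (range(293, 301), "over")],
    "D": [(range(301, 400), "under")],
}

def call_direction(value, control):
    """Assumed rule: above the control level is 'over', otherwise 'under'."""
    return "over" if value > control else "under"

def signature_scores(expression, control):
    """expression, control: {gene_number: level}.

    Returns {state: fraction of measured signature genes with the expected direction},
    or None for a state when none of its genes were measured.
    """
    scores = {}
    for state, rules in SIGNATURES.items():
        hits = total = 0
        for genes, direction in rules:
            for gene in genes:
                if gene in expression and gene in control:
                    total += 1
                    hits += call_direction(expression[gene], control[gene]) == direction
        scores[state] = hits / total if total else None
    return scores
```

A sample in which ten measured genes of table 5 lie above their control levels would score 1.0 for state A; how high a score must be before a state is allocated is left open here.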
  • the present invention also relates to the above signatures for use as a diagnostic and/or prognostic marker in the context of breast cancer. By determining whether the signatures are present, one can take a decision as to whether a patient suffers from breast cancer as such and/or will likely develop breast cancer as such in the future. Further, one can distinguish between the aggressiveness of breast cancer
  • the present invention relates to the above signatures for use in stratifying test populations for clinical trials for treatment of breast cancer.
  • determining the expression pattern of genes 1 to 400 of tables 5, 6, 7 and 8 by microarray expression analysis as described is one of various options even though it can be preferred. However, it is also contemplated to perform such expression analysis on the protein level by e.g. ELISA, immunoassay and/or Western blotting. It is further to be understood that all methods of expression analysis are preferably conducted on breast cancer tissue. Further, the present invention relates to the above signatures for use as a read out of a target for development, identification and/or screening of at least one
  • the present invention also relates to the above signatures for use in stratifying human or animal individuals which are suspected to suffer from ongoing or imminent breast cancer development. Stratification allows grouping these individuals by their discrete breast cancer specific states. Potential pharmaceutically active compounds which are assumed to be effective in breast cancer treatment can thus be analyzed in such pre-selected patient groups.
  • the present invention in one embodiment also relates to a method of diagnosing, prognosing, stratifying and/or screening breast cancer in at least one human or animal patient, which is suspected of being afflicted by said disease, comprising at least the steps of:
  • the present invention in one embodiment relates to a method of determining the responsiveness of at least one human or animal individual, which is suspected of being afflicted by breast cancer, towards a pharmaceutically active agent comprising at least the steps of:
  • the invention relates to a method of predicting the responsiveness of at least one patient which is suspected of being afflicted by breast cancer, towards a pharmaceutically active agent comprising at least the steps of:
  • step c Allocating a discrete disease-specific state to said sample based on the signature determined in step b);
  • step d Comparing the discrete breast cancer-specific state of the sample in step c. vs. the discrete breast cancer-specific state for which a correlation has been determined in step a.);
  • An example of this embodiment of the invention is predicting the responsiveness of breast cancer patients, depending on their breast cancer-specific states A, B, C, and D, toward treatment with tamoxifen.
  • the examples as presented hereinafter show that a breast cancer patient with the breast cancer-specific state A, but not with a state other than A, will react positively towards treatment with tamoxifen as can be taken from the prolonged distant metastasis free survival time.
  • One embodiment of the invention relates to a method of determining the effects of a potential pharmaceutically active agent for treatment of breast cancer, comprising at least the steps of:
  • testing said sample for a signature indicative of a discrete breast cancer specific state by determining expression of at least 1, preferably of at least 4 genes, more preferably of at least 5, 6, 7, 8, 9 or 10 genes of genes 1 to 100, 101 to 200, 201 to 300, 301 to 399 of tables 5, 6, 7, and/or 8;
  • the present invention also relates to a computer implemented method, a computer or other technical device which is suitable to perform the above steps and methods or those described in Experiment 1 and Figure 3.
  • the latter computer-implemented methods will allow identifying states, signatures and descriptors for a disease.
  • the invention further relates to the use of such computer-implemented methods, computers, technical devices etc. for classifying renal cell cancer samples, tissues etc. Such classification may enable the above-mentioned uses of diagnosis, stratification etc.
  • Examples of such other tests include nCounter Gene Expression assays from Nanostring (Seattle, WA, U.S.A.), alternative expression analysis by sequencing (ALEXA-seq, www.alexaplatform.org), Serial Analysis of Gene Expression (SAGE), Northern Blotting, and more.
  • In Example 6 it is described how one can obtain a set of descriptors, which are predictive for a renal-cancer specific state and which are measurable e.g. by PCR, and optionally by qPCR, by selecting such predictive descriptors from the group of descriptors such as expressed genes that are obtainable by unsupervised two-way hierarchical clustering for microarray expression data (e.g., using Affymetrix HG-U133 A, HG-U133 B, HG-U133 Plus 2.0, Agilent, Nimblegen and their derivatives such as Illumina) as described in PCT/EP2011/057691 and Beleut et al., BMC cancer (2012), 12:310 as well as herein.
  • this approach can be used for determining sets of predictive descriptors for other hyper-proliferative diseases such as breast cancer, ovarian cancer, colorectal cancer, lung cancer, prostate cancer, brain cancer, hepatocellular carcinoma, acute myeloma, pheochromocytoma, Burkitt's lymphoma, myeloma or other types of diseases such as Parkinson's disease once it has been shown by e.g. unsupervised two-way hierarchical clustering for microarray expression data as described in PCT/EP2011/057691 and Beleut et al., BMC cancer (2012), 12:310 as well as herein that disease-specific states exist for these types of diseases.
  • the approach starts from the group of descriptors, such as expressed genes that are obtainable by unsupervised two-way hierarchical clustering for microarray expression data as described in PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310 as well as herein.
  • A probe array consists of a number of oligonucleotide probe cells and each probe cell contains a unique oligonucleotide probe.
  • Probes are tiled in probe pairs as a Perfect Match (PM) and a Mismatch (MM).
  • PM and MM are the same, except for a change to the Watson-Crick complement in the middle of the MM probe sequence.
  • a probe set consists of a series of probe pairs and represents an expressed transcript. The sample should bind more strongly to PM than to MM, so one assumes that for a given probe set for example 25% or more of the PM values should be higher than the MM values for the probe set to be deemed valid.
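The validity criterion for a probe set can be sketched as a direct count of probe pairs in which the PM value exceeds the MM value. The function name and the `min_fraction` parameter are illustrative; the default of 0.25 mirrors the example value of 25% given in the text.

```python
def probe_set_is_valid(pm_values, mm_values, min_fraction=0.25):
    """Return True if at least min_fraction of probe pairs have PM higher than MM."""
    if not pm_values or len(pm_values) != len(mm_values):
        raise ValueError("PM and MM values must be non-empty and paired one-to-one")
    pairs_pm_higher = sum(pm > mm for pm, mm in zip(pm_values, mm_values))
    return pairs_pm_higher / len(pm_values) >= min_fraction
```

Probe sets failing this check would be dropped before the correlation-based selection of genes.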
  • In a next step one selects at least 2 genes, preferably at least 10, out of the remaining genes with the most positive correlation, and at least 2 genes, preferably at least 10, out of the remaining genes with the most negative correlation for a disease-specific state such as renal cancer-specific states A, B, or C. Further it is possible to add at least 2, preferably at least 10 genes that are randomly selected and at least 2, preferably at least 10 genes showing the least variation across all the states.
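The selection of the most positively and most negatively correlated genes for a given state can be sketched as a simple ranking. The input is assumed to be a precomputed mapping from gene to its correlation with the state; how that correlation is computed is not specified here, and `select_candidate_genes` is a hypothetical helper. The optional random and least-variation genes mentioned in the text are not covered by this sketch.

```python
def select_candidate_genes(correlations, n_each=10):
    """correlations: {gene: correlation with a disease-specific state}.

    Returns (most_positive, most_negative), each with up to n_each genes,
    ordered from the strongest correlation downwards. With fewer than
    2 * n_each genes the two lists may overlap.
    """
    ranked = sorted(correlations, key=correlations.get, reverse=True)
    most_positive = ranked[:n_each]
    most_negative = ranked[-n_each:][::-1]  # most negative first
    return most_positive, most_negative
```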
  • the resulting set of genes can then be tested e.g. in qPCR experiments with genes grouped in pairs since qPCR readers are usually designed to measure just pairs of genes.
  • correlations between the measured qPCR expression values and the states were calculated in a first step.
  • Genes for further analysis are thus chosen according to the correlation between their mRNA expression microarray values and their qPCR values. All genes with a qPCR/microarray correlation below a threshold value between 0 and 1, preferably 0.35, may be excluded from the model, leaving a reduced set of genes.
  • This set of genes should comprise typically at least 20 genes to then allow identification of sets of predictive descriptors for each disease-specific state.
  • 22 genes had values above the selected threshold of 0.35, namely A2M, ANGPTL4, AP2M1, BDH1, CD99, COBLL1, DOCK9, EPAS1, F5, H3F3B, IFITM3, LAPTM4B, LDB2, LPCAT3, MAPRE1, NDUFA4, PGBD5, RGS5, SERBP1, SERINC3, TSG101, UFSP2.
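The correlation filter can be illustrated with a short sketch. It assumes that Pearson correlation is used and that microarray and qPCR values are on scales where a higher value means higher expression; the text does not state either point. The values and the gene label `GENE_X` are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (sd_x * sd_y)


def filter_genes(microarray, qpcr, threshold=0.35):
    """Keep genes whose qPCR/microarray correlation reaches the threshold."""
    return sorted(
        gene for gene in microarray
        if gene in qpcr and pearson(microarray[gene], qpcr[gene]) >= threshold
    )


# Hypothetical per-patient values (four patients) for two genes.
microarray = {"EPAS1": [1.0, 2.0, 3.0, 4.0], "GENE_X": [1.0, 2.0, 3.0, 4.0]}
qpcr = {"EPAS1": [1.1, 2.0, 2.9, 4.2], "GENE_X": [3.0, 1.0, 4.0, 2.0]}
kept = filter_genes(microarray, qpcr)
```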
  • $x_j^{\mathrm{EPAS1}}$ denotes the qPCR value measured for patient (j) and gene EPAS1.
  • means and standard deviations for all patients which are not allocated to the particular state are calculated, that is, nonA, nonB, nonC are calculated.
  • for nonA, the mean for gene EPAS1 over all patients not allocated to state A (that is, allocated to either B or C, or not allocated at all) is calculated as the arithmetic mean of the qPCR values of those patients.
  • for nonB, the mean for gene EPAS1 over all patients not allocated to state B (that is, allocated to either A or C, or not allocated at all) is calculated in the same way.
  • for nonC, the mean for gene EPAS1 over all patients not allocated to state C (that is, allocated to either A or B, or not allocated at all) is calculated in the same way.
  • MAPRE1 25 0.8 24.9 0.7 24.7 0.6 25.5 0.8 25.2 0.8 24.5 0.5
  • the posterior probability that the measured qPCR value x for EPAS1 denotes state A is calculated according to the naive Bayes model
  • the values are calculated, for example for state A, according to
  • the state of a sample is allocated by determining the maximum value of the individual subset probabilities. For example, to determine whether a sample is state A or not, the state SA is calculated by
  • the values are calculated, for example for state A, according to
  • the state of a sample is allocated by determining the maximum value of the individual subset probabilities. For example, to determine whether a sample is state B or not, the state SB is calculated by
  • the values are calculated, for example for state A, according to
  • the state of a sample is allocated by determining the maximum value of the individual subset probabilities. For example, to determine whether a sample is state C or not, the state SC is calculated by
  • $S_{ABC} = \max\left(P_{A,\mathrm{subset\,A}},\ P_{B,\mathrm{subset\,B}},\ P_{C,\mathrm{subset\,C}}\right)$
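Because the formulas themselves did not survive in this text, the following is only a sketch of how such a naive Bayes allocation could look. It assumes Gaussian likelihoods built from the per-gene means and standard deviations of "state" and "non-state" patients, and equal priors; both are assumptions, and all numbers and function names are illustrative.

```python
import math


def log_gaussian(x, mean, sd):
    """Log density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def posterior_for_state(sample, params, prior=0.5):
    """Posterior that `sample` is in the state, given per-gene parameters.

    params maps gene -> (mean_state, sd_state, mean_nonstate, sd_nonstate).
    Genes are treated as independent (the naive Bayes assumption).
    """
    log_in = math.log(prior)
    log_out = math.log(1.0 - prior)
    for gene, (mean_in, sd_in, mean_out, sd_out) in params.items():
        x = sample[gene]
        log_in += log_gaussian(x, mean_in, sd_in)
        log_out += log_gaussian(x, mean_out, sd_out)
    top = max(log_in, log_out)  # subtract the maximum for numerical stability
    p_in, p_out = math.exp(log_in - top), math.exp(log_out - top)
    return p_in / (p_in + p_out)


def allocate_state(sample, models):
    """Allocate the state with the maximum subset posterior."""
    posteriors = {s: posterior_for_state(sample, p) for s, p in models.items()}
    return max(posteriors, key=posteriors.get), posteriors


# Hypothetical parameters for two states and a two-gene subset each.
models = {
    "A": {"EPAS1": (22.0, 0.5, 24.0, 0.5), "DOCK9": (26.0, 0.5, 25.0, 0.5)},
    "B": {"EPAS1": (24.0, 0.5, 22.5, 0.5), "DOCK9": (25.0, 0.5, 26.0, 0.5)},
}
sample = {"EPAS1": 22.0, "DOCK9": 26.0}
state, posteriors = allocate_state(sample, models)
```

Each state uses its own gene subset, so the final step compares posteriors that were computed on different subsets, as in the maximum over the subset probabilities above.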
  • the subsets of genes for the individual states are selected from the qPCR set of genes according to the following iterative procedure:
  • 1. the subset of predictive genes is empty in the beginning of the procedure
  • 2. compute the accuracy of predicting the desired state for all single genes of the set separately by employing the naive Bayes model
  • 3. select a number of genes with the best accuracy of step 2 and add them to the set of predictive genes, and take them out of the set. Preferably the number of genes selected is at least one, two, or three.
  • 4. compute the accuracy of predicting the desired state for the combination of all genes in the subset in combination with each single remaining gene of the set
  • 5. select a number of genes with the best accuracy of step 4 and add them to the set of predictive genes, and take them out of the set. Preferably the number of genes selected is at least one, two, or three.
  • 6. repeat steps 4) and 5) until a desired accuracy is obtained.
  • the prediction accuracy will increase with the number of genes tested. However, cost, effort, and time spent for a test will increase as well with the number of tested genes and will eventually exceed practical limits. It is possible to set a desired or required prediction accuracy such as e.g. at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, or at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% and repeat the selection procedure until the subset of genes selected provides this accuracy.
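The iterative selection can be sketched as a greedy forward selection. This sketch assumes that a caller-supplied function returns the prediction accuracy of any list of genes (for instance from a naive Bayes model); the toy accuracy function, the gene labels and the name `forward_select` are hypothetical.

```python
def forward_select(candidates, accuracy_of, target_accuracy, per_round=1):
    """Greedily add the best-scoring genes until the target accuracy is met."""
    remaining = list(candidates)
    selected = []  # the subset of predictive genes starts empty
    best = 0.0
    while remaining and best < target_accuracy:
        # Score the current subset combined with each single remaining gene.
        scored = sorted(
            remaining, key=lambda g: accuracy_of(selected + [g]), reverse=True
        )
        picked = scored[:per_round]
        selected.extend(picked)
        for gene in picked:
            remaining.remove(gene)
        best = accuracy_of(selected)
    return selected, best


# Toy accuracy: each gene contributes a fixed, made-up amount.
weights = {"g1": 0.2, "g2": 0.15, "g3": 0.05, "g4": 0.0}


def toy_accuracy(genes):
    return min(1.0, 0.5 + sum(weights[g] for g in genes))


subset, accuracy = forward_select(list(weights), toy_accuracy, target_accuracy=0.8)
```

In the first round the subset is empty, so every gene is scored on its own; later rounds score the subset together with each remaining gene. The `per_round` parameter corresponds to selecting one, two, or three genes per step.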
  • Table 12 shows the accuracy obtained with an increasing subset of genes, for predicting a single state (a) and to predict the state of a sample directly from a single set of measurements (b)
  • a set of at least 2 descriptors e.g. genes will typically allow to predict a disease-specific state with an accuracy of at least 65%.
  • the accuracy can be improved if more descriptors, e.g. genes are included into the SP list for each state.
  • a set of at least 4 descriptors may typically allow to predict a disease-specific state with an accuracy of at least 75%.
  • a set of at least 10 descriptors, e.g. genes may typically allow to predict a disease-specific state with an accuracy of at least 80%.
  • a set of at least 15 descriptors, e.g. genes, will typically allow to predict a disease-specific state with an accuracy of at least 90% or at least 95%.
  • Sn for the state n is calculated according to:
  • the invention in one aspect relates to a method of diagnosing, prognosing, classifying, stratifying and/or screening a disease in a human or animal patient, which is suspected of being afflicted by said disease, comprising at least the steps of:
  • determining the expression of a set of at least two predictive descriptors e.g. genes in a sample of said patient for a disease-specific state
  • said set of predictive descriptors is selected from a group of descriptors which is indicative of disease-specific states and which is identifiable by unsupervised two-way hierarchical clustering of gene expression data for samples of said disease from different patients.
  • the control sample may be a sample, preferably an extracorporeal sample from a healthy subject.
  • the group of descriptors, from which the sets of predictive descriptors are selected and which is indicative of disease-specific states, is identifiable by unsupervised two-way hierarchical clustering of gene expression data for samples of said disease from different patients as described in PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310.
  • the approach may thus rely on identifying disease-specific states, e.g. in renal cell cancer or breast cancer by performing an unsupervised two-way hierarchical clustering approach with TIGR MeV (Saeed et al., Methods Enzymol.
  • the sets of at least 2 predictive descriptors, which are preferably analyzed by qPCR analysis, are selected from a group of descriptors, which is indicative of disease-specific states and which is identifiable by unsupervised two-way hierarchical clustering of gene expression data for samples of said disease from different patients, by a process comprising at least the steps of:
  • a second list or subset of predictive descriptors, e.g. genes, which is empty in the beginning of the following procedure, g. computing the accuracy of predicting the desired disease-specific state for all single descriptors, e.g. genes, of the set separately by employing the above naive Bayes model,
  • select a number of descriptors, e.g. genes, with the best accuracy of step g) and add them to the second list or subset of predictive descriptors, e.g. genes, and take them out of the first list,
  • select a number of descriptors, e.g. genes, with the best accuracy of step h) and add them to the second list or subset of predictive descriptors, e.g. genes, and take them out of the first list,
  • repeat steps g) and j) until a desired accuracy is obtained, until said second list contains at least 2 descriptors for each disease-specific state, or until the prediction accuracy reaches a predefined threshold.
  • the number of descriptors, e.g. genes selected in steps g) and j) is at least one, two, or three.
  • descriptors of steps a. to d. are combined in a first list
  • the number of descriptors, e.g. genes to be combined in said first list should be at least 40, preferably at least 50, at least 60, at least 70, at least 80, at least 90, or at least 100.
  • a set of predictive descriptors for a given disease-specific state which is measurable by e.g. PCR and optionally qPCR, should comprise at least 2, at least 4, at least 8, at least 10, at least 12, at least 14, at least 16, or at least 20 descriptors, e.g. genes.
  • the present invention further relates to the above-mentioned method of diagnosing, prognosing, classifying, stratifying and/or screening a disease in a human or animal patient, which is suspected of being afflicted by said disease, by determining a measurable quantity, e.g. expression of a set of disease-specific state-predictive descriptors, e.g. genes, wherein expression of said set of at least 2 descriptors per disease-specific state is determined by qPCR analysis and wherein, based on the qPCR results, assignment of a disease-state for a sample of patient is calculated according to:
  • hyper proliferative diseases include renal cell cancer, breast cancer, ovarian cancer, colorectal cancer, lung cancer, prostate cancer, brain cancer, hepato cellular carcinoma, acute myeloma, pheochromocytoma, Burkitt's lymphoma or myeloma.
  • Particularly preferred hyper proliferative diseases are renal cell cancer and breast cancer.
  • a particularly preferred embodiment relates to the use of sets of predictive descriptors for diagnosing, prognosing, classifying, stratifying and/or screening renal cell cancer in a human or animal patient, which is suspected of being afflicted by said disease, the descriptor being selected from the genes of Table 10.
  • with the genes EPAS1, LAPTM4B, DOCK9, BDH1, AP2M1, LPCAT3, State A of renal cell cancer can be predicted with an accuracy of 85%.
  • with the genes DOCK9, CD99, BDH1, PGBD5, NDUFA4, LPCAT3, State B of renal cell cancer can be predicted with an accuracy of 93%.
  • with the genes LPCAT3, LAPTM4B, RGS5, SERINC3, F5, COBLL1, State C of renal cell cancer can be predicted with an accuracy of 76% (see also Table 11).
  • the present invention thus further relates to a combination of predictive descriptors for diagnosing, prognosing, classifying, stratifying and/or screening renal cell cancer in a human or animal patient, which is suspected of being afflicted by said disease, said descriptors being selected from Table 10.
  • the present invention relates to a method of identifying sets of predictive descriptors, e.g. genes, for a disease-specific state in a sample of a patient suffering from said disease which are suitable for diagnosing, prognosing, classifying, stratifying and/or screening a disease in a human or animal patient, comprising at least the steps of:
  • select a number of descriptors, e.g. genes, with the best accuracy of step h) and add them to the second list or subset of predictive descriptors, e.g. genes, and take them out of the first list,
  • compute the accuracy of predicting the desired disease-specific state for the combination of all descriptors, e.g. genes, in the second list or subset in combination with each single remaining descriptor, e.g. gene, of the first list,
  • select a number of descriptors, e.g. genes, with the best accuracy of step j) and add them to the second list or subset of predictive descriptors, e.g. genes, and take them out of the first list,
  • repeat steps i) and k) until a desired accuracy is obtained, until said second list contains at least 2 descriptors, e.g. genes, for each disease-specific state, or until the prediction accuracy reaches a predefined threshold.
  • the number of descriptors, e.g. genes selected in steps g) and j) is at least one, two, or three.
  • the sets of predictive descriptors in a sample of a patient may be analyzed by qPCR analysis.
  • the sets of predictive descriptors e.g. genes may be identified for a hyper proliferative disease.
  • a hyper proliferative disease is selected from the group comprising renal cell cancer, breast cancer, ovarian cancer, colorectal cancer, lung cancer, prostate cancer, brain cancer, hepato cellular carcinoma, acute myeloma, pheochromocytoma, Burkitt's lymphoma or myeloma.
  • Even more preferably, said hyper proliferative disease is renal cell cancer or breast cancer.
  • a set of predictive descriptors for a disease-specific state is identified comprising at least two, four, six, eight, 10, 12, 14, 16, 18, or 20 predictive descriptors for a discrete disease-specific state
  • the sets of descriptors may be identified in an extracorporeal sample of a patient.
  • the present invention also relates to combinations of predictive descriptors for diagnosing, prognosing, classifying, stratifying and/or screening renal cell cancer in a human or animal patient, which is suspected of being afflicted by said disease, being identifiable by the above method.
  • a two-step algorithm is applied: 1. In a first step the samples for the other tumor type such as colorectal cancer are classified according to the states found in the RCC, leading to a tissue-specific classification of the samples "Ts-C".
  • tissue-specific gene set Ts-G
  • each RCC sample can be characterized by a list of 100 values for the gene expression levels. This ordered list can be regarded as a vector in a 100-dimensional Hilbert space H. One could select more or fewer than 100 values. However, this number is a reasonable size that provides reliable results.
  • the set of genes, which build the base of this space are optimized such that an optimal clustering (with a maximal distance within this vector space given the metric in this space) of the states is given by this base.
  • Figure 1 shows this process of representation of the states in a higher-dimensional space, together with the identification of centers for the states (denoted by crosses in the sketch).
  • a further vector is defined by the centre of all samples, which could not have been covered by the previous mapping. This fourth group is denoted by "D".
  • the set of genes RCC-G can be taken to define a similar vector space for the tumor specific samples of another disease or cancer such as colorectal cancer.
  • the data for these other tumor specific samples such as colorectal cancer may come e.g. from microarray expression analysis.
  • the cancer related data may be publicly available expression data, for example from Affymetrix gene chip data, preferably for whole genome expression data. Not only this base is defined, moreover with the already calculated centers of the states one can now generate a tissue specific classification (Ts-C) by
  • tissue-specific classification is defined, illustrated in figure 2.
  • Ts-C tumor-specific classification
  • Each of the genes within this set is able to discriminate between one specific state and the other states, e.g. between "A" on the one hand, and "B", "C" and "D" on the other hand.
  • the tumor specific gene set relates to colorectal cancer.
  • the flowchart of the algorithm is sketched in Figure 3.
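The first step of the two-step algorithm, classifying samples of another tumor type by the state centers found in RCC, can be sketched as a nearest-center assignment. This is a sketch under stated assumptions: Euclidean distance is used as the metric (the text does not name one), two-dimensional toy vectors stand in for the 100 expression values, and the handling of the additional group "D" is left out. All names and numbers are illustrative.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equally long expression vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def classify(sample, centers):
    """Return the state whose center is closest to the sample vector."""
    return min(centers, key=lambda state: math.dist(sample, centers[state]))


# Toy RCC samples per state, expressed in the RCC-G gene space.
rcc_samples = {
    "A": [[0.0, 0.0], [0.0, 2.0]],
    "B": [[10.0, 10.0], [12.0, 10.0]],
}
centers = {state: centroid(vectors) for state, vectors in rcc_samples.items()}

# Hypothetical samples of another tumor type in the same gene space.
ts_c = [classify(sample, centers) for sample in ([1.0, 1.0], [9.0, 9.0])]
```

The resulting labels play the role of the tissue-specific classification Ts-C, from which a tissue-specific gene set could then be derived in the second step.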
  • the samples for the other tumor type such as breast cancer are classified according to the states found in the RCC, leading to a tissue-specific classification of the samples "Ts-C".
  • tissue-specific gene set Ts-G
  • each RCC sample can be characterized by a list of 100 values for the gene expression levels. This ordered list can be regarded as a vector in a 100-dimensional Hilbert space H. One could select more or fewer than 100 values. However, this number is a reasonable size that provides reliable results.
  • the set of genes, which build the base of this space are optimized such that an optimal clustering (with a maximal distance within this vector space given the metric in this space) of the states is given by this base. This defines the optimal set of genes RCC-G.
  • Figure 1 shows this process of representation of the states in a higher-dimensional space, together with the identification of centers for the states (denoted by crosses in the sketch).
  • a further vector is defined by the centre of all samples, which could not have been covered by the previous mapping. This fourth group is denoted by "D".
  • the set of genes RCC-G can be taken to define a similar vector space for the tumor specific samples of another disease or cancer such as breast cancer.
  • the data for these other tumor specific samples such as breast cancer may come e.g. from microarray expression analysis.
  • the breast cancer related data were publicly available expression data, for example from Affymetrix gene chip data, preferably for whole genome expression data.
  • Ts-C tissue specific classification
  • Ts-C tumor-specific classification
  • Each of the genes within this set is able to discriminate between one specific state and the other states, e.g. between "A" on the one hand, and "B", "C" and "D" on the other hand.
  • Ts-G tumor specific gene set
  • the tumor specific gene set relates to breast cancer.
  • the flowchart of the algorithm is sketched in Figure 3.
  • Tamoxifen is approved by the U.S. Food and Drug Administration to treat women diagnosed with estrogen-receptor(ER)-positive, early and late stage breast cancer after primary intervention (chemotherapy, radiation, surgery) to reduce the risk of recurrence of the cancer.
  • ER-positive breast cancers show a heterogeneous range of response rates suggesting a complex biology of these tumours.
  • DMFS: distant metastasis-free survival
  • breast cancer-specific state A is associated with a markedly prolonged time to disease progression.
  • Patients with other states or belonging to group E ("non-State A") show a clinical course that is indistinguishable from patients that did not receive medication as demonstrated with cohort II which had not been treated with Tamoxifen (Fig. 2B).
  • Data for the above patients can be retrieved by typing the patient identifiers into the GEO accession no. field at http://www.ncbi.nlm.nih.gov/geo/.
  • Affymetrix: a probe array consists of a number of oligonucleotide probe cells and each probe cell contains a unique oligonucleotide probe. Probes are tiled in probe pairs as a Perfect Match (PM) and a Mismatch (MM). The sequences of PM and MM are the same, except for a change to the Watson-Crick complement in the middle of the MM probe sequence.
  • a probe set consists of a series of probe pairs and represents an expressed transcript.
  • the sample should bind stronger to PM than to MM, so one assumes that for a given probe set for example 25% or more of the PM values should be higher than the MM values to be deemed valid.
  • the expression values of the remaining genes were then correlated with the patient states A, B, C.
  • the list of genes describing states A, B, and C is shown in Table 9.
  • in a next step one selected at least 2 genes, preferably at least 10, out of the remaining genes with the most positive correlation, and at least 2 genes, preferably at least 10, with the most negative correlation for a disease-specific state such as renal cancer-specific states A, B, or C. Further, it is possible to add at least 2, preferably at least 10, genes that are randomly selected and at least 2, preferably at least 10, genes showing the least variation across all the states.
  • the values are calculated, for example for state A, according to
  • the state of a sample is allocated by determining the maximum value of the individual subset probabilities. For example, to determine whether a sample is state A or not, the state SA is calculated by
  • $S_{ABC} = \max\left(P_{A,\mathrm{subset\,A}},\ P_{B,\mathrm{subset\,B}},\ P_{C,\mathrm{subset\,C}}\right)$
  • 3. select a number of genes with the best accuracy of step 2 and add them to the set of predictive genes, and take them out of the set.
  • the number of genes selected is at least one, two, or three.
  • 5. select a number of genes with the best accuracy of step 4 and add them to the set of predictive genes, and take them out of the set.
  • the number of genes selected is at least one, two, or three.
  • Table 12 shows the accuracy obtained with an increasing subset of genes, for predicting a single state (a) and to predict the state of a sample directly from a single set of measurements (b)
  • No refers to gene numbers as mentioned herein.
  • ProbeSetID refers to the identification number on the Affymetrix gene chip HT HG-U133A.
  • State refers to the respective renal cell cancer specific states.
  • the term “Mode” defines whether a gene has to be over- or under-expressed for state A, B, C or D.
  • “Invers” indicates under-expression and “normal” indicates over-expression relative to the value "limit value", which describes the value which is used as control to decide on over-expression or under-expression.
  • the term “Fit” describes the reliability of the limit value with a value of 0.5 indicating maximum reliability. The limit value will be put in the respective software, which is used for expression analysis, individually for each gene.
  • SEQ ID No. refers to SEQ ID No. of the sequence listing. Table 5

Abstract

The present invention describes the use of discrete states and signatures for classifying cancer samples, preferably renal cell cancer samples.

Description

Discrete states for use as biomarkers for cancers such as renal cell cancer
DISCRETE STATES FOR USE AS BIOMARKERS FOR RENAL CELL CANCER
BACKGROUND OF THE INVENTION
Before the advent of molecular biology and medicine, diseases have largely been classified on the basis of their phenotypic characteristics. This, of course, means that a disease can only be diagnosed when phenotypic characteristics become apparent which may occur at a rather late stage of disease development. Further, it is nowadays understood that similar phenotypes may result from different molecular mechanisms. A strictly phenotype-based therapy may therefore be useless if the therapeutic approach taken does not address the right underlying mechanism. As an example, breast cancer may develop by different molecular mechanisms which lead to the same appearance in terms of tumor formation. One such mechanism will involve up-regulation of Her2 while others will not. Therapy with the antibody Herceptin® which addresses over-expression of Her2 will therefore only help patients which are afflicted correspondingly. If one does not understand at least to some degree the molecular mechanisms underlying a disease, a chosen therapy may not prove effective.
Molecular biology and medicine therefore aim at deciphering the molecular basis of disease development. A better understanding of the molecular basis of a disease will help to detect imminent or ongoing disease development early on and will allow medical practitioners to adjust their therapy early on or to develop alternative treatment approaches. For example, if one knows that Herceptin® will be effective only in a specific group of patients, one can pre-select these patients and treat them accordingly. Further, if one realizes that different diseases result at least to some degree from the same mechanism, one can consider a drug, which has originally been developed for one disease only, also for treatment of the other diseases. This, of course, requires that molecular markers, which are frequently designated as biomarkers, are at hand being characteristic for the disease in question and relating to relevant mechanisms, relevant clinical endpoints and relevant criteria to select proper treatment. Such markers may be found on the DNA, the RNA or the protein level.
In the case of monogenetic diseases, using molecular markers as a diagnostic tool is relatively straightforward as one can use the aberration on the DNA level to predict whether the disease will develop with a certain probability or not. For example, trinucleotide expansions on the DNA level may be used to predict whether an individual will develop Huntington Chorea. Similarly, mutations in the Survival of Motor Neurons gene can be used to predict whether an individual will develop Spinal Muscular Atrophy.
Since the beginning of molecular understanding of tumor diseases there has been a desire to define molecular markers associated with tumorigenesis, malignancy, progression, metastasis formation, responsiveness to treatment, survival times and other functional properties important for clinicians and for the development of efficient therapies. A number of useful markers were identified, first of all pathological markers for the inspection of samples such as those derived from tissue sections (large sections, fine needle biopsies), body fluids, smears (blood, feces, sputum, urine) or hair samples. A number of markers were identified such as markers of inflammation or ongoing apoptosis, markers of metabolic properties or molecular markers derived from mechanistic understanding of tumor induction, induced by deregulated balances between oncogenes such as Ras, Myc, CDKs and tumor suppressor genes such as p16, p27 or p53 (see e.g. Hanahan & Weinberg in "The Hallmarks of Cancer" (2000)).
Specific understanding of tumor development mechanisms such as uncontrolled cellular growth, senescence and apoptosis evasion, such as extravasion, invasion, and evasion of immune responses have further accentuated the tumor suppressor gene hypothesis.
However, the vast majority of diseases such as hyper-proliferative diseases including cancers do not result from mono-genetic causes but are due to aberrant complex molecular interactions.
Cancer, for example, is considered as a prime example for multi-factorial diseases which arise from subtle to severe deregulation of complex molecular networks. In most cases, these diseases do not develop from a single gene mutation but rather result from the accumulation of mutations in various genes. Each single mutation may not be sufficient in itself to start disease development. Rather, accumulation of mutations over time seems to increasingly deregulate the complex molecular signaling networks within cells. In these cases, disease development has therefore usually been considered to be a gradual continuous process which cannot be characterized by key events. As a consequence thereof, it is commonly assumed that such diseases cannot be diagnosed or classified by a single biomarker but by a group of markers which ideally would reflect in a simplified manner the complex molecular mechanisms underlying the disease. Despite the large amount of molecular information available from many human cancers, current cancer research mainly still focuses on single, frequently altered chromosomal loci ideally harboring tumor type-specific biomarker candidates with drug target potential such as enhanced angiogenesis lead to the understanding of tumor promoting roles of the Her-receptor family and its ligands and related mutants. Some of those attempts indeed led to certain useful markers for the selection of tumor therapies (Herceptin® treatment for patients of amplified Her-2 receptors).
All these results mainly resulted from a maximum of expert knowledge. The general and common assumption is that tumors must be different from normal tissues due to above mentioned target expression. The majority of studies, often linked to pathologic parameters (such as tumor subtypes, grade or staging), therefore address their focus on the investigation of single targets. Even though their role in certain pathways and their binding partners may become evident in appropriate cell lines or mouse models their specific role as part of an entire network remains unclear.
The human genome project together with all its spin-off projects such as analysis of individual genome varieties between individuals or just individual cells affected by a disease, analyses of respective transcriptomes, proteomes etc. were assumed to directly provide a large variety of useful biomarkers. Interestingly, most of these approaches have tried again to link the phenotypic differences observed for disease with distinct molecular pathways.
There are e.g. a number of types and subtypes of diseases, obviously associated with some clearly differentiable markers on the level of e.g. organs such as lung cancer or prostate cancer or e.g. cell types. The common concept for identifying biomarkers is to link such phenotypes to distinct combinations of biomarkers which then allow diagnosing the specific subtype of disease, which displays the respective phenotype. Such approaches, for example, try to identify distinct proteome expression patterns for small cell lung cancer tissues or non-small cell lung cancer tissues of afflicted patients vs. healthy individuals and to then use such expression patterns to diagnose patients in the future. Another approach has recently been described for breast cancer (Nature (2012) 490, 61-70). Interestingly, these approaches frequently do not look at linking clinically relevant parameters such as survival time with markers. It is common to the above-described approaches that they try to link e.g. common histological classifications of tumors with expression profiling with the intention to find molecular patterns instead of the phenotypical characterizations.
However, the wealth and complexity of data have hindered clear cut identification of such patterns to some extent.
There is thus a continuing need for tools allowing classification of diseases on the molecular level and provision of biomarkers which can be used for e.g. diagnostic purposes.
SUMMARY OF THE INVENTION
It is one objective of the present invention to provide new types of markers and/or marker sets, which are suitable and specific for classifying diseases such as renal cell cancer, preferably with clear correlation to clinically or pharmacologically relevant endpoints. It is also an objective of the present invention to provide methods for detecting markers and/or marker sets which are suitable and effective for classifying diseases such as renal cell cancer or breast cancer, preferably with clear correlation to clinically or pharmacologically relevant endpoints. These and other objectives as they will become apparent from the ensuing description are attained by the subject matter of the independent claims. The dependent claims relate to some of the preferred embodiments of the invention.
The present invention provides a strategic and direct approach to global and functional biomarkers of clinical relevance for essentially all kinds of tumors, such as renal cell cancer or breast cancer, and potentially for non-tumor diseases, too. With the present finding that tumors such as renal cell cancer or breast cancer are associated with discrete stable or meta-stable states which can be of clinical relevance, one is now able to define methods allowing the skilled person not only to identify and prove the existence of such discrete states for any kind of tumor such as renal cell cancer or breast cancer, but also to assign to such states the descriptors and signatures associated with them. In addition, the technology allows identifying a minimum of those descriptors which unequivocally identify and discriminate each such discrete state from alternative states in a given tumor cell sample such as for renal cell cancer.
The understanding of such states also allows identifying those descriptors with a large dynamic range for quantitative measurement and ease of experimental access.
The invention is thus based on the surprising finding that diseases such as renal cell cancer or breast cancer can be characterized by discrete states, which reflect the underlying molecular mechanisms. Interestingly, these discrete states are distinct from one another so that disease development does not seem to be characterized by a continuous process. Rather, a discrete state seems to be maintained until a certain threshold level is reached when a switch to another discrete state occurs. Further, it seems that the discrete states may be linked to clinically and pharmacologically important parameters. However, they do not necessarily seem to coincide with standard histological classification schemes or other classification schemes.
Each discrete state can be described by way of different signatures. A signature is a pattern reflecting the qualitative and/or quantitative appearance of at least one descriptor. Preferably, a signature is a pattern reflecting the qualitative and/or quantitative appearance of multiple descriptors. Descriptors may in principle be any testable molecule, function, size, form or other parameter that can be linked to a cell. Descriptors may thus be e.g. genes or gene-associated molecules such as proteins and RNAs. The expression pattern of such molecules may define a signature. Such descriptors may also be designated as markers and marker sets.
These findings of the invention can be used for various diagnostic, prognostic and therapeutic purposes. They may also be used for research and development of new treatments for diseases such as hyper-proliferative diseases, with renal cell cancer being preferred.
In one aspect, the invention thus relates to at least one discrete disease-specific state for use as a diagnostic and/or prognostic marker in classifying samples from patients, which are suspected of being afflicted by renal cell cancer. The invention further relates to at least one discrete disease-specific state for use as a diagnostic and/or prognostic marker in classifying cell lines of renal cell cancer. The invention also relates to at least one discrete disease-specific state for use as a target for development, identification and/or screening of pharmaceutically active compounds.
As discrete renal cell cancer- or breast cancer-specific states may be determined by signatures, the invention in one embodiment relates to at least one signature for use as a diagnostic and/or prognostic marker in classifying samples from patients which are suspected to be afflicted by a disease such as renal cell cancer. The invention also relates to at least one signature for use as a diagnostic and/or prognostic marker in classifying cell lines of a disease such as renal cell cancer. The invention further relates to at least one signature for use as a read out of a target for development, identification and/or screening of pharmaceutically active compounds.
The invention also relates to sets of descriptors which have been found to be predictive for a given discrete disease-specific state such as a renal cancer- or breast cancer-specific state, and to methods of identifying such sets of predictive descriptors for all states currently known for a specific disease. These sets of predictive descriptors may relate to measurable properties such as expression determined by PCR, optionally by qPCR, for a set of genes which is then considered to be the set of predictive descriptors. It is disclosed herein that a set of at least 6 genes for each state of renal cell cancer may be sufficient to assign a patient to one of the three known states in renal cell cancer with an accuracy of at least 65%. The availability of sets of predictive descriptors for disease-specific states which are testable by PCR, optionally by qPCR, allows assigning disease-specific states in a straightforward and cost-efficient manner.
In some embodiments, the invention relates to methods of diagnosing a disease such as renal cell cancer or breast cancer by making use of signatures and discrete disease-specific states. The invention also relates to methods of determining the responsiveness of a test population suffering from a disease such as renal cell cancer or breast cancer towards a pharmaceutically active agent by making use of signatures and discrete disease-specific states. Further, the invention relates to methods of predicting the responsiveness of patients suffering from a disease such as renal cell cancer or breast cancer in clinical trials towards a pharmaceutically active agent by making use of signatures and discrete disease-specific states. The invention also relates to methods of determining the effects of a potential pharmaceutically active compound by making use of signatures and discrete disease-specific states.
Aside from the specific uses of discrete disease specific states and signatures, the invention also relates to methods for identifying signatures, discrete disease specific states and sets of predictive descriptors in samples which may be derived from patients or which may e.g. be cell lines.
As a read-out of such states and signatures, which are suitable to characterize hyper-proliferative diseases such as renal cell cancer, the present invention discloses specific sets of descriptors, properties of which may be used to determine whether a specific state is present within a disease such as renal cell cancer. These properties may e.g. be the expression patterns of the descriptors which are described hereinafter. The expression may be determined e.g. on the RNA or protein level. However, it is to be understood that the invention as described herein is not limited to these specific descriptors and descriptor sets. While determining the expression levels of the descriptors and descriptor sets as described hereinafter may provide a straightforward approach for classifying hyper-proliferative diseases such as renal cell cancer or breast cancer according to a new classification scheme, one can use different types of descriptors and read-outs to determine states. Methods for generally detecting states in hyper-proliferative diseases such as renal cancer, colorectal cancer etc. are described in PCT/EP2011/057691 and Beleut et al., BMC cancer (2012), 12:310.
All of these embodiments of the invention can be used in the context of diseases including hyper-proliferative diseases such as cancer and preferably in the context of renal cell cancer or breast cancer.
DESCRIPTION OF THE FIGURES
Figure 1 Identification of the RCC-G. Sketch of the states as vectors in a high-dimensional space of dimension 100. For simplification only three dimensions are shown. The states are clustered around some (imaginary) center, denoted in the sketch with a cross. Ideally, all vectors belonging to the same state - and only these - are located within a hypersphere around the center of the cluster.
Figure 2 Transfer to other tissues. Mapping of a sample of the new tissue to one of the states defined by RCC-G. The sample is represented by a 100-dimensional vector, given as P in this sketch. The distance of P to all vectors defined by RCC-G (namely "A", "B", "C" and "D") is calculated. The new sample is now mapped to that state which has the smallest distance to P. If all distances are too big, the sample is mapped to the new state of unknowns "E".
Figure 3 Flowchart of algorithm used in experiment 1.
Figure 4-7 Networks associated with states A, B, C and D as identified by the STRING (http://www.string-db.org/) software in experiment 2. The associated network of state A is depicted in Figure 4, of state B in Figure 5, of state C in Figure 6 and of state D in Figure 7.
Figure 8-11 Networks associated with states A, B, C and D as identified by the STRING (http://www.string-db.org/) software in experiment 4. The associated network of state A is depicted in Figure 8, of state B in Figure 9, of state C in Figure 10 and of state D in Figure 11.
Figure 12 Breast cancer-specific state-based patient stratification in combination defines responder cohort. A) State A is associated with significant response in estrogen receptor-positive patients (ER+) treated with Tamoxifen. B) In untreated ER+ patients no correlation with survival was attributable to any Cancer State.
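As a minimal sketch of the mapping outlined for the transfer to other tissues, the following Python example assigns a sample vector P to the state with the smallest distance and falls back to the state of unknowns "E" when every distance exceeds a cut-off. The use of Euclidean distance, the three-dimensional toy vectors, the cut-off value and the names `assign_state` and `max_distance` are assumptions made for illustration only; the figure description does not specify the distance measure or the cut-off, and the actual vectors have 100 dimensions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def assign_state(sample, state_vectors, max_distance):
    """Map a sample to the nearest state, or to "E" if all states are too far.

    state_vectors: dict mapping a state name to its reference vector.
    max_distance:  hypothetical cut-off above which a distance is "too big".
    """
    distances = {name: euclidean(sample, vec)
                 for name, vec in state_vectors.items()}
    nearest = min(distances, key=distances.get)
    if distances[nearest] > max_distance:
        return "E"  # state of unknowns
    return nearest


# Hypothetical three-dimensional stand-ins for the 100-dimensional vectors.
states = {
    "A": [1.0, 0.0, 0.0],
    "B": [0.0, 1.0, 0.0],
    "C": [0.0, 0.0, 1.0],
    "D": [1.0, 1.0, 1.0],
}

close_sample = [0.9, 0.1, 0.0]  # lies near the vector of state A
far_sample = [5.0, 5.0, 5.0]    # far from every state vector

print(assign_state(close_sample, states, max_distance=1.0))
print(assign_state(far_sample, states, max_distance=1.0))
```

In practice the cut-off would have to be derived from the spread of the vectors within each cluster, for example from the radius of the hypersphere around the cluster center.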
DETAILED DESCRIPTION OF THE INVENTION
The present invention as illustratively described in the following may suitably be practiced in the absence of any element or elements, limitation or limitations, not specifically disclosed herein. The present invention will be described with respect to particular embodiments and with reference to certain figures but the invention is not limited thereto but only by the claims. Terms as set forth hereinafter are generally to be understood in their common sense unless indicated otherwise. Where the term "comprising" is used in the present description and claims, it does not exclude other elements. For the purposes of the present invention, the term "consisting of" is considered to be a preferred embodiment of the term "comprising of". If hereinafter a group is defined to comprise at least a certain number of embodiments, this is also to be understood to disclose a group, which preferably consists only of these embodiments. Where an indefinite or definite article is used when referring to a singular noun, e.g. "a", "an" or "the", this includes a plural of that noun unless something else is specifically stated. Terms like "obtainable" or "definable" and "obtained" or "defined" are used interchangeably. This e.g. means that, unless the context clearly dictates otherwise, the term "obtained" does not mean to indicate that e.g. an embodiment must be obtained by e.g. the sequence of steps following the term "obtained" even though such a limited understanding is always included by the terms "obtained" or "defined" as a preferred embodiment.
In the context of the present invention the terms "about" or "approximately" denote an interval of accuracy that the person skilled in the art will understand to still ensure the technical effect of the feature in question. The term typically indicates deviation from the indicated numerical value of ±10%, and preferably of ±5%.
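Purely as an illustration of the numerical reading of "about", the following Python sketch checks whether a value lies within the typical (±10%) or preferred (±5%) deviation from an indicated value. The function name `within_about` and its `tolerance` parameter are hypothetical and not part of the description.

```python
def within_about(value, nominal, tolerance=0.10):
    """Return True if value deviates from nominal by at most the given fraction.

    tolerance=0.10 corresponds to the typical +/-10% interval,
    tolerance=0.05 to the preferred +/-5% interval.
    """
    return abs(value - nominal) <= tolerance * abs(nominal)


print(within_about(108.0, 100.0))        # checked against +/-10%
print(within_about(108.0, 100.0, 0.05))  # checked against +/-5%
```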
If reference is made in the context of the present invention to a "sample", this always preferably refers to an extracorporeal sample.
As mentioned above, previous attempts at finding diagnostic tools for disease characterization have assumed that disease development is a continuous process and have tried to link different, primarily histological, phenotypes of e.g. cancers such as lung cancer with specific expression patterns, assuming that the different detectable phenotypes reflect continuous and progressive disease development. Another example is renal cell cancer, where histological characterization has led to identification of clear cell, papillary and other types of renal cell cancer. The present invention does not use these standard approaches of the prior art. As described in PCT/EP2011/057691 and Beleut et al., BMC cancer (2012), 12:310, it has been found that different types of renal cell cancer such as clear cell and papillary renal cell cancer may rather be generalized by the concept of states, signatures and descriptors as described herein specifically for renal cell cancer. Unlike previous approaches, the present invention is based on the finding that diseases such as hyper-proliferative diseases seem to be comprehensively describable by a limited set of discrete disease-specific states which do not necessarily correlate with established histological characterization of different subtypes of such a hyper-proliferative disease but which may be linked to clinically relevant parameters such as survival time. The basic concepts of this approach are described in PCT/EP2011/057691 and Beleut et al., BMC cancer (2012), 12:310.
Without wanting to be bound to a specific scientific theory or expert knowledge, it is hypothesized that it is not the absolute expression of genes or gene groups forming individual pathways that is a measure for this novel biomarker, which is called "state" herein, but rather the overall profile of the global relation of all genes. Such a global relative constellation seems to describe a stable constellation of at least three such alternatives for the disease state "renal cell carcinoma" (RCC, see PCT/EP2011/057691). In principle, many expression products with their relative profiles define a state, and only by considering them in their entirety can the nature and reality of the overall cell constitution be monitored.
As mentioned, these states were first found in RCC, and in order to verify whether such a pattern reflects a mere statistical coincidence or marks a certain biological relevance, two questions were asked, namely whether such a pattern is predictive for a certain phenotype and whether the pattern itself is far away from the Gaussian distribution of patterns found by chance due to high combinatorial probability. The first question was positively answered as the three states could be assigned to different survival times. The second question was also positively answered as the majority of genes are involved in forming a specific profile of global relative expression ratios, with such homologous patterns arising in all tumors and, beyond all of them, being characteristic for the state of a disease. It was further shown that states found in RCC have homologous counterparts in other tumor types, set by homologous genetic expression relativity shared by tissue specific indicative genes. It is thus assumed that a disease is characterized by switching to discrete disease-specific states. This suggests that de-regulation of regulatory networks within a cell can occur up to a certain threshold level without the overall discrete state being affected. However, once the threshold level has been exceeded, cells seem to switch to another specific discrete state. These states can therefore be considered as stable or meta-stable in that they may allow for a certain degree of variation before they may switch. We understand a discrete state to reflect the flow and extent of interactions between and within different regulatory networks. As cells seem to switch to different discrete states, such a switch seems to indicate a major re-arrangement of the flow and extent of interactions between and within different regulatory networks, which may lead to a changed aggressiveness of a disease and which may also help explain why different discrete states can be linked to e.g. different average survival times.
The extent and flow of interactions between and within different regulatory networks may be detectable by e.g. the expression level of e.g. proteins within such regulatory networks either on the RNA or protein level. The molecular entities which are looked at can be designated as descriptors. The pattern which is detected for a set of descriptors can be considered as a signature. In the aforementioned example, the signature will be the expression pattern of proteins, which function as the descriptors. Of course, one may choose different types of descriptors and different types of signatures. One may thus look at expression levels of genes on the RNA level. One may look at the regulation of miRNAs, and one may even look at the qualitative distribution of descriptors such as the cellular localization of certain factors or the shape of a cell. One may use a given set of descriptors of the same type of molecules (e.g. mRNAs) to define signatures, with the different signatures reflecting e.g. different expression patterns, or one may use a given set of descriptors which are a group of different molecules (such as mRNAs, proteins and miRNAs). It is thus important to note that according to the invention's logic a discrete state can be correlated to different signatures. A single signature will, however, define one discrete state only.
As mentioned, the concept of disease-specific states which may be discernible from molecular expression patterns and which are not linked to e.g. histological classifications of a disease such as renal cell cancer was described in PCT/EP2011/057691 and Beleut et al., BMC cancer (2012), 12:310. The approach rests on identifying discrete disease-specific states, e.g. in renal cell cancer or breast cancer, by performing an unsupervised two-way hierarchical clustering approach with TIGR MeV (Saeed et al., Methods Enzymol. (2006), 411:134-193) using Euclidean distance and average linkage without incorporating any histological or pathological data or classifications. For renal cell cancer, this approach was thus applied across different tumor samples, regardless of whether the samples were papillary, clear cell or chromophobe renal cell cancers. Identification of disease-specific descriptors such as biomarkers may then be performed using SAM (Tusher et al., Proc Natl Acad Sci USA (2001) 98(9):5116-5121). This approach can be used to identify signatures and states not only for renal cell cancer, but also for breast cancer, colorectal cancer, lung cancer, etc. The approach is thus generally applicable by subjecting expression data obtained from different patients to this unsupervised two-way hierarchical clustering approach. Identification of signatures and states may be best performed by first extracting descriptors such as expressed genes for certain pathways using the Panther software as described in PCT/EP2011/057691 and Beleut et al., BMC cancer (2012), 12:310 and subjecting these pathway-specific sets of genes to unsupervised two-way hierarchical clustering. The groups of descriptors, e.g. the genes identified for the different pathways, may then be combined and again subjected to an unsupervised two-way hierarchical clustering approach against the same tumor sets. This two-fold unsupervised two-way hierarchical clustering will reveal in a straightforward manner whether a certain disease can be classified into different disease-specific states as described herein.
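The clustering step named above (unsupervised, two-way, Euclidean distance, average linkage) can be sketched in plain Python as follows. This is not the TIGR MeV implementation; it is a simplified illustration in which the function names, the toy expression matrix and the choice to stop merging at a fixed number of clusters are all assumptions. TIGR MeV produces full dendrograms for genes and samples, whereas this sketch only returns the flat groups obtained at a chosen cut.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(vectors, n_clusters):
    """Agglomerative clustering with Euclidean distance and average linkage.

    Returns a sorted list of clusters, each a sorted list of vector indices.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average linkage: mean of all pairwise member distances.
            total = sum(euclidean(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b])
            dist = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or dist < best[0]:
                best = (dist, a, b)
        _, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)


def two_way_clustering(matrix, n_gene_clusters, n_sample_clusters):
    """Cluster rows (genes) and columns (samples) of an expression matrix."""
    columns = [list(col) for col in zip(*matrix)]
    return (average_linkage(matrix, n_gene_clusters),
            average_linkage(columns, n_sample_clusters))


# Hypothetical toy expression matrix: rows are genes, columns are samples.
expression = [
    [9.0, 8.8, 1.0, 1.2],
    [8.5, 9.1, 0.9, 1.1],
    [1.0, 1.3, 7.9, 8.2],
    [0.8, 1.1, 8.4, 8.0],
]

gene_clusters, sample_clusters = two_way_clustering(expression, 2, 2)
print(gene_clusters)
print(sample_clusters)
```

No histological label enters the computation, which mirrors the unsupervised character of the approach: sample groups emerge from the expression values alone.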
The relevance of this approach and the states identified can be taken from the fact that the states identified for renal cell cancer correlate with survival time (see PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310) and that certain breast cancer patients react differently to tamoxifen treatment depending on their state (see hereinafter).
Thus, the unsupervised two-way hierarchical clustering approach, and preferably the two-fold application thereof as described in PCT/EP2011/057691 and Beleut et al., BMC cancer (2012), 12:310, allows identification of disease-specific states in different diseases such as renal cell cancer, breast cancer, ovarian cancer, colorectal cancer, lung cancer, prostate cancer, brain cancer, hepatocellular carcinoma, acute myeloma, pheochromocytoma, Burkitt's lymphoma, myeloma or Parkinson's disease. Further, the set of descriptors such as e.g. the set of expressed genes which can be used to distinguish the different states can be determined by this approach.
Based on these findings it is described hereinafter which set of descriptors can be used e.g. in an array expression analysis to determine the discrete disease-specific state in a patient's sample. However, as an array expression analysis is rather costly and time-consuming, it is also described hereinafter how a set of predictive descriptors such as a set of genes can be identified which, upon analysis by PCR, optionally by quantitative PCR (qPCR), allows assignment of a discrete disease-specific state in a patient sample, given that discrete disease-specific states exist for a certain disease, which are e.g. identifiable by the unsupervised two-way hierarchical clustering approach, and preferably by the two-fold application thereof as described in PCT/EP2011/057691 and Beleut et al., BMC cancer (2012), 12:310. Thus, the method of identifying sets of predictive descriptors requires that one knows that discrete disease-specific states exist for a given disease such as renal cell cancer or breast cancer and has an understanding of which descriptors may in principle be suitable for characterizing such a disease-specific state. This information may be obtainable by the unsupervised two-way hierarchical clustering approach, and preferably by the two-fold application thereof as described in PCT/EP2011/057691 and Beleut et al., BMC cancer (2012), 12:310. From the set of descriptors which are identifiable by this approach, one can then select the set of predictive descriptors for a given disease-specific state, which are e.g. testable by PCR, optionally by qPCR, following the selection criteria mentioned hereinafter.
It is to be understood that in order for a set of predictive descriptors, which are e.g. testable by PCR, optionally by qPCR, to be a reliable indicator for a disease-specific state, one needs a set of predictive descriptors for each discrete disease-specific state. Once such sets of predictive descriptors are available for a certain disease, one can then e.g. determine expression of these descriptors such as genes by PCR, optionally by qPCR, and assign the state for a given sample according to the calculations mentioned hereinafter. For a different aspect of the invention, it was thus hypothesized that one may identify further states, i.e. relative constellations of descriptor sets for other diseases such as colorectal cancer or breast cancer or even within renal cell cancer, and the present invention is concerned with describing further states and new signatures as well as sets of descriptors for characterizing renal cell cancer. As has been illustrated for renal cell carcinoma (RCC, see PCT/EP2011/057691), discrete disease-specific states can be linked to medically important parameters such as average survival times.
Interestingly, however, the discrete disease-specific states do not necessarily correlate with common histological classification schemes, meaning that e.g. papillary RCCs of different patients may be characterized by different discrete molecular states and that the patients may thus have different survival expectations even though their cancers have been classified as comparable by histological standards. It follows from the invention as laid out hereinafter that the same discrete state can be characterized through different signatures. Thus, a novel interpretation of renal cell cancer is suggested, based on the signatures described hereinafter. The finding that a hyper-proliferative disease such as renal cell cancer can be characterized by different discrete renal cell cancer-specific states has important implications.
The discrete disease-specific state(s) may be used to classify patients and samples thereof as falling within distinct groups. As the discrete renal cell cancer-specific state may moreover be linked to clinically important parameters such as survival time or responsiveness to distinct drugs, this will help in selecting therapeutic regimens. The discrete molecular state(s) may thus be used as diagnostic and/or prognostic markers providing a new way of classifying renal cell cancer into clinically relevant subgroups etc.
A lot of projects for the development of novel pharmaceuticals suffer from insufficient differentiation from existing therapies, non-conclusive statistical data or a need for enormously high numbers of patients in Phase II or Phase III, demanding multimillion dollar investments and extensive time periods. If, however, a drug can be shown to act preferentially only in a selected group of patients which suffer from e.g. a subtype of renal cell cancer or breast cancer and which are characterized by the same discrete disease-specific state of interacting molecular networks, then this drug may be tested in other patients which suffer from a different disease, but are characterized by the same discrete molecular state. Further, clinical trials which led to ambiguous results for a disease such as renal cell cancer or breast cancer may be reassessed by regrouping patients according to their state as described herein. It may turn out that this re-grouping will show that the drug is working in a subgroup of patients characterized by a common state. It can be expected that such clinical trials will give statistically reliable results for much smaller patient groups. In fact, one may be able to show that treatment is effective where large-scale clinical trials could not give such results because the large number of non-responders prevents any statistically meaningful interpretation of the results.
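The regrouping of trial patients according to their state can be illustrated with a short Python sketch. The patient records below are invented solely to show the mechanics and do not represent any trial result; the function name `response_rates_by_state` is likewise hypothetical.

```python
from collections import defaultdict


def response_rates_by_state(patients):
    """Compute the fraction of responders per state.

    patients: iterable of (state, responded) pairs, responded being a bool.
    """
    counts = defaultdict(lambda: [0, 0])  # state -> [responders, total]
    for state, responded in patients:
        counts[state][0] += int(responded)
        counts[state][1] += 1
    return {state: r / n for state, (r, n) in counts.items()}


# Invented records for illustration only: (assigned state, responder?).
patients = [
    ("A", True), ("A", True), ("A", True), ("A", False),
    ("B", False), ("B", False), ("B", False), ("B", False),
    ("C", False), ("C", False),
]

overall = sum(responded for _, responded in patients) / len(patients)
by_state = response_rates_by_state(patients)

print(overall)   # response rate across the whole, unstratified cohort
print(by_state)  # response rate within each state
```

In this toy cohort the overall response rate is diluted by the non-responders in states B and C, while the rate within state A alone is markedly higher. This is the effect the regrouping is meant to expose; whether it is statistically meaningful in a real trial would still require an appropriate test.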
The discrete states thus provide a stratifying tool for the testing of pharmacological treatments as it allows grouping of patients for clinical trials. Assuming a drug candidate is identified which is expected or hoped to positively influence the critical parameter of survival time in renal cell cancer substantially, this needs to be proven by clinical trials in order to receive FDA approval. Future drugs will likely focus on mechanistic intervention. If the mechanistically active drug is successful for the clinical end point parameter "survival time", it probably interacts selectively with mechanisms linked to the parameter "survival time". These mechanistic subgroups are exactly those defined by e.g. the discrete molecular states enabled by this invention. It is thus fair to believe, that most probably one subgroup of patients reacts positively to a different degree than another subgroup does. Knowledge of this patient cohort-specific imbalance is of utmost importance for the industry seeking approval for a drug, important to know for the physician to choose the optimum regimen and for the payers to spend money most efficiently on patients with promise of therapeutic success. Any definition of a subgroup reacting with maximum relative effect in terms of prolonged life expectancy may improve the chance for FDA registration.
The knowledge about discrete disease-specific states such as renal cell cancer- or breast cancer-specific states may also allow using these states as targets during development of pharmaceutical products. For example, different renal cell cancer-specific states may be linked to clinically relevant parameters such as survival time or response rate to a certain drug. If an agent is shown to switch the discrete disease-specific state in a sample or in a cell line from a state which is linked to short survival time into a state with long survival time, such a switch may be used as an indication that the agent may be therapeutically effective in treating the disease in question. Thus, assays can be designed which make use of the correlation between a discrete renal cell cancer-specific state and e.g. the associated clinical parameter. The fact that one now knows that e.g. discrete renal cell cancer- or discrete breast cancer-specific states exist and drive disease development in at least some of its aspects allows one to identify signatures of descriptors, which can then be used in a diagnostic test to classify renal cell cancer or breast cancer. These signatures of descriptors thus serve as a read-out for the classification of a disease or its subtype. A preferred read-out for signatures and states of renal cell cancer or breast cancer may be the expression of the descriptors and descriptor sets described herein. From a practical perspective, the read-out may be implemented in the form of ELISA assays, array technology, kits and all other types of devices and methods that allow determining expression of the descriptors and descriptor sets as described herein. The invention thus also relates to such kits, assays, arrays etc. as well as to the use of such kits, assays, arrays etc. as mentioned herein. In another aspect, the read-out for such states may be sets of predictive descriptors such as genes which can be tested by PCR, optionally by qPCR.
The assignment of a disease-specific state based on the PCR- or qPCR-measurements is then done based on the calculations described hereinafter.
The invention and its embodiments will now be described in greater detail. For a better understanding of the following definitions, a rough outline of the findings in the context of renal cell cancer is given.
It was found that the overall majority of renal cell cancers can be classified into four discrete disease-specific states. The discrete disease-specific states may reflect the aggressiveness of the tumor. The read-out for these four discrete molecular states, which are designated hereinafter as A, B, C and D, is e.g. the expression patterns, i.e. the signatures, of a limited set of descriptors, i.e. genes. Each state may be best described by a signature arising from properties of a group of descriptors, which may also be designated as a descriptor set, such as the expression pattern of the group of genes described in Tables 1, 2, 3, and 4. Each group of genes defines a descriptor set. The expression pattern of each group of genes further provides a signature which is indicative of a renal cell cancer-specific disease state. The expression pattern may be determined by different methods such as ELISA, Western Blotting or RNA expression analysis. It is to be understood that the nomenclature A, B and C refers to the same types of states as described in PCT/EP2011/057691, even though they are described by different signatures, namely in the present case by the expression pattern of different sets of descriptors.
It was further found that the overall majority of breast cancers can be classified into four discrete disease-specific states. The discrete disease-specific states may reflect the aggressiveness of the tumor. The read-out for these four discrete molecular states, which are designated hereinafter as A, B, C and D, is e.g. the expression patterns, i.e. the signatures, of a limited set of descriptors, i.e. genes. Each state may be best described by a signature arising from properties of a group of descriptors, which may also be designated as a descriptor set, such as the expression pattern of the group of genes described in Tables 1, 2, 3, and 4. Each group of genes defines a descriptor set. The expression pattern of each group of genes further provides a signature which is indicative of a breast cancer-specific disease state. The expression pattern may be determined by different methods such as ELISA, Western Blotting or RNA expression analysis.
We will now provide definitions useful to understand the present invention and will then discuss the invention in more detail. "State" means a stable or meta-stable constellation of a cell and/or cell population which is identifiable in at least two biological samples from at least two patients and which can be described by means of a single descriptor or multiple descriptors on the cellular or molecular level referenced against a standard state. As explained hereinafter, such state can be characterized by at least one or various signatures. Such signatures may be reflected by the expression of genes relative to each other.
By definition, different states refer to different stable and metastable constellations of a cell, meaning that these constellations are distinct from each other in terms of the kind and extent of molecules of at least two regulatory networks interacting within a cell. Different states can be characterized by a limited set of descriptors giving rise to different signatures. They may therefore also be designated a "discrete molecular state".
If a state is indicative of a disease, it may be designated as "disease specific molecular state" such as renal cell cancer-specific state. In certain instances, a disease specific state may be linkable to clinically relevant parameters such as survival rate, therapy responsiveness, and the like.
A state which can be found in healthy human or animal subjects may be designated as "healthy state".
The term discrete disease specific state preferably allows distinguishing different subtypes of a disease according to a new classification scheme which links the subtype being characterized by a discrete disease specific state to clinically or pharmacologically important parameters.
The terms "clinical or pharmacological relevant parameter" preferably relate to efficacy-related parameters as they will be typically analyzed in clinical trials. They thus do not necessarily relate to a change in the histological appearance of a disease, but rather to important clinical end points such as average survival time, progression-free survival times, responsiveness to a certain drug, subjective patient- or physician-rated improvements making use of established scale systems, tolerability, and adverse events. The terms also include responsiveness towards treatment.
"Descriptor" means a measurable parameter on the molecular or cellular level which can be detected in terms of, but not limited to existence, constitution, quantity, localization, co-localization, chemical derivative or other physical property. A descriptor thus reports at least one qualitative and/or quantitative measuring parameter of, but not limited to existence, kinetic variation, clustering, cellular localization or co-localization of at least one specific mRNA, processing or maturation derivatives of at least one specific mRNA, specific DNA-motifs, variants or chemical derivatives of such motifs, such as but not limited to methylation pattern, miRNA motifs, variants or chemical derivatives of such miRNA motifs, proteins or peptides, processing variants or chemical derivatives of such proteins or peptides or any combination of the foregoing.
By way of example, a descriptor may be a protein the over- or under-expression of which can be used to describe a discrete disease-specific state vs. a different discrete disease-specific state or vs. the discrete healthy state. If different proteins, i.e. different descriptors, are analyzed for their expression behavior, the observed over- and/or under-expression for this set of descriptors gives rise to a pattern, which may be designated as signature (see below). It is to be understood that different types of descriptors may be used to describe the same discrete state, substate and level. For example, a set of descriptors may comprise expression data for a first set of proteins, data on post-translational modifications of a second set of proteins and data for a group of miRNAs. In a preferred embodiment, the measurable parameter of a descriptor is the expression level of a protein and/or gene which may be determined e.g. on the protein and/or RNA level by methods known in the art such as Western Blotting, ELISA, immunoassays, Northern Blotting, array expression analysis etc.
Preferred descriptors include genes and gene-related molecules such as mRNAs or proteins.
The "qualitative" detection of a descriptor refers preferably to e.g. determining the localization of a descriptor such as a protein, an mRNA or miRNA within e.g. a cell. It may also refer to the size and/or the shape of a cell. The "quantitative" detection of a descriptor refers preferably to e.g. determining the presence and preferably the amount of a descriptor within a given sample.
In a preferred embodiment the quantitative measurement of a descriptor relates to detecting the amount of genes and gene-related molecules such as mRNAs or proteins.
The pattern resulting from the analysis of this combined set of descriptors will then be considered to be a signature.
"Signature" means a pattern of a set of at least two experimentally detectable and/or quantifiable descriptors with the pattern being a characteristic description for a discrete state. The term "diseases" relates to all types of diseases including hyper-proliferative diseases. The term reflects all stages of a disease, e.g. the formation of a disease including initial stages, the development of a disease including the spreading of a disease, the stages of manifestation, the maintenance of a disease, the surveillance of a disease etc. Examples of diseases include Parkinson disease, Alzheimer disease, etc.
The term "hyper-proliferative" diseases relates to all diseases associated with the abnormal growth or multiplication of cells. A hyper-proliferative disease may be a disease that manifests as lesions in a subject. Hyper-proliferative diseases include benign and malignant tumors of all types, but also diseases such as hyperkeratosis and psoriasis.
Tumor diseases include cancers such as lung cancer (including non small cell lung cancer), kidney cancer, bowel cancer, head and neck cancer, colo(rectal) cancer, glioblastoma, breast cancer, prostate cancer, skin cancer, melanoma, non Hodgkin lymphoma and the like. Other cancers include ovarian cancer, hepatocellular carcinoma, acute myeloid leukemia, pheochromocytoma, Burkitt's lymphoma and melanoma. In the context of the present invention, a preferred hyper-proliferative disease is renal cell cancer. In particular, cancers considered are as defined according to the International Classification of Diseases in the field of oncology (see http://en.wikipedia.org/wiki/carcinoma). Such cancers include epithelial carcinomas such as epithelial neoplasms; squamous cell neoplasms including squamous cell carcinoma; basal cell neoplasms including basal cell carcinoma; transitional cell papillomas and carcinomas; adenomas and adenocarcinomas (glands) including adenoma, adenocarcinoma, linitis plastica, insulinoma, glucagonoma, gastrinoma, vipoma, cholangiocarcinoma, hepatocellular carcinoma, adenoid cystic carcinoma, carcinoid tumor, prolactinoma, oncocytoma, Hurthle cell adenoma, renal cell carcinoma, Grawitz tumor, multiple endocrine adenomas, endometrioid adenoma; adnexal and skin appendage neoplasms; mucoepidermoid neoplasms; cystic, mucinous and serous neoplasms including cystadenoma, pseudomyxoma peritonei; ductal, lobular and medullary neoplasms; acinar cell neoplasms; complex epithelial neoplasms including Warthin's tumor, thymoma; specialized gonadal neoplasms including sex cord-stromal tumor, thecoma, granulosa cell tumor, arrhenoblastoma, Sertoli-Leydig cell tumor; paragangliomas and glomus tumors including paraganglioma, pheochromocytoma, glomus tumor; nevi and melanomas including melanocytic nevus, malignant melanoma, melanoma, nodular melanoma, dysplastic nevus, lentigo maligna melanoma, sarcoma and mesenchymal derived cancers, superficial spreading melanoma and acral lentiginous malignant melanoma.
The term "sample" typically refers to a human or individual that is suspected to suffer from e.g. a hyper-proliferative disease. Such individuals may be designated as patients. Samples may thus be tissue, cells, saliva, blood, serum, etc. The term "cell lines" will designate cell lines which are either primary cell lines which were developed from patients' samples or which are typically considered to be representative for a certain type of hyper-proliferative disease. It is to be understood that all methods and uses described herein in one embodiment may be performed with at least one step and preferably all steps outside the human or animal body. If it is therefore e.g. mentioned that "a sample is obtained" this means that the sample is preferably provided in a form outside the human or animal body, i.e. as an extracorporeal sample.
It is further to be understood that the term sample, tissue etc. in the context of the present invention preferably relates to renal cell cancer tissue or breast cancer tissue. It will be first described how signatures can be identified in accordance with the invention. It is to be understood that a signature will be indicative of a discrete disease-specific state.
In principle, signatures and discrete disease-specific states can be identified by analyzing for the quality and/or quantity of descriptors from at least two different regulatory networks for a multitude of samples from either patients of a hyper-proliferative disease such as renal cell cancer or cell lines of a hyper-proliferative disease such as renal cell cancer as was described in PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310 for renal cell carcinoma. This data is then analyzed for certain patterns by (i) grouping the data for the quality and/or quantity across descriptors and (ii) grouping samples or cell lines in a second step for similarities of the quality and/or quantity of descriptor across all potential descriptors. The unsupervised two-way hierarchical clustering approach based on genes selected from different signal transduction pathways, which is described in PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310 for identifying states, signatures and descriptors of renal cell carcinoma, is incorporated for the purposes of identifying states, signatures and descriptors of renal cell cancer, breast cancer or other disease. An alternative approach of using specific algorithms for existing expression data is also described in the experimental section of PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310. Further, the present invention describes yet another algorithm based approach for identifying states, signatures and descriptors such as the expression patterns of distinct groups of genes for renal cell cancer, breast cancer or other diseases. This method has led to identification of different sets of descriptors of states A, B and C known from PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310 and a new state D. This method may, however, also be applied to other tumors such as breast cancer. It is to be understood that the overall group of analyzed descriptors (such as the expression of all genes) does not necessarily have to yield different signatures. Thus a chosen set of descriptors may only yield one signature. This will thus indicate that the disease examined has only one discrete disease-specific state. Of course, this assumes that the analysis has been performed with a comprehensive set of samples covering all relevant types of a disease such as renal cell cancer.
Of course, the overall group of descriptors may also yield multiple signatures such as 2, 3, 4, 5 or more signatures. The number of signatures will indicate the number of discrete disease-specific states that can be observed on this level of resolution for a disease. For example, if one analyzes a comprehensive set of samples for renal cell cancer or breast cancer and identifies e.g. four signatures, this means that renal cell cancer or breast cancer can be characterized by four discrete disease-specific states. For each state, one may then select one signature and thus one set of descriptors that allows the respective state to be determined. For the present invention, tables 1, 2, 3 and 4 for example describe sets of descriptors (genes and proteins), the expression of which can be used to determine whether the renal cell cancer of a particular sample and thus e.g. patient is characterized by state A, B, C or D. For the present invention, tables 5, 6, 7 and 8 for example further describe sets of descriptors (genes and proteins), the expression of which can be used to determine whether the breast cancer of a particular sample and thus e.g. patient is characterized by state A, B, C or D.
It is further important to understand that a given signature will unequivocally relate to a discrete disease-specific state. However, a discrete disease-specific state may be described through multiple signatures depending on what type and combination of descriptors have been used for identifying the signatures. In general, one can identify groups by grouping samples according to the similarity of a parameter which is attributable to a descriptor (such as expression) over a complete set or over a subset of genes or gene-associated molecules, wherein the similarity is preferably measured using a statistical distance measure such as Euclidean distance, Pearson correlation, Spearman correlation, or Manhattan distance.
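The grouping of samples by similarity of expression can be illustrated with a minimal Python sketch. It is not the clustering procedure of PCT/EP2011/057691 or of the present application; it only shows the four similarity measures named above and a simple threshold-based (single-linkage style) grouping. The sample names, expression values and the threshold are hypothetical and chosen purely for illustration.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Manhattan (city-block) distance between two expression profiles."""
    return sum(abs(a - b) for a, b in zip(x, y))

def pearson(x, y):
    """Pearson correlation coefficient (a similarity, not a distance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Ranks starting at 1; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def group_samples(profiles, distance, threshold):
    """Put samples in the same group when a member is closer than `threshold`."""
    groups = []
    for name, profile in profiles.items():
        close = [g for g in groups
                 if any(distance(profile, profiles[m]) < threshold for m in g)]
        merged = [name]
        for g in close:
            merged = g + merged
            groups.remove(g)
        groups.append(merged)
    return groups

# Hypothetical expression values for three descriptors in four samples.
profiles = {
    "sample_1": [1.0, 2.0, 9.0],
    "sample_2": [1.5, 2.5, 8.5],
    "sample_3": [9.0, 8.0, 1.0],
    "sample_4": [8.5, 7.5, 1.5],
}
print(group_samples(profiles, euclidean, threshold=2.0))
```

A correlation can be turned into a distance for the same grouping function, for example with `lambda x, y: 1 - pearson(x, y)`; which measure and threshold are appropriate depends on the data and is left open here.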
However, as the approaches which were used in PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310 and which rely e.g. on two-way hierarchical clustering, as well as the algorithm-based approaches described in PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310 and herein make use of parameters that are easily accessible and testable on a large scale (e.g. expression on the RNA or protein level), they provide an important tool to identify the number of discrete disease-specific states for a given resolution as well as to identify signatures describing these states.
Once one has identified a number of signatures for a set of descriptors such as by the above-described methods one can further reduce the number of descriptors, which are necessary to distinguish best between different signatures.
To this end, one may analyze samples for which one knows the disease specific states from the above analysis for descriptors that allow the best differentiation of different discrete disease specific states. These descriptors do not necessarily have to be those which led initially to the identification of discrete disease specific stages.
For example, once one has identified discrete specific states for disease-specific samples such as tumor samples by the aforementioned methods making e.g. use of expression data for genes, one may analyze samples for which one knows the discrete disease specific states for expression across all approximately 24,000 genes. One can then select the genes which are most differentially regulated between the samples of different discrete disease specific states and may use these expression patterns as signatures. This sort of analysis may be performed by micro array expression analysis. It is to be understood that the term "genes" in the context of the tables refers to the probes on the Affymetrix gene chip. Tables 1, 2, 3, and 4 (for renal cell cancer) and Tables 5, 6, 7, and 8 (for breast cancer) name the Probe Identifiers which allow a clear identification. Where a DNA or amino acid sequence is known for a Probe Identifier, this has been indicated. All statements hereinafter which relate to tables 1, 2, 3, 4, 5, 6, 7, and 8 preferably only include those genes where the DNA and/or amino acid sequence is known. Where only the DNA sequence is known, it is straightforward to also identify the amino acid sequence of the respective gene. If the expression of a gene is mentioned herein, this should also always refer to the expression of the respective protein or fragments thereof.
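The selection of the most differentially regulated genes between samples of known states can be sketched as follows. This is a simplified illustration under stated assumptions, not the selection algorithm of the invention: probes are ranked by the difference between the mean expression in one state and the mean expression in all other samples. The probe names, sample names and values are hypothetical; a real analysis would typically use a statistical test with correction for multiple testing, which the text above does not specify.

```python
from statistics import mean

def top_differential_probes(expression, state_of, state, n):
    """Rank probes by |mean in `state` minus mean in all other samples|.

    expression: probe id -> {sample id -> expression value}
    state_of:   sample id -> previously allocated state label
    Returns the n highest-ranked (probe id, signed difference) pairs.
    """
    differences = {}
    for probe, values in expression.items():
        inside = [v for s, v in values.items() if state_of[s] == state]
        outside = [v for s, v in values.items() if state_of[s] != state]
        differences[probe] = mean(inside) - mean(outside)
    ranked = sorted(differences, key=lambda p: abs(differences[p]), reverse=True)
    return [(p, differences[p]) for p in ranked[:n]]

# Hypothetical log-scale expression values for three probes in four samples.
expression = {
    "probe_1": {"s1": 8.0, "s2": 8.2, "s3": 3.0, "s4": 3.2},
    "probe_2": {"s1": 5.0, "s2": 5.1, "s3": 5.0, "s4": 4.9},
    "probe_3": {"s1": 2.0, "s2": 2.2, "s3": 7.0, "s4": 7.4},
}
state_of = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}

for probe, difference in top_differential_probes(expression, state_of, "A", 2):
    direction = "over-expressed" if difference > 0 else "under-expressed"
    print(probe, direction, "in state A")
```

The sign of the difference gives the direction of regulation, so the selected probes together with their signs form a candidate expression pattern for the state in question.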
It is to be noted that the invention wherever it mentions methods of identifying discrete disease-specific states, signatures etc. always considers that the quality and/or quantity of descriptors has to be tested. This testing may include technical means such as use of e.g. micro-arrays to determine expression of genes. If the invention considers applying such methods by relying on and using data which are indicative of the quality and/or quantity of descriptors and which are deposited in e.g. databases after they have been determined using technical means, these methods will be run on technical devices such as a computer. All methods as they are described herein for identifying discrete disease-specific states, signatures etc. may therefore be performed in a computer-implemented way.
In the following, we will set forth in detail that signatures and discrete renal cell cancer- or breast cancer-specific states can be used for diagnostic, prognostic, analytical and therapeutic purposes. These aspects will be discussed in parallel for discrete renal cell- and breast cancer-specific states and signatures as if these terms were interchangeable. It has, however, to be borne in mind that a discrete renal cell cancer- or breast cancer-specific state can be described through various signatures, depending on the type and combinations of descriptors chosen. If in the following the term signature is used this is thus meant to incorporate all signatures and descriptor types that can be used to describe a single discrete renal cell cancer- or breast cancer-specific state. Further, all embodiments, which are discussed for signatures, equally apply to discrete renal cell cancer- or breast cancer-specific states. It should be borne in mind that where the term state and/or signature is used in the context of any of the methods and uses described hereinafter, this will as a preferred embodiment include reference to the expression pattern of the descriptor sets of tables 1, 2, 3, and 4 (for renal cell cancer) and of tables 5, 6, 7, or 8 (for breast cancer).
The invention as mentioned relates to discrete disease-specific states such as discrete renal cell cancer-specific states for use as a diagnostic and/or prognostic marker in classifying samples from patients, which are suspected of being afflicted by a disease, optionally by a hyper-proliferative disease such as renal cell cancer or breast cancer. The invention also relates to discrete disease-specific states such as discrete renal cell cancer-specific states for use as a diagnostic and/or prognostic marker in classifying cell lines of a disease, optionally of a hyper-proliferative disease such as renal cell cancer or breast cancer. The invention further relates to discrete disease-specific states such as discrete renal cell cancer- or breast cancer-specific states for use as a target for development of pharmaceutically active compounds. It should be borne in mind that where the term state and/or signature is used in the context of any of the methods and uses described hereinafter, this will as a preferred embodiment include reference to the expression pattern of the descriptor sets of tables 1, 2, 3, and 4 (for renal cell cancer) and of tables 5, 6, 7, or 8 (for breast cancer).
The invention also relates to signatures for use as a diagnostic and/or prognostic marker in classifying samples from patients, which are suspected of being afflicted by a disease, optionally by a hyper-proliferative disease such as renal cell cancer or breast cancer wherein the signature comprises a qualitative and/or quantitative pattern of at least one descriptor and wherein the signature is indicative of a discrete disease-specific state such as a discrete renal cell cancer- or discrete breast cancer-specific state. As for states, the invention also relates to signatures for use as a diagnostic and/or prognostic marker in classifying cell lines of a disease, optionally of a hyper-proliferative disease such as renal cell cancer or breast cancer wherein the signature comprises a qualitative and/or quantitative pattern of at least one descriptor and wherein the signature is indicative of a discrete disease-specific state such as a discrete renal cell cancer- or discrete breast cancer-specific state. Further, the invention relates to signatures for use as a read out for a target in the development, identification and/or application of pharmaceutically active compounds, wherein the signature comprises a qualitative and/or quantitative pattern of at least one descriptor and wherein the signature is indicative of a discrete disease-specific state such as a discrete renal cell cancer- or discrete breast cancer-specific state. The target may be the discrete disease specific state which is reflected by the signature. It should be borne in mind that where the term state and/or signature is used in the context of any of the methods and uses described hereinafter, this will as a preferred embodiment include reference to the expression pattern of the descriptor sets of tables 1, 2, 3, and 4 (for renal cell cancer) or of tables 5, 6, 7, or 8 (for breast cancer).
The discrete disease-specific states such as discrete renal cell cancer- or discrete breast cancer-specific states and signatures relating thereto can be used for diagnostic purposes. Thus, samples of patients suffering from a disease such as a hyper-proliferative disease, e.g. renal cell cancer or breast cancer, may be analyzed for their discrete disease-specific states and classified accordingly. The importance of discrete disease-specific states for classifying samples and thus for diagnosing patients becomes clear from the experiments on RCCs as described in PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310.
In the following certain embodiments of the invention pertaining to renal cell cancer and breast cancer are discussed separately.
Renal cell cancer
The present invention provides further evidence that the discrete renal cell cancer-specific states A, B, C and D as reflected by the expression pattern of the descriptors of tables 1, 2, 3 and 4 (see also Experiment 2) are indeed biologically relevant. It was assumed that potential differences, possibly representing functional or metabolomic irregularities among states, might become evident when the best state descriptors for each such state are analyzed by means of bioinformatics according to functional, known and predicted protein-protein interactions. To this end STRING (http://www.stringdb.org/) was used for functional classification of the 100 best state descriptors. The software's standard settings were used to search for multiple names, correlated to Homo sapiens and clustered according to the software's "confidentiality" parameter. As an output, associated networks and interacting proteins were identified, which suggest a functional relevance and point to a distinct biology in the respective tumors.
The present invention in one aspect thus relates to a method of diagnosing, stratifying and/or screening a hyper-proliferative disease such as renal cell cancer in at least one patient which is suspected of being afflicted by said disease, or in at least one cell line of said disease, comprising at least the steps of:
a. Providing a sample of a human or animal individual which is suspected of being afflicted by said disease;
b. Testing said sample for a signature;
c. Allocating a discrete disease-specific state to said sample based on the signature determined in step b.).
The sample may be a tumor sample of renal cell cancer.
There may be different ways to test for a signature. If the signature is not known yet, one may identify it as described above. If the signature is already known, one can test for it by analyzing the quality and/or quantity of descriptors that were used for identification of the signature. One can also use optimized signatures which allow best differentiation between different states. If for example the signature is based on expression data for a set of given genes or gene-associated molecules such as RNAs or proteins, one can test for a signature by simply determining the expression pattern for this set of molecules. This may be done by standard methods such as by micro-array expression analysis. One way of determining a signature is to test for the expression pattern of the descriptor sets of tables 1, 2, 3 and 4. If the descriptor sets show an expression profile as described below, one can allocate a signature and thus state A, B, C or D to the respective sample. If one has identified the signature, one also knows the discrete disease specific state which correlates with this signature. Using such methods one can thus classify patient samples by common molecular mechanisms that lead to the same discrete disease specific molecular states.
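Steps b. and c. of the method above, i.e. testing a sample for a known signature and allocating a state, can be sketched in Python as follows. The sketch assumes, purely for illustration, that each state is described by a set of genes with an expected direction of regulation (+1 for over-expression, -1 for under-expression relative to a reference) and that the sample is given as signed relative expression values. The gene names, reference patterns and the scoring rule are hypothetical and do not reproduce the descriptor sets of tables 1, 2, 3 and 4.

```python
def allocate_state(sample, signatures):
    """Allocate the state whose expected up/down pattern best matches the sample.

    sample:     gene -> expression relative to a reference (positive = higher)
    signatures: state -> {gene -> +1 (over-expressed) or -1 (under-expressed)}
    Returns the best-matching state and the fraction of matching genes per state.
    """
    scores = {}
    for state, pattern in signatures.items():
        matches = sum(
            1 for gene, direction in pattern.items()
            if gene in sample and sample[gene] * direction > 0
        )
        scores[state] = matches / len(pattern)
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical descriptor sets, one per state.
signatures = {
    "A": {"GENE_1": +1, "GENE_2": -1},
    "B": {"GENE_3": +1, "GENE_4": -1},
}
# Hypothetical relative expression values measured in one sample.
sample = {"GENE_1": 1.8, "GENE_2": -0.9, "GENE_3": -0.2, "GENE_4": 0.4}

state, scores = allocate_state(sample, signatures)
print(state, scores)
```

In practice one would also require a minimum score before allocating any state, so that samples matching none of the signatures are reported as unclassified; that cut-off is a design choice not taken from the text.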
Thus, the invention preferably relates in one embodiment to identifying discrete disease specific states and preferably discrete renal cell cancer-specific states by analyzing a hyper-proliferative disease such as renal cell cancer for signatures being indicative of discrete disease specific states as described above. This analysis will be performed for a specific type of hyper-proliferative disease such as e.g. renal cell cancer. Thus, the diseases may be identified by common selection criteria such as the organs being affected. However, initially no attention will be given to sub-classifications of these hyper-proliferative diseases, which are based on e.g. histological classification schemes. Once one has identified different discrete disease specific states for a disease like e.g. RCC, lung cancer, breast cancer, or as in the present case renal cell cancer, etc, one can test samples as described above for ongoing disease development already at a point in time when no phenotypic changes are recognizable. The discrete disease specific state therefore usually allows one to directly predict which sub-type of the disease in question is developing (e.g. state A, B, C or D for renal cell cancer, RCC, lung cancer (see also PCT/EP2011/057691)). These subtypes are correlated with e.g. clinically relevant parameters such as survival time. Thus, the term discrete disease specific state preferably allows distinguishing different subtypes of a disease according to a new classification scheme, which links the subtype to clinically or pharmacologically important parameters. The finding of the present invention that discrete disease specific states exist in diseases and can be correlated with subtypes that are characterized not necessarily by their histological properties but by clinically or pharmacologically relevant parameters thus allows deciphering disease through a new code which is based on the discrete disease specific states, substates and levels.
The knowledge that discrete disease-specific states exist e.g. in renal cell cancer can also be used to stratify patient cohorts undergoing clinical trials for new treatments of renal cell cancer. As mentioned herein, certain pharmaceutically active agents may act only on specific discrete disease-specific states. If a patient cohort which undergoes a clinical trial with such an active agent consists mainly of individuals with other discrete renal cell cancer-specific states, any effects of the pharmaceutically active agent on the specific discrete renal cell cancer-specific state may not be discernible. Such effects may become, however, statistically significant if the patient cohort is grouped according to the discrete renal cell cancer-specific states. Thus, the knowledge on the existence of discrete renal cell cancer-specific states can be used to stratify test populations undergoing clinical trials according to their discrete renal cell cancer-specific states.
The classification of samples, be it of patients or cell lines for hyper-proliferative diseases such as renal cell cancer, for their discrete disease specific states has further implications. Given that discrete disease specific states seem to reflect decisive stages of the underlying molecular disease mechanisms, they can be linked to relevant clinical and pharmacological parameters such as average survival times or responsiveness to drugs. This means that analyzing samples of patients for their respective discrete disease specific molecular states does not only allow diagnosing the type of the disease at an early point in time but also makes a prognosis possible as to the future course of the disease. Thus, one will early know whether a patient suffers from e.g. renal cell cancer and whether this renal cell cancer will be an aggressive or comparatively moderate form. This prognosis can then be used for therapeutic purposes when making decisions as to the kind of medication, physical treatment or surgery.
Further, the possibility of assigning a discrete disease-specific state to samples allows analyzing the effectiveness of treatments with specific drugs. For example, one can test a patient or a population of patients suffering from a hyper-proliferative disease for (i) their reaction towards treatment with a pharmaceutically active agent and (ii) for their discrete disease specific molecular state. The reaction towards treatment may be measured by e.g. the quality and quantity of clinical improvement. One can then try to correlate such responders towards treatment with discrete disease specific states. If it turns out that patients for which the disease is characterized by a specific discrete disease specific state react more favorably towards treatment, these patients show a higher responsiveness towards treatment. As shown in PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310, pharmaceutically active compounds may effect a change of the state. If one disease-specific state is correlated with a more preferable clinically relevant parameter, knowledge about the present disease-specific state of a patient and its susceptibility towards changing into a disease-specific state, which is correlated with a more preferable clinically relevant parameter, upon treatment is, of course, of major importance from a medical perspective.
The invention in one aspect thus relates to a method of determining the responsiveness of at least one human or animal individual which is suspected of being afflicted by a hyper-proliferative disease, preferably by renal cell cancer, towards a pharmaceutically active agent comprising at least the steps of:
a. Providing a sample of at least one human or animal individual which is suspected of being afflicted by said disease before the pharmaceutically active agent is administered;
b. Testing said sample for a signature;
c. Allocating a discrete disease-specific state to said sample based on the signature determined;
d. Determining the effect of a pharmaceutically active compound on the disease symptoms and/or the discrete disease-specific state in said individual;
e. Identifying a correlation between the effects on disease symptoms and/or the discrete disease-specific state and the initial discrete disease-specific state of the sample.
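Step e. above, together with the stratification argument made earlier, can be illustrated with a minimal Python sketch that tabulates the response rate per initial state and compares it with the rate in the unstratified cohort. The patient records are invented for illustration only and are not trial data; a real evaluation would add a statistical test of the association, which is not shown here.

```python
from collections import defaultdict

def response_rate_by_state(records):
    """Fraction of responders per initial state.

    records: iterable of (initial state, responded) pairs, one per patient.
    """
    counts = defaultdict(lambda: [0, 0])  # state -> [responders, patients]
    for state, responded in records:
        counts[state][1] += 1
        if responded:
            counts[state][0] += 1
    return {state: r / total for state, (r, total) in counts.items()}

def overall_response_rate(records):
    """Fraction of responders in the unstratified cohort."""
    records = list(records)
    return sum(1 for _, responded in records if responded) / len(records)

# Hypothetical cohort: state allocated before treatment, response observed after.
records = [
    ("A", True), ("A", True), ("A", False),
    ("B", False), ("B", False),
    ("C", True), ("C", False), ("C", False), ("C", False),
]

print("unstratified:", overall_response_rate(records))
for state, rate in sorted(response_rate_by_state(records).items()):
    print("state", state, rate)
```

In this invented cohort the unstratified rate is one third, while the rate within state A is two thirds; a difference of this kind between states is what the correlation in step e. is meant to expose.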
The signature may be tested for as described above. The sample may be a tumor sample such as renal cell cancer. One way of determining a signature is to test for the expression pattern of the descriptor sets of tables 1, 2, 3 and 4. If the descriptor sets show an expression profile as described below, one can allocate a signature and thus state A, B, C or D to the respective sample. Being able to predict the responsiveness of e.g. patients with a discrete disease specific state towards treatment is helpful in many aspects. For example, if such responsiveness is known, one can pre-select patients for treatment. Identification of signatures and discrete disease specific states can thus serve as companion diagnostics, which allow pre-selecting patients for effective treatment. Tools for identifying patients that will respond to a particular treatment become more and more important with public health systems requiring such tests in order to reimburse expensive therapies. Being able to predict whether a specific group of patients which is characterized by their discrete disease specific states will react favorably towards a specific pharmaceutically active agent is also important for other areas. For example, a lot of drugs receive their initial marketing authorization from regulatory agencies such as the FDA for a specific indication only. Frequently, one then tries to test whether such drugs are also effective for treating other diseases. Such clinical trials are, however, extremely costly.
If one knew upfront that only patients with a specific discrete disease specific state have reacted positively towards a specific drug and if one now tests this drug for other diseases, one will be able to conduct such clinical trials with a significantly smaller patient group by selecting only patients with the discrete disease specific profile which has shown a positive response when patients with the same state were tested albeit for a different disease. These clinical trials will not only be less costly in view of the smaller test population, they are also likely to lead to a positive outcome as the effects of the treatment may be more pronounced and thus more easily discernible by statistical methods as the signal-to-noise ratio will be improved.
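The argument about smaller trial populations can be made concrete with a hedged sketch. It uses the standard normal-approximation formula for comparing two proportions, which is a textbook formula and not taken from the present description, and it assumes that only patients in one discrete state respond, so that the effect in an unselected population is diluted by the fraction of patients in that state. All numerical values are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def patients_per_arm(p_control, p_treated, alpha=0.05, power=0.8):
    """Approximate patients per arm needed to detect p_treated vs. p_control
    (two-sided test, normal approximation for two proportions)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_treated - p_control) ** 2)

def diluted_response(p_control, p_treated_in_state, state_fraction):
    """Response rate in an unselected population when only the given
    fraction of patients (those in the responsive state) benefits."""
    return p_control + state_fraction * (p_treated_in_state - p_control)

# Hypothetical values: 20% response without benefit, 50% in the responsive
# state, and 25% of all patients being in that state.
p_control, p_state, fraction = 0.20, 0.50, 0.25

unselected = patients_per_arm(p_control, diluted_response(p_control, p_state, fraction))
selected = patients_per_arm(p_control, p_state)
print("per arm, unselected:", unselected)
print("per arm, state-selected:", selected)
```

Under these assumptions the state-selected trial needs far fewer patients per arm, which is the improved signal-to-noise ratio described above.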
Being able to predict the responsiveness to a treatment also forms part of the prognostic aspects of the invention. The invention in one embodiment thus relates to a method of predicting the responsiveness of at least one patient which is suspected of being afflicted by a hyper-proliferative disease, preferably by renal cell cancer, towards a pharmaceutically active agent comprising at least the steps of:
a. Determining whether a correlation exists between effects on disease symptoms and/or discrete disease-specific states and the initial discrete disease-specific states as a consequence of administration of a pharmaceutically active agent as described above;
b. Testing a sample of a human or animal individual which is suspected of being afflicted by a disease, optionally by a hyper-proliferative disease for a signature;
c. Allocating a discrete disease-specific state to said sample based on the signature determined;
d. Comparing the discrete disease-specific state of the sample in step c. vs. the discrete disease-specific state for which a correlation has been determined in step a.);
e. Predicting the effect of a pharmaceutically active compound on the disease symptoms in said patient.
The signature may be tested for as described above. The sample may be a tumor sample such as renal cell cancer. One way of determining a signature is to test for the expression pattern of the descriptor sets of tables 1, 2, 3 and 4. If the descriptor sets show an expression profile as described below, one can allocate a signature and thus state A, B, C or D to the respective sample.
The finding that diseases such as hyper-proliferative diseases are characterized by discrete disease specific states also allows new approaches for development and/or identification of new therapeutically active agents.
As mentioned above, samples from patients can be characterized as to their discrete disease specific states. Further, cell lines of diseases may also display such discrete disease specific states. It is assumed that a pharmaceutically active agent towards which a patient with a discrete disease specific state is responsive may in some instances induce a switch to another discrete disease specific state (see in this respect PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310). This other discrete disease specific state may either be a completely new discrete disease specific state or it may be a discrete disease specific state which has been found in other patients. For example, a pharmaceutically active agent may induce a switch from a discrete disease specific state which is correlated with low average survival times to a discrete disease specific state which is correlated with a longer average survival time. The discrete disease specific states and signatures relating thereto may be identified as described above.
If indeed a pharmaceutically active agent is capable of inducing a switch of discrete disease specific states, one can use discrete disease specific states and the signatures relating thereto as a read-out parameter for the potential effectiveness of pharmaceutically active agents. The target on which the pharmaceutically active agent would act is thus the discrete disease specific state. The discrete disease specific states are thus considered to be targets of pharmaceutically active agents.
The invention in one embodiment therefore relates to a method of determining the effects of a pharmaceutically active compound, comprising at least the steps of:
a. Providing a sample of at least one human or animal individual which is suspected of being afflicted by a hyper-proliferative disease, preferably by renal cell cancer, or a cell line of said disease before a pharmaceutically active agent is applied;
b. Testing said sample or cell line for a signature;
c. Allocating a discrete disease-specific state to said sample or cell line based on the signature determined;
d. Testing said sample or cell line for a signature after the pharmaceutically active agent is applied;
e. Allocating a discrete disease-specific state to said sample or cell line based on the signature determined;
f. Comparing the discrete disease-specific states identified in steps c.) and e.).
The signature may be tested for as described above. The sample may be a tumor sample such as renal cell cancer. One way of determining a signature is to test for the expression pattern of the descriptor sets of tables 1, 2, 3 and 4. If the descriptor sets show an expression profile as described below, one can allocate a signature and thus state A, B, C or D to the respective sample.
The effects that are determined by this method may e.g. allow identification of compounds which may have a positive influence on the disease if e.g. a switch to a discrete disease specific state correlated with a more favorable clinical parameter such as increased survival time is observed. The methods may, however, also allow identification of toxic compounds if these compounds induce a switch to a discrete disease specific state correlated with a less favorable clinical parameter such as decreased survival time. These methods may thus be used as assays in the development, identification and/or screening of potential pharmaceutically active compounds, e.g. to determine the potential effectiveness of a pharmaceutically active compound in a disease such as a hyper-proliferative disease. These assays may also be used for determining the toxicity of a pharmaceutically active compound.
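The comparison of states before and after application of an agent can be sketched as follows. The sketch assumes the ordering of renal cell cancer states by average survival time that this description gives for states A (high), B (intermediate) and C (low); state D is left unranked because no survival figure is stated for it. The function name and the returned labels are hypothetical.

```python
# Higher rank = longer average survival time, per the description of
# states A, B and C. State D is deliberately not ranked.
SURVIVAL_RANK = {"A": 3, "B": 2, "C": 1}

def classify_switch(state_before, state_after):
    """Label the change of discrete state observed after applying an agent."""
    if state_before == state_after:
        return "no switch"
    if state_before not in SURVIVAL_RANK or state_after not in SURVIVAL_RANK:
        return "switch, outcome unknown"
    if SURVIVAL_RANK[state_after] > SURVIVAL_RANK[state_before]:
        return "favorable switch"
    return "unfavorable switch"

print(classify_switch("C", "A"))  # candidate active compound
print(classify_switch("A", "C"))  # candidate toxic compound
print(classify_switch("B", "D"))  # cannot be ranked without survival data for D
```

Used as a screening read-out, such a function would flag any compound that switches the state, irrespective of its molecular target.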
Such discrete state-related assay systems for active and/or toxic drug candidates could be of enormous value for identifying new pharmaceuticals. With the reasonable assumption that certain discrete states of a tumor are not just indicative of the status of being a hyper-proliferating cell but are also related e.g. to the aggressiveness of a tumor or the survival time of a patient, the switch in state, monitored by a switch in signature, provides an interesting screening system as a general "read out" for a change in tumor status. The "read out" is thus related to functional efficacy rather than to the blocking of a certain molecular target that is not necessarily related to tumor function. Such a screening system would simply pick up any compound switching the state irrespective of the molecular target of interaction. Such screening resembles assays interfering with virus propagation in cell cultures rather than screening for inhibitors of a certain viral enzyme such as reverse transcriptase.
On the other hand, such assays could be indicative of the tumorigenicity of compounds that turn a status characteristic of a healthy cell into a status characteristic of a hyper-proliferative cell. The present invention in general thus relates to states, signatures and descriptors for use in diagnosing, stratifying, screening and prognosing human or animal individuals being suspected of suffering from or suffering from renal cell cancer. The present invention further relates to immunoassays, kits, arrays, and other types of equipment which allow determining the state of human or animal individuals being suspected of suffering from or suffering from renal cell cancer. The signature may be tested for as described above. The sample may be a tumor sample such as renal cell cancer. One way of determining a signature is to test for the expression pattern of the descriptor sets of tables 1, 2, 3 and 4. If the descriptor sets show an expression profile as described below, one can allocate a signature and thus state A, B, C or D to the respective sample.
The present invention thus also relates to a microarray comprising specifically the sets of descriptors of tables 1, 2, 3 and 4 either alone or in combination. The array comprises preferably at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15 or at least 20 descriptors of table 1, 2, 3, and/or 4. The present invention also relates to an immunoassay or ELISA kit allowing for determining expression of specifically the sets of descriptors of tables 1, 2, 3 and 4 either alone or in combination. The immunoassay or ELISA kit comprises preferably at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15 or at least 20 descriptors of table 1, 2, 3, and/or 4.
The concept of discrete disease-specific states, e.g. the characterization of diseases by the overall expression profile of genes relative to each other, was described for the first time in PCT/EP2011/057691. Based on the understanding and data described therein, it was assumed that such discrete disease-specific states exist also for other diseases and in particular for other hyper-proliferative diseases such as cancer and that such discrete disease-specific states can be correlated with biological read-outs such as e.g. survival time. Based on these assumptions and existing expression data for renal cell cancer, computer-implemented, algorithm based approaches were undertaken to identify sets of genes which allow characterization of renal cell cancer by four discrete disease-specific states, which for the purposes of the present invention are designated as "A", "B", "C" or "D".
These computer-implemented, algorithm based approaches which are described in the following led to the identification of approximately 400 genes depicted in tables 1 to 4. The expression patterns of genes 1 to 100 (table 1) can be used to distinguish between the discrete renal cell cancer specific states A vs. BCD. The expression patterns of genes 101 to 200 (table 2) can be used to distinguish between the discrete renal cell cancer specific states B vs. ACD. The expression patterns of genes 201 to 300 (table 3) can be used to distinguish between the discrete renal cell cancer specific states C vs. ABD. The expression patterns of genes 301 to 399 (table 4) can be used to distinguish between the discrete renal cell cancer specific states D vs. ABC. In the following, the implications of these results are set forth. Then, the computer-implemented, algorithm based approaches are explained in further detail.
As mentioned, the expression pattern of about 400 genes, which are listed in tables 1, 2, 3 and 4, can be used to unambiguously identify the four discrete renal cell cancer specific states, which for the sake of nomenclature have been named A, B, C and D herein.
More precisely, if genes 1 to 100 ("normal") of table 1 are found to be over-expressed for a sample of a human or animal individual, the individual will be characterized as having the discrete renal cell cancer specific state A. If genes 101 to 185 of table 2 are found to be under-expressed ("invers") and if genes 186 to 200 ("normal") of table 2 are found to be over-expressed for a sample of a human or animal individual, the individual will be characterized as having the discrete renal cell cancer specific state B. If genes 201 to 300 of table 3 are found to be over-expressed ("normal") for a sample of a human or animal individual, the individual will be characterized as having the discrete renal cell cancer specific state C. If genes 301 to 399 of table 4 are found to be under-expressed ("invers") for a sample of a human or animal individual, the individual will be characterized as having the discrete renal cell cancer specific state D. Expression levels may be determined using the Affymetrix gene chips HG-U133A, HG-U133B, HG-U133_Plus_2, etc. The decision as to whether a certain gene in a specific sample is over- or under-expressed will be taken in comparison to a control. This control is either implemented in the software, or an overall median or mean across measurements is calculated. By employing a multitude of samples, it is also conceivable to calculate a median and/or mean for each gene individually. Relative to these results, a respective gene expression value is classified as up- or down-regulated. In case of Affymetrix gene chip expression analysis, one may rely on the "limit value" of tables 1, 2, 3 and 4 for making a decision as to over- or under-expression. The limit value will be entered individually for each gene into the respective software which is used for expression analysis.
If other methods are used (such as ELISA or Western Blot analysis), the decision as to whether a respective gene is over- or under-expressed is made with respect to a control level which will be specific for the respective detection method and which is determined typically with respect to a value typical for healthy tissue.
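The allocation rule described above for gene-chip data can be sketched as follows. Each descriptor is represented by a gene identifier, a limit value and a direction ("normal" for expected over-expression, "invers" for expected under-expression). The gene identifiers, limit values and the majority threshold `min_fraction` are hypothetical placeholders and are not taken from tables 1 to 4; the sketch is one possible way of combining several descriptors, not the method used to derive the tables.

```python
def is_hit(value, limit, direction):
    """True if the measured value matches the expected direction."""
    if direction == "normal":   # over-expression expected
        return value > limit
    if direction == "invers":   # under-expression expected
        return value < limit
    raise ValueError(f"unknown direction: {direction!r}")

def allocate_state(sample, descriptors, min_fraction=0.5):
    """sample: {gene: expression value}
    descriptors: {state: [(gene, limit, direction), ...]}

    Returns the state whose measured descriptors match best, or None if no
    state exceeds min_fraction (inconclusive result).
    """
    scores = {}
    for state, genes in descriptors.items():
        measured = [(g, lim, d) for g, lim, d in genes if g in sample]
        if not measured:
            continue
        hits = sum(is_hit(sample[g], lim, d) for g, lim, d in measured)
        scores[state] = hits / len(measured)
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] > min_fraction else None

# Hypothetical descriptors and limit values, for illustration only.
descriptors = {
    "A": [("GENE_A1", 100.0, "normal"), ("GENE_A2", 250.0, "normal")],
    "B": [("GENE_B1", 80.0, "invers"), ("GENE_B2", 120.0, "normal")],
    "C": [("GENE_C1", 300.0, "normal")],
    "D": [("GENE_D1", 60.0, "invers")],
}
sample = {
    "GENE_A1": 40.0, "GENE_A2": 90.0,
    "GENE_B1": 30.0, "GENE_B2": 400.0,
    "GENE_C1": 120.0, "GENE_D1": 200.0,
}
state = allocate_state(sample, descriptors)
print(state)
```

For other detection methods such as ELISA, the same structure could be used with method-specific control levels in place of the limit values.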
It is to be understood that the renal cell cancer signatures as they are defined by the expression patterns of the genes of tables 1, 2, 3 and 4 reflect the outcome of a statistical analysis across multiple samples.
For the methods of diagnosis, prognosis, stratification, determining responsiveness etc. and the other uses as described herein, one will usually test samples obtained from an individual. On the individual level, the expression level of even a single gene of tables 1, 2, 3 and 4 may be sufficient to allocate a discrete renal cell cancer specific state. However, if inconclusive results are reached or if one wants to increase the reliability of allocation, one will usually analyze the expression pattern of more than one gene of genes 1 to 100, 101 to 200, 201 to 300 and/or 301 to 399 of tables 1, 2, 3 and 4. Typically one will analyze the expression pattern of at least about 2, at least about 3, at least about 4, at least about 5, at least about 6, at least about 7, at least about 8, at least about 9, at least about 10, at least about 11, at least about 12, at least about 13, at least about 14, at least about 15, at least about 16, at least about 17, at least about 18, at least about 19 or at least about 20 genes of genes 1 to 100, 101 to 200, 201 to 300 or 301 to 399 of tables 1, 2, 3, or 4 to decide on whether the discrete renal cell cancer specific state being labeled herein as A, B, C or D is present or not. The reliability of the determination, of course, increases if more than one gene is analyzed with respect to its expression. The analysis of the expression of at least 10 genes will usually be sufficient to assign a discrete renal cell cancer-specific state with a reliability of at least about 90%.
The analysis of the expression pattern of at least 5 genes of genes 1 to 100 of table 1 will usually allow deciding whether state A is present with a reliability of about 80% or more. This reliability will increase if more genes are analyzed. Thus, the analysis of the expression pattern of at least 10 genes of genes 1 to 100 of table 1 will usually allow deciding whether state A or state BCD is present with a reliability of about 90% or more. The analysis of the expression pattern of at least 15 genes of genes 1 to 100 of table 1 will usually allow deciding whether state A or state BCD is present with a reliability of about 95% or more. The analysis of the expression pattern of at least 20 genes of genes 1 to 100 of table 1 will usually allow deciding whether state A or state BCD is present with a reliability of about 98% or more and the analysis of the expression pattern of at least 25 genes of genes 1 to 100 of table 1 will usually allow deciding whether state A or state BCD is present with a reliability of about 99% or more.
The analysis of the expression pattern of at least 5 genes of genes 101 to 200 of table 2 will usually allow deciding whether state B is present with a reliability of about 80% or more. This reliability will increase if more genes are analyzed. Thus, the analysis of the expression pattern of at least 10 genes of genes 101 to 200 of table 2 will usually allow deciding whether state B or state ACD is present with a reliability of about 90% or more. The analysis of the expression pattern of at least 15 genes of genes 101 to 200 of table 2 will usually allow deciding whether state B or state ACD is present with a reliability of about 95% or more. The analysis of the expression pattern of at least 20 genes of genes 101 to 200 of table 2 will usually allow deciding whether state B or state ACD is present with a reliability of about 98% or more and the analysis of the expression pattern of at least 25 genes of genes 101 to 200 of table 2 will usually allow deciding whether state B or state ACD is present with a reliability of about 99% or more.
The analysis of the expression pattern of at least 5 genes of genes 201 to 300 of table 3 will usually allow deciding whether state C is present with a reliability of about 80% or more. This reliability will increase if more genes are analyzed. Thus, the analysis of the expression pattern of at least 10 genes of genes 201 to 300 of table 3 will usually allow deciding whether state C or state ABD is present with a reliability of about 90% or more. The analysis of the expression pattern of at least 15 genes of genes 201 to 300 of table 3 will usually allow deciding whether state C or state ABD is present with a reliability of about 95% or more. The analysis of the expression pattern of at least 20 genes of genes 201 to 300 of table 3 will usually allow deciding whether state C or state ABD is present with a reliability of about 98% or more and the analysis of the expression pattern of at least 25 genes of genes 201 to 300 of table 3 will usually allow deciding whether state C or state ABD is present with a reliability of about 99% or more.
The analysis of the expression pattern of at least 5 genes of genes 301 to 399 of table 4 will usually allow deciding whether state D is present with a reliability of about 80% or more. This reliability will increase if more genes are analyzed. Thus, the analysis of the expression pattern of at least 10 genes of genes 301 to 399 of table 4 will usually allow deciding whether state D or state ABC is present with a reliability of about 90% or more. The analysis of the expression pattern of at least 15 genes of genes 301 to 399 of table 4 will usually allow deciding whether state D or state ABC is present with a reliability of about 95% or more. The analysis of the expression pattern of at least 20 genes of genes 301 to 399 of table 4 will usually allow deciding whether state D or state ABC is present with a reliability of about 98% or more and the analysis of the expression pattern of at least 25 genes of genes 301 to 399 of table 4 will usually allow deciding whether state D or state ABC is present with a reliability of about 99% or more.
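The reliability figures stated in the preceding paragraphs follow the same pattern for each of the four tables and can be summarized as a simple lookup. The sketch below only encodes the approximate values given in the text ("about 80% or more" and so on); the function names are hypothetical, and no interpolation between the stated steps is attempted.

```python
# (minimum number of genes analyzed, approximate reliability) as stated above.
RELIABILITY_STEPS = [(5, 0.80), (10, 0.90), (15, 0.95), (20, 0.98), (25, 0.99)]

def stated_reliability(n_genes):
    """Approximate lower bound on reliability for n_genes analyzed genes,
    or None if fewer than 5 genes are analyzed (no figure is stated)."""
    result = None
    for min_genes, reliability in RELIABILITY_STEPS:
        if n_genes >= min_genes:
            result = reliability
    return result

def genes_needed(target_reliability):
    """Smallest stated gene count reaching the target, or None if the
    target exceeds the highest stated reliability."""
    for min_genes, reliability in RELIABILITY_STEPS:
        if reliability >= target_reliability:
            return min_genes
    return None

print(stated_reliability(12))
print(genes_needed(0.95))
```

Such a lookup could be used, for example, to choose how many descriptors to place on an array for a desired reliability.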
The set of about 4x100 genes of tables 1, 2, 3 and 4 thus serves as a reservoir for the unambiguous characterization of states A, B, C and D. By analyzing the expression behavior of e.g. at least about 2, at least about 3, at least about 4, at least about 5, at least about 6, at least about 7, at least about 8, at least about 9, at least about 10, at least about 11, at least about 12, at least about 13, at least about 14, at least about 15, at least about 16, at least about 17, at least about 18, at least about 19 or at least about 20 genes of genes 1 to 400, one will be able to decide (i) whether a patient suffers from renal cell cancer and (ii) whether the patient suffers from cancer of state A or any of the other states B, C or D.
The present invention thus relates to a signature, which can be derived from the expression pattern of at least about 2, at least about 3, at least about 4, at least about 5, at least about 6, at least about 7, at least about 8, at least about 9, at least about 10, at least about 11, at least about 12, at least about 13, at least about 14, at least about 15, at least about 16, at least about 17, at least about 18, at least about 19 or at least about 20 genes of genes 1 to 400 of tables 1, 2, 3 and 4. This signature will allow one to unambiguously decide whether one of four discrete renal cell cancer specific states, namely state A, B, C or D, is present.
The signature for A is defined by an over-expression of genes 1 to 100 of table 1. The signature for B is defined by an under-expression of genes 101 to 185 and an over-expression of genes 186 to 200 of table 2. The signature for C is defined by an over-expression of genes 201 to 300 of table 3. The signature for D is defined by an under-expression of genes 301 to 399 of table 4. It is to be understood that the survival rates which have been allocated to states A, B and C in PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310 equally apply to the states A, B and C as mentioned herein. Thus, the signature described herein for state A indicates an RCC type with a high average survival time where about 70 to about 90% such as about 80% of patients can be expected to live after 60 months. Preferably, the presence of this signature will be indicative of a discrete disease-specific state in RCC, which is indicative of an intermediate average survival time where about 60 to about 80% such as about 70% of patients can be expected to live after 90 months.
The signature described herein for state B indicates an RCC type with an intermediate average survival time where about 45 to about 55% such as about 50% of patients can be expected to live after 60 months. Preferably, the presence of this signature will be indicative of a discrete disease-specific state in RCC, which is indicative of an intermediate average survival time where about 40 to about 50% such as about 45% of patients can be expected to live after 90 months.
The signature described herein for state C indicates an RCC type with a low average survival time where e.g. about 30% to about 45% such as about 40% of patients can be expected to live after 60 months. Preferably, the presence of this signature will be indicative of a discrete disease-specific state in RCC, which is indicative of an intermediate average survival time where about 5 to about 30% such as about 10% to 20% of patients can be expected to live after 90 months.
The present invention also relates to the above signatures for use as a diagnostic and/or prognostic marker in the context of renal cell cancer. By determining whether the signatures are present, one can take a decision as to whether a patient suffers from renal cell cancer as such and/or will likely develop renal cell cancer as such in the future. Further, one can distinguish between degrees of aggressiveness of renal cell cancer development and adjust therapy accordingly. Further, the present invention relates to the above signatures for use in stratifying test populations for clinical trials for treatment of renal cell cancer. It is to be understood that determining the expression pattern of genes 1 to 400 of tables 1, 2, 3 and 4 by microarray expression analysis as described is one of various options even though it can be preferred. However, it is also contemplated to perform such expression analysis on the protein level by e.g. ELISA, immunoassay and/or Western blotting. It is further to be understood that all methods of expression analysis are preferably conducted on renal cell cancer tissue.
Further, the present invention relates to the above signatures for use as a read out of a target for development, identification and/or screening of at least one pharmaceutically active compound in the context of renal cell cancer as described above.
The present invention also relates to the above signatures for use in stratifying human or animal individuals which are suspected to suffer from ongoing or imminent renal cell cancer development. Stratification allows grouping these individuals by their discrete renal cell cancer specific states. Potential pharmaceutically active compounds which are assumed to be effective in renal cell cancer treatment can thus be analyzed in such pre-selected patient groups.
The present invention in one embodiment also relates to a method of diagnosing, prognosing, stratifying and/or screening renal cell cancer in at least one human or animal patient, which is suspected of being afflicted by said disease, comprising at least the steps of:
a. Providing a sample of a human or animal individual being suspected to suffer from renal cell cancer;
b. Testing said sample for a signature indicative of a discrete renal cell cancer specific state by determining expression of at least 1 gene, preferably at least 4 genes, more preferably of at least 5, 6, 7, 8, 9 or 10 genes of genes 1 to 100, 101 to 200, 201 to 300, 301 to 399 of tables 1, 2, 3, and/or 4;
c. Allocating a discrete renal cell cancer specific state to said sample based on the signature determined in step b.).
Further, the present invention in one embodiment relates to a method of determining the responsiveness of at least one human or animal individual, which is suspected of being afflicted by renal cell cancer, towards a pharmaceutically active agent comprising at least the steps of:
a. Providing a sample of a human or animal individual being suspected to suffer from renal cell cancer before the pharmaceutically active agent is administered;
b. Testing said sample for a signature indicative of a discrete renal cell cancer specific state by determining expression of at least 1 gene, preferably of at least 4 genes, more preferably of at least 5, 6, 7, 8, 9 or 10 genes of genes 1 to 100, 101 to 200, 201 to 300, 301 to 399 of tables 1, 2, 3, and/or 4;
c. Allocating a discrete renal cell cancer-specific state to said sample based on the signature determined in step b.);
d. Determining the effect of the pharmaceutically active agent on the disease symptoms and/or discrete renal cell cancer-specific states in said individual;
e. Identifying a correlation between the effects on disease symptoms and/or discrete renal cell cancer-specific states and the initial discrete renal cell cancer-specific state of the sample as determined in step c.).
In yet another embodiment, the invention relates to a method of predicting the responsiveness of at least one patient which is suspected of being afflicted by renal cell cancer, towards a pharmaceutically active agent comprising at least the steps of:
a. Determining whether a correlation between effects on disease symptoms and/or discrete renal cell cancer-specific states and the initial discrete renal cell cancer-specific state as a consequence of administration of a pharmaceutically active agent exists by using the above method;
b. Testing a sample of a human or animal patient which is suspected of being afflicted by renal cell cancer for a signature indicative of a discrete renal cell cancer specific state by determining expression of at least 1 gene, preferably of at least 4 genes, more preferably of at least 5, 6, 7, 8, 9 or 10 genes of genes 1 to 100, 101 to 200, 201 to 300, 301 to 399 of tables 1, 2, 3, and/or 4;
c. Allocating a discrete disease-specific state to said sample based on the signature determined in step b.);
d. Comparing the discrete renal cell cancer-specific state of the sample in step c. vs. the discrete renal cell cancer-specific state for which a correlation has been determined in step a.);
e. Predicting the effect of a pharmaceutically active compound on the disease symptoms in said patient.
One embodiment of the invention relates to a method of determining the effects of a potential pharmaceutically active agent for treatment of renal cell cancer, comprising at least the steps of:
a. Providing a sample of a human or animal individual being suspected to suffer from renal cell cancer before a pharmaceutically active agent is applied;
b. Testing said sample for a signature indicative of a discrete renal cell cancer specific state by determining expression of at least 1 gene, preferably of at least 4 genes, more preferably of at least 5, 6, 7, 8, 9 or 10 genes of genes 1 to 100, 101 to 200, 201 to 300, 301 to 399 of tables 1, 2, 3, and/or 4;
c. Allocating a discrete renal cell cancer specific state to said sample based on the signature determined in step b.);
d. Providing a sample of a human or animal individual being suspected to suffer from renal cell cancer after a pharmaceutically active agent is applied;
e. Testing said sample for a signature indicative of a discrete renal cell cancer specific state by determining expression of at least 1 gene, preferably of at least 4 genes, more preferably of at least 5, 6, 7, 8, 9 or 10 genes of genes 1 to 100, 101 to 200, 201 to 300, 301 to 399 of tables 1, 2, 3, and/or 4;
f. Allocating a discrete renal cell cancer specific state to said sample based on the signature determined in step e.);
g. Comparing the discrete renal cell cancer specific states identified in steps c.) and f.).
Breast Cancer
The present invention provides further evidence that the discrete breast cancer-specific states A, B, C and D as reflected by the expression pattern of the descriptors of tables 5, 6, 7 and 8 (see also Experiment 4) are indeed biologically relevant. It was assumed that potential differences, possibly representing functional or metabolomic irregularities among states, might become evident when the best state descriptors for each such state are analyzed by means of bioinformatics according to functional, known and predicted protein-protein interactions. To this end STRING (http://www.stringdb.org/) was used for functional classification of the 100 best state descriptors. The software's standard settings were used to search for multiple names, correlated to Homo sapiens and clustered according to the software's "confidentiality" parameter. As an output, associated networks and interacting proteins were identified, which suggest a functional relevance and point to a distinct biology in the respective tumors.
The present invention in one aspect thus relates to a method of diagnosing, stratifying and/or screening a hyper-proliferative disease such as breast cancer in at least one patient which is suspected of being afflicted by said disease, or in at least one cell line of said disease, comprising at least the steps of:
a. Providing a sample of a human or animal individual which is suspected of being afflicted by said disease;
b. Testing said sample for a signature;
c. Allocating a discrete disease-specific state to said sample based on the signature determined in step b.).
The sample may be a tumor sample of breast cancer. There may be different ways to test for a signature. If the signature is not known yet, one may identify it as described above. If the signature is already known, one can test for it by analyzing the quality and/or quantity of descriptors that were used for identification of the signature. One can also use optimized signatures which allow best differentiation between different states. If for example the signature is based on expression data for a set of given genes or gene-associated molecules such as RNAs or proteins, one can test for a signature by simply determining the expression pattern for this set of molecules. This may be done by standard methods such as by microarray expression analysis. One way of determining a signature is to test for the expression pattern of the descriptor sets of tables 5, 6, 7, and 8. If the descriptor sets show an expression profile as described below, one can allocate a signature and thus state A, B, C or D to the respective sample.
If one has identified the signature, one also knows the discrete disease specific state which correlates with this signature. Using such methods one can thus classify patient samples by common molecular mechanisms that lead to the same discrete disease specific molecular states.
Thus, the invention preferably relates in one embodiment to identifying discrete disease specific states and preferably discrete breast cancer-specific states by analyzing a hyper-proliferative disease such as breast cancer for signatures being indicative of discrete disease specific states as described above. This analysis will be performed for a specific type of hyper-proliferative disease such as e.g. breast cancer. Thus, the diseases may be identified by common selection criteria such as the organs being affected. However, initially no attention will be given to sub-classifications of these hyper-proliferative diseases, which are based on e.g. histological classification schemes. Once one has identified different discrete disease specific states for a disease like e.g. RCC, lung cancer, or as in the present case breast cancer, one can test samples as described above for ongoing disease development already at a point in time when no phenotypic changes are recognizable. The disease specific state therefore usually allows one to directly predict which sub-type of the disease in question is developing (e.g. state A, B, C or D for breast cancer, RCC or lung cancer; see also PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310). These subtypes are correlated with e.g. clinically relevant parameters such as survival time. Thus, the term discrete disease specific state preferably allows distinguishing different subtypes of a disease according to a new classification scheme, which links the subtype to clinically or pharmacologically important parameters. The finding of the present invention that discrete disease specific states exist in diseases and can be correlated with subtypes that are characterized not necessarily by their histological properties but by clinically or pharmacologically relevant parameters thus allows deciphering disease through a new code which is based on the discrete disease specific states, substates and levels.
The knowledge that discrete disease-specific states exist e.g. in breast cancer can also be used to stratify patient cohorts undergoing clinical trials for new treatments of breast cancer. As mentioned herein, certain pharmaceutically active agents may act only on specific discrete disease-specific states. If a patient cohort which undergoes a clinical trial with such an active agent consists mainly of individuals with other discrete breast cancer-specific states, any effects of the pharmaceutically active agent on the specific discrete breast cancer-specific state may not be discernible. Such effects may become, however, statistically significant if the patient cohort is grouped according to the discrete breast cancer-specific states. Thus, the knowledge on the existence of breast cancer-specific states can be used to stratify test populations undergoing clinical trials according to their discrete breast cancer-specific states. An illustration of inter alia this aspect of the invention is provided by Experiment 5 which describes that breast cancer patients having a breast cancer-specific state A as it is described hereinafter show prolonged metastasis free survival upon treatment with tamoxifen compared to patients not having this breast cancer-specific state.
Thus, the knowledge on breast cancer-specific states could be used to stratify patient cohorts for clinical trials involving e.g. future combination therapies including tamoxifen. This knowledge may also be used for diagnostic purpose, namely to identify patients diagnosed with breast cancer which would be responsive to tamoxifen treatment. The classification of samples, be it of patients or cell lines for hyper-proliferative diseases such as breast cancer, for their discrete disease specific states has further implications. Given that discrete disease specific states seem to reflect decisive stages of the underlying molecular disease mechanisms, they can be linked to relevant clinical and pharmacological parameters such as average survival times or responsiveness to drugs. This means that analyzing samples of patients for their respective discrete disease specific molecular states does not only allow diagnosing the type of the disease at an early point in time but also makes a prognosis possible as to the future course of the disease. Thus, one will early know whether a patient suffers from e.g. breast cancer and whether this breast cancer will be an aggressive or comparatively moderate form. This prognosis can then be used for therapeutic purposes when making decisions as to the kind of medication, physical treatment or surgery.
Further, the possibility of assigning a discrete disease-specific state to samples allows analyzing the effectiveness of treatments with specific drugs. For example, one can test a patient or a population of patients suffering from a hyper-proliferative disease for (i) their reaction towards treatment with a pharmaceutically active agent and (ii) their discrete disease specific molecular state. The reaction towards treatment may be measured by e.g. the quality and quantity of clinical improvement. One can then try to correlate such responders towards treatment with discrete disease specific states. If it turns out that patients for which the disease is characterized by a specific discrete disease specific state react more favorably towards treatment, these patients show a higher responsiveness towards treatment. As shown in PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310, pharmaceutically active compounds may effect a change of the state. If one disease-specific state is correlated with a more preferable clinically relevant parameter, knowledge about the present disease-specific state of a patient and its susceptibility towards changing into a disease-specific state, which is correlated with a more preferable clinically relevant parameter, upon treatment is, of course, of major importance from a medical perspective. The invention in one aspect thus relates to a method of determining the responsiveness of at least one human or animal individual which is suspected of being afflicted by a hyper-proliferative disease, preferably by breast cancer, towards a pharmaceutically active agent comprising at least the steps of:
a. Providing a sample of at least one human or animal individual which is suspected of being afflicted by said disease before the pharmaceutically active agent is administered;
b. Testing said sample for a signature;
c. Allocating a discrete disease-specific state to said sample based on the signature determined;
d. Determining the effect of a pharmaceutically active compound on the disease symptoms and/or the discrete disease-specific state in said individual;
e. Identifying a correlation between the effects on disease symptoms and/or the discrete disease-specific state and the initial discrete disease-specific state of the sample.
The signature may be tested for as described above. The sample may be a tumor sample such as breast cancer. One way of determining a signature is to test for the expression pattern of the descriptor sets of tables 5, 6, 7 and 8. If the descriptor sets show an expression profile as described below, one can allocate a signature and thus state A, B, C or D to the respective sample. An example of this embodiment of the invention is the treatment of breast cancer patients with tamoxifen as described hereinafter which shows that breast cancer patients with breast cancer-specific state A show a prolonged distant metastasis free survival upon treatment with tamoxifen.
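Step e. of the above method, i.e. identifying a correlation between treatment effects and the initial discrete disease-specific state, may be illustrated by the following minimal sketch. The text does not prescribe a statistical test; a one-sided Fisher exact test is assumed here as one conceivable choice. The function name and the patient counts are hypothetical and serve only as an illustration, they are not data of the present invention.

```python
from math import comb

def enrichment_p_value(resp_in_state, nonresp_in_state, resp_other, nonresp_other):
    """One-sided Fisher exact test: are responders enriched in one state?

    Arguments are the four cells of a 2x2 table (state vs. response).
    Returns P(X >= resp_in_state) under the hypergeometric null model.
    """
    n_state = resp_in_state + nonresp_in_state   # patients in the state of interest
    n_resp = resp_in_state + resp_other          # responders overall
    n_total = n_state + resp_other + nonresp_other
    upper = min(n_state, n_resp)
    tail = sum(
        comb(n_resp, k) * comb(n_total - n_resp, n_state - k)
        for k in range(resp_in_state, upper + 1)
    )
    return tail / comb(n_total, n_state)

# Invented counts: 5 of 6 patients in one state respond, 2 of 10 in the others.
p_value = enrichment_p_value(5, 1, 2, 8)
```

A small p-value in such a sketch would merely suggest that responsiveness and the initial state are associated; cohort size and multiple testing would still have to be considered.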
Being able to predict the responsiveness of e.g. patients with a discrete disease specific state towards treatment is helpful in many aspects. For example, if such responsiveness is known, one can pre-select patients for treatment. Identification of signatures and discrete disease specific states can thus serve as companion diagnostics, which allow pre-selecting patients for effective treatment. Tools for identifying patients that will respond to a particular treatment become more and more important with public health systems requiring such tests in order to reimburse expensive therapies. Being able to predict whether a specific group of patients which is characterized by their discrete disease specific states will react favorably towards a specific pharmaceutically active agent is also important for other areas. For example, a lot of drugs receive their initial marketing authorization from regulatory agencies such as the FDA for a specific indication only. Frequently, one then tries to test whether such drugs are also effective for treating other diseases. Such clinical trials are, however, extremely costly.
If one knew upfront that only patients with a specific discrete disease specific state have reacted positively towards a specific drug and if one now tests this drug for other diseases, one will be able to conduct such clinical trials with a significantly smaller patient group by selecting only patients with the discrete disease specific profile which has shown a positive response when patients with the same state were tested albeit for a different disease. These clinical trials will not only be less costly in view of the smaller test population, they are also likely to lead to a positive outcome as the effects of the treatment may be more pronounced and thus more easily discernible by statistical methods as the signal-to-noise ratio will be improved. An example of this embodiment of the invention is the treatment of breast cancer patients with tamoxifen as described hereinafter which shows that breast cancer patients with breast cancer-specific state A show a prolonged distant metastasis free survival upon treatment with tamoxifen.
Being able to predict the responsiveness of a treatment also forms part of the prognostic aspects of the invention.
The invention in one embodiment thus relates to a method of predicting the responsiveness of at least one patient which is suspected of being afflicted by a hyper-proliferative disease, preferably by breast cancer towards a pharmaceutically active agent comprising at least the steps of:
a. Determining whether a correlation exists between effects on disease symptoms and/or discrete disease-specific states and the initial discrete disease-specific states as a consequence of administration of a pharmaceutically active agent as described above;
b. Testing a sample of a human or animal individual which is suspected of being afflicted by a disease, optionally by a hyper-proliferative disease, for a signature;
c. Allocating a discrete disease-specific state to said sample based on the signature determined;
d. Comparing the discrete disease-specific state of the sample in step c.) vs. the discrete disease-specific state for which a correlation has been determined in step a.);
e. Predicting the effect of a pharmaceutically active compound on the disease symptoms in said patient.
The signature may be tested for as described above. The sample may be a tumor sample such as breast cancer. One way of determining a signature is to test for the expression pattern of the descriptor sets of tables 5, 6, 7 and 8. If the descriptor sets show an expression profile as described below, one can allocate a signature and thus state A, B, C or D to the respective sample. An example of this embodiment of the invention is the treatment of breast cancer patients with tamoxifen as described hereinafter which shows that breast cancer patients with breast cancer-specific state A show a prolonged distant metastasis free survival upon treatment with tamoxifen.
The finding that diseases such as hyper-proliferative diseases are characterized by discrete disease specific states also allows new approaches for development and/or identification of new therapeutically active agents. As mentioned above, samples from patients can be characterized as to their discrete disease specific states. Further, cell lines of diseases may also display such discrete disease specific states. It is assumed that a pharmaceutically active agent towards which a patient with a discrete disease specific state is responsive may in some instances induce a switch to another discrete disease specific state (see in this respect PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310). This other discrete disease specific state may either be a completely new discrete disease specific state or it may be a discrete disease specific state, which has been found in other patients. For example, a pharmaceutically active agent may induce a switch from a discrete disease specific state which is correlated with low average survival times to a discrete disease specific state which is correlated with a longer average survival time. The discrete disease specific states and signatures relating thereto may be identified as described above.
If indeed a pharmaceutically active agent is capable of inducing a switch of discrete disease specific states, one can use discrete disease specific states and the signatures relating thereto as a read-out parameter for the potential effectiveness of pharmaceutically active agents. The target on which the pharmaceutically active agent would act is thus the discrete disease specific state. The discrete disease specific states are thus considered to be targets of pharmaceutically active agents.
The invention in one embodiment therefore relates to a method of determining the effects of a pharmaceutically active compound, comprising at least the steps of:
a. Providing a sample of at least one human or animal individual which is suspected of being afflicted by a hyper-proliferative disease, preferably by breast cancer, or a cell line of said disease before a pharmaceutically active agent is applied;
b. Testing said sample or cell line for a signature;
c. Allocating a discrete disease-specific state to said sample or cell line based on the signature determined;
d. Testing said sample or cell line for a signature after the pharmaceutically active agent is applied;
e. Allocating a discrete disease-specific state to said sample or cell line based on the signature determined;
f. Comparing the discrete disease-specific states identified in steps c.) and e.).
The signature may be tested for as described above. The sample may be a tumor sample such as breast cancer. One way of determining a signature is to test for the expression pattern of the descriptor sets of tables 5, 6, 7, and 8. If the descriptor sets show an expression profile as described below, one can allocate a signature and thus state A, B, C or D to the respective sample. An example of this embodiment of the invention is the treatment of breast cancer patients with tamoxifen as described hereinafter which shows that breast cancer patients with breast cancer-specific state A show a prolonged distant metastasis free survival upon treatment with tamoxifen. The effects that are determined by this method may e.g. allow identification of compounds which may have a positive influence on the disease if e.g. a switch to a discrete disease specific state correlated with a more favorable clinical parameter such as increased survival time is observed. The methods may, however, also allow identification of toxic compounds if these compounds induce a switch to a discrete disease specific state correlated with a less favorable clinical parameter such as decreased survival time. These methods may thus be used as assays in the development, identification and/or screening of potential pharmaceutically active compounds, e.g. to determine the potential effectiveness of a pharmaceutically active compound in a disease such as a hyper-proliferative disease. These assays may also be used for determining the toxicity of a pharmaceutically active compound.
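The comparison of step f.) may be sketched as follows. The sketch assumes that each state has already been linked to a clinical parameter such as an average survival time; the function name is hypothetical and the survival figures are placeholders invented for illustration, not values disclosed herein.

```python
from typing import Dict

def classify_switch(before: str, after: str, survival: Dict[str, float]) -> str:
    """Compare the states allocated before and after the agent is applied."""
    if before == after:
        return "no switch"
    if survival[after] > survival[before]:
        return "favorable switch"    # candidate for a beneficial compound
    if survival[after] < survival[before]:
        return "unfavorable switch"  # candidate for a toxic compound
    return "neutral switch"

# Placeholder survival figures in months; invented for illustration only.
placeholder_survival = {"A": 60.0, "B": 48.0, "C": 36.0, "D": 24.0}
result = classify_switch("D", "A", placeholder_survival)
```

In a screening setting, such a function could be applied to every compound tested on a cell line, so that compounds are picked up by the state switch they induce irrespective of their molecular target.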
Such discrete state-related assay systems for active and/or toxic drug candidates could be of enormous value to identify new pharmaceuticals. With the reasonable assumption that certain discrete states of a tumor are not just indicative of the status of being a hyper-proliferating cell but are also related e.g. to the aggressiveness of a tumor or survival time of a patient, the switch in state monitored by a switch in signature marks an interesting screening system as a general "read out" for changing a tumor status. So the "read out" is related to functional efficacy rather than to blocking a certain molecular target not necessarily being related to tumor function. Such a screening system would simply pick up any compound switching the state irrespective of the molecular target of interaction. Such screening resembles assays interfering with virus propagation in cell cultures rather than screening for inhibitors of a certain viral enzyme such as reverse transcriptase. On the other hand such assays could be indicative of the tumorigenicity of compounds turning a status characteristic of a healthy cell into a status characteristic of a hyper-proliferative cell. The present invention in general thus relates to states, signatures and descriptors for use in diagnosing, stratifying, screening and prognosing human or animal individuals suspected of suffering from or suffering from breast cancer. The present invention further relates to immunoassays, kits, arrays, and other types of equipment which allow determining the state of human or animal individuals suspected of suffering from or suffering from breast cancer. The signature may be tested for as described above. The sample may be a tumor sample such as breast cancer. One way of determining a signature is to test for the expression pattern of the descriptor sets of tables 5, 6, 7 and 8. If the descriptor sets show an expression profile as described below, one can allocate a signature and thus state A, B, C or D to the respective sample.
The present invention thus also relates to a microarray comprising specifically the sets of descriptors of tables 5, 6, 7, and 8 either alone or in combination. The array comprises preferably at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15 or at least 20 descriptors of table 1, 2, 3, and/or 4. The present invention also relates to an immunoassay or ELISA kit allowing for determining expression of specifically the sets of descriptors of tables 5, 6, 7, and 8 either alone or in combination. The immunoassay or ELISA kit comprises preferably at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15 or at least 20 descriptors of table 5, 6, 7, and/or 4.
The concept of discrete disease-specific states, e.g. the characterization of diseases by the overall expression profile of genes relative to each other, was described for the first time in PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310. Based on the understanding and data described therein, it was assumed that such discrete disease-specific states exist also for other diseases and in particular for other hyper-proliferative diseases such as cancer and that such discrete disease-specific states can be correlated with biological read-outs such as e.g. survival time. Based on these assumptions and existing expression data for breast cancer, computer-implemented, algorithm based approaches were undertaken to identify sets of genes which allow characterization of breast cancer by four discrete disease-specific states, which for the purposes of the present invention are designated as "A", "B", "C" or "D".
These computer-implemented, algorithm based approaches which are described in the following led to the identification of approximately 400 genes depicted in tables 1 to 4. The expression patterns of genes 1 to 100 (table 5) can be used to distinguish between the discrete breast cancer specific states A vs. BCD. The expression patterns of genes 101 to 200 (table 6) can be used to distinguish between the discrete breast cancer specific states B vs. ACD. The expression patterns of genes 201 to 300 (table 7) can be used to distinguish between the discrete breast cancer specific states C vs. ABD. The expression patterns of genes 301 to 399 (table 8) can be used to distinguish between the discrete breast cancer specific states D vs. ABC. In the following, the implications of these results are set forth. Then, the computer-implemented, algorithm based approaches are explained in further detail.
As mentioned, the expression pattern of about 400 genes, which are listed in tables 5, 6, 7, and 8, can be used to unambiguously identify the four discrete breast cancer specific states, which for the sake of nomenclature have been named A, B, C and D herein.
More precisely, if genes 1 to 100 of Table 5 are found to be over-expressed for a sample of a human or animal individual, the individual will be characterized as having the discrete breast cancer specific state A. If genes 101 to 200 of table 6 are found to be under-expressed ("invers") for a sample of a human or animal individual, the individual will be characterized as having the discrete breast cancer specific state B.
If genes 201 to 292 of table 7 are found to be under-expressed ("invers") and if genes 293 to 300 of table 7 are found to be over-expressed ("normal") for a sample of a human or animal individual, the individual will be characterized as having the discrete breast cancer specific state C. If genes 301 to 399 of table 8 are found to be under-expressed ("invers") for a sample of a human or animal individual, the individual will be characterized as having the discrete breast cancer specific state D. Expression levels may be determined using the Affymetrix gene chips HG-U133A, HG-U133B, HG-U133_Plus_2, etc. The decision as to whether a certain gene in a specific sample is over- or under-expressed will be taken in comparison to a control. This control will either be implemented in the software, or an overall median or arithmetic mean across measurements is calculated. By including a multitude of samples it is also conceivable to calculate a median and/or mean for each gene individually. In relation to these results, a respective gene expression value is classified as up- or down-regulated. In case of Affymetrix gene chip expression analysis, one may rely on the "limit value" of tables 5, 6, 7, and 8 for making a decision as to over- or under-expression. The limit value will be entered in the respective software, which is used for expression analysis, individually for each gene.
If other methods are used (such as ELISA or Western Blot analysis), the decision as to whether a respective gene is over- or under-expressed is made with respect to a control level which will be specific for the respective detection method and which is determined typically with respect to a value typical for healthy tissue.
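The decision as to over- or under-expression relative to a per-gene control may be sketched as follows. The sketch assumes that the control is either a given limit value per gene or a per-gene median calculated across a multitude of samples; the gene identifiers, expression values and function names are hypothetical and are not taken from tables 5 to 8.

```python
from statistics import median
from typing import Dict, List

def limits_from_cohort(cohort: List[Dict[str, float]]) -> Dict[str, float]:
    """Derive a per-gene control as the median across a multitude of samples."""
    genes = cohort[0].keys()
    return {gene: median(sample[gene] for sample in cohort) for gene in genes}

def call_expression(values: Dict[str, float], limits: Dict[str, float]) -> Dict[str, str]:
    """Label each gene 'over' or 'under' relative to its own limit value."""
    calls = {}
    for gene, value in values.items():
        if gene not in limits:
            continue  # no control available: leave this gene uncalled
        # Values exactly at the limit are counted as 'under' (arbitrary choice).
        calls[gene] = "over" if value > limits[gene] else "under"
    return calls

# Invented expression values, for illustration only.
cohort = [
    {"gene_1": 6.0, "gene_2": 5.0},
    {"gene_1": 7.0, "gene_2": 4.0},
    {"gene_1": 9.0, "gene_2": 8.0},
]
limits = limits_from_cohort(cohort)
calls = call_expression({"gene_1": 8.2, "gene_2": 4.1}, limits)
```

Where limit values for a given detection method are already available, they would simply be passed as `limits` instead of being derived from a cohort.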
It is to be understood that the breast cancer signatures as they are defined by the expression patterns of the genes of tables 5, 6, 7, and 8 reflect the outcome of a statistical analysis across multiple samples.
For the methods of diagnosis, prognosis, stratification, determining responsiveness etc. and the other uses as described herein, one will usually test samples obtained from an individual. On the individual level, the expression level of even a single gene of tables 5, 6, 7, and 8 may be sufficient to allocate a discrete breast cancer specific state. However, if inconclusive results are reached or if one wants to increase the reliability of allocation, one will usually analyze the expression pattern of more than one gene of genes 1 to 100, 101 to 200, 201 to 300 and/or 301 to 399 of tables 5, 6, 7, and 8. Typically one will analyze the expression pattern of at least about 2, at least about 3, at least about 4, at least about 5, at least about 6, at least about 7, at least about 8, at least about 9, at least about 10, at least about 11, at least about 12, at least about 13, at least about 14, at least about 15, at least about 16, at least about 17, at least about 18, at least about 19 or at least about 20 genes of genes 1 to 100, 101 to 200, 201 to 300 or 301 to 399 of tables 5, 6, 7, or 8 to decide on whether the discrete breast cancer specific state being labeled herein as A, B, C or D is present or not. The reliability of the determination, of course, increases if more than one gene is analyzed with respect to its expression. The analysis of the expression of at least 10 genes will usually be sufficient to assign a discrete breast cancer-specific state with a reliability of at least about 90%.
The analysis of the expression pattern of at least 5 genes of genes 1 to 100 of table 5 will usually allow deciding whether state A is present with a reliability of about 80% or more. This reliability will increase if more genes are analyzed. Thus, the analysis of the expression pattern of at least 10 genes of genes 1 to 100 of table 5 will usually allow deciding whether state A or state BCD is present with a reliability of about 90% or more. The analysis of the expression pattern of at least 15 genes of genes 1 to 100 of table 5 will usually allow deciding whether state A or state BCD is present with a reliability of about 95% or more. The analysis of the expression pattern of at least 20 genes of genes 1 to 100 of table 5 will usually allow deciding whether state A or state BCD is present with a reliability of about 98% or more and the analysis of the expression pattern of at least 25 genes of genes 1 to 100 of table 5 will usually allow deciding whether state A or state BCD is present with a reliability of about 99% or more.
The analysis of the expression pattern of at least 5 genes of genes 101 to 200 of table 6 will usually allow deciding whether state B is present with a reliability of about 80% or more. This reliability will increase if more genes are analyzed. Thus, the analysis of the expression pattern of at least 10 genes of genes 101 to 200 of table 6 will usually allow deciding whether state B or state ACD is present with a reliability of about 90% or more. The analysis of the expression pattern of at least 15 genes of genes 101 to 200 of table 6 will usually allow deciding whether state B or state ACD is present with a reliability of about 95% or more. The analysis of the expression pattern of at least 20 genes of genes 101 to 200 of table 6 will usually allow deciding whether state B or state ACD is present with a reliability of about 98% or more and the analysis of the expression pattern of at least 25 genes of genes 101 to 200 of table 6 will usually allow deciding whether state B or state ACD is present with a reliability of about 99% or more.
The analysis of the expression pattern of at least 5 genes of genes 201 to 300 of table 7 will usually allow deciding whether state C is present with a reliability of about 80% or more. This reliability will increase if more genes are analyzed. Thus, the analysis of the expression pattern of at least 10 genes of genes 201 to 300 of table 7 will usually allow deciding whether state C or state ABD is present with a reliability of about 90% or more. The analysis of the expression pattern of at least 15 genes of genes 201 to 300 of table 7 will usually allow deciding whether state C or state ABD is present with a reliability of about 95% or more. The analysis of the expression pattern of at least 20 genes of genes 201 to 300 of table 7 will usually allow deciding whether state C or state ABD is present with a reliability of about 98% or more and the analysis of the expression pattern of at least 25 genes of genes 201 to 300 of table 7 will usually allow deciding whether state C or state ABD is present with a reliability of about 99% or more.
The analysis of the expression pattern of at least 5 genes of genes 301 to 399 of table 8 will usually allow deciding whether state D is present with a reliability of about 80% or more. This reliability will increase if more genes are analyzed. Thus, the analysis of the expression pattern of at least 10 genes of genes 301 to 399 of table 8 will usually allow deciding whether state D or state ABC is present with a reliability of about 90% or more. The analysis of the expression pattern of at least 15 genes of genes 301 to 399 of table 8 will usually allow deciding whether state D or state ABC is present with a reliability of about 95% or more. The analysis of the expression pattern of at least 20 genes of genes 301 to 399 of table 8 will usually allow deciding whether state D or state ABC is present with a reliability of about 98% or more and the analysis of the expression pattern of at least 25 genes of genes 301 to 399 of table 8 will usually allow deciding whether state D or state ABC is present with a reliability of about 99% or more.
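Why reliability may be expected to increase with the number of genes analyzed can be illustrated by a simple model. The following sketch assumes, purely for illustration, that each gene indicates the correct state independently with the same probability and that the state is decided by a strict majority of genes. Neither assumption is stated herein, and the model is not intended to reproduce the reliability figures given above.

```python
from math import comb

def majority_vote_accuracy(n_genes: int, p_single: float) -> float:
    """Probability that a strict majority of n independent genes is correct."""
    smallest_majority = n_genes // 2 + 1
    return sum(
        comb(n_genes, k) * p_single**k * (1.0 - p_single) ** (n_genes - k)
        for k in range(smallest_majority, n_genes + 1)
    )

# Hypothetical per-gene accuracy of 0.7, evaluated for odd numbers of genes.
accuracies = {n: majority_vote_accuracy(n, 0.7) for n in (5, 15, 25)}
```

Under these assumptions the accuracy rises monotonically with the number of genes; correlated genes would raise it more slowly.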
The set of about 4x100 genes of tables 5, 6, 7 and 8 thus serves as a reservoir for the unambiguous characterization of states A, B, C and D. By analyzing the expression behavior of e.g. at least about 2, at least about 3, at least about 4, at least about 5, at least about 6, at least about 7, at least about 8, at least about 9, at least about 10, at least about 11, at least about 12, at least about 13, at least about 14, at least about 15, at least about 16, at least about 17, at least about 18, at least about 19 or at least about 20 genes of genes 1 to 400, one will be able to decide (i) whether a patient suffers from breast cancer and (ii) whether the patient suffers from cancer of state A or any of the other states B, C or D.
The present invention thus relates to a signature, which can be derived from the expression pattern of at least about 2, at least about 3, at least about 4, at least about 5, at least about 6, at least about 7, at least about 8, at least about 9, at least about 10, at least about 11, at least about 12, at least about 13, at least about 14, at least about 15, at least about 16, at least about 17, at least about 18, at least about 19 or at least about 20 genes of genes 1 to 400 of tables 5, 6, 7 and 8. This signature will allow one to decide unambiguously whether one of four discrete breast cancer specific states, namely state A, B, C or D, is present.
The signature for A is defined by an over-expression of genes 1 to 100 of table 5. The signature for B is defined by an under-expression of genes 101 to 200 of table 6. The signature for C is defined by an under-expression of genes 201 to 292 and an over-expression of genes 293 to 300 of Table 7. The signature for D is defined by an under-expression of genes 301 to 399 of Table 8.
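The signature definitions above can be sketched as a simple allocation rule. Only the gene number ranges and the expected directions are taken from the text; the scoring by the fraction of matching genes, the `min_fraction` threshold and the function names are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

def expected(gene_no: int) -> Tuple[str, str]:
    """Return (state, expected direction) for a gene number of tables 5 to 8."""
    if 1 <= gene_no <= 100:
        return ("A", "over")
    if 101 <= gene_no <= 200:
        return ("B", "under")
    if 201 <= gene_no <= 292:
        return ("C", "under")
    if 293 <= gene_no <= 300:
        return ("C", "over")
    if 301 <= gene_no <= 399:
        return ("D", "under")
    raise ValueError(f"gene number {gene_no} is outside tables 5 to 8")

def allocate_state(calls: Dict[int, str], min_fraction: float = 0.8) -> Optional[str]:
    """Allocate the state whose measured genes best match the expected direction."""
    hits: Dict[str, int] = {}
    totals: Dict[str, int] = {}
    for gene_no, call in calls.items():
        state, direction = expected(gene_no)
        totals[state] = totals.get(state, 0) + 1
        hits[state] = hits.get(state, 0) + (1 if call == direction else 0)
    if not totals:
        return None
    best = max(totals, key=lambda s: hits[s] / totals[s])
    # Hypothetical cut-off: refuse to allocate a state on weak agreement.
    return best if hits[best] / totals[best] >= min_fraction else None

state = allocate_state({1: "over", 2: "over", 3: "over", 101: "over", 250: "over"})
```

The input `calls` maps gene numbers to "over" or "under" as determined in comparison to the respective control; a return value of `None` corresponds to an inconclusive result, in which case more genes may be analyzed.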
The present invention also relates to the above signatures for use as a diagnostic and/or prognostic marker in the context of breast cancer. By determining whether the signatures are present, one can take a decision as to whether a patient suffers from breast cancer as such and/or will likely develop breast cancer as such in the future. Further, one can distinguish between the aggressiveness of breast cancer development and adjust therapy accordingly. Further, the present invention relates to the above signatures for use in stratifying test populations for clinical trials for treatment of breast cancer.
It is to be understood that determining the expression pattern of genes 1 to 400 of tables 5, 6, 7 and 8 by microarray expression analysis as described is one of various options even though it can be preferred. However, it is also contemplated to perform such expression analysis on the protein level by e.g. ELISA, Immunoassay and/or Western Blotting. It is further to be understood that all methods of expression analysis are preferably conducted on breast cancer tissue. Further, the present invention relates to the above signatures for use as a read out of a target for development, identification and/or screening of at least one pharmaceutically active compound in the context of breast cancer as described above. The present invention also relates to the above signatures for use in stratifying human or animal individuals which are suspected to suffer from ongoing or imminent breast cancer development. Stratification allows grouping these individuals by their discrete breast cancer specific states. Potential pharmaceutically active compounds which are assumed to be effective in breast cancer treatment can thus be analyzed in such pre-selected patient groups.
The present invention in one embodiment also relates to a method of diagnosing, prognosing, stratifying and/or screening breast cancer in at least one human or animal patient, which is suspected of being afflicted by said disease, comprising at least the steps of:
a. Providing a sample of a human or animal individual being suspected to suffer from breast cancer;
b. Testing said sample for a signature indicative of a discrete breast cancer specific state by determining expression of at least 1, preferably at least 4 genes, more preferably of at least 5, 6, 7, 8, 9 or 10 genes of genes 1 to 100, 101 to 200, 201 to 300, 301 to 399 of tables 5, 6, 7, and/or 8;
c. Allocating a discrete breast cancer specific state to said sample based on the signature determined in step b.).
Further, the present invention in one embodiment relates to a method of determining the responsiveness of at least one human or animal individual, which is suspected of being afflicted by breast cancer, towards a pharmaceutically active agent comprising at least the steps of:
a. Providing a sample of a human or animal individual being suspected to suffer from breast cancer before the pharmaceutically active agent is administered;
b. Testing said sample for a signature indicative of a discrete breast cancer specific state by determining expression of at least one, preferably of at least 4, more preferably of at least 5, 6, 7, 8, 9 or 10 genes of genes 1 to 100, 101 to 200, 201 to 300, 301 to 399 of tables 5, 6, 7, and/or 8;
c. Allocating a discrete breast cancer-specific state to said sample based on the signature determined in step b.);
d. Determining the effect of the pharmaceutically active agent on the disease symptoms and/or discrete breast cancer-specific states in said individual;
e. Identifying a correlation between the effects on disease symptoms and/or discrete breast cancer-specific states and the initial discrete breast cancer- specific state of the sample as determined in step c).
An example of this embodiment of the invention is determining the responsiveness of breast cancer patients, depending on their breast cancer-specific states A, B, C, and D, toward treatment with tamoxifen. The examples presented hereinafter show that a breast cancer patient with the breast cancer-specific state A, but not with a state other than A, reacts positively to treatment with tamoxifen, as can be taken from the prolonged distant metastasis-free survival time.
In yet another embodiment, the invention relates to a method of predicting the responsiveness of at least one patient which is suspected of being afflicted by breast cancer, towards a pharmaceutically active agent comprising at least the steps of:
a. Determining whether a correlation between effects on disease symptoms and/or discrete breast cancer-specific states and the initial discrete breast cancer-specific state as a consequence of administration of a pharmaceutically active agent exists by using the above method ;
b. Testing a sample of a human or animal individual patient which is suspected of being afflicted by breast cancer for a signature indicative of a discrete breast cancer specific state by determining expression of at least 1, preferably of at least 4 genes, more preferably of at least 5, 6, 7, 8, 9 or 10 genes of genes 1 to 100, 101 to 200, 201 to 300, 301 to 399 of tables 5, 6, 7, and/or 8;
c. Allocating a discrete disease-specific state to said sample based on the signature determined in step b.);
d. Comparing the discrete breast cancer-specific state of the sample in step c. vs. the discrete breast cancer-specific state for which a correlation has been determined in step a.);
e. Predicting the effect of a pharmaceutically active compound on the disease symptoms in said patient.
An example of this embodiment of the invention is predicting the responsiveness of breast cancer patients, depending on their breast cancer-specific states A, B, C, and D, toward treatment with tamoxifen. The examples presented hereinafter show that a breast cancer patient with the breast cancer-specific state A, but not with a state other than A, will react positively to treatment with tamoxifen, as can be taken from the prolonged distant metastasis-free survival time.
One embodiment of the invention relates to a method of determining the effects of a potential pharmaceutically active agent for treatment of breast cancer, comprising at least the steps of:
a. Providing a sample of a human or animal individual being suspected to suffer from breast cancer before a pharmaceutically active agent is applied;
b. Testing said sample for a signature indicative of a discrete breast cancer specific state by determining expression of at least 1, preferably of at least 4 genes, more preferably of at least 5, 6, 7, 8, 9 or 10 genes of genes 1 to 100, 101 to 200, 201 to 300, 301 to 399 of tables 5, 6, 7, and/or 8;
c. Allocating a discrete breast cancer specific state to said sample based on the signature determined in step b.);
d. Providing a sample of a human or animal individual being suspected to suffer from breast cancer after a pharmaceutically active agent is applied;
e. Testing said sample for a signature indicative of a discrete breast cancer specific state by determining expression of at least 1, preferably of at least 4 genes, more preferably of at least 5, 6, 7, 8, 9 or 10 genes of genes 1 to 100, 101 to 200, 201 to 300, 301 to 399 of tables 5, 6, 7, and/or 8;
f. Allocating a discrete breast cancer specific state to said sample based on the signature determined in step e.);
g. Comparing the discrete breast cancer specific states identified in steps c.) and f).
An example of this embodiment of the invention is determining the responsiveness of breast cancer patients, depending on their breast cancer-specific states A, B, C, and D, toward treatment with tamoxifen. The examples presented hereinafter show that a breast cancer patient with the breast cancer-specific state A, but not with a state other than A, reacts positively to treatment with tamoxifen, as can be taken from the prolonged distant metastasis-free survival time.
The present invention also relates to a computer-implemented method, a computer or other technical device which is suitable to perform the above steps and methods or those described in Experiment 1 and Figure 3. The latter computer-implemented methods will allow identifying states, signatures and descriptors for a disease.
The invention further relates to the use of such computer-implemented methods, computers, technical devices etc. for classifying renal cell cancer samples, tissues etc. Such classification may enable the above-mentioned uses of diagnosis, stratification etc.
Determining sets of predictive descriptors being testable by e. g. PCR and optionally by qPCR
As described in PCT/EP2011/057691 and Beleut et al., BMC cancer (2012), 12:310 as well as herein, unsupervised two-way hierarchical clustering of microarray expression data across different renal cell cancers such as papillary, clear cell and chromophobe renal cell cancer was used to identify three renal cancer-specific states, namely A, B and C. Further, it was shown that these states have functional relevance, as patients with state A show a prolonged survival time. The same has been shown for breast cancer.
This knowledge now allows classifying samples from individual patients by using the same approach, i.e. based on microarray expression data or using the approach described in example 1. However, from a practical perspective, it is desirable to be able to determine a disease-specific state, such as a renal cancer-specific state of a patient, by a set of predictive descriptors which can be determined by less costly and more straightforward methods, e.g. by determining gene expression for a limited number of state-predictive genes using PCR, and optionally qPCR. Tests other than PCR or qPCR could be used as well for detecting e.g. gene expression or protein expression for these sets of predictive descriptors. Examples of such other tests include nCounter Gene Expression assays from Nanostring (Seattle, WA, U.S.A.), alternative expression analysis by sequencing (ALEXA-seq, www.alexaplatform.org), Serial Analysis of Gene Expression (SAGE), Northern blotting, and more.
In Example 6, it is described how one can obtain a set of descriptors which are predictive for a renal cancer-specific state and which are measurable e.g. by PCR, and optionally by qPCR, by selecting such predictive descriptors from the group of descriptors, such as expressed genes, that are obtainable by unsupervised two-way hierarchical clustering of microarray expression data (e.g., using Affymetrix HG-U133 A, HG-U133 B, HG-U133 Plus 2.0, Agilent, Nimblegen and their derivatives such as Illumina) as described in PCT/EP2011/057691 and Beleut et al., BMC cancer (2012), 12:310 as well as herein. It is to be understood that this approach can be used for determining sets of predictive descriptors for other hyper proliferative diseases such as breast cancer, ovarian cancer, colorectal cancer, lung cancer, prostate cancer, brain cancer, hepato cellular carcinoma, acute myeloma, pheochromocytoma, Burkitt's lymphoma, myeloma or other types of diseases such as Parkinson's disease, once it has been shown by e.g. unsupervised two-way hierarchical clustering of microarray expression data as described in PCT/EP2011/057691 and Beleut et al., BMC cancer (2012), 12:310 as well as herein that disease-specific states exist for these types of diseases.
The approach starts from the group of descriptors, such as expressed genes that are obtainable by unsupervised two-way hierarchical clustering for microarray expression data as described in PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310 as well as herein.
One then normalizes the expression data of such microarray analysis employing the MAS5 algorithm as provided by Affymetrix, followed by a data filtering process in which all those genes whose number of "present calls" is below 25% are excluded from further analysis. Present calls are a factor specific to Affymetrix microarrays, where a probe array consists of a number of oligonucleotide probe cells and each probe cell contains a unique oligonucleotide probe. Probes are tiled in probe pairs as a Perfect Match (PM) and a Mismatch (MM). The sequences for PM and MM are the same, except for a change to the Watson-Crick complement in the middle of the MM probe sequence. A probe set consists of a series of probe pairs and represents an expressed transcript. The sample should bind more strongly to PM than to MM, so one assumes that, for a given probe set to be deemed valid, for example 25% or more of the PM values should be higher than the MM values.
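The filtering rule as paraphrased above can be sketched in a few lines of Python. This is only a minimal illustration of the 25% PM-over-MM criterion stated in the text, not the Affymetrix detection-call algorithm itself; the function names and the probe intensities are hypothetical.

```python
def present_fraction(pm_values, mm_values):
    """Fraction of probe pairs in which the PM signal exceeds the MM signal."""
    pairs = list(zip(pm_values, mm_values))
    return sum(1 for pm, mm in pairs if pm > mm) / len(pairs)


def filter_probe_sets(probe_sets, min_fraction=0.25):
    """Keep only probe sets whose PM > MM fraction is at least min_fraction."""
    return {
        name: (pm, mm)
        for name, (pm, mm) in probe_sets.items()
        if present_fraction(pm, mm) >= min_fraction
    }


# Hypothetical intensities: (PM values, MM values) per probe set.
probe_sets = {
    "GENE_X": ([120.0, 95.0, 80.0, 60.0], [100.0, 99.0, 85.0, 70.0]),  # 1 of 4 pairs
    "GENE_Y": ([50.0, 40.0, 30.0, 20.0], [60.0, 45.0, 35.0, 25.0]),    # 0 of 4 pairs
}
kept = filter_probe_sets(probe_sets)
print(sorted(kept))
```

Here GENE_X reaches exactly the 25% limit and is retained, while GENE_Y is excluded.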
The expression values of the remaining genes are then correlated with the patient states A, B, C.
In a next step, one selects at least 2 genes, preferably at least 10, out of the remaining genes with the most positive correlation, and at least 2 genes, preferably at least 10, out of the remaining genes with the most negative correlation for a disease-specific state such as renal cancer-specific states A, B, or C. Further, it is possible to add at least 2, preferably at least 10, genes that are randomly selected and at least 2, preferably at least 10, genes showing the least variation across all the states.
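The selection by correlation can be illustrated as follows. The text does not specify how expression values are correlated with a state; the sketch assumes a Pearson correlation against a 0/1 indicator of state membership, and all gene names and values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_by_state_correlation(expression, states, state, n_top=2):
    """Return (most positively, most negatively) correlated genes for one state."""
    # Assumption: state membership is encoded as a 0/1 indicator per patient.
    indicator = [1.0 if s == state else 0.0 for s in states]
    ranked = sorted(expression, key=lambda g: pearson(expression[g], indicator))
    return ranked[-n_top:][::-1], ranked[:n_top]


# Hypothetical expression values for four patients.
states = ["A", "A", "B", "C"]
expression = {
    "g_up": [9.0, 8.0, 2.0, 1.0],
    "g_down": [1.0, 2.0, 8.0, 9.0],
    "g_flat": [5.0, 5.0, 5.0, 5.0],
    "g_mild": [5.0, 6.0, 4.0, 5.0],
}
positive, negative = select_by_state_correlation(expression, states, "A", n_top=1)
print(positive, negative)
```

With more realistic data, `n_top` would be set to at least 2 and preferably at least 10, as stated above.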
The resulting set of genes can then be tested e.g. in qPCR experiments with genes grouped in pairs, since qPCR readers are usually designed to measure just pairs of genes. As gene expression values obtained with qPCR often do not correspond well to expression values obtained for the corresponding genes with microarrays, correlations between the measured qPCR expression values and the states were calculated in a first step. Genes for further analysis are thus chosen according to the correlation between their mRNA expression in microarray and qPCR tests. All genes with a qPCR/microarray correlation below a threshold value between 0 and 1, preferably 0.35, may be excluded from the model, leaving a reduced set of genes. This set of genes should typically comprise at least 20 genes to then allow identification of sets of predictive descriptors for each disease-specific state. If the above selection procedure does not result in enough genes, for example because of a poor correlation between qPCR and microarray expression data, one may initially select at least 15, at least 20, at least 25 etc. genes with the most positive correlation with a disease-specific state, etc. In this particular example, 22 genes had values above the selected threshold of 0.35, namely A2M, ANGPTL4, AP2M1, BDH1, CD99, COBLL1, DOCK9, EPAS1, F5, H3F3B, IFITM3, LAPTM4B, LDB2, LPCAT3, MAPRE1, NDUFA4, PGBD5, RGS5, SERBP1, SERINC3, TSG101, UFSP2.
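The threshold step can be sketched as below. The sketch assumes that qPCR and microarray values for each gene are available per patient on scales where a Pearson correlation is meaningful; the function names and the example values are hypothetical, and only the threshold of 0.35 is taken from the text.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def filter_by_platform_agreement(microarray, qpcr, threshold=0.35):
    """Keep genes whose qPCR/microarray correlation is not below the threshold."""
    return [
        gene
        for gene in microarray
        if gene in qpcr and pearson(microarray[gene], qpcr[gene]) >= threshold
    ]


# Hypothetical per-patient values for two genes on both platforms.
microarray = {"GENE_1": [1.0, 2.0, 3.0, 4.0], "GENE_2": [1.0, 2.0, 3.0, 4.0]}
qpcr = {"GENE_1": [1.1, 1.9, 3.2, 3.8], "GENE_2": [3.0, 1.0, 4.0, 2.0]}
reduced_set = filter_by_platform_agreement(microarray, qpcr)
print(reduced_set)
```

GENE_1 agrees well across platforms and is kept; GENE_2 shows no correlation and is excluded.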
All genes for which e.g. qPCR and microarray expression data correlate above the threshold form a set of genes. In a subsequent step, the parameters for a naive Bayes classifier are calculated for every gene (i) and every state, i.e. means and standard deviations over every state A, B, C individually. For example, for state A the mean over all patients for gene EPAS1 is calculated according to

$$\mu_{A,\mathrm{EPAS1}} = \frac{1}{n} \sum_{j=1}^{n} x_{\mathrm{EPAS1},j}, \qquad n = \text{number of all patients classified A}$$

where $x_{\mathrm{EPAS1},j}$ denotes the qPCR value measured for patient (j) and gene EPAS1. Similarly, means and standard deviations for all patients which are not allocated to the particular state are calculated, that is, nonA, nonB, nonC are calculated. For example, for nonA the mean over all patients not being allocated to state A (that is, allocated to either B or C, or not allocated at all) and gene EPAS1 is calculated according to

$$\mu_{\mathrm{nonA},\mathrm{EPAS1}} = \frac{1}{n} \sum_{j=1}^{n} x_{\mathrm{EPAS1},j}, \qquad n = \text{number of all patients not classified A}$$

The calculations are similarly done for states B and C. For example, for state B the mean over all patients for gene EPAS1 is calculated according to

$$\mu_{B,\mathrm{EPAS1}} = \frac{1}{n} \sum_{j=1}^{n} x_{\mathrm{EPAS1},j}, \qquad n = \text{number of all patients classified B}$$

where $x_{\mathrm{EPAS1},j}$ denotes the qPCR value measured for patient (j) and gene EPAS1. For nonB, the mean over all patients not being allocated to state B (that is, allocated to either A or C, or not allocated at all) and gene EPAS1 is calculated according to

$$\mu_{\mathrm{nonB},\mathrm{EPAS1}} = \frac{1}{n} \sum_{j=1}^{n} x_{\mathrm{EPAS1},j}, \qquad n = \text{number of all patients not classified B}$$

For state C the mean over all patients for gene EPAS1 is calculated according to

$$\mu_{C,\mathrm{EPAS1}} = \frac{1}{n} \sum_{j=1}^{n} x_{\mathrm{EPAS1},j}, \qquad n = \text{number of all patients classified C}$$

where $x_{\mathrm{EPAS1},j}$ denotes the qPCR value measured for patient (j) and gene EPAS1. For nonC, the mean over all patients not being allocated to state C (that is, allocated to either A or B, or not allocated at all) and gene EPAS1 is calculated according to

$$\mu_{\mathrm{nonC},\mathrm{EPAS1}} = \frac{1}{n} \sum_{j=1}^{n} x_{\mathrm{EPAS1},j}, \qquad n = \text{number of all patients not classified C}$$
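The parameter calculation for one gene and one state can be sketched as follows. The text gives no formula for the standard deviation, so the sample standard deviation (n-1 denominator) used here is an assumption; the function name and the qPCR values are hypothetical, and unallocated patients are marked with `None`.

```python
import statistics


def state_parameters(values, states, state):
    """Mean and standard deviation for patients in a state and for all others."""
    in_state = [v for v, s in zip(values, states) if s == state]
    # "Not allocated to the state" covers other states and unallocated patients.
    not_in_state = [v for v, s in zip(values, states) if s != state]
    return {
        state: (statistics.mean(in_state), statistics.stdev(in_state)),
        "non" + state: (statistics.mean(not_in_state), statistics.stdev(not_in_state)),
    }


# Hypothetical qPCR values of one gene for five patients.
values = [24.0, 26.0, 28.0, 30.0, 29.0]
states = ["A", "A", "B", "C", None]
params = state_parameters(values, states, "A")
print(params)
```

Running the same function with "B" and "C" yields the remaining parameter pairs for that gene.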
For the set of genes for which qPCR data were measured and which were used for test design, parameter values are given in the below table 11.
Figure imgf000072_0003
COBLL1 24.5 1.9 25 0.5 24.7 1.7 24.7 1.6 24.9 1.2 24.4 2.3
LDB2 25.3 2 23.3 1.6 24.2 1.7 25.5 2.5 24.4 2.3 25.1 1.5
MAPRE1 25 0.8 24.9 0.7 24.7 0.6 25.5 0.8 25.2 0.8 24.5 0.5
UFSP2 29.3 2.3 28.2 1.4 28.6 2.1 29.4 2 28.8 1.8 29.1 2.6
H3F3B 26.8 1.2 26.5 1.1 26.3 1 27.5 1 27 1.2 26.1 1
Table 11
Subsequently, posterior probabilities to denote a state are calculated for all genes.
Taking again state A and gene EPAS1 as an example, the posterior probability that the measured qPCR value x for EPAS1 denotes state A is calculated according to

$$P_{A,\mathrm{EPAS1}} = \frac{1}{\sqrt{2\pi\sigma_{A,\mathrm{EPAS1}}^{2}}} \exp\left(-\frac{(x-\mu_{A,\mathrm{EPAS1}})^{2}}{2\sigma_{A,\mathrm{EPAS1}}^{2}}\right)$$

For a subset of genes, the values are calculated, for example for state A, according to

$$P_{A,\text{subset }A} = \prod_{\text{gene } k \text{ belongs to subset } A} P_{A,k}$$

where for k = EPAS1 the individual value $P_{A,k}$ would be $P_{A,\mathrm{EPAS1}}$ of the preceding formula. The state of a sample is allocated by determining the maximum value of the individual subset probabilities. For example, to determine whether a sample is state A or not, the state $S_A$ is calculated by

$$S_A = \max\left(P_{A,\text{subset }A},\; P_{\mathrm{nonA},\text{subset nonA}}\right)$$
For state B and gene EPAS1 as an example, the posterior probability that the measured qPCR value x for EPAS1 denotes state B is calculated according to

$$P_{B,\mathrm{EPAS1}} = \frac{1}{\sqrt{2\pi\sigma_{B,\mathrm{EPAS1}}^{2}}} \exp\left(-\frac{(x-\mu_{B,\mathrm{EPAS1}})^{2}}{2\sigma_{B,\mathrm{EPAS1}}^{2}}\right)$$

For a subset of genes, the values are calculated, for example for state B, according to

$$P_{B,\text{subset }B} = \prod_{\text{gene } k \text{ belongs to subset } B} P_{B,k}$$

where for k = EPAS1 the individual value $P_{B,k}$ would be $P_{B,\mathrm{EPAS1}}$ of the preceding formula. The state of a sample is allocated by determining the maximum value of the individual subset probabilities. For example, to determine whether a sample is state B or not, the state $S_B$ is calculated by

$$S_B = \max\left(P_{B,\text{subset }B},\; P_{\mathrm{nonB},\text{subset nonB}}\right)$$

For state C and gene EPAS1 as an example, the posterior probability that the measured qPCR value x for EPAS1 denotes state C is calculated according to

$$P_{C,\mathrm{EPAS1}} = \frac{1}{\sqrt{2\pi\sigma_{C,\mathrm{EPAS1}}^{2}}} \exp\left(-\frac{(x-\mu_{C,\mathrm{EPAS1}})^{2}}{2\sigma_{C,\mathrm{EPAS1}}^{2}}\right)$$

For a subset of genes, the values are calculated, for example for state C, according to

$$P_{C,\text{subset }C} = \prod_{\text{gene } k \text{ belongs to subset } C} P_{C,k}$$

where for k = EPAS1 the individual value $P_{C,k}$ would be $P_{C,\mathrm{EPAS1}}$ of the preceding formula. The state of a sample is allocated by determining the maximum value of the individual subset probabilities. For example, to determine whether a sample is state C or not, the state $S_C$ is calculated by

$$S_C = \max\left(P_{C,\text{subset }C},\; P_{\mathrm{nonC},\text{subset nonC}}\right)$$

To determine whether a sample is state A, B, or C, $S_{ABC}$ is calculated by

$$S_{ABC} = \max\left(P_{A,\text{subset }A},\; P_{B,\text{subset }B},\; P_{C,\text{subset }C}\right)$$
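The allocation of a state from the subset probabilities can be sketched as follows. The sketch assumes a Gaussian density per gene and a product over the genes of each state-specific subset, as is usual for a naive Bayes model; the gene names, parameter values and measured values are hypothetical and are not taken from Table 11.

```python
import math


def gaussian_density(x, mu, sigma):
    """Normal density of x for mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(
        2 * math.pi * sigma ** 2
    )


def subset_score(measured, params, subset):
    """Product of the per-gene densities over all genes of a subset."""
    score = 1.0
    for gene in subset:
        mu, sigma = params[gene]
        score *= gaussian_density(measured[gene], mu, sigma)
    return score


def allocate_state(measured, params_by_state, subsets):
    """Return the state with the maximum subset score, plus all scores."""
    scores = {
        state: subset_score(measured, params_by_state[state], subsets[state])
        for state in subsets
    }
    return max(scores, key=scores.get), scores


# Hypothetical (mean, standard deviation) parameters per state and gene.
params_by_state = {
    "A": {"G1": (25.0, 1.0), "G2": (27.0, 1.0)},
    "B": {"G1": (28.0, 1.0), "G3": (24.0, 1.0)},
    "C": {"G2": (30.0, 1.0), "G3": (22.0, 1.0)},
}
subsets = {"A": ["G1", "G2"], "B": ["G1", "G3"], "C": ["G2", "G3"]}
measured = {"G1": 25.2, "G2": 26.9, "G3": 27.0}

state, scores = allocate_state(measured, params_by_state, subsets)
print(state)
```

For larger subsets, summing logarithms of the densities instead of multiplying them would avoid numerical underflow without changing which state attains the maximum.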
The subsets of genes for the individual states are selected from the qPCR set of genes according to the following iterative procedure:
1. the subset of predictive genes is empty in the beginning of the procedure,
2. compute the accuracy of predicting the desired state A, B, or C for all single genes of the set by employing the naive Bayes model,
3. select a number of genes with the best accuracy of step 2, add them to the subset of predictive genes, and take them out of the set. Preferably the number of genes selected is at least one, two, or three,
4. compute the accuracy of predicting the desired state A, B, or C for the combination of all genes in the subset in combination with each single remaining gene of the set,
5. select a number of genes with the best accuracy of step 4, add them to the subset of predictive genes, and take them out of the set. Preferably the number of genes selected is at least one, two, or three,
6. repeat steps 4) and 5) until a desired accuracy is obtained.
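The iterative procedure above amounts to a greedy forward selection, which can be sketched as follows. The accuracy function is passed in as a parameter; in practice it would evaluate the naive Bayes model on classified samples, whereas the toy function used here (accuracy in percent as a sum of hypothetical per-gene gains) only serves to make the sketch runnable.

```python
def forward_select(candidates, accuracy, target, per_round=1):
    """Greedily add the genes that give the best accuracy until target is met."""
    remaining = list(candidates)
    selected = []
    history = []
    while remaining:
        # Score the current subset combined with each single remaining gene.
        ranked = sorted(
            remaining, key=lambda g: accuracy(selected + [g]), reverse=True
        )
        chosen = ranked[:per_round]
        selected.extend(chosen)
        remaining = [g for g in remaining if g not in chosen]
        current = accuracy(selected)
        history.append((list(selected), current))
        if current >= target:
            break
    return selected, history


# Hypothetical accuracy model: 50% baseline plus a fixed gain per gene.
gains = {"G1": 20, "G2": 15, "G3": 5, "G4": 1}


def toy_accuracy(genes):
    return min(100, 50 + sum(gains[g] for g in genes))


selected, history = forward_select(gains, toy_accuracy, target=85)
print(selected, history[-1][1])
```

In the first round the subset is empty, so each gene is scored on its own, which corresponds to steps 2 and 3; later rounds correspond to steps 4 and 5.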
In general, the prediction accuracy will increase with the number of genes tested. However, cost, effort, and time spent for a test will increase as well with the number of tested genes and will eventually exceed practical limits. It is possible to set a desired or required prediction accuracy such as e.g. at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99%, and repeat the selection procedure until the subset of genes selected provides this accuracy. Table 12 shows the accuracy obtained with an increasing subset of genes, for predicting a single state (a) and for predicting the state of a sample directly from a single set of measurements (b).
Figure imgf000075_0001
Table 12
It has been found that a set of at least 2 descriptors, e.g. genes, will typically allow prediction of a disease-specific state with an accuracy of at least 65%. The accuracy can be improved if more descriptors, e.g. genes, are included in the SP list for each state.
Thus, a set of at least 4 descriptors, e.g. genes, may typically allow prediction of a disease-specific state with an accuracy of at least 75%. A set of at least 10 descriptors, e.g. genes, may typically allow prediction of a disease-specific state with an accuracy of at least 80%. A set of at least 15 descriptors, e.g. genes, will typically allow prediction of a disease-specific state with an accuracy of at least 90% or at least 95%.
Having a set of predictive descriptors, e.g. genes, for each state identified by this approach, one can then test a sample of a patient or individual by e.g. qPCR for all genes of said sets of predictive descriptors, and use the measured data to calculate and thus predict the disease-specific state of this patient or individual.
The assignment to a specific state after testing is performed by calculating 3 values $S_A$, $S_B$, and $S_C$ in parallel, one for each state n = A, B, C. In general, $S_n$ for the state n is calculated according to:

$$S_n = \prod_{\text{gene } k \text{ belongs to subset } n} P_{n,k}$$
Against this background, the invention in one aspect relates to a method of diagnosing, prognosing, classifying, stratifying and/or screening a disease in a human or animal patient, which is suspected of being afflicted by said disease, comprising at least the steps of:
a. determining the expression of a set of at least two predictive descriptors, e.g. genes, in a sample of said patient for a disease-specific state,
b. allocating a disease-specific state to said sample of said patient based on the expression of said set of predictive descriptors,
wherein said set of predictive descriptors is selected from a group of descriptors which is indicative of disease-specific states and which is identifiable by unsupervised two-way hierarchical clustering of gene expression data for samples of said disease from different patients.
The control sample may be a sample, preferably an extracorporeal sample, from a healthy subject. The group of descriptors, from which the sets of predictive descriptors are selected and which is indicative of disease-specific states, is identifiable by unsupervised two-way hierarchical clustering of gene expression data for samples of said disease from different patients as described in PCT/EP2011/057691 and Beleut et al., BMC cancer (2012), 12:310. The approach may thus rely on identifying disease-specific states, e.g. in renal cell cancer or breast cancer, by performing an unsupervised two-way hierarchical clustering approach with TIGR MeV (Saeed et al., Methods Enzymol. (2006), 411:134-193) using Euclidean distance and average linkage without incorporating any histological or pathological data or classifications. Identification of disease-specific descriptors such as biomarkers may then be performed using SAM (Tusher et al., Proc Natl Acad Sci USA (2001) 98(9):5116-5121). Identification of signatures and states may be best performed by first extracting descriptors such as expressed genes for certain pathways using the Panther software as described in PCT/EP2011/057691 and Beleut et al., BMC cancer (2012), 12:310 and subjecting these pathway-specific sets of genes to unsupervised two-way hierarchical clustering. The groups of descriptors, e.g. the genes identified for the different pathways, may then be combined and again subjected to an unsupervised two-way hierarchical clustering approach against the same tumor sets. This two-fold unsupervised two-way hierarchical clustering will reveal in a straightforward manner whether a certain disease can be classified into different disease-specific states as described herein. The relevance of this approach and the states identified can be taken from the fact that the states identified for renal cell cancer correlate with survival time (see PCT/EP2011/057691 and Beleut et al., BMC cancer (2012), 12:310) and that certain breast cancer patients react differently to tamoxifen treatment depending on their state (see hereinafter).
The set of at least 2 predictive descriptors, which is preferably analyzed by qPCR analysis, is selected from a group of descriptors, which is indicative of disease-specific states and which is identifiable by unsupervised two-way hierarchical clustering of gene expression data for samples of said disease from different patients, by a process comprising at least the steps of:
a. selecting for each disease-specific state at least 10 descriptors with the most positive correlation with a disease-specific state,
b. selecting for each disease-specific state at least 10 descriptors with the most negative correlation with a disease-specific state,
c. selecting at least 10 descriptors with low variation across all disease-specific states,
d. selecting randomly at least 10 descriptors across all disease-specific states,
e. combining all descriptors of steps a. to d. into one first list,
f. defining a second list or subset of predictive descriptors, e.g. genes, which is empty in the beginning of the following procedure,
g. computing the accuracy of predicting the desired disease-specific state for all single descriptors, e.g. genes, of the first list separately by employing the above naive Bayes model,
h. selecting a number of descriptors, e.g. genes, with the best accuracy of step g), adding them to the second list or subset of predictive descriptors, e.g. genes, and taking them out of the first list,
i. computing the accuracy of predicting the desired disease-specific state for the combination of all descriptors, e.g. genes, in the second list or subset in combination with each single remaining descriptor, e.g. gene, of the first list,
j. selecting a number of descriptors, e.g. genes, with the best accuracy of step i), adding them to the second list or subset of predictive descriptors, e.g. genes, and taking them out of the first list,
k. repeating steps i) and j) until a desired accuracy is obtained, until said second list contains at least 2 descriptors for each disease-specific state, or until the prediction accuracy reaches a predefined threshold.
Preferably the number of descriptors, e.g. genes, selected in steps h) and j) is at least one, two, or three.
As mentioned above, before the descriptors of steps a. to d. are combined in a first list, one may test all descriptors, e.g. genes by e.g. qPCR analysis and determine the correlation between qPCR and microarray expression as described above. One may then only combine those descriptors in said first list, for which a correlation has been determined. The number of descriptors, e.g. genes to be combined in said first list should be at least 40, preferably at least 50, at least 60, at least 70, at least 80, at least 90, or at least 100.
A set of predictive descriptors for a given disease-specific state, which is measurable by e.g. PCR and optionally qPCR, should comprise at least 2, at least 4, at least 8, at least 10, at least 12, at least 14, at least 16, or at least 20 descriptors, e.g. genes.
The present invention further relates to the above-mentioned method of diagnosing, prognosing, classifying, stratifying and/or screening a disease in a human or animal patient, which is suspected of being afflicted by said disease, by determining a measurable quantity, e.g. expression of a set of disease-specific state-predictive descriptors, e.g. genes, wherein expression of said set of at least 2 descriptors per disease-specific state is determined by qPCR analysis and wherein, based on the qPCR results, assignment of a disease state for a sample of a patient is calculated according to:

$$S_n = \prod_{\text{gene } k \text{ belongs to subset } n} P_{n,k}$$

with

$$P_{n,k} = \frac{1}{\sqrt{2\pi\sigma_{n,k}^{2}}} \exp\left(-\frac{(x_k-\mu_{n,k})^{2}}{2\sigma_{n,k}^{2}}\right)$$

wherein $x_k$ denote the measured cycle numbers for each of the descriptors k and the parameters $\mu_{n,k}$ and $\sigma_{n,k}$ denote predetermined parameters for each gene k and state n, and wherein $S_n$ with the highest value determines the disease-specific state. The parameters may be determined as described above.
The above methods relate preferably to determining disease-specific states in hyper proliferative diseases as mentioned above. Preferred hyper proliferative diseases include renal cell cancer, breast cancer, ovarian cancer, colorectal cancer, lung cancer, prostate cancer, brain cancer, hepato cellular carcinoma, acute myeloma, pheochromocytoma, Burkitt's lymphoma or myeloma. Particularly preferred hyper proliferative diseases are renal cell cancer and breast cancer.
A particularly preferred embodiment relates to the use of sets of predictive descriptors for diagnosing, prognosing, classifying, stratifying and/or screening renal cell cancer in a human or animal patient, which is suspected of being afflicted by said disease, the descriptors being selected from the genes of Table 10. Using the set of descriptors EPAS1, LAPTM4B, DOCK9, BDH1, AP2M1, LPCAT3, state A of renal cell cancer can be predicted with an accuracy of 85%. Using the set of descriptors DOCK9, CD99, BDH1, PGBD5, NDUFA4, LPCAT3, state B of renal cell cancer can be predicted with an accuracy of 93%. Using the set of descriptors LPCAT3, LAPTM4B, RGS5, SERINC3, F5, COBLL1, state C of renal cell cancer can be predicted with an accuracy of 76% (see also Table 11).
The present invention thus further relates to a combination of predictive descriptors for diagnosing, prognosing, classifying, stratifying and/or screening renal cell cancer in a human or animal patient, which is suspected of being afflicted by said disease, said descriptors being selected from Table 10.
In another aspect, the present invention relates to a method of identifying sets of predictive descriptors, e.g. genes, for a disease-specific state in a sample of a patient suffering from said disease, which are suitable for diagnosing, prognosing, classifying, stratifying and/or screening a disease in a human or animal patient, comprising at least the steps of:
a. selecting said set of predictive descriptors from a group of descriptors which is indicative of disease-specific states and which is identifiable by unsupervised two-way hierarchical clustering of gene expression data for samples of said disease from different patients,
b. selecting for each disease-specific state at least 10 descriptors with the most positive correlation with a disease-specific state,
c. selecting for each disease-specific state at least 10 descriptors with the most negative correlation with a disease-specific state,
d. selecting at least 10 descriptors with low variation across all disease-specific states,
e. selecting randomly at least 10 descriptors across all disease-specific states,
f. combining all descriptors of steps b. to e. into one first list,
g. defining a second list or subset of predictive descriptors, e.g. genes, which is empty in the beginning of the following procedure,
h. computing the accuracy of predicting the desired disease-specific state for all single descriptors, e.g. genes, of the first list separately by employing the above naive Bayes model,
i. selecting a number of descriptors, e.g. genes, with the best accuracy of step h), adding them to the second list or subset of predictive descriptors, e.g. genes, and taking them out of the first list,
j. computing the accuracy of predicting the desired disease-specific state for the combination of all descriptors, e.g. genes, in the second list or subset in combination with each single remaining descriptor, e.g. gene, of the first list,
k. selecting a number of descriptors, e.g. genes, with the best accuracy of step j), adding them to the second list or subset of predictive descriptors, e.g. genes, and taking them out of the first list,
l. repeating steps j) and k) until a desired accuracy is obtained, until said second list contains at least 2 descriptors, e.g. genes, for each disease-specific state, or until the prediction accuracy reaches a predefined threshold.
Preferably the number of descriptors, e.g. genes, selected in steps i) and k) is at least one, two, or three.
The sets of predictive descriptors in a sample of a patient may be analyzed by qPCR analysis. The sets of predictive descriptors, e.g. genes, may be identified for a hyper proliferative disease. Preferably such a hyper proliferative disease is selected from the group comprising renal cell cancer, breast cancer, ovarian cancer, colorectal cancer, lung cancer, prostate cancer, brain cancer, hepato cellular carcinoma, acute myeloma, pheochromocytoma, Burkitt's lymphoma or myeloma. Even more preferably, said hyper proliferative disease is renal cell cancer or breast cancer.
In one embodiment, a set of predictive descriptors for a disease-specific state id identified comprising at least two, four, six, eight, 10, 12, 14, 16, 18, or 20 predictive descriptors for a discrete disease-specific state
The sets of descriptors may be identified in an extracorporeal sample of a patient. The present invention also relates to combinations of predictive descriptors for diagnosing, prognosing, classifying, stratifying and/or screening renal cell cancer in a human or animal patient, which is suspected of being afflicted by said disease, being identifiable by the above method.
The invention is now described with respect to specific experiments. These experiments shall, however, not be construed as being limiting.
Experiments
1. Identification of discrete renal cell cancer-specific states and representative gene sets (descriptors)
In PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310, a two way hierarchical clustering approach was described for identification of discrete disease-specific states in renal cell cancer (RCC). Further, it was shown that this approach is suitable for identifying stable constellations of at least three global relative constellations (which were named states) in other tumor types. The states in RCC were named A, B and C. Further, a computer implemented approach was described which also allowed identification of descriptors being characteristic for these states.
Based on the knowledge that such distinct relative constellations being designated as states exist in RCC, a new algorithm was developed to straightforwardly test for and identify states in other cancers such as colorectal cancer or even in renal cell cancer.
Determination of discriminating genes
The determination of discriminating genes for colorectal and other types of cancer is based on the findings of research on renal cell carcinoma. Several gene expression signatures (states) were identified allowing discrimination between different tumor behaviors (see PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310). This classification of renal cell cancer (RCC) tissue samples, in the following called "RCC-C", together with their respective gene patterns formed the basic input to the gene identification process described below. The RCC-states can be characterized and discriminated by a set of genes, in the following called "RCC-G", which possess some discriminatory power to classify the tissue specific samples correctly (according to the classification found in the first step).
To transfer these findings from RCC to other types of cancer cells a two-step algorithm is applied:
1. In a first step the samples for the other tumor type such as colorectal cancer are classified according to the states found in the RCC, leading to a tissue-specific classification of the samples "Ts-C".
2. In a second step a set of genes is sought, whereby each single gene out of this set possesses some discriminatory power to classify the tissue specific samples correctly (according to the classification found in the first step). This set of genes is called tissue-specific gene set "Ts-G".
These two steps are described in more detail below.
Identification of the RCC-G
Starting with the given set of genes RCC-C (taken from PCT/EP2011/057691) each RCC sample can be characterized by a list of 100 values for the gene expression levels. This ordered list can be regarded as a vector in a 100-dimensional Hilbert space H. One could select more or less than 100 values. However, this number is a reasonable size that provides reliable results.
To this end, the set of genes which build the base of this space is optimized such that an optimal clustering (with a maximal distance within this vector space given the metric in this space) of the states is given by this base. This defines the optimal set of genes RCC-G. Figure 1 shows this process of representation of the states in a higher-dimensional space, together with the identification of centers for the states (denoted by crosses in the sketch). In addition to the centers of the different states A, B and C, a further vector is defined by the centre of all samples which could not be covered by the previous mapping. This fourth group is denoted by "D".
Transfer to other tissues
Now the set of genes RCC-G can be taken to define a similar vector space for the tumor specific samples of another disease or cancer such as colorectal cancer. The data for these other tumor specific samples such as colorectal cancer may come e.g. from microarray expression analysis. In the present case, the cancer related data may be publicly available expression data, for example from Affymetrix gene chip data, preferably for whole genome expression data. Not only is this base defined; with the already calculated centers of the states one can now also generate a tissue specific classification (Ts-C) by
1. representing the sample by a vector in this vector space,
2. calculating the (Euclidean) distance between the sample and all centre vectors for A, B, C and D,
3. mapping each sample to the centre vector closest to it.
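The three steps above amount to a nearest-centre assignment. The following is a minimal sketch only, assuming toy three-dimensional vectors in place of the 100-gene space; the centre coordinates, sample names and the helper names `euclidean` and `assign_state` are hypothetical and are not taken from the original disclosure.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length expression vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def assign_state(sample, centres):
    """Return the label of the centre vector closest to the sample."""
    return min(centres, key=lambda label: euclidean(sample, centres[label]))

# Hypothetical centres for states A, B, C and the residual group D.
centres = {
    "A": [1.0, 0.0, 0.0],
    "B": [0.0, 1.0, 0.0],
    "C": [0.0, 0.0, 1.0],
    "D": [0.5, 0.5, 0.5],
}

# Hypothetical samples of the other tumor type, expressed in the same gene base.
samples = {"s1": [0.9, 0.1, 0.0], "s2": [0.4, 0.6, 0.5]}

# Tissue-specific classification (Ts-C): each sample mapped to its closest centre.
ts_c = {name: assign_state(vec, centres) for name, vec in samples.items()}
```

In this toy example the first sample lies closest to the centre of A and the second closest to the centre of D.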
In this way the tissue-specific classification is defined, illustrated in figure 2.
Identification of the tissue specific gene set
Once the tumor-specific classification (Ts-C) has been defined by the first step described above, a set of genes is calculated with the following property:
• Each of the genes within this set is able to discriminate between one specific state and the other states, e.g. between "A" on the one hand, and "B", "C" and "D" on the other hand.
• All genes are scanned whether they are, after appropriate pre-processing and scaling, able to achieve this discrimination, i.e. build a model with predictive power.
• For each state the N best genes according to this procedure are taken.
This results in a list of N genes for each state in this tissue and therefore the tumor specific gene set (Ts-G). In the present case, the tumor specific gene set relates to colorectal cancer. The flowchart of the algorithm is sketched in Figure 3.
This approach led to the identification of genes 1 to 100 (table 1), 101 to 200 (table 2), 201 to 300 (table 3) and 301 to 399 (table 4), the expression patterns of which can be used to differentiate between four discrete colorectal cancer-specific states, which were labeled A, B, C and D as described above.
2. Functional relevance of discrete colorectal cancer-specific states and representative gene sets (descriptors)
The descriptors found to be representative of states A (table 1), B (table 2), C (table 3) and D (table 4) were then analyzed for their functional relevance. It was assumed that potential differences, possibly representing functional or metabolomic irregularities among states, might become evident when the best state descriptors for each such state are analyzed by means of bioinformatics according to functional, known and predicted protein-protein interactions. To this end STRING (http://www.stringdb.org/) was used for functional classification of the 100 best state descriptors. The software's standard settings were used to search for multiple names, correlated to Homo sapiens and clustered according to the software's "confidentiality" parameter. As an output, associated networks and interacting proteins were identified, which suggest a functional relevance and point to a distinct biology in the respective tumors. The associated network of state A is depicted in Figure 4, of state B in Figure 5, of state C in Figure 6 and of state D in Figure 7.
3. Identification of discrete breast cancer-specific states and representative gene sets (descriptors)
In PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310, a two way hierarchical clustering approach was described for identification of discrete disease-specific states in renal cell cancer (RCC). Further, it was shown that this approach is suitable for identifying stable constellations of at least three global relative constellations (which were named states) in other tumor types. The states in RCC were named A, B and C. Further, a computer implemented approach was described which also allowed identification of descriptors being characteristic for these states.
Based on the knowledge that such distinct relative constellations being designated as states exist in RCC, a new algorithm was developed to straightforwardly test for and identify states in other cancers such as breast cancer.
Determination of discriminating genes
The determination of discriminating genes for breast and other types of cancer is based on the findings of research on renal cell carcinoma. Several gene expression signatures (states) were identified allowing discrimination between different tumor behaviors (see PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310). This classification of renal cell cancer (RCC) tissue samples, in the following called "RCC-C", together with their respective gene patterns formed the basic input to the gene identification process described below. The RCC-states can be characterized and discriminated by a set of genes, in the following called "RCC-G", which possess some discriminatory power to classify the tissue specific samples correctly (according to the classification found in the first step).
To transfer these findings from RCC to other types of cancer cells a two-step algorithm is applied:
1. In a first step the samples for the other tumor type such as breast cancer are classified according to the states found in the RCC, leading to a tissue-specific classification of the samples "Ts-C".
2. In a second step a set of genes is sought, whereby each single gene out of this set possesses some discriminatory power to classify the tissue specific samples correctly (according to the classification found in the first step). This set of genes is called tissue-specific gene set "Ts-G".
These two steps are described in more detail below.
Identification of the RCC-G
Starting with the given set of genes RCC-C (taken from PCT/EP2011/057691 and Beleut et al., BMC cancer (2012), 12:310) each RCC sample can be characterized by a list of 100 values for the gene expression levels. This ordered list can be regarded as a vector in a 100-dimensional Hilbert space H. One could select more or less than 100 values. However, this number is a reasonable size that provides reliable results. To this end, the set of genes which build the base of this space is optimized such that an optimal clustering (with a maximal distance within this vector space given the metric in this space) of the states is given by this base. This defines the optimal set of genes RCC-G. Figure 1 shows this process of representation of the states in a higher-dimensional space, together with the identification of centers for the states (denoted by crosses in the sketch). In addition to the centers of the different states A, B and C, a further vector is defined by the centre of all samples which could not be covered by the previous mapping. This fourth group is denoted by "D".
Transfer to other tissues
Now the set of genes RCC-G can be taken to define a similar vector space for the tumor specific samples of another disease or cancer such as breast cancer. The data for these other tumor specific samples such as breast cancer may come e.g. from microarray expression analysis. In the present case, the breast cancer related data were publicly available expression data, for example from Affymetrix gene chip data, preferably for whole genome expression data.
Not only is this base defined; with the already calculated centers of the states one can now also generate a tissue specific classification (Ts-C) by
1. representing the sample by a vector in this vector space,
2. calculating the (Euclidean) distance between the sample and all centre vectors for A, B, C and D,
3. mapping each sample to the centre vector closest to it.
In this way the tissue-specific classification is defined, illustrated in figure 2.
Identification of the tissue specific gene set
Once the tumor-specific classification (Ts-C) has been defined by the first step described above, a set of genes is calculated with the following property:
• Each of the genes within this set is able to discriminate between one specific state and the other states, e.g. between "A" on the one hand, and "B", "C" and "D" on the other hand.
• All genes are scanned whether they are, after appropriate pre-processing and scaling, able to achieve this discrimination, i.e. build a model with predictive power.
• For each state the N best genes according to this procedure are taken.
This results in a list of N genes for each state in this tissue and therefore the tumor specific gene set (Ts-G). In the present case, the tumor specific gene set relates to breast cancer. The flowchart of the algorithm is sketched in Figure 3.
This approach led to the identification of genes 1 to 100 (table 5), 101 to 200 (table 6), 201 to 300 (table 7) and 301 to 399 (table 8), the expression patterns of which can be used to differentiate between four discrete breast cancer-specific states, which were labeled A, B, C and D as described above.
4. Functional relevance of discrete breast cancer-specific states and representative gene sets (descriptors)
The descriptors found to be representative of states A (table 5), B (table 6), C (table 7) and D (table 8) were then analyzed for their functional relevance. It was assumed that potential differences, possibly representing functional or metabolomic irregularities among states, might become evident when the best state descriptors for each such state are analyzed by means of bioinformatics according to functional, known and predicted protein-protein interactions. To this end STRING (http://www.stringdb.org/) was used for functional classification of the 100 best state descriptors. The software's standard settings were used to search for multiple names, correlated to Homo sapiens and clustered according to the software's "confidentiality" parameter. As an output, associated networks and interacting proteins were identified, which suggest a functional relevance and point to a distinct biology in the respective tumors. The associated network of state A is depicted in Figure 8, of state B in Figure 9, of state C in Figure 10 and of state D in Figure 11.
5. Pharmacologic relevance of breast cancer-specific states
Tamoxifen is approved by the U.S. Food and Drug Administration to treat women diagnosed with estrogen-receptor(ER)-positive, early and late stage breast cancer after primary intervention (chemotherapy, radiation, surgery) to reduce the risk of recurrence of the cancer. Under treatment with Tamoxifen, ER-positive breast cancers show a heterogeneous range of response rates, suggesting a complex biology of these tumours. The identification of markers to define responder groups, and thus the possibility to assign those not likely to benefit from Tamoxifen treatment to other types of therapy, is therefore of high medical and economic importance.
To determine the relevance of the breast cancer-specific states A, B, C and D identified in Example 3, two cohorts of ER-positive patients, namely Tamoxifen-treated patients (Cohort I: GSE6532) and Tamoxifen-untreated patients (Cohort II: GSE2034) were re-analysed with respect to the breast cancer-specific states identified herein. The data were retrieved from Gene Expression Omnibus (for Cohort I, see http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE6532 and http://jco.ascopubs.org/content/25/10/1239, for Cohort II, see http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE2034 and http://www.ncbi.nlm.nih.gov/pubmed/15721472).
As a first step gene expression data from the two patient cohorts positive for estrogen receptor were analyzed using the algorithm of Example 3 and each patient was assigned to breast cancer-specific states A, B, C and D. Some patients could not be assigned to any of these states. They were thus assigned to a group labelled as E:
Breast Cancer State/Group    Number of Patients (Cohort 1)    Number of Patients (Cohort 2)
A                            141 (86%)                        65 (31%)
B                            0 (0%)                           1 (0%)
C                            5 (3%)                           58 (28%)
D                            0 (0%)                           58 (28%)
E                            17 (10%)                         27 (13%)
Cancer states were then correlated with clinical data represented by the "distant metastasis free survival (DMFS)" parameters (see also Wang Y, et al. Lancet 2005: 671-9. PMID: 15721472).
As depicted in Figure 12, breast cancer-specific state A is associated with a markedly prolonged time to disease progression. Patients with other states or belonging to group E ("non-State A") show a clinical course that is indistinguishable from patients that did not receive medication as demonstrated with cohort II which had not been treated with Tamoxifen (Fig. 2B).
The patients of Cohort I were assigned to the breast cancer-specific states and group E as follows:
State A:
GSM150943;GSM150944;GSM150945;GSM150946;GSM150948;GSM150949;GSM150950;GSM150951;GSM150952;GSM150953;GSM150954;GSM150956;GSM150957;GSM150958;GSM150959;GSM150960;GSM150961;GSM150962;GSM150963;GSM150964;GSM150965;GSM150968;GSM150969;GSM150970;GSM150972;GSM150973;GSM150974;GSM150975;GSM150976;GSM150977;GSM150978;GSM150979;GSM150980;GSM150981;GSM150985;GSM150989;GSM150990;GSM150991;GSM150992;GSM150993;GSM150994;GSM150995;GSM150996;GSM150997;GSM150998;GSM150999;GSM151001;GSM151003;GSM151004;GSM151006;GSM151007;GSM151008;GSM151009;GSM151010;GSM151011;GSM151013;GSM151014;GSM151016;GSM151017;GSM151019;GSM151020;GSM151021;GSM151022;GSM151023;GSM151024;GSM151025;GSM151026;GSM151027;GSM151028;GSM151031;GSM151033;GSM151034;GSM151035;GSM151036;GSM151037;GSM151038;GSM151039;GSM151041;GSM151042;GSM151047;GSM151048;GSM151050;GSM151051;GSM151052;GSM151053;GSM151054;GSM151055;GSM151056;GSM151057;GSM151060;GSM151063;GSM151065;GSM151066;GSM65316;GSM65320;GSM65321;GSM65322;GSM65323;GSM65324;GSM65325;GSM65326;GSM65329;GSM65330;GSM65331;GSM65332;GSM65333;GSM65334;GSM65335;GSM65336;GSM65338;GSM65339;GSM65340;GSM65341;GSM65344;GSM65345;GSM65348;GSM65351;GSM65352;GSM65353;GSM65354;GSM65355;GSM65356;GSM65357;GSM65358;GSM65359;GSM65360;GSM65361;GSM65362;GSM65363;GSM65364;GSM65365;GSM65366;GSM65367;GSM65368;GSM65370;GSM65371;GSM65373;GSM65376;GSM65377;GSM65378;GSM65379;
State B: no patient
State C:
GSM150955;GSM150966;GSM150971;GSM150983;GSM150984;
State D: no patient
Group E:
GSM150947;GSM150986;GSM150987;GSM151000;GSM1510020;GSM151005;GSM151018;GSM151044;GSM151058;GSM151067;GSM151068;GSM65317;GSM65318;GSM65319;GSM65346;GSM65372;GSM65374;
The patients of Cohort II were assigned to the breast cancer-specific states and group E as follows:
State A:
GSM36777;GSM36779;GSM36785;GSM36796;GSM36802;GSM36805;GSM36806;GSM36810;GSM36818;GSM36821;GSM36826;GSM36830;GSM36839;GSM36841;GSM36843;GSM36845;GSM36857;GSM36864;GSM36866;GSM36867;GSM36869;GSM36872;GSM36877;GSM36884;GSM36885;GSM36889;GSM36890;GSM36894;GSM36896;GSM36907;GSM36911;GSM36914;GSM36920;GSM36922;GSM36924;GSM36925;GSM36936;GSM36938;GSM36945;GSM36967;GSM36970;GSM36973;GSM36983;GSM36984;GSM36989;GSM36990;GSM36992;GSM36993;GSM36994;GSM36995;GSM36998;GSM36999;GSM37000;GSM37004;GSM37010;GSM37011;GSM37013;GSM37015;GSM37024;GSM37026;GSM37028;GSM37035;GSM37036;GSM37037;GSM37044;
State B: GSM36811
State C:
GSM36784;GSM36786;GSM36787;GSM36799;GSM36801;GSM36803;GSM36804;GSM36807;GSM36815;GSM36819;GSM36820;GSM36823;GSM36824;GSM36832;GSM36838;GSM36840;GSM36844;GSM36853;GSM36856;GSM36858;GSM36859;GSM36860;GSM36861;GSM36868;GSM36874;GSM36878;GSM36882;GSM36883;GSM36888;GSM36895;GSM36898;GSM36899;GSM36901;GSM36902;GSM36903;GSM36908;GSM36910;GSM36913;GSM36927;GSM36930;GSM36934;GSM36950;GSM36957;GSM36960;GSM36962;GSM36963;GSM36971;GSM36975;GSM36980;GSM36982;GSM36986;GSM36987;GSM36988;GSM37005;GSM37007;GSM37012;GSM37018;GSM37051;
State D:
GSM36778;GSM36790;GSM36792;GSM36794;GSM36813;GSM36814;GSM36817;GSM36829;GSM36836;GSM36837;GSM36842;GSM36848;GSM36850;GSM36870;GSM36871;GSM36881;GSM36887;GSM36892;GSM36893;GSM36897;GSM36900;GSM36916;GSM36917;GSM36919;GSM36921;GSM36929;GSM36932;GSM36942;GSM36944;GSM36946;GSM36947;GSM36948;GSM36954;GSM36956;GSM36958;GSM36965;GSM36972;GSM36976;GSM36979;GSM36985;GSM36996;GSM36997;GSM37003;GSM37006;GSM37008;GSM37009;GSM37019;GSM37025;GSM37029;GSM37031;GSM37032;GSM37033;GSM37038;GSM37041;GSM37057;GSM37059;GSM37060;GSM37062;
Group E:
GSM36781;GSM36782;GSM36783;GSM36789;GSM36825;GSM36831;GSM36834;GSM36849;GSM36851;GSM36852;GSM36873;GSM36880;GSM36928;GSM36931;GSM36933;GSM36939;GSM36943;GSM36951;GSM36974;GSM37001;GSM37014;GSM37027;GSM37030;GSM37039;GSM37046;GSM37047;GSM37058;
Data for the above patients can be retrieved by typing the patient identifiers into the GEO accession no. field at http://www.ncbi.nlm.nih.gov/geo/.
6. Identification of genes for qPCR to determine renal cell cancer-specific states
As described in PCT/EP2011/057691 and Beleut et al, BMC cancer (2012), 12:310 as well as herein, unsupervised two-way hierarchical clustering for microarray expression data across different renal cell cancers such as papillary, clear cell and chromophobe renal cell cancer was used to identify three renal cancer-specific states, namely A, B and C. Further, it was shown that these states have functional relevance as patients with state A show a prolonged survival time compared to patients with cancer states B or C. Importantly, this novel classification is independent of the above mentioned histological parameters or other parameters which are used in clinics to classify tumors. Small tumors, for which the classical pathologist foresees a good prognosis, are highly aggressive when affiliated to groups B or C, and vice versa large (advanced) tumors, for which the prognosis is unfavorable for patients, can be regarded as "mild" when falling into group A.
Then the expression data of such microarray analysis were normalized employing the MAS5 algorithm provided by Affymetrix, followed by a data filtering process in which all those genes were excluded from further analysis whose number of present calls was below 25%. Present calls are a factor specific to Affymetrix microarrays, where a probe array consists of a number of oligonucleotide probe cells and each probe cell contains a unique oligonucleotide probe. Probes are tiled in probe pairs as a Perfect Match (PM) and a Mismatch (MM). The sequences for PM and MM are the same, except for a change to the Watson-Crick complement in the middle of the MM probe sequence. A probe set consists of a series of probe pairs and represents an expressed transcript. The sample should bind more strongly to PM than to MM, so one assumes that for a given probe set for example 25% or more of the PM values should be higher than the MM values to be deemed valid. The expression values of the remaining genes were then correlated with the patient states A, B, C. The list of genes describing states A, B, and C is shown in Table 9.
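The filtering step can be illustrated with a minimal sketch. It assumes that each gene carries one detection call per sample ("P" for present, "A" for absent) and that genes with fewer than 25% present calls are dropped; the gene names, the call data and the helper names are hypothetical, and this is not the MAS5 implementation itself.

```python
def present_fraction(calls):
    """Fraction of detection calls equal to 'P' (present)."""
    return sum(1 for c in calls if c == "P") / len(calls)

def filter_by_present_calls(detection_calls, min_fraction=0.25):
    """Keep genes whose fraction of present calls is at least min_fraction."""
    return [gene for gene, calls in detection_calls.items()
            if present_fraction(calls) >= min_fraction]

# Hypothetical detection calls for three genes across four samples.
detection_calls = {
    "gene_1": ["P", "P", "A", "A"],  # 50% present, kept
    "gene_2": ["A", "A", "A", "P"],  # 25% present, kept (not below the threshold)
    "gene_3": ["A", "A", "A", "A"],  # 0% present, excluded
}

kept = filter_by_present_calls(detection_calls)
```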
In a next step, at least 2 genes, preferably at least 10, with the most positive correlation and at least 2 genes, preferably at least 10, with the most negative correlation for a disease-specific state such as renal cancer-specific state A, B, or C were selected out of the remaining genes. Further, it is possible to add at least 2, preferably at least 10, genes that are randomly selected and at least 2, preferably at least 10, genes showing the least variation across all the states.
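A minimal sketch of this selection follows. It assumes that "correlation with a state" is a Pearson correlation between a gene's expression values and a 0/1 indicator of membership in that state, which the text does not specify; the gene names and values are hypothetical, and the random picks are omitted to keep the example deterministic.

```python
import math
import statistics

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input has no variation."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def select_genes(expression, states, target, k):
    """Return the k most positively correlated, the k most negatively
    correlated and the k least variable genes for the target state."""
    indicator = [1.0 if s == target else 0.0 for s in states]
    ranked = sorted(expression,
                    key=lambda g: pearson(expression[g], indicator),
                    reverse=True)
    by_variance = sorted(expression,
                         key=lambda g: statistics.pvariance(expression[g]))
    return ranked[:k], ranked[-k:], by_variance[:k]

# Hypothetical state labels of four patients and expression values per gene.
states = ["A", "A", "B", "C"]
expression = {
    "g_up":    [5.0, 6.0, 1.0, 2.0],  # high in state A
    "g_down":  [1.0, 2.0, 6.0, 5.0],  # low in state A
    "g_flat":  [3.0, 3.1, 3.0, 3.1],  # nearly constant
    "g_mixed": [1.0, 5.0, 5.0, 1.0],  # unrelated to state A
}

positive, negative, stable = select_genes(expression, states, "A", k=1)
```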
All genes for which e.g. qPCR and microarray expression data correlate above threshold (see Table 10) formed a set of genes. In a subsequent step the parameters for a naive Bayes classifier are calculated for every gene (i) and every state, i.e. means $\mu_i$ and standard deviations $\sigma_i$ over every state A, B, C individually. For state A the mean over all patients for gene EPAS1 was calculated according to

$$\mu_{A,\mathrm{EPAS1}} = \frac{1}{n_A} \sum_{j=1}^{n_A} x_{\mathrm{EPAS1},j}$$

where $n_A$ is the number of all patients classified A and $x_{\mathrm{EPAS1},j}$ denotes the qPCR value measured for patient (j) and gene EPAS1. Similarly, means and standard deviations for all patients which are not allocated to the particular state are calculated, that is, nonA, nonB, nonC are calculated. For nonA the mean over all patients not being allocated to state A (that is, allocated to either B or C, or not allocated at all) and gene EPAS1 was calculated according to

$$\mu_{\mathrm{nonA},\mathrm{EPAS1}} = \frac{1}{n_{\mathrm{nonA}}} \sum_{j=1}^{n_{\mathrm{nonA}}} x_{\mathrm{EPAS1},j}$$

where $n_{\mathrm{nonA}}$ is the number of all patients not classified A.
For states B and C, the mean over all patients for gene EPAS1 was calculated correspondingly and as described above. For nonB and nonC, the mean over all patients not being allocated to state B (that is, allocated to either A or C, or not allocated at all) or not being allocated to state C (that is, allocated to either A or B, or not allocated at all) and gene EPAS1 was calculated correspondingly and as described above. For the set of genes for which qPCR data were measured, parameter values are given in Table 11.
Figure imgf000096_0002
Table 11
Subsequently, posterior probabilities to denote a state are calculated for all genes. Taking again state A and gene EPAS1 as an example, the posterior probability that the measured qPCR value x for EPAS1 denotes state A is calculated according to

$$P_{A,\mathrm{EPAS1}} = \frac{1}{\sqrt{2\pi\sigma_{A,\mathrm{EPAS1}}^{2}}} \exp\left(-\frac{(x-\mu_{A,\mathrm{EPAS1}})^{2}}{2\sigma_{A,\mathrm{EPAS1}}^{2}}\right)$$

For a subset of genes, the values are calculated for state A according to

$$P_{A,\mathrm{subset}\,A} = \prod_{k\ \text{belongs to subset}\ A} P_{A,k}$$

where for k = EPAS1 the individual value $P_{A,k}$ would be $P_{A,\mathrm{EPAS1}}$ of the preceding formula. The state of a sample is allocated by determining the maximum value of the individual subset probabilities. For example, to determine whether a sample is state A or not, the state $S_A$ is calculated by

$$S_A = \max\{P_{A,\mathrm{subset}\,A},\ P_{\mathrm{nonA},\mathrm{subset}\,\mathrm{nonA}}\}$$

The same calculations were done correspondingly and as described above for posterior probabilities for states B and C. To determine whether a sample is state A, B, or C, $S_{ABC}$ was calculated by

$$S_{ABC} = \max\{P_{A,\mathrm{subset}\,A},\ P_{B,\mathrm{subset}\,B},\ P_{C,\mathrm{subset}\,C}\}$$
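The calculation of the parameters, the per-gene probabilities and the allocation by the maximum subset probability can be sketched as follows. This is an illustration under stated assumptions: the sample standard deviation is used (the text does not say which estimator was applied), the per-gene values are multiplied over the subset as in a naive Bayes model, and "GENE2", all qPCR values and all function names are hypothetical.

```python
import math
import statistics

def fit_parameters(training):
    """Estimate (mean, standard deviation) per state and gene.

    training maps state -> gene -> list of measured qPCR values.
    """
    return {
        state: {gene: (statistics.fmean(vals), statistics.stdev(vals))
                for gene, vals in genes.items()}
        for state, genes in training.items()
    }

def gaussian(x, mu, sigma):
    """Normal density of x for mean mu and standard deviation sigma."""
    return (math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
            / math.sqrt(2 * math.pi * sigma ** 2))

def subset_probability(sample, state_params, subset):
    """Product of the per-gene densities over the genes of the subset."""
    p = 1.0
    for gene in subset:
        mu, sigma = state_params[gene]
        p *= gaussian(sample[gene], mu, sigma)
    return p

def assign_state(sample, params, subsets):
    """Allocate the state with the maximum subset probability."""
    return max(params,
               key=lambda s: subset_probability(sample, params[s], subsets[s]))

# Hypothetical qPCR training values for two states and two genes.
training = {
    "A": {"EPAS1": [20.0, 21.0, 22.0], "GENE2": [30.0, 31.0, 29.0]},
    "B": {"EPAS1": [26.0, 27.0, 25.0], "GENE2": [24.0, 25.0, 26.0]},
}
params = fit_parameters(training)
subsets = {"A": ["EPAS1", "GENE2"], "B": ["EPAS1", "GENE2"]}

state_1 = assign_state({"EPAS1": 21.5, "GENE2": 29.5}, params, subsets)
state_2 = assign_state({"EPAS1": 25.5, "GENE2": 25.5}, params, subsets)
```

The same functions cover the "A versus nonA" decision if the pooled nonA patients are supplied as a second state in the training data.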
The subsets of genes for the individual states were selected from the qPCR set of genes according to the following iterative procedure:
1. the subset of predictive genes is empty in the beginning of the procedure
2. compute the accuracy of predicting the desired state A, B, or C for all single genes of the set by employing the naive Bayes model,
3. select a number of genes with the best accuracy of step 2 and add them to the set of predictive genes, and take them out of the set. Preferably the number of genes selected is at least one, two, or three.
4. compute the accuracy of predicting the desired state A, B, or C for the combination of all genes in the subset in combination with each single remaining gene of the set.
5. select a number of genes with the best accuracy of step 4 and add them to the set of predictive genes, and take them out of the set. Preferably the number of genes selected is at least one, two, or three.
6. Repeat steps 4) and 5) until a desired accuracy is obtained.
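The iterative procedure above is a greedy forward selection. The sketch below assumes that an accuracy function for a candidate subset is supplied by the caller (for example the accuracy of a naive Bayes model restricted to that subset); the toy accuracy function, the gene names and the function name `forward_select` are hypothetical.

```python
def forward_select(candidates, accuracy, target, per_round=1):
    """Greedily move genes from the candidate set into the predictive subset.

    accuracy(subset) must return the prediction accuracy of a list of genes.
    Selection stops once the target accuracy is reached or no genes remain.
    """
    remaining = list(candidates)
    selected = []  # step 1: the subset of predictive genes starts empty
    while remaining and (not selected or accuracy(selected) < target):
        # steps 2 and 4: score the current subset plus each single remaining gene
        ranked = sorted(remaining,
                        key=lambda g: accuracy(selected + [g]),
                        reverse=True)
        # steps 3 and 5: add the best genes and take them out of the set
        best = ranked[:per_round]
        selected.extend(best)
        for gene in best:
            remaining.remove(gene)
    return selected

# Toy accuracy: each hypothetical gene contributes a fixed amount, capped at 1.0.
contribution = {"g1": 0.5, "g2": 0.25, "g3": 0.125, "g4": 0.0625}

def toy_accuracy(subset):
    return min(1.0, sum(contribution[g] for g in subset))

subset = forward_select(contribution, toy_accuracy, target=0.75)
```

With these toy values the procedure picks the gene with the best single accuracy first and stops after the second gene, when the target accuracy is reached.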
Table 12 shows the accuracy obtained with an increasing subset of genes, for predicting a single state (a) and for predicting the state of a sample directly from a single set of measurements (b).
Figure imgf000097_0001
Figure imgf000098_0001
Table 12
The assignment to a specific state after testing an individual sample for the expression of these predictive descriptors was performed by calculating
Figure imgf000098_0002
with

$$P_{n,\mathrm{subset}\,n} = \prod_{k\ \text{belongs to subset}\ n} P_{n,k}$$

and

$$P_{n,k} = \frac{1}{\sqrt{2\pi\sigma_{n,k}^{2}}} \exp\left(-\frac{(x_k-\mu_{n,k})^{2}}{2\sigma_{n,k}^{2}}\right)$$

wherein $x_k$ denote the measured cycle numbers for each of the descriptors k and parameters $\mu_{n,k}$ and $\sigma_{n,k}$ denote predetermined parameters for each gene k and state n, wherein $S_n$ with the highest value determines the disease-specific state.
Table 1
Figure imgf000099_0001
55 208983 s at A normal 0.47 115 17 0.60 55
56 209070 s at A normal 0.46 113 22 0.69 56
57 209071 s at A normal 0.45 111 24 0.79 57
58 209182 s at A normal 0.46 111 21 0.59 58
59 209183 s at A normal 0.46 113 22 0.72 59
60 209199 s at A normal 0.45 112 23 0.63 60
61 209201 x at A normal 0.46 114 22 0.63 61
62 209473 at A normal 0.46 117 23 0.60 62
63 209474 s at A normal 0.45 113 23 0.47 63
64 209543 s at A normal 0.45 111 24 0.62 64
65 209574 s at A normal 0.45 116 24 0.47 65
66 209749 s at A normal 0.45 110 23 0.48 66
67 210095 s at A normal 0.45 118 24 0.85 67
68 210512 s at A normal 0.45 114 24 0.86 68
69 210513 s at A normal 0.45 119 26 0.66 69
70 210839 s at A normal 0.45 111 23 0.66 70
71 211148 s at A normal 0.45 118 25 0.47 71
72 211527 x at A normal 0.45 117 24 0.65 72
73 211919 s at A normal 0.45 111 24 0.63 73
74 212077 at A normal 0.45 117 25 0.78 74
75 212097 at A normal 0.47 116 17 0.74 75
76 212143 s at A normal 0.47 117 14 0.76 76
77 212171 x at A normal 0.45 118 24 0.69 77
78 212552 at A normal 0.46 115 22 0.75 78
79 212730 at A normal 0.45 114 24 0.58 79
80 213001 at A normal 0.45 113 23 0.57 80
81 213222 at A normal 0.45 117 24 0.61 81
82 213349 at A normal 0.45 112 23 0.69 82
83 213352 at A normal 0.45 118 25 0.51 83
84 213416 at A normal 0.45 111 23 0.52 84
85 215125 s at A normal 0.45 118 24 0.64 85
86 215363 x at A normal 0.45 114 24 0.53 86
87 215446 s at A normal 0.46 112 19 0.62 87
88 216589 at A normal 0.45 114 24 0.56 88
89 217456 x at A normal 0.45 113 24 0.84 89
90 217757 at A normal 0.45 115 25 0.80 90
91 218484 at A normal 0.46 113 21 0.74 91
92 218723 s at A normal 0.45 115 23 0.68 92
93 218888 s at A normal 0.46 115 18 0.63 93
94 219232 s at A normal 0.46 113 22 0.67 94
95 219506 at A normal 0.45 114 23 0.62 95
96 221009 s at A normal 0.45 116 23 0.70 96
97 221031 s at A normal 0.46 112 19 0.66 97
98 221530 s at A normal 0.45 116 24 0.68 98
99 221875 x at A normal 0.45 115 25 0.82 99
100 222033 s at A normal 0.46 112 22 0.69 100
Table 2
Figure imgf000100_0001
107 200007 at B Invers 0.42 21 8 0.80 107
108 200008 s at B Invers 0.43 22 7 0.57 108
109 200009 at B Invers 0.45 23 5 0.75 109
110 200010 at B Invers 0.45 21 5 0.78 110
111 200012 x at B Invers 0.45 22 5 0.82 111
112 200013 at B Invers 0.42 19 7 0.85 112
113 200014 s at B Invers 0.41 19 8 0.76 113
114 200015 s at B Invers 0.41 20 9 0.79 114
115 200016 x at B Invers 0.43 23 7 0.88 115
116 200017 at B Invers 0.44 23 6 0.87 116
117 200018 at B Invers 0.44 19 5 0.83 117
118 200019 s at B Invers 0.43 20 6 0.85 118
119 200020 at B Invers 0.42 20 8 0.71 119
120 200021 at B Invers 0.40 21 11 0.87 120
121 200022 at B Invers 0.40 17 8 0.79 121
122 200023 s at B Invers 0.44 21 6 0.75 122
123 200024 at B Invers 0.40 19 9 0.81 123
124 200025 s at B Invers 0.46 21 4 0.85 124
125 200026 at B Invers 0.41 19 8 0.88 125
126 200027 at B Invers 0.41 20 9 0.76 126
127 200028 s at B Invers 0.42 20 8 0.72 127
128 200029 at B Invers 0.45 21 5 0.87 128
129 200030 s at B Invers 0.44 21 6 0.76 129
130 200031 s at B Invers 0.45 21 5 0.90 130
131 200032 s at B Invers 0.45 21 5 0.87 131
132 200033 at B Invers 0.42 21 8 0.80 132
133 200034 s at B Invers 0.41 18 8 0.84 133
134 200036 s at B Invers 0.42 18 7 0.73 134
135 200037 s at B Invers 0.42 19 7 0.75 135
136 200038 s at B Invers 0.44 20 5 0.80 136
137 200039 s at B Invers 0.46 22 4 0.67 137
138 200040 at B Invers 0.44 22 6 0.75 138
139 200044 at B Invers 0.40 19 9 0.71 139
140 200045 at B Invers 0.43 22 7 0.60 140
141 200046 at B Invers 0.44 22 6 0.67 141
142 200047 s at B Invers 0.43 20 7 0.70 142
143 200048 s at B Invers 0.45 21 5 0.69 143
144 200050 at B Invers 0.40 19 9 0.69 144
145 200052 s at B Invers 0.44 22 6 0.58 145
146 200054 at B Invers 0.45 22 5 0.56 146
147 200056 s at B Invers 0.45 22 5 0.60 147
148 200059 s at B Invers 0.44 22 6 0.71 148
149 200061 s at B Invers 0.44 22 6 0.90 149
150 200062 s at B Invers 0.45 22 5 0.88 150
151 200063 s at B Invers 0.43 20 7 0.85 151
152 200064 at B Invers 0.43 22 7 0.78 152
153 200065 s at B Invers 0.40 19 9 0.81 153
154 200066 at B Invers 0.41 20 9 0.68 154
155 200067 x at B Invers 0.44 21 6 0.72 155
156 200068 s at B Invers 0.47 22 3 0.80 156
157 200071 at B Invers 0.41 19 8 0.56 157
158 200072 s at B Invers 0.43 20 7 0.64 158
159 200074 s at B Invers 0.44 20 5 0.74 159
160 200075 s at B Invers 0.42 21 8 0.67 160
161 200077 s at B Invers 0.40 17 8 0.81 161
162 200078 s at B Invers 0.42 18 7 0.64 162
163 200079 s at B Invers 0.44 22 6 0.71 163
164 200082 s at B Invers 0.46 22 4 0.82 164
165 200084 at B Invers 0.46 22 4 0.68 165
166 200085 s at B Invers 0.41 18 8 0.72 166
167 200086 s at B Invers 0.42 19 7 0.80 167
168 200087 s at B Invers 0.45 22 5 0.74 168
169 200088 x at B Invers 0.40 20 10 0.89 169
170 200089 s at B Invers 0.40 17 9 0.87 170
171 200091 s at B Invers 0.42 21 8 0.84 171
172 200092 s at B Invers 0.42 21 8 0.89 172
173 200093 s at B Invers 0.46 23 4 0.72 173
174 200094 s at B Invers 0.40 17 9 0.85 174
175 200095 x at B Invers 0.42 21 8 0.90 175
176 200096 s at B Invers 0.46 21 4 0.67 176
177 200097 s at B Invers 0.45 22 5 0.77 177
178 200099 s at B Invers 0.45 23 5 0.85 178
179 208645 s at B Invers 0.40 23 12 0.86 179
180 208853 s at B Invers 0.40 23 12 0.64 180
181 AFFX-HSAC07/X00351 5 at B Invers 0.44 22 6 0.67 181
182 AFFX-HSAC07/X00351 M at B Invers 0.43 22 7 0.79 182
183 AFFX-HUMGAPDH/M33197 5 at B Invers 0.45 22 5 0.79 183
184 AFFX-HUMGAPDH/M33197 M at B Invers 0.42 19 7 0.79 184
185 AFFX-HUMISGF3A/M97935 3 at B Invers 0.44 20 5 0.63 185
186 202379 s at B normal 0.41 18 8 0.81 186
187 202473 x at B normal 0.42 20 8 0.56 187
188 202509 s at B normal 0.41 20 9 0.67 188
189 208797 s at B normal 0.40 20 10 0.60 189
190 210800 at B normal 0.45 21 5 0.73 190
191 211386 at B normal 0.41 20 9 0.63 191
192 211679 x at B normal 0.43 21 7 0.51 192
193 211996 s at B normal 0.40 21 10 0.90 193
194 212027 at B normal 0.41 20 9 0.77 194
195 213813 x at B normal 0.43 19 6 0.70 195
196 215979 s at B normal 0.43 19 6 0.47 196
197 216527 at B normal 0.43 22 7 0.58 197
198 217448 s at B normal 0.40 21 10 0.61 198
199 220456 at B normal 0.43 20 7 0.70 199
200 220905 at B normal 0.40 19 9 0.71 200
Table 3
Figure imgf000102_0001
208 201216 at C normal 0.32 18 21 0.79 208
209 201268 at C normal 0.37 20 14 0.86 209
210 201284 s at c normal 0.35 22 19 0.63 210
211 201453 x at c normal 0.33 15 16 0.85 211
212 201454 s at c normal 0.34 16 15 0.80 212
213 201558 at c normal 0.33 20 21 0.71 213
214 201767 s at c normal 0.32 16 18 0.72 214
215 201814 at c normal 0.33 15 15 0.76 215
216 201816 s at c normal 0.32 16 18 0.77 216
217 201903 at c normal 0.32 16 18 0.79 217
218 201913 s at c normal 0.35 17 15 0.72 218
219 202031 s at c normal 0.33 17 18 0.71 219
220 202279 at c normal 0.32 18 20 0.69 220
221 202443 x at c normal 0.32 14 16 0.85 221
222 202661 at c normal 0.32 20 23 0.57 222
223 202788 at c normal 0.33 18 18 0.68 223
224 202931 x at c normal 0.34 17 16 0.77 224
225 202961 s at c normal 0.33 20 21 0.85 225
226 203228 at c normal 0.34 19 18 0.63 226
227 203257 s at c normal 0.33 17 18 0.73 227
228 203271 s at c normal 0.34 23 22 0.54 228
229 203510 at c normal 0.32 17 19 0.89 229
230 203642 s at c normal 0.32 18 21 0.77 230
231 203744 at c normal 0.33 15 15 0.66 231
232 203786 s at c normal 0.33 19 20 0.71 232
233 203917 at c normal 0.32 14 16 0.75 233
234 203931 s at c normal 0.32 17 19 0.68 234
235 204235 s at c normal 0.32 14 16 0.76 235
236 204276 at c normal 0.35 18 15 0.69 236
237 204480 s at c normal 0.32 16 18 0.67 237
238 204643 s at c normal 0.33 15 16 0.64 238
239 204779 s at c normal 0.33 18 19 0.69 239
240 204839 at c normal 0.32 19 21 0.71 240
241 204976 s at c normal 0.35 18 16 0.67 241
242 205011 at c normal 0.35 18 15 0.67 242
243 205633 s at c normal 0.32 14 16 0.67 243
244 208248 x at c normal 0.33 18 18 0.88 244
245 208660 at c normal 0.32 18 20 0.80 245
246 208700 s at c normal 0.32 25 29 0.73 246
247 208704 x at c normal 0.32 18 20 0.88 247
248 208746 x at c normal 0.32 18 20 0.85 248
249 208928 at c normal 0.34 21 20 0.61 249
250 209001 s at c normal 0.33 14 15 0.84 250
251 209179 s at c normal 0.32 15 17 0.72 251
252 209420 s at c normal 0.35 17 15 0.68 252
253 209781 s at c normal 0.32 17 19 0.69 253
254 210125 s at c normal 0.33 16 17 0.81 254
255 210201 x at c normal 0.36 17 13 0.79 255
256 210312 s at c normal 0.33 17 18 0.71 256
257 210453 x at c normal 0.32 17 19 0.85 257
258 211458 s at c normal 0.32 14 16 0.84 258
259 212062 at c normal 0.33 16 17 0.75 259
260 212186 at c normal 0.34 20 18 0.64 260
261 212528 at c normal 0.32 21 24 0.60 261
262 212656 at c normal 0.33 19 20 0.71 262
263 213035 at c normal 0.32 20 22 0.65 263
264 213057 at C normal 0.32 16 18 0.62 264
265 213093 at C normal 0.35 17 15 0.74 265
266 213107 at C normal 0.32 18 21 0.62 266
267 213573 at c normal 0.33 18 19 0.65 267
268 213587 s at c normal 0.32 19 22 0.66 268
269 213801 x at c normal 0.34 20 18 0.92 269
270 214039 s at c normal 0.35 19 17 0.86 270
271 214332 s at c normal 0.32 19 22 0.55 271
272 214439 x at c normal 0.33 17 17 0.77 272
273 214665 s at c normal 0.35 17 15 0.85 273
274 215566 x at c normal 0.32 19 21 0.69 274
275 217773 s at c normal 0.33 18 19 0.87 275
276 217782 s at c normal 0.32 18 21 0.59 276
277 217852 s at c normal 0.33 17 17 0.80 277
278 217874 at c normal 0.33 19 20 0.80 278
279 218018 at c normal 0.34 20 19 0.68 279
280 218200 s at c normal 0.36 19 15 0.87 280
281 218201 at c normal 0.32 19 21 0.83 281
282 218251 at c normal 0.36 17 13 0.76 282
283 218283 at c normal 0.33 17 18 0.69 283
284 218412 s at c normal 0.33 20 20 0.70 284
285 218447 at c normal 0.32 22 25 0.74 285
286 218493 at c normal 0.34 19 18 0.68 286
287 218526 s at c normal 0.35 23 19 0.66 287
288 218548 x at c normal 0.32 18 21 0.63 288
289 218567 x at c normal 0.34 21 20 0.67 289
290 218642 s at c normal 0.35 17 14 0.68 290
291 218722 s at c normal 0.32 17 19 0.65 291
292 218824 at c normal 0.32 18 21 0.65 292
293 219045 at c normal 0.34 17 16 0.59 293
294 219709 x at c normal 0.32 22 25 0.59 294
295 220147 s at c normal 0.32 22 25 0.75 295
296 221732 at c normal 0.33 16 16 0.70 296
297 221934 s at c normal 0.32 15 17 0.69 297
298 222125 s at c normal 0.33 16 17 0.70 298
299 34764 at c normal 0.33 17 18 0.59 299
300 41047 at c normal 0.34 17 16 0.67 300
Table 4
Figure imgf000104_0001
317 177 at D Invers 0.48 10 1 0.35 317
318 1773 at D Invers 0.48 10 1 0.36 318
319 179 at D Invers 0.48 10 1 0.42 319
320 1861 at D Invers 0.48 10 1 0.28 320
321 200593 s at D Invers 0.48 10 1 0.53 321
322 200594 x at D Invers 0.48 10 1 0.52 322
323 200595 s at D Invers 0.48 10 1 0.45 323
324 200596 s at D Invers 0.48 10 1 0.51 324
325 200597 at D Invers 0.48 10 1 0.42 325
326 200598 s at D Invers 0.48 10 1 0.59 326
327 200599 s at D Invers 0.48 10 1 0.59 327
328 200600 at D Invers 0.48 10 1 0.46 328
329 200601 at D Invers 0.48 10 1 0.52 329
330 200602 at D Invers 0.48 10 1 0.59 330
331 200603 at D Invers 0.48 10 1 0.58 331
332 200604 s at D Invers 0.48 10 1 0.42 332
333 200605 s at D Invers 0.48 10 1 0.48 333
334 200606 at D Invers 0.48 10 1 0.18 334
335 200607 s at D Invers 0.48 10 1 0.50 335
336 200608 s at D Invers 0.48 10 1 0.51 336
337 200609 s at D Invers 0.48 10 1 0.52 337
338 200610 s at D Invers 0.48 10 1 0.51 338
339 200611 s at D Invers 0.48 10 1 0.48 339
340 200612 s at D Invers 0.48 10 1 0.37 340
341 200613 at D Invers 0.48 10 1 0.47 341
342 200614 at D Invers 0.48 10 1 0.54 342
343 200615 s at D Invers 0.48 10 1 0.41 343
344 200616 s at D Invers 0.48 10 1 0.59 344
345 200617 at D Invers 0.48 10 1 0.48 345
346 200618 at D Invers 0.48 10 1 0.54 346
347 200619 at D Invers 0.48 10 1 0.43 347
348 200620 at D Invers 0.48 10 1 0.56 348
349 200621 at D Invers 0.48 10 1 0.21 349
350 200622 x at D Invers 0.48 10 1 0.42 350
351 200623 s at D Invers 0.48 10 1 0.46 351
352 200624 s at D Invers 0.48 10 1 0.49 352
353 200625 s at D Invers 0.48 10 1 0.55 353
354 200626 s at D Invers 0.48 10 1 0.47 354
355 200627 at D Invers 0.48 10 1 0.53 355
356 200628 s at D Invers 0.48 10 1 0.26 356
357 200629 at D Invers 0.48 10 1 0.46 357
358 200630 x at D Invers 0.48 10 1 0.63 358
359 200631 s at D Invers 0.48 10 1 0.54 359
360 200632 s at D Invers 0.48 10 1 0.48 360
361 200633 at D Invers 0.48 10 1 0.66 361
362 200634 at D Invers 0.48 10 1 0.58 362
363 200635 s at D Invers 0.48 10 1 0.51 363
364 200636 s at D Invers 0.48 10 1 0.37 364
365 200637 s at D Invers 0.48 10 1 0.19 365
366 200638 s at D Invers 0.48 10 1 0.21 366
367 200639 s at D Invers 0.48 10 1 0.53 367
368 200640 at D Invers 0.48 10 1 0.55 368
369 200641 s at D Invers 0.48 10 1 0.19 369
370 200642 at D Invers 0.48 10 1 0.58 370
371 200643 at D Invers 0.48 10 1 0.39 371
372 200644 at D Invers 0.48 10 1 0.53 372
373 200645 at D Invers 0.48 10 1 0.57 373 374 200646 s at D Invers 0.48 10 1 0.40 374
375 200647 x at D Invers 0.48 10 1 0.50 375
376 200648 s at D Invers 0.48 10 1 0.21 376
377 200649 at D Invers 0.48 10 1 0.49 377
378 200650 s at D Invers 0.48 10 1 0.64 378
379 200651 at D Invers 0.48 10 1 0.65 379
380 200652 at D Invers 0.48 10 1 0.50 380
381 200653 s at D Invers 0.48 10 1 0.65 381
382 200654 at D Invers 0.48 10 1 0.43 382
383 200655 s at D Invers 0.48 10 1 0.58 383
384 200656 s at D Invers 0.48 10 1 0.44 384
385 200657 at D Invers 0.48 10 1 0.49 385
386 200658 s at D Invers 0.48 10 1 0.49 386
387 200659 s at D Invers 0.48 10 1 0.44 387
388 201982 s at D Invers 0.48 11 1 0.37 388
389 204038 s at D Invers 0.50 11 0 0.65 389
390 206559 x at D Invers 0.50 11 0 0.79 390
391 207421 at D Invers 0.50 11 0 0.31 391
392 208584 at D Invers 0.50 11 0 0.21 392
393 208834 x at D Invers 0.50 11 0 0.73 393
394 210245 at D Invers 0.50 11 0 0.15 394
395 211498 s at D Invers 0.48 11 1 0.15 395
396 215229 at D Invers 0.48 11 1 0.15 396
397 215420 at D Invers 0.48 11 1 0.75 397
398 216524 x at D Invers 0.50 11 0 0.79 398
399 217515 s at D Invers 0.50 11 0 0.13 399
The following explanation applies to tables 1 to 4.
"No." refers to the gene numbers as mentioned herein. "ProbeSetID" refers to the identification number on the Affymetrix gene chip HT HG-U133A. "State" refers to the respective renal cell cancer-specific states. The term "Mode" defines whether a gene has to be over- or under-expressed for state A, B, C or D. "Invers" indicates under-expression and "normal" indicates over-expression relative to the "limit value", which is the value used as a control to decide on over-expression or under-expression. The term "Fit" describes the reliability of the limit value, with a value of 0.5 indicating maximum reliability. The limit value is entered individually for each gene into the respective software used for expression analysis. "SEQ ID No." refers to the SEQ ID No. of the sequence listing.
Table 5
Figure imgf000107_0001
51 203085 s at A normal 0.39 258 148 0.60 450
52 218813 s at A normal 0.39 263 151 0.50 451
53 215807 s at A normal 0.39 256 147 0.51 452
54 221972 s at A normal 0.39 254 146 0.62 453
55 202216 x at A normal 0.39 264 152 0.58 454
56 208699 x at A normal 0.39 274 158 0.63 455
57 207622 s at A normal 0.39 279 161 0.54 456
58 211571 s at A normal 0.39 263 152 0.57 457
59 218045 x at A normal 0.39 273 158 0.53 458
60 217370 x at A normal 0.39 278 161 0.55 459
61 201028 s at A normal 0.39 271 157 0.67 460
62 208938 at A normal 0.39 271 157 0.55 461
63 216971 s at A normal 0.39 234 136 0.42 462
64 201082 s at A normal 0.39 270 157 0.54 463
65 201050 at A normal 0.39 259 151 0.65 464
66 207791 s at A normal 0.39 276 161 0.63 465
67 203239 s at A normal 0.39 263 154 0.51 466
68 217543 s at A normal 0.39 290 170 0.55 467
69 206113 s at A normal 0.39 254 149 0.52 468
70 209355 s at A normal 0.39 259 152 0.56 469
71 215646 s at A normal 0.39 260 153 0.58 470
72 203381 s at A normal 0.39 268 158 0.67 471
73 202736 s at A normal 0.39 274 162 0.60 472
74 212016 s at A normal 0.39 262 155 0.52 473
75 208677 s at A normal 0.39 249 148 0.66 474
76 211090 s at A normal 0.39 282 168 0.47 475
77 206989 s at A normal 0.38 246 147 0.65 476
78 200852 x at A normal 0.38 266 159 0.74 477
79 202024 at A normal 0.38 266 159 0.59 478
80 210978 s at A normal 0.38 234 140 0.67 479
81 208047 s at A normal 0.38 267 160 0.36 480
82 208686 s at A normal 0.38 287 172 0.61 481
83 219679 s at A normal 0.38 282 170 0.55 482
84 209162 s at A normal 0.38 265 160 0.53 483
85 210186 s at A normal 0.38 281 170 0.57 484
86 202593 s at A normal 0.38 276 167 0.58 485
87 201120 s at A normal 0.38 274 166 0.61 486
88 221473 x at A normal 0.38 249 151 0.70 487
89 203319 s at A normal 0.38 262 159 0.58 488
90 200866 s at A normal 0.38 265 161 0.75 489
91 209344 at A normal 0.38 253 154 0.63 490
92 201009 s at A normal 0.38 256 156 0.74 491
93 208503 s at A normal 0.38 241 147 0.56 492
94 219929 s at A normal 0.38 277 169 0.54 493
95 211966 at A normal 0.38 276 169 0.58 494
96 214895 s at A normal 0.38 270 166 0.49 495
97 217239 x at A normal 0.38 270 166 0.50 496
98 204147 s at A normal 0.38 268 165 0.51 497
99 217364 x at A normal 0.38 250 154 0.56 498
100 221695 s at A normal 0.38 245 151 0.48 499
Table 6
No. ProbeSetID Symbol Mode Fit Correct False LimitValue SEQ ID No.
101 201784 s at B Invers 0.26 15 28 0.66 500
102 208671 at B Invers 0.26 18 34 0.69 501
103 216274 s at B Invers 0.25 14 27 0.65 502
104 208811 s at B Invers 0.25 15 29 0.49 503
105 214167 s at B Invers 0.25 17 33 0.80 504
106 201485 s at B Invers 0.25 14 28 0.50 505
107 202164 s at B Invers 0.25 19 38 0.58 506
108 208852 s at B Invers 0.25 16 32 0.64 507
109 200973 s at B Invers 0.25 16 33 0.51 508
110 200668 s at B Invers 0.25 13 27 0.70 509
111 201435 s at B Invers 0.25 13 27 0.55 510
112 203992 s at B Invers 0.24 11 23 0.48 511
113 212749 s at B Invers 0.24 16 34 0.57 512
114 202654 x at B Invers 0.24 14 30 0.55 513
115 201376 s at B Invers 0.24 13 28 0.54 514
116 220477 s at B Invers 0.24 12 26 0.50 515
117 202770 s at B Invers 0.24 15 33 0.55 516
118 208517 x at B Invers 0.24 14 31 0.76 517
119 202593 s at B Invers 0.24 18 40 0.49 518
120 210338 s at B Invers 0.24 18 40 0.66 519
121 200605 s at B Invers 0.24 13 29 0.62 520
122 209861 s at B Invers 0.24 13 29 0.57 521
123 200626 s at B Invers 0.24 16 36 0.69 522
124 201259 s at B Invers 0.23 19 43 0.61 523
125 200059 s at B Invers 0.23 15 34 0.74 524
126 221531 at B Invers 0.23 19 44 0.55 525
127 211972 x at B Invers 0.23 15 35 0.88 526
128 217927 at B Invers 0.23 15 35 0.71 527
129 208848 at B Invers 0.23 16 38 0.53 528
130 200084 at B Invers 0.23 13 31 0.69 529
131 201239 s at B Invers 0.23 13 31 0.65 530
132 201529 s at B Invers 0.23 13 31 0.55 531
133 208739 x at B Invers 0.23 15 36 0.67 532
134 209526 s at B Invers 0.23 14 34 0.54 533
135 221428 s at B Invers 0.23 14 34 0.44 534
136 216387 x at B Invers 0.23 21 51 0.72 535
137 201444 s at B Invers 0.22 11 27 0.58 536
138 217466 x at B Invers 0.22 22 54 0.73 537
139 210621 s at B Invers 0.22 15 37 0.54 538
140 202793 at B Invers 0.22 14 35 0.40 539
141 221691 x at B Invers 0.22 18 45 0.73 540
142 201386 s at B Invers 0.22 20 50 0.67 541
143 208669 s at B Invers 0.22 16 40 0.65 542
144 209069 s at B Invers 0.22 17 43 0.82 543
145 214578 s at B Invers 0.22 15 38 0.55 544
146 208810 at B Invers 0.22 13 33 0.58 545
147 201742 x at B Invers 0.22 11 28 0.50 546
148 203100 s at B Invers 0.22 11 28 0.38 547
149 211098 x at B Invers 0.22 12 31 0.55 548
150 217915 s at B Invers 0.22 12 31 0.61 549
151 207791 s at B Invers 0.22 13 34 0.49 550
152 210968 s at B Invers 0.22 13 34 0.77 551
153 216190 x at B Invers 0.22 13 34 0.39 552
154 208800 at B Invers 0.22 19 50 0.55 553
155 200624 s at B Invers 0.22 11 29 0.62 554
156 202131 s at B Invers 0.22 11 29 0.45 555
157 208620 at B Invers 0.22 11 29 0.70 556
158 208737 at B Invers 0.22 11 29 0.67 557
159 213619 at B Invers 0.22 11 29 0.72 558
160 203870 at B Invers 0.21 15 40 0.50 559
161 204186 s at B Invers 0.21 12 32 0.51 560
162 218218 at B Invers 0.21 15 40 0.48 561
163 221039 s at B Invers 0.21 12 32 0.42 562
164 207974 s at B Invers 0.21 16 43 0.68 563
165 201165 s at B Invers 0.21 13 35 0.59 564
166 202228 s at B Invers 0.21 13 35 0.75 565
167 208310 s at B Invers 0.21 13 35 0.52 566
168 216583 x at B Invers 0.21 13 35 0.56 567
169 200927 s at B Invers 0.21 20 54 0.64 568
170 200651 at B Invers 0.21 14 38 0.83 569
171 211997 x at B Invers 0.21 18 49 0.80 570
172 202139 at B Invers 0.21 11 30 0.52 571
173 214545 s at B Invers 0.21 11 30 0.28 572
174 221688 s at B Invers 0.21 15 41 0.65 573
175 202899 s at B Invers 0.21 16 44 0.64 574
176 217356 s at B Invers 0.21 16 44 0.60 575
177 202527 s at B Invers 0.21 13 36 0.48 576
178 211864 s at B Invers 0.21 13 36 0.51 577
179 206174 s at B Invers 0.21 9 25 0.52 578
180 201010 s at B Invers 0.21 14 39 0.74 579
181 213047 x at B Invers 0.21 14 39 0.68 580
182 AFFX-HSAC07/X00351 5 at B Invers 0.21 10 28 0.59 581
183 200096 s at B Invers 0.21 15 42 0.75 582
184 208643 s at B Invers 0.21 15 42 0.67 583
185 216202 s at B Invers 0.21 11 31 0.44 584
186 217811 at B Invers 0.21 11 31 0.54 585
187 218172 s at B Invers 0.21 11 31 0.49 586
188 217635 s at B Invers 0.21 17 48 0.43 587
189 201698 s at B Invers 0.21 18 51 0.72 588
190 208656 s at B Invers 0.21 18 51 0.78 589
191 217379 at B Invers 0.21 18 51 0.73 590
192 202583 s at B Invers 0.21 12 34 0.49 591
193 208313 s at B Invers 0.21 12 34 0.69 592
194 212433 x at B Invers 0.21 12 34 0.88 593
195 211563 s at B Invers 0.21 13 37 0.48 594
196 208655 at B Invers 0.21 20 57 0.76 595
197 208867 s at B Invers 0.21 14 40 0.47 596
198 204427 s at B Invers 0.21 15 43 0.44 597
199 206075 s at B Invers 0.21 15 43 0.49 598
200 212595 s at B Invers 0.21 15 43 0.59 599
Table 7
Figure imgf000110_0001
211 AFFX-HSAC07/X00351 5 at C Invers 0.25 118 233 0.84 610
212 208255 s at c Invers 0.25 138 273 0.65 611
213 201050 at c Invers 0.25 139 276 0.68 612
214 208503 s at c Invers 0.25 132 263 0.59 613
215 206089 at c Invers 0.25 134 267 0.54 614
216 200922 at c Invers 0.25 135 270 0.63 615
217 210655 s at c Invers 0.25 132 264 0.57 616
218 212016 s at c Invers 0.25 125 250 0.56 617
219 212937 s at c Invers 0.25 136 273 0.47 618
220 203109 at c Invers 0.25 130 262 0.65 619
221 200920 s at c Invers 0.25 124 251 0.78 620
222 216971 s at c Invers 0.25 129 262 0.44 621
223 201156 s at c Invers 0.25 119 242 0.65 622
224 217211 at c Invers 0.25 117 238 0.59 623
225 216940 x at c Invers 0.25 96 196 0.54 624
226 AFFX-HSAC07/X00351 M at C Invers 0.25 113 231 0.91 625
227 205346 at c Invers 0.25 106 217 0.46 626
228 201149 s at c Invers 0.25 125 257 0.64 627
229 219710 at c Invers 0.25 125 257 0.48 628
230 203504 s at c Invers 0.25 123 253 0.61 629
231 211571 s at c Invers 0.25 117 241 0.61 630
232 211965 at c Invers 0.25 98 202 0.60 631
233 214995 s at c Invers 0.25 129 267 0.56 632
234 211672 s at c Invers 0.25 99 205 0.47 633
235 215399 s at c Invers 0.25 113 234 0.68 634
236 221926 s at c Invers 0.25 111 230 0.43 635
237 201009 s at c Invers 0.25 109 226 0.75 636
238 215646 s at c Invers 0.25 119 247 0.62 637
239 209051 s at c Invers 0.25 131 272 0.57 638
240 218700 s at c Invers 0.24 109 227 0.57 639
241 201353 s at c Invers 0.24 130 271 0.64 640
242 210926 at c Invers 0.24 129 269 0.73 641
243 206028 s at c Invers 0.24 117 244 0.68 642
244 217364 x at c Invers 0.24 137 286 0.59 643
245 211457 at c Invers 0.24 91 190 0.51 644
246 218748 s at c Invers 0.24 125 262 0.43 645
247 209162 s at c Invers 0.24 120 252 0.56 646
248 201369 s at c Invers 0.24 126 265 0.67 647
249 211598 x at c Invers 0.24 126 265 0.89 648
250 215807 s at c Invers 0.24 116 244 0.53 649
251 204456 s at c Invers 0.24 132 278 0.37 650
252 211900 x at c Invers 0.24 140 295 0.60 651
253 201617 x at c Invers 0.24 119 251 0.60 652
254 205409 at c Invers 0.24 118 249 0.49 653
255 208477 at c Invers 0.24 127 268 0.52 654
256 206759 at c Invers 0.24 124 262 0.37 655
257 203239 s at c Invers 0.24 114 241 0.52 656
258 201950 x at c Invers 0.24 113 239 0.71 657
259 212393 at c Invers 0.24 130 275 0.53 658
260 49049 at c Invers 0.24 121 256 0.53 659
261 214755 at c Invers 0.24 120 254 0.54 660
262 202947 s at c Invers 0.24 119 252 0.59 661
263 211899 s at c Invers 0.24 101 214 0.58 662
264 206989 s at c Invers 0.24 132 280 0.68 663
265 201801 s at C Invers 0.24 115 244 0.56 664
266 34031 i at C Invers 0.24 115 244 0.59 665
267 213887 s at C Invers 0.24 123 261 0.57 666
268 218813 s at c Invers 0.24 113 240 0.53 667
269 208750 s at c Invers 0.24 121 257 0.71 668
270 216100 s at c Invers 0.24 128 272 0.54 669
271 211203 s at c Invers 0.24 85 181 0.62 670
272 40850 at c Invers 0.24 130 277 0.69 671
273 216591 s at c Invers 0.24 122 260 0.59 672
274 208144 s at c Invers 0.24 136 290 0.51 673
275 203828 s at c Invers 0.24 98 209 0.56 674
276 204427 s at c Invers 0.24 98 209 0.61 675
277 220202 s at c Invers 0.24 128 273 0.58 676
278 201668 x at c Invers 0.24 134 286 0.62 677
279 216899 s at c Invers 0.24 119 254 0.49 678
280 211635 x at c Invers 0.24 117 250 0.51 679
281 212025 s at c Invers 0.24 95 203 0.58 680
282 212878 s at c Invers 0.24 108 231 0.60 681
283 204132 s at c Invers 0.24 122 261 0.58 682
284 209156 s at c Invers 0.24 129 276 0.78 683
285 210978 s at c Invers 0.24 136 291 0.70 684
286 213052 at c Invers 0.24 107 229 0.64 685
287 214845 s at c Invers 0.24 121 259 0.56 686
288 209261 s at c Invers 0.24 128 274 0.44 687
289 219725 at c Invers 0.24 97 208 0.57 688
290 221473 x at c Invers 0.24 122 262 0.73 689
291 213901 x at c Invers 0.24 106 228 0.51 690
292 202855 s at c Invers 0.24 111 239 0.47 691
293 205653 at c normal 0.25 127 253 0.62 692
294 213673 x at c normal 0.25 113 231 0.75 693
295 210264 at c normal 0.25 125 258 0.54 694
296 214197 s at c normal 0.25 94 195 0.72 695
297 205485 at c normal 0.24 116 245 0.55 696
298 201404 x at c normal 0.24 98 207 0.76 697
299 AFFX-M27830 M at C normal 0.24 103 218 0.79 698
300 212781 at c normal 0.24 106 226 0.68 699
Table 8
Figure imgf000112_0001
317 AFFX-HSAC07/X00351 5 at D Invers 0.25 106 204 0.83 716
318 214007 s at D Invers 0.25 107 206 0.52 717
319 210057 at D Invers 0.25 101 195 0.44 718
320 208699 x at D Invers 0.25 91 176 0.63 719
321 200922 at D Invers 0.25 99 193 0.56 720
322 217370 x at D Invers 0.25 102 199 0.56 721
323 207722 s at D Invers 0.25 99 194 0.52 722
324 200852 x at D Invers 0.25 84 165 0.72 723
325 211966 at D Invers 0.25 88 173 0.58 724
326 216591 s at D Invers 0.25 90 178 0.53 725
327 201151 s at D Invers 0.25 89 177 0.48 726
328 211136 s at D Invers 0.25 89 177 0.56 727
329 221695 s at D Invers 0.25 90 179 0.45 728
330 204456 s at D Invers 0.25 106 213 0.32 729
331 AFFX-HUMGAPDH/M33197 5 at D Invers 0.25 86 173 0.77 730
332 221972 s at D Invers 0.25 80 161 0.58 731
333 209208 at D Invers 0.25 98 198 0.52 732
334 215760 s at D Invers 0.25 92 186 0.45 733
335 209953 s at D Invers 0.25 89 180 0.57 734
336 217364 x at D Invers 0.25 88 178 0.55 735
337 209261 s at D Invers 0.25 82 166 0.29 736
338 204426 at D Invers 0.25 99 201 0.60 737
339 202354 s at D Invers 0.25 84 171 0.50 738
340 213048 s at D Invers 0.25 111 226 0.82 739
341 200745 s at D Invers 0.25 76 155 0.68 740
342 209113 s at D Invers 0.25 100 204 0.66 741
343 203381 s at D Invers 0.25 91 186 0.66 742
344 213872 at D Invers 0.25 81 166 0.59 743
345 217188 s at D Invers 0.25 94 194 0.50 744
346 212808 at D Invers 0.25 73 151 0.50 745
347 201711 x at D Invers 0.25 80 166 0.60 746
348 213998 s at D Invers 0.24 80 167 0.51 747
349 214880 x at D Invers 0.24 90 188 0.61 748
350 202205 at D Invers 0.24 76 159 0.55 749
351 202601 s at D Invers 0.24 94 197 0.53 750
352 200760 s at D Invers 0.24 71 149 0.70 751
353 209895 at D Invers 0.24 101 213 0.50 752
354 209051 s at D Invers 0.24 90 190 0.53 753
355 212025 s at D Invers 0.24 107 226 0.60 754
356 203109 at D Invers 0.24 98 207 0.62 755
357 200869 at D Invers 0.24 96 203 0.88 756
358 200008 s at D Invers 0.24 87 184 0.66 757
359 201617 x at D Invers 0.24 101 214 0.58 758
360 203319 s at D Invers 0.24 90 191 0.57 759
361 202226 s at D Invers 0.24 81 172 0.53 760
362 202736 s at D Invers 0.24 96 204 0.61 761
363 210461 s at D Invers 0.24 84 179 0.52 762
364 218813 s at D Invers 0.24 84 179 0.48 763
365 210978 s at D Invers 0.24 94 201 0.64 764
366 214251 s at D Invers 0.24 113 242 0.49 765
367 208645 s at D Invers 0.24 77 165 0.91 766
368 204857 at D Invers 0.24 90 193 0.45 767
369 202015 x at D Invers 0.24 83 178 0.63 768
370 209355 s at D Invers 0.24 96 206 0.56 769
371 211503 s at D Invers 0.24 95 204 0.68 770
372 212005 at D Invers 0.24 73 157 0.49 771
373 201149 s at D Invers 0.24 84 181 0.57 772
374 201392 s at D Invers 0.24 75 162 0.51 773
375 217635 s at D Invers 0.24 81 175 0.52 774
376 201411 s at D Invers 0.24 93 201 0.54 775
377 214943 s at D Invers 0.24 86 186 0.51 776
378 203803 at D Invers 0.24 96 208 0.53 777
379 202137 s at D Invers 0.24 84 182 0.53 778
380 204427 s at D Invers 0.24 107 232 0.65 779
381 AFFX-HSAC07/X00351 M at D Invers 0.24 98 213 0.90 780
382 208664 s at D Invers 0.24 69 150 0.53 781
383 206113 s at D Invers 0.24 90 196 0.50 782
384 200695 at D Invers 0.24 105 229 0.70 783
385 217360 x at D Invers 0.24 88 192 0.52 784
386 206668 s at D Invers 0.24 92 201 0.53 785
387 208677 s at D Invers 0.24 101 221 0.66 786
388 210317 s at D Invers 0.24 95 208 0.57 787
389 200796 s at D Invers 0.24 99 217 0.38 788
390 201946 s at D Invers 0.24 88 193 0.61 789
391 200637 s at D Invers 0.24 93 204 0.60 790
392 201008 s at D Invers 0.24 90 198 0.66 791
393 215780 s at D Invers 0.24 60 132 0.55 792
394 221423 s at D Invers 0.24 93 205 0.56 793
395 201461 s at D Invers 0.24 88 194 0.50 794
396 216815 at D Invers 0.24 97 214 0.56 795
397 209131 s at D Invers 0.24 105 232 0.42 796
398 214326 x at D Invers 0.24 95 210 0.57 797
399 217239 x at D Invers 0.24 85 188 0.49 798
The following explanation applies to tables 5 to 8.
"No." refers to the gene numbers as mentioned herein. "ProbeSetID" refers to the identification number on the Affymetrix gene chip HT HG-U133A. "State" refers to the respective breast cancer-specific states. The term "Mode" defines whether a gene has to be over- or under-expressed for state A, B, C or D. "Invers" indicates under-expression and "normal" indicates over-expression relative to the "limit value", which is the value used as a control to decide on over-expression or under-expression. The term "Fit" describes the reliability of the limit value, with a value of 0.5 indicating maximum reliability. The limit value is entered individually for each gene into the respective software used for expression analysis. "SEQ ID No." refers to the SEQ ID No. of the sequence listing.
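By way of illustration only, the comparison of a measured expression value against the limit value according to the "Mode" column may be sketched as follows. The sketch assumes that expression values have been normalised to the same scale as the limit values and that the comparison is strict; neither assumption is stated in the tables. The names Descriptor, matches_mode and count_matches are hypothetical, and the sample values are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    """One table row, reduced to the columns needed for the over/under call."""
    probe_set_id: str
    state: str          # "A", "B", "C" or "D"
    mode: str           # "Invers" (under-expressed) or "normal" (over-expressed)
    limit_value: float


def matches_mode(descriptor: Descriptor, expression: float) -> bool:
    """True if the value lies on the side of the limit value required by the mode.

    Assumption: strict inequality; the source does not say how ties are handled.
    """
    if descriptor.mode == "normal":
        return expression > descriptor.limit_value
    if descriptor.mode == "Invers":
        return expression < descriptor.limit_value
    raise ValueError(f"unknown mode: {descriptor.mode!r}")


def count_matches(descriptors, expressions) -> int:
    """Count descriptors whose measured value satisfies their mode.

    Probe sets without a measurement in `expressions` are skipped.
    """
    return sum(
        1
        for d in descriptors
        if d.probe_set_id in expressions
        and matches_mode(d, expressions[d.probe_set_id])
    )


# Rows 101 and 102 of Table 6; the sample values below are made up.
rows = [
    Descriptor("201784 s at", "B", "Invers", 0.66),
    Descriptor("208671 at", "B", "Invers", 0.69),
]
sample = {"201784 s at": 0.40, "208671 at": 0.75}
print(count_matches(rows, sample))
```

How many matching genes are required before a state is allocated is a separate choice and is not fixed by this sketch.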
Table 9
Figure imgf000114_0001
ADCY6 800
ANKRD36 801
AP2M1 802
BDH1 803
BRD4 804
C3orf18 805
CD93 806
CD99 807
COBLL1 808
CTSO 809
DALRD3 810
DDX6 811
DOCK9 812
EIF5B 813
EPAS1 814
F5 815
FLII 816
H3F3B 817
HEBP1 818
HMGB3 819
HMGCS2 820
IFITM3 821
LAPTM4B 822
LDB2 823
LPCAT3 824
MAPRE1 825
OPHN1 826
PGBD5 827
PRDX1 828
PRKAR1A 829
PSMB3 830
RASGRP1 831
RHOF 832
SERBP1 833
SERINC3 834
TCOF1 835
TK2 836
TLR3 837
TM2D3 838
TSG101 839
UBE2D4 840
UBN1 841
UFSP2 842
VWF 843
YME1L1 844
ZNF609 845
ACTB 846
ANGPTL4 847
GABBR2 848
GAPDH 849
GBAS 850
GOLGA8A 851
HCFC1 852
55 LDHA 853
56 NDUFA4 854
57 NKTR 855
58 RBM25 856
59 RGS5 857
60 UBB 858
Table 10
No. Gene
1 A2M
2 ANGPTL4
3 AP2M1
4 BDH1
5 CD99
6 COBLL1
7 DOCK9
8 EPAS1
9 F5
10 H3F3B
11 IFITM3
12 LAPTM4B
13 LDB2
14 LPCAT3
15 MAPRE1
16 NDUFA4
17 PGBD5
18 RGS5
19 SERBP1
20 SERINC3
21 TSG101
22 UFSP2
As regards renal cell cancer, some embodiments of the invention relate to:
1. Method of diagnosing, prognosing, stratifying and/or screening renal cell cancer in at least one human or animal patient, which is suspected of being afflicted by said disease, comprising at least the steps of:
a. Providing a sample of a human or animal individual being suspected to suffer from renal cell cancer;
b. Testing said sample for a signature indicative of a discrete renal cell cancer-specific state by determining expression of at least 1 gene of genes 1 to 100, 101 to 200, 201 to 300 and/or 301 to 399 of tables 1, 2, 3 and/or 4;
c. Allocating a discrete renal cell cancer-specific state to said sample based on the signature determined in step b.).
2. Method of determining the responsiveness of at least one human or animal individual, which is suspected of being afflicted by renal cell cancer, towards a pharmaceutically active agent comprising at least the steps of:
a. Providing a sample of a human or animal individual being suspected to suffer from renal cell cancer before the pharmaceutically active agent is administered;
b. Testing said sample for a signature indicative of a discrete renal cell cancer specific state by determining expression of at least 1 gene of genes 1 to 100, 101 to 200, 201 to 300 and/or 301 to 399 of tables 1, 2, 3 and/or 4;
c. Allocating a discrete renal cell cancer-specific state to said sample based on the signature determined in step b.);
d. Determining the effect of the pharmaceutically active agent on the disease symptoms in said individual and/or determining a discrete renal cell cancer-specific state according to steps a. to c. after the pharmaceutically active agent is administered;
e. Identifying a correlation between the effects on disease symptoms and/or discrete renal cell cancer-specific states and the initial discrete renal cell cancer-specific state of the sample.
3. Method of predicting the responsiveness of at least one patient which is suspected of being afflicted by renal cell cancer, towards a pharmaceutically active agent comprising at least the steps of:
a. Determining whether a correlation between effects on disease symptoms and/or discrete renal cell cancer-specific states and the initial discrete renal cell cancer-specific state as a consequence of administration of a pharmaceutically active agent exists by using the method of embodiment 2;
b. Testing a sample of a human or animal patient which is suspected of being afflicted by renal cell cancer for a signature indicative of a discrete renal cell cancer-specific state by determining expression of at least 1 gene of genes 1 to 100, 101 to 200, 201 to 300 and/or 301 to 399 of tables 1, 2, 3 and/or 4;
c. Allocating a discrete renal cell cancer-specific state to said sample based on the signature determined in step b.);
d. Comparing the discrete renal cell cancer-specific state of the sample in step c.) with the discrete renal cell cancer-specific state for which a correlation has been determined in step a.);
e. Predicting the effect of a pharmaceutically active compound on the disease symptoms in said patient.
4. A method of determining the effects of a potential pharmaceutically active compound for treatment of renal cell cancer, comprising at least the steps of:
a. Providing a sample of a human or animal individual being suspected to suffer from renal cell cancer before a pharmaceutically active agent is applied;
b. Testing said sample for a signature indicative of a discrete renal cell cancer specific state by determining expression of at least 1 gene of genes 1 to 100, 101 to 200, 201 to 300 and/or 301 to 399 of table 1, 2, 3 and/or 4;
c. Allocating a discrete renal cell cancer-specific state to said sample based on the signature determined in step b.);
d. Providing a sample of the same human or animal individual being suspected to suffer from renal cell cancer after a pharmaceutically active agent is applied;
e. Testing said sample for a signature indicative of a discrete renal cell cancer-specific state by determining expression of at least 1 gene of genes 1 to 100, 101 to 200, 201 to 300 and/or 301 to 399 of tables 1, 2, 3 and/or 4;
f. Allocating a discrete renal cell cancer specific state to said sample based on the signature determined in step e.);
g. Comparing the discrete renal cell cancer-specific states identified in steps c.) and f.).
5. A method of any of embodiments 1 to 4 wherein the signature is characterized by the expression pattern of at least 5 genes, preferably of at least 10 genes of genes 1 to 100 of table 1, with genes 1 to 100 being over-expressed.
6. A method of any of embodiments 1 to 4 wherein the signature is characterized by the expression pattern of at least 5 genes, preferably of at least 10 genes of genes 101 to 200 of table 2, with genes 101 to 185 being under-expressed and genes 186 to 200 being over-expressed.
7. A method of any of embodiments 1 to 4 wherein the signature is characterized by the expression pattern of at least 5 genes, preferably of at least 10 genes of genes 201 to 300 of table 3, with genes 201 to 300 being over-expressed.
8. A method of any of embodiments 1 to 4 wherein the signature is characterized by the expression pattern of at least 5 genes, preferably of at least 10 genes of genes 301 to 399 of table 4, with genes 301 to 399 being under-expressed.
9. A signature which is definable by the expression pattern of at least one, preferably of at least five, more preferably of at least 10 genes of genes 1 to 100, 101 to 200, 201 to 300 and/or 301 to 399 of tables 1, 2, 3 and/or 4 for use in diagnosing, prognosing, stratifying and/or screening renal cell cancer in human or animal individuals.
10. A signature which is definable by the expression pattern of at least one, preferably of at least five, more preferably of at least 10 genes of genes 1 to 100, 101 to 200, 201 to 300 and/or 301 to 399 of tables 1, 2, 3 and/or 4 for use as a read out of a target for development, identification and/or screening of at least one pharmaceutically active compound for treatment of renal cell cancer.
11. A signature for use according to embodiment 9 or 10, which is definable by the expression pattern of at least one, preferably of at least five, more preferably of at least 10 genes of genes 1 to 100 of table 1, with genes 1 to 100 being over-expressed.
12. A signature for use according to embodiment 9 or 10, which is definable by the expression pattern of at least one, preferably of at least five, more preferably of at least 10 genes of genes 101 to 200 of table 2, with genes 101 to 185 being under-expressed and genes 186 to 200 being over-expressed.
13. A signature for use according to embodiment 9 or 10, which is definable by the expression pattern of at least one, preferably of at least five, more preferably of at least 10 genes of genes 201 to 300 of table 3, with genes 201 to 300 being over-expressed.
14. A signature for use according to embodiment 9 or 10, which is definable by the expression pattern of at least one, preferably of at least five, more preferably of at least 10 genes of genes 301 to 399 of table 4, with genes 301 to 399 being under-expressed.
15. A discrete disease-specific state which is definable by a signature according to any of embodiments 9 to 14 for use in diagnosing, prognosing, stratifying and/or screening renal cell cancer in human or animal individuals.
16. A discrete disease-specific state which is definable by a signature according to any of embodiments 9 to 14 for use as a read out of a target for development, identification and/or screening of at least one pharmaceutically active compound for treatment of renal cell cancer.
As regards breast cancer, some embodiments of the invention relate to:
1. Method of diagnosing, prognosing, stratifying and/or screening breast cancer in at least one human or animal patient, which is suspected of being afflicted by said disease, comprising at least the steps of:
a. Providing a sample of a human or animal individual being suspected to suffer from breast cancer;
b. Testing said sample for a signature indicative of a discrete breast cancer-specific state by determining expression of at least 1 gene of genes 1 to 100, 101 to 200, 201 to 300 and/or 301 to 399 of tables 5, 6, 7 and/or 8;
c. Allocating a discrete breast cancer-specific state to said sample based on the signature determined in step b.).
2. Method of determining the responsiveness of at least one human or animal individual, which is suspected of being afflicted by breast cancer, towards a pharmaceutically active agent comprising at least the steps of:
a. Providing a sample of a human or animal individual being suspected to suffer from breast cancer before the pharmaceutically active agent is administered;
b. Testing said sample for a signature indicative of a discrete breast cancer specific state by determining expression of at least 1 gene of genes 1 to 100, 101 to 200, 201 to 300 and/or 301 to 399 of tables 5, 6, 7 and/or 8;
c. Allocating a discrete breast cancer-specific state to said sample based on the signature determined in step b.);
d. Determining the effect of the pharmaceutically active agent on the disease symptoms in said individual and/or determining a discrete breast cancer-specific state according to steps a. to c. after the pharmaceutically active agent is administered;
e. Identifying a correlation between the effects on disease symptoms and/or discrete breast cancer-specific states and the initial discrete breast cancer-specific state of the sample.
3. Method of predicting the responsiveness of at least one patient which is suspected of being afflicted by breast cancer, towards a pharmaceutically active agent comprising at least the steps of:
a. Determining whether a correlation between effects on disease symptoms and/or discrete breast cancer-specific states and the initial discrete breast cancer-specific state as a consequence of administration of a pharmaceutically active agent exists by using the method of embodiment 2;
b. Testing a sample of a human or animal patient which is suspected of being afflicted by breast cancer for a signature indicative of a discrete breast cancer-specific state by determining expression of at least 1 gene of genes 1 to 100, 101 to 200, 201 to 300 and/or 301 to 399 of tables 5, 6, 7 and/or 8;
c. Allocating a discrete breast cancer-specific state to said sample based on the signature determined in step b.);
d. Comparing the discrete breast cancer-specific state of the sample in step c.) vs. the discrete breast cancer-specific state for which a correlation has been determined in step a.);
e. Predicting the effect of a pharmaceutically active compound on the disease symptoms in said patient.
4. A method of determining the effects of a potential pharmaceutically active compound for treatment of breast cancer, comprising at least the steps of:
a. Providing a sample of a human or animal individual being suspected to suffer from breast cancer before a pharmaceutically active agent is applied;
b. Testing said sample for a signature indicative of a discrete breast cancer specific state by determining expression of at least 1 gene of genes 1 to 100, 101 to 200, 201 to 300 and/or 301 to 399 of table 5, 6, 7 and/or 8;
c. Allocating a discrete breast cancer-specific state to said sample based on the signature determined in step b.);
d. Providing a sample of the same human or animal individual being suspected to suffer from breast cancer after a pharmaceutically active agent is applied;
e. Testing said sample for a signature indicative of a discrete breast cancer-specific state by determining expression of at least 1 gene of genes 1 to 100, 101 to 200, 201 to 300 and/or 301 to 399 of tables 5, 6, 7 and/or 8;
f. Allocating a discrete breast cancer-specific state to said sample based on the signature determined in step e.);
g. Comparing the discrete breast cancer-specific states identified in steps c.) and f.).
5. A method of any of embodiments 1 to 4 wherein the signature is characterized by the expression pattern of at least 5 genes, preferably of at least 10 genes of genes 1 to 100 of table 5, with genes 1 to 100 being over-expressed.
6. A method of any of embodiments 1 to 4 wherein the signature is characterized by the expression pattern of at least 5 genes, preferably of at least 10 genes of genes 101 to 200 of table 6, with genes 101 to 200 being under-expressed.
7. A method of any of embodiments 1 to 4 wherein the signature is characterized by the expression pattern of at least 5 genes, preferably of at least 10 genes of genes 201 to 300 of table 7, with genes 201 to 292 being under-expressed and genes 293 to 300 being over-expressed.
8. A method of any of embodiments 1 to 4 wherein the signature is characterized by the expression pattern of at least 5 genes, preferably of at least 10 genes of genes 301 to 399 of table 8, with genes 301 to 399 being under-expressed.
9. A signature which is definable by the expression pattern of at least one, preferably of at least five, more preferably of at least 10 genes of genes 1 to 100, 101 to 200, 201 to 300 and/or 301 to 399 of tables 5, 6, 7 and/or 8 for use in diagnosing, prognosing, stratifying and/or screening breast cancer in human or animal individuals.
10. A signature which is definable by the expression pattern of at least one, preferably of at least five, more preferably of at least 10 genes of genes 1 to 100, 101 to 200, 201 to 300 and/or 301 to 399 of tables 5, 6, 7 and/or 8 for use as a read out of a target for development, identification and/or screening of at least one pharmaceutically active compound for treatment of breast cancer.
11. A signature for use according to embodiment 9 or 10, which is definable by the expression pattern of at least one, preferably of at least five, more preferably of at least 10 genes of genes 1 to 100 of table 5, with genes 1 to 6 being under-expressed and genes 7 to 100 being over-expressed.
12. A signature for use according to embodiment 9 or 10, which is definable by the expression pattern of at least one, preferably of at least five, more preferably of at least 10 genes of genes 101 to 200 of table 6, with genes 101 to 190 being under-expressed and genes 191 to 200 being over-expressed.
13. A signature for use according to embodiment 9 or 10, which is definable by the expression pattern of at least one, preferably of at least five, more preferably of at least 10 genes of genes 201 to 300 of table 7, with genes 201 to 274 being under-expressed and genes 275 to 300 being over-expressed.
14. A signature for use according to embodiment 9 or 10, which is definable by the expression pattern of at least one, preferably of at least five, more preferably of at least 10 genes of genes 301 to 399 of table 8, with genes 301 to 374 being under-expressed and genes 375 to 399 being over-expressed.
15. A discrete disease-specific state which is definable by a signature according to any of embodiments 9 to 14 for use in diagnosing, prognosing, stratifying and/or screening breast cancer in human or animal individuals.
16. A discrete disease-specific state which is definable by a signature according to any of embodiments 9 to 14 for use as a read out of a target for development, identification and/or screening of at least one pharmaceutically active compound for treatment of breast cancer.
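The directional signatures of embodiments 5 to 8 and 11 to 14 (a minimum number of genes from one table, each over- or under-expressed as stated) can be illustrated with a small check. The following Python sketch is only an illustration under assumptions: the gene identifiers are placeholders rather than entries of the tables, expression is assumed to be given as a log2 fold change versus a control sample, and the cut-off `min_abs_log2fc` for calling a gene over- or under-expressed is a hypothetical parameter that the embodiments do not specify.

```python
def matching_genes(log2fc, expected, min_abs_log2fc=1.0):
    """Return the genes whose measured direction agrees with the signature.

    log2fc:   gene -> log2 fold change versus control (assumed input format)
    expected: gene -> "over" or "under", as listed for one table
    min_abs_log2fc: hypothetical cut-off, not taken from the source
    """
    hits = []
    for gene, direction in expected.items():
        value = log2fc.get(gene)
        if value is None:
            continue  # gene not measured in this sample
        if direction == "over" and value >= min_abs_log2fc:
            hits.append(gene)
        elif direction == "under" and value <= -min_abs_log2fc:
            hits.append(gene)
    return hits


def has_signature(log2fc, expected, min_genes=5, min_abs_log2fc=1.0):
    """True if at least min_genes genes show the expected direction."""
    return len(matching_genes(log2fc, expected, min_abs_log2fc)) >= min_genes
```

With `min_genes=5` the check mirrors the "at least 5 genes" wording, and `min_genes=10` the preferred variant.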

Claims

1. Method of diagnosing, prognosing, classifying, stratifying and/or screening a disease in a human or animal patient, which is suspected of being afflicted by said disease, comprising at least the steps of:
a. determining the expression of a set of at least two predictive descriptors in a sample of said patient for a disease-specific state,
b. comparing the level of expression of said set of at least two predictive descriptors to the level of expression of said set of predictive descriptors in a control sample,
c. allocating a disease-specific state to said sample of said patient based on the expression of said set of predictive descriptors,
wherein said set of predictive descriptors is selected from a group of descriptors which is indicative of disease-specific states and which is identifiable by unsupervised two-way hierarchical clustering of gene expression data for samples of said disease from different patients.
2. Method according to claim 1,
wherein determining expression of a set of predictive descriptors in a sample of a patient is done by PCR analysis.
3. Method according to claim 2,
wherein determining expression of a set of predictive descriptors in a sample of a patient is done by qPCR analysis.
4. Method according to any of claims 2 or 3,
wherein said set of at least two predictive descriptors for PCR analysis is selected from a group of descriptors, which is indicative of disease-specific states and which is identifiable by unsupervised two-way hierarchical clustering of gene expression data for samples of said disease from different patients, by a process comprising at least the steps of:
a. selecting for each disease-specific state at least 10 descriptors with the most positive correlation with a disease-specific state,
b. selecting for each disease-specific state at least 10 descriptors with the most negative correlation with a disease-specific state,
c. selecting at least 10 descriptors with low variation across all disease-specific states,
d. selecting randomly at least 10 descriptors across all disease-specific states,
e. combining all descriptors of steps a. to d. into one first list,
f. defining a second list or subset of predictive descriptors, e.g. genes, which is empty at the beginning of the following procedure,
g. computing the accuracy of predicting the desired disease-specific state for all single descriptors, e.g. genes, of the set separately by employing a naive Bayes model,
h. selecting a number of descriptors, e.g. genes, with the best accuracy of step g) and adding them to the second list or subset of predictive descriptors, e.g. genes, and taking them out of the first list,
i. computing the accuracy of predicting the desired disease-specific state for the combination of all descriptors, e.g. genes, in the second list or subset in combination with each single remaining descriptor, e.g. gene, of the first list,
j. selecting a number of descriptors, e.g. genes, with the best accuracy of step i) and adding them to the second list or subset of predictive descriptors, e.g. genes, and taking them out of the first list,
k. repeating steps g) and j) until a desired accuracy is obtained, until said second list contains at least 2 descriptors for each disease-specific state, or until the prediction accuracy reaches a predefined threshold.
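The stepwise selection described in this claim (an initially empty second list, scoring of candidates with a naive Bayes model, and moving the best candidates from the first list to the second) resembles greedy forward selection. The following Python sketch illustrates that idea under several assumptions that are not taken from the source: a Gaussian naive Bayes model with equal priors, accuracy measured on the same samples used for fitting, one descriptor added per round, and the hypothetical stopping parameters `max_genes` and `target`. All function names are illustrative.

```python
import math
from statistics import mean, pstdev


def fit_gaussians(samples, labels, genes):
    """Per-state (mean, standard deviation) for each gene."""
    params = {}
    for state in set(labels):
        rows = [s for s, y in zip(samples, labels) if y == state]
        params[state] = {
            g: (mean(r[g] for r in rows),
                max(pstdev([r[g] for r in rows]), 1e-6))  # avoid sigma = 0
            for g in genes
        }
    return params


def log_density(x, mu, sigma):
    """Log of the Gaussian density at x."""
    return (-math.log(sigma * math.sqrt(2 * math.pi))
            - (x - mu) ** 2 / (2 * sigma ** 2))


def predict(sample, params, genes):
    """State with the highest summed log-density (equal priors assumed)."""
    scores = {
        state: sum(log_density(sample[g], *p[g]) for g in genes)
        for state, p in params.items()
    }
    return max(scores, key=scores.get)


def accuracy(samples, labels, genes):
    """Fraction of samples assigned to their known state (resubstitution)."""
    params = fit_gaussians(samples, labels, genes)
    hits = sum(predict(s, params, genes) == y for s, y in zip(samples, labels))
    return hits / len(samples)


def greedy_select(samples, labels, first_list, max_genes=6, target=1.0):
    """Move the best-scoring descriptor from the first to the second list
    until the target accuracy or the size limit is reached."""
    first = list(first_list)
    second = []  # the initially empty second list
    best = 0.0
    while first and len(second) < max_genes and best < target:
        scored = [(accuracy(samples, labels, second + [g]), g) for g in first]
        best, gene = max(scored, key=lambda t: t[0])
        second.append(gene)
        first.remove(gene)
    return second, best
```

Here `samples` is assumed to be a list of dictionaries mapping each descriptor to its measured value, and `labels` the known state of each sample. In practice a held-out or cross-validated accuracy would usually be preferred over resubstitution.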
5. Method according to any of claims 2, 3, or 4, wherein expression of said set of at least two descriptors per disease-specific state is determined by PCR analysis, optionally by qPCR analysis, and wherein, based on the PCR results, assignment of a disease-state for a sample of a patient is calculated according to:
[Equation for S_n: a maximum over all j of an expression in P_{n,subset n} and P_{j,subset j}; only partly legible in the source text]
with P_{n,subset n} taken over each gene k that belongs to subset n, and
[Figure imgf000127_0001: equation image not reproduced]
wherein x_k denote the measured cycle numbers for each of the descriptors k and parameters μ_{n,k} and σ_{n,k} denote predetermined parameters for each gene k and state n, and wherein S_n with the highest value determines the disease-specific state.
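The formulas of this claim are only partly legible in the source, so the following Python sketch should be read as an illustration of the general idea rather than as the claimed calculation. It assumes that each per-gene term is a Gaussian density in the measured cycle number with the predetermined mean and standard deviation for that gene and state, and that the per-state term is the product of these densities over the genes of that state's subset. How the per-state terms are combined into S_n cannot be recovered from the text, so the sketch simply uses the logarithm of the per-state product as the score and picks the state with the highest value. The names `assign_state`, `cycles`, `params` and `subsets` are hypothetical.

```python
import math


def log_gauss(x, mu, sigma):
    """Log of the Gaussian density with mean mu and standard deviation sigma."""
    return (-math.log(sigma * math.sqrt(2.0 * math.pi))
            - (x - mu) ** 2 / (2.0 * sigma ** 2))


def assign_state(cycles, params, subsets):
    """Score each state on its own gene subset and return the best one.

    cycles:  gene -> measured cycle number x_k
    params:  state -> gene -> (mu, sigma), the predetermined parameters
    subsets: state -> list of genes belonging to that state's subset
    """
    scores = {}
    for state, genes in subsets.items():
        # Sum of log-densities equals the log of the product over the subset.
        scores[state] = sum(
            log_gauss(cycles[g], *params[state][g]) for g in genes
        )
    best = max(scores, key=scores.get)
    return best, scores
```

Working in log space avoids numerical underflow when many small densities are multiplied.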
6. Method according to any of claims 1, 2, 3, 4, or 5,
wherein the disease-specific state is a disease-specific state for a hyper proliferative disease.
7. Method according to claim 6,
wherein said hyper proliferative disease is selected from the group comprising renal cell cancer, breast cancer, ovarian cancer, colorectal cancer, lung cancer, prostate cancer, brain cancer, hepatocellular carcinoma, acute myeloma, pheochromocytoma, skin cancer, lymphoma, Burkitt's lymphoma or myeloma.
8. Method according to claim 7,
wherein said hyper proliferative disease is renal cell cancer or breast cancer.
9. Method according to any of claims 1, 2, 3, 4, 5, 6, 7, or 8,
wherein expression of said descriptors is determined by PCR, optionally by qPCR for genes.
10. Method according to any of claims 1, 2, 3, 4, 5, 6, 7, 8, or 9,
wherein the expression of a set of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 predictive descriptors for a disease-specific state is determined.
11. Method according to any of claims 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10,
wherein the expression of a set of at least 6, 8, 10, 12, 14, 16, 18, or 20 predictive descriptors for a disease-specific state is determined.
12. Method according to any of claims 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11,
wherein said sample of said patient and said control sample are each an extracorporeal sample.
13. Method of diagnosing, prognosing, classifying, stratifying and/or screening renal cell cancer in a human or animal patient, which is suspected of being afflicted by said disease, comprising at least the steps of:
a. determining expression by PCR, optionally by qPCR, for the set of genes selected from Table 10,
b. assigning, based on the PCR results, one of three renal cell cancer states to a sample of the patient by calculating according to:
[Figure imgf000128_0001: equation image not reproduced; the surviving text indicates an expression over all j in P_{n,subset n} and P_{j,subset j}]
with
[Figure imgf000128_0002: equation image not reproduced; the surviving text relates P_{n,subset n} to P_{n,k} for each gene k that belongs to subset n]
and
[Figure imgf000128_0003: equation image not reproduced]
wherein x_k denote the measured cycle numbers for each of the descriptors k and parameters μ_{n,k} and σ_{n,k} denote predetermined parameters for each gene k and state n, and wherein S_n with the highest value determines the disease-specific state.
14. Method of claim 13, wherein the expression of a set of at least 6, 8, 10, 12, 14, 16, 18, or 20 predictive descriptors for a renal cancer-specific state is determined.
15. Method of any of claims 13 or 14, wherein the expression of a set of at least 6 predictive descriptors for each of the three renal cancer-specific states A, B, and is determined by using qPCR measurements for the genes mentioned in Table 12 and wherein the parameters μ_{n,k} and σ_{n,k} can be taken from Table 11.
16. Method of identifying sets of at least two predictive descriptors for a disease-specific state in a sample of a patient suffering from said disease which are suitable for diagnosing, prognosing, classifying, stratifying and/or screening a disease in a human or animal patient, comprising at least the steps of:
a. selecting said set of predictive descriptors from a group of descriptors which is indicative of disease-specific states and which is identifiable by unsupervised two-way hierarchical clustering of gene expression data for samples of said disease from different patients,
b. selecting for each disease-specific state at least 10 descriptors with the most positive correlation with a disease-specific state,
c. selecting for each disease-specific state at least 10 descriptors with the most negative correlation with a disease-specific state,
d. selecting at least 10 descriptors with low variation across all disease-specific states,
e. selecting randomly at least 10 descriptors across all disease-specific states,
f. combining all descriptors of steps b. to e. into one first list,
g. defining a second list or subset of predictive descriptors, e.g. genes, which is empty at the beginning of the following procedure,
h. computing the accuracy of predicting the desired disease-specific state for all single descriptors, e.g. genes, of the set separately by employing a naive Bayes model,
i. selecting a number of descriptors, e.g. genes, with the best accuracy of step h) and adding them to the second list or subset of predictive descriptors, e.g. genes, and taking them out of the first list,
j. computing the accuracy of predicting the desired disease-specific state for the combination of all descriptors, e.g. genes, in the second list or subset in combination with each single remaining descriptor, e.g. gene, of the first list,
k. selecting a number of descriptors, e.g. genes, with the best accuracy of step j) and adding them to the second list or subset of predictive descriptors, e.g. genes, and taking them out of the first list,
l. repeating steps i) and k) until a desired accuracy is obtained, until said second list contains at least 2 descriptors, e.g. genes, for each disease-specific state, or until the prediction accuracy reaches a predefined threshold.
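The construction of the first list in this claim (most positively correlated, most negatively correlated, low-variation and randomly chosen descriptors) might be sketched as follows. This Python example is an illustration under assumptions not stated in the source: correlation is taken as the Pearson correlation between a descriptor's expression values and a 0/1 indicator of state membership, "low variation" is taken as low variance across all samples, and the function name `build_first_list` and the parameters `n` and `seed` are hypothetical.

```python
import math
import random
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def build_first_list(expr, labels, n=10, seed=0):
    """Assemble the candidate list from four kinds of descriptors.

    expr:   gene -> list of expression values, one per sample
    labels: state label of each sample, in the same order
    """
    genes = list(expr)
    picked = []
    for state in sorted(set(labels)):
        indicator = [1.0 if y == state else 0.0 for y in labels]
        ranked = sorted(genes, key=lambda g: pearson(expr[g], indicator))
        picked += ranked[-n:]  # most positive correlation with this state
        picked += ranked[:n]   # most negative correlation with this state
    # Descriptors with the lowest variance across all samples.
    picked += sorted(genes, key=lambda g: pvariance(expr[g]))[:n]
    # Randomly chosen descriptors, reproducible through the seed.
    picked += random.Random(seed).sample(genes, min(n, len(genes)))
    # Remove duplicates while keeping the order of first appearance.
    return list(dict.fromkeys(picked))
```

The low-variation and random descriptors presumably serve as uninformative controls for the subsequent selection, although the claim does not say so explicitly.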
17. Method according to claim 16,
wherein said sets of predictive descriptors in a sample of a patient can be analyzed by PCR analysis.
18. Method according to claim 17,
wherein said sets of predictive descriptors in a sample of a patient can be analyzed by qPCR analysis.
19. Method according to any of claims 16, 17, or 18,
wherein the disease-specific state is a disease-specific state for a hyper proliferative disease.
20. Method according to claim 19,
wherein said hyper proliferative disease is selected from the group comprising renal cell cancer, breast cancer, ovarian cancer, colorectal cancer, lung cancer, prostate cancer, brain cancer, hepatocellular carcinoma, acute myeloma, pheochromocytoma, Burkitt's lymphoma or myeloma.
21. Method according to claim 20,
wherein said hyper proliferative disease is renal cell cancer or breast cancer.
22. Method according to any of claims 16, 17, 18, 19, 20, or 21,
wherein the expression of a set of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 predictive descriptors for a disease-specific state is identified.
23. Method according to any of claims 16, 17, 18, 19, 20, 21, or 22,
wherein the expression of a set of at least 6, 8, 10, 12, 14, 16, 18, or 20 predictive descriptors for a disease-specific state is determined.
24. Method according to any of claims 16, 17, 18, 19, 20, 21, 22 or 23,
wherein said sample of a patient is an extracorporeal sample.
25. A combination of predictive descriptors for diagnosing, prognosing, classifying, stratifying and/or screening renal cell cancer in a human or animal patient, which is suspected of being afflicted by said disease, being identifiable by a method in accordance with any of claims 14, 15, 16, 17, 18, 19, 20, or 21.
26. A combination of predictive descriptors for diagnosing, prognosing, classifying, stratifying and/or screening renal cell cancer in a human or animal patient, which is suspected of being afflicted by said disease, being selected from Table 10.
PCT/EP2012/072578 2011-11-15 2012-11-14 Discrete states for use as biomarkers for cancers such as renal cell cancer Ceased WO2013072346A2 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
EP11189178 2011-11-15
EP11189206.3 2011-11-15
EP11189206 2011-11-15
EP11189178.4 2011-11-15

Publications (2)

Publication Number Publication Date
WO2013072346A2 true WO2013072346A2 (en) 2013-05-23
WO2013072346A9 WO2013072346A9 (en) 2013-11-21

Family

ID=47148837

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2012/072578 Ceased WO2013072346A2 (en) 2011-11-15 2012-11-14 Discrete states for use as biomarkers for cancers such as renal cell cancer

Country Status (1)

Country Link
WO (1) WO2013072346A2 (en)

Also Published As

Publication number Publication date
WO2013072346A9 (en) 2013-11-21

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12784022

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12784022

Country of ref document: EP

Kind code of ref document: A2