Case Ref. P128WO by IPTector®
Method for selecting effective anti-cancer drugs.
Technical Field
[0001] Disclosed herein are computer-implemented methods for predicting an effective anti-cancer treatment regime, wherein a library of mRNA transcriptomes of mammal cancer cells is modelled with a library of mRNA transcriptomes of cancer cell lines and data on cytocidal and/or cytostatic effects of one or more anti-cancer agents against these cancer cell lines; mRNA transcriptomes of cancer cells from a patient to be treated are then compared with the model, thereby predicting an anti-cancer agent which is effective against the cancer cells from the patient.
Background
[0002] It is estimated that up to 75 % of anti-cancer regimes applied in a first-line treatment are not effective against the cancer cells the anti-cancer agent is supposed to eliminate. Cancer patients often repeatedly undergo lengthy periods of “trial and error” treatments to find a treatment regime which has effect against the particular cancer cells of the patient. However, valuable time and resources are lost on “trial and error” treatments, during which time the cancer may become terminal or untreatable. A major cause of this weakness in treating cancers is the lack of clinical tools for predicting and selecting the most effective treatment regime right from examination and diagnosis, and there is a great need for such clinical tools.
Summary
[0003] The present application provides solutions and/or improvements to certain drawbacks of this background art, including but not limited to tools for predicting effective cancer treatments, more specifically methods for linking mRNA transcriptomes of a patient's cancer cells to one or more effective anti-cancer agents by using modern data processing technology and machine learning to model mRNA transcriptomes of a library of mammal cancer cells with a library of cancer cell lines having a known sensitivity to an array of anti-cancer agents, and subjecting the mRNA transcriptomes of the patient to this model to predict one or more anti-cancer agents effective against the patient's cancer cells. Accordingly, in a first aspect a computer-implemented method is provided for predicting one or more effective anti-cancer treatment regimes comprising:
a) Providing a first data set comprising one or more mRNA transcriptomes of a first collection of cancer cells;
b) Providing a second data set comprising one or more mRNA transcriptomes of a first collection of
cancer cell lines;
c) Providing a third data set comprising cytocidal and/or cytostatic effects of one or more anti-cancer agents on the first collection of cancer cell lines;
d) Processing the first, second and third data sets in a computer model running one or more algorithms comparing the mRNA transcriptomes of the first collection of cancer cells with the mRNA transcriptomes of the first collection of cancer cell lines and linking the one or more anti-cancer agents to cytocidal and/or cytostatic effects against the first collection of human or animal cancer cells;
e) Providing a fourth data set comprising one or more mRNA transcriptomes of cancer cells from a subject to be treated; and
f) Processing the fourth data set by the computer model and thereby predicting one or more anti-cancer agents to have cytocidal and/or cytostatic effects against the cancer cells from the subject.
[0004] In a further aspect an algorithm running on a computer is provided, comprising (i) data from one or more mRNA transcriptomes of a first collection of cancer cells from cancer tissues; (ii) data from one or more mRNA transcriptomes of a first collection of cancer cell lines; and (iii) cytocidal and/or cytostatic effect data for one or more anti-cancer agents on the first collection of cancer cell lines; and when one or more mRNA transcriptomes of a patient's cancer cell are fed to the algorithm, it returns one or more anti-cancer agents predicted to have cytocidal and/or cytostatic effects against the patient's cancer cell.
[0005] In a still further aspect, a system is provided for predicting an effective anti-cancer treatment of cancer cells comprising a computer running one or more algorithms of the invention and/or the steps of the predictive method of the invention.
[0006] In a still further aspect, a method is provided for treating a cancer in a patient comprising
a) Extracting a tissue sample biopsy comprising cancer cells from the patient;
b) Generating an mRNA transcriptome of the cancer cells by sequencing mRNAs produced by the cancer cells;
c) Processing the mRNA transcriptome data in the system of the invention, and predicting one or more anti-cancer agents having cytocidal and/or cytostatic effects against the cancer cells; and
d) Administering the predicted one or more anti-cancer agents to the patient in a therapeutically effective amount.
Description of drawings and figures
[0007] The figures included herein are illustrative and simplified for clarity, and they merely show
details which are essential to the understanding of the invention, while other details may have been left out. Throughout the specification, claims and drawings the same reference numerals are used for identical or corresponding parts. In the figures and drawings included herein:
Figure 1 shows a GI50 normalized plot of NCI cancer cell lines indicating the sensitivity and/or insensitivity/resistance of each cell line to cisplatin.
Figure 2 shows the operational flow in building and using a predictive algorithm by collecting and compiling data on mRNA transcriptomes from cancer cell lines, from known tumors and from new cancer patients, and data on anti-cancer agent effects on cancer cell lines.
Figure 3 shows the effect data for 5-FU on cancer cell lines as retrieved from NCI.
Figure 4 shows a graphical representation of the GI50 values for Epirubicin, NSC 256942, downloaded through https://dtp.cancer.gov/dtpstandard/dwindex/index.jsp.
Figure 5 shows a Kaplan-Meier plot of TTP for 3 groups: lowest 20%, middle 60%, highest 20% (HR: 0.44, p-value < 0.01).
Incorporation by reference
[0008] All publications, patents, and patent applications referred to herein are incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference. In the event of a conflict between a term herein and a term in an incorporated reference, the term herein prevails and controls.
Detailed Description
The features and advantages of the present invention are readily apparent to a person skilled in the art from the detailed description of embodiments and examples of the invention below.
Definitions
[0009] The term “mRNA transcriptome” as used herein refers to the collection of different mRNAs in a cell reflecting expression of genes and gene products in the cell.
In cancer cells the mRNA transcriptome can contain expressed genes/gene products that can be used as biomarkers for the cancer. For reference see Yang, W. et al. (2012). Genomics of Drug Sensitivity in Cancer (GDSC): a
resource for therapeutic biomarker discovery in cancer cells. Nucleic acids research, 41(D1), D955-D961.
[0010] The term “library of cancer cells” as used herein refers to a library of cancer cells from tissue samples of patients diagnosed with one or more cancers, the samples typically being taken from cancer tumors by biopsies and/or histopathological sections. Tissues can be fresh or frozen or they can be preserved, typically in formalin or a similar conservation agent.
[0011] The term “library of cancer cell lines” as used herein refers to a collection of monocultures of cancer cells that can be propagated repeatedly and sometimes indefinitely. A cancer cell line typically arises from a primary cancer cell isolated directly from cancerous tissue or organs. One example of a library of cancer cell lines is the NCI-60 cancer cell line panel, which is a group of 60 human cancer cell lines used by the National Cancer Institute of the US (NCI) for the screening of compounds to detect potential anti-cancer activity against the cell lines (“Cell Lines in the In Vitro Screen”. Developmental Therapeutics Program. National Cancer Institute. 8 May 2015.). Sensitivity and resistance data for more than 50,000 anti-cancer agents and products on these 60 cell lines are also available; see also https://dtp.cancer.gov/discovery_development/approved_drugs.htm. Other examples of cancer cell line libraries or databases with mRNA transcriptome and drug sensitivity data are the Cancer Cell Line Encyclopedia (CCLE) with mRNA transcriptome data for >1000 cell lines and the Genomics of Drug Sensitivity in Cancer (GDSC) database with drug sensitivity data for >500 compounds. For reference see Barretina, J. et al. (2012). The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature, 483(7391), 603-607.
[0012] The term “mRNA sequencing” as used herein includes any means for identifying mRNA and correlating it to expression of a gene, including but not limited to Next-Generation Sequencing (NGS) based sequencing and mRNA array-based methods, exemplified by but not limited to Affymetrix HGU133+2.
[0013] The term “AUC” or “area under ROC” (receiver operating characteristic curve) as used herein refers to a (quality) measure of prediction model performance obtained by plotting the true positive rate against the false positive rate and taking the area under the resulting curve. Hazard ratio (“HR”) refers to the risk of an event happening (e.g. disease progression) in one group compared to another at any given time. A hazard ratio of 0.5 when comparing a group A to a group B would for example mean that the risk of the event is 50% lower in group A compared to group B at any given time.
[0014] In the first aspect the present invention provides a method for predicting an effective anti-cancer treatment regime comprising:
a) Providing a first data set comprising one or more mRNA transcriptomes of a first library of mammal cancer cells;
b) Providing a second data set comprising one or more mRNA transcriptomes of a first library of mammal cancer cell lines;
c) Providing a third data set comprising cytocidal and/or cytostatic effects of one or more anti-cancer agents on the first library of mammal cancer cell lines;
d) Processing the first, second and third data sets in a computer model running one or more algorithms comparing the mRNA transcriptomes of the first library of mammal cancer cells with the mRNA transcriptomes of the first library of mammal cancer cell lines and linking the one or more anti-cancer agents to cytocidal and/or cytostatic effects against the first library of mammal cancer cells;
e) Providing a fourth data set comprising one or more mRNA transcriptomes of cancer cells from a mammal subject to be treated; and
f) Processing the fourth data set by the computer model and thereby predicting one or more anti-cancer agents to have cytocidal and/or cytostatic effects against the cancer cells from the subject.
[0015] The data processing in step d) can also be called the “training” of the computer model, wherein the model learns the correlation between mRNA transcriptomes of historical cancer cells and cancer diagnoses and mRNA transcriptomes of “pure” known cancer cell lines having a sensitivity to known anti-cancer agents. In some embodiments of the method described herein, the mRNA transcriptomes of the cancer cells from real cancer tissues can be said to be used as a data filter on the cancer cell line transcriptome data, filtering off information in cell line transcriptomes which is not correlated to real-life cancer transcriptomes. In one embodiment, the method further comprises adding data to the computer model on the cytocidal and/or cytostatic effects of the one or more predicted anti-cancer agents against the cancer cells from the subject.
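Steps a) to f) and the filtering idea of paragraph [0015] can be illustrated with a minimal sketch. It is not the model of the invention: the variance-based gene filter, the correlation-weighted score, the function names (filter_genes, train, score, rank_agents) and all data values are assumptions chosen only to make the data flow concrete.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    covariance = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return covariance / (sx * sy)


def filter_genes(tumor_profiles, min_sd):
    """Steps a) and d): keep genes whose expression varies across real tumors.

    Hypothetical stand-in for using tumor transcriptomes as a data filter.
    """
    genes = list(tumor_profiles[0])
    return [g for g in genes
            if pstdev([p[g] for p in tumor_profiles]) >= min_sd]


def train(cell_line_profiles, sensitivity, genes):
    """Steps b) to d): store, per kept gene, its mean expression in the cell
    lines and its correlation with the measured drug sensitivity."""
    model = {}
    for g in genes:
        values = [p[g] for p in cell_line_profiles]
        model[g] = (mean(values), pearson(values, sensitivity))
    return model


def score(model, patient_profile):
    """Steps e) and f): a higher score means higher predicted sensitivity."""
    return sum(weight * (patient_profile[g] - center)
               for g, (center, weight) in model.items())


def rank_agents(models, patient_profile):
    """Order agents from highest to lowest predicted sensitivity."""
    return sorted(models,
                  key=lambda agent: score(models[agent], patient_profile),
                  reverse=True)


# Toy data, invented for illustration only (genes G1..G3, two agents).
tumors = [{"G1": 1.0, "G2": 5.0, "G3": 2.0},
          {"G1": 3.0, "G2": 5.0, "G3": 4.0},
          {"G1": 5.0, "G2": 5.1, "G3": 1.0}]
cell_lines = [{"G1": 1.0, "G2": 9.0, "G3": 3.0},
              {"G1": 2.0, "G2": 1.0, "G3": 3.5},
              {"G1": 3.0, "G2": 5.0, "G3": 2.0},
              {"G1": 4.0, "G2": 2.0, "G3": 2.5}]
# One value per cell line; here a higher value means a more sensitive cell line.
sensitivity = {"agent_A": [0.1, 0.4, 0.6, 0.9],
               "agent_B": [0.9, 0.6, 0.4, 0.1]}

kept = filter_genes(tumors, min_sd=0.5)   # G2 barely varies in tumors and is dropped
models = {agent: train(cell_lines, values, kept)
          for agent, values in sensitivity.items()}
patient = {"G1": 4.5, "G2": 3.0, "G3": 2.0}
ranking = rank_agents(models, patient)
```

In this toy setting the patient profile resembles the cell lines that respond to the hypothetical agent_A, so that agent is ranked first. A real embodiment would typically replace the correlation-weighted score with a trained and validated machine learning model.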
Typically, when one or more effective anti-cancer agents have been predicted using the modelling of the invention and administered to a patient in need of anti-cancer therapy, the data on the effectiveness of the treatment can advantageously be collected and fed back to the model, either as validation of the effectiveness of the predicted one or more anti-cancer agents or, if the treatment was not effective, to further train the computer model and improve its predictive capability. In a further embodiment the mRNA transcriptomes of the first library of cancer cells are retrieved from one or more histopathological biopsies or sections from one or more cancer patients.
[0016] Cancer biopsy samples are preserved in one of two main ways to maintain their molecular and structural integrity for subsequent analyses. The two primary methods are fresh frozen (FF) and formalin-fixed, paraffin-embedded (FFPE) techniques:
• Fresh frozen (FF): the tissue is rapidly frozen after collection in liquid nitrogen or on dry ice and
stored at ultra-low temperatures, typically -80°C, which minimizes RNA degradation and helps preserve the most accurate representation of gene expression at the moment of collection.
• Formalin-fixed paraffin-embedded (FFPE): a multi-step process where the tissue is first fixed in formalin, which creates cross-links between proteins. After fixation, the tissue undergoes dehydration with an alcohol solution, and finally it is embedded in paraffin wax. While FFPE techniques are excellent for long-term tissue storage, the chemical fixation process compromises RNA quality, causing fragmentation and molecular modifications that complicate transcriptome analysis.
[0017] Following tissue preservation, RNA is extracted from the samples to obtain the mRNA needed for transcriptome analysis. The preservation method affects the quality and characteristics of transcriptome data. A principal component analysis (PCA) of gene expression profiles from 3 FFPE and 3 FF tumor samples (see figure 8) showed a clear separation between the two groups, revealing systematic differences in the expression data due to the preservation method. FF samples tend to preserve a more complete gene expression profile, while FFPE samples often show altered expression patterns due to chemical modifications introduced during fixation. Accordingly, transcriptome data generated from FF tissues are superior to data generated from FFPE tissues, and in one embodiment the method described herein employs mRNA transcriptome data generated from FF tissues.
[0018] The one or more mRNA transcriptomes are preferably generated by sequencing the total mRNA of cancer cells in the one or more tissue biopsies. mRNA transcriptomes of the first library of cancer cells for training the model can be generated by sequencing mRNA from cancer cells. Histopathological collections of cancer tissue are publicly and commercially available and methods for sequencing mRNA therefrom are well known in the art.
More mRNA transcriptomes from more types of cancer cells improve the training of the model and its capability of making credible predictions of the best treatment of cancers based on patient mRNA transcriptomes. mRNA transcriptome data for training the model can also conveniently be sourced from publicly available databases such as Gene Expression Omnibus (GEO), available from the National Center for Biotechnology Information of the USA (NCBI) (https://www.ncbi.nlm.nih.gov/geo/). The Gene Expression Omnibus is a public database that archives and distributes high-throughput gene expression and other functional genomics data sets. GEO stores a wide range of data types, including transcriptome sequencing (RNA-seq) data for cancer cells and cell lines. Additionally or alternatively, mRNA transcriptome data can also be sourced from ArrayExpress, a public database of functional genomics experiments that stores a wide range of high-throughput data from a variety of organisms, such as humans, mice, rats, and plants, including
gene expression profiles generated from microarray and sequencing technologies (https://www.ebi.ac.uk/biostudies/arrayexpress).
[0019] The mRNA transcriptome data of the first or second library of cancer cell lines can be generated by sequencing the total mRNA of cancer cell lines. Cancer cell lines are publicly available either commercially or from public repositories and methods for sequencing mRNA are well known in the art. More mRNA transcriptomes from more types of cancer cell lines improve the training of the model and its capability of making credible predictions of the best treatment of cancers based on patient mRNA transcriptomes. mRNA transcriptome data for training the model can also conveniently be sourced from publicly available databases such as CellMiner, available from NCI (https://discover.nci.nih.gov/cellminer/loadDownload.do). CellMiner is a publicly available database that, amongst other things, provides information on gene expression levels including mRNA profiles of a great number of known cancer cell lines.
[0020] Data on cytocidal and/or cytostatic effects of anti-cancer agents on cancer cell lines can be generated by testing the cytocidal and/or cytostatic effects of selected anti-cancer agents on selected cancer cell lines. Cancer cell lines are publicly available either commercially or from public repositories and cytocidal and/or cytostatic assays are well known in the art. More cytocidal and/or cytostatic effect data for more anti-cancer agents on more cancer cell lines improve the training of the model and its capability of making credible predictions of the best treatment of cancers based on patient mRNA transcriptomes. Cytocidal and/or cytostatic effect data of anti-cancer agents on cancer cell lines can also conveniently be sourced from publicly available databases such as DrugQuery, available from NCI (https://discover.nci.nih.gov/cellminer/drugQuery.do).
DrugQuery is a publicly available database that, amongst other things, provides data on the sensitivity of a great number of cell lines to over 20,000 drugs, including FDA-approved drugs, investigational agents, and experimental compounds. This drug sensitivity data is represented as the half-maximal inhibitory concentration (IC50) of the drug in each cell line.
[0021] The predicted one or more anti-cancer agents can be one or more distinct compounds, such as one, two or three distinct compounds. For the data modelling and prediction to work appropriately, the anti-cancer agent should impact the proliferation of cancer cells and cancer cell lines. In some embodiments the anti-cancer agent is selected from the group consisting of topoisomerase inhibitors, antihormone agents, alkylating agents, mitotic inhibitors, antimetabolites, anti-tumor antibiotics, corticosteroids, targeted anti-cancer agents, metal-based anti-cancer agents and/or differentiating agents, alone or in any combination thereof.
[0022] In further embodiments the topoisomerase inhibitor can be a topoisomerase I or topoisomerase II inhibitor, where the topoisomerase I inhibitor can be irinotecan, its active metabolite
SN-38, or topotecan. Topoisomerase inhibitors are a class of chemotherapy drugs that work by interfering with enzymes called topoisomerases, which help separate the strands of DNA so they can be copied during the S phase. Topoisomerase inhibitor therapies are commonly used in the treatment of colorectal cancer, certain leukemias, as well as lung, breast, ovarian, gastrointestinal, and other cancers.
[0023] In further embodiments the antihormone agent is selected from the group consisting of Tamoxifen, aromatase inhibitors (e.g. letrozole, anastrozole), fulvestrant, Leuprolide, abiraterone, enzalutamide, goserelin, and megestrol acetate. Anti-hormonal therapies are commonly used in the treatment of hormone-dependent cancers, such as breast, prostate, and ovarian cancers.
[0024] In further embodiments the anti-cancer agents are agents that inhibit the cyclin-dependent kinases 4 and 6 (CDK4/6). In hormone receptor (oestrogen receptor) positive breast cancer, cyclin D overexpression is common, and blockage of CDK4/6 together with oestrogen receptor blockage is effective. CDK4/6 inhibitors include abemaciclib, palbociclib and ribociclib and analogues thereof.
[0025] In further embodiments the alkylating agent is selected from the group consisting of Cyclophosphamide, Carmustine, Lomustine, Chlorambucil, Busulfan, Melphalan, Thiotepa, and Ifosfamide. Alkylating agents are a class of chemotherapy drugs that work by adding alkyl groups to DNA, causing damage to the DNA structure and preventing cancer cells from dividing and growing. Alkylating agent therapies are commonly used in the treatment of a variety of cancers such as lymphomas, leukemias, breast cancer, ovarian cancer, brain tumors, multiple myeloma, Hodgkin’s disease, non-Hodgkin’s lymphoma, chronic lymphocytic leukaemia, chronic myeloid leukaemia, bladder cancer, and sarcomas.
[0026] In some embodiments the mitotic inhibitor is selected from the group consisting of taxanes such as Paclitaxel or Docetaxel, Vinblastine, Vincristine, Estramustine, Ixabepilone, Eribulin, and Cabazitaxel. Mitotic inhibitors are a class of chemotherapy drugs that work by stopping mitosis in the M phase of the cell cycle, but they can damage cells in all phases by keeping enzymes from making proteins needed for cell reproduction, thereby preventing the division and growth of rapidly dividing cancer cells. Mitotic inhibitors are often plant alkaloids and other compounds derived from natural products. Mitotic inhibitor therapies are commonly used in the treatment of a variety of cancers such as prostate cancer, breast cancer (such as metastatic breast cancer), ovarian cancer, lung cancer (such as non-small cell lung cancer), myelomas, Hodgkin’s lymphoma, testicular cancer, leukaemia, lymphoma, and neuroblastoma.
In some embodiments the anti-metabolite is selected from the group consisting of Methotrexate, Fluorouracil (5-FU), Capecitabine, Cytarabine, Gemcitabine, Azathioprine, Mercaptopurine, Cladribine, Floxuridine, Fludarabine, Hydroxyurea, and Pemetrexed. Anti-metabolites are a class of
chemotherapy drugs that work by interfering with DNA and RNA synthesis by substituting for the normal building blocks of RNA and DNA. These agents damage cells during the S phase, when the cell’s chromosomes are being copied, thereby interfering with the metabolism of cancer cells and preventing their growth and division. Anti-metabolite therapies are commonly used in the treatment of a variety of cancers, such as leukaemia, lymphoma, breast cancer, lung cancer, colorectal cancer, stomach cancer, and pancreatic cancer.
[0027] In some embodiments the anti-tumor antibiotic is selected from the group consisting of anthracyclines and non-anthracyclines. Anthracyclines include drugs such as Daunorubicin, Doxorubicin, Epirubicin, and Idarubicin, while non-anthracyclines include drugs such as Actinomycin-D, Bleomycin, Mitomycin-C, and Mitoxantrone (which also acts as a topoisomerase II inhibitor). Anti-tumor antibiotics are a class of chemotherapy drugs that work by inhibiting DNA and RNA synthesis in cancer cells. These drugs work in all phases of the cell cycle. Anthracyclines are also capable of inhibiting topoisomerase II. Anti-tumor antibiotic therapies are commonly used in the treatment of a variety of cancers such as breast cancer, bladder cancer, ovarian cancer, leukaemia, lymphoma, testicular cancer, Hodgkin’s lymphoma, pancreatic cancer, Wilms’ tumor, or rhabdomyosarcoma. In a particular embodiment the anti-tumor antibiotic is Epirubicin.
In some embodiments the corticosteroid is selected from the group consisting of Dexamethasone, Prednisone, Methylprednisolone, Hydrocortisone, and Betamethasone. Corticosteroids, often simply called steroids, are natural hormones and hormone-like drugs that are useful in the treatment of many types of cancer, as well as other illnesses. Corticosteroid drugs work as supportive therapy in cancer treatment to help manage symptoms such as inflammation, nausea, and pain.
When these drugs are used as part of cancer treatment, they are considered chemotherapy drugs. Corticosteroid therapies are commonly used in connection with the treatment of a variety of cancers such as lymphoma, leukaemia, or multiple myeloma.
[0028] In further embodiments the metal-based anti-cancer agent is selected from the group consisting of cisplatin, carboplatin, oxaliplatin, nedaplatin, satraplatin, ruthenium-based drugs, gold-based drugs, and titanocene dichloride. Metal-based anti-cancer agents are a class of chemotherapy drugs that contain metal ions as their active ingredient and work by interacting with the DNA or proteins in cancer cells, causing damage and inhibiting their ability to grow and divide. Metal-based anti-cancer agents are commonly used in the treatment of a variety of cancers such as prostate, colorectal, testicular, ovarian, bladder, lung, and head and neck cancers. Some metal-based anti-cancer agents are sometimes grouped with alkylating agents because they kill cells in a similar way.
[0029] In some embodiments the differentiating agent is selected from the group consisting of tretinoin, arsenic trioxide, histone deacetylase (HDAC) inhibitors, hydroxamic acid analogues, butyric
acid analogues, azacytidine, decitabine, and tamoxifen. Differentiating agents are a class of chemotherapy drugs that work by inducing cancer cells to mature or differentiate into more specialized cells, which can reduce the number of cancer cells and slow down tumor growth. Differentiating agent therapies are commonly used in the treatment of a variety of cancers such as acute promyelocytic leukaemia (APL), acute myeloid leukaemia (AML), myelodysplastic syndrome (MDS), or hormone receptor-positive breast cancer.
[0030] In further embodiments the anti-cancer agent can be a drug that inhibits the PARP repair enzymes and thereby helps kill cancer cells by inhibiting the repair of DNA damage. Such PARP inhibitors may be selected from olaparib, rucaparib, stenoparib and/or analogues thereof.
[0031] In a further embodiment the anti-cancer agent can be a drug such as ibrutinib or bortezomib (Velcade) and analogues thereof. Bortezomib and its analogues are proteasome inhibitors useful in the treatment of multiple myeloma and other cancers. Ibrutinib and its analogues bind the protein Bruton's tyrosine kinase (BTK), and blocking BTK inhibits the B-cell receptor pathway, which is often aberrantly active in B-cell cancers.
[0032] In further embodiments the anti-cancer agents are tyrosine kinase inhibitors that work by blocking cancer cell growth signal receptors. The growth factors concerned can be epidermal growth factor (EGF), which controls cell growth; vascular endothelial growth factor (VEGF), which controls blood vessel development; platelet-derived growth factor (PDGF), which controls blood vessel development and cell growth; and fibroblast growth factor (FGF), which controls cell growth. Such inhibitors comprise drugs like axitinib, dasatinib, erlotinib, imatinib, nilotinib, pazopanib, sunitinib, dovitinib and analogues thereof.
[0033] In further embodiments the anti-cancer agents are monoclonal antibody drugs directly inhibiting proliferation, such as EGFR monoclonal antibody drugs (cetuximab, panitumumab, nimotuzumab, and necitumumab) and monoclonal antibody drugs binding to the HER2 receptor, such as trastuzumab and pertuzumab, and analogues thereof.
[0034] In further embodiments the anti-cancer agents are Antibody Drug Conjugates (ADC), i.e. monoclonal antibody drug conjugates where the conjugated drug is an antiproliferative agent. The conjugated drug can be an antimitotic agent perturbing microtubule growth, exemplified by the auristatin derivatives monomethyl auristatin E (MMAE) and monomethyl auristatin F (MMAF), emtansine, vedotin, and analogues thereof; a topoisomerase I drug such as deruxtecan and govitecan and analogues thereof; or a DNA damaging agent such as calicheamicin γ1 and analogues thereof, including ozogamicin.
[0035] In one embodiment the predicted one or more anti-cancer agents comprise cisplatin. In another embodiment the predicted one or more anti-cancer agents comprise 5-FU.
[0036] In another embodiment the predicted one or more anti-cancer agents comprise Docetaxel or Paclitaxel.
[0037] In addition to the aforementioned cancer types, in some embodiments the first library of mammal cancer cells and/or mammal cancer cell lines comprises cells and/or cell lines from the following cancers: leukemia, non-small cell lung cancer, colon cancer, CNS cancer, melanoma, ovarian cancer, renal cancer, prostate cancer, and breast cancer. In further embodiments the first library of mammal cancer cells and/or mammal cancer cell lines can comprise more than one cell and/or cell line type for the same cancer type.
[0038] In still further embodiments the first library of mammal cancer cell lines comprises one or more of the cell lines in table 1. Accordingly, the mammal cancer cell lines can comprise at least one cell line from each of the cancer types in table 1 or can even comprise all of the cell lines of table 1, all of which are available from the National Cancer Institute, USA, including data on their sensitivity to a range of anti-cancer agents.
Table 1 mammal cancer cell lines
Cell line                                 Tumor/cancer type
CCRF-CEM                                  Leukemia
HL-60(TB)                                 Leukemia
K-562                                     Leukemia
MOLT-4                                    Leukemia
RPMI-8226                                 Leukemia
SR                                        Leukemia
A549/ATCC                                 Non-Small Cell Lung
EKVX                                      Non-Small Cell Lung
HOP-62                                    Non-Small Cell Lung
HOP-92                                    Non-Small Cell Lung
NCI-H226                                  Non-Small Cell Lung
NCI-H23                                   Non-Small Cell Lung
NCI-H322M                                 Non-Small Cell Lung
NCI-H460                                  Non-Small Cell Lung
NCI-H522                                  Non-Small Cell Lung
COLO 205                                  Colon
HCC-2998                                  Colon
HCT116                                    Colon
HCT-15                                    Colon
HT-29                                     Colon
KM12                                      Colon
SW-620                                    Colon
SF-268                                    CNS
SF-295                                    CNS
SF-539                                    CNS
SNB-19                                    CNS
SNB-75                                    CNS
U251                                      CNS
LOX IMVI                                  Melanoma
MALME-3M                                  Melanoma
M14                                       Melanoma
MDA-MB-435                                Melanoma
SK-MEL-2                                  Melanoma
SK-MEL-28                                 Melanoma
SK-MEL-5                                  Melanoma
UACC-257                                  Melanoma
UACC-62                                   Melanoma
IGR-OV1                                   Ovarian
OVCAR-3                                   Ovarian
OVCAR-4                                   Ovarian
OVCAR-5                                   Ovarian
OVCAR-8                                   Ovarian
NCI/ADR-RES (originally MCF-7/ADR-RES)    Ovarian
SK-OV-3                                   Ovarian
786-O                                     Renal
A498                                      Renal
ACHN                                      Renal
CAKI-1                                    Renal
RXF 393                                   Renal
SN12C                                     Renal
TK-10                                     Renal
UO-31                                     Renal
PC-3                                      Prostate
DU-145                                    Prostate
MCF7                                      Breast
MDA-MB-231/ATCC                           Breast
MDA-MB-468                                Breast
HS 578T                                   Breast
MDA-N                                     Melanoma
BT-549                                    Breast
T-47D                                     Breast
LXFL 529                                  Non-Small Cell Lung
DMS 114                                   Small Cell Lung
SHP-77                                    Small Cell Lung
DLD-1                                     Colon
KM20L2                                    Colon
SNB-78                                    CNS
XF 498                                    CNS
RPMI-7951                                 Melanoma
M19-MEL                                   Melanoma
RXF-631                                   Renal
SN12K1                                    Renal
P388                                      Leukemia
P388/ADR                                  Leukemia
[0039] In some embodiments the mRNA of the library of mRNA transcriptomes of the mammal cancer cells and/or the mammal cancer cell lines comprises mRNA from at least 25 expressed genes, such as at least 50 expressed genes, such as at least 75 expressed genes, such as at least 100 expressed genes, such as at least 125 expressed genes, such as at least 150 expressed genes, such as at least 175 expressed genes, such as at least 200 expressed genes, such as at least 250 expressed genes, such as at least 300 expressed genes, such as at least 400 expressed genes, such as at least 500 expressed genes, such as at least 1000 expressed genes in the cancer cell and/or the cancer cell line. The mRNA may also include long non-coding RNA sequences. In another embodiment, the mRNA transcriptomes comprise a sequence which is translated into the protein sequence CNDTDTVDAV.
[0040] Software and software platforms and systems, including algorithms for processing (training) data sets into a predictive model having appropriate algorithms to reflect and accommodate the training data, are generally known in the art and are either publicly or commercially available. Exemplary software and software platforms are those from JADBIO (GR and US). In some embodiments the processing of the data sets includes one or more steps selected from https://jadbio.com/company/.
[0041] Systems and methods for processing (training) data sets into predictive models suitably comprise one or more steps selected from (i) collecting the data sets as described, supra, (ii) data preprocessing and conditioning, (iii) selecting a model and/or algorithm type, (iv) training the model with the training data sets, (v) evaluating the model with the validation data sets, (vi) fine-tuning by readjusting its parameters or trying different models, (vii) testing the model performance using a new test data set, (viii) deploying the model, and (ix) monitoring and updating the model.

[0042] Preprocessing the training data sets is an important step in building the predictive model, as it can have a significant impact on the model's accuracy and performance. The steps in preprocessing training data sets can in some embodiments suitably include one or more steps selected from:
a) Data cleaning, which involves handling missing or erroneous data points. Missing or erroneous data points can either be removed or replaced with a reasonable value using techniques like mean, median or mode imputation, or constant removal;
b) Data transformation, transforming data into a more usable format for the model. This can include scaling, normalization, and one-hot encoding of categorical variables;
c) Feature selection, involving selecting the most relevant features from the dataset that are useful for the model. This step can reduce the number of features and improve the model's performance;
d) Feature engineering, involving creating new features from existing ones to improve the model's performance. This can include creating interaction features, polynomial features, or extracting features from text or images;
e) Splitting the dataset: the dataset is split into a training set and a validation set. The training set is used to train the model, while the validation set is used to evaluate the model's performance;
f) Feature conversion, in which features are converted to another format. This can for example be from probes to genes using R-packages such as hgu133plus2.db, AnnotationDbi and biomaRt (or web interfaces associated with these), using Gene Expression Omnibus (GEO) platform annotations (e.g. for the Affymetrix HG U133+2 platform), using custom chip definition files (CDFs) from e.g. the Brainarray project, or directly using annotations provided in publicly available datasets (e.g. gene expression data for Affymetrix HG U133+2 from the web-based tool Cellminer; https://discover.nci.nih.gov/cellminer/loadDownload.do). This can be done prior to model development or after generating the model. If it is done prior to model development, the features present in the final model will depend on which method was used. If it is done after model development, the conversion method only affects how the model is presented in e.g. publications and will not change the genes present in the model.
By following these one or more steps, the training data sets can be preprocessed and prepared for training the predictive model.

[0043] Preprocessing the data sets by mean imputation is a known technique used in data preprocessing to handle missing or NaN (Not a Number) values in a dataset.
When data is collected, some values may be missing for reasons such as equipment failure, human error, or data corruption. To use the data for analysis or machine learning, the missing values must be filled in with reasonable values. In mean imputation, the missing values of a feature/column are replaced with the mean of the available values of that feature. However, mean imputation may present some limitations, such as assuming that missing values are missing at random, and that the distribution of the available values is roughly normal. In cases where such assumptions are not met, mean imputation may be a less appropriate method for handling missing values and may be substituted or supplemented by other methods such as median imputation or interpolation. In median imputation missing values of a variable are replaced by the calculated
median value of the available data in the data set. In interpolation missing values are estimated from a range of known data points by using a mathematical formula or algorithm to calculate the value. The technique involves fitting a curve or function to the existing data points and then using that function to predict the value of any point within the range. There are several interpolation methods, including linear interpolation, polynomial interpolation, spline interpolation, and kriging.

[0044] Mode imputation is also a known technique used in data preprocessing to handle missing or NaN (Not a Number) values in a dataset. In mode imputation, the missing values of a feature/column are replaced with the mode of the available values of that feature. The mode is the value that appears most frequently among the available values. However, mode imputation may present some limitations, such as assuming that missing values are missing at random, and that the distribution of the available values is roughly skewed. In cases where these assumptions are not met, mode imputation may be less appropriate for handling missing values.

[0045] Constant removal is a technique used in data preprocessing to identify and remove features (columns) that have the same value for all the observations in a dataset, as such features do not provide any meaningful information to the model and can hence be removed to simplify the dataset. In constant removal, features that have the same value for all the observations in the dataset are first identified, for example by calculating the variance of each feature. If the variance of a feature is zero, all the observations have the same value for that feature, and it can be considered a constant feature.
Constant removal should be done carefully, and domain knowledge and context of the data should be considered prior to removing any features.

[0046] Preprocessing the data sets by data transformation can be useful to make the data more suitable for analysis and can suitably include one or more of the following transformations:
a) Scaling or normalization which transforms the data to a common scale, which can help improve the accuracy of a model. Common methods of scaling include z-score normalization and min-max normalization;
b) Logarithmic transformation for when the data is skewed or when the variance increases as the mean increases. The transformation reduces the effect of extreme values and makes the data more normally distributed;
c) Binning which is a method of discretizing continuous variables by grouping them into categories. This can be useful when there is a nonlinear relationship between the independent and dependent variables;
d) Dummy variables created by converting categorical variables into binary variables. This can be useful when the categorical variables are not ordinal and have no natural ordering, or
e) Outlier treatment involving identification and removal of outliers or transforming the data to make the outliers less influential. Outliers can have a significant impact on the accuracy of predictive models.

[0047] Feature extraction/selection is an important step in developing good predictive models, as it helps identify the most important variables or features that contribute to the accuracy of the model. Feature extraction involves transforming the raw data into features that can be used for modeling, which can involve transforming variables into categorical variables or aggregating variables into summary statistics. In some embodiments feature extraction/selection suitably includes one or more of the following elements:
a) Correlation analysis for identifying variables that are highly correlated with the outcome variable. Variables that are highly correlated with one another can be removed to avoid redundancy;
b) Univariate selection involving selecting variables based on their individual performance in predicting the outcome variable. This can be done using statistical tests, such as chi-squared test or t-test, or by setting a threshold for correlation coefficients;
c) Recursive Feature Elimination (RFE) which is an iterative process that selects the most important variables by recursively removing the least important variables. This process is typically repeated until a desired number of variables is reached;
d) Principal Component Analysis (PCA) which is a technique that transforms variables into a smaller set of uncorrelated variables that capture most of the variation in the original data. PCA can be useful for reducing the number of variables and removing multicollinearity;
e) Test budgeted Statistically Equivalent Signature (SES) algorithms;
f) Regularization methods such as Lasso or Ridge regression, adding penalties to regression coefficients to shrink the coefficients of the less important variables towards zero.
This can be useful for reducing the number of variables and avoiding overfitting;
g) SHAP values (SHapley Additive exPlanations) derived from ensemble learning methods such as Random Forest and XGBoost and deep learning approaches such as Graph Neural Networks (GNNs; e.g. DeepCDR), which can explain feature importance;
h) Domain knowledge which is used to identify variables that are likely to be important in predicting the outcome variable. This can be particularly useful when dealing with complex or non-linear relationships;
i) Testing the predictive effect of each feature on its own. The best-performing features can be used as they are (core genes), or a network analysis (e.g. STRING analysis or weighted correlation network analysis (WGCNA)) can be made to incorporate genes that are associated with the core genes, or clusters of relevant genes can be made and used as individual features;
j) Inclusion of genes known to be associated with anti-cancer compounds; and/or
k) Use of differentially expressed genes between non-responders and responders to treatment with anti-cancer drugs.

[0048] Feature engineering is the process of creating new features or transforming existing ones to improve the performance of a predictive model. Feature engineering suitably includes one or more of the following elements:
a) Feature scaling involving scaling features to a common scale, which can be useful for improving the performance of models that rely on distance measures or gradient descent algorithms;
b) Feature encoding involving encoding categorical variables into numerical variables that can be used for modeling. This can be done using one-hot encoding or label encoding; or
c) Feature combination involving combining multiple features to create new features that may be more informative for modeling. This can be done using arithmetic or geometric combinations.

[0049] Overall, feature engineering can greatly improve the accuracy and efficiency of the model. The specific elements of feature engineering used will depend on the nature of the data and the goals of the analysis.

[0050] Data splitting is an important step in developing good predictive models as it allows the model to be trained on one subset of the data and tested on another subset to evaluate its performance. Data splitting suitably includes one or more of the following elements:
a) Randomization involving randomly selecting samples from the data to create the training and testing sets. This ensures that the samples are representative of the entire population and avoids bias;
b) Stratification involving splitting the data into training and testing sets based on a specific criterion, such as class labels or continuous variables.
This ensures that both the training and testing sets have a similar distribution of variables;
c) Cross-validation involving splitting the data into multiple subsets, or folds, and using each fold as the testing set while the other folds are used as the training set. This allows for a more robust evaluation of the model's performance;
d) Time-based splitting involving splitting the data based on time, with earlier data used for training and later data used for testing. This is particularly useful for time-series data. Sampling techniques, such as oversampling or undersampling, can be used to balance the distribution of classes in the training and testing sets. This can be particularly useful for imbalanced datasets;
e) Validation sets used to fine-tune the model's hyperparameters and evaluate its performance before applying it to the testing set.

[0051] Overall, the choice of data splitting method will depend on the nature of the data and the goals of the analysis. The data splitting process should be designed to ensure that the model is trained and evaluated on representative samples of the data and that the results are reliable and generalizable.

[0052] There are various types of algorithms suitable for predictive modeling, each with their own strengths and weaknesses. A suitable algorithm or combination of algorithms can be selected among one or more of the following:
a) Linear Regression which is a simple and widely used algorithm that models the relationship between the input variables and the output variable as a linear function;
b) Logistic Regression which is used for classification problems and models the probability of an event occurring based on the input variables;
c) Decision Trees which are popular algorithms for both classification and regression problems. They create a tree-like model of decisions and their possible consequences;
d) Random Forest which is an ensemble learning algorithm that creates multiple decision trees and combines their predictions to improve accuracy and reduce overfitting;
e) Support Vector Machines (SVM) which is a popular algorithm for classification problems that finds the optimal boundary (hyperplane) between classes;
f) K-Nearest Neighbors (KNN) which is a simple and effective algorithm for classification and regression problems that predicts the outcome variable based on the nearest K neighbors in the training data;
g) Artificial Neural Networks (ANNs) which are a class of algorithms inspired by the structure and function of the human brain. They can be used for a wide range of classification and regression problems;
h) Gradient Boosting which is an ensemble learning algorithm that combines multiple weak learners (e.g., decision trees) to create a stronger model, and/or
i) Naïve Bayes which is a probabilistic algorithm that is commonly used for text classification problems.

[0053] Overall, the choice of algorithm will depend on the nature of the data and the specific problem being addressed. It is common to try multiple algorithms and compare their performance to select the best one for a particular task.

[0054] In some embodiments the data sets are preprocessed using mean imputation, mode imputation, constant removal and standardization, and Lasso feature selection is applied, optionally
using a gap penalty of 0,5, while the predicting algorithm is SVM of type C-SVC with polynomial kernel and hyper-parameters cost=1,0, gamma=0,1 and degree=4. In other embodiments the data sets are preprocessed using mean imputation, mode imputation, constant removal and standardization, and a feature selection using a Test budgeted Statistically Equivalent Signature (SES) algorithm with hyper-parameters maxK=2, alpha=0,1 and budget=3*nvars, while the predicting algorithm is Classification Random Forests with 100 trees, deviance splitting criterion, leaf size=2 and variables to split = 1,144 sqrt(nvars). In still other embodiments the data sets are preprocessed using mean imputation, mode imputation, constant removal and standardization, and Lasso feature selection is applied, optionally using a gap penalty of 0,25, while the predicting algorithm is Classification Random Forests with 100 trees, deviance splitting criterion, leaf size=2 and variables to split = 1,144 sqrt(nvars).

[0055] Training of the predicting model with the preprocessed data may involve a single training run or multiple training runs with different parameters and may suitably be run on an anti-cancer agent by anti-cancer agent basis.

[0056] In some embodiments the algorithm or computer model of the invention applies Buzzard's Conjecture.
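The preprocessing chain named in the embodiments of paragraph [0054] (mean imputation, mode imputation, constant removal and standardization) can be illustrated with a minimal Python sketch. It is not the implementation of the commercial platform referred to in this document. The feature names, the toy values, and the choice to apply mode imputation to categorical features and mean imputation to numeric features are assumptions made for illustration only.

```python
from statistics import mean, mode, pstdev


def impute(column, categorical=False):
    """Replace None entries with the mode (categorical) or mean (numeric)
    of the observed values. A column with no observed values is returned
    unchanged and is later dropped by constant removal."""
    observed = [v for v in column if v is not None]
    if not observed:
        return list(column)
    fill = mode(observed) if categorical else mean(observed)
    return [fill if v is None else v for v in column]


def is_constant(column):
    """A feature is constant when every observation has the same value."""
    return len(set(column)) <= 1


def standardize(column):
    """Z-score standardization: subtract the mean, divide by the
    population standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    return [(v - mu) / sigma for v in column]


def preprocess(table, categorical=()):
    """table maps a feature name to its list of values (None = missing).
    Returns a new table after imputation, constant removal and
    standardization of the numeric features."""
    result = {}
    for name, column in table.items():
        is_cat = name in categorical
        filled = impute(column, categorical=is_cat)
        if is_constant(filled):
            continue  # constant removal: the feature carries no information
        result[name] = filled if is_cat else standardize(filled)
    return result


# Hypothetical toy data: two expression features and one categorical feature.
example = {
    "GENE_A": [2.0, None, 4.0, 6.0],
    "GENE_B": [1.0, 1.0, 1.0, 1.0],
    "TISSUE": ["breast", None, "breast", "colon"],
}
processed = preprocess(example, categorical={"TISSUE"})
```

In this toy input the missing GENE_A value is filled with the mean of the observed values, GENE_B is dropped as a constant feature, and the missing TISSUE value is filled with the most frequent category. Feature selection and model training would follow as separate steps.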
[0057] In the second aspect the invention provides an algorithm running on a computer compiling (i) data from one or more mRNA transcriptomes of a first library of cancer cells; (ii) data from one or more mRNA transcriptomes of a first library of cancer cell lines; and (iii) cytocidal and/or cytostatic effect data for one or more anti-cancer agents on the first library of cancer cell lines, wherein said algorithm, upon being subjected to data from one or more mRNA transcriptomes of a patient's cancer cells, returns one or more anti-cancer agents predicted to have cytocidal and/or cytostatic effects against the patient's cancer cells.

[0058] In the third aspect the invention provides a system for predicting an effective anti-cancer treatment of cancer cells, said system comprising a computer running one or more algorithms of the invention and the steps of the method of the invention.

[0059] In the fourth aspect the invention provides a method for treating a cancer in a patient comprising
a) extracting a tissue sample comprising cancer cells from the patient;
b) generating an mRNA transcriptome of the cancer cells by sequencing mRNAs produced by the cancer cells;
c) processing the mRNA transcriptome data in the system of the invention, and thereby predicting one or more anti-cancer agents having cytocidal and/or cytostatic effects against the cancer cells; and
d) administering a medicament comprising the predicted one or more anti-cancer agents to the patient in a therapeutically effective amount.

[0060] The said method can in some embodiments comprise administering at least two anti-cancer agents, wherein the anti-cancer agents are administered sequentially from the predicted most effective to the predicted least effective. In other embodiments the said method comprises administering at least two anti-cancer agents, wherein the anti-cancer agents are administered simultaneously.

[0061] In further aspects the invention also provides compositions comprising a combination of anti-cancer agents predicted by the model described herein for treating a cancer in a patient.

Further embodiments

[0062] Anthracyclines are highly effective drugs, but because they are toxic and feared for their cardiotoxicity, they are an optional adjuvant treatment of oestrogen receptor (ER) positive, HER2 negative early high-risk breast cancer. Therefore, within the scope of this invention a biomarker to predict if a patient will benefit from anthracycline treatment could enhance cure rate and reduce overall toxicity. Therefore, particularly described herein is a clinically validated, predictive computer implemented model/algorithm/test based on clinical tumor transcriptome data and the cell line panel NCI60, which can forecast Epirubicin response in breast cancer patients with high accuracy (see also figure 6).

[0063] In the work with Epirubicin, microarray gene expression and growth inhibition levels (GI50) from the NCI60 cell line panel were used to build a prediction algorithm. A tumor-specific variance-based filter with more than 6.000 patient tumors was used to select the most relevant genes. AUC was used to evaluate the performance of numerous configurations and combinations of parameters leading to different gene-signature sets.
Response scores based on the gene expression level of all genes in each signature were generated for each sample in a published cohort. A 141 gene signature performed best (Progressive Disease vs. Response AUC 0,81 and ANOVA test p-value < 0,01) in GSE41998. The model was validated on a cohort of 153 Danish metastatic breast cancer patients treated with Epirubicin, with time-to-progression (TTP) as endpoint.

[0064] The hazard ratio ('HR') for the best-performing signature was 0,31 when comparing patients with a difference of at least 50 in response score (two-sided p-value = 0,012, 95% confidence interval 0,12-0,77, endpoint TTP, continuous scoring). This was consistent in a multivariate model including ER and HER2. Patients were divided into groups of the 50% lowest vs. the 50% highest scoring patients (not significant) and the 20% lowest vs. middle vs. the 20% highest scoring patients (highly significant; Fig. 1). To test if the gene signature was correlated between microarray- and RNA sequencing-based datasets, an established test method was used (Pedersen, C. B., Nielsen, F. C., Rossing, M., & Olsen, L. R. (2018). Using
microarray-based subtyping methods for breast cancer in the era of high-throughput RNA sequencing. Molecular Oncology, 12(12), 2136-2146.). Using this test method a high correlation between matching microarray and RNA-sequencing datasets was observed for the genes in our model (R² = 0,87).

[0065] Accordingly, using Epirubicin as one example, provided herein is a clinically validated, predictive test based on clinical tumor transcriptome data and the cell line panel NCI60 that can forecast the response to an anti-cancer agent, for example Epirubicin, in cancer patients (for example breast cancer patients) with high accuracy.

Working Examples

Example 1

[0066] This specific example describes a general methodology, and the anti-cancer agent used herein can therefore be substituted with any other anti-cancer agent. Cisplatin, NSC 119875, is selected as the anti-cancer agent for which sensitivity and resistance in cancer patients are to be modelled and predicted.

[0067] A first prediction mRNA transcriptome algorithm is built by correlating the known sensitivity or imperceptibility/resistance to cisplatin of the 60 commercially available cancer cell lines in table 1 with the gene expression in these cell lines, measured by recording and analyzing the mRNA transcriptome of the cancer cell lines (see figure 1). This algorithm establishes a first approximation of biomarkers/gene products of sensitive cancers and biomarkers/gene products of resisting/imperceptible cancers. The relevance of each biomarker/gene product can if necessary be verified by pathway analysis using published databases.

[0068] In this algorithm multiple biomarkers/gene products were found pronounced for cell lines that were sensitive to cisplatin, and multiple biomarkers/gene products were found pronounced for cell lines that were resistant or imperceptible to cisplatin.
[0069] In a second step, the expression of biomarkers/gene products in a plurality of tumor/cancer types is evaluated by recording and analyzing the mRNA transcriptomes from cells of corresponding known tumor biopsies. For cells of each tumor biopsy the mRNA transcriptome data are compiled by the algorithm comparing the mRNA transcriptomes of tumors to the transcriptomes of the cancer cell lines and to the biomarkers/gene products of cell lines having sensitivity or imperceptibility/resistance to cisplatin, thereby correlating each tumor/cancer type with sensitivity or imperceptibility/resistance to cisplatin.

[0070] In a next step the expression of biomarkers/gene products in cells from a tumor biopsy taken from a cancer patient is obtained by recording and analyzing the mRNA transcriptome, and these data are compiled by the prediction algorithm comparing the patient mRNA transcriptome with the
biomarkers/gene products found for cell lines that are sensitive to cisplatin and with the biomarkers/gene products found for cell lines that are resistant or imperceptible to cisplatin. Where the biomarkers/gene products of the cancer patient tumor cells show a predominance of biomarkers/gene products found in cell lines that are sensitive to cisplatin, cisplatin is predicted to be an effective anti-cancer agent for treating the cancer, while where the biomarkers/gene products of the cancer patient tumor cells show a predominance of biomarkers/gene products found in cell lines that are imperceptible/resistant to cisplatin, cisplatin is predicted to be an ineffective anti-cancer agent for treating the cancer.

[0071] Finally the cancer patient is treated with cisplatin, and whether the cancer is sensitive or imperceptible/resistant to the treatment is monitored. These data are fed to the algorithm to verify or to dismiss the initial prediction.

Example 2 – Cisplatin

[0072] Cisplatin, NSC 119875, was selected as the anti-cancer agent for which sensitivity and resistance in cancer patients are to be modelled and predicted.

[0073] mRNA transcriptomes for cancer cells in the form of RNA sequencing datasets from solid tumors were downloaded from Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/) and ArrayExpress (https://www.ebi.ac.uk/biostudies/arrayexpress) and/or similar repositories with patients having various cancer types.

[0074] mRNA transcriptomes from cancer cell lines were downloaded from NCI60, from the Cellminer platform (https://discover.nci.nih.gov/cellminer/) as available from Oct 26th 2021.

[0075] Effect data for cisplatin against the cancer cell lines were sourced from NCI, DrugQuery (https://discover.nci.nih.gov/cellminer/drugQuery.do), presented as log(GI50) values for each cell line for cisplatin.
The GI50 (growth inhibition) values were further evaluated and confirmed by the sulforhodamine B assay described in Shoemaker RH, The NCI60 human tumour cell line anticancer drug screen, Nat. Rev. Cancer. 2006 Oct; 6(10):813–23 (pmid:16990858); see figure 1.

[0076] The mRNA data were processed and used to train a predictive model using a commercial platform (https://jadbio.com/), including preprocessing the data by Constant Removal and Standardization; feature selection using LASSO Feature Selection with penalty=1.0; and a predictive algorithm built using Support Vector Machine type C-SVC with Polynomial Kernel and hyper-parameters: cost = 0,1, gamma = 1,0, degree = 2. In this model multiple biomarkers/gene products were found pronounced for cell lines that were sensitive to cisplatin.

[0077] In-silico prediction showed a conservative estimation of AUC to be 0,73.
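For orientation, the AUC reported above is the area under the receiver operating characteristic curve, which equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative sample. The Python sketch below shows only this generic definition. How the platform derives its conservative estimate, and how the log(GI50) values are turned into class labels, is not stated in this document, so the labels (1 = sensitive, 0 = resistant) and the scores are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive sample has the
    higher score. Tied pairs count as half a win."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores for five cell lines and their assumed labels.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 1, 0, 1, 0]
example_auc = auc(scores, labels)  # 5 of 6 pairs are ranked correctly
```

An AUC of 0,5 corresponds to random ranking and an AUC of 1,0 to perfect separation of the two classes.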
Conclusion

The in-silico predictive evaluation showed an AUC of 0,73 (or higher) for the model(s) described above, evidencing a high predictive value.

Example 3 – 5-FU

[0078] 5-FU, NSC 19893, was selected as the anti-cancer agent for which sensitivity and resistance in cancer patients are to be modelled and predicted.

[0079] mRNA transcriptome data for cancer cells and cancer cell lines, and effect data for 5-FU, were sourced and generated as described in example 2. For cancer cell line effect data see figure 3.

[0080] The mRNA data were processed and used to train a predictive model using a commercial platform (https://jadbio.com/), including preprocessing the data by Mean Imputation, Mode Imputation, Constant Removal, and Standardization; feature selection using LASSO Feature Selection with penalty=0,5; and a predictive algorithm built using Classification Random Forests training of 500 trees with Deviance splitting criterion, minimum leaf size = 3, and variables to split = 1,154 sqrt(nvars). In this model multiple biomarkers/gene products were found pronounced for cell lines that were sensitive to 5-FU.

[0081] In-silico prediction of the model showed an AUC of 0,75, and the prediction was evaluated blindly in a clinical dataset with 73 colorectal cancer patients who underwent surgery and were thereafter treated with 5-FU. The model distinguished non-sensitive (resistant) cancer patients from sensitive cancer patients, with progression-free survival as endpoint, with a confidence of p=0,01.

Conclusion

The built model displayed a strong predictive ability both in silico and in clinical samples, measured as progression-free survival after surgery and treatment with 5-FU.

Example 4 – Afatinib

[0082] Afatinib, NSC 750691, was selected as the anti-cancer agent for which sensitivity and resistance in cancer patients are to be modelled and predicted.
[0083] mRNA transcriptome data for cancer cells and cancer cell lines, and effect data for Afatinib, were sourced and generated as described in example 2.

[0084] The mRNA data were processed and used to train a predictive model using a commercial platform (https://jadbio.com/), including preprocessing the data by Constant Removal and Standardization; feature selection using LASSO Feature Selection with penalty=1.0; and a predictive algorithm built using Support Vector Machine type C-SVC with Polynomial Kernel and hyper-parameters: cost = 0,1, gamma = 1,0, degree = 2.

[0085] In-silico prediction of the model showed an AUC of 0,74.

Conclusion

[0086] The in-silico predictive evaluation showed an AUC of 0,74 for the model(s) described above, evidencing a high predictive value.

Example 5 – generation of mRNA transcriptomes for diagnosed cancer tissues

Introduction

[0087] Key mRNA genetic markers of the cancer types of table 2 were identified by recording an mRNA transcriptome from tissue samples of the said cancer types. Data were sourced from Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/).

Table 2

Cancer type  Tissue type    Gene Expression Omnibus reference
Bladder      Fresh-frozen   GSE31684
Breast       Fresh-frozen   GSE102484
Breast       Fresh-frozen   GSE21653
Breast       Fresh-frozen   GSE26639
Breast       Fresh-frozen   GSE12276
Breast       Fresh-frozen   GSE76124
Breast       Fresh-frozen   GSE27830
Breast       Fresh-frozen   GSE5460
Breast       Fresh-frozen   GSE43365
Breast       Fresh-frozen   GSE231629
Breast       Fresh-frozen   GSE146558
Breast       Fresh-frozen   GSE58812
Breast       Fresh-frozen   GSE23177
Breast       Fresh-frozen   GSE28821
Breast       Fresh-frozen   GSE76274
Breast       Fresh-frozen   GSE17907
Breast       Fresh-frozen   GSE47389
Breast       Fresh-frozen   GSE52322
Breast       Fresh-frozen   GSE50567
Breast       Fresh-frozen   GSE87007
Breast       Fresh-frozen   GSE12763
Colorectal   Fresh-frozen   GSE26682
Colorectal   Fresh-frozen   GSE143985
Colorectal   Fresh-frozen   GSE64857
Colorectal   Fresh-frozen   GSE13067
Colorectal   Fresh-frozen   GSE39084
Colorectal   Fresh-frozen   GSE29621
Colorectal   Fresh-frozen   GSE62932
Colorectal   Fresh-frozen   GSE30540
Colorectal   Fresh-frozen   GSE34489
Esophagus    Fresh-frozen   GSE26886
GI tract     Fresh-frozen   GSE8167
Head/neck    Fresh-frozen   GSE6791
Head/neck    Fresh-frozen   GSE10300
Kidney       Fresh-frozen   GSE53757
Liver        Fresh-frozen   GSE112790
Liver        Fresh-frozen   GSE9843
Liver        Fresh-frozen   GSE62232
Liver        Fresh-frozen   GSE121248
Lung         Fresh-frozen   GSE37745
Lung         Fresh-frozen   GSE50081
Lung         Fresh-frozen   GSE77803
Lung         Fresh-frozen   GSE43580
Lung         Fresh-frozen   GSE40791
Lung         Fresh-frozen   GSE115457
Lung         Fresh-frozen   GSE101929
Lung         Fresh-frozen   GSE19804
Lung         Fresh-frozen   GSE10245
Lymph        Fresh-frozen   GSE132929
Lymph        Fresh-frozen   GSE19246
Lymph        Fresh-frozen   GSE12195
Lymph        Fresh-frozen   GSE53820
Lymph        Fresh-frozen   GSE70910
Ovary        Fresh-frozen   GSE63885
Ovary        Fresh-frozen   GSE12172
Ovary        Fresh-frozen   GSE44104
Pancreas     Fresh-frozen   GSE15471
Prostate     Fresh-frozen   GSE32448
Stomach      Fresh-frozen   GSE57303
Uterus       Fresh-frozen   GSE120490
Uterus       Fresh-frozen   GSE26511

Methods and analysis

[0088] mRNA transcriptomes of tissue samples were acquired from Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/). Duplicate samples were removed and potential batches were identified using principal component analysis (Software "R", version 4.1.3). All datasets were normalized using GCRMA-normalization (Wu J, Gentry RIwcfJMJ (2021). gcrma: Background Adjustment Using Sequence Information. R package version 2.66.0.) and subsequently corrected for batch effects using ComBat (Leek JT, Johnson WE, Parker HS, Fertig EJ, Jaffe AE, Zhang Y, Storey JD, Torres LC (2024). sva: Surrogate Variable Analysis. R package version 3.42.0.).

Results and conclusion

[0089] From the transcriptomes of the cancer types a dataset of 6.252 tumors was generated. This was used at a later filtering step (Example 7).

Example 6 – mRNA transcriptomes of cancer cell lines and identification of anti-cancer agents impacting on cell proliferation (kill or stasis) of the cancer cell lines.

Introduction

[0090] mRNA transcriptome data from the cell lines of Table 3 were acquired from National Cancer Institute Cellminer DB (https://discover.nci.nih.gov/cellminer/home.do). The downloaded data included growth inhibition ("GI50") values for each cell line subjected to anti-cancer agents (see table 3) and mRNA transcriptomes for NCI60 from both Affymetrix HG U133+2 microarrays and RNA sequencing.

Table 3

Cancer type             Cell lines
Breast                  MCF7, MDA-MB-231, HS 578T, BT-549, T-47D
Central nervous system  SF-268, SF-295, SF-539, SNB-19, SNB-75, U251
Colon                   COLO 205, HCC-2998, HCT-116, HCT-15, HT29, KM12, SW-620
Leukemia                CCRF-CEM, HL-60(TB), K-562, MOLT-4, RPMI-8226, SR
Melanoma                LOX IMVI, MALME-3M, M14, SK-MEL-2, SK-MEL-28, SK-MEL-5, UACC-257, UACC-62, MDA-MB-435, MDA-N
Non-Small Cell Lung     A549/ATCC, EKVX, HOP-62, HOP-92, NCI-H226, NCI-H23, NCI-H322M, NCI-H460, NCI-H522
Ovarian                 IGROV1, OVCAR-3, OVCAR-4, OVCAR-5, OVCAR-8, SK-OV3, NCI/ADR-RES
Prostate                PC-3, DU-145
Renal                   786-0, A498, ACHN, CAKI-1, RXF 393, SN12C, TK-10, UO-31

Methods and analysis
[0091] For a specific compound, GI50 values were correlated to the baseline transcriptome expression (GCRMA-normalized) from Affymetrix HG U133+2. RMA and GCRMA are two methods to preprocess (normalize) microarray data. For reference on RMA see Irizarry, R. A. et al. (2003). Summaries of Affymetrix GeneChip probe level data. Nucleic acids research, 31(4), e15-e15. Gene expression values with a Pearson correlation higher than 0,25 or lower than -0,25 or values for genes with false discovery rate (FDR) values above 0,2, were kept for further analysis as described in other methods (Chen, J. J., Knudsen, S., Mazin, W., Dahlgaard, J., & Zhang, B. (2012). A 71-gene signature of TRAIL sensitivity in cancer cells. Molecular cancer therapeutics, 11(1), 34-44).
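The Pearson correlation filter of paragraph [0091] can be sketched as follows. This is a minimal illustration in plain Python and not the code used for the examples: the function names, the dictionary layout and the toy values are assumptions made here, the drug response is assumed to be encoded so that higher values mean higher sensitivity, and the FDR criterion is left out.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def select_correlated_genes(expression, sensitivity, threshold=0.25):
    """Split genes into positively and negatively correlated markers.

    expression  -- dict: gene name -> expression values, one per cell line
    sensitivity -- drug response per cell line, in the same order
                   (assumption: higher value = more sensitive)
    threshold   -- absolute Pearson cut-off (0,25 in the text)
    """
    positive, negative = [], []
    for gene, values in expression.items():
        r = pearson(values, sensitivity)
        if r > threshold:
            positive.append(gene)
        elif r < -threshold:
            negative.append(gene)
    return positive, negative


# Toy data for four hypothetical cell lines.
toy_expression = {
    "GENE_UP": [1.0, 2.0, 3.0, 4.0],
    "GENE_DOWN": [4.0, 3.0, 2.0, 1.0],
    "GENE_FLAT": [2.0, 1.0, 1.0, 2.0],
}
toy_sensitivity = [1.0, 2.0, 3.0, 4.0]
toy_positive, toy_negative = select_correlated_genes(toy_expression, toy_sensitivity)
```

Genes whose correlation lies between the two cut-offs, such as GENE_FLAT in the toy data, fall into neither list and are dropped from the compound-specific profile.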
[0092] This was done using Python (https://www.python.org/) with packages: anyio==4.0.0, argon2-cffi==23.1.0, argon2-cffi-bindings==21.2.0, arrow==1.2.3, asttokens==2.4.0, async-lru==2.0.4, attrs==23.1.0, Babel~=2.14.0, backcall==0.2.0, beautifulsoup4==4.12.2, bleach==6.0.0, certifi~=2024.2.2, cffi==1.15.1, charset-normalizer==3.2.0, comm==0.1.4, contourpy==1.1.0, cycler==0.11.0, debugpy==1.8.0, decorator==5.1.1, defusedxml==0.7.1, exceptiongroup==1.1.3, executing==1.2.0, fastjsonschema==2.18.0, fonttools==4.42.1, fqdn==1.5.1, idna~=3.7, importlib-metadata==6.8.0, importlib-resources==6.0.1, ipykernel==6.25.2, ipython==8.12.2, ipython-genutils==0.2.0, ipywidgets==8.1.1, isoduration==20.11.0, jedi==0.19.0, Jinja2~=3.1.3, joblib==1.3.2, json5==0.9.14, jsonpointer==2.4, jsonschema==4.19.1, jsonschema-specifications==2023.7.1, jupyter==1.0.0, jupyter-console==6.6.3, jupyter-events==0.7.0, jupyter-lsp==2.2.0, jupyter_client==8.3.1, jupyter_core==5.3.2, jupyter_server==2.7.3, jupyter_server_terminals==0.4.4, jupyterlab==4.0.6, jupyterlab-pygments==0.2.2, jupyterlab-widgets==3.0.9, jupyterlab_server==2.25.0, kiwisolver==1.4.4, MarkupSafe~=2.1.5, matplotlib==3.7.2, matplotlib-inline==0.1.6, mistune==3.0.1, nbclient==0.8.0, nbconvert==7.8.0, nbformat==5.9.2, nest-asyncio==1.5.8, notebook==7.0.4, notebook_shim==0.2.3, numpy~=1.26.4, overrides==7.4.0, packaging~=24.0, pandas~=2.2.2, pandocfilters==1.5.0, parso==0.8.3, patsy==0.5.3, pexpect==4.8.0,
pickleshare==0.7.5, Pillow==10.0.0, pkgutil_resolve_name==1.3.10, platformdirs~=4.2.0, prometheus-client==0.17.1, prompt-toolkit==3.0.39, psutil==5.9.5, ptyprocess==0.7.0, pure-eval==0.2.2, pycparser==2.21, Pygments~=2.17.2, pyparsing==3.0.9, python-dateutil~=2.9.0.post0, python-json-logger==2.0.7, pytz~=2024.1, PyYAML==6.0.1, pyzmq==25.1.1, qtconsole==5.4.4, QtPy==2.4.0, referencing==0.30.2, requests==2.31.0, rfc3339-validator==0.1.4, rfc3986-validator==0.1.1, rpds-py==0.10.3, scikit-learn==1.3.0, scipy==1.10.1, seaborn==0.12.2, Send2Trash==1.8.2, six==1.16.0, sniffio==1.3.0, soupsieve==2.5, stack-data==0.6.2, statsmodels==0.14.0, terminado==0.17.1, threadpoolctl==3.2.0, tinycss2==1.2.1, tomli==2.0.1, tornado==6.3.3, traitlets==5.10.1, typing_extensions~=4.11.0, tzdata==2023.3, uri-template==1.3.0, urllib3~=2.2.1, wcwidth==0.2.6, webcolors==1.13, webencodings==0.5.1, websocket-client==1.6.3, widgetsnbextension==4.0.9, zipp~=3.18.1, jadbio~=1.2.13, pip~=23.2.1, wheel~=0.41.2, docutils~=0.21.1, Sphinx~=7.3.7, colorama~=0.4.6, setuptools~=68.2.0, toml~=0.10.2, yapf~=0.40.2, importlib_metadata~=7.1.0, pydash~=8.0.0, imagesize~=1.4.1, snowballstemmer~=2.2.0, untokenize~=0.1.1 and html2text~=2024.2.26.

Results and Discussion
[0093] In this procedure a library of genes specific for drug efficacy was identified for each anti-cancer agent in the cancer cell lines.

Example 7 – Creating a computer implemented predictive model based on transcriptome data.

Introduction
[0094] The data collected in Examples 5 and 6 were used to build drug-specific predictive models. To ensure clinical relevance, only genes that passed the variance filter derived from the data of Example 5 were kept in the compound-specific profiles. After creating the predictive model, its precision was further trained/conditioned by the next steps.

Methods and analysis
[0095] From the dataset in Example 5 a variance filter was made.
Genes with a variance in a top percentage, such as the top 5 %, were kept for further analysis. As an alternative, filtering specific to cancer type could also be used, so that a top 5 % per disease group was applied in some of the compound-specific models.

Results and conclusion
[0096] In silico prediction showed that a model for, e.g., Epirubicin performed with an overall area
under curve (AUC) of 0,81 in the receiver operating characteristics (ROC) curve.

Example 8 – Prediction of effective chemo agents from the mRNA transcriptome in normalized background population.

Introduction
[0097] The size of a background population needed to normalize a single sample depended on various factors, including the desired level of accuracy, the variability within the population, and the statistical methods being used. However, some general principles were applied as described below.

Methods and analysis
[0098] To be able to normalize to a background population, a minimum of 30 comparable samples was required. In such a background population a score for each patient was generated by subtracting the mean gene expression of the negatively correlated genes from the mean gene expression of the positively correlated genes. The scores were then normalized to a scale between 0 and 100 by one of two methods, so that patients with high scores were expected to benefit from a treatment, for example as specified for Epirubicin in Examples 9-12:
a) The scores were normalized using min-max normalization, so that the rescaled scores kept the shape of the original, approximately Gaussian, score distribution.
b) The rank of each score was divided by the total number of scores and multiplied by 100, so that the scores followed a uniform distribution.

Results and conclusion
[0099] Depending on the choice of normalization, the distribution of scores will be either uniform (rank-based normalization) or approximately Gaussian (min-max normalization).

Example 9 – mRNA transcriptomes of cancer cell lines sensitive to Epirubicin.

Introduction
[0100] Cell lines of Table 4 were evaluated for sensitivity and resistance to Epirubicin, NSC 256942, and correlated to the baseline transcriptome as described in Example 6. The GI50 values from Epirubicin NSC 256942 are shown in Figure 4.

Methods and analysis
[0101] Data from cell lines of Table 4 was acquired from National Cancer Institute Cellminer DB (https://discover.nci.nih.gov/cellminer/home.do).
The acquired data included growth inhibition (“GI50”) values for Epirubicin NSC 256942 and transcriptomes from NCI60 from both Affymetrix HG U133+2 and RNA sequencing.

Results and conclusion
[0102] Examples of the key positive (Pearson correlation > 0,25) and key negative (Pearson correlation < -0,25) genetic markers found from the transcriptome of each cell line are listed in Table 4. In total, 1509 genes were positively correlated to benefit of Epirubicin and 1125 genes were negatively correlated.

Table 4
Key positive gene markers  Key negative gene markers
CMAHP                      GIPC1
CRLF3                      CD9
FERMT3                     SHROOM3
ARHGAP30                   SH3D19
PTPN22                     GGA2
TRAF3IP3                   RASEF
WAS                        TMEM133
FMNL1                      FAM84B
ST8SIA4                    RHPN2
SELPLG                     STAU1

Example 10 – Creating a computer implemented predictive model on transcriptome data

Introduction
[0103] The data collected in Examples 5, 6 and 9 were loaded into Python by the use of packages as described in Example 6. Genes correlated to Epirubicin sensitivity were first identified using the methodology of Example 6, and the genes with the highest variance were then identified using the methodology of Example 7. By using different combinations of parameters, multiple gene signatures for predicting Epirubicin sensitivity were generated. Using the methodology from Example 8, each gene signature was used to generate scores for each patient in a dataset consisting of 279 breast cancer patients downloaded from GEO (https://www.ncbi.nlm.nih.gov/geo/). Each gene signature was then evaluated based on the generated scores.
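The scoring step of Example 8, as applied to each gene signature here, can be sketched as follows. This is an illustrative sketch in plain Python under stated assumptions: the function names and the toy cohort are hypothetical, a sample is represented as a dictionary from gene name to expression value, and ties in the rank-based variant are broken by input order, which the text does not specify.

```python
def signature_score(sample, positive_genes, negative_genes):
    """Mean expression of positive markers minus mean of negative markers."""
    pos_mean = sum(sample[g] for g in positive_genes) / len(positive_genes)
    neg_mean = sum(sample[g] for g in negative_genes) / len(negative_genes)
    return pos_mean - neg_mean


def min_max_scale(scores):
    """Method a): rescale linearly so the lowest score is 0 and the highest 100."""
    lowest, highest = min(scores), max(scores)
    if highest == lowest:
        return [50.0] * len(scores)  # assumption: identical scores map to mid-scale
    return [100.0 * (s - lowest) / (highest - lowest) for s in scores]


def rank_scale(scores):
    """Method b): rank of each score divided by the number of scores, times 100."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # ascending, ties by position
    ranks = [0] * n
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return [100.0 * r / n for r in ranks]


# Toy background population of three hypothetical samples.
toy_cohort = [
    {"POS1": 5.0, "NEG1": 1.0},
    {"POS1": 3.0, "NEG1": 3.0},
    {"POS1": 1.0, "NEG1": 5.0},
]
toy_raw = [signature_score(s, ["POS1"], ["NEG1"]) for s in toy_cohort]
toy_min_max = min_max_scale(toy_raw)
toy_ranked = rank_scale(toy_raw)
```

Min-max scaling is a linear map and therefore keeps the shape of the raw score distribution, while the rank-based variant spreads the scores evenly over the scale regardless of their original shape. A real background population would contain at least the 30 comparable samples required in Example 8.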
Methods and analysis
[0104] Dataset used for evaluation: Pre-treatment gene expression data and Epirubicin response data for a dataset of breast cancer patients were downloaded from GEO (https://www.ncbi.nlm.nih.gov/geo/, ID: “GSE41998”, see Horak, C. E. et al (2013). Biomarker analysis of neoadjuvant doxorubicin/cyclophosphamide followed by ixabepilone or Paclitaxel in early-stage breast cancer. Clinical Cancer Research, 19(6), 1587-1595). The raw data was normalized using robust multi-array average (RMA) (Gautier L, Cope L, Bolstad BM, Irizarry RA (2004). “affy—analysis of Affymetrix GeneChip data at the probe level.” Bioinformatics, 20(3), 307–315. ISSN 1367-4803, doi:10.1093/bioinformatics/btg405. Package version 1.78.0.). Treatment outcome consisted of ‘complete response’, ‘partial response’, ‘progressive disease’, ‘stable disease’, ‘unable to determine’ and ‘not available’ (NA). Patients with ‘partial response’ and ‘stable disease’ were grouped together, and patients with ‘unable to determine’ and ‘not available’ were omitted, after which 270 patients remained, of which 40 showed ‘complete response’, 225 showed ‘partial response’ and 5 showed ‘progressive disease’ (Table 5).

Table 5
Cancer type    Tissue type                   Prediction
Breast cancer  Tissue preserved in RNAlater  Complete response (n = 40)
                                             Partial response (n = 225)
                                             Progressive disease (n = 5)

[0105] AUC (Area Under Curve) was used to evaluate the performance of numerous configurations and combinations of parameters that led to different gene-signature sets. Response scores based on the gene expression level of all genes in each signature were generated for each sample in the cohort (Horak, C. E. et al (2013). Biomarker analysis of neoadjuvant doxorubicin/cyclophosphamide followed by ixabepilone or Paclitaxel in early-stage breast cancer. Clinical Cancer Research, 19(6), 1587-1595).
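The AUC used to rank the signatures can be computed directly from the response scores, as in the following plain-Python sketch. It uses the rank interpretation of the ROC AUC (the probability that a randomly chosen responder scores higher than a randomly chosen non-responder, with ties counted as one half); the function name, the 0/1 label encoding and the toy values are assumptions for illustration only.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from scores and binary labels.

    scores -- response score per patient
    labels -- 1 for the group expected to score high (e.g. responders),
              0 for the other group (e.g. progressive disease)
    """
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample in each group")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positives) * len(negatives))


# Toy scores for five hypothetical patients.
toy_scores = [80.0, 60.0, 55.0, 30.0, 20.0]
toy_labels = [1, 1, 0, 1, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
```

An AUC of 0,5 corresponds to scores that do not separate the two groups at all, and 1,0 to perfect separation. With very small groups, such as the five patients with progressive disease in Table 5, the estimate rests on few pairwise comparisons.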
Results and conclusion
[0106] A final model with 141 genes was the best-performing signature (Progressive Disease versus Response AUC 0,81 and ANOVA test p-value < 0,01). Selected genes, key positive and key negative markers, are presented in Table 6.

Table 6
Key positive gene markers  Key negative gene markers
CHI3L2                     TMEM54
CD2                        TSPAN1
KCNE4                      PDZK1IP1
CSTA                       CYP2J2
GZMA                       ELOVL7
PCOLCE2                    ANXA3
TLR1                       ODAM
RHOH                       EREG
ITK                        FOXQ1
CD38                       ENPP5

Example 11 – Prediction of cancer patient for whom treatment with Epirubicin will be effective.

Introduction
[0107] The model of Example 10 was used to predict whether a patient would obtain a favourable response to treatment with Epirubicin by testing the mRNA transcriptome of the cancer tissue of the patient in the predictive model.

Methods and analysis
[0108] As described in Examples 9, 10 and 11, microarray gene expression levels and growth inhibition levels (GI50) for Epirubicin in the NCI60 cell line panel were acquired from NCBI to build a prediction algorithm.
[0109] The best-performing model generated in Example 10 was validated on a cohort of breast cancer patients treated with Epirubicin in the metastatic setting (n=153), with time-to-progression (TTP) as endpoint, using both normalization techniques from Example 8.

Results and conclusions
[0110] The hazard ratio (‘HR’; risk of disease progression in one group compared to another at any given time point) for the best-performing signature (subset of genes) was 0,31 when comparing patients with a difference of at least 50 points in response score (two-sided p-value = 0,012, 95% confidence interval 0,12-0,77, endpoint time to disease progression (TTP), continuous scoring). This was consistent in a multivariate model including ER and HER2. Dividing the scores into groups of lowest 20 %, middle 60 % and top 20 %, TTP was statistically significantly different between the three groups (HR 0,44, p<0,01, Figure 5).
[0111] In 153 patients treated with Epirubicin, time-to-progression was dependent on the AIDA Epirubicin score (hazard ratio (HR) 0,44, p<0,01 for the difference between the top 20% score, the
middle 60% and the bottom 20% score). These results clinically validated that the developed predictive model accurately generated a response signature for Epirubicin in breast cancer patients and that the model predicted the efficacy of Epirubicin treatment in breast cancer patients having the response signature.

Example 12 – Translation between microarray and RNA sequencing

Introduction
[0112] A method was established to test if the gene signature was correlated between microarray- and RNA sequencing-based datasets. The method was developed using the general methodology described in (PMID: 30289602) (Pedersen, C. B., Nielsen, F. C., Rossing, M., & Olsen, L. R. (2018). Using microarray‐based subtyping methods for breast cancer in the era of high‐throughput RNA sequencing. Molecular Oncology, 12(12), 2136-2146.).

Methods and analysis
[0113] The method aligned raw RNA sequencing reads to microarray probes (i.e. Affymetrix HG U133+2).

Results and conclusions
[0114] Using the data and method provided by Pedersen et al (PMID: 30289602), a high correlation was observed between matching microarray and RNA sequencing datasets for the genes in our model (R² = 0,87).
[0115] Using the method from Pedersen et al to translate drug-specific models, the expression of the genes in the signature was shown to be highly correlated between microarray and RNA sequencing data for the Epirubicin-specific model in Example 11.

Example 13 – Docetaxel and Paclitaxel predictors.

[0116] Similarly to the procedure in Example 6, mRNA quantification data (gene expression data) was downloaded for CCLE (Cancer Cell Line Encyclopedia) from the DepMap data portal (https://depmap.org/portal/data_page/?tab=allData) and response data for anti-cancer drug compounds was downloaded for GDSC (Genomics of Drug Sensitivity in Cancer) from ‘cancerrxgene.org’.
Cell lines from tissues such as brain, skin and other disease groups rarely treated with docetaxel, or cell lines from disease groups which generally respond well to the majority of treatments, such as blood cancers, were omitted.
[0117] The procedures described in Examples 5, 6 and 7 were run in the exact same manner as before, but this time CCLE (for gene expression) and GDSC (for drug sensitivity) were used instead of
NCI60. In addition, genes from literature known to interact with Docetaxel were also included. This generated a total of 330 signatures (gene subsets).
[0118] The signatures were trained and cross-validated using a machine learning algorithm, which weighted the importance of each gene. Training was done on a cohort of 164 breast cancer patients treated with docetaxel (samples originating from FFPE tissue) and was performed over 100 iterations to get a robust estimate of the signatures’ performance. Hazard Ratio (HR) was used to determine the best-performing model. A final model, using a group variance filter, consisted of 137 genes and had a mean HR = 0,37 across the 100 iterations. Selected genes, key positively weighted or negatively weighted genes, are presented in Table 7. Using machine learning to weight genes differed from the procedure of Example 11.

Table 7
Key positively weighted genes  Key negatively weighted genes
FAM110C                        DLGAP5
TOX3                           SLC40A1
DIO1                           BCAS1
LY6E                           PLAT
LYZ                            LGR6
CALB2                          CRABP2
TFPI2                          GLDC
KRT15                          TMPRSS3
CYBRD1                         RSPH1
TM4SF18                        RTP4

The best-performing model was tested on a cohort of 41 breast cancer patients (samples originating from FFPE tissue) treated with paclitaxel (another taxane having the same mechanism of action and structurally similar to docetaxel) in the exact same manner as in Example 11, except that patients were only split into two groups: those with scores > 50 (considered sensitive) and those with scores <= 50 (considered resistant). The hazard ratio (HR) was 0.27, p-value < 0.01 in a continuous model with time to progression (TTP) as endpoint, in essence meaning that for any two individuals with a score difference of 50, the patient with the higher score has a 73% reduction in disease progression risk
when both are treated with paclitaxel. Dividing patients into two groups (scores > 50 and scores <= 50) and comparing TTP, a similarly large reduction in risk for higher scoring patients was seen (HR = 0.35, p-value = 0.051; see Figure 7).

Example 14 – Fresh Frozen Taxane predictor

Introduction
The model of Example 13 was shown to perform suboptimally when predicting on Fresh Frozen tissue. One possible explanation is illustrated in Figure 8 (right plot), where a PCA plot based on the expression of the 137 genes included in the model clearly highlights differences between FFPE and fresh-frozen tissue samples. The framework of Example 13 was used to build a new model to predict on Fresh Frozen tissue, using Fresh Frozen samples instead of FFPE samples as input in the training phase.

Methods and analysis
Using the exact same methods as in Example 13, the same 330 signatures were produced, but within each of these 330 signatures, only the genes found on the Affymetrix HG U133A array (a subset of the Affymetrix HG U133+2 array) were kept, as much publicly available data from fresh frozen biopsies originated from this array. As in Example 13, the signatures were trained using machine learning. The training data consisted of a cohort of patients (n = 42) treated with a taxane- and anthracycline-based regimen, with gene expression data from Fresh Frozen biopsies. Samples were split into training and test subsets (70:30 ratio), and using the same machine learning algorithm as in Example 13, weights were assigned to each gene in each signature and the performance was evaluated with Area under the Curve (AUC) on the test subset. This was repeated 20 times with different random seeds for the training/test split (meaning that different samples, but the same total number of samples, are put into the training and test subsets respectively), and the best-performing model was selected based on median AUC across the 20 iterations.
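The repeated 70:30 evaluation described above can be sketched as follows. The machine learning algorithm itself is not specified in the text, so this sketch substitutes a simple stand-in learner (one weight per gene, equal to the difference in mean expression between responders and non-responders) purely to make the loop runnable. All function names, the data layout, the use of the iteration index as random seed and the skipping of single-class test subsets are assumptions.

```python
import random
import statistics


def roc_auc(scores, labels):
    """Rank-based ROC AUC; labels are 1 (responder) or 0 (non-responder)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_mean_difference(train_x, train_y):
    """Stand-in learner (NOT the algorithm of the examples): one weight per gene."""
    genes = list(train_x[0])

    def class_mean(gene, label):
        values = [x[gene] for x, y in zip(train_x, train_y) if y == label]
        return sum(values) / len(values) if values else 0.0

    weights = {g: class_mean(g, 1) - class_mean(g, 0) for g in genes}
    return lambda sample: sum(weights[g] * sample[g] for g in genes)


def median_auc_over_splits(samples, labels, fit, n_iter=20, test_frac=0.3):
    """Median test AUC over n_iter random train/test splits of fixed size."""
    n_test = max(1, round(len(samples) * test_frac))
    aucs = []
    for seed in range(n_iter):
        indices = list(range(len(samples)))
        random.Random(seed).shuffle(indices)  # a different split for every seed
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        test_y = [labels[i] for i in test_idx]
        if len(set(test_y)) < 2:
            continue  # AUC is undefined when the test subset has one class only
        model = fit([samples[i] for i in train_idx], [labels[i] for i in train_idx])
        aucs.append(roc_auc([model(samples[i]) for i in test_idx], test_y))
    if not aucs:
        raise ValueError("no split produced a test subset with both classes")
    return statistics.median(aucs)


# Toy cohort: 10 responders with high GENE_A, 10 non-responders with low GENE_A.
toy_samples = [{"GENE_A": 10.0 + i, "GENE_B": 0.0} for i in range(10)]
toy_samples += [{"GENE_A": 0.1 * i, "GENE_B": 0.0} for i in range(10)]
toy_labels = [1] * 10 + [0] * 10
toy_median_auc = median_auc_over_splits(toy_samples, toy_labels, fit_mean_difference)
```

Taking the median rather than the best single split makes the comparison between signatures less dependent on one favourable partition of a small cohort, which matters with only 42 training patients.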
The best-performing model (median AUC = 0,78) consisted of 113 genes (32 of which overlap with the signature from Example 13). The best-performing model was validated on 5 independent cohorts (n total = 819) of patients treated with a taxane- and anthracycline-based regimen, with pathologic complete response (pCR) as endpoint (see Table 8).

Results and conclusions
The signature accurately discriminated between responders (pathologic complete response, pCR) and non-responders (residual disease, RD; no complete response, nCR) across five independent test datasets, achieving an average AUC of 0.77. In each of the five datasets, differences in signature scores between responders and non-responders were statistically significant (p < 0.01). In Table 9 the key
positively weighted and negatively weighted genes of the gene signature are listed, and in Figure 9 boxplots of score plotted against response group for each dataset are presented. These results clinically validate that the model accurately predicts pCR in patients neoadjuvantly treated with taxane and anthracycline when samples are from Fresh Frozen biopsies. These results further strengthened the capability of the methods described herein to discover and predict gene signatures, highlighting that the framework can be adjusted in the training phase to match the endpoint application.

Table 8
               Dataset name  Endpoint  Number of patients
Training data  GSE140494     pCR       42
Testing data   GSE20194      pCR       267
               GSE20271      pCR       148
               GSE23988      pCR       57
               GSE32646      pCR       115
               GSE230881     pCR       232

Table 9
Key positively weighted genes  Key negatively weighted genes
FOSB                           GMNN
PLA2G10                        APLNR
CNGA1                          MCM5
ABHD2                          CSRP2
IFT70A                         CDC7
KRT8                           NR1H3
CYP3A4                         ITGB2
AOC1                           SPRY1
TFF2                           TFF1
SIGMAR1                        ALDH1B1
* * *