
WO2025058517A1 - Biomarkers for typing a sample of an individual for hepatocellular carcinoma. - Google Patents

Info

Publication number
WO2025058517A1
Authority
WO
WIPO (PCT)
Prior art keywords
hcc
proteins
pex14
klrg2
individual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/NL2024/050499
Other languages
French (fr)
Inventor
Arie Cornelis BREEDVELD
Blandine Jenneke Huguette LE TALLEC
Mark Johannes Adrianus SCHOONDERWOERD
Leon René LOOGMAN
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Levels Diagnostics Holding BV
Original Assignee
Levels Diagnostics Holding BV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Levels Diagnostics Holding BV
Publication of WO2025058517A1

Links

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53 Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/574 Immunoassay; Biospecific binding assay; Materials therefor for cancer
    • G01N33/57407 Specifically defined cancers
    • G01N33/57438 Specifically defined cancers of liver, pancreas or kidney

Definitions

  • the invention relates to methods for typing of a sample of an individual for cancer, particularly hepatocellular carcinoma.
  • the invention is directed to a set of marker proteins to type a hepatocellular carcinoma.
  • 1 INTRODUCTION The most common type of liver disorder, affecting approximately a quarter of the world population, is non-alcoholic fatty liver disease (NAFLD) (Marjot et al., 2020. Endocr Rev 41: bnz009). NAFLD is characterised by accumulation of fat in the liver (i.e.
  • NASH non-alcoholic steatohepatitis
  • fibrosis can advance to the point where the liver turns cirrhotic; at this stage the liver becomes functionally impaired as a consequence of cumulative scarring.
  • About 1 out of 5 NASH cases eventually progress to cirrhosis (Sheka et al., 2020. JAMA 323: 1175-1183).
  • Cirrhosis, in turn, significantly increases the risk of patients developing hepatocellular carcinoma (HCC), the most common type of liver cancer, accounting for 75-85% of all liver cancer cases (Straś et al., 2020. Clin Exp Hepatol 6: 170-175; Singal et al., 2020. J Hepatol 72: 250-261).
  • HCC hepatocellular carcinoma
  • HBV hepatitis B virus
  • HCV hepatitis C virus
  • liver cancer was the 6th most prevalent type of cancer globally by annual number of new cases (906k, 4.7% of the world total) and the third most deadly by annual number of new deaths (830k, 8.3% of the world total) (Sung et al., 2021. CA Cancer J Clin 71: 209-249).
  • liver cancer was the 13th most prevalent cancer type by number of new cases (88k, 2.2% of the European total) and the 7th most deadly by number of cancer-related deaths (78k, 4.0% of the European total) (Ferlay et al., 2020. Global Cancer Observatory: Cancer Today. Available at gco.iarc.fr/today). Considering these numbers, it is clear that liver cancer has a major impact across the world.
  • There are different methods available to diagnose or screen for liver cancer: imaging techniques like ultrasound (US), magnetic resonance imaging (MRI) or computed tomography (CT) scanning; histological examination, which requires performing a liver biopsy; or measurement of blood levels of certain biomolecules, such as alpha-fetoprotein (AFP) (Marrero et al., 2018. Hepatology 68: 723-750). US scanning is currently the recommended practice for biannual surveillance testing in adult patients with cirrhosis, as they are at a higher risk of developing HCC (Marrero et al., 2018. Hepatology 68: 723-750). Following a positive ultrasound, HCC diagnoses are generally confirmed through MRI.
  • AFP alpha-fetoprotein
  • Liver cancer also generally doesn’t show clear symptoms, if any at all, especially in early stages (Ayuso et al., 2018. J Radiology 101: 72-81). These factors complicate early-stage diagnosis significantly.
  • Several proteins are being studied for potential use as liver cancer biomarkers for surveillance testing, the most widely tested of which is AFP. However, inclusion of AFP in screening guidelines is considered “suboptimal in terms of cost-effectiveness and for routine surveillance” (Galle et al., 2018. J Hepatol 69: 182-236). The most prominent HCC screening test in development is the Roche Diagnostics Elecsys® GAAD assay.
  • This multivariate assay is based on the so-called GAAD score, which combines gender, age, and plasma measurements of the proteins AFP and des-gamma-carboxy prothrombin (DCP) (Marrero et al., 2018. Hepatology 68: 723-750; Galle et al., 2018. J Hepatol 69: 182-236; Best et al., 2020. Clin Gastroenterol Hepatol 18: 728-735; Schotten et al., 2021. Pharmaceuticals 14: 735; Yang et al., 2019. Cancer Epidemiol Biomarkers Prev 28: 531-538).
  • DCP des-gamma-carboxy prothrombin
  • DCP is also known as prothrombin induced by vitamin K absence or antagonist-II (PIVKA-II).
  • PIVKA-II reportedly produced an area under the curve (AUC) ranging between 0.83 and 0.87 (Best et al., 2020. Clin Gastroenterol Hepatol 18: 728-735).
  • AUC area under the curve
  • the GAAD assay was granted Breakthrough Device Designation by the United States Food and Drug Administration, though the Clinical Practice Guidance by the American Association for the Study of Liver Diseases has currently not recommended it, noting that phase II case-control biomarker studies showed promising results but that phase III and IV studies were still needed to assess the assay’s performance in larger cohorts (Marrero et al., 2018.
  • the invention is directed to an in vitro method of typing a sample of an individual for the presence of a hepatocellular carcinoma (HCC), the method comprising: (i) determining the concentration of at least 2 marker proteins to thereby provide a concentration profile of the marker proteins, wherein the marker proteins are selected from the proteins listed in Table 1; (ii) comparing the individual’s concentration profile to a reference concentration profile of the at least 2 marker proteins; thereby typing the sample for the presence of HCC.
  • Said sample preferably is a plasma or serum sample.
  • the determination of the protein concentration preferably is performed using an enzyme-linked immunosorbent assay (ELISA), preferably a multiplex ELISA.
  • ELISA enzyme-linked immunosorbent assay
  • the at least 2 marker proteins are selected from PEX14, KLRG2, ARL4D, RAB38, PKD2 and NKG2E, or from PEX14, KLRG2, RAB38, GALNS, CP2CJ and IMPA1.
  • the at least 2 marker proteins comprise PEX14 and KLRG2.
  • the at least 2 marker proteins comprise PEX14, KLRG2 and at least one protein selected from ARL4D, RAB38, PKD2 and NKG2E, preferably PEX14, KLRG2 and at least two proteins selected from ARL4D, RAB38, PKD2 and NKG2E, more preferably comprise PEX14, KLRG2, ARL4D, RAB38 and PKD2.
  • the protein concentration of at least 5 different marker proteins, more preferably at least 6 different marker proteins, more preferably at least 7 different marker proteins, more preferably at least 8 different marker proteins, more preferably at least 10 different marker proteins, more preferably at least 20 different marker proteins selected from the proteins listed in Table 1, most preferably all marker proteins listed in Table 1, is determined.
  • Said individual may be at risk of having or developing HCC.
  • Said individual may have cirrhosis, fibrosis, chronic hepatitis B, chronic hepatitis C, alcoholic liver disease, NAFLD, NASH, primary biliary cholangitis, primary hemochromatosis, auto-immune hepatitis, alpha-1 antitrypsin deficiency, or Wilson's disease.
  • Said reference concentration profile may be composed of the average concentrations of the marker proteins specified in step (ii) of individuals having HCC; of individuals not having HCC; or of a mixture of individuals having HCC and individuals not having HCC.
  • the individual’s concentration profile may be compared to two reference concentration profiles, wherein one reference concentration profile is composed of the average concentrations of the marker proteins specified in step (ii) of individuals having HCC and the other reference concentration profile is composed of the average concentrations of the marker proteins specified in step (ii) of individuals not having HCC.
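The comparison of an individual's concentration profile with two reference profiles could be sketched as follows. This is a minimal illustration only: the text does not specify how profiles are compared, so the nearest-reference rule, the log10 Euclidean distance and the function names are assumptions, and all concentration values are invented placeholders that say nothing about the real direction or size of any marker change.

```python
import math

def mean_profile(profiles):
    """Average each marker's concentration over a list of per-individual profiles."""
    markers = profiles[0].keys()
    return {m: sum(p[m] for p in profiles) / len(profiles) for m in markers}

def distance(profile, reference):
    """Euclidean distance between two profiles on a log10 scale (assumed metric)."""
    return math.sqrt(sum(
        (math.log10(profile[m]) - math.log10(reference[m])) ** 2
        for m in reference
    ))

def type_sample(profile, hcc_reference, non_hcc_reference):
    """Assign the sample to whichever reference profile it lies closer to."""
    d_hcc = distance(profile, hcc_reference)
    d_non = distance(profile, non_hcc_reference)
    return "HCC" if d_hcc < d_non else "non-HCC"

# Hypothetical cohorts; the numbers are placeholders, not measured data.
hcc_cohort = [{"PEX14": 120.0, "KLRG2": 30.0}, {"PEX14": 80.0, "KLRG2": 50.0}]
non_hcc_cohort = [{"PEX14": 20.0, "KLRG2": 210.0}, {"PEX14": 40.0, "KLRG2": 190.0}]

hcc_ref = mean_profile(hcc_cohort)
non_hcc_ref = mean_profile(non_hcc_cohort)

print(type_sample({"PEX14": 95.0, "KLRG2": 45.0}, hcc_ref, non_hcc_ref))
```

Other comparison rules (for example a trained classifier) would fit the same two-reference description; the distance rule is used here only because it is the simplest to read.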
  • the invention further provides a method of treating an individual with HCC, comprising typing of a sample from said individual using the method of typing according to the invention, treating the individual that is typed as having HCC with a curative treatment; and testing the individual that is typed as not having HCC with the method of typing according to the invention at regular time intervals, such as every three years, preferably every two years, preferably every year, more preferably every six months.
  • Said curative treatment may comprise liver transplantation, ablation, surgical resection or a combination thereof.
  • the individual that is typed as not having HCC is treated with a treatment strategy related to the individual’s underlying risk factor for HCC.
  • the individual is at risk of having or developing HCC.
  • the log2(FC) indicates whether protein levels were higher (positive) or lower (negative) in NASH-HCC samples compared to NASH control samples.
  • the p-value indicates the probability of observing a difference at least as large as the measured one if it were the result of randomness alone.
  • Three alternative significance levels are depicted horizontally; the Bonferroni-adjusted significance level corrects for the number of proteins in the dataset, the PC-based cut off corrects for the number of principal components which describe over 99% of the variance in the dataset.
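As a small worked illustration of the two quantities described above, the sketch below computes a log2 fold change and a Bonferroni-adjusted significance level. It assumes the fold change is taken as the ratio of group means and that the unadjusted level is 0.05; neither is stated in the text, and the function names and input values are hypothetical.

```python
import math

def log2_fold_change(mean_case, mean_control):
    """Positive when the case group is higher, negative when it is lower."""
    return math.log2(mean_case / mean_control)

def bonferroni_threshold(alpha, n_tests):
    """Significance level after dividing alpha by the number of tests."""
    return alpha / n_tests

# A protein four times higher in cases gives log2(FC) = 2.
print(log2_fold_change(400.0, 100.0))
# A protein at half the control level gives log2(FC) = -1.
print(log2_fold_change(50.0, 100.0))
# Adjusted level for a hypothetical dataset of 7000 proteins.
print(bonferroni_threshold(0.05, 7000))
```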
  • Figure 3. Swarm plots and boxplots of the log10(RFU) distributions for two individual proteins.
  • Abbreviation: RFU relative fluorescence units.
  • Figure 4. ROC curves illustrating the performance of three individual proteins in distinguishing NASH and NASH-HCC patients.
  • the three biomarkers are Glypican 3, which showed the highest AUC (0.75), IGFALS, which showed the most significant p-value (6.4e-5), and alpha-fetoprotein (AFP), a known HCC biomarker which may be used for screening in later stages but which performs poorly in early-stage HCC.
  • the diagonal dotted line indicates the line of no-discrimination, corresponding to the performance of a model based entirely on random guessing.
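The area under a ROC curve can be read as the probability that a randomly chosen case scores higher than a randomly chosen control, which is why the line of no-discrimination corresponds to an AUC of 0.5. The following sketch computes the AUC directly from that pairwise definition; the function name and the example scores are illustrative and not taken from the study.

```python
def auc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly (ties count half)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Perfect separation between the two groups.
print(auc([3.0, 4.0], [1.0, 2.0]))
# Identical score distributions: no better than random guessing.
print(auc([1.0, 2.0], [1.0, 2.0]))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch; rank-based implementations give the same value more efficiently.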
  • Figure 5. Plots of the percentage of proteomic data variance explained per principal component (histogram, left y-axis), and the cumulative data variance explained with each additional principal component (line plot, right y-axis).
  • the first 80 components are plotted of a total of 7,335.
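The cumulative-variance curve is what determines the PC-based significance cut-off mentioned for the volcano plot: the number of components needed to explain over 99% of the variance replaces the number of proteins as the correction factor. A minimal sketch, assuming an unadjusted level of 0.05 and using made-up variance fractions rather than the study's values:

```python
def components_for_variance(explained_ratios, target=0.99):
    """Number of leading components whose cumulative variance exceeds target."""
    cumulative = 0.0
    for count, ratio in enumerate(explained_ratios, start=1):
        cumulative += ratio
        if cumulative > target:
            return count
    return None  # target never exceeded

# Hypothetical per-component variance fractions, largest first.
ratios = [0.5, 0.3, 0.15, 0.045, 0.005]
n_components = components_for_variance(ratios)
print(n_components)

# PC-based cut-off: divide alpha by the component count instead of the protein count.
pc_threshold = 0.05 / n_components
print(pc_threshold)
```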
  • the two ratio models combine HMGR and Glypican 3, which showed the highest AUC (0.75), and GON2 and WISP-2, which showed the most significant p-value (6.4e-5).
  • the diagonal dotted line indicates the line of no-discrimination, corresponding to the performance of a model based entirely on random guessing.
  • Figure 8. ROC curves of the best-performing SVM models for combinations of two (A), three (B), and four (C) proteins in detecting HCC in NASH patients. Curves were generated for 10 different random states of the data (dashed gray lines), for the mean model results (solid line), and for the mean model results ± 1 standard deviation (gray area).
  • Heatmap illustrating the correlation between the 57 identified biomarker proteins. Plots with a plus sign indicate positive correlation; plots with a minus sign indicate negative correlation.
  • the numbers of the proteins correspond with the numbers indicated in the overview in Table 6.
  • Figure 11. Heatmap of the p-values that were calculated to assess associations between several patient covariates and the 57 selected NASH-HCC protein biomarkers. The p-values were calculated using ordinary least squares regression. P-values higher than 0.05 were masked (depicted as white) for easy distinction.
  • BMI body mass index
  • BCLC Barcelona Clinic Liver Cancer staging system
  • MELD Model For End-Stage Liver Disease
  • AST aspartate aminotransferase
  • ALT alanine transaminase
  • INR International Normalized Ratio
  • GPC3 Glypican 3
  • PSPN Persephin
  • AMY1A Amylase alpha 1A
  • NPPB N-terminal pro-BNP
  • IGF2R IGF-II receptor
  • CAL Calgranulin A
  • LMAN2 Lectin mannose-binding 2.
  • the first set (A) comprises 10,000 randomly selected combinations of 2 from the SomaScan dataset of >7000 protein concentration profiles.
  • the second set (B) comprises all possible combinations of 2 between the 57 protein biomarker candidates identified in the discovery study.
  • Figure 14. Plots showing the mean AUCs for two sets of SVM models combining 3 proteins per model.
  • the first set (A) comprises 10,000 randomly selected combinations of 3 from the SomaScan dataset of >7000 protein concentration profiles.
  • the second set (B) comprises all possible combinations of 3 between the 57 protein biomarker candidates identified in the discovery study.
  • Figure 15. Plots showing the mean AUCs for two sets of SVM models combining 4 proteins per model.
  • the first set (A) comprises 10,000 randomly selected combinations of 4 from the SomaScan dataset of >7000 protein concentration profiles.
  • the second set (B) comprises all possible combinations of 4 between the 57 protein biomarker candidates identified in the discovery study.
  • Figure 16. Plots showing the mean AUCs for two sets of SVM models combining 5 proteins per model.
  • the first set (A) comprises 10,000 randomly selected combinations of 5 from the SomaScan dataset of >7000 protein concentration profiles.
  • the second set (B) comprises a subsection of 10,000 models that each combine 5 randomly selected proteins from the 57 protein biomarker candidates identified in the discovery study.
  • Figure 17. Plots showing the mean AUCs for two sets of SVM models combining 6 proteins per model.
  • the first set (A) comprises 10,000 randomly selected combinations of 6 from the SomaScan dataset of >7000 protein concentration profiles.
  • the second set (B) comprises a subsection of 10,000 models that each combine 6 randomly selected proteins from the 57 protein biomarker candidates identified in the discovery study.
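The figure descriptions above use all possible combinations for 2, 3 and 4 proteins but a random subsection of 10,000 models for 5 and 6 proteins. The text does not say why; a plausible reading is that the number of combinations of 57 candidates grows too quickly to evaluate exhaustively. The sketch below counts the combinations and draws a random subset; the sampling helper, its seed and the placeholder protein labels are assumptions for illustration.

```python
import math
import random
from itertools import combinations

N_CANDIDATES = 57

# Number of distinct k-protein combinations among the 57 candidates.
for k in range(2, 7):
    print(k, math.comb(N_CANDIDATES, k))

def sample_combinations(items, k, n_samples, seed=0):
    """Draw n_samples distinct k-item combinations at random."""
    if n_samples > math.comb(len(items), k):
        raise ValueError("more samples requested than combinations exist")
    rng = random.Random(seed)
    seen = set()
    while len(seen) < n_samples:
        seen.add(tuple(sorted(rng.sample(items, k))))
    return sorted(seen)

proteins = [f"protein_{i}" for i in range(1, N_CANDIDATES + 1)]  # placeholder labels
all_pairs = list(combinations(proteins, 2))        # exhaustive, feasible for k = 2
subset_of_five = sample_combinations(proteins, 5, 10_000)
print(len(all_pairs), len(subset_of_five))
```

With 57 candidates there are 1,596 pairs, 29,260 triples and 395,010 quadruples, but 4,187,106 combinations of five and 36,288,252 combinations of six.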
  • Figure 18. ELISA standard curves for protein targets Glypican 3, PEX14, RAB38, PKD2, and IMPA1. The curves were fitted using four-parameter logistic regression.
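For readers unfamiliar with four-parameter logistic (4PL) standard curves, the sketch below shows one common parameterisation and its inverse, which is what converts a measured signal back into a concentration. The parameterisation, the function names and the parameter values are assumptions for illustration; the text does not give the fitted parameters, and the curve-fitting step itself is not shown.

```python
import math

def four_pl(x, a, b, c, d):
    """Signal at concentration x.

    a: signal at zero concentration, d: signal at saturating concentration,
    c: concentration at the inflection point, b: slope factor.
    """
    return d + (a - d) / (1.0 + (x / c) ** b)

def inverse_four_pl(y, a, b, c, d):
    """Concentration for a signal y lying strictly between a and d."""
    return c * ((a - d) / (y - d) - 1.0) ** (1.0 / b)

# Hypothetical fitted parameters for one standard curve.
params = dict(a=0.05, b=1.2, c=50.0, d=2.5)

signal = four_pl(10.0, **params)
recovered = inverse_four_pl(signal, **params)
print(signal, recovered)
assert math.isclose(recovered, 10.0)
```

Signals outside the range spanned by a and d cannot be inverted, which is one reason samples are diluted into the working range of the standard curve.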
  • Figure 20. Swarm plots and boxplots of the plasma Glypican 3 concentration distributions across four subgroups, measured by ELISA.
  • cancer refers to a disease or disorder characterized by uncontrolled cell division. Said uncontrolled cell division may be caused by activating mutations that drive cell division and/or by an increase of survival or apoptosis resistance.
  • Cancer cells may acquire the ability to invade other neighbouring tissues (i.e. invasion), the ability to spread to other areas of the body where the cells are not normally located (i.e. metastasis) and/or the ability to establish new growth at ectopic sites.
  • non-alcoholic fatty liver disease refers to a liver disorder characterised by accumulation of fat in the liver (i.e. hepatic steatosis).
  • NAFLD occurs in the absence of demonstrable secondary causes like alcoholism, viral infections, medications, toxins, or congenital defects (Sheka et al., 2020. JAMA 323: 1175-1183).
  • NAFLD presents in four main stages: stage 1 is simple fatty liver (i.e.
  • NASH non-alcoholic steatohepatitis
  • stage 2 is non-alcoholic steatohepatitis (NASH) (i.e. steatosis with lobular inflammation but no fibrosis or balloon cells);
  • stage 3 is fibrosis (i.e. scarring) of the liver and
  • stage 4 is cirrhosis (i.e. irreversible, advanced scarring of the liver).
  • NAFLD includes any stage or degree of the disease.
  • non-alcoholic steatohepatitis (NASH) refers to a subtype of NAFLD, wherein inflammation of the liver is caused by fat build-up. NASH is not associated with alcohol consumption.
  • the term “NASH” may encompass steatosis, hepatocellular ballooning and lobular inflammation.
  • Risk factors include chronic active hepatitis B, hepatitis C, and liver cirrhosis (e.g. caused by a hepatitis B or C virus infection, alcoholic liver disease, NAFLD, NASH, primary biliary cholangitis, primary hemochromatosis, auto-immune hepatitis, alpha-1 antitrypsin deficiency, or Wilson’s disease).
  • HCC can be subdivided into different HCC stages depending on the size of the tumour and whether there is spread to lymph nodes or other body tissues.
  • the most commonly used staging system for HCC is the Barcelona Clinic Liver Cancer (BCLC) staging system.
  • BCLC Barcelona Clinic Liver Cancer
  • the BCLC system takes into account a so-called ‘performance status’ of a patient.
  • a performance status of 0 means a patient is well, i.e.
  • a performance status of 1 means a patient is well enough to do everything he normally does, except for heavy work; a performance status of 2 means a patient is up for most of the day, but not well enough to work; a performance status of 3 means a patient needs to rest for more than half the day and needs some help in looking after himself; a performance status of 4 means a patient is in bed or chair all day and needs help looking after himself.
  • the BCLC system further takes into account the functioning of the liver, e.g. measured using the so-called Child-Pugh system.
  • Child-Pugh group A means a patient’s liver is working normally; Child-Pugh group B means there is some liver damage; Child-Pugh group C means there is a lot of liver damage.
  • Barcelona stage 0 indicates very early stage HCC, wherein a patient has a single liver tumour that measures less than 2 cm across.
  • a patient is well (performance status 0) and the patient’s liver is working normally (Child-Pugh A).
  • Barcelona stage A indicates early stage HCC, wherein a patient has either a single tumour (of any size) or up to 3 tumours that are all less than 3 cm across.
  • a patient is well (performance status 0) and the patient’s liver may be working normally or there may be some liver damage (Child-Pugh A or B).
  • Barcelona stage B indicates intermediate stage HCC.
  • In stage B, a patient has multiple tumours in their liver but is well overall (performance status 0).
  • the patient’s liver is working normally or there is only moderate liver damage (Child-Pugh A or B).
  • Barcelona stage C indicates advanced HCC.
  • the liver tumours have grown into blood vessels, or have spread to lymph nodes or other body organs.
  • a patient may not feel as well as normal and may be less active but is still reasonably fit (performance status 1 or 2).
  • In this stage, the patient’s liver is still working normally or there may be moderate liver damage (Child-Pugh A or B).
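The stage descriptions above can be summarised as a small lookup. The sketch below encodes only the combinations spelled out in this text and returns None for anything else, such as Child-Pugh C, performance status 3 or 4, or invasion with performance status 0. It is a reading aid under those assumptions, with hypothetical function and parameter names, and not a clinical staging tool.

```python
def bclc_stage(n_tumours, largest_cm, performance_status, child_pugh,
               invasion_or_spread=False):
    """Return '0', 'A', 'B' or 'C' for combinations described in the text, else None."""
    if child_pugh not in ("A", "B"):
        return None  # Child-Pugh C is not covered by the stages described above
    if invasion_or_spread:
        # Stage C: vascular invasion or spread, with performance status 1 or 2
        return "C" if performance_status in (1, 2) else None
    if performance_status != 0:
        return None
    if n_tumours == 1 and largest_cm < 2 and child_pugh == "A":
        return "0"  # very early stage: single tumour under 2 cm
    if n_tumours == 1 or (2 <= n_tumours <= 3 and largest_cm < 3):
        return "A"  # early stage: single tumour, or up to 3 tumours all under 3 cm
    if n_tumours >= 2:
        return "B"  # intermediate stage: multiple tumours
    return None

print(bclc_stage(1, 1.5, 0, "A"))
print(bclc_stage(3, 2.5, 0, "B"))
print(bclc_stage(4, 2.0, 0, "A"))
print(bclc_stage(2, 5.0, 1, "B", invasion_or_spread=True))
```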
  • the term “sample” comprises any sample of bodily fluid or tissue obtained from an individual in order to type the individual for a liver disorder, here HCC.
  • a sample such as blood, plasma or serum, may comprise protein products from liver cancer cells from an individual.
  • a sample such as a tumour or liquid biopsy may comprise liver cancer cells from an individual.
  • the term “typing of a sample” refers to the classification of a sample based on characterized features. In this invention, typing includes the determination of protein concentrations in a sample, which may assist in the characterisation of an individual from which the sample is obtained as having a hepatocellular carcinoma (HCC), or as not likely to have an HCC.
  • protein concentration refers to a quantifiable level of a protein of interest.
  • a protein concentration can be expressed in absolute units, e.g. mg/ml or mol/ml, or can be expressed in relative units, such as relative to an internal reference standard e.g. relative fluorescence units (RFU).
  • RFU relative fluorescence units
  • the term “individual at risk of having or developing HCC”, refers to an individual having one or more conditions which are considered a risk factor for development of HCC. Such an individual is usually recommended by their physician or clinical expert to participate in an HCC surveillance program.
  • risk factors are: cirrhosis, fibrosis, chronic hepatitis B, chronic hepatitis C, alcoholic liver disease, NAFLD, NASH, primary biliary cholangitis, primary hemochromatosis, auto-immune hepatitis, alpha-1 antitrypsin deficiency and Wilson’s disease (Marrero et al., 2018. Hepatology 68(2): 723-750).
  • the term “curative treatment”, refers to a treatment that aims to cure a disease (here HCC) or to improve or alleviate symptoms associated with a disease (here HCC).
  • the term “palliative treatment”, refers to a treatment or therapy that does not aim at curing a disease but rather at providing relief.
  • the term “capturing molecule”, refers to a molecule that is able to specifically bind or attach to a marker protein.
  • capturing molecules include an antibody, antibody-like molecules such as a designed ankyrin repeat protein, a binding protein that is based on a Z domain of protein A, a binding protein that is based on a fibronectin type III domain, engineered lipocalin, and a binding protein that is based on a human Fyn SH3 domain (Skerra, 2007. Current Opinion Biotechnol 18: 295-304; Škrlec et al., 2015. Trends Biotechnol 33: 408-418), and an aptamer.
  • a capturing molecule may be used in methods of the invention to determine the concentration of a marker protein.
  • the term “antibody”, refers to an antigen binding protein comprising at least a heavy chain variable region (Vh) that binds to a target epitope.
  • the term antibody includes monoclonal antibodies comprising immunoglobulin heavy and light chain molecules, single heavy chain variable domain antibodies, and variants and derivatives thereof, including chimeric variants of monoclonal and single heavy chain variable domain antibodies.
  • the term “aptamer”, refers to a single-stranded nucleic acid molecule (e.g. DNA or RNA) or peptide that specifically binds to a marker protein. An aptamer usually binds to its target with high affinity, such as an affinity in the picomolar range.
  • binding refers to the ability of a capturing molecule to interact with a marker protein. Preferably, said binding is specific.
  • the terms ‘specific’ or ‘specificity’ or grammatical variations thereof refer to the number of different proteins to which a particular capturing molecule can bind.
  • the specificity of a capturing molecule can be determined based on affinity.
  • a specifically binding capturing molecule interacts with a marker protein with an affinity, expressed as a dissociation constant, that is at least 2 times lower than the affinity of the capturing molecule for another protein, preferably at least 5 times lower, at least 10 times lower, such as at least 20 times lower.
  • a specific capturing molecule preferably has a binding affinity (dissociation constant) for its specific marker protein of less than 10⁻⁷ M, such as less than 10⁻⁸ M, or even lower.
  • concentrations of marker proteins can be determined in an individual’s sample.
  • a sample may be any type of biological sample obtained from an individual, wherein the concentrations of marker proteins of the invention can be determined.
  • a sample may comprise liver cancer cells from an individual, or suspected to comprise liver cancer cells from an individual, such as a tumour or liquid biopsy.
  • samples include a blood sample, a serum sample, a plasma sample, a lymphatic fluid sample, a saliva sample, a urine sample, a tissue sample or an extract of any of the aforementioned samples.
  • the sample is a blood, plasma or serum sample.
  • the sample is a plasma sample.
  • the sample may be collected in any clinically acceptable manner, but is preferably collected and conserved such as to preserve at least the proteins present in the sample.
  • the sample may be homogenized, or extracted with a solvent in order to obtain a liquid sample, prior to determining the concentration of one or more marker proteins.
  • Liquid samples may be subjected to one or more pre-treatments prior to use in the present invention.
  • Pre-treatments include, but are not limited to dilution, filtration, centrifugation, concentration, sedimentation, precipitation or dialysis.
  • Pre-treatments may also include the addition of chemical or biochemical substances to the solution, such as acids, bases, buffers, salts, solvents, reactive dyes, detergents, emulsifiers, chelators.
  • a sample may comprise serum, which is prepared, for example, by coagulation of platelets, for example at room temperature, followed by centrifugation at low speed, such as between 2000 g and 5000 g, preferably at about 3000 g. Centrifugation preferably is performed at room temperature, preferably between 20 °C and 25 °C.
  • a plasma sample in the context of the present invention is a substantially cell-free supernatant of blood containing anticoagulant obtained after centrifugation.
  • anticoagulants include calcium ion binding compounds such as EDTA or citrate and thrombin inhibitors such as heparinates or hirudin.
  • Cell-free plasma can be obtained by centrifugation of the anticoagulated blood (e.g. citrated, EDTA or heparinized blood), for example for at least 15 minutes at 2000 to 3000 g.
  • a tissue sample preferably is disrupted for example by homogenization, for example by application of pressure, ultrasound or by mechanical homogenization, as is known to the skilled person.
  • the sample may be obtained from an individual with HCC or with a likelihood of being typed as HCC.
  • Said individual may present liver disease symptoms or risk factors such as cirrhosis, fibrosis, a history of NAFLD and/or NASH, or a history of hepatitis B or C.
  • a sample may be obtained from an individual without any liver disease symptoms or risk factors.
  • Said individual may not have any liver disease symptoms or risk factors.
  • the sample is obtained from an individual who is at-risk of having or developing HCC.
  • Marker proteins. The invention provides a set of at least 2, preferably at least 3, more preferably at least 4, marker proteins whose concentration is correlated with HCC. Said at least 2, preferably at least 3, more preferably at least 4, marker proteins are selected from the list of proteins provided in Table 1.
  • a proper characterisation of an individual at an early stage of the disease may be part of an approach for optimal treatment of said individual.
  • Said characterisation may help a physician in selecting a treatment strategy for said individual.
  • a set of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55 or 56 marker proteins from the marker proteins listed in Table 1 is used, such as all 57 proteins listed in Table 1.
  • Said set of at least 2 marker proteins comprises PEX14 and KLRG2, PEX14 and ARL4D, PEX14 and RAB38, PEX14 and PKD2, PEX14 and NKG2E, KLRG2 and ARL4D, KLRG2 and RAB38, KLRG2 and PKD2, KLRG2 and NKG2E, ARL4D and RAB38, ARL4D and PKD2, ARL4D and NKG2E, RAB38 and PKD2, RAB38 and NKG2E, or PKD2 and NKG2E.
  • said set of at least 2 marker proteins comprises PEX14 and GALNS, PEX14 and CP2CJ, PEX14 and IMPA1, KLRG2 and RAB38, KLRG2 and GALNS, KLRG2 and CP2CJ, or KLRG2 and IMPA1.
  • a preferred set of marker proteins comprises at least one combination of 3 marker proteins, selected from the group of the following combinations: PEX14, KLRG2, and ARL4D; PEX14, KLRG2 and RAB38; PEX14, KLRG2 and PKD2; PEX14, KLRG2 and NKG2E; KLRG2, ARL4D, and RAB38; KLRG2, ARL4D and PKD2; KLRG2, ARL4D and NKG2E; ARL4D, RAB38 and PKD2; ARL4D, RAB38 and NKG2E; RAB38, PKD2 and NKG2E; PEX14, KLRG2, and RAB38; PEX14, KLRG2 and GALNS; PEX14, KLRG2 and CP2CJ; PEX14, KLRG2 and IMPA1; KLRG2, RAB38 and GALNS; KLRG2, RAB38 and CP2CJ; KLRG2, RAB38 and
  • a preferred set of marker proteins comprises at least one combination of 4 marker proteins; selected from the group of the following combinations: PEX14, KLRG2, ARL4D and RAB38; PEX14, KLRG2, ARL4D and PKD2; PEX14, KLRG2, ARL4D and NKG2E; KLRG2, ARL4D, RAB38 and PKD2; KLRG2, ARL4D, PKD2 and NKG2E; ARL4D, RAB38, PKD2, NKG2E; PEX14, KLRG2, RAB38 and GALNS; PEX14, KLRG2, RAB38 and CP2CJ; PEX14, KLRG2, RAB38 and IMPA1; KLRG2, RAB38, GALNS and CP2CJ; KLRG2, RAB38, GALNS and IMPA1; RAB38, GALNS, CP2CJ and IMPA1; PEX14, KLRG2, RAB38 and Glypican
  • a preferred set of marker proteins comprises at least 2, preferably at least 3, more preferably at least 4, more preferably at least 5, most preferably 6, of the proteins selected from PEX14, KLRG2, ARL4D, RAB38, PKD2, NKG2E; from Glypican 3, PEX14, KLRG2, ARL4D, RAB38 and PKD2, or from PEX14, KLRG2, RAB38, GALNS, CP2CJ and IMPA1.
  • a preferred set of 4 marker proteins is selected from the group of following combinations: Glypican 3, PEX14, KLRG2 and ARL4D; Glypican 3, PEX14, KLRG2 and RAB38; Glypican 3, PEX14, KLRG2 and PKD2; Glypican 3, PEX14, ARL4D and RAB38; Glypican 3, PEX14, ARL4D and PKD2; Glypican 3, PEX14, RAB38 and PKD2; Glypican 3, KLRG2, ARL4D and RAB38; Glypican 3, KLRG2, ARL4D and PKD2; Glypican 3, KLRG2, RAB38 and PKD2; Glypican 3, ARL4D, RAB38 and PKD2; PEX14, KLRG2, ARL4D and RAB38; PEX14, KLRG2, ARL4D and PKD2; PEX14, KLRG2, RAB38 and PKD2; PEX14,
  • the marker proteins are provided in Table 1 with their protein name, Entrez gene symbol, Uniprot ID and the relation of the marker protein’s concentration in HCC compared to the concentration determined in control samples (i.e. NASH, non-HCC, samples) (up/down-regulation).
  • An upregulation of a marker protein’s concentration (indicated as “up” in Table 1) means that the concentration of said marker protein is increased in an individual with HCC when compared to a control.
  • a downregulation of a marker protein’s concentration (indicated as “down” in Table 1) means that the concentration of said marker protein is decreased in an individual with HCC when compared to a control.
  • Table 1 Overview of early-stage HCC protein markers.
  • the “Up/down”-column indicates upregulation or downregulation of the respective protein in NASH-HCC individual’s samples compared with NASH (non HCC) individual’s samples.
  • EGF-containing fibulin-like extracellular matrix protein 1 (EFEMP1; Q12805): up. Fibulin-5 (fibulin 5)
  • Tumor necrosis factor receptor superfamily member 11B (TNFRSF11B; O00300): up. Vesicular integral-membrane protein VIP36 (lectin, mannose-binding 2; LMAN2; Q12907): up
  • Determining the concentration of marker proteins can be accomplished by any means known in the art, such as enzyme-linked immunosorbent assay (ELISA), radioimmunoassay (RIA), mass-spectrometric (MS) detection, western blot, flow cytometric immunoassay (FCIA), Fluorescence Resonance Energy Transfer (FRET), antigen capture assays (including dipstick antigen capture assays), surface plasmon resonance (SPR), quartz crystal microbalance (QCM), and any other acoustic, photonic, plasmonic or electrochemical version thereof, either in direct mode or resonance mode.
  • Preferred methods employ commercially available antibodies or functional parts thereof.
  • assays exist for determining the concentration of one or more proteins, which usually consist of a solid support on which various capturing molecules such as antibodies specific for the marker proteins described herein, antibody fragments and aptamers, are deposited (in technical jargon called ‘spotted’), usually in an orderly manner and at a specific and defined density.
  • By binding its own target protein and thereby isolating it from a complex mixture, such as a cell lysate, each of these capturing molecules allows the specific protein of interest to be highlighted and quantified.
  • concentrations of multiple marker proteins are assessed simultaneously, by any means known in the art such as multiplex platforms.
  • the concentration of one or more marker proteins is determined using a multiplex platform such as BioPlex, MSD, Somalogic and RBM, or a platform based on singleplex or multiplex ELISA, or a lateral flow assay.
  • Some multiplex platforms (including BioPlex, MSD and Myriad RBM) as well as ELISAs determine the absolute concentration of protein in the samples, while other platforms, e.g. Somalogic, measure only relative concentrations of protein.
  • multiplex platforms used for quantitative determination of proteins are usually, similar to ELISA, immunoassays where analytes are “sandwiched” between a capture- and a detection antibody before detection.
  • the detection antibody is usually conjugated to an enzyme, that after the addition of enzyme substrate catalyses a reaction leading to color development in the microtiter plate.
  • the intensity of the color is measured by spectrophotometry and corresponds to the amount of the specific protein to be detected in the unknown sample.
  • Another method developed by Meso Scale Discovery (www.mesoscale.com), applies electrochemiluminescence to quantify the proteins in a microtiter plate. This assay can be multiplexed since capture antibodies specific for different targets can be bound to distinct spots in a microtiter plate.
  • the Fc- parts of the capture antibodies are bound to groups of fluorescent microspheres. Each group of microspheres has slightly different fluorescence intensity and is covered with antibodies recognizing a distinct protein.
  • the detection antibody is coupled to a fluorescent molecule to enable detection.
  • With the Luminex method, two systems to quantify the level of protein are available.
  • One of the systems, applied by RBM (Q2 Solutions, Durham, NC), is flow-based.
  • the other microsphere-based system utilizes magnetic fluorescent beads and is used by the BioPlex kit (BioRad, Hercules, CA).
  • Somalogic (Boulder, CO) has developed a multiplex method for relative protein quantification of up to over 1100 analytes in one sample (www.somalogic.com). This technique is based on aptamer binding. Aptamers are folded, single-stranded, anionic oligonucleotides that can bind proteins with high specificity and affinity.
  • Somalogic has developed Slow Off-rate Modified Aptamers, called SOMAmers. These are modified aptamers that dissociate more slowly from their target protein than normal aptamers (Gold et al., 2010. PLoS ONE 5: e15004).
  • the concentration of one or more marker proteins can also be deduced from the expression level determined for genes corresponding to one or more marker proteins.
  • the determination of gene expression levels of genes corresponding to one or more marker proteins can be accomplished by any means known in the art such as Northern blotting, quantitative PCR (qPCR), microarray analysis or RNA-seq.
  • Microarray analysis involves the use of selected probes that are immobilized on a solid surface, termed an array.
  • Said probes are able to hybridize to gene expression products such as mRNA, or derivatives thereof such as cDNA.
  • the probes are exposed to labelled gene expression products, or labelled derivatives thereof such as labelled cDNA, then hybridized and washed, after which the abundance of gene expression products or derivatives thereof in the sample that are complementary to a probe is determined by determining the amount of label that remains associated with a probe.
  • the probes on a microarray may comprise DNA sequences, RNA sequences, or copolymer sequences of DNA and RNA.
  • the probes may also comprise DNA and/or RNA analogues such as, for example, nucleotide analogues or peptide nucleic acid molecules (PNA), or combinations thereof.
  • sequences of the probes may be full or partial fragments of genomic DNA.
  • the sequences may also be in vitro synthesized nucleotide sequences, such as synthetic oligonucleotide sequences.
  • a probe preferably is specific for a gene expression product of a gene coding for a marker protein listed in Table 1.
  • a probe is specific when it comprises a continuous stretch of nucleotides that is complementary, over the whole length, to a nucleotide sequence of a gene expression product, or a cDNA product thereof.
  • a probe can also be specific when it comprises a continuous stretch of nucleotides that is partially complementary to a nucleotide sequence of a gene expression product of said gene, or a cDNA product thereof.
  • Partially means that a maximum of 5 nucleotides, more preferably 4 nucleotides, more preferably 3 nucleotides, more preferably 2 nucleotides and most preferably one nucleotide differs from the corresponding nucleotide sequence of a gene expression product of said gene.
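  • The mismatch rule above can be illustrated with a minimal sketch. It assumes DNA sequences written 5'→3' over the letters A/C/G/T and a probe that is compared position by position with the reverse complement of an equally long target stretch. The function names and the example sequence are hypothetical and are not taken from this document.

```python
# Hypothetical helpers; the example sequence is illustrative only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_mismatches(probe: str, target: str) -> int:
    """Count probe positions that do not base-pair with the target stretch.

    Both sequences are written 5'->3' and must have the same length.
    """
    if len(probe) != len(target):
        raise ValueError("probe and target stretch must have equal length")
    expected = reverse_complement(target)
    return sum(p != e for p, e in zip(probe.upper(), expected))


def is_specific(probe: str, target: str, max_mismatches: int = 5) -> bool:
    """Apply the 'at most N differing nucleotides' rule (N = 5 by default)."""
    return count_mismatches(probe, target) <= max_mismatches


target = "ATGGCCTTAGGCTAACGGTA"          # illustrative cDNA stretch
perfect_probe = reverse_complement(target)
# Introduce a single mismatch at the first position of the probe.
first = "A" if perfect_probe[0] != "A" else "C"
one_off_probe = first + perfect_probe[1:]
```

  • Lowering `max_mismatches` to 1 corresponds to the most preferred embodiment, in which at most one nucleotide differs.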
  • complementary is known in the art and refers to a sequence that is related by base-pairing rules to the sequence that is to be detected. It is preferred that the sequence of the probe is carefully designed to minimize nonspecific hybridization to said probe. The specificity of the probe is further determined by the hybridization and/or washing conditions.
  • the hybridization and/or washing conditions are preferably stringent, which are determined by inter alia the temperature and salt concentration of the hybridization and washing conditions, as is known to a person skilled in the art.
  • An increased stringency will substantially reduce non-specific hybridization to a probe, while specific hybridization is not substantially reduced.
  • Stringent conditions include, for example, washing steps for five minutes at room temperature in 0.1x sodium chloride-sodium citrate buffer (SSC)/0.005% Triton X-102.
  • More stringent conditions include washing steps at elevated temperatures, such as 37 °Celsius, 45 °Celsius, or 65 °Celsius, whether or not combined with a reduction in ionic strength of the buffer to 0.05x SSC or even 0.01x SSC, as is known to a skilled person.
  • the probe is, or mimics, a single stranded nucleic acid molecule.
  • the length of a probe can vary between 15 bases and several kilobases, and is preferably between 20 bases and 1 kilobase, more preferred between 40 and 100 bases, and most preferred about 60 nucleotides.
  • Said probe is preferably identical over the whole length to a nucleotide sequence of a gene expression product of a gene, or a cDNA product thereof.
  • gene expression products in the sample are preferably labelled, either directly or indirectly, and contacted with probes on the array under conditions that favour duplex formation between a probe and a complementary molecule in the labelled gene expression product sample.
  • the amount of label that remains associated with a probe after washing of the microarray can be determined and is used as a measure for the gene expression level of a nucleic acid molecule that is complementary to said probe.
  • Image acquisition and data analysis can subsequently be performed to produce an image of the surface of the hybridized array.
  • the array may be dried and placed into a laser scanner to determine the amount of labelled sample that is bound to a probe at a predetermined spot. Laser excitation will yield an emission with characteristic spectra that is indicative of the labelled sample that is hybridized to a probe molecule.
  • An array preferably comprises multiple spots encompassing a specific probe. A probe preferably is present in duplicate, in triplicate, in quadruplicate, in quintuplicate, in sextuplicate or in octuplicate on an array. The multiple spots preferably are at randomized positions on an array to minimize bias.
  • the amount of label that remains associated with a particular probe at each spot may be averaged, where after the averaged level can be used as a measure for the gene expression level of a nucleic acid molecule that is complementary to said probe.
  • a gene product may be hybridized to two or more different probes that are specific for that gene product.
  • the determined RNA expression level can be normalized for differences in the total amounts of nucleic acid expression products between two separate samples by comparing the level of expression of one or more genes that are presumed not to differ in expression level between samples, such as glyceraldehyde-3-phosphate dehydrogenase, β-actin, and ubiquitin.
  • the array may comprise specific probes that are used for normalization. These probes may detect RNA products from housekeeping genes such as glyceraldehyde-3-phosphate dehydrogenase and 18S rRNA levels, or a set of normalization genes such as provided in WO 2008/039071, which is hereby incorporated by reference, of which the RNA level is thought to be constant in a given cell and independent from the developmental stage or prognosis of said cell.
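  • Normalization against genes presumed to be constant can be sketched as follows. This is one possible scheme, not necessarily the one used by the applicant: each signal is log2-transformed and the mean log2 signal of the reference genes is subtracted. The gene symbols GAPDH and ACTB stand for glyceraldehyde-3-phosphate dehydrogenase and β-actin; the signal values and the function name are invented for illustration.

```python
from math import log2
from statistics import mean


def normalize_to_reference(sample: dict, reference_genes: set) -> dict:
    """Express each non-reference gene relative to the reference genes.

    Returns log2(signal) minus the mean log2 signal of the reference genes,
    which removes a sample-wide scaling factor such as total RNA input.
    """
    ref_level = mean(log2(sample[g]) for g in reference_genes)
    return {
        gene: log2(signal) - ref_level
        for gene, signal in sample.items()
        if gene not in reference_genes
    }


reference = {"GAPDH", "ACTB"}
# Sample B carries twice as much total material as sample A (invented values).
sample_a = {"GAPDH": 1000.0, "ACTB": 4000.0, "PEX14": 500.0}
sample_b = {"GAPDH": 2000.0, "ACTB": 8000.0, "PEX14": 1000.0}

norm_a = normalize_to_reference(sample_a, reference)
norm_b = normalize_to_reference(sample_b, reference)
```

  • After normalization the two samples give the same value for PEX14, because the twofold difference in total input cancels out.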
  • NGS next-generation sequencing
  • RNA samples preferably RNA samples, with or without prior amplification of the RNA expression products.
  • NGS platforms, including Illumina® sequencing, Roche 454 pyrosequencing®, Ion Torrent and Ion Proton sequencing, and ABI SOLiD® sequencing, allow sequencing of fragments of DNA in parallel. Bioinformatics analyses are used to piece these fragments together by mapping the individual reads. Each base is sequenced multiple times, providing high depth to deliver accurate data and an insight into unexpected DNA variation.
  • NGS can be used to sequence a complete exome including all genes or, alternatively, to sequence a number of individual genes.
  • Pyrosequencing detects the release of inorganic pyrophosphate (PPi) as particular nucleotides are incorporated into the nascent strand (Ronaghi et al., 1996. Analytical Biochemistry 242: 84-9; Ronaghi, 2001. Genome Res 11: 3-11; Ronaghi et al., 1998. Science 281: 363; U.S. Patent No. 6,210,891; U.S. Patent No. 6,258,568; and U.S. Patent No. 6,274,320, which are all incorporated herein by reference).
  • NGS also includes so called third generation sequencing platforms, for example nanopore sequencing on an Oxford Nanopore Technologies platform, and single-molecule real-time sequencing (SMRT sequencing) on a PacBio platform, with or without prior amplification of the RNA expression products.
  • Further high throughput sequencing techniques include, for example, sequencing-by-synthesis. Sequencing-by-synthesis or cycle sequencing can be accomplished by stepwise addition of nucleotides containing, for example, a cleavable or photobleachable dye label as described, for example, in U.S.
  • Sequencing techniques also include sequencing by ligation techniques. Such techniques use DNA ligase to incorporate oligonucleotides and identify the incorporation of such oligonucleotides and are inter alia described in U.S. Patent No. 6,969,488; U.S. Patent No. 6,172,218; and U.S. Patent No. 6,306,597.
  • Sequencing techniques can be performed by directly sequencing RNA, or by sequencing an RNA-to-cDNA converted nucleic acid library. Most protocols for sequencing RNA samples employ a sample preparation method that converts the RNA in the sample into a double-stranded cDNA format prior to sequencing. Conversion of RNA into cDNA and/or cRNA using a reverse-transcriptase enzyme such as M-MLV reverse-transcriptase from Moloney murine leukemia virus, or AMV reverse-transcriptase from avian myeloblastosis virus, is known to a person skilled in the art.
  • FISSEQ fluorescent in situ sequencing
  • MPSS Massively Parallel Signature Sequencing
  • Quantitative PCR is a technique which is used to amplify and simultaneously quantify a template nucleic acid molecule such as DNA or RNA.
  • RT-PCR real-time PCR
  • the detection of the amplification product can in principle be accomplished by any suitable method known in the art.
  • the amplified products may be directly stained or labelled with radioactive labels, antibodies, luminescent dyes, fluorescent dyes, or enzyme reagents.
  • Direct DNA stains include for example intercalating dyes such as acridine orange, ethidium bromide, ethidium monoazide or Hoechst dyes.
  • intercalating dyes are non-specific and bind to all double stranded DNA in the PCR.
  • Another direct DNA detection method includes the use of sequence specific DNA probes consisting of a fluorescent reporter and quencher. Upon binding of the probe to its complementary sequence, polymerases of the PCR break the proximity of the reporter and the quencher, resulting in the emission of fluorescence.
  • Commonly used reporter dyes include FAM (Applied Biosystems), HEX (Applied Biosystems), ROX (Applied Biosystems), YAK (ELITech Group) or VIC (Life Technologies) and commonly used quenchers include TAMRA (Applied Biosystems), BHQ (Biosearch Technologies) and ZEN (Integrated DNA Technologies).
  • the amplified product may be detected by incorporation of labelled dNTP bases into the synthesized DNA fragments.
  • Detection labels which may be associated with nucleotide bases include, for example, fluorescein, cyanine dye and BrdUrd.
  • a multiplex qPCR can be used. In multiplex qPCRs, two or more template nucleic acid molecules are amplified and quantified in the same reaction. A commonly used method of achieving the simultaneous detection of multiple targets, is by using probes with different fluorescent dyes to distinguish distinct nucleic acid targets. RT-PCR can also be used as a proxy to quantify protein concentrations.
  • a protein can be detected by binding to an aptamer, followed by determining the amount of bound aptamer by RT-PCR.
  • Said aptamer is, for example, a chemically optimized aptamer such as a SOMAmer® (Slow Off-rate Modified Aptamer).
  • the protein of interest can be bound to a surface after which the aptamer is incubated in order to bind the protein of interest. After washing any unbound aptamers, the bound aptamers can be eluted and quantified by RT-PCR.
  • a weak binding nucleic acid molecule NAM
  • genes are selected for normalization of the raw data.
  • Preferred genes are genes of which the RNA expression levels are largely constant between individual samples comprising HCC cancer cells from one individual, and between samples comprising HCC cancer cells from different individuals. It will be clear to a skilled artisan that the RNA levels of said set of normalization genes preferably allow normalization over the whole range of RNA levels.
  • Normalization methods that may be employed include, for example, mean correction, linear combination of factors, Bayesian methods and non-linear normalization methods such as quantile normalization.
  • Preferred methods include non-parametric regression methods such as locally estimated scatterplot smoothing (LOESS; Jacoby, 2000. Electoral Studies 19: 577–613) and locally weighted scatterplot smoothing (LOWESS; Cleveland et al., 1988. J American Statistical Association 83: 596–610).
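  • Of the normalization methods listed above, quantile normalization is simple enough to sketch in a few lines. The version below is a minimal illustration that assumes no tied values within a sample and equal numbers of measurements per sample. The function name and the input values are invented. LOESS and LOWESS are not shown.

```python
def quantile_normalize(samples):
    """Quantile-normalize a list of samples (each a list of measurements).

    Every sample is forced onto the same distribution: the value at rank r
    in a sample is replaced by the mean of the rank-r values of all samples.
    Ties within a sample are not given special treatment in this sketch.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must contain the same number of values")

    # Mean of the r-th smallest value across samples, for every rank r.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[r] for s in sorted_samples) / len(samples) for r in range(n)
    ]

    normalized = []
    for s in samples:
        # Indices of the sample's values ordered from lowest to highest.
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


raw = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]   # two samples, three markers
normalized = quantile_normalize(raw)
```

  • After this step every sample contains the same set of values, and only the assignment of values to markers differs between samples.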
  • the difference or similarity between a sample’s protein concentration profile and a previously established reference protein concentration profile may be determined.
  • the sample’s protein concentration profile is composed of the protein concentrations of a set of marker proteins in said sample.
  • the reference protein concentration profile is composed of the average protein concentrations of the same set of marker proteins in a sample from a reference group.
  • the reference group may comprise a single individual.
  • the reference group comprises the average expression levels of at least 3, 5, 10, 25, 50, 100, 200 or 300 individuals.
  • the reference group may include individuals with different non-HCC diagnoses.
  • the reference group may also include individuals that all have HCC (i.e. HCC reference group) or individuals not having HCC (i.e. non-HCC reference group).
  • a protein concentration profile of an individual can also be typed by comparing the individual’s protein concentration profile to multiple reference profiles. For example, the individual’s protein concentration profile can be compared to both reference profiles identified above (i.e. the HCC reference group and the non-HCC reference group). If the protein concentration profile of the individual’s sample is substantially more similar to the HCC reference group, when compared to the non-HCC reference group, it will be typed as HCC.
  • the difference or similarity between a protein concentration profile and one or more reference profiles can be determined by determining a correlation of the concentrations of marker proteins in the profiles. For example, one can determine whether the protein concentration of a subset of marker proteins in a sample correlates to the protein concentration of the same subset of marker proteins in a reference profile.
  • This correlation can be numerically expressed, for example by using a correlation coefficient.
  • Several correlation coefficients can be used. Appropriate methods are established after determining whether concentration profiles are normally distributed, for example using the Shapiro-Wilk test. If the marker protein passes the normality test, an independent sample t-test can be performed to check if there is a statistically significant difference in the marker protein concentrations between different sample cohorts. If the protein concentrations are not normally distributed, a nonparametric analysis can be conducted, e.g. a Mann–Whitney U test. A correction for multiple testing can be performed, for example using the Benjamini-Hochberg method. It is also possible to construct a predictive model, for example using a supervised machine learning algorithm such as random forests or support vector machines.
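  • The multiple-testing correction mentioned above can be sketched without external libraries. The function below computes Benjamini-Hochberg adjusted p-values. The p-values in the example are invented, and the function name is hypothetical. The per-marker tests themselves (Shapiro-Wilk, t-test, Mann-Whitney U) are assumed to have been run beforehand with a statistics package.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    For the p-value with rank r (1 = smallest) among m tests, the raw
    adjustment is p * m / r. Walking from the largest to the smallest
    p-value and keeping a running minimum makes the result monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented p-values for four marker proteins.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
significant = [p <= 0.05 for p in adj_p]
```

  • A marker is then reported as differentially abundant only if its adjusted p-value falls below the chosen false discovery rate, here assumed to be 0.05.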
  • a similarity score is a measure of the average correlation of protein concentrations of a set of proteins in a sample from an individual that is to be typed and a reference profile. Said similarity score can, but does not need to be, a numerical value between +1, indicative of a high correlation between the protein concentration profile of the set of proteins in a sample of said individual and said reference profile, and -1, which is indicative of an inverse correlation.
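  • A similarity score of this kind can be illustrated with the Pearson correlation coefficient, which ranges from -1 to +1. The sketch below compares a sample profile with an HCC reference profile and a non-HCC reference profile and assigns the label of the better-correlated reference. This is a simplification: the text requires the sample to be substantially more similar to one reference, and no margin is applied here. All concentration values and function names are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def type_sample(sample, hcc_reference, non_hcc_reference):
    """Label a sample by the reference profile it correlates with best."""
    score_hcc = pearson(sample, hcc_reference)
    score_non_hcc = pearson(sample, non_hcc_reference)
    label = "HCC" if score_hcc > score_non_hcc else "non-HCC"
    return label, score_hcc, score_non_hcc


# Invented concentrations for four marker proteins, in the same order.
hcc_reference = [8.0, 6.0, 3.0, 2.0]
non_hcc_reference = [3.0, 2.0, 6.0, 7.0]
sample = [7.5, 6.5, 3.5, 2.5]

label, score_hcc, score_non_hcc = type_sample(
    sample, hcc_reference, non_hcc_reference
)
```

  • In this invented example the sample correlates positively with the HCC reference and inversely with the non-HCC reference, so it is labelled HCC.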
  • a threshold can be used to differentiate between samples typed as HCC, and samples typed as non-HCC.
  • Said threshold is an arbitrary value that allows for discrimination between samples from individuals without HCC and samples from individuals with HCC. If a similarity threshold value is employed, it is preferably set at a value at which an acceptable number of individuals with HCC would score as false negatives, and an acceptable number of individuals without HCC would score as false positives. Based on the predictions made by the methods of the invention, one can determine a course of treatment of an individual with HCC. For example, if the individual’s protein concentration profile is not substantially different from the non-HCC group, and/or substantially different from the HCC group, this indicates that the individual is predicted not to have HCC.
  • 4.6 Methods of treating an individual with HCC
  • Early diagnosis greatly improves the odds of successful treatment of HCC, resulting in higher survival rates.
  • Curative treatment options, such as surgical resection, ablation and liver transplantation, are only available to patients with early-stage HCC, while patients with intermediate and advanced-stage HCC can only be provided with palliative care, such as chemoembolization, radioembolization or systemic therapy, which aims to alleviate suffering (Marrero et al., 2018. Hepatology 68: 723-750; Galle et al., 2018. J Hepatol 69: 182-236). Depending on several factors such as the HCC tumor stage, liver function etc, different treatment options are recommended. A multidisciplinary approach for optimal treatment of HCC is proposed by several authors such as Raza and Sood (2014. World J Gastroenterol 20: 4115-4127) and (2020.
  • liver transplantation is a potentially curative treatment and considered as a very effective treatment option, as it removes both the tumor and potential cirrhosis.
  • liver transplantation is recommended for the patients with HCC with BCLC stage A, whose tumor is within the Milan criteria for HCC, meaning that one lesion is not larger than 5 cm, or up to 3 lesions with each 3 cm or smaller.
  • Liver transplantation is the only potentially curative treatment for selected patients with cirrhosis and HCC who are not candidates for surgical resection.
  • Ablation, also called ablative therapy, is another curative treatment option for HCC.
  • Examples of ablative therapy for HCC include radiofrequency ablation (RFA), microwave ablation (MWA), percutaneous ethanol injection (PEI), laser ablation (LSA), cryoablation (CRA), irreversible electroporation (IRE), high intensity focused ultrasound (HIFU) and their combinations.
  • Ablation techniques lead to tumor tissue necrosis through various mechanisms, such as thermal coagulation, rapid freezing and chemical cell dehydration, with different post-ablation effects.
  • Local ablation with RFA is considered a standard of care for the patients with very early and early stage tumors not suitable for surgery.
  • Chemoembolization such as trans-arterial chemoembolization (TACE) is currently considered a standard treatment for patients with intermediate-stage HCC.
  • TARE trans-arterial radioembolization
  • SIRT selective internal radiation therapy
  • the embolizing particles or drug-eluting particles are usually 100-500 µm in size and cause ischemia of the tumor; in radioembolization, however, the microspheres are usually smaller (35 µm in diameter) and deliver radiation to the tumor without causing ischemia of the tumor or liver tissue.
  • Molecular studies of HCC have identified aberrant activation of different signaling pathways, which represent key targets for novel molecular therapies. For patients with advanced disease, sorafenib is the only approved therapy, but novel targeted agents and their combinations are emerging.
  • Systemic therapy for the treatment of HCC, and especially advanced HCC, includes therapy with sorafenib (BAY-43-9006, Nexavar®, Bayer), lenvatinib, nivolumab, regorafenib, cabozantinib, ramucirumab, pembrolizumab, capmatinib, nintedanib, axitinib, dovitinib, decitabine, codrituzumab, bevacizumab, erlotinib, temozolomide, veliparib, resminostat, AEG35156, capecitabine, refametinib, modified FOLFOX, sunitinib, linifanib and/or brivanib.
  • one or more of these agents may be combined with a chemotherapeutic agent such as doxorubicin, octreotide and oxaliplatin, tegafur/uracil, cisplatin and gemcitabine and AVE 1642 (a human monoclonal antibody inhibiting the insulin-like growth factor-1 receptor), or a combination thereof.
  • Methods of treating an individual that is predicted not to have HCC
  • In the case that a patient is predicted not to have HCC, it may not be recommended to provide an HCC treatment.
  • the method of typing for the presence of HCC according to the invention is performed on a sample of an individual who is an individual at risk of having or developing HCC, meaning said individual is having one or more risk factors for development of HCC and may therefore be recommended by their physician or clinical expert to participate in a HCC surveillance program.
  • risk factors are: cirrhosis, fibrosis, chronic hepatitis B, chronic hepatitis C, alcoholic liver disease, NAFLD, NASH, primary biliary cholangitis, primary hemochromatosis, auto-immune hepatitis, alpha-1 antitrypsin deficiency and Wilson’s disease (Marrero et al., 2018. Hepatology 68(2): 723-750). It is recommended for such individual at risk of having or developing HCC to remain included in a surveillance testing program, which means the individual will be regularly tested.
  • the recommended treatment would be to treat the risk factor using the best available options as to limit the risk to develop HCC in the future.
  • the recommended treatment strategy for an individual at risk of having or developing HCC that is predicted not to have HCC thus relates to the individual’s underlying risk factor.
  • Recommended treatment for an individual, that is at risk of having or developing HCC and predicted not to have HCC, with chronic hepatitis B or C may include: antiviral medications, such as entecavir, tenofovir, lamivudine, adefovir and telbivudine; interferon injections, such as interferon alfa-2b (Intron A); or a liver transplant.
  • Recommended treatment for an individual, that is at risk of having or developing HCC and predicted not to have HCC, with alcoholic liver disease may include: professional help to stop or reduce the drinking of alcohol, possibly including help with withdrawal symptoms (which may include medications such as benzodiazepine and psychological therapy such as cognitive behavioural therapy), possibly including help with relapse prevention (which can include psychological therapy and medications such as acamprosate, disulfiram, or naltrexone), possibly including referral to a self-help group; nutritional support to promote a more healthy diet; treatments to reduce inflammation of the liver, such as corticosteroids; or a liver transplant.
  • Recommended treatment for an individual, that is at risk of having or developing HCC and predicted not to have HCC, with NAFLD may include: professional help to promote weight loss through a healthy diet and/or exercise; treatment to reduce blood cholesterol and triglycerides, such as statins (e.g. atorvastatin, rosuvastatin, simvastatin, fluvastatin, pravastatin, lovastatin), treatment to control diabetes, such as insulin, if the individual is also diabetic; treatment to reduce blood pressure, such as a diet change to lower salt intake and/or medications such as angiotensin-converting enzyme inhibitors (e.g.
  • angiotensin-2 receptor blockers e.g. candesartan, irbesartan, losartan, valsartan, olmesartan
  • calcium channel blockers e.g. amlodipine, felodipine, nifedipine, diltiazem, verapamil
  • diuretics e.g. indapamide, bendroflumethiazide
  • beta blockers e.g. atenolol, bisoprolol
  • Recommended treatment for an individual, that is at risk of having or developing HCC and predicted not to have HCC, with NASH may include: professional help to promote weight loss through a healthy diet and/or exercise; treatment to reduce blood cholesterol and triglycerides, such as statins (e.g. atorvastatin, rosuvastatin, simvastatin, fluvastatin, pravastatin, lovastatin), treatment to control diabetes, such as insulin, if the individual is also diabetic; treatment to reduce blood pressure, such as a diet change to lower salt intake and/or medications such as angiotensin-converting enzyme inhibitors (e.g.
  • angiotensin-2 receptor blockers e.g. candesartan, irbesartan, losartan, valsartan, olmesartan
  • calcium channel blockers e.g. amlodipine, felodipine, nifedipine, diltiazem, verapamil
  • diuretics e.g. indapamide, bendroflumethiazide
  • beta blockers e.g. atenolol, bisoprolol
  • Recommended treatment for an individual, that is at risk of having or developing HCC and predicted not to have HCC, with primary biliary cholangitis may include: treatment that could slow the progression of the disease, prevent complications, improve liver function and/or reduce liver scarring, such as ursodeoxycholic acid, obeticholic acid, fibrates, or budesonide; treatments to control symptoms, such as antihistamines, cholestyramine, rifampin, sertraline, or opioid antagonists; or a liver transplant.
  • Recommended treatment for an individual, that is at risk of having or developing HCC and predicted not to have HCC, with primary hemochromatosis may include: therapeutic phlebotomy, i.e.
  • chelation therapy such as deferoxamine or deferasirox.
  • Recommended treatment for an individual, that is at risk of having or developing HCC and predicted not to have HCC, with auto-immune hepatitis may include: corticosteroid medications such as prednisone or budesonide; or immunosuppressants, such as azathioprine.
  • Recommended treatment for an individual, that is at risk of having or developing HCC and predicted not to have HCC, with alpha-1 antitrypsin deficiency may include: help with maintaining normal nutrition; or a liver transplant.
  • the invention provides a method of treating an individual with HCC, comprising: - typing of a sample from said individual using a method of typing a sample of an individual with HCC according to the invention; - treating the individual that is typed as having HCC with a curative treatment; and - treating the individual that is typed as not having HCC with a treatment strategy related to the individual’s underlying risk factor for HCC.
  • Curative HCC treatment preferably comprises liver transplantation, ablation, surgical resection or a combination thereof.
  • Palliative treatment preferably comprises chemoembolization, radioembolization, systemic therapy, or a combination thereof.
  • Example 1 retrospective case-control study for HCC biomarker discovery in NASH patients
  • Materials and methods Patient inclusion criteria EDTA plasma samples were obtained from the biobank of the Transplant Centre of the LUMC.
  • the selected study population was composed of patients with cirrhosis or HCC that were over 18 years of age, with a history of NAFLD (non-alcoholic fatty liver disease) or NASH (non-alcoholic steatohepatitis); this was preferably also the primary liver disease etiology.
  • the NASH etiology was chosen because it is expected to become more dominant in the near future and because early-stage HCC detection in NASH is currently more difficult (Straś et al., 2020. Clin Exp Hepatol 6: 170-175).
  • Inclusion criteria for the cirrhosis control samples were pathologically diagnosed liver cirrhosis, with no diagnosis of HCC within the next year after the sample was taken. Criteria for the HCC case samples were pathologically diagnosed HCC, with the samples being taken prior to treatment or intervention, and without other malignancies present at the time of diagnosis. Patients were excluded if they had a simultaneous diagnosis of another unrelated liver disease, such as hemochromatosis, primary biliary cholangitis, or autoimmune hepatitis. Preferably, the selected patients had not been subjected to chemotherapy, biologic therapy, radiation therapy, or immunosuppressants during the 5 years before drawing a plasma sample.
  • SomaScan assay characteristics The collected plasma samples were analysed using the SomaScan assay (v4.1), a fee-for-service aptamer-based proteomics platform provided by the American company SomaLogic, which provides 7,335 protein concentration measurements across a large number of different biological pathways (https://somalogic.com/somascan-assay/). The provided concentration data is given in relative fluorescence units (RFU), not absolute units such as mg/mL or mol/mL. Each of the 7,335 measurements relates to a unique aptamer sequence, but these are linked to only 6,414 unique Uniprot entries, with 34 aptamers having no related Uniprot entry.
  • NASH-HCC patients Distinguishing NASH-HCC patients from NASH patients was treated as a classification problem.
  • P-values were calculated using independent two-sided t-tests and values < 0.01 were considered significant.
  • P-values were corrected for multiple testing using the Bonferroni (Equation 1) and Benjamini-Hochberg corrections.
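The two corrections named above can be sketched in a few lines. Equation 1 is not reproduced in this excerpt, so the block below uses the textbook definitions of the Bonferroni and Benjamini-Hochberg adjustments rather than the authors' exact formula. The function names and the example p-values are hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (step-up false discovery rate)."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five proteins (illustrative only).
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)
```

With 7,335 measurements, the Bonferroni multiplier is large, which is consistent with the report that no individual protein remained significant after correction.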
  • the data covariance was examined by conducting a principal component analysis (PCA). The results of this PCA were then used to establish an alternative, less conservatively adjusted significance level to correct for multiple testing (Equation 2). Uncorrected p-values were used in subsequent analysis steps.
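Equation 2 is not reproduced in this excerpt. Based on the Figure 2 caption, which says the PC-based cut-off corrects for the number of principal components describing over 99% of the variance, one plausible reading is a Bonferroni-style division by that number of components. The sketch below assumes exactly that. The function name, the default values, and the example variance ratios are hypothetical.

```python
def pc_adjusted_alpha(explained_variance_ratio, alpha=0.01, coverage=0.99):
    """Divide alpha by the number of leading components needed to exceed `coverage`."""
    cumulative = 0.0
    for n_components, ratio in enumerate(explained_variance_ratio, start=1):
        cumulative += ratio
        if cumulative > coverage:
            return alpha / n_components
    raise ValueError("components do not reach the requested coverage")


# Hypothetical explained-variance ratios, sorted from largest to smallest.
ratios = [0.50, 0.30, 0.15, 0.045, 0.005]
alpha_pc = pc_adjusted_alpha(ratios)  # four components exceed 99%, so 0.01 / 4
```

Because correlated protein measurements collapse into far fewer components than there are proteins, this divisor is much smaller than 7,335, which is why the resulting threshold is less conservative than the Bonferroni one.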
  • PCA principal component analysis
  • the goal of the algorithm is to maximize the margin, i.e. the distance from the hyperplane to the support vectors, thereby maximally separating the two classes.
  • SVM analysis was performed on the data using the Python package scikit-learn (v1.1.1) (Pedregosa, 2011. J Mach Learn Res 12: 2825-2830). SVM models were trained on combinations of two, three, and four proteins. As with the biomarker ratios, the analyses were only conducted on subselections of proteins to limit the computation time.
  • Proteins to be combined were selected based on p-value and AUC cutoffs: combinations of two proteins were made for the 185 proteins with a p-value < 0.03 and/or an AUC >0.67; combinations of three proteins for the 61 proteins with a p-value < 0.01 and/or an AUC >0.7; and combinations of four proteins for the 41 proteins with a p-value < 0.006 and/or an AUC >0.705 (see Figure 1).
  • selections of 185 and 61 proteins have 47,239,010 and 521,855 unique combinations of four proteins, respectively.
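The combination counts quoted above follow directly from the binomial coefficient, which shows why the larger selections were restricted to smaller model sizes. A short check using only the Python standard library (variable names are illustrative):

```python
from math import comb

# Number of unordered four-protein combinations for the two larger selections.
four_from_185 = comb(185, 4)
four_from_61 = comb(61, 4)

# Counts for the subselections actually used for each model size.
pairs_from_185 = comb(185, 2)
triples_from_61 = comb(61, 3)
quads_from_41 = comb(41, 4)
```

The subselections actually used yield 17,020 pairs, 35,990 triples, and 101,270 quadruples, several orders of magnitude fewer models than four-protein combinations of all 185 proteins would require.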
  • SVM models were generated for 10 random states.
  • the dataset was randomly split 50/50 into a training set and a test set. Protein expression data of the training and test sets were scaled separately to zero mean and unit variance.
  • three SVM parameters were fine-tuned: the algorithm, the kernel coefficient gamma, and the regularisation parameter C.
  • the algorithm is the function type used for the kernel; either ‘linear’ or ‘radial basis function’ (rbf) was used.
  • the kernel coefficient gamma defines how far the influence of a single training example reaches, with low values meaning ‘far’ and high values meaning ‘close’; the used inputs were 0.01, 0.1, 1 or scale, with scale corresponding to 1/(n_features*X.var()).
  • the regularisation parameter C indicates the trade-off between correct classification of training samples and maximisation of the decision function’s margin, with larger values of C leading to a smaller margin if the decision function is better at classifying all training points correctly, while a lower C promotes a larger margin and a simpler decision function at the cost of training accuracy; for C the inputs ranged from 0.1 to 1 with intervals of 0.1.
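The three tuned parameters define a small search grid. The sketch below enumerates that grid and reproduces the `scale` value of gamma in plain Python. It does not train any model, and the variable names and demo data are hypothetical. How the authors searched the grid (exhaustively or otherwise) is not stated in this excerpt, so the full Cartesian product is an assumption.

```python
from itertools import product
from statistics import pvariance

kernels = ["linear", "rbf"]
gammas = [0.01, 0.1, 1, "scale"]
# C from 0.1 to 1.0 in steps of 0.1, built from integers to avoid float drift.
c_values = [i / 10 for i in range(1, 11)]

grid = [
    {"kernel": k, "gamma": g, "C": c}
    for k, g, c in product(kernels, gammas, c_values)
]


def gamma_scale(X):
    """Value used for gamma='scale': 1 / (n_features * variance of all entries of X)."""
    n_features = len(X[0])
    flat = [value for row in X for value in row]
    # Population variance over all entries matches the X.var() in the formula.
    return 1.0 / (n_features * pvariance(flat))


# Hypothetical scaled data: 4 samples, 2 proteins.
X_demo = [[-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0], [1.0, 1.0]]
```

Each dictionary in `grid` could be passed as keyword arguments to scikit-learn's `SVC`. The kernel coefficient gamma applies to the rbf kernel and not to the linear one, so the effective number of distinct models is smaller than the 80 grid entries.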
  • the maximal Youden index thus provides the optimal trade-off between sensitivity and specificity for a specific model.
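As an illustration of the Youden index, the sketch below scans every observed score as a candidate cut-off and keeps the one that maximises J = sensitivity + specificity - 1. The function name, the convention that scores at or above the threshold are called positive, and the example data are assumptions made for this sketch.

```python
def best_youden(scores, labels):
    """Return (J, threshold) maximising J = sensitivity + specificity - 1.

    A sample is called positive when its score is >= threshold.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    best = (-1.0, None)
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
        j = tp / positives + tn / negatives - 1.0
        # Strict comparison keeps the lowest threshold among ties.
        if j > best[0]:
            best = (j, threshold)
    return best


# Hypothetical decision scores; label 1 = HCC, label 0 = control.
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]
j_max, cutoff = best_youden(scores, labels)
```

In this toy example several thresholds tie at J = 0.5. The sketch returns the lowest of them, which favours sensitivity over specificity.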
  • the first sets of generated SVM models contained proteins which were selected based on calculated p-values and AUCs. It was noted that using only this approach to select proteins for inclusion in the generated SVM models has a risk of introducing a bias. To address this risk, the random forest machine learning algorithm was used to simultaneously rank all 7,335 protein measurements based on their predictive performance in an unbiased manner. First, the dataset was randomly split 50/50 into a training set and a test set.
  • the Boruta ranking method was used to select proteins with better-than-random predictive performance. This method adds copies of all features, so-called Shadow Features, and shuffles the values of the newly created features to remove their correlation with the response. Subsequently, a Random Forest Classifier is built on the Shadow Features to determine their importance quantified by a Z-score. Then, the algorithm assesses if the original features have a higher importance than the maximum importance of the Shadow Features. If the Z-score of an original feature is higher, the feature is considered significant and retained, otherwise the feature is dropped. This process is repeated until either a specified number of iterations has been reached or when all original features have been retained or dropped. The maximum number of iterations per run was set to 100.
  • the discovered protein biomarkers were then ranked to provide an indication of the diagnostic potential of individual markers and to distinguish which markers might be prioritised in the preparation of the follow-up study. This ranking was achieved through the following steps. (1) First the list of SVM models was expanded to include all possible combinations of 2, 3, and 4 of the identified proteins, in order to make a fair comparison. For model combinations which had not been tested yet, new SVM results were obtained using the same approach as described above. (2) Next, the SVM models were combined and sorted by highest average AUC. (3) For each model in the resulting selection, a number equal to the length of the list of models minus the position of the model in the list was added to the performance scores of the markers that the model was composed of.
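Step (3) of the ranking can be sketched as follows. The text does not say whether list positions start at 0 or 1, so counting from 1 is an assumption here. The function name and the AUC values are hypothetical; only the protein names come from the text.

```python
from collections import defaultdict


def score_markers(models):
    """models: list of (proteins, mean_auc). Returns {protein: score}."""
    ranked = sorted(models, key=lambda m: m[1], reverse=True)
    n_models = len(ranked)
    scores = defaultdict(int)
    # Assumption: positions are counted from 1, so the top model adds n_models - 1.
    for position, (proteins, _auc) in enumerate(ranked, start=1):
        for protein in proteins:
            scores[protein] += n_models - position
    return dict(scores)


# Hypothetical models and AUCs (values illustrative only).
models = [
    (("Glypican 3", "PEX14"), 0.90),
    (("Glypican 3", "KLRG2"), 0.88),
    (("PEX14", "KLRG2"), 0.85),
    (("ARL4D", "RAB38"), 0.80),
]
marker_scores = score_markers(models)
```

A protein accumulates a high score by appearing in many models, by appearing in highly ranked models, or both, so the score reflects combined rather than standalone performance.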
  • the package Canopy (v0.4.0) was used to load the data provided by SomaLogic. Other packages that were used extensively included: pandas (v1.4.4); numpy (v1.23.5); scipy (v1.9.3); sklearn (v1.0.2); statsmodels (v0.13.2); matplotlib (v3.6.2); seaborn (v0.12.2); and dataframe-image (v0.1.10). Results Patient characteristics The characteristics of the 78 patients that were included in the clinical study are reported in Table 2. The NASH control cohort and the NASH-HCC case cohort included 40 (51.3%) and 38 (48.7%) patients respectively.
  • IQR interquartile range
  • NASH nonalcoholic steatohepatitis
  • HCC hepatocellular carcinoma
  • BCLC Barcelona Clinic Liver Cancer staging system
  • MELD Model For End-Stage Liver Disease
  • AST aspartate aminotransferase
  • ALT alanine transaminase.
  • the volcano plot in Figure 2 illustrates that 46 proteins were able to significantly differentiate NASH and NASH-HCC patients (p < 0.01). However, no individual proteins could detect HCC upon correcting for multiple testing with either the Bonferroni or the Benjamini-Hochberg correction (data not shown).
  • the ‘wing’ patterns displayed in the bottom left and right of the volcano plot were caused by high variability; for instance, proteins on the bottom right were upregulated in only a few NASH-HCC patients or even just a single patient.
  • Table 3 presents the top 10 proteins in terms of p-value and AUC.
  • the most significant protein, IGFALS, had a p-value of 6.35e-5 and an AUC of 0.75. The maximum AUC of 0.75 was achieved by Glypican 3 and IGFALS.
  • Proteins were selected for inclusion in the generated SVM models if they appeared in at least 1% of the random forest selections; for 1000 iterations this corresponded with a frequency of 10 or more. This resulted in a selection of 32 proteins.
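The frequency cut-off can be expressed as a simple filter over the per-run selections. The sketch below assumes each run yields a list of retained feature names. The function name and the placeholder proteins `protein_X` and `protein_Y` are hypothetical, while the counts for IGFALS and TMED9 are those reported in the text.

```python
from collections import Counter


def frequently_selected(runs, min_percent=1):
    """Keep features retained in at least `min_percent` percent of the runs."""
    counts = Counter(feature for selected in runs for feature in set(selected))
    # Integer comparison avoids floating-point rounding of the cut-off.
    return {f: c for f, c in counts.items() if c * 100 >= min_percent * len(runs)}


# Build 1000 hypothetical runs with known selection counts.
runs = []
for i in range(1000):
    selected = []
    if i < 998:
        selected.append("IGFALS")
    if i < 685:
        selected.append("TMED9")
    if i < 10:
        selected.append("protein_Y")  # exactly at the cut-off, so retained
    if i < 9:
        selected.append("protein_X")  # one short of the cut-off, so dropped
    runs.append(selected)

kept = frequently_selected(runs)
```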
  • the protein IGFALS was most frequently classified as having significant predictive importance, being selected in 998 out of 1000 BorutaPy runs.
  • Glypican 3 and ASAP2 were ranked second and third, being selected 989 and 895 out of 1000 times respectively.
  • Number four, TMED9, was selected less often, at 685 times. After TMED9, the number of selections rapidly decreased, with the last 7 markers being selected fewer than 100 times by the Boruta method.
  • Table 8 shows the results of all possible model combinations between the first 6 of the 57 identified proteins. Note that models combining 5 and 6 different proteins were added as well; these were established at a later stage of the analysis (see the Expanded support vector machine models section). Table 6. Overview of the 57 unique proteins that were found in the selection of SVM models with average AUCs higher than 0.83. The protein score is based on the positions in the list, sorted by average AUC, of all models the protein appeared in.
  • FC fold change of the protein in the NASH-HCC cohort compared to the NASH control cohort.
  • ES effect size.
  • No | Protein | Score | Best model | Best AUC | FC | ES
  • 1 | Glypican 3 | 8977122 | 1 | 0.924 | 4.90 | 0.32
  • 2 | PEX14 | 6861637 | 1 | 0.924 | 1.69 | 0.79
  • 3 | KLRG2 | 6047003 | 1 | 0.924 | 0.43 | -0.64
  • 4 | ARL4D | 3761359 | 4 | 0.912 | 1.89 | 0.50
  • 5 | RAB38 | 3325116 | 2 | 0.917 | 0.46 | -0.29
  • 6 | PKD2 | 2761412 | 1 | 0.924 | 0.74 | -0.48
  • 7 | NKG2E | 2738222 | 3 | 0.915 | 1.65 | 0.33
  • 8 | GALNS | 2462891 | 275 | 0.880 | 1.31 | 0.73
  • 9 | UB2L6 | 2261817 | 16 | 0.905 | 0.82 | -0.66
  • 10 | TM157 | 2087919 | 15 | 0.906 | 1.22 | 0.70
  • 11 | TRA2B | 2033617 | 169 | 0.886 | 1.39
  • n extracellular matrix protein 1
  • 51 | fibulin 5 | Fibulin-5 | FBLN5 | Q9UBX5 | up
  • 52 | RNAS4 | Ribonuclease 4 | RNASE4 | P34096 | up
  • 53 | IGF-I | Insulin-like growth factor I | IGF1 | P05019 | down
  • 54 | PGM5 | PGM5 | PGM5 | Q15124 | up
  • 55 | TMED9 | Transmembrane emp24 domain-containing protein 9 | TMED9 | Q9BVK6 | up
  • 56 | OPG | Tumor necrosis factor receptor superfamily member 11B | TNFRSF11B | O00300 | up
  • 57 | Lectin, mannose-binding 2 | Vesicular integral-membrane protein VIP36 | LMAN2 | Q12907 | up
  • the strongest positive intercorrelations were found between TMED9 (55) and Lectin, mannose-binding 2 (57) at 0.93, between Glypican 3 (1) and Met (42) at 0.89, between SLIK1 (43) and IGF-I (53) at 0.83, and between N-terminal pro-BNP (33) and FBLN3 (50) at 0.77.
  • the strongest negative intercorrelations were found between PAHX (40) and FBLN3 (50) at -0.67, between TRI54 (28) and IL-1 sRI (49) at -0.60, between IL-1 sRI (49) and IGF-I (53) at -0.56, and between PAHX (40) and TMED9 (55) at -0.55.
  • the selected candidate markers were then characterised by looking for potential intercorrelations and correlations with covariates (patient age, sex, BMI, cancer stage, and several scores and measures of liver damage). Since the proteins were selected based on how well they could be combined into SVM models to predict presence of HCC, it was expected that no strong intercorrelations would be found between the proteins; this was indeed the case. Most of the proteins showed individual correlation with patient cohort and many with cancer stage. Some proteins showed small associations with patient age as well, very few with patient sex or BMI.
  • Example 2 random model comparison discovery study Materials and methods Selections of the predictive models constructed in the data analysis for the clinical study described in Example 1 were compared to selections of predictive models generated with proteins randomly selected from the SomaScan panel of over 7000 measured protein concentration profiles, in order to demonstrate the merits of the 57 identified protein biomarker candidates.
  • Generating SVM models for all possible combinations of 5 or 6 proteins per model between members of the panel of 57 identified biomarker candidates was considered too demanding computationally; these sets would comprise 4,187,106 models and 36,288,252 models in total, respectively. Therefore, in order to be able to make an assessment of the performance of these models and to be able to compare them, random subselections were generated comprising 10,000 possible combinations with 5 proteins per model and 10,000 possible combinations with 6 proteins per model.
  • the SVM models were constructed using an approach similar to the one discussed in Example 1. SVM models were generated using the Python package scikit-learn (v1.1.1) (Pedregosa, 2011. J Mach Learn Res 12: 2825-2830), with cross-validation using 10 random states for each generated model.
  • Example 3 next-best marker analysis discovery study Materials and methods While the 57 protein biomarker candidates listed in Table 1 and mentioned in Example 1 showed promising results in our discovery study, subsequent development of an immunoassay was found to be challenging for some of the target proteins. For example, availability of adequate antibodies was limited for some of the most promising target proteins. Using the dataset and trained sets of predictive models discussed in Example 1, an assessment was therefore made of the effects of excluding some of the top 10 target proteins on the model performance of the remaining biomarker candidate panel.
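The model counts and the random subselection can be sketched with the standard library alone. The sampling routine below draws distinct combinations by rejection instead of enumerating all of them. This is one possible approach and not necessarily the one the authors used. The function name and the seed are hypothetical.

```python
import random
from math import comb

# Total number of 5- and 6-protein models for a panel of 57 candidates.
n_five = comb(57, 5)
n_six = comb(57, 6)


def sample_combinations(n_items, k, n_samples, seed=0):
    """Draw `n_samples` distinct k-combinations of range(n_items) without enumerating all."""
    rng = random.Random(seed)
    seen = set()
    # Duplicates are silently absorbed by the set, so the loop runs until enough
    # distinct combinations exist; this is fast when n_samples << comb(n_items, k).
    while len(seen) < n_samples:
        seen.add(tuple(sorted(rng.sample(range(n_items), k))))
    return sorted(seen)


subset_five = sample_combinations(57, 5, 10_000)
```

Each tuple holds indices into the list of 57 candidate proteins, so mapping back to protein names is a simple lookup.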
  • Table 10 shows the changes to the order of the top-ranking proteins following exclusion of Glypican 3 (#1), PEX14 (#2), KLRG2 (#3), ARL4D (#4), RAB38 (#5), PKD2 (#6), NKG2E (#7), or GALNS (#8). It can be seen that the order of the ranked biomarkers shows minimal change with the exception of the exclusion of Glypican 3 (#1) or PEX14 (#2), though even then the top 13 is mostly maintained. In Table 11 the best-performing models are presented that remain after excluding Glypican 3 (#1) from the selection, in order to check to what extent the trained models rely on the top-ranking biomarkers.
  • Buffers For most targets, a standard block buffer, wash buffer, and reagent diluent are used.
  • the block buffer typically consists of a 3% dilution of Bovine Serum Albumin (BSA) block buffer, in 1x phosphate-buffered saline (PBS) buffer.
  • the wash buffer typically consists of 0.05% (w/v) TWEEN®20 in 1x PBS buffer.
  • the reagent diluent typically consists of 0.05% (w/v) TWEEN®20 and, depending on the target protein, between 0.1% and 1% BSA buffer in 1xPBS buffer. All buffers containing TWEEN®20 are to be kept at room temperature and disposed of after one week.
  • Buffers containing BSA are to be made on the day of the experiment using either a BSA buffer stock (Block buffer) that is kept in the fridge, or freshly made BSA buffer. Buffers containing both BSA and TWEEN®20 may only be used for one day at room temperature.
  • Day 1. Firstly, the capture antibodies are diluted in 1xPBS. The dilution ranges from 1:10 to 1:100,000.
  • a 96-well microplate is then coated with 50 µL per well. After sealing the plate with a plate sealer, it is incubated overnight at 4 degrees Celsius. Day 2. Following incubation, the microplate wells are washed 4 times with 100 µL 1xPBS.
  • the wells are then filled with 100 µL block buffer and incubated at room temperature for 1 hour. Next, the wells are washed twice with 100 µL of the wash buffer. Then 50 µL of the recombinant or reference/patient plasma, which may be diluted in reagent diluent, is added. This dilution ranges from undiluted to 1:10,000. After incubating again at room temperature for 1.5 hours, the wells are washed 5 times with 100 µL wash buffer. 50 µL of the detection antibodies, diluted in reagent diluent, is then added, followed by incubation at room temperature for 1 hour. The dilution may range from 1:10 to 1:100,000.
  • the wells are then washed 5 times with 100 µL wash buffer again. The next steps are performed in the dark. 50 µL of the HRP-conjugate is added to the wells, diluted in reagent diluent, followed by incubation at room temperature for 1 hour. The wells are then washed 5 times with 100 µL wash buffer. Subsequently 50 µL of substrate solution (3,3’,5,5’-tetramethylbenzidine (TMB)) is added to the wells, after which the plate is incubated at room temperature for 20 minutes. The plates are then checked at intervals of 5 minutes for high signal and stopped accordingly by adding 50 µL of ELISA Stop Solution from Invitrogen.
  • TMB tetramethylbenzidine
  • Example 5 Internal validation results for target proteins Glypican 3 and Omentin The discovery study detailed in Example 1, which aimed to obtain a set of biomarker candidates with potential for implementation in a diagnostic HCC screening test, was conducted by analysing two groups of plasma samples using the SomaScan Assay developed by the company SomaLogic.
  • the SomaScan Assay is a fee-for-service aptamer-based proteomics platform which provides 7,335 protein concentration measurements across a large number of different biological pathways.
  • This impressive set of analysed targets makes the SomaScan Assay an attractive platform for biomarker discovery.
  • the measured concentration data that the SomaScan Assay provides is presented in relative fluorescence units (RFU), not absolute units such as mg/mL or mol/mL.
  • RFU relative fluorescence units

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Urology & Nephrology (AREA)
  • Immunology (AREA)
  • Engineering & Computer Science (AREA)
  • Hematology (AREA)
  • Chemical & Material Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Microbiology (AREA)
  • Medicinal Chemistry (AREA)
  • Biotechnology (AREA)
  • Oncology (AREA)
  • Hospice & Palliative Care (AREA)
  • Gastroenterology & Hepatology (AREA)
  • Food Science & Technology (AREA)
  • Cell Biology (AREA)
  • Physics & Mathematics (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention relates to in vitro methods of typing a sample of an individual for the presence of hepatocellular carcinoma (HCC). The invention further relates to methods of treating an individual with HCC, that is typed according to the methods of the invention.

Description

P135237PC00 Title: Biomarkers for typing a sample of an individual for hepatocellular carcinoma. FIELD: The invention relates to methods for typing of a sample of an individual for cancer, particularly hepatocellular carcinoma. The invention is directed to a set of marker proteins to type a hepatocellular carcinoma. 1 INTRODUCTION The most common type of liver disorder, affecting approximately a quarter of the world population, is non-alcoholic fatty liver disease (NAFLD) (Marjot et al., 2020. Endocr Rev 41: bnz009). NAFLD is characterised by accumulation of fat in the liver (i.e. hepatic steatosis) in absence of secondary causes like alcoholism, viral infections, medications, toxins or congenital defects (Sheka et al., 2020. JAMA 323: 1175-1183). A subtype of NAFLD is non-alcoholic steatohepatitis (NASH), in which the fat build-up is causing an inflammation of the liver. In 2020 it was estimated that in the United States between 3 and 6% of the population was affected by NASH (Sheka et al., 2020. JAMA 323: 1175-1183). Over time, prolonged liver inflammation causes liver damage, which in turn may lead to fibrosis, i.e. the formation of scar tissue. Subsequently, fibrosis can advance to the point where the liver turns cirrhotic; at this stage the liver becomes functionally impaired as a consequence of cumulative scarring. About 1 out of 5 NASH cases eventually progress to cirrhosis (Sheka et al., 2020. JAMA 323: 1175-1183). Cirrhosis, in turn, significantly increases the risk of patients to develop hepatocellular carcinoma (HCC), the most common type of liver cancer accounting for 75-85% of all liver cancer cases (Straś et al., 2020. Clin Exp Hepatol 6: 170-175; Singal et al., 2020. J Hepatol 72: 250-261). By estimation, 1 in 3 patients with cirrhosis will eventually develop liver cancer, with a reported annual incidence of 1-8% depending on the underlying etiology. 
Globally, the leading causes underlying HCC are hepatitis B virus (HBV) at 33%, alcohol at 30%, and hepatitis C virus (HCV) at 21% (Singal et al., 2020. J Hepatol 72: 250-261). However, due to implementation of HBV and HCV vaccination programs and the global rise of obesity, NASH is expected to become a more dominant etiology in the near future. In 2020 liver cancer was the 6th most prevalent type of cancer globally by annual number of new cases (906k, 4.7% of the world total) and the third most deadly by annual number of new deaths (830k, 8.3% of the world total) (Sung et al., 2021. CA Cancer J Clin 71: 209-249). In Europe, liver cancer was the 13th most prevalent cancer type by number of new cases (88k, 2.2% of the European total) and the 7th most deadly by number of cancer-related deaths (78k, 4.0% of the European total) (Ferlay et al., 2020. Global Cancer Observatory: Cancer Today. available at gco.iarc.fr/today). Considering these numbers, it is clear that liver cancer has a major impact across the world. Early diagnosis greatly improves the odds of successful treatment of HCC, resulting in higher survival rates. Curative treatment options, such as surgical resection, ablation and liver transplantation, are only available to patients with early-stage HCC, while patients with intermediate and advanced-stage HCC can only be provided with palliative care, such as chemoembolization or systemic therapy, which instead aims to alleviate suffering (Marrero et al., 2018. Hepatology 68: 723-750; Galle et al., 2018. J Hepatol 69: 182-236). The stage at which liver cancer is diagnosed severely impacts the number of years that a patient has left to live on average following diagnosis. If HCC is diagnosed in an advanced stage, the 5-year survival rate drops as low as 10%, while this rate is even lower for patients who are diagnosed with late-stage HCC. In contrast, for patients diagnosed with early-stage HCC the 5-year survival rate is 50% or more.
Promotion of early-stage HCC detection could also help to decrease the economic burden of non-curative treatments. Early and very early-stage HCC is associated with significantly lower per-patient-per-year (PPPY) costs compared to later stages, i.e. intermediate, advanced and late-stage HCC: an American study estimated these PPPY costs to be $137k, $133k, $178k, $269k, and $467k, respectively (Likhitsup et al., 2020. Pharmacoeconomics 38: 5-24). There are different methods available to diagnose or screen for liver cancer: imaging techniques like ultrasound (US), magnetic resonance imaging (MRI) or computed tomography (CT) scanning; histological examination, which requires performing a liver biopsy; or by measurement of blood levels of certain biomolecules, such as alpha-fetoprotein (AFP) (Marrero et al., 2018. Hepatology 68: 723-750). US scanning is currently the recommended practice for biannual surveillance testing in adult patients with cirrhosis, as they are at a higher risk for developing HCC (Marrero et al., 2018. Hepatology 68: 723-750). Following a positive ultrasound, HCC diagnoses are generally confirmed through MRI. However, current blood assays and imaging techniques are not always reliable and can lack accuracy in indicating the presence of HCC, while biopsies are invasive, risky, and not necessarily representative indications of the liver’s status (Marrero et al., 2018. Hepatology 68: 723-750). Early-stage liver cancer is particularly difficult to detect; while US has a sensitivity of 84% for detection of HCC at any stage, its sensitivity for early-stage HCC is only 47% (Singal et al., 2020. J Hepatol 72: 250-261). Combined with blood testing for AFP, the sensitivity is still only 63% (Singal et al., 2020. J Hepatol 72: 250-261). Liver cancer also generally doesn’t show clear symptoms, if any at all, especially in early stages (Ayuso et al., 2018. J Radiology 101: 72-81). These factors complicate early-stage diagnosis significantly.
Several proteins are being studied for potential use as liver cancer biomarkers for surveillance testing, the most widely tested of which is AFP. However, inclusion of AFP in screening guidelines is considered “suboptimal in terms of cost-effectiveness and for routine surveillance” (Galle et al., 2018. J Hepatol 69: 182-236). The most prominent HCC screening test in development is the Roche Diagnostics Elecsys® GAAD assay. This multivariate assay is based on the so-called GAAD score which combines gender, age, and plasma measurements of the proteins AFP and des-gamma-carboxy prothrombin (DCP) (Marrero et al., 2018. Hepatology 68: 723-750; Galle et al., 2018. J Hepatol 69: 182-236; Best et al., 2020. Clin Gastroenterol Hepatol 18: 728-735; Schotten et al., 2021. Pharmaceuticals 14: 735; Yang et al., 2019. Cancer Epidemiol Biomarkers Prev 28: 531-538). DCP is also known as prothrombin induced by vitamin K absence or antagonist-II (PIVKA-II). On its own, PIVKA-II reportedly produced an area under the curve (AUC) ranging between 0.83 and 0.87 (Best et al., 2020. Clin Gastroenterol Hepatol 18: 728-735). In March 2020 the GAAD assay was granted Breakthrough Device Designation by the United States Food and Drug Administration, though the Clinical Practice Guidance by the American Association for the Study of Liver Diseases has currently not recommended it, noting that phase II case-control biomarker studies showed promising results but that phase III and IV studies were still needed to assess the assay’s performance in larger cohorts (Marrero et al., 2018. Hepatology 68: 723-750). There is thus a need for a method to type HCC in an early disease stage, overcoming one or more of the abovementioned disadvantages. As such, there is a need for one or more biomarkers that have a diagnostic performance exceeding that of both the current clinical standard, i.e. US with or without AFP, and PIVKA-II.
2 BRIEF DESCRIPTION OF THE INVENTION From a retrospective case-control clinical study, plasma samples of patients with HCC, primarily caused by NASH-induced liver cirrhosis, (i.e. the cases) and patients with NASH-induced liver cirrhosis but not HCC (i.e. the controls) were obtained. Using proteomic data resulting from these samples, a promising set of biomarkers was identified. This set of biomarkers can be used to screen for HCC, for example in at-risk individuals. The invention is directed to an in vitro method of typing a sample of an individual for the presence of a hepatocellular carcinoma (HCC), the method comprising: (i) determining the concentration of at least 2 marker proteins to thereby provide a concentration profile of the marker proteins, wherein the marker proteins are selected from the proteins listed in Table 1; (ii) comparing the individual’s concentration profile to a reference concentration profile of the at least 2 marker proteins; thereby typing the sample for the presence of HCC. Said sample preferably is a plasma or serum sample. The determination of the protein concentration preferably is performed using an enzyme-linked immunosorbent assay (ELISA), preferably a multiplex ELISA. In said method of typing, more preferably, the at least 2 marker proteins are selected from PEX14, KLRG2, ARL4D, RAB38, PKD2 and NKG2E, or from PEX14, KLRG2, RAB38, GALNS, CP2CJ and IMPA1. In embodiments, the at least 2 marker proteins comprise PEX14 and KLRG2. In embodiments, the at least 2 marker proteins comprise PEX14, KLRG2 and at least one protein selected from ARL4D, RAB38, PKD2, and NKG2E, preferably PEX14, KLRG2 and at least two proteins selected from ARL4D, RAB38, PKD2 and NKG2E, more preferably comprise PEX14, KLRG2, ARL4D, RAB38 and PKD2.
In said method of typing preferably the protein concentration of at least 5 different marker proteins, more preferably at least 6 different marker proteins, more preferably at least 7 different marker proteins, more preferably at least 8 different marker proteins, more preferably at least 10 different marker proteins, more preferably at least 20 different marker proteins selected from the proteins listed in Table 1, most preferably all marker proteins listed in Table 1, is determined. Said individual may be at risk of having or developing HCC. Said individual may have cirrhosis, fibrosis, chronic hepatitis B, chronic hepatitis C, alcoholic liver disease, NAFLD, NASH, primary biliary cholangitis, primary hemochromatosis, auto-immune hepatitis, alpha-1 antitrypsin deficiency, or Wilson's disease. Said reference concentration profile may be composed of the average concentrations of the marker proteins specified in step (ii) of individuals having HCC; of individuals not having HCC; or of a mixture of individuals having HCC and individuals not having HCC. The individual’s concentration profile may be compared to two reference concentration profiles, wherein one reference concentration profile is composed of the average concentrations of the marker proteins specified in step (ii) of individuals having HCC and the other reference concentration profile is composed of the average concentrations of the marker proteins specified in step (ii) of individuals not having HCC. The invention further provides a method of treating an individual with HCC, comprising typing of a sample from said individual using the method of typing according to the invention, treating the individual that is typed as having HCC with a curative treatment; and testing the individual that is typed as not having HCC with the method of typing according to the invention at regular time intervals, such as every three years, preferably every two years, preferably every year, more preferably every six months. 
Said curative treatment may comprise liver transplantation, ablation, surgical resection or a combination thereof. Preferably, the individual that is typed as not having HCC is treated with a treatment strategy related to the individual’s underlying risk factor for HCC. Preferably the individual is at risk of having or developing HCC. 3 BRIEF DESCRIPTION OF THE FIGURES Figure 1. Data analysis workflow used for identifying a panel of candidate biomarkers for early NASH-HCC detection from the proteomics dataset. Abbreviations: AUC, area under the curve; SVM: support vector machine. Figure 2. Volcano plot of log2(FC) (fold change) plotted against -log10(P), based on independent two-sample t-tests on the 7,335 protein measurements in the dataset. The log2(FC) indicates whether protein levels were higher (positive) or lower (negative) in NASH-HCC samples compared to NASH control samples. The p-value indicates the odds that the measured difference is the result of randomness. The vertical dotted lines indicate FCs of, respectively, 0.5 (log2(FC)=- 1) and 2 (log2(FC)=1). Three alternative significance levels are depicted horizontally; the Bonferroni-adjusted significance level corrects for the number of proteins in the dataset, the PC-based cut off corrects for the number of principal components which describe over 99% of the variance in the dataset. Figure 3. Swarm plots and boxplots of the log10(RFU) distributions for two individual proteins. The plotted proteins are the one with the best p-value (left), IGFALS (p=6.4e-5), and the one with the highest AUC (right), Glypican 3 (AUC=0.75), with distributions across NASH (light grey) and NASH-HCC (dark grey) patients. Abbreviation: RFU = relative fluorescence intensity. Figure 4. ROC curves illustrating the performance of three individual proteins in distinguishing NASH and NASH-HCC patients. 
The three biomarkers are Glypican 3, which showed the highest AUC (0.75), IGFALS, which showed the most significant p-value (6.4e-5), and alpha-fetoprotein (AFP), a known HCC biomarker which may be used for screening in later stages but which performs poorly in early-stage HCC. The diagonal dotted line indicates the line of no-discrimination, corresponding to the performance of a model based entirely on random guessing.
Figure 5. Plots of the percentage of proteomic data variance explained per principal component (histogram, left y-axis), and the cumulative data variance explained with each additional principal component (line plot, right y-axis). The first 80 of a total of 7,335 components are plotted.
Figure 6. Swarm plots and boxplots of the log10(RFU) ratio distributions for two models which are each composed of one protein that is elevated and one that is lowered in the NASH-HCC cohort. The plots are for the most significant biomarker ratio (left), GON2 | WISP-2 (p=4.6e-7), and the biomarker ratio with the highest AUC (right), HMGR | Glypican 3 (AUC=0.84), with distributions across NASH (light grey) and NASH-HCC (dark grey) patients. Abbreviation: RFU = relative fluorescence units.
Figure 7. ROC curves illustrating the performance of two models which are each composed of one protein that is elevated and one that is lowered in the NASH-HCC cohort. The two ratio models combine HMGR and Glypican 3, which showed the highest AUC (0.84), and GON2 and WISP-2, which showed the most significant p-value (4.6e-7). The diagonal dotted line indicates the line of no-discrimination, corresponding to the performance of a model based entirely on random guessing.
Figure 8. ROC curves of the best-performing SVM models for combinations of two (A), three (B), and four (C) proteins in detecting HCC in NASH patients.
Curves were generated for 10 different random states of the data (dashed gray lines), for the mean model results (solid line), and for the mean model results ± 1 standard deviation (gray area). The diagonal dotted line indicates the line of no-discrimination, corresponding to a model based on random guessing. (A) ROC curves for the SVM model Glypican 3|ARL4D. (B) ROC curves for the SVM model Glypican 3|ARL4D|HMGR. (C) ROC curves for the SVM model Glypican 3|PEX14|PKD2|KLRG2.
Figure 9. Scatter plot of the marker scores of all 57 unique proteins that were found in the selection of SVM models with average AUCs higher than 0.83. The score is based on the positions in the list, sorted by average AUC, of all models each protein appeared in. The marker numbers relate to the numbers presented in Table 6.
Figure 10. Heatmap illustrating the correlation between the 57 identified biomarker proteins. Plots with a plus sign indicate positive correlation, plots with a minus sign indicate negative correlation. The numbers of the proteins correspond with the numbers indicated in the overview in Table 6.
Figure 11. Heatmap of the p-values that were calculated to assess associations between several patient covariates and the 57 selected NASH-HCC protein biomarkers. The p-values were calculated using ordinary least squares regression. P-values higher than 0.05 were masked (depicted as white) for easy distinction. Abbreviations: BMI = body mass index, BCLC = Barcelona Clinic Liver Cancer staging system, MELD = Model For End-Stage Liver Disease, AST = aspartate aminotransferase, ALT = alanine transaminase, INR = International Normalized Ratio, GPC3 = Glypican 3, PSPN = Persephin, AMY1A = Amylase alpha 1A, NPPB = N-terminal pro-BNP, IGF2R = IGF-II receptor, CAL. A = Calgranulin A, LMAN2 = Lectin mannose-binding 2.
Figure 12.
Mean ROC curves of the best-performing SVM models for detecting HCC in NASH patients, using combinations of two to eight proteins, either without (A), or with (B) inclusion of patient age and sex as additional model variables. Curves were generated for the mean model results of the best-performing models. The diagonal line indicates the line of no-discrimination, corresponding to a model based on random guessing. For each ROC curve the optimal Youden’s J was calculated using Equation 3 and depicted as a solid dot. More data regarding the plotted models can be found in Table 9.
Figure 13. Plots showing the mean areas under the curve (AUCs) for two sets of SVM models combining 2 proteins per model. The first set (A) comprises 10,000 randomly selected combinations of 2 from the SomaScan dataset of >7000 protein concentration profiles. The second set (B) comprises all possible combinations of 2 between the 57 protein biomarker candidates identified in the discovery study.
Figure 14. Plots showing the mean AUCs for two sets of SVM models combining 3 proteins per model. The first set (A) comprises 10,000 randomly selected combinations of 3 from the SomaScan dataset of >7000 protein concentration profiles. The second set (B) comprises all possible combinations of 3 between the 57 protein biomarker candidates identified in the discovery study.
Figure 15. Plots showing the mean AUCs for two sets of SVM models combining 4 proteins per model. The first set (A) comprises 10,000 randomly selected combinations of 4 from the SomaScan dataset of >7000 protein concentration profiles. The second set (B) comprises all possible combinations of 4 between the 57 protein biomarker candidates identified in the discovery study.
Figure 16. Plots showing the mean AUCs for two sets of SVM models combining 5 proteins per model. The first set (A) comprises 10,000 randomly selected combinations of 5 from the SomaScan dataset of >7000 protein concentration profiles.
The second set (B) comprises a subset of 10,000 models that each combine 5 randomly selected proteins from the 57 protein biomarker candidates identified in the discovery study.
Figure 17. Plots showing the mean AUCs for two sets of SVM models combining 6 proteins per model. The first set (A) comprises 10,000 randomly selected combinations of 6 from the SomaScan dataset of >7000 protein concentration profiles. The second set (B) comprises a subset of 10,000 models that each combine 6 randomly selected proteins from the 57 protein biomarker candidates identified in the discovery study.
Figure 18. ELISA standard curves for protein targets Glypican 3, PEX14, RAB38, PKD2, and IMPA1. The curves were fitted using four-parameter logistic regression.
Figure 19. Graphs showing the log10 plasma concentrations measured by ELISA plotted against the concentrations measured by the SomaScan Assay in log10(RFU), for the target proteins Glypican 3 and Omentin, respectively. RFU = relative fluorescence units.
Figure 20. Swarm plots and boxplots of the plasma Glypican 3 concentration distributions across four subgroups, measured by ELISA. These subgroups include: 10 healthy adults (all newly included); 40 cirrhotic NASH patients (all from the discovery study); 30 early-stage HCC patients with BCLC stage 0 or A; and 15 late-stage HCC patients with BCLC stage B, C, or D (7 from the discovery study, 8 newly included).
Figure 21. Graph showing the Glypican 3 concentrations in serum plotted against the Glypican 3 concentrations in plasma, both measured by ELISA.
4 DETAILED DESCRIPTION OF THE INVENTION
4.1 Definitions
As is used herein, the term “cancer”, refers to a disease or disorder characterized by uncontrolled cell division. Said uncontrolled cell division may be caused by activating mutations that drive cell division and/or by an increase of survival or apoptosis resistance. Cancer cells may acquire the ability to invade other neighbouring tissues (i.e.
invasion), the ability to spread to other areas of the body where the cells are not normally located (i.e. metastasis) and/or the ability to establish new growth at ectopic sites.
As is used herein, the term “non-alcoholic fatty liver disease (NAFLD)”, refers to a liver disorder characterised by accumulation of fat in the liver (i.e. hepatic steatosis). NAFLD occurs in the absence of demonstrable secondary causes like alcoholism, viral infections, medications, toxins, or congenital defects (Sheka et al., 2020. JAMA 323: 1175-1183). NAFLD presents in four main stages: stage 1 is simple fatty liver (i.e. steatosis) without inflammation or fibrosis; stage 2 is non-alcoholic steatohepatitis (NASH) (i.e. steatosis with lobular inflammation but no fibrosis or balloon cells); stage 3 is fibrosis (i.e. scarring) of the liver and stage 4 is cirrhosis (i.e. irreversible, advanced scarring of the liver). The term "NAFLD" includes any stage or degree of the disease.
As is used herein, the term “non-alcoholic steatohepatitis (NASH)”, refers to a subtype of NAFLD, wherein inflammation of the liver is caused by fat build-up. NASH is not associated with alcohol consumption. As used herein, the term “NASH” may encompass steatosis, hepatocellular ballooning and lobular inflammation.
As is used herein, the term “hepatocellular carcinoma (HCC)”, refers to a cancer of the liver. An HCC is malignant and is the most common primary malignant liver cancer. Risk factors include chronic active hepatitis B, hepatitis C, and liver cirrhosis (e.g. caused by a hepatitis B or C virus infection, alcoholic liver disease, NAFLD, NASH, primary biliary cholangitis, primary hemochromatosis, auto-immune hepatitis, alpha-1 antitrypsin deficiency, or Wilson’s disease). HCC can be subdivided into different HCC stages depending on the size of the tumour and whether there is spread to lymph nodes or other body tissues.
The most commonly used staging system for HCC is the Barcelona Clinic Liver Cancer (BCLC) staging system. The BCLC system takes into account a so-called ‘performance status’ of a patient. A performance status of 0 means a patient is well, i.e. as normal; a performance status of 1 means a patient is well enough to do everything he normally does, except for heavy work; a performance status of 2 means a patient is up for most of the day, but not well enough to work; a performance status of 3 means a patient needs to rest for more than half the day and needs some help in looking after himself; a performance status of 4 means a patient is in bed or chair all day and needs help looking after himself. The BCLC system further takes into account the functioning of the liver, e.g. measured using the so-called Child-Pugh system. Child-Pugh group A means a patient’s liver is working normally; Child-Pugh group B means there is some liver damage; Child-Pugh group C means there is a lot of liver damage. In total, there are 5 groups in the BCLC system, i.e. groups 0, A, B, C and D. Barcelona stage 0 indicates very early stage HCC, wherein a patient has a single liver tumour that measures less than 2 cm across. In Barcelona stage 0 a patient is well (performance status 0) and the patient’s liver is working normally (Child-Pugh A). Barcelona stage A indicates early stage HCC, wherein a patient has either a single tumour (of any size) or up to 3 tumours that are all less than 3 cm across. In Barcelona stage A, a patient is well (performance status 0) and the patient’s liver may be working normally or there may be some liver damage (Child-Pugh A or B). Barcelona stage B indicates intermediate stage HCC. In stage B, a patient has multiple tumours in the liver, but is well overall (performance status 0). The patient’s liver is working normally or there is only moderate liver damage (Child-Pugh A or B). Barcelona stage C indicates advanced HCC.
The liver tumours have grown into blood vessels, or have spread to lymph nodes or other body organs. In Barcelona stage C, a patient may not feel as well as normal and may be less active but is still reasonably fit (performance status 1 or 2). In this stage, the patient’s liver is still working normally or there may be moderate liver damage (Child-Pugh A or B). Finally, Barcelona stage D indicates very advanced HCC, wherein a patient is very unwell and needs help looking after themselves (performance status 3 or 4), or it may mean a patient has a lot of liver damage (Child-Pugh C).
As is used herein, the term “individual”, refers to a mammal, preferably a human.
As is used herein, the term “sample”, refers to any biological sample that can be completely or partly obtained from an individual. The term comprises any sample of bodily fluid or tissue obtained from an individual in order to type the individual for a liver disorder, here HCC. A sample such as blood, plasma or serum, may comprise protein products from liver cancer cells from an individual. A sample such as a tumour or liquid biopsy may comprise liver cancer cells from an individual.
As is used herein, the term “typing of a sample”, refers to the classification of a sample based on characterized features. In this invention typing includes the determination of protein concentrations in a sample, which may assist in the characterisation of an individual from which the sample is obtained as having a hepatocellular carcinoma (HCC), or not likely to have an HCC.
As is used herein, the term “protein”, refers to a macromolecule comprising one or more long chains of amino acid residues. In the context of this invention the term “protein” covers any chain length and thus includes small peptides and polypeptides, and amino acid modifications such as acylation, glycosylation and isoprenylation.
As is used herein, the term “protein concentration”, refers to a quantifiable level of a protein of interest.
A protein’s concentration can be expressed in absolute units, e.g. mg/ml or mol/ml, or can be expressed in relative units, such as relative to an internal reference standard, e.g. relative fluorescence units (RFU).
As is used herein, the term “marker protein”, refers to a protein whose concentration, alone or in combination with other proteins, is correlated with an effect; in this application, with the characterization of the individual from whom the sample is obtained as having an HCC.
As used herein, the term “individual at risk of having or developing HCC”, refers to an individual having one or more conditions which are considered a risk factor for development of HCC. Such an individual is usually recommended by their physician or clinical expert to participate in an HCC surveillance program. Examples of these risk factors are: cirrhosis, fibrosis, chronic hepatitis B, chronic hepatitis C, alcoholic liver disease, NAFLD, NASH, primary biliary cholangitis, primary hemochromatosis, auto-immune hepatitis, alpha-1 antitrypsin deficiency and Wilson’s disease (Marrero et al., 2018. Hepatology 68(2): 723-750).
As used herein, the term “curative treatment”, refers to a treatment that aims to cure a disease (here HCC) or to improve or alleviate symptoms associated with a disease (here HCC).
As used herein, the term “palliative treatment”, refers to a treatment or therapy that does not aim at curing a disease but rather at providing relief.
As used herein, the term “capturing molecule”, refers to a molecule that is able to specifically bind or attach to a marker protein. Examples of capturing molecules include an antibody, antibody-like molecules such as a designed ankyrin repeat protein, a binding protein that is based on a Z domain of protein A, a binding protein that is based on a fibronectin type III domain, engineered lipocalin, and a binding protein that is based on a human Fyn SH3 domain (Skerra, 2007.
Current Opinion Biotechnol 18: 295-304; Škrlec et al., 2015. Trends Biotechnol 33: 408-418), and an aptamer. A capturing molecule may be used in methods of the invention to determine the concentration of a marker protein.
As used herein, the term “antibody”, refers to an antigen binding protein comprising at least a heavy chain variable region (Vh) that binds to a target epitope. The term antibody includes monoclonal antibodies comprising immunoglobulin heavy and light chain molecules, single heavy chain variable domain antibodies, and variants and derivatives thereof, including chimeric variants of monoclonal and single heavy chain variable domain antibodies.
As used herein, the term “aptamer”, refers to a single-stranded nucleic acid molecule (e.g. DNA or RNA) or peptide that specifically binds to a marker protein. An aptamer usually binds to its target with high affinity, such as an affinity in the picomolar range.
As used herein, the term “binding”, refers to the ability of a capturing molecule to interact with a marker protein. Preferably, said binding is specific. The terms ‘specific’ or ‘specificity’ or grammatical variations thereof refer to the number of different proteins to which a particular capturing molecule can bind. The specificity of a capturing molecule can be determined based on affinity. A specific binding capturing molecule interacts with a marker protein with an affinity that is at least 2 times lower than the affinity of the capturing molecule to another protein, preferably at least 5 times lower, at least 10 times lower, such as at least 20 times lower. A specific capturing molecule preferably has a binding affinity for its specific marker protein of less than 10^-7 M, such as less than 10^-8 M, or even lower.
4.2 Sample collection and pre-processing
This invention is directed to a set of marker proteins that can be used to type a sample of an individual as having HCC.
Early diagnosis greatly improves the odds of successful treatment of HCC, resulting in higher survival rates. According to the invention, concentrations of marker proteins can be determined in an individual’s sample. A sample may be any type of biological sample obtained from an individual, wherein the concentrations of marker proteins of the invention can be determined. A sample may comprise liver cancer cells from an individual, or may be suspected to comprise liver cancer cells from an individual, such as a tumour or liquid biopsy. Examples of samples include a blood sample, a serum sample, a plasma sample, a lymphatic fluid sample, a saliva sample, a urine sample, a tissue sample or an extract of any of the aforementioned samples. Preferably, the sample is a blood, plasma or serum sample. Most preferably, the sample is a plasma sample. The sample may be collected in any clinically acceptable manner, but is preferably collected and conserved such as to preserve at least the proteins present in the sample. Where appropriate, the sample may be homogenized, or extracted with a solvent in order to obtain a liquid sample, prior to determining the concentration of one or more marker proteins. Liquid samples may be subjected to one or more pre-treatments prior to use in the present invention. Such pre-treatments include, but are not limited to, dilution, filtration, centrifugation, concentration, sedimentation, precipitation or dialysis. Pre-treatments may also include the addition of chemical or biochemical substances to the solution, such as acids, bases, buffers, salts, solvents, reactive dyes, detergents, emulsifiers or chelators. A sample may comprise serum, which is prepared, for example, by coagulation of platelets, for example at room temperature, followed by centrifugation at low speed, such as between 2000 g and 5000 g, preferably at about 3000 g. Centrifugation preferably is performed at room temperature, preferably between 20 °C and 25 °C.
A plasma sample in the context of the present invention is a substantially cell-free supernatant of blood containing anticoagulant obtained after centrifugation. Exemplary anticoagulants include calcium ion binding compounds such as EDTA or citrate and thrombin inhibitors such as heparinates or hirudin. Cell-free plasma can be obtained by centrifugation of the anticoagulated blood (e.g. citrated, EDTA or heparinized blood), for example for at least 15 minutes at 2000 to 3000 g. A tissue sample preferably is disrupted, for example by homogenization, for example by application of pressure, ultrasound or by mechanical homogenization, as is known to the skilled person. The sample may be obtained from an individual with HCC or with a likelihood of being typed as HCC. Said individual may present liver disease symptoms or risk factors such as cirrhosis, fibrosis, a history of NAFLD and/or NASH, or a history of hepatitis B or C. Alternatively, a sample may be obtained from an individual without any liver disease symptoms or risk factors. Said individual may not have any liver disease symptoms or risk factors. Preferably, the sample is obtained from an individual who is at risk of having or developing HCC.
4.3 Marker proteins
The invention provides a set of at least 2, preferably at least 3, more preferably at least 4, marker proteins whose concentration is correlated with HCC. Said at least 2, preferably at least 3, more preferably at least 4, marker proteins are selected from the list of proteins provided in Table 1. Since early diagnosis of HCC increases the survival chances of HCC patients (see above), a proper characterisation of an individual at an early stage of the disease may be part of an approach for optimal treatment of said individual. Said characterisation may help a physician in selecting a treatment strategy for said individual.
Preferably, a set of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55 or 56 marker proteins from the marker proteins listed in Table 1 is used, such as all 57 proteins listed in Table 1. Said set of at least 2 marker proteins comprises PEX14 and KLRG2, PEX14 and ARL4D, PEX14 and RAB38, PEX14 and PKD2, PEX14 and NKG2E, KLRG2 and ARL4D, KLRG2 and RAB38, KLRG2 and PKD2, KLRG2 and NKG2E, ARL4D and RAB38, ARL4D and PKD2, ARL4D and NKG2E, RAB38 and PKD2, RAB38 and NKG2E, or PKD2 and NKG2E. In embodiments, said set of at least 2 marker proteins comprises PEX14 and GALNS, PEX14 and CP2CJ, PEX14 and IMPA1, KLRG2 and RAB38, KLRG2 and GALNS, KLRG2 and CP2CJ, or KLRG2 and IMPA1. A preferred set of marker proteins comprises at least one combination of 3 marker proteins, selected from the group of the following combinations: PEX14, KLRG2 and ARL4D; PEX14, KLRG2 and RAB38; PEX14, KLRG2 and PKD2; PEX14, KLRG2 and NKG2E; KLRG2, ARL4D and RAB38; KLRG2, ARL4D and PKD2; KLRG2, ARL4D and NKG2E; ARL4D, RAB38 and PKD2; ARL4D, RAB38 and NKG2E; RAB38, PKD2 and NKG2E; PEX14, KLRG2 and RAB38; PEX14, KLRG2 and GALNS; PEX14, KLRG2 and CP2CJ; PEX14, KLRG2 and IMPA1; KLRG2, RAB38 and GALNS; KLRG2, RAB38 and CP2CJ; KLRG2, RAB38 and IMPA1; RAB38, GALNS and CP2CJ; RAB38, GALNS and IMPA1; GALNS, CP2CJ and IMPA1; Glypican 3, ARL4D and HMGR; Glypican 3, ARL4D and LR2BP; Glypican 3, ARL4D and NKG2E; Glypican 3, PEX14 and PKD2; Glypican 3, ARL4D and NDE1; Glypican 3, ARL4D and UB2L6; Glypican 3, ARL4D and AADAT; Glypican 3, ARL4D and DCNL3; Glypican 3, ARL4D and MSTN1; and Glypican 3, HMGR and NKG2E.
A preferred set of marker proteins comprises at least one combination of 4 marker proteins, selected from the group of the following combinations: PEX14, KLRG2, ARL4D and RAB38; PEX14, KLRG2, ARL4D and PKD2; PEX14, KLRG2, ARL4D and NKG2E; KLRG2, ARL4D, RAB38 and PKD2; KLRG2, ARL4D, PKD2 and NKG2E; ARL4D, RAB38, PKD2 and NKG2E; PEX14, KLRG2, RAB38 and GALNS; PEX14, KLRG2, RAB38 and CP2CJ; PEX14, KLRG2, RAB38 and IMPA1; KLRG2, RAB38, GALNS and CP2CJ; KLRG2, RAB38, GALNS and IMPA1; RAB38, GALNS, CP2CJ and IMPA1; PEX14, KLRG2, RAB38 and Glypican 3; Glypican 3, PEX14, PKD2 and KLRG2; Glypican 3, PEX14, PKD2 and RAB38; Glypican 3, ARL4D, HMGR and NKG2E; Glypican 3, PEX14, PKD2 and ARL4D; PEX14, ARL4D, KLRG2 and CPT1B; Glypican 3, PEX14, ARL4D and RAB38; Glypican 3, ARL4D, KLRG2 and NKG2E; Glypican 3, ARL4D, KLRG2 and MSTN1; Glypican 3, ARL4D, HMGR and LR2BP; and Glypican 3, KLRG2, NKG2E and MSTN1. A preferred set of marker proteins comprises at least 2, preferably at least 3, more preferably at least 4, more preferably at least 5, most preferably 6, of the proteins selected from PEX14, KLRG2, ARL4D, RAB38, PKD2 and NKG2E; from Glypican 3, PEX14, KLRG2, ARL4D, RAB38 and PKD2; or from PEX14, KLRG2, RAB38, GALNS, CP2CJ and IMPA1. A preferred set of 4 marker proteins is selected from the group of following combinations: Glypican 3, PEX14, KLRG2 and ARL4D; Glypican 3, PEX14, KLRG2 and RAB38; Glypican 3, PEX14, KLRG2 and PKD2; Glypican 3, PEX14, ARL4D and RAB38; Glypican 3, PEX14, ARL4D and PKD2; Glypican 3, PEX14, RAB38 and PKD2; Glypican 3, KLRG2, ARL4D and RAB38; Glypican 3, KLRG2, ARL4D and PKD2; Glypican 3, KLRG2, RAB38 and PKD2; Glypican 3, ARL4D, RAB38 and PKD2; PEX14, KLRG2, ARL4D and RAB38; PEX14, KLRG2, ARL4D and PKD2; PEX14, KLRG2, RAB38 and PKD2; PEX14, ARL4D, RAB38 and PKD2; and KLRG2, ARL4D, RAB38 and PKD2.
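The four-protein sets in the last list above are drawn from a pool of six proteins (Glypican 3, PEX14, KLRG2, ARL4D, RAB38 and PKD2). As a minimal sketch, and not as part of the claimed method, such sets can be enumerated programmatically. The helper name `marker_sets` is hypothetical and introduced only for illustration.

```python
from itertools import combinations

# Pool of six marker proteins named in the text above.
POOL = ["Glypican 3", "PEX14", "KLRG2", "ARL4D", "RAB38", "PKD2"]


def marker_sets(pool, size):
    """Return every unordered set of `size` distinct markers from `pool`.

    Hypothetical helper for illustration; order within a set follows
    the order of `pool`.
    """
    return list(combinations(pool, size))


four_sets = marker_sets(POOL, 4)

# Six proteins taken four at a time give 15 distinct sets.
print(len(four_sets))
for s in four_sets:
    print(", ".join(s))
```

The same helper applied with `size=2` or `size=3` would enumerate candidate pairs and triplets from any chosen pool of Table 1 proteins.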
The marker proteins are provided in Table 1 with their protein name, Entrez gene symbol, Uniprot ID and the relation of the marker protein’s concentration in HCC compared to the concentration determined in control samples (i.e. NASH, non-HCC, samples) (up/down-regulation). An upregulation of a marker protein’s concentration (indicated as “up” in Table 1) means that the concentration of said marker protein is increased in an individual with HCC when compared to a control. A downregulation of a marker protein’s concentration (indicated as “down” in Table 1) means that the concentration of said marker protein is decreased in an individual with HCC when compared to a control.
Table 1. Overview of early-stage HCC protein markers. The “Up/down” column indicates upregulation or downregulation of the respective protein in samples from NASH-HCC individuals compared with samples from NASH (non-HCC) individuals. EGS: Entrez Gene Symbol.
No. | Protein | Full name | EGS | Uniprot | Up/down
1 | Glypican 3 | Glypican-3 | GPC3 | P51654 | up
2 | PEX14 | Peroxisomal membrane protein PEX14 | PEX14 | O75381 | up
3 | KLRG2 | Killer cell lectin-like receptor subfamily G member 2 | KLRG2 | A4D1S0 | down
4 | ARL4D | ADP-ribosylation factor-like protein 4D | ARL4D | P49703 | up
5 | RAB38 | Ras-related protein Rab-38 | RAB38 | P57729 | down
6 | PKD2 | Polycystin-2 | PKD2 | Q13563 | down
7 | NKG2E | NKG2-E type II integral membrane protein | KLRC3 | Q07444 | up
8 | GALNS | N-acetylgalactosamine-6-sulfatase | GALNS | P34059 | up
9 | UB2L6 | Ubiquitin/ISG15-conjugating enzyme E2 L6 | UBE2L6 | O14933 | down
10 | TM157 | Membrane protein FAM174A | FAM174A | Q8TBP5 | up
11 | TRA2B | Transformer-2 protein homolog beta | TRA2B | P62995 | up
12 | CP2CJ | Cytochrome P450 2C19 | CYP2C19 | P33261 | up
13 | IMPA1 | Inositol monophosphatase 1 | IMPA1 | P29218 | up
14 | Persephin | Persephin | PSPN | O60542 | up
15 | MSTN1 | Musculoskeletal embryonic nuclear protein 1 | MUSTN1 | Q8IVN3 | up
16 | DCNL3 | DCN1-like protein 3 | DCUN1D3 | Q8IWE4 | down
17 | Apaf-1 | Apoptotic protease-activating factor 1 | APAF1 | O14727 | down
18 | HMGR | 3-hydroxy-3-methylglutaryl-coenzyme A reductase | HMGCR | P04035 | down
19 | LR2BP | LRP2-binding protein | LRP2BP | Q9P2M1 | down
20 | NDE1 | Nuclear distribution protein nudE homolog 1 | NDE1 | Q9NXR1 | down
21 | Amylase, alpha 1A | Alpha-amylase 1 | AMY1A | P04745 | up
22 | C1RL1 | Complement C1r subcomponent-like protein | C1RL | Q9NZP8 | down
23 | ASAP2 | Arf-GAP with SH3 domain, ANK repeat and PH domain-containing protein 2 | ASAP2 | O43150 | down
24 | AADAT | Kynurenine/alpha-aminoadipate aminotransferase, mitochondrial | AADAT | Q8N5Z0 | down
25 | CPT1B | Carnitine O-palmitoyltransferase 1, muscle isoform | CPT1B | Q92523 | up
26 | DRB3 | HLA class II histocompatibility antigen, DR beta 3 chain | HLA-DRB3 | P79483 | down
27 | SOST | Sclerostin | SOST | Q9BQB4 | up
28 | TRI54 | Tripartite motif-containing protein 54 | TRIM54 | Q9BYV2 | down
29 | ARY1 | Arylamine N-acetyltransferase 1 | NAT1 | P18440 | down
30 | RASF2 | Ras association domain-containing protein 2 | RASSF2 | P50749 | down
31 | SOSSC | SOSS complex subunit C | INIP | Q9NRY2 | down
32 | DJB12 | DnaJ homolog subfamily B member 12 | DNAJB12 | Q9NXW2 | up
33 | N-terminal pro-BNP | N-terminal pro-BNP | NPPB | P16860 | up
34 | UGT 1A6 | UDP-glucuronosyltransferase 1-6 | UGT1A6 | P19224 | up
35 | DLDH | Dihydrolipoyl dehydrogenase, mitochondrial | DLD | P09622 | up
36 | TRIM9 | E3 ubiquitin-protein ligase TRIM9 | TRIM9 | Q9C026 | down
37 | TPGS2 | Tubulin polyglutamylase complex subunit 2 | TPGS2 | Q68CL5 | down
38 | DERM | Dermatopontin | DPT | Q07507 | up
39 | CST8 | Cystatin-8 | CST8 | O60676 | up
40 | PAHX | Phytanoyl-CoA dioxygenase, peroxisomal | PHYH | O14832 | down
41 | IGF-II receptor | Cation-independent mannose-6-phosphate receptor | IGF2R | P11717 | up
42 | Met | Hepatocyte growth factor receptor | MET | P08581 | up
43 | SLIK1 | SLIT and NTRK-like protein 1 | SLITRK1 | Q96PX8 | down
44 | Omentin | Intelectin-1 | ITLN1 | Q8WWA0 | up
45 | Calgranulin A | Calgranulin A | S100A8 | P05109 | down
46 | WISP-2 | WNT1-inducible-signaling pathway protein 2 | CCN5 | O76076 | up
47 | CYTD | Cystatin-D | CST5 | P28325 | up
48 | Lefty-A | Left-right determination factor 2 | LEFTY2 | O00292 | up
49 | IL-1 sRI | Interleukin-1 receptor type 1 | IL1R1 | P14778 | up
50 | FBLN3 | EGF-containing fibulin-like extracellular matrix protein 1 | EFEMP1 | Q12805 | up
51 | fibulin 5 | Fibulin-5 | FBLN5 | Q9UBX5 | up
52 | RNAS4 | Ribonuclease 4 | RNASE4 | P34096 | up
53 | IGF-I | Insulin-like growth factor I | IGF1 | P05019 | down
54 | PGM5 | PGM5 | PGM5 | Q15124 | up
55 | TMED9 | Transmembrane emp24 domain-containing protein 9 | TMED9 | Q9BVK6 | up
56 | OPG | Tumor necrosis factor receptor superfamily member 11B | TNFRSF11B | O00300 | up
57 | Lectin, mannose-binding 2 | Vesicular integral-membrane protein VIP36 | LMAN2 | Q12907 | up
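To illustrate how the “Up/down” column of Table 1 can be read against measured data, the sketch below computes a log2 fold change between a mean HCC concentration and a mean control concentration and compares the resulting direction with the direction listed in Table 1. The function names and all concentration values are hypothetical and serve only as an example; they are not measurements from the study.

```python
import math

# Direction of regulation in HCC versus control for six markers,
# copied from the "Up/down" column of Table 1.
TABLE1_DIRECTION = {
    "Glypican 3": "up",
    "PEX14": "up",
    "KLRG2": "down",
    "ARL4D": "up",
    "RAB38": "down",
    "PKD2": "down",
}


def log2_fold_change(hcc_mean, control_mean):
    """log2 of the ratio of mean concentrations (HCC over control)."""
    return math.log2(hcc_mean / control_mean)


def observed_direction(hcc_mean, control_mean):
    """Classify a marker as 'up', 'down' or 'unchanged' in HCC."""
    fc = log2_fold_change(hcc_mean, control_mean)
    if fc > 0:
        return "up"
    if fc < 0:
        return "down"
    return "unchanged"


def agrees_with_table(marker, hcc_mean, control_mean):
    """True when the observed direction matches the Table 1 entry."""
    return observed_direction(hcc_mean, control_mean) == TABLE1_DIRECTION[marker]


# Hypothetical mean concentrations in arbitrary units: (HCC, control).
example_means = {"Glypican 3": (1800.0, 900.0), "KLRG2": (400.0, 650.0)}

for marker, (hcc, ctrl) in example_means.items():
    print(marker, observed_direction(hcc, ctrl), agrees_with_table(marker, hcc, ctrl))
```

A positive log2 fold change corresponds to “up” and a negative one to “down”, which is the same sign convention used for the volcano plot in Figure 2.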
4.4 Determining concentration of marker proteins
The determination of the concentration of one or more marker proteins can be accomplished by any means known in the art such as enzyme-linked immunosorbent assay (ELISA), radio immunoassay (RIA), mass-spectrometric (MS) detection, western-blot, flow cytometric immunoassay (FCIA), Fluorescence Resonance Energy Transfer (FRET), antigen capture assays (including dipstick antigen capture assays), surface plasmon resonance (SPR), quartz crystal microbalance (QCM), and any other acoustic, photonic, plasmonic or electrochemical version thereof, either in direct mode or resonance mode. Preferred methods employ commercially available antibodies or functional parts thereof. Several assays exist for determining the concentration of one or more proteins, which usually consist of a solid support on which various capturing molecules such as antibodies specific for the marker proteins described herein, antibody fragments and aptamers, are deposited (in technical jargon called ‘spotted’), usually in an orderly manner and at a specific and defined density. Each of these capturing molecules, by binding its own target protein and thereby isolating it from a complex mixture, such as a cell lysate, allows the specific protein of interest to be highlighted and quantified. Preferably, the concentrations of multiple marker proteins are assessed simultaneously, by any means known in the art such as multiplex platforms. Examples of such multiplex platforms are BioPlex, Meso Scale Discovery (MSD), Somalogic and Rules-Based Medicine (RBM), previously called “Myriad RBM”, or multiplex ELISA. Preferably, in a method according to the invention, the concentration of one or more marker proteins is determined using a multiplex platform such as BioPlex, MSD, Somalogic and RBM, or a platform based on singleplex or multiplex ELISA, or a lateral flow assay.
Some multiplex platforms (including BioPlex, MSD and Myriad RBM) as well as ELISAs determine the absolute concentration of protein in the samples, while other platforms, e.g. Somalogic, measure only relative concentrations of protein. For a more detailed description of these multiplex platforms, reference is made to Christiansson et al., 2014 (EuPA Open Proteomics 3: 37-47) and Tighe et al., 2015 (Proteomics Clin Appl 9: 406-422). In short, multiplex platforms used for quantitative determination of proteins are usually, similar to ELISA, immunoassays where analytes are “sandwiched” between a capture antibody and a detection antibody before detection. In a common singleplex ELISA the detection antibody is usually conjugated to an enzyme that, after the addition of enzyme substrate, catalyses a reaction leading to color development in the microtiter plate. The intensity of the color is measured by spectrophotometry and corresponds to the amount of the specific protein to be detected in the unknown sample. Another method, developed by Meso Scale Discovery (www.mesoscale.com), applies electrochemiluminescence to quantify the proteins in a microtiter plate. This assay can be multiplexed since capture antibodies specific for different targets can be bound to distinct spots in a microtiter plate. In the microsphere-based technology developed by Luminex Corporation (www.luminexcorp.com), the Fc-parts of the capture antibodies are bound to groups of fluorescent microspheres. Each group of microspheres has a slightly different fluorescence intensity and is covered with antibodies recognizing a distinct protein. The detection antibody is coupled to a fluorescent molecule to enable detection. With the Luminex method two systems to quantify the level of protein are available. One of the systems, applied by RBM (Q2 Solutions, Durham, NC), is flow-based. The other microsphere-based system utilizes magnetic fluorescent beads and is used by the BioPlex kit (BioRad, Hercules, CA).
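For the singleplex ELISA read-out described above, the measured color intensity is converted into a concentration via a standard curve; the ELISA standard curves of Figure 18 were fitted using four-parameter logistic regression. The sketch below shows one common form of the four-parameter logistic (4PL) function and its inverse. The parameter names (`a`, `b`, `c`, `d`), the parameter values and the function names are assumptions chosen for illustration only; they are not fitted values from the study, and the fitting step itself is not shown.

```python
def four_pl(conc, a, b, c, d):
    """Four-parameter logistic response at concentration `conc`.

    a: response at zero concentration
    b: slope factor
    c: concentration at the inflection point
    d: response at infinite concentration
    """
    return d + (a - d) / (1.0 + (conc / c) ** b)


def inverse_four_pl(response, a, b, c, d):
    """Concentration that yields `response` on the same 4PL curve.

    Valid only for responses strictly between a and d.
    """
    return c * ((a - d) / (response - d) - 1.0) ** (1.0 / b)


# Hypothetical curve parameters (absorbance units versus ng/ml).
params = dict(a=0.05, b=1.2, c=10.0, d=2.5)

absorbance = four_pl(5.0, **params)           # response of a 5 ng/ml standard
recovered = inverse_four_pl(absorbance, **params)
print(round(absorbance, 4), round(recovered, 4))
```

In practice the four parameters would be estimated from the standards on each plate, after which the inverse function is applied to the absorbance of each unknown sample.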
Somalogic (Boulder, CO) has developed a multiplex method for relative protein quantification of up to over 1100 analytes in one sample (www.somalogic.com). This technique is based on aptamer binding. Aptamers are folded, single-stranded, anionic oligonucleotides that can bind proteins with high specificity and affinity. Somalogic has developed Slow Off-rate Modified Aptamers called SOMAmers, which are modified aptamers that dissociate more slowly from their target protein than normal aptamers (Gold et al., 2010. PLoS ONE 5: e15004). Alternatively, the concentration of one or more marker proteins can also be deduced from the expression level determined for genes corresponding to one or more marker proteins. The determination of gene expression levels of genes corresponding to one or more marker proteins can be accomplished by any means known in the art such as Northern blotting, quantitative PCR (qPCR), microarray analysis or RNA-seq. Microarray analysis involves the use of selected probes that are immobilized on a solid surface, termed an array. Said probes are able to hybridize to gene expression products such as mRNA, or derivatives thereof such as cDNA. The probes are exposed to labeled gene expression products, or labeled derivatives thereof such as labeled cDNA, hybridized and washed, after which the abundance of gene expression products or derivatives thereof in the sample that are complementary to a probe is determined by determining the amount of label that remains associated with a probe. The probes on a microarray may comprise DNA sequences, RNA sequences, or copolymer sequences of DNA and RNA. The probes may also comprise DNA and/or RNA analogues such as, for example, nucleotide analogues or peptide nucleic acid molecules (PNA), or combinations thereof. The sequences of the probes may be full or partial fragments of genomic DNA.
The sequences may also be in vitro synthesized nucleotide sequences, such as synthetic oligonucleotide sequences. In the context of the invention, a probe preferably is specific for a gene expression product of a gene coding for a marker protein listed in Table 1. A probe is specific when it comprises a continuous stretch of nucleotides that is complementary, over the whole length, to a nucleotide sequence of a gene expression product, or a cDNA product thereof. A probe can also be specific when it comprises a continuous stretch of nucleotides that is partially complementary to a nucleotide sequence of a gene expression product of said gene, or a cDNA product thereof. Partially means that a maximum of 5 nucleotides, more preferably 4 nucleotides, more preferably 3 nucleotides, more preferably 2 nucleotides and most preferably one nucleotide differs from the corresponding nucleotide sequence of a gene expression product of said gene. The term complementary is known in the art and refers to a sequence that is related by base-pairing rules to the sequence that is to be detected. It is preferred that the sequence of the probe is carefully designed to minimize nonspecific hybridization to said probe. The specificity of the probe is further determined by the hybridization and/or washing conditions. The hybridization and/or washing conditions are preferably stringent; stringency is determined by, inter alia, the temperature and salt concentration of the hybridization and washing conditions, as is known to a person skilled in the art. An increased stringency will substantially reduce non-specific hybridization to a probe, while specific hybridization is not substantially reduced. Stringent conditions include, for example, washing steps for five minutes at room temperature in 0.1x sodium chloride-sodium citrate buffer (SSC)/0.005% Triton X-102.
More stringent conditions include washing steps at elevated temperatures, such as 37 °Celsius, 45 °Celsius, or 65 °Celsius, optionally combined with a reduction in ionic strength of the buffer to 0.05x SSC or even 0.01x SSC, as is known to a skilled person. It is preferred that the probe is, or mimics, a single stranded nucleic acid molecule. The length of a probe can vary between 15 bases and several kilobases, and is preferably between 20 bases and 1 kilobase, more preferred between 40 and 100 bases, and most preferred about 60 nucleotides. Said probe is preferably identical over the whole length to a nucleotide sequence of a gene expression product of a gene, or a cDNA product thereof. To determine an RNA expression level by microarray analysis, gene expression products in the sample are preferably labelled, either directly or indirectly, and contacted with probes on the array under conditions that favour duplex formation between a probe and a complementary molecule in the labelled gene expression product sample. The amount of label that remains associated with a probe after washing of the microarray can be determined and is used as a measure for the gene expression level of a nucleic acid molecule that is complementary to said probe. Image acquisition and data analysis can subsequently be performed to produce an image of the surface of the hybridized array. For this, the array may be dried and placed into a laser scanner to determine the amount of labelled sample that is bound to a probe at a predetermined spot. Laser excitation will yield an emission with characteristic spectra that is indicative of the labelled sample that is hybridized to a probe molecule. An array preferably comprises multiple spots encompassing a specific probe. A probe preferably is present in duplicate, in triplicate, in quadruplicate, in quintuplicate, in sextuplicate or in octuplicate on an array.
The multiple spots preferably are at randomized positions on an array to minimize bias. The amount of label that remains associated with a particular probe at each spot may be averaged, after which the averaged level can be used as a measure for the gene expression level of a nucleic acid molecule that is complementary to said probe. In addition, a gene product may be hybridized to two or more different probes that are specific for that gene product. The determined RNA expression level can be normalized for differences in the total amounts of nucleic acid expression products between two separate samples by comparing the level of expression of one or more genes that are presumed not to differ in expression level between samples, such as glyceraldehyde-3-phosphate dehydrogenase, β-actin, and ubiquitin. Conventional methods for normalization of array data include global analysis, which is based on the assumption that the majority of genetic markers on an array are not differentially expressed between samples (Yang et al., 2002. Nucl Acids Res 30: e15). Alternatively, the array may comprise specific probes that are used for normalization. These probes may detect RNA products from housekeeping genes such as glyceraldehyde-3-phosphate dehydrogenase and 18S rRNA, or from a set of normalization genes such as provided in WO 2008/039071, which is hereby incorporated by reference, of which the RNA level is thought to be constant in a given cell and independent from the developmental stage or prognosis of said cell. Another preferred method for determining RNA expression levels is by sequencing, preferably next-generation sequencing (NGS), of RNA samples, with or without prior amplification of the RNA expression products. High throughput sequencing techniques for sequencing RNA, or RNA-seq, have been developed.
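The replicate-spot averaging and housekeeping-gene normalization described above for microarrays can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of the invention: the function names, the intensity values and the probe label MARKER_X are hypothetical, and dividing by the mean signal of the reference genes is only one simple way of normalizing.

```python
from statistics import mean


def average_replicate_spots(spot_signals):
    """Average the label intensity over all replicate spots of each probe."""
    return {probe: mean(values) for probe, values in spot_signals.items()}


def normalize_to_reference_genes(expression, reference_genes):
    """Divide every averaged signal by the mean signal of the reference genes.

    Assumption: the reference (housekeeping) genes do not differ in
    expression between samples, as presumed in the text above.
    """
    scale = mean(expression[gene] for gene in reference_genes)
    if scale <= 0:
        raise ValueError("reference gene signal must be positive")
    return {gene: value / scale for gene, value in expression.items()}


# Hypothetical intensities (arbitrary units) for probes spotted in triplicate.
spots = {
    "GAPDH": [200.0, 210.0, 190.0],      # glyceraldehyde-3-phosphate dehydrogenase
    "ACTB": [400.0, 380.0, 420.0],       # beta-actin
    "MARKER_X": [150.0, 160.0, 140.0],   # hypothetical marker gene probe
}
averaged = average_replicate_spots(spots)
normalized = normalize_to_reference_genes(averaged, ["GAPDH", "ACTB"])
print(normalized["MARKER_X"])
```

Because each sample is scaled by its own reference-gene signal, the normalized values of two samples can be compared even when the total amount of labelled material differs between them.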
NGS platforms, including Illumina® sequencing, Roche 454 pyrosequencing®, Ion Torrent and Ion Proton sequencing, and ABI SOLiD® sequencing, allow sequencing of fragments of DNA in parallel. Bioinformatics analyses are used to piece these fragments together by mapping the individual reads. Each base is sequenced multiple times, providing high depth to deliver accurate data and an insight into unexpected DNA variation. NGS can be used to sequence a complete exome including all genes or, alternatively, to sequence a number of individual genes. Pyrosequencing detects the release of inorganic pyrophosphate (PPi) as particular nucleotides are incorporated into the nascent strand (Ronaghi et al., 1996. Analytical Biochemistry 242: 84-9; Ronaghi, 2001. Genome Res 11: 3-11; Ronaghi et al., 1998. Science 281: 363; U.S. Patent No. 6,210,891; U.S. Patent No. 6,258,568; and U.S. Patent No. 6,274,320, which are all incorporated herein by reference). In pyrosequencing, released PPi can be detected by being immediately converted to adenosine triphosphate (ATP) by ATP sulfurylase, and the level of ATP generated is detected via luciferase-produced photons. NGS also includes so-called third generation sequencing platforms, for example nanopore sequencing on an Oxford Nanopore Technologies platform, and single-molecule real-time sequencing (SMRT sequencing) on a PacBio platform, with or without prior amplification of the RNA expression products. Further high throughput sequencing techniques include, for example, sequencing-by-synthesis. Sequencing-by-synthesis or cycle sequencing can be accomplished by stepwise addition of nucleotides containing, for example, a cleavable or photobleachable dye label as described, for example, in U.S. Patent No. 7,427,673; U.S. Patent No. 7,414,116; WO 04/018497; WO 91/06678; WO 07/123744; and U.S. Patent No. 7,057,026, all of which are incorporated herein by reference. Sequencing techniques also include sequencing by ligation techniques.
Such techniques use DNA ligase to incorporate oligonucleotides and identify the incorporation of such oligonucleotides and are inter alia described in U.S. Patent No. 6,969,488; U.S. Patent No. 6,172,218; and U.S. Patent No. 6,306,597. Other sequencing techniques include, for example, fluorescent in situ sequencing (FISSEQ), and Massively Parallel Signature Sequencing (MPSS). Sequencing techniques can be performed by directly sequencing RNA, or by sequencing an RNA-to-cDNA converted nucleic acid library. Most protocols for sequencing RNA samples employ a sample preparation method that converts the RNA in the sample into a double-stranded cDNA format prior to sequencing. Conversion of RNA into cDNA and/or cRNA using a reverse-transcriptase enzyme such as M-MLV reverse-transcriptase from Moloney murine leukemia virus, or AMV reverse-transcriptase from avian myeloblastosis virus, is known to a person skilled in the art. Quantitative PCR (qPCR), or real-time PCR (RT-PCR), is a technique which is used to amplify and simultaneously quantify a template nucleic acid molecule such as DNA or RNA. A person skilled in the art will appreciate that qPCR of RNA may require the generation of a copy DNA strand by reverse transcriptase. The detection of the amplification product can in principle be accomplished by any suitable method known in the art. The amplified products may be directly stained or labelled with radioactive labels, antibodies, luminescent dyes, fluorescent dyes, or enzyme reagents. Direct DNA stains include for example intercalating dyes such as acridine orange, ethidium bromide, ethidium monoazide or Hoechst dyes. These intercalating dyes are non-specific and bind to all double stranded DNA in the PCR. An increase in DNA products during amplification results in an increased fluorescence intensity being measured. Another direct DNA detection method includes the use of sequence specific DNA probes consisting of a fluorescent reporter and quencher.
Upon binding of the probe to its complementary sequence, polymerases of the PCR break the proximity of the reporter and the quencher, resulting in the emission of fluorescence. Commonly used reporter dyes include FAM (Applied Biosystems), HEX (Applied Biosystems), ROX (Applied Biosystems), YAK (ELITech Group) or VIC (Life Technologies) and commonly used quenchers include TAMRA (Applied Biosystems), BHQ (Biosearch Technologies) and ZEN (Integrated DNA Technologies). Alternatively, the amplified product may be detected by incorporation of labelled dNTP bases into the synthesized DNA fragments. Detection labels which may be associated with nucleotide bases include, for example, fluorescein, cyanine dye and BrdUrd. For the simultaneous detection of multiple nucleic acid gene expression products, a multiplex qPCR can be used. In multiplex qPCRs, two or more template nucleic acid molecules are amplified and quantified in the same reaction. A commonly used method of achieving the simultaneous detection of multiple targets is to use probes with different fluorescent dyes to distinguish distinct nucleic acid targets. RT-PCR can also be used as a proxy to quantify protein concentrations. For example, a protein can be detected by binding to an aptamer, followed by determining the amount of bound aptamer by RT-PCR. Said aptamer is, for example, a chemically optimized aptamer such as a SOMAmer® (Slow Off-rate Modified Aptamer). The protein of interest can be bound to a surface, after which the aptamer is incubated in order to bind the protein of interest. After washing away any unbound aptamers, the bound aptamers can be eluted and quantified by RT-PCR. Alternatively, a weak binding nucleic acid molecule (NAM) can be designed that can bind to the target protein. The introduction of an aptamer to the NAM-target protein complex will interfere with the binding of the NAM and release the NAM.
In this case the concentration of weak binding NAM can be quantified using RT-PCR, and used as a proxy for the amount of target protein. It is preferred in methods of the invention that genes are selected for normalization of the raw data. Preferred genes are genes of which the RNA expression levels are largely constant between individual samples comprising HCC cancer cells from one individual, and between samples comprising HCC cancer cells from different individuals. It will be clear to a skilled artisan that the RNA levels of said set of normalization genes preferably allow normalization over the whole range of RNA levels. Normalization methods that may be employed include, for example, mean correction, linear combination of factors, Bayesian methods and non-linear normalization methods such as quantile normalization. Preferred methods include non-parametric regression methods such as locally estimated scatterplot smoothing (LOESS; Jacoby, 2000. Electoral Studies 19: 577–613) and locally weighted scatterplot smoothing (LOWESS; Cleveland et al., 1988. J American Statistical Association 83: 596–610).
4.5 Characterizing an individual for having HCC
The invention provides a method for typing a sample of an individual for the presence of HCC or, in other words, a method for characterizing an individual as having HCC. Typing of a sample can be performed in various ways. For example, the difference or similarity between a sample’s protein concentration profile and a previously established reference protein concentration profile may be determined. The sample’s protein concentration profile is composed of the protein concentrations of a set of marker proteins in said sample. The reference protein concentration profile is composed of the average protein concentrations of the same set of marker proteins in a sample from a reference group. The reference group may comprise a single individual.
Preferably the reference group comprises the average expression levels of at least 3, 5, 10, 25, 50, 100, 200 or 300 individuals. The reference group may include individuals with different non-HCC diagnoses. The reference group may also include individuals that all have HCC (i.e. HCC reference group) or individuals not having HCC (i.e. non-HCC reference group). Alternatively, a protein concentration profile of an individual can also be typed by comparing the individual’s protein concentration profile to multiple reference profiles. For example, the individual’s protein concentration profile can be compared to both reference profiles identified above (i.e. the HCC reference group and the non-HCC reference group). If the protein concentration profile of the individual’s sample is substantially more similar to the HCC reference group, when compared to the non-HCC reference group, it will be typed as HCC. The difference or similarity between a protein concentration profile and one or more reference profiles can be determined by determining a correlation of the concentrations of marker proteins in the profiles. For example, one can determine whether the protein concentration of a subset of marker proteins in a sample correlates to the protein concentration of the same subset of marker proteins in a reference profile. This correlation can be numerically expressed, for example by using a correlation coefficient. Several correlation coefficients can be used. Appropriate methods are established after determining whether concentration profiles are normally distributed, for example using the Shapiro-Wilk test. If the marker protein passes the normality test, an independent sample t-test can be performed to check if there is a statistically significant difference in the marker protein concentrations between different sample cohorts. If the protein concentrations are not normally distributed, a nonparametric analysis can be conducted, e.g. a Mann–Whitney U test. 
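The comparison of an individual's profile with an HCC and a non-HCC reference profile, as described above, can be sketched as follows. This is an illustration under stated assumptions only: the Pearson correlation is chosen here as one possible correlation coefficient, the marker names and concentrations are hypothetical, and the `margin` parameter is a hypothetical way of expressing "substantially more similar".

```python
from math import sqrt
from statistics import mean


def reference_profile(group):
    """Average each marker protein's concentration over a reference group."""
    markers = group[0].keys()
    return {m: mean(sample[m] for sample in group) for m in markers}


def pearson(x, y):
    """Pearson correlation coefficient of two equally long, non-constant lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def similarity(sample, reference):
    """Correlate two profiles over the marker proteins of the reference."""
    markers = sorted(reference)
    return pearson([sample[m] for m in markers], [reference[m] for m in markers])


def type_sample(sample, hcc_reference, non_hcc_reference, margin=0.0):
    """Type as 'HCC' when the HCC reference correlates better by more than margin."""
    difference = similarity(sample, hcc_reference) - similarity(sample, non_hcc_reference)
    return "HCC" if difference > margin else "non-HCC"


# Hypothetical concentrations (arbitrary units) for three hypothetical markers.
hcc_group = [
    {"marker_a": 10.0, "marker_b": 2.0, "marker_c": 5.0},
    {"marker_a": 12.0, "marker_b": 4.0, "marker_c": 7.0},
]
non_hcc_group = [
    {"marker_a": 3.0, "marker_b": 9.0, "marker_c": 5.0},
    {"marker_a": 5.0, "marker_b": 11.0, "marker_c": 7.0},
]
hcc_ref = reference_profile(hcc_group)
non_hcc_ref = reference_profile(non_hcc_group)
sample = {"marker_a": 9.0, "marker_b": 2.0, "marker_c": 6.0}
print(type_sample(sample, hcc_ref, non_hcc_ref))
```

In practice the concentrations would usually be put on a comparable scale before correlating, for example by log transformation or per-marker standardization; that step is left out here and is not specified by the text.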
A correction for multiple testing can be performed, for example using the Benjamini-Hochberg method. It is also possible to construct a predictive model, for example using a supervised machine learning algorithm such as random forests or support vector machines. Said correlations between the protein concentrations of marker proteins in the individual’s sample and the reference group can be used to produce an overall similarity score for the set of marker proteins used. A similarity score is a measure of the average correlation of protein concentrations of a set of proteins in a sample from an individual that is to be typed and a reference profile. Said similarity score can be, but does not need to be, a numerical value between +1, indicative of a high correlation between the protein concentration profile of the set of proteins in a sample of said individual and said reference profile, and -1, which is indicative of an inverse correlation. A threshold can be used to differentiate between samples typed as HCC, and samples typed as non-HCC. Said threshold is an arbitrary value that allows for discrimination between samples from individuals without HCC, and samples of individuals with HCC. If a similarity threshold value is employed, it is preferably set at a value at which an acceptable number of individuals with HCC would score as false negatives, and an acceptable number of individuals without HCC would score as false positives. Based on the predictions made by the methods of the invention, one can determine a course of treatment of an individual with HCC. For example, if the individual’s protein concentration profile is not substantially different from the non-HCC group, and/or substantially different from the HCC group, this indicates that the individual is predicted not to have HCC.
4.6 Methods of treating an individual with HCC
Early diagnosis greatly improves the odds of successful treatment of HCC, resulting in higher survival rates.
Curative treatment options, such as surgical resection, ablation and liver transplantation, are only available to patients with early-stage HCC, while patients with intermediate and advanced-stage HCC can only be provided with palliative care, such as chemoembolization, radioembolization or systemic therapy, which aims to alleviate suffering (Marrero et al., 2018. Hepatology 68: 723-750; Galle et al., 2018. J Hepatol 69: 182-236). Depending on several factors such as the HCC tumor stage, liver function, etc., different treatment options are recommended. A multidisciplinary approach for optimal treatment of HCC is proposed by several authors such as Raza and Sood (2014. World J Gastroenterol 20: 4115-4127) and (2020. Am J Cancer Res 10: 2993-3036) and can be summarized as follows. Early stage (BCLC 0/A) HCC patients are usually treated with liver transplantation (LT) and other curative therapies. For intermediate HCC (BCLC B), locoregional treatments like transarterial chemoembolization (TACE) are mainly used. For patients with advanced HCC (BCLC C), systemic pharmacological treatment is the most effective. Surgical resection (also called hepatic resection (HR)) is considered a good curative treatment option for patients with good liver function and HCC satisfying the Milan criteria, which involves up to 3 lesions < 3 cm or a single lesion < 5 cm and no extra hepatic manifestations (e.g. cirrhosis) or vascular invasion. Surgical resection carries an increased risk of hepatic decompensation in patients with cirrhosis. Liver transplantation is a potentially curative treatment and considered a very effective treatment option, as it removes both the tumor and potential cirrhosis. Currently liver transplantation is recommended for patients with HCC with BCLC stage A whose tumor is within the Milan criteria for HCC, meaning that one lesion is not larger than 5 cm, or up to 3 lesions with each 3 cm or smaller.
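The Milan criteria as worded in the preceding sentence can be expressed as a simple check. This is a sketch for illustration, assuming the inclusive thresholds of that sentence (the earlier mention in this section uses strict inequalities); the function name and parameters are hypothetical.

```python
def within_milan_criteria(lesion_diameters_cm, vascular_invasion=False,
                          extrahepatic_spread=False):
    """Check the Milan criteria as worded in the text above.

    One lesion not larger than 5 cm, or up to three lesions of 3 cm or
    smaller each, without vascular invasion or extrahepatic spread.
    """
    if vascular_invasion or extrahepatic_spread or not lesion_diameters_cm:
        return False
    if len(lesion_diameters_cm) == 1:
        return lesion_diameters_cm[0] <= 5.0
    return len(lesion_diameters_cm) <= 3 and all(d <= 3.0 for d in lesion_diameters_cm)


print(within_milan_criteria([4.5]))            # single lesion under 5 cm
print(within_milan_criteria([2.0, 2.5, 3.0]))  # three lesions, each 3 cm or smaller
print(within_milan_criteria([3.5, 2.0]))       # one of two lesions exceeds 3 cm
```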
Liver transplantation is the only potentially curative treatment for selected patients with cirrhosis and HCC who are not candidates for surgical resection. Ablation, also called ablative therapy, is another curative treatment option for HCC. Examples of ablative therapy for HCC are radiofrequency ablation (RFA), microwave ablation (MWA), percutaneous ethanol injection (PEI), laser ablation (LSA), cryoablation (CRA), irreversible electroporation (IRE), high intensity focused ultrasound (HIFU) and their combinations. Ablation techniques lead to tumor tissue necrosis through various mechanisms, such as thermal coagulation, rapid freezing and chemical cell dehydration, with different post-ablation effects. Local ablation with RFA is considered a standard of care for the patients with very early and early stage tumors not suitable for surgery. Chemoembolization such as trans-arterial chemoembolization (TACE) is currently considered a standard treatment for patients with intermediate-stage HCC. Patients with compensated liver function, with a large single nodule (< 5 cm) or multifocal HCC without evidence of vascular invasion or extra hepatic spread are considered candidates for TACE. Trans-arterial radioembolization (TARE) or selective internal radiation therapy (SIRT) is another therapeutic option for intermediate-stage HCC. In radioembolization, implantable radioactive microspheres are delivered into the arteries that feed the tumor so that tumor nodules are treated irrespective of their number, size or location. Radioembolization is different from the TACE. In TACE, the embolizing particles or drug eluting particles are usually 100-500 μm in size, which cause ischemia of tumor; but in radioembolization the microspheres are usually smaller (35 μm) in diameter and deliver radiation to tumor without ischemia to the tumor or liver tissue. 
Molecular studies of HCC have identified aberrant activation of different signaling pathways, which represent key targets for novel molecular therapies. For patients with advanced disease, sorafenib is the only approved therapy, but novel targeted agents and their combinations are emerging. Systemic therapy for the treatment of HCC, and especially advanced HCC, includes therapy with sorafenib (BAY-43-9006, Nexavar®, Bayer), lenvatinib, nivolumab, regorafenib, cabozantinib, ramucirumab, pembrolizumab, capmatinib, nintedanib, axitinib, dovitinib, decitabine, codrituzumab, bevacizumab, erlotinib, temozolomide, veliparib, resminostat, AEG35156, capecitabine, refametinib, modified FOLFOX, sunitinib, linifanib and/or brivanib. One or more of these agents may be combined. Alternatively, one or more of these agents may be combined with a chemotherapeutic agent such as doxorubicin, octreotide and oxaliplatin, tegafur/uracil, cisplatin and gemcitabine and AVE 1642 (a human monoclonal antibody inhibiting the insulin-like growth factor-1 receptor), or a combination thereof. Alternatively, one or more of these agents may be combined with other therapeutic options as described herein above; for example, a combination of sorafenib with TACE has been found to be an effective treatment option in patients with unresectable HCC (Cabrera et al., 2011. Aliment Pharmacol Ther 34: 205-213).
4.7 Methods of treating an individual that is predicted not to have HCC
In the case that a patient is predicted not to have HCC, it may not be recommended to provide an HCC treatment. Preferably, the method of typing for the presence of HCC according to the invention is performed on a sample of an individual who is at risk of having or developing HCC, meaning said individual has one or more risk factors for development of HCC and may therefore be recommended by their physician or clinical expert to participate in an HCC surveillance program.
Examples of these risk factors are: cirrhosis, fibrosis, chronic hepatitis B, chronic hepatitis C, alcoholic liver disease, NAFLD, NASH, primary biliary cholangitis, primary hemochromatosis, auto-immune hepatitis, alpha-1 antitrypsin deficiency and Wilson’s disease (Marrero et al., 2018. Hepatology 68(2): 723-750). It is recommended for such an individual at risk of having or developing HCC to remain included in a surveillance testing program, which means the individual will be regularly tested. As such, an individual who is predicted not to have HCC according to methods of the invention, but who has a liver disease symptom or risk factor, is recommended to be tested with the method of typing according to the invention at regular time intervals, such as every three years, preferably every two years, preferably every year, more preferably every six months. Furthermore, for an individual at risk of having or developing HCC and predicted not to have HCC according to a method of typing of the invention, the recommended treatment would be to treat the risk factor using the best available options so as to limit the risk of developing HCC in the future. The recommended treatment strategy for an individual at risk of having or developing HCC that is predicted not to have HCC thus relates to the individual’s underlying risk factor. In the case that an individual at risk of having or developing HCC has cirrhosis, the cirrhosis is always accompanied by another risk factor for HCC. Recommended treatment for such an individual with cirrhosis, who is at risk of having or developing HCC and predicted not to have HCC, is therefore based on treatment of the underlying primary cause of the cirrhosis, which could be any of the other mentioned HCC risk factors.
Recommended treatment for an individual, that is at risk of having or developing HCC and predicted not to have HCC, with chronic hepatitis B or C may include: antiviral medications, such as entecavir, tenofovir, lamivudine, adefovir and telbivudine; interferon injections, such as Interferon alfa- 2b (Intron A); or a liver transplant. Recommended treatment for an individual, that is at risk of having or developing HCC and predicted not to have HCC, with alcoholic liver disease may include: professional help to stop or reduce the drinking of alcohol, possibly including help with withdrawal symptoms (which may include medications such as benzodiazepine and psychological therapy such as cognitive behavioural therapy), possibly including help with relapse prevention (which can include psychological therapy and medications such as acamprosate, disulfiram, or naltrexone), possibly including referral to a self-help group; nutritional support to promote a more healthy diet; treatments to reduce inflammation of the liver, such as corticosteroids; or a liver transplant. Recommended treatment for an individual, that is at risk of having or developing HCC and predicted not to have HCC, with NAFLD may include: professional help to promote weight loss through a healthy diet and/or exercise; treatment to reduce blood cholesterol and triglycerides, such as statins (e.g. atorvastatin, rosuvastatin, simvastatin, fluvastatin, pravastatin, lovastatin), treatment to control diabetes, such as insulin, if the individual is also diabetic; treatment to reduce blood pressure, such as a diet change to lower salt intake and/or medications such as angiotensin-converting enzyme inhibitors (e.g. enalapril, lisinopril, perindopril, ramipril), angiotensin-2 receptor blockers (e.g. candesartan, irbesartan, losartan, valsartan, olmesartan), calcium channel blockers (e.g. amlodipine, felodipine, nifedipine, diltiazem, verapamil), diuretics (e.g. indapamide, bendroflumethiazide), beta blockers (e.g. 
atenolol, bisoprolol), if the individual has hypertension; or a liver transplant. Recommended treatment for an individual, that is at risk of having or developing HCC and predicted not to have HCC, with NASH may include: professional help to promote weight loss through a healthy diet and/or exercise; treatment to reduce blood cholesterol and triglycerides, such as statins (e.g. atorvastatin, rosuvastatin, simvastatin, fluvastatin, pravastatin, lovastatin), treatment to control diabetes, such as insulin, if the individual is also diabetic; treatment to reduce blood pressure, such as a diet change to lower salt intake and/or medications such as angiotensin-converting enzyme inhibitors (e.g. enalapril, lisinopril, perindopril, ramipril), angiotensin-2 receptor blockers (e.g. candesartan, irbesartan, losartan, valsartan, olmesartan), calcium channel blockers (e.g. amlodipine, felodipine, nifedipine, diltiazem, verapamil), diuretics (e.g. indapamide, bendroflumethiazide), beta blockers (e.g. atenolol, bisoprolol), if the individual has hypertension; or a liver transplant. Recommended treatment for an individual, that is at risk of having or developing HCC and predicted not to have HCC, with primary biliary cholangitis may include: treatment that could slow the progression of the disease, prevent complications, improve liver function and/or reduce liver scarring, such as ursodeoxycholic acid, obeticholic acid, fibrates, or budesonide; treatments to control symptoms, such as antihistamines, cholestyramine, rifampin, sertraline, or opioid antagonists; or a liver transplant. Recommended treatment for an individual, that is at risk of having or developing HCC and predicted not to have HCC, with primary hemochromatosis may include: therapeutic phlebotomy, i.e. periodical removal of blood; or chelation therapy, such as deferoxamine or deferasirox. 
Recommended treatment for an individual, that is at risk of having or developing HCC and predicted not to have HCC, with auto-immune hepatitis may include: corticosteroid medications such as prednisone or budesonide; or immunosuppressants, such as azathioprine. Recommended treatment for an individual, that is at risk of having or developing HCC and predicted not to have HCC, with alpha-1 antitrypsin deficiency may include: help with maintaining normal nutrition; or a liver transplant. Recommended treatment for an individual, that is at risk of having or developing HCC and predicted not to have HCC, with Wilson’s disease may include: a diet change to reduce copper intake; copper chelating agents, such as penicillamine and trientine; copper absorption reducers, such as zinc acetate; or a liver transplant. The invention provides a method of treating an individual with HCC, comprising:
- typing of a sample from said individual using a method of typing a sample of an individual with HCC according to the invention;
- treating the individual that is typed as having HCC with a curative treatment; and
- treating the individual that is typed as not having HCC with a treatment strategy related to the individual’s underlying risk factor for HCC.
Curative HCC treatment preferably comprises liver transplantation, ablation, surgical resection or a combination thereof. Palliative treatment preferably comprises chemoembolization, radioembolization, systemic therapy, or a combination thereof. For the purpose of clarity and a concise description, features are described herein as part of the same or separate aspects and preferred embodiments thereof; however, it will be appreciated that the scope of the invention may include embodiments having combinations of all or some of the features described.
The invention will now be illustrated by the following examples, which are provided by way of illustration and not of limitation, and it will be understood that many variations in the methods described and the amounts indicated can be made without departing from the spirit of the invention and the scope of the appended claims. EXAMPLES Example 1: retrospective case-control study for HCC biomarker discovery in NASH patients Materials and methods Patient inclusion criteria EDTA plasma samples were obtained from the biobank of the Transplant Centre of the LUMC. The selected study population was composed of patients with cirrhosis or HCC that were over 18 years of age, with a history of NAFLD (non-alcoholic fatty liver disease) or NASH (non-alcoholic steatohepatitis); this was preferably also the primary liver disease etiology. The NASH etiology was chosen because it is expected to become more dominant in the near future and because early-stage HCC detection in NASH is currently more difficult (Straś et al., 2020. Clin Exp Hepatol 6: 170-175). Inclusion criteria for the cirrhosis control samples were pathologically diagnosed liver cirrhosis, with no diagnosis of HCC within the next year after the sample was taken. Criteria for the HCC case samples were pathologically diagnosed HCC, with the samples being taken prior to treatment or intervention, and without other malignancies present at the time of diagnosis. Patients were excluded if they had a simultaneous diagnosis of another unrelated liver disease, such as hemochromatosis, primary biliary cholangitis, or autoimmune hepatitis. Preferably, the selected patients had not been subjected to chemotherapy, biologic therapy, radiation therapy, or immunosuppressants during the 5 years before drawing a plasma sample.
SomaScan assay characteristics The collected plasma samples were analysed using the SomaScan assay (v4.1), a fee-for-service aptamer-based proteomics platform provided by the American company SomaLogic, which provides 7,335 protein concentration measurements across a large number of different biological pathways (https://somalogic.com/somascan-assay/). The provided concentration data is given in relative fluorescence units (RFU), not absolute units such as mg/mL or mol/mL. Each of the 7,335 measurements relates to a unique aptamer sequence, but these are linked to only 6,414 unique Uniprot entries, with 34 aptamers having no related Uniprot entry. This is because a number of Uniprot entries are present multiple times in the dataset: 5,633 Uniprot entries are present once; 702 twice; 64 three times; 11 four times; 1 five times; 1 six times; 1 eight times; and 1 nine times. Similar overlap is seen for unique Entrez gene symbols (EGSs), protein target names and full names, but with small variations in the number of unique entries. The proteins most frequently present in the dataset are ‘heat shock 70 kDa protein 1A’ (nine times) and ‘Tenascin’ (eight times). Data analysis: individual proteins To identify a panel of candidate biomarkers for early-stage HCC detection in NASH patients, data analysis was performed on the proteomics dataset using Python. Distinguishing NASH-HCC patients from NASH patients was treated as a classification problem. For each individual protein we calculated the p-value, AUC, and fold change (FC) in expression in NASH-HCC patients versus NASH patients. P-values were calculated using independent two-sided t-tests and values < 0.01 were considered significant. P-values were corrected for multiple testing using the Bonferroni (Equation 1) and Benjamini-Hochberg corrections. In order to obtain more insight into the proteomic data, the data covariance was examined by conducting a principal component analysis (PCA). 
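The results of this PCA were then used to establish an alternative, less conservatively adjusted significance level to correct for multiple testing (Equation 2). Uncorrected p-values were used in subsequent analysis steps.
$\alpha_{\text{Bonferroni}} = \alpha / n_{\text{features}}$ (Equation 1)
$\alpha_{\text{PC}} = \alpha / n_{\text{PCs}}$ (Equation 2)
Here $\alpha$ is the unadjusted significance level, $n_{\text{features}}$ is the number of protein measurements, and $n_{\text{PCs}}$ is the number of principal components needed to account for 99% of the data variance.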
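The per-protein statistics and the significance-level adjustments described above can be sketched in Python as follows. This is a minimal illustration, not the script used in the study: the function names are hypothetical, the fold change is assumed to be a ratio of cohort means, the AUC is assumed to be direction-agnostic, and the t-test uses the SciPy default of equal variances.

```python
import numpy as np
from scipy import stats
from sklearn.metrics import roc_auc_score


def protein_stats(rfu_case, rfu_control):
    """Return (p-value, AUC, fold change) for one protein measurement."""
    rfu_case = np.asarray(rfu_case, dtype=float)
    rfu_control = np.asarray(rfu_control, dtype=float)
    # Independent two-sided t-test (equal variances assumed, SciPy default).
    _, p_value = stats.ttest_ind(rfu_case, rfu_control)
    labels = np.r_[np.ones(len(rfu_case), dtype=int),
                   np.zeros(len(rfu_control), dtype=int)]
    auc = roc_auc_score(labels, np.r_[rfu_case, rfu_control])
    auc = max(auc, 1.0 - auc)  # assumption: report AUC regardless of direction
    fold_change = rfu_case.mean() / rfu_control.mean()  # assumption: ratio of means
    return p_value, auc, fold_change


def adjusted_alpha(alpha, n_tests):
    """Equations 1 and 2: divide alpha by the number of features or of PCs."""
    return alpha / n_tests


def benjamini_hochberg(p_values, alpha):
    """Boolean mask of p-values that pass the Benjamini-Hochberg procedure."""
    p = np.asarray(p_values, dtype=float)
    order = np.argsort(p)
    m = len(p)
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    significant = np.zeros(m, dtype=bool)
    if passed.any():
        last = np.max(np.nonzero(passed)[0])  # largest rank that passes
        significant[order[: last + 1]] = True
    return significant
```

Assuming the unadjusted level of 0.01 used for individual p-values, `adjusted_alpha(0.01, 7335)` is roughly 1.4e-6 and `adjusted_alpha(0.01, 73)` is roughly 1.4e-4.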
Data analysis: protein ratios Next, biomarker ratios were generated for combinations of candidate biomarkers of which one was elevated in NASH-HCC samples (i.e. FC>1) and one was lowered in NASH-HCC samples (i.e. FC<1) compared to NASH samples. Only a subselection of proteins was included, to keep the computation time feasible; this subselection was based on p-value and AUC cutoffs. Biomarker ratios were made for the 360 proteins with a p-value <0.05 and/or an AUC >0.65. For each patient sample, ratios were generated by dividing the expression of the elevated NASH-HCC candidate biomarker by that of the lowered candidate biomarker. P-values and AUCs were subsequently calculated for each biomarker ratio. An overview of the data analysis workflow can be seen in Figure 1. Data analysis: support vector machine models Next, proteins were combined into detection models, using the support vector machine (SVM) supervised learning algorithm. This algorithm was selected because it is a powerful tool in machine learning, generally has a good off-the-shelf performance, and is relatively easy to explain to clinicians (Noble, 2006. Nat Biotechnol 24: 1565-1567). The SVM algorithm divides the patients into two classes, those with and those without HCC, by generating a hyperplane, otherwise known as a decision plane or decision boundary. Support vectors are the data points closest to the hyperplane that are used to define the decision boundary. The goal of the algorithm is to maximize the margin, i.e. the distance from the hyperplane to the support vectors, thereby maximally separating the two classes. SVM analysis was performed on the data using the Python package scikit-learn (v1.1.1) (Pedregosa, 2011. J Mach Learn Res 12: 2825-2830). SVM models were trained on combinations of two, three, and four proteins. As with the biomarker ratios, the analyses were only conducted on subselections of proteins to limit the computation time.
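The generation of biomarker ratios described earlier in this section can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the dictionary-based data layout are hypothetical, and each ratio divides the protein with FC>1 by the protein with FC<1 per sample.

```python
from itertools import product

import numpy as np


def biomarker_ratios(rfu, fold_change):
    """Build per-sample ratios of every elevated protein over every lowered one.

    rfu: dict mapping protein name -> per-sample levels (hypothetical layout).
    fold_change: dict mapping protein name -> FC in NASH-HCC versus NASH.
    """
    elevated = [name for name in rfu if fold_change[name] > 1]
    lowered = [name for name in rfu if fold_change[name] < 1]
    return {
        (up, down): np.asarray(rfu[up], dtype=float) / np.asarray(rfu[down], dtype=float)
        for up, down in product(elevated, lowered)
    }
```

Each resulting array can then be scored with the same p-value and AUC calculation that is used for individual proteins.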
Proteins to be combined were selected based on p-value and AUC cutoffs: combinations of two proteins were made for the 185 proteins with a p-value <0.03 and/or an AUC >0.67; combinations of three proteins for the 61 proteins with a p-value <0.01 and/or an AUC >0.7; and combinations of four proteins for the 41 proteins with a p-value <0.006 and/or an AUC >0.705 (see Figure 1). To illustrate the reason for selecting fewer proteins for combinations of four proteins than for combinations of three or two proteins, selections of 185 and 61 proteins have 47,239,010 and 521,855 unique combinations of four proteins, respectively. For each protein combination, SVM models were generated for 10 random states. For each random state, the dataset was randomly split 50/50 into a training set and a test set. Protein expression data of the training and test sets were scaled separately to zero mean and unit variance. For each random state, three SVM parameters were fine-tuned: the algorithm, the kernel coefficient gamma, and the regularisation parameter C. The algorithm is the function type used for the kernel; either ‘linear’ or ‘radial basis function’ (rbf) was used. The kernel coefficient gamma defines how far the influence of a single training example reaches, with low values meaning ‘far’ and high values meaning ‘close’; the inputs used were 0.01, 0.1, 1 or scale, with scale corresponding to 1/(n_features*X.var()). Lastly, the regularisation parameter C indicates the trade-off between correct classification of training samples and maximisation of the decision function’s margin, with larger values of C leading to a smaller margin if the decision function is better at classifying all training points correctly, while a lower C promotes a larger margin and a simpler decision function at the cost of training accuracy; for C the inputs ranged from 0.1 to 1 with intervals of 0.1.
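The per-random-state procedure described above can be sketched with scikit-learn as follows. This is a hedged illustration rather than the script used in the study: the function names are hypothetical, the stratified split is an assumption, and the AUC is computed from the SVM decision function on the test set.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import ParameterGrid, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two kernel types x four gamma inputs x ten C values.
PARAM_GRID = ParameterGrid({
    "kernel": ["linear", "rbf"],
    "gamma": [0.01, 0.1, 1.0, "scale"],
    "C": [round(0.1 * i, 1) for i in range(1, 11)],
})


def best_auc_for_split(X, y, random_state):
    """Highest test-set AUC over all parameter settings for one 50/50 split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, random_state=random_state,
        stratify=y)  # assumption: stratification is not stated in the text
    # Training and test sets are scaled separately, as described in the text.
    X_train = StandardScaler().fit_transform(X_train)
    X_test = StandardScaler().fit_transform(X_test)
    best = 0.0
    for params in PARAM_GRID:
        model = SVC(**params).fit(X_train, y_train)
        best = max(best, roc_auc_score(y_test, model.decision_function(X_test)))
    return best


def summarise(X, y, n_states=10):
    """Maximum, mean and standard deviation of the best AUC over random states."""
    aucs = [best_auc_for_split(X, y, state) for state in range(n_states)]
    return max(aucs), float(np.mean(aucs)), float(np.std(aucs))
```

Here `X` would hold the levels of the two, three or four proteins in one combination and `y` the HCC labels.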
For clarity, this means that two different algorithm options, four different gamma values, and ten different C values were tested, resulting in a total of 80 parameter setting combinations to be tested for each random state. For each random state per protein combination, the optimal parameter settings were then selected based on the SVM model with the highest AUC. For each protein combination, the maximum AUC, average AUC, and standard deviation (SD) of the AUC of the 10 random states were calculated. In addition, to gain additional insights into the different performances of the best SVM models combining two, three, and four proteins, the maximal Youden index was calculated, which is defined as:
$J_{\max} = \max(\text{sensitivity} + \text{specificity} - 1) = \max(\text{TPR} + \text{TNR} - 1)$ (Equation 3)
The model’s sensitivity (or true positive rate, TPR) relates to the chance that patients who are positive (i.e. who have HCC) are classified correctly, while the model’s specificity (or true negative rate, TNR), indicates the chance of negative patients (i.e. who do not have HCC) to be correctly classified. The maximal Youden index thus provides the optimal trade-off between sensitivity and specificity for a specific model. Data analysis: random forest models The first sets of generated SVM models contained proteins which were selected based on calculated p-values and AUCs. It was noted that using only this approach to select proteins for inclusion in the generated SVM models has a risk of introducing a bias. To address this risk, the random forest machine learning algorithm was used to simultaneously rank all 7,335 protein measurements based on their predictive performance in an unbiased manner. First, the dataset was randomly split 50/50 into a training set and a test set.
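The maximal Youden index of Equation 3 can be computed from model scores as in the following minimal sketch. The function name is hypothetical, and the set of candidate thresholds (every distinct score, with scores at or above the threshold classified as positive) is an assumption.

```python
import numpy as np


def max_youden(y_true, scores):
    """Return (J_max, TNR, TPR) over all score thresholds."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    best = (-1.0, None, None)
    for threshold in np.unique(scores):
        predicted = scores >= threshold
        tpr = np.mean(predicted[y_true == 1])   # sensitivity
        tnr = np.mean(~predicted[y_true == 0])  # specificity
        j = tpr + tnr - 1
        if j > best[0]:
            best = (j, tnr, tpr)
    return best
```

For example, a TNR of 0.85 and a TPR of 0.81 give J = 0.85 + 0.81 - 1 = 0.66.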
The Boruta ranking method was used to select proteins with better-than-random predictive performance. This method adds copies of all features, so-called Shadow Features, and shuffles the values of the newly created features to remove their correlation with the response. Subsequently, a Random Forest Classifier is built on the Shadow Features to determine their importance quantified by a Z-score. Then, the algorithm assesses if the original features have a higher importance than the maximum importance of the Shadow Features. If the Z-score of an original feature is higher, the feature is considered significant and retained, otherwise the feature is dropped. This process is repeated until either a specified number of iterations has been reached or when all original features have been retained or dropped. The maximum number of iterations per run was set to 100. Because the output of retained important features can vary per run, the function was executed 1000 times, providing 1000 sets of protein markers. The protein markers were then ranked on frequency of selection. In order to keep the computation time for these additional SVM models feasible as well, the proteins were considered to have a significant predictive performance if they appeared in at least 1% of the random forest selections; with 1000 iterations, this corresponds with a frequency of 10. This way, an alternative selection of proteins was established for which SVM models with combinations of 2, 3 or 4 proteins were generated as well using the same parameters as mentioned above (see Figure 1). Data analysis: candidate biomarker selection After obtaining the computed results, the generated SVM models were combined and sorted in descending order by their determined average model AUC. From the combined model results, only those models were selected with an average AUC of at least 0.83. This number was based on the reported AUC for the protein PIVKA-II in a similar study (Best et al., 2020. 
Clin Gastroenterol Hepatol 18: 728-735); PIVKA-II is an extensively studied biomarker for HCC (Cui et al., 2016. Cancer Invest 34: 459-464), therefore it was chosen as a benchmark that models from this study will need to surpass in order to qualify as candidates for the development of an improved screening test. A list of potential HCC biomarkers was then assembled by gathering all unique proteins that were present in the list of models meeting this cutoff. The discovered protein biomarkers were then ranked to provide an indication of the diagnostic potential of individual markers and to distinguish which markers might be prioritised in the preparation of the follow-up study. This ranking was achieved through the following steps. (1) First the list of SVM models was expanded to include all possible combinations of 2, 3, and 4 of the identified proteins, in order to make a fair comparison. For model combinations which had not been tested yet, new SVM results were obtained using the same approach as described above. (2) Next, the SVM models were combined and sorted by highest average AUC. (3) For each model in the resulting selection, a number equal to the length of the list of models minus the position of the model in the list was added to the performance scores of the markers that the model was composed of. For example, if 100 models were in the list, then a score of +100 would be added to the performance score of all proteins present in the best model, all proteins in the second-best model would gain +99 to their performance score and so on, until finally +1 would be added to the scores of all proteins in the last model of the list. (4) Finally, the markers were sorted by their respective performance score. This workflow can also be seen in Figure 1.
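Steps (3) and (4) of the ranking can be sketched as follows. The function name and the tuple-based representation of a model are hypothetical; positions are counted from zero so that the best of 100 models contributes +100 and the last contributes +1, in line with the example above.

```python
def protein_scores(models_sorted):
    """Score proteins from a list of models sorted by descending average AUC.

    models_sorted: list of tuples of protein names, best model first.
    Returns (protein, score) pairs sorted by descending score.
    """
    n_models = len(models_sorted)
    scores = {}
    for position, proteins in enumerate(models_sorted):
        for protein in proteins:
            # Best model adds n_models, the last model adds 1.
            scores[protein] = scores.get(protein, 0) + (n_models - position)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A protein that appears in many highly ranked models therefore accumulates a high score, which is why the scores depend on the total number of models in the list.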
Intercorrelation and confounder analysis To find out whether the selected biomarkers showed strong associations with each other or not, the intercorrelation between the selected protein biomarkers was assessed using a correlation matrix; the result was visualised by generating a correlation heatmap. Furthermore, it was investigated whether the proteins showed correlation with any of the covariates for which data was available. As can be seen in the overview of the patient characteristics of the case (NASH-HCC) and control (NASH) cohorts (Table 2), the NASH-HCC cohort contains both significantly older people and more males compared to the NASH control cohort. Therefore, it was considered important to investigate whether the identified proteins were linked to any covariates rather than to HCC presence. Using ordinary least squares (OLS) regression, statistically significant associations were analysed between the proteins in the biomarker panel and the covariates for which data was available: patient age, sex, body mass index, and a number of clinical measures for HCC stage and liver damage. Expanded support vector machine models Lastly, it was investigated whether the predictive performance of the SVM models could be enhanced further if they combine up to eight instead of four proteins. Furthermore, it was investigated whether adding patient age and sex as model input variables could contribute as well. For obtaining the SVM models, the same workflow was followed as detailed above, but with the following alterations: as it was observed that nearly all selected SVM models from the previously generated set made use of a linear kernel type and a scaled kernel coefficient gamma, the variable SVM parameters were limited to just the regularization parameter C (ranging from 0.1 to 1 with intervals of 0.1). This made the script significantly more efficient as the number of models generated per protein combination per random state was reduced from 80 to just 10. 
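The intercorrelation and covariate checks described in this section can be sketched as follows. This is a simplified illustration: the function names are hypothetical, the Pearson coefficient is an assumption (the text only mentions a correlation matrix), and `scipy.stats.linregress` is used here as a single-covariate OLS fit rather than the statsmodels routine listed among the packages.

```python
import numpy as np
from scipy import stats


def correlation_matrix(levels):
    """Pearson correlation between proteins; levels has shape (samples, proteins)."""
    return np.corrcoef(np.asarray(levels, dtype=float), rowvar=False)


def covariate_association(protein_levels, covariate):
    """OLS fit of one protein on one covariate; returns (slope, p-value)."""
    fit = stats.linregress(covariate, protein_levels)
    return fit.slope, fit.pvalue
```

A small p-value for, say, patient age would indicate that the protein level is associated with that covariate and not necessarily with HCC presence alone.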
To further limit the required computation time, only the 18 best-performing proteins were included in the generated models. Python All analyses were performed in Jupyter Notebook (v6.5.2), which uses the programming language Python (v3.9.12). The package Canopy (v0.4.0) was used to load the data provided by SomaLogic. Other packages that were used extensively included: pandas (v1.4.4); numpy (v1.23.5); scipy (v1.9.3); sklearn (v1.0.2); statsmodels (v0.13.2); matplotlib (v3.6.2); seaborn (v0.12.2); and dataframe-image (v0.1.10). Results Patient characteristics The characteristics of the 78 patients that were included in the clinical study are reported in Table 2. The NASH control cohort and the NASH-HCC case cohort included 40 (51.3%) and 38 (48.7%) patients respectively. Based on the characteristics, it could be concluded that there is a bias in the data, with NASH-HCC patients being older (p=1.26e-6) and more often male (p=6.70e-3) compared with NASH control patients. No significant bias was seen for BMI (body mass index) (p=0.452). The majority of the NASH-HCC cases were in an early stage (60.5%), with smaller numbers in very early (18.4%) and intermediate (15.8%) stages; only two cases were in an advanced stage (5.3%) and no cases were in the end-stage. Combined, the patients with HCC in the early and very early stages accounted for 78.9% of the case group. Furthermore, no significant p-values (<0.05) were found for: the MELD score (Model for End-Stage Liver Disease); Child Pugh score, which estimates cirrhosis severity; Fibrosis score, which estimates the amount of liver scarring; plasma level measurements of bilirubin, albumin, creatine, AST, ALT, or thrombocytes, which are all clinical indicators for liver disease; or INRs (International Normalized Ratios), i.e. the degree of relatively faster or slower blood clotting, which is also an indicator of liver disease. Table 2. Overview of patient characteristics in the NASH-HCC and NASH cohorts.
Abbreviations: IQR = interquartile range; NASH = nonalcoholic steatohepatitis; HCC = hepatocellular carcinoma; BCLC = Barcelona Clinic Liver Cancer staging system; MELD = Model For End-Stage Liver Disease; AST = aspartate aminotransferase; ALT = alanine transaminase.
Covariate                                        NASH-HCC (N=38)     NASH (N=40)         P-value
Age (years), median (range)                      67.9 (47.2-80.2)    56.0 (21.2-76.5)    1.26e-6
Sex                                                                                      6.70e-3
  Female                                         3 (7.9%)            13 (32.5%)
  Male                                           35 (92.1%)          27 (67.5%)
Body Mass Index (kg/m²), median (IQR)            30.7 (6.1)          30.3 (6.9)          0.452
BCLC stage
  Stage 0 (very early)                           7 (18.4%)           0 (0.0%)
  Stage A (early)                                23 (60.5%)          0 (0.0%)
  Stage B (intermediate)                         6 (15.8%)           0 (0.0%)
  Stage C (advanced)                             2 (5.3%)            0 (0.0%)
  Stage D (end-stage)                            0 (0.0%)            0 (0.0%)
MELD score, median (IQR)                         9.0 (9.8)           8.0 (8.3)           0.825
Child Pugh score                                                                         0.142
  A (well-compensated disease)                   29 (76.3%)          24 (60.0%)
  B (functional compromise)                      7 (18.4%)           12 (30.0%)
  C (decompensated disease)                      2 (5.3%)            4 (10.0%)
Fibrosis score                                                                           0.0258
  0 (no liver scarring)                          0 (0.0%)            0 (0.0%)
  1 (little liver scarring)                      9 (23.7%)           2 (5.0%)
  2 (moderate liver scarring)                    6 (15.8%)           10 (25.0%)
  3 (severe liver scarring)                      13 (34.2%)          7 (17.5%)
  4 (late-stage liver scarring)                  10 (26.3%)          21 (52.5%)
Plasma bilirubin (µmol/L), median (IQR)          15.5 (12.8)         14.5 (38.25)        0.194
Plasma albumin (g/L), median (IQR)               41.0 (7.50)         46.0 (11.5)         0.529
Plasma creatine (µmol/L), median (IQR)           80.5 (36.5)         85.0 (28.5)         0.198
Plasma AST (U/L), median (IQR)                   51.0 (44.5)         50.5 (42.5)         0.810
Plasma ALT (U/L), median (IQR)                   45.0 (20.3)         41.0 (71.3)         0.129
Plasma thrombocytes (10^9/L), median (IQR)       139.5 (107.5)       179.0 (121.5)       0.576
International Normalized Ratio, median (IQR)     1.20 (0.18)         1.10 (0.30)         0.896
Data analysis: individual proteins To identify a panel of candidate biomarkers for early-stage HCC detection, first, the performance was assessed of individual proteins in distinguishing NASH and NASH-HCC patients.
The volcano plot in Figure 2 illustrates that 46 proteins were able to significantly differentiate NASH and NASH-HCC patients (p<0.01). However, no individual proteins could detect HCC upon correcting for multiple testing with either the Bonferroni or the Benjamini-Hochberg correction (data not shown). The ‘wing’ patterns displayed in the bottom left and right of the volcano plot were caused by high variability; for instance, proteins on the bottom right were upregulated in only a few NASH-HCC patients or even just a single patient. Table 3 presents the top 10 proteins in terms of p-value and AUC. The most significant protein, IGFALS, had a p-value of 6.35e-5 and an AUC of 0.75. The maximum AUC of 0.75 was achieved by Glypican 3 and IGFALS. Distribution plots of the protein levels among the two patient cohorts can be found in Figure 3. In addition, ROC (receiver operating characteristic) curves, which show the relation between the true positive rate and the corresponding false positive rate, were plotted for Glypican 3, IGFALS, and AFP (Figure 4); AFP was added for reference, since this protein is a known HCC biomarker that is sometimes used for diagnosis but which shows a poor model performance in earlier disease stages. To gain more insights from the proteomic data, a PCA was conducted to examine the data covariance. As can be seen in Figure 5, the covariation was high: out of the total of 7,335 principal components (PCs) that were generated to describe the data variation (i.e. the number of protein measurements), only the first 64 and 73 were needed to describe more than 95% and 99% of the data variance, respectively. The high data covariance illustrates that a Bonferroni-adjusted significance level, as seen in Equation 1, is a rather conservative multiple testing adjustment. 
A different approach to adjust for multiple testing could be to substitute the number of features in Equation 1 with the number of PCs needed to account for 99% of the data variance (Equation 2), which increases the significance level by a factor of approximately 100 (7,335 features versus 73 PCs). This alternative significance level was added to the volcano plot in Figure 2, to establish whether a PC-based significance level would yield more results. Only one protein was found to have a significant p-value using the PC-based significance level adjustment: IGFALS (p=6.35e-5). Table 3. Top 10 individual biomarkers with the highest performance in terms of p-value (left) or in terms of AUC (right) for detecting HCC in NASH patients. AUC = area under the curve; FC = fold change.
Figure imgf000046_0001
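The count of principal components needed to exceed a given fraction of the data variance, as used for the PC-based significance level, can be sketched as follows. This is a minimal NumPy illustration under assumptions: the function name is hypothetical, and the data are assumed to be standardised per protein before the decomposition, which the text does not state.

```python
import numpy as np


def components_needed(X, fraction):
    """Smallest number of PCs whose cumulative explained variance exceeds fraction.

    X has shape (samples, features); columns must not be constant.
    """
    X = np.asarray(X, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # assumption: z-scored features
    singular_values = np.linalg.svd(X, compute_uv=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    cumulative = np.cumsum(explained)
    return int(np.searchsorted(cumulative, fraction, side="right") + 1)
```

Calling `components_needed(X, 0.95)` and `components_needed(X, 0.99)` on the proteomics matrix would correspond to the two counts reported above.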
Data analysis: protein ratios Following the individual protein analysis, biomarker signatures were generated by taking the ratios of biomarker candidates that were lowered in NASH-HCC cases (FC<1) over those that were elevated (FC>1). Their ability to distinguish NASH and NASH-HCC patients was evaluated based on p-value and AUC. The top 10 results are reported in Table 4. Most top 10 models ranked by p-value also performed relatively well in terms of AUC when compared with the top 10 models ranked by AUC, all of them exceeding 0.78 and some even reaching an AUC of 0.83. On the other hand, some of the top models ranked by AUC performed poorly in terms of p-value; multiple models showed p-values higher than 0.05. The best-performing biomarker ratio in terms of P-value combined proteins GON2 and WISP-2, producing a p-value of 4.6e-7 and an AUC of 0.82. The biomarker ratio with the highest performance in terms of AUC combined proteins HMGR and Glypican 3, producing a decent AUC of 0.84, but a poor p-value of only 0.17. For both top ratios the distribution and ROC curve plots are presented in Figure 6 and Figure 7, respectively. Table 4. Top 10 biomarker ratios with the highest performance in terms of P-value (left) or in terms of AUC (right) for detecting HCC in NASH patients. AUC = area under the curve; FC = fold change.
Sorted by P-value
No.  Proteins                             P-value   AUC    FC
1    GON2 | WISP-2                        4.6e-7    0.82   1.5
2    UB2L6 | WISP-2                       5.5e-7    0.83   1.4
3    KLRG2 | IL-1 sRI                     7.0e-7    0.81   1.9
4    GON2 | OPG                           8.6e-7    0.82   1.6
5    KLRG2 | Omentin                      1.1e-6    0.80   1.9
6    DAZP1 | TIMP-4                       1.1e-6    0.80   2.3
7    DCNL3 | WISP-2                       1.3e-6    0.82   1.4
8    XYLT2 | WISP-2                       1.6e-6    0.81   1.4
9    KLRG2 | WISP-2                       1.6e-6    0.83   1.8
10   MGAT1 | WISP-2                       1.8e-6    0.78   1.4
Sorted by AUC
No.  Proteins                             P-value   AUC    FC
1    HMGR | Glypican 3                    1.7e-1    0.84   5.5
2    PKD2 | Glypican 3                    1.6e-1    0.84   5.7
3    HMGR | CHIN                          4.5e-2    0.84   1.5
4    Carbonic anhydrase 9 | Glypican 3    1.3e-1    0.84   4.8
5    KLRG2 | WISP-2                       1.6e-6    0.83   1.8
6    KLRG2 | PEX14                        6.7e-5    0.83   2.3
7    RM38 | PEX14                         1.8e-5    0.83   1.9
8    PKD2 | PEX14                         1.0e-5    0.83   2.1
9    RN215 | Glypican 3                   1.9e-1    0.83   7.2
10   RAB38 | Glypican 3                   1.8e-1    0.83   6.6
Data analysis: random forest models To select candidate protein biomarkers for inclusion in SVM models using an alternative and unbiased approach, the random forest machine learning algorithm was used in combination with the Boruta feature selection method. Proteins were selected for inclusion in the generated SVM models if they appeared in at least 1% of the random forest selections; for 1000 iterations this corresponded with a frequency of 10 or more. This resulted in a selection of 32 proteins. The protein IGFALS was most frequently classified as having significant predictive importance, being selected in 998 out of 1000 BorutaPy runs. Glypican 3 and ASAP2 were ranked second and third, being selected 989 and 895 out of 1000 times respectively. Number four, TMED9, was selected less often, namely 685 times. After TMED9, the number of selections rapidly decreased, with the last 7 markers being selected fewer than 100 times by the Boruta method.
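The shadow-feature idea behind the Boruta selection, and the counting of how often each protein is retained across repeated runs, can be illustrated with the simplified sketch below. It is not the BorutaPy implementation used in the study: it performs a single comparison per run against the most important shuffled feature, omits the iterative Z-score testing, and uses hypothetical function names and an assumed forest size.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def shadow_feature_hits(X, y, random_state=0):
    """Flag features whose importance exceeds that of every shuffled shadow copy."""
    rng = np.random.default_rng(random_state)
    # Shadow features: each column shuffled to break its link with the response.
    shadows = np.column_stack([rng.permutation(column) for column in X.T])
    forest = RandomForestClassifier(n_estimators=200, random_state=random_state)
    forest.fit(np.hstack([X, shadows]), y)
    importances = forest.feature_importances_
    n_features = X.shape[1]
    return importances[:n_features] > importances[n_features:].max()


def selection_frequency(X, y, n_runs):
    """Number of runs in which each feature beat all shadow features."""
    hits = np.zeros(X.shape[1], dtype=int)
    for run in range(n_runs):
        hits += shadow_feature_hits(X, y, random_state=run)
    return hits
```

With 1000 runs, a threshold of at least 1% would correspond to keeping features with `hits >= 10`.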
Of note is that this set of 32 proteins showed a large overlap with the proteins that were selected based on calculated AUCs and p-values; 20, 22, and 28 out of the 32 proteins were also present in the sets of 41, 61, and 185 proteins selected for inclusion in SVM models combining 4, 3, and 2 proteins, respectively (Figure 1). This overlap limited the number of additional SVM models to be generated following these results. Data analysis: support vector machine models In order to create a detection model with better performance than PIVKA-II, by a larger margin than the protein ratios, support vector machine models were generated for combinations of a given number of proteins, and AUCs on the test set were calculated for 10 different random 50/50 data splits. Firstly, the results will be reported of the models combining two proteins. The detection model with the highest average AUC combined proteins Glypican 3 and ARL4D, thereby achieving an average AUC of 0.88 (Table 5). Figure 8A shows the ROC curves of this model for the individual random states as well as the mean ROC curve and standard deviation. Even though the performance of this model in terms of average AUC exceeded that of PIVKA-II, the improvement was marginal. The best-performing detection model combining three proteins was trained on the same proteins as the best-performing model combining two proteins, namely Glypican 3 and ARL4D, as well as protein HMGR (Table 5, Figure 8B). The addition of HMGR resulted in higher performance as indicated by an average AUC of 0.90. Lastly, the detection model of four proteins that achieved the highest average AUC combined proteins Glypican 3, PEX14, PKD2, and KLRG2 (Table 5, Figure 8C), with an average AUC of 0.92. Interestingly, Table 5 shows considerable overlap in the proteins that were used in the 10 best-performing detection models for combinations of four proteins. 
To gain additional insights into the performances of the SVM models combining two, three, and four proteins, particularly when compared to each other, the so-called Youden’s index was calculated for the best performing models. As can be seen in the ROC curves in Figure 8, the optimal Youden indices for the best SVM models combining two, three and four proteins were, respectively: 0.66, with a corresponding TNR and TPR of 0.85 and 0.81; 0.69, with a corresponding TNR and TPR of 0.83 and 0.86; and 0.72, with a corresponding TNR and TPR of 0.90 and 0.82. Thus, both the AUC and Youden’s J appeared to increase with additional proteins. Table 5. Top 10 SVM models for combinations of two, three, and four proteins with the highest performance in terms of average AUC for detecting HCC in NASH patients. Mean AUC and SD were taken over 10 different random data-splits. AUC = area under the curve; SD = standard deviation; Jmax = optimal Youden index, with the corresponding true negative rate (TNR) and true positive rate (TPR).
Comb.  No.  Protein combination                          Mean AUC  SD    Jmax (TNR, TPR)
2      1    Glypican 3 | ARL4D                           0.88      0.03  0.66 (0.85, 0.81)
2      2    Glypican 3 | NKG2E                           0.86      0.04  0.58 (0.80, 0.78)
2      3    Glypican 3 | HMGR                            0.83      0.05  0.49 (0.67, 0.82)
2      4    Glypican 3 | PKD2                            0.83      0.05  0.51 (0.80, 0.72)
2      5    Glypican 3 | IL-3                            0.83      0.05  0.53 (0.75, 0.78)
2      6    Glypican 3 | NDE1                            0.82      0.04  0.51 (0.89, 0.63)
2      7    Glypican 3 | LR2BP                           0.82      0.05  0.50 (0.81, 0.70)
2      8    RAB38 | Carbohydrate sulfotransferase 9      0.82      0.03  0.57 (0.73, 0.84)
2      9    IMPA1 | CP2CJ                                0.82      0.05  0.48 (0.60, 0.88)
2      10   Glypican 3 | MSTN1                           0.81      0.03  0.47 (0.76, 0.72)
3      1    Glypican 3 | ARL4D | HMGR                    0.90      0.03  0.69 (0.83, 0.86)
3      2    Glypican 3 | ARL4D | LR2BP                   0.90      0.04  0.65 (0.82, 0.83)
3      3    Glypican 3 | ARL4D | NKG2E                   0.89      0.03  0.64 (0.78, 0.87)
3      4    Glypican 3 | PEX14 | PKD2                    0.89      0.04  0.62 (0.85, 0.77)
3      5    Glypican 3 | ARL4D | NDE1                    0.89      0.03  0.67 (0.85, 0.82)
3      6    Glypican 3 | ARL4D | UB2L6                   0.89      0.03  0.66 (0.82, 0.84)
3      7    Glypican 3 | ARL4D | AADAT                   0.89      0.03  0.64 (0.83, 0.81)
3      8    Glypican 3 | HMGR | NKG2E                    0.89      0.04  0.60 (0.80, 0.80)
3      9    Glypican 3 | ARL4D | DCNL3                   0.89      0.03  0.66 (0.85, 0.81)
3      10   Glypican 3 | ARL4D | MSTN1                   0.89      0.03  0.63 (0.80, 0.83)
4      1    Glypican 3 | PEX14 | PKD2 | KLRG2            0.92      0.03  0.72 (0.90, 0.82)
4      2    Glypican 3 | PEX14 | PKD2 | RAB38            0.92      0.04  0.72 (0.85, 0.87)
4      3    Glypican 3 | ARL4D | HMGR | NKG2E            0.91      0.03  0.70 (0.80, 0.90)
4      4    Glypican 3 | ARL4D | HMGR | LR2BP            0.91      0.04  0.72 (0.83, 0.89)
4      5    Glypican 3 | ARL4D | KLRG2 | NKG2E           0.91      0.03  0.69 (0.81, 0.88)
4      6    Glypican 3 | ARL4D | KLRG2 | MSTN1           0.91      0.02  0.70 (0.83, 0.87)
4      7    Glypican 3 | PEX14 | PKD2 | ARL4D            0.91      0.04  0.66 (0.80, 0.86)
4      8    PEX14 | ARL4D | KLRG2 | CPT1B                0.91      0.05  0.67 (0.75, 0.92)
4      9    Glypican 3 | KLRG2 | NKG2E | MSTN1           0.91      0.03  0.66 (0.83, 0.83)
4      10   Glypican 3 | PEX14 | ARL4D | RAB38           0.90      0.05  0.65 (0.80, 0.85)
Data analysis: candidate biomarker selection Next, a list of potential biomarkers was constructed based on the performance of the SVM models they were present in.
First, all SVM models were combined, including both those based on proteins selected by p-value and AUC cut-offs and those based on proteins selected from the Boruta random forest analysis. Then the SVM models were selected with an average AUC of at least 0.83, including models with combinations of two, three and four proteins. Of the SVM models with combinations of two from a set of 189 proteins, only two models passed the AUC cut-off, Glypican 3|ARL4D and Glypican 3|NKG2E, which included three unique proteins. For the SVM models with combinations of three proteins, 212 models surpassed the AUC cut-off, while covering 46 unique proteins of the 71 that were included in total. For the SVM models with combinations of four proteins, 2258 models passed the AUC cut-off while covering 47 unique proteins of the 53 that were included in total (see Figure 1). Combined, this resulted in a list of 2472 SVM models which predominantly contained SVM models combining 4 proteins; across the top 20 of this list all models combined 4 proteins. The selection of 2472 SVM models covered a total of 57 unique proteins. Next, these 57 unique proteins were ranked according to the positions of all the models they were present in from the combined list after sorting by average AUC. Considering that some of the selected candidate markers might be based on models combining only two proteins while falling outside of the chosen scope for the models combining three or four proteins, it was decided that a more fair ranking could be established based on an expanded list of SVM models covering all possible combinations of two, three or four proteins from the set of 57 candidate markers. A total of 6,132 of the resulting SVM models surpassed the AUC cutoff of 0.83: 2, 235, and 5895 models combining 2, 3, and 4 proteins, respectively. The top 10 models for each of the combinations, as reported in Table 5, remained the same. 
This list of models yielded the ranked overview of the identified 57 candidate HCC protein biomarkers that is presented in Table 6. Additional identifiers for the proteins are listed in Table 7. Looking at the scores across the list of unique proteins, it can be remarked that there is a clear drop in performance between the best-performing markers and the others; the plot in Figure 9 illustrates this as well. In order to demonstrate how some of the different protein combinations performed without showing all 6,132 models, Table 8 shows the results of all possible model combinations between the first 6 of the 57 identified proteins. Note that models combining 5 and 6 different proteins were added as well; these were established at a later stage of the analysis (see the Expanded support vector machine models section).

Table 6. Overview of the 57 unique proteins that were found in the selection of SVM models with average AUCs higher than 0.83. The protein score is based on the positions in the list, sorted by average AUC, of all models the protein appeared in. The best model and AUC refer to the position and AUC of the best model the protein appeared in. FC = fold change of the protein in the NASH-HCC cohort compared to the NASH control cohort. ES = effect size.

No  Protein  Score  Best model  Best AUC  FC  ES
1  Glypican 3  8977122  1  0.924  4.90  0.32
2  PEX14  6861637  1  0.924  1.69  0.79
3  KLRG2  6047003  1  0.924  0.43  -0.64
4  ARL4D  3761359  4  0.912  1.89  0.50
5  RAB38  3325116  2  0.917  0.46  -0.29
6  PKD2  2761412  1  0.924  0.74  -0.48
7  NKG2E  2738222  3  0.915  1.65  0.33
8  GALNS  2462891  275  0.880  1.31  0.73
9  UB2L6  2261817  16  0.905  0.82  -0.66
10  TM157  2087919  15  0.906  1.22  0.70
11  TRA2B  2033617  169  0.886  1.39  0.60
12  CP2CJ  1888607  301  0.878  3.20  0.61
13  IMPA1  1858590  452  0.872  1.33  0.59
14  Persephin  1615762  3  0.915  1.16  0.62
15  MSTN1  1601791  10  0.909  1.29  0.52
16  DCNL3  1592052  19  0.904  0.88  -0.68
17  Apaf-1  1490537  45  0.899  0.83  -0.74
18  HMGR  1452839  4  0.912  0.87  -0.57
19  LR2BP  1393807  7  0.910  0.93  -0.66
20  NDE1  1262177  14  0.907  0.89  -0.64
21  Amylase, alpha 1A  1238950  73  0.894  1.28  0.53
22  C1RL1  1154749  58  0.896  0.85  -0.64
23  ASAP2  1131667  44  0.900  0.74  -0.47
24  AADAT  1060631  6  0.911  0.88  -0.61
25  CPT1B  1059270  13  0.908  1.14  0.69
26  DRB3  981168  37  0.901  0.85  -0.68
27  SOST  973478  5  0.911  1.16  0.62
28  TRI54  821581  103  0.891  0.86  -0.62
29  ARY1  780869  256  0.881  0.86  -0.70
30  RASF2  775372  176  0.886  0.89  -0.64
31  SOSSC  765002  275  0.880  0.78  -0.46
32  DJB12  580867  225  0.882  1.30  0.57
33  N-terminal pro-BNP  546813  301  0.878  1.71  0.49
34  UGT 1A6  519918  379  0.875  1.08  0.06
35  DLDH  485929  430  0.873  1.39  0.78
36  TRIM9  442517  72  0.894  0.96  -0.19
37  TPGS2  432974  233  0.882  0.93  -0.62
38  DERM  407343  138  0.888  1.24  0.62
39  CST8  377197  758  0.865  1.21  0.70
40  PAHX  363834  248  0.881  0.92  -0.39
41  IGF-II receptor  302709  482  0.871  1.14  0.50
42  Met  291369  421  0.873  1.26  0.53
43  SLIK1  260791  758  0.865  0.62  -0.85
44  Omentin  239336  625  0.867  1.35  0.79
45  Calgranulin A  194211  986  0.861  0.81  -0.66
46  WISP-2  187052  1569  0.853  1.27  0.89
47  CYTD  167863  786  0.864  1.53  0.67
48  Lefty-A  122893  1804  0.851  1.27  0.78
49  IL-1 sRI  103058  415  0.873  1.29  0.64
50  FBLN3  53601  1048  0.860  1.11  0.15
51  fibulin 5  51185  1640  0.853  1.30  0.51
52  RNAS4  42189  1316  0.856  1.19  0.67
53  IGF-I  14705  3424  0.840  0.70  -0.67
54  PGM5  11921  1675  0.852  1.36  0.66
55  TMED9  11563  2044  0.849  1.19  0.37
56  OPG  4161  3300  0.841  1.30  0.81
57  Lectin, mannose-binding 2  3788  4293  0.836  1.22  0.41

Table 7. Overview of additional identifiers for the 57 potential NASH-HCC biomarkers. EGS = Entrez Gene Symbol. Up/down = upregulation or downregulation in NASH-HCC samples compared with NASH samples.

No  Protein  Full name  EGS  Uniprot  Up/down
1  Glypican 3  Glypican-3  GPC3  P51654  up
2  PEX14  Peroxisomal membrane protein PEX14  PEX14  O75381  up
3  KLRG2  Killer cell lectin-like receptor subfamily G member 2  KLRG2  A4D1S0  down
4  ARL4D  ADP-ribosylation factor-like protein 4D  ARL4D  P49703  up
5  RAB38  Ras-related protein Rab-38  RAB38  P57729  down
6  PKD2  Polycystin-2  PKD2  Q13563  down
7  NKG2E  NKG2-E type II integral membrane protein  KLRC3  Q07444  up
8  GALNS  N-acetylgalactosamine-6-sulfatase  GALNS  P34059  up
9  UB2L6  Ubiquitin/ISG15-conjugating enzyme E2 L6  UBE2L6  O14933  down
10  TM157  Membrane protein FAM174A  FAM174A  Q8TBP5  up
11  TRA2B  Transformer-2 protein homolog beta  TRA2B  P62995  up
12  CP2CJ  Cytochrome P450 2C19  CYP2C19  P33261  up
13  IMPA1  Inositol monophosphatase 1  IMPA1  P29218  up
14  Persephin  Persephin  PSPN  O60542  up
15  MSTN1  Musculoskeletal embryonic nuclear protein 1  MUSTN1  Q8IVN3  up
16  DCNL3  DCN1-like protein 3  DCUN1D3  Q8IWE4  down
17  Apaf-1  Apoptotic protease-activating factor 1  APAF1  O14727  down
18  HMGR  3-hydroxy-3-methylglutaryl-coenzyme A reductase  HMGCR  P04035  down
19  LR2BP  LRP2-binding protein  LRP2BP  Q9P2M1  down
20  NDE1  Nuclear distribution protein nudE homolog 1  NDE1  Q9NXR1  down
21  Amylase, alpha 1A  Alpha-amylase 1  AMY1A  P04745  up
22  C1RL1  Complement C1r subcomponent-like protein  C1RL  Q9NZP8  down
23  ASAP2  Arf-GAP with SH3 domain, ANK repeat and PH domain-containing protein 2  ASAP2  O43150  down
24  AADAT  Kynurenine/alpha-aminoadipate aminotransferase, mitochondrial  AADAT  Q8N5Z0  down
25  CPT1B  Carnitine O-palmitoyltransferase 1, muscle isoform  CPT1B  Q92523  up
26  DRB3  HLA class II histocompatibility antigen, DR beta 3 chain  HLA-DRB3  P79483  down
27  SOST  Sclerostin  SOST  Q9BQB4  up
28  TRI54  Tripartite motif-containing protein 54  TRIM54  Q9BYV2  down
29  ARY1  Arylamine N-acetyltransferase 1  NAT1  P18440  down
30  RASF2  Ras association domain-containing protein 2  RASSF2  P50749  down
31  SOSSC  SOSS complex subunit C  INIP  Q9NRY2  down
32  DJB12  DnaJ homolog subfamily B member 12  DNAJB12  Q9NXW2  up
33  N-terminal pro-BNP  N-terminal pro-BNP  NPPB  P16860  up
34  UGT 1A6  UDP-glucuronosyltransferase 1-6  UGT1A6  P19224  up
35  DLDH  Dihydrolipoyl dehydrogenase, mitochondrial  DLD  P09622  up
36  TRIM9  E3 ubiquitin-protein ligase TRIM9  TRIM9  Q9C026  down
37  TPGS2  Tubulin polyglutamylase complex subunit 2  TPGS2  Q68CL5  down
38  DERM  Dermatopontin  DPT  Q07507  up
39  CST8  Cystatin-8  CST8  O60676  up
40  PAHX  Phytanoyl-CoA dioxygenase, peroxisomal  PHYH  O14832  down
41  IGF-II receptor  Cation-independent mannose-6-phosphate receptor  IGF2R  P11717  up
42  Met  Hepatocyte growth factor receptor  MET  P08581  up
43  SLIK1  SLIT and NTRK-like protein 1  SLITRK1  Q96PX8  down
44  Omentin  Intelectin-1  ITLN1  Q8WWA0  up
45  Calgranulin A  Calgranulin A  S100A8  P05109  down
46  WISP-2  WNT1-inducible-signaling pathway protein 2  CCN5  O76076  up
47  CYTD  Cystatin-D  CST5  P28325  up
48  Lefty-A  Left-right determination factor 2  LEFTY2  O00292  up
49  IL-1 sRI  Interleukin-1 receptor type 1  IL1R1  P14778  up
50  FBLN3  EGF-containing fibulin-like extracellular matrix protein 1  EFEMP1  Q12805  up
51  fibulin 5  Fibulin-5  FBLN5  Q9UBX5  up
52  RNAS4  Ribonuclease 4  RNASE4  P34096  up
53  IGF-I  Insulin-like growth factor I  IGF1  P05019  down
54  PGM5  PGM5  PGM5  Q15124  up
55  TMED9  Transmembrane emp24 domain-containing protein 9  TMED9  Q9BVK6  up
56  OPG  Tumor necrosis factor receptor superfamily member 11B  TNFRSF11B  O00300  up
57  Lectin, mannose-binding 2  Vesicular integral-membrane protein VIP36  LMAN2  Q12907  up
Table 8. Performance overview in terms of average AUC for detecting HCC in NASH patients of all SVM models based on combinations of 2-6 of the 6 most promising identified proteins. These 6 proteins are: Glypican 3; PEX14; KLRG2; ARL4D; RAB38; and PKD2 (see also Table 6 and 7). Results were obtained using a 10-fold cross-validation. The optimal Youden index Jmax was calculated by taking the highest outcome for the following equation: J = TPR + TNR - 1. Model = the number of proteins in the model combination; AUC = area under the curve; SD = standard deviation; Jmax = optimal Youden index; TNR = true negative rate, i.e. the specificity; TPR = true positive rate, i.e. the sensitivity.

No.  Model  Protein combination  AUC  SD AUC  Jmax (TNR, TPR)
1  6  Glypican 3|PEX14|KLRG2|ARL4D|RAB38|PKD2  0.956  0.035  0.79 (0.89, 0.90)
2  5  Glypican 3|PEX14|KLRG2|RAB38|PKD2  0.942  0.036  0.76 (0.94, 0.82)
3  5  Glypican 3|PEX14|ARL4D|RAB38|PKD2  0.935  0.038  0.76 (0.85, 0.91)
4  5  Glypican 3|PEX14|KLRG2|ARL4D|PKD2  0.929  0.028  0.71 (0.89, 0.83)
5  5  Glypican 3|PEX14|KLRG2|ARL4D|RAB38  0.927  0.040  0.73 (0.89, 0.84)
6  4  Glypican 3|PEX14|KLRG2|PKD2  0.924  0.031  0.72 (0.90, 0.82)
7  4  Glypican 3|PEX14|RAB38|PKD2  0.917  0.042  0.72 (0.85, 0.87)
8  5  PEX14|KLRG2|ARL4D|RAB38|PKD2  0.910  0.040  0.69 (0.85, 0.84)
9  4  Glypican 3|PEX14|ARL4D|PKD2  0.909  0.039  0.66 (0.80, 0.86)
10  4  Glypican 3|PEX14|ARL4D|RAB38  0.904  0.051  0.65 (0.80, 0.85)
11  4  Glypican 3|PEX14|KLRG2|RAB38  0.904  0.040  0.64 (0.81, 0.84)
12  5  Glypican 3|KLRG2|ARL4D|RAB38|PKD2  0.902  0.028  0.66 (0.85, 0.81)
13  4  PEX14|ARL4D|RAB38|PKD2  0.893  0.047  0.68 (0.77, 0.91)
14  4  Glypican 3|KLRG2|ARL4D|RAB38  0.892  0.033  0.65 (0.78, 0.87)
15  4  PEX14|KLRG2|ARL4D|RAB38  0.891  0.046  0.62 (0.89, 0.73)
16  3  Glypican 3|PEX14|PKD2  0.890  0.036  0.62 (0.85, 0.77)
17  4  Glypican 3|KLRG2|ARL4D|PKD2  0.889  0.024  0.65 (0.89, 0.76)
18  4  Glypican 3|PEX14|KLRG2|ARL4D  0.889  0.040  0.61 (0.71, 0.90)
19  4  PEX14|KLRG2|ARL4D|PKD2  0.888  0.030  0.66 (0.80, 0.86)
20  3  Glypican 3|ARL4D|PKD2  0.883  0.029  0.65 (0.82, 0.83)
21  2  Glypican 3|ARL4D  0.883  0.031  0.66 (0.85, 0.81)
22  3  Glypican 3|KLRG2|ARL4D  0.883  0.033  0.63 (0.83, 0.80)
23  4  PEX14|KLRG2|RAB38|PKD2  0.879  0.043  0.59 (0.89, 0.70)
24  4  Glypican 3|ARL4D|RAB38|PKD2  0.879  0.034  0.61 (0.81, 0.80)
25  4  Glypican 3|KLRG2|RAB38|PKD2  0.874  0.031  0.58 (0.75, 0.83)
26  3  Glypican 3|ARL4D|RAB38  0.871  0.046  0.60 (0.81, 0.79)
27  3  Glypican 3|PEX14|RAB38  0.868  0.054  0.55 (0.78, 0.78)
28  3  Glypican 3|PEX14|ARL4D  0.866  0.056  0.58 (0.82, 0.76)
29  3  Glypican 3|PEX14|KLRG2  0.864  0.042  0.57 (0.89, 0.68)
30  3  Glypican 3|KLRG2|PKD2  0.862  0.034  0.55 (0.80, 0.75)
31  3  PEX14|KLRG2|PKD2  0.861  0.037  0.59 (0.82, 0.77)
32  3  PEX14|KLRG2|ARL4D  0.860  0.041  0.56 (0.67, 0.90)
33  3  PEX14|KLRG2|RAB38  0.854  0.053  0.54 (0.89, 0.65)
34  3  PEX14|ARL4D|PKD2  0.853  0.052  0.57 (0.80, 0.77)
35  3  Glypican 3|RAB38|PKD2  0.848  0.038  0.51 (0.80, 0.71)
36  3  PEX14|RAB38|PKD2  0.846  0.058  0.59 (0.80, 0.79)
37  3  PEX14|ARL4D|RAB38  0.844  0.061  0.52 (0.93, 0.59)
38  3  Glypican 3|KLRG2|RAB38  0.835  0.039  0.53 (0.78, 0.75)
39  3  KLRG2|ARL4D|RAB38  0.831  0.036  0.56 (0.89, 0.67)
40  2  Glypican 3|PKD2  0.828  0.047  0.51 (0.80, 0.72)
41  4  KLRG2|ARL4D|RAB38|PKD2  0.819  0.030  0.51 (0.85, 0.66)
42  2  PEX14|KLRG2  0.814  0.047  0.46 (0.67, 0.79)
43  2  PEX14|PKD2  0.814  0.062  0.50 (0.83, 0.67)
44  2  Glypican 3|PEX14  0.813  0.048  0.51 (0.89, 0.62)
45  3  KLRG2|ARL4D|PKD2  0.811  0.040  0.53 (0.81, 0.72)
46  2  Glypican 3|KLRG2  0.808  0.044  0.47 (0.67, 0.81)
47  2  Glypican 3|RAB38  0.802  0.043  0.46 (0.77, 0.69)
48  2  ARL4D|RAB38  0.801  0.047  0.50 (0.80, 0.71)
49  2  PEX14|ARL4D  0.797  0.056  0.47 (0.94, 0.53)
50  3  ARL4D|RAB38|PKD2  0.797  0.036  0.54 (0.89, 0.65)
51  2  PEX14|RAB38  0.788  0.062  0.43 (0.67, 0.76)
52  2  KLRG2|ARL4D  0.788  0.048  0.47 (0.80, 0.67)
53  2  ARL4D|PKD2  0.788  0.039  0.49 (0.80, 0.69)
54  3  KLRG2|RAB38|PKD2  0.784  0.042  0.43 (0.67, 0.76)
55  2  KLRG2|PKD2  0.769  0.045  0.44 (0.76, 0.68)
56  2  KLRG2|RAB38  0.763  0.047  0.43 (0.82, 0.61)
57  2  RAB38|PKD2  0.740  0.044  0.37 (0.59, 0.78)
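As an illustration of the Youden index defined in the caption of Table 8 (J = TPR + TNR - 1), the following minimal sketch scans every candidate decision threshold and keeps the one with the highest J. The function names, the toy labels and the toy scores are hypothetical and are not taken from the study; the rule that a sample is called positive when its score is at or above the threshold is likewise an assumption.

```python
def youden(tpr, tnr):
    """Youden index J = TPR + TNR - 1 (sensitivity + specificity - 1)."""
    return tpr + tnr - 1


def youden_max(labels, scores):
    """Return (J_max, TNR, TPR, threshold) over all candidate thresholds.

    labels: 1 = case (HCC), 0 = control. scores: higher = more case-like.
    Assumption: a sample is called positive when score >= threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best = None
    for thr in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= thr)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < thr)
        tpr, tnr = tp / n_pos, tn / n_neg
        j = youden(tpr, tnr)
        # Strict ">" keeps the lowest threshold when several give the same J.
        if best is None or j > best[0]:
            best = (j, tnr, tpr, thr)
    return best


# Hypothetical toy data: four controls followed by four cases.
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]
j_max, tnr, tpr, thr = youden_max(toy_labels, toy_scores)
print(j_max, tnr, tpr, thr)
```

Applied to the rounded rates of model 1 in Table 8, youden(0.90, 0.89) reproduces the reported Jmax of 0.79. For some other rows the rounded rates differ from the listed Jmax in the last digit, presumably because the table values were computed before rounding.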
Intercorrelation and confounder analysis
Intercorrelation between the identified 57 NASH-HCC protein candidate biomarkers was assessed using a correlation matrix, the result of which was visualised by generating a heatmap. This heatmap (Figure 10) shows a clear tendency that the best-performing proteins show little to no intercorrelation with the other proteins, while both the number and strength of the intercorrelations appear to increase further down the marker list. The strongest positive intercorrelations were found between TMED9 (55) and Lectin, mannose-binding 2 (57) at 0.93, between Glypican 3 (1) and Met (42) at 0.89, between SLIK1 (43) and IGF-I (53) at 0.83, and between N-terminal pro-BNP (33) and FBLN3 (50) at 0.77. The strongest negative intercorrelations were found between PAHX (40) and FBLN3 (50) at -0.67, between TRI54 (28) and IL-1 sRI (49) at -0.60, between IL-1 sRI (49) and IGF-I (53) at -0.56, and between PAHX (40) and TMED9 (55) at -0.55. The average, median, and 75th percentile of the absolute covariance were 0.18, 0.14, and 0.27, respectively. Next, in order to look for potential confounders or interesting associations with certain covariates, relations were investigated between the 57 protein biomarkers and several covariates: patient age, sex, body mass index, and a number of clinical measures for liver damage and HCC stage; see also Table 2 for an overview of the patient characteristics. The results were plotted in a heatmap to provide a clear overview of which proteins have a significant association with any of the covariates and to what extent (Figure 11). The vast majority of the proteins were associated with patient cohort (i.e. whether a patient has HCC or not) and with BCLC stage; to a lesser extent, associations with the MELD and Child Pugh scores were also seen for a number of proteins.
From the protein selection, WISP-2 (46) had the strongest individual association with patient cohort (p=2.0e-4), followed by SLIK1 (43) (p=4.2e-4), OPG (56) (p=7.1e-4), and PEX14 (2) (p=9.2e-4). 9 of the 57 proteins showed no statistically significant associations (i.e. p>0.05) with HCC presence: Glypican 3 (1), RAB38 (5), NKG2E (7), UGT 1A6 (34), TRIM9 (36), PAHX (40), FBLN3 (50), TMED9 (55), and Lectin, mannose-binding 2 (57). Of these 9 proteins, 4 proteins showed a sufficiently high AUC to be included in the SVM models with combinations of 4, 3 proteins showed an AUC that was only high enough for inclusion in the SVM models combining 2 proteins, and the remaining 2 proteins showed poor AUCs but were instead selected based on the random forest model results. The BCLC stage showed the strongest associations with DLDH (35), PEX14 (2), Met (42), and SLIK1 (43) (p-values between 5.2e-5 and 7.4e-4). 32 proteins were found to associate with patient age with p-values below 0.05, with most of these proteins appearing in the lower part of the ranked list; only 6 of these proteins were in the top 25 proteins. The lowest p-values for association with patient age were seen for OPG (56) (p=3.5e-7), WISP-2 (46) (p=2.0e-6), Lefty-A (48) (p=1.4e-5), and SLIK1 (43) (p=4.9e-5). Curiously, for 16 proteins the p-value indicating correlation with patient age was lower than the p-value indicating correlation with patient cohort. This was the case for: GALNS (8), C1RL1 (22), DRB3 (26), SOSSC (31), DJB12 (32), N-terminal pro-BNP (33), TPGS2 (37), CST8 (39), SLIK1 (43), Calgranulin A (45), WISP-2 (46), CYTD (47), Lefty-A (48), RNAS4 (52), IGF-I (53), and OPG (56). Furthermore, only 2 proteins associated with patient sex: SLIK1 (43) (p=0.017) and UB2L6 (9) (p=0.033). 9 proteins were found to associate with BMI: PEX14 (2), UGT 1A6 (34), OPG (56), DCNL3 (16), N-terminal pro-BNP (33), FBLN3 (50), IL-1 sRI (49), ASAP2 (23), and Met (42), with p-values ranging between 0.0013 and 0.043.
Of the clinical indicators for liver damage (see also Table 2), associations with p-values below 0.05 were found: with the MELD score for 27 proteins; with plasma ALT level for 26 proteins; with plasma thrombocyte level for 24 proteins; with the Child Pugh score for 23 proteins; with plasma bilirubin level for 23 proteins; with plasma creatine level for 17 proteins; with the INR for 13 proteins; with plasma AST level for 9 proteins; with the fibrosis score for 8 proteins; and with plasma albumin level for 6 proteins. The plasma creatine level showed the most significant associations with Lectin, mannose-binding 2 (57) (p=9.2e-21), TMED9 (55) (p=8.4e-18), MSTN1 (15) (p=4.5e-16), and N-terminal pro-BNP (33) (p=3.6e-11). The MELD score showed the most significant associations with FBLN3 (50) (p=2.5e-15), TMED9 (55) (p=7.6e-14), Lectin, mannose-binding 2 (57) (p=2.4e-13), and N-terminal pro-BNP (33) (p=1.1e-11). The plasma bilirubin level showed the most significant associations with UB2L6 (9) (p=1.4e-13), UGT 1A6 (34) (p=4.1e-11), FBLN3 (50) (p=1.7e-6), and PAHX (40) (p=1.9e-5). The Child Pugh score showed the most significant associations with FBLN3 (50) (p=3.2e-10), PAHX (40) (p=1.0e-9), TMED9 (55) (p=1.2e-6), and IL-1 sRI (49) (p=1.4e-5). The plasma thrombocyte level showed the most significant associations with IL-1 sRI (49) (p=9.6e-9), TRI54 (28) (p=1.5e-6), Omentin (44) (p=1.3e-5), and PGM5 (54) (p=5.0e-5). The plasma AST level showed the most significant associations with DLDH (35) (p=1.9e-6), UGT 1A6 (34) (p=1.9e-5), TRA2B (11) (p=1.9e-4), and Met (42) (p=1.5e-3). The plasma ALT level showed the most significant associations with RASF2 (30) (p=7.1e-6), PAHX (40) (p=7.6e-5), WISP-2 (46) (p=1.2e-4), and IL-1 sRI (49) (p=3.4e-4). The INR showed the most significant associations with PAHX (40) (p=3.9e-4), TMED9 (55) (p=1.7e-3), FBLN3 (50) (p=1.8e-3), and N-terminal pro-BNP (33) (p=5.6e-3).
The plasma albumin level showed the most significant associations with Calgranulin A (45) (p=3.2e-3). Finally, the fibrosis score showed the most significant associations with FBLN3 (50) (p=1.3e-2). Expanded support vector machine models Lastly, it was investigated whether the predictive performance of the generated SVM models could be enhanced further by including as model inputs up to 8 instead of 4 proteins and/or patient age and patient sex. To limit the computation time this was only assessed for models covering the first 18 of the identified candidate biomarkers. In addition, it was assessed whether inclusion of patient age and sex as model variables would improve performance. The results are reported in Table 9 and ROC curves for the best models can be seen in Figure 12; note that some AUCs differ between Table 9 and Figure 12 as Table 9 lists the means of the calculated AUCs during the 10-fold cross-validation, whereas Figure 12 reports the AUCs calculated from the mean ROC curves. It was found that adding additional proteins did increase the AUC, as the best-performing models with two, three, four, five, six, seven and eight proteins reached AUCs of, respectively, 0.883, 0.900, 0.924, 0.942, 0.956, 0.962, and 0.965 without inclusion of patient age and sex, and with inclusion 0.882, 0.903, 0.933, 0.947, 0.960, 0.968, and 0.966. The highest observed mean AUC was 0.968. In most cases, adding an additional protein to the model variables appeared to increase the AUC by a larger extent than adding patient age and sex. Furthermore, optimal Youden’s index values were calculated (see Table 9 and Figure 12). Although the results were varying, in general an upwards trend could be seen as the SVM models expanded, with numbers exceeding 0.80 for models including six or more proteins. Table 9. 
Top 3 SVM models for combinations of two to eight proteins with or without patient age and sex as additional model input variables, ranked by average AUC for predicting HCC in NASH patients. Mean AUC and SD were taken over 10 different random data-splits. Jmax was calculated using Equation 3. AUC = area under the curve; SD = standard deviation; SVM = support vector machine; GPC3 = Glypican 3; A = age; S = sex; PSPN = Persephin.

Model  Combination  AUC  SD  Jmax (TNR, TPR)
2#1  GPC3|ARL4D  0.883  0.031  0.66 (0.85, 0.81)
2#2  GPC3|NKG2E  0.861  0.036  0.58 (0.80, 0.78)
2#3  GPC3|HMGR  0.829  0.055  0.49 (0.67, 0.82)
2AS#1  A|S|GPC3|ARL4D  0.882  0.035  0.65 (0.85, 0.80)
2AS#2  A|S|GPC3|NKG2E  0.870  0.043  0.55 (0.78, 0.77)
2AS#3  A|S|GPC3|HMGR  0.858  0.049  0.46 (0.71, 0.76)
3#1  GPC3|ARL4D|HMGR  0.900  0.032  0.69 (0.83, 0.86)
3#2  GPC3|ARL4D|NKG2E  0.894  0.032  0.64 (0.78, 0.87)
3#3  GPC3|PEX14|PKD2  0.890  0.036  0.62 (0.85, 0.77)
3AS#1  A|S|GPC3|ARL4D|HMGR  0.903  0.029  0.68 (0.83, 0.85)
3AS#2  A|S|GPC3|NKG2E|HMGR  0.901  0.040  0.60 (0.75, 0.85)
3AS#3  A|S|GPC3|ARL4D|NKG2E  0.899  0.039  0.61 (0.78, 0.83)
4#1  GPC3|PEX14|KLRG2|PKD2  0.924  0.031  0.72 (0.90, 0.82)
4#2  GPC3|PEX14|RAB38|PKD2  0.917  0.042  0.72 (0.85, 0.87)
4#3  GPC3|KLRG2|NKG2E|PSPN  0.915  0.041  0.68 (0.80, 0.88)
4AS#1  A|S|GPC3|PEX14|KLRG2|PKD2  0.933  0.017  0.70 (0.90, 0.80)
4AS#2  A|S|GPC3|PEX14|RAB38|PKD2  0.920  0.047  0.68 (0.89, 0.79)
4AS#3  A|S|GPC3|NKG2E|PSPN|HMGR  0.919  0.040  0.64 (0.80, 0.84)
5#1  GPC3|PEX14|KLRG2|RAB38|PKD2  0.942  0.036  0.76 (0.94, 0.82)
5#2  GPC3|PEX14|KLRG2|PKD2|MSTN1  0.940  0.026  0.75 (0.94, 0.81)
5#3  GPC3|PEX14|KLRG2|PKD2|PSPN  0.940  0.032  0.76 (0.94, 0.82)
5AS#1  A|S|GPC3|PEX14|KLRG2|RAB38|PKD2  0.947  0.025  0.74 (0.89, 0.85)
5AS#2  A|S|GPC3|PEX14|KLRG2|PKD2|PSPN  0.942  0.019  0.75 (0.89, 0.86)
5AS#3  A|S|GPC3|PEX14|KLRG2|PKD2|MSTN1  0.942  0.024  0.72 (0.94, 0.78)
6#1  GPC3|PEX14|KLRG2|ARL4D|RAB38|PKD2  0.956  0.035  0.79 (0.89, 0.90)
6#2  GPC3|PEX14|KLRG2|PKD2|PSPN|MSTN1  0.953  0.025  0.79 (0.89, 0.90)
6#3  GPC3|PEX14|ARL4D|RAB38|PKD2|UB2L6  0.950  0.040  0.79 (0.90, 0.89)
6AS#1  A|S|GPC3|PEX14|KLRG2|ARL4D|RAB38|PKD2  0.960  0.028  0.77 (0.93, 0.84)
6AS#2  A|S|GPC3|PEX14|KLRG2|RAB38|PKD2|PSPN  0.954  0.024  0.79 (0.93, 0.86)
6AS#3  A|S|GPC3|PEX14|KLRG2|PKD2|PSPN|MSTN1  0.950  0.024  0.77 (0.94, 0.83)
7#1  GPC3|PEX14|KLRG2|ARL4D|RAB38|PKD2|PSPN  0.962  0.033  0.85 (0.94, 0.91)
7#2  GPC3|PEX14|KLRG2|ARL4D|RAB38|PKD2|UB2L6  0.960  0.032  0.80 (0.90, 0.90)
7#3  GPC3|PEX14|KLRG2|RAB38|PKD2|PSPN|MSTN1  0.957  0.033  0.84 (0.94, 0.90)
7AS#1  A|S|GPC3|PEX14|KLRG2|ARL4D|RAB38|PKD2|PSPN  0.968  0.025  0.81 (0.90, 0.92)
7AS#2  A|S|GPC3|PEX14|KLRG2|ARL4D|RAB38|PKD2|UB2L6  0.962  0.024  0.76 (0.90, 0.86)
7AS#3  A|S|GPC3|PEX14|KLRG2|ARL4D|RAB38|PKD2|DCNL3  0.961  0.026  0.77 (0.94, 0.83)
8#1  GPC3|PEX14|KLRG2|ARL4D|RAB38|PKD2|PSPN|MSTN1  0.965  0.032  0.87 (0.94, 0.93)
8#2  GPC3|PEX14|KLRG2|ARL4D|RAB38|PKD2|UB2L6|TM157  0.963  0.029  0.82 (0.89, 0.93)
8#3  GPC3|PEX14|KLRG2|ARL4D|RAB38|PKD2|UB2L6|PSPN  0.963  0.031  0.84 (0.94, 0.90)
8AS#1  A|S|GPC3|PEX14|KLRG2|ARL4D|RAB38|PKD2|UB2L6|PSPN  0.966  0.026  0.80 (0.94, 0.86)
8AS#2  A|S|GPC3|PEX14|KLRG2|ARL4D|RAB38|PKD2|PSPN|HMGR  0.963  0.022  0.81 (0.90, 0.91)
8AS#3  A|S|GPC3|PEX14|KLRG2|ARL4D|RAB38|PKD2|PSPN|DCNL3  0.961  0.027  0.80 (0.90, 0.90)
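The mean AUC and SD columns summarise one AUC per random data-split. The sketch below shows one way such a summary can be computed, using the rank-based definition of the AUC: the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. Everything in it is an assumption made for illustration: the toy scores stand in for SVM decision values, the split is stratified by class, no model is trained, and the sample standard deviation is used; the source does not state these details.

```python
import random
import statistics


def auc(labels, scores):
    """Rank-based AUC: P(case score > control score), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_half(labels, seed):
    """Pick half of each class as a test set (stratification is an assumption)."""
    rng = random.Random(seed)
    test_idx = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        test_idx.extend(idx[: len(idx) // 2])
    return sorted(test_idx)


# Hypothetical toy data: 10 controls and 10 cases with overlapping scores.
toy_labels = [0] * 10 + [1] * 10
toy_scores = ([0.20 + 0.05 * i for i in range(10)]
              + [0.50 + 0.05 * i for i in range(10)])

# One AUC per random state, then mean and SD across the 10 states.
split_aucs = []
for seed in range(10):
    test_idx = stratified_half(toy_labels, seed)
    split_aucs.append(auc([toy_labels[i] for i in test_idx],
                          [toy_scores[i] for i in test_idx]))

mean_auc = statistics.mean(split_aucs)
sd_auc = statistics.stdev(split_aucs)  # sample SD; the source does not say which SD was used
print(round(mean_auc, 3), round(sd_auc, 3))
```

In the actual analysis each split would also involve fitting the SVM on the training half before scoring the test half; that step is omitted here to keep the sketch free of external dependencies.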
Conclusion In order to identify plasma biomarker proteins that show potential for the development of a new screening test for early-stage detection of HCC, over 7000 SomaScan assay protein concentration measurements were studied in plasma samples from two cohorts of cirrhotic NASH patients, one cohort with (N=38) and one without (N=40) HCC. Three analytical approaches were employed to achieve this. First, associations between NASH-HCC incidence and individual proteins were investigated, which did yield potential protein candidates, but none remained significant after multiple testing corrections. Secondly, associations were assessed between NASH-HCC and ratios of proteins, which yielded results that were significant; however this did not yield AUCs that confidently surpassed the used benchmark AUC of 0.83, based on the reported diagnostic performance of the protein biomarker PIVKA-II. Thirdly, support vector machine models were constructed with two, three or four proteins as model inputs. This way a total of 2,472 models were found with an AUC over 0.83, covering 57 unique proteins. The identified proteins were given a performance-based ranking, using an extended list of all SVM models with an AUC surpassing 0.83 while combining 2, 3 and 4 proteins from the complete set of 57 identified proteins, which included 6,132 models. The selected candidate markers were then characterised by looking for potential intercorrelations and correlations with covariates (patient age, sex, BMI, cancer stage, and several scores and measures of liver damage). Since the proteins were selected based on how well they could be combined into SVM models to predict presence of HCC, it was expected that no strong intercorrelations would be found between the proteins; this was indeed the case. Most of the proteins showed individual correlation with patient cohort and many with cancer stage. Some proteins showed small associations with patient age as well, very few with patient sex or BMI. 
As for the other covariates, which mostly provided different indications of liver functionality, no particular correlations with the protein markers were found aside from a small number of interesting correlations between individual proteins and patient MELD and Child Pugh scores as well as plasma levels of bilirubin, creatine, AST, and thrombocytes. Overall, it was considered unlikely that the mentioned patient covariates act as confounding variables. Finally, it was investigated whether the AUC of the best SVM model combining four proteins could be surpassed by adding as model variables up to four additional proteins and/or patient age and sex. It was found that this indeed resulted in higher model performance and that patient age and sex could contribute to the models as well, but to a minor extent. AUCs of 0.95-0.97 were found for models with six or more proteins. Some of the models demonstrated maximal Youden indices of 0.80, with true positive and true negative rates around 90%. The identified 57 proteins are considered promising candidates to establish a biomarker signature for use in a screening test for early-stage HCC.
Example 2: random model comparison discovery study
Materials and methods
Selections of the predictive models constructed in the data analysis for the clinical study described in Example 1 were compared to selections of predictive models generated with proteins randomly selected from the SomaScan panel of over 7000 measured protein concentration profiles, in order to demonstrate the merits of the 57 identified protein biomarker candidates. The same dataset was used, including over 7000 protein concentration measurements in plasma samples of 38 patients with HCC primarily caused by NASH-induced liver cirrhosis (case cohort) and 40 patients with NASH-induced liver cirrhosis but not HCC (control cohort).
To this end, sets of 10,000 support vector machine (SVM) models combining 2, 3, 4, 5, or 6 randomly selected proteins per model were assembled. For the model sets covering all possible combinations between members of the panel of 57 identified biomarker candidates, the sets of 1,596 models combining 2 proteins per model, 29,260 models combining 3 proteins per model, and 395,010 models combining 4 proteins per model had already been generated for the data analysis discussed in Example 1. Generating SVM models for all possible combinations of 5 or 6 proteins per model between members of the panel of 57 identified biomarker candidates was considered too demanding computationally; these sets would comprise 4,187,106 models and 36,288,252 models in total, respectively. Therefore, in order to be able to make an assessment of the performance of these models and to be able to compare them, random subsets were generated comprising 10,000 possible combinations with 5 proteins per model and 10,000 possible combinations with 6 proteins per model. The SVM models were constructed using an approach similar to the one discussed in Example 1. SVM models were generated using the Python package scikit-learn (v1.1.1) (Pedregosa, 2011. J Mach Learn Res 12: 2825-2830), with cross-validation using 10 random states for each generated model. For each random state, the dataset was randomly split 50/50 into a training set and a test set. Protein expression data of the training and test sets were scaled separately to zero mean and unit variance. A linear SVM algorithm was used, the kernel coefficient gamma was set to "scale", for example corresponding to 1/(n_features*X.var()), and the regularisation parameter C was fine-tuned for each random state, with inputs ranging from 0.1 to 1 with intervals of 0.1. For each generated model the mean area under the curve (AUC) was calculated across the 10 random states.
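The model counts quoted above are binomial coefficients of the 57-member panel, and the 10,000-model subsets for 5 and 6 proteins can be drawn without enumerating every combination. The sketch below checks the counts and draws such a subset; the helper name random_combinations, the use of integer indices in place of protein names, the seed, and the choice to sample without repetition are all assumptions made for illustration.

```python
import math
import random

PANEL_SIZE = 57  # number of identified biomarker candidates

# Number of distinct models for 2 to 6 proteins per model: C(57, k).
model_counts = {k: math.comb(PANEL_SIZE, k) for k in range(2, 7)}
print(model_counts)


def random_combinations(n_items, k, n_models, seed=0):
    """Draw n_models distinct k-item combinations of range(n_items).

    Rejection sampling into a set: cheap as long as n_models is small
    compared with C(n_items, k), as is the case for k = 5 or 6 here.
    """
    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < n_models:
        chosen.add(tuple(sorted(rng.sample(range(n_items), k))))
    return sorted(chosen)


# Hypothetical subset of 10,000 five-protein models, as panel indices 0..56.
subset_5 = random_combinations(PANEL_SIZE, 5, 10_000)
print(len(subset_5), subset_5[0])
```

The computed counts agree with the figures in the text: 1,596, 29,260, 395,010, 4,187,106 and 36,288,252 for 2, 3, 4, 5 and 6 proteins per model, respectively.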
These calculated AUCs were then plotted to assess the performance of the model sets.
Results & Conclusion
The AUC results for the 10,000 randomly generated SVM models are shown in Figures 13A-17A, while Figures 13B-17B show the results for the SVM models making combinations from the panel of 57 identified biomarker candidates. For all numbers of combined proteins per model (i.e., 2, 3, 4, 5, or 6), the vast majority of generated models derived from the panel of 57 biomarkers outperformed the models derived from proteins randomly taken from the entire SomaScan dataset. In addition, none of the latter models passed the employed AUC cut-off of 0.83, while all of the former model selections contained multiple models surpassing this AUC cut-off. The most promising of these models will be further developed and tested.
Example 3: next-best marker analysis discovery study
Materials and methods
While the 57 protein biomarker candidates listed in Table 1 and mentioned in Example 1 showed promising results in our discovery study, subsequent development of an immunoassay was found to be challenging for some of the target proteins. For example, availability of adequate antibodies was limited for some of the most promising target proteins. Using the dataset and trained sets of predictive models discussed in Example 1, an assessment was therefore made of the effects of excluding some of the top 10 target proteins on the model performance of the remaining biomarker candidate panel. Table 10 shows the changes to the order of the top-ranking proteins following exclusion of Glypican 3 (#1), PEX14 (#2), KLRG2 (#3), ARL4D (#4), RAB38 (#5), PKD2 (#6), NKG2E (#7), or GALNS (#8). It can be seen that the order of the ranked biomarkers shows minimal change with the exception of the exclusion of Glypican 3 (#1) or PEX14 (#2), though even then the top 13 is mostly maintained.
Table 11 presents the best-performing models that remain after excluding Glypican 3 (#1) from the selection, in order to check to what extent the trained models rely on the top-ranking biomarkers. While no models combining 2 protein biomarkers reached the AUC benchmark of 0.83, a multitude of models combining 3, 4, 5, or 6 biomarkers did surpass the benchmark.

Table 10. Changes to the order of the top-ranking proteins following individual exclusion of the top-ranking 8 biomarker candidates. GPC3 = Glypican 3. Rows give the excluded marker; columns give the biomarker panel positions after excluding that marker, with the original position in brackets.

Excluded  GPC3   PEX14  KLRG2  ARL4D  RAB38  PKD2   NKG2E  GALNS  UB2L6  TM157  TRA2B  CP2CJ  IMPA1
marker    (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)    (10)   (11)   (12)   (13)
GPC3      -      1      2      7      3      8      12     4      13     10     9      5      6
PEX14     1      -      2      3      5      10     4      11     7      13     8      6      9
KLRG2     1      2      -      3      4      5      7      6      8      12     10     11     9
ARL4D     1      2      3      -      4      6      5      7      11     10     8      9      12
RAB38     1      2      3      4      -      5      6      7      9      8      10     12     11
PKD2      1      3      2      4      5      -      6      7      8      10     9      11     12
NKG2E     1      2      3      4      5      6      -      7      8      10     9      12     11
GALNS     1      2      3      4      5      6      7      -      8      9      18     15     10

Table 11. Top 5 SVM models for combinations of 2 to 6 proteins excluding Glypican 3, ranked by average AUC for predicting HCC in NASH patients. Mean AUC and SD (standard deviation) were taken over 10 different random data-splits.
Model  Combination  AUC  SD
2#1  CP2CJ|IMPA1  0.82  0.05
2#2  PEX14|KLRG2  0.81  0.05
2#3  PEX14|PKD2  0.81  0.06
2#4  KLRG2|Apaf-1  0.81  0.04
2#5  PEX14|GALNS  0.81  0.06
3#1  PEX14|KLRG2|SOST  0.88  0.04
3#2  PEX14|KLRG2|Apaf-1  0.87  0.04
3#3  PEX14|KLRG2|TM157  0.87  0.05
3#4  PEX14|KLRG2|MSTN1  0.87  0.05
3#5  PEX14|KLRG2|CPT1B  0.87  0.06
4#1  PEX14|KLRG2|PKD2|SOST  0.91  0.04
4#2  PEX14|KLRG2|ARL4D|CPT1B  0.91  0.05
4#3  PEX14|KLRG2|TM157|MSTN1  0.91  0.05
4#4  PEX14|KLRG2|UB2L6|SOST  0.90  0.05
4#5  PEX14|KLRG2|RAB38|SOST  0.90  0.05
5#1  PEX14|KLRG2|ARL4D|UB2L6|Apaf-1  0.93  0.03
5#2  PEX14|KLRG2|PKD2|Persephin|SOST  0.93  0.03
5#3  PEX14|KLRG2|ARL4D|RAB38|CPT1B  0.93  0.05
5#4  PEX14|KLRG2|PKD2|Persephin|MSTN1  0.92  0.04
5#5  PEX14|KLRG2|UB2L6|Apaf-1|SOST  0.92  0.05
6#1  PEX14|KLRG2|ARL4D|UB2L6|Persephin|Apaf-1  0.94  0.04
6#2  PEX14|KLRG2|PKD2|Persephin|MSTN1|SOST  0.94  0.03
6#3  PEX14|KLRG2|PKD2|Persephin|Apaf-1|SOST  0.94  0.03
6#4  PEX14|KLRG2|RAB38|PKD2|Persephin|SOST  0.94  0.04
6#5  PEX14|KLRG2|PKD2|UB2L6|Apaf-1|SOST  0.94  0.04

Example 4 ELISA protocol
The following standard protocol was developed to conduct sandwich Enzyme-Linked Immunosorbent Assays (ELISAs) in order to determine the concentrations of target proteins in human plasma samples.
Buffers. For most targets, a standard block buffer, wash buffer, and reagent diluent are used. The block buffer typically consists of a 3% dilution of Bovine Serum Albumin (BSA) block buffer, in 1x phosphate-buffered saline (PBS) buffer. The wash buffer typically consists of 0.05% (w/v) TWEEN®20 in 1x PBS buffer. The reagent diluent typically consists of 0.05% (w/v) TWEEN®20 and, depending on the target protein, between 0.1% and 1% BSA buffer in 1x PBS buffer. All buffers containing TWEEN®20 are to be kept at room temperature and disposed of after one week. Buffers containing BSA are to be made on the day of the experiment using either a BSA buffer stock (Block buffer) that is kept in the fridge, or freshly made BSA buffer.
Buffers containing both BSA and TWEEN®20 may only be used for one day at room temperature.
Day 1. Firstly, the capture antibodies are diluted in 1x PBS. The dilution ranges from 1:10 to 1:100,000. A 96-well microplate is then coated with 50 µL per well. After sealing the plate with a plate sealer, it is incubated overnight at 4 degrees Celsius.
Day 2. Following incubation, the microplate wells are washed 4 times with 100 µL 1x PBS. The wells are then filled with 100 µL block buffer and incubated at room temperature for 1 hour. Next, the wells are washed twice with 100 µL of the wash buffer. Then 50 µL of the recombinant or reference/patient plasma, which may be diluted in reagent diluent, is added. This dilution ranges from undiluted to 1:10,000. After incubating again at room temperature for 1.5 hours, the wells are washed 5 times with 100 µL wash buffer. 50 µL of the detection antibodies, which have been diluted in reagent diluent, is then added, followed by incubation at room temperature for 1 hour. The dilution may range from 1:10 to 1:100,000. The wells are then washed 5 times with 100 µL wash buffer again. The next steps are performed in the dark. 50 µL of the HRP-conjugate, diluted in reagent diluent, is added to the wells, followed by incubation at room temperature for 1 hour. The wells are then washed 5 times with 100 µL wash buffer. Subsequently 50 µL of substrate solution (3,3’,5,5’-tetramethylbenzidine (TMB)) is added to the wells, after which the plate is incubated at room temperature for 20 minutes. The plates are then checked at intervals of 5 minutes for high signal and stopped accordingly by adding 50 µL of ELISA Stop Solution from Invitrogen. Immediately after stopping the reaction, the optical density is determined at 450 and 630 nm using a microplate reader.
Standard curves. After completing the sandwich ELISA, a standard curve is established and the protein concentrations in the analysed samples can be calculated based on the measured optical densities. Figure 18 shows examples of the standard curves that have been established for some of the most promising identified biomarker proteins.

Example 5: Internal validation results for target proteins Glypican 3 and Omentin

The discovery study detailed in Example 1, which aimed to obtain a set of biomarker candidates with potential for implementation in a diagnostic HCC screening test, was conducted by analysing two groups of plasma samples using the SomaScan Assay developed by the company SomaLogic. The SomaScan Assay is a fee-for-service aptamer-based proteomics platform which provides 7,335 protein concentration measurements across a large number of different biological pathways. This large set of analysed targets makes the SomaScan Assay an attractive platform for biomarker discovery. However, the measured concentration data that the SomaScan Assay provides is presented in relative fluorescence units (RFU), not absolute units such as mg/mL or mol/mL. In order to validate the observed concentration differences between the two groups that were compared (cirrhotic NASH patients with and without HCC), a different measurement method is needed. To check whether the SomaScan results can be replicated with other measuring techniques, ELISA experiments were conducted to measure and compare the plasma concentration of Glypican 3, which is one of the most effective biomarker candidates that was identified, as well as the target protein Omentin. As can be seen in Figure 19, there is a clear relation between the concentration profiles measured using these two different measuring techniques.
The measured Glypican 3 concentration distributions between the HCC and non-HCC sample groups can also be seen in Figure 3, where the corresponding SomaScan data is shown, and Figure 20, where the newly measured ELISA data is shown. For this experiment, Glypican 3 concentrations were determined in duplicate in all 78 samples that were included in the discovery study (see Example 1), as well as 10 additional samples of healthy adults and 9 additional samples of patients with late-stage HCC. It should be noted that for 2 of the late-stage HCC patients datapoints had to be omitted because they were above the range covered by the established standard curve, indicating that the corresponding plasma concentrations could not be determined with adequate certainty but that they were at least more elevated than the other samples. As was found with the SomaScan Assay (Figure 3), the measured plasma concentrations (Figure 20) show that plasma levels of Glypican 3 are elevated in most HCC patients, but not in all, emphasizing the added value of combining expression levels of multiple different biomarkers into predictive models. Finally, it was checked whether the Glypican 3 measurements in patient plasma samples could also be replicated in patient serum samples. To this end, serum and plasma samples were analysed from a group of 10 healthy adults and 10 late-stage HCC patients. Indeed, as shown in Figure 21, the measurements across these two different sample types clearly correlated, though it may be remarked that the measured concentrations in serum were lower. For 5 serum samples, all from the group of healthy adults, the measured datapoints were below the range of the established standard curve and were therefore excluded, while for one of the late-stage HCC samples the measured datapoint was too high to establish a concentration. 
This may indicate that protein analysis in plasma might be preferable, especially for proteins which tend to be present in blood only in minor concentrations.
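The exclusion of datapoints that fall above or below the range of the standard curve can be illustrated with a short sketch. The text does not state which curve model was fitted (a four-parameter logistic fit is common for ELISA), so the example below substitutes a simple piecewise log-log interpolation between standards as a stand-in. The standard concentrations, optical densities, units and function name are all hypothetical.

```python
import math


def interpolate_concentration(od, standards, dilution_factor=1.0):
    """Estimate a concentration from an optical density.

    standards: list of (concentration, od) pairs with positive values,
    where OD increases with concentration.
    Returns None when the OD lies outside the range covered by the
    standards, mirroring the exclusion of out-of-range datapoints.
    """
    points = sorted(standards, key=lambda p: p[1])
    if od < points[0][1] or od > points[-1][1]:
        return None  # below or above the standard curve: not quantifiable
    for (c_lo, od_lo), (c_hi, od_hi) in zip(points, points[1:]):
        if od_lo <= od <= od_hi:
            # Linear interpolation in log-log space between two standards.
            frac = (math.log(od) - math.log(od_lo)) / (
                math.log(od_hi) - math.log(od_lo)
            )
            log_c = math.log(c_lo) + frac * (math.log(c_hi) - math.log(c_lo))
            # Correct for any dilution of the sample in reagent diluent.
            return math.exp(log_c) * dilution_factor
    return None


# Hypothetical standards: (concentration in ng/mL, corrected OD).
standards = [(1.0, 0.1), (10.0, 0.4), (100.0, 1.6)]

in_range = interpolate_concentration(0.2, standards)
too_high = interpolate_concentration(2.0, standards)
too_low = interpolate_concentration(0.05, standards)
print(in_range, too_high, too_low)
```

Returning `None` rather than extrapolating keeps out-of-range samples visibly separate from quantified ones, which matches how the serum and late-stage HCC datapoints outside the curve were handled above.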

Claims

1. An in vitro method of typing a sample of an individual for the presence of a hepatocellular carcinoma (HCC), the method comprising: (i) determining the concentration of at least 2 marker proteins to thereby provide a concentration profile of the marker proteins, wherein the marker proteins are selected from the proteins listed in Table 1; (ii) comparing the individual’s concentration profile to a reference concentration profile of the at least 2 marker proteins; thereby typing the sample for the presence of HCC.
2. The method according to claim 1, wherein the sample is a plasma or serum sample.
3. The method according to claim 1 or claim 2, wherein the determination of the protein concentration is performed using an enzyme-linked immunosorbent assay (ELISA), preferably a multiplex ELISA.
4. The method according to any one of claims 1-3, wherein the at least 2 marker proteins are selected from PEX14, KLRG2, ARL4D, RAB38, PKD2, and NKG2E, or from PEX14, KLRG2, RAB38, GALNS, CP2CJ and IMPA1.
5. The method according to any one of claims 1-4, wherein the at least 2 marker proteins comprise PEX14 and KLRG2.
6. The method according to any one of claims 1-5, wherein the at least 2 marker proteins comprise PEX14, KLRG2 and at least one protein selected from ARL4D, RAB38, PKD2, and NKG2E, preferably comprise PEX14, KLRG2 and at least two proteins selected from ARL4D, RAB38, PKD2 and NKG2E, more preferably comprise PEX14, KLRG2, ARL4D, RAB38 and PKD2.
7. The method according to any one of claims 1-6, wherein the at least 2 marker proteins comprise PEX14, KLRG2, RAB38 and Glypican 3.
8. The method according to any one of claims 1-7, wherein the protein concentration of at least 5 different marker proteins, at least 6 different marker proteins, preferably at least 7 different marker proteins, more preferably at least 8 different marker proteins, more preferably at least 10 different marker proteins, more preferably at least 20 different marker proteins selected from the proteins listed in Table 1, most preferably all marker proteins listed in Table 1, is determined.
9. The method according to any one of claims 1-8, wherein the individual is at risk of having or developing HCC.
10. The method according to any one of claims 1-9, wherein the individual has cirrhosis, fibrosis, chronic hepatitis B, chronic hepatitis C, alcoholic liver disease, NAFLD, NASH, primary biliary cholangitis, primary hemochromatosis, autoimmune hepatitis, alpha-1 antitrypsin deficiency, or Wilson's disease.
11. The method according to any one of claims 1-10, wherein the reference concentration profile is composed of the average concentrations of the marker proteins specified in step (ii) of individuals having HCC; of individuals not having HCC; or of a mixture of individuals having HCC and individuals not having HCC.
12. The method according to any one of claims 1-11, wherein the individual’s concentration profile is compared to two reference concentration profiles, wherein one reference concentration profile is composed of the average concentrations of the marker proteins specified in step (ii) of individuals having HCC and the other reference concentration profile is composed of the average concentrations of the marker proteins specified in step (ii) of individuals not having HCC.
13. A method of treating an individual with HCC, comprising - typing of a sample from said individual using the method according to any one of claims 1-12; - treating the individual that is typed as having HCC with a curative treatment; and - testing the individual that is typed as not having HCC with the method according to any one of claims 1-12 at regular time intervals, such as every three years, preferably every two years, preferably every year, more preferably every six months.
14. The method according to claim 13, wherein the individual that is typed as not having HCC is treated with a treatment strategy related to the individual’s underlying risk factor for HCC.
15. The method according to claim 13 or claim 14, wherein the curative treatment comprises liver transplantation, ablation, surgical resection or a combination thereof.
16. The method according to any one of claims 13-15, wherein the individual is at risk of having or developing HCC.
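The comparison against two reference concentration profiles described in claims 11 and 12 can be sketched as follows. The claims do not specify a comparison metric, so this illustration assumes a Euclidean distance on log-transformed concentrations and assigns the sample to the nearer reference profile. The marker names, concentration values and function names are hypothetical placeholders and do not represent measured data.

```python
import math


def build_reference(profiles):
    """Average the concentration of each marker across individuals."""
    markers = profiles[0].keys()
    return {m: sum(p[m] for p in profiles) / len(profiles) for m in markers}


def distance(profile, reference):
    """Assumed metric: Euclidean distance on log concentrations."""
    return math.sqrt(
        sum((math.log(profile[m]) - math.log(reference[m])) ** 2 for m in reference)
    )


def type_sample(profile, ref_hcc, ref_non_hcc):
    """Type the sample according to the nearer reference profile."""
    if distance(profile, ref_hcc) < distance(profile, ref_non_hcc):
        return "HCC"
    return "non-HCC"


# Hypothetical concentrations (arbitrary units) for two placeholder markers.
hcc_group = [{"marker_1": 8.0, "marker_2": 2.0}, {"marker_1": 12.0, "marker_2": 4.0}]
non_hcc_group = [{"marker_1": 3.0, "marker_2": 9.0}, {"marker_1": 5.0, "marker_2": 11.0}]

ref_hcc = build_reference(hcc_group)
ref_non_hcc = build_reference(non_hcc_group)

result = type_sample({"marker_1": 9.0, "marker_2": 3.5}, ref_hcc, ref_non_hcc)
print(result)
```

A distance to averaged profiles is only one possible reading of "comparing"; a trained classifier over the same marker concentrations would be another.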
PCT/NL2024/050499 2023-09-12 2024-09-12 Biomarkers for typing a sample of an individual for hepatocellular carcinoma. Pending WO2025058517A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP23197015.3 2023-09-12
EP23197015 2023-09-12

Publications (1)

Publication Number Publication Date
WO2025058517A1 true WO2025058517A1 (en) 2025-03-20

Family

ID=88021014

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/NL2024/050499 Pending WO2025058517A1 (en) 2023-09-12 2024-09-12 Biomarkers for typing a sample of an individual for hepatocellular carcinoma.

Country Status (1)

Country Link
WO (1) WO2025058517A1 (en)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24772432

Country of ref document: EP

Kind code of ref document: A1