
WO2024252401A1 - Markers - Google Patents

Info

Publication number
WO2024252401A1
Authority
WO
WIPO (PCT)
Prior art keywords
lung cancer
cfdna
methylation
marker
dna
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/IL2024/050563
Other languages
French (fr)
Inventor
Danny Frumkin
Adam Wasserstrom
Revital KNIRSH
Orna SAVIN
Nimrod AXELRAD
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nucleix Ltd
Original Assignee
Nucleix Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nucleix Ltd
Publication of WO2024252401A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/154: Methylation markers

Definitions

  • the present invention relates to methods, systems and kits for diagnosing lung cancer (and particularly early-stage and/or high-grade lung cancer) in a subject, staging and grading the cancer, evaluating post-treatment disease recurrence, monitoring treatment efficacy and providing prognosis, by analysing DNA methylation markers in cell-free DNA from a sample of the subject.
  • Lung cancer is one of the most common and serious types of cancer.
  • NSCLC: non-small cell lung cancer
  • SCLC: small cell lung cancer
  • the general prognosis of lung cancer is poor as it does not usually cause noticeable symptoms until it has spread through the lungs and sometimes also into other parts of the body. Therefore, detection of cancer at the earliest possible stage is of paramount importance for treatment of the disease.
  • Lung cancers may be classified according to ‘stage’ and/or ‘grade’ (with the classification process known as ‘staging’ or ‘grading’, respectively).
  • stage of a lung cancer indicates its size and degree of spread around the body, so provides information about the progression of the disease.
  • a typical lung cancer staging system comprises stages 1 to 4, with progression being from stage 1 to stage 4.
  • the grade of a lung cancer is determined by specific morphological features of the lung cancer tumor cells and generally indicates the similarity of the tumor cells to non-tumor cells when viewed under a microscope.
  • Lung cancer grade provides information about the aggressiveness of the cancer, i.e., how quickly the lung cancer cells are likely to be able to divide and spread around the body.
  • Various grading systems may be used, comprising a different number of possible grades.
  • SCLCs typically have a high grade. Both stage and grade can influence treatment efficacy, for instance certain treatments may not be as effective in a high-grade cancer compared to a low-grade cancer. Stage and grade will also affect the balance of risk against benefit for any particular treatment option.
  • the present invention addresses these needs by providing methods, systems and kits for diagnosing and staging/grading lung cancer based on the methylation of CpG sites in DNA, specifically cell-free DNA (cfDNA), at one or more of the genomic loci in the appended sequence listing.
  • cfDNA: cell-free DNA
  • the genomic loci in the sequence listing are also referred to as ‘markers’ or ‘marker loci’.
  • DNA methylation is the conversion of a cytosine in a DNA site with the dinucleotide sequence ‘CG’ (known as a CpG site) to 5-methylcytosine (5mC).
  • CG: dinucleotide sequence
  • 5mC: 5-methylcytosine
  • Changes in DNA methylation are known to occur in many types of cancer, but the pattern of DNA methylation across the genome also varies over time, between different individuals and between different instances of the disease. So, it is difficult to link specific changes in DNA methylation to the presence or absence of lung cancer (or its stage/grade) and there is currently insufficient knowledge about methylation markers that are highly correlated with lung cancer.
  • the inventors have discovered that a change in methylation of one or more of the CpG markers in the sequence listing indicates the presence of lung cancer with surprisingly high specificity and high sensitivity. Accordingly, measuring the methylation level of one or more of the markers in the sequence listing allows for improved diagnosis of lung cancer.
  • the change for any particular marker can be an increase in methylation (hypermethylation) or a decrease (hypomethylation) compared to an index methylation level for the marker in cfDNA from an individual/individuals without lung cancer (which can include individuals without lung cancer, but who are at high risk of developing lung cancer).
  • the invention involves analysis of the methylation of CpG sites in cfDNA, i.e., fragmented genomic DNA which is found in vivo in an animal within a bodily fluid rather than within an intact cell. This allows for non-invasive diagnosis (so-called ‘liquid biopsy’).
  • CfDNA from an individual with lung cancer will comprise small amounts of DNA derived from the lung cancer tumor cells, so analysing methylation in cfDNA allows for cancer-associated marker methylation changes to be detected without having to take samples of the tumor itself.
  • the small amount of tumor-derived cfDNA will be mixed with a massive excess of DNA derived from non-tumor cells. This is even more the case when the individual has an early-stage lung cancer.
  • the invention provides a method for determining a likelihood of the presence of lung cancer in a human subject, comprising steps of:
  • the known source may be cfDNA from an individual without lung cancer or an individual known to have lung cancer. If the source is an individual known to have lung cancer, in some embodiments they may be known to have a particular type of lung cancer (i.e., NSCLC or SCLC), a particular stage of lung cancer, or a particular grade of lung cancer. Comparison of the methylation level of the sample to an index methylation level derived from an individual having a known lung cancer status permits the likelihood of the presence (or absence) of lung cancer to be determined.
  • TNM: Tumor, Node and Metastasis
  • the TNM system can also be used to divide lung cancers into ‘number stages’ (stage 1, stage 2, stage 3 or stage 4) and/or sub-stages (stage 1A, stage 1B, stage 2A, stage 2B, stage 3A, stage 3B, stage 3C, stage 4A or stage 4B).
  • number stages: stage 1, stage 2, stage 3 or stage 4
  • sub-stages: stage 1A, stage 1B, stage 2A, stage 2B, stage 3A, stage 3B, stage 3C, stage 4A or stage 4B
  • for SCLC, an alternative staging system is to classify the cancer as either ‘limited’ or ‘extensive’.
  • a system for grading lung cancer classifies the cancer into grade 1 (well differentiated, lepidic dominant), grade 2 (moderately differentiated, acinar or papillary predominant) or grade 3 (poorly differentiated; solid or micropapillary predominant).
  • the known source may be cfDNA from an individual with stage 1, stage 2, stage 3 or stage 4 lung cancer. Alternatively or additionally, the known source may be cfDNA from an individual with stage 1A, stage 1B, stage 2A, stage 2B, stage 3A, stage 3B, stage 3C, stage 4A or stage 4B lung cancer. The known source may also be cfDNA from an individual with limited SCLC or extensive SCLC. The known source may also be cfDNA from an individual with grade 1, grade 2 or grade 3 lung cancer.
  • the index level may be based on cfDNA from a plurality of individuals with the same cancer status.
  • the known source may be a plurality of individuals without lung cancer, a plurality of individuals known to have lung cancer, a plurality of individuals known to have a particular type of lung cancer (i.e., NSCLC or SCLC), a plurality of individuals known to have a particular stage of lung cancer, or a plurality of individuals known to have a particular grade of lung cancer.
  • the index methylation level may be an average of the methylation levels for the same marker in the known sources.
  • the comparison methylation level may be the average of the methylation level calculated for the same marker in cfDNA from many individuals without lung cancer. The average may be the arithmetic mean.
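The averaging and comparison described above can be sketched as follows. This is a minimal illustration, not the applicant's method: the function names, the example methylation fractions, and the simple greater-than/less-than comparison (with no statistical test or tolerance) are all assumptions made for the example.

```python
from statistics import mean


def index_methylation_level(reference_levels):
    """Arithmetic mean of one marker's methylation level across reference individuals."""
    if not reference_levels:
        raise ValueError("at least one reference level is required")
    return mean(reference_levels)


def methylation_change(sample_level, index_level):
    """Classify a sample as 'hyper', 'hypo' or 'none' relative to the index level.

    Hypothetical helper: a real comparison would also report confidence.
    """
    if sample_level > index_level:
        return "hyper"
    if sample_level < index_level:
        return "hypo"
    return "none"


# Hypothetical methylation fractions for one marker in four individuals without lung cancer.
controls = [0.02, 0.04, 0.03, 0.03]
index_level = index_methylation_level(controls)
print(methylation_change(0.25, index_level))
```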
  • as a cancer becomes increasingly severe (i.e., progresses from stage 1 through to stage 4 and/or from a low grade to a high grade), additional changes in CpG methylation can accrue.
  • the methylation level of the markers of the sequence listing may also change with lung cancer severity, so providing information about lung cancer stage and/or grade.
  • Methods of the invention are particularly useful for diagnosing early-stage lung cancer, i.e., stage 1 (encompassing stage 1A and stage 1B).
  • Methods of the invention are also particularly useful for diagnosing high-grade lung cancer (e.g., grade 2 or 3), preferably at an early stage (e.g., stage 1 or 2).
  • the methylation level of the markers of the sequence listing may also differ depending on lung cancer type (NSCLC or SCLC). So, methods of the invention are also useful for diagnosing whether NSCLC or SCLC is present in a human subject.
  • the invention provides a method for determining a likelihood of the presence of lung cancer of a particular type, stage and/or grade in a human subject, comprising:
  • the measured methylation level(s) may be compared to more than one index methylation level, each index level being for the same marker in individual(s) having a different type, stage and/or grade of lung cancer, and the likelihood of the presence of lung cancer of a particular type, stage and/or grade is based on all the comparisons performed.
  • the indicative value of a marker can be quantified by measuring the area under the receiver operating characteristic (ROC) curve (AUC) for the marker.
  • AUC: area under the receiver operating characteristic (ROC) curve
  • an AUC of greater than 0.5 (for hypermethylated markers) or less than 0.5 (for hypomethylated markers) is useful for disease prediction.
  • the difference between the AUC and 0.5 is >0.25, >0.30, >0.35, >0.40, >0.45, or greater.
  • the indicative value of a marker can also be measured statistically by comparing its mean methylation level in cfDNA samples from patients with lung cancer to its mean methylation level in cfDNA samples from patients without lung cancer e.g. using Student’s t-test to produce a p-value.
  • a p-value < 0.05 is generally used as the cut-off for statistical significance.
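As a rough illustration of the statistical comparison mentioned above, the sketch below computes the pooled-variance (Student's) t statistic and its degrees of freedom for two groups of methylation levels. The function name and the sample values are hypothetical. Turning the statistic into a p-value requires the t distribution, which is left to a statistics library here.

```python
from math import sqrt
from statistics import mean, variance


def student_t(cancer_levels, control_levels):
    """Return (t statistic, degrees of freedom) for a two-sample Student's t-test.

    Assumes equal variances in the two groups (the classic pooled-variance form).
    """
    n1, n2 = len(cancer_levels), len(control_levels)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    dof = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(cancer_levels)
                  + (n2 - 1) * variance(control_levels)) / dof
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(cancer_levels) - mean(control_levels)) / standard_error, dof


# Hypothetical methylation fractions for one marker.
t_stat, dof = student_t([0.30, 0.35, 0.40], [0.05, 0.10, 0.15])
print(round(t_stat, 3), dof)
```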
  • the invention provides a method for determining a likelihood of the presence of lung cancer in a human subject, comprising:
  • the methylation level of a marker is measured by cfDNA digestion using methylation-sensitive and/or methylation-dependent restriction endonucleases (MSREs or MDREs, respectively) followed by downstream analytical steps which quantify the degree of digestion.
  • MSREs: methylation-sensitive restriction endonucleases
  • MDREs: methylation-dependent restriction endonucleases
  • Digestion with a plurality of MSREs or MDREs, for instance, two MSREs, is also encompassed in the methods of the invention. MSREs and MDREs are described in more detail below.
  • the term ‘digestion’ refers to the mixing of active restriction enzyme(s) with cfDNA in conditions under which digestion can occur. If there are no cleavable recognition sites for the restriction enzyme in question (e.g., because it is a MSRE and all of the recognition sequences are fully methylated) then a step of ‘digestion’ still takes place even though DNA cleavage does not occur.
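The logic of MSRE digestion can be pictured with a toy model: a fragment stays intact only if every recognition site it carries is methylated. The sketch below is illustrative only. The default recognition sequence CCGG (the site of the well-known MSRE HpaII) is an assumption, since no enzyme is named in this passage, and the function name and fragment are hypothetical.

```python
def survives_msre_digestion(fragment, methylated_positions, site="CCGG"):
    """Return True if a methylation-sensitive enzyme leaves the fragment uncut.

    methylated_positions holds 0-based indices of methylated cytosines.
    A site is cut only when the cytosine of its CpG is unmethylated.
    """
    cpg_offset = site.index("CG")  # position of the CpG cytosine within the site
    start = fragment.find(site)
    while start != -1:
        if start + cpg_offset not in methylated_positions:
            return False  # an unmethylated site is cleaved
        start = fragment.find(site, start + 1)
    return True  # no sites, or all sites protected by methylation


fragment = "ATCCGGTACCGGAT"  # hypothetical fragment with two CCGG sites
print(survives_msre_digestion(fragment, {3, 9}))  # both sites methylated
print(survives_msre_digestion(fragment, {3}))     # second site unmethylated
```

A fragment with no recognition site at all also returns True, matching the point that a ‘digestion’ step can take place without any cleavage.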
  • the methylation level of a marker is measured by mixing cfDNA with one or more reagents that chemically modify nucleobases within DNA in a methylation-conditional manner, followed by downstream analytical steps which quantify the degree of modification.
  • a suitable reagent is sodium bisulfite, which converts unmethylated cytosine to uracil (further details below).
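The effect of bisulfite treatment on a single strand can be sketched in a few lines. The function name and the example sequence are hypothetical, and the model ignores incomplete conversion and DNA degradation.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Convert unmethylated cytosines to uracil; methylated cytosines are kept.

    methylated_positions holds 0-based indices of 5-methylcytosines.
    """
    converted = []
    for index, base in enumerate(sequence.upper()):
        if base == "C" and index not in methylated_positions:
            converted.append("U")
        else:
            converted.append(base)
    return "".join(converted)


# Hypothetical strand: the cytosine at index 1 is methylated, the others are not.
print(bisulfite_convert("ACGTCCG", {1}))
```

Because only unmethylated cytosines change, comparing the treated strand with the reference sequence reveals which CpG cytosines were methylated.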
  • the downstream analytical steps comprise (i) amplification of a sequence comprising a CpG site located within the marker, or (ii) high throughput sequencing.
  • the amplification is by polymerase chain reaction (PCR), specifically by real time PCR (rtPCR, also known as quantitative PCR or qPCR).
  • rtPCR: real-time PCR
  • the amplification is by another real-time amplification reaction, for instance, an isothermal real-time amplification reaction such as real-time accelerated reverse transcription loop-mediated isothermal amplification (real-time RT-LAMP).
  • the invention provides a method for determining a likelihood of the presence of lung cancer in a human subject, comprising:
  • the invention also provides primer pairs comprising a first primer and a second primer, for amplifying a CpG site within a marker of the sequence listing. So, for each marker listed in the sequence listing, the invention provides a primer pair consisting of one primer binding upstream of a CpG site within the marker and one primer binding downstream of the CpG site, wherein the primer pair is suitable for use in a PCR to generate an amplification product comprising the CpG site.
  • measuring a methylation level comprises using a fluorescently-labelled polynucleotide probe to obtain a signal intensity for an amplification product generated in the rtPCR.
  • the labelled probe is typically between 15-30 nucleotides in length and comprises sequence that is complementary to a sub-sequence within the amplicon of interest.
  • the melting temperature of the probe is comparable to that of the primers used in the rtPCR.
  • the invention also provides fluorescently-labelled oligonucleotide probes for detecting an amplification product of a primer pair of the invention.
  • the invention also provides primer sets, each primer set having 4-6 primers, for amplifying a CpG site within a marker of the sequence listing by an isothermal real-time amplification reaction such as real-time RT-LAMP.
  • the invention also provides fluorescently-labelled polynucleotide probes for obtaining a signal intensity for an amplification product generated in the isothermal real-time amplification reaction.
  • the invention also provides a nucleic acid construct comprising a pair of sequencing adapters flanking a nucleic acid insert, wherein the insert is a marker listed in the sequence listing (or a fragment thereof).
  • the sequencing adapters can include one or more of: a site recognised by a universal primer; a flow cell binding sequence, such as a P5 or P7 sequence; an index sequence, such as an i5 or i7 index; and/or a molecular barcode.
  • the two adapters within the construct may differ e.g. one may include a P7 and i7 sequence, whereas the other includes a P5 and i5 sequence.
  • where the insert is a fragment of a marker in the sequence listing, it is ideally at least 20 nucleotides long e.g.
  • constructs can be prepared by ligating sequencing adapters to a digested cfDNA sample e.g. by ligating Y-shaped adapters.
  • the digested cfDNA sample may be subjected to end repair and/or A-tailing prior to the ligation.
  • the nucleic acid construct is suitable for sequencing by a NGS technique.
  • the invention provides a method for determining a likelihood of the presence of lung cancer in a human subject, comprising:
  • the amount of cfDNA that can be isolated from a typical sample is generally not limiting.
  • the invention is particularly useful as an initial evaluation technique, or as part of a screening programme.
  • the subject may not be suspected of having lung cancer.
  • the subject may be suspected of having lung cancer but is asymptomatic (i.e., does not exhibit any suspicious clinical signs of lung cancer).
  • a reason that the subject is suspected of having lung cancer may be that the subject is classified as having a high risk of developing lung cancer, for example, based on age, smoking history, previous history of lung cancer, genetic predisposition, and/or family history.
  • the high risk may be classified according to the age and smoking history of the subject. For instance, the subject may be classified as high risk if they are between 55 and 74 years old and smoke or have smoked.
  • USPSTF: US Preventive Services Task Force
  • a ‘pack year’ is a unit of smoking equivalent to an average of 1 pack of cigarettes (such as 20 cigarettes) per day for 1 year.
  • a person could have a 20 pack year history by smoking 1 pack a day for 20 years, or 2 packs a day for 10 years.
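The pack-year arithmetic above, together with the age-and-smoking example given earlier (55 to 74 years old, current or former smoker), can be written out as a short sketch. The function names are hypothetical and the inclusive age bounds are an assumption.

```python
def pack_years(cigarettes_per_day, years, cigarettes_per_pack=20):
    """Smoking exposure in pack years: packs smoked per day times years smoked."""
    return cigarettes_per_day / cigarettes_per_pack * years


def is_high_risk(age, ever_smoked, min_age=55, max_age=74):
    """Example classification from the text: aged 55 to 74 and smokes or has smoked.

    The bounds are treated as inclusive, which is an assumption.
    """
    return ever_smoked and min_age <= age <= max_age


print(pack_years(20, 20))  # 1 pack a day for 20 years
print(pack_years(40, 10))  # 2 packs a day for 10 years
print(is_high_risk(60, True))
```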
  • the subject has a smoking history of about 40 pack years or more.
  • the subject is at least about 50 years old, at least about 55 years old, at least about 60 years old, or at least about 65 years old.
  • the subject may exhibit suspicious clinical signs of cancer and/or is suspected of having lung cancer based on other prior assay(s) e.g., based on testing of other biomarker(s).
  • the subject is at risk of recurrence of lung cancer.
  • the subject shows at least one symptom or characteristic of lung cancer.
  • Symptoms or characteristics of lung cancer include, but are not limited to: a persistent (and potentially worsening) cough, recurring chest infections, coughing up blood, aches and/or pains when breathing and/or coughing, persistent breathlessness, persistent lack of energy, loss of appetite, unexplained weight loss, finger clubbing, difficulty swallowing or pain when swallowing, wheezing, a hoarse voice, swelling of the face or neck and persistent chest and/or shoulder pain.
  • the subject was not previously diagnosed with lung cancer. In some embodiments, the subject was previously diagnosed and treated for lung cancer. In some embodiments, such a subject is in need of monitoring for the recurrence of lung cancer.
  • methods include a step of preparing a report in paper or electronic form based on the assessment of the likelihood of lung cancer or the diagnosis of the presence or absence of lung cancer, and optionally communicating the report to the subject and/or a healthcare provider of the subject.
  • the invention can also be embodied as a method for: assessment of a subject with lung cancer, assessment of a subject without any symptoms of lung cancer, assessment of a subject with at least one symptom of lung cancer, ruling out lung cancer in a subject with at least one symptom of lung cancer, determining the presence or absence of high-grade lung cancer in a subject, or ruling out high-grade lung cancer in a subject.
  • the invention can also be used as an initial step in existing lung cancer diagnostic techniques, to target such techniques on patients where the invention indicates that lung cancer is present.
  • the invention also provides a method for detecting lung cancer in a subject, comprising determining a likelihood of the presence of lung cancer as disclosed herein, and performing a clinical diagnostic step on the subject.
  • the clinical diagnostic step may be one or more of: a chest X-ray; a CT scan; a PET-CT scan; a bronchoscopy and biopsy; a bronchoscopy and endobronchial ultrasound scan; a thoracoscopy; a mediastinoscopy; and/or percutaneous needle biopsy.
  • the invention can also be embodied as methods of treatment.
  • the invention provides a method for treating or managing lung cancer in a human subject, comprising determining a likelihood of the presence of lung cancer as disclosed herein, and administering, deciding to administer, or recommending the administration of a suitable treatment to the subject based on the likelihood.
  • the subject can be taken forward into a suitable method of treatment.
  • the treatment may comprise one or more of surgical resection (including wedge resection, segmental resection, sleeve resection, lobectomy and pneumonectomy), laser therapy, photodynamic therapy, cryosurgery, electrocautery, chemotherapy, radiation therapy, immunotherapy, and/or targeted drug therapy (see below for more details).
  • the invention also provides a method for treating or managing lung cancer in a human subject, comprising determining a likelihood of the presence of lung cancer of a particular type, stage and/or grade as disclosed herein, and administering a suitable treatment based on the likelihood.
  • a likelihood of the presence of lung cancer is determined in a human subject one or more times after the subject has undergone lung cancer treatment. This provides information about treatment response.
  • the human subject is identified as non-responsive to the lung cancer treatment and said lung cancer treatment is modified.
  • the human subject is identified as non-responsive to the lung cancer treatment and it is decided to modify said lung cancer treatment.
  • the human subject is identified as non-responsive to the lung cancer treatment and it is recommended to modify said lung cancer treatment.
  • the human subject is categorised as having residual disease or viable tumor cells, and a second-line therapy is administered to the subject.
  • the human subject is categorised as having residual disease or viable tumor cells, and it is decided to administer a second-line therapy to the subject.
  • the second-line therapy comprises one or more of surgical resection (including wedge resection, segmental resection, sleeve resection, lobectomy and pneumonectomy), laser therapy, photodynamic therapy, cryosurgery, electrocautery, chemotherapy, radiation therapy, immunotherapy, and/or targeted drug therapy.
  • said subject is categorised as having residual disease or viable tumor cells, thereby indicating that said subject is at high risk of disease recurrence.
  • the invention also provides a method for differentially amplifying tumor-derived cfDNA and non-tumor-derived cfDNA in cfDNA from a sample of a human subject having lung cancer comprising: (a) treating the cfDNA from a sample of a human subject having lung cancer with at least one reagent that differentially affects methylated and non-methylated DNA; and
  • the reagent that differentially affects methylated and non-methylated cfDNA is a MSRE or a MDRE.
  • the reagent is a MSRE. Treating the cfDNA with a plurality of MSREs or MDREs, for instance, two MSREs, is also encompassed in the methods of the invention.
  • the reagent that differentially affects methylated and non-methylated cfDNA may be a reagent that conditionally chemically modifies nucleobases within DNA based on their methylation status.
  • a suitable reagent is sodium bisulfite, which converts unmethylated cytosine to uracil.
  • the amplification reaction is preferably PCR. Most preferably the amplification reaction is rtPCR because this can be highly sensitive and does not require a separate step for quantifying amplification. Alternatively, the amplification reaction may be an isothermal amplification reaction, such as RT-LAMP.
  • the invention also provides a method for preparing data useful for lung cancer diagnosis comprising: treating cell-free DNA (cfDNA) from a sample of a subject with a reagent that differentially affects methylated and non-methylated DNA; measuring a methylation level, based on the effect of the reagent on the cfDNA, for at least one marker of the sequence listing; and recording the measured methylation level(s).
  • the CpG markers disclosed herein have been selected based on their ability to identify methylation changes associated with lung cancer. More generally, however, the same markers can also be useful for identifying methylation changes associated with other types of cancer and proliferative disorders. Moreover, they can be used as pan-cancer markers i.e. for determining the likelihood of the presence of multiple different types of cancer (including lung cancer).
  • the inventors have identified the genomic loci listed in the sequence listing as markers for early detection of lung cancer.
  • measuring the methylation level of a CpG located within a sequence of the sequence listing can be used for determining the likelihood of the presence of lung cancer in a human subject.
  • the sequence listing provides the sequence of each marker and comprises the following additional information for each marker: the marker’s chromosome (in the “chromosome” qualifier of the “source” feature of each sequence); and start/end coordinates according to the hg38 genome assembly (in the “map” qualifier of the “source” feature of each sequence).
  • the markers of SEQ ID NOs: 1-30615 have increased methylation (‘hyper’) in cfDNA from cancer patients compared to healthy control subjects.
  • the markers of SEQ ID NOs: 30616-39636 have decreased methylation (‘hypo’) in cfDNA from cancer patients compared to healthy control subjects.
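The two SEQ ID ranges above amount to a simple lookup from marker number to expected direction of change. A minimal sketch, with a hypothetical function name:

```python
def expected_direction(seq_id):
    """Expected methylation change in cancer cfDNA for a marker, by SEQ ID NO.

    Ranges are taken from the text: 1-30615 hypermethylated, 30616-39636 hypomethylated.
    """
    if 1 <= seq_id <= 30615:
        return "hyper"
    if 30616 <= seq_id <= 39636:
        return "hypo"
    raise ValueError(f"SEQ ID NO {seq_id} is outside the sequence listing")


print(expected_direction(25726))
print(expected_direction(30616))
```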
  • two or more markers may overlap (e.g. SEQ ID NOs: 25726 & 25727) and, in these instances, the invention also extends to an aggregated marker encompassing the overlapping markers (e.g. for SEQ ID NOs: 25726 & 25727, nucleotides 12,685,047-12,685,301 on chromosome 16).
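Aggregating overlapping markers is an interval-merging problem. The sketch below merges markers that share a chromosome and overlap in coordinates. The function name is hypothetical, and the two input intervals are invented for illustration: only the aggregated span 12,685,047-12,685,301 on chromosome 16 is given in the text, not the coordinates of the individual markers.

```python
def aggregate_markers(markers):
    """Merge overlapping markers given as (chromosome, start, end) tuples.

    Coordinates are treated as inclusive; markers on different chromosomes never merge.
    """
    merged = []
    for chrom, start, end in sorted(markers):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical split of the aggregated chr16 region into two overlapping markers.
markers = [("chr16", 12685150, 12685301), ("chr16", 12685047, 12685200)]
print(aggregate_markers(markers))
```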
  • genomic locus refers to a DNA sequence at a specific region within the genome.
  • the specific region may be identified by the molecular location, namely, by the chromosome and the numbers of the starting and ending base pairs on the chromosome.
  • Genomic loci include gene sequences as well as other genetic elements (e.g., intergenic sequences).
  • a ‘marker locus’ or simply ‘marker’ is a genomic locus that is differentially methylated between different sources of cfDNA (e.g. lung tumor vs. healthy tissue), and therefore analysis of its methylation provides an indication with respect to the source of the DNA.
  • hypermethylation of a particular marker indicates the presence of the cancer, where ‘hypermethylation’ means increased methylation of the marker across a sample of DNA molecules containing the marker, compared to an index methylation level for that marker in cfDNA from an individual/individuals without lung cancer.
  • hypomethylation of a particular marker indicates the presence of the cancer, where ‘hypomethylation’ means decreased methylation of the marker across a sample of DNA molecules containing the marker, compared to an index methylation level for that marker in cfDNA from an individual/individuals without lung cancer.
  • the comparison of a methylation level for a marker in a sample and the index methylation level of that marker can use typical techniques used when comparing measurements in biological systems.
  • the comparison may be accompanied by an indication of the confidence in that comparison e.g. based on statistical analysis.
  • the degree to which the methylation status of a particular marker is indicative of the presence or absence of lung cancer can be quantified by measuring the area under the receiver operating characteristic (ROC) curve (AUC) for the marker.
  • ROC: receiver operating characteristic
  • a particular methylation level can be chosen as a threshold for a disease prediction model based on methylation of that marker.
  • the model would predict the presence of disease for observed methylation levels that cross that threshold, and the absence of disease for observed methylation levels that do not.
  • a particular classification threshold will be associated with a true positive rate (sensitivity), i.e., the proportion of observations that are correctly predicted to indicate disease, and a false positive rate (1 - specificity), i.e., the proportion of observations that are incorrectly predicted to indicate disease.
  • a ROC curve is obtained by plotting the true positive rate (on the y axis) against the false positive rate (on the x axis) for various classification thresholds.
  • a ROC curve that is simply a straight line from the bottom left corner to the top right corner (AUC of 50% or 0.5) occurs if the true and false positive rates are equal at all classification thresholds, and indicates no predictive value.
  • Preferred markers herein have an AUC that differs from 0.5 by at least 0.2 e.g. by >0.25, >0.30, >0.35, >0.40, >0.45, or more.
  • a hypermethylated marker may thus have an AUC of >0.7, and a hypomethylated marker may have an AUC of <0.2.
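The threshold-based rates and the AUC described above can be illustrated with a small sketch. It treats a methylation level above the threshold as a positive prediction, and computes the AUC as the fraction of (cancer, control) pairs in which the cancer sample has the higher level, counting ties as half, which is equivalent to the area under the ROC curve. The function names and the sample values are hypothetical.

```python
def rates_at_threshold(cancer_levels, control_levels, threshold):
    """Return (true positive rate, false positive rate) at one threshold.

    A level strictly above the threshold is predicted as disease.
    """
    true_positive_rate = sum(x > threshold for x in cancer_levels) / len(cancer_levels)
    false_positive_rate = sum(x > threshold for x in control_levels) / len(control_levels)
    return true_positive_rate, false_positive_rate


def roc_auc(cancer_levels, control_levels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    score = 0.0
    for cancer in cancer_levels:
        for control in control_levels:
            if cancer > control:
                score += 1.0
            elif cancer == control:
                score += 0.5
    return score / (len(cancer_levels) * len(control_levels))


# Hypothetical methylation fractions for a marker that is hypermethylated in cancer.
cancer = [0.8, 0.6, 0.4]
control = [0.5, 0.3, 0.1]
print(rates_at_threshold(cancer, control, 0.45))
print(roc_auc(cancer, control))
```

With the two groups swapped, the same function returns an AUC below 0.5, which is the pattern expected for a hypomethylated marker.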
  • Markers according to the invention are described in the sequence listing.
  • the location of the markers is given according to Genome Reference Consortium Human Build 38 patch p13 (‘GRCh38.p13’, generally known as ‘hg38’).
  • the markers in the sequence listing cover between around 30 bp and around 500 bp of the human genome.
  • the markers in the sequence listing contain at least one CpG site located within a restriction site of a MSRE or MDRE.
  • CpG site(s) may be at any position within a particular marker in the sequence listing.
  • the invention can be based on analysis of any CpG found within the markers in the sequence listing.
  • these marker loci can be detected in cell-free DNA, particularly in cfDNA from plasma samples, enabling non-invasive disease detection and characterization.
  • the methods disclosed herein are particularly useful for analysing cell-free DNA (cfDNA) i.e., fragmented genomic DNA which is found in vivo in an animal within a bodily fluid rather than within an intact cell.
  • the origin of cfDNA is not fully understood, but it is generally believed to be released from cells in processes such as apoptosis and necrosis.
  • cfDNA is highly fragmented compared to intact genomic DNA (e.g., see Alcaide et al. (2020) Scientific Reports 10, article 12564), and in general circulates as fragments between 120-220 bp long, with a peak around 168 bp (in humans).
  • cfDNA is present in many bodily fluids, including but not limited to blood and urine, and the methods and compositions disclosed herein can use any suitable source of cfDNA e.g., a blood sample (such as venous blood) or a urine sample.
  • cfDNA is isolated from blood, and the blood may be treated to yield plasma (i.e., the liquid remaining after a whole blood sample is subjected to a separation process to remove the blood cells, typically involving centrifugation) or serum (i.e., blood plasma without clotting factors such as fibrinogen).
  • the methods and compositions disclosed herein can be used
  • Methods disclosed herein may thus include a step of purifying cfDNA from a blood, plasma or serum sample, to provide cfDNA for digestion and analysis. Methods may also include a step of obtaining a blood sample and preparing plasma or serum therefrom, thus providing a source for downstream purification of cfDNA.
  • Blood can be collected in tubes that contain an anticoagulant and an agent to inhibit genomic DNA from white blood cells in the sample being released into the plasma component of the blood sample.
  • Such tubes are commercially available as glass cfDNA ‘Blood Collection Tubes’ or ‘BCT’ from Streck (La Vista, NE) e.g. as discussed by Diaz et al. (2016) PLoS One 11(11): e0166354, and they can stabilize cfDNA within blood for up to 14 days at 6-37°C (thus providing advantages compared to typical K2EDTA collection tubes).
  • Useful anticoagulants include, but are not limited to, EDTA, heparin, or citrate.
  • Useful agents to inhibit release of genomic DNA from white blood cells include, but are not limited to, diazolidinyl urea, imidazolidinyl urea, dimethylol-5,5-dimethylhydantoin, dimethylol urea, 2-bromo-2-nitropropane-1,3-diol, oxazolidines, sodium hydroxymethyl glycinate, 5-hydroxymethoxymethyl-1-aza-3,7-dioxabicyclo[3.3.0]octane, 5-hydroxymethyl-1-aza-3,7-dioxabicyclo[3.3.0]octane, 5-hydroxypoly[methyleneoxy]methyl-1-aza-3,7-dioxabicyclo[3.3.0]octane, quaternary adamantine, and mixtures thereof.
  • a quenching agent e.g. lysine, ethylene diamine, arginine, urea, adenine, guanine, cytosine, thymine, spermidine, or any combination thereof
  • a tube can include imidazolidinyl urea (or diazolidinyl urea), EDTA and glycine.
  • Suitable collection tubes can be found in WO2013/123030 and US2010/0184069.
  • Other useful collection tubes are available, including but not limited to various plastic tubes: the ‘Cell-Free DNA Collection Tube’ from Roche, made of PET; the ‘LBgard blood tube’ from Biomatrica, made from plastic and suitable for up to 8.5 mL of blood; and the ‘PAXgene Blood DNA tube’ from PreAnalytiX or Qiagen. These tubes are discussed in more detail in Kerachian et al. (2021) Clinical Epigenetics 13,193 and Grolz et al. (2016) Current Pathobiology Reports 6:275-86.
  • These various tubes can store up to 8.5 mL of blood, or sometimes up to 10 mL.
  • a blood sample taken from a subject may thus typically have a volume of between 5-10 mL.
  • A 10 mL blood sample typically yields between 10-500 ng cfDNA, but can sometimes yield substantially higher amounts e.g. up to around 10 µg, particularly in certain cancer patients. Methods disclosed herein can be performed on the amount of cfDNA contained in a 10 mL blood sample. Methods and compositions disclosed herein may typically use from 10-400 ng of cfDNA, for instance from 10-250 ng or from 10-200 ng.
  • Kits for purifying cfDNA from plasma (and other bodily fluids) are readily available e.g. the MagMAX cfDNA isolation kit from ThermoFisher, the Maxwell RSC ccfDNA plasma kit from Promega, the Apostle MiniMax high efficiency isolation kit from Beckman Coulter, or the QIAamp or EZ1 products from Qiagen.
  • Methods disclosed herein may therefore utilise cfDNA extracted from <10 mL blood from a subject. Methods may begin with cfDNA which has already been prepared, or may include an upstream step of preparing the cfDNA. Similarly, methods may include an upstream step of obtaining a plasma sample before a step of preparing cfDNA from the plasma sample.
  • The cfDNA utilised in methods disclosed herein is substantially free of single-stranded DNA (ssDNA) i.e. where less than 7% of the cfDNA molecules (by number) are single-stranded, and preferably less than 5% or less than 1% (i.e. such that at least 99% of the cfDNA molecules are double-stranded).
  • The cfDNA contains less than 0.1% ssDNA, less than 0.01% ssDNA, or may even contain no ssDNA (i.e. free of ssDNA). Extraction of cfDNA to obtain a cfDNA sample substantially free of ssDNA is described, for example, in WO2020/188561.
  • Ensuring low levels of ssDNA avoids potential inhibition of restriction digestion, and also avoids undesired amplification of ssDNA.
  • Kits are available for quantifying single-stranded DNA in a sample e.g. the Promega QuantiFluor™ kit.
  • all extracted cfDNA is used in the methods disclosed herein.
  • cfDNA is split into multiple fractions, and one or more fractions is not used in the methods disclosed herein but may instead be used in other analytical methods, or is kept for use in control experiments, or for other purposes.
  • In some embodiments, cfDNA is quantified prior to digestion. In other embodiments, cfDNA is not quantified prior to digestion.
  • Measuring a methylation level
  • Methods of the invention comprise measuring a methylation level of a marker in the sequence listing.
  • a ‘methylation level’ of a marker as used herein is a numerical value conveying information about the proportion or number of cfDNA molecules in a sample of cfDNA which were methylated and/or unmethylated at one or more CpG site(s) in the marker.
  • the invention can use any method suitable for measuring a methylation level. Methods encompassed by the invention include those that comprise analysis of DNA upstream and/or downstream of the markers given in the sequence listing, so long as the methylation level of at least one CpG site in a marker in the sequence listing is measured.
  • Preferred methods are those comprising cfDNA digestion using methylation sensitive and/or dependent restriction endonucleases (MSREs/MDREs) followed by downstream analytical steps which quantify the degree of digestion of the marker and/or of a CpG site in the marker.
  • Preferred downstream analyses are high-throughput sequencing (also known as next-generation sequencing or NGS) or real-time PCR (rtPCR, also known as quantitative PCR or qPCR).
  • a methylation level can be expressed as a percentage, a fraction, a normalised value, etc.
  • a methylation level of a marker may be expressed as a percentage, ratio or fraction representing the proportion of cfDNA molecules that are methylated at one or more CpG sites in the marker out of the total number of cfDNA molecules comprising the marker.
  • a methylation level of a marker may be expressed as a copy number of methylated or unmethylated cfDNA molecules comprising the marker. This may be expressed as a ‘HitspanN’ of a genomic position in the marker (explained in more detail below).
  • a methylation level may be expressed as the quantification cycle (Cq) for an amplicon comprising a marker locus.
  • the methylation level of a marker would again represent the number of cfDNA molecules comprising the marker which were methylated or unmethylated.
  • methylation levels can be determined according to how often the MSREs/MDREs used cleave and/or do not cleave at their recognition site during digestion. For example, where digestion used an MSRE, alignments which span a particular recognition site indicate molecules which were not cleaved, and so which (with complete digestion) were methylated at the CpG site within the recognition site. So, alignments which span a recognition site directly indicate methylation of the site when an MSRE was used for digestion (and conversely, indicate unmethylation when an MDRE was used).
  • alignments which start or terminate with the cleaved recognition site indicate molecules which were unmethylated at the site (and therefore cleaved during digestion). So, alignments which start or terminate with the cleaved recognition site directly indicate unmethylation of the site when an MSRE was used for digestion (and indicate methylation of the site when an MDRE was used).
  • a methylation level can be determined from alignments that directly indicate methylation and/or alignments that directly indicate unmethylation. Preferably, alignments that directly indicate methylation and alignments that directly indicate unmethylation are considered because this allows for greater accuracy.
  • methylation levels can be determined according to how often a nucleobase capable of being modified by the reagent(s) is modified by the reagent(s). For instance, in embodiments comprising sodium bisulfite treatment, a methylation level of a CpG can be determined from the number of reads wherein the site has the sequence ‘TG’ instead of ‘CG’.
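  • The bisulfite read-out described above can be sketched as follows. This is a minimal illustration that assumes the dinucleotide observed at a given CpG site has already been extracted from each read on the converted strand; the function name and the input values are hypothetical.

```python
def bisulfite_methylation_level(dinucleotide_calls):
    """Fraction of informative reads that retain 'CG' at a CpG site.

    After bisulfite conversion an unmethylated CpG is read as 'TG',
    while a methylated CpG is still read as 'CG'. Other calls
    (e.g. 'NG') are treated as uninformative and ignored.
    """
    calls = [call.upper() for call in dinucleotide_calls]
    methylated = calls.count("CG")
    unmethylated = calls.count("TG")
    informative = methylated + unmethylated
    if informative == 0:
        return None  # no usable reads at this site
    return methylated / informative


# Hypothetical calls at one CpG site: 12 'CG', 36 'TG', 2 ambiguous.
site_calls = ["CG"] * 12 + ["TG"] * 36 + ["NG"] * 2
level = bisulfite_methylation_level(site_calls)
print(level)
```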
  • the HitspanN of a genomic position corresponds to the number of reads or alignments with a size of ‘N’ nucleotides centred on the position (where N is a positive even integer).
  • The Hitspan100 of a genomic position refers to the number of reads or sequence alignments with a size of at least 100 nucleotides centred on the position. So, a Hitspan100 of 90 at a specific position means that there are 90 sequence reads or alignments with a size of at least 100 nucleotides centred on the position.
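  • One possible reading of the HitspanN definition can be sketched in code. The sketch below assumes that an alignment counts towards HitspanN if it covers the whole window of N nucleotides centred on the position, with alignments given as 0-based, half-open (start, end) coordinates. That interpretation, the function name and the coordinates are assumptions made for illustration only.

```python
def hitspan(alignments, position, n):
    """Count alignments that cover the n-nucleotide window centred on position.

    alignments: iterable of (start, end) tuples, 0-based and half-open.
    n: positive even integer, as required by the HitspanN definition.
    """
    if n <= 0 or n % 2 != 0:
        raise ValueError("n must be a positive even integer")
    half = n // 2
    window_start = position - half
    window_end = position + half
    return sum(
        1
        for start, end in alignments
        if start <= window_start and end >= window_end
    )


# Hypothetical alignments around genomic position 1000.
example_alignments = [
    (900, 1100),
    (940, 1060),
    (950, 1050),
    (960, 1120),
    (1000, 1170),
]
print(hitspan(example_alignments, 1000, 100))
print(hitspan(example_alignments, 1000, 60))
```

  • A shorter window is covered by more alignments, so for the same position the count can only stay equal or grow as N decreases.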
  • a methylation level may be normalised with respect to a reference locus and/or a reference DNA sample.
  • the methylation level is a methylation ratio between a marker locus and a reference locus (which may be in the cfDNA being analysed or in a reference DNA sample), expressed as a ratio between signals obtained for these loci in downstream analysis following restriction digestion, methylation-conditional nucleobase modification, PCR amplification, etc.
  • The methylation level of a marker can be calculated by dividing the HitspanN of a position in the marker by an expected HitspanN of the position (e.g., the HitspanN which would be expected if the marker was fully methylated, and thus uncleaved by an MSRE).
  • the expected HitspanN may be determined using, for instance: (i) the HitspanN of a position in a reference locus that is not cut by the restriction enzyme; (ii) the average HitspanN of positions in a plurality of such reference loci; or (iii) the HitspanN of a position in a reference locus in an undigested reference sample (e.g.
  • methylation level may be inferred by comparing the HitspanN in a digested sample to the HitspanN in a reference locus in an undigested sample.
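  • The normalisation described above (observed HitspanN divided by an expected HitspanN) can be sketched as follows. Here the expected value is taken as the average HitspanN of positions in a plurality of reference loci that are not cut by the restriction enzyme, which is one of the options listed above. The function name and the counts are hypothetical.

```python
from statistics import mean


def normalised_methylation_level(marker_hitspan, reference_hitspans):
    """Observed HitspanN of a marker position divided by the expected HitspanN.

    The expected HitspanN is estimated as the mean HitspanN of positions in
    reference loci that lack a recognition site for the enzyme(s) used.
    """
    expected = mean(reference_hitspans)
    if expected <= 0:
        raise ValueError("expected HitspanN must be positive")
    return marker_hitspan / expected


# Hypothetical counts: marker position vs. three uncut reference loci.
level = normalised_methylation_level(90, [300, 280, 320])
print(level)
```

  • With an MSRE, a value close to 1 in this sketch would suggest that most molecules were uncleaved (methylated) at the marker, while a value close to 0 would suggest that most were cleaved (unmethylated).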
  • The non-methylated CpG sites can be taken as sequencing reads whose 5' ends map to a site, as sequencing reads whose 3' ends map to a site, or as half of the sum of sequencing reads whose 5' ends or 3' ends map to a site.
  • Some sequencing library preparation methods can result in depletion of small fragments, which are then not sequenced (e.g., in CpG islands, where a starting cfDNA molecule is cleaved by a MSRE at more than one unmethylated site, thus providing 3 or more restriction fragments, some of which are very small). In such cases the observed number of unmethylated CpG sites may be lower than the true value in the original sample. This distortion can be somewhat addressed by using the larger of the number of reads or alignments whose 3' ends map to a site and the number of reads or alignments whose 5' ends map to a site (or by using the mean).
  • the reference locus may be a different locus compared to the marker locus.
  • the reference locus and the marker locus may be present in the cfDNA from samples from the one or more first subjects.
  • the reference locus may be in DNA from a sample other than those from the one or more first subjects and one or more second subjects, such as an artificial sample comprising a locus with a known methylation level.
  • the reference locus may be the same locus as the marker locus, with the reference locus and marker locus in different samples.
  • the marker locus may be present in cfDNA from samples from the one or more first subjects, and the reference locus may be in cfDNA from samples from the one or more second subjects.
  • The marker locus may be present in cfDNA from samples from one of a plurality of first subjects, and the reference locus may be the same locus in cfDNA from samples from another one of the plurality of first subjects - for example, in first subjects that have different disease classifications.
  • Methylation level may also be determined without use of a reference locus.
  • the expected read count for a marker locus may be determined as the sum of the read count for the marker locus (indicating methylation, where an MSRE is used) with the read count of loci that start or end at the marker locus (indicating non-methylation), taking account where necessary of any end repair which took place during library preparation. Therefore, a methylation level may be determined without reference to other loci or other samples, based on the ‘raw’ or ‘absolute’ level of methylation at the marker locus.
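  • The reference-free calculation described above can be sketched as follows, assuming MSRE digestion (so that spanning alignments indicate methylation and alignments ending at the site indicate non-methylation) and ignoring any end-repair correction. The function name, the choice of how to combine 5' and 3' end counts, and the example counts are hypothetical.

```python
def raw_methylation_level(spanning, ends_5p, ends_3p, combine=max):
    """Methylation level at one MSRE site without any reference locus.

    spanning: alignments spanning the recognition site (uncleaved, methylated).
    ends_5p / ends_3p: alignments whose 5' or 3' end maps to the cleaved site
    (unmethylated). combine chooses how the two end counts are merged;
    max compensates for depletion of small fragments, while an averaging
    function can be passed instead.
    """
    unmethylated = combine(ends_5p, ends_3p)
    total = spanning + unmethylated
    if total == 0:
        return None  # no coverage at this site
    return spanning / total


def average(a, b):
    return (a + b) / 2


# Hypothetical counts at a single recognition site.
level_max = raw_methylation_level(30, 60, 70)
level_mean = raw_methylation_level(30, 60, 70, combine=average)
print(level_max, level_mean)
```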
  • Methods of the invention may comprise a methylation-conditional nucleobase modification step in which chemical changes are made to nucleobases within DNA based on their methylation status. Such chemical changes can be detected in downstream analytical steps.
  • a suitable downstream step is high-throughput sequencing.
  • the methylation-conditional nucleobase modification step is bisulfite conversion.
  • DNA is treated with sodium bisulfite to convert unmethylated cytosine to uracil. The differences in sequence between treated and untreated DNA permits methylation to be detected.
  • Methods of the invention may comprise bisulfite conversion (including as part of an upstream step when preparing the DNA) followed by downstream analytical steps which can distinguish uracil from cytosine in the markers of the invention.
  • the methylation-conditional nucleobase modification step is ten-eleven translocation (TET)-assisted pyridine borane sequencing (TAPS).
  • TAPS refers to a nucleobase modification technique and does not include any particular methodology for reading the sequence of treated DNA.
  • methylated cytosine is converted to dihydrouracil, which is recognised as thymine.
  • Methods of the invention may comprise TAPS (including as part of an upstream step when preparing the DNA) followed by downstream analytical steps which can distinguish dihydrouracil from cytosine in the markers of the invention.
  • Measuring a methylation level comprises the methodology described in Füllgrabe et al. (2023) Nature Biotechnol. (https://doi.org/10.1038/s41587-022-01652-0; see also WO2022/023753) in which methylation-conditional nucleobase modification is combined with particular downstream DNA sequencing steps.
  • preferred methods do not include a step of methylation-conditional nucleobase modification (also called nucleobase conversion).
  • Instead, preferred methods disclosed herein use restriction enzymes which recognise specific sequences in double-stranded DNA and introduce a double-stranded break into the DNA. More specifically, methods and compositions disclosed herein may use MSREs and/or MDREs.
  • MSREs and MDREs recognise specific sequences in double-stranded target DNA and introduce a double-stranded break into the target DNA.
  • a MSRE cleaves the target DNA only if a CpG associated with its recognition site is unmethylated, and methylation inhibits the cleavage.
  • a MDRE cleaves the target DNA only if a CpG associated with its recognition site is methylated.
  • DNA digestion with MSREs and/or MDREs provides information about the methylation status of the CpGs within recognition sites present in the target DNA.
  • Type II restriction endonucleases i.e., enzymes where the double-stranded break is introduced within the recognition site, are particularly useful in the invention. The use of multiple restriction enzymes permits simultaneous digestion in parallel within a sample.
  • The recognition sites of MSREs and MDREs are also called ‘restriction sites’. Many MSREs and MDREs, with different restriction sites, are commercially available, so a broad coverage of CpG sites across a genome can be obtained using the appropriate combination of MSREs and/or MDREs. Because broad genomic coverage can be obtained, the use of MSREs and/or MDREs in the invention is particularly preferred, with the use of MSREs being most preferable.
  • cfDNA from the sample is digested with MSRE(s). In some embodiments, cfDNA from the sample is digested with MDRE(s). In some embodiments, cfDNA from the sample is digested with MSRE(s) and MDRE(s). Use of MSRE(s) without any MDRE is preferred, and use of a combination of two or more MSREs is preferred, as discussed below. In embodiments involving DNA digestion, enzymes and DNA are typically incubated for a long enough period for substantially complete digestion to occur i.e., further incubation does not lead to any measurable increase in DNA cleavage.
  • Longer digestion can be performed if desired e.g., 3 hours, 4 hours, or longer (e.g., overnight). In some embodiments, digestion is performed for 11 hours or less e.g., for between 2-10 hours, 2-9 hours, 2-8 hours, or 2-4 hours. In other embodiments (e.g., where a collection tube is used, as discussed herein) digestion may be performed for longer periods e.g., for 12 hours or more.
  • Allowing a digestion reaction to substantially proceed to completion provides information about the cleavability of the restriction site of the restriction endonuclease(s) used in the reaction. For example, if a particular restriction site in a particular DNA molecule is not cleaved after complete digestion, then it can be inferred that the locus in that molecule was not cleavable. Lack of cleavage of a MSRE restriction site thus indicates that a CpG sequence which is within or overlaps with that MSRE recognition sequence was methylated, while cleavage indicates that it was unmethylated.
  • restriction enzymes can be inactivated by heating (e.g. to 65°C or 85°C) e.g., by immersing the reaction mixture in a water bath, or by subjecting the mixture to a raised temperature within a thermal cycler which can be used for subsequent PCR.
  • Digestion reaction mixtures with cfDNA tend to have a low volume such that the temperature of the whole reaction mixture reaches the elevated temperature very quickly, leading to inactivation of the enzymes. In some embodiments heating at this temperature occurs for longer than 15 minutes, and ideally occurs for at least 20 minutes e.g., for 20-60 minutes.
  • the temperature can exceed the temperature required for inactivation if desired, but this is not required. This heating step is adequate for complete inactivation of the restriction enzymes i.e., such that the enzymes’ digestion activity toward cleavable target cfDNA molecules under the digestion conditions employed prior to heating can no longer be measurably detected.
  • Preferred methods do not use restriction enzyme isoschizomers, where one of the enzymes recognizes both the methylated and unmethylated forms of the restriction site while the other recognizes only one of these forms.
  • Preferred methods do not use a mixture of restriction enzymes in which at least one enzyme has a recognition sequence which includes a CpG but which is neither a MSRE nor a MDRE i.e., an enzyme which digests regardless of the CpG methylation status.
  • MSREs and MDREs are readily available from well-known commercial suppliers, such as ThermoFisher, New England Biolabs, Promega, etc.
  • MSREs include, but are not limited to: AatII, AccII, AciI, AclI, AfeI, AgeI, Aor13HI, Aor51HI, AscI, AsiSI, AvaI, BceAI, BmgBI, BsaAI, BsaHI, BsiEI, BsiWI, BsmBI, BspDI, BspT104I, BssHII, BstBI, BstUI, Cfr10I, ClaI, CpoI, DpnII, EagI, Eco52I, FauI, FseI, FspI, HaeII, HapII, HgaI, HhaI, HinP1I, HpaII, Hpy99I, HpyCH4IV, KasI, MluI, MspI, NaeI, NarI, NgoMIV, NotI, NruI, NsbI, PaeR7I, PluTI, PmaCI, PmlI, Psp1406I,
  • MDREs include, but are not limited to: BspEI, BtgZI, FspEI, GlaI, LpnPI, McrBC, MspJI, XhoI, XmaI.
  • Two preferred MSREs are HinP1I and AciI.
  • the invention also provides for the use of a plurality of restriction endonucleases, wherein the plurality consists of MSRE and/or MDRE.
  • the plurality may include only MSREs, only MDREs, or a mixture of both (e.g. one or more MSRE plus one or more MDRE).
  • It is preferred to work with MSREs, without needing MDREs, and thus the plurality includes two or more MSREs.
  • The use of MSREs leads to digested cfDNA in which methylated CpG sites are intact but unmethylated CpG sites are digested.
  • A preferred plurality of MSREs includes both HinP1I and AciI.
  • The markers in the sequence listing include a restriction site for HinP1I and/or AciI. This pairing of enzymes covers over 99% of CpG islands in the human genome. With this MSRE pairing it is preferred to include HinP1I at an excess (measured in terms of enzymatic units) to AciI, and ideally an excess of at least 1.2:1 e.g. at least 1.5:1, at least 1.75:1, at least 2:1, at least 3:1, at least 4:1, or at least 5:1.
  • Ratios between 2:1 and 5:1 are particularly useful with human cfDNA, and an excess of about 4.5:1 is preferred.
  • Digestion can be performed at about 37°C, until completion. Incubation at 37°C for 2 hours is typically adequate for complete digestion with HinP1I and AciI.
  • HinP1I (sometimes known as Hin6I) recognises the sequence GCGC and cleaves after the first G to leave a two nucleotide 5' overhang (5'-G/CGC). It cuts well at 37°C and can be heat-inactivated by heating at 65°C for 20 minutes.
  • NEB recommends the use of its rCutSmart™ buffer (50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 µg/mL recombinant albumin, pH 7.9).
  • 1 unit of HinP1I is defined as the amount of enzyme required to digest 1 µg of λ DNA in 1 hour at 37°C in a total reaction volume of 50 µL.
  • AciI recognises the sequence CCGC and cleaves after the first C to leave a two nucleotide 5' overhang (5'-C/CGC). It cuts well at 37°C and can be heat-inactivated by heating at 65°C for 20 minutes.
  • NEB recommends the use of its rCutSmart™ buffer (50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 µg/mL recombinant albumin, pH 7.9).
  • 1 unit of AciI is defined as the amount of enzyme required to digest 1 µg of λ DNA in 1 hour at 37°C in a total reaction volume of 50 µL. Its recognition site is non-palindromic.
  • λ DNA is a commonly used DNA substrate extracted from bacteriophage lambda (cI857ind 1 Sam 7), being 48,502 bp long. It is usually stored in 10 mM Tris-HCl (pH 8.0), 1 mM EDTA, and is widely available from commercial suppliers e.g., from NEB under catalogue number N3011S.
  • HinP1I and AciI can both be inactivated by heating at 65°C. In some embodiments heating at this temperature occurs for longer than 15 minutes, and ideally occurs for at least 20 minutes e.g., for 20-60 minutes. The temperature can exceed 65°C if desired, but this is not required. This heating step is adequate for complete inactivation of the restriction enzymes i.e., such that the enzymes’ digestion activity which was present during cfDNA digestion can no longer be measurably detected even when cleavable target molecules are present.
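  • The recognition sequences given above (GCGC and CCGC) can be located in a marker sequence with a short script. The sketch below is illustrative only: because the CCGC site is non-palindromic, it also appears as GCGG when read on the opposite strand, so both orientations are searched. The function name and the example sequence are hypothetical and are not taken from the sequence listing.

```python
import re

# Recognition sequences as read 5'->3' on the top strand. The GCGC site is
# palindromic; the CCGC site reads GCGG when it lies on the bottom strand.
RECOGNITION_SITES = {
    "HinP1I": ("GCGC",),
    "AciI": ("CCGC", "GCGG"),
}


def find_restriction_sites(sequence):
    """Return sorted (position, enzyme, motif) tuples, 0-based positions."""
    seq = sequence.upper()
    hits = []
    for enzyme, motifs in RECOGNITION_SITES.items():
        for motif in motifs:
            # A lookahead pattern reports overlapping occurrences as well.
            for match in re.finditer(f"(?={motif})", seq):
                hits.append((match.start(), enzyme, motif))
    return sorted(hits)


# Hypothetical marker fragment (not a real marker sequence).
example_sequence = "ATGCGCTTCCGCAAGCGGTA"
sites = find_restriction_sites(example_sequence)
print(sites)
```

  • Each reported site contains a CpG whose methylation status would determine whether an MSRE cleaves at that position.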
  • Because the marker loci disclosed herein contain differentially methylated CpG sites located within recognition site(s) of at least one MSRE and/or MDRE, differences in methylation levels between DNA sources result in differences in the degree of digestion, and subsequently different amplification patterns in subsequent amplification and quantification steps. Such differences enable distinguishing between DNA from different sources, for example, between DNA samples from subjects with lung cancer and DNA samples from healthy subjects.
  • methods disclosed herein may include a step of amplification (e.g., PCR) performed on the digested cfDNA.
  • this amplification will be targeted to the marker(s) of interest.
  • upstream and downstream primers are used which flank a CpG site of interest in the marker, and the intervening CpG-containing sequence will be amplified if it has not been digested by restriction enzymes.
  • the resulting amplicons can then be detected e.g., using a labelled probe which is complementary to a sub-sequence within the amplicons of interest.
  • Methods may therefore include a step of adding PCR reagents after digestion e.g., suitable buffer/salt components (if required in addition to buffer/salt remaining from digestion), a DNA polymerase (such as a Taq polymerase), dNTPs, primers and (optionally) probes.
  • one or more of these components may be present during digestion e.g., it is possible to use a hot start PCR protocol, such that PCR reagents are already present during the digestion step but they do not become active until the reaction mixture is heated (e.g. during heat inactivation of the restriction enzymes).
  • Restriction digestion typically takes place in the presence of high levels of Mg++.
  • PCR usually relies on Mg++, so standard PCR buffers include Mg++. In this situation, however, addition of a standard PCR buffer can lead to an excess of Mg++ which can inhibit efficiency of amplification. Thus added PCR reagents may include a lower level of Mg++ than would normally be the case.
  • Where PCR primers and probes are present during MSRE digestion, they should be designed so that their sequences do not include the recognition site for the MSRE(s) which is/are being used.
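  • The design constraint above can be checked automatically. The following minimal sketch flags any primer or probe whose sequence contains a recognition site, in either orientation, for the MSREs in use. It assumes the GCGC and CCGC sites discussed herein as defaults; the function names and oligonucleotide sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Reverse complement of an unambiguous DNA sequence."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def contains_recognition_site(oligo, motifs=("GCGC", "CCGC")):
    """True if the oligo contains any motif or its reverse complement."""
    seq = oligo.upper()
    for motif in motifs:
        if motif in seq or reverse_complement(motif) in seq:
            return True
    return False


# Hypothetical oligonucleotides (illustrative only).
candidates = {
    "primer_ok": "ATTGCATGCAATTCAGGT",
    "primer_gcgc": "ATTGCGCAATTCAGGTCA",
    "probe_gcgg": "TTAAGCGGTTACCATGAC",
}
flagged = sorted(
    name for name, seq in candidates.items() if contains_recognition_site(seq)
)
print(flagged)
```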
  • Amplification and detection of amplicons may be carried out by conventional PCR using fluorescently-labeled primers followed by capillary electrophoresis of amplification products.
  • the amplification products are separated by capillary electrophoresis and fluorescent signals are quantified.
  • An electropherogram plotting the change in fluorescent signals as a function of size (bp) or time from injection may be generated, wherein each peak in the electropherogram corresponds to the amplification product of a single locus.
  • the peak's height (provided for example using ‘relative fluorescent units’, rFU) may represent the intensity of the signal from the amplified locus.
  • Computer software may be used to detect peaks and calculate the fluorescence intensities (peak heights) of a set of loci whose amplification products were run on the capillary electrophoresis machine, and subsequently the ratios between the signal intensities.
  • a preferred PCR technique is real-time PCR (also known as qPCR), in which simultaneous amplification and detection of the amplification products are performed.
  • Real-time PCR can be used with non-specific detection or sequence-specific detection.
  • Non-specific detection (e.g., using a dsDNA-binding dye, such as SYBR Green) can be used within the methods disclosed herein, but is not ideal if it is desired to distinguish between multiple different amplicons in the same reaction.
  • For sequence-specific detection, methods and compositions may use a labelled oligonucleotide probe (usually with a fluorophore and fluorescence quencher on the same probe, as in the TaqMan system) which is complementary to a specific sequence within nucleic acid amplicon(s) of interest.
  • Different probes for amplicons derived from different target CpGs can be labelled with different fluorophores so that multiple different amplicons can be distinguished.
  • Real-time PCR may thus be achieved by using a hydrolysis probe based on combined reporter and quencher molecules.
  • oligonucleotide probes have a fluorescent moiety (fluorophore) attached to their 5' end and a quencher attached to the 3' end.
  • the polynucleotide probes selectively hybridize to their target sequences on the template, and as the polymerase replicates the template it also cleaves the polynucleotide probes due to the polymerase’s 5'-nuclease activity.
  • When the polynucleotide probes are intact, the close proximity between the quencher and the fluorescent moiety normally results in a low level of background fluorescence.
  • When the polynucleotide probes are cleaved, the quencher is decoupled from the fluorescent moiety, resulting in an increase of intensity of fluorescence.
  • the fluorescent signal correlates with the amount of amplification products, i.e., the signal increases as the amplification products accumulate.
  • Suitable fluorophores include, but are not limited to, fluorescein, FAM, lissamine, phycoerythrin, rhodamine, Cy2, Cy3, Cy3.5, Cy5, Cy5.5, Cy7, FluorX, JOE, HEX, NED, VIC and ROX.
  • Suitable fluorophore/quencher pairs are known in the art, including but not limited to: FAM-TAMRA, FAM-BHQ1, Yakima Yellow-BHQ1, ATTO550-BHQ2 and ROX-BHQ2. Fluorescence may be monitored during each PCR cycle, providing an amplification plot showing the change of fluorescent signals from the probe(s) as a function of cycle number. In the context of real-time PCR, the following terminology is used:
  • ‘Quantification cycle’ refers to the cycle number in which fluorescence increases above a threshold, set automatically by software or manually by the user.
  • the threshold may be constant for each CpG locus of interest and may be set in advance, prior to carrying out the amplification and detection. In other embodiments, the threshold may be defined separately for each CpG locus after the run, based on the maximum fluorescence level detected for this locus during the amplification cycles.
  • ‘Threshold’ refers to a value of fluorescence used for Cq determination.
  • The threshold value may be a value above baseline fluorescence, and/or above background noise, and within the exponential growth phase of the amplification plot.
  • ‘Baseline’ refers to the initial cycles of PCR where there is little to no change in fluorescence.
  • Computer software is readily available for analysing amplification plots and determining baseline, threshold and Cq.
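  • The relationship between baseline, threshold and Cq can be illustrated with a simplified sketch. It assumes that the baseline is estimated as the mean fluorescence over a fixed number of initial cycles and that Cq is the first cycle whose baseline-corrected fluorescence exceeds a preset threshold. Real instrument software typically uses more elaborate fitting; the function name, the parameters and the fluorescence values here are hypothetical.

```python
from statistics import mean


def quantification_cycle(fluorescence, threshold, baseline_cycles=5):
    """First cycle (1-based) where baseline-corrected fluorescence exceeds threshold.

    fluorescence: one reading per PCR cycle, in cycle order.
    Returns None if the threshold is never crossed.
    """
    baseline = mean(fluorescence[:baseline_cycles])
    for cycle, value in enumerate(fluorescence, start=1):
        if value - baseline > threshold:
            return cycle
    return None


# Hypothetical amplification curve for one amplicon.
curve = [1.0, 1.0, 1.0, 1.0, 1.0, 1.1, 1.3, 1.8, 2.8, 4.8, 8.8, 16.8]
cq = quantification_cycle(curve, threshold=1.5)
print(cq)
```

  • With MSRE digestion, a marker that is largely unmethylated leaves fewer intact templates, so its amplicon would be expected to cross the threshold at a later cycle than that of a largely methylated marker.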
  • Primers may vary in length, depending on the particular assay format and the particular needs.
  • the primers may be at least 15 nucleotides long, such as between 15- 25 nucleotides or 18-25 nucleotides long.
  • the primers may be adapted to be suited to a chosen amplification system.
  • Primers may be designed to generate amplicons between 60-150 bp long (when the relevant CpG site(s) is/are intact) e.g. between 70-140 bp long.
  • Oligonucleotide probes may vary in length. In some embodiments, the probes may include between 15-30 nucleotides, from 20-30 nucleotides, or from 25-30 nucleotides.
  • The oligonucleotide probes may be designed to bind to either strand of the double-stranded amplicons. Additional considerations include the melting temperature of the probes, which should preferably be comparable to that of the primers. Where multiple CpG sites are analysed in parallel, with simultaneous amplification of more than one target in the same reaction mixture (co-amplification) using different primer pairs for each CpG site of interest, these different primers may be designed such that they can work at the same annealing temperature during amplification. Thus, primers with similar melting temperature (Tm) can be designed e.g. within ±3-5°C of each other. Similar considerations apply where multiple probes are used.
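  • A rough check of melting-temperature compatibility can be sketched as follows. The sketch uses the Wallace rule (Tm = 2°C per A/T plus 4°C per G/C), which is only a coarse approximation for short oligonucleotides and is used here purely for illustration; it is not presented as the method used in the disclosure. The function names, the 5°C window and the sequences are hypothetical.

```python
def wallace_tm(oligo):
    """Approximate Tm in degrees C using the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def tm_compatible(oligos, max_spread=5):
    """True if all approximate Tm values lie within max_spread degrees C."""
    tms = [wallace_tm(oligo) for oligo in oligos]
    return max(tms) - min(tms) <= max_spread


# Hypothetical 18-mer primers for co-amplification (illustrative only).
primer_set = ["AGCTTGACCTGAGTCAAT", "TGCAAGTCCATGACTTGA"]
print([wallace_tm(p) for p in primer_set], tm_compatible(primer_set))
```

  • In practice a nearest-neighbour Tm model that accounts for salt and oligonucleotide concentration would normally be preferred over this approximation.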
  • Tm: melting temperature
  • Methods disclosed herein may include a step of DNA sequencing, such as a step using next-generation sequencing (‘NGS’) techniques (also known as high-throughput sequencing).
  • NGS generally involves three basic steps: library preparation; sequencing; and data processing.
  • Examples of NGS techniques include sequencing-by-synthesis and sequencing-by-ligation (employed, for example, by Illumina Inc., Life Technologies Inc., PacBio, and Roche), nanopore sequencing methods and electronic detection-based methods such as Ion Torrent™ technology (Life Technologies Inc.).
  • NGS may be performed using various high-throughput sequencing instruments and platforms, including but not limited to: NovaSeq™, NextSeq™ and MiSeq™ (Illumina), 454 Sequencing (Roche), Ion Chef™ (ThermoFisher), SOLiD® (ThermoFisher) and Sequel II™ (Pacific Biosciences).
  • Appropriate platform-designed sequencing adapters are used for preparing the sequencing library, and are readily available from the platforms’ manufacturers.
  • Sequencing adapters typically include platform-specific sequences for fragment recognition by a particular sequencer e.g. sequences that enable ligated molecules to bind to the flow cells of Illumina platforms (e.g. the P5 and P7 sequences). Each sequencing instrument provider typically sells a specific set of sequences for this purpose. Further details of library preparation are discussed below.
  • Sequencing adapters can include sites for binding to a universal set of PCR primers. This permits multiple adapter-ligated DNA molecules to be amplified in parallel by PCR, using a single set of primers.
  • Sequencing adapters can include sample indices, which are sequences that enable multiple samples to be combined, and then sequenced together (i.e. multiplexed) on the same instrument flow cell or chip. Each sample index, typically 6-10 nucleotides, is specific to a given sample and is used for de-multiplexing during downstream data analysis to assign individual sequence reads to the correct sample. Sequencing adapters may contain single or dual sample indexes depending on the number of libraries combined and the level of accuracy desired. Sequencing adapters can include unique molecular identifiers (UMIs) to provide molecular tracking, error correction and increased accuracy during sequencing. UMIs are short sequences, typically 5 to 20 bases in length, used to uniquely identify original molecules in a sample library. As each nucleic acid in the starting material is tagged to provide a unique molecular barcode, bioinformatics software can filter out duplicate reads and PCR errors with a high level of accuracy and report unique reads, removing the identified errors before final data analysis.
  • sequencing adapters include both a sample barcode sequence and a UMI.
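The roles of sample indices and UMIs in downstream analysis can be sketched as follows: reads are assigned to samples by their index, and reads sharing the same UMI and mapping position are collapsed to one original molecule. The read representation, the function name and the exact-match rule are assumptions for illustration; real pipelines usually tolerate sequencing errors in the index and UMI.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical read record: (sample_index, umi, chromosome, start)
Read = Tuple[str, str, str, int]


def count_unique_molecules(
    reads: Iterable[Read], index_to_sample: Dict[str, str]
) -> Dict[str, int]:
    """De-multiplex reads by sample index and count unique molecules per sample.

    Reads whose index is not in index_to_sample are skipped. Two reads are
    treated as duplicates of one original molecule when they share the same
    UMI, chromosome and start position (exact matching only).
    """
    seen: Dict[str, Set[Tuple[str, str, int]]] = defaultdict(set)
    for sample_index, umi, chrom, start in reads:
        sample = index_to_sample.get(sample_index)
        if sample is None:
            continue  # unknown index: cannot be assigned to a sample
        seen[sample].add((umi, chrom, start))
    return {sample: len(molecules) for sample, molecules in seen.items()}
```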
  • sequencing adapters allow for paired-end sequencing.
  • compositions and methods disclosed herein use Y-shaped sequencing adapters i.e., adapters consisting of two single-stranded oligonucleotides which anneal to provide a double-stranded stem and two single-stranded ‘arms’.
  • compositions and methods disclosed herein use hairpin sequencing adapters i.e., a single-stranded oligonucleotide whose 5' and 3' termini anneal to provide a double-stranded stem.
  • the double-stranded stem can include a short single-stranded overhang e.g., a single A or T nucleotide.
  • the double-stranded stem can be ligated to a cfDNA fragment, to prepare a sequencing library.
  • Suitable sequencing adapters for use in the compositions and methods disclosed herein may thus be TruSeq™ or AmpliSeq™ or TruSight™ adapters (for use on the Illumina platform) or SMRTbell™ adapters (for use on the PacBio platform).
  • where sequencing adapters are added by ligation, this usually occurs at both ends of the DNA to be sequenced.
  • Restriction digestion can leave blunt-ends, but typically produces a single-stranded overhang.
  • Library preparation steps can either preserve this overhang (i.e., add complementary nucleotides) or remove it.
  • if the sequence of a post-digestion terminal single-stranded overhang can include useful information, then it is preferred to add sequencing adapters in a way which preserves the overhang e.g. using enzymatic ligation in which a ligase enzyme covalently links a sequencing adapter to a DNA fragment where the terminal sequence of the adapter is complementary to the terminal sequence obtained using the restriction enzyme, or by using a polymerase to add complementary nucleotides and generate a blunt-ended fragment.
  • end repair methods carried out before adapter ligation can ensure that DNA molecules contain 5' phosphate and 3' hydroxyl groups.
  • dAMP: deoxyadenosine 5'-monophosphate
  • the chelating agent can be added to provide an amplification reaction mix comprising the chelating agent and a divalent cation at a molar ratio of between 1:20 and 2:1.
  • the reaction mix may include 8-20 mM Mg²⁺ e.g., about 10 mM magnesium.
  • amplification may be carried out in a reaction mix comprising between 3-4 mM chelating agent and 4 mM Mg²⁺.
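The molar-ratio range above can be checked with simple arithmetic, sketched here. The function name is hypothetical, and the bounds are treated as inclusive on the assumption that 'between' includes both end values.

```python
from fractions import Fraction


def chelator_ratio_in_range(chelator_mm: float, cation_mm: float) -> bool:
    """True if chelating agent : divalent cation (both in mM) lies between 1:20 and 2:1.

    Bounds are inclusive. Concentrations are converted via str() so that
    decimal inputs such as 3.5 are represented exactly.
    """
    ratio = Fraction(str(chelator_mm)) / Fraction(str(cation_mm))
    return Fraction(1, 20) <= ratio <= Fraction(2, 1)
```

For example, 3 mM chelating agent with 4 mM magnesium gives a ratio of 3:4, which lies inside the range.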
  • the chelating agent may comprise one or both of EDTA and EGTA.
  • the prepared DNA molecules can be sequenced, to provide a plurality of ‘sequence reads’.
  • Sequence reads from DNA sequencing are then subjected to data processing e.g., to remove sequences which do not fulfil desired quality criteria, to remove duplicates, to correct sequencing errors, to map sequences onto a reference genome, to count the number of sequence reads, etc.
  • Computer software is readily available for performing these steps.
  • the sequencing may be single-read sequencing or paired-end sequencing. Paired-end sequencing is preferred. In single-read sequencing, individual DNA strands are sequenced from one end. In paired-end sequencing, individual DNA strands are sequenced from both ends of the strand. Paired-end sequencing produces paired-end reads, wherein a single paired-end read contains a forward read derived from one end of the DNA strand and a reverse read derived from the other end of the DNA strand. The forward and reverse reads may or may not overlap.
  • Sequence reads can be mapped to a reference genome i.e., a previously identified genome sequence, whether partial or complete, assembled as a representative example of a species or subject.
  • a reference genome is typically haploid, and typically does not represent the genome of a single individual of the species but rather is a mosaic of the genomes of several individuals.
  • a reference genome for the methods of the present invention is typically a human reference genome e.g., a complete human genome, such as the human genome assemblies available at the website of the National Center for Biotechnology Information or at the University of California, Santa Cruz, Genome Browser.
  • An example of a suitable reference genome for human studies is the GRCh38 major assembly (up to patch p13).
  • Mapping aligns sequence reads to the reference genome, to identify the location of the reads within the reference genome.
  • the sequence reads that align are designated as being ‘mapped’.
  • the alignment process aims to maximize the possibility for obtaining regions of sequence identity across the various sequences in the alignment, allowing mismatches, indels and/or clipping of some short fragments on the two ends of the reads.
  • the number of sequence reads mapped to a certain genomic locus is referred to as the ‘read count’ or ‘copy number’ of this genomic locus. It is not necessary to map all sequence reads which are obtained; indeed, it is not unusual that a portion of sequence reads obtained in any given experiment will not be mappable.
  • the forward and reverse reads of a paired-end read will map upstream and downstream of a locus but not overlap.
  • the 5' and 3' ends of the DNA molecule that gave rise to the paired-end read have been directly sequenced, but the sequence of the intervening DNA can be indirectly sequenced (as the genomic sequence between the mapped regions). So, a ‘sequence alignment’, or simply ‘alignment’ may contain both direct and indirect sequence information.
  • the analysis of sequencing data is preferably based on sequence alignments. In embodiments comprising methylation-conditional nucleobase modification, only direct sequence information can be used.
  • alignments used in the analysis are less than about 600 bp and more than about 50 bp in length. More preferably, alignments are less than about 500 bp and more than about 100 bp in length, or less than about 400 bp and more than about 100 bp in length.
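A minimal sketch of the length filter follows, assuming an alignment is represented by the outermost mapped coordinates of a read pair (0-based, half-open). The class, the helper names and the coordinate convention are assumptions for this example; the default bounds are the broadest range given above.

```python
from typing import NamedTuple


class PairedAlignment(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive: leftmost mapped base of the pair
    end: int    # 0-based, exclusive: one past the rightmost mapped base

    @property
    def length(self) -> int:
        """Span covered by the pair, including any unsequenced middle part."""
        return self.end - self.start


def from_mates(chrom: str, fwd_start: int, fwd_end: int,
               rev_start: int, rev_end: int) -> PairedAlignment:
    """Build one alignment from the mapped intervals of the two mates."""
    return PairedAlignment(chrom, min(fwd_start, rev_start), max(fwd_end, rev_end))


def keep_alignment(aln: PairedAlignment, min_bp: int = 50, max_bp: int = 600) -> bool:
    """Keep alignments longer than min_bp and shorter than max_bp."""
    return min_bp < aln.length < max_bp
```

Tighter windows, such as 100 to 400 bp, would be obtained by passing different `min_bp` and `max_bp` values.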
  • Another way of expressing coverage that is useful in the analysis methods is to use ‘HitspanN’, such as ‘Hitspan100’.
  • Any particular CpG site can feature in multiple sequence reads, which can be sequence reads derived from the same original cfDNA molecule and/or from different cfDNA molecules which span the same CpG site. Sequencing is suitably performed such that CpG site(s) of interest is/are seen in at least 100 sequence reads e.g., in at least 200, 300, 400, 500, 600, 700 or more sequence reads.
  • genomic locus refers to a specific location within the genome, and may include a single position (a single nucleotide at a defined position in the genome) or a stretch of nucleotides starting and ending at defined positions in the genome.
  • the specific position(s) may be identified by the molecular location, namely, by the chromosome and the numbers of the starting and ending base pairs on the chromosome.
  • a genomic locus of interest herein contains at least one CpG site.
  • the non-methylated CpG sites can be taken as sequencing reads whose 5’ ends map to a site, as sequencing reads whose 3’ ends map to a site, or as the half of the sum of sequencing reads whose 5’ ends or 3’ ends map to a site (see above).
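The three ways of counting described above can be sketched as follows, assuming each alignment is a half-open (start, end) interval on the chromosome of interest and that a cleaved site is represented by a single cut coordinate. The representation and function name are assumptions for illustration.

```python
from typing import Iterable, Tuple


def unmethylated_counts(
    alignments: Iterable[Tuple[int, int]], cut_position: int
) -> Tuple[int, int, float]:
    """Count alignments that begin or end exactly at a restriction cut position.

    Each alignment is (start, end), 0-based and half-open. Returns:
      - the number of alignments whose 5' end (start) maps to the site,
      - the number of alignments whose 3' end (end) maps to the site,
      - half of the sum of the two counts.
    """
    five_prime = 0
    three_prime = 0
    for start, end in alignments:
        if start == cut_position:
            five_prime += 1
        if end == cut_position:
            three_prime += 1
    return five_prime, three_prime, (five_prime + three_prime) / 2
```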
  • Sequencing may optionally be preceded by a step of ‘hybrid capture’ (also known as ‘hybridization capture’) to enrich the sample to be sequenced for DNA molecules comprising regions of interest, such as one or more markers of the sequence listing.
  • in hybrid capture, a sample of DNA molecules, such as the prepared DNA molecules of the sequencing library, is captured by allowing the DNA molecules to hybridize with single-stranded oligonucleotide ‘baits’ or ‘probes’ specific for the regions of interest.
  • the baits may be immobilized on a solid support to capture the DNA molecules.
  • the hybridization is carried out in solution with baits that comprise a tag, allowing the subsequent isolation of the DNA:bait hybrids.
  • the baits may be biotinylated and the DNA:bait hybrids isolated by allowing them to bind to the surface of streptavidin-coated magnetic beads.
  • RNA baits are preferred because RNA:DNA duplexes hybridize more efficiently and are more stable than DNA:DNA duplexes.
  • the prepared DNA molecules are subjected to hybrid capture prior to sequencing e.g. using biotinylated RNA bait molecules specific for one or several markers of the sequence listing (or genomic regions close to or overlapping the sequence listing markers).
  • Methods disclosed herein do not require differential adapter tagging of methylated vs. unmethylated DNA molecules.
  • the same population of adapters can be used for all molecules.
  • the invention also provides various systems and kits.
  • a system can comprise computer processor(s) for performing and/or controlling the methods disclosed herein, and/or for processing the results e.g., for performing calculations based on the results.
  • Methods which are at least partially computer-implemented are provided.
  • a system or kit may comprise: a blood, plasma or serum sample of a human subject; components for carrying out a method disclosed herein on at least one CpG site; and computer software stored on a non-transitory computer readable medium, the computer software being able to direct a computer processor to determine a methylation level for the at least one CpG locus based on the methylation assay.
  • the software may also be able to link the methylation level to a diagnostic result or prediction e.g. by comparing one or more methylation level(s) to one or more reference levels to assess the presence of a disease in the subject.
  • the computer software may receive data from a qPCR and/or a NGS experiment.
  • Components for carrying out a method disclosed herein encompass biochemical components (e.g., enzymes, primers, probes, NTPs, etc.), chemical components (e.g., buffers, reagents), and technical components (e.g., a PCR system, such as a real-time PCR system, and equipment such as tubes, vials, plates, pipettes).
  • the system may be able to prepare and/or communicate a report to the subject and/or to a healthcare provider of the subject, based on the methylation levels.
  • Computer software includes processor-executable instructions that are stored on a non-transitory computer readable medium.
  • the computer software may also include stored data.
  • the computer readable medium is a tangible computer readable medium, such as a compact disc (CD), magnetic storage, optical storage, random access memory (RAM), read only memory (ROM), or any other tangible medium.
  • Computer-related methods and steps described herein are implemented using software stored on non-volatile or non-transitory computer readable instructions that when executed configure or direct a computer processor or computer to perform the instructions.
  • Each of the system, server, computing device, and computer described in this application can be implemented on one or more computer systems and be configured to communicate over a network. They all may also be implemented on one single computer system.
  • the computer system includes a bus or other communication mechanism for communicating information, and a hardware processor coupled with bus for processing information.
  • a computer system also includes a main memory, such as a random-access memory (RAM) or other dynamic storage device, coupled to bus for storing information and instructions to be executed by processor.
  • Main memory also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor.
  • Such instructions, when stored in non-transitory storage media accessible to processor, render computer system into a special-purpose machine that is customized to perform the operations specified in the instructions.
  • a computer system can include read only memory (ROM) or other static storage device coupled to bus for storing static information and instructions for processor.
  • ROM: read only memory
  • a storage device such as a magnetic disk or optical disk, is provided and coupled to bus for storing information and instructions.
  • a computer system may be coupled via bus to a display, for displaying information to a computer user.
  • An input device including alphanumeric and other keys, can be coupled to bus for communicating information and command selections to processor.
  • a cursor control, such as a mouse, a trackball, or cursor direction keys, may be provided for communicating direction information and command selections to processor and for controlling cursor movement on display.
  • Methods disclosed herein may be performed by a computer system in response to the processor executing one or more sequences of one or more instructions contained in main memory. Such instructions may be read into main memory from another storage medium, such as storage device. Execution of the sequences of instructions contained in main memory causes the processor to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
  • Suitable storage media include any non-transitory media that store data and/or instructions that cause a machine to operate in a specific fashion.
  • Common forms of storage media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge.
  • Storage media are distinct from, but may be used in conjunction with, transmission media.
  • Transmission media participate in transferring information between storage media.
  • transmission media include coaxial cables, copper wire and fiber optics, including the wires that comprise bus.
  • the invention also provides a kit comprising: (i) a composition comprising one or more restriction enzymes; and (ii) components for analysing cfDNA which has been digested with the composition.
  • these components may be e.g. components for performing PCR, or for preparing a sequencing library from digested cfDNA.
  • the kit may include one or more of: (a) a buffer solution e.g.
  • a kit may include an instruction manual for carrying out the methods as disclosed herein.
  • a kit may include a non-transitory computer readable medium storing a computer software comprising instructions that when executed configure or direct a computer processor to perform the method steps disclosed herein.
  • Methods disclosed herein can take advantage of positive and negative controls.
  • parallel analysis can be performed on one or more of:
  • a DNA control which contains a fully methylated recognition sequence for the restriction enzymes used for digestion. If this DNA is digested when a method uses only MSREs, this indicates that the method has not performed correctly (and conversely for MDREs).
  • a DNA control which contains a fully unmethylated recognition sequence for the restriction enzymes used for digestion. If this DNA is not fully digested when a method uses only MSREs, this indicates that the method has not performed correctly (and conversely for MDREs).
  • DNA controls can also be used as a reference point for analysis, for checking completeness of digestion, etc. As mentioned above, for instance, if fragments are obtained using MSRE digestion then it can be useful in a downstream NGS experiment to know the expected read count, and one way of obtaining this value is to look at the read count for DNA which does not contain the recognition sequence for the MSRE, or at the read count for DNA which contains the recognition sequence but is fully methylated.
  • the DNA control should be similar in size and composition to cfDNA molecules which contain CpG sites of interest.
  • when using synthetic DNA or PCR amplicons or bacterial plasmid DNA as an unmethylated control, these are more useful if they have sizes which are similar to cfDNA (e.g., a long synthetic DNA, or an appropriately-sized restriction fragment prepared from a plasmid).
  • Control experiments can be performed internally in a sample, or externally.
  • control DNA can be present in a sample already (e.g., cfDNA containing a CpG site which is known to be ubiquitously (un)methylated, or cfDNA which does not contain a recognition sequence for the restriction enzymes being used) and/or can be added (e.g., synthetic DNA, added to cfDNA).
  • the control DNA can therefore be processed in combination with the cfDNA, and experiences the same conditions as the cfDNA, and so a method can involve co-amplification of a locus including a restriction site and a control locus.
  • control DNA is subjected to the same treatment as the cfDNA but not as part of the same reaction mixture.
  • Real-time PCR of suitable control loci can give a result that can be used as a reference point.
  • the signals obtained from cfDNA at a CpG site of interest and from control DNA can be compared, and the signal ratio can be used to determine the degree of methylation at a CpG site of interest, because the ratio of signal reflects the ratio of methylation.
  • methods disclosed herein can be performed without requiring evaluation of absolute methylation levels at genomic loci, but rather by calculating a signal ratio between the analyzed genomic loci and a control. This contrasts with some conventional methods of methylation analysis for distinguishing between tumor-derived and normal DNA, which require determining actual methylation levels at specific genomic loci.
  • the methods disclosed herein can thus eliminate the need for standard curves and/or additional laborious steps involved in determination of absolute methylation levels, thereby offering a simple and cost-effective procedure.
  • An additional advantage when using an internal control is that signal ratios are obtained for loci amplified in the same reaction mixture under the same reaction conditions, which can help to eliminate sources of potential error (e.g. the potential for differences between reaction mixtures, such as the concentration of template, enzyme, etc.).
  • Methods which use qPCR may therefore involve calculating signal intensity ratios between a CpG site and a control locus co-amplified after digestion of DNA as disclosed herein, thereby providing a methylation status for the CpG site.
  • This methylation level can then be compared to reference levels (e.g., obtained from healthy subjects, or from subjects having a known disease) and, based on the comparison, a diagnostic result can be derived.
  • a method may involve: co-amplifying from restriction endonuclease-digested DNA a CpG site and a control locus, thereby generating co-amplification products; determining a signal intensity for each generated co-amplification product; and calculating a ratio between the signal intensities of the co-amplification products of the CpG site and the control locus.
  • the ratio between the signal intensities of the co-amplification products may be calculated by determining the quantification cycle (Cq) for each locus and calculating $2^{(Cq_{\text{control locus}} - Cq_{\text{CpG site}})}$.
  • Cq: quantification cycle
  • $Cq_{\text{control locus}} - Cq_{\text{CpG site}}$ is the reduction in Cq relative to the control locus; this value is used as the exponent of 2 to calculate the ratio.
  • the difference in Cq for a marker of interest and a control locus (ΔCq) is at least 2 cycles.
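The ratio calculation can be written out directly: the Cq of the CpG site is subtracted from the Cq of the control locus and the difference is used as the exponent of 2. The function names below are hypothetical, and the sketch assumes ideal doubling per cycle, which is what a base of 2 implies.

```python
def signal_ratio(cq_control_locus: float, cq_cpg_site: float) -> float:
    """Signal ratio of a CpG site to a control locus: 2 ** (Cq_control - Cq_site)."""
    return 2.0 ** (cq_control_locus - cq_cpg_site)


def meets_min_delta_cq(cq_marker: float, cq_control_locus: float,
                       min_cycles: float = 2.0) -> bool:
    """True if the marker and control Cq values differ by at least min_cycles."""
    return abs(cq_marker - cq_control_locus) >= min_cycles
```

A CpG site that reaches the threshold two cycles earlier than the control locus therefore gives a ratio of 4, with no standard curve required.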
  • the methylation level of a CpG site is a numerical value which represents the degree of methylation of that CpG site in a cfDNA sample.
  • This value may be expressed in a variety of ways e.g., as a ratio or percentage of the cfDNA molecules that are methylated at a CpG site, or as an intensity of a signal obtained from a particular CpG site, or as the ratio between a CpG site and a control locus, etc.
  • PRC2 is a protein complex that methylates histone H3 at lysine 27 (H3K27).
  • the constitutive subunits of PRC2 are polycomb protein SUZ12, histone-lysine N-methyltransferase EZH1 or histone-lysine N-methyltransferase EZH2, polycomb protein EED and histone binding protein RBBP4 or histone binding protein RBBP7.
  • PRC2 may also comprise, as accessory subunits, zinc finger protein AEBP2 and protein Jumonji (Jarid2); or one of the PCL proteins (PHF1, MTF2 or PHF19), and EPOP or PALI1/PALI2.
  • marker loci according to the invention can comprise the genomic loci targeted by PRC2.
  • Marker loci according to the invention also comprise genomic loci that are located fewer than 500 bp, 1000 bp, 2000 bp, 5000 bp, 10000 bp or 20000 bp from a genomic locus targeted by PRC2.
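Whether a marker locus lies within a given distance of a PRC2-targeted locus can be checked with interval arithmetic, as in the sketch below. The (chromosome, start, end) representation with half-open coordinates and the function names are assumptions for this example; the target intervals would come from, for instance, ChIP peak calls.

```python
from typing import Iterable, Optional, Tuple

Locus = Tuple[str, int, int]  # (chromosome, start, end), 0-based, half-open


def distance_bp(a: Locus, b: Locus) -> Optional[int]:
    """Gap in bp between two loci; 0 if they overlap or touch; None if on different chromosomes."""
    if a[0] != b[0]:
        return None
    if a[1] < b[2] and b[1] < a[2]:
        return 0  # intervals overlap
    return max(a[1] - b[2], b[1] - a[2])


def near_prc2_target(marker: Locus, targets: Iterable[Locus], max_distance: int = 500) -> bool:
    """True if the marker is fewer than max_distance bp from any target locus."""
    for target in targets:
        d = distance_bp(marker, target)
        if d is not None and d < max_distance:
            return True
    return False
```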
  • Genomic loci targeted by PRC2 are known in the art and include loci identified as being targeted by any constitutive subunit of PRC2, such as polycomb protein SUZ12, or any accessory subunit of PRC2, such as Jarid2.
  • Loci targeted by PRC2 or any of the subunits of PRC2 can be identified using methods known in the art, such as, but not limited to, chromatin immunoprecipitation (ChIP) followed by microarray analysis or sequencing.
  • Loci targeted by PRC2 may also be identified by inference, based on the H3K27 methylation of associated nucleosomes (loci associated with high levels of H3K27 methylation are likely to be PRC2 targets because PRC2 catalyses this methylation).
  • H3K27 methylation can be associated with genomic loci, for instance, by performing ChIP using antibodies that selectively recognise tri-methylated H3K27 (i.e., ‘H3K27me3’, the product of PRC2 methylation), followed by microarray analysis or sequencing.
  • H3K27me3: the product of PRC2 methylation
  • the H3K27 methylation activity of PRC2 may be involved in the development of lung cancer. Accordingly, the invention also encompasses methods for the treatment or prevention of lung cancer comprising regulating the activity of PRC2.
  • the regulating is achieved by contacting at least one subunit of PRC2, such as SUZ12, with a therapeutic compound.
  • the at least one subunit contacted may be a constitutive or an accessory subunit.
  • the therapeutic compound affects the genomic targeting of PRC2. Additionally or alternatively, the therapeutic compound may regulate the methyltransferase activity of PRC2.
  • the therapeutic compound may be able to interact with the methyltransferase active site in EZH2 and/or EZH1 and inhibit methyltransferase activity.
  • the therapeutic compound may allosterically regulate the methyltransferase activity of PRC2, for instance, by interacting with EED.
  • the invention provides a method for treating or managing lung cancer in a human subject, comprising determining a likelihood of the presence of lung cancer as above, and administering, deciding to administer, or recommending the administration of, a suitable treatment to the subject based on the likelihood.
  • the treatment may comprise administration of one or more of: adagrasib, afatinib dimaleate, alectinib, amivantamab, atezolizumab, bevacizumab, brigatinib, capmatinib, carboplatin, cemiplimab, ceritinib, cisplatin, crizotinib, dabrafenib mesylate, dacomitinib, docetaxel, doxorubicin hydrochloride, durvalumab, entrectinib, erlotinib hydrochloride, etoposide, everolimus, fam-trastuzumab deruxtecan, gefitinib,
  • the type of treatment may be determined by skilled practitioner(s) according to characteristics of the tumor, including the type, stage and grade of the tumor. The type of treatment is determined typically also based on additional factors such as characteristics of the patient.
  • the term ‘comprising’ encompasses ‘including’ as well as ‘consisting’ e.g. a composition ‘comprising’ X may consist exclusively of X or may include something additional e.g. X + Y.
  • the word ‘substantially’ does not exclude ‘completely’ e.g., a composition which is ‘substantially free’ from Y may be completely free from Y. Where necessary, the word ‘substantially’ may be omitted from the definition of the invention.
  • the term ‘between’ with reference to two values includes those two values e.g., the range ‘between’ 10 mg and 20 mg encompasses inter alia 10, 15, and 20 mg.
  • a method comprising a step of mixing two or more components does not require any specific order of mixing.
  • components can be mixed in any order. Where there are three components then two components can be combined with each other, and then the combination may be combined with the third component, etc.
  • the various steps of methods may be carried out at the same or different times, in the same or different geographical locations, e.g., countries, and by the same or different people or entities.
  • US Preventive Services Task Force
  • ECS: methylation-sensitive restriction enzymes followed by standard library preparation and sequencing
  • Mapping rate was 99.6%, 99.7% and 85.7% and unique mapping rate was 94.1%, 94.3% and 81.4% for WGS, ECS, and BS samples, respectively. Copy number integrity showed Pearson correlations of 0.9 for ECS and 0.67 for BS. Somatic mutation analysis identified a subset of cases with relatively high cfDNA shedding that were associated with larger tumors, older age and squamous cell carcinoma histology. This subset was further used to identify tumor derived plasma based markers and assess fragmentation with high confidence.
  • methylation levels for CpG markers were measured in plasma & lung tissue from healthy subjects, and in plasma & tissue from subjects known to have early-stage lung cancer. Comparisons were performed both at an early sample set (28 controls, 36 lung cancer patients) and a later set (90 controls, 93 cases). Methylation levels were analyzed and compared in various ways. For instance, methylation levels were rank ordered using Student’s t-test, to compare the mean HitSpan100 in plasma samples taken from lung cancer patients or from healthy controls (and produce a FDR-corrected p-value to compare those means i.e. to indicate how likely it is that the mean HitSpan100 in the plasma of lung cancer patients and in healthy controls is the same).
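The text does not state which FDR correction was applied to the t-test p-values. As one common possibility, the sketch below implements the Benjamini-Hochberg adjustment; treating it as the method used here is an assumption, and the function name is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original input order.

    Each p-value is scaled by m / rank (rank 1 = smallest p-value), then a
    running minimum from the largest rank downwards enforces monotonicity.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Markers could then be ranked or filtered on the adjusted values rather than on the raw p-values.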
  • a logistic regression classifier with Lasso regularization was trained on 100,000 loci, and performance was examined by mean AUC using 5-fold cross validation.
  • AUC values for a ROC curve were determined to assess a CpG marker’s ability to distinguish lung cancer samples from healthy controls.
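For a single marker, the AUC of a ROC curve equals the probability that a randomly chosen cancer sample has a higher value than a randomly chosen control, with ties counted as one half. The sketch below computes it that way; the function name and the pairwise approach are choices made for this illustration, not a description of the software actually used.

```python
from typing import Sequence


def roc_auc(case_values: Sequence[float], control_values: Sequence[float]) -> float:
    """AUC for one marker, using the pairwise (Mann-Whitney) formulation.

    Values near 1 suggest the marker is higher in cases (e.g. hyper-methylated),
    values near 0 suggest it is lower in cases (e.g. hypo-methylated), and 0.5
    indicates no discrimination.
    """
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))
```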
  • an updated lung cancer atlas was constructed by collecting additional plasma samples from cancer subjects and high-risk individuals without cancer, and processing these as described above. After rigorous filtering, the updated lung cancer atlas includes a total of 79 tumour tissues, 88 normal lung tissues, 89 plasma cancers and 128 plasma controls, that were used for marker development.
  • markers of particular interest were found i.e. about 1/1000 of the CpG sites present in the human genome. Around 9000 are hypo-methylated in cancer samples, and the remainder are hyper-methylated. The full list of markers is shown in the sequence listing.
  • the markers have at least one of the following properties: (i) an AUC well above or below 0.5 for hyper-methylated and hypo-methylated markers in the early sample set; (ii) an AUC well above or below 0.5 for hyper-methylated or hypo-methylated markers in the later sample set; (iii) an AUC well above or below 0.5 for hyper-methylated and hypo-methylated markers in the further analysis; (iv) a p-value < 0.01 in the t-test.
  • in some cases a marker did not meet these criteria when comparing all cancer samples to all controls, but it was still selected because it was found to be useful for classification, being informative for identifying only a specific subset of the cancers.
  • a machine learning model trained on CpG markers in category (iii) performed with high accuracy in discriminating lung cancer patients from high-risk healthy individuals.
  • markers have a low background (i.e. low methylation levels in plasma from healthy patients) which means that they could not have been detected using bisulfite conversion.

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Engineering & Computer Science (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Genetics & Genomics (AREA)
  • Wood Science & Technology (AREA)
  • Physics & Mathematics (AREA)
  • Biotechnology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Hospice & Palliative Care (AREA)
  • Biophysics (AREA)
  • Oncology (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides methods, systems and kits for diagnosing lung cancer (and particularly early-stage and/or high-grade lung cancer) in a subject, staging and grading the cancer, evaluating post-treatment disease recurrence, monitoring treatment efficacy and providing prognosis, by analysing DNA methylation markers in cell-free DNA from a sample of the subject.

Description

MARKERS
All documents and online information cited herein are incorporated by reference in their entirety.
FIELD OF THE INVENTION
The present invention relates to methods, systems and kits for diagnosing lung cancer (and particularly early-stage and/or high-grade lung cancer) in a subject, staging and grading the cancer, evaluating post-treatment disease recurrence, monitoring treatment efficacy and providing prognosis, by analysing DNA methylation markers in cell-free DNA from a sample of the subject.
BACKGROUND OF THE INVENTION
Lung cancer is one of the most common and serious types of cancer. There are two main types of lung cancer, non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC), the latter of which is usually caused by smoking. NSCLC is the most common. The general prognosis of lung cancer is poor as it does not usually cause noticeable symptoms until it has spread through the lungs and sometimes also into other parts of the body. Therefore, detection of cancer at the earliest possible stage is of paramount importance for treatment of the disease.
Currently, a mixed approach combining initial imaging techniques followed by biopsy is used to diagnose lung cancer. However, the initial imaging techniques may not give a definitive diagnosis, especially when the lung cancer is at a very early stage. Further, these evaluation procedures are laborious, time consuming, costly and require specially trained personnel. At least partly for these reasons, the current lung cancer evaluation procedures are difficult to deploy in, for instance, lung cancer screening programmes, which aim to detect early-stage lung cancer in individuals before they develop noticeable symptoms (hence allowing cancer treatment to begin earlier, when treatment can be more effective). As a result, lung cancer is still too often diagnosed at a late stage, when effective treatment options are more limited. Also, the monitoring of lung cancer during and after treatment, to determine treatment efficacy, is laborious and time consuming.
Lung cancers may be classified according to ‘stage’ and/or ‘grade’ (with the classification process known as ‘staging’ or ‘grading’, respectively). The stage of a lung cancer indicates its size and degree of spread around the body, so provides information about the progression of the disease. A typical lung cancer staging system comprises stages 1 to 4, with progression being from stage 1 to stage 4. The grade of a lung cancer is determined by specific morphological features of the lung cancer tumor cells and generally indicates the similarity of the tumor cells to non-tumor cells when viewed under a microscope. Lung cancer grade provides information about the aggressiveness of the cancer, i.e., how quickly the lung cancer cells are likely to be able to divide and spread around the body. Various grading systems may be used, comprising a different number of possible grades. Most systems have between 2 and 5 grades (see Travis et al. (2016), European Respiratory Journal 2016 47: 720-723). SCLCs typically have a high grade. Both stage and grade can influence treatment efficacy, for instance certain treatments may not be as effective in a high-grade cancer compared to a low-grade cancer. Stage and grade will also affect the balance of risk against benefit for any particular treatment option.
Accordingly, there is a need for improved methods for diagnosing lung cancer, particularly early-stage lung cancer and/or high-grade lung cancer. In addition, there is a need for improved methods of staging and/or grading lung cancer. These methods would ideally be able to detect lung cancer before the onset of noticeable symptoms and be fast, non-invasive and relatively simple to perform and interpret. They should also diagnose lung cancer in subjects with high specificity and high sensitivity. Such methods can be suitable for use as part of lung cancer screening programmes and for monitoring the effectiveness of ongoing treatment courses. Additionally, diagnostic methods that simultaneously stage the lung cancer would improve treatment by allowing a clinician to more quickly determine the most appropriate treatment option.
SUMMARY
The present invention addresses these needs by providing methods, systems and kits for diagnosing and staging/grading lung cancer based on the methylation of CpG sites in DNA, specifically cell-free DNA (cfDNA), at one or more of the genomic loci in the appended sequence listing. Herein, the genomic loci in the sequence listing are also referred to as ‘markers’ or ‘marker loci’.
DNA methylation is the conversion of a cytosine in a DNA site with the dinucleotide sequence ‘CG’ (known as a CpG site) to 5-methylcytosine (5mC). Changes in DNA methylation are known to occur in many types of cancer, but the pattern of DNA methylation across the genome also varies over time, between different individuals and between different instances of the disease. So, it is difficult to link specific changes in DNA methylation to the presence or absence of lung cancer (or its stage/grade) and there is currently insufficient knowledge about methylation markers that are highly correlated with lung cancer.
The inventors have discovered that a change in methylation of one or more of the CpG markers in the sequence listing indicates the presence of lung cancer with surprisingly high specificity and high sensitivity. Accordingly, measuring the methylation level of one or more of the markers in the sequence listing allows for improved diagnosis of lung cancer.
The change for any particular marker can be an increase in methylation (hypermethylation) or a decrease (hypomethylation) compared to an index methylation level for the marker in cfDNA from an individual/individuals without lung cancer (which can include individuals without lung cancer, but who are at high risk of developing lung cancer).
The invention involves analysis of the methylation of CpG sites in cfDNA, i.e., fragmented genomic DNA which is found in vivo in an animal within a bodily fluid rather than within an intact cell. This allows for non-invasive diagnosis (so-called ‘liquid biopsy’). CfDNA from an individual with lung cancer will comprise small amounts of DNA derived from the lung cancer tumor cells, so analysing methylation in cfDNA allows for cancer-associated marker methylation changes to be detected without having to take samples of the tumor itself. However, in a sample of cfDNA, the small amount of tumor-derived cfDNA will be mixed with a massive excess of DNA derived from non-tumor cells. This is even more the case when the individual has an early-stage lung cancer.
In one aspect, the invention provides a method for determining a likelihood of the presence of lung cancer in a human subject, comprising steps of:
(a) measuring, in cell-free DNA (cfDNA) from a sample of the subject, a methylation level for at least one marker of the sequence listing; and
(b) comparing the measured methylation level to an index methylation level for the same marker in at least one known source, thereby determining the likelihood based on the similarity of the comparison.
The known source may be cfDNA from: an individual without lung cancer or an individual known to have lung cancer. If the source is an individual known to have lung cancer, in some embodiments they may be known to have a particular type of lung cancer (i.e., NSCLC or SCLC), a particular stage of lung cancer, or a particular grade of lung cancer. Comparison of the methylation level of the sample to an index methylation level derived from an individual having a known lung cancer status permits the likelihood of the presence (or absence) of lung cancer to be determined.
A common system for staging lung cancer is the Tumor, Node and Metastasis (TNM) staging system, which captures information about the size and spread of the cancer (see Carter & Erasmus, Diseases of the Chest, Breast, Heart and Vessels 2019-2022, Hodler et al. (eds.), IDKD Springer Series, 2019). The TNM system can also be used to divide lung cancers into ‘number stages’ (stage 1, stage 2, stage 3 or stage 4) and/or sub-stages (stage 1A, stage 1B, stage 2A, stage 2B, stage 3A, stage 3B, stage 3C, stage 4A or stage 4B). For SCLC, an alternative staging system is to classify the cancer as either ‘limited’ or ‘extensive’.
A system for grading lung cancer classifies the cancer into grade 1 (well differentiated, lepidic dominant), grade 2 (moderately differentiated, acinar or papillary predominant) or grade 3 (poorly differentiated; solid or micropapillary predominant).
The known source may be cfDNA from an individual with stage 1, stage 2, stage 3 or stage 4 lung cancer. Alternatively or additionally, the known source may be cfDNA from an individual with stage 1A, stage 1B, stage 2A, stage 2B, stage 3A, stage 3B, stage 3C, stage 4A or stage 4B lung cancer. The known source may also be cfDNA from an individual with limited SCLC or extensive SCLC. The known source may also be cfDNA from an individual with grade 1, grade 2 or grade 3 lung cancer.
The index level may be based on cfDNA from a plurality of individuals with the same cancer status. For instance, the known source may be a plurality of individuals without lung cancer, a plurality of individuals known to have lung cancer, a plurality of individuals known to have a particular type of lung cancer (i.e., NSCLC or SCLC), a plurality of individuals known to have a particular stage of lung cancer, or a plurality of individuals known to have a particular grade of lung cancer. The index methylation level may be an average of the methylation levels for the same marker in the known sources. For instance, the comparison methylation level may be the average of the methylation level calculated for the same marker in cfDNA from many individuals without lung cancer. The average may be the arithmetic mean.
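The averaging described above can be sketched in a few lines of Python. This is only an illustration: the function name, the representation of methylation levels as fractions between 0 and 1, and all numeric values are assumptions made for the example and do not come from the specification.

```python
from statistics import mean


def index_methylation_level(levels):
    """Arithmetic mean of one marker's methylation levels across
    individuals who share the same known lung cancer status."""
    if not levels:
        raise ValueError("at least one methylation level is required")
    return mean(levels)


# Hypothetical methylation fractions for one marker in cfDNA from five
# individuals without lung cancer (made-up values, for illustration only).
control_levels = [0.02, 0.01, 0.03, 0.02, 0.02]
index_level = index_methylation_level(control_levels)

# Hypothetical measurement for the same marker in the subject's sample.
subject_level = 0.15
difference = subject_level - index_level  # positive: higher than the index
```

A positive difference for a marker annotated as hypermethylated in cancer would point toward the presence of disease; how large a difference is meaningful is a separate statistical question.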
As a cancer becomes increasingly severe (i.e., progresses from stage 1 through to stage 4 and/or from a low grade to a high grade), additional changes in CpG methylation can accrue. The methylation level of the markers of the sequence listing may also change with lung cancer severity, so providing information about lung cancer stage and/or grade. Methods of the invention are particularly useful for diagnosing early-stage lung cancer, i.e., stage 1 (encompassing stage 1A and stage 1B). Methods of the invention are also particularly useful for diagnosing high-grade lung cancer (e.g., grade 2 or 3), preferably at an early stage (e.g., stage 1 or 2).
The methylation level of the markers of the sequence listing may also differ depending on lung cancer type (NSCLC or SCLC). So, methods of the invention are also useful for diagnosing whether NSCLC or SCLC is present in a human subject.
In a further aspect, the invention provides a method for determining a likelihood of the presence of lung cancer of a particular type, stage and/or grade in a human subject, comprising:
(a) measuring, in cell-free DNA (cfDNA) from a sample of the subject, a methylation level for at least one marker of the sequence listing; and
(b) comparing the measured methylation level to an index methylation level for the same marker in an individual, or a plurality of individuals who have a particular type, stage and/or grade of lung cancer; thereby determining the likelihood of the presence of lung cancer of a particular type, stage and/or grade based on the comparison.
In another embodiment, the measured methylation level(s) may be compared to more than one index methylation level, each index level being for the same marker in individual(s) having a different type, stage and/or grade of lung cancer, and the likelihood of the presence of lung cancer of a particular type, stage and/or grade is based on all the comparisons performed.
The indicative value of a marker can be quantified by measuring the area under the receiver operating characteristic (ROC) curve (AUC) for the marker. As explained below, an AUC of greater than 0.5 (for hypermethylated markers) or less than 0.5 (for hypomethylated markers) is useful for disease prediction. The greater the difference from 0.5 (i.e. the closer to 0 or 1), the better the ROC performance of the marker. Accordingly, the invention provides a method for determining a likelihood of the presence of lung cancer in a human subject, comprising:
(a) measuring, in cell-free DNA (cfDNA) from a sample of the subject, a methylation level for at least one marker of the sequence listing; and
(b) comparing the measured methylation level(s) to index methylation level(s) for the same marker(s), thereby determining a likelihood of the presence of lung cancer based on the comparison; wherein the measured marker(s) have an area under the receiver operating characteristic curve (AUC) which differs from 0.5 by at least 0.2.
In preferred embodiments, the difference between the AUC and 0.5 is >0.25, >0.30, >0.35, >0.40, >0.45, or greater.
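To make the AUC criterion concrete, the following minimal Python sketch computes a marker's AUC as the probability that a randomly chosen cancer sample has a higher methylation level than a randomly chosen control sample (the rank-based formulation, which is equivalent to the area under the ROC curve), and then checks whether the AUC differs from 0.5 by at least 0.2. The function names and all methylation values are hypothetical and chosen only for illustration.

```python
import math


def auc(cancer_levels, control_levels):
    """Probability that a random cancer sample has a higher methylation
    level than a random control sample (ties count as one half)."""
    wins = 0.0
    for x in cancer_levels:
        for y in control_levels:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cancer_levels) * len(control_levels))


def passes_auc_filter(auc_value, min_distance=0.2):
    """True when the AUC differs from 0.5 by at least min_distance.
    isclose() guards against floating-point rounding at the boundary."""
    distance = abs(auc_value - 0.5)
    return distance >= min_distance or math.isclose(distance, min_distance)


# Hypothetical methylation levels for two markers (illustrative only).
hyper_cancer, hyper_control = [0.30, 0.25, 0.40, 0.10], [0.05, 0.10, 0.02, 0.08]
hypo_cancer, hypo_control = [0.20, 0.35, 0.30, 0.25], [0.80, 0.75, 0.90, 0.30]

hyper_auc = auc(hyper_cancer, hyper_control)  # close to 1: hypermethylated marker
hypo_auc = auc(hypo_cancer, hypo_control)     # close to 0: hypomethylated marker
```

Both example markers pass the filter, one because its AUC is far above 0.5 and the other because it is far below.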
The indicative value of a marker can also be measured statistically by comparing its mean methylation level in cfDNA samples from patients with lung cancer to its mean methylation level in cfDNA samples from patients without lung cancer e.g. using Student’s t-test to produce a p-value. A p-value <0.05 is generally used as the cut-off for statistical significance.
Accordingly, the invention provides a method for determining a likelihood of the presence of lung cancer in a human subject, comprising:
(a) measuring, in cell-free DNA (cfDNA) from a sample of the subject, a methylation level for a marker of the sequence listing; and
(b) comparing the measured methylation level to an index methylation level for the same marker, thereby determining a likelihood of the presence of lung cancer based on the comparison; wherein the mean methylation level of the marker in cfDNA samples from patients with lung cancer differs from the mean methylation level of the marker in cfDNA samples from patients without lung cancer, and the difference in the means has a p-value <0.05 when assessed by Student’s t-test. In preferred embodiments, the p-value is <0.01.
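For reference, the two-sample Student’s t statistic can be written as follows, assuming the classical equal-variance (pooled) form; the specification does not state which variant is used. The symbols are notation introduced here for illustration: the subscripts 1 and 2 denote the cancer and non-cancer groups, \bar{x} is a group’s mean methylation level, s^2 its sample variance and n its number of samples.

```latex
t = \frac{\bar{x}_{1} - \bar{x}_{2}}{s_{p}\,\sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}}},
\qquad
s_{p}^{2} = \frac{(n_{1}-1)\,s_{1}^{2} + (n_{2}-1)\,s_{2}^{2}}{n_{1} + n_{2} - 2}
```

The p-value is then read from the t distribution with n_1 + n_2 - 2 degrees of freedom, and the marker satisfies the criterion when that p-value is below the chosen cut-off (0.05, or preferably 0.01).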
In preferred embodiments, the methylation level of a marker is measured by cfDNA digestion using methylation-sensitive and/or methylation-dependent restriction endonucleases (MSREs or MDREs, respectively) followed by downstream analytical steps which quantify the degree of digestion. Most preferably, a MSRE is used. Digestion with a plurality of MSREs or MDREs, for instance, two MSREs, is also encompassed in the methods of the invention. MSREs and MDREs are described in more detail below.
Where methods are described herein as involving ‘digestion’, this term (and also ‘digesting’, etc.) refers to the mixing of active restriction enzyme(s) with cfDNA in conditions under which digestion can occur. If there are no recognition sites for the restriction enzyme in question (e.g., because it is a MSRE and all of the recognition sequences are fully methylated) then a step of ‘digestion’ still takes place even though DNA cleavage does not occur.
Alternatively, the methylation level of a marker is measured by mixing cfDNA with one or more reagents that chemically modify nucleobases within DNA in a methylation-conditional manner, followed by downstream analytical steps which quantify the degree of modification. For example, a suitable reagent is sodium bisulfite, which converts unmethylated cytosine to uracil (further details below).
Preferably, the downstream analytical steps comprise (i) amplification of a sequence comprising a CpG site located within the marker, or (ii) high throughput sequencing. Preferably, the amplification is by polymerase chain reaction (PCR), specifically by real time PCR (rtPCR, also known as quantitative PCR or qPCR). Alternatively, the amplification is by another real-time amplification reaction, for instance, an isothermal real-time amplification reaction such as real-time accelerated reverse transcription loop-mediated isothermal amplification (real-time RT-LAMP).
Accordingly, the invention provides a method for determining a likelihood of the presence of lung cancer in a human subject, comprising:
(a) digesting cfDNA in a sample from the subject with at least one methylation-sensitive restriction endonuclease, to provide digested cfDNA;
(b) quantifying the degree of digestion by performing a real-time amplification reaction on the digested cfDNA to amplify at least one sequence comprising a CpG site within a marker of the sequence listing, thereby measuring the methylation level of the marker; and
(c) comparing the measured methylation level to an index methylation level, thereby determining a likelihood of the presence of lung cancer in the subject based on the comparison.
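One way step (b) could be turned into a number is sketched below in Python. The specification does not give this formula here, so everything in the sketch is an assumption: it supposes that the same locus is amplified from a digested aliquot and from an undigested (mock-treated) aliquot of the sample, that unmethylated templates are cut completely by the MSRE, and that each amplification cycle doubles the product. Under those assumptions the fraction of templates surviving digestion, taken as the methylation level, is 2 raised to the negative difference in quantification cycles (Ct). The function name and Ct values are hypothetical.

```python
def fraction_surviving_digestion(ct_digested, ct_undigested, efficiency=2.0):
    """Estimate the fraction of templates that resisted MSRE digestion.

    Assumes complete digestion of unmethylated templates and a fixed
    per-cycle amplification factor (2.0 means perfect doubling).
    The result is capped at 1.0 because measurement noise can make the
    digested aliquot appear to contain more template than the undigested one.
    """
    delta_ct = ct_digested - ct_undigested
    return min(1.0, efficiency ** (-delta_ct))


# Hypothetical Ct values: the digested aliquot crosses the threshold three
# cycles later than the undigested aliquot, so about 1/8 of the templates
# were protected from digestion (i.e. methylated at the recognition site).
methylation_level = fraction_surviving_digestion(ct_digested=30.0, ct_undigested=27.0)
```

The resulting value would then be compared to the index methylation level in step (c).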
The invention also provides primer pairs comprising a first primer and a second primer, for amplifying a CpG site within a marker of the sequence listing. So, for each marker listed in the sequence listing, the invention provides a primer pair consisting of one primer binding upstream of a CpG site within the marker and one primer binding downstream of the CpG site, wherein the primer pair is suitable for use in a PCR to generate an amplification product comprising the CpG site.
In some embodiments, measuring a methylation level comprises using a fluorescently-labelled polynucleotide probe to obtain a signal intensity for an amplification product generated in the rtPCR. The labelled probe is typically between 15-30 nucleotides in length and comprises sequence that is complementary to a sub-sequence within the amplicon of interest. Preferably, the melting temperature of the probe is comparable to that of the primers used in the rtPCR. Thus, the invention also provides fluorescently-labelled oligonucleotide probes for detecting an amplification product of a primer pair of the invention.
The invention also provides primer sets, each primer set having 4-6 primers, for amplifying a CpG site within a marker of the sequence listing by an isothermal real-time amplification reaction such as real-time RT-LAMP. The invention also provides fluorescently-labelled polynucleotide probes for obtaining a signal intensity for an amplification product generated in the isothermal real-time amplification reaction.
The invention also provides a nucleic acid construct comprising a pair of sequencing adapters flanking a nucleic acid insert, wherein the insert is a marker listed in the sequence listing (or a fragment thereof). The sequencing adapters can include one or more of: a site recognised by a universal primer; a flow cell binding sequence, such as a P5 or P7 sequence; an index sequence, such as an i5 or i7 index; and/or a molecular barcode. The two adapters within the construct may differ e.g. one may include a P7 and i7 sequence, whereas the other includes a P5 and i5 sequence. Where the insert is a fragment of a marker in the sequence listing, it is ideally at least 20 nucleotides long e.g. >30, >40, >50, >60, >70, >80, >90, or >100 nucleotides long. These constructs can be prepared by ligating sequencing adapters to a digested cfDNA sample e.g. by ligating Y-shaped adapters. The digested cfDNA sample may be subjected to end repair and/or A-tailing prior to the ligation. The nucleic acid construct is suitable for sequencing by a NGS technique.
In another aspect, the invention provides a method for determining a likelihood of the presence of lung cancer in a human subject, comprising:
(a) digesting cfDNA from a sample of the subject with at least one methylation-sensitive restriction endonuclease, to provide digested cfDNA;
(b) quantifying the degree of digestion by performing high-throughput sequencing on the digested cfDNA to provide sequencing data and determining from the sequencing data a methylation level of at least one marker of the sequence listing; and
(c) comparing the measured methylation level to an index methylation level, thereby determining a likelihood of the presence of lung cancer in the subject based on the comparison.
When the downstream analytical steps comprise high throughput sequencing, the amount of cfDNA that can be isolated from a typical sample is generally not limiting.
The invention is particularly useful as an initial evaluation technique, or as part of a screening programme. For non-targeted screening, the subject may not be suspected of having lung cancer. Alternatively, the subject may be suspected of having lung cancer but is asymptomatic (i.e., does not exhibit any suspicious clinical signs of lung cancer). A reason that the subject is suspected of having lung cancer may be that the subject is classified as having a high risk of developing lung cancer, for example, based on age, smoking history, previous history of lung cancer, genetic predisposition, and/or family history. The high risk may be classified according to the age and smoking history of the subject. For instance, the subject may be classified as high risk if they are between 55 and 74 years old and smoke or have smoked. Another way the high risk may be classified is according to the methodology of the US Preventive Services Task Force (USPSTF), which defines high-risk individuals as those who are 55 through 80 years old, are either current smokers or have quit within the past 15 years, and have a smoking history of 30 pack years or more. A ‘pack year’ is a unit of smoking equivalent to an average of 1 pack of cigarettes (such as 20 cigarettes) per day for 1 year. For example, a person could have a 20 pack year history by smoking 1 pack a day for 20 years, or 2 packs a day for 10 years. In some embodiments, the subject has a smoking history of about 40 pack years or more. In some embodiments, the subject is at least about 50 years old, at least about 55 years old, at least about 60 years old, or at least about 65 years old.
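The pack-year arithmetic and the USPSTF-style criteria described above can be expressed as a short Python sketch. The function and parameter names are hypothetical, and treating ‘within the past 15 years’ as ‘15 years or fewer’ is an assumption made for the example.

```python
def pack_years(packs_per_day, years_smoked):
    """One pack year = an average of 1 pack per day for 1 year."""
    return packs_per_day * years_smoked


def is_high_risk(age, packs_per_day, years_smoked, current_smoker,
                 years_since_quit=None):
    """Criteria as described in the text: age 55 through 80, current smoker
    or quit within the past 15 years, and at least 30 pack years."""
    age_ok = 55 <= age <= 80
    if current_smoker:
        status_ok = True
    else:
        # Assumption: "within the past 15 years" is read as <= 15 years.
        status_ok = years_since_quit is not None and years_since_quit <= 15
    return age_ok and status_ok and pack_years(packs_per_day, years_smoked) >= 30


# The two examples from the text both give a 20 pack year history.
one_pack_twenty_years = pack_years(1, 20)
two_packs_ten_years = pack_years(2, 10)
```

For instance, `is_high_risk(60, 1.5, 20, current_smoker=True)` evaluates the criteria for a 60-year-old current smoker with a 30 pack year history.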
In additional embodiments, the subject may exhibit suspicious clinical signs of cancer and/or is suspected of having lung cancer based on other prior assay(s) e.g., based on testing of other biomarker(s). In some embodiments, the subject is at risk of recurrence of lung cancer. In some embodiments, the subject shows at least one symptom or characteristic of lung cancer.
Symptoms or characteristics of lung cancer include, but are not limited to: a persistent (and potentially worsening) cough, recurring chest infections, coughing up blood, aches and/or pains when breathing and/or coughing, persistent breathlessness, persistent lack of energy, loss of appetite, unexplained weight loss, finger clubbing, difficulty swallowing or pain when swallowing, wheezing, a hoarse voice, swelling of the face or neck and persistent chest and/or shoulder pain.
In some embodiments, the subject was not previously diagnosed with lung cancer. In some embodiments, the subject was previously diagnosed and treated for lung cancer. In some embodiments, such a subject is in need of monitoring for the recurrence of lung cancer.
In some embodiments, methods include a step of preparing a report in paper or electronic form based on the assessment of the likelihood of lung cancer or the diagnosis of the presence or absence of lung cancer, and optionally communicating the report to the subject and/or a healthcare provider of the subject.
The invention can also be embodied as a method for: assessment of a subject with lung cancer, assessment of a subject without any symptoms of lung cancer, assessment of a subject with at least one symptom of lung cancer, ruling out lung cancer in a subject with at least one symptom of lung cancer, determining the presence or absence of high-grade lung cancer in a subject, or ruling out high-grade lung cancer in a subject.
The invention can also be used as an initial step in existing lung cancer diagnostic techniques, to target such techniques on patients where the invention indicates that lung cancer is present. Thus the invention also provides a method for detecting lung cancer in a subject, comprising determining a likelihood of the presence of lung cancer as disclosed herein, and performing a clinical diagnostic step on the subject. Thus, where the presence of lung cancer is likely, the subject can be taken forward into a suitable confirmatory test. The clinical diagnostic step may be one or more of: a chest X-ray; a CT scan; a PET-CT scan; a bronchoscopy and biopsy; a bronchoscopy and endobronchial ultrasound scan; a thoracoscopy; a mediastinoscopy; and/or percutaneous needle biopsy.
The invention can also be embodied as methods of treatment. For instance, the invention provides a method for treating or managing lung cancer in a human subject, comprising determining a likelihood of the presence of lung cancer as disclosed herein, and administering, deciding to administer, or recommending the administration of a suitable treatment to the subject based on the likelihood. Thus, where the presence of lung cancer is likely, the subject can be taken forward into a suitable method of treatment. The treatment may comprise one or more of surgical resection (including wedge resection, segmental resection, sleeve resection, lobectomy and pneumonectomy), laser therapy, photodynamic therapy, cryosurgery, electrocautery, chemotherapy, radiation therapy, immunotherapy, and/or targeted drug therapy (see below for more details).
The invention also provides a method for treating or managing lung cancer in a human subject, comprising determining a likelihood of the presence of lung cancer of a particular type, stage and/or grade as disclosed herein, and administering a suitable treatment based on the likelihood.
In further embodiments, a likelihood of the presence of lung cancer is determined in a human subject one or more times after the subject has undergone lung cancer treatment. This provides information about treatment response. In some embodiments, the human subject is identified as non-responsive to the lung cancer treatment and said lung cancer treatment is modified. In some embodiments, the human subject is identified as non-responsive to the lung cancer treatment and it is decided to modify said lung cancer treatment. In some embodiments, the human subject is identified as non-responsive to the lung cancer treatment and it is recommended to modify said lung cancer treatment. In some embodiments, the human subject is categorised as having residual disease or viable tumor cells, and a second-line therapy is administered to the subject. In some embodiments, the human subject is categorised as having residual disease or viable tumor cells, and it is decided to administer a second-line therapy to the subject. In some embodiments, the second-line therapy comprises one or more of surgical resection (including wedge resection, segmental resection, sleeve resection, lobectomy and pneumonectomy), laser therapy, photodynamic therapy, cryosurgery, electrocautery, chemotherapy, radiation therapy, immunotherapy, and/or targeted drug therapy. In some embodiments, said subject is categorised as having residual disease or viable tumor cells, thereby indicating that said subject is at high risk of disease recurrence.
The invention also provides a method for differentially amplifying tumor-derived cfDNA and non-tumor-derived cfDNA in cfDNA from a sample of a human subject having lung cancer comprising:
(a) treating the cfDNA from a sample of a human subject having lung cancer with at least one reagent that differentially affects methylated and non-methylated DNA; and
(b) performing an amplification reaction on the treated cfDNA, wherein the amplification reaction differentially amplifies at least one marker in the sequence listing based on the methylation level of the at least one marker.
Preferably, the reagent that differentially affects methylated and non-methylated cfDNA is a MSRE or a MDRE. Most preferably, the reagent is a MSRE. Treating the cfDNA with a plurality of MSREs or MDREs, for instance, two MSREs, is also encompassed in the methods of the invention.
Alternatively, the reagent that differentially affects methylated and non-methylated cfDNA may be a reagent that conditionally chemically modifies nucleobases within DNA based on their methylation status. For example, a suitable reagent is sodium bisulfite, which converts unmethylated cytosine to uracil.
The amplification reaction is preferably PCR. Most preferably the amplification reaction is rtPCR because this can be highly sensitive and does not require a separate step for quantifying amplification. Alternatively, the amplification reaction may be an isothermal amplification reaction, such as RT-LAMP.
The invention also provides a method for preparing data useful for lung cancer diagnosis comprising: treating cell-free DNA (cfDNA) from a sample of a subject with a reagent that differentially affects methylated and non-methylated DNA; measuring a methylation level, based on the effect of the reagent on the cfDNA, for at least one marker of the sequence listing; and recording the measured methylation level(s).
The CpG markers disclosed herein have been selected based on their ability to identify methylation changes associated with lung cancer. More generally, however, the same markers can also be useful for identifying methylation changes associated with other types of cancer and proliferative disorders. Moreover, they can be used as pan-cancer markers i.e. for determining the likelihood of the presence of multiple different types of cancer (including lung cancer).
DETAILED DESCRIPTION
Marker loci
The inventors have identified the genomic loci listed in the sequence listing as markers for early detection of lung cancer. Using cfDNA, measuring the methylation level of a CpG located within a sequence of the sequence listing can be used for determining the likelihood of the presence of lung cancer in a human subject. The sequence listing provides the sequence of each marker and comprises the following additional information for each marker:
• the marker’s chromosome (in the “chromosome” qualifier of the “source” feature of each sequence)
• start/end coordinates according to the hg38 genome assembly (in the “map” qualifier of the “source” feature of each sequence)
• whether methylation is increased (‘hyper’) or decreased (‘hypo’) in cfDNA from cancer patients compared to healthy control subjects i.e. how the index methylation level should be compared (in the “note” qualifier of the “source” feature for each sequence)
• the name of the nearest transcription start site to the marker (in the “gene” qualifier of the “gene” feature of each marker).
The markers of SEQ ID NOs: 1-30615 have increased methylation (‘hyper’) in cfDNA from cancer patients compared to healthy control subjects. The markers of SEQ ID NOs: 30616-39636 have decreased methylation (‘hypo’) in cfDNA from cancer patients compared to healthy control subjects.
In the sequence listing, two or more markers may overlap (e.g. SEQ ID NOs: 25726 & 25727) and, in these instances, the invention also extends to an aggregated marker encompassing the overlapping markers (e.g. for SEQ ID NOs: 25726 & 25727, nucleotides 12,685,047-12,685,301 on chromosome 16).
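Aggregating overlapping markers amounts to merging overlapping genomic intervals, which can be sketched in Python as follows. The start and end coordinates given to the two individual markers are hypothetical (the text states only the aggregated range on chromosome 16), as are the function name and the tuple layout.

```python
def merge_overlapping(markers):
    """Merge overlapping (chromosome, start, end) intervals.
    Coordinates are treated as inclusive."""
    merged = []
    for chrom, start, end in sorted(markers):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the previous interval on the same chromosome: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical coordinates for two overlapping markers on chromosome 16,
# chosen so that their union matches the aggregated range given in the text.
markers = [
    ("chr16", 12685150, 12685301),
    ("chr16", 12685047, 12685200),
]
aggregated = merge_overlapping(markers)
```

Markers on different chromosomes, or on the same chromosome without overlap, are left as separate entries.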
The term ‘genomic locus’ refers to a DNA sequence at a specific region within the genome. The specific region may be identified by the molecular location, namely, by the chromosome and the numbers of the starting and ending base pairs on the chromosome. Genomic loci include gene sequences as well as other genetic elements (e.g., intergenic sequences).
A ‘marker locus’ or simply ‘marker’, is a genomic locus that is differentially methylated between different sources of cfDNA (e.g. lung tumor vs. healthy tissue), and therefore analysis of its methylation provides an indication with respect to the source of the DNA.
In some embodiments, hypermethylation of a particular marker indicates the presence of the cancer, where ‘hypermethylation’ means increased methylation of the marker across a sample of DNA molecules containing the marker, compared to an index methylation level for that marker in cfDNA from an individual/individuals without lung cancer. In some embodiments, hypomethylation of a particular marker indicates the presence of the cancer, where ‘hypomethylation’ means decreased methylation of the marker across a sample of DNA molecules containing the marker, compared to an index methylation level for that marker in cfDNA from an individual/individuals without lung cancer.
The comparison of a methylation level for a marker in a sample and the index methylation level of that marker can use typical techniques used when comparing measurements in biological systems. Thus the comparison may be accompanied by an indication of the confidence in that comparison e.g. based on statistical analysis. In particular, it is possible to indicate the confidence that the methylation level in the sample differs from the index methylation level, rather than arising from normal biological variation.
The degree to which the methylation status of a particular marker is indicative of the presence or absence of lung cancer, can be quantified by measuring the area under the receiver operating characteristic (ROC) curve (AUC) for the marker. For any particular marker, the greater the difference between the AUC and 0.5, the more useful the methylation level of that marker is for disease prediction.
In more detail, for any marker, a particular methylation level can be chosen as a threshold for a disease prediction model based on methylation of that marker. The model would predict the presence of disease for observed methylation levels that cross that threshold, and the absence of disease for observed methylation levels that do not. A particular classification threshold will be associated with a true positive rate (sensitivity), i.e., the proportion of observations from subjects with disease that are correctly predicted to indicate disease, and a false positive rate (1 - specificity), i.e., the proportion of observations from subjects without disease that are incorrectly predicted to indicate disease. A ROC curve is obtained by plotting the true positive rate (on the y axis) against the false positive rate (on the x axis) for various classification thresholds. A ROC curve that lies toward the top left corner (having a high AUC) indicates that there are classification thresholds that produce a low false positive rate and high true positive rate. Conversely, a ROC curve that lies toward the bottom right corner (having a low AUC) indicates that there are classification thresholds that produce a high false positive rate and low true positive rate. This situation may occur when a lower methylation level, i.e., hypomethylation, is indicative of the presence of disease, or when a higher methylation level, i.e., hypermethylation, is indicative of the absence of disease. However, a ROC curve that is simply a straight line from the bottom left corner to the top right corner (AUC of 50% or 0.5) occurs if the true and false positive rates are equal at all classification thresholds, and indicates no predictive value. See also Mandrekar (J Thorac Oncol. 2010;5:1315-1316). Preferred markers herein have an AUC that differs from 0.5 by at least 0.2 e.g. by >0.25, >0.30, >0.35, >0.40, >0.45, or more. A hypermethylated marker may thus have an AUC of >0.7, and a hypomethylated marker may have an AUC of <0.2.
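As an illustration of how an AUC might be obtained for a single marker, the following sketch uses the standard equivalence between the area under the empirical ROC curve and the probability that a randomly chosen sample from a subject with disease has a higher methylation level than a randomly chosen sample from a subject without disease (ties counting one half). The function names and the methylation values are hypothetical and are not taken from the sequence listing or from any data disclosed herein.

```python
from typing import Sequence


def marker_auc(cancer: Sequence[float], healthy: Sequence[float]) -> float:
    """Area under the empirical ROC curve when higher methylation predicts disease."""
    if not cancer or not healthy:
        raise ValueError("both groups must contain at least one observation")
    wins = 0.0
    for c in cancer:
        for h in healthy:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5  # ties contribute one half
    return wins / (len(cancer) * len(healthy))


def is_preferred(auc: float, min_distance: float = 0.2) -> bool:
    """True if the AUC differs from 0.5 by at least min_distance."""
    return abs(auc - 0.5) >= min_distance


# Hypothetical methylation levels (fractions) for one marker.
cancer_levels = [0.8, 0.6, 0.9, 0.4]
healthy_levels = [0.1, 0.3, 0.5, 0.2]

hyper_auc = marker_auc(cancer_levels, healthy_levels)  # behaves like a 'hyper' marker
hypo_auc = marker_auc(healthy_levels, cancer_levels)   # groups swapped: like a 'hypo' marker
print(hyper_auc, hypo_auc)
```

With the groups swapped the AUC falls below 0.5 by the same margin, which mirrors the symmetric treatment of hypermethylated and hypomethylated markers described above.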
Markers according to the invention are described in the sequence listing. The location of the markers is given according to Genome Reference Consortium Human Build 38 patch p13 (‘GRCh38.p13’, generally known as ‘hg38’). The markers in the sequence listing cover from around 30 bp to around 500 bp of the human genome.
The markers in the sequence listing contain at least one CpG site located within a restriction site of a MSRE or MDRE. CpG site(s) may be at any position within a particular marker in the sequence listing. The invention can be based on analysis of any CpG found within the markers in the sequence listing.
Advantageously, these marker loci can be detected in cell-free DNA, particularly in cfDNA from plasma samples, enabling non-invasive disease detection and characterization.
Cell-free DNA
The methods disclosed herein are particularly useful for analysing cell-free DNA (cfDNA) i.e., fragmented genomic DNA which is found in vivo in an animal within a bodily fluid rather than within an intact cell. The origin of cfDNA is not fully understood, but it is generally believed to be released from cells in processes such as apoptosis and necrosis. cfDNA is highly fragmented compared to intact genomic DNA (e.g., see Alcaide et al. (2020) Scientific Reports 10, article 12564), and in general circulates as fragments between 120-220 bp long, with a peak around 168 bp (in humans). cfDNA is present in many bodily fluids, including but not limited to blood and urine, and the methods and compositions disclosed herein can use any suitable source of cfDNA e.g., a blood sample (such as venous blood) or a urine sample. Ideally cfDNA is isolated from blood, and the blood may be treated to yield plasma (i.e., the liquid remaining after a whole blood sample is subjected to a separation process to remove the blood cells, typically involving centrifugation) or serum (i.e., blood plasma without clotting factors such as fibrinogen). Thus, the methods and compositions disclosed herein can be used as part of so-called liquid biopsy testing, and can be implemented using plasma or serum cfDNA. Methods disclosed herein may thus include a step of purifying cfDNA from a blood, plasma or serum sample, to provide cfDNA for digestion and analysis. Methods may also include a step of obtaining a blood sample and preparing plasma or serum therefrom, thus providing a source for downstream purification of cfDNA.
Blood can be collected in tubes that contain an anticoagulant and an agent to inhibit genomic DNA from white blood cells in the sample being released into the plasma component of the blood sample. Such tubes are commercially available as glass cfDNA ‘Blood Collection Tubes’ or ‘BCT’ from Streck (La Vista, NE) e.g. as discussed by Diaz et al. (2016) PLoS One 11(11): e0166354, and they can stabilize cfDNA within blood for up to 14 days at 6-37°C (thus providing advantages compared to typical K2EDTA collection tubes). Useful anticoagulants include, but are not limited to, EDTA, heparin, or citrate. Useful agents to inhibit release of genomic DNA from white blood cells include, but are not limited to, diazolidinyl urea, imidazolidinyl urea, dimethoylol-5,5-dimethylhydantoin, dimethylol urea, 2-bromo-2-nitropropane-1,3-diol, oxazolidines, sodium hydroxymethyl glycinate, 5-hydroxymethoxymethyl-1-1aza-3,7-dioxabicyclo[3.3.0]octane, 5-hydroxymethyl-1-1aza-3,7-dioxabicyclo[3.3.0]octane, 5-hydroxypoly[methyleneoxy]methyl-1-1aza-3,7-dioxabicyclo[3.3.0]octane, quaternary adamantine, and mixtures thereof. Other useful components can include a quenching agent (e.g. lysine, ethylene diamine, arginine, urea, adenine, guanine, cytosine, thymine, spermidine, or any combination thereof) which can abate free aldehyde from reacting with DNA within a sample, aurintricarboxylic acid, metabolic inhibitors (e.g. glyceraldehyde and/or sodium fluoride), and/or nuclease inhibitors. For instance, a tube can include imidazolidinyl urea (or diazolidinyl urea), EDTA and glycine. Further information about suitable collection tubes can be found in WO2013/123030 and US2010/0184069.
Other useful collection tubes are available, including but not limited to various plastic tubes: the ‘Cell-Free DNA Collection Tube’ from Roche, made of PET; the ‘LBgard blood tube’ from Biomatrica, made from plastic and suitable for up to 8.5 mL of blood; and the ‘PAXgene Blood DNA tube’ from PreAnalytiX or Qiagen. These tubes are discussed in more detail in Kerachian et al. (2021) Clinical Epigenetics 13,193 and Grolz et al. (2018) Current Pathobiology Reports 6:275-86.
These various tubes can store up to 8.5 mL of blood, or sometimes up to 10 mL. A blood sample taken from a subject may thus typically have a volume of between 5-10 mL.
A 10 mL blood sample typically yields between 10-500 ng cfDNA, but can sometimes yield substantially higher amounts e.g. up to around 10 µg, particularly in certain cancer patients. Methods disclosed herein can be performed on the amount of cfDNA contained in a 10 mL blood sample. Methods and compositions disclosed herein may typically use from 10-400 ng of cfDNA, for instance from 10-250 ng or from 10-200 ng.
Analysis of plasma-derived cfDNA is preferred. Kits for purifying cfDNA from plasma (and other bodily fluids) are readily available e.g. the MagMAX cfDNA isolation kit from ThermoFisher, the Maxwell RSC ccfDNA plasma kit from Promega, the Apostle MiniMax high efficiency isolation kit from Beckman Coulter, or the QIAamp or EZ1 products from Qiagen.
Methods disclosed herein may therefore utilise cfDNA extracted from < 10 mL blood from a subject. Methods may begin with cfDNA which has already been prepared, or may include an upstream step of preparing the cfDNA. Similarly, methods may include an upstream step of obtaining a plasma sample before a step of preparing cfDNA from the plasma sample.
Preferably, the cfDNA utilised in methods disclosed herein is substantially free of single-stranded DNA (ssDNA) i.e. where less than 7% of the cfDNA molecules (by number) are single-stranded, and preferably less than 5% or less than 1% (i.e. such that at least 99% of the cfDNA molecules are double-stranded). In some embodiments, the cfDNA contains less than 0.1% ssDNA, less than 0.01% ssDNA, or may even contain no ssDNA (i.e. free of ssDNA). Extraction of cfDNA to obtain a cfDNA sample substantially free of ssDNA is described, for example, in WO2020/188561. Ensuring low levels of ssDNA avoids potential inhibition of restriction digestion, and also avoids undesired amplification of ssDNA. Commercial kits are available for quantifying single-stranded DNA in a sample e.g. the Promega QuantiFluor™ kit.
In some embodiments, all extracted cfDNA is used in the methods disclosed herein. In other embodiments, cfDNA is split into multiple fractions, and one or more fractions is not used in the methods disclosed herein but may instead be used in other analytical methods, or is kept for use in control experiments, or for other purposes.
In some embodiments, cfDNA is quantified prior to digestion. In other embodiments, cfDNA is not quantified prior to digestion.
Measuring a methylation level
Methods of the invention comprise measuring a methylation level of a marker in the sequence listing. A ‘methylation level’ of a marker as used herein is a numerical value conveying information about the proportion or number of cfDNA molecules in a sample of cfDNA which were methylated and/or unmethylated at one or more CpG site(s) in the marker. The invention can use any method suitable for measuring a methylation level. Methods encompassed by the invention include those that comprise analysis of DNA upstream and/or downstream of the markers given in the sequence listing, so long as the methylation level of at least one CpG site in a marker in the sequence listing is measured. Preferred methods are those comprising cfDNA digestion using methylation sensitive and/or dependent restriction endonucleases (MSREs/MDREs) followed by downstream analytical steps which quantify the degree of digestion of the marker and/or of a CpG site in the marker. Preferred downstream analyses are high-throughput sequencing (also known as next-generation sequencing or NGS) or real-time PCR (rtPCR, also known as quantitative PCR or qPCR).
A methylation level can be expressed as a percentage, a fraction, a normalised value, etc. For example, a methylation level of a marker may be expressed as a percentage, ratio or fraction representing the proportion of cfDNA molecules that are methylated at one or more CpG sites in the marker out of the total number of cfDNA molecules comprising the marker. Alternatively, a methylation level of a marker may be expressed as a copy number of methylated or unmethylated cfDNA molecules comprising the marker. This may be expressed as a ‘HitspanN’ of a genomic position in the marker (explained in more detail below). As a further example, in, for instance, methods comprising rtPCR, a methylation level may be expressed as the quantification cycle (Cq) for an amplicon comprising a marker locus. In this case, the methylation level of a marker would again represent the number of cfDNA molecules comprising the marker which were methylated or unmethylated.
In methods comprising digestion with MSREs or MDREs, methylation levels can be determined according to how often the MSREs/MDREs used cleave and/or do not cleave at their recognition site during digestion. For example, where digestion used an MSRE, alignments which span a particular recognition site indicate molecules which were not cleaved, and so which (with complete digestion) were methylated at the CpG site within the recognition site. So, alignments which span a recognition site directly indicate methylation of the site when an MSRE was used for digestion (and conversely, indicate unmethylation when an MDRE was used). Further, where digestion used an MSRE, alignments which start or terminate with the cleaved recognition site indicate molecules which were unmethylated at the site (and therefore cleaved during digestion). So, alignments which start or terminate with the cleaved recognition site directly indicate unmethylation of the site when an MSRE was used for digestion (and indicate methylation of the site when an MDRE was used). A methylation level can be determined from alignments that directly indicate methylation and/or alignments that directly indicate unmethylation. Preferably, alignments that directly indicate methylation and alignments that directly indicate unmethylation are considered because this allows for greater accuracy. In methods comprising methylation-conditional nucleobase modification (see below), methylation levels can be determined according to how often a nucleobase capable of being modified by the reagent(s) is modified by the reagent(s). For instance, in embodiments comprising sodium bisulfite treatment, a methylation level of a CpG can be determined from the number of reads wherein the site has the sequence ‘TG’ instead of ‘CG’.
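The interpretation of alignments relative to a restriction site can be sketched as below. This is a simplified illustration under stated assumptions: coordinates are 0-based and half-open, digestion is taken to be complete, and an alignment whose start or end falls strictly inside the recognition site is treated as arising from cleavage (which tolerates a filled-in overhang after end repair). The function name `classify_alignment` and the coordinates are hypothetical.

```python
def classify_alignment(start: int, end: int,
                       site_start: int, site_end: int,
                       enzyme_type: str = "MSRE") -> str:
    """Classify one alignment [start, end) against a recognition site [site_start, site_end).

    Returns 'methylated', 'unmethylated' or 'uninformative'.
    """
    if enzyme_type not in ("MSRE", "MDRE"):
        raise ValueError("enzyme_type must be 'MSRE' or 'MDRE'")

    spans_site = start <= site_start and end >= site_end
    # Assumption: an alignment end lying inside the site reflects cleavage there.
    ends_in_site = (site_start < start < site_end) or (site_start < end < site_end)

    if spans_site:
        uncut = True
    elif ends_in_site:
        uncut = False
    else:
        return "uninformative"

    if enzyme_type == "MSRE":
        # An MSRE cuts only unmethylated sites, so an uncut site was methylated.
        return "methylated" if uncut else "unmethylated"
    # An MDRE cuts only methylated sites, so an uncut site was unmethylated.
    return "unmethylated" if uncut else "methylated"


# Hypothetical GCGC site occupying positions 100-103.
print(classify_alignment(50, 200, 100, 104))          # spans the site
print(classify_alignment(101, 250, 100, 104))         # starts at the cleaved site
print(classify_alignment(50, 200, 100, 104, "MDRE"))  # same alignment, MDRE digestion
```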
One way of expressing the number of reads or alignments which span a genomic position, such as the position of a CpG site within a restriction enzyme recognition site or the position of a nucleobase capable of being modified in a methylation-conditional fashion, is using ‘HitspanN’, for instance ‘Hitspan100’. The HitspanN of a genomic position corresponds to the number of reads or alignments with a size of ‘N’ nucleotides centred on the position (where N is a positive even integer). For example, in reference to sequencing data, the Hitspan100 of a genomic position refers to the number of reads or sequence alignments with a size of at least 100 nucleotides centred on the position. So, a Hitspan100 of 90 at a specific position means that there are 90 sequence reads or alignments with a size of at least 100 nucleotides centred on the position.
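One possible way of counting a HitspanN from alignment coordinates is sketched below. It rests on an assumed reading of the definition above, namely that an alignment counts if it covers the whole window of N nucleotides centred on the position. Coordinates are taken as 0-based and half-open, and the function name `hitspan` and the example alignments are hypothetical.

```python
from typing import Iterable, Tuple


def hitspan(alignments: Iterable[Tuple[int, int]], position: int, n: int = 100) -> int:
    """Count alignments [start, end) covering the N-nucleotide window centred on position."""
    if n <= 0 or n % 2 != 0:
        raise ValueError("N must be a positive even integer")
    half = n // 2
    window_start, window_end = position - half, position + half
    return sum(1 for start, end in alignments
               if start <= window_start and end >= window_end)


# Hypothetical alignments around position 1000.
alignments = [(900, 1100), (950, 1050), (960, 1100), (940, 1060)]
hitspan100 = hitspan(alignments, 1000, 100)
hitspan120 = hitspan(alignments, 1000, 120)
print(hitspan100, hitspan120)
```

A larger N gives a stricter count, since fewer alignments cover the wider window.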
A methylation level may be normalised with respect to a reference locus and/or a reference DNA sample. In some particular embodiments, the methylation level is a methylation ratio between a marker locus and a reference locus (which may be in the cfDNA being analysed or in a reference DNA sample), expressed as a ratio between signals obtained for these loci in downstream analysis following restriction digestion, methylation-conditional nucleobase modification, PCR amplification, etc.
In other embodiments, the methylation level of a marker can be calculated by dividing the HitspanN of a position in the marker by an expected HitspanN of the position (e.g., the HitspanN which would be expected if the marker was fully methylated, and thus uncleaved by an MSRE). The expected HitspanN may be determined using, for instance: (i) the HitspanN of a position in a reference locus that is not cut by the restriction enzyme; (ii) the average HitspanN of positions in a plurality of such reference loci; or (iii) the HitspanN of a position in a reference locus in an undigested reference sample (e.g. a portion of the extracted DNA reserved as undigested DNA), optionally corrected for sequencing depth differences. For example, methylation level may be inferred by comparing the HitspanN in a digested sample to the HitspanN in a reference locus in an undigested sample.
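The division of an observed HitspanN by an expected HitspanN can be illustrated as follows. The helper names, the example counts and the depth-correction factor are hypothetical, and the sketch assumes MSRE digestion so that a fully methylated marker would give the expected count.

```python
from statistics import mean
from typing import Sequence


def expected_hitspan(reference_hitspans: Sequence[float],
                     depth_correction: float = 1.0) -> float:
    """Expected HitspanN from one or more reference loci.

    depth_correction is an assumed scaling factor (e.g. the ratio of sequencing
    depths between the digested and undigested samples) for option (iii).
    """
    if not reference_hitspans:
        raise ValueError("at least one reference HitspanN is required")
    return mean(reference_hitspans) * depth_correction


def methylation_level(observed_hitspan: float, expected: float) -> float:
    """Observed HitspanN as a fraction of the expected HitspanN."""
    if expected <= 0:
        raise ValueError("expected HitspanN must be positive")
    return observed_hitspan / expected


# Option (ii): average over several uncut reference loci (hypothetical counts).
level_from_loci = methylation_level(45, expected_hitspan([88, 92, 90]))

# Option (iii): undigested reference sample sequenced at two-thirds of the depth.
level_from_undigested = methylation_level(45, expected_hitspan([60], depth_correction=1.5))
print(level_from_loci, level_from_undigested)
```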
To avoid double-counting, the number of non-methylated CpG sites can be taken as the number of sequencing reads whose 5' ends map to a site, as the number of sequencing reads whose 3' ends map to a site, or as half of the sum of sequencing reads whose 5' ends or 3' ends map to a site. As some sequencing library preparation methods (see below) can result in depletion of small fragments, which are then not sequenced (e.g., in CpG islands, where a starting cfDNA molecule is cleaved by a MSRE at more than one unmethylated site, thus providing 3 or more restriction fragments, some of which are very small), the observed number of unmethylated CpG sites may be lower than the true value in the original sample. This distortion can be somewhat addressed by using the larger of the number of reads or alignments whose 3' ends map to a site and the number of reads or alignments whose 5' ends map to a site (or by using the mean of the two).
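The alternative ways of counting unmethylated molecules at a site can be compared with a small helper. The function name, the method labels and the counts are hypothetical and serve only to show how the estimators differ.

```python
def unmethylated_count(five_prime_ends: int, three_prime_ends: int,
                       method: str = "half_sum") -> float:
    """Estimate the number of cleaved (unmethylated, for an MSRE) molecules at a site.

    five_prime_ends / three_prime_ends: reads whose 5' / 3' ends map to the site.
    """
    if method == "five_prime":
        return float(five_prime_ends)
    if method == "three_prime":
        return float(three_prime_ends)
    if method == "half_sum":
        # Each cleaved molecule can yield one 5' end and one 3' end at the site.
        return (five_prime_ends + three_prime_ends) / 2
    if method == "max":
        # Less affected when small fragments on one side are lost in library preparation.
        return float(max(five_prime_ends, three_prime_ends))
    raise ValueError(f"unknown method: {method}")


# Hypothetical counts where fragments on the 5' side were partly depleted.
print(unmethylated_count(30, 50, "half_sum"))
print(unmethylated_count(30, 50, "max"))
```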
These calculations can thus provide, for any given CpG site, the proportion of cfDNA molecules in a sample which were methylated at that CpG site. Conversely, similar calculations can provide the proportion of cfDNA molecules which were unmethylated at a particular CpG site. These figures can be expressed as a percentage, a fraction, a normalised value, etc.
The reference locus may be a different locus compared to the marker locus. For example, the reference locus and the marker locus may be present in the cfDNA from samples from the one or more first subjects. The reference locus may be in DNA from a sample other than those from the one or more first subjects and one or more second subjects, such as an artificial sample comprising a locus with a known methylation level.
Alternatively, the reference locus may be the same locus as the marker locus, with the reference locus and marker locus in different samples. For example, the marker locus may be present in cfDNA from samples from the one or more first subjects, and the reference locus may be in cfDNA from samples from the one or more second subjects. The marker locus may be present in cfDNA from samples from one of a plurality of first subjects, and the reference locus may be the same locus in cfDNA from samples from another one of the plurality of first subjects - for example, in first subjects that have different disease classifications.
Methylation level may also be determined without use of a reference locus. For example, the expected read count for a marker locus may be determined as the sum of the read count for the marker locus (indicating methylation, where an MSRE is used) with the read count of loci that start or end at the marker locus (indicating non-methylation), taking account where necessary of any end repair which took place during library preparation. Therefore, a methylation level may be determined without reference to other loci or other samples, based on the ‘raw’ or ‘absolute’ level of methylation at the marker locus.
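A reference-free calculation of this kind can be sketched as a simple proportion. The sketch assumes MSRE digestion, assumes that the count of cleaved molecules has already been adjusted for any end repair, and uses a hypothetical function name and hypothetical counts.

```python
def reference_free_methylation(spanning_reads: float, cleaved_reads: float) -> float:
    """Fraction of molecules methylated at a marker, without a reference locus.

    spanning_reads: reads covering the marker locus (uncut, so methylated for an MSRE).
    cleaved_reads:  reads starting or ending at the marker locus (cut, so unmethylated).
    The expected read count is taken as the sum of the two.
    """
    expected = spanning_reads + cleaved_reads
    if expected <= 0:
        raise ValueError("no informative reads at this locus")
    return spanning_reads / expected


level = reference_free_methylation(75, 25)
print(level)
```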
Methods of the invention may comprise a methylation-conditional nucleobase modification step in which chemical changes are made to nucleobases within DNA based on their methylation status. Such chemical changes can be detected in downstream analytical steps. A suitable downstream step is high-throughput sequencing. When the methods of the invention comprise a methylation-conditional nucleobase modification step, digestion using MSREs/MDREs is not necessary.
In some embodiments, the methylation-conditional nucleobase modification step is bisulfite conversion. In bisulfite conversion, DNA is treated with sodium bisulfite to convert unmethylated cytosine to uracil. The differences in sequence between treated and untreated DNA permit methylation to be detected. Methods of the invention may comprise bisulfite conversion (including as part of an upstream step when preparing the DNA) followed by downstream analytical steps which can distinguish uracil from cytosine in the markers of the invention. In some embodiments, the methylation-conditional nucleobase modification step is ten-eleven translocation (TET)-assisted pyridine borane sequencing (TAPS). Herein, the term “TAPS” refers to a nucleobase modification technique and does not include any particular methodology for reading the sequence of treated DNA. In TAPS, methylated cytosine is converted to dihydrouracil, which is recognised as thymine. Methods of the invention may comprise TAPS (including as part of an upstream step when preparing the DNA) followed by downstream analytical steps which can distinguish dihydrouracil from cytosine in the markers of the invention.
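The way in which converted and unconverted reads translate into a methylation level differs between the two chemistries, as the following sketch illustrates. It assumes that each read has already been reduced to the dinucleotide observed at one CpG site on the converted strand, and that a converted cytosine is read as thymine after amplification. The function name and the read counts are hypothetical.

```python
from typing import Sequence


def cpg_methylation(dinucleotides: Sequence[str], chemistry: str = "bisulfite") -> float:
    """Methylation level at one CpG from the dinucleotide seen in each read.

    bisulfite: unmethylated C is converted (read as 'TG'); 'CG' indicates methylation.
    TAPS:      methylated C is converted (read as 'TG');   'CG' indicates no methylation.
    """
    cg = sum(1 for d in dinucleotides if d == "CG")
    tg = sum(1 for d in dinucleotides if d == "TG")
    if cg + tg == 0:
        raise ValueError("no informative reads at this CpG")
    if chemistry == "bisulfite":
        methylated = cg
    elif chemistry == "TAPS":
        methylated = tg
    else:
        raise ValueError("chemistry must be 'bisulfite' or 'TAPS'")
    return methylated / (cg + tg)


# Hypothetical reads: 8 unconverted and 2 converted at the site.
reads = ["CG"] * 8 + ["TG"] * 2
print(cpg_methylation(reads, "bisulfite"))
print(cpg_methylation(reads, "TAPS"))
```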
In some embodiments, measuring a methylation level comprises the methodology described in Füllgrabe et al. (2023) Nature Biotechnol. (https://doi.org/10.1038/s41587-022-01652-0; see also WO2022/023753) in which methylation-conditional nucleobase modification is combined with particular downstream DNA sequencing steps.
However, preferred methods do not include a step of methylation-conditional nucleobase modification (also called nucleobase conversion).
DNA digestion
As mentioned above, preferred methods do not include nucleobase conversion. Instead, preferred methods disclosed herein use restriction enzymes which recognise specific sequences in double-stranded DNA and introduce a double-stranded break into the DNA. More specifically, methods and compositions disclosed herein may use MSREs and/or MDREs.
MSREs and MDREs recognise specific sequences in double-stranded target DNA and introduce a double-stranded break into the target DNA. A MSRE cleaves the target DNA only if a CpG associated with its recognition site is unmethylated, and methylation inhibits the cleavage. Conversely, a MDRE cleaves the target DNA only if a CpG associated with its recognition site is methylated. Accordingly, DNA digestion with MSREs and/or MDREs provides information about the methylation status of the CpGs within recognition sites present in the target DNA. Type II restriction endonucleases, i.e., enzymes where the double-stranded break is introduced within the recognition site, are particularly useful in the invention. The use of multiple restriction enzymes permits simultaneous digestion in parallel within a sample.
The recognition sites of MSREs and MDREs are also called ‘restriction sites’. Many MSREs and MDREs, with different restriction sites, are commercially available, so a broad coverage of CpG sites across a genome can be obtained using the appropriate combination of MSREs and/or MDREs. Because broad genomic coverage can be obtained, the use of MSREs and/or MDREs in the invention is particularly preferred, with the use of MSREs being most preferable.
In some embodiments, cfDNA from the sample is digested with MSRE(s). In some embodiments, cfDNA from the sample is digested with MDRE(s). In some embodiments, cfDNA from the sample is digested with MSRE(s) and MDRE(s). Use of MSRE(s) without any MDRE is preferred, and use of a combination of two or more MSREs is preferred, as discussed below. In embodiments involving DNA digestion, enzymes and DNA are typically incubated for a long enough period for substantially complete digestion to occur i.e., further incubation does not lead to any measurable increase in DNA cleavage. For a typical sample, this can be achieved by incubation at 37°C for 2 hours, but longer digestions can be performed if desired e.g., 3 hours, 4 hours, or longer (e.g., overnight). In some embodiments, digestion is performed for 11 hours or less e.g., for between 2-10 hours, 2-9 hours, 2-8 hours, or 2-4 hours. In other embodiments (e.g., where a collection tube is used, as discussed herein) digestion may be performed for longer periods e.g., for 12 hours or more.
Allowing a digestion reaction to substantially proceed to completion provides information about the cleavability of the restriction site of the restriction endonuclease(s) used in the reaction. For example, if a particular restriction site in a particular DNA molecule is not cleaved after complete digestion, then it can be inferred that the locus in that molecule was not cleavable. Lack of cleavage of a MSRE restriction site thus indicates that a CpG sequence which is within or overlaps with that MSRE recognition sequence was methylated, while cleavage indicates that it was unmethylated.
After digestion has occurred, it is preferred to inactivate the restriction enzymes, particularly if downstream amplification steps will be used. Many restriction enzymes can be inactivated by heating (e.g. to 65°C or 85°C) e.g., by immersing the reaction mixture in a water bath, or by subjecting the mixture to a raised temperature within a thermal cycler which can be used for subsequent PCR. Digestion reaction mixtures with cfDNA tend to have a low volume such that the temperature of the whole reaction mixture reaches the elevated temperature very quickly, leading to inactivation of the enzymes. In some embodiments heating at this temperature occurs for longer than 15 minutes, and ideally occurs for at least 20 minutes e.g., for 20-60 minutes. The temperature can exceed the temperature required for inactivation if desired, but this is not required. This heating step is adequate for complete inactivation of the restriction enzymes i.e., such that the enzymes’ digestion activity toward cleavable target cfDNA molecules under the digestion conditions employed prior to heating can no longer be measurably detected.
Preferred methods do not use restriction enzyme isoschizomers, where one of the enzymes recognizes both the methylated and unmethylated forms of the restriction site while the other recognizes only one of these forms.
Preferred methods do not use a mixture of restriction enzymes in which at least one enzyme has a recognition sequence which includes a CpG but which is neither a MSRE nor a MDRE i.e., an enzyme which digests regardless of the CpG methylation status.
Restriction enzymes
MSREs and MDREs are readily available from well-known commercial suppliers, such as ThermoFisher, New England Biolabs, Promega, etc.
MSREs include, but are not limited to: AatII, AccII, AciI, AclI, AfeI, AgeI, Aor13HI, Aor51HI, AscI, AsiSI, AvaI, BceAI, BmgBI, BsaAI, BsaHI, BsiEI, BsiWI, BsmBI, BspDI, BspT104I, BssHII, BstBI, BstUI, Cfr10I, ClaI, CpoI, DpnII, EagI, Eco52I, FauI, FseI, FspI, HaeII, HapII, HgaI, HhaI, HinP1I, HpaII, Hpy99I, HpyCH4IV, KasI, MluI, MspI, NaeI, NarI, NgoMIV, NotI, NruI, NsbI, PaeR7I, PluTI, PmaCI, PmlI, Psp1406I, PvuI, RsrII, SacII, SalI, ScrFI, SfoI, SgrAI, SmaI, SnaBI, SrfI, TspMI, ZraI, and high-fidelity (HF®) versions of any of these listed enzymes.
MDREs include, but are not limited to: BspEI, BtgZI, FspEI, GlaI, LpnPI, McrBC, MspJI, XhoI, XmaI.
Two preferred MSREs are HinP1I and AciI.
The invention also provides for the use of a plurality of restriction endonucleases, wherein the plurality consists of MSRE and/or MDRE. Thus, the plurality may include only MSREs, only MDREs, or a mixture of both (e.g. one or more MSRE plus one or more MDRE). In general, however, it is preferred to work with MSREs, without needing MDREs, and thus the plurality includes two or more MSREs. Using MSREs leads to digested cfDNA in which methylated CpG sites are intact but unmethylated CpG sites are digested. Thus, for any particular CpG-containing restriction site in a cfDNA sample, a higher percentage of methylation at this site leads to a lower extent of digestion compared to a cfDNA sample containing a lower percentage of methylation at this site.
A preferred plurality of MSREs includes both HinP1I and AciI. In some embodiments it is possible to use one or more MSREs in addition to HinP1I and AciI, but it is more preferred to use HinP1I and AciI as the only two restriction enzymes for digestion of cfDNA. The markers in the sequence listing include a restriction site for HinP1I and/or AciI. This pairing of enzymes covers over 99% of CpG islands in the human genome. With this MSRE pairing it is preferred to include HinP1I at an excess (measured in terms of enzymatic units) to AciI, and ideally an excess of at least 1.2:1 e.g. at least 1.5:1, at least 1.75:1, at least 2:1, at least 3:1, at least 4:1, or at least 5:1. Ratios between 2:1 and 5:1 are particularly useful with human cfDNA, and an excess of about 4.5:1 is preferred. Digestion can be performed at about 37°C, until completion. Incubation at 37°C for 2 hours is typically adequate for complete digestion with HinP1I and AciI.
HinP1I (sometimes known as Hin6I) recognises the sequence GCGC and cleaves after the first G to leave a two nucleotide 5' overhang (5'-G/CGC). It cuts well at 37°C and can be heat-inactivated by heating at 65°C for 20 minutes. For HinP1I, NEB recommends the use of its rCutSmart™ buffer (50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 µg/mL recombinant albumin, pH 7.9). 1 unit of HinP1I is defined as the amount of enzyme required to digest 1 µg of λ DNA in 1 hour at 37°C in a total reaction volume of 50 µL.
AciI recognises the sequence CCGC and cleaves after the first C to leave a two nucleotide 5' overhang (5'-C/CGC). It cuts well at 37°C and can be heat-inactivated by heating at 65°C for 20 minutes. For AciI, NEB recommends the use of its rCutSmart™ buffer (50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 µg/mL recombinant albumin, pH 7.9). 1 unit of AciI is defined as the amount of enzyme required to digest 1 µg of λ DNA in 1 hour at 37°C in a total reaction volume of 50 µL. Its recognition site is non-palindromic.
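Because the CCGC site is non-palindromic, an in silico search for it on one strand of a marker sequence also needs to look for its reverse complement (GCGG), whereas the palindromic GCGC site reads the same on both strands. The following sketch illustrates such a search; the function names and the example sequence are hypothetical, and positions are 0-based offsets on the given strand.

```python
import re
from typing import List

# Recognition sequences as stated in the text.
RECOGNITION_SITES = {"HinP1I": "GCGC", "AciI": "CCGC"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(sequence: str, enzyme: str) -> List[int]:
    """Start offsets of an enzyme's recognition site on either strand."""
    site = RECOGNITION_SITES[enzyme]
    motifs = {site, reverse_complement(site)}  # a palindromic site collapses to one motif
    seq = sequence.upper()
    hits = set()
    for motif in motifs:
        # A lookahead lets overlapping occurrences be reported as well.
        for match in re.finditer(f"(?={motif})", seq):
            hits.add(match.start())
    return sorted(hits)


example = "AAGCGCTTCCGCAAGCGGTT"  # hypothetical sequence
print(find_sites(example, "HinP1I"))
print(find_sites(example, "AciI"))
```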
λ DNA is a commonly used DNA substrate extracted from bacteriophage lambda (cI857ind 1 Sam 7), being 48,502 bp long. It is usually stored in 10 mM Tris-HCl (pH 8.0), 1 mM EDTA, and is widely available from commercial suppliers e.g., from NEB under catalogue number N3011S.
After digestion has occurred, it is preferred to inactivate the restriction enzymes, particularly if downstream amplification steps will be used. HinP1I and AciI can both be inactivated by heating at 65°C. In some embodiments heating at this temperature occurs for longer than 15 minutes, and ideally occurs for at least 20 minutes e.g., for 20-60 minutes. The temperature can exceed 65°C if desired, but this is not required. This heating step is adequate for complete inactivation of the restriction enzymes i.e., such that the enzymes’ digestion activity which was present during cfDNA digestion can no longer be measurably detected even when cleavable target molecules are present.
Downstream amplification
Because the marker loci disclosed herein contain differentially methylated CpG sites located within recognition site(s) of at least one MSRE and/or MDRE, differences in methylation levels between DNA sources result in differences in the degree of digestion, and consequently in different amplification patterns in subsequent amplification and quantification steps. Such differences enable distinguishing between DNA from different sources, for example, between DNA samples from subjects with lung cancer and DNA samples from healthy subjects.
Thus, methods disclosed herein may include a step of amplification (e.g., PCR) performed on the digested cfDNA. Typically, this amplification will be targeted to the marker(s) of interest. Thus, upstream and downstream primers are used which flank a CpG site of interest in the marker, and the intervening CpG-containing sequence will be amplified if it has not been digested by restriction enzymes. The resulting amplicons can then be detected e.g., using a labelled probe which is complementary to a sub-sequence within the amplicons of interest.
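The relationship between methylation and amplifiability can be illustrated with a toy model. It assumes complete digestion, and that a template molecule yields an amplicon only if none of the restriction sites between the primers has been cleaved. The function names and the methylation patterns are hypothetical.

```python
from typing import List, Sequence


def is_amplifiable(site_methylated: Sequence[bool], enzyme_type: str = "MSRE") -> bool:
    """True if a template survives digestion intact between the primers.

    site_methylated: methylation status of each restriction site within the amplicon.
    """
    if enzyme_type == "MSRE":
        # An MSRE cuts unmethylated sites: every site must be methylated.
        return all(site_methylated)
    if enzyme_type == "MDRE":
        # An MDRE cuts methylated sites: no site may be methylated.
        return not any(site_methylated)
    raise ValueError("enzyme_type must be 'MSRE' or 'MDRE'")


def intact_fraction(molecules: List[Sequence[bool]], enzyme_type: str = "MSRE") -> float:
    """Fraction of template molecules expected to remain amplifiable after digestion."""
    if not molecules:
        raise ValueError("at least one molecule is required")
    intact = sum(1 for m in molecules if is_amplifiable(m, enzyme_type))
    return intact / len(molecules)


# Hypothetical methylation patterns at two sites within one amplicon.
molecules = [[True, True], [True, False], [False, False], [True, True]]
print(intact_fraction(molecules, "MSRE"))
print(intact_fraction(molecules, "MDRE"))
```

Under these assumptions, a sample with more methylation at the marker leaves more intact template after MSRE digestion, and so gives a stronger amplification signal.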
Methods may therefore include a step of adding PCR reagents after digestion e.g., suitable buffer/salt components (if required in addition to buffer/salt remaining from digestion), a DNA polymerase (such as a Taq polymerase), dNTPs, primers and (optionally) probes. As an alternative, one or more of these components may be present during digestion e.g., it is possible to use a hot start PCR protocol, such that PCR reagents are already present during the digestion step but they do not become active until the reaction mixture is heated (e.g. during heat inactivation of the restriction enzymes).
Restriction digestion typically takes place in the presence of high levels of Mg++. PCR usually relies on Mg++, so standard PCR buffers include Mg++. In this situation, however, addition of a standard PCR buffer can lead to an excess of Mg++ which can reduce the efficiency of amplification. Thus, the added PCR reagents may include a lower level of Mg++ than would normally be the case. Where PCR primers and probes are present during MSRE digestion, they should be designed so that their sequences do not include the recognition site for the MSRE(s) which is/are being used.
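The requirement that primers and probes lack MSRE recognition sites can be checked programmatically. The following is a minimal sketch, assuming the two enzymes named above with their standard recognition sequences (GCGC for HinP1I, CCGC for AciI); the function names are hypothetical and are not part of the disclosed methods. Because the AciI site is not palindromic, its reverse complement is checked as well.

```python
# Recognition sequences (5'->3') assumed for the two MSREs named in the text.
# HinP1I (GCGC) is palindromic; AciI (CCGC) is not, so the reverse
# complement (GCGG) is searched for as well.
MSRE_SITES = {"HinP1I": "GCGC", "AciI": "CCGC"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def msre_sites_in_oligo(oligo, sites=MSRE_SITES):
    """Return the names of enzymes whose site occurs on either strand of the oligo."""
    seq = oligo.upper()
    hits = []
    for enzyme, site in sites.items():
        if site in seq or reverse_complement(site) in seq:
            hits.append(enzyme)
    return hits
```

An oligonucleotide for which this function returns an empty list contains neither site on either strand and would not be cut if present during digestion.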
Amplification and detection of amplicons may be carried out by conventional PCR using fluorescently-labeled primers followed by capillary electrophoresis of amplification products. In some embodiments, following amplification the amplification products are separated by capillary electrophoresis and fluorescent signals are quantified. An electropherogram plotting the change in fluorescent signals as a function of size (bp) or time from injection may be generated, wherein each peak in the electropherogram corresponds to the amplification product of a single locus. The peak's height (provided for example using ‘relative fluorescent units’, rFU) may represent the intensity of the signal from the amplified locus. Computer software may be used to detect peaks and calculate the fluorescence intensities (peak heights) of a set of loci whose amplification products were run on the capillary electrophoresis machine, and subsequently the ratios between the signal intensities.
A preferred PCR technique is real-time PCR (also known as qPCR), in which simultaneous amplification and detection of the amplification products are performed. Real-time PCR can be used with non-specific detection or sequence-specific detection. Non-specific detection (e.g., using a dsDNA-binding dye, such as SYBR Green) can be used within the methods disclosed herein, but is not ideal if it is desired to distinguish between multiple different amplicons in the same reaction. Thus, it is more typical to use sequence-specific detection, and methods and compositions may use a labelled oligonucleotide probe (usually with a fluorophore and fluorescence quencher on the same probe, as in the TaqMan system) which is complementary to a specific sequence within nucleic acid amplicon(s) of interest. Different probes for amplicons derived from different target CpGs can be labelled with different fluorophores so that multiple different amplicons can be distinguished.
Real-time PCR may thus be achieved by using a hydrolysis probe based on combined reporter and quencher molecules. In such assays, oligonucleotide probes have a fluorescent moiety (fluorophore) attached to their 5' end and a quencher attached to the 3' end. During PCR amplification, the polynucleotide probes selectively hybridize to their target sequences on the template, and as the polymerase replicates the template it also cleaves the polynucleotide probes due to the polymerase’s 5'-nuclease activity. When the polynucleotide probes are intact, the close proximity between the quencher and the fluorescent moiety normally results in a low level of background fluorescence. When the polynucleotide probes are cleaved, the quencher is decoupled from the fluorescent moiety, resulting in an increase of intensity of fluorescence. The fluorescent signal correlates with the amount of amplification products, i.e., the signal increases as the amplification products accumulate.
Suitable fluorophores include, but are not limited to, fluorescein, FAM, lissamine, phycoerythrin, rhodamine, Cy2, Cy3, Cy3.5, Cy5, Cy5.5, Cy7, FluorX, JOE, HEX, NED, VIC and ROX. Suitable fluorophore/quencher pairs are known in the art, including but not limited to: FAM-TAMRA, FAM-BHQ1, Yakima Yellow-BHQ1, ATTO550-BHQ2 and ROX-BHQ2. Fluorescence may be monitored during each PCR cycle, providing an amplification plot showing the change of fluorescent signals from the probe(s) as a function of cycle number. In the context of real-time PCR, the following terminology is used:
‘Quantification cycle’ (‘Cq’) refers to the cycle number in which fluorescence increases above a threshold, set automatically by software or manually by the user. In some embodiments, the threshold may be constant for each CpG locus of interest and may be set in advance, prior to carrying out the amplification and detection. In other embodiments, the threshold may be defined separately for each CpG locus after the run, based on the maximum fluorescence level detected for this locus during the amplification cycles.
‘Threshold’ refers to a value of fluorescence used for Cq determination. In some embodiments, the threshold value may be a value above baseline fluorescence, and/or above background noise, and within the exponential growth phase of the amplification plot.
‘Baseline’ refers to the initial cycles of PCR where there is little to no change in fluorescence.
Computer software is readily available for analysing amplification plots and determining baseline, threshold and Cq.
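As an illustration of how the terms above relate, the sketch below derives a Cq from an amplification plot. It assumes baseline-corrected fluorescence values, one per cycle, and a fixed threshold supplied by the user; the function name and the linear interpolation between cycles are illustrative choices, not the algorithm of any particular instrument software.

```python
def quantification_cycle(fluorescence, threshold):
    """Return the fractional cycle at which fluorescence first reaches the threshold.

    fluorescence[i] is the baseline-corrected signal measured after cycle i + 1.
    Returns None if the threshold is never reached.
    """
    for i, value in enumerate(fluorescence):
        if value >= threshold:
            if i == 0:
                # Already above the threshold at the first measured cycle.
                return 1.0
            prev = fluorescence[i - 1]
            # Linear interpolation between the two cycles bracketing the threshold.
            return i + (threshold - prev) / (value - prev)
    return None
```

For a signal that doubles every cycle, such as 1, 2, 4, 8, 16, a threshold of 6 is crossed halfway between cycles 3 and 4 under this interpolation.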
Where a CpG site has not been digested, and is thus amplified in subsequent PCR, relatively low Cq values are seen because detectable amplification products accumulate after a relatively small number of amplification cycles. Conversely, if amplicons are present at lower levels (e.g., because some CpG loci of interest were digested) then fewer amplicons are seen, and the Cq value is higher.
These results can thus indicate, for any given CpG site, the proportion of cfDNA molecules in a sample which were methylated/unmethylated at that CpG site. These figures can be expressed as a percentage, a fraction, a normalised value, etc.
Primers may vary in length, depending on the particular assay format and the particular needs. In some embodiments, the primers may be at least 15 nucleotides long, such as between 15-25 nucleotides or 18-25 nucleotides long. The primers may be adapted to be suited to a chosen amplification system.
Primers may be designed to generate amplicons between 60-150 bp long (when the relevant CpG site(s) is/are intact) e.g. between 70-140 bp long.
Oligonucleotide probes may vary in length. In some embodiments, the probes may include between 15-30 nucleotides, from 20-30 nucleotides, or from 25-30 nucleotides.
The oligonucleotide probes may be designed to bind to either strand of the double-stranded amplicons. Additional considerations include the melting temperature of the probes, which should preferably be comparable to that of the primers. Where multiple CpG sites are analysed in parallel, with simultaneous amplification of more than one target in the same reaction mixture (co-amplification) using different primer pairs for each CpG site of interest, these different primers may be designed such that they can work at the same annealing temperature during amplification. Thus, primers with similar melting temperature (Tm) can be designed e.g. within ±3-5 °C of each other. Similar considerations apply where multiple probes are used.
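The Tm-matching requirement can be illustrated with a rough screen. The sketch below uses the Wallace rule (2 °C per A/T and 4 °C per G/C), which is only a crude rule of thumb for short oligonucleotides; dedicated design software typically uses nearest-neighbour thermodynamics. The function names and the 5 °C default spread are assumptions made for illustration.

```python
def wallace_tm(oligo):
    """Approximate melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def tm_compatible(oligos, max_spread_c=5.0):
    """Return True if the estimated Tm values of all oligos lie within max_spread_c of each other."""
    tms = [wallace_tm(o) for o in oligos]
    return max(tms) - min(tms) <= max_spread_c
```

A primer set that fails this screen would be a candidate for redesign before attempting co-amplification at a single annealing temperature.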
Computer software is readily available for routine designing of primers and probes which meet the various requirements of any particular experiment.
Downstream sequencing
Methods disclosed herein may include a step of DNA sequencing, such as a step using next-generation sequencing (‘NGS’) techniques (also known as high-throughput sequencing). NGS generally involves three basic steps: library preparation; sequencing; and data processing. Examples of NGS techniques include sequencing-by-synthesis and sequencing-by-ligation (employed, for example, by Illumina Inc., Life Technologies Inc., PacBio, and Roche), nanopore sequencing methods and electronic detection-based methods such as Ion Torrent™ technology (Life Technologies Inc.). NGS may be performed using various high-throughput sequencing instruments and platforms, including but not limited to: Novaseq™, Nextseq™ and MiSeq™ (Illumina), 454 Sequencing (Roche), Ion Chef™ (ThermoFisher), SOLiD® (ThermoFisher) and Sequel II™ (Pacific Biosciences). Appropriate platform-designed sequencing adapters are used for preparing the sequencing library, and are readily available from the platforms’ manufacturers.
Library preparation for the major high-throughput sequencing platforms involves ligation of specific adapter oligonucleotides, also termed “sequencing adapters”, to the DNA fragments to be sequenced. Sequencing adapters typically include platform-specific sequences for fragment recognition by a particular sequencer e.g. sequences that enable ligated molecules to bind to the flow cells of Illumina platforms (e.g. the P5 and P7 sequences). Each sequencing instrument provider typically sells a specific set of sequences for this purpose. Further details of library preparation are discussed below.
Sequencing adapters can include sites for binding to a universal set of PCR primers. This permits multiple adapter-ligated DNA molecules to be amplified in parallel by PCR, using a single set of primers.
Sequencing adapters can include sample indices, which are sequences that enable multiple samples to be combined, and then sequenced together (i.e. multiplexed) on the same instrument flow cell or chip. Each sample index, typically 6-10 nucleotides, is specific to a given sample and is used for de-multiplexing during downstream data analysis to assign individual sequence reads to the correct sample. Sequencing adapters may contain single or dual sample indexes depending on the number of libraries combined and the level of accuracy desired. Sequencing adapters can include unique molecular identifiers (UMIs) to provide molecular tracking, error correction and increased accuracy during sequencing. UMIs are short sequences, typically 5 to 20 bases in length, used to uniquely identify original molecules in a sample library. As each nucleic acid in the starting material is tagged to provide a unique molecular barcode, bioinformatics software can filter out duplicate reads and PCR errors with a high level of accuracy and report unique reads, removing the identified errors before final data analysis.
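The roles of sample indices and UMIs in data processing can be sketched as follows. This is a simplified illustration which assumes that each read has already been reduced to a tuple of sample index, UMI, chromosome and mapped start position; it treats reads as duplicates only when all of these match exactly, whereas real pipelines usually tolerate sequencing errors within the UMI. The function name and tuple layout are hypothetical.

```python
from collections import defaultdict


def count_unique_molecules(reads):
    """Count distinct original molecules per sample.

    reads: iterable of (sample_index, umi, chrom, start) tuples.
    Reads sharing a sample index are assigned to the same sample (de-multiplexing);
    reads sharing UMI and mapping position within a sample are counted once.
    """
    seen = defaultdict(set)
    for sample_index, umi, chrom, start in reads:
        seen[sample_index].add((umi, chrom, start))
    return {sample: len(molecules) for sample, molecules in seen.items()}
```

In this scheme two reads with the same UMI and position in one sample are counted as PCR duplicates of a single molecule, while the same UMI seen under a different sample index is counted separately.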
In some embodiments, sequencing adapters include both a sample barcode sequence and a UMI.
In some embodiments, sequencing adapters allow for paired-end sequencing.
In some embodiments, the compositions and methods disclosed herein use Y-shaped sequencing adapters i.e., adapters consisting of two single-stranded oligonucleotides which anneal to provide a double-stranded stem and two single-stranded ‘arms’. In other embodiments, the compositions and methods disclosed herein use hairpin sequencing adapters i.e., a single-stranded oligonucleotide whose 5' and 3' termini anneal to provide a double-stranded stem. For both Y-shaped and hairpin adapters the double-stranded stem can include a short single-stranded overhang e.g., a single A or T nucleotide. For both Y-shaped and hairpin adapters the double-stranded stem can be ligated to a cfDNA fragment, to prepare a sequencing library.
Suitable sequencing adapters for use in the compositions and methods disclosed herein may thus be TruSeq™ or AmpliSeq™ or TruSight™ adapters (for use on the Illumina platform) or SMRTbell™ adapters (for use on the PacBio platform).
Where sequencing adapters are added by ligation, this usually occurs at both ends of the DNA to be sequenced.
Restriction digestion can leave blunt-ends, but typically produces a single-stranded overhang. Library preparation steps can either preserve this overhang (i.e., add complementary nucleotides) or remove it. As the sequence of a post-digestion terminal single-stranded overhang can include useful information then it is preferred to add sequencing adapters in a way which preserves the overhang e.g. using enzymatic ligation in which a ligase enzyme covalently links a sequencing adapter to a DNA fragment where the terminal sequence of the adapter is complementary to the terminal sequence obtained using the restriction enzyme, or by using a polymerase to add complementary nucleotides and generate a blunt-ended fragment.
In addition to removing or filling in single-strand overhangs, end repair methods carried out before adapter ligation can ensure that DNA molecules contain 5' phosphate and 3' hydroxyl groups.
For some libraries, incorporation of a non-templated deoxyadenosine 5'-monophosphate (dAMP) onto the 3' end of blunted DNA fragments is used in library preparation (a process known as dA-tailing). dA-tails prevent concatemer formation during downstream ligation steps and enable DNA fragments to be ligated to adapter oligonucleotides with complementary dT-overhangs.
As noted above, restriction digestion typically takes place in the presence of high levels of Mg++. Sequencing library preparation may also rely on Mg++, so standard library prep buffers include Mg++. In this situation, however, addition of a standard library prep buffer can lead to an excess of Mg++ which can inhibit efficiency of downstream steps. Thus, added reagents may include a lower level of Mg++ than would normally be the case for library preparation.
As an alternative approach to using lower levels of Mg++, it is possible to add a chelating agent after digestion, which can remove the need for removal or dilution of excess Mg++ for downstream amplification step(s). It has been found that the addition of a chelating agent at the concentrations disclosed herein impairs neither such amplification step(s) nor subsequent sequencing. The chelating agent can be added to provide an amplification reaction mix comprising the chelating agent and a divalent cation at a molar ratio of between 1:20 and 2:1. For instance, the reaction mix may include 8-20 mM Mg++ e.g., about 10 mM magnesium. For instance, amplification may be carried out in a reaction mix comprising 3-4 mM chelating agent and 4 mM Mg++. The chelating agent may comprise one or both of EDTA and EGTA.
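The molar ratio criterion can be checked with simple arithmetic. The sketch below assumes that both concentrations are given in mM in the final reaction mix; the function name is hypothetical and the bounds are the 1:20 and 2:1 chelating agent to divalent cation limits stated above.

```python
def ratio_in_disclosed_range(chelator_mm, cation_mm, low=1 / 20, high=2.0):
    """Return True if chelator:cation (both in mM) lies between 1:20 and 2:1 inclusive."""
    if cation_mm <= 0:
        raise ValueError("cation concentration must be positive")
    ratio = chelator_mm / cation_mm
    return low <= ratio <= high
```

For example, 3 mM chelating agent with 4 mM Mg++ gives a ratio of 0.75, which falls within the range, whereas 0.1 mM chelating agent with 10 mM Mg++ gives 0.01, which does not.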
After library preparation, the prepared DNA molecules can be sequenced, to provide a plurality of ‘sequence reads’. Sequence reads from DNA sequencing are then subjected to data processing e.g., to remove sequences which do not fulfil desired quality criteria, to remove duplicates, to correct sequencing errors, to map sequences onto a reference genome, to count the number of sequence reads, etc. Computer software is readily available for performing these steps.
The sequencing may be single-read sequencing or paired-end sequencing. Paired-end sequencing is preferred. In single-read sequencing, individual DNA strands are sequenced from one end. In paired-end sequencing, individual DNA strands are sequenced from both ends of the strand. Paired-end sequencing produces paired-end reads, wherein a single paired-end read contains a forward read derived from one end of the DNA strand and a reverse read derived from the other end of the DNA strand. The forward and reverse reads may or may not overlap.
Sequence reads can be mapped to a reference genome i.e., a previously identified genome sequence, whether partial or complete, assembled as a representative example of a species or subject. A reference genome is typically haploid, and typically does not represent the genome of a single individual of the species but rather is a mosaic of the genomes of several individuals. A reference genome for the methods of the present invention is typically a human reference genome e.g., a complete human genome, such as the human genome assemblies available at the website of the National Center for Biotechnology Information or at the University of California, Santa Cruz, Genome Browser. An example of a suitable reference genome for human studies is the GRCh38 major assembly (up to patch p13). Mapping aligns sequence reads to the reference genome, to identify the location of the reads within the reference genome. The sequence reads that align are designated as being ‘mapped’. The alignment process aims to maximize the possibility for obtaining regions of sequence identity across the various sequences in the alignment, allowing mismatches, indels and/or clipping of some short fragments on the two ends of the reads. The number of sequence reads mapped to a certain genomic locus is referred to as the ‘read count’ or ‘copy number’ of this genomic locus. It is not necessary to map all sequence reads which are obtained; indeed, it is not unusual that a portion of sequence reads obtained in any given experiment will not be mappable.
Sometimes, the forward and reverse reads of a paired-end read will map upstream and downstream of a locus but not overlap. In this situation, the 5' and 3' ends of the DNA molecule that gave rise to the paired-end read have been directly sequenced, while the sequence of the intervening DNA can be regarded as indirectly sequenced (as the genomic sequence between the mapped regions). So, a ‘sequence alignment’, or simply ‘alignment’, may contain both direct and indirect sequence information. If the sequenced DNA had been digested with an MSRE or MDRE, then the methylation status of an indirectly sequenced restriction site can be inferred from the alignment because the site must have been intact (so methylated if the DNA had been digested with an MSRE or unmethylated if the DNA had been digested with an MDRE). Therefore, in embodiments comprising restriction digestion, the analysis of sequencing data is preferably based on sequence alignments. In embodiments comprising methylation-conditional nucleobase modification, only direct sequence information can be used.
Preferably, alignments used in the analysis are less than about 600 bp and more than about 50 bp in length. More preferably, alignments are less than about 500 bp and more than about 100 bp in length, or less than about 400 bp and more than about 100 bp in length. Another way of expressing coverage that is useful in the analysis methods is to use ‘HitspanN’, such as ‘Hitspan100’.
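The inference described above, in which restriction sites lying between non-overlapping forward and reverse reads are taken to have been intact, can be sketched as follows. The sketch assumes 0-based half-open coordinates on a single chromosome, a list of known restriction site start positions, and 4 bp recognition sites; the function names are hypothetical, and the default length limits are the broadest range given above.

```python
def alignment_span(forward, reverse):
    """Return (start, end) of the fragment implied by a mapped read pair.

    forward and reverse are (start, end) intervals of the two mapped reads.
    """
    return min(forward[0], reverse[0]), max(forward[1], reverse[1])


def intact_sites(span, site_positions, site_length=4, min_len=50, max_len=600):
    """Return restriction sites wholly contained in the alignment span.

    Such sites were not cut, whether they were sequenced directly or only
    inferred from the genomic sequence between the two reads. Alignments
    outside the (min_len, max_len) length window are ignored.
    """
    start, end = span
    if not (min_len < end - start < max_len):
        return []
    return [p for p in site_positions if start <= p and p + site_length <= end]
```

With MSRE digestion, each site returned here would be scored as methylated in the originating molecule; with MDRE digestion it would be scored as unmethylated.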
Any particular CpG site can feature in multiple sequence reads, which can be sequence reads derived from the same original cfDNA molecule and/or from different cfDNA molecules which span the same CpG site. Sequencing is suitably performed such that CpG site(s) of interest is/are seen in at least 100 sequence reads e.g., in at least 200, 300, 400, 500, 600, 700 or more sequence reads.
The term “genomic locus” refers to a specific location within the genome, and may include a single position (a single nucleotide at a defined position in the genome) or a stretch of nucleotides starting and ending at defined positions in the genome. The specific position(s) may be identified by the molecular location, namely, by the chromosome and the numbers of the starting and ending base pairs on the chromosome. A genomic locus of interest herein contains at least one CpG site.
To avoid double-counting, the non-methylated CpG sites can be taken as sequencing reads whose 5’ ends map to a site, as sequencing reads whose 3’ ends map to a site, or as the half of the sum of sequencing reads whose 5’ ends or 3’ ends map to a site (see above).
Sequencing may optionally be preceded by a step of ‘hybrid capture’ (also known as ‘hybridization capture’) to enrich the sample to be sequenced for DNA molecules comprising regions of interest, such as one or more markers of the sequence listing. In hybrid capture, a sample of DNA molecules, such as the prepared DNA molecules of the sequencing library, is captured by allowing the DNA molecules to hybridize with single-stranded oligonucleotide ‘baits’ or ‘probes’ specific for the regions of interest. The baits may be immobilized on a solid support to capture the DNA molecules. Preferably, however, the hybridization is carried out in solution with baits that comprise a tag, allowing the subsequent isolation of the DNA:bait hybrids. For instance, the baits may be biotinylated and the DNA:bait hybrids isolated by allowing them to bind to the surface of streptavidin-coated magnetic beads. For both solid-phase and solution hybridization, DNA molecules that do not bind specifically to the immobilised baits are washed away and the specifically bound molecules subsequently eluted. The resultant DNA sample, which can then be sequenced, is enriched for molecules comprising the regions of interest. RNA baits are preferred because RNA:DNA duplexes hybridize more efficiently and are more stable than DNA:DNA duplexes.
Thus, in preferred embodiments, the prepared DNA molecules are subjected to hybrid capture prior to sequencing e.g. using biotinylated RNA bait molecules specific for one or several markers of the sequence listing (or genomic regions close to or overlapping the sequence listing markers).
Methods disclosed herein do not require differential adapter tagging of methylated vs. unmethylated DNA molecules. The same population of adapters can be used for all molecules.
Systems and kits
The invention also provides various systems and kits.
A system can comprise computer processor(s) for performing and/or controlling the methods disclosed herein, and/or for processing the results e.g., for performing calculations based on the results. Methods which are at least partially computer-implemented are provided.
A system or kit may comprise: a blood, plasma or serum sample of a human subject; components for carrying out a method disclosed herein on at least one CpG site; and computer software stored on a non-transitory computer readable medium, the computer software being able to direct a computer processor to determine a methylation level for the at least one CpG locus based on the methylation assay. The software may also be able to link the methylation level to a diagnostic result or prediction e.g. by comparing one or more methylation level(s) to one or more reference levels to assess the presence of a disease in the subject. The computer software may receive data from a qPCR and/or a NGS experiment.
Components for carrying out a method disclosed herein encompass biochemical components (e.g., enzymes, primers, probes, NTPs, etc.), chemical components (e.g., buffers, reagents), and technical components (e.g., a PCR system, such as a real-time PCR system, and equipment such as tubes, vials, plates, pipettes). The system may be able to prepare and/or communicate a report to the subject and/or to a healthcare provider of the subject, based on the methylation levels.
Computer software includes processor-executable instructions that are stored on a non-transitory computer readable medium. The computer software may also include stored data. The computer readable medium is a tangible computer readable medium, such as a compact disc (CD), magnetic storage, optical storage, random access memory (RAM), read only memory (ROM), or any other tangible medium.
Computer-related methods and steps described herein are implemented using software stored on non-volatile or non-transitory computer readable instructions that when executed configure or direct a computer processor or computer to perform the instructions.
Each of the system, server, computing device, and computer described in this application can be implemented on one or more computer systems and be configured to communicate over a network. They all may also be implemented on one single computer system. In one embodiment, the computer system includes a bus or other communication mechanism for communicating information, and a hardware processor coupled with bus for processing information.
A computer system also includes a main memory, such as a random-access memory (RAM) or other dynamic storage device, coupled to bus for storing information and instructions to be executed by processor. Main memory also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor. Such instructions, when stored in non-transitory storage media accessible to processor, render computer system into a special-purpose machine that is customized to perform the operations specified in the instructions.
A computer system can include read only memory (ROM) or other static storage device coupled to bus for storing static information and instructions for processor. A storage device, such as a magnetic disk or optical disk, is provided and coupled to bus for storing information and instructions.
A computer system may be coupled via bus to a display, for displaying information to a computer user.
An input device, including alphanumeric and other keys, can be coupled to bus for communicating information and command selections to processor. Another type of user input device is cursor control, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor and for controlling cursor movement on display.
Methods disclosed herein may be performed by a computer system in response to the processor executing one or more sequences of one or more instructions contained in main memory. Such instructions may be read into main memory from another storage medium, such as storage device. Execution of the sequences of instructions contained in main memory causes the processor to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
Suitable storage media include any non-transitory media that store data and/or instructions that cause a machine to operate in a specific fashion. Common forms of storage media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge.
Storage media are distinct from, but may be used in conjunction with, transmission media. Transmission media participate in transferring information between storage media. For example, transmission media include coaxial cables, copper wire and fiber optics, including the wires that comprise bus.
The invention also provides a kit comprising: (i) a composition comprising one or more restriction enzymes; and (ii) components for analysing cfDNA which has been digested with the composition. These components may be e.g. components for performing PCR, or for preparing a sequencing library from digested cfDNA. For instance, the kit may include one or more of: (a) a buffer solution e.g. with 50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 µg/mL recombinant albumin, pH 7.9, or with 50 mM Tris-HCl, 10 mM MgCl2, 100 mM NaCl, 100 µg/mL recombinant albumin, pH 7.9; (b) a DNA polymerase, dNTPs, primers and, optionally, one or more probes; (c) sequencing adapters; (d) an enzyme solution, including a DNA ligase and/or a DNA polymerase; and/or (e) control DNA.
A kit may include an instruction manual for carrying out the methods as disclosed herein.
A kit may include a non-transitory computer readable medium storing a computer software comprising instructions that when executed configure or direct a computer processor to perform the method steps disclosed herein.
Controls
Methods disclosed herein can take advantage of positive and negative controls. In some embodiments, parallel analysis can be performed on one or more of:
• A DNA control which does not contain a recognition sequence for the restriction enzymes used for digestion. If this DNA is digested, this indicates that the method has not performed correctly.
• A DNA control which contains a fully methylated recognition sequence for the restriction enzymes used for digestion. If this DNA is digested when a method uses only MSREs, this indicates that the method has not performed correctly (and conversely for MDREs).
• A DNA control which contains a fully unmethylated recognition sequence for the restriction enzymes used for digestion. If this DNA is not fully digested when a method uses only MSREs, this indicates that the method has not performed correctly (and conversely for MDREs).
These DNA controls can also be used as a reference point for analysis, for checking completeness of digestion, etc. As mentioned above, for instance, if fragments are obtained using MSRE digestion then it can be useful in a downstream NGS experiment to know the expected read count, and one way of obtaining this value is to look at the read count for DNA which does not contain the recognition sequence for the MSRE, or at the read count for DNA which contains the recognition sequence but is fully methylated.
For these purposes, it is preferred that the DNA control should be similar in size and composition to cfDNA molecules which contain CpG sites of interest. Thus, although it is possible to use synthetic DNA or PCR amplicons or bacterial plasmid DNA as an unmethylated control, these are more useful if they have sizes which are similar to cfDNA (e.g., a long synthetic DNA, or an appropriately-sized restriction fragment prepared from a plasmid).
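The expected behaviour of the three DNA controls listed above can be summarised as a simple lookup. The sketch below is illustrative only: it assumes each control has already been scored as digested or not digested (for example from qPCR or read counts), and the control names, dictionary layout and function name are hypothetical.

```python
# Expected digestion outcome for each control, per enzyme type.
# An MSRE cuts only unmethylated sites; an MDRE cuts only methylated sites.
EXPECTED_DIGESTION = {
    "MSRE": {"no_site": False, "fully_methylated": False, "fully_unmethylated": True},
    "MDRE": {"no_site": False, "fully_methylated": True, "fully_unmethylated": False},
}


def failed_controls(enzyme_type, observed):
    """Return the sorted names of controls whose observed digestion differs from expectation.

    observed maps each control name to True (digested) or False (not digested).
    """
    expected = EXPECTED_DIGESTION[enzyme_type]
    return sorted(name for name, value in expected.items() if observed[name] != value)
```

An empty result indicates that all three controls behaved as expected; any listed control suggests that the method has not performed correctly.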
Control experiments can be performed internally in a sample, or externally. For an internal control, control DNA can be present in a sample already (e.g., cfDNA containing a CpG site which is known to be ubiquitously (un)methylated, or cfDNA which does not contain a recognition sequence for the restriction enzymes being used) and/or can be added (e.g., synthetic DNA, added to cfDNA). The control DNA can therefore be processed in combination with the cfDNA, and experiences the same conditions as the cfDNA, and so a method can involve co-amplification of a locus including a restriction site and a control locus. For an external control, control DNA is subjected to the same treatment as the cfDNA but not as part of the same reaction mixture.
Thus, control DNA, like cfDNA, can be digested with restriction enzymes and then subjected to downstream analytical steps e.g., amplification, DNA sequencing, etc. Real-time PCR of suitable control loci can give a result that can be used as a reference point. For instance, the signals obtained from cfDNA at a CpG site of interest and from control DNA (in particular, from control DNA which is not digested by the restriction enzymes being used) can be compared, and the signal ratio can be used to determine the degree of methylation at a CpG site of interest, because the ratio of signal reflects the ratio of methylation. Thus, methods disclosed herein can be performed without requiring evaluation of absolute methylation levels at genomic loci, but rather by calculating a signal ratio between the analyzed genomic loci and a control. This contrasts with some conventional methods of methylation analysis for distinguishing between tumor-derived and normal DNA, which require determining actual methylation levels at specific genomic loci. The methods disclosed herein can thus eliminate the need for standard curves and/or additional laborious steps involved in determination of absolute methylation levels, thereby offering a simple and cost-effective procedure. An additional advantage when using an internal control is that signal ratios are obtained for loci amplified in the same reaction mixture under the same reaction conditions, which can help to eliminate sources of potential error (e.g. the potential for differences between reaction mixtures, such as the concentration of template, enzyme, etc.).
Methods which use qPCR may therefore involve calculating signal intensity ratios between a CpG site co-amplified after digestion of DNA as disclosed herein, thereby providing a methylation status for the CpG site. This methylation level can then be compared to reference levels (e.g., obtained from healthy subjects, or from subjects having a known disease) and, based on the comparison, a diagnostic result can be derived. Thus, a method may involve: co-amplifying from restriction endonuclease-digested DNA a CpG site and a control locus, thereby generating co-amplification products; determining a signal intensity for each generated co-amplification product; and calculating a ratio between the signal intensities of the co-amplification products of the CpG site and the control locus.
The ratio between the signal intensities of the co-amplification products may be calculated by determining the quantification cycle (Cq) for each locus and calculating $2^{(Cq_{\text{control locus}} - Cq_{\text{CpG site}})}$. In other words, the reduction in Cq relative to the control locus is determined, and this value is used as the exponent of 2 to calculate the ratio. In some embodiments, the difference in Cq for a marker of interest and a control locus (ΔCq) is at least 2 cycles.
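The Cq-based calculation described above can be illustrated with a minimal sketch. This is not the applicants' software: the function name, parameter names and example Cq values are hypothetical and chosen only to show the arithmetic.

```python
def signal_ratio(cq_control: float, cq_cpg: float) -> float:
    """Signal of the CpG-site amplicon relative to the control locus.

    Implements 2 ** (Cq_control - Cq_CpG): the reduction in Cq relative
    to the control locus is used as the exponent of 2.
    """
    return 2.0 ** (cq_control - cq_cpg)


# Hypothetical example values (not taken from the source):
# the CpG site crosses threshold 2 cycles later than the control locus,
# so its signal is a quarter of the control signal.
example_ratio = signal_ratio(cq_control=25.0, cq_cpg=27.0)
print(example_ratio)
```

A ratio of 1 means both loci reached the threshold in the same cycle; each additional cycle of delay at the CpG site halves the ratio.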
Thus, using qPCR or sequencing, it is possible, based on the degree of digestion at any particular CpG site, to derive a numerical value which represents the degree of methylation of that CpG site in a cfDNA sample. This value may be expressed in a variety of ways e.g., as a ratio or percentage of the cfDNA molecules that are methylated at a CpG site, or as an intensity of a signal obtained from a particular CpG site, or as the ratio between a CpG site and a control locus, etc.
PRC2
Several markers in the sequence listing are located in, or close to, regions of the genome that are targeted by polycomb repressive complex 2 (PRC2). PRC2 is a protein complex that methylates histone H3 at lysine 27 (H3K27). The constitutive subunits of PRC2 are polycomb protein SUZ12, histone-lysine N-methyltransferase EZH1 or histone-lysine N-methyltransferase EZH2, polycomb protein EED and histone binding protein RBBP4 or histone binding protein RBBP7. PRC2 may also comprise, as accessory subunits, zinc finger protein AEBP2 and protein Jumonji (Jarid2); or one of the PCL proteins (PHF1, MTF2 or PHF19), and EPOP or PALI1/PALI2.
Therefore, marker loci according to the invention can comprise the genomic loci targeted by PRC2. Marker loci according to the invention also comprise genomic loci that are located fewer than 500 bp, 1000 bp, 2000 bp, 5000 bp, 10000 bp or 20000 bp from a genomic locus targeted by PRC2.
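The proximity criterion above can be sketched as a simple interval check. This is an illustrative sketch only: the closed-interval coordinate convention, the function names and the example coordinates are assumptions, and the PRC2 target intervals would in practice come from data such as ChIP experiments.

```python
from typing import Iterable, Tuple

# Assumed convention: (start, end) on one chromosome, both ends inclusive.
Interval = Tuple[int, int]


def distance_to_interval(position: int, interval: Interval) -> int:
    """Distance in bp from a position to an interval (0 if inside it)."""
    start, end = interval
    if position < start:
        return start - position
    if position > end:
        return position - end
    return 0


def near_prc2_target(position: int,
                     targets: Iterable[Interval],
                     max_distance: int = 500) -> bool:
    """True if the position lies fewer than max_distance bp from any target."""
    return any(distance_to_interval(position, t) < max_distance
               for t in targets)


# Hypothetical target region and marker positions.
targets = [(1000, 2000)]
print(near_prc2_target(2400, targets, max_distance=500))
```

The `max_distance` argument can be set to any of the thresholds listed above (500 bp to 20000 bp); a position inside a target interval has distance 0 and therefore always qualifies.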
Genomic loci targeted by PRC2 are known in the art and include loci identified as being targeted by any constitutive subunit of PRC2, such as polycomb protein SUZ12, or any accessory subunit of PRC2, such as Jarid2. Loci targeted by PRC2 or any of the subunits of PRC2 can be identified using methods known in the art, such as, but not limited to, chromatin immunoprecipitation (ChIP) followed by microarray analysis or sequencing. Loci targeted by PRC2 may also be identified by inference, based on the H3K27 methylation of associated nucleosomes (loci associated with high levels of H3K27 methylation are likely to be PRC2 targets because PRC2 catalyses this methylation). H3K27 methylation can be associated with genomic loci, for instance, by performing ChIP using antibodies that selectively recognise tri-methylated H3K27 (i.e., ‘H3K27me3’, the product of PRC2 methylation) followed by microarray analysis or sequencing.
As targeting by PRC2 identifies cancer markers, the H3K27 methylation activity of PRC2 may be involved in the development of lung cancer. Accordingly, the invention also encompasses methods for the treatment or prevention of lung cancer comprising regulating the activity of PRC2. In preferred embodiments, the regulating is achieved by contacting at least one subunit of PRC2, such as SUZ12, with a therapeutic compound. The at least one subunit contacted may be a constitutive or an accessory subunit. Preferably, the therapeutic compound affects the genomic targeting of PRC2. Additionally or alternatively, the therapeutic compound may regulate the methyltransferase activity of PRC2. For instance, the therapeutic compound may be able to interact with the methyltransferase active site in EZH2 and/or EZH1 and inhibit methyltransferase activity. In other embodiments, the therapeutic compound may allosterically regulate the methyltransferase activity of PRC2, for instance, by interacting with EED.
Treatment
The invention provides a method for treating or managing lung cancer in a human subject, comprising determining a likelihood of the presence of lung cancer as above, and administering, deciding to administer, or recommending the administration of, a suitable treatment to the subject based on the likelihood. The treatment may comprise administration of one or more of: adagrasib, afatinib dimaleate, alectinib, amivantamab, atezolizumab, bevacizumab, brigatinib, capmatinib, carboplatin, cemiplimab, ceritinib, cisplatin, crizotinib, dabrafenib mesylate, dacomitinib, docetaxel, doxorubicin hydrochloride, durvalumab, entrectinib, erlotinib hydrochloride, etoposide, everolimus, fam-trastuzumab deruxtecan, gefitinib, gemcitabine, ipilimumab, lorlatinib, lurbinectedin, methotrexate, mobocertinib, necitumumab, nivolumab, osimertinib mesylate, paclitaxel, pembrolizumab, pemetrexed, pralsetinib, ramucirumab, selpercatinib, sotorasib, tepotinib, topotecan hydrochloride, trametinib, tremelimumab, vinorelbine (or pharmaceutically acceptable salts or derivatives of any of these molecules). Preferred treatment combinations are the carboplatin and paclitaxel combination and the gemcitabine and cisplatin combination. The method of administration of any chemotherapy, immunotherapy or targeted therapy depends on the type, grade and stage of the cancer being treated.
The type of treatment may be determined by skilled practitioner(s) according to characteristics of the tumor, including the type, stage and grade of the tumor. The type of treatment is determined typically also based on additional factors such as characteristics of the patient.
General
The practice of the present invention will employ, unless otherwise indicated, conventional methods of chemistry, biochemistry, and molecular biology, within the skill of the art. Such techniques are explained fully in the literature. See, e.g., Methods In Enzymology (Academic Press, Inc.), Green & Sambrook (2012) Molecular Cloning: A Laboratory Manual, 4th edition (Cold Spring Harbor Press), Ausubel et al. (eds) Short protocols in molecular biology, 5th edition (Current Protocols), Molecular Biology Techniques: An Intensive Laboratory Course, (Ream & Field, eds., 1998, Academic Press), Wilson and Walker's Principles and Techniques of Biochemistry and Molecular Biology (Hofmann & Clokie, 2018), Basic Molecular Biology & Techniques - Recent Advances: Molecular Biology & Its Technique (Singh et al., 2021), etc.
The term ‘comprising’ encompasses ‘including’ as well as ‘consisting’ e.g. a composition ‘comprising’ X may consist exclusively of X or may include something additional e.g. X + Y.
The term ‘about’ in relation to a numerical value x is optional and means, for example, x±10%.
The word ‘substantially’ does not exclude ‘completely’ e.g., a composition which is ‘substantially free’ from Y may be completely free from Y. Where necessary, the word ‘substantially’ may be omitted from the definition of the invention.
The term ‘between’ with reference to two values includes those two values e.g., the range ‘between’ 10 mg and 20 mg encompasses inter alia 10, 15, and 20 mg.
Unless specifically stated, a method comprising a step of mixing two or more components does not require any specific order of mixing. Thus, components can be mixed in any order. Where there are three components then two components can be combined with each other, and then the combination may be combined with the third component, etc.
The various steps of methods may be carried out at the same or different times, in the same or different geographical locations, e.g., countries, and by the same or different people or entities.
EXAMPLES
Example 1. Markers for the detection of lung cancer
To identify markers for lung cancer, we firstly constructed a novel atlas through whole genome sequencing (WGS) to map changes in CpG methylation associated with early-stage lung cancer.
Samples from early-stage lung cancer subjects (>50% stage I) and cancer-free control subjects, all high risk by US Preventive Services Task Force (USPSTF) criteria, were acquired from academic and commercial biobanks. Tissue (tumor and normal lung), whole blood (WB) and plasma samples were acquired for the lung cancer cases. WB and plasma samples were acquired for the control cases.
Each sample underwent digestion with methylation sensitive restriction enzymes followed by standard library preparation and sequencing (herein denoted “ECS”). Sequencing of cfDNA was performed at an average depth of 600x; tissue and WB were sequenced at 80x. Sequence reads were mapped to the human genome. Data analysis was performed using customized software and machine learning algorithms.
Mapping rate was 99.6%, 99.7% and 85.7% and unique mapping rate was 94.1%, 94.3% and 81.4% for WGS, ECS, and BS (bisulfite sequencing) samples, respectively. Copy number integrity showed Pearson correlations of 0.9 for ECS and 0.67 for BS. Somatic mutation analysis identified a subset of cases with relatively high cfDNA shedding that were associated with larger tumors, older age and squamous cell carcinoma histology. This subset was further used to identify tumor-derived, plasma-based markers and assess fragmentation with high confidence.
In addition, plasma samples were collected from 81 cancer subjects and 54 high-risk individuals without cancer. Extracted DNA was digested with methylation-sensitive endonucleases and sequenced at an average depth of 600x. HitSpan100 values were determined for CpG sites i.e. the number of sequence alignments with a size of at least 100 nucleotides centred on a CpG site.
Overall, methylation levels for CpG markers were measured in plasma & lung tissue from healthy subjects, and in plasma & tissue from subjects known to have early-stage lung cancer. Comparisons were performed both on an early sample set (28 controls, 36 lung cancer patients) and a later set (90 controls, 93 cases). Methylation levels were analyzed and compared in various ways. For instance, methylation levels were rank ordered using Student’s t-test, to compare the mean HitSpan100 in plasma samples taken from lung cancer patients or from healthy controls (and produce an FDR-corrected p-value to compare those means i.e. to indicate how likely it is that the mean HitSpan100 in the plasma of lung cancer patients and in healthy controls is the same). A logistic regression classifier with Lasso regularization was trained on 100,000 loci, and performance was examined by mean AUC using 5-fold cross validation. In another analysis, AUC values for a ROC curve were determined to assess a CpG marker’s ability to distinguish lung cancer samples from healthy controls. By this AUC comparison, markers which are hyper-methylated in cancer should have AUC>0.5, whereas markers which are hypo-methylated are expected to have AUC<0.5, so to compare the AUC of a hypo-methylated marker to the AUC of a hyper-methylated marker its AUC should be subtracted from 1 (e.g. a hypo-methylation marker with AUC=0.1 is equivalent in its usefulness to a hyper-methylation marker with AUC=0.9).
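The AUC comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the customized software used in the Example: the function names and the per-sample values are hypothetical, and the AUC is computed by the standard pairwise (Mann-Whitney) formulation, with ties counted as one half.

```python
from typing import Sequence


def auc(cancer: Sequence[float], control: Sequence[float]) -> float:
    """ROC AUC: probability that a randomly chosen cancer sample has a
    higher marker value than a randomly chosen control (ties count 0.5)."""
    wins = 0.0
    for c in cancer:
        for h in control:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(cancer) * len(control))


def comparable_auc(auc_value: float, hypo_methylated: bool) -> float:
    """Put hypo- and hyper-methylated markers on the same scale by
    subtracting the AUC of a hypo-methylated marker from 1."""
    return 1.0 - auc_value if hypo_methylated else auc_value


# Hypothetical per-sample marker values for one CpG site.
cancer_values = [12.0, 15.0, 9.0]
control_values = [3.0, 5.0, 9.0]
print(auc(cancer_values, control_values))
```

With this convention, a hypo-methylated marker with AUC=0.1 and a hyper-methylated marker with AUC=0.9 both map to the same comparable value.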
In a further analysis, an updated lung cancer atlas was constructed by collecting additional plasma samples from cancer subjects and high-risk individuals without cancer, and processing these as described above. After rigorous filtering, the updated lung cancer atlas includes a total of 79 tumour tissues, 88 normal lung tissues, 89 plasma cancers and 128 plasma controls, that were used for marker development.
In this further analysis, comparisons were performed on a set of 128 controls and 89 cases. Methylation levels were analyzed and compared as described above, but without training a logistic regression classifier. The performance of the markers was also computationally validated using a second dataset of plasma samples sequenced at a depth of 50x, which included plasma samples from 149 cancer patients and 276 control subjects.
From these analyses, 39636 CpG markers of particular interest were found i.e. about 1/1000 of the CpG sites present in the human genome. Around 9000 are hypo-methylated in cancer samples, and the remainder are hyper-methylated. The full list of markers is shown in the sequence listing. The markers have at least one of the following properties: (i) an AUC well above or below 0.5 for hyper-methylated and hypo-methylated markers in the early sample set; (ii) an AUC well above or below 0.5 for hyper-methylated or hypo-methylated markers in the later sample set; (iii) an AUC well above or below 0.5 for hyper-methylated and hypo-methylated markers in the further analysis; (iv) a p-value <0.01 in the t-test. However, in some cases a marker’s AUC value or p-value did not meet these criteria when comparing all cancer samples to all controls, but it was still selected where it was found to be useful for classification as being informative for identifying only a specific subset of the cancers.
A machine learning model trained on CpG markers in category (iii) performed with high accuracy in discriminating lung cancer patients from high-risk healthy individuals.
Many of the markers have a low background (i.e. low methylation levels in plasma from healthy patients) which means that they could not have been detected using bisulfite conversion.
It will be understood that the inventors’ work has been described above by way of example only and modifications may be made while remaining within the scope and spirit of the invention.

Claims

1. A method for determining a likelihood of the presence of lung cancer in a human subject, comprising:
(a) measuring, in cell-free DNA (cfDNA) from a sample of the subject, a methylation level for at least one marker of the sequence listing; and
(b) comparing the measured methylation level to an index methylation level for the same marker in at least one known source, thereby determining the likelihood based on the comparison.
2. The method according to claim 1, wherein the known source is cfDNA from: i) one or more individuals without lung cancer; ii) one or more individuals known to have lung cancer; iii) one or more individuals known to have a particular type of lung cancer, optionally wherein the type is non-small cell lung cancer or small cell lung cancer; iv) one or more individuals known to have a particular stage of lung cancer; and/or v) one or more individuals known to have a particular grade of lung cancer.
3. The method of claim 1 or 2, wherein the known source is cfDNA from one or more individuals known to have a high-grade lung cancer, optionally wherein the high-grade is 2 or higher.
4. The method of any preceding claim, wherein the measured methylation level is compared to more than one index methylation level, each index level being for the same marker in a different known source.
5. The method of any preceding claim, wherein the sample is a blood sample.
6. The method of any one of claims 1-4, wherein the sample is a plasma sample.
7. The method of any preceding claim, further comprising, before (a), a step of preparing the cfDNA from the sample.
8. The method of any preceding claim, wherein measuring a methylation level comprises:
(A) digesting the cfDNA with at least one methylation-sensitive restriction endonuclease and/or at least one methylation-dependent restriction endonuclease, to produce digested cfDNA; and
(B) quantifying a degree of digestion at the at least one marker within the digested cfDNA, thereby measuring the methylation value.
9. The method of claim 8, wherein the digesting is performed with at least one methylation-sensitive restriction endonuclease.
10. The method of claim 9, wherein the digesting is performed with HinP1I and AciI.
11. The method of any one of claims 8-10, wherein the quantifying a degree of digestion at the at least one marker comprises: performing real-time PCR (rtPCR) on the digested cfDNA to amplify a genomic locus comprising a CpG site within the marker; or performing high-throughput sequencing on the digested cfDNA to provide sequencing data and analysing the sequencing data to quantify the degree of digestion.
12. The method of any one of claims 1-4 or 7, wherein: a) the sample is plasma; and b) measuring a methylation level comprises: i) digesting the cfDNA with HinP1I and AciI to produce digested cfDNA; and ii) quantifying a degree of digestion at the at least one marker within the digested cfDNA by performing real-time PCR (rtPCR) on the digested cfDNA to amplify a genomic locus comprising a CpG site within the marker, thereby measuring the methylation value.
13. A composition comprising: a) at least one primer pair, wherein a primer pair is for amplifying a CpG site within a marker of the sequence listing to generate an amplification product; and optionally b) at least one oligonucleotide probe for obtaining a signal intensity for each amplification product in a real-time PCR using said primer pairs.
14. A kit comprising: the composition of claim 13; and (i) one or more methylation-sensitive and/or methylation-dependent restriction endonucleases and/or (ii) PCR reagents.
15. A method for treating or managing lung cancer in a human subject, comprising determining the likelihood according to the method of any one of claims 1-12; and administering, deciding to administer, or recommending the administration of a suitable treatment to the subject based on the likelihood.
16. The method of claim 15, wherein the suitable treatment is one or more of: surgical resection, including wedge resection, segmental resection, sleeve resection, lobectomy and pneumonectomy; laser therapy; photodynamic therapy; cryosurgery; electrocautery; chemotherapy; radiation therapy; immunotherapy; and targeted drug therapy.
17. The method of claim 16, further comprising categorizing the human subject as having residual disease or tumor viable cells, and administering, deciding to administer or recommending the administration of a second-line therapy to the subject.
18. The method of claim 17, wherein the second-line therapy is one or more of: surgical resection, including wedge resection, segmental resection, sleeve resection, lobectomy and pneumonectomy; laser therapy; photodynamic therapy; cryosurgery; electrocautery; chemotherapy; radiation therapy; immunotherapy; and targeted drug therapy.
19. A method for treating or managing high-grade lung cancer in a human subject, comprising determining a likelihood of the presence of lung cancer according to the method of any one of claims 1-12 (particularly claim 3) and administering, deciding to administer, or recommending the administration of a suitable treatment to the subject based on the likelihood.
20. The method of any one of claims 15-19, further comprising identifying the subject as non-responsive to the treatment and modifying, deciding to modify or recommending modification of the treatment.
21. A method for evaluating the response of a human subject to lung cancer treatment comprising administering a suitable treatment to the subject and determining a likelihood of the presence of lung cancer in the subject after treatment according to any one of claims 1-12.
PCT/IL2024/050563 2023-06-06 2024-06-06 Markers Pending WO2024252401A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB2308414.8 2023-06-06
GBGB2308414.8A GB202308414D0 (en) 2023-06-06 2023-06-06 Markers

Publications (1)

Publication Number Publication Date
WO2024252401A1 (en) 2024-12-12

Family

ID=87156940

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IL2024/050563 Pending WO2024252401A1 (en) 2023-06-06 2024-06-06 Markers

Country Status (2)

Country Link
GB (1) GB202308414D0 (en)
WO (1) WO2024252401A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005040399A2 (en) * 2003-10-21 2005-05-06 Orion Genomics Llc Methods for quantitative determination of methylation density in a dna locus
WO2022157764A1 (en) * 2021-01-19 2022-07-28 Nucleix Ltd. Non-invasive cancer detection based on dna methylation changes

Also Published As

Publication number Publication date
GB202308414D0 (en) 2023-07-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24818925

Country of ref document: EP

Kind code of ref document: A1