WO2025181348A1 - Method for determining the origin of circulating dna - Google Patents
- Publication number
- WO2025181348A1 (PCT/EP2025/055553)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- cfdna
- cancer
- origin
- dna
- cell
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6881—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for tissue or cell typing, e.g. human leukocyte antigen [HLA] probes
- C12Q1/6844—Nucleic acid amplification reactions
- C12Q1/6858—Allele-specific amplification
- C12Q1/6869—Methods for sequencing
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
Definitions
- The present invention relates to methods of determining the origin of cell-free (circulating) DNA using a nanopore sequencer.
- Fragmentation patterns can also be used to understand the origin of circulating DNA, although this is complicated by the fact that DNase activity is not restricted to these cells but also occurs during circulation. Unlike bisulfite sequencing, which causes additional fragmentation of the input DNA and has been used in many circulating cancer DNA studies, nanopore sequencing preserves native fragmentation properties.
- One of the important global fragmentation changes associated with cancer is the shortening of typical mono-nucleosome and di-nucleosome fragments. While this shortening disproportionately affects DNA derived from cancer cells relative to other, non-cancer cell types, it appears to affect non-cancer DNA as well.
- FIGURE 1 Nanopore sequencing of a selected subset of 37 cancer samples and 21 healthy samples.
- A ichorCNA tumor fractions for all samples sequenced using Oxford Nanopore Technologies (ONT).
- C H3.1 nucleosome levels vs. Illumina tumor fractions or ONT tumor fractions, for the same set of samples.
- D Comparison of ONT ichorCNA cancer detection and DNA methylation-based cancer detection.
- FIGURE 2 Identification of cancer-associated fragment length features.
- A-B Principal component analysis of all cancer and healthy samples, using either unnormalized (A) or z-score normalized (B) counts for each fragment length bin. Lines indicate values above 50 for PC1 and values above -2 for PC2.
- C Unnormalized counts of all cancer and healthy samples, ordered by PC1 and then PC2. The red line selects PC2 values greater than -2. Loadings for each of the PCs are shown above the ONT samples. Three samples with significant numbers of fragments over 10 kb are separated out as “Healthy-Ultralong” samples.
- D Same as C, but using bin counts z-score normalized using mean and standard deviation of healthy samples.
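The z-score normalization and principal component analysis described for FIGURE 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the inventors' pipeline: it assumes per-sample fragment lengths are already available, uses 40 log-spaced bins from 50 bp to 50,000 bp (the bin count is an assumption), weights each bin by sequenced bases, and computes PCA through NumPy's SVD. All function names are hypothetical, and the input lengths in the usage lines are synthetic.

```python
import numpy as np


def length_bin_proportions(lengths_per_sample, n_bins=40, lo=50, hi=50_000):
    """Proportion of sequenced DNA (in bases) per log-spaced length bin; one row per sample."""
    edges = np.logspace(np.log10(lo), np.log10(hi), n_bins + 1)
    rows = []
    for lengths in lengths_per_sample:
        lengths = np.asarray(lengths, dtype=float)
        # Weight each fragment by its length so each row describes DNA mass, not read counts.
        mass, _ = np.histogram(lengths, bins=edges, weights=lengths)
        total = mass.sum()
        rows.append(mass / total if total > 0 else mass)
    return np.vstack(rows), edges


def zscore_vs_healthy(props, healthy_mask):
    """Column-wise z-scores using the mean and SD of the healthy reference samples only."""
    ref = props[healthy_mask]
    mu = ref.mean(axis=0)
    sd = ref.std(axis=0, ddof=1)
    sd = np.where(sd > 0, sd, 1.0)  # bins with no healthy variation are only centred
    return (props - mu) / sd


def pca_scores(X, n_components=3):
    """Sample scores and bin loadings for the first principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components], Vt[:n_components]


# Synthetic example only: five "healthy" and five shorter-fragment "cancer" samples.
rng = np.random.default_rng(0)
healthy = [rng.normal(167, 10, 2000) for _ in range(5)]
cancer = [rng.normal(145, 12, 2000) for _ in range(5)]
props, edges = length_bin_proportions(healthy + cancer)
z = zscore_vs_healthy(props, np.array([True] * 5 + [False] * 5))
scores, loadings = pca_scores(z)
```

Normalizing against healthy samples only (rather than all samples) makes each z-score read as a deviation from the healthy fragment length profile, which is what the panel D heatmap is meant to show.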
- FIGURE 3 (A) Typical “spun” plasma processing workflow and alternative “unspun” workflow. (B) All ONT samples, showing the proportion of DNA in fragments greater than 7.5 kb in length. (C) Bioanalyzer trace showing one of the Healthy-Ultralong samples, together with a pre-centrifugation (“Unspun”) aliquot of the same plasma. (D) H3.1 concentration and (E) cfDNA concentration for 3 outlier healthy samples and 4 other healthy samples. (F-G) Repeated PCA including the three Unspun counterparts of the three Healthy-Ultralong samples. The right side of (E) shows the fraction of sequenced DNA in each of the fragment length ranges defined by each of the 3 principal components (right).
- H-J Proportion of fragments greater than 500 bp (H), 1,000 bp (I) and 7,500 bp (J).
- K Proportion of fragment ends starting with CC in ONT sequencing, as a function of fragment length, for 4 typical healthy volunteers (black) and the three outlier healthy volunteers processed with the spun workflow (red) and the unspun workflow (blue).
- L The proportion of fragment ends starting with CCCA in ONT sequencing, in typical-length fragments (100-400 bp, x axis) vs. fragments greater than 7.5 kb (y axis).
- FIGURE 4 Unsupervised clustering by fragment length.
- A All samples ordered by the first three principal components. The first 3 sample groups are cancer samples, with the cancer type indicated by color, and the second 3 groups are healthy samples. Bar plots on the left show total cfDNA concentration (ng/mL), total plasma H3.1 nucleosome concentration (ng/mL), tumor DNA fraction from ichorCNA from ONT sequencing (“CNA ONT”) and Illumina sequencing (“CNA ILLUMINA”), and the fraction of ONT fragments ending with the motif CCCA. In all bar plots, higher bars are also indicated with darker colors. The heatmap to the right of these bar plots shows the proportion of DNA sequenced in different fragment length bins, on a log scale from 50 bp to 50,000 bp.
- The left heatmap shows the unnormalized proportion of sequenced DNA in each bin.
- The right heatmap shows z-scores, column-normalized according to the mean and standard deviation across all spun healthy samples.
- Above the heatmaps are the PC loadings for each of the first 3 PCs.
- The heatmap to the right shows the same data, but each fragment length bin (column) is z-score normalized according to the mean and standard deviation across all healthy samples (not including “pre-spin” samples).
- The bar plots on the right show the eigenvalues for the 3 principal components (“PC scores”) and the percentage of sequenced DNA in the ranges defined by those PCs (“Frac DNA”).
- PC scores principal component scores
- Frac DNA percentage of sequenced DNA in the ranges defined by those PCs
- One sample with a sub-threshold but positive PC1 value is marked with an asterisk.
- C CC end motif frequencies as a function of fragment length, for the 6 cancer samples with the highest PC3 scores and the 6 cancer samples with the lowest PC3 scores.
- FIGURE 5 Origins of short mono- and di-nucleosome fragments (75-145 bp, 245-295 bp)
- A All samples, showing Cell of Origin (COO) scores for relevant cell types. The 10 samples that had positive PC3 scores and >1.5M reads are marked with an asterisk or “X”. Those with an asterisk have a top COO cell type matching the cancer type, whereas the correct cell type was not detected in the four marked with “X”.
- B ichorCNA tumor fraction for the 10 samples described above, for only the typical mononucleosome fragments (x axis) and the shorter mono-nucleosome and di-nucleosome fragments associated with PC3 (y axis).
- C Methylated percentage averaged across CpGs from 2,274 CpG islands commonly hypermethylated in cancer. Elevated methylation at these regions was observed in the same six samples with elevated tumor fraction (prostate cancer CA-14; colorectal cancers CA-35 and CA-75; non-Hodgkin lymphomas CB-64, CA-19 and CA-07).
- D Cell type fraction for the 4 cell types correctly identified among the 10 samples, as estimated using COO deconvolution.
- FIGURE 6 Results for DNASE1L3-associated CC motifs in long fragments. CC end motif frequency is plotted as a function of fragment length.
- FIGURE 7 Links between hyperfragmentation and elevated circulating chromatin levels in cancer.
- A 30 cancer samples from the PC3-high and PC1-3-low groups, ordered by PC3 value. Heatmap rows show sample features clustered by hierarchical clustering with 1 - ρ (Spearman's rho) as the distance. Heatmap rows are normalized to the row minimum and maximum values.
- B Spearman correlation values for all pairwise comparisons.
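The FIGURE 7 clustering inputs can be sketched as follows, assuming a features × samples matrix (for example cfDNA concentration, H3.1 level and PC scores as rows). Spearman's rho is computed as the Pearson correlation of average ranks, 1 - rho is used as the distance, and each row is min-max scaled for display. Average-rank tie handling and the function names are assumptions; the resulting distance matrix could then be handed to any hierarchical clustering routine.

```python
import numpy as np


def average_ranks(x):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_matrix(features):
    """Pairwise Spearman's rho between rows (features) measured across columns (samples)."""
    ranked = np.vstack([average_ranks(row) for row in features])
    return np.corrcoef(ranked)


def spearman_distance(features):
    """1 - rho distance matrix for hierarchical clustering of feature rows."""
    d = 1.0 - spearman_matrix(features)
    np.fill_diagonal(d, 0.0)  # remove floating-point residue on the diagonal
    return d


def minmax_rows(features):
    """Scale each row to [0, 1] using its own min and max (constant rows become 0)."""
    f = np.asarray(features, dtype=float)
    lo = f.min(axis=1, keepdims=True)
    span = f.max(axis=1, keepdims=True) - lo
    return (f - lo) / np.where(span > 0, span, 1.0)
```

Rank-based correlation is a natural fit here because the features (concentrations, fractions, PC scores) sit on very different scales and may be related monotonically rather than linearly.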
- FIGURE 8 Cancer-Specific DNA Methylation Patterns in Client-Owned Dogs.
- WGM Global methylation changes
- (B) H3.1 plasma nucleosome concentrations in dogs newly diagnosed with large cell lymphoma (n = 15) before and 24 hours after the first dose of chemotherapy (week 1, cycle 1). H3.1 concentrations in dogs showing (C) increases and (D) decreases in nucleosome levels 24 hours after chemotherapy.
- FIGURE 9 Hyperfragmentation is a Strong Predictor of Healthy vs. Cancer Status in Dogs.
- A Hyperfragmentation component (pel) in the 21 “ultra-shallow” healthy vs. cancer dog cohort.
- B-D Bona-fide cancer DNA markers in the same cohort: global methylation (WGM) (B), tumor fraction from ichorCNA Copy Number Alterations (C), and average methylation at cancer-associated CpG Islands (CGI) (D).
- E Receiver Operating Characteristics (ROC) using Multivariable Logistic Regression (MLR) to predict healthy vs cancer status using a combination of global methylation, CpG Island methylation, and tumor fraction as inputs.
- MLR Multivariable Logistic Regression
- AUC Area Under the Curve
- LOOCV Leave-One-Out Cross Validation
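The MLR, LOOCV and AUC evaluation named in FIGURE 9 can be illustrated with the sketch below. It is a stand-in under stated assumptions, not the inventors' model: logistic regression is fitted by plain gradient descent with a small L2 penalty, features are standardized within each training fold, and AUC is computed as the Mann-Whitney probability that a cancer sample scores above a healthy one. Inputs would be per-sample features such as global methylation, CpG island methylation and tumor fraction; the function names and hyperparameters are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, n_iter=2000, l2=1e-3):
    """Logistic regression weights (intercept first) fitted by gradient descent with L2 penalty."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        penalty = l2 * np.r_[0.0, w[1:]]  # the intercept is not penalised
        w -= lr * (Xb.T @ (p - y) / len(y) + penalty)
    return w


def predict_proba(w, X):
    Xb = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))


def loocv_scores(X, y, **fit_kwargs):
    """Out-of-fold cancer probability for each sample under leave-one-out cross-validation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    scores = np.empty(len(y))
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        # Standardize with training-fold statistics only, so the held-out sample stays unseen.
        mu = X[train].mean(axis=0)
        sd = X[train].std(axis=0)
        sd[sd == 0] = 1.0
        w = fit_logistic((X[train] - mu) / sd, y[train], **fit_kwargs)
        scores[i] = predict_proba(w, (X[i:i + 1] - mu) / sd)[0]
    return scores


def roc_auc(y, scores):
    """AUC as the probability that a positive outscores a negative (ties count one half)."""
    y = np.asarray(y)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y == 1], scores[y == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

With small cohorts such as the 21-dog cohort, leave-one-out makes the most of the data, and scoring only out-of-fold predictions keeps the AUC from being inflated by evaluating a model on samples it was trained on.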
- FIGURE 10 Hyperfragmentation is a Strong Predictor of Inflammation vs. Cancer Status in Dogs.
- A Hyperfragmentation component (pel) in the “shallow” inflammation vs. cancer cohort (“INF” is inflammation, “LSA” is lymphoma, and “HSS” is hemangiosarcoma).
- B-D Bona-fide cancer DNA markers in the same cohort: global methylation (WGM) (B), tumor fraction from ichorCNA Copy Number Alterations (C), and average methylation at cancer-associated CpG Islands (CGI) (D).
- WGM global methylation
- CGI cancer-associated CpG Islands
- FIGURE 11 Using Hyperfragmentation and Cancer DNA Markers to Monitor Dogs during and After Treatment.
- FIGURE 12 Hyperfragmentation changes 24 hours post chemotherapy in dogs.
- A For 13 lymphoma cases from the “shallow” ONT sequencing cohort, pre-treatment (“Diagnosis”) values vs. 24 hours post chemotherapy values are shown for H3.1 nucleosome levels (left) and values of the hyperfragmentation component (pel), with both generally showing increases at the post chemotherapy timepoint.
- B Bona-fide cancer DNA markers for the same samples.
- C At the pre-therapy timepoint (“Diagnosis”), the fragment length distribution for patient “HR” across all cancer-associated CpG islands shows longer fragments in both the cancer and non-cancer fragment groups, whereas post-chemotherapy fragments are shorter (hyperfragmentation) in both groups.
- FIGURE 13 Enrichment of Long Fragments in Clinical Sepsis.
- B shows the fraction of fragments in each sample with lengths between 900 bp and 4,300 bp.
- C shows the fraction of granulocyte (neutrophil) DNA estimated using methylation-based cell of origin deconvolution from the Non-Negative Least Squares (NNLS) method of the CelFiE-ISH package.
- NNLS Non-Negative Least Squares
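Methylation-based cell of origin deconvolution by non-negative least squares can be sketched as below. This does not reproduce the CelFiE-ISH package: it solves min ||Rx - b||² subject to x ≥ 0 with simple projected gradient descent (a swapped-in solver; a dedicated routine such as SciPy's `nnls` would normally be used) and then rescales x to sum to 1. The reference matrix R (methylation fraction of each marker CpG in each cell type) and the sample vector b are assumed inputs, and the function names are hypothetical.

```python
import numpy as np


def nnls_projected_gradient(A, b, n_iter=5000):
    """Minimise 0.5 * ||A x - b||^2 subject to x >= 0 by projected gradient descent."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)
        x = np.maximum(x - step * grad, 0.0)  # project back onto the non-negative orthant
    return x


def cell_type_fractions(reference, sample):
    """Non-negative mixing weights of the reference cell types, rescaled to sum to 1."""
    x = nnls_projected_gradient(reference, sample)
    total = x.sum()
    return x / total if total > 0 else x
```

The non-negativity constraint is what makes the output interpretable as cell type proportions, for example the granulocyte (neutrophil) fraction plotted in panel C.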
- Circulating tumor DNA (ctDNA) is routinely identified in cancer patients and increasingly used for monitoring and early detection.
- The concentration of immune cell DNA is elevated in cancer, suggesting major changes in immune cell turnover or cell-free DNA (cfDNA) homeostasis.
- The present inventors used nanopore sequencing to identify a general shortening of cfDNA in most cancers, and methylation-based cell of origin analysis revealed that this phenomenon affects both cancer and non-cancer DNA. Therefore, fragment length data from long-read sequencing provides an important parameter to consider when determining the tissue of origin of cfDNA and in methods of detecting ctDNA for liquid biopsy.
- A method of determining a tissue of origin, a cell type of origin, origination from a cancerous or immune cell, or a combination thereof, of cell-free DNA (cfDNA) in an animal subject, the method comprising:
- Nanopore sequencing is more appropriate for longer range alterations such as aneuploidy and chromosomal arm amplifications or deletions particularly where rapid results are required.
- One advantage we have found for nanopore sequencing is that it provides direct methylated and/or hydroxymethylated DNA sequence results for cfDNA in real time (with no chemical pretreatment).
- Nanopore sequencer instruments can also be small, low cost and suitable for use near to the patient, so they can be used broadly. Moreover, once sufficient data has been obtained in real time, sequencing can be terminated, avoiding unnecessary use of further reagents.
- Nanopore DNA sequencing (for example the sequencing methods employed by Oxford Nanopore Technologies DNA instruments) is becoming more commonly employed by workers in the field.
- Nanopore sequencing involves the passage of DNA strands through electrically charged nanopores.
- In this context, passing means translocating.
- As each nucleotide passes through the nanopore, an electrical disturbance characteristic of that nucleotide is induced, and this is detected by a sensor connected to the nanopore.
- Nanopore sequencing typically produces lower-coverage and less accurate DNA sequence results than sequencing by synthesis (“SBS”, for example the methods employed by Illumina NGS instruments), but it has a number of advantages, including the ability to sequence long DNA chains without fragmentation, the ability to directly sequence non-standard nucleotides in addition to adenine, thymine, cytosine and guanine (for example 5-methylcytosine and 5-hydroxymethylcytosine), and the generation of sequence data in real time and without library amplification. This allows sequence data to be obtained within a few hours or less. Nanopore sequencing is expensive if used for high-coverage sequencing of whole genomes but is more economical for shallow (e.g. 1x coverage) or ultra-shallow (e.g. 0.05x coverage) sequencing of cfDNA fragments.
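The shallow and ultra-shallow coverage figures translate into read counts through the usual mean-coverage relation C = N × L / G, for N reads of mean length L over a genome of size G. The sketch below assumes a haploid human genome of about 3.1 Gb and a mean cfDNA fragment length of about 167 bp (the typical mono-nucleosome size); both values are illustrative assumptions, and a sample rich in long fragments would need fewer reads for the same coverage.

```python
import math


def reads_for_coverage(target_coverage, genome_size_bp, mean_read_length_bp):
    """Reads needed for a mean coverage C, from C = N * L / G (rounded up)."""
    return math.ceil(target_coverage * genome_size_bp / mean_read_length_bp)


GENOME_BP = 3.1e9        # assumption: approximate haploid human genome size
MEAN_CFDNA_BP = 167      # assumption: typical mono-nucleosome cfDNA fragment length

ultra_shallow = reads_for_coverage(0.05, GENOME_BP, MEAN_CFDNA_BP)  # 928,144 reads
shallow = reads_for_coverage(1.0, GENOME_BP, MEAN_CFDNA_BP)         # 18,562,875 reads
```

Under these assumptions, 0.05x coverage needs under a million short cfDNA reads, which is why ultra-shallow runs can be short and inexpensive.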
- In some embodiments, the nanopore sequencer is an Oxford Nanopore sequencer.
- Methods of the invention may use a computer-based machine learning system to analyse the fragment length data.
- Algorithms (e.g. machine learning algorithms) executed by a computer can be used to automate analytical model building, e.g. for clustering, classification or pattern recognition.
- Machine learning algorithms may be supervised or unsupervised. Learning algorithms include, for example, artificial neural networks, discriminant analyses (e.g. Bayesian classifier or Fisher analysis), support vector machines, decision trees (e.g. recursive partitioning processes such as CART (classification and regression trees), or random forests), linear classifiers (e.g. multiple linear regression (MLR), partial least squares (PLS) regression, and principal components regression), hierarchical clustering, and cluster analysis.
- discriminant analyses e.g. Bayesian classifier or Fisher analysis
- decision trees e.g. recursive partitioning processes such as CART - classification and regression trees, or random forests
- linear classifiers e.g. multiple linear regression (MLR), partial least squares (PLS) regression, and principal components regression
- In some embodiments, analysis of the fragment length data is performed on the cfDNA after passing the cfDNA through the nanopore sequencer, by clustering the data based on the length of said cfDNA.
- In some embodiments, the clustering is unsupervised clustering.
- In some embodiments, a classifier is used to cluster the data based on the length of said cfDNA.
- The term “classifier” generally refers to an algorithm or computer code that receives test data as input and produces, as output, a classification of the input data as belonging to one class or another (e.g. a fragment length class).
- In some embodiments, analysis of the fragment length data comprises transforming the fraction of sequencing reads in the fragment length data into an estimated fraction of the total cfDNA present in the sample.
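One plausible reading of the transformation from read fraction to an estimated fraction of total cfDNA is to weight each read by its length, since one nanopore read of unamplified cfDNA typically corresponds to one molecule and DNA mass scales with length. The sketch below makes that assumption explicit; the exact transformation is not specified here, and the function name is hypothetical.

```python
def read_and_mass_fraction(lengths_bp, lo, hi):
    """Return (fraction of reads, length-weighted fraction of DNA) for fragments in [lo, hi] bp.

    Assumes one read per cfDNA molecule, so a fragment's share of DNA mass is
    proportional to its length.
    """
    in_range = [n for n in lengths_bp if lo <= n <= hi]
    read_fraction = len(in_range) / len(lengths_bp)
    mass_fraction = sum(in_range) / sum(lengths_bp)
    return read_fraction, mass_fraction


# Nine mono-nucleosome-sized reads plus one 10 kb read (illustrative numbers only).
reads, mass = read_and_mass_fraction([160] * 9 + [10_000], 5_001, 60_000)
```

Here the single 10 kb fragment is 10% of the reads but about 87% of the DNA, which is why long-fragment signals look much larger on a DNA basis than on a read basis.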
- In some embodiments, said transforming is performed prior to clustering.
- In some embodiments, clustering the data comprises separating the data into short, mid and long cfDNA lengths.
- In some embodiments, the short cfDNA length is 50-500 base pairs (bp). In a further embodiment, the short cfDNA length is about 60-300 bp, such as about 75-145 bp and about 245-294 bp.
- In some embodiments, the mid cfDNA length is 501-5,000 bp. In a further embodiment, the mid cfDNA length is about 900-4,300 bp.
- In some embodiments, the long cfDNA length is 5,001 bp or more, such as 5,001-60,000 bp. In a further embodiment, the long cfDNA length is about 7,500-53,000 bp.
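The short, mid and long length classes above can be applied per fragment as in the sketch below, which uses the broad embodiment ranges (50-500 bp, 501-5,000 bp, 5,001 bp or more) and summarizes each class as a length-weighted fraction of sequenced DNA. Treating fragments below 50 bp as unclassified, and the function names, are assumptions.

```python
from collections import defaultdict


def length_class(length_bp):
    """Assign a fragment to the broad short / mid / long classes of this embodiment."""
    if 50 <= length_bp <= 500:
        return "short"
    if 501 <= length_bp <= 5_000:
        return "mid"
    if length_bp >= 5_001:
        return "long"
    return "unclassified"  # assumption: fragments below 50 bp are left out


def class_dna_fractions(lengths_bp):
    """Length-weighted fraction of sequenced DNA in each class."""
    bases = defaultdict(int)
    for n in lengths_bp:
        bases[length_class(n)] += n
    total = sum(bases.values())
    return {cls: b / total for cls, b in bases.items()}
```

The narrower ranges of the further embodiments (e.g. 75-145 bp, 900-4,300 bp, 7,500-53,000 bp) could be substituted by editing the bounds in `length_class`.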
- In some embodiments, the method additionally comprises extracting the cfDNA from circulating chromatin fragments in the sample and sequencing the extracted cfDNA using the nanopore sequencer.
- In some embodiments, the cfDNA is not amplified after it is extracted from a sample from a subject. Unlike most cfDNA sequencing approaches, amplification is not required (i.e. the method may be “amplification-free”), which may provide an even more accurate representation of fragmentation features.
- In some embodiments, the cfDNA is modified with a sequencing adapter. Therefore, the method may comprise ligating an adapter sequence below 75 nucleotides in length to the cfDNA to produce adapter-ligated cfDNA.
- Said adapter sequence may comprise a nucleic acid barcode that uniquely identifies the source sample of the cfDNA (i.e. the sample from which the cfDNA is obtained).
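How a sample barcode lets reads be attributed to their source sample can be illustrated with a simple nearest-barcode assignment. This is a hypothetical sketch, not how any particular ONT demultiplexing tool works: it assumes the barcode sits at the very start of the read, compares by Hamming distance, and leaves a read unassigned when the best match is tied or exceeds a mismatch limit. The barcode sequences and sample names are made up.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read_seq, barcodes, max_mismatch=2):
    """Return the sample whose barcode best matches the read start, or None if unclear."""
    best_sample, best_dist, tied = None, None, False
    for sample, bc in barcodes.items():
        if len(read_seq) < len(bc):
            continue  # read too short to contain this barcode
        d = hamming(read_seq[:len(bc)], bc)
        if best_dist is None or d < best_dist:
            best_sample, best_dist, tied = sample, d, False
        elif d == best_dist:
            tied = True
    if best_dist is None or best_dist > max_mismatch or tied:
        return None
    return best_sample


BARCODES = {"S1": "AAAACCCC", "S2": "GGGGTTTT"}  # hypothetical 8-nt barcodes
```

Leaving ambiguous reads unassigned trades a little yield for protection against attributing one sample's fragments to another, which matters when samples are pooled on one flow cell.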
- Using adapter sequences can produce an adapter-ligated cfDNA library for analysis.
- The adapter-ligated cfDNA library may then be passed through a nanopore sequencer to produce a sequence of the cfDNA.
- In some embodiments, the method further comprises performing an additional analysis on the cfDNA.
- In some embodiments, the additional analysis is performed on the cfDNA after passing the cfDNA through the nanopore sequencer.
- In some embodiments, fragmentation location analysis is performed on the cfDNA after passing the cfDNA through the nanopore sequencer, and said fragmentation location analysis is used in combination with the fragment length data to determine a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof of cfDNA.
- In some embodiments, the fragmentation location analysis comprises fragment end motif analysis.
- In some embodiments, the fragment end motif analysis is performed with the sequences determined from sequencing a plurality of cfDNAs.
- In some embodiments, the method additionally comprises:
- RNA sequencing RNA-seq
- DNase deoxyribonuclease
- RT-PCR Reverse Transcription Polymerase Chain Reaction
- DNase enzymes catalyse the hydrolytic cleavage of phosphodiester linkages in the DNA backbone, thus degrading DNA.
- DNase I deoxyribonuclease I
- DNase II deoxyribonuclease II
- In some embodiments, the DNase enzyme is a DNase I enzyme.
- In some embodiments, the DNase I enzyme is DNASE1L3.
- In some embodiments, the end sequence is the terminal 4 nucleotides of a fragment end.
- In some embodiments, the end sequences are the sequences provided in Chan et al., “Plasma DNA Profile Associated with DNASE1L3 Gene Mutations: Clinical Observations, Relationships to Nuclease Substrate Preference, and In Vivo Correction”, American Journal of Human Genetics, Vol. 107, No. 5 (2020), herein incorporated by reference in its entirety.
- In some embodiments, the end sequences are the sequences provided in Serpas et al., “Dnase1l3 deletion causes aberrations in length and end-motif frequencies in plasma DNA”. Proc. Natl Acad. Sci. Vol.
- In some embodiments, the end sequence is selected from CCCA, CCAG, CCTG, CCAA, CCCT, CCTT, CCAT, CAAA, CCTC, CCAC, TGAA, TAAA, CCTA, CCCC, TGAG, TGTT, CAAG, CTTT, AAAA, TGTG, CATT, CACA, CAGA, TATT, and CAGG.
- In some embodiments, the end sequence is CCCA.
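Fragment end motif frequencies of the kind listed above (for example the CCCA fraction, or all motifs beginning with CC) can be computed from read sequences as sketched below. Assumptions: each read spans one whole double-stranded fragment, so its two 5' end motifs are the first k bases and the reverse complement of the last k bases; optional length limits mimic length-stratified comparisons (e.g. 100-400 bp vs. >7.5 kb fragments). The function names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def end_motifs(seq, k=4):
    """Both 5' end k-mers of a fragment read as one strand: read start, and revcomp of read end."""
    return [seq[:k], revcomp(seq[-k:])]


def motif_frequencies(reads, k=4, min_len=None, max_len=None):
    """Relative frequency of each k-mer end motif, optionally within a fragment length range."""
    counts = Counter()
    for seq in reads:
        seq = seq.upper()
        if (min_len is not None and len(seq) < min_len) or (max_len is not None and len(seq) > max_len):
            continue
        for motif in end_motifs(seq, k):
            if len(motif) == k and set(motif) <= set("ACGT"):  # skip short reads and N calls
                counts[motif] += 1
    total = sum(counts.values())
    return {m: c / total for m, c in counts.items()} if total else {}


def cc_fraction(freqs):
    """Combined frequency of end motifs beginning with CC (the DNASE1L3-associated group)."""
    return sum(f for m, f in freqs.items() if m.startswith("CC"))
```

Comparing `cc_fraction` between length strata, or between a sample and healthy references, is one way to look for the reduction in CC-type ends that the following embodiments associate with loss of DNASE1L3 activity.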
- In some embodiments, the presence of a specific end fragment sequence indicates that the DNA is from a cancer cell.
- In some embodiments, the presence of a specific end fragment sequence indicates that the DNA is from an immune cell.
- In some embodiments, an enrichment of a specific end fragment sequence indicates that the sample is from a subject that has cancer.
- In some embodiments, the presence or absence of a specific end fragment sequence is indicative of DNASE1L3 activity.
- In some embodiments, a reduction in, or absence of, end sequences such as CCCA, CCTG, CCAG, CCAA, CCAT and CCTC (in particular CCCA) indicates loss of function of DNASE1L3.
- DNA modification data refers to information on the modification status of a portion of the bases in a DNA molecule.
- In some embodiments, the modification is an epigenetic modification, such as an epigenetically modified base.
- In some embodiments, the DNA modification data is selected from methylation data, hydroxymethylation data, or both, and said DNA modification data is used in combination with the fragment length data to determine a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof of cfDNA.
- In some embodiments, the methylation data is 5-methylcytosine (5mC) methylation.
- In some embodiments, the hydroxymethylation data is 5-hydroxymethylcytosine (5hmC) hydroxymethylation.
- Methylation data refers to information on the methylation status of a portion of the bases in a DNA molecule.
- Hydroxymethylation data refers to information on the hydroxymethylation status of a portion of the bases in a DNA molecule.
- In some embodiments, the portion is all of the bases.
- In some embodiments, the bases are cytosines.
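A methylation signature averaged over a set of regions, such as the 2,274 CpG islands commonly hypermethylated in cancer, can be sketched from per-read CpG calls as below. Assumptions: calls are (chromosome, position, methylated) tuples already extracted from the sequencer's modified-base output, regions are half-open (chromosome, start, end) intervals, each CpG's methylated fraction is computed first and then averaged across CpGs, and a linear scan stands in for the interval index a real pipeline would use. The function name is hypothetical.

```python
from collections import defaultdict


def mean_region_methylation(calls, regions):
    """Methylated percentage averaged across CpGs that fall inside the given regions.

    calls: iterable of (chrom, pos, is_methylated) per-read CpG calls (assumed pre-extracted)
    regions: iterable of half-open (chrom, start, end) intervals, e.g. a CpG island set
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    per_cpg = defaultdict(lambda: [0, 0])  # (chrom, pos) -> [methylated calls, total calls]
    for chrom, pos, is_methylated in calls:
        if any(start <= pos < end for start, end in by_chrom.get(chrom, ())):
            tally = per_cpg[(chrom, pos)]
            tally[0] += int(is_methylated)
            tally[1] += 1
    if not per_cpg:
        return float("nan")
    # Average the per-CpG methylated fraction, then express it as a percentage.
    return 100.0 * sum(m / n for m, n in per_cpg.values()) / len(per_cpg)
```

Averaging per CpG rather than pooling all calls keeps a few deeply covered CpGs from dominating the estimate, although at shallow coverage most CpGs will carry only a single call and the two approaches converge.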
- In some embodiments, copy number analysis is performed on the cfDNA after passing the cfDNA through the nanopore sequencer, and said copy number analysis is used in combination with the fragment length data to determine a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof of cfDNA.
- In some embodiments, the fragment length data is combined with copy number analysis and one or more signatures derived from DNA modification data (such as methylation data) to determine a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof of cfDNA.
- DNA modification data such as methylation data
- In some embodiments, the copy number analysis detects an oncogene amplification, and the method further comprises administering an agent that targets said oncogene.
- In some embodiments, the identifying is based on the fragment length and the fragmentation location analysis. In some embodiments, the identifying is based on the fragment length and DNA modification data. In some embodiments, the identifying is based on the fragment length and copy number analysis. In some embodiments, the identifying is based on the fragment length and the level of circulating nucleosomes and/or total level of cfDNA. In some embodiments, the identifying is based on the fragment length, the fragmentation location analysis, the DNA modification data and the level of circulating nucleosomes. In some embodiments, the identifying is based on the fragment length, the fragmentation location analysis, the DNA modification data and the total level of cfDNA.
- In some embodiments, the identifying is based on the fragment length, the fragmentation location analysis, the DNA modification data, the copy number analysis and the level of circulating nucleosomes. In some embodiments, the identifying is based on the fragment length, the fragmentation location analysis, the DNA modification data, the copy number analysis and the total level of cfDNA.
- The present inventors have found that a distinct hyperfragmentation pattern occurs across a large fraction of cancers and is associated with markers of altered DNase activity.
- The results show that fragments derived from cancer can be distinguished by a shorter length distribution (e.g. 75-145 bp and 245-295 bp) and by end motif patterns consistent with loss of DNASE1L3 fragmentation activity. Therefore, DNase activity, in particular DNASE1L3 activity, can be used in the determination of a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof of cfDNA.
- In some embodiments, the method additionally comprises detecting DNASE1L3 activity.
- In some embodiments, said detecting comprises: (a) fragment end motif analysis, such as analysing the presence or absence of a specific end fragment sequence which is indicative of DNASE1L3 activity;
- (b) performing RNA sequencing (RNA-seq) on the body fluid sample and detecting a level of DNASE1L3 mRNA expression;
- In some embodiments, the method is a method of determining origination from a cancerous cell and further comprises identifying a cancer-specific DNA modification change in said cancerous cell.
- In some embodiments, the method is a method of determining origination from an immune cell, which optionally further comprises identifying an immune-specific sequence in said immune cell.
- In some embodiments, the method additionally comprises centrifuging the sample prior to passing the cfDNA through a nanopore sequencer.
- The sample may be any biological fluid (or body fluid) sample taken from a subject including, without limitation, cerebrospinal fluid (CSF), whole blood, blood serum, plasma, menstrual blood, endometrial fluid, urine, saliva, or other bodily fluid (stool, tear fluid, synovial fluid, sputum), breath, e.g. as condensed breath, or an extract or purification therefrom, or dilution thereof.
- CSF cerebrospinal fluid
- In some embodiments, blood, serum or plasma samples are used.
- In some embodiments, plasma samples are used.
- Plasma samples may be collected in collection tubes containing one or more anticoagulants such as ethylenediamine tetraacetic acid (EDTA), heparin, or sodium cit
- The term “biomarker” means a distinctive biological or biologically derived indicator of a process, event, or condition. As used herein, the biomarker may refer to fragment length data obtained by the nanopore sequencer. Biomarkers can be used in methods of diagnosis (e.g. clinical screening) and prognosis assessment, in monitoring the results of therapy, in identifying patients most likely to respond to a particular therapeutic treatment, and in drug screening and development. Biomarkers and their uses are valuable for the identification of new drug treatments and for the discovery of new targets for drug treatment.
- Use of a size profile, obtained by nanopore sequencing of all circulating chromatin fragments present in a body fluid sample, to identify a patient suitable for cancer treatment, such as immunotherapy.
- Cancerous cells of interest in the present invention are derived from a cancer.
- In some embodiments, the cancer is a haematological cancer, including without limitation leukaemias, lymphomas (including canine lymphoma), myelomas and angiosarcomas (including canine hemangiosarcoma).
- In some embodiments, the cancer is a solid cancer, including without limitation lung cancer, liver cancer, prostate cancer, breast cancer, gastric cancer, colorectal cancer, thyroid cancer, skin cancer (e.g. melanoma), bladder cancer, cervical cancer, pancreatic cancer, brain cancer, ovarian cancer, endometrial cancer or renal cancer.
- Haematological cancers are cancers of the blood and may therefore also be referred to as “liquid” or “blood” cancers. There are 3 principal types of haematological cancer: leukaemias, which are caused by the rapid production of abnormal white blood cells; lymphomas, which are caused by abnormal lymphocytes; and myelomas, which are cancers of the plasma cells.
- In some embodiments, a blood cancer may be considered to be any cancer in direct contact with the circulation.
- In some embodiments, the cancerous cells are derived from hemangiosarcoma.
- Leukaemia is cancer of the blood cells which usually starts in the bone marrow and travels through the bloodstream.
- In leukaemia, the bone marrow produces mutated cells and releases them into the blood, where they grow and crowd out healthy blood cells.
- Lymphoma diseases affect the cells in the lymphatic system.
- In lymphomas, immune cells called lymphocytes grow out of control and collect in lymph nodes, the spleen, other lymph tissues or neighbouring organs.
- Myeloma, also known as multiple myeloma, develops in the bone marrow and affects plasma cells, which produce antibodies that attack infections and diseases.
- Examples of blood cancers include Acute Lymphoblastic Leukaemia (ALL), Acute Myeloid Leukaemia (AML), Hodgkin Lymphoma (HL) and Non-Hodgkin Lymphoma (NHL).
- References to “acute leukaemia” mean that the cancer progresses quickly and aggressively, usually requiring immediate treatment.
- ALL involves the development of large numbers of immature lymphocytes which are unable to fight infection. This causes the patient to have less room for healthy white blood cells, red blood cells, and platelets in the circulation. As a result, the patient usually suffers from a weakened immune system and the symptoms of anaemia, such as tiredness, breathlessness and an increased risk of excessive bleeding.
- The risk of developing ALL is highest in children younger than 5 years of age, and ALL is the most common type of leukaemia affecting children. The risk then declines slowly until the mid-20s and begins to rise again slowly after age 50. Overall, about 4 of every 10 cases of ALL are in adults.
- AML affects myeloblasts which results in the accumulation of abnormal monocytes and granulocytes in the bone marrow. AML may also affect myeloid stem cells resulting in abnormal red blood cells or platelets. As with ALL, this causes the patient to have lower levels of healthy white blood cells, red blood cells, and platelets in the circulation. AML is one of the most common types of leukaemia in adults and the average age at diagnosis is 68.
- HL and NHL are the two main types of lymphoma.
- HL has a particular appearance under the microscope and contains cells called Reed-Sternberg cells (a type of B lymphocyte that has become cancerous), whereas NHL looks different under the microscope and does not contain Reed-Sternberg cells.
- Most lymphomas are NHL and only about 1 in 5 are HL.
- NHL is a cancer affecting lymphocytes and usually starts in lymph nodes or lymph tissue. It is one of the more common cancers among children, teens and young adults.
- CBC: complete blood count
- WBC: white blood cell count
- X-ray, CT or PET scans can be used to detect swollen lymph nodes; however, this approach is also non-specific.
- a bone marrow or lymph node biopsy is required.
- Immune cells of interest in the present invention include, but are not limited to, CD34+ cells, B-Cells, CD45+ (leukocyte common antigen) cells, Alpha-Beta T-cells, Cytotoxic T-cells, Helper T-cells, Plasma Cells, Neutrophils, Monocytes, Macrophages, Red Blood Cells, Platelets, Dendritic Cells, Phagocytes, Granulocytes, Innate lymphoid cells, Natural Killer (NK) cells and Gamma Delta T-cells.
- Immune cells are classified with the aid of combinatorial cell surface molecule analysis (e.g. via flow cytometry), which identifies, groups or clusters immune cells into sub-populations. These sub-populations can then be further sub-divided with additional analysis.
- Uses and methods may additionally comprise measuring or detecting the level of chromatin fragments in the body fluid sample and said level is used in combination with the fragment length data to determine a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof of cfDNA.
- chromatin fragment refers to a complex of proteins and nucleic acid whose origin lies in the chromosome or mitochondria of a cell. The term encompasses chromatin fragments found outside of cells, which may also be referred to as “cell free chromatin fragments”.
- a fragment of chromatin may contain a nucleosome and/or associated DNA and/or any of a huge variety of non-histone chromatin associated proteins in a multi-protein-nucleic acid complex.
- Some examples of non-histone chromatin associated proteins include transcription factors, cofactors, co-activators, co-repressors, RNA polymerase moieties, elongation factors, chromatin remodelling factors, mediators, STAT moieties, upstream binding factor (UBF) and others.
- Chromatin fragments, cfDNA or cf-nucleosomes may be measured by many methods including, for example without limitation, binding methods such as immunochemical or immunoassay methods or binding by DNA intercalating dyes, sequencing (for example to determine read numbers), rtPCR methods and spectroscopic methods.
- the method additionally comprises measuring or detecting the level of circulating (cell free) nucleosomes in the body fluid sample and said level is used in combination with the fragment length data to determine a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof of cfDNA.
- Sequencing can only provide relative cell type proportions. Therefore, to address the question of globally elevated levels of chromatin, the inventors profiled not only the DNA concentration in these individuals but also absolute concentrations of circulating nucleosomes.
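Simple arithmetic shows why absolute concentrations matter. The Python sketch below is a hypothetical illustration, not the inventors' analysis. It assumes that each cell type contributes to an absolute total (for example, a circulating nucleosome concentration in ng/mL) in the same proportion as its share of sequencing reads. The function name, cell-type labels and numbers are invented.

```python
from typing import Dict


def absolute_contributions(proportions: Dict[str, float], total_ng_per_ml: float) -> Dict[str, float]:
    """Scale relative cell-type proportions by an absolute total concentration.

    Hypothetical helper: assumes each cell type contributes to the measured
    total in the same proportion as its share of the sequencing reads.
    """
    prop_sum = sum(proportions.values())
    if prop_sum <= 0:
        raise ValueError("proportions must sum to a positive value")
    # Renormalise so the shares sum to exactly 1 before scaling.
    return {cell: total_ng_per_ml * p / prop_sum for cell, p in proportions.items()}


# Invented profile shared by two samples that differ only in absolute total.
profile = {"neutrophil": 0.6, "monocyte": 0.3, "other": 0.1}
low = absolute_contributions(profile, total_ng_per_ml=50.0)
high = absolute_contributions(profile, total_ng_per_ml=500.0)
```

Both samples share the same relative profile, yet the neutrophil-attributed level differs tenfold (about 30 versus 300 ng/mL). Relative proportions alone cannot reveal this difference. The real relationship between read share and nucleosome mass may be more complex than this proportional assumption.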
- The nucleosome is the basic unit of chromatin structure and consists of a protein complex of eight highly conserved core histones (comprising a pair of each of the histones H2A, H2B, H3 and H4). Approximately 146 base pairs of DNA are wrapped around this complex. Another histone, H1 or H5, acts as a linker and is involved in chromatin compaction.
- The DNA is wound around consecutive nucleosomes in a structure often said to resemble “beads on a string”, and this forms the basic structure of open chromatin, or euchromatin. In compacted chromatin, or heterochromatin, this string is coiled and supercoiled into a closed and complex structure (Herranz and Esteller (2007) Methods Mol. Biol. 361: 25-62).
- NETs: neutrophil extracellular traps
- ETs: extracellular traps
- nucleosome may refer to “cell free nucleosome” when detected in body fluid samples. It will be appreciated that the term cell free nucleosome throughout this document is intended to include any circulating chromatin fragment that includes one or more nucleosomes. “Epigenetic features”, “epigenetic signal features” or “epigenetic signal structures” of a cell free nucleosome as referred herein may comprise, without limitation, one or more histone post-translational modifications, histone isoforms, modified nucleotides and/or proteins bound to a nucleosome in a nucleosome-protein adduct.
- the cell free nucleosome may be detected by binding to a component thereof.
- component thereof refers to a part of the nucleosome, i.e. the whole nucleosome does not need to be detected.
- The component of the cell free nucleosomes may be selected from the group consisting of: a histone protein (i.e. histone H1, H2A, H2B, H3 or H4), a histone post-translational modification, a histone variant or isoform, a protein bound to the nucleosome (i.e. a nucleosome-protein adduct), a DNA fragment associated with the nucleosome and/or a modified nucleotide associated with the nucleosome.
- the component thereof may be histone (isoform) H3.1 or histone H1 or DNA.
- the component of the nucleosome is a histone protein.
- histone refers to histones and modifications thereof, as described herein (e.g. post-translational modifications, mutations, isoforms, variants and fragments of histones, such as clipped histones).
- nucleosomes per se refers to the total nucleosome level or concentration present in the sample, regardless of any epigenetic features the nucleosomes may or may not include. Detection of the total nucleosome level typically involves detecting a histone protein common to all nucleosomes, such as histone H4. Therefore, nucleosomes per se may be measured by detecting a core histone protein, such as histone H4. As described herein, histone proteins form structural units known as nucleosomes which are used to package DNA in eukaryotic cells and also form the repeating units present in ETs and NETs.
- Mononucleosomes and oligonucleosomes can be detected by Enzyme-Linked Immunosorbent Assay (ELISA), and several methods have been reported (e.g. Salgame et al. (1997); Holdenrieder et al. (2001); van Nieuwenhuijze et al. (2003)). These assays typically employ an anti-histone antibody (for example anti-H2B, anti-H3 or anti-H1, H2A, H2B, H3 and H4) as the capture antibody and an anti-DNA or anti-H2A-H2B-DNA complex antibody as the detection antibody.
- Circulating nucleosomes are not a homogeneous group of protein-nucleic acid complexes. Rather, they are a heterogeneous group of chromatin fragments originating from the digestion of chromatin on cell death and include an immense variety of epigenetic structures including particular histone isoforms (or variants), post-translational histone modifications, nucleotides or modified nucleotides, and protein adducts.
- Uses and methods of the invention may include data for additional biomarkers, such as the level of cell free nucleosomes per se and/or an epigenetic feature of a cell free nucleosome.
- the terms “epigenetic signal structure” and “epigenetic feature” are used interchangeably herein. They refer to particular features of the nucleosome that may be detected.
- the epigenetic feature of the nucleosome is selected from the group consisting of: a post-translational histone modification, a histone variant, a particular nucleotide and a protein adduct.
- the epigenetic feature of the nucleosome is the histone isoform H3.1.
- the structure of a nucleosome may vary by the inclusion of alternative histone isoforms or variants which are different gene or splice products and have different amino acid sequences.
- the epigenetic feature of the nucleosome comprises a histone variant or isoform.
- The terms “histone variant” and “histone isoform” may be used interchangeably herein.
- Histone isoforms are known in the art. Histone isoforms can be classed into a number of families, which are subdivided into individual types. The sequences of a large number of histone isoforms are known and publicly available, for example in the National Human Genome Research Institute (NHGRI) Histone Database (Marino-Ramirez et al., “The Histone Database: an integrated resource for histones and histone fold-containing proteins”), the GenBank (NIH genetic sequence) database, the EMBL Nucleotide Sequence Database and the DNA Data Bank of Japan (DDBJ).
- isoforms of histone H2 include H2A1, H2A2, mH2A1, mH2A2, H2AX and H2AZ.
- histone isoforms of H3 include H3.1, H3.2 and H3t.
- the histone isoform is H3.1.
- the epigenetic feature is a mutated histone.
- the mutation is in histone 3 (H3).
- the mutation in H3 is when lysine 27 is replaced by a methionine (H3K27M).
- the structure of nucleosomes can vary by post translational modification (PTM) of histone proteins.
- PTM of histone proteins typically occurs on the tails of the core histones and common modifications include acetylation, methylation or ubiquitination of lysine residues as well as citrullination or methylation of arginine residues and phosphorylation of serine residues and many others.
- a histone PTM may occur on different isoforms (variants) of the histone.
- the lysine residues that occur on the tail of histone H3 isoforms H3.1, H3.2 and H3.3 may be modified by acetylation or methylation.
- the epigenetic feature of the cell free nucleosome may be a histone post translational modification (PTM).
- the histone PTM may be present on a core nucleosome histone (e.g. H2A, H2B, H3 or H4), or a linker histone (e.g. H1 or H5). Examples of PTMs are described in WO 2005/019826 and WO 2017/068359.
- the histone PTMs are selected from acetylation, methylation (which may be mono-, di- or tri-methylation), phosphorylation, ribosylation, citrullination, ubiquitination, hydroxylation, glycosylation, nitrosylation, glutamination and isomerisation.
- the histone PTM is methylation of a lysine residue.
- the methylation is of a histone 3 lysine residue.
- the histone PTM is selected from H3K4Me, H3K4Me2, H3K9Me, H3K9Me3, H3K27Me3 or H3K36Me3.
- The histone PTM is acetylation of a lysine residue. In a further embodiment, the acetylation is of a histone 3 lysine residue. In a yet further embodiment, the histone PTM is selected from H3K9Ac, H3K14Ac, H3K18Ac or H3K27Ac. In another embodiment, the histone PTM is H4PanAc. In one embodiment, the histone PTM is phosphorylation of a serine residue. In a further embodiment, the phosphorylation is of a serine residue of isoform X of histone 2A (H2AX) or of a histone 3 serine residue.
- H2AX: isoform X of histone 2A
- the histone PTM is selected from pH2AX or H3S10Ph. In one embodiment, the histone PTM is selected from citrullination or ribosylation. In a further embodiment, the histone PTM is citrullinated H3 (H3cit) or citrullinated H4 (H4cit). In a further embodiment, the histone PTM is citrullination of a histone 3 arginine residue. In a yet further embodiment, the histone PTM is H3R8Cit.
- The histone PTM is selected from the group consisting of: H3K4Me, H3K4Me2, H3K9Me, H3K9Me3, H3K27Me3, H3K36Me3, H3K9Ac, H3K14Ac, H3K18Ac, H3K27Ac, H4PanAc, pH2AX, H3S10Ph and H3R8Cit.
- a group or class of related histone post translational modifications may also be detected.
- a typical example, without limitation, would involve a 2-site immunoassay employing one antibody or other selective binder directed to bind to nucleosomes and one antibody or other selective binder directed to bind the group of histone modifications in question.
- Examples of such antibodies directed to bind to a group of histone modifications would include, for illustrative purposes and without limitation, anti-pan-acetylation antibodies (e.g. a pan-acetyl H4 antibody [H4PanAc]), anti-citrullination antibodies or anti-ubiquitin antibodies.
- the epigenetic feature is a DNA modification.
- nucleosomes also differ in their nucleotide and modified nucleotide composition. Some nucleosomes may comprise more 5-methylcytosine residues, or 5-hydroxymethylcytosine residues or other nucleotides or modified nucleotides, than other nucleosomes.
- the epigenetic feature is a DNA modification selected from 5-methylcytosine or 5-hydroxymethylcytosine.
- the defined calibrated DNA modification is 5-methylcytosine or 5-hydroxymethylcytosine.
- A further type of circulating nucleosome subset is nucleosome protein adducts. It has been known for many years that chromatin comprises a large number of non-histone proteins bound to its constituent DNA and/or histones. These chromatin associated proteins are of a wide variety of types and have a variety of functions, including transcription factors, transcription enhancement factors, transcription repression factors, histone modifying enzymes, DNA damage repair proteins and many more. Chromatin fragments comprising nucleosomes and other non-histone chromatin proteins, or DNA and other non-histone chromatin proteins, are described in the art. Therefore, in one embodiment, the epigenetic feature comprises one or more protein-nucleosome adducts or complexes.
- More than one epigenetic feature of cell free nucleosomes may be detected in methods and uses of the invention.
- the epigenetic features may be the same type (e.g. PTMs, histone isoforms, nucleotides or protein adducts) or different types (e.g. a PTM in combination with a histone isoform).
- A post-translational histone modification and a histone variant may both be detected (i.e. more than one type of epigenetic feature is detected).
- more than one type of post-translational histone modification is detected, or more than one type of histone isoform is detected.
- the method may additionally comprise measuring or detecting the level of circulating cell free nucleosomes.
- Said measurement or detection comprises methods described hereinbefore, such as an immunoassay, immunochemical, mass spectrometry, chromatographic, chromatin immunoprecipitation or biosensor method.
- the measurement or detection employs a single binding agent.
- the measurement or detection comprises a 2-site immunometric assay employing two binding agents.
- the terms “antibody”, “binder” or “ligand” as used herein are not limiting but are intended to include any binder capable of specifically binding to particular molecules or entities and that any suitable binder can be used in the method of the invention.
- the binding agent is an antibody.
- the binding agent is a chromatin binding protein.
- the most commonly used epitope binders in the art are antibodies or derivatives of an antibody that contain a specific binding domain.
- the antibody may be a polyclonal antibody or a monoclonal antibody or a fragment thereof capable of specific binding to the epitope.
- any binder capable of binding to a particular epitope may be used for the purposes of the invention.
- the reagents may comprise one or more ligands or binders, for example, naturally occurring or chemically synthesised compounds, capable of specific binding to the desired target.
- a ligand or binder may comprise a peptide, an antibody or a fragment thereof, or a synthetic ligand such as a plastic antibody, or an aptamer or oligonucleotide, capable of specific binding to the desired target.
- the antibody can be a monoclonal antibody or a fragment thereof. It will be understood that if an antibody fragment is used then it retains the ability to bind the biomarker so that the biomarker may be detected (in accordance with the present invention).
- a ligand/binder may be labelled with a detectable marker, such as a luminescent, fluorescent, enzyme or radioactive marker; alternatively or additionally a ligand according to the invention may be labelled with an affinity tag, e.g. a biotin, avidin, streptavidin or His (e.g. hexa-His) tag.
- ligand binding may be determined using a label-free technology for example that of ForteBio Inc.
- the terms antibody or binder as used herein are interchangeable and refer to any moiety capable of specific binding to an epitope.
- the binding agent is directed to a histone, nucleosome core protein, DNA epitope or a protein adducted to a nucleosome.
- the binding agent is directed to a histone isoform, such as a histone isoform of a core histone, in particular a histone H3 isoform.
- the binding agent specifically binds to histone isoform H3.1.
- A binding agent is considered to “specifically bind” if there is a greater than 10-fold difference, and preferably a 25-, 50- or 100-fold difference, between the binding of the agent to a particular target epitope and its binding to a non-target epitope.
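The specificity criterion above reduces to a ratio test. This minimal Python sketch makes several assumptions. The function names are hypothetical. The two signals are taken to be comparable binding measurements in the same units, for example an assay signal against the target and against a non-target epitope. The default threshold follows the greater-than-10-fold criterion.

```python
def specificity_fold(target_signal: float, non_target_signal: float) -> float:
    """Fold difference: binding to the target epitope over a non-target epitope."""
    if non_target_signal <= 0:
        raise ValueError("non-target signal must be positive to form a ratio")
    return target_signal / non_target_signal


def binds_specifically(target_signal: float, non_target_signal: float, min_fold: float = 10.0) -> bool:
    """True if the fold difference is strictly greater than min_fold.

    min_fold=10 reflects the basic criterion; 25, 50 or 100 correspond to the
    preferred, more stringent thresholds.
    """
    return specificity_fold(target_signal, non_target_signal) > min_fold
```

Because the criterion says "greater than", an exactly 10-fold difference does not count as specific under this reading.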
- the binding agent may comprise an MHC molecule or part thereof which comprises the peptide binding groove.
- the agent may comprise an anti-peptide antibody.
- The term antibody includes a whole immunoglobulin molecule or a part thereof, or a bioisostere or a mimetic thereof, or a derivative thereof, or a combination thereof. Examples of a part thereof include Fab, F(ab')2 and Fv. Examples of a bioisostere include single chain Fv (scFv) fragments, chimeric antibodies and bifunctional antibodies.
- the term “mimetic” relates to any chemical which may be a peptide, polypeptide, antibody or other organic chemical which has the same binding specificity as the antibody.
- derivative as used herein in relation to antibodies includes chemical modification of an antibody. Illustrative of such modifications would be replacement of hydrogen by an alkyl, acyl, or amino group.
- The binding agent may be an aptamer or a non-immunoglobulin scaffold such as an affibody, an affilin molecule, an AdNectin, a lipocalin mutein, a DARPin, a Knottin, a Kunitz-type domain, an Avimer, a Tetranectin or a transbody.
- the method of measuring the level of nucleosomes comprises contacting the sample with a solid phase comprising a binding agent that detects nucleosomes or a component thereof, and detecting binding to said binding agent.
- the method of measuring the level of nucleosomes may comprise: (a) contacting the sample with a first binding agent which binds to an epigenetic feature of a cell free nucleosome; (b) contacting the sample bound by the first binding agent in step (a) with a second binding agent which binds to cell free nucleosomes; and (c) detecting or quantifying the binding of the second binding agent in the sample.
- Measuring the level of nucleosomes may comprise: (a) contacting the sample with a first binding agent which binds to cell free nucleosomes; (b) contacting the sample bound by the first binding agent in step (a) with a second binding agent which binds to an epigenetic feature of the cell free nucleosome; and (c) detecting or quantifying the binding of the second binding agent in the sample.
- the binding agent is linked to a solid phase. Therefore, the circulating chromatin fragment (e.g. nucleosome) may be bound and isolated from the sample before analysis.
- Methods of the invention may be for use in cancer detection or diagnosis, early cancer screening, residual disease detection, relapse detection, metastasis detection or a combination thereof.
- detecting or “diagnosing” as used herein encompasses identification, confirmation, and/or characterisation of a disease state.
- Methods of detecting, monitoring and of diagnosis according to the invention are useful to confirm the existence of a disease, to monitor development of the disease by assessing onset and progression, or to assess amelioration or regression of the disease.
- Methods of detecting, monitoring and of diagnosis are also useful in methods for assessment of clinical screening, prognosis, choice of therapy, evaluation of therapeutic benefit, i.e. for drug screening and drug development.
- the method described herein is repeated on multiple occasions. This embodiment provides the advantage of allowing the detection results to be monitored over a time period. Such an arrangement will provide the benefit of monitoring or assessing the efficacy of treatment of a disease state. Such monitoring methods of the invention can be used to monitor onset, progression, stabilisation, amelioration, relapse and/or remission.
- test samples may be taken on two or more occasions.
- the method may further comprise comparing the level of the biomarker(s) present in the test sample with one or more control(s) and/or with one or more previous test sample(s) taken earlier from the same test subject, e.g. prior to commencement of therapy, and/or from the same test subject at an earlier stage of therapy.
- the method may comprise detecting a change in the nature or amount of the biomarker(s) in test samples taken on different occasions.
- a change in the level of the biomarker in the test sample relative to the level in a previous test sample taken earlier from the same test subject may be indicative of a beneficial effect, e.g. stabilisation or improvement, of said therapy on the disorder or suspected disorder.
- the method of the invention may be periodically repeated in order to monitor for the recurrence of a disease.
- Methods of the invention may be used to identify a patient suitable for cancer treatment, such as immunotherapy. Methods of the invention may also be for use in a method for monitoring the efficacy of a therapy in a subject having, suspected of having, or being predisposed to cancer.
- Methods for monitoring efficacy of a therapy can be used to monitor the therapeutic effectiveness of existing therapies and new therapies in human subjects and in non-human animals (e.g. in animal models). These monitoring methods can be incorporated into screens for new drug substances and combinations of substances. In a further embodiment, the monitoring of more rapid changes due to fast-acting therapies may be conducted at shorter intervals of hours or days.
- Biomarkers for detecting the presence of a disease are essential targets for discovery of novel targets and drug molecules that retard or halt progression of the disorder. As the level of the biomarker is indicative of disorder and of drug response, the biomarker is useful for identification of novel therapeutic compounds in in vitro and/or in vivo assays. Biomarkers described herein can be employed in methods for screening for compounds that modulate the activity of the biomarker.
- Biomarkers for a disease state permit integration of diagnostic procedures and therapeutic regimes.
- the biomarkers provide the means to indicate therapeutic response, failure to respond, unfavourable side-effect profile, degree of medication compliance and achievement of adequate serum drug levels.
- the biomarkers may be used to provide warning of adverse drug response. Biomarkers are useful in development of personalized therapies, as assessment of response can be used to fine-tune dosage, minimise the number of prescribed medications, reduce the delay in attaining effective therapy and avoid adverse drug reactions.
- biomarker of the invention can be used to titrate the optimal dose, predict a positive therapeutic response and identify those subjects at high risk of severe side effects.
- Biomarker-based tests provide a first line assessment of ‘new’ subjects, and provide objective measures for accurate and rapid diagnosis, not achievable using the current measures.
- Biomarker monitoring methods are also vital as subject monitoring tools, to enable the physician to determine whether relapse is due to worsening of the disorder. If pharmacological treatment is assessed to be inadequate, then therapy can be reinstated or increased; a change in therapy can be given if appropriate. As the biomarkers are sensitive to the state of the disorder, they provide an indication of the impact of drug therapy.
- the subject is suspected of relapse to a cancer.
- Minimal residual disease (MRD) is the name given to small numbers of cancer cells that remain in the person during treatment, or after treatment when the patient is in remission (i.e. has no symptoms or signs of disease).
- MRD is the major cause of relapse in cancer. Methods of the invention are therefore useful in monitoring patients who are suspected of relapse, particularly patients who are in remission from cancer.
- the subject tested using the methods described herein may present with symptoms indicative of cancer, for example the symptoms of a haematological cancer may include anaemia, leucocytosis and/or swollen lymph nodes.
- The subject has a high level of leucocytosis. This may also be referred to as a “high white blood cell count”.
- Haematological cancers typically cause increased proliferation of abnormal white or red blood cells which results in a high white blood cell count.
- leucocytosis is not sufficient to diagnose a patient with a haematological cancer (in particular leukaemia) because it is frequently a sign of an inflammatory response, most commonly the result of infection. Therefore, methods of the invention are able to provide a more specific differential method to identify patients who are likely to be suffering from cancer or an inflammatory condition.
- Cut-off values can be predetermined by analysing results from multiple patients and controls, and determining a suitable value for classifying a subject as with or without the disease. For example, for diseases where the level of biomarker is higher in patients suffering from the disease, then if the level detected is higher than the cut-off, the patient is indicated to suffer from the disease. Alternatively, for diseases where the level of biomarker is lower in patients suffering from the disease, then if the level detected is lower than the cut-off, the patient is indicated to suffer from the disease.
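The cut-off rule described above, including the direction of the comparison, can be written as a short decision function. This is a minimal Python sketch. The function name and the `direction` parameter are hypothetical. The handling of a level exactly equal to the cut-off is an assumption: here such a level is not called positive.

```python
from typing import Literal


def classify_by_cutoff(
    level: float, cutoff: float, direction: Literal["higher", "lower"] = "higher"
) -> bool:
    """Return True if the measured level indicates disease under a cut-off rule.

    direction="higher": disease indicated when the level is above the cut-off.
    direction="lower": disease indicated when the level is below the cut-off.
    """
    if direction == "higher":
        return level > cutoff
    if direction == "lower":
        return level < cutoff
    raise ValueError("direction must be 'higher' or 'lower'")
```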
- the advantages of using simple cut-off values include the ease with which clinicians are able to understand the test and the elimination of any need for software or other aids in the interpretation of the test results. Cut-off levels can be determined using methods in the art.
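One widely used method for choosing a cut-off from patient and control results is to maximise Youden's J statistic (sensitivity + specificity - 1). The source does not say which method is used, so the Python sketch below is one assumed approach, not the inventors' procedure. It also assumes that higher levels indicate disease and that the observed values serve as candidate cut-offs.

```python
from typing import Optional, Sequence, Tuple


def youden_cutoff(patients: Sequence[float], controls: Sequence[float]) -> Tuple[Optional[float], float]:
    """Pick the cut-off maximising Youden's J = sensitivity + specificity - 1.

    A sample is called positive when its level is strictly above the cut-off.
    Returns (cutoff, J).
    """
    if not patients or not controls:
        raise ValueError("need at least one patient and one control")
    best_cutoff: Optional[float] = None
    best_j = -1.0
    for c in sorted(set(patients) | set(controls)):
        sensitivity = sum(x > c for x in patients) / len(patients)
        specificity = sum(x <= c for x in controls) / len(controls)
        j = sensitivity + specificity - 1
        if j > best_j:  # keep the first (lowest) cut-off reaching the maximum
            best_cutoff, best_j = c, j
    return best_cutoff, best_j
```

Alternative criteria, such as fixing a minimum specificity for a screening test, would change which candidate is chosen.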
- Control subjects may be selected on a variety of bases, which may include, for example, subjects known to be free of the disease or subjects with a different disease (for example, for the investigation of differential diagnosis).
- the “control” may comprise a healthy subject, a non-diseased subject and/or a subject without a haematological cancer. Comparison with a control is well known in the field of diagnostics.
- Both positive and negative controls may be used.
- the presence of a cancer disease in a subject may be confirmed by comparison of results with known cancer controls (positive control) as well as with known disease free or non-cancer controls (negative control).
- the method additionally comprises determining at least one clinical parameter for the patient.
- This parameter can be used in the interpretation of results.
- Clinical parameters may include any relevant clinical information, for example, without limitation, age, gender, weight, Body Mass Index (BMI), smoking status, temperature and dietary habits. In one embodiment, the clinical parameter is selected from the group consisting of: age, sex and body mass index (BMI).
- The method of the invention is performed to identify a subject at high risk of having a cancer and therefore in need of further testing (i.e. further cancer investigations).
- the further testing may involve one or more of: biopsy (such as bone marrow biopsy or lymph node biopsy), cytogenetic testing, immunophenotyping, CT scanning, X-ray (in particular chest X-ray to identify swollen lymph nodes) and/or lumbar puncture.
- Methods and biomarkers described herein may be used to identify if a patient is in need of a biopsy, in particular a bone marrow or lymph node biopsy (e.g. for patients with suspected haematological cancer). Therefore, according to a further aspect of the invention there is provided a method of identifying a patient in need of a biopsy comprising performing the method of the invention and using the results obtained to identify whether the patient is in need of a biopsy.
- the subject may be a human or an animal subject.
- the subject is a human subject.
- the subject is a (non-human) animal subject.
- the animal is a companion animal (also referred to as a pet or domestic animal).
- Companion animals include, for example dogs, cats, rabbits, ferrets, horses, cows, or the like.
- the companion animal is a dog or cat, particularly a dog. The methods described herein may be performed in vitro, or ex vivo.
- Plasma samples were procured from the following commercial biobanks: Discovery Life Sciences (DLS), Alternative Research and Bay Biosciences. Fifty-two self-reported healthy individuals were selected for Illumina sequencing based on donor age, with ages evenly distributed from 40 to 90 years. The 207 cancer cases were selected based on Stage III/IV status and diversity of cancer types. Healthy samples for Nanopore sequencing were selected based on the volume of plasma available and the age of the donor, with 10 samples in each of the age groups 40-49, 50-59 and 60-69 years, 15 samples in the age group 70-79 and 5 samples in the age group 80-89. Healthy samples were selected solely on the absence of a cancer diagnosis. Cancer samples were selected based on late-stage diagnosis (III or IV), type of cancer, treatment status (untreated) and large sample volume.
- Plasma samples were received and subsequently stored at -80°C. Aliquots were thawed at room temperature (RT) for up to 2 hours, depending on the aliquot volume. Plasma was then spun at 14000 x g for 2 minutes at room temperature and transferred to a new tube avoiding any pelleted fraction.
- Nucleosome levels in plasma were quantified using the Nu.Q® H3.1 assay developed by Volition for use on the IDS-i10 instrument, using 250 µL of plasma according to the manufacturer’s recommendations. Samples were run in duplicate or triplicate. Raw RLU (relative light units) were converted to nucleosome concentrations (ng/mL) using the provided kit standards, and replicate measurements were averaged.
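The conversion of raw RLU to concentration using kit standards can be sketched as follows. The source does not give the curve-fitting model used for the Nu.Q® H3.1 assay, so this Python sketch makes its own assumptions. It uses simple piecewise-linear interpolation between standards and clamps readings outside the standard range. It also converts each replicate before averaging, since the source does not say whether RLU or concentrations were averaged. The function names and standard values are hypothetical.

```python
from bisect import bisect_left
from statistics import mean
from typing import Sequence


def rlu_to_conc(rlu: float, std_rlu: Sequence[float], std_conc: Sequence[float]) -> float:
    """Convert one RLU reading to ng/mL by piecewise-linear interpolation.

    Assumes standards are sorted by increasing RLU and that RLU increases
    monotonically with concentration. Out-of-range readings are clamped.
    """
    if len(std_rlu) != len(std_conc) or len(std_rlu) < 2:
        raise ValueError("need at least two paired standards")
    if rlu <= std_rlu[0]:
        return std_conc[0]
    if rlu >= std_rlu[-1]:
        return std_conc[-1]
    i = bisect_left(std_rlu, rlu)  # first standard with RLU >= reading
    x0, x1 = std_rlu[i - 1], std_rlu[i]
    y0, y1 = std_conc[i - 1], std_conc[i]
    return y0 + (y1 - y0) * (rlu - x0) / (x1 - x0)


def sample_conc(replicate_rlus: Sequence[float], std_rlu: Sequence[float], std_conc: Sequence[float]) -> float:
    """Convert each duplicate/triplicate reading, then average the concentrations."""
    return mean(rlu_to_conc(r, std_rlu, std_conc) for r in replicate_rlus)


# Invented standard curve for illustration only.
STD_RLU = [1000.0, 5000.0, 20000.0]
STD_CONC = [0.0, 50.0, 200.0]
```

Immunoassay analysers often fit a nonlinear model (such as a four-parameter logistic curve) instead. In that case, the interpolation function would be replaced while the replicate-averaging step stays the same.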
- DNA from the basic characterization step was used for Illumina Sequencing. If more was needed, additional DNA from 1-5 mL of plasma was extracted using the QIAamp® Circulating Nucleic Acid Kit (Qiagen, catalogue # 55114).
- Illumina sequencing libraries were constructed using SRSLY® PicoPlus DNA NGS Library Preparation Kit (ClaretBio - Cat: CBS-K250B). Libraries were sequenced by Discovery Life Sciences using the NovaSeq 6000 S4 200 cycle flow cell. Genome mapping to hg38 was performed using the Illumina BaseSpace DRAGEN pipeline.
- methylation BED files were created using modkit v. 0.1.5 with the command “modkit pileup --cpg --combine-strands --ignore h --filter-threshold 0.9 --bedgraph”.
- For methylation deconvolution, Minimap BAM files were used directly (details below).
- Fragment lengths were extracted from ONT Minimap BAMs using the script “fragmentationReports.py”, which uses Pysam and defines the fragment length based on the primary alignment, calculating the difference between the start and end coordinates on the reference genome.
- Fragment length distributions for PCA analysis were created by defining bins as 10^x, where 10^x spans 50 to 50,000 bp. Raw fragment lengths are log10-transformed and assigned to the closest bin by rounding to the nearest increment of 0.01 (i.e., adjacent bins differ by a factor of 10^0.01). The proportion of fragments is defined as the number of fragments in a given bin divided by the total number of fragments in all bins. For the proportion of total DNA, the lengths in base pairs of all fragments in a bin are summed to give the bin base pair total, and each bin total is then divided by the sum of the base pair totals across all bins.
- Minimap BAMs were filtered using the script “stratifyBamByFraglen.py”, which uses the same Pysam code as above to calculate the fragment length of each read and writes the read to the output BAM file only if its length falls within the specified range.
- The code to perform this analysis is available as the script “fraglens_to_histogram.py”.
- ONT Minimap BAMs were processed using the script “fragmentationReports.py”, which uses Pysam to perform the following analysis. We collect all reads that contain a perfect match to the final 5 base pairs of the ONT adapter sequence (“CACCT”) as the final sequence in the soft-clipped portion of the read. We then collect the first 5 base pairs of the aligned portion of the read, which is by definition adjacent to the adapter sequence. This is the “end motif”. If any of the adjacent 5 base pairs have a basecalling quality less than 20, the end is not counted. We count each of the two ends of each fragment as independent observations.
- CACCT ONT adapter sequence
- DNA proportions were calculated for fragment length bins as described above. PCA was performed using the sklearn.decomposition Python module (scikit-learn) on column-normalized versions of the fragment length distributions. These were defined by taking the proportion of DNA in each bin for a given sample and calculating a z-score based on the mean and standard deviation of that bin across all 21 healthy samples (not including the 3 “unspun” healthy samples).
- Oxford Nanopore Technologies (ONT) sequencing may be used to identify DNA methylation states, which can be used to determine the cell of origin (COO) of circulating plasma DNA.
- ONT Oxford Nanopore Technologies
- Each cancer and healthy sample was represented as the fraction of DNA sequenced in length bins from 50 to 60,000 bp. Fragment lengths are often quantified as the fraction of sequencing reads in each bin. However, we reasoned that for ONT sequencing it would be preferable to analyze the amount of DNA present in each bin rather than the sequence counts (thus, a 1 kb fragment is weighted 10x more than a 100 bp fragment). One reason is that ONT sequencing is asynchronous, so longer DNA fragments take proportionally longer to sequence than short fragments (unlike short-read sequencing). We also reasoned that quantifying the proportion of DNA rather than fragment count would better reflect the overall cell of origin of the DNA, since one cell yields a constant number of base pairs, not a constant number of sequence fragments.
- PCA Principal Component Analysis
- Figure 2A Principal Component Analysis
- The first component (PC1) separated the majority of the cancers (in yellow) from a set of 7 other cancers (in red); most of these PC1-positive cancers were AML.
- The second component (PC2) separated most cancers from the healthy samples. Because our aim is to identify aberrations relative to normal variation in healthy individuals, we z-score normalized all bins by their mean and standard deviation in healthy samples and re-performed PCA, which better resolved cancer vs. healthy samples as well as the set of 7 outlier cancers (Figure 2B). The top two components of this bin-normalized PCA captured 87% of the variance.
- ONT sequencing can analyze long cfDNA fragments, but some of these may represent contamination from genomic DNA released as part of sample processing, especially from cell lysis during freezing and thawing of samples.
- Our standard processing protocol includes a post-thaw high-speed spin to reduce the number of such fragments (Figure 3A).
- Figure 3A: the “healthy ultralong” samples (Figure 2C-D and Figure 3B). While this pattern was rare and thus not captured by the first 4 components of our PCA analysis, we found it interesting because the fragments were longer than the PC1 “laddered” (or “hypofragmented”) fragments observed in AML and other cancers.
- PC1-high The hypofragmented cancer group (PC1-high) was strongly distinguished from the 3 healthy samples with likely genomic contamination, which are high in PC2 (Figure 4A, bottom).
- PC2 is defined by a size range of 7,500-53,000 bp. Consistent with our earlier analysis, the post-spin samples had much lower PC2 scores than the unspun samples (Figure 4A, PC scores), and much less total cfDNA in the 7,500-53,000 bp bin (Figure 4A “Frac DNA”, 4-9% for post-spin samples and 38-48% for pre-spin samples). Aside from these three sample groups, all other healthy volunteers and cancers had low values for PC1 and PC2 (Figure 4A, “PC scores”).
- PC3 The third principal component (PC3) defined a hyperfragmented set of 17 cancers (Figure 4C, “PC3-high cancers”). These had elevated cfDNA in the range 75-145 bp and 245-295 bp (corresponding to short mono-nucleosomes and short di-nucleosomes, respectively). Longer fragments, including those greater than 1 kb, were under-represented. Interestingly, PC3-high cancers tended to have lower frequencies of the DNASE1L3-associated CC end motifs (Figure 4C).
- PCA analysis in this Example included the 3 unspun samples. To ensure that these were not influencing the clustering, we performed PCA with these samples excluded. This produced nearly identical versions of PC1 and PC3 (as the first two components), and identical grouping of all cancer samples.
- EXAMPLE 5 Origins of short mono and dinucleosome fragments (75-145bp, 245-295 bp)
- PC3 mono and di-nucleosomes
- DNA methylation allows for “deconvolution” of plasma DNA into its constituent cell types using reference methylation atlases.
- For the 5 samples with detectable CNAs we were able to identify strong signals from the correct cell of origin, compared to only 1 of the 5 samples without CNAs (Figure 5A). While our ability to accurately deconvolute cell types at this sequencing depth is limited, this does provide additional evidence that the samples without CNAs had low cancer content.
- The absolute fraction of the cancer cell of origin (COO) was never above 50%, consistent with CNA analysis indicating that a substantial part of the elevated cfDNA is not attributable to cancer DNA.
- DNASE1L3-associated CC motifs were also investigated in long fragments.
- The results shown in Figure 6 focus on the differences between the PC1 group and the PC2 group. All seven samples from the PC1 class and six samples from the PC2 class were investigated. The six PC2-class samples are the three healthy individuals processed with the spun workflow, which still showed a high level of PC2 fragments, and the same three samples processed unspun, which also displayed PC2 fragments. Each sample is shown as a row of the heat map in Figure 6, which plots CC end motif frequency as a function of fragment length.
- Laddered fragments in this size range can be released by necrotic cells (lingerer et al. 2021). It is also possible that active DNASE1L3 during blood handling could fragment longer DNA from either necrotic or lysed cells.
- Nucleosome concentrations were quantified using the Nu.Q® H3.1 ELISA assay.
- DNA libraries were prepared and sequenced using Oxford Nanopore Technologies Native Barcoding Kit v. 14, PromethION R10.4.1 flow cells, and the P2 Solo sequencer. Base and modification calling were performed using the ONT Dorado basecaller v. 0.4.1 with the dna_r10.4.1_e8.2_400bps_hac@v4.2.0_5mCG_5hmCG modification calling model. DNA modifications were extracted using Oxford Nanopore modkit 0.1.13. CpG islands for canFam5 were taken from the UCSC genome browser.
- The commonly hypermethylated human CpG island promoters described above were mapped to orthologous gene promoters in the canFam5 genome using orthologous gene mappings from the Zoonomia TOGA dataset (Kirilenko, 2023). This set of promoters was then filtered for those where the promoter (transcription start site) overlapped a CpG island in the canFam5 UCSC CpG island track. Finally, we removed CpG island promoters longer than 20 kb in the canFam5 genome, since manual inspection showed that most reflected canFam5 genome assembly errors within repetitive telomeric regions. The canine CpG island methylation signature was the mean of the 1,272 resulting CpG islands.
- CGI CpG island
- EXAMPLE 9 Hyperfragmentation is a strong predictor of healthy vs. cancer status and inflammation vs. cancer status in client-owned dogs
- Inflammation is often associated with increased levels of circulating nucleosomes and DNA in humans and dogs, including in the canine inflammatory condition Pyometra (a uterine infection that, while most commonly associated with dogs, has also been identified in other animals such as cattle, swine, cats and many rodents).
- INF high circulating nucleosomes
- LSA treatment naive lymphoma samples
- HSA treatment naive hemangiosarcoma samples
- The first principal component (PC1) had values that were positively correlated with the fraction of fragments less than 150 bp in length, and thus we termed it the “hyperfragmentation” component.
- Figure 10B global methylation
- Figure 10C tumor fraction from ichorCNA analysis
- CGI CpG island methylation
- MLR multivariable logistic regression
- EXAMPLE 10 Using Hyperfragmentation and cancer DNA markers to monitor dogs during and after treatment
- DNA libraries were prepared and sequenced using Oxford Nanopore Technologies Native Barcoding Kit v. 14, PromethION R10.4.1 flow cells, and the P2 Solo sequencer, to a “shallow” depth of 1-2x genomic coverage.
- Base and modification calling were performed using the ONT Dorado basecaller v. 0.4.1 with the dna_r10.4.1_e8.2_400bps_hac@v4.2.0_5mCG_5hmCG modification calling model.
- DNA modifications were extracted using Oxford Nanopore modkit 0.1.13. CpG islands for canFam5 were taken from the UCSC genome browser.
- The “--modbam_qual 0.9” setting was used to filter out any modified-base call with a modification probability score less than 0.9.
- NETs Neutrophil Extracellular Traps
- Kirilenko et al. 2023 Integrating gene annotation with orthology inference at scale. Science 380: eabn3107.
- Dnase1l3 deletion causes aberrations in length and end-motif frequencies in plasma DNA. Proc Natl Acad Sci 116: 641-649.
- CelFiE-ISH a probabilistic model for multi-cell type deconvolution from single-molecule DNA methylation haplotypes. Genome Biol. 25: 151.
- Van Der Pol et al. 2023 Real-time analysis of the cancer genome and fragmentome from plasma and urine cell-free DNA using nanopore sequencing. EMBO Mol Med 15: e17282.
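The Nu.Q® H3.1 quantification described above converts raw RLU to nucleosome concentrations (ng/mL) with the kit standards and then averages duplicate or triplicate measurements. The kit's curve-fitting model is not stated, so the following is only a minimal sketch that assumes piecewise-linear interpolation between standards via `numpy.interp`; the function names and standard values are hypothetical.

```python
import numpy as np


def rlu_to_ng_per_ml(rlu, std_rlu, std_conc):
    """Map RLU readings to nucleosome concentration (ng/mL) via a standard curve.

    Assumption: piecewise-linear interpolation between kit standards. Readings
    outside the standard range are clamped to the end points by np.interp.
    """
    std_rlu = np.asarray(std_rlu, dtype=float)
    std_conc = np.asarray(std_conc, dtype=float)
    order = np.argsort(std_rlu)  # np.interp needs increasing x values
    return np.interp(np.asarray(rlu, dtype=float), std_rlu[order], std_conc[order])


def sample_concentration(replicate_rlu, std_rlu, std_conc):
    """Convert each replicate (duplicate or triplicate) and average the results."""
    return float(np.mean(rlu_to_ng_per_ml(replicate_rlu, std_rlu, std_conc)))


if __name__ == "__main__":
    standards_rlu = [100, 1_000, 10_000]  # hypothetical standard readings
    standards_ng = [0.0, 10.0, 100.0]     # hypothetical standard concentrations
    print(sample_concentration([500, 600], standards_rlu, standards_ng))
```

A commercial assay may instead use a four- or five-parameter logistic fit; only the order of operations (convert each replicate, then average) is taken from the text.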
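The fragment-length rule used by “fragmentationReports.py” and “stratifyBamByFraglen.py” (reference span of the primary alignment, with an optional length window) can be sketched as follows. The scripts themselves are not shown, so this is an illustrative reimplementation: the helper names are hypothetical, the length window is assumed to be inclusive, and only documented pysam attributes and calls (`reference_start`, `reference_end`, `AlignmentFile`, `fetch(until_eof=True)`, `write`) are used.

```python
from types import SimpleNamespace
from typing import Iterable, Iterator, Optional


def fragment_length(read) -> Optional[int]:
    """Reference span of a primary alignment, or None for other records.

    `read` only needs the attributes pysam.AlignedSegment exposes:
    is_unmapped, is_secondary, is_supplementary, reference_start, reference_end.
    """
    if read.is_unmapped or read.is_secondary or read.is_supplementary:
        return None
    # pysam uses 0-based coordinates and reference_end points one past the
    # last aligned base, so end - start is the aligned length on the reference.
    return read.reference_end - read.reference_start


def reads_in_length_range(reads: Iterable, min_len: int, max_len: int) -> Iterator:
    """Yield primary alignments whose fragment length lies in [min_len, max_len]."""
    for read in reads:
        length = fragment_length(read)
        if length is not None and min_len <= length <= max_len:
            yield read


def stratify_bam(in_path: str, out_path: str, min_len: int, max_len: int) -> None:
    """Write reads in the length range to a new BAM (requires pysam)."""
    import pysam  # imported here so the helpers above can run without pysam

    with pysam.AlignmentFile(in_path, "rb") as src, \
            pysam.AlignmentFile(out_path, "wb", template=src) as dst:
        for read in reads_in_length_range(src.fetch(until_eof=True), min_len, max_len):
            dst.write(read)


if __name__ == "__main__":
    # Hypothetical stand-ins for pysam records, for illustration only.
    def fake(start, end, secondary=False):
        return SimpleNamespace(reference_start=start, reference_end=end,
                               is_unmapped=False, is_secondary=secondary,
                               is_supplementary=False)

    reads = [fake(0, 167), fake(0, 9_000), fake(0, 9_000, secondary=True)]
    print([fragment_length(r) for r in reads])                     # [167, 9000, None]
    print(len(list(reads_in_length_range(reads, 7_500, 53_000))))  # 1
```

For example, `stratify_bam("sample.bam", "pc2_range.bam", 7_500, 53_000)` would keep only fragments in the PC2 size range (file names hypothetical).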
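The log-scale binning used for the PCA inputs, with both the fragment-count and the base-pair-weighted proportions, is sketched below. It assumes bin keys of log10(length) rounded to two decimals and that fragments outside 50-50,000 bp are discarded; the function names are illustrative.

```python
import math
from collections import Counter, defaultdict


def log_bin(length: int) -> float:
    """Bin key: log10(length) rounded to 2 decimals, so bins are 10**0.01 apart."""
    return round(math.log10(length), 2)


def fragment_length_profile(lengths, min_len=50, max_len=50_000):
    """Return (fraction of fragments, fraction of total DNA) per log10 bin.

    Assumption: fragments outside [min_len, max_len] are dropped.
    """
    counts: Counter = Counter()
    bp: defaultdict = defaultdict(int)
    for length in lengths:
        if min_len <= length <= max_len:
            key = log_bin(length)
            counts[key] += 1
            bp[key] += length  # a fragment contributes all of its base pairs
    n_total, bp_total = sum(counts.values()), sum(bp.values())
    if n_total == 0:
        return {}, {}
    frac_fragments = {k: counts[k] / n_total for k in sorted(counts)}
    frac_dna = {k: bp[k] / bp_total for k in sorted(bp)}
    return frac_fragments, frac_dna


if __name__ == "__main__":
    frags, dna = fragment_length_profile([100, 100, 1_000])
    print(frags)
    print(dna)
```

With two 100 bp fragments and one 1 kb fragment, the 1 kb bin holds one third of the fragments but five sixths of the base pairs, which is the DNA-weighted view argued for above.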
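The end-motif extraction described for “fragmentationReports.py” can be sketched for the read-start end as follows. This is a hypothetical reimplementation: it assumes the adapter sits in a leading soft clip of the sequence as stored in the BAM, interprets the quality check as applying to the 5 motif bases, and does not handle the opposite fragment end or reverse-strand reads, whose stored sequence is reverse-complemented. In pysam the inputs would come from `query_sequence`, `query_qualities` and `cigartuples`.

```python
from typing import Optional, Sequence

ADAPTER_TAIL = "CACCT"  # final 5 bp of the ONT adapter, per the text
MOTIF_LEN = 5
MIN_BASE_QUAL = 20
BAM_CSOFT_CLIP = 4      # CIGAR operation code for a soft clip


def leading_soft_clip(cigartuples) -> int:
    """Length of the soft clip at the start of the read (0 if none)."""
    if cigartuples and cigartuples[0][0] == BAM_CSOFT_CLIP:
        return cigartuples[0][1]
    return 0


def start_end_motif(seq: str, quals: Sequence[int], clip_len: int) -> Optional[str]:
    """5-bp end motif next to the adapter at the start of the read, or None.

    The soft-clipped prefix must end exactly with ADAPTER_TAIL, and every base of
    the motif must have basecalling quality >= MIN_BASE_QUAL.
    """
    if not seq[:clip_len].endswith(ADAPTER_TAIL):
        return None
    motif = seq[clip_len:clip_len + MOTIF_LEN]
    motif_quals = quals[clip_len:clip_len + MOTIF_LEN]
    if len(motif) < MOTIF_LEN or min(motif_quals) < MIN_BASE_QUAL:
        return None
    return motif


if __name__ == "__main__":
    seq = "GGTACACCT" + "CCCAG" + "TTGACA"
    quals = [30] * len(seq)
    clip = leading_soft_clip([(BAM_CSOFT_CLIP, 9), (0, 11)])
    print(start_end_motif(seq, quals, clip))  # CCCAG
```

Because each fragment has two ends counted independently, a full implementation would also extract the motif at the other end and tally both.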
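The healthy-referenced column normalization and PCA described above can be sketched with NumPy. The text used `sklearn.decomposition`; here PCA is written out as an SVD of the centred matrix, which gives the same components up to sign. The use of the sample standard deviation (ddof=1) and the handling of constant bins are assumptions, and the demo data are random.

```python
import numpy as np


def zscore_to_healthy(profiles: np.ndarray, healthy: np.ndarray) -> np.ndarray:
    """Column-normalize a samples x bins matrix by healthy-sample mean and SD."""
    ref = profiles[healthy]
    mu = ref.mean(axis=0)
    sd = ref.std(axis=0, ddof=1)     # sample SD (assumption; ddof is not stated)
    sd = np.where(sd == 0, 1.0, sd)  # leave constant bins unscaled
    return (profiles - mu) / sd


def pca(X: np.ndarray, n_components: int = 3):
    """PCA by SVD of the column-centred matrix.

    Returns (scores, loadings, explained_variance_ratio). This matches what
    sklearn.decomposition.PCA computes, up to the sign of each component.
    """
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    ratio = S**2 / np.sum(S**2)
    return scores, Vt[:n_components], ratio[:n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    profiles = rng.dirichlet(np.ones(20), size=30)  # 30 samples x 20 bins
    healthy = np.arange(30) < 21                    # first 21 treated as healthy
    scores, loadings, ratio = pca(zscore_to_healthy(profiles, healthy), 2)
    print(scores.shape, loadings.shape, ratio.round(3))
```

Passing a boolean mask that excludes the unspun samples reproduces the choice of reference set described in the text.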
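The Figure 6 heat map plots, for each sample, the frequency of CC end motifs as a function of fragment length. A per-sample row could be computed as in the sketch below, which assumes (fragment length, end motif) pairs produced by an end-motif extraction step and log10 bins rounded to one decimal; the bin width and function name are assumptions, not taken from the text.

```python
import math
from collections import defaultdict


def cc_frequency_by_length(ends, decimals=1):
    """Fraction of fragment ends whose motif starts with "CC", per length bin.

    ends: iterable of (fragment_length, end_motif) pairs, one per counted end,
    so a fragment can contribute two observations.
    Bins are log10(length) rounded to `decimals` places (an assumption).
    """
    total = defaultdict(int)
    cc = defaultdict(int)
    for length, motif in ends:
        key = round(math.log10(length), decimals)
        total[key] += 1
        cc[key] += motif.startswith("CC")  # bool counts as 0 or 1
    return {k: cc[k] / total[k] for k in sorted(total)}


if __name__ == "__main__":
    ends = [(150, "CCCAG"), (160, "ACGTA"), (10_000, "CCTGA"), (10_000, "CCAAA")]
    print(cc_frequency_by_length(ends))  # {2.2: 0.5, 4.0: 1.0}
```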
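The canine CpG island selection described above (orthologous promoters whose TSS overlaps a canFam5 CGI, islands over 20 kb removed, signature as the mean across islands) can be sketched as follows. Parsing of the TOGA mappings, the UCSC CGI track and modkit output is omitted, the per-island averaging of CpG methylation is an assumption, and the brute-force overlap search is for illustration only; an interval index would be used at genome scale.

```python
from statistics import mean
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style half-open


def select_promoter_cgis(promoters: List[Tuple[str, int]], cgis: List[Interval],
                         max_len: int = 20_000) -> List[Interval]:
    """CpG islands that contain an orthologous TSS and are at most max_len bp."""
    kept = set()
    for chrom, tss in promoters:
        for cgi_chrom, start, end in cgis:
            if cgi_chrom == chrom and start <= tss < end and end - start <= max_len:
                kept.add((cgi_chrom, start, end))
    return sorted(kept)


def cgi_signature(cgis: List[Interval], cpg_meth: Dict[Tuple[str, int], float]) -> float:
    """Mean over CpG islands of each island's mean CpG methylation fraction."""
    per_island = []
    for chrom, start, end in cgis:
        values = [m for (c, pos), m in cpg_meth.items()
                  if c == chrom and start <= pos < end]
        if values:  # islands with no covered CpGs are skipped (assumption)
            per_island.append(mean(values))
    return mean(per_island) if per_island else float("nan")


if __name__ == "__main__":
    islands = select_promoter_cgis([("chr1", 150)], [("chr1", 100, 300)])
    print(islands, cgi_signature(islands, {("chr1", 120): 0.8, ("chr1", 200): 0.6}))
```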
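The canine hyperfragmentation component (PC1) was positively correlated with the fraction of fragments shorter than 150 bp. PC1 itself comes from PCA over fragment-length bins; the simpler summary it tracks can be computed directly, as in this illustrative sketch (function name hypothetical).

```python
from typing import Iterable


def short_fragment_fraction(lengths: Iterable[int], threshold: int = 150) -> float:
    """Fraction of fragments shorter than `threshold` bp (150 bp per the text)."""
    lengths = list(lengths)
    if not lengths:
        raise ValueError("no fragment lengths supplied")
    return sum(1 for n in lengths if n < threshold) / len(lengths)


if __name__ == "__main__":
    print(short_fragment_fraction([100, 140, 150, 200]))  # 0.5
```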
Landscapes
- Chemical & Material Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Organic Chemistry (AREA)
- Health & Medical Sciences (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Engineering & Computer Science (AREA)
- Immunology (AREA)
- Analytical Chemistry (AREA)
- Genetics & Genomics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Biophysics (AREA)
- Biochemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Molecular Biology (AREA)
- Biotechnology (AREA)
- Microbiology (AREA)
- Pathology (AREA)
- Chemical Kinetics & Catalysis (AREA)
- Hospice & Palliative Care (AREA)
- Oncology (AREA)
- Cell Biology (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention relates to methods of determining a tissue of origin, a cell type of origin, origination from a cancerous or immune cell, or a combination thereof of cell-free DNA by using fragment length data.
Description
METHOD FOR DETERMINING THE ORIGIN OF CIRCULATING DNA
FIELD OF THE INVENTION
[1] The present invention relates to methods of determining the origin of cell free (circulating) DNA using a nanopore sequencer.
BACKGROUND OF THE INVENTION
[2] Identification and characterization of circulating tumor DNA (ctDNA) in cancer patients has important functions in oncology, including early detection and monitoring for recurrence. When bulk cell-free DNA is sequenced without enrichment using Next Generation Sequencing (NGS), the fraction of reads corresponding to ctDNA is proportional to the fraction of tumor DNA in circulation, whereas the remainder comes from other cell types, typically derived from blood or other sites of collateral damage.
[3] One limitation of previous ctDNA studies is that they generally used short-read sequencing, which is strongly biased toward fragments of 700 bp or less. This makes some of the DNA measured by bulk methods “invisible”. Recent studies using long-read technologies indicate that fragments much longer than 1 kb are present in the plasma of healthy individuals and cancer patients.
[4] Fragmentation patterns can also be used to understand the origin of circulating DNA, although this is complicated by the fact that DNase activity is not restricted to the cells of origin but also occurs during circulation. Unlike bisulfite sequencing, which causes additional fragmentation of input DNA and has been used in many circulating cancer DNA studies, nanopore sequencing preserves native fragmentation properties. One of the important global fragmentation changes associated with cancer is the shortening of typical mono-nucleosome and di-nucleosome fragments. While this shortening disproportionately affects DNA derived from cancer relative to other non-cancer cell types, it appears to affect non-cancer DNA as well.
[5] There remains a need in the art to provide improved methods for determining the origin of circulating DNA. We now report methods for determining whether circulating chromatin fragments present in a blood sample have a cancer origin or non-cancer origin (e.g. an immune cell origin).
SUMMARY OF THE INVENTION
[6] According to a first aspect, there is provided a method of determining a tissue of origin, a cell type of origin, origination from a cancerous or immune cell, or a combination thereof of cell-free DNA (cfDNA), the method comprising:
(i) providing a body fluid sample comprising cfDNA;
(ii) passing said cfDNA through a nanopore sequencer to produce a sequence of said cfDNA and obtain fragment length data of said cfDNA; and
(iii) identifying for said cfDNA passed through the nanopore sequencer a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof based on the fragment length data; thereby determining a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof of cfDNA.
[7] According to a further aspect of the invention, there is provided the use of a size profile obtained by nanopore sequencing of all circulating chromatin fragments present in a body fluid sample as a biomarker for the diagnosis of cancer.
[8] According to a further aspect of the invention, there is provided the use of a size profile obtained by nanopore sequencing of all circulating chromatin fragments present in a body fluid sample to identify a patient suitable for cancer treatment, such as immunotherapy.
BRIEF DESCRIPTION OF THE FIGURES
[9] FIGURE 1: Nanopore sequencing of a selected subset of 37 cancer samples and 21 healthy samples. (A) ichorCNA tumor fractions for all samples sequenced using Oxford Nanopore Technologies (ONT). (B) Comparison of tumor fractions derived from Illumina sequencing (x axis) and ONT (y axis). 5/16 cancer samples with detectable CNA in ONT (y axis>0) were not detectable by Illumina (x axis=0). Healthy samples are not visible; all had values of (0, 0). (C) H3.1 nucleosome levels vs. Illumina tumor fractions or ONT tumor fractions, for the same set of samples. (D) Comparison of ONT ichorCNA cancer detection and DNA methylation-based cancer detection. “ichorCNA” indicates whether CNAs were detected by ichorCNA. Methylation Cell of Origin (“Meth. COO”) indicates that the correct cancer cell of origin had the highest normalized cell type proportion, as shown in (E) for the six relevant cell types. “Global hypometh” indicates a cancer sample had significantly lower genome-wide methylation than healthy samples (see Methods). (E) Details of COO analysis showing z-score normalized contributions for six relevant cell types in the analysis.
[10] FIGURE 2: Identification of cancer-associated fragment length features. (A-B) Principal component analysis of all cancer and healthy samples, using either unnormalized (A) or z-score normalized (B) counts for each fragment length bin. Lines indicate values above 50 for PC1 and values above -2 for PC2. (C) Unnormalized counts of all cancer and healthy samples, ordered by PC1 and then PC2. Red line selects PC2 values greater than -2. Loadings for each of the PCs are shown above ONT samples. Three samples with significant fragments over 10kb are separated out as “Healthy-Ultralong” samples. (D) Same as C, but using bin counts z-score normalized using mean and standard deviation of healthy samples.
[11] FIGURE 3: (A) Typical “spun” plasma processing workflow and alternative “unspun” workflow. (B) All ONT samples, showing proportion of DNA in fragments greater than 7.5kb in length. (C) Bioanalyzer trace showing one of the Healthy-Ultralong samples, together with a pre-centrifugation (“Unspun”) aliquot of the same plasma. (D) H3.1 concentration and (E) cfDNA concentration for 3 outlier healthy samples and 4 other healthy samples. (F-G) Repeated PCA analysis run including the three Unspun counterparts of the three Healthy-Ultralong samples. The right side of (E) shows the fraction of sequenced DNA in each of the fragment length ranges defined by each of the 3 principal components (right). (H-J) Proportion of fragments greater than 500bp (H), 1,000 bp (I) and 7,500 bp (J). (K) Proportion of fragment ends starting with CC in ONT sequencing, as a function of fragment length, for 4 typical healthy volunteers (black) and the three outlier healthy volunteers processed with the spun workflow (red) and the unspun workflow (blue). (L) The proportion of fragment ends starting with CCCA in ONT sequencing, in typical length fragments (100-400 bp, x axis) vs. fragments greater than 7.5kb (y axis). In (H-J), ** indicates p<0.002 by ratio paired t-test, two-tailed.
[12] FIGURE 4: Unsupervised clustering by fragment length. (A) All samples ordered by first three principal components. The first 3 sample groups are cancer samples with the cancer type indicated by color, and the second 3 groups are healthy samples. Bar plots on the left show total cfDNA concentration (ng/mL), total plasma H3.1 nucleosome concentration (ng/mL), tumor DNA fraction from ichorCNA from ONT sequencing (“CNA ONT”) and Illumina sequencing (“CNA ILLUMINA”), and the fraction of ONT fragments ending with the motif CCCA. In all bar plots, higher bars are also indicated with darker colors. The heatmap to the right of these bar plots shows the proportion of DNA sequenced in different fragment length bins, on a log scale from 50 bp to 50,000 bp. The left heatmap shows the unnormalized proportion of sequenced DNA in each bin, and the right heatmap shows z-scores which are column normalized according to the mean and standard deviation across all spun healthy samples. Above the heatmaps are the PC loadings for each of the first 3 PCs. The heatmap to the right shows the same data, but each fragment length bin/column is z-score normalized according to the mean and standard deviation across all healthy samples (not including “pre-spin” samples). The bar plots on the right show the eigenvalues for the 3 principal components (“PC scores”), and the percentage of sequenced DNA in the ranges defined by those PCs (“Frac DNA”). One sample with a sub-threshold but positive PC1 value is marked with an asterisk. (B) CC end motif frequencies as a function of fragment length, for the PC1 high and PC2 high samples. (C) CC end motif frequencies as a function of fragment length, for the 6 cancer samples with the highest PC3 scores, and the 6 cancer samples with the lowest PC3 scores.
[13] FIGURE 5: Origins of short mono- and di-nucleosome fragments (75-145 bp, 245-295 bp). (A) All samples, showing Cell of Origin (COO) scores for relevant cell types. 10 samples that had positive PC3 scores and >1.5M reads are shown with asterisk or “X”. Those with asterisk have a top COO cell type matching the cancer type, whereas the correct cell type was not detected in the four marked with “X”. (B) ichorCNA tumor fraction for the 10 samples described above, for only the typical mononucleosome fragments (x axis) and the shorter mono-nucleosome and di-nucleosome fragments associated with PC3 (y axis). (C) Methylated percentage averaged across CpGs from 2,274 CpG islands commonly hypermethylated in cancer. Elevated methylation at these regions was observed in the same six samples with elevated tumor fraction (prostate cancer CA-14, colorectal cancers CA-35 and CA-75, non-Hodgkin lymphomas CB-64, CA-19 and CA-07). (D) Cell type fraction for the 4 cell types correctly identified among the 10 samples, as estimated using COO deconvolution.
[14] FIGURE 6: Results for DNASE1L3-associated CC motifs in long fragments. CC end motif frequency is plotted as a function of fragment length.
[15] FIGURE 7: Links between hyperfragmentation and elevated circulating chromatin levels in cancer. (A) 30 cancer samples from the PC3-high and PC1-3 low groups, ordered by PC3 value. Heatmap rows show sample features clustered by hierarchical clustering based on 1-ρ (Spearman’s rho). Heatmap rows are normalized to row min and max values. (B) Spearman correlation values for all pairwise comparisons. (C-H) PC3 values plotted against the following features: (C) fraction of short fragments, (D) fraction of fragment ends with CCCA motif, (E) detection by ichorCNA, (F) detection by cancer methylation cell of origin, (G) H3.1 nucleosomes, and (H) total cfDNA by Qubit.
[16] FIGURE 8: Cancer-Specific DNA Methylation Patterns in Client-Owned Dogs. (A) Global methylation changes (WGM) using “ultra-shallow” ONT sequencing performed on the plasma of 7 healthy dogs and 14 dogs with various types of cancer. (B) H3.1 plasma nucleosome concentrations in dogs newly diagnosed with large cell lymphoma (n=15) before and 24 hours after the first dose of chemotherapy (week 1, cycle 1). H3.1 concentration in dogs with (C) increases and (D) decreases in nucleosomes 24 hours after chemotherapy. (E) Global methylation changes and (F) methylation of the cancer-specific CGI regions, evaluated in 5 dogs with lymphoma undergoing treatment. (G) Fraction of fragments less than 150bp and (H) overall DNA fragment length distribution (fraction of DNA in fragments in each size bin), assessed in 5 dogs with lymphoma pre-chemotherapy, 24 hours post-chemotherapy and when in remission.
[17] FIGURE 9: Hyperfragmentation is a Strong Predictor of Healthy vs. Cancer Status in Dogs. (A) Hyperfragmentation component (PC1) in the 21 “ultra-shallow” healthy vs. cancer dog cohort. (B-D) Bona-fide cancer DNA markers in the same cohort: global methylation (WGM) (B), tumor fraction from ichorCNA Copy Number Alterations (C), and average methylation at cancer-associated CpG Islands (CGI) (D). (E) Receiver Operating Characteristics (ROC) using Multivariable Logistic Regression (MLR) to predict healthy vs cancer status using a combination of global methylation, CpG Island methylation, and tumor fraction as inputs. Area Under the Curve (AUC) is shown for the full MLR as well as the median of MLR runs using Leave-One-Out Cross Validation (LOOCV). Significance values for A-D are computed using Fisher’s LSD test, indicated by p>=0.05 (ns), 0.01<=p<0.05 (*), 0.001 <=p<0.01 (**), 0.0001 <=p<0.001 (***), p<0.0001 (****).
[18] FIGURE 10: Hyperfragmentation is a Strong Predictor of Inflammation vs. Cancer Status in Dogs. (A) Hyperfragmentation component (PC1) in the “shallow” inflammation vs. cancer cohort (“INF” is inflammation, “LSA” is lymphoma, and “HSA” is hemangiosarcoma). (B-D) Bona-fide cancer DNA markers in the same cohort: global methylation (WGM) (B), tumor fraction from ichorCNA Copy Number Alterations (C), and average methylation at cancer-associated CpG Islands (CGI) (D). (E) Receiver Operating Characteristics (ROC) using Multivariable Logistic Regression (MLR) to predict inflammation vs cancer status using a combination of global methylation, CpG Island methylation, and tumor fraction as inputs. Area Under the Curve (AUC) is shown for the full MLR as well as the median of MLR runs using Leave-One-Out Cross Validation (LOOCV). Significance values for A-D are computed using Fisher’s LSD test, indicated by p>=0.05 (ns), 0.01<=p<0.05 (*), 0.001 <=p<0.01 (**), 0.0001 <=p<0.001 (***), p<0.0001 (****).
[19] FIGURE 11: Using Hyperfragmentation and Cancer DNA Markers to Monitor Dogs during and After Treatment. (A) For 5 lymphoma cases and 5 hemangiosarcoma cases from the “shallow” ONT sequencing cohort, the differential abundance of each fragment length bin from 50bp to 10,000bp was calculated as the fraction of fragments in the bin in the patient’s clinical remission sample minus the fraction of fragments in the bin in the patient’s diagnosis (pre-treatment) sample. (B) Values of the hyperfragmentation component (PC1) show a reduction in hyperfragmentation during remission for all 10 cases. (C) Bona-fide cancer DNA markers at diagnosis vs. during remission for this same cohort.
[20] FIGURE 12: Hyperfragmentation changes 24 hours post chemotherapy in dogs. (A) For 13 lymphoma cases from the “shallow” ONT sequencing cohort, pre-treatment (“Diagnosis”) values vs. 24 hours post chemotherapy values are shown for H3.1 nucleosome levels (left) and values of the hyperfragmentation component (PC1), with both generally showing increases at the post chemotherapy timepoint. (B) Bona-fide cancer DNA markers for the same samples. (C) Pre-therapy (“Diagnosis”), the fragment length distribution for patient “HR” including all cancer-associated CpG islands shows longer fragments for both the cancer and non-cancer fragments, while post-chemotherapy fragments are shorter (hyperfragmentation) in both the cancer and non-cancer fragment groups.
[21] FIGURE 13: Enrichment of Long Fragments in Clinical Sepsis. (A) The heatmap shows the proportion of DNA sequenced in different fragment length bins, on a log scale from 50 bp to 50,000 bp, for blood from generally healthy control subjects (N=25), patients suspected of sepsis but ultimately determined to be negative (N=25), and patients positive for sepsis (N=10). (B) shows the fraction of fragments in each sample that have lengths between 900bp to 4,300bp. (C) shows the fraction of granulocyte (neutrophil) DNA estimated using methylation-based cell of origin deconvolution from the Non-Negative Least Squares (NNLS) method of the CelFiE-ISH package.
DETAILED DESCRIPTION
[22] Circulating tumor DNA (ctDNA) is routinely identified in cancer patients and increasingly used for monitoring and early detection. However, the concentration of immune cell DNA is elevated in cancer, suggesting major changes in immune cell turnover or cell-free DNA (cfDNA) homeostasis. The present inventors used nanopore sequencing to identify a general shortening of cfDNA in most cancers, and methylation-based cell of origin analysis revealed that this phenomenon affects both cancer and non-cancer DNA. Therefore, incorporation of long-read sequencing provides an important parameter to consider when determining the tissue of origin of cfDNA and in methods of detecting ctDNA for liquid biopsy.
[23] According to a first aspect, there is provided a method of determining a tissue of origin, a cell type of origin, origination from a cancerous or immune cell, or a combination thereof of cell-free DNA (cfDNA), the method comprising:
(i) providing a body fluid sample comprising cfDNA;
(ii) passing said cfDNA through a nanopore sequencer to produce a sequence of said cfDNA and obtain fragment length data of said cfDNA; and
(iii) identifying for said cfDNA passed through the nanopore sequencer a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof based on the fragment length data; thereby determining a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof of cfDNA.
[24] According to a further aspect of the invention, there is provided a method of determining a tissue of origin, a cell type of origin, origination from a cancerous or immune cell, or a combination thereof of cell-free DNA (cfDNA) in an animal subject, the method comprising:
(i) providing a body fluid sample comprising cfDNA;
(ii) passing said cfDNA through a nanopore sequencer to produce a sequence of said cfDNA and obtain DNA modification data of said cfDNA, wherein the DNA modification data is selected from: methylation data, hydroxymethylation data and both; and
(iii) identifying for said cfDNA passed through the nanopore sequencer a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof based on the DNA modification data; thereby determining a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof of cfDNA in the animal subject.
[25] Methods of the invention employ a nanopore sequencer. Nanopore sequencing is more appropriate for longer-range alterations such as aneuploidy and chromosomal arm amplifications or deletions, particularly where rapid results are required. One advantage we have found for nanopore sequencing is that it provides direct methylated and/or hydroxymethylated DNA sequence results for cfDNA in real time (with no chemical pretreatment). Nanopore sequencer instruments can also be small, low cost and suitable for broad use near to the patient. Moreover, when sufficient data has been obtained in real time, sequencing can be terminated, avoiding unnecessary use of further reagents and leading to economy of use.
[26] Nanopore DNA sequencing (for example the sequencing methods employed by Oxford Nanopore Technologies DNA instruments) is becoming more commonly employed by workers in the field. In short, nanopore sequencing involves the passage of DNA strands through electrically charged nanopores. In some embodiments, passing is translocating. When a nucleotide passes through a nanopore, an electrical disturbance characteristic of the nucleotide is induced and this is detected by a sensor connected to the nanopore. Nanopore sequencing typically produces lower coverage results and less accurate DNA sequence results than sequencing by synthesis (“SBS”, for example the sequencing methods employed by Illumina NGS instruments) but has a number of advantages, including the ability to sequence long DNA chains without fragmentation, the ability to directly sequence non-standard nucleotides in addition to adenine, thymine, cytosine and guanine (for example 5-methylcytosine and 5-hydroxymethylcytosine), and the generation of sequence data in real time and without library amplification. This facilitates obtaining sequence data in a shorter time of a few hours or less. Nanopore sequencing is expensive if used for high-coverage sequencing of whole genomes but more economical for shallow (e.g. 1X coverage) or ultra-shallow (e.g. 0.05X coverage) sequencing of cfDNA fragments. In one embodiment, the nanopore sequencer is an Oxford Nanopore sequencer.
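The shallow (1X) and ultra-shallow (0.05X) coverage levels mentioned above translate into sequencing yield through the standard relation C = N × L / G (mean depth C, number of reads N, mean read length L, genome size G). The sketch below assumes an approximate human genome size of 3.1 Gb; the constant and function names are illustrative.

```python
HUMAN_GENOME_BP = 3.1e9  # approximate human genome size in bp (assumption)


def bases_for_coverage(coverage: float, genome_bp: float = HUMAN_GENOME_BP) -> float:
    """Sequenced bases needed for a target mean depth: C = N*L/G, so N*L = C*G."""
    return coverage * genome_bp


def reads_for_coverage(coverage: float, mean_read_len: float,
                       genome_bp: float = HUMAN_GENOME_BP) -> float:
    """Number of reads of the given mean length needed for the target depth."""
    return bases_for_coverage(coverage, genome_bp) / mean_read_len


if __name__ == "__main__":
    for depth in (1.0, 0.05):
        gb = bases_for_coverage(depth) / 1e9
        print(f"{depth}X needs about {gb:.3g} Gb of sequence")
```

The relation makes the cost argument concrete: an ultra-shallow 0.05X run needs one twentieth of the sequence yield of a 1X run.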
[27] Methods of the invention may use a computer-based machine learning system to analyse the fragment length data. Algorithms (e.g. machine learning algorithms) executed by a computer can be used to automate analytical model building, e.g. for clustering, classification or pattern recognition. Machine learning algorithms may be supervised or unsupervised. Learning algorithms include, for example, artificial neural networks, discriminant analyses (e.g. Bayesian classifier or Fisher analysis), support vector machines, decision trees (e.g. recursive partitioning processes such as CART - classification and regression trees, or random forests), linear classifiers (e.g. multiple linear regression (MLR), partial least squares (PLS) regression, and principal components regression), hierarchical clustering, and cluster analysis.
[28] In one embodiment, the fragment length data obtained by passing the cfDNA through the nanopore sequencer are analysed by clustering the data based on the length of said cfDNA.
[29] In one embodiment, the clustering is unsupervised clustering.
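The application does not specify a clustering algorithm. As one hedged illustration of unsupervised clustering of fragment lengths, the sketch below runs plain 1-D k-means (Lloyd's algorithm) on log10-transformed lengths. The log transform, k = 3 and the quantile initialisation are assumptions, chosen because cfDNA fragment lengths span several orders of magnitude.

```python
import math
from statistics import mean


def kmeans_1d(values, k=3, max_iter=100):
    """Lloyd's k-means on a list of floats; returns (centroids, labels)."""
    if k < 2 or len(values) < k:
        raise ValueError("need k >= 2 and at least k values")
    ordered = sorted(values)
    # Deterministic start: centroids at evenly spaced quantiles of the data.
    centroids = [ordered[round(i * (len(ordered) - 1) / (k - 1))] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = mean(members)
    return centroids, labels


def cluster_fragment_lengths(lengths_bp, k=3):
    """Cluster fragment lengths on a log10 scale; cluster 0 is the shortest.

    Returns (labels, centre lengths in bp, ordered shortest to longest).
    """
    centroids, labels = kmeans_1d([math.log10(n) for n in lengths_bp], k)
    order = sorted(range(k), key=lambda j: centroids[j])
    rank = {old: new for new, old in enumerate(order)}
    return [rank[lab] for lab in labels], [round(10 ** centroids[j]) for j in order]
```

For example, `cluster_fragment_lengths([150, 160, 170, 2000, 2500, 3000, 20000, 30000, 40000])` places the three groups in clusters 0, 1 and 2 respectively. Any of the other unsupervised methods listed above could be substituted.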
[30] In one embodiment, a classifier is used to cluster the data based on the length of said cfDNA. The term "classifier" generally refers to computer code implementing an algorithm that receives test data as input and produces, as output, a classification of the input data as belonging to one or another class (e.g. a fragment length class).
[31] In one embodiment, analysis of the fragment length data comprises transforming a fraction of sequencing reads from the fragment length data to an estimated fraction of total cfDNA present in the sample.
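The application does not state how this transformation is carried out. One plausible approach, shown here purely as an assumption, treats each read as one molecule and DNA mass as proportional to fragment length, so that long fragments, although few in number, contribute a larger share of total cfDNA than their read fraction suggests.

```python
from collections import defaultdict


def read_to_mass_fraction(reads):
    """Convert per-class read fractions into estimated fractions of total cfDNA.

    reads: iterable of (class_label, fragment_length_bp) pairs, one per read.
    Assumes one read per molecule and DNA mass proportional to length (bases).
    Returns (fraction of reads per class, estimated mass fraction per class).
    """
    counts = defaultdict(int)
    bases = defaultdict(int)
    for label, length_bp in reads:
        counts[label] += 1
        bases[label] += length_bp
    n_reads = sum(counts.values())
    n_bases = sum(bases.values())
    read_frac = {c: counts[c] / n_reads for c in counts}
    mass_frac = {c: bases[c] / n_bases for c in bases}
    return read_frac, mass_frac
```

With 90 short reads of 170 bp, 9 mid reads of 2,000 bp and one long read of 20,000 bp, short fragments account for 90% of reads but only about 29% of the estimated DNA mass.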
[32] In one embodiment, said transforming is performed prior to clustering.
[33] In one embodiment, clustering the data comprises separating the data into short, mid and long cfDNA lengths.
[34] In one embodiment, the short cfDNA length comprises 50-500 base pairs (bp) in length. In a further embodiment, the short cfDNA length comprises about 60-300 bp in length, such as about 75-145 and about 245-294 bp in length.
[35] In one embodiment, the mid cfDNA length comprises 501-5,000 bp in length. In a further embodiment, the mid cfDNA length comprises about 900-4,300 bp in length.
[36] In one embodiment, the long cfDNA length comprises 5,001 bp or more in length, such as 5,001-60,000 bp in length. In a further embodiment, the long cfDNA length comprises about 7,500-53,000 bp in length.
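The broadest ranges above can be applied as a simple rule-based binning step. In this sketch, fragments under 50 bp are reported as unclassified (an assumption, since the short range starts at 50 bp), and the long class is left open-ended because 60,000 bp is given only as an example upper limit.

```python
from collections import Counter


def length_class(length_bp: int) -> str:
    """Assign a fragment to the broad short / mid / long classes described above."""
    if length_bp < 50:
        return "unclassified"  # below the lower bound of the short range
    if length_bp <= 500:
        return "short"         # 50-500 bp
    if length_bp <= 5_000:
        return "mid"           # 501-5,000 bp
    return "long"              # 5,001 bp or more


def class_counts(lengths_bp):
    """Count reads per length class."""
    return Counter(length_class(n) for n in lengths_bp)
```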
[37] In one embodiment, the method additionally comprises extracting the cfDNA from circulating chromatin fragments in the sample and sequencing the extracted cfDNA using the nanopore sequencer. In a further embodiment, the cfDNA is unamplified after it is extracted from a sample from a subject. Unlike most cfDNA sequencing approaches, amplification is not required (i.e. the method may be “amplification free”), which may provide an even more accurate representation of fragmentation features.
[38] In one embodiment, the cfDNA is modified with a sequencing adapter. For example, the method may comprise ligating an adapter sequence below 75 nucleotides in length to the cfDNA to produce adapter ligated cfDNA. Said adapter sequence may comprise a nucleic acid barcode that uniquely identifies the source sample of the cfDNA (i.e. the sample from which the cfDNA is obtained). Using adapter sequences can produce an adapter ligated cfDNA library for analysis, which may then be passed through a nanopore sequencer to produce a sequence of the cfDNA.
[39] In some embodiments, the method further comprises performing an additional analysis on the cfDNA. In a further embodiment, the additional analysis is performed on the cfDNA after passing the cfDNA through the nanopore sequencer.
[40] In one embodiment, fragmentation location analysis is performed on the cfDNA after passing the cfDNA through the nanopore sequencer and said fragmentation location analysis is used in combination with the fragment length data to determine a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof of cfDNA.
[41] In one embodiment, the fragmentation location analysis comprises fragment end motif analysis. In some embodiments, the fragment end motif analysis is performed with the sequence determined from sequencing a plurality of cfDNAs.
[42] In one embodiment, the method additionally comprises:
(a) fragment end motif analysis;
(b) performing RNA sequencing (RNA-seq) on the body fluid sample and detecting a level of mRNA expression of a deoxyribonuclease (DNase) enzyme;
(c) performing Reverse Transcription Polymerase Chain Reaction (RT-PCR) on the body fluid sample and detecting a level of mRNA expression of a DNase enzyme;
(d) performing an assay on the body fluid sample to detect a level of protein expression of a DNase enzyme; and/or
(e) performing an activity assay to measure functional activity of a DNase enzyme.
[43] DNase enzymes catalyse the hydrolytic cleavage of phosphodiester linkages in the DNA backbone, thus degrading DNA. Two main types are found in animals, deoxyribonuclease I (DNase I) and deoxyribonuclease II (DNase II), which are divided into further subcategories. In one embodiment, the DNase enzyme is a DNase I enzyme. In a further embodiment, the DNase I enzyme is DNASE1L3.
[44] In some embodiments, the end sequence consists of the terminal 4 nucleotides at a fragment end. In some embodiments, the end sequences are the sequences provided in Chan et al., "Plasma DNA Profile Associated with DNASE1L3 Gene Mutations: Clinical Observations, Relationships to Nuclease Substrate Preference, and In Vivo Correction", American Journal of Human Genetics, Vol. 107, No. 5 (2020), herein incorporated by reference in its entirety. In some embodiments, the end sequences are the sequences provided in Serpas et al., "Dnase1l3 deletion causes aberrations in length and end-motif frequencies in plasma DNA", Proc. Natl Acad. Sci. Vol. 116, pages 641-649 (2019), herein incorporated by reference in its entirety. In some embodiments, the end sequence is selected from CCCA, CCAG, CCTG, CCAA, CCCT, CCTT, CCAT, CAAA, CCTC, CCAC, TGAA, TAAA, CCTA, CCCC, TGAG, TGTT, CAAG, CTTT, AAAA, TGTG, CATT, CACA, CAGA, TATT, and CAGG. In some embodiments, the end sequence is CCCA. In some embodiments, the presence of a specific end fragment sequence indicates the DNA is from a cancer cell. In some embodiments, the presence of a specific end fragment sequence indicates the DNA is from an immune cell. In some embodiments, an enrichment of a specific end fragment sequence indicates the sample is from a subject that has cancer. In some embodiments, the presence or absence of a specific end fragment sequence is indicative of DNASE1L3 activity. In particular, a reduction in or absence of the end sequence (e.g. CCCA, CCTG, CCAG, CCAA, CCAT and CCTC, in particular CCCA) indicates loss of function of DNASE1L3.
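A minimal sketch of 4-mer end motif counting from adapter-trimmed reads. Because a nanopore read normally spans the whole fragment, this sketch assumes the first four bases give the 5' end motif of the sequenced strand and, optionally, that the reverse complement of the last four bases gives the 5' end motif of the complementary strand. This preprocessing is an assumption, not a procedure stated in this application. Motifs containing bases other than A, C, G or T are skipped.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence (non-ACGT left unchanged)."""
    return seq.translate(_COMPLEMENT)[::-1]


def end_motif_counts(fragments, k=4, both_ends=True):
    """Count 5' end k-mers across adapter-trimmed fragment sequences."""
    counts = Counter()
    for seq in fragments:
        seq = seq.upper()
        if len(seq) < k:
            continue
        motifs = [seq[:k]]
        if both_ends:
            # 5' end of the complementary strand = reverse complement of the 3' k-mer.
            motifs.append(reverse_complement(seq[-k:]))
        counts.update(m for m in motifs if set(m) <= set("ACGT"))
    return counts


def motif_fraction(counts, motif="CCCA"):
    """Fraction of all counted end motifs equal to `motif`."""
    total = sum(counts.values())
    return counts[motif] / total if total else 0.0
```

A CCCA fraction that is reduced relative to a reference population would be the kind of signal described above as indicating loss of DNASE1L3 function; any cut-off would need to be established empirically.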
[45] In one embodiment, passing the cfDNA through the nanopore sequencer produces DNA modification data. As used herein, the term “DNA modification data” refers to the information of the modification of a portion of bases in the DNA molecule. In some embodiments, the modification is an epigenetic modification, such as an epigenetically modified base.
[46] In one embodiment, the DNA modification data is selected from: methylation data, hydroxymethylation data and both, and said DNA modification data is used in combination with the fragment length data to determine a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof of cfDNA. In a further embodiment, the methylation data is 5-methylcytosine (5mC) methylation. In a further embodiment, the hydroxymethylation data is 5-hydroxymethylcytosine (5hmC) hydroxymethylation. As used herein, the term “methylation data” refers to the information of the methylation status of a portion of the bases in a DNA molecule. As used herein, the term “hydroxymethylation data” refers to the information of the hydroxymethylation status of a portion of the bases in a DNA molecule. In some embodiments, a portion is all of the bases. In some embodiments, the bases are cytosines.
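As an illustration of combining DNA modification data with fragment length data, the sketch below summarises the fraction of CpG sites called as 5mC within each length class. It assumes that per-CpG modification probabilities have already been extracted for each read (for example from a basecaller's modified-base output) and uses a 0.5 calling threshold; both are assumptions, not details given in this application.

```python
from collections import defaultdict


def methylation_by_length_class(fragments, threshold=0.5):
    """Fraction of CpG calls at or above `threshold`, per fragment length class.

    fragments: iterable of (length_bp, [per-CpG 5mC probability, ...]).
    Length classes follow the broad short/mid/long ranges; < 50 bp is skipped.
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for length_bp, probs in fragments:
        if length_bp < 50:
            continue  # outside the short/mid/long ranges
        cls = "short" if length_bp <= 500 else "mid" if length_bp <= 5_000 else "long"
        for p in probs:
            total[cls] += 1
            methylated[cls] += int(p >= threshold)
    # Only classes with at least one CpG call appear in the result.
    return {cls: methylated[cls] / total[cls] for cls in total}
```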
[47] In one embodiment, copy number analysis is performed on the cfDNA after passing the cfDNA through the nanopore sequencer and said copy number analysis is used in combination with the fragment length data to determine a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof of cfDNA. In a further embodiment, the fragment length data is combined with copy number analysis and one or more signatures derived from DNA modification data (such as methylation data) to determine a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof of cfDNA.
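Copy number analysis from shallow sequencing is commonly done by counting reads in fixed genomic bins and comparing depth-normalised counts with a reference profile. The sketch below shows that general approach for a single chromosome. The 1 Mb bin size, the pseudocount, the median normalisation, the 0.58 gain threshold and the reference profile (e.g. pooled healthy controls) are all assumptions; practical pipelines typically also correct for GC content and mappability.

```python
import math
from collections import Counter
from statistics import median


def bin_counts(read_starts, chrom_length, bin_size=1_000_000):
    """Count read start positions (0-based) in fixed-size bins along one chromosome."""
    counts = Counter(pos // bin_size for pos in read_starts)
    return [counts.get(i, 0) for i in range(math.ceil(chrom_length / bin_size))]


def log2_ratios(sample, reference, pseudocount=0.5):
    """Median-normalised log2(sample / reference) for each bin."""
    s_med = median(sample) + pseudocount
    r_med = median(reference) + pseudocount
    return [math.log2(((s + pseudocount) / s_med) / ((r + pseudocount) / r_med))
            for s, r in zip(sample, reference)]


def gained_bins(ratios, threshold=0.58):
    """Indices of bins whose log2 ratio exceeds `threshold` (0.58 ~ log2(1.5); hypothetical)."""
    return [i for i, r in enumerate(ratios) if r > threshold]
```

A gained bin overlapping a known oncogene locus would be a candidate for the kind of amplification that the following embodiment acts upon.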
[48] In one embodiment, the copy number analysis results in the detection of an oncogene amplification, and the method further comprises administering an agent that targets said oncogene.
[49] In some embodiments, the identifying is based on the fragment length and the fragmentation location analysis. In some embodiments, the identifying is based on the fragment length and DNA modification data. In some embodiments, the identifying is based on the fragment length and copy number analysis. In some embodiments, the identifying is based on the fragment length and the level of circulating nucleosomes and/or total level of cfDNA. In some embodiments, the identifying is based on the fragment length, the fragmentation location analysis, the DNA modification data and the level of circulating nucleosomes. In some embodiments, the identifying is based on the fragment length, the fragmentation location analysis, the DNA modification data and the total level of cfDNA. In some embodiments, the identifying is based on the fragment length, the fragmentation location analysis, the DNA modification data, the copy number analysis and the level of circulating nucleosomes. In some embodiments, the identifying is based on the fragment length, the fragmentation location analysis, the DNA modification data, the copy number analysis and the total level of cfDNA.
[50] The present inventors have found that a distinct hyperfragmentation pattern occurs across a large fraction of cancers and is associated with markers of altered DNase activity. The results show that fragments derived from cancer can be distinguished by a shorter length distribution (e.g. 75-145 bp and 245-295 bp) and by end motif patterns consistent with loss of DNASE1L3 fragmentation activity. Therefore, DNase activity, in particular DNASE1L3 activity, can be used in determining a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof of cfDNA. In one embodiment, the method additionally comprises detecting DNASE1L3 activity. In a further embodiment, said detecting comprises:
(a) fragment end motif analysis, such as analysing the presence or absence of a specific end fragment sequence which is indicative of DNASE1L3 activity;
(b) performing RNA sequencing (RNA-seq) on the body fluid sample and detecting a level of DNASE1L3 mRNA expression;
(c) performing Reverse Transcription Polymerase Chain Reaction (RT-PCR) on the body fluid sample and detecting a level of DNASE1L3 mRNA expression;
(d) performing an assay on the body fluid sample to detect a level of DNASE1L3 protein expression; and/or
(e) performing an activity assay to measure functional DNASE1L3 activity, such as a Fluorescence Resonance Energy Transfer (FRET) assay.
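A simple descriptive summary of the hyperfragmentation pattern described above is the fraction of fragments falling in the cancer-associated length windows (75-145 bp and 245-295 bp). The sketch below computes it. How such a fraction would be thresholded, or combined with end motif data, is not specified here, so it is shown only as a candidate feature.

```python
# Cancer-associated length windows quoted in the text (inclusive, in bp).
CANCER_WINDOWS_BP = ((75, 145), (245, 295))


def short_window_fraction(lengths_bp, windows=CANCER_WINDOWS_BP):
    """Fraction of fragments whose length falls inside any of the given windows."""
    lengths = list(lengths_bp)
    if not lengths:
        raise ValueError("no fragment lengths supplied")
    hits = sum(1 for n in lengths if any(lo <= n <= hi for lo, hi in windows))
    return hits / len(lengths)
```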
[51] In one embodiment, the method is a method of determining origination from a cancerous cell and further comprises identifying a cancer-specific DNA modification change in said cancerous cell. In an alternative embodiment, the method is a method of determining origination from an immune cell, which optionally further comprises identifying an immune-specific sequence in said immune cell.
[52] In one embodiment, the method additionally comprises centrifuging the sample prior to passing the cfDNA through a nanopore sequencer.
[53] Methods and uses described herein are performed on body fluid samples. The sample may be any biological fluid (or body fluid) sample taken from a subject including, without limitation, cerebrospinal fluid (CSF), whole blood, blood serum, plasma, menstrual blood, endometrial fluid, urine, saliva, other bodily fluids (stool, tear fluid, synovial fluid, sputum) or breath, e.g. as condensed breath, or an extract or purification therefrom, or dilution thereof. In particular, blood, serum or plasma samples are used. Preferably, plasma samples are used. Plasma samples may be collected in collection tubes containing one or more anticoagulants such as ethylenediaminetetraacetic acid (EDTA), heparin, or sodium citrate, in particular EDTA.
[54] According to a further aspect of the invention, there is provided the use of a size profile obtained by nanopore sequencing of all circulating chromatin fragments present in a body fluid sample as a biomarker for the diagnosis of cancer.
[55] The term "biomarker" means a distinctive biological or biologically derived indicator of a process, event, or condition. As used herein, the biomarker may refer to fragment length data obtained by the nanopore sequencer. Biomarkers can be used in methods of diagnosis (e.g. clinical screening), prognosis assessment, monitoring the results of therapy, identifying patients most likely to respond to a particular therapeutic treatment, and drug screening and development. Biomarkers and uses thereof are valuable for the identification of new drug treatments and for the discovery of new targets for drug treatment.
[56] According to a further aspect of the invention, there is provided the use of a size profile obtained by nanopore sequencing of all circulating chromatin fragments present in a body fluid sample to identify a patient suitable for cancer treatment, such as immunotherapy.
Cancers
[57] Cancerous cells of interest in the present invention are derived from a cancer. In some embodiments, the cancer is a haematological cancer including, without limitation, leukaemias, lymphomas (including canine lymphoma), myelomas and angiosarcomas (including canine hemangiosarcoma). In some embodiments, the cancer is a solid cancer including, without limitation, lung cancer, liver cancer, prostate cancer, breast cancer, gastric cancer, colorectal cancer, thyroid cancer, skin cancer (e.g. melanoma), bladder cancer, cervical cancer, pancreatic cancer, brain cancer, ovarian cancer, endometrial cancer or renal cancer.
[58] Haematological cancers are cancers of the blood and may therefore also be referred to as "liquid or blood cancers". There are 3 principal types of haematological cancer: leukaemias, which are caused by the rapid production of abnormal white blood cells; lymphomas, which are caused by abnormal lymphocytes; and myelomas, which are cancers of the plasma cells.
[59] For the purposes of the present invention a blood cancer may be considered to be any cancer in direct contact with the circulation. This includes angiosarcomas and hemangiosarcomas as these are cancers of the vascular lining and share the proximity to the circulation of blood cell cancers. In one embodiment, the cancerous cells are derived from hemangiosarcoma.
[60] Leukaemia is a cancer of the blood cells that usually starts in the bone marrow and travels through the bloodstream. In leukaemia, the bone marrow produces mutated cells and spreads them into the blood, where they grow and crowd out healthy blood cells. Lymphoma diseases affect the cells of the lymphatic system. In lymphomas, immune cells called lymphocytes grow out of control and collect in lymph nodes, the spleen, other lymph tissues or neighbouring organs. Myeloma, also known as multiple myeloma, develops in the bone marrow and affects plasma cells, which produce antibodies that fight infections and diseases. Examples of blood cancers include Acute Lymphoblastic Leukaemia (ALL), Acute Myeloid Leukaemia (AML), Hodgkin Lymphoma (HL) and Non-Hodgkin Lymphoma (NHL).
[61] The term "acute leukaemia" means that the cancer progresses quickly and aggressively, usually requiring immediate treatment. ALL involves the development of large numbers of immature lymphocytes which are unable to fight infection. This leaves less room for healthy white blood cells, red blood cells and platelets in the circulation. As a result, the patient usually suffers from a weakened immune system, symptoms of anaemia such as tiredness and breathlessness, and an increased risk of excessive bleeding. The risk of developing ALL is highest in children younger than 5 years of age, and it is the most common type of leukaemia that affects children. The risk then declines slowly until the mid-20s and begins to rise again slowly after age 50. Overall, about 4 of every 10 cases of ALL are in adults.
[62] AML affects myeloblasts which results in the accumulation of abnormal monocytes and granulocytes in the bone marrow. AML may also affect myeloid stem cells resulting in abnormal red blood cells or platelets. As with ALL, this causes the patient to have lower levels of healthy white blood cells, red blood cells, and platelets in the circulation. AML is one of the most common types of leukaemia in adults and the average age at diagnosis is 68.
[63] HL and NHL are the two main types of lymphoma. HL has a particular appearance under the microscope and contains cells called Reed-Sternberg cells (a type of B lymphocyte that has become cancerous), whereas NHL looks different under the microscope and does not contain Reed-Sternberg cells. Most lymphomas are NHL and only about 1 in 5 are HL. NHL is a cancer affecting lymphocytes and usually starts in lymph nodes or lymph tissue. It is one of the more common cancers among children, teens and young adults.
[64] Current methods of diagnosing leukaemia and myeloma involve obtaining a complete blood count (CBC) test to identify abnormal levels of white blood cells relative to red blood cells and platelets. However, an elevated white blood cell count (WBC) is not specific to patients with a haematological malignancy; it can also be the result of an ongoing response to infection or other inflammatory process. For lymphoma, an X-ray, CT or PET scan can be used to detect swollen lymph nodes, however this is also non-specific.
[65] In order to confirm a diagnosis of a haematological cancer, a bone marrow or lymph node biopsy is required. Therefore overdiagnosis of haematological cancers at an early stage in the diagnostic process can lead to unnecessary biopsies which are invasive, potentially hazardous and relatively costly to healthcare providers. Cytogenetics analysis and/or immunophenotyping can also be used to confirm a haematological cancer diagnosis, however these methods are expensive to perform and therefore are typically only used at a late stage of the diagnostic process.
[66] The diagnosis of solid and liquid cancers requires an invasive tissue biopsy. Therefore, methods of the invention as described herein may be used to select subjects for whom a biopsy is required.
Immune cells
[67] The results obtained herein from several cancer types indicate an abundance of immune-derived fragments with lengths of 1-4 kilobases (kb) (e.g. about 5-25 nucleosomes), with a possible shift from myeloid to lymphoid origin. Identifying cfDNA of immune cell origin can provide valuable information which can be used prognostically or to monitor treatment response (e.g. to immunotherapy). The information can also be used to distinguish cancer from other inflammatory conditions (e.g. sepsis).
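The conversion between fragment length and nucleosome number above can be checked with simple arithmetic. Dividing a length by an assumed nucleosome repeat length (core DNA plus linker) of roughly 160-200 bp, an assumption not stated in this application, gives about 5-6 nucleosomes for 1 kb and 20-25 for 4 kb, consistent with the quoted range of about 5-25.

```python
def nucleosome_range(length_bp, repeat_min_bp=160, repeat_max_bp=200):
    """Approximate (fewest, most) nucleosomes spanned by a fragment of length_bp.

    The 160-200 bp nucleosome repeat length range is an assumption.
    """
    return length_bp / repeat_max_bp, length_bp / repeat_min_bp


if __name__ == "__main__":
    for length in (1_000, 4_000):
        low, high = nucleosome_range(length)
        print(f"{length} bp: about {low:g}-{high:g} nucleosomes")
```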
[68] Immune cells of interest in the present invention include, but are not limited to, CD34+ cells, B-Cells, CD45+ (lymphocyte common antigen) cells, Alpha-Beta T-cells, Cytotoxic T-cells, Helper T-cells, Plasma Cells, Neutrophils, Monocytes, Macrophages, Red Blood Cells, Platelets, Dendritic Cells, Phagocytes, Granulocytes, Innate lymphoid cells, Natural Killer (NK) cells and Gamma Delta T-cells. Typically, immune cells are classified with the aid of combinatorial cell surface molecule analysis (e.g. via flow cytometry) to identify, group or cluster them into sub-populations. These can then be further sub-divided with additional analysis.
Circulating chromatin fragments
[69] Uses and methods may additionally comprise measuring or detecting the level of chromatin fragments in the body fluid sample and said level is used in combination with the fragment length data to determine a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof of cfDNA.
[70] The term “chromatin fragment” as used herein refers to a complex of proteins and nucleic acid whose origin lies in the chromosome or mitochondria of a cell. The term encompasses chromatin fragments found outside of cells, which may also be referred to as “cell free chromatin fragments”. A fragment of chromatin may contain a nucleosome and/or associated DNA and/or any of a huge variety of non-histone chromatin associated proteins in a multi-protein-nucleic acid complex. Some examples of non-histone chromatin associated proteins include transcription factors, cofactors, co-activators, co-repressors, RNA polymerase moieties, elongation factors, chromatin remodelling factors, mediators, STAT moieties, upstream binding factor (UBF) and others.
[71] The present invention is not limited to any particular method for quantifying circulating chromatin fragments and any suitable method may be used. Chromatin fragments, cfDNA or cf-nucleosomes may be measured by many methods including, for example without limitation, binding methods such as immunochemical or immunoassay methods or binding by DNA intercalating dyes, sequencing (for example to determine read numbers), rtPCR methods and spectroscopic methods.
[72] In one embodiment, the method additionally comprises measuring or detecting the level of circulating (cell free) nucleosomes in the body fluid sample and said level is used in combination with the fragment length data to determine a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof of cfDNA.
[73] In one embodiment, the method additionally comprises measuring or detecting the level of circulating nucleosomes in the body fluid sample and said level is used in combination with the fragment length data to determine a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof of cfDNA. Because sequencing can only provide relative cell type proportions, the inventors profiled not only DNA concentration in these individuals but also absolute concentrations of circulating nucleosomes, in order to address the question of globally elevated levels of chromatin.
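Relative proportions from sequencing become absolute concentrations once they are multiplied by an absolute measurement of total circulating material, such as cfDNA or nucleosome concentration. A minimal sketch, assuming the proportions are mass fractions that sum to one and that the total is expressed in consistent units; the source names and numbers are hypothetical.

```python
import math


def absolute_concentrations(fractions, total_concentration):
    """Scale relative source fractions (summing to 1) by a measured total concentration."""
    if not math.isclose(sum(fractions.values()), 1.0, rel_tol=1e-6):
        raise ValueError("fractions must sum to 1")
    return {source: f * total_concentration for source, f in fractions.items()}


if __name__ == "__main__":
    # Hypothetical example: 20 ng/mL total cfDNA split across three sources.
    print(absolute_concentrations({"immune": 0.6, "tumour": 0.1, "other": 0.3}, 20.0))
```

Two subjects with identical relative proportions but different totals then yield different absolute amounts from each source, which is why an absolute measure is needed to detect globally elevated chromatin.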
[74] The nucleosome is the basic unit of chromatin structure and consists of a protein complex of eight highly conserved core histones (comprising a pair of each of the histones H2A, H2B, H3 and H4). Around this complex is wrapped approximately 146 base pairs of DNA. Another histone, H1 or H5, acts as a linker and is involved in chromatin compaction. The DNA is wound around consecutive nucleosomes in a structure often said to resemble "beads on a string", and this forms the basic structure of open chromatin or euchromatin. In compacted chromatin or heterochromatin, this string is coiled and supercoiled into a closed and complex structure (Herranz and Esteller (2007) Methods Mol. Biol. 361: 25-62).
[75] Neutrophil extracellular traps (NETs) and extracellular traps (ETs) are chromatin fragments released as long strings of nucleosomes.
[76] References to “nucleosome” may refer to “cell free nucleosome” when detected in body fluid samples. It will be appreciated that the term cell free nucleosome throughout this document is intended to include any circulating chromatin fragment that includes one or more nucleosomes. “Epigenetic features”, “epigenetic signal features” or “epigenetic signal structures” of a cell free nucleosome as referred herein may comprise, without limitation, one or more histone post-translational modifications, histone isoforms, modified nucleotides and/or proteins bound to a nucleosome in a nucleosome-protein adduct.
[77] It will be understood that the cell free nucleosome may be detected by binding to a component thereof. The term "component thereof" as used herein refers to a part of the nucleosome, i.e. the whole nucleosome does not need to be detected. The component of the cell free nucleosomes may be selected from the group consisting of: a histone protein (i.e. histone H1, H2A, H2B, H3 or H4), a histone post-translational modification, a histone variant or isoform, a protein bound to the nucleosome (i.e. a nucleosome-protein adduct), a DNA fragment associated with the nucleosome and/or a modified nucleotide associated with the nucleosome. For example, the component may be histone isoform H3.1, histone H1 or DNA.
[78] In one embodiment, the component of the nucleosome is a histone protein. References herein to “histone” refer to histones and modifications thereof, as described herein (e.g. post-translational modifications, mutations, isoforms, variants and fragments of histones, such as clipped histones).
[79] Methods and uses of the invention may measure the level of (cell free) nucleosomes per se. References to "nucleosomes per se" refer to the total nucleosome level or concentration present in the sample, regardless of any epigenetic features the nucleosomes may or may not include. Detection of the total nucleosome level typically involves detecting a histone protein common to all nucleosomes, such as histone H4. Therefore, nucleosomes per se may be measured by detecting a core histone protein, such as histone H4. As described herein, histone proteins form structural units known as nucleosomes, which are used to package DNA in eukaryotic cells and also form the repeating units present in ETs and NETs.
[80] Normal cell turnover in adult humans involves the creation by cell division of a large number of cells daily and the death of a similar number, mainly by apoptosis but also by other cell death mechanisms including NETosis. Under normal conditions, the levels of circulating nucleosomes found in healthy subjects are reported to be low. Elevated levels are found in subjects with a variety of conditions including many cancers, auto-immune diseases, inflammatory conditions, stroke and myocardial infarction (Holdenrieder & Stieber (2009) Crit Rev Clin Lab Sci, 46(1): 1-24).
[81] Mononucleosomes and oligonucleosomes can be detected by Enzyme-Linked ImmunoSorbent Assay (ELISA), and several methods have been reported (e.g. Salgame et al. (1997); Holdenrieder et al. (2001); van Nieuwenhuijze et al. (2003)). These assays typically employ an anti-histone antibody (for example anti-H2B, anti-H3 or anti-H1, H2A, H2B, H3 and H4) as capture antibody and an anti-DNA or anti-H2A-H2B-DNA complex antibody as detection antibody.
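Sandwich ELISA results are usually read off a standard curve fitted to calibrators of known concentration. A commonly used model is the four-parameter logistic (4PL) curve, y = d + (a - d) / (1 + (x/c)^b); the sketch below inverts it to convert an optical density back into a concentration. The application does not specify a curve model, and the parameter values shown are hypothetical; in practice they would come from fitting the assay's own standards.

```python
def four_pl(x, a, b, c, d):
    """4PL response at concentration x (a: response at zero dose, d: at infinite dose)."""
    return d + (a - d) / (1 + (x / c) ** b)


def inverse_four_pl(y, a, b, c, d):
    """Concentration giving response y; valid only for y strictly between a and d."""
    lo, hi = min(a, d), max(a, d)
    if not lo < y < hi:
        raise ValueError("response outside the quantifiable range of the curve")
    return c * ((a - d) / (y - d) - 1) ** (1 / b)


if __name__ == "__main__":
    params = dict(a=0.05, b=1.2, c=50.0, d=2.5)  # hypothetical fitted parameters
    od = four_pl(30.0, **params)
    print(f"OD {od:.3f} -> {inverse_four_pl(od, **params):.2f} concentration units")
```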
[82] Circulating nucleosomes are not a homogeneous group of protein-nucleic acid complexes. Rather, they are a heterogeneous group of chromatin fragments originating from the digestion of chromatin on cell death and include an immense variety of epigenetic structures including particular histone isoforms (or variants), post-translational histone modifications, nucleotides or modified nucleotides, and protein adducts. It will be clear to those skilled in the art that an elevation in nucleosome levels will be associated with elevations in some circulating nucleosome subsets containing particular epigenetic signals including nucleosomes comprising particular histone isoforms (or variants), comprising particular post-translational histone modifications, comprising particular nucleotides or modified nucleotides and comprising particular protein adducts (for example myeloperoxidase, neutrophil elastase adducts or other adducts associated with NETs). Assays for these types of chromatin fragments are known in the art (for example, see WO 2005/019826, WO 2013/030579, WO 2013/030578, WO 2013/084002 which are herein incorporated by reference).
[83] Uses and methods of the invention may include data for additional biomarkers, such as the level of cell free nucleosomes per se and/or an epigenetic feature of a cell free nucleosome. It will be understood that the terms "epigenetic signal structure" and "epigenetic feature" are used interchangeably herein. They refer to particular features of the nucleosome that may be detected. In one embodiment, the epigenetic feature of the nucleosome is selected from the group consisting of: a post-translational histone modification, a histone variant, a particular nucleotide and a protein adduct. In one embodiment, the epigenetic feature of the nucleosome is the histone isoform H3.1.
[84] The structure of a nucleosome may vary by the inclusion of alternative histone isoforms or variants which are different gene or splice products and have different amino acid sequences. In one embodiment, the epigenetic feature of the nucleosome comprises a histone variant or isoform. It will be understood that the term “histone variant” and “histone isoform” may be used interchangeably herein. Many histone isoforms are known in the art. Histone isoforms can be classed into a number of families which are subdivided into individual types. The sequences of a large number of histone isoforms are known and publicly available for example in the National Human Genome Research Institute NHGRI Histone Database (Marino-Ramirez et al. The Histone Database: an integrated resource for histones and histone fold-containing proteins. Database Vol.2011. and http://genome.nhgri.nih.gov/histones/complete.shtml), the GenBank (NIH genetic sequence) Database, the EMBL Nucleotide Sequence Database and the DNA Data Bank of Japan (DDBJ). For example, isoforms of histone H2 include H2A1, H2A2, mH2A1, mH2A2, H2AX and H2AZ. In another example, histone isoforms of H3 include H3.1, H3.2 and H3t. In one embodiment, the histone isoform is H3.1.
[85] Another way the structure of nucleosomes may vary is by mutation. Therefore, in one embodiment, the epigenetic feature is a mutated histone. In a further embodiment, the mutation is in histone 3 (H3). In a yet further embodiment, the mutation in H3 is when lysine 27 is replaced by a methionine (H3K27M).
[86] The structure of nucleosomes can vary by post translational modification (PTM) of histone proteins. PTM of histone proteins typically occurs on the tails of the core histones, and common modifications include acetylation, methylation or ubiquitination of lysine residues as well as citrullination or methylation of arginine residues and phosphorylation of serine residues, among many others. It will be understood that a histone PTM may occur on different isoforms (variants) of the histone. For example, the lysine residues that occur on the tail of histone H3 isoforms H3.1, H3.2 and H3.3 may be modified by acetylation or methylation. Many histone modifications are known in the art and the number is increasing as new modifications are identified (e.g. see Zhao and Garcia (2015) Cold Spring Harb Perspect Biol, 7: a025064). Therefore, in one embodiment, the epigenetic feature of the cell free nucleosome may be a histone post translational modification (PTM). The histone PTM may be present on a core nucleosome histone (e.g. H2A, H2B, H3 or H4), or a linker histone (e.g. H1 or H5). Examples of PTMs are described in WO 2005/019826 and WO 2017/068359.
[87] In one embodiment, the histone PTMs are selected from acetylation, methylation (which may be mono-, di- or tri-methylation), phosphorylation, ribosylation, citrullination, ubiquitination, hydroxylation, glycosylation, nitrosylation, glutamination and isomerisation. In one embodiment, the histone PTM is methylation of a lysine residue. In a further embodiment, the methylation is of a histone 3 lysine residue. In a yet further embodiment, the histone PTM is selected from H3K4Me, H3K4Me2, H3K9Me, H3K9Me3, H3K27Me3 or H3K36Me3. In one embodiment, the histone PTM is acetylation of a lysine residue. In a further embodiment, the acetylation is of a histone 3 lysine residue. In a yet further embodiment, the histone PTM is selected from H3K9Ac, H3K14Ac, H3K18Ac or H3K27Ac. In another embodiment, the histone PTM is H4PanAc. In one embodiment, the histone PTM is phosphorylation of a serine residue. In a further embodiment, the phosphorylation is of a serine residue of histone 2A isoform X (H2AX) or of a histone 3 serine residue. In a yet further embodiment, the histone PTM is selected from pH2AX or H3S10Ph. In one embodiment, the histone PTM is selected from citrullination or ribosylation. In a further embodiment, the histone PTM is citrullinated H3 (H3cit) or citrullinated H4 (H4cit). In a further embodiment, the histone PTM is citrullination of a histone 3 arginine residue. In a yet further embodiment, the histone PTM is H3R8Cit. In one embodiment, the histone PTM is selected from the group consisting of: H3K4Me, H3K4Me2, H3K9Me, H3K9Me3, H3K27Me3, H3K36Me3, H3K9Ac, H3K14Ac, H3K18Ac, H3K27Ac, H4PanAc, pH2AX, H3S10Ph and H3R8Cit.
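The site-specific mark names listed above follow a compact convention: histone, residue letter, residue position, then the modification (Me, Me2, Me3, Ac, Ph or Cit). A small parser makes the convention explicit. It is a hypothetical helper written only for the names used in this paragraph; group-level marks such as H4PanAc and pH2AX do not name a single site and return None.

```python
import re
from typing import NamedTuple, Optional


class HistoneMark(NamedTuple):
    histone: str       # e.g. "H3"
    residue: str       # amino acid name
    position: int      # residue number on the histone tail
    modification: str  # e.g. "tri-methylation"


_RESIDUES = {"K": "lysine", "R": "arginine", "S": "serine"}
_METHYL = {"ME": "mono", "ME1": "mono", "ME2": "di", "ME3": "tri"}
_OTHER = {"AC": "acetylation", "PH": "phosphorylation", "CIT": "citrullination"}
# Matching is done on the upper-cased name, so "H3K14Ac" and "H3K14AC" parse alike.
_PATTERN = re.compile(r"^(H1|H2A|H2B|H3|H4)([KRS])(\d+)(ME[123]?|AC|PH|CIT)$")


def parse_mark(name: str) -> Optional[HistoneMark]:
    """Parse a site-specific mark such as 'H3K27Me3'; return None if not site-specific."""
    m = _PATTERN.match(name.upper())
    if m is None:
        return None
    histone, res, pos, mod = m.groups()
    modification = f"{_METHYL[mod]}-methylation" if mod in _METHYL else _OTHER[mod]
    return HistoneMark(histone, _RESIDUES[res], int(pos), modification)
```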
[88] A group or class of related histone post translational modifications (rather than a single modification) may also be detected. A typical example, without limitation, would involve a 2-site immunoassay employing one antibody or other selective binder directed to bind to nucleosomes and one antibody or other selective binder directed to bind the group of histone modifications in question. Examples of such antibodies directed to bind to a group of histone modifications would include, for illustrative purposes and without limitation, anti-pan- acetylation antibodies (e.g. a Pan-acetyl H4 antibody [H4panAc]), anti-citrullination antibodies or anti-ubiquitin antibodies.
[89] In one embodiment, the epigenetic feature is a DNA modification. In addition to the epigenetic signalling mediated by nucleosome histone isoform and PTM composition, nucleosomes also differ in their nucleotide and modified nucleotide composition. Some nucleosomes may comprise more 5-methylcytosine residues, 5-hydroxymethylcytosine residues, or other nucleotides or modified nucleotides than other nucleosomes. In one embodiment, the epigenetic feature is a DNA modification selected from 5-methylcytosine or 5-hydroxymethylcytosine. Thus, in some embodiments, the defined calibrated DNA modification is 5-methylcytosine or 5-hydroxymethylcytosine.
[90] A further type of circulating nucleosome subset is nucleosome protein adducts. It has been known for many years that chromatin comprises a large number of non-histone proteins bound to its constituent DNA and/or histones. These chromatin-associated proteins are of a wide variety of types and have a variety of functions; they include transcription factors, transcription enhancement factors, transcription repression factors, histone modifying enzymes, DNA damage repair proteins and many more. Chromatin fragments comprising nucleosomes together with non-histone chromatin proteins, or DNA together with non-histone chromatin proteins, are described in the art. Therefore, in one embodiment, the epigenetic feature comprises one or more protein-nucleosome adducts or complexes.
[91] It will be understood that more than one epigenetic feature of cell free nucleosomes may be detected in methods and uses of the invention. The epigenetic features may be of the same type (e.g. PTMs, histone isoforms, nucleotides or protein adducts) or of different types (e.g. a PTM in combination with a histone isoform). For example, a post-translational histone modification and a histone variant may be detected (i.e. more than one type of epigenetic feature is detected). Alternatively, or additionally, more than one type of post-translational histone modification is detected, or more than one type of histone isoform is detected.
[92] As described herein, the method may additionally comprise measuring or detecting the level of circulating cell free nucleosomes. Said measurement or detection comprises methods described hereinbefore, such as an immunoassay, immunochemical, mass spectroscopy, chromatographic, chromatin immunoprecipitation or biosensor method. In one embodiment, the measurement or detection employs a single binding agent. In an alternative embodiment, the measurement or detection comprises a 2-site immunometric assay employing two binding agents.
[93] It will be clear to those skilled in the art that the terms “antibody”, “binder” or “ligand” as used herein are not limiting but are intended to include any binder capable of specifically binding to particular molecules or entities and that any suitable binder can be used in the method of the invention. In one embodiment, the binding agent is an antibody. In an alternative embodiment, the binding agent is a chromatin binding protein.
[94] The most commonly used epitope binders in the art are antibodies or derivatives of an antibody that contain a specific binding domain. The antibody may be a polyclonal antibody or a monoclonal antibody or a fragment thereof capable of specific binding to the epitope. However, any binder capable of binding to a particular epitope may be used for the purposes of the invention. The reagents may comprise one or more ligands or binders, for example, naturally occurring or chemically synthesised compounds, capable of specific binding to the desired target. A ligand or binder may comprise a peptide, an antibody or a fragment thereof, or a synthetic ligand such as a plastic antibody, or an aptamer or oligonucleotide, capable of specific binding to the desired target. The antibody can be a monoclonal antibody or a fragment thereof. It will be understood that if an antibody fragment is used then it retains the ability to bind the biomarker so that the biomarker may be detected (in accordance with the present invention). A ligand/binder may be labelled with a detectable marker, such as a luminescent, fluorescent, enzyme or radioactive marker; alternatively or additionally a ligand according to the invention may be labelled with an affinity tag, e.g. a biotin, avidin, streptavidin or His (e.g. hexa-His) tag. Alternatively, ligand binding may be determined using a label-free technology for example that of ForteBio Inc. The terms antibody or binder as used herein are interchangeable and refer to any moiety capable of specific binding to an epitope.
[95] In one embodiment, the binding agent is directed to a histone, nucleosome core protein, DNA epitope or a protein adducted to a nucleosome. In a further embodiment, the binding agent is directed to a histone isoform, such as a histone isoform of a core histone, in particular a histone H3 isoform. In a particular embodiment, the binding agent specifically binds to histone isoform H3.1.
[96] A binding agent is considered to “specifically bind” if there is a greater than 10-fold difference, and preferably a 25-, 50- or 100-fold difference, between the binding of the agent to a particular target epitope and its binding to a non-target epitope.
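The specificity criterion above is a simple ratio test. A minimal sketch, assuming binding is read out as a background-corrected signal (for example, RLU) for the target and for a non-target epitope; the function names and signal values are hypothetical:

```python
def fold_difference(target_signal: float, non_target_signal: float) -> float:
    """Ratio of binding to the target epitope over binding to a non-target epitope."""
    if non_target_signal <= 0:
        raise ValueError("non-target signal must be positive to form a ratio")
    return target_signal / non_target_signal


def binds_specifically(target_signal: float, non_target_signal: float,
                       min_fold: float = 10.0) -> bool:
    """True if the fold difference is strictly greater than min_fold.

    min_fold=10 is the minimum stated in the text; 25, 50 or 100 are the
    stricter, preferred thresholds.
    """
    return fold_difference(target_signal, non_target_signal) > min_fold


# Hypothetical background-corrected signals (arbitrary units)
print(binds_specifically(5000.0, 120.0))               # ~41.7-fold: True
print(binds_specifically(5000.0, 120.0, min_fold=50))  # False at the 50-fold threshold
```

Because the criterion reads "greater than", a binder showing exactly a 10-fold difference does not qualify under this sketch.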
[97] The binding agent may comprise an MHC molecule or part thereof which comprises the peptide binding groove. Alternatively, the agent may comprise an anti-peptide antibody. As used herein, "antibody" includes a whole immunoglobulin molecule or a part thereof or a bioisostere or a mimetic thereof or a derivative thereof or a combination thereof. Examples of a part thereof include Fab, F(ab')2 and Fv. Examples of a bioisostere include single chain Fv (scFv) fragments, chimeric antibodies and bifunctional antibodies.
[98] The term "mimetic" relates to any chemical which may be a peptide, polypeptide, antibody or other organic chemical which has the same binding specificity as the antibody. The term "derivative" as used herein in relation to antibodies includes chemical modification of an antibody. Illustrative of such modifications would be replacement of hydrogen by an alkyl, acyl, or amino group.
[99] The binding agent may be an aptamer or a non-immunoglobulin scaffold such as an affibody, an affilin molecule, an AdNectin, a lipocalin mutein, a DARPin, a Knottin, a Kunitz-type domain, an Avimer, a Tetranectin or a transbody.
[100] In one embodiment, the method of measuring the level of nucleosomes comprises contacting the sample with a solid phase comprising a binding agent that detects nucleosomes or a component thereof, and detecting binding to said binding agent. The method of measuring the level of nucleosomes may comprise: (a) contacting the sample with a first binding agent which binds to an epigenetic feature of a cell free nucleosome; (b) contacting the sample bound by the first binding agent in step (a) with a second binding agent which binds to cell free nucleosomes; and (c) detecting or quantifying the binding of the second binding agent in the sample. Alternatively, the measuring the level of nucleosomes may comprise: (a) contacting the sample with a first binding agent which binds to cell free nucleosomes; (b) contacting the sample bound by the first binding agent in step (a) with a second binding agent which binds to an epigenetic feature of the cell free nucleosome; and (c) detecting or quantifying the binding of the second binding agent in the sample.
[101] In some embodiments the binding agent is linked to a solid phase. Therefore, the circulating chromatin fragment (e.g. nucleosome) may be bound and isolated from the sample before analysis.
Detection and diagnosis methods
[102] Methods of the invention may be for use in cancer detection or diagnosis, early cancer screening, residual disease detection, relapse detection, metastasis detection or a combination thereof.
[103] The term “detecting” or “diagnosing” as used herein encompasses identification, confirmation, and/or characterisation of a disease state. Methods of detecting, monitoring and of diagnosis according to the invention are useful to confirm the existence of a disease, to monitor development of the disease by assessing onset and progression, or to assess amelioration or regression of the disease. Methods of detecting, monitoring and of diagnosis are also useful in methods for assessment of clinical screening, prognosis, choice of therapy, evaluation of therapeutic benefit, i.e. for drug screening and drug development.
[104] In one embodiment, the method described herein is repeated on multiple occasions. This embodiment provides the advantage of allowing the detection results to be monitored over a time period. Such an arrangement will provide the benefit of monitoring or assessing the efficacy of treatment of a disease state. Such monitoring methods of the invention can be used to monitor onset, progression, stabilisation, amelioration, relapse and/or remission.
[105] In monitoring methods, test samples may be taken on two or more occasions. The method may further comprise comparing the level of the biomarker(s) present in the test sample with one or more control(s) and/or with one or more previous test sample(s) taken earlier from the same test subject, e.g. prior to commencement of therapy, and/or from the same test subject at an earlier stage of therapy. The method may comprise detecting a change in the nature or amount of the biomarker(s) in test samples taken on different occasions.
[106] A change in the level of the biomarker in the test sample relative to the level in a previous test sample taken earlier from the same test subject may be indicative of a beneficial effect, e.g. stabilisation or improvement, of said therapy on the disorder or suspected disorder. Furthermore, once treatment has been completed, the method of the invention may be periodically repeated in order to monitor for the recurrence of a disease.
[107] Methods of the invention may be used to identify a patient suitable for cancer treatment, such as immunotherapy. Therefore, methods of the invention may be for use in a method for monitoring the efficacy of a therapy in a subject having, suspected of having, or being predisposed to cancer.
[108] Methods for monitoring efficacy of a therapy can be used to monitor the therapeutic effectiveness of existing therapies and new therapies in human subjects and in non-human animals (e.g. in animal models). These monitoring methods can be incorporated into screens for new drug substances and combinations of substances.
[109] In a further embodiment the monitoring of more rapid changes due to fast acting therapies may be conducted at shorter intervals of hours or days.
[110] Biomarkers for detecting the presence of a disease are essential targets for discovery of novel targets and drug molecules that retard or halt progression of the disorder. As the level of the biomarker is indicative of disorder and of drug response, the biomarker is useful for identification of novel therapeutic compounds in in vitro and/or in vivo assays. Biomarkers described herein can be employed in methods for screening for compounds that modulate the activity of the biomarker.
[111] The identification of biomarkers for a disease state permits integration of diagnostic procedures and therapeutic regimes. The biomarkers provide the means to indicate therapeutic response, failure to respond, unfavourable side-effect profile, degree of medication compliance and achievement of adequate serum drug levels. The biomarkers may be used to provide warning of adverse drug response. Biomarkers are useful in development of personalized therapies, as assessment of response can be used to fine-tune dosage, minimise the number of prescribed medications, reduce the delay in attaining effective therapy and avoid adverse drug reactions. Thus, by monitoring a biomarker of the invention, subject care can be tailored precisely to match the needs determined by the disorder and the pharmacogenomic profile of the subject; the biomarker can thus be used to titrate the optimal dose, predict a positive therapeutic response and identify those subjects at high risk of severe side effects.
[112] Biomarker-based tests provide a first line assessment of ‘new’ subjects, and provide objective measures for accurate and rapid diagnosis, not achievable using the current measures.
[113] Biomarker monitoring methods are also vital as subject monitoring tools, to enable the physician to determine whether relapse is due to worsening of the disorder. If pharmacological treatment is assessed to be inadequate, then therapy can be reinstated or increased; a change in therapy can be given if appropriate. As the biomarkers are sensitive to the state of the disorder, they provide an indication of the impact of drug therapy.
[114] In one embodiment, the subject is suspected of relapse to a cancer. Minimal residual disease (MRD) is the name given to small numbers of cancer cells that remain in the person during treatment, or after treatment when the patient is in remission (i.e. patients with no symptoms or signs of disease). However, MRD is the major cause of relapse in cancer. Methods of the invention are therefore useful in monitoring patients who are suspected of relapse, particularly patients who are in remission from cancer.
[115] The subject tested using the methods described herein may present with symptoms indicative of cancer; for example, the symptoms of a haematological cancer may include anaemia, leucocytosis and/or swollen lymph nodes. In one embodiment, the subject has a high level of leucocytosis. This may also be referred to as a “high white blood cell count”. Haematological cancers typically cause increased proliferation of abnormal white or red blood cells, which results in a high white blood cell count. However, leucocytosis is not sufficient to diagnose a patient with a haematological cancer (in particular leukaemia) because it is frequently a sign of an inflammatory response, most commonly the result of infection. Therefore, methods of the invention are able to provide a more specific differential method to distinguish patients who are likely to be suffering from cancer from those with an inflammatory condition.
[116] Detecting and/or quantifying may be compared to a cut-off level. Cut-off values can be predetermined by analysing results from multiple patients and controls, and determining a suitable value for classifying a subject as with or without the disease. For example, for diseases where the level of biomarker is higher in patients suffering from the disease, then if the level detected is higher than the cut-off, the patient is indicated to suffer from the disease. Alternatively, for diseases where the level of biomarker is lower in patients suffering from the disease, then if the level detected is lower than the cut-off, the patient is indicated to suffer from the disease. The advantages of using simple cut-off values include the ease with which clinicians are able to understand the test and the elimination of any need for software or other aids in the interpretation of the test results. Cut-off levels can be determined using methods in the art.
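Cut-off levels can be derived in several ways. One common choice, shown here purely as an illustration and not necessarily the method used for the invention, is to pick the value that maximises Youden's J (sensitivity + specificity - 1) over patient and control results. The sketch assumes a biomarker that is higher in disease, and the data are made up:

```python
from typing import Sequence


def youden_cutoff(patients: Sequence[float],
                  controls: Sequence[float]) -> tuple[float, float]:
    """Return (cut_off, J) maximising Youden's J = sensitivity + specificity - 1.

    Assumes higher levels indicate disease: level >= cut_off is called positive.
    Every observed value is tried as a candidate cut-off.
    """
    candidates = sorted(set(patients) | set(controls))
    best_cut, best_j = candidates[0], -1.0
    for cut in candidates:
        sensitivity = sum(x >= cut for x in patients) / len(patients)
        specificity = sum(x < cut for x in controls) / len(controls)
        j = sensitivity + specificity - 1.0
        if j > best_j:  # strict ">" keeps the lowest cut-off on ties
            best_cut, best_j = cut, j
    return best_cut, best_j


def classify(level: float, cut_off: float) -> str:
    """Simple cut-off rule for a biomarker that is raised in disease."""
    return "disease indicated" if level >= cut_off else "disease not indicated"


# Hypothetical biomarker levels (e.g. ng/mL)
cut, j = youden_cutoff(patients=[310, 420, 980, 560], controls=[40, 85, 120, 60])
print(cut, j)              # 310 1.0
print(classify(500, cut))  # disease indicated
```

For a biomarker that is lower in disease, the comparisons in both functions would be reversed.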
[117] Detecting and/or quantifying may also be compared to a control. It will be clear to those skilled in the art that the control subjects may be selected on a variety of bases, which may include, for example, subjects known to be free of the disease or subjects with a different disease (for example, for the investigation of differential diagnosis). The “control” may comprise a healthy subject, a non-diseased subject and/or a subject without a haematological cancer. Comparison with a control is well known in the field of diagnostics.
[118] Both positive and negative controls may be used. Thus, the presence of a cancer disease in a subject may be confirmed by comparison of results with known cancer controls (positive control) as well as with known disease free or non-cancer controls (negative control).
[119] It will be understood that it is not necessary to measure control levels or size profiles for comparative purposes on every occasion. For example, for healthy/non-diseased controls, once the ‘normal range’ is established it can be used as a benchmark for all subsequent tests. A normal range can be established by obtaining samples from multiple control subjects without cancer and testing for the level of biomarker. Results for subjects suspected to have cancer can then be examined to see if they fall within, or outside of, the respective normal range. Use of a ‘normal range’ is standard practice for the detection of disease.
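A normal range is often taken as the central 95% of control results. The sketch below assumes that convention and uses non-parametric percentiles from Python's statistics module; the control values are synthetic, and the text does not specify how its normal ranges were set:

```python
from statistics import quantiles
from typing import Sequence


def normal_range(control_levels: Sequence[float]) -> tuple[float, float]:
    """Non-parametric 2.5th-97.5th percentile interval of control results.

    quantiles(n=40) returns 39 cut points at 2.5% steps, so the first and
    last cut points are the 2.5th and 97.5th percentiles.
    """
    cuts = quantiles(control_levels, n=40, method="inclusive")
    return cuts[0], cuts[-1]


def within_normal_range(level: float, limits: tuple[float, float]) -> bool:
    """True if a test result falls inside the established normal range."""
    low, high = limits
    return low <= level <= high


controls = [float(v) for v in range(1, 102)]  # synthetic: 101 control results
limits = normal_range(controls)
print(limits)                              # (3.5, 98.5)
print(within_normal_range(120.0, limits))  # False: outside the normal range
```

Once computed, the limits can be stored and reused as the benchmark for later samples, which is the point made in the paragraph above.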
[120] In one embodiment, the method additionally comprises determining at least one clinical parameter for the patient. This parameter can be used in the interpretation of results. Clinical parameters may include any relevant clinical information for example, without limitation, gender, weight, Body Mass Index (BMI), smoking status, temperature and dietary habits. Therefore, in one embodiment, the clinical parameter is selected from the group consisting of: age, sex and body mass index (BMI).
[121] In one embodiment, the method of the invention is performed to identify a subject at high risk of having a cancer and therefore in need of further testing (i.e. further cancer investigations). The further testing may involve one or more of: biopsy (such as bone marrow biopsy or lymph node biopsy), cytogenetic testing, immunophenotyping, CT scanning, X-ray (in particular chest X-ray to identify swollen lymph nodes) and/or lumbar puncture.
[122] Methods and biomarkers described herein may be used to identify if a patient is in need of a biopsy, in particular a bone marrow or lymph node biopsy (e.g. for patients with suspected haematological cancer). Therefore, according to a further aspect of the invention there is provided a method of identifying a patient in need of a biopsy comprising performing the method of the invention and using the results obtained to identify whether the patient is in need of a biopsy.
[123] References herein to “subject” or “patient” are used interchangeably. The subject may be a human or an animal subject. In one embodiment the subject is a human subject. In some embodiments the subject is a (non-human) animal subject. In a further embodiment, the animal is a companion animal (also referred to as a pet or domestic animal). Companion animals include, for example, dogs, cats, rabbits, ferrets, horses, cows, or the like. In particular, the companion animal is a dog or cat, particularly a dog. The methods described herein may be performed in vitro, or ex vivo.
[124] It will be understood that the embodiments described herein may be applied to all aspects of the invention, i.e. the embodiment described for the uses may equally apply to the claimed methods and so forth.
[125] The invention will now be illustrated with reference to the following non-limiting examples.
EXAMPLES
Methods
Plasma sample procurement
[126] Plasma samples were procured from the following commercial biobanks: Discovery Life Sciences (DLS), Innovative Research, and Bay Biosciences. Fifty-two self-reported healthy individuals were selected for Illumina sequencing based on donor age, with an even distribution of ages from 40 to 90 years. The 207 cancer cases were selected based on Stage III/IV status and diversity of cancer types. Healthy samples for Nanopore sequencing were selected based on the volume of plasma available and the age of the donor. These samples were distributed across age groups as follows: 10 samples in each of the 40-49, 50-59 and 60-69 year age groups, 15 samples in the 70-79 age group, and 5 samples in the 80-89 age group. Samples were classified as healthy solely on the basis of the absence of a cancer diagnosis. Cancer samples were selected based on late-stage diagnosis (III or IV), type of cancer, treatment status (untreated) and large sample volume.
[127] All three commercial biobanks collect blood in EDTA tubes, isolate plasma by centrifugation at 1,300-2,000 × g for 10-15 minutes, and then freeze the plasma at -80°C.
Sample Storage and Preparation
[128] Plasma samples were received and subsequently stored at -80°C. Aliquots were thawed at room temperature (RT) for up to 2 hours, depending on the aliquot volume. Plasma was then spun at 14000 x g for 2 minutes at room temperature and transferred to a new tube avoiding any pelleted fraction.
Nucleosome Quantification
[129] Nucleosome levels in plasma were quantified using the Nu.Q® H3.1 assay developed by Volition for use on the IDS-i10 instrument, using 250 µL of plasma according to the manufacturer’s recommendations. Samples were run in duplicate or triplicate. Raw RLU (relative light units) were converted to nucleosome concentrations (ng/mL) using the provided kit standards, and replicate measurements were averaged.
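The RLU-to-concentration step can be pictured as reading each replicate off the standard curve and then averaging. The kit's own curve-fitting model is not described here, so this sketch assumes simple piecewise-linear interpolation on log-log axes between bracketing standards; the function names and standard values are hypothetical:

```python
import bisect
import math
from statistics import mean
from typing import Sequence


def rlu_to_ng_per_ml(rlu: float, std_rlu: Sequence[float],
                     std_conc: Sequence[float]) -> float:
    """Interpolate a concentration from RLU on log-log axes.

    std_rlu and std_conc are at least two kit standards, both sorted ascending.
    Signals outside the standard range are rejected rather than extrapolated.
    """
    xs = [math.log10(r) for r in std_rlu]
    ys = [math.log10(c) for c in std_conc]
    x = math.log10(rlu)
    if not xs[0] <= x <= xs[-1]:
        raise ValueError(f"RLU {rlu} is outside the standard curve")
    # Index of the upper bracketing standard (clamped when x equals the top standard)
    i = min(bisect.bisect_right(xs, x), len(xs) - 1)
    frac = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return 10 ** (ys[i - 1] + frac * (ys[i] - ys[i - 1]))


def sample_concentration(replicate_rlu: Sequence[float], std_rlu: Sequence[float],
                         std_conc: Sequence[float]) -> float:
    """Convert each replicate to ng/mL, then average the replicates."""
    return mean(rlu_to_ng_per_ml(r, std_rlu, std_conc) for r in replicate_rlu)


# Hypothetical standards: RLU versus nucleosome concentration (ng/mL)
STD_RLU = [1_000.0, 10_000.0, 100_000.0]
STD_CONC = [10.0, 100.0, 1_000.0]
print(sample_concentration([9_800.0, 10_200.0, 10_000.0], STD_RLU, STD_CONC))
```

Converting before averaging matches the order described in the text; averaging raw RLU first would give a slightly different result because the curve is non-linear on a linear scale.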
DNA Extraction and Basic Characterization
[130] All DNA extractions were completed with the QIAamp® Circulating Nucleic Acid Kit (Qiagen, catalogue # 55114) or with the QIAamp® MinElute ccfDNA Kit (Qiagen, catalogue # 55204). When possible, the extraction was completed with 1 mL plasma (if less, 1X PBS was used to bring volume to 1 mL). Additional extractions (if DNA levels were not sufficient for ONT) were carried out with up to 5 mL plasma, according to the manufacturer’s protocol.
[131] The Qiagen protocol was followed per the manufacturer’s recommendations with the following adjustments: samples were eluted in 30 µL of heated (60°C) elution buffer, and the elution step was also performed at 60°C to increase DNA yield.
[132] Samples were quantified with Qubit and total DNA levels were calculated based on plasma volume input. DNA profiles were confirmed using BioAnalyzer or Tapestation (Agilent).
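Converting a Qubit reading into a plasma-normalised DNA level is a short calculation. A minimal sketch, assuming the full elution volume is recovered and that the denominator is the plasma actually extracted (not the 1 mL total after topping up with PBS); the function name and values are illustrative:

```python
def cfdna_ng_per_ml_plasma(qubit_ng_per_ul: float, elution_volume_ul: float,
                           plasma_volume_ml: float) -> float:
    """Total DNA recovered (ng) divided by the plasma volume extracted (mL)."""
    if plasma_volume_ml <= 0:
        raise ValueError("plasma volume must be positive")
    total_ng = qubit_ng_per_ul * elution_volume_ul
    return total_ng / plasma_volume_ml


# 0.5 ng/uL in a 30 uL eluate from 1 mL of plasma -> 15 ng per mL of plasma
print(cfdna_ng_per_ml_plasma(0.5, 30.0, 1.0))  # 15.0
# Same eluate, but only 0.5 mL plasma (topped up to 1 mL with PBS) -> 30 ng/mL
print(cfdna_ng_per_ml_plasma(0.5, 30.0, 0.5))  # 30.0
```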
Illumina Sequencing and mapping
[133] When possible, DNA from the basic characterization step was used for Illumina Sequencing. If more was needed, additional DNA from 1-5 mL of plasma was extracted using the QIAamp® Circulating Nucleic Acid Kit (Qiagen, catalogue # 55114). Illumina sequencing libraries were constructed using SRSLY® PicoPlus DNA NGS Library Preparation Kit (ClaretBio - Cat: CBS-K250B). Libraries were sequenced by Discovery Life Sciences using the NovaSeq 6000 S4 200 cycle flow cell. Genome mapping to hg38 was performed using the Illumina BaseSpace DRAGEN pipeline.
Oxford Nanopore (ONT) Sequencing
[134] Samples were prepared for Nanopore sequencing using the 109 library prep kit and the 104 and 114 barcoding kits (ONT, catalogue #s: SQK-LSK109, EXP-NBD104 and EXP-NBD114, respectively). The Oxford Nanopore Technologies (ONT) protocol was used with the following adjustments. In the DNA repair and end-prep section, DNA input was decreased to 30 ng, and incubation times were increased from 5 minutes at 20°C and 5 minutes at 65°C to 30 minutes at each temperature. Throughout the protocol, the AMPure bead:reaction ratio was changed from 1:1 to 3:1, and ethanol wash volumes were increased from 200 µL to 400 µL. During sequencing, the input and the amount of library loaded per flow cell were further adapted as needed, based on the success of previous sequencing runs and the number of pores available on the flow cell. R9.4.1 flow cells were sequenced on the MinION Mk1C, with sequencing runs lasting 16-72 hours depending on the health of the flow cell. Samples were sequenced individually but were still barcoded to avoid possible contamination. Barcode demultiplexing and quality filtering were performed by MinKNOW using the FAST model.
ONT basecalling, mapping, and DNA modification calling
[135] Passing fast5 files were re-processed using Dorado v. 0.2.4+3fc2b0f with the SUP basecalling model dna_r9.4.1_e8_sup@v3.3 and the Remora modification calling model dna_r9.4.1_e8_sup@v3.3_5mCG@v0, which produced unaligned BAM files including DNA modification tags. These were mapped to the UCSC hg38.analysisSet assembly (https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/analysisSet/) using minimap2 v. 2.24-r1122 with “minimap2 -y -t 2 -ax map-ont” and filtered with samtools v. 1.14 with “samtools view -e '([qs] >= 10) && (mapq >= 20)'”. Duplicates were marked with samtools v. 1.17 “samtools markdup”, and unmapped, secondary, supplementary and duplicate-marked records were removed using “samtools view -F 0xD04”.
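The exclusion mask 0xD04 passed to samtools view -F combines four SAM FLAG bits. A small sketch, using the bit definitions from the SAM format specification, shows which records such a filter drops; the helper names are illustrative only:

```python
# SAM FLAG bits as defined in the SAM/BAM format specification
SAM_FLAG_BITS = {
    0x1: "PAIRED",
    0x2: "PROPER_PAIR",
    0x4: "UNMAP",
    0x8: "MUNMAP",
    0x10: "REVERSE",
    0x20: "MREVERSE",
    0x40: "READ1",
    0x80: "READ2",
    0x100: "SECONDARY",
    0x200: "QCFAIL",
    0x400: "DUP",
    0x800: "SUPPLEMENTARY",
}


def decode_flag(mask: int) -> list[str]:
    """Names of all FLAG bits set in mask, in ascending bit order."""
    return [name for bit, name in SAM_FLAG_BITS.items() if mask & bit]


def kept_by_exclude_filter(read_flag: int, exclude_mask: int = 0xD04) -> bool:
    """Same rule as `samtools view -F mask`: drop a record if any masked bit is set."""
    return (read_flag & exclude_mask) == 0


print(decode_flag(0xD04))            # ['UNMAP', 'SECONDARY', 'DUP', 'SUPPLEMENTARY']
print(kept_by_exclude_filter(16))    # True: primary, reverse-strand alignment
print(kept_by_exclude_filter(1040))  # False: 1024 (DUP) + 16 (REVERSE)
```

Because 0x400 is part of the mask, this step also discards the records flagged by the preceding markdup step.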
[136] For global methylation and CpG island methylation analysis, methylation BED files were created using modkit v. 0.1.5 with the command “modkit pileup --cpg --combine-strands --ignore h --filter-threshold 0.9 --bedgraph”. For methylation deconvolution, Minimap BAM files were used directly (details below).
Copy number alteration analysis
[137] Analysis of CNAs was performed using ichorCNA v. 0.5.0. Default parameters were used with all chromosomes 1-22, with a genomic bin size of 1E6. The following other parameters were used:
“centromere='GRCh38.GCA_000001405.2_centromere_acen.txt', scStates="", repTimeWig="", ploidy=2, maxCN=7, includeHOMD=FALSE, estimateNormal=TRUE, estimatePloidy=TRUE, estimateScPrevalence=TRUE, altFracThreshold=0.05, genomeBuild="hg38", genomeStyle="UCSC", flankLength=100000, txnE=0.9999999, chrs="c('chr1', ... ,'chr22')", chrTrain="c('chr1', ... ,'chr22')", normal="c(0.5)"”
[138] CNA was considered “detected” if tumor fraction was estimated to be greater than 0.
Fragment length analysis
[139] Fragment lengths were extracted from ONT Minimap BAMs using the script “fragmentationReports.py”, which uses Pysam and defines the fragment length from the primary alignment as the difference between its start and end coordinates on the reference genome. Fragment length distributions for PCA analysis were created by defining bins at 10^x, spanning fragment lengths from 50 to 50,000 bp. Raw fragment lengths were log10-transformed and assigned to the closest bin by rounding to the nearest increment of 0.01 in log10 space (i.e. adjacent bins differ by a factor of 10^0.01). The proportion of fragments is defined as the number of fragments in a given bin divided by the total number of fragments in all bins. For the proportion of total DNA, the lengths (in base pairs) of all fragments in a bin are summed to give the bin base-pair total, and each bin total is then divided by the sum of the base-pair totals across all bins.
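The binning and the two normalisations can be sketched in a few lines. This assumes fragment lengths have already been computed (in Pysam, from a primary alignment's reference_end minus reference_start), uses integer bin indices k so that bin k is centred on 10**(k/100) bp, and simply drops fragments outside 50-50,000 bp, which the text does not specify; the function name is illustrative:

```python
import math
from collections import Counter
from typing import Iterable


def length_profiles(lengths: Iterable[int], min_bp: int = 50, max_bp: int = 50_000,
                    steps_per_decade: int = 100) -> tuple[dict[int, float], dict[int, float]]:
    """Return (fragment proportion, DNA proportion) per log10 bin index.

    Bin index k = round(log10(length) * steps_per_decade); with 100 steps per
    decade, adjacent bins are 10**0.01-fold apart.
    """
    frag_counts = Counter()
    bp_totals = Counter()
    for length in lengths:
        if not min_bp <= length <= max_bp:
            continue  # assumption: out-of-range fragments are ignored
        k = round(math.log10(length) * steps_per_decade)
        frag_counts[k] += 1
        bp_totals[k] += length  # each fragment contributes its length in bp
    n_frag, n_bp = sum(frag_counts.values()), sum(bp_totals.values())
    if n_frag == 0:
        return {}, {}
    frag_prop = {k: c / n_frag for k, c in frag_counts.items()}
    dna_prop = {k: b / n_bp for k, b in bp_totals.items()}
    return frag_prop, dna_prop


frag, dna = length_profiles([100, 100, 1000])
# Two 100 bp fragments and one 1 kb fragment: 2/3 of fragments but 1/6 of DNA
print(frag[200], dna[200])
```

The example shows why the two normalisations diverge: the single 1 kb fragment is a third of the fragments but five sixths of the DNA.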
[140] In order to perform CNA, end motif analysis, and methylation cell of origin analysis in specific fragment length bins, Minimap BAMs were filtered using the script “stratifyBamByFraglen.py”, which uses the same Pysam code as above to calculate the fragment length of each read, and write that read to the output BAM file only if it is within the correct length range. The code to perform this analysis is available as the script “fraglens_to_histogram.py”.
End motif analysis
[141] ONT Minimap BAMs were processed using the script “fragmentationReports.py”, which uses Pysam to perform the following analysis. We collect all reads that contain a perfect match to the final 5 base pairs of the ONT adapter sequence (“CACCT”) as the final sequence in the soft-clipped portion of the read. We then collect the first 5 base pairs of the aligned portion of the read, which is by definition adjacent to the adapter sequence. This is the “end motif”. If any of the adjacent 5 base pairs have a basecalling quality less than 20, the end is not counted. We count each of the two ends of each fragment as independent observations.
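The motif rule above can be sketched for the read-start end of a fragment. This assumes the read is in its original basecalled orientation, that the length of the leading soft clip has been taken from the CIGAR string, and that "adjacent 5 base pairs" means the motif bases themselves; how the second fragment end is located is not shown, and the function name and reads are hypothetical:

```python
from collections import Counter
from typing import Optional, Sequence

ADAPTER_TAIL = "CACCT"  # final 5 bp of the ONT adapter, per the text


def read_start_end_motif(seq: str, quals: Sequence[int], left_softclip: int,
                         k: int = 5, min_q: int = 20) -> Optional[str]:
    """End motif at the adapter-adjacent start of a read, or None if it fails.

    seq/quals are the read bases and Phred qualities; left_softclip is the
    length of the leading soft-clipped segment.
    """
    if not seq[:left_softclip].endswith(ADAPTER_TAIL):
        return None  # soft clip does not end in a perfect adapter match
    motif = seq[left_softclip:left_softclip + k]
    motif_quals = quals[left_softclip:left_softclip + k]
    if len(motif) < k or min(motif_quals) < min_q:
        return None  # too short, or a motif base below the quality threshold
    return motif


# Hypothetical reads: (sequence, qualities, leading soft-clip length)
reads = [
    ("GGTACACCT" + "CCAAGTTGCA", [30] * 19, 9),
    ("GGTACACCT" + "CCTAGTTGCA", [30] * 10 + [12] + [30] * 8, 9),
    ("GGTAAAAAA" + "CCAAGTTGCA", [30] * 19, 9),
]
motif_counts = Counter(m for m in (read_start_end_motif(*r) for r in reads) if m)
print(motif_counts)  # Counter({'CCAAG': 1})
```

The second read fails on one low-quality motif base and the third lacks the adapter tail, so only one motif is counted.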
Principal Component Analysis (PCA) of fragment length distributions
[142] DNA proportions were calculated for fragment length bins as described above. PCA was performed using the sklearn.decomposition Python package on column-normalized versions of the fragment length distributions. These were defined by taking the proportion of DNA in each bin for a given sample and calculating a z-score based on the mean and standard deviation of that bin across all 21 healthy samples (not including the 3 “unspun” healthy samples).
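The healthy-referenced normalisation followed by PCA can be sketched with NumPy. The text used sklearn.decomposition; to keep the example dependency-light, this sketch instead computes PCA directly from a singular value decomposition of the mean-centred matrix, which gives the same components up to sign. The sample standard deviation (ddof=1), the guard for zero-variance bins and the random input matrix are all assumptions:

```python
import numpy as np


def healthy_zscore(profiles: np.ndarray, is_healthy: np.ndarray) -> np.ndarray:
    """Z-score every bin (column) against the healthy samples only.

    profiles: samples x bins matrix of DNA proportions.
    is_healthy: boolean mask over samples (rows).
    """
    healthy = profiles[is_healthy]
    mu = healthy.mean(axis=0)
    sd = healthy.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # assumption: leave constant bins unscaled
    return (profiles - mu) / sd


def pca_scores(x: np.ndarray, n_components: int = 2) -> tuple[np.ndarray, np.ndarray]:
    """PCA via SVD of the column-centred matrix: (scores, explained variance ratio)."""
    centred = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


rng = np.random.default_rng(0)
profiles = rng.dirichlet(np.ones(30), size=40)  # 40 samples x 30 bins, rows sum to 1
is_healthy = np.arange(40) < 21                 # first 21 rows treated as healthy
scores, ratio = pca_scores(healthy_zscore(profiles, is_healthy))
print(scores.shape, ratio.round(3))
```

Z-scoring against healthy samples only means a value of 0 in any bin corresponds to the healthy mean, so cancer samples separate by how far they deviate from normal variation.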
Cancer methylation signatures
For the “global” hypomethylation signature, we used the mean methylation value at 495,252 genomic CpGs across the genome which have been associated with rapid methylation loss during replication in cancer (Zheng et al., 2023). These were taken from the file zhou_NN_scores_PMD_top_bottom_10ptile.0based.hg19.bed.gz (https://doi.org/10.1101/2022.08.16.504069), and lifted over to hg38. For the CpG island methylation signature, we used the mean of all CpGs contained within 3,329 CpG islands that were significantly hypermethylated within human cancer data from The Cancer Genome Atlas (TCGA) project (Zheng et al., 2021).
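Both signatures reduce a per-CpG methylation table (such as modkit bedgraph output) to one number per sample. A sketch, assuming an unweighted mean of per-CpG methylation fractions, 0-based CpG coordinates, and half-open, non-overlapping island intervals; coverage weighting is not stated in the text, and the function names and values are made up:

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import mean
from typing import Iterable

CpG = tuple[str, int]  # (chromosome, 0-based CpG position)


def site_signature_mean(meth: dict[CpG, float], sites: Iterable[CpG]) -> float:
    """Mean methylation over signature CpGs that were covered in this sample."""
    values = [meth[site] for site in sites if site in meth]
    return mean(values) if values else float("nan")


def island_signature_mean(meth: dict[CpG, float],
                          islands: Iterable[tuple[str, int, int]]) -> float:
    """Mean methylation over all covered CpGs inside half-open [start, end) islands."""
    by_chrom = defaultdict(list)
    for chrom, start, end in islands:
        by_chrom[chrom].append((start, end))
    starts = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts[chrom] = [s for s, _ in intervals]
    values = []
    for (chrom, pos), value in meth.items():
        if chrom not in by_chrom:
            continue
        i = bisect_right(starts[chrom], pos) - 1  # last island starting at or before pos
        if i >= 0 and pos < by_chrom[chrom][i][1]:
            values.append(value)
    return mean(values) if values else float("nan")


meth = {("chr1", 100): 0.8, ("chr1", 150): 0.6, ("chr1", 500): 0.1, ("chr2", 10): 0.9}
print(site_signature_mean(meth, [("chr1", 100), ("chr2", 10), ("chr3", 5)]))
print(island_signature_mean(meth, [("chr1", 90, 200)]))
```

Signature CpGs with no coverage in a sample (here chr3:5) are skipped rather than counted as unmethylated.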
Methylation Cell of Origin (COO) analysis
[143] We used a version of the CelFiE-ISH package (v0.0.2, https://gitlab.com/methylgrammarlab/deconvolution_models) modified to take Minimap BAM files as input, with the command: deconvolution --model uxm --min_length 2 --u_threshold 0.25 --modbam_qual 0.9 --percent_u U_atlas.hg38.tsv -m input.bam -i U250.hg38.tsv -b --cpg_coordinates hg38.analysisSet.CGmotifs.modkit.merged.0based.bed.gz --epiformat modbam --outfile output.txt
[144] The “--modbam_qual 0.9” setting was used to filter out any modified-base call with a modification probability below 0.9. We used the 31 cell types from the WGBS atlas (Loyfer et al., 2023), as described in the CelFiE-ISH paper (Unterman et al., 2024). We summed individual cell types to create composite cell type groups such as “Gastrointestinal-Ep” ('Gastric-Ep' + 'Colon-Ep' + 'Small-Int-Ep'). For Figure 5, each cell type was normalized by calculating the Z-score using the mean and standard deviation across all healthy and cancer samples.
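The composite groups and the Figure 5 normalisation can be sketched as follows. This assumes each sample's deconvolution output is available as a mapping from cell type to fraction, that composites are added alongside (not instead of) their member cell types, and that the sample standard deviation is used; the function names and fractions are hypothetical:

```python
from statistics import mean, stdev

# Composite groups built by summing atlas cell types (example from the text)
COMPOSITE_GROUPS = {
    "Gastrointestinal-Ep": ["Gastric-Ep", "Colon-Ep", "Small-Int-Ep"],
}


def add_composites(fractions: dict[str, float],
                   groups: dict[str, list[str]] = COMPOSITE_GROUPS) -> dict[str, float]:
    """Return a copy of one sample's cell-type fractions with composite groups added."""
    out = dict(fractions)
    for group, members in groups.items():
        out[group] = sum(fractions.get(m, 0.0) for m in members)
    return out


def zscores(samples: list[dict[str, float]], cell_type: str) -> list[float]:
    """Z-score one cell type across all samples (healthy and cancer together)."""
    values = [s[cell_type] for s in samples]
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]


samples = [
    add_composites({"Gastric-Ep": 0.002, "Colon-Ep": 0.004, "Small-Int-Ep": 0.001}),
    add_composites({"Gastric-Ep": 0.003, "Colon-Ep": 0.010, "Small-Int-Ep": 0.002}),
    add_composites({"Gastric-Ep": 0.001, "Colon-Ep": 0.090, "Small-Int-Ep": 0.004}),
]
print(zscores(samples, "Gastrointestinal-Ep"))
```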
Statistical tests
[145] Statistical tests, jitter plots, and scatterplots were generated with GraphPad Prism v. 10.2.1.
EXAMPLE 1: Nanopore sequencing of a selected subset of cancer and healthy human samples
[146] Oxford Nanopore Technologies (ONT) sequencing may be used to identify DNA methylation states, which can in turn be used to determine the cell of origin (COO) of circulating plasma DNA. To better understand the origins of the increased nucleosome and DNA levels found in cancer, we selected a subset of 37 cancer samples representing 12 different cancer types, and 21 healthy samples, including both high- and low-chromatin cases. Each sample was sequenced to a depth of 1-16 million reads (median = 4.7 million) using ONT MinION R9.4.1 flow cells and LSK109 library chemistry.
[147] Although ONT sequencing yielded fewer fragments on average than Illumina sequencing, we were able to detect copy number alterations (CNAs) in 16 of the 37 cancer samples (Figure 1A). No samples from some cancer types, such as AML, brain and pancreatic cancer, showed CNAs, but multiple cases from several common cancer types, including colorectal and lung cancer, did. Not only were CNAs and tumor fraction estimates highly consistent between the two methods (Figure 1B), but ONT also detected CNAs in 5 samples (all with estimated tumor fractions below 0.2) in which they were not detected in the Illumina whole genome sequencing (WGS) data (Figure 1C).
[148] Using methylation information from ONT reads and a methylation atlas of normal cell types we performed COO deconvolution. This identified the correct cancer COO for 11 of the 16 CNA-positive samples, and 1 CNA-negative sample (Figure 1D-E). Importantly, the absolute fraction of the cancer derived DNA was never above 50%, consistent with CNA analysis and indicating that cancer DNA was not the major contributor of elevated cfDNA levels in the samples studied. Five Acute Myeloid Leukemia samples had no CNAs but did have elevated granulocyte derived DNA, which could not be unambiguously assigned to cancer or non-cancer origin given the common myeloid lineage. The remaining 15 samples had neither detection of CNAs nor of cancer derived DNA, suggesting low cancer fraction despite many having high cfDNA and H3.1 nucleosome levels (Figure 1D).
EXAMPLE 2: Identification of cancer-associated fragment length features
[149] A number of studies have investigated cfDNA fragment lengths using short-read WGS, finding significant differences between cancer and non-cancer plasma. Using our ONT WGS data, we surveyed fragment lengths with a single-molecule, long-read technology that requires no amplification, providing the most unbiased sequencing survey of fragment length across multiple cancer types.
[150] We first represented each cancer and healthy sample as the fraction of DNA sequenced in each of a series of length bins spanning 50-60,000 bp. Fragment lengths are often quantified as the fraction of sequencing reads in each bin. However, we reasoned that for ONT sequencing it would be preferable to analyze the amount of DNA present in each bin rather than the read counts (thus, a 1 kb fragment is weighted 10x more than a 100 bp fragment). One reason is that ONT sequencing is asynchronous, so longer DNA fragments take proportionally longer to sequence than short fragments (unlike short-read sequencing). We also reasoned that quantifying the proportion of DNA rather than the fragment count would better reflect the overall cell of origin of the DNA, since one cell yields a constant number of base pairs, not a constant number of sequenced fragments.
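The difference between read-count and length-weighted binning can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the method used in this study: the function name `bin_fractions` and the choice of 40 log-spaced bins are assumptions, since the text only states that the bins span 50-60,000 bp.

```python
import numpy as np

def bin_fractions(lengths, edges, weight_by_length=True):
    """Fraction of sequenced DNA (or of reads) falling in each length bin.

    lengths: fragment lengths in bp; edges: increasing bin edges in bp.
    With weight_by_length=True each fragment contributes its length in bp,
    so a 1 kb fragment counts 10x as much as a 100 bp fragment.
    Fragments outside the outer edges are ignored by np.histogram.
    """
    lengths = np.asarray(lengths, dtype=float)
    weights = lengths if weight_by_length else np.ones_like(lengths)
    totals, _ = np.histogram(lengths, bins=edges, weights=weights)
    return totals / totals.sum()

# Hypothetical layout: 40 log-spaced bins between 50 and 60,000 bp.
edges = np.logspace(np.log10(50), np.log10(60_000), num=41)
```

For example, nine 100 bp fragments plus one 1 kb fragment split 0.9/0.1 by read count, but about 0.47/0.53 by DNA content when binned at 50-500 and 500-5,000 bp.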
[151] We next performed unsupervised Principal Component Analysis (PCA) (Figure 2A), focusing on the first two components, PC1 and PC2. Interestingly, the first component (PC1) separated a majority of the cancers (in yellow) from a set of 7 other cancers (in red); most of these PC1-positive cancers were AML. The second component (PC2) separated most cancers from the healthy samples. Because our aim was to identify deviations from the normal variation in healthy individuals, we z-score normalized each bin by its mean and standard deviation in the healthy samples and re-performed PCA, which better resolved cancer vs. healthy as well as the set of 7 outlier cancers (Figure 2B). The top two components of this bin-normalized PCA captured 87% of the variance.
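The healthy-referenced normalization followed by PCA can be sketched as below, assuming a samples x bins matrix of DNA fractions. Several details are assumptions rather than statements from the text: centering on the full cohort before the SVD, the sample standard deviation (`ddof=1`), and the guard that leaves bins with zero healthy variance unscaled.

```python
import numpy as np

def healthy_zscore_pca(X, is_healthy, n_components=2):
    """PCA on bin fractions z-scored against the healthy samples only.

    X: samples x bins matrix of DNA fractions per length bin.
    is_healthy: boolean mask selecting the reference (healthy) rows.
    Returns (scores, loadings, explained_variance_ratio).
    """
    X = np.asarray(X, dtype=float)
    ref = X[np.asarray(is_healthy, dtype=bool)]
    mu = ref.mean(axis=0)
    sd = ref.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0                  # leave bins with no healthy variation unscaled
    Z = (X - mu) / sd                  # deviation from healthy, in healthy SD units
    Zc = Z - Z.mean(axis=0)            # PCA centres on the whole cohort (assumption)
    U, S, Vt = np.linalg.svd(Zc, full_matrices=False)
    var_ratio = S**2 / (S**2).sum()
    scores = U[:, :n_components] * S[:n_components]
    return scores, Vt[:n_components], var_ratio[:n_components]
```

Because the z-scores are expressed in units of healthy variation, a bin that varies a lot among healthy donors contributes less to the components than a bin that is normally tightly controlled.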
[152] We next plotted the loading coefficients of the 2 components and visualized each sample ordered by its values in PC1 and PC2 space (Figure 2C-D). We term the 7 cancers with high PC1 values “laddered” (or “hypofragmented”) samples; these included all 5 Acute Myeloid Leukemias in our study, as well as one Lung cancer and one Non-Hodgkin's Lymphoma (NHL) case. One additional Hodgkin's Lymphoma was weakly positive for the PC1 signature. We identified the bins with the highest PC1 loadings, which spanned 900-4300 base pairs (Figure 2C). The NHL sample had about 3% of all reads in this range, whereas the other six samples had 7-33% of reads in this range. However, because these reads were much longer than the remaining reads, they made up 20% of the total DNA sequenced in the NHL sample and 33-62% in the other six samples.
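One way to turn a component's loadings into a fragment-length range, as was done for the 900-4300 bp PC1 window, is sketched below. The rule used here (keep bins whose loading is at least half of the largest loading and report the span they cover) is an assumption for illustration; the text does not state how the range was delimited. PCA signs are arbitrary, so the loading vector may need to be flipped first.

```python
import numpy as np

def top_loading_range(loadings, edges, frac_of_max=0.5):
    """Length range covered by bins whose loading is >= frac_of_max * max loading.

    loadings: one value per bin (sign-oriented so the signal of interest is positive).
    edges: bin edges in bp, len(edges) == len(loadings) + 1.
    If the selected bins are not contiguous, the enclosing span is returned.
    """
    loadings = np.asarray(loadings, dtype=float)
    keep = np.flatnonzero(loadings >= frac_of_max * loadings.max())
    return edges[keep.min()], edges[keep.max() + 1]
```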
[153] The second component (PC2) separated cancer from healthy samples and was most strongly associated with fragments of length 75-145 bp and 245-295 bp (Figure 2B-D). These correspond to shortened mono-nucleosome and di-nucleosome fragments. In addition to the 7 cancers that are high for PC1, there are 17 additional cancers with positive PC2 values. These cancers had higher DNA and H3.1 nucleosome concentrations than the remaining 13 cases (Figure 2C); the 5 cancers with the highest DNA concentrations and the 5 with the highest nucleosome concentrations were all in the PC2-high group.
EXAMPLE 3: Identification of ultra-long fragments originating from genomic contamination
[154] ONT sequencing can analyze long cfDNA fragments, but some of these may represent contamination from genomic DNA released during sample processing, especially from cell lysis during freezing and thawing of samples. Our standard processing protocol includes a post-thaw high-speed spin to reduce the number of such fragments (Figure 3A). Despite this, in our initial analysis we noticed that 3 of the healthy samples had much longer fragments, from 7,500-40,000 bp (the “healthy ultralong” samples, Figure 2C-D and Figure 3B). While this pattern was rare and thus not captured by the first 4 components of our PCA, we found it interesting because the fragments were longer than the PC1 “laddered” (or “hypofragmented”) fragments observed in AML and other cancers. Reasoning that these might represent genomic contamination from cellular degradation that occurred after collection, we took advantage of aliquots of these same plasma samples that we had collected before centrifugation (Figure 3A, “Unspun” protocol; Figure 3C). As measured by both total DNA and nucleosome concentration, these 3 samples had unusually high levels before centrifugation compared to other healthy samples (Figure 3D-E, red bars). Interestingly, the post-centrifugation (“Spun”) aliquots did not have identifiably high levels relative to other samples (Figure 3D-E, black bars), suggesting that ONT sequencing might be more sensitive for identifying cellular contamination.
[155] We sequenced these 3 Unspun samples using ONT and repeated our PCA (Figure 3F). The first two components captured only about 81% of the variance, so it was necessary to include a third component, which captured another 10% (yielding PC1, PC2 and PC3). PC1 was nearly identical to PC1 of our earlier PCA, and PC3 was nearly identical to the earlier PC2, being most strongly associated with fragments of length 75-145 bp and 245-295 bp (Figure 3F, upper). PC2 was entirely new and had positive values only for the three healthy samples with ultra-long reads and their unspun counterparts (Figure 3F, lower). PC2 was most enriched for fragments in the range 7,500-53,000 bp (Figure 3G). In the three ultra-long healthy samples, these fragments made up 1-2.5% of all fragments before centrifugation (unspun) and 0.2-0.3% of all fragments after (spun). However, because these reads were much longer than the remaining reads, they comprised a much higher fraction of the total DNA sequenced: 38-48% of DNA in the unspun samples and 4-9% of DNA in the spun samples (Figure 3G). This was not detectable using bulk DNA and nucleosome assays.
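The gap between "percent of fragments" and "percent of DNA" in a length window can be checked with a short sketch. The function name `window_fractions` and the toy numbers below are hypothetical; they only illustrate how a small number of very long fragments can dominate the base-pair total.

```python
import numpy as np

def window_fractions(lengths, lo, hi):
    """Return (fraction of fragments, fraction of DNA bp) with lo <= length <= hi."""
    lengths = np.asarray(lengths, dtype=float)
    inside = (lengths >= lo) & (lengths <= hi)
    return inside.mean(), lengths[inside].sum() / lengths.sum()
```

With a hypothetical sample of 98 fragments of 170 bp and 2 fragments of 15,000 bp, the 7,500-53,000 bp window holds 2% of fragments but about 64% of the sequenced DNA.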
[156] We quantified the proportion of DNA fragments longer than 500, 1,000, or 7,500 bp before and after the high-speed spin (Figure 3H-J). At each threshold, the high-speed spin significantly decreased the fraction of long fragments.
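A minimal sketch of the per-threshold comparison for one pair of unspun and spun aliquots follows, using the three thresholds from the text. The function names are hypothetical, and the statistical test behind "significantly" is not specified in the text, so no test is reproduced here; across samples, a paired test on these per-sample fractions would be one option.

```python
import numpy as np

THRESHOLDS_BP = (500, 1000, 7500)

def long_fraction(lengths, thresholds=THRESHOLDS_BP):
    """Fraction of fragments strictly longer than each threshold."""
    lengths = np.asarray(lengths)
    return {t: float((lengths > t).mean()) for t in thresholds}

def spin_effect(unspun, spun, thresholds=THRESHOLDS_BP):
    """Per-threshold (unspun, spun) long-fragment fractions for one paired sample."""
    pre = long_fraction(unspun, thresholds)
    post = long_fraction(spun, thresholds)
    return {t: (pre[t], post[t]) for t in thresholds}
```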
[157] cfDNA fragments disproportionately start and end with the nucleotide motif CCCA and other motifs beginning with “CC”, which results from DNASE1L3 activity in circulation (Serpas et al. 2019; Chan et al. 2020). To assess whether the ultra-long fragments in our samples were the product of DNASE1L3-associated fragmentation, we analyzed CC end motif frequencies in the three outlier samples along with 4 typical healthy samples. Typical healthy volunteers had very few fragments over 1 kb to analyze, but those fragments had typically high CC end motif levels. Both the pre- and post-spin samples from the three outlier healthy volunteers had typically high CC end motif levels up to about 500 bp, which decreased in larger fragments (Figure 3K, blue and red lines). We confirmed this for the most specific motif, CCCA (Figure 3L).
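The end-motif profile by fragment length can be sketched as follows. This assumes the 5' 4-mer at each fragment end has already been extracted (each fragment contributes two ends, each read 5' to 3' on its own strand); the extraction step and the function name `end_motif_by_length` are assumptions, not a specific tool's API. Passing `prefix="CCCA"` gives the exact-motif frequency.

```python
import numpy as np

def end_motif_by_length(lengths, end_motifs, edges, prefix="CC"):
    """Fraction of fragment ends whose 5' motif starts with `prefix`, per length bin.

    lengths: one fragment length per end (a fragment contributes two ends).
    end_motifs: 5' 4-mers (e.g. 'CCCA'), in the same order as lengths.
    edges: bin edges in bp. Bins with no ends return NaN.
    """
    lengths = np.asarray(lengths)
    hits = np.array([m.upper().startswith(prefix) for m in end_motifs], dtype=float)
    idx = np.digitize(lengths, edges) - 1          # bin index; out-of-range ends fall outside 0..n-1
    out = np.full(len(edges) - 1, np.nan)
    for b in range(len(edges) - 1):
        sel = idx == b
        if sel.any():
            out[b] = hits[sel].mean()
    return out
```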
[158] Taken together, our analysis captures at least two distinct origins of cfDNA in fragments longer than 1kb. The laddered fragments in the 1-5kb range were associated specifically with certain cancers, while the much longer fragments in the 7.5-50kb range had the same size distribution as cellular contamination present before centrifugation.
EXAMPLE 4: Unsupervised clustering by fragment length identifies both hypofragmented and hyperfragmented cancer classes
[159] To examine the properties of the cancer fragmentation classes in more detail, we analyzed additional features of the Principal Component Analysis (PCA) described in EXAMPLE 3. The first sample group (PC1-high) contained seven cancers with severely hypofragmented cfDNA, which was highly elevated in the range 900-4300 bp (Figure 4A, unnormalized heatmap left, and column-normalized heatmap right). The cancers in this group had 20-66% of all cfDNA in fragments of 900-4300 bp (Figure 4A, “Frac DNA in PC1”). Interestingly, the PC1-high group included all five Acute Myeloid Leukemias in our study, as well as one Lung cancer and one Non-Hodgkin's Lymphoma (NHL) case. An additional Hodgkin’s Lymphoma (starred) was weakly positive for the PC1 signature, indicating a tendency for hematopoietic cancers to display this fragmentation profile. Furthermore, these fragments show nucleosomal phasing in cut preference, as indicated by the laddered appearance of the fragment length distribution, as well as high CC end motif levels (Figure 4B), consistent with fragmentation by DNASE1L3 in circulation rather than genomic contamination.
[160] The hypofragmented cancer group (PC1-high) was strongly distinguished from the 3 healthy samples with likely genomic contamination, which are high in PC2 (Figure 4A bottom). PC2 is defined by a size range of 7,500-53,000 bp. Consistent with our earlier analysis, the post-spin samples had much lower PC2 scores than the unspun samples (Figure 4A, PC scores), and much less total cfDNA in the 7,500-53,000 bp bin (Figure 4A “Frac DNA”, 4-9% for post-spin samples and 38-48% for pre-spin samples). Aside from these three sample groups, all other healthy volunteers and cancers had low values for PC1 and PC2 (Figure 4A, “PC scores”).
[161] The third principal component (PC3) defined a hyperfragmented set of 17 cancers (Figure 4C, “PC3-high cancers”). These had elevated cfDNA in the ranges 75-145 bp and 245-295 bp (corresponding to short mono-nucleosomes and short di-nucleosomes, respectively). Longer fragments, including those greater than 1 kb, were under-represented. Interestingly, PC3-high cancers tended to have lower frequencies of the DNASE1L3-associated CC end motifs (Figure 4C).
[162] The PCA analysis in this Example included the 3 unspun samples. To ensure that these were not influencing the clustering, we performed PCA with these samples excluded. This produced nearly identical versions of PC1 and PC3 (as the first two components), and identical grouping of all cancer samples.
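The text does not state how "nearly identical" components were judged. One common check, sketched here as an assumption, is the absolute cosine similarity between loading vectors from the two runs (absolute because PCA signs are arbitrary); values near 1 would pair the full-cohort PC1 and PC3 with the first two components of the reduced run.

```python
import numpy as np

def loading_similarity(loadings_a, loadings_b):
    """|cosine| between every pair of components from two PCA runs.

    Rows are components, columns are length bins (same bins in both runs).
    Returns a matrix of shape (n_components_a, n_components_b).
    """
    A = np.asarray(loadings_a, dtype=float)
    B = np.asarray(loadings_b, dtype=float)
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return np.abs(A @ B.T)
```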
EXAMPLE 5: Origins of short mono and dinucleosome fragments (75-145bp, 245-295 bp)
[163] The most common signature among the cancer cases was the shortening of mono- and di-nucleosomes (“PC3” in Figure 4A). While this pattern is preferentially observed in DNA derived from cancer cells (for instance, DNA carrying single nucleotide mutations or larger copy number alterations), that alone does not seem to account for the overall shift in fragment length. The ONT technology allows us to investigate this question using several different cancer-specific features.
[164] Excluding the 7 cancer cases associated with the longer “laddered” signature (PC1), there were 17 cancer cases with positive PC3 values (shown as a track in Figure 5A). Of these, 10 had at least 1.25M sequence reads within the relevant fragment length ranges (75-145 bp, 245-295 bp), and we limited our analysis to these 10 samples because fewer reads would not be sufficient for DNA methylation analysis. These were 1 Breast cancer, 2 Colorectal cancers, 2 NHL, 2 Prostate, 1 Ovarian, and 2 Pancreatic (shown with asterisks in Figure 5A). We first looked at copy number alterations (CNAs), because in silico selection of shorter fragments has been shown to increase detection of CNAs. In our analysis, the 6 samples with reliably detected CNAs showed similar tumor fractions whether we used the typical mononucleosome fragments (150-240 bp) or the PC3-associated fragment lengths (Figure 5B). We also investigated global DNA methylation changes, using common lamina-associated domains for global hypomethylation and Polycomb-Repressive Complex-associated CpG island promoters for global hypermethylation. Six of the 10 samples had noticeably elevated methylation, including 3 NHLs, 2 CRCs, and a Prostate cancer. Half of these had identical methylation levels on the short fragments, while the other half had inconsistent changes (Figure 5C).
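In silico size selection and the read-count cutoff described above can be sketched as follows. The windows and the 1.25M threshold come from the text; treating the window bounds as inclusive and the helper names are assumptions.

```python
import numpy as np

# Fragment-length windows from the text (bp, treated here as inclusive).
PC3_WINDOWS = ((75, 145), (245, 295))
MONO_WINDOW = ((150, 240),)
MIN_READS = 1_250_000  # minimum reads in the PC3 windows for methylation analysis

def select_by_length(lengths, windows):
    """Boolean mask of fragments whose length falls in any (lo, hi) window."""
    lengths = np.asarray(lengths)
    mask = np.zeros(lengths.shape, dtype=bool)
    for lo, hi in windows:
        mask |= (lengths >= lo) & (lengths <= hi)
    return mask

def enough_reads(lengths, windows=PC3_WINDOWS, min_reads=MIN_READS):
    """True if the sample has at least min_reads fragments inside the windows."""
    return int(select_by_length(lengths, windows).sum()) >= min_reads
```

Downstream analyses (CNA calling, methylation averages, deconvolution) would then be run separately on the fragments selected by each mask.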
[165] DNA methylation allows for “deconvolution” of plasma DNA into its constituent cell types using reference methylation atlases. We performed deconvolution analysis using the top 250 methylation markers for each cell type, following the methods described in Loyfer (2023) Nature 613: 355-364 and Unterman (2023) bioRxiv. For the 5 samples with detectable CNAs, we were able to identify strong signals from the correct cell of origin, compared with only 1 of the 5 samples without CNAs (Figure 5A). While our ability to accurately deconvolve cell types at this sequencing depth is limited, this provides additional evidence that the samples without CNAs had low cancer content. Importantly, the absolute fraction of the cancer cell of origin (COO) never exceeded 50%, consistent with the CNA analysis and indicating that a significant part of the elevated cfDNA is not attributable to cancer DNA.
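As a rough illustration of atlas-based deconvolution, the sketch below fits non-negative cell-type weights by least squares. It is a generic non-negative least squares (NNLS) stand-in solved by projected gradient descent, not the Loyfer or Unterman algorithms cited above, and marker selection (top 250 markers per cell type) is not shown. Rescaling the weights to sum to 1 is also an assumption.

```python
import numpy as np

def deconvolve(atlas, sample, n_iter=5000):
    """Estimate non-negative cell-type proportions from marker methylation.

    atlas: markers x cell_types reference methylation levels (0-1).
    sample: plasma methylation levels at the same markers.
    Minimises ||atlas @ w - sample||^2 subject to w >= 0 by projected
    gradient descent, then rescales w to sum to 1.
    """
    A = np.asarray(atlas, dtype=float)
    b = np.asarray(sample, dtype=float)
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1 / Lipschitz constant of the gradient
    w = np.full(A.shape[1], 1.0 / A.shape[1])
    for _ in range(n_iter):
        w = np.maximum(w - step * (A.T @ (A @ w - b)), 0.0)  # gradient step, then clip at 0
    return w / w.sum()
```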
[166] For the 6 samples where we could identify the correct cell of origin, we performed deconvolution independently on the typical mononucleosome fragments (150-240 bp) and on the PC3-associated fragments and compared the results (Figure 5D). Using the same deconvolution, we examined the fraction and composition of different blood cell types (Figure 5E). While there is a high degree of heterogeneity, there is little alteration in either the total fraction of blood cells or their composition in the PC3-associated fragments. Together, this suggests that both cancer DNA and DNA from different blood cell types are affected by a process that shortens cell-free DNA in cancer. Given the overall higher concentration of DNA and nucleosomes in the cancers without CNAs, a likely scenario is that cfDNA is inefficiently cleared in cancer, and that longer retention in the blood may expose the DNA to additional fragmentation.
EXAMPLE 6: End motif analysis of fragments
[167] DNASE1L3-associated CC motifs were also investigated in long fragments. The results shown in Figure 6 focus on the differences between the PC1 and PC2 groups. All seven samples from the PC1 class and six samples from the PC2 class were investigated. Three of the six PC2-class samples are the spun samples from the three healthy individuals with high levels of PC2 fragments; the other three are the unspun samples from the same individuals, which also display PC2 fragments. Each of these samples is aligned in the heat map of Figure 6, which plots CC end motif frequency as a function of fragment length.
[168] We observe that, in all samples, the smaller fragments, which do not correspond to the PC2 signature, have a high CC end motif frequency, indicating likely high DNASE1L3 activity. In the larger fragments associated with PC1 and PC2, the two groups diverge: PC1 samples retain a high CC end motif frequency in the longer fragments and, therefore, DNASE1L3 activity, whereas PC2 samples have a reduced CC end motif frequency, descending to the genomic background level, indicating that they are likely not the result of DNASE1L3 fragmentation. Rather, they result from genomic DNA from cells remaining in the frozen plasma that are lysed upon thawing.
[169] All three unspun samples exhibit much higher levels of these fragments than the spun samples, which aligns with the understanding that high-speed spinning can be employed to remove this type of genomic contamination. This is an important finding because the size ranges of the two groups, PC1 and PC2, partially overlap. Therefore, CC end motif analysis can be used to distinguish fragments that result from genomic contamination from cell-free DNA fragments in the blood that carry the characteristic DNASE1L3-associated CC end motifs.
EXAMPLE 7: Links between hyperfragmentation and elevated circulating chromatin levels in cancer
[170] Hyperfragmentation occurred in 9 of the 12 cancer types in our ONT cohort. While this phenomenon has been documented in cancer (Mouliere et al. 2011; Jiang et al. 2015; Mouliere et al. 2018; Van Der Pol et al. 2023), its significance remains poorly understood. We tested whether a correlation analysis based on our PCA-based ONT signature could provide additional insight.
[171] To determine whether the shorter fragments had the typical fragmentation preferences associated with DNASE1L3 activity, which lead to fragment ends beginning with CC, we investigated the end motifs of these fragments. We first excluded the 7 PC1-high hypofragmented cancers, leaving 17 PC3-high cancers and 13 PC3-low cancers. We calculated pairwise correlations between PC3 values and 6 other tumor features and used these to perform hierarchical clustering of the features, which resulted in three major clusters (Figure 7A-B).
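The feature-correlation clustering can be sketched without external clustering libraries. The text does not give the linkage method or distance, so average linkage on 1 - Pearson r is an assumption here, as are the function names. `mean_between` computes the kind of average pairwise correlation between two feature clusters that is reported in the following paragraphs.

```python
import numpy as np

def cluster_features(F, n_clusters=3):
    """Average-linkage clustering of features using distance 1 - Pearson r.

    F: samples x features matrix (e.g. PC3 score plus 6 other tumor features).
    Returns (correlation matrix, cluster label per feature).
    """
    r = np.corrcoef(np.asarray(F, dtype=float), rowvar=False)
    d = 1.0 - r
    clusters = [[i] for i in range(r.shape[0])]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = d[np.ix_(clusters[i], clusters[j])].mean()
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        merged = clusters.pop(j)          # j > i, so index i is unaffected
        clusters[i].extend(merged)
    labels = np.empty(r.shape[0], dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return r, labels

def mean_between(r, group_a, group_b):
    """Average pairwise correlation between two groups of feature indices."""
    return float(r[np.ix_(group_a, group_b)].mean())
```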
[172] The PC3 value clustered tightly with the fraction of fragments less than 150 bp, a simple but commonly used definition for short fragments. These two formed a single cluster with the DNASE1L3-associated CCCA end motif (Figure 7A-B, blue cluster and example scatterplots in Figure 7C-D). The clustering of hyperfragmentation and the CCCA end motif is significant given that both features were previously linked to DNASE1L3 activity (Serpas et al. 2019; Chan et al. 2020; Han et al. 2020).
[173] The two metrics that directly detect cancer DNA were also tightly clustered (Figure 7A-B, red cluster). However, these were not well correlated with the PC3/DNASE1L3 (blue) cluster: the average pairwise correlation between features in the red and blue clusters was only 0.17 (Figure 7B, and example scatterplots in Figure 7E-F). To specifically measure whether shortened fragments were enriched for cancer-derived DNA, we calculated CNA tumor fraction and methylation cell of origin directly in the shorter fragments associated with PC3. Of the 10 PC3-high cancers with sufficient read coverage, only 2 had measurable enrichment for cancer DNA, and neither was more than 1.5-fold enriched.
[174] H3.1 nucleosome and cfDNA concentrations also formed a tight cluster (Figure 7A-B, green cluster). In contrast to the features related to cancer fraction, these were moderately correlated with the PC3/DNASE1L3 cluster (average pairwise correlation 0.43) and, indeed, formed a supercluster with it (Figure 7A-B). While cancers with low PC3 hyperfragmentation values had variable DNA and nucleosome concentrations, those with high PC3 values almost always had elevated concentrations (Figure 7G-H). This suggests a potential link between DNASE1L3-associated hyperfragmentation and the abnormally high levels of immune-derived cfDNA observed in cancer. The correlation with nucleosome levels suggests that the analysis is measuring a specific type of inflammatory response to cancer, which could provide a prognostic marker or be predictive of treatment (e.g. immunotherapy) outcomes.
Discussion:
[175] Here, we performed a survey of cancer types using Illumina and Oxford Nanopore (ONT) sequencing, discovering new fragmentation changes that occurred across diverse cancer types.
[176] We showed that the abundance of ultra-long fragments (longer than 7,500 bp) was strongly influenced by high-speed centrifugation after thawing, and that such fragments could remain even after centrifugation in some samples. While the majority of these fragments were >7,500 bp, all fragments of 500 bp or longer were significantly influenced by high-speed centrifugation.
[177] The ultra-long fragments we identified showed a dramatic loss of DNASE1L3-associated end motifs, which start with the nucleotide sequence CC. We also found an order-of-magnitude difference in the frequency of ultra-long reads before and after high-speed centrifugation, demonstrating that longer reads are at least sometimes a consequence of pre-analytical variables. This suggests that the abundance of such reads can be used for pre-analytical quality control and assurance, and that such reads can be bioinformatically subtracted when they do occur.
[178] We were able to clearly distinguish another class of long fragments, which was cancer-specific and included almost 20% of the cancer samples in our study. These fragments could be distinguished by a shorter length distribution (peaking at 900-4300 bp, or 5-25 nucleosomes) and had end motif patterns consistent with typical DNASE1L3 fragmentation. The length distribution observed in these cancers is broadly consistent with that of Neutrophil Extracellular Traps (NETs) exposed to circulating nucleases (Pisareva et al. 2022), suggesting they might derive from NETs fragmented in circulation. NETs have been associated with myeloid hyperproliferation (Wolach et al. 2018) and linked to accumulation of total circulating DNA in cancer (Pastor et al. 2022; Thierry and Pisareva 2023). Alternatively, laddered fragments in this size range can be released by necrotic cells (lingerer et al. 2021). It is also possible that active DNASE1L3 during blood handling could fragment longer DNA from either necrotic or lysed cells.
[179] Our finding that hyperfragmentation was poorly correlated with cancer DNA fraction is somewhat paradoxical in light of earlier work documenting the preferential shortening of cancer DNA relative to non-cancer DNA using xenograft analysis (Mouliere et al. 2011, 2018; Van Der Pol et al. 2023; Moldovan et al. 2024), copy number analysis (Jiang et al. 2015; Mouliere et al. 2018) and mutational analysis (Mouliere et al. 2011, 2018; Widman et al. 2022; Wan et al. 2020). Despite this statistical association, there is a high degree of variability, and many individual cancer cases show no preferential shortening of cancer DNA (Mouliere et al. 2018). Furthermore, a systematic analysis of mutated fragments in cancer shows that fragment shortening is a weak predictor of cancer origin, with an AUROC of 0.51-0.61 across different solid cancer types (Widman et al. 2022). Indeed, in our own analysis of hyperfragmented solid tumors, only 20% of cases showed a substantial preference for cancer DNA in short fragments, and the enrichment was modest.
[180] In contrast, we did find a positive association between the degree of hyperfragmentation and total cfDNA and nucleosome levels. This suggests that hyperfragmentation is linked to a systemic process that results in the accumulation of both cancer- and blood-derived cfDNA and that has been linked to loss of DNASE1L3 activity (Serpas et al. 2019; Chan et al. 2020; Han et al. 2020; Han and Lo 2021). The preferential shortening of cancer-derived DNA in some individual cancers is likely related to cancer-associated changes in nucleosome density and/or DNA methylation in long-range chromatin domains (Cristiano et al. 2019), both of which can influence DNASE1L3 activity (Han et al. 2020). The events that result in hyperfragmentation and accumulation of total cfDNA in cancer are unclear. In mice, loss of DNASE1L3 activity leads to fragment shortening, increased anti-DNA antibodies, and a progressive systemic lupus erythematosus (SLE)-like disease (Sisirak et al. 2016; Serpas et al. 2019). Likewise, loss-of-function variants in the DNASE1L3 gene in humans are linked to fragment shortening and SLE (Chan et al. 2014, 2020). Taken together, these findings suggest that the hyperfragmentation phenotype and elevated cfDNA levels in cancer may both result from an imbalance between DNASE1L3 activity and cfDNA clearance that leads to an immune response.
EXAMPLE 8: Cancer-specific DNA methylation patterns in client-owned dogs
Methods:
[181] All methods were the same for canine samples as described for the human examples above, with the following exceptions. Pre-treatment blood samples were collected from healthy dogs (n=7) and dogs diagnosed with a variety of cancers (n=14) with informed owner consent. For this “ultra-shallow” sequencing cohort, DNA isolation and sequencing were performed as described above for human samples, with a median genomic coverage of 0.05x. For an additional group of high-grade lymphoma cases, samples were collected both before (treatment naive) and 24 hours after chemotherapy (n=15), as well as during follow-up chemotherapy visits when patients were in remission (n=5). DNA was isolated from EDTA plasma using the Qiagen QIAamp kit and quantified using Qubit. Nucleosome concentrations were quantified using the Nu.Q® H3.1 ELISA assay. DNA libraries were prepared and sequenced using the Oxford Nanopore Technologies Native Barcoding Kit v. 14, PromethION R10.4.1 flow cells, and the P2 Solo sequencer. Base and modification calling were performed using the ONT Dorado basecaller v. 0.4.1 with the dna_r10.4.1_e8.2_400bps_hac@v4.2.0_5mCG_5hmCG modification calling model. DNA modifications were extracted using Oxford Nanopore modkit 0.1.13. CpG islands for canFam5 were taken from the UCSC genome browser. The commonly hypermethylated human CpG island promoters described above were mapped to orthologous gene promoters in the canFam5 genome using orthologous gene mappings from the Zoonomia TOGA dataset (Kirilenko, 2023). This set of promoters was then filtered for those where the promoter (transcription start site) overlapped a CpG island in the canFam5 UCSC CpG island track. Finally, we removed CpG island promoters longer than 20 kb in the canFam5 genome, since upon manual inspection most reflected canFam5 genome assembly errors within repetitive telomeric regions. The canine CpG island methylation signature was the mean over the resulting 1,272 CpG islands.
For the canine global methylation signature, we used the mean methylation at all CpGs in the genome, after removing those contained within CpG islands defined in the canFam5 UCSC CpG island track. Principal Component Analysis (PCA) was performed as described above for humans. The first principal component (PC1) corresponded to short fragments (less than 150 bp) in all PCA analyses, and we thus termed this the “hyperfragmentation” component.
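The global methylation signature (all CpGs outside CpG islands) can be sketched for a single chromosome as follows. The input layout (a dict of per-CpG methylated and total read counts, and sorted non-overlapping half-open CGI intervals) and pooling reads across CpGs rather than averaging per-CpG fractions are assumptions for illustration; at 0.05x coverage most CpGs have at most one read, so the two choices behave similarly.

```python
import bisect

def outside_intervals(positions, intervals):
    """Keep positions not inside any half-open [start, end) interval.

    intervals must be sorted by start and non-overlapping (one chromosome).
    """
    starts = [s for s, _ in intervals]
    keep = []
    for p in positions:
        i = bisect.bisect_right(starts, p) - 1   # last interval starting at or before p
        if i < 0 or p >= intervals[i][1]:
            keep.append(p)
    return keep

def global_methylation(calls, cgi_intervals):
    """Pooled methylation over CpGs outside CGIs.

    calls: {cpg_position: (n_methylated_reads, n_total_reads)}.
    """
    keep = outside_intervals(sorted(calls), cgi_intervals)
    meth = sum(calls[p][0] for p in keep)
    total = sum(calls[p][1] for p in keep)
    return meth / total if total else float("nan")
```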
Results:
[182] We adapted multiple cancer-specific methylation signatures described in humans to canine methylation data. First, we used the “ultra-shallow” ONT sequencing cohort (median 0.05x genomic coverage, using MinION flow cells) to assess global whole-genome DNA methylation (WGM) in 7 healthy dogs and 14 dogs with a variety of cancer diagnoses, which demonstrated global hypomethylation across cancer types when compared to healthy controls (Figure 8A).
[183] We used “shallow” whole-genome sequencing (median 1-2x coverage, using PromethION flow cells) to profile a subset of dogs recently diagnosed with lymphoma that had plasma samples collected at diagnosis (treatment naive) and again 24 hours after the first chemotherapy treatment. While in previous studies post-treatment samples were taken later and H3.1 nucleosomes have generally been found to decrease, these data show that H3.1 nucleosome levels increase at 24 hours post-treatment (Figure 8B-D). Global DNA methylation levels from ONT sequencing were mostly low at diagnosis (indicating high cancer content), as shown in Figure 8E. While these showed almost no change post-chemotherapy, they were often increased at the time of clinical remission (Figure 8E). CpG island (CGI) promoters that commonly gain DNA methylation in human cancers had significantly lower methylation at remission (Figure 8F), suggesting a high degree of concordance between the general DNA methylation changes occurring in humans and dogs. Abnormally short fragments (less than 150 bp), which have also been associated with human cancers, were lower at remission but showed inconsistent changes post-chemotherapy (Figure 8G). The proportion of DNA sequenced in fragment length bins on a logarithmic scale from 50 bp to 10,000 bp shows a global trend of fragment shortening at the post-chemotherapy timepoints relative to pre-chemotherapy and remission (Figure 8H).
Discussion:
[184] In humans, global hypomethylation is a hallmark of cancer. The same has been shown in canines (Scattone et al. 2021, Xavier et al. 2020). For this pilot study, plasma samples of dogs with cancer were compared to those of healthy dogs. Global hypomethylation was present in nearly all cases regardless of cancer histology (Figure 8A). A subset of dogs diagnosed with lymphoma had serial plasma samples prospectively collected for evaluation. Plasma nucleosome concentrations were elevated in 12 of the 15 dogs at the time of diagnosis (Figure 8B-D). Another sample was collected 24 hours after the first dose of chemotherapy and all but 3 dogs demonstrated an increase in plasma nucleosome concentrations at that time.
[185] Using ONT sequencing of a subset of 5 dogs, we found that two cancer-specific DNA methylation features showed little change 24 hours post-chemotherapy (Figure 8E-F). While fragments less than 150 bp were often elevated (Figure 8G), examination of the whole spectrum of fragment lengths showed a reduction of longer fragments in all cases (Figure 8H). In humans, these shortened fragments are not always associated with cancer DNA (Mouliere et al. 2018), and patients with autoimmune diseases also show an increase in these shortened fragments (Chan et al. 2020). The shortening of fragments without cancer-associated DNA methylation changes suggests that the increase in total nucleosomes at the 24-hour time point may be partially capturing an immune response or another systemic change in cfDNA processing. This parallels the associations seen in human cancers (e.g. see Example 7), where PC3-associated fragment shortening was accompanied by elevated total nucleosomes but not always by increased cancer-derived DNA. By the time of remission, both the DNA methylation features and the fragment lengths showed a marked change and resembled those of healthy dogs (Figures 8E-H). One case, XB, did not have clinically detectable elevations in H3.1 plasma nucleosome concentrations at the time of diagnosis; however, this case did have fragmentation and DNA methylation patterns consistent with cancer, suggesting that ONT sequencing may provide a more sensitive detection tool (Figures 8E-H). Therefore, DNA methylation-based methods used to detect the presence and nature of circulating cancer DNA in humans can similarly be applied to dogs to improve cancer detection in canine populations.
EXAMPLE 9: Hyperfragmentation is a strong predictor of healthy vs. cancer status and inflammation vs. cancer status in client-owned dogs
Methods:
[186] We include the “ultra-shallow” healthy vs. cancer cohort described in Example 8, as well as an expanded set of 13 pre-treatment samples from the “shallow” lymphoma cohort described there. For an additional group of hemangiosarcoma cases (n=5), blood samples were collected at diagnosis (treatment naive) and when patients were in clinical remission. For a group of non-cancer inflammation cases (Pyometra, n=8), blood samples were collected during active inflammation. For all except the “ultra-shallow” cases, we used the “shallow” sequencing protocol of 1-2x genomic coverage. For all group-to-group comparisons, we used Fisher’s LSD test to calculate p-values. For classification, we used Multiple Logistic Regression (MLR), performed without regularization.
[187] For CpG island methylation, we used the original set of CpG islands described in Example 8 for analysis of all samples from the “shallow” cohorts. For the “ultra-shallow” cohort, we refined this list by identifying the subset of CGIs that were completely unmethylated in independent control samples (taken from the “Pyometra” set). First, we removed 1 of the 8 Pyometra samples, which had lower genomic coverage than the rest. Then, for each of the 290,636 individual CpG dinucleotides covered by a CGI in the original set, we summed the number of reads across the remaining 7 Pyometra samples and removed those with fewer than 10 reads in total, resulting in 285,753 CpGs with sufficient coverage. Next, we retained only those CpGs with fewer than 5% of these reads methylated, resulting in a set of 212,078 nearly completely unmethylated CpGs. We merged all resulting CpGs within 50 bp of each other into sub-island regions, resulting in 5,404 sub-islands. We then removed all sub-islands that were less than 100 bp in length or contained fewer than 10 individual CpGs, resulting in 2,433 sub-islands covering 201,045 individual CpGs. These CpGs were used to calculate methylation averages only for the “ultra-shallow” cohort; it was not appropriate to use them for comparisons between Pyometra and other “shallow” samples.
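The sub-island filtering steps above can be sketched for one chromosome as follows. The thresholds come from the text; the input format (sorted tuples of position, methylated reads and total reads summed over the control samples), measuring the merge gap between CpG start positions, and taking region length from the first C to the last G (each CpG spanning 2 bp) are assumptions.

```python
def build_sub_islands(cpgs, min_reads=10, max_meth=0.05, merge_gap=50,
                      min_len=100, min_cpgs=10):
    """Build unmethylated sub-islands from control-sample CpG counts.

    cpgs: sorted list of (position, n_methylated, n_reads) on one chromosome,
    with reads summed over the control samples.
    Returns a list of (start, end, [cpg positions]) with half-open [start, end).
    """
    # 1-2. Keep well-covered CpGs that are nearly completely unmethylated.
    kept = [p for p, m, n in cpgs if n >= min_reads and m / n < max_meth]
    # 3. Merge CpGs lying within merge_gap bp of the previous kept CpG.
    regions = []
    for p in kept:
        if regions and p - regions[-1][-1] <= merge_gap:
            regions[-1].append(p)
        else:
            regions.append([p])
    # 4. Drop short regions and regions with too few CpGs.
    out = []
    for r in regions:
        start, end = r[0], r[-1] + 2     # a CpG dinucleotide spans 2 bp
        if end - start >= min_len and len(r) >= min_cpgs:
            out.append((start, end, r))
    return out
```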
Results:
[188] We performed PCA as described above, using the log-transformed bin frequencies from all 21 “ultra-shallow” healthy and cancer samples. Values of the first principal component (PC1) were positively correlated with the fraction of fragments shorter than 150 bp, and we therefore termed this the “hyperfragmentation” component. This component almost perfectly separated healthy from cancer samples (Figure 9A), with an Area Under the Curve (AUC) of 0.98 calculated from a Receiver Operating Characteristic (ROC) curve. To compare this with bona fide markers of cancer DNA, we investigated global methylation (Figure 9B), tumor fraction from ichorCNA analysis (Figure 9C), and cancer-associated CpG island methylation (Figure 9D). These cancer features had AUC values of 0.81-0.91, indicating slightly poorer classification. We used multivariable logistic regression (MLR) combining the three bona fide cancer DNA markers to classify healthy vs. cancer. This gave an AUC of 0.92, or 0.84 when robustness was assessed using Leave-One-Out Cross Validation (LOOCV) (Figure 9E). In this analysis, hyperfragmentation performed as well as or better than the bona fide cancer DNA markers, either alone or in combination.
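The hyperfragmentation score in [188] is the first principal component of log-transformed length-bin frequencies, evaluated by ROC AUC. The sketch below is a stdlib-only illustration, not the authors' pipeline:

- PCA is done by power iteration on the covariance matrix.
- The log pseudocount is an assumed value.
- The arbitrary PC sign is fixed so that PC1 correlates positively with a supplied reference (for example, the fraction of fragments under 150 bp), following how the text characterizes PC1.
- AUC uses the Mann-Whitney formulation, which equals the area under the empirical ROC curve.

```python
import math
import random


def log_transform(freqs, pseudocount=1e-6):
    """Log-transform per-sample bin frequencies (pseudocount is an assumption)."""
    return [[math.log(f + pseudocount) for f in row] for row in freqs]


def first_pc_scores(X, n_iter=500, seed=0):
    """Project mean-centered samples onto the leading covariance eigenvector."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(p)]
           for a in range(p)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]
    for _ in range(n_iter):  # power iteration
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(p)) for r in Xc]


def orient(scores, reference):
    """Flip the arbitrary PC sign so scores correlate positively with reference."""
    ms = sum(scores) / len(scores)
    mr = sum(reference) / len(reference)
    cov = sum((s - ms) * (r - mr) for s, r in zip(scores, reference))
    return scores if cov >= 0 else [-s for s in scores]


def roc_auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

In practice a library PCA would replace `first_pc_scores`. The sign-orientation step is still needed, because eigenvector signs are arbitrary.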
[189] Inflammation is often associated with increased levels of circulating nucleosomes and DNA in humans and dogs. This includes the canine inflammatory condition Pyometra, a uterine infection that is most commonly associated with dogs but has also been identified in other animals, such as cattle, swine, cats and many rodents. To assess the value of cfDNA hyperfragmentation in distinguishing inflammation from cancer, we sequenced 8 inflammatory Pyometra samples with high circulating nucleosomes (“INF”), along with 13 treatment-naive lymphoma samples (“LSA”) and 5 treatment-naive hemangiosarcoma samples (“HSA”). We first combined all 26 samples in a single PCA as described above. Values of the first principal component (PC1) were again positively correlated with the fraction of fragments shorter than 150 bp, and we therefore termed this the “hyperfragmentation” component. Hyperfragmentation was a strong predictor of inflammation vs. cancer (AUC=0.85, Figure 10A). To compare this with bona fide markers of cancer DNA, we investigated global methylation (Figure 10B), tumor fraction from ichorCNA analysis (Figure 10C), and CpG island methylation (CGI, Figure 10D). These cancer features had AUC values of 0.70-0.88, indicating poorer or comparable classification. We used multivariable logistic regression (MLR) combining the three bona fide cancer DNA markers to classify inflammation vs. cancer. This gave an AUC of 0.94, or 0.83 when robustness was assessed using LOOCV (Figure 10E). In this analysis, hyperfragmentation performed comparably to the bona fide cancer DNA markers, either alone or in combination.
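The MLR results in [188] and [189] combine three markers in an unregularized logistic model and check robustness with LOOCV. The source does not state how the model was fit. The sketch below therefore makes two assumptions:

- the model is fit by batch gradient ascent with a fixed learning rate and iteration count;
- features are z-scored using only each training fold.

An unpenalized fit on perfectly separable data has no finite optimum, so here the fixed iteration count is what keeps the weights bounded.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Unregularized logistic regression by batch gradient ascent (w[0] = intercept)."""
    p = len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi)))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict(w, x):
    return sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], x)))


def standardizer(rows):
    """Return a function that z-scores a row using the means/SDs of `rows`."""
    p = len(rows[0])
    mu = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
    sd = [math.sqrt(sum((r[j] - mu[j]) ** 2 for r in rows) / len(rows)) or 1.0
          for j in range(p)]
    return lambda row: [(v - m) / s for v, m, s in zip(row, mu, sd)]


def loocv_probabilities(X, y, **fit_kwargs):
    """Held-out probability for each sample, fitting on all other samples."""
    probs = []
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        scale = standardizer(train_X)
        w = fit_logistic([scale(r) for r in train_X], train_y, **fit_kwargs)
        probs.append(predict(w, scale(X[i])))
    return probs


def roc_auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic (ties count as 0.5)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

Each row of `X` would hold one sample's three marker values (global methylation, ichorCNA tumor fraction, CGI methylation). `roc_auc(loocv_probabilities(X, y), y)` then gives the cross-validated AUC.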
EXAMPLE 10: Using Hyperfragmentation and cancer DNA markers to monitor dogs during and after treatment
Methods:
[190] For fragment-level methylation analysis of CpG islands, we used only the 2,433 sub-islands that were completely unmethylated in Pyometra samples, described in Example 9. Cancer fragments were defined as those with 40% or higher average methylation, and non-cancer fragments were defined as those with less than 15% average methylation.
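The fragment labels in [190] reduce to two thresholds on a fragment's average CpG methylation. The sketch below assumes each fragment is given as its length plus a list of 0/1 methylation calls at CpGs inside the 2,433 sub-islands. Fragments between 15% and 40% are left unassigned; this is our reading, since the text defines only the two classes.

The `short_fraction_by_class` helper uses the <150 bp cutoff from the description of PC1 as a simple stand-in for the per-class comparison in [192]. It is an illustrative assumption, not the metric behind Figure 12D.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

CANCER_MIN_METH = 0.40      # >= 40% average methylation -> "cancer" fragment
NON_CANCER_MAX_METH = 0.15  # < 15% average methylation -> "non-cancer" fragment


def classify_fragment(cpg_calls: Sequence[int]) -> Optional[str]:
    """Classify one fragment from its 0/1 CpG calls within the sub-islands."""
    if not cpg_calls:
        return None  # fragment overlaps no sub-island CpG
    avg = sum(cpg_calls) / len(cpg_calls)
    if avg >= CANCER_MIN_METH:
        return "cancer"
    if avg < NON_CANCER_MAX_METH:
        return "non-cancer"
    return None  # 15-40% average methylation: left unassigned


def short_fraction_by_class(fragments: Iterable[Tuple[int, Sequence[int]]],
                            cutoff: int = 150) -> Dict[str, float]:
    """Fraction of fragments shorter than `cutoff` bp in each inferred class."""
    lengths: Dict[str, List[int]] = {"cancer": [], "non-cancer": []}
    for length, calls in fragments:
        label = classify_fragment(calls)
        if label is not None:
            lengths[label].append(length)
    return {k: sum(n < cutoff for n in v) / len(v) for k, v in lengths.items() if v}
```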
Results:
[191] A subset of 5 of the dogs diagnosed with lymphoma (LSA) and 5 of the dogs diagnosed with hemangiosarcoma (HSA) had plasma samples collected before treatment and again at the time of clinical remission. We measured the proportion of DNA sequenced in fragment length bins on a logarithmic scale from 50 bp to 10,000 bp. These profiles showed a consistent trend of shorter fragments before treatment and longer fragments at remission (Figure 11A). Correspondingly, the hyperfragmentation component (PC1) decreased during remission in all 10 cases across both cancer types (Figure 11B). We compared this with bona fide cancer DNA markers (global methylation, CpG island methylation, and tumor fraction from ichorCNA analysis; Figure 11C), none of which differentiated pre-treatment from remission samples in every case.
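The length profiles in [191] summarize fragments in logarithmically spaced bins between 50 bp and 10,000 bp. The sketch below assumes 40 bins; the text does not give the bin count. Each fragment is weighted by its length, so values are proportions of sequenced DNA. Passing `weight_by_bases=False` gives proportions of fragments instead.

```python
import bisect
from typing import Iterable, List


def log_bin_edges(lo: float = 50, hi: float = 10_000, n_bins: int = 40) -> List[float]:
    """Geometrically spaced bin edges from lo to hi bp (n_bins is an assumption)."""
    ratio = (hi / lo) ** (1.0 / n_bins)
    edges = [lo * ratio ** i for i in range(n_bins + 1)]
    edges[-1] = hi  # avoid floating-point drift at the upper edge
    return edges


def length_bin_fractions(lengths: Iterable[int], edges: List[float],
                         weight_by_bases: bool = True) -> List[float]:
    """Proportion of sequenced DNA (or of fragments) falling in each bin."""
    totals = [0.0] * (len(edges) - 1)
    for length in lengths:
        if length < edges[0] or length > edges[-1]:
            continue  # outside the 50 bp - 10 kb window
        i = min(bisect.bisect_right(edges, length) - 1, len(totals) - 1)
        totals[i] += length if weight_by_bases else 1
    total = sum(totals)
    return [t / total for t in totals] if total else totals
```

Comparing the pre-treatment and remission vectors bin by bin shows the shift toward longer fragments described above. Their log-transformed values are the kind of input used for the PCA.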
[192] A subset of the dogs recently diagnosed with lymphoma additionally had plasma samples collected 24 hours after the first chemotherapy treatment. Previous studies have shown that levels of H3.1 nucleosomes generally decrease after treatment, but those studies focused on later time points. In our study, the 24-hour time point revealed an increase in H3.1 nucleosomes (Figure 12A, left). Based on the hyperfragmentation component (PC1), the degree of hyperfragmentation was also higher post-therapy (Figure 12A, right). In contrast, bona fide cancer DNA markers were typically unchanged or inconsistently changed post-therapy (Figure 12B). These markers included CpG island promoter methylation and tumor fraction based on ichorCNA. We then divided the reads at CpG island promoters into inferred “cancer fragments” and “non-cancer fragments” based on DNA methylation (illustrated in Figure 12C). Cancer and non-cancer fragments were equally shortened post-therapy (Figure 12D). This suggests that fragment shortening may result from a systemic response to chemotherapy that affects all circulating DNA.
Discussion:
[193] Hyperfragmentation of circulating cell-free DNA has previously been associated with human cancer, and we have used a clustering-based hyperfragmentation signature derived from Nanopore sequencing to analyze human cancers. Here, we show that cfDNA hyperfragmentation is informative for monitoring canine cancers. In particular, a high degree of hyperfragmentation was observed in pre-treatment cancer cases, and these levels decreased in the same patients in clinical remission (Figure 11A-B). Other bona fide cancer DNA markers such as CpG island methylation also changed during remission (Figure 11C), but these changes were less consistent.
[194] To better understand when circulating DNA signatures begin to change after treatment, we investigated 5 lymphoma cases 24 hours after chemotherapy. Interestingly, at this time point hyperfragmentation consistently increased, in contrast to the decrease observed later during remission in these same dogs (Figure 12A, right). This 24-hour increase in hyperfragmentation corresponded to elevated levels of circulating H3.1 nucleosomes (Figure 12A, left). Bona fide cancer DNA markers, by contrast, generally decreased or remained unchanged during this period (Figure 12B). Taken together, these results suggested that hyperfragmentation was not restricted to cancer DNA. To investigate this, individual fragments within cancer-associated CpG islands were assigned cancer or non-cancer status, and the two groups showed essentially the same degree of hyperfragmentation (Figures 12C-D). This suggests that a systemic cfDNA response to chemotherapy within the first 24 hours may exacerbate the hyperfragmentation of all DNA. Monitoring for a reduction in cancer DNA should therefore begin at a later time point, when a reduction in hyperfragmentation and other cancer DNA markers can be observed.
EXAMPLE 11: Nanopore sequencing of clinical samples
Methods:
[195] DNA was isolated from EDTA plasma using the MAGicBead™ cfDNA kit and quantified using Qubit. DNA libraries were prepared and sequenced to a “shallow” depth of 1-2x genomic coverage, using the Oxford Nanopore Technologies Native Barcoding Kit V14, PromethION R10.4.1 flow cells, and the P2 Solo sequencer. Base and modification calling were performed using the ONT Dorado basecaller v0.4.1 with the dna_r10.4.1_e8.2_400bps_hac@v4.2.0_5mCG_5hmCG modification calling model. DNA modifications were extracted using Oxford Nanopore modkit v0.1.13. CpG islands for canFam5 were taken from the UCSC Genome Browser.
[196] We used a version of the CelFiE-ISH package (v0.0.2, https://gitlab.com/methylgrammarlab/deconvolution_models), modified to take Minimap BAM files as input, with the following command:
deconvolution --model uxm --min_length 2 --u_threshold 0.25 --modbam_qual 0.9 --percent_u U_atlas.hg38.tsv -m input.bam -i U250.hg38.tsv -b --cpg_coordinates hg38.analysisSet.CGmotifs.modkit.merged.0based.bed.gz --epiformat modbam --outfile output.txt
The “--modbam_qual 0.9” setting filtered out any modified-base call with a modification probability score below 0.9. We used the 31 cell types from the WGBS atlas (Loyfer et al., 2023), as described in the CelFiE-ISH paper (bioRxiv DOI: 10.1101/2023.08.20.554012).
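The `--modbam_qual 0.9` filter in [196] discards low-confidence modification calls before any per-read summary. The sketch below illustrates that idea on a hypothetical in-memory record (`ModCall`, with an assumed `call_prob` field). It is not modkit's output format or CelFiE-ISH's internal code. Calls at exactly 0.9 are kept, since only scores below 0.9 are filtered.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class ModCall(NamedTuple):
    read_id: str
    ref_pos: int
    call_prob: float  # probability score of the call (hypothetical field name)
    methylated: bool


def per_read_methylation(calls: Iterable[ModCall], min_prob: float = 0.9) -> Dict[str, float]:
    """Fraction of confidently called CpGs that are methylated, per read."""
    counts = defaultdict(lambda: [0, 0])  # read_id -> [methylated, total]
    for call in calls:
        if call.call_prob < min_prob:
            continue  # analogous to the --modbam_qual 0.9 filter
        counts[call.read_id][1] += 1
        counts[call.read_id][0] += int(call.methylated)
    return {read: m / t for read, (m, t) in counts.items()}
```

Reads whose calls are all below the threshold drop out entirely rather than contributing a 0% value.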
Results:
[197] Sepsis has been associated with high levels of circulating DNA and nucleosomes. We sequenced cfDNA from the plasma of three groups:

- 25 control subjects with low H3.1 nucleosomes;
- 25 suspected sepsis cases that were ultimately determined to be sepsis-negative;
- 10 sepsis cases with elevated H3.1.

Fragment length profiles showed that the control and sepsis-negative subjects had relatively short fragments (Figure 13A). The exception was 3 control subjects that had ultra-long fragments characteristic of cellular contamination. In contrast, most sepsis patients had abundant fragments up to 2 kb in length. To quantify this, we calculated the percentage of fragments within the 900-4,300 bp range previously identified as hypofragmented. Most of the sepsis samples had greater than 10% of fragments within this hypofragmented range, whereas no control sample exceeded 10% (Figure 13B). Using DNA cell-of-origin methylation analysis, we estimated the fraction of neutrophil-derived DNA in each sample. This showed that the sepsis samples were significantly enriched for neutrophil-derived DNA (Figure 13C).
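The sepsis comparison in [197] uses the percentage of fragments in the 900-4,300 bp hypofragmented range. The sketch below assumes inclusive bounds and uses fragment counts (not bases) as the denominator. The 10% value reflects the separation observed in Figure 13B. It is used here only as an illustrative cutoff, not as a validated decision rule.

```python
from typing import Sequence

HYPO_LO, HYPO_HI = 900, 4_300  # "hypofragmented" length range in bp


def hypofragmented_percent(lengths: Sequence[int], lo: int = HYPO_LO, hi: int = HYPO_HI) -> float:
    """Percentage of fragments whose length falls within [lo, hi] bp (inclusive)."""
    if not lengths:
        raise ValueError("no fragments")
    in_range = sum(1 for length in lengths if lo <= length <= hi)
    return 100.0 * in_range / len(lengths)


def above_cutoff(lengths: Sequence[int], cutoff_pct: float = 10.0) -> bool:
    """True when more than cutoff_pct percent of fragments are hypofragmented."""
    return hypofragmented_percent(lengths) > cutoff_pct
```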
Discussion:
[198] Neutrophil Extracellular Traps (NETs) have been implicated as a contributor to poor outcomes in sepsis. The length distribution of fragments observed in these sepsis samples is broadly consistent with that of NETs exposed to circulating nucleases (Pisareva et al. 2022). This suggests that these fragments might derive from NETs fragmented in circulation.
REFERENCES
Adalsteinsson et al. 2017. Scalable whole-exome sequencing of cell-free DNA reveals high concordance with metastatic tumors. Nat Commun 8: 1324.
Alcaide et al. 2020. Evaluating the quantity, quality and size distribution of cell-free DNA by multiplex droplet digital PCR. Sci Rep 10: 12564.
Baca et al. 2023. Liquid biopsy epigenomic profiling for cancer subtyping. Nat Med 29: 2737-2741.
Bauden et al. 2015. Circulating nucleosomes as epigenetic biomarkers in pancreatic cancer. Clin Epigenetics 7: 106.
Budhraja et al. 2023. Genome-wide analysis of aberrant position and sequence of plasma DNA fragment ends in patients with cancer. Sci Transl Med 15: eabm6863.
Chan et al. 2014. Plasma DNA aberrations in systemic lupus erythematosus revealed by genomic and methylomic sequencing. Proc Natl Acad Sci 111.
Chan et al. 2020. Plasma DNA Profile Associated with DNASE1L3 Gene Mutations: Clinical Observations, Relationships to Nuclease Substrate Preference, and In Vivo Correction. Am J Hum Genet 107: 882-894.
Che et al. 2024. Genomic origin, fragmentomics, and transcriptional properties of long cell-free DNA molecules in human plasma. Genome Res 34: 189-200.
Choy et al. 2022. Single-Molecule Sequencing Enables Long Cell-Free DNA Detection and Direct Methylation Analysis for Cancer Patients. Clin Chem 68: 1151-1163.
Cohen et al. 2023. Practical recommendations for using ctDNA in clinical decision making. Nature 619: 259-268.
Cristiano et al. 2019. Genome-wide cell-free DNA fragmentation in patients with cancer. Nature 570: 385-389.
Dor Y, Cedar H. 2018. Principles of DNA methylation and their implications for biology and medicine. The Lancet 392: 777-786.
Fedyuk et al. 2023. Multiplexed, single-molecule, epigenetic analysis of plasma-isolated nucleosomes for cancer diagnostics. Nat Biotechnol 41: 212-221.
Fitzgerald et al. 2022. The future of early cancer detection. Nat Med 28: 666-677.
Han DSC, Lo YMD. 2021. The Nexus of cfDNA and Nuclease Biology. Trends Genet 37: 758-770.
Han et al. 2020. The Biology of Cell-free DNA Fragmentation and the Roles of DNASE1, DNASE1L3, and DFFB. Am J Hum Genet 106: 202-214.
Hasenleithner SO, Speicher MR. 2022. A clinician’s handbook for using ctDNA throughout the patient journey. Mol Cancer 21: 81.
Jiang et al. 2015. Lengthening and shortening of plasma DNA in hepatocellular carcinoma patients. Proc Natl Acad Sci 112.
Katsman et al. 2022. Detecting cell-of-origin and cancer-specific methylation features of cell-free DNA from Nanopore sequencing. Genome Biol 23: 158.
Kirilenko et al. 2023. Integrating gene annotation with orthology inference at scale. Science 380: eabn3107.
Lo et al. 2021. Epigenetics, fragmentomics, and topology of cell-free DNA in liquid biopsies. Science 372: eaaw3616.
Loyfer et al. 2023. A DNA methylation atlas of normal human cell types. Nature 613: 355-364.
Markus et al. 2018. Evaluation of pre-analytical factors affecting plasma DNA analysis. Sci Rep 8: 7375.
Martignano et al. 2021. Nanopore sequencing from liquid biopsy: analysis of copy number variations from cell-free DNA of lung cancer patients. Mol Cancer 20: 32.
Mattox et al. 2023. The Origin of Highly Elevated Cell-Free DNA in Healthy Individuals and Patients with Pancreatic, Colorectal, Lung, or Ovarian Cancer. Cancer Discov 13: 2166-2179.
Moldovan et al. 2024. Multi-modal cell-free DNA genomic and fragmentomic patterns enhance cancer survival and recurrence analysis. Cell Rep Med 5: 101349.
Mouliere et al. 2018. Enhanced detection of circulating tumor DNA by fragment size analysis. Sci Transl Med 10: eaat4921.
Mouliere et al. 2011. High Fragmentation Characterizes Tumour-Derived Circulating DNA. PLoS ONE 6: e23418.
Pastor et al. 2022. Association of neutrophil extracellular traps with the production of circulating DNA in patients with colorectal cancer. iScience 25: 103826.
Pisareva et al. 2022. Neutrophil extracellular traps have auto-catabolic activity and produce mononucleosome-associated circulating DNA. Genome Med 14: 135.
Sadeh et al. 2021. ChIP-seq of plasma cell-free nucleosomes identifies gene expression programs of the cells of origin. Nat Biotechnol 39: 586-598.
Scattone et al. 2020. Quantification of Global DNA Methylation in Canine Melanotic and Amelanotic Oral Mucosal Melanomas and Peripheral Blood Leukocytes From the Same Patients With OMM: First Study. Front Vet Sci 8:680181.
Serpas et al. 2019. Dnase1l3 deletion causes aberrations in length and end-motif frequencies in plasma DNA. Proc Natl Acad Sci 116: 641-649.
Sisirak et al. 2016. Digestion of Chromatin in Apoptotic Cell Microparticles Prevents Autoimmunity. Cell 166: 88-101.
Snyder et al. 2016. Cell-free DNA Comprises an In Vivo Nucleosome Footprint that Informs Its Tissues-Of-Origin. Cell 164: 57-68.
Thierry AR. 2023. Circulating DNA fragmentomics and cancer screening. Cell Genomics 3: 100242.
Thierry AR, Pisareva E. 2023. A New Paradigm of the Origins of Circulating DNA in Patients with Cancer. Cancer Discov 13: 2122-2124.
Ulz et al. 2016. Inferring expressed genes by whole-genome sequencing of plasma DNA. Nat Genet 48: 1273-1278.
Ungerer et al. 2021. Serial profiling of cell-free DNA and nucleosome histone modifications in cell cultures. Sci Rep 11: 9460.
Unterman et al. 2023. Multi-cell type deconvolution using a probabilistic model for single-molecule DNA methylation haplotypes. http://biorxiv.org/lookup/doi/10.1101/2023.08.20.554012
Unterman et al. 2024. CelFiE-ISH: a probabilistic model for multi-cell type deconvolution from single-molecule DNA methylation haplotypes. Genome Biol. 25: 151.
Van Der Pol Y, Mouliere F. 2019. Toward the Early Detection of Cancer by Decoding the Epigenetic and Environmental Fingerprints of Cell-Free DNA. Cancer Cell 36: 350-368.
Van Der Pol et al. 2023. Real-time analysis of the cancer genome and fragmentome from plasma and urine cell-free DNA using nanopore sequencing. EMBO Mol Med 15: e17282.
Wan et al. 2020. ctDNA monitoring using patient-specific sequencing and integration of variant reads. Sci Transl Med 12: eaaz8084.
Widman et al. 2022. Machine learning guided signal enrichment for ultrasensitive plasma tumor burden monitoring. http://biorxiv.org/lookup/doi/10.1101/2022.01.17.476508
Wolach et al. 2018. Increased neutrophil extracellular trap formation promotes thrombosis in myeloproliferative neoplasms. Sci Transl Med 10: eaan8292.
Xavier et al. 2020. Epigenetic Mechanisms in Canine Cancer. Front Oncol. 10: 591843.
Yu et al. 2023. Comparison of Single Molecule, Real-Time Sequencing and Nanopore Sequencing for Analysis of the Size, End-Motif, and Tissue-of-Origin of Long Cell-Free DNA in Plasma. Clin Chem 69: 168-179.
Yu et al. 2021. Single-molecule sequencing reveals a large population of long cell-free DNA molecules in maternal plasma. Proc Natl Acad Sci 118: e2114937118.
Zheng et al. 2021. A pan-cancer analysis of CpG Island gene regulation reveals extensive plasticity within Polycomb target genes. Nat. Commun. 12: 2485.
Zheng et al. 2023. Comprehensive analyses of partially methylated domains and differentially methylated regions in esophageal cancer reveal both cell-type- and cancer-specific epigenetic regulation. Genome Biology 24: 193.
Zhou et al. 2022. Epigenetic analysis of cell-free DNA by fragmentomic profiling. Proc Natl Acad Sci 119: e2209852119.
Claims
1. A method of determining a tissue of origin, a cell type of origin, origination from a cancerous or immune cell, or a combination thereof of cell-free DNA (cfDNA), the method comprising:
(i) providing a body fluid sample comprising cfDNA;
(ii) passing said cfDNA through a nanopore sequencer to produce a sequence of said cfDNA and obtain fragment length data of said cfDNA; and
(iii) identifying for said cfDNA passed through the nanopore sequencer a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof based on the fragment length data; thereby determining a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof of cfDNA.
2. The method of claim 1, wherein analysis of the fragment length data is performed on the cfDNA after passing the cfDNA through the nanopore sequencer by clustering the data based on the length of said cfDNA.
3. The method of claim 2, wherein the clustering is unsupervised clustering.
4. The method of claim 2 or claim 3, wherein a classifier is used to cluster the data based on the length of said cfDNA.
5. The method of any one of claims 2 to 4, wherein analysis of the fragment length data comprises transforming a fraction of sequencing reads from the fragment length data to an estimated fraction of total cfDNA present in the sample.
6. The method of claim 5, wherein said transforming is performed prior to clustering.
7. The method of any one of claims 2 to 6, wherein clustering the data comprises separating the data into short, mid and long cfDNA lengths.
8. The method of claim 7, wherein the mid cfDNA length comprises about 900-4,300 bp in length, the long cfDNA length comprises about 7,500-53,000 bp in length and/or the short cfDNA length comprises about 75-145 and 245-294 bp in length.
9. The method of any one of claims 1 to 8, wherein fragmentation location analysis is performed on the cfDNA after passing the cfDNA through the nanopore sequencer and said fragmentation location analysis is used in combination with the fragment length data to determine a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof of cfDNA.
10. The method of claim 9, wherein the fragmentation location analysis comprises fragment end motif analysis.
11. The method of any one of claims 1 to 10, wherein passing the cfDNA through the nanopore sequencer produces DNA modification data.
12. The method of claim 11, wherein the DNA modification data is selected from: methylation data, hydroxymethylation data and both, and said DNA modification data is used in combination with the fragment length data to determine a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof of cfDNA.
13. The method of any one of claims 1 to 12, wherein copy number analysis is performed on the cfDNA after passing the cfDNA through the nanopore sequencer and said copy number analysis is used in combination with the fragment length data to determine a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof of cfDNA.
14. The method of any one of claims 1 to 13, wherein the method additionally comprises measuring or detecting the level of circulating nucleosomes in the body fluid sample and said level is used in combination with the fragment length data to determine a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof of cfDNA.
15. The method of any one of claims 1 to 14, wherein the method additionally comprises measuring or detecting the total level of cfDNA present in the body fluid sample and said level is used in combination with the fragment length data to determine a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof of cfDNA.
16. The method of any one of claims 1 to 15, wherein the method additionally comprises centrifuging the sample prior to passing the cfDNA through a nanopore sequencer.
17. The method of any one of claims 1 to 16, wherein the method additionally comprises:
(a) fragment end motif analysis;
(b) performing RNA sequencing (RNA-seq) on the body fluid sample and detecting a level of mRNA expression of a deoxyribonuclease (DNase) enzyme;
(c) performing Reverse Transcription Polymerase Chain Reaction (RT-PCR) on the body fluid sample and detecting a level of mRNA expression of a DNase enzyme;
(d) performing an assay on the body fluid sample to detect a level of protein expression of a DNase enzyme; and/or
(e) performing an activity assay to measure functional activity of a DNase enzyme.
18. The method of any one of claims 1 to 17, wherein the method additionally comprises detecting DNASE1L3 activity.
19. The method of claim 18, wherein said detecting comprises:
(a) fragment end motif analysis;
(b) performing RNA-seq on the body fluid sample and detecting a level of DNASE1L3 mRNA expression;
(c) performing RT-PCR on the body fluid sample and detecting a level of DNASE1L3 mRNA expression;
(d) performing an assay on the body fluid sample to detect a level of DNASE1L3 protein expression; and/or
(e) performing an activity assay to measure functional DNASE1L3 activity.
20. The method of any one of claims 1 to 19, wherein the body fluid sample is a blood, serum or plasma sample.
21. The method of any one of claims 1 to 20, wherein the body fluid sample has been obtained from a subject suspected of relapse to a cancer.
22. The method of claim 21, wherein the subject is suspected of minimal residual disease (MRD).
23. The method of claim 21 or claim 22, wherein the subject is in remission from cancer.
24. Use of a size profile obtained by nanopore sequencing of all circulating chromatin fragments present in a body fluid sample as a biomarker for the diagnosis of cancer.
25. Use of a size profile obtained by nanopore sequencing of all circulating chromatin fragments present in a body fluid sample to identify a patient suitable for cancer treatment, such as immunotherapy.
26. A method of determining a tissue of origin, a cell type of origin, origination from a cancerous or immune cell, or a combination thereof of cell-free DNA (cfDNA) in an animal subject, the method comprising:
(i) providing a body fluid sample comprising cfDNA;
(ii) passing said cfDNA through a nanopore sequencer to produce a sequence of said cfDNA and obtain DNA modification data of said cfDNA, wherein the DNA modification data is selected from: methylation data, hydroxymethylation data and both; and
(iii) identifying for said cfDNA passed through the nanopore sequencer a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof based on the DNA modification data; thereby determining a tissue of origin, cell type of origin, origination from a cancerous or immune cell or a combination thereof of cfDNA in the animal subject.
27. The method of claim 26, wherein the animal subject is a dog.
28. The method of claim 26 or claim 27, wherein the animal subject is suspected of relapse to a cancer.
29. The method of any one of claims 26 to 28, wherein the animal subject is suspected of minimal residual disease (MRD).
30. The method of any one of claims 26 to 29, wherein the animal subject is in remission from cancer.
31. The method of any one of claims 26 to 30, wherein the DNA modification data is methylation data.
32. The method of any one of claims 26 to 31, further comprising performing analysis of fragment length data, fragmentation location analysis, copy number analysis and/or measuring or detecting the total level of cfDNA present in the body fluid sample.
Applications Claiming Priority (6)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202463560513P | 2024-03-01 | 2024-03-01 | |
| US63/560,513 | 2024-03-01 | ||
| US202463636646P | 2024-04-19 | 2024-04-19 | |
| US63/636,646 | 2024-04-19 | ||
| US202463648600P | 2024-05-16 | 2024-05-16 | |
| US63/648,600 | 2024-05-16 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025181348A1 | 2025-09-04 |
Family ID: 94871329
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/EP2025/055553 (pending; published as WO2025181348A1) | Method for determining the origin of circulating dna | 2024-03-01 | 2025-02-28 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025181348A1 (en) |
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2005019826A1 (en) | 2003-08-18 | 2005-03-03 | Chroma Therapeutics Limited | Detection of histone modification in cell-free nucleosomes |
| WO2013030578A2 (en) | 2011-09-01 | 2013-03-07 | Singapore Volition Pte Limited | Method for detecting nucleosomes |
| WO2013030579A1 (en) | 2011-09-01 | 2013-03-07 | Singapore Volition Pte Limited | Method for detecting nucleosomes containing histone variants |
| WO2013084002A2 (en) | 2011-12-07 | 2013-06-13 | Singapore Volition Pte Limited | Method for detecting nucleosome adducts |
| WO2017068359A1 (en) | 2015-10-21 | 2017-04-27 | Belgian Volition Sprl | Method for detecting nucleosomes containing histone modifications and variants |
| WO2023067597A1 (en) * | 2021-10-18 | 2023-04-27 | Yissum Research Development Company Of The Hebrew University Of Jerusalem Ltd. | Use of nanopore sequencing for determining the origin of circulating dna |
| US11788135B2 (en) * | 2016-08-05 | 2023-10-17 | The Broad Institute, Inc. | Methods for genome characterization |
| WO2024031097A2 (en) * | 2022-08-05 | 2024-02-08 | The Board Of Trustees Of The Leland Stanford Junior University | Systems and methods for cancer screening |
2025
- 2025-02-28 WO PCT/EP2025/055553 patent/WO2025181348A1/en active Pending
Non-Patent Citations (61)
| Title |
|---|
| ADALSTEINSSON ET AL.: "Scalable whole-exome sequencing of cell-free DNA reveals high concordance with metastatic tumors", NAT COMMUN, vol. 8, 2017, pages 1324, XP055449803, DOI: 10.1038/s41467-017-00965-y |
| ALCAIDE ET AL.: "Evaluating the quantity, quality and size distribution of cell-free DNA by multiplex droplet digital PCR", SCI REP, vol. 10, 2020, pages 12564 |
| BACA ET AL.: "Liquid biopsy epigenomic profiling for cancer subtyping", NAT MED, vol. 29, 2023, pages 2737 - 2741, XP093236745, DOI: 10.1038/s41591-023-02605-z |
| BAUDEN ET AL.: "Circulating nucleosomes as epigenetic biomarkers in pancreatic cancer", CLIN EPIGENETICS, vol. 7, 2015, pages 106, XP021229536, DOI: 10.1186/s13148-015-0139-4 |
| BUDHRAJA ET AL.: "Genome-wide analysis of aberrant position and sequence of plasma DNA fragment ends in patients with cancer", SCI TRANSL MED, vol. 15, 2023, pages eabm6863, XP093056866, DOI: 10.1126/scitranslmed.abm6863 |
| CHAN ET AL.: "Plasma DNA aberrations in systemic lupus erythematosus revealed by genomic and methylomic sequencing", PROC NATL ACAD SCI, vol. 111, 2014, XP055495862, DOI: 10.1073/pnas.1421126111 |
| CHAN ET AL.: "Plasma DNA Profile Associated with DNASE1L3 Gene Mutations: Clinical Observations, Relationships to Nuclease Substrate Preference, and In Vivo Correction", AM J HUM GENET, vol. 107, 2020, pages 882 - 894, XP086318687, DOI: 10.1016/j.ajhg.2020.09.006 |
| CHE ET AL.: "Genomic origin, fragmentomics, and transcriptional properties of long cell-free DNA molecules in human plasma", GENOME RES, vol. 34, 2024, pages 189 - 200 |
| CHOY ET AL.: "Single-Molecule Sequencing Enables Long Cell-Free DNA Detection and Direct Methylation Analysis for Cancer Patients", CLIN CHEM, vol. 68, 2022, pages 1151 - 1163 |
| COHEN ET AL.: "Practical recommendations for using ctDNA in clinical decision making", NATURE, vol. 619, 2023, pages 259 - 268 |
| CRISTIANO ET AL.: "Genome-wide cell-free DNA fragmentation in patients with cancer", NATURE, vol. 570, 2019, pages 385 - 389, XP036814426, DOI: 10.1038/s41586-019-1272-6 |
| CRISTIANO STEPHEN ET AL: "Genome-wide cell-free DNA fragmentation in patients with cancer", NATURE, vol. 570, no. 7761, 29 May 2019 (2019-05-29), pages 385 - 389, XP036814426, [retrieved on 20190529], DOI: 10.1038/S41586-019-1272-6 * |
| DOR Y, CEDAR H: "Principles of DNA methylation and their implications for biology and medicine", THE LANCET, vol. 392, 2018, pages 777 - 786, XP085461265, DOI: 10.1016/S0140-6736(18)31268-6 |
| FEDYUK ET AL.: "Multiplexed, single-molecule, epigenetic analysis of plasma-isolated nucleosomes for cancer diagnostics", NAT BIOTECHNOL, vol. 41, 2023, pages 212 - 221, XP093070069, DOI: 10.1038/s41587-022-01447-3 |
| FITZGERALD ET AL.: "The future of early cancer detection", NAT MED, vol. 28, 2022, pages 666 - 677, XP037801548, DOI: 10.1038/s41591-022-01746-x |
| HAN DSC, LO YMD: "The Nexus of cfDNA and Nuclease Biology", TRENDS GENET, vol. 37, 2021, pages 758 - 770, XP086694635, DOI: 10.1016/j.tig.2021.04.005 |
| HAN ET AL.: "The Biology of Cell-free DNA Fragmentation and the Roles of DNASE1, DNASE1L3, and DFFB", AM J HUM GENET, vol. 106, 2020, pages 202 - 214, XP055822533, DOI: 10.1016/j.ajhg.2020.01.008 |
| HASENLEITHNER SO, SPEICHER MR: "A clinician's handbook for using ctDNA throughout the patient journey", MOL CANCER, vol. 21, 2022, pages 81 |
| HERRANZ, ESTELLER, METHODS MOL. BIOL., vol. 361, 2007, pages 25 - 62 |
| HOLDENRIEDER, STIEBER, CRIT REV CLIN LAB SCI, vol. 46, no. 1, 2009, pages 1 - 24 |
| JIANG ET AL.: "Lengthening and shortening of plasma DNA in hepatocellular carcinoma patients", PROC NATL ACAD SCI, vol. 112, 2015, XP055223840, DOI: 10.1073/pnas.1500076112 |
| KATSMAN EFRAT ET AL: "Detecting cell-of-origin and cancer-specific methylation features of cell-free DNA from Nanopore sequencing", GENOME BIOLOGY, 15 July 2022 (2022-07-15), XP093012695, Retrieved from the Internet <URL:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9283844/pdf/13059_2022_Article_2710.pdf> [retrieved on 20230110], DOI: 10.1186/s13059-022-02710-1 * |
| KATSMAN ET AL.: "Detecting cell-of-origin and cancer-specific methylation features of cell-free DNA from Nanopore sequencing", GENOME BIOL, vol. 23, 2022, pages 158 |
| KIRILENKO ET AL.: "Integrating gene annotation with orthology inference at scale", SCIENCE, vol. 380, 2023, pages eabn3107 |
| LO ET AL.: "Epigenetics, fragmentomics, and topology of cell-free DNA in liquid biopsies", SCIENCE, vol. 372, 2021, pages eaaw3616 |
| LOYFER ET AL.: "A DNA methylation atlas of normal human cell types", NATURE, vol. 613, 2023, pages 355 - 364, XP093208240, DOI: 10.1038/s41586-022-05580-6 |
| MARIÑO-RAMÍREZ ET AL.: "The Histone Database: an integrated resource for histones and histone fold-containing proteins", DATABASE, vol. 2011, Retrieved from the Internet <URL:http://genome.nhgri.nih.gov/histones/complete.shtml> |
| MARKUS ET AL.: "Evaluation of pre-analytical factors affecting plasma DNA analysis", SCI REP, vol. 8, 2018, pages 7375 |
| MARTIGNANO ET AL.: "Nanopore sequencing from liquid biopsy: analysis of copy number variations from cell-free DNA of lung cancer patients", MOL CANCER, vol. 20, 2021, pages 32, XP093015034, DOI: 10.1186/s12943-021-01327-5 |
| MATTOX ET AL.: "The Origin of Highly Elevated Cell-Free DNA in Healthy Individuals and Patients with Pancreatic, Colorectal, Lung, or Ovarian Cancer", CANCER DISCOV, vol. 13, 2023, pages 2166 - 2179 |
| MOLDOVAN ET AL.: "Multi-modal cell-free DNA genomic and fragmentomic patterns enhance cancer survival and recurrence analysis", CELL REP MED, vol. 5, 2024, pages 101349 |
| MOULIERE ET AL.: "Enhanced detection of circulating tumor DNA by fragment size analysis", SCI TRANSL MED, vol. 10, 2018, pages eaat4921, XP055669959, DOI: 10.1126/scitranslmed.aat4921 |
| MOULIERE ET AL.: "High Fragmentation Characterizes Tumour-Derived Circulating DNA", PLOS ONE, vol. 6, 2011, pages e23418, XP002730500, DOI: 10.1371/journal.pone.0023418 |
| PASTOR ET AL.: "Association of neutrophil extracellular traps with the production of circulating DNA in patients with colorectal cancer", ISCIENCE, vol. 25, 2022, pages 103826, XP093001079, DOI: 10.1016/j.isci.2022.103826 |
| PISAREVA ET AL.: "Neutrophil extracellular traps have auto-catabolic activity and produce mononucleosome-associated circulating DNA", GENOME MED, vol. 14, 2022, pages 135 |
| SADEH ET AL.: "ChIP-seq of plasma cell-free nucleosomes identifies gene expression programs of the cells of origin", NAT BIOTECHNOL, vol. 39, 2021, pages 586 - 598, XP037450454, DOI: 10.1038/s41587-020-00775-6 |
| SCATTONE ET AL.: "Quantification of Global DNA Methylation in Canine Melanotic and Amelanotic Oral Mucosal Melanomas and Peripheral Blood Leukocytes From the Same Patients With OMM: First Study", FRONT VET SCI, vol. 8, 2020, pages 680181 |
| SERPAS ET AL.: "Dnase1l3 deletion causes aberrations in length and end-motif frequencies in plasma DNA", PROC NATL ACAD SCI, vol. 116, 2019, pages 641 - 649, XP055822531, DOI: 10.1073/pnas.1815031116 |
| SHI JIPING ET AL: "Size profile of cell-free DNA: A beacon guiding the practice and innovation of clinical testing", THERANOSTICS, vol. 10, no. 11, 26 March 2020 (2020-03-26), AU, pages 4737 - 4748, XP093017678, ISSN: 1838-7640, Retrieved from the Internet <URL:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7163439/pdf/thnov10p4737.pdf> DOI: 10.7150/thno.42565 * |
| SISIRAK ET AL.: "Digestion of Chromatin in Apoptotic Cell Microparticles Prevents Autoimmunity", CELL, vol. 166, 2016, pages 88 - 101, XP029627879, DOI: 10.1016/j.cell.2016.05.034 |
| SNYDER ET AL.: "Cell-free DNA Comprises an In Vivo Nucleosome Footprint that Informs Its Tissues-Of-Origin", CELL, vol. 164, 2016, pages 57 - 68 |
| SNYDER MATTHEW W ET AL: "Cell-free DNA Comprises an In Vivo Nucleosome Footprint that Informs Its Tissues-Of-Origin", CELL, ELSEVIER, AMSTERDAM NL, vol. 164, no. 1, 14 January 2016 (2016-01-14), pages 57 - 68, XP029385484, ISSN: 0092-8674, DOI: 10.1016/J.CELL.2015.11.050 * |
| THIERRY AR: "Circulating DNA fragmentomics and cancer screening", CELL GENOMICS, vol. 3, 2023, pages 100242 |
| THIERRY AR, PISAREVA E: "A New Paradigm of the Origins of Circulating DNA in Patients with Cancer", CANCER DISCOV, vol. 13, 2023, pages 2122 - 2124 |
| UDOMRUK SASIMOL ET AL: "Size distribution of cell-free DNA in oncology", CRITICAL REVIEWS IN ONCOLOGY/HEMATOLOGY, ELSEVIER, AMSTERDAM, NL, vol. 166, 28 August 2021 (2021-08-28), XP086812493, ISSN: 1040-8428, [retrieved on 20210828], DOI: 10.1016/J.CRITREVONC.2021.103455 * |
| ULZ ET AL.: "Inferring expressed genes by whole-genome sequencing of plasma DNA", NAT GENET, vol. 48, 2016, pages 1273 - 1278, XP093151144, DOI: 10.1038/ng.3648 |
| UNGERER ET AL.: "Serial profiling of cell-free DNA and nucleosome histone modifications in cell cultures", SCI REP, vol. 11, 2021, pages 9460 |
| UNTERMAN ET AL., MULTI-CELL TYPE DECONVOLUTION USING A PROBABILISTIC MODEL FOR SINGLE-MOLECULE DNA METHYLATION HAPLOTYPES, 2023, Retrieved from the Internet <URL:http://biorxiv.org/lookup/doi/10.1101/2023.08.20.554012> |
| UNTERMAN ET AL.: "CelFiE-ISH: a probabilistic model for multi-cell type deconvolution from single-molecule DNA methylation haplotypes", GENOME BIOL, vol. 25, 2024, pages 151 |
| UNTERMAN, BIORXIV, 2023 |
| VAN DER POL ET AL.: "Real-time analysis of the cancer genome and fragmentome from plasma and urine cell-free DNA using nanopore sequencing", EMBO MOL MED, vol. 15, 2023, pages e17282 |
| VAN DER POL Y, MOULIERE F: "Toward the Early Detection of Cancer by Decoding the Epigenetic and Environmental Fingerprints of Cell-Free DNA", CANCER CELL, vol. 36, 2019, pages 350 - 368, XP085861188, DOI: 10.1016/j.ccell.2019.09.003 |
| WAN ET AL.: "ctDNA monitoring using patient-specific sequencing and integration of variant reads", SCI TRANSL MED, vol. 12, 2020, pages eaaz8084, XP093073693, DOI: 10.1126/scitranslmed.aaz8084 |
| WIDMAN ET AL., MACHINE LEARNING GUIDED SIGNAL ENRICHMENT FOR ULTRASENSITIVE PLASMA TUMOR BURDEN MONITORING, 2022, Retrieved from the Internet <URL:http://biorxiv.org/lookup/doi/10.1101/2022.01.17.476508> |
| WOLACH ET AL.: "Increased neutrophil extracellular trap formation promotes thrombosis in myeloproliferative neoplasms", SCI TRANSL MED, vol. 10, 2018, pages eaan8292 |
| XAVIER ET AL.: "Epigenetic Mechanisms in Canine Cancer", FRONT ONCOL, vol. 10, 2020, pages 591843 |
| YU ET AL.: "Comparison of Single Molecule, Real-Time Sequencing and Nanopore Sequencing for Analysis of the Size, End-Motif, and Tissue-of-Origin of Long Cell-Free DNA in Plasma", CLIN CHEM, vol. 69, 2023, pages 168 - 179 |
| YU ET AL.: "Single-molecule sequencing reveals a large population of long cell-free DNA molecules in maternal plasma", PROC NATL ACAD SCI, vol. 118, 2021, pages e2114937118, XP093174388, DOI: 10.1073/pnas.2114937118 |
| ZHENG ET AL.: "A pan-cancer analysis of CpG Island gene regulation reveals extensive plasticity within Polycomb target genes", NAT COMMUN, vol. 12, 2021, pages 2485 |
| ZHENG ET AL.: "Comprehensive analyses of partially methylated domains and differentially methylated regions in esophageal cancer reveal both cell-type- and cancer-specific epigenetic regulation", GENOME BIOL, vol. 24, 2023, pages 193 |
| ZHOU ET AL.: "Epigenetic analysis of cell-free DNA by fragmentomic profiling", PROC NATL ACAD SCI, vol. 119, 2022, pages e2209852119 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20240254559A1 (en) | | Genomic stability profiling |
| Lau et al. | | Single-molecule methylation profiles of cell-free DNA in cancer with nanopore sequencing |
| CN104067124B (en) | | Method for detecting the nucleosome containing nucleotide |
| EP2751572B1 (en) | | Method for detecting nucleosomes |
| AU2010311535B2 (en) | | Means and methods for non-invasive diagnosis of chromosomal aneuploidy |
| KR101595305B1 (en) | | Compositions and Methods of Detecting TIABS |
| Miller et al. | | Programs, origins and immunomodulatory functions of myeloid cells in glioma |
| US20230368915A1 (en) | | Metastasis predictor |
| US20230178245A1 (en) | | Immunotherapy Response Signature |
| US20220093217A1 (en) | | Genomic profiling similarity |
| MX2014002444A (en) | | Method for detecting nucleosomes containing histone variants. |
| WO2014089241A9 (en) | | Molecular profiling for cancer |
| Oberhofer et al. | | Profiling disease and tissue-specific epigenetic signatures in cell-free DNA |
| JP2020518234A (en) | | Methods for exploring and identifying precursor genetic status of solid tumor development |
| KR20220125708A (en) | | Next-generation sequencing-based target gene RNA sequencing panel and analysis algorithm |
| WO2025181348A1 (en) | | Method for determining the origin of circulating dna |
| WO2020194057A1 (en) | | Biomarkers for disease detection |
| Zhang et al. | | The application of targeted RNA sequencing for the analysis of fusion genes, gene mutations, IKZF1 intragenic deletion, and CRLF2 overexpression in acute lymphoblastic leukemia |
| de Lima Guido et al. | | Integrated Genomic DNA/RNA Profiling vs Fluorescence in Situ Hybridization in the Detection of MYC and BCL2 (and BCL6) Rearrangements in Large B-Cell Lymphomas: Updates Amid the New WHO Classification of Lymphoid Neoplasms |
| WO2020071784A1 (en) | | Macrophage-specific biomarker panel and use thereof |
| Yoshino et al. | | Variants of Uncertain Significances in Hereditary Breast and Ovarian Cancer |
| Berman et al. | | Long-read sequencing identifies aberrant fragmentation patterns linked to elevated cell-free DNA levels in cancer |
| Doebley | | Predicting cancer subtypes from nucleosome profiling of cell-free DNA |
| EP4639165A1 (en) | | Assessment of biological samples for nucleic acid analysis |
| HK40008468A (en) | | Method for detecting nucleosomes containing nucleotides |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 25710002; Country of ref document: EP; Kind code of ref document: A1 |