
EP3811365A1 - Noise measure for copy number analysis on targeted panel sequencing data - Google Patents

Noise measure for copy number analysis on targeted panel sequencing data

Info

Publication number
EP3811365A1
Authority
EP
European Patent Office
Prior art keywords
copy number
subject
sample
value
cancer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP19778848.2A
Other languages
German (de)
English (en)
Inventor
Johannes Heuckmann
Tobias Zacherle
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Siemens Healthineers AG
Original Assignee
Siemens Healthcare GmbH
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Siemens Healthcare GmbH


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/10 Ploidy or copy number detection
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/20 Sequence assembly
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/10 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/118 Prognosis of disease development
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids

Definitions

  • the present invention relates to a method for determining the statistical noise level in the calculation of a subject's genetic copy number value in massively parallel nucleic acid sequencing data derived from a sample, as well as a method for determining a subject's genetic copy number value in massively parallel nucleic acid sequencing data derived from a sample and a method to determine a subject's genetic copy number value for stratifying the subject for cancer therapy.
  • Copy number analysis is a crucial part of genomic analysis and can provide major indications for a certain line of treatment in targeted cancer therapy (Albertson et al., 2006, Trends in Genetics, 22(8): 447-55; Romond et al., 2005, The New England Journal of Medicine, 353(16): 1673-84; or
  • the present invention addresses this need and presents a method for determining the statistical noise level in the calculation of a subject's genetic copy number value in massively parallel nucleic acid sequencing data derived from a sample, comprising: (a) obtaining massively parallel sequencing information of a subject's sample and of a cohort of normal samples for a defined set of genomic loci as nucleic acid sequence reads; (b) aligning said nucleic acid sequencing reads to a reference sequence; (c) determining copy number values using a median normalized ratio of sample read count data and normal read count data as obtained in step (a); (d) determining the ratio of a standard error value or a standard deviation value to a copy number value as determined in step (c) at each locus of said defined set of genomic loci; (e) adding the ratios as determined in step (d), optionally multiplying each ratio with a weight factor corresponding to the clinical relevance of the locus; (f) obtaining a value for the statistical noise level in a subject's genetic copy number, wherein said value for the statistical noise level is the sum of added ratios as obtained in step (e); and (g) normalizing the value obtained in step (f) by the size of the defined set of genomic loci or by the sum of all applied weights for the defined set of genomic loci if different weights are applied.
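Steps (c) to (g) can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The function names, the dictionary layout (locus name mapped to a list of per-partition read counts), the scaling of the normalized ratio by an assumed ploidy of 2, and the use of the sample standard deviation are all assumptions made for illustration.

```python
import math
import statistics


def partition_copy_numbers(sample_counts, normal_counts, ploidy=2.0):
    """Step (c): median-normalized sample/normal read count ratios.

    Assumption: the normalized ratio is scaled by a ploidy of 2 to
    express it as a copy number.
    """
    ratios = {
        locus: [s / n for s, n in zip(sample_counts[locus], normal_counts[locus])]
        for locus in sample_counts
    }
    # Median over all partitions of all loci (hypothetical choice of scope).
    median_ratio = statistics.median(r for rs in ratios.values() for r in rs)
    return {
        locus: [ploidy * r / median_ratio for r in rs]
        for locus, rs in ratios.items()
    }


def noise_measure(copy_numbers, weights=None, use_sd=False):
    """Steps (d) to (g): weighted, normalized sum of SE/CN (or SD/CN) ratios."""
    if weights is None:
        weights = {locus: 1.0 for locus in copy_numbers}
    total = 0.0
    for locus, cns in copy_numbers.items():
        mean_cn = statistics.mean(cns)
        sd = statistics.stdev(cns)          # sample SD, needs >= 2 partitions
        spread = sd if use_sd else sd / math.sqrt(len(cns))
        total += weights[locus] * spread / mean_cn   # steps (d) and (e)
    # Step (g): with unit weights this equals the number of loci.
    return total / sum(weights.values())
```

With unit weights the divisor in the last line is simply the number of loci, so one function covers both normalization variants named in step (g).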
  • the approach allows for an improved stratification of a patient for cancer therapy and a more reliable
  • the copy number value as determined in step (c) of the method depicted above is determined on the basis of mean values, optionally on the basis of median values or trimmed mean values.
  • said set of genomic loci comprise(s) one or more of the following: (an) exonic sequence(s), (an) intronic sequence(s), and (a) gene region(s), preferably (a) gene region(s).
  • the set of genomic loci comprise a preselected panel of genes, preferably a panel of genes associated with a disease or with the development of a disease, more preferably a panel of genes associated with cancer or the development of cancer.
  • the panel of genes is a panel of at least 10 genes. It is preferred that the panel is a panel of at least 30 genes. In a more
  • it is a panel of at least 100 genes.
  • the present invention relates to a method of reducing the statistical noise level in the calculation of a subject's genetic copy number value in massively parallel nucleic acid sequencing data derived from a sample,
  • said normal sample is selected from a heterogeneous cohort of normal samples, wherein said normal sample yields the lowest final noise measure among all normal samples of said cohort with respect to the subject's sample analyzed, or wherein said normal sample is a sub-cohort selected from a heterogeneous cohort of normal samples, wherein said sub-cohort comprises those normal samples which yield the 2, 3, 4 or 5 lowest final noise measures among all normal samples of said cohort with respect to the subject's sample analyzed, and wherein a reduced value for the statistical noise level in a subject's genetic copy number is obtained.
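The selection of the best-fitting normal sample (or the 2 to 5 best) can be sketched as a simple ranking by noise measure. The helper names `mrse` and `rank_normals`, the data layout, and the ploidy-2 scaling are hypothetical; the sketch only assumes that a noise value can be computed for each sample/normal pair.

```python
import math
import statistics


def mrse(sample_counts, normal_counts):
    """Mean relative standard error for one sample/normal pair (illustrative)."""
    ratios = {
        locus: [s / n for s, n in zip(sample_counts[locus], normal_counts[locus])]
        for locus in sample_counts
    }
    med = statistics.median(r for rs in ratios.values() for r in rs)
    rel_se = []
    for rs in ratios.values():
        cns = [2.0 * r / med for r in rs]   # assumption: ploidy 2 scaling
        se = statistics.stdev(cns) / math.sqrt(len(cns))
        rel_se.append(se / statistics.mean(cns))
    return sum(rel_se) / len(rel_se)


def rank_normals(sample_counts, normal_cohort, k=1):
    """Return the names of the k normals with the lowest noise measure."""
    scored = sorted(
        (mrse(sample_counts, counts), name)
        for name, counts in normal_cohort.items()
    )
    return [name for _, name in scored[:k]]
```

Calling `rank_normals(sample, cohort, k=1)` corresponds to choosing a single best normal, while `k` between 2 and 5 corresponds to the sub-cohort variant.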
  • the heterogeneous cohort of normal samples comprises samples with coverage levels ranging from at least about 50 x to at most about 10000 x.
  • a value obtained for the statistical noise level in a subject's genetic copy number is considered to indicate a noisy sample if a threshold is surpassed, which threshold is calculated according to a procedure comprising the steps of: (a) selecting a set of normal samples as a calibration test set, wherein all copy numbers are known to have a value of 2; (b) selecting a disjoint set of calibration normal samples as a reference set, wherein all copy numbers are known to have a value of 2; (c) analyzing each normal sample in the
  • the present invention relates to a method for determining a subject's genetic copy number value in massively parallel nucleic acid sequencing data derived from a sample, comprising adjusting the determination of the subject's genetic copy number value by applying a value for the statistical noise level in a subject's genetic copy number as determined in the method as defined herein.
  • said adjusting of the determination of the subject's genetic copy number value comprises an exclusion from further usage of samples which have been identified as being higher than the respective threshold obtained according to the threshold calculation procedure described herein.
  • said adjusting of the determination of the subject's genetic copy number value comprises a modification of the statistical significance of a calculated genetic copy number subject to the value for the statistical noise level in a subject's genetic copy number as determined in a method for determining the statistical noise level in the calculation of a subject's genetic copy number value described herein.
  • the present invention relates to a method to determine a subject's genetic copy number value for stratifying the subject for cancer therapy, comprising: (a) performing a massively parallel nucleic acid sequencing of nucleic acids extracted from a subject's tumor sample; (b) determining the subject's genetic copy number value according to the method for determining a subject's genetic copy number value as defined herein; and (c) attributing the determined subject's genetic copy number value to a group of increased, normal or decreased genetic copy number values, which can guide a treatment decision.
  • a decreased genetic copy number value corresponds to a genetic copy number value significantly lower than 2
  • a normal genetic copy number value corresponds to a genetic copy number value of 2
  • an increased copy number value corresponds to a copy number value significantly higher than 2.
  • increased genetic copy number value indicates a preference for a targeted cancer therapy.
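How a copy number might be attributed to the increased, normal or decreased group can be sketched as follows. The text only says "significantly" higher or lower than 2; the z-score test and the cutoff `z_crit=3.0` below are assumptions chosen for illustration, not criteria taken from the application.

```python
def classify_copy_number(mean_cn, se, z_crit=3.0, normal_cn=2.0):
    """Assign a gene copy number to 'increased', 'normal' or 'decreased'.

    Assumption: significance is judged by the distance from the normal
    copy number in units of the standard error (hypothetical cutoff).
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = (mean_cn - normal_cn) / se
    if z >= z_crit:
        return "increased"
    if z <= -z_crit:
        return "decreased"
    return "normal"
```

A noisier sample has larger standard errors, so the same mean copy number is less likely to be called increased or decreased.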
  • step (a) comprises a hybrid-capture-based nucleic acid enrichment for genomic loci of interest.
  • the sample comprises one or more premalignant or malignant cells; cells from a solid tumor or soft tissue tumor or a metastatic lesion; tissue or cells from a surgical margin; a histologically normal tissue obtained in a biopsy; one or more circulating tumor cells (CTC); cell free DNA (cfDNA); a normal, adjacent tissue (NAT) from a subject having a tumor or being at risk of having a tumor; or a blood, plasma, urine, saliva or serum sample containing nucleic acids from the tumor from the same subject having a tumor or being at risk of having a tumor; or a paraffin or FFPE-sample.
  • CTC circulating tumor cells
  • cfDNA cell free DNA
  • NAT normal, adjacent tissue
  • the cancer may be breast cancer, prostate cancer, ovarian cancer, renal cancer, lung cancer, pancreas cancer, urinary bladder cancer, uterus cancer, kidney cancer, brain cancer, stomach cancer, colon cancer, melanoma or fibrosarcoma, GIST, glioblastoma or hematological leukemias and lymphomas, both from the myeloid and lymphatic lineage.
  • the method to determine a subject's genetic copy number value for stratifying the subject for cancer therapy further comprises providing a report in electronic, web-based, or paper form, to a subject or to another person or entity, a caregiver, a physician, an oncologist, a hospital, a clinic, a third party payor, an insurance company or a government office.
  • the report comprises one or more of: (i) output from the method, comprising the determined genetic copy number value; (ii) information on the meaning of the determined genetic copy number value, wherein said information comprises information on prognosis and potential or suggested therapeutic options; (iii) information on the likely effectiveness of a therapeutic option, the
  • FIG. 1 shows an example calculation for the statistical noise according to the invention.
  • the calculation was performed with 5 genes (indicated in the first column), with each gene having a different number of partitions (second column), i.e. copy number probes corresponding to e.g. sub-portions of a gene such as exons or parts of exons, whose size is typically defined by technical parameters of hybrid capture approaches or similar methods.
  • the mean copy number ("mean CN")
  • SE standard error of the copy number
  • SE/mean CN the standard error divided by the mean copy number
  • SD standard deviation of copy numbers of partitions
  • SD/mean CN standard deviation divided by the mean copy number
  • the values for SD (6th column) are linked to the values for SE (4th column) such that each SD value is the product of the square root of the number of partitions and the SE value.
  • the SE/mean CN values can subsequently be added up and then divided by 5 (corresponding to the number of genes in this example) to obtain the normalized noise measure value MRSE (mean relative standard error) indicated at the bottom of the figure.
  • the SD/mean CN values can be added up and then divided by 5 (corresponding to the number of genes in this example) to obtain the alternative variant of the noise measure MCV (mean coefficient of variation), also indicated at the bottom of the figure.
  • MCV mean coefficient of variation
  • the MRSE is 0.0646
  • the MCV for the same genes and partitions yields a value of 0.2366.
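The arithmetic described for FIG. 1 can be reproduced from a table of per-gene values. The sketch below is illustrative only: the function name and the row layout are assumptions, and the example rows are made-up numbers, not the values shown in FIG. 1.

```python
import math


def fig1_style_summary(rows):
    """Compute (MRSE, MCV) from rows of (gene, n_partitions, mean_cn, se).

    SD is derived from SE as described in the text: SD = sqrt(n) * SE.
    """
    rel_se, rel_sd = [], []
    for _gene, n_partitions, mean_cn, se in rows:
        sd = math.sqrt(n_partitions) * se
        rel_se.append(se / mean_cn)
        rel_sd.append(sd / mean_cn)
    # Both measures are normalized by the number of genes.
    return sum(rel_se) / len(rows), sum(rel_sd) / len(rows)


# Hypothetical example rows (not taken from FIG. 1).
example_rows = [("GENE_A", 4, 2.0, 0.1), ("GENE_B", 9, 4.0, 0.2)]
mrse_value, mcv_value = fig1_style_summary(example_rows)
```

Because SD is always at least as large as SE, the MCV is never smaller than the MRSE for the same genes, which matches the ordering of the two values reported for the figure.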
  • FIG. 2 shows schematic examples of low noise (left hand side) and high noise (right hand side) calculated according to methods of the invention and provides an intuitive
  • partition copy number is obtained (see lower panel of Fig. 2).
  • the calculated partition copy numbers show a small standard deviation and the resulting mean copy number for the gene a small standard error
  • the calculated partition copy numbers show a large standard deviation and the mean gene copy number shows a large standard error.
  • a large noise measure value (as SE/SD are large) may be obtained for normal 2.
  • a smaller noise measure value (as SE/SD are smaller) may be obtained for normal 1.
  • FIG. 3 shows examples of high statistical noise and low statistical noise values calculated according to methods of the invention.
  • the figure shows normalized log2 ratios of read counts from a tumor sample over read counts of normal samples for a 39 genes panel.
  • MRSE noise
  • the noise calculation associated with the samples used for the upper plot results in a noise (MRSE) value of 0.0692
  • the normal sample used in the context of the upper plot thus does not fit the tumor sample, which is reflected by the statistical noise difference of 0.0692 vs. 0.0308.
  • the plots show that the difference in noise levels is correspondingly reflected in differences of spread of log2 ratios.
  • FIGS. 4 and 5 show examples for copy number data, for the case of choosing the best normal (i.e. one which minimizes the noise measure according to the invention; see Fig. 4) and for the case of choosing a single fixed normal (with 200 ng input material; see Fig. 5).
  • the diagrams show calculated relative copy numbers (not taking into account actual ploidy factors, but assuming ploidy 2) for cell line HCC2218, which differ according to the amount of sample material used for the
  • FIG. 6 depicts a calibration approach to determine a noise threshold for noisy samples according to the present
  • a normalized noise measure here: MCV
  • MCV normalized noise measure
  • the plot thus depicts the correlation of a direct noise measure with simple interpretation (which is, however, only applicable in calibration settings, when copy numbers are well defined and known) to the noise measure according to the present invention (which is less direct, but more generally applicable, e.g. in the diagnostic setting, when real copy numbers in a sample are unknown and to be determined).
  • FIG. 8 shows a calibration approach to determine a noise threshold for noisy samples according to the present
  • the plot depicts the correlation of a direct noise measure with simple interpretation to the noise measure according to the invention.
  • ±20 % typically indicates a deviation from the indicated numerical value of ±20 %, preferably ±15 %, more preferably ±10 %, and even more preferably ±5 %.
  • the present invention concerns in one aspect a method for determining the statistical noise level in the calculation of a subject's genetic copy number values in massively parallel nucleic acid sequencing data derived from a sample, comprising: (a) obtaining massively parallel sequencing information of a subject's sample and of a cohort of normal samples for a defined set of genomic loci as nucleic acid sequence reads; (b) aligning said nucleic acid sequencing reads to a reference sequence; (c) determining copy number values using a median normalized ratio of sample read count data and normal read count data as obtained in step (a); (d) determining the ratio of a standard error value or a standard deviation value to a copy number value as determined in step (c) at each locus of said defined set of genomic loci; (e) adding the ratios as determined in step (d), optionally multiplying each ratio with a weight factor corresponding to the clinical relevance of the locus; (f) obtaining a value for the statistical noise level in a subject's genetic copy number, wherein said value for the statistical noise level is the sum of added ratios as obtained in step (e); and (g) normalizing the value obtained in step (f) by the size (e.g. number of genes) of the defined set of genomic loci or by the sum of all applied weights for the defined set of genomic loci if different weights are applied.
  • the term "genetic copy number” relates to the number of copies of a gene or portion of a gene or of a genomic region comprising one or more than one gene or of a genomic segment or of a chromosome, or in certain embodiments also of the entire genome or substantial portions thereof per cell.
  • the number of genes etc. may be derived from a specific subset of genomic segments or targeted sequences.
  • a “targeted sequence” may thus comprise any genomic sequence segment which is considered of interest for the determination of the genetic copy number. It may include coding sequences, or coding sequences and non-coding sequences.
  • the copy number may be measured in relation to targeted sequences including exonic sequences only, or in relation to combinations of exonic sequences and additional elements such as introns and/or other genomic elements. It is preferred that the copy number is measured in relation to exonic sequences. In specific, different
  • the copy number may also be measured in relation to further elements of the genome.
  • regulatory sequences such as promoter, or terminator sequences may be included in a targeted sequence.
  • the term "genetic copy number" relates typically to the copy number of diploid organisms, wherein a copy number of 2 is a default situation, with a copy number of <2 indicating a deletion of a gene or genomic region, chromosome or genome etc. and a copy number of >2 indicating a duplication or amplification of a gene or genomic region, chromosome or genome etc.
  • the methods of the present invention may also be used in a non-diploid context as, for example, a cancer disease may alter the ploidy of initially diploid cells.
  • statistical noise level refers to the level of unexpected variability within the value calculated for the genetic copy number. Generally, noisy data or noisy values are rendered meaningless by the existence of too much variation. Accordingly, any meaningful signal or value may be obscured by random data, i.e. statistical noise.
  • NGS next-generation sequencing
  • massively parallel sequencing typically includes next-generation sequencing (NGS) or second generation sequencing techniques.
  • the massively parallel sequencing approach includes any sequencing method that determines the nucleotide sequence of either individual nucleic acid molecules or expanded clones for individual nucleic acid molecules in a highly parallel fashion. For example, more than 10^8 molecules may be
  • sequencing may be performed according to any suitable massive parallel approach.
  • Typical platforms include Roche 454, GS FLX Titanium, Illumina, Life Technologies Ion Proton, Oxford Nanopore Technologies, Solexa, Solid or Helicos Biosciences Heliscope systems.
  • Obtaining massively parallel sequencing information means that any suitable massively parallel sequencing approach as mentioned, or as known to a skilled person, can be performed.
  • the sequencing may include the preparation of templates, the sequencing, as well as subsequent imaging and initial data analysis steps.
  • Preparation steps may, for example, include randomly breaking nucleic acids such as genomic DNA, into smaller sizes and generating sequencing templates such as fragment templates.
  • Spatially separated templates can, for example, be attached or immobilized at solid surfaces which allows for a sequencing reaction to be performed simultaneously.
  • a library of nucleic acid fragments is generated and adaptors containing universal priming sites are ligated to the end of the fragments.
  • the fragments are denatured into single strands and captured by beads.
  • a huge number of templates may be attached or immobilized in a polyacrylamide gel, or be chemically crosslinked to an amino-coated glass surface, or be deposited on individual titer plates.
  • solid phase amplification may be employed.
  • forward and reverse primers are typically attached to a solid support.
  • the surface density of amplified fragments is defined by the ratio of the primers to the template on the support.
  • This method may produce millions of spatially separated template clusters which can be hybridized to universal sequencing primers for massively parallel sequencing reactions. Further suitable options include multiple displacement amplification methods.
  • Suitable sequencing methods include, but are not limited to, cyclic reversible termination (CRT) or sequencing by synthesis (SBS) by Illumina, sequencing by ligation (SBL), single-molecule addition (pyrosequencing) or real-time sequencing. Exemplary platforms using CRT methods are
  • Exemplary SBL platforms include the Life/APG/SOLiD support oligonucleotide ligation detection.
  • An exemplary pyrosequencing platform is Roche/454.
  • Exemplary real-time sequencing platforms include the Pacific Biosciences platform and the Life/Visi-Gen platform.
  • Other sequencing methods to obtain massively parallel nucleic acid sequence data include nanopore sequencing, sequencing by hybridization, nano-transistor array based sequencing, scanning tunneling microscopy (STM) based sequencing, or nanowire-molecule sensor based sequencing. Further details with respect to the sequencing approach would be known to the skilled person, or can be derived from suitable literature sources such as Goodwin et al., Nature Reviews Genetics,
  • a preferred sequencing method is sequencing by synthesis.
  • sequencing reads which may be single-end or paired-end reads.
  • Obtaining such sequencing data may further include the addition of assessment steps or data analysis steps.
  • sequencing reads may be used with any suitable sequencing read length. It is preferred to make use of sequencing reads of a length of about 50 to about 200, or about 75 to about 150 nucleotides, e.g. 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200 or more nucleotides or any value in between the mentioned values. Most preferably, a length of 80 nucleotides is employed.
  • alignment or "sequence alignment" or "aligning" as used herein relate to the process of sequence comparison and matching a sequencing read with a sequence location, e.g., a genomic location. In the context of the present invention alignment exclusively relates to nucleotide sequences. For the performance of an alignment operation or sequence comparison any suitable algorithm or tool can be used. Preferred is an algorithm such as the Burrows-Wheeler Aligner (BWA), e.g. as described by Li and Durbin, 2009, Bioinformatics, 25, 1754-1760. Information on the position where an alignment correspondence between a sequencing read and a reference sequence was detected may be stored together with the sequence
  • BWA Burrows-Wheeler Aligner
  • For example, position information, information on the degree of correspondence, version and identity information on the reference sequence etc. may be stored together with the sequence information.
  • a format such as BAM, SAM or CRAM may be used.
  • BAM and SAM formats are designed to contain the same
  • the SAM format is a human-readable format, and is easier to process with conventional text-based processing programs, such as, for example, standard Linux commands or Python.
  • the BAM format provides binary versions of the same data, and is designed to provide a good compression rate.
  • the CRAM format is similar to the BAM format. In this format the compression is driven by the reference the sequence data is aligned to.
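Because SAM is a tab-separated text format, per-locus read counts of the kind used as input for copy number calculation can be derived with plain Python. The sketch below is a simplified illustration: the function name and the locus dictionary are hypothetical, and a read is assigned to a locus by its leftmost mapped position only, ignoring read length and CIGAR.

```python
def count_reads_per_locus(sam_lines, loci):
    """Count mapped reads whose start position falls inside each locus.

    loci: mapping of locus name -> (reference name, start, end),
          with 1-based inclusive coordinates (as used by SAM POS).
    """
    counts = {name: 0 for name in loci}
    for line in sam_lines:
        if line.startswith("@"):        # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])           # column 2: bitwise FLAG
        chrom = fields[2]               # column 3: reference name (RNAME)
        pos = int(fields[3])            # column 4: 1-based leftmost position
        if flag & 0x4:                  # 0x4 = read is unmapped
            continue
        for name, (ref, start, end) in loci.items():
            if chrom == ref and start <= pos <= end:
                counts[name] += 1
    return counts
```

For BAM or CRAM input, a dedicated library would be needed to decode the binary data before a comparable count can be made.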
  • reference sequence as used herein relates to a sequence, which is used for alignment purposes within the context of the present invention.
  • the reference sequence is typically a genomic sequence or part of a genomic sequence.
  • the reference sequence is a human genomic sequence.
  • the reference sequence may alternatively be a non-human genomic sequence such as monkey-, mouse-, rat-, bovine-sequence, a domestic animal sequence, a companion animal sequence etc.
  • sequence may either be provided in a sense direction, or in a reverse-complement direction.
  • sense or “sense direction” corresponds to the plus strand of a duplex nucleic acid.
  • reverse complementary corresponds to the minus strand of a duplex nucleic acid.
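The relationship between the sense direction and the reverse-complement direction can be shown with a short helper. The function name is hypothetical and only the bases A, C, G, T and N are handled.

```python
# Translation table mapping each base to its complement (case preserved).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the minus-strand sequence read 5' to 3' for a plus-strand input."""
    return seq.translate(_COMPLEMENT)[::-1]
```

Applying the function twice returns the original sequence, which is a convenient sanity check.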
  • the reference sequence may be selected as any suitable genomic sequence derivable from databases known to the skilled person. For example, a reference sequence may be derived from the reference assembly provided by the Human Genome Reference Consortium. Also envisaged are further similar reference sequences. The reference sequence may further be limited to certain sectors of the genome, e.g. specific chromosomes, or parts of a chromosome, e.g. exons or certain genes, groups of genes or gene clusters etc. It is preferred that the reference sequence is a well established, curated and/or controlled sequence which advantageously comprises no or only a minimal proportion of sequencing errors.
  • a "subject's sample” as used herein may be any suitable sample derived from a subject. It is preferred that the sample is a tumor sample, i.e. the nucleic acids may be extracted from a tumor of a patient. Also envisaged is to make use of previously deposited samples, e.g. samples derived from the umbilical cord.
  • a "normal sample” or “normal” as used herein relates to a sample which is known to have a non-aberrant copy number.
  • the normal sample is diploid with no copy number aberrations w.r.t. the used reference genome (i.e. usually copy numbers of genes will be 2, except for cases, where the gene occurs multiple times on the reference genome) .
  • a normal or normal sample may typically be understood as a reference or reference sample since copy number alterations are
  • the normal sample's characteristics e.g. diploidy, no copy number aberrations
  • the confirmation may, for example, be based on the fact that it was tested and characterized in more than one available publication or project, preferably by trusted third parties or consortia, to confidently know its copy number status.
  • the term "normal sample” also refers to a group or cohort of more than one sample. Accordingly, a cohort of normal samples relates to a group of samples which share as defining characteristic the fact that it is diploid with no copy number aberrations w.r.t. the used reference genome.
  • a cohort of normal samples may comprise any suitable number of normal samples, e.g. 2, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55,
  • a sub-cohort of samples may exist within a cohort of samples.
  • a "sub-cohort" may share more than one defining characteristic, e.g. besides the fact that it is diploid with no copy number aberrations w.r.t. the used reference genome, the samples may have been sequenced on the same machine, they may have been derived from the same subject etc.
  • a further defining characteristic shared among sub-cohorts may be that the normal samples produce significantly lower noise measure values for copy numbers in a certain sample.
  • Such a sub cohort of samples is suited for being used for normalization of said samples.
  • the normal sample is a sample derived from and/or managed by the HapMap project.
  • a normal sample may be derived from a healthy subject, i.e. a subject having copy number of 2, by performing a biopsy.
  • a defined set of genomic loci as used herein accordingly relates to the acquirement of sequence information as described above for a subset of the genome.
  • a subset may be a part of one or more chromosomes, or a sub-chromosomal region.
  • regions may further comprise more than one sub-chromosomal region from two or more
  • genomic loci may, in preferred embodiments, correspond to genes, more preferably to coding sequences of genes, thus excluding regulatory sequences, introns,
  • embodiments also intronic sequences and/or regulatory
  • genomic loci comprise gene regions.
  • the term "gene region" as used herein relates to one or more segments of a genome which encode a specific protein, or any variant of said protein, e.g. by splicing, as duplicated version etc.
  • each genomic locus or gene region corresponds to one gene.
  • the genomic loci as mentioned herein comprise a preselected panel of genes.
  • preselected panel of genes refers to a number of genes which are known from literature sources or previous analyses to have an increased predictive value in diagnosis. Such genes may, for example, differ in accordance with the
  • a disease to be analyzed may differ from panels of domestic or companion animals etc.
  • the preselected panel of genes is a panel of genes associated with a disease or associated with the development of a disease. Whether a gene is associated with a disease or the development of a disease can be derived from suitable
  • the disease may, in particularly preferred embodiments, be cancer. It is
  • panels preselected for the diagnosis of cancer or for the guidance of cancer related treatment decisions may comprise any combination or selection of genes, i.e. also genes which are not per se related to the etiology of cancer or associated with cancer.
  • a "gene of interest" as used herein is comprised in the predefined panel of genes.
  • the preselected panel of genes is a panel of at least 10 genes or gene regions, of at least 30 genes or gene regions, or of at least 100 genes or gene regions. Also envisaged are, in certain embodiments, panels comprising 300 or more than 300 genes. For example, a panel of 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75,
  • the panel of genes to be analyzed may comprise one or more of genes involved in cancer or the development of cancer, or may comprise a mixture of genes involved in cancer or the development of cancer, and genes which have no linkage to cancer or the development of cancer.
  • genes to be comprised in the panel of genes can be derived from suitable databases, such as, for example, COSMIC (Catalogue Of Somatic Mutations In Cancer), which can be accessed at http://cancer.sanger.ac.uk/cosmic, the candidate cancer gene database accessible at http://ccgd-starrlab.oit.umn.edu, or the database ClinVar, accessible at https://www.ncbi.nlm.nih.gov/clinvar. Also envisaged are the databases provided by NIH concerning predictive human or mammalian genes. It is further preferred that the panel of genes comprises genes which have a high or significant relevance for the disease under consideration, e.g. for cancer. Examples of genes, wherein one or more of the genes or part of the genes, such as one or more exons, are
  • genes for the panel of genes according to the present invention may be added to the above list in accordance with scientific knowledge in the field of cancer biology.
  • size of the defined set of genomic loci refers to the number of genes, exons or functional elements included in a defined set of genomic loci
  • the size may vary
  • sampling read count data relates to the number of sequencing reads per unit, e.g. per defined set of genomic loci, per gene region, per partition.
  • partition as used herein relates to sub-portions of a gene such as exons or parts of exons, whose size is typically defined by technical parameters of hybrid capture approaches or similar methods.
  • a partition may have a size of about 50 to about 450 nucleotides, or preferably of about 100 to 200 nucleotides.
  • a preselected panel of genes as defined herein above may comprise between about 1000 to about 20 000 partitions, each partition of the size of about 100 to 300 nucleotides.
  • Corresponding data are further median normalized. Such a normalization is based on the ratios of tumor to normal read counts. The median of the ratios of tumor to normal read counts for all partitions of the sample is computed, and then all ratios are divided by this median value to obtain a normalized read count ratio.
  • the normalization makes it possible to eliminate the influence of total coverage on the determination of the copy number value.
  • This normalized ratio can then be converted to copy numbers of the tumor cells in the sample using the ploidy (e.g. 2 for diploid) and the tumor purity of the sample, i.e. the fraction of tumor cells over all cells in the sample, which reflects the degree of contamination in a typical diagnostic setting.
  • This contamination, which is specifically designated "normal contamination", relates to the patient's tissue material of non-tumor cells, or to the patient's genetic material of non-tumor cells, which is, for technical reasons (e.g. biopsy procedures, blood cell preparation), unavoidably present in material obtained from a patient's tumor biopsy or from a patient's blood. Further shifts/adjustments might be necessary to obtain final integer absolute copy numbers.
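  • The median normalization and the subsequent conversion to tumor copy numbers can be sketched as follows. This is a minimal sketch, not the claimed implementation: the function names are hypothetical, and the purity correction assumes a simple two-component mixture (tumor cells at copy number c, contaminating normal cells at copy number 2, and the median partition sitting at the tumor ploidy), which is an assumption not stated in the text.

```python
from statistics import median


def normalized_ratios(tumor_counts, normal_counts):
    """Median-normalize tumor/normal read count ratios over all partitions."""
    ratios = [t / n for t, n in zip(tumor_counts, normal_counts)]
    m = median(ratios)
    # Dividing by the median removes the influence of total coverage.
    return [r / m for r in ratios]


def tumor_copy_number(norm_ratio, purity, ploidy=2.0):
    """Convert a normalized ratio to a tumor-cell copy number.

    Assumed mixture model (illustrative only):
      norm_ratio = (purity * c + (1 - purity) * 2)
                   / (purity * ploidy + (1 - purity) * 2)
    solved for the tumor copy number c.
    """
    mixed = norm_ratio * (purity * ploidy + (1.0 - purity) * 2.0)
    return (mixed - (1.0 - purity) * 2.0) / purity


# Doubling the tumor coverage does not change the normalized ratios.
normal = [100, 100, 100, 100, 100]
print(normalized_ratios([100, 200, 100, 100, 100], normal))
print(normalized_ratios([200, 400, 200, 200, 200], normal))
```

  • With a purity of 1.0 and a ploidy of 2, the conversion reduces to multiplying the normalized ratio by the ploidy; the result is generally not an integer and may require the further shifts/adjustments mentioned above.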
  • Standard error value or "SE value” as used herein relates to the standard deviation of the sampling
  • the quantity of interest e.g. mean or median
  • standard error is the standard deviation of the partition copy numbers within the gene region divided by the square root of the sample size (number of partitions within the gene region) .
  • standard deviation value or "SD value” as used herein relates to the amount of variation or dispersion of a set of data values. Generally, a low SD value indicates that the data points are close to the average of the data set, whereas a high SD value indicates that the data points are spread out over a wide range of values.
  • estimated standard deviation and estimated standard error of mean gene copy numbers e.g. based on some set of data may be calculated according to the following formulas:
  • SD(mean) and SE(mean) refer to the standard deviation and the standard error of the mean, respectively
  • n is the number of data points, from which the mean is calculated
  • x bar is the mean calculated from the n data points
  • xi is the data point i out of n data points.
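  • The formulas themselves are not reproduced in the present text. Assuming the usual sample estimators (the n - 1 correction in the standard deviation is an assumption here), formulas consistent with the symbol definitions above, and with the definition of the standard error as the standard deviation divided by the square root of the sample size, would read:

```latex
\mathrm{SD}(\text{mean}) = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}},
\qquad
\mathrm{SE}(\text{mean}) = \frac{\mathrm{SD}(\text{mean})}{\sqrt{n}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
```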
  • noise level in the calculation of a subject's genetic copy number value is determined.
  • the determination of the noise level is primarily based on classic copy number determination, i.e. the calculation of a ratio between a test sample sequencing read count and a control or normal sample sequencing read count.
  • the calculation may be performed according to a suitable scheme as known to the skilled person, or in accordance with suitable literature sources such as Talevich et al., 2016, PLoS Comput Biol, 12(4): e1004873.
  • a normalization step is performed after the ratio has been determined.
  • the resulting value may, in certain embodiments, be multiplied with the assumed or determined ploidy of the analyzed genome or additional steps are taken to calculate purity and ploidy estimates of the analyzed sample to finally obtain integer absolute copy numbers for each gene of interest.
  • this multiplication step may be skipped if the ploidy is not determined or determinable.
  • the ratio of standard error values to a copy number value as determined in the previous step at each locus of the defined set of loci is calculated.
  • the ratio of standard deviation values to a copy number value as determined in the previous step at each locus of the defined set of loci may be calculated.
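  • The per-locus ratio described above can be sketched as follows, starting from the partition copy numbers of one gene region. The function name and the flag `use_standard_error` are hypothetical; the sample standard deviation with an n - 1 denominator is assumed, and at least two partitions per locus are required.

```python
from math import sqrt
from statistics import mean, stdev


def locus_noise_ratio(partition_copy_numbers, use_standard_error=True):
    """Ratio of SE (or SD) to the mean copy number of one locus."""
    n = len(partition_copy_numbers)
    copy_number = mean(partition_copy_numbers)
    spread = stdev(partition_copy_numbers)  # sample SD, n - 1 denominator
    if use_standard_error:
        # SE = SD divided by the square root of the number of partitions.
        spread /= sqrt(n)
    return spread / copy_number


partitions = [1.8, 2.2, 2.0, 2.0]
print(locus_noise_ratio(partitions))
print(locus_noise_ratio(partitions, use_standard_error=False))
```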
  • the value may be modulated by suitable multiplication operations. For example, an obtained value may be weighted up by multiplication with a factor > 1 in case of a clinically relevant, highly relevant or extremely relevant locus. Similarly, an obtained value may be weighted down by multiplication with a factor < 1 in case of a clinically less relevant, or irrelevant locus. In certain embodiments, only a weighting up operation in case of clinically relevant loci may be performed.
  • Information on clinical relevance may be derivable from suitable literature sources or database entries, e.g. CIViC (https://www.civicdb.org). Generally, the relevance of a locus depends on the associated disease. Also secondary factors such as age of the subject, gender, additional diseases, general health conditions, previous treatments etc. may be considered.
  • a weight factor corresponding to the clinical relevance of a locus may be, in certain embodiments, the factor 2, 3, 4 or 5. For example, for a diagnostic test which focuses on lung cancer, a weight factor of 2 may be used for the genes FGFR1 and MET, for which copy number amplifications were shown to be cancer driving mechanisms in non-small cell lung cancer and are potential therapeutic targets (further
  • the value for a preliminary noise level of the tested sample obtained in the previous step is normalized by the size of the defined set of genomic loci. For example, if more than 50 loci are analyzed, the overall sum of the ratios is higher than in a case in which only, for example,
  • a normalization according to the size of the set of genomic loci becomes necessary.
  • the normalization may be performed in accordance with a numerical value for the number of loci.
  • the normalization step may be skipped since in such a scenario the obtained values are comparable among them.
  • normalization may also be performed subsequently, e.g. when a comparison with different sets of genomic loci is intended.
  • the method may be performed with different weight factors as described above. In such embodiments, the normalization value may be the sum of all applied weights. For example, if 10 genomic loci (e.g. 10 different gene regions) are analyzed and 8 of the genomic loci are given weight 1, and two of the genomic loci are given weight 2 (e.g. due to the special relevance of these genomic loci for certain cancer types, which are of interest for the analysis, as can be derived from suitable literature sources or databases such as e.g. GeneCards (accessible at https://www.genecards.org) or CIViC
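  • The weighting and normalization described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `noise_measure` is hypothetical, the per-locus ratios are combined by a plain weighted sum, and the sum of all applied weights serves as the normalization value (which equals the number of loci when every weight is 1).

```python
def noise_measure(se_values, copy_numbers, weights=None):
    """Weighted sum of SE/copy-number ratios, normalized by the weight sum."""
    if weights is None:
        # Without weighting, the weight sum equals the number of loci.
        weights = [1.0] * len(se_values)
    weighted_sum = sum(
        w * se / cn for se, cn, w in zip(se_values, copy_numbers, weights)
    )
    return weighted_sum / sum(weights)


# 10 loci: 8 with weight 1 and 2 with weight 2, so the weight sum is 12.
weights = [1.0] * 8 + [2.0] * 2
se = [0.1] * 10
cn = [2.0] * 10
print(sum(weights))
print(noise_measure(se, cn, weights))
```

  • Because of the normalization, values obtained for panels of different size, or with different weights, remain comparable among each other.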
  • the copy number value as determined in step (c) is determined on the basis of mean values.
  • the present invention also envisages different alternative approaches.
  • the copy number value may be determined on the basis of median values.
  • the term "median value” as used herein relates to the value separating the higher half from the lower half of a certain group of data.
  • the copy number value is determined on the basis of trimmed mean values.
  • the term "trimmed mean value” as used herein relates to a statistical measure of central tendency. Its determination typically involves the calculation of the mean after discarding given parts of a probability distribution or sample at the high and low end, and typically discarding an equal amount of both.
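  • The three alternatives (mean, median and trimmed mean) can be compared on a small set of partition copy numbers as sketched below. The helper `trimmed_mean` and its parameter `trim_count` (number of values discarded at each end) are hypothetical names chosen for illustration.

```python
from statistics import mean, median


def trimmed_mean(values, trim_count):
    """Mean after discarding trim_count values at both the low and high end."""
    if 2 * trim_count >= len(values):
        raise ValueError("trim_count removes all values")
    ordered = sorted(values)
    return mean(ordered[trim_count:len(ordered) - trim_count])


# One outlying partition (6.0) pulls the mean but not the robust estimates.
partition_copy_numbers = [1.9, 2.0, 2.1, 2.0, 6.0]
print(mean(partition_copy_numbers))
print(median(partition_copy_numbers))
print(trimmed_mean(partition_copy_numbers, 1))
```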
  • the present invention relates to a method of reducing the statistical noise level in the calculation of a subject's genetic copy number value in massively parallel nucleic acid sequencing data derived from a sample,
  • said normal sample is selected from a heterogeneous cohort of normal samples, wherein said normal sample yields the lowest final noise measure among all normal samples of said cohort with respect to the subject's sample analyzed, or wherein said normal sample is a sub-cohort selected from a heterogeneous cohort of normal samples, wherein said sub-cohort comprises those normal samples which yield the 2, 3, 4 or 5 lowest final noise measures among all normal samples of said cohort with respect to the subject's sample analyzed, and wherein a reduced value for the statistical noise level in a subject's genetic copy number is obtained.
  • the present invention relates to a method of reducing the statistical noise level in the calculation of a subject's genetic copy number value in massively parallel nucleic acid sequencing data derived from a sample, comprising carrying out the method of determining the statistical noise level in the calculation of a subject's genetic copy number value in massively parallel nucleic acid sequencing data as defined herein, wherein said normal sample is selected from a heterogeneous cohort of normal samples and wherein certain normal samples of low noise are selected from said heterogeneous cohort.
  • a reduced value for the statistical noise level in a subject's genetic copy number may be obtained.
  • a reduced value for the statistical noise level as used herein is thus typically obtained by analyzing the subject's sample separately with all the normals in the normal cohort. For each normal, copy numbers and a noise measure can be calculated.
  • the normal data which resembles the subject's sample most in overall coverage distribution and is generated under the most similar conditions (e.g. input material, overall workflow) yields the "lowest noise measure".
  • This particular normal sample is subsequently used for the final analysis and calculation of copy numbers. The results are expected to have a reduced and, for the given cohort of normals, lowest final noise measure.
  • heterogeneous cohort of normal samples means that the data of the normal samples within the cohort was obtained with differing quality of the nucleic acids (e.g. derived from FFPE or fresh frozen tissue
  • the heterogeneity of the cohort of normal samples is a prerequisite for obtaining an optimal low noise normal sample with respect to each subject's sample analyzed.
  • the heterogeneous cohort may comprise 2 - 500 normal samples
  • a normal sample is selected if it yields the lowest final noise measure among all normal samples of the heterogeneous cohort with respect to the subject's sample after having performed the method for determining the
  • a sub-cohort may be selected which comprises those normal samples which yield the 2,3,4 or 5 lowest final noise
  • an effective normal can be generated, e.g. by taking median or mean values of read counts for all the normals in the sub-cohort for each partition. With these effective normal read counts for each partition, tumor over normal read count ratios can be formed.
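  • The selection of the best normal (or of a sub-cohort) and the construction of an effective normal can be sketched as follows. All names are hypothetical. The function `placeholder_noise` is only a stand-in (the population standard deviation of median-normalized ratios) and is not the noise measure of the present method; any callable returning a noise value for a subject/normal pair could be supplied instead.

```python
from statistics import median, pstdev


def placeholder_noise(subject_counts, normal_counts):
    """Stand-in noise value: spread of the median-normalized count ratios."""
    ratios = [s / n for s, n in zip(subject_counts, normal_counts)]
    m = median(ratios)
    return pstdev([r / m for r in ratios])


def select_best_normals(subject_counts, normals, noise_fn, k=1):
    """Names of the k normals yielding the lowest noise for this subject."""
    ranked = sorted(normals, key=lambda name: noise_fn(subject_counts, normals[name]))
    return ranked[:k]


def effective_normal(list_of_normal_counts):
    """Per-partition median read count over the normals of a sub-cohort."""
    return [median(column) for column in zip(*list_of_normal_counts)]


subject = [100, 200, 100, 100]
cohort = {
    "A": [100, 200, 100, 100],
    "B": [100, 100, 100, 100],
    "C": [100, 50, 100, 100],
}
best_two = select_best_normals(subject, cohort, placeholder_noise, k=2)
print(best_two)
print(effective_normal([cohort[name] for name in best_two]))
```

  • The effective normal read counts can then be used to form tumor over normal read count ratios per partition in the same way as for a single normal sample.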
  • a heterogeneous cohort of normal samples comprises normal samples with a certain sequencing coverage level.
  • the sequencing coverage may, for example, be between about 50 x and 10 000 x. In preferred embodiments, the sequencing coverage is between about 50 x and 5000 x. It is particularly preferred that the sequencing coverage is from at least about 100 x to about 3000 x, e.g. about 100 x,
  • the sequencing coverage values are determined as coverage values after PCR duplication masking.
  • the calculated noise level has to be compared to a threshold value.
  • Said threshold value has to be calibrated with several parameters used in the calculation. It may hence change and/or has to be
  • This procedure may be employed for values obtained with the method for determining the statistical noise level in the calculation of a subject's genetic copy number.
  • the procedure comprises as first step (a) the selection of a set of normal samples as a calibration test set, wherein all relevant gene copy numbers are known to have a value of 2.
  • calibration test set as used herein relates to a group of patients' test samples or derived forms thereof, which are typically not identical to the patient's test sample which is analyzed with respect to its copy number in the main method of the present invention.
  • the calibration test set may, in preferred embodiments, be a group of cell line samples.
  • the set may have any suitable size, e.g. comprising about 10 to 50 normal samples.
  • a disjunct set of normal samples is selected as a calibration reference set, wherein all copy numbers are known to have a value of 2.
  • the term "disjunct set of normal samples" as used herein relates to the use of a different set of normal samples. This different set of normals is hence used for calibration purposes and thus
  • the normal samples comprised in said calibration reference set are preferably heterogeneous, i.e. they may, for example, be derived from different nucleic acids or from different cell lines, they may have been handled by different persons, or have been analysed on different sequencing machines, or different input materials were used during their handling etc.
  • calibration reference set is typically designed such that it can be changed iteratively when performing steps c) to d) of the present calibration procedure. This allows to
  • heterogeneous normals in the calibration reference set is that the definition of thresholds at the lower and higher end, i.e. for low and high noise, is possible. Otherwise, i.e. if the calibration reference set merely consisted of normals which are very similar to the normals in the calibration test set, it would be likely to obtain mainly low noise values, which prevents a suitable determination of thresholds for large In a specific embodiment there is a larger
  • step c) each normal sample in the calibration test set is analyzed to determine the copy number value of all loci of a defined set of genomic loci as defined herein above.
  • the copy number standard error or the copy number standard deviation as defined herein above is determined.
  • the copy number standard deviation may preferably be determined if mean gene copy numbers are under consideration.
  • the calibration reference set mentioned in step (b) of the procedure is used for normalization.
  • the best normal (or best normal cohort as described above) out of the reference cohort is chosen for each test sample, such that the noise measure is minimized for each sample (previously described noise reduction) .
  • in step (d) of the procedure, the noise measure value for each of the calibration test samples is calculated as defined in steps (d) to (g) of the method for determining the statistical noise level as outlined herein above.
  • step (e) a direct noise measure value of the copy number value of step (c) is calculated.
  • the sum of the weights may be used for normalization in this step. No normalization has to be applied for the case of p → ∞, as the maximum norm only extracts the maximal deviation, which is the largest absolute value of the vector components.
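  • The behaviour of the maximum norm can be illustrated with a generic p-norm, as sketched below. The function `p_norm` is a hypothetical helper showing the textbook definition only; how exactly the present method combines deviations and weights within the norm is not specified here and is not implied by this sketch.

```python
from math import inf


def p_norm(values, p):
    """Generic p-norm of a vector; p = inf gives the maximum norm."""
    if p == inf:
        # Maximum norm: the largest absolute value of the components.
        return max(abs(v) for v in values)
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


deviations = [0.1, -0.5, 0.2]
print(p_norm(deviations, 1))    # sum of absolute values
print(p_norm(deviations, 50))   # already very close to the maximum
print(p_norm(deviations, inf))  # largest absolute component
```

  • As p grows, the largest component dominates the sum, so the result no longer scales with the number of components; this is why no normalization by the size of the set is needed in that case.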
  • the noise measure of each calibration test sample is plotted against the mean absolute deviation to copy number 2 or, in a second plot, the noise measure of each calibration test sample is plotted against the maximum deviation to copy number 2.
  • the plotting may be performed according to any suitable methodology, preferably by using the mean absolute deviation to copy number 2 or the maximum deviation to copy number 2 as x-axis and the determined noise measure as described herein as y-axis.
  • the used criterion of 0.5 below or above 2 is a significance criterion according to the present invention, since the values 2.5 and 1.5 are centrally arranged in-between the integer values 2 and 3 or 2 and 1, respectively, which are expected in view of the fact that the copy numbers typically are integers.
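  • The two x-axis quantities and the criterion of 0.5 can be computed per calibration test sample as sketched below. The function name and the returned keys are hypothetical; the sketch assumes that the gene copy numbers of a calibration test sample, all expected to be 2, are already available.

```python
def deviation_summary(copy_numbers, expected=2.0, criterion=0.5):
    """Mean absolute and maximum deviation of copy numbers from `expected`."""
    deviations = [abs(c - expected) for c in copy_numbers]
    max_dev = max(deviations)
    return {
        "mean_abs_dev": sum(deviations) / len(deviations),
        "max_dev": max_dev,
        # True if at least one locus lies more than 0.5 below or above 2,
        # i.e. would round to a wrong integer copy number.
        "exceeds_criterion": max_dev > criterion,
    }


print(deviation_summary([2.0, 2.2, 1.9, 2.6]))
print(deviation_summary([2.0, 2.1, 1.9, 2.0]))
```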
  • the used criterion is a significance criterion according to the present invention, as the
  • step (j) of said procedure said threshold obtained in step (i) is applied to the value for the statistical noise level obtained in a method for determining the statistical noise level in the calculation of a subject's genetic copy number value, or the method of reducing the statistical noise level in the calculation of a subject's genetic copy number value as defined herein above.
  • different threshold values may be obtained, wherein, for example, one threshold value (e.g. threshold value A) indicates a necessity for modifying a calculated copy number value, or wherein, in a different example, another threshold value (e.g. threshold value B) indicates a necessity for excluding a sample from further employment.
  • additional threshold values e.g. C, D, E etc.
  • different thresholds may be determined by applying different objective criteria on the mean absolute deviation to 2 or maximum deviation to 2, which then translate to different values of noise measures, ranging, e.g., from less restrictive to more restrictive.
  • different specific use groups or categories may be obtained which can be employed to distinguish the potential further treatment of the samples.
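  • The use of two threshold values to sort samples into use groups can be sketched as follows. The function name, the group labels and the numerical threshold values in the example are hypothetical and serve illustration only; actual thresholds would result from the calibration procedure described above.

```python
def classify_sample(noise, threshold_a, threshold_b):
    """Assign a sample to a use group based on its noise measure.

    threshold_a: above this value, calculated copy numbers need modification.
    threshold_b: above this value, the sample is excluded from further use.
    """
    if threshold_a > threshold_b:
        raise ValueError("threshold_a must not exceed threshold_b")
    if noise > threshold_b:
        return "exclude"
    if noise > threshold_a:
        return "adjust"
    return "accept"


# Hypothetical threshold values A = 0.05 and B = 0.10.
for noise in (0.02, 0.07, 0.20):
    print(noise, classify_sample(noise, 0.05, 0.10))
```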
  • the present invention relates to a method for determining a subject's genetic copy number value in massively parallel nucleic acid sequencing data derived from a sample, comprising adjusting the determination of the subject's genetic copy number value by applying a value for the statistical noise level in a subject's genetic copy number as determined in the method for determining the statistical noise level in the calculation of a subject's genetic copy number value, as defined above, or the method of reducing the statistical noise level in the calculation of a subject's genetic copy number value in massively parallel nucleic acid sequencing data derived from a sample, as defined above, to said sample.
  • the determination of the subject's copy number value comprises the steps of
  • sequencing read count is preferably
  • threshold values A and B are used) or surpasses threshold value C (if three threshold values A, B and C are used) as defined herein above, the adjustment of the determination of the subject's genetic copy number value comprises an
  • exclusion from further usage refers to either a full exclusion from copy number calling or an exclusion from calling of copy number deletions.
  • the adjustment of the determination of the subject's genetic copy number value may comprise or be an elevation of the calling threshold. For example, if a
  • the adjusting of the determination of the subject's genetic copy number value may comprise a modification of the statistical significance of a calculated genetic copy number subject to the value for the statistical noise level in a subject's genetic copy number as determined in the method of the invention as defined herein to said sample.
  • the statistical significance of a calculated copy number value may be reduced or increased depending on the noise level value obtained in accordance with the method of the present invention.
  • the present invention relates to a method to determine a subject's genetic copy number value for stratifying the subject for cancer therapy, comprising (a) performing a massively parallel nucleic acid sequencing of nucleic acids extracted from a subject's tumor sample; (b) determining the subject's genetic copy number value according to the methods as described above; and (c) attributing the determined subject's genetic copy number value to a group of increased, normal or decreased copy number values, which can guide a treatment decision.
  • the method is preferably an in vitro method.
  • stratifying patients means that patients are partitioned by a factor other than the treatment itself. This factor may, in the present case, be the copy number value as defined herein above. The stratification may, for example, help to control confounding variables, or to facilitate the detection and interpretation between
  • the patient may be analyzed with respect to its copy number value.
  • specific therapy forms or specifically adjusted therapy forms may be used.
  • the stratification may, in particular, be based on the attribution of a determined copy number value to a diagnosis group .
  • the subject may,
  • the copy number value determined either have a decreased genetic copy number value, a normal copy number value or an increased copy number value.
  • a decreased genetic copy number value corresponds to a genetic copy number significantly lower than 2.
  • a normal copy number value corresponds to a copy number value of 2.
  • a decreased or increased genetic copy number value indicates a preference for a targeted cancer therapy.
  • a normal copy number value indicates a healthy status.
  • a "targeted cancer therapy” as used herein relates to the blocking of the growth of cancer cells by interfering with specific targeted molecules needed for carcinogenesis and tumor growth. Examples of targeted cancer therapeutic approaches include the blocking or turning off of chemical signals instructing the cancer cell to grow and divide, the modification of proteins within the cancer cells so the cells die, the blocking of the angiogenesis, the triggering of the immune system to kill cancer cells, or the transport of toxins to cancer cells to specifically kill said cancer cells.
  • cancer therapy as used herein relates to any suitable therapeutic treatment of a cancer disease or a tumor as known to the skilled person.
  • the treatment includes chemotherapy, a treatment with small molecules, an antibody treatment, an immunotherapy, e.g. a checkpoint inhibitor based immunotherapy, or a combination thereof.
  • additional therapy forms including gene therapy, antisense-RNA therapy etc. as well as any other suitable type of treatment, including future therapy forms.
  • the skilled person would be aware of the corresponding therapy forms and also the usability of compounds and compositions for specific cancer forms, or can derive this information from suitable literature sources such as Karp and Falchook, Handbook of targeted cancer therapy, 2014. Ed. Lippincott Williams.
  • the "cancer" form to be treated may be any cancer known to the skilled person, e.g. a cancer form which can be associated with elevated or decreased copy numbers.
  • This may, for example, be breast cancer, prostate cancer, ovarian cancer, renal cancer, lung cancer, pancreas cancer, urinary bladder cancer, uterus cancer, kidney cancer, brain cancer, stomach cancer, colon cancer, melanoma or fibrosarcoma, gastrointestinal stromal tumor (GIST) , glioblastoma and hematological leukemia and lymphomas, both from the myeloid and lymphatic lineage.
  • the method to determine a subject's genetic copy number according to the present invention envisages the performance of a massively parallel nucleic acid
  • nucleic acid e.g. DNA
  • the nucleic acid may be derived from any suitable sample. It is preferred to extract the nucleic acids from a tumor sample of a patient. Also
  • control sample e.g. samples derived from the umbilical cord.
  • the sample to be used may preferably be a sample comprising one or more premalignant or malignant cells. It may further be a sample comprising cells from a solid tumor or soft-tissue tumor or a metastatic lesion. Also envisaged is the use of a sample comprising tissue or cells from a surgical margin. Further envisaged is the employment of a
  • the present invention also relates to the use of one or more circulating tumor cells (CTC), e.g. obtained from blood samples.
  • the sample may also be a sample comprising circulating tumor DNA (ctDNA).
  • cfDNA cell free DNA
  • Such DNA may, for example, be present in blood samples or processed blood samples, or other liquid samples obtained from a subject.
  • a blood, plasma or serum sample from the same subject having a tumor or being at risk of having a tumor may be used.
  • sample may be a paraffin or FFPE-sample.
  • the in vitro method as mentioned above includes a preparation step for nucleic acids, which comprises a hybrid-capture based nucleic acid enrichment for genomic regions of interest, i.e. targeted sequences such as exonic sequences etc. as defined above.
  • hybrid-capture based nucleic acid enrichment means that firstly a library of nucleic acids is provided, which is subsequently contacted with a library, either being in solution or being immobilized on a substrate, which comprises a plurality of baits, e.g. oligonucleotide baits complementary to a gene or genomic region of interest, to form a hybridization mixture; and subsequently separating a plurality of bait/nucleic acid hybrids from the mixture, e.g. by binding to an entity allowing for separation.
  • This enriched mixture may subsequently be purified or further processed.
  • the identity, amount, concentration, length, form etc. of the baits may be adjusted in accordance with the intended hybridization result. Thereby, a focusing on a gene or region of interest may be achieved, since only those fragments or nucleic acids are capable of hybridizing which show complementarity to the bait sequence.
  • the present invention envisages further variations and future
  • The method as described herein above comprises the additional step of providing a report on the obtained results as to the determination of a subject's copy number value as well as its use for the guidance of a treatment decision.
  • A report may be provided in any suitable manner or form, e.g. as an electronic file, as an electronic file distributed or accessible over the internet, e.g. provided in a cloud or deposited on a server, or web-based, e.g. provided on a suitable website.
  • The report may be provided in paper form.
  • The report may be provided, and thus drafted in a corresponding form, to a patient (including information relevant for the patient), a relative or other person associated with the patient
  • an oncologist including information relevant for the
  • The report may accordingly be redacted, modified, extended or adjusted for the above specified recipient.
  • Information relevant for the oncologist, e.g. as to the copy number value, may be omitted in the report for the patient etc.
  • The present invention envisages that the report may comprise one or more of the following:
  • Information on the meaning of the determined genetic copy number value may also comprise information on prognosis of the disease, and/or on potential or suggested therapeutic options. Also included may be a conclusion on the most promising treatment, or a
  • The corresponding information may be derived from suitable databases or literature sources, e.g. by a medical professional. These sources may also be provided in the report.
  • (iii) Further included may be information on the likely effectiveness of a therapeutic option, or the acceptability of a therapeutic option. Moreover, information on the advisability of applying the therapeutic option to a patient having a certain genetic copy number value identified in the report may be given.
  • The corresponding information may be derived from suitable databases or literature sources. These sources may also be provided in the report.
  • administration routes, dosage regimen, treatment regimen etc. This may further be extended to the potential administration of additional drugs, e.g. if this information about a patient is already known, or if a co-administration of drugs is necessary or advisable.
  • The present invention also envisages a determination system, which performs any of the herein above defined methods.
  • The system may be implemented on any suitable storage or computer platform, e.g. be cloud-based, internet-based, intranet-based or present on a local computer or cellphone etc.
  • The present invention envisages the provision of a data processing apparatus or system comprising means for carrying out any one or more steps of the methods of the present invention as mentioned herein above.
  • The present invention additionally envisages a computer program product, which performs any of the herein above defined methods, or any one or more steps of the methods of the present invention as mentioned herein above.
  • Also envisaged is a computer-readable storage medium comprising a computer program product as defined above.
  • The computer-readable storage medium may be connected to a server element, or be present in a cloud structure, or be connected via internet to one or more database structures, or client databases etc.

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Organic Chemistry (AREA)
  • Medical Informatics (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Genetics & Genomics (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Molecular Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Public Health (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Epidemiology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Primary Health Care (AREA)
  • Oncology (AREA)
  • Hospice & Palliative Care (AREA)
  • Bioethics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Medicinal Chemistry (AREA)

Abstract

The present invention relates to a method for determining the statistical noise level when calculating a subject's genetic copy number value in massively parallel nucleic acid sequencing data derived from a sample, a method for determining a subject's genetic copy number value in massively parallel nucleic acid sequencing data derived from a sample, and a method for determining a subject's genetic copy number value for stratifying the subject for a cancer therapy.
EP19778848.2A 2018-09-28 2019-09-16 Noise measure for copy number analysis on targeted panel sequencing data Pending EP3811365A1 (fr)
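The abstract names a statistical noise level for copy number calculation but does not spell out how it is computed. The following Python sketch is therefore only a generic illustration and not the claimed method: it assumes per-region read coverage for a sample and a reference, computes median-normalised log2 ratios, and uses the scaled median absolute deviation (MAD) of those ratios as a noise estimate. The function names, the toy coverage values and the threshold factor of 3 are all assumptions.

```python
import math
import statistics

def log2_ratios(sample_cov, reference_cov):
    """Per-region log2 ratio of median-normalised coverage.

    Assumes all coverage values are strictly positive.
    """
    s_med = statistics.median(sample_cov)
    r_med = statistics.median(reference_cov)
    return [
        math.log2((s / s_med) / (r / r_med))
        for s, r in zip(sample_cov, reference_cov)
    ]

def noise_level(ratios):
    """Scaled median absolute deviation of the log2 ratios.

    The factor 1.4826 makes the MAD comparable to a standard deviation
    for normally distributed data.
    """
    med = statistics.median(ratios)
    mad = statistics.median(abs(x - med) for x in ratios)
    return 1.4826 * mad

def exceeds_noise(ratio, noise, factor=3.0):
    """Flag a region whose log2 ratio lies outside factor * noise (assumed rule)."""
    return abs(ratio) > factor * noise

# Toy coverage values, invented for illustration only.
sample = [100, 110, 90, 400, 105]
reference = [100, 100, 100, 100, 100]

ratios = log2_ratios(sample, reference)
noise = noise_level(ratios)
flags = [exceeds_noise(r, noise) for r in ratios]
```

A robust spread such as the MAD is used here so that a single strongly amplified region does not inflate the noise estimate; other noise measures could be substituted without changing the overall structure.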

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1815851.9A GB2577548B (en) 2018-09-28 2018-09-28 Method for determining a subject's genetic copy number value
PCT/EP2019/074654 WO2020064390A1 (fr) 2018-09-28 2019-09-16 Noise measure for copy number analysis on targeted panel sequencing data

Publications (1)

Publication Number Publication Date
EP3811365A1 (fr) 2021-04-28

Family

ID=68072322

Family Applications (1)

Application Number Title Priority Date Filing Date
EP19778848.2A Pending EP3811365A1 (fr) 2018-09-28 2019-09-16 Noise measure for copy number analysis on targeted panel sequencing data

Country Status (4)

Country Link
US (1) US20220036972A1 (fr)
EP (1) EP3811365A1 (fr)
GB (1) GB2577548B (fr)
WO (1) WO2020064390A1 (fr)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
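CN111681088B (zh) * 2020-08-11 2020-12-11 北京每日优鲜电子商务有限公司 Information push method, apparatus, electronic device and computer-readable medium
CN113380327B (zh) * 2021-03-15 2023-06-13 浙江大学 Method for predicting human biological age and assessing the degree of human aging
CN113270141B (zh) * 2021-06-10 2023-02-21 哈尔滨因极科技有限公司 Integrated algorithm for genome copy number variation detection
CN113674803B (zh) * 2021-08-30 2023-08-08 广州燃石医学检验所有限公司 Copy number variation detection method, device, storage medium and application thereof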

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8554488B2 (en) * 2005-12-14 2013-10-08 Cold Spring Harbor Laboratory Determining a probabilistic diagnosis of autism by analysis of genomic copy number variations
US7660675B2 (en) * 2006-07-24 2010-02-09 Agilent Technologies, Inc. Method and system for analysis of array-based, comparative-hybridization data
US9652585B2 (en) * 2010-03-16 2017-05-16 Bluegnome Limited Comparative genomic hybridization array method for preimplantation genetic screening
JP6534191B2 (ja) * 2013-10-21 2019-06-26 ベリナタ ヘルス インコーポレイテッド Method for improving the sensitivity of detection in determining copy number variations
WO2016094853A1 (fr) * 2014-12-12 2016-06-16 Verinata Health, Inc. Using cell-free DNA fragment size to determine copy number variations
CN107406876B (zh) * 2014-12-31 2021-09-07 夸登特健康公司 Detection and treatment of disease exhibiting disease cell heterogeneity and systems and methods for communicating test results
WO2018022890A1 (fr) * 2016-07-27 2018-02-01 Sequenom, Inc. Genetic copy number alteration classifications

Also Published As

Publication number Publication date
GB2577548A (en) 2020-04-01
US20220036972A1 (en) 2022-02-03
WO2020064390A1 (fr) 2020-04-02
GB2577548B (en) 2022-10-26

Similar Documents

Publication Publication Date Title
JP7458360B2 (ja) Detection and treatment of disease exhibiting disease cell heterogeneity and systems and methods for communicating test results
EP3931360B1 (fr) Systems and methods for using sequencing data for pathogen detection
Barthel et al. Longitudinal molecular trajectories of diffuse glioma in adults
US20240371472A1 (en) Methods of detecting somatic and germline variants in impure tumors
Cross et al. The evolutionary landscape of colorectal tumorigenesis
JP7637139B2 (ja) Systems and methods for automating RNA expression calls in a cancer prediction pipeline
Newman et al. Integrated digital error suppression for improved detection of circulating tumor DNA
Strom Current practices and guidelines for clinical next-generation sequencing oncology testing
KR102384620B1 (ko) Methods and processes for non-invasive assessment of genetic variations
US20220017891A1 (en) Improvements in variant detection
US20200402613A1 (en) Improvements in variant detection
Xie et al. Patterns of somatic alterations between matched primary and metastatic colorectal tumors characterized by whole-genome sequencing
EP3625802B1 (fr) Scansoft: method for detecting genomic deletions and duplications in massively parallel sequencing data
EP3682035A1 (fr) Detection of somatic single nucleotide variants from cell-free nucleic acid with application to minimal residual disease monitoring
WO2020064390A1 (fr) Noise measure for copy number analysis on targeted panel sequencing data
Santorsola et al. A multi-parametric workflow for the prioritization of mitochondrial DNA variants of clinical interest
EP3788173B1 (fr) Surrogate marker and method for measuring tumor mutation burden
SoRelle et al. Assembling and validating bioinformatic pipelines for next-generation sequencing clinical assays
Kim et al. Circulating tumor DNA-based genotyping and monitoring for predicting disease relapses of patients with peripheral T-cell lymphomas
CN113053460A (zh) Systems and methods for genomic and genetic analysis
JP2021101629A5 (fr)
WO2022054086A1 (fr) System and method for identifying cancer-associated genomic abnormalities and their implications
EP3588506B1 (fr) Systems and methods for genomic and genetic analysis
US20250316385A1 (en) Sclcpheno-seq, a targeted capture panel and associated methodology to call the activity of key transcription factors of clinical relevance to small cell lung cancer from patient liquid biopsies
Choi et al. Ultra-fast Prediction of Somatic Structural Variations by Reduced Read Mapping via Pan-Genome k-mer Sets

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: UNKNOWN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20210119

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: SIEMENS HEALTHINEERS AG

17Q First examination report despatched

Effective date: 20240201