
WO2010019588A1 - Method for detecting or diagnosing genomic instability - Google Patents

Method for detecting or diagnosing genomic instability

Info

Publication number
WO2010019588A1
Authority
WO
WIPO (PCT)
Prior art keywords
ucr
region
conserved
genomic
frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2009/053423
Other languages
French (fr)
Inventor
Francesca Ciccarelli
Anna De Grassi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Istituto Europeo di Oncologia SRL IEO
Original Assignee
Istituto Europeo di Oncologia SRL IEO
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Istituto Europeo di Oncologia SRL IEO
Publication of WO2010019588A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/156: Polymorphic or mutational markers

Definitions

  • the invention relates to molecular markers for hereditary cancers and methods for determining genomic instability and the presence of cancer or predisposition to develop cancer.
  • Genomic instability is a common trait of cancer cells and plays a pivotal role in promoting carcinogenesis in several hereditary tumors.
  • MMR: mismatch repair
  • One of the best-known examples is Lynch syndrome, an autosomal dominant condition associated with heterozygous mutations in mismatch repair (MMR) genes (Peltomaki et al., 1997).
  • Individuals affected by Lynch syndrome undergo somatic inactivation of the second allele, which impairs the MMR machinery and triggers the onset of the "mutator phenotype" (Loeb, 1991).
  • The tumorigenic process starts when mutations hit oncogenes and/or tumor suppressors, often in actively renewing tissues such as endometrium, ovary, and colon.
  • The genetic condition is known as Hereditary Non-Polyposis Colorectal Cancer (HNPCC), which represents the most common form of inherited colorectal cancer (Lynch et al., 2003).
  • A hallmark of MMR deficiency is microsatellite instability (MSI), which reflects replication errors at repeated regions of the genome. Since more than 90% of HNPCC cases show MSI (Aaltonen et al., 1994; Soreide, 2007), it has become a common diagnostic marker of MMR deficiency. MSI detects clonal mutations at specific repeats of the cancer genome that are particularly prone to accumulate mutations, and it therefore provides only indirect evidence of widespread genomic instability. Recently, large-scale mutational screenings returned the first estimates of the mutation frequency, which is the number of mutations per genome unit, associated with coding and non-coding sequences of cancer genomes (Greenman et al., 2007; Wood et al., 2007; Jones et al., 2008; Parsons et al., 2008).
  • the returned picture is a "static snapshot" of the cancer genome, where only the tip of the iceberg (i.e., clonal mutations) is captured.
  • Next-generation sequencing technologies could offer a valid solution, as they rely on amplification and sequencing of distinct DNA filaments. Because the sensitivity of these methods increases with coverage, rare mutations should become detectable by performing ultra-deep re-sequencing of a given DNA region. The obvious drawback concerns specificity: at deep coverage, low-frequency substitutions are a mixture of technical errors and true mutations, which makes it hard to distinguish true signal from noise.
  • The present invention solves the drawbacks and problems encountered in the prior art and provides a method for detecting or diagnosing genomic instability in a subject, which involves determining in the subject the difference in mutation frequency between a genomic region that is not conserved among species (i.e., not under evolutionary constraints) and another genomic region that is under strong evolutionary constraints, such as an ultra-conserved genomic region (UCR).
  • a statistically significant increase in mutation frequency in the non-conserved region as compared to the region under strong evolutionary constraints establishes a likelihood of genomic instability in the subject.
  • Figures 1A and 1B show the features of eUCR41.
  • The genomic coordinates refer to the hg18 assembly of the human genome.
  • The two grey bars correspond to the extremely conserved sequence (Visel et al., 2008) and to the genomic region tested for enhancer activity (Pennacchio et al., 2006), respectively.
  • Black bars indicate the eleven overlapping segments used for the amplification.
  • Fig. 1B: the base composition and percentage of homopolymers over the total length of the three regions (eUCR41, ultraconserved core, and flanking segments) are shown.
  • Figure 2 shows the depth of coverage reached with the sequencing screenings. For each sample, the coverage of sequencing (reads/bp) was measured. The average coverage is 49,150 in sample CC; 45,370 in sample NC; 52,530 in sample PBL; and 48,380 in sample H-PBL. Regions in which the coverage almost doubles correspond to overlapping segments between contiguous amplicons (see Materials and Methods in the Example and Figure 1A). The gradient in gray shading corresponds to the degree of sequence conservation, as reported in Fig. 1A. UCR41 is highlighted.
  • Figures 3A-3D show examples of high frequency errors. For each of the four hot spot regions described in Table 7, a different example of high frequency errors derived from sample CC is shown. In all cases, the errors are due to indels that cause misalignments between the reads and the reference sequence. In three cases, the misaligned region corresponds to the end of the reads (*).
  • Fig. 3A: Reference position 1050-1061, frequency 0.1%; SEQ ID NOs: 23 and 24.
  • Fig. 3B: Reference position 633-652, frequency 0.6%; SEQ ID NOs: 25 and 26.
  • Fig. 3C: Reference position 1071-1094, frequency 0.1%; SEQ ID NOs: 27 and 28.
  • Fig. 3D: Reference position 29-45, frequency 0.1%; SEQ ID NOs: 29 and 30.
  • Figures 4A and 4B show graphs of the mutation spectrum of eUCR41 in sample CC.
  • Fig. 4A: all detected substitutions are mapped on the corresponding positions of eUCR41.
  • Two ranges of substitution frequency are shown: 40.5%-2.5% and <1.0%, since no substitution was detected in the range 1.0%-2.5%. All substitutions reported in the range 1.0%-0.1% were manually checked and excluded as sequencing errors. The range of frequency highlighted in grey (2.5%-0.1%) was used to estimate the mutation rate of the cancer genome in the simulation model.
  • Fig. 4B: mutability was calculated using sliding windows of the same length as UCR41. Values corresponding to the middle point of each window are reported. Mutability increases as sequence conservation decreases: it is always below average for sequence identity > 50%, while it is above average for non-conserved segments. Similar trends were observed for all samples deriving from HNPCC (data not shown).
  • Figure 5 is a schematic illustration of a model for the progressive acquisition of mutations during clonal expansion.
  • the mutation frequency of each mutation in the final population reflects the timing of its appearance.
  • the formula to calculate the mutation frequency is described in Table 8.
  • CD: cell division.
  • Figures 6A-6D are graphs showing observed and expected mutability outside and inside UCR41. The expected distribution of mutability ratios is shown for each sample, after 1,000,000 random reassignments of positions with low substitution frequency. Arrows correspond to the experimentally observed ratio.
  • CC: neoplastic colon mucosa
  • NC: non-neoplastic colon mucosa
  • PBL: peripheral blood leukocytes
  • H-PBL Fig. 6C
  • Figure 7 is a graph showing the variation of the mutability ratio for decreasing contribution of random errors.
  • the mutability ratio outside and inside UCR41 increases in all samples from HNPCC patients.
  • errors overcome true mutations inside and outside UCR41 at any value of frequency cutoff.
  • the corresponding mutability ratio is therefore always around 1.
  • Values on the Y axis correspond to the observed ratio for each sample.
  • Figure 8 is a graph showing sensitivity in detecting rare mutations.
  • the linear regression curve was calculated by plotting the observed frequency of the mutated allele G for a series of dilutions into the corresponding A wild-type allele.
  • A strict linear correlation is maintained between observed and expected substitution frequency, even for an allele frequency of 0.01% (dilution 1:10,000).
  • Ultraconserved regions (UCRs) of the human genome constitute a possible repository of such immutable segments.
  • UCRs are genomic elements longer than 200 base pairs (bp), 100% identical between human, mouse and rat (Bejerano et al., 2004), almost utterly depleted in polymorphisms within the human population (Derti et al.
  • The present invention provides a method for detecting or diagnosing genomic instability in a subject.
  • This method involves sequencing both a genomic region that is under strong evolutionary constraints (e.g., an ultra-conserved genomic region (UCR)) and a genomic region that is not conserved among species (i.e., not under evolutionary constraints) from DNA obtained from cells of the subject, and determining the difference in mutation frequency ((mutated sequenced alleles/total sequenced alleles) × 100) between the two regions.
  • The genomic region under strong evolutionary constraints (e.g., a UCR) serves as an internal control to detect an increase in mutations in the non-conserved region.
  • a statistically significant increase in mutation frequency in the non-conserved genomic region as compared to the genomic region under strong evolutionary constraints establishes a likelihood of genomic instability in the subject.
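The comparison between the two regions can be sketched in Python. This is a minimal illustration under stated assumptions, not the inventors' implementation: "mutability" of a region is taken to be the fraction of its positions that carry a substitution, and significance is estimated by randomly reassigning the mutated positions along the sequenced region, in the spirit of the random reassignments described for Figure 6. The function names, the 0-based half-open coordinates, and the example positions are hypothetical.

```python
import math
import random


def mutability_ratio(mutated_positions, length, core):
    """Ratio of mutability outside the conserved core to mutability inside it.

    Mutability is taken here (an assumption) as the fraction of positions in a
    region that carry a substitution. `core` is a 0-based half-open interval.
    """
    core_start, core_end = core
    core_len = core_end - core_start
    inside_hits = sum(1 for p in mutated_positions if core_start <= p < core_end)
    outside_hits = len(mutated_positions) - inside_hits
    if inside_hits == 0:
        # No substitutions in the core: the ratio is unbounded (or 1 if none anywhere).
        return math.inf if outside_hits else 1.0
    return (outside_hits / (length - core_len)) / (inside_hits / core_len)


def permutation_p_value(mutated_positions, length, core, n_iter=10_000, seed=0):
    """One-sided empirical p-value for an excess of substitutions outside the core."""
    rng = random.Random(seed)
    positions = sorted(set(mutated_positions))
    observed = mutability_ratio(positions, length, core)
    as_extreme = 0
    for _ in range(n_iter):
        # Reassign the same number of mutated positions at random.
        shuffled = rng.sample(range(length), len(positions))
        if mutability_ratio(shuffled, length, core) >= observed:
            as_extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (as_extreme + 1) / (n_iter + 1)


# Hypothetical example: a 1493 bp region with a 217 bp core (1-based 654-870)
# and 60 made-up mutated positions that all fall in the flanking segments.
flank_only = list(range(0, 600, 20)) + list(range(900, 1493, 20))
ratio, p = permutation_p_value(flank_only, length=1493, core=(653, 870), n_iter=2000)
print(ratio, p)
```

A resampling test is used here because it needs no distributional assumption about the errors; the number of iterations bounds the smallest p-value that can be reported.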
  • Detecting or diagnosing genomic instability in a subject provides an indication of the likelihood that the subject already has or is predisposed to developing hereditary cancers/tumors and genetic diseases in which there is an increase in the mutation rate or mutation frequency.
  • a preferred embodiment is detecting or diagnosing genomic instability associated with Lynch Syndrome.
  • Lynch Syndrome refers to a number of hereditary tumors that can affect at least eight organs (colon, stomach, endometrium, ovaries, small bowel, hepatobiliary epithelium, brain, uroepithelial epithelium) and which are characterized by a widespread increase of genome instability, known as the "mutator phenotype", due to loss of heterozygosity in one of the mismatch repair (MMR) genes.
  • Molecular markers currently used for the diagnosis of Lynch Syndrome rely either on the presence of microsatellite instability (MSI) or on the loss of immunohistochemical staining for MMR proteins.
  • The method according to the present invention applied ultra-deep, single molecule sequencing to detect the difference in mutation frequency between a UCR of the cancer genome and its flanking regions (non-conserved regions).
  • the well-known hypomutability of UCRs in healthy genomes was found to also be preserved in cancer cells. Therefore, an ultraconserved region can serve as an "internal control" for detecting an increase in mutations
  • HNPCC, a well-known form of Lynch Syndrome. This difference in mutation frequency, which is also detectable in non-cancerous tissues of the same HNPCC patients, disappears completely in DNA obtained from the blood of healthy individuals. Thus, the whole genome of an individual affected by HNPCC (and not only the genome derived from cancer cells of the individual) has a statistically significant increased mutation frequency as compared to the genome of a healthy individual.
  • hereditary cancer in which the increased mutation frequency of a non-conserved region as compared to an ultraconserved region can be used to detect the presence of the cancer or the predisposition of the subject to develop the cancer is hereditary MYH colorectal cancer, where mutations in the MYH gene have been found to be associated with a recessive form of polyposis.
  • The genomic region that is under strong evolutionary constraints is a region of at least 200 base pairs that are ultraconserved (100% identity with no insertions or deletions) between homologous regions of the human, rat and mouse genomes (Bejerano et al., 2004).
  • These ultraconserved regions are genomic regions 100% conserved during mammal evolution (Bejerano et al., 2004) and within the human population (Drake et al., 2006; Katzman et al., 2007). Bejerano et al.
  • The genomic region that is not conserved among mammalian species (i.e., not under evolutionary constraints to be conserved), which is used to compare its mutation frequency with that of a UCR internal control in the method of the present invention, may be immediately adjacent to the UCR (or in the sequence flanking either or both sides of the UCR), or it can be located far away from the UCR.
  • The non-conserved genomic region used in the method of the present invention is located in the DNA immediately flanking the UCR on one or both sides, more preferably within about 600 to 700 base pairs from the UCR.
  • The non-conserved region flanking the UCR to form an extended UCR provides the advantage that both regions used for determining a difference of mutation frequency are physically located near each other on the same segment of DNA. This, however, is only preferred and not an absolute requirement.
  • The non-conserved region preferably has no more than 60% conservation (sequence homology or identity) between mammalian species, in contrast with the 100% conservation of the UCRs.
  • the length of the UCR and the non-conserved region sequenced in the method according to the present invention are preferably at least 400 base pairs each, more preferably at least 200 base pairs each.
  • The larger the UCR region to be sequenced, the larger the flanking sequence that should be sequenced.
  • The length of the non-conserved genomic region sequenced is at least twice as long as the length of the UCR.
  • UCR41 which can be covered by a single amplicon, as can be seen in Figure 1A.
  • The ~1,500 bp eUCR41 selected as a molecular marker for detecting genomic instability according to the present invention is also shown in Fig. 1A, with the criteria for selecting eUCR41 for ultra-deep sequencing and use as the molecular marker presented in Table 1 of the Example hereinbelow.
  • UCR41 is certainly a preferred UCR, with eUCR41 a particularly preferred combination of UCR41 and a non-conserved region in the flanking sequences of UCR41.
  • the non-conserved region and the UCR to be sequenced as part of the method of the present invention are sequenced at a high depth (ultra deep) of coverage, most preferably using an emulsion method for DNA amplification and an instrument for sequencing by synthesis using a pyrosequencing protocol optimized for solid support and picoliter-scale volumes.
  • This single molecule sequencing technology was developed by the company 454 (Margulies et al., Nature 437:376-380, 2005 and US Patent 7,211,390, both of which are herein incorporated by reference for the sequencing technology) and is currently commercialized by Roche (www.454.com).
  • The output of the sequencing at a high depth of coverage, i.e., more than 10,000 reads, allowed the detection of rare mutations present in a very small portion of the cell population.
  • The high depth of coverage is at least 20,000 reads, i.e., between 20,000 and 40,000 reads. While higher numbers of reads provide increasingly better coverage, generally no more than about 40,000 reads are considered necessary. Lower coverage, e.g., a lower number of reads, considerably reduces the cost of this analysis.
  • A statistically significant increase in mutation frequency between the non-conserved region and the UCR, as determined by ultra-deep sequencing in the present method, is one where the statistical parameter p (probability) is <0.01, more preferably p <0.001.
  • the method of the present invention can be used to provide a simple highly sensitive analysis for diagnosing Lynch Syndrome.
  • This method is non-invasive; a blood sample from a subject is all that is needed.
  • The application of this method can be extended to testing predisposition for Lynch syndrome, e.g., HNPCC, in members of families with the hereditary forms of cancer.
  • HNPCC as a preferred embodiment, such a testing for predisposition in individuals with a family history of HNPCC is currently not possible with molecular markers so it is done through annual colonoscopy.
  • This method of the present invention, as used for testing of predisposition, is a one-time test and does not require periodic testing. This allows particular vigilance in the form of annual colonoscopy and prophylactic/therapeutic treatments to prevent cancer development in those individuals with a family history of HNPCC and who are determined to be heterozygous for MMR.
  • eUCR41 as the best candidate for ultra-deep sequencing was done as shown in Table 1.
  • The entire sequence of eUCR41 was divided into eleven overlapping segments (amplicons), each around 200 bp long.
  • For each amplicon, a pair of forward and reverse primers was designed with 40%-60% GC content and a melting temperature of 58-60 °C.
  • The UCSC in silico PCR tool was used to check that selected primers did not have spurious additional matches on the human genome. All primers were fused with ad-hoc 5' overhangs to allow emulsion PCR and sequencing.
  • dCNE: duplicated conserved non-coding elements
  • CEU: Utah residents with ancestry from northern and western Europe
  • MAF: minor allele frequency
  • HNPCC carriers were selected from the Registry of Hereditary Colorectal Cancer at the Istituto Nazionale Tumori (Milan, Italy). Heterozygous MLH1 and MSH2 mutations were detected on genomic DNA purified from peripheral blood leukocytes (Blasi et al., 2006). Nine healthy controls more than 50 years old (four males and five females) were selected among blood donors with Italian ancestry and no personal history of cancer. Tumors (six adenocarcinomas and three adenomas) and normal colonic mucosa were surgically removed and cryopreserved.
  • Tumor and matched normal DNAs were amplified by PCR using fluorescent primers followed by gel electrophoresis on a 3130 DNA Sequencer (Applied Biosystems, Foster City, CA), and fragments were analyzed using GeneScan and Genotyper software (Canzian et al., 1996). All tumor samples used for the analysis showed an altered electrophoretic pattern in tumor DNA compared with normal DNA for at least two microsatellites of the NCI recommended panel (Boland et al., 1998).
  • Genomic DNA was extracted from frozen tumors and normal mucosa using the QIAamp® DNA Mini Kit and from peripheral blood leukocytes using the QIAamp® DNA Blood Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. Genomic DNA was amplified by PCR using the high fidelity Pwo SuperYield DNA Polymerase (Roche). The PCR products were individually checked on agarose gel and purified using the AGENCOURT AMPure kit (Beckman Coulter) according to the manufacturer's protocol.
  • n is the number of reads differing from the reference and t is the total number of reads for position j.
  • Positions with high substitution frequency (>0.1%) in all four samples were manually checked to reject possible false positives.
  • For positions with low substitution frequency (<0.1%), only base substitutions and no indels were considered, to reduce the probability of pyrosequencing artifacts associated with insertions and deletions.
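The per-position substitution frequency and the split at the 0.1% cutoff can be illustrated with a short Python sketch. It assumes that the frequency at position j is simply n/t expressed as a percentage, with n and t as defined above, and that positions exactly at the cutoff are treated as low frequency. The function names and the read counts are made up for illustration.

```python
def substitution_frequency(n_mismatch, n_total):
    """Substitution frequency (%) at one position: reads differing from the
    reference (n) over all reads covering that position (t)."""
    if n_total <= 0:
        raise ValueError("position has no coverage")
    return 100.0 * n_mismatch / n_total


def split_by_frequency(counts, cutoff_percent=0.1):
    """Split positions into high (> cutoff) and low (0 < f <= cutoff) frequency.

    `counts` maps position j -> (n_j, t_j). Positions without any mismatch
    are dropped. High-frequency positions are the ones to check manually.
    """
    high, low = {}, {}
    for pos, (n, t) in counts.items():
        freq = substitution_frequency(n, t)
        if freq > cutoff_percent:
            high[pos] = freq
        elif freq > 0:
            low[pos] = freq
    return high, low


# Made-up read counts for three positions (not data from the study).
example = {10: (60, 50_000), 20: (5, 50_000), 30: (0, 48_000)}
high, low = split_by_frequency(example)
print(high, low)
```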
  • The number of possible errors introduced by the DNA polymerase during the polymerase chain reaction (PCR errors) was first estimated and then removed from the experimental data. PCR errors were quantified using two different approaches. The first one was based on the binomial probability distribution, in which the number of PCR errors X was considered a random variable that follows a binomial distribution:
  • The number n of single-stranded DNA sequences was taken from the number of reads of each sample (49,194; 45,383; 53,212; and 49,005 in CC, NC, PBL and H-PBL, respectively).
  • The cycles of PCR amplification were simulated in silico using a model similar to that used for the mutation rate. Starting from one DNA double strand of length L, errors were randomly introduced at a rate r in each position of the strand at each of the d PCR cycles. Once introduced, errors were retained in all the daughter strands. At the end of the amplification, the number of PCR errors present in the n single strands of DNA sequences was derived. The procedure was repeated 1,000 times to generate a distribution of N values. The number of estimated PCR errors returned by the two approaches is identical and is shown in Table 3 below.
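The binomial formula itself is not reproduced in this section, so the following Python sketch is only one plausible reading of the first approach. It assumes that every nucleotide of each of the n sequenced strands of length L was copied once per PCR cycle, giving n × L × d independent trials with error probability r. The values used for L, d and r are placeholders, not figures from the study; only the read count is taken from the text.

```python
import math


def expected_pcr_errors(n_strands, length, cycles, error_rate):
    """Mean and standard deviation of X ~ Binomial(n * L * d, r).

    The parameterization of the number of trials is an assumption made for
    this sketch.
    """
    trials = n_strands * length * cycles
    mean = trials * error_rate
    sd = math.sqrt(trials * error_rate * (1.0 - error_rate))
    return mean, sd


# 49,194 reads is the figure given for sample CC; the remaining values are
# hypothetical placeholders.
mean_errors, sd_errors = expected_pcr_errors(
    n_strands=49_194, length=200, cycles=30, error_rate=1e-6
)
print(mean_errors, sd_errors)
```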
  • Dilution experiments were performed using the 157 bp long segment of eUCR41 corresponding to amplicon 9, which bears a single nucleotide polymorphism in position 1204 (SNP A/G; Figure 1A).
  • This segment was amplified from the peripheral blood leukocytes of two healthy donors showing homozygous AA and GG genotypes, respectively (Samples 13 and 14, Table 4). After amplification, the regions were purified as described above and pooled in different relative amounts. Four final dilutions were obtained with decreasing G:A ratios (1:1,000; 1:2,000; 1:5,000; and 1:10,000, respectively).
  • DNA quantifications of the two alleles were performed using the Victor PicoGreen fluorometer (PerkinElmer Life Sciences). The obtained values were used to calibrate the successive dilution.
  • The DNA samples corresponding to the four dilutions were sequenced in four distinct lanes, using a four-lane gasket for the 70x75 PicoTiterPlate device on the GS FLX Sequencer at BMR Genomics (Padua, Italy). Specificity was measured as TN/(TN+FP). The number of true negatives (TN) was calculated as the number of sequence positions showing errors at a frequency lower than the frequency of the diluted allele.
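The specificity measure can be sketched in Python as follows. True negatives follow the definition given above; treating every remaining position as a false positive is an assumption of this sketch, since the definition of FP is not stated in this section. The error frequencies in the example are made up.

```python
def expected_minor_allele_percent(dilution):
    """Expected frequency (%) of the diluted allele for a 1:dilution mix,
    e.g. 1:10,000 corresponds to 0.01%."""
    return 100.0 / dilution


def specificity(error_percentages, diluted_allele_percent):
    """Specificity = TN / (TN + FP).

    TN: positions whose error frequency is lower than the diluted allele
    frequency. FP (assumed here): all other positions.
    """
    if not error_percentages:
        raise ValueError("no positions supplied")
    tn = sum(1 for f in error_percentages if f < diluted_allele_percent)
    fp = len(error_percentages) - tn
    return tn / (tn + fp)


# Made-up per-position error frequencies (%), one value per sequence position.
errors = [0.0, 0.002, 0.004, 0.03]
for dilution in (1_000, 2_000, 5_000, 10_000):
    cutoff = expected_minor_allele_percent(dilution)
    print(dilution, specificity(errors, cutoff))
```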
  • eUCRs: extended UCRs
  • All 307 eUCRs were screened for genomic and functional properties that would favor the detection of a difference in mutability between the ultraconserved core and the flanking segments (Table 1).
  • The best candidate was identified as eUCR41, a 1493 bp long region centered on a 217 bp long ultraconserved core (positions 654-870; Figure 1A).
  • This region bears two known SNPs at positions 286 and 1204, has no coding activity and is located in a gene desert. It acts as a developmental enhancer in the mouse forebrain (Pennacchio et al.
  • MSI: microsatellite instability
  • CC: colon cancer
  • NC: non-neoplastic mucosa
  • PBL: peripheral blood leukocytes
  • H-PBL: healthy peripheral blood leukocytes.
  • the Minor Allele Frequency (MAF) in all samples is reported, as derived from 454 and Sanger sequencing.
  • MAF was calculated as the percentage of reads bearing the minor allele in each sample.
  • the Sanger screening it corresponds to the fraction of minor alleles detected in the nine patients and in the nine healthy donors.
  • Sanger genotyping confirmed that the two clonal mutations in sample CC are heterozygous mutations present in two different patients. Combining this information with the frequency in the 454 screening, it is possible to infer that they are present in about 74% and 47% of the cells of the two patients, respectively.
  • The mutation frequency allows for roughly assigning an "age" to each mutation detected in sample CC (Figures 5A and 6).
  • The present inventors refer to "mutation frequency" and not to "substitution frequency" because the present inventors consider only true mutations and not errors.
  • Clonal mutations, which are present in the vast majority of the neoplastic cell population, likely arose at early stages of clonal proliferation. Mutations with lower frequency, and therefore present only in a smaller fraction of the cell population, were instead introduced later. This inverse correlation allows estimation of the mutation rate, defined as the number of mutations per sequenced nucleotide introduced at each cell division (muts/bp/cd).
  • To derive the mutation rate of HNPCC we simulated a model of cell proliferation that reproduces the progressive accumulation of mutations during cancer clonal expansion (Figure 5).
  • The numbers of founder cells and cell divisions were derived from the experimental data of sample CC. At each cell division, the mutation frequency at each sequence position was measured. This yielded a final number of expected mutations with frequency 2.6%-0.1% to be compared with the experimentally observed one.
  • the present inventors developed an algorithm that simulates cell proliferation and the progressive inclusion of mutations.
  • The algorithm is based on three steps: (1) Each simulation starts from nine founder cells, resembling the nine HNPCC carriers. The genome of each cell corresponds to a numerical string as long as eUCR41. All positions of the string are initially set to zero; (2) Eleven cycles of cell division are simulated to generate 36,864 final alleles. This number approximates the number of distinct DNA filaments that were experimentally sequenced; (3) At each cell division, mutations are introduced at a given mutation rate. Each mutation corresponds to a 0-to-1 transition in a random position of the string and is propagated to all daughter cells.
  • The mutation frequency, defined as the number of mutations divided by the number of final alleles, was calculated for each position of the string (Figure 5).
  • The resulting mutation frequency was multiplied by 0.7. This value approximates a reliable content of tumoral cells in the initial samples (~70%), as estimated using histological analysis (Thomas et al., 2006; Thomas et al., 2007) and confirmed by the frequency of the clonal mutation at position 871 (74%).
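The three-step simulation can be sketched in Python as shown below. This is an illustrative reconstruction under stated assumptions rather than the inventors' code. It assumes two alleles per founder cell, chosen because 9 × 2 × 2^11 = 36,864 matches the number of final alleles quoted above, and it assumes that each daughter copy acquires new mutations independently at the given per-position rate. Frequencies are returned as fractions (not percentages) and scaled by the 0.7 tumor-content factor. All function and parameter names are hypothetical, and the rate of 6e-6 in the usage line is only an example value of the order discussed in this document.

```python
import math
import random
from collections import Counter


def new_mutations(length, rate, rng):
    """Positions mutated in one copy of the string, each independently with
    probability `rate` (sampled by jumping between hits with geometric gaps)."""
    if rate <= 0:
        return []
    if rate >= 1:
        return list(range(length))
    hits, pos, log_q = [], -1, math.log1p(-rate)
    while True:
        u = 1.0 - rng.random()  # uniform in (0, 1]
        pos += int(math.log(u) / log_q) + 1
        if pos >= length:
            return hits
        hits.append(pos)


def simulate_clonal_expansion(mutation_rate, founders=9, alleles_per_cell=2,
                              divisions=11, length=1493, purity=0.7, seed=0):
    """Return (number of final alleles, {position: scaled mutation frequency})."""
    rng = random.Random(seed)
    # Step 1: every founder allele starts with no mutated positions (all zeros).
    alleles = [frozenset()] * (founders * alleles_per_cell)
    # Steps 2 and 3: each division doubles the alleles; daughters inherit the
    # parental mutations and may acquire new ones.
    for _ in range(divisions):
        next_generation = []
        for parent in alleles:
            for _daughter in range(2):
                hits = new_mutations(length, mutation_rate, rng)
                next_generation.append(parent.union(hits) if hits else parent)
        alleles = next_generation
    n_final = len(alleles)
    counts = Counter(pos for allele in alleles for pos in allele)
    # Mutation frequency = mutated alleles / final alleles, scaled by tumor content.
    frequencies = {pos: purity * c / n_final for pos, c in counts.items()}
    return n_final, frequencies


n_final, freqs = simulate_clonal_expansion(mutation_rate=6e-6)
in_window = sum(1 for f in freqs.values() if 0.001 <= f <= 0.026)
print(n_final, len(freqs), in_window)
```

Repeating the run over a grid of candidate rates and seeds, and comparing the count of mutations in the 0.1%-2.6% window with the observed count, is one way such a model could be used to pick the best-fitting rate.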
  • The HNPCC genome acquires mutations at a rate of ~6×10^-6 mutations per base pair at each cell division (muts/bp/cd).
  • This estimation represents the first measure of mutation rate inferred directly from sequencing data and is compatible with the estimated frequency of fixed mutations in MMR deficient cancers (3.2×10^-5 muts/bp) (Greenman et al., 2007).
  • the frequency was calculated as the number of times that the substitution was observed divided by the number of times that that position was read.
  • the data confirm our initial assumption that UCR41 is also maintained as ultraconserved in somatic cells and it can therefore be used to normalize the experimental errors.
  • The mutation rate of the HNPCC genome allows detecting an increased occurrence of mutations in the flanking segments when compared to the ultraconserved core. No increase is detectable in the sample H-PBL, although UCR41 is very likely to be conserved here as well.
  • the mutation rate of healthy human genome is so low that sequencing errors overcome true mutations in the entire region. The different behavior between HNPCC and healthy samples becomes more evident when the contribution of random errors decreases.
  • Predisposition testing in members of families with Lynch syndrome consists of genetic screening of the MMR genes to identify germline mutations (Lynch, 2007; Vasen et al., 2007).
  • Our strategy here constitutes an alternative test for diagnosing cancer predisposition without any a priori knowledge of the mutated genes.
  • Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res 14(4): 708-715 (2004).
  • Flaman JM Frevier T, Moreau V, Charbonnier F, Martin C et al .
  • Pennacchio LA Ahituv N
  • Moses AM Ahituv N
  • Prabhakar S Nobrega MA et al .

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Engineering & Computer Science (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Genetics & Genomics (AREA)
  • Wood Science & Technology (AREA)
  • Physics & Mathematics (AREA)
  • Biotechnology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Hospice & Palliative Care (AREA)
  • Biophysics (AREA)
  • Oncology (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

A method is provided for detecting or diagnosing genomic instability in an individual by determining the difference in mutation frequency between a non-conserved genomic region and an ultra-conserved genomic region (UCR), where a statistically significant increase in mutation frequency in the non-conserved region as compared to the UCR establishes a likelihood of genomic instability in the individual.

Description

METHOD FOR DETECTING OR DIAGNOSING GENOMIC INSTABILITY
BACKGROUND OF THE INVENTION
Field of the Invention
[0001] The invention relates to molecular markers for hereditary cancers and methods for determining genomic instability and the presence of cancer or predisposition to develop cancer.
Description of the Related Art
[0002] Genomic instability is a common trait of cancer cells and plays a pivotal role in promoting carcinogenesis in several hereditary tumors. One of the best-known examples is the Lynch syndrome, an autosomal dominant condition associated with heterozygous mutations in mismatch repair (MMR) genes (Peltomaki et al., 1997). During their lifespan, individuals affected by the Lynch syndrome undergo somatic inactivation of the second allele, which impairs the MMR machinery and triggers the onset of the "mutator phenotype" (Loeb, 1991). The tumorigenic process starts when mutations hit oncogenes and/or tumor suppressors, often in actively renewing tissues such as endometrium, ovary, and colon. In the latter case, the genetic condition is known as Hereditary Non-Polyposis Colorectal Cancer (HNPCC), which represents the most common form of inherited colorectal cancer (Lynch et al., 2003).
[0003] A hallmark of MMR deficiency is microsatellite instability (MSI), which measures replication errors at repeated regions of the genome. Since more than 90% of HNPCC cases show MSI (Aaltonen et al., 1994; Soreide, 2007), this has become a common diagnostic marker of MMR deficiency. MSI detects clonal mutations at specific repeats of the cancer genome that are particularly prone to accumulate mutations, and therefore it provides only indirect evidence of widespread genomic instability. Recently, large-scale mutational screenings returned the first estimates of the mutation frequency, which is the number of mutations per genome unit, associated with coding and non-coding sequences of cancer genomes (Greenman et al., 2007; Wood et al., 2007; Jones et al., 2008; Parsons et al., 2008). These studies were designed to detect mutations occurring in most cancer cells, namely in an expanded clonal population, while neglecting low frequency substitutions. The resulting picture is a "static snapshot" of the cancer genome, where only the tip of the iceberg (i.e., clonal mutations) is captured.
[0004] The detection of low frequency mutations in addition to clonal mutations is instrumental to clarify many controversial aspects of cancer genetics. It would offer a "dynamic" view of the mutational landscape and would allow an estimation of the mutation rate, defined as the number of mutations introduced in the genome at each cell division. In addition, the high sensitivity needed to find rare mutations helps to trace the appearance of the mutator phenotype, thus clarifying the role of genomic instability during the early stages of carcinogenesis. So far, technical limitations have prevented the detection of low frequency mutations, since standard sequencing procedures cannot reach the required level of sensitivity. In the past years, several approaches have been explored to overcome this problem, all of which required complex experimental settings (Bielas et al., 2005; Li et al., 2006). In principle, next-generation sequencing technologies could offer a valid solution, as they rely on amplification and sequencing of distinct DNA filaments. Because the sensitivity of these methods increases with coverage, rare mutations should become detectable by performing an ultra-deep re-sequencing of a given DNA region. The obvious drawback concerns specificity: at deep coverage, low frequency substitutions are a mixture of technical errors and true mutations, which makes it hard to distinguish true signal from noise.
[0005] Citation of any document herein is not intended as an admission that such document is pertinent prior art, or considered material to the patentability of any claim of the present application. Any statement as to content or a date of any document is based on the information available to applicant at the time of filing and does not constitute an admission as to the correctness of such a statement.
SUMMARY OF THE INVENTION
[0006] The present invention solves the drawbacks and problems encountered in the prior art and provides a method for detecting or diagnosing genomic instability in a subject, which involves determining in the subject the difference in mutation frequency between a genomic region that is not conserved among species (i.e., not under evolutionary constraints) and another genomic region that is under strong evolutionary constraints, such as an ultra-conserved genomic region (UCR) . A statistically significant increase in mutation frequency in the non-conserved region as compared to the region under strong evolutionary constraints establishes a likelihood of genomic instability in the subject.
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] Figures 1A and 1B show the features of eUCR41. In Fig. 1A, the genomic coordinates refer to the hg18 assembly of the human genome. The two grey bars correspond to the extremely conserved sequence (Visel et al., 2008), and to the genomic region tested for enhancer activity (Pennacchio et al., 2006), respectively. Black bars indicate the eleven overlapping segments used for the amplification. In Fig. 1B, the base composition and percentage of homopolymers over the total length of the three regions (eUCR41, ultraconserved core, and flanking segments) are shown.
[0008] Figure 2 shows the depth of coverage reached with the sequencing screenings. For each sample, the coverage of sequencing (reads/bp) was measured. The average coverage is 49,150 in sample CC; 45,370 in sample NC; 52,530 in sample PBL; and 48,380 in sample H-PBL. Regions in which the coverage almost doubles correspond to overlapping segments between contiguous amplicons (see Materials and Methods in the Example and Figure 1A). The gradient in gray shading corresponds to the degree of sequence conservation, as reported in Fig. 1A. UCR41 is highlighted.
[0009] Figures 3A-3D show examples of high frequency errors. For each of the four hot spot regions described in Table 7, a different example of high frequency errors derived from sample CC is shown. In all cases, the errors are due to indels that cause misalignments between the reads and the reference sequence. In three cases, the misaligned region corresponds to the end of the reads (*). (Fig. 3A) Reference position 1050-1061, frequency 0.1%; SEQ ID NOs: 23 and 24. (Fig. 3B) Reference position 633-652, frequency 0.6%; SEQ ID NOs: 25 and 26. (Fig. 3C) Reference position 1071-1094, frequency 0.1%; SEQ ID NOs: 27 and 28. (Fig. 3D) Reference position 29-45, frequency 0.1%; SEQ ID NOs: 29 and 30.
[0010] Figures 4A and 4B show graphs of the mutation spectrum of eUCR41 in sample CC. In Fig. 4A, all detected substitutions are mapped on the corresponding positions of eUCR41. Two ranges of substitution frequency are shown: 40.5%-2.5% and <1.0%, since no substitution was detected in the range 1.0%-2.5%. All substitutions reported in the range 1.0%-0.1% were manually checked and excluded as sequencing errors. The range of frequency highlighted in grey (2.5%-0.1%) was used to estimate the mutation rate of the cancer genome in the simulation model. In Fig. 4B, mutability was calculated using sliding windows of the same length as UCR41. Values corresponding to the middle point of each window are reported. Mutability increases as sequence conservation decreases: it is always below average for sequence identity > 50%, while it is above average for non-conserved segments. Similar trends were observed for all samples deriving from HNPCC (data not shown).
[0011] Figure 5 is a schematic illustration of a model for the progressive acquisition of mutations during clonal expansion. The mutation frequency of each mutation in the final population reflects the timing of its appearance. The formula to calculate the mutation frequency is described in Table 8. CD = cell division.
[0012] Figures 6A-6D are graphs showing observed and expected mutability outside and inside UCR41. The expected distributions of mutability ratios are shown for each sample, after 1,000,000 random reassignments of positions with low substitution frequency. Arrows correspond to the experimentally observed ratio. CC: neoplastic colon mucosa (Fig. 6A); NC: non-neoplastic colon mucosa (Fig. 6B); PBL: peripheral blood leukocytes (Fig. 6C); H-PBL: peripheral blood leukocytes from healthy donors (Fig. 6D).
[0013] Figure 7 is a graph showing the variation of the mutability ratio for decreasing contribution of random errors. By progressively decreasing the number of rare substitutions, the mutability ratio outside and inside UCR41 increases in all samples from HNPCC patients. In H-PBL, errors overcome true mutations inside and outside UCR41 at any value of frequency cutoff. The corresponding mutability ratio is therefore always around 1. Values on the Y axis correspond to the observed ratio for each sample.
[0014] Figure 8 is a graph showing sensitivity in detecting rare mutations. Amplicon 9 bearing a SNP in position 1204 (G, Figure 1A) was serially diluted into the corresponding wild-type amplicon (A). The linear regression curve was calculated by plotting the observed frequency of the mutated allele G for a series of dilutions into the corresponding A wild-type allele. A strict linear correlation is maintained between observed and expected substitution frequency, even for an allele frequency of 0.01% (dilution 1:10,000).
DETAILED DESCRIPTION OF THE INVENTION
[0015] In order to overcome technical errors in detecting rare mutations, the present inventors discovered that internal controls, i.e., genomic elements that do not accumulate true mutations so that all substitutions observed in these regions are bona fide errors, can be used in a reliable and highly sensitive method for detecting genomic instability in a subject. Ultraconserved regions (UCRs) of the human genome constitute a possible repository of such immutable segments. UCRs are genomic elements longer than 200 base pairs (bp), 100% identical between human, mouse and rat (Bejerano et al., 2004), almost utterly depleted in polymorphisms within the human population (Derti et al., 2006), and under purifying selection stronger than that acting on non-synonymous sites (Katzman et al., 2007). These properties can be used to normalize the experimental errors of DNA amplification and sequencing. By comparing the mutability of UCRs with that of genomically unstable regions, the higher mutation rate of the latter should become detectable. This model works only under two assumptions. The first one is that UCRs are conserved not only in germline but also in somatic cells. Recently, an altered expression of some UCRs has been reported in leukemia and carcinomas (Calin et al., 2007), and two out of six SNPs present in UCRs show significant association with familial breast cancer risk (Yang et al., 2008). Both these studies suggest that UCRs may also play a role in adult cells and therefore might be under somatic selection. The second assumption is that the cancer mutation rate is higher than or at least comparable to the experimental error rate, because only in this case can the difference in mutability be appreciated. This seems a plausible assumption, given the current estimations for the cancer-associated mutator phenotype (Bielas et al., 2005; Li et al., 2006).
[0016] In developing this analytical approach, the laboratory of the present inventors sequenced ~40,000 distinct DNA filaments of a ~1,500 bp genomic segment/region centered on a carefully selected UCR (UCR41). This genomic region was derived from three different tissues of patients affected by HNPCC: neoplastic colon mucosa, non-neoplastic colon mucosa, and peripheral blood. As a negative control, the peripheral blood of nine healthy donors was used. To amplify and sequence each sample, emulsion PCR followed by pyrosequencing was used (Margulies et al., 2005). This method offers, to date, the best compromise between sufficiently long reads and low error rate in miscalled bases (Shendure et al., 2008). The depth of coverage that was reached allowed the present inventors to estimate the mutation rate during cancer clonal expansion and to detect genomic instability in neoplastic as well as in non-neoplastic cells of HNPCC individuals, as exemplified in the Example hereinbelow.
[0017] The present invention provides a method for detecting or diagnosing genomic instability in a subject. This method involves sequencing both a genomic region that is under strong evolutionary constraints (e.g., an ultra-conserved genomic region (UCR)) and a genomic region that is not conserved among species (i.e., not under evolutionary constraints) from DNA obtained from cells of the subject, and determining the difference in mutation frequency ((mutated sequenced alleles/total sequenced alleles) x 100) between the two regions. The genomic region under strong evolutionary constraints (e.g., UCR) serves as an internal control to detect an increase in mutations in the non-conserved region. Thus, a statistically significant increase in mutation frequency in the non-conserved genomic region as compared to the genomic region under strong evolutionary constraints establishes a likelihood of genomic instability in the subject.
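The mutation frequency defined in the preceding paragraph can be illustrated with a minimal Python sketch. The function names and the allele counts below are hypothetical and chosen only for illustration; they are not data from the Example.

```python
def mutation_frequency(mutated_alleles: int, total_alleles: int) -> float:
    """Return (mutated sequenced alleles / total sequenced alleles) x 100."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    return mutated_alleles / total_alleles * 100.0


def frequency_difference(non_conserved: tuple, ucr: tuple) -> float:
    """Difference in mutation frequency (percentage points): non-conserved minus UCR.

    Each argument is a (mutated alleles, total alleles) pair.
    """
    return mutation_frequency(*non_conserved) - mutation_frequency(*ucr)


# Hypothetical counts, for illustration only.
example_difference = frequency_difference((30, 40_000), (10, 40_000))
```

Whether such a difference is statistically significant is a separate question that requires a statistical test, as described in the Example.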
[0018] Detecting or diagnosing genomic instability in a subject provides an indication of the likelihood that the subject already has or is predisposed to developing hereditary cancers/tumors and genetic diseases in which there is an increase in the mutation rate or mutation frequency. A preferred embodiment is detecting or diagnosing genomic instability associated with Lynch Syndrome.
[0019] Lynch Syndrome refers to a number of hereditary tumors that can affect at least eight organs (colon, stomach, endometrium, ovaries, small bowel, hepatobiliary epithelium, brain, uroepithelial epithelium) and which are characterized by a widespread increase of genome instability known as the "mutator phenotype", due to loss of heterozygosity in one of the mismatch repair (MMR) genes. Molecular markers currently used for the diagnosis of Lynch Syndrome rely either on the presence of microsatellite instability (MSI) or on the loss of immunohistochemical staining for MMR proteins. Both of these tests are fairly reliable in detecting Lynch Syndrome, but cannot be used for assessing heterozygosity of MMR genes, and hence a likely predisposition to develop this type of hereditary cancer, because they are not detectable in noncancerous tissue.
[0020] As shown in the Example hereinbelow, the method according to the present invention applied ultra-deep, single molecule sequencing to detect the difference in mutation frequency between a UCR of the cancer genome and its flanking regions (non-conserved regions). The well-known hypomutability of UCRs in healthy genomes was found to also be preserved in cancer cells. Therefore, an ultraconserved region can serve as an "internal control" for detecting an increase in mutations (mutation frequency) of a non-conserved region, which preferably flanks the UCR used as the internal control. The Example hereinbelow demonstrated that a statistically significant difference was detected in samples of colon cancer from individuals with Hereditary Non-Polyposis Colorectal Cancer (HNPCC), a well-known form of Lynch Syndrome. This difference in mutation frequency, which is also detectable in non-cancerous tissues of the same HNPCC patients, completely disappears in DNA obtained from the blood of healthy individuals. Thus, the whole genome of an individual affected by HNPCC (and not only the genome derived from cancer cells of the individual) has a statistically significant increased mutation frequency as compared to the genome of a healthy individual.
[0021] Another non-limiting example of a hereditary cancer in which the increased mutation frequency of a non-conserved region as compared to an ultraconserved region can be used to detect the presence of the cancer or the predisposition of the subject to develop the cancer is hereditary MYH colorectal cancer, where mutations in the MYH gene have been found to be associated with a recessive form of polyposis.
[0022] The genomic region that is under strong evolutionary constraints is a region of at least 200 base pairs that are ultraconserved (100% identity with no insertions or deletions) between homologous regions of the human, rat and mouse genomes (Bejerano et al., 2004). Thus, these ultraconserved regions (UCRs) are genomic regions 100% conserved during mammal evolution (Bejerano et al., 2004) and within the human population (Drake et al., 2006; Katzman et al., 2007). Bejerano et al. (2004) also identified and reported 481 UCRs (see also the supporting online material at www.sciencmag.org/cgi/content/full/1098119/DCl and www.cse.ucsc.edu/~jill/ultra.html, which are all incorporated by reference for the listing and disclosures of different UCRs).
[0023] The genomic region that is not conserved among mammalian species (i.e., not under evolutionary constraints to be conserved), which is used to compare its mutational frequency with that of a UCR internal control in the method of the present invention, may be immediately adjacent to the UCR (or in the sequence flanking either or both sides of the UCR), or it can be located far away from the UCR. Preferably, however, the non-conserved genomic region used in the method of the present invention is located in the DNA immediately flanking the UCR on one or both sides, more preferably within about 600 to 700 base pairs from the UCR. The non-conserved region flanking the UCR to form an extended UCR (eUCR) provides the advantage that both regions used for determining a difference of mutation frequency are physically located near each other on the same segment of DNA. This, however, is only preferred and not an absolute requirement.
[0024] Furthermore, the non-conserved region preferably has no more than 60% conservation (sequence homology or identity) between mammalian species, in contrast with the 100% conservation of the UCRs .
[0025] The lengths of the UCR and the non-conserved region sequenced in the method according to the present invention are preferably at least 200 base pairs each, more preferably at least 400 base pairs each. The larger the UCR region to be sequenced, the larger the flanking sequence that should be sequenced. Preferably, the length of the non-conserved genomic region sequenced is at least twice as long as the length of the UCR.
[0026] Of the 481 UCRs identified by Bejerano et al. (2004), the most preferred UCR is UCR41, which can be covered by a single amplicon, as can be seen in Figure 1A. The ~1,500 bp eUCR41 selected as a molecular marker for detecting genomic instability according to the present invention is also shown in Fig. 1A, with the criteria for selecting eUCR41 for ultra-deep sequencing and use as the molecular marker presented in Table 1 of the Example hereinbelow.
[0027] Among the 481 UCRs identified, there are 111 transcribing UCRs and 114 inconclusive UCRs (i.e., associated with ESTs and may be transcribed) that are not suitable for use as the UCR in the method of the present invention. Accordingly, it is preferred that both the UCR and the non-conserved region, which are sequenced for determining the difference in mutation frequency between the two regions, be in non-coding regions of the genome. After excluding the UCRs not considered to be suitable, there are still quite a number of other suitable UCRs, as would be appreciated by those of skill in the art. From the selection criteria in Table 1 and the results obtained in the Example hereinbelow, UCR41 is certainly a preferred UCR, with eUCR41 a particularly preferred combination of UCR41 and a non-conserved region in the flanking sequences of UCR41.
[0028] The non-conserved region and the UCR to be sequenced as part of the method of the present invention, as exemplified in the Example hereinbelow with eUCR41, are sequenced at a high depth (ultra deep) of coverage, most preferably using an emulsion method for DNA amplification and an instrument for sequencing by synthesis using a pyrosequencing protocol optimized for solid support and picoliter-scale volumes. This single molecule sequencing technology was developed by the company 454 (Margulies et al., Nature 437:376-380, 2005 and US Patent 7,211,390, both of which are herein incorporated by reference for the sequencing technology) and is currently commercialized by Roche (www.454.com). Using a fully dedicated run of a GS-FLX sequencer, the output of the sequencing at a high depth of coverage, i.e., more than 10,000 reads, allowed the detection of rare mutations present in a very small portion of the cell population. Preferably, the high depth of coverage is at least 20,000 reads, i.e., between 20,000 and 40,000 reads. While higher numbers of reads provide increasingly better coverage, generally no more than about 40,000 reads is considered necessary. Lower coverage, i.e., a lower number of reads, considerably reduces the costs of this analysis.
[0029] A statistically significant increase in mutation frequency between the non-conserved region and the UCR, as determined by ultra-deep sequencing in the present method, is one where the statistical parameter p (probability) is < 0.01, more preferably p < 0.001.
[0030] The method of the present invention can be used to provide a simple, highly sensitive analysis for diagnosing Lynch Syndrome. This method is non-invasive; a blood sample from a subject is all that is needed. The application of this method can be extended to testing predisposition for Lynch syndrome, e.g., HNPCC, in members of families with the hereditary forms of cancer. Using HNPCC as a preferred embodiment, such testing for predisposition in individuals with a family history of HNPCC is currently not possible with molecular markers, so it is done through annual colonoscopy. As the statistically significant differences in mutation frequency are found in non-cancerous cells of individuals with HNPCC, this suggests such differences are associated with germ line mutations of MMR genes, such as in healthy carriers (i.e., from families with HNPCC) heterozygous for MMR genes. Because of the tendency of heterozygous MMR+/- genomes to accumulate low frequency substitutions, this genomic instability predisposes heterozygous MMR individuals to the inactivation of the second allele, thereby leading to HNPCC.
[0031] This method of the present invention, as used for testing of predisposition, is a one-time test and does not require periodic testing. This allows particular vigilance in the form of annual colonoscopy and prophylactic/therapeutic treatments to prevent cancer development in those individuals with a family history of HNPCC and who are determined to be heterozygous for MMR.
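As a rough illustration of why sensitivity increases with the number of reads, the following Python sketch computes the probability of sampling at least one mutant read at a given depth. It assumes, as a simplification not stated in the description, that each read independently carries the mutant allele with a probability equal to its frequency in the sample, and it ignores sequencing and amplification errors. The function name is hypothetical.

```python
def detection_probability(allele_fraction: float, reads: int) -> float:
    """Probability that at least one of `reads` reads carries the mutant allele.

    Assumes independent sampling of reads (a simplifying assumption):
    P = 1 - (1 - f)^N.
    """
    if not 0.0 <= allele_fraction <= 1.0:
        raise ValueError("allele_fraction must be between 0 and 1")
    return 1.0 - (1.0 - allele_fraction) ** reads


# A mutation present in 0.01% of alleles, at two depths within the range discussed above.
p_at_10k = detection_probability(0.0001, 10_000)
p_at_40k = detection_probability(0.0001, 40_000)
```

Under these assumptions, deeper coverage raises the chance of observing a rare allele at all, but it does not by itself separate true mutations from technical errors, which is the role of the UCR internal control.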
[0032] Having now generally described the invention, the same will be more readily understood through reference to the following example which is provided by way of illustration and is not intended to be limiting of the present invention.
EXAMPLE
[0033] Early detection of cancer-associated genomic instability is crucial, particularly in tumor types where it represents the essential underlying mechanism of tumorigenesis. Currently used methods require the presence of already established neoplastic cells because they only detect clonal mutations. In principle, parallel sequencing of single DNA filaments could reveal the early phases of tumor initiation by detecting low frequency mutations, provided an adequate depth of coverage and an effective control of the experimental error. We applied ultra-deep sequencing to estimate the genomic instability of individuals with Hereditary Non-Polyposis Colorectal Cancer (HNPCC). To overcome the experimental error, an ultraconserved region (UCR) of the human genome was used as an internal control. The depth of coverage that was reached allowed the simulation of cancer clonal expansion and the estimation of the rate of introduction of somatic mutations. By comparing the mutability outside and inside the UCR, a tendency of the ultraconserved element to accumulate significantly fewer mutations than the flanking segments in both neoplastic and non-neoplastic HNPCC samples was observed. No difference between the two regions was detectable in cells from healthy donors. This is the first direct evidence of a constitutional genomic instability of individuals with heterozygous mutations in mismatch repair genes. The analysis in this study suggests a predisposition of such individuals to acquire the second hit necessary for cancer initiation, and constitutes the proof of principle for the development of a more sensitive molecular assay of genomic instability.
MATERIALS AND METHODS
UCR Selection
[0034] The genomic coordinates of 481 UCRs were derived from the hg18 release of the human genome (March 2006). The conservation between each human UCR and the corresponding orthologous element in mouse (February 2006), rat (November 2004), dog (May 2005), cow (March 2005), chicken (February 2004) and fugu (August 2002) was derived from the multiz alignments (Blanchette et al., 2004). Only 307 UCRs detectable in all seven species were retained for further analysis. These UCRs were extended on both sides up to 50% of sequence conservation, measured as the percentage of nucleotides over a 25 bp sliding window conserved in at least four of the seven species. To also include non-conserved segments, regions were further extended by 500 bp on both sides. The selection of extended UCR41 (eUCR41) as the best candidate for ultra-deep sequencing was done as shown in Table 1. The entire sequence of eUCR41 was divided into eleven overlapping segments (amplicons), each around 200 bp long. For each amplicon, a pair of forward and reverse primers was designed with 40%-60% GC content and a melting temperature of 58-60°C. The UCSC in silico PCR tool was used to check that selected primers did not have spurious additional matches on the human genome. All primers were fused with ad-hoc 5' overhangs to allow emulsion PCR and sequencing.
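The extension of a UCR described above can be sketched in Python as follows. This is one possible reading of the procedure, not the inventors' implementation: it assumes a precomputed list of per-position flags (True when the nucleotide is conserved in at least four of the seven species), it slides the window one base at a time just outside the current boundary, and it uses half-open coordinates. All names are hypothetical.

```python
def window_conservation(conserved, start, width=25):
    """Percentage of conserved positions in the window [start, start + width)."""
    window = conserved[start:start + width]
    return 100.0 * sum(window) / len(window)


def extend_region(conserved, core_start, core_end, width=25, threshold=50.0, pad=500):
    """Extend the core [core_start, core_end) on both sides.

    The boundary moves outward one base at a time while the window lying
    just outside it stays at or above `threshold` percent conservation.
    The result is then padded by `pad` bases per side to include
    non-conserved sequence, clipped to the available coordinates.
    """
    start, end = core_start, core_end
    while start - width >= 0 and window_conservation(conserved, start - width, width) >= threshold:
        start -= 1
    while end + width <= len(conserved) and window_conservation(conserved, end, width) >= threshold:
        end += 1
    return max(0, start - pad), min(len(conserved), end + pad)


# Synthetic track, for illustration only: a conserved block flanked by non-conserved sequence.
toy_track = [False] * 80 + [True] * 90 + [False] * 80
toy_region = extend_region(toy_track, 100, 150, width=10, threshold=50.0, pad=10)
```

With the parameters given in the text, the call would use width=25, threshold=50.0 and pad=500 on a conservation track derived from the multiple alignment.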
Table 1 : Criteria for the Selection of eUCR41 for Ultra-Deep Sequencing
Figure imgf000016_0001
Legend: Shown are the genomic and functional features, the reasons why they are important for the selection of the best eUCR, the detection methods and the corresponding properties of eUCR41, the selected candidate. dCNE, duplicated conserved non coding elements; CEU, Utah residents with ancestry from northern and western Europe; MAF, minor allele frequency.
Table 2: Primers Used for the Amplification of eUCR41
Figure imgf000017_0001
Figure imgf000018_0001
Figure imgf000019_0001
Figure imgf000020_0001
Sample Preparation and Sequencing
[0035] Nine HNPCC carriers were selected from the Registry of Hereditary Colorectal Cancer at the Istituto Nazionale Tumori (Milan, Italy). Heterozygous MLH1 and MSH2 mutations were detected on genomic DNA purified from peripheral blood leukocytes (Blasi et al., 2006). Nine healthy controls more than 50 years old (four males and five females) were selected among blood donors with Italian ancestry and no personal history of cancer. Tumors (six adenocarcinomas and three adenomas) and normal colonic mucosa were surgically removed and cryopreserved. Hematoxylin-eosin staining revealed that tumor areas were not heavily contaminated with normal cells and did not present necrosis, and that normal colonic mucosa was free of tumor infiltration. Tumor and matched normal DNAs were amplified by PCR using fluorescent primers followed by gel electrophoresis on a 3130 DNA Sequencer (Applied Biosystems, Foster City, CA), and fragments were analyzed using GeneScan and Genotyper software (Canzian et al., 1996). All tumor samples used for the analysis showed an altered electrophoretic pattern in tumor DNA compared with normal DNA for at least two microsatellites of the NCI recommended panel (Boland et al., 1998). Genomic DNA was extracted from frozen tumors and normal mucosa using the QIAamp® DNA Mini Kit and from peripheral blood leukocytes using the QIAamp® DNA Blood Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. Genomic DNA was amplified by PCR using the high fidelity Pwo SuperYield DNA Polymerase (Roche). The PCR products were individually checked on agarose gel and purified using the AGENCOURT AMPure kit (Beckman Coulter) according to the manufacturer's protocol. All 99 amplicons from each tissue type (colon cancer, non-neoplastic colon, peripheral blood leukocytes, and healthy peripheral blood leukocytes) were quantified using a NanoDrop® ND-1000 UV-Vis Spectrophotometer and pooled in equimolar ratio to obtain four samples (CC, NC, PBL, H-PBL). Four independent runs of pyrosequencing were performed with 454 Life Sciences (Branford, CT), each of them on a 70x75 mm PicoTiterPlate using the GS FLX Sequencer. Emulsion PCR and sequencing were performed as previously described (Margulies et al., 2005). Each sequence read was base called (Margulies et al., 2005), filtered by quality metrics and aligned to the human reference sequence as previously described (Thomas et al., 2006). Sanger sequencing was performed to characterize the genotype of each individual in each tissue and to identify the carriers of the two mutations in cancer. Amplicons were generated using the Pwo SuperYield DNA polymerase (Roche) and sequenced in both directions on a 3130xl sequencer, Data Collection 3.0 (Applied Biosystems), using the dRhodamine chemistry standard conditions.
Statistical Analysis
[0036] For each position of eUCR41, the number of reads bearing a nucleotide different from the reference sequence was counted. The substitution frequency at position j was defined as:
(n_j / t_j) × 100
where n_j is the number of reads differing from the reference and t_j is the total number of reads for position j. Positions with high substitution frequency (>0.1%) in all four samples were manually checked to reject possible false positives. In the analysis of positions with low substitution frequency (<0.1%), only base substitutions and no indels were considered, to reduce the probability of pyrosequencing artifacts associated with insertions and deletions. The mutability of eUCR41 as well as of specific regions (i.e., ultraconserved core; flanking segments; 217 bp-long sliding windows) was defined as:
Figure imgf000022_0001
where j is the starting position and L is the length of the region. To compare the mutability outside and inside UCR41, the distributions of substitution frequency corresponding to the two regions were compared using the Wilcoxon test. To account for the putative effects of length and base composition on the mutability of UCR41 and flanking segments, all positions with low frequency substitutions were randomly reassigned in each sample, keeping the same base composition. The analysis was repeated 1,000,000 times and the ratio between the expected mutability outside and inside UCR41 was calculated at each round. The probability of observing the experimental ratio by chance was calculated as the fraction of the expected ratios equal to or higher than the observed value.
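The random reassignment procedure can be sketched in Python as follows. Because the mutability formula is given only as a figure, this sketch assumes that the mutability of a region is the pooled number of substituted reads divided by the pooled number of reads over its positions, times 100; that assumption, the function names, and the default of 1,000 rounds (instead of 1,000,000) are illustrative choices, not the inventors' implementation. Substitution counts are shuffled only among positions sharing the same reference base, which is one way to keep the base composition unchanged.

```python
import random
from collections import defaultdict


def mutability(n, t, positions):
    """Assumed definition: 100 x pooled substituted reads / pooled total reads."""
    return 100.0 * sum(n[i] for i in positions) / sum(t[i] for i in positions)


def mutability_ratio(n, t, inside):
    """Ratio of mutability outside versus inside the ultraconserved core."""
    in_pos = [i for i, flag in enumerate(inside) if flag]
    out_pos = [i for i, flag in enumerate(inside) if not flag]
    m_in = mutability(n, t, in_pos)
    m_out = mutability(n, t, out_pos)
    return float("inf") if m_in == 0 else m_out / m_in


def permutation_p_value(n, t, bases, inside, rounds=1000, seed=0):
    """Return (observed ratio, fraction of shuffled ratios >= observed)."""
    rng = random.Random(seed)
    observed = mutability_ratio(n, t, inside)
    by_base = defaultdict(list)
    for i, base in enumerate(bases):
        by_base[base].append(i)
    hits = 0
    for _ in range(rounds):
        shuffled = list(n)
        for idx in by_base.values():
            counts = [n[i] for i in idx]
            rng.shuffle(counts)  # reassign counts among positions with the same base
            for i, count in zip(idx, counts):
                shuffled[i] = count
        if mutability_ratio(shuffled, t, inside) >= observed:
            hits += 1
    return observed, hits / rounds
```

Here n and t are per-position lists of substituted and total reads, bases is the reference sequence, and inside flags the positions belonging to UCR41. The Wilcoxon comparison mentioned in the text is a separate test and is not shown.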
Estimation of PCR Errors
[0037] The number of possible errors introduced by the DNA polymerase during the polymerase chain reaction (PCR errors) was first estimated and then removed from the experimental data. PCR errors were quantified using two different approaches. The first one was based on the binomial probability distribution, in which the number of PCR errors X was considered a random variable that follows a binomial distribution:
$X \sim B(L, p)$
where L is the length of the region and p is the probability of accumulating errors at a given position after d duplications, with a given number of errors r introduced per base pair at each duplication:
$p = 1 - (1 - r)^{d}$
From this model, the total number of PCR errors expected in a region of length L is:
$E(X) = L\,p = L\,[1 - (1 - r)^{d}]$
The total number N of PCR errors present in n single-stranded DNA sequences will be:
$N = E(X)\,n/2$
[0038] In the analysis in this study, parameters r, d, L and n were all derived from the experimental data. The applied error rate was $r = 6.5 \times 10^{-7}$ errors/bp/duplication (Andre et al., 1997; Dabrowski et al., 1998). The number of duplications was set equal to the number of PCR cycles, d = 40. The length L of the region was calculated as the number of positions unchanged or bearing low-frequency substitutions in each sample (1431; 1435; 1418; and 1415 in CC, NC, PBL and H-PBL, respectively). The number n of single-stranded DNA sequences was taken from the number of reads of each sample (49,194; 45,383; 53,212; and 49,005 in CC, NC, PBL and H-PBL, respectively). In the second approach, the cycles of PCR amplification were simulated in silico using a model similar to that used for the mutation rate. Starting from one DNA double strand of length L, errors were randomly introduced at a rate r in each position of the strand at each of the d PCR cycles. Once introduced, errors were retained in all the daughter strands. At the end of the amplification, the number of PCR errors present in the n single strands of DNA sequences was derived. The procedure was reiterated 1,000 times to generate a distribution of N values. The number of estimated PCR errors returned by the two approaches is identical and is shown in Table 3 below.
Table 3. Estimation of PCR Errors
Figure imgf000024_0001
Legend: For each sample, the total number of estimated PCR errors and the corresponding percentage over the total number of low frequency substitutions (<0.1%) is reported. These estimations were derived using the binomial probability distribution. Comparable numbers were obtained using the simulation model . For both observed and expected distributions of mutability ratios, the mean, as well as 95% confidence interval (in brackets) , are reported. In each sample, observed and expected distributions were compared using the Wilcoxon test.
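The binomial estimate of PCR errors described in paragraphs [0037] and [0038] can be sketched as follows. The parameter values are those quoted in the text; the function and variable names are hypothetical, and the sketch assumes the per-position error probability after d duplications is 1 - (1 - r)^d, as stated above.

```python
def per_site_error_probability(r, d):
    """Probability that a position carries a PCR error after d duplications."""
    return 1.0 - (1.0 - r) ** d


def expected_pcr_errors(r, d, length, n_reads):
    """N = E(X) * n / 2, with E(X) = L * p for X ~ B(L, p)."""
    return length * per_site_error_probability(r, d) * n_reads / 2.0


# (L, n) per sample, as reported in the text.
SAMPLES = {
    "CC": (1431, 49194),
    "NC": (1435, 45383),
    "PBL": (1418, 53212),
    "H-PBL": (1415, 49005),
}

if __name__ == "__main__":
    for name, (length, n_reads) in SAMPLES.items():
        estimate = expected_pcr_errors(6.5e-7, 40, length, n_reads)
        print(name, round(estimate, 1))
```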
[0039] To verify the putative effect of PCR errors on the difference in mutability originally detected between the UCR core and the flanking regions, a number of low-frequency substitutions equal to the estimated number of PCR errors in each sample was randomly removed. The procedure was repeated 1,000 times and the distribution of observed mutability ratios between the flanking regions and the UCR core was derived. Applying the same reshuffling procedure used for the real samples, the distribution of expected ratios was also derived. The results of both simulations are reported in Table 3, together with the p-values of the comparison between observed and expected distributions. All statistical analyses were performed using the R statistical environment and ad hoc Perl scripts.
Serial Dilution
[0040] Dilution experiments were performed using the 157 bp long segment of eUCR41 corresponding to amplicon 9, which bears a single nucleotide polymorphism in position 1204 (SNP A/G; Figure 1A). This segment was amplified from the peripheral blood leukocytes of two healthy donors showing homozygous AA and GG genotypes, respectively (Samples 13 and 14, Table 4). After amplification, the regions were purified as described above and pooled in different relative amounts. Four final dilutions were obtained with decreasing G:A ratios (1:1,000; 1:2,000; 1:5,000; and 1:10,000, respectively). To correct for possible experimental inaccuracies during DNA quantification and pipetting, at each step of the serial dilutions DNA quantifications of the two alleles were performed using the Victor PicoGreen fluorometer (PerkinElmer Life Sciences). The obtained values were used to calibrate the successive dilution. The DNA samples corresponding to the four dilutions were sequenced in four distinct lanes, using a four-lane gasket for the 70x75 PicoTiterPlate device on the GS FLX Sequencer at BMR Genomics (Padua, Italy). Specificity was measured as TN/(TN+FP). The number of true negatives (TN) was calculated as the number of sequence positions showing errors at a frequency lower than the frequency of the diluted allele.
Table 4: Genotyping and Confirmation of High-Frequency Mutations
Figure imgf000026_0001
Legend: For both HNPCC patients (1-9) and healthy donors (10-18), the corresponding genotype of SNPs and somatic mutations in eUCR41 is reported in each analyzed individual, as detected by Sanger sequencing. The genotype was used to measure the minor allele frequency (MAF), defined as the frequency of the rare allele over the total. The similar values of the MAFs obtained with Sanger and with 454 allowed confirmation that the samples used in this study were pooled in equimolar ratios (Table 8). Clonal somatic mutations in sample CC of patients 5 and 6 are reported in italics, while the individuals used for the dilution series are shown in bold. Blood of patient 1 was not available for further analysis.
RESULTS
UCR Selection, Amplification and Sequencing
[0041] Starting from 481 UCRs (Bejerano et al., 2004, and its supporting online material at www.sciencemag.org/cgi/content/full/1098119/DC1 and at www.cse.ucsc.edu/~jill/ultra.html; the listing of UCRs and their details are herein incorporated by reference), the analysis was restricted to the 307 regions detectable in seven fully sequenced vertebrates (human, mouse, rat, cow, chicken, frog and fugu). The sequences of all UCRs were extended in both directions to allow the inclusion of non-conserved sequences. The resulting extended UCRs (eUCRs) were composed of the ultraconserved core and non-conserved tails. All 307 eUCRs were screened for genomic and functional properties that would favor the detection of a difference in mutability between the ultraconserved core and the flanking segments (Table 1). The best candidate was identified as eUCR41, a 1493 bp long region centered on a 217 bp long ultraconserved core (positions 654-870; Figure 1A). This region bears two known SNPs at positions 286 and 1204, has no coding activity and is located in a gene desert. It acts as a developmental enhancer in the mouse forebrain (Pennacchio et al., 2006) and might be transcribed in adult cells (Calin et al., 2007). Homopolymers composed of more than three identical nucleotides, which may favor pyrosequencing errors, were found to contribute only a small portion of the entire region (~8.2%). In addition, the base composition is similar inside and outside the ultraconserved core (Figure 1B).
[0042] DNA was extracted from the neoplastic colon mucosa, non-neoplastic colon mucosa and peripheral blood of nine HNPCC patients with known germline mutations in either the MLH1 or MSH2 gene. All tumor samples, six adenocarcinomas and three adenomas, were verified to display high degrees of MSI (Table 5). As a negative control, the peripheral blood of nine healthy donors was used. To amplify eUCR41, the region was divided into eleven overlapping segments (Figure 1A) and PCR errors were reduced by using the highest fidelity DNA polymerase available to date (Andre et al., 1997). To uniformly cover the region and minimize the contribution of single individuals, we pooled equimolar ratios of all amplicons from the different tissue types of each individual into four distinct samples: cancer colon (CC), non-neoplastic colon (NC), peripheral blood leukocytes (PBL), and healthy peripheral blood leukocytes (H-PBL). Each sample was sequenced on both sides using a fully dedicated run of ultra-deep pyrosequencing (Margulies et al., 2005). This allowed sequencing of more than 83 million single bases per sample, corresponding to an average coverage of more than 45,000 reads/bp (Table 6, Figure 2). After aligning all obtained reads to the reference sequence, the substitution frequency at each position, defined as the percentage of reads bearing a nucleotide different from the reference, was measured. Positions with high (>0.1%) and low (<0.1%) substitution frequency were distinguished (Table 6), according to the estimated detection power of the method (Thomas et al., 2006; Wang et al., 2007).
Table 5: HNPCC Samples Used for the Analysis
Figure imgf000028_0001
Legend: For each HNPCC patient, sex, germline mutation, histological type of tumor, and level of microsatellite instability (MSI) are indicated. Germline mutations are described following the guidelines of the Human Genome Variation Society (www.hgvs.org/mutnomen) . MSI was assessed in both adenomas and adenocarcinomas by checking for the presence of at least two unstable microsatellite markers (BAT25 and BAT26) (Boland et al . , 1998) .
Table 6: Statistics of the Ultra-Deep Sequencing Screening
Figure imgf000030_0001
Legend: For all four samples, the total number of sequence reads and bases are shown, together with the average length of the reads and the percentage of reads aligned to the reference sequence. The latter corresponds to the fraction of reads that passed the quality filter of 454 (see Materials and Methods). Shown also are the positions of eUCR41 (N=1493) with substitutions at high (>0.1%) and low (<0.1%) frequency. The threshold of 0.1% represents the detection power of 454. We used all positions with low substitution frequency to compare the mutability outside and inside UCR41 and to derive the corresponding mutability ratio. It is to be noted that, although the overall number of mutated positions in the four samples is comparable, the difference between HNPCC and healthy donors becomes appreciable when the mutability outside and inside UCR41 is compared. CC, colon cancer; NC, non-neoplastic mucosa; PBL, peripheral blood leukocytes; H-PBL, healthy peripheral blood leukocytes.
Analysis of High-Frequency Substitutions and Estimation of HNPCC Mutation Rate
[0043] After manual inspection, all but four high-frequency substitutions were discarded (Table 6). Errors were mostly generated by incorrect insertions and deletions (indels) in proximity to polynucleotide stretches, often at the end of the reads where the sequencing performance decreases (Table 7). In some cases, indels caused misalignments between the reads and the reference sequence, which resulted in false substitutions (Figures 3A-3D).
Table 7: Manual Inspection of Positions with High Frequency Errors
Figure imgf000031_0001
Legend: For each type of error, the possible source, the range of positions in the reference sequence, and resulting positions with errors in all four samples are reported. Most sequencing errors occur in close proximity to stretches of polynucleotides and result in hot spots of false insertions and deletions (indels). In some cases, indels also cause misalignments with the reference sequence, with consequent false substitutions. Representative flowgrams are shown in Figures 3A-3D for all four main error hot spots. In four positions, the sequencing errors are due to miscalls. We considered them as false substitutions because either they had similar substitution frequency in all four samples (116, 1444, 1445) or they were present only in one sequencing direction (345, present only in reverse amplicons). In these cases we do not show any flowgram because they are not informative of the error type.
[0044] Of the four high-frequency mutations that passed the manual inspection, two are the known SNPs detectable in all four samples and two are G:C to A:T clonal somatic transitions present only in sample CC (Figure 4A). We genotyped eUCR41 in all analyzed individuals (Table 4) and confirmed that the Minor Allele Frequency (MAF) of the two SNPs obtained with 454 was comparable with that inferred from Sanger sequencing (Table 8). This confirms that amplicons from the nine individuals were pooled in equimolar ratios in all four samples and that all of them contributed uniformly to the obtained results. Sanger sequencing also showed that the two somatic mutations are detectable in heterozygosis in two different patients (patients 5 and 6, Table 4). From the substitution frequency obtained from pyrosequencing (Table 8), it can be inferred that mutations 871 and 1095 occur in 37.0% and 23.4% of the corresponding PCR products. Considering that both are heterozygous mutations, they are present in about 74% and 47% of the diploid cancer genomes of patients 6 and 5, respectively. Therefore, these mutations bona fide reflect the expansion of the dominant neoplastic clones. Further experimental validations are needed to assess whether these two clonal mutations are drivers or passengers. The fact that both correspond to the wild-type nucleotide in mouse (A:T) suggests that they might be tolerated.
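The step from allele frequency to cell fraction in paragraph [0044] is simple arithmetic and can be sketched as below. The function name is hypothetical, and the sketch assumes a diploid genome with the mutation on one of the two alleles (heterozygous), as stated in the text.

```python
def cell_fraction_from_allele_fraction(allele_fraction, mutated_copies=1, ploidy=2):
    """Fraction of cells carrying a mutation, given the fraction of mutated alleles.

    Assumption: every carrier cell holds `mutated_copies` mutated alleles
    out of `ploidy` alleles.
    """
    return allele_fraction * ploidy / mutated_copies


if __name__ == "__main__":
    # Fractions of PCR products bearing mutations 871 and 1095, from the text.
    for position, fraction in ((871, 0.370), (1095, 0.234)):
        print(position, cell_fraction_from_allele_fraction(fraction))
```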
Table 8: MAF of the High Frequency Mutations in eUCR41
Figure imgf000033_0001
Legend: For both SNPs and somatic mutations (MUT) , the Minor Allele Frequency (MAF) in all samples is reported, as derived from 454 and Sanger sequencing. In the case of 454, MAF was calculated as the percentage of reads bearing the minor allele in each sample. In the Sanger screening, it corresponds to the fraction of minor alleles detected in the nine patients and in the nine healthy donors. Sanger genotyping confirmed that the two clonal mutations in sample CC are heterozygous mutations present in two different patients. Combining this information with the frequency in the 454 screening, it is possible to infer that they are present in about 74% and 47% of the cells of the two patients, respectively. CC, colon cancer; NC, non-neoplastic mucosa; PBL, peripheral blood leukocytes; H-PBL, healthy peripheral blood leukocytes.
[0045] The presence of clonal mutations can be used to estimate the mutation rate in HNPCC. In our experimental setting, the substitution frequency is inversely correlated to the time of insertion of each mutation during clonal expansion (Figure 4A) . This inverse correlation allowed us to simulate a model of cell proliferation that reproduces the progressive accumulation of mutations during cancer clonal expansion.
[0046] In our experimental setting, the mutation frequency allows for roughly assigning an "age" to each mutation detected in sample CC (Figures 5A and 6). In this case, the present inventors refer to "mutation frequency" and not to "substitution frequency" because the present inventors consider only true mutations and not errors. Clonal mutations, which are present in the vast majority of the neoplastic cell population, likely arose at early stages of clonal proliferation. Mutations with lower frequency, and therefore present only in a smaller fraction of the cell population, were instead introduced later. This inverse correlation allows the mutation rate to be estimated, defined as the number of mutations per sequenced nucleotide introduced at each cell division (muts/bp/cd). To derive the mutation rate of HNPCC, we simulated a model of cell proliferation that reproduces the progressive accumulation of mutations during cancer clonal expansion (Figure 5).
[0047] All parameters needed for the simulation were derived from the experimental data (Table 9) .
Table 9: Parameters for the Simulation Model
Figure imgf000034_0001
Legend: The numbers of founder cells and cell divisions were derived from the experimental data of sample CC. At each cell division, the mutation frequency at each sequence position was measured. This yielded a final number of expected mutations with frequency 2.6%-0.1% to be compared with the experimentally observed one.
[0048] Using these parameters, the present inventors developed an algorithm that simulates cell proliferation and the progressive inclusion of mutations. The algorithm is based on three steps: (1) Each simulation starts from nine founder cells, resembling the nine HNPCC carriers. The genome of each cell corresponds to a numerical string as long as eUCR41. All positions of the string are initially set to zero; (2) Eleven cycles of cell division are simulated to generate 36,864 final alleles. This number approximates the number of distinct DNA filaments that were experimentally sequenced; (3) At each cell division, mutations are introduced at a given mutation rate. Each mutation corresponds to a 0 to 1 transition in a random position of the string and is propagated to all daughter cells. Several values of mutation rate were tested and for each value 1,000 simulations were generated to obtain a corresponding distribution of expected mutations. At the end of each simulation, the mutation frequency, defined as the number of mutations divided by the number of final alleles, was calculated for each position of the string (Figure 5). To take into account the likely presence of non-neoplastic cells within the cancer sample, the resulting mutation frequency was multiplied by 0.7. This value approximates a reliable content of tumoral cells in the initial samples (~70%), as estimated using histological analysis (Thomas et al., 2006; Thomas et al., 2007) and confirmed by the frequency of the clonal mutation at position 871 (74%). Only observed and expected mutations with frequency between 2.6% (clonal mutations) and 0.1% (detection power) were compared. For each tested mutation rate, the probability of observing no mutations was calculated as the fraction of simulations bearing no mutations in the corresponding distribution (Table 10).
Table 10: Mutations Expected in Sample CC at Different Values of Mutation Rate
Figure imgf000036_0001
Legend: For each value of mutation rate, the expected median number of mutated positions derived from 1,000 simulations and with frequency 2.6-0.1% is shown. Values of the distributions at quantiles corresponding to probability p=0.01 and p=0.99 are reported in parentheses. The probability was derived by comparing the expected distribution to the observed number of mutated positions in sample CC within the same frequency range (0) . The analysis was done for the entire eUCR41 and restricted only to the flanking segments, assuming the frozen status of UCR41. The values of mutation rate that better approximate the number of observed mutated positions are in bold.
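The three-step algorithm of paragraph [0048] can be sketched as follows. This is a simplified illustration under stated assumptions, not the inventors' implementation: alleles rather than cells are treated as the replicating units (18 founder alleles), the number of new mutations at each division is drawn from a Poisson distribution, frequencies of independent mutations hitting the same position are simply added, and all function names are hypothetical.

```python
import math
import random

FOUNDER_ALLELES = 18   # nine diploid founder cells (from the text)
DIVISIONS = 11         # 18 * 2**11 = 36,864 final alleles (from the text)
REGION_LENGTH = 1493   # length of eUCR41 in bp (from the text)
TUMOR_FRACTION = 0.7   # tumoral-cell content of the sample (from the text)


def draw_poisson(rng, lam):
    """Knuth's Poisson sampler; normal approximation for large lam (assumption)."""
    if lam > 500:
        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_once(mutation_rate, rng):
    """Number of positions whose final mutation frequency lies in 0.1%-2.6%."""
    frequency = [0.0] * REGION_LENGTH
    final_alleles = FOUNDER_ALLELES * 2 ** DIVISIONS
    for division in range(1, DIVISIONS + 1):
        alleles_now = FOUNDER_ALLELES * 2 ** division
        # A mutation arising now is inherited by 2**(DIVISIONS - division) final alleles.
        share = 2 ** (DIVISIONS - division) / final_alleles
        new_mutations = draw_poisson(rng, alleles_now * REGION_LENGTH * mutation_rate)
        for _ in range(new_mutations):
            frequency[rng.randrange(REGION_LENGTH)] += share * TUMOR_FRACTION
    return sum(1 for f in frequency if 0.001 <= f <= 0.026)


def probability_of_no_mutations(mutation_rate, n_simulations=1000, seed=0):
    """Fraction of simulations with no position in the 0.1%-2.6% frequency range."""
    rng = random.Random(seed)
    results = [simulate_once(mutation_rate, rng) for _ in range(n_simulations)]
    return sum(1 for r in results if r == 0) / n_simulations
```

Calling `probability_of_no_mutations` over a grid of candidate rates gives the kind of comparison summarized in the table above; the exact values depend on the simplifying assumptions listed in the lead-in.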
[0049] The results of our simulations suggest that MMR-deficient genomes acquire mutations at a rate $\leq 6 \times 10^{-6}$ muts/bp/cd during the clonal expansion (Table 10). This value constitutes the upper limit of the mutation rate, because we cannot exclude that single base mutations occur at a lower rate. Our estimation represents the first measure of mutation rate inferred from the sequencing screening of more than 40,000 distinct DNA filaments.
[0050] It should be noted that this model relies on a number of assumptions: (1) the mutation rate is constant during the clonal expansion; (2) back-mutations are negligible events; and (3) the rate of cellular death is constant across all nine cell descents . These assumptions are acceptable considering that only eleven cell cycles are necessary to simulate the final number of experimentally analysed alleles.
[0051] According to the results obtained, the HNPCC genome acquires mutations at a rate $\leq 6 \times 10^{-6}$ mutations per base pair at each cell division (muts/bp/cd). This estimation represents the first measure of mutation rate inferred directly from sequencing data and is compatible with the estimated frequency of fixed mutations in MMR-deficient cancers ($3.2 \times 10^{-5}$ muts/bp) (Greenman et al., 2007).
Instability of HNPCC Neoplastic and Non-Neoplastic Genome
[0052] Low-frequency substitutions (<0.1%) likely consist of an indistinguishable mixture of recent true mutations and errors that have been introduced during DNA amplification and pyrosequencing. To reduce the impact of typical 454 errors, indels (insertions and deletions) were excluded from the analysis and only nucleotide substitutions were considered. The pattern of these substitutions is different and their frequency is lower (Table 11) than the recently estimated contribution of PCR errors (Campbell et al., 2008). This is likely due to the fact that the polymerase used was the one with the lowest error rate compared to all other thermostable polymerases with 3'-5' proofreading activity (Flaman et al., 1994; Cline et al., 1996; Andre et al., 1997). All low-frequency substitutions were used to measure the mutability of eUCR41, defined as the substitution frequency over the region (see Materials and Methods). To verify whether UCR41 is maintained as conserved in cancer cells, the mutability within eUCR41 was dynamically scanned using sliding windows as long as UCR41. While non-conserved segments of eUCR41 show mutability that is always higher than average, mutability decreases for increasing values of sequence conservation and reaches its minimum at the ultraconserved core (Figure 4B). To assess the significance of the inverse correlation between mutability and sequence conservation, the distribution of substitution frequency within the ultraconserved core was compared with that of its flanking regions. The two distributions were found to differ significantly in neoplastic and non-neoplastic HNPCC samples, but not in healthy donors (Table 12). We checked whether the difference in length and, although minimal (Figure 1B), in base composition between UCR41 and its flanking segments could contribute to the observed difference in mutability. We randomly reassigned the positions of all mutated bases in each sample 1,000,000 times, preserving the original base composition of the two regions. At each reshuffling step, the ratio between mutability outside and inside the ultraconserved core was derived and the resulting distribution was compared to the experimentally observed value (Table 12). We again detected significantly more substitutions in the flanking regions than in the ultraconserved core in all HNPCC samples but not in healthy donors (Figures 6A-6D).
Table 11: Frequency and Pattern of Low Frequency Substitutions
Figure imgf000039_0001
Legend: For each type of substitution, the frequency was calculated as the number of times that the substitution was observed divided by the number of times that the position was read.
Table 12: Substitution Frequency and Mutability Outside and Inside UCR41
Figure imgf000039_0002
Legend: For each sample, reported are the number of positions with low substitution frequency and the median substitution frequency outside and inside UCR41. At such a low substitution frequency, it is not possible to directly compare substitution frequencies between different samples because of the high contribution of run-specific errors. Only when comparing the distributions of substitution frequency outside and inside UCR41, is it clear that they differ significantly in all three HNPCC samples but not in H-PBL (Wilcoxon test) . To assess the probability of obtaining a mutability ratio equal or greater than the observed value, the positions with low substitution frequency in each sample were randomly reassigned 1,000,000 times. Also in this case, the observed mutability ratio in the HNPCC samples is significantly different from the expected (see also Figures 6A- 6D) . *One-tailed Wilcoxon test; **Two-tailed Wilcoxon test (alpha value = 0.05) .
Control for Possible Amplification and Sequencing Errors
[0053] Because we rely on low frequency substitutions for estimating genomic instability, it is instrumental to control for possible sources of noise that could invalidate our results. We therefore re-analyzed the data after filtering for different types of error. First, we removed all stretches of homopolymers (n>3) and two flanking bases on both sides, which are known to accumulate pyrosequencing artifacts (Campbell et al . , 2008) . Second, we removed all reads hosting at least one uncalled base, since they are prone to errors (Huse et al . , 2007) . Finally, we discarded all substitutions occurring only in one read, which bear most random errors (Wang et al . , 2007) . In all cases, the difference in mutability outside and inside UCR41 remains significant in all HNPCC samples and not significant in H-PBL (Table 13) .
Table 13: Mutability Comparison Outside and Inside UCR41 After Filtering for Possible Sequencing Errors
Figure imgf000041_0001
Legend: Reported is the number of positions with low substitution frequency (<0.1%) outside and inside UCR41 for each sample, after three different filters for sequencing errors were applied. After each filtering, the usual statistical analyses were applied. In particular, the distributions of substitution frequency outside and inside UCR41 were compared using the Wilcoxon test, while the observed mutability ratio was compared to the expected distribution after 1,000,000 random reshufflings . *Two-tailed Wilcoxon test (alpha value = 0.05) , **Probability of observing a mutability ratio equal or higher than the observed value, after 1,000,000 random reshufflings
[0054] Although we used the highest fidelity polymerase, we further checked whether PCR errors could have any impact on our results. We estimated that ~12-15% of low-frequency substitutions could be errors introduced by the DNA polymerase (see Materials and Methods). After randomly removing a comparable fraction of substitutions in all four samples, we again observed the same difference in mutability between outside and inside UCR41 in HNPCC and no difference in H-PBL (Table 3). This test indicates that PCR errors did not significantly affect the observed difference in mutability between the UCR core and its flanking regions.
[0055] Altogether, the data confirm our initial assumption that UCR41 is also maintained as ultraconserved in somatic cells and can therefore be used to normalize the experimental errors. At deep coverage, the mutation rate of the HNPCC genome allows detecting an increased occurrence of mutations in the flanking segments when compared to the ultraconserved core. No increase is detectable in the sample H-PBL, although UCR41 is very likely conserved here as well. In this case, the mutation rate of the healthy human genome is so low that sequencing errors outnumber true mutations in the entire region. The different behavior between HNPCC and healthy samples becomes more evident when the contribution of random errors decreases. For increasing cut-offs of substitution frequency, the mutability ratio increases in all HNPCC samples but not in H-PBL, where it is always around 1 (Figure 7). This result also excludes that the mutability ratio of the normal sample is due to a chance, non-homogeneous distribution of low-frequency substitutions between the ultraconserved core and the flanking segments.
Sensitivity and Specificity in Detecting Rare Substitutions
[0056] In order to experimentally assess the error rate associated with pyrosequencing, we performed a controlled dilution experiment, in which an amplicon carrying a single mutation (G, corresponding to the SNP at position 1204, Figure 1A) was diluted with the corresponding wild-type amplicon (A). At each step of the four controlled dilutions (1:1,000; 1:2,000; 1:5,000; and 1:10,000), wild-type and mutant amplicons were first quantified separately to control for experimental inaccuracy and then pooled. The four samples were sequenced using four distinct lanes. Although the expected coverage was 70,000 reads/lane, we obtained roughly twice as many reads for each sample, which indicates an optimal experimental setting (Table 14). By plotting the observed frequency of the mutated allele against the corresponding dilution, we observed a strict linear correlation ($R^2 > 0.99$), even for the most extreme dilution (Figure 8 and Table 14). This result demonstrates the high sensitivity of our procedure in detecting very rare mutations. As expected, we observed a decrease in specificity for decreasing values of substitution frequency (Table 14), which supports the need for UCR41 as an internal normalization of the experimental error.
Table 14: Sensitivity and Specificity in Detecting Rare Mutations
Figure imgf000043_0001
Legend: For each dilution value, reported are the total number of sequenced reads, the number of reads bearing the mutated allele (G), and the observed and expected substitution frequency. At each dilution value, we considered as errors all positions showing a substitution frequency equal to or higher than the corresponding frequency of the mutated allele. This allowed the measurement of the specificity, defined as the number of true negatives (156 - Errors) over all variable positions (156).
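The specificity defined in the legend above, TN/(TN+FP) with TN = 156 - Errors, can be sketched as below. The function name is hypothetical and the error count in the usage line is a placeholder, not a value taken from Table 14.

```python
def specificity(n_positions, n_errors):
    """Specificity = TN / (TN + FP), with FP = n_errors and TN = n_positions - n_errors."""
    if not 0 <= n_errors <= n_positions:
        raise ValueError("n_errors must lie between 0 and n_positions")
    true_negatives = n_positions - n_errors
    return true_negatives / n_positions


if __name__ == "__main__":
    # 156 variable positions as in the legend; 39 errors is an illustrative value only.
    print(specificity(156, 39))
```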
DISCUSSION
[0057] We exploited the frozen status of UCR41 to increase the sensitivity and specificity of ultra-deep sequencing and hence quantify cancer-associated genomic instability. The obtained results offered several insights into cancer genetics. We provided the first indication that an ultraconserved element is also conserved in somatic cells with conditions of genomic instability. This result suggests that genomic instability is not constant in all regions of the cancer genome and that, even in advanced tumoral stages such as carcinoma, certain genomic portions are utterly preserved from modifications. It remains to be verified whether all UCRs are under the same somatic conservation and what the reasons for this conservation are. In the case of UCR41, it could be a sign of strong purifying selection. In mouse embryos, UCR41 acts as an enhancer (Pennacchio et al., 2006), while in adult tissues it is transcribed into non-coding RNAs (Calin et al., 2007). In addition, a genomic rearrangement in the region between UCR41 and the downstream PROX1 gene can cause heart defects (Gill et al., 2009). Altogether, these observations may indeed indicate that UCR41 is under functional constraints that prevent its modification in both germline and somatic cells. However, the alternative hypothesis of UCR41 as a cold spot for mutations cannot be completely ruled out, given the report that mice lacking UCRs are in general viable and fertile (Ahituv et al., 2007).
[0058] Whatever the biological reason for its conservation may be, we proved that UCR41 can be used to measure genomic instability. The lack of sensitivity of standard sequencing methods, which only return a consensus of the most represented molecular species, has so far prevented clarification of several aspects of cancer-associated genomic instability. One of these concerns is the rate of inclusion of somatic mutations during tumor clonal expansion. Owing to the possibility of sequencing single DNA filaments, we were able to estimate that HNPCC cancer cells can acquire a maximum of 6 somatic mutations/Mb at each cell division during their clonal expansion. Assuming that replication errors hit random positions of the human genome, less than one out of 60 human genes will acquire one somatic mutation at each cell division. Of those, only mutations that are either neutral or beneficial for the cancer cell will be kept and eventually fixed into the population. Although the simulation is based on a small region of the human genome and relies on some assumptions, namely the model does not consider cell death, differentiation and apoptotic loss, it represents the first estimation of mutation rate based on sequence data.
[0059] The most striking result of this study is the finding that the genome of non-neoplastic HNPCC cells has a constitutional mutation rate higher than that of MMR-proficient genomes and, therefore, is deficient in repairing DNA (Figure 7). Although there are sporadic reports of low-frequency MSI (Parsons et al., 1995; Alazzouzi et al., 2005), HNPCC non-neoplastic cells are commonly assumed to repair DNA normally (Parsons et al., 1993; de la Chapelle, 2004). This lack of knowledge was due to the use of assays that require the presence of clonal mutations. For example, to reproduce the data of this experiment, several thousand different clones would have to be obtained and sequenced, with the concrete possibility of cloning PCR errors.
[0060] The data obtained in this study now clearly show that there is a tendency of the MMR+/- genome to accumulate low-frequency substitutions even before cancer transformation. This constitutional instability could predispose MMR+/- individuals to the inactivation of the second allele, which is a mandatory step to initiate carcinogenesis (Parsons et al., 1993; de la Chapelle, 2004). This finding highlights the importance of an early diagnosis of genomic instability for selecting the best clinical approach to prevent, slow down, or monitor the progression to cancer. A molecular test to reveal cancer predisposition could also restrict invasive surveillance examinations, such as colonoscopy and/or extracolonic screening of endometrium and ovary, to positive carriers only. To date, predisposition testing in family members with Lynch syndrome consists of genetic screening of the MMR genes to identify germline mutations (Lynch, 2007; Vasen et al., 2007). Our strategy here constitutes an alternative test for diagnosing cancer predisposition without any a priori knowledge of the mutated genes.
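The comparison underlying such a test, namely whether the mutation frequency in a non-conserved region exceeds that in the UCR used as internal control, can be sketched as follows. This is a minimal sketch under stated assumptions: the source does not specify which statistical test is used here, so a one-sided two-proportion z-test is substituted for illustration, and the function name, argument names, and example counts are all hypothetical.

```python
import math


def mutation_frequency_test(mut_nc, bases_nc, mut_ucr, bases_ucr):
    """One-sided two-proportion z-test (an illustrative choice, not the
    source's stated method): is the mutation frequency in the non-conserved
    region higher than in the UCR internal control?

    Returns (freq_nc, freq_ucr, z, p_value)."""
    f_nc = mut_nc / bases_nc
    f_ucr = mut_ucr / bases_ucr
    pooled = (mut_nc + mut_ucr) / (bases_nc + bases_ucr)
    se = math.sqrt(pooled * (1 - pooled) * (1 / bases_nc + 1 / bases_ucr))
    if se == 0:
        # No variation at all (e.g. no mutations in either region).
        return f_nc, f_ucr, 0.0, 1.0
    z = (f_nc - f_ucr) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return f_nc, f_ucr, z, p_value


# Hypothetical counts: mutated bases over total sequenced bases per region.
f_nc, f_ucr, z, p = mutation_frequency_test(
    mut_nc=120, bases_nc=4_000_000, mut_ucr=30, bases_ucr=2_000_000
)
print(f"non-conserved: {f_nc:.2e}, UCR: {f_ucr:.2e}, z = {z:.2f}, p = {p:.2g}")
```

The normal approximation is only reasonable when mutation counts are not very small; with sparse counts an exact test would be a more cautious choice.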
[0061] Having now fully described this invention, it will be appreciated by those skilled in the art that the same can be performed within a wide range of equivalent parameters, concentrations, and conditions without departing from the spirit and scope of the invention and without undue experimentation.
[0062] While this invention has been described in connection with specific embodiments thereof, it will be understood that it is capable of further modifications. This application is intended to cover any variations, uses, or adaptations of the inventions following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains and as may be applied to the essential features hereinbefore set forth as follows in the scope of the appended claims.
[0063] All references cited herein, including journal articles or abstracts, published or corresponding U.S. or foreign patent applications, issued U.S. or foreign patents, or any other references, are entirely incorporated by reference herein, including all data, tables, figures, and text presented in the cited references. Additionally, the entire contents of the references cited within the references cited herein are also entirely incorporated by reference.
[0064] Reference to known method steps, conventional method steps, known methods or conventional methods is not in any way an admission that any aspect, description or embodiment of the present invention is disclosed, taught or suggested in the relevant art.
[0065] The foregoing description of the specific embodiments will so fully reveal the general nature of the invention that others can, by applying knowledge within the skill of the art (including the contents of the references cited herein), readily modify and/or adapt for various applications such specific embodiments, without undue experimentation, without departing from the general concept of the present invention. Therefore, such adaptations and modifications are intended to be within the meaning and range of equivalents of the disclosed embodiments, based on the teaching and guidance presented herein. It is to be understood that the phraseology or terminology herein is for the purpose of description and not of limitation, such that the terminology or phraseology of the present specification is to be interpreted by the skilled artisan in light of the teachings and guidance presented herein, in combination with the knowledge of one of ordinary skill in the art.

REFERENCES
Aaltonen LA, Peltomaki P, Mecklin JP, Jarvinen H, Jass JR et al. (1994) Replication errors in benign and malignant tumors from hereditary nonpolyposis colorectal cancer patients. Cancer Res 54(7):1645-1648.
Ahituv N, Zhu Y, Visel A, Holt A, Afzal V et al. (2007) Deletion of ultraconserved elements yields viable mice. PLoS Biol 5(9):e234.
Alazzouzi H, Domingo E, Gonzalez S, Blanco I, Armengol M et al. (2005) Low levels of microsatellite instability characterize MLH1 and MSH2 HNPCC carriers before tumor diagnosis. Hum Mol Genet 14(2):235-239.
Andre P, Kim A, Khrapko K, Thilly WG (1997) Fidelity and mutational spectrum of Pfu DNA polymerase on a human mitochondrial DNA sequence. Genome Res 7(8):843-852.
Bailey JA, Yavor AM, Massa HF, Trask BJ, Eichler EE (2001) Segmental duplications: organization and impact within the current human genome project assembly. Genome Res 11(6):1005-1017.
Bejerano G, Pheasant M, Makunin I, Stephen S, Kent WJ et al. (2004) Ultraconserved elements in the human genome. Science 304(5675):1321-1325.
Bielas JH, Loeb LA (2005) Quantification of random genomic mutations. Nat Methods 2(4):285-290.
Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF et al. (2004) Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res 14(4):708-715.
Blasi MF, Ventura I, Aquilina G, Degan P, Bertario L et al. (2006) A human cell-based assay to evaluate the effects of alterations in the MLH1 mismatch repair gene. Cancer Res 66(18):9036-9044.
Boland CR, Thibodeau SN, Hamilton SR, Sidransky D, Eshleman JR et al. (1998) A National Cancer Institute Workshop on Microsatellite Instability for cancer detection and familial predisposition: development of international criteria for the determination of microsatellite instability in colorectal cancer. Cancer Res 58(22):5248-5257.
Calin GA, Liu C-g, Ferracin M, Hyslop T, Spizzo R et al. (2007) Ultraconserved Regions Encoding ncRNAs Are Altered in Human Leukemias and Carcinomas. Cancer Cell 12(3):215-229.
Campbell PJ, Pleasance ED, Stephens PJ, Dicks E, Rance R et al. (2008) Subclonal phylogenetic structures in cancer revealed by ultra-deep sequencing. Proc Natl Acad Sci U S A 105(35):13081-13086.
Canzian F, Salovaara R, Hemminki A, Kristo P, Chadwick RB et al. (1996) Semiautomated assessment of loss of heterozygosity and replication error in tumors. Cancer Res 56(14):3331-3337.
Cline J, Braman JC, Hogrefe HH (1996) PCR fidelity of pfu DNA polymerase and other thermostable DNA polymerases. Nucleic Acids Res 24(18):3546-3551.
Dabrowski S, Kur J (1998) Cloning and expression in Escherichia coli of the recombinant his-tagged DNA polymerases from Pyrococcus furiosus and Pyrococcus woesei. Protein Expr Purif 14(1):131-138.
de la Chapelle A (2004) Genetic predisposition to colorectal cancer. Nat Rev Cancer 4(10):769-780.
Derti A, Roth FP, Church GM, Wu CT (2006) Mammalian ultraconserved elements are strongly depleted among segmental duplications and copy number variants. Nat Genet 38(10):1216-1220.
Drake et al. (2006) Conserved noncoding sequences are selectively constrained and not mutation cold spots. Nat Genet 38:23-7.
Flaman JM, Frebourg T, Moreau V, Charbonnier F, Martin C et al. (1994) A rapid PCR fidelity assay. Nucleic Acids Res 22(15):3259-3260.
Gill HK, Parsons SR, Spalluto C, Davies AF, Knorz VJ et al. (2009) Separation of the PROX1 gene from upstream conserved elements in a complex inversion/translocation patient with hypoplastic left heart. Eur J Hum Genet.
Greenman C, Stephens P, Smith R, Dalgliesh GL, Hunter C et al. (2007) Patterns of somatic mutation in human cancer genomes. Nature 446(7132):153-158.
Huse SM, Huber JA, Morrison HG, Sogin ML, Welch DM (2007) Accuracy and quality of massively parallel DNA pyrosequencing. Genome Biol 8(7):R143.
Jones S, Zhang X, Parsons DW, Lin JC-H, Leary RJ et al. (2008) Core Signaling Pathways in Human Pancreatic Cancers Revealed by Global Genomic Analyses. Science 321(5897):1801-1806.
Jurka J (2000) Repbase update: a database and an electronic journal of repetitive elements. Trends Genet 16(9):418-420.
Katzman S, Kern AD, Bejerano G, Fewell G, Fulton L et al. (2007) Human genome ultraconserved elements are ultraselected. Science 317(5840):915.
Li M, Diehl F, Dressman D, Vogelstein B, Kinzler KW (2006) BEAMing up for detection and quantification of rare sequence variants. Nat Methods 3(2):95-97.
Loeb LA (1991) Mutator phenotype may be required for multistage carcinogenesis. Cancer Res 51(12):3075-3079.
Lynch HT, de la Chapelle A (2003) Hereditary colorectal cancer. N Engl J Med 348(10):919-932.
Lynch PM (2007) New issues in genetic counseling of hereditary colon cancer. Clin Cancer Res 13(22 Pt 2):6857s-6861s.
Margulies M, Egholm M, Altman WE, Attiya S, Bader JS et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature 437(7057):376-380.
McEwen GK, Woolfe A, Goode D, Vavouri T, Callaway H et al. (2006) Ancient duplicated conserved noncoding elements in vertebrates: a genomic and functional analysis. Genome Res 16(4):451-465.
Parsons DW, Jones S, Zhang X, Lin JC-H, Leary RJ et al. (2008) An Integrated Genomic Analysis of Human Glioblastoma Multiforme. Science 321(5897):1807-1812.
Parsons R, Li GM, Longley M, Modrich P, Liu B et al. (1995) Mismatch repair deficiency in phenotypically normal human cells. Science 268(5211):738-740.
Parsons R, Li GM, Longley MJ, Fang WH, Papadopoulos N et al. (1993) Hypermutability and mismatch repair deficiency in RER+ tumor cells. Cell 75(6):1227-1236.
Peltomaki P, de la Chapelle A (1997) Mutations predisposing to hereditary nonpolyposis colorectal cancer. Adv Cancer Res 71:93-119.
Pennacchio LA, Ahituv N, Moses AM, Prabhakar S, Nobrega MA et al. (2006) In vivo enhancer analysis of human conserved non-coding sequences. Nature 444(7118):499-502.
Shendure J, Ji H (2008) Next-generation DNA sequencing. Nat Biotechnol 26(10):1135-1145.
Sherry ST, Ward MH, Kholodov M, Baker J, Phan L et al. (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 29(1):308-311.
Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M et al. (2005) Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res 15(8):1034-1050.
Soreide K (2007) Molecular testing for microsatellite instability and DNA mismatch repair defects in hereditary and sporadic colorectal cancers--ready for prime time? Tumour Biol 28(5):290-300.
Thomas RK, Baker AC, Debiasi RM, Winckler W, Laframboise T et al. (2007) High-throughput oncogene mutation profiling in human cancer. Nat Genet 39(3):347-351.
Thomas RK, Nickerson E, Simons JF, Janne PA, Tengs T et al. (2006) Sensitive mutation detection in heterogeneous cancer specimens by massively parallel picoliter reactor sequencing. Nat Med 12(7):852-855.
Thorisson GA, Smith AV, Krishnan L, Stein LD (2005) The International HapMap Project Web site. Genome Res 15(11):1592-1593.
Vasen HF, Moslein G, Alonso A, Bernstein I, Bertario L et al. (2007) Guidelines for the clinical management of Lynch syndrome (hereditary non-polyposis cancer). J Med Genet 44(6):353-362.
Visel A, Prabhakar S, Akiyama JA, Shoukry M, Lewis KD et al. (2008) Ultraconservation identifies a small subset of extremely constrained developmental enhancers. Nat Genet 40(2):158-160.
Wang C, Mitsuya Y, Gharizadeh B, Ronaghi M, Shafer RW (2007) Characterization of mutation spectra with ultra-deep pyrosequencing: application to HIV-1 drug resistance. Genome Res 17(8):1195-1201.
Wood LD, Parsons DW, Jones S, Lin J, Sjoblom T et al. (2007) The genomic landscapes of human breast and colorectal cancers. Science 318(5853):1108-1113.
Yang R, Frank B, Hemminki K, Bartram CR, Wappenschmidt B et al. (2008) SNPs in ultraconserved elements and familial breast cancer risk. Carcinogenesis 29(2):351-355.

Claims

WHAT IS CLAIMED IS:
1. A method for detecting or diagnosing genomic instability in a subject, comprising: sequencing a genomic region that is not conserved among species, said region being obtained from DNA of cells of the subject; sequencing an ultra-conserved (UCR) genomic region that is under strong evolutionary constraints, said region being obtained from DNA of cells of the subject; and determining the difference in mutation frequency between the two regions, wherein a significant increase in mutation frequency in the non-conserved region as compared to the UCR under strong evolutionary constraints establishes a likelihood of genomic instability in the subject, whereby the UCR serves as an internal control to detect an increase in mutations in the non-conserved region.
2. The method of claim 1, wherein the non-conserved region and the UCR under strong evolutionary constraints that are sequenced are in non-coding regions of the genome.
3. The method of claim 1, wherein the non-conserved region and the UCR under strong evolutionary constraints that are sequenced each comprises at least 200 base pairs.
4. The method of claim 1, wherein the non-conserved region and the UCR under strong evolutionary constraints that are sequenced each comprises at least 400 base pairs.
5. The method of claim 3 or claim 4, wherein the length of the genomic region containing the non-conserved region that is sequenced is at least twice as long as the length of the UCR under strong evolutionary constraints that is sequenced.
6. The method of claim 1, wherein said sequencing steps are accomplished by ultra deep, single molecule sequencing technology.
7. The method of claim 6, wherein said sequencing steps provide at least 20,000 reads of the genomic regions.
8. The method of claim 1, wherein the subject from whom the samples are taken is a subject suspected of having a hereditary predisposition to Lynch syndrome.
9. The method of claim 1, wherein the non-conserved genomic region and the UCR under strong evolutionary constraints are obtained from the same cells of the subject.
10. The method of claim 9, wherein the cells of the subject from which the DNA of the non-conserved genomic region and the UCR under strong evolutionary constraints are obtained are blood cells.
11. The method of claim 1, wherein the non-conserved region has no more than 60% conservation between species.
12. The method of claim 1, wherein the subject is a patient suspected of having Lynch syndrome, wherein a statistically significant difference in mutation frequency between the non-conserved and UCR regions of the genome indicates a likelihood that the subject already has, or can develop, Lynch syndrome.
13. The method of claim 1, wherein the UCR under strong evolutionary constraints is UCR41.
14. The method of claim 1, wherein the non-conserved region is in a region of the genome flanking the UCR within about 600 to 700 base pairs from the UCR.
15. The method of claim 1, wherein the mutation rate of the subject is considered to be higher than normal if there is a statistically significant difference in mutation frequency between the UCR under strong evolutionary constraints and the non-conserved region.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US13608708P 2008-08-11 2008-08-11
US61/136,087 2008-08-11

Publications (1)

Publication Number Publication Date
WO2010019588A1 true WO2010019588A1 (en) 2010-02-18

Family

ID=41669242

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2009/053423 Ceased WO2010019588A1 (en) 2008-08-11 2009-08-11 Method for detecting or diagnosing genomic instability

Country Status (1)

Country Link
WO (1) WO2010019588A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7078168B2 (en) * 2001-02-27 2006-07-18 Biotage Ab Method for determining allele frequencies
US20070037185A1 (en) * 2005-05-11 2007-02-15 The Board Of Regents Of The University Of Texas System Estimating allele frequencies by small pool PCR

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BEJERANO ET AL.: "Ultraconserved elements in the human genome.", SCIENCE, vol. 304, no. 5675, 2004, pages 1321 - 1325 *
SIMONS ET AL.: "Transposon-free regions in mammalian genomes.", GENOME RESEARCH, vol. 16, no. 2, 2006, pages 164 - 172 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011101744A3 (en) * 2010-02-22 2011-12-08 Population Genetics Technologies Ltd. Region of interest extraction and normalization methods
US12494267B2 (en) 2010-05-18 2025-12-09 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US20120178077A1 (en) * 2010-08-31 2012-07-12 Canon U.S. Life Sciences, Inc. Thermal calibration
US20140011184A1 (en) * 2010-08-31 2014-01-09 Canon U.S. Life Sciences, Inc. Positive Controls
US10591364B2 (en) * 2010-08-31 2020-03-17 Canon U.S.A., Inc. Thermal calibration
US11022573B2 (en) * 2010-08-31 2021-06-01 Canon U.S.A., Inc. Positive controls
US20220213561A1 (en) * 2014-04-21 2022-07-07 Natera, Inc. Detecting mutations and ploidy in chromosomal segments
US12492429B2 (en) 2014-04-21 2025-12-09 Natera, Inc. Detecting mutations and ploidy in chromosomal segments

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09807173

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09807173

Country of ref document: EP

Kind code of ref document: A1