WO2016068627A1 - Method for analyzing absolute copy number variation based on a single sample - Google Patents
- Publication number
- WO2016068627A1 (PCT/KR2015/011515)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sample
- copy number
- purity
- sequence information
- target region
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/10—Ploidy or copy number detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B99/00—Subject matter not provided for in other groups of this subclass
Definitions
- the present invention relates to a method for analyzing an absolute copy number variation based on a single sample, and relates to a method for analyzing an absolute copy number in at least one target region for an experimental sample using only an experimental sample without a control sample.
- Copy number variation (CNV) is a type of structural variation.
- CNV refers to amplification or deletion of DNA fragments of 1 kb or more.
- CNV is present at a very high frequency of more than 10 percent in the human population, and the average size of CNV in an individual's genome is 3.5 ± 0.5 Mbp (0.1 percent).
- Many studies have demonstrated that CNV is associated with complex diseases such as autism, schizophrenia, Alzheimer's disease, and cancer.
- NGS: Next Generation Sequencing
- Non-Patent Document 1: Alkan C et al., Nature Genetics 41: 1061-1067; JL Hayes et al., Genomics, vol. 102, Issue 3, pp. 174-181, 2013; Chiang DY et al., Nature Methods 6: 677-681
- [Content of invention]
- One example of the present invention is to provide a method for analyzing an absolute copy number in at least one target region for a test sample using only a test sample without a control sample.
- Another example of the invention provides a computer-readable method for analyzing absolute copy numbers in at least one target region for a test sample.
- Another example of the invention provides a computer-readable storage medium (or recording medium) on which is recorded a computer program or computer-executable instructions for carrying out a method of analyzing absolute copy numbers in at least one target region for a test sample.
- Another example of the invention provides a computer-readable storage medium (or recording medium) containing a computer program or computer-executable instructions for executing the computer-readable method for analyzing an absolute copy number in at least one target region for a test sample.
- The method comprises: obtaining a read count by read-mapping the test sample sequence data to standard reference sequence data for each chromosomal position; calculating a target region ratio (TRR) based on the read count; estimating the purity of the test sample and the average copy number of the test sample in at least one target region; and predicting an absolute copy number of at least one target region in the test sample based on the calculated and estimated parameters.
- One example of the present invention relates to a method for determining the absolute copy number variation of a sample based on a single sample, using the sample sequence information for each chromosomal position and the target region ratio (TRR) of the sample.
- Another example of the present invention provides a computer program stored in a computer-readable storage medium for carrying out the steps of the computer-readable method.
- a further example of the present invention provides a computer readable storage medium (or recording medium) containing computer executable instructions for executing the steps of the computer readable method.
- The copy number of the test sample can be determined without a control sample, control sample sequence data, or read mapping of the control sample.
- The absolute copy number in at least one target region can be known, so the copy number in the desired target region can be determined accurately, and even when no control sample is available the method can be useful for discovering somatic copy number variation.
- FIG. 1 is a block diagram illustrating an analysis system for copy number variation based on a single sample according to an embodiment of the present invention.
- FIG. 2 is a block diagram for explaining an apparatus for analyzing copy number variation illustrated in FIG. 1.
- FIG. 3 is a block diagram illustrating a method for analyzing copy number variation performed in the apparatus for analyzing copy number variation shown in FIG. 1.
- FIG. 4 is a diagram for describing read mapping performed in the apparatus for analyzing copy number variation illustrated in FIG. 1.
- FIG. 5 is a diagram comparing a simulation result graph when using the copy number predictor shown in FIG. 1 and using the existing copy number predictor.
- FIG. 6 is a flowchart illustrating a method of analyzing absolute copy number variation based on a single sample according to an embodiment of the present invention.
- FIG. 7 illustrates a computer readable storage medium for executing the method for analyzing sample sequence information according to an embodiment of the present invention.
- FIG. 8 is a block diagram for explaining the sample sequence information analysis method executed in the analysis device for copy number variation shown in FIG. 1.
- FIG. 9 is a diagram for describing a method of calculating a frequency rate performed by the apparatus for analyzing copy number variation shown in FIG. 1.
- FIG. 10 is a diagram for describing a segmentation method performed in the apparatus for analyzing copy number variation shown in FIG. 1.
- FIG. 11 is a diagram for describing a node definition method for candidate extraction performed in the apparatus for analyzing copy number variation shown in FIG. 1.
- FIG. 12 is a diagram for describing a filtering method performed in the apparatus for analyzing copy number variation illustrated in FIG. 1.
- FIG. 13 is a view for explaining an estimation method performed in the analysis apparatus for copy number variation shown in FIG. 1.
- FIG. 14 is a diagram comparing graphs of sample purity simulation values with respect to sample estimate values estimated by the apparatus for analyzing copy number variation shown in FIG. 1.
- FIG. 15 is a flowchart illustrating a method of estimating sample purity and number of copies according to an embodiment of the present invention.
- a target region and a target base sequence mean a selected region (target region) and a base sequence (target base sequence) of the region, respectively, to be analyzed in the genome or chromosome.
- the target region and target base sequence may be present in one or more for one sample.
- The target region may be an arbitrary section to be analyzed in whole-genome analysis or, in targeted sequencing, a region for which probes are designed and selected for sequencing at the library preparation stage.
- the analysis system 1 for copy number variation may include a genome decoder 100 and an analysis apparatus 300 for copy number variation.
- The analysis system 1 for copy number variation of FIG. 1 is only one embodiment of the present invention, and the present invention is not to be construed as limited by FIG. 1.
- each component of FIG. 1 is generally connected through a network 200.
- the genome decoder 100 and the copy number analysis device 300 may be connected through the network 200.
- the genome reader 100 and the analysis device 300 for copy number variation may be directly connected.
- the network 200 refers to a connection structure capable of exchanging information between respective nodes such as terminals and servers.
- Examples of such a network 200 include WCDMA, the Internet, and a LAN (Local Area Network).
- the genome decoder 100 and the analysis apparatus 300 for copy number variation disclosed in FIG. 1 are not limited to those shown in FIG. 1.
- the genome decoder 100 may amplify DNA sequences, photograph fluorescent labels, and the like by photographing means, and perform image processing to parallelize DNA genetic information.
- the genome decoder 100 may be applied to a field for determining gene mutation, DNA copy number, and chromosome rearrangement.
- the genome decoder 100 may read a single DNA several times. The number of reads may be defined as a read count, and the read count may also be defined as a depth.
- The copy number variation analysis device 300 read-maps the test sample sequence data to standard reference sequence data to calculate a read count, and calculates a target region ratio (TRR) based on the read count.
- The analysis apparatus 300 for copy number variation does not require the control sample sequence data of a control sample in the process of calculating the TRR, which is described in detail through the equations below.
- The analysis apparatus 300 estimates the purity of the test sample, its average copy number, and the TRR at a preset copy number, and predicts the absolute copy number of the test sample based on the calculated and estimated parameters, namely the TRR, the purity, the average copy number, and the TRR at the preset copy number.
- The purity, the average copy number, and the TRR at the preset copy number may also be taken from externally supplied estimates.
- the analysis apparatus 300 for copy number variation may be implemented as a computer that can access a server or a terminal at a remote location through the network 200.
- the computer may include, for example, a notebook, a desktop, a laptop, and the like.
- FIG. 2 is a block diagram illustrating the apparatus for analyzing copy number variation shown in FIG. 1.
- FIG. 3 illustrates a method for analyzing copy number variation performed in the apparatus for analyzing copy number variation shown in FIG. 1.
- FIG. 4 is a diagram for explaining read mapping performed in the apparatus for analyzing copy number variation shown in FIG. 1.
- FIG. 5 compares simulation result graphs obtained with the copy number predictor shown in FIG. 1 and with an existing copy number predictor.
- The copy number predicting apparatus 300 may include a receiver 310, a calculator 330, an estimator 350, and a predictor 370.
- The receiver 310 may receive the test sample sequence data generated by the genome decoder 100 (S3100).
- the test sample sequence data may be data having a plurality of read counts by reading the test sample in the genome reader 100 a plurality of times.
- the test sample may be a cancer sample.
- The calculation unit 330 may read-map the received test sample sequence data to standard reference sequence data for each chromosomal position (S3200, S3300) and calculate a target region ratio based on the resulting read count (S3400).
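- The read count per target region that the calculation unit works from can be illustrated with a minimal sketch. It assumes the reads have already been mapped and that each read is represented only by its chromosome and start position; the function name `count_reads_per_region`, the half-open region convention, and the non-overlapping regions are illustrative assumptions, not part of the described apparatus.

```python
from bisect import bisect_right


def count_reads_per_region(read_positions, regions):
    """Count mapped reads whose start position falls inside each target region.

    read_positions: iterable of (chromosome, position) pairs for mapped reads.
    regions: list of (chromosome, start, end) tuples, half-open [start, end),
             assumed not to overlap on the same chromosome (illustrative assumption).
    Returns a list of read counts, one per region, in the order given.
    """
    counts = [0] * len(regions)

    # Group regions by chromosome and sort them by start for binary search.
    by_chrom = {}
    for idx, (chrom, start, end) in enumerate(regions):
        by_chrom.setdefault(chrom, []).append((start, end, idx))
    for chrom_regions in by_chrom.values():
        chrom_regions.sort()
    starts = {c: [s for s, _, _ in lst] for c, lst in by_chrom.items()}

    for chrom, pos in read_positions:
        if chrom not in by_chrom:
            continue  # read maps outside every target region
        k = bisect_right(starts[chrom], pos) - 1
        if k >= 0:
            start, end, idx = by_chrom[chrom][k]
            if start <= pos < end:
                counts[idx] += 1
    return counts
```

- For example, with regions `[("chr1", 100, 200), ("chr1", 300, 400)]`, a read starting at position 150 on chr1 is counted in the first region, and a read starting at position 200 is counted in neither.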
- A read count is calculated by (a) assuming that both the test sample sequence data and the control sample sequence data are present, and (b) read-mapping the control sample sequence data to standard reference sequence data, which is a human standard sequence.
- The TRR of the target region may be obtained based on the obtained read count.
- The TRR is the ratio of the read counts of the test sample data and the control sequence data in at least one target region, that is, the ratio of the read counts in target region i.
- the target region ratio (TRR) may be expressed by Equation 1 below.
- TRR is the ratio of the read counts of the test sample and the control sample in a specific target region,
- ti is the mapped read count of the test sample in target region i, and the control sample has a corresponding mapped read count in the same region,
- the control sample's total is the total read count of the mapped control sample,
- T is the total read count of the mapped test sample, which is the sum of ti over the at least one target region.
- the test sample may be a cancer sample and the control sample may be a normal sample.
- TRR is calculated for at least one target region, and TRR may be derived as in Equation 2 and Equation 3 below.
- The TRR according to the present invention may be calculated based on the ratio of the read count in at least one read-mapped target region to the total sum of the read counts.
- d can be divided into dt (test sample) and dn (control sample) according to the sample.
- T is the total read count of the test sample, and L is the total number of target areas.
- dn is the unit read count of the control sample,
- T is the total read count of the mapped test sample, which is the sum of ti over the at least one target region i.
- In Equation 3, the control sample sequence data of the control sample is not required. That is, the final TRR requires only the values of ti, T, and L, and ti, T, and L are all obtained from the test sample as described above.
- Because the method for analyzing absolute copy number variation based on a single sample calculates the TRR using only the test sample, the copy number variation can be predicted even if there is no control sample sequence data of a control sample serving as a control for the test sample.
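- A minimal sketch of a control-free TRR follows. The equation body is not legible in this text, so the form used here, TRR_i = L * ti / T, is an assumption derived from the statement that only ti, T, and L are needed; it corresponds to a control that contributes the same unit read count to every target region. The function name `single_sample_trr` is hypothetical.

```python
def single_sample_trr(read_counts):
    """Compute a target region ratio (TRR) per region from test-sample reads only.

    read_counts: list of mapped read counts ti, one per target region.
    Assumed form (not confirmed by the source): TRR_i = L * ti / T, where
    L is the number of target regions and T is the sum of all ti.
    """
    n_regions = len(read_counts)      # L
    total = sum(read_counts)          # T
    if total == 0:
        raise ValueError("total read count T must be positive")
    return [n_regions * t / total for t in read_counts]
```

- Under this assumed form the TRR values average to 1 across the target regions, so a region with TRR above 1 carries more reads than the average region and a region below 1 carries fewer.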
- T, which is the sum of ti in at least one target region, may be summarized as in Equation 4 below.
- The total read count T can be calculated as follows, where T is the total read count over the target regions of the test sample and Lj is the total number of target regions with copy number j. Since the average copy number J of a pure test sample, for example a cancer sample, can be expressed by Equation 5 below, T can be expressed by Equation 6 below. Like T, ti can be expressed as in Equation 7.
- That is, the read count of the test sample sequence data in the at least one target region i can be defined as in Equation 7.
- Equation 8: $\mathrm{TRR}_i = \dfrac{2 + (j - 2)\alpha}{2 + (J - 2)\alpha}$
- Equation 9
- α is the purity of the test sample
- J is the average copy number of the test sample
- j is the number of copies of the i th target region of the test sample.
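- The relationship between purity, average copy number, and TRR can be sketched as a forward model. Reading Equation 8 as TRR = (2 + (j - 2)α) / (2 + (J - 2)α) is an assumption based on the partially legible equation and on normal cells contributing two copies; the function names are hypothetical.

```python
def mean_copy_number(copy_numbers):
    """Average copy number J over the target regions (one copy number per region)."""
    return sum(copy_numbers) / len(copy_numbers)


def expected_trr(j, purity, avg_copy_number):
    """Expected TRR of a region with tumor copy number j.

    Assumed model: tumor cells (fraction `purity`) carry j copies and normal
    cells (fraction 1 - purity) carry 2 copies, normalized by the same
    mixture evaluated at the average copy number J.
    """
    numerator = 2 + (j - 2) * purity
    denominator = 2 + (avg_copy_number - 2) * purity
    return numerator / denominator
```

- With a purity of 0.5 and an average copy number of 2, this model gives a TRR of 0.75, 1.0, and 1.5 for copy numbers 1, 2, and 4, which shows why the TRR alone does not reveal the absolute copy number unless the purity is known.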
- J and α can be grouped into A, as in Equation 8, to give Equation 10.
- J and α can be received as input from the outside. That is, an estimate may be received from an external pathology specialist, estimated using a microarray, or obtained through another method (S3500).
- the prediction unit 370 may predict the absolute copy number of the experimental sample based on the calculated and estimated parameters.
- the calculated parameter may be TRR
- The estimated parameters may be J and α. That is, if the prediction unit 370 defines S^j as an absolute copy number score as in Equation 11 based on the calculated and estimated parameters, Equation 10 may be converted as shown in Equation 12, and the absolute copy number j in the i-th target region may be calculated.
- S^j is the absolute copy number score when the i-th target region of the test sample has copy number j,
- A is as defined in Equation 9 and can be calculated from the estimated parameters J and α,
- TRR^j is the TRR when the i-th target region of the test sample has copy number j,
- TRR^1 is the TRR of the test sample at a copy number of 1 in at least one target region.
- The TRR is calculated using the read count measured from a test sample, for example a cancer sample, and the copy number can be determined from the value obtained by dividing (TRR^j - A) by (TRR^1 - A).
- TRR^1 means the TRR of a test sample at a copy number of 1 in at least one target region, and A can be calculated from the estimated parameters J (average copy number of the test sample) and α (purity of the test sample).
- a TRR which is a preset number of copies in at least one target region, may be estimated.
- the preset copy number may be 1 and may be represented by TRR 1 , where TRR 1 may mean a TRR having 1 copy number in at least one target region.
- the absolute copy number of the test sample may be the absolute copy number of the test sample in at least one target region.
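- The score described above, (TRR - A) divided by (TRR at copy number 1 - A), can be sketched as follows. The exact form of A is not legible in this text, so the sketch assumes A is the normal-cell contribution 2(1 - α) / (2 + (J - 2)α), and that the TRR at copy number 1 is (2 - α) / (2 + (J - 2)α); under those assumptions the score equals the copy number itself. The function name is hypothetical.

```python
def absolute_copy_number_score(trr, purity, avg_copy_number):
    """Convert a measured TRR into an absolute copy number score.

    Assumptions (not confirmed by the source): normal cells carry 2 copies,
    A = 2 * (1 - purity) / denom, and the TRR at copy number 1 is
    (2 - purity) / denom, with denom = 2 + (J - 2) * purity.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    denom = 2 + (avg_copy_number - 2) * purity
    a = 2 * (1 - purity) / denom      # assumed form of A
    trr_one = (2 - purity) / denom    # assumed TRR at copy number 1
    return (trr - a) / (trr_one - a)
```

- For a purity of 0.5 and an average copy number of 2, a TRR of 1.5 maps to a score of 4 and a TRR of 1.0 maps to a score of 2; in practice the score would be rounded to the nearest non-negative integer.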
- The method for analyzing absolute copy number variation based on a single sample according to an embodiment of the present invention can predict an absolute copy number in at least one target region using only the test sample, even if no control sample or control sample sequence data exists. If the absolute copy number in the at least one target region can be identified through the method, accurate treatment of the patient and clinical trials may become possible.
- In FIG. 5, (a) shows the results of simulating the TRR according to the chromosomal position, and (b) shows the results of simulating S^j according to the chromosomal position according to an embodiment of the present invention.
- (a) shows a simulation result at a cancer sample purity of 50%; since only the TRR is shown, the absolute copy number cannot be known, whereas (b) shows each segment aligned to its absolute copy number, so the absolute copy number in the target region can be identified.
- Matters not described for the method of analyzing absolute copy number variation based on a single sample of FIGS. 2 to 5 are the same as, or can be easily inferred from, the contents described above, and their description is omitted.
- FIG. 6 is a flowchart illustrating a method of analyzing absolute copy number variation based on a single sample according to an embodiment of the present invention.
- the apparatus for analyzing copy number variation receives experimental sample sequence data generated by a genome sequencer (S6100).
- The apparatus for analyzing copy number variation calculates a target region ratio (TRR) based on a read count obtained by read-mapping the received test sample sequence data to standard reference sequence data for each chromosomal position (S6200).
- the analysis apparatus for copy number variation estimates the purity of the test sample and the average copy number of the test sample in the at least one target region (S6300).
- The analysis apparatus for copy number variation predicts the absolute copy number of the test sample based on the calculated and estimated parameters (S6400). Matters not described for the method of FIG. 6 are the same as, or can be easily inferred from, those described above with reference to FIGS. 1 to 5, and their description is omitted.
- Another example of the invention relates to a method for analyzing absolute copy number variation based on a single sample.
- the method for analyzing absolute copy number variation based on a single sample according to the present invention may comprise the following steps:
- The step of obtaining a read count by read-mapping the sample sequence information to reference sequence information for each chromosome position can be performed.
- the sample sequence information and the reference sequence information can be obtained by a conventional sequence information analysis method. For example, as a method of analyzing through a sequencer, large-scale parallel sequencing such as next-generation sequencing can be performed on a test sample.
- The obtained sequence information may be prepared in a form stored in a data storage medium or obtained through a network data transmission/reception apparatus. In one embodiment of the invention, it may be received using the genome sequencer 100 shown in the sequence information analysis system 1 of FIG. 1. However, the sample sequence information analysis system 1 of FIG. 1 is only one embodiment of the present invention, and the present invention is not limited to FIG. 1.
- the sample sequence information means sequence information of a sample to be analyzed, and the reference sequence information is a reference genome sequence, which is a genome sequence representing one species.
- A human reference genome may be constructed based on published (e.g., UCSC, NCBI) reference genomic sequences such as build 37 (GRCh37), hg18, hg19, and hg38.
- the sample sequence information or the reference sequence information may be obtained by, for example, a large-scale parallel sequencing method in the next-generation sequencing method, and the sequence information, the read depth, or the read count number may be obtained by using the next-generation sequencing method.
- The polynucleotide fragment is a read used in next-generation sequencing, the number of polynucleotide fragments is the read count or read depth, and the average number of polynucleotide fragments may be the average number of reads.
- Such sequencing means that a single genome is randomly broken into innumerable polynucleotide fragments, the sequences of the fragments are read simultaneously, and the sequence data thus obtained are combined using bioinformatics to decipher a large amount of genomic information.
- The next-generation sequencing method may use, for example, the 454 platform (Margulies, et al., Nature (2005) 437: 376-380), Illumina Genome Analyzer (or Solexa™ platform), Illumina HiSeq2000, HiSeq2500, MiSeq, NextSeq500, Life Tech Ion PGM, Ion Proton, Ion S5, Ion S5XL, SOLiD (Applied Biosystems), Helicos True Single Molecule DNA Sequencing Technology (Harris, et al., Science (2008) 320: 106-109), the single molecule, real-time (SMRT™) technology of Pacific Biosciences, or the like.
- Large-scale parallel sequencing is also possible by nanopore sequencing (Soni and Meller, Clin Chem (2007) 53: 1996-2001).
- These platforms allow sequencing with high-order multiplexing in a parallel fashion (Dear, Brief Funct Genomic Proteomic (2003) 1: 397-416). Each of these platforms sequences clonally expanded or non-amplified single molecules of nucleic acid fragments. Sequence information of polynucleotide fragments can be obtained using commercially available sequencing instruments. In addition, the sequencing may be performed by various other known sequencing methods and/or modifications thereof.
- the sample sequence information may be whole genome sequence information or sequence information of a selected target region.
- a target region and a target base sequence mean a selected region (target region) and a base sequence (target base sequence) of the region, respectively, to be analyzed in the genome or chromosome.
- the target region and target base sequence may be present in one or more for one sample.
- The target region is an arbitrary region to be analyzed in whole-genome sequencing and, in targeted sequencing, may mean a region for which probes are designed and selected for sequencing at library preparation.
- When sample sequence information is obtained through next-generation sequencing, either the entire genome or a specific selected region, that is, a target region, may be sequenced, and the sequence information of the target region may be used as the sample sequence information.
- the targeted sequencing method using the NGS method is, for example
- The sample sequence information, for example the sample sequence information obtained from the genome sequencer 100, may be mapped to the reference sequence information for each position on the chromosome (S3200). For example, this may be performed by the calculation unit 330 of the analysis device 300 of the sample sequence information of FIG.
- the sample sequence information may be data having a plurality of read counts by reading a plurality of test samples from the genome sequencer 100.
- the test sample may be a cancer sample.
- The read count for each target region of the sample sequence information may be calculated while read-mapping the test sample sequence information.
- the read count may be calculated in at least one target region located in the sample sequence information.
- the purity of the test sample in at least one target region and the average copy number of the test sample can be estimated.
- a method for estimating the purity of an experimental sample and an average copy number of a test sample in a single sample may be performed by measuring at least one parameter including the purity of the test sample and the average copy number, and using the parameters to detect somatic mutations.
- Hereinafter, step (2) is described in detail by dividing it into detailed steps.
- Step (i) calculates the frequency rate of the different allele (BAF: B Allele Frequency) based on the frequency of the allele that is the same between the sample sequence information and the reference sequence information and the frequency of the allele that differs between them.
- The sample may be a cancer cell sample containing cells transformed into cancer cells. If the number of copies of the allele that is the same between the sample sequence information and the reference sequence information is n, the number of copies of the allele that differs between them is m, and the purity of the sample is α, the frequency of the same allele (A) and the frequency of the different allele (B) can be defined as in Equation 13 and Equation 14, respectively.
- In Equations 13 and 14, n is the number of copies of the same allele, m is the number of copies of the different allele, and m and n are each 0 or a natural number,
- α is the purity of the sample,
- Fb is the frequency of different alleles (B).
- The purity of a sample (tumor purity or tumor cellularity) can be expressed as the proportion of tumor cells among the total number of cells in the sample.
- For a biopsy of a cancer sample, it means the proportion of cancer-derived cells only, excluding the normal cells (stromal cells, white blood cells, etc.) contained in the sample.
- In Equation 15, the definitions of α, n, m, Fa, and Fb are the same as in Equations 13 and 14.
- the frequency rate of different alleles can be calculated based on the number of copies of the same allele, the number of copies of the different alleles, and the purity of the experimental sample.
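- A minimal sketch of how a B allele frequency could follow from n, m, and the purity is given here. Equations 13 to 15 are not legible in this text, so the mixture model is an assumption: tumor cells carry n copies of the same allele and m copies of the different allele, and normal cells are heterozygous with one copy of each. The function name is hypothetical.

```python
def b_allele_frequency(n, m, purity):
    """B allele frequency of a tumor/normal mixture at a heterozygous site.

    Assumed model: a fraction `purity` of cells are tumor cells with n copies
    of allele A and m copies of allele B; the remaining cells are normal
    cells with one copy of each allele.
    """
    a_copies = purity * n + (1 - purity)   # average copies of allele A per cell
    b_copies = purity * m + (1 - purity)   # average copies of allele B per cell
    total = a_copies + b_copies
    if total == 0:
        raise ValueError("BAF is undefined when no allele copies remain")
    return b_copies / total
```

- Under this model a balanced region (n = m) always has a BAF of 0.5 regardless of purity, while a region with n = 0 and m = 1 moves from 0.5 toward 1.0 as the purity rises.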
- step (ii) may segment the sample sequence information based on BAFs of different alleles of the sample sequence information.
- The segmentation of the sequence information means finding and dividing regions in which the average frequency of the different allele differs from one another, for example by taking an arbitrary region and applying a t-test to the mean.
- the division of the sequence information may be performed by various methods, and the division method includes, for example, a circular binary segmentation (CBS) method, but is not limited thereto.
- the segment refers to a group of sequence information, in which the average of different allele BAFs in the sequence information of the sample is the same, and refers to the black bar portion shown in FIG. 10 (c).
- In FIG. 10, (a) is a BAF graph of a control sample and (b) is a BAF graph of a test sample.
- The BAF graph may be segmented using CBS (Circular Binary Segmentation) or another segmentation method.
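- The idea of splitting a BAF profile where the segment means differ can be sketched with a simplified recursive binary segmentation that uses a Welch t statistic. This is not CBS itself, which uses a circular statistic and permutation testing; the threshold, minimum segment size, and function names are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch t statistic for the difference of means of two samples."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0 if mean(a) == mean(b) else float("inf")
    return (mean(a) - mean(b)) / se


def best_split(values, min_size):
    """Return (index, |t|) of the split that best separates the two means."""
    best_idx, best_t = None, 0.0
    for k in range(min_size, len(values) - min_size + 1):
        t = abs(welch_t(values[:k], values[k:]))
        if t > best_t:
            best_idx, best_t = k, t
    return best_idx, best_t


def segment(values, threshold=4.0, min_size=3, offset=0):
    """Recursively split `values` into half-open index ranges with similar means."""
    idx, t = best_split(values, min_size)
    if idx is None or t < threshold:
        return [(offset, offset + len(values))]
    left = segment(values[:idx], threshold, min_size, offset)
    right = segment(values[idx:], threshold, min_size, offset + idx)
    return left + right
```

- Each returned pair is a half-open index range into the BAF values; the mean BAF of each range can then be treated as the BAF of that segment.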
- In step (iii) of the method for analyzing sample sequence information according to the present invention, copy number and sample purity candidates of the sample can be extracted by applying at least one segment to the copy number model of the frequency rate with respect to the sample purity.
- The copy number model of the frequency rate with respect to the sample purity may be an n-m plot model.
- Each node (node1, node2, ..., node6) can be defined, and since each node contains the values (n, m, α, Fa, Fb), copy numbers of the different alleles and sample purity candidates can be extracted when candidate nodes are selected.
- Equation 15 may be converted as shown in Equation 16. Substituting each segment into the n-m plot model, an α candidate may be derived, and this is defined as a node candidate or a sample purity candidate. Further, candidate values of the copy number (m, n) can also be obtained from the sample purity candidate values.
- For example, assuming that n is 0, m is 1, and the BAF of a segment is 0.7, substituting each parameter into Equation 16 gives α = 0.57, and substituting n, m, and α into Equations 13 and 14 gives Fa = 0.3 and Fb = 1.0, so the node in this case is (0, 1, 0.57, 0.3, 1.0).
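- The extraction of purity candidates from a segment BAF can be sketched by inverting an assumed mixture model for each (n, m) pair. The model (heterozygous normal cells with one copy of each allele) and the resulting closed form are assumptions, since the equations are not legible here; they do reproduce the purity of 0.57 for n = 0, m = 1, and a BAF of 0.7. The function names and the `max_copies` limit are hypothetical.

```python
def purity_from_baf(n, m, baf):
    """Solve BAF = (a*m + 1 - a) / (a*(n + m) + 2*(1 - a)) for the purity a.

    Returns None when no purity in (0, 1] satisfies the assumed model.
    """
    denom = baf * (n + m - 2) - (m - 1)
    if abs(denom) < 1e-12:
        return None  # BAF carries no purity information for this (n, m)
    alpha = (1 - 2 * baf) / denom
    return alpha if 0 < alpha <= 1 else None


def candidate_nodes(baf, max_copies=3):
    """List (n, m, purity) candidates consistent with one segment's BAF."""
    nodes = []
    for n in range(max_copies + 1):
        for m in range(max_copies + 1):
            if n == 0 and m == 0:
                continue  # no tumor alleles left: BAF stays 0.5 at any purity
            alpha = purity_from_baf(n, m, baf)
            if alpha is not None:
                nodes.append((n, m, round(alpha, 2)))
    return nodes
```

- Calling `candidate_nodes(0.7)` returns several (n, m, purity) triples, including (0, 1, 0.57); a single segment therefore cannot fix the purity on its own, which is why the candidates are filtered further.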
- Step (iv) may filter the sample purity and copy number candidates extracted in step (iii) through at least one filter, and estimate the filtered sample purity and copy number as the sample purity and copy number of the test sample, respectively.
- the at least one filter may include at least one filter selected from the group consisting of a ratio filter, a copy number filter, and a unit filter.
- All of the ratio filter, the copy number filter, and the unit filter may be used for filtering.
- The ratio filter may be a filter that checks whether the ratio of the TRR based on the read count in at least one target region to the target region ratio (TRR) having a predetermined number of read counts is the same, and may be defined as in Equation 17 below.
- TRR is a measured value obtained from read mapping of the sequence information of a test sample.
- the copy number filter may filter whether the average copy number of the test sample is the same and may be defined as in Equation 18 below.
- Using the copy number filter, the estimating step may retain, among the sample candidates obtained in the extraction step, all candidates having the same average copy number (J) in Equation 18.
- the unit filter may be a filter for filtering whether the read count of the unit region is the same among at least one target region, and may be defined as in Equation 19 below.
- d may mean a unit read count, that is, the read count of a unit region in which at least one target region has a copy number of 1. Using the unit filter, the estimating step may filter the sample candidates obtained in the extraction step so that only the candidates having the same unit read count (d) in Equation 19 remain.
- The candidates extracted in the extraction step are defined as nodes 1 to 6 (node1, ..., node6), and at least one filter is applied to them simultaneously or sequentially.
- Through filtering, an obtained sample candidate, i.e., a node, may be removed.
- Three filters may be used; this does not mean that they must be applied sequentially.
- The remaining nodes may be identified through filtering. For example, when nodes 3 and 5 are finally selected, it can be seen that the segments correspond to a test sample purity of 0.7.
- Based on the information of the last remaining node, the sample purity, Fa, Fb, the copy number n of the same allele, and the copy number m of the different allele can be found.
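The node filtering described above can be pictured with a small sketch. Equations 17 to 19 are not reproduced in this text, so the predicates below are hypothetical placeholders rather than the ratio, copy number, and unit filters themselves. The `Node` fields follow the (n, m, α, Fa, Fb) tuple from the description; every node value other than (0, 1, 0.57, 0.3, 1.0) is made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Node:
    n: int        # copy number of the allele matching the reference
    m: int        # copy number of the allele differing from the reference
    alpha: float  # sample purity candidate
    fa: float
    fb: float


NodeFilter = Callable[[Node], bool]


def apply_filters(nodes: Iterable[Node], filters: Iterable[NodeFilter]) -> List[Node]:
    """Keep only the nodes that pass every filter (simultaneous application)."""
    checks = list(filters)
    return [node for node in nodes if all(check(node) for check in checks)]


# Toy candidates; only the first tuple comes from the description.
candidates = [
    Node(0, 1, 0.57, 0.3, 1.0),
    Node(1, 2, 0.70, 0.4, 1.2),
    Node(0, 2, 0.70, 0.3, 1.3),
    Node(2, 3, 0.90, 0.5, 1.6),
]

# Placeholder predicates standing in for the filters of Equations 17-19.
placeholder_filters = [
    lambda nd: abs(nd.alpha - 0.70) < 0.05,  # hypothetical consistency check on purity
    lambda nd: nd.n + nd.m <= 4,             # hypothetical bound on total copy number
]

survivors = apply_filters(candidates, placeholder_filters)
print([(nd.n, nd.m, nd.alpha) for nd in survivors])
```

Because `all(...)` evaluates every predicate against each node, the order of the filters does not change which nodes survive, which matches the statement that the filters need not be applied in sequence.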
- FIG. 15 is a flowchart illustrating a method of analyzing sample sequence information according to an embodiment of the present invention.
- The apparatus for analyzing sample sequence information receives sample sequence information generated by a genome sequencer and read-maps it to reference sequence information for each chromosomal position (S1100).
- The apparatus for analyzing sample sequence information calculates the frequency rate of the different alleles (B allele frequency, BAF) based on the frequency of alleles for which the sample sequence information and the reference sequence information are the same (A allele) and alleles for which they differ (B allele).
- The apparatus for analyzing sample sequence information segments the sample sequence information based on the BAF (S1300).
- The apparatus for analyzing sample sequence information applies the at least one resulting segment to a copy number model of the frequency rate with respect to sample purity to extract candidates for the copy numbers of the different alleles and for the sample purity (S1400).
- the apparatus for analyzing sample sequence information estimates the purity and average copy number of the experimental sample using at least one filter (S1500).
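The BAF and segmentation steps of the flow above can be sketched as follows. The BAF is taken here as the fraction of reads carrying the B allele, which is the usual definition and an assumption about the patent's exact formula. The text does not say how segmentation is performed, so `segment_by_baf` uses a naive stand-in that starts a new segment whenever consecutive BAF values differ by more than a tolerance; both function names and the tolerance are hypothetical.

```python
from typing import List, Optional, Tuple


def b_allele_frequency(a_reads: int, b_reads: int) -> Optional[float]:
    """Fraction of reads carrying the B allele (differs from the reference)."""
    total = a_reads + b_reads
    return b_reads / total if total > 0 else None


def segment_by_baf(bafs: List[float], tol: float = 0.1) -> List[Tuple[int, int, float]]:
    """Naive stand-in segmentation: break where adjacent BAF values jump by > tol.

    Returns (start_index, end_index, mean_baf) for each segment.
    """
    segments = []
    start = 0
    for i in range(1, len(bafs) + 1):
        # Close the current segment at the end of the data or at a large jump
        if i == len(bafs) or abs(bafs[i] - bafs[i - 1]) > tol:
            chunk = bafs[start:i]
            segments.append((start, i - 1, sum(chunk) / len(chunk)))
            start = i
    return segments


# Toy per-position read counts (A reads, B reads); values are illustrative only.
counts = [(50, 50), (48, 52), (52, 48), (30, 70), (29, 71), (31, 69)]
bafs = [b_allele_frequency(a, b) for a, b in counts]
for start, end, mean_baf in segment_by_baf(bafs):
    print(start, end, round(mean_baf, 2))
```

A production pipeline would normally use a dedicated change-point method instead of a fixed jump threshold; the threshold is used here only to keep the example short.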
- In the method for analyzing absolute copy number variation based on a single sample according to the invention, the read count per target region of the sample sequence information can be calculated while reading the sequence information of the experimental sample.
- the read count may be calculated in at least one target region located in the sample sequence information.
- The calculation unit 330 may calculate a target region ratio (TRR) (S3400) based on the read counts obtained by read-mapping the received test sample sequence data to standard reference sequence data for each chromosomal position (S3200, S3300).
- TRR: target region ratio.
- Control sample sequence data, which would serve as a control for the experimental sample, is not required in the analysis apparatus 300 for copy number variation according to an embodiment of the present invention.
- the TRR of the target area may be obtained based on the obtained read count.
- TRR is the ratio between the read counts of the experimental sample data and the control sequence data in at least one target region, i.e., the ratio of the read counts in target region i.
- the target region ratio (TRR) may be expressed by Equation 1 below.
- Here, TRR_i is the ratio of the read counts of the test sample and the control sample in a specific target region i; t_i and n_i are the mapped read counts of the test sample and the control sample, respectively, in target region i; N is the total mapped read count of the control sample; and T is the total mapped read count of the test sample, i.e., the sum of t_i over the at least one target region.
- the test sample may be a cancer sample and the control sample may be a normal sample.
- TRR is calculated for at least one target region and may be derived as in Equation 2 and Equation 3 below. Thus, as shown in Equation 2 or Equation 3, the TRR according to the present invention can be calculated based on the ratio between the read count in the at least one read-mapped target region and the total sum of the read counts.
- d can be divided into dt (test sample) and dn (control sample) according to the sample.
- T is the total read count of the target region of the test sample, and Lj is the total number of target regions with the copy number j.
- TRR_i^j means the TRR when the i-th target region has copy number j, and the definitions of t_i, n_i, N, and T are the same as in Equation 1.
- d n is the unit read count of the control sample
- L is the total number of target areas.
- In Equation 3, control sample sequence data is not required. That is, TRR_i^j ultimately requires only the values of t_i, T, and L, all of which are obtained from the test sample as described above.
- The method for analyzing absolute copy number variation based on a single sample according to the present invention calculates the TRR using only the experimental sample, so the copy number variation can be predicted even when there is no control sample sequence data serving as a control for the experimental sample.
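The single-sample calculation can be sketched as below. Equations 1 to 3 are not reproduced in this text, so the forms used here are assumptions inferred from the variable definitions: a paired TRR of (t_i / n_i) * (N / T), and a single-sample TRR of t_i * L / T, which is what the paired form reduces to when every control region has the same read count. The function names are hypothetical.

```python
from typing import List, Sequence


def paired_trr(test_counts: Sequence[float], control_counts: Sequence[float]) -> List[float]:
    """ASSUMED paired form: TRR_i = (t_i / n_i) * (N / T)."""
    total_test = sum(test_counts)      # T
    total_ctrl = sum(control_counts)   # N
    return [(t / n) * (total_ctrl / total_test)
            for t, n in zip(test_counts, control_counts)]


def single_sample_trr(test_counts: Sequence[float]) -> List[float]:
    """ASSUMED single-sample form: TRR_i = t_i * L / T, using test reads only."""
    total_test = sum(test_counts)      # T
    n_regions = len(test_counts)       # L
    if total_test == 0:
        raise ValueError("no mapped reads in any target region")
    return [t * n_regions / total_test for t in test_counts]


# Toy read counts for four target regions (illustrative values only).
test = [100, 200, 100, 400]
uniform_control = [50, 50, 50, 50]

print(single_sample_trr(test))            # [0.5, 1.0, 0.5, 2.0]
print(paired_trr(test, uniform_control))  # [0.5, 1.0, 0.5, 2.0]
```

The two outputs agree because a control with uniform coverage contributes n_i / N = 1 / L in every region, which illustrates why only t_i, T, and L are needed.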
- T, which is the sum of t_i over the at least one target region, may be summarized as in Equation 4 below.
- The total read count T can be calculated as follows. T is the total read count of the target regions of the test sample, and Lj is the total number of target regions having copy number j. Since the average copy number J of a pure experimental sample, for example a cancer sample, can be expressed by Equation 5 below, T can be expressed by Equation 6 below. Like T, t_i can be expressed as in Equation 7.
- If the copy number of the test sample in the i-th target region is denoted j_i and the purity of the test sample is α, the read count t_i of the experimental sample sequence data in the at least one target region may be defined as shown in Equation 7 below.
- The TRR of Equation 2 can be summarized as in Equation 8 below for the i-th target region having copy number j, and may further be expressed as Equation 10.
- α is the purity of the test sample
- J is the average copy number of the test sample
- j is the copy number of the i-th target region of the test sample.
- Grouping the terms in J and α together as A, as in Equation 8, gives Equation 10.
- J and α can be received as input from outside. That is, an estimate may be received from a pathologist, may be obtained using a microarray, or may be received through another method (S3500).
- An example of a method for obtaining the average copy number J and the sample purity α is as follows. Estimating the purity of the experimental sample and the average copy number of the sample may include: (i) calculating, from the read mapping result, the B allele frequency (BAF) based on the frequency of alleles for which the sample sequence information and the reference sequence information are the same (A allele) and alleles for which they differ (B allele); (ii) segmenting the sample sequence information based on the frequency rate of the different alleles; (iii) the divided at least one
- The method for estimating the purity and the average copy number of the test sample from a single sample may include measuring at least one parameter, including the purity and the average copy number of the test sample, and the parameters may be used to improve the accuracy of somatic mutation discovery. In addition, even in the absence of a control sample, the method may be useful for discovering somatic copy number variation.
- The absolute copy number of the experimental sample can be determined based on the calculated and estimated parameters.
- The calculated parameter may be the TRR.
- The estimated parameters may be J and α. That is, on the basis of the calculated and estimated parameters, if S is defined as the absolute copy number score as shown in Equation 11, Equation 10 may be converted as shown in Equation 12 below.
- the absolute copy number j can be calculated.
- the absolute copy score is a score that can be calculated using a simple formula, and has a negative relationship with the absolute copy number.
- The clinical significance of the absolute copy score is that, unlike the previous threshold method, the absolute copy number can be expressed as an integer such as 0, 1, 2, or 3, which has the advantage of allowing copy number variation to be defined more accurately.
- The TRR for a copy number of 1, TRR^1, can be calculated theoretically, and the TRR value of the target region can be obtained from the measured value, so the following S value in the target region can be calculated. That is, by dividing the TRR values, only the j to be calculated remains and the remaining variables cancel out.
- S_i^j is the absolute copy score when the i-th target region of the test sample has copy number j; A is as defined in Equation 9 and can be calculated from the estimated parameters J and α;
- TRR_i^j is the TRR when the i-th target region of the test sample has copy number j;
- TRR^1 is the TRR of the test sample for a copy number of 1 in at least one target region.
- TRR is calculated from the test sample, for example a cancer sample, and the absolute copy number can be determined as the value obtained by dividing TRR^j minus A, (TRR^j - A), by TRR^1 minus A, (TRR^1 - A).
- A can be calculated from the estimated parameters J (average copy number of the test sample) and α (purity of the test sample).
- A TRR for which the copy number in at least one target region is a preset number may be estimated.
- The preset copy number may be 1, in which case the TRR can be expressed as TRR^1; TRR^1 means the TRR for a copy number of 1 in at least one target region.
- the absolute copy number of the test sample may be the absolute copy number of the test sample in at least one target region.
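The ratio (TRR^j - A) / (TRR^1 - A) can be tried out numerically. Equations 7 to 12 are not reproduced in this text, so the sketch below rests on stated assumptions: a read count model in which a region with tumor copy number j contributes reads in proportion to αj + 2(1 - α), with normal cells diploid, a TRR normalised by the average over regions, and A taken as the normal-cell term 2(1 - α) / (αJ + 2(1 - α)). The function names are hypothetical. Under these assumptions every term except j cancels in the ratio.

```python
def model_trr(j: float, purity: float, avg_copy: float) -> float:
    """ASSUMED model: TRR^j = (alpha*j + 2*(1 - alpha)) / (alpha*J + 2*(1 - alpha))."""
    return (purity * j + 2 * (1 - purity)) / (purity * avg_copy + 2 * (1 - purity))


def absolute_copy_score(trr: float, purity: float, avg_copy: float) -> float:
    """Score S = (TRR^j - A) / (TRR^1 - A), with A ASSUMED to be the normal-cell term."""
    if purity <= 0:
        raise ValueError("purity must be positive to resolve tumor copy number")
    denom = purity * avg_copy + 2 * (1 - purity)
    a = 2 * (1 - purity) / denom            # assumed form of A
    trr1 = model_trr(1, purity, avg_copy)   # theoretical TRR for copy number 1
    return (trr - a) / (trr1 - a)


# Toy inputs: purity and average copy number are treated as already estimated.
purity, avg_copy = 0.7, 2.5
for true_j in (0, 1, 2, 3, 4):
    observed = model_trr(true_j, purity, avg_copy)
    score = absolute_copy_score(observed, purity, avg_copy)
    print(true_j, round(score, 6))
```

With measured rather than simulated TRR values the score would generally be non-integer, and rounding it to the nearest integer is one plausible way to report an absolute copy number.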
- The methods and information described herein may be provided as a computer program, stored in a computer readable storage medium, for executing the steps of the method described above.
- the computer program stored in the computer readable storage medium may be combined with hardware.
- The computer program stored in the computer readable storage medium is a program for executing the steps on a computer; all of the above steps may be executed by one program, or by two or more programs each executing one or more steps.
- Programs or software stored on the computer readable storage medium may be delivered to a computer device through any known delivery method, including, for example, over a communication channel such as a telephone line, the Internet, or a wireless connection, or on a portable medium such as a computer readable disk or a flash drive.
- Another example also provides a computer readable storage medium (or recording medium) containing computer executable instructions for executing the steps of the method.
- the computer readable medium may include both computer storage media and communication media.
- Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
- Computer storage media may include RAM, ROM, EEPROM, flash memory (e.g., USB memory, SD memory, SSD, CF memory, xD memory, etc.), magnetic disks, laser disks, or other memory; CD-ROM, digital versatile disk, or other optical disk storage; magnetic cassette, magnetic tape, magnetic disk storage, or other magnetic storage devices; or any other medium that can be used to store the desired information and can be accessed by a computer.
- RAM: random access memory
- ROM: read only memory
- EEPROM: electrically erasable programmable read-only memory
- Communication media typically includes computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave, or other transmission mechanism, and includes any information delivery media.
- The communication medium may be selected from one or more of a wired medium, such as a wired network or a direct-wired connection, and a wireless medium, such as acoustic, RF, infrared, and other wireless media.
- Combinations of one or more of the above may also be included within the scope of computer readable media.
- An example of a computer readable medium according to one embodiment of the present invention is shown in FIG. 7, for example as one component of computer system 500; the computer system may include one or more processors 510, one or more computer readable storage media 530, and a memory 520.
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Chemical & Material Sciences (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- General Health & Medical Sciences (AREA)
- Biotechnology (AREA)
- Theoretical Computer Science (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Medical Informatics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biophysics (AREA)
- Analytical Chemistry (AREA)
- Genetics & Genomics (AREA)
- Molecular Biology (AREA)
- Organic Chemistry (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Biochemistry (AREA)
- General Engineering & Computer Science (AREA)
- Microbiology (AREA)
- Immunology (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Apparatus Associated With Microorganisms And Enzymes (AREA)
Abstract
The invention relates to a method for analyzing absolute copy number variation based on a single sample, the method comprising: receiving experimental sample sequence data generated by a genome sequencer; calculating a target region ratio (TRR) based on a read count obtained by read-mapping the received experimental sample sequence data to standard reference sequence data for each chromosomal position; estimating the purity of the experimental sample and the average copy number of the experimental sample in at least one target region; and predicting the absolute copy number variation of the experimental sample based on the calculated and estimated parameters.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR1020157031732A KR101839088B1 (ko) | 2014-10-29 | 2015-10-29 | Method for analyzing absolute copy number variation based on a single sample |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR10-2014-0148411 | 2014-10-29 | ||
| KR20140148411 | 2014-10-29 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2016068627A1 (fr) | 2016-05-06 |
Family
ID=55857852
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/KR2015/011515 Ceased WO2016068627A1 (fr) | 2014-10-29 | 2015-10-29 | Method for analyzing absolute copy number variation based on a single sample |
Country Status (2)
| Country | Link |
|---|---|
| KR (1) | KR101839088B1 (fr) |
| WO (1) | WO2016068627A1 (fr) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
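| CN110310704A (zh) * | 2019-05-08 | 2019-10-08 | 西安电子科技大学 | Copy number variation detection method based on local outlier factor |
| CN113658638A (zh) * | 2021-08-20 | 2021-11-16 | 江苏先声医学诊断有限公司 | Detection method and quality control system for homologous recombination deficiency based on an NGS platform |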
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR101913735B1 (ko) | 2018-05-03 | 2018-11-01 | 주식회사 셀레믹스 | Internal control material for detecting cross-contamination between samples for next-generation sequencing |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20120046877A1 (en) * | 2010-07-06 | 2012-02-23 | Life Technologies Corporation | Systems and methods to detect copy number variation |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP2406400B1 (fr) * | 2009-03-09 | 2016-07-13 | Life Technologies Corporation | Méthodes de détermination du nombre de copies d'une séquence de génome dans un échantillon biologique |
| CN102682224B (zh) * | 2011-03-18 | 2015-01-21 | 深圳华大基因科技服务有限公司 | 检测拷贝数变异的方法和装置 |
-
2015
- 2015-10-29 WO PCT/KR2015/011515 patent/WO2016068627A1/fr not_active Ceased
- 2015-10-29 KR KR1020157031732A patent/KR101839088B1/ko active Active
Non-Patent Citations (5)
Also Published As
| Publication number | Publication date |
|---|---|
| KR101839088B1 (ko) | 2018-03-15 |
| KR20160062747A (ko) | 2016-06-02 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP7637139B2 (ja) | Systems and methods for automating RNA expression calls in a cancer prediction pipeline | |
| Klughammer et al. | The DNA methylation landscape of glioblastoma disease progression shows extensive heterogeneity in time and space | |
| US11560598B2 (en) | Systems and methods for analyzing circulating tumor DNA | |
| US20210265012A1 (en) | Systems and methods for use of known alleles in read mapping | |
| EP3323070B1 (fr) | Neoantigen analysis | |
| Magi et al. | Characterization of MinION nanopore data for resequencing analyses | |
| US9670530B2 (en) | Haplotype resolved genome sequencing | |
| US20200098448A1 (en) | Methods of normalizing and correcting rna expression data | |
| US20160002717A1 (en) | Determining mutation burden in circulating cell-free nucleic acid and associated risk of disease | |
| WO2017127741A1 (fr) | Methods and systems for high-fidelity sequencing | |
| Bortone et al. | Improved T-cell receptor diversity estimates associate with survival and response to anti–PD-1 therapy | |
| JP2020537527A (ja) | Methods and systems for detection of somatic structural variants | |
| WO2016068627A1 (fr) | Method for analyzing absolute copy number variation based on a single sample | |
| WO2021262770A1 (fr) | De novo characterization of cell-free DNA fragmentation hotspots in healthy and early-stage cancer subjects | |
| KR101841265B1 (ko) | Method for removing bias in targeted sequencing using NMF | |
| US20200104285A1 (en) | Signature-hash for multi-sequence files | |
| JP2025514547A (ja) | Methods and apparatus for detecting parent-of-origin disease alleles for the diagnosis and management of genetic diseases | |
| Li et al. | Contamination assessment for cancer next-generation sequencing: method development and clinical implementation | |
| WO2016068626A1 (fr) | Method for analyzing sample sequence information based on a single sample | |
| US20250279155A1 (en) | Consensus-based classification technique to determine genetically inffered ancestry from comprehensive genomic profiling of tumor dna | |
| Evans et al. | Computational analysis in cancer exome sequencing |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| ENP | Entry into the national phase |
Ref document number: 20157031732 Country of ref document: KR Kind code of ref document: A |
|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 15854910 Country of ref document: EP Kind code of ref document: A1 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| 122 | Ep: pct application non-entry in european phase |
Ref document number: 15854910 Country of ref document: EP Kind code of ref document: A1 |