WO2023171859A1 - Method for distinguishing between somatic mutations and germline mutations - Google Patents
- Publication number
- WO2023171859A1 (application PCT/KR2022/011527)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- mutations
- reads
- size
- somatic
- pair
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6813—Hybridisation assays
- C12Q1/6827—Hybridisation assays for detection of mutation or polymorphism
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
Definitions
- The present invention relates to a method for distinguishing between somatic mutations and germline mutations.
- NGS: next-generation sequencing
- VAF: variant allele frequency
- One aspect of the present invention aims to provide a method of providing information for distinguishing somatic mutations from germline mutations using cell-free nucleic acids, the method comprising the steps of: a) extracting cell-free DNA from a target sample and obtaining paired-end reads targeting cancer-related genes; b) deriving the size of the obtained paired-end reads; and c) classifying somatic mutations and germline mutations from the quantified fraction of short fragments among the derived paired-end reads.
- One aspect of the present invention provides a method of providing information for distinguishing somatic mutations from germline mutations using cell-free nucleic acids, comprising steps a) to c) above.
- The target sample in step a) may be a sample isolated from a cancer patient.
- Deriving the size of the obtained paired-end reads in step b) may include: b-1) correcting the orientation of the mapped reads and calculating the difference in base-sequence coordinates of the plurality of reads to obtain a density plot of fragment size; and b-2) deriving from the density plot the value for fragments with a size of 100 to 155 bp.
- Classifying somatic mutations and germline mutations in step c) may consist of comparing the quantified fraction of short fragments among the derived paired-end reads with the overall average paired-end read size, and classifying a mutation as somatic when that short-fragment fraction is large.
- The method of the present invention can distinguish somatic mutations from germline mutations using cell-free nucleic acids, which are relatively easy to obtain.
- Figure 1 is a graph showing the distribution of read fragment sizes in cfDNA samples from colon cancer patients according to allele frequency.
- Figure 2 is a graph showing the distribution of read fragment sizes in cfDNA samples from colon cancer patients according to allele frequency, subdivided into somatic mutations and germline mutations.
- Figure 3 shows ROC curves created to distinguish germline mutations from somatic mutations according to various fragment sizes of reads derived from cfDNA samples of colon cancer patients.
- Figure 4 is a box plot distinguishing germline mutations from somatic mutations using the average fragment size of reads derived from cfDNA samples of colon cancer patients.
- Figure 5 is a box plot distinguishing germline mutations from somatic mutations using the AUC P1 of reads derived from cfDNA samples of colon cancer patients.
- Figure 6 is a graph showing the distribution of read fragment sizes in cfDNA samples from lung cancer patients according to allele frequency.
- Figure 7 is a graph showing the distribution of read fragment sizes in cfDNA samples from lung cancer patients according to allele frequency, subdivided into somatic mutations and germline mutations.
- Figure 8 shows ROC curves created to distinguish germline mutations from somatic mutations according to various fragment sizes of reads derived from cfDNA samples of lung cancer patients.
- Figure 9 is a box plot distinguishing germline mutations from somatic mutations using the average fragment size of reads derived from cfDNA samples of lung cancer patients.
- Figure 10 is a box plot distinguishing germline mutations from somatic mutations using the AUC P1 of reads derived from cfDNA samples of lung cancer patients.
- The mutations found in the human body fall broadly into germline mutations and somatic mutations. In the blood of cancer patients, circulating tumor DNA (ctDNA) derived from the primary cancer circulates together with cell-free DNA (cfDNA). The present inventors completed the present invention by identifying a method that can distinguish germline mutations from somatic mutations through liquid biopsy.
- Step a) is a step of extracting cell-free DNA from the target sample and obtaining paired-end reads targeting cancer-related genes.
- As used herein, the term 'sample' includes, but is not limited to, samples such as tissue, cells, whole blood, serum, plasma, saliva, sputum, cerebrospinal fluid, or urine from which a paired-end sequence can be obtained; specifically, it may be serum or plasma.
- The cancer may be any one of squamous cell carcinoma, adenocarcinoma, sarcoma, colon cancer, and lung cancer.
- 'Cell-free DNA' or 'cfDNA' refers to fragments of nucleic acid found outside cells (e.g., in body fluids); such body fluids include, but are not limited to, the bloodstream, cerebrospinal fluid, saliva, and urine.
- The cfDNA may be derived from the subject (e.g., from the subject's cells) or from a source other than the subject (e.g., from a viral infection).
- The 'paired-end sequence' refers to sequences replicated from both ends (paired ends) of a gene of interest, whose approximate distance from each other is known, and sequenced in the forward and reverse directions.
- The paired-end sequence may be obtained by any method known in the art, preferably by next-generation sequencing (NGS). Specific NGS methods are described in Metzker, M. (2010) Nature Reviews Genetics 11:31-46, which is incorporated herein by reference.
- Step b) is a step of deriving the size of the obtained paired-end reads.
- Deriving the size of the obtained paired-end reads in step b) may include: b-1) correcting the orientation of the mapped reads and calculating the difference in base-sequence coordinates of the plurality of reads to obtain a density plot of fragment size; and b-2) deriving from the density plot the ratio between the number of fragments with a size of 100 to 155 bp and the number of fragments with a size of 160 to 180 bp.
- Step b-1) proceeds as follows:
- The paired-end sequences obtained in step a) are mapped to a reference read, and the fragment size is calculated from the coordinate values obtained after mapping. For example, if, for a particular read fragment, read1 is mapped to chr1:100-250 and read2 is mapped to chr1:100-250, then after correcting the orientation of the read fragment, the difference between chr1:250, the end of read2, and chr1:100, the end of read1, gives a fragment size of 150 bp. If 100 such reads are mapped, the fragment size can be calculated for each of the 100 reads in the same way. For example, they may be tallied as 10 fragments of 90 bp, 20 of 105 bp, 30 of 130 bp, and 40 of 160 bp, from which a density plot can be created.
- After step b-1), step b-2) is performed to derive from the density plot the value for fragments with a size of 100 to 155 bp.
- AUC P1: area under the curve (AUC) value for 100 to 155 bp
- AUC P2: AUC value for 160 to 180 bp
- The overall average fragment size can be calculated from the genomic coordinate values of the read fragments.
- Step c) is a step of classifying somatic mutations and germline mutations.
- Classifying somatic mutations and germline mutations in step c) may consist of comparing the quantified fraction of short fragments among the derived paired-end reads with the overall average paired-end read size, and classifying a mutation as somatic when that short-fragment fraction is large.
- Example 1: Method for deriving the fragment distribution of paired-end sequences in cfDNA
- cfDNA was obtained from blood samples of 322 colon cancer patients and 100 lung cancer patients who visited the Department of Hematology and Oncology at Seoul National University Hospital, using Promega's Maxwell automated equipment (Maxwell® RSC ccfDNA Plasma Kit) according to the manufacturer's protocol. NGS (next-generation sequencing) was then performed on the obtained cfDNA. For targeted panel sequencing, the NGS DNA library prep kit (IMBdx) was used, and the target region was amplified using the AlphaLiquid® 100 target capture panel (IMBdx). 150 bp paired-end sequencing was performed on Illumina's NextSeq 550 platform.
- The paired-end sequencing results were aligned using the "mem" algorithm of the Burrows-Wheeler Aligner (BWA, version 0.7.10).
- BWA: Burrows-Wheeler Aligner
- hg38: a known human genome
- IMBdx's UniqSeq protocol and the company's in-house filtering steps were used (the default variant caller is VarDict).
- The fragment size was determined from the genomic coordinates of the read fragments generated after alignment. For example, if, for a particular read fragment, read1 is mapped to chr1:100-250 and read2 is mapped to chr1:100-250, then after correcting the orientation of the read fragment, the difference between chr1:250, the end of read2, and chr1:100, the end of read1, gives a fragment size of 150 bp.
- For example, the fragments were tallied as 10 fragments of 90 bp, 20 of 105 bp, 30 of 130 bp, and 40 of 160 bp, and a density plot was created from the calculated DNA fragment sizes.
- The actual fragment sizes used to create the density plot were calculated as continuous values from 80 to 1000 bp, and the fragments were counted accordingly.
- AUC P1: 100-155 bp
- AUC P2: 160-180 bp
- The overall average fragment size was calculated from the genomic coordinate values of the read fragments generated after alignment, and the calculated AUC P1 value and the average fragment size were used as reference indicators to distinguish germline mutations from somatic mutations.
- Example 2: Distinction between germline mutations and somatic mutations according to fragment size in colon cancer patient samples
- Figure 1 is a graph showing the sizes of the actual reads against the VAF value, with the altered allele and the reference allele separated for each mutation type.
- Figure 2 shows the diagram of Figure 1 subdivided into somatic mutations and germline mutations. It was confirmed that read fragments actually carrying the altered allele of a somatic mutation had shorter fragment sizes than read fragments confirmed to carry germline mutations.
- Figure 3 shows, through ROC curves created according to the method of Example 1, that germline mutations and somatic mutations that do not exist in existing databases can be distinguished using values obtained from various fragment sizes.
- AUC P1 is shown in dark blue, and the average fragment size is shown in light orange. These results confirmed that somatic mutations and germline mutations can be distinguished with high accuracy using the AUC P1 value and the average fragment size.
- Figures 4 and 5 show box plots distinguishing germline mutations from somatic mutations using the average fragment size and the AUC P1 derived according to the method of Example 1.
- It was confirmed that somatic mutations and germline mutations were distinguished with statistical significance when the AUC P1 value and the average fragment size were used.
- Example 3: Distinction between germline mutations and somatic mutations according to fragment size in lung cancer patient samples
- Figure 6 is a graph showing the sizes of the actual reads against the VAF value, with the altered allele and the reference allele separated for each mutation type.
- Figure 7 shows the diagram of Figure 6 subdivided into somatic mutations and germline mutations. It was confirmed that reads carrying the variant alleles of somatic mutations had shorter fragment sizes than read fragments confirmed to carry germline mutations.
- Figure 8 shows, through ROC curves created according to the method of Example 1, that germline mutations and somatic mutations that do not exist in existing databases can be distinguished using values obtained from various fragment sizes.
- AUC P1 is shown in dark blue, and the average fragment size is shown in light orange. These results confirmed that somatic mutations and germline mutations can be distinguished with high accuracy using the AUC P1 value and the average fragment size.
- Figures 9 and 10 show box plots distinguishing germline mutations from somatic mutations using the average fragment size and the AUC P1 derived according to the method of Example 1.
- It was confirmed that somatic mutations and germline mutations were distinguished with statistical significance when the AUC P1 value and the average fragment size were used.
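The ROC comparison described in Examples 2 and 3 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `roc_auc` and the toy AUC P1 scores are hypothetical and are not data from the patent. The area under the ROC curve is computed here as the probability that a randomly chosen somatic variant has a higher score than a randomly chosen germline variant, with ties counted as one half.

```python
from typing import Sequence


def roc_auc(somatic_scores: Sequence[float], germline_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    if not somatic_scores or not germline_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for s in somatic_scores:
        for g in germline_scores:
            if s > g:
                wins += 1.0
            elif s == g:
                wins += 0.5
    return wins / (len(somatic_scores) * len(germline_scores))


# Hypothetical per-variant AUC P1 values, invented for illustration only.
toy_somatic = [0.7, 0.6, 0.55, 0.4]
toy_germline = [0.5, 0.35, 0.3, 0.45]
toy_auc = roc_auc(toy_somatic, toy_germline)
```

A value close to 1 would indicate that the score separates the two groups well; a value near 0.5 would indicate no discrimination.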
Landscapes
- Chemical & Material Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Organic Chemistry (AREA)
- Health & Medical Sciences (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Engineering & Computer Science (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Immunology (AREA)
- Analytical Chemistry (AREA)
- Genetics & Genomics (AREA)
- Physics & Mathematics (AREA)
- Biophysics (AREA)
- Biotechnology (AREA)
- Microbiology (AREA)
- Molecular Biology (AREA)
- Pathology (AREA)
- Biochemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Hospice & Palliative Care (AREA)
- Oncology (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
Description
The present invention relates to a method for distinguishing between somatic mutations and germline mutations.
DNA mutations cause cancer and are an important part of cancer research and treatment. Next-generation sequencing (NGS) is a promising technology for de novo mutation detection because of the vast number of reads that modern sequencers can generate. In theory, given sufficient read depth, every mutation or variant in a genomic sample can be observed, regardless of variant allele frequency (VAF) or genomic region.
However, noise in the reads makes it difficult to identify variants reliably. Several bioinformatic tools have been developed to identify variants from sequencing reads.
Nevertheless, germline mutations and somatic mutations often cannot be distinguished through liquid biopsy, so methods for detecting somatic mutations remain in demand.
One aspect of the present invention aims to provide a method of providing information for distinguishing somatic mutations from germline mutations using cell-free nucleic acids, the method comprising the steps of: a) extracting cell-free DNA from a target sample and obtaining paired-end reads targeting cancer-related genes; b) deriving the size of the obtained paired-end reads; and c) classifying somatic mutations and germline mutations from the quantified fraction of short fragments among the derived paired-end reads.
One aspect of the present invention provides a method of providing information for distinguishing somatic mutations from germline mutations using cell-free nucleic acids, the method comprising the steps of: a) extracting cell-free DNA from a target sample and obtaining paired-end reads targeting cancer-related genes; b) deriving the size of the obtained paired-end reads; and c) classifying somatic mutations and germline mutations from the quantified fraction of short fragments among the derived paired-end reads.
In one embodiment of the present invention, the target sample in step a) may be a sample isolated from a cancer patient.
In one embodiment of the present invention, deriving the size of the obtained paired-end reads in step b) may include: b-1) correcting the orientation of the mapped reads and calculating the difference in base-sequence coordinates of the plurality of reads to obtain a density plot of fragment size; and b-2) deriving from the density plot the value for fragments with a size of 100 to 155 bp.
In one embodiment of the present invention, classifying somatic mutations and germline mutations in step c) may consist of comparing the quantified fraction of short fragments among the derived paired-end reads with the overall average paired-end read size, and classifying a mutation as somatic when that short-fragment fraction is large.
The method of the present invention can distinguish somatic mutations from germline mutations using cell-free nucleic acids, which are relatively easy to obtain.
Figure 1 is a graph showing the distribution of read fragment sizes in cfDNA samples from colon cancer patients according to allele frequency.
Figure 2 is a graph showing the distribution of read fragment sizes in cfDNA samples from colon cancer patients according to allele frequency, subdivided into somatic mutations and germline mutations.
Figure 3 shows ROC curves created to distinguish germline mutations from somatic mutations according to various fragment sizes of reads derived from cfDNA samples of colon cancer patients.
Figure 4 is a box plot distinguishing germline mutations from somatic mutations using the average fragment size of reads derived from cfDNA samples of colon cancer patients.
Figure 5 is a box plot distinguishing germline mutations from somatic mutations using the AUC P1 of reads derived from cfDNA samples of colon cancer patients.
Figure 6 is a graph showing the distribution of read fragment sizes in cfDNA samples from lung cancer patients according to allele frequency.
Figure 7 is a graph showing the distribution of read fragment sizes in cfDNA samples from lung cancer patients according to allele frequency, subdivided into somatic mutations and germline mutations.
Figure 8 shows ROC curves created to distinguish germline mutations from somatic mutations according to various fragment sizes of reads derived from cfDNA samples of lung cancer patients.
Figure 9 is a box plot distinguishing germline mutations from somatic mutations using the average fragment size of reads derived from cfDNA samples of lung cancer patients.
Figure 10 is a box plot distinguishing germline mutations from somatic mutations using the AUC P1 of reads derived from cfDNA samples of lung cancer patients.
One aspect of the present invention provides a method of providing information for distinguishing somatic mutations from germline mutations using cell-free nucleic acids, the method comprising the steps of: a) extracting cell-free DNA from a target sample and obtaining paired-end reads targeting cancer-related genes; b) deriving the size of the obtained paired-end reads; and c) classifying somatic mutations and germline mutations from the quantified fraction of short fragments among the derived paired-end reads.
The mutations found in the human body fall broadly into germline mutations and somatic mutations. In the blood of cancer patients, circulating tumor DNA (ctDNA) derived from the primary cancer circulates together with cell-free DNA (cfDNA). The present inventors completed the present invention by identifying a method that can distinguish germline mutations from somatic mutations through liquid biopsy.
Step a) is a step of extracting cell-free DNA from the target sample and obtaining paired-end reads targeting cancer-related genes.
As used herein, the term 'sample' includes, but is not limited to, samples such as tissue, cells, whole blood, serum, plasma, saliva, sputum, cerebrospinal fluid, or urine from which a paired-end sequence can be obtained; specifically, it may be serum or plasma.
In one embodiment of the present invention, the target sample in step a) may be a sample isolated from a cancer patient.
The cancer may be any one of squamous cell carcinoma, adenocarcinoma, sarcoma, colon cancer, and lung cancer.
As used herein, 'cell-free DNA' or 'cfDNA' refers to fragments of nucleic acid found outside cells (e.g., in body fluids); such body fluids include, but are not limited to, the bloodstream, cerebrospinal fluid, saliva, and urine. The cfDNA may be derived from the subject (e.g., from the subject's cells) or from a source other than the subject (e.g., from a viral infection).
As used herein, the 'paired-end sequence' refers to sequences replicated from both ends (paired ends) of a gene of interest, whose approximate distance from each other is known, and sequenced in the forward and reverse directions. The paired-end sequence may be obtained by any method known in the art, preferably by next-generation sequencing (NGS). Specific NGS methods are described in Metzker, M. (2010) Nature Reviews Genetics 11:31-46, which is incorporated herein by reference.
Step b) is a step of deriving the size of the obtained paired-end reads. In one embodiment of the present invention, deriving the size of the obtained paired-end reads in step b) may include: b-1) correcting the orientation of the mapped reads and calculating the difference in base-sequence coordinates of the plurality of reads to obtain a density plot of fragment size; and b-2) deriving from the density plot the ratio between the number of fragments with a size of 100 to 155 bp and the number of fragments with a size of 160 to 180 bp.
Step b-1), in which the orientation of the mapped reads is corrected and the coordinate differences of the plurality of reads are calculated to obtain a density plot of fragment size, proceeds as follows:
The paired-end sequences obtained in step a) are mapped to a reference read, and the fragment size is calculated from the coordinate values obtained after mapping. For example, if, for a particular read fragment, read1 is mapped to chr1:100-250 and read2 is mapped to chr1:100-250, then after correcting the orientation of the read fragment, the difference between chr1:250, the end of read2, and chr1:100, the end of read1, gives a fragment size of 150 bp. If 100 such reads are mapped, the fragment size can be calculated for each of the 100 reads in the same way. For example, they may be tallied as 10 fragments of 90 bp, 20 of 105 bp, 30 of 130 bp, and 40 of 160 bp, from which a density plot can be created.
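The coordinate arithmetic and counting described above can be sketched in Python. This is a minimal illustration, not the applicant's implementation: `MappedRead`, `fragment_size`, and `size_density` are hypothetical names, and taking the outermost coordinates of the two reads is assumed here to play the role of the orientation correction.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class MappedRead(NamedTuple):
    """Mapped interval of one read on the reference (hypothetical container)."""
    chrom: str
    start: int  # leftmost mapped coordinate
    end: int    # rightmost mapped coordinate


def fragment_size(read1: MappedRead, read2: MappedRead) -> int:
    """Distance between the outermost coordinates of a read pair.

    Using min/max makes the result independent of which read is on the
    forward strand (assumed equivalent to correcting the orientation).
    """
    if read1.chrom != read2.chrom:
        raise ValueError("read pair maps to different chromosomes")
    left = min(read1.start, read2.start)
    right = max(read1.end, read2.end)
    return right - left


def size_density(sizes: Iterable[int]) -> dict[int, float]:
    """Relative frequency of each fragment size, sorted by size."""
    counts = Counter(sizes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {size: n / total for size, n in sorted(counts.items())}


# The worked example from the text: both reads map to chr1:100-250.
example = fragment_size(MappedRead("chr1", 100, 250), MappedRead("chr1", 100, 250))
```

Applied to the 100 fragments of the text's example, `size_density` returns the relative frequencies from which a density plot could be drawn.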
According to one embodiment of the present invention, step b-1) is followed by step b-2), in which the value for fragments with a size of 100 to 155 bp is derived from the density plot.
For example, from the per-size counts for the 100 reads above, the AUC (area under the curve) value for 100 to 155 bp (AUC P1) and the AUC value for 160 to 180 bp (AUC P2) can be obtained: AUC P1 may be (10+20+30)/100 = 0.6 and AUC P2 may be 40/100 = 0.4. The overall mean fragment size can also be calculated from the genomic coordinates of the read fragments.
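The window fractions and the mean fragment size can be reproduced with the short sketch below. It treats each "AUC" value as the share of fragments whose size falls inside a window, which is how the worked numbers in the text are computed. The helper names `window_fraction` and `mean_fragment_size` are assumptions for illustration. Note that the worked figure of 0.6 counts the 90 bp fragments as short; a strict 100 to 155 bp window gives 0.5 for the same counts, so the bounds are exposed as parameters and both readings are shown.

```python
def window_fraction(counts, low, high):
    """Fraction of all fragments whose size lies in [low, high] bp."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no fragments counted")
    in_window = sum(n for size, n in counts.items() if low <= size <= high)
    return in_window / total


def mean_fragment_size(counts):
    """Count-weighted mean fragment size in bp."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no fragments counted")
    return sum(size * n for size, n in counts.items()) / total


# Counts from the worked example: size in bp -> number of fragments.
counts = {90: 10, 105: 20, 130: 30, 160: 40}

p1_as_in_text = window_fraction(counts, 0, 155)   # 0.6, includes the 90 bp fragments
p1_strict = window_fraction(counts, 100, 155)     # 0.5, strict 100-155 bp window
p2 = window_fraction(counts, 160, 180)            # 0.4
mean_size = mean_fragment_size(counts)            # 133.0
```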
Step c) classifies somatic mutations and germline mutations. In one embodiment of the present invention, the classification in step c) may compare the quantified fraction of short fragments among the derived paired-end reads with the overall mean paired-end read size, and classify a mutation as somatic when the quantified short-fragment fraction is large.
One or more embodiments are described in more detail below through examples. These examples are provided only to illustrate one or more embodiments, and the scope of the present invention is not limited to them.
Example 1: Method for deriving the fragment size distribution of paired-end sequences in cfDNA
cfDNA was isolated from blood samples of 322 colorectal cancer patients and 100 lung cancer patients who visited the Department of Hematology and Oncology at Seoul National University Hospital, using Promega's Maxwell automated instrument (Maxwell® RSC ccfDNA Plasma Kit) according to the manufacturer's protocol. NGS (Next Generation Sequencing) was then performed on the isolated cfDNA. For targeted panel sequencing, an NGS DNA library prep kit (IMBdx) was used, and the target regions were amplified using the AlphaLiquid® 100 target capture panel (IMBdx). 150 bp paired-end sequencing was performed on the Illumina NextSeq 550 platform.
The resulting paired-end sequencing data were then aligned using the "mem" algorithm of the Burrows-Wheeler Aligner (BWA, version 0.7.10). In this example, the known human genome hg38 was used as the reference genome. Variant calling was performed using IMBdx's UniqSeq protocol together with the company's in-house filtering steps (the underlying variant caller was VarDict).
The fragment size was obtained as a length from the genomic coordinates of the read fragments produced by alignment. For example, if, for a given read fragment, read1 is mapped to chr1:100-250 and read2 is mapped to chr1:100-250, then after correcting the orientation of the read fragment, the fragment size is the difference between chr1:250, the end of read2, and chr1:100, the end of read1, which is 150 bp. The fragments were grouped into 10 fragments of 90 bp, 20 fragments of 105 bp, 30 fragments of 130 bp, and 40 fragments of 160 bp, and a density plot was drawn from the calculated DNA fragment sizes. The actual fragment sizes used to draw the density plot were computed as continuous values from 80 to 1000 bp, and the number of fragments at each size was counted.
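In practice, aligners such as BWA-MEM record the observed template length of each pair in the TLEN field (column 9) of the SAM output, so the per-size counts can be gathered directly from alignment records. The sketch below is one possible way to do this with plain text parsing. It is an assumption about tooling, not the pipeline described in the patent, and the function name `fragment_sizes_from_sam` is hypothetical. The 80 to 1000 bp range comes from the text above. Only properly paired primary records with a positive TLEN are counted, so each fragment is counted once.

```python
from collections import Counter

PROPER_PAIR = 0x2      # SAM flag: each segment properly aligned
SECONDARY = 0x100      # SAM flag: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag: supplementary alignment


def fragment_sizes_from_sam(lines, min_size=80, max_size=1000):
    """Count fragment sizes from SAM text lines using the TLEN column."""
    counts = Counter()
    for line in lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        tlen = int(fields[8])
        if not flag & PROPER_PAIR:
            continue
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        if tlen <= 0:
            continue  # the rightmost mate carries a negative TLEN; skip it
        if min_size <= tlen <= max_size:
            counts[tlen] += 1
    return counts


# Made-up records for illustration only.
sam_lines = [
    "@SQ\tSN:chr1\tLN:248956422",
    "f1\t99\tchr1\t100\t60\t150M\t=\t116\t166\t*\t*",
    "f1\t147\tchr1\t116\t60\t150M\t=\t100\t-166\t*\t*",
    "f2\t99\tchr1\t500\t60\t150M\t=\t1900\t1550\t*\t*",
    "f3\t65\tchr1\t700\t60\t150M\t=\t760\t210\t*\t*",
]
sam_counts = fragment_sizes_from_sam(sam_lines)
```

Here only `f1` is counted: `f2` exceeds the 1000 bp limit and `f3` is not flagged as a proper pair.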
From the density plot, the AUC P1 (100-155 bp) and AUC P2 (160-180 bp) values were calculated. From the fragment counts above, AUC P1 = (10+20+30)/100 = 0.6 and AUC P2 = 40/100 = 0.4. In addition, the overall mean fragment size was calculated from the genomic coordinates of the read fragments produced by alignment, and the calculated AUC P1 value and mean fragment size were used as reference indicators for distinguishing germline mutations from somatic mutations.
Example 2: Distinguishing germline mutations from somatic mutations by fragment size in colorectal cancer patient samples
For plasma cfDNA samples from 322 colorectal cancer patients, the distribution of read fragment sizes derived by the method of Example 1 was plotted against variant allele frequency (VAF). Figure 1 is a graph showing the sizes of the actual reads against VAF, with reads separated by mutation type into those carrying the alteration allele and those carrying the reference allele. It shows that, for somatic mutations, reads carrying the alteration allele are distributed at shorter fragment sizes even at low VAF. When the data of Figure 1 were further split into somatic and germline mutations and each group was examined separately, as shown in Figure 2, reads carrying the alteration allele of a somatic mutation did have shorter fragment sizes than read fragments identified as germline mutations.
Figure 3 shows ROC curves, generated according to the method of Example 1, confirming that values obtained from various fragment sizes can distinguish somatic mutations from germline mutations that are absent from existing databases. AUC P1 is shown in navy and the mean fragment size in light orange. These results confirm that somatic and germline mutations can be distinguished with high accuracy using the AUC P1 value and the mean fragment size.
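To make the ROC evaluation concrete, the sketch below computes the area under the ROC curve for a single feature by pairwise comparison: it is the probability that a randomly chosen somatic variant scores higher than a randomly chosen germline variant, with ties counted as one half. The function name `roc_auc` and all feature values are made up for illustration and are not data from the patent. Because somatic variants are expected to have shorter fragments, the mean fragment size is negated so that a higher score means "more likely somatic".

```python
def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison of scores."""
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(scores_pos) * len(scores_neg))


# Illustrative, made-up per-variant features (not from the patent).
somatic_p1 = [0.62, 0.55, 0.70, 0.48]
germline_p1 = [0.35, 0.50, 0.41, 0.30]
somatic_mean_bp = [150, 158, 145, 166]
germline_mean_bp = [168, 165, 172, 160]

auc_p1 = roc_auc(somatic_p1, germline_p1)  # 0.9375
auc_mean = roc_auc(
    [-x for x in somatic_mean_bp],   # shorter fragments -> higher score
    [-x for x in germline_mean_bp],
)                                          # 0.875
```

A value of 0.5 would indicate no separation between the two classes, and 1.0 perfect separation.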
Figures 4 and 5 show box plots distinguishing germline mutations from somatic mutations using the mean fragment size and AUC P1 derived according to the method of Example 1. As shown in Figures 4 and 5, for germline mutations not registered in the germline variant databases most commonly used by researchers (for example, gnomAD), somatic and germline mutations were separated with statistical significance using the AUC P1 value and the mean fragment size.
Example 3: Distinguishing germline mutations from somatic mutations by fragment size in lung cancer patient samples
For plasma cfDNA samples from 100 lung cancer patients, the distribution of read fragment sizes derived by the method of Example 1 was plotted against variant allele frequency (VAF). Figure 6 is a graph showing the sizes of the actual reads against VAF, with reads separated by mutation type into those carrying the alteration allele and those carrying the reference allele. It shows that, for somatic mutations, reads carrying the alteration allele are distributed at shorter fragment sizes even at low VAF. When the data of Figure 6 were further split into somatic and germline mutations and each group was examined separately, as shown in Figure 7, reads carrying the alteration allele of a somatic mutation did have shorter fragment sizes than read fragments identified as germline mutations.
Figure 8 shows ROC curves, generated according to the method of Example 1, confirming that values obtained from various fragment sizes can distinguish somatic mutations from germline mutations that are absent from existing databases. AUC P1 is shown in navy and the mean fragment size in light orange. These results confirm that somatic and germline mutations can be distinguished with high accuracy using the AUC P1 value and the mean fragment size.
Figures 9 and 10 show box plots distinguishing germline mutations from somatic mutations using the mean fragment size and AUC P1 derived according to the method of Example 1. As shown in Figures 9 and 10, for germline mutations not registered in the germline variant databases most commonly used by researchers (for example, gnomAD), somatic and germline mutations were separated with statistical significance using the AUC P1 value and the mean fragment size.
The present invention has been described above with a focus on its preferred embodiments. A person of ordinary skill in the art to which the present invention pertains will understand that the invention may be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered illustrative rather than restrictive. The scope of the present invention is defined by the claims rather than by the foregoing description, and all differences within an equivalent scope should be construed as being included in the present invention.
Claims (4)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR10-2022-0030307 | 2022-03-10 | ||
| KR1020220030307A KR102544002B1 (en) | 2022-03-10 | 2022-03-10 | Method for Differentiating Somatic Mutation and Germline Mutation |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023171859A1 true WO2023171859A1 (en) | 2023-09-14 |
Family
ID=86948118
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/KR2022/011527 Ceased WO2023171859A1 (en) | 2022-03-10 | 2022-08-04 | Method for distinguishing between somatic mutations and germline mutations |
Country Status (2)
| Country | Link |
|---|---|
| KR (1) | KR102544002B1 (en) |
| WO (1) | WO2023171859A1 (en) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180307796A1 (en) * | 2017-04-21 | 2018-10-25 | Illumina, Inc. | Using cell-free dna fragment size to detect tumor-associated variant |
| JP2019511070A (en) * | 2016-02-09 | 2019-04-18 | トマ・バイオサイエンシズ,インコーポレーテッド | System and method for analyzing nucleic acids |
| WO2019200338A1 (en) * | 2018-04-12 | 2019-10-17 | Illumina, Inc. | Variant classifier based on deep neural networks |
| KR20200057024A (en) * | 2017-09-20 | 2020-05-25 | 가던트 헬쓰, 인크. | Methods and systems for differentiating somatic and germline variants |
| JP2020521442A (en) * | 2017-05-16 | 2020-07-27 | ガーダント ヘルス, インコーポレイテッド | Identification of somatic or germline origin for cell-free DNA |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| NL2020861B1 (en) * | 2018-04-12 | 2019-10-22 | Illumina Inc | Variant classifier based on deep neural networks |
| KR102835853B1 (en) | 2019-10-08 | 2025-07-17 | 일루미나, 인코포레이티드 | Fragment size characterization of cell-free DNA mutations from clonal hematopoiesis |
-
2022
- 2022-03-10 KR KR1020220030307A patent/KR102544002B1/en active Active
- 2022-08-04 WO PCT/KR2022/011527 patent/WO2023171859A1/en not_active Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| KR102544002B1 (en) | 2023-06-16 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22931106 Country of ref document: EP Kind code of ref document: A1 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| 122 | Ep: pct application non-entry in european phase |
Ref document number: 22931106 Country of ref document: EP Kind code of ref document: A1 |