WO2018149114A1 - Method and device for determining microdeletion and microduplication in foetal chromosomes - Google Patents
- Publication number
- WO2018149114A1 (application PCT/CN2017/100423)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- microdeletion
- nucleic acid
- concentration
- fragment
- window
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
Definitions
- the present invention relates to the field of biomedicine, and in particular, to methods and apparatus for determining microdeletions and microduplications in chromosomes.
- the existing detection methods have certain limitations.
- detection method 1) has low precision and produces a large number of false-positive results. Because the result is derived only from the change in the proportion of fragments within a given region, and no effective filtering method is available, false positives are difficult to avoid.
- Method 2) requires probe capture and high-depth sequencing, or requires paternal information. High-depth capture requires designing a chip, which makes the experiment more difficult; high-depth sequencing adds cost; and regions that are not captured cannot be assayed.
- Another aspect of the invention also provides an apparatus for determining microdeletions and microduplications in a fetal chromosome, comprising:
- a microdeletion/microduplication fragment concentration calculating device for obtaining the concentration fm of the fragment containing the microdeletion or microduplication;
- a fetal nucleic acid concentration obtaining device for obtaining a male fetal nucleic acid concentration fy or a female fetal nucleic acid concentration fs;
- a first filtering device for calculating rmY or rms according to the deletion copy number or the duplication copy number, and filtering out false positives;
- a second filtering device for taking the fractional part dmY of rmY, or the fractional part dms of rms, and determining whether dmY or dms is positive; otherwise the result is filtered out;
- a third filtering device configured to filter the microdeletion/microduplication fragments according to the determination principle, to obtain the fetal chromosomal microdeletion/microduplication fragments.
- the method and device provided by the invention can accurately determine microdeletions and microduplications in chromosomes, and are particularly suitable for determining fetal chromosomal microdeletions and microduplications from the peripheral blood of a pregnant woman.
- the invention does not require an additional chip design, which saves the cost of chip design and keeps the experimental method simple.
- Figure 1 is a flow diagram of a method of determining fetal microdeletions and microduplications in an embodiment of the invention.
- Figure 2 is a flow diagram of a method of obtaining the concentration fm of the fragment containing the microdeletion or microduplication in the embodiment of Figure 1.
- Figure 3 is a flow diagram of a method of obtaining the final window containing the microdeletion or microduplication in the embodiment of Figure 1.
- Figure 4 is a flow diagram of a method of obtaining the male fetal nucleic acid concentration fy in the embodiment of Figure 1.
- Figure 5 is a flow diagram of a method of obtaining the female fetal nucleic acid concentration fs in the embodiment of Figure 1.
- Figure 6 is a flow diagram of a method of obtaining the predetermined range in the method of Figure 5.
- Figure 8 is a block diagram showing the structure of an apparatus for determining microdeletions and microduplications in a fetal chromosome in another embodiment of the present invention.
- Figure 9 is a block diagram showing the structure of the microdeletion/microduplication fragment concentration calculating device in the embodiment of Figure 8.
- Figure 10 is a block diagram showing the structure of the unit for obtaining the final window containing the microdeletion or microduplication in the embodiment of Figure 8.
- Figure 11 is a block diagram showing the structure of the male fetal nucleic acid concentration fy obtaining unit in the embodiment of Figure 8.
- Figure 12 is a block diagram showing the structure of the female fetal nucleic acid concentration fs obtaining unit in the embodiment of Figure 8.
- Figure 13 is a block diagram showing the structure of the predetermined range determining component in the embodiment of Figure 8.
- Figure 14 is a block diagram showing the structure of the predetermined function determining component in the embodiment of Figure 8.
- Figure 15 is a graph showing the microdeletion/microduplication results for 19 samples in Example 2.
- Second average depth obtaining unit 113
- Microdeletion/microduplication fragment concentration obtaining unit 114
- Microdeletion/microduplication final window obtaining unit 115
- First sequencing component 1151
- Alignment component 1152
- Length determining component 1153
- Primary window determining component 1154
- First statistical component 1155
- Correction component 1156
- First merging component 1157
- First filter component 1158
- Second merging component 1159
- Fetal nucleic acid concentration obtaining device 120
- Male fetal nucleic acid concentration fy obtaining unit 121
- Second sequencing component 1211
- First number determining component 1212
- Filter module 12121
- Second statistical component 1213
- Average depth acquisition component 1214
- Male fetal nucleic acid concentration acquisition component 1215
- Third sequencing component 1221
- Second number determining component 1222
- Frequency determining component 1223
- Female fetal nucleic acid concentration determining component 1224
- Predetermined range determining component 1225
- Length determination module 12251
- First frequency determination module 12252
- Correlation coefficient determination module 12253
- Predetermined range determination module 12254
- Predetermined function determining component 1226
- Second frequency determination module 12261
- Fitting module 12262
- Ratio calculation device 130
- First filter device 140
- False positive judgment unit 141
- Second filter device 150
- Positive judgment unit 151
- Sum value calculation device 160
- Third filter device 170
- maternal sample refers herein to a biological sample obtained from a pregnant subject, e.g., a woman.
- microdeletion and microduplication refer to deletions or duplications on a chromosome that range from 1.5 kb to 10 Mb in length.
- GC correction refers to correcting sequence counts for bias related to the GC content of the sequence.
- the present invention provides a method for determining microdeletions and microduplications in fetal chromosomes, comprising:
- the inventors have surprisingly found that the method of the present invention enables accurate determination of microdeletions and microduplications in chromosomes, and is particularly useful for determining fetal chromosomal microdeletions and microduplications from the peripheral blood of pregnant women.
- the concentration fm of the fragment containing the microdeletion or microduplication in step S1 is obtained by the following steps:
- d1 = total number of sequences in the primary windows containing the microdeletion or microduplication / total number of primary windows containing the microdeletion or microduplication;
- the total number of primary windows without the microdeletion or microduplication, and their number of sequences, can be derived from the method for obtaining the final window containing the microdeletion or microduplication.
- the final window has absolute start and end coordinates. From these coordinates the corresponding secondary windows are located, and the number of primary windows within those secondary windows is determined. The first and last primary windows are removed to exclude fluctuations in the data, and the total number of sequences is then calculated over the remaining primary windows.
- the final window containing the microdeletion or microduplication is obtained by the following steps:
- S111 Perform nucleic acid sequencing on a biological sample containing free nucleic acid to obtain a sequencing result composed of a plurality of sequencing data;
- each sequence in the set of uniquely aligned sequencing sequences matches only one position of the reference genome;
- S117 Combine a predetermined number of adjacent primary windows into a plurality of secondary windows, and determine the number of sequences in each secondary window;
- S121 Perform a hypothesis test on the final windows obtained by merging, to obtain the final window containing the microdeletion or microduplication.
- the biological sample containing the free nucleic acid is free fetal nucleic acid in the peripheral blood of the pregnant woman.
- the nucleic acid is DNA.
- the sequencing result comprises a length of the free nucleic acid and a base arrangement order.
- the "length" refers to the length of the nucleic acid, and can be expressed in units of base pairs, that is, bp.
- the sequencing is paired-end sequencing, single-end sequencing or single-molecule sequencing.
- the length of the free nucleic acid is easily obtained, which is advantageous for the subsequent steps.
- the predetermined length in step S114 is 1 bp to 5 Mb, and the predetermined number in step S117 is 5 to 100.
- the predetermined length is 20-40 kb.
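The windowing described above (fixed-length primary windows, then a predetermined number of adjacent primary windows merged into secondary windows) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the use of 0-based read start positions, and the choice to let the last secondary window be shorter than the others are all assumptions.

```python
def count_primary_windows(read_starts, chrom_length, window_size=20_000):
    """Count uniquely aligned reads per fixed-length primary window.

    read_starts: 0-based start positions on one chromosome (assumed input format).
    window_size: the "predetermined length"; 20 kb is within the preferred range.
    """
    n_windows = -(-chrom_length // window_size)  # ceiling division
    counts = [0] * n_windows
    for pos in read_starts:
        if 0 <= pos < chrom_length:
            counts[pos // window_size] += 1
    return counts


def merge_into_secondary(primary_counts, group_size=5):
    """Sum each run of `group_size` adjacent primary windows.

    group_size is the "predetermined number" (5 to 100 in the text).
    The final secondary window may cover fewer primary windows (assumption).
    """
    return [
        sum(primary_counts[i:i + group_size])
        for i in range(0, len(primary_counts), group_size)
    ]
```

For example, five reads on a 100 kb chromosome binned at 20 kb and merged in pairs give three secondary windows; real data would use per-chromosome coordinates from the aligner.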
- the method of GC correction comprises using locally weighted regression, linear regression or logistic regression.
- the inter-batch adjustment calculates, for each primary window, a baseline over all samples in the same sequencing batch, and applies a weighted correction to the number of uniquely aligned sequencing sequences in each primary window based on that baseline.
- the T1 value in step S118 is calculated according to a Z-test or a t-test, and the filtering removes the secondary windows whose T1 value lies between -3 and 3.
- the T2 value in step S119 is calculated according to a rank sum test, a sign test or a runs test, and "no significant difference" means that the T2 value of two adjacent windows lies between -3 and 3.
- the hypothesis test in step S121 is calculated according to a Z-test or a t-test, with the test threshold defined as 3. That is, when the test statistic is > 3 or < -3, the window is determined to be the final window containing the microdeletion or microduplication.
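The threshold rule above (keep a window only when its statistic is > 3 or < -3) can be illustrated with a simple z-score. The text does not say how the statistic is constructed, so the sketch below assumes each window of the test sample is compared with the same window in a set of reference samples; `z_scores` and `significant_windows` are hypothetical names.

```python
from statistics import mean, stdev


def z_scores(sample_counts, reference_counts):
    """z = (sample - reference mean) / reference sd, per window.

    reference_counts[i] holds the counts of window i across reference
    samples (assumed baseline; at least two values per window).
    """
    scores = []
    for x, ref in zip(sample_counts, reference_counts):
        mu, sd = mean(ref), stdev(ref)
        scores.append((x - mu) / sd if sd > 0 else 0.0)
    return scores


def significant_windows(scores, threshold=3.0):
    """Indices of windows whose statistic is > threshold or < -threshold."""
    return [i for i, z in enumerate(scores) if z > threshold or z < -threshold]
```

Windows whose statistic falls between -3 and 3 are the ones the text filters out; values exactly at the threshold are treated as not significant here.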
- the male fetal nucleic acid concentration fy in the step S2 is obtained by the following steps:
- the total number of primary windows without the microdeletion or microduplication, and their number of sequences, can be derived from the method for obtaining the final window containing the microdeletion or microduplication.
- step S212 further comprises: dividing the reference genome into a plurality of primary windows according to a predetermined length, and removing the primary windows on the Y chromosome whose number of uniquely aligned sequences is more than 5 times the average number of sequences.
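A minimal sketch of the Y-chromosome window filter described above. It assumes the "average number of sequences" is the mean over all Y-chromosome primary windows of the sample, outliers included; the text does not specify this, and the function name is hypothetical.

```python
def drop_high_coverage_windows(counts, max_fold=5.0):
    """Keep (index, count) pairs for windows at or below max_fold x the mean.

    counts: uniquely aligned read counts per Y-chromosome primary window.
    The mean is taken over all supplied windows (assumption).
    """
    if not counts:
        return []
    average = sum(counts) / len(counts)
    return [(i, c) for i, c in enumerate(counts) if c <= max_fold * average]
```

Returning the original indices alongside the counts keeps the surviving windows traceable to their genomic positions.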
- the primary windows are GC-corrected primary windows.
- the female fetal nucleic acid concentration fs in the step S2 is obtained by the following steps:
- the predetermined range in the step S222 is determined by the following steps:
- S2222 setting a plurality of candidate length ranges, and respectively determining a frequency of the unique alignment sequencing sequence that appears in each candidate length range of the plurality of control samples;
- the predetermined range is determined based on a plurality of control samples in which the concentration of the nucleic acid is known; preferably, the predetermined range is determined based on at least 20 control samples.
- the control sample is a peripheral blood sample from a woman pregnant with a normal male fetus, with a known proportion of free fetal nucleic acid, and the nucleic acid concentration in the control sample is determined using the Y chromosome.
- the free fetal nucleic acid concentration in the control sample is determined using the Y chromosome, i.e., by the method described above for obtaining the male fetal nucleic acid concentration fy.
- each candidate length range in step S2222 has a span of 1 to 300 bp, preferably 1 to 20 bp.
- the plurality of candidate length ranges have a step size of 1-2 bp.
- the candidate length ranges are 1-20, 2-21, 3-22, ..., respectively, where the span is 20 bp and the step size is 1 bp.
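The sliding candidate ranges (1-20, 2-21, 3-22, ...) and the per-range frequency can be sketched as below. Two things are assumptions rather than statements from the text: the upper fragment length `max_len` at which the scan stops, and the definition of frequency as the fraction of all uniquely aligned fragments whose length falls inside the inclusive range.

```python
def candidate_length_ranges(min_len=1, max_len=300, span=20, step=1):
    """Inclusive (start, end) ranges: (1, 20), (2, 21), (3, 22), ..."""
    ranges = []
    start = min_len
    while start + span - 1 <= max_len:
        ranges.append((start, start + span - 1))
        start += step
    return ranges


def range_frequency(fragment_lengths, length_range):
    """Fraction of fragments whose length lies within the inclusive range."""
    if not fragment_lengths:
        return 0.0
    low, high = length_range
    hits = sum(1 for n in fragment_lengths if low <= n <= high)
    return hits / len(fragment_lengths)
```

Computing `range_frequency` for every candidate range in every control sample yields the table from which a correlation with the known fetal concentration can then be calculated.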
- the predetermined range in the step S222 is 179 bp to 206 bp.
- the predetermined function in the step S223 is obtained by the following steps:
- S2232 Fit the frequency of uniquely aligned sequencing sequences within the predetermined range in the plurality of control samples to the known nucleic acid concentration, to determine the predetermined function.
- the fit is a linear fit.
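A linear fit of this kind can be sketched with ordinary least squares. The control-sample numbers below are invented purely for illustration; the slope is negative only because the worked example in this document reports a negative correlation coefficient for the 179-206 bp range. `fit_line` and `predict_concentration` are hypothetical names.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences with >= 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_concentration(frequency, slope, intercept):
    """Apply the fitted function to a new sample's in-range frequency."""
    return slope * frequency + intercept


# Hypothetical control samples: in-range frequency vs. known fetal fraction.
control_freq = [0.30, 0.28, 0.26, 0.24]
control_conc = [0.05, 0.10, 0.15, 0.20]
slope, intercept = fit_line(control_freq, control_conc)
```

The fitted `slope` and `intercept` play the role of the predetermined function; a sample from a female fetus would then be scored with `predict_concentration`.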
- step S4 further includes: if rmY ≥ 2 is obtained when calculating from the deletion copy number, or rmY ≥ 6 is obtained when calculating from the duplication copy number, the result is judged unreliable and the false-positive result is filtered out;
- or, if rms ≥ 2 is obtained when calculating from the deletion copy number, or rms ≥ 6 is obtained when calculating from the duplication copy number, the result is judged unreliable and the false-positive result is filtered out.
- step S5 further comprises: if dmY < 0.13 or dmY > 0.85, dmY is positive;
- or, if dms < 0.15 or dms > 0.791, dms is positive.
- the determination principle in step S7 is: if amY is between 0.95 and 1.05, the microdeletion/microduplication fragment is considered to be of maternal origin and is filtered out;
- the microdeletion/microduplication fragment is considered to be of maternal origin and is filtered out.
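The three filters of steps S4, S5 and S7 can be sketched as small predicates. This is an illustration under stated assumptions: the ratio (rmY or rms) and the sum value (amY) are taken as inputs because their derivation is not given in this part of the text, the fractional part is computed as the ratio minus its floor, the bounds in `is_maternal` are treated as exclusive, and all function names are hypothetical.

```python
import math


def passes_copy_number_filter(ratio, kind):
    """Step S4: a ratio of >= 2 (deletion) or >= 6 (duplication) is unreliable.

    ratio: rmY or rms; kind: "deletion" or "duplication".
    """
    limit = {"deletion": 2.0, "duplication": 6.0}[kind]
    return ratio < limit


def fractional_part(ratio):
    """dmY or dms: the part of the ratio after the decimal point."""
    return ratio - math.floor(ratio)


def is_positive(ratio, low, high):
    """Step S5: positive when the fractional part is < low or > high.

    Thresholds from the text: (0.13, 0.85) for dmY, (0.15, 0.791) for dms.
    """
    d = fractional_part(ratio)
    return d < low or d > high


def is_maternal(sum_value, low=0.95, high=1.05):
    """Step S7: a sum value inside (low, high) suggests maternal origin.

    Defaults are the amY bounds from the text; bounds are exclusive (assumption).
    """
    return low < sum_value < high
```

A call survives only if it passes the copy-number filter, is positive on the fractional part, and is not flagged as maternal.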
- an aspect of the present invention also provides an apparatus 100 for determining microdeletions and microduplications in a fetal chromosome, comprising:
- a microdeletion/microduplication fragment concentration calculating device 110 for obtaining the concentration fm of the fragment containing the microdeletion or microduplication;
- a fetal nucleic acid concentration obtaining device 120 for obtaining a male fetal nucleic acid concentration fy or a female fetal nucleic acid concentration fs;
- a first filtering device 140 configured to calculate rmY or rms according to the deletion copy number or the duplication copy number, and filter out false positives;
- a second filtering device 150 configured to take the fractional part dmY of rmY, or the fractional part dms of rms, and determine whether dmY or dms is positive, filtering out the result otherwise;
- a third filtering device 170 configured to filter the microdeletion/microduplication fragments according to the determination principle, to obtain the fetal chromosomal microdeletion/microduplication fragments.
- the microdeletion/microduplication fragment concentration calculating device 110 further includes:
- a primary window obtaining unit 111 configured to obtain, based on the primary windows containing the microdeletion or microduplication, the primary windows containing no microdeletion or microduplication, and to calculate the total number of sequences in, and the total number of, the primary windows containing the microdeletion or microduplication, as well as the total number of sequences in, and the number of, the primary windows containing no microdeletion or microduplication;
- the microdeletion/microduplication fragment concentration calculating device 110 further includes a unit 115 for obtaining the final window containing the microdeletion or microduplication.
- the final window obtaining unit 115 includes:
- a first sequencing component 1151 for performing nucleic acid sequencing on a biological sample containing free nucleic acid to obtain a sequencing result composed of a plurality of sequencing data;
- a length determining component 1153 for determining the length of each uniquely aligned sequencing sequence in the set of uniquely aligned sequencing sequences;
- a primary window determining component 1154 for dividing the reference genome into a plurality of primary windows according to a predetermined length, the predetermined length being 1 bp to 5 Mb;
- a first statistical component 1155 configured to count the number of uniquely aligned sequencing sequences falling into each primary window;
- a first merging component 1157 configured to combine a predetermined number of adjacent primary windows into a plurality of secondary windows, and determine the number of sequences in each secondary window;
- a first filter component 1158 configured to perform a statistical test on each secondary window, calculate a T1 value, and filter the secondary windows according to the T1 value;
- a second merging component 1159 configured to perform a statistical test on the filtered secondary windows, calculate a T2 value, and merge two adjacent secondary windows with no significant difference into a final window according to the T2 value;
- a final window determining component 1161 configured to perform a hypothesis test on the merged final windows to obtain the final window containing the microdeletion or microduplication.
- the predetermined number in the first merging component 1157 is between 5 and 100.
- the predetermined length is 20-40 kb.
- the method of GC correction in the correction component 1156 includes locally weighted regression, linear regression or logistic regression.
- the inter-batch adjustment in the correction component 1156 calculates, for each primary window, a baseline over all samples in the same sequencing batch, and applies a weighted correction to the number of uniquely aligned sequencing sequences in each primary window based on that baseline.
- the T1 value in the first filter component 1158 is calculated according to a Z-test or a t-test, and the filtering removes the secondary windows whose T1 value lies between -3 and 3.
- the T2 value in the second merging component 1159 is calculated according to a rank sum test, a sign test or a runs test, and "no significant difference" means that the T2 value of two adjacent windows lies between -3 and 3.
- the hypothesis test in the final window determining component 1161 is calculated according to a Z-test or a t-test, with the test threshold defined as 3. That is, when the test statistic is > 3 or < -3, the window is determined to be the final window containing the microdeletion or microduplication.
- the fetal nucleic acid concentration obtaining device 120 further includes a male fetal nucleic acid concentration fy obtaining unit 121.
- the male fetal nucleic acid concentration fy obtaining unit 121 includes:
- a second sequencing component 1211 for sequencing a biological sample containing free nucleic acid to obtain a sequencing result composed of a plurality of sequencing data;
- a first number determining component 1212 configured to determine, according to the sequencing result, the number of uniquely aligned sequencing sequences on the Y chromosome of the sample that fall within each primary window;
- a second statistical component 1213 for counting the sum of the numbers of uniquely aligned sequencing sequences in the primary windows on the Y chromosome, and the total number of primary windows;
- the first number determining component 1212 further comprises a filtering module 12121 for dividing the reference genome into a plurality of primary windows according to a predetermined length and removing the primary windows on the Y chromosome whose number of uniquely aligned sequences is more than 5 times the average number of sequences.
- the fetal nucleic acid concentration obtaining device 120 further includes a female fetal nucleic acid concentration fs obtaining unit 122.
- the female fetal nucleic acid concentration fs obtaining unit 122 includes:
- a third sequencing component 1221 configured to sequence a biological sample containing free nucleic acid to obtain a sequencing result composed of a plurality of sequencing data;
- a second number determining component 1222 configured to determine, according to the sequencing result, the number of uniquely aligned sequencing sequences in the sample whose length falls within a predetermined range;
- a frequency determining component 1223 for determining the frequency at which uniquely aligned sequencing sequences occur within the predetermined range, based on the number of uniquely aligned sequencing sequences whose length falls within the predetermined range;
- a female fetal nucleic acid concentration determining component 1224 configured to determine the female fetal nucleic acid concentration fs in the sample according to a predetermined function, based on the frequency at which uniquely aligned sequencing sequences occur within the predetermined range.
- the female fetal nucleic acid concentration fs obtaining unit 122 further includes a predetermined range determining component 1225.
- the predetermined range determining component 1225 further includes:
- a length determining module 12251 configured to determine the lengths of the uniquely aligned sequencing sequences in the plurality of control samples;
- a first frequency determining module 12252 configured to set a plurality of candidate length ranges and to determine, for each of the plurality of control samples, the frequency at which uniquely aligned sequencing sequences occur in each candidate length range;
- a correlation coefficient determination module 12253 configured to determine, from the frequency at which uniquely aligned sequencing sequences of the plurality of control samples occur within each candidate length range and from the concentration of the nucleic acid in the control samples, a correlation coefficient between each candidate length range and the concentration of the nucleic acid in the control samples;
- a predetermined range determining module 12254 configured to select, according to the value of the correlation coefficient, at least one candidate length range or combination of candidate length ranges as the predetermined range.
- the predetermined range is determined based on a plurality of control samples in which the nucleic acid concentration is known; preferably, the predetermined range is determined based on at least 20 control samples.
- the control sample is a peripheral blood sample from a woman pregnant with a normal male fetus, with a known proportion of free fetal nucleic acid, and the free fetal nucleic acid concentration in the control sample is determined using the Y chromosome, i.e., by the method described above for obtaining the male fetal nucleic acid concentration fy.
- each candidate length range has a span of 1 to 300 bp, preferably 1 to 20 bp.
- the plurality of candidate length ranges have a step size of 1-2 bp.
- the candidate length ranges are 1-20, 2-21, 3-22, ..., respectively, where the span is 20 bp and the step size is 1 bp.
- the predetermined range is from 179 bp to 206 bp.
- the female fetal nucleic acid concentration fs obtaining unit 122 further includes a predetermined function determining component 1226.
- the predetermined function determining component 1226 includes:
- a second frequency determining module 12261 configured to determine, for each of the plurality of control samples, the frequency at which uniquely aligned sequencing sequences occur within the predetermined range;
- a fitting module 12262 for fitting the frequency at which uniquely aligned sequencing sequences occur within the predetermined range in the plurality of control samples to the known nucleic acid concentration, to determine the predetermined function.
- the fit is a linear fit.
- the first filtering device 140 further includes a false positive judgment unit 141 configured such that, if rmY ≥ 2 is obtained when calculating from the deletion copy number, or rmY ≥ 6 is obtained when calculating from the duplication copy number, the result is judged unreliable and the false-positive result is filtered out;
- or, if rms ≥ 2 is obtained when calculating from the deletion copy number, or rms ≥ 6 is obtained when calculating from the duplication copy number, the result is judged unreliable and the false-positive result is filtered out.
- the second filtering device 150 further includes a positive judgment unit 151 for determining that dmY is positive if dmY < 0.13 or dmY > 0.85, or that dms is positive if dms < 0.15 or dms > 0.791.
- the determination principle in the third filtering device 170 is: if amY is between 0.95 and 1.05, the microdeletion/microduplication fragment is considered to be of maternal origin and is filtered out;
- the microdeletion/microduplication fragment is considered to be of maternal origin and is filtered out.
- the reference genome is divided into a plurality of primary windows according to a predetermined length, the predetermined length being 1 bp to 5 Mb, preferably 20 kb to 40 kb, for example (1-20 bp, 20-40 bp, 40-80 bp, 80-100 bp, 100-120 bp, ...);
- the male fetal nucleic acid concentration fy is calculated.
- d2 is the average depth of the primary windows without the microdeletion or microduplication:
- d2 = total number of sequences in the primary windows without the microdeletion or microduplication / number of primary windows without the microdeletion or microduplication.
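The two average depths (d1 over the primary windows containing the microdeletion or microduplication, d2 over the windows without it) are each a total sequence count divided by a window count. The sketch below assumes the affected region is one contiguous, half-open index range of primary windows and that every other supplied window counts as unaffected; it stops at the depths, since the formula that turns them into fm is not given in this part of the text.

```python
def average_depth(window_counts):
    """Total number of sequences divided by the number of primary windows."""
    if not window_counts:
        raise ValueError("no primary windows supplied")
    return sum(window_counts) / len(window_counts)


def split_depths(primary_counts, affected_start, affected_end):
    """Return (d1, d2) for windows inside and outside [affected_start, affected_end)."""
    inside = primary_counts[affected_start:affected_end]
    outside = primary_counts[:affected_start] + primary_counts[affected_end:]
    return average_depth(inside), average_depth(outside)
```

With counts `[100, 100, 50, 50, 50, 100]` and an affected range of windows 2 to 4, the depths are 50 inside and 100 outside, the pattern expected for a region with reduced coverage.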
- the predetermined range is 179 bp to 206 bp.
- the predetermined range is obtained by the following steps:
- this embodiment uses male fetal control samples, in which the concentration of free fetal nucleic acid is determined from the Y chromosome, i.e., by the method described above for obtaining the male fetal nucleic acid concentration fy.
- a combination of degrees or ranges determines a correlation coefficient between each of the candidate length ranges and the concentration of the nucleic acid, wherein the correlation coefficient is calculated by a correlation analysis such as linear regression, logistic regression or locally weighted regression.
- the candidate length range has a span of 1-300 bp, preferably 1-20 bp.
- the multiple candidate length ranges have a step size of 1-2 bp.
- the predetermined function is obtained by the following steps:
- if rms ≥ 2 is obtained when calculating from the deletion copy number, or rms ≥ 6 is obtained when calculating from the duplication copy number, the result is judged unreliable and the false-positive result is filtered out;
- False positives are filtered to remove the effect of multiple copies and make the results more accurate.
- the microdeletion/microduplication fragment is considered to be of maternal origin and is filtered out;
- the microdeletion/microduplication fragment is considered to be of maternal origin and is filtered out, and the fetal chromosomal microdeletion/microduplication fragments are obtained after filtering.
- One batch of 100 samples was selected, and 2 ml of peripheral blood was drawn from each for plasma separation.
- Library construction can be performed with reference to plasma library construction requirements well known to those skilled in the art.
- Sequencing on the instrument can be performed with reference to sequencing procedures well known to those skilled in the art.
- the sequencing results are aligned to the reference genome to determine the locations of the uniquely aligned sequences.
- the reference genome was divided into multiple primary windows of 20 kb each.
- the number of uniquely aligned sequences and the GC content in each primary window were counted, and the number of sequences falling into each primary window was GC-corrected by locally weighted regression.
- Table 1 shows the results for 19 samples: the first column is the sample ID, the second column is the microdeletion or microduplication, the third column is the length of the microdeletion or microduplication on the chromosome, and the fourth column is the detected T value.
- the selected range was 179 bp to 206 bp, and the correlation coefficient was R = -0.9056996.
- for the remaining 11 samples, the frequency of nucleic acid fragments in the 179 bp to 206 bp range was determined, and the free fetal nucleic acid concentration was calculated from it using the fitted function.
- Filtering is performed according to the copy number: deletion fragments with an rmY value of 2 or more are filtered out, and duplication fragments with an rmY value of 6 or more are filtered out.
- fragments with a dmY greater than 0.13 and less than 0.85 were filtered out.
- Fragments with amY > 0.95 and amY < 1.05 were filtered out, to obtain the male fetus samples containing microdeletions and microduplications.
- fragments with a dms greater than 0.15 and less than 0.791 were filtered out.
- Fragments with ams > 0.93 and ams < 1.06 were filtered out, to obtain the female fetus samples containing microdeletions and microduplications.
- filtering the microdeletion/microduplication results removes a large number of false positives and yields an accurate result; see the figure.
- the abscissa indicates the number of the sample
- the ordinate indicates the concentration
- fm indicates the concentration estimated from the microdeletion or microduplication
- fy indicates the concentration estimated from chrY for male fetus samples
- fs indicates the fetal concentration estimated from fragment lengths.
- the method for determining microdeletions and microduplications in the fetal chromosome in this embodiment is the same as in Example 2, except that in step 4.2 the genome is divided into 40 kb windows.
- the method for determining microdeletions and microduplications in the fetal chromosome in this embodiment is the same as in Example 2, except that in step 4.4 the primary windows are merged in units of 200, so that the merged secondary windows are 4 Mb long.
Landscapes
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Genetics & Genomics (AREA)
- Biotechnology (AREA)
- Biophysics (AREA)
- Chemical & Material Sciences (AREA)
- Molecular Biology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Bioinformatics & Computational Biology (AREA)
- Analytical Chemistry (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
Description
The present invention relates to the field of biomedicine, and in particular to a method and device for determining microdeletions and microduplications in chromosomes.
In clinical non-invasive prenatal testing (NIPT), screening for fetal microdeletions and microduplications has low sensitivity. Existing methods for estimating fetal chromosomal microdeletions and microduplications from maternal peripheral blood follow two main approaches: 1) estimating them from changes in the proportion of fetal DNA sequences in maternal plasma; 2) exploiting differences observed at single nucleotide polymorphism (SNP) sites, with multiple SNP sites selected for the estimate.
All existing detection methods have limitations. Method 1) has low precision and produces many false positives: because the result is derived solely from changes in the proportion of fragments within a region, and no effective filtering method is available, false positives are hard to avoid. Method 2) requires probe capture and high-depth sequencing, or requires paternal information. High-depth capture requires designing a chip, which makes the experiment more difficult; high-depth sequencing adds cost; and regions that are not captured cannot be assessed.
Summary of the invention
The object of the present invention is to provide a method and device for determining microdeletions and microduplications in fetal chromosomes. The method evaluates a microdeletion/microduplication by calculating the concentration of the fragment containing it and the concentration of the fetus's own nucleic acid, which lowers the false positive rate and gives high accuracy.
Based on the above object, one aspect of the present invention provides a method for determining microdeletions and microduplications in fetal chromosomes, comprising the following steps:
S1. Obtain the concentration fm of the fragment containing the microdeletion/microduplication;
S2. Obtain the male fetal nucleic acid concentration fy or the female fetal nucleic acid concentration fs;
S3. Calculate the ratio of fm to the male fetal nucleic acid concentration fy, rmY = fm/fy, or the ratio of fm to the female fetal nucleic acid concentration fs, rms = fm/fs;
S4. Evaluate rmY or rms according to the deleted or duplicated copy number, and filter out false positives;
S5. Take the fractional part dmY of rmY, or the fractional part dms of rms, and determine whether dmY or dms is positive; if it is not, filter out the result;
S6. Calculate the sum of fm and the male fetal nucleic acid concentration fy, amY = fm + fy, or the sum of fm and the female fetal nucleic acid concentration fs, ams = fm + fs;
S7. Filter the microdeletion/microduplication fragments according to the decision rule; the fragments that remain after filtering are the fetal chromosomal microdeletion/microduplication fragments.
Another aspect of the present invention provides a device for determining microdeletions and microduplications in fetal chromosomes, comprising:
a microdeletion/microduplication fragment concentration calculating device, for obtaining the concentration fm of the fragment containing the microdeletion/microduplication;
a fetal nucleic acid concentration obtaining device, for obtaining the male fetal nucleic acid concentration fy or the female fetal nucleic acid concentration fs;
a ratio calculating device, for calculating the ratio of fm to the male fetal nucleic acid concentration fy, rmY = fm/fy, or the ratio of fm to the female fetal nucleic acid concentration fs, rms = fm/fs;
a first filtering device, for evaluating rmY or rms according to the deleted or duplicated copy number and filtering out false positives;
a second filtering device, for taking the fractional part dmY of rmY, or the fractional part dms of rms, and determining whether dmY or dms is positive, the result being filtered out if it is not;
a sum calculating device, for calculating the sum of fm and the male fetal nucleic acid concentration fy, amY = fm + fy, or the sum of fm and the female fetal nucleic acid concentration fs, ams = fm + fs;
a third filtering device, for filtering the microdeletion/microduplication fragments according to the decision rule, the fragments that remain after filtering being the fetal chromosomal microdeletion/microduplication fragments.
The method and device provided by the present invention can accurately determine microdeletions and microduplications in chromosomes, and are particularly suitable for determining microdeletions and microduplications of fetal chromosomes from the peripheral blood of a pregnant woman.
Compared with the prior art:
1. The invention requires no additional chip design, which saves the cost of chip design and keeps the experimental method simple.
2. High-depth sequencing is not required. The whole-genome data already used for chromosomal aneuploidy detection are processed further to yield accurate microdeletion/microduplication results directly, without increasing the amount of data.
3. It overcomes the drawback of prior-art SNP-based methods, which may miss regions that capture cannot reach; the present invention can test across the whole genome.
4. For the first time, whole-genome sequencing is used to determine the concentration of fetal microdeletions and microduplications.
5. False positives are reduced and accuracy is improved.
6. The influence of maternal microdeletions and microduplications on the fetal result is removed, improving accuracy.
Figure 1 is a flowchart of a method for determining fetal chromosomal microdeletions and microduplications in one embodiment of the present invention.
Figure 2 is a flowchart of the method for obtaining the concentration fm of the fragment containing the microdeletion/microduplication in the embodiment of Figure 1.
Figure 3 is a flowchart of the method for obtaining the final windows containing a microdeletion/microduplication in the embodiment of Figure 1.
Figure 4 is a flowchart of the method for obtaining the male fetal nucleic acid concentration fy in the embodiment of Figure 1.
Figure 5 is a flowchart of the method for obtaining the female fetal nucleic acid concentration fs in the embodiment of Figure 1.
Figure 6 is a flowchart of the method for obtaining the predetermined range in the method of Figure 5.
Figure 7 is a flowchart of the method for obtaining the predetermined function in the method of Figure 5.
Figure 8 is a structural block diagram of a device for determining microdeletions and microduplications in fetal chromosomes in another embodiment of the present invention.
Figure 9 is a structural block diagram of the microdeletion/microduplication fragment concentration calculating device in the embodiment of Figure 8.
Figure 10 is a structural block diagram of the unit for obtaining the final windows in which microdeletions/microduplications lie, in the embodiment of Figure 8.
Figure 11 is a structural block diagram of the male fetal nucleic acid concentration fy obtaining unit in the embodiment of Figure 8.
Figure 12 is a structural block diagram of the female fetal nucleic acid concentration fs obtaining unit in the embodiment of Figure 8.
Figure 13 is a structural block diagram of the predetermined range determining element in the embodiment of Figure 8.
Figure 14 is a structural block diagram of the predetermined function determining element in the embodiment of Figure 8.
Figure 15 shows the microdeletion/microduplication results for the 19 samples in Example 2.
Description of main component symbols
The following detailed description further illustrates the present invention with reference to the above drawings.
Embodiments of the present invention are described in detail below. The embodiments described with reference to the drawings are exemplary; they serve only to explain the invention and are not to be construed as limiting it.
It should be noted that the terms "primary", "secondary" and "final" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance, or as implicitly indicating the number of the technical features referred to. A feature qualified by "primary", "secondary" or "final" may therefore explicitly or implicitly include one or more such features. Further, in the description of the present invention, "a plurality" means two or more unless otherwise stated. In the present invention, a "uniquely aligned sequence" or "uniquely aligned sequencing sequence" is sometimes also referred to simply as a "sequence" or "sequencing sequence".
The term "maternal sample" herein refers to a biological sample obtained from a pregnant subject, for example a woman.
The term "microdeletion/microduplication" refers to a deletion or duplication on a chromosome with a length of 1.5 kb to 10 Mb.
The term "GC correction" refers to correcting for the GC content of the sequences.
Referring to Figure 1, the present invention provides a method for determining fetal chromosomal microdeletions and microduplications, comprising:
S1. Obtain the concentration fm of the fragment containing the microdeletion/microduplication;
S2. Obtain the male fetal nucleic acid concentration fy or the female fetal nucleic acid concentration fs;
S3. Calculate the ratio of fm to the male fetal nucleic acid concentration fy, rmY = fm/fy, or the ratio of fm to the female fetal nucleic acid concentration fs, rms = fm/fs;
S4. Evaluate rmY or rms according to the deleted or duplicated copy number, and filter out false positives;
S5. Take the fractional part dmY of rmY, or the fractional part dms of rms, and determine whether dmY or dms is positive; if it is not, filter out the result;
S6. Calculate the sum of fm and the male fetal nucleic acid concentration fy, amY = fm + fy, or the sum of fm and the female fetal nucleic acid concentration fs, ams = fm + fs;
S7. Filter the microdeletion/microduplication fragments according to the decision rule; the fragments that remain after filtering are the fetal chromosomal microdeletion/microduplication fragments.
The inventors found, surprisingly, that the method of the present invention can accurately determine microdeletions and microduplications in chromosomes, and is particularly suitable for determining microdeletions and microduplications of fetal chromosomes from the peripheral blood of a pregnant woman.
Referring to Figure 2, according to one embodiment of the present invention, the concentration fm of the fragment containing the microdeletion/microduplication in step S1 is obtained by the following steps:
S11. From the primary windows that contain the microdeletion/microduplication, obtain the primary windows that do not contain it; calculate the total number of sequences in, and the total number of, the primary windows containing the microdeletion/microduplication, and likewise the total number of sequences in, and the total number of, the primary windows not containing it;
S12. Obtain the average depth d1 of the primary windows containing the microdeletion/microduplication, where d1 = total number of sequences in those windows / total number of those windows;
S13. Obtain the average depth d2 of the primary windows not containing the microdeletion/microduplication, where d2 = total number of sequences in those windows / total number of those windows;
S14. Calculate the concentration fm of the fragment containing the microdeletion/microduplication, where fm = 2 × |d2 - d1| / d2.
As those skilled in the art will understand, the total number of primary windows not containing the microdeletion/microduplication, and their sequence count, can be derived from the method for obtaining the final windows containing a microdeletion/microduplication. For example, a final window has absolute start and end coordinates. From the relationship between these and the coordinates of the secondary windows, the corresponding secondary windows are located, and the primary windows they contain are identified. The first and last primary windows are discarded to exclude fluctuations in the data, which leaves the final set of primary windows from which the total sequence count is calculated.
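Steps S12 to S14 reduce to two averages and one ratio. The following is a minimal sketch of that arithmetic, not the applicant's implementation; the function and parameter names (`mean_depth`, `fragment_concentration`, `reads_cnv`, and so on) are hypothetical and chosen only for illustration.

```python
def mean_depth(total_reads: float, n_windows: int) -> float:
    """Average depth of a set of primary windows: total reads / window count."""
    if n_windows <= 0:
        raise ValueError("n_windows must be positive")
    return total_reads / n_windows


def fragment_concentration(reads_cnv: float, n_cnv: int,
                           reads_normal: float, n_normal: int) -> float:
    """Return fm = 2 * |d2 - d1| / d2 as defined in step S14.

    d1: average depth of primary windows containing the microdeletion/microduplication
    d2: average depth of primary windows not containing it
    """
    d1 = mean_depth(reads_cnv, n_cnv)
    d2 = mean_depth(reads_normal, n_normal)
    return 2 * abs(d2 - d1) / d2


# Illustrative numbers only: 100 affected windows averaging 95 reads,
# 1000 unaffected windows averaging 100 reads.
fm = fragment_concentration(9500, 100, 100000, 1000)
```

With these made-up counts d1 = 95 and d2 = 100, so fm = 2 × 5 / 100 = 0.1.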
Referring to Figure 3, according to one embodiment of the present invention, the final windows containing a microdeletion/microduplication are obtained by the following steps:
S111. Perform nucleic acid sequencing on a biological sample containing cell-free nucleic acid, to obtain a sequencing result consisting of multiple sequencing data;
S112. Align the sequencing result to a reference genome to construct a set of uniquely aligned sequencing sequences, each sequencing sequence in the set matching only one position in the reference genome;
S113. Determine the length of each uniquely aligned sequencing sequence in the set;
S114. Divide the reference genome into multiple primary windows of a predetermined length, the predetermined length being 1 bp to 5 Mb;
S115. Count the number of uniquely aligned sequencing sequences falling into each primary window;
S116. Apply GC correction to the number of sequences falling into each primary window, and apply a batch adjustment to the corrected result;
S117. Merge a predetermined number of adjacent primary windows into multiple secondary windows, and determine the number of sequences in each secondary window;
S118. Perform a statistical test on each secondary window to calculate a T1 value, and filter the secondary windows according to the T1 value;
S119. Perform a statistical test on the filtered secondary windows to calculate a T2 value, and according to the T2 value merge two adjacent secondary windows with no significant difference into a final window;
S120. Repeat steps S118 and S119 until no further windows can be merged;
S121. Perform a hypothesis test on the final windows obtained from the last merge, to obtain the final windows containing a microdeletion/microduplication.
According to one embodiment of the present invention, the biological sample containing cell-free nucleic acid is cell-free fetal nucleic acid from the peripheral blood of a pregnant woman.
According to one embodiment of the present invention, the nucleic acid is DNA.
According to one embodiment of the present invention, the sequencing result includes the length and base sequence of the cell-free nucleic acid. "Length" refers to the length of the nucleic acid and may be expressed in base pairs (bp).
According to one embodiment of the present invention, the sequencing is paired-end sequencing, single-end sequencing or single-molecule sequencing. The length of the cell-free nucleic acid is thereby easily obtained, which facilitates the subsequent steps.
Those skilled in the art will understand that, because cell-free fetal DNA in a blood sample is relatively short and the length of every cell-free DNA molecule must be obtained, single-end sequencing must read through the entire cell-free DNA molecule; otherwise paired-end sequencing is used.
According to one embodiment of the present invention, the predetermined length in step S114 is 1 bp to 5 Mb, and the predetermined number in step S117 is 5 to 100. Preferably, the predetermined length is 20 to 40 kb.
According to one embodiment of the present invention, the GC correction method includes locally weighted regression, linear regression or logistic regression.
According to one embodiment of the present invention, the batch adjustment uses all samples in the sequencing batch to calculate a baseline for each corresponding primary window, and applies a weighted correction, based on that baseline, to the number of uniquely aligned sequencing sequences in each primary window.
According to one embodiment of the present invention, the T1 value in step S118 is calculated by a Z-test or a t-test, and the filtering removes the secondary windows whose T1 value lies between -3 and 3.
According to one embodiment of the present invention, the T2 value in step S119 is calculated by a rank-sum test, a sign test or a runs test, and "no significant difference" means that the T2 value of two adjacent windows lies between -3 and 3.
According to one embodiment of the present invention, the hypothesis test in step S121 is a Z-test or a t-test, with the test threshold defined as 3. That is, when the test statistic is > 3 or < -3, the window is determined to be a final window containing a microdeletion/microduplication.
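Steps S117 and S118 can be illustrated with a small sketch: group adjacent primary-window counts into secondary windows, score each secondary window, and keep only those whose score falls outside the interval from -3 to 3. The text does not say what reference distribution the Z-test uses, so the sketch assumes the simplest option, a z-score against the mean and population standard deviation of all secondary windows in the same sample. That choice, the decision to drop a trailing incomplete group, and all names are assumptions made for illustration.

```python
from statistics import mean, pstdev


def merge_windows(primary_counts, k):
    """Sum every k adjacent primary-window counts into one secondary window.

    A trailing group with fewer than k windows is dropped (an assumption).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return [sum(primary_counts[i:i + k])
            for i in range(0, len(primary_counts) - k + 1, k)]


def z_scores(values):
    """z-score of each value against the sample's own mean and population SD."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def candidate_windows(primary_counts, k, threshold=3.0):
    """Return (index, z) for secondary windows with |z| > threshold.

    Windows with a score between -threshold and +threshold are filtered out,
    mirroring the T1 rule in step S118.
    """
    secondary = merge_windows(primary_counts, k)
    return [(i, z) for i, z in enumerate(z_scores(secondary))
            if abs(z) > threshold]


# Illustrative data: 19 ordinary secondary windows and one with half the depth.
counts = [50, 50] * 19 + [25, 25]
hits = candidate_windows(counts, k=2)
```

In this toy input only the last secondary window (index 19) survives the filter. A production pipeline would more plausibly score each window against a baseline built from reference samples, which the sketch does not attempt.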
Referring to Figure 4, according to one embodiment of the present invention, the male fetal nucleic acid concentration fy in step S2 is obtained by the following steps:
S211. Sequence a biological sample containing cell-free nucleic acid, to obtain a sequencing result consisting of multiple sequencing data;
S212. From the sequencing result, determine the number of uniquely aligned sequencing sequences on the Y chromosome of the sample that fall into each primary window;
S213. Calculate the sum of the numbers of uniquely aligned sequencing sequences in the primary windows on the Y chromosome, and the total number of those primary windows;
S214. Obtain the average depth dy of the primary windows on the Y chromosome, where dy = sum of the numbers of uniquely aligned sequencing sequences on the Y chromosome / number of primary windows on the Y chromosome;
S215. Obtain the male fetal nucleic acid concentration fy, where fy = 2 × dy / d2 and d2 is the average depth of the primary windows not containing the microdeletion/microduplication, d2 = total number of sequences in those windows / number of those windows.
As those skilled in the art will understand, the total number of primary windows not containing the microdeletion/microduplication, and their sequence count, can be derived from the method for obtaining the final windows containing a microdeletion/microduplication.
According to one embodiment of the present invention, step S212 further comprises: dividing the reference genome into multiple primary windows of a predetermined length, and removing the primary windows on the Y chromosome in which the number of uniquely aligned sequences is more than 5 times the average number of sequences. Preferably, the primary windows are those obtained after GC correction and adjustment.
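The calculation in steps S212 to S215 can be sketched as follows. This is a hedged illustration under two assumptions not fixed by the text: the "average number of sequences" used for the 5-fold cut-off is taken over the Y-chromosome primary windows themselves, and a window is kept when its count is at most 5 times that average. The function name `male_fetal_fraction` and its parameters are hypothetical.

```python
def male_fetal_fraction(y_window_counts, d2, outlier_factor=5.0):
    """Estimate fy = 2 * dy / d2 from per-window read counts on the Y chromosome.

    y_window_counts: uniquely aligned read count in each Y primary window
    d2: average depth of primary windows without a microdeletion/microduplication
    outlier_factor: windows above this multiple of the mean count are removed
    """
    if not y_window_counts:
        raise ValueError("no Y-chromosome windows supplied")
    if d2 <= 0:
        raise ValueError("d2 must be positive")
    avg = sum(y_window_counts) / len(y_window_counts)
    kept = [c for c in y_window_counts if c <= outlier_factor * avg]
    dy = sum(kept) / len(kept)  # average depth of the retained Y windows
    return 2 * dy / d2


# Illustrative numbers only: nine ordinary windows and one extreme outlier.
fy = male_fetal_fraction([5] * 9 + [1000], d2=100)
```

Here the outlier window (1000 reads against a mean of 104.5) is removed, dy = 5, and fy = 2 × 5 / 100 = 0.1.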
Referring to Figure 5, according to one embodiment of the present invention, the female fetal nucleic acid concentration fs in step S2 is obtained by the following steps:
S221. Sequence a biological sample containing cell-free nucleic acid, to obtain a sequencing result consisting of multiple sequencing data;
S222. From the sequencing result, determine the number of uniquely aligned sequencing sequences in the sample whose length falls within a predetermined range;
S223. From that number, determine the frequency with which uniquely aligned sequencing sequences occur within the predetermined range;
S224. From that frequency, determine the female fetal nucleic acid concentration fs in the sample using a predetermined function.
Referring to Figure 6, according to one embodiment of the present invention, the predetermined range in step S222 is determined by the following steps:
S2221. Determine the lengths of the uniquely aligned sequencing sequences contained in multiple control samples;
S2222. Set multiple candidate length ranges, and for each control sample determine the frequency with which uniquely aligned sequencing sequences occur within each candidate length range;
S2223. From those frequencies and the nucleic acid concentration in each control sample, determine the correlation coefficient between each candidate length range and the nucleic acid concentration in the control samples;
S2224. Based on the values of the correlation coefficients, select at least one candidate length range, or a combination of candidate length ranges, as the predetermined range.
According to one embodiment of the present invention, the predetermined range is determined from multiple control samples in which the nucleic acid concentration is known; preferably, it is determined from at least 20 control samples.
According to one embodiment of the present invention, the control samples are peripheral blood samples from pregnant women carrying a normal male fetus, with a known proportion of cell-free fetal nucleic acid, and the nucleic acid concentration in the control samples is determined using the Y chromosome.
According to one embodiment of the present invention, the cell-free fetal nucleic acid concentration in the control samples is determined using the Y chromosome, that is, by the method described above for the male fetal nucleic acid concentration fy.
According to one embodiment of the present invention, the span of each candidate length range in S2222 is 1 to 300 bp, preferably 1 to 20 bp.
According to one embodiment of the present invention, the step between successive candidate length ranges is 1 to 2 bp.
For example, the candidate length ranges are 1-20, 2-21, 3-22, and so on, where the span is 20 bp and the step is 1 bp.
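The range search in S2221 to S2224 can be sketched as a sliding window over read lengths scored by correlation. The sketch below makes several assumptions the text leaves open: "frequency" is taken to be the fraction of a sample's uniquely aligned reads whose length lies in the range, the correlation coefficient is Pearson's, and the single range with the largest absolute coefficient is selected. All function names are hypothetical.

```python
from math import sqrt


def candidate_ranges(min_len, max_len, span, step):
    """Inclusive (low, high) length ranges covering `span` bp, shifted by `step` bp."""
    return [(lo, lo + span - 1)
            for lo in range(min_len, max_len - span + 2, step)]


def length_frequency(lengths, lo, hi):
    """Fraction of reads whose length lies in the inclusive range [lo, hi]."""
    if not lengths:
        raise ValueError("no read lengths supplied")
    return sum(lo <= n <= hi for n in lengths) / len(lengths)


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def best_range(controls, ranges):
    """Pick the range whose frequency correlates most strongly with the known values.

    controls: list of (read_lengths, known_fetal_concentration) per control sample
    Returns ((low, high), correlation_coefficient).
    """
    known = [conc for _, conc in controls]
    scored = []
    for lo, hi in ranges:
        freqs = [length_frequency(lengths, lo, hi) for lengths, _ in controls]
        scored.append(((lo, hi), pearson(freqs, known)))
    return max(scored, key=lambda item: abs(item[1]))


# The example from the text: span 20 bp, step 1 bp.
ranges = candidate_ranges(1, 22, span=20, step=1)  # [(1, 20), (2, 21), (3, 22)]
```

Selecting on the absolute value keeps ranges whose frequency falls as the fetal concentration rises, which matters because a negative relationship is as informative as a positive one.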
According to one embodiment of the present invention, the predetermined range in step S222 is 179 bp to 206 bp.
Referring to Figure 7, according to one embodiment of the present invention, the predetermined function in step S223 is obtained by the following steps:
S2231. In each of the multiple control samples, determine the frequency with which uniquely aligned sequencing sequences occur within the predetermined range;
S2232. Fit those frequencies to the known nucleic acid concentrations of the control samples, to determine the predetermined function.
According to one embodiment of the present invention, the fit is a linear fit.
According to one embodiment of the present invention, the predetermined function is d = -0.3215 × p + 1.62562, where d is the nucleic acid concentration and p is the frequency with which uniquely aligned sequencing sequences occur within the predetermined range.
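A linear fit of the kind named in S2232 can be illustrated with ordinary least squares. The sketch below is an assumption about how such a fit might be done, not the procedure used to obtain the stated coefficients. The names `fit_line` and `concentration_from_frequency` are hypothetical, and since the text does not give the units of p and d, both are treated as plain numbers.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def concentration_from_frequency(p, slope=-0.3215, intercept=1.62562):
    """Apply d = slope * p + intercept; the defaults are the coefficients in the text."""
    return slope * p + intercept


# Fitting toy control data (illustrative values, not from the source):
slope, intercept = fit_line([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
```

In practice the pairs passed to `fit_line` would be the per-control frequencies from S2231 and the Y-chromosome-derived concentrations of the same controls.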
According to one embodiment of the present invention, step S4 further comprises: if rmY ≥ 2 for a deletion, or rmY ≥ 6 for a duplication, the result is deemed unreliable and is filtered out as a false positive;
or, if rms ≥ 2 for a deletion, or rms ≥ 6 for a duplication, the result is deemed unreliable and is filtered out as a false positive.
According to one embodiment of the present invention, step S5 further comprises: if dmY < 0.13 or dmY > 0.85, dmY is positive;
or, if dms < 0.15 or dms > 0.791, dms is positive.
According to one embodiment of the present invention, the decision rule in step S7 is: if amY is between 0.95 and 1.05, the microdeletion/microduplication fragment is considered to come from the mother and is filtered out;
or, if ams is between 0.93 and 1.06, the microdeletion/microduplication fragment is considered to come from the mother and is filtered out.
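Steps S3 to S7, with the thresholds given above, can be put together in one short sketch. It assumes that the filters are applied in step order, that the interval in S7 is inclusive at both ends, and that the fractional part is taken as the ratio minus its floor; the text does not settle these points. `Thresholds`, `MALE`, `FEMALE` and `classify_segment` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    max_ratio_deletion: float      # S4: ratio at or above this is unreliable
    max_ratio_duplication: float   # S4: same, for duplications
    frac_low: float                # S5: positive if fractional part < frac_low
    frac_high: float               # S5: or > frac_high
    sum_low: float                 # S7: maternal if sum lies in [sum_low, sum_high]
    sum_high: float


MALE = Thresholds(2.0, 6.0, 0.13, 0.85, 0.95, 1.05)     # uses fy
FEMALE = Thresholds(2.0, 6.0, 0.15, 0.791, 0.93, 1.06)  # uses fs


def classify_segment(fm, fetal_conc, kind, t):
    """Classify one candidate as 'fetal', 'maternal' or 'filtered'.

    fm: concentration of the fragment containing the microdeletion/microduplication
    fetal_conc: fy for a male fetus or fs for a female fetus
    kind: 'deletion' or 'duplication'
    """
    if kind not in ("deletion", "duplication"):
        raise ValueError("kind must be 'deletion' or 'duplication'")
    ratio = fm / fetal_conc                                   # S3
    limit = (t.max_ratio_deletion if kind == "deletion"
             else t.max_ratio_duplication)
    if ratio >= limit:                                        # S4
        return "filtered"
    frac = ratio - math.floor(ratio)                          # S5
    if not (frac < t.frac_low or frac > t.frac_high):
        return "filtered"
    total = fm + fetal_conc                                   # S6
    if t.sum_low <= total <= t.sum_high:                      # S7
        return "maternal"
    return "fetal"


# Illustrative call: fm equal to a 10% male fetal concentration.
result = classify_segment(0.10, 0.10, "deletion", MALE)
```

Returning a label rather than a boolean keeps the reason for rejection visible, which is useful when checking how many candidates each filter removes.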
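Referring to Figure 8, another aspect of the present invention provides a device 100 for determining microdeletions and microduplications in fetal chromosomes, comprising:
a microdeletion/microduplication fragment concentration calculating device 110, for obtaining the concentration fm of the fragment containing the microdeletion/microduplication;
a fetal nucleic acid concentration obtaining device 120, for obtaining the male fetal nucleic acid concentration fy or the female fetal nucleic acid concentration fs;
a ratio calculating device 130, for calculating the ratio of fm to the male fetal nucleic acid concentration fy, rmY = fm/fy, or the ratio of fm to the female fetal nucleic acid concentration fs, rms = fm/fs;
a first filtering device 140, for evaluating rmY or rms according to the deleted or duplicated copy number and filtering out false positives;
a second filtering device 150, for taking the fractional part dmY of rmY, or the fractional part dms of rms, and determining whether dmY or dms is positive, the result being filtered out if it is not;
a sum calculating device 160, for calculating the sum of fm and the male fetal nucleic acid concentration fy, amY = fm + fy, or the sum of fm and the female fetal nucleic acid concentration fs, ams = fm + fs;
a third filtering device 170, for filtering the microdeletion/microduplication fragments according to the decision rule, the fragments that remain after filtering being the fetal chromosomal microdeletion/microduplication fragments.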
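Referring to Figure 9, according to one embodiment of the present invention, the microdeletion/microduplication fragment concentration calculating device 110 further comprises:
a primary window obtaining unit 111, for obtaining, from the primary windows that contain the microdeletion/microduplication, the primary windows that do not contain it, and for calculating the total number of sequences in, and the total number of, the primary windows containing the microdeletion/microduplication, and likewise the total number of sequences in, and the number of, the primary windows not containing it;
a first average depth obtaining unit 112, for obtaining the average depth d1 of the primary windows containing the microdeletion/microduplication, where d1 = total number of sequences in those windows / number of those windows;
a second average depth obtaining unit 113, for obtaining the average depth d2 of the primary windows not containing the microdeletion/microduplication, where d2 = total number of sequences in those windows / total number of those windows;
a microdeletion/microduplication fragment concentration obtaining unit 114, for calculating the concentration fm of the fragment containing the microdeletion/microduplication, where fm = 2 × |d2 - d1| / d2.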
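According to one embodiment of the present invention, the microdeletion/microduplication fragment concentration calculating device 110 further comprises a final window obtaining unit 115 for the windows in which a microdeletion/microduplication lies. Referring to Figure 10, the final window obtaining unit 115 comprises:
a first sequencing element 1151, for performing nucleic acid sequencing on a biological sample containing cell-free nucleic acid, to obtain a sequencing result consisting of multiple sequencing data;
an alignment element 1152, for aligning the sequencing result to a reference genome to construct a set of uniquely aligned sequencing sequences, each sequencing sequence in the set matching only one position in the reference genome;
a length determining element 1153, for determining the length of each uniquely aligned sequencing sequence in the set;
a primary window determining element 1154, for dividing the reference genome into multiple primary windows of a predetermined length, the predetermined length being 1 bp to 5 Mb;
a first counting element 1155, for counting the number of uniquely aligned sequencing sequences falling into each primary window;
a correction element 1156, for applying GC correction to the number of sequences falling into each primary window and applying a batch adjustment to the corrected result;
a first merging element 1157, for merging a predetermined number of adjacent primary windows into multiple secondary windows and determining the number of sequences in each secondary window;
a first filtering element 1158, for performing a statistical test on each secondary window to calculate a T1 value and filtering the secondary windows according to the T1 value;
a second merging element 1159, for performing a statistical test on the filtered secondary windows to calculate a T2 value and, according to the T2 value, merging two adjacent secondary windows with no significant difference into a final window;
a repeating element 1160, for repeatedly invoking the first filtering element 1158 and the second merging element 1159 until no further windows can be merged;
a final window determining element 1161, for performing a hypothesis test on the final windows obtained from the last merge, to obtain the final windows containing a microdeletion/microduplication.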
根据本发明的一个实施例,所述第一合并元件1157中的预定数目为5-100个。优选所述预定长度为20-40Kb。According to an embodiment of the invention, the predetermined number of the
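The merging performed by the first merging element 1157 can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the read counts are made up, and keeping a shorter trailing group when the number of primary windows is not a multiple of the group size is a choice made here, not something the text specifies.

```python
def merge_primary_windows(counts, group_size):
    """Sum read counts over consecutive groups of `group_size` primary windows.

    Assumption: a shorter trailing group is kept rather than discarded.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return [sum(counts[i:i + group_size])
            for i in range(0, len(counts), group_size)]


def secondary_window_length(primary_length_bp, group_size):
    """Length in bp of a secondary window built from equal-length primary windows."""
    return primary_length_bp * group_size


# Hypothetical per-window read counts, merged five at a time.
print(merge_primary_windows([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5))
# 100 primary windows of 20 kb each give a 2 Mb secondary window.
print(secondary_window_length(20_000, 100))
```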
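According to an embodiment of the present invention, the GC correction method in the correction element 1156 includes locally weighted regression, linear regression, or logistic regression.
According to an embodiment of the present invention, the batch-to-batch adjustment in the correction element 1156 consists of computing a baseline for each primary window from all samples in the sequencing batch, and applying a weighted correction to the number of uniquely aligned reads in each primary window according to the baseline.
According to an embodiment of the present invention, the T1 value in the first filtering element 1158 is calculated by a Z-test or a T-test, and the filtering removes secondary windows whose T1 value is between -3 and 3.
According to an embodiment of the present invention, the T2 value in the second merging element 1159 is calculated by a rank-sum test, a sign test, or a runs test, and "no significant difference" means that the T2 value of the two adjacent windows is between -3 and 3.
According to an embodiment of the present invention, the hypothesis test in the microdeletion/microduplication final window determining element 1161 is a Z-test or a T-test, with the test threshold defined as 3. That is, when the test statistic is > 3 or < -3, the window is determined to be a final window containing a microdeletion/microduplication.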
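The thresholding rule above (statistic > 3 or < -3) can be illustrated with a small sketch. The text does not say how the Z statistic is constructed, so the version below is an assumption: each window depth is standardised against the mean and sample standard deviation of a set of reference depths. The function names and all numbers are hypothetical.

```python
from statistics import mean, stdev


def z_scores(depths, reference_depths):
    """Standardise each depth against the reference mean and sample std dev.

    Assumption: this simple standardisation stands in for the Z-test,
    whose exact form is not given in the text.
    """
    mu = mean(reference_depths)
    sigma = stdev(reference_depths)
    if sigma == 0:
        raise ValueError("reference depths have zero variance")
    return [(d - mu) / sigma for d in depths]


def flag_windows(z_values, threshold=3.0):
    """True where the statistic is > threshold or < -threshold."""
    return [z > threshold or z < -threshold for z in z_values]


reference = [98, 100, 102, 100, 99, 101]   # made-up reference depths
candidates = [100, 106, 95]                # made-up window depths
print(flag_windows(z_scores(candidates, reference)))
```

Note that a statistic of exactly 3 or -3 is not flagged, matching the strict inequalities in the text.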
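According to an embodiment of the present invention, the fetal nucleic acid concentration obtaining device 120 further comprises a male fetal nucleic acid concentration fy obtaining unit 121. Referring to FIG. 11, the male fetal nucleic acid concentration fy obtaining unit 121 comprises:
a second sequencing element 1211, configured to sequence a biological sample containing cell-free nucleic acid to obtain a sequencing result consisting of a plurality of sequencing reads;
a first number determining element 1212, configured to determine, from the sequencing result, the number of uniquely aligned reads on the Y chromosome of the sample that fall within each primary window;
a second counting element 1213, configured to count the sum of the uniquely aligned reads over the primary windows on the Y chromosome, and the total number of those primary windows;
an average depth obtaining element 1214, configured to obtain the average depth dy of the primary windows on the Y chromosome, where dy = (sum of uniquely aligned reads on the Y chromosome) / (number of primary windows on the Y chromosome);
a male fetal nucleic acid concentration obtaining element 1215, configured to obtain the male fetal nucleic acid concentration fy, where fy = 2 × dy / d2, d2 being the average depth of the primary windows not containing a microdeletion/microduplication, d2 = (total number of reads in the primary windows not containing a microdeletion/microduplication) / (number of primary windows not containing a microdeletion/microduplication).
According to an embodiment of the present invention, the first number determining element 1212 further comprises a filtering module 12121, configured to divide the reference genome into a plurality of primary windows of a predetermined length and to remove the primary windows on the Y chromosome whose number of uniquely aligned reads is more than 5 times the average number of reads.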
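A minimal sketch of the fy calculation, including the outlier-window filter of module 12121, might look as follows. The function name and sample numbers are hypothetical, and one point is an assumption: "the average number of reads" is taken here to be the average over the Y-chromosome primary windows before filtering, which the text does not state explicitly.

```python
def male_fetal_concentration(y_window_counts, d2, max_fold=5.0):
    """Return fy = 2 * dy / d2 after dropping outlier Y-chromosome windows.

    Assumption: a window is an outlier when its count exceeds `max_fold`
    times the unfiltered average count over the Y-chromosome windows.
    """
    if not y_window_counts:
        raise ValueError("no Y-chromosome windows given")
    if d2 <= 0:
        raise ValueError("d2 must be positive")
    average = sum(y_window_counts) / len(y_window_counts)
    kept = [c for c in y_window_counts if c <= max_fold * average]
    dy = sum(kept) / len(kept)      # average depth of the retained windows
    return 2 * dy / d2


# Hypothetical counts: nine ordinary windows and one spike of 155 reads.
counts = [5] * 9 + [155]
print(male_fetal_concentration(counts, d2=100.0))
```

In this made-up example the unfiltered average is 20, so the 155-read window exceeds 5 × 20 and is removed; the remaining windows give dy = 5 and fy = 0.1.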
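According to an embodiment of the present invention, the fetal nucleic acid concentration obtaining device 120 further comprises a female fetal nucleic acid concentration fs obtaining unit 122. Referring to FIG. 12, the female fetal nucleic acid concentration fs obtaining unit 122 comprises:
a third sequencing element 1221, configured to sequence a biological sample containing cell-free nucleic acid to obtain a sequencing result consisting of a plurality of sequencing reads;
a second number determining element 1222, configured to determine, from the sequencing result, the number of uniquely aligned reads in the sample whose length falls within a predetermined range;
a frequency determining element 1223, configured to determine, based on the number of uniquely aligned reads whose length falls within the predetermined range, the frequency with which uniquely aligned reads occur within the predetermined range;
a female fetal nucleic acid concentration determining element 1224, configured to determine the female fetal nucleic acid concentration fs in the sample from the frequency of uniquely aligned reads within the predetermined range, according to a predetermined function.
According to an embodiment of the present invention, the female fetal nucleic acid concentration fs obtaining unit 122 further comprises a predetermined range determining element 1225. Referring to FIG. 13, according to an embodiment of the present invention, the predetermined range determining element 1225 further comprises:
a length determining module 12251, configured to determine the lengths of the uniquely aligned reads contained in a plurality of control samples;
a first frequency determining module 12252, configured to set a plurality of candidate length ranges and to determine, for each candidate length range, the frequency with which uniquely aligned reads of the control samples occur within it;
a correlation coefficient determining module 12253, configured to determine, based on the frequencies of uniquely aligned reads of the control samples within each candidate length range and on the nucleic acid concentrations of the control samples, the correlation coefficient between each candidate length range and the nucleic acid concentration of the control samples;
a predetermined range determining module 12254, configured to select, based on the values of the correlation coefficients, at least one candidate length range or combination of candidate length ranges as the predetermined range.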
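The selection carried out by modules 12252 to 12254 can be sketched as below. This is an illustration under assumptions: the Pearson coefficient is used as the correlation measure, the range with the largest absolute coefficient is selected (so that a strong negative correlation also qualifies), and the function names and toy data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def frequency_in_range(lengths, lo, hi):
    """Fraction of reads whose length lies in [lo, hi]."""
    return sum(lo <= n <= hi for n in lengths) / len(lengths)


def best_range(control_lengths, control_concentrations, candidate_ranges):
    """Return ((lo, hi), r) for the candidate range with the largest |r|."""
    best = None
    for lo, hi in candidate_ranges:
        freqs = [frequency_in_range(ls, lo, hi) for ls in control_lengths]
        r = pearson_r(freqs, control_concentrations)
        if best is None or abs(r) > abs(best[1]):
            best = ((lo, hi), r)
    return best


# Toy control samples: read lengths per sample and known concentrations.
controls = [
    [110, 110, 110, 190, 250, 250],
    [110, 110, 190, 190, 250, 250],
    [110, 190, 250, 250, 250, 250],
]
concentrations = [0.15, 0.10, 0.05]
print(best_range(controls, concentrations, [(100, 150), (151, 200)]))
```

In the toy data the share of reads in 100-150 bp tracks the concentration exactly, whereas the share in 151-200 bp does not, so the first range is selected.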
According to an embodiment of the present invention, the predetermined range is determined from a plurality of control samples in which the nucleic acid concentration is known; preferably, the predetermined range is determined from at least 20 control samples.
According to an embodiment of the present invention, the control samples are peripheral blood samples from pregnant women carrying a normal male fetus, with a known proportion of cell-free fetal nucleic acid, and the cell-free fetal nucleic acid concentration in the control samples is determined using the Y chromosome, that is, by the method of the present invention described above for the male fetal nucleic acid concentration fy.
According to an embodiment of the present invention, the span of each candidate length range is 1 to 300 bp, preferably 1 to 20 bp.
According to an embodiment of the present invention, the step between successive candidate length ranges is 1 to 2 bp.
For example, the candidate length ranges are 1-20, 2-21, 3-22, ..., with a span of 20 bp and a step of 1 bp.
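The sliding candidate ranges in the example above can be generated with a few lines of Python. The function name and the upper bound of 25 bp used in the call are assumptions for illustration; the span of 20 bp and step of 1 bp follow the example in the text.

```python
def candidate_length_ranges(min_len, max_len, span=20, step=1):
    """Return inclusive (start, end) ranges of `span` bp, shifted by `step` bp."""
    if span < 1 or step < 1:
        raise ValueError("span and step must be at least 1")
    ranges = []
    start = min_len
    while start + span - 1 <= max_len:
        ranges.append((start, start + span - 1))
        start += step
    return ranges


# First ranges are 1-20, 2-21, 3-22, as in the example in the text.
print(candidate_length_ranges(1, 25, span=20, step=1))
```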
According to an embodiment of the present invention, the predetermined range is 179 bp to 206 bp.
According to an embodiment of the present invention, the female fetal nucleic acid concentration fs obtaining unit 122 further comprises a predetermined function determining element 1226. Referring to FIG. 14, the predetermined function determining element 1226 comprises:
a second frequency determining module 12261, configured to determine, in each of the plurality of control samples, the frequency with which uniquely aligned reads occur within the predetermined range;
a fitting module 12262, configured to fit the frequencies of uniquely aligned reads within the predetermined range in the plurality of control samples against the known nucleic acid concentrations, so as to determine the predetermined function.
According to an embodiment of the present invention, the fit is a linear fit.
According to an embodiment of the present invention, the predetermined function is d = -0.3215 × p + 1.62562, where d denotes the cell-free fetal nucleic acid concentration and p denotes the frequency with which uniquely aligned reads occur within the predetermined range.
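As a hedged sketch of what the fitting module 12262 and the predetermined function amount to, the following uses an ordinary least-squares line, which is one common way to perform a linear fit and is assumed here rather than taken from the text. The function names are hypothetical, the text does not state the units of p, and the p values below are arbitrary inputs chosen only to exercise the code.

```python
def fit_line(ps, ds):
    """Ordinary least-squares fit of d = a * p + b; returns (a, b)."""
    n = len(ps)
    if n < 2:
        raise ValueError("need at least two points")
    mp, md = sum(ps) / n, sum(ds) / n
    spp = sum((p - mp) ** 2 for p in ps)
    if spp == 0:
        raise ValueError("all p values are identical")
    spd = sum((p - mp) * (d - md) for p, d in zip(ps, ds))
    a = spd / spp
    b = md - a * mp
    return a, b


def female_fetal_concentration(p, a=-0.3215, b=1.62562):
    """Evaluate d = a * p + b with the coefficients quoted in the text."""
    return a * p + b


# Arbitrary p values; the d values are generated from the quoted line,
# so the fit should recover the quoted coefficients.
ps = [4.0, 4.5, 5.0]
ds = [female_fetal_concentration(p) for p in ps]
print(fit_line(ps, ds))
```

In practice the fit would be made on control samples whose concentrations were measured from the Y chromosome, and the resulting function would then be applied to samples without Y-chromosome reads.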
According to an embodiment of the present invention, the first filtering device 140 further comprises a false positive determining unit 141, configured such that if rmY ≥ 2 is calculated for a deletion, or rmY ≥ 6 is calculated for a duplication, the result is judged unreliable and filtered out as a false positive;
or, if rms ≥ 2 is calculated for a deletion, or rms ≥ 6 is calculated for a duplication, the result is judged unreliable and filtered out as a false positive.
According to an embodiment of the present invention, the second filtering device 150 further comprises a positive determining unit 151, configured to determine that dmY is positive if dmY < 0.13 or dmY > 0.85; or that dms is positive if dms < 0.15 or dms > 0.791.
According to an embodiment of the present invention, the determination rule in the third filtering device 170 is: if amY is between 0.95 and 1.05, the microdeletion/microduplication fragment is considered to originate from the mother and is filtered out;
or, if ams is between 0.93 and 1.06, the microdeletion/microduplication fragment is considered to originate from the mother and is filtered out.
Example 1
I. Obtaining the concentration fm of the microdeletion/microduplication fragment
1. Perform nucleic acid sequencing on a biological sample containing cell-free nucleic acid, so as to obtain a sequencing result consisting of a plurality of sequencing reads;
2. Align the sequencing result to a reference genome so as to construct a set of uniquely aligned reads, each read in the set matching only one position of the reference genome;
3. Determine the length of each uniquely aligned read in the set;
4. Divide the reference genome into a plurality of primary windows of a predetermined length, the predetermined length being 1 bp to 5 Mb, preferably 20 kb to 40 kb, for example (1-20 bp, 20-40 bp, 40-80 bp, 80-100 bp, 100-120 bp, ...);
5. Count the number of uniquely aligned reads falling within each primary window;
6. Apply GC correction to the read counts of the primary windows, and apply a batch-to-batch adjustment to the corrected results; the GC correction method includes locally weighted regression, linear regression, or logistic regression;
7. Merge a predetermined number of adjacent primary windows into secondary windows and determine the number of reads in each secondary window, the predetermined number being 5 to 100; for example, 5 primary windows are merged into 1 secondary window, the 5 primary windows being 1-20 bp, 20-40 bp, 40-80 bp, 80-100 bp and 100-120 bp, and the merged secondary window being 1-120 bp;
8. Perform a statistical test on each secondary window to calculate a T1 value, the T1 value being calculated by a Z-test or a T-test;
9. Filter the secondary windows according to the T1 value, that is, remove the secondary windows whose T1 value is between -3 and 3;
10. Perform a statistical test on the filtered secondary windows to calculate a T2 value, by methods including but not limited to a rank-sum test, a sign test, or a runs test;
11. According to the T2 value, merge two adjacent secondary windows that show no significant difference into a final window, "no significant difference" meaning that the T2 value of the two windows is between -3 and 3;
12. Repeat steps 8-11 until no further merging is possible;
13. Perform a hypothesis test on the final windows obtained by merging, so as to obtain the final windows containing a microdeletion/microduplication; the hypothesis test is a Z-test or a T-test, and a window whose test statistic is > 3 or < -3 is determined to be a final window containing a microdeletion/microduplication;
14. Based on the primary windows containing a microdeletion/microduplication, obtain the primary windows that do not contain one, and calculate the total number of reads in, and the total number of, the primary windows containing a microdeletion/microduplication, as well as the total number of reads in, and the total number of, the primary windows not containing one;
15. Calculate the average depth d1 of the primary windows containing a microdeletion/microduplication, d1 = (total number of reads in the primary windows containing a microdeletion/microduplication) / (total number of primary windows containing a microdeletion/microduplication);
16. Calculate the average depth d2 of the primary windows not containing a microdeletion/microduplication, d2 = (total number of reads in the primary windows not containing a microdeletion/microduplication) / (total number of primary windows not containing a microdeletion/microduplication);
17. Calculate the concentration fm of the microdeletion/microduplication fragment, fm = 2 × |d2 - d1| / d2.
II. Obtaining the male fetal nucleic acid concentration fy or the female fetal nucleic acid concentration fs
1. Determine whether the sample under test contains the Y chromosome; if it does, calculate the male fetal nucleic acid concentration fy; if it does not, calculate the female fetal nucleic acid concentration fs;
2. If the Y chromosome is present, calculate the male fetal nucleic acid concentration fy.
(1) From the sequencing result above, determine the number of uniquely aligned reads on the Y chromosome of the sample that fall within each primary window;
(2) Remove the primary windows whose GC-corrected number of uniquely aligned reads is more than 5 times the average number of reads;
(3) Count the sum of the uniquely aligned reads over the primary windows on the Y chromosome, and the total number of those primary windows;
(4) Obtain the average depth dy of the primary windows on the Y chromosome, dy = (sum of uniquely aligned reads on the Y chromosome) / (number of primary windows on the Y chromosome);
(5) Obtain the male fetal nucleic acid concentration fy, fy = 2 × dy / d2, d2 being the average depth of the primary windows not containing a microdeletion/microduplication, d2 = (total number of reads in the primary windows not containing a microdeletion/microduplication) / (number of primary windows not containing a microdeletion/microduplication).
3. If the Y chromosome is absent, calculate the female fetal nucleic acid concentration fs.
(1) Determine the number of uniquely aligned reads in the biological sample containing cell-free nucleic acid whose length falls within a predetermined range; the predetermined range is 179 bp to 206 bp.
The predetermined range is obtained by the following steps:
a. Select at least 20 control samples, that is, samples with a known cell-free fetal nucleic acid concentration. This example uses male fetal control samples, in which the cell-free fetal nucleic acid concentration is determined from the Y chromosome, that is, by the method described above for the male fetal nucleic acid concentration fy.
b. Tabulate the lengths of the uniquely aligned reads contained in all control samples, from 0 bp to M bp (M denoting the longest nucleic acid length), and determine the number of uniquely aligned reads at each length;
c. Taking a given length as a candidate length range, shift it in steps of 1-2 bp to define a plurality of candidate length ranges, for example 1 bp, 2 bp, 3 bp, ..., 100 bp, ..., 300 bp, and count the frequency with which uniquely aligned reads of the control samples occur within each candidate length range;
d. Identify the candidate length ranges, or combinations of ranges, for which the frequency of uniquely aligned reads in the control samples correlates strongly with the nucleic acid concentration of the control samples, and determine the correlation coefficient between each candidate length range and the nucleic acid concentration; the correlation coefficient is obtained by correlation calculation, including methods such as linear regression, logistic regression, and local weighting.
The span of each candidate length range is 1 to 300 bp, preferably 1 to 20 bp. The step between candidate length ranges is 1 to 2 bp.
e. Based on the values of the correlation coefficients, select at least one candidate length range or combination of candidate length ranges as the predetermined range.
(2) Based on the number of uniquely aligned reads whose length falls within the predetermined range, compute the frequency with which uniquely aligned reads occur within the predetermined range;
(3) Based on the frequency of uniquely aligned reads within the predetermined range, determine the female fetal nucleic acid concentration fs in the sample according to a predetermined function.
The predetermined function is obtained by the following steps:
a. In each of the plurality of control samples, determine the frequency with which uniquely aligned reads occur within the predetermined range; the predetermined range and the frequency of uniquely aligned reads for the control samples are obtained by the range determination method described above;
b. Perform a linear fit of the frequencies of uniquely aligned read inserts within the predetermined range in the plurality of control samples against the known nucleic acid concentrations, so as to determine the predetermined function.
Preferably, the predetermined function is d = -0.3215 × p + 1.62562, where d denotes the cell-free fetal nucleic acid concentration and p denotes the frequency with which uniquely aligned reads occur within the predetermined range.
III. Calculate the ratio rmY = fm/fy of the concentration fm of the microdeletion/microduplication fragment to the male fetal nucleic acid concentration fy, or the ratio rms = fm/fs of the concentration fm to the female fetal nucleic acid concentration fs;
IV. If rmY ≥ 2 is calculated for a deletion, or rmY ≥ 6 is calculated for a duplication, the result is judged unreliable and filtered out as a false positive;
or, if rms ≥ 2 is calculated for a deletion, or rms ≥ 6 is calculated for a duplication, the result is judged unreliable and filtered out as a false positive;
False positives are filtered in order to remove the effect of multiple copies and make the result more accurate.
V. Take the fractional part dmY of rmY, or the fractional part dms of rms, and determine whether dmY or dms is positive; results that are not positive are filtered out:
if dmY < 0.13 or dmY > 0.85, dmY is positive;
or, if dms < 0.15 or dms > 0.791, dms is positive;
VI. Calculate the sum amY = fm + fy of the concentration fm and the male fetal nucleic acid concentration fy, or the sum ams = fm + fs of the concentration fm and the female fetal nucleic acid concentration fs;
VII. If amY is between 0.95 and 1.05, the microdeletion/microduplication fragment is considered to originate from the mother and is filtered out;
or, if ams is between 0.93 and 1.06, the microdeletion/microduplication fragment is considered to originate from the mother and is filtered out; the fragments remaining after filtering are the fetal chromosomal microdeletion/microduplication fragments.
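Steps III to VII above form a three-stage filter, which can be sketched as a single function. This is an illustration only: the function name, the string labels, and the sample values are hypothetical, the thresholds are the ones quoted in the text, and treating "between" as a strict inequality is an assumption.

```python
import math


def classify_fragment(fm, f_fetal, kind, sex):
    """Apply the ratio, fractional-part and sum filters to one fragment.

    kind: "deletion" or "duplication"; sex: "male" (f_fetal is fy)
    or "female" (f_fetal is fs). Returns one of "false_positive",
    "not_positive", "maternal", "fetal".
    """
    if f_fetal <= 0:
        raise ValueError("fetal concentration must be positive")
    ratio = fm / f_fetal                       # rmY or rms
    limit = 2 if kind == "deletion" else 6
    if ratio >= limit:
        return "false_positive"

    frac = ratio - math.floor(ratio)           # dmY or dms
    low, high = (0.13, 0.85) if sex == "male" else (0.15, 0.791)
    if not (frac < low or frac > high):
        return "not_positive"

    total = fm + f_fetal                       # amY or ams
    s_low, s_high = (0.95, 1.05) if sex == "male" else (0.93, 1.06)
    if s_low < total < s_high:                 # assumed strict bounds
        return "maternal"
    return "fetal"


# Hypothetical inputs exercising each outcome.
print(classify_fragment(0.10, 0.10, "deletion", "male"))
print(classify_fragment(0.30, 0.10, "deletion", "male"))
print(classify_fragment(0.15, 0.10, "deletion", "male"))
print(classify_fragment(0.65, 0.35, "deletion", "male"))
```

Ordering the checks this way mirrors the text: a fragment reaches the sum test only if it has already passed the copy-number and fractional-part tests.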
Example 2
1. Sample collection and processing
One batch of 100 samples was selected, and 2 ml of peripheral blood was drawn for plasma separation.
2. Library construction
Libraries can be constructed according to plasma library construction requirements well known to those skilled in the art.
3. Sequencing
Sequencing can be run according to sequencing workflows well known to those skilled in the art.
4. Data analysis
Sequencing results were obtained by paired-end sequencing, and the initial microdeletion/microduplication results were obtained through the following analysis steps:
4.1 Alignment: align the sequencing results to the reference genome and determine the positions of the uniquely aligned reads.
4.2 Divide the reference genome into primary windows of 20 kb, count the number of uniquely aligned reads and the GC content in each primary window, and apply GC correction to the read counts of the primary windows by locally weighted regression.
4.3 Using all samples in the batch, apply a baseline correction to each primary window (batch-to-batch adjustment).
4.4 Merge adjacent primary windows in groups of 100 to obtain a plurality of secondary windows, each 2 Mb in length;
4.5 Calculate the T1 value of each secondary window by Z-test, and remove the secondary windows whose T1 value is between -3 and 3;
4.6 Apply a runs test to the filtered secondary windows to calculate the T2 value, and merge two adjacent secondary windows whose T2 value is between -3 and 3 into a final window;
4.7 Repeat steps 4.5-4.6 until no further merging is possible;
4.8 Evaluate the final windows obtained by merging with a Z-test to obtain the microdeletion/microduplication results; in total, 19 samples were found to have microdeletion/microduplication results.
Table 1. Results detected in the 19 samples
Table 1 lists the results detected in the 19 samples: the first column is the sample ID, the second column is the chromosome on which the microdeletion/microduplication occurs, the third column is the length of the microdeletion/microduplication on the chromosome, and the fourth column is the detected T value.
4.9 Calculate the concentration of the microdeletion/microduplication fragments from the microdeletion/microduplication results. The specific steps are as follows:
Calculate the average depth d1 of the primary windows containing a microdeletion/microduplication in each sample;
Calculate the average depth d2 of the primary windows not containing a microdeletion/microduplication;
Calculate the concentration fm of the microdeletion/microduplication fragment;
Calculate the fetal nucleic acid concentration. The table for the 19 results above is as follows:
Table 2. Fetal nucleic acid concentration information for the 19 samples
4.10 Calculate the fetal concentration from the proportion of chrY, giving the male fetal concentration for 8 of the samples. The specific steps are as follows:
Remove the primary windows on chrY whose GC-corrected number of uniquely aligned reads is more than 5 times the average number of reads;
Calculate the average depth dy of the primary windows on chrY;
Calculate the male fetal nucleic acid concentration fy; the results are shown in the following table:
Table 3. Fetal nucleic acid concentrations of the 19 samples estimated from chrY
4.11 Calculate the fetal concentration from fragment length, giving the female fetal concentration for 11 samples. The specific steps are as follows:
A total of 41 male samples were identified in the whole batch, and the region in which read frequency correlates strongly with fetal concentration was sought; the region selected here is 179 bp to 206 bp, with correlation coefficient R = -0.9056996.
Determine, for the remaining 11 samples, the functional relationship between the frequency of uniquely aligned reads with lengths in the 179 bp to 206 bp range and the cell-free fetal nucleic acid concentration: a linear fit over the selected 179 bp to 206 bp region gives the relation d = a × p + b, where d is the concentration and p is the frequency of occurrence, with a and b calculated to be -0.3215 and 1.62562, respectively.
The results for the female fetal samples were calculated from the fit and are as follows:
Table 4. Fetal nucleic acid concentrations of the 19 samples calculated from fragment length
4.12 Screen the microdeletion/microduplication results.
For male fetuses:
Calculate the ratio rmY = fm/fy of the concentration fm of the microdeletion/microduplication fragment to the male fetal nucleic acid concentration fy;
Filter by copy number: filter out deletion fragments whose rmY value is 2 or more, and filter out duplication fragments whose rmY value is 6 or more.
Take the fractional part of rmY to obtain dmY.
Of the remaining fragments, filter out those with dmY greater than 0.13 and less than 0.85.
Calculate the sum amY = fm + fy of the concentration fm and the male fetal nucleic acid concentration fy;
Filter out the fragments with amY > 0.95 and amY < 1.05 to obtain the samples in which a male fetus carries a microdeletion/microduplication.
For female fetuses:
Calculate the ratio rms = fm/fs of the concentration fm of the microdeletion/microduplication fragment to the female fetal nucleic acid concentration fs;
Filter by copy number: filter out deletion fragments whose rms value is 2 or more, and filter out duplication fragments whose rms value is 6 or more.
Take the fractional part of rms to obtain dms.
Of the remaining fragments, filter out those with dms greater than 0.15 and less than 0.791.
Calculate the sum ams = fm + fs of the concentration fm and the female fetal nucleic acid concentration fs;
Filter out the fragments with ams > 0.93 and ams < 1.06 to obtain the samples in which a female fetus carries a microdeletion/microduplication.
The positive results obtained are shown in Table 5 below:
Table 5. Microdeletion/microduplication results after filtering
After the above processing, a large number of false positives can be filtered from the microdeletion/microduplication results, giving accurate results; see FIG. 15. In the figure, the horizontal axis is the sample number and the vertical axis is the concentration; fm is the concentration estimated from the microdeletion/microduplication, fy is the concentration of male-fetus samples estimated from chrY, and fs is the concentration of female-fetus samples estimated from fragment length. It can be seen that, judged by the above criteria, sample No. 28 is the fetal sample finally found to contain a microduplication.
Example 3
The method of this example for determining microdeletions and microduplications in fetal chromosomes is the same as in Example 2, except that in step 4.2 the reference genome is divided into windows of 40 kb.
Example 4
The method of this example for determining microdeletions and microduplications in fetal chromosomes is the same as in Example 2, except that in step 4.4 the primary windows are merged in groups of 200, and the resulting secondary windows are 4 Mb in length.
Example 5
The method of this example for determining microdeletions and microduplications in fetal chromosomes is the same as in Example 2, except that in step 4.11 40 male samples are used, the selected region is 185-204 bp, and the correlation coefficient is R = -0.87.
A linear fit over the selected 185-204 bp region gives the relation d = a × p + b, where d is the concentration and p is the frequency of occurrence, with a and b calculated to be 0.0334 and 1.6657, respectively.
The above embodiments are intended only to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the above preferred embodiments, those of ordinary skill in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from the spirit and scope of those technical solutions.
Claims (13)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710086851.6 | 2017-02-17 | ||
| CN201710086851.6A CN106778069B (en) | 2017-02-17 | 2017-02-17 | Method and apparatus for determining microdeletion microreplication in fetal chromosomes |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2018149114A1 true WO2018149114A1 (en) | 2018-08-23 |
Family
ID=58958599
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2017/100423 Ceased WO2018149114A1 (en) | 2017-02-17 | 2017-09-04 | Method and device for determining microdeletion and microduplication in foetal chromosomes |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN106778069B (en) |
| WO (1) | WO2018149114A1 (en) |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106778069B (en) * | 2017-02-17 | 2020-02-14 | 广州精科医学检验所有限公司 | Method and apparatus for determining microdeletion microreplication in fetal chromosomes |
| CN110970089B (en) * | 2019-11-29 | 2023-05-23 | 北京优迅医疗器械有限公司 | Pretreatment method and pretreatment device for fetal concentration calculation and application of pretreatment device |
| CN112037846A (en) * | 2020-07-14 | 2020-12-04 | 广州市达瑞生物技术股份有限公司 | cffDNA aneuploidy detection method, system, storage medium and detection equipment |
| CN116246704B (en) * | 2023-05-10 | 2023-08-15 | 广州精科生物技术有限公司 | System for Noninvasive Prenatal Testing of Fetus |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20130150253A1 (en) * | 2012-01-20 | 2013-06-13 | Sequenom, Inc. | Diagnostic processes that factor experimental conditions |
| CN105051209A (en) * | 2013-01-10 | 2015-11-11 | 香港中文大学 | Noninvasive Prenatal Molecular Karyotyping of Maternal Plasma |
| US20160034640A1 (en) * | 2014-07-30 | 2016-02-04 | Sequenom, Inc. | Methods and processes for non-invasive assessment of genetic variations |
| CN105555968A (en) * | 2013-05-24 | 2016-05-04 | 塞昆纳姆股份有限公司 | Methods and procedures for the non-invasive assessment of genetic variation |
| CN106778069A (en) * | 2017-02-17 | 2017-05-31 | 广州精科医学检验所有限公司 | Determine the method and apparatus of micro-deleted micro- repetition in fetal chromosomal |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104136628A (en) * | 2011-10-28 | 2014-11-05 | 深圳华大基因医学有限公司 | Method for detecting micro-deletion and micro-repetition of chromosome |
| CN104745718B (en) * | 2015-04-23 | 2018-02-16 | 北京中仪康卫医疗器械有限公司 | A kind of method for detecting human embryos microdeletion and micro- repetition |
2017
- 2017-02-17: CN application CN201710086851.6A, patent CN106778069B (en), status: Active
- 2017-09-04: WO application PCT/CN2017/100423, patent WO2018149114A1 (en), status: Ceased
Non-Patent Citations (3)
| Title |
|---|
| WAPNER, R.J. ET AL.: "Expanding the Scope of Noninvasive Prenatal Testing: detection of Foetal Microdeletion Syndromes", AMERICAN JOURNAL OF OBSTETRICS & GYNAECOLOGY, vol. 212, no. 3, 31 March 2015 (2015-03-31), XP055537008 * |
| YIN, XUYANG ET AL.: "Foetal Genetic Abnormality Detection through Maternal Plasma Free Nucleic Acid High-Throughput Sequencing", CHINESE JOURNAL OF PRENATAL DIAGNOSIS, vol. 8, no. 2, 31 December 2016 (2016-12-31), pages 44 - 51 * |
| ZHAO, C. ET AL.: "Detection of Foetal Subchromosomal Abnormalities by Sequencing Circulating Cell -Free DNA from Maternal Plasma", CLINICAL CHEMISTRY, vol. 61, no. 4, 31 December 2015 (2015-12-31), pages 608 - 616, XP055215005 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN106778069B (en) | 2020-02-14 |
| CN106778069A (en) | 2017-05-31 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| CN102952854B (en) | Single cell sorting and screening method and device thereof | |
| JP6534191B2 (en) | Method for improving the sensitivity of detection in determining copy number variation | |
| CN103525939B (en) | The method and system of Non-invasive detection foetal chromosome aneuploidy | |
| CN104846089B (en) | A kind of quantitative approach of fetal cell-free DNA in maternal plasma ratio | |
| CN107229841B (en) | A kind of genetic mutation appraisal procedure and system | |
| CN104169929B (en) | For determining system and the device of fetus whether existence numerical abnormalities of chromosomes | |
| CN113450871B (en) | Method for identifying sample identity based on low-depth sequencing | |
| KR101614471B1 (en) | Method and apparatus for diagnosing fetal chromosomal aneuploidy using genomic sequencing | |
| WO2016011982A1 (en) | Method and device for determining a ratio of free nucleic acids in a biological sample and use thereof | |
| CN105844116B (en) | The processing method and processing unit of sequencing data | |
| CN110648722B (en) | Device for risk assessment of neonatal genetic diseases | |
| WO2018149114A1 (en) | Method and device for determining microdeletion and microduplication in foetal chromosomes | |
| US12260935B2 (en) | Limit of detection based quality control metric | |
| CN104951671A (en) | Device for detecting aneuploidy of fetus chromosomes based on single-sample peripheral blood | |
| CN109402247B (en) | Fetus chromosome detection system based on DNA variation counting | |
| CN116864011A (en) | Colorectal cancer molecular marker identification method and system based on multi-omics data | |
| CN114171116B (en) | Method for assessing fetal DNA concentration by maternal dissociation and intrinsic DNA and application thereof | |
| CN108229099B (en) | Data processing method, data processing device, storage medium and processor | |
| CN114496078A (en) | Method for judging maternal-child relationship between pregnant woman and fetus by calculating fetal concentration | |
| CN113889189A (en) | Method and application for assessing fetal DNA concentration with biological paternal and maternal DNA | |
| US12073921B2 (en) | System for increasing the accuracy of non invasive prenatal diagnostics and liquid biopsy by observed loci bias correction at single base resolution | |
| CN109321641A (en) | A prenatal non-invasive fetal chromosome detection system based on DNA fragment enrichment and sequencing technology | |
| CN105177130B (en) | It is used for assessing the mark of aids patient generation immune reconstitution inflammatory syndrome | |
| CN113981062A (en) | Method for evaluating fetal DNA concentration by using DNA of non-biotics father and mother and application | |
| CN119673271B (en) | Method and device for identifying parent source pollution and detecting copy number abnormality by using peak value |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17896566; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02/12/2019) |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 17896566; Country of ref document: EP; Kind code of ref document: A1 |