WO2018149114A1 - Method and apparatus for determining microdeletions and microduplications in fetal chromosomes - Google Patents
- Publication number
- WO2018149114A1 (application PCT/CN2017/100423)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- microdeletion
- nucleic acid
- concentration
- fragment
- window
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
Definitions
- The present invention relates to the field of biomedicine, and in particular to methods and apparatus for determining microdeletions and microduplications in chromosomes.
- Existing detection methods have certain limitations.
- Method 1) has low precision and produces a large number of false positives: because the result is based only on the change in the proportion of fragments in a given region and no effective filtering method is applied, false positives are difficult to avoid.
- Method 2) requires probe capture and high-depth sequencing, or requires paternal information. High-depth capture requires a chip design, which makes the experiment more difficult; high-depth sequencing increases cost; and regions that are not captured cannot be assessed.
- Another aspect of the invention provides an apparatus for determining microdeletions and microduplications in fetal chromosomes, comprising:
- a microdeletion/microduplication fragment concentration calculating device, for obtaining the concentration fm of the fragment containing the microdeletion/microduplication;
- a fetal nucleic acid concentration obtaining device, for obtaining the male fetal nucleic acid concentration fy or the female fetal nucleic acid concentration fs;
- a ratio calculating device, for calculating rmY = fm/fy or rms = fm/fs;
- a first filtering device, for evaluating rmY or rms according to the deleted or duplicated copy number and filtering out false positives;
- a second filtering device, for taking the fractional part dmY of rmY, or the fractional part dms of rms, and determining whether dmY or dms is positive; otherwise the result is filtered out;
- a sum calculating device, for calculating amY = fm + fy or ams = fm + fs;
- a third filtering device, for filtering the microdeletion/microduplication fragments according to the decision rule, yielding the fetal chromosomal microdeletion/microduplication fragments.
- The method and apparatus provided by the invention can accurately determine microdeletions and microduplications in chromosomes, and are particularly suitable for determining fetal chromosomal microdeletions and microduplications from the peripheral blood of pregnant women.
- The invention does not require an additional chip design, which saves the cost of chip design and keeps the experimental procedure simple.
- Figure 1 is a flow chart of a method for determining fetal microdeletions and microduplications in an embodiment of the invention.
- Figure 2 is a flow chart of a method for obtaining the concentration fm of the fragment containing the microdeletion/microduplication in the embodiment of Figure 1.
- Figure 3 is a flow chart of a method for obtaining the final window containing the microdeletion/microduplication in the embodiment of Figure 1.
- Figure 4 is a flow chart of a method for obtaining the male fetal nucleic acid concentration fy in the embodiment of Figure 1.
- Figure 5 is a flow chart of a method for obtaining the female fetal nucleic acid concentration fs in the embodiment of Figure 1.
- Figure 6 is a flow chart of a method for obtaining the predetermined range in the method of Figure 5.
- Figure 7 is a flow chart of a method for obtaining the predetermined function in the method of Figure 5.
- Figure 8 is a structural block diagram of an apparatus for determining microdeletions and microduplications in fetal chromosomes in another embodiment of the invention.
- Figure 9 is a structural block diagram of the device for calculating the concentration of the fragment containing the microdeletion/microduplication in the embodiment of Figure 8.
- Figure 10 is a structural block diagram of the unit for obtaining the final window in which the microdeletion/microduplication is located, in the embodiment of Figure 8.
- Figure 11 is a structural block diagram of the male fetal nucleic acid concentration fy obtaining unit in the embodiment of Figure 8.
- Figure 12 is a structural block diagram of the female fetal nucleic acid concentration fs obtaining unit in the embodiment of Figure 8.
- Figure 13 is a structural block diagram of the predetermined range determining component in the embodiment of Figure 8.
- Figure 14 is a structural block diagram of the predetermined function determining component in the embodiment of Figure 8.
- Figure 15 shows the microdeletion/microduplication results for 19 samples in Example 2.
- Second average depth obtaining unit 113; microdeletion/microduplication fragment concentration obtaining unit 114
- Final window obtaining unit for the microdeletion/microduplication 115
- First sequencing component 1151; alignment component 1152; length determining component 1153
- Primary window determining component 1154; first statistical component 1155
- Correction component 1156; first merging component 1157
- First filter component 1158; second merging component 1159
- Fetal nucleic acid concentration obtaining device 120; male fetal nucleic acid concentration fy obtaining unit 121
- Second sequencing component 1211; first number determining component 1212
- Filtering module 12121; second statistical component 1213
- Average depth obtaining component 1214; male fetal nucleic acid concentration obtaining component 1215
- Third sequencing component 1221; second number determining component 1222; frequency determining component 1223
- Female fetal nucleic acid concentration determining component 1224; predetermined range determining component 1225; length determining module 12251; first frequency determining module 12252; correlation coefficient determining module 12253; predetermined range determining module 12254
- Predetermined function determining component 1226; second frequency determining module 12261; fitting module 12262
- Ratio calculating device 130; first filtering device 140; false positive determining unit 141; second filtering device 150; positive determining unit 151; sum calculating device 160; third filtering device 170
- "Maternal sample" herein refers to a biological sample obtained from a pregnant subject, e.g. a woman.
- "Microdeletion/microduplication" refers to a deletion or duplication on a chromosome ranging from 1.5 kb to 10 Mb in length.
- "GC correction" refers to correcting read counts for the GC content of the sequences.
- The present invention provides a method for determining microdeletions and microduplications in fetal chromosomes, comprising steps S1 to S7.
- The inventors surprisingly found that the method of the invention enables accurate determination of microdeletions and microduplications in chromosomes, and is particularly useful for determining fetal chromosomal microdeletions and microduplications in the peripheral blood of pregnant women.
- The concentration fm of the fragment containing the microdeletion/microduplication in step S1 is obtained by the following steps:
- d1 = (total number of reads in the primary windows containing the microdeletion/microduplication) / (number of primary windows containing the microdeletion/microduplication);
- The numbers of primary windows, and their read counts, with and without the microdeletion/microduplication can be derived from the final window containing the microdeletion/microduplication.
- The final window has absolute start and end coordinates. From these coordinates the secondary windows it spans are located, and then the primary windows within those secondary windows are identified. The first and last primary windows are removed to exclude data fluctuations at the boundaries, and the total number of reads in the remaining primary windows is calculated.
- The final window containing the microdeletion/microduplication is obtained by the following steps:
- S111: perform nucleic acid sequencing on a biological sample containing cell-free nucleic acid to obtain a sequencing result composed of a plurality of sequencing reads;
- Each read in the uniquely aligned read set matches only one position in the reference genome.
- S117: combine a predetermined number of adjacent primary windows into secondary windows, and determine the number of reads in each secondary window;
- S121: perform a hypothesis test on the final windows obtained from the last merging step, to obtain the final window containing the microdeletion/microduplication.
- The biological sample containing cell-free nucleic acid contains cell-free fetal nucleic acid from the peripheral blood of the pregnant woman.
- The nucleic acid is DNA.
- The sequencing result comprises the length of each cell-free nucleic acid fragment and its base sequence.
- "Length" refers to the length of the nucleic acid and can be expressed in base pairs (bp).
- The sequencing is paired-end, single-end or single-molecule sequencing.
- The length of the cell-free nucleic acid is thus easily obtained, which benefits the subsequent steps.
- The predetermined length in step S114 is 1 bp to 5 Mb, and the predetermined number in step S117 is 5 to 100.
- Preferably, the predetermined length is 20-40 kb.
- The GC correction uses locally weighted regression, linear regression or logistic regression.
- For the inter-batch adjustment, a baseline is calculated for each primary window across all samples in the same sequencing batch, and the number of uniquely aligned reads in each primary window is weighted and corrected against this baseline.
- The T1 value in step S118 is calculated by a Z-test or a T-test; the filtering removes secondary windows whose T1 value lies between -3 and 3.
- The T2 value in step S119 is calculated by a rank-sum test, a sign test or a runs test; two adjacent windows are considered not significantly different when their T2 value lies between -3 and 3.
- The hypothesis test in step S121 uses a Z-test or a T-test with a threshold of 3: when the test statistic is > 3 or ≤ -3, the window is determined to be a final window containing the microdeletion/microduplication.
- The male fetal nucleic acid concentration fy in step S2 is obtained by the following steps:
- Step S212 further comprises: dividing the reference genome into a plurality of primary windows of a predetermined length, and removing primary windows on the Y chromosome whose number of uniquely aligned reads is more than 5 times the average number of reads.
- The primary windows are GC-corrected primary windows.
- The female fetal nucleic acid concentration fs in step S2 is obtained by the following steps:
- The predetermined range in step S222 is determined by the following steps:
- S2222: set a plurality of candidate length ranges, and determine, for each of the plurality of control samples, the frequency of uniquely aligned reads falling within each candidate length range;
- The predetermined range is determined based on a plurality of control samples in which the nucleic acid concentration is known; preferably, the predetermined range is determined based on at least 20 control samples.
- The control samples are maternal peripheral blood samples from pregnancies with a normal male fetus and a known cell-free fetal nucleic acid fraction; the nucleic acid concentration in the control samples is determined using the Y chromosome.
- Specifically, the cell-free fetal nucleic acid concentration in the control samples is determined by the method for obtaining the male fetal nucleic acid concentration fy described above.
- The span of each candidate length range in S2222 is 1-300 bp, preferably 1-20 bp.
- The step size between successive candidate length ranges is 1-2 bp.
- For example, the candidate length ranges are 1-20 bp, 2-21 bp, 3-22 bp, ..., i.e. a span of 20 bp and a step size of 1 bp.
- The predetermined range in step S222 is 179 bp to 206 bp.
- The predetermined function in step S223 is obtained by the following steps:
- S2232: fit the frequency of uniquely aligned reads within the predetermined range in the plurality of control samples against their known nucleic acid concentrations to determine the predetermined function.
- The fit is a linear fit.
- Step S4 further comprises: if rmY ≥ 2 for a deletion, or rmY ≥ 6 for a duplication, the result is judged untrustworthy and filtered out as a false positive;
- likewise, if rms ≥ 2 for a deletion, or rms ≥ 6 for a duplication, the result is judged unreliable and filtered out as a false positive.
- Step S5 further comprises: if dmY ≤ 0.13 or dmY > 0.85, dmY is positive;
- if dms ≤ 0.15 or dms > 0.791, dms is positive.
- The decision rule in step S7 is: if amY is between 0.95 and 1.05, the microdeletion/microduplication fragment is considered to be of maternal origin and is filtered out;
- if ams is between 0.93 and 1.06, the microdeletion/microduplication fragment is likewise considered to be of maternal origin and is filtered out.
- One aspect of the present invention also provides an apparatus 100 for determining microdeletions and microduplications in fetal chromosomes, comprising:
- a microdeletion/microduplication fragment concentration calculating device 110, for obtaining the concentration fm of the fragment containing the microdeletion/microduplication;
- a fetal nucleic acid concentration obtaining device 120, for obtaining the male fetal nucleic acid concentration fy or the female fetal nucleic acid concentration fs;
- a ratio calculating device 130, for calculating rmY = fm/fy or rms = fm/fs;
- a first filtering device 140, configured to evaluate rmY or rms according to the deleted or duplicated copy number and filter out false positives;
- a second filtering device 150, configured to take the fractional part dmY of rmY, or the fractional part dms of rms, and determine whether dmY or dms is positive; otherwise the result is filtered out;
- a sum calculating device 160, for calculating amY = fm + fy or ams = fm + fs;
- a third filtering device 170, configured to filter the microdeletion/microduplication fragments according to the decision rule, yielding the fetal chromosomal microdeletion/microduplication fragments.
- The microdeletion/microduplication fragment concentration calculating device 110 further comprises:
- a primary window obtaining unit 111, configured to obtain, from the primary windows containing the microdeletion/microduplication, the primary windows not containing it, and to calculate the total number of reads in, and the number of, primary windows containing the microdeletion/microduplication, as well as the total number of reads in, and the number of, primary windows not containing it;
- The device 110 further comprises a unit 115 for obtaining the final window in which the microdeletion/microduplication is located.
- The final window obtaining unit 115 comprises:
- a first sequencing component 1151, for performing nucleic acid sequencing on a biological sample containing cell-free nucleic acid to obtain a sequencing result composed of a plurality of sequencing reads;
- a length determining component 1153, for determining the length of each uniquely aligned read in the uniquely aligned read set;
- a primary window determining component 1154, for dividing the reference genome into a plurality of primary windows of a predetermined length, the predetermined length being 1 bp to 5 Mb;
- a first statistical component 1155, configured to count the number of uniquely aligned reads falling into each primary window;
- a first merging component 1157, configured to combine a predetermined number of adjacent primary windows into secondary windows and determine the number of reads in each secondary window;
- a first filter component 1158, configured to perform a statistical test on each secondary window, calculate a T1 value, and filter the secondary windows according to the T1 value;
- a second merging component 1159, configured to perform a statistical test on the filtered secondary windows, calculate a T2 value, and merge adjacent secondary windows with no significant difference into a final window according to the T2 value;
- a final window determining component 1161, configured to perform a hypothesis test on the final windows obtained from the last merge, to obtain the final window containing the microdeletion/microduplication.
- The predetermined number used by the first merging component 1157 is 5 to 100.
- Preferably, the predetermined length is 20-40 kb.
- The GC correction in the correction component 1156 uses locally weighted regression, linear regression or logistic regression.
- The inter-batch adjustment in the correction component 1156 calculates a baseline for each primary window across all samples in the same sequencing batch, and the number of uniquely aligned reads in each primary window is weighted and corrected against this baseline.
- The T1 value in the first filter component 1158 is calculated by a Z-test or a T-test; the filtering removes secondary windows whose T1 value lies between -3 and 3.
- The T2 value in the second merging component 1159 is calculated by a rank-sum test, a sign test or a runs test; two adjacent windows are not significantly different when their T2 value lies between -3 and 3.
- The hypothesis test in the final window determining component 1161 uses a Z-test or a T-test with a threshold of 3: when the test statistic is > 3 or ≤ -3, the window is determined to be a final window containing the microdeletion/microduplication.
- The fetal nucleic acid concentration obtaining device 120 further comprises a male fetal nucleic acid concentration fy obtaining unit 121.
- The male fetal nucleic acid concentration fy obtaining unit 121 comprises:
- a second sequencing component 1211, for sequencing a biological sample containing cell-free nucleic acid to obtain a sequencing result composed of a plurality of sequencing reads;
- a first number determining component 1212, configured to determine, from the sequencing result, the number of uniquely aligned reads on the Y chromosome of the sample that fall into each primary window;
- a second statistical component 1213, for counting the total number of uniquely aligned reads in the primary windows on the Y chromosome and the total number of primary windows;
- The first number determining component 1212 further comprises a filtering module 12121, for dividing the reference genome into a plurality of primary windows of a predetermined length and removing primary windows on the Y chromosome whose number of uniquely aligned reads is more than 5 times the average number of reads.
- The fetal nucleic acid concentration obtaining device 120 further comprises a female fetal nucleic acid concentration fs obtaining unit 122.
- The female fetal nucleic acid concentration fs obtaining unit 122 comprises:
- a third sequencing component 1221, configured to sequence a biological sample containing cell-free nucleic acid to obtain a sequencing result composed of a plurality of sequencing reads;
- a second number determining component 1222, configured to determine, from the sequencing result, the number of uniquely aligned reads in the sample whose length falls within a predetermined range;
- a frequency determining component 1223, for determining, from that number, the frequency of uniquely aligned reads within the predetermined range;
- a female fetal nucleic acid concentration determining component 1224, configured to determine the female fetal nucleic acid concentration fs of the sample from that frequency using a predetermined function.
- The female fetal nucleic acid concentration fs obtaining unit 122 further comprises a predetermined range determining component 1225.
- The predetermined range determining component 1225 further comprises:
- a length determining module 12251, configured to determine the lengths of the uniquely aligned reads contained in the plurality of control samples;
- a first frequency determining module 12252, configured to set a plurality of candidate length ranges and determine, for each control sample, the frequency of uniquely aligned reads within each candidate length range;
- a correlation coefficient determining module 12253, configured to determine, from the frequencies of uniquely aligned reads within each candidate length range across the control samples and the nucleic acid concentrations of the control samples, a correlation coefficient between each candidate length range and the nucleic acid concentration;
- a predetermined range determining module 12254, configured to select, according to the values of the correlation coefficients, at least one candidate length range or a combination of candidate length ranges as the predetermined range.
- The predetermined range is determined based on a plurality of control samples with known nucleic acid concentration; preferably, it is determined based on at least 20 control samples.
- The control samples are maternal peripheral blood samples from pregnancies with a normal male fetus and a known cell-free fetal nucleic acid fraction; the cell-free fetal nucleic acid concentration in the control samples is determined using the Y chromosome, i.e. by the method for the male fetal nucleic acid concentration fy described above.
- The span of each candidate length range is 1-300 bp, preferably 1-20 bp.
- The step size between successive candidate length ranges is 1-2 bp.
- For example, the candidate length ranges are 1-20 bp, 2-21 bp, 3-22 bp, ..., i.e. a span of 20 bp and a step size of 1 bp.
- The predetermined range is 179 bp to 206 bp.
- The female fetal nucleic acid concentration fs obtaining unit 122 further comprises a predetermined function determining component 1226.
- The predetermined function determining component 1226 comprises:
- a second frequency determining module 12261, configured to determine, for each of the plurality of control samples, the frequency of uniquely aligned reads within the predetermined range;
- a fitting module 12262, for fitting the frequency of uniquely aligned reads within the predetermined range in the plurality of control samples against the known nucleic acid concentrations to determine the predetermined function.
- The fit is a linear fit.
- The first filtering device 140 further comprises a false positive determining unit 141, configured such that if rmY ≥ 2 for a deletion, or rmY ≥ 6 for a duplication, the result is judged untrustworthy and filtered out as a false positive;
- likewise, if rms ≥ 2 for a deletion, or rms ≥ 6 for a duplication, the result is judged unreliable and filtered out as a false positive.
- The second filtering device 150 further comprises a positive determining unit 151, which determines that dmY is positive if dmY ≤ 0.13 or dmY > 0.85, or that dms is positive if dms ≤ 0.15 or dms > 0.791.
- The decision rule in the third filtering device 170 is: if amY is between 0.95 and 1.05, the microdeletion/microduplication fragment is considered to be of maternal origin and is filtered out;
- if ams is between 0.93 and 1.06, the microdeletion/microduplication fragment is likewise considered to be of maternal origin and is filtered out.
- The reference genome is divided into a plurality of primary windows of a predetermined length of 1 bp to 5 Mb, preferably 20 kb to 40 kb (for example 1-20 bp, 20-40 bp, 40-80 bp, 80-100 bp, 100-120 bp, ...);
- The male fetal nucleic acid concentration fy is then calculated.
- d2 is the average depth of the primary windows not containing the microdeletion/microduplication:
- d2 = (total number of reads in the primary windows not containing the microdeletion/microduplication) / (number of primary windows not containing the microdeletion/microduplication).
- The predetermined range is 179 bp to 206 bp.
- The predetermined range is obtained by the following steps:
- This embodiment uses male-fetus control samples in which the cell-free fetal nucleic acid concentration is determined from the Y chromosome, i.e. by the method for the male fetal nucleic acid concentration fy described above.
- From the frequency in each candidate length range or combination of ranges, a correlation coefficient between each candidate length range and the nucleic acid concentration is determined; the correlation is calculated by methods including linear regression, logistic regression and locally weighted regression.
- The span of each candidate length range is 1-300 bp, preferably 1-20 bp.
- The step size between successive candidate length ranges is 1-2 bp.
- The predetermined function is obtained by the following steps:
- If rms ≥ 2 for a deletion, or rms ≥ 6 for a duplication, the result is judged untrustworthy and filtered out as a false positive;
- Filtering false positives removes the effect of multiple copies and makes the results more accurate.
- the microdeletion/microduplication fragment is considered to be of maternal origin and is filtered out;
- the microdeletion/microduplication fragment is considered to be of maternal origin and is filtered out, and the fetal chromosomal microdeletion/microduplication fragments are obtained after filtering.
- A batch of 100 samples was selected, and 2 ml of peripheral blood was drawn from each for plasma separation.
- Library construction can follow plasma library construction requirements well known to those skilled in the art.
- Sequencing can be run on the sequencer following sequencing procedures well known to those skilled in the art.
- The sequencing results are aligned to the reference genome to determine the positions of the uniquely aligned reads.
- The reference genome was divided into primary windows of 20 kb.
- The number of uniquely aligned reads and the GC content of each primary window were counted, and the read counts in the primary windows were GC-corrected by locally weighted regression.
- Table 1 shows the results for 19 samples: the first column is the sample ID, the second column is the microdeletion/microduplication, the third column is the length of the microdeletion/microduplication on the chromosome, and the fourth column is the detected T value.
- The selected region was 179-206 bp, with a correlation coefficient R = -0.9056996.
- For the remaining 11 samples, the relationship between the frequency of nucleic acid fragments with lengths of 179-206 bp and the cell-free fetal nucleic acid concentration was calculated as a function.
- Filtering by copy number: deletion fragments with an rmY of 2 or more were filtered out, and duplication fragments with an rmY of 6 or more were filtered out.
- Fragments with dmY greater than 0.13 and less than 0.85 were filtered out.
- Fragments with amY > 0.95 and amY ≤ 1.05 were filtered out, yielding the male-fetus samples containing microdeletions/microduplications.
- Fragments with dms greater than 0.15 and less than 0.791 were filtered out.
- Fragments with ams > 0.93 and ams ≤ 1.06 were filtered out, yielding the female-fetus samples containing microdeletions/microduplications.
- Filtering the microdeletion/microduplication results removes a large number of false positives and yields accurate results; see FIG.
- The abscissa indicates the sample number;
- the ordinate indicates the concentration;
- fm indicates the concentration estimated from the microdeletion/microduplication;
- fy indicates the concentration estimated for male-fetus samples from chrY;
- fs indicates the concentration estimated from fetal fragment lengths.
- The method for determining microdeletions and microduplications in fetal chromosomes in this embodiment is the same as in the second embodiment, except that in step 4.2 the genome is divided into 40 kb windows.
- The method for determining microdeletions and microduplications in fetal chromosomes in this embodiment is the same as in the second embodiment, except that step 4.4 merges primary windows in units of 200, so that each merged secondary window is 4 Mb long.
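The bookkeeping behind d1 and d2 (average reads per primary window inside and outside the final window, with the first and last primary windows of the region discarded) can be sketched in Python. This is a minimal sketch, assuming GC-corrected read counts per primary window are already available as a list ordered along one chromosome; `window_range` and `average_depths` are hypothetical names. The excerpt does not state how d1 and d2 are combined into fm, so the sketch stops at the two depths.

```python
# A sketch, not the patent's implementation: window_range and average_depths
# are hypothetical helpers; counts are assumed GC-corrected reads per primary window.


def window_range(start_bp: int, end_bp: int, window_size: int) -> tuple[int, int]:
    """Map a final window's absolute coordinates [start_bp, end_bp) to the
    half-open range [first, last) of primary-window indices it covers."""
    first = start_bp // window_size
    last = -(-end_bp // window_size)  # ceiling division
    return first, last


def average_depths(counts: list[float], first: int, last: int) -> tuple[float, float]:
    """Return (d1, d2): mean reads per primary window inside the final window,
    after dropping its first and last primary windows, and outside it."""
    inside = counts[first:last][1:-1]
    outside = counts[:first] + counts[last:]
    if not inside or not outside:
        raise ValueError("region too short, or it covers every window")
    d1 = sum(inside) / len(inside)
    d2 = sum(outside) / len(outside)
    return d1, d2


if __name__ == "__main__":
    # 10 normal windows, a 5-window region with reduced depth, 10 normal windows
    counts = [100.0] * 10 + [60.0, 50.0, 50.0, 50.0, 70.0] + [100.0] * 10
    first, last = window_range(200_000, 300_000, 20_000)
    print(first, last, average_depths(counts, first, last))  # 10 15 (50.0, 100.0)
```

Trimming the two edge windows mirrors the stated purpose of excluding boundary fluctuation: a window that only partly overlaps the variant would otherwise pull d1 toward d2.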
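The GC correction step lists locally weighted, linear or logistic regression as options. Below is a minimal sketch of the linear-regression option only, assuming per-window read counts and GC fractions as plain lists; the name `gc_correct_linear` and the rule of rescaling each window by (genome-wide mean count / expected count at that GC) are assumptions for illustration. The example embodiment uses locally weighted regression, which would replace the straight line with a smooth curve.

```python
from statistics import fmean


def gc_correct_linear(counts: list[float], gc: list[float]) -> list[float]:
    """Hypothetical helper: fit count ~ intercept + slope * gc by ordinary least
    squares, then rescale each window so that the count expected at its GC
    content maps to the genome-wide mean count."""
    if len(counts) != len(gc) or len(counts) < 2:
        raise ValueError("need matching lists with at least two windows")
    mean_gc, mean_count = fmean(gc), fmean(counts)
    sxx = sum((g - mean_gc) ** 2 for g in gc)
    sxy = sum((g - mean_gc) * (c - mean_count) for g, c in zip(gc, counts))
    slope = sxy / sxx
    intercept = mean_count - slope * mean_gc
    # A window whose count is exactly what its GC predicts becomes mean_count
    return [c * mean_count / (intercept + slope * g) for c, g in zip(counts, gc)]
```

For counts that rise perfectly linearly with GC, for example `[80, 90, 100, 110]` at GC `[0.3, 0.4, 0.5, 0.6]`, every corrected window comes out at the mean, 95, which is the behavior a GC correction should have when all variation is GC bias.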
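Steps S117 and S118 (sum adjacent primary windows into secondary windows, compute a Z statistic T1 for each, and drop windows whose T1 lies between -3 and 3) can be sketched as below. The source does not say what each secondary window is compared against; purely for illustration this sketch uses the mean and population standard deviation of all secondary windows in the same sample, whereas a practical pipeline might instead compare each window with reference values from control samples. `merge_primary` and `t1_filter` are hypothetical names.

```python
from statistics import fmean, pstdev


def merge_primary(counts: list[float], k: int) -> list[float]:
    """Hypothetical helper: sum each run of k adjacent primary windows into one
    secondary window (a trailing partial run is discarded)."""
    return [sum(counts[i:i + k]) for i in range(0, len(counts) - k + 1, k)]


def t1_filter(secondary: list[float], threshold: float = 3.0) -> list[tuple[int, float]]:
    """Hypothetical helper: return (index, T1) for secondary windows whose Z
    score lies outside [-threshold, threshold]; the rest are filtered out.
    Assumes Z is taken against this sample's own secondary-window mean and SD."""
    mu, sigma = fmean(secondary), pstdev(secondary)
    if sigma == 0:
        return []
    kept = []
    for i, x in enumerate(secondary):
        z = (x - mu) / sigma
        if abs(z) > threshold:
            kept.append((i, z))
    return kept
```

With 200 primary windows at 100 reads followed by 10 at 50 reads and `k = 5`, the 42 secondary windows are 40 × 500 and 2 × 250, and only the last two survive with T1 of about -4.5.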
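For step S119, T2 may come from a rank-sum test. The sketch below uses the Wilcoxon rank-sum (Mann-Whitney U) statistic with its large-sample normal approximation, giving tied values their average rank but applying no tie correction to the variance. It compares the primary-window counts of each pair of adjacent secondary windows and merges the pair when |T2| ≤ 3. The function names and the input layout (one list of primary-window counts per secondary window) are assumptions.

```python
import math


def rank_sum_z(a: list[float], b: list[float]) -> float:
    """Hypothetical helper: Wilcoxon rank-sum z score of sample a versus
    sample b (normal approximation; tied values get their average rank)."""
    values = a + b
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n1, n2 = len(a), len(b)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # the first n1 entries belong to a
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u1 - mean_u) / sd_u


def merge_adjacent(windows: list[list[float]], threshold: float = 3.0) -> list[list[float]]:
    """Hypothetical helper: join runs of adjacent secondary windows whose
    pairwise |T2| <= threshold into one merged window."""
    if not windows:
        return []
    merged = [list(windows[0])]
    for prev, cur in zip(windows, windows[1:]):
        if abs(rank_sum_z(prev, cur)) <= threshold:
            merged[-1].extend(cur)
        else:
            merged.append(list(cur))
    return merged
```

Under this normal approximation, two windows of 5 primary windows each can reach at most |T2| ≈ 2.61, so with that setting every adjacent pair would merge; a threshold of 3 only separates windows when each holds at least 10 primary windows (maximum |T2| ≈ 3.78).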
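The female fetal concentration procedure (count per-sample read frequencies in sliding candidate length ranges, pick the range whose frequency best correlates with the known fetal concentrations of male-fetus control samples, then fit a straight line) can be sketched as follows. This illustration rests on stated assumptions: selection by the largest absolute Pearson correlation (the example reports a negative R for 179-206 bp, so the sign is ignored), a single range rather than a combination of ranges, and hypothetical function names.

```python
import math


def range_frequency(lengths: list[int], lo: int, hi: int) -> float:
    """Hypothetical helper: fraction of uniquely aligned reads with length in [lo, hi] bp."""
    return sum(lo <= n <= hi for n in lengths) / len(lengths)


def candidate_ranges(max_len: int, span: int = 20, step: int = 1) -> list[tuple[int, int]]:
    """Sliding candidate ranges such as 1-20, 2-21, 3-22, ... bp."""
    return [(lo, lo + span - 1) for lo in range(1, max_len - span + 2, step)]


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_range(controls: list[list[int]], conc: list[float], ranges):
    """Return ((lo, hi), r) for the range whose per-sample frequency has the
    largest |r| with the known concentrations (assumed selection rule)."""
    best = None
    for lo, hi in ranges:
        freqs = [range_frequency(s, lo, hi) for s in controls]
        if len(set(freqs)) < 2:
            continue  # constant frequency: correlation undefined
        r = pearson(freqs, conc)
        if best is None or abs(r) > abs(best[1]):
            best = ((lo, hi), r)
    return best


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx
```

Once `fit_line` has been run on the control frequencies and concentrations, a new female-fetus sample's fs would be estimated as `slope * range_frequency(lengths, lo, hi) + intercept`.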
Landscapes
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Genetics & Genomics (AREA)
- Biotechnology (AREA)
- Biophysics (AREA)
- Chemical & Material Sciences (AREA)
- Molecular Biology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Bioinformatics & Computational Biology (AREA)
- Analytical Chemistry (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
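A method and apparatus for determining microdeletions and microduplications in fetal chromosomes, comprising: obtaining the concentration fm of a fragment containing a microdeletion/microduplication; obtaining the male/female fetal nucleic acid concentration fy/fs; calculating the ratio rmY of fm to the male fetal nucleic acid concentration fy, or the ratio rms of fm to the female fetal nucleic acid concentration fs; evaluating rmY or rms according to the deleted or duplicated copy number and filtering out false positives; taking the fractional part dmY of rmY, or the fractional part dms of rms, and determining whether dmY/dms is positive, otherwise filtering out the result; calculating the sum amY of fm and the male fetal nucleic acid concentration fy, or the sum ams of fm and the female fetal nucleic acid concentration fs; and filtering the microdeletion/microduplication fragments according to a decision rule, yielding the fetal chromosomal microdeletion/microduplication fragments.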
Description
本发明涉及生物医学领域,具体的,确定染色体中微缺失和微重复的方法及设备。
在无创产前检测(NIPT)临床领域,胎儿微缺失微重复的筛查灵敏度较低。现有基于母亲外周血估算胎儿染色体微缺失微重复的方法主要分为以下两个方向:1)基于在母亲血浆中胚胎DNA序列比例的变化估算微缺失微重复。2)利用单核苷酸多态性(SNPs)位点表现的差异化,选择多个SNP位点进行估算。
现有的检测方法都存在一定的局限性,检测的方法1)精度较低,会出现大量的假阳性结果,由于结果仅根据某区域内片段的比例的变化来得出检测结果,缺乏有效的过滤方法,假阳性的出现很难避免。方法2)需要探针捕获和高深度测序,或者需要获得父源性信息,高深度捕获需要设计芯片,增加了实验的难度,高深度测序会增加一定的成本,未被捕获的部分则不能进行测定。
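Summary of the Invention
An object of the present invention is to provide a method and device for determining microdeletions and microduplications in fetal chromosomes. The method evaluates microdeletions/microduplications by calculating the concentration of the fetal microdeletion/microduplication fragment and the concentration of the fetus's own nucleic acid, which lowers the false-positive rate and gives high accuracy.
Based on the above object, one aspect of the present invention provides a method for determining microdeletions and microduplications in fetal chromosomes, comprising the following steps:
S1. Obtain the concentration fm of the fragment containing the microdeletion/microduplication;
S2. Obtain the male fetal nucleic acid concentration fy or the female fetal nucleic acid concentration fs;
S3. Calculate the ratio rmY = fm/fy of fm to the male fetal nucleic acid concentration fy, or the ratio rms = fm/fs of fm to the female fetal nucleic acid concentration fs;
S4. Evaluate rmY or rms against the copy number of the deletion or duplication, and filter out false positives;
S5. Take the fractional part dmY of rmY, or the fractional part dms of rms, and determine whether dmY or dms is positive; otherwise, filter out the result;
S6. Calculate the sum amY = fm + fy of fm and the male fetal nucleic acid concentration fy, or the sum ams = fm + fs of fm and the female fetal nucleic acid concentration fs;
S7. Filter the microdeletion/microduplication fragments according to the decision rule; the fragments remaining after filtering are the fetal chromosomal microdeletion/microduplication fragments.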
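Another aspect of the present invention provides a device for determining microdeletions and microduplications in fetal chromosomes, comprising:
a microdeletion/microduplication fragment concentration calculation apparatus, configured to obtain the concentration fm of the fragment containing the microdeletion/microduplication;
a fetal nucleic acid concentration acquisition apparatus, configured to obtain the male fetal nucleic acid concentration fy or the female fetal nucleic acid concentration fs;
a ratio calculation apparatus, configured to calculate the ratio rmY = fm/fy of fm to the male fetal nucleic acid concentration fy, or the ratio rms = fm/fs of fm to the female fetal nucleic acid concentration fs;
a first filtering apparatus, configured to evaluate rmY or rms against the copy number of the deletion or duplication and filter out false positives;
a second filtering apparatus, configured to take the fractional part dmY of rmY, or the fractional part dms of rms, and determine whether dmY or dms is positive, otherwise filtering out the result;
a sum calculation apparatus, configured to calculate the sum amY = fm + fy of fm and the male fetal nucleic acid concentration fy, or the sum ams = fm + fs of fm and the female fetal nucleic acid concentration fs;
a third filtering apparatus, configured to filter the microdeletion/microduplication fragments according to the decision rule, the fragments remaining after filtering being the fetal chromosomal microdeletion/microduplication fragments.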
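The method and device provided by the present invention can accurately determine microdeletions and microduplications in chromosomes, and are particularly suitable for determining fetal chromosomal microdeletions and microduplications from maternal peripheral blood.
Compared with the prior art:
1. The present invention requires no additional chip design, which saves chip-design costs and keeps the experimental method simple.
2. No high-depth sequencing is required: the data are further processed on the basis of the whole-genome data already used for chromosomal aneuploidy testing, and accurate microdeletion/microduplication results can be obtained directly without increasing the data volume.
3. It overcomes the drawback of prior-art SNP-based methods, which may miss regions that capture cannot reach; the present invention can perform detection across the whole genome.
4. It is the first to use whole-genome sequencing to determine the concentration of fetal microdeletions/microduplications.
5. It reduces false positives and improves accuracy.
6. It removes the influence of maternal microdeletions/microduplications on the fetal result, improving accuracy.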
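FIG. 1 is a flowchart of a method for determining fetal chromosomal microdeletions/microduplications according to an embodiment of the present invention.
FIG. 2 is a flowchart of the method for obtaining the concentration fm of the fragment containing the microdeletion/microduplication in the embodiment of FIG. 1.
FIG. 3 is a flowchart of the method for obtaining the final windows containing microdeletions/microduplications in the embodiment of FIG. 1.
FIG. 4 is a flowchart of the method for obtaining the male fetal nucleic acid concentration fy in the embodiment of FIG. 1.
FIG. 5 is a flowchart of the method for obtaining the female fetal nucleic acid concentration fs in the embodiment of FIG. 1.
FIG. 6 is a flowchart of the method for obtaining the predetermined range in the method of FIG. 5.
FIG. 7 is a flowchart of the method for obtaining the predetermined function in the method of FIG. 5.
FIG. 8 is a structural block diagram of a device for determining microdeletions and microduplications in fetal chromosomes according to another embodiment of the present invention.
FIG. 9 is a structural block diagram of the microdeletion/microduplication fragment concentration calculation apparatus in the embodiment of FIG. 8.
FIG. 10 is a structural block diagram of the unit for obtaining the final windows containing microdeletions/microduplications in the embodiment of FIG. 8.
FIG. 11 is a structural block diagram of the male fetal nucleic acid concentration fy acquisition unit in the embodiment of FIG. 8.
FIG. 12 is a structural block diagram of the female fetal nucleic acid concentration fs acquisition unit in the embodiment of FIG. 8.
FIG. 13 is a structural block diagram of the predetermined range determination element in the embodiment of FIG. 8.
FIG. 14 is a structural block diagram of the predetermined function determination element in the embodiment of FIG. 8.
FIG. 15 shows the microdeletion/microduplication results for the 19 samples in Embodiment 2.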
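Description of main reference numerals
| Element | Reference numeral |
|---|---|
| Device for determining microdeletions and microduplications in fetal chromosomes | 100 |
| Microdeletion/microduplication fragment concentration calculation apparatus | 110 |
| Primary window acquisition unit | 111 |
| First average depth acquisition unit | 112 |
| Second average depth acquisition unit | 113 |
| Microdeletion/microduplication fragment concentration acquisition unit | 114 |
| Unit for obtaining the final windows containing microdeletions/microduplications | 115 |
| First sequencing element | 1151 |
| Alignment element | 1152 |
| Length determination element | 1153 |
| Primary window determination element | 1154 |
| First counting element | 1155 |
| Correction element | 1156 |
| First merging element | 1157 |
| First filtering element | 1158 |
| Second merging element | 1159 |
| Repetition element | 1160 |
| Microdeletion/microduplication final window determination element | 1161 |
| Fetal nucleic acid concentration acquisition apparatus | 120 |
| Male fetal nucleic acid concentration fy acquisition unit | 121 |
| Second sequencing element | 1211 |
| First number determination element | 1212 |
| Filtering module | 12121 |
| Second counting element | 1213 |
| Average depth acquisition element | 1214 |
| Male fetal nucleic acid concentration acquisition element | 1215 |
| Female fetal nucleic acid concentration fs acquisition unit | 122 |
| Third sequencing element | 1221 |
| Second number determination element | 1222 |
| Frequency determination element | 1223 |
| Female fetal nucleic acid concentration determination element | 1224 |
| Predetermined range determination element | 1225 |
| Length determination module | 12251 |
| First frequency determination module | 12252 |
| Correlation coefficient determination module | 12253 |
| Predetermined range determination module | 12254 |
| Predetermined function determination element | 1226 |
| Second frequency determination module | 12261 |
| Fitting module | 12262 |
| Ratio calculation apparatus | 130 |
| First filtering apparatus | 140 |
| False-positive judgment unit | 141 |
| Second filtering apparatus | 150 |
| Positive judgment unit | 151 |
| Sum calculation apparatus | 160 |
| Third filtering apparatus | 170 |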
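The following detailed description further illustrates the present invention with reference to the above drawings.
Embodiments of the present invention are described in detail below. The embodiments described with reference to the drawings are exemplary; they serve only to explain the present invention and shall not be construed as limiting it.
It should be noted that the terms "primary", "secondary" and "final" are used for descriptive purposes only and shall not be construed as indicating or implying relative importance or implicitly specifying the number of the technical features referred to. Thus, a feature qualified by "primary", "secondary" or "final" may explicitly or implicitly include one or more such features. Further, in the description of the present invention, unless otherwise stated, "a plurality of" means two or more. In the present invention, "uniquely aligned sequence" and "uniquely aligned sequencing read" are sometimes referred to simply as "sequence" and "sequencing read".
The term "maternal sample" herein refers to a biological sample obtained from a pregnant subject, for example, a woman.
The term "microdeletion/microduplication" refers to a deletion or duplication of 1.5 kb to 10 Mb in length on a chromosome.
The term "GC correction" refers to correction for the GC content of the sequences.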
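Referring to FIG. 1, the present invention provides a method for determining fetal chromosomal microdeletions/microduplications, comprising:
S1. Obtain the concentration fm of the fragment containing the microdeletion/microduplication;
S2. Obtain the male fetal nucleic acid concentration fy or the female fetal nucleic acid concentration fs;
S3. Calculate the ratio rmY = fm/fy of fm to the male fetal nucleic acid concentration fy, or the ratio rms = fm/fs of fm to the female fetal nucleic acid concentration fs;
S4. Evaluate rmY or rms against the copy number of the deletion or duplication, and filter out false positives;
S5. Take the fractional part dmY of rmY, or the fractional part dms of rms, and determine whether dmY or dms is positive; otherwise, filter out the result;
S6. Calculate the sum amY = fm + fy of fm and the male fetal nucleic acid concentration fy, or the sum ams = fm + fs of fm and the female fetal nucleic acid concentration fs;
S7. Filter the microdeletion/microduplication fragments according to the decision rule; the fragments remaining after filtering are the fetal chromosomal microdeletion/microduplication fragments.
The inventors surprisingly found that the method of the present invention can accurately determine microdeletions and microduplications in chromosomes, and is particularly suitable for determining fetal chromosomal microdeletions and microduplications from maternal peripheral blood.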
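Referring to FIG. 2, according to an embodiment of the present invention, the concentration fm of the fragment containing the microdeletion/microduplication in step S1 is obtained by the following steps:
S11. Based on the primary windows containing the microdeletion/microduplication, obtain the primary windows not containing it; calculate the total number of reads and the total number of windows for the primary windows containing the microdeletion/microduplication, as well as the total number of reads and the total number of windows for the primary windows not containing it;
S12. Obtain the average depth d1 of the primary windows containing the microdeletion/microduplication: d1 = total number of reads in these windows / total number of these windows;
S13. Obtain the average depth d2 of the primary windows not containing the microdeletion/microduplication: d2 = total number of reads in these windows / total number of these windows;
S14. Calculate the concentration fm of the fragment containing the microdeletion/microduplication: fm = 2 × |d2 - d1| / d2.
Those skilled in the art will understand that the total numbers of windows and reads for the primary windows not containing the microdeletion/microduplication can be derived via the method for obtaining the final windows containing microdeletions/microduplications. For example, a final window has absolute start and end coordinates; from their relation to the secondary-window coordinates, the corresponding secondary windows are found, and then the primary windows within those secondary windows are identified. The first and last primary windows are discarded to exclude data fluctuations, leaving the final set of primary windows, from which the total read count is calculated.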
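The calculation in S11-S14, together with the edge trimming described above, can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation. It assumes the per-window read counts have already been GC-corrected and batch-adjusted. The function names `trim_edges`, `mean_depth` and `fragment_concentration` are hypothetical. Keeping segments of two or fewer windows whole is also an assumption. One way to read the factor 2: if a fraction f of the cell-free DNA carries a one-copy change in a diploid region, the depth there shifts by f/2, so 2|d2 - d1|/d2 recovers f.

```python
def trim_edges(counts):
    """Drop the first and last primary window of a segment to damp edge
    fluctuations; segments of two or fewer windows are kept whole (assumption)."""
    return list(counts[1:-1]) if len(counts) > 2 else list(counts)


def mean_depth(counts):
    """Average number of uniquely aligned reads per primary window."""
    if not counts:
        raise ValueError("at least one primary window is required")
    return sum(counts) / len(counts)


def fragment_concentration(cnv_counts, normal_counts):
    """fm = 2 * |d2 - d1| / d2 (step S14).

    cnv_counts    -- read counts of primary windows inside the CNV (gives d1)
    normal_counts -- read counts of primary windows outside any CNV (gives d2)
    """
    d1 = mean_depth(cnv_counts)
    d2 = mean_depth(normal_counts)
    return 2 * abs(d2 - d1) / d2


# Illustrative numbers only: a one-copy loss carried by about 12% of the cfDNA
segment = trim_edges([90, 94, 94, 94, 99])
print(fragment_concentration(segment, [100, 100, 100, 100]))  # prints 0.12
```

Because `abs` is used, the same formula covers duplications (d1 > d2) and deletions (d1 < d2).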
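Referring to FIG. 3, according to an embodiment of the present invention, the final windows containing microdeletions/microduplications are obtained by the following steps:
S111. Perform nucleic acid sequencing on a biological sample containing cell-free nucleic acid to obtain a sequencing result consisting of multiple sequencing data;
S112. Align the sequencing result to a reference genome to construct a set of uniquely aligned reads, each read in the set matching only one position of the reference genome;
S113. Determine the length of each uniquely aligned read in the set;
S114. Divide the reference genome into multiple primary windows of a predetermined length, the predetermined length being 1 bp to 5 Mb;
S115. Count the number of uniquely aligned reads falling into each primary window;
S116. Perform GC correction on the read counts in the primary windows, and perform inter-batch adjustment on the corrected results;
S117. Merge a predetermined number of adjacent primary windows into each of multiple secondary windows, and determine the read count of each secondary window;
S118. Perform a statistical test on each secondary window to calculate a T1 value, and filter the secondary windows according to T1;
S119. Perform a statistical test on the filtered secondary windows to calculate a T2 value, and, according to T2, merge pairs of adjacent secondary windows that show no significant difference into final windows;
S120. Repeat steps S118-S119 until no further merging is possible;
S121. Perform a hypothesis test on the final windows obtained from the last merge to obtain the final windows containing microdeletions/microduplications.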
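According to an embodiment of the present invention, the cell-free nucleic acid in the biological sample is cell-free fetal nucleic acid in maternal peripheral blood.
According to an embodiment of the present invention, the nucleic acid is DNA.
According to an embodiment of the present invention, the sequencing result includes the length and base sequence of the cell-free nucleic acid. "Length" refers to the length of the nucleic acid and may be expressed in base pairs (bp).
According to an embodiment of the present invention, the sequencing is paired-end sequencing, single-end sequencing or single-molecule sequencing. This makes it easy to obtain the length of the cell-free nucleic acid, which facilitates the subsequent steps.
Those skilled in the art will understand that, because cell-free fetal DNA in blood is relatively short and the length of every cell-free DNA molecule needs to be obtained, single-end sequencing must read through the entire cell-free DNA molecule; alternatively, paired-end sequencing is used.
According to an embodiment of the present invention, the predetermined length in step S114 is 1 bp to 5 Mb, and the predetermined number in step S117 is 5 to 100. The predetermined length is preferably 20 to 40 kb.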
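Steps S114, S115 and S117 amount to fixed-width binning followed by grouping. The minimal sketch below rests on two assumptions the text does not specify. Each uniquely aligned read is assigned to a window by its 0-based start coordinate. A trailing partial group becomes a shorter secondary window. The function names are hypothetical. The defaults of 20 kb and 100 windows mirror Embodiment 2, where they give 2 Mb secondary windows.

```python
def count_primary_windows(read_starts, chrom_length, window_size=20_000):
    """Count uniquely aligned reads per primary window on one chromosome.

    read_starts -- 0-based start coordinates of uniquely aligned reads (assumption)
    """
    n_windows = (chrom_length + window_size - 1) // window_size  # ceiling division
    counts = [0] * n_windows
    for pos in read_starts:
        if 0 <= pos < chrom_length:  # ignore coordinates outside the chromosome
            counts[pos // window_size] += 1
    return counts


def merge_into_secondary(counts, group_size=100):
    """Group `group_size` adjacent primary windows into one secondary window.

    Each secondary window keeps its per-primary-window counts so that later
    tests can use them; its read number is simply sum(window).
    """
    return [counts[i:i + group_size] for i in range(0, len(counts), group_size)]


primary = count_primary_windows([0, 5, 25, 45, 59], chrom_length=60, window_size=20)
secondary = merge_into_secondary(primary, group_size=2)
print(primary, [sum(w) for w in secondary])  # [2, 1, 2] [3, 2]
```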
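According to an embodiment of the present invention, the GC correction uses locally weighted regression (LOESS), linear regression or logistic regression.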
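As an illustration of one of the listed options, the sketch below applies linear-regression GC correction. Embodiment 2 uses locally weighted regression instead. The sketch assumes a multiplicative correction: each window's count is scaled by the ratio of the overall mean count to the count that the fitted line predicts for that window's GC fraction. The text does not specify the correction form, and `gc_correct_linear` is a hypothetical name.

```python
def gc_correct_linear(counts, gc_fractions):
    """Fit count = intercept + slope * GC by ordinary least squares and rescale
    each window by mean_count / predicted_count to remove the linear GC trend."""
    n = len(counts)
    if n < 2 or n != len(gc_fractions):
        raise ValueError("need matching counts and GC fractions for >= 2 windows")
    mean_gc = sum(gc_fractions) / n
    mean_count = sum(counts) / n
    sxx = sum((g - mean_gc) ** 2 for g in gc_fractions)
    if sxx == 0:  # all windows share one GC value: nothing to correct
        return [float(c) for c in counts]
    sxy = sum((g - mean_gc) * (c - mean_count) for g, c in zip(gc_fractions, counts))
    slope = sxy / sxx
    intercept = mean_count - slope * mean_gc
    corrected = []
    for g, c in zip(gc_fractions, counts):
        expected = intercept + slope * g
        # leave windows with a non-positive prediction unchanged
        corrected.append(c * mean_count / expected if expected > 0 else float(c))
    return corrected
```

For counts that rise exactly linearly with GC content, such as 80, 100 and 120 reads at 30%, 40% and 50% GC, every corrected value comes out at the mean count of 100.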
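According to an embodiment of the present invention, the inter-batch adjustment consists of computing, from all samples in the sequencing batch, a baseline for each primary window, and applying a weighted correction to the number of uniquely aligned reads in each primary window according to that baseline.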
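A minimal sketch of the inter-batch adjustment follows. It rests on three assumptions. Each sample is first scaled by its own mean window depth. The per-window baseline is the median of these scaled values across the batch. The "weighted correction" divides each window by its baseline. The text does not fix any of these choices, and `batch_adjust` is a hypothetical name.

```python
from statistics import median


def batch_adjust(batch_counts):
    """batch_counts: one list of GC-corrected primary-window counts per sample,
    all samples covering the same windows in the same order."""
    n_windows = len(batch_counts[0])
    if any(len(s) != n_windows for s in batch_counts):
        raise ValueError("all samples must cover the same primary windows")
    sample_means = [sum(s) / n_windows for s in batch_counts]
    # scale each sample by its own mean so depth differences between samples cancel
    scaled = [[c / m for c in s] for s, m in zip(batch_counts, sample_means)]
    # per-window baseline: median scaled value across the batch
    baseline = [median(s[i] for s in scaled) for i in range(n_windows)]
    adjusted = []
    for s, m in zip(scaled, sample_means):
        # divide out the window-specific bias, then restore the sample's depth scale
        adjusted.append([m * v / b if b > 0 else 0.0 for v, b in zip(s, baseline)])
    return adjusted
```

A bias shared by every sample in the batch (for example, one window always reading 20% high) is removed. A deviation confined to a single sample survives the adjustment.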
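According to an embodiment of the present invention, the T1 value in step S118 is calculated by a Z-test or a t-test, and the filtering removes the secondary windows whose T1 value lies between -3 and 3.
According to an embodiment of the present invention, the T2 value in step S119 is calculated by a rank-sum test, a sign test or a runs test, and "no significant difference" means that the T2 value of the two adjacent windows lies between -3 and 3.
According to an embodiment of the present invention, the hypothesis test in step S121 is a Z-test or a t-test with a threshold of 3; that is, a final window whose test statistic is > 3 or < -3 is judged to contain a microdeletion/microduplication.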
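The filter-and-merge loop of S118-S121 can be sketched as below. This is an illustration under stated assumptions, not the patent's implementation. T1 and the final test are taken as a Z statistic of a window's mean primary-window depth against a reference mean and standard deviation supplied by the caller, for example from CNV-free windows; the text does not say what the reference is. T2 is computed with the Wilcoxon rank-sum (Mann-Whitney) normal approximation without tie correction. The text also allows a sign test or a runs test, and Embodiment 2 uses a runs test. "Adjacent" is interpreted as genomically contiguous. All function names are hypothetical.

```python
import math
from statistics import mean


def window_z(values, ref_mean, ref_sd):
    """Z statistic of a window's mean primary-window depth against the reference."""
    return (mean(values) - ref_mean) / (ref_sd / math.sqrt(len(values)))


def rank_sum_z(a, b):
    """Wilcoxon rank-sum (Mann-Whitney U) z-score, average ranks for ties,
    normal approximation without tie correction."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tie block
        i = j + 1
    n1, n2 = len(a), len(b)
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    return (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)


def segment(windows, ref_mean, ref_sd, threshold=3.0):
    """windows: (start, end, depths) secondary windows in genomic order, where
    depths are the normalised counts of their primary windows.
    Returns (start, end, z) for the final windows that pass the last test."""
    def significant(w):
        return abs(window_z(w[2], ref_mean, ref_sd)) > threshold

    kept = [w for w in windows if significant(w)]  # S118: drop |T1| <= 3
    merged = True
    while merged:  # S120: repeat until a pass makes no merge
        merged = False
        out = []
        for w in kept:  # S119: merge contiguous neighbours with |T2| <= 3
            prev = out[-1] if out else None
            if (prev is not None and prev[1] == w[0]
                    and abs(rank_sum_z(prev[2], w[2])) <= threshold):
                out[-1] = (prev[0], w[1], prev[2] + w[2])
                merged = True
            else:
                out.append(w)
        kept = [w for w in out if significant(w)]  # S118 applied again
    # S121: every survivor already satisfies |z| > threshold; report its statistic
    return [(s, e, window_z(v, ref_mean, ref_sd)) for s, e, v in kept]
```

Each pass that merges reduces the number of windows, so the loop always terminates.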
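Referring to FIG. 4, according to an embodiment of the present invention, the male fetal nucleic acid concentration fy in step S2 is obtained by the following steps:
S211. Sequence a biological sample containing cell-free nucleic acid to obtain a sequencing result consisting of multiple sequencing data;
S212. Based on the sequencing result, determine the number of uniquely aligned reads on the Y chromosome of the sample falling into each primary window;
S213. Calculate the total number of uniquely aligned reads over all primary windows on the Y chromosome and the total number of these primary windows;
S214. Obtain the average primary-window depth dy of the Y chromosome: dy = total uniquely aligned reads on the Y chromosome / number of primary windows on the Y chromosome;
S215. Obtain the male fetal nucleic acid concentration fy = 2 × dy / d2, where d2 is the average depth of the primary windows not containing microdeletions/microduplications: d2 = total number of reads in these windows / number of these windows.
Those skilled in the art will understand that the total numbers of windows and reads for the primary windows not containing microdeletions/microduplications can be derived via the method for obtaining the final windows containing microdeletions/microduplications.
According to an embodiment of the present invention, step S212 further includes: dividing the reference genome into multiple primary windows of a predetermined length, and removing the primary windows on the Y chromosome whose number of uniquely aligned reads exceeds 5 times the average. Preferably, the primary windows are GC-corrected and adjusted primary windows.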
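Steps S212-S215 can be sketched as follows. This includes removing chrY windows with more than five times the average read count. The sketch assumes the chrY counts are already GC-corrected. It also assumes d2 comes from the autosomal CNV-free primary windows as defined in S13. `male_fetal_fraction` is a hypothetical name. The factor 2 reflects that a male fetus contributes one Y copy for every two autosomal copies.

```python
def male_fetal_fraction(chry_counts, d2, outlier_factor=5.0):
    """fy = 2 * dy / d2, where dy is the mean chrY primary-window depth after
    discarding windows with more than `outlier_factor` times the mean count."""
    if not chry_counts or d2 <= 0:
        raise ValueError("need chrY windows and a positive autosomal depth d2")
    mean_count = sum(chry_counts) / len(chry_counts)
    kept = [c for c in chry_counts if c <= outlier_factor * mean_count]
    dy = sum(kept) / len(kept)
    return 2 * dy / d2


# Ten ordinary chrY windows plus one repeat-rich outlier window (illustrative)
print(male_fetal_fraction([3] * 10 + [200], d2=50))  # prints 0.12
```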
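Referring to FIG. 5, according to an embodiment of the present invention, the female fetal nucleic acid concentration fs in step S2 is obtained by the following steps:
S221. Sequence a biological sample containing cell-free nucleic acid to obtain a sequencing result consisting of multiple sequencing data;
S222. Based on the sequencing result, determine the number of uniquely aligned reads in the sample whose length falls within a predetermined range;
S223. Based on the number of uniquely aligned reads whose length falls within the predetermined range, determine the frequency of uniquely aligned reads within the predetermined range;
S224. From the frequency of uniquely aligned reads within the predetermined range, determine the female fetal nucleic acid concentration fs of the sample using a predetermined function.
Referring to FIG. 6, according to an embodiment of the present invention, the predetermined range in step S222 is determined by the following steps:
S2221. Determine the lengths of the uniquely aligned reads contained in a plurality of control samples;
S2222. Set a plurality of candidate length ranges and, for each, determine the frequency of uniquely aligned reads of the control samples falling within it;
S2223. Based on these frequencies and the nucleic acid concentrations of the control samples, determine the correlation coefficient between each candidate length range and the nucleic acid concentration of the control samples;
S2224. Based on the values of the correlation coefficients, determine at least one candidate length range, or a combination of candidate length ranges, as the predetermined range.
According to an embodiment of the present invention, the predetermined range is determined from a plurality of control samples whose nucleic acid concentrations are known; preferably, it is determined from at least 20 control samples.
According to an embodiment of the present invention, the control samples are peripheral blood samples from pregnant women carrying a normal male fetus with known cell-free fetal nucleic acid fractions, and the nucleic acid concentrations of the control samples are determined using the Y chromosome.
According to an embodiment of the present invention, the cell-free fetal nucleic acid concentration of the control samples is determined using the Y chromosome, i.e. by the method for the male fetal nucleic acid concentration fy described above.
According to an embodiment of the present invention, the span of the candidate length ranges in S2222 is 1-300 bp, preferably 1-20 bp.
According to an embodiment of the present invention, the step between the candidate length ranges is 1-2 bp.
For example, the candidate length ranges are 1-20, 2-21, 3-22, ..., with a span of 20 bp and a step of 1 bp.
According to an embodiment of the present invention, the predetermined range in step S222 is 179-206 bp.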
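The range search in S2221-S2224 can be sketched as a sliding scan over candidate length ranges. Each range is scored by the Pearson correlation between the per-sample in-range frequency and the known (chrY-derived) fetal fraction. This is a minimal sketch under assumptions. Frequency is taken as the fraction of a sample's uniquely aligned reads whose fragment length falls in the range. Ranges are inclusive. Only single ranges of one fixed span are scanned, although the text also allows combinations of ranges. The function names are hypothetical.

```python
import math


def length_frequency(lengths, lo, hi):
    """Fraction of reads whose fragment length lies in [lo, hi]."""
    return sum(1 for n in lengths if lo <= n <= hi) / len(lengths)


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # a constant series has no defined correlation
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_length_range(controls, span=20, step=1, min_len=1, max_len=300):
    """controls: (fragment_lengths, known_fetal_fraction) per control sample.
    Returns (lo, hi, r) for the candidate range with the largest |r|."""
    fractions = [f for _, f in controls]
    best = None
    for lo in range(min_len, max_len - span + 2, step):
        hi = lo + span - 1
        freqs = [length_frequency(lengths, lo, hi) for lengths, _ in controls]
        r = pearson(freqs, fractions)
        if best is None or abs(r) > abs(best[2]):
            best = (lo, hi, r)
    return best
```

The 179-206 bp range reported in this description spans 28 bp, so in practice the span (or a combination of ranges) would also be varied rather than fixed.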
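Referring to FIG. 7, according to an embodiment of the present invention, the predetermined function used in step S224 is obtained by the following steps:
S2231. In each of the control samples, determine the frequency of uniquely aligned reads within the predetermined range;
S2232. Fit the frequencies of uniquely aligned reads within the predetermined range in the control samples against their known nucleic acid concentrations to determine the predetermined function.
According to an embodiment of the present invention, the fit is a linear fit.
According to an embodiment of the present invention, the predetermined function is d = -0.3215 × p + 1.62562, where d denotes the nucleic acid concentration and p denotes the frequency of uniquely aligned reads within the predetermined range.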
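The linear fit of S2231-S2232 and the use of the resulting function can be sketched with ordinary least squares; the function names are hypothetical. The default coefficients are the ones given above. The text does not state the units of p. With these coefficients, any p between 0 and 1 would give d above 1.3, so p is evidently not expressed as a plain proportion, and its scale should be checked before reuse.

```python
def fit_line(p_values, d_values):
    """Least-squares fit of d = a * p + b; returns (a, b)."""
    n = len(p_values)
    if n < 2 or n != len(d_values):
        raise ValueError("need at least two paired observations")
    mp, md = sum(p_values) / n, sum(d_values) / n
    spp = sum((p - mp) ** 2 for p in p_values)
    if spp == 0:
        raise ValueError("p values must not all be equal")
    spd = sum((p - mp) * (d - md) for p, d in zip(p_values, d_values))
    a = spd / spp
    return a, md - a * mp


def female_fetal_fraction(p, a=-0.3215, b=1.62562):
    """Apply the predetermined function d = a * p + b to a sample's frequency p."""
    return a * p + b
```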
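According to an embodiment of the present invention, step S4 further includes: if, for a deletion, rmY ≥ 2, or, for a duplication, rmY ≥ 6, the result is judged unreliable and filtered out as a false positive;
or, if, for a deletion, rms ≥ 2, or, for a duplication, rms ≥ 6, the result is judged unreliable and filtered out as a false positive.
According to an embodiment of the present invention, step S5 further includes: if dmY < 0.13 or dmY > 0.85, dmY is positive;
or, if dms < 0.15 or dms > 0.791, dms is positive.
According to an embodiment of the present invention, the decision rule in step S7 is: if amY is between 0.95 and 1.05, the microdeletion/microduplication fragment is considered to originate from the mother and is filtered out;
or, if ams is between 0.93 and 1.06, the microdeletion/microduplication fragment is considered to originate from the mother and is filtered out.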
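Steps S3-S7 can be combined into a single filter for one candidate segment, as in the minimal sketch below. The thresholds are those stated above; the function name and the `kind` and `sex` arguments are hypothetical. Following Embodiment 2, the S7 interval is treated as open (0.95 < amY < 1.05). One way to read S5 is that a genuine fetal CNV should make fm close to a whole multiple of the fetal fraction, so the fractional part of the ratio should lie near 0 or 1.

```python
import math


def keep_fetal_cnv(fm, fetal_fraction, kind, sex):
    """Return True if a candidate segment survives filters S3-S7.

    fm             -- fragment concentration from S1
    fetal_fraction -- fy (male fetus, from chrY) or fs (female fetus, from lengths)
    kind           -- "deletion" or "duplication"
    sex            -- "male" or "female"; selects the threshold set
    """
    ratio = fm / fetal_fraction                       # S3: rmY or rms
    if ratio >= (2 if kind == "deletion" else 6):     # S4: implausible copy number
        return False
    frac = ratio - math.floor(ratio)                  # S5: dmY or dms
    low, high = (0.13, 0.85) if sex == "male" else (0.15, 0.791)
    if low <= frac <= high:                           # not "positive": filter out
        return False
    total = fm + fetal_fraction                       # S6: amY or ams
    a_low, a_high = (0.95, 1.05) if sex == "male" else (0.93, 1.06)
    if a_low < total < a_high:                        # S7: attributed to the mother
        return False
    return True
```

With the values reported in Table 5 of Embodiment 2 (fm = 0.119863, fs = 0.126509, female fetus), `keep_fetal_cnv(0.119863, 0.126509, "duplication", "female")` returns True.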
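Referring to FIG. 8, another aspect of the present invention provides a device 100 for determining microdeletions and microduplications in fetal chromosomes, comprising:
a microdeletion/microduplication fragment concentration calculation apparatus 110, configured to obtain the concentration fm of the fragment containing the microdeletion/microduplication;
a fetal nucleic acid concentration acquisition apparatus 120, configured to obtain the male fetal nucleic acid concentration fy or the female fetal nucleic acid concentration fs;
a ratio calculation apparatus 130, configured to calculate the ratio rmY = fm/fy of fm to the male fetal nucleic acid concentration fy, or the ratio rms = fm/fs of fm to the female fetal nucleic acid concentration fs;
a first filtering apparatus 140, configured to evaluate rmY or rms against the copy number of the deletion or duplication and filter out false positives;
a second filtering apparatus 150, configured to take the fractional part dmY of rmY, or the fractional part dms of rms, and determine whether dmY or dms is positive, otherwise filtering out the result;
a sum calculation apparatus 160, configured to calculate the sum amY = fm + fy of fm and the male fetal nucleic acid concentration fy, or the sum ams = fm + fs of fm and the female fetal nucleic acid concentration fs;
a third filtering apparatus 170, configured to filter the microdeletion/microduplication fragments according to the decision rule, the fragments remaining after filtering being the fetal chromosomal microdeletion/microduplication fragments.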
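Referring to FIG. 9, according to an embodiment of the present invention, the microdeletion/microduplication fragment concentration calculation apparatus 110 further comprises:
a primary window acquisition unit 111, configured to obtain, based on the primary windows containing the microdeletion/microduplication, the primary windows not containing it, and to calculate the total number of reads and the total number of windows for the primary windows containing the microdeletion/microduplication, as well as the total number of reads and the number of windows for the primary windows not containing it;
a first average depth acquisition unit 112, configured to obtain the average depth d1 of the primary windows containing the microdeletion/microduplication, d1 = total number of reads in these windows / number of these windows;
a second average depth acquisition unit 113, configured to obtain the average depth d2 of the primary windows not containing the microdeletion/microduplication, d2 = total number of reads in these windows / total number of these windows;
a microdeletion/microduplication fragment concentration acquisition unit 114, configured to calculate the concentration fm of the fragment containing the microdeletion/microduplication, fm = 2 × |d2 - d1| / d2.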
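According to an embodiment of the present invention, the microdeletion/microduplication fragment concentration calculation apparatus 110 further comprises a unit 115 for obtaining the final windows containing microdeletions/microduplications which, referring to FIG. 10, comprises:
a first sequencing element 1151, configured to perform nucleic acid sequencing on a biological sample containing cell-free nucleic acid to obtain a sequencing result consisting of multiple sequencing data;
an alignment element 1152, configured to align the sequencing result to a reference genome to construct a set of uniquely aligned reads, each read in the set matching only one position of the reference genome;
a length determination element 1153, configured to determine the length of each uniquely aligned read in the set;
a primary window determination element 1154, configured to divide the reference genome into multiple primary windows of a predetermined length, the predetermined length being 1 bp to 5 Mb;
a first counting element 1155, configured to count the number of uniquely aligned reads falling into each primary window;
a correction element 1156, configured to perform GC correction on the read counts in the primary windows and inter-batch adjustment on the corrected results;
a first merging element 1157, configured to merge a predetermined number of adjacent primary windows into each of multiple secondary windows and determine the read count of each secondary window;
a first filtering element 1158, configured to perform a statistical test on each secondary window to calculate a T1 value and filter the secondary windows according to T1;
a second merging element 1159, configured to perform a statistical test on the filtered secondary windows to calculate a T2 value and, according to T2, merge pairs of adjacent secondary windows with no significant difference into final windows;
a repetition element 1160, configured to invoke the first filtering element 1158 and the second merging element 1159 repeatedly until no further merging is possible;
a microdeletion/microduplication final window determination element 1161, configured to perform a hypothesis test on the final windows obtained from the last merge to obtain the final windows containing microdeletions/microduplications.
According to an embodiment of the present invention, the predetermined number in the first merging element 1157 is 5 to 100. The predetermined length is preferably 20 to 40 kb.
According to an embodiment of the present invention, the GC correction in the correction element 1156 uses locally weighted regression, linear regression or logistic regression.
According to an embodiment of the present invention, the inter-batch adjustment in the correction element 1156 consists of computing, from all samples in the sequencing batch, a baseline for each primary window, and applying a weighted correction to the number of uniquely aligned reads in each primary window according to that baseline.
According to an embodiment of the present invention, the T1 value in the first filtering element 1158 is calculated by a Z-test or a t-test, and the filtering removes the secondary windows whose T1 value lies between -3 and 3.
According to an embodiment of the present invention, the T2 value in the second merging element 1159 is calculated by a rank-sum test, a sign test or a runs test, and "no significant difference" means that the T2 value of the two adjacent windows lies between -3 and 3.
According to an embodiment of the present invention, the hypothesis test in the microdeletion/microduplication final window determination element 1161 is a Z-test or a t-test with a threshold of 3; that is, a final window whose test statistic is > 3 or < -3 is judged to contain a microdeletion/microduplication.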
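According to an embodiment of the present invention, the fetal nucleic acid concentration acquisition apparatus 120 further comprises a male fetal nucleic acid concentration fy acquisition unit 121 which, referring to FIG. 11, comprises:
a second sequencing element 1211, configured to sequence a biological sample containing cell-free nucleic acid to obtain a sequencing result consisting of multiple sequencing data;
a first number determination element 1212, configured to determine, from the sequencing result, the number of uniquely aligned reads on the Y chromosome of the sample falling into each primary window;
a second counting element 1213, configured to calculate the total number of uniquely aligned reads over all primary windows on the Y chromosome and the total number of these primary windows;
an average depth acquisition element 1214, configured to obtain the average primary-window depth dy of the Y chromosome, dy = total uniquely aligned reads on the Y chromosome / number of primary windows on the Y chromosome;
a male fetal nucleic acid concentration acquisition element 1215, configured to obtain the male fetal nucleic acid concentration fy = 2 × dy / d2, where d2 is the average depth of the primary windows not containing microdeletions/microduplications, d2 = total number of reads in these windows / number of these windows.
According to an embodiment of the present invention, the first number determination element 1212 further comprises a filtering module 12121, configured to divide the reference genome into multiple primary windows of a predetermined length and remove the primary windows on the Y chromosome whose number of uniquely aligned reads exceeds 5 times the average.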
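According to an embodiment of the present invention, the fetal nucleic acid concentration acquisition apparatus 120 further comprises a female fetal nucleic acid concentration fs acquisition unit 122 which, referring to FIG. 12, comprises:
a third sequencing element 1221, configured to sequence a biological sample containing cell-free nucleic acid to obtain a sequencing result consisting of multiple sequencing data;
a second number determination element 1222, configured to determine, from the sequencing result, the number of uniquely aligned reads in the sample whose length falls within the predetermined range;
a frequency determination element 1223, configured to determine, based on that number, the frequency of uniquely aligned reads within the predetermined range;
a female fetal nucleic acid concentration determination element 1224, configured to determine the female fetal nucleic acid concentration fs of the sample from the frequency of uniquely aligned reads within the predetermined range, using the predetermined function.
According to an embodiment of the present invention, the female fetal nucleic acid concentration fs acquisition unit 122 further comprises a predetermined range determination element 1225 which, referring to FIG. 13, comprises:
a length determination module 12251, configured to determine the lengths of the uniquely aligned reads contained in the plurality of control samples;
a first frequency determination module 12252, configured to set a plurality of candidate length ranges and determine, for each, the frequency of uniquely aligned reads of the control samples falling within it;
a correlation coefficient determination module 12253, configured to determine, based on those frequencies and the nucleic acid concentrations of the control samples, the correlation coefficient between each candidate length range and the nucleic acid concentration of the control samples;
a predetermined range determination module 12254, configured to determine, based on the values of the correlation coefficients, at least one candidate length range or combination of candidate length ranges as the predetermined range.
According to an embodiment of the present invention, the predetermined range is determined from a plurality of control samples whose nucleic acid concentrations are known; preferably, it is determined from at least 20 control samples.
According to an embodiment of the present invention, the control samples are peripheral blood samples from pregnant women carrying a normal male fetus with known cell-free fetal nucleic acid fractions, and the cell-free fetal nucleic acid concentration of the control samples is determined using the Y chromosome, i.e. by the method for the male fetal nucleic acid concentration fy described above.
According to an embodiment of the present invention, the span of the candidate length ranges is 1-300 bp, preferably 1-20 bp.
According to an embodiment of the present invention, the step between the candidate length ranges is 1-2 bp.
For example, the candidate length ranges are 1-20, 2-21, 3-22, ..., with a span of 20 bp and a step of 1 bp.
According to an embodiment of the present invention, the predetermined range is 179-206 bp.
According to an embodiment of the present invention, the female fetal nucleic acid concentration fs acquisition unit 122 further comprises a predetermined function determination element 1226 which, referring to FIG. 14, comprises:
a second frequency determination module 12261, configured to determine, in each of the control samples, the frequency of uniquely aligned reads within the predetermined range;
a fitting module 12262, configured to fit these frequencies against the known nucleic acid concentrations to determine the predetermined function.
According to an embodiment of the present invention, the fit is a linear fit.
According to an embodiment of the present invention, the predetermined function is d = -0.3215 × p + 1.62562, where d denotes the cell-free fetal nucleic acid concentration and p denotes the frequency of uniquely aligned reads within the predetermined range.
According to an embodiment of the present invention, the first filtering apparatus 140 further comprises a false-positive judgment unit 141, configured to judge a result unreliable and filter it out as a false positive if, for a deletion, rmY ≥ 2, or, for a duplication, rmY ≥ 6;
or if, for a deletion, rms ≥ 2, or, for a duplication, rms ≥ 6.
According to an embodiment of the present invention, the second filtering apparatus 150 further comprises a positive judgment unit 151, configured to judge dmY positive if dmY < 0.13 or dmY > 0.85, or to judge dms positive if dms < 0.15 or dms > 0.791.
According to an embodiment of the present invention, the decision rule in the third filtering apparatus 170 is: if amY is between 0.95 and 1.05, the microdeletion/microduplication fragment is considered to originate from the mother and is filtered out;
or, if ams is between 0.93 and 1.06, the microdeletion/microduplication fragment is considered to originate from the mother and is filtered out.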
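Embodiment 1
I. Obtaining the concentration fm of the microdeletion/microduplication fragment
1. Perform nucleic acid sequencing on a biological sample containing cell-free nucleic acid to obtain a sequencing result consisting of multiple sequencing data;
2. Align the sequencing result to a reference genome to construct a set of uniquely aligned reads, each read in the set matching only one position of the reference genome;
3. Determine the length of each uniquely aligned read in the set;
4. Divide the reference genome into multiple primary windows of a predetermined length of 1 bp to 5 Mb, preferably 20 kb to 40 kb per window, e.g. (1-20bp, 20-40bp, 40-80bp, 80-100bp, 100-120bp, ...);
5. Count the number of uniquely aligned reads falling into each primary window;
6. Perform GC correction on the read counts in the primary windows, and inter-batch adjustment on the corrected results; the GC correction uses locally weighted regression, linear regression or logistic regression;
7. Merge a predetermined number (5 to 100) of adjacent primary windows into multiple secondary windows, and determine the read count of each secondary window; for example, the five primary windows 1-20bp, 20-40bp, 40-80bp, 80-100bp and 100-120bp are merged into one secondary window 1-120bp.
8. Perform a statistical test on each secondary window to calculate a T1 value, by a Z-test or a t-test;
9. Filter the secondary windows according to T1, i.e. remove the secondary windows whose T1 value lies between -3 and 3;
10. Perform a statistical test on the filtered secondary windows to calculate a T2 value, including but not limited to a rank-sum test, a sign test or a runs test;
11. According to T2, merge pairs of adjacent secondary windows with no significant difference into final windows, "no significant difference" meaning that the T2 value of the two windows lies between -3 and 3;
12. Repeat steps 8-11 until no further merging is possible;
13. Perform a hypothesis test on the final windows obtained from the last merge to obtain the final windows containing microdeletions/microduplications; the hypothesis test is a Z-test or a t-test, and a final window whose test statistic is > 3 or < -3 is judged to contain a microdeletion/microduplication.
14. Based on the primary windows containing the microdeletion/microduplication, obtain the primary windows not containing it; calculate the total number of reads and the total number of windows for the primary windows containing the microdeletion/microduplication, as well as the total number of reads and the total number of windows for the primary windows not containing it;
15. Calculate the average depth d1 of the primary windows containing the microdeletion/microduplication: d1 = total number of reads in these windows / total number of these windows;
16. Calculate the average depth d2 of the primary windows not containing the microdeletion/microduplication: d2 = total number of reads in these windows / total number of these windows;
17. Calculate the concentration fm of the microdeletion/microduplication fragment: fm = 2 × |d2 - d1| / d2.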
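II. Obtaining the male fetal nucleic acid concentration fy or the female fetal nucleic acid concentration fs
1. Determine whether the test sample contains Y chromosome sequences; if so, calculate the male fetal nucleic acid concentration fy; if not, calculate the female fetal nucleic acid concentration fs;
2. If Y chromosome sequences are present, calculate the male fetal nucleic acid concentration fy:
(1) From the above sequencing result, determine the number of uniquely aligned reads on the Y chromosome of the sample falling into each primary window;
(2) Remove the primary windows whose GC-corrected number of uniquely aligned reads exceeds 5 times the average;
(3) Calculate the total number of uniquely aligned reads over all primary windows on the Y chromosome and the total number of these primary windows;
(4) Obtain the average primary-window depth dy of the Y chromosome: dy = total uniquely aligned reads on the Y chromosome / number of primary windows on the Y chromosome;
(5) Obtain the male fetal nucleic acid concentration fy = 2 × dy / d2, where d2 is the average depth of the primary windows not containing microdeletions/microduplications: d2 = total number of reads in these windows / number of these windows.
3. If no Y chromosome sequences are present, calculate the female fetal nucleic acid concentration fs:
(1) Determine the number of uniquely aligned reads in the biological sample containing cell-free nucleic acid whose length falls within the predetermined range; the predetermined range is 179-206 bp.
The predetermined range is obtained as follows:
a. Select at least 20 control samples, i.e. samples with known cell-free fetal nucleic acid concentrations. This embodiment uses male-fetus control samples whose cell-free fetal nucleic acid concentration is determined from the Y chromosome, i.e. by the method for the male fetal nucleic acid concentration fy described above.
b. Tabulate the lengths of all uniquely aligned reads contained in the control samples, from 0 bp to M bp (M denoting the longest nucleic acid length), and determine the number of uniquely aligned reads at each length;
c. Taking a given length as a candidate length range, shift it in steps of 1-2 bp to define multiple candidate length ranges, e.g. 1bp, 2bp, 3bp, ..., 100bp, ..., 300bp, and calculate for each candidate length range the frequency of uniquely aligned reads of the control samples falling within it;
d. Find the candidate length ranges, or combinations of ranges, whose read frequencies in the control samples correlate strongly with the nucleic acid concentrations of the control samples, and determine the correlation coefficient between each candidate length range and the nucleic acid concentration; the correlation coefficient is obtained by correlation analysis, including linear regression, logistic regression, locally weighted regression and similar methods.
The span of the candidate length ranges is 1-300 bp, preferably 1-20 bp, and the step between candidate length ranges is 1-2 bp.
e. Based on the values of the correlation coefficients, determine at least one candidate length range or combination of candidate length ranges as the predetermined range.
(2) Based on the number of uniquely aligned reads whose length falls within the predetermined range, calculate the frequency of uniquely aligned reads within the predetermined range;
(3) From the frequency of uniquely aligned reads within the predetermined range, determine the female fetal nucleic acid concentration fs of the sample using the predetermined function.
The predetermined function is obtained as follows:
a. In each of the control samples, determine the frequency of uniquely aligned reads within the predetermined range; the predetermined range and the read frequencies of the control samples are obtained by the range-determination method described above;
b. Perform a linear fit of the frequencies of uniquely aligned read inserts within the predetermined range in the control samples against the known nucleic acid concentrations to determine the predetermined function.
Preferably, the predetermined function is d = -0.3215 × p + 1.62562, where d denotes the cell-free fetal nucleic acid concentration and p denotes the frequency of uniquely aligned reads within the predetermined range.
III. Calculate the ratio rmY = fm/fy of the microdeletion/microduplication fragment concentration fm to the male fetal nucleic acid concentration fy, or the ratio rms = fm/fs of fm to the female fetal nucleic acid concentration fs;
IV. If, for a deletion, rmY ≥ 2, or, for a duplication, rmY ≥ 6, the result is judged unreliable and filtered out as a false positive;
or, if, for a deletion, rms ≥ 2, or, for a duplication, rms ≥ 6, the result is judged unreliable and filtered out as a false positive;
Filtering these false positives removes the influence of multiple copies and makes the result more accurate.
V. Take the fractional part dmY of rmY, or the fractional part dms of rms, and determine whether dmY or dms is positive; otherwise, filter out the result:
if dmY < 0.13 or dmY > 0.85, dmY is positive;
or, if dms < 0.15 or dms > 0.791, dms is positive;
VI. Calculate the sum amY = fm + fy of fm and the male fetal nucleic acid concentration fy, or the sum ams = fm + fs of fm and the female fetal nucleic acid concentration fs;
VII. If amY is between 0.95 and 1.05, the microdeletion/microduplication fragment is considered to originate from the mother and is filtered out;
or, if ams is between 0.93 and 1.06, the microdeletion/microduplication fragment is considered to originate from the mother and is filtered out; the fragments remaining after filtering are the fetal chromosomal microdeletion/microduplication fragments.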
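Embodiment 2
1. Sample collection and processing
A batch of 100 samples was selected, and 2 ml of peripheral blood was drawn from each for plasma separation.
2. Library construction
Libraries can be constructed according to the plasma library construction requirements well known to those skilled in the art.
3. Sequencing
Sequencing can be performed according to sequencing workflows well known to those skilled in the art.
4. Data analysis
Sequencing results were obtained by paired-end sequencing, and initial microdeletion/microduplication results were obtained through the following analysis:
4.1 Alignment: align the sequencing results to the reference genome and determine the positions of the uniquely aligned reads.
4.2 Divide the reference genome into primary windows of 20 kb, count the uniquely aligned reads and the GC content in each primary window, and GC-correct the read counts in the primary windows by locally weighted regression.
4.3 For all samples in the batch, perform baseline correction of each primary window (inter-batch adjustment).
4.4 Merge adjacent primary windows in units of 100 to obtain multiple secondary windows, each 2 Mb long;
4.5 Calculate the T1 value of each secondary window with a Z-test, and remove the secondary windows whose T1 value lies between -3 and 3;
4.6 Apply a runs test to the filtered secondary windows to calculate T2, and merge pairs of adjacent secondary windows whose T2 value lies between -3 and 3 into final windows;
4.7 Repeat steps 4.5-4.6 until no further merging is possible;
4.8 Test the final windows obtained from the last merge with a Z-test to obtain the microdeletion/microduplication results; in total, 19 samples had microdeletion/microduplication findings.
Table 1. Results detected in the 19 samples
Table 1 shows the results detected in the 19 samples: the first column is the sample ID, the second the chromosome carrying the microdeletion/microduplication, the third the length of the microdeletion/microduplication, and the fourth the detected T value.
4.9 Calculate the concentration of the microdeletion/microduplication fragment from the microdeletion/microduplication results, as follows:
calculate, for each sample, the average depth d1 of the primary windows containing the microdeletion/microduplication;
calculate the average depth d2 of the primary windows not containing the microdeletion/microduplication;
calculate the concentration fm of the microdeletion/microduplication fragment;
Calculate the fetal nucleic acid concentration. The results for the above 19 samples are tabulated below:
Table 2. Fetal nucleic acid concentration information for the 19 samples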
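4.10 Calculate the fetal concentration from the chrY proportion, giving male fetal concentrations for 8 of the samples, as follows:
remove the primary windows on chrY whose GC-corrected number of uniquely aligned reads exceeds 5 times the average;
calculate the average primary-window depth dy on chrY;
calculate the male fetal nucleic acid concentration fy; the results are given in the following table:
Table 3. Fetal nucleic acid concentrations estimated from chrY for the 19 samples
4.11 Calculate the fetal concentration from fragment length, giving female fetal concentrations for 11 samples, as follows:
The whole batch contained 41 male samples. The region whose frequency correlates most strongly with fetal concentration was identified as 179-206 bp, with a correlation coefficient R = -0.9056996.
Determine, for the remaining 11 samples, the functional relationship between the frequency of uniquely aligned reads with lengths in 179-206 bp and the cell-free fetal nucleic acid concentration: a linear fit over the selected region 179-206 bp gives d = a × p + b, where d is the concentration and p the frequency, with a = -0.3215 and b = 1.62562.
The female-fetus samples were calculated from this fit, with the following results:
Table 4. Fetal nucleic acid concentrations of the 19 samples calculated from fragment length
4.12 Screen the microdeletion/microduplication results.
For male fetuses:
calculate the ratio rmY = fm/fy of the microdeletion/microduplication fragment concentration fm to the male fetal nucleic acid concentration fy;
filter by copy number: remove deletion fragments with rmY of 2 or more and duplication fragments with rmY of 6 or more;
take the fractional part of rmY as dmY;
among the remaining fragments, remove those with dmY greater than 0.13 and less than 0.85;
calculate the sum amY = fm + fy of fm and the male fetal nucleic acid concentration fy;
remove fragments with amY > 0.95 and amY < 1.05, giving the male-fetus samples containing microdeletions/microduplications.
For female fetuses:
calculate the ratio rms = fm/fs of the microdeletion/microduplication fragment concentration fm to the female fetal nucleic acid concentration fs;
filter by copy number: remove deletion fragments with rms of 2 or more and duplication fragments with rms of 6 or more;
take the fractional part of rms as dms;
among the remaining fragments, remove those with dms greater than 0.15 and less than 0.791;
calculate the sum ams = fm + fs of fm and the female fetal nucleic acid concentration fs;
remove fragments with ams > 0.93 and ams < 1.06, giving the female-fetus samples containing microdeletions/microduplications.
The positive results are shown in Table 5:
Table 5. Microdeletion/microduplication results after filtering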
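| Field | Value |
|---|---|
| Sample | 8 |
| Chromosome | |
| Start position | 19394465 |
| End position | 27194537 |
| Microdeletion/microduplication length | 7.80 Mb |
| T value | 5.030 |
| Concentration calculated from the microdeletion/microduplication (fm) | 0.119863 |
| Fetal concentration calculated from chrY (fy) | |
| Fetal concentration calculated from fragment length (fs) | 0.126509 |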
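As a worked check of the filters against the segment in the table above, use the concentrations listed there. The fetus is female, so the fs thresholds apply. The ratio is rounded to four decimals.

```latex
\begin{aligned}
r_{ms} &= \frac{f_m}{f_s} = \frac{0.119863}{0.126509} \approx 0.9475 < 2 \ (\text{and} < 6)
  &&\Rightarrow \text{passes S4}\\
d_{ms} &= r_{ms} - \lfloor r_{ms} \rfloor \approx 0.9475 > 0.791
  &&\Rightarrow \text{positive in S5}\\
a_{ms} &= f_m + f_s = 0.119863 + 0.126509 = 0.246372 \notin (0.93,\ 1.06)
  &&\Rightarrow \text{not attributed to the mother in S7}
\end{aligned}
```

The segment therefore survives every filter, consistent with its being reported as a positive result.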
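After the above processing of the microdeletion/microduplication results, a large number of false positives can be filtered out and accurate results obtained; see FIG. 15. In the figure, the horizontal axis is the sample number and the vertical axis the concentration; fm denotes the concentration estimated from the microdeletion/microduplication, fy the concentration estimated from chrY for male-fetus samples, and fs the concentration estimated from fragments for female-fetus samples. As can be seen, after applying the above criteria, sample 28 is the fetal sample finally found to carry a microduplication.
Embodiment 3
The method for determining microdeletions and microduplications in fetal chromosomes in this embodiment is the same as in Embodiment 2, except that in step 4.2 the genome is divided into 40 kb windows.
Embodiment 4
The method for determining microdeletions and microduplications in fetal chromosomes in this embodiment is the same as in Embodiment 2, except that in step 4.4 windows are merged in units of 200, giving secondary windows 4 Mb long.
Embodiment 5
The method for determining microdeletions and microduplications in fetal chromosomes in this embodiment is the same as in Embodiment 2, except that in step 4.11, 40 male samples are used, the selected region is 185-204 bp, and the correlation coefficient is R = -0.87.
A linear fit over the selected region 185-204 bp gives d = a × p + b, where d is the concentration and p the frequency, with a = 0.0334 and b = 1.6657.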
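The above embodiments are intended only to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the above preferred embodiments, those of ordinary skill in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope.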
Claims (13)
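- A method for determining microdeletions and microduplications in fetal chromosomes, characterized by comprising the following steps: S1. obtaining the concentration fm of a fragment containing a microdeletion/microduplication; S2. obtaining the male fetal nucleic acid concentration fy or the female fetal nucleic acid concentration fs; S3. calculating the ratio rmY = fm/fy of fm to the male fetal nucleic acid concentration fy, or the ratio rms = fm/fs of fm to the female fetal nucleic acid concentration fs; S4. evaluating rmY or rms against the copy number of the deletion or duplication and filtering out false positives; S5. taking the fractional part dmY of rmY, or the fractional part dms of rms, determining whether dmY or dms is positive, and otherwise filtering out the result; S6. calculating the sum amY = fm + fy of fm and the male fetal nucleic acid concentration fy, or the sum ams = fm + fs of fm and the female fetal nucleic acid concentration fs; S7. filtering the microdeletion/microduplication fragments according to a decision rule, the fragments remaining after filtering being the fetal chromosomal microdeletion/microduplication fragments.
- The method according to claim 1, characterized in that the concentration fm of the fragment containing the microdeletion/microduplication in step S1 is obtained by the following steps: S11. based on the primary windows containing the microdeletion/microduplication, obtaining the primary windows not containing it, and calculating the total number of reads and the total number of windows for the primary windows containing the microdeletion/microduplication, as well as the total number of reads and the total number of windows for the primary windows not containing it; S12. obtaining the average depth d1 of the primary windows containing the microdeletion/microduplication, d1 = total number of reads in these windows / number of these windows; S13. obtaining the average depth d2 of the primary windows not containing the microdeletion/microduplication, d2 = total number of reads in these windows / total number of these windows; S14. calculating the concentration fm of the fragment containing the microdeletion/microduplication, fm = 2 × |d2 - d1| / d2.
- The method according to claim 2, characterized in that the final windows containing microdeletions/microduplications are obtained by the following steps: S111. performing nucleic acid sequencing on a biological sample containing cell-free nucleic acid to obtain a sequencing result consisting of multiple sequencing data; S112. aligning the sequencing result to a reference genome to construct a set of uniquely aligned reads, each read in the set matching only one position of the reference genome; S113. determining the length of each uniquely aligned read in the set; S114. dividing the reference genome into multiple primary windows of a predetermined length, the predetermined length being 1 bp to 5 Mb; S115. counting the number of uniquely aligned reads falling into each primary window; S116. performing GC correction on the read counts in the primary windows and inter-batch adjustment on the corrected results; S117. merging a predetermined number of adjacent primary windows into each of multiple secondary windows and determining the read count of each secondary window; S118. performing a statistical test on each secondary window to calculate a T1 value and filtering the secondary windows according to the T1 value; S119. performing a statistical test on the filtered secondary windows to calculate a T2 value and, according to the T2 value, merging pairs of adjacent secondary windows showing no significant difference into final windows; S120. repeating steps S118-S119 until no further merging is possible; S121. performing a hypothesis test on the final windows obtained from the last merge to obtain the final windows containing microdeletions/microduplications.
- The method according to claim 3, characterized in that the predetermined length in step S114 is 1 bp to 5 Mb and the predetermined number in step S117 is 5 to 100.
- The method according to claim 3, characterized in that the T1 value in step S118 is calculated by a Z-test or a t-test, and the filtering removes the secondary windows whose T1 value lies between -3 and 3.
- The method according to claim 3, characterized in that the T2 value in step S119 is calculated by a rank-sum test, a sign test or a runs test, and no significant difference means that the T2 value of the two adjacent windows lies between -3 and 3.
- The method according to claim 3, characterized in that the hypothesis test in step S121 is a Z-test or a t-test, a final window whose test statistic is > 3 or < -3 being judged to contain a microdeletion/microduplication.
- The method according to claim 1, characterized in that the male fetal nucleic acid concentration fy in step S2 is obtained by the following steps: S211. sequencing a biological sample containing cell-free nucleic acid to obtain a sequencing result consisting of multiple sequencing data; S212. determining, from the sequencing result, the number of uniquely aligned reads on the Y chromosome of the sample falling into each primary window; S213. calculating the total number of uniquely aligned reads over all primary windows on the Y chromosome and the total number of these primary windows; S214. obtaining the average primary-window depth dy of the Y chromosome, dy = total uniquely aligned reads on the Y chromosome / number of primary windows on the Y chromosome; S215. obtaining the male fetal nucleic acid concentration fy = 2 × dy / d2, where d2 is the average depth of the primary windows not containing microdeletions/microduplications, d2 = total number of reads in these windows / number of these windows.
- The method according to claim 1, characterized in that the female fetal nucleic acid concentration fs in step S2 is obtained by the following steps: S221. sequencing a biological sample containing cell-free nucleic acid to obtain a sequencing result consisting of multiple sequencing data; S222. determining, from the sequencing result, the number of uniquely aligned reads in the sample whose length falls within a predetermined range; S223. determining, based on that number, the frequency of uniquely aligned reads within the predetermined range; S224. determining the female fetal nucleic acid concentration fs of the sample from that frequency using a predetermined function.
- The method according to claim 1, characterized in that step S4 further comprises: if, for a deletion, rmY ≥ 2, or, for a duplication, rmY ≥ 6, judging the result unreliable and filtering it out as a false positive; or, if, for a deletion, rms ≥ 2, or, for a duplication, rms ≥ 6, judging the result unreliable and filtering it out as a false positive.
- The method according to claim 1, characterized in that step S5 further comprises: if dmY < 0.13 or dmY > 0.85, dmY is positive; or, if dms < 0.15 or dms > 0.791, dms is positive.
- The method according to claim 1, characterized in that the decision rule in step S7 is: if amY is between 0.95 and 1.05, the microdeletion/microduplication fragment is considered to originate from the mother and is filtered out; or, if ams is between 0.93 and 1.06, the microdeletion/microduplication fragment is considered to originate from the mother and is filtered out.
- A device for determining microdeletions and microduplications in fetal chromosomes, characterized by comprising: a microdeletion/microduplication fragment concentration calculation apparatus, configured to obtain the concentration fm of a fragment containing a microdeletion/microduplication; a fetal nucleic acid concentration acquisition apparatus, configured to obtain the male fetal nucleic acid concentration fy or the female fetal nucleic acid concentration fs; a ratio calculation apparatus, configured to calculate the ratio rmY = fm/fy of fm to the male fetal nucleic acid concentration fy, or the ratio rms = fm/fs of fm to the female fetal nucleic acid concentration fs; a first filtering apparatus, configured to evaluate rmY or rms against the copy number of the deletion or duplication and filter out false positives; a second filtering apparatus, configured to take the fractional part dmY of rmY, or the fractional part dms of rms, determine whether dmY or dms is positive, and otherwise filter out the result; a sum calculation apparatus, configured to calculate the sum amY = fm + fy of fm and the male fetal nucleic acid concentration fy, or the sum ams = fm + fs of fm and the female fetal nucleic acid concentration fs; a third filtering apparatus, configured to filter the microdeletion/microduplication fragments according to a decision rule, the fragments remaining after filtering being the fetal chromosomal microdeletion/microduplication fragments.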
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710086851.6 | 2017-02-17 | ||
| CN201710086851.6A CN106778069B (zh) | 2017-02-17 | 2017-02-17 | Method and device for determining microdeletions and microduplications in fetal chromosomes |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2018149114A1 true WO2018149114A1 (zh) | 2018-08-23 |
Family
ID=58958599
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2017/100423 Ceased WO2018149114A1 (zh) | Method and device for determining microdeletions and microduplications in fetal chromosomes | 2017-02-17 | 2017-09-04 |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN106778069B (zh) |
| WO (1) | WO2018149114A1 (zh) |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
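| CN106778069B (zh) * | 2017-02-17 | 2020-02-14 | 广州精科医学检验所有限公司 | Method and device for determining microdeletions and microduplications in fetal chromosomes |
| CN110970089B (zh) * | 2019-11-29 | 2023-05-23 | 北京优迅医疗器械有限公司 | Preprocessing method and preprocessing device for fetal concentration calculation, and application thereof |
| CN112037846A (zh) * | 2020-07-14 | 2020-12-04 | 广州市达瑞生物技术股份有限公司 | cffDNA aneuploidy detection method, system, storage medium and detection device |
| CN116246704B (zh) * | 2023-05-10 | 2023-08-15 | 广州精科生物技术有限公司 | System for fetal non-invasive prenatal testing |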
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20130150253A1 (en) * | 2012-01-20 | 2013-06-13 | Sequenom, Inc. | Diagnostic processes that factor experimental conditions |
| CN105051209A (zh) * | 2013-01-10 | 2015-11-11 | 香港中文大学 | Non-invasive prenatal molecular karyotyping from maternal plasma |
| US20160034640A1 (en) * | 2014-07-30 | 2016-02-04 | Sequenom, Inc. | Methods and processes for non-invasive assessment of genetic variations |
| CN105555968A (zh) * | 2013-05-24 | 2016-05-04 | 塞昆纳姆股份有限公司 | Methods and processes for non-invasive assessment of genetic variations |
| CN106778069A (zh) * | 2017-02-17 | 2017-05-31 | 广州精科医学检验所有限公司 | Method and device for determining microdeletions and microduplications in fetal chromosomes |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
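| CN104136628A (zh) * | 2011-10-28 | 2014-11-05 | 深圳华大基因医学有限公司 | Method for detecting chromosomal microdeletions and microduplications |
| CN104745718B (zh) * | 2015-04-23 | 2018-02-16 | 北京中仪康卫医疗器械有限公司 | Method for detecting microdeletions and microduplications in human embryo chromosomes |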
2017
- 2017-02-17 CN CN201710086851.6A patent/CN106778069B/zh active Active
- 2017-09-04 WO PCT/CN2017/100423 patent/WO2018149114A1/zh not_active Ceased
Non-Patent Citations (3)
| Title |
|---|
| WAPNER, R.J. ET AL.: "Expanding the Scope of Noninvasive Prenatal Testing: Detection of Fetal Microdeletion Syndromes", AMERICAN JOURNAL OF OBSTETRICS & GYNECOLOGY, vol. 212, no. 3, 31 March 2015 (2015-03-31), XP055537008 * |
| YIN, XUYANG ET AL.: "Fetal Genetic Abnormality Detection through Maternal Plasma Free Nucleic Acid High-Throughput Sequencing", CHINESE JOURNAL OF PRENATAL DIAGNOSIS, vol. 8, no. 2, 31 December 2016 (2016-12-31), pages 44 - 51 * |
| ZHAO, C. ET AL.: "Detection of Fetal Subchromosomal Abnormalities by Sequencing Circulating Cell-Free DNA from Maternal Plasma", CLINICAL CHEMISTRY, vol. 61, no. 4, 31 December 2015 (2015-12-31), pages 608 - 616, XP055215005 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN106778069B (zh) | 2020-02-14 |
| CN106778069A (zh) | 2017-05-31 |
Similar Documents
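| Publication | Publication Date | Title |
|---|---|---|
| CN102952854B (zh) | | Single-cell classification and screening method and device |
| JP6534191B2 (ja) | | Method for improving sensitivity of detection in determining copy number variations |
| CN103525939B (zh) | | Method and system for non-invasive detection of fetal chromosomal aneuploidy |
| CN104846089B (zh) | | Method for quantifying the proportion of fetal cell-free DNA in maternal peripheral blood |
| CN107229841B (zh) | | Genetic variant assessment method and system |
| CN104169929B (zh) | | System and device for determining whether a fetus has a sex chromosome number abnormality |
| CN113450871B (zh) | | Method for identifying sample identity based on low-depth sequencing |
| KR101614471B1 (ko) | | Method and apparatus for diagnosing fetal chromosomal aneuploidy using genome sequencing |
| WO2016011982A1 (zh) | | Method and device for determining the proportion of cell-free nucleic acid in a biological sample, and use thereof |
| CN105844116B (zh) | | Sequencing data processing method and processing device |
| CN110648722B (zh) | | Device for assessing the risk of genetic disease in newborns |
| WO2018149114A1 (zh) | | Method and device for determining microdeletions and microduplications in fetal chromosomes |
| US12260935B2 (en) | | Limit of detection based quality control metric |
| CN104951671A (zh) | | Device for detecting fetal chromosomal aneuploidy based on single-sample peripheral blood |
| CN109402247B (zh) | | Fetal chromosome detection system based on DNA variant counting |
| CN116864011A (zh) | | Method and system for identifying colorectal cancer molecular markers based on multi-omics data |
| CN114171116B (zh) | | Method for assessing fetal DNA concentration using maternal cell-free DNA and the pregnant woman's own DNA, and application thereof |
| CN108229099B (zh) | | Data processing method and device, storage medium and processor |
| CN114496078A (zh) | | Method for determining the parentage relationship between a pregnant woman and a fetus by calculating fetal concentration |
| CN113889189A (zh) | | Method for assessing fetal DNA concentration using biological father and mother DNA, and application thereof |
| US12073921B2 (en) | | System for increasing the accuracy of non invasive prenatal diagnostics and liquid biopsy by observed loci bias correction at single base resolution |
| CN109321641A (zh) | | Non-invasive prenatal fetal chromosome detection system based on DNA fragment enrichment and sequencing technology |
| CN105177130B (zh) | | Markers for assessing the occurrence of immune reconstitution inflammatory syndrome in AIDS patients |
| CN113981062A (zh) | | Method for assessing fetal DNA concentration using non-biological-father and mother DNA, and application thereof |
| CN119673271B (zh) | | Method and device for identifying maternal contamination and detecting copy number abnormalities using peak values |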
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17896566; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02/12/2019) |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 17896566; Country of ref document: EP; Kind code of ref document: A1 |