HK40081922A - Method, device and storage medium for detecting fetal chromosomal aneuploid abnormalities - Google Patents

Info

Publication number
HK40081922A
Authority
HK
Hong Kong
Prior art keywords
value
fetal
chromosome
sample
new
Prior art date
Application number
HK42023071359.6A
Other languages
Chinese (zh)
Inventor
杨杰淳
彭继光
彭智宇
孙隽
向嘉乐
刘晶娟
李婧柔
Original Assignee
深圳华大基因股份有限公司
Filing date
Publication date
Application filed by 深圳华大基因股份有限公司
Publication of HK40081922A

Description

Method, device and storage medium for detecting fetal chromosome aneuploidy abnormality
Technical Field
The present application relates to the technical field of detecting fetal chromosomal aneuploidy abnormality, and in particular, to a method, an apparatus, and a storage medium for detecting fetal chromosomal aneuploidy abnormality.
Background
Fetal chromosomal aneuploidy abnormality means that the chromosomes of the fetus are aneuploid. A normal fetus has 23 pairs (46) of chromosomes, that is, it is euploid; if chromosomes are lost or gained so that aneuploidy arises, the fetus has a chromosomal abnormality, namely a fetal chromosomal aneuploidy abnormality.
Currently, the fetal chromosomal aneuploidy abnormalities most commonly seen in clinical practice are Down syndrome, Edwards syndrome and Patau syndrome.
Down syndrome (trisomy 21) is a genetic disease caused by trisomy of chromosome 21; common symptoms include developmental delay, characteristic facial features, and mild to moderate intellectual impairment. At present there is no effective treatment for Down syndrome, and patients' quality of life can be improved only through daily care and education. Besides Down syndrome, clinically common fetal chromosomal aneuploidy abnormalities include Edwards syndrome (trisomy 18) and Patau syndrome (trisomy 13), both of which cause severe developmental abnormalities in affected children.
The molecular mechanism of Down syndrome is that chromosome 21 fails to separate during the formation of germ cells, so that the fertilized egg contains three copies of chromosome 21, which in turn leads to a series of abnormalities in molecular and developmental processes. Since there is no effective treatment for chromosomal aneuploidy syndromes represented by Down syndrome, and no specific behavioral or environmental factor related to the onset of Down syndrome has been found, the main countermeasure at present is prenatal screening of pregnant women: appropriate testing is performed during pregnancy, and if the relevant indicators are positive or indicate high risk, the birth of an infant with trisomy can be avoided by terminating the pregnancy.
Traditional screening assessed trisomy risk using serological markers such as AFP, free β-hCG, uE3 and inhibin A. Because serological markers are indirect indicators and do not directly reflect the aneuploidy status of the fetal chromosomes, their sensitivity and specificity are poor. Around 2010, high-throughput sequencing was gradually developed and popularized; it can accurately detect and quantify cell-free DNA (cfDNA) in maternal plasma, so that chromosomal abnormalities including trisomy 21 can be screened through the relative content of the target chromosome (i.e., NIPT, Non-Invasive Prenatal Testing). In 2015, an article published in The New England Journal of Medicine reported a prospective, multi-center clinical study of 15841 samples showing that NIPT performed markedly better than traditional screening, with sensitivity and specificity both above 99.9%; by contrast, traditional serological screening had a sensitivity of only 78.9% and a specificity of only 94.6%. This demonstrates that NIPT greatly improves screening for chromosomal aneuploidy syndromes represented by Down syndrome.
However, the performance of NIPT still needs improvement. In an article published in 2015, Zhang et al. analyzed 112669 NIPT results with follow-up outcomes and found two main problems with the detection performance of conventional NIPT. First, the positive predictive value (PPV) needs to be improved. According to the article, the PPV was 92.2% for T21, 76.6% for T18 and only 32.8% for T13, which shows that the traditional NIPT method yields many false positives. Second, the retest rate is high. According to the article, the 112669 samples gave rise to 3213 repeat blood draws, a rate of 2.8%. A repeat blood draw means that the value from the first NIPT test fell in the gray zone, so that neither a negative nor a positive result could be reported, and another tube of blood had to be drawn and tested again. In this case the pregnant woman not only undergoes an extra blood draw; more importantly, the turnaround time of the NIPT report is prolonged, so that she may miss the optimal window for intervention, which poses a serious hidden risk to her health.
Therefore, how to improve the positive predictive value of NIPT detection and reduce the retest rate is a research focus and difficulty of fetal chromosome aneuploidy abnormality detection.
Disclosure of Invention
It is an object of the present application to provide an improved method, apparatus and storage medium for detecting fetal chromosomal aneuploidy abnormalities.
In order to achieve the purpose, the following technical scheme is adopted in the application:
A first aspect of the application discloses a method for detecting fetal chromosomal aneuploidy abnormality, which comprises calculating a new Z value of a sample to be tested, denoted $Z_{new}$, from the fetal DNA concentration in the maternal blood cell-free DNA of the sample, the Z value, and the degree of mosaicism, and judging from the new Z value whether the fetal chromosome of the sample has an aneuploidy abnormality; the degree of mosaicism is the ratio of abnormal fetal cells to all fetal cells.
It should be noted that the key point of the method of the present application is that three variables, namely the fetal DNA concentration, the traditional Z value, and the index introduced by the present application, the degree of mosaicism, are used to calculate a new Z value, $Z_{new}$, which remains a Z value of the kind commonly used and recognized in the field. Here, the "traditional Z value" is the Z value obtained by conventional methods, and the "new Z value" is the Z value calculated from the three variables of the present application. Because the degree of mosaicism is used as an input variable for calculating the new Z value, the accuracy of NIPT detection is improved, true positive and false positive samples are well discriminated, and false positives are reduced. The new Z value follows a normal distribution, so it meets current regulatory and clinical requirements; it also greatly reduces the fluctuation of the data distribution, which lowers the gray-zone rate and the retest rate and improves the stability of the detection result.
In one implementation of the application, calculating the new Z value of the sample to be tested from the fetal DNA concentration in the maternal blood cell-free DNA, the Z value and the degree of mosaicism comprises: inputting the fetal DNA concentration, the Z value and the degree of mosaicism into a fetal chromosomal aneuploidy abnormality detection model to obtain a model output value for the sample, and mapping the model output value to obtain the new Z value of the sample. The detection model is obtained by using a plurality of samples with known fetal chromosome status as training samples, the training samples including both positive and negative samples of fetal chromosomal aneuploidy abnormality, and training a machine learning model with the fetal DNA concentration, the Z value and the degree of mosaicism as inputs, so that the model output value combines these three variables to characterize the fetal chromosome status.
It is understood that the training of the machine learning model to obtain the new Z value is only one implementation of the present application, and does not exclude that other calculation methods may be used to obtain the new Z value of the present application by calculating the fetal DNA concentration, Z value, and chimerism.
In one implementation of the application, mapping the model output value to obtain the new Z value of the sample to be tested comprises calculating the new Z value from the model output value of the sample, a positive threshold, a negative threshold, and the median of the model output values of all negative samples; the positive threshold is the threshold of the model output value corresponding to positive samples, and the negative threshold is the threshold of the model output value corresponding to negative samples.
It should be noted that the model output value, i.e., the value generated by the machine learning model, is the value output by the fetal chromosomal aneuploidy abnormality detection model for evaluating fetal chromosomal aneuploidy abnormality. Unlike the traditional Z value, thresholds for this value cannot be set on statistical grounds; they can only be set according to the characteristics of the training data. For example, the negative threshold is set so that no true positive sample in the training data is judged negative, which ensures that the model produces no false negatives; the positive threshold is set so that as many true positive samples as possible, and as few of the original false positive samples as possible, are judged positive, which reduces false positives and improves NIPT performance. The interval between the positive threshold and the negative threshold forms a gray zone. So that the model output value of a sample can be used directly to judge the fetal chromosomal aneuploidy status, the application further maps the model output value to a new Z value, $Z_{new}$. Test results show that the new Z value obtained by this mapping follows a normal distribution centered at 0, so that Z > 3 can still be used as the positive criterion and Z < 1.96 as the negative criterion.
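The threshold rules described above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of the application: the function name `choose_thresholds`, the toy scores, and in particular the rule used for the positive threshold (the lowest candidate cut that maximizes true positives called positive minus false positives called positive) are hypothetical. Only the rule for the negative threshold (no true positive may fall below it) and the use of the negative-sample median follow directly from the text.

```python
from statistics import median


def choose_thresholds(true_pos, false_pos, negatives):
    """Return (cut_n, cut_p, med) from training-set model output values.

    true_pos  : model outputs of confirmed positive samples
    false_pos : model outputs of samples that conventional NIPT called
                positive but follow-up showed to be negative
    negatives : model outputs of all negative training samples
    """
    # Negative threshold: a sample is called negative when LD < cut_n, so
    # setting cut_n to the smallest true-positive score guarantees that no
    # true positive in the training data is called negative.
    cut_n = min(true_pos)

    # Positive threshold (ASSUMED rule): among the observed scores, take the
    # lowest cut that maximises (true positives above cut) minus
    # (false positives above cut).
    def gain(cut):
        tp_called = sum(s > cut for s in true_pos)
        fp_called = sum(s > cut for s in false_pos)
        return tp_called - fp_called

    candidates = sorted(set(true_pos) | set(false_pos))
    cut_p = max(candidates, key=gain)
    cut_p = max(cut_p, cut_n)  # keep the thresholds ordered

    # Median of the negative samples' model outputs, used later for centring.
    med = median(negatives)
    return cut_n, cut_p, med


# Toy scores, invented for illustration only.
tp_scores = [2.5, 5.5, 6.1, 7.3]
fp_scores = [3.2, 3.9, 4.4]
neg_scores = [-1.0, -0.2, 0.1, 0.4, 1.1]
print(choose_thresholds(tp_scores, fp_scores, neg_scores))
```

With these toy scores the gray zone lies between the two returned thresholds, and the one low-scoring true positive lands in the gray zone rather than being called negative.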
In one implementation of the application, the median of the model output values of the negative samples is obtained by feeding all negative training samples back into the fetal chromosomal aneuploidy abnormality detection model and taking the median of the resulting model output values.
In one implementation of the present application, mapping the model output value to obtain the new Z value of the sample to be tested uses the following mapping:
when the model output value of the sample to be tested is greater than the positive threshold, $Z_{new} = LD - cut_p + 3$;
when the model output value of the sample to be tested is less than the positive threshold and greater than the negative threshold, [formula not reproduced in this text];
when the model output value of the sample to be tested is less than the negative threshold, [formula not reproduced in this text].
In the above formulas, $Z_{new}$ is the new Z value, $LD$ is the model output value of the sample to be tested, $cut_p$ is the positive threshold, $cut_n$ is the negative threshold, and $Med$ is the median of the model output values of all negative samples.
In the present application, the reasoning behind mapping the model output value to a new Z value is as follows:
1) When the model output value is greater than the positive threshold, the new Z value obtained by the conversion must be greater than 3, because clinical practice customarily uses Z > 3 as the threshold for calling a trisomy positive.
2) When the model output value is in the gray zone, the new Z value obtained by the conversion must lie between 1.96 and 3, because clinical practice takes Z ∈ [1.96, 3) as the gray-zone range.
3) When the model output value is less than the negative threshold, the new Z value obtained by the conversion must be less than 1.96, because clinical practice customarily uses Z < 1.96 as the threshold for calling a trisomy negative.
In addition, the application ensures that the median of the new Z values of the negative samples is 0, because the median of a standard normal distribution is 0; for this reason $Med$, the median of the model output values of the negative samples, is included in the formula. It can be seen from the mapping formula that when the model output value equals the median of the model output values of the negative samples, the new Z value is 0.
It should further be noted that the specific numerical values in the above formula are those obtained in one implementation of the present application; it will be appreciated that if the training samples change, the values in the mapping formula will change accordingly. The basic principle of obtaining the new Z value through the mapping formula, however, remains unchanged.
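The mapping idea can be illustrated with a short sketch. Only the positive branch ($Z_{new} = LD - cut_p + 3$) is taken from the text; the gray-zone and negative branches below are assumed linear interpolations, chosen solely because they satisfy the three stated constraints (gray zone mapped into [1.96, 3), negative region mapped below 1.96, and $Med$ mapped to 0). They are not claimed to be the formulas of the application, and the function name `map_to_new_z` and the handling of values exactly at a threshold are likewise assumptions.

```python
def map_to_new_z(ld, cut_p, cut_n, med):
    """Map a model output value LD to a new Z value (illustrative sketch)."""
    if not med < cut_n < cut_p:
        raise ValueError("expected med < cut_n < cut_p")

    if ld > cut_p:
        # Branch stated in the text: exceeds 3 whenever LD exceeds cut_p.
        return ld - cut_p + 3.0

    if ld >= cut_n:
        # ASSUMED: linear interpolation, 1.96 at cut_n rising to 3 at cut_p.
        # (A value exactly equal to cut_p maps to 3.0; the text does not say
        # how such boundary values are handled.)
        return 1.96 + (ld - cut_n) / (cut_p - cut_n) * (3.0 - 1.96)

    # ASSUMED: linear scaling, 0 at med rising to 1.96 at cut_n, so that the
    # median of the negative samples is centred at 0.
    return 1.96 * (ld - med) / (cut_n - med)


# Example with invented thresholds: cut_p = 4, cut_n = 2, med = 0.
for ld in (-1.0, 0.0, 1.0, 3.0, 5.0):
    print(ld, round(map_to_new_z(ld, 4.0, 2.0, 0.0), 3))
```

Because every branch is increasing in LD and the branches meet at the thresholds, the ordering of samples by model output value is preserved after the mapping.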
In one implementation of the application, judging from the new Z value whether the fetal chromosome of the sample to be tested has an aneuploidy abnormality comprises: a new Z value greater than 3 is judged positive, i.e., the fetal chromosome has an aneuploidy abnormality; a new Z value less than 1.96 is judged negative, i.e., the fetal chromosome is normal.
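The decision rule can be written as a small function. The cut-offs 3 and 1.96 come from the text; the function name `call_from_new_z`, the label strings, and the treatment of values exactly equal to 3 (placed in the gray zone here) are assumptions made for illustration.

```python
def call_from_new_z(z_new):
    """Classify a sample from its new Z value (illustrative sketch)."""
    if z_new > 3:
        return "positive"   # fetal chromosome aneuploidy abnormality
    if z_new < 1.96:
        return "negative"   # fetal chromosome normal
    return "gray zone"      # neither call can be made; retest needed


for z in (0.0, 1.96, 2.5, 3.4):
    print(z, call_from_new_z(z))
```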
In one implementation of the present application, the machine learning model is a Linear Discriminant Analysis (LDA) model.
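As a rough illustration of how a linear discriminant could combine the three inputs into one score, the sketch below fits a two-class Fisher discriminant with a pooled covariance matrix using only the Python standard library. The text names LDA but gives no fitting details, so everything here is an assumption: the function names, the toy feature rows (fetal DNA concentration, traditional Z value, degree of mosaicism), and the omission of an intercept term (which shifts all scores by a constant without changing their order). In practice a library implementation, for example scikit-learn's `LinearDiscriminantAnalysis`, would normally be preferred.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def scatter(rows, mu):
    """Sum of outer products of deviations from the class mean."""
    d = len(mu)
    s = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [x - m for x, m in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_lda(pos, neg):
    """Return weights w = S_pooled^{-1} (mu_pos - mu_neg)."""
    mu_p, mu_n = mean_vector(pos), mean_vector(neg)
    sp, sn = scatter(pos, mu_p), scatter(neg, mu_n)
    d = len(mu_p)
    dof = len(pos) + len(neg) - 2
    pooled = [[(sp[i][j] + sn[i][j]) / dof for j in range(d)] for i in range(d)]
    return solve(pooled, [p - q for p, q in zip(mu_p, mu_n)])


def ld_score(w, x):
    """Model output value (without intercept) for one sample."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Toy training rows: [fetal DNA concentration, Z value, degree of mosaicism].
pos = [[0.10, 6.0, 0.90], [0.12, 8.5, 1.00], [0.08, 5.0, 0.80], [0.15, 9.0, 0.95]]
neg = [[0.10, 0.5, 0.05], [0.12, -0.3, 0.00], [0.09, 1.2, 0.10], [0.14, 3.5, 0.20]]
w = fit_lda(pos, neg)
print([round(ld_score(w, x), 2) for x in pos + neg])
```

By construction the discriminant separates the two class means, so the mean positive sample always scores higher than the mean negative sample; how well individual samples separate depends on the data.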
In one implementation of the present application, the fetal abnormal cell is a cell containing a fetal chromosomal aneuploidy abnormality.
In one implementation of the present application, the fetal DNA concentration and Z value in the maternal blood free DNA are calculated from the high throughput sequencing data of the maternal blood free DNA.
In one implementation of the present application, the degree of mosaicism is calculated according to Formula 1:
Formula 1 [formula not reproduced in this text]
In Formula 1, $Mosaic_k$ is the degree of mosaicism of the kth chromosome, $fra_k$ is the relative fetal concentration of the kth chromosome, and $FF$ is the fetal DNA concentration.
$fra_k$ is calculated according to Formula 2:
Formula 2 [formula not reproduced in this text]
In Formula 2, $fra_k$ is the relative fetal concentration of the kth chromosome, and the two depth terms are the mean corrected depth of the kth chromosome and the mean corrected depth of all autosomes, respectively.
In Formula 1 and Formula 2, k takes values from 1 to 22.
A $Mosaic_k$ of 0 indicates that the kth chromosome of the fetus is normal; a $Mosaic_k$ of 1 indicates that the kth chromosome of the fetus is a complete trisomy; a $Mosaic_k$ between 0 and 1 indicates that the kth chromosome of the fetus is mosaic. In the present application, mosaicism of fetal chromosome k means that chromosome k is trisomic in some fetal cells and non-trisomic in the other fetal cells. In principle, at a fixed fetal DNA concentration, the trisomy signal in the peripheral blood of the pregnant woman is stronger if the fetus is a complete trisomy and weaker if the fetus is a mosaic trisomy. In addition, a low degree of mosaicism generally indicates a false positive caused by data fluctuation, whereas a high degree of mosaicism generally indicates a true positive.
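Formula 1 and Formula 2 are not preserved in this text, so the following sketch rests on two assumptions inferred from the variable definitions and from the interpretation of the values 0 and 1: that the degree of mosaicism is the ratio $fra_k / FF$, and that $fra_k$ is twice the relative excess of the chromosome's mean corrected depth over the autosomal mean (a fully trisomic fetus at fetal fraction $FF$ raises the depth by a factor of $1 + FF/2$). The function names are hypothetical, and the actual formulas of the application may differ.

```python
def relative_fetal_concentration(depth_k, depth_autosomes):
    """ASSUMED form of Formula 2: fra_k = 2 * (depth_k / depth_autosomes - 1)."""
    if depth_autosomes <= 0:
        raise ValueError("mean autosomal depth must be positive")
    return 2.0 * (depth_k / depth_autosomes - 1.0)


def mosaicism(depth_k, depth_autosomes, ff):
    """ASSUMED form of Formula 1: Mosaic_k = fra_k / FF."""
    if ff <= 0:
        raise ValueError("fetal DNA concentration must be positive")
    return relative_fetal_concentration(depth_k, depth_autosomes) / ff


# Invented depths at a fetal DNA concentration of 10%:
# no excess depth, half the full-trisomy excess, and the full-trisomy excess.
for depth_k in (100.0, 102.5, 105.0):
    print(depth_k, round(mosaicism(depth_k, 100.0, 0.10), 3))
```

Under these assumptions a chromosome with no excess depth gives 0, the full expected trisomy excess gives 1, and intermediate excess gives a value in between, matching the interpretation given in the text.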
In one implementation of the present application, the mean value of the corrected depths of each chromosome and the mean value of the corrected depths of all autosomes are obtained by high throughput sequencing data calculation of maternal blood free DNA.
In one implementation manner of the present application, the method for detecting fetal chromosomal aneuploidy abnormality of the present application includes the following steps:
a data acquisition step, comprising acquiring high-throughput sequencing data of the maternal blood cell-free DNA of the sample to be tested;
a data processing step, comprising calculating the fetal DNA concentration and the Z value from the acquired high-throughput sequencing data of the maternal blood cell-free DNA of the sample to be tested;
a mosaicism calculation step, comprising calculating the degree of mosaicism of each chromosome according to Formula 1;
a new Z value calculation step, comprising calculating the new Z value of the sample to be tested from the fetal DNA concentration in the maternal blood cell-free DNA of the sample, the Z value and the degree of mosaicism;
and a fetal chromosomal aneuploidy abnormality judging step, comprising judging from the new Z value whether the fetal chromosome of the sample to be tested has an aneuploidy abnormality.
It should be noted that the key point of the method for detecting fetal chromosomal aneuploidy abnormality in the present application is that a fetal chromosomal aneuploidy abnormality detection model jointly considers three variables, namely the fetal DNA concentration, the degree of mosaicism and the traditional Z value, to produce a model output value, and that this model output value is converted into a Z value of the kind commonly used and accepted in the field, i.e., the new Z value ($Z_{new}$). Here, the "traditional Z value" is the Z value obtained in the data processing step by conventional methods; it is called the "traditional Z value" to distinguish it from the Z value obtained by mapping the model output value, which is called the "new Z value". Incorporating the degree of mosaicism into the machine learning model improves the accuracy of NIPT detection, discriminates well between true positive and false positive samples, and reduces false positives. The new Z value follows a normal distribution, so it meets current regulatory and clinical requirements; it also greatly reduces the fluctuation of the data distribution, which lowers the gray-zone rate and the retest rate and improves the stability of the detection result.
The second aspect of the application discloses a method for constructing a fetal chromosome aneuploidy abnormality detection model, which comprises the steps of adopting a plurality of samples with known fetal chromosome conditions as training samples, wherein the training samples comprise positive samples and negative samples of fetal chromosome aneuploidy abnormality, taking the concentration of fetal DNA, a Z value and the degree of mosaicism as input, carrying out machine learning model training, obtaining a model output value which integrates three variables of the concentration of fetal DNA, the Z value and the degree of mosaicism to represent the fetal chromosome conditions, and obtaining the model, namely the fetal chromosome aneuploidy abnormality detection model.
It should be noted that the method for constructing the fetal chromosomal aneuploidy abnormality detection model of the present application is in fact the method by which the detection model used in the method for detecting fetal chromosomal aneuploidy abnormality of the present application is constructed. Therefore, for the calculation of the fetal DNA concentration, the Z value and the degree of mosaicism, reference may be made to the method for detecting fetal chromosomal aneuploidy abnormality of the present application, and the details are not repeated here.
A third aspect of the application discloses a device for detecting fetal chromosomal aneuploidy abnormality, which comprises a new Z value calculation module and a fetal chromosomal aneuploidy abnormality judging module. The new Z value calculation module calculates the new Z value of the sample to be tested from the fetal DNA concentration in the maternal blood cell-free DNA of the sample, the Z value and the degree of mosaicism; the degree of mosaicism is the ratio of abnormal fetal cells to all fetal cells. The fetal chromosomal aneuploidy abnormality judging module judges from the new Z value whether the fetal chromosome of the sample to be tested has an aneuploidy abnormality.
In one implementation of the present application, the new Z value calculation module further inputs the fetal DNA concentration, the Z value and the degree of mosaicism into the fetal chromosomal aneuploidy abnormality detection model to obtain a model output value for the sample to be tested, and maps the model output value to obtain the new Z value of the sample. The detection model is obtained by using a plurality of samples with known fetal chromosome status as training samples, the training samples including both positive and negative samples of fetal chromosomal aneuploidy abnormality, and training a machine learning model with the fetal DNA concentration, the Z value and the degree of mosaicism as inputs; the model output value combines these three variables to characterize the fetal chromosome status.
Therefore, in an implementation manner of the present application, the apparatus of the present application further includes a model training module, which uses a plurality of samples of known fetal chromosome conditions as training samples, where the training samples include positive samples and negative samples of fetal chromosome aneuploidy abnormalities, and performs machine learning model training by taking the fetal DNA concentration, the Z value, and the degree of mosaicism as inputs to obtain a model output value that integrates three variables, namely, the fetal DNA concentration, the Z value, and the degree of mosaicism to represent the fetal chromosome conditions, so as to obtain a model, that is, a fetal chromosome aneuploidy abnormality detection model. Preferably, the machine learning model is a linear discriminant analysis model.
In one implementation of the present application, the new Z value calculation module includes a model output value analysis sub-module and a Z value mapping sub-module. The model output value analysis sub-module inputs the fetal DNA concentration, the Z value and the degree of mosaicism of the sample to be tested into the fetal chromosomal aneuploidy abnormality detection model to obtain the model output value for the sample. The Z value mapping sub-module calculates the new Z value of the sample from the model output value of the sample, the positive threshold, the negative threshold, and the median of the model output values of all negative samples; the positive threshold is the threshold of the model output value corresponding to positive samples, and the negative threshold is the threshold of the model output value corresponding to negative samples.
In one implementation of the present application, the Z value mapping sub-module obtains the new Z value as follows:
when the model output value of the sample to be tested is greater than the positive threshold, $Z_{new} = LD - cut_p + 3$;
when the model output value of the sample to be tested is less than the positive threshold and greater than the negative threshold, [formula not reproduced in this text];
when the model output value of the sample to be tested is less than the negative threshold, [formula not reproduced in this text].
In the above formulas, $Z_{new}$ is the new Z value, $LD$ is the model output value of the sample to be tested, $cut_p$ is the positive threshold, $cut_n$ is the negative threshold, and $Med$ is the median of the model output values of all negative samples.
In one implementation of the application, the fetal chromosomal aneuploidy abnormality judging module judges from the new Z value whether the fetal chromosome of the sample to be tested has an aneuploidy abnormality as follows: a new Z value greater than 3 is judged positive, i.e., the fetal chromosome has an aneuploidy abnormality; a new Z value less than 1.96 is judged negative, i.e., the fetal chromosome is normal.
It should be noted that in the apparatus of the present application, the model training module may be used as required, for example, in the case where the fetal chromosome aneuploidy abnormality detection model, the positive threshold, the negative threshold and the median of the model output values of all negative samples have been obtained, other modules may directly call the model and the data; thus, it is not necessary to run the model training module every time a test is made. Of course, if the training samples change, e.g., the training samples are added, it is recommended to run the model training module to further refine the model and the various data.
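One way to let the other modules reuse a trained model without re-running the training module is to persist the trained quantities and load them on demand. The sketch below uses JSON from the Python standard library; the class name `TrainedParameters`, its field names, and the choice of JSON as the storage format are assumptions for illustration, not details given in the text.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class TrainedParameters:
    """Quantities produced once by training and reused at test time."""
    weights: list   # discriminant coefficients for (FF, Z, mosaicism)
    cut_p: float    # positive threshold on the model output value
    cut_n: float    # negative threshold on the model output value
    med: float      # median model output value of the negative samples


def save_parameters(params, path):
    """Write the trained parameters to a JSON file."""
    Path(path).write_text(json.dumps(asdict(params)), encoding="utf-8")


def load_parameters(path):
    """Read trained parameters back; no retraining is needed."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return TrainedParameters(**data)
```

If training samples are later added, the training module can be re-run and the file overwritten, and the detection modules pick up the refreshed values the next time they load it.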
It should be noted that the device for detecting fetal chromosomal aneuploidy abnormality of the present application in effect implements the method for detecting fetal chromosomal aneuploidy abnormality of the present application through its modules; therefore, for the specific definitions of the modules, reference may be made to that method. For example, the calculation of the fetal DNA concentration, the Z value and the degree of mosaicism, the specific calculation of $Z_{new}$, the linear discriminant analysis model, and how positive and negative judgments are made from $Z_{new}$ can all be found in the method for detecting fetal chromosomal aneuploidy abnormality of the present application.
A fourth aspect of the present application discloses an apparatus for detecting fetal chromosomal aneuploidy abnormality, the apparatus comprising a memory and a processor; the memory is configured to store a program; the processor is configured to execute the program stored in the memory so as to implement the method for detecting fetal chromosomal aneuploidy abnormality of the present application or the method for constructing the fetal chromosomal aneuploidy abnormality detection model of the present application.
It is to be understood that, when the apparatus of the present application implements the method for constructing the model for detecting fetal chromosomal aneuploidy abnormality of the present application by executing the program stored in the memory, the apparatus of the present application is actually an apparatus for model construction, and the model constructed by the apparatus can be used for detecting fetal chromosomal aneuploidy abnormality according to the method of the present application.
A fifth aspect of the present application discloses a computer-readable storage medium having stored therein a program executable by a processor to implement the method of detecting a fetal chromosomal aneuploidy abnormality of the present application or the method of constructing a model for detecting a fetal chromosomal aneuploidy abnormality of the present application.
It is to be understood that, when a program stored in a computer-readable storage medium of the present application is capable of being executed by a processor to implement the method for constructing a model for detecting fetal chromosomal aneuploidy abnormality of the present application, the computer-readable storage medium of the present application is actually a computer-readable storage medium for model construction, and the computer-readable storage medium can be directly used to implement the construction of a model for detecting fetal chromosomal aneuploidy abnormality, whereby the model obtained by the construction can be used for detecting fetal chromosomal aneuploidy abnormality according to the method of the present application.
Due to the adoption of the technical scheme, the beneficial effects of the application are as follows:
According to the method and device for detecting fetal chromosomal aneuploidy abnormality of the application, the degree of mosaicism is incorporated into fetal chromosomal aneuploidy abnormality detection for the first time, and a new Z value is calculated by jointly considering three variables: the fetal DNA concentration, the degree of mosaicism and the traditional Z value. The method and device improve the accuracy of NIPT detection, discriminate well between true positive and false positive samples, and reduce false positives. In addition, the new Z value follows a normal distribution and meets current regulatory and clinical requirements; it also reduces the fluctuation of the data distribution, which lowers the gray-zone rate and the retest rate and improves the stability of the detection result.
Drawings
FIG. 1 is a block flow diagram of a method for detecting fetal chromosomal aneuploidy abnormalities in an embodiment of the present application;
FIG. 2 is a block diagram of an apparatus for detecting fetal chromosomal aneuploidy abnormality according to an embodiment of the present application;
FIG. 3 is a T13 chimerism analysis chart of 10240 samples in the example of the present application;
FIG. 4 is a Q-Q graph showing new Z values of chromosome 21 of 10000 samples in the example of the present application;
FIG. 5 is a distribution diagram of the traditional Z values and new Z values of chromosome 13 for 10000 samples in the example of the present application.
Detailed Description
The present application will be described in further detail below with reference to the accompanying drawings by way of specific embodiments. In the following description, numerous details are set forth in order to provide a better understanding of the present application. However, one skilled in the art will readily recognize that some of the features may be omitted, or replaced with other devices, materials, or methods, in different circumstances. In some instances, certain operations related to the present application are not shown or described in this specification, so that the core of the present application is not obscured by unnecessary detail; a detailed description of these operations is not necessary for those skilled in the art, who can fully understand them from the description in the specification and the general technical knowledge of the art.
The present application creatively uses the degree of mosaicism as a variable for calculating a new Z value, thereby improving the accuracy of NIPT detection. Accordingly, the application provides a method for detecting fetal chromosomal aneuploidy abnormality, which comprises calculating a new Z value of a sample to be tested from the fetal DNA concentration in the maternal blood cell-free DNA of the sample, the Z value and the degree of mosaicism, and judging from the new Z value whether the fetal chromosome of the sample has an aneuploidy abnormality; the degree of mosaicism is the ratio of abnormal fetal cells to all fetal cells.
In one implementation of the present application, the method for detecting an aneuploidy abnormality of a fetal chromosome specifically includes, as shown in fig. 1, a data acquisition step 11, a data processing step 12, a chimerism calculation step 13, a new Z value calculation step 14, and a fetal chromosome aneuploidy abnormality determination step 15.
The data acquisition step 11 comprises acquiring high-throughput sequencing data of the blood cell-free DNA of the pregnant woman to be tested. For example, in one implementation of the present application, the sequencer output data is a fastq-format file generated by the sequencer.
The data processing step 12 comprises calculating, from the acquired high-throughput sequencing data of the blood cell-free DNA of the pregnant woman to be tested, the fetal DNA concentration, the Z value, the mean corrected depth of each chromosome, and the mean corrected depth of all autosomes. In one implementation of the present application, this step consists of the general operations of a conventional NIPT procedure, specifically as follows:
A) Sequence alignment and filtering: the sequence information contained in the fastq-format file generated by the sequencer is aligned to a human reference genome such as GRCh37/hg19 using open-source software such as BWA (0.7.7-r441); sequences with poor alignment quality, multiply aligned sequences, duplicate sequences, and imperfectly aligned sequences are removed by filtering, leaving only the uniquely aligned sequences; and information such as the coordinates of each uniquely aligned sequence is stored in a bam-format file.
B) Window depth calculation: the human reference genome is divided into windows of about 60 kb, and the number of uniquely aligned sequences in each 60 kb window is counted as the raw depth of that window, namely the window depth. GC correction and inter-sample correction are then applied to the raw depth of each window to obtain the corrected depth (namely UR) of each window; the corrected depths of all windows on the kth chromosome are averaged to obtain the mean corrected depth of the kth chromosome, and the mean corrected depth of all autosomes is calculated in the same way.
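The window counting in B) can be sketched as follows. This is a minimal illustration, assuming the read start coordinates of the uniquely aligned sequences are already available as 0-based integers; the function names `window_depths` and `mean_depth` are hypothetical, and GC correction and inter-sample correction are omitted because the text does not specify how they are performed.

```python
from collections import Counter
from statistics import mean

WINDOW = 60_000  # window size in bp ("about 60 kb" in the text)


def window_depths(read_starts, chrom_length, window=WINDOW):
    """Count uniquely aligned reads per window (raw window depth)."""
    n_windows = -(-chrom_length // window)  # ceiling division
    counts = Counter(pos // window for pos in read_starts)
    return [counts.get(i, 0) for i in range(n_windows)]


def mean_depth(depths):
    """Average depth over all windows of one chromosome."""
    return mean(depths)


# Toy chromosome of 180 kb with four reads (hypothetical data).
depths = window_depths([0, 10, 60_000, 179_999], chrom_length=180_000)
print(depths, mean_depth(depths))
```

In a real pipeline the raw depths would be corrected before averaging; the sketch averages raw counts only to show the data flow.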
C) Fetal DNA concentration calculation: the present application uses different calculation methods for male and female fetuses, as follows:
The male fetal concentration is calculated as follows:
The fetal concentration of a male fetus is determined from the proportion of the Y chromosome: the mean UR of the Y chromosome windows is divided by the mean UR of the autosome windows and multiplied by 2 to give the male fetal concentration FF:
$$FF = 2 \times \frac{\overline{UR}_Y}{\overline{UR}_A}$$
where $\overline{UR}_Y$ is the mean UR of the Y chromosome windows and $\overline{UR}_A$ is the mean UR of the autosome windows.
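The male-fetus calculation described above can be written in a few lines. This is a minimal sketch; the function name `male_fetal_fraction` and the toy UR values are hypothetical, and the inputs are assumed to be already-corrected window depths.

```python
from statistics import mean


def male_fetal_fraction(ur_y_windows, ur_autosome_windows):
    """FF = 2 * mean(UR of Y windows) / mean(UR of autosome windows)."""
    return 2 * mean(ur_y_windows) / mean(ur_autosome_windows)


# Hypothetical corrected depths: Y windows average 0.06, autosomes average 1.0.
print(male_fetal_fraction([0.05, 0.07], [1.0, 1.0]))
```

The factor of 2 reflects that a male fetus carries one Y chromosome against two copies of each autosome.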
The female fetal concentration is calculated as follows:
The fetal concentration of a female fetus is estimated by building a high-dimensional regression model that exploits the non-uniform distribution of fetal cell-free DNA across the genome. The underlying assumption is that, whether the fetus is male or female, fetal cfDNA and maternal cfDNA are distributed differently across the genome. Therefore, the fetal concentration estimated for male fetuses by the Y chromosome method is used as training data for the model, and the regression model is constructed by a neural network machine learning method, specifically as follows:
$$a_j^{l} = f\left(\sum_k w_{jk}^{l}\, a_k^{l-1} + b_j^{l}\right)$$
where $l$ is the index of the network layer; the first layer is the input layer, the last layer is the output layer (a single neuron), and the layers in between are hidden layers. $a_j^{l}$ is the value of the jth neuron in layer $l$, $a_k^{l-1}$ is the value of the kth neuron in layer $l-1$, $w_{jk}^{l}$ is the connection weight from the kth neuron in layer $l-1$ to the jth neuron in layer $l$, and $b_j^{l}$ is the input bias of the jth neuron in layer $l$. The most common form of the function $f$ is the rectified linear unit, i.e. $f(x) = \max(0, x)$. $w$ and $b$ are obtained during model training. When the model is applied, the neuron values are calculated layer by layer according to the formula, and the value of the neuron in the last layer is the fetal concentration predicted by the model.
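The layer-by-layer evaluation described above can be sketched in plain Python. This is an illustration only: the network shape, weights, and biases below are hypothetical, and the activation is applied on every layer, including the output layer, simply because the text gives a single formula for all layers.

```python
def relu(x):
    """Rectified linear unit, f(x) = max(0, x)."""
    return max(0.0, x)


def forward(inputs, layers):
    """Evaluate a fully connected network layer by layer.

    `layers` is a list of (weights, biases) pairs; weights[j][k] is the
    connection weight from neuron k of the previous layer to neuron j.
    The last layer is assumed to hold a single neuron.
    """
    values = list(inputs)
    for weights, biases in layers:
        values = [
            relu(sum(w * a for w, a in zip(row, values)) + b)
            for row, b in zip(weights, biases)
        ]
    return values[0]


# Hypothetical 2-2-1 network.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # hidden layer
    ([[1.0, 1.0]], [0.1]),                    # output layer (one neuron)
]
print(forward([2.0, 1.0], layers))
```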
D) Traditional Z value calculation: the depths of all windows on a given chromosome follow a normal distribution, so the Z value of the chromosome to be tested can be calculated against the window depth distribution of a reference chromosome, and this Z value serves as the basis for judging whether the chromosome is trisomic.
Specifically, the traditional Z value is calculated as follows:
The corrected depth UR of each autosomal window follows a Poisson distribution, which approximates a normal distribution when the number of windows is large. For a normal sample, the UR distribution of the chromosome to be tested does not differ significantly from that of a reference chromosome; for an abnormal sample, the two distributions differ significantly. Fetal chromosomal aneuploidy abnormality can therefore be judged with a Z test, specifically as follows:
wherein:
$\overline{UR}_i$: the mean UR of chromosome i;
$\overline{UR}_j$: the mean UR of chromosome j;
$SD_i$: the standard deviation of UR of chromosome i;
$SD_j$: the standard deviation of UR of chromosome j;
$L_i$: the number of windows into which chromosome i is divided;
$L_j$: the number of windows into which chromosome j is divided;
$Z_i$: the significance of aneuploidy of chromosome i, reflecting its difference from euploidy.
The above formula compares the 22 autosomes within the same sample against one another; the underlying assumption is that most chromosomes in a sample are normal diploids. The target chromosome is therefore compared with each of the other 21 autosomes, giving 21 Z tests. If the target chromosome is a normal diploid, most of the 21 Z test values are close to 0, and their average gives a Z value indicating a negative result; if the target chromosome is trisomic, most of the 21 Z test values are far greater than 0, and their average gives a Z value indicating a positive result.
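The pairwise comparison and averaging can be sketched as follows. The exact formula is not reproduced in the text, so this sketch assumes the standard two-sample Z statistic built from the listed symbols, $Z_{ij} = (\overline{UR}_i - \overline{UR}_j) / \sqrt{SD_i^2/L_i + SD_j^2/L_j}$, with $Z_i$ taken as the mean of $Z_{ij}$ over the other chromosomes. The function names and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def pairwise_z(ur_i, ur_j):
    """Two-sample Z statistic between the window depths of two chromosomes
    (assumed form; see the lead-in)."""
    se = sqrt(stdev(ur_i) ** 2 / len(ur_i) + stdev(ur_j) ** 2 / len(ur_j))
    return (mean(ur_i) - mean(ur_j)) / se


def chromosome_z(target, ur_by_chrom):
    """Average the Z tests of the target chromosome against all others."""
    return mean(
        pairwise_z(ur_by_chrom[target], ur)
        for chrom, ur in ur_by_chrom.items()
        if chrom != target
    )


# Hypothetical corrected depths for three chromosomes, four windows each.
ur = {
    "chr1": [1.0, 1.1, 0.9, 1.0],
    "chr2": [1.0, 0.9, 1.1, 1.0],
    "chr21": [1.2, 1.3, 1.1, 1.2],
}
print(chromosome_z("chr21", ur))
```

With real data the dictionary would hold all 22 autosomes, so each target is averaged over 21 comparisons.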
The mosaicism calculation step 13 comprises calculating the degree of mosaicism of each chromosome from the fetal DNA concentration.
For example, the degree of mosaicism of each chromosome is calculated according to formula one:
Formula one
In formula one, $Mosaic_k$ is the degree of mosaicism of the kth chromosome, $fra_k$ is the relative fetal concentration of the kth chromosome, and FF is the fetal DNA concentration; $fra_k$ is calculated by formula two:
Formula two
In formula two, $fra_k$ is the relative fetal concentration of the kth chromosome, $\overline{UR}_k$ is the mean corrected depth of the kth chromosome, and $\overline{UR}_A$ is the mean corrected depth of all autosomes; in formulas one and two, k takes values from 1 to 22.
$Mosaic_k = 0$ indicates that chromosome k is normal; $Mosaic_k = 1$ indicates that chromosome k of the fetus is completely trisomic; a value of $Mosaic_k$ between 0 and 1 indicates that chromosome k of the fetus is mosaic.
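Formulas one and two are not reproduced in the text, so the sketch below rests on an assumed form that is consistent with the definitions and with the stated endpoints (0 for a normal chromosome, 1 for complete trisomy): $fra_k = 2(\overline{UR}_k - \overline{UR}_A)/\overline{UR}_A$ and $Mosaic_k = fra_k / FF$. The reasoning is that a fully trisomic fetus at fetal concentration FF raises the depth of chromosome k by a factor of $1 + FF/2$. Both formulas and both function names are assumptions made for illustration.

```python
def relative_fetal_concentration(ur_k, ur_autosomes):
    """Assumed formula two: apparent fetal fraction implied by the excess
    depth of chromosome k over the autosomal mean."""
    return 2 * (ur_k - ur_autosomes) / ur_autosomes


def mosaicism(ur_k, ur_autosomes, ff):
    """Assumed formula one: relative fetal concentration divided by FF."""
    return relative_fetal_concentration(ur_k, ur_autosomes) / ff


# Hypothetical sample with FF = 0.10.
print(mosaicism(1.05, 1.0, 0.10))   # full trisomy raises depth by FF/2
print(mosaicism(1.025, 1.0, 0.10))  # half the excess depth
print(mosaicism(1.0, 1.0, 0.10))    # no excess depth
```

Under this assumed form, small negative values can occur for normal chromosomes through data fluctuation, which matches the slightly negative values shown for true negative samples in Table 2.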
It should be noted that calculating the degree of mosaicism and incorporating it into the detection of fetal chromosomal aneuploidy abnormality is one of the innovative improvements of the present application. Studies have shown that fetal trisomy is not always complete trisomy: in some cases only a portion of the fetal cells are trisomic while the remainder are not, a condition called mosaicism (chimerism). The NIPT result is affected by fetal mosaicism. For example, at a fixed fetal DNA concentration, a fetus with complete trisomy produces a strong trisomy signal in maternal peripheral blood, whereas a fetus with mosaic trisomy produces a weaker signal. Because NIPT involves multiple steps, including plasma collection, storage, transport, cfDNA isolation, pooling, and on-machine sequencing, a small fluctuation in any one step can cause the final result to fluctuate; for a negative sample, such data fluctuation may produce a weak trisomy signal resembling low-level mosaicism. The inventors of the present application therefore propose describing the degree of mosaicism quantitatively, so as to clarify the difference between the degree of mosaicism of a true positive sample and that of a weak trisomy signal caused by data fluctuation, and thereby better distinguish true positive samples from false positive samples.
The new Z value calculation step 14 comprises calculating a new Z value of the sample to be tested from the fetal DNA concentration in the maternal blood cell-free DNA, the Z value, and the degree of mosaicism of the sample; wherein the degree of mosaicism is the ratio of abnormal fetal cells to all fetal cells.
For example, the new Z value calculation step 14 is divided into a model output value analysis sub-step and a Z value mapping sub-step.
The model output value analysis sub-step comprises inputting the fetal DNA concentration, the traditional Z value, and the degree of mosaicism of the sample to be tested into the fetal chromosomal aneuploidy abnormality detection model to obtain the model output value corresponding to the sample. The detection model is obtained by machine learning model training, using a plurality of samples with known fetal chromosomal aneuploidy status as training samples, with the fetal DNA concentration, the traditional Z value, and the degree of mosaicism as inputs and the model output value as output.
It should be noted that machine learning model training is another innovative improvement of the present application. Before training the model, the present application found that the three variables, namely fetal DNA concentration, degree of mosaicism, and traditional Z value, have a very good linear relationship with one another; the three variables are therefore fed into an LDA (linear discriminant analysis) model for training, and the trained model is the fetal chromosomal aneuploidy abnormality detection model.
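How such coefficients might be obtained can be sketched with a two-class Fisher linear discriminant, which computes the weight vector as the within-class scatter matrix inverted and applied to the difference of class means. This is a standard textbook formulation and is an assumption here; the text does not state which LDA implementation or settings were used. The function names and most of the sample values are hypothetical (a few rows are taken from Table 2).

```python
from statistics import mean


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with
    partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_lda(positives, negatives):
    """Return Fisher discriminant coefficients, one per input variable."""
    dim = len(positives[0])
    mu_p = [mean(s[d] for s in positives) for d in range(dim)]
    mu_n = [mean(s[d] for s in negatives) for d in range(dim)]
    scatter = [[0.0] * dim for _ in range(dim)]
    for samples, mu in ((positives, mu_p), (negatives, mu_n)):
        for s in samples:
            diff = [s[d] - mu[d] for d in range(dim)]
            for i in range(dim):
                for j in range(dim):
                    scatter[i][j] += diff[i] * diff[j]
    return solve(scatter, [p - q for p, q in zip(mu_p, mu_n)])


# Each sample: (fetal concentration, traditional Z value, degree of mosaicism).
positives = [(0.15, 14.5, 0.90), (0.12, 10.1, 0.89),
             (0.20, 12.0, 0.95), (0.10, 9.0, 0.80)]
negatives = [(0.29, 4.8, 0.17), (0.39, 6.8, 0.16),
             (0.23, -0.8, -0.04), (0.12, -0.6, -0.05)]
print(fit_lda(positives, negatives))
```

A production analysis would more likely rely on an established statistics library; the hand-written solver is used here only to keep the sketch self-contained.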
The general form of the LDA model is as follows:
$$LD = w_1 a_1 + w_2 a_2 + \dots + w_k a_k$$
where $w_k$ is a coefficient obtained through model training, and $a_k$ is a variable, namely the sample information input to the model, which in the present application is the fetal concentration, the traditional Z value, and the degree of mosaicism. Thus, training the model yields the coefficients of the three variables; from these three coefficients and the sample's fetal concentration, traditional Z value, and degree of mosaicism, the result of the machine learning model, namely the model output value (LD value), is obtained through the above formula.
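Applying the trained model is then a weighted sum. In the minimal sketch below, the coefficient values are purely hypothetical placeholders (the text does not disclose the trained coefficients), and the sample values are those of "true positive 1" in Table 2.

```python
def ld_score(features, coefficients):
    """LD = w1*a1 + w2*a2 + ... + wk*ak."""
    return sum(w * a for w, a in zip(coefficients, features))


# Hypothetical coefficients for (fetal concentration, traditional Z value,
# degree of mosaicism); NOT the values trained in the application.
coefficients = (1.0, 0.5, 2.0)
sample = (0.149, 14.501, 0.900)
print(ld_score(sample, coefficients))
```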
The Z value mapping sub-step comprises calculating a new Z value of the sample to be tested, denoted $Z_{new}$, from the model output value of the sample, the positive threshold, the negative threshold, and the median of the model output values of all negative samples.
It should be noted that the result produced by the machine learning model no longer follows a distribution with statistical meaning, so thresholds cannot be set by statistical significance as with the traditional Z value; they can only be set from the characteristics of the training data. The negative threshold is defined so that no true positive sample in the training data is judged negative, so that the model produces no false negatives on the training data. The positive threshold is defined so that as many true positive samples as possible are judged positive while as few of the original false positive samples as possible are judged positive, thereby reducing false positives and improving NIPT detection performance. The region between the positive and negative thresholds is the gray zone.
Although the result produced by the machine learning model no longer follows a distribution with statistical meaning, clinical practice and regulatory requirements dictate that NIPT trisomy results be reported as a Z value, with 3 as the positive threshold. Converting the machine learning result, which lacks statistical meaning, into a Z value that has it is the third innovative improvement of the present application. The machine learning model used in one implementation of the application is a linear model, so its output retains the distribution characteristics of the traditional Z value. The application therefore maps the model output value to a new Z value, which improves NIPT detection performance while giving the final result a distribution similar to that of a Z value, that is, a normal distribution centered at 0.
In one implementation of the present application, the specific mapping method is as follows:
when the model output value of the sample to be tested is greater than the positive threshold, $Z_{new} = LD - cut_p + 3$;
When the model output value of the sample to be detected is smaller than the positive threshold value and larger than the negative threshold value,
when the model output value of the sample to be tested is less than the negative threshold value,
In the above formulas, $Z_{new}$ is the new Z value, LD is the model output value of the sample to be tested, $cut_p$ is the positive threshold, $cut_n$ is the negative threshold, and Med is the median of the model output values of the negative samples.
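A piecewise mapping of this kind can be sketched as follows. Only the first branch (LD above the positive threshold) is taken from the text. The other two branches are assumptions: they use linear interpolation chosen so that Med maps to 0, $cut_n$ maps to 1.96, and $cut_p$ maps to 3, which is one mapping consistent with the thresholds and the zero-centered distribution described in the text, not necessarily the one used in the application. The function name `map_to_z` is hypothetical.

```python
def map_to_z(ld, cut_p, cut_n, med):
    """Map a model output value LD to a new Z value.

    Assumes med < cut_n < cut_p. Only the first branch is stated in the
    text; the other two are illustrative linear interpolations.
    """
    if ld > cut_p:
        # From the text: Z_new = LD - cut_p + 3
        return ld - cut_p + 3.0
    if ld > cut_n:
        # Assumption: interpolate between (cut_n, 1.96) and (cut_p, 3)
        return 1.96 + (ld - cut_n) * (3.0 - 1.96) / (cut_p - cut_n)
    # Assumption: scale so that med maps to 0 and cut_n maps to 1.96
    return (ld - med) * 1.96 / (cut_n - med)


# Hypothetical thresholds and median.
for ld in (6.0, 5.0, 3.5, 2.0, 0.0, -1.0):
    print(ld, map_to_z(ld, cut_p=5.0, cut_n=2.0, med=0.0))
```

The branches meet at the thresholds, so the mapping is continuous and monotonic under the stated ordering of Med, $cut_n$, and $cut_p$.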
The fetal chromosomal aneuploidy abnormality determination step 15 comprises determining, from the new Z value, whether the chromosome of the fetus to be tested has an aneuploidy abnormality.
In one implementation of the present application, the new Z value obtained by piecewise mapping also follows a normal distribution centered at 0; therefore, Z > 3 can still be used as the positive criterion and Z < 1.96 as the negative criterion.
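The determination rule can be expressed directly. In this minimal sketch the cutoffs 3 and 1.96 come from the text, and values between them are reported as the gray zone; the function name `classify` and the label strings are hypothetical.

```python
def classify(z_new, positive_cutoff=3.0, negative_cutoff=1.96):
    """Classify a new Z value as positive, negative, or gray zone."""
    if z_new > positive_cutoff:
        return "positive"
    if z_new < negative_cutoff:
        return "negative"
    return "gray zone"


for z in (4.2, 2.5, 0.3):
    print(z, classify(z))
```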
The fetal chromosomal aneuploidy abnormality detection model is obtained as follows: a plurality of samples with known fetal chromosome status are used as training samples, and a machine learning model is trained with the fetal DNA concentration, the Z value, and the degree of mosaicism as inputs, yielding a model output value that combines the three variables to characterize the fetal chromosome status; the resulting model is the fetal chromosomal aneuploidy abnormality detection model. The fetal DNA concentration, the Z value, and the degree of mosaicism are calculated as described above for the method of the present application and are not described again here.
Those skilled in the art will appreciate that all or part of the functions of the above-described methods may be implemented by hardware, or may be implemented by computer programs. When all or part of the functions of the above method are implemented by means of a computer program, the program may be stored in a computer-readable storage medium, and the storage medium may include: a read only memory, a random access memory, a magnetic disk, an optical disk, a hard disk, etc., and the program is executed by a computer to realize the above functions. For example, the program may be stored in a memory of the device, and when the program in the memory is executed by the processor, all or part of the functions described above can be implemented. In addition, when all or part of the functions in the above embodiments are implemented by a computer program, the program may be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a portable hard disk, and may be downloaded or copied to a memory of a local device, or may be version-updated in a system of the local device, and when the program in the memory is executed by a processor, all or part of the functions in the above methods may be implemented.
Therefore, based on the method for detecting fetal chromosomal aneuploidy abnormality, the application provides a device for detecting fetal chromosomal aneuploidy abnormality, which comprises a new Z value calculation module and a fetal chromosomal aneuploidy abnormality determination module. The new Z value calculation module calculates a new Z value of the sample to be tested from the fetal DNA concentration in the maternal blood cell-free DNA, the Z value, and the degree of mosaicism of the sample; the degree of mosaicism is the ratio of abnormal fetal cells to all fetal cells. The fetal chromosomal aneuploidy abnormality determination module determines, from the new Z value, whether the fetal chromosome of the sample to be tested has an aneuploidy abnormality.
In one implementation of the present application, the apparatus for detecting an abnormal fetal chromosomal aneuploidy includes, as shown in fig. 2, a data obtaining module 21, a data processing module 22, a chimeric degree calculating module 23, a model training module 24, a new Z value calculating module 25, and a fetal chromosomal aneuploidy abnormality determining module 26.
The data acquisition module 21 is used for acquiring high-throughput sequencing data of the blood cell-free DNA of the pregnant woman to be tested, for example, a fastq-format file generated by a sequencer.
The data processing module 22 is used for calculating, from the acquired high-throughput sequencing data of the blood cell-free DNA of the pregnant woman to be tested, the fetal DNA concentration, the traditional Z value, the mean corrected depth of each chromosome, and the mean corrected depth of all autosomes. For example, these quantities are calculated with reference to existing conventional NIPT procedures.
The mosaicism calculation module 23 is used for calculating the degree of mosaicism of each chromosome from the fetal DNA concentration.
For example, the degree of mosaicism of each chromosome is calculated according to formula one:
Formula one
In formula one, $Mosaic_k$ is the degree of mosaicism of the kth chromosome, $fra_k$ is the relative fetal concentration of the kth chromosome, and FF is the fetal DNA concentration; $fra_k$ is calculated by formula two:
Formula two
In formula two, $fra_k$ is the relative fetal concentration of the kth chromosome, $\overline{UR}_k$ is the mean corrected depth of the kth chromosome, and $\overline{UR}_A$ is the mean corrected depth of all autosomes; in formulas one and two, k takes values from 1 to 22.
$Mosaic_k = 0$ indicates that chromosome k is normal; $Mosaic_k = 1$ indicates that chromosome k of the fetus is completely trisomic; a value of $Mosaic_k$ between 0 and 1 indicates that chromosome k of the fetus is mosaic.
The model training module 24 is configured to use a plurality of samples of known fetal chromosome conditions as training samples, where the training samples include positive samples and negative samples of fetal chromosome aneuploidy abnormalities, and perform machine learning model training by using fetal DNA concentration, Z value, and degree of mosaicism as inputs to obtain a model output value that represents fetal chromosome conditions by integrating three variables of fetal DNA concentration, Z value, and degree of mosaicism, and the obtained model is a fetal chromosome aneuploidy abnormality detection model; after model training, the corresponding positive threshold value is obtained by using the positive sample, the corresponding negative threshold value is obtained by using the negative sample, and the median is obtained by using all the negative sample model output values.
The new Z value calculation module 25 is used for calculating a new Z value of the sample to be tested from the fetal DNA concentration in the maternal blood cell-free DNA, the Z value, and the degree of mosaicism of the sample; wherein the degree of mosaicism is the ratio of abnormal fetal cells to all fetal cells.
For example, the new Z value calculation module 25 includes a model output value analysis sub-module and a Z value mapping sub-module. The model output value analysis sub-module is used for inputting the fetal DNA concentration, the Z value, and the degree of mosaicism of the sample to be tested into the fetal chromosomal aneuploidy abnormality detection model to obtain the model output value corresponding to the sample. The Z value mapping sub-module is used for calculating a new Z value of the sample to be tested from the model output value of the sample, the positive threshold, the negative threshold, and the median of the model output values of all negative samples. The positive threshold is the threshold of the model output value corresponding to positive samples, and the negative threshold is the threshold of the model output value corresponding to negative samples.
The fetal chromosome aneuploidy abnormality determining module 26 is configured to determine whether the chromosome of the fetus to be detected has aneuploidy abnormality according to the new Z value. For example, a new Z value greater than 3 is determined to be positive, i.e., a fetal chromosomal aneuploidy abnormality; a new Z value less than 1.96 is judged as negative, namely, the fetal chromosome is normal.
Another implementation of the present application provides an apparatus for detecting fetal chromosomal aneuploidy abnormality, the apparatus comprising a memory and a processor; the memory is used for storing a program; the processor is used for executing the program stored in the memory to implement the following method: calculating a new Z value of the sample to be tested from the fetal DNA concentration in the maternal blood cell-free DNA, the Z value, and the degree of mosaicism of the sample, and determining from the new Z value whether the fetal chromosome of the sample has an aneuploidy abnormality; wherein the degree of mosaicism is the ratio of abnormal fetal cells to all fetal cells. Or, specifically, to implement the following method: a data acquisition step, comprising acquiring high-throughput sequencing data of the blood cell-free DNA of the pregnant woman to be tested; a data processing step, comprising calculating the fetal DNA concentration and the traditional Z value from the acquired sequencing data; a mosaicism calculation step, comprising calculating the degree of mosaicism of each chromosome from the fetal DNA concentration; a model output value analysis step, comprising inputting the fetal DNA concentration, the traditional Z value, and the degree of mosaicism of the sample to be tested into the fetal chromosomal aneuploidy abnormality detection model to obtain the model output value corresponding to the sample; a Z value mapping step, comprising calculating a new Z value from the model output value of the sample, the positive threshold, the negative threshold, and the median of the model output values of the negative samples; and a fetal chromosomal aneuploidy abnormality determination step, comprising determining, from the new Z value, whether the chromosome of the fetus to be tested has an aneuploidy abnormality.
Alternatively, the apparatus comprises a memory and a processor; the memory is used for storing a program; the processor is used for executing the program stored in the memory to implement the following method: using a plurality of samples with known fetal chromosome status as training samples, the training samples including positive samples and negative samples of fetal chromosomal aneuploidy abnormality, and training a machine learning model with the fetal DNA concentration, the Z value, and the degree of mosaicism as inputs, to obtain a model output value that combines the three variables to characterize the fetal chromosome status; the resulting model is the fetal chromosomal aneuploidy abnormality detection model.
Another implementation provides a computer-readable storage medium, the storage medium including a program that can be executed by a processor to implement the following method: calculating a new Z value of the sample to be tested from the fetal DNA concentration in the maternal blood cell-free DNA, the Z value, and the degree of mosaicism of the sample, and determining from the new Z value whether the fetal chromosome of the sample has an aneuploidy abnormality; wherein the degree of mosaicism is the ratio of abnormal fetal cells to all fetal cells. Or, specifically, to implement the following method: a data acquisition step, comprising acquiring high-throughput sequencing data of the blood cell-free DNA of the pregnant woman to be tested; a data processing step, comprising calculating the fetal DNA concentration and the traditional Z value from the acquired sequencing data; a mosaicism calculation step, comprising calculating the degree of mosaicism of each chromosome from the fetal DNA concentration; a model output value analysis step, comprising inputting the fetal DNA concentration, the traditional Z value, and the degree of mosaicism of the sample to be tested into the fetal chromosomal aneuploidy abnormality detection model to obtain the model output value corresponding to the sample; a Z value mapping step, comprising calculating a new Z value from the model output value of the sample, the positive threshold, the negative threshold, and the median of the model output values of the negative samples; and a fetal chromosomal aneuploidy abnormality determination step, comprising determining, from the new Z value, whether the chromosome of the fetus to be tested has an aneuploidy abnormality.
Alternatively, the storage medium includes a program that can be executed by a processor to implement the following method: using a plurality of samples with known fetal chromosome status as training samples, the training samples including positive samples and negative samples of fetal chromosomal aneuploidy abnormality, and training a machine learning model with the fetal DNA concentration, the Z value, and the degree of mosaicism as inputs, to obtain a model output value that combines the three variables to characterize the fetal chromosome status; the resulting model is the fetal chromosomal aneuploidy abnormality detection model.
The method and the device of the application are different from the prior art in that:
(1) The application creates a new detection index, the degree of mosaicism; studies show that it discriminates well between the true positive and false positive samples among those currently reported by the Z value (i.e., traditional Z value) method alone.
(2) The method combines three variables: the fetal concentration, the degree of mosaicism (the index unique to this application), and the traditional Z value. In one implementation, linear discriminant analysis (LDA) is selected as the machine learning model for training and result determination. Studies found that the model's determinations reduce false positives relative to the original results, improving detection performance.
(3) The three variables used in the method are linearly related to one another; the relationship is simple and clear, which avoids the complexity that arises from having too many variables with differing dimensions and distribution characteristics. In one implementation, a linear discriminant analysis (LDA) model is used for the analysis; the model is simple and is not subject to the problem of overfitting.
(4) The application develops a new Z value conversion method that converts the value produced by machine learning, which has no statistical meaning, into a Z value that is familiar in clinical practice and meets regulatory requirements, namely the new Z value of the application. The new Z value obtained in this way follows a normal distribution and can meet current regulatory and clinical requirements.
(5) Comparing the new Z value of the application with the traditional Z value shows that the new Z value greatly reduces fluctuation in the data distribution, reduces gray-zone results and retests, and improves the stability of the detection results.
(6) The machine-learning-based scheme provided by the application considers multiple variables and trains the model on real sample data accumulated by BGI (Huada Gene), so the model can learn the characteristics particular to BGI data that arise from factors such as experimental reagents and the sequencing platform, and is therefore better suited to the data generated in BGI's current production. It should be understood that the application establishes a general approach of learning from a laboratory's own data, and is not limited to BGI data.
The method firstly creates a new detection index, namely the embedding degree, further integrates three variables of the DNA concentration of the fetus, the embedding degree and the traditional Z value, and overcomes the inaccuracy of the result caused by the fact that the traditional NIPT only depends on the Z value to carry out three-body judgment; and using a linear modeThe model integrates the three variables, the model is simple and the problem of overfitting does not exist. Further, the application also develops a Z value conversion method, namely Z value printing, and the meaningless numerical value obtained by the machine learning model is converted into a meaningful and clinically approved Z value, namely Z new Meanwhile, the gray area and retest of the traditional Z value are reduced, and the stability of the detection result is improved.
It is understood that the present application does not exclude the use of additional parameters for model training and for the analysis of fetal chromosomal aneuploidy abnormalities, for example gestational week or maternal age. Of course, as variables are added, the machine learning model may also need to be replaced accordingly, for example by a nonlinear QDA model. In addition, the piecewise Z value printing of the present application can also be adjusted as required.
Example 1
The present example uses the established model for detecting fetal chromosomal aneuploidy abnormality to predict samples with diagnosis/follow-up results. Specifically, 108293 samples are used for model training in this example. According to the diagnosis/follow-up results, the samples fall into three categories: true positive, false positive and true negative. For model training they are grouped into two classes: the negative class comprises the true negative and false positive samples, and the positive class comprises the true positive samples. Because the fetal concentration is calculated differently for male and female fetuses, its data characteristics differ between the two, and the fetal concentration is one of the key variables of the model; two models are therefore trained, one for male fetuses and one for female fetuses. The specific numbers of samples are shown in Table 1.
TABLE 1 Samples for model training
                  Male fetus    Female fetus
True positive     798           620
False positive    234           318
True negative     56864         49459
Total             57896         50397
TABLE 2 Sample data examples for model training
                    Fetal concentration    Conventional Z value    Degree of mosaicism
True positive 1     0.149                  14.501                  0.900
True positive 2     0.120                  10.058                  0.885
False positive 1    0.293                  4.834                   0.173
False positive 2    0.389                  6.821                   0.158
True negative 1     0.229                  -0.810                  -0.035
True negative 2     0.120                  -0.596                  -0.049
In this example, the fetal DNA concentration, the conventional Z value and the degree of mosaicism of the training samples, as illustrated in Table 2, are input into the LDA model for training to obtain the model output values. The median of the model output values of the negative samples is then taken; this is Med in the subsequent printing formula. The medians calculated in this example are shown in Table 3.
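The training step and the median Med can be illustrated with a minimal sketch. It assumes a two-class Fisher discriminant, the simplest form of LDA, written with the standard library only; the text does not say which LDA implementation was used, so the function names (`fit_fisher_lda`, `ld_value`) and the small `ridge` term are assumptions of this sketch. The six rows are the example rows of Table 2, with false positives placed in the negative class as described above.

```python
from statistics import median

# Rows: (fetal concentration, conventional Z value, degree of mosaicism),
# copied from the example rows of Table 2.
positives = [(0.149, 14.501, 0.900), (0.120, 10.058, 0.885)]    # true positives
negatives = [(0.293, 4.834, 0.173), (0.389, 6.821, 0.158),      # false positives
             (0.229, -0.810, -0.035), (0.120, -0.596, -0.049)]  # true negatives


def mean_vector(rows):
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(len(rows[0]))]


def scatter(rows, mu):
    """Scatter matrix of one class: sum of (x - mu)(x - mu)^T."""
    d = len(mu)
    s = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [r[i] - mu[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_fisher_lda(pos, neg, ridge=1e-6):
    """Return the discriminant direction w = S_w^-1 (mu_pos - mu_neg)."""
    mu_p, mu_n = mean_vector(pos), mean_vector(neg)
    s_p, s_n = scatter(pos, mu_p), scatter(neg, mu_n)
    d = len(mu_p)
    # Pooled within-class scatter; the ridge term (an assumption of this
    # sketch) keeps the matrix invertible for very small toy data sets.
    s_w = [[s_p[i][j] + s_n[i][j] + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    return solve(s_w, [mu_p[i] - mu_n[i] for i in range(d)])


def ld_value(weights, row):
    """Model output value (LD) of one sample: projection onto the direction."""
    return sum(w * x for w, x in zip(weights, row))


weights = fit_fisher_lda(positives, negatives)
pos_scores = [ld_value(weights, r) for r in positives]
neg_scores = [ld_value(weights, r) for r in negatives]
med = median(neg_scores)  # plays the role of Med in the printing formula
```

With real data the same steps would run on the full male or female training set; six rows are far too few to train a usable model and serve only to show the data flow.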
TABLE 3 median of model output values
Before printing, thresholds for the LD value (the model output value) are set by manually inspecting the distributions of the true negative, false positive and true positive samples, such that: 1. no true positive sample is judged negative; 2. as many true positive samples as possible are judged positive; 3. as few false positive samples as possible are judged positive. The LD thresholds, namely the positive threshold ($cut_p$) and the negative threshold ($cut_n$) in the printing formula, are defined according to these principles; the specific values in this example are shown in Table 4.
TABLE 4 threshold values for LD values
For printing, the Z value thresholds commonly used in clinical practice, 1.96 and 3, are adopted, giving the following printing method:
when the model output value is greater than the positive threshold, $Z_{new} = LD - cut_p + 3$;
when the model output value is less than the positive threshold and greater than the negative threshold,
when the model output value is less than the negative threshold,
in the above formulas, $Z_{new}$ is the new Z value, LD is the model output value, $cut_p$ is the positive threshold, $cut_n$ is the negative threshold, and Med is the median of the model output values of the negative samples.
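The piecewise printing can be sketched as follows. Only the first branch ($Z_{new} = LD - cut_p + 3$) is stated in the text; the other two branches below are hypothetical linear interpolations, chosen purely so that the mapping is continuous and sends Med to 0, $cut_n$ to 1.96 and $cut_p$ to 3. They should not be read as the formulas of the application. The function name `print_z` and the threshold values used in the usage lines are likewise illustrative, since the values of Tables 3 and 4 are not reproduced in the text.

```python
def print_z(ld, cut_p, cut_n, med):
    """Map a model output value (LD) to a new Z value.

    Branch 1 follows the text. Branches 2 and 3 are ASSUMED linear
    interpolations (not taken from the source) that keep the mapping
    continuous: med -> 0, cut_n -> 1.96, cut_p -> 3.
    """
    if not med < cut_n < cut_p:
        raise ValueError("expected med < cut_n < cut_p")
    if ld >= cut_p:
        return ld - cut_p + 3.0  # branch stated in the text
    if ld > cut_n:
        # assumed: stretch (cut_n, cut_p) linearly onto (1.96, 3)
        return 1.96 + (ld - cut_n) / (cut_p - cut_n) * (3.0 - 1.96)
    # assumed: stretch (med, cut_n) linearly onto (0, 1.96);
    # model output values below med give negative Z values
    return (ld - med) / (cut_n - med) * 1.96


# Hypothetical thresholds, for illustration only.
CUT_P, CUT_N, MED = 5.0, 2.0, 0.5
z_strong = print_z(6.0, CUT_P, CUT_N, MED)   # above the positive threshold
z_median = print_z(0.5, CUT_P, CUT_N, MED)   # a typical negative sample
```

Under these assumptions a typical negative sample maps to a Z value near 0 and the two LD thresholds map exactly onto the clinical thresholds 1.96 and 3, which is the behaviour the text describes for the new Z value.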
A total of 10240 samples that were tested by Huada Gene in actual clinical practice and have prenatal diagnosis/postnatal follow-up results were used. In actual clinical testing, the detection results of these samples were issued according to the conventional Z value, and prenatal diagnosis/postnatal follow-up was then carried out. Each sample can therefore be classified as true positive, false positive or true negative according to its detection result and its prenatal diagnosis/postnatal follow-up result; the specific sample information is shown in Table 5.
The conventional Z value is calculated as follows:
where:
the mean of UR for chromosome i;
the mean of UR for chromosome j;
$SD_i$: the standard deviation of UR for chromosome i;
$SD_j$: the standard deviation of UR for chromosome j;
$L_i$: the number of windows into which chromosome i is divided;
$L_j$: the number of windows into which chromosome j is divided;
$Z_i$: the significance of aneuploidy for chromosome i, reflecting how far chromosome i deviates from euploidy.
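The formula for the conventional Z value is not reproduced in the text. The quantities defined above (two means, two standard deviations and two window counts) are consistent with a two-sample Z statistic, and the sketch below assumes that form; it is an illustration of that assumption, not the formula of the application. The function name `conventional_z` and the toy per-window UR values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def conventional_z(ur_i, ur_j):
    """ASSUMED two-sample Z statistic between the per-window UR values of
    chromosome i (ur_i) and of a comparison chromosome j (ur_j):

        Z_i = (mean_i - mean_j) / sqrt(SD_i^2 / L_i + SD_j^2 / L_j)
    """
    l_i, l_j = len(ur_i), len(ur_j)          # numbers of windows L_i, L_j
    se = sqrt(stdev(ur_i) ** 2 / l_i + stdev(ur_j) ** 2 / l_j)
    return (mean(ur_i) - mean(ur_j)) / se


# Toy per-window values, for illustration only.
z_i = conventional_z([1.1, 1.2, 1.3], [1.0, 1.1, 1.2])
```

Under this assumed form, the statistic grows both with the difference between the two means and with the number of windows, which is why a small but consistent excess of reads on one chromosome can still yield a large Z value.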
TABLE 5 results of three-body test given by conventional Z values
TABLE 6 Sample data examples for model testing
                    Fetal concentration    Conventional Z value    Degree of mosaicism
True positive 1     0.067                  5.824                   0.885
True positive 2     0.146                  11.714                  0.808
False positive 1    0.188                  3.602                   0.205
False positive 2    0.187                  4.186                   0.252
True negative 1     0.115                  1.137                   0.090
True negative 2     0.059                  -1.125                  -0.192
It can be seen that, with detection based on the conventional Z value, the positive predictive values of T21, T18 and T13 are 0.86, 0.58 and 0.36, respectively, so the false positive problem is prominent.
Using the fetal chromosome aneuploidy abnormality detection model and the detection method of the present application, the degree of mosaicism of the 10240 samples was calculated. Taking T13 as an example, the result is shown in FIG. 3: the degree of mosaicism distinguishes true positive, false positive and true negative samples well. The three variables, namely the degree of mosaicism, the fetal concentration and the conventional Z value, were then input into the trained machine learning model, and a new Z value was generated through Z value printing. Some of the sample data used for the model test are shown in Table 6. The 10240 samples were re-evaluated with the new Z value, with a new Z value > 3 judged positive and a new Z value < 1.96 judged negative, and the resulting detection results are shown in Table 7.
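The judgment rule applied to the new Z value can be written as a short sketch. The thresholds 3 and 1.96 come from the text; labelling the interval between them as the gray zone is an assumption of this sketch, based on how the gray zone is discussed elsewhere in the description. The function name `classify` is hypothetical.

```python
def classify(z_new, positive_cut=3.0, negative_cut=1.96):
    """Judge a sample from its new Z value.

    > 3 is positive and < 1.96 is negative, as stated in the text; values in
    between are labelled "gray zone" here, which is an assumption.
    """
    if z_new > positive_cut:
        return "positive"
    if z_new < negative_cut:
        return "negative"
    return "gray zone"


# Hypothetical new Z values, for illustration only.
calls = [classify(z) for z in (4.2, 0.3, 2.5)]
```

A gray zone call would normally lead to a retest, which is why narrowing the spread of the Z value directly reduces retesting.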
TABLE 7 results of trisomy detection given by the improved fetal chromosomal aneuploidy abnormality detection method
The results in Table 7 show that, using the new Z value, the 14 T21 false positives, 33 T18 false positives and 39 T13 false positives are all correctly judged negative, while the 87 T21 true positives, 45 T18 true positives, 22 T13 true positives and 10000 true negative samples are still judged correctly. The positive predictive values of T21, T18 and T13 therefore all reach 100%, with a sensitivity of 100% and a specificity of 100%. Sensitivity is thus maintained while false positives are greatly reduced, which raises the PPV and improves specificity.
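The positive predictive values quoted for the conventional Z value can be checked from the sample counts given above, using PPV = TP / (TP + FP). The short sketch below uses only those counts; the variable and function names are illustrative.

```python
# True positive and false positive counts under the conventional Z value,
# as listed in the text for the 10240 test samples.
counts = {
    "T21": {"tp": 87, "fp": 14},
    "T18": {"tp": 45, "fp": 33},
    "T13": {"tp": 22, "fp": 39},
}
TRUE_NEGATIVES = 10000


def ppv(tp, fp):
    """Positive predictive value: share of positive calls that are correct."""
    return tp / (tp + fp)


conventional_ppv = {k: round(ppv(v["tp"], v["fp"]), 2) for k, v in counts.items()}
# conventional_ppv == {"T21": 0.86, "T18": 0.58, "T13": 0.36}

# With the new Z value every false positive is re-judged as negative,
# so the number of false positive calls becomes 0 for each trisomy.
new_ppv = {k: ppv(v["tp"], 0) for k, v in counts.items()}

total_samples = TRUE_NEGATIVES + sum(v["tp"] + v["fp"] for v in counts.values())
```

The counts also add up to the 10240 samples of the test set, which confirms that the three trisomy groups and the true negatives together cover all samples.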
Example 2
The present example uses the established model to test the continuous samples of the production line.
Because of factors such as how diagnosis/follow-up results are collected, the samples with known karyotype are not consecutive samples from a single center, so their data distribution cannot reflect the true distribution in the population, and the true distribution characteristics of the new Z value cannot be evaluated from them. Therefore, consecutive samples received by a Huada Gene medical laboratory over a period of time are used to evaluate the distribution of the new Z value and to compare it with the conventional Z value, so as to show the real characteristics and behavior of the new Z value in actual use.
10000 consecutive samples clinically tested during a certain period at a single Huada Gene medical laboratory were extracted, and new Z values were calculated for them using the fetal chromosome aneuploidy abnormality detection model and the detection method of the present application. Taking the Z value of chromosome 21 as an example, whether its distribution follows a normal distribution was checked; the result is shown in FIG. 4. FIG. 4 shows that the new Z values of the 10000 single-center consecutive samples lie substantially on the diagonal of the Q-Q plot; the few samples that deviate markedly from the diagonal are positive samples with strong signals. FIG. 4 therefore indicates that the new Z values are well approximated by a normal distribution.
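A Q-Q check of this kind can be sketched with the standard library. The sketch pairs each sorted observation with the matching standard normal quantile and uses the correlation of those pairs as a rough numerical summary of how close the points lie to a straight line; using a correlation coefficient is a choice of this sketch, since the text only describes visual inspection of FIG. 4. The data are simulated stand-ins, not the 10000 clinical samples, and the function names are hypothetical.

```python
import random
from statistics import NormalDist, mean


def qq_points(values):
    """Pairs (theoretical normal quantile, observed value) for a Q-Q plot."""
    n = len(values)
    std_normal = NormalDist()
    ordered = sorted(values)
    # Plotting positions (i + 0.5) / n avoid the undefined quantiles at 0 and 1.
    return [(std_normal.inv_cdf((i + 0.5) / n), v) for i, v in enumerate(ordered)]


def pearson(pairs):
    """Pearson correlation of the (x, y) pairs."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)
simulated_z = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # stand-in for new Z values
r = pearson(qq_points(simulated_z))  # close to 1 when the points follow a line
```

Plotting the pairs from `qq_points` gives the Q-Q plot itself; a few strongly positive samples would appear as points far above the line at the right-hand end, as described for FIG. 4.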
The distribution of the new Z value is further compared with that of the conventional Z value, taking the Z value of chromosome 13 as an example, as shown in FIG. 5. FIG. 5 shows that, first, the new Z values are centered more closely on 0, indicating that they fit a normal distribution centered at 0 better than the conventional Z values do. Second, the distribution of the new Z values is more concentrated, indicating lower volatility and better stability than the conventional Z value.
Because the new Z value fluctuates less than the conventional Z value, it can be expected to lower the gray zone rate. This example verifies this with a larger sample size. Specifically, 360786 clinical samples tested in 2020 at a single Huada Gene medical laboratory were used; these samples underwent 383306 tests in total. In the 383306 tests, the new Z value produced 785 T21 gray zone results, 345 T18 gray zone results and 288 T13 gray zone results; the gray zone rates of T21, T18 and T13 are 0.22%, 0.09% and 0.08%, respectively, and the overall gray zone rate of trisomy detection is 0.39%. In contrast, the conventional Z value produced 3071 T21 gray zone results, 4350 T18 gray zone results and 2335 T13 gray zone results, with gray zone rates for T21, T18 and T13 of 0.80%, 1.14% and 0.61%, respectively, and an overall gray zone rate of trisomy detection of 2.55%, as shown in Table 8.
TABLE 8 comparison of gray zone sample numbers and gray zone rates for conventional Z values and new Z values
The results in Table 8 show that the new Z value generated by the method of the present application reduces the overall gray zone rate of trisomy detection from 2.55% to 0.39%, which greatly reduces retesting caused by gray zone results and improves NIPT detection performance.
The foregoing is a further detailed description of the present application in connection with specific embodiments, and the present application is not to be regarded as limited to these embodiments. It will be apparent to those skilled in the art that a number of simple derivations or substitutions can be made without departing from the spirit of the present application.

Claims (11)

1. A method of detecting chromosomal aneuploidy abnormalities in a fetus, comprising: calculating a new Z value of a sample to be tested from the fetal DNA concentration in the maternal blood cell-free DNA of the sample to be tested, the Z value and the degree of mosaicism, and judging whether the fetal chromosome of the sample to be tested has an aneuploidy abnormality according to the new Z value;
the degree of mosaicism is the ratio of fetal abnormal cells to all fetal cells.
2. The method of claim 1, wherein: calculating the new Z value of the sample to be tested from the fetal DNA concentration in the maternal blood cell-free DNA of the sample to be tested, the Z value and the degree of mosaicism comprises inputting the fetal DNA concentration, the Z value and the degree of mosaicism into a fetal chromosome aneuploidy abnormality detection model to obtain a model output value corresponding to the sample to be tested, and printing the model output value to obtain the new Z value of the sample to be tested;
the fetal chromosome aneuploidy abnormality detection model is a model obtained by taking a plurality of samples with known fetal chromosome conditions as training samples, wherein the training samples comprise positive samples and negative samples of fetal chromosome aneuploidy abnormality, and performing machine learning model training by taking fetal DNA concentration, Z value and embedding degree as input to obtain a model output value which integrates three variables of fetal DNA concentration, Z value and embedding degree to represent the fetal chromosome conditions;
preferably, the new Z value of the sample to be tested is obtained by the model output value printing, including calculating to obtain the new Z value of the sample to be tested according to the model output value of the sample to be tested, the positive threshold value, the negative threshold value, and the median of the model output values of all negative samples;
the positive threshold is the threshold of the model output value corresponding to the positive sample, and the negative threshold is the threshold of the model output value corresponding to the negative sample;
preferably, the median of the model output values of all negative samples is the median of the model output values of all negative samples obtained by inputting all negative training samples into the fetal chromosome aneuploidy abnormality detection model again;
preferably, obtaining the new Z value of the sample to be tested by printing the model output value comprises the following printing method:
when the model output value of the sample to be tested is greater than the positive threshold, $Z_{new} = LD - cut_p + 3$;
when the model output value of the sample to be tested is less than the positive threshold and greater than the negative threshold,
when the model output value of the sample to be tested is less than the negative threshold,
in the above formulas, $Z_{new}$ is the new Z value, LD is the model output value of the sample to be tested, $cut_p$ is the positive threshold, $cut_n$ is the negative threshold, and Med is the median of the model output values of all negative samples;
preferably, whether the fetal chromosome of the sample to be detected has aneuploidy abnormality is judged according to the new Z value, wherein the new Z value is more than 3, and the fetal chromosome is judged to be positive, namely, the fetal chromosome is aneuploidy abnormal; the new Z value is less than 1.96, and the judgment is negative, namely the fetus chromosome is normal;
preferably, the machine learning model is a linear discriminant analysis model;
preferably, the fetal abnormal cell is a cell containing a fetal chromosomal aneuploidy abnormality;
preferably, the fetal DNA concentration in the free DNA in the blood of pregnant woman, the Z value, is calculated from high throughput sequencing data of the free DNA in the blood of pregnant woman.
3. The method according to claim 1 or 2, characterized in that: the degree of mosaicism is calculated by Formula I;
Formula I
In Formula I, $Mosaic_k$ is the degree of mosaicism of the kth chromosome, $fra_k$ is the relative fetal concentration of the kth chromosome, and FF is the fetal DNA concentration;
$fra_k$ is calculated by Formula II;
Formula II
In Formula II, $fra_k$ is the relative fetal concentration of the kth chromosome, and the other two quantities are the mean of the corrected depths of the kth chromosome and the mean of the corrected depths of all autosomes;
in Formula I and Formula II, k takes values from 1 to 22;
$Mosaic_k$ = 0 indicates that the kth chromosome of the fetus is normal; $Mosaic_k$ = 1 indicates that the kth chromosome of the fetus is completely trisomic; $Mosaic_k$ between 0 and 1 indicates that the kth chromosome of the fetus is mosaic;
preferably, the mean of the corrected depths of each chromosome, the mean of the corrected depths of all autosomes, is calculated from high throughput sequencing data of maternal blood free DNA.
4. A method for constructing a fetal chromosome aneuploidy abnormality detection model is characterized by comprising the following steps: the method comprises the steps of adopting a plurality of samples with known fetal chromosome conditions as training samples, carrying out machine learning model training on the training samples including positive samples and negative samples of fetal chromosome aneuploidy abnormality by taking fetal DNA concentration, Z values and embedding degrees as input, obtaining a model output value which integrates three variables of the fetal DNA concentration, the Z values and the embedding degrees and represents the fetal chromosome conditions, and training the obtained model, namely a fetal chromosome aneuploidy abnormality detection model.
5. The construction method according to claim 4, wherein: the fetal DNA concentration and the Z value are obtained by calculation according to the high-throughput sequencing data of the free DNA in the blood of the pregnant woman; the degree of mosaicism is the ratio of abnormal fetal cells to all fetal cells;
preferably, the fetal abnormal cell is a cell containing a fetal chromosomal aneuploidy abnormality;
preferably, the degree of mosaicism is calculated by Formula I;
Formula I
In Formula I, $Mosaic_k$ is the degree of mosaicism of the kth chromosome, $fra_k$ is the relative fetal concentration of the kth chromosome, and FF is the fetal DNA concentration;
$fra_k$ is calculated by Formula II;
Formula II
In Formula II, $fra_k$ is the relative fetal concentration of the kth chromosome, and the other two quantities are the mean of the corrected depths of the kth chromosome and the mean of the corrected depths of all autosomes;
in Formula I and Formula II, k takes values from 1 to 22;
$Mosaic_k$ = 0 indicates that the kth chromosome of the fetus is normal; $Mosaic_k$ = 1 indicates that the kth chromosome of the fetus is completely trisomic; $Mosaic_k$ between 0 and 1 indicates that the kth chromosome of the fetus is mosaic;
preferably, the average value of the corrected depth of each chromosome and the average value of the corrected depths of all autosomes are obtained by calculation according to the high-throughput sequencing data of the blood free DNA of the pregnant woman;
preferably, the machine learning model is a linear discriminant analysis model.
6. An apparatus for detecting chromosomal aneuploidy abnormalities in a fetus, comprising: a new Z value calculation module and a fetal chromosome aneuploidy abnormality judgment module;
the new Z value calculating module is used for calculating and obtaining a new Z value of the sample to be detected according to the fetal DNA concentration, the Z value and the embedding degree in the free DNA of the blood of the pregnant woman of the sample to be detected; the degree of mosaicism is the ratio of abnormal fetal cells to all fetal cells;
the fetal chromosome aneuploidy abnormality judgment module is used for judging whether the fetal chromosome of the sample to be tested has an aneuploidy abnormality according to the new Z value.
7. The apparatus of claim 6, wherein: the new Z value calculating module is also used for inputting the DNA concentration, the Z value and the chimerism degree of the fetus into the fetus chromosome aneuploidy abnormality detection model to obtain a model output value corresponding to the sample to be detected, and the model output value is used for printing to obtain a new Z value of the sample to be detected;
the fetal chromosome aneuploidy abnormality detection model is a model obtained by taking a plurality of samples with known fetal chromosome conditions as training samples, wherein the training samples comprise positive samples and negative samples of fetal chromosome aneuploidy abnormality, and performing machine learning model training by taking the fetal DNA concentration, the Z value and the embedding degree as input; and the output value of the model is used for integrating three variables of fetal DNA concentration, Z value and chimerism degree to represent the fetal chromosome condition.
8. The apparatus of claim 7, wherein: a plurality of samples with known fetal chromosome conditions are used as training samples, the training samples comprising positive samples and negative samples of fetal chromosome aneuploidy abnormality; machine learning model training is performed with the fetal DNA concentration, the Z value and the degree of mosaicism as input, to obtain a model output value that integrates the three variables of fetal DNA concentration, Z value and degree of mosaicism to represent the fetal chromosome condition; and the trained model is the fetal chromosome aneuploidy abnormality detection model;
preferably, the machine learning model is a linear discriminant analysis model;
preferably, the new Z value calculating module comprises a model output value analyzing sub-module and a Z value printing sub-module; the model output value analysis submodule is used for inputting the fetal DNA concentration, the Z value and the chimerism degree of a sample to be detected into the fetal chromosome aneuploidy abnormality detection model to obtain a model output value corresponding to the sample to be detected; the Z value printing sub-module is used for calculating and obtaining a new Z value of the sample to be detected according to the model output value of the sample to be detected, the positive threshold value, the negative threshold value and the median of the model output values of all negative samples; the positive threshold is the threshold of the model output value corresponding to the positive sample, and the negative threshold is the threshold of the model output value corresponding to the negative sample;
preferably, the Z value printing sub-module obtains the new Z value as follows:
when the model output value of the sample to be tested is greater than the positive threshold, $Z_{new} = LD - cut_p + 3$;
when the model output value of the sample to be tested is less than the positive threshold and greater than the negative threshold,
when the model output value of the sample to be tested is less than the negative threshold,
in the above formulas, $Z_{new}$ is the new Z value, LD is the model output value of the sample to be tested, $cut_p$ is the positive threshold, $cut_n$ is the negative threshold, and Med is the median of the model output values of all negative samples;
preferably, in the fetal chromosome aneuploidy abnormality judgment module, judging whether the fetal chromosome of the sample to be tested has an aneuploidy abnormality according to the new Z value comprises: a new Z value greater than 3 is judged positive, namely the fetal chromosome has an aneuploidy abnormality; a new Z value less than 1.96 is judged negative, namely the fetal chromosome is normal.
9. The apparatus of claim 6, wherein: the apparatus further comprises a data acquisition module for acquiring high-throughput sequencing data of the maternal blood cell-free DNA of the sample to be tested;
preferably, the system also comprises a data processing module, which is used for calculating the fetal DNA concentration and the Z value according to the acquired high-throughput sequencing data of the free DNA in the blood of the pregnant woman;
preferably, the data processing module is further configured to calculate an average value of corrected depths of each chromosome and an average value of corrected depths of all autosomes according to the acquired high-throughput sequencing data of the blood free DNA of the pregnant woman to be detected;
preferably, the apparatus further comprises a mosaicism calculation module for calculating the degree of mosaicism of each chromosome according to Formula I;
Formula I
In Formula I, $Mosaic_k$ is the degree of mosaicism of the kth chromosome, $fra_k$ is the relative fetal concentration of the kth chromosome, and FF is the fetal DNA concentration;
$fra_k$ is calculated by Formula II;
Formula II
In Formula II, $fra_k$ is the relative fetal concentration of the kth chromosome, and the other two quantities are the mean of the corrected depths of the kth chromosome and the mean of the corrected depths of all autosomes;
in Formula I and Formula II, k takes values from 1 to 22;
$Mosaic_k$ = 0 indicates that the kth chromosome of the fetus is normal; $Mosaic_k$ = 1 indicates that the kth chromosome of the fetus is completely trisomic; $Mosaic_k$ between 0 and 1 indicates that the kth chromosome of the fetus is mosaic.
10. An apparatus for detecting chromosomal aneuploidy abnormalities in a fetus, said apparatus comprising:
a memory for storing a program;
a processor for implementing the method for detecting fetal chromosomal aneuploidy abnormality of any one of claims 1-3 or the method for constructing the fetal chromosomal aneuploidy abnormality detection model of claim 4 or 5 by executing the program stored in the memory.
11. A computer-readable storage medium characterized by: comprising a program executable by a processor to implement the method of detecting a fetal chromosomal aneuploidy abnormality of any one of claims 1-3 or the method of constructing the fetal chromosomal aneuploidy abnormality detection model of claim 4 or 5.
HK42023071359.6A 2023-04-12 Method, device and storage medium for detecting fetal chromosomal aneuploid abnormalities HK40081922A (en)

Publications (1)

Publication Number Publication Date
HK40081922A true HK40081922A (en) 2023-06-02
