
WO2019242187A1 - Method and apparatus for detecting chromosome copy number variations, and storage medium - Google Patents

Method and apparatus for detecting chromosome copy number variations, and storage medium

Info

Publication number
WO2019242187A1
Authority
WO
WIPO (PCT)
Prior art keywords
chromosome
mer
specific
standard
occurrences
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2018/111958
Other languages
English (en)
Chinese (zh)
Inventor
孙亚洲
肖贡
陈斌
杜刘稳
牛团结
陈杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Diagnoa Genomics Technology Co Ltd
Original Assignee
Shenzhen Diagnoa Genomics Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Diagnoa Genomics Technology Co Ltd

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Definitions

  • the present application relates to a method, a device, a computer device, and a storage medium for detecting an abnormality in chromosome copy number.
  • a method, a device, and a storage medium for detecting an abnormality in chromosome copy number are provided.
  • a method for detecting chromosome copy number abnormalities including:
  • a specific k-mer corresponding to each chromosome contained in the target species stored in the target database is obtained, where the specific k-mer is a k-mer in each chromosome that satisfies a preset specificity condition, and k-mer refers to a genomic sequence of length k;
  • a copy number of each specific k-mer is obtained from the target database, where the copy number is the ratio of the number of occurrences of the specific k-mer on the corresponding chromosome to the number of occurrences of the specific k-mer with the fewest occurrences on that chromosome;
  • a chromosome whose actual signal intensity is not within the standard confidence interval of the corresponding chromosome is determined to be a chromosome with abnormal copy number.
  • a device for detecting abnormal chromosome copy number includes:
  • a specific k-mer acquisition module configured to acquire sequencing data of a sample to be detected as the data to be detected, and determine a target species corresponding to the data to be detected; and to acquire the specific k-mer corresponding to each chromosome contained in the target species stored in the target database, where the specific k-mer is a k-mer in each chromosome that meets a preset specificity condition, and k-mer refers to a genomic sequence of length k;
  • an actual appearance frequency acquisition module configured to obtain the actual number of occurrences, in the data to be detected, of the specific k-mers included in each chromosome;
  • a copy number obtaining module configured to obtain a copy number of each specific k-mer from the target database, where the copy number is the ratio of the number of occurrences of the specific k-mer on the corresponding chromosome to the number of occurrences of the specific k-mer with the fewest occurrences on that chromosome;
  • a determination module configured to calculate the actual signal intensity of the corresponding chromosome according to the actual number of occurrences and the copy number of each specific k-mer, and to determine a chromosome whose actual signal intensity is not within the standard confidence interval of the corresponding chromosome to be a chromosome with abnormal copy number.
  • a computer device includes a memory and one or more processors.
  • the memory stores computer-readable instructions.
  • when the computer-readable instructions are executed by the one or more processors, the one or more processors perform the following steps:
  • a specific k-mer corresponding to each chromosome contained in the target species stored in the target database is obtained, where the specific k-mer is a k-mer in each chromosome that meets a preset specificity condition, and k-mer refers to a genomic sequence of length k;
  • a copy number of each specific k-mer is obtained from the target database, where the copy number is the ratio of the number of occurrences of the specific k-mer on the corresponding chromosome to the number of occurrences of the specific k-mer with the fewest occurrences on that chromosome;
  • a chromosome whose actual signal intensity is not within the standard confidence interval of the corresponding chromosome is determined to be a chromosome with abnormal copy number.
  • one or more non-volatile computer-readable storage media storing computer-readable instructions are provided.
  • when the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:
  • a specific k-mer corresponding to each chromosome contained in the target species stored in the target database is obtained, where the specific k-mer is a k-mer in each chromosome that satisfies a preset specificity condition, and k-mer refers to a genomic sequence of length k;
  • a copy number of each specific k-mer is obtained from the target database, where the copy number is the ratio of the number of occurrences of the specific k-mer on the corresponding chromosome to the number of occurrences of the specific k-mer with the fewest occurrences on that chromosome;
  • a chromosome whose actual signal intensity is not within the standard confidence interval of the corresponding chromosome is determined to be a chromosome with abnormal copy number.
  • FIG. 1 is a schematic flowchart of a method for detecting a chromosome copy number abnormality according to one or more embodiments.
  • FIG. 2 is a schematic flow chart before step 102 according to one or more embodiments.
  • FIG. 3 is a schematic flow chart before step 102 according to another embodiment.
  • FIG. 4 is a list of copy numbers of specific k-mers of chromosome X according to one or more embodiments.
  • FIG. 5 is a schematic flowchart of step 110 according to one or more embodiments.
  • FIG. 6 is a schematic flowchart of a method for detecting an abnormal chromosome copy number according to one or more embodiments, which further includes other steps.
  • FIG. 7A is a standard signal intensity recording table of a chromosome in a normal male sample according to one or more embodiments.
  • FIG. 7B is a standard signal intensity recording table of a chromosome in a normal female sample according to one or more embodiments.
  • FIG. 8 is a schematic flowchart of step 610 according to one or more embodiments.
  • FIG. 9A is a distribution table of the standard signal intensities of chromosomes in normal male samples at the preset confidence value P in one or more embodiments.
  • FIG. 9B is a distribution table of the standard signal intensities of chromosomes in normal female samples at the preset confidence value P in one or more embodiments.
  • FIG. 10 is a schematic flowchart of step 610 according to another embodiment.
  • FIG. 11 is a schematic flowchart of a method for detecting an abnormal chromosome copy number according to one or more other embodiments, including other steps.
  • FIG. 12 is a schematic flowchart before step 102 according to still another embodiment.
  • FIG. 13 is a table showing the actual number of occurrences of the specific k-mer of a specific chromosome according to one or more embodiments.
  • FIG. 14 is a schematic flowchart of a method for detecting an abnormality of a chromosome copy number according to one or more other embodiments.
  • FIG. 15 is a schematic flowchart of step 1402 according to one or more embodiments.
  • FIG. 16 is a table of human chromosome copy numbers in accordance with one or more embodiments.
  • FIG. 17 is a schematic flowchart of step 1404 according to one or more embodiments.
  • FIG. 18 is a single-copy signal strength calculation table for a specific chromosome according to one or more embodiments.
  • FIG. 19 is a single copy signal intensity recording table for each chromosome according to one or more embodiments.
  • FIG. 20 is a calculation table of actual signal intensities of individual chromosomes according to one or more embodiments.
  • FIG. 21 is a block diagram of an apparatus for detecting an abnormality in chromosome copy number according to one or more embodiments.
  • FIG. 22 is a block diagram of a computer device in accordance with one or more embodiments.
  • a method for detecting an abnormal chromosome copy number which includes the following steps:
  • Step 102 Obtain sequencing data of the sample to be detected as the data to be detected, and determine a target species corresponding to the data to be detected.
  • the data to be detected refers to the data output after the sequence of a biomolecule contained in a sample is read by a DNA sequencer, an RNA sequencer, or a protein sequencing device.
  • DNA sequencing is the process of determining the exact sequence of nucleotides within a DNA molecule. It includes any method or technique for determining the order of the four bases, adenine, guanine, cytosine, and thymine, in a DNA strand.
  • a sequencer is an instrument capable of measuring the sequence of an input sample. The sequence measured here includes not only DNA sequences but also sequences composed of other substances such as proteins and RNA. Samples can take the form of a drop of blood, a sputum sample, a handful of soil, and so on.
  • the target species is the species to which the data to be detected belongs. For example, when the sequencing data is a human gene sequence, the target species is human.
  • Step 104 Obtain a specific k-mer corresponding to each chromosome contained in the target species stored in the target database, where the specific k-mer is a k-mer in each chromosome that satisfies a preset specificity condition, and k-mer refers to a genomic sequence of length k.
  • Each target species contains one or more individuals. Each individual contains one or more genomes, and each genome contains one or more chromosomes. Therefore, each target species contains multiple chromosomes.
  • the target database may store a feature target sequence set previously established for each chromosome, and the feature target sequence set corresponding to each chromosome may include a specific k-mer corresponding to each chromosome.
  • the specific k-mer refers to a k-mer selected from the k-mers contained in each chromosome and meeting a preset specificity condition, that is, a specific k-mer corresponding to each chromosome.
  • the preset specific condition is a condition set by a technician in advance for selecting a matching k-mer. The preset specific condition may be determined according to a technician's consideration or an actual project requirement.
  • k-mer refers to a genomic sequence of length k, where k is a natural number. If the genomic data contains a distinct deterministic characters, then for a given k there can be at most a^k (a to the power of k) distinct k-mers.
  • for nucleic acid sequences, the deterministic characters are the five bases A (adenine), T (thymine), C (cytosine), G (guanine), and U (uracil); for protein sequences, the deterministic characters are the defined amino acid characters.
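The k-mer definition above can be illustrated with a minimal Python sketch. The function names `max_distinct_kmers` and `kmers` are hypothetical and are not taken from the application; the sketch only shows the a^k bound and how the overlapping k-mers of one sequence are enumerated.

```python
def max_distinct_kmers(alphabet_size: int, k: int) -> int:
    """Upper bound a**k on the number of distinct k-mers over an alphabet of size a."""
    return alphabet_size ** k


def kmers(sequence: str, k: int) -> list[str]:
    """Return every overlapping substring of length k, in order of position."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# With the four DNA bases and k = 3 there are at most 4**3 = 64 distinct 3-mers.
print(max_distinct_kmers(4, 3))
print(kmers("ACGTA", 3))
```

A sequence shorter than k yields an empty list, because it contains no window of length k.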
  • Step 106 Obtain the actual number of occurrences of the specific k-mer contained in each chromosome in the data to be detected.
  • the data to be detected can be compared with each chromosome separately; that is, the number of times each specific k-mer included in the feature target sequence set corresponding to each chromosome appears in the data to be detected is counted, and this count is the actual number of occurrences of that specific k-mer in the data to be detected.
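Step 106 can be sketched as a simple counting pass. This is a minimal illustration, assuming the data to be detected is a list of read strings and that only exact, forward-strand matches are counted; the application does not state these details, and the name `count_specific_kmers` is hypothetical.

```python
def count_specific_kmers(reads: list[str], specific_kmers: set[str], k: int) -> dict[str, int]:
    """Count how often each specific k-mer of one chromosome occurs in the reads.

    Assumption: exact matching on the forward strand only.
    """
    counts = {kmer: 0 for kmer in specific_kmers}
    for read in reads:
        # Slide a window of length k over the read and look it up in the set.
        for i in range(len(read) - k + 1):
            window = read[i:i + k]
            if window in counts:
                counts[window] += 1
    return counts


reads = ["ACGTACG", "TTACG"]
print(count_specific_kmers(reads, {"ACG", "GGG"}, 3))
```

Because only the specific k-mers are looked up, the work per read does not grow with the size of the whole genome, which is in line with the reduced comparison space described in the text.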
  • Step 108 Obtain a copy number of each specific k-mer from the target database.
  • the copy number of each specific k-mer is the ratio of the number of occurrences of the specific k-mer on the corresponding chromosome to the number of occurrences of the specific k-mer with the fewest occurrences on that chromosome.
  • Step 110 Calculate the actual signal intensity of the corresponding chromosome according to the actual number of occurrences and the copy number of each specific k-mer.
  • the actual signal intensity of each specific k-mer can be calculated according to these two parameters.
  • the ratio of Ci and Fi can be calculated, and the ratio is used as the adjusted number of occurrences of each specific k-mer. In this way, the adjusted number of occurrences of every specific k-mer contained in each chromosome can be calculated. The average of the adjusted numbers of occurrences of the specific k-mers in each chromosome is then calculated, and this average is used as the single-copy signal intensity E of the corresponding chromosome.
  • Step 112 Determine a chromosome whose actual signal strength is not within the standard confidence interval of the corresponding chromosome as a chromosome with abnormal copy number.
  • the standard confidence interval refers to the standard signal intensity interval calculated in advance based on a large number of samples.
  • the standard signal intensity is calculated in the same way as the actual signal intensity. The difference is that the standard signal intensity is computed from the data of a standard test sample, which is a sample confirmed to have no abnormal chromosome copy number, whereas the actual signal intensity is computed from the data to be detected.
  • if the actual signal intensity of a chromosome is within the standard confidence interval of the corresponding chromosome, it can be judged that there is no copy number abnormality on the chromosome; otherwise, it can be judged that the copy number of the chromosome is abnormal.
  • a chromosome whose actual signal intensity is not within the standard confidence interval of the corresponding chromosome can be determined as a chromosome with an abnormal copy number.
  • the actual signal intensity of each chromosome is compared with the standard confidence interval of the corresponding chromosome.
  • for example, the actual signal intensity of chromosome 1 is compared with the pre-established standard confidence interval of chromosome 1, and
  • the actual signal intensity of chromosome 2 is compared with the pre-established standard confidence interval of chromosome 2.
  • the actual signal intensity of each chromosome can be compared with the standard confidence interval of the corresponding chromosome, and the chromosome that is not within the standard confidence interval of the corresponding chromosome can be determined as a chromosome with abnormal copy number.
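The determination in step 112 amounts to an interval check per chromosome. The sketch below is illustrative only: the dictionary layout, the chromosome labels, the numbers, and the name `abnormal_chromosomes` are assumptions, and the interval bounds are treated as inclusive, which the application does not specify.

```python
def abnormal_chromosomes(
    actual: dict[str, float],
    intervals: dict[str, tuple[float, float]],
) -> list[str]:
    """Return chromosomes whose actual signal intensity lies outside [LB, UB].

    Assumption: the bounds LB and UB themselves count as inside the interval.
    """
    flagged = []
    for chrom, intensity in actual.items():
        lower, upper = intervals[chrom]
        if not (lower <= intensity <= upper):
            flagged.append(chrom)
    return flagged


# Made-up values purely for illustration.
actual = {"chr1": 0.2, "chr21": 3.5}
intervals = {"chr1": (-2.0, 2.0), "chr21": (-2.0, 2.0)}
print(abnormal_chromosomes(actual, intervals))
```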
  • in this method of detecting chromosome copy number abnormalities, the data to be detected is compared only with the feature target sequences of each chromosome of the target species, that is, the specific k-mers, which are a part of the whole genome of the target species.
  • comparing against the specific k-mers therefore reduces the comparison space, which shortens the analysis time and improves the detection efficiency.
  • a specific k-mer is a k-mer of a chromosome whose number of occurrences, as recorded in the genome occurrence count index table corresponding to that chromosome, meets a preset error condition.
  • the feature target sequence set corresponding to each chromosome includes the specific k-mers of that chromosome that satisfy the preset specificity condition.
  • the preset specificity condition is that the number of occurrences of a k-mer of a chromosome, as recorded in the genome occurrence count index table corresponding to that chromosome, meets a preset error condition.
  • the preset error condition refers to the error condition preset by the technician according to the actual project requirements.
  • the error condition can be a range; that is, a k-mer selected as specific is allowed a certain error and need not satisfy some strict objective condition exactly.
  • for each chromosome, there is a genome occurrence count index table corresponding to the chromosome.
  • from the genome occurrence count index table corresponding to each chromosome, the number of genomes in which each k-mer contained in the chromosome has appeared can be obtained. The k-mers whose number of occurrences in that table meets the preset error condition can then be selected and used as the specific k-mers.
  • before step 102, the method further includes the following steps: generating a genome occurrence count index table corresponding to each chromosome, where the table records, for each k-mer, how many of the genomes corresponding to the chromosome to which the k-mer belongs contain that k-mer; and storing the genome occurrence count index table in the feature target sequence set corresponding to the chromosome.
  • the genome is all the genetic information in an organism. This genetic information is stored in the form of a nucleotide sequence.
  • the sum of the genetic material in a complete monomer of an organism is the genome.
  • an individual's complete genome can contain multiple chromosomes, and each chromosome can contain multiple k-mers.
  • the term "chromosome genome", commonly used in the art, is used here to refer to the sum of all sequences contained in a complete chromosome.
  • the genome occurrence count index table corresponding to each chromosome records, for each k-mer, the number of genomes that contain the k-mer among the genomes corresponding to the chromosome to which it belongs.
  • the genome occurrence count index table corresponding to each chromosome can be stored in the feature target sequence set corresponding to that chromosome, that is, in the target database. Once stored, its data can be retrieved from the target database whenever needed, which improves the detection efficiency.
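The index table and the selection of specific k-mers can be sketched as follows. This is only one possible reading: the application leaves the preset error condition to the technician, so the rule used here (a k-mer may be missing from at most `max_missing` of the chromosome genomes) is a hypothetical stand-in, as are the function names and the toy sequences.

```python
def genome_occurrence_index(chromosome_genomes: list[str], k: int) -> dict[str, int]:
    """For each k-mer, record how many of the given chromosome genomes contain it."""
    index: dict[str, int] = {}
    for genome in chromosome_genomes:
        # A set ensures each genome is counted at most once per k-mer.
        present = {genome[i:i + k] for i in range(len(genome) - k + 1)}
        for kmer in present:
            index[kmer] = index.get(kmer, 0) + 1
    return index


def select_specific_kmers(index: dict[str, int], n_genomes: int, max_missing: int) -> list[str]:
    """Keep k-mers absent from at most max_missing genomes (hypothetical error condition)."""
    return sorted(kmer for kmer, n in index.items() if n_genomes - n <= max_missing)


# Toy sequences standing in for the same chromosome from three individuals.
genomes = ["ACGT", "ACGA", "TCGT"]
index = genome_occurrence_index(genomes, 3)
print(select_specific_kmers(index, len(genomes), max_missing=1))
```

Setting `max_missing` to 0 corresponds to a strict condition with no tolerated error; a larger value models the allowed error range described in the text.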
  • the method further includes the following steps:
  • Step 100 Select a k-mer that satisfies a preset specific condition from the k-mers corresponding to each chromosome.
  • Step 101 Store a k-mer that satisfies a preset specific condition into a feature target sequence set corresponding to each chromosome.
  • each feature target sequence set includes a specific k-mer corresponding to each chromosome.
  • a specific k-mer is a k-mer selected from the k-mers contained in each chromosome because it satisfies the preset specificity condition.
  • after the k-mers that satisfy the preset specificity condition, that is, the specific k-mers, have been selected,
  • the specific k-mers can be stored in the feature target sequence set corresponding to each chromosome.
  • because the feature target library is established in advance, the required specific k-mer data can be called directly when detecting whether a chromosome is abnormal, which improves the detection efficiency.
  • the method further includes the following steps:
  • Step 302 Obtain the number of occurrences C, on the corresponding chromosome, of each specific k-mer of each chromosome contained in the target species stored in the target database, and take the number of occurrences of the specific k-mer with the fewest occurrences on that chromosome as the minimum number of occurrences Cm.
  • Step 304 Use the ratio of the number of occurrences C to the minimum number of occurrences Cm as the copy number of the specific k-mer.
  • Step 306 Generate a specific k-mer copy number list corresponding to each chromosome according to the copy number of the specific k-mer included in each chromosome.
  • Step 308 Store the specific k-mer copy number list into the target database.
  • the above step 108 includes: obtaining the copy number of each specific k-mer according to the specific k-mer copy number list.
  • the target species contains multiple chromosomes, and each chromosome contains one or more specific k-mers.
  • the number of occurrences C of each specific k-mer on its chromosome can be obtained, and the number of occurrences of the specific k-mer with the fewest occurrences on that chromosome can be obtained as the minimum number of occurrences Cm.
  • the ratio of the number of occurrences C to the number of occurrences of the k-mer with the least number of occurrences on the chromosome is the copy number of the specific k-mer.
  • the copy number of each specific k-mer can be calculated to generate a list of specific k-mer copy numbers corresponding to the chromosome.
  • Each specific k-mer copy number list can be stored in the characteristic target sequence set corresponding to the chromosome, which is convenient for directly calling the list to obtain relevant data when needed, improving detection efficiency.
  • the copy number of the specific k-mer with the least occurrence is equal to Cm / Cm, that is, the copy number of the specific k-mer with the least occurrence is 1.
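Steps 302 to 306 can be sketched in a few lines. The sketch assumes the per-chromosome occurrence counts C are already available as a dictionary and that every specific k-mer occurs at least once on its chromosome; the name `copy_number_list` and the example counts are hypothetical.

```python
def copy_number_list(occurrences: dict[str, int]) -> dict[str, float]:
    """Map each specific k-mer of one chromosome to its copy number C / Cm."""
    if not occurrences:
        raise ValueError("at least one specific k-mer is required")
    cm = min(occurrences.values())  # minimum number of occurrences Cm
    if cm <= 0:
        raise ValueError("every specific k-mer must occur at least once")
    return {kmer: c / cm for kmer, c in occurrences.items()}


# Made-up counts: the least frequent k-mer gets copy number Cm / Cm = 1.
print(copy_number_list({"AAC": 2, "CGT": 4, "GGA": 6}))
```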
  • the above step 110 includes:
  • Step 502 Calculate the ratio of the actual number of occurrences of each specific k-mer to its copy number.
  • Step 504 Calculate, for each chromosome, the average of the ratios of the actual number of occurrences to the copy number over all of its specific k-mers, and use this average as the single-copy signal intensity of the corresponding chromosome.
  • Step 506 Calculate the actual signal intensity of the corresponding chromosome according to the signal intensity of the single copy of each chromosome.
  • each chromosome can contain multiple specific k-mers, so the ratio of the actual number of occurrences of all specific k-mers contained in each chromosome to the number of copies can be obtained, and the average of the ratio can be obtained. Therefore, each chromosome will have a corresponding average of the ratio of the actual number of occurrences to the number of copies, and this average is the single-copy signal strength of each chromosome. Therefore, the actual signal intensity corresponding to each chromosome can be calculated based on the signal intensity of a single copy of each chromosome.
  • the actual signal intensity of the corresponding chromosome is calculated according to the following formula:
  • the actual signal intensity of the chromosome = (single-copy signal intensity of the chromosome - M) / SD, where M is the average of the single-copy signal intensity of all chromosomes, and SD is the variance of the single-copy signal intensity of all chromosomes.
  • the average value M and the variance of the single-copy signal intensity of all chromosomes can be calculated.
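Steps 502 to 506 can be sketched as follows. The text calls SD the variance; the sketch assumes, from the symbol, that the standard deviation is meant and uses the population form `statistics.pstdev`. Both choices are assumptions, and replacing `pstdev` with `statistics.pvariance` would follow the wording literally. The function names and example values are hypothetical.

```python
from statistics import mean, pstdev


def single_copy_intensity(actual_counts: dict[str, int], copy_numbers: dict[str, float]) -> float:
    """Steps 502-504: average of (actual occurrences / copy number) over one chromosome."""
    ratios = [actual_counts.get(kmer, 0) / cn for kmer, cn in copy_numbers.items()]
    return mean(ratios)


def actual_signal_intensities(single_copy: dict[str, float]) -> dict[str, float]:
    """Step 506: (E - M) / SD for each chromosome, with M and SD taken over all chromosomes."""
    values = list(single_copy.values())
    m = mean(values)
    sd = pstdev(values)  # assumption: SD is the population standard deviation
    if sd == 0:
        raise ValueError("single-copy intensities are identical; SD is zero")
    return {chrom: (e - m) / sd for chrom, e in single_copy.items()}


# Made-up numbers purely for illustration.
e_chr1 = single_copy_intensity({"AAC": 20, "CGT": 40}, {"AAC": 1.0, "CGT": 2.0})
print(e_chr1)
print(actual_signal_intensities({"chr1": 10.0, "chr2": 12.0, "chr3": 14.0}))
```

Normalising against the mean and spread of all chromosomes of the same sample makes the intensity of each chromosome relative to the rest of that sample rather than to its sequencing depth.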
  • the method for detecting an abnormal chromosome copy number further includes the following steps:
  • Step 602 Obtain a preset number of standard test samples, and the standard test samples are samples confirmed to have no abnormal chromosome copy number.
  • Step 604 Obtain the actual number of occurrences of the specific k-mer contained in each chromosome in the standard detection sample in the data to be detected.
  • the chromosome in the data to be detected can be compared with a standard confidence interval list corresponding to a predetermined chromosome, and it can be determined whether there is an abnormal copy number of the chromosome in the data to be detected.
  • to establish the standard confidence interval list corresponding to the chromosomes, a preset number of standard detection samples must be obtained first.
  • the standard test sample is a sample confirmed as having no abnormal chromosome copy number.
  • the preset number is a quantity that can be set by the technician, but it should satisfy the statistical requirements of a large sample. Generally the preset number should be greater than 30, or greater than 100. After multiple standard detection samples are obtained, the actual number of occurrences of the specific k-mers of the chromosomes contained in each standard detection sample can be obtained.
  • Step 606 Obtain a copy number of each specific k-mer in each chromosome included in the standard detection sample from the target database.
  • Step 608 Obtain the standard signal intensity of the corresponding chromosome according to the actual number of occurrences and the copy number of each specific k-mer included in the standard detection sample.
  • the copy number of each specific k-mer refers to the ratio of the number of occurrences of the specific k-mer on the corresponding chromosome to the number of occurrences of the specific k-mer with the least number of occurrences on the chromosome.
  • a standard signal intensity record table can be established according to different genders. For example, if the target species is a human, a standard signal intensity record table of chromosomes in a standard test sample belonging to a male and a standard signal intensity record table of a chromosome in a standard test sample belonging to a female can be established.
  • for example, a standard signal intensity recording table for chromosomes in normal male samples, as shown in FIG. 7A, and a standard signal intensity recording table for chromosomes in normal female samples, as shown in FIG. 7B, can be established.
  • these tables record the standard signal intensity of each chromosome included in the male samples and in the female samples, respectively.
  • the standard signal intensity of chromosome 1 in sample 1 is recorded as S_{1,1},
  • the standard signal intensity of chromosome 2 in sample 1 is recorded as S_{1,2},
  • the standard signal intensity of chromosome 1 in sample i is recorded as S_{i,1}, and
  • the standard signal intensity of chromosome 2 in sample i is recorded as S_{i,2}.
  • the recording method is the same in FIG. 7B.
  • Step 610 According to the standard signal intensity of each chromosome in the multiple standard detection samples, determine the standard confidence interval corresponding to the chromosome at the preset confidence value.
  • Step 612 Obtain a list of standard confidence intervals corresponding to the chromosomes included in the target species according to the standard confidence intervals corresponding to each chromosome.
  • a confidence interval is an interval estimate of a population parameter. It is calculated from a random sample drawn from the population, and it may or may not contain the true population parameter; the probability that it does is called the confidence level.
  • the preset confidence value P here refers to a confidence value set by a technician in advance; it is generally set to at least 0.95 and may approach, but never equal, 1. The technician can adjust it as needed in actual applications. For example, for 95% confidence P is 0.95, and for 99.9% confidence P is 0.999.
  • the two boundary values LB and UB of the standard signal intensity of the chromosome can be determined according to the preset confidence value, which gives the confidence interval corresponding to the preset confidence value.
  • LB is the minimum of the confidence interval, and
  • UB is the maximum of the confidence interval. Therefore, the confidence interval obtained is actually an interval of standard signal intensity.
  • the standard signal intensity interval of each chromosome at the preset confidence value is the standard confidence interval corresponding to that chromosome. Since the target species contains multiple chromosomes, a list of standard confidence intervals corresponding to the chromosomes contained in the target species is obtained.
  • the standard confidence interval list contains the standard confidence interval corresponding to each chromosome. For example, if the preset confidence value P is set to 0.98, the standard signal intensity interval corresponding to each chromosome at a probability of 98% can be obtained.
  • the above step 610 includes:
  • Step 802 Obtain a standard signal intensity of each chromosome contained in each standard detection sample.
  • Step 804 Calculate the mean and variance of the standard signal strengths of the chromosomes included in all the standard detection samples according to the gender of the standard detection samples.
  • Step 806 For each gender, according to the mean and variance of the standard signal intensities in the multiple standard detection samples of that gender, determine the standard confidence interval, at the preset confidence value, of each chromosome contained in the standard detection samples of that gender.
  • each chromosome refers to each numbered chromosome.
  • the mean and variance of the standard signal intensity of chromosome 1 can be calculated.
  • the mean and variance of standard signal intensities of chromosomes 2, 3, ..., 22 and X, Y and other chromosomes can be calculated.
  • the standard confidence interval of each chromosome at the preset confidence value, that is, the corresponding standard signal intensity interval, can then be determined. For example, with humans as the target species, separate distribution tables can be created from the standard test samples of each gender: one giving the distribution of the standard signal intensity of the chromosomes in male samples at the preset confidence value P, and one giving the same for female samples.
  • the normal male sample contains the 22 autosomes and the X and Y sex chromosomes.
  • M ′ represents the average value of the standard signal intensity of all chromosomes
  • SD ′ represents the variance of the standard signal intensity of all chromosomes.
  • LB represents the minimum value of the confidence interval corresponding to the preset confidence value P for each chromosome
  • UB represents the maximum value of the confidence interval corresponding to the preset confidence value P for each chromosome. The minimum and maximum values give the corresponding confidence intervals.
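As a minimal sketch of how LB and UB could be derived from the mean and variance, assume that the standard signal intensities of a chromosome are approximately normally distributed. The source does not state which distribution is used, so this is an assumption, and the function names, the dictionary layout and the sample values below are hypothetical.

```python
from statistics import NormalDist, mean, variance


def standard_confidence_interval(intensities, p=0.98):
    """Return (LB, UB) for one chromosome at the preset confidence value p.

    Assumption (not from the source): intensities are roughly normal, so the
    interval is mean +/- z * sqrt(variance), with z the normal quantile.
    """
    m = mean(intensities)
    sd = variance(intensities) ** 0.5  # sample variance, then its square root
    z = NormalDist().inv_cdf(0.5 + p / 2)  # two-sided quantile for confidence p
    return m - z * sd, m + z * sd


def interval_table(samples, p=0.98):
    """Build {chromosome: (LB, UB)} from a list of per-sample intensity dicts."""
    chromosomes = samples[0].keys()
    return {c: standard_confidence_interval([s[c] for s in samples], p)
            for c in chromosomes}


# Hypothetical standard samples of one gender (values are made up).
male_samples = [
    {"chr1": 1.00, "chrX": 0.52},
    {"chr1": 1.02, "chrX": 0.50},
    {"chr1": 0.98, "chrX": 0.48},
]
male_table = interval_table(male_samples, p=0.98)
```

One table of this kind would be built per gender, which corresponds to the two distribution tables described for male and female samples.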
  • Figure 9B differs from Figure 9A in that the genomes of individuals of different sexes have different chromosomal compositions. For example, Figure 9A, which corresponds to a male sample, includes the X and Y sex chromosomes in addition to the 22 autosomes, whereas Figure 9B, which corresponds to a female sample, includes the 22 autosomes and two X sex chromosomes. The rest of the data have the same meaning in both figures.
  • the standard test sample is a peripheral blood sample of a normal mother carrying a normal baby.
  • the peripheral blood samples include a peripheral blood sample of a normal mother carrying a normal baby boy, a peripheral blood sample of a normal mother carrying a normal baby girl, a peripheral blood sample of a normal mother carrying normal boy twins, a peripheral blood sample of a normal mother carrying normal girl twins, and a peripheral blood sample of a normal mother carrying normal twins of one boy and one girl.
  • Peripheral blood is blood other than bone marrow.
  • a normal mother means that the mother's chromosome copy number is not abnormal
  • a normal baby means that the baby's chromosome copy number is also normal.
  • the criteria for identifying a normal mother or a normal baby can also be adjusted by technicians according to the needs of the actual project.
  • the standard test samples can be peripheral blood samples of normal mothers carrying normal babies.
  • the peripheral blood samples include peripheral blood samples from normal mothers with normal boys, and normal mothers with normal girls.
  • the standard test sample may also be a peripheral blood sample from a normal mother with multiple normal babies.
  • a peripheral blood sample of a normal mother carrying normal triplets
  • a peripheral blood sample of a normal mother carrying normal quadruplets, and so on.
  • the number of babies carried by a normal mother need not be limited; any peripheral blood sample of a normal mother carrying normal babies can be used as a standard test sample.
  • step 610 includes the following steps:
  • Step 1002 Determine a standard confidence interval corresponding to a chromosome at a preset confidence value according to a standard signal intensity of each chromosome contained in a peripheral blood sample of a normal mother carrying a normal baby boy.
  • Step 1004 Determine a standard confidence interval corresponding to a chromosome at a preset confidence value according to a standard signal intensity of each chromosome contained in a peripheral blood sample of a normal mother carrying a normal baby girl.
  • Step 1006 Determine a standard confidence interval corresponding to a chromosome at a preset confidence value according to a standard signal intensity of each chromosome contained in a peripheral blood sample of a normal mother carrying normal boy twins.
  • Step 1008 Determine a standard confidence interval corresponding to a chromosome at a preset confidence value according to a standard signal intensity of each chromosome contained in a peripheral blood sample of a normal mother carrying normal girl twins.
  • Step 1010 Determine a standard confidence interval corresponding to a chromosome at a preset confidence value according to a standard signal intensity of each chromosome contained in a peripheral blood sample of a normal mother carrying normal twins of one boy and one girl.
  • the above steps 1002 to 1010 determine the standard confidence interval of each chromosome at the preset confidence value according to the standard signal intensities of the chromosomes contained in the different kinds of standard detection samples.
  • when the standard test sample is a peripheral blood sample of a normal mother carrying a normal baby boy, the standard confidence interval of each chromosome at the preset confidence value can be determined according to the standard signal intensity of each chromosome contained in that peripheral blood sample.
  • likewise, the standard confidence interval of each chromosome at the preset confidence value can be determined according to the standard signal intensity of each chromosome contained in the peripheral blood sample of a normal mother carrying normal twins of one boy and one girl.
  • the above-mentioned step 112 includes: when it is detected that the actual signal intensity of a chromosome does not fall within the standard confidence interval of that chromosome, determining that chromosome to be a chromosome with abnormal copy number.
  • the standard confidence interval of each chromosome of the target species can be calculated, giving a list of standard confidence intervals. The actual signal intensity of each chromosome contained in the target species to which the data to be tested belong can therefore be compared with the standard confidence interval of the corresponding chromosome obtained in advance. When it is detected that the actual signal intensity of a chromosome does not fall within the standard confidence interval of that chromosome, the chromosome is determined to be a chromosome with abnormal copy number. In the comparison, each chromosome is compared with its own standard confidence interval.
  • the actual signal intensity of chromosome 1 contained in the target species to which the sequencing data belong is compared with the pre-calculated standard confidence interval of chromosome 1.
  • the actual signal intensity of chromosome 2 contained in the target species to which the sequencing data belong is compared with the pre-calculated standard confidence interval of chromosome 2. In this way, all chromosomes contained in the target species to which the sequencing data belong are compared to determine whether any chromosome has an abnormal copy number.
  • the pre-calculated standard confidence interval of chromosome 1 is (LB1, UB1), and it is checked whether the actual signal intensity of chromosome 1 contained in the test sample lies within the interval (LB1, UB1). If it does not, the copy number of chromosome 1 is abnormal; if it does, chromosome 1 is normal and has no copy number abnormality.
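The comparison against (LB, UB) described above can be sketched as follows. The function name, the chromosome labels and the numeric values are hypothetical; the interval is treated as open, matching the (LB1, UB1) notation in the text.

```python
def abnormal_chromosomes(actual, intervals):
    """Return the chromosomes whose actual signal intensity lies outside (LB, UB).

    actual:    {chromosome: actual signal intensity} for the sample under test
    intervals: {chromosome: (LB, UB)} pre-calculated standard confidence intervals
    """
    flagged = []
    for chrom, value in actual.items():
        lb, ub = intervals[chrom]
        if not (lb < value < ub):  # outside the open interval -> abnormal
            flagged.append(chrom)
    return flagged


# Made-up values for illustration only.
intervals = {"chr1": (0.95, 1.05), "chr21": (0.18, 0.22)}
actual = {"chr1": 1.01, "chr21": 0.25}
flagged = abnormal_chromosomes(actual, intervals)
```

With these made-up values only chr21 falls outside its interval, so it is the only chromosome reported.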
  • the method for detecting an abnormal chromosome copy number further includes the following steps:
  • Step 1102 Determine a standard confidence interval list of a chromosome corresponding to each gender according to the gender of the target species.
  • Step 1104 Obtain the gender of the sample to be tested.
  • Step 1106 Compare the actual signal intensity of each chromosome with the standard confidence interval of the corresponding chromosome in the standard confidence interval list for the gender of the sample to be tested.
  • Step 1108 When it is detected that the actual signal intensity of a chromosome does not fall within the standard confidence interval of the corresponding chromosome for that gender, determine that chromosome to be a chromosome with abnormal copy number.
  • the target database stores lists of standard confidence intervals for the chromosomes contained in a sample, created according to gender. Taking humans as an example, the target database stores a distribution table of the standard signal intensities of the chromosomes in normal male samples at the preset confidence value P; this table records the standard confidence interval of each chromosome contained in a normal male sample at the preset confidence value.
  • the target database likewise stores a distribution table of the standard signal intensities of the chromosomes in normal female samples at the preset confidence value P; this table records the standard confidence interval of each chromosome contained in a normal female sample at the preset confidence value.
  • gender classification of the target species means dividing the target species into groups according to gender. For example, when the target species is human, it is divided into male and female. The standard confidence interval of each chromosome can then be determined for each gender. After the target species has been classified by gender, the chromosomes contained in individuals of each gender are known, so the standard confidence interval of each of those chromosomes can be obtained. For example, since a female of the target species has 22 autosomes and two X sex chromosomes, the distribution table of the standard signal intensities of the chromosomes in normal female samples at the preset confidence value P can be obtained from the target database.
  • the standard confidence intervals of the 22 autosomes and the X chromosome are then read from that table. That is, when the sample to be tested comes from a female, the distribution table for females is obtained, and the actual signal intensity of each chromosome of the female sample is compared with the standard confidence interval of that chromosome in the female list of standard confidence intervals. When it is detected that the actual signal intensity of a chromosome does not fall within the standard confidence interval of the corresponding chromosome for that gender, the chromosome is determined to be a chromosome with abnormal copy number.
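Steps 1102 to 1108 can be sketched as a lookup of the gender-specific list followed by the interval check. This is an illustration only: the nested-dictionary layout standing in for the target database, the function name and all values are hypothetical.

```python
def detect_for_sex(actual, sex, tables):
    """Compare a sample with the standard confidence interval list of its sex.

    tables: {sex: {chromosome: (LB, UB)}}, a stand-in for the target database
    actual: {chromosome: actual signal intensity} for the sample under test
    Only chromosomes listed for that sex are compared.
    """
    if sex not in tables:
        raise KeyError(f"no standard confidence interval list for sex {sex!r}")
    flagged = []
    for chrom, (lb, ub) in tables[sex].items():
        if not (lb < actual[chrom] < ub):
            flagged.append(chrom)
    return flagged


# Made-up interval lists for illustration only.
tables = {
    "male": {"chr1": (0.95, 1.05), "chrX": (0.45, 0.55), "chrY": (0.04, 0.06)},
    "female": {"chr1": (0.95, 1.05), "chrX": (0.95, 1.05)},
}
sample = {"chr1": 1.00, "chrX": 1.20}
result = detect_for_sex(sample, "female", tables)
```

Keeping one interval list per gender means the same sample values are judged against different chromosome sets and different bounds depending on the gender recorded for the sample.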
  • the method further includes the following steps:
  • Step 1202 Obtain multiple chromosomes included in the target species.
  • Step 1204 Classify and sort the multiple chromosomes included in the target species.
  • Step 1206 Obtain a pre-selected high-confidence genome that meets a preset reliability condition.
  • Step 1208 Determine a high-confidence genome corresponding to each chromosome contained in the target species.
  • the target species is the species from which the test sample is derived.
  • the human is the target species.
  • the target species can be a human or a species other than a human.
  • the genomic data of target and non-target species can be derived from the RefSeq data set of the NCBI (the reference sequence database of the National Center for Biotechnology Information, which provides biologically meaningful, non-redundant gene and protein sequences) or from other public or private genomes. The genomes of all target and non-target species are integrated into a complete set.
  • an individual's complete genome contains multiple chromosomes. Therefore, after the genomes of different individuals of the target species have been obtained, the multiple chromosomes contained in the target species can be obtained. Multiple genomes of the target species may have been collected, that is, genomes of different individuals or populations of the same target species. Taking humans as the target species, the collected genomes may include genomes from European, North American Indian, and Chinese Han populations. Each chromosome of the target species may therefore be represented by sequences of that chromosome from different genomes. For example, human chromosome 1 can include chromosome 1 of European descent, chromosome 1 of North American Indian descent, and chromosome 1 of Chinese Han descent. The data of each identical chromosome of the target species are put together to form the sequence data set of that chromosome.
  • a pre-selected high-confidence genome that meets the preset credibility conditions is obtained, and the high-confidence genome corresponding to each chromosome of the target species can then be determined.
  • High-confidence genome refers to a genome that satisfies a preset confidence condition. Of course, the order here can also be changed. A large number of genomes can be collected from the NCBI in advance, and these genomes can be screened to select a genome that meets the preset reliability conditions as a high-confidence genome.
  • the high-confidence sequence data set of each chromosome of each target species is determined; that is, the data for the same chromosome from all high-confidence genomes of a target species are put together to form the high-confidence sequence data set of that chromosome.
  • satisfying the preset credibility condition includes any of the following: the proportion of non-deterministic characters contained in the chromosome sequence is lower than a preset proportion threshold; the number of sequence fragments belonging to the same chromosome in the chromosome sequence is below a preset fragment threshold; or, when a chromosome sequence is compared with all other chromosome sequences whose genetic relationship falls within a preset genetic distance threshold range in order to determine the average coverage percentage of the chromosome sequence over those similar chromosome sequences, the average coverage percentage is higher than a preset percentage value.
  • for a DNA genome, the proportion of non-deterministic characters refers to the proportion of non-ACGT characters it contains. If the proportion of non-ACGT characters in a piece of DNA genome data is too high, that piece of data is a suspected low-confidence genome.
  • for DNA or RNA sequences, non-deterministic characters refer to characters other than ACGTU.
  • for protein sequences, non-deterministic characters refer to characters other than those that denote specific amino acids.
  • the genome can be considered to satisfy a preset credibility condition.
  • the genome sequence is a suspected low-confidence genome. That is, when the number of sequence fragments belonging to the same chromosome in a genomic sequence is lower than a preset fragment threshold, the genomic sequence data can also be considered to satisfy the preset credibility condition.
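The first two credibility checks can be sketched as below. The source gives no numeric thresholds, so the default values are placeholders, and the function names are hypothetical. The third check (average coverage against closely related sequences) needs an alignment step and is not sketched here.

```python
def non_deterministic_fraction(sequence, alphabet="ACGT"):
    """Fraction of characters in `sequence` that are not in `alphabet`."""
    if not sequence:
        return 1.0  # assumption: an empty sequence counts as fully undetermined
    allowed = set(alphabet)
    bad = sum(1 for ch in sequence.upper() if ch not in allowed)
    return bad / len(sequence)


def low_ambiguity(sequence, max_fraction=0.01, alphabet="ACGT"):
    """True when the non-deterministic proportion is below the preset threshold."""
    return non_deterministic_fraction(sequence, alphabet) < max_fraction


def few_fragments(fragments, max_fragments=50):
    """True when a chromosome sequence is split into fewer fragments than the threshold."""
    return len(fragments) < max_fragments


# Placeholder inputs for illustration only.
clean = "ACGT" * 100
gappy = "ACGTNN"
```

For DNA or RNA input, passing `alphabet="ACGTU"` corresponds to the broader definition of deterministic characters given above.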
  • Genetic distance refers to an index that measures the size of the overall genetic difference between species (or individuals).
  • each specific k-mer satisfies the following two conditions: its number of occurrences in the genome occurrence index table of the corresponding chromosome meets a first preset error condition; and its number of occurrences in the genome occurrence index table of the corresponding chromosome, together with its number of occurrences in the genome occurrence index table of the complete set, meets a second preset error condition.
  • the genome occurrence index table of a chromosome records, for each k-mer, the number of genomes containing that k-mer among the genomes of the corresponding chromosome; the genome occurrence index table of the complete set records, for each k-mer contained in the chromosomes of the target species, the number of genomes containing that k-mer among the genomes included in the complete set.
  • each chromosome has its own set of characteristic target sequences, and the specific k-mer included in the set of characteristic target sequences refers to a k-mer that satisfies a preset specific condition.
  • the preset specific condition includes a first preset error condition and a second preset error condition. When a k-mer satisfies both conditions at the same time, it is considered to meet the preset specific condition and is taken as a specific k-mer.
  • the complete set refers to the collection of all high-confidence genomes collected.
  • the high-confidence genomes include both the genomes of each target species and the genomes of non-target species, for example high-confidence genomes of pathogenic bacteria, symbiotic bacteria, probiotics, humans, animals, and plants.
  • the genome occurrence index table of a chromosome records, for each k-mer, the number of genomes containing that k-mer among the genomes of the corresponding chromosome.
  • the count recorded for each k-mer in the genome occurrence index table of the complete set represents the number of genomes in the complete set in which that k-mer appears. If a k-mer appears multiple times in the same genome, it is counted only once.
  • in other words, the table of a chromosome records, for each k-mer, the number of genomes of the corresponding chromosome that contain it, while the genome occurrence index table of the complete set records the number of genomes in the complete set that contain it.
  • the selection of specific k-mers involves two parameters, the first preset error condition and the second preset error condition, and thus tolerates a limited degree of non-specificity in the specific k-mers. Without these two parameters, no non-specificity could be tolerated, and it would often be difficult to find any specific k-mer for a chromosome. By selecting specific k-mers with a certain tolerated error and building the characteristic target sequence set from them, specific targets that can represent the chromosome can be found with high probability.
  • the first preset error condition is: the sum of the first threshold and the ratio of the number of occurrences in the genome occurrence index table of the chromosome to the number of genomes contained in that chromosome is greater than or equal to 1.
  • that is, the ratio of the number of occurrences recorded in the genome occurrence index table of the chromosome to the number of genomes of that chromosome, plus the first threshold, must be greater than or equal to 1. Suppose the chromosome has N corresponding genomes, the number of occurrences of a certain k-mer in the genome occurrence index table of the chromosome is C1, and the first threshold is P1; the first preset error condition is then C1 / N + P1 ≥ 1.
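A small sketch of the check C1 / N + P1 ≥ 1 follows. Exact rational arithmetic is used here as a design choice of this example, not something the source prescribes, so that boundary cases such as 19/20 + 0.05 are not affected by floating-point rounding. The function name is hypothetical.

```python
from fractions import Fraction


def meets_first_condition(c1, n, p1):
    """True when C1 / N + P1 >= 1, evaluated exactly.

    c1: number of genomes of the chromosome that contain the k-mer
    n:  number of genomes of the chromosome
    p1: first threshold (acceptable error probability), e.g. 0.05
    """
    return Fraction(c1, n) + Fraction(str(p1)) >= 1


# With N = 20 genomes and P1 = 0.05, a k-mer may be missing from at most one genome.
on_boundary = meets_first_condition(19, 20, 0.05)
too_sparse = meets_first_condition(18, 20, 0.05)
```

Read this way, P1 bounds the fraction of the chromosome's genomes that are allowed to lack the k-mer.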
  • the first threshold value P1 represents an acceptable error probability, and can be any value between 0 and 1.
  • the first threshold value can be set by a technician according to the actual project.
  • the first threshold is less than 5%.
  • the first threshold is an acceptable error probability.
  • the first threshold may be any value between 0 and 1.
  • the first threshold may be set to a value less than 5%.
  • the second preset error condition is: the sum of the second threshold and the ratio of the number of occurrences in the genome occurrence index table of the chromosome to the number of occurrences in the genome occurrence index table of the complete set is greater than or equal to 1.
  • that is, the ratio of the number of occurrences recorded in the genome occurrence index table of the chromosome to the number of occurrences in the genome occurrence index table of the complete set, plus the second threshold, must be greater than or equal to 1.
  • suppose the number of occurrences of a k-mer in the genome occurrence index table of the chromosome is C1, the number of occurrences of the k-mer in the genome occurrence index table of the complete set is C2, and the second threshold is P2; the second preset error condition is then C1 / C2 + P2 ≥ 1.
  • like the first threshold, the second threshold represents an acceptable error probability and can be any value between 0 and 1.
  • the second threshold value P2 can also be set by a technician based on the actual project.
  • the second threshold is less than 5%.
  • like the first threshold, the second threshold represents an acceptable error probability.
  • the second threshold value can also be any value between 0 and 1, and the second threshold value can be set to a value less than 5%.
  • the first threshold and the second threshold may be equal or different.
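Putting the two preset error conditions together, the selection of specific k-mers for one chromosome might look like the sketch below. The dictionary-based index tables and the function name are hypothetical, every k-mer of the chromosome table is assumed to be present in the table of the complete set, and exact fractions are used only to avoid rounding at the boundary.

```python
from fractions import Fraction


def select_specific_kmers(chrom_index, full_index, n_genomes, p1, p2):
    """Return the k-mers satisfying both preset error conditions.

    chrom_index: {k-mer: C1}, genomes of this chromosome containing the k-mer
    full_index:  {k-mer: C2}, genomes of the complete set containing the k-mer
    n_genomes:   N, number of genomes of this chromosome
    p1, p2:      first and second thresholds (acceptable error probabilities)
    """
    p1, p2 = Fraction(str(p1)), Fraction(str(p2))
    specific = []
    for kmer, c1 in chrom_index.items():
        c2 = full_index[kmer]  # assumed present: the complete set includes this chromosome
        first_ok = Fraction(c1, n_genomes) + p1 >= 1   # C1 / N  + P1 >= 1
        second_ok = Fraction(c1, c2) + p2 >= 1         # C1 / C2 + P2 >= 1
        if first_ok and second_ok:
            specific.append(kmer)
    return specific


# Made-up counts: "AAA" is common on the chromosome and rare elsewhere,
# "CCC" is widespread in the complete set, "GGG" is rare on the chromosome.
chrom_index = {"AAA": 10, "CCC": 10, "GGG": 5}
full_index = {"AAA": 10, "CCC": 40, "GGG": 5}
specific = select_specific_kmers(chrom_index, full_index, n_genomes=10, p1=0.05, p2=0.05)
```

In this made-up example only "AAA" passes: "CCC" fails the second condition because it appears in many genomes outside the chromosome, and "GGG" fails the first because it is absent from half of the chromosome's genomes.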
  • before step 102, the method further includes the following steps: generating a genome occurrence index table for each chromosome, which records, for each k-mer, the number of genomes containing that k-mer among the genomes of the corresponding chromosome; and storing the genome occurrence index table in the feature target sequence set of that chromosome.
  • the genome is all the genetic information in an organism. This genetic information is stored in the form of a nucleotide sequence.
  • the sum of the genetic material in a complete monomer of an organism is the genome.
  • Each individual's complete genome can contain multiple chromosomes, while the genome of each chromosome can contain multiple k-mers.
  • the term "chromosome genome” commonly used in the art is used here to refer to the sum of all sequences contained in a complete chromosome.
  • the genome occurrence index table of each chromosome records how many of the genomes corresponding to that chromosome contain each k-mer; that is, for each k-mer, it records the number of genomes containing that k-mer among the genomes of the corresponding chromosome.
  • the genome occurrence index table of each chromosome can be stored in the feature target sequence set of that chromosome, that is, in the target database. Once stored, the data in the genome occurrence index table can be retrieved from the target database whenever needed, which improves detection efficiency.
  • before the sequencing data of the sample are obtained, the method further includes: generating a genome occurrence index table of the complete set, which records, for each k-mer, the number of genomes containing that k-mer among the genomes included in the complete set.
  • the genome occurrence index table of the complete set is stored in the target database.
  • a characteristic target sequence set corresponding to each chromosome is stored.
  • the complete set contains all the high-confidence genomes collected; that is, it contains both the high-confidence genomes of the target species corresponding to the data to be detected and the high-confidence genomes of multiple species other than that target species.
  • the genome occurrence index table of the complete set records in how many genomes of the complete set each k-mer contained in each chromosome appears; that is, for each k-mer, it records the number of genomes containing that k-mer among the genomes of the complete set.
  • in other words, the table records how many genomes of the complete set contain each k-mer.
  • what is counted is the number of genomes, not the number of k-mer occurrences. If a k-mer occurs more than once in the same genome, it is still counted only once in the genome occurrence index table of the complete set.
  • an index table of the number of occurrences of the genome for the complete set can be established.
  • the genome occurrence index table of the complete set is different from the genome occurrence index table of each chromosome.
  • the genome occurrence index table of a chromosome belongs to that chromosome, and each chromosome has its own table, whereas only one genome occurrence index table of the complete set is generated, covering all the data. Once the generated table of the complete set has been stored, it can be retrieved from the target database whenever it is needed during detection, thereby improving detection efficiency.
  • the method further includes: generating a specific k-mer actual occurrence frequency record table corresponding to the chromosome according to the actual occurrence number.
  • the specific k-mers contained in each chromosome are stored. After the data to be detected are obtained, they can be compared with the specific k-mers of each chromosome to obtain the actual number of times each specific k-mer appears in the data to be detected. From these counts, a record table of the actual occurrences of the specific k-mers of each chromosome can be generated.
  • M corresponding record tables of the actual occurrences of specific k-mers will be generated; each table records the actual number of occurrences in the sequencing data of the specific k-mers contained in the corresponding chromosome.
  • in the record table of the actual occurrences of the specific k-mers of a particular chromosome, the leftmost column records the specific k-mers contained in chromosome X, and the second column records the actual numbers of occurrences of the corresponding specific k-mers in the sequencing data, C1, C2, .... A record table of the actual occurrences of the specific k-mers is generated from these counts, and the data are stored for subsequent retrieval, thereby improving detection efficiency.
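The record table of actual occurrences can be sketched as a dictionary from each specific k-mer to its count in the sequencing reads. This is a simplified illustration: the function name is hypothetical, reads are scanned on the forward strand only, and reverse complements and sequencing errors are ignored.

```python
def count_specific_kmers(reads, specific_kmers, k):
    """Count how often each specific k-mer of one chromosome occurs in the reads.

    Returns {specific k-mer: actual number of occurrences}; k-mers that never
    occur are kept with a count of 0 so every table has the same rows.
    """
    counts = {kmer: 0 for kmer in specific_kmers}
    for read in reads:
        for i in range(len(read) - k + 1):
            window = read[i:i + k]
            if window in counts:
                counts[window] += 1
    return counts


# Toy reads and specific k-mers for illustration only.
reads = ["ACGTAC", "GTAC"]
table = count_specific_kmers(reads, ["GTA", "TTT"], k=3)
```

One such table would be built per chromosome, giving the M record tables mentioned above.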
  • a method for detecting an abnormal chromosome copy number includes the following steps:
  • Step 1402 A feature target sequence set corresponding to each chromosome is established.
  • step 1402 includes:
  • Step 1402A Collection and sorting of high-confidence genomes.
  • the high-confidence genomes can include both genomes of the target species corresponding to the data to be detected and genomes that do not belong to that target species, for example high-confidence genomes of commensal bacteria, probiotics, humans, animals, plants, and the like.
  • High confidence genomes can be derived from the NCBI's RefSeq dataset or other public or private high confidence genomes.
  • for example, for a DNA genome, the proportion of non-deterministic characters refers to the proportion of non-ACGT characters it contains. If the proportion of non-ACGT characters in a piece of DNA genome data is too high, that piece of data is a suspected low-confidence genome. For DNA or RNA sequences, non-deterministic characters refer to characters other than ACGTU. For protein sequences, non-deterministic characters refer to characters other than those that denote specific amino acids.
  • Genomes with a low average percentage of coverage are those that are suspected of having low completion, ie, low confidence.
  • Genetic distance refers to an index that measures the size of the overall genetic difference between species (or individuals).
  • Step 1402B Determine a high-confidence sequence data set of each chromosome in the target species corresponding to the data to be detected.
  • each chromosome of the target species may contain sequences belonging to that chromosome from a different genome.
  • human chromosome 1 can include chromosome 1 of European descent, chromosome 1 of North American Indian descent, and chromosome 1 of Chinese Han descent.
  • the data of each identical chromosome of all high-confidence genomes of the target species are put together, that is, a high-confidence sequence data set of each chromosome of the target species is assembled.
  • the high-confidence sequence data sets of all chromosomes of the target species and the high-confidence sequence data sets of all non-target species are brought together to form the complete set. That is, the complete set combines the high-confidence sequence data sets of all chromosomes of the target species corresponding to the data to be detected with the high-confidence sequence data sets of all other species.
  • the normal copy number ratio of each chromosome of the target species corresponding to the data to be detected is determined, and autosomes are distinguished from sex chromosomes.
  • a normal human genome contains 23 pairs and a total of 46 chromosomes.
  • chromosomes 1 to 22 are autosomes, and their copy numbers are two.
  • X and Y chromosomes are sex chromosomes. Normal males have one X chromosome and one Y chromosome. Normal females have two X chromosomes and no Y chromosome.
  • Copy number refers to the number of copies of a certain gene or a specific DNA sequence in a haploid genome.
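The normal copy numbers stated above can be written down as a small lookup, shown here only to make the autosome and sex chromosome distinction concrete. The function name and the chromosome labels are hypothetical.

```python
def normal_copy_numbers(sex):
    """Return {chromosome: normal copy number} for a human of the given sex."""
    copies = {str(i): 2 for i in range(1, 23)}  # autosomes 1..22, two copies each
    if sex == "male":
        copies.update({"X": 1, "Y": 1})
    elif sex == "female":
        copies.update({"X": 2, "Y": 0})
    else:
        raise ValueError(f"unknown sex: {sex!r}")
    return copies


male = normal_copy_numbers("male")
female = normal_copy_numbers("female")
```

Both tables sum to 46 chromosomes, which matches the 23 pairs described above.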
  • the information determined in FIG. 16 is generated only once when the target species corresponding to the data to be detected is determined, and then the information in FIG. 16 is called when analyzing each sample data that needs to be detected.
  • Step 1402C Generate the genome occurrence index table of the complete set.
  • the genomic occurrence index table of the corpus can be generated.
  • k-mer refers to a genomic sequence of length k.
  • k can be chosen freely and is generally set between 11 and 32. If the genomic data contain a distinct deterministic characters (a being the alphabet size), then for a given k there are at most a^k (a to the power of k) different possible k-mers.
  • DNA has four deterministic characters, A, C, G and T, so for a given k there are 4^k possible different k-mers.
  • a genome of length n contains at most n - k + 1 different k-mers.
  • in practice, the number of different k-mers contained in a genome of n characters can be much smaller than n - k + 1. Therefore, with an ordinary k-mer counting method, a given k-mer may appear multiple times in a given genome and be counted multiple times.
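The counts discussed above can be checked on a toy sequence. The helper below is a minimal sketch with a hypothetical name; it slides a window of length k over the sequence, so it yields exactly n - k + 1 windows when n is at least k.

```python
def kmers(sequence, k):
    """Return all windows of length k in order (n - k + 1 of them when n >= k)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


sequence = "ACGTACGT"   # n = 8
k = 4
windows = kmers(sequence, k)   # n - k + 1 = 5 windows
distinct = set(windows)        # "ACGT" occurs twice, so fewer distinct k-mers
upper_bound = 4 ** k           # number of possible DNA k-mers for this k
```

Here the repeated "ACGT" window shows why ordinary counting can count the same k-mer more than once within a single genome.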
  • the genome occurrence index table of the complete set differs from that ordinary method: if a k-mer occurs more than once in a genome, the table still counts it only once. The count of a k-mer in the resulting k-mer genome occurrence index table therefore represents the number of genomes of the complete set in which the k-mer appears.
  • each chromosome of the target genome can be treated here as if it were a species; that is, each individual sequence in the high-confidence data set of a chromosome of the target species that can completely represent that chromosome is regarded as a single genome.
  • for example, the high-confidence data set of human chromosome 1 may contain three sequences: the chromosome 1 sequence of European descent, the chromosome 1 sequence of North American Indian descent, and the chromosome 1 sequence of Chinese Han descent. The European chromosome 1 sequence is then regarded as a complete, independent genome in the counting for the k-mer genome occurrence index table, and so is the North American Indian chromosome 1 sequence.
  • likewise, the Chinese Han chromosome 1 sequence is regarded as a complete, independent genome in the counting for the k-mer genome occurrence index table.
  • In step 1402D, a genome occurrence index table corresponding to each chromosome is generated.
  • The genome occurrence index table of a chromosome differs from the genome occurrence index table of the complete set in step 1402C.
  • The table of the complete set records, for each k-mer, how many genomes in the complete set contain it. The table corresponding to a chromosome is built per chromosome and records, for each k-mer contained in that chromosome, how many of the genomes corresponding to that chromosome contain it.
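  • The two kinds of table can be sketched as follows, assuming each high-confidence sequence is held as a plain string. The function names and the toy sequences are hypothetical; the only point illustrated is that each genome contributes at most 1 to the count of a k-mer.

```python
from collections import Counter


def kmer_set(sequence: str, k: int) -> set:
    """Distinct k-mers of one genome (duplicates inside the genome collapse)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def genome_occurrence_index(genomes, k: int) -> Counter:
    """For each k-mer, count how many of the given genomes contain it."""
    index = Counter()
    for sequence in genomes:
        index.update(kmer_set(sequence, k))  # each genome adds at most 1 per k-mer
    return index


# Toy high-confidence data sets (assumption): two genomes per chromosome.
high_confidence = {
    "chr1": ["ACGTAC", "ACGTAA"],
    "chr2": ["TTGACG", "TTGACC"],
}
k = 3

# One table per chromosome (step 1402D).
per_chromosome = {
    name: genome_occurrence_index(seqs, k) for name, seqs in high_confidence.items()
}
# One table over every genome of every chromosome (complete set, step 1402C).
complete_set = genome_occurrence_index(
    [seq for seqs in high_confidence.values() for seq in seqs], k
)

print(per_chromosome["chr1"]["ACG"], complete_set["ACG"])
```

  • Here ACG is found in both chromosome 1 genomes and in one chromosome 2 genome, so its per-chromosome count and its complete-set count differ.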
  • Step 1402E Generate a specific k-mer table corresponding to each chromosome.
  • The specific k-mer table corresponding to each chromosome records the k-mers of that chromosome that satisfy the preset specificity conditions, that is, the specific k-mers.
  • A specific k-mer is a k-mer selected because it meets the preset specificity conditions. The selection of a specific k-mer must satisfy the following two conditions:
  • Suppose the high-confidence data set of the chromosome contains N genomes and the number of occurrences of a given k-mer in the genome occurrence index table corresponding to the chromosome is C1. The first condition is that the ratio of C1 to N, plus the first threshold P1, is greater than or equal to 1.
  • The second condition is that the ratio of C1 to the number of occurrences of the k-mer in the genome occurrence index table of the complete set, plus the second threshold P2, is greater than or equal to 1.
  • The first threshold P1 and the second threshold P2 may be equal or different.
  • Introducing the two parameters P1 and P2 allows an error rate within a certain range, that is, it tolerates a limited degree of non-specificity in the specific k-mers. Without these two parameters no non-specificity can be tolerated, and it is often difficult to find any specific k-mer for a given chromosome.
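  • The selection can be sketched as below, using the two preset error conditions as they are stated later in this document: the ratio of C1 to N plus P1 must be at least 1, and the ratio of C1 to the complete-set count plus P2 must be at least 1. The symbol `c2` for the complete-set count, the function name, and the toy tables are assumptions made for illustration.

```python
def select_specific_kmers(chrom_index, complete_index, n_genomes, p1, p2):
    """Return the k-mers of one chromosome that pass both specificity conditions."""
    specific = []
    for kmer, c1 in chrom_index.items():
        c2 = complete_index[kmer]                   # genomes in the complete set containing it
        present_in_most = c1 / n_genomes + p1 >= 1  # first preset error condition
        rare_elsewhere = c1 / c2 + p2 >= 1          # second preset error condition
        if present_in_most and rare_elsewhere:
            specific.append(kmer)
    return sorted(specific)


# Toy tables (assumption): chromosome 1 has N = 2 high-confidence genomes.
chr1_index = {"ACG": 2, "CGT": 2, "GTA": 2, "TAC": 1, "TAA": 1}
complete_index = {"ACG": 3, "CGT": 2, "GTA": 2, "TAC": 1, "TAA": 1}

specific = select_specific_kmers(chr1_index, complete_index, n_genomes=2, p1=0.01, p2=0.01)
print(specific)
```

  • In this example ACG is rejected because it also occurs in a genome of another chromosome, and TAC and TAA are rejected because they occur in only one of the two chromosome 1 genomes.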
  • The probability of a false positive on this chromosome is less than or equal to P2^n' (that is, P2 raised to the power n'). For sufficiently large n', the probability of a false positive is extremely small.
  • The false negative rate is the proportion of positives that produce a negative test result, that is, the conditional probability of a negative test result given that the condition being tested for is present.
  • When calculating the false positive probability, the k-mers can be corrected for independence. For any two k-mers A and B in the specific k-mer list, if no fewer than j characters coincide at their ends (for example, the last j characters of A are exactly the same as the first j characters of B), then A and B are considered to have coincident ends.
  • j is generally greater than 5 and less than or equal to k-1, that is, 5 < j ≤ k-1.
  • The end coincidence check should cover four pairings: A and B; the reverse complement A' of A and B; A and the reverse complement B' of B; and A' and B'.
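  • A minimal sketch of the end coincidence check follows. It treats "no fewer than j characters" as an end overlap of any length from j up to one less than the shorter sequence, and it tests the four pairings listed above. The function names and the 8-character toy sequences are assumptions.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Reverse complement of a DNA string made of A, C, G, T."""
    return kmer.translate(_COMPLEMENT)[::-1]


def ends_overlap(a: str, b: str, j: int) -> bool:
    """True if a suffix of one sequence equals a prefix of the other, length >= j."""
    longest = min(len(a), len(b)) - 1
    return any(
        a[-n:] == b[:n] or b[-n:] == a[:n] for n in range(j, longest + 1)
    )


def coincident_ends(a: str, b: str, j: int) -> bool:
    """Check A/B, A'/B, A/B' and A'/B', where ' denotes the reverse complement."""
    a_rc, b_rc = reverse_complement(a), reverse_complement(b)
    pairings = [(a, b), (a_rc, b), (a, b_rc), (a_rc, b_rc)]
    return any(ends_overlap(x, y, j) for x, y in pairings)


A = "AACCGGTA"
B = "CCGGTATT"        # first 6 characters equal the last 6 of A
B_RC = "AATACCGG"     # reverse complement of B

print(coincident_ends(A, B, 6), coincident_ends(A, B_RC, 6))
```

  • The second call only reports a coincidence because the reverse complement pairings are included; the direct comparison of A with that sequence finds no overlap.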
  • Each specific k-mer or specific region retained in the table in its final state is a non-coincident specific region.
  • Multiple k-mers belonging to the same non-coincident specific region contribute the value of P1 or P2 only once. If the target species has M chromosomes, M specific k-mer tables, one per chromosome, are created here.
  • Step 1402F Generate a specific k-mer copy number list corresponding to each chromosome.
  • For each chromosome, the number of occurrences C of each selected specific k-mer is counted, that is, the total number of times the specific k-mer of this chromosome occurs in all genomes of the high-confidence data set is recorded.
  • The copy number of each specific k-mer of the chromosome is calculated relative to Cm, the number of occurrences of the least frequent specific k-mer of that chromosome: it is the ratio of the k-mer's number of occurrences C to Cm. If the target species has M chromosomes, M specific k-mer copy number lists, one per chromosome, are created here.
  • The copy number of a specific k-mer is therefore a value greater than or equal to 1.
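  • As an illustration of the copy number list, the following divides each specific k-mer's number of occurrences C by the smallest number of occurrences Cm on the same chromosome. The function name and the occurrence values are made up for the example.

```python
def copy_numbers(occurrences: dict) -> dict:
    """Copy number of each specific k-mer: its occurrences C divided by Cm."""
    c_min = min(occurrences.values())  # Cm, the least frequent specific k-mer
    return {kmer: c / c_min for kmer, c in occurrences.items()}


# Toy occurrence counts of one chromosome's specific k-mers (assumption).
occurrences_chr1 = {"CGT": 3, "GTA": 6, "TAC": 3}
copy_number_chr1 = copy_numbers(occurrences_chr1)
print(copy_number_chr1)
```

  • Because every count is divided by the minimum, the smallest copy number is exactly 1 and all others are at least 1.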
  • Module A can be run from time to time in order to continuously update the feature target sequence set corresponding to each chromosome, that is, update the target database. For example, whenever the reference genome data is updated, module A can be run. However, module A does not need to be run or updated during the analysis of each actual sample.
  • Step 1404 Calculate the actual signal strength of each chromosome contained in the target sample corresponding to the data to be detected.
  • step 1404 includes:
  • Step 1404A Obtain data to be detected.
  • Step 1404B Obtain a specific k-mer list and a specific k-mer copy number list.
  • Step 1404C Obtain the actual number of occurrences of the specific k-mer contained in each chromosome in the data to be detected.
  • The specific k-mer list and the specific k-mer copy number list of each chromosome of the target species, generated in step 1402, are loaded. If the target species corresponding to the data to be detected has M chromosomes, M specific k-mer lists and M specific k-mer copy number lists, one of each per chromosome, need to be loaded. The actual number of occurrences, in the data to be detected, of each specific k-mer contained in each chromosome of the target species is then obtained and recorded at the corresponding position in the actual-occurrence record table of the corresponding chromosome. That is, from the actual numbers of occurrences of the specific k-mers of each chromosome in the data to be detected, a record table of actual occurrences of specific k-mers is generated for that chromosome.
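  • Counting the actual occurrences can be sketched as a single pass over the sequencing reads. This sketch assumes reads are plain strings and counts forward-strand matches only; whether reverse complements are also counted is not stated here, so that choice is an assumption, as are the names used.

```python
def count_specific_kmers(reads, specific_kmers, k: int) -> dict:
    """Count how often each specific k-mer occurs across all reads."""
    counts = {kmer: 0 for kmer in specific_kmers}  # every specific k-mer gets a row
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in counts:
                counts[kmer] += 1
    return counts


# Toy data (assumption): two reads and one specific k-mer list per chromosome.
reads = ["ACGTAC", "CGTACG"]
specific_by_chromosome = {"chr1": ["CGT", "GTA"], "chr2": ["TTG"]}
k = 3

# One actual-occurrence record table per chromosome.
actual_by_chromosome = {
    chrom: count_specific_kmers(reads, kmer_list, k)
    for chrom, kmer_list in specific_by_chromosome.items()
}
print(actual_by_chromosome)
```

  • Only the specific k-mers are looked up, so the work per read does not depend on the size of the whole genome.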
  • In step 1404D, the single-copy signal intensity E of each chromosome is calculated.
  • A single-copy signal intensity calculation table for a specific chromosome is shown in FIG.
  • For any specific k-mer belonging to this chromosome, the adjusted number of occurrences can be obtained as the ratio of its actual number of occurrences to its copy number.
  • The adjusted numbers of occurrences of all specific k-mers of the chromosome are averaged, and the average is the single-copy signal intensity E of the chromosome.
  • The single-copy signal intensity E of each chromosome of the target species can be recorded and stored in the single-copy signal intensity record table of each chromosome, as shown in FIG. 19.
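  • The single-copy signal intensity can be sketched as the mean of the actual number of occurrences divided by the copy number, which is how the determination module is described later in this document. The function name and the numbers are illustrative assumptions.

```python
from statistics import mean


def single_copy_signal(actual_counts: dict, copy_number: dict) -> float:
    """E: mean over specific k-mers of (actual occurrences / copy number)."""
    adjusted = [actual_counts[kmer] / copy_number[kmer] for kmer in copy_number]
    return mean(adjusted)


# Toy values for one chromosome (assumption).
actual_chr1 = {"CGT": 10, "GTA": 22, "TAC": 9}
copy_number_chr1 = {"CGT": 1.0, "GTA": 2.0, "TAC": 1.0}

e_chr1 = single_copy_signal(actual_chr1, copy_number_chr1)
print(e_chr1)
```

  • Dividing by the copy number puts a k-mer that naturally occurs twice on the chromosome on the same scale as one that occurs once.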
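  • Step 1404E Calculate the actual signal strength S of each chromosome.
  • From the single-copy signal intensities E of all chromosomes, their average M and variance SD can be calculated.
  • The actual signal intensity S of a chromosome is then (E - M) / SD, where E is the single-copy signal intensity of that chromosome. The actual signal intensity of every other chromosome is calculated in the same way.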
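  • A minimal sketch of this normalisation follows, using the formula given for the determination module: actual signal intensity = (single-copy signal intensity - M) / SD. The text calls SD the variance; the symbol and the form of the formula suggest a standard deviation, so this sketch assumes the population standard deviation. That choice, the function name, and the values are assumptions.

```python
from statistics import mean, pstdev


def actual_signals(single_copy: dict) -> dict:
    """S = (E - M) / SD for every chromosome, with M and SD taken over all E."""
    values = list(single_copy.values())
    m = mean(values)
    sd = pstdev(values)  # assumption: SD is a population standard deviation
    return {chrom: (e - m) / sd for chrom, e in single_copy.items()}


# Toy single-copy signal intensities E (assumption).
single_copy = {"chr1": 10.0, "chr2": 12.0, "chr3": 14.0}
signals = actual_signals(single_copy)
print(signals)
```

  • The result is centred on zero, so a chromosome with more signal than the sample-wide average has a positive value and one with less has a negative value.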
  • Step 1406 Calculate a standard confidence interval list corresponding to the chromosome contained in the target species according to the standard detection sample.
  • the actual signal intensity of each chromosome contained in each standard test sample can be calculated in the manner in step 1404.
  • The actual signal intensity of a chromosome in a standard test sample is referred to as its standard signal intensity.
  • the standard signal intensity of each chromosome contained in each standard detection sample can be calculated.
  • The standard signal intensities of the chromosomes contained in all the standard test samples can be recorded in a table. Further, the records can be kept separately by sex; that is, a standard signal intensity record table of chromosomes in normal male samples and a standard signal intensity record table of chromosomes in normal female samples are generated.
  • The standard signal intensities of each chromosome in all the standard detection samples are then analysed statistically: for each chromosome, the mean M' and the variance SD' of its standard signal intensity across the standard detection samples are calculated.
  • For example, if the standard test samples are human and there are 100 of them, then there are 100 copies of chromosome 1, 100 copies of chromosome 2, ..., and 100 copies of chromosome 22.
  • The number of X and Y sex chromosomes depends on the sexes of these 100 people, so to obtain enough X and Y chromosomes a sufficient number of standard test samples of each sex is also required. For chromosome 1, then, there are 100 standard signal intensities.
  • The mean and variance for chromosome 1 can be calculated from the standard signal intensities of these 100 copies of chromosome 1, and the mean and variance of the standard signal intensities of the other chromosomes can be calculated in the same way.
  • From the mean and variance, the standard confidence interval of each chromosome contained in the standard detection samples at a preset confidence value P can be determined, that is, an interval of standard signal intensity. In other words, the two boundary values LB and UB of the confidence interval with confidence P are obtained. LB is the minimum of the confidence interval, and UB is the maximum of the confidence interval.
  • P is generally a value greater than 0.95 that can be arbitrarily close to, but not equal to, 1. In practical applications, the confidence level can be adjusted as required. For example, P is 0.95 for 95% confidence and 0.999 for 99.9% confidence.
  • A distribution table of the P-confidence boundary values of the actual signal intensities of the chromosomes can be obtained for each of the two sexes of the target species.
  • The standard confidence interval of each chromosome of the target species can thus be estimated statistically from the standard signal intensities of the chromosomes in a large number of standard test samples. That is, the interval in which the actual signal intensity of each chromosome of the target species falls in a normal sample, at the preset confidence value P, is estimated.
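  • How LB and UB are derived from the mean and variance is not spelled out here. One common choice, shown purely as an assumption, is to model the standard signal intensities of a chromosome as normally distributed and take the central interval that contains probability P. The function name and sample values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def standard_confidence_interval(standard_signals, p: float):
    """Central interval (LB, UB) holding probability p under a normal model."""
    m = mean(standard_signals)
    sd = stdev(standard_signals)  # assumption: sample standard deviation
    dist = NormalDist(mu=m, sigma=sd)
    lb = dist.inv_cdf((1 - p) / 2)
    ub = dist.inv_cdf((1 + p) / 2)
    return lb, ub


# Toy standard signal intensities of one chromosome across five normal samples.
chr1_standard = [-0.2, -0.1, 0.0, 0.1, 0.2]
lb, ub = standard_confidence_interval(chr1_standard, p=0.95)
print(lb, ub)
```

  • Raising P widens the interval, which is consistent with the trade-off between false positives and false negatives described for the detection step.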
  • The above standard test samples can also be peripheral blood samples of normal mothers carrying normal babies, namely peripheral blood samples of a normal mother carrying: a normal baby boy; a normal baby girl; normal baby boy twins; normal baby girl twins; or normal twins of one boy and one girl. When the distribution table of P-confidence boundary values is made, it can therefore also be adjusted according to the type of standard detection sample.
  • Step 1408 Detect whether there is an abnormal copy number in the data to be detected.
  • The actual signal intensity of each chromosome can be compared with the standard confidence interval of the corresponding chromosome of the target species obtained in step 1406 at the preset confidence value P.
  • For example, the actual signal intensity of chromosome 1 of the target species corresponding to the data to be detected is compared with the standard confidence interval of chromosome 1.
  • If the actual signal intensity of chromosome 1 is not within the standard confidence interval of chromosome 1, it can be determined that chromosome 1 has a copy number abnormality. Otherwise, it can be determined that chromosome 1 has no copy number abnormality.
  • In step 1406, a distribution table of the confidence interval boundary values at the preset confidence value P is established for the standard signal intensities of the chromosomes in the samples of each sex of the target species. The actual signal intensities of the sex chromosomes can therefore also be compared with the tables for the different sexes: the actual signal intensity of the X chromosome and that of the Y chromosome calculated from the data to be tested are compared with the confidence interval boundary values in the table for each sex.
  • If the calculated actual signal intensities of the X chromosome and the Y chromosome fall within the confidence intervals in the table for normal male samples, the sex corresponding to the data to be tested is male. If they fall within the confidence intervals in the table for normal female samples, the sex corresponding to the data to be tested is female.
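  • The sex check can be sketched as a lookup of the X and Y signal intensities in each sex's interval table. The interval values below are invented for illustration and do not come from real samples; the function names are also assumptions.

```python
def within(value: float, interval) -> bool:
    """True if LB <= value <= UB."""
    lb, ub = interval
    return lb <= value <= ub


def infer_sex(signal: dict, intervals_by_sex: dict):
    """Return the sex whose X and Y intervals both contain the sample's signals."""
    for sex, intervals in intervals_by_sex.items():
        if all(within(signal[chrom], intervals[chrom]) for chrom in ("X", "Y")):
            return sex
    return None  # neither table matches; no sex is inferred


# Hypothetical (LB, UB) tables for the sex chromosomes.
intervals_by_sex = {
    "male": {"X": (-1.2, -0.4), "Y": (-2.0, -1.0)},
    "female": {"X": (0.2, 1.0), "Y": (-4.0, -3.0)},
}

print(infer_sex({"X": -0.8, "Y": -1.5}, intervals_by_sex))
```

  • A sample that fits neither table returns `None` in this sketch; how such a case should be handled is not specified here.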
  • the actual signal strength of each chromosome in the data to be detected is compared with the confidence interval of each chromosome in the distribution table of the preset confidence value P.
  • the probability of false positives can be reduced by increasing the preset reliability value P. But increasing P increases the probability of false negatives.
  • the actual signal intensity of each chromosome can be compared with the standard confidence interval of the corresponding chromosome, and the chromosome that is not within the standard confidence interval of the corresponding chromosome can be determined as a chromosome with abnormal copy number.
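  • The comparison in step 1408 can be sketched as follows. The chromosome names, signal values and interval boundaries are hypothetical and serve only to show the decision rule.

```python
def abnormal_chromosomes(actual_signal: dict, intervals: dict) -> list:
    """Chromosomes whose actual signal lies outside their (LB, UB) interval."""
    flagged = []
    for chrom, s in actual_signal.items():
        lb, ub = intervals[chrom]
        if not (lb <= s <= ub):
            flagged.append(chrom)
    return flagged


# Hypothetical values: chr21 lies above its interval, chr1 lies inside.
actual_signal = {"chr1": 0.1, "chr21": 2.9}
intervals = {"chr1": (-0.5, 0.5), "chr21": (-0.6, 0.6)}

flagged = abnormal_chromosomes(actual_signal, intervals)
print(flagged)
```

  • A value exactly on a boundary is treated as inside the interval in this sketch; the text does not say which way that case should go.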
  • This method of detecting chromosome copy number abnormalities compares the data against the characteristic target sequences of each chromosome of the target species, that is, the specific k-mers, which are only a part of the whole genome of the target species.
  • Comparing against the specific k-mers reduces the comparison space, thereby shortening the analysis time and improving detection efficiency.
  • The characteristic target sequences of each chromosome generated here integrate multiple genomes of different individuals or populations of the target species, which avoids the problem that "when a set of data comes from an individual that is genetically distant from the reference genome, the effect of using whole-genome alignment becomes worse".
  • A device for detecting an abnormal chromosome copy number is provided, including:
  • a specific k-mer acquisition module 2102, used to obtain sequencing data of a sample to be detected as the data to be detected and determine a target species corresponding to the data to be detected; and to acquire the specific k-mers, stored in the target database, corresponding to each chromosome contained in the target species, where a specific k-mer is a k-mer in a chromosome that meets the preset specificity conditions and a k-mer is a genomic sequence of length k;
  • an actual appearance frequency obtaining module 2104, configured to obtain the actual number of occurrences, in the data to be detected, of the specific k-mers contained in each chromosome;
  • a copy number acquisition module 2106, used to obtain the copy number of each specific k-mer from the target database, where the copy number is the ratio of the number of occurrences of the specific k-mer on the corresponding chromosome to the number of occurrences of the least frequent specific k-mer on that chromosome;
  • a determination module 2108, configured to calculate the actual signal intensity of the corresponding chromosome according to the actual number of occurrences and the copy number of each specific k-mer, and to determine a chromosome whose actual signal intensity is not within the standard confidence interval of the corresponding chromosome to be a chromosome with abnormal copy number.
  • The determination module 2108 is further configured to calculate the ratio of the actual number of occurrences of each specific k-mer to its copy number; calculate the average of these ratios over all specific k-mers contained in each chromosome as the single-copy signal intensity of the corresponding chromosome; and calculate the actual signal intensity of the corresponding chromosome from the single-copy signal intensity of each chromosome.
  • The actual signal intensity of the corresponding chromosome is calculated according to the following formula:
  • actual signal intensity of the chromosome = (single-copy signal intensity of the chromosome - M) / SD, where M is the average of the single-copy signal intensities of all chromosomes, and SD is the variance of the single-copy signal intensities of all chromosomes.
  • The apparatus for detecting abnormal copy number of a chromosome further includes a standard confidence interval list calculation module (not shown in the figure), configured to: obtain a preset number of standard test samples, the standard test samples being samples confirmed to have no chromosome copy number abnormality; obtain the actual number of occurrences, in the data of each standard test sample, of the specific k-mers contained in each chromosome of the standard test sample; obtain from the target database the copy number of each specific k-mer of each chromosome contained in the standard test sample; calculate the standard signal intensity of the corresponding chromosome according to the actual number of occurrences and the copy number of each specific k-mer contained in the standard test sample; determine, from the standard signal intensities of each chromosome in the multiple standard test samples, the standard confidence interval corresponding to the chromosome at a preset confidence value; and obtain, from the standard confidence interval corresponding to each chromosome, a list of standard confidence intervals corresponding to the chromosomes contained in the target species.
  • The above-mentioned standard confidence interval list calculation module is further configured to obtain the standard signal intensity of each chromosome contained in each standard detection sample; calculate, for each sex, the mean and variance of the standard signal intensities of each chromosome across the multiple standard test samples; and, based on that mean and variance for the corresponding sex, determine the standard confidence interval, at the preset confidence value, of each chromosome contained in the standard test samples of each sex.
  • The standard test sample is a peripheral blood sample of a normal mother carrying a normal baby.
  • The peripheral blood samples include peripheral blood samples of a normal mother carrying a normal baby boy, of a normal mother carrying a normal baby girl, of a normal mother carrying normal baby boy twins, of a normal mother carrying normal baby girl twins, and of a normal mother carrying normal twins of one boy and one girl.
  • The above-mentioned standard confidence interval list calculation module is further configured to determine the standard confidence interval of each chromosome at the preset confidence value separately for each sample type: according to the standard signal intensity of each chromosome contained in the peripheral blood samples of normal mothers carrying a normal baby boy; according to the standard signal intensity of each chromosome contained in the peripheral blood samples of normal mothers carrying a normal baby girl; according to the standard signal intensity of each chromosome contained in the peripheral blood samples of normal mothers carrying normal baby boy twins; according to the standard signal intensity of each chromosome contained in the peripheral blood samples of normal mothers carrying normal baby girl twins; and according to the standard signal intensity of each chromosome contained in the peripheral blood samples of normal mothers carrying normal twins of one boy and one girl.
  • The above-mentioned determination module 2108 is further configured to, when it is detected that the actual signal intensity of a chromosome does not fall within the standard confidence interval of the corresponding chromosome, determine that chromosome to be a chromosome with abnormal copy number.
  • The apparatus for detecting abnormal copy number of a chromosome further includes a sex-specific comparison module (not shown in the figure), configured to: determine the list of standard confidence intervals of the chromosomes for each sex of the target species; compare the actual signal intensity of each chromosome with the standard confidence interval of the corresponding chromosome in the list for the corresponding sex of the target species; and, when it is detected that the actual signal intensity of a chromosome does not fall within the standard confidence interval of the corresponding chromosome for the corresponding sex, determine that chromosome to be a chromosome with abnormal copy number.
  • The above-mentioned apparatus for detecting abnormal copy number of a chromosome further includes a target sequence creation module (not shown in the figure), configured to: obtain the number of occurrences C, in the corresponding chromosome, of each specific k-mer contained in each chromosome of the target species stored in the target database; take the smallest number of occurrences among the specific k-mers of the corresponding chromosome as the minimum number of occurrences Cm; take the ratio of the number of occurrences C to the minimum number of occurrences Cm as the copy number of the specific k-mer; generate a specific k-mer copy number list for each chromosome from the copy numbers of the specific k-mers contained in that chromosome; and store the specific k-mer copy number list in the target database.
  • The above-mentioned target sequence creation module is further configured to obtain the multiple chromosomes contained in the target species; classify and sort the chromosomes contained in the target species; obtain pre-selected high-confidence genomes that satisfy a preset credibility condition; and determine the high-confidence genomes corresponding to each chromosome contained in the target species.
  • Satisfying the preset credibility condition includes any of the following: the proportion of non-deterministic characters in the chromosome sequence is lower than a preset proportion threshold; the number of sequence fragments belonging to the same chromosome in the chromosome sequence is below a preset fragment threshold; or, when the chromosome sequence is compared with all other chromosome sequences whose genetic relationship falls within a preset genetic distance threshold range, the average full-coverage percentage of the chromosome sequence across those similar chromosome sequences is higher than a preset percentage value.
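  • Of the three credibility conditions, the first is the simplest to illustrate. The sketch below measures the proportion of characters other than A, C, G and T in a sequence and compares it with a threshold. The function names, the threshold value and the example sequences are assumptions; the other two conditions are not covered.

```python
def nondeterministic_fraction(sequence: str) -> float:
    """Proportion of characters that are not A, C, G or T (for example N)."""
    sequence = sequence.upper()
    ambiguous = sum(ch not in "ACGT" for ch in sequence)
    return ambiguous / len(sequence)


def passes_ambiguity_check(sequence: str, max_fraction: float) -> bool:
    """First credibility condition: the proportion is below the preset threshold."""
    return nondeterministic_fraction(sequence) < max_fraction


print(nondeterministic_fraction("ACGTNNACGT"))
print(passes_ambiguity_check("ACGTNNACGT", max_fraction=0.05))
```

  • A sequence with many N characters yields few usable k-mers, which is one reason such a filter is applied before building the index tables.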
  • A specific k-mer satisfies the following two conditions: its number of occurrences in the genome occurrence index table corresponding to its chromosome meets a first preset error condition; and its number of occurrences in the genome occurrence index table corresponding to its chromosome, together with its number of occurrences in the genome occurrence index table of the complete set, meets a second preset error condition. The genome occurrence index table corresponding to a chromosome records, for each k-mer, the number of genomes of that chromosome that contain the k-mer; the genome occurrence index table of the complete set records, for each k-mer contained in any chromosome of the target species, the number of genomes in the complete set that contain the k-mer.
  • The first preset error condition is: the ratio of the number of occurrences in the genome occurrence index table corresponding to the chromosome to the number of genomes contained in the corresponding chromosome, plus the first threshold, is greater than or equal to 1.
  • The first threshold is less than 5%.
  • The second preset error condition is: the ratio of the number of occurrences in the genome occurrence index table corresponding to the chromosome to the number of occurrences in the genome occurrence index table of the complete set, plus the second threshold, is greater than or equal to 1.
  • The second threshold is less than 5%.
  • Each module in the above apparatus for detecting abnormal copy number of a chromosome can be realized in whole or in part by software, hardware, or a combination thereof.
  • The above-mentioned modules may be embedded in hardware form in, or be independent of, the processor in the computer device, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 22.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in a non-volatile storage medium.
  • the computer equipment database is used to store data for detecting abnormal chromosome copy numbers.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer-readable instructions are executed by a processor to implement a method for detecting abnormalities in chromosome copy number.
  • FIG. 22 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer equipment to which the solution is applied. A specific computer device may include more or fewer parts than shown in the figure, combine certain parts, or have a different arrangement of parts.
  • A computer device includes a memory and one or more processors.
  • Computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by the processor, the steps of the method for detecting an abnormality of a chromosome copy number provided in any embodiment of the present application are implemented.
  • One or more non-transitory computer-readable storage media storing computer-readable instructions are provided.
  • When the computer-readable instructions are executed by one or more processors, the one or more processors implement the steps of the method for detecting an abnormality of a chromosome copy number provided in any embodiment of the present application.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Organic Chemistry (AREA)
  • Medical Informatics (AREA)
  • Zoology (AREA)
  • Biotechnology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Wood Science & Technology (AREA)
  • Bioethics (AREA)
  • Epidemiology (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Genetics & Genomics (AREA)
  • Artificial Intelligence (AREA)
  • Immunology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Biochemistry (AREA)
  • Evolutionary Computation (AREA)
  • Analytical Chemistry (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to a method for detecting chromosome copy number variations, comprising: obtaining sequencing data of a sample to be detected as the data to be detected, and determining a target species corresponding to the data to be detected; obtaining the specific k-mers, stored in a target database, corresponding to each chromosome contained in the target species; obtaining the actual number of occurrences, in the data to be detected, of the specific k-mers contained in each chromosome; obtaining the copy number of each specific k-mer from the target database; calculating, from the actual number of occurrences and the copy number of each specific k-mer, the actual signal intensity of the corresponding chromosome; and determining chromosomes whose actual signal intensity does not fall within the standard confidence interval of the corresponding chromosome to be chromosomes with copy number variations.
PCT/CN2018/111958 2018-06-22 2018-10-25 Procédé et appareil de détection de variations du nombre de copies de chromosome, et milieu de stockage Ceased WO2019242187A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810651441.6A CN109192246B (zh) 2018-06-22 2018-06-22 检测染色体拷贝数异常的方法、装置和存储介质
CN201810651441.6 2018-06-22

Publications (1)

Publication Number Publication Date
WO2019242187A1 true WO2019242187A1 (fr) 2019-12-26

Family

ID=64948725

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/111958 Ceased WO2019242187A1 (fr) 2018-06-22 2018-10-25 Procédé et appareil de détection de variations du nombre de copies de chromosome, et milieu de stockage

Country Status (2)

Country Link
CN (1) CN109192246B (fr)
WO (1) WO2019242187A1 (fr)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112151112A (zh) * 2019-06-27 2020-12-29 天津中科智虹生物科技有限公司 一种遗传基因检测的方法和装置
CN113409885B (zh) * 2021-06-21 2022-09-20 天津金域医学检验实验室有限公司 一种自动化数据处理以及作图方法及系统
CN113793641B (zh) * 2021-09-29 2023-11-28 苏州赛美科基因科技有限公司 一种从fastq文件中快速判断样本性别的方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140100792A1 (en) * 2012-10-04 2014-04-10 Sequenom, Inc. Methods and processes for non-invasive assessment of genetic variations
CN104745718A (zh) * 2015-04-23 2015-07-01 北京嘉宝仁和医疗科技有限公司 一种检测人类胚胎染色体微缺失和微重复的方法
CN104789686A (zh) * 2015-05-06 2015-07-22 安诺优达基因科技(北京)有限公司 检测染色体非整倍性的试剂盒和装置
WO2017094941A1 (fr) * 2015-12-04 2017-06-08 주식회사 녹십자지놈 Procédé de détermination de la variation du nombre de copies dans un échantillon comprenant un mélange d'acides nucléiques

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5632382B2 (ja) * 2008-10-31 2014-11-26 アッヴィ・インコーポレイテッド Genomic classification of non-small cell lung cancer based on patterns of gene copy number changes
US9898687B2 (en) * 2011-08-03 2018-02-20 Trigeminal Solutions, Inc. Technique for identifying association variables
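KR102393608B1 (ko) * 2012-09-04 2022-05-03 가던트 헬쓰, 인크. Systems and methods for detecting rare mutations and copy number variation
CN104951672B (zh) * 2015-06-19 2017-08-29 中国科学院计算技术研究所 Assembly method and system combining second- and third-generation genome sequencing data
CN107287285A (zh) * 2017-03-28 2017-10-24 上海至本生物科技有限公司 Method for predicting homologous recombination deficiency mechanism and patient response to cancer treatment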

Also Published As

Publication number Publication date
CN109192246B (zh) 2020-10-16
CN109192246A (zh) 2019-01-11

Similar Documents

Publication Publication Date Title
US20230114581A1 (en) Systems and methods for predicting homologous recombination deficiency status of a specimen
US11848107B2 (en) Predicting likelihood and site of metastasis from patient records
Robertson et al. Longitudinal dynamics of clonal hematopoiesis identifies gene-specific fitness effects
JP7689557B2 (ja) Integrated machine learning framework for estimating homologous recombination deficiency
US20250364135A1 (en) Systems and methods for multi-label cancer classification
Gupta et al. Hierarchical clustering can identify B cell clones with high confidence in Ig repertoire sequencing data
Tarca et al. A novel signaling pathway impact analysis
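KR20230044325A (ko) Methods and processes for non-invasive assessment of genetic variations
JP2023514851A (ja) Identification of methylation patterns that discriminate or indicate cancer conditions
CN110770838A (zh) Methods and systems for determining somatic mutation clonality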
Sahlin et al. Identification of putative pathogenic single nucleotide variants (SNVs) in genes associated with heart disease in 290 cases of stillbirth
WO2019242187A1 (fr) Method and apparatus for detecting chromosome copy number variations, and storage medium
WO2019242445A1 (fr) Detection method, device, computer equipment and storage medium for pathogen operation group
Daw et al. A paradigm for calling sequence in families: the long life family study
Kurkiewicz et al. Towards development of a statistical framework to evaluate myotonic dystrophy type 1 mRNA biomarkers in the context of a clinical trial
Bishop et al. A research-based gene panel to investigate breast, ovarian and prostate cancer genetic risk
Gao et al. Haplotype-enhanced inference of somatic copy number profiles from single-cell transcriptomes
US11535896B2 (en) Method for analysing cell-free nucleic acids
Fan et al. Unraveling the H19/GAS1 axis in recurrent implantation failure: A potential biomarker for diagnosis and insight into immune microenvironment alteration
Lu et al. Overcoming genetic drop-outs in variants-based lineage tracing from single-cell RNA sequencing data
Qu et al. Adaptive parameter of standard deviation enhances the power of noninvasive prenatal screens
Poletti TiMMing: developing an innovative suite of bioinformatic tools to harmonize and track the origin of copy number alterations in the evolutive history of multiple myeloma
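CN120727100A (zh) Method and system for constructing an evaluation model for deleterious germline variants in breast cancer susceptibility genes
HK40047016B (zh) Quality control metrics based on limit of detection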
Inkeles Applications of high-throughput genome and transcriptome analysis in human disease

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18923012

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 18/05/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18923012

Country of ref document: EP

Kind code of ref document: A1