WO2025138253A1 - Method and apparatus for detecting genetic variation, storage medium and computer device - Google Patents
Method and apparatus for detecting genetic variation, storage medium and computer device
- Publication number
- WO2025138253A1 (PCT/CN2023/143619)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- variation
- sites
- gene
- site
- base
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
Definitions
- the present disclosure relates to the field of gene detection technology, and in particular to a gene variation detection method, device, storage medium and computer equipment.
- Genetic variation detection refers to sequencing the genes of a species or population and analyzing their differences, so as to obtain a large amount of genetic variation information in the genome, such as single nucleotide polymorphisms (SNP), insertions, deletions, structural variations (SV) and copy number variations (CNV).
- Nanopore sequencing (NS) technology is a new type of single-molecule DNA sequencing technology that uses the change in current in a protein nanopore to identify DNA base sequences. Compared with traditional second-generation sequencing technology that is limited by read length, nanopore sequencing has a longer read length, which can solve the problem of poor performance of second-generation sequencing technology in detecting variants in complex genomic regions.
- the accuracy of gene variation detection based on nanopore sequencing results is not high.
- the main purpose of the embodiments of the present application is to provide a gene variation detection method, device, storage medium and computer equipment, which can improve the accuracy of gene variation detection.
- a first aspect of an embodiment of the present application provides a method for detecting gene variation, the method comprising:
- a target haplotype corresponding to the gene fragment to be tested is determined in a reference haplotype set, and an insertion or deletion site of the gene fragment to be tested is determined based on the target haplotype.
- a second aspect of the present application provides a gene variation detection device, the device comprising:
- a first acquisition unit is used to acquire base sequence comparison result data corresponding to the sequencing data of the gene fragment to be tested;
- a detection unit configured to perform gene variation detection on the sequence alignment result data based on a first gene variation detection model to obtain a plurality of single-base variation sites, wherein the first gene variation detection model is a model for performing gene variation detection based on gene stacking;
- a second acquisition unit is used to acquire quality scores corresponding to the multiple single-base variation sites, and determine the single-base variation sites with quality scores greater than a preset threshold as first-category variation sites;
- a determination unit is used to determine a target haplotype corresponding to the gene fragment to be tested in a reference haplotype set based on the first type of variation site, and to determine an insertion or deletion site of the gene fragment to be tested based on the target haplotype.
- the determining unit includes:
- a first calculation subunit is used to calculate the genotype likelihood values corresponding to the first type of variant sites and multiple alleles
- the first determination subunit is used to determine the target haplotype corresponding to the gene segment to be tested in the reference haplotype set based on the genotype likelihood value.
- the first determining subunit is further configured to:
- the iterative updating step is executed cyclically until the number of cycles reaches a preset number, and the haplotype obtained by the last update is output as the target haplotype of the gene fragment to be tested, wherein the iterative updating step comprises the following steps:
- the haplotype of the gene fragment to be tested is updated based on the combination of individual genotypes of each single-base variation site.
- the second acquisition unit includes:
- a detection subunit used to detect potential mutation sites on the base sequence comparison result data, and obtain multiple potential mutation sites and a quality score corresponding to each potential mutation site;
- a second determination subunit configured to determine quality scores corresponding to the plurality of single base variation sites according to a matching relationship between the plurality of single base variation sites and the plurality of potential variation sites;
- the third determination subunit is used to determine the single base variation site with a quality score greater than a preset threshold as a first type variation site.
- the gene variation detection device provided by the present application further includes:
- a receiving subunit used for receiving an input quality value control ratio
- the second calculation subunit is used to calculate a preset threshold based on the quality value control ratio and the quality score corresponding to each potential variation site.
- the fourth determination subunit is used to determine a single-base variation site whose quality score is not greater than the preset threshold as a second-category variation site;
- a review subunit configured to review the second-category variation of the second-category variation site based on a second gene variation detection model, and determine the site with a confirmed variation as a third-category variation site as a result of the review, wherein the second gene variation detection model is a model for performing gene variation detection based on full alignment;
- the fifth determination subunit is used to determine the target variation detection result of the gene fragment to be tested according to the first type of variation site, the third type of variation site and the insertion or deletion site.
- the fifth determining subunit includes:
- a screening module used to perform quality screening on the first type of variation sites and the third type of variation sites to obtain a target single-base variation site;
- the first determination module is used to determine the target variation detection result of the gene fragment to be tested according to the target single base variation site and the insertion or deletion site.
- the screening module includes:
- An acquisition submodule used to acquire the mutation frequencies corresponding to the first type of variation sites and the third type of variation sites;
- the determination submodule is used to determine that the mutation sites among the first type of mutation sites and the third type of mutation sites whose mutation frequencies are greater than a preset frequency threshold are target single-base mutation sites.
- the review subunit includes:
- An acquisition module used to acquire base sequence features corresponding to each second-category variation site
- an identification module configured to perform gene variation identification on each of the base sequence features based on a second gene variation detection model to obtain a gene variation identification result
- An acquisition subunit used to acquire nanopore sequencing data of the gene fragment to be tested
- the alignment subunit is used to compare the nanopore sequencing data to the reference genome to obtain base sequence alignment result data.
- a third aspect of an embodiment of the present application proposes a computer device, comprising a memory and a processor, wherein the memory stores a computer program, and when the processor executes the computer program, the gene variation detection method described in the first aspect is implemented.
- the fourth aspect of the embodiments of the present application proposes a computer-readable storage medium, which stores a computer program. When the computer program is executed by a processor, it implements the gene variation detection method described in the first aspect.
- the fifth aspect of an embodiment of the present application proposes a computer program product, which includes a computer program, and the computer program is read and executed by a processor of a computer device, so that the computer device executes the gene variation detection method described in the first aspect.
- the gene variation detection method, device, storage medium and computer equipment proposed in the embodiments of the present application obtains base sequence comparison result data corresponding to the sequencing data of the gene fragment to be tested; performs gene variation detection on the sequence comparison result data based on the first gene variation detection model to obtain multiple single-base variation sites, and the first gene variation detection model is a model for gene variation detection based on gene stacking;
- the quality scores corresponding to multiple single-base variation sites are obtained, and the single-base variation sites with quality scores greater than a preset threshold are determined as first-category variation sites; based on the first-category variation sites, a target haplotype corresponding to the gene fragment to be tested is determined in the reference haplotype set, and the insertion or deletion site of the gene fragment to be tested is determined based on the target haplotype.
- this method uses the high-quality single-base variation sites measured by the model for gene variation detection based on gene stacking to determine an accurate target haplotype for the gene fragment to be tested in the reference haplotype set, and then determines the insertion/deletion sites of the gene fragment to be tested according to the insertion/deletion sites of the target haplotype. This improves the accuracy of insertion/deletion site detection when the sequencing results are inaccurate, and therefore the accuracy of gene variation detection as a whole.
- FIG1 is a flow chart of a method for detecting gene variation provided in an embodiment of the present application.
- FIG2 is a flow chart of step 101 in FIG1 ;
- FIG3 is a flow chart of step 103 in FIG1 ;
- FIG4 is a schematic diagram of a supplementary process of a gene variation detection method provided in an embodiment of the present application.
- FIG5 is a flow chart of step 104 in FIG1 ;
- FIG6 is a flowchart of the burn-in iteration process
- Fig. 7 is a schematic diagram of the overall process of using high-quality SNP sites and reference haplotype sets to perform haplotype identification on the gene fragment to be tested;
- FIG8 is another supplementary flow diagram of the gene variation detection method provided in an embodiment of the present application.
- FIG9 is a flow chart of step 803 in FIG8 ;
- FIG10 is a flow chart of step 901 in FIG9 ;
- FIG11 is a flow chart of step 802 in FIG8 ;
- FIG12 is a schematic diagram of the overall process of the gene variation detection method provided by the present disclosure.
- FIG13 is a comparison diagram of the Indel mutation detection effects of the disclosed solution and related solutions.
- FIG. 15 is a schematic diagram of the hardware structure of a computer device provided in an embodiment of the present application.
- Haplotype refers to the combination pattern of alleles at multiple loci of an individual, that is, the arrangement of alleles at a series of connected loci.
- Genotype is the general term for all gene combinations of an individual organism. It reflects the genetic composition of the organism, that is, the sum of all genes obtained from its parents. In genetics, the term often refers to the genotype for a particular trait. If two organisms differ at even one locus, their genotypes are different. Genotype therefore refers to the combination of all alleles at all loci of an individual.
- Alleles are genes that control different forms of the same trait at the same position on a pair of homologous chromosomes. If the two alleles at a gene (or locus) on the homologous chromosomes are the same, the individual is homozygous for the trait. If the alleles are different, the individual is heterozygous for the trait.
- Read length: due to the limits of sequencing technology, genome sequencing requires breaking the genome into DNA fragments before library construction and sequencing. Read length refers to the length of the base sequence obtained from a single read by a sequencer. Different sequencing instruments have different read lengths. Sequencing an entire genome generates hundreds of millions of reads.
- the first solution is to use base stacking as the input of the neural network.
- the principle is that multiple DNA sequencing reads are aligned and stacked at specific positions to form a sequence stacking graph.
- the base at each position is represented by the height and frequency in the stacking graph, which can capture the variation information and heterozygosity between different individuals to identify the variation site.
- This method can efficiently and accurately identify single-base variation sites (SNP sites), but when the accuracy of sequencing data is insufficient, the accuracy of identifying insertion/deletion sites (Indel sites) is not high.
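The stacking idea described above can be illustrated with a minimal Python sketch. It is not the neural network model of the disclosure: it only builds per-position base counts from aligned reads and flags positions where a non-reference base is frequent. The function names, the gap-free `(start, sequence)` read representation and the `min_alt_fraction` cut-off are assumptions made for illustration.

```python
from collections import Counter


def pileup_counts(reads, region_start, region_length):
    """Count the bases stacked at each reference position.

    reads: iterable of (start, sequence) pairs; alignments are assumed
    to be gap-free, which is a simplification for this sketch.
    """
    columns = [Counter() for _ in range(region_length)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            idx = start + offset - region_start
            if 0 <= idx < region_length:
                columns[idx][base] += 1
    return columns


def candidate_snps(columns, reference, min_alt_fraction=0.2):
    """Return (index, ref_base, alt_base, alt_fraction) for columns whose
    most frequent non-reference base reaches min_alt_fraction."""
    sites = []
    for idx, counts in enumerate(columns):
        depth = sum(counts.values())
        if depth == 0:
            continue
        ref_base = reference[idx]
        alt_base, alt_count = max(
            ((b, c) for b, c in counts.items() if b != ref_base),
            key=lambda pair: pair[1],
            default=(None, 0),
        )
        if alt_base is not None and alt_count / depth >= min_alt_fraction:
            sites.append((idx, ref_base, alt_base, alt_count / depth))
    return sites
```

In a stacking-based model, per-column summaries of this kind (rather than the raw reads) would form the network input, which is why SNP evidence is captured well while indel evidence from noisy reads is harder to represent.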
- the second method is to use full alignment as the input of the neural network.
- the principle is to use DNA sequencing reads to fully align with the reference genome to form an overall comparison result, which can directly obtain the base and mutation status of each position and accurately detect single base mutations.
- This method has high accuracy in detecting mutations at SNP sites and Indels, but it is inefficient.
- the embodiments of the present application provide a gene variation detection method, device, storage medium and computer equipment. The method introduces a high-quality reference haplotype set as a supplement to the long-read sequencing data, infers the haplotype of the sample from the high-quality reference haplotype set and the high-quality SNP sites, and then determines the accurate Indel sites of the sample according to the inferred haplotype. Below, the gene variation detection method provided by the embodiments of the present application is described.
- Step 101 obtaining base sequence comparison result data corresponding to the sequencing data of the gene fragment to be tested;
- This method can be called a genotype imputation method.
- This method uses the linkage disequilibrium (LD) principle to impute the sites with poor sequencing data in the gene fragment to be tested or the sites with inaccurate identification due to background noise based on known high-quality SNP sites and known high-quality reference genotypes, thereby determining the target haplotype corresponding to the gene fragment to be tested, and then determining the Indel site in the gene fragment to be tested based on the target haplotype, which can greatly improve the accuracy of detecting the Indel site in the gene fragment to be tested.
- the above-mentioned gene fragment to be tested can be a complete DNA fragment or RNA fragment, or a part of the sub-fragment cut out of the DNA fragment or RNA fragment.
- the DNA fragment or RNA fragment here can be a DNA or RNA fragment of any species.
- the sequencing data of the gene fragment to be tested can be any type of sequencing data, that is, it can be sequencing data measured by any sequencing method, for example, the first generation sequencing method (such as Sanger method, or dideoxy chain termination method), the second generation sequencing method (such as Illumina method, or massive parallel sequencing) and the third generation sequencing method (such as nanopore sequencing or PacBio sequencing) can be used.
- the first gene variation detection model can be specifically the model using base stacking as the neural network input introduced above.
- the linkage disequilibrium principle specifically refers to the non-random association at two or more sites, which may be on the same chromosome or on different chromosomes. As long as the probability of two sites appearing at the same time is greater than the probability of random combination of the population, it means that the two sites are in a state of linkage disequilibrium. Linkage disequilibrium is caused by mutation or recombination, and linkage disequilibrium occurs when a new mutation occurs near a certain SNP of the chromosome. The strength of linkage disequilibrium is related to the distance between two SNPs.
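The non-random association described above is commonly quantified by the coefficient D = p_AB - p_A * p_B and by the squared correlation r². The following sketch computes both for two biallelic sites from a list of phased haplotypes; the function name and the 0/1 allele coding are assumptions for illustration and are not taken from the disclosure.

```python
def linkage_disequilibrium(haplotypes, site_a, site_b):
    """Return (D, r_squared) for two biallelic sites.

    haplotypes: sequence of haplotypes, each a sequence of 0/1 alleles
    (1 = alternate allele). D > 0 means the two alternate alleles occur
    together more often than expected under random combination.
    """
    n = len(haplotypes)
    p_a = sum(h[site_a] for h in haplotypes) / n
    p_b = sum(h[site_b] for h in haplotypes) / n
    p_ab = sum(1 for h in haplotypes if h[site_a] == 1 and h[site_b] == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, r_squared
```

Two sites that always carry the same allele give r² = 1, while sites whose alleles combine at random give D = 0; imputation relies on the former situation, since a well-called SNP then predicts the allele at a poorly sequenced neighbouring site.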
- the genotype filling method provided in the present disclosure is applied in the scene of gene detection by nanopore sequencing, and can have a more obvious effect.
- Step 604 updating the haplotype of the gene fragment to be tested based on the combination of individual genotypes at each single-base variation site.
- the target genotype corresponding to the gene fragment to be tested is determined in a high-quality reference genotype set through multiple rounds of Burn-in iterations.
- the haplotype corresponding to the gene fragment to be tested can be found in the reference haplotype set based on the high-quality SNP sites to form a candidate haplotype subset.
- the PBWT algorithm can be used to determine multiple haplotypes that are most similar to the sequencing sequence of the gene fragment to be tested in the reference genotype set based on the high-quality SNP sites to form a candidate haplotype subset.
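As a rough illustration of forming the candidate haplotype subset, the sketch below ranks reference haplotypes by the number of mismatches with the sample at the high-quality SNP sites and keeps the best `k`. This is a plain mismatch count used as a simple stand-in, not the PBWT algorithm, which finds long shared matches far more efficiently on large panels. All names and the dictionary layout are hypothetical.

```python
def candidate_haplotypes(sample_alleles, reference_panel, k=2):
    """Pick the k reference haplotypes closest to the sample.

    sample_alleles: dict mapping site position -> allele observed at a
        high-quality SNP site of the sample.
    reference_panel: dict mapping haplotype name -> dict of
        site position -> allele.
    Ties are broken by haplotype name so the result is deterministic.
    """
    def mismatches(hap):
        return sum(
            1 for site, allele in sample_alleles.items()
            if hap.get(site) != allele
        )

    ranked = sorted(
        reference_panel,
        key=lambda name: (mismatches(reference_panel[name]), name),
    )
    return ranked[:k]
```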
- the probability of the occurrence of the base corresponding to each first-category variation site can be further calculated based on the genotype likelihood value and the candidate haplotype subset, and the probability can be used as the emission probability of the hidden Markov model.
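The disclosure does not spell out how the genotype likelihood values are computed. A minimal sketch under a common textbook assumption is given below: the site is biallelic, the sample is diploid, and a sequencing error turns a reference observation into an alternate one (or vice versa) with the per-read error probability. The function name and input layout are hypothetical.

```python
import math


def genotype_likelihoods(observations):
    """Normalised likelihoods of genotypes 0/0, 0/1 and 1/1 at one site.

    observations: list of (is_alt, error_prob) pairs, one per read
    covering the site, with 0 < error_prob < 1.
    Returns [P(0/0), P(0/1), P(1/1)] normalised to sum to 1, i.e. the
    likelihoods rescaled as if the genotype prior were flat.
    """
    log_liks = []
    for alt_copies in (0, 1, 2):
        frac_alt = alt_copies / 2
        total = 0.0
        for is_alt, err in observations:
            # Probability that a read shows the alternate base given
            # the fraction of chromosomes carrying it.
            p_alt_obs = frac_alt * (1 - err) + (1 - frac_alt) * err
            total += math.log(p_alt_obs if is_alt else 1 - p_alt_obs)
        log_liks.append(total)
    peak = max(log_liks)  # subtract the peak to avoid underflow
    weights = [math.exp(value - peak) for value in log_liks]
    norm = sum(weights)
    return [w / norm for w in weights]
```

Values of this kind could then be combined with the alleles carried by the candidate haplotypes to obtain per-site emission probabilities for the hidden Markov model.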
- the hidden Markov model may specifically be the Li and Stephens hidden Markov model (Li and Stephens HMM).
- the haplotype of the gene fragment to be tested can be updated based on different combinations of individual genotypes of each SNP site.
- the target haplotype of the gene fragment to be tested can be obtained, and the accurate genotype of the gene fragment to be tested can also be obtained.
- Figure 7 is a schematic diagram of the overall process of using high-quality SNP sites and reference haplotype sets to identify the haplotype of the gene fragment to be tested. That is, the genotype likelihood value of each gene variant site is first calculated based on the long-read sequencing data and the existing high-quality reference haplotype set; the occurrence probability of the corresponding base is then calculated from that genotype likelihood value and input to the Li and Stephens hidden Markov model as the emission probability; finally, multiple rounds of iterations are used to determine the target haplotype corresponding to the gene fragment to be tested based on the reference haplotype set.
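To make the role of the Li and Stephens model concrete, the following sketch runs a forward-backward pass for a single haploid sequence that is modelled as a mosaic copy of the reference haplotypes. It is a simplified illustration, not the iterative diploid procedure of the disclosure: the fixed `switch_prob` and `mismatch_prob` values, the 0/1 allele coding and the function names are all assumptions. Sites with no confident call (for example Indel sites) are given as `None` and are imputed from the posterior.

```python
def li_stephens_posteriors(panel, observed, switch_prob=0.05, mismatch_prob=0.01):
    """Posterior probability of copying each panel haplotype at each site.

    panel: list of reference haplotypes (equal-length lists of 0/1).
    observed: list of 0/1 alleles at confidently called sites and None
        at sites to be imputed.
    """
    n_haps, n_sites = len(panel), len(observed)

    def emission(k, s):
        if observed[s] is None:  # no evidence at this site
            return 1.0
        return 1 - mismatch_prob if panel[k][s] == observed[s] else mismatch_prob

    def normalise(values):
        total = sum(values)
        return [v / total for v in values]

    # Forward pass; each vector is rescaled to sum to 1, so the
    # "switch to any haplotype" term is the constant switch_prob / n_haps.
    forward = [normalise([emission(k, 0) / n_haps for k in range(n_haps)])]
    for s in range(1, n_sites):
        prev = forward[-1]
        forward.append(normalise([
            emission(k, s) * ((1 - switch_prob) * prev[k] + switch_prob / n_haps)
            for k in range(n_haps)
        ]))

    # Backward pass with the same transition structure.
    backward = [[1.0] * n_haps for _ in range(n_sites)]
    for s in range(n_sites - 2, -1, -1):
        nxt = [emission(k, s + 1) * backward[s + 1][k] for k in range(n_haps)]
        mean_nxt = sum(nxt) / n_haps
        backward[s] = normalise([
            (1 - switch_prob) * nxt[k] + switch_prob * mean_nxt
            for k in range(n_haps)
        ])

    return [
        normalise([f * b for f, b in zip(forward[s], backward[s])])
        for s in range(n_sites)
    ]


def impute_allele(panel, posteriors, site):
    """Posterior probability that the sample carries allele 1 at `site`."""
    return sum(p for k, p in enumerate(posteriors[site]) if panel[k][site] == 1)
```

For instance, if the sample matches one reference haplotype at every high-quality SNP site and only that haplotype carries an Indel at an uncalled site, `impute_allele` returns a probability close to 1 for the Indel.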
- a gene variation detection method is provided that can obtain the complete gene variation site (i.e., including the SNP site and the Indel site) of the gene fragment to be tested.
- steps 801 to 803 may also be included.
- Step 801 determining a single base variation site whose quality score is not greater than a preset threshold as a second type variation site
- Step 802 performing single base variation review on the second type of variation sites based on the second gene variation detection model, and determining the sites with confirmed variation as third type variation sites as a result of the review;
- Step 803 determining the target variation detection result of the gene fragment to be tested according to the first type variation site, the third type variation site, and the insertion or deletion site.
- the Indel sites can be directly eliminated due to their low quality. Then, genotype filling is performed according to the high-quality SNP sites among the SNP sites to obtain the target haplotype, and the accurate Indel sites of the gene fragment to be tested are determined according to the target haplotype.
- the high-quality SNP sites (i.e., the first type of variation sites) can be directly used as part of the SNP sites of the gene fragment to be tested due to their high quality.
- for the low-quality SNP sites, the disclosed embodiment can use the second gene variation detection model to perform variation review. If the second variation detection model confirms that a site is a SNP site, the site is confirmed as a third type of variation site, and it can then be used together with the aforementioned high-quality SNP sites as the final SNP sites of the gene fragment to be tested.
- a single-base variation site with a quality score not greater than a preset threshold can be determined as a second-class variation site; then, the second-class variation sites are reviewed for single-base variation based on the second gene variation detection model, and the sites whose variation is confirmed by the review are determined as third-class variation sites. In other words, a third-class variation site is a site that was low-quality among the SNP sites output by the aforementioned first variation detection model but was confirmed as a high-quality site by the second variation detection model. Furthermore, the target variation detection result corresponding to the gene fragment to be tested can be determined based on the first-class variation sites, the third-class variation sites, and the insertion or deletion sites.
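The three categories of sites can be sketched as a simple partition followed by a review. In the sketch below, the `confirm` callable is a hypothetical stand-in for the second gene variation detection model, and the dictionary keys are illustrative only.

```python
def split_by_quality(sites, threshold):
    """Split called SNP sites into first-category (quality score greater
    than the threshold) and second-category (all remaining) sites."""
    first, second = [], []
    for site in sites:
        (first if site["qual"] > threshold else second).append(site)
    return first, second


def review_sites(second_category, confirm):
    """Return the third-category sites: second-category sites that the
    review step confirms as real variations.

    confirm: callable taking a site and returning True or False; it is
    a placeholder for the full-alignment model of the disclosure.
    """
    return [site for site in second_category if confirm(site)]
```

A site whose score equals the threshold falls into the second category, matching the "not greater than" wording used above.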
- the second gene variation detection model can be the aforementioned gene variation detection model based on full alignment as the neural network input.
- the second gene variation detection model can be a trained residual neural network model (ResNet).
- the genotype, base quality, alignment quality, positive and negative chains and other different dimensional features of each second type of variation site and its preceding and following base sequences can be specifically determined first as the input of the above-mentioned residual neural network model, so as to further screen the second type of variation sites based on the above-mentioned residual neural network model to determine whether the second type of variation sites are sites where variations actually exist.
- the credible variation site can be determined as a third type of variation site.
- the union of the first type of variation sites and the third type of variation sites gives the final SNP sites, and the Indel sites determined from the target haplotype obtained by genotype filling are the final Indel sites of the gene fragment to be tested. The final SNP sites and the final Indel sites together constitute the final target variation detection result of the gene fragment to be tested.
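Assembling the final result is a set union. A minimal sketch follows, assuming each site is represented as a hashable `(chromosome, position, ref, alt)` tuple; this representation and the function name are illustrative choices.

```python
def assemble_result(first_category, third_category, indel_sites):
    """Combine the SNP categories and the imputed Indel sites.

    Each site is a (chromosome, position, ref, alt) tuple. Duplicates
    are removed and the output is sorted by chromosome and position.
    """
    final_snps = sorted(set(first_category) | set(third_category))
    final_indels = sorted(set(indel_sites))
    return {"snp": final_snps, "indel": final_indels}
```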
- step 803 includes but is not limited to steps 901 to 902 .
- Step 901 performing quality screening on the first type of variant sites and the third type of variant sites to obtain target single base variant sites;
- Step 902 determining the target variation detection result of the gene fragment to be tested according to the target single base variation site and the insertion or deletion site.
- the first type of variation sites and the third type of variation sites can be quality screened again to screen out some low-quality SNP sites.
- the first type of variation sites and the third type of variation sites can be quality screened to obtain the target single-base variation sites. Then, the target variation detection result of the gene fragment to be tested can be determined based on the target single-base variation site and the insertion or deletion site.
- step 901 includes but is not limited to steps 1001 to 1002 .
- Step 1001 obtaining the mutation frequencies corresponding to the first type of variation sites and the third type of variation sites;
- Step 1002 Determine the mutation sites among the first and third types of mutation sites whose mutation frequencies are greater than a preset frequency threshold as target single-base mutation sites.
- the mutation frequency of the variant site can be used to screen the first and third variant sites. Specifically, the mutation frequencies corresponding to the first and third variant sites can be obtained, and then the variant sites with mutation frequencies greater than a preset frequency threshold in the first and third variant sites are regarded as target single-base variant sites.
- mutation frequency, or variation frequency (allele frequency, AF), refers to the frequency of a specific allele in a population of a species, that is, its proportion among all alleles at the site. For high-frequency variation sites, the AF value is relatively high, which means the allele is common in the population; for low-frequency variation sites, the AF value is relatively low, which means the allele is rare in the population. In genetic studies, AF values are usually calculated by comparing genotype frequencies between different individuals.
- the first-class variant sites and the third-class variant sites can be screened based on a preset frequency threshold. Specifically, the first-class variant sites and the third-class variant sites whose mutation frequencies are greater than the preset frequency threshold can be determined as target single-base variant sites.
- the frequency threshold can be set to 0.1, and then the first-category variant sites and the third-category variant sites with mutation frequencies higher than 0.1 are determined as target single-base variant sites.
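The frequency screening can be sketched as follows. The allele frequency is computed here as the share of alternate alleles among all alleles of the given genotypes, and the filter keeps sites whose frequency is strictly greater than the threshold (0.1 in the example above). The function names and the `"af"` key are assumptions for illustration.

```python
def allele_frequency(genotypes):
    """Fraction of alternate alleles among all alleles.

    genotypes: list of per-individual tuples of 0/1 allele codes,
    e.g. (0, 1) for a heterozygous diploid individual.
    """
    alleles = [allele for genotype in genotypes for allele in genotype]
    return sum(alleles) / len(alleles)


def filter_by_frequency(sites, min_frequency=0.1):
    """Keep sites whose mutation frequency exceeds min_frequency."""
    return [site for site in sites if site["af"] > min_frequency]
```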
- step 802 may include but is not limited to steps 1101 to 1103 .
- Step 1101 obtaining base sequence features corresponding to each second-category variable site
- Step 1102 performing gene variation identification on each base sequence feature based on the second gene variation detection model to obtain a gene variation identification result
- in the specific process of performing single-base variation verification on the second type of variation sites based on the second gene variation detection model, the base sequence features corresponding to each second type of variation site can first be obtained.
- the base sequence features include the genotyping, base quality, comparison quality, and positive and negative strand features of the second type of variation site and its preceding and following base sequences.
- the base sequence features corresponding to each second-class variant site can be used as the input of the second gene variation detection model, and input into the second gene variation detection model to obtain the gene variation identification result output by the second gene variation detection model. Further, multiple third-class variant sites can be determined from multiple second-class variant sites based on the gene variation identification result output by the second gene variation detection model.
- the indel site in the gene variation detection result output by the neural network model based on stacking can be eliminated. Then, for high-quality variation sites, a genotype filling method relying on haplotype information can be further used to obtain more accurate indel sites; for low-quality variation sites, a neural network model based on full alignment can be used to screen out high-quality variation sites. Finally, the high-quality variation sites output by the neural network model based on stacking and the neural network model based on full alignment, and the indel sites obtained by genotype filling are used as the final variation detection results. This method can obtain more accurate gene variation detection results on the basis of higher gene variation detection efficiency.
- the disclosed embodiment proposes a scheme for genotyping low-quality sequencing sites based on the principle of linkage disequilibrium.
- This method uses SNP sites with good performance in long-read sequences to genotype low-quality sequencing sites or segments in long-read sequencing, infers the haplotype sequence of the sequencing sample, and obtains potential mutation sites. It can still greatly improve the accuracy and sensitivity of Indel recognition in long-read sequencing at low depth. This method is applicable to the analysis of both single samples and population samples.
- the homozygous and heterozygous SNP sites with the top 80% of the quality scores (qual) of the mutation sites output by the BiLSTM model are used as high-quality SNP sites.
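One way to turn a quality value control ratio such as 80% into a preset threshold is to rank the quality scores and take the score just below the retained share. The sketch below follows that reading; the disclosure does not give the exact formula, so the rule, the function name and the handling of ties are assumptions.

```python
def quality_threshold(scores, keep_ratio=0.8):
    """Return a threshold such that sites with a score strictly greater
    than it make up (about) the top keep_ratio share of all sites.

    With tied scores at the boundary, fewer sites than requested may
    exceed the threshold.
    """
    if not 0 < keep_ratio <= 1:
        raise ValueError("keep_ratio must be in (0, 1]")
    ranked = sorted(scores, reverse=True)
    keep = int(len(ranked) * keep_ratio)
    if keep >= len(ranked):
        return float("-inf")  # every site is retained
    return ranked[keep]
```

The returned value can be used directly as the preset threshold in the "quality score greater than a preset threshold" test for first-category variation sites.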
- the genotypes are confirmed by using the selected high-quality sites and long-read sequences, and the haplotype sequences of the long-read sequences are confirmed by the combination of different genotypes. Then, haplotypes similar to the haplotypes of the long-read sequences are searched in the existing high-quality reference haplotype sequence set, the haplotypes of the samples are inferred based on these haplotype subsets using the Li and Stephens hidden Markov model, and the Indel sites present in the inferred haplotypes are obtained.
Landscapes
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Genetics & Genomics (AREA)
- Biotechnology (AREA)
- Biophysics (AREA)
- Chemical & Material Sciences (AREA)
- Molecular Biology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Bioinformatics & Computational Biology (AREA)
- Analytical Chemistry (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
Embodiments of the present application belong to the field of genetic variation detection and relate to a method and an apparatus for detecting genetic variation, a storage medium and a computer device. The method comprises: by means of high-quality single-base variation sites measured by a model that performs genetic variation detection on the basis of gene stacking, determining, from a set of reference haplotypes, a haplotype of a gene segment under test and, on the basis of the haplotype, determining an insertion/deletion site of the gene segment under test.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2023/143619 WO2025138253A1 (fr) | 2023-12-29 | 2023-12-29 | Method and apparatus for detecting genetic variation, storage medium and computer device |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2023/143619 WO2025138253A1 (fr) | 2023-12-29 | 2023-12-29 | Method and apparatus for detecting genetic variation, storage medium and computer device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025138253A1 (fr) | 2025-07-03 |
Family
ID=96216459
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2023/143619 Pending WO2025138253A1 (fr) | 2023-12-29 | 2023-12-29 | Genetic variation detection method and apparatus, storage medium, and computer device |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025138253A1 (fr) |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20050089904A1 (en) * | 2003-09-05 | 2005-04-28 | Martin Beaulieu | Allele-specific sequence variation analysis |
| US20130073214A1 (en) * | 2011-09-20 | 2013-03-21 | Life Technologies Corporation | Systems and methods for identifying sequence variation |
| CN106611106A (zh) * | 2016-12-06 | 2017-05-03 | 北京荣之联科技股份有限公司 | Gene variation detection method and device |
| CN109416928A (zh) * | 2016-06-07 | 2019-03-01 | 伊路米纳有限公司 | Bioinformatics systems, apparatuses, and methods for performing secondary and/or tertiary processing |
| US20230207050A1 (en) * | 2021-12-28 | 2023-06-29 | Illumina Software, Inc. | Machine learning model for recalibrating nucleotide base calls corresponding to target variants |
| CN116959560A (zh) * | 2023-03-16 | 2023-10-27 | 西安交通大学 | Deep learning detection method and system for short genomic variants based on third-generation sequencing |
- 2023-12-29: WO PCT/CN2023/143619, patent WO2025138253A1 (fr), active, pending
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23962927; Country of ref document: EP; Kind code of ref document: A1 |