WO2025237449A2 - Method, device and program product for predicting MET 14 skipping mutations - Google Patents
- Publication number
- WO2025237449A2 (application PCT/CN2025/112642)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- met
- base changes
- reference genome
- skipping
- predicting
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/50—Mutagenesis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Definitions
- This application relates to the field of intelligent healthcare, specifically to a method, apparatus, program product, and computer-readable storage medium for predicting MET 14 skipping mutations.
- NSCLC: non-small cell lung cancer.
- driver genes: the mutation status of driver genes is an important predictor of the efficacy of targeted therapy.
- the MET gene is another important driver gene in NSCLC and has become a focus of targeted therapy.
- MET exon 14: abbreviated as MET 14.
- MET gene amplification a form of MET exon 14
- protein overexpression MET inhibitors
- MET inhibitors have achieved good anti-tumor effects in NSCLC patients with MET 14 skipping mutations.
- the rapid development of drugs has led to the FDA approving tepotinib and capmatinib, and the NMPA approving savolitinib, for NSCLC patients with MET 14 skipping mutations.
- methods for detecting MET 14 skipping mutations include next-generation sequencing (NGS), Sanger sequencing of exon 14 and its flanking introns, reverse transcription-PCR (RT-PCR), and RNA-based NGS detection.
- NGS: next-generation sequencing.
- RT-PCR: reverse transcription-PCR.
- RNA-based analysis detects a higher proportion of MET 14 skipping cases, with an RNA detection rate of 4.2%.
- RNA-based analysis is highly dependent on RNA quality, which may be unsatisfactory in some clinical samples.
- this application provides a method for predicting MET 14 skipping mutations, specifically including: acquiring patient MET gene data, including chromosome number, physical coordinates, number of base changes in the reference genome, and number of mutated base changes;
- the MET gene data further includes: reference genome bases and mutant bases.
- the chromosome number, physical coordinates, reference genome bases, mutant bases, number of reference genome base changes, and number of mutant base changes are input into the prediction model to obtain the prediction result of whether it is a MET 14 skipping mutation.
- the MET 14 skipping mutation includes classical splicing and/or hidden splicing; the chromosome number, physical coordinates, number of base changes in the reference genome, and number of mutated base changes are input into the prediction model to obtain the prediction result of whether it is a classical splicing and/or hidden splicing MET 14 skipping mutation.
- the classical splicing includes GT-AG.
- the hidden splicing includes one or more of the following: indels of 50 bp or more, the polypyrimidine tract of the intron region, Branch AA, and exonic splicing enhancers (ESE) of the exon region.
- the hidden splicing includes: indels of 50 bp or more, the polypyrimidine tract of the intron region, Branch AA, and exonic splicing enhancers (ESE) of the exon region.
- the training process of the prediction model is as follows: obtain the patient's MET gene dataset and labels, including chromosome number, physical coordinates, number of reference genome bases and number of mutated base changes, and input the chromosome number, physical coordinates, number of reference genome bases and number of base changes into the model to be trained to obtain the prediction model, wherein the labels are RNA positive samples or other positive samples.
- the prediction model may employ one or more of the following: random forest, decision tree, support vector machine, logistic regression model, convolutional neural network, XGBoost, and AdaBoost.
- the process of acquiring the patient's MET gene data includes:
- MET gene data including: chromosome number, physical coordinates, reference genome bases, and mutated bases;
- the number of base changes in the reference genome and the number of mutated base changes are calculated from the reference genome bases and the mutated bases, respectively; thus, the patient's MET gene data are obtained.
- the acquisition of the basic MET gene data includes:
- the alignment results are obtained by aligning the NGS sequencing data to the MET reference genome.
- the alignment results are subjected to mutation detection to obtain the detection results.
- the NGS sequencing data are DNA sequencing data and/or RNA sequencing data.
- the purpose of this application is to provide a computer program product that includes a computer program or instructions, which are executed by a processor to implement the above-described method for predicting MET 14 skipping mutations.
- the purpose of this application is to provide a computer device including a memory, a processor, and a computer program or instructions stored in the memory, wherein the computer program or instructions are executed by the processor to implement the above-described method for predicting MET 14 skipping mutations.
- the purpose of this application is to provide a computer-readable storage medium having a computer program or instructions stored thereon, which are executed by a processor to implement the above-described method for predicting MET 14 skipping mutations.
- this application considers how extensive, or even structural, changes in the number of bases in MET gene exons and introns affect the binding stability of snRNPs, thereby discovering hidden (occult) splicing mutation patterns in MET 14 other than classical splicing.
- the aim is to achieve accurate prediction of occult splicing in MET 14 exons at the DNA level, even when tissues are inaccessible or RNA data is unavailable.
- the prediction model is trained using the number of base variations as the primary feature, enabling the detection of both classical and occult splicing mutations in MET 14, avoiding omissions and improving disease detection rates.
- Figure 1 is a schematic flowchart of the method for predicting MET 14 skipping mutations provided in an embodiment of this application;
- Figure 2 is a schematic diagram of the system for predicting MET 14 skipping mutations provided in an embodiment of this application;
- Figure 3 is a schematic diagram of a device for predicting MET 14 skipping mutations provided in an embodiment of this application;
- Figure 4 shows the mutation type detection rate of NSCLC research and internal data provided in the embodiments of this application
- Figure 5 shows the comparison results of mutation types between NSCLC research and internal data provided in the embodiments of this application.
- Figure 6 shows the results of DNA and RNA dual-detection mutations in NSCLC research and internal data provided in the embodiments of this application;
- Figure 7 is a schematic flowchart of the method for predicting MET 14 skipping mutations based on MET gene basic data provided in the embodiments of this application;
- Figure 8 is a schematic diagram of a system for predicting MET 14 skipping mutations based on MET gene basic data provided in an embodiment of this application.
- Figure 1 is a schematic diagram of the method for predicting MET 14 skipping mutations provided in an embodiment of this application, specifically including:
- S1 Obtain the patient's MET gene data, including chromosome number, physical coordinates, number of base changes in the reference genome, and number of mutated base changes;
- MET 14 skipping mutation prediction is performed by acquiring MET gene data.
- the MET gene data consists of basic MET gene data and processed basic MET gene data, specifically:
- MET gene data including chromosome number, physical coordinates, number of base changes in the reference genome, and number of mutated base changes;
- a prediction result is obtained to determine whether it is a MET 14 skipping mutation.
- the basic data of the MET gene data includes one or more of the following: chromosome number, physical coordinates; the processed data of the raw data includes: the number of base changes in the reference genome, the number of mutated base changes.
- the acquisition of MET gene data includes: acquiring the patient's NGS sequencing data;
- the alignment results are obtained by aligning the NGS sequencing data to the MET reference genome.
- the alignment results are subjected to mutation detection to obtain the detection results.
- MET gene data were extracted based on the detection results.
- the NGS sequencing data are DNA sequencing data and/or RNA sequencing data.
- a method for acquiring genetic data includes:
- when the rsID is known, dbSNP or Ensembl can be queried directly. dbSNP (NCBI): enter the rsID (e.g., rs123456) to obtain the chromosome location, reference/mutated bases, and reference genome version. Ensembl VEP: enter the chromosome coordinates or rsID to obtain detailed annotations. The chromosome number, coordinates, reference sequence, and mutant sequence can be extracted directly from the results page or from the data returned by the API.
- gene data is extracted using tools such as Python and R.
- tools like FastQC are used to assess the quality of the raw NGS sequencing data (usually in FASTQ format), checking metrics such as sequencing data quality distribution, base content, GC content, and sequence repetition rate to understand the overall data quality.
- one or more tools, such as fastp, Trimmomatic, and Cutadapt, are used to remove adapter sequences, low-quality bases, and short sequence fragments from the sequencing data.
- quality thresholds: e.g., Phred quality value below 20.
- length thresholds: e.g., sequence length less than 30 bp.
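The two example thresholds above can be illustrated with a small read filter. This is a minimal sketch, not the filtering performed by fastp, Trimmomatic, or Cutadapt: the function names are hypothetical, Phred+33 quality encoding is assumed, and applying the quality threshold to the mean score of a read (rather than per base or per window) is an assumption made for brevity.

```python
from typing import Iterable, Iterator, List, Tuple

# A FASTQ record reduced to (read name, base sequence, quality string).
FastqRecord = Tuple[str, str, str]


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a quality string into Phred scores (Phred+33 is assumed)."""
    return [ord(symbol) - offset for symbol in quality]


def passes_thresholds(record: FastqRecord,
                      min_mean_quality: float = 20.0,
                      min_length: int = 30) -> bool:
    """Keep a read only if it is long enough and its mean Phred score is high enough."""
    _, bases, quality = record
    if len(bases) < min_length or not quality:
        return False
    scores = phred_scores(quality)
    return sum(scores) / len(scores) >= min_mean_quality


def filter_reads(records: Iterable[FastqRecord]) -> Iterator[FastqRecord]:
    """Yield the reads that pass both thresholds."""
    return (record for record in records if passes_thresholds(record))


# Made-up reads for illustration only.
reads = [
    ("good_read", "A" * 40, "I" * 40),    # Phred 40 at every base, 40 bp
    ("low_quality", "C" * 40, "#" * 40),  # Phred 2 at every base
    ("too_short", "G" * 10, "I" * 10),    # only 10 bp
]
kept = [name for name, _, _ in filter_reads(reads)]
print(kept)
```

Only the first read survives here, because the second falls below the quality threshold and the third below the length threshold.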
- Sequence alignment: download a suitable reference genome containing the MET gene and verify its integrity and accuracy, so that the sequencing reads can be aligned accurately to the genome.
- Use alignment tools such as BWA (Burrows-Wheeler Aligner) and Bowtie2 to align the preprocessed sequencing data with the reference genome. These tools can quickly and accurately find the best matching positions of sequencing reads on the reference genome and generate alignment files in SAM (Sequence Alignment/Map) format.
- Use tools such as SAMtools to convert the SAM files to BAM (Binary Alignment/Map) files.
- BAM files are a binary compressed form of SAM files, with a smaller file size and easier subsequent processing.
- bwa mem is used to align the quality-controlled split reads to the reference genome hg19 (GRCh37)
- samtools view is used to filter out multiple alignments and unaligned reads
- samtools sort is used to sort the alignment results to generate sort.bam.
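The three commands above can be chained from Python. The sketch below only builds the command lines and prints them; it does not run them. The application does not state the exact options it uses, so the file names, the helper name, and the choice of `-F 260` (drop unmapped and secondary alignments) and `-q 1` (drop reads with mapping quality 0, a common proxy for multiple alignments) are assumptions.

```python
import shlex
from typing import List


def build_alignment_commands(reference: str, reads_1: str, reads_2: str,
                             prefix: str) -> List[List[str]]:
    """Return the bwa mem, samtools view, and samtools sort command lines in order."""
    sam = f"{prefix}.sam"
    filtered_bam = f"{prefix}.filtered.bam"
    sorted_bam = f"{prefix}.sort.bam"
    return [
        # Align paired-end reads to the reference genome (hg19/GRCh37 in the text).
        ["bwa", "mem", "-o", sam, reference, reads_1, reads_2],
        # Assumed filter: -F 260 removes unmapped (4) and secondary (256) alignments,
        # -q 1 removes reads with mapping quality 0; -b writes BAM output.
        ["samtools", "view", "-b", "-F", "260", "-q", "1", "-o", filtered_bam, sam],
        # Coordinate-sort the filtered alignments.
        ["samtools", "sort", "-o", sorted_bam, filtered_bam],
    ]


# Hypothetical file names, for illustration only.
commands = build_alignment_commands("hg19.fa", "tumor_R1.fq.gz", "tumor_R2.fq.gz", "tumor")
for command in commands:
    print(shlex.join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` once the tools and input files are available.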
- gencore was used to deduplicate tumor reads and to correct low-quality and erroneous bases. samtools sort was then used for sorting, and the deduplicated, sorted BAM was used for somatic SNV/indel detection. Sambamba was used to deduplicate reads from both tumor and leukocyte control samples; the leukocyte BAM was used for somatic SNV/indel detection.
- Variant detection: select a variant detection tool.
- Common variant detection tools include GATK (Genome Analysis Toolkit) and FreeBayes. These tools can identify single nucleotide variants (SNVs), insertions and deletions (InDels), and other variant information in sequencing data based on alignment results. Through these steps, variant sites in the sample can be accurately detected, and a VCF file containing all variant information can be generated.
- the variant detection tool samtools mpileup is used to create pileup format files for tumor and leukocyte samples, and MutLoc (SNPIndel) and Varcidt (longindel) are used to detect original mutations. A VCF file containing all variant information is then generated.
- Screening for MET gene-related variants: using tools such as VCFtools and BCFtools, and based on the location of the MET gene on the reference genome (such as chromosomal location and gene region), the variant sites related to the MET gene are screened from the whole-genome VCF file, and a VCF file containing only MET gene variant information is generated.
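The screening step can also be sketched in plain Python for readers without VCFtools or BCFtools at hand. This is an illustration under stated assumptions, not the application's pipeline: the function names are hypothetical, the records are made up, and the region constant is an approximate hg19 (GRCh37) span for MET on chromosome 7 that should be verified against the annotation actually in use.

```python
from typing import Iterable, Iterator, Tuple

# Assumed, approximate hg19 span of the MET gene; verify before real use.
MET_REGION: Tuple[str, int, int] = ("7", 116_312_459, 116_438_440)


def normalise_chrom(chrom: str) -> str:
    """Strip an optional 'chr' prefix so that 'chr7' and '7' compare equal."""
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def in_region(vcf_line: str, region: Tuple[str, int, int] = MET_REGION) -> bool:
    """True if the record's CHROM and POS fall inside the region (inclusive)."""
    chrom, start, end = region
    fields = vcf_line.rstrip("\n").split("\t")
    return normalise_chrom(fields[0]) == chrom and start <= int(fields[1]) <= end


def filter_vcf_lines(lines: Iterable[str]) -> Iterator[str]:
    """Keep header lines and the records located in the MET region."""
    for line in lines:
        if line.startswith("#") or in_region(line):
            yield line


# Made-up records for illustration only.
vcf_lines = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr7\t116412043\t.\tG\tA\t.\tPASS\t.",
    "chr1\t1000000\t.\tC\tT\t.\tPASS\t.",
]
met_only = list(filter_vcf_lines(vcf_lines))
print(len(met_only))
```

The header line is kept so that the filtered output remains a readable VCF fragment.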
- the NGS capture panel design, library construction, and sequencing methods used in this application can be considered to yield identical or approximately identical sequencing data at the experimental level.
- Tumor somatic mutation information can be obtained through background pooling or paired analysis.
- the model input data is obtained from VCF files, including chromosome, physical coordinates, reference genome bases, and mutated base information.
- OneHot encoding of ATCG bases was not performed, and base changes were converted into numerical statistics as input elements through data preprocessing. This is because OneHot encoding has limitations on base sequence length and is not suitable for scenarios involving the discovery of DNA sequence disruptions such as deletions or base substitutions exceeding 50 bp.
- a supervised machine learning model, a random forest model, was constructed based on nearly five years of internal NGS data from non-small cell lung cancer patients.
- Sample types included tissue slides and paraffin rolls, plasma, pleural effusion, cerebrospinal fluid, and other DNA data, as well as RNA-positive results as labels.
- Extracted features needed to be converted into a format suitable for the machine learning model's input. This included converting text data to numerical data, handling missing values, and encoding categorical features.
- the ATCG bases of the DNA sequence are one-hot encoded, converting them into information that the neural network can process.
- One-hot encoding is very useful when processing DNA sequence data; it converts sequence data into numerical data, allowing the machine learning model to better understand and process this data. For example, A: [1,0,0,0]; T: [0,1,0,0]; C: [0,0,1,0]; G: [0,0,0,1].
- one-hot encoding significantly increases the dimensionality of the data; based on SpliceAI testing and literature recommendations, the base sequence length should not exceed 50 bp.
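The encoding table given above, and the dimensionality growth it causes, can be made concrete with a short sketch. The function name is hypothetical, and mapping any base other than A, T, C, or G (for example N) to an all-zero vector is an assumption not stated in the application.

```python
from typing import Dict, List

# Encoding table as given in the text: A, T, C, G in that column order.
ONE_HOT: Dict[str, List[int]] = {
    "A": [1, 0, 0, 0],
    "T": [0, 1, 0, 0],
    "C": [0, 0, 1, 0],
    "G": [0, 0, 0, 1],
}


def one_hot_encode(sequence: str) -> List[List[int]]:
    """Encode each base as a 4-element vector; unknown bases become all zeros (assumption)."""
    return [list(ONE_HOT.get(base, [0, 0, 0, 0])) for base in sequence.upper()]


encoded = one_hot_encode("ATCG")
print(encoded)

# Every additional base adds four input values, so a 50 bp sequence
# already needs 200 values once flattened.
flattened_length = sum(len(vector) for vector in one_hot_encode("A" * 50))
print(flattened_length)
```

Because the input width grows with sequence length, a deletion of several hundred bases cannot be represented in a fixed 50 bp window, which is the limitation the text describes.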
- RNA splicing is catalyzed by snRNPs and the assembly of other proteins, which together constitute the spliceosome.
- damage to snRNPs caused by deletions and substitutions exceeding 50 bp is a crucial characteristic and must not be overlooked in analysis and annotation. Therefore, the ATCG base counts for REF and ALT are shown in Tables 2 and 3, with base count variations serving as the feature element for the training and validation datasets.
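Counting bases instead of one-hot encoding them gives a fixed-width feature vector regardless of how long the REF or ALT allele is. The sketch below shows one way such counts could be computed; the feature names and layout are hypothetical, since the contents of Tables 2 and 3 are not reproduced here.

```python
from collections import Counter
from typing import Dict


def base_counts(allele: str) -> Dict[str, int]:
    """Count A, T, C, and G in an allele string."""
    counts = Counter(allele.upper())
    return {base: counts.get(base, 0) for base in "ATCG"}


def count_features(ref: str, alt: str) -> Dict[str, int]:
    """Fixed-width features: per-base counts for REF and ALT (hypothetical naming)."""
    features = {f"ref_{base}": n for base, n in base_counts(ref).items()}
    features.update({f"alt_{base}": n for base, n in base_counts(alt).items()})
    return features


# Made-up deletion: REF 'GTAAG' replaced by ALT 'G'.
features = count_features("GTAAG", "G")
print(features)
```

The vector has eight entries whether the allele is 1 bp or 500 bp long, so large deletions remain representable.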
- S2 Input the chromosome number, physical coordinates, number of base changes in the reference genome, and number of mutated base changes into the prediction model to obtain the prediction result of whether it is a MET 14 skipping mutation.
- the MET gene baseline data further includes: reference genome bases and mutated bases.
- the chromosome number, physical coordinates, reference genome bases, mutated bases, number of reference genome base variations, and number of mutated base variations are input into the prediction model to obtain a prediction result regarding whether it is a MET 14 skipping mutation.
- the training process of the prediction model is as follows: The patient's MET gene dataset and labels, including chromosome number, physical coordinates, number of reference genome bases, and number of mutated base variations, are obtained.
- the chromosome number, physical coordinates, number of reference genome bases, and number of mutated base variations are input into the model to be trained to obtain the prediction model.
- the labels are RNA-positive samples or other positive samples.
- the training process of the prediction model is as follows: obtain the patient's MET gene dataset and labels, including chromosome number, physical coordinates, number of reference genome bases and number of mutated base changes, reference genome bases, mutated bases, and input the chromosome number, physical coordinates, reference genome bases, mutated bases, number of reference genome bases, and number of mutated base changes into the model to be trained to obtain the prediction model, wherein the labels are RNA positive samples or other positive samples.
- the MET 14 skipping mutation includes classical splicing and/or hidden splicing; the chromosome number, physical coordinates, number of base changes in the reference genome, and number of mutated base changes are input into the prediction model to obtain the prediction result of whether it is a classical splicing and/or hidden splicing MET 14 skipping mutation.
- the classical splicing includes GT-AG.
- the chromosome number, physical coordinates, number of base changes in the reference genome, and number of mutated base changes are input into the prediction model to obtain a prediction result of whether it is a classic splicing MET 14 skipping mutation.
- the chromosome number, physical coordinates, number of base changes in the reference genome, and number of mutated base changes are input into the prediction model to obtain a prediction result of whether it is a hidden splicing MET 14 skipping mutation.
- the chromosome number, physical coordinates, number of base changes in the reference genome, and number of mutated base changes are input into the prediction model to obtain the prediction results of whether it is a classical splicing or a hidden splicing MET 14 skipping mutation.
- the chromosome number, physical coordinates, number of base changes in the reference genome, and number of mutated base changes are input into the prediction model to obtain a prediction result of whether it is a MET 14 skipping mutation, and includes the predicted probability of classic splicing MET 14 skipping mutation and hidden splicing MET 14 skipping mutation.
- the hidden splicing includes one or more of the following: indels of 50 bp or more, the polypyrimidine tract of the intron region, Branch AA, and exonic splicing enhancers (ESE) of the exon region;
- the hidden splicing includes: indels of 50 bp or more, the polypyrimidine tract of the intron region, Branch AA, and exonic splicing enhancers (ESE) of the exon region.
- the prediction model employs one or more of the following: random forest, decision tree, support vector machine, logistic regression model, convolutional neural network, XGBoost, and AdaBoost.
- X represents feature data
- y represents label data, containing the label for each sample.
- RNA positive is represented by 1, and negative by 0.
- the test set proportion is set to 0.3, meaning 30% of the data will be used as the test set.
- random_state is set to 42 to ensure consistent splitting on each code run, contributing to reproducibility.
- a random forest model is created, using GridSearchCV to iterate through different parameter combinations, and 5-fold cross-validation is used to evaluate the performance of each combination. The optimal parameter combination is then used to train the model, which is evaluated on the test set. Finally, the trained model is saved for prediction on new data.
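The training procedure described above (a 70/30 split with random_state 42, a grid search with 5-fold cross-validation, evaluation on the test set, and saving the model) can be sketched with scikit-learn. Everything about the data here is an assumption: the feature matrix is synthetic, the labelling rule is a toy rule invented so the example runs, and the parameter grid and output file name are hypothetical. The application does not disclose its grid.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the real features (chr, pos, lenR, lenA).
rng = np.random.default_rng(0)
n_samples = 200
chrom = np.full(n_samples, 7)
pos = rng.integers(116_411_000, 116_413_000, size=n_samples)
len_r = rng.integers(1, 100, size=n_samples)
len_a = rng.integers(1, 5, size=n_samples)
X = np.column_stack([chrom, pos, len_r, len_a])
# Toy labels (1 = RNA positive, 0 = negative); NOT the application's labelling.
y = (len_r >= 50).astype(int)

# 30% of the data is held out as the test set; the seed fixes the split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Hypothetical grid; each combination is scored with 5-fold cross-validation.
param_grid = {"n_estimators": [10, 50], "max_depth": [None, 5]}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(X_train, y_train)

# With the default refit=True, best_estimator_ is retrained on the full training set.
best_model = search.best_estimator_
test_accuracy = accuracy_score(y_test, best_model.predict(X_test))
print(search.best_params_, round(search.best_score_, 2), round(test_accuracy, 2))

# Save the trained model and reload it for prediction on new data.
model_path = os.path.join(tempfile.gettempdir(), "rfmodel_sketch.joblib")
joblib.dump(best_model, model_path)
reloaded_model = joblib.load(model_path)
```

The scores printed by this sketch reflect the toy data only and say nothing about the performance reported in the application.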
- the method for obtaining the chromosome number, physical coordinates, reference genome sequence, and mutant genome sequence of the mutation should not be a PCR or amplicon method, since such methods may lead to missed detection of MET 14 hidden splicing mutations.
- the optimal parameter combination found for the Rfmodel model is max_depth: None, meaning that the maximum depth of each decision tree is not limited and a tree grows until all leaf nodes are pure. The best cross-validation score is 0.98, indicating good performance on the training data, as shown in Table 4. Accuracy is 0.99, meaning that the model correctly predicts 99% of the samples. Sensitivity, also known as recall or true positive rate, is 1.00, meaning that the model correctly identifies 100% of the positive samples. Specificity is 0.98, meaning that the model correctly identifies 98% of the negative samples, as shown in Table 4. In like-for-like testing, the random forest model in this application outperforms a DNN (accuracy: 0.67), a support vector machine (accuracy: 0.81), and a logistic regression model (accuracy: 0.70).
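For reference, accuracy, sensitivity, and specificity are all derived from the four cells of a binary confusion matrix. The sketch below shows the arithmetic; the function name is hypothetical and the counts are made up for illustration, not taken from Table 4.

```python
from typing import Dict


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Compute accuracy, sensitivity (recall), and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),   # share of positive samples correctly identified
        "specificity": tn / (tn + fp),   # share of negative samples correctly identified
    }


# Made-up counts: 10 positive samples (8 found), 10 negative samples (9 found).
metrics = classification_metrics(tp=8, fn=2, tn=9, fp=1)
print(metrics)
```

A sensitivity of 1.00 therefore means no false negatives in the evaluated samples, while a specificity of 0.98 means that about 2% of negative samples were called positive.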
- the Rfmodel model can predict classical splicing (GT-AG) and hidden splicing, including: indels longer than 50 bp, the polypyrimidine tract of the intron region, Branch AA, and exonic splicing enhancers (ESE) of the exon region.
- this model can effectively compensate for or replace SpliceAI.
- Table 5 compares the model's predictions on negative and positive data newly collected in the past year with the results of other analytical methods.
- the process of training the prediction model and applying the prediction model to predict MET 14 skipping mutations is as follows:
- Step A: obtain DNA data from non-small cell lung cancer patients collected over the years, and take from the MET gene VCF analysis file the chromosome number (chr), physical coordinates (pos), reference genome bases (ref), and mutated bases (alt) as input-layer pre-information.
- Step B: data preprocessing.
- the information from Step A is converted into chr, pos, lenR, and lenA.
- Step C: load the data into the model.
- random_state sets the seed for the random number generator to ensure that the results of each split are the same, increasing the reproducibility of the code.
- Step D: obtain new NGS data.
- the bioinformatics analysis VCF file provides the chromosome number (chr), physical coordinates (pos), reference genome bases (ref), and mutated bases (alt) as input pre-information.
- Step E: data preprocessing.
- the input pre-information is converted into chr, pos, lenR, and lenA, and loaded into the random forest model Rfmodel for prediction.
- Table 1 shows the binary classification prediction results of Rfmodel: "0" indicates that the variant is not a MET 14 splice mutation, and "1" indicates that it is a MET 14 splice mutation.
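The preprocessing in Steps B and E can be sketched as a conversion from one VCF record to the four model inputs. This is a minimal sketch under stated assumptions: lenR and lenA are taken to be the lengths of the REF and ALT alleles, only the first ALT allele is used when several are listed, and X and Y are mapped to 23 and 24. None of these choices, nor the function names, is confirmed by the application.

```python
from typing import List


def chrom_to_number(chrom: str) -> int:
    """Convert 'chr7' or '7' to 7; X and Y map to 23 and 24 (assumption)."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    special = {"X": 23, "Y": 24}
    if name.upper() in special:
        return special[name.upper()]
    return int(name)


def vcf_record_to_features(vcf_line: str) -> List[int]:
    """Return [chr, pos, lenR, lenA] for one tab-separated VCF data line."""
    chrom, pos, _variant_id, ref, alt = vcf_line.rstrip("\n").split("\t")[:5]
    first_alt = alt.split(",")[0]  # assumption: use the first ALT allele only
    return [chrom_to_number(chrom), int(pos), len(ref), len(first_alt)]


# Made-up record: a 4-base deletion (REF 'GTAAG' replaced by ALT 'G').
record = "chr7\t116412043\t.\tGTAAG\tG\t.\tPASS\t."
row = vcf_record_to_features(record)
print(row)
```

A list of such rows can be passed to the `predict` method of a trained classifier, which returns 0 or 1 per variant as in Table 1.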
- Figure 7 illustrates an embodiment of another method for predicting MET 14 skipping mutations provided in this application.
- the method involves obtaining basic MET gene data and processing the basic data to predict MET 14 skipping mutations. Specifically:
- MET gene data (obtain the patient's basic MET gene data), including chromosome number, physical coordinates, reference genome bases, and mutated bases;
- the number of base changes in the reference genome and the number of mutated base changes are calculated from the reference genome bases and the mutated bases, respectively.
- the input to the (first) prediction model also includes reference genome bases and mutated bases.
- the chromosome number, physical coordinates, reference genome bases, mutated bases, number of reference genome base changes, and number of mutated base changes are input to the prediction model to obtain the prediction result of whether it is a MET 14 skipping mutation.
- the training process of the (first) prediction model is as follows: obtain the patient's MET gene dataset and labels, including chromosome number, physical coordinates, reference genome bases, and mutated bases; calculate the number of changes in reference genome bases and mutated bases, and then input the chromosome number, physical coordinates, number of changes in reference genome bases, and number of changes in mutated bases into the model to be trained for training to obtain the prediction model, wherein the labels are RNA positive samples or other positive samples.
- the training process of the (first) prediction model is as follows: obtain the patient's MET gene dataset and labels, including chromosome number, physical coordinates, reference genome bases, and mutated bases; calculate the number of changes in reference genome bases and mutated bases, and then input the chromosome number, physical coordinates, number of changes in reference genome bases, number of changes in mutated bases, reference genome bases, and mutated bases into the model to be trained for training to obtain the prediction model, wherein the labels are RNA positive samples or other positive samples.
- the chromosome number, physical coordinates, reference genome bases, mutated bases, number of reference genome base changes, and number of mutated base changes are input into the prediction model to obtain the prediction result of whether it is a classical splicing or a hidden splicing MET 14 skipping mutation.
- the chromosome number, physical coordinates, reference genome bases, mutated bases, number of reference genome base changes, and number of mutated base changes are input into the prediction model to obtain a prediction result of whether it is a classic splicing MET 14 skipping mutation.
- the chromosome number, physical coordinates, reference genome bases, mutated bases, number of reference genome base changes, and number of mutated base changes are input into the prediction model to obtain the prediction result of whether it is a hidden splicing MET 14 skipping mutation.
- the MET 14 skipping mutation includes classical splicing and/or hidden splicing; the chromosome number, physical coordinates, number of base changes in the reference genome, and number of mutated base changes are input into the prediction model to obtain the prediction result of whether it is a classical splicing and/or hidden splicing MET 14 skipping mutation.
- the classical splicing includes GT-AG.
- the chromosome number, physical coordinates, number of base changes in the reference genome, and number of mutated base changes are input into the prediction model to obtain a prediction result of whether it is a classic splicing MET 14 skipping mutation.
- the chromosome number, physical coordinates, number of base changes in the reference genome, and number of mutated base changes are input into the prediction model to obtain a prediction result of whether it is a hidden splicing MET 14 skipping mutation.
- the chromosome number, physical coordinates, number of base changes in the reference genome, and number of mutated base changes are input into the prediction model to obtain the prediction results of whether it is a classical splicing or a hidden splicing MET 14 skipping mutation.
- the chromosome number, physical coordinates, number of base changes in the reference genome, and number of mutated base changes are input into the prediction model to obtain a prediction result of whether it is a MET 14 skipping mutation, and includes the predicted probability of classic splicing MET 14 skipping mutation and hidden splicing MET 14 skipping mutation.
- the hidden splicing includes one or more of the following: indels of 50 bp or more, the polypyrimidine tract of the intron region, Branch AA, and exonic splicing enhancers (ESE) of the exon region;
- the hidden splicing includes: indels of 50 bp or more, the polypyrimidine tract of the intron region, Branch AA, and exonic splicing enhancers (ESE) of the exon region.
- the difference between the embodiments in Figure 1 and Figure 7 lies in the data acquired.
- the embodiment in Figure 1 acquires basic MET gene data and processed gene data, which are then directly input into the prediction model for prediction.
- the embodiment in Figure 7 acquires basic MET gene data, performs computational processing on the basic MET gene data, and then inputs the basic MET gene data and the processed data into the first prediction model for prediction.
- the prediction model and the first prediction model are different prediction models trained based on different input data.
- the present application also discloses a computer program product or system, including a computer program that, when executed by a processor, implements the above-described method steps for predicting MET 14 skipping mutations.
- Figure 2 is a schematic diagram of the system for predicting MET 14 skipping mutations provided in this application embodiment, specifically including: an acquisition unit: acquiring patient MET gene data, including chromosome number, physical coordinates, number of base changes in the reference genome, and number of mutated base changes;
- a prediction unit: inputting the chromosome number, physical coordinates, number of base changes in the reference genome, and number of mutated base changes into the prediction model to obtain the prediction result of whether it is a MET 14 skipping mutation.
- Figure 3 is a schematic diagram of a device for predicting MET 14 skipping mutations provided in an embodiment of this application, specifically including: a memory and a processor; the memory is used to store program instructions; the processor is used to call the program instructions, and when the program instructions are executed, any one of the above-described methods for predicting MET 14 skipping mutations is executed.
- the present application also discloses a computer-readable storage medium storing a computer program, which, when executed by a processor, implements any of the above-described methods for predicting MET 14 skipping mutations.
- Figure 8 is a schematic diagram of the system for predicting MET 14 skipping mutations provided in the embodiments of this application, specifically including: an acquisition unit: acquiring patient MET gene data, including chromosome number, physical coordinates, reference genome bases, and mutated bases;
- a calculation unit: calculating the number of base changes in the reference genome and the number of mutated base changes from the reference genome bases and the mutated bases;
- a prediction unit: inputting the chromosome number, physical coordinates, number of base changes in the reference genome, and number of mutated base changes into the prediction model to obtain the prediction result of whether it is a MET 14 skipping mutation.
- Multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
- The mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection between devices or units through some interfaces, and may be electrical, mechanical, or in other forms.
- The units described as separate components may or may not be physically separated; the components shown as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of this embodiment.
- The functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.
- The integrated units described above can be implemented in hardware or as software functional units. Those skilled in the art will understand that all or part of the steps in the various methods of the above embodiments can be implemented by a program instructing related hardware.
- This program can be stored in a computer-readable storage medium, which may include: read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disk, etc.
- The program can be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Medical Informatics (AREA)
- Biophysics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Biology (AREA)
- Biotechnology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioethics (AREA)
- Public Health (AREA)
- Evolutionary Computation (AREA)
- Epidemiology (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Genetics & Genomics (AREA)
- Molecular Biology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The present application relates to the field of intelligent healthcare, and in particular to a method, device, and program product for predicting MET 14 skipping mutations. The method comprises: acquiring patient MET gene data, including a chromosome number, physical coordinates, a number of base changes in a reference genome, and a number of mutated base changes; and inputting the chromosome number, the physical coordinates, the number of base changes in the reference genome, and the number of mutated base changes into a prediction model to obtain a prediction result indicating whether a MET 14 skipping mutation is present. The present invention can detect both canonical-splicing and cryptic-splicing MET 14 skipping mutations, improves the mutation detection rate, and has good clinical value.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202510183538.9A CN120048347A (zh) | 2025-02-19 | 2025-02-19 | Method, device, and program product for predicting MET 14 skipping mutations |
| CN202510183538.9 | 2025-02-19 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025237449A2 (fr) | 2025-11-20 |
Family
ID=95754687
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2025/112642 Pending WO2025237449A2 (fr) | 2025-02-19 | 2025-08-05 | Method, device, and program product for predicting MET 14 skipping mutations |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN120048347A (fr) |
| WO (1) | WO2025237449A2 (fr) |
2025
- 2025-02-19 CN CN202510183538.9A patent/CN120048347A/zh active Pending
- 2025-08-05 WO PCT/CN2025/112642 patent/WO2025237449A2/fr active Pending
Also Published As
| Publication number | Publication date |
|---|---|
| CN120048347A (zh) | 2025-05-27 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP7689557B2 (ja) | | Integrated machine learning framework for estimating homologous recombination deficiency |
| CN106909806B (zh) | | Method and device for site-specific detection of variation |
| US20230187021A1 (en) | | Methods for Non-Invasive Assessment of Genomic Instability |
| US20230114581A1 (en) | | Systems and methods for predicting homologous recombination deficiency status of a specimen |
| US20250333797A1 (en) | | Normalizing tumor mutation burden |
| US20220215900A1 (en) | | Systems and methods for joint low-coverage whole genome sequencing and whole exome sequencing inference of copy number variation for clinical diagnostics |
| CA3167253A1 (fr) | | Methods and systems for a liquid biopsy assay |
| CN104762402A (zh) | | Method for ultra-fast detection of single-base mutations and micro-indels in the human genome |
| WO2021258026A1 (fr) | | Detection of molecular response and progression from circulating cell-free DNA |
| US20240076744A1 (en) | | METHODS AND SYSTEMS FOR mRNA BOUNDARY ANALYSIS IN NEXT GENERATION SEQUENCING |
| US20220415443A1 (en) | | Machine-learning model for generating confidence classifications for genomic coordinates |
| WO2019132010A1 (fr) | | Method, apparatus, and program for estimating base type in a base sequence |
| WO2024254548A1 (fr) | | Methylation-based biological sex prediction |
| WO2025237449A2 (fr) | | Method, device, and program product for predicting MET 14 skipping mutations |
| CN117561573A (zh) | | Automatic identification of failure sources in nucleotide sequencing from base call error patterns |
| US20250166734A1 (en) | | Machine learning systems and methods for somatic mutation detection |
| US20240233872A9 (en) | | Component mixture model for tissue identification in dna samples |
| US20240296920A1 (en) | | Redacting cell-free dna from test samples for classification by a mixture model |
| WO2023245068A1 (fr) | | Systems and methods for sequencing and analysis of nucleic acid diversity |
| WO2025188814A1 (fr) | | Consensus classification technique for determining genetically inferred ancestry from comprehensive genomic profiling of tumor DNA |
| HK1239889A (en) | | Method of detecting designated location of variation and device thereof |
| HK1239889A1 (en) | | Method of detecting designated location of variation and device thereof |