
CN119361137A - Lung cancer detection method and system using high-throughput sequencing technology - Google Patents

Lung cancer detection method and system using high-throughput sequencing technology

Info

Publication number
CN119361137A
Authority
CN
China
Prior art keywords
patient
lung cancer
markers
patients
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202411921747.0A
Other languages
Chinese (zh)
Other versions
CN119361137B (en)
Inventor
张力喆
刘沁
崔天星
王其成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Weierxiao Medical Laboratory Co ltd
Original Assignee
Hangzhou Weierxiao Medical Laboratory Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Weierxiao Medical Laboratory Co ltd
Priority to CN202411921747.0A
Publication of CN119361137A
Application granted
Publication of CN119361137B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention provides a lung cancer detection method and system using high-throughput sequencing technology, belonging to the technical field of bioinformatics. The method comprises: determining, from an analysis of the detection data, the false detection data of a lung cancer marker at the detection depth and in the PCR amplification process corresponding to the high-throughput sequencing; determining the credibility coefficient of the lung cancer marker, and the credible markers, by combining the distribution data of the lung cancer marker and of other types of lung cancer markers in different patients; determining how similar the patient's credible markers are to the marker distribution data of different historical diagnosis patients and, by combining the credibility coefficients of the different credible markers, determining the distribution similarity coefficients between the patient and the different historical diagnosis patients, as well as the reference diagnosis patients; and outputting the lung cancer detection result of the patient based on the distribution data of misdiagnosed patients among the reference diagnosis patients in different similarity coefficient intervals, thereby improving the accuracy of the diagnosis result.

Description

Lung cancer detection method and system using high-throughput sequencing technology
Technical Field
The invention belongs to the technical field of bioinformatics, and particularly relates to a lung cancer detection method and system using high-throughput sequencing technology.
Background
To detect lung cancer, invention patent application CN202311543059.0, "a method for predicting the oncogenicity expression of the knowledge graph of the multi-type variation of the lung cancer genome", converts the alignment and correction results of whole-genome high-throughput sequencing into abnormal data, constructs a preliminary knowledge graph, and then refines the knowledge graph with additional information. However, the following technical problem remains:
In lung cancer detection using high-throughput sequencing technology, the prior art usually detects lung cancer from the tumor markers, or combinations of markers, identified in the gene detection result. However, the gene detection result depends not only on the patient's gene mutations but also on factors such as PCR amplification and detection depth. If these other factors are ignored, the accuracy of the lung cancer detection result cannot be guaranteed.
To address this technical problem, the invention provides a lung cancer detection method and system using high-throughput sequencing technology.
Disclosure of Invention
In order to achieve the purpose of the invention, the invention adopts the following technical scheme:
According to one aspect of the present invention, a lung cancer detection method using high-throughput sequencing technology is provided.
The lung cancer detection method using high-throughput sequencing technology specifically comprises the following steps:
S1, determining the sequencing result of the patient's lung cancer markers using high-throughput sequencing technology, and proceeding to the next step when the sequencing result of the patient is determined to be abnormal based on the distribution data of the patient's lung cancer markers and the diagnosis matching data of the different types of lung cancer markers;
S2, determining, from an analysis of the detection data, the false detection data of each lung cancer marker at the detection depth and in the PCR amplification process corresponding to the high-throughput sequencing, and determining the credibility coefficient of the lung cancer marker, and the credible markers, by combining the distribution data of the lung cancer marker and of other types of lung cancer markers in different patients;
S3, determining how similar the patient's credible markers are to the marker distribution data of different historical diagnosis patients, and combining the credibility coefficients of the different credible markers to determine the distribution similarity coefficients between the patient and the different historical diagnosis patients, as well as the reference diagnosis patients;
S4, outputting the lung cancer detection result of the patient based on the distribution data of misdiagnosed patients among the reference diagnosis patients in different similarity coefficient intervals.
The invention has the beneficial effects that:
Whether the sequencing result of the patient is abnormal is determined from the distribution data of the patient's lung cancer markers and the diagnosis matching data of the different types of lung cancer markers. The abnormality is thus assessed from two angles, the number of lung cancer markers present and the diagnostic accuracy of each lung cancer marker, which enables differentiated analysis of the sequencing results of different patients and improves the accuracy of the lung cancer detection and analysis result.
The lung cancer detection result of the patient is output based on the distribution data of misdiagnosed patients among the reference diagnosis patients in different similarity coefficient intervals. This takes full account of how misdiagnosed patients are distributed among the reference diagnosis patients while also weighing the similarity coefficients of the different reference diagnosis patients, so that the misdiagnosis rate of the patient's lung cancer detection result is evaluated accurately and the technical problem of inaccurate lung cancer diagnosis caused by considering sequencing data alone is avoided.
In a further technical solution, the distribution data of the patient's lung cancer markers are determined according to the positions of the gene segments corresponding to the patient's lung cancer markers.
In a further technical solution, the diagnosis matching data of a lung cancer marker are determined according to the distribution data of misdiagnosed patients among the historical diagnosis patients corresponding to that lung cancer marker.
In a further technical solution, determining that the sequencing result of the patient is abnormal specifically comprises:
determining the lung cancer markers of the patient based on the distribution data of the patient's lung cancer markers, and taking them as patient matching markers;
determining, according to the diagnosis matching data of the different patient matching markers, the distribution data of misdiagnosed patients among the historical diagnosis patients corresponding to each patient matching marker, and determining the diagnosis matching coefficient of each patient matching marker from the distribution data of the misdiagnosed patients;
determining the diagnosis matching markers of the patient from the diagnosis matching coefficients of the different patient matching markers, and determining whether the sequencing result of the patient is abnormal from the number of diagnosis matching markers.
In a further technical solution, the diagnosis matching coefficient is determined according to the proportion of misdiagnosed patients.
In a further technical solution, a diagnosis matching marker of the patient is a patient matching marker whose diagnosis matching coefficient is larger than a preset matching coefficient.
In a further technical solution, outputting the lung cancer detection result of the patient specifically comprises:
determining the proportion of misdiagnosed patients in each similarity coefficient interval according to the distribution data of misdiagnosed patients among the reference diagnosis patients in the different similarity coefficient intervals;
determining a corresponding preset correction factor based on the number of reference diagnosis patients in each similarity coefficient interval, and determining the weight coefficient of each similarity coefficient interval as the product of the preset weight coefficient corresponding to that interval and the preset correction factor;
determining the lung cancer diagnosis misdiagnosis rate of the patient according to the weight coefficients of the different similarity coefficient intervals and the proportions of misdiagnosed patients, and outputting the lung cancer detection result of the patient using the lung cancer diagnosis misdiagnosis rate.
In a further technical solution, outputting the lung cancer detection result of the patient using the lung cancer diagnosis misdiagnosis rate specifically comprises:
when the lung cancer diagnosis misdiagnosis rate is smaller than a preset misdiagnosis rate, determining that the lung cancer detection result of the patient is that lung cancer is present;
when the lung cancer diagnosis misdiagnosis rate is not smaller than the preset misdiagnosis rate, determining that the lung cancer detection result of the patient is that lung cancer is not present.
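The interval-weighted misdiagnosis rate described above can be illustrated with a minimal Python sketch. It rests on assumptions the text does not fix: the names `Interval`, `correction_factor`, `misdiagnosis_rate`, and `lung_cancer_result` are hypothetical, the preset correction factor is assumed to grow with the number of reference diagnosis patients and to be capped at 1, and the overall rate is assumed to be a weight-normalized average of the per-interval proportions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Interval:
    """One similarity coefficient interval of reference diagnosis patients."""
    preset_weight: float   # preset weight coefficient of the interval (assumed value)
    n_reference: int       # reference diagnosis patients falling in the interval
    n_misdiagnosed: int    # misdiagnosed patients among them


def correction_factor(n_reference: int, full_count: int = 50) -> float:
    """Hypothetical preset correction factor: grows with sample size, capped at 1."""
    return min(1.0, n_reference / full_count)


def misdiagnosis_rate(intervals: Sequence[Interval]) -> float:
    """Weight-normalized average of the per-interval misdiagnosis proportions."""
    weighted_sum = 0.0
    weight_total = 0.0
    for iv in intervals:
        if iv.n_reference == 0:
            continue  # an empty interval carries no evidence
        # Interval weight = preset weight coefficient x preset correction factor.
        weight = iv.preset_weight * correction_factor(iv.n_reference)
        weighted_sum += weight * (iv.n_misdiagnosed / iv.n_reference)
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("no reference diagnosis patients available")
    return weighted_sum / weight_total


def lung_cancer_result(intervals: Sequence[Interval], preset_rate: float) -> bool:
    """True means lung cancer is reported (rate below the preset misdiagnosis rate)."""
    return misdiagnosis_rate(intervals) < preset_rate
```

With two intervals, `Interval(1.0, 50, 5)` and `Interval(0.5, 25, 5)`, the weights under these assumptions are 1.0 and 0.25 and the proportions are 0.1 and 0.2, so the sketch yields a rate of 0.12.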
In a second aspect, the invention provides a computer system comprising a memory and a processor in communication with each other, and a computer program stored in the memory and executable on the processor, wherein the processor, when running the computer program, performs the lung cancer detection method using high-throughput sequencing technology described above.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the invention as set forth hereinafter.
In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
The above and other features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
FIG. 1 is a flow chart of a method for lung cancer detection using high throughput sequencing technology.
FIG. 2 is a flow chart of a method of determining the presence of anomalies in a sequencing result of a patient.
FIG. 3 is a flow chart of a method of determining the credibility coefficient of a lung cancer marker.
FIG. 4 is a flow chart of a method of determining the reference diagnosis patients.
FIG. 5 is a block diagram of a computer system.
Detailed Description
In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, shall fall within the scope of the present disclosure.
Besides the patient's gene mutations, the gene detection result also depends on PCR amplification, detection depth, and similar factors, so the influence of multiple factors must be taken into account when determining the lung cancer detection result.
The lung cancer markers of the patient are taken as patient matching markers. The diagnosis matching coefficient of each patient matching marker is determined according to the proportion of misdiagnosed patients among the historical diagnosis patients corresponding to that marker, and the sequencing result of the patient is determined to be abnormal when the sum of the diagnosis matching coefficients of the different patient matching markers is smaller than 0.7.
The false detection proportion of each lung cancer marker is determined from the false detection data of the lung cancer marker at the detection depth and in the PCR amplification process corresponding to the high-throughput sequencing, and the credibility coefficient of the lung cancer marker is determined from the false detection proportion. For example, when the false detection proportion is 0.1, the credibility coefficient of the lung cancer marker is 0.9. Lung cancer markers with a credibility coefficient greater than 0.95 are taken as credible markers.
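A minimal Python sketch of this screening step follows. The text gives only one data point (a false detection proportion of 0.1 maps to a credibility coefficient of 0.9); the mapping "1 minus the false detection proportion" is an assumption consistent with that data point, and the function names and marker labels are hypothetical.

```python
from typing import List, Mapping


def credibility_coefficient(false_detection_proportion: float) -> float:
    """Assumed mapping: credibility = 1 - false detection proportion."""
    if not 0.0 <= false_detection_proportion <= 1.0:
        raise ValueError("false detection proportion must lie in [0, 1]")
    return 1.0 - false_detection_proportion


def credible_markers(
    false_detection: Mapping[str, float], threshold: float = 0.95
) -> List[str]:
    """Keep markers whose credibility coefficient exceeds the threshold (0.95 in the example)."""
    return [
        marker
        for marker, proportion in false_detection.items()
        if credibility_coefficient(proportion) > threshold
    ]
```

For example, a marker with a false detection proportion of 0.1 is dropped under the 0.95 threshold, while one with a proportion of 0.02 is kept.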
The distribution similarity coefficient between the patient and each historical diagnosis patient is determined by normalizing and summing the credibility coefficients of the similar lung cancer markers, and historical diagnosis patients with a distribution similarity coefficient greater than 0.7 are taken as reference diagnosis patients.
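One way to read this step is sketched below in Python. The text does not define the normalization, so the sketch assumes that "similar" markers are the credible markers shared with the historical diagnosis patient and that the sum is normalized by the total credibility of the patient's credible markers. All names and data are hypothetical.

```python
from typing import AbstractSet, List, Mapping


def distribution_similarity(
    patient_markers: Mapping[str, float], historical_markers: AbstractSet[str]
) -> float:
    """Credibility-weighted share of the patient's credible markers found in a historical patient."""
    total = sum(patient_markers.values())
    if total == 0.0:
        return 0.0
    shared = sum(c for m, c in patient_markers.items() if m in historical_markers)
    return shared / total


def reference_patients(
    patient_markers: Mapping[str, float],
    history: Mapping[str, AbstractSet[str]],
    threshold: float = 0.7,
) -> List[str]:
    """Historical diagnosis patients whose similarity coefficient exceeds the threshold."""
    return [
        patient_id
        for patient_id, markers in history.items()
        if distribution_similarity(patient_markers, markers) > threshold
    ]
```

With four credible markers of roughly equal credibility, a historical patient sharing three of them scores about 0.75 and is kept as a reference diagnosis patient, while one sharing a single marker scores about 0.25 and is not.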
The lung cancer diagnosis misdiagnosis rate is determined from the proportion of misdiagnosed patients among the reference diagnosis patients. When the lung cancer diagnosis misdiagnosis rate is smaller than the preset misdiagnosis rate, the lung cancer detection result of the patient is determined to be that lung cancer is present; when the lung cancer diagnosis misdiagnosis rate is not smaller than the preset misdiagnosis rate, the lung cancer detection result of the patient is determined to be that lung cancer is not present.
In order to solve the above problems, according to one aspect of the present invention, as shown in FIG. 1, a lung cancer detection method using high-throughput sequencing technology is provided, specifically comprising:
S1, determining the sequencing result of the patient's lung cancer markers using high-throughput sequencing technology, and proceeding to the next step when the sequencing result of the patient is determined to be abnormal based on the distribution data of the patient's lung cancer markers and the diagnosis matching data of the different types of lung cancer markers;
S2, determining, from an analysis of the detection data, the false detection data of each lung cancer marker at the detection depth and in the PCR amplification process corresponding to the high-throughput sequencing, and determining the credibility coefficient of the lung cancer marker, and the credible markers, by combining the distribution data of the lung cancer marker and of other types of lung cancer markers in different patients;
S3, determining how similar the patient's credible markers are to the marker distribution data of different historical diagnosis patients, and combining the credibility coefficients of the different credible markers to determine the distribution similarity coefficients between the patient and the different historical diagnosis patients, as well as the reference diagnosis patients;
S4, outputting the lung cancer detection result of the patient based on the distribution data of misdiagnosed patients among the reference diagnosis patients in different similarity coefficient intervals.
Further, the distribution data of the patient's lung cancer markers are determined according to the positions of the gene segments corresponding to the patient's lung cancer markers.
The diagnosis matching data of a lung cancer marker are determined according to the distribution data of misdiagnosed patients among the historical diagnosis patients corresponding to that lung cancer marker.
It will be appreciated that, as shown in FIG. 2, determining that the sequencing result of the patient is abnormal specifically comprises:
determining the lung cancer markers of the patient based on the distribution data of the patient's lung cancer markers, and taking them as patient matching markers;
determining, according to the diagnosis matching data of the different patient matching markers, the distribution data of misdiagnosed patients among the historical diagnosis patients corresponding to each patient matching marker, and determining the diagnosis matching coefficient of each patient matching marker from the distribution data of the misdiagnosed patients;
determining the diagnosis matching markers of the patient from the diagnosis matching coefficients of the different patient matching markers, and determining whether the sequencing result of the patient is abnormal from the number of diagnosis matching markers.
Specifically, the diagnosis matching coefficient is determined according to the proportion of misdiagnosed patients.
A diagnosis matching marker of the patient is a patient matching marker whose diagnosis matching coefficient is larger than a preset matching coefficient.
It will be appreciated that when the number of diagnosis matching markers is greater than a preset number of markers, the sequencing result of the patient is determined to be abnormal.
Further, when the sequencing result of the patient is not abnormal, the lung cancer detection result of the patient is determined to be that lung cancer is not present.
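The count-based abnormality check above can be sketched in Python as follows. The text states only that the diagnosis matching coefficient depends on the proportion of misdiagnosed patients; taking it as "1 minus that proportion" is an assumption, and the function names, marker labels, and threshold values are hypothetical.

```python
from typing import Mapping, Tuple


def diagnosis_matching_coefficient(n_misdiagnosed: int, n_historical: int) -> float:
    """Assumed form: 1 - proportion of misdiagnosed historical diagnosis patients."""
    if n_historical <= 0:
        return 0.0  # no history for this marker, so no diagnostic support
    return 1.0 - n_misdiagnosed / n_historical


def sequencing_result_abnormal(
    marker_stats: Mapping[str, Tuple[int, int]],
    preset_coefficient: float,
    preset_marker_count: int,
) -> bool:
    """marker_stats maps each patient matching marker to (misdiagnosed, historical) counts."""
    diagnosis_matching = [
        marker
        for marker, (n_mis, n_hist) in marker_stats.items()
        if diagnosis_matching_coefficient(n_mis, n_hist) > preset_coefficient
    ]
    # Abnormal when the number of diagnosis matching markers exceeds the preset number.
    return len(diagnosis_matching) > preset_marker_count
```

A result of `False` corresponds to the case in which the detection result is reported as lung cancer not present without running the later steps.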
In addition, determining that the sequencing result of the patient is abnormal specifically comprises:
determining the lung cancer markers of the patient based on the distribution data of the patient's lung cancer markers, and taking them as patient matching markers;
determining, according to the diagnosis matching data of the different patient matching markers, the distribution data of misdiagnosed patients among the historical diagnosis patients corresponding to each patient matching marker, and determining the diagnosis matching markers among the patient matching markers according to the number of misdiagnosed patients;
determining whether the sequencing result of the patient is abnormal from the number of diagnosis matching markers.
Specifically, determining that the sequencing result of the patient is abnormal comprises:
S11, determining the lung cancer markers of the patient based on the distribution data of the patient's lung cancer markers, and taking them as patient matching markers;
S12, determining, according to the diagnosis matching data of the different patient matching markers, the distribution data of misdiagnosed patients among the historical diagnosis patients corresponding to each patient matching marker, and determining the diagnosis matching coefficient of each patient matching marker from the distribution data of the misdiagnosed patients;
S13, determining the comprehensive matching coefficient of the patient from the diagnosis matching coefficients of the different patient matching markers, and determining whether the sequencing result of the patient is abnormal using the comprehensive matching coefficient.
Optionally, before entering step S12, it is further determined whether the number of patient matching markers is greater than a preset number of matching markers; if it is, the sequencing result of the patient is determined to be abnormal, and if it is not, the method proceeds to step S12.
Optionally, step S12 includes steps S121 to S124, specifically:
S121, determining, according to the diagnosis matching data of the different patient matching markers, the distribution data of misdiagnosed patients among the historical diagnosis patients corresponding to each patient matching marker; when there is a patient matching marker whose proportion of misdiagnosed patients is smaller than a preset proportion, proceeding to step S122; and when there is no such patient matching marker, determining that the sequencing result of the patient is not abnormal;
S122, taking the patient matching markers whose proportion of misdiagnosed patients is smaller than the preset proportion as screening markers; when the number of historical diagnosis patients corresponding to the screening markers is larger than a preset number of patients, determining that the sequencing result of the patient is abnormal; and when the number of historical diagnosis patients corresponding to the screening markers is not larger than the preset number of patients, proceeding to step S123;
S123, determining the data-reliable markers among the screening markers according to the number of historical diagnosis patients corresponding to each screening marker; when the number of data-reliable markers meets the requirement, determining that the sequencing result of the patient is abnormal; and when the number of data-reliable markers does not meet the requirement, proceeding to step S124;
S124, determining the diagnosis matching coefficients of the different patient matching markers according to the distribution data of the misdiagnosed patients, and proceeding to step S13.
Optionally, step S13 includes steps S131 to S134, specifically:
S131, when the diagnosis matching coefficients of all patient matching markers are smaller than a preset matching coefficient threshold, proceeding to step S132; and when there is a patient matching marker whose diagnosis matching coefficient is not smaller than the preset matching coefficient threshold, proceeding to step S133;
S132, when the number of patient matching markers is smaller than a preset marker number threshold, determining that the sequencing result of the patient is not abnormal; and when the number of patient matching markers is not smaller than the preset marker number threshold, proceeding to step S134;
S133, taking the patient matching markers whose diagnosis matching coefficient is not smaller than the preset matching coefficient threshold as matching markers; when the number of matching markers is larger than a preset matching marker number threshold, determining that the sequencing result of the patient is abnormal; and when the number of matching markers is not larger than the preset matching marker number threshold, proceeding to step S134;
S134, determining the comprehensive matching coefficient of the patient from the diagnosis matching coefficients of the different patient matching markers, and determining whether the sequencing result of the patient is abnormal using the comprehensive matching coefficient.
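The branching in steps S131 to S134 can be sketched in Python as follows. The text does not say how the comprehensive matching coefficient is computed or compared, so that final decision is passed in as a caller-supplied function; `step_s13` and all parameter names are hypothetical.

```python
from typing import Callable, Sequence


def step_s13(
    coefficients: Sequence[float],
    coefficient_threshold: float,
    marker_count_threshold: int,
    matching_count_threshold: int,
    decide_by_comprehensive: Callable[[Sequence[float]], bool],
) -> bool:
    """Return True when the sequencing result is judged abnormal."""
    if not coefficients:
        raise ValueError("at least one patient matching marker is required")
    # S131: look for markers whose coefficient is not smaller than the threshold.
    matching = [c for c in coefficients if c >= coefficient_threshold]
    if not matching:
        # S132: all coefficients are below the threshold.
        if len(coefficients) < marker_count_threshold:
            return False  # too few markers: not abnormal
    elif len(matching) > matching_count_threshold:
        # S133: enough matching markers on their own.
        return True
    # S134: fall back to the comprehensive matching coefficient.
    return decide_by_comprehensive(coefficients)
```

As one possible choice, a caller could pass `lambda c: sum(c) < 0.7`, mirroring the 0.7 sum threshold used in the example earlier in this description; whether that rule is the one intended for S134 is not stated.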
Further, the false detection data of a lung cancer marker comprise the number of false detection patients of the lung cancer marker and the proportion of false detection patients.
Specifically, as shown in FIG. 3, the credibility coefficient of a lung cancer marker is determined as follows:
determining the false detection proportion corresponding to the lung cancer marker according to the false detection data of the lung cancer marker at the detection depth and in the PCR amplification process corresponding to the high-throughput sequencing;
determining, according to the distribution data of the lung cancer marker and of other types of lung cancer markers in different patients, the patients in whom the lung cancer marker and another type of lung cancer marker are present at the same time, taking these patients as matched patients, and determining the matching markers among the other types of lung cancer markers according to the proportion of matched patients;
determining the credibility coefficient of the lung cancer marker based on the proportion of matching markers among the lung cancer markers of the patient.
Further, a matching marker is another type of lung cancer marker whose proportion of matched patients is larger than a preset proportion of matched patients.
In addition, the credibility coefficient of the lung cancer marker is determined according to the product of the proportion of matching markers among the lung cancer markers of the patient and the false detection proportion.
It can be appreciated that the credibility coefficient of the lung cancer marker ranges from 0 to 1, and that when the credibility coefficient of the lung cancer marker is greater than a preset credibility coefficient, the lung cancer marker is determined to be a credible marker.
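The selection of matching markers from co-occurrence can be sketched in Python as below. The text does not specify the denominator of the matched patient proportion; the sketch assumes it is the number of patients who carry the lung cancer marker under evaluation. The function name, marker labels, and cohort data are hypothetical, and the final combination into a credibility coefficient is left out.

```python
from collections import Counter
from typing import AbstractSet, Mapping, Set


def matching_markers(
    target: str,
    cohort: Mapping[str, AbstractSet[str]],
    preset_proportion: float,
) -> Set[str]:
    """Other markers that co-occur with `target` in more than the preset share of its carriers."""
    # Patients in whom the target marker is present.
    carriers = [markers for markers in cohort.values() if target in markers]
    if not carriers:
        return set()
    # For every other marker, count the matched patients (both markers present).
    co_counts = Counter(m for markers in carriers for m in markers if m != target)
    return {
        marker
        for marker, n_matched in co_counts.items()
        if n_matched / len(carriers) > preset_proportion
    }
```

In a four-patient cohort where marker `A` appears in three patients, two of whom also carry `B` and one of whom also carries `C`, a preset proportion of 0.5 keeps only `B` as a matching marker.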
In another embodiment, the credibility coefficient of a lung cancer marker is determined as follows:
determining the number of accurately diagnosed patients corresponding to the lung cancer marker according to the false detection data of the lung cancer marker at the detection depth and in the PCR amplification process corresponding to the high-throughput sequencing;
determining, according to the distribution data of the lung cancer marker and of other types of lung cancer markers in different patients, the patients in whom the lung cancer marker and another type of lung cancer marker are present at the same time, taking these patients as matched patients, and determining the matching markers among the other types of lung cancer markers according to the proportion of matched patients;
determining the credibility coefficient of the lung cancer marker based on the number of accurately diagnosed patients corresponding to the lung cancer marker and the number of matching markers.
Further, the credibility coefficient of the lung cancer marker is determined as the product of a preset credibility coefficient corresponding to the number of accurately diagnosed patients corresponding to the lung cancer marker and a preset credibility coefficient corresponding to the number of matching markers.
In one embodiment, the credibility coefficient of a lung cancer marker is determined as follows:
S21, determining the false detection proportion corresponding to the lung cancer marker according to the false detection data of the lung cancer marker at the detection depth and in the PCR amplification process corresponding to the high-throughput sequencing, and determining the diagnosis matching coefficient of the lung cancer marker by combining the number of false detection patients;
S22, determining, according to the distribution data of the lung cancer marker and of other types of lung cancer markers in different patients, the patients in whom the lung cancer marker and another type of lung cancer marker are present at the same time, taking these patients as matched patients, determining the matching markers among the other types of lung cancer markers according to the proportion of matched patients, and determining the marker matching coefficient of the lung cancer marker based on the number of matching markers and on the number and proportion of matched patients of each matching marker;
S23, determining the credibility coefficient of the lung cancer marker based on the marker matching coefficient and the diagnosis matching coefficient.
Optionally, step S21 includes steps S211 to S213, specifically:
S211, determining the number of false detection patients corresponding to the lung cancer marker according to the detection depth corresponding to the high-throughput sequencing and the false detection data of the lung cancer marker in the PCR amplification process; when the number of false detection patients is larger than a preset number of patients, determining that the lung cancer marker is not a credible marker; when the number of false detection patients is not larger than the preset number of patients and is within a preset number interval, proceeding to step S212; and when the number of false detection patients is not within the preset number interval, proceeding to step S213;
S212, determining the false detection proportion corresponding to the lung cancer marker; when the false detection proportion does not meet the requirement, determining that the lung cancer marker is not a credible marker; and when the false detection proportion meets the requirement, proceeding to step S213;
S213, determining the diagnosis matching coefficient of the lung cancer marker according to the false detection proportion corresponding to the lung cancer marker and the number of false detection patients; when the diagnosis matching coefficient is smaller than a preset matching coefficient threshold, determining that the lung cancer marker is not a credible marker; and when the diagnosis matching coefficient is not smaller than the preset matching coefficient threshold, proceeding to step S22.
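The staged screening in steps S211 to S213 can be sketched in Python as below. This is a hedged illustration: the text here does not give the formula for the diagnosis matching coefficient or for the false detection proportion, so the proportion is simplified to false detections over total patients and the coefficient formula is a placeholder. All parameter names are hypothetical.

```python
from typing import Optional, Tuple


def diagnosis_matching_coefficient(
    n_false: int,
    n_total: int,
    max_false: int,
    check_interval: Tuple[int, int],
    max_ratio: float,
    min_coeff: float,
) -> Optional[float]:
    """Return the coefficient, or None when the marker is rejected."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")

    # S211: too many false detection patients rejects the marker outright.
    if n_false > max_false:
        return None

    # Simplified false detection proportion (assumption, see lead-in).
    ratio = n_false / n_total

    # S212: only applied when the count falls inside the preset interval.
    low, high = check_interval
    if low <= n_false <= high and ratio > max_ratio:
        return None

    # S213: placeholder formula; smaller ratio and count give a larger value.
    coeff = (1.0 - ratio) / (1 + n_false)
    return coeff if coeff >= min_coeff else None
```

A return value of `None` corresponds to "not a credible marker"; any other value would be passed on to step S22.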
Optionally, step S22 includes steps S221 to S224, specifically:
S221, according to the distribution data of the lung cancer marker and other types of lung cancer markers across different patients, identifying each patient in whom the lung cancer marker and another type of lung cancer marker are both present and taking such a patient as a matched patient; when no other type of lung cancer marker has a proportion of matched patients larger than a preset matched-patient proportion, determining that the lung cancer marker is not a credible marker; and when such a marker exists, proceeding to step S222;
S222, determining the matched markers among the other types of lung cancer markers according to the proportion of matched patients; when the number of matched markers is larger than a preset matched-marker number threshold, proceeding to step S224; and when the number of matched markers is not larger than the preset matched-marker number threshold, proceeding to step S223;
S223, determining the basic matching coefficient of each matched marker according to the number and proportion of its matched patients; when no matched marker has a basic matching coefficient within a preset matching coefficient interval, determining that the lung cancer marker is not a credible marker; and when such a matched marker exists, proceeding to step S224;
S224, determining the marker matching coefficient of the lung cancer marker based on the number of matched markers and on the number and proportion of matched patients for each matched marker; when the marker matching coefficient is smaller than a preset coefficient threshold, determining that the lung cancer marker is not a credible marker; and when the marker matching coefficient is not smaller than the preset coefficient threshold, proceeding to step S23.
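The control flow of steps S221 to S224 can be sketched as follows. This is a minimal sketch under stated assumptions: the formulas for the basic matching coefficient and the marker matching coefficient are not specified in this passage, so the two used here are placeholders, and every parameter name is hypothetical.

```python
from typing import Dict, Optional, Tuple


def marker_matching_coefficient(
    matches: Dict[str, Tuple[int, float]],
    min_ratio: float,
    count_threshold: int,
    basic_interval: Tuple[float, float],
    min_coeff: float,
) -> Optional[float]:
    """matches maps other_marker -> (n_matched_patients, matched_ratio).

    Returns the marker matching coefficient, or None when the marker
    is rejected as not credible.
    """
    # S221/S222: keep only markers whose matched-patient ratio is high enough.
    matched = {m: v for m, v in matches.items() if v[1] > min_ratio}
    if not matched:
        return None

    # Placeholder basic coefficient: ratio damped for small patient counts.
    basics = [ratio * n / (n + 1) for n, ratio in matched.values()]

    # S223: only reached when the matched-marker count is not above threshold.
    if len(matched) <= count_threshold:
        low, high = basic_interval
        if not any(low <= b <= high for b in basics):
            return None

    # S224: placeholder aggregate that grows with the number of markers.
    k = len(basics)
    mean_basic = sum(basics) / k
    coeff = mean_basic * k / (k + 1)
    return coeff if coeff >= min_coeff else None
```

As in the source text, a large number of matched markers skips the S223 check and goes straight to the aggregate coefficient in S224.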
Specifically, as shown in FIG. 4, the method for determining the reference diagnosed patient is as follows:
Determining the matching data of the patient's credible markers based on the similarity between the patient's credible markers and the distribution data of the historically diagnosed patient's markers, and taking the matching data as similar markers;
Determining the distribution similarity coefficient between the patient and the historically diagnosed patient according to the number of similar markers and the credibility coefficients of the different similar markers, and using the distribution similarity coefficient to determine whether the historically diagnosed patient is a reference diagnosed patient.
Further, the distribution similarity coefficient between the patient and the historically diagnosed patient is determined as the weighted sum of the credibility coefficients of the different similar markers.
In addition, when the distribution similarity coefficient of the historically diagnosed patient is larger than a preset similarity coefficient threshold, the historically diagnosed patient is determined to be a reference diagnosed patient.
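The weighted sum and the threshold test can be illustrated with a short Python sketch. The function names, the default weight of 1.0 per marker, and the marker labels in the usage note are assumptions made for illustration only.

```python
from typing import Dict, Optional


def distribution_similarity(
    similar: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted sum of the credibility coefficients of the similar markers.

    similar maps each similar marker to its credibility coefficient.
    A marker without an explicit weight gets weight 1.0 (assumption), so
    the sum also grows with the number of similar markers.
    """
    weights = weights or {}
    return sum(weights.get(m, 1.0) * c for m, c in similar.items())


def is_reference_patient(
    similar: Dict[str, float],
    threshold: float,
    weights: Optional[Dict[str, float]] = None,
) -> bool:
    """True when the similarity coefficient exceeds the preset threshold."""
    return distribution_similarity(similar, weights) > threshold
```

With two similar markers labeled `m1` and `m2` carrying coefficients 0.9 and 0.6, the unweighted sum is 1.5, so a historically diagnosed patient would qualify as a reference under a threshold of 1.2 but not under a threshold of 2.0.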
Optionally, the method for determining the reference diagnosed patient comprises the following steps:
When it is determined, based on the similarity between the patient's credible markers and the distribution data of the historically diagnosed patient's markers, that the patient and the historically diagnosed patient have no matching credible marker:
determining that the historically diagnosed patient is not a reference diagnosed patient;
When it is determined that the patient and the historically diagnosed patient have a matching credible marker:
taking the matching credible markers as similar markers, and when the number of similar markers is smaller than a preset similar-marker number threshold:
acquiring the credibility coefficients of the different similar markers, and determining that the historically diagnosed patient is not a reference diagnosed patient when the average of these credibility coefficients is smaller than a preset credibility coefficient threshold;
When the average of the credibility coefficients of the different similar markers is not smaller than the preset credibility coefficient threshold:
determining that the historically diagnosed patient is not a reference diagnosed patient when the credibility coefficients of all of the similar markers are within a preset credibility coefficient interval;
When there is a similar marker whose credibility coefficient is not within the preset credibility coefficient interval, or the number of similar markers is not smaller than the preset similar-marker number threshold:
determining the distribution similarity coefficient between the patient and the historically diagnosed patient according to the number of similar markers and the credibility coefficients of the different similar markers, and using the distribution similarity coefficient to determine whether the historically diagnosed patient is a reference diagnosed patient.
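The nested conditions above can be read as a sequence of early rejections followed by the similarity test. The Python sketch below follows the branch order as written; the parameter names are hypothetical, and the final similarity coefficient is simplified to an unweighted sum of credibility coefficients, which is an assumption.

```python
from typing import Dict, Tuple


def is_reference_patient_staged(
    similar: Dict[str, float],
    min_count: int,
    min_mean: float,
    coeff_interval: Tuple[float, float],
    sim_threshold: float,
) -> bool:
    """similar maps each matching credible marker to its credibility coefficient."""
    # No matching credible marker: not a reference diagnosed patient.
    if not similar:
        return False

    coeffs = list(similar.values())

    # Extra checks apply only when there are few similar markers.
    if len(coeffs) < min_count:
        if sum(coeffs) / len(coeffs) < min_mean:
            return False
        low, high = coeff_interval
        # As written in the source, the patient is rejected when every
        # coefficient lies inside the preset interval.
        if all(low <= c <= high for c in coeffs):
            return False

    # Otherwise decide by the distribution similarity coefficient
    # (simplified here to an unweighted sum).
    return sum(coeffs) > sim_threshold
```

The early returns mirror the three "is not a reference diagnosed patient" outcomes in the text, so the similarity coefficient is computed only for candidates that survive them.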
Specifically, outputting the lung cancer detection result of the patient includes:
determining the proportion of misdiagnosed patients in each similarity coefficient interval according to the distribution data of misdiagnosed patients among the reference diagnosed patients in the different similarity coefficient intervals;
Determining a corresponding preset correction factor based on the number of reference diagnosed patients in each similarity coefficient interval, and determining the weight coefficient of each interval as the product of the preset weight coefficient corresponding to that interval and the preset correction factor;
Determining the lung cancer misdiagnosis rate of the patient according to the weight coefficients of the different similarity coefficient intervals and the proportions of misdiagnosed patients, and outputting the lung cancer detection result of the patient using the lung cancer misdiagnosis rate.
In a further technical solution, outputting the lung cancer detection result of the patient using the lung cancer misdiagnosis rate specifically comprises:
When the lung cancer misdiagnosis rate is smaller than a preset misdiagnosis rate, determining that the lung cancer detection result of the patient is that lung cancer is present;
When the lung cancer misdiagnosis rate is not smaller than the preset misdiagnosis rate, determining that the lung cancer detection result of the patient is that lung cancer is not present.
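The output step can be sketched in Python as a weighted average over similarity coefficient intervals followed by a threshold test. Several points are assumptions rather than details from the specification: the per-interval proportion is taken as misdiagnosed over reference patients in that interval, the correction factor is a made-up step function, and the weighted sum is normalized by the total weight so the rate stays between 0 and 1.

```python
from typing import List, Tuple

# Each interval: (preset_weight, n_reference_patients, n_misdiagnosed_patients)
Interval = Tuple[float, int, int]


def correction_factor(n_reference: int) -> float:
    """Placeholder preset correction factor: discount sparsely populated intervals."""
    return 0.5 if n_reference < 5 else 1.0


def misdiagnosis_rate(intervals: List[Interval]) -> float:
    weighted_sum = 0.0
    total_weight = 0.0
    for preset_weight, n_ref, n_mis in intervals:
        if n_ref == 0:
            continue  # an empty interval carries no evidence
        weight = preset_weight * correction_factor(n_ref)
        weighted_sum += weight * (n_mis / n_ref)
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no reference diagnosed patients in any interval")
    return weighted_sum / total_weight


def detection_result(rate: float, preset_rate: float) -> str:
    """Lung cancer is reported only when the misdiagnosis rate is below the preset rate."""
    return "lung cancer" if rate < preset_rate else "no lung cancer"
```

Normalizing by the total weight is a design choice made here so that the preset misdiagnosis rate can be interpreted on the same 0 to 1 scale regardless of how many intervals are populated.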
In a second aspect, embodiment 2, as shown in FIG. 5, the present invention provides a computer system comprising a memory and a processor that are communicatively coupled, and a computer program stored in the memory and capable of running on the processor, wherein the processor, when running the computer program, performs the lung cancer detection method using high-throughput sequencing technology described above.
In this specification, the embodiments are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus, device, and non-volatile computer storage medium embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding parts of the method embodiments.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The foregoing is merely one or more embodiments of the present description and is not intended to limit the present description. Various modifications and alterations to one or more embodiments of this description will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, or the like, which is within the spirit and principles of one or more embodiments of the present description, is intended to be included within the scope of the claims of the present description.

Claims (10)

1. A lung cancer detection method using high-throughput sequencing technology, characterized in that it specifically comprises:
determining the sequencing results of a patient's lung cancer markers using high-throughput sequencing technology, and proceeding to the next step when the patient's sequencing results are determined to be abnormal based on the distribution data of the patient's lung cancer markers and the diagnostic matching data of different types of lung cancer markers;
determining, from the analysis results of the detection data, the false detection data of the lung cancer markers at the detection depth corresponding to the high-throughput sequencing and during PCR amplification, and determining the credibility coefficients of the lung cancer markers and the credible markers in combination with the distribution data of the lung cancer markers and other types of lung cancer markers across different patients;
determining the similarity between the patient's credible markers and the distribution data of the markers of different historically diagnosed patients, and determining, in combination with the credibility coefficients of the different credible markers, the distribution similarity coefficients between the patient and the different historically diagnosed patients and the reference diagnosed patients;
outputting the patient's lung cancer detection result based on the distribution data of misdiagnosed patients among the reference diagnosed patients within different similarity coefficient intervals.
2. The lung cancer detection method using high-throughput sequencing technology according to claim 1, characterized in that the distribution data of the patient's lung cancer markers is determined according to the distribution positions of the gene fragments corresponding to the patient's lung cancer markers.
3. The lung cancer detection method using high-throughput sequencing technology according to claim 1, characterized in that the diagnostic matching data of a lung cancer marker is determined based on the distribution data of misdiagnosed patients among the historically diagnosed patients corresponding to the lung cancer marker.
4. The lung cancer detection method using high-throughput sequencing technology according to claim 1, characterized in that determining that the patient's sequencing results are abnormal specifically comprises:
determining the patient's lung cancer markers based on the distribution data of the patient's lung cancer markers, and taking them as patient matching markers;
determining, according to the diagnostic matching data of the different patient matching markers, the distribution data of misdiagnosed patients among the historically diagnosed patients corresponding to the different patient matching markers, and determining the diagnostic matching coefficients of the different patient matching markers from the distribution data of the misdiagnosed patients;
determining the patient's diagnostic matching markers from the diagnostic matching coefficients of the different patient matching markers, and using the number of diagnostic matching markers to determine whether the patient's sequencing results are abnormal.
5. The lung cancer detection method using high-throughput sequencing technology according to claim 4, characterized in that the diagnostic matching coefficient is determined according to the distribution proportion of misdiagnosed patients.
6. The lung cancer detection method using high-throughput sequencing technology according to claim 1, characterized in that the patient's diagnostic matching markers are the patient matching markers whose diagnostic matching coefficient is greater than a preset matching coefficient.
7. The lung cancer detection method using high-throughput sequencing technology according to claim 1, characterized in that the method for determining the reference diagnosed patient is:
determining the matching data of the patient's credible markers based on the similarity between the patient's credible markers and the distribution data of the historically diagnosed patient's markers, and taking the matching data as similar markers;
determining the distribution similarity coefficient between the patient and the historically diagnosed patient according to the number of similar markers and the credibility coefficients of the different similar markers, and using the distribution similarity coefficient to determine whether the historically diagnosed patient is a reference diagnosed patient.
8. The lung cancer detection method using high-throughput sequencing technology according to claim 7, characterized in that the distribution similarity coefficient between the patient and the historically diagnosed patient is determined as the weighted sum of the credibility coefficients of the different similar markers.
9. The lung cancer detection method using high-throughput sequencing technology according to claim 7, characterized in that when the distribution similarity coefficient of the historically diagnosed patient is greater than a preset similarity coefficient threshold, the historically diagnosed patient is determined to be a reference diagnosed patient.
10. A computer system, comprising: a memory and a processor that are communicatively connected, and a computer program stored in the memory and capable of running on the processor, characterized in that the processor, when running the computer program, executes the lung cancer detection method using high-throughput sequencing technology according to any one of claims 1 to 9.
CN202411921747.0A 2024-12-25 2024-12-25 A lung cancer detection method and system using high-throughput sequencing technology Active CN119361137B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411921747.0A CN119361137B (en) 2024-12-25 2024-12-25 A lung cancer detection method and system using high-throughput sequencing technology

Publications (2)

Publication Number Publication Date
CN119361137A true CN119361137A (en) 2025-01-24
CN119361137B CN119361137B (en) 2025-04-08

Family

ID=94316574

Country Status (1)

Country Link
CN (1) CN119361137B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120748768A (en) * 2025-09-02 2025-10-03 厦门大学附属第一医院(厦门市第一医院、厦门市红十字会医院、厦门市糖尿病研究所) Tumor radiotherapy reaction prediction system and method based on rule learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190316184A1 (en) * 2018-04-14 2019-10-17 Natera, Inc. Methods for cancer detection and monitoring
CN112522411A (en) * 2020-12-29 2021-03-19 北京泱深生物信息技术有限公司 Lung cancer gene marker group, application and diagnosis system
CN117412765A (en) * 2021-01-29 2024-01-16 国立卫生研究所 Methods of diagnosing MSI cancers
CN112941180A (en) * 2021-02-25 2021-06-11 浙江大学医学院附属妇产科医院 Group of lung cancer DNA methylation molecular markers and application thereof in preparation of lung cancer early diagnosis kit
CN116344037A (en) * 2023-04-04 2023-06-27 中山大学附属第六医院 Method, device, electronic equipment and storage medium for determining MSI classification

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZUGANG YIN等: "Application of artificial intelligence in diagnosis and treatment of colorectal cancer : A novel Prospect", 《FRONTIERS》, 8 March 2023 (2023-03-08) *
刘权兴: "基于液体活检的多组学检测技术在肺癌早期诊断中的临床应用研究", 《中国优秀博士论文电子期刊》, 15 May 2024 (2024-05-15) *

Also Published As

Publication number Publication date
CN119361137B (en) 2025-04-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant