WO2012071685A1 - Method and system for bioinformatics analysis of accurate papillomavirus typing - Google Patents
Method and system for bioinformatics analysis of accurate papillomavirus typing
- Publication number
- WO2012071685A1 (application PCT/CN2010/001943)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequence
- sample
- sequencing
- hpv
- fragments
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
Definitions
- The invention relates to the field of biological genetic engineering technology, and in particular to a method and system for bioinformatics analysis of accurate HPV typing.
Background art
- HPV Human Papillomavirus
- HPV16 high-risk types
- HPV6 low-risk types
- HPV6 high-risk types
- infection rates range from less than 1% to as high as 50%.
- More than 100 types of HPV can infect the skin (skin type) or the mucous membranes of the respiratory and anal genital tract (mucosal type), and more than 40 types of HPV can infect the cervix.
- HPV plays an important role in the initiation, development, progression, and even malignancy of many tumors, and is considered to be the tumor virus most closely related to human tumors.
- HPV typing is important for developing HPV treatment options, assessing the risk of HPV infection, and studying the regional specificity of HPV infection. Current research therefore suggests typing the HPV present in each sample, which helps to analyze the pathogenicity of the various HPV types in more detail and to achieve the best clinical prevention and treatment effects.
- the detection methods for HPV genotyping in the prior art mainly include the following:
- ELISA method: the specific reaction between antigen and antibody is used to link the analyte to an enzyme, and the color reaction between the enzyme and its substrate is used for quantitative determination. This method can only be used to identify individual subtypes and has gradually been replaced by other assays.
- PCR Polymerase Chain Reaction
- Hybrid capture detection method: molecular hybridization with chemiluminescence is used to amplify the signal, and the HPV type is determined by interpreting the light intensity. This method cannot detect specific HPV types or multiple infections, and its cost is high.
- PCR combined with hybridization detection: a method that uses PCR and hybridization together. This method is also time-consuming and complicated.
- Gene chip technology: there are many variants, and in situ synthesis of oligonucleotides is commonly used. This method has the disadvantages of inaccurate detection results, demanding experimental conditions, and high cost.
- A technical problem to be solved by the present invention is to provide a method and system for bioinformatics analysis of accurate HPV typing that can identify gene sequences and confirm the HPV type rapidly, with high sensitivity and specificity.
- One aspect of the present invention provides a method for bioinformatics analysis of accurate HPV typing, the method comprising: receiving sequencing fragments obtained by high-throughput sequencing technology; aligning the sample linker sequence in each sequencing fragment against a sample linker sequence library to perform the sub-sample operation; aligning the sequencing fragments to the reference genome sequence and screening the aligned sequences to determine the HPV type of each sequencing fragment or determine it to be negative; merging the sequence fragments of determined type by sample, and screening according to the number and proportion of sequence fragments supporting each type after merging; and finally confirming the HPV type of each sample or determining the sample to be negative.
- the method further comprises: after receiving the sequencing sequence, filtering the sequencing sequence to remove the unqualified sequence.
- the step of "filtering the sequencing sequence to remove the unqualified sequence" further comprises: presetting a sequencing quality threshold and a ratio threshold for unqualified bases; when the proportion of bases in a sequencing sequence whose sequencing quality value is below the sequencing quality threshold exceeds the ratio threshold, the sequencing sequence is considered unqualified and is filtered out; when the number of undetermined bases in a sequencing sequence exceeds 10% of the number of bases in the entire sequence, the sequencing sequence is considered unqualified and is filtered out; and the sequencing sequence is aligned against the sequencing linker sequences, and if a sequencing linker sequence is present in it, the sequencing sequence is considered unqualified and is filtered out.
- the method further comprises: removing the sample linker sequence from the sequence fragment after performing the sub-sample operation.
- the step of "removing the sample linker sequence from the sequence fragment" further comprises: presetting a sequencing quality threshold and a base number threshold for the sample linker sequence; a sequence is removed if the number of bases in its linker sequence whose sequencing quality value is below the sequencing quality threshold exceeds the base number threshold.
- the method further comprises: step a, performing a perfect match between the sample linker sequence and the sequences in the sample linker sequence library; step b, shortening (degrading) the sample linker sequence by 1-2 bp and performing a perfect match with the corresponding portion of the sequences in the sample linker sequence library; step c, allowing the sample linker sequence to contain only one base insertion, that is, performing a perfect match from the start of the sample linker sequence, treating the first base that cannot be matched as an inserted base, skipping it, and continuing the perfect match; step d, allowing the sample linker sequence to contain only one base deletion.
- The final alignment result of the sample linker sequence is determined in the priority order step a > step b > "step c or step d"; sequences with the same sample linker sequence are considered to come from the same sample, which distinguishes the samples; and the sample linker sequence is then removed from the sequences of each sample.
- the method further comprises: if none of steps a to d yields an alignment result, or a single step yields two results at the same time, or only step c and step d both yield results at the same time, the alignment result is considered invalid because it cannot be resolved, and the corresponding entire sequence is removed.
- the step of "screening the aligned sequences" further comprises: after aligning the sequencing fragments obtained by the high-throughput sequencing technique to the reference genome sequence, screening out and removing alignment results whose alignment length is below 70% or whose identity is below 85%; retaining the best result among the alignment results of each sequence; and retaining suboptimal results;
- a suboptimal result satisfies: its identity * alignment length and its alignment score are greater than or equal to 0.9 times and 0.85 times those of the best result, respectively, and the probability that the sequence is not correlated with the reference sequence is lower than that of the best result.
- the method further comprises: after merging the sequence fragments of determined type by sample, normalizing the number of merged sequence fragments of each sample.
- normalizing the number of merged sequence fragments further comprises: scaling the number of sequences owned by each sample in each library proportionally, as if the sequencing amount of the library were the average sequencing amount of the ideal case.
- the step "screening according to the number and proportion of sequence fragments supporting the corresponding types after merging" further comprises, after normalization, screening in sequence according to the following conditions: if the number of available sequences is less than the mean number of valid sequence fragments of the negative control samples plus four standard deviations, the experiment or sequencing operation is considered unsuccessful; otherwise, if the number of sequence fragments whose alignment results support an HPV type is less than a predetermined threshold, the sample is considered negative; and if the ratio of the number of sequence fragments supporting an HPV type to the total number of sequence fragments reaches a predetermined threshold or more, the sample is considered to be infected with that type.
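The failure cutoff described above (negative-control mean plus four standard deviations) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `failure_cutoff` is hypothetical, and the use of the sample (n-1) standard deviation is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, stdev

def failure_cutoff(negative_control_counts, k=4):
    """Cutoff below which a sample's run is treated as unsuccessful.

    negative_control_counts: valid-fragment counts of the negative controls.
    k: number of standard deviations added to the mean (4 in the text).
    Assumption: sample standard deviation (n - 1 denominator).
    """
    return mean(negative_control_counts) + k * stdev(negative_control_counts)

def run_failed(n_valid, negative_control_counts):
    # A sample with fewer valid fragments than the cutoff is flagged as failed.
    return n_valid < failure_cutoff(negative_control_counts)
```

With illustrative negative-control counts of 10, 20 and 30, the mean is 20 and the sample standard deviation is 10, giving a cutoff of 60.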
- Another aspect of the present invention provides a system for bioinformatics analysis of accurate HPV typing, the system comprising: a receiving module for receiving sequencing fragments obtained by high-throughput sequencing technology; a sub-sample module for aligning the sample linker sequence in each sequencing fragment against the sample linker sequence library to perform the sub-sample operation; a sequence type determination module for aligning the sequencing fragments to the reference genome sequence and screening the aligned sequences to determine the HPV type of each sequencing fragment or determine it to be negative; and a sample type determining module for merging the sequence fragments of determined type by sample, screening according to the number and proportion of sequence fragments supporting each type after merging, and finally confirming the HPV type of each sample or determining the sample to be negative.
- the receiving module is further configured to: after receiving the sequencing sequence, filtering the sequencing sequence to remove the unqualified sequence.
- the sub-sample module is further configured to: after the sub-sample operation is performed, remove the sample linker sequence from the sequence fragment.
- the combined screening module is further configured to: after merging the sequence fragments of determined type by sample, normalize the number of merged sequence fragments of each sample.
- normalizing the number of merged sequence fragments further comprises: scaling the number of sequences owned by each sample in each library proportionally, as if the sequencing amount of the library were the average sequencing amount of the ideal case.
- The method and system for bioinformatics analysis of accurate HPV typing provided by the invention use sequencing technology and analysis means to identify and confirm the HPV type rapidly, with high sensitivity and specificity.
- FIG. 1 is a flow chart showing a method for bioinformatics analysis of HPV accurate typing according to an embodiment of the present invention
- FIG. 2 is a flow chart showing another embodiment of a method of bioinformatics analysis of HPV exact typing provided by the present invention
- FIG. 3 is a flow chart showing another embodiment of a method of bioinformatics analysis of HPV precise typing provided by the present invention.
- FIG. 4 is a flow chart showing another embodiment of a method of bioinformatics analysis of HPV exact typing provided by the present invention.
- Figure 5 is a flow chart showing one embodiment of a method of bioinformatics analysis of HPV exact typing provided by the present invention
- FIG. 6 is a schematic structural diagram of a system for bioinformatics analysis of HPV accurate typing according to an embodiment of the present invention
- FIG. 7 is a schematic diagram showing the variation of the effective sequence of each stage in the original sequence in the method and system of the bioinformatics analysis of the HPV precise classification provided by the embodiment of the present invention
- FIG. 8 is a schematic diagram showing the distribution of the number of effective sequence segments of a real sample and a negative control sample according to an embodiment of the present invention
- FIG. 9 is a schematic diagram showing the results of repeatability after 10 times of sequencing and analysis of each sample provided by an embodiment of the present invention.
- Figure 10 is a schematic diagram comparing the negative and positive results measured for all real samples and blood negative samples with the clinical test results, according to an embodiment of the present invention.
- Figure 11 is a diagram showing the detection results for plasmid samples in the second type of library provided by an embodiment of the present invention.
Detailed description
- The samples used in the examples of the present invention include: 328 real patient tissue samples, blood negative samples, pure water negative samples, and positive samples of plasmids carrying specific HPV types.
- The strategy employed in the embodiments is as follows: each sequencing library holds 96 samples, and two types of libraries are designed. The first type contains 82 real patient tissue samples, 6 pure water negative samples, 6 blood negative samples, and 2 plasmid positive samples; the second type contains 90 plasmid positive samples and 6 pure water negative samples. Each library was sequenced 10 times so that the repeatability of the information analysis could be verified. In total, 50 libraries were sequenced on the machine.
- FIG. 1 is a flow chart showing a method for bioinformatics analysis of HPV accurate typing according to an embodiment of the present invention.
- the method 100 for bioinformatics analysis of HPV precise typing comprises: Step 102, receiving a sequencing fragment obtained by high-throughput sequencing technology.
- the high-throughput sequencing technology employed in the present invention may be Illumina GA sequencing technology or other existing high-throughput sequencing technologies.
- Step 104: aligning the sample linker sequence in each sequencing fragment against the sample linker sequence library to perform the sub-sample operation.
- The sample linker sequence library used in the embodiment of the present invention consists of 96 pairs of primer-index sequences designed experimentally. (The sample linker sequence library can be designed according to the experimental requirements and the number of samples. During design, the base distribution and length of the sample linker sequences should take into account both the number of samples tested and the non-homology of the different sample linker sequences, so that different samples can be told apart by sample linker alignment.)
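One simple way to check the non-homology requirement for a candidate index set is to compute the smallest pairwise Hamming distance. The sketch below is only an illustration: the patent does not name a distance measure, so Hamming distance, the function names, and the example sequences are all assumptions.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

def min_pairwise_distance(indexes):
    """Smallest Hamming distance between any two index sequences in the set."""
    return min(hamming(a, b) for a, b in combinations(indexes, 2))

# Hypothetical 6 bp indexes; a larger minimum distance leaves more room
# for sequencing errors before two samples become confusable.
candidate_indexes = ["ACGTAC", "TGCATG", "ACGTTT"]
closest = min_pairwise_distance(candidate_indexes)
```

A set whose minimum distance is too small for the tolerated mismatches, insertions and deletions would need to be redesigned.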
- Step 106: aligning the sequencing fragments to the reference genome sequence and screening the aligned sequences to determine the HPV type of each sequencing fragment or determine it to be negative.
- The sequencing fragments obtained by the high-throughput sequencing technology are aligned to the reference genome sequence by any short sequence mapping program (such as blast), and the reference genome sequence can be taken from the public database NCBI.
- "Screening the aligned sequences" further comprises: after aligning the sequencing fragments obtained by the high-throughput sequencing technology to the reference genome sequence, screening out and removing alignment results whose alignment length is below 70% or whose identity is below 85% (100% means that the two sequences are identical); retaining the best result among the alignment results of each sequence; and retaining suboptimal results, where a suboptimal result satisfies: its identity * alignment length and its alignment score are greater than or equal to 0.9 times and 0.85 times those of the best result, respectively, and the probability that the sequence is not correlated with the reference sequence is 10^3 times lower than that of the best result. It is then judged whether the best result and the suboptimal results of a sequence align to the same type or a subtype of it; only sequences whose alignment results point to a single type are kept as valid sequences, and the HPV type each valid sequence aligns to is determined, or the sequence is determined to be negative.
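The screening rules above can be sketched per read as follows. This is a hedged illustration rather than the patent's code. The `Hit` record and `call_read_type` are hypothetical names; the 70% alignment-length cutoff is assumed to be relative to the read length; the best hit is taken as the highest-scoring one (the text elsewhere uses the first blast result); the chance-match condition is interpreted as "within a factor of 10^3 of the best hit's value", which is only one reading of the source wording; and subtypes are not modelled, so two hits agree only if their type labels are equal.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    hpv_type: str    # reference type the read aligned to
    identity: float  # fraction of identical bases in the alignment (0..1)
    aln_len: int     # alignment length in bases
    score: float     # alignment score
    evalue: float    # chance-match expectation reported by the aligner

def call_read_type(read_len, hits, min_cov=0.70, min_ident=0.85):
    """Return the single HPV type a read supports, or None."""
    # Drop hits that are too short or too dissimilar.
    kept = [h for h in hits
            if h.aln_len / read_len >= min_cov and h.identity >= min_ident]
    if not kept:
        return None
    best = max(kept, key=lambda h: h.score)
    types = {best.hpv_type}
    for h in kept:
        if h is best:
            continue
        # Suboptimal hits close enough to the best one are also retained.
        near_best = (
            h.identity * h.aln_len >= 0.9 * best.identity * best.aln_len
            and h.score >= 0.85 * best.score
            and h.evalue <= best.evalue * 1e3  # assumed interpretation
        )
        if near_best:
            types.add(h.hpv_type)
    # Only reads whose retained hits all point to one type are valid.
    return best.hpv_type if len(types) == 1 else None
```

A read whose best and near-best hits name different types returns `None`, which corresponds to discarding it as not assignable to a single type.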
- Step 108: merging the sequence fragments of determined type by sample, and screening according to the number and proportion of sequence fragments supporting each type after merging; finally confirming the HPV type of each sample or determining the sample to be negative.
- This embodiment of the method for bioinformatics analysis of accurate HPV typing uses bioinformatics analysis methods and technical means to process a large number of samples and detect the infecting HPV types quickly, with high sensitivity and specificity.
- FIG. 2 is a flow chart showing another embodiment of a method of bioinformatics analysis of HPV precise typing provided by the present invention.
- The method 200 for bioinformatics analysis of accurate HPV typing includes steps 202, 203, 204, 206, and 208, wherein steps 202, 204, 206, and 208 may have the same or similar technical content as steps 102, 104, 106, and 108 shown in FIG. 1, respectively, which is not repeated here for brevity.
- After the sequencing sequences are received, step 203 filters them to remove unqualified sequences.
- The step of "filtering the sequencing sequence to remove the unqualified sequence" further includes: presetting a sequencing quality threshold and a ratio threshold for unqualified bases. The low-quality threshold is determined by the specific sequencing technology and sequencing environment. When the number of bases in a sequencing sequence whose sequencing quality value is below the sequencing quality threshold (e.g., 5) exceeds the ratio threshold (e.g., 50%) of the number of bases in the entire sequence, the sequence is considered unqualified and is filtered out.
- the method for bioinformatics analysis of HPV accurate typing removes the unqualified sequence by filtering the sequencing sequence, thereby further reducing the influence of the unqualified sequence, thereby improving the accuracy of the detection analysis.
- Figure 3 is a flow chart showing another embodiment of the method of bioinformatics analysis of HPV exact typing provided by the present invention.
- The method 300 for bioinformatics analysis of accurate HPV typing includes steps 302, 304, 305, 306, and 308, wherein steps 302, 304, 306, and 308 may have the same or similar technical content as steps 102, 104, 106, and 108 shown in FIG. 1, respectively, which is not repeated here for brevity.
- Step 305 removes the sample linker sequence from the sequence fragment.
- The step of "removing the sample linker sequence from the sequence fragment" further comprises: presetting a sequencing quality threshold (e.g., 5) and a base number threshold (e.g., 3) for the sample linker sequence; a sequence is removed if the number of bases in its linker sequence whose sequencing quality value is below the sequencing quality threshold exceeds the base number threshold.
- In the present embodiment, a sequence is removed if more than 3 of the 10 bp (base pairs) of its linker sequence have a sequencing quality value below 5.
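The linker quality check can be expressed in a few lines. This sketch assumes the per-base quality values of the linker region are already available as a list of integers; the function name `linker_is_reliable` is hypothetical, and the defaults are the example values from the text.

```python
def linker_is_reliable(linker_quals, q_min=5, max_low=3):
    """Keep a read only if at most `max_low` linker bases fall below `q_min`.

    linker_quals: quality values of the bases in the sample linker region
    (10 bp in the embodiment described above).
    """
    low_quality_bases = sum(1 for q in linker_quals if q < q_min)
    return low_quality_bases <= max_low
```

A read with exactly three low-quality linker bases is kept, because the text removes reads only when the count is greater than 3.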
- Step a: performing a perfect match between the sample linker sequence and the sequences in the sample linker sequence library;
- Step b: shortening (degrading) the sample linker sequence by 1-2 bp and performing a perfect match with the corresponding part of the sequences in the sample linker sequence library;
- Step c: allowing the sample linker sequence to contain only one base insertion, that is, performing a perfect match from the start of the sample linker sequence, treating the first base that cannot be matched as an inserted base, skipping it, and continuing the perfect match;
- Step d: allowing the sample linker sequence to contain only one base deletion, i.e., in the sample ...
- The final alignment result of the sample linker sequence is determined according to the priority order: step a > step b > "step c or step d". (When processing linker alignments, the same sequence sometimes yields different alignment results. The priority used to screen the alignment results can be understood as: step a is highest, step b is second, and steps c and d have equal priority.)
- If none of steps a to d yields an alignment result, or a single step yields two results at the same time, or only step c and step d both yield results at the same time, the alignment result is considered invalid because it cannot be resolved, and the corresponding entire sequence is removed.
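The four matching steps and their priority can be sketched as below. This is an illustrative reading, not the patent's implementation. Assumptions: the linker sits at the start of the read; step b is modelled as the read having lost the first 1-2 bases of the linker (the text does not say which end is shortened); and `assign_sample`, `_one_insertion`, and `_one_deletion` are hypothetical names.

```python
def _mismatch_pos(read, tag):
    """Index of the first position where read and tag disagree."""
    i = 0
    while i < len(tag) and i < len(read) and read[i] == tag[i]:
        i += 1
    return i

def _one_insertion(read, tag):
    """Step c: read carries the tag with exactly one extra base inserted."""
    i = _mismatch_pos(read, tag)
    if i == len(tag):          # perfect match, handled by step a
        return False
    return read[i + 1:len(tag) + 1] == tag[i:]

def _one_deletion(read, tag):
    """Step d: read carries the tag with exactly one base missing."""
    i = _mismatch_pos(read, tag)
    if i == len(tag):
        return False
    return read[i:len(tag) - 1] == tag[i + 1:]

def assign_sample(read, library):
    """Return the sample name for a read, or None if it cannot be resolved.

    library: dict mapping sample name -> sample linker (index) sequence.
    """
    step_a = {n for n, s in library.items() if read[:len(s)] == s}
    step_b = {n for n, s in library.items()
              for t in (1, 2) if read[:len(s) - t] == s[t:]}
    step_c = {n for n, s in library.items() if _one_insertion(read, s)}
    step_d = {n for n, s in library.items() if _one_deletion(read, s)}

    # Priority: a > b > (c or d). Two results within one step are invalid.
    for hits in (step_a, step_b):
        if len(hits) == 1:
            return next(iter(hits))
        if len(hits) > 1:
            return None
    if step_c and step_d:      # c and d both matching is unresolvable
        return None
    hits = step_c or step_d
    return next(iter(hits)) if len(hits) == 1 else None
```

Reads that return `None` correspond to the invalid cases listed above and would be discarded entirely.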
- In this embodiment of the method for bioinformatics analysis of accurate HPV typing, the sample linker sequence in each sequencing fragment is aligned against the sample linker sequence library and, after the sub-sample operation, is removed from the sequence fragment. This helps ensure the authenticity and reliability of the HPV typing analysis and supports the subsequent HPV classification.
- Figure 4 is a flow chart showing another embodiment of the method of bioinformatics analysis of HPV exact typing provided by the present invention.
- The method 400 for bioinformatics analysis of accurate HPV typing includes steps 402, 404, 406, 408, 409, and 410, wherein steps 402, 404, and 406 may have the same or similar technical content as steps 102, 104, and 106 shown in FIG. 1, respectively, which is not repeated here for brevity.
- Step 408 merges the sequence fragments of determined type by sample. Specifically, step 404 has already established which sample each sequence comes from; according to this relationship, the sequences belonging to the same sample are grouped together and their alignment results against the HPV reference genome are counted.
- Step 409 normalizes the number of merged sequence fragments of each sample.
- The sequencing amount differs between libraries because the loading concentrations of the respective libraries are not uniform.
- The number of sequences owned by each sample is therefore scaled proportionally, as if the sequencing amount of its library were the average sequencing amount of the ideal case. That is, the number of merged sequences of each sample is normalized.
- Step 410 screens according to the number and proportion of sequence fragments supporting each type after normalization, and finally confirms the HPV type of each sample or determines the sample to be negative.
- The information available for each sample is screened and filtered.
- The screening conditions are as follows: if the number of available sequence fragments is less than a certain value (such as 137), the experiment or sequencing operation is considered unsuccessful; if the number of sequence fragments whose alignment results support an HPV type is less than a certain threshold (such as 350), the test result is considered negative.
- If the ratio of the number of sequence fragments whose alignment results support a certain HPV type to the total number of sequence fragments reaches a predetermined threshold (such as 12%) or more, the sample is considered to be infected with that type. The threshold is set for the specific experimental background and should take the authenticity and repeatability of the detection into account. The specific values of all these parameters depend on the specific experimental conditions.
- Figure 5 is a flow chart showing one embodiment of a method of bioinformatics analysis of HPV exact typing provided by the present invention.
- the method for bioinformatics analysis of HPV precise typing comprises: Step 502: Receiving a sequencing fragment obtained by high-throughput sequencing technology.
- Illumina GA high throughput sequencing technology is employed.
- Step 504 After receiving the sequencing sequence, filtering the sequencing sequence to remove the unqualified sequence.
- An unqualified sequence is any of the following: a sequence in which the number of bases with a sequencing quality value below 5 exceeds 50% of the number of bases in the entire sequence; a sequence in which the number of N bases in the sequencing result exceeds 10% of the number of bases in the entire sequence; and a sequence that, when aligned against the sequencing linker sequences, is found to contain a sequencing linker sequence.
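The three filtering conditions can be sketched as a single predicate. This is a simplified illustration: the function name `is_qualified` and the adapter string used in the checks are hypothetical, quality values are assumed to be already decoded to integers, and linker detection is reduced to an exact substring test, whereas the text describes an alignment against the sequencing linker sequences.

```python
def is_qualified(seq, quals, adapter, q_min=5, low_frac=0.5, n_frac=0.1):
    """Return True if a read passes all three filters described above."""
    n = len(seq)
    if n == 0:
        return False
    # 1) Too many low-quality bases (quality < 5 for more than 50% of bases).
    low = sum(1 for q in quals if q < q_min)
    if low / n > low_frac:
        return False
    # 2) Too many undetermined bases (N for more than 10% of bases).
    if seq.upper().count("N") / n > n_frac:
        return False
    # 3) Read contains the sequencing linker (simplified: exact substring).
    if adapter and adapter in seq:
        return False
    return True
```

Reads for which the predicate returns `False` are dropped before the sample linker alignment.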
- Step 506: aligning the sample linker sequence in each sequence against the sample linker sequence library to perform the sub-sample operation.
- Step 508: removing the sample linker sequence from the sequence fragment. Specifically, sequences whose linker sequence contains more than three bases with a sequencing quality value below 5 are removed. Then: 1) the sample linker sequence is perfectly matched against the sequences in the sample linker sequence library; 2) the sample linker sequence is shortened (degraded) by 1-2 bp and perfectly matched against the corresponding part of the sequences in the sample linker sequence library; 3) the sample linker sequence is allowed to contain only one base insertion: a perfect match is performed from the start of the sample linker sequence, the first base that cannot be matched is treated as an inserted base and skipped, and the strict perfect match continues; 4) the sample linker sequence is allowed to contain only one base deletion.
- Step 510: the sequencing fragments are aligned to the reference genome sequence, the aligned sequences are screened, and the HPV type of each screened sequencing fragment is determined, or the fragment is determined to be negative.
- In this embodiment, the blast mapping program is used to align the sequencing fragments obtained by the high-throughput sequencing technology to the reference genome sequence. After alignment, results whose alignment length is below 70% or whose identity is below 85% are screened out and removed.
- The best result of each sequence alignment, that is, the first alignment result output by the blast software, is retained, and suboptimal results are also retained. A suboptimal result satisfies: its identity * alignment length and its alignment score are greater than or equal to 0.9 times and 0.85 times those of the best result, respectively, and the probability that the sequence matches the reference sequence without being correlated with it is 10^3 times lower than that of the best result.
- It is then judged whether the alignment results of a sequence point to the same type (or a subtype of it). Only sequences whose screened alignment results point to a single type are kept as valid sequences, and the HPV type each sequence aligns to is determined, or the sequence is confirmed to be negative.
- step 512 the alignment results of the determined types of sequences are combined by sample. Specifically, in step 506, the relationship from which sample each sequence is derived has been found, and according to this relationship, the sequences belonging to the same sample are grouped together, and their alignment results with the HPV reference genome are counted.
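Merging by sample amounts to a grouped count. The sketch below assumes two lookup tables produced by the earlier steps, one mapping read identifiers to samples and one mapping read identifiers to the HPV type determined for the read; the names `merge_by_sample`, `read_to_sample`, and `read_to_type` are hypothetical.

```python
from collections import Counter, defaultdict

def merge_by_sample(read_to_sample, read_to_type):
    """Count, for every sample, how many reads support each HPV type.

    read_to_sample: read id -> sample name (from the sub-sample step).
    read_to_type:   read id -> HPV type (reads without a type are absent
                    or mapped to None).
    """
    counts = defaultdict(Counter)
    for read_id, sample in read_to_sample.items():
        hpv_type = read_to_type.get(read_id)
        if hpv_type is not None:
            counts[sample][hpv_type] += 1
    return counts
```

The per-sample counters are the input to the normalization and threshold screening that follow.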
- the number of merged sequences for each sample is normalized.
- The normalization formula is: $$\mathrm{sample\_read\_num\_STD} = \mathrm{sample\_read\_num\_ori} \times \frac{150000}{\mathrm{read\_num\_ori}}$$ where sample_read_num_STD represents the number of sequences of the sample after normalization; sample_read_num_ori represents the actual number of sequences of the sample; and read_num_ori represents the number of sequences obtained from sequencing the library to which the sample belongs.
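As a worked example of the normalization, the following sketch applies the formula directly. The variable names follow the formula; the function name `normalize_read_count` and the example numbers are illustrative assumptions.

```python
def normalize_read_count(sample_read_num_ori, read_num_ori, target=150000):
    """Scale a sample's read count as if its library had `target` reads.

    sample_read_num_ori: actual number of sequences of the sample.
    read_num_ori: number of sequences obtained for the sample's library.
    target: library size used as the common scale (150000 in the text).
    """
    return sample_read_num_ori * (target / read_num_ori)

# A sample with 1000 reads in a library of 100000 reads is scaled up,
# and one with 2000 reads in a library of 300000 reads is scaled down.
scaled_up = normalize_read_count(1000, 100000)
scaled_down = normalize_read_count(2000, 300000)
```

After scaling, counts from libraries of different sequencing depth can be compared against the same fixed thresholds.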
- Step 516: screening according to the number and proportion of sequence fragments supporting each type after normalization, and finally confirming the HPV type of each sample or determining the sample to be negative.
- The screening is performed according to the following conditions: if the number of available sequences is less than 137, the experiment or sequencing operation is considered unsuccessful; otherwise, if the number of sequence fragments whose alignment results support an HPV type is less than 350, the sample is considered negative.
- If the number of sequence fragments whose alignment results support an HPV type accounts for 12% or more of the total number of sequence fragments, the sample is considered to be infected with that type. In this way the HPV type or types infecting each sample are finally determined, or the sample is determined to be negative.
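The three screening conditions can be chained as below. This is a sketch under stated assumptions: `call_sample` is a hypothetical name; "total number of sequence fragments" is taken to mean the sample's normalized count of available sequences (`n_valid`), which the text does not define precisely; and the defaults are the example values 137, 350 and 12%.

```python
def call_sample(n_valid, type_counts, min_valid=137, min_hpv=350, min_frac=0.12):
    """Return (status, types) for one sample after normalization.

    n_valid: normalized number of available sequences of the sample.
    type_counts: dict mapping HPV type -> normalized supporting read count.
    """
    # 1) Too few usable reads: the experiment or sequencing run failed.
    if n_valid < min_valid:
        return "failed", []
    # 2) Too few HPV-supporting reads: negative.
    if sum(type_counts.values()) < min_hpv:
        return "negative", []
    # 3) Every type reaching the proportion threshold is reported.
    types = sorted(t for t, c in type_counts.items() if c / n_valid >= min_frac)
    return ("positive", types) if types else ("negative", [])
```

Because each type is tested against the proportion threshold independently, a sample can be reported with several types, which is how multiple infections would show up in this sketch.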
- FIG. 6 is a schematic structural diagram of a system for bioinformatics analysis of HPV accurate classification according to an embodiment of the present invention.
- A system 600 for bioinformatics analysis of accurate HPV typing includes: a receiving module 602, a sub-sample module 604, a sequence type determining module 606, and a sample type determining module 608, wherein:
- the receiving module 602 is configured to receive the sequenced segment obtained by the high-throughput sequencing technology.
- the sub-sample module 604 is configured to align the sample linker sequence in each sequencing fragment against the sample linker sequence library to perform the sub-sample operation.
- the sequence type determining module 606 is configured to align the sequencing fragments to the reference genome sequence and screen the aligned sequences to determine the HPV type of each sequencing fragment or determine it to be negative.
- the sample type determining module 608 is configured to merge the sequence fragments of determined type by sample, screen according to the number and proportion of sequence fragments supporting each type after merging, and finally confirm the HPV type of each sample or determine the sample to be negative.
- The receiving module is further configured to filter the sequencing sequences after receiving them, in order to remove unqualified sequences.
- For details of the filtering process, refer to the description in the method embodiment; they are not repeated here.
- The sub-sample module is further configured to remove the sample linker sequence from each sequence fragment after the samples have been separated.
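Sample separation followed by linker removal might look like the following sketch. It assumes, purely for illustration, that the linker sits at the start of the read and must match exactly; the source states neither the linker position nor any mismatch tolerance, and the names used here are hypothetical.

```python
from typing import Dict, Optional, Tuple


def split_by_linker(read: str, linkers: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Assign a read to a sample by its leading linker and trim the linker.

    `linkers` maps each linker sequence to a sample identifier. Returns
    (sample_id, trimmed_read), or None when no linker in the library matches.
    """
    for linker, sample_id in linkers.items():
        if read.startswith(linker):
            return sample_id, read[len(linker):]
    return None
```

Reads that match no linker return `None`, so the caller can discard them before type determination.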
- The combined screening module is further configured to standardize the combined number of sequence fragments of each sample after the type-determined sequence fragments have been combined by sample.
- Standardizing the combined number of sequence fragments further comprises scaling the number of sequences of each sample in each library proportionally, as if the amount of sequencing for that library were the average amount of sequencing in the ideal case.
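One way to read this standardization is as a proportional rescaling of each library to a common target depth. The sketch below follows that reading; `ideal_library_total` is a hypothetical parameter standing in for the ideal average amount of sequencing, whose value the text does not give.

```python
from typing import Dict


def normalize_counts(sample_counts: Dict[str, int], ideal_library_total: float) -> Dict[str, float]:
    """Scale per-sample counts so the library total equals the ideal total."""
    library_total = sum(sample_counts.values())
    if library_total == 0:
        # Nothing was sequenced, so there is nothing to rescale.
        return {sample: 0.0 for sample in sample_counts}
    scale = ideal_library_total / library_total
    return {sample: n * scale for sample, n in sample_counts.items()}
```

Scaling every sample in a library by the same factor keeps the proportions between samples unchanged while making counts comparable across libraries.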
- This embodiment of the system for bioinformatics analysis of HPV accurate typing uses bioinformatics analysis methods and technical means to test large numbers of samples quickly and to detect the infecting HPV type quickly, with high sensitivity and specificity.
- FIG. 7 is a schematic diagram showing how the proportion of valid sequences in the original sequences changes at each stage of the analysis performed by the method and system for bioinformatics analysis of HPV accurate typing provided by the embodiment of the present invention.
- The abscissa represents the sequencing library code and the ordinate represents the ratio of valid sequences to original sequences.
- The Filter curve shows the ratio of valid sequences to original sequences for the different sequencing libraries after the sequencing sequences have been filtered;
- the Lib-match curve shows this ratio for the different sequencing libraries after sample separation has been completed;
- the Final curve shows this ratio for the different sequencing libraries after the HPV type of each sequence has been determined.
- The sequence utilization rate of all 50 sequencing libraries in this example reached more than 80%.
- Figure 8 is a diagram showing the distribution of the number of valid sequence fragments in the real samples and the negative control samples provided by the embodiment of the present invention.
- The average number of valid sequence fragments in the negative control samples was 19.82.
- The mean number of valid sequence fragments plus four times the standard deviation is 136.98.
- Using 137 valid sequence fragments as the cutoff for deciding whether the experiment or sequencing succeeded effectively distinguishes real samples from negative control samples.
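Read as the mean plus four standard deviations of the negative control counts (an interpretation of the figures above), the reported values of 19.82 and 136.98 imply a standard deviation of about (136.98 - 19.82) / 4 ≈ 29.29. A minimal sketch of computing such a cutoff follows; the function name is hypothetical, and the use of the sample standard deviation rather than the population one is an assumption.

```python
import statistics
from typing import Sequence


def success_cutoff(negative_counts: Sequence[float], k: float = 4.0) -> float:
    """Return mean + k * sample standard deviation of negative control counts."""
    mean = statistics.mean(negative_counts)
    sd = statistics.stdev(negative_counts)  # assumption: sample (n - 1) estimator
    return mean + k * sd
```

Samples whose valid fragment count falls below the cutoff would then be treated as failed runs rather than as negatives.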
- Fig. 9 shows the repeatability results obtained after each sample provided by the embodiment of the present invention was sequenced and analyzed 10 times.
- The abscissa represents the cutoff value used to call a positive detection result,
- and the ordinate represents the average repetition rate across all samples. It can be clearly seen by those skilled in the art from FIG. 9 that, whether the samples were sequenced in Hong Kong or Shenzhen, when the number of sequence fragments supporting an HPV type is used as the cutoff for a positive detection result, the repeatability of repeated analysis of the samples is as high as 99%, which fully reflects the stability of the present invention for HPV detection.
- FIG. 10 is a schematic diagram comparing the negative and positive results obtained for the real samples and the blood negative samples provided by the embodiments of the present invention with the clinical test results.
- Blood is a confirmed negative sample without HPV infection. Patients with a test result greater than 1 were clinically confirmed to be positive for HPV infection.
- The samples confirmed as positive for HPV infection in this embodiment mostly agree with the clinical test results.
- The value of 350 distinguishes blood negative samples from positive samples and avoids false positives. Because the clinical test results are not completely positive, the detection results of this embodiment are sufficient to demonstrate the accuracy of the present invention.
- Figure 11 is a schematic diagram showing the detection results for plasmid samples in a second type of library provided by an embodiment of the present invention.
- The abscissa indicates the HPV type loaded into the plasmid, and the ordinate indicates the proportion of sequence fragments supporting the corresponding HPV type in the analysis of this example. It can be clearly seen by those skilled in the art from FIG. 11 that, when a sample is judged to be infected with an HPV type according to the proportion of sequence fragments supporting that type, the specific type in the sample can be detected effectively and specifically.
| Sample | Result | Sample | Result | Sample | Result |
|---|---|---|---|---|---|
| … | HBB | sample 43 | HPV6 | sample 75 | HBB |
| sample 12 | - | sample 44 | HBB | sample 76 | HBB |
| sample 13 | HBB | sample 45 | - | sample 77 | HBB |
| sample 14 | HPV59 | sample 46 | HBB | sample 78 | HBB |
| sample 15 | HPV16 | sample 47 | - | sample 79 | HBB |
| sample 16 | HBB | sample 48 | HBB | sample 80 | HBB |
| sample 17 | HBB | sample 49 | HBB | sample 81 | HBB |
| sample 18 | HBB | sample 50 | HBB | sample 82 | HBB |
| sample 19 | HBB | sample 51 | HBB | plasmid (type 33) | HPV33 |
| sample 20 | HPV16 | sample 52 | HBB | plasmid (type 33) | HPV33 |
| sample 21 | HBB | sample 53 | HBB | blood negative sample | HBB |
| sample 22 | HBB | sample 54 | HBB | blood negative sample | HBB |
| sample 23 | HPV11 | sample 55 | HBB | blood negative sample | HBB |
| sample 24 | HBB | sample 56 | HBB | blood negative sample | HBB |
| sample 25 | HBB | sample 57 | HBB | blood negative sample | HBB |
| sample 26 | HBB | sample 58 | - | blood negative sample | HBB |
| sample 27 | HBB | sample 59 | HBB | pure water negative sample | - |
| sample 28 | HBB | sample 60 | HBB | pure water negative sample | - |
| sample … | | | | | |
- Table 1 shows the detection results for a sample library provided by the experimental example of the present invention; the results are for a library of the first class. "HBB" indicates that the test result is negative. "-" indicates that the number of detected sequences is lower than 137 because of a sample problem or an experimental problem, so the test of that sample is considered to have failed.
- An embodiment of the method and system for bioinformatics analysis of HPV accurate typing provided by the present invention uses bioinformatics analysis methods and technical means to test large numbers of samples quickly and to detect the infecting HPV type quickly, with high sensitivity and specificity.
- An embodiment of the method and system for bioinformatics analysis of HPV accurate typing provided by the present invention filters the sequencing sequences to remove unqualified sequences, which reduces the influence of unqualified sequences and thereby improves the accuracy of the detection analysis.
- An embodiment of the method and system for bioinformatics analysis of HPV accurate typing provided by the present invention compares the sample linker sequence in each sequencing fragment with the sample linker sequence library to separate the samples, and then removes the sample linker sequence from the sequence fragment. This ensures the authenticity and reliability of the HPV typing analysis and keeps the linker sequence from interfering with the subsequent HPV typing.
Abstract
The present invention relates to a method and system for bioinformatics analysis of precise papillomavirus typing. The method comprises: receiving sequencing fragments obtained by a high-throughput sequencing technique; comparing the sample linker sequence in the sequencing fragments with a sample linker sequence library to separate the samples; aligning the sequencing fragments with a reference genome sequence and filtering the aligned sequences; determining the papillomavirus type of the filtered sequence fragments or determining that they are negative; combining the type-determined sequence fragments sample by sample; screening according to the number and proportion of the combined sequence fragments that support the corresponding type; and finally identifying the papillomavirus type of each sample or identifying the sample as negative. The method and system provided by the invention use a bioinformatics analysis method and technical means to test large numbers of samples quickly and to detect the infecting papillomavirus type quickly, with relatively high sensitivity and specificity.
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| HK13112598.6A HK1185113B (en) | 2010-12-02 | Method and system for bioinformatics analysis of hpv precise typing | |
| CN201080070484.7A CN103261442B (zh) | 2010-12-02 | 2010-12-02 | Hpv 精确分型的生物信息学分析的方法及系统 |
| PCT/CN2010/001943 WO2012071685A1 (fr) | 2010-12-02 | 2010-12-02 | Méthode et système d'analyse bio-informatique de typage précis du papillomavirus |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2010/001943 WO2012071685A1 (fr) | 2010-12-02 | 2010-12-02 | Méthode et système d'analyse bio-informatique de typage précis du papillomavirus |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2012071685A1 true WO2012071685A1 (fr) | 2012-06-07 |
Family
ID=46171145
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2010/001943 Ceased WO2012071685A1 (fr) | 2010-12-02 | 2010-12-02 | Méthode et système d'analyse bio-informatique de typage précis du papillomavirus |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN103261442B (fr) |
| WO (1) | WO2012071685A1 (fr) |
Families Citing this family (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2019047109A1 (fr) * | 2017-09-07 | 2019-03-14 | 深圳华大基因股份有限公司 | Méthode et système d'analyse bioinformatique pour le typage précis du hpv |
| CN111919257B (zh) * | 2018-07-27 | 2021-05-28 | 思勤有限公司 | 降低测序数据中的噪声的方法和系统及其实施和应用 |
| CN111755075B (zh) * | 2019-03-28 | 2023-09-29 | 深圳华大生命科学研究院 | 对免疫组库高通量测序样本间序列污染进行过滤的方法 |
| CN110951853B (zh) * | 2019-12-10 | 2021-03-30 | 中山大学附属第一医院 | 一种精确检测人基因组中dna病毒的方法 |
| CN116403647B (zh) * | 2023-06-08 | 2023-08-15 | 上海精翰生物科技有限公司 | 一种检测慢病毒整合位点的生物信息检测方法及其应用 |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101397590A (zh) * | 2008-10-27 | 2009-04-01 | 杭州迪安医学检验中心有限公司 | 人乳头状瘤病毒基因分型方法 |
| CN101435002A (zh) * | 2008-12-12 | 2009-05-20 | 深圳华大基因科技有限公司 | 一种检测人类乳头瘤病毒基因型的方法 |
| CN101838709A (zh) * | 2010-04-13 | 2010-09-22 | 中山大学 | 一种微量hpv快速基因分型方法 |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| DK3404114T3 (da) * | 2005-12-22 | 2021-06-28 | Keygene Nv | Fremgangsmåde til detektering af AFLP-baseret polymorfisme med højt gennemløb |
2010
- 2010-12-02 WO PCT/CN2010/001943 patent/WO2012071685A1/fr, status: Ceased
- 2010-12-02 CN CN201080070484.7A patent/CN103261442B/zh, status: Active
Also Published As
| Publication number | Publication date |
|---|---|
| HK1185113A1 (en) | 2014-02-07 |
| CN103261442A (zh) | 2013-08-21 |
| CN103261442B (zh) | 2014-12-10 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20230151436A1 (en) | Diagnostic applications using nucleic acid fragments | |
| US12234515B2 (en) | Enhancement of cancer screening using cell-free viral nucleic acids | |
| CN112639987B (zh) | 核酸重排和整合分析 | |
| CN112397151B (zh) | 基于靶向捕获测序的甲基化标志物筛选与评价方法及装置 | |
| CN105624796A (zh) | 芯片及其在检测耳聋相关基因中的用途 | |
| WO2012071685A1 (fr) | Méthode et système d'analyse bio-informatique de typage précis du papillomavirus | |
| CN116189763A (zh) | 一种基于二代测序的单样本拷贝数变异检测方法 | |
| Bryan et al. | Direct Comparison of Alternative Blood-Based Approaches for Early Detection and Diagnosis of HPV-Associated Head and Neck Cancers | |
| CN106156539B (zh) | 分析个体两类状态的免疫差异的方法和装置 | |
| US20230207059A1 (en) | Genome sequencing and detection techniques | |
| CN102982253B (zh) | 一种多样本间甲基化差异检测方法及装置 | |
| HK1185113B (en) | Method and system for bioinformatics analysis of hpv precise typing | |
| WO2016176846A1 (fr) | Kit de réactifs, appareil et procédé de détection de l'aneuploïdie chromosomique | |
| WO2019047109A1 (fr) | Méthode et système d'analyse bioinformatique pour le typage précis du hpv | |
| HK40045642A (en) | Nucleic acid rearrangement and integration analysis | |
| CN115527611A (zh) | 一种基于全外显子组测序分析hpv病毒整合位点的方法 | |
| HK40023330A (en) | Enhancement of cancer screening using cell-free viral nucleic acids | |
| HK40029037A (en) | Enhancement of cancer screening using cell-free viral nucleic acids | |
| HK40029037B (en) | Enhancement of cancer screening using cell-free viral nucleic acids | |
| HK40026071A (en) | Bioinformatics analysis method and system for hpv precise typing | |
| HK40015154A (en) | Diagnostic applications using nucleic acid fragments |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 10860226; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 10860226; Country of ref document: EP; Kind code of ref document: A1 |