WO2023029044A1 - Method and apparatus for single-cell sequencing, device, medium and program product - Google Patents

Method and apparatus for single-cell sequencing, device, medium and program product

Info

Publication number
WO2023029044A1
WO2023029044A1 PCT/CN2021/116704 CN2021116704W WO2023029044A1 WO 2023029044 A1 WO2023029044 A1 WO 2023029044A1 CN 2021116704 W CN2021116704 W CN 2021116704W WO 2023029044 A1 WO2023029044 A1 WO 2023029044A1
Authority
WO
WIPO (PCT)
Prior art keywords
cluster
threshold
subset
clusters
nucleotide sequences
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2021/116704
Other languages
English (en)
Chinese (zh)
Inventor
韩仁敏
高欣
祁俊海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Biomap Beijing Intelligence Technology Ltd
Original Assignee
Biomap Beijing Intelligence Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Biomap Beijing Intelligence Technology Ltd filed Critical Biomap Beijing Intelligence Technology Ltd
Priority to PCT/CN2021/116704 priority Critical patent/WO2023029044A1/fr
Priority to CN202111481203.3A priority patent/CN114171117B/zh
Publication of WO2023029044A1 publication Critical patent/WO2023029044A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30: Detection of binding sites or motifs
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G06F18/232: Non-hierarchical techniques
    • G06F18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Definitions

  • the present disclosure relates to the technical field of single-cell sequencing, and in particular to methods, devices, electronic equipment, computer-readable storage media and computer program products for single-cell sequencing.
  • Single-cell sequencing technology refers to a new technology for high-throughput sequencing analysis of genome, transcriptome, and epigenome at the level of a single cell. It can reveal the gene structure and gene expression state of a single cell, and reflect the heterogeneity among cells. It plays an important role in the fields of tumor, developmental biology, microbiology, neuroscience, etc., and is becoming the focus of life science research. In related technologies, there is still a lot of room for improvement in the study of single-cell sequencing.
  • a method for single-cell sequencing comprising:
  • determining the merge threshold from the nanopore sequencing signals; performing the first clustering on the plurality of nucleotide sequences based on a second similarity threshold to obtain a second plurality of clusters, wherein the first similarity threshold is greater than the second similarity threshold; and performing cluster optimization on the second plurality of clusters based on the merge threshold to obtain a third plurality of clusters.
  • an apparatus for single-cell sequencing including a module for implementing the above method.
  • an electronic device including: at least one processor; and at least one memory communicatively connected to the at least one processor. At least one memory stores instructions, and the instructions, when executed by at least one processor, cause at least one processor to perform the above method.
  • a non-transitory computer-readable storage medium storing instructions. When executed by at least one processor of a computer, the instructions cause the computer to execute the above method.
  • a computer program product including a computer program, and the computer program implements the above method when executed by a processor.
  • FIG. 1 is a flow chart of a method for single-cell sequencing according to an embodiment of the present disclosure
  • FIG. 2 is a flowchart of an example process for performing a first clustering of sequences in the method of FIG. 1 according to an embodiment of the disclosure
  • FIG. 3 is a schematic diagram of an example process of performing a first clustering of sequences in the method of FIG. 1 according to an embodiment of the present disclosure
  • FIG. 4 is a flowchart of an example process of determining a merge threshold in the method of FIG. 1 according to an embodiment of the present disclosure
  • FIG. 5 is a flowchart of an example process of cluster optimization in the method of FIG. 1 according to an embodiment of the present disclosure
  • FIGS. 6A-6B are schematic diagrams of an example process of cluster optimization in the method of FIG. 1 according to an embodiment of the present disclosure
  • FIG. 7 is a flowchart of an example process of cluster optimization in the method of FIG. 1 according to an embodiment of the present disclosure
  • FIG. 8 is a schematic diagram of an example process of cluster optimization in the method of FIG. 1 according to an embodiment of the present disclosure
  • FIG. 9 is a schematic diagram of an example process of cluster optimization in the method of FIG. 1 according to an embodiment of the present disclosure
  • FIG. 10 is a flowchart of an example process of cluster optimization in the method of FIG. 1 according to an embodiment of the present disclosure
  • FIG. 11 is a block diagram of an apparatus for single-cell sequencing according to an embodiment of the present disclosure.
  • FIG. 12 is a block diagram of an electronic device for single cell sequencing according to an embodiment of the disclosure.
  • The use of terms such as "first" and "second" to describe various elements is not intended to limit the positional, temporal or importance relationship of these elements; such terms are only used to distinguish one element from another.
  • The first element and the second element may refer to the same instance of the element, and in some cases, based on the context, they may refer to different instances.
  • The label-based single-cell identification strategy requires "demultiplexing" the sequences obtained by sequencing, that is, placing each sequence into the corresponding "box" according to its label. The DNA sequences in each box carry the same label, which means they come from the same single cell.
  • Some demultiplexing methods already exist, and they rely mainly on deep learning. Deep learning methods depend heavily on their data sets and require users to train on samples in advance, so they do not generalize well.
  • Third-generation sequencing can provide two kinds of information: nanopore sequencing signals and the nucleotide sequences translated (base-called) from them.
  • pairwise dynamic time warping (DTW)
  • spectral clustering, hierarchical clustering, k-means clustering
  • Clustering on nucleotide sequence information alone with existing clustering tools, such as CD-HIT, is very fast, but the clustering accuracy is poor.
  • The embodiments of the present disclosure use both kinds of information provided by third-generation sequencing (the nanopore sequencing signal and the nucleotide sequence) to perform mixed clustering on nucleotide sequences, which improves clustering accuracy and extends the generality of clustering.
  • FIG. 1 is a flowchart of a method 100 for single-cell sequencing according to an embodiment of the present disclosure. As shown in FIG. 1 , the method 100 includes step 110 to step 150 .
  • a sequencing library can include multiple nucleotide sequences from multiple single cells.
  • a nanopore sequencing signal corresponding to a sequence can be obtained by using nanopore sequencing technology, and then a nucleotide sequence translated from the nanopore sequencing signal can be obtained.
  • A first clustering is performed on the plurality of nucleotide sequences based on a first similarity threshold to obtain a first plurality of clusters, the first plurality of clusters including a largest cluster, that is, the cluster with the largest cluster size.
  • the information of the multiple nucleotide sequences can be used to perform the first clustering on the multiple nucleotide sequences through a greedy clustering algorithm.
  • each cluster in the first plurality of clusters includes one or several nucleotide sequences. It should be known that the "cluster size" in this application refers to the number of nucleotide sequences included in a cluster.
  • the first plurality of clusters can be sorted by cluster size, and the largest cluster with the largest cluster size can be obtained.
  • a merge threshold is determined based on the average signal length of nanopore sequencing signals corresponding to multiple nucleotide sequences and the nanopore sequencing signals corresponding to each nucleotide sequence in the largest cluster.
  • a first clustering is performed on the plurality of nucleotide sequences based on a second similarity threshold to obtain a second plurality of clusters, wherein the first similarity threshold is greater than the second similarity threshold.
  • a higher first similarity threshold (for example, 90%)
  • the clustering result is used to calculate the merge threshold.
  • the first clustering is performed again on the plurality of nucleotide sequences with a second similarity threshold (for example, 85%)
  • the information of the nanopore sequencing signal can be further used to refine the second plurality of clusters.
  • the sequence read lengths of nanopore sequencing signals corresponding to all the sequences can be obtained, and the average signal length is the average value of all sequence read lengths.
  • the merging threshold can be determined by using the nanopore sequencing signals corresponding to all the nucleotide sequences in the largest cluster.
  • the merging threshold can be a function of the nanopore sequencing signal corresponding to all nucleotide sequences in the largest cluster.
  • In step 150, cluster optimization is performed on the second plurality of clusters based on the merge threshold to obtain a third plurality of clusters.
  • some clusters of the second plurality of clusters may be merged based on a merge threshold.
  • multiple refinements may also be performed on the merged clusters.
  • The third plurality of clusters obtained after optimization is the final clustering result.
  • The method 100 performs mixed clustering on nucleotide sequences by using the two kinds of information (nanopore sequencing signal and nucleotide sequence) obtained by nanopore sequencing technology. The nucleotide sequences are first clustered with the first similarity threshold; the result of that clustering is then combined with the nanopore sequencing signals to determine the merge threshold; the first clustering is then performed again on the nucleotide sequences with the second similarity threshold; finally, the merge threshold is used to merge and refine the result of the first clustering performed with the second similarity threshold. Compared with clustering that uses only nucleotide sequence information, the method 100 improves clustering accuracy; compared with clustering that uses only nanopore sequencing signals, it improves clustering efficiency. In addition, the method 100 does not need a large number of samples for training and only requires the two kinds of corresponding information as input, so it is broadly applicable.
  • FIG. 2 is a flowchart of an example process of first clustering sequences in method 100 of FIG. 1 according to an embodiment of the disclosure.
  • the first clustering of multiple nucleotide sequences includes steps 210 to 270 .
  • Steps 210 to 270 are an iterative process as a whole, that is, continue to execute steps 210 to 270 until the set of nucleotide sequences to be clustered among the multiple nucleotide sequences is empty.
  • a representative sequence of the set of nucleotide sequences to be clustered is determined.
  • Initially, the set of nucleotide sequences to be clustered is the plurality of nucleotide sequences itself.
  • determining a representative sequence of the set of nucleotide sequences to be clustered includes determining a nucleotide sequence having the longest length in the set of nucleotide sequences to be clustered as the representative sequence.
  • the nucleotide sequences to be clustered can be sorted in descending order of sequence length, and then the nucleotide sequence with the longest length is selected as a representative sequence.
  • In step 220, the set of nucleotide sequences to be clustered is filtered by a short word filter.
  • The short word filter can filter out sequences with a shorter length. Filtering with the short word filter first reduces the number of subsequent pairwise sequence alignments.
  • In step 230, it is determined whether the filtered set of nucleotide sequences to be clustered is an empty set.
  • In step 240, in response to the filtered set of nucleotide sequences to be clustered being non-empty, the similarity between each nucleotide sequence in the filtered set and the representative sequence is determined.
  • the similarity between two different sequences can be determined by sequence alignment.
  • The sequence identity algorithm of BLAST or the gap-compressed identity algorithm can be used.
  • the nucleotide sequence is added to a similarity cluster that includes the representative sequence.
  • The nucleotide sequences whose similarity is greater than the preset first similarity threshold are added to the similarity cluster where the representative sequence is located.
  • Once all the sequences whose similarity is greater than the first similarity threshold have been added to the similarity cluster where the representative sequence is located, the generation of one cluster is complete.
  • In step 260, in response to the filtered set of nucleotide sequences to be clustered being an empty set, the representative sequence is added to a short word cluster.
  • the representative sequence itself is regarded as a short word cluster.
  • In step 270, the nucleotide sequences in the similarity cluster and the short word cluster are removed from the set of nucleotide sequences to be clustered, so as to update the set of nucleotide sequences to be clustered.
  • the generated similarity clusters that is, clustered
  • The updated set of nucleotide sequences to be clustered is returned to the initial step 210 of the iteration to determine a new representative sequence and generate another similarity cluster.
  • The similarity clusters and short word clusters obtained in the iterative process form the first plurality of clusters. Note that steps 210 to 270 also apply to the first clustering with the second similarity threshold; it is only necessary to replace the first similarity threshold in step 250 with the second similarity threshold, in which case the resulting clusters form the second plurality of clusters.
  • the embodiments of the present disclosure can use the information of nucleotide sequence to perform the first clustering according to different similarity thresholds, and use the short word filter to improve the clustering efficiency during the clustering process.
  • steps 210 to 270 may be implemented by Algorithm 1 as follows:
  • N represents the set of nucleotide sequences to be clustered
  • NS represents the set of nucleotide sequences filtered out by the short word filter
  • S represents the set of nucleotide sequences to be clustered after filtering
  • center_now represents the representative sequence
  • identity represents the similarity threshold (for example, it can represent the first similarity threshold or the second similarity threshold)
  • c represents the distance between a sequence x in the filtered set of nucleotide sequences to be clustered and the representative sequence center_now
  • Clusters represents the first plurality of clusters.
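  • The greedy loop of steps 210 to 270 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Algorithm 1 of the disclosure: `similarity` uses `difflib.SequenceMatcher` as a stand-in for alignment-based identity, the short word filter is approximated by a hypothetical length-ratio check (`min_len_ratio`), and a similarity score is compared against `identity` instead of the distance c.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Stand-in for alignment-based identity (assumption, not the patent's aligner).
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(sequences, identity, min_len_ratio=0.5):
    """Greedy clustering: the longest unclustered sequence is the representative."""
    # N: indices of the sequences still to be clustered, longest first.
    remaining = sorted(range(len(sequences)),
                       key=lambda i: len(sequences[i]), reverse=True)
    clusters = []
    while remaining:
        center_now = remaining[0]
        rep = sequences[center_now]
        # S: candidates that pass the (hypothetical) length-based filter.
        filtered = [i for i in remaining[1:]
                    if len(sequences[i]) >= min_len_ratio * len(rep)]
        # Keep the candidates whose similarity to the representative exceeds
        # the threshold. If S is empty, the representative forms a
        # single-member ("short word") cluster on its own.
        members = [center_now] + [i for i in filtered
                                  if similarity(sequences[i], rep) > identity]
        clusters.append([sequences[i] for i in members])
        taken = set(members)
        remaining = [i for i in remaining if i not in taken]
    return clusters


reads = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "TTTTGGGGCA", "AC"]
clusters = greedy_cluster(reads, identity=0.85)
print([len(c) for c in clusters])
```

  • Calling the same function with a lower `identity` (for example 0.85 instead of 0.90) corresponds to rerunning the first clustering with the second similarity threshold.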
  • FIG. 3 is a schematic diagram of an example process of first clustering sequences in the method 100 of FIG. 1 according to an embodiment of the disclosure.
  • For the set 310 of nucleotide sequences to be clustered, the sequences are first sorted in descending order of length to obtain the sorted set 320. The nucleotide sequence with the longest length in the set 320, that is, the first nucleotide sequence, is then selected as the representative sequence 321.
  • The set 310 of nucleotide sequences to be clustered, or the sorted set 320, is input to a short word filter 330 to filter out short nucleotide sequences, such as the sequence 326. The sequence comparison module 340 then calculates the similarity between each nucleotide sequence in the filtered set and the representative sequence 321. For example, the similarity between the sequence 322 and the representative sequence 321 is greater than or equal to the first similarity threshold, while the similarities between the sequences 323, 324 and 325 and the representative sequence 321 are all smaller than the first similarity threshold.
  • The sequence 322 is added to the similarity cluster 350 that includes the representative sequence 321, and the sequence 322 is removed from the set 310 of nucleotide sequences to be clustered to obtain an updated set 360 of nucleotide sequences to be clustered.
  • The iteration 370 continues on the updated set 360 of nucleotide sequences to be clustered until that set is empty.
  • Each similarity cluster 380-1, 380-2 through 380-k and each short word cluster 390-1, 390-2 through 390-t form the first plurality of clusters.
  • FIG. 4 is a flowchart of an example process of determining a merge threshold in method 100 of FIG. 1 according to an embodiment of the disclosure. As shown in FIG. 4, determining the merge threshold (step 130) includes steps 410 to 430.
  • A first threshold number of nanopore sequencing signals are randomly selected from the nanopore sequencing signals corresponding to each nucleotide sequence in the largest cluster.
  • The number of nanopore sequencing signals to be selected from the largest cluster can be determined according to the size of the largest cluster.
  • In one example, in response to determining that the maximum cluster size is greater than a second threshold, the first threshold is the second threshold. In another example, in response to determining that the maximum cluster size is less than or equal to the second threshold, the first threshold is the maximum cluster size.
  • In step 420, a first dynamic time warping distance between every two nanopore sequencing signals among the first threshold number of nanopore sequencing signals is calculated.
  • the dynamic time warping (DTW) distance between the two signals can be calculated.
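  • For reference, the standard DTW recurrence for two signals $a = (a_1, \dots, a_n)$ and $b = (b_1, \dots, b_m)$ can be written as below. The absolute-difference local cost is an assumption made here for illustration; the source does not state which local cost is used.

```latex
\begin{aligned}
D(0,0) &= 0, \qquad D(i,0) = D(0,j) = \infty \quad (i, j \ge 1),\\
D(i,j) &= \lvert a_i - b_j \rvert + \min\{\, D(i-1,j),\; D(i,j-1),\; D(i-1,j-1) \,\},\\
\mathrm{DTW}(a,b) &= D(n,m).
\end{aligned}
```

  • Because the recurrence allows one sample to be matched against several, two signals that differ only in local speed (for example $(1,2,3)$ and $(1,2,2,3)$) have a DTW distance of 0 under this cost.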
  • a merging threshold is determined based on the sum of the first dynamic time warping distances, the average signal length of the nanopore sequencing signals corresponding to the multiple nucleotide sequences, and the maximum cluster size.
  • the method for calculating the merge threshold may be determined according to the average signal length.
  • the merging threshold is a function of the mean of all DTW distances between any pair of signals.
  • step 410 to step 430 may be implemented by Algorithm 2 as follows:
  • a high similarity threshold (for example, 90%) can be set to run the first clustering method shown in Algorithm 1 to obtain the first plurality of clusters.
  • MaxCluster represents the largest cluster
  • MaxLength represents the maximum cluster size
  • AveLenSig represents the average signal length of nanopore sequencing signals corresponding to multiple nucleotide sequences
  • sum represents the sum of the first dynamic time warping distances
  • sum/MaxLength denotes the mean of all first dynamic time warping distances
  • Threshold denotes the merge threshold.
  • the second threshold is set to 10. When the maximum cluster size MaxLength is greater than the second threshold (10), the first threshold is equal to the second threshold (10).
  • the first threshold number (10) of sequences are randomly selected from MaxCluster.
  • the maximum cluster size MaxLength is less than or equal to the second threshold (10)
  • the first threshold is equal to MaxLength.
  • the first threshold number (MaxLength) of sequences are randomly selected from the largest cluster MaxCluster.
  • the merge threshold Threshold can also be expressed as:
  • $\mathrm{Threshold} = \mathrm{sum} / \mathrm{MaxLength} + c$
  • c is a constant and can be determined by testing on a large number of constructed simulation data sets.
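  • Steps 410 to 430 can be sketched in Python as follows. This is a hedged illustration, not the Algorithm 2 of the disclosure: signals are assumed to be plain lists of floats, `dtw_distance` uses an absolute-difference local cost, the function and parameter names are hypothetical, and the dependence on the average signal length AveLenSig is left out because the source does not spell it out. The formula Threshold = sum/MaxLength + c is taken as stated.

```python
import random


def dtw_distance(a, b):
    """Classic dynamic time warping with |x - y| as the local cost."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)          # row D(0, .)
    for x in a:
        cur = [inf]                        # D(i, 0) for i >= 1
        for j, y in enumerate(b, start=1):
            cur.append(abs(x - y) + min(prev[j], prev[j - 1], cur[j - 1]))
        prev = cur
    return prev[-1]


def merge_threshold(max_cluster_signals, c=0.0, second_threshold=10, rng=None):
    """Merge threshold from the signals of the largest cluster (MaxCluster)."""
    rng = rng or random.Random(0)
    max_length = len(max_cluster_signals)             # MaxLength
    # First threshold: the second threshold if the cluster is larger than it,
    # otherwise the cluster size itself.
    first_threshold = min(max_length, second_threshold)
    picked = rng.sample(max_cluster_signals, first_threshold)
    total = 0.0                                       # sum of pairwise DTW
    for i in range(len(picked)):
        for j in range(i + 1, len(picked)):
            total += dtw_distance(picked[i], picked[j])
    return total / max_length + c


signals = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
print(merge_threshold(signals, c=0.5))
```

  • In this toy input the three pairwise distances are 2, 0 and 2, so the sum is 4 and the result is 4/3 + 0.5.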
  • In the embodiments of the present disclosure, after the first clustering has been performed on the plurality of nucleotide sequences using a high similarity threshold, the nanopore sequencing signals corresponding to the plurality of nucleotide sequences are used to determine the merge threshold.
  • The similarity between every two signals is measured by the first dynamic time warping distance. The embodiments of the present disclosure therefore combine the two kinds of information, which can improve clustering accuracy.
  • FIG. 5 is a flowchart of an example process of cluster optimization in the method 100 of FIG. 1 according to an embodiment of the disclosure. As shown in FIG. 5, performing cluster optimization on the second plurality of clusters based on the merge threshold (step 150) includes steps 510 to 540.
  • A first subset of the second plurality of clusters is determined, each cluster in the first subset having a cluster size greater than a third threshold.
  • The second plurality of clusters may be classified into good clusters (the first subset) and bad clusters based on the cluster size of each cluster and the third threshold. For example, if the cluster size of a cluster is greater than the third threshold (which may be set to 5, for example), the cluster belongs to the first subset.
  • The third threshold for classification may be determined according to the maximum cluster size.
  • Determining the first subset includes: in response to determining that the maximum cluster size is greater than the third threshold, determining the clusters of the second plurality of clusters whose cluster size is greater than the third threshold to form the first subset; and in response to determining that the maximum cluster size is less than or equal to the third threshold, determining the clusters of the second plurality of clusters whose cluster size is equal to the maximum cluster size to form the first subset.
  • In step 520, for each cluster in the first subset, a fourth threshold number of nanopore sequencing signals are randomly selected from the nanopore sequencing signals corresponding to each nucleotide sequence in the cluster.
  • In step 530, the respective second dynamic time warping distances between each of the fourth threshold number of nanopore sequencing signals randomly selected from the cluster and each of the fourth threshold number of nanopore sequencing signals randomly selected from another cluster in the first subset are calculated.
  • In step 540, in response to determining that the respective second dynamic time warping distances are all less than the merge threshold, the cluster and the other cluster are merged to obtain a merged first subset.
  • nanopore sequencing signals corresponding to the fourth threshold sequence may be randomly selected from each cluster in the first subset.
  • the fourth threshold can be set to three, for example.
  • step 510 to step 540 may be implemented through Algorithm 3 as follows:
  • the function GetMerThCFSignal can implement steps 510 to 530.
  • MaxLength represents the maximum cluster size
  • The third threshold is set to 5. When the maximum cluster size MaxLength is greater than the third threshold (5), those clusters whose cluster size is greater than the third threshold (5) are selected from the second plurality of clusters to form the first subset GoodCluster. When the maximum cluster size MaxLength is less than or equal to the third threshold (5), those clusters whose cluster size is equal to the maximum cluster size MaxLength are selected from the second plurality of clusters to form the first subset GoodCluster. After the first subset GoodCluster is merged through step 530 and step 540, the merged first subset RefineGoodCluster is obtained.
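  • Steps 510 to 540 can be sketched in Python as follows. This is an illustration under assumptions, not the Algorithm 3 of the disclosure: each cluster is represented directly as a list of signals (lists of floats), `dtw_distance` uses an absolute-difference local cost, and the merge order (each good cluster is tried against the already merged ones) is a design choice of this sketch. All function names are hypothetical.

```python
import random


def dtw_distance(a, b):
    """Classic dynamic time warping with |x - y| as the local cost."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        cur = [inf]
        for j, y in enumerate(b, start=1):
            cur.append(abs(x - y) + min(prev[j], prev[j - 1], cur[j - 1]))
        prev = cur
    return prev[-1]


def split_good_clusters(clusters, third_threshold=5):
    """Return (GoodCluster, remaining clusters) according to cluster size."""
    max_length = max(len(c) for c in clusters)
    if max_length > third_threshold:
        is_good = [len(c) > third_threshold for c in clusters]
    else:
        is_good = [len(c) == max_length for c in clusters]
    good = [c for c, g in zip(clusters, is_good) if g]
    rest = [c for c, g in zip(clusters, is_good) if not g]
    return good, rest


def merge_good_clusters(good, threshold, fourth_threshold=3, rng=None):
    """Merge two clusters when all sampled cross-cluster DTW distances are small."""
    rng = rng or random.Random(0)
    merged = []                                    # RefineGoodCluster
    for cluster in good:
        picked = rng.sample(cluster, min(fourth_threshold, len(cluster)))
        for target in merged:
            ref = rng.sample(target, min(fourth_threshold, len(target)))
            if all(dtw_distance(p, r) < threshold for p in picked for r in ref):
                target.extend(cluster)             # merge into the existing cluster
                break
        else:
            merged.append(list(cluster))           # no match: keep it separate
    return merged


clusters = [[[0.0, 0.0]] * 3, [[0.0, 0.1]] * 3, [[5.0, 5.0]] * 3, [[9.0, 9.0]]]
good, rest = split_good_clusters(clusters, third_threshold=2)
print([len(c) for c in merge_good_clusters(good, threshold=1.0)], len(rest))
```

  • The clusters left in `rest` are the ones that are later compared against consensus sequence signals.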
  • FIGS. 6A-6B are schematic diagrams of an example process of cluster optimization in the method 100 of FIG. 1, according to an embodiment of the present disclosure.
  • FIG. 6A shows the second plurality of clusters 610 , 620 , 630 , 640 , 650 and 660 obtained after the first clustering of 30 nucleotide sequences with the second similarity threshold. Sequences with the same texture in each sequence in the figure are located in the same cluster. For example, nucleotide sequences 641 , 642 and 643 are located in cluster 640 .
  • A first subset 670 of the second plurality of clusters 610 to 660 is determined.
  • The clusters 610, 630 and 660, whose cluster size is larger than 5, form the first subset 670.
  • The clusters 620, 640 and 650, whose cluster size is smaller than 5, do not belong to the first subset 670.
  • FIG. 6B illustrates example operations for merging optimization on each of the clusters 610 , 630 , and 660 in the first subset 670 .
  • the fourth threshold is set to 3 in FIG. 6B
  • Three sequences 611, 612 and 613 are first randomly selected from the cluster 610.
  • Three sequences 631, 632 and 633 are randomly selected from the cluster 630.
  • the second dynamic time warping distances between the nanopore sequencing signals corresponding to the sequences from different clusters are calculated respectively.
  • the second dynamic time warping distances between the sequence 611 and the sequence 631, between the sequence 611 and the sequence 632, and between the sequence 611 and the sequence 633 are respectively calculated. Similar operations are also performed for sequence 612 and sequence 613 .
  • A determination 680 is made as to whether all the second dynamic time warping distances are smaller than the merge threshold. For example, if all the second dynamic time warping distances between the sequences 611, 612 and 613 and the sequences 631, 632 and 633 are smaller than the merge threshold, the cluster 610 and the cluster 630 are merged to obtain the cluster 690.
  • sequences 661, 662 and 663 are randomly picked from the cluster 660.
  • The embodiments of the present disclosure can use the nanopore sequencing signals to merge some of the clusters in the second plurality of clusters, thereby improving the clustering accuracy.
  • FIG. 7 is a flowchart of an example process of cluster optimization in the method 100 of FIG. 1 according to an embodiment of the disclosure. As shown in FIG. 7, cluster optimization (step 150) further includes steps 710 to 740.
  • A consensus sequence signal corresponding to each cluster in the merged first subset is determined.
  • The consensus sequence of a cluster can be determined first, and then the nanopore sequencing signal corresponding to that consensus sequence can be determined to obtain the consensus sequence signal.
  • In step 720, for each nucleotide sequence included in the second subset and for each consensus sequence signal, the third dynamic time warping distance between the nanopore sequencing signal corresponding to the nucleotide sequence and the consensus sequence signal is calculated. That is, for each sequence in the second subset, the respective third dynamic time warping distances between that sequence and all consensus sequence signals are calculated.
  • In step 730, in response to determining that the third dynamic time warping distance is less than the merge threshold, the nucleotide sequence is added to the cluster in the merged first subset that corresponds to the consensus sequence signal, so as to update the merged first subset.
  • When the third dynamic time warping distance between a sequence in the second subset and a consensus sequence signal is smaller than the merge threshold, the sequence is added to the cluster corresponding to that consensus sequence signal.
  • The nucleotide sequences added to the merged first subset are removed from the second subset to update the second subset.
  • step 710 to step 740 may be implemented by Algorithm 4 as follows:
  • RefineGoodCluster represents the merged first subset
  • OSS represents the second subset
  • InitialCFSignalSet represents the set of consensus sequence signals of each cluster in the first subset RefineGoodCluster
  • threshold represents the merge threshold
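  • Steps 710 to 740 can be sketched in Python as follows. This is an illustration under assumptions, not the Algorithm 4 of the disclosure: the consensus sequence signals (InitialCFSignalSet) are taken as given inputs because the source does not describe how they are generated, clusters and the second subset OSS are represented as lists of signals, and when several consensus signals fall below the threshold the nearest one is chosen, which is a tie-breaking assumption of this sketch.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping with |x - y| as the local cost."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        cur = [inf]
        for j, y in enumerate(b, start=1):
            cur.append(abs(x - y) + min(prev[j], prev[j - 1], cur[j - 1]))
        prev = cur
    return prev[-1]


def assign_to_consensus(refine_good_cluster, consensus_signals, oss, threshold):
    """Move signals from OSS into the cluster whose consensus signal is close."""
    updated = [list(c) for c in refine_good_cluster]   # do not mutate the input
    remaining = []                                     # updated second subset
    for signal in oss:
        dists = [dtw_distance(signal, cf) for cf in consensus_signals]
        best = min(range(len(dists)), key=lambda k: dists[k])
        if dists[best] < threshold:
            updated[best].append(signal)               # step 730
        else:
            remaining.append(signal)                   # stays in the second subset
    return updated, remaining


clusters = [[[0.0, 0.0]], [[5.0, 5.0]]]
consensus = [[0.0, 0.0], [5.0, 5.0]]
oss = [[0.0, 0.2], [5.0, 5.1], [9.0, 9.0]]
updated, remaining = assign_to_consensus(clusters, consensus, oss, threshold=1.0)
print([len(c) for c in updated], remaining)
```

  • Signals that stay in `remaining` form the updated second subset, which is non-empty here because the signal `[9.0, 9.0]` is far from both consensus signals.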
  • FIG. 8 is a schematic diagram of an example process of cluster optimization in the method 100 of FIG. 1 according to an embodiment of the present disclosure.
  • the merged first subset includes cluster 810 and cluster 820 .
  • the consensus sequence signal 811 for cluster 810 and the consensus sequence signal 821 for cluster 820 are respectively determined.
  • The second subset 830 includes 9 nucleotide sequences. Taking the sequence 831 as an example, the third dynamic time warping distances between the nanopore sequencing signal corresponding to the sequence 831 and the consensus sequence signals 811 and 821 (denoted d1 and d2, respectively) are calculated. The comparator 840 then determines whether d1 and d2 are smaller than the merge threshold.
  • the sequence 831 is added to the cluster 810 corresponding to the consensus sequence signal 811, so as to update the cluster 810 to the cluster 810'.
  • the sequence 833 is added to the cluster 820 corresponding to the consensus sequence signal 821, so as to update the cluster 820 to the cluster 820'. Accordingly, the sequences 831 and 833, for example, are removed from the second subset 830.
  • the second subset 830 retains the sequence 832. Finally, the updated second subset 830' is obtained.
  • The embodiments of the present disclosure use the consensus sequence signals to further optimize the merged clusters of the second plurality of clusters, thereby improving the clustering accuracy.
  • further cluster optimization may be performed in response to the updated second subset being non-empty.
  • clustering is performed on the updated second subset based on nanopore sequencing signals corresponding to each nucleotide sequence in the updated second subset to obtain at least one cluster.
  • the fourth dynamic time warping distance between the nanopore sequencing signals corresponding to every two nucleotide sequences in each cluster of the at least one cluster is smaller than the merge threshold, and the updated merged first subset and the at least one cluster form the third plurality of clusters.
  • Algorithm 4 can be referred to in order to implement the above steps.
  • G represents the at least one cluster. Within each cluster of G, the fourth dynamic time warping distance between any pair of sequences is smaller than the merge threshold.
  • FIG. 9 is a schematic diagram of an example process of cluster optimization in the method 100 of FIG. 1 according to an embodiment of the present disclosure.
  • the updated second subset includes nucleotide sequences 910, 920, 930 and 940.
  • For the nucleotide sequence 910, first calculate the fourth dynamic time warping distances 921, 931 and 941 between the nanopore sequencing signal corresponding to the sequence 910 and the nanopore sequencing signals corresponding to the sequences 920, 930 and 940, respectively.
  • the fourth dynamic time warping distances 921, 931 and 941 are compared with the merge threshold by a comparator 950.
  • When the distance 921 is smaller than the merge threshold, a new cluster 960 is generated. Cluster 960 includes sequences 910 and 920. Since the distances 931 and 941 are still greater than or equal to the merge threshold, a fourth dynamic time warping distance 943 between the nanopore sequencing signal corresponding to the sequence 930 and the nanopore sequencing signal corresponding to the sequence 940 is calculated next. The comparator 950 then determines whether the fourth dynamic time warping distance 943 is smaller than the merge threshold. When the fourth dynamic time warping distance 943 is smaller than the merge threshold, another new cluster 970 is generated. Cluster 970 includes sequences 930 and 940.
  • the embodiments of the present disclosure can further optimize the nucleotide sequences that are not classified into the updated second cluster, thereby improving the clustering accuracy.
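  • The grouping illustrated in FIG. 9 can be sketched in Python as follows. This is a hedged illustration rather than the disclosed procedure: the greedy order of comparison, the function names dtw_distance and cluster_leftovers, the plain absolute-difference DTW, and the choice to report sequences that find no partner separately are assumptions. The one property taken from the description is that every pair of sequences within a generated cluster has a distance smaller than the merge threshold.

```python
def dtw_distance(a, b):
    """Plain dynamic time warping distance with absolute-difference cost (assumed variant)."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = abs(x - y) + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return prev[len(b)]


def cluster_leftovers(seq_ids, signals, threshold):
    """Greedily group sequences so that all pairs inside a group are below the threshold.

    seq_ids: ids of the sequences left in the updated second subset
    signals: dict sequence id -> nanopore sequencing signal
    Returns (clusters with at least two members, ids that found no partner).
    """
    groups = []
    for sid in seq_ids:
        for group in groups:
            # Join a group only if the new sequence is close to every member.
            if all(dtw_distance(signals[sid], signals[m]) < threshold for m in group):
                group.append(sid)
                break
        else:
            groups.append([sid])
    clusters = [g for g in groups if len(g) > 1]
    unclustered = [g[0] for g in groups if len(g) == 1]
    return clusters, unclustered
```

  • With four signals arranged like sequences 910 to 940 in FIG. 9, the sketch yields two clusters corresponding to clusters 960 and 970.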
  • FIG. 10 is a flowchart of an example process of cluster optimization in the method 100 of FIG. 1 according to an embodiment of the disclosure.
  • the cluster optimization may further include steps 1010 to 1050 in response to a third subset of the second plurality of clusters other than the third plurality of clusters being non-empty.
  • steps 1010 to 1050 can be used as a checking mechanism for clustering results, for finding nucleotide sequences that have not been added to the cluster. For example, some nucleotide sequences are very short in length due to translation errors. Steps 1010 to 1050 can add such nucleotide sequences to the corresponding clusters.
  • In step 1010, for each nucleotide sequence in the third subset: calculate a fifth dynamic time warping distance between the nanopore sequencing signal corresponding to the nucleotide sequence and the nanopore sequencing signal corresponding to a nucleotide sequence randomly selected from the third plurality of clusters.
  • In step 1020, in response to determining that the fifth dynamic time warping distance is less than the merge threshold, add the nucleotide sequence to the cluster of the third plurality of clusters that includes the randomly selected nucleotide sequence, so as to update the third plurality of clusters.
  • In step 1030, each nucleotide sequence added to the third plurality of clusters is removed from the third subset to update the third subset.
  • step 1010 to step 1030 may be implemented by Algorithm 5 as follows:
  • NN represents the third subset
  • ss represents a sequence randomly selected from the current third plurality of clusters Clusters_now.
  • Algorithm 5 calculates the fifth dynamic time warping distance between the nanopore sequencing signal corresponding to each sequence nn in the third subset NN and the nanopore sequencing signal corresponding to ss, and judges whether the distance is smaller than the merge threshold Threshold. If it is smaller than the merge threshold Threshold, nn is added to the cluster in which ss is located.
  • In response to the updated third subset being non-empty, the cluster optimization further includes step 1040 and step 1050.
  • each nucleotide sequence in the updated third subset is classified into a corresponding individual cluster.
  • each respective individual cluster is added to the updated third plurality of clusters.
  • the embodiment of the present disclosure also introduces a checking mechanism to perform clustering on nucleotide sequences that have not been added to the clustering, thereby ensuring the integrity of the clustering results.
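  • The checking mechanism of steps 1010 to 1050 can be sketched in Python as follows. This is an illustrative sketch, not the Algorithm 5 listing of the disclosure. The names NN, nn, ss and Clusters_now come from the description; drawing one random representative ss per cluster and trying the clusters in order, the function names, the seed parameter and the plain absolute-difference DTW are assumptions.

```python
import random


def dtw_distance(a, b):
    """Plain dynamic time warping distance with absolute-difference cost (assumed variant)."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = abs(x - y) + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return prev[len(b)]


def check_unclustered(clusters_now, nn_set, signals, threshold, seed=0):
    """Attach each sequence of the third subset to a cluster, or give it its own cluster.

    clusters_now: list of clusters, each a list of sequence ids
    nn_set: ids of the sequences in the third subset (NN)
    signals: dict sequence id -> nanopore sequencing signal
    """
    rng = random.Random(seed)
    clusters = [list(c) for c in clusters_now]
    leftover = []
    for nn in nn_set:
        for cluster in clusters:
            ss = rng.choice(cluster)  # randomly selected representative
            if dtw_distance(signals[nn], signals[ss]) < threshold:
                cluster.append(nn)  # steps 1020 and 1030
                break
        else:
            leftover.append(nn)
    # Steps 1040 and 1050: every remaining sequence becomes an individual cluster.
    clusters.extend([nn] for nn in leftover)
    return clusters
```

  • Because every remaining sequence ends up in an individual cluster, each input sequence appears in exactly one cluster of the result, which is the integrity property the checking mechanism is meant to ensure.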
  • the multiple nucleotide sequences are from multiple single cells, the nucleotide sequences from the same single cell have the same tag, and the nucleotide sequences from different single cells have different tags.
  • the embodiments of the present disclosure can be used not only for clustering nucleotide sequences without tags but also for clustering nucleotide sequences with tags, with the clustering results directly associated with the tags.
  • the third plurality of clusters is associated with respective corresponding tags.
  • the method 100 further includes: based on the third plurality of clusters and the corresponding tags associated with the third plurality of clusters, separating the nucleotide sequences from each single cell in the plurality of single cells from the plurality of nucleotide sequences.
  • the sequences in the sequencing library are integrated with tags, and the tags reflect the cell of origin of the sequences.
  • tagged sequences can be clustered. After the clustering is completed, the nucleotide sequences of each single cell can be separated from the plurality of nucleotide sequences according to the clustering result. Therefore, the method 100 can separate the nucleotide sequences of each single cell, according to their cell of origin, from a large pool of mixed nucleotide sequences originating from multiple single cells, thereby improving the accuracy of single-cell sequencing.
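  • The separation by tag can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it supposes that each cluster of the third plurality of clusters already carries exactly one tag, and the function name separate_by_cell and the dictionary layout are hypothetical. How a tag is attached to a cluster is not covered here.

```python
from collections import defaultdict


def separate_by_cell(clusters, cluster_tags):
    """Collect the sequences of each single cell from tagged clusters.

    clusters: dict cluster id -> list of sequence ids (hypothetical layout)
    cluster_tags: dict cluster id -> tag identifying the cell of origin
    Returns a dict tag -> list of sequence ids from that cell.
    """
    per_cell = defaultdict(list)
    for cluster_id, members in clusters.items():
        per_cell[cluster_tags[cluster_id]].extend(members)
    return dict(per_cell)
```

  • Clusters that share a tag are pooled, so a cell whose sequences were split across several clusters is still recovered as one group.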
  • FIG. 11 is a block diagram of an apparatus 1100 for single-cell sequencing according to an embodiment of the present disclosure.
  • the single-cell sequencing apparatus 1100 includes an acquisition module 1110, a first similarity clustering module 1120, a determination module 1130, a second similarity clustering module 1140 and a cluster optimization module 1150.
  • the acquisition module 1110 is configured to acquire multiple nucleotide sequences in the sequencing library and nanopore sequencing signals corresponding to the multiple nucleotide sequences.
  • the first similarity clustering module 1120 is configured to perform first clustering on the multiple nucleotide sequences based on a first similarity threshold to obtain a first plurality of clusters, the first plurality of clusters including a largest cluster having the maximum cluster size.
  • the determination module 1130 is configured to determine the merge threshold based on the average signal length of the nanopore sequencing signals corresponding to the multiple nucleotide sequences and the nanopore sequencing signals corresponding to each nucleotide sequence in the largest cluster.
  • the second similarity clustering module 1140 is configured to perform first clustering on the multiple nucleotide sequences based on a second similarity threshold to obtain a second plurality of clusters, the first similarity threshold being greater than the second similarity threshold.
  • the cluster optimization module 1150 is configured to perform cluster optimization on the second plurality of clusters based on the merging threshold to obtain a third plurality of clusters.
  • the determination module 1130 includes a first selection submodule 1131 , a first calculation submodule 1132 and a first determination submodule 1133 .
  • the first selecting submodule 1131 is configured to randomly select a first threshold number of nanopore sequencing signals from the nanopore sequencing signals corresponding to each nucleotide sequence in the largest cluster.
  • the first calculation sub-module 1132 is configured to calculate a first dynamic time warping distance between every two nanopore sequencing signals in the first threshold number of nanopore sequencing signals.
  • the first determination sub-module 1133 is configured to determine the merging threshold based on the sum of the first dynamic time warping distances, the average signal length of the nanopore sequencing signals corresponding to the multiple nucleotide sequences and the maximum cluster size.
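  • The first two ingredients of the merge threshold, random sampling from the largest cluster and the sum of pairwise first dynamic time warping distances, can be sketched in Python as follows. This sketch stops at the sum on purpose: how the sum is combined with the average signal length and the maximum cluster size is not spelled out in this passage, so no formula is assumed. The function names, the seed parameter and the plain absolute-difference DTW are assumptions.

```python
import random
from itertools import combinations


def dtw_distance(a, b):
    """Plain dynamic time warping distance with absolute-difference cost (assumed variant)."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = abs(x - y) + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return prev[len(b)]


def sampled_pairwise_dtw_sum(largest_cluster_signals, sample_count, seed=0):
    """Sum the DTW distances over every pair of randomly sampled signals.

    largest_cluster_signals: list of signals belonging to the largest cluster
    sample_count: the first threshold number of signals to draw
    """
    rng = random.Random(seed)
    k = min(sample_count, len(largest_cluster_signals))
    sample = rng.sample(largest_cluster_signals, k)
    return sum(dtw_distance(a, b) for a, b in combinations(sample, 2))
```

  • Sampling keeps the cost at k(k-1)/2 distance computations instead of one per pair of signals in the whole cluster.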
  • the cluster optimization module 1150 includes a second determination submodule 1151, a second selection submodule 1152, a second calculation submodule 1153 and a merging submodule 1154.
  • the second determination submodule 1151 is configured to determine a first subset of the second plurality of clusters based on the maximum cluster size and the third threshold.
  • the second selection submodule 1152 is configured to, for each cluster in the first subset: randomly select a fourth threshold number of nanopore sequencing signals from the nanopore sequencing signals corresponding to each nucleotide sequence in the cluster.
  • the second calculation submodule 1153 is configured to calculate the respective second dynamic time warping distances between each of the fourth threshold number of nanopore sequencing signals randomly selected from the cluster and the fourth threshold number of nanopore sequencing signals randomly selected from another cluster in the first subset.
  • the merging submodule 1154 is configured to, in response to determining that the respective second dynamic time warping distances are all smaller than the merge threshold, merge the cluster and the other cluster to obtain a merged first subset.
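  • The merging performed by submodules 1152 to 1154 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the greedy pass over the clusters, the function names, the seed parameter and the plain absolute-difference DTW are not taken from the disclosure. The merge condition follows the description: two clusters are merged only when every distance between their sampled signals is smaller than the merge threshold.

```python
import random


def dtw_distance(a, b):
    """Plain dynamic time warping distance with absolute-difference cost (assumed variant)."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = abs(x - y) + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return prev[len(b)]


def merge_similar_clusters(first_subset, signals, threshold, sample_size, seed=0):
    """Merge clusters whose randomly sampled signals are all mutually close.

    first_subset: list of clusters, each a list of sequence ids
    signals: dict sequence id -> nanopore sequencing signal
    sample_size: the fourth threshold number of signals drawn per cluster
    """
    rng = random.Random(seed)
    merged = []
    for cluster in first_subset:
        sample = rng.sample(cluster, min(sample_size, len(cluster)))
        for target in merged:
            ref = rng.sample(target, min(sample_size, len(target)))
            if all(dtw_distance(signals[s], signals[r]) < threshold
                   for s in sample for r in ref):
                target.extend(cluster)  # merge into the existing cluster
                break
        else:
            merged.append(list(cluster))
    return merged
```

  • Comparing only sampled signals bounds the number of distance computations per cluster pair by the square of the sample size.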
  • In response to the second subset of the second plurality of clusters excluding the merged first subset being non-empty, the cluster optimization module 1150 further includes a third determination submodule 1155, a third calculation submodule 1156, a first update submodule 1157 and a second update submodule 1158.
  • the third determination submodule 1155 is configured to determine the consensus sequence signal corresponding to each cluster in the merged first subset.
  • the third calculation submodule 1156 is configured to: for each nucleotide sequence included in the second subset and for each consensus sequence signal: calculate the third dynamic time warping distance between the nanopore sequencing signal corresponding to the nucleotide sequence and the consensus sequence signal.
  • the first update submodule 1157 is configured to, in response to determining that the third dynamic time warping distance is less than the merge threshold, add the nucleotide sequence to the cluster in the merged first subset corresponding to the consensus sequence signal, so as to update the merged first subset.
  • the second update sub-module 1158 is configured to remove from the second subset nucleotide sequences added to the merged first subset to update the second subset.
  • each module of the apparatus 1100 shown in FIG. 11 may correspond to each step in the method 100 described above with reference to FIGS. 1-10 .
  • the operations, features and advantages described above with respect to the method 100 are also applicable to the apparatus 1100 and the modules it includes. For the sake of brevity, some operations, features and advantages are not described in detail here.
  • a discussion herein of a particular module performing an action includes the particular module itself performing the action, or alternatively the particular module invoking or otherwise accessing another component or module that performs the action (or performs the action in conjunction with the particular module). Accordingly, a particular module that performs an action may include the particular module that performs the action itself and/or another module that the particular module calls or otherwise accesses that performs the action.
  • an electronic device including: at least one processor; and at least one memory communicatively connected to the at least one processor, the at least one memory storing instructions that, when executed by the at least one processor, cause the at least one processor to execute the above method.
  • a non-transitory computer-readable storage medium storing instructions. When executed by at least one processor of a computer, the instructions cause the computer to execute the above method.
  • a computer program product including a computer program, and the computer program implements the above method when executed by a processor.
  • FIG. 12 shows an example configuration of an electronic device 1200 that may be used to implement the methods described herein.
  • Electronic device 1200 may be any of various types of devices. Examples of electronic device 1200 include, but are not limited to: desktop computers, server computers, notebook or netbook computers, mobile devices (e.g., tablet computers, cellular or other wireless telephones (e.g., smartphones), notepad computers, mobile stations), wearable devices (e.g., glasses, watches), entertainment devices (e.g., entertainment appliances, set-top boxes communicatively coupled to display devices, game consoles), televisions or other display devices, automotive computers, and the like.
  • Electronic device 1200 may include at least one processor 1202, memory 1204, communication interface(s) 1206, display device 1208, other input/output (I/O) devices 1210 and one or more mass storage devices 1212, capable of communicating with each other, such as through a system bus 1214 or other suitable connection.
  • the processor 1202 may be a single processing unit or multiple processing units, and all processing units may include single or multiple computing units or multiple cores.
  • Processor 1202 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuits, and/or any device that manipulates signals based on operational instructions.
  • processor 1202 may be configured to retrieve and execute computer-readable instructions stored in memory 1204, mass storage device 1212, or other computer-readable media, such as program code of the operating system 1216, program code of the application programs 1218, program code of other programs 1220, etc.
  • Memory 1204 and mass storage device 1212 are examples of computer-readable storage media for storing instructions for execution by processor 1202 to implement the various functions described above.
  • memory 1204 may generally include both volatile and non-volatile memory (e.g., RAM, ROM, etc.).
  • mass storage devices 1212 may generally include hard drives, solid state drives, removable media including external and removable drives, memory cards, flash memory, floppy disks, optical disks (e.g., CD, DVD), storage arrays, network attached storage, storage area networks, and so on.
  • Both the memory 1204 and the mass storage device 1212 may be collectively referred to herein as a memory or a computer-readable storage medium, and may be a non-transitory medium capable of storing computer-readable, processor-executable program instructions as computer program code. The computer program code may be executed by the processor 1202 as a specific machine configured to implement the operations and functions described in the examples herein.
  • Programs may be stored on mass storage device 1212. These programs include operating system 1216, one or more application programs 1218, other programs 1220, and program data 1222, and they may be loaded into memory 1204 for execution. Examples of such application programs or program modules may include, for example, computer program logic (e.g., computer program code or instructions) for implementing the following components/functions: the acquisition module 1110, the first similarity clustering module 1120, the determination module 1130, the second similarity clustering module 1140, the cluster optimization module 1150, the method 100 (including any suitable steps of method 100), and/or additional embodiments described herein.
  • modules 1216, 1218, 1220, and 1222 may be implemented using any form of computer-readable media that is accessible by electronic device 1200.
  • “computer-readable media” includes at least two types of computer-readable media, namely, computer-readable storage media and communication media.
  • Computer-readable storage media include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storage of information, such as computer-readable instructions, data structures, program modules or other data.
  • Computer-readable storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory, or other memory technology, CD-ROM, digital versatile disk (DVD), or other optical storage device, magnetic cartridge, tape, magnetic disk storage device, or other magnetic storage device, or any other non-transmission medium that can be used to store information for access by an electronic device.
  • communication media may embody computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism.
  • Computer-readable storage media as defined herein do not include communication media.
  • One or more communication interfaces 1206 are used to exchange data with other devices, such as over a network, direct connection, and the like.
  • Such communication interfaces may be one or more of the following: any type of network interface (e.g., a network interface card (NIC)), a wired or wireless interface (such as an IEEE 802.11 wireless LAN (WLAN) interface), a Worldwide Interoperability for Microwave Access (Wi-MAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth™ interface, a Near Field Communication (NFC) interface, etc.
  • the communication interface 1206 can facilitate communication within a variety of networks and protocol types, including wired networks (e.g., LAN, cable, etc.) and wireless networks (e.g., WLAN, cellular, satellite, etc.), the Internet, and the like. Communication interface 1206 may also provide for communication with external storage devices (not shown), such as in storage arrays, network attached storage, storage area networks, and the like.
  • a display device 1208, such as a monitor, may be included for displaying information and images to a user.
  • Other I/O devices 1210 may be devices that receive various inputs from the user and provide various outputs to the user, and may include touch input devices, gesture input devices, cameras, keyboards, remote controls, mice, printers, audio input/output devices, etc.
  • a cloud includes and/or represents a platform for resources.
  • the platform abstracts the underlying functionality of the cloud's hardware (e.g., servers) and software resources.
  • Resources may include applications and/or data that may be used when computing processing is performed on a server remote from the electronic device 1200 .
  • Resources may also include services provided over the Internet and/or over a subscriber network, such as a cellular or Wi-Fi network.
  • the platform can abstract resources and functions to connect the electronic device 1200 with other electronic devices. Accordingly, implementation of the functionality described herein may be distributed throughout the cloud. For example, the functions may be implemented partly on the electronic device 1200 and partly through a platform that abstracts the functions of the cloud.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biotechnology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)

Abstract

The present invention relates to a single-cell sequencing method. The method comprises: acquiring, from a sequencing library, a plurality of nucleotide sequences and nanopore sequencing signals corresponding to the plurality of nucleotide sequences; performing, based on a first similarity threshold, first clustering on the plurality of nucleotide sequences to obtain a first plurality of clusters, the first plurality of clusters including a largest cluster having the maximum cluster size; determining a merge threshold based on the average signal length of the nanopore sequencing signals corresponding to the plurality of nucleotide sequences and the nanopore sequencing signal corresponding to each nucleotide sequence in the largest cluster; performing, based on a second similarity threshold, first clustering on the plurality of nucleotide sequences to obtain a second plurality of clusters, the first similarity threshold being greater than the second similarity threshold; and performing, based on the merge threshold, cluster optimization on the second plurality of clusters to obtain a third plurality of clusters.
PCT/CN2021/116704 2021-09-06 2021-09-06 Méthode et appareil de séquençage de cellule unique, dispositif, support et produit-programme Ceased WO2023029044A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2021/116704 WO2023029044A1 (fr) 2021-09-06 2021-09-06 Méthode et appareil de séquençage de cellule unique, dispositif, support et produit-programme
CN202111481203.3A CN114171117B (zh) 2021-09-06 2021-12-06 用于单细胞测序的方法、装置、设备、介质和程序产品

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/116704 WO2023029044A1 (fr) 2021-09-06 2021-09-06 Méthode et appareil de séquençage de cellule unique, dispositif, support et produit-programme

Publications (1)

Publication Number Publication Date
WO2023029044A1 true WO2023029044A1 (fr) 2023-03-09

Family

ID=80483518

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/116704 Ceased WO2023029044A1 (fr) 2021-09-06 2021-09-06 Méthode et appareil de séquençage de cellule unique, dispositif, support et produit-programme

Country Status (2)

Country Link
CN (1) CN114171117B (fr)
WO (1) WO2023029044A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117594130A (zh) * 2024-01-19 2024-02-23 北京普译生物科技有限公司 纳米孔测序信号评价方法、装置、电子设备和存储介质

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115496114B (zh) * 2022-11-18 2023-04-07 成都戎星科技有限公司 一种基于k-均值聚类的tdma突发长度估计方法

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104321441A (zh) * 2012-02-16 2015-01-28 牛津楠路珀尔科技有限公司 聚合物的测量的分析
CN109415765A (zh) * 2016-04-14 2019-03-01 昆塔波尔公司 用纳米孔的聚合物分析中的混合光学信号
US20200035325A1 (en) * 2018-07-24 2020-01-30 King Abdullah University Of Science And Technology Continuous wavelet-based dynamic time warping method and system
WO2020084404A1 (fr) * 2018-10-25 2020-04-30 King Abdullah University Of Science And Technology Système et procédé de recherche et de mappage de sous-séquence directe dans un signal brut de nanopore
CN111292806A (zh) * 2020-03-27 2020-06-16 武汉古奥基因科技有限公司 一种利用纳米孔测序的转录组分析方法
US20210139977A1 (en) * 2019-11-07 2021-05-13 Hong Kong Baptist University Method for identifying RNA isoforms in transcriptome using Nanopore RNA reads

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012149171A1 (fr) * 2011-04-27 2012-11-01 The Regents Of The University Of California Conception de sondes cadenas pour effectuer un séquençage génomique ciblé
GB201319779D0 (en) * 2013-11-08 2013-12-25 Cartagenia N V Genetic analysis method
WO2017209891A1 (fr) * 2016-05-31 2017-12-07 Quantapore, Inc. Séquençage bicolore par les nanopores
CN110111843B (zh) * 2018-01-05 2021-07-06 深圳华大基因科技服务有限公司 对核酸序列进行聚类的方法、设备及存储介质
CN110232951B (zh) * 2018-12-06 2023-08-01 苏州金唯智生物科技有限公司 判断测序数据饱和的方法、计算机可读介质和应用
CN110600078B (zh) * 2019-08-23 2022-03-18 北京百迈客生物科技有限公司 一种基于纳米孔测序检测基因组结构变异的方法
US11495324B2 (en) * 2019-10-01 2022-11-08 Microsoft Technology Licensing, Llc Flexible decoding in DNA data storage based on redundancy codes
CN112750502B (zh) * 2021-01-18 2022-04-15 中南大学 二维分布结构判定的单细胞转录组测序数据聚类推荐方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HAN RENMIN, LI YU, GAO XIN, WANG SHENG: "An accurate and rapid continuous wavelet dynamic time warping algorithm for end-to-end mapping in ultra-long nanopore sequencing", BIOINFORMATICS, OXFORD UNIVERSITY PRESS , SURREY, GB, vol. 34, no. 17, 1 September 2018 (2018-09-01), GB , pages i722 - i731, XP093041541, ISSN: 1367-4803, DOI: 10.1093/bioinformatics/bty555 *

Also Published As

Publication number Publication date
CN114171117A (zh) 2022-03-11
CN114171117B (zh) 2022-07-15

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21955575

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 20.06.2024)

122 Ep: pct application non-entry in european phase

Ref document number: 21955575

Country of ref document: EP

Kind code of ref document: A1