
CN114121158A - Deep network self-adaption based scRNA-seq cell type identification method - Google Patents


Info

Publication number
CN114121158A
Authority
CN
China
Prior art keywords
data set
data
scrna
network
seq
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111471768.3A
Other languages
Chinese (zh)
Inventor
王树林
刘孟林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University
Priority to CN202111471768.3A
Publication of CN114121158A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids

Landscapes

  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biophysics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to data mining in bioinformatics, in particular to mining scRNA-seq data, and specifically to a scRNA-seq cell type identification method based on deep network self-adaptation. The method comprises: preprocessing the scRNA-seq data; constructing a neural network and training it on the scRNA-seq data; adding an adaptation layer to optimize the network architecture and overcome the differences between data sets from different batches; and accurately identifying the cell types in scRNA-seq data sets whose type information is unknown. The method can identify the cell types of an unknown scRNA-seq data set and effectively overcomes the technical differences and batch effects between a data set with known type information and a data set with unknown type information.

Description

Deep network self-adaption based scRNA-seq cell type identification method
Technical Field
The invention relates to data mining in bioinformatics, in particular to the mining of scRNA-seq data, and particularly relates to a deep network self-adaptive scRNA-seq cell type identification method.
Background
The cell is regarded as the basic structural and functional unit of an organism. Human cells contain about 20,000 genes, and each cell has its own specific gene expression pattern in which only some of the genes are expressed. This gives rise to cell-specific protein components and biological functions. scRNA-seq takes the single cell as its unit and applies high-throughput sequencing after whole-genome or transcriptome amplification, which reveals the gene structure and gene expression state of individual cells and reflects the heterogeneity among cells. scRNA-seq technology has developed rapidly over the last decade: the scale of sequencing data has grown from tens of cells to thousands or even millions, and a number of new sequencing platforms have emerged, such as 10x Genomics Chromium, inDrop and Drop-seq. Cell type identification plays an important role in the analysis of scRNA-seq data; well-annotated scRNA-seq data enable biologists to carry out further downstream analyses and improve our understanding of the cellular mechanisms of disease.
Current bioinformatics methods for identifying cell types in scRNA-seq data fall into three main categories. The first clusters the cell population, finds the specific marker genes of each cluster through differential expression analysis, and finally annotates the cells according to the functions of those genes. Such methods generally generalize poorly, and as the scale of sequencing data grows, annotating cells by searching for marker genes becomes increasingly burdensome and time-consuming. The second uses the information in a well-annotated reference data set to assist cell type identification in new data. A representative approach projects the cells of the target data set into a space defined by highly informative genes selected from a well-annotated source data set, and then assigns identities to the target cells according to their correlation with the average cell-type-specific gene expression in the source data. However, such methods use only the cell type information in the reference data and ignore the useful information in the target data. The third relieves the burden of type identification on large-scale scRNA-seq data with deep neural networks: a nonlinear autoencoder embeds the sequencing data into a low-dimensional space for subsequent clustering and classification. These methods likewise do not account for the performance degradation caused by technical variation and batch effects; in particular, when the target and reference data come from different sequencing platforms, the accuracy of cell classification can drop sharply.
In summary, existing methods do not fully account for the differences between data sets from different sequencing platforms, tissues and species, and they rarely make full use of both the gene expression information and the data distribution information of the well-annotated reference data set and the unknown data set. Designing a robust method that accurately identifies unknown scRNA-seq cell types therefore remains a challenge.
Disclosure of Invention
In view of the problems of the existing methods and the importance of accurately identifying scRNA-seq cell types, the invention provides a scRNA-seq cell type identification method based on deep network self-adaptation. The method uses deep network self-adaptation to extract gene expression information and to align the data distributions of a well-annotated reference data set and an unknown target data set, and thereby identifies cell types across scRNA-seq data sets from different batches. The method comprises the following steps:
1. Data collection phase
The method collects data sets covering several scenarios from multiple data platforms. The first category is a universal reference data set generated by two sequencing methods, 10x and CelSeq2. The second category is a human pancreas tissue data set generated by five sequencing methods: CelSeq, CelSeq2, SmartSeq2, Fluidigm C1 and inDrop. The third category covers different tissues within the same species: the mouse aging cell atlas (Tabula Muris Senis) data set downloaded from Figshare, which contains expression information for 23341 genes in 96307 cells from 22 tissues. Together, these data sets make it possible to evaluate how accurately the method identifies cell types across tissues and species.
2. Data preprocessing stage
The different scRNA-seq data sets are randomly divided into a source domain and a target domain; the type information of the source domain is known, and that of the target domain is unknown. The object processed is the gene expression matrix of the scRNA-seq data, in which rows are cells and columns are genes, with an additional column holding the cell type information. Preprocessing comprises three steps: quality control, data normalization and cell type conversion. Quality control checks the original data set for outliers and removes them according to a set threshold. Data normalization filters out low-quality cells with fewer than 5000 reads and 500 genes, as well as genes expressed in fewer than 10 cells; each cell is then normalized to 10000 read counts using SCANPY, and finally the data set is log-transformed and normalized. Cell type conversion maps the cell type information of the data set to integers for subsequent cell classification.
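The filtering and normalization step can be sketched in plain Python, used here instead of SCANPY only so that the example is self-contained. The function name `preprocess`, its parameter names, and the reading that a cell is kept only if it passes both the read threshold and the gene threshold are assumptions made for illustration; the text does not fix these details.

```python
import math

def preprocess(matrix, min_reads=5000, min_genes=500, min_cells=10, target_sum=10000.0):
    """Filter and normalize a raw count matrix.

    matrix: list of rows (cells), each a list of raw counts per gene.
    Returns the processed rows and the indices of the genes that were kept.
    """
    # Keep cells that reach both thresholds (assumed reading of the text).
    cells = [row for row in matrix
             if sum(row) >= min_reads
             and sum(1 for c in row if c > 0) >= min_genes]
    # Keep genes expressed in at least min_cells of the remaining cells.
    n_genes = len(matrix[0]) if matrix else 0
    kept = [g for g in range(n_genes)
            if sum(1 for row in cells if row[g] > 0) >= min_cells]
    processed = []
    for row in cells:
        sub = [row[g] for g in kept]
        total = sum(sub)
        scale = target_sum / total if total > 0 else 0.0
        # Scale each cell to target_sum counts, then apply log(1 + x).
        processed.append([math.log1p(c * scale) for c in sub])
    return processed, kept
```

With the thresholds lowered for a toy matrix, `preprocess([[5, 0, 5], [1, 0, 0], [10, 0, 10]], min_reads=5, min_genes=2, min_cells=2)` drops the second cell and the second gene.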
3. Neural network architecture construction stage
The neural network used in the method consists of an input layer and two fully connected layers. The number of neurons in the input layer equals the number of genes remaining after data preprocessing; the first fully connected layer has 1000 neurons and the second has 100. The activations of the neurons in each fully connected layer are normalized with Layer Normalization, defined as:
$$\mu = \frac{1}{H}\sum_{i=1}^{H} a_i, \qquad \sigma = \sqrt{\frac{1}{H}\sum_{i=1}^{H}\left(a_i - \mu\right)^2}, \qquad \hat{a}_i = \frac{a_i - \mu}{\sigma}$$
where a_i is the input to the i-th neuron of the layer and H is the number of neurons in the layer.
The nonlinear activation function in the fully connected layers is SELU, defined as:
SELU(x)=scale*(max(0,x)+min(0,α*(exp(x)-1)))
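As a minimal sketch, the two operations can be written in plain Python. The constants `SELU_ALPHA` and `SELU_SCALE` are the standard SELU values and are not stated in the text; the small `eps` term and the omission of learnable gain and bias parameters in `layer_norm` are simplifying assumptions.

```python
import math

# Standard SELU constants (assumed; the text only names "alpha" and "scale").
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x):
    """SELU(x) = scale * (max(0, x) + min(0, alpha * (exp(x) - 1)))."""
    # exp is evaluated at min(x, 0) to avoid overflow for large positive x;
    # for x > 0 the negative branch contributes 0 either way.
    negative = SELU_ALPHA * (math.exp(min(x, 0.0)) - 1.0)
    return SELU_SCALE * (max(0.0, x) + min(0.0, negative))

def layer_norm(a, eps=1e-5):
    """Normalize one cell's activations to zero mean and unit variance."""
    mu = sum(a) / len(a)
    var = sum((v - mu) ** 2 for v in a) / len(a)
    return [(v - mu) / math.sqrt(var + eps) for v in a]
```

For positive inputs SELU is a scaled identity, and for negative inputs it saturates towards `-SELU_SCALE * SELU_ALPHA`.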
In the pre-training stage, a mirror image of the neural network serves as the decoder, so that the two together form an autoencoder used to pre-train on the target domain, with mean squared error (MSE) as the reconstruction loss. In the formal training stage, the source domain and the target domain both use this neural network as the basic network structure. The source domain network additionally contains a classification layer whose number of neurons equals the number of cell types, and cross-entropy is used as the classification loss of the source domain network, defined as:
$$J(y, y') = -\sum_{j} y[j]\,\log y'[j]$$
where y is the true type label of the cell: y[j] = 1 if the cell belongs to the j-th cell type, and all other entries of y are 0. y' is the output type label, and y'[j] is the posterior probability that the cell belongs to the j-th cell type.
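A small worked sketch of this loss in plain Python follows. The `softmax` helper, which turns raw classification-layer outputs into posterior probabilities, and the `eps` guard against log(0) are assumptions added for the example.

```python
import math

def softmax(logits):
    """Convert raw classification-layer outputs into posterior probabilities."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y_true, y_pred, eps=1e-12):
    """J(y, y') = -sum_j y[j] * log(y'[j]) for a one-hot y."""
    return -sum(t * math.log(p + eps) for t, p in zip(y_true, y_pred))
```

Because y is one-hot, only the predicted probability of the true cell type contributes: for a cell of type 1 with predicted probabilities (0.2, 0.7, 0.1), the loss is -log(0.7).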
4. Neural network architecture optimization stage
A domain adaptation layer is added after the second fully connected layer of both the source domain and target domain networks. The adaptation layer brings the data distributions of the source domain and the target domain closer together, which reduces the influence of batch effects on the final classification result. The adaptation metric is multi-kernel MMD (MK-MMD), whose square is defined as:
$$d_k^2(p, q) = \left\| \mathbf{E}_p\left[\phi(X_s)\right] - \mathbf{E}_q\left[\phi(X_t)\right] \right\|_{H_k}^2$$
where p and q are the probability distributions of the source domain and the target domain, respectively, H_k is the reproducing kernel Hilbert space (RKHS) endowed with a characteristic kernel k, and d_k(p, q) is the RKHS distance between the mean embeddings of p and q. Its most important property is that p = q if and only if
$$d_k^2(p, q) = 0.$$
The characteristic kernel associated with the feature map φ is defined as:
$$k(X_s, X_t) = \left\langle \phi(X_s), \phi(X_t) \right\rangle$$
The multi-kernel k is a convex combination of m positive semi-definite (PSD) kernels {k_u}:
$$K = \left\{ k = \sum_{u=1}^{m} \beta_u k_u \;:\; \sum_{u=1}^{m} \beta_u = 1,\ \beta_u \ge 0,\ \forall u \right\}$$
where the constraints on the coefficients {β_u} ensure that the derived multi-kernel k is characteristic. Because MK-MMD is a weighted combination of several different kernels, its representational power is stronger than that of an MMD with a single kernel.
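To make the definition concrete, the sketch below estimates the squared MK-MMD between two small batches in plain Python. The text does not specify the kernel family or the estimator, so the Gaussian kernels, the bandwidth values, the fixed equal weights β_u (which the method learns rather than fixes) and the simple biased estimator are all assumptions for illustration.

```python
import math

def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def multi_kernel(x, y, bandwidths, betas):
    """Convex combination sum_u beta_u * k_u(x, y)."""
    return sum(b * gaussian_kernel(x, y, bw) for b, bw in zip(betas, bandwidths))

def mk_mmd2(source, target, bandwidths=(0.5, 1.0, 2.0), betas=None):
    """Biased estimate of the squared MK-MMD between two batches."""
    if betas is None:
        # Equal weights: non-negative and summing to 1, as the constraint requires.
        betas = [1.0 / len(bandwidths)] * len(bandwidths)

    def mean_kernel(batch_a, batch_b):
        total = sum(multi_kernel(a, b, bandwidths, betas)
                    for a in batch_a for b in batch_b)
        return total / (len(batch_a) * len(batch_b))

    return (mean_kernel(source, source) + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))
```

The estimate is zero when the two batches are identical and grows as the hidden representations of the source and target batches move apart, which is what makes it usable as an alignment penalty.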
The optimization objective of the final network model consists of two parts: the classification loss and the adaptation loss. The objective is achieved by minimizing the classification loss and the MK-MMD together, and the overall loss function is defined as:
$$\min_{\Theta}\ \frac{1}{n_a}\sum_{i=1}^{n_a} J\left(\theta(x_i^a),\, y_i^a\right) + \lambda \sum_{l=l_1}^{l_2} d_k^2\left(D_s^l,\, D_t^l\right)$$
where Θ denotes all weight and bias parameters of the network, which are the quantities to be learned; λ > 0 is a penalty parameter; l_1 to l_2 is the range of layers to be adapted; D_s^l and D_t^l are the l-th layer hidden representations of the source domain and the target domain, respectively; x_i^a ranges over the set of all data carrying type information in the source domain and the target domain, n_a is the size of that set, θ(x_i^a) is the network output for x_i^a, and y_i^a is its type label; J(·) is the classification loss function. The key task in the learning phase is to learn the network parameters Θ and the MK-MMD kernel coefficients β.
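How the two terms combine can be shown with a short arithmetic sketch. The function name `total_loss` and its arguments are hypothetical; the inputs are assumed to be per-cell classification losses for one mini-batch and one squared MK-MMD value per adapted layer.

```python
def total_loss(classification_losses, layer_mmd2, lam):
    """Mean classification loss plus lam times the summed per-layer MK-MMD^2."""
    mean_cls = sum(classification_losses) / len(classification_losses)
    return mean_cls + lam * sum(layer_mmd2)

# Two labeled cells with losses 0.2 and 0.4, one adapted layer with
# squared MK-MMD 0.01, and penalty 10: 0.3 + 10 * 0.01 = 0.4.
example = total_loss([0.2, 0.4], [0.01], lam=10.0)
```

A larger λ pushes the optimizer to align the two distributions more aggressively, at the cost of weighting the classification term less.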
5. Accurate identification of cell types for unknown scRNA-seq datasets
The parameters of the source domain network and the target domain network are updated and iteratively optimized with mini-batch stochastic gradient descent (SGD). The source data set and the target data set are split into mini-batches by the PyTorch DataLoader and fed to the networks for training and optimization, and the finally trained target domain network is used as the classifier to identify the types in the target data set whose type information is unknown.
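The mini-batch pairing can be illustrated with a plain-Python stand-in for the PyTorch DataLoader. The helper names `mini_batches` and `paired_batches`, the fixed seeds, and the choice to cycle through the target batches when the source set is larger are assumptions; the text does not say how batches from the two domains are paired.

```python
import random

def mini_batches(n_items, batch_size, seed=0):
    """Yield shuffled index lists that together cover range(n_items) once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    for start in range(0, n_items, batch_size):
        yield indices[start:start + batch_size]

def paired_batches(n_source, n_target, batch_size=32, seed=0):
    """Pair every source mini-batch with a target mini-batch.

    Target batches are reused cyclically if the source set has more batches
    (an assumed policy for this sketch).
    """
    target = list(mini_batches(n_target, batch_size, seed + 1))
    for i, source_batch in enumerate(mini_batches(n_source, batch_size, seed)):
        yield source_batch, target[i % len(target)]
```

Each pair would feed one SGD step: the source batch contributes to the classification loss, and both batches contribute to the adaptation loss.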
Drawings
FIG. 1: Framework of the deep network self-adaptation model
FIG. 2: Flow chart of pre-training on the target data
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to experiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The hardware environment is a PC host with an Intel(R) Core(TM) i5-6400 CPU at 2.70 GHz, 16 GB of RAM and a 64-bit operating system. The software is implemented in Python 3.7.0 on Windows 10, using PyCharm 2021.1.3 as the development environment.
1. Data collection and arrangement stage
The data used in the method fall into three categories. The first is a reference data set generated by two sequencing methods, 10x and CelSeq2. The second is a human pancreas tissue data set generated by five sequencing methods: CelSeq, CelSeq2, SmartSeq2, Fluidigm C1 and inDrop. The third is the mouse aging cell atlas (Tabula Muris Senis) data set downloaded from Figshare, which contains expression information for 23341 genes in 96307 cells from 22 tissues, with complete sequencing data. These three categories allow the practicality of the method to be evaluated thoroughly. All data objects are stored as AnnData objects. The main components of the initial scRNA-seq data are shown in Table 1.
Table 1: Major components of AnnData
[Table 1: image not available]
In the method, the matrix data is a cells × genes matrix; the observation data include the cell type information, the batch information of the sequencing data, and so on.
2. Data preprocessing stage
The different scRNA-seq data sets are randomly divided into a source data set and a target data set; the type information of the source data set is known, and that of the target data set is unknown. The method first determines the genes detected in both the source data set and the target data set, and then merges the two data sets into a single matrix over these common genes. The initial scRNA-seq data are then preprocessed in three steps: quality control, data normalization and cell type conversion. Quality control mainly deletes cells that carry no cell type information, i.e. cells whose type is recorded as 'nan' or 'NA'. Data normalization is implemented with the SCANPY package: low-quality cells with fewer than 5000 reads and 500 genes, and genes expressed in fewer than 10 cells, are filtered out; each cell is then normalized to 10000 read counts; finally, log transformation and normalization of the data set can be applied optionally, depending on the data. In the experiments, log transformation and normalization were applied to all data sets. Cell type conversion maps the cell type information from strings to integers, which simplifies cell classification and the evaluation of the identification accuracy of the method. After preprocessing, the data are split back into the source data set and the target data set. Example data for the source domain and the target domain are shown in Table 2.
Table 2: Data information of the source domain and the target domain
[Table 2: image not available]
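The gene alignment, label-based quality control and cell type conversion described in this stage can be sketched as three small helpers. The function names and the alphabetical ordering of the integer codes are assumptions for illustration; the text only states that strings are mapped to numbers.

```python
def common_genes(source_genes, target_genes):
    """Genes detected in both data sets, in source order."""
    target_set = set(target_genes)
    return [g for g in source_genes if g in target_set]

def drop_unlabeled(cells, labels, missing=("nan", "NA")):
    """Remove cells whose type information is recorded as 'nan' or 'NA'."""
    keep = [i for i, label in enumerate(labels) if label not in missing]
    return [cells[i] for i in keep], [labels[i] for i in keep]

def encode_labels(labels):
    """Map cell type names to integer codes (sorted order is an assumption)."""
    mapping = {name: code for code, name in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping
```

Keeping the returned `mapping` makes it possible to translate predicted integer classes back to cell type names when reporting accuracy.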
3. Neural network architecture construction stage
The neural network used in the method consists of an input layer and two fully connected layers. The number of neurons in the input layer equals the number of genes remaining after data preprocessing; the first fully connected layer has 1000 neurons and the second has 100. Each fully connected layer performs four steps: (1) apply a linear transformation to the input data; (2) normalize the neuron activations with Layer Normalization; (3) apply the nonlinear activation function SELU to the activations; (4) apply dropout for regularization.
In the pre-training stage, a mirror image of the neural network serves as the decoder, so that the two together form an autoencoder used to pre-train on the target domain, with mean squared error (MSE) as the reconstruction loss. The default setting is pretrain_epochs = 10. Whether to enable the pre-training step can be decided according to the data set and the training results.
In the formal training stage, the source domain and the target domain both use this neural network as the basic network structure; the source domain network additionally contains a classification layer whose number of neurons equals the number of cell types. The main parameter settings of the neural network are as follows. The initial learning rate is 0.001 and decays exponentially in steps with a decay step of 20, meaning that after every 20 epochs the learning rate is multiplied by 0.95. The neural network is trained for 50 epochs with a mini-batch size of 32, which is the number of cells used in each training iteration.
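The learning rate schedule can be written as a one-line function. This is a sketch under the assumption that the 0.95 factor compounds at every 20-epoch boundary, which is one reading of "step exponential" decay; the function and parameter names are hypothetical.

```python
def learning_rate(epoch, base_lr=0.001, gamma=0.95, step=20):
    """Step-wise exponential decay: multiply by gamma once every `step` epochs."""
    return base_lr * gamma ** (epoch // step)

# Learning rate used in each of the 50 training epochs (0-based epoch index).
schedule = [learning_rate(epoch) for epoch in range(50)]
```

Under this reading, epochs 0 to 19 use 0.001, epochs 20 to 39 use 0.00095, and epochs 40 to 49 use 0.0009025.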
4. Optimization and training phase
A domain adaptation layer is added after the second fully connected layer of both the source domain and target domain networks, and the adaptation loss between the two domains is measured with multi-kernel MMD (MK-MMD). During training, the MMD distance between the source domain and the target domain is computed at the adaptation layer: both domains are mapped into a reproducing kernel Hilbert space (RKHS) with a characteristic kernel k, and the distance between their data distributions is computed in that high-dimensional space. The method sets n_iter_per_epoch = 40 for training the adaptation layer, meaning that 40 MMD training iterations are run in each global training step.
The optimization objective of the final network model consists of two parts: the classification loss and the adaptation loss. The objective is achieved by minimizing the classification loss and the MK-MMD together. The penalty parameter of the adaptation loss term in the total loss function is 10 by default. The parameters of the source domain network and the target domain network are updated and iteratively optimized with mini-batch stochastic gradient descent (SGD). The source data set and the target data set are split into mini-batches by the PyTorch DataLoader and fed to the networks for training and optimization, and the finally trained source domain network is used as the classifier to identify the types in the target data set whose type information is unknown.
5. Result analysis verification
In the reference data set, the scRNA-seq data generated by 10x and by CelSeq2 were used as the source data set and the target data set, respectively, and the accuracy was measured; the two were then swapped and the accuracy was measured again. The results are shown in Table 3.
Table 3: Accuracy of cell type identification in the reference data set
[Table 3: image not available]
For the human pancreatic tissue data sets generated by different sequencing platforms, CelSeq and CelSeq2 were used as the source data set and the target data set, respectively, and the accuracy was measured; the two were then swapped and the accuracy was measured again. The experimental results are shown in Table 4.
Table 4: Accuracy of cell type identification in the human pancreatic tissue data set
[Table 4: image not available]
As Tables 3 and 4 show, although the source and target data sets were generated by different sequencing platforms, the accuracy of type identification on the unlabeled target data set is high in both the reference data set and the human pancreatic tissue data set: close to 100% for the former and 92% for the latter. The method therefore largely overcomes the differences between batches in the reference data set and in the human pancreatic tissue data, and accurately identifies the types of unknown data using existing data.
The mouse aging cell atlas (Tabula Muris Senis) data set comprises 22 tissues, of which 4 tissues with rich cell types (Heart, Limb_Muscle, Brain_Non-Myeloid, Liver) were selected for the experiment. The data sequenced with 10x Genomics were used as the source data set and the data sequenced with SmartSeq2 as the target data set, and the accuracy was measured. The experimental results are shown in Table 5.
Table 5: Accuracy of cell type identification in the mouse aging cell atlas data set
[Table 5: image not available]
As Table 5 shows, on the mouse aging cell atlas (Tabula Muris Senis), which has a large data volume and rich cell types, the method still identifies the types of the unlabeled target data set with high accuracy across several different tissues. This further confirms the reliability of the method for identifying scRNA-seq cell types in the presence of batch effects.

Claims (6)

1. A scRNA-seq cell type identification method based on deep network self-adaptation, characterized by comprising the following steps:
(1) collecting data, including a universal reference data set, human pancreas tissue data sets generated by different sequencing methods, and data sets of different tissues within the same species;
(2) preprocessing the scRNA-seq data, wherein different scRNA-seq data sets are randomly divided into a source data set and a target data set, the type information of the source data set is known, the type information of the target data set is unknown, and the preprocessing comprises three steps: quality control, data normalization and cell type conversion;
(3) building a neural network architecture, wherein the neural network parameters are first initialized by pre-training an autoencoder on the target domain, and the same neural network is then used as the basic network structure of the source domain and the target domain;
(4) optimizing the architecture, wherein a domain adaptation layer is added to the network structures of the source domain and the target domain, the adaptation layer bringing the data distributions of the source domain and the target domain closer together and reducing the influence of batch effects on the final classification result;
(5) accurately identifying the cell types of an unknown scRNA-seq data set, wherein the parameters of the source domain network and the target domain network are updated and iteratively optimized with mini-batch stochastic gradient descent (mini-batch SGD), so that the final model is able to accurately identify the types in the target data set whose type information is unknown.
2. The deep network adaptation-based scRNA-seq cell type identification method according to claim 1, characterized in that, in the data collection stage:
(1) the reference data set is generated by two sequencing methods, 10x and CelSeq2;
(2) the human pancreatic tissue data set is generated by five sequencing methods: CelSeq, CelSeq2, SmartSeq2, Fluidigm C1 and inDrop;
(3) the mouse aging cell atlas (Tabula Muris Senis) data set is downloaded from Figshare and contains expression information for 23341 genes in 96307 cells from 22 tissues.
3. The deep network adaptation-based scRNA-seq cell type identification method according to claim 1, characterized in that, in the data preprocessing stage:
(1) the original data set is checked for outliers, which are removed according to a set threshold;
(2) low-quality cells with fewer than 5000 reads and 500 genes, and genes expressed in fewer than 10 cells, are filtered out, each cell is normalized to 10000 read counts using SCANPY, and finally the data set is log-transformed and normalized;
(3) the cell type information of the data set is converted to integers to facilitate subsequent cell classification.
4. The deep network adaptation-based scRNA-seq cell type identification method according to claim 1, characterized in that, in the neural network architecture construction stage:
(1) the neural network consists of an input layer and two fully connected layers, the number of neurons in the input layer equals the number of genes remaining after data preprocessing, the first fully connected layer has 1000 neurons, and the second has 100 neurons;
(2) in the pre-training stage, a mirror image of the neural network serves as the decoder, so that the two together form an autoencoder used to pre-train on the target domain, with mean squared error (MSE) as the reconstruction loss function of the autoencoder;
(3) in the formal training stage, the source domain and the target domain both use the neural network as the basic network structure, the source domain network additionally contains a classification layer whose number of neurons equals the number of cell types, and cross-entropy is used as the classification loss function of the source domain network.
5. The deep network adaptation-based scRNA-seq cell type identification method according to claim 1, characterized in that, in the network architecture optimization stage:
(1) an adaptation layer is added after the second fully connected layer of the source domain network and the target domain network;
(2) the adaptation metric is multi-kernel MMD (MK-MMD), which measures the distance between the data distributions of the source domain and the target domain in a reproducing kernel Hilbert space (RKHS), the square of the MK-MMD being defined as:
$$d_k^2(p, q) = \left\| \mathbf{E}_p\left[\phi(X_s)\right] - \mathbf{E}_q\left[\phi(X_t)\right] \right\|_{H_k}^2$$
the characteristic kernel associated with the feature map φ is defined as:
$$k(X_s, X_t) = \left\langle \phi(X_s), \phi(X_t) \right\rangle$$
and the multi-kernel k is a convex combination of m PSD kernels {k_u}:
$$K = \left\{ k = \sum_{u=1}^{m} \beta_u k_u \;:\; \sum_{u=1}^{m} \beta_u = 1,\ \beta_u \ge 0,\ \forall u \right\}$$
(3) the data distributions of the source domain and the target domain are aligned by minimizing the MK-MMD.
6. The deep network adaptation-based scRNA-seq cell type identification method according to claim 1, characterized in that the cell types of an unknown scRNA-seq data set can be accurately identified: the source data set and the target data set are split into mini-batches by the PyTorch DataLoader and fed to the networks for training and optimization; the optimization objective consists of two parts, the classification loss and the adaptation loss; and finally the trained target domain network is used as the classifier to accurately identify the types in the target data set.
CN202111471768.3A 2021-12-01 2021-12-01 Deep network self-adaption based scRNA-seq cell type identification method Pending CN114121158A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111471768.3A CN114121158A (en) 2021-12-01 2021-12-01 Deep network self-adaption based scRNA-seq cell type identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111471768.3A CN114121158A (en) 2021-12-01 2021-12-01 Deep network self-adaption based scRNA-seq cell type identification method

Publications (1)

Publication Number Publication Date
CN114121158A true CN114121158A (en) 2022-03-01

Family

ID=80366890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111471768.3A Pending CN114121158A (en) 2021-12-01 2021-12-01 Deep network self-adaption based scRNA-seq cell type identification method

Country Status (1)

Country Link
CN (1) CN114121158A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107451424A (en) * 2017-07-31 2017-12-08 浙江绍兴千寻生物科技有限公司 In high volume unicellular RNA seq data quality controls and analysis method
WO2021127436A2 (en) * 2019-12-19 2021-06-24 Illumina, Inc. High-throughput single-cell libraries and methods of making and of using

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
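彭绍亮; 白亮; 王力; 程敏霞; 王树林: "Trusted edge computing for smart healthcare", Telecommunications Science (电信科学), 8 June 2020 (2020-06-08) *
李贱成; 徐克前: "Single-cell transcriptome sequencing technology and its applications", Chemistry of Life (生命的化学), no. 08, 15 August 2020 (2020-08-15) *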

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114974433A (en) * 2022-05-26 2022-08-30 Xiamen University A fast annotation method for circulating tumor cells based on deep transfer learning
CN116452910A (en) * 2023-03-28 2023-07-18 Henan University of Science and Technology scRNA-seq data characteristic representation and cell type identification method based on graph neural network
CN116452910B (en) * 2023-03-28 2023-11-28 Henan University of Science and Technology scRNA-seq data characteristic representation and cell type identification method based on graph neural network

Similar Documents

Publication Publication Date Title
CN114819056B (en) A single-cell data integration method based on domain adversarial and variational inference
CN114091603A (en) A spatial transcriptome cell clustering and analysis method
CN116959585B (en) Deep learning-based whole genome prediction method
CN111564183A (en) A Dimensionality Reduction Method for Single-Cell Sequencing Data Fusion Gene Ontology and Neural Network
Rasheed et al. Metagenomic taxonomic classification using extreme learning machines
CN115881232A (en) ScRNA-seq cell type annotation method based on graph neural network and feature fusion
CN114121158A (en) Deep network self-adaption based scRNA-seq cell type identification method
CN112967755A (en) Cell type identification method for single cell RNA sequencing data
CN117153268B (en) A method and system for determining cell type
CN118335189A (en) Single-cell deep clustering method fused with variational graph attention autoencoder
CN115862746B (en) Accurate single-cell multi-group chemical matching data generation method
CN114512188B (en) DNA binding protein recognition method based on improved protein sequence position specificity matrix
Huang et al. Sequential reinforcement active feature learning for gene signature identification in renal cell carcinoma
Cao et al. Cell blast: searching large-scale scrna-seq databases via unbiased cell embedding
CN117037910B (en) A method for evaluating the probability of intergene correlation based on gene expression data
CN110797083B (en) Biomarker identification method based on multiple networks
Ma et al. EnsembleKQC: an unsupervised ensemble learning method for quality control of single cell RNA-seq sequencing data
CN111755074A (en) A method for predicting the origin of DNA replication in Saccharomyces cerevisiae
CN119601090B (en) A method and system for identifying gene co-expression networks based on graph convolutional neural networks
CN117877590B (en) Cell clustering method, device, equipment and storage medium based on sequencing data
CN119479827B (en) Single-cell RNA sequencing data classification method and system based on deep learning
CN116646010B (en) Human virus detection method and device, equipment and storage medium
CN119724374B (en) Single-cell distillation discrimination clustering method, system, electronic equipment and medium based on asymmetric self-encoder
CN119132401B (en) High-precision single cell classification method and device based on artificial intelligence algorithm
Wu et al. Research on the identification method of Asgard archaea using CNN-LSTM fusion of prokaryotic microbial short gene sequence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20220301