
CN105701365A - Cancer-related genes finding method by using miRNA expression data - Google Patents

Cancer-related genes finding method by using miRNA expression data

Info

Publication number
CN105701365A
Authority
CN
China
Prior art keywords
mirna
sample
gene
data
genes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610019087.6A
Other languages
Chinese (zh)
Other versions
CN105701365B (en
Inventor
杨利英
曹阳
袁细国
张军英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN201610019087.6A
Publication of CN105701365A
Application granted
Publication of CN105701365B
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids

Landscapes

  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biophysics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a method for discovering cancer-related genes using miRNA expression data. Based on PanCancer, the pan-cancer project of The Cancer Genome Atlas (TCGA), it applies statistical analysis and machine learning algorithms to gene expression data to identify genes related to complex diseases. The method comprises: organizing the sample data; statistically analyzing the miRNA data; ranking the miRNAs by the rate of change of their mean expression; selecting target genes; extracting the corresponding disease samples and normal samples; and ranking the genes in the extracted mRNA samples with the Relief algorithm. The invention can discover multiple risk genes related to complex diseases such as cancer, which is significant for targeted biological therapy, biopharmaceutical development, elucidation of pathogenic mechanisms, and risk prediction for complex diseases.

Description

A method for discovering cancer-related genes using miRNA expression data

Technical field

The invention belongs to the technical field of data processing, and in particular relates to a method for discovering cancer-related genes using miRNA expression data.

Background

Bioinformatics is an emerging discipline that combines life science and computer science. It studies the collection, processing, storage, dissemination, analysis, and interpretation of biological information, and uses biology, computer science, and information technology together to uncover the biology hidden in complex biological data. Genes are the carriers of genetic information, and studying them deepens our understanding of disease. More than 20,000 human genes are currently known, so the mRNA gene expression data obtained from the corresponding sequencing data have more than 20,000 dimensions. The genes related to each disease differ; some disease-related genes have been found, but most remain to be studied. Analyzing mRNA gene expression data directly therefore means processing high-dimensional data at very high computational cost.

miRNAs are a class of endogenous small RNAs about 20-24 nucleotides long. More than 1,000 human miRNAs are known, and they play a variety of important regulatory roles in cells. miRNAs can regulate many human genes: each miRNA can have multiple target genes, and multiple miRNAs can regulate the same gene. miRNAs regulate genes in three ways. The first is cleavage of the target molecule. In this case the miRNA and its target are fully complementary, and the miRNA functions much like an siRNA; most plant miRNAs act this way. The second is repression of target gene translation. Here the two are only partially complementary, which blocks translation of the target gene and in turn affects the stability of gene expression. This is the most common mode found in non-plant organisms (for example, lin-4 of Caenorhabditis elegans affects the worm's growth and development this way), but it is rare in plants. The third is a combination of the first two: where a miRNA binds its target with full complementarity it acts by cleavage, and where the binding is incomplete it acts by blocking translation. Because miRNA expression data have few dimensions, one can process the miRNA expression data to obtain disease-risk miRNAs and then analyze the mRNA data of those miRNAs' target genes, predicting disease-related genes while reducing the dimensionality of the data.

The prior art processes the mRNA gene expression data of complex diseases directly, which entails high dimensionality and heavy computation.

Summary of the invention

The purpose of the invention is to provide a method for discovering cancer-related genes using miRNA expression data, so as to solve the high dimensionality and heavy computation that arise when the prior art processes the mRNA gene expression data of complex diseases directly.

The invention is realized as follows. A method for discovering cancer-related genes using miRNA expression data, based on PanCancer, the pan-cancer project of The Cancer Genome Atlas (TCGA), applies statistical analysis and machine learning algorithms to gene expression data to identify genes related to complex diseases, and comprises:

Organizing the sample data: obtain the miRNA expression data and mRNA expression data of a given disease, both of which contain disease samples and the corresponding normal samples;

Statistically analyzing the miRNA data: compute the mean expression value of the normal samples and of the disease samples separately, excluding the influence of zero values;

Ranking the miRNAs by the rate of change of their mean expression, with larger rates ranked higher, and selecting the top 10 miRNAs as the relevant miRNAs;

Applying five target gene prediction tools (miRanda, miRDB, miRWalk, RNA22, and TargetScan) to obtain the target genes of those miRNAs. Target genes are selected under the following condition: across the five tools, let K denote the maximum number of tools that predict the same target gene, and let N_K denote the number of genes predicted by K tools simultaneously; a preselected gene must be predicted by at least R (0≤R≤K) tools simultaneously;

Extracting, for the selected mRNAs, the corresponding disease samples and normal samples from the initial mRNA expression data;

Ranking the genes in the extracted mRNA samples with the Relief algorithm in descending order of importance, and taking the top 45 genes as the predicted disease-related genes.

Further, the influence of zero values must be excluded when analyzing the miRNA data. To compute the mean for a miRNA, first count the number m of non-zero values among its samples, then compute the sum Sum of its sample expression values; the sample mean is Sum/m. If the mean expression value of the normal samples is n and that of the disease samples is c, the corresponding rate of change is |n-c|/n. The 10 miRNAs with the highest rate of change are taken as the relevant miRNAs.
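As a worked illustration of the zero-excluding mean and the rate of change, the following uses made-up expression values for a single miRNA (they are not taken from the patent's data):

```latex
% Hypothetical values: normal samples (0, 2, 4), disease samples (6, 0, 0)
\begin{align*}
n &= \frac{\mathrm{Sum}}{m} = \frac{0 + 2 + 4}{2} = 3, \\
c &= \frac{\mathrm{Sum}}{m} = \frac{6 + 0 + 0}{1} = 6, \\
\text{rate of change} &= \frac{|n - c|}{n} = \frac{|3 - 6|}{3} = 1.
\end{align*}
```

A plain mean over all three normal samples would give 2 rather than 3, which is the distortion that excluding zero values is meant to avoid.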

Further, a preselected gene must be predicted by at least R (0≤R≤K) target gene prediction tools simultaneously. When N_K > 10, R = K; when N_K < 10 and N_{K-1} > 10, R = K-1; likewise, if N_{K-1} < 10 and N_{K-2} > 10, then R = K-2, and so on.
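The rule for choosing R can be sketched in Python as below. This is only an illustration under stated assumptions: the function name `select_consensus_targets` and the input layout (one set of gene symbols per tool) are hypothetical, N_R is read here as the number of genes predicted by at least R tools, and the fallback used when no threshold yields more than 10 genes is not specified in the source.

```python
from collections import Counter


def select_consensus_targets(predictions, min_genes=10):
    """Pick the consensus level R and the genes predicted by at least R tools.

    predictions: dict mapping a tool name to the set of genes it predicts
                 (hypothetical layout, not defined in the source).
    """
    votes = Counter()
    for genes in predictions.values():
        votes.update(set(genes))  # one vote per tool per gene
    if not votes:
        return 0, set()

    k_max = max(votes.values())  # K: the largest number of agreeing tools
    # Lower R from K until more than `min_genes` genes qualify.
    for r in range(k_max, 0, -1):
        chosen = {gene for gene, count in votes.items() if count >= r}
        if len(chosen) > min_genes:
            return r, chosen
    # Assumption: if no level passes, keep every predicted gene (R = 1).
    return 1, set(votes)
```

With the five tools named in the text, `predictions` would hold five entries; the boundary case of exactly 10 genes is treated here as not passing, since the source only states the strict inequalities.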

Further, the feature selection method used is the Relief algorithm, and the feature weights are computed as follows:

$$w_f = w_f - \sum_{j=1}^{k} \frac{\mathrm{diff}(f, s_i, \mathrm{Same}_j)}{r\,k} + \sum_{j=1}^{k} \frac{\mathrm{diff}(f, s_i, \mathrm{Miss}_j)}{r\,k}, \qquad f = 1, \ldots, q; \quad i = 1, \ldots, p;$$

where s_i (i = 1, ..., p) denotes the i-th sample and p is the number of samples; Same_j denotes the j-th same-class sample of s_i, Miss_j denotes the j-th different-class sample of s_i, and k is the number of nearest neighbours; w_f (f = 1, ..., q) denotes the weight of feature f, that is, the importance of the f-th preselected gene, and q is the number of preselected genes; r is the number of samplings;

The function diff is defined as follows:

$$\mathrm{diff}(f, s_i, s_j) = \left| \frac{s_{if} - s_{jf}}{\mathrm{Max}_f - \mathrm{Min}_f} \right|;$$

where s_if denotes the value of feature f in the i-th sample, s_jf denotes the value of feature f in the j-th sample, and Max_f and Min_f denote the maximum and minimum values of feature f over the samples. With r = 10 samplings, k = 20 nearest neighbours, and 30 iterations of the Relief algorithm, the weight vector W = {w_1, w_2, ..., w_q} is computed and the mRNAs are ranked by W.
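The weight update above can be sketched in Python roughly as follows. This is a minimal illustration, not the patent's implementation: the function name `relief_weights`, the use of the summed `diff` as the neighbour distance, the handling of constant features, and the fixed random seed are all assumptions. The source also mentions 30 iterations; one possible reading is to repeat such a run 30 times and combine the weights, which the sketch leaves out.

```python
import random


def relief_weights(samples, labels, r=10, k=20, seed=0):
    """Return one weight per feature; a larger weight means the feature
    separates the two classes better. `samples` is a list of equal-length
    rows of numbers, `labels` the class of each row."""
    rng = random.Random(seed)
    p, q = len(samples), len(samples[0])

    # Max_f - Min_f per feature; constant features get span 1 (assumption)
    # so that the division below is always defined.
    spans = []
    for f in range(q):
        column = [row[f] for row in samples]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(f, a, b):
        return abs(samples[a][f] - samples[b][f]) / spans[f]

    def distance(a, b):
        # Assumption: neighbours are ranked by the sum of per-feature diffs.
        return sum(diff(f, a, b) for f in range(q))

    weights = [0.0] * q
    for _ in range(r):
        i = rng.randrange(p)  # randomly sampled instance s_i
        others = sorted((j for j in range(p) if j != i),
                        key=lambda j: distance(i, j))
        same = [j for j in others if labels[j] == labels[i]][:k]
        miss = [j for j in others if labels[j] != labels[i]][:k]
        for f in range(q):
            for j in same:   # same-class neighbours lower the weight
                weights[f] -= diff(f, i, j) / (r * k)
            for j in miss:   # different-class neighbours raise it
                weights[f] += diff(f, i, j) / (r * k)
    return weights
```

Genes can then be ranked with `sorted(range(q), key=lambda f: weights[f], reverse=True)`. When a class has fewer than k members, the sketch simply uses the neighbours that exist while still dividing by r·k, as the formula is written.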

Another object of the invention is to provide a system implementing the method for discovering cancer-related genes using miRNA expression data, the system comprising:

A sample data organization module, for obtaining the miRNA expression data and mRNA expression data of a given disease, both of which contain disease samples and the corresponding normal samples;

A statistical analysis module, for statistically analyzing the miRNA data and computing the mean expression value of the normal samples and of the disease samples separately, excluding the influence of zero values;

A screening and ranking module, for ranking the miRNAs by the rate of change of their mean expression, with larger rates ranked higher, and selecting the top 10 miRNAs as the relevant miRNAs;

A target gene selection module, for applying five target gene prediction tools (miRanda, miRDB, miRWalk, RNA22, and TargetScan) to obtain the target genes of those miRNAs. Target genes are selected under the following condition: across the five tools, let K denote the maximum number of tools that predict the same target gene, and let N_K denote the number of genes predicted by K tools simultaneously;

An extraction module, for extracting, for the selected mRNAs, the corresponding disease samples and normal samples from the initial mRNA expression data;

A ranking module, for ranking the genes in the extracted mRNA samples with the Relief algorithm in descending order of importance and taking the top 45 genes as the predicted disease-related genes.

Further, the statistical analysis module comprises:

A non-zero counting unit, for counting the number m of non-zero values among a miRNA's samples before its mean is computed;

A sample mean calculation unit, for computing the sum Sum of the miRNA's sample expression values and the sample mean Sum/m;

An expression change rate calculation unit: if the mean expression value of the normal samples is n and that of the disease samples is c, the corresponding rate of change is |n-c|/n;

A ranking unit, for taking the 10 miRNAs with the highest rate of change of mean expression as the relevant miRNAs.

Another object of the invention is to provide a targeted biological therapy system that applies the method for discovering cancer-related genes using miRNA expression data.

Another object of the invention is to provide a biopharmaceutical development process that applies the method for discovering cancer-related genes using miRNA expression data.

Another object of the invention is to provide a system for elucidating pathogenic mechanisms that applies the method for discovering cancer-related genes using miRNA expression data.

Another object of the invention is to provide a disease risk prediction system that applies the method for discovering cancer-related genes using miRNA expression data.

The method provided by the invention for discovering cancer-related genes using miRNA expression data is based on PanCancer, the pan-cancer project of The Cancer Genome Atlas (TCGA), and applies statistical analysis and machine learning algorithms to gene expression data to identify genes related to complex diseases. The invention can discover multiple risk genes related to complex diseases such as cancer, which is significant for targeted biological therapy, biopharmaceutical development, elucidation of pathogenic mechanisms, and risk prediction. Gene-targeted therapies can be designed around the identified risk genes; highly sensitive drugs can be selected, or new drugs developed, according to gene markers; the development of a complex disease can be analyzed on the basis of the discovered genes to determine how it arises; and susceptibility testing can be carried out for the predicted risk genes to reduce the risk of disease. Because the large volume of mRNA expression data makes the sample data hard to process, the invention takes miRNAs, which are few in number and regulate mRNAs, as the entry point of the analysis. The prior art processes mRNA expression data with more than 20,000 dimensions, whereas this method analyzes miRNA expression data with somewhat more than 1,000 dimensions, roughly a 20-fold reduction. Computational complexity and running time fall accordingly, avoiding the excessive computation time caused by large data volumes. Using miRNAs, the invention can quickly locate disease-related mRNAs, and it is not limited to a particular complex disease or cancer: the related genes of any complex disease can be analyzed this way. The invention determines target genes by analyzing the miRNA expression data of a disease and screens risk genes from the mRNA expression data of those target genes; no other disease-related information is needed. Therefore, given the miRNA and mRNA expression data of a disease, the method can be applied, so it is widely applicable.

Brief description of the drawings

Fig. 1 is a flowchart of the method for discovering cancer-related genes using miRNA expression data provided by an embodiment of the invention.

Detailed description

To make the purpose, technical solution, and advantages of the invention clearer, the invention is described in further detail below with reference to embodiments. The specific embodiments described here serve only to explain the invention and do not limit it.

The invention exploits the small volume of miRNA expression data and the targeting relationship between miRNAs and mRNAs. It obtains cancer-related genes by analyzing miRNA expression data, thereby avoiding the excessive data volume that the prior art faces when it analyzes mRNA expression data directly.

The principle of the invention is described in detail below with reference to the drawing.

As shown in Fig. 1, the method for discovering cancer-related genes using miRNA expression data according to an embodiment of the invention comprises the following steps:

S101: Screen for cancer-related miRNAs based on the difference in miRNA expression between normal samples and disease samples;

S102: Use the mapping between miRNAs and mRNAs to locate the mRNAs on which those miRNAs act;

S103: Discover cancer-related genes by analyzing the expression data of the targeted mRNAs.

The data used in the invention, comprising the normal and disease samples for both miRNA and mRNA, all come from the TCGA pan-cancer research project.

The specific implementation steps of the invention are as follows.

Step 1: Data processing

Divide the sample data into four groups: miRNA expression data of normal samples, miRNA expression data of disease samples, mRNA expression data of normal samples, and mRNA expression data of disease samples. The miRNA and mRNA names must correspond consistently across the sample groups.

Step 2: Screen for miRNAs with a high rate of change in expression

1. For the miRNA data, compute the mean expression value of each miRNA in the normal samples and in the disease samples. Because some miRNA expression values in the samples are 0, the zero values must be accounted for: if the number of non-zero values among a miRNA's samples is m and the sum of its sample values is Sum, the mean for that miRNA is Sum/m. Compute the normal-sample mean and disease-sample mean of every miRNA this way.

2. From the sample means, compute the rate of change of each miRNA: if a miRNA's normal-sample mean is n and its disease-sample mean is c, the rate of change is |n-c|/n.

3. Rank all miRNAs by rate of change and select the 10 with the highest rates.
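The three sub-steps of Step 2 can be sketched in Python as below. This is a minimal illustration under assumptions: the function names and the dictionary layout (miRNA name mapped to a list of expression values) are hypothetical, and skipping a miRNA whose normal or disease values are all zero is a choice made here, since the mean Sum/m is undefined when m = 0.

```python
def nonzero_mean(values):
    """Mean over the non-zero entries (Sum/m); None if every entry is zero."""
    nonzero = [v for v in values if v != 0]
    return sum(nonzero) / len(nonzero) if nonzero else None


def rank_by_change_rate(normal, disease, top_n=10):
    """Rank miRNAs by |n - c| / n and return the top_n names plus all rates.

    normal, disease: dicts mapping a miRNA name to its list of expression
    values in normal and disease samples (hypothetical layout).
    """
    rates = {}
    for name, normal_values in normal.items():
        n = nonzero_mean(normal_values)
        c = nonzero_mean(disease.get(name, []))
        # Assumption: skip miRNAs for which the rate is undefined.
        if n is None or c is None or n == 0:
            continue
        rates[name] = abs(n - c) / n
    ranked = sorted(rates, key=rates.get, reverse=True)
    return ranked[:top_n], rates
```

Calling `rank_by_change_rate(normal, disease)` with the default `top_n=10` corresponds to the selection described in sub-step 3.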

Step 3: Obtain the target genes of the selected miRNAs

Obtain the target genes of the selected miRNAs with five target gene prediction tools: miRanda, miRDB, miRWalk, RNA22, and TargetScan. Across the five tools, let K denote the maximum number of tools that predict the same target gene, and let N_K denote the number of genes predicted by K tools simultaneously. The method requires a preselected gene to be predicted by at least R (0≤R≤K) tools simultaneously. When N_K > 10, R = K; when N_K < 10 and N_{K-1} > 10, R = K-1; likewise, if N_{K-1} < 10 and N_{K-2} > 10, then R = K-2, and so on.

Step 4: Screen for related genes with the Relief algorithm

For the selected target gene mRNAs, extract the normal samples and disease samples from the mRNA expression data and merge the two classes of sample data. Rank the mRNAs in descending order of importance with the Relief algorithm, using r = 10 samplings, k = 20 nearest neighbours, and 30 iterations. From the Relief ranking, take the top 45 as the predicted related genes.

The effect of the invention is described in detail below with reference to an experiment.

Experiment 1 uses the breast cancer expression data (BRCA) from the TCGA PanCancer project, which contain 1,045 miRNAs and 20,530 mRNAs. The breast cancer expression data were processed according to the steps above:

1. Import the miRNA sample data and first screen all normal sample data so that no miRNA is all zero across the normal samples. If the normal sample data of a miRNA are all zero, delete that miRNA's data from the normal samples and also delete the corresponding miRNA's data from the disease samples.

2. For the screened miRNA data, compute the means of the normal and disease samples separately and compute the rate of change.

3. Rank the miRNAs by rate of change and select the 10 with the highest rates. The 10 miRNAs finally selected in this experiment were: hsa-mir-133b, hsa-mir-133a, hsa-mir-208b, hsa-mir-206, hsa-mir-551b, hsa-mir-145, hsa-mir-378, hsa-mir-451, hsa-mir-144, hsa-mir-1.

4. For the selected miRNAs, predict target genes with the five target gene prediction tools described above, giving 725 target gene mRNAs.

5. Extract the data for the 725 target gene mRNAs from the mRNA data, then rank the mRNAs by importance with the Relief algorithm, using r = 10 samplings, k = 20 nearest neighbours, and 30 iterations, and take the 45 most important mRNAs as the related genes. The 45 mRNAs are: RXFP2, GYPA, OTX2, PRDM9, CYP11B1, MMD2, CHRNA4, NEUROD1, PABPC1L2B, RIT2, CNTN5, NEUROD4, SLC4A1, PRDM7, FBXO40, GABRG2, GPR6, ZIC3, SPINLW1, DMRT1, CYP3A4, DPCR1, LHX9, ISL2, LIPI, SOST, HHLA2, S100A7, RIPPLY1, TRHDE, BMP3, KCNMB2, PAX5, PAX3, ANGPT4, DSCAM, EREG, OR7D2, DRD1, GFRA3, LEP, GPR26, LIX1, ZIC1, GDAP1L1.

The roles of the resulting genes in breast cancer, and their functional links to known important breast cancer genes, are analyzed below to show that these genes are related to breast cancer and thereby to verify the effectiveness of the proposed method.

NEUROD1 is a basic helix-loop-helix (bHLH) transcription factor of the NeuroD family. It forms heterodimers with other bHLH transcription factors and activates transcription through a specific DNA sequence known as the E-box; it also helps regulate several cell differentiation pathways. Heidi Fiegl found that NEUROD1 is abnormally methylated in breast tumor and lung tumor samples, and that the methylation level is higher in samples of higher tumor grade.

The protein encoded by SLC4A1 is a member of the AE protein family. It plays an important role in red blood cells, where it acts as a transporter that helps the corresponding substances cross the cell membrane. A. Gorbatenko's study showed that SLC4A1 is down-regulated in all breast cancer subtypes, which suggests that SLC4A1 may have some influence on breast cancer lesions.

CYP3A4 encodes a cytochrome P450 enzyme involved in the metabolism of about half of the drugs in use today, such as acetaminophen, codeine, cyclosporine A, diazepam, and erythromycin, as well as of some steroids and carcinogens. C. Keshava found that variation in CYP3A4 may dysregulate hormone metabolism in breast cancer and may also activate xenobiotics that cause cancer, making it an important breast cancer-related gene.

The protein encoded by HHLA2 is found on the surface of monocytes. It can bind to receptors on lymphocytes, thereby regulating cell-mediated immunity and inhibiting the proliferation of monocytes. By analyzing TCGA expression data, M. Janakiram found that the copy number of HHLA2 is increased by 29% in breast cancer, leading to overexpression of HHLA2 in breast cancer, which indirectly indicates that HHLA2 has some influence on breast cancer lesions.

The protein encoded by S100A7 is a member of the S100 protein family. S100 proteins are widely present in the cytoplasm and nucleus and take part in many cellular processes, such as regulation of the cell cycle and of differentiation. Emberley described in detail the state of research on S100A7 in breast cancer and its specific mode of action and expression there; Haddadd likewise discussed the relationship between S100A7 and breast cancer.

PAX3 is a member of the PAX family of transcription factors, which contain a paired box domain and a paired-type homeodomain; these genes play a very important role in fetal development. W. J. Tan described the clinical expression of PAX3 in breast cancer and analyzed the effect of PAX3 on breast cancer.

LEP encodes a protein secreted by white adipocytes. It plays an important role in regulating body weight by inhibiting food intake and regulating energy expenditure. Cleveland described the correlation between LEP gene variation and the incidence of breast cancer, which suggests that abnormal LEP expression may lead not only to weight imbalance but also to breast cancer lesions.

The above analysis shows that the predicted genes affect breast cancer lesions, but the specific pathogenic mechanisms of these genes still require in-depth analysis by those skilled in the relevant art.

Next, the pathway analysis tool of the DAVID database and the STRING-DB database are used to analyze the predicted genes as a whole. Both analyses indicate indirectly that the predicted genes may cause disease by acting on important cancer genes, which supports the association between the predicted genes and known cancer genes. The important breast cancer genes selected for this experiment are PIK3CA, TP53, PTEN, AKT1 and SF3B1.

The DAVID analysis found risk pathways between the predicted genes and the important genes, as well as among the predicted genes themselves. EREG shares pathways with the important genes PIK3CA and AKT1, and LEP shares risk pathways with PIK3CA and AKT1. Notably, LEP also shares pathways with the screened genes DRD1, GABRG2 and RXFP2, as shown in Table 1. The pathway analysis further found that LEP and EREG are linked to the important breast cancer genes PIK3CA and AKT1 through biological metabolism, while DRD1, GABRG2 and RXFP2 are linked to LEP through shared pathways; these mutual links may be a source of the disease.

Table 1. Pathways involving breast cancer-related genes

The important breast cancer genes and the 45 screened genes were examined together in STRING-DB to view their interactions; the results are shown in Table 2. Some genes, such as OR7D2, HHLA2 and DPCR1, have no link to any other gene. This does not mean that they play no role in breast cancer: they may act on breast cancer independently (as with HHLA2, whose effect on breast cancer lesions was analyzed above), or they may interact with other important breast cancer genes. The remaining genes interact extensively and form a relationship network, in which the predicted genes may positively or negatively affect the important breast cancer genes through some biological function and thereby give rise to breast cancer lesions.

Table 2. Associations between breast cancer-related genes and important genes

In Experiment 2, the kidney cancer expression data (KIRC) from the TCGA PanCancer project were selected as the experimental object; the data contain 1045 miRNAs and 20530 mRNAs.

Using the same method as in Experiment 1, the miRNAs were ranked by the rate of change of their mean expression, and the top 10 were selected as target miRNAs. The selected miRNAs are: hsa-mir-200c, hsa-mir-514b, hsa-mir-506, hsa-mir-508, hsa-mir-514-2, hsa-mir-141, hsa-mir-514-3, hsa-mir-514-1, hsa-mir-184 and hsa-mir-934. The five target gene prediction tools described above were used to predict the target genes of the selected miRNAs, yielding 504 mRNAs. The sample data for these mRNAs were extracted from the initial mRNA expression data, their weights were computed with the Relief algorithm, and the top 45 mRNAs were selected as target mRNAs. The 45 selected mRNAs are: ODAM, KLHL1, TAC1, NPY2R, HYAL4, FOXE1, TTR, SLC6A14, GLRA3, FUT9, GRIA2, KCNA1, CXorf41, TFAP2B, SFTPB, CRISP1, PDE6H, AGXT2L1, LHFPL4, SLC30A8, STXBP5L, TMEM196, IL1F5, ASTN1, CRISP3, HTR2C, LIN28B, TRIM42, KIAA1486, COL9A1, GCM1, TNNI1, SCG3, ANXA10, BTC, SORCS1, KCND2, LRRN1, MSTN, ERBB4, PRG4, NAPB, ARHGAP12, C12orf53 and RAD52.
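The miRNA ranking step can be sketched in Python as follows. This is a minimal illustration only, assuming the zero-excluding mean Sum/m and the rate of change |n - c|/n stated in claim 2, and assuming that m counts the non-zero values of one miRNA across the samples of one group. The function names and the toy expression values are hypothetical and do not come from the TCGA data.

```python
def nonzero_mean(values):
    """Mean over non-zero entries only; zeros are treated as missing values."""
    nonzero = [v for v in values if v != 0]
    if not nonzero:
        return None  # no usable measurement for this miRNA
    return sum(nonzero) / len(nonzero)


def change_rate(normal_values, disease_values):
    """Return |n - c| / n, or None when either group mean is undefined."""
    n = nonzero_mean(normal_values)
    c = nonzero_mean(disease_values)
    if n is None or c is None or n == 0:
        return None
    return abs(n - c) / n


def top_mirnas(normal, disease, top_n=10):
    """Rank miRNAs by rate of change (largest first) and keep the top_n names."""
    rates = {}
    for name, normal_values in normal.items():
        rate = change_rate(normal_values, disease.get(name, []))
        if rate is not None:
            rates[name] = rate
    ranked = sorted(rates, key=rates.get, reverse=True)
    return ranked[:top_n], rates


# Toy expression values (hypothetical): one list of sample values per miRNA.
normal = {"mir-a": [10.0, 0.0, 30.0], "mir-b": [5.0, 5.0, 5.0], "mir-c": [0.0, 0.0, 0.0]}
disease = {"mir-a": [2.0, 6.0, 0.0], "mir-b": [4.0, 6.0, 5.0], "mir-c": [1.0, 2.0, 3.0]}
selected, rates = top_mirnas(normal, disease, top_n=2)
print(selected, rates)
```

In this sketch a miRNA whose normal-group values are all zero is skipped, because its rate of change would require division by an undefined mean; the source does not say how such cases are handled.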

The roles of the obtained genes in kidney cancer are analyzed below to show their relevance to kidney cancer and thereby verify the effectiveness of the proposed method.

KLHL1 is a protein-coding gene belonging to a muscle-tissue protein family; it is expressed in kidney tissue cells and in many brain tissues. This suggests that mutations in KLHL1 may cause functional problems in certain kidney cells and thereby influence carcinogenesis in the kidney. The protein encoded by NPY2R is the Y2 receptor for neuropeptide Y (NPY). NPY receptors are involved in a variety of biological processes, including food intake, anxiolysis, circadian rhythm, modulation and transmission of pain, and control of pituitary hormone release. In the human kidney, 293 kinds of cells are regulated by genes including NPY2R, which indicates that NPY2R plays an important role in kidney function and that whether NPY2R is expressed normally can matter in kidney disease. KCNA1 belongs to the 6-TM family of potassium channels, a family that also includes Ca2+-activated channels; it contributes to tetramer formation and also takes part in forming the proteins of the subfamilies, such as KV1.1, KV1.2, KCNQ2 and KCNQ3. Mutations in KCNA1 affect kidney function: analysis of overexpressing human kidney cells found that KCNA1 mutations in a non-functional region also negatively affected the functional region of KV1.1. FOXE1 is a member of a transcription factor family; it may be implicated in mutations associated with thyroid disease and thereby play a role in kidney lesions. It has been reported that genes including FOXE1 affect the kidney by altering thyroid function, leading to kidney malformations and lesions. The protein encoded by SLC6A14 is a member of solute carrier family 6, whose members mainly help transport sodium and chloride in human neural tissue; the encoded protein also takes part in the transport of neutral and cationic amino acids and acts as a carrier of β-alanine. SLC6A14 has been found to be overexpressed in diseased kidney tissue, which suggests that variation in SLC6A14 may affect kidney function. The Lewis x (Lex) oligosaccharide fucosyltransferase encoded by FUT9 is a member of the glycosyltransferase family; it is located mainly in the Golgi apparatus, plays an important role in the embryonic development of organs, and is also responsible for regulating CD15 expression in mature granulocytes. A 1.8-fold decrease in FUT9 expression in the kidney leads to a 1.8-fold increase in CD24A expression in the kidney, which directly and severely impairs normal kidney function. Although FUT9 does not directly cause kidney lesions, it affects kidney function indirectly through CD24A expression, so its effect on the kidney should not be underestimated. STXBP5L is an important paralogous gene whose encoded protein binds syntaxin. As a protein, STXBP5 interacts with neurons at synapses; it is abundant in the kidney and has a considerable effect on kidney function, which indicates that variation in STXBP5L strongly affects kidney function.

Next, the KEGG pathway tool of the DAVID database and the STRING-DB database are used to analyze the discovered kidney cancer-related genes as a whole. The important KIRC genes selected here are TP53, CDH1, VEGFA, MUC1 and EGFR. The analysis found associated pathways between the predicted genes and the important genes, and also among the predicted genes themselves. BTC and ERBB share risk pathways with EGFR, an important kidney cancer gene; VEGFA and HTR2C are also pathway-associated with EGFR; and pathways also link EGFR with VEGFA, GRIA2 and TP53. In addition, risk pathways exist among the predicted kidney cancer-related genes HTR2C, GRIA2, GLRA3 and NPY2R, as shown in Table 3.

Table 3. Pathways involving kidney cancer-related genes

The STRING-DB database was used to examine the interactions between the predicted kidney cancer-related genes and the important genes; the results are shown in Table 4. Table 4 shows many links among the predicted genes and between them and the important genes, which indicates that the predicted genes may act on the important kidney cancer genes in some way, disturb their normal expression and lead to kidney cancer lesions. Four genes, TTR, KCNA1, FOXE1 and ODAM, deserve particular attention: they are linked to many important kidney cancer genes and may act on several of them at once.

Table 4. Associations between kidney cancer-related genes and important genes

Working principle of the present invention:

By analyzing the lower-dimensional miRNA expression data and exploiting the regulatory effect of miRNAs on genes, a low-dimensional subset is targeted within the high-dimensional mRNA expression data. The Relief algorithm is then used to determine the importance of the gene in each dimension, and genes related to the complex disease are screened out accordingly. Relief is a feature weighting algorithm proposed by Kira and Rendell in 1992. It is trained on samples and derives a classification weight for each sample feature; the larger the weight, the more that feature contributes to classification. The feature weights in the Relief algorithm are computed as follows:

$$w_f = w_f - \sum_{j=1}^{k} \frac{\mathrm{diff}(f, s_i, \mathrm{Same}_j)}{rk} + \sum_{j=1}^{k} \frac{\mathrm{diff}(f, s_i, \mathrm{Miss}_j)}{rk}, \qquad f = 1, \ldots, q;\ i = 1, \ldots, p$$

Here $s_i$ ($i = 1, \ldots, p$) denotes the $i$-th sample and $p$ is the number of samples; $\mathrm{Same}_j$ denotes the $j$-th sample of the same class as $s_i$, $\mathrm{Miss}_j$ denotes the $j$-th sample of a different class from $s_i$, and $k$ is the number of nearest neighbors; $w_f$ ($f = 1, \ldots, q$) is the weight of feature $f$, that is, the importance of the $f$-th pre-selected gene, and $q$ is the number of pre-selected genes; $r$ is the number of sampling rounds. The function diff is defined as follows:

$$\mathrm{diff}(f, s_i, s_j) = \left| \frac{s_{if} - s_{jf}}{\mathrm{Max}_f - \mathrm{Min}_f} \right|$$

Here $s_{if}$ is the value of feature $f$ in the $i$-th sample, $s_{jf}$ is the value of feature $f$ in the $j$-th sample, $\mathrm{Max}_f$ is the maximum value of feature $f$ over the samples, and $\mathrm{Min}_f$ is its minimum value over the samples. Given the configured number of sampling rounds and number of nearest neighbors, the Relief algorithm iterates to obtain the weight of every feature, $W = \{w_1, w_2, \ldots, w_q\}$, and the features are then ranked by weight.
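The weight update above can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than the implementation used in the experiments: the sample $s_i$ is drawn at random in each of the $r$ rounds, neighbors are found with the sum of the per-feature diff values as the distance (the source does not name a distance), and all function names and toy data are hypothetical.

```python
import random


def diff(f, a, b, f_min, f_max):
    """Range-normalised absolute difference of feature f between samples a and b."""
    span = f_max[f] - f_min[f]
    if span == 0:
        return 0.0  # constant feature: it cannot separate the classes
    return abs(a[f] - b[f]) / span


def relief_weights(samples, labels, r, k, seed=0):
    """Return one weight per feature, following the update rule in the text."""
    rng = random.Random(seed)
    q = len(samples[0])
    f_min = [min(s[f] for s in samples) for f in range(q)]
    f_max = [max(s[f] for s in samples) for f in range(q)]

    def distance(a, b):
        # Assumed neighbour metric: sum of the normalised per-feature differences.
        return sum(diff(f, a, b, f_min, f_max) for f in range(q))

    weights = [0.0] * q
    for _ in range(r):
        i = rng.randrange(len(samples))  # assumed: s_i is sampled at random
        others = [j for j in range(len(samples)) if j != i]
        others.sort(key=lambda j: distance(samples[i], samples[j]))
        # Up to k nearest same-class and different-class neighbours of s_i.
        same = [j for j in others if labels[j] == labels[i]][:k]
        miss = [j for j in others if labels[j] != labels[i]][:k]
        for f in range(q):
            for j in same:
                weights[f] -= diff(f, samples[i], samples[j], f_min, f_max) / (r * k)
            for j in miss:
                weights[f] += diff(f, samples[i], samples[j], f_min, f_max) / (r * k)
    return weights


# Toy data (hypothetical): feature 0 separates the two classes, feature 1 does not.
samples = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.2], [1.0, 0.8], [0.8, 0.5]]
labels = [0, 0, 0, 1, 1, 1]
weights = relief_weights(samples, labels, r=20, k=2)
ranking = sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)
print(weights, ranking)
```

On the toy data the class-separating feature receives the larger weight and is ranked first, which mirrors how the top-weighted genes are taken as the predicted disease-related genes.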

The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A method for discovering cancer-related genes using miRNA expression data, characterized in that the method is based on the PanCancer project of The Cancer Genome Atlas (TCGA) and uses statistical analysis and a machine learning algorithm to analyze and process gene expression data and identify genes related to a complex disease, the method comprising:
organizing sample data: obtaining miRNA expression data and mRNA expression data for a given disease, both kinds of data comprising disease samples and corresponding normal samples;
performing statistical analysis on the miRNA data: computing the mean expression value of the normal samples and of the disease samples separately, while excluding the influence of zero values;
ranking the miRNAs by the rate of change of the mean, a larger rate of change giving a higher rank, and selecting the 10 top-ranked miRNAs as related miRNAs;
using the five miRNA target prediction tools miRanda, miRDB, miRWalk, RNA22 and TargetScan to predict mRNAs and obtain the target genes of the corresponding miRNAs, the target genes being selected under the following condition: for the five tools used, let K denote the maximum number of tools that simultaneously predict the same target gene and $N_K$ the number of genes simultaneously predicted by K tools; a pre-selected gene must be predicted simultaneously by at least R (0 ≤ R ≤ K) tools;
extracting, according to the selected mRNAs, the corresponding disease samples and normal samples from the initial mRNA expression data; and
ranking the genes in the extracted mRNA samples with the Relief algorithm in descending order of importance, and taking the top 45 genes as the predicted disease-related genes.
2. The method for discovering cancer-related genes using miRNA expression data according to claim 1, characterized in that the influence of zero values is excluded when the miRNA data are analyzed: to compute an miRNA mean, the number m of non-zero values in each sample is first obtained, the sum Sum of the miRNA sample expression values is then computed, and the sample mean is Sum/m; with n denoting the mean expression value of the normal samples and c the mean expression value of the disease samples, the corresponding rate of change of expression is |n - c|/n; and, according to the rate of change of the mean of each miRNA, the 10 miRNAs ranked highest by rate of change are determined to be the related miRNAs.
3. The method for discovering cancer-related genes using miRNA expression data according to claim 1, characterized in that a pre-selected gene must be predicted simultaneously by at least R (0 ≤ R ≤ K) miRNA target prediction tools, wherein: when $N_K > 10$, R = K; when $N_K < 10$ and $N_{K-1} > 10$, R = K - 1; likewise, if $N_{K-1} < 10$ and $N_{K-2} > 10$, then R = K - 2; and so on.
4. The method for discovering cancer-related genes using miRNA expression data according to claim 1, characterized in that the feature selection method used is the Relief algorithm, with feature weights computed as follows:
$$w_f = w_f - \sum_{j=1}^{k} \frac{\mathrm{diff}(f, s_i, \mathrm{Same}_j)}{rk} + \sum_{j=1}^{k} \frac{\mathrm{diff}(f, s_i, \mathrm{Miss}_j)}{rk}, \qquad f = 1, \ldots, q;\ i = 1, \ldots, p;$$
wherein $s_i$ ($i = 1, \ldots, p$) denotes the $i$-th sample and $p$ is the number of samples; $\mathrm{Same}_j$ denotes the $j$-th sample of the same class as $s_i$, $\mathrm{Miss}_j$ denotes the $j$-th sample of a different class from $s_i$, and $k$ is the number of nearest neighbors; $w_f$ ($f = 1, \ldots, q$) is the weight of feature $f$, that is, the importance of the $f$-th pre-selected gene, and $q$ is the number of pre-selected genes; $r$ is the number of sampling rounds;
the function diff is defined as follows:
$$\mathrm{diff}(f, s_i, s_j) = \left| \frac{s_{if} - s_{jf}}{\mathrm{Max}_f - \mathrm{Min}_f} \right|;$$
wherein $s_{if}$ is the value of feature $f$ in the $i$-th sample, $s_{jf}$ is the value of feature $f$ in the $j$-th sample, $\mathrm{Max}_f$ is the maximum value of feature $f$ over the samples, and $\mathrm{Min}_f$ is its minimum value over the samples; the number of sampling rounds is r = 10, the number of nearest neighbors is k = 20, and the Relief algorithm is iterated 30 times to compute the weight of every feature, $W = \{w_1, w_2, \ldots, w_q\}$, and the mRNAs are ranked according to the weights W.
5. A system for the method for discovering cancer-related genes using miRNA expression data according to claim 1, characterized in that the system comprises:
a sample data organizing module, configured to obtain miRNA expression data and mRNA expression data for a given disease, both kinds of data comprising disease samples and corresponding normal samples;
a statistical analysis module, configured to perform statistical analysis on the miRNA data and compute the mean expression value of the normal samples and of the disease samples separately, while excluding the influence of zero values;
a screening and ranking module, configured to rank the miRNAs by the rate of change of the mean, a larger rate of change giving a higher rank, and to select the 10 top-ranked miRNAs as related miRNAs;
a target gene selection module, configured to use the five miRNA target prediction tools miRanda, miRDB, miRWalk, RNA22 and TargetScan to predict mRNAs and obtain the target genes of the corresponding miRNAs, the target genes being selected under the following condition: for the five tools used, let K denote the maximum number of tools that simultaneously predict the same target gene and $N_K$ the number of genes simultaneously predicted by K tools; a pre-selected gene must be predicted simultaneously by at least R (0 ≤ R ≤ K) tools, wherein when $N_K > 10$, R = K; when $N_K < 10$ and $N_{K-1} > 10$, R = K - 1; likewise, if $N_{K-1} < 10$ and $N_{K-2} > 10$, then R = K - 2; and so on;
an extraction module, configured to extract, according to the selected mRNAs, the corresponding disease samples and normal samples from the initial mRNA expression data; and
a ranking module, configured to rank the genes in the extracted mRNA samples with the Relief algorithm in descending order of importance and to take the top 45 genes as the predicted disease-related genes.
6. The system according to claim 5, characterized in that the statistical analysis module further comprises:
a non-zero value counting unit, configured to obtain, when an miRNA mean is computed, the number m of non-zero values in each sample;
a sample mean computing unit, configured to compute the sum Sum of the miRNA sample expression values and then the sample mean Sum/m;
an expression change rate computing unit, configured to compute the rate of change of expression as |n - c|/n, where n is the mean expression value of the normal samples and c is the mean expression value of the disease samples; and
a ranking unit, configured to determine, according to the rate of change of the mean of each miRNA, the 10 miRNAs ranked highest by rate of change as the related miRNAs.
7. A biological targeted therapy system applying the method for discovering cancer-related genes using miRNA expression data according to any one of claims 1-4.
8. A biopharmaceutical development technique applying the method for discovering cancer-related genes using miRNA expression data according to any one of claims 1-4.
9. A pathogenesis system applying the method for discovering cancer-related genes using miRNA expression data according to any one of claims 1-4.
10. A disease risk prediction system applying the method for discovering cancer-related genes using miRNA expression data according to any one of claims 1-4.
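The pre-selection condition of claims 1 and 3 (choosing the consensus level R among the prediction tools) can be sketched in Python as follows. This is a hedged illustration: it assumes that $N_R$ counts the genes predicted by at least R tools, treats "more than 10 genes" as the acceptance test (the claims do not say what happens at exactly 10), and uses hypothetical gene lists; the function name `preselect_targets` and the parameter `min_genes` are illustrative only.

```python
from collections import Counter


def preselect_targets(predictions, min_genes=10):
    """Pick the consensus level R and return (R, genes predicted by >= R tools)."""
    votes = Counter()
    for genes in predictions.values():
        votes.update(set(genes))  # each tool votes at most once per gene
    if not votes:
        return 0, set()
    k_max = max(votes.values())  # K: most tools agreeing on any single gene
    for r in range(k_max, 0, -1):
        chosen = {gene for gene, count in votes.items() if count >= r}
        if len(chosen) > min_genes:
            return r, chosen
    # Even the union of all predictions is small: keep everything at R = 1.
    return 1, set(votes)


# Hypothetical predictions; the gene names G1..G5 are placeholders.
predictions = {
    "miRanda": ["G1", "G2", "G3", "G4"],
    "miRDB": ["G1", "G2", "G3"],
    "miRWalk": ["G1", "G5"],
}
level, genes = preselect_targets(predictions, min_genes=2)
print(level, sorted(genes))
```

With the toy threshold of 2, only one gene is predicted by all three tools, so the sketch lowers R from 3 to 2 and keeps the three genes predicted by at least two tools.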
CN201610019087.6A 2016-01-12 2016-01-12 It was found that the method and related system of cancer related gene, process for preparing medicine Active CN105701365B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610019087.6A CN105701365B (en) 2016-01-12 2016-01-12 It was found that the method and related system of cancer related gene, process for preparing medicine

Publications (2)

Publication Number Publication Date
CN105701365A true CN105701365A (en) 2016-06-22
CN105701365B CN105701365B (en) 2018-09-07

Family

ID=56226286

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610019087.6A Active CN105701365B (en) 2016-01-12 2016-01-12 It was found that the method and related system of cancer related gene, process for preparing medicine

Country Status (1)

Country Link
CN (1) CN105701365B (en)

Also Published As

Publication number Publication date
CN105701365B (en) 2018-09-07

Similar Documents

Publication Publication Date Title
CN105701365B (en) Method for finding cancer-related genes using miRNA expression data
Ali et al. Sequencing identifies a distinct signature of circulating microRNAs in early radiographic knee osteoarthritis
US20220180964A1 (en) Systems and methods for karyotyping by sequencing
US20190065670A1 (en) Predicting disease burden from genome variants
Min et al. Got target?: computational methods for microRNA target prediction and their extension
EP4447053A2 (en) Deep learning-based splice site classification
Wake et al. Novel microRNA discovery using small RNA sequencing in post-mortem human brain
Kanke et al. miRquant 2.0: an expanded tool for accurate annotation and quantification of microRNAs and their isomiRs from small RNA-sequencing data
Shukla et al. A compilation of Web-based research tools for miRNA analysis
Széll et al. The enigmatic world of mRNA-like ncRNAs: their role in human evolution and in human diseases
Xie et al. SRG-vote: Predicting miRNA-gene relationships via embedding and LSTM ensemble
Jayaswal et al. Identification of microRNAs with regulatory potential using a matched microRNA-mRNA time-course data
Xiao et al. Differential expression pattern-based prioritization of candidate genes through integrating disease-specific expression data
Mohebbi et al. Beyond sequence: A novel image-based model for microrna target prediction
Jin et al. Identification and characterization of salt-tolerance relative miRNAs in Procambarus clarkii by high-throughput sequencing
Ashok et al. Systems biology tools for the identification of potential drug targets and biological markers effective for cancer therapeutics
Bhatt et al. In silico exploration of miRNA from EST data of avocado and predicting its cross-kingdom effects on human
Yuan Characterizing Transcriptionally-Derived Molecular Subsets of Systemic Sclerosis Using Deep Neural Networks and miRNA Activity Scores
Milanese et al. Roles of Skeletal Muscle in Development: A Bioinformatics and Systems Biology Overview
Bandyopadhyay et al. Analyzing miRNA co-expression networks to explore TF-miRNA regulation (Supplementary details)
WO2025062046A1 (en) Small rna (mirxon) signatures for determining biological states
Simpson Jr Investigating Disease Mechanisms and Drug Response Differences in Transcriptomics Sequencing Data
Parveen Advanced hierarchical learning approach for microRNA and target prediction
Wake RNA sequencing differential expression and small RNA analyses of obesity and BMI with post-mortem human brain
HK40117868A (en) Deep learning-based splice site classification

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant