
CN116825199A - Method and system for screening siRNA sequence to reduce off-target effect - Google Patents


Info

Publication number
CN116825199A
Authority
CN
China
Prior art keywords
sirna
screening
data
sequence
characteristic
Legal status
Pending
Application number
CN202310144980.1A
Other languages
Chinese (zh)
Inventor
王全军
Current Assignee
Individual
Original Assignee
Individual
Application filed by Individual
Priority to CN202310144980.1A
Publication of CN116825199A
Legal status: Pending


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Analytical Chemistry (AREA)
  • Public Health (AREA)
  • Chemical & Material Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to the technical field of bioinformatics for siRNA sequence screening, and in particular to a method and a system for screening siRNA sequences to reduce off-target effects. The method comprises: obtaining siRNA candidate sequence data and feature data; constructing a machine learning model from the feature data; training the machine learning model on a training set to obtain an siRNA sequence screening model; and extracting data from a test set and inputting them into the siRNA sequence screening model to obtain screening scores, the siRNA with the highest screening score being the optimal siRNA. Compared with the comparative example, the method and system achieve markedly higher screening efficiency, accuracy, sensitivity, specificity and MCC, with no overfitting; off-target effects of the siRNA on the target mRNA can be almost entirely avoided, and the method has important reference value for future siRNA interference-efficiency prediction.

Description

Method and system for screening siRNA sequence to reduce off-target effect
Technical Field
The invention relates to the technical field of bioinformatics for siRNA sequence screening, and in particular to a method and a system for screening siRNA sequences to reduce off-target effects.
Background
RNA interference (RNAi) is a biological phenomenon that is ubiquitous in nature and leads to degradation of target mRNA. Gene silencing based on siRNA has become a powerful tool for functional gene analysis, and experimental results show that antisense RNA inhibits gene expression by binding complementarily to mRNA sequences. The silencing effect of siRNA is very strong: 1 to 3 double-stranded siRNA molecules can mediate gene silencing in a cell. siRNA is produced by Dicer cleavage of double-stranded RNA (dsRNA) and assembles with an enzyme complex into the RNA-induced silencing complex (RISC); the double-stranded siRNA is unwound into single strands, and the antisense strand then binds the target mRNA, promoting its enzymatic degradation. Because the success of siRNA depends on efficient interaction between the siRNA and the mRNA, designing highly efficient and specific siRNAs is a challenging problem in siRNA applications. Many websites for designing high-efficiency siRNA are currently available, but it is not clear which characteristic parameters determine siRNA efficiency. The complex mechanism of siRNA-mRNA interaction means that each position in the siRNA sequence has certain base preferences; however, existing siRNA design rules are often inconsistent with one another and their mechanisms are not fully understood, so siRNAs designed with them do not always inhibit target gene expression well, which hampers the development of siRNA technology. Many studies have shown that these rules do not apply to all target genes and have different value for different targets, so the existing design rules need to be re-examined and further optimized to reduce the impact of their inconsistency.
A large number of biological experiments show that siRNAs binding to different target sites on the same mRNA have different silencing efficiencies. Because finding suitable siRNA binding sites on mRNA experimentally is inefficient, offers poor control over the off-target rate, and is costly, slow and subject to many interfering factors, predicting suitable siRNA binding sites on mRNA computationally is of great significance. Early prediction of siRNA targets on mRNA relied mainly on researchers observing the frequency of each base in mRNA target samples bound by siRNA; this was inefficient and rarely gave optimal results. With the growing number of siRNA-mRNA target samples and the rise of machine learning, base sequence features of siRNA-bound mRNA targets have been extracted and used to train prediction models on large datasets, greatly improving the efficiency and accuracy of siRNA target prediction. However, existing prediction models consider only the base sequence features of the siRNA-bound mRNA target and not the RNA secondary structure at the target site, so their predictive performance remains unsatisfactory.
Disclosure of Invention
In order to solve, or at least partially alleviate, the above technical problems, the present invention provides a novel method and system for screening siRNA sequences to reduce off-target effects. The method uses a new machine learning model and training method to score candidate siRNAs and selects the highest-scoring siRNA as the most suitable one. Compared with the comparative example, the method and system achieve markedly higher screening efficiency, accuracy, sensitivity, specificity and MCC, with no overfitting; off-target effects of the siRNA on the target mRNA can be almost entirely avoided, and the method has important reference value for future siRNA interference-efficiency prediction. To this end, the invention provides the following technical solutions:
In a first aspect, the present invention provides a method for screening siRNA sequences to obtain miRNA free from off-target effects, comprising:
S100, acquiring siRNA candidate sequence data;
S200, extracting feature data from the siRNA candidate sequence data and dividing them into a training set and a test set, the feature data comprising the sequence feature data, secondary structure feature data, thermodynamic feature data and mid-target rate data of each siRNA candidate sequence;
S300, constructing a machine learning model from the sequence feature data, secondary structure feature data, thermodynamic feature data and mid-target rate data;
S400, training the machine learning model with the training set to obtain an siRNA sequence screening model;
S500, extracting feature vectors from the test set and inputting them into the siRNA sequence screening model to obtain screening scores, the feature vectors comprising sequence feature vectors, secondary structure feature vectors and thermodynamic feature vectors; and
S600, selecting the siRNA with the optimal targeting effect on the mRNA according to the screening scores and outputting data, the output data comprising the sequence data, feature values and screening score of the optimal siRNA, where the siRNA with the highest screening score is the optimal siRNA.
Further, the siRNA candidate sequences comprise siRNAs with a mid-target rate above 90%, between 70% and 90%, between 50% and 70%, and below 50%; the training set comprises the sequence feature data, secondary structure feature data, thermodynamic feature data and mid-target rate data of the siRNAs in each of these four groups.
Further, the sequence features include G/C content, U-T1, U-T2, U-T3, U-T4, A, N, (G-C)%, (A-U)%, (G+C)% and (A+U)%, the secondary structural features include hydrogen bonding coefficients, and the thermodynamic features include ΔGm, ΔGs, ΔGd, P, W and M.
Further, the G/C content feature is extracted as the percentages of G and of C, respectively, in the candidate siRNA;
U-T1 is extracted by judging whether the 5' end of the antisense strand is A/U; if so, the feature value is 1, otherwise 0;
U-T2 is extracted by judging whether the 5' end of the sense strand is G/C; if so, the feature value is 1, otherwise 0;
U-T3 is extracted by judging whether the 5'-terminal third of the antisense strand is AU-rich; if so, the feature value is 1, otherwise 0;
U-T4 is extracted by judging whether there is a run of more than 9 consecutive G/C bases; if so, the feature value is 1, otherwise 0;
A is extracted by judging whether U-T1, U-T2, U-T3 and U-T4 are all satisfied; if so, the feature value is 1, otherwise 0;
N is extracted by judging whether none of U-T1, U-T2, U-T3 and U-T4 is satisfied; if so, the feature value is 1, otherwise 0;
(G-C)% is calculated as 100 × (G% - C%)/(G% + C%);
(A-U)% is calculated as 100 × (A% - U%)/(A% + U%);
(G+C)% is calculated as G% + C%;
(A+U)% is calculated as A% + U%.
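The sequence rules above can be illustrated with a short Python sketch. It is not the patent's implementation and rests on several assumptions: the input is the siRNA sense strand written 5'→3'; the antisense strand is its exact reverse complement; base percentages are taken over the sense strand; "AU-rich" (U-T3) is read as an A+U share above a hypothetical threshold `au_threshold` (50% by default) in the 5'-terminal third of the antisense strand; "more than 9 consecutive G/C bases" (U-T4) uses the hypothetical parameter `gc_run_limit`; and N is read as "none of the four rules holds". Because the text describes "G/C content" as the G and C percentages separately, both are reported. All function names are hypothetical.

```python
from typing import Dict

# Watson-Crick complements for RNA bases
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def percent(seq: str, base: str) -> float:
    """Percentage of `base` in `seq`."""
    return 100.0 * seq.count(base) / len(seq)


def longest_gc_run(seq: str) -> int:
    """Length of the longest stretch of consecutive G/C bases."""
    best = current = 0
    for base in seq:
        current = current + 1 if base in "GC" else 0
        best = max(best, current)
    return best


def sequence_features(sense: str, au_threshold: float = 50.0,
                      gc_run_limit: int = 9) -> Dict[str, float]:
    """Sequence features of one siRNA; thresholds are hypothetical defaults."""
    sense = sense.upper().replace("T", "U")  # accept DNA-style input
    antisense = reverse_complement(sense)
    g, c, a, u = (percent(sense, b) for b in "GCAU")

    # 5'-terminal third of the antisense strand, used by U-T3
    third = antisense[: max(1, len(antisense) // 3)]
    au_third = 100.0 * sum(b in "AU" for b in third) / len(third)

    ut1 = int(antisense[0] in "AU")                  # antisense 5' end is A/U
    ut2 = int(sense[0] in "GC")                      # sense 5' end is G/C
    ut3 = int(au_third > au_threshold)               # AU-rich 5' third of antisense
    ut4 = int(longest_gc_run(sense) > gc_run_limit)  # long G/C stretch present
    rules = (ut1, ut2, ut3, ut4)

    return {
        "G%": g, "C%": c,
        "U-T1": ut1, "U-T2": ut2, "U-T3": ut3, "U-T4": ut4,
        "A": int(all(rules)),      # all four rules hold
        "N": int(not any(rules)),  # none of the four rules holds
        "(G-C)%": 100.0 * (g - c) / (g + c) if g + c else 0.0,
        "(A-U)%": 100.0 * (a - u) / (a + u) if a + u else 0.0,
        "(G+C)%": g + c,
        "(A+U)%": a + u,
    }
```

A G/C run is the same length on both strands, so U-T4 can be evaluated on the sense strand alone.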
Further, the hydrogen bond coefficient is calculated with the following formula:
in this formula, i is the index of the nucleotide in the target region of the mRNA to which the siRNA corresponds, and P_H-bond is the probability that the i-th nucleotide forms hydrogen bonds with other nucleotides in the same mRNA.
Further, ΔGm is extracted by calculating the energy required to open the target mRNA binding site;
ΔGs is extracted by calculating the energy required to open the siRNA;
ΔGd is extracted by calculating the energy (in kcal/mol) released by binding of the siRNA to the mRNA;
P is extracted as the quotient of the distance from the 5' end of the mRNA to the first base bound by the siRNA and the length of the mRNA;
W is extracted by counting the non-complementary bases in the secondary structure formed by the target mRNA;
M is extracted by calculating the energy released by formation of the secondary structure of the target mRNA.
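ΔGm, ΔGs, ΔGd and M require an RNA folding and energy tool and are not sketched here. Feature P, by contrast, is a simple ratio. The sketch below assumes that the target site is the exact sense-strand sequence within the mRNA (perfect complementarity to the antisense strand) and that positions are counted from 1 at the mRNA 5' end; the function names are hypothetical.

```python
def target_site_position(mrna: str, sense: str) -> int:
    """1-based start of the site paired by the siRNA antisense strand,
    assuming a perfectly complementary target (site == sense sequence)."""
    site = sense.upper().replace("T", "U")
    idx = mrna.upper().replace("T", "U").find(site)
    if idx < 0:
        raise ValueError("target site not found in mRNA")
    return idx + 1


def feature_p(mrna: str, sense: str) -> float:
    """Feature P: start position of the target site divided by mRNA length."""
    return target_site_position(mrna, sense) / len(mrna)
```

For example, a site starting at base 5 of a 16-nt mRNA gives P = 5/16.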
Further, the machine learning model is
wherein S is the screening score of a given siRNA against a given target mRNA; m is the number of all candidate siRNAs for that target mRNA; n is the number of features; T1 is the siRNA sequence feature value for the target mRNA, T2 is the secondary structure feature value for the target mRNA, and T3 is the thermodynamic feature value for the target mRNA; p, q and j are model parameters.
Further, step S400 specifically comprises:
S401, after the training set is obtained, extracting the sequence feature values, secondary structure feature values and thermodynamic feature values of each training sample to form the corresponding feature vectors;
S402, training on the features of the training samples with the machine learning model and determining the optimal parameters of the model by 10-fold cross-validation;
S403, establishing the siRNA sequence screening model with the determined optimal parameters.
In a second aspect, the present invention provides an siRNA sequence screening apparatus for obtaining miRNA free from off-target effects, comprising:
an input unit for receiving siRNA candidate sequence data;
a storage unit for storing the program of the siRNA screening model and the feature data of the siRNA candidate sequences, the feature data comprising the sequence feature data, secondary structure feature data, thermodynamic feature data and mid-target rate data corresponding to each siRNA sequence;
an arithmetic unit for scanning the siRNA candidate sequences by running the program;
and an output unit for outputting the analysis results for the siRNA candidate sequences.
In a third aspect, the present invention provides an siRNA sequence screening system for obtaining miRNA free from off-target effects, comprising the screening apparatus according to the second aspect;
a database for storing an siRNA dataset comprising known siRNA sequence data, silencing data, test data and clinical information data for each encoded mRNA; and
a learning device for determining the optimal parameters of the machine learning model.
Further technical effects of the siRNA sequence screening method, apparatus and system for obtaining miRNA free from off-target effects provided by the invention are described in detail in the examples.
Drawings
Fig. 1 is a flow chart of the screening method for obtaining siRNA sequences free from off-target miRNAs according to an embodiment of the present invention.
Fig. 2 is a schematic flow chart of step S400 in fig. 1.
FIG. 3 shows electrophoresis results for the mRNA expression level in mouse MFC gastric cancer cells after interference with the siRNA sequences screened for the NOX4 gene mRNA; the lanes are, in order, the Marker and the 10 screened siRNA sequences.
FIG. 4 shows electrophoresis results for the mRNA expression level in mouse MFC gastric cancer cells after interference with the siRNA sequences screened for the NOX4 gene mRNA by the comparative example; the lanes are, in order, the Marker and the 10 screened siRNA sequences.
FIG. 5 shows electrophoresis results for the mRNA expression level in mouse MFC gastric cancer cells after interference with the siRNA sequences screened for the SLC22A17 gene mRNA; the lanes are, in order, the Marker and the 10 screened siRNA sequences.
FIG. 6 shows electrophoresis results for the mRNA expression level in mouse MFC gastric cancer cells after interference with the siRNA sequences screened for the SLC22A17 gene mRNA by the comparative example; the lanes are, in order, the Marker and the 10 screened siRNA sequences.
Fig. 7 is a schematic diagram of the siRNA sequence screening apparatus for obtaining miRNA free from off-target effects according to an embodiment of the present invention.
Fig. 8 is a schematic diagram of the siRNA sequence screening system for obtaining miRNA free from off-target effects according to an embodiment of the present invention.
Detailed Description
In order to make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to examples. It should be understood that the specific embodiments described here are for illustration only and are not intended to limit the scope of the invention. Reagents not specifically described in the present invention are conventional and commercially available; methods not described in detail are routine experimental methods known in the prior art.
The following technique corresponds to a novel screening technique for reducing the off-target rate of siRNA to target mRNA.
Fig. 1 is an example flow chart of the screening method for obtaining siRNA sequences free from off-target miRNAs.
In step S100, the screening apparatus acquires siRNA candidate sequence data. The siRNA candidate sequence data include data for various siRNAs, which can be obtained, for example, from conventional public databases or from the website of Huesken et al. For example, following the Huesken website, siRNAs were designed against the mRNAs encoded by 52 immune-related genes: CGB5, INHBB, TMSBI5A, DKK1, GDF7, SLC22A17, GHR, MMP12, PGF, FGFR4, IGHV3-35, INHBA, IGHD3-16, IGLV3-22, MASP1, BMP8A, CARD11, NOX4, NPR3, IGHA1, ZNF385DAS2, Z995721, U623174, SLC22A11, SIGLECI2, RNU7154P, REC114, pitpmm 2AS1, PAKS, PAD13, MTND1P23, MTC02P12, MPEG1, MMP3, LRRN4CL, LINC00886, LCN2, IGLV170, IGHV 460, IGKV6D21, IGHV 3D20, IGKV10R2108, IGKV idb, IGKV1D43, IGKV18, IGHV349, IGHV322, IGHV320, iggck HDGFLI, GGTAIP and IGHV 20; siRNA sequences that bind their respective mRNAs with good efficiency were obtained. The siRNA-related data designed for each of these mRNAs are given in Table 1.
Across these mRNA targets, 528 siRNAs have a mid-target rate above 90%, 894 between 70% and 90%, 739 between 50% and 70%, and 638 below 50%, 2799 siRNAs in total.
Specifically, the data for these siRNA sequences consist of a SAM file generated by RNA-seq and an siRNA annotation FASTA file; the input data must be in SAM and FASTA format.
In step S200, the screening apparatus extracts the feature data and divides them into a training set and a test set. The feature data comprise the sequence feature data, secondary structure feature data, thermodynamic feature data and mid-target rate data corresponding to each siRNA sequence.
The sequence feature data comprise the values of G/C content, U-T1, U-T2, U-T3, U-T4, A, N, (G-C)%, (A-U)%, (G+C)% and (A+U)%. The siRNA sequence features related to the mid-target rate used in the present invention are shown in Table 1.
Table 1 siRNA sequence characterization
The secondary structure feature data are hydrogen bond coefficient values. Because some regions of the mRNA fold back on themselves to form compact secondary structures such as hairpins, binding of the siRNA to these regions is hindered, so the corresponding siRNAs inhibit target gene expression only weakly. Smaller steric hindrance and a looser secondary structure therefore make it easier for the siRNA to bind the target mRNA, which is key to a strong silencing effect and a high mid-target rate. The next task is therefore to predict, through a series of calculations, how readily the mRNA target sequence corresponding to each siRNA forms complex secondary structures, and to select suitable siRNA sequences accordingly.
The difficulty in forming a complex secondary structure from an mRNA target sequence corresponding to siRNA can be assessed by the hydrogen bond coefficient, which can be calculated by the following formula.
In this formula, i is the index of the nucleotide in the target region to which the siRNA corresponds, and P_H-bond is the probability that the i-th nucleotide forms hydrogen bonds with other nucleotides in the same mRNA; P_H-bond is calculated from all possible secondary structures of the mRNA predicted by Mfold:
P_H-bond = (number of hydrogen bonds that can be formed by the 1st to i-th nucleotides) / (number of all possible secondary structures of the mRNA)
NHB is the number of hydrogen bonds that the i-th nucleotide can form: 3 for G or C, and 2 for A or T (U in RNA).
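The coefficient's summation formula is not reproduced in this text, so the sketch below computes only P_H-bond as defined above, under stated assumptions: the alternative secondary structures of the whole mRNA (for example, suboptimal Mfold folds) are supplied as dot-bracket strings; a nucleotide contributes its NHB value to a structure only when it is paired in that structure; and the target region is given as 0-based Python indices [start, end). The name `p_hbond` is hypothetical.

```python
from typing import List, Sequence

# Hydrogen bonds per nucleotide as stated in the text: G/C = 3, A/U(T) = 2
NHB = {"G": 3, "C": 3, "A": 2, "U": 2, "T": 2}


def p_hbond(mrna: str, structures: Sequence[str], start: int, end: int) -> List[float]:
    """Cumulative P_H-bond for nucleotides 1..i of the target region [start, end).

    Each dot-bracket structure covers the whole mRNA; '(' and ')' mark paired
    nucleotides. The cumulative hydrogen-bond count over all structures is
    divided by the number of structures.
    """
    if not structures:
        raise ValueError("need at least one secondary structure")
    if any(len(s) != len(mrna) for s in structures):
        raise ValueError("structure length must match mRNA length")
    total = 0
    values = []
    for pos in range(start, end):
        nhb = NHB[mrna[pos].upper()]
        total += sum(nhb for s in structures if s[pos] in "()")
        values.append(total / len(structures))
    return values
```

A region that is unpaired in every structure yields zeros, which under this reading marks an accessible target site.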
The thermodynamic feature data are the values of ΔGm, ΔGs, ΔGd, P, W and M. The thermodynamic features of siRNA-mRNA binding extracted in the invention are shown in Table 2.
Table 2 Thermodynamic characteristics of siRNA binding to mRNA
In some embodiments, the screening apparatus may use the existing RNAup software to extract the feature data, or the extraction may be implemented by a program written in C#.
In some embodiments, 80% of the siRNAs in each of the four mid-target rate groups (above 90%, 70% to 90%, 50% to 70%, and below 50%) among the 2799 siRNAs are selected as training samples for the training set, and the remainder are used as test samples for the test set.
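A per-group 80/20 split like the one described can be sketched as follows. The bin boundaries at exactly 50%, 70% and 90% are an assumption (the text's ranges overlap at those values), the function names are hypothetical, and the fixed seed only makes the split reproducible.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def mid_target_class(rate: float) -> str:
    """Bin a mid-target rate (in %) into the four groups used in the text."""
    if rate > 90:
        return ">90%"
    if rate >= 70:
        return "70-90%"
    if rate >= 50:
        return "50-70%"
    return "<50%"


def stratified_split(rates: Sequence[float], train_frac: float = 0.8,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices), taking train_frac of each group."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for idx, rate in enumerate(rates):
        groups[mid_target_class(rate)].append(idx)
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for members in groups.values():
        rng.shuffle(members)
        cut = round(train_frac * len(members))
        train.extend(members[:cut])
        test.extend(members[cut:])
    return sorted(train), sorted(test)
```

With the group sizes given in step S100 (528, 894, 739 and 638 siRNAs), this sketch yields 2238 training and 561 test samples.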
In step S300, a machine learning model for the screening score of an siRNA against an mRNA is provided according to the above embodiments:
wherein S is the screening score of a given siRNA against a given target mRNA; m is the number of all candidate siRNAs for that target mRNA; n is the number of features: n1 = 11 for the siRNA sequence features above, n2 = 1 for the secondary structure feature, and n3 = 6 for the thermodynamic features.
T1 is the siRNA sequence feature value for the target mRNA, T2 is the secondary structure feature value for the target mRNA, and T3 is the thermodynamic feature value for the target mRNA. p, q and j are model parameters.
In step S400, supervised training of the machine learning model is performed with the training set to obtain an siRNA sequence screening model, the training set consisting of the sequence feature values, secondary structure feature values and thermodynamic feature values of the corresponding siRNAs.
In this embodiment, step S400 includes steps S401 to S403.
In step S401, after the screening apparatus acquires the training set, the sequence feature values, secondary structure feature values and thermodynamic feature values of each training sample are extracted to form the corresponding feature vectors, for example feature vectors with 19 dimensions in total. The screening apparatus stores all feature vectors of each training sample as one row representing that sample; that is, each row is one instance.
In step S402, the machine learning model described above is trained on the features of the training samples, and its optimal parameters are determined by 10-fold cross-validation;
in step S403, an siRNA sequence screening model is established according to the determined optimal parameters.
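Because the formula of the machine learning model is not reproduced in this text, the sketch below treats model fitting and scoring as a caller-supplied function and shows only the 10-fold cross-validation used to choose parameters such as p, q and j. The parameter grid, the `evaluate` callback and the assumption that samples are shuffled before splitting are hypothetical choices, not details from the patent.

```python
import itertools
from typing import Callable, Dict, List, Sequence, Tuple

Params = Dict[str, float]


def k_fold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds; return (train, validation) pairs."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, val))
        start += size
    return folds


def select_parameters(n_samples: int, grid: Dict[str, Sequence[float]],
                      evaluate: Callable[[Params, List[int], List[int]], float],
                      k: int = 10) -> Tuple[Params, float]:
    """Return the parameter combination with the best mean k-fold score.

    evaluate(params, train_idx, val_idx) should fit the model on train_idx and
    return a validation score where higher is better (e.g. accuracy or MCC).
    Samples are assumed to have been shuffled before indexing.
    """
    folds = k_fold_indices(n_samples, k)
    names = list(grid)
    best_params: Params = {}
    best_score = float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = sum(evaluate(params, tr, va) for tr, va in folds) / k
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

The selected parameters would then be used to fit the final screening model on the whole training set, as in step S403.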
In step S500, feature vectors are extracted from the test set and input into the siRNA sequence screening model, the feature vectors comprising sequence feature vectors, secondary structure feature vectors and thermodynamic feature vectors.
In step S600, the siRNA with the optimal targeting effect on the mRNA is selected according to the screening scores obtained, and data are output, the output data comprising the sequence data, feature values and screening score of the optimal siRNA. The siRNA with the highest screening score is the optimal siRNA.
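Step S600 amounts to sorting the candidates by screening score S and keeping the best one together with its sequence and feature values. A minimal sketch, using a hypothetical record type `ScoredSiRNA`:

```python
from typing import Dict, List, NamedTuple


class ScoredSiRNA(NamedTuple):
    """Output record: sequence data, feature values and screening score."""
    sequence: str
    features: Dict[str, float]
    score: float


def select_optimal(candidates: List[ScoredSiRNA], top: int = 1) -> List[ScoredSiRNA]:
    """Return the `top` candidates with the highest screening score S."""
    return sorted(candidates, key=lambda c: c.score, reverse=True)[:top]
```

Setting `top` above 1 returns a ranked shortlist, which matches how several candidates per gene are carried into the interference tests below.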
In a comparative example, a support vector machine (SVM) was used for data training and optimal siRNA screening, as follows:
in this comparative example, LIBSVM version 2.86-1 and MATLAB R2012a were used for data training and screening. The SVM training process comprises the following steps:
(1) Convert all training set data into the format required by the LIBSVM software package;
(2) Determine whether the data need to be normalized (scaled);
(3) Select the RBF kernel function and determine the optimal parameters -c and -g by cross-validation;
(4) Train on the whole training set with the optimal parameters -c and -g to obtain the support vector machine model;
(5) Test and predict with the obtained model.
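Steps (1) and (2) of the comparative example can be sketched without LIBSVM itself: each feature column is min-max scaled to [-1, 1] (in the spirit of LIBSVM's svm-scale tool, though not guaranteed to match its output exactly) and the rows are written in LIBSVM's sparse text format, `<label> <index>:<value> ...`, with 1-based feature indices and zero values omitted. The numeric class labels and the handling of constant columns are assumptions, since the text does not say how the mid-target groups were encoded; function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Range = Tuple[float, float]


def fit_ranges(rows: Sequence[Sequence[float]]) -> List[Range]:
    """(min, max) of each feature column, fitted on the training set."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_ranges(rows: Sequence[Sequence[float]], ranges: Sequence[Range],
                 lower: float = -1.0, upper: float = 1.0) -> List[List[float]]:
    """Min-max scale each column to [lower, upper]; constant columns map to lower."""
    scaled = []
    for row in rows:
        scaled.append([
            lower if hi == lo else lower + (upper - lower) * (v - lo) / (hi - lo)
            for v, (lo, hi) in zip(row, ranges)
        ])
    return scaled


def to_libsvm_lines(labels: Sequence[int], rows: Sequence[Sequence[float]]) -> str:
    """LIBSVM text format: '<label> 1:<v1> 2:<v2> ...', zero values omitted."""
    lines = []
    for label, row in zip(labels, rows):
        feats = " ".join(f"{i}:{v:g}" for i, v in enumerate(row, start=1) if v != 0)
        lines.append(f"{label} {feats}".rstrip())
    return "\n".join(lines)
```

The ranges fitted on the training set should be reused to scale the test set, so that both sets are transformed identically.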
Model evaluation:
Using the machine learning models and screening methods of the example and of the comparative example, feature values were extracted for the 2799 siRNAs against immune-related genes affecting survival in gastric or liver cancer, training and test sets were constructed, data training and optimal siRNA screening were performed, and the models were evaluated using accuracy, sensitivity, specificity and MCC.
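The four evaluation metrics follow from a binary confusion matrix. The text does not state how the four mid-target groups were reduced to positive and negative classes, so this sketch simply takes the true/false positive and negative counts as input; the function name is hypothetical.

```python
import math
from typing import Dict


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient."""
    total = tp + tn + fp + fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # true positive rate
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # true negative rate
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }
```

MCC ranges from -1 to 1 and, unlike accuracy, stays informative when the classes are unbalanced, which is why it is reported alongside the other three metrics.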
Table 3 model training performance of examples and comparative examples
As Table 3 shows, with the model training method of the comparative example, accuracy, sensitivity, specificity and MCC increased gradually between 5 and 15 training iterations and did not change from 15 to 20 iterations, indicating that the number of training iterations was not excessive and overfitting was avoided. The highest values of this method were an accuracy of 88.63%, a sensitivity of 86.47%, a specificity of 88.09% and an MCC of 0.7832.
When the machine learning model of the invention was used for training, accuracy, sensitivity, specificity and MCC increased slowly between 5 and 15 iterations and reached their maxima after 20 iterations: an accuracy of 98.82%, a sensitivity of 96.36%, a specificity of 97.27% and an MCC of 0.9125. All four metrics are markedly higher than those of the comparative example, and no overfitting was observed.
Therefore, training with the machine learning model of the invention gives better results in screening the optimal siRNA for a given mRNA, and the model has important reference value for future siRNA interference-efficiency prediction.
To further verify that the machine learning model and screening method of the invention can screen the optimal siRNA for an mRNA more accurately and effectively reduce the off-target rate, the following validation was performed.
siRNAs were designed for the mRNAs encoded by 20 immune-related genes (CGB5, INHBB, TMSBI5A, DKK1, GDF7, SLC22A17, GHR, MMP12, PGF, FGFR4, IGHV3-35, INHBA, IGHD3-16, IGLV3-22, MASP1, BMP8A, CARD11, NOX4, NPR3 and IGHA1) using, respectively, the http:// design dharmacon siRNA website, WI siRNA Selection Program (http:// siRNA wi limit /), http:// design siRNAs jp/, www wittrogenecom/siRNA Search, http:// sisearchcgkise/, tp:// opticosanled/ and http:// sourcefenginet/; combining the siRNAs from all of these tools finally yielded 1093 sequences.
Steps S100 to S400 were applied to the 1093 siRNA sequences using the model training and screening method of the invention and that of the comparative example, respectively, to obtain parameter-optimized siRNA screening models. These models were used to predict siRNA sequences against the mRNAs encoded by NOX4 and SLC22A17, as shown in Tables 4 to 7. The 3 best siRNA sequences obtained by the example and by the comparative example were each tested for interference.
The interference test steps generally include:
(1) Recombinant vector
Recombinant vectors were designed for the siRNA sequences targeting the mRNAs of the 2 immune-related genes affecting gastric cancer survival. For example, shRNAs were synthesized for the selected siRNAs, and the synthesized single-stranded oligonucleotides were annealed into double-stranded DNA templates. The pLL3.7 vector was double-digested with the restriction enzymes Hpa I and Xho I, and the digestion products were recovered with an agarose gel DNA recovery kit. The recovered linearized vector was ligated to the double-stranded shRNA overnight in a 16 ℃ water bath. 5 μL of the ligation product was used to transform 30 μL of E. coli DH5α competent cells. Transformants were selected on ampicillin solid medium, positive colonies were picked and expanded, and sequencing was performed by Shanghai Biotechnology Engineering Services Co., Ltd. Recombinant vectors were extracted from correctly sequenced clones.
(2) Transfection
Cell line: mouse MFC gastric cancer cells, purchased from the Shanghai Cell Institute.
MFC cells were transfected with each recombinant vector and cultured for 60 h; total RNA was then extracted, and its concentration and purity were measured with a nucleic acid/protein analyzer. The extracted total RNA was reverse-transcribed into cDNA following the reverse transcription kit protocol and stored at -20 ℃. With GAPDH as the internal reference gene, detection primers were designed for NOX4 and SLC22A17:
NOX4-F: gccaccatggctgtgtcctggaggagc, SEQ ID NO. 81;
NOX4-R: gtgctgaaagactctttattgtattcaaatct, SEQ ID NO. 82;
SLC22A17-F: cccttgtctctaaggattggcg, SEQ ID NO. 83;
SLC22A17-R: atctgccgcttcactatcagcc, SEQ ID NO. 84.
qRT-PCR was performed with the Premix Ex Taq™ kit, with 3 replicates per sample. The 20 μL reaction system was: Premix Ex Taq (2×) 10 μL; upstream and downstream primers (10 μmol/L) 0.4 μL each; DNA template (<100 ng) 2 μL; ddH2O (sterilized purified water) 7.2 μL. The reaction program was: pre-denaturation at 95 ℃ for 30 s; 40 cycles of denaturation at 95 ℃ for 5 s, annealing at 51 ℃ for 20 s and extension at 72 ℃ for 30 s; then 95 ℃ for 10 s, 51 ℃ for 15 s and 95 ℃ for 10 s. After the reaction, the amplification and melting curves were checked, and the relative expression of each target gene was calculated by the 2^-ΔΔCt method, with each measurement repeated 3 times. Hypothesis testing used one-way analysis of variance (ANOVA). The same method was used to measure the relative expression of NOX4 and SLC22A17 mRNA in mouse MFC gastric cancer cells not transfected with the recombinant vector.
In Tables 4 to 7, the silencing efficiency is the percentage ratio of the relative NOX4 or SLC22A17 mRNA expression in transfected mouse MFC gastric cancer cells to the corresponding relative expression in mouse MFC gastric cancer cells not transfected with the recombinant vector.
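The 2^-ΔΔCt calculation behind Tables 4 to 7 can be sketched as follows. The Ct values are illustrative, not data from the patent. The text above defines silencing efficiency as the ratio of relative expression in transfected to untransfected cells; because a knockdown above 90% is usually reported as 100 × (1 − ratio), the sketch returns the ratio and, as an assumed convention, that complementary percentage separately.

```python
from statistics import mean
from typing import Sequence


def relative_expression(target_treated: Sequence[float], ref_treated: Sequence[float],
                        target_control: Sequence[float], ref_control: Sequence[float]) -> float:
    """Relative target mRNA level (treated vs. control) by the 2^-ΔΔCt method.

    Each argument holds the Ct values of the technical replicates; the reference
    gene (GAPDH in the text) normalises for the amount of input cDNA.
    """
    delta_treated = mean(target_treated) - mean(ref_treated)
    delta_control = mean(target_control) - mean(ref_control)
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)


def knockdown_percent(rel_expr: float) -> float:
    """Assumed convention: percentage of target mRNA removed relative to control."""
    return 100.0 * (1.0 - rel_expr)


# Illustrative Ct values: the target appears 4 cycles later after transfection.
rel = relative_expression([28.0, 28.0, 28.0], [18.0, 18.0, 18.0],
                          [24.0, 24.0, 24.0], [18.0, 18.0, 18.0])
print(rel, knockdown_percent(rel))  # 0.0625 93.75
```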
Table 4 Silencing efficiency of siRNA sequences targeting the NOX4 mRNA screened in the examples
Table 5 siRNA sequences targeting the NOX4 mRNA screened in the comparative example
FIGS. 3 and 4 show electrophoresis results for NOX4 mRNA expression in mouse MFC gastric cancer cells treated with the NOX4-targeting siRNA sequences screened in the examples and in the comparative example, respectively. The target band in FIG. 3 is barely visible, whereas the target bands in the last 3 lanes of FIG. 4 are brighter, indicating that the siRNAs provided by the examples silence more efficiently.
As shown in Tables 4 and 5, the silencing efficiency of the siRNA sequences against the NOX4 mRNA screened in the examples exceeded 90%. The siRNA sequences against the NOX4 mRNA screened in the comparative example had lower silencing efficiency, and some of them showed no silencing at all in the test; the off-target rate was 30% (the off-target rate being the proportion of the 10 siRNAs tested whose silencing efficiency was 0).
Table 6 siRNA sequences targeting the SLC22A17 mRNA screened in the examples
Table 7 siRNA sequences targeting the SLC22A17 mRNA screened in the comparative example
FIGS. 5 and 6 show electrophoresis results for SLC22A17 mRNA expression in mouse MFC gastric cancer cells treated with the SLC22A17-targeting siRNA sequences screened in the examples and in the comparative example, respectively. The target bands in FIG. 5 are barely visible, whereas the target bands in the last 5 lanes of FIG. 6 are brighter, indicating that the siRNAs provided by the examples silence more efficiently.
As shown in Tables 6 and 7, the silencing efficiency of the siRNA sequences against the SLC22A17 mRNA screened in the examples exceeded 90%. The siRNA sequences against the SLC22A17 mRNA screened in the comparative example had lower silencing efficiency, and some of them showed no silencing at all in the test; the off-target rate was 50% (the off-target rate being the proportion of the 10 siRNAs tested whose silencing efficiency was 0).
These results show that the siRNA screening method provided by the invention yields siRNA sequences that interfere with the target mRNA with higher accuracy and higher silencing efficiency, with almost no off-target effects.
The following describes analysis with an siRNA sequence screening device or system for obtaining siRNAs free from off-target miRNAs. The researchers implemented this device or system as a web application on a private network and tested the results. The web application can run on a web server on which the target siRNA screening program is installed.
FIG. 7 shows an example of an siRNA sequence screening apparatus. The siRNA sequence screening device (800) comprises an input unit (810), a storage unit (820), an operation unit (830) and an output unit (840).
The input unit (810) receives siRNA candidate sequence data. These data cover various siRNAs and are available, for example, from conventional public databases or from the Huesken et al. website. Specifically, they consist of a SAM file generated by RNA-seq and an siRNA annotation FASTA file, so the input data must be in SAM and FASTA format. The input unit (810) may be a physical interface device such as a keyboard, mouse or touch pad. Alternatively, it may be a device that receives multiplexed data stored on an external storage medium (USB or the like), or a communication device that receives multiplexed data from an external network.
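Reading the siRNA annotation FASTA file is the first job of the input unit. A minimal stdlib sketch, assuming one record per siRNA with the sequence possibly wrapped over several lines; the header text in the example is hypothetical. SAM files are better handled by a dedicated library such as pysam than by hand-written parsing.

```python
import io
from typing import Iterator, Optional, TextIO, Tuple

VALID = set("ACGUTN")


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA stream; wrapped lines are joined."""
    header: Optional[str] = None
    chunks = []
    for raw in handle:
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and old-style comment lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            seq = line.upper()
            bad = set(seq) - VALID
            if bad:
                raise ValueError(f"unexpected characters {sorted(bad)} in record {header!r}")
            chunks.append(seq)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical annotation file with one wrapped record.
example = io.StringIO(">si1 NOX4\nGCAUCGAUCG\nAUCGAUCGA\n>si2 SLC22A17\nAAGCUGACCCUGAAGUUCA\n")
records = dict(read_fasta(example))
print(records)
```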
The storage unit (820) stores the program of the siRNA screening model and the feature data of the siRNA candidate sequences. The feature data comprise the sequence feature data, secondary structure feature data, thermodynamic feature data and on-target rate data corresponding to each siRNA sequence. Using the training set data, the program computes an association degree between each siRNA candidate sequence and the miRNA, and calculates the association relation for each targeting-interference event between siRNA and miRNA. The program then combines the association degree and the association relation mathematically into a screening score S and screens the siRNA candidate sequences using this score as the criterion. The siRNA screening model used to compute the screening score is provided in the examples above.
The arithmetic unit (830) scans the siRNA candidate sequences by means of the program. It selects, from the candidate siRNAs, those with no or low off-target miRNA effects and computes the screening score S for the selected siRNAs. The arithmetic unit (830) is a processor device, such as a CPU or an AP (application processor), that performs specific operations under program control.
The output unit (840) outputs the analysis results for the siRNA candidate sequences. It may be a display device that outputs video, a printer that outputs text, or the like. The output unit (840) may also be a communication device that transfers the analysis results to other devices.
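The division of labour among units 810 to 840 can be mirrored in software. The sketch below is hypothetical throughout: the feature extractor, the off-target check and the score are toy placeholders supplied by the caller, because the patent's screening formula is not reproduced in this text. It only shows the data flow of receive, filter, score, rank and output.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

Features = Dict[str, float]


@dataclass
class SiRNAScreeningDevice:
    """Hypothetical software mirror of units 810-840; all callables are placeholders."""
    extract_features: Callable[[str], Features]   # part of the program held in storage unit 820
    screening_score: Callable[[Features], float]  # stand-in for the stored screening model
    off_target_hits: Callable[[str], int]         # stand-in off-target check
    max_off_target_hits: int = 0

    def receive(self, records: Iterable[Tuple[str, str]]) -> Dict[str, str]:
        """Input unit 810: accept (name, sequence) pairs, e.g. parsed from FASTA."""
        return {name: seq.upper() for name, seq in records}

    def compute(self, candidates: Dict[str, str]) -> List[Tuple[str, float]]:
        """Arithmetic unit 830: drop off-target candidates, then score and rank the rest."""
        kept = {n: s for n, s in candidates.items()
                if self.off_target_hits(s) <= self.max_off_target_hits}
        scored = [(n, self.screening_score(self.extract_features(s))) for n, s in kept.items()]
        return sorted(scored, key=lambda item: item[1], reverse=True)

    @staticmethod
    def output(results: List[Tuple[str, float]]) -> str:
        """Output unit 840: tab-separated text for a display, printer or network peer."""
        return "\n".join(f"{name}\t{score:.3f}" for name, score in results)


def gc_fraction(seq: str) -> Features:
    return {"gc": (seq.count("G") + seq.count("C")) / len(seq)}


device = SiRNAScreeningDevice(
    extract_features=gc_fraction,
    screening_score=lambda f: 1.0 - abs(f["gc"] - 0.5),  # toy score: prefer about 50% GC
    off_target_hits=lambda s: 1 if "GGGG" in s else 0,    # toy off-target rule
)
ranked = device.compute(device.receive([("a", "augcaugc"), ("b", "ggggauau"), ("c", "gcgcgcau")]))
print(device.output(ranked))
```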
Further, to obtain a parameter-optimized siRNA screening model, the invention also provides, as shown in FIG. 8, a system for screening siRNA sequences free from off-target miRNAs, comprising the siRNA sequence screening device described above, a database (900) and a learning device (910).
The database (900) stores the siRNA dataset, which includes known siRNA sequence data, silencing data, test data and clinical information data for each coding mRNA. The database (900) may also store unpublished siRNA datasets containing information derived from specific laboratory studies and experimental results.
For example, the database may hold the coding-mRNA data used in the examples above, such as the coding-mRNA data of 52 immune-related genes: CGB5, INHBB, TMSBI5A, DKK1, GDF7, SLC22A17, GHR, MMP12, PGF, FGFR4, IGHV3-35, INHBA, IGHD3-16, IGLV3-22, MASP1, BMP8A, CARD11, NOX4, NPR3, IGHA1, znnf 385DAS2, Z995721, U623174, SLC22A11, SIGLECI2, RNU7154P, REC114, pitppnm 2AS1, PAKS, PAD13, MTND1P23, MTC02P12, MPEG1, MMP3, LRRN4CL, LINC00886, LCN2, IGLV170, IGKV 460, IGKV6D21, IGKV3D20, IGKV320, IGKV10R2108, kv idb, IGKV1D43, IGKV18, IGKV 385, IGKV320, IGHV320 and hv 99, together with the corresponding targeting data obtained for these 52 genes.
The learning device (910) determines the optimal parameters of the machine learning model by determining the correlation between the siRNA screening scores and the screened siRNAs (920). For example, the learning device may determine the mRNA-siRNA expression correlation by performing steps S401 to S403: it trains the machine learning model described above on the features of the training samples and uses 10-fold cross-validation to confirm the model's optimal parameters.
The learning device (910) computes screening scores from the association degree between the screened siRNAs and the mRNA; the screening score reflects this correlation (siRNA screening model (930)).
The learning device (910) receives multiplexed data from the database (900), which holds the siRNA data and clinical information described above. By analyzing the siRNA data, scanning and analyzing the siRNA candidate sequences with the method of the foregoing embodiments, and performing steps S401 to S403 on the machine learning model, the learning device (910) uses 10-fold cross-validation to confirm the optimal parameters of the model.
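Because the screening formula is not reproduced in this text, the sketch below swaps in an ordinary ridge regression over three hypothetical summary features (standing in for T1, T2 and T3) and uses 10-fold cross-validation to choose its regularisation strength before refitting on all data. It illustrates the shape of steps S401 to S403 only; it is not the patented model, and the parameters p, q and j are not estimated here. The data are synthetic.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(X: List[Vector], y: List[float], lam: float) -> List[float]:
    """Ridge regression; the intercept (last weight) is not penalised."""
    rows = [list(x) + [1.0] for x in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (lam if i == j and i < k - 1 else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(w: List[float], x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def select_by_kfold(X: List[Vector], y: List[float], lambdas: Sequence[float],
                    k: int = 10, seed: int = 0) -> Tuple[float, List[float]]:
    """Pick the lambda with the lowest mean k-fold validation MSE, then refit on all data."""
    if len(y) < k:
        raise ValueError("need at least k samples")
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    best_lam, best_err = lambdas[0], float("inf")
    for lam in lambdas:
        fold_errs = []
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            w = fit_ridge([X[i] for i in train], [y[i] for i in train], lam)
            fold_errs.append(mean((predict(w, X[i]) - y[i]) ** 2 for i in fold))
        err = mean(fold_errs)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam, fit_ridge(X, y, best_lam)


rng = random.Random(1)
X = [[rng.random() for _ in range(3)] for _ in range(60)]  # hypothetical T1, T2, T3 per siRNA
y = [2 * t1 - t2 + 0.5 * t3 + 1 for t1, t2, t3 in X]       # synthetic silencing target
lam, weights = select_by_kfold(X, y, [0.0, 1.0, 10.0])
print(lam, [round(w, 3) for w in weights])
```

With noise-free synthetic data the unregularised fit wins; on real, noisy silencing data a positive lambda would typically be selected.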
The siRNA sequence screening system can perform analyses in several ways and present the results visually. It can search for target siRNAs online on the basis of integrated siRNA information and analysis. To this end, the system is built to give convenient access to multiple sets of chemical data for gastric or liver cancer and to siRNA silencing-efficiency and targeting-rate analyses, taking clinical impact and data visualization into account, so that it can provide a ranking of target siRNAs for each subtype.
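The per-subtype ranking function can be as simple as grouping scored siRNAs by subtype and sorting each group by screening score, highest first, consistent with step S600 treating the highest score as optimal. A minimal sketch; the subtype labels, siRNA identifiers and scores are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rank_by_subtype(rows: Iterable[Tuple[str, str, float]],
                    top_n: int = 3) -> Dict[str, List[Tuple[str, float]]]:
    """Group (subtype, siRNA id, screening score) rows and keep the top_n per subtype."""
    groups: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for subtype, sirna_id, score in rows:
        groups[subtype].append((sirna_id, score))
    return {subtype: sorted(items, key=lambda item: item[1], reverse=True)[:top_n]
            for subtype, items in groups.items()}


ranking = rank_by_subtype([
    ("subtype_1", "siNOX4-a", 0.81),
    ("subtype_1", "siNOX4-b", 0.93),
    ("subtype_2", "siSLC22A17-a", 0.77),
], top_n=1)
print(ranking)
```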
In addition, the siRNA screening method described above may be implemented as a program (or application) containing an algorithm executable on a computer, and the program may be provided stored on a non-transitory computer-readable medium.
A non-transitory readable medium refers to a medium that semi-permanently stores data and can be read by a device, and not a medium that stores data for a short time such as registers, caches, and memories. In particular, it may be provided by storing the above-described various applications or programs in a non-transitory readable medium such as a CD, DVD, hard disk, blu-ray disc, USB, memory card, ROM, or the like.
The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention.

Claims (7)

1. A method for screening siRNA sequences free from off-target miRNAs, comprising:
S100: acquiring siRNA candidate sequence data;
S200: extracting feature data from the siRNA candidate sequence data and forming a training set and a test set from the feature data, the feature data comprising sequence feature data, secondary structure feature data, thermodynamic feature data and on-target rate data corresponding to each siRNA candidate sequence;
S300: constructing a machine learning model from the sequence feature data, the secondary structure feature data, the thermodynamic feature data and the on-target rate data;
S400: training the machine learning model with the training set to obtain an siRNA sequence screening model;
S500: extracting feature vectors from the test set and inputting them into the siRNA sequence screening model, the feature vectors comprising sequence feature vectors, secondary structure feature vectors and thermodynamic feature vectors; and
S600: screening, according to the screening scores, the siRNA with the optimal targeting effect on the mRNA and outputting data, the output data comprising the sequence data, feature values and screening score of the optimal siRNA, wherein the siRNA with the highest screening score is the optimal siRNA;
the machine learning model is as follows:
wherein S is the screening score of a given siRNA against a given target mRNA; M is the number of all candidate siRNAs described above for that target mRNA; n is the number of features; T1 is the sequence feature value of the siRNA for the target mRNA, T2 is the secondary structure feature value for the target mRNA, and T3 is the thermodynamic feature value for the target mRNA; p, q and j are model parameters.
2. The screening method of claim 1, wherein the siRNA candidate sequences comprise siRNAs with an on-target rate > 90%, an on-target rate of 70%-90%, an on-target rate of 50%-70% and an on-target rate < 50%; and the training set comprises the sequence feature data, secondary structure feature data, thermodynamic feature data and on-target rate data corresponding to the siRNAs in each of these four on-target rate ranges.
3. The screening method of claim 1, wherein the sequence features include G/C content, U-T1, U-T2, U-T3, U-T4, A, N, (G-C)%, (A-U)%, (G+C)% and (A+U)%; the secondary structure features include the hydrogen bond coefficient; and the thermodynamic features include ΔGm, ΔGs, ΔGd, P, W and M.
4. The screening method according to claim 3, wherein the G/C content is extracted as the percentage of G and C nucleotides in the candidate siRNA;
U-T1 is extracted by judging whether the 5' end of the antisense strand is A/U: if so, the feature value is 1, otherwise 0;
U-T2 is extracted by judging whether the 5' end of the sense strand is G/C: if so, the feature value is 1, otherwise 0;
U-T3 is extracted by judging whether the 5'-terminal third of the antisense strand is A/U-rich: if so, the feature value is 1, otherwise 0;
U-T4 is extracted by judging whether there is a run of more than 9 consecutive G/C nucleotides: if so, the feature value is 1, otherwise 0;
A is extracted by judging whether U-T1, U-T2, U-T3 and U-T4 are all satisfied: if so, the feature value is 1, otherwise 0;
N is extracted by judging whether none of U-T1, U-T2, U-T3 and U-T4 is satisfied: if so, the feature value is 1, otherwise 0;
(G-C)% is extracted as 100 × (G% - C%)/(G% + C%);
(A-U)% is extracted as 100 × (A% - U%)/(A% + U%);
(G+C)% is extracted as G% + C%;
(A+U)% is extracted as A% + U%.
5. The method according to claim 3, wherein,
the feature extraction method of the hydrogen bond coefficient comprises the following formula:
in this formula, i is the index of a nucleotide in the target region of the mRNA corresponding to the siRNA, and P_H-bond is the probability that the i-th nucleotide forms hydrogen bonds with other nucleotides in the same mRNA.
6. The method according to claim 3, wherein,
ΔGm is extracted by calculating the energy required to open the binding site on the target mRNA;
ΔGs is extracted by calculating the energy required to open the siRNA;
ΔGd is extracted by calculating the energy (in kcal/mol) released when the siRNA binds to the mRNA;
P is extracted as the quotient of the distance from the 5' end of the mRNA to the position of the first base bound by the siRNA, divided by the length of the mRNA;
W is extracted by calculating the number of unpaired bases in the secondary structure formed by the target mRNA;
M is extracted by calculating the energy released by formation of the secondary structure of the target mRNA.
7. The screening method according to claim 1, wherein step S400 specifically comprises:
S401: after the training set is obtained, extracting the sequence feature values, secondary structure feature values and thermodynamic feature values of each training sample to form the corresponding feature vectors;
S402: training the machine learning model on the features of the training samples and confirming its optimal parameters by cross-validation;
S403: building the siRNA sequence screening model from the optimal parameters so determined.
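Claim 4 defines the sequence features procedurally. The sketch below computes them from a sense strand under stated assumptions: a blunt duplex whose antisense strand is the exact reverse complement, percentages taken over the sense strand, and an A/U fraction above 0.5 as the assumed meaning of "A/U-rich" for U-T3. Under this reading G/C content and (G+C)% coincide. The function names and the example sequence are illustrative, not from the patent; the thermodynamic features of claim 6 would additionally need an RNA folding package such as ViennaRNA.

```python
import re
from typing import Dict

RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_of(sense: str) -> str:
    """Antisense strand (5'->3'), assuming a blunt, fully complementary duplex."""
    return sense.translate(RNA_COMPLEMENT)[::-1]


def sequence_features(sense: str, au_rich_cutoff: float = 0.5) -> Dict[str, float]:
    s = sense.upper().replace("T", "U")
    if not s or set(s) - set("ACGU"):
        raise ValueError("sense strand must contain only A, C, G and U/T")
    anti = antisense_of(s)
    n = len(s)
    pct = {base: 100.0 * s.count(base) / n for base in "ACGU"}
    third = anti[: max(1, n // 3)]                      # 5'-terminal third of the antisense strand
    u_t1 = int(anti[0] in "AU")                         # antisense 5' end is A/U
    u_t2 = int(s[0] in "GC")                            # sense 5' end is G/C
    u_t3 = int(sum(b in "AU" for b in third) / len(third) > au_rich_cutoff)  # assumed cutoff
    u_t4 = int(re.search(r"[GC]{10,}", s) is not None)  # run of more than 9 consecutive G/C
    flags = (u_t1, u_t2, u_t3, u_t4)
    gc, au = pct["G"] + pct["C"], pct["A"] + pct["U"]
    return {
        "G/C content": gc,
        "U-T1": u_t1, "U-T2": u_t2, "U-T3": u_t3, "U-T4": u_t4,
        "A": int(all(flags)),      # all four criteria satisfied
        "N": int(not any(flags)),  # none of the four criteria satisfied
        "(G-C)%": 100.0 * (pct["G"] - pct["C"]) / gc if gc else 0.0,
        "(A-U)%": 100.0 * (pct["A"] - pct["U"]) / au if au else 0.0,
        "(G+C)%": gc,
        "(A+U)%": au,
    }


features = sequence_features("GCAAGCUGACCCUGAAGUU")  # hypothetical 19-nt sense strand
print(features)
```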

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310144980.1A CN116825199A (en) 2023-02-21 2023-02-21 Method and system for screening siRNA sequence to reduce off-target effect

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310144980.1A CN116825199A (en) 2023-02-21 2023-02-21 Method and system for screening siRNA sequence to reduce off-target effect

Publications (1)

Publication Number Publication Date
CN116825199A true CN116825199A (en) 2023-09-29

Family

ID=88120996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310144980.1A Pending CN116825199A (en) 2023-02-21 2023-02-21 Method and system for screening siRNA sequence to reduce off-target effect

Country Status (1)

Country Link
CN (1) CN116825199A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned
Effective date of abandoning: 20240816