
WO2023217290A1 - Graph neural network-based genotype-phenotype prediction - Google Patents

Graph neural network-based genotype-phenotype prediction

Info

Publication number
WO2023217290A1
WO2023217290A1 (PCT/CN2023/095224)
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
graph neural
node
gene
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2023/095224
Other languages
English (en)
Chinese (zh)
Inventor
章依依
吴翠玲
徐晓刚
王军
李萧缘
虞舒敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to JP2023543455A priority Critical patent/JP7522936B2/ja
Publication of WO2023217290A1 publication Critical patent/WO2023217290A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 Detection of binding sites or motifs
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Definitions

  • the invention relates to the field of intelligent computing breeding, and in particular to gene phenotype prediction based on graph neural networks and corresponding graph neural network training.
  • soybeans are an important component of grain production. How to select and cultivate high-yield soybeans is a problem that agriculturists are currently studying.
  • the proposal of the genome-wide selection algorithm provides a direction for genetic breeding.
  • the existing representative methods include Best Linear Unbiased Prediction (BLUP), Genomic Best Linear Unbiased Prediction (Genomic Best Linear Unbiased Prediction, GBLUP), Ridge Regression Best Linear Unbiased Prediction (RR-BLUP), Least Absolute Shrinkage and Selection Operator (LASSO), etc.
  • The deep learning-based gene phenotype prediction (DeepGS) algorithm proposed by the Northwest A&F University team predicts the phenotypic traits of wheat with a convolutional neural network and exceeds the performance of traditional whole-genome selection algorithms.
  • most of the existing genome-wide selection algorithms based on deep learning use simple convolutional neural networks and do not utilize gene-related prior knowledge.
  • Graph neural networks can be trained on the basis of prior knowledge graphs and achieve considerable results.
  • Graph neural networks are divided into spectrum-based and spatial-domain-based methods, including the Graph Neural Network (GNN), Graph Convolutional Network (GCN), and Graph Attention Network (GAT), among others.
  • the present invention adopts the following technical solutions:
  • a gene phenotype training method based on a graph neural network includes the following steps: for a specific species, construct a graph neural network comprising multiple network layers based on the correlations between the species' gene sites and phenotypes, wherein in each layer of the graph neural network, nodes represent gene sites, an edge indicates that two gene sites are related to the same phenotype, and the edge weight reflects the degree of association between the gene sites; collect genetic data and phenotypic data of multiple samples of the species as training data; for the training data, encode the genetic data based on the probability values of site detection to obtain the gene sites and genotype representations corresponding to the genetic data.
  • encoding the genetic data based on the probability values of gene site detection includes converting the phred-scaled likelihood values PL detected for the genotypes 0/0, 0/1 and 1/1 into the probability P of supporting each genotype, via the phred-scale relation P = 10^(-PL/10), normalized over the three genotypes.
  • the probabilities P obtained for a gene site form a 3-dimensional vector [a, b, c], which is used as the genotype representation of that gene site, where a, b and c are the probabilities that the genotype of the site is 0/0, 0/1 and 1/1 respectively; for undetected gene sites, the genotype representation is the vector [0, 0, 0].
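As a minimal sketch of this encoding step (assuming the standard VCF convention P = 10^(-PL/10) with normalization, since the text does not reproduce the exact formula; the function name is ours):

```python
def encode_site(pl=None):
    """Encode one gene site as a 3-dim genotype probability vector [a, b, c].

    `pl` is a triple of phred-scaled likelihoods (PL) for the genotypes
    0/0, 0/1 and 1/1; None marks an undetected site, encoded [0, 0, 0].
    """
    if pl is None:
        return [0.0, 0.0, 0.0]
    raw = [10.0 ** (-v / 10.0) for v in pl]     # phred scale -> likelihood
    total = sum(raw)
    return [p / total for p in raw]             # normalize to probabilities
```

A PL triple such as (0, 30, 30) then yields a vector dominated by the 0/0 genotype.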
  • each node can be updated through the weight and convolution kernel parameters of the neighborhood nodes.
  • the following steps may be included: for each node c in the current layer of the graph neural network, construct m candidate nodes from the first-order adjacent nodes of node c, where m is an integer greater than 0; sample n nodes without replacement from the m candidate nodes of node c as neighbor nodes of node c, and when m is less than n, take all m candidate nodes as neighbor nodes of node c; aggregate the information of all neighbor nodes of node c to obtain the neighborhood information of node c.
  • perform convolution and activation operations on the information obtained by splicing the neighborhood information with the information h_c of node c, i.e. h'_c = σ(W · CONCAT(h_N(c), h_c)), obtaining the output information h'_c of node c in the current layer of the graph neural network as the input to the next layer of the network.
  • h_i represents the information of the i-th neighbor node of node c
  • w_i represents the weight of the i-th neighbor node of node c.
  • h'_c represents the information output by node c from the current layer network, that is, the input of the next layer network
  • σ represents the activation function
  • W represents the convolution kernel parameters
  • h_c represents the information input by node c to the current network layer.
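The per-node update described above (sample up to n first-order neighbors without replacement, aggregate their weighted information, splice with h_c, then convolve and activate) can be sketched as follows; the weighted-mean aggregation, the ReLU activation, and the uniform stand-in W are assumptions, since only the general shape of the formulas is given:

```python
import random

import numpy as np

def update_node(h_c, neighbors, n=5, rng=random):
    """One per-node graph-layer update (steps S141-S144, sketched).

    h_c       -- node c's input vector (3-dimensional in this document).
    neighbors -- list of (h_i, w_i) pairs: first-order neighbor vectors
                 and their edge weights.
    """
    m = len(neighbors)
    # sample n neighbors without replacement; take all of them when m < n
    chosen = list(neighbors) if m < n else rng.sample(neighbors, n)
    # aggregate the weighted neighbor information (weighted mean assumed)
    h_nbr = sum(w * np.asarray(h) for h, w in chosen) / sum(w for _, w in chosen)
    # splice with h_c, then apply a stand-in linear map W and ReLU to
    # obtain h'_c, which becomes the next layer's input for node c
    spliced = np.concatenate([h_nbr, np.asarray(h_c, dtype=float)])
    W = np.ones((3, spliced.size)) / spliced.size
    return np.maximum(W @ spliced, 0.0)
```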
  • using a multi-layer perceptron to obtain the phenotype classification result corresponding to the genetic data includes the following steps: splice the 3-dimensional vectors output by all nodes in the last layer of the graph neural network to obtain a spliced vector; input the spliced vector into the multi-layer perceptron to obtain the classification result output by the multi-layer perceptron as the phenotype classification result corresponding to the genetic data.
  • a loss function is used to perform supervised training on the model parameters of the graph neural network and/or the multi-layer perceptron, which may specifically include the following steps: divide each of s phenotypes evenly into k intervals as categories, obtaining a ground-truth phenotype vector with dimension s × k, consistent with the dimension of the phenotype classification result; use the loss function to perform multi-phenotype supervised training based on the phenotype classification result and the ground-truth vector of the phenotype.
  • the loss function may be a focal loss function, and the classification loss computed from the phenotype classification result and the ground-truth phenotype vector takes the standard focal-loss form, e.g. FL = -α(1 - p_{x,y})^γ · log(p_{x,y}) for positive samples and -(1 - α) · p_{x,y}^γ · log(1 - p_{x,y}) for negative samples, where:
  • p_{x,y} represents the confidence of the phenotype classification result at abscissa x and ordinate y of the feature map, and the corresponding entry of the ground-truth phenotype vector gives the true category label at this position, 1 for a positive sample and 0 for a negative sample; γ is a value greater than 0, α is a decimal in [0, 1], and both γ and α are fixed values that do not participate in training.
  • a gene phenotype prediction method based on a graph neural network includes the following steps: for the genetic data to be classified, encode the genetic data based on the probability values of site detection to obtain the gene sites and genotype representations corresponding to the genetic data to be classified; input the encoded genetic data to be classified into the trained graph neural network and multi-layer perceptron to obtain the phenotype results corresponding to the genetic data to be classified.
  • the graph neural network and the multi-layer perceptron constitute a gene phenotype prediction network trained by the aforementioned method for the species to which the genetic data to be classified belongs.
  • a graph neural network-based gene phenotype training device, used to implement the graph neural network-based gene phenotype training method, includes a graph neural network building module, a data acquisition module, a precoding module, a genetic data input module and a classification module; the graph neural network building module constructs a graph neural network for genes based on the correlations between gene sites and phenotypes: nodes represent gene sites, an edge indicates that two gene sites are related to the same phenotype, and the edge weight reflects the degree of association between gene sites; the data acquisition module collects the genetic data of the samples, obtains the corresponding phenotypic data, and divides them into a training set and a test set for training and verification.
  • the precoding module, for the training data, precodes the genetic data based on site detection to obtain the gene sites and their corresponding genotypes; the genetic data input module inputs the encoded genetic data into the constructed graph neural network.
  • each network layer uses a one-dimensional convolution kernel with a length of 3, and the convolution kernel is shared between neighborhoods; the classification module splices the output results of all nodes, inputs the spliced result into the multi-layer perceptron, outputs the phenotype classification results, and supervises the training of the model according to the loss function.
  • a graph neural network-based gene phenotype prediction device is built on the above graph neural network-based gene phenotype training device.
  • after the genetic data to be classified is encoded by the precoding module, it is input via the genetic data input module into the trained classification module to obtain the phenotype results corresponding to the genetic data to be classified.
  • the advantages and beneficial effects of the present invention are: first, constructing a gene graph neural network from prior knowledge of gene-phenotype correlations and eliminating weakly correlated gene sites effectively reduces the input gene dimension, achieving dimensionality reduction and denoising; second, dividing phenotypes into multiple intervals for classification prediction effectively reduces the difficulty of training, increases the stability of the model algorithm, and supports simultaneous training and prediction of multiple phenotypes; finally, compared with traditional whole-genome selection algorithms such as rrBLUP, the technical solution of the present invention performs better across various phenotype predictions, including a 20% to 30% improvement in the Pearson correlation coefficient.
  • FIG. 1 is a flow chart of a graph neural network training method for gene phenotype prediction according to an embodiment of the present invention.
  • Figure 2 is a flow chart of a gene phenotype prediction method based on graph neural network according to an embodiment of the present invention.
  • Figure 3 is a simplified model architecture diagram for classifying and identifying gene phenotypes based on graph neural networks according to an embodiment of the present invention.
  • Figure 4 is a schematic structural diagram of a graph neural network training device for gene phenotype prediction according to an embodiment of the present invention.
  • a training method of a graph neural network for gene phenotype prediction may include the following steps S110 to S160.
  • Step S110 For a specific species, construct a graph neural network including a multi-layer network based on the correlation between the gene locus and the phenotype of the species.
  • nodes represent gene sites
  • edges represent two gene sites that are simultaneously related to a certain phenotype
  • the weight of the edge is used to reflect the degree of association between gene sites.
  • a graph neural network of soybean genes can be constructed based on the correlation information between soybean gene loci and phenotypes shown in Table 1 below, which involves 39 gene loci. The more times two gene loci are associated with the same phenotype, the higher the weight of the edge between them; the edge weight therefore reflects the degree of association between gene sites.
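This co-association counting can be sketched as follows; the toy phenotype and locus names are hypothetical stand-ins for the contents of Table 1, which is not reproduced here:

```python
from collections import Counter
from itertools import combinations

def build_gene_graph(associations):
    """Build the gene graph: one edge per pair of loci associated with
    the same phenotype; the weight counts how many phenotypes the pair
    shares, reflecting the degree of association between the loci."""
    edges = Counter()
    for loci in associations.values():
        for a, b in combinations(sorted(set(loci)), 2):
            edges[(a, b)] += 1
    return dict(edges)

# toy stand-in for Table 1 (phenotype -> associated loci; names hypothetical)
graph = build_gene_graph({
    "plant_height": ["g1", "g2", "g3"],
    "oil_content":  ["g1", "g2"],
})
```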
  • Step S120 Collect genetic data and phenotypic data of multiple samples of the species as training data.
  • the training data can be divided into a training set and a test set to respectively train and verify the graph neural network.
  • the genetic data of 3,000 soybean samples, that is, Single Nucleotide Polymorphism (SNP) site information, can be collected; during training and testing, only the 39 gene loci involved in Table 1 need to be used.
  • the input genetic data is encoded based on the probability value PL of gene site detection: the PL values detected for the genotypes 0/0, 0/1 and 1/1 are converted into the probability P of supporting each genotype via the phred-scale relation P = 10^(-PL/10), normalized over the three genotypes.
  • the probabilities P obtained for a certain gene site can be formed into a 3-dimensional vector [a, b, c] as the genotype representation corresponding to that gene site, where a, b and c in turn represent the probabilities that the genotype of the gene site is 0/0, 0/1 and 1/1.
  • for undetected gene sites, the genotype representation can be the vector [0, 0, 0].
  • Step S140 Input the encoded genetic data into the constructed graph neural network, so as to pass through each layer of the graph neural network in sequence.
  • Each layer of the graph neural network uses a one-dimensional convolution kernel with a length of 3, and the convolution kernel is shared between neighborhoods.
  • the encoded genetic data with a dimension of 39 ⁇ 3 is input into the constructed graph neural network.
  • the graph neural network can be a graph neural network with 8 network layers. Each layer of the network uses 3 one-dimensional convolution kernels with a length of 3, and the convolution kernels are shared between neighborhoods.
  • Step S141 In the graph neural network, for each node in the current layer, m candidate nodes are constructed from its first-order neighboring nodes, where m is an integer greater than 0.
  • Step S143 Aggregate the information of all neighbor nodes of node c to obtain the neighbor information of node c
  • the aggregation formula can be expressed as a weighted sum over the sampled neighbors, e.g. h_N(c) = Σ_{i=1..n} w_i · h_i, where:
  • h_i represents the information of the i-th neighbor node of node c
  • w_i represents the weight of the i-th neighbor node of node c.
  • the aggregated neighborhood information of node c can be denoted h_N(c).
  • Step S144 Splice the aggregated neighborhood information h_N(c) of node c with the information h_c of node c, and perform convolution and activation operations on the spliced information, i.e. h'_c = σ(W · CONCAT(h_N(c), h_c)), to obtain the output information h'_c of the current layer of the graph neural network.
  • h'_c represents the information output by node c from the current network layer, that is, the input of the next layer of the network
  • σ represents the activation function
  • W represents the convolution kernel parameters
  • CONCAT represents the splicing operation
  • h_c represents the input of node c to the current network layer.
  • Step S150 Based on the output result of each node in the last layer of the graph neural network, use a multi-layer perceptron to obtain the phenotype classification result corresponding to the genetic data.
  • Step S160 Use a loss function to perform supervised training on the model parameters of the graph neural network and/or the multi-layer perceptron based on the phenotypic classification results and genotype representation corresponding to the genetic data.
  • the loss function is mainly used to calculate the loss value based on the phenotype classification result and the genotype characterization.
  • step S150 may specifically include the following step S151.
  • step S151 Splice the 3-dimensional vectors output by all nodes in the last layer of the graph neural network, then input the spliced vector into the multi-layer perceptron to obtain the classification result output by the multi-layer perceptron as the above phenotype classification result.
  • the first layer of the fully connected network takes the spliced probability vector of dimension 117 as input and outputs an intermediate probability vector of dimension 80; the second layer of the fully connected network takes the dimension-80 intermediate probability vector as input and outputs a final probability vector of dimension 20 as the phenotype classification result.
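The 117 → 80 → 20 head can be sketched as follows; the random stand-in weights, ReLU activation and softmax normalization are assumptions, as only the layer dimensions are specified:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_head(spliced, W1=None, W2=None):
    """Two fully connected layers with the example dimensions: the
    39 node outputs (3-dim each) are spliced into a 117-dim vector,
    mapped 117 -> 80 -> 20. Random weights stand in for trained ones."""
    W1 = rng.standard_normal((80, 117)) * 0.1 if W1 is None else W1
    W2 = rng.standard_normal((20, 80)) * 0.1 if W2 is None else W2
    hidden = np.maximum(W1 @ spliced, 0.0)      # ReLU (assumed)
    logits = W2 @ hidden
    e = np.exp(logits - logits.max())
    return e / e.sum()                          # 20-dim probability vector

probs = mlp_head(np.ones(117))
```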
  • the above step S160 may specifically include the following steps S161 to S162.
  • step S161 Divide each of the s phenotypes evenly into k intervals as categories, obtaining a ground-truth vector with dimension s × k (hereinafter the truth vector).
  • the dimensions of the truth vector correspond one-to-one with the dimensions of the phenotype classification result output by the multi-layer perceptron network. Taking plant height as an example, it can be divided into five equal intervals: extremely short, short, normal, tall, and extremely tall. Other phenotypes are handled analogously and are not described here.
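Building the s × k truth vector by equal-width binning can be sketched as follows; the per-phenotype (lo, hi) ranges and uniform interval edges are assumptions:

```python
import numpy as np

def phenotype_truth(values, ranges, k=5):
    """Build the flattened s x k truth vector: each of the s phenotype
    values is one-hot encoded into one of k equal-width intervals over
    its (lo, hi) range, e.g. plant height into five categories from
    extremely short to extremely tall."""
    s = len(values)
    truth = np.zeros((s, k))
    for i, (v, (lo, hi)) in enumerate(zip(values, ranges)):
        idx = min(int((v - lo) / (hi - lo) * k), k - 1)  # clamp the max value
        truth[i, idx] = 1.0
    return truth.reshape(-1)
```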
  • step S162 Use the loss function to perform multi-phenotype supervised training based on the phenotype classification result and the truth vector of the phenotype.
  • the loss function in supervised training can use the focal loss function Focal Loss
  • the formula for calculating the classification loss can take the standard focal-loss form, e.g. FL = -α(1 - p_{x,y})^γ · log(p_{x,y}) for positive samples and -(1 - α) · p_{x,y}^γ · log(1 - p_{x,y}) for negative samples:
  • p_{x,y} represents the confidence of the phenotype classification result at abscissa x and ordinate y of the feature map, and the truth vector of the phenotype gives the true category label of the target at this position, 1 for a positive sample and 0 for a negative sample; γ is a value greater than 0, α is a decimal in [0, 1], and both γ and α are fixed values that do not participate in training.
  • the training effect is optimal when α takes a value of 0.1 and γ takes a value of 2.
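A per-element sketch of this loss, assuming the standard focal-loss form with the reported best settings α = 0.1 and γ = 2 (whether the two fixed parameters map exactly onto this α/γ notation is an assumption):

```python
import math

def focal_loss(p, y, alpha=0.1, gamma=2.0):
    """Focal loss for one predicted confidence p in (0, 1) with ground
    truth label y (1 = positive sample, 0 = negative sample). alpha
    and gamma are fixed and do not participate in training."""
    if y == 1:
        return -alpha * (1.0 - p) ** gamma * math.log(p)
    return -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)
```

The (1 - p)^γ factor down-weights already well-classified samples so training focuses on hard examples.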
  • SGD Stochastic Gradient Descent
  • GPUs Graphics Processing Units
  • the batch size is 16 and the number of training steps is 50k.
  • the initial learning rate is 0.01 and is then scaled down by a factor of 10 at 20k and 40k steps.
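The reported schedule (base rate 0.01, divided by 10 at 20k and 40k of the 50k SGD steps) can be sketched as a small helper; the function name is ours:

```python
def learning_rate(step, base=0.01, drops=(20_000, 40_000)):
    """Stepwise SGD schedule: start at `base` and divide by 10 at each
    step in `drops` (20k and 40k of the 50k total training steps)."""
    lr = base
    for d in drops:
        if step >= d:
            lr /= 10.0
    return lr
```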
  • a gene phenotype prediction method based on graph neural network is also provided.
  • the gene phenotype prediction method may include the following steps S210 to S220.
  • step S210 for the genetic data to be classified, the genetic data is encoded based on the probability value of site detection to obtain the gene site and genotype representation corresponding to the genetic data to be classified.
  • the processing of step S210 is basically the same as that of step S130. For details, please refer to the above and will not be described again here.
  • step S220 the encoded genetic data to be classified is input into the trained graph neural network and multi-layer perceptron to obtain the phenotypic results corresponding to the genetic data to be classified.
  • the graph neural network and the multi-layer perceptron can be a gene phenotype prediction network obtained by the aforementioned training method for the species to which the genetic data to be classified belongs; the specific training process is not described again here.
  • each layer of the graph neural network includes 5 nodes.
  • the output of the graph neural network is input to the multi-layer perceptron 320, which outputs the final classification result of the genetic data.
  • a training device for graph neural network for gene phenotype prediction including a graph neural network building module, a data acquisition module, a precoding module, a gene data input module and a classification module.
  • the graph neural network building module is aimed at a specific species and constructs a graph neural network including a multi-layer network based on the correlation between the genetic loci and phenotype of the species.
  • the node represents a gene site
  • the edge represents two gene sites related to the same phenotype
  • the weight of the edge is used to reflect the degree of association between the gene sites.
  • the data acquisition module collects the genetic data and phenotypic data of multiple samples of the species as training data; the precoding module, for the training data, encodes the genetic data based on the probability values of site detection to obtain the gene sites and genotype representations corresponding to the genetic data; the genetic data input module inputs the encoded genetic data into the graph neural network so that it passes through each layer of the graph neural network in sequence, wherein each network layer uses a one-dimensional convolution kernel with a length of 3, shared between neighborhoods; the classification module, based on the output result of each node in the last layer of the graph neural network, uses a multi-layer perceptron to obtain the phenotype classification result corresponding to the genetic data, so that a loss function can be used to supervise the training of the model parameters of the graph neural network and/or the multi-layer perceptron according to the phenotype classification result.
  • a graph neural network-based gene phenotype prediction device, used to implement graph neural network-based gene phenotype prediction and classification, may include the above-mentioned precoding module and the gene phenotype prediction network obtained via the above-mentioned training method and/or training device; the encoded genetic data to be classified is input into the trained graph neural network and multi-layer perceptron to obtain the corresponding phenotype results.
  • the graph neural network and the multi-layer perceptron are gene phenotype prediction networks obtained by using the aforementioned training method and/or training device to train the species to which the genetic data to be classified belongs.
  • the present invention also provides an embodiment of a gene phenotype prediction device based on a graph neural network.
  • an embodiment of the present invention provides a device for gene phenotype prediction based on a graph neural network, including a memory (specifically, it may include a non-volatile memory 430 and/or a memory 440) and one or more processors 410 .
  • executable code is stored in the memories 430 and 440.
  • when the one or more processors 410 execute the executable code, they implement the graph neural network-based gene phenotype prediction method of the above embodiments.
  • the device also includes an internal bus 420 to connect the processor 410 and the memories 430, 440.
  • the device may also include a network interface 450 for the device to communicate with the outside.
  • the embodiment of the graph neural network-based gene phenotype prediction device of the present invention can be applied to any device with data processing capabilities, such as a computer.
  • the device embodiments may be implemented by software, or by hardware or a combination of software and hardware. Taking software implementation as an example, the device is formed as a logical device by the processor of the host device reading the corresponding computer program instructions from non-volatile memory into memory and running them. At the hardware level, Figure 4 shows a hardware structure diagram of a device with data processing capabilities that hosts the graph neural network-based gene phenotype prediction device of the present invention. In addition to the processor, memory, network interface, and non-volatile memory shown in Figure 4, the host device may also include other hardware according to its actual functions, which is not described further here.
  • since the device embodiment basically corresponds to the method embodiment, refer to the partial description of the method embodiment for relevant details.
  • the device embodiments described above are only illustrative.
  • the units described as separate components may or may not be physically separated.
  • the components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of the present invention. Persons of ordinary skill in the art can understand and implement this without creative effort.
  • Embodiments of the present invention also provide a computer-readable storage medium on which a program is stored.
  • the program is executed by a processor, the gene phenotype prediction method based on a graph neural network in the above embodiments is implemented.
  • the computer-readable storage medium may be an internal storage unit of any device with data processing capabilities as described in any of the foregoing embodiments, such as a hard disk or a memory.
  • the computer-readable storage medium can also be an external storage device of any device with data processing capabilities, such as a plug-in hard disk, Smart Media Card (SMC), SD card, or flash memory card (Flash Card) equipped on the device.
  • the computer-readable storage medium may also include both an internal storage unit and an external storage device of any device with data processing capabilities.
  • the computer-readable storage medium is used to store the computer program and other programs and data required by any device with data processing capabilities, and can also be used to temporarily store data that has been output or is to be output.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Biomedical Technology (AREA)
  • Public Health (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Bioethics (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present disclosure relates to graph neural network-based genotype-phenotype prediction. According to an embodiment of the method of the present disclosure, for a specific species, a graph neural network comprising multiple network layers is constructed according to the correlation between gene loci and phenotypes of the species. Each network layer of the graph neural network uses a one-dimensional convolution kernel with a length of 3, and the convolution kernel is shared across neighborhoods; in each network layer, nodes represent gene loci, an edge represents that two gene loci are associated with the same phenotype, and the weight of each edge reflects the degree of association between the gene loci. The result obtained after sample gene data of the species passes through the graph neural network is input into a multi-layer perceptron, and a corresponding phenotype classification result can be obtained. The graph neural network and the multi-layer perceptron can be trained and verified on the basis of the difference between the classification result of the sample gene data and a true value, and phenotype classification can be performed using the trained graph neural network and multi-layer perceptron.
PCT/CN2023/095224 2022-10-11 2023-05-19 Graph neural network-based genotype-phenotype prediction Ceased WO2023217290A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2023543455A JP7522936B2 (ja) 2022-10-11 2023-05-19 グラフニューラルネットワークに基づく遺伝子表現型予測

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211238697.7A CN115331732B (zh) 2022-10-11 2022-10-11 基于图神经网络的基因表型训练、预测方法及装置
CN202211238697.7 2022-10-11

Publications (1)

Publication Number Publication Date
WO2023217290A1 true WO2023217290A1 (fr) 2023-11-16

Family

ID=83915021

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/095224 2022-10-11 2023-05-19 Gene phenotype prediction based on a graph neural network Ceased WO2023217290A1 (fr)

Country Status (3)

Country Link
JP (1) JP7522936B2 (fr)
CN (1) CN115331732B (fr)
WO (1) WO2023217290A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119440822A (zh) * 2024-10-24 2025-02-14 Xidian University Fine-grained microservice resource prediction method based on heterogeneous graph neural network

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115331732B (zh) * 2022-10-11 2023-03-28 Zhejiang Lab Gene phenotype training and prediction method and apparatus based on graph neural network
CN120530394A (zh) * 2022-12-27 2025-08-22 深圳华大生命科学研究院 Training method for a batch-effect removal model, and removal method and apparatus
CN116072214B (zh) * 2023-03-06 2023-07-11 Zhejiang Lab Intelligent phenotype prediction and training method and apparatus based on gene saliency enhancement
CN116580767B (zh) * 2023-04-26 2024-03-12 Zhejiang Lab Gene phenotype prediction method and system based on self-supervision and Transformer
CN117198406B (zh) * 2023-09-21 2024-06-11 亦康(北京)医药科技有限公司 Feature screening method, system, electronic device, and medium
CN116959561B (zh) * 2023-09-21 2023-12-19 University of Science and Technology Beijing Gene interaction prediction method and apparatus based on a neural network model
CN116992919B (zh) * 2023-09-28 2023-12-19 Zhejiang Lab Multi-omics-based plant phenotype prediction method and apparatus
CN119694406B (zh) * 2025-02-24 2025-09-09 Jihua Laboratory Method, apparatus, device, and storage medium for selecting a high-quality breeding population

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106096327A (zh) * 2016-06-07 2016-11-09 广州麦仑信息科技有限公司 Gene trait recognition method based on Torch supervised deep learning
CN113593635A (zh) * 2021-08-06 2021-11-02 Shanghai Academy of Agricultural Sciences Maize phenotype prediction method and system
CN114333986A (zh) * 2021-09-06 2022-04-12 Tencent Technology (Shenzhen) Co., Ltd. Method and apparatus for model training, drug screening, and affinity prediction
CN114649097A (zh) * 2022-03-04 2022-06-21 Guangzhou University of Chinese Medicine (Guangzhou Institute of Chinese Medicine) Drug efficacy prediction method based on graph neural network and omics information
CN114765063A (zh) * 2021-01-12 2022-07-19 Shanghai Jiao Tong University Protein-nucleic acid binding site prediction method based on graph neural network representation
CN114783524A (zh) * 2022-06-17 2022-07-22 Zhejiang Lab Pathway anomaly detection system based on adaptive-resampling deep encoder network
US20220301658A1 (en) * 2021-03-19 2022-09-22 X Development Llc Machine learning driven gene discovery and gene editing in plants
CN115331732A (zh) * 2022-10-11 2022-11-11 Zhejiang Lab Gene phenotype training and prediction method and apparatus based on graph neural network

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5438644A (en) * 1991-09-09 1995-08-01 University Of Florida Translation of a neural network into a rule-based expert system
CN108388768A (zh) * 2018-02-08 2018-08-10 南京恺尔生物科技有限公司 Biological characteristic prediction method using a neural network model built with biological knowledge
EP3899685A4 (fr) * 2018-12-21 2022-09-07 Teselagen Biotechnology Inc. Method, apparatus, and computer-readable medium for efficiently optimizing a phenotype with a specialized prediction model
CN110010201A (zh) * 2019-04-16 2019-07-12 Shandong Agricultural University RNA alternative splicing site recognition method and system
US11228505B1 (en) * 2021-01-29 2022-01-18 Fujitsu Limited Explanation of graph-based predictions using network motif analysis
CN114360654B (zh) * 2022-01-05 2025-08-01 Chongqing University of Posts and Telecommunications Graph neural network dataset construction method based on gene expression
CN114637923B (zh) * 2022-05-19 2022-09-02 Zhejiang Lab Data information recommendation method and apparatus based on hierarchical attention graph neural network


Also Published As

Publication number Publication date
JP2024524795A (ja) 2024-07-09
CN115331732B (zh) 2023-03-28
JP7522936B2 (ja) 2024-07-25
CN115331732A (zh) 2022-11-11

Similar Documents

Publication Publication Date Title
WO2023217290A1 (fr) Gene phenotype prediction based on a graph neural network
US12045723B2 (en) Neural network method and apparatus
JP7684287B2 (ja) Single-cell RNA-seq data processing
CN108595916B (zh) Method for inferring full gene expression profiles based on a generative adversarial network
CN111899882B (zh) Method and system for predicting cancer
CN111798935B (zh) Neural-network-based universal compound structure-property correlation prediction method
CN110097921B (zh) Radiomics-based method and system for visual quantification of intratumoral gene heterogeneity in glioma
CN112232413A (zh) High-dimensional data feature selection method based on graph neural network and spectral clustering
WO2010064414A1 (fr) Gene clustering program, gene clustering method, and gene cluster analysis device
WO2023124342A1 (fr) Low-cost automatic neural architecture search method for image classification
CN119763665B (zh) Gene regulatory network inference method and system based on graph representation learning
Lin et al. A novel chromosome cluster types identification method using ResNeXt WSL model
CN109344898A (zh) Convolutional neural network image classification method based on sparse-coding pretraining
CN114882008A (zh) Algorithm for detecting differential expression of tumor driver genes based on pathological image features
CN115908909A (zh) Evolutionary neural architecture search method and system based on Bayesian convolutional neural networks
CN107193993A (zh) Medical data classification method and apparatus based on local-learning feature weight selection
CN112687329A (zh) Cancer prediction system based on mutation information of non-cancerous tissue and method for constructing same
CN117976185A (zh) Breast cancer risk assessment method and system combining deep learning
CN117079049A (zh) Few-shot fine-grained image classification method based on a dual-metric network
CN114692748B (zh) Method for identifying ground-glass pulmonary nodules
CN119993269A (zh) Cancer driver gene prediction method and system based on graph neural network
CN115601342B (zh) Sample and feature selection method and apparatus based on hybrid sparsity
CN108304546B (zh) Medical image retrieval method based on content similarity and a Softmax classifier
CN118824361A (zh) Genetic variant pathogenicity prediction method and system based on a deep learning model
TWI796251B (zh) Method for building a robust prediction model, prediction system, and Alzheimer's disease prediction system

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2023543455

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23803058

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 23803058

Country of ref document: EP

Kind code of ref document: A1