WO2025059950A1

WO2025059950A1 - A machine learning technique for protein-ligand binding prediction

Info

Publication number: WO2025059950A1
Application number: PCT/CN2023/120202
Authority: WO
Inventors: Lijun Wu; Yingce XIA; Shufang XIE; Tao Qin; Tieyan LIU
Original assignee: Microsoft Technology Licensing LLC
Current assignee: Microsoft Technology Licensing LLC
Priority date: 2023-09-20
Filing date: 2023-09-20
Publication date: 2025-03-27
Anticipated expiration: 2026-03-20

Abstract

Disclosed is an end-to-end model that combines pocket prediction and docking to generate accurate and fast protein-ligand binding predictions. In some configurations, a unique ligand-informed pocket prediction module is leveraged for docking pose estimation. The model further enhances the docking process by incrementally integrating the predicted pocket to optimize protein-ligand binding, reducing discrepancies between training and inference. The disclosed protein-ligand modeling techniques provide advantages in terms of effectiveness and efficiency compared to existing methods.

Description

A MACHINE LEARNING TECHNIQUE FOR PROTEIN-LIGAND BINDING PREDICTION

BACKGROUND

Modeling the interaction between proteins and ligands and accurately predicting their binding structures is a critical yet challenging task in drug discovery. Recent advancements in deep learning have shown promise in addressing this challenge, with sampling-based and regression-based methods emerging as two prominent approaches. However, these methods have notable limitations. Sampling-based methods often suffer from low efficiency due to the need to generate multiple candidate structures for selection. On the other hand, regression-based methods offer fast predictions but may experience decreased accuracy. Additionally, the variation in protein sizes often requires the use of external modules for selecting suitable binding pockets, further reducing efficiency.

It is with respect to these and other considerations that the disclosure made herein is presented.

SUMMARY

Disclosed is an end-to-end model that combines pocket prediction and docking to generate accurate and fast protein-ligand binding predictions. In some configurations, a unique ligand-informed pocket prediction module is leveraged for docking pose estimation. The model further enhances the docking process by incrementally integrating the predicted pocket to optimize protein-ligand binding, reducing discrepancies between training and inference. The techniques of this disclosure demonstrate advantages in terms of effectiveness and efficiency compared to existing methods.

Features and technical benefits other than those explicitly described above will be apparent from a reading of the following Detailed Description and a review of the associated drawings. This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. The term “techniques, ” for instance, may refer to system (s) , method (s) , computer-readable instructions, module (s) , algorithms, hardware logic, and/or operation (s) as permitted by the context described above and throughout the document.

BRIEF DESCRIPTION OF THE DRAWINGS

The Detailed Description is described with reference to the accompanying figures. In the figures, the left-most digit (s) of a reference number identifies the figure in which the reference number first appears. The same reference numbers in different figures indicate similar or identical items. References made to individual items of a plurality of items can use a reference number with a letter of a sequence of letters to refer to each individual item. Generic references to the items may use the specific reference number without the sequence of letters.

FIG. 1 illustrates a machine learning model generating a protein-ligand complex from a protein and a ligand.

FIG. 2 illustrates a graph representation of the protein-ligand complex.

FIG. 3 illustrates one example architecture of a machine learning model for implementing the techniques of this disclosure.

FIG. 4 illustrates a binding layer of a pocket prediction module or a docking prediction module.

FIG. 5 illustrates predicting a docking pose of a ligand within a predicted pocket.

FIG. 6 is a flow diagram of an example method for a machine learning technique for protein-ligand binding prediction.

FIG. 7 shows a computer architecture diagram of a computing device capable of implementing aspects of the techniques and technologies presented herein.

DETAILED DESCRIPTION

Biomolecular interactions are the cornerstone of many essential functions within the human body. Some examples include protein-ligand binding, protein-protein interaction, protein-DNA interaction, etc. Of particular interest to the realm of drug discovery is how drug-like small molecules (ligands) bind to proteins. Techniques such as molecular docking are used to predict the conformation of a ligand when it binds to a target protein. The resulting docked protein-ligand complex can provide valuable insights for drug development.

Fast and accurate prediction of the docked ligand pose has proven challenging. Two families of methods are commonly used: sampling-based and regression-based prediction. Sampling-based approaches rely on physics-informed empirical energy functions to score and rank the large number of sampled conformations. Even with the use of deep learning-based scoring functions for conformation evaluation, sampling-based methods still need a large number of potential ligand poses for selection and optimization. Some sampling-based techniques utilize a deep diffusion model that significantly improves accuracy. However, such models still require a large number of sampled/generated ligand poses for selection, resulting in high computational costs and correspondingly long runtimes to predict docking.

The regression-based methods that use deep learning models to predict the docked ligand pose bypass the dependency on the sampling process. Some techniques utilize a two-stage framework to simulate the docking process by predicting the protein-ligand distance matrix and then optimizing the pose. Other techniques directly predict the docked pose coordinates. Though efficient, the accuracy of these regression-based methods falls behind the sampling-based methods.

An additional challenge to existing techniques for predicting protein-ligand binding is variations in protein size. For example, large protein sizes may require the use of external modules to identify suitable binding pockets before predicting the docking pose. Identifying binding pockets as a separate step reduces efficiency by spending computing resources to refine a binding pocket that is not compatible with the ligand in question. For instance, some products use P2Rank to generate the pocket center candidates. (Radoslav Krivák and David Hoksza. P2rank: machine learning based tool for rapid and accurate prediction of ligand binding sites from protein structure. Journal of cheminformatics, 10 (1) : 1-12, 2018. ) Using a separate module to identify binding pockets results in the need for a separate module for pocket selection, increasing the training and inference complexity.

To address these limitations, the inventors have developed an end-to-end framework for predicting both the location of a binding pocket and the way a ligand binds to that pocket. The disclosed techniques unify pocket prediction and docking, streamlining the process within a single model architecture. The single model architecture improves accuracy by incorporating information about the protein and the ligand when predicting a binding pocket and a docking pose. Efficiency is also improved by avoiding the identification of binding pockets that are incompatible with the ligand. The disclosed techniques use a novel deep learning architecture to employ regression-based methods that have lower computational costs compared to existing regression-based methods, and yet are able to achieve accuracy comparable to sampling-based approaches.

The single model architecture consists of a series ofequivariant layers with geometry-aware updates, allowing for either pocket prediction or docking. An equivariant module is a layer or a set of layers in a machine learning model that maintains a certain form of symmetry between the input and output. This means that if the input is transformed in some way, the output will be transformed in a similar way. In some configurations, for pocket prediction, lightweight layer configurations are utilized to maintain efficiency without sacrificing accuracy.

In contrast to conventional pocket prediction, which only uses a protein as input and forecasts multiple potential pockets, the disclosed techniques incorporate a specific ligand to pinpoint the unique pocket for that ligand. Integrating the ligand into pocket prediction is crucial as it aligns with the fundamental characterization of the docking problem. In this way, a quick and precise pocket prediction is obtained, without having to introduce external modules such as P2Rank.

The disclosed embodiments are distinguished from existing pocket prediction techniques with the introduction of a novel equivariant module. The disclosed embodiments are also distinguished from existing techniques by combining two separate loss functions into the machine learning model architecture. Another improvement over prior techniques is jointly training the pocket prediction module and the docking module, leveraging knowledge gained from the docking module to improve the performance of pocket prediction.

Several strategies are additionally proposed to make a fast and accurate docking prediction:

(1) A pocket prediction module operates as the first layer in a model hierarchy and is jointly trained with the subsequent docking module. This ensures a seamless, end-to-end process for protein-ligand docking prediction. In some configurations, a pocket center constraint is added to the model architecture using Gumbel-Softmax. This pocket center constraint assigns a probabilistic weighting to the inclusion of amino acids in the pocket, which helps to identify the most probable pocket center and improve the precision of the docking prediction.

(2) The predicted pocket is incorporated into the docking module training using a scheduled sampling approach. The scheduled sampling approach ensures consistency between the training and inference stages with respect to the pocket. This avoids any mismatch that may arise from using the native pocket during training and the predicted pocket during inference. The scheduled sampling approach also ensures that the models are trained on a variety of possible pockets, allowing the model to generalize well to new docking scenarios.

(3) Directly predicting the ligand pose and optimizing the coordinates based on the protein-ligand distance map are both widely adopted in the sampling-based methods to ensure efficiency. In some configurations, these predictions are used in a different way to produce a more accurate pose prediction.

To evaluate the performance of the disclosed embodiments, experiments were conducted on the binding structure prediction benchmark. The disclosed embodiments were compared to multiple existing methods. Results demonstrate that the disclosed embodiments outperform existing methods, achieving a mean ligand Root Mean Square Deviation (RMSD) of 6.4. This is a significant improvement over previous methods and demonstrates the effectiveness of the disclosed embodiments.

Another benefit demonstrated by the disclosed embodiments is superior generalization ability, performing surprisingly well on unseen proteins. This suggests that the disclosed embodiments can be applied to a wide range of docking scenarios and has the potential to be a useful tool in drug discovery.

In addition to achieving superior performance, the disclosed embodiments are also much more efficient during inference. For example, the disclosed embodiments have been sampled as performing 170 times faster than popular sampling-based techniques. This efficiency is critical in real-world drug discovery scenarios, where time and resources are often limited.

FIG. 1 illustrates a schematic overview of the functioning of machine learning model 102 for generating a model of a protein-ligand complex 110 from protein 104 and ligand 108. Machine learning model 102 takes as inputs a representation of protein 104 and a representation ofligand 108. Protein 104 may have one or more pockets 106 to which ligand 108 is capable of binding. Protein 104 is formed from multiple amino acids which are also known as amino acid residues and are referred to herein simply as residues. Some subset of the residues 112 are identified as the residues that make up pocket 106. Pocket 106 may be defined in part by pocket center 114-a coordinate at the center of pocket 106. In some configurations, machine learning model 102 analyzes protein 104 in conjunction with ligand 108 to predict the shape of pocket 106.

Additionally, or alternatively, machine learning model 102 predicts protein-ligand complex 110. Protein-ligand complex 110 indicates which pocket 106 ligand 108 will bind with, as well as the shape of pocket 106. Protein-complex 110 may also predict the docking pose of ligand 108 as it binds with pocket 106. Docking pose refers to the conformation and orientation of ligand 108 as it binds with pocket 106, where the conformation refers to the spatial arrangement of atoms of ligand 108.

FIG. 2 shows a diagram 200 that illustrates a protein-ligand complex graph 210 which is a mathematical representation, specifically a graph representation, of a protein-ligand complex such as protein-ligand complex 110. In a graph representation of protein 104, each residue (i.e., amino acid) is a node and each peptide bond between two residues is represented as an edge. For the ligand 108, each atom is a node and chemical bonds between atoms are represented as edges. Protein-ligand complex graph 210 is denoted as The graph 210 has nodes 212and edges 214 ε. The nodes 212 are divided into two sets, andwhich represent atoms of the ligand and residues of the protein, respectively. Node 212A is part of protein subgraph 220, while node 212B is part of ligand subgraph 230. Similarly, the edges 214 are divided into three setsandreferring to chemical bonds within the ligand 232, peptide bonds within the protein 222, and external contact surfaces 242 between the residues of protein and the atoms of the ligand, respectively. Edge 214A is part of protein subgraph 220, while edge 214B is part of ligand subgraph 230.

The ligand subgraph 230 iswhere nodeis an atom, h_i is the pre-extracted feature, andis the corresponding coordinate in three-dimensional space. Pre-extracted features of atoms v_i may include atomic number, covalent radius, electrostatic charge, etc. The number of atoms is denoted as n^l. Features may be pre-extracted features using existing techniques such as those described in TorchDrug. (Zhaocheng Zhu, et al. Torchdrug: A powerful and flexible machine learning platform for drug discovery. arXiv preprint arXiv: 2202.08320, 2022. )

The protein subgraph 220 iswhere nodeis a residue, h_j is initialized with a pre-trained feature, andis the location of the C_∝ atom in the residue. Techniques for creating graph representations of proteins are known to those of ordinary skill in the art. (Zeming Lin, et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction. BioRxiv, 2022 and Gabriele Corso, et al. Diffdock: Diffusion steps, twists, and turns for molecular docking. arXiv preprint arXiv: 2210.01776, 2022. ) C_∝ refers to the alpha carbon in the amino acid residue, the key atom around which the rest of the residue is oriented. The number of residues is denoted as n^p. In some configurations, the edge setis constructed with a cut-off distance. The cut-off distance may be set arbitrarily to any distance (typically measured in Angstroms) such as, for example, between 1 16such 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16 (angstroms) .

FIG. 3 illustrates one example architecture 300 of the machine learning model 102. Machine learning model 102 includes pocket prediction module 302 and docking module 304. Pocket prediction module 310 receives graph representation311 as input. Graph representation311 includes protein subgraph 321 and ligand subgraph 331, which describe protein 104 and ligand 108, respectively. Protein subgraph 321 and ligand subgraph 331 are similar to protein subgraph 220 and ligand subgraph 230, except that initially, Graph representation311 may not include external contact surfaces.

Pocket prediction module 310 processes graph representation311 with M pocket binding layers 320 to predict the coordinates of the pocket center 114. The output of one of pocket binding layers 320 is provided as input to the next binding layer, until the final output of pocket prediction module 310 is obtained. After identifying the pocket center 114, predicted pocket subgraph 330 is defined as a set of residues 112 within a fixed radius around the pocket center 114. The result of pocket binding layers 320 may be provided to Gumbel-Softmax layer 322. Gumbel-softmax layer 322 approximates discrete categorical distributions with a continuous distribution that is differentiable, allowing end-to-end back-propagation to include pocket binding layers 320.

Predicted pocket subgraph 330 is denoted aswith n^p* residues in the pocket. The nodes that are included in the pocket are illustrated in FIG. 3 with cross hatching. The pocket subgraph 330 and ligand subgraph 230 form a new pocket-ligand complex 340, which is provided as input to docking module 350: defines edges in the external contact surface between predicted pocket subgraph 330 and ligand subgraph 230. For clarity, indices i, k are used for ligand nodes and j, k′ for protein nodes. In some configurations, pocket prediction subgraph 330 represents a bounded protein and ligand subgraph 230 represents an unbounded ligand. As such, docking module 350 treats the bounded pocket prediction subgraph 330 as fixed while unbounded ligand subgraph 230 may change shape while the final docking pose is being determined.

Docking module 350 predicts the docking pose complex 110 of the ligand, denoted asSpecifically, N docking binding layers 360 process pocket-ligand complex 340 to generate docking pose 370. The docking binding layers 360 are similar to pocket binding layer 320, sharing at least an independent message passing layer, a cross-attention update layer, and an interfacial message passing layer. Due to the similarities, they may be referred to simply as binding layers without specifying them as the layers used in pocket prediction or docking pose estimation. Iterative refinement 380 provides predicted docking pose complex 370 as input to docking binding layers 360 for further refinement of predicted docking pose complex 370. The final iteration is emitted from docking module 350 as docking pose complex 110. In some configurations, docking module 350 focuses on the blind docking scenario in which the location of the binding pocket is not known in advance.

In some configurations, a graphical representation of docking pose complex 110 is generated and displayed on an output device. The graphical representation could be a 3D model, such as a space filling model or a ball and stick model showing the orientation ofligand 108 in predicted pocket 106 of protein 104. The graphical representation may be manipulated by a user and rotated in space in order to see the interaction of ligand 108 and protein 104 from different angles. Thus, techniques of this disclosure can include displaying a graphical representation of docking pose complex 110 on an output device.

FIG. 4 illustrates pocket binding layer 320. For clarity, pocket prediction module 310 is used for a demonstration, but the same description of pocket binding layer 320 applies to docking binding layer 360 of docking module 350.

Binding Layer Overview

Binding layer 402 (can be either the pocket binding layers 320 or the docking binding layers 360) encodes node-level information-data or features specific to each node-as vectors. For example, node-level information of ligand subgraph 331 is encoded in vector h_i, while node-level information of protein subgraph 321 is encoded in h_j. Additionally, binding layer 402 models pair embeddings 440 for each protein residue-ligand atom pair (i, j) , capturing the relationship between each pair of protein residue and ligand atom. When binding layer 402 is part of docking module 350, pair embeddings 440 may encode relationships between residues 112 of pocket 106 and atoms of ligand 108.

Pair embedding 440, denoted as z_ij, may be constructed by an outer product module (OPM) : “Outer product” refers to a mathematical operation that takes two vectors and returns a matrix. Here, OPM is a function that takes two embedding vectors, such as h_i and h_j, and returns matrix z_ij that captures the relationship between them. “Linear” refers to a linear mathematical transform that makes vectors more expressive or otherwise useful. For example, Linear (x) may multiply vector x with a matrix A and add a bias term b to achieve a desired effect. For the initial pair embedding, OPM operates on the initial protein/ligand node embeddings h_i and h_j.

Binding layer 402 conducts three-step message passing: (1) The first step is an independent message passing layer 470 in which the protein and the ligand are handled separately. Independent message passing layer 470 passes messages within the ligand or within the protein to update node embeddings and coordinates. Messages are not passed between nodes of the ligand and nodes of the protein. (2) The second step is a cross-attention update layer 480. Cross-attention update layer 480 operates to exchange information across every node, including between the protein and the ligand, and updates pair embeddings accordingly. (3) The third step is an interfacial message passing layer 490. Interfacial message passing layer 490 focuses on the contact surface between ligand subgraph 331 and protein subgraph 321, and attentively updates coordinates and representations for such nodes. One benefit of this model architecture is the recognition of distinct characteristics between internal interactions within the ligand or protein and external interactions between the ligand and protein.

Several instances of binding layer 402 may be stacked on top of each other. In some configurations, a final independent message passing layer 470 processes the results of the interracial message passing layer 490 of the last iteration of binding layer 402. This allows for further adjustment before yielding the output of pocket prediction module 310/docking module 350.

In some configurations, the independent message passing layer 470 and interfacial message passing layer 490 are E (3) -equivariant. E (3) refers to a group of rigid body motions in 3D space, including translations, rotations, and reflections. If a layer is E (3) -equivariant, it means that applying a transformation (like rotation or translation) to the input will result in an equivalent transformation of the output.

At the same time, cross-attention update layer 480 is E (3) -invariant because it does not encode structure. The output of an E (3) -invariant remains the same even after a transformation is applied to the input.

Independent Message Passing

In some configurations, a variant of Equivariant Graph Convolutional Layer (EGCL) is used as independent message passing layer 470. EGCL is a specialized neural network layer designed to work on graph-structured data while maintaining E (3) -equivariance, i.e., preserving certain geometric properties like rotation and translation. Equations (1) - (3) listed below are specific to the detailed message passing of ligand nodes. The equations for message passing of protein nodes are the same, changing the node embedding h_i/h_j from ligand nodes to protein nodes. When used for protein nodes, h refers to a protein residue instead of a ligand atom, and the feature representations and coordinates may be updated based on edges within the protein subgraph 321 instead of edges within the ligand subgraph 331.

With the ligand atom embeddingand the corresponding coordinatein the l-th layer of M pocket binding layers 320, independent message passing is performed as follows:

(1)

(2)

(3)

where φ_e, φ_x, φ_h are Multi-Layer Perceptons (MLPs) anddenotes the neighbors of node i regarding the internal edges ε^l of the ligand. A multi-layer perceptron is a type of artificial neural network that consists of multiple layers of interconnected nodes, or "neurons. " It′s one of the simplest types of feedforward neural networks, meaning that data flows in one direction-from the input layer to the output layer, without any cycles. However, the disclosed embodiments are not limited to MLPs -other types of artificial neural networks are similarly contemplated.

Equations (1) - (3) describe how information flows across ligand subgraph 331 in independent message passing layer 470. Equivalent equations, with minor modifications discussed above, apply the same or similar techniques to when processing protein subgraph 321.

Equation (1) generates a "message" between two neighbor atoms in the ligand subgraph 331. Atoms are neighbors if an edge connects them in ligand subgraph 331. This message is a function of the current features of the atoms and their spatial distance, and will be used in equations (2) and (3) to update the feature representation and spatial coordinates of atom i. Specifically, in equation (1) , m_ik refers to a message from node k to node i. m_ik is the result of applying a perceptron or equivalent neural network φ_e to the embeddingsandand the distancebetween atoms i and k.

Equation (2) updates the feature representation of a given atom i by taking the current feature representation and adding to it a value that depends on its current features and aggregated information coming from its ligand subgraph 331 neighbors. Specifically, equation (2) updates feature representationof node i based on incoming messages m_ik from neighborsThem_ik expression aggregates messages from all neighbors of node i. The aggregation is passed withto perceptron φ_h, the result of which is added to the original feature vectorin order to update

Equation (3) takes the existing coordinates of an atom i, looks at the distance from each neighbor, and then adjusts the coordinates based on these distances and the messages m_ik received from each neighbor. Specifically, equation (3) updates spatial coordinatesbased on incoming messages m_ik from neighborsof node i. is the current spatial coordinates of the i-th ligand atom at the l-th pocket binding layer 320. For each neighbor k, the distance between i and k is computed asThis distance is then weighted by multiplying the distance by φ_x (m_itk) , a perceptron that processes the message received from node k. Distances are weighted to determine how much influence each message should have when updating the coordinate. Theexpression averages the weighted distances. The resulting average is added to the originalcoordinate to update thecoordinate.

Independent message passing layer 470 emits H’ 472-the updated feature vectors of the nodes of ligand subgraph 331. The feature vectors of protein subgraph 321 are similarly processed, with slight modifications to equations (1) - (3) as discussed above, to produce feature vectors H’ 474. Independent message passing layer 470 is labeled as independent because nodes from ligand subgraph 331 do not affect nodes from protein subgraph 321, and vice-versa.

Cross-attention Update

Cross-attention is applied to enhance feature vectors H’ 472 and H’ 474 generated by independent message passing layer 470. The cross-attention update of a particular node is affected by messages received from all protein/ligand nodes. As such, unlike the independent message passing layer 470, cross-attention update layer 480 causes nodes from ligand subgraph 331 to be affected by nodes from protein subgraph 321, and vice-versa. Pair embeddings B 476 are also updated according to messages from all ligand/protein nodes.

Equations (4) and (5) below are specific to processing nodes from the ligand subgraph 331. Similar equations may also be applied to nodes from protein subgraph 321, with slight modifications as discussed in conjunction with independent message passing layer 470. Given node embeddingsandobtained from ligand atom representations H’ 472 and protein residue representations H’ 474, respectively, and the pair embeddingsobtained from pair embeddings B 476, multi-head cross-attention is performed over all protein residues:

(4)

(5)

whereare linear projections of the node embeddingsandrespectively, andis a linear transformation of pair embeddingAnalogous equations, as discussed above, are used to update protein embeddingsobtained from H’ 474. Based on updated node embeddingsandthe pair embeddings obtained from B 476 are further updated by

Cross-attention update layer 480 uses equations (4) and (5) , and their protein residue equivalents, to update what the model knows about each ligand atom and each protein residue. The update is based on the features of each atom/residue and those of their neighbors in graph representation311. As with other attention mechanisms, cross-attention update layer 480 allows parts of the neural network to focus on the most relevant information for a given task. Cross-attention update layer 480 outputs ligand representations H” 482 and protein representations H” 484, which contain the refinedandnode embeddings, respectively. Cross-attention layer 480 also outputs pair embeddings B’ 486, which contain the refinedpair embeddings.

Equation (4) calculates the attention weightbetween ligand atom i and protein residue j for the h^th attention head. This attention weight determines how much attention the ligand atom i should pay to protein residue j for the h-th attention head. Attention weight represents the importance of residuej when updating the feature representation of atom i.

Specifically, equation (4) computes a dot product of a scaled query vectorand a key vectorand adds the result to a pairwise interactionThe result is processed by softmax_j, which, for a given atom i, generates a probability distribution across all residuesj. As such, represents the importance (i.e., attention weight) of protein residue j for a particular atom i.

Equation (5) updates the node embeddingsThe updated node embeddingis based on weighted sums ofwhich are the ‘values’ of the attention mechanism. Specifically, theis weighted bythe attention weight, which is computed by equation (4) . concat_1≤h≤H concatenates the results obtained from different heads of a multi-head attention system -although this step is optional when a single-headed attention mechanism is employed. The result of the concatenation, or in the case of a single-headed attention mechanism the weighted average of the value vector, is then passed through a linear transformation before being added to the original node embeddingto update the node embedding

Interfacial Message Passing

With the updated ligand and protein representations H” 482 and H” 484, respectively, interfacial message passing layer 490 may be applied to update the included node features and the coordinates on the contact surface. In some configurations, interfacial message passing layer 490 has an additional attention bias:

(6)
(7)

(8)

whereandand φ_q, φ_k, φ_v, φ_b, φ_xv are MLPs. ε^lp*denotes the external edges between ligand and protein contact surface constructed by cut-off distanceThis means that edges in the graph are included for ligand and protein contact surfaces that are 10 Angstroms or less apart. Edges between two nodes of the ligand subgraph or the protein subgraph, or that are not within 10 Angstroms of each other, are not used in equations (6) - (8) . This allows interfacial message passing layer 490 to focus on the interaction between the protein pocket and the ligand. 10 Angstroms is illustrative, and other values are similarly contemplated.

Equation (6) is the attention equation. α_ij is the attention weight from node i to node j. q_i and k_ij are the query and key vectors, and bij is the attention bias. denotes the neighbors of node i given the edgesbetween the ligand and the protein.

Equation (7) updates the hidden representation of node embeddingsfor node i at layer l + 1 based on its previous representationand a sum over its neighbors, weighted by the attention weights α_ij and feature vectors v_ij. Equation (8) updates the coordinatefor node i at layer l + 1, similar to equation (5) discussed above.

Pocket Prediction

In pocket prediction module 310, given the protein-ligand complex graph311, residues 112 of the protein 104 that belong to the pocket 106 are identified. Previous works such as TankBind and E3Bind both use P2Rank to produce multiple pocket candidates. Subsequently, either affinity score (TankBind) or self-confidence score (E3Bind) is required to select the most appropriate docked pose. Though P2Rank is faster and more accurate than previous tools, it is based on numerical algorithms and traditional machine learning classifiers. Furthermore, the incorporation of P2Rank necessitates the selection of candidate poses following multiple pose docking. These factors restrict the performance and efficiency of fully deep learning-based docking approaches.

Pocket prediction module 310 differs from previous solutions by treating pocket prediction as a binary classification task on residues 112, where each residue 112 in protein 104 is classified as belonging to pocket 106 or not. In some configurations, a single pocket 106 is identified on protein 104. In this way, pocket prediction and ligand docking are integrated into a single framework that identifies a single docking pose in a single pocket. Converging on a single docking pose improves efficiency compared to solutions that first find multiple potential pockets and only then determine which pocket a ligand will dock with.

Specifically, a binary cross-entropy loss function 452 is used to train the pocket binding layers 320 of pocket prediction module 310:

where n_p is the number of residues in the protein, and y_j is the binary indicator for residue j (i.e., 1 if it belongs to a pocket, 0 otherwise) . is the predicted probability of residue j belonging to a pocket, is the protein-ligand complex graph 311, and σ is the sigmoid function. Binary cross-entropy loss function 452 trains the binding layers so that the probabilities p_j of a residue belonging to the pocket are as accurate as possible.

Constraint for Pocket Center

In addition to the direct classification of each residue to decide the pocket 106, a pocket center 114 is predicted around which the pocket 106 is defined. The pocket center 114 is a coordinate position in three-dimensional space. A sphere is then defined around the pocket center 114 with a radius, e.g., To accomplish this, a constraint about the pocket center 114 is added to make a more accurate prediction.

Specifically, a pocket center regression task is applied to the classified pocket residues. Given n^p′ predicted pocket residuesprovided by pocket binding layers 320, the pocket center coordinate is the average of the coordinates: With this predictd pocket center, a distance loss between the predicted pocket center x^p′ and the native pocket center x^p* can be computed.

The pocket center computation inherently involves discrete decisions, i.e., selecting which amino acids contribute to the pocket. However, gradient-based machine learning models prefer to work with smooth, differentiable functions. Hence, a technique such as Gumbel-Softmax may be applied to produce a differentiable approximation of the discrete selection process. Gumbel-Softmax, or an equivalent technique for providing a differentiable approximation, provides a probabilistic “hard” selection. This more accurately reflects the discrete decision to include or exclude an amino acid in the pocket.

where g_j is a random noise sampled from Gumbel distribution g_j = -log (-log U_m) , U_m～Uniform (0, 1) , and τ_e is the controllable temperature that controls the "sharpness" of the decisions. Then, a Huber loss may be computed between the predicted pocket center and the native pocket center x^p* as the pocket center constraint loss function 454:

The native pocket center x^p* is the actual geometric center of the pocket 106 in the protein 104. This would be known from experimental data or high-quality computational methods. While a Huber loss is used in this example, any other measure of difference is similarly contemplated, such as mean squared error. The Huber loss may be a “constraint loss” , meaning it may be a term in the overall loss function that the model aims to minimize during training.

The pocket prediction loss function 450 is comprised of classification loss452 and pocket center constraint loss454, with a weight factor α. For example, a weight factor α = 0.2 has been found useful:

By applying pocket prediction loss function 450 during training, pocket prediction module 310 learns to predict a pocket center based on a weighted average of the coordinates of residues believed to be part of the pocket. A measure of how closely this aligns with the actual, native, pocket center is made, e.g., with a Huber loss. By combining binary cross-entropy loss function 452 and pocket center constraint loss function 454, pocket binding layers 320 become better at predicting not just which residues 112 form pocket 106, but also where the pocket center 114 is located.

In some configurations, for some proteins, pocket binding layers 320 may predict each residue as negative for a pocket, making it impossible to determine the pocket center. This is likely due to an imbalance between the pocket and non-pocket residues. For these rare cases, the Gumbel-softmax predicted center xp is taken as the pocket center.

FIG. 5 illustrates predicting a docking pose of a ligand within a predicted pocket. In the docking task, given a pocket substructure520, ligand subgraph 331, and pair embeddings 440, docking binding layers 360 of docking module 350 predict the coordinate of each atom in the ligand subgraph331. The docking task is challenging since it requires the model to preserve E (3) -equivariance for every node while capturing pocket structure and chemical bonds of ligands.

Iterative refinement

Iterative refinement 380 is adopted in docking binding layers 360 to refine the structures by feeding the predicted ligand docking pose complex 370 back to the docking binding layers 360. During iterative refinement 380, new graphs are generated and the edges are constructed dynamically as inputs to docking binding layers 360. After k iterations of iterative refinement of the N binding layer alternations, the final coordinates x^L and node embeddingsandof predicted docking pose complex 370 are obtained.

In addition to directly optimizing the coordinate loss, distance map constraints are added to refine the ligand docking pose. Distance matrices may be constructed in two ways. One is to directly compute based on the predicted coordinates: The other is predicted from the pair embeddingswith a perceptron transition: where each vector outputs a distance scalar.

Training Loss

The docking loss function 550 is comprised of coordinate loss552 and distance map loss554:

552 may be computed as the Huber distance between the predicted coordinates and ground truth coordinates of the ligand atoms, although other distance measures may be used. 554 is comprised of three terms, each of which isloss between different components of the ground truth and the two reconstructed distance maps. Formally,

where D_ij is ground truth distance matrix. In some configurations, β = y = 1.0.

Training Strategy

As discussed herein, a hierarchical unified framework predicts the docked ligand coordinates based on the predicted protein pocket. One problem commonly encountered with machine learning models is a mismatch between data used when training and data used during inference. During training, ground-truth data is often used as input while during inference the input is a prediction generated by another module. For example, docking module 350 may be trained with a native, ground-truth, pocket, but during inference only the predicted pocket generated by pocket prediction module 310 will be available. Using the native pocket in this way is known as teacher-forcing training.

To reduce this mis-match, a scheduled training strategy is applied to gradually involve the predicted pocket in the training stage instead of using the native pocket only. Specifically, training consists of two stages: (1) in the initial stage, since the performance of pocket prediction is poor, the native pocket is used to perform the docking training; (2) In the second stage, with the improved pocket prediction ability, the predicted pocket is also used to train docking. In this second stage, the ratio between predicted pocket and native pocket may be 1∶ 3, for example, although other ratios are similarly contemplated.

Comprehensive Training Loss

Comprehensive training loss function 556 comprises two components: pocket prediction loss function 450 and the docking loss function 550,

Results

Table 1 below illustrates flexible blind self-docking performance. The top half contains results from traditional docking software; the bottom half contains results from recent deep learning based docking methods. The last line, “FABind” shows the results of the fast and accurate machine learning model of this disclosure. The number of poses that DiffDock samples is specified in parentheses. DiffDock was run three times with different random seeds, and the mean is reported. The symbol "*" means that the method operates exclusively on CPU. The superior results are emphasized by bold formatting, while those of the second-best are denoted by an underline.

Table 1

FIG. 6 is a flow diagram of an example method for fast and accurate protein-ligand binding. Routine 600 begins at operation 602, where a protein representation subgraph 321 and a ligand representation subgraph 331 are received.

Next at operation 604, a pocket prediction module 310 generates a pocket prediction 330 based on the protein subgraph 321 and the ligand subgraph 331. Pocket prediction module 310 may utilize multiple pocket binding layers 320, each of which may include independent message passing layer 470, cross-attention update layer 480, or interracial message passing layer 490.

Next at operation 606, a docking module 350 generates a predicted docking pose complex 370 that indicates how the ligand represented by subgraph 321 is predicted to bind with the protein represented by ligand subgraph 331.

Next at operation 608, during training, back-propagation is performed across docking module 350 and pocket prediction module 310. In some configurations, a combination of docking loss function 550 and pocket prediction loss function 450 are used to update the weights of pocket binding layers 320 and docking binding layers 360 of pocket prediction module 310 and docking module 350, respectively.

FIG. 7 shows additional details of an example computer architecture 700 for a device, such as a computer or a server configured as part of the systems described herein, capable of executing computer instructions (e.g., a module or a program component described herein) . The computer architecture 700 illustrated in FIG. 7 includes processing unit (s) 702, a system memory 704, including a random-access memory 706 ( “RAM” ) and a read-only memory ( “ROM” ) 708, and a system bus 710 that couples the memory 704 to the processing unit (s) 702. The processing unit (s) 702 include one or more hardware processors and may also comprise or be part of a processing system. In various examples, the processing unit (s) 702 of the processing system are distributed. Stated another way, one processing unit 702 may be located in a first location (e.g., a rack within a datacenter) while another processing unit 702 of the processing system is located in a second location separate from the first location.

Processing unit (s) , such as processing unit (s) 702, can represent, for example, a CPU-type processing unit, a GPU-type processing unit, a neural processing unit, a field-programmable gate array (FPGA) , another class of digital signal processor (DSP) , or other hardware logic components that may, in some instances, be driven by a CPU. For example, and without limitation, illustrative types of hardware logic components that can be used include Application-Specific Integrated Circuits (ASICs) , Application-Specific Standard Products (ASSPs) , System-on-a-Chip Systems (SOCs) , Complex Programmable Logic Devices (CPLDs) , etc.

A basic input/output system containing the basic routines that help to transfer information between elements within the computer architecture 700, such as during startup, is stored in the ROM 708. The computer architecture 700 further includes a mass storage device 712 for storing an operating system 714, application (s) 716, modules 718, and other data described herein. For example, the machine learning model 102 introduced in FIG. 1 may be stored in whole or part in the mass storage device 712.

The mass storage device 712 is connected to processing unit (s) 702 through a mass storage controller connected to the bus 710. The mass storage device 712 and its associated computer-readable media provide non-volatile storage for the computer architecture 700. Although the description of computer-readable media contained herein refers to a mass storage device, it should be appreciated by those skilled in the art that computer-readable media can be any available computer-readable storage media or communication media that can be accessed by the computer architecture 700.

Computer-readable media can include computer-readable storage media and/or communication media. Computer-readable storage media can include one or more of volatile memory, nonvolatile memory, and/or other persistent and/or auxiliary computer storage media, removable and non-removable computer storage media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Thus, computer storage media includes tangible and/or physical forms of media included in a device and/or hardware component that is part of a device or external to a device, including but not limited to random access memory (RAM) , static random-access memory (SRAM) , dynamic random-access memory (DRAM) , phase change memory (PCM) , read-only memory (ROM) , erasable programmable read-only memory (EPROM) , electrically erasable programmable read-only memory (EEPROM) , flash memory, compact disc read-only memory (CD-ROM) , digital versatile disks (DVDs) , optical cards or other optical storage media, magnetic cassettes, magnetic tape, magnetic disk storage, magnetic cards or other magnetic storage devices or media, solid-state memory devices, storage arrays, network attached storage, storage area networks, hosted computer storage or any other storage memory, storage device, and/or storage medium that can be used to store and maintain information for access by a computing device.

In contrast to computer-readable storage media, communication media can embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or other transmission mechanism. As defined herein, computer storage media does not include communication media. That is, computer-readable storage media does not include communications media consisting solely of a modulated data signal, a carrier wave, or a propagated signal, per se.

According to various configurations, the computer architecture 700 may operate in a networked environment using logical connections to remote computers through the network 720. The computer architecture 700 may connect to the network 720 through a network interface unit 722 connected to the bus 710. The computer architecture 700 also may include an input/output controller 724 for receiving and processing input from a number of other devices, including a keyboard, mouse, touch, or electronic stylus or pen. Similarly, the input/output controller 724 may provide output to a display screen, a printer, or other type of output device.

It should be appreciated that the software components described herein may, when loaded into the processing unit (s) 702 and executed, transform the processing unit (s) 702 and the overall computer architecture 700 from a general-purpose computing system into a special-purpose computing system customized to facilitate the functionality presented herein. The processing unit (s) 702 may be constructed from any number of transistors or other discrete circuit elements, which may individually or collectively assume any number of states. More specifically, the processing unit (s) 702 may operate as a finite-state machine, in response to executable instructions contained within the software modules disclosed herein. These computer-executable instructions may transform the processing unit (s) 702 by specifying how the processing unit (s) 702 transition between states, thereby transforming the transistors or other discrete hardware elements constituting the processing unit (s) 702.

The particular implementation of the technologies disclosed herein is a matter of choice dependent on the performance and other requirements of a computing device. Accordingly, the logical operations described herein are referred to variously as states, operations, structural devices, acts, or modules. These states, operations, structural devices, acts, and modules can be implemented in hardware, software, firmware, in special-purpose digital logic, and any combination thereof. It should be appreciated that more or fewer operations can be performed than shown in the figures and described herein. These operations can also be performed in a different order than those described herein.

It also should be understood that the illustrated methods can end at any time and need not be performed in their entireties. Some or all operations of the methods, and/or substantially equivalent operations, can be performed by execution of computer-readable instructions included on a computer-storage media, as defined below. The term “computer-readable instructions, ” and variants thereof, as used in the description and claims, is used expansively herein to include routines, applications, application modules, program modules, programs, components, data structures, algorithms, and the like. Computer-readable instructions can be implemented on various system configurations, including single-processor or multiprocessor systems, minicomputers, mainframe computers, personal computers, hand-held computing devices, microprocessor-based, programmable consumer electronics, combinations thereof, and the like.

Thus, it should be appreciated that the logical operations described herein are implemented (1) as a sequence of computer implemented acts or program modules running on a computing system and/or (2) as interconnected machine logic circuits or circuit modules within the computing system. The implementation is a matter of choice dependent on the performance and other requirements of the computing system. Accordingly, the logical operations described herein are referred to variously as states, operations, structural devices, acts, or modules. These operations, structural devices, acts, and modules may be implemented in software, in firmware, in special purpose digital logic, and any combination thereof.

Illustrative Embodiments

The following clauses describe multiple possible embodiments for implementing the features described in this disclosure. The various embodiments described herein are not limiting nor is every feature from any given embodiment required to be present in another embodiment. Any two or more of the embodiments may be combined together unless context clearly indicates otherwise. As used herein in this document “or” means and/or. For example, “Aor B” means A without B, B without A, or A and B. As used herein, “comprising” means including all listed features and potentially including addition of other features that are not listed. “Consisting essentially of” means including the listed features and those additional features that do not materially affect the basic and novel characteristics of the listed features. “Consisting of” means only the listed features to the exclusion of any feature not listed.

Clause 1. A method for predicting protein-ligand docking, comprising: receiving a protein subgraph (321) and a ligand subgraph (331) at a previously-trained machine learning model (102) comprising a pocket prediction module (310) and a docking module (350) ; generating, with the pocket prediction module (310) , a predicted pocket (330) based on the protein subgraph (321 ) and the ligand subgraph (331) ; and generating, with the docking module (350) , a docking pose complex (110) of the protein subgraph (321) and the ligand subgraph (331) , wherein the docking pose complex (ll0) indicates how the ligand subgraph (331) is bound to the predicted pocket (330) .

Clause 2. The method of clause 1, wherein the predicted pocket comprises a single predicted pocket generated by the pocket prediction module.

Clause 3. The method of clause 1 or 2, wherein the pocket prediction module includes a binding layer that includes an independent message passing layer, a cross-attention layer, and an interfaeial message passing layer.

Clause 4. The method of clause 3, wherein an output of the independent message passing layer is provided as input to the cross-attention layer.

Clause 5. The method of clause 3 or 4, wherein an output of the cross-attention layer is provided as input to the interfacial message passing layer.

Clause 6. The method of any of clauses 1 to 5, wherein the pocket prediction module or the docking module includes an independent message passing layer that: updates a feature vector of an atom within the ligand subgraph based on feature vectors of atoms that are neighbors of the atom within the ligand subgraph and distances to the atoms that are neighbors of the atom within the ligand subgraph; and updates a feature vector of a residue within the protein subgraph based on feature vectors of residues that are neighbors of the residue within the protein subgraph and distances to the residues that are neighbors of the residue within the protein subgraph.

Clause 7. The method of any of clauses 1 to 6, wherein the pocket prediction module or the docking module includes a cross-attention layer that: updates a feature vector of an atom within the ligand subgraph based on feature vectors of atoms that are neighbors of the atom and feature vectors of residues that are neighbors of the atom; and updates a feature vector of a residue within the protein subgraph based on feature vectors of atoms that are neighbors of the residue and feature vectors of atoms that are neighbors of the residue.

Clause 8. The method of any of clauses 1 to 7, wherein the pocket prediction module or the docking module includes an interfacial message passing layer that: updates a feature vector of an atom within the ligand subgraph based on feature vectors of residues that are neighbors of the atom within a defined cut-off distance; and updates coordinates of the atom based on distances to the residues that are neighbors of the atom within the defined cut-off.

Clause 9. A system comprising: a processing unit; and a computer-readable medium having computer-executable instructions stored thereupon, which, when executed by the processing unit, cause the processing unit to: receive a protein subgraph (321) and a ligand subgraph (331 ) at a machine learning model (102) comprising a pocket prediction module (310) and a docking module (350) ; generate, with the pocket prediction module (310) , a predicted pocket (330) based on the protein subgraph (321) and the ligand subgraph (331) ; generate, with the docking module (350) , a docking pose complex (370) of the protein subgraph (321) and the ligand subgraph (331) , wherein the ligand subgraph (331) is bound to the predicted pocket (330) ; and update weights of the machine learning model (102) based on a loss function (556) that combines a pocket prediction loss function (450) and a docking loss function (550) .

Clause 10. The system of clause 9, wherein the docking loss function comprises a coordinate loss function and a distance map loss function.

Clause 11. The system of clause 10, wherein the distance map loss function is computed based on predicted coordinates of a node of protein subgraph (321) or ligand subgraph (331) .

Clause 12. The system of clause 10, wherein the distance map loss function is generated by a multi-level perceptron of pair embeddings.

Clause 13. The system of any one of clauses 10 to 12, wherein the coordinate loss function comprises a distance measurement between predicted coordinates and ground truth coordinates.

Clause 14. The system of any of clauses 9 to 13, wherein the pocket prediction loss function comprises a binary classification that determines whether a residue belongs to the predicted pocket or not.

Clause 15. The system of any of clauses 9 to 14, wherein the pocket prediction loss function comprises a pocket center constraint loss function.

Clause 16. A computer-readable storage medium having encoded thereon computer-readable instructions that when executed by a processing unit causes a system to: receive a protein subgraph (321) and a ligand subgraph (331) at a machine learning model (102) comprising a pocket prediction module (310) ; and generate, with the pocket prediction module (310) , a predicted pocket (330) based on the protein subgraph (321) and the ligand subgraph (331) .

Clause 17. The computer-readable storage medium of clause 16, wherein the instructions further cause the processing unit to: generate, at a docking module (350) , a docking pose complex (370) of the protein subgraph (321) and the ligand subgraph (331) , wherein the ligand subgraph (331) is bound to the predicted pocket (330) .

Clause 18. The computer-readable storage medium of clause 17, wherein the docking pose prediction module comprises a binding layer that includes an independent message passing layer, a cross-attention layer, and an interfacial message passing layer.

Clause 19. The computer-readable storage medium of clause 18, wherein the docking pose prediction module iteratively refines the docking pose complex by providing the docking pose complex as input to the binding layer to generate a refined docking pose complex.

Clause 20. The computer-readable storage medium of any of clauses 17 to 19, wherein the machine learning model is trained by applying a back-propagation algorithm that propagates a loss across the docking pose prediction module and the pocket prediction module.

Conclusion

While certain example embodiments have been described, these embodiments have been presented by way of example only and are not intended to limit the scope of the inventions disclosed herein. Thus, nothing in the foregoing description is intended to imply that any particular feature, characteristic, step, module, or block is necessary or indispensable. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the inventions disclosed herein. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of certain of the inventions disclosed herein.

The terms “a, ” “an, ” “the” and similar referents used in the context of describing the invention are to be construed to cover both the singular and the plural unless otherwise indicated herein or clearly contradicted by context. The terms “based on, ” “based upon, ” and similar referents are to be construed as meaning “based at least in part” which includes being “based in part” and “based in whole, ” unless otherwise indicated or clearly contradicted by context. The terms “portion, ” “part, ” or similar referents are to be construed as meaning at least a portion or part of the whole including up to the entire noun referenced.

It should be appreciated that any reference to “first, ” “second, ” etc. elements within the Summary and/or Detailed Description is not intended to and should not be construed to necessarily correspond to any reference of “first, ” “second, ” etc. elements of the claims. Rather, any use of “first” and “second” within the Summary, Detailed Description, and/or claims may be used to distinguish between two different instances of the same element.

In closing, although the various techniques have been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended representations is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as example forms of implementing the claimed subject matter.

Furthermore, references have been made to publications, patents and/or patent applications throughout this specification. Each of the cited references is individually incorporated herein by reference for its particular cited teachings as well as for all that it discloses.

Claims

A method for predicting protein-ligand docking, comprising:

receiving a protein subgraph (321) and a ligand subgraph (331) at a previously-trained machine learning model (102) comprising a pocket prediction module (310) and a docking module (350) ;

generating, with the pocket prediction module (310) , a predicted pocket (330) based on the protein subgraph (321) and the ligand subgraph (331) ; and

generating, with the docking module (350) , a docking pose complex (110) of the protein subgraph (321) and the ligand subgraph (331) , wherein the docking pose complex (110) indicates how the ligand subgraph (331) is bound to the predicted pocket (330) .
The method of claim 1, wherein the predicted pocket comprises a single predicted pocket generated by the pocket prediction module.
The method of claim 1 or 2, wherein the pocket prediction module includes a binding layer that includes an independent message passing layer, a cross-attention layer, and an interfacial message passing layer.
The method of claim 3, wherein an output of the independent message passing layer is provided as input to the cross-attention layer.
The method of claim 3 or 4, wherein an output of the cross-attention layer is provided as input to the interfacial message passing layer.
The method of any of claims 1 to 5, wherein the pocket prediction module or the docking module includes an independent message passing layer that:

updates a feature vector of an atom within the ligand subgraph based on feature vectors of atoms that are neighbors of the atom within the ligand subgraph and distances to the atoms that are neighbors of the atom within the ligand subgraph; and

updates a feature vector of a residue within the protein subgraph based on feature vectors of residues that are neighbors of the residue within the protein subgraph and distances to the residues that are neighbors of the residue within the protein subgraph.
The method of any of claims 1 to 6, wherein the pocket prediction module or the docking module includes a cross-attention layer that:

updates a feature vector of an atom within the ligand subgraph based on feature vectors of atoms that are neighbors of the atom and feature vectors of residues that are neighbors of the atom; and

updates a feature vector of a residue within the protein subgraph based on feature vectors of atoms that are neighbors of the residue and feature vectors of atoms that are neighbors of the residue.
The method of any of claims 1 to 7, wherein the pocket prediction module or the docking module includes an interfacial message passing layer that:

updates a feature vector of an atom within the ligand subgraph based on feature vectors of residues that are neighbors of the atom within a defined cut-off distance; and

updates coordinates of the atom based on distances to the residues that are neighbors of the atom within the defined cut-off.
A system comprising:

a processing unit; and

a computer-readable medium having computer-executable instructions stored thereupon, which, when executed by the processing unit, cause the processing unit to:

receive a protein subgraph (321) and a ligand subgraph (331) at a machine learning model (102) comprising a pocket prediction module (310) and a docking module (350) ;

generate, with the pocket prediction module (310) , a predicted pocket (330) based on the protein subgraph (321) and the ligand subgraph (331) ;

generate, with the docking module (350) , a docking pose complex (370) of the protein subgraph (321) and the ligand subgraph (331) , wherein the ligand subgraph (331) is bound to the predicted pocket (330) ; and

update weights of the machine learning model (102) based on a loss function (556) that combines a pocket prediction loss function (450) and a docking loss function (550) .
The system of claim 9, wherein the docking loss function comprises a coordinate loss function and a distance map loss function.
The system of claim 10, wherein the distance map loss function is computed based on predicted coordinates of a node of protein subgraph (321) or ligand subgraph (331) .
The system of claim 10, wherein the distance map loss function is generated by a multi-level perceptron of pair embeddings.
The system of any one of claims 10 to 12, wherein the coordinate loss function comprises a distance measurement between predicted coordinates and ground truth coordinates.
The system of any of claims 9 to 13, wherein the pocket prediction loss function comprises a binary classification that determines whether a residue belongs to the predicted pocket or not.
The system of any of claims 9 to 14, wherein the pocket prediction loss function comprises a pocket center constraint loss function.
A computer-readable storage medium having encoded thereon computer-readable instructions that when executed by a processing unit causes a system to:

receive a protein subgraph (321) and a ligand subgraph (331) at a machine learning model (102) comprising a pocket prediction module (310) ; and

generate, with the pocket prediction module (310) , a predicted pocket (330) based on the protein subgraph (321) and the ligand subgraph (331) .
The computer-readable storage medium of claim 16, wherein the instructions further cause the processing unit to:

generate, at a docking module (350) , a docking pose complex (370) of the protein subgraph (321) and the ligand subgraph (331) , wherein the ligand subgraph (331) is bound to the predicted pocket (330) .
The computer-readable storage medium of claim 17, wherein the docking pose prediction module comprises a binding layer that includes an independent message passing layer, a cross-attention layer, and an interfacial message passing layer.
The computer-readable storage medium of claim 18, wherein the docking pose prediction module iteratively refines the docking pose complex by providing the docking pose complex as input to the binding layer to generate a refined docking pose complex.
The computer-readable storage medium of any of claims 17 to 19, wherein the machine learning model is trained by applying a back-propagation algorithm that propagates a loss across the docking pose prediction module and the pocket prediction module.