PandaDock-GNN Overview
PandaDock-GNN is an SE(3)-equivariant Graph Neural Network scoring function for protein-ligand binding affinity prediction.
Key Features
- SE(3)-Equivariance
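The predictions are invariant to rotations and translations of the input complex: moving the complex rigidly does not change the output. This is achieved through E(n)-equivariant graph neural network (EGNN) layers, which update node features invariantly and atom coordinates equivariantly.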
- Heterogeneous Graph Representation
Protein and ligand atoms are represented as separate node types in a heterogeneous graph, allowing the model to learn distinct representations for each.
- Multi-Task Learning
The model jointly predicts:
  - pEC50: Binding affinity (regression)
  - Activity: Binary classification (active/inactive)
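The section does not say how the two tasks are combined during training. A common choice is a weighted sum of a regression loss and a classification loss. The sketch below assumes mean squared error on pEC50 plus binary cross-entropy on an activity logit; `multitask_loss`, `sigmoid`, and `cls_weight` are hypothetical names, not part of PandaDock-GNN.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function: activity logit -> probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def multitask_loss(pec50_pred, pec50_true, activity_logits, activity_labels, cls_weight=1.0):
    """Hypothetical joint objective: MSE(pEC50) + cls_weight * BCE(activity)."""
    mse = sum((p - t) ** 2 for p, t in zip(pec50_pred, pec50_true)) / len(pec50_true)
    # BCE with logits in the stable form max(z, 0) - z*y + log(1 + exp(-|z|))
    bce = sum(
        max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
        for z, y in zip(activity_logits, activity_labels)
    ) / len(activity_labels)
    return mse + cls_weight * bce

# Example batch of two complexes (values are illustrative only)
loss = multitask_loss([7.2, 5.1], [7.0, 5.5], [2.0, -1.0], [1, 0], cls_weight=0.5)
```

Working on logits rather than probabilities keeps the cross-entropy finite even when the classifier is very confident.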
Architecture
```
Input: Protein-Ligand Complex
│
├─ Protein Atoms → Node Features (56 dims)
├─ Ligand Atoms  → Node Features (56 dims)
└─ Interactions  → Edge Features (23 dims)
│
├─ Node Encoders (separate for protein/ligand)
│
├─ EGNN Layers × 6 (SE(3)-equivariant message passing)
│   - Update node features
│   - Update coordinates (equivariant)
│
├─ Attention Pooling
│   - Protein graph → embedding
│   - Ligand graph → embedding
│
└─ Prediction Heads
    ├─ Affinity → pEC50
    └─ Activity → probability
```
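The diagram names EGNN layers and attention pooling without giving their internals. Below is a toy, pure-Python sketch of one EGNN layer following the update rules of Satorras et al. (2021): messages depend on features and squared distances, coordinates move along relative position vectors, and node features are updated from summed messages. Fixed hand-written functions stand in for the learned MLPs φ_e, φ_x, φ_h; the graph is fully connected and the 23-dim edge features are omitted. The softmax pooling form and the names `egnn_layer`, `attention_pool`, and `rigid_motion` are hypothetical, not PandaDock-GNN's API.

```python
import math

# Toy stand-ins for the learned MLPs (hypothetical fixed weights).
def phi_e(h_i, h_j, dist_sq):
    # Edge message from features and squared distance only (rotation-invariant input)
    return [math.tanh(a + b - 0.5 * dist_sq) for a, b in zip(h_i, h_j)]

def phi_x(m_ij):
    # Scalar weight applied to the relative position vector
    return 0.1 * math.tanh(sum(m_ij))

def phi_h(h_i, m_i):
    # Node update with a residual connection
    return [a + math.tanh(b) for a, b in zip(h_i, m_i)]

def egnn_layer(h, x):
    """One EGNN layer on a fully connected graph; returns (new features, new coords)."""
    n = len(h)
    c = 1.0 / (n - 1)  # normalisation constant C = 1/(M-1) from the EGNN paper
    new_h, new_x = [], []
    for i in range(n):
        m_i = [0.0] * len(h[i])
        shift = [0.0, 0.0, 0.0]
        for j in range(n):
            if j == i:
                continue
            diff = [x[i][k] - x[j][k] for k in range(3)]
            m_ij = phi_e(h[i], h[j], sum(d * d for d in diff))
            m_i = [a + b for a, b in zip(m_i, m_ij)]
            w = phi_x(m_ij)
            shift = [s + d * w for s, d in zip(shift, diff)]
        new_x.append([x[i][k] + c * shift[k] for k in range(3)])
        new_h.append(phi_h(h[i], m_i))
    return new_h, new_x

def attention_pool(h, score_w):
    """Softmax-weighted sum of node features (hypothetical gated pooling)."""
    scores = [sum(w * f for w, f in zip(score_w, h_i)) for h_i in h]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [sum(e / z * h_i[k] for e, h_i in zip(exps, h)) for k in range(len(h[0]))]

def rigid_motion(p, theta, t):
    """Rotate point p by theta about the z-axis, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1], p[2] + t[2]]

# Demonstration: the same layer applied to a complex and to a rigidly moved copy
h0 = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
x0 = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.2, 0.3]]
h1, x1 = egnn_layer(h0, x0)
h2, x2 = egnn_layer(h0, [rigid_motion(p, 0.7, [1.0, -2.0, 0.5]) for p in x0])
# h2 equals h1 (invariance); x2 equals rigid_motion applied to x1 (equivariance)
```

Because features are invariant after every layer, anything pooled from them, and therefore the final pEC50 and activity outputs, is invariant too.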
Node Features (56 dimensions)
- Element type one-hot (10 dims)
- SYBYL atom type one-hot (16 dims)
- Partial charge (1 dim)
- Hybridization one-hot (4 dims)
- Aromaticity flag (1 dim)
- H-bond donor/acceptor (2 dims)
- Ring membership (1 dim)
- Residue type one-hot (20 dims, protein only)
- Backbone flag (1 dim, protein only)
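The listed blocks sum to exactly 56 (10 + 16 + 1 + 4 + 1 + 2 + 1 + 20 + 1), which is why protein and ligand atoms share one feature width. The sketch below concatenates them in the listed order; that order, the category vocabularies behind each index, and zero-filling the protein-only blocks for ligand atoms are all assumptions, and `encode_atom` is a hypothetical helper.

```python
# Block sizes from the list above; the concatenation order is an assumption.
NODE_FEATURE_BLOCKS = [
    ("element", 10), ("sybyl_type", 16), ("partial_charge", 1),
    ("hybridization", 4), ("aromatic", 1), ("hbond_donor_acceptor", 2),
    ("ring", 1), ("residue_type", 20), ("backbone", 1),
]

def block_slices(blocks):
    """Map each block name to its (start, end) slice in the concatenated vector."""
    slices, start = {}, 0
    for name, size in blocks:
        slices[name] = (start, start + size)
        start += size
    return slices, start

SLICES, NODE_DIM = block_slices(NODE_FEATURE_BLOCKS)

def one_hot(index, size):
    vec = [0.0] * size
    if index is not None:
        vec[index] = 1.0
    return vec

def encode_atom(element, sybyl, charge, hybridization, aromatic, donor, acceptor,
                in_ring, residue=None, backbone=False):
    """Build one 56-dim node vector from pre-computed category indices (hypothetical)."""
    feats = (one_hot(element, 10) + one_hot(sybyl, 16) + [float(charge)]
             + one_hot(hybridization, 4) + [float(aromatic)]
             + [float(donor), float(acceptor)] + [float(in_ring)]
             # Protein-only blocks stay all-zero for ligand atoms (assumption)
             + one_hot(residue, 20) + [float(backbone)])
    assert len(feats) == NODE_DIM
    return feats

# A ligand atom (no residue) and a protein backbone atom; indices are illustrative
ligand_atom = encode_atom(0, 3, -0.12, 2, True, False, False, True)
protein_atom = encode_atom(1, 5, 0.05, 2, False, True, False, False, residue=4, backbone=True)
```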
Edge Features (23 dimensions)
- Distance (1 dim)
- Gaussian RBF distance encoding (16 dims)
- Bond type one-hot (4 dims)
- Interaction type flags (2 dims)
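A Gaussian radial basis function (RBF) encoding expands one scalar distance into a smooth vector, one entry per centre, so the network can respond differently to short and long contacts. The sketch assembles the 23-dim edge vector (1 + 16 + 4 + 2). The 0 to 10 Å centre range, the width rule, the four bond-type names, and the meaning of the two flags are assumptions, since the section gives only the block sizes; `gaussian_rbf` and `edge_features` are hypothetical names.

```python
import math

def gaussian_rbf(d, num_centers=16, d_min=0.0, d_max=10.0):
    """Expand a distance (Å) onto evenly spaced Gaussians; range and width are assumptions."""
    step = (d_max - d_min) / (num_centers - 1)
    gamma = 1.0 / step ** 2  # width tied to centre spacing (hypothetical choice)
    return [math.exp(-gamma * (d - (d_min + k * step)) ** 2) for k in range(num_centers)]

BOND_TYPES = ("single", "double", "triple", "aromatic")  # hypothetical 4-way vocabulary

def edge_features(d, bond_type=None, interaction_flags=(False, False)):
    """1 raw distance + 16 RBF + 4 bond one-hot + 2 interaction flags = 23 dims."""
    bond = [1.0 if b == bond_type else 0.0 for b in BOND_TYPES]  # all zeros if non-bonded
    return [d] + gaussian_rbf(d) + bond + [float(f) for f in interaction_flags]

covalent = edge_features(1.4, "aromatic", (True, False))
contact = edge_features(3.3)  # non-bonded protein-ligand contact
```

The RBF entries peak at the centre nearest the input distance, so nearby distances produce similar vectors.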
Benchmark Performance
ULVSH Dataset (942 compounds, 10 protein targets):
| Metric | Value |
|---|---|
| Pearson R | 0.82 |
| Spearman ρ | 0.80 |
| RMSE | 0.32 |
| MAE | 0.12 |
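The tables report standard regression metrics, but the section does not state how they were computed. The pure-Python sketch below uses the usual definitions, with Spearman ρ computed as Pearson R on average ranks (a common way of handling ties). The function names are illustrative.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(v):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    return pearson_r(average_ranks(x), average_ranks(y))

def rmse(pred, true):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

def mae(pred, true):
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

pred = [5.0, 6.0, 7.0, 8.0]
true = [5.5, 6.0, 6.5, 8.0]
```

Pearson R measures linear agreement, Spearman ρ only the ordering (the quantity that matters for ranking compounds), and RMSE penalises large errors more heavily than MAE.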
BindingDB Dataset (8,891 protein-ligand complexes):
| Training Configuration | Test Pearson R | Test RMSE |
|---|---|---|
| BindingDB Only | 0.81 | |
| BindingDB + ULVSH | 0.79 | 0.96 |
PDBbind v2020 (5,316 complexes):
| Metric | Value |
|---|---|
| Pearson R | 0.88 |
| Spearman ρ | 0.88 |
| RMSE | 0.93 pK |
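pK and pEC50 values are negative base-10 logarithms of molar concentrations, so an RMSE in pK units is an error in log units. The sketch below, assuming raw affinities given in nM (the unit handling of the training data is not described here), shows the conversion and what an RMSE of 0.93 means as a fold error in concentration. `nm_to_p` and `fold_error` are hypothetical helper names.

```python
import math

def nm_to_p(value_nm):
    """Convert an affinity in nM (Kd, Ki, EC50, ...) to p-units: -log10(value in M)."""
    return 9.0 - math.log10(value_nm)

def fold_error(rmse_log_units):
    """An error of r log10 units corresponds to a 10**r-fold error in concentration."""
    return 10.0 ** rmse_log_units

p_1nm = nm_to_p(1.0)      # 1 nM   -> 9.0
p_100nm = nm_to_p(100.0)  # 100 nM -> 7.0
typical_fold = fold_error(0.93)
```

An RMSE of 0.93 pK therefore corresponds to a typical error of roughly 8.5-fold in the predicted affinity.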
PandaDock-GNN outperforms all baseline methods it was compared against, including VM2, MMPBSA, MMGBSA, Gnina, Hyde, DeltaVina, GFN-FF, and PM6.
Usage
Training on ULVSH:
```bash
pandadock gnn train -d ULVSH/ -o models/ --epochs 100
```
Training on BindingDB:
```bash
python BindingDB_training/train_bindingdb.py \
    --bindingdb BindingDB_training/bindingdb_affinity.tsv \
    --output models/ --epochs 100
```
Combined Training (BindingDB + ULVSH):
```bash
python BindingDB_training/train_bindingdb.py \
    --bindingdb BindingDB_training/bindingdb_affinity.tsv \
    --ulvsh ULVSH/ --combined \
    --output models/ --epochs 100
```
Prediction:
```bash
pandadock gnn predict -m model.pt -p protein.mol2 -l ligand.mol2
```
Hybrid Docking (Recommended):
```bash
pandadock hybrid -r protein.pdb -l ligand.sdf \
    --center 10 20 30 --box 20 20 20 \
    -m model.pt
```
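To score many ligands against one receptor, the `pandadock gnn predict` command can be called in a loop. This sketch uses only the `-m`, `-p`, and `-l` flags from the Prediction command; the helper names are hypothetical, and because the command's output format is not documented here, stdout is returned as raw text rather than parsed. A `dry_run` mode returns the command lines without executing anything.

```python
import shlex
import subprocess

def build_predict_cmd(model, protein, ligand):
    """Argument list for one prediction, using only the documented -m/-p/-l flags."""
    return ["pandadock", "gnn", "predict",
            "-m", str(model), "-p", str(protein), "-l", str(ligand)]

def predict_many(model, protein, ligands, dry_run=False):
    """Run (or, with dry_run, just list) one prediction per ligand file."""
    results = {}
    for lig in ligands:
        cmd = build_predict_cmd(model, protein, lig)
        if dry_run:
            results[str(lig)] = shlex.join(cmd)
            continue
        # check=True raises CalledProcessError if pandadock exits with an error
        proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
        results[str(lig)] = proc.stdout  # raw text; output format not documented here
    return results
```

For example, `predict_many("model.pt", "protein.mol2", sorted(pathlib.Path("ligands").glob("*.mol2")))` would score every MOL2 file in a hypothetical `ligands/` directory.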
References
The EGNN architecture is based on:
Satorras, V. G., Hoogeboom, E., & Welling, M. (2021). E(n) Equivariant Graph Neural Networks. International Conference on Machine Learning (ICML).