Hybrid Docking
The hybrid docking workflow combines traditional pose generation with GNN rescoring to improve affinity ranking. This is the recommended approach for production use.
Why Hybrid Docking?
Traditional docking algorithms are good at:
Generating diverse pose conformations
Sampling the binding site efficiently
Running quickly
But they struggle with:
Accurate affinity ranking
Correlation with experimental data
The GNN excels at:
Predicting binding affinity
Ranking poses correctly
Correlating with experimental pEC50
The hybrid approach combines these strengths.
Workflow
Phase 1: Pose Generation
┌─────────────────────────────────────────┐
│ Hierarchical Docking (Vina scoring)     │
│ → Generate 50+ diverse poses            │
└────────────────────┬────────────────────┘
                     │
Phase 2: GNN Rescoring
┌────────────────────▼────────────────────┐
│ SE(3)-Equivariant GNN                   │
│ → Predict pEC50 for each pose           │
└────────────────────┬────────────────────┘
                     │
Phase 3: Ranking & Output
┌────────────────────▼────────────────────┐
│ Rank by GNN pEC50                       │
│ → Output top-K poses                    │
└─────────────────────────────────────────┘
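Phases 2 and 3 amount to scoring every pose and sorting by the predicted value. The following is a minimal sketch of that logic only, not PandaDock's implementation: the `Pose` class, `rescore_and_rank`, and the `toy_model` lookup are hypothetical stand-ins, and the numbers are borrowed from the example output table on this page.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Pose:
    """Hypothetical pose record; not a PandaDock class."""
    pose_id: int
    vina_energy: float                  # score from Phase 1
    gnn_pec50: Optional[float] = None   # filled in during Phase 2


def rescore_and_rank(poses: List[Pose],
                     predict_pec50: Callable[[Pose], float],
                     top_k: int = 10) -> List[Pose]:
    """Score each pose (Phase 2), then keep the top_k by predicted pEC50 (Phase 3)."""
    for pose in poses:
        pose.gnn_pec50 = predict_pec50(pose)
    # Higher pEC50 means higher predicted potency, so sort in descending order.
    ranked = sorted(poses, key=lambda p: p.gnn_pec50, reverse=True)
    return ranked[:top_k]


# Toy data: pose 3 has the best Vina energy but not the best predicted pEC50.
poses = [Pose(1, -8.23), Pose(2, -7.89), Pose(3, -8.45)]
toy_model = {1: 7.234, 2: 6.891, 3: 6.543}  # stand-in for the GNN
top = rescore_and_rank(poses, lambda p: toy_model[p.pose_id], top_k=2)
print([p.pose_id for p in top])  # [1, 2]
```

The point of the sketch is that the final ranking ignores the Vina energy entirely: Vina is used to generate candidates, and the GNN prediction alone decides the order.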
Basic Usage
pandadock hybrid -r protein.pdb -l ligand.sdf \
--center 10 20 30 --box 20 20 20 \
-m models/best_model.pt
Full Options
pandadock hybrid \
--receptor protein.pdb \
--ligand ligand.sdf \
--center 10 20 30 \
--box 20 20 20 \
--model model.pt \
--output-dir hybrid_results/ \
--num-poses 50 \
--top-k 10 \
--fast # Optional: quick mode
Command Options
| Option | Default | Description |
|---|---|---|
| ``-r/--receptor`` | Required | Receptor PDB file |
| ``-l/--ligand`` | Required | Ligand file (SDF/MOL2/PDB) |
| ``--center`` | Required | Grid center coordinates (X Y Z) |
| ``--box`` | Required | Grid box dimensions (X Y Z) |
| ``-m/--model`` | Required | GNN model checkpoint |
| ``-o/--output-dir`` | hybrid/ | Output directory |
| ``-n/--num-poses`` | 50 | Poses to generate for rescoring |
| ``--top-k`` | 10 | Top poses to keep after rescoring |
| ``--fast`` | False | Use fast mode (fewer poses) |
Output Files
The output directory contains:
hybrid_results.csv: Rankings with GNN and Vina scores
pose_1_pec50_X.XX.pdb: Top pose structures
complex_1.pdb: Protein-ligand complexes
Example output table:
| Rank | GNN pEC50 | GNN Energy | Vina Energy | Activity |
|---|---|---|---|---|
| 1 | 7.234 | -9.87 | -8.23 | 0.92 |
| 2 | 6.891 | -9.41 | -7.89 | 0.85 |
| 3 | 6.543 | -8.93 | -8.45 | 0.78 |
| ... | ... | ... | ... | ... |
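To read the GNN pEC50 column in concentration terms, recall that pEC50 is conventionally the negative base-10 logarithm of EC50 in mol/L. The sketch below assumes the model's output follows that convention; `pec50_to_nanomolar` is an illustrative helper, not part of PandaDock.

```python
def pec50_to_nanomolar(pec50: float) -> float:
    """Convert pEC50 (-log10 of EC50 in mol/L) to EC50 in nanomolar."""
    ec50_molar = 10.0 ** (-pec50)
    return ec50_molar * 1e9


# GNN pEC50 values taken from the example output table
for rank, pec50 in enumerate([7.234, 6.891, 6.543], start=1):
    ec50_nm = pec50_to_nanomolar(pec50)
    print(f"rank {rank}: pEC50 {pec50:.3f} ~ EC50 {ec50_nm:.0f} nM")
```

Because the scale is logarithmic, a difference of one pEC50 unit corresponds to a tenfold difference in predicted EC50.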
Performance
Typical timings on GPU:
Phase 1 (50 poses): 10-30 seconds
Phase 2 (rescoring): 1-2 seconds
Total: 15-35 seconds per ligand
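For planning a screen, the per-ligand total translates directly into wall-clock time. This is a back-of-the-envelope sketch that assumes ligands are processed one at a time and that the 15-35 second range holds for every ligand; the library size of 1000 is a hypothetical example.

```python
def screening_hours(n_ligands: int, seconds_per_ligand: float) -> float:
    """Wall-clock hours for a serial screen at a fixed per-ligand cost."""
    return n_ligands * seconds_per_ligand / 3600


n = 1000  # hypothetical library size
low, high = screening_hours(n, 15), screening_hours(n, 35)
print(f"{n} ligands: {low:.1f} to {high:.1f} hours")  # 1000 ligands: 4.2 to 9.7 hours
```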
Compared to traditional docking alone:
~5x higher affinity correlation (R = 0.67 vs. 0.12)
Same pose generation quality
Minimal overhead from GNN rescoring (1-2 seconds per ligand)
Best Practices
Generate many poses: Use ``--num-poses 50`` or more
Keep diverse poses: The GNN will find the best ones
Use a trained model: Train on similar targets if possible
Check the activity probability: A high value indicates a more reliable prediction
Virtual Screening
For screening many ligands:
for ligand in ligands/*.sdf; do
    pandadock hybrid -r protein.pdb -l "$ligand" \
        --center 10 20 30 --box 20 20 20 \
        -m model.pt -o "results/$(basename "$ligand" .sdf)"
done
Or use the Python API for parallelization.
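The Python API itself is not shown on this page, so the sketch below does not use it. Instead it parallelizes the same CLI call as the shell loop, using only flags that appear above. The function names (`build_command`, `dock_one`, `screen`) and the default of two workers are assumptions for illustration; on a single GPU, concurrent jobs may compete for memory, so choose `max_workers` accordingly.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List, Sequence, Tuple


def build_command(ligand: Path,
                  receptor: str = "protein.pdb",
                  model: str = "model.pt",
                  center: Sequence[float] = (10, 20, 30),
                  box: Sequence[float] = (20, 20, 20),
                  out_root: str = "results") -> List[str]:
    """Assemble the same command line as the shell loop for one ligand."""
    return [
        "pandadock", "hybrid",
        "-r", receptor,
        "-l", str(ligand),
        "--center", *map(str, center),
        "--box", *map(str, box),
        "-m", model,
        "-o", str(Path(out_root) / ligand.stem),  # results/<ligand name>
    ]


def dock_one(ligand: Path) -> Tuple[str, int]:
    """Run one docking job and return (ligand file name, exit code)."""
    result = subprocess.run(build_command(ligand), capture_output=True, text=True)
    return ligand.name, result.returncode


def screen(ligand_dir: str = "ligands", max_workers: int = 2) -> List[Tuple[str, int]]:
    """Dock every .sdf file in ligand_dir, running max_workers jobs at a time."""
    ligands = sorted(Path(ligand_dir).glob("*.sdf"))
    # Threads suffice here: each worker just waits on an external process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(dock_one, ligands))
```

Calling `screen("ligands", max_workers=2)` returns one `(name, exit code)` pair per ligand, which makes it easy to spot failed runs (non-zero exit codes) after the screen finishes.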
Comparison with Traditional Docking
| Metric | Traditional | Hybrid |
|---|---|---|
| Affinity Correlation (R) | 0.12 | 0.67 |
| Ranking Accuracy | Low | High |
| Speed | Fast | Fast |
| Pose Quality | Good | Good |