Hybrid Docking

The hybrid docking workflow generates poses with a traditional docking engine and then rescores them with a GNN, which improves affinity ranking over docking alone. This is the recommended approach for production use.

Why Hybrid Docking?

Traditional docking algorithms are good at:

  • Generating diverse pose conformations

  • Sampling the binding site efficiently

  • Running quickly

But they struggle with:

  • Accurate affinity ranking

  • Correlation with experimental data

The GNN excels at:

  • Predicting binding affinity

  • Ranking poses correctly

  • Correlating with experimental pEC50

The hybrid approach combines these strengths.

Workflow

Phase 1: Pose Generation
┌─────────────────────────────────────────┐
│  Hierarchical Docking (Vina scoring)    │
│  → Generate 50+ diverse poses           │
└────────────────────┬────────────────────┘
                     │
Phase 2: GNN Rescoring
┌────────────────────▼────────────────────┐
│  SE(3)-Equivariant GNN                  │
│  → Predict pEC50 for each pose          │
└────────────────────┬────────────────────┘
                     │
Phase 3: Ranking & Output
┌────────────────────▼────────────────────┐
│  Rank by GNN pEC50                      │
│  → Output top-K poses                   │
└─────────────────────────────────────────┘
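
The three phases above can be sketched in a few lines of Python. This is a minimal illustration of the rescoring logic only, not PandaDock's implementation: `ScoredPose`, `hybrid_rank`, and the `predict_pec50` callable are hypothetical names, and the toy numbers are copied from the example output table later on this page.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class ScoredPose:
    pose_id: int
    vina_energy: float  # Phase 1 docking score
    gnn_pec50: float    # Phase 2 GNN prediction (higher = more potent)


def hybrid_rank(
    vina_energies: Sequence[float],
    predict_pec50: Callable[[int], float],
    top_k: int = 10,
) -> List[ScoredPose]:
    """Rescore every generated pose and keep the top_k by predicted pEC50."""
    # Phase 2: one GNN prediction per pose produced in Phase 1.
    scored = [
        ScoredPose(i, energy, predict_pec50(i))
        for i, energy in enumerate(vina_energies)
    ]
    # Phase 3: rank by GNN pEC50 (descending), not by Vina energy.
    scored.sort(key=lambda p: p.gnn_pec50, reverse=True)
    return scored[:top_k]


# Toy stand-ins for the docking engine and the GNN (illustrative values only).
toy_vina = [-8.23, -7.89, -8.45]
toy_pec50 = {0: 7.234, 1: 6.891, 2: 6.543}
ranked = hybrid_rank(toy_vina, toy_pec50.__getitem__, top_k=2)
print([p.pose_id for p in ranked])  # [0, 1]
```

Pose 2 has the most favorable Vina energy (-8.45) but the lowest predicted pEC50, so it drops out of the top 2. That reordering is the point of the rescoring step.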

Basic Usage

pandadock hybrid -r protein.pdb -l ligand.sdf \
                 --center 10 20 30 --box 20 20 20 \
                 -m models/best_model.pt

Full Options

pandadock hybrid \
    --receptor protein.pdb \
    --ligand ligand.sdf \
    --center 10 20 30 \
    --box 20 20 20 \
    --model model.pt \
    --output hybrid_results/ \
    --num-poses 50 \
    --top-k 10 \
    --fast                    # Optional: quick mode

Command Options

Option            Default     Description
-r/--receptor     Required    Receptor PDB file
-l/--ligand       Required    Ligand file (SDF/MOL2/PDB)
--center          Required    Grid center coordinates (X Y Z)
--box             Required    Grid box dimensions (X Y Z)
-m/--model        Required    GNN model checkpoint
-o/--output       hybrid/     Output directory
-n/--num-poses    50          Poses to generate for rescoring
--top-k           10          Top poses to keep after rescoring
--fast            False       Use fast mode (fewer poses)

Output Files

The output directory contains:

  • hybrid_results.csv: Rankings with GNN and Vina scores

  • pose_1_pec50_X.XX.pdb: Top pose structures

  • complex_1.pdb: Protein-ligand complexes

Example output table:

Rank  GNN pEC50    GNN Energy    Vina Energy    Activity
1     7.234        -9.87         -8.23          0.92
2     6.891        -9.41         -7.89          0.85
3     6.543        -8.93         -8.45          0.78
...
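
As a rough sketch of how the rankings might be post-processed, the snippet below filters rows by activity probability using only the standard library. The header names (`rank`, `gnn_pec50`, `gnn_energy`, `vina_energy`, `activity`) and the 0.8 cutoff are assumptions made for illustration; check the header row of your own hybrid_results.csv and adjust accordingly. The sample rows repeat the example table above.

```python
import csv
import io

# Assumed header names; adjust to match the actual hybrid_results.csv.
SAMPLE_CSV = """rank,gnn_pec50,gnn_energy,vina_energy,activity
1,7.234,-9.87,-8.23,0.92
2,6.891,-9.41,-7.89,0.85
3,6.543,-8.93,-8.45,0.78
"""


def confident_poses(csv_file, min_activity=0.8):
    """Return (rank, pEC50) for rows whose activity value meets the cutoff."""
    rows = csv.DictReader(csv_file)
    return [
        (int(row["rank"]), float(row["gnn_pec50"]))
        for row in rows
        if float(row["activity"]) >= min_activity
    ]


selected = confident_poses(io.StringIO(SAMPLE_CSV))
print(selected)  # [(1, 7.234), (2, 6.891)]
```

To run this against a real results file, pass an open file handle instead of the in-memory sample, for example `open("hybrid_results/hybrid_results.csv", newline="")`.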

Performance

Typical timings on GPU:

  • Phase 1 (50 poses): 10-30 seconds

  • Phase 2 (rescoring): 1-2 seconds

  • Total: 15-35 seconds per ligand

Compared to traditional docking alone:

  • ~5x higher affinity correlation (R = 0.67 vs. 0.12, per the comparison table below)

  • Same pose generation quality

  • Minimal overhead from GNN
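
The per-ligand timings above translate into screening throughput with simple arithmetic. The helper names below are hypothetical, and the `workers` parameter assumes ideal linear scaling across parallel jobs, which real GPU workloads may not achieve.

```python
def ligands_per_hour(seconds_per_ligand: float, workers: int = 1) -> float:
    """Throughput for a given per-ligand wall time (idealized linear scaling)."""
    return workers * 3600.0 / seconds_per_ligand


def library_hours(n_ligands: int, seconds_per_ligand: float, workers: int = 1) -> float:
    """Wall-clock hours needed to screen n_ligands."""
    return n_ligands * seconds_per_ligand / (3600.0 * workers)


# Bounds taken from the "Total: 15-35 seconds per ligand" figure above.
fast, slow = 15.0, 35.0
print(round(ligands_per_hour(slow)), round(ligands_per_hour(fast)))  # 103 240
print(round(library_hours(10_000, fast), 1), round(library_hours(10_000, slow), 1))  # 41.7 97.2
```

On a single GPU, a 10,000-ligand library would therefore take roughly 42 to 97 hours at these timings, which is one reason to parallelize larger screens.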

Best Practices

  1. Generate many poses: Use --num-poses 50 or more

  2. Keep diverse poses: The GNN will find the best ones

  3. Use a trained model: Train on similar targets if possible

  4. Check activity probability: Values close to 1 in the Activity column indicate a more confident prediction

Virtual Screening

For screening many ligands:

for ligand in ligands/*.sdf; do
    pandadock hybrid -r protein.pdb -l "$ligand" \
                     --center 10 20 30 --box 20 20 20 \
                     -m model.pt -o "results/$(basename "$ligand" .sdf)"
done

Or use the Python API for parallelization.
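
Since the Python API is not described on this page, the sketch below parallelizes the same command-line call from the loop above using only the standard library. `build_command` and `screen` are hypothetical helper names, and the default of 4 concurrent jobs is an assumption; the right number depends on GPU memory and CPU cores.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(ligand: Path, receptor: str, model: str, out_root: Path) -> list:
    """Build the same `pandadock hybrid` call used in the shell loop."""
    return [
        "pandadock", "hybrid",
        "-r", receptor,
        "-l", str(ligand),
        "--center", "10", "20", "30",
        "--box", "20", "20", "20",
        "-m", model,
        "-o", str(out_root / ligand.stem),
    ]


def screen(ligand_dir, receptor="protein.pdb", model="model.pt",
           out_root="results", max_workers=4):
    """Run one docking job per ligand, max_workers at a time; return exit codes."""
    ligands = sorted(Path(ligand_dir).glob("*.sdf"))
    commands = [build_command(lig, receptor, model, Path(out_root)) for lig in ligands]
    # Threads are sufficient here: each one just waits on an external process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        codes = pool.map(lambda cmd: subprocess.run(cmd).returncode, commands)
        return dict(zip((lig.name for lig in ligands), codes))
```

Calling `screen("ligands")` returns a mapping from ligand file name to exit code, so failed jobs (non-zero codes) can be retried separately.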

Comparison with Traditional Docking

Metric                      Traditional    Hybrid
Affinity Correlation (R)    0.12           0.67
Ranking Accuracy            Low            High
Speed                       Fast           Fast
Pose Quality                Good           Good
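
To measure affinity correlation on your own targets, one option is the Pearson correlation between predicted and experimental pEC50 values. This page does not state which correlation measure produced the R values in the table, so treating R as Pearson's r is an assumption, and the numbers in the snippet are made up for illustration rather than taken from any benchmark.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up values purely for illustration, not benchmark data.
predicted = [7.2, 6.9, 6.5, 5.8]
experimental = [7.0, 7.1, 6.2, 5.9]
print(round(pearson_r(predicted, experimental), 2))
```

With only a handful of ligands the estimate is noisy, so a meaningful comparison between traditional and hybrid scoring needs a reasonably large set of compounds with measured activities.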