pandadock gnn - GNN Commands Reference

The pandadock gnn command group provides access to the SE(3)-equivariant Graph Neural Network scoring function.

Synopsis

pandadock gnn COMMAND [OPTIONS]

Commands

  • download-model - Download pre-trained model (~82 MB)

  • train - Train GNN model on protein-ligand dataset

  • predict - Predict binding affinity for a complex

  • benchmark - Benchmark model performance on test set

  • compare - Compare GNN against baseline scoring methods

  • rescore - Universal rescorer for poses from ANY docking tool

pandadock gnn download-model

Download the official pre-trained PandaDock-GNN model from GitHub releases.

The model was trained on the combined ULVSH + PDBbind dataset (200 epochs) and achieves:

  • PDBbind Pearson R: 0.88

  • ULVSH Test Pearson R: 0.82

  • ULVSH Activity AUC: 0.94

Options:

-o, --output PATH

Output directory for the model. Default: models/

-v, --version TEXT

Model version to download. Default: latest

-f, --force

Overwrite existing model file

Example:

# Download to default location
pandadock gnn download-model

# Download to custom directory
pandadock gnn download-model -o /path/to/models/

# Force re-download
pandadock gnn download-model --force

Output:

The model is saved as pandadock_gnn_v3.pt in the output directory.

After downloading, use the model with:

pandadock gnn predict -m models/pandadock_gnn_v3.pt -p protein.mol2 -l ligand.mol2
pandadock gnn rescore -m models/pandadock_gnn_v3.pt -r protein.pdb -p poses.sdf
pandadock hybrid -r protein.pdb -l ligand.sdf -m models/pandadock_gnn_v3.pt --center X Y Z --box X Y Z

pandadock gnn train

Train the PandaDock-GNN model on a protein-ligand dataset.

Required Options:

-d, --dataset PATH

Path to ULVSH dataset directory

-o, --output PATH

Output directory for checkpoints and logs

Optional Options:

--epochs N

Number of training epochs. Default: 100

--batch-size N

Batch size. Default: 32

--lr FLOAT

Learning rate. Default: 1e-4

--hidden-dim N

Hidden dimension. Default: 256

--num-layers N

Number of EGNN layers. Default: 6

--dropout FLOAT

Dropout rate. Default: 0.1

--split [random|target]

Data split strategy. Default: random

--patience N

Early stopping patience. Default: 20

--gpu / --cpu

Use GPU if available. Default: --gpu

--seed N

Random seed for reproducibility. Default: 42

Example:

pandadock gnn train -d ULVSH/ -o models/ --epochs 100
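
The --patience option follows the usual early-stopping pattern: training ends once the monitored validation quantity has not improved for N consecutive epochs. The sketch below illustrates that general rule in plain Python. Which quantity PandaDock monitors is not stated here, so the validation loss, the function name early_stop_epoch, and the sample values are all assumptions made for illustration.

```python
def early_stop_epoch(val_losses, patience=20):
    """Return the 1-based epoch at which training would stop, or None.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Illustrative losses (not real training output): improvement stalls after epoch 3.
losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.74]
stop_epoch = early_stop_epoch(losses, patience=3)
print(stop_epoch)
```

With the default patience of 20 and --epochs 100, a run can end well before epoch 100 if the validation metric plateaus early.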

pandadock gnn predict

Predict binding affinity for a protein-ligand complex.

Required Options:

-m, --model PATH

Path to trained model checkpoint

-p, --protein PATH

Protein file (MOL2 or PDB)

-l, --ligand PATH

Ligand file (MOL2 or SDF)

Optional Options:

-s, --site PATH

Binding site MOL2 file

-o, --output PATH

Output JSON file for results

Example:

pandadock gnn predict -m model.pt -p protein.mol2 -l ligand.mol2

Output:

Prediction Results:
  pEC50: 6.234
  Energy: -8.52 kcal/mol
  Activity probability: 0.87
  Predicted active: True
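
The affinity and energy values in the sample output are roughly consistent with the standard thermodynamic relation ΔG = -RT ln(10) · pK near room temperature. The sketch below shows that conversion only as an aid to reading the two numbers together. It is an observation about the sample values, not a statement of how PandaDock derives the energy. Treating the reported pEC50 as a pKd-like quantity and using 298.15 K are both assumptions.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def pk_to_free_energy(pk, temperature=298.15):
    """Convert a pK-style affinity (e.g. pKd, pKi) to a free energy in kcal/mol."""
    return -R_KCAL * temperature * math.log(10) * pk


# Affinity value taken from the sample output above.
dg = pk_to_free_energy(6.234)
print(f"{dg:.2f} kcal/mol")
```

For the sample value of 6.234 this gives about -8.50 kcal/mol, within a few hundredths of the -8.52 kcal/mol shown above.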

pandadock gnn benchmark

Benchmark GNN model performance on a test set.

Required Options:

-m, --model PATH

Path to trained model checkpoint

-d, --dataset PATH

Path to ULVSH dataset directory

-o, --output PATH

Output directory for results

Optional Options:

--split [train|val|test]

Dataset split to evaluate. Default: test

Example:

pandadock gnn benchmark -m model.pt -d ULVSH/ -o results/

Output:

Generates metrics.json with Pearson R, Spearman rho, RMSE, and MAE.
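
For readers unfamiliar with the four reported metrics, the following sketch computes them from paired experimental and predicted affinities using only the Python standard library. It illustrates the textbook definitions, not PandaDock's own evaluation code. The key names used inside metrics.json are not documented here, so the dictionary keys and the sample values are illustrative assumptions.

```python
import math


def pearson_r(x, y):
    """Linear correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Rank correlation: Pearson R computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


def rmse(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def mae(x, y):
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Illustrative values only, not benchmark data.
experimental = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.8, 7.4, 7.9]
metrics = {
    "pearson_r": pearson_r(experimental, predicted),
    "spearman_rho": spearman_rho(experimental, predicted),
    "rmse": rmse(experimental, predicted),
    "mae": mae(experimental, predicted),
}
print(metrics)
```

Pearson R and Spearman rho measure how well predictions track the experimental ordering and scale, while RMSE and MAE measure absolute error in the units of the predicted affinity.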

pandadock gnn compare

Compare GNN performance against all baseline scoring methods from the ULVSH dataset.

Required Options:

-m, --model PATH

Path to trained model checkpoint

-d, --dataset PATH

Path to ULVSH dataset directory

-o, --output PATH

Output directory for comparison results

Optional Options:

--split [train|val|test|all]

Dataset split to evaluate. Default: test

Example:

pandadock gnn compare -m model.pt -d ULVSH/ -o comparison/

Output:

Generates:

  • comparison_results.csv - Metrics for all methods

  • comparison_results.json - JSON format

  • comparison_plot.png - Bar chart visualization

Example output:

COMPARISON RESULTS (sorted by Pearson R)
======================================================================

Method               Type            N      Pearson R
------------------------------------------------------------
>>> PandaDock-GNN    ML Scoring    942        0.6705 <<<
VM2                  ULVSH Baseline 942        0.1452
PM6                  ULVSH Baseline 939        0.0809
Hyde                 ULVSH Baseline 942        0.0178
...

PandaDock-GNN Rank: 1/9
*** PandaDock-GNN achieves BEST performance! ***

pandadock gnn rescore

Universal GNN rescorer for poses from any docking tool.

This command rescores docked poses from any docking software (AutoDock Vina, Glide, GOLD, pandadock-flex, pandadock-metal, etc.) with the SE(3)-equivariant GNN scoring function. The poses must be supplied as a multi-conformer SDF file.

Required Options:

-m, --model PATH

Path to trained model checkpoint

-r, --receptor PATH

Receptor file (PDB or MOL2)

-p, --poses PATH

Poses file (multi-conformer SDF from any docking tool)

Optional Options:

-o, --output PATH

Output CSV file with ranked poses. Default: rescored_poses.csv

--output-sdf PATH

Output SDF file with poses ranked by GNN score and GNN properties added

--site-radius FLOAT

Radius around ligand centroid to extract binding site (Angstrom). Default: 10.0
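
The idea behind --site-radius can be sketched as a distance filter around the ligand centroid. The example below is a minimal illustration with hand-written coordinates. Whether PandaDock selects individual atoms or whole residues, and how it treats atoms exactly on the boundary, is not stated here, so the per-atom selection and the function names are assumptions.

```python
import math


def centroid(coords):
    """Arithmetic mean of a list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def atoms_within_radius(receptor_coords, ligand_coords, radius=10.0):
    """Indices of receptor atoms within `radius` Angstrom of the ligand centroid."""
    center = centroid(ligand_coords)
    return [
        index
        for index, xyz in enumerate(receptor_coords)
        if math.dist(xyz, center) <= radius
    ]


# Illustrative coordinates in Angstrom (not taken from a real structure).
ligand = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]  # centroid at (1, 0, 0)
receptor = [
    (1.0, 5.0, 0.0),    # 5.0 from the centroid: kept
    (8.0, 0.0, 0.0),    # 7.0 from the centroid: kept
    (1.0, 0.0, 12.0),   # 12.0 from the centroid: dropped
    (-14.0, 0.0, 0.0),  # 15.0 from the centroid: dropped
]
site_atoms = atoms_within_radius(receptor, ligand, radius=10.0)
print(site_atoms)
```

A larger radius includes more of the receptor in the graph at the cost of more computation per pose.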

Examples:

# Rescore poses from pandadock-flex
pandadock gnn rescore -m model.pt -r protein.pdb -p flex_poses.sdf

# Rescore AutoDock Vina output (poses converted to SDF)
pandadock gnn rescore -m model.pt -r receptor.pdb -p vina_out.sdf -o ranked.csv

# Get ranked SDF with GNN scores as properties
pandadock gnn rescore -m model.pt -r protein.pdb -p poses.sdf --output-sdf ranked.sdf

# Rescore Glide output
pandadock gnn rescore -m model.pt -r protein.pdb -p glide_poses.sdf -o glide_rescored.csv

Output CSV Format:

pose_name,pose_index,gnn_pKd,gnn_energy,activity_prob,predicted_active,gnn_rank
pose_3,3,7.234,-9.88,0.92,True,1
pose_1,1,6.891,-9.41,0.88,True,2
pose_5,5,6.543,-8.93,0.81,True,3
...
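
The CSV can be post-processed with any standard tool. As a minimal sketch, the following Python snippet parses the columns shown above with the standard csv module, converts them to native types, and keeps poses above an activity-probability cutoff. The three rows are copied from the sample above; the helper name load_rescored and the 0.85 cutoff are illustrative choices, not part of PandaDock.

```python
import csv
import io

# Rows copied from the sample output above.
sample = """pose_name,pose_index,gnn_pKd,gnn_energy,activity_prob,predicted_active,gnn_rank
pose_3,3,7.234,-9.88,0.92,True,1
pose_1,1,6.891,-9.41,0.88,True,2
pose_5,5,6.543,-8.93,0.81,True,3
"""


def load_rescored(handle):
    """Read rescored poses from a file-like object, sorted by GNN rank."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "pose_name": row["pose_name"],
            "pose_index": int(row["pose_index"]),
            "gnn_pKd": float(row["gnn_pKd"]),
            "gnn_energy": float(row["gnn_energy"]),
            "activity_prob": float(row["activity_prob"]),
            "predicted_active": row["predicted_active"] == "True",
            "gnn_rank": int(row["gnn_rank"]),
        })
    return sorted(rows, key=lambda r: r["gnn_rank"])


poses = load_rescored(io.StringIO(sample))
best = poses[0]
confident = [p["pose_name"] for p in poses if p["activity_prob"] >= 0.85]
print(best["pose_name"], confident)
```

To read a real result file, pass an open file instead of the in-memory sample, for example open("rescored_poses.csv", newline="").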

Output SDF Properties:

When --output-sdf is specified, each molecule in the output SDF will have:

  • GNN_pKd - Predicted pKd/pKi value

  • GNN_Energy - Predicted binding energy (kcal/mol)

  • GNN_Activity - Activity probability (0-1)

  • GNN_Rank - Rank based on GNN score (1 = best)
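
These properties are stored as ordinary SDF data items, so any SDF-aware toolkit can read them. As a dependency-free sketch, the snippet below extracts data items from SDF text with the standard library. The sample record is a hand-written stub with an empty molecule block, and the exact value formatting PandaDock writes is an assumption. For production use, a cheminformatics toolkit such as RDKit is the more robust choice.

```python
import re


def read_sdf_properties(text):
    """Return one {property_name: value} dict per record in SDF text."""
    records = []
    for block in text.split("$$$$"):
        if not block.strip():
            continue
        lines = block.strip("\n").splitlines()
        props = {}
        i = 0
        while i < len(lines):
            # A data item header looks like ">  <NAME>".
            match = re.match(r">.*?<([^>]+)>", lines[i])
            if match:
                values = []
                i += 1
                # The value runs until the next blank line.
                while i < len(lines) and lines[i].strip():
                    values.append(lines[i].strip())
                    i += 1
                props[match.group(1)] = "\n".join(values)
            else:
                i += 1
        records.append(props)
    return records


# Hand-written stub record for illustration (empty molecule block).
sample = """pose_3
  illustrative

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
>  <GNN_pKd>
7.234

>  <GNN_Energy>
-9.88

>  <GNN_Activity>
0.92

>  <GNN_Rank>
1

$$$$
"""

records = read_sdf_properties(sample)
print(records[0])
```

Values are returned as strings; convert them with float() or int() as needed.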

Workflow Example:

Combine with any docking tool:

# Step 1: Run flexible docking with pandadock-flex
pandadock-flex -r protein.pdb -l ligand.sdf --center 10 20 30 -o flex_output/

# Step 2: Rescore poses with GNN
pandadock gnn rescore -m model.pt -r protein.pdb -p flex_output/poses.sdf \
    -o flex_rescored.csv --output-sdf flex_rescored.sdf

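# Or with AutoDock Vina (Vina writes PDBQT; convert the poses to SDF first, e.g. with Open Babel)
vina --receptor receptor.pdbqt --ligand ligand.pdbqt --out vina_poses.pdbqt
obabel vina_poses.pdbqt -O vina_poses.sdf
pandadock gnn rescore -m model.pt -r receptor.pdb -p vina_poses.sdf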
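
When the rescoring step is part of a larger pipeline, it can help to assemble the command line programmatically. The sketch below builds a rescore invocation as an argument list using only the options documented above, and prints it instead of running it. The helper name build_rescore_command is hypothetical; PandaDock is driven here purely through its command line, and no Python API is assumed.

```python
import shlex


def build_rescore_command(model, receptor, poses,
                          output_csv=None, output_sdf=None, site_radius=None):
    """Assemble a `pandadock gnn rescore` argument list from documented options."""
    cmd = ["pandadock", "gnn", "rescore",
           "-m", str(model), "-r", str(receptor), "-p", str(poses)]
    if output_csv is not None:
        cmd += ["-o", str(output_csv)]
    if output_sdf is not None:
        cmd += ["--output-sdf", str(output_sdf)]
    if site_radius is not None:
        cmd += ["--site-radius", str(site_radius)]
    return cmd


cmd = build_rescore_command(
    "model.pt", "protein.pdb", "flex_output/poses.sdf",
    output_csv="flex_rescored.csv", output_sdf="flex_rescored.sdf",
)
command_line = shlex.join(cmd)
print(command_line)

# To actually run it (requires PandaDock to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.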

See Also