GNN Prediction

This guide covers using the trained GNN model for binding affinity prediction.

Basic Prediction

Predict binding affinity for a single protein-ligand complex:

pandadock gnn predict -m models/best_model.pt \
                      -p protein.mol2 -l ligand.mol2

Output:

Loading model...
Predicting...

Prediction Results:
  pEC50: 6.234
  Energy: -8.52 kcal/mol
  Activity probability: 0.87
  Predicted active: True

Supported File Formats

Protein files:

  • MOL2 (recommended)

  • PDB

  • Any format readable by BioPython

Ligand files:

  • MOL2 (recommended)

  • SDF

  • PDB

  • Any format readable by RDKit

Prediction Options

# --site is optional (binding site file); --output saves the results to a file
pandadock gnn predict \
    --model model.pt \
    --protein protein.mol2 \
    --ligand ligand.mol2 \
    --site site.mol2 \
    --output results.json
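
When the command is launched from a script, it can help to assemble the argument list programmatically. The following is a minimal sketch that uses only the flags shown above; the helper name `build_predict_command` is hypothetical and not part of PandaDock.

```python
import shlex


def build_predict_command(model, protein, ligand, site=None, output=None):
    """Return the argument list for `pandadock gnn predict` (hypothetical helper)."""
    cmd = [
        "pandadock", "gnn", "predict",
        "--model", str(model),
        "--protein", str(protein),
        "--ligand", str(ligand),
    ]
    # --site and --output are optional, so only add them when given
    if site is not None:
        cmd += ["--site", str(site)]
    if output is not None:
        cmd += ["--output", str(output)]
    return cmd


cmd = build_predict_command(
    "model.pt", "protein.mol2", "ligand.mol2", output="results.json"
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run` avoids shell quoting problems with file names that contain spaces.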

Output Format

When using --output, results are saved as JSON:

{
  "pec50": 6.234,
  "energy": -8.52,
  "activity_prob": 0.87,
  "active": true
}
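
A saved results file can be read back with the standard `json` module. The sketch below writes the example output shown above to a temporary file and loads it again, so it runs without PandaDock installed; in practice you would open your own `results.json`.

```python
import json
import tempfile
from pathlib import Path

# The example output from this guide, used as stand-in data
sample = {"pec50": 6.234, "energy": -8.52, "activity_prob": 0.87, "active": True}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "results.json"
    path.write_text(json.dumps(sample, indent=2))

    # Reading the file back is all that is needed for real results
    result = json.loads(path.read_text())

print(f"pEC50: {result['pec50']:.3f}")
print(f"Energy: {result['energy']:.2f} kcal/mol")
print(f"Active: {result['active']} (p = {result['activity_prob']:.2f})")
```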

Interpreting Results

pEC50

Predicted -log10(EC50) value, with EC50 in mol/L. Higher values mean stronger binding. Typical range: 4-10, corresponding to EC50 values from 100 μM down to 0.1 nM.
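
To read a pEC50 as a concentration, invert the logarithm. This short sketch assumes EC50 is expressed in mol/L, which is the usual convention for pEC50; the function names are illustrative only.

```python
def pec50_to_ec50_molar(pec50):
    """EC50 in mol/L from pEC50 = -log10(EC50)."""
    return 10.0 ** (-pec50)


def pec50_to_ec50_nm(pec50):
    """EC50 in nanomolar (1 M = 1e9 nM)."""
    return 10.0 ** (9.0 - pec50)


for value in (4.0, 6.234, 10.0):
    print(f"pEC50 {value:>6.3f} -> EC50 {pec50_to_ec50_nm(value):.3g} nM")
```

Each unit of pEC50 corresponds to a tenfold change in EC50.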

Energy

Estimated binding energy in kcal/mol. Calculated as: energy = -1.366 * pEC50
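
The conversion can be reproduced in a few lines. The factor 1.366 is close to RT·ln(10) at room temperature, which suggests the usual relation ΔG = -RT·ln(10)·pEC50; that interpretation is an inference and is not stated by this guide. The function name below is illustrative.

```python
import math

ENERGY_PER_PEC50 = -1.366  # kcal/mol per pEC50 unit, as given in this guide


def energy_from_pec50(pec50):
    """Estimated binding energy in kcal/mol."""
    return ENERGY_PER_PEC50 * pec50


# For comparison: RT*ln(10) at 298.15 K, with R in kcal/(mol*K)
R_KCAL = 1.987204e-3
rt_ln10 = R_KCAL * 298.15 * math.log(10)

print(f"Energy for pEC50 6.234: {energy_from_pec50(6.234):.2f} kcal/mol")
print(f"RT*ln(10) at 298.15 K:  {rt_ln10:.3f} kcal/mol")
```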

Activity Probability

Probability that the compound is active (EC50 < threshold). Range: 0-1.

Active

Binary classification: True if activity_prob > 0.5.
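
The four fields are related by the rules above, so a result can be checked for internal consistency. The following is a sketch under the assumption that the result is a dict with the keys shown in the JSON output; `is_active` and `check_result` are hypothetical helpers, and the tolerance allows for rounding in the reported energy.

```python
def is_active(activity_prob, cutoff=0.5):
    """Apply the classification rule: active if probability > cutoff."""
    return activity_prob > cutoff


def check_result(result, tol=0.01):
    """Return True if energy and active agree with pec50 and activity_prob."""
    expected_energy = -1.366 * result["pec50"]
    energy_ok = abs(result["energy"] - expected_energy) <= tol
    active_ok = result["active"] == is_active(result["activity_prob"])
    return energy_ok and active_ok


sample = {"pec50": 6.234, "energy": -8.52, "activity_prob": 0.87, "active": True}
print(check_result(sample))
```

Note that the comparison is strict, so a probability of exactly 0.5 counts as inactive.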

Batch Prediction

For multiple complexes, use the benchmark command:

pandadock gnn benchmark -m model.pt -d dataset/ -o results/

Or use the Python API:

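from pandadock.gnn.scoring import GNNScoring

# Load the model once and reuse it for every complex
scorer = GNNScoring(model_path='model.pt')

# (protein, ligand) file pairs; these paths are placeholders
complexes = [
    ('protein1.mol2', 'ligand1.mol2'),
    ('protein2.mol2', 'ligand2.mol2'),
]

for protein, ligand in complexes:
    result = scorer.predict_affinity(protein_file=protein, ligand_file=ligand)
    print(f"pEC50: {result['pec50']:.3f}")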
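
For larger sets, the list of (protein, ligand) pairs used in the loop above can be built from a directory listing. This sketch assumes a hypothetical naming scheme, `<name>_protein.mol2` and `<name>_ligand.mol2` in one folder; adapt the patterns to your own layout. The demo creates empty files in a temporary directory so it runs on its own.

```python
import tempfile
from pathlib import Path

PROTEIN_SUFFIX = "_protein.mol2"  # assumed naming scheme
LIGAND_SUFFIX = "_ligand.mol2"    # assumed naming scheme


def find_complexes(root):
    """Pair each protein file with the ligand file that shares its prefix."""
    pairs = []
    for protein in sorted(Path(root).glob(f"*{PROTEIN_SUFFIX}")):
        name = protein.name[: -len(PROTEIN_SUFFIX)]
        ligand = protein.with_name(name + LIGAND_SUFFIX)
        # Skip proteins that have no matching ligand file
        if ligand.exists():
            pairs.append((str(protein), str(ligand)))
    return pairs


with tempfile.TemporaryDirectory() as tmp:
    for fname in ("1abc_protein.mol2", "1abc_ligand.mol2", "2xyz_protein.mol2"):
        (Path(tmp) / fname).touch()
    complexes = find_complexes(tmp)
    names = [(Path(p).name, Path(l).name) for p, l in complexes]

print(names)
```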

Python API

Direct access to the GNN scorer:

from pandadock.gnn.scoring import GNNScoring

# Load model
scorer = GNNScoring(model_path='models/best_model.pt')

# Predict from files
result = scorer.predict_affinity(
    protein_file='protein.mol2',
    ligand_file='ligand.mol2',
    site_file='site.mol2'  # optional
)

print(f"pEC50: {result['pec50']:.3f}")
print(f"Energy: {result['energy']:.3f} kcal/mol")

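# Predict from graph (advanced)
# protein_mol2, ligand_mol2 and site_mol2 are the inputs for the complex
# (assumed here to be the same MOL2 files as above)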
from pandadock.gnn.data.graph_builder import HeterogeneousGraphBuilder

builder = HeterogeneousGraphBuilder()
graph = builder.build_graph(protein_mol2, ligand_mol2, site_mol2)
result = scorer.predict_from_graph(graph)

Performance Notes

  • GPU inference: ~0.02 seconds per complex

  • CPU inference: ~0.1 seconds per complex

  • GPU memory: ~200 MB per batch of 32 complexes
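
These figures allow a rough estimate of screening time. The sketch below multiplies the per-complex timings quoted above by the number of complexes; it ignores model loading and file parsing, so treat the result as a lower bound. The function name is illustrative.

```python
# Approximate inference time per complex, taken from the notes above
SECONDS_PER_COMPLEX = {"gpu": 0.02, "cpu": 0.1}


def estimate_runtime_seconds(n_complexes, device="gpu"):
    """Rough inference-only runtime for n_complexes on the given device."""
    return n_complexes * SECONDS_PER_COMPLEX[device]


for device in ("gpu", "cpu"):
    seconds = estimate_runtime_seconds(10_000, device)
    print(f"10,000 complexes on {device.upper()}: ~{seconds / 60:.1f} min")
```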

Troubleshooting

“Could not parse molecule”

Check the file format. MOL2 is the most reliable.
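
A quick way to spot a mislabeled file is to look for the Tripos record markers that every MOL2 file contains. This is a sanity check only, written as a sketch with a hypothetical helper name; it does not reproduce PandaDock's own parsing and will not catch every malformed file.

```python
REQUIRED_SECTIONS = ("@<TRIPOS>MOLECULE", "@<TRIPOS>ATOM")


def missing_mol2_sections(text):
    """Return the required MOL2 section markers that are absent from text."""
    present = {
        line.strip() for line in text.splitlines()
        if line.startswith("@<TRIPOS>")
    }
    return [section for section in REQUIRED_SECTIONS if section not in present]


good = "@<TRIPOS>MOLECULE\nligand\n@<TRIPOS>ATOM\n@<TRIPOS>BOND\n"
bad = "HEADER    HYDROLASE\nATOM      1  N   MET A   1\n"  # PDB-style content

print(missing_mol2_sections(good))
print(missing_mol2_sections(bad))
```

For a file on disk, pass `Path('ligand.mol2').read_text()` to the function.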

“Model config mismatch”

Ensure the model was trained with the same PandaDock version that is used for prediction.

“CUDA out of memory”

Reduce the batch size or run on the CPU (--cpu).