GNN Prediction

This guide covers using the trained GNN model for binding affinity prediction.

Basic Prediction

Predict binding affinity for a single protein-ligand complex:

pandadock gnn predict -m models/best_model.pt \
                      -p protein.mol2 -l ligand.mol2

Output:

Loading model...
Predicting...

Prediction Results:
  pEC50: 6.234
  Energy: -8.52 kcal/mol
  Activity probability: 0.87
  Predicted active: True

Supported File Formats

Protein files:

MOL2 (recommended)
PDB
Any format readable by Biopython

Ligand files:

MOL2 (recommended)
SDF
PDB
Any format readable by RDKit
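
A quick pre-flight check on file extensions can catch unsupported inputs before the model is loaded. The sketch below is not part of PandaDock: the helper name `has_listed_extension` and the extension sets are assumptions drawn from the formats listed above, and it does not cover the "any format readable by" cases.

```python
from pathlib import Path

# Extensions for the formats explicitly listed above (assumed mapping).
LISTED_EXTENSIONS = {
    "protein": {".mol2", ".pdb"},
    "ligand": {".mol2", ".sdf", ".pdb"},
}


def has_listed_extension(path: str, role: str) -> bool:
    """Return True if the file extension matches a format listed for `role`.

    `role` is "protein" or "ligand". Only the extension is checked,
    not the file contents.
    """
    return Path(path).suffix.lower() in LISTED_EXTENSIONS[role]


if __name__ == "__main__":
    for path, role in [("protein.mol2", "protein"), ("ligand.sdf", "ligand")]:
        print(path, role, has_listed_extension(path, role))
```

A file that passes this check can still fail to parse, so it only filters out obvious mistakes such as passing an SDF file as the protein.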

Prediction Options

# --site is optional (binding site); --output saves the results to a file
pandadock gnn predict \
    --model model.pt \
    --protein protein.mol2 \
    --ligand ligand.mol2 \
    --site site.mol2 \
    --output results.json
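
When the CLI is driven from a script, building the argument list in one place keeps the optional flags consistent. The following is a sketch that assumes only the long options shown above; `build_predict_command` and `run_predict` are hypothetical helpers, not PandaDock functions.

```python
import subprocess
from typing import List, Optional


def build_predict_command(
    model: str,
    protein: str,
    ligand: str,
    site: Optional[str] = None,
    output: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for `pandadock gnn predict`."""
    cmd = ["pandadock", "gnn", "predict",
           "--model", model, "--protein", protein, "--ligand", ligand]
    if site is not None:
        cmd += ["--site", site]      # optional binding site
    if output is not None:
        cmd += ["--output", output]  # save results to a file
    return cmd


def run_predict(**kwargs) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_predict_command(**kwargs), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces. A call might look like `run_predict(model="model.pt", protein="protein.mol2", ligand="ligand.mol2", output="results.json")`.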

Output Format

When using --output, results are saved as JSON:

{
  "pec50": 6.234,
  "energy": -8.52,
  "activity_prob": 0.87,
  "active": true
}
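
A saved results file can be read back with the standard library. This is a minimal sketch that assumes the file contains exactly the four keys shown above; `load_result` and `format_result` are illustrative names, not part of PandaDock.

```python
import json
from pathlib import Path


def load_result(path: str) -> dict:
    """Read a results file and check that the documented keys are present."""
    result = json.loads(Path(path).read_text())
    expected = {"pec50", "energy", "activity_prob", "active"}
    missing = expected - result.keys()
    if missing:
        raise KeyError(f"missing keys in {path}: {sorted(missing)}")
    return result


def format_result(result: dict) -> str:
    """Build a one-line summary of a prediction."""
    label = "active" if result["active"] else "inactive"
    return (f"pEC50 {result['pec50']:.2f}, "
            f"energy {result['energy']:.2f} kcal/mol, "
            f"P(active) {result['activity_prob']:.2f} ({label})")
```

For example, `print(format_result(load_result("results.json")))` prints a single summary line for the file written by --output.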

Interpreting Results

pEC50: Predicted -log10(EC50) value. Higher values indicate a more potent compound. Typical range: 4-10 (roughly 100 μM down to 0.1 nM).
Energy: Estimated binding energy in kcal/mol, calculated as energy = -1.366 * pEC50. The factor 1.366 approximates RT ln(10) near room temperature, so more negative energies correspond to higher pEC50.
Activity Probability: Probability that the compound is active (EC50 below the activity threshold). Range: 0-1.
Active: Binary classification. True if activity_prob > 0.5.
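
The relationships above can be reproduced in a few lines. This is a sketch, assuming EC50 is expressed in mol/L (the usual convention for pEC50); the function names are illustrative and not part of the PandaDock API.

```python
KCAL_PER_PEC50_UNIT = 1.366  # factor from the energy formula above


def pec50_to_energy(pec50: float) -> float:
    """Estimated binding energy in kcal/mol: energy = -1.366 * pEC50."""
    return -KCAL_PER_PEC50_UNIT * pec50


def pec50_to_ec50_molar(pec50: float) -> float:
    """Invert pEC50 = -log10(EC50); assumes EC50 in mol/L."""
    return 10.0 ** (-pec50)


def is_active(activity_prob: float, threshold: float = 0.5) -> bool:
    """Binary call: active if the probability exceeds the threshold."""
    return activity_prob > threshold


if __name__ == "__main__":
    pec50 = 6.234
    print(f"energy: {pec50_to_energy(pec50):.2f} kcal/mol")  # -8.52
    print(f"EC50: {pec50_to_ec50_molar(pec50) * 1e6:.2f} uM")
    print(f"active: {is_active(0.87)}")
```

With the example prediction (pEC50 6.234), the energy evaluates to -8.52 kcal/mol, matching the sample output, and the EC50 comes out a little under 1 μM.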

Batch Prediction

For multiple complexes, use the benchmark command:

pandadock gnn benchmark -m model.pt -d dataset/ -o results/

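Or use the Python API:

from pandadock.gnn.scoring import GNNScoring

scorer = GNNScoring(model_path='model.pt')

# (protein_file, ligand_file) pairs; placeholder paths, replace with your own
complexes = [
    ('complex1/protein.mol2', 'complex1/ligand.mol2'),
    ('complex2/protein.mol2', 'complex2/ligand.mol2'),
]

for protein, ligand in complexes:
    result = scorer.predict_affinity(protein_file=protein, ligand_file=ligand)
    print(f"pEC50: {result['pec50']:.3f}")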
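
Once predictions are collected, ranking them and writing a table is usually the next step. The sketch below assumes each result is a dict with the keys shown under Output Format; `rank_results` and `to_csv` are hypothetical helpers, and the values in the demo are placeholders rather than real predictions.

```python
import csv
import io
from typing import Dict, List


def rank_results(results: Dict[str, dict]) -> List[dict]:
    """Return one row per complex, sorted by predicted pEC50 (highest first)."""
    rows = [{"name": name, **res} for name, res in results.items()]
    return sorted(rows, key=lambda row: row["pec50"], reverse=True)


def to_csv(rows: List[dict]) -> str:
    """Serialize ranked rows as CSV text."""
    fields = ["name", "pec50", "energy", "activity_prob", "active"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder values for illustration only, not real predictions.
    results = {
        "complex_a": {"pec50": 5.1, "energy": -6.97,
                      "activity_prob": 0.40, "active": False},
        "complex_b": {"pec50": 6.234, "energy": -8.52,
                      "activity_prob": 0.87, "active": True},
    }
    print(to_csv(rank_results(results)))
```

In practice the `results` dict would be filled inside the prediction loop, keyed by whatever identifier is convenient for each complex.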

Python API

Direct access to the GNN scorer:

from pandadock.gnn.scoring import GNNScoring

# Load model
scorer = GNNScoring(model_path='models/best_model.pt')

# Predict from files
result = scorer.predict_affinity(
    protein_file='protein.mol2',
    ligand_file='ligand.mol2',
    site_file='site.mol2'  # optional
)

print(f"pEC50: {result['pec50']:.3f}")
print(f"Energy: {result['energy']:.3f} kcal/mol")

# Predict from graph (advanced)
from pandadock.gnn.data.graph_builder import HeterogeneousGraphBuilder

# protein_mol2, ligand_mol2 and site_mol2 must be defined by the caller
builder = HeterogeneousGraphBuilder()
graph = builder.build_graph(protein_mol2, ligand_mol2, site_mol2)
result = scorer.predict_from_graph(graph)

Performance Notes

GPU inference: ~0.02 seconds per complex
CPU inference: ~0.1 seconds per complex
Memory: ~200 MB GPU memory per batch of 32
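
The per-complex timings translate into a rough run-time budget for a screening campaign. The sketch below uses only the approximate figures quoted above; `estimate_seconds` is an illustrative helper, and real run times will also depend on hardware, file parsing, and I/O.

```python
# Approximate per-complex inference times quoted above, in seconds.
SECONDS_PER_COMPLEX = {"gpu": 0.02, "cpu": 0.1}


def estimate_seconds(n_complexes: int, device: str = "gpu") -> float:
    """Rough inference-time estimate; excludes file parsing and I/O."""
    return n_complexes * SECONDS_PER_COMPLEX[device]


if __name__ == "__main__":
    n = 100_000
    for device in ("gpu", "cpu"):
        minutes = estimate_seconds(n, device) / 60
        print(f"{device}: ~{minutes:.0f} min for {n:,} complexes")
```

For 100,000 complexes this works out to roughly 33 minutes on GPU and a little under 3 hours on CPU.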

Troubleshooting

“Could not parse molecule”: Check the file format. MOL2 is the most reliable.
“Model config mismatch”: Ensure the model was trained with the same PandaDock version you are running.
“CUDA out of memory”: Reduce the batch size or run on CPU (--cpu).