GNN Prediction
==============

This guide covers using the trained GNN model for binding affinity prediction.

Basic Prediction
----------------

Predict binding affinity for a single protein-ligand complex:

.. code-block:: bash

   pandadock gnn predict -m models/best_model.pt \
       -p protein.mol2 -l ligand.mol2

Output:

.. code-block:: text

   Loading model...
   Predicting...

   Prediction Results:
     pEC50: 6.234
     Energy: -8.52 kcal/mol
     Activity probability: 0.87
     Predicted active: True

Supported File Formats
----------------------

**Protein files:**

* MOL2 (recommended)
* PDB
* Any format readable by BioPython

**Ligand files:**

* MOL2 (recommended)
* SDF
* PDB
* Any format readable by RDKit

Prediction Options
------------------

.. code-block:: bash

   # --site is optional and names a binding-site file.
   # --output saves the results to a file.
   pandadock gnn predict \
       --model model.pt \
       --protein protein.mol2 \
       --ligand ligand.mol2 \
       --site site.mol2 \
       --output results.json

Output Format
-------------

When using ``--output``, results are saved as JSON:

.. code-block:: json

   {
     "pec50": 6.234,
     "energy": -8.52,
     "activity_prob": 0.87,
     "active": true
   }

Interpreting Results
--------------------

**pEC50**
   Predicted -log10(EC50) value. Higher values indicate stronger binding.
   Typical range: 4-10 (μM to nM affinity).

**Energy**
   Estimated binding energy in kcal/mol, calculated as
   ``energy = -1.366 * pEC50``.

**Activity Probability**
   Probability that the compound is active (EC50 < threshold). Range: 0-1.

**Active**
   Binary classification: True if ``activity_prob > 0.5``.

Batch Prediction
----------------

For multiple complexes, use the benchmark command:

.. code-block:: bash

   pandadock gnn benchmark -m model.pt -d dataset/ -o results/

Or use the Python API:

.. code-block:: python

   from pandadock.gnn.scoring import GNNScoring

   scorer = GNNScoring(model_path='model.pt')

   # Placeholder (protein, ligand) file pairs; replace with your own paths.
   complexes = [
       ('protein1.mol2', 'ligand1.mol2'),
       ('protein2.mol2', 'ligand2.mol2'),
   ]

   for protein, ligand in complexes:
       result = scorer.predict_affinity(protein, ligand)
       print(f"pEC50: {result['pec50']:.3f}")

Python API
----------

Direct access to the GNN scorer:

.. code-block:: python

   from pandadock.gnn.scoring import GNNScoring

   # Load model
   scorer = GNNScoring(model_path='models/best_model.pt')

   # Predict from files
   result = scorer.predict_affinity(
       protein_file='protein.mol2',
       ligand_file='ligand.mol2',
       site_file='site.mol2'  # optional
   )

   print(f"pEC50: {result['pec50']:.3f}")
   print(f"Energy: {result['energy']:.3f} kcal/mol")

   # Predict from graph (advanced)
   from pandadock.gnn.data.graph_builder import HeterogeneousGraphBuilder

   # protein_mol2, ligand_mol2 and site_mol2 are the inputs for build_graph;
   # they are not defined in this snippet and must be supplied by the caller.
   builder = HeterogeneousGraphBuilder()
   graph = builder.build_graph(protein_mol2, ligand_mol2, site_mol2)
   result = scorer.predict_from_graph(graph)

Performance Notes
-----------------

* **GPU inference**: ~0.02 seconds per complex
* **CPU inference**: ~0.1 seconds per complex
* **Memory**: ~200 MB GPU memory per batch of 32

Troubleshooting
---------------

**"Could not parse molecule"**
   Check the file format. MOL2 is the most reliable.

**"Model config mismatch"**
   Ensure the model was trained with the same PandaDock version.

**"CUDA out of memory"**
   Reduce the batch size or use the CPU (``--cpu``).
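
Worked Example: pEC50 Conversions
---------------------------------

The relationships given under Interpreting Results can be checked with a few
lines of plain Python. This is a minimal sketch, not part of PandaDock: the
helper names ``pec50_to_energy``, ``pec50_to_ec50_molar`` and ``is_active``
are hypothetical, and only the coefficient ``-1.366``, the ``0.5`` cutoff and
the JSON keys come from this guide.

.. code-block:: python

   import json

   # Coefficient from the formula in this guide: energy = -1.366 * pEC50
   ENERGY_PER_PEC50_UNIT = -1.366  # kcal/mol per pEC50 unit


   def pec50_to_energy(pec50: float) -> float:
       """Estimated binding energy in kcal/mol for a given pEC50."""
       return ENERGY_PER_PEC50_UNIT * pec50


   def pec50_to_ec50_molar(pec50: float) -> float:
       """Invert pEC50 = -log10(EC50) to get EC50 in mol/L."""
       return 10.0 ** (-pec50)


   def is_active(activity_prob: float, cutoff: float = 0.5) -> bool:
       """Binary call as described in this guide: active if prob > cutoff."""
       return activity_prob > cutoff


   # Same content as the sample results.json shown under Output Format.
   results = json.loads(
       '{"pec50": 6.234, "energy": -8.52, "activity_prob": 0.87, "active": true}'
   )

   energy = pec50_to_energy(results["pec50"])
   ec50_micromolar = pec50_to_ec50_molar(results["pec50"]) * 1e6

   print(f"Energy: {energy:.2f} kcal/mol")  # prints "Energy: -8.52 kcal/mol"
   print(f"EC50:   {ec50_micromolar:.3f} uM")
   print(f"Active: {is_active(results['activity_prob'])}")  # prints "Active: True"

The recomputed energy agrees with the ``energy`` field in the sample output,
which is a quick consistency check to run on saved results.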