pandadock-ml - ML-Enhanced Docking Command
The pandadock-ml command performs machine learning-enhanced molecular docking with deep learning scoring and pose prediction. It supports graph neural networks (GNNs) and 3D convolutional neural networks (3D CNNs), as well as hybrid and transformer-based models, to improve agreement with experimental binding affinities.
Synopsis
pandadock-ml [OPTIONS]
Description
Performs molecular docking with ML-enhanced scoring:
Deep learning scoring function - Graph Neural Network (GNN) or 3D CNN
Pose ranking refinement - ML-based re-ranking of docked poses
Transfer learning - Pre-trained on the PDBbind dataset
Uncertainty quantification - Confidence estimates for predictions
Ensemble models - Multiple models for robust predictions
Best accuracy: R = 0.91 correlation with experimental binding affinities (hybrid model with ensemble).
Required Options
-r, --receptor PATH - Receptor PDB file (protein structure)
-l, --ligand PATH - Ligand file (SDF, MOL2, or PDB format)
--center X Y Z - Grid box center coordinates (X Y Z in Angstroms)
--box X Y Z - Grid box dimensions (X Y Z in Angstroms)
ML Model Options
--model-type TYPE - ML model architecture. Default: gnn
Options:
gnn - Graph Neural Network (recommended, fastest)
cnn3d - 3D Convolutional Network (higher accuracy, slower)
hybrid - Combined GNN + CNN (best accuracy)
transformer - Transformer-based model (experimental)
--ml-scoring-mode MODE - How to use ML scoring. Default: combined
Options:
combined - Combine physics-based and ML scoring
ml_only - Use only ML scoring
refinement - Use ML for pose re-ranking only
--use-ensemble / --no-ensemble - Use an ensemble of ML models for robust predictions. Default: enabled
The ensemble averages predictions from 5 models trained on different data splits.
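The sketch below shows what ensemble averaging computes for a single pose. It assumes each of the 5 models returns one score. The mean is the reported prediction, and the standard deviation is the ensemble-disagreement uncertainty described under Uncertainty Quantification. The model outputs and the `ensemble_predict` helper are hypothetical. Using the population standard deviation rather than the sample one is also an assumption.

```python
from statistics import mean, pstdev


def ensemble_predict(predictions: list[float]) -> tuple[float, float]:
    """Return (mean, spread) of per-model predictions for one pose.

    Hypothetical helper: the mean is the ensemble prediction, and the
    population standard deviation is used as the disagreement uncertainty.
    """
    if not predictions:
        raise ValueError("need at least one model prediction")
    return mean(predictions), pstdev(predictions)


# Hypothetical scores from 5 models trained on different data splits
model_scores = [-9.6, -10.1, -9.8, -9.5, -10.0]
score, spread = ensemble_predict(model_scores)
print(f"ensemble score = {score:.2f}, uncertainty = {spread:.2f}")
```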
--model-weights PATH - Path to custom model weights (optional)
Use pre-trained weights or your own fine-tuned model.
ML Feature Options
--include-protein-features / --no-protein-features - Include protein pocket features. Default: enabled
Protein features: pocket shape, hydrophobicity, electrostatics
--include-interaction-features / --no-interaction-features - Include protein-ligand interaction features. Default: enabled
Interaction features: H-bonds, π-stacking, hydrophobic contacts
--include-pharmacophore / --no-pharmacophore - Include pharmacophore features. Default: enabled
--grid-resolution FLOAT - Grid resolution for the 3D CNN (Angstroms). Default: 0.5
Only used with --model-type cnn3d.
Docking Algorithm
-a, --algorithm ALGORITHM - Docking algorithm for pose generation. Default: enhanced_hierarchical_cpu
ML scoring can be combined with any docking algorithm.
Scoring Options
-s, --scoring FUNCTION - Physics-based scoring function for initial docking. Default: physics_based
--ml-weight FLOAT - Weight for the ML score in combined mode. Default: 0.6
Final score = (1 - weight) × physics + weight × ML
--physics-weight FLOAT - Weight for the physics score in combined mode. Default: 0.4
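The combined-mode formula can be written as a small function. This sketch follows the documented formula and takes the physics weight as `1 - ml_weight`. It does not model a `--physics-weight` value that differs from `1 - ml_weight`. The function name is hypothetical.

```python
def combined_score(physics: float, ml: float, ml_weight: float = 0.6) -> float:
    """Final score = (1 - ml_weight) * physics + ml_weight * ML.

    Hypothetical helper mirroring the documented combined-mode formula;
    the default 0.6 matches the documented --ml-weight default.
    """
    if not 0.0 <= ml_weight <= 1.0:
        raise ValueError("ml_weight must lie in [0, 1]")
    return (1.0 - ml_weight) * physics + ml_weight * ml


# With the default weights, a pose scored -8.5 (physics) and -9.8 (ML)
print(round(combined_score(physics=-8.5, ml=-9.8), 2))  # -9.28
```

Lower (more negative) scores remain better after combination, because both terms are weighted averages of scores on the same sign convention.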
Uncertainty Quantification
--estimate-uncertainty / --no-estimate-uncertainty - Estimate prediction uncertainty. Default: enabled when the ensemble is used
--uncertainty-threshold FLOAT - Maximum uncertainty for accepting predictions. Default: 1.0
Predictions with uncertainty > threshold are flagged as low confidence.
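The flagging rule can be sketched as follows, assuming each pose carries a single uncertainty value. Anything strictly above the threshold is flagged as low confidence, and everything else is accepted. The pose IDs, uncertainty values, and `split_by_uncertainty` helper are hypothetical.

```python
def split_by_uncertainty(
    uncertainties: dict[str, float], threshold: float = 1.0
) -> tuple[list[str], list[str]]:
    """Partition pose IDs into (accepted, flagged) by an uncertainty cutoff."""
    accepted: list[str] = []
    flagged: list[str] = []
    for pose_id, u in uncertainties.items():
        # Documented rule: uncertainty > threshold means low confidence
        (flagged if u > threshold else accepted).append(pose_id)
    return accepted, flagged


poses = {"pose_1": 0.45, "pose_2": 0.92, "pose_3": 1.30}
print(split_by_uncertainty(poses))                 # default threshold 1.0
print(split_by_uncertainty(poses, threshold=0.8))  # as with --uncertainty-threshold 0.8
```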
--monte-carlo-dropout / --no-monte-carlo-dropout - Use Monte Carlo dropout for uncertainty estimation. Default: disabled
Gives more accurate but slower uncertainty estimates.
Output Options
-o, --output-dir PATH - Output directory. Default: ml_docking_output
-n, --num-poses N - Number of poses to generate. Default: 20
--visualize / --no-visualize - Generate visualization plots. Default: enabled
--save-ml-features - Save extracted ML features for analysis
--save-attention-maps - Save attention maps (for GNN/Transformer models)
Performance Options
--cpuworkers N - Number of CPU workers. Default: auto-detect
--gpu - Enable GPU acceleration for ML inference
Highly recommended: 10-50x speedup for ML models
--gpu-batch-size N - Batch size for GPU ML inference. Default: 32
--fast - Fast mode with reduced sampling
Examples
Basic ML Docking
pandadock-ml -r protein.pdb -l ligand.sdf \
--center 10 20 30 --box 20 20 20 \
-o ml_results/
Uses default GNN model with ensemble scoring.
High-Accuracy ML Docking
pandadock-ml -r protein.pdb -l ligand.sdf \
--center 10 20 30 --box 20 20 20 \
--model-type hybrid \
--use-ensemble \
--algorithm enhanced_hierarchical_cpu \
--num-poses 50 \
-o high_accuracy_ml/
GPU-Accelerated ML Docking
pandadock-ml -r target.pdb -l ligands.sdf \
--center 10 20 30 --box 20 20 20 \
--model-type gnn \
--gpu \
--gpu-batch-size 64 \
-o gpu_ml_docking/
3D CNN Model
pandadock-ml -r protein.pdb -l ligand.sdf \
--center 10 20 30 --box 20 20 20 \
--model-type cnn3d \
--grid-resolution 0.5 \
--gpu \
-o cnn3d_results/
ML-Only Scoring
pandadock-ml -r protein.pdb -l ligand.sdf \
--center 10 20 30 --box 20 20 20 \
--ml-scoring-mode ml_only \
--model-type gnn \
--use-ensemble \
-o ml_only/
ML Pose Refinement
# First: Standard docking
pandadock dock -r protein.pdb -l ligands.sdf \
--num-poses 100 \
-o initial_docking/
# Second: ML re-ranking
pandadock-ml -r protein.pdb -l ligands.sdf \
--center 10 20 30 --box 20 20 20 \
--ml-scoring-mode refinement \
--model-type hybrid \
--use-ensemble \
-o ml_refined/
With Uncertainty Filtering
pandadock-ml -r protein.pdb -l library.sdf \
--center 10 20 30 --box 20 20 20 \
--use-ensemble \
--estimate-uncertainty \
--uncertainty-threshold 0.8 \
-o filtered_predictions/
Predictions with uncertainty > 0.8 are flagged as low confidence; only those at or below 0.8 are accepted.
Custom Model Weights
pandadock-ml -r kinase.pdb -l inhibitors.sdf \
--center 10 20 30 --box 20 20 20 \
--model-type gnn \
--model-weights kinase_finetuned.pt \
-o custom_model/
Target-Specific Fine-Tuned Models
# Kinase-specific model
pandadock-ml -r kinase.pdb -l ligands.sdf \
--model-type gnn \
--model-weights models/kinase_specialist.pt \
--center 10 20 30 --box 20 20 20
# GPCR-specific model
pandadock-ml -r gpcr.pdb -l ligands.sdf \
--model-type gnn \
--model-weights models/gpcr_specialist.pt \
--center 10 20 30 --box 20 20 20
Output Files
Structures:
complex1.pdb, complex2.pdb, ... - Protein-ligand complexes
pose1.pdb, pose2.pdb, ... - Ligand poses only
Analysis:
ml_docking_results.json - Complete results with ML scores
ml_predictions.csv - ML scores, uncertainties, features
uncertainty_analysis.json - Uncertainty quantification results
feature_importance.json - ML feature importance
summary.txt - Human-readable summary
ML-Specific:
attention_maps/ - Attention visualizations (if requested)
ml_features/ - Extracted features (if requested)
Visualizations:
ml_scores.png - ML score distribution
uncertainty_plot.png - Uncertainty vs. score
feature_importance.png - Feature importance visualization
ML Predictions Output
{
"pose_1": {
"ml_score": -9.8,
"physics_score": -8.5,
"combined_score": -9.2,
"uncertainty": 0.45,
"confidence": "high",
"predicted_pKd": 8.5,
"predicted_Ki_nM": 3.2,
"feature_importance": {
"hydrophobic_contacts": 0.35,
"hydrogen_bonds": 0.28,
"shape_complementarity": 0.22,
"electrostatics": 0.15
}
}
}
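The output above can be post-processed with the standard library. This sketch parses the sample and converts `predicted_pKd` to a nanomolar constant via 10^(9 - pKd), since pKd is -log10 of Kd in mol/L. It also checks that the feature importances sum to 1. It is an assumption that `predicted_Ki_nM` is derived this way, although it agrees with the sample to one decimal place. The helper name is hypothetical.

```python
import json

# The sample prediction record shown above (single pose)
SAMPLE = """
{
  "pose_1": {
    "ml_score": -9.8, "physics_score": -8.5, "combined_score": -9.2,
    "uncertainty": 0.45, "confidence": "high",
    "predicted_pKd": 8.5, "predicted_Ki_nM": 3.2,
    "feature_importance": {
      "hydrophobic_contacts": 0.35, "hydrogen_bonds": 0.28,
      "shape_complementarity": 0.22, "electrostatics": 0.15
    }
  }
}
"""


def pk_to_nanomolar(pk: float) -> float:
    """Convert pK (= -log10 of a constant in mol/L) to nanomolar."""
    return 10 ** (9.0 - pk)


results = json.loads(SAMPLE)
for pose_id, pose in results.items():
    k_nm = pk_to_nanomolar(pose["predicted_pKd"])
    importance_total = sum(pose["feature_importance"].values())
    print(f"{pose_id}: {k_nm:.1f} nM, importances sum to {importance_total:.2f}")
```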
Performance Characteristics
Accuracy:
R = 0.91 correlation with experimental data (hybrid model, ensemble)
R = 0.88 (GNN model)
R = 0.89 (3D CNN model)
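The document does not state which correlation coefficient R refers to. Assuming it is the Pearson correlation between predicted and experimental affinities (for example, pKd values), it can be computed as below. The data are hypothetical and only illustrate the calculation.

```python
import math


def pearson_r(predicted: list[float], experimental: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(predicted) != len(experimental) or len(predicted) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(predicted)
    mp = sum(predicted) / n
    me = sum(experimental) / n
    cov = sum((p - mp) * (e - me) for p, e in zip(predicted, experimental))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    se = math.sqrt(sum((e - me) ** 2 for e in experimental))
    if sp == 0 or se == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sp * se)


# Hypothetical predicted vs. measured pKd values for five ligands
predicted_pkd = [6.1, 7.4, 8.0, 5.2, 9.1]
measured_pkd = [6.0, 7.0, 8.3, 5.5, 8.8]
print(f"R = {pearson_r(predicted_pkd, measured_pkd):.3f}")
```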
Speed:
| Model Type | CPU Time | GPU Time |
|---|---|---|
| GNN | 0.1-0.2 s | 0.01 s |
| 3D CNN | 0.5-1.0 s | 0.05 s |
| Hybrid | 0.3-0.5 s | 0.02 s |
| Ensemble (x5) | 0.5-2.0 s | 0.05-0.1 s |
Throughput:
CPU: 30-120 ligands/hour
GPU: 300-600 ligands/hour (10-20x speedup)
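The throughput figures translate directly into wall-clock estimates for a library. The sketch below assumes throughput stays constant over the whole run. The 100,000-compound library size is illustrative and matches the threshold given under Best Practices.

```python
def screening_hours(n_ligands: int, ligands_per_hour: float) -> float:
    """Wall-clock hours to process a library at a constant throughput."""
    if ligands_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return n_ligands / ligands_per_hour


# Documented throughput ranges (ligands/hour)
throughput = {"CPU": (30, 120), "GPU": (300, 600)}
library_size = 100_000

for device, (slow, fast) in throughput.items():
    best = screening_hours(library_size, fast)
    worst = screening_hours(library_size, slow)
    print(f"{device}: {best:,.0f}-{worst:,.0f} h ({best / 24:.1f}-{worst / 24:.1f} days)")
```

Even at the upper GPU figure, 100,000 ligands take about 167 hours (roughly 7 days). This is consistent with ultra-large screening appearing under When Not to Use.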
ML Models Details
Graph Neural Network (GNN)
Architecture:
Node features: Atomic properties (element, hybridization, charge)
Edge features: Bond type, distance, angle
Graph convolutions: 6 layers
Attention mechanism: Multi-head attention
Output: Binding affinity prediction
Advantages:
Fastest ML model
Rotationally/translationally invariant
Captures long-range interactions
Good generalization
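The toy below illustrates what a graph convolution over a protein-ligand graph does, using one untrained message-passing step in plain Python. It is not PandaDock's architecture. Node features are one-hot element labels, and edges connect atoms within an assumed 4 Å cutoff. The only edge feature is a Gaussian of the interatomic distance; the documented bond-type and angle features are omitted. Because the step depends on coordinates only through distances, its output does not change when the complex is rotated or translated. This is the invariance listed under Advantages.

```python
import math

Vec = list[float]


def edge_weight(distance: float, width: float = 1.5) -> float:
    """Assumed distance-based edge feature: a Gaussian that decays with distance."""
    return math.exp(-((distance / width) ** 2))


def message_passing_step(features: list[Vec], coords: list[Vec], cutoff: float = 4.0) -> list[Vec]:
    """One untrained step: own features + distance-weighted neighbour sum, then ReLU."""
    updated = []
    for i, (h_i, x_i) in enumerate(zip(features, coords)):
        message = list(h_i)  # start from the atom's own features
        for j, (h_j, x_j) in enumerate(zip(features, coords)):
            d = math.dist(x_i, x_j)
            if i != j and d <= cutoff:
                w = edge_weight(d)
                message = [m + w * f for m, f in zip(message, h_j)]
        updated.append([max(0.0, m) for m in message])  # ReLU
    return updated


def readout(features: list[Vec]) -> float:
    """Sum-pool node features into one graph-level number (stand-in for an affinity head)."""
    return sum(sum(h) for h in features)


# Toy complex: one-hot [C, N, O] features and 3D coordinates in Angstroms
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
xyz = [[0.0, 0.0, 0.0], [1.4, 0.0, 0.0], [1.4, 1.3, 0.0], [6.0, 0.0, 0.0]]

# Rotate 90 degrees about z, then translate; the readout should not change
moved = [[-y + 5.0, x - 2.0, z + 1.0] for x, y, z in xyz]

h_original = message_passing_step(atoms, xyz)
h_moved = message_passing_step(atoms, moved)
print(readout(h_original), readout(h_moved))
```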
3D Convolutional Network (CNN)
Architecture:
Input: 3D voxel grid (protein + ligand channels)
Convolution layers: 8 layers with batch normalization
Pooling: Max pooling between layers
Fully connected: 3 dense layers
Output: Binding affinity
Advantages:
Captures 3D spatial patterns
Good for shape complementarity
Handles electrostatics well
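Below is a minimal sketch of the voxelization step. It assumes the grid spans the docking box given by --center and --box, with one channel per molecule (protein, ligand) and simple atom counts per voxel. A production model may use richer channels, and the helper names are hypothetical. The sketch also shows why --grid-resolution matters: halving the voxel edge multiplies the voxel count by 8.

```python
import math

Point = tuple[float, float, float]


def grid_shape(box: Point, resolution: float) -> tuple[int, int, int]:
    """Number of voxels along x, y, z for box edge lengths and voxel size (Angstroms)."""
    return tuple(math.ceil(edge / resolution) for edge in box)


def voxelize(atoms: list[tuple[str, Point]], center: Point, box: Point, resolution: float = 0.5):
    """Sparse occupancy grid {channel: {(i, j, k): atom_count}}; atoms outside the box are skipped."""
    shape = grid_shape(box, resolution)
    origin = [c - edge / 2 for c, edge in zip(center, box)]
    grid: dict[str, dict[tuple[int, int, int], int]] = {}
    for channel, xyz in atoms:
        idx = tuple(math.floor((p - o) / resolution) for p, o in zip(xyz, origin))
        if all(0 <= k < n for k, n in zip(idx, shape)):
            cells = grid.setdefault(channel, {})
            cells[idx] = cells.get(idx, 0) + 1
    return shape, grid


center, box = (10.0, 20.0, 30.0), (20.0, 20.0, 20.0)  # as in the command examples
atoms = [("protein", (10.0, 20.0, 30.0)), ("ligand", (10.2, 20.1, 29.9)), ("protein", (40.0, 0.0, 0.0))]
shape, grid = voxelize(atoms, center, box)
print(shape, math.prod(shape), grid)
print(grid_shape(box, 1.0), math.prod(grid_shape(box, 1.0)))
```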
Hybrid Model
Combines GNN + 3D CNN:
GNN branch: Graph-based features
CNN branch: Spatial features
Late fusion: Concatenate features before final layers
Best accuracy but slower
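Late fusion, as described above, amounts to concatenating the two branch embeddings and passing the result to the final dense layers. The sketch below uses a single untrained dense layer. The embedding sizes, weights, and bias are hypothetical placeholders, not values from the hybrid model.

```python
def dense(x: list[float], weights: list[list[float]], bias: list[float]) -> list[float]:
    """Fully connected layer: one output per weight row (no activation)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def late_fusion(gnn_embedding: list[float], cnn_embedding: list[float],
                weights: list[list[float]], bias: list[float]) -> list[float]:
    """Concatenate branch features, then apply the final layer."""
    fused = gnn_embedding + cnn_embedding
    if any(len(row) != len(fused) for row in weights):
        raise ValueError("each weight row must match the fused feature length")
    return dense(fused, weights, bias)


gnn_features = [0.2, -0.1]                  # hypothetical GNN-branch embedding
cnn_features = [0.5, 0.3, 0.0]              # hypothetical CNN-branch embedding
head_weights = [[1.0, 1.0, 1.0, 1.0, 1.0]]  # hypothetical 1-output head
print(late_fusion(gnn_features, cnn_features, head_weights, bias=[0.1]))
```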
Uncertainty Quantification
Methods:
Ensemble Disagreement
Uncertainty = standard deviation across ensemble predictions
Monte Carlo Dropout
Multiple forward passes with dropout enabled
Evidential Deep Learning
Direct uncertainty estimation (experimental)
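Monte Carlo dropout keeps dropout active at inference time and treats the spread of repeated stochastic predictions as the uncertainty. The sketch below applies inverted dropout to the inputs of a toy linear scorer; the weights and features are hypothetical. A real network applies dropout to hidden layers, but the estimate is formed the same way. The mean of the passes is the prediction, and their standard deviation is the uncertainty.

```python
import random
from statistics import mean, pstdev


def dropout_forward(x: list[float], weights: list[float], p: float, rng: random.Random) -> float:
    """One stochastic pass: drop each input with probability p, rescale survivors by 1/(1-p)."""
    keep_scale = 1.0 / (1.0 - p)
    return sum(w * xi * keep_scale for xi, w in zip(x, weights) if rng.random() >= p)


def mc_dropout(x: list[float], weights: list[float], p: float = 0.2,
               passes: int = 100, seed: int = 0) -> tuple[float, float]:
    """Mean and standard deviation over repeated dropout-enabled passes."""
    if not 0.0 <= p < 1.0:
        raise ValueError("dropout probability must be in [0, 1)")
    rng = random.Random(seed)
    samples = [dropout_forward(x, weights, p, rng) for _ in range(passes)]
    return mean(samples), pstdev(samples)


features = [0.8, 0.1, 0.5, 0.9]     # hypothetical pose features
weights = [-4.0, -1.5, -3.0, -2.5]  # hypothetical trained weights
prediction, uncertainty = mc_dropout(features, weights)
print(f"prediction = {prediction:.2f}, uncertainty = {uncertainty:.2f}")
```

Running many passes is what makes this method slower than a single deterministic prediction. This matches the trade-off noted for --monte-carlo-dropout.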
Interpretation:
Low uncertainty (<0.5): High confidence
Medium uncertainty (0.5-1.0): Moderate confidence
High uncertainty (>1.0): Low confidence, novel chemical space
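The interpretation bands map directly onto a small classifier. How the endpoints are handled is an assumption, since the ranges do not say which band 0.5 and 1.0 belong to. Counting 1.0 as moderate is consistent with the --uncertainty-threshold rule, where only values above the threshold are flagged. The label "high" matches the `confidence` field in the sample output; the other label strings are hypothetical.

```python
def confidence_band(uncertainty: float) -> str:
    """Map an uncertainty value to the documented confidence bands."""
    if uncertainty < 0.5:
        return "high"
    if uncertainty <= 1.0:  # assumption: endpoints 0.5 and 1.0 count as moderate
        return "moderate"
    return "low"  # > 1.0: possibly novel chemical space


for u in (0.45, 0.5, 1.0, 1.3):
    print(u, confidence_band(u))
```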
Use uncertainty to:
Filter unreliable predictions
Identify compounds requiring experimental validation
Detect out-of-distribution samples
Best Practices
When to Use ML Docking
Maximum accuracy required - Lead optimization, critical predictions
Novel scaffolds - ML can capture patterns physics-based scoring misses
Large datasets available - Can fine-tune models
GPU available - Makes ML inference fast
When Not to Use
Ultra-large screening (>100k compounds) - Too slow even with GPU
Very novel chemical space - May not generalize well
No GPU available and speed critical - Use faster scoring
Optimization Tips
For maximum accuracy:
--model-type hybrid \
--use-ensemble \
--estimate-uncertainty
For maximum speed:
--model-type gnn \
--no-ensemble \
--gpu \
--gpu-batch-size 128
Balanced:
--model-type gnn \
--use-ensemble \
--gpu
Troubleshooting
Slow ML Inference
Problem: ML scoring very slow
Solutions:
Use GPU: --gpu
Increase batch size: --gpu-batch-size 64
Use a simpler model: --model-type gnn
Disable ensemble: --no-ensemble
High Uncertainty
Problem: Many predictions have high uncertainty
Possible causes:
Novel chemical scaffolds not in training data
Unusual binding modes
Protein family not well-represented in training
Solutions:
Use physics-based or hybrid scoring as fallback
Fine-tune model on your target family
Flag high-uncertainty predictions for manual review
Model Loading Errors
Problem: Cannot load ML model weights
Solutions:
Verify model file exists
Check PyTorch/TensorFlow version compatibility
Re-download default models
Check file permissions
Out of GPU Memory
Problem: GPU out of memory during ML inference
Solutions:
Reduce batch size: --gpu-batch-size 16
Use a smaller model: --model-type gnn (not hybrid)
Disable ensemble: --no-ensemble
Use CPU inference (remove --gpu)
Fine-Tuning ML Models
You can fine-tune models on your own data:
# Train on custom dataset
pandadock-ml-train \
--training-data my_protein_ligand_complexes.csv \
--model-type gnn \
--output-weights custom_model.pt
# Use fine-tuned model
pandadock-ml -r protein.pdb -l ligands.sdf \
--center 10 20 30 --box 20 20 20 \
--model-weights custom_model.pt
Exit Status
Returns 0 on success, non-zero on error.
See Also
pandadock - Main Docking Command - Standard docking
pandadock-flex - Flexible Docking Command - Flexible docking
Hybrid ML Scoring - Hybrid ML scoring
Specialized Docking Modes - Specialized docking modes