# Troubleshooting

Common errors and solutions when using Lighter.
## Configuration Errors
### ModuleNotFoundError: No module named 'project'

**Cause:** Missing `__init__.py` files or an incorrect `project` path.

**Solution:**
```yaml
# In your config.yaml
project: ./my_project  # Ensure this path is correct
```

Ensure every module directory contains an `__init__.py`:

```
my_project/
├── __init__.py          # Required!
├── models/
│   ├── __init__.py      # Required!
│   └── my_model.py
```
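If the files are simply missing, creating empty ones is enough. For example, matching the layout above:

```bash
touch my_project/__init__.py my_project/models/__init__.py
```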
### Config Reference Errors
Wrong: "$@system#model#parameters()" - Using # for attributes
Correct: "$@system#model.parameters()" - Use . for Python attributes
Wrong: Circular references
Correct: Use vars section
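A minimal sketch of the `vars` pattern, with a hypothetical `num_classes` shared by two entries instead of having them reference each other:

```yaml
vars:
  num_classes: 10

system:
  model:
    _target_: project.models.my_model.MyModel  # hypothetical model
    num_classes: "@vars#num_classes"
  metrics:
    train:
      - _target_: torchmetrics.Accuracy
        task: multiclass
        num_classes: "@vars#num_classes"
```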
### YAML Syntax Errors

Common mistakes:

- Missing colons after keys
- Inconsistent indentation (use spaces, not tabs)
- Missing quotes around values with special characters (see the example below)
- Missing values (like the `roi_size` example in the inferers section)
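Quoting matters because Lighter expressions often contain characters that YAML treats specially, such as a `: ` inside a value. A sketch using a hypothetical scheduler entry:

```yaml
system:
  scheduler:
    _target_: torch.optim.lr_scheduler.LambdaLR
    optimizer: "@system#optimizer"
    # Quotes are required here: the ": " inside the lambda would otherwise break YAML parsing
    lr_lambda: "$lambda epoch: 0.95 ** epoch"
```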
## Training Issues
### CUDA Out of Memory

**Solutions:**

```bash
# Reduce batch size
lighter fit config.yaml --system#dataloaders#train#batch_size=8

# Enable gradient accumulation
lighter fit config.yaml --trainer#accumulate_grad_batches=4

# Use mixed precision
lighter fit config.yaml --trainer#precision="16-mixed"
```
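To persist these settings instead of passing overrides on every run, set them in the `trainer` section of the config. A minimal sketch, assuming the usual `pytorch_lightning.Trainer` target:

```yaml
trainer:
  _target_: pytorch_lightning.Trainer
  max_epochs: 100
  precision: "16-mixed"
  accumulate_grad_batches: 4
```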
For distributed strategies, see PyTorch Lightning docs.
### Loss is NaN

**Check:**

1. Learning rate too high → reduce it by 10x
2. Missing data normalization → add normalization transforms
3. Wrong loss function for the task → verify the criterion
4. Gradient explosion → add gradient clipping in the Trainer config (see the sketch below)
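Gradient clipping is a standard `pytorch_lightning.Trainer` argument, so it can be enabled directly in the config. A minimal sketch:

```yaml
trainer:
  _target_: pytorch_lightning.Trainer
  gradient_clip_val: 1.0  # clip gradients to a maximum norm of 1.0
```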
### Slow Training

**Optimize:**

```yaml
system:
  dataloaders:
    train:
      num_workers: 8            # Increase for faster data loading
      pin_memory: true          # For GPU training
      persistent_workers: true  # Reduce worker startup overhead
```
For profiling and optimization, see PyTorch Lightning performance docs.
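As a first step, Lightning's built-in profiler can show where time is being spent; it is a regular `Trainer` argument, so it works with the usual override syntax:

```bash
lighter fit config.yaml --trainer#profiler="simple"
```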
## Debugging Strategies
### Quick Testing
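Before committing to a long run, smoke-test the full pipeline. Lightning's `fast_dev_run` and `limit_*_batches` Trainer options work through the usual override syntax:

```bash
# Run a single batch through train/val to verify the pipeline end to end
lighter fit config.yaml --trainer#fast_dev_run=True

# Or train on a small fraction of the data
lighter fit config.yaml --trainer#limit_train_batches=0.1 --trainer#limit_val_batches=0.1
```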
### Debug Config Values
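Lighter configs follow MONAI Bundle syntax, so one way to check what a `#` path or `@` reference resolves to is to load the file with MONAI's `ConfigParser` (a sketch; the paths are illustrative):

```python
from monai.bundle import ConfigParser

parser = ConfigParser()
parser.read_config("config.yaml")

# Raw, unresolved value at a '#' path
print(parser["system#dataloaders#train#batch_size"])

# Fully resolved content ('@' references and '$' expressions evaluated)
model = parser.get_parsed_content("system#model")
print(type(model))
```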
### Check Adapter Outputs

Temporarily add print transforms in adapters to inspect intermediate values.
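A sketch, assuming your adapters expose transform hooks such as `pred_transforms` (check the adapter API of your Lighter version):

```yaml
system:
  adapters:
    train:
      criterion:
        _target_: lighter.adapters.CriterionAdapter
        pred_transforms:
          # Pass-through that prints the prediction shape, then returns it unchanged
          - "$lambda x: (print('pred shape:', x.shape), x)[1]"
```

Remove the transform once you are done debugging; printing on every step slows training considerably.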
## Getting Help
- Search this documentation
- Check FAQ
- Review PyTorch Lightning docs for Trainer issues
- Join Discord
- Open GitHub issue