Torch Titan Integration

Forgather integrates with Torch Titan, PyTorch's large-scale training framework. The integration comes in two flavors:

Native Titan uses Forgather purely for configuration management. The training loop runs unmodified Torch Titan code; Forgather only supplies the YAML configuration and uses its template inheritance system to derive and override Titan configs. This requires no custom Titan code.
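The idea of deriving one configuration from another can be sketched in plain Python. This is only an illustration of override-by-inheritance using nested dictionaries; it is not Forgather's template syntax, and the `derive` helper and the section and key names are hypothetical.

```python
from copy import deepcopy


def derive(base: dict, overrides: dict) -> dict:
    """Return a copy of `base` with `overrides` merged in recursively."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge them key by key.
            merged[key] = derive(merged[key], value)
        else:
            # Scalar or new key: the override wins.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical base config; section and key names are illustrative only.
base_config = {
    "model": {"name": "llama3", "flavor": "8B"},
    "training": {"steps": 1000, "local_batch_size": 1},
}

# A derived variant states only what differs from the base.
debug_config = derive(base_config, {"training": {"steps": 10}})
print(debug_config["training"])
```

The derived config keeps every base value it does not override, which is what keeps a family of variants small and easy to compare.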

Forgather Titan (tiny_titan) constructs a TrainSpec from the Forgather configuration using dependency injection. This allows custom optimizers, LR schedulers, datasets, and other training assets to be swapped in via configuration without touching Python.
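A rough sketch of configuration-driven dependency injection follows. It uses stand-in objects rather than real PyTorch or Torch Titan classes, and `SketchTrainSpec`, `FACTORIES`, `build`, and the config keys are all hypothetical names chosen for illustration, not Forgather's or Torch Titan's API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class SketchTrainSpec:
    """Hypothetical stand-in for a bundle of training assets."""
    optimizer: Any
    lr_scheduler: Any


# Stand-in factories; real ones would construct optimizer/scheduler objects.
def make_adamw(lr: float, weight_decay: float = 0.01) -> dict:
    return {"kind": "adamw", "lr": lr, "weight_decay": weight_decay}


def make_sgd(lr: float) -> dict:
    return {"kind": "sgd", "lr": lr}


def make_cosine(optimizer: dict, total_steps: int) -> dict:
    return {"kind": "cosine", "optimizer": optimizer, "total_steps": total_steps}


FACTORIES: Dict[str, Callable[..., Any]] = {
    "adamw": make_adamw,
    "sgd": make_sgd,
    "cosine": make_cosine,
}


def build(node: dict, **injected: Any) -> Any:
    """Construct the object named by node["factory"] from the remaining keys."""
    kwargs = {k: v for k, v in node.items() if k != "factory"}
    return FACTORIES[node["factory"]](**kwargs, **injected)


def build_train_spec(config: dict) -> SketchTrainSpec:
    optimizer = build(config["optimizer"])
    # The scheduler depends on the optimizer, so the optimizer is injected.
    lr_scheduler = build(config["lr_scheduler"], optimizer=optimizer)
    return SketchTrainSpec(optimizer=optimizer, lr_scheduler=lr_scheduler)


# Swapping "adamw" for "sgd" here changes the optimizer without code changes.
config = {
    "optimizer": {"factory": "adamw", "lr": 3e-4},
    "lr_scheduler": {"factory": "cosine", "total_steps": 1000},
}
spec = build_train_spec(config)
print(spec.optimizer["kind"], spec.lr_scheduler["kind"])
```

The point of the pattern is that the training code only asks for "an optimizer" and "a scheduler"; which concrete ones it receives is decided entirely by the configuration.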

Basic usage

# List available configurations
forgather ls

# Show details of preprocessed configuration
forgather [-t CONFIG_NAME] pp

# Train (launches via torchrun with the correct settings)
forgather [-t CONFIG_NAME] train

# Start TensorBoard (in another terminal)
# Use "--bind_all" to expose it on all interfaces, not just localhost
forgather tb [-- --bind_all]

Example projects

See examples/torchtitan/ for working configurations:

  • llama3 (Native Titan): reproduces the official Torch Titan Llama3 base configs via Forgather, demonstrating how template inheritance simplifies managing Titan YAML variants.
  • tiny_titan (Forgather Titan): a native Torch Titan trainer that uses dependency injection for training assets; includes an FSDP config for a 117M-parameter Llama3 model.
  • test_parallelisms: compares DDP, tensor parallel, and pipeline parallel strategies against a single-GPU control with matched effective batch size.
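To make "matched effective batch size" concrete, here is a small arithmetic sketch. It assumes the usual definition (per-replica batch size times the number of data-parallel replicas times gradient accumulation steps); the helper name and the numbers are illustrative and are not taken from the test_parallelisms project.

```python
def effective_batch_size(per_replica_batch: int,
                         data_parallel_degree: int,
                         grad_accum_steps: int = 1) -> int:
    """Samples contributing to one optimizer step across all replicas."""
    return per_replica_batch * data_parallel_degree * grad_accum_steps


# Single-GPU control: one replica processes the whole batch.
control = effective_batch_size(per_replica_batch=32, data_parallel_degree=1)

# DDP on 4 GPUs: four replicas, so each one takes a quarter of the batch.
ddp = effective_batch_size(per_replica_batch=8, data_parallel_degree=4)

# Tensor or pipeline parallel on 4 GPUs: the GPUs share ONE model replica,
# so the data-parallel degree stays 1 and the batch size is unchanged.
model_parallel = effective_batch_size(per_replica_batch=32, data_parallel_degree=1)

assert control == ddp == model_parallel
print(control, ddp, model_parallel)
```

Holding this quantity fixed means differences in the loss curves can be attributed to the parallelism strategy rather than to a change in the optimization setup.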