Pipeline Parallel Training¶

Test harness for the Pipeline Parallel trainer, covering all supported PyTorch pipeline schedules, checkpoint save/resume, and activation checkpointing.

For documentation on the PipelineTrainer module, see docs/trainers/pipeline-parallel.md.

Configurations¶

Baseline (no pipeline):

Config	Description
`control.yaml`	Standard trainer, single GPU -- baseline for comparison

Single-stage schedules (stages_per_rank=1):

Config	GPUs	Schedule	Description
`tiny_gpipe_2gpu.yaml`	2	GPipe	Quick debug test with tiny model
`gpipe_2gpu.yaml`	2	GPipe	GPipe on 2 GPUs
`gpipe_4gpu.yaml`	4	GPipe	GPipe on 4 GPUs
`1f1b_4gpu.yaml`	4	1F1B	1-forward-1-backward on 4 GPUs

Multi-stage schedules (stages_per_rank=2):

Config	GPUs	Schedule	Description
`interleaved_1f1b_4gpu.yaml`	4	Interleaved 1F1B	2 stages per rank, interleaved
`looped_bfs_4gpu.yaml`	4	Looped BFS	Breadth-first scheduling
`zero_bubble_4gpu.yaml`	4	Interleaved Zero Bubble	Minimal pipeline bubble
`zbvz_4gpu.yaml`	4	ZBV Zero Bubble	V-pattern stage assignment

Activation checkpointing:

Config	Description
`gpipe_4gpu_activation_checkpoint.yaml`	GPipe with gradient checkpointing enabled

Checkpoint save/resume:

Config	Description
`checkpoint_test_train.yaml`	Train with frequent checkpoints (`save_steps: 50`)
`checkpoint_test_resume.yaml`	Resume from latest checkpoint

Cross-architecture:

Config	Description
`tiny_llama_gpipe_3gpu.yaml`	Llama model (instead of default causal) on 3 GPUs

Usage¶

# Run a 4-GPU pipeline test
forgather -t gpipe_4gpu.yaml train -d 0,1,2,3

# Compare schedules
forgather -t 1f1b_4gpu.yaml train -d 0,1,2,3
forgather -t interleaved_1f1b_4gpu.yaml train -d 0,1,2,3

# Test checkpoint round-trip
forgather -t checkpoint_test_train.yaml train -d 0,1
forgather -t checkpoint_test_resume.yaml train -d 0,1

Pipeline Parallel Training¶

Configurations¶

Usage¶

References¶