Models
A collection of model definitions.
- causal_lm - A vanilla decoder-only transformer, loosely based on "Attention Is All You Need."
- llama - Llama models in various sizes.
- llama_canon - Llama extended with Canon layers (depthwise causal 1D convolutions) from Physics of Language Models: Part 4.1.
- mistral - Mistral with sliding-window attention support.
- qwen3 - Qwen3 architecture from the Qwen3 model family.
- gemma3 - Google Gemma-3 text model with HuggingFace ↔ Forgather round-trip conversion support.
- deepone - A large DeepNet transformer with ALiBi positional encoding.
- singlehead - A minimal ALiBi transformer with a single attention head per layer; primarily a standalone custom-model example.
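Several entries above are distinguished by a single mechanism: depthwise causal 1D convolutions (llama_canon), sliding-window attention (mistral), and ALiBi positional encoding (deepone, singlehead). The pure-Python sketch below illustrates each idea on small lists. It is not Forgather's implementation: the function names, the kernel indexing order, and the choice to count the current token as part of the window are all assumptions made for illustration.

```python
# Illustrative sketches only; names and conventions here are hypothetical,
# not taken from the Forgather code base.


def depthwise_causal_conv1d(x, kernels):
    """Depthwise causal 1D convolution.

    x:       [seq_len][channels] input values.
    kernels: [channels][kernel_size]; kernels[c][d] weights the input at
             position t - d (assumed convention: d = 0 is the current token).
    Each channel is filtered on its own, and positions before the start of
    the sequence are treated as zero, so output t depends only on inputs <= t.
    """
    out = []
    for t in range(len(x)):
        row = []
        for c, kernel in enumerate(kernels):
            acc = 0.0
            for d, weight in enumerate(kernel):
                src = t - d
                if src >= 0:
                    acc += weight * x[src][c]
            row.append(acc)
        out.append(row)
    return out


def sliding_window_mask(seq_len, window):
    """True where query i may attend to key j: causal, and at most
    `window` positions wide including the current token (an assumption)."""
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


def alibi_slopes(num_heads):
    """Per-head slopes 2^(-8h/n) for h = 1..n, the geometric sequence
    described in the ALiBi paper for a power-of-two number of heads."""
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(seq_len, slope):
    """Additive attention bias for one head: -slope * distance for keys at
    or before the query, -inf for future keys (causal masking)."""
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    x = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    # Channel 0 sums the current and previous token; channel 1 is an identity.
    print(depthwise_causal_conv1d(x, [[1.0, 1.0], [1.0, 0.0]]))
    print(sliding_window_mask(4, 2))
    print(alibi_slopes(8))
    print(alibi_bias(3, 0.5))
```

In a real model these would be tensor operations applied per batch and per head, and a Canon layer would typically wrap the convolution with further structure; the sketch shows only the masking and weighting logic that each mechanism adds.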
For the full forgather model command reference — constructing, testing, checkpoint handling, and using models with the HuggingFace and Forgather APIs — see docs/guides/model-cli.md.