Models

A collection of model definitions.

causal_lm - A vanilla decoder-only transformer, loosely based on "Attention Is All You Need".
llama - Llama models in various sizes.
llama_canon - Llama extended with Canon layers (depthwise causal 1D convolutions) from "Physics of Language Models: Part 4.1".
mistral - Mistral with sliding-window attention support.
qwen3 - The architecture used by the Qwen3 model family.
gemma3 - Google Gemma-3 text model with HuggingFace ↔ Forgather round-trip conversion support.
deepone - A large DeepNet transformer with ALiBi positional encoding.
singlehead - A minimal ALiBi transformer with a single attention head per layer; primarily a standalone custom-model example.
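
Two of the attention variants named in the list, ALiBi and sliding-window attention, can be illustrated with a short, framework-free sketch. This is not Forgather's implementation: the function names are hypothetical, the slopes follow the scheme from the ALiBi paper for power-of-two head counts, and the window convention (a query sees itself plus the previous `window - 1` tokens) is an assumption that the bundled models may not share.

```python
def alibi_slopes(num_heads):
    """Per-head ALiBi slopes, 2^(-8h/n) for h = 1..n (assumes n is a power of two)."""
    if num_heads < 1 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(slope, seq_len):
    """Additive attention bias for one head: linear distance penalty, causal."""
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


def sliding_window_mask(seq_len, window):
    """Boolean mask: query i may attend to key j if j <= i and i - j < window."""
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    print(alibi_slopes(8))
    for row in sliding_window_mask(4, 2):
        print(row)
```

With a single head, as in singlehead, this scheme yields one slope of 2^-8; whether that model uses this exact value is not stated here.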

For the full forgather model command reference, covering model construction, testing, checkpoint handling, and use with the HuggingFace and Forgather APIs, see docs/guides/model-cli.md.