Models

A collection of model definitions.

Models

  • causal_lm - A vanilla decoder-only transformer, loosely based on "Attention Is All You Need."
  • llama - Llama models in various sizes.
  • llama_canon - Llama extended with Canon layers (depthwise causal 1D convolutions) from "Physics of Language Models: Part 4.1."
  • mistral - Mistral with sliding-window attention support.
  • qwen3 - Qwen3 architecture from the Qwen3 model family.
  • gemma3 - Google Gemma-3 text model with HuggingFace ↔ Forgather round-trip conversion support.
  • deepone - A large DeepNet transformer with ALiBi positional encoding.
  • singlehead - A minimal ALiBi transformer with a single attention head per layer; primarily a standalone custom-model example.
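
Two of the entries above (deepone and singlehead) use ALiBi positional encoding. As a rough illustration of the idea only, and not Forgather's actual implementation, the sketch below builds the per-head slopes and the additive causal attention bias in plain Python. The function names alibi_slopes and alibi_bias are hypothetical, and the slope formula assumes a power-of-two number of heads, as in the ALiBi paper.

```python
import math


def alibi_slopes(num_heads: int) -> list[float]:
    """Per-head ALiBi slopes (hypothetical helper, power-of-two head counts only)."""
    if num_heads < 1 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    # Head h (1-based) gets slope 2 ** (-8 * h / num_heads).
    return [2.0 ** (-8.0 * h / num_heads) for h in range(1, num_heads + 1)]


def alibi_bias(seq_len: int, slope: float) -> list[list[float]]:
    """Additive attention bias for one head (hypothetical helper).

    Entry [q][k] is slope * (k - q) for keys at or before the query,
    so more distant keys receive a larger penalty. Future keys get
    -inf, which acts as the causal mask once softmax is applied.
    """
    return [
        [slope * (k - q) if k <= q else -math.inf for k in range(seq_len)]
        for q in range(seq_len)
    ]


if __name__ == "__main__":
    slopes = alibi_slopes(8)
    for row in alibi_bias(4, slopes[0]):
        print(row)
```

The bias is added to the raw attention scores before softmax, so no learned or sinusoidal position embeddings are needed. With a single head, as in singlehead, this formula yields one slope of 2 ** -8; whether the actual model uses that value is not stated here.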

For the full forgather model command reference, which covers constructing and testing models, checkpoint handling, and using models with the HuggingFace and Forgather APIs, see docs/guides/model-cli.md.