Finalizing a Trained Model

After pre-training, you typically want to leave the original training output directory untouched (so the training state stays reproducible) and produce a separate, clean directory ready for fine-tuning, sharing, or inference.

The forgather finalize command builds that destination directory in one step:

Copies the model source code (*.py)
Copies the tokenizer (with optional new tokens and a chat template)
Copies config.json (with token IDs synced to the tokenizer)
Synthesizes a generation_config.json (or carries the source's forward, merging in any new stop tokens)
Preserves exactly one checkpoint (by default the latest). optimizer_state.pt is carried over only with --keep-optimizer; scheduler, dataset, RNG, and trainer state are always dropped

Quick start

# Duplicate the latest checkpoint into a clean handoff directory
forgather finalize output_models/wds out/wds_final

# Same, but preserve optimizer state for warm-start fine-tuning
forgather finalize output_models/wds out/wds_final --keep-optimizer

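# Add ChatML tokens, set a chat template, and synthesize a sampling
# generation_config from a preset. No presets ship with this branch, so
# this assumes ~/.config/forgather/generation_config/precise.json exists: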
forgather finalize output_models/wds out/wds_chatml \
    --add-tokens chatml.yaml -t chatml.jinja \
    --generation-config precise

# Pull from a specific (non-latest) checkpoint
forgather finalize output_models/wds out/wds_final \
    -c output_models/wds/checkpoints/checkpoint-385440

CLI reference

forgather finalize SOURCE DEST [options]

The destination must not exist; it is created by the command.

Source selection

Option	Description
`-c, --checkpoint PATH`	Source checkpoint directory. Defaults to the latest under `SOURCE/checkpoints/`; if there is no `checkpoints/` directory the loader falls back to `SOURCE` itself.
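
The default checkpoint lookup described above can be sketched in a few lines of Python. This is a rough illustration, not Forgather's actual code: `resolve_checkpoint` is a hypothetical helper, and it assumes "latest" means the highest step number in a `checkpoint-N` directory name and that an empty `checkpoints/` directory also falls back to `SOURCE`.

```python
import re
from pathlib import Path
from typing import Optional

_STEP = re.compile(r"^checkpoint-(\d+)$")


def resolve_checkpoint(source: str, checkpoint: Optional[str] = None) -> Path:
    """Pick the checkpoint directory to read (hypothetical helper)."""
    if checkpoint is not None:
        return Path(checkpoint)  # an explicit -c always wins

    ckpt_root = Path(source) / "checkpoints"
    if not ckpt_root.is_dir():
        return Path(source)  # no checkpoints/ directory: fall back to SOURCE

    # Assumption: "latest" is the highest step number, compared numerically
    # so that checkpoint-9000 does not sort after checkpoint-385440.
    best_step, best_path = -1, None
    for path in ckpt_root.iterdir():
        match = _STEP.match(path.name)
        if path.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path

    # Assumption: an empty checkpoints/ directory also falls back to SOURCE.
    return best_path if best_path is not None else Path(source)
```

Comparing step numbers as integers rather than as strings matters once step counts differ in length.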

Vocabulary and chat template

Option	Description
`--add-tokens YAML`	YAML file specifying tokens to add. Same format as `forgather convert --add-tokens`.
`--skip-default-tokens`	Don't auto-add a PAD token if missing.
`-t, --chat-template-path FILE`	Jinja2 chat template applied to the tokenizer. If neither the source nor this flag provides one, finalize logs a warning.

Stop tokens (for `generation_config.json`)

By default, when --add-tokens introduces tokens named <|im_end|>, <|eot|>, or <|end_of_turn|>, their IDs are merged into the eos_token_id list in generation_config.json alongside the original EOS, so generation stops on any of them. The original eos_token_id in config.json is not modified; only the generation config gets the merged list.

Option	Description
`--no-auto-stop-tokens`	Disable auto-detection of end-of-turn tokens.
`--stop-tokens "TOK1,TOK2"`	Explicit additional stop-token strings.
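
The merge rule can be illustrated with a small Python sketch. It is not Forgather's implementation: `merged_eos_ids` and its parameters are hypothetical names, and raising on an unknown explicit stop token is an assumption about error handling.

```python
from typing import Dict, Iterable, List

# Token names that trigger auto-detection, as listed above.
AUTO_STOP_TOKENS = ("<|im_end|>", "<|eot|>", "<|end_of_turn|>")


def merged_eos_ids(
    original_eos_id: int,
    vocab: Dict[str, int],                  # token string -> ID after --add-tokens
    added_tokens: Iterable[str],            # tokens introduced by --add-tokens
    extra_stop_tokens: Iterable[str] = (),  # models --stop-tokens "TOK1,TOK2"
    auto_stop: bool = True,                 # False models --no-auto-stop-tokens
) -> List[int]:
    """Build the eos_token_id list for generation_config.json (hypothetical)."""
    names: List[str] = []
    if auto_stop:
        names += [tok for tok in added_tokens if tok in AUTO_STOP_TOKENS]
    names += list(extra_stop_tokens)

    ids = [original_eos_id]  # the original EOS always stays first
    for name in names:
        if name not in vocab:
            # Assumption: an unknown stop token is treated as an error.
            raise ValueError(f"stop token {name!r} is not in the vocabulary")
        if vocab[name] not in ids:
            ids.append(vocab[name])
    return ids
```

With a vocabulary in which `</s>` is 2 and a newly added `<|im_end|>` is 32001, the sketch returns `[2, 32001]`; the value written to `config.json` would remain the single original ID.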

Generation config

Option	Description
`--generation-config carry`	(Default) Copy source `generation_config.json` if present, else synthesize a minimal `{bos,pad,eos}` config.
`--generation-config none`	Skip writing `generation_config.json` entirely.
`--generation-config PATH`	Load directly from a JSON file in the Forgather inference-preset format (keys: `max_tokens`, `temperature`, `top_p`, `top_k`, `repetition_penalty`, `num_beams`, ...).
`--generation-config NAME`	Bare name resolved against `~/.config/forgather/generation_config/NAME.json`. No presets ship with this branch -- populate that directory yourself, or pass an explicit `PATH`.
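
Since no presets ship with this branch, one way to populate the preset directory is a short Python script like the one below. This is a sketch: `write_preset` is a hypothetical helper, and the values in `PRECISE_EXAMPLE` are illustrative placeholders, not recommended settings. Only the directory and key names come from the table above.

```python
import json
from pathlib import Path

# Directory that bare preset names are resolved against (from the table above).
PRESET_DIR = Path.home() / ".config" / "forgather" / "generation_config"

# Illustrative values only; tune them for your model.
PRECISE_EXAMPLE = {
    "max_tokens": 512,
    "temperature": 0.2,
    "top_p": 0.9,
    "top_k": 40,
    "repetition_penalty": 1.1,
}


def write_preset(name: str, preset: dict, preset_dir: Path = PRESET_DIR) -> Path:
    """Write NAME.json into the preset directory and return its path."""
    preset_dir.mkdir(parents=True, exist_ok=True)
    path = preset_dir / f"{name}.json"
    path.write_text(json.dumps(preset, indent=2) + "\n")
    return path
```

Calling `write_preset("precise", PRECISE_EXAMPLE)` would create `precise.json`, after which `--generation-config precise` has something to resolve.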

Forgather presets use max_tokens (matching chat-completion APIs); finalize translates this to HuggingFace's max_new_tokens and infers do_sample when the preset does not set it explicitly. Token IDs (bos, pad, eos) are always overlaid last, from the (possibly updated) tokenizer, so they take precedence over any values in the preset.
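
The translation step might look roughly like the following Python sketch. The function name `preset_to_hf` is hypothetical, and the rule used to infer `do_sample` (sampling is on when any of `temperature`, `top_p`, or `top_k` is present) is an assumption; the actual heuristic is not described here.

```python
from typing import Any, Dict


def preset_to_hf(preset: Dict[str, Any], token_ids: Dict[str, Any]) -> Dict[str, Any]:
    """Translate a Forgather-style preset to HuggingFace generation keys."""
    cfg = dict(preset)  # do not mutate the caller's preset

    # Forgather presets say max_tokens; HuggingFace expects max_new_tokens.
    if "max_tokens" in cfg:
        cfg["max_new_tokens"] = cfg.pop("max_tokens")

    # Assumption: infer sampling from the presence of sampling parameters.
    if "do_sample" not in cfg:
        cfg["do_sample"] = any(k in cfg for k in ("temperature", "top_p", "top_k"))

    # Token IDs from the tokenizer are overlaid last, so they win.
    cfg.update(token_ids)
    return cfg
```

Applying the token IDs after everything else means a stale `eos_token_id` in a preset cannot override the list derived from the updated tokenizer.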

Checkpoint contents

Option	Description
`--keep-optimizer`	Carry `optimizer_state.pt` from the source checkpoint into the dest checkpoint. Scheduler, dataset, RNG, and trainer state are always dropped.
`--root-copy`	Write weights only at the model root and skip creating `DEST/checkpoints/`. Mutually exclusive with `--keep-optimizer`. The default writes weights into `DEST/checkpoints/checkpoint-N/` and creates relative symlinks at the root for HuggingFace `from_pretrained` compatibility.
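
To make the default layout concrete, here is a minimal Python sketch of creating relative symlinks at the model root that point into the preserved checkpoint. `link_weights_at_root` is a hypothetical helper, and the `pytorch_model*` glob is an assumption based on the file names shown in the destination layout.

```python
import os
from pathlib import Path
from typing import List


def link_weights_at_root(dest: Path, checkpoint_dir: Path) -> List[Path]:
    """Create relative symlinks in dest for each weight file in checkpoint_dir."""
    links = []
    for target in sorted(checkpoint_dir.glob("pytorch_model*")):
        link = dest / target.name
        # A relative target keeps the links valid if DEST is moved or renamed.
        relative_target = os.path.relpath(target, start=dest)
        link.symlink_to(relative_target)
        links.append(link)
    return links
```

Because the link targets are relative (for example `checkpoints/checkpoint-N/pytorch_model.bin`), the whole destination directory can be copied or renamed without breaking them.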

Storage

Option	Description
`--safetensors`	Save as safetensors. Default is PyTorch (`.bin`). PyTorch handles tied embeddings natively; safetensors raises on save when weights are tied.
`--dtype {bfloat16,float16,float32}`	Cast weights to this dtype before saving. Default: keep the dtype the source checkpoint was saved in.
`--device STR`	Device for loading the model during finalize (default `cpu`).

Quantization

Option	Description
`--quantize RECIPE`	Quantize the model before saving using the named torchao recipe. Works on any source. If the source was trained with `--qat-recipe`, this completes the QAT round-trip and keeps the QAT training-time accuracy benefit. If the source is plain bf16, this is standard post-training quantization (PTQ). See QAT Training for the recipe list and the QAT-vs-PTQ tradeoff.

Examples:

# QAT round-trip: source was trained with --qat-recipe
forgather finalize output_models/qat_run out/qat_int8_int4 \
    --quantize int8-dynamic-act-int4-weight

# PTQ: plain bf16 source, same flag
forgather finalize output_models/bf16_run out/bf16_int8_int4_ptq \
    --quantize int8-dynamic-act-int4-weight

When --quantize is set, finalize always writes .bin: torchao's quantized tensor subclasses don't expose a single .storage().data_ptr(), which the safetensors writer requires. If --safetensors is passed alongside --quantize, it is ignored and a warning is logged.

Finalize also writes a quantization_config block into config.json that records the recipe. Forgather's native checkpoint loader consumes this hint (with a state_dict scan as fallback) and installs the matching quantized linear modules before weights load, so forgather eval, the inference server, and any other tool using the native loader handle the artifact transparently, with no caller-side flag. The same block also enables HF AutoModelForCausalLM.from_pretrained() auto-detection for non-Forgather consumers. See Evaluating Quantized Models.
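
To confirm that a finalized directory carries the hint, a quick inspection along these lines may help. `read_quantization_hint` is a hypothetical helper; the sketch returns the block as-is because its inner key names are not documented here.

```python
import json
from pathlib import Path
from typing import Any, Optional


def read_quantization_hint(dest: str) -> Optional[Any]:
    """Return the quantization_config block from DEST/config.json, or None."""
    config = json.loads((Path(dest) / "config.json").read_text())
    # The block's internal structure is not assumed; return it unchanged.
    return config.get("quantization_config")
```

A `None` result on a directory produced with `--quantize` would suggest the artifact was not written as expected.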

Misc

Option	Description
`--dry-run`	Resolve and report what would be done; write nothing.
`--log-level LEVEL`	Logging level (default `INFO`).

Destination layout

Default (HuggingFace-compatible with one preserved checkpoint):

DEST/
├── config.json
├── tokenizer.json, tokenizer_config.json, special_tokens_map.json
├── generation_config.json                  # synthesized or carried
├── *.py                                    # model source from SOURCE root
├── .package_files_copied
├── pytorch_model*.bin* + index.json        # SYMLINKS into checkpoint-N
└── checkpoints/
    └── checkpoint-N/
        ├── checkpoint_manifest.json        # rewritten: only kept components
        ├── pytorch_model*.bin* + index.json
        └── optimizer_state.pt              # only if --keep-optimizer

With --root-copy:

DEST/
├── config.json
├── tokenizer*.json
├── generation_config.json
├── *.py
├── .package_files_copied
└── pytorch_model*.bin* + index.json        # at root, no checkpoints/
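
The two layouts can be told apart programmatically. The Python sketch below uses a hypothetical `describe_layout` helper and assumes `.bin` output, that is, root weight files matching `pytorch_model*`; it does not cover `--safetensors` output.

```python
from pathlib import Path


def describe_layout(dest: str) -> str:
    """Classify a finalized directory as 'default', 'root-copy', or other."""
    root = Path(dest)
    weights = sorted(root.glob("pytorch_model*"))
    if not weights:
        return "no-weights"

    has_checkpoints = (root / "checkpoints").is_dir()
    symlinked = [w.is_symlink() for w in weights]

    # Default layout: root entries are symlinks into checkpoints/checkpoint-N/.
    if has_checkpoints and all(symlinked):
        return "default"
    # --root-copy layout: real files at the root and no checkpoints/ directory.
    if not has_checkpoints and not any(symlinked):
        return "root-copy"
    return "unknown"
```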

Token configuration format

The --add-tokens flag accepts a YAML file. Bundled, ready-to-use configs live in add_tokens_config/ -- start there for common cases (e.g. ChatML setup). For the full format reference, init strategies, and authoring guide, see Add-Tokens Config.

Short example:

eos_token: "<|im_end|>"

special_tokens:
  - "<|im_start|>"

pad_token:
  token: "<|pad|>"
  init: "zero"
  if_missing: true