Finalizing a Trained Model
After pre-training, you typically want to leave the original training output directory untouched (so the training state stays reproducible) and produce a separate, clean directory ready for fine-tuning, sharing, or inference.
The `forgather finalize` command builds that destination directory in one step:
- Copies the model source code (`*.py`)
- Copies the tokenizer (with optional new tokens and a chat template)
- Copies `config.json` (with token IDs synced to the tokenizer)
- Synthesizes a `generation_config.json` (or carries the source's file forward, merging in any new stop tokens)
- Preserves exactly one checkpoint (by default the latest), carrying `optimizer_state.pt` optionally and always dropping scheduler, dataset, RNG, and trainer state
Quick start
```bash
# Each command below is independent; DEST must not already exist.

# Duplicate the latest checkpoint into a clean handoff directory
forgather finalize output_models/wds out/wds_final

# Same, but preserve optimizer state for warm-start fine-tuning
forgather finalize output_models/wds out/wds_final --keep-optimizer

# Add ChatML tokens, set a chat template, and synthesize a sampling
# generation_config from a preset (the preset "precise" must already
# exist in ~/.config/forgather/generation_config/):
forgather finalize output_models/wds out/wds_chatml \
  --add-tokens chatml.yaml -t chatml.jinja \
  --generation-config precise

# Pull from a specific (non-latest) checkpoint
forgather finalize output_models/wds out/wds_final \
  -c output_models/wds/checkpoints/checkpoint-385440
```
CLI reference
Usage: `forgather finalize SOURCE DEST [OPTIONS]`. `DEST` must not already exist; the command creates it.
Source selection
| Option | Description |
|---|---|
| `-c, --checkpoint PATH` | Source checkpoint directory. Defaults to the latest checkpoint under `SOURCE/checkpoints/`; if there is no `checkpoints/` directory, the loader falls back to `SOURCE` itself. |
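The default selection rule can be made concrete with a minimal Python sketch. It is not Forgather's loader. `resolve_source_checkpoint` is a hypothetical helper, and it rests on two assumptions: checkpoints are named `checkpoint-<step>`, and "latest" means the highest step number. A real implementation might consult timestamps or a manifest instead.

```python
import re
from pathlib import Path
from typing import Optional

# ASSUMPTION: checkpoints are laid out as checkpoints/checkpoint-<step>/
_CKPT_RE = re.compile(r"checkpoint-(\d+)")


def resolve_source_checkpoint(source: Path, checkpoint: Optional[Path] = None) -> Path:
    """Pick the directory finalize would read weights from (hypothetical sketch)."""
    if checkpoint is not None:
        return checkpoint  # an explicit -c/--checkpoint always wins
    ckpt_root = source / "checkpoints"
    if not ckpt_root.is_dir():
        return source  # no checkpoints/ directory: fall back to SOURCE itself
    steps = []
    for child in ckpt_root.iterdir():
        match = _CKPT_RE.fullmatch(child.name)
        if child.is_dir() and match:
            steps.append((int(match.group(1)), child))
    if not steps:
        # ASSUMPTION: an empty checkpoints/ directory is treated as an error.
        raise FileNotFoundError(f"no checkpoint-<step> directories in {ckpt_root}")
    # Compare step numbers numerically, not as strings.
    return max(steps, key=lambda item: item[0])[1]
```

The numeric comparison matters. A plain string sort would rank `checkpoint-9` above `checkpoint-385440`.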
Vocabulary and chat template
| Option | Description |
|---|---|
| `--add-tokens YAML` | YAML file specifying tokens to add. Same format as `forgather convert --add-tokens`. |
| `--skip-default-tokens` | Don't auto-add a PAD token if one is missing. |
| `-t, --chat-template-path FILE` | Jinja2 chat template applied to the tokenizer. If neither the source nor this flag provides one, `finalize` logs a warning. |
Stop tokens (for `generation_config.json`)
By default, when `--add-tokens` introduces tokens named `<|im_end|>`,
`<|eot|>`, or `<|end_of_turn|>`, their IDs are merged into the
`generation_config.eos_token_id` list alongside the original EOS, so generation
stops on any of them. The original `eos_token_id` in `config.json` is not
modified; only the generation config gets the merged list.
| Option | Description |
|---|---|
| `--no-auto-stop-tokens` | Disable auto-detection of end-of-turn tokens. |
| `--stop-tokens "TOK1,TOK2"` | Explicit additional stop-token strings (comma-separated). |
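The merge rule fits in a few lines of Python. This is a hypothetical sketch, not Forgather's code; `merge_stop_tokens` and its parameters are illustrative names. It assumes `vocab` is the tokenizer's token-to-ID mapping *after* new tokens were added, that `stop_tokens` is the already-split `--stop-tokens` list, and that an unknown stop-token string should be an error. Only the returned list would go into `generation_config.json`; `config.json` keeps its original `eos_token_id`.

```python
from typing import Dict, Iterable, List, Sequence, Union

# End-of-turn names that finalize auto-detects among newly added tokens.
AUTO_STOP_TOKENS = ("<|im_end|>", "<|eot|>", "<|end_of_turn|>")


def merge_stop_tokens(
    original_eos: Union[int, Sequence[int]],
    added_tokens: Iterable[str],
    vocab: Dict[str, int],
    auto_stop: bool = True,           # False mirrors --no-auto-stop-tokens
    stop_tokens: Iterable[str] = (),  # already-split --stop-tokens values
) -> List[int]:
    """Return the eos_token_id list for generation_config.json (sketch)."""
    # Start from the original EOS, which may already be a list.
    ids = [original_eos] if isinstance(original_eos, int) else list(original_eos)
    candidates: List[str] = []
    if auto_stop:
        # Only tokens introduced by --add-tokens are auto-detected.
        candidates.extend(t for t in added_tokens if t in AUTO_STOP_TOKENS)
    candidates.extend(stop_tokens)
    for tok in candidates:
        tok_id = vocab[tok]  # KeyError if the string is not in the vocabulary
        if tok_id not in ids:  # preserve order and avoid duplicates
            ids.append(tok_id)
    return ids


vocab = {"</s>": 2, "<|im_start|>": 32000, "<|im_end|>": 32001}
print(merge_stop_tokens(2, ["<|im_start|>", "<|im_end|>"], vocab))  # [2, 32001]
```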
Generation config
| Option | Description |
|---|---|
| `--generation-config carry` | (Default) Copy the source's `generation_config.json` if present; otherwise synthesize a minimal `{bos, pad, eos}` config. |
| `--generation-config none` | Skip writing `generation_config.json` entirely. |
| `--generation-config PATH` | Load directly from a JSON file in the Forgather inference-preset format (keys: `max_tokens`, `temperature`, `top_p`, `top_k`, `repetition_penalty`, `num_beams`, ...). |
| `--generation-config NAME` | Bare name resolved against `~/.config/forgather/generation_config/NAME.json`. No presets ship with this branch; populate that directory yourself, or pass an explicit `PATH`. |
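No presets ship by default, so you must create `NAME.json` yourself before `--generation-config NAME` can resolve. A minimal Python sketch follows. `write_preset` is a hypothetical helper, and the values in `EXAMPLE_PRESET` are illustrative placeholders, not recommended settings. Only the key names come from the preset format listed in the table.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional, Union

# Directory that --generation-config NAME searches, per the CLI reference.
DEFAULT_PRESET_DIR = Path.home() / ".config" / "forgather" / "generation_config"

# Illustrative values only (an assumption); tune them for your model.
EXAMPLE_PRESET = {
    "max_tokens": 512,
    "temperature": 0.2,
    "top_p": 0.9,
    "repetition_penalty": 1.1,
}


def write_preset(
    name: str,
    preset: Dict[str, Any],
    preset_dir: Optional[Union[str, Path]] = None,
) -> Path:
    """Write `preset` to <preset_dir>/<name>.json and return the file path."""
    target_dir = Path(preset_dir) if preset_dir is not None else DEFAULT_PRESET_DIR
    target_dir.mkdir(parents=True, exist_ok=True)  # create the directory tree if needed
    path = target_dir / f"{name}.json"
    path.write_text(json.dumps(preset, indent=2) + "\n")
    return path
```

Calling `write_preset("precise", EXAMPLE_PRESET)` would create the file that `--generation-config precise` in the quick-start example resolves to.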
Forgather presets use `max_tokens` (matching chat-completion APIs); `finalize`
translates this to HuggingFace's `max_new_tokens` and infers `do_sample`
when the preset does not set it explicitly. Finally, the token IDs (bos, pad,
eos) from the (possibly updated) tokenizer are always overlaid last, so they
take precedence over any values in the preset or the source file.
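The translation can be sketched in Python as follows. This is an illustration under assumptions, not Forgather's implementation; `preset_to_generation_config` is a hypothetical name. The `do_sample` rule is an assumed heuristic: sample when `temperature > 0`, `top_p < 1`, or `top_k > 0`. The documentation only says `do_sample` is inferred when not explicit.

```python
from typing import Any, Dict, List, Union


def preset_to_generation_config(
    preset: Dict[str, Any],
    bos_token_id: int,
    pad_token_id: int,
    eos_token_id: Union[int, List[int]],
) -> Dict[str, Any]:
    """Translate a Forgather-style preset into HF generation_config fields (sketch)."""
    cfg = dict(preset)  # copy so the caller's preset is not mutated
    # Chat-completion style name -> HuggingFace name.
    if "max_tokens" in cfg:
        cfg["max_new_tokens"] = cfg.pop("max_tokens")
    # Infer do_sample only when the preset leaves it unset.
    # ASSUMPTION: this heuristic is illustrative, not Forgather's exact rule.
    if "do_sample" not in cfg:
        temperature = cfg.get("temperature")
        cfg["do_sample"] = (
            (temperature is not None and temperature > 0)
            or cfg.get("top_p", 1.0) < 1.0
            or cfg.get("top_k", 0) > 0
        )
    # Token IDs from the (possibly updated) tokenizer are applied last,
    # so they override anything the preset carried.
    cfg.update(
        bos_token_id=bos_token_id,
        pad_token_id=pad_token_id,
        eos_token_id=eos_token_id,
    )
    return cfg


cfg = preset_to_generation_config({"max_tokens": 256, "temperature": 0.7}, 1, 0, [2, 32001])
print(cfg["max_new_tokens"], cfg["do_sample"])  # 256 True
```

Applying the token IDs last is the point of the design. A preset copied from another model cannot smuggle in stale `eos_token_id` values.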
Checkpoint contents
| Option | Description |
|---|---|
| `--keep-optimizer` | Carry `optimizer_state.pt` from the source checkpoint into the destination checkpoint. Scheduler, dataset, RNG, and trainer state are always dropped. |
| `--root-copy` | Write weights only at the model root and skip creating `DEST/checkpoints/`. Mutually exclusive with `--keep-optimizer`. By default, weights are written into `DEST/checkpoints/checkpoint-N/`, and relative symlinks are created at the root for HuggingFace `from_pretrained` compatibility. |
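The default layout keeps the weights inside the preserved checkpoint and exposes them at the root through relative symlinks. A minimal Python sketch of that linking step follows. `link_weights_at_root` is hypothetical, and its file-name patterns are assumptions: the `.bin` pattern mirrors the Destination layout section, and the safetensors pattern follows common HuggingFace naming.

```python
import fnmatch
import os
from pathlib import Path
from typing import List

# ASSUMPTION: weight shards and their index match one of these patterns.
WEIGHT_PATTERNS = ("pytorch_model*.bin*", "model*.safetensors*")


def link_weights_at_root(dest: Path, ckpt_dir: Path) -> List[Path]:
    """Symlink each weight/index file in ckpt_dir into dest (hypothetical sketch)."""
    links = []
    for src in sorted(ckpt_dir.iterdir()):
        if not any(fnmatch.fnmatch(src.name, pat) for pat in WEIGHT_PATTERNS):
            continue  # manifest, optimizer_state.pt, etc. stay inside the checkpoint
        link = dest / src.name
        # Relative target, e.g. "checkpoints/checkpoint-N/pytorch_model.bin",
        # so the links keep working if DEST is moved or renamed.
        link.symlink_to(os.path.relpath(src, start=dest))
        links.append(link)
    return links
```

Relative targets are the key design choice. Absolute links would break as soon as `DEST` is moved or renamed.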
Storage
| Option | Description |
|---|---|
| `--safetensors` | Save as safetensors. Default is PyTorch (`.bin`). PyTorch handles tied embeddings natively; safetensors raises an error on save when weights are tied. |
| `--dtype {bfloat16,float16,float32}` | Cast weights to this dtype before saving. Default: keep the dtype the source checkpoint was saved in. |
| `--device STR` | Device used to load the model during finalize (default `cpu`). |
Quantization
| Option | Description |
|---|---|
| `--quantize RECIPE` | Quantize the model before saving, using the named torchao recipe. Works on any source: if the source was trained with `--qat-recipe`, this completes the QAT round-trip and keeps the accuracy benefit gained during QAT training; if the source is plain bf16, this is standard post-training quantization (PTQ). See QAT Training for the recipe list and the QAT-vs-PTQ tradeoff. |
Examples:

```bash
# QAT round-trip: source was trained with --qat-recipe
forgather finalize output_models/qat_run out/qat_int8_int4 \
  --quantize int8-dynamic-act-int4-weight

# PTQ: plain bf16 source, same flag
forgather finalize output_models/bf16_run out/bf16_int8_int4_ptq \
  --quantize int8-dynamic-act-int4-weight
```
When `--quantize` is set, finalize always writes `.bin`: torchao's
quantized tensor subclasses don't expose a single `.storage().data_ptr()`,
which the safetensors writer requires. If `--safetensors` is passed
alongside `--quantize`, it is ignored and a warning is logged; no error is raised.
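The storage-flag rules can be summarized as a small validation function. This Python sketch is hypothetical; `resolve_storage_flags` and `StoragePlan` are not Forgather APIs. It encodes two documented rules: `--quantize` overrides `--safetensors` with a warning, and `--root-copy` cannot be combined with `--keep-optimizer`.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("finalize_sketch")


@dataclass(frozen=True)
class StoragePlan:
    safetensors: bool
    root_copy: bool
    keep_optimizer: bool


def resolve_storage_flags(
    safetensors: bool,
    quantize: Optional[str],
    keep_optimizer: bool,
    root_copy: bool,
) -> StoragePlan:
    """Apply the documented flag rules (hypothetical sketch, not Forgather's CLI)."""
    if root_copy and keep_optimizer:
        # --root-copy creates no checkpoints/ directory to hold optimizer_state.pt.
        raise ValueError("--root-copy and --keep-optimizer are mutually exclusive")
    if quantize and safetensors:
        # torchao tensor subclasses can't go through the safetensors writer.
        logger.warning("--safetensors ignored with --quantize; writing .bin")
        safetensors = False
    return StoragePlan(safetensors, root_copy, keep_optimizer)
```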
Finalize also writes a `quantization_config` block, containing the recipe,
into `config.json`. Forgather's native checkpoint loader consumes this hint
(falling back to a state_dict scan) and installs the matching quantized
linear modules before the weights load. As a result, `forgather eval`, the
inference server, and any other tool that uses the native loader handle the
artifact transparently, with no caller-side flag. The same block also enables
auto-detection by HuggingFace's `AutoModelForCausalLM.from_pretrained()` for
non-Forgather consumers. See Evaluating Quantized Models.
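To check whether a finalized directory carries the quantization hint, you can read `config.json` directly. This minimal Python sketch assumes only that the block sits under the top-level `quantization_config` key, as described above. The fields inside the block are not documented on this page, so the hypothetical helper `read_quantization_config` returns the block unparsed.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional, Union


def read_quantization_config(model_dir: Union[str, Path]) -> Optional[Dict[str, Any]]:
    """Return config.json's quantization_config block, or None if absent."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    block = config.get("quantization_config")
    # Field names inside the block are not documented here, so return it as-is.
    return block if isinstance(block, dict) else None
```

For example, `read_quantization_config("out/qat_int8_int4")` should return a dict for the QAT example above and `None` for an unquantized finalize.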
Misc
| Option | Description |
|---|---|
| `--dry-run` | Resolve and report what would be done; write nothing. |
| `--log-level LEVEL` | Logging level (default `INFO`). |
Destination layout
Default (HuggingFace-compatible, with one preserved checkpoint):

```text
DEST/
├── config.json
├── tokenizer.json, tokenizer_config.json, special_tokens_map.json
├── generation_config.json          # synthesized or carried
├── *.py                            # model source from SOURCE root
├── .package_files_copied
├── pytorch_model*.bin* + index.json   # SYMLINKS into checkpoint-N
└── checkpoints/
    └── checkpoint-N/
        ├── checkpoint_manifest.json   # rewritten: only kept components
        ├── pytorch_model*.bin* + index.json
        └── optimizer_state.pt         # only if --keep-optimizer
```
With `--root-copy`:

```text
DEST/
├── config.json
├── tokenizer*.json
├── generation_config.json
├── *.py
├── .package_files_copied
└── pytorch_model*.bin* + index.json   # at root, no checkpoints/
```
Token configuration format
The `--add-tokens` flag accepts a YAML file. Bundled, ready-to-use configs
live in `add_tokens_config/`; start there for common cases (e.g. ChatML
setup). For the full format reference, init strategies, and authoring
guide, see Add-Tokens Config.
Short example:

```yaml
eos_token: "<|im_end|>"
special_tokens:
  - "<|im_start|>"
pad_token:
  token: "<|pad|>"
  init: "zero"
  if_missing: true
```
See also
- Model Conversion: full HuggingFace ↔ Forgather conversion; also supports `--add-tokens` and `-t`.
- EOS Tokens and `generate()` Stopping Criteria: theory of operation, covering how HF's `generate()` resolves stopping across the multiple files that carry EOS information.
- QAT Training: pair `--quantize` here with `--qat-recipe` at training time for the full QAT round-trip, or use `--quantize` alone on a plain bf16 source for post-training quantization.