Updating a Saved Model to Newer Forgather Sources
When the source code or templates for a Forgather model architecture
change -- a parameter is renamed, a layer refactored, a config field
renamed -- existing saved models become incompatible with the new code:
loading the checkpoint with strict=True fails on renamed FQNs, and
strict=False silently leaves the renamed parameters uninitialised.
forgather update migrates a saved Forgather model into a new destination
directory: it regenerates the model code from the current sources and
applies a chain of versioned migrations to the saved config and weights,
so the result is a working model on the new schema with the original
hyperparameters preserved.
This is the in-Forgather counterpart to
forgather convert. Where convert round-trips
through HuggingFace, update stays inside Forgather and does not
require an HF counterpart for the architecture.
Quick start
```bash
# Migrate a saved model to the current schema for its arch
forgather update output_models/my_llama out/my_llama_v2
```
The tool reads the source model's forgather_arch and
forgather_arch_version from config.json, looks up the registered
converter for that arch, walks its migration chain end-to-end, and
writes the result to the destination directory.
When you need this
forgather update solves a different problem from
forgather model construct. model construct rebuilds a
model from a project + config template, which only works when you still
have a config that matches the saved hyperparameters. Real saved models
routinely carry customisations that have drifted from any template
default -- an extended RoPE base for long-context training, a hand-tuned
sliding window, a non-default head dim. Those values must come from the
saved config.json, not from a fresh template.
forgather update:
- Reads hyperparameters from the saved config and translates them through versioned config-translators.
- Applies parameter-FQN renames to the saved state_dict.
- Optionally applies weight-level transforms (head reshapes, permutations) when a step requires them.
- Stamps the new schema version into the destination's config.json.
It does not re-train the model, change vocabulary, or rebuild the
tokenizer. Tokenizer files, generation_config.json, and any chat
template are carried forward verbatim.
CLI reference
Provenance overrides (use when the source model predates schema stamping):
| Option | Description |
|---|---|
| `--arch NAME` | Converter registry key (e.g. `llama`); overrides `forgather_arch` from the source config |
| `--from-version N` | Source schema version; overrides `forgather_arch_version` from the source config |
| `--to-version N` | Stop the chain at version N (default: the converter's current `arch_version`) |
Checkpoint:
| Option | Description |
|---|---|
| `-c, --checkpoint PATH` | Specific checkpoint directory to migrate (default: latest under `SRC/checkpoints/`, falling back to the `SRC` root) |
Model properties:
| Option | Description |
|---|---|
| `--device {cpu,cuda,...}` | Device used during migration (default: `cpu`) |
| `--dtype {bfloat16,float16,float32}` | Override the saved dtype (default: keep the checkpoint dtype) |
| `--safetensors` | Save weights as safetensors (default: PyTorch `.bin`) |
| `--no-strict` | Allow missing/unexpected keys when loading the migrated `state_dict` (default: strict) |
Other:
| Option | Description |
|---|---|
| `--converter-path PATH` | Additional directory to search for converter plugins (repeatable) |
| `--dry-run` | Resolve and print the migration plan; write nothing |
| `--log-level LEVEL` | Logging level (default: `INFO`) |
How updates work
Schema provenance
Forgather model code generation stamps two fields into every newly
saved config.json:
- `forgather_arch` -- string, the converter registry key for the architecture (e.g. `"llama"`). Defaults to `ns.model_short_name` from the model template, which already matches the converter key for archs under `examples/models/`.
- `forgather_arch_version` -- PEP 440 version string (e.g. `"1"`, `"1.2"`, `"2.3.1"`) recording the schema version of the Forgather sources at the time of save. Defaults to `"1"`. Pre-PEP-440 saved configs may carry a bare integer here; the tool coerces those cleanly so older models still load.
forgather update reads these fields to identify the migration
source. Models saved before this stamping was added simply lack the
fields; in that case the tool falls back to --arch llama and
--from-version 1 (the first schema version) and prints a warning
naming both fallbacks. The fallback is right for the overwhelming
majority of pre-stamping saved models, but a mis-identified source
arch or version will produce silently incorrect results — pass
--arch / --from-version explicitly to override the guess when
you know it's wrong.
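The precedence described above is: explicit flag first, then the stamped field, then the `llama` / `1` guess with a warning. The sketch below is a minimal illustration of that order. It assumes `config.json` sits at the root of `SRC`, and `resolve_provenance` is a hypothetical helper, not Forgather's actual code.

```python
import json
import warnings
from pathlib import Path
from typing import Optional, Tuple


def resolve_provenance(src: Path, arch: Optional[str] = None,
                       from_version: Optional[str] = None) -> Tuple[str, str]:
    """Hypothetical: pick (arch, from_version) as CLI > stamped field > fallback."""
    cfg = json.loads((src / "config.json").read_text())
    # Explicit --arch / --from-version values win over stamped metadata.
    if arch is None:
        arch = cfg.get("forgather_arch")
    if from_version is None:
        from_version = cfg.get("forgather_arch_version")
    # Pre-stamping models: fall back and warn, naming each fallback.
    if arch is None:
        warnings.warn("config.json has no forgather_arch; assuming --arch llama")
        arch = "llama"
    if from_version is None:
        warnings.warn("config.json has no forgather_arch_version; "
                      "assuming --from-version 1")
        from_version = "1"
    # Bare integers from pre-PEP-440 configs become version strings.
    return arch, str(from_version)
```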
Only the major component drives migrations: a 1.0 → 1.5 upgrade
walks no migrations, while 1.5 → 2.0 walks one major-step migration.
The bookkeeping follows the semantic-versioning convention, applied to
PEP 440 version strings: within a major, all minor / patch versions are
backwards-compatible by definition, and crossing a major boundary
explicitly declares breakage. If a maintainer needs to break
compatibility, they bump the major; otherwise they bump minor / patch
and forgather update carries old saved models forward without any
migration code.
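The major-only rule, including the bare-integer coercion mentioned earlier, fits in a few lines. This is a sketch under stated assumptions: the helper names `coerce_version`, `major` and `needs_migration` are hypothetical, and only plain numeric release segments (no pre-release or local labels) are handled.

```python
def coerce_version(v) -> str:
    """Normalise a saved forgather_arch_version to a dotted string.

    Pre-PEP-440 configs may hold a bare integer (e.g. 1); newer ones hold
    a string such as "1.2". This sketch accepts only numeric segments.
    """
    s = str(v).strip()
    if not all(part.isdigit() for part in s.split(".")):
        raise ValueError(f"unsupported version: {v!r}")
    return s


def major(v) -> int:
    """The major component is the only one that selects migrations."""
    return int(coerce_version(v).split(".")[0])


def needs_migration(src, dst) -> bool:
    """True when the update crosses at least one major boundary."""
    return major(dst) > major(src)


print(needs_migration("1.0", "1.5"))  # False: same major, no steps
print(needs_migration("1.5", "2.0"))  # True: one 1 -> 2 step
print(needs_migration(1, "1.2"))      # False: bare integer 1 is "1"
```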
Versioned migrations
Each architecture's converter (the same Python class that
forgather convert uses for HF↔FG conversion)
declares two class attributes for the in-Forgather update path:
```python
from forgather.ml.model_conversion import VersionMigration

@register_converter("llama")
class LlamaConverter(HFConverter):
    arch = "llama"
    arch_version = "2.0"  # current schema version (PEP 440)

    forgather_migrations = {
        # 1.x -> 2.0: rename attention.query_linear to attention.q_proj.
        # The key is the *source major*; the entry migrates anything in
        # major 1 (1, 1.2, 1.3.4, ...) up to the next major (2.0).
        1: VersionMigration(
            description="rename attention.query_linear -> attention.q_proj",
            migrate_config=lambda cfg: cfg,  # no config field rename here
            param_subs=(
                (r"attention\.query_linear", r"attention.q_proj", ()),
            ),
        ),
    }
```
Besides a human-readable description (recorded in the audit log), a VersionMigration step has three parts:

| Field | Purpose |
|---|---|
| `migrate_config` | Function that translates the (already partially migrated) config dict into the next-version dict. Field renames, default backfills, and removals all happen here. |
| `param_subs` | Recursive regex substitution list (same format as `forgather convert` and `forgather.ml.remap_params.remap_state_dict`). Rewrites parameter FQNs. |
| `transform_state_dict` | Optional weight-level transform applied after `param_subs`. Receives the renamed `state_dict` and the migrated config, and returns a new `state_dict`. Use for shape changes, head-dim reshapes, permutations. |
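The order in which the three parts apply can be illustrated with a small stand-in. `MigrationStep` and `apply_step` below are hypothetical and are not Forgather's VersionMigration or remapper. In particular, this sketch handles only flat substitutions (an empty third tuple element, as in every example on this page) and does not reproduce the recursive semantics of the real substitution list.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MigrationStep:
    """Hypothetical stand-in for VersionMigration, for illustration only."""
    description: str
    migrate_config: Callable[[dict], dict] = lambda cfg: cfg
    param_subs: tuple = ()
    transform_state_dict: Optional[Callable[[dict, dict], dict]] = None


def apply_step(step: MigrationStep, config: dict, state_dict: dict):
    # 1. Translate the config to the next major's schema (on a copy).
    new_config = step.migrate_config(dict(config))
    # 2. Rename parameter FQNs with the regex substitutions, in order.
    renamed = {}
    for name, value in state_dict.items():
        for pattern, repl, nested in step.param_subs:
            if nested:
                raise NotImplementedError("nested substitutions are not sketched")
            name = re.sub(pattern, repl, name)
        renamed[name] = value
    # 3. Optional weight-level transform: sees renamed weights + new config.
    if step.transform_state_dict is not None:
        renamed = step.transform_state_dict(renamed, new_config)
    return new_config, renamed


step = MigrationStep(
    description="rename attention.query_linear -> attention.q_proj",
    param_subs=((r"attention\.query_linear", r"attention.q_proj", ()),),
)
# Strings stand in for tensors; the renaming logic does not care.
cfg, sd = apply_step(step, {"hidden_size": 8},
                     {"layers.0.attention.query_linear.weight": "W"})
print(sorted(sd))  # ['layers.0.attention.q_proj.weight']
```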
Migrations chain end-to-end
forgather update is designed for arbitrary major-version gaps.
Maintainers register one entry per major bump (1 -> 2, 2 -> 3,
...). Minor / patch bumps don't need entries; they're compatible by
definition. At runtime the tool composes the chain by walking the
majors from the source's major up to the target's, so a 1.3 -> 3.0
update runs the 1 -> 2 entry and then the 2 -> 3 entry.
Each step's output becomes the next step's input. Within any major,
the source's exact minor / patch is preserved (carried verbatim
through the chain) until a step explicitly rewrites it; on the
final hop, the destination is stamped with whatever --to-version
the user requested.
If any major step in the range is missing from
forgather_migrations, the tool fails before touching weights with
a clear diagnostic naming the missing major boundaries. Pass
--to-version to stop the chain at a major the converter can
reach.
Backwards updates (downgrades) are not supported.
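The planning rules above (walk the majors, fail before loading weights if any boundary is missing, reject downgrades) can be sketched as follows. `plan_chain` is a hypothetical helper, and the dictionary values are placeholders for real VersionMigration entries.

```python
def _release(v) -> tuple:
    """Numeric release tuple, padded so "1" and "1.0.0" compare equal."""
    parts = [int(p) for p in str(v).split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def plan_chain(from_version, to_version, migrations: dict) -> list:
    """Return the source majors whose migrations run, in order.

    `migrations` mirrors a converter's forgather_migrations: each key is a
    source major M whose entry migrates M.x to (M+1).0.
    """
    if _release(to_version) < _release(from_version):
        raise ValueError(f"downgrade {from_version} -> {to_version} is not supported")
    src_major, dst_major = _release(from_version)[0], _release(to_version)[0]
    steps = list(range(src_major, dst_major))  # 1 -> 3 needs [1, 2]
    missing = [m for m in steps if m not in migrations]
    if missing:
        # Fail before any weights are loaded, naming every gap at once.
        gaps = ", ".join(f"{m}.x->{m + 1}.0" for m in missing)
        raise ValueError(f"no migration registered for: {gaps}")
    return steps


migrations = {1: "rename q_proj", 2: "head_dim split"}  # placeholder entries
print(plan_chain("1.3", "3.0", migrations))  # [1, 2]
print(plan_chain("1.0", "1.5", migrations))  # []
```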
What gets carried forward
- Model weights (after rename + optional transform).
- Hyperparameters from the saved `config.json` (translated through `migrate_config`).
- Tokenizer files (`tokenizer.json`, `tokenizer_config.json`, `special_tokens_map.json`, ...) -- saved by the project materialisation against the source directory.
- `generation_config.json` and any `chat_template.jinja` / `chat_template.json` -- copied verbatim if the regenerated tree doesn't already produce them.
- The Forgather-specific config fields `hf_model_type` and `dtype`.
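"Copied verbatim if the regenerated tree doesn't already produce them" amounts to a copy-if-absent loop. The sketch below uses a hypothetical `carry_forward` helper and covers only the generation-config and chat-template files named above, not the tokenizer files, which come from project materialisation.

```python
import shutil
from pathlib import Path

# Files carried forward only when the regenerated tree lacks them.
CARRY_IF_ABSENT = ("generation_config.json", "chat_template.jinja", "chat_template.json")


def carry_forward(src: Path, dst: Path) -> list:
    """Hypothetical: copy each listed file from src unless dst already has it."""
    copied = []
    for name in CARRY_IF_ABSENT:
        s, d = src / name, dst / name
        if s.is_file() and not d.exists():
            shutil.copy2(s, d)  # byte-for-byte copy, file metadata preserved
            copied.append(name)
    return copied
```

A file the regenerated tree already produced is left untouched, so the regenerated version wins.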
What gets regenerated
- All Python source files (`*.py`) from the current Forgather templates and modelsrc.
- `auto_map`, `architectures`, and other code-generator-derived fields in `config.json`.
- `forgather_arch_version` -- bumped to the target version.
Audit log
Every successful update writes DST/forgather_update.json:
```json
{
  "schema": "forgather_update.v1",
  "timestamp": "2026-05-01T12:34:56+00:00",
  "source": "/path/to/output_models/my_llama",
  "arch": "llama",
  "from_version": "1.3",
  "to_version": "3.0",
  "migrations": [
    "1.x->2.0: rename attention.query_linear -> attention.q_proj",
    "2.x->3.0: head_dim split"
  ],
  "dtype": "bfloat16",
  "missing_keys": [],
  "unexpected_keys": []
}
```
Use this to verify which migration steps actually ran when debugging a strict-load failure.
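When scripting checks over many updated models, the log can be condensed programmatically. The sketch reads only fields shown in the example log above. `summarize_update` is a hypothetical helper, and it treats an absent key list the same as an empty one.

```python
import json
from pathlib import Path


def summarize_update(dst: Path) -> dict:
    """Hypothetical: condense DST/forgather_update.json for debugging."""
    log = json.loads((dst / "forgather_update.json").read_text())
    return {
        "span": f'{log["from_version"]} -> {log["to_version"]}',
        "steps": list(log.get("migrations", [])),
        # A clean strict load reports no missing and no unexpected keys.
        "clean_load": not log.get("missing_keys") and not log.get("unexpected_keys"),
    }
```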
Examples
- Stamped provenance, no overrides: `forgather update SRC DST` is enough.
- Older model without stamped metadata: the tool guesses `--arch llama` and `--from-version 1` and warns; pass either or both explicitly to override the guess.
- Compatible minor / patch upgrade: no migration code is touched. The chain is empty, and the destination is just re-stamped at the target version.
- Stop at an intermediate version with `--to-version N` (useful when you want to verify a specific migration step in isolation).
- Dry run with `--dry-run`: resolve the plan and print it; nothing is written.
- Migrate a specific (non-latest) checkpoint with `-c` / `--checkpoint PATH`.
- Permissive load with `--no-strict` (e.g. when developing a new migration and the chain is incomplete).
Authoring a migration
The version policy maps cleanly onto code changes:
- Non-breaking change: a change after which the existing saved `config.json` and `state_dict` still round-trip through the new code (e.g. a default value change, an additive field, an internal refactor that doesn't rename FQNs). Bump only the minor or patch component of `arch_version`. No migration entry is needed.
- Breaking change: anything that renames parameter FQNs, renames or removes config fields, or reshapes weights. Bump the major component and register a `forgather_migrations` entry keyed by the previous major.
Steps for a breaking change:
- Open the converter, e.g. `examples/models/llama/src/converter.py`.
- Bump `arch_version` to the new major (e.g. `"1.3"` -> `"2.0"`).
- Add an entry to `forgather_migrations` keyed by the previous major, describing the M.x -> (M+1).0 step.
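Because a missing entry only surfaces when someone updates an old model, a maintainer may want a unit test that checks every major below the current one is covered. This is a minimal sketch with a hypothetical `missing_migration_majors` helper. It assumes schemas start at major 1, the first schema version, and it deliberately flags gaps that a converter might leave on purpose if old majors are no longer supported.

```python
def missing_migration_majors(arch_version, forgather_migrations: dict) -> list:
    """Source majors below arch_version's major that have no entry.

    Assumes the first schema version is major 1.
    """
    current = int(str(arch_version).split(".")[0])
    return [m for m in range(1, current) if m not in forgather_migrations]


# e.g. arch_version bumped to "3.0" but only the 1 -> 2 step registered:
print(missing_migration_majors("3.0", {1: "..."}))  # [2]
```

In a repository test one would pass the converter class's own `arch_version` and `forgather_migrations` and expect an empty list.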
Example: renaming attention.query_linear -> attention.q_proj
between major 1 and major 2:
```python
from forgather.ml.model_conversion import VersionMigration

@register_converter("llama")
class LlamaConverter(HFConverter):
    arch = "llama"
    arch_version = "2.0"

    forgather_migrations = {
        1: VersionMigration(
            description="rename attention.query_linear -> attention.q_proj",
            migrate_config=lambda cfg: cfg,
            param_subs=(
                (r"attention\.query_linear", r"attention.q_proj", ()),
            ),
        ),
    }
```
Example with a config field rename (mlp_dim -> intermediate_size)
between major 2 and major 3 — arch_version becomes "3.0":
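```python
def _major2_to_major3_config(cfg):
    # Work on a copy so the caller's dict is not mutated, then move the
    # value across to the new key name.
    new = dict(cfg)
    if "mlp_dim" in new:
        new["intermediate_size"] = new.pop("mlp_dim")
    return new

# In the converter class body, alongside arch_version = "3.0":
forgather_migrations = {
    1: ...,  # the major 1 -> 2 entry shown above
    2: VersionMigration(
        description="rename mlp_dim -> intermediate_size",
        migrate_config=_major2_to_major3_config,
    ),
}
```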
Verify your migration with a real saved model before shipping. The
golden test is bit-equivalent logits between source and destination on
a fixed input -- forgather update runs strict load, so a missing
rename surfaces immediately.
See also
- Model Conversion -- bidirectional HF ↔ FG conversion, including the parameter-remapping engine and converter plugin pattern that `forgather update` reuses.
- Model Architecture -- how Forgather generates self-contained model code; the `forgather_arch` / `forgather_arch_version` provenance fields are stamped during that code generation.
- Finalize Model -- build a clean handoff directory after pre-training. `finalize` and `update` are complementary: `finalize` produces a polished release artifact from a training run, and `update` brings an older release artifact forward to a new schema.
- Model CLI -- `forgather model construct` and related commands. Note: `model construct` rebuilds from a project template, which is not the same as updating a saved model; use `forgather update` when the saved hyperparameters must be preserved.