
Updating a Saved Model to Newer Forgather Sources

When the source code or templates for a Forgather model architecture change -- a parameter is renamed, a layer refactored, a config field renamed -- existing saved models become incompatible with the new code: loading the checkpoint with strict=True fails on renamed FQNs, and strict=False silently leaves the renamed parameters uninitialised.
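The failure mode can be illustrated without any framework. As a rough sketch, a strict load amounts to comparing the checkpoint's key set against the model's; a non-strict load performs the same comparison but proceeds anyway. The parameter names and the `diff_keys` helper below are hypothetical and are not part of Forgather.

```python
def diff_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected), the two lists a strict load complains about."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)     # model params with no saved value
    unexpected = sorted(checkpoint_keys - model_keys)  # saved values with no model param
    return missing, unexpected


# Hypothetical FQNs: the new code renamed query_linear -> q_proj.
new_model = ["layers.0.attention.q_proj.weight", "layers.0.mlp.weight"]
old_checkpoint = ["layers.0.attention.query_linear.weight", "layers.0.mlp.weight"]

missing, unexpected = diff_keys(new_model, old_checkpoint)
print("missing:", missing)        # would stay at its initial value under a non-strict load
print("unexpected:", unexpected)  # saved weight that no parameter would receive
```

A renamed parameter shows up on both sides at once: the new name is missing and the old name is unexpected.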

forgather update migrates a saved Forgather model in place: it regenerates the model code from the current sources and applies a chain of versioned migrations to the saved config and weights, so the result is a working model on the new schema with the original hyperparameters preserved.

This is the in-Forgather counterpart to forgather convert. Where convert round-trips through Hugging Face (HF), update stays inside Forgather and does not require an HF counterpart for the architecture.

Quick start

# Migrate a saved model to the current schema for its arch
forgather update output_models/my_llama out/my_llama_v2

The tool reads the source model's forgather_arch and forgather_arch_version from config.json, looks up the registered converter for that arch, walks its migration chain end-to-end, and writes the result to the destination directory.

When you need this

forgather update solves a different problem from forgather model construct. model construct rebuilds a model from a project + config template, which only works when you still have a config that matches the saved hyperparameters. Real saved models routinely carry customisations that have drifted from any template default -- an extended RoPE base for long-context training, a hand-tuned sliding window, a non-default head dim. Those values must come from the saved config.json, not from a fresh template.

forgather update:

  • Reads hyperparameters from the saved config and translates them through versioned config-translators.
  • Applies parameter-FQN renames to the saved state_dict.
  • Optionally applies weight-level transforms (head reshapes, permutations) when a step requires them.
  • Stamps the new schema version into the destination's config.json.

It does not re-train the model, change vocabulary, or rebuild the tokenizer. Tokenizer files, generation_config.json, and any chat template are carried forward verbatim.

CLI reference

forgather update [OPTIONS] SRC_MODEL_PATH DST_MODEL_PATH

Provenance overrides (use when the source model predates schema stamping):

Option            Description
--arch NAME       Converter registry key (e.g. llama); overrides forgather_arch from source config
--from-version N  Source schema version; overrides forgather_arch_version from source config
--to-version N    Stop the chain at version N (default: the converter's current arch_version)

Checkpoint:

Option                 Description
-c, --checkpoint PATH  Specific checkpoint directory to migrate (default: latest under SRC/checkpoints/, falling back to SRC root)

Model properties:

Option                              Description
--device {cpu,cuda,...}             Device used during migration (default: cpu)
--dtype {bfloat16,float16,float32}  Override saved dtype (default: keep checkpoint dtype)
--safetensors                       Save weights as safetensors (default: PyTorch .bin)
--no-strict                         Allow missing/unexpected keys when loading the migrated state_dict (default: strict)

Other:

Option                 Description
--converter-path PATH  Additional directory to search for converter plugins (repeatable)
--dry-run              Resolve and print the migration plan; write nothing
--log-level LEVEL      Logging level (default: INFO)

How updates work

Schema provenance

Forgather model code generation stamps two fields into every newly saved config.json:

  • forgather_arch -- string, the converter registry key for the architecture (e.g. "llama"). Defaults to ns.model_short_name from the model template, which already matches the converter key for archs under examples/models/.
  • forgather_arch_version -- PEP 440 version string (e.g. "1", "1.2", "2.3.1") recording the schema version of the Forgather sources at the time of save. Defaults to "1". Pre-PEP-440 saved configs may carry a bare integer here; the tool coerces those cleanly so older models still load.

forgather update reads these fields to identify the migration source. Models saved before this stamping was added simply lack the fields; in that case the tool falls back to --arch llama and --from-version 1 (the first schema version) and prints a warning naming both fallbacks. The fallback is right for the overwhelming majority of pre-stamping saved models, but a mis-identified source arch or version will produce silently incorrect results — pass --arch / --from-version explicitly to override the guess when you know it's wrong.
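The resolution order described above (explicit override, then the stamped config field, then the fallback with a warning) can be sketched as follows. This is an illustration only: `resolve_provenance` and the two constants are hypothetical names, not the tool's actual implementation.

```python
import warnings

FALLBACK_ARCH = "llama"   # fallback arch named in the text above
FALLBACK_VERSION = "1"    # the first schema version


def resolve_provenance(config, arch=None, from_version=None):
    """Pick the source arch/version: CLI override, then config.json, then fallback."""
    resolved = {}
    for field, override, fallback in (
        ("forgather_arch", arch, FALLBACK_ARCH),
        ("forgather_arch_version", from_version, FALLBACK_VERSION),
    ):
        if override is not None:
            resolved[field] = str(override)
        elif field in config:
            resolved[field] = str(config[field])  # also coerces a bare integer
        else:
            warnings.warn(f"{field} missing from config; assuming {fallback!r}")
            resolved[field] = fallback
    return resolved["forgather_arch"], resolved["forgather_arch_version"]


stamped = resolve_provenance({"forgather_arch": "llama", "forgather_arch_version": 2})
print(stamped)
```

Calling `resolve_provenance({})` on an unstamped config emits one warning per missing field, mirroring the tool's behaviour of naming both fallbacks.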

Only the major component drives migrations: a 1.0 → 1.5 upgrade walks no migrations, while 1.5 → 2.0 walks one major-step migration. The convention is semantic-versioning-style compatibility applied to PEP 440 version strings (PEP 440 itself defines only the format and ordering of versions, not their compatibility): within a major, all minor / patch versions are backwards-compatible by definition, and crossing a major boundary explicitly declares breakage. A maintainer who needs to break compatibility bumps the major; otherwise they bump minor / patch, and forgather update carries old saved models forward without any migration code.
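As a minimal sketch of this rule, the set of migration steps depends only on the two major numbers. The helpers below are hypothetical and handle only plain release strings such as "1", "1.2" or "2.3.1"; full PEP 440 strings (epochs, pre-releases) would need a real parser such as `packaging.version.Version`.

```python
def major_of(version):
    """Major component of a simple release string (or bare integer)."""
    return int(str(version).split(".")[0])


def majors_to_walk(from_version, to_version):
    """Source majors whose migration entry must run, in order."""
    src, dst = major_of(from_version), major_of(to_version)
    if dst < src:
        raise ValueError("downgrades are not supported")
    return list(range(src, dst))


print(majors_to_walk("1.0", "1.5"))  # [] : compatible upgrade, nothing to run
print(majors_to_walk("1.5", "2.0"))  # [1] : one major-step migration
print(majors_to_walk(1, "3.0"))      # [1, 2] : bare integer source is coerced
```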

Versioned migrations

Each architecture's converter (the same Python class that forgather convert uses for HF↔FG conversion) declares two class attributes for the in-Forgather update path:

@register_converter("llama")
class LlamaConverter(HFConverter):
    arch = "llama"
    arch_version = "2.0"  # current schema version (PEP 440)
    forgather_migrations = {
        # 1.x -> 2.0: rename attention.query_linear to attention.q_proj.
        # The key is the *source major*; the entry migrates anything in
        # major 1 (1, 1.2, 1.3.4, ...) up to the next major (2.0).
        1: VersionMigration(
            description="rename attention.query_linear -> attention.q_proj",
            migrate_config=lambda cfg: cfg,  # no config field rename here
            param_subs=(
                (r"attention\.query_linear", r"attention.q_proj", ()),
            ),
        ),
    }

Besides its description string, a VersionMigration step has three functional parts:

Field                 Purpose
migrate_config        Function that translates the (already partially migrated) config dict into the next-version dict. Field renames, default backfills, and removals all happen here.
param_subs            Recursive regex substitution list (same format as forgather convert and forgather.ml.remap_params.remap_state_dict). Rewrites parameter FQNs.
transform_state_dict  Optional weight-level transform applied after param_subs. Receives the renamed state_dict + migrated config, returns a new state_dict. Use for shape changes, head-dim reshapes, permutations.
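To make the effect of param_subs concrete, here is a simplified stand-in for the renaming pass. It is not Forgather's remap_state_dict: the `rename_keys` helper is hypothetical, it ignores the third (nested) element of each entry, and it uses placeholder strings instead of tensors.

```python
import re


def rename_keys(state_dict, param_subs):
    """Apply (pattern, replacement, children) regex renames to every key.

    Simplification: `children` is ignored; only the flat case from the
    example above (an empty tuple) is modelled.
    """
    renamed = {}
    for name, value in state_dict.items():
        for pattern, replacement, _children in param_subs:
            name = re.sub(pattern, replacement, name)
        if name in renamed:
            raise KeyError(f"rename collision on {name!r}")
        renamed[name] = value
    return renamed


param_subs = ((r"attention\.query_linear", r"attention.q_proj", ()),)
old = {
    "layers.0.attention.query_linear.weight": "W_q",  # placeholder for a tensor
    "layers.0.norm.weight": "g",
}
new = rename_keys(old, param_subs)
print(new)
```

Keys that match no pattern pass through unchanged, which is why a forgotten rename only surfaces later, at load time.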

Migrations chain end-to-end

forgather update is designed for arbitrary major-version gaps. Maintainers register one entry per major bump (1 -> 2, 2 -> 3, ...). Minor / patch bumps don't need entries — they're compatible by definition. The tool composes the chain at runtime by walking majors:

1.x -> 2.0 -> 3.0 -> ... -> N.0

Each step's output becomes the next step's input. The source's exact minor / patch is carried verbatim through the chain until a step explicitly rewrites it; on the final hop, the destination is stamped with the target version: the value of --to-version if given, otherwise the converter's current arch_version.

If any major step in the range is missing from forgather_migrations, the tool fails before touching weights with a clear diagnostic naming the missing major boundaries. Pass --to-version to stop the chain at a major the converter can reach.
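The planning step can be sketched as a lookup over the registry followed by a fold over the config. Everything below is illustrative: the registry holds plain config functions rather than VersionMigration objects, and `plan_chain` / `run_config_chain` are hypothetical names.

```python
def plan_chain(migrations, from_major, to_major):
    """Return the ordered steps from from_major to to_major, or fail listing gaps."""
    needed = range(from_major, to_major)
    missing = [m for m in needed if m not in migrations]
    if missing:
        gaps = ", ".join(f"{m}.x -> {m + 1}.0" for m in missing)
        raise LookupError(f"no migration registered for: {gaps}")
    return [migrations[m] for m in needed]


def run_config_chain(config, steps):
    """Feed each step's output config into the next step."""
    for step in steps:
        config = step(config)
    return config


def rename_mlp_dim(cfg):
    new = dict(cfg)
    new["intermediate_size"] = new.pop("mlp_dim")
    return new


# Hypothetical registry keyed by source major.
migrations = {1: lambda cfg: dict(cfg), 2: rename_mlp_dim}

result = run_config_chain({"mlp_dim": 1024}, plan_chain(migrations, 1, 3))
print(result)

try:
    plan_chain(migrations, 1, 5)
except LookupError as err:
    message = str(err)
    print(message)
```

Validating the whole range before running any step is what lets the failure happen before weights are touched.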

Backwards updates (downgrades) are not supported.

What gets carried forward

  • Model weights (after rename + optional transform).
  • Hyperparameters from the saved config.json (translated through migrate_config).
  • Tokenizer files (tokenizer.json, tokenizer_config.json, special_tokens_map.json, ...) -- saved by the project materialisation against the source directory.
  • generation_config.json and any chat_template.jinja / chat_template.json -- copied verbatim if the regenerated tree doesn't already produce them.
  • The Forgather-specific config fields hf_model_type and dtype.
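The "copied verbatim if the regenerated tree doesn't already produce them" rule can be sketched as a copy-if-absent pass. This is an assumption about the behaviour, not the tool's code; `carry_forward` and `CARRIED_FILES` are hypothetical names.

```python
import shutil
import tempfile
from pathlib import Path

CARRIED_FILES = ("generation_config.json", "chat_template.jinja", "chat_template.json")


def carry_forward(src_dir, dst_dir, names=CARRIED_FILES):
    """Copy each named file from src to dst unless dst already has it."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    copied = []
    for name in names:
        src, dst = src_dir / name, dst_dir / name
        if src.is_file() and not dst.exists():
            shutil.copy2(src, dst)
            copied.append(name)
    return copied


with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    Path(src, "generation_config.json").write_text("{}")
    Path(src, "chat_template.jinja").write_text("source template")
    Path(dst, "chat_template.jinja").write_text("regenerated template")
    copied = carry_forward(src, dst)
    kept = Path(dst, "chat_template.jinja").read_text()

print(copied, kept)
```

In the demo, the file already present in the destination is left alone and only the absent one is copied.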

What gets regenerated

  • All Python source files (*.py) from the current Forgather templates and modelsrc.
  • auto_map, architectures, and other code-generator-derived fields in config.json.
  • forgather_arch_version -- bumped to the target version.

Audit log

Every successful update writes DST/forgather_update.json:

{
  "schema": "forgather_update.v1",
  "timestamp": "2026-05-01T12:34:56+00:00",
  "source": "/path/to/output_models/my_llama",
  "arch": "llama",
  "from_version": "1.3",
  "to_version": "3.0",
  "migrations": [
    "1.x->2.0: rename attention.query_linear -> attention.q_proj",
    "2.x->3.0: head_dim split"
  ],
  "dtype": "bfloat16",
  "missing_keys": [],
  "unexpected_keys": []
}

Use this to verify which migration steps actually ran when debugging a strict-load failure.
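A small helper can turn the log into a readable summary. The sketch below relies only on the fields shown in the sample above; `summarize_update` is a hypothetical name, and the demo writes its own sample file to a temporary directory.

```python
import json
import tempfile
from pathlib import Path


def summarize_update(dst_dir):
    """Return a short, human-readable summary of DST/forgather_update.json."""
    log = json.loads(Path(dst_dir, "forgather_update.json").read_text())
    lines = [f"{log['arch']}: {log['from_version']} -> {log['to_version']}"]
    lines += [f"  ran: {step}" for step in log["migrations"]]
    for field in ("missing_keys", "unexpected_keys"):
        if log.get(field):  # only reported when non-empty
            lines.append(f"  {field}: {', '.join(log[field])}")
    return "\n".join(lines)


sample = {
    "arch": "llama",
    "from_version": "1.3",
    "to_version": "3.0",
    "migrations": [
        "1.x->2.0: rename attention.query_linear -> attention.q_proj",
        "2.x->3.0: head_dim split",
    ],
    "missing_keys": [],
    "unexpected_keys": [],
}
with tempfile.TemporaryDirectory() as dst:
    Path(dst, "forgather_update.json").write_text(json.dumps(sample))
    summary = summarize_update(dst)

print(summary)
```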

Examples

Stamped provenance, no overrides:

forgather update output_models/llama_4m out/llama_4m_v2

Older model without stamped metadata (the tool guesses --arch llama and --from-version 1 and warns; pass either or both explicitly to override the guess):

forgather update legacy/llama_old out/llama_new \
    --arch llama --from-version 1

Compatible minor / patch upgrade (no migration code touched — the chain is empty, the destination is just re-stamped at the target version):

forgather update SRC DST --to-version 1.5

Stop at an intermediate version (useful when you want to verify a specific migration step in isolation):

forgather update SRC DST --to-version 2.0

Dry run (resolve the plan and print it; no writes):

forgather update SRC DST --dry-run

Specific (non-latest) checkpoint:

forgather update SRC DST -c SRC/checkpoints/checkpoint-385440

Permissive load (e.g. while developing a new migration whose renames are still incomplete):

forgather update SRC DST --no-strict

Authoring a migration

The version policy maps cleanly onto code changes:

  • Non-breaking change — a tweak that the existing saved config.json and state_dict still round-trip through the new code (e.g. a default value change, an additive field, an internal refactor that doesn't rename FQNs). Bump only the minor or patch component of arch_version. No migration entry needed.
  • Breaking change — anything that renames parameter FQNs, renames or removes config fields, or reshapes weights. Bump the major component and register a forgather_migrations entry keyed by the previous major.

Steps for a breaking change:

  1. Open the converter, e.g. examples/models/llama/src/converter.py.
  2. Bump arch_version to the new major (e.g. "1.3" -> "2.0").
  3. Add an entry to forgather_migrations keyed by the previous major, describing the M.x -> (M+1).0 step.

Example: renaming attention.query_linear -> attention.q_proj between major 1 and major 2:

from forgather.ml.model_conversion import VersionMigration

@register_converter("llama")
class LlamaConverter(HFConverter):
    arch = "llama"
    arch_version = "2.0"
    forgather_migrations = {
        1: VersionMigration(
            description="rename attention.query_linear -> attention.q_proj",
            migrate_config=lambda cfg: cfg,
            param_subs=(
                (r"attention\.query_linear", r"attention.q_proj", ()),
            ),
        ),
    }

Example with a config field rename (mlp_dim -> intermediate_size) between major 2 and major 3 — arch_version becomes "3.0":

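def _major2_to_major3_config(cfg):
    """Translate a major-2 config dict into its major-3 form."""
    new = dict(cfg)  # copy, so the caller's dict is not mutated
    if "mlp_dim" in new:
        new["intermediate_size"] = new.pop("mlp_dim")
    return new

# Class attribute of the converter; arch_version is now "3.0".
forgather_migrations = {
    1: ...,  # existing 1.x -> 2.0 entry, elided here
    2: VersionMigration(
        description="rename mlp_dim -> intermediate_size",
        migrate_config=_major2_to_major3_config,
    ),
}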

Verify your migration with a real saved model before shipping. The golden test is bit-identical logits between source and destination on a fixed input. Because forgather update loads the migrated state_dict strictly by default, a missing rename surfaces immediately as a load failure rather than as a silently uninitialised parameter.

See also

  • Model Conversion -- bidirectional HF ↔ FG conversion, including the parameter-remapping engine and converter plugin pattern that forgather update reuses.
  • Model Architecture -- how Forgather generates self-contained model code; the forgather_arch / forgather_arch_version provenance fields are stamped during that code generation.
  • Finalize Model -- build a clean handoff directory after pre-training. finalize and update are complementary: finalize produces a polished release artifact from a training run, update brings an older release artifact forward to a new schema.
  • Model CLI -- forgather model construct and related commands. Note: model construct rebuilds from a project template, which is not the same as updating a saved model -- use forgather update when the saved hyperparameters must be preserved.