Forgather 1.2.0 release notes¶
Released: May 2026
This is the first formal Forgather release with notes. The headline of 1.2.0 is multi-node: the server, the dataset layer, and the web UI now treat a LAN-attached set of machines as a single entity. A 1.1.0 user mostly trained on one machine at a time and either copied a dataset to every node by hand or paid the download and indexing cost on every machine; a 1.2.0 user can launch a multi-node bundle from the web UI, point every rank at one dataset server, and resume from a checkpoint in O(1) instead of streaming back to the saved position.
The release also folds in: native HTTPS / mTLS across all three servers,
torchao QAT + PTQ as a unified --quantize flow, in-place server restart,
auto-start services from server_config.yaml, a runtime Docker image
suitable for distribution, DGX Spark (GB10, aarch64) bring-up, plus the
long tail of fixes and webui polish that the items below list out.
Headline features¶
Multi-node forgather server¶
forgather server --cluster <name> puts a node into cluster mode. Peers
discover each other automatically over the LAN, track each other's
liveness, and elect one master. Separate clusters on the same LAN never
auto-merge.
- forgather cluster CLI — nodes, jobs, submit, cancel. The submit form mirrors the local train shape (-p / -t), takes repeatable --member host:gpus[:iface] selectors, and rolls every participant's status up into one bundle.
- Cluster panel in the web UI — a Nodes view showing each peer's role, reachability, version, and GPU summary. Clicking a peer hostname opens its own web UI in a new tab with no second login prompt.
- Per-peer network diagnostics — measures inter-node latency and bandwidth, surfaced in the Nodes panel so you can spot a misconfigured link before a training run inherits the bottleneck.
- Per-node restart / shutdown controls in the Nodes panel.
Docs: docs/guides/multi-node-training.md.
Dataset server¶
A new long-running service that turns datasets into a cluster-shared resource. The trainer-side loader speaks to it transparently: existing configs pointing at local Arrow datasets do not need to change.
- Unified view across nodes — a dataset registered on any peer is reachable from every peer.
- Load balancing — multiple peers serving the same dataset are pooled as one logical endpoint.
- Streams datasets from remote hosts — pull rows over the network when the corpus doesn't live on the local node. Multiple ranks can share one handle.
- Stateful dataset protocol — resume from a checkpoint seeks directly to the saved position (O(1)).
- Resilient client — the trainer survives transient network errors and dataset-server restarts mid-run.
- Web UI dataset browser — a new Datasets view lists every dataset reachable from the cluster, and clicking one opens a paged content viewer so you can see what's actually in it without dropping to a notebook.
CLI: forgather dataset-server start | status | list | cache | local.
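To make the O(1) resume claim concrete, here is a minimal sketch contrasting replay-style resume with seek-style resume. It is not the Forgather dataset protocol: resume_by_replay and SeekableCursor are hypothetical names, and the dataset is modeled as an in-memory sequence purely for illustration.

```python
from typing import Iterator, Sequence, Tuple


def resume_by_replay(rows: Sequence[str], position: int) -> Tuple[Iterator[str], int]:
    """Stateless resume: re-read and discard rows up to the saved position.

    Returns the iterator plus the number of rows that were skipped; the
    skipped count grows linearly with how far into the data the checkpoint was.
    """
    it = iter(rows)
    skipped = 0
    for _ in range(position):
        next(it)
        skipped += 1
    return it, skipped


class SeekableCursor:
    """Stateful resume: the saved state is an index, so restoring it is O(1)."""

    def __init__(self, rows: Sequence[str]) -> None:
        self._rows = rows
        self._index = 0

    def __iter__(self) -> "SeekableCursor":
        return self

    def __next__(self) -> str:
        if self._index >= len(self._rows):
            raise StopIteration
        row = self._rows[self._index]
        self._index += 1
        return row

    def state_dict(self) -> dict:
        # Saved alongside the training checkpoint.
        return {"index": self._index}

    def load_state_dict(self, state: dict) -> None:
        # No rows are read here; the cursor jumps straight to the position.
        self._index = state["index"]
```

In this model a checkpoint taken at row 700 costs 700 discarded reads on the replay path and a single assignment on the seek path.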
TLS / mTLS¶
The 1.1.0 security model -- bearer token on a loopback-bound server --
was fine for a single node running on localhost, but doesn't hold up
once the server binds to all interfaces for multi-node operation, so
1.2.0 adds native HTTPS across forgather server, dataset-server, and
inference-server, off a single per-host config under
~/.config/forgather/tls/. One forgather tls init enables TLS on a
host; forgather tls deploy automates per-peer bring-up across an
entire cluster.
- forgather tls CLI — init / status / renew / mint / install / deploy / enable / disable / export-ca / import-ca / trust-system / clean.
- Cluster peer auth is now mutual TLS — peers authenticate each other by client certificate signed by the cluster CA.
- Docker runtime image — opt-in TLS_INIT=1 runs forgather tls init on first start, or bake a CA + cert into the image for build-once-distribute-N clusters.
Docs: docs/operations/tls.md. See Breaking
changes below for the cert reissue step required on upgrade.
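For readers unfamiliar with mutual TLS, the sketch below shows the general shape using only Python's standard ssl module. It is an illustration, not Forgather's implementation: the function names are hypothetical, no file names under ~/.config/forgather/tls/ are assumed, and the path arguments are optional only so the sketch can be exercised without certificates on disk.

```python
import ssl
from typing import Optional


def build_mtls_server_context(
    certfile: Optional[str] = None,
    keyfile: Optional[str] = None,
    cafile: Optional[str] = None,
) -> ssl.SSLContext:
    """Server side: present our own cert and require one from the client."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile is not None:
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    if cafile is not None:
        # Only client certificates signed by this CA will be accepted.
        ctx.load_verify_locations(cafile=cafile)
    # This is what makes the TLS "mutual": a client without a valid
    # certificate fails the handshake.
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def build_mtls_client_context(
    certfile: Optional[str] = None,
    keyfile: Optional[str] = None,
    cafile: Optional[str] = None,
) -> ssl.SSLContext:
    """Client side: verify the server and present our own cert.

    With cafile=None the system trust store is used; a private cluster CA
    would be passed explicitly.
    """
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cafile)
    if certfile is not None:
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return ctx
```

The design point is that both sides verify a certificate chain, so a peer that merely knows a bearer token but holds no CA-signed certificate cannot join.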
Server config + auto-start services¶
A server_config.yaml with two top-level sections:
- args: — CLI defaults (host, port, cluster name, persist-sessions, …). New --config PATH arg; default at <forgather_config_dir>/server/server_config.yaml.
- services: — auto-start entries for dataset-server, inference, TensorBoard, MkDocs. Manually-launched matching jobs count as the running instance, so a restart never double-spawns.
Sidebar overhaul to expose this:
- Tools / Services split — Tools is one-shot (Evaluate / Convert / Finalize / Update); Services is long-running (Inference / Dataset / TensorBoard / MkDocs) with a running-count pill and per-instance start / stop / delete controls.
- Sticky sidebar footer — Refresh, Scheduler pause/resume, Restart, Open config.
- In-place restart — the server can restart itself without dropping
its TTY or losing the subprocesses (training, inference,
dataset_server, mkdocs, tensorboard) it spawned. With opt-in
--persist-sessions, browser sessions also survive the restart.
Reference: tools/forgather_server/README.md
(also accessible as docs/forgather-server.md).
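The "never double-spawns" rule for auto-start services can be sketched as a small reconciliation step. This is a hedged illustration only: ServiceSpec and services_to_start are hypothetical names, and matching on a (kind, name) pair is an assumption, since these notes do not say how Forgather decides that a running job matches a configured service.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ServiceSpec:
    """Hypothetical identity of a service: its kind plus an instance name."""
    kind: str   # e.g. "dataset-server" or "tensorboard"
    name: str


def services_to_start(
    configured: Iterable[ServiceSpec],
    running: Iterable[ServiceSpec],
) -> List[ServiceSpec]:
    """Return configured services that have no matching running instance.

    A manually launched job that matches a configured entry is treated as
    that entry's running instance, so it is not started a second time.
    """
    already_running = set(running)
    return [spec for spec in configured if spec not in already_running]
```

Because the step is idempotent, running it again after a restart yields an empty list once everything is up.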
Quantization: torchao QAT + PTQ¶
1.2.0 adds quantization-aware training (new) and the ability to
quantize a finalized model (also new). forgather finalize
--quantize <recipe> is the single deploy-time entry point: it covers
both post-training quantization on any bf16 model and the convert step
of a QAT round-trip on a model trained with --qat-recipe. (Float8
training via torchao already existed in 1.1.0 and is unchanged.)
- Train-time — --qat-recipe simulates the target low-bit precision in the forward pass while keeping the backward pass in full precision. Mutually exclusive with --fp8-recipe.
- Deploy-time — forgather finalize --quantize <recipe> swaps the fake-quantized layers for the real low-bit ops. Recipes: int8-dynamic-act-int4-weight, int4-weight-only, float8-dynamic-act-float8-weight, float8-dynamic-act-int4-weight.
- Loads back transparently — forgather eval test, forgather inf server --from-checkpoint, and trainer resume_from_checkpoint autodetect torchao-quantized checkpoints and install the matching quantized linears before loading state. No CLI flag, no marker file. Closes #41 / #42.
- Web UI — the Finalize modal grows a Quantize dropdown.
Docs: docs/trainers/qat-training.md.
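As a rough intuition for what "fake-quantized" means at train time, the sketch below quantizes and immediately dequantizes a weight vector in plain Python, then applies a gradient to the full-precision weights. It is not torchao's implementation: the symmetric per-tensor int4 scheme and the straight-through treatment of the gradient are assumptions chosen for illustration, and both function names are hypothetical.

```python
from typing import List


def fake_quantize_int4(weights: List[float]) -> List[float]:
    """Quantize-dequantize: values stay floats but snap to an int4 grid.

    Symmetric per-tensor scheme (an assumption): one scale for the whole
    tensor, integer levels in [-8, 7].
    """
    qmin, qmax = -8, 7
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / qmax
    out = []
    for w in weights:
        q = round(w / scale)            # nearest integer level
        q = max(qmin, min(qmax, q))     # clamp to the int4 range
        out.append(q * scale)           # back to float for the forward pass
    return out


def qat_step(weights: List[float], grads: List[float], lr: float) -> List[float]:
    """Update the full-precision weights.

    Straight-through estimator (an assumption): the rounding step is treated
    as identity in the backward pass, so gradients computed against the
    fake-quantized weights are applied unchanged to the float weights.
    """
    return [w - lr * g for w, g in zip(weights, grads)]
```

The forward pass sees the rounding error the deployed model will have, while the optimizer keeps updating weights at full precision; the deploy-time convert step then replaces the simulated rounding with real low-bit storage.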
Hardware bring-up¶
- DGX Spark (NVIDIA GB10, aarch64) is supported as a first-class cluster member. The dev and runtime Docker images install CUDA-enabled torch for aarch64.
- Unified-memory GPUs (GB10, Jetson) — these devices no longer drop out of the GPU panel; host RAM is surfaced as the device pool and flagged unified_memory.
- MFU sanity — when no GPU is recognised, peak-hardware-flops falls back to a deliberately pessimistic placeholder, so silent fallbacks show up as MFU > 100 % in dashboards instead of a believable number.
- FP8 on Blackwell — torch's bundled NVRTC on SM 12.1 (GB10) can't JIT the rowwise recipe; documented with a workaround. tensorwise is unaffected.
- FP8 warning — torchao 0.16.0's rowwise_with_gw_hp recipe is broken for ND inputs; the trainer warns at init.
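The MFU sanity item is easiest to see with arithmetic. MFU is achieved FLOP/s divided by the hardware's peak FLOP/s, so a deliberately low placeholder peak pushes the ratio well past 1. The sketch below uses made-up numbers: the device name, the lookup table, and the placeholder value are all hypothetical and are not Forgather's actual figures.

```python
# Hypothetical peak-throughput table; real values depend on device and dtype.
KNOWN_PEAK_FLOPS = {
    "example-accelerator": 300e12,
}

# Hypothetical placeholder, chosen far below any real accelerator so that
# an unrecognised device yields an obviously wrong MFU.
PLACEHOLDER_PEAK_FLOPS = 1e12


def peak_flops_for(device_name: str) -> float:
    """Look up peak FLOP/s, falling back to the pessimistic placeholder."""
    return KNOWN_PEAK_FLOPS.get(device_name, PLACEHOLDER_PEAK_FLOPS)


def mfu(achieved_flops_per_s: float, device_name: str) -> float:
    """Model FLOPs utilisation as a fraction of peak (1.0 means 100 %)."""
    return achieved_flops_per_s / peak_flops_for(device_name)
```

With these numbers, a run sustaining 150e12 FLOP/s reports 0.5 on the recognised device and 150.0 on an unrecognised one, which no dashboard reader would mistake for a real measurement.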
Docker¶
- Runtime image — a distributable counterpart to the dev image whose default command is forgather server. The in-container user is remapped to PUID/PGID on entry, so a single prebuilt image works for any host user and bind-mounts (~/.cache/huggingface, etc.) don't drift ownership.
- NETWORK=host opt-in, required for cluster peer discovery (mDNS multicast doesn't traverse the docker bridge).
- docker/build.sh → docker/build, docker/run.sh → docker/run for tab-completion ergonomics; the old names remain for one release.
- TensorBoard patch is version-gated — when upstream ships a fix the next image rebuild will fail loudly so the patch can be retired, instead of silently no-op'ing.
- Multi-node smoke test — tests/smoke_runtime_multinode.sh builds the runtime image, deploys to a remote host, brings up a two-node cluster, runs Tiny Llama across all GPUs, verifies the checkpoint, and cleans up. On failure it dumps container logs + cluster state + TTY logs + nvidia-smi into one triage file.
Docs: docs/getting-started/docker.md.
Construct job + tokenizer projects¶
- Construct job type — forgather construct --enqueue surfaces the diagnostic / build side of forgather construct through the queue. Targets can be materialised (and side-effecting targets like a tokenizer trainer can be invoked) without running inline in the server process. A 🔨 Construct… entry appears on every config's context menu in the web UI.
- Tokenizer projects modernised (examples/tokenizers/) — dynamic CLI args (--output-dir, --tokenizer-name, --model-max-length, --vocab-size), so a custom tokenizer can be built entirely from the CLI via forgather construct.
Configuration directory¶
Per-user state moved from ~/.forgather/ to ~/.config/forgather/
(platform-appropriate via platformdirs). Hard switch with no
automatic migration: auth tokens regenerate on first use; users with
customised config.yaml, generation_config/*, or hardware.yaml
need to move those by hand. Run rm -rf ~/.forgather after upgrading
to reclaim the orphaned dir.
Web UI: smaller items¶
- Files view — paste auto-renames on collision; new ⎘ Duplicate entry; Duplicate Config… prompt on project configs.
- Edit Meta… and Edit Meta Defaults… context-menu entries on projects and workspaces.
- Header right-click → Help… opens the server reference doc.
- Docs view — outline / TOC column, Back-button scroll restore, default landing on docs/README.md.
- Sidebar count pills on tool groups.
- Clean Output menu is hidden when the output directory doesn't exist, instead of lingering green on a stale cache.
- Server shutdown button with optional stop-all-jobs.
- Dev-container uv-cache permissions fixed.
Bug fixes worth calling out¶
- Template-editor saves no longer chmod files +x. The atomic-write helper used to default new files to mode 0o777 (umask-trimmed to 0o755), so every template-editor save silently flipped 0o644 → 0o755. The helper now passes 0o666 explicitly. Regression-tested with a pinned umask.
- Atomic write is now POSIX-safe — full fsync → rename → directory fsync.
- Workspace Edit README.md opens forgather_workspace/README.md, matching where forgather ws create writes the file.
- NFS view-freshness nudges — fs.delete_dir syncs and bumps the parent dir's mtime after rmtree; the multi-node build barrier syncs the builder before releasing peers and invalidates importlib caches on waiters after the barrier, defeating the most common "Unrecognized model — should have a model_type key" race on NFS.
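The two atomic-write fixes above follow a well-known POSIX pattern, sketched here with the standard library only. This is an illustration of the technique, not Forgather's helper: the function name and temp-file naming are hypothetical. It also shows where the +x bug can come from, since os.open defaults its mode argument to 0o777 when none is given.

```python
import os


def atomic_write(path: str, data: bytes, mode: int = 0o666) -> None:
    """Write data so readers see either the old file or the complete new one."""
    path = os.path.abspath(path)
    directory = os.path.dirname(path)
    # Temp file lives next to the target so the rename stays on one filesystem.
    tmp_path = f"{path}.tmp.{os.getpid()}"

    # Pass the mode explicitly: os.open would otherwise use 0o777. The process
    # umask still trims it (0o666 becomes 0o644 under umask 0o022).
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, mode)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())        # 1. file contents reach stable storage
        os.replace(tmp_path, path)      # 2. atomic rename over the target
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

    # 3. fsync the directory so the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Skipping step 3 leaves a window where the data is on disk but the directory entry pointing at it is not, which is why the full sequence is needed for crash safety on POSIX systems.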
Breaking changes (read these before upgrading)¶
- Per-user state moved — ~/.forgather/ → ~/.config/forgather/. No automatic migration. Auth tokens regenerate on first use; copy config.yaml, generation_config/, hardware.yaml by hand if you customised them. rm -rf ~/.forgather once you're done.
- Docker build.sh / run.sh renamed to build / run — wrappers are still in place for one release; update CI scripts and docs at your leisure.
- uvicorn is pinned to >= 0.46, < 0.47 — required by the mTLS integration.
Misc¶
- small-llm experiments, plots, and sub-project READMEs refreshed.
- tests/integration exercises real TLS in the inference test.
- Docs build is now a fast CI gate.
Contributors¶
This release was implemented largely in collaboration with Claude (Anthropic) via Claude Code. Co-author lines on individual commits identify which agent landed each piece.