
Forgather 1.2.0 release notes

Released: May 2026

This is the first formal Forgather release with notes. The headline of 1.2.0 is multi-node: the server, the dataset layer, and the web UI now treat a LAN-attached set of machines as a single entity. A 1.1.0 user mostly trained on one machine at a time and either copied a dataset to every node by hand or paid the download and indexing cost on every machine; a 1.2.0 user can launch a multi-node bundle from the web UI, point every rank at one dataset server, and resume from a checkpoint in O(1) time instead of streaming back to the saved position.

The release also folds in native HTTPS / mTLS across all three servers, torchao QAT + PTQ as a unified --quantize flow, in-place server restart, auto-start services from server_config.yaml, a runtime Docker image suitable for distribution, and DGX Spark (GB10, aarch64) bring-up, plus the long tail of fixes and web UI polish listed in the items below.


Headline features

Multi-node forgather server

forgather server --cluster <name> puts a node into cluster mode. Peers discover each other automatically over the LAN, track each other's liveness, and elect one master. Separate clusters on the same LAN never auto-merge.

  • forgather cluster CLI: nodes, jobs, submit, cancel. The submit form mirrors the local train shape (-p / -t), takes repeatable --member host:gpus[:iface] selectors, and rolls every participant's status up into one bundle.
  • Cluster panel in the web UI — a Nodes view showing each peer's role, reachability, version, and GPU summary. Clicking a peer hostname opens its own web UI in a new tab with no second login prompt.
  • Per-peer network diagnostics — measures inter-node latency and bandwidth, surfaced in the Nodes panel so you can spot a misconfigured link before a training run inherits the bottleneck.
  • Per-node restart / shutdown controls in the Nodes panel.
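As a rough illustration of the host:gpus[:iface] selector shape mentioned above, the sketch below splits a selector into its parts. It is not Forgather's parser: the MemberSelector class and parse_member function are hypothetical names, the gpus field is kept as an opaque string because these notes do not define its syntax, and the example values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemberSelector:
    """Hypothetical container for one --member selector."""
    host: str
    gpus: str  # kept opaque: the release notes do not define this field's syntax
    iface: Optional[str] = None


def parse_member(selector: str) -> MemberSelector:
    """Split 'host:gpus[:iface]' into its two or three non-empty fields."""
    parts = selector.split(":")
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"expected host:gpus[:iface], got {selector!r}")
    host, gpus = parts[0], parts[1]
    iface = parts[2] if len(parts) == 3 else None
    return MemberSelector(host, gpus, iface)


# Made-up selectors, purely for illustration.
for raw in ("node-a:0,1", "node-b:2:eth0"):
    print(parse_member(raw))
```

A naive split on ":" like this would not handle bare IPv6 addresses as hosts; how the real CLI treats them is not covered here.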

Docs: docs/guides/multi-node-training.md.

Dataset server

A new long-running service that turns datasets into a cluster-shared resource. The trainer-side loader speaks to it transparently: existing configs pointing at local Arrow datasets do not need to change.

  • Unified view across nodes — a dataset registered on any peer is reachable from every peer.
  • Load balancing — multiple peers serving the same dataset are pooled as one logical endpoint.
  • Streams datasets from remote hosts — pull rows over the network when the corpus doesn't live on the local node. Multiple ranks can share one handle.
  • Stateful dataset protocol — resume from a checkpoint seeks directly to the saved position (O(1)).
  • Resilient client — the trainer survives transient network errors and dataset-server restarts mid-run.
  • Web UI dataset browser — a new Datasets view lists every dataset reachable from the cluster, and clicking one opens a paged content viewer so you can see what's actually in it without dropping to a notebook.
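The difference between an O(1) seek and streaming back to the saved position can be sketched as follows. This is a toy model over an in-memory list, not the dataset-server protocol: SeekableStream, resume_by_replay, and the "position" key are hypothetical names chosen for the illustration.

```python
from typing import Dict, Iterator, Sequence


class SeekableStream:
    """Iterates an indexable dataset and can save/restore its position."""

    def __init__(self, rows: Sequence[str]) -> None:
        self._rows = rows
        self._pos = 0

    def __iter__(self) -> "SeekableStream":
        return self

    def __next__(self) -> str:
        if self._pos >= len(self._rows):
            raise StopIteration
        row = self._rows[self._pos]
        self._pos += 1
        return row

    def state_dict(self) -> Dict[str, int]:
        # The checkpoint only needs the position, not the rows already seen.
        return {"position": self._pos}

    def load_state_dict(self, state: Dict[str, int]) -> None:
        # O(1): jump straight to the saved position.
        self._pos = state["position"]


def resume_by_replay(rows: Sequence[str], position: int) -> Iterator[str]:
    """O(position): re-read and discard every row before the saved position."""
    it = iter(rows)
    for _ in range(position):
        next(it)
    return it


rows = [f"row-{i}" for i in range(10)]
stream = SeekableStream(rows)
for _ in range(3):
    next(stream)
saved = stream.state_dict()

resumed = SeekableStream(rows)
resumed.load_state_dict(saved)
print(next(resumed), next(resume_by_replay(rows, saved["position"])))
```

Both paths yield the same next row; the seek does constant work regardless of how far into the corpus the checkpoint was taken, while the replay cost grows with the saved position.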

CLI: forgather dataset-server start | status | list | cache | local.


TLS / mTLS

The 1.1.0 security model (a bearer token on a loopback-bound server) was fine for a single node running on localhost, but it does not hold up once the server binds to all interfaces for multi-node operation. 1.2.0 therefore adds native HTTPS across forgather server, dataset-server, and inference-server, driven by a single per-host config under ~/.config/forgather/tls/. One forgather tls init enables TLS on a host; forgather tls deploy automates per-peer bring-up across an entire cluster.

  • forgather tls CLI: init / status / renew / mint / install / deploy / enable / disable / export-ca / import-ca / trust-system / clean.
  • Cluster peer auth is now mutual TLS — peers authenticate each other by client certificate signed by the cluster CA.
  • Docker runtime image — opt-in TLS_INIT=1 runs forgather tls init on first start, or bake a CA + cert into the image for build-once-distribute-N clusters.
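For readers unfamiliar with mutual TLS, the sketch below shows the general idea using only Python's standard ssl module: a server-side context that refuses any client that does not present a certificate signed by a trusted CA. It is a generic illustration, not how Forgather wires TLS into its servers; the function name and its parameters are hypothetical, and the paths are optional only so the context can be built without real certificate files.

```python
import ssl
from typing import Optional


def make_mtls_server_context(
    ca_file: Optional[str] = None,
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a server-side TLS context that requires a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # CERT_REQUIRED makes the handshake fail unless the client presents a
    # certificate that chains to a CA loaded below.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file is not None:
        ctx.load_verify_locations(cafile=ca_file)  # CA that signs peer certs
    if cert_file is not None:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # own identity
    return ctx


ctx = make_mtls_server_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)
```

In a real deployment all three files would be supplied; with no CA loaded, a context like this rejects every client.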

Docs: docs/operations/tls.md. See Breaking changes below for the cert reissue step required on upgrade.


Server config + auto-start services

A server_config.yaml with two top-level sections:

  • args: — CLI defaults (host, port, cluster name, persist-sessions, …). New --config PATH arg; default at <forgather_config_dir>/server/server_config.yaml.
  • services: — auto-start entries for dataset-server, inference, TensorBoard, MkDocs. Manually-launched matching jobs count as the running instance, so a restart never double-spawns.

Sidebar overhaul to expose this:

  • Tools / Services split — Tools is one-shot (Evaluate / Convert / Finalize / Update); Services is long-running (Inference / Dataset / TensorBoard / MkDocs) with a running-count pill and per-instance start / stop / delete controls.
  • Sticky sidebar footer — Refresh, Scheduler pause/resume, Restart, Open config.
  • In-place restart — the server can restart itself without dropping its TTY or losing the subprocesses (training, inference, dataset_server, mkdocs, tensorboard) it spawned. With opt-in --persist-sessions, browser sessions also survive the restart.

Reference: tools/forgather_server/README.md (also accessible as docs/forgather-server.md).


Quantization: torchao QAT + PTQ

1.2.0 adds two new capabilities: quantization-aware training (QAT) and post-training quantization (PTQ) of a finalized model. forgather finalize --quantize <recipe> is the single deploy-time entry point: it covers both post-training quantization on any bf16 model and the convert step of a QAT round-trip on a model trained with --qat-recipe. (Float8 training via torchao already existed in 1.1.0 and is unchanged.)

  • Train-time: --qat-recipe simulates the target low-bit precision in the forward pass while keeping the backward pass in full precision. Mutually exclusive with --fp8-recipe.
  • Deploy-time: forgather finalize --quantize <recipe> swaps the fake-quantized layers for the real low-bit ops. Recipes: int8-dynamic-act-int4-weight, int4-weight-only, float8-dynamic-act-float8-weight, float8-dynamic-act-int4-weight.
  • Loads back transparently: forgather eval test, forgather inf server --from-checkpoint, and trainer resume_from_checkpoint autodetect torchao-quantized checkpoints and install the matching quantized linears before loading state. No CLI flag, no marker file. Closes #41 / #42.
  • Web UI — the Finalize modal grows a Quantize dropdown.
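To make "simulates the target low-bit precision in the forward pass" concrete, here is a minimal pure-Python sketch of fake quantization: values are snapped to an int4 grid and immediately converted back to floats, so downstream math still runs in floating point but sees the rounding error. It assumes a symmetric, per-tensor scale and is a generic illustration of the technique, not torchao's implementation or any of the recipes listed above; fake_quantize_int4 is a hypothetical name.

```python
from typing import List


def fake_quantize_int4(weights: List[float]) -> List[float]:
    """Quantize to signed int4 with one symmetric scale, then dequantize."""
    if not weights:
        return []
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / 7.0  # map the largest magnitude to the int4 level 7
    out = []
    for w in weights:
        q = max(-8, min(7, round(w / scale)))  # integer level in [-8, 7]
        out.append(q * scale)  # back to float: same dtype, reduced precision
    return out


weights = [0.70, -0.33, 0.12, 0.0]
print(fake_quantize_int4(weights))
```

During QAT the forward pass would use the fake-quantized values while the optimizer keeps updating the original full-precision weights, which is what lets the later convert step swap in real low-bit ops with little accuracy loss.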

Docs: docs/trainers/qat-training.md.


Hardware bring-up

  • DGX Spark (NVIDIA GB10, aarch64) is supported as a first-class cluster member. The dev and runtime Docker images install CUDA-enabled torch for aarch64.
  • Unified-memory GPUs (GB10, Jetson) — these devices no longer drop out of the GPU panel; host RAM is surfaced as the device pool and flagged unified_memory.
  • MFU sanity — when no GPU is recognised, peak-hardware-flops falls back to a deliberately pessimistic placeholder, so silent fallbacks show up as MFU > 100 % in dashboards instead of a believable number.
  • FP8 on Blackwell — torch's bundled NVRTC on SM 12.1 (GB10) can't JIT the rowwise recipe; documented with a workaround. tensorwise is unaffected.
  • FP8 warning — torchao 0.16.0's rowwise_with_gw_hp recipe is broken for ND inputs; the trainer warns at init.
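The MFU sanity behaviour above follows from how the metric is defined: achieved FLOP/s divided by the hardware's peak FLOP/s. The sketch below uses made-up numbers and a hypothetical placeholder value (the actual constant Forgather uses is not stated in these notes) to show why a deliberately low peak figure produces an obviously impossible MFU rather than a plausible one.

```python
def mfu(achieved_flops_per_sec: float, peak_flops_per_sec: float) -> float:
    """Model FLOPs utilisation: fraction of peak throughput actually achieved."""
    if peak_flops_per_sec <= 0:
        raise ValueError("peak FLOP/s must be positive")
    return achieved_flops_per_sec / peak_flops_per_sec


# All numbers below are illustrative assumptions, not measured values.
ACHIEVED = 40e12                    # throughput measured during training
KNOWN_PEAK = 100e12                 # peak for a recognised GPU
HYPOTHETICAL_PLACEHOLDER = 1e12     # stand-in for the unrecognised-GPU fallback

recognised = mfu(ACHIEVED, KNOWN_PEAK)
fallback = mfu(ACHIEVED, HYPOTHETICAL_PLACEHOLDER)
print(f"recognised GPU: {recognised:.0%}, fallback: {fallback:.0%}")
```

Any reading above 100 % is physically impossible, so it signals that the peak figure, not the training run, needs attention.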

Docker

  • Runtime image — a distributable counterpart to the dev image whose default command is forgather server. The in-container user is remapped to PUID/PGID on entry, so a single prebuilt image works for any host user and bind-mounts (~/.cache/huggingface, etc.) don't drift ownership.
  • NETWORK=host opt-in, required for cluster peer discovery (mDNS multicast doesn't traverse the docker bridge).
  • docker/build.sh → docker/build, docker/run.sh → docker/run for tab-completion ergonomics; the old names remain for one release.
  • TensorBoard patch is version-gated — when upstream ships a fix the next image rebuild will fail loudly so the patch can be retired, instead of silently no-op'ing.
  • Multi-node smoke test: tests/smoke_runtime_multinode.sh builds the runtime image, deploys to a remote host, brings up a two-node cluster, runs Tiny Llama across all GPUs, verifies the checkpoint, and cleans up. On failure it dumps container logs + cluster state + TTY logs + nvidia-smi into one triage file.

Docs: docs/getting-started/docker.md.


Construct job + tokenizer projects

  • Construct job type: forgather construct --enqueue surfaces the diagnostic / build side of forgather construct through the queue. Targets can be materialised (and side-effecting targets like a tokenizer trainer can be invoked) without running inline in the server process. A 🔨 Construct… entry appears on every config's context menu in the web UI.
  • Tokenizer projects modernised (examples/tokenizers/) — dynamic CLI args (--output-dir, --tokenizer-name, --model-max-length, --vocab-size), so a custom tokenizer can be built entirely from the CLI via forgather construct.

Configuration directory

Per-user state moved from ~/.forgather/ to ~/.config/forgather/ (platform-appropriate via platformdirs). Hard switch with no automatic migration: auth tokens regenerate on first use; users with customised config.yaml, generation_config/*, or hardware.yaml need to move those by hand. Run rm -rf ~/.forgather after upgrading to reclaim the orphaned dir.


Web UI: smaller items

  • Files view — paste auto-renames on collision; new ⎘ Duplicate entry; Duplicate Config… prompt on project configs.
  • Edit Meta… and Edit Meta Defaults… context-menu entries on projects and workspaces.
  • Header right-click → Help… opens the server reference doc.
  • Docs view — outline / TOC column, Back-button scroll restore, default landing on docs/README.md.
  • Sidebar count pills on tool groups.
  • Clean Output menu is hidden when the output directory doesn't exist, instead of lingering green on a stale cache.
  • Server shutdown button with optional stop-all-jobs.
  • Dev-container uv-cache permissions fixed.

Bug fixes worth calling out

  • Template-editor saves no longer chmod files +x. The atomic-write helper used to default new files to mode 0o777 (umask-trimmed to 0o755), so every template-editor save silently flipped 0o644 → 0o755. Pass 0o666 explicitly. Regression-tested with a pinned umask.
  • Atomic write is now POSIX-safe — full fsync → rename → directory fsync.
  • Workspace Edit README.md opens forgather_workspace/README.md, matching where forgather ws create writes the file.
  • NFS view-freshness nudges: fs.delete_dir syncs and bumps the parent dir's mtime after rmtree; the multi-node build barrier syncs the builder before releasing peers and invalidates importlib caches on waiters after the barrier, defeating the most common "Unrecognized model — should have a model_type key" race on NFS.
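The two atomic-write fixes above (creating new files with mode 0o666 so the umask yields 0o644, and the fsync → rename → directory fsync sequence) can be sketched together. This is a minimal POSIX-only illustration, not Forgather's helper: atomic_write and its temp-file naming are hypothetical.

```python
import os


def atomic_write(path: str, data: bytes, mode: int = 0o666) -> None:
    """Write data to path atomically: temp file, fsync, rename, directory fsync."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = os.path.join(
        directory, f".{os.path.basename(path)}.{os.getpid()}.tmp"
    )
    # The kernel masks `mode` with the process umask, so 0o666 becomes 0o644
    # under the common umask 0o022. Passing 0o777 here would yield 0o755,
    # which is the executable-bit bug described above.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, mode)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # 1. file contents reach stable storage
        os.replace(tmp_path, path)  # 2. atomic rename over the target
    except BaseException:
        try:
            os.unlink(tmp_path)  # do not leave a stray temp file behind
        except FileNotFoundError:
            pass
        raise
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # 3. the rename itself reaches stable storage
    finally:
        os.close(dir_fd)
```

Readers of the target path see either the old contents or the complete new contents, never a partial write. Because the new file is created fresh, its permissions come from `mode` and the umask rather than from the file it replaces.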

Breaking changes (read these before upgrading)

  1. Per-user state moved: ~/.forgather/ → ~/.config/forgather/. No automatic migration. Auth tokens regenerate on first use; copy config.yaml, generation_config/, hardware.yaml by hand if you customised them. rm -rf ~/.forgather once you're done.
  2. Docker build.sh / run.sh renamed to build / run — wrappers are still in place for one release; update CI scripts and docs at your leisure.
  3. uvicorn is pinned to >= 0.46, < 0.47 — required by the mTLS integration.

Misc

  • small-llm experiments, plots, and sub-project READMEs refreshed.
  • tests/integration exercises real TLS in the inference test.
  • Docs build is now a fast CI gate.

Contributors

This release was implemented largely in collaboration with Claude (Anthropic) via Claude Code. Co-author lines on individual commits identify which agent landed each piece.