Working with Tokenizer Projects

Tokenizer projects define custom tokenizer configurations, such as a BPE tokenizer trained on a specific corpus. They follow the standard Forgather project layout and use the same CLI commands as any other project.

# List available tokenizer configurations
forgather ls

# Show the preprocessed tokenizer configuration
forgather -t <config_name> pp

# Build the tokenizer (if it has not yet been built)
forgather -t <config_name> construct
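If you usually check the preprocessed configuration before building it, you can wrap the two steps in a small shell function. The sketch below is a hypothetical helper and is not part of Forgather. `build_tokenizer` is an illustrative name. The sketch assumes that `forgather` is on your `PATH` and that you call the function from the tokenizer project's directory, as with the commands above. It uses only the `pp` and `construct` commands shown above, and it runs `construct` only if `pp` succeeds.

```shell
# Hypothetical helper (not part of Forgather): preview, then build, one
# tokenizer configuration. Assumes `forgather` is on PATH and that the
# function is called from inside the tokenizer project directory.
build_tokenizer() {
  # Require exactly the configuration name that `-t` expects.
  if [ -z "${1:-}" ]; then
    echo "usage: build_tokenizer <config_name>" >&2
    return 2
  fi
  local config="$1"
  # Print the preprocessed configuration; build the tokenizer only if that
  # step succeeds. The function returns the status of the last command run.
  forgather -t "$config" pp && forgather -t "$config" construct
}
```

To use it, save the function to a file, source that file in your shell, and then run `build_tokenizer <config_name>`. The function uses `&&` rather than `set -e` because `set -e` would also apply to the interactive shell that sources the file.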

Example tokenizer projects are available in examples/tokenizers/.