Wikitext BPE Tokenizer

An example BPE tokenizer trained on the "EleutherAI/wikitext_document_level" dataset.

8k

  • vocabulary_size: 8000
  • model_max_length: 2048

32k

  • vocabulary_size: 32000
  • model_max_length: 8192
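
To illustrate what the two settings control, here is a minimal pure-Python sketch of BPE training and encoding. It is a toy under stated assumptions, not the code used to build these tokenizers: it splits on whitespace, works on characters rather than bytes, and the function names (train_bpe, merge_pair, encode) are hypothetical. vocabulary_size caps how many symbols are learned, and model_max_length caps how many tokens an encoded sequence keeps.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(corpus, vocabulary_size):
    """Learn merges until the vocabulary reaches vocabulary_size or no pairs remain."""
    # Toy assumption: words are whitespace-separated and start as single characters.
    words = Counter(tuple(word) for line in corpus for word in line.split())
    vocab = sorted({ch for word in words for ch in word})
    merges = []
    while len(vocab) < vocabulary_size:
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Pick the most frequent pair; ties are broken by the pair itself.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        merged = best[0] + best[1]
        if merged not in vocab:
            vocab.append(merged)
        # Apply the new merge to every word in the training counts.
        new_words = Counter()
        for word, freq in words.items():
            new_words[merge_pair(word, best)] += freq
        words = new_words
    return vocab, merges


def encode(text, merges, model_max_length):
    """Apply learned merges in order, then truncate to model_max_length tokens."""
    tokens = []
    for word in text.split():
        symbols = tuple(word)
        for pair in merges:
            symbols = merge_pair(symbols, pair)
        tokens.extend(symbols)
    return tokens[:model_max_length]


corpus = ["low low low lower lowest", "new newer newest"]
vocab, merges = train_bpe(corpus, vocabulary_size=20)
tokens = encode("lower newest", merges, model_max_length=4)
```

A larger vocabulary_size lets frequent words collapse into fewer tokens, so the same text fits more easily within model_max_length.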