Last updated: 2026-06-12
Tokenization
Tokenizer
Native BPE tokenizer that reads vocabulary and merge rules from GGUF metadata.
Implements byte-pair encoding directly in Zig using the tokenizer tables embedded in GGUF model files, eliminating external tokenizer dependencies.
struct
Tokenizer
pub const Tokenizer = struct
A native BPE tokenizer backed by vocabulary and merge tables from GGUF metadata.
function
initFromGGUF
pub fn initFromGGUF(gf: *const gguf.GGUFFile, allocator: std.mem.Allocator) !Tokenizer
Initialize a Tokenizer from an open GGUF file.
Reads `tokenizer.ggml.tokens`, `tokenizer.ggml.merges`, token scores, and special token IDs (BOS, EOS, add-BOS flag) from GGUF metadata. Also builds the `merge_ranks` lookup table so that subsequent `encode` calls are fast.
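To illustrate why a rank lookup table helps, here is a minimal sketch of rank-ordered BPE merging. It is not the library's implementation: `bpeMerge` is a hypothetical helper, the `"left right"` key format is an assumption about how merge rules are keyed, and the sketch starts from single bytes (so it is only meaningful for ASCII input) and rescans the whole word on every pass.

```zig
const std = @import("std");

/// Hypothetical helper: repeatedly merges the adjacent pair with the lowest rank.
/// `pieces` is caller-provided scratch space with at least one slot per input byte.
fn bpeMerge(
    ranks: *const std.StringHashMap(u32),
    word: []const u8,
    pieces: [][]const u8,
) ![]const []const u8 {
    if (word.len > pieces.len) return error.BufferTooSmall;
    // Start with one piece per byte (ASCII-only simplification).
    for (0..word.len) |i| pieces[i] = word[i .. i + 1];
    var n: usize = word.len;

    var key_buf: [128]u8 = undefined;
    while (n > 1) {
        var best_rank: ?u32 = null;
        var best_idx: usize = 0;
        var i: usize = 0;
        while (i + 1 < n) : (i += 1) {
            // Assumed key format: "left right", separated by one space.
            const key = try std.fmt.bufPrint(&key_buf, "{s} {s}", .{ pieces[i], pieces[i + 1] });
            if (ranks.get(key)) |r| {
                if (best_rank == null or r < best_rank.?) {
                    best_rank = r;
                    best_idx = i;
                }
            }
        }
        if (best_rank == null) break; // no applicable merge rule is left

        // Adjacent pieces are contiguous in `word`, so the merged piece is
        // simply a longer slice starting at the left piece.
        const merged_len = pieces[best_idx].len + pieces[best_idx + 1].len;
        pieces[best_idx] = pieces[best_idx].ptr[0..merged_len];
        var j = best_idx + 1;
        while (j + 1 < n) : (j += 1) pieces[j] = pieces[j + 1];
        n -= 1;
    }
    return pieces[0..n];
}

test "usage: lowest-rank pair merges first" {
    var ranks = std.StringHashMap(u32).init(std.testing.allocator);
    defer ranks.deinit();
    try ranks.put("l l", 0);
    try ranks.put("h e", 1);
    try ranks.put("he ll", 2);

    var scratch: [16][]const u8 = undefined;
    const out = try bpeMerge(&ranks, "hello", &scratch);
    try std.testing.expectEqual(@as(usize, 2), out.len);
    try std.testing.expectEqualStrings("hell", out[0]);
    try std.testing.expectEqualStrings("o", out[1]);
}
```

With a hash-based rank table, each candidate pair costs one lookup rather than a scan over the full merge list.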
function
encode
pub fn encode(self: *const Tokenizer, text: []const u8) ![]u32
Encode UTF-8 text into a sequence of token IDs.
Applies the appropriate pretokenizer (Gemma-4 chunk splitter, GPT-2 word splitter, or legacy no-split) and then BPE or SentencePiece merges. The returned slice is owned by the caller; free it with `freeEncoded`.
function
freeEncoded
pub fn freeEncoded(self: *const Tokenizer, tokens: []u32) void
Release a token slice returned by `encode`.
function
encodePrompt
pub fn encodePrompt(self: *const Tokenizer, text: []const u8, allocator: std.mem.Allocator) ![]u32
Encode a prompt and prepend BOS when the model expects it.
The returned slice is allocated with `allocator`, so server routes can use a per-request allocator while the tokenizer keeps owning its internal scratch buffers. Special tokens (e.g. `<|start_header_id|>`) are resolved via `token_to_id` rather than being BPE-encoded character by character.
function
eosId
pub fn eosId(self: *const Tokenizer) u32
Return the model's end-of-sequence token ID as loaded from GGUF metadata.
function
isEndOfGeneration
pub fn isEndOfGeneration(self: *const Tokenizer, token: u32) bool
Whether a sampled token ends the current generation turn.
Always terminates on the configured EOS. Gemma 4 additionally uses `<eos>=1` and `</s>=212` alongside the primary `<turn|>=106` EOS. Those IDs are treated as end-of-generation too when the chat template is Gemma, but not for other tokenizers (for example, Qwen token 1 is a plain `"` character).
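The rule described above can be sketched as a small predicate. This is an illustration only: `EogRule` and `TemplateKind` are hypothetical names, and the real tokenizer may detect the Gemma template differently. The IDs 1, 212, and 106 come from the description above.

```zig
const std = @import("std");

// Hypothetical stand-in for however the tokenizer tracks its chat template.
const TemplateKind = enum { gemma, other };

const EogRule = struct {
    eos_id: u32,
    template: TemplateKind,

    fn isEndOfGeneration(self: EogRule, token: u32) bool {
        // The configured EOS always ends the turn.
        if (token == self.eos_id) return true;
        // Gemma only: `<eos>` (1) and `</s>` (212) also end the turn.
        if (self.template == .gemma) return token == 1 or token == 212;
        return false;
    }
};

test "usage: token 1 ends generation only under the Gemma template" {
    const gemma = EogRule{ .eos_id = 106, .template = .gemma };
    // eos_id = 2 is an arbitrary value chosen for this example.
    const other = EogRule{ .eos_id = 2, .template = .other };
    try std.testing.expect(gemma.isEndOfGeneration(1));
    try std.testing.expect(!other.isEndOfGeneration(1));
}
```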
function
bosId
pub fn bosId(self: *const Tokenizer) u32
Return the model's beginning-of-sequence token ID.
Falls back to `eos_id` when no BOS token was found in GGUF metadata.
function
shouldPrependBos
pub fn shouldPrependBos(self: *const Tokenizer) bool
Return true when prompt construction should prepend a BOS token.
Returns true only when `prepend_bos` (from GGUF metadata) is set and a valid `bos_id` exists; returns false if either condition is not met.
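The interaction between the `bosId` fallback and `shouldPrependBos` can be sketched as follows. `SpecialTokens` is a hypothetical struct, and modelling a missing BOS as `null` is an assumption; the real tokenizer may use a sentinel value instead.

```zig
const std = @import("std");

// Hypothetical container for the special-token fields read from GGUF metadata.
const SpecialTokens = struct {
    bos_id: ?u32, // null when the metadata had no BOS token (assumed representation)
    eos_id: u32,
    prepend_bos: bool,

    /// Falls back to EOS when no BOS token was found.
    fn bosId(self: SpecialTokens) u32 {
        return self.bos_id orelse self.eos_id;
    }

    /// True only when the flag is set and a real BOS token exists.
    fn shouldPrependBos(self: SpecialTokens) bool {
        return self.prepend_bos and self.bos_id != null;
    }
};

test "usage: fallback ID is never prepended" {
    const st = SpecialTokens{ .bos_id = null, .eos_id = 2, .prepend_bos = true };
    try std.testing.expectEqual(@as(u32, 2), st.bosId());
    try std.testing.expect(!st.shouldPrependBos());
}
```

Under this reading, a model without a BOS token still gets a usable ID from `bosId`, but that fallback is never prepended to a prompt.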
function
preparePromptTokens
pub fn preparePromptTokens(self: *const Tokenizer, raw_tokens: []const u32) ![]u32
Wrap a raw token sequence with BOS/EOS according to the GGUF metadata flags.
Prepends BOS when `shouldPrependBos()` is true and appends EOS when `add_eos_token` is set. Allocates the result with the tokenizer's own allocator; caller is responsible for freeing the returned slice.
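As a rough sketch of the wrapping step, the hypothetical helper below takes the BOS and EOS decisions as optional arguments instead of reading tokenizer state. `wrapPromptTokens` and its parameters are illustrative and not part of the documented API.

```zig
const std = @import("std");

/// Hypothetical helper: returns a newly allocated copy of `raw`, with `bos`
/// prepended and `eos` appended when they are non-null.
/// The caller frees the result with the same allocator.
fn wrapPromptTokens(
    allocator: std.mem.Allocator,
    raw: []const u32,
    bos: ?u32,
    eos: ?u32,
) ![]u32 {
    var extra: usize = 0;
    if (bos != null) extra += 1;
    if (eos != null) extra += 1;

    const out = try allocator.alloc(u32, raw.len + extra);
    var i: usize = 0;
    if (bos) |id| {
        out[i] = id;
        i += 1;
    }
    @memcpy(out[i .. i + raw.len], raw);
    i += raw.len;
    if (eos) |id| out[i] = id;
    return out;
}

test "usage: wrap with both BOS and EOS" {
    const alloc = std.testing.allocator;
    const out = try wrapPromptTokens(alloc, &[_]u32{ 10, 11 }, 2, 3);
    defer alloc.free(out);
    try std.testing.expectEqualSlices(u32, &[_]u32{ 2, 10, 11, 3 }, out);
}
```

Sizing the output once up front keeps the function to a single allocation, which makes the ownership rule (caller frees) easy to follow.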
function
decodeToken
pub fn decodeToken(self: *const Tokenizer, token_id: u32, buf: []u8) []const u8
Decode a single token ID to UTF-8 text, reversing the GPT-2 byte-to-unicode mapping.
Handles SentencePiece word-boundary markers (▁ → space) and passes through non-ASCII codepoints (CJK, emoji) verbatim. Returns an empty string for out-of-range token IDs.
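The word-boundary handling can be illustrated with a small standalone sketch. `replaceWordBoundary` is a hypothetical helper that covers only the ▁ → space step, not the GPT-2 byte-to-unicode reversal. It copies every other byte unchanged, so multi-byte sequences pass through intact, and it simply stops when `buf` is full, which is a simplification.

```zig
const std = @import("std");

/// Hypothetical helper: copies `piece` into `buf`, replacing each
/// SentencePiece marker U+2581 ("▁") with an ASCII space.
fn replaceWordBoundary(piece: []const u8, buf: []u8) []const u8 {
    const marker = "\u{2581}"; // three bytes in UTF-8: E2 96 81
    var src: usize = 0;
    var dst: usize = 0;
    while (src < piece.len and dst < buf.len) {
        if (std.mem.startsWith(u8, piece[src..], marker)) {
            buf[dst] = ' ';
            src += marker.len;
        } else {
            // Any other byte, including parts of CJK or emoji sequences, is copied as-is.
            buf[dst] = piece[src];
            src += 1;
        }
        dst += 1;
    }
    return buf[0..dst];
}

test "usage: marker becomes a leading space" {
    var buf: [32]u8 = undefined;
    try std.testing.expectEqualStrings(" Hello", replaceWordBoundary("\u{2581}Hello", &buf));
}
```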
struct
ChatTemplateOptions
pub const ChatTemplateOptions = struct
Options controlling chat template rendering behavior.
function
supportsThinkingToggle
pub fn supportsThinkingToggle(self: *const Tokenizer) bool
Return whether the model's chat template supports an explicit thinking toggle.
function
applyChatTemplate
pub fn applyChatTemplate(self: *const Tokenizer, roles: []const []const u8, contents: []const []const u8, buf: []u8) ![]const u8
Format a conversation into a model prompt using the embedded chat template.
Convenience wrapper around `applyChatTemplateWithOptions` with default options.
function
applyChatTemplateWithOptions
pub fn applyChatTemplateWithOptions(self: *const Tokenizer, roles: []const []const u8, contents: []const []const u8, options: ChatTemplateOptions, buf: []u8) ![]const u8
Format a conversation into a model prompt with fine-grained rendering control.
Dispatches to the appropriate template renderer (ChatML, Llama-3, Gemma, OpenAI-MoE, or generic) based on `detectTemplateKind()`. Supports optional thinking tags, tool definitions, forced tool-call prefills, and generation-prompt suffixes.
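For a sense of what one of these renderers produces, here is a minimal sketch of ChatML-style formatting into a caller-provided buffer. It mirrors the `roles`/`contents`/`buf` shape of the signature above. `renderChatMl` is a hypothetical function that uses the commonly seen `<|im_start|>` / `<|im_end|>` markers and always appends an assistant generation prompt. Whether the library's ChatML renderer emits exactly this text is an assumption, and thinking tags, tools, and prefills are not covered.

```zig
const std = @import("std");

/// Hypothetical ChatML-style renderer: writes each turn into `buf`, then
/// appends the assistant header as a generation prompt.
fn renderChatMl(
    roles: []const []const u8,
    contents: []const []const u8,
    buf: []u8,
) ![]const u8 {
    if (roles.len != contents.len) return error.LengthMismatch;
    var used: usize = 0;
    for (roles, contents) |role, content| {
        const part = try std.fmt.bufPrint(
            buf[used..],
            "<|im_start|>{s}\n{s}<|im_end|>\n",
            .{ role, content },
        );
        used += part.len;
    }
    const tail = try std.fmt.bufPrint(buf[used..], "<|im_start|>assistant\n", .{});
    used += tail.len;
    return buf[0..used];
}

test "usage: two turns plus generation prompt" {
    const roles = [_][]const u8{ "system", "user" };
    const contents = [_][]const u8{ "Be brief.", "Hi" };
    var buf: [256]u8 = undefined;
    const prompt = try renderChatMl(&roles, &contents, &buf);
    try std.testing.expectEqualStrings(
        "<|im_start|>system\nBe brief.<|im_end|>\n<|im_start|>user\nHi<|im_end|>\n<|im_start|>assistant\n",
        prompt,
    );
}
```

Because the output is written into `buf`, an undersized buffer surfaces as `error.NoSpaceLeft` from `std.fmt.bufPrint` rather than as a truncated prompt.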