Last updated: 2026-06-12

Tokenization

Tokenizer


Native BPE tokenizer that reads vocabulary and merge rules from GGUF metadata.

Implements byte-pair encoding directly in Zig using the tokenizer tables embedded in GGUF model files, eliminating external tokenizer dependencies.

Source: src/model/tokenizer.zig (15 exports)


struct

Tokenizer

pub const Tokenizer = struct

A native BPE tokenizer backed by vocabulary and merge tables from GGUF metadata.

src/model/tokenizer.zig:21

function

initFromGGUF

pub fn initFromGGUF(gf: *const gguf.GGUFFile, allocator: std.mem.Allocator) !Tokenizer

Initialize a Tokenizer from an open GGUF file.

Reads `tokenizer.ggml.tokens`, `tokenizer.ggml.merges`, token scores, and special token IDs (BOS, EOS, add-BOS flag) from GGUF metadata. It also builds the `merge_ranks` lookup table, so `encode` can look up the priority of each candidate merge directly instead of searching the merge list.

Parameters

gf
Parsed GGUF file whose metadata contains the tokenizer tables.
allocator
Used for all heap allocations the Tokenizer owns; release them with `deinit`.

Returns

An initialized Tokenizer, or an error if required metadata is absent.

src/model/tokenizer.zig:72
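The `merge_ranks` table assigns each merge rule a priority. The sketch below shows one way to build it. It assumes that `tokenizer.ggml.merges` stores each rule as a space-separated pair such as `"h e"` and that a rule's rank is its index in that array. `buildMergeRanks` and `mergeRank` are hypothetical helpers and are not part of this module's API.

```zig
const std = @import("std");

/// Hypothetical helper: map each "left right" merge rule to its rank
/// (its index in `merges`; a lower rank is applied earlier).
/// Keys borrow from `merges`, so that slice must outlive the map.
pub fn buildMergeRanks(allocator: std.mem.Allocator, merges: []const []const u8) !std.StringHashMap(u32) {
    var ranks = std.StringHashMap(u32).init(allocator);
    errdefer ranks.deinit();
    for (merges, 0..) |rule, i| {
        // Keep the first (highest-priority) occurrence if a rule repeats.
        const gop = try ranks.getOrPut(rule);
        if (!gop.found_existing) gop.value_ptr.* = @intCast(i);
    }
    return ranks;
}

/// Hypothetical lookup: rank of merging `left` with `right`, or null if no rule exists.
pub fn mergeRank(ranks: *const std.StringHashMap(u32), left: []const u8, right: []const u8) ?u32 {
    var key_buf: [256]u8 = undefined;
    // Rebuild the "left right" key; pairs too long for the buffer have no rank.
    const key = std.fmt.bufPrint(&key_buf, "{s} {s}", .{ left, right }) catch return null;
    return ranks.get(key);
}
```

With this design, each rank lookup is a single hash probe.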

function

encode

pub fn encode(self: *const Tokenizer, text: []const u8) ![]u32

Encode UTF-8 text into a sequence of token IDs.

Applies the appropriate pretokenizer (Gemma-4 chunk splitter, GPT-2 word splitter, or legacy no-split) and then BPE or SentencePiece merges. The returned slice is owned by the caller; free it with `freeEncoded`.

Parameters

text
UTF-8 input to tokenize; an empty string returns an empty slice.

Returns

Heap-allocated token ID sequence, or an error on allocation failure.

src/model/tokenizer.zig:464
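The following illustrates only the BPE merge stage; pretokenization, SentencePiece scoring, and ID lookup are omitted. The sketch repeatedly merges the adjacent pair with the lowest rank until no rule applies. `Merge`, `rankOf`, and `bpeWord` are hypothetical names, and the linear rank search stands in for the real `merge_ranks` table.

```zig
const std = @import("std");

/// Hypothetical merge rule: merging `left` and `right` yields `left ++ right`.
pub const Merge = struct { left: []const u8, right: []const u8 };

/// Rank of a pair = index of its rule (linear search stands in for merge_ranks).
fn rankOf(merges: []const Merge, left: []const u8, right: []const u8) ?usize {
    for (merges, 0..) |m, rank| {
        if (std.mem.eql(u8, m.left, left) and std.mem.eql(u8, m.right, right)) return rank;
    }
    return null;
}

/// Greedy BPE over one pretokenized word. Pieces are byte ranges of `word`,
/// stored as end offsets in `ends` (which must hold at least `word.len`
/// entries). Returns the number of pieces left after all merges.
pub fn bpeWord(word: []const u8, merges: []const Merge, ends: []usize) usize {
    // Start with one piece per byte.
    var n: usize = word.len;
    for (0..n) |i| ends[i] = i + 1;
    while (n > 1) {
        var best_rank: ?usize = null;
        var best_i: usize = 0;
        var start: usize = 0;
        for (0..n - 1) |i| {
            const left = word[start..ends[i]];
            const right = word[ends[i]..ends[i + 1]];
            if (rankOf(merges, left, right)) |r| {
                if (best_rank == null or r < best_rank.?) {
                    best_rank = r;
                    best_i = i;
                }
            }
            start = ends[i];
        }
        if (best_rank == null) break; // no applicable merge left
        // Merge pieces best_i and best_i + 1 by dropping the boundary between them.
        std.mem.copyForwards(usize, ends[best_i .. n - 1], ends[best_i + 1 .. n]);
        n -= 1;
    }
    return n;
}
```

Because merged pieces are always contiguous, widening a byte range replaces string concatenation, and the loop never allocates.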

function

freeEncoded

pub fn freeEncoded(self: *const Tokenizer, tokens: []u32) void

Release a token slice returned by `encode`.

src/model/tokenizer.zig:507

function

encodePrompt

pub fn encodePrompt(self: *const Tokenizer, text: []const u8, allocator: std.mem.Allocator) ![]u32

Encode a prompt and prepend BOS when the model expects it.

The returned slice is allocated with `allocator`, so server routes can use a per-request allocator while the tokenizer keeps owning its internal scratch buffers. Special tokens (e.g. `<|start_header_id|>`) are resolved via `token_to_id` rather than being BPE-encoded character by character.

src/model/tokenizer.zig:516
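Special-token handling means splitting the prompt before BPE runs. The sketch below shows one way to do that: a special-token string becomes a single pre-resolved ID, and the text between special tokens is left for BPE. `Special`, `Segment`, and `splitSpecials` are hypothetical, and the IDs would come from the vocabulary (`token_to_id`) in the real tokenizer.

```zig
const std = @import("std");

/// Hypothetical special-token entry (text plus its vocabulary ID).
pub const Special = struct { text: []const u8, id: u32 };

pub const Segment = union(enum) {
    special: u32, // already-resolved token ID
    plain: []const u8, // text that still needs BPE encoding
};

/// Split `text` so each special-token string becomes one ID instead of being
/// BPE-encoded character by character. The first listed match wins, so list
/// longer tokens first. Returns the number of segments written to `out`.
pub fn splitSpecials(text: []const u8, specials: []const Special, out: []Segment) usize {
    var count: usize = 0;
    var plain_start: usize = 0;
    var i: usize = 0;
    while (i < text.len) {
        const match: ?Special = for (specials) |s| {
            if (s.text.len > 0 and std.mem.startsWith(u8, text[i..], s.text)) break s;
        } else null;
        if (match) |s| {
            // Flush pending plain text, then emit the special token's ID.
            if (i > plain_start) {
                out[count] = .{ .plain = text[plain_start..i] };
                count += 1;
            }
            out[count] = .{ .special = s.id };
            count += 1;
            i += s.text.len;
            plain_start = i;
        } else {
            i += 1;
        }
    }
    if (plain_start < text.len) {
        out[count] = .{ .plain = text[plain_start..] };
        count += 1;
    }
    return count;
}
```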

function

eosId

pub fn eosId(self: *const Tokenizer) u32

Return the model's end-of-sequence token ID as loaded from GGUF metadata.

src/model/tokenizer.zig:752

function

isEndOfGeneration

pub fn isEndOfGeneration(self: *const Tokenizer, token: u32) bool

Whether a sampled token ends the current generation turn.

Always returns true for the configured EOS. Gemma 4 also emits `<eos>` (ID 1) and `</s>` (ID 212) alongside its primary EOS `<turn|>` (ID 106). These are treated as end-of-generation only when the chat template is Gemma, because other vocabularies reuse those IDs (in Qwen, token 1 is a plain `"` character).

src/model/tokenizer.zig:762
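The rule above fits in a few lines. In this sketch, `TemplateKind` is a hypothetical enum mirroring the template families this page names, and passing the EOS ID and template kind as arguments is an assumption made to keep the example self-contained.

```zig
const std = @import("std");

/// Hypothetical stand-in for the template families the tokenizer distinguishes.
pub const TemplateKind = enum { chatml, llama3, gemma, openai_moe, generic };

/// The configured EOS always ends a turn; Gemma's extra `<eos>` (1) and
/// `</s>` (212) count only under a Gemma template, since other vocabularies
/// reuse those IDs for ordinary tokens.
pub fn isEndOfGeneration(kind: TemplateKind, eos_id: u32, token: u32) bool {
    if (token == eos_id) return true;
    return kind == .gemma and (token == 1 or token == 212);
}
```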

function

bosId

pub fn bosId(self: *const Tokenizer) u32

Return the model's beginning-of-sequence token ID.

Falls back to `eos_id` when no BOS token was found in GGUF metadata.

src/model/tokenizer.zig:772

function

shouldPrependBos

pub fn shouldPrependBos(self: *const Tokenizer) bool

Return true when prompt construction should prepend a BOS token.

Returns true only when the `prepend_bos` flag from GGUF metadata is set and a valid `bos_id` exists; otherwise returns false.

src/model/tokenizer.zig:779
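The BOS fallback in `bosId` and the two-condition check in `shouldPrependBos` can be modelled as follows. This assumes a missing BOS is represented as an optional (`?u32`). `SpecialIds` is a hypothetical struct, not the Tokenizer's actual field layout.

```zig
const std = @import("std");

/// Hypothetical subset of the tokenizer's special-token state; a null
/// `bos_id` models "no BOS token found in GGUF metadata".
pub const SpecialIds = struct {
    bos_id: ?u32,
    eos_id: u32,
    prepend_bos: bool,

    /// BOS ID, falling back to EOS when metadata had no BOS token.
    pub fn bosId(self: SpecialIds) u32 {
        return self.bos_id orelse self.eos_id;
    }

    /// Prepend BOS only when the flag is set and a real BOS ID exists.
    pub fn shouldPrependBos(self: SpecialIds) bool {
        return self.prepend_bos and self.bos_id != null;
    }
};
```

Keeping the two checks separate means the EOS fallback can never be silently prepended as if it were a BOS token.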

function

preparePromptTokens

pub fn preparePromptTokens(self: *const Tokenizer, raw_tokens: []const u32) ![]u32

Wrap a raw token sequence with BOS/EOS according to the GGUF metadata flags.

Prepends BOS when `shouldPrependBos()` is true and appends EOS when `add_eos_token` is set. The result is allocated with the tokenizer's own allocator, and the caller must free the returned slice.

Parameters

raw_tokens
The BPE-encoded token IDs to wrap.

Returns

A newly allocated slice with optional BOS prefix and EOS suffix.

src/model/tokenizer.zig:789
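The wrapping step is a single allocation plus a copy. In this sketch, `wrapTokens` is a hypothetical free function that takes the allocator explicitly and models the two metadata flags as optional IDs. The real method uses the tokenizer's own allocator and flags instead.

```zig
const std = @import("std");

/// Copy `raw` into a new slice with an optional BOS prefix and EOS suffix.
/// Hypothetical helper: `bos`/`eos` are null when the corresponding flag is
/// unset. The caller owns the result and frees it with `allocator`.
pub fn wrapTokens(allocator: std.mem.Allocator, raw: []const u32, bos: ?u32, eos: ?u32) ![]u32 {
    var extra: usize = 0;
    if (bos != null) extra += 1;
    if (eos != null) extra += 1;
    const out = try allocator.alloc(u32, raw.len + extra);
    var i: usize = 0;
    if (bos) |b| {
        out[i] = b;
        i += 1;
    }
    @memcpy(out[i .. i + raw.len], raw);
    i += raw.len;
    if (eos) |e| out[i] = e;
    return out;
}
```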

function

decodeToken

pub fn decodeToken(self: *const Tokenizer, token_id: u32, buf: []u8) []const u8

Decode a single token ID to UTF-8 text, reversing the GPT-2 byte-to-unicode mapping.

Handles SentencePiece word-boundary markers (▁ → space) and passes through non-ASCII codepoints (CJK, emoji) verbatim. Returns an empty string for out-of-range token IDs.

Parameters

token_id
Vocabulary index to decode.
buf
Caller-supplied output buffer; result is a slice into this buffer.

Returns

UTF-8 bytes for the token, or an empty slice if the ID is out of range.

src/model/tokenizer.zig:815
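Reversing the byte-to-unicode mapping works as follows, assuming the standard GPT-2 table. In that table, the printable bytes `!`..`~`, `0xA1`..`0xAC`, and `0xAE`..`0xFF` map to themselves, and the remaining 68 bytes are shifted to U+0100 onward in ascending byte order, so a space becomes `Ġ` (U+0120). For brevity, the sketch mixes the byte-level rule with the SentencePiece `▁` rule; a real decoder would pick one based on the vocabulary type. `decodePiece` and its helpers are hypothetical, and `buf` must be large enough for the output.

```zig
const std = @import("std");

/// True for bytes that GPT-2's byte-to-unicode table maps to themselves.
fn isDirectByte(b: u8) bool {
    return (b >= '!' and b <= '~') or (b >= 0xA1 and b <= 0xAC) or b >= 0xAE;
}

/// Inverse table lookup for one codepoint, or null if it is not a table entry.
fn byteForCodepoint(cp: u21) ?u8 {
    if (cp < 256) {
        const b: u8 = @intCast(cp);
        return if (isDirectByte(b)) b else null;
    }
    // Non-direct bytes were assigned U+0100, U+0101, ... in byte order.
    var n: u21 = 0;
    for (0..256) |i| {
        const b: u8 = @intCast(i);
        if (isDirectByte(b)) continue;
        if (256 + n == cp) return b;
        n += 1;
    }
    return null;
}

/// Decode one token string into raw bytes in `buf`: "▁" (U+2581) becomes a
/// space, table codepoints become their original bytes, and any other
/// codepoint (e.g. CJK, emoji) is copied through as UTF-8.
pub fn decodePiece(piece: []const u8, buf: []u8) ![]const u8 {
    var len: usize = 0;
    var it = (try std.unicode.Utf8View.init(piece)).iterator();
    while (it.nextCodepoint()) |cp| {
        if (cp == 0x2581) {
            buf[len] = ' ';
            len += 1;
        } else if (byteForCodepoint(cp)) |b| {
            buf[len] = b;
            len += 1;
        } else {
            len += try std.unicode.utf8Encode(cp, buf[len..]);
        }
    }
    return buf[0..len];
}
```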

struct

ChatTemplateOptions

pub const ChatTemplateOptions = struct

Options controlling chat template rendering behavior.

src/model/tokenizer.zig:871

function

supportsThinkingToggle

pub fn supportsThinkingToggle(self: *const Tokenizer) bool

Return whether the model's chat template supports an explicit thinking toggle.

src/model/tokenizer.zig:929

function

applyChatTemplate

pub fn applyChatTemplate(self: *const Tokenizer, roles: []const []const u8, contents: []const []const u8, buf: []u8) ![]const u8

Format a conversation into a model prompt using the embedded chat template.

Convenience wrapper around `applyChatTemplateWithOptions` with default options.

Parameters

roles
Parallel slice of role strings (e.g. "user", "assistant", "system").
contents
Parallel slice of message body strings.
buf
Caller-supplied output buffer that receives the formatted prompt.

Returns

A slice of `buf` containing the rendered prompt.

src/model/tokenizer.zig:944
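Rendering into a caller-supplied buffer with parallel `roles`/`contents` slices can look like the example below, which uses the ChatML layout (`<|im_start|>role\ncontent<|im_end|>\n`) as one concrete template. `renderChatML` and its `add_generation_prompt` parameter are hypothetical and only illustrate the buffer-writing pattern. They do not reproduce this module's renderer or its defaults.

```zig
const std = @import("std");

/// Render a conversation as ChatML into `buf`. `roles` and `contents` are
/// parallel slices; `add_generation_prompt` appends an open assistant turn.
/// Fails with error.NoSpaceLeft if `buf` is too small.
pub fn renderChatML(
    roles: []const []const u8,
    contents: []const []const u8,
    add_generation_prompt: bool,
    buf: []u8,
) ![]const u8 {
    if (roles.len != contents.len) return error.MismatchedMessages;
    var len: usize = 0;
    for (roles, contents) |role, content| {
        const part = try std.fmt.bufPrint(buf[len..], "<|im_start|>{s}\n{s}<|im_end|>\n", .{ role, content });
        len += part.len;
    }
    if (add_generation_prompt) {
        const part = try std.fmt.bufPrint(buf[len..], "<|im_start|>assistant\n", .{});
        len += part.len;
    }
    return buf[0..len];
}
```

Writing into a fixed buffer keeps rendering allocation-free; the trade-off is that callers must size `buf` for the longest expected prompt.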

function

applyChatTemplateWithOptions

pub fn applyChatTemplateWithOptions(self: *const Tokenizer, roles: []const []const u8, contents: []const []const u8, options: ChatTemplateOptions, buf: []u8) ![]const u8

Format a conversation into a model prompt with fine-grained rendering control.

Dispatches to the appropriate template renderer (ChatML, Llama-3, Gemma, OpenAI-MoE, or generic) based on `detectTemplateKind()`. Supports optional thinking tags, tool definitions, forced tool-call prefills, and generation-prompt suffixes.

Parameters

roles
Parallel slice of role strings (e.g. "user", "assistant", "system").
contents
Parallel slice of message body strings.
options
Rendering options; see `ChatTemplateOptions` for details.
buf
Caller-supplied output buffer that receives the formatted prompt.

Returns

A slice of `buf` containing the rendered prompt.

src/model/tokenizer.zig:957
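One plausible way to implement the dispatch is to search the raw template text (for example, the GGUF `tokenizer.chat_template` string) for a characteristic marker. The sketch below does this. `TemplateKind` and `detectTemplateKind` are hypothetical, and the chosen markers are assumptions rather than the rules `detectTemplateKind()` actually uses.

```zig
const std = @import("std");

/// Hypothetical template families mirroring the renderers listed above.
pub const TemplateKind = enum { chatml, llama3, gemma, openai_moe, generic };

fn contains(hay: []const u8, needle: []const u8) bool {
    return std.mem.indexOf(u8, hay, needle) != null;
}

/// Classify a chat template by searching for assumed marker strings;
/// anything unrecognized falls back to the generic renderer.
pub fn detectTemplateKind(template: []const u8) TemplateKind {
    if (contains(template, "<|start_header_id|>")) return .llama3;
    if (contains(template, "<|im_start|>")) return .chatml;
    if (contains(template, "<start_of_turn>")) return .gemma;
    if (contains(template, "<|channel|>")) return .openai_moe;
    return .generic;
}
```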