Last updated: 2026-06-12
Tool Calling
Tool Format
Pluggable tool-calling format dispatch for chat completions.
`chatmlToolFormat()` handles Qwen3-family models. `NoopToolFormat` is the silent fallback for any other template kind.
struct
ToolDefinition
pub const ToolDefinition = struct One tool definition extracted from the request's `tools` array.
struct
ToolCall
pub const ToolCall = struct One parsed tool call extracted from assistant output.
struct
ParsedAssistantOutput
pub const ParsedAssistantOutput = struct Result of parsing a non-streaming assistant message: the prose the user should see and any tool invocations the model emitted.
Returned by `ToolFormat.parseAssistantToolCalls` and consumed by the chat completions response builder when deciding between a `content` reply and a `tool_calls` reply.
enum
FeedResult
pub const FeedResult = enum Disposition of a streaming chunk after the `StreamingDetector` has inspected it.
The chat completions streamer uses this to decide whether to forward bytes as a `content` delta, swallow them for further inspection, or flush a finished tool call.
struct
StreamingDetector
pub const StreamingDetector = struct Streaming-mode tool-call detector.
The chat completions handler feeds each decoded chunk through this state machine; the detector buffers bytes that might be the start of a `<tool_call>` tag and flushes them either as content deltas or as parsed tool calls. One detector per streaming request.
Methods (5)
method
StreamingDetector.deinit
pub fn deinit(self: *StreamingDetector) void Free the internal buffers and any pending tool-call payloads.
Safe to call once at end-of-stream regardless of how many `feed` calls happened.
method
StreamingDetector.feed
pub fn feed(self: *StreamingDetector, chunk: []const u8) !FeedResult Push the next decoded chunk into the detector and return how the caller should react.
The detector retains ownership of `chunk`'s contribution to the internal buffer; callers should drain via `takeContentDelta` and `takePendingToolCall` between feeds. The result indicates whether content is ready to forward, a tool call is ready to consume, or the bytes are still being held.
method
StreamingDetector.takeContentDelta
pub fn takeContentDelta(self: *StreamingDetector) []const u8 Drain pending content bytes.
The returned slice aliases the detector's internal buffer, so consume it before the next `feed` call (subsequent feeds overwrite the same allocation). The detector retains ownership, and the buffer is freed by `deinit`.
method
StreamingDetector.takePendingToolCall
pub fn takePendingToolCall(self: *StreamingDetector) ?ToolCall Drain one fully parsed tool call, FIFO order.
Caller takes ownership of the returned `id`/`name`/`arguments_json` slices and must free them with the same allocator that initialized this detector. Returns null when the queue is empty.
method
StreamingDetector.finalize
pub fn finalize(self: *StreamingDetector) []const u8 Flush all remaining buffered bytes as content at end of stream.
Any bytes still held in the speculative-match buffer are appended to the pending content buffer, and the combined slice is returned. The returned slice aliases the internal buffer; consume before calling deinit.
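The hold-back behaviour described for `feed` and `finalize` can be pictured with a small sketch. This is not the library's implementation: `heldSuffixLen`, `splitChunk`, and `Split` are hypothetical names, and the sketch only covers the decision of which trailing bytes might be the start of a `<tool_call>` tag. Finding a complete tag and parsing its payload are left out.

```zig
const std = @import("std");

const open_tag = "<tool_call>";

/// Length of the longest suffix of `bytes` that is also a proper prefix of
/// `open_tag`. Those trailing bytes cannot be forwarded yet, because the
/// next chunk might complete the tag.
fn heldSuffixLen(bytes: []const u8) usize {
    var n: usize = @min(bytes.len, open_tag.len - 1);
    while (n > 0) : (n -= 1) {
        if (std.mem.eql(u8, bytes[bytes.len - n ..], open_tag[0..n])) return n;
    }
    return 0;
}

/// Hypothetical result type: bytes safe to emit as content now, and bytes
/// to keep in the speculative-match buffer.
const Split = struct { emit: []const u8, hold: []const u8 };

fn splitChunk(bytes: []const u8) Split {
    const held = heldSuffixLen(bytes);
    return .{
        .emit = bytes[0 .. bytes.len - held],
        .hold = bytes[bytes.len - held ..],
    };
}
```

With this sketch, `splitChunk("hello <to")` emits `hello ` and holds `<to`. If the stream ends while bytes are still held, they turn out to be ordinary content, which matches what `finalize` is documented to flush.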
struct
ToolFormat
pub const ToolFormat = struct Vtable interface to a per-template tool-call format.
Lets the chat completions path render tool definitions, parse tool calls out of model output, and create matching streaming detectors without knowing whether the active template is ChatML, llama3, or anything else. Concrete implementations are minted via `forTemplate` or the per-format factories `chatmlToolFormat` / `noopToolFormat`.
Methods (4)
method
ToolFormat.renderToolDefinitions
pub fn renderToolDefinitions( self: ToolFormat, tools: []const ToolDefinition, buf: *std.ArrayList(u8), allocator: std.mem.Allocator, ) anyerror!void Append the format-specific rendering of `tools` (e.g. Qwen's `# Tools\n<tools>...</tools>` block) to `buf`, ready to be spliced into the system message.
Noop formats append nothing.
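As a rough illustration of what rendering a tools block involves, the sketch below writes a `# Tools` / `<tools>` wrapper into a caller-supplied buffer. Everything here is an assumption made for illustration: the `Tool` struct and its fields, the `renderTools` name, the fixed output buffer (the real method appends to a `std.ArrayList(u8)`), and the per-tool JSON layout, which is simplified and not the exact Qwen schema. No JSON escaping is performed.

```zig
const std = @import("std");

// Hypothetical stand-in; the real ToolDefinition's fields are not shown on this page.
const Tool = struct { name: []const u8, schema_json: []const u8 };

/// Write a simplified tools block into `out` and return the written slice.
/// An empty tool list renders nothing, mirroring the no-op behaviour.
fn renderTools(tools: []const Tool, out: []u8) ![]const u8 {
    if (tools.len == 0) return out[0..0];
    var used: usize = 0;
    used += (try std.fmt.bufPrint(out[used..], "# Tools\n<tools>\n", .{})).len;
    for (tools) |t| {
        // `{{` and `}}` are literal braces in Zig format strings.
        used += (try std.fmt.bufPrint(
            out[used..],
            "{{\"name\": \"{s}\", \"parameters\": {s}}}\n",
            .{ t.name, t.schema_json },
        )).len;
    }
    used += (try std.fmt.bufPrint(out[used..], "</tools>", .{})).len;
    return out[0..used];
}
```

If `out` is too small, `std.fmt.bufPrint` returns an error rather than truncating, so the caller can retry with a larger buffer.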
method
ToolFormat.renderToolResultMessage
pub fn renderToolResultMessage( self: ToolFormat, tool_call_id: []const u8, content: []const u8, buf: *std.ArrayList(u8), allocator: std.mem.Allocator, ) anyerror!void Append the format-specific tool-result message (e.g. ChatML's `<tool_response>...</tool_response>`) to `buf`.
Used when replaying `role: "tool"` history entries into the prompt.
method
ToolFormat.parseAssistantToolCalls
pub fn parseAssistantToolCalls( self: ToolFormat, model_output: []const u8, allocator: std.mem.Allocator, ) anyerror!ParsedAssistantOutput Split a complete (non-streaming) assistant response into prose plus a list of structured tool calls.
Allocates the returned slices with the caller's allocator; ownership transfers to the caller.
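The split between prose and tool calls can be sketched as follows, assuming the Qwen3-style dialect wraps each call's JSON payload in `<tool_call>` and `</tool_call>` tags. The `Parsed` struct and `splitFirstToolCall` are hypothetical; unlike the real method, this sketch handles only the first call, returns slices into the input instead of allocating, and leaves the JSON payload unparsed.

```zig
const std = @import("std");

// Hypothetical result type; the real ParsedAssistantOutput's fields are not shown on this page.
const Parsed = struct {
    prose: []const u8,
    call_json: ?[]const u8,
};

/// Return the text before the first `<tool_call>` block and the raw payload
/// inside it. Output with no complete block is treated entirely as prose.
fn splitFirstToolCall(output: []const u8) Parsed {
    const open = "<tool_call>";
    const close = "</tool_call>";
    const start = std.mem.indexOf(u8, output, open) orelse
        return .{ .prose = output, .call_json = null };
    const body_start = start + open.len;
    const body_len = std.mem.indexOf(u8, output[body_start..], close) orelse
        return .{ .prose = output, .call_json = null };
    const body = output[body_start .. body_start + body_len];
    return .{
        .prose = std.mem.trim(u8, output[0..start], " \n"),
        .call_json = std.mem.trim(u8, body, " \n"),
    };
}
```

Treating an unclosed tag as plain prose is a choice made for this sketch; the documented parser may handle that case differently.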
method
ToolFormat.newStreamingDetector
pub fn newStreamingDetector( self: ToolFormat, allocator: std.mem.Allocator, ) anyerror!*StreamingDetector Create a fresh streaming-mode detector for one chat completion stream.
Caller owns the returned pointer and must `deinit` + `destroy` it.
struct
NoopToolFormat
pub const NoopToolFormat = struct Silent fallback `ToolFormat` for templates that don't have a tool-call dialect wired in (everything except ChatML today).
Definition rendering is a no-op, parsing returns the model output verbatim with no tool calls, and the streaming detector treats everything as content.
function
noopToolFormat
pub fn noopToolFormat() ToolFormat Build a `ToolFormat` backed by `NoopToolFormat`.
Returned value is cheap to pass around and shares a single static instance.
function
forTemplate
pub fn forTemplate(template_kind: TemplateKind) ToolFormat Pick the right `ToolFormat` for a chat template family.
ChatML maps to the Qwen3-style `<tool_call>` dialect; everything else falls through to the no-op format (tools field accepted but silently ignored downstream).
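The dispatch described here follows the usual Zig vtable pattern, which the sketch below illustrates with stand-in types. `Kind`, `Format`, `pick`, and the single `supportsTools` entry are hypothetical; the real `TemplateKind` and `ToolFormat` carry more members, and their actual field names are not shown on this page.

```zig
const std = @import("std");

// Hypothetical stand-in for TemplateKind.
const Kind = enum { chatml, llama3, other };

// Hypothetical, cut-down stand-in for ToolFormat: a name plus a pointer to
// a table of function pointers.
const Format = struct {
    name: []const u8,
    vtable: *const VTable,

    const VTable = struct {
        supportsTools: *const fn () bool,
    };

    fn supportsTools(self: Format) bool {
        return self.vtable.supportsTools();
    }
};

fn chatmlSupports() bool {
    return true;
}

fn noopSupports() bool {
    return false;
}

// One static table per format, so a Format value is cheap to copy.
const chatml_vtable = Format.VTable{ .supportsTools = chatmlSupports };
const noop_vtable = Format.VTable{ .supportsTools = noopSupports };

/// ChatML gets the tool-aware format; every other kind falls through to no-op.
fn pick(kind: Kind) Format {
    return switch (kind) {
        .chatml => Format{ .name = "chatml", .vtable = &chatml_vtable },
        else => Format{ .name = "noop", .vtable = &noop_vtable },
    };
}
```

Because the fallback is silent, a caller that needs to know whether tools will actually be honoured has to ask the selected format, as `pick(.llama3).supportsTools()` does in this sketch.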
function
chatmlToolFormat
pub fn chatmlToolFormat() ToolFormat Build a `ToolFormat` that emits the Qwen3-family `<tool_call>...` dialect.
Used for any template detected as ChatML. Returned value is cheap to pass around and shares a single static instance.