Last updated: 2026-06-12

Tool Calling

Tool Format

Pluggable tool-calling format dispatch for chat completions.

`chatmlToolFormat()` handles Qwen3-family models. `NoopToolFormat` is the silent fallback for any other template kind.

10 exports · 9 methods · src/server/tool_format.zig

struct

ToolDefinition

pub const ToolDefinition = struct

One tool definition extracted from the request's `tools` array.

src/server/tool_format.zig:26

struct

ToolCall

pub const ToolCall = struct

One parsed tool call extracted from assistant output.

src/server/tool_format.zig:34

struct

ParsedAssistantOutput

pub const ParsedAssistantOutput = struct

Result of parsing a non-streaming assistant message: the prose the user should see and any tool invocations the model emitted.

Returned by `ToolFormat.parseAssistantToolCalls` and consumed by the chat completions response builder when deciding between a `content` reply and a `tool_calls` reply.

src/server/tool_format.zig:47

enum

FeedResult

pub const FeedResult = enum

Disposition of a streaming chunk after the `StreamingDetector` has inspected it.

The chat completions streamer uses this to decide whether to forward bytes as a `content` delta, swallow them for further inspection, or flush a finished tool call.

src/server/tool_format.zig:58

struct

StreamingDetector

pub const StreamingDetector = struct

Streaming-mode tool-call detector.

The chat completions handler feeds each decoded chunk through this state machine; the detector buffers bytes that might be the start of a `<tool_call>` tag and flushes them either as content deltas or as parsed tool calls. One detector per streaming request.

src/server/tool_format.zig:71
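The buffering rule described above can be illustrated in isolation. The sketch below is not the detector's actual code: `heldSuffixLen` and `safePrefix` are hypothetical helpers that show one way to decide how many trailing bytes of a chunk might still grow into `<tool_call>` and therefore should be held back. It covers only a partial tag at the end of the data; a complete tag inside the data is a separate case.

```zig
const std = @import("std");

/// Opening tag named in the description above.
const open_tag = "<tool_call>";

/// Hypothetical helper: length of the longest suffix of `data` that is also
/// a proper prefix of `tag`. Those bytes could still turn into the tag once
/// more input arrives, so a detector would hold them instead of emitting them.
fn heldSuffixLen(data: []const u8, tag: []const u8) usize {
    if (tag.len == 0) return 0;
    // A full match is not "speculative", so cap at tag.len - 1.
    var n: usize = @min(data.len, tag.len - 1);
    while (n > 0) : (n -= 1) {
        if (std.mem.eql(u8, data[data.len - n ..], tag[0..n])) return n;
    }
    return 0;
}

/// Hypothetical helper: the part of `data` that can be forwarded as content
/// immediately, leaving the speculative suffix behind.
fn safePrefix(data: []const u8, tag: []const u8) []const u8 {
    return data[0 .. data.len - heldSuffixLen(data, tag)];
}
```

With a chunk such as `Hello <to`, the three bytes `<to` would be held and `Hello ` could be forwarded; if the next chunk does not continue the tag, the held bytes are released as ordinary content.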

Methods (5)

method

StreamingDetector.deinit

pub fn deinit(self: *StreamingDetector) void

Free the internal buffers and any pending tool-call payloads.

Safe to call once at end-of-stream regardless of how many `feed` calls happened.

src/server/tool_format.zig:86

method

StreamingDetector.feed

pub fn feed(self: *StreamingDetector, chunk: []const u8) !FeedResult

Push the next decoded chunk into the detector and return how the caller should react.

The detector retains ownership of the bytes it buffers from `chunk`; callers should drain them via `takeContentDelta` and `takePendingToolCall` between feeds.

Parameters

chunk
Newly decoded model bytes.

Returns

A `FeedResult` indicating whether content is ready to emit, a tool call is ready to consume, or the bytes are still being held.

src/server/tool_format.zig:104

method

StreamingDetector.takeContentDelta

pub fn takeContentDelta(self: *StreamingDetector) []const u8

Drain pending content bytes.

The returned slice aliases the detector's internal buffer, so consume it before the next `feed` call; subsequent feeds overwrite the same allocation. The detector retains ownership, and the buffer is freed by `deinit`.

src/server/tool_format.zig:220

method

StreamingDetector.takePendingToolCall

pub fn takePendingToolCall(self: *StreamingDetector) ?ToolCall

Drain one fully parsed tool call, FIFO order.

Caller takes ownership of the returned `id`/`name`/`arguments_json` slices and must free them with the same allocator that initialized this detector. Returns null when the queue is empty.

src/server/tool_format.zig:230

method

StreamingDetector.finalize

pub fn finalize(self: *StreamingDetector) []const u8

Flush all remaining buffered bytes as content at end of stream.

Any bytes still held in the speculative-match buffer are appended to the pending content buffer, and the combined slice is returned. The returned slice aliases the internal buffer; consume it before calling `deinit`.

src/server/tool_format.zig:239
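Taken together, the five methods above suggest a per-request drive loop. The following is a minimal sketch, not the handler's real code: `driveStream` and `DrainStats` are hypothetical names, the detector is passed as `anytype` so the block compiles without the module, and the `FeedResult` is deliberately ignored in favour of draining both queues after every feed. It also assumes `allocator` is the one the detector was created with, as the ownership note on `takePendingToolCall` requires.

```zig
const std = @import("std");

/// Hypothetical totals; a real handler would write SSE deltas instead.
const DrainStats = struct {
    content_bytes: usize = 0,
    tool_calls: usize = 0,
};

/// Sketch of one streaming request. `detector` is anything exposing the
/// documented methods: feed, takeContentDelta, takePendingToolCall, finalize.
fn driveStream(
    detector: anytype,
    allocator: std.mem.Allocator,
    chunks: []const []const u8,
) !DrainStats {
    var stats = DrainStats{};
    for (chunks) |chunk| {
        // Simplification: the returned FeedResult is not inspected here.
        _ = try detector.feed(chunk);

        // The delta aliases the detector's buffer: use it before the next feed.
        const delta = detector.takeContentDelta();
        stats.content_bytes += delta.len;

        // Tool calls arrive in FIFO order; the caller owns and frees their slices.
        while (detector.takePendingToolCall()) |call| {
            stats.tool_calls += 1;
            allocator.free(call.id);
            allocator.free(call.name);
            allocator.free(call.arguments_json);
        }
    }
    // End of stream: bytes still held speculatively are flushed as content.
    stats.content_bytes += detector.finalize().len;
    return stats;
}
```

In real use the detector would come from `ToolFormat.newStreamingDetector`, and the caller would still need to `deinit` and `destroy` it once the loop returns.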

struct

ToolFormat

pub const ToolFormat = struct

Vtable interface to a per-template tool-call format.

Lets the chat completions path render tool definitions, parse tool calls out of model output, and create matching streaming detectors without knowing whether the active template is ChatML, llama3, or anything else. Concrete implementations are minted via `forTemplate` or the per-format factories `chatmlToolFormat` / `noopToolFormat`.

src/server/tool_format.zig:254
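For readers unfamiliar with the vtable pattern in Zig, here is a reduced sketch with a single operation instead of four. Everything in it is illustrative: the `Format` type, the `ptr` and `vtable` field names, the `openTag` operation, and the `Kind` enum are assumptions made for the example and are not taken from `tool_format.zig`. It also shows how factories can hand out a cheap value that points at one shared static instance, and how a `forTemplate`-style switch falls through to the no-op. The pointer-cast syntax assumes Zig 0.11 or later.

```zig
const std = @import("std");

/// Hypothetical one-operation interface standing in for the real one.
const Format = struct {
    ptr: *const anyopaque,
    vtable: *const VTable,

    const VTable = struct {
        openTag: *const fn (ptr: *const anyopaque) ?[]const u8,
    };

    /// Dispatch through the vtable without knowing the concrete type.
    fn openTag(self: Format) ?[]const u8 {
        return self.vtable.openTag(self.ptr);
    }
};

const ChatML = struct {
    open_tag: []const u8,

    fn openTag(ptr: *const anyopaque) ?[]const u8 {
        const self: *const ChatML = @ptrCast(@alignCast(ptr));
        return self.open_tag;
    }

    // Single static instance shared by every Format value handed out.
    const instance = ChatML{ .open_tag = "<tool_call>" };
    const vtable = Format.VTable{ .openTag = openTag };
};

const Noop = struct {
    unused: u8 = 0, // keeps the type non-empty for the pointer cast

    fn openTag(_: *const anyopaque) ?[]const u8 {
        return null; // no tool-call dialect: everything is content
    }

    const instance = Noop{};
    const vtable = Format.VTable{ .openTag = openTag };
};

fn chatmlFormat() Format {
    return .{ .ptr = &ChatML.instance, .vtable = &ChatML.vtable };
}

fn noopFormat() Format {
    return .{ .ptr = &Noop.instance, .vtable = &Noop.vtable };
}

/// Hypothetical template enum; the real TemplateKind variants are not shown here.
const Kind = enum { chatml, other };

fn formatFor(kind: Kind) Format {
    return switch (kind) {
        .chatml => chatmlFormat(),
        .other => noopFormat(),
    };
}
```

Because a `Format` value is just two pointers, passing it by value is cheap, which matches the note on the factories that the returned value shares a single static instance.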

Methods (4)

method

ToolFormat.renderToolDefinitions

pub fn renderToolDefinitions(
    self: ToolFormat,
    tools: []const ToolDefinition,
    buf: *std.ArrayList(u8),
    allocator: std.mem.Allocator,
) anyerror!void

Append the format-specific rendering of `tools` (e.g. Qwen's `# Tools\n<tools>...</tools>` block) to `buf`, ready to be spliced into the system message. Noop formats append nothing.

src/server/tool_format.zig:290

method

ToolFormat.renderToolResultMessage

pub fn renderToolResultMessage(
    self: ToolFormat,
    tool_call_id: []const u8,
    content: []const u8,
    buf: *std.ArrayList(u8),
    allocator: std.mem.Allocator,
) anyerror!void

Append the format-specific tool-result message (e.g. ChatML's `<tool_response>...</tool_response>`) to `buf`. Used when replaying `role: "tool"` history entries into the prompt.

src/server/tool_format.zig:302
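As a rough illustration of the ChatML case, the sketch below wraps a tool result in `<tool_response>` tags. It is an assumption-laden stand-in: `renderToolResponse` is a hypothetical helper, it writes into a fixed caller-supplied buffer rather than an `ArrayList`, the newline placement is a guess, and it leaves out `tool_call_id` and any surrounding role markers the real renderer may emit.

```zig
const std = @import("std");

/// Hypothetical helper: format `content` as a ChatML-style tool response
/// into `out` and return the written slice. Fails if `out` is too small.
fn renderToolResponse(out: []u8, content: []const u8) ![]u8 {
    return std.fmt.bufPrint(
        out,
        "<tool_response>\n{s}\n</tool_response>",
        .{content},
    );
}
```

A caller might pass a small stack buffer and the JSON string returned by the tool, then splice the resulting slice into the prompt being assembled.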

method

ToolFormat.parseAssistantToolCalls

pub fn parseAssistantToolCalls(
    self: ToolFormat,
    model_output: []const u8,
    allocator: std.mem.Allocator,
) anyerror!ParsedAssistantOutput

Split a complete (non-streaming) assistant response into prose plus a list of structured tool calls.

Allocates the returned slices with the caller's allocator; ownership transfers to the caller.

src/server/tool_format.zig:315
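The prose/tool-call split can be pictured with a much smaller function. This sketch is not the documented parser: `splitFirstToolCall` and `Split` are hypothetical, the closing tag `</tool_call>` is assumed, only the first call is extracted, text after the closing tag is dropped, and the results borrow from the input instead of being allocated as the real method does.

```zig
const std = @import("std");

/// Hypothetical result type: visible prose plus the raw payload of the
/// first tool call, if any. Both slices borrow from the input.
const Split = struct {
    prose: []const u8,
    payload: ?[]const u8,
};

fn splitFirstToolCall(output: []const u8) Split {
    const open = "<tool_call>";
    const close = "</tool_call>"; // assumed closing tag
    const whitespace = " \r\n\t";

    // No opening tag: the whole output is prose.
    const start = std.mem.indexOf(u8, output, open) orelse
        return .{ .prose = output, .payload = null };
    const body_start = start + open.len;

    // Unterminated tag: treat the output as plain prose in this sketch.
    const rel_end = std.mem.indexOf(u8, output[body_start..], close) orelse
        return .{ .prose = output, .payload = null };

    return .{
        .prose = std.mem.trim(u8, output[0..start], whitespace),
        .payload = std.mem.trim(u8, output[body_start .. body_start + rel_end], whitespace),
    };
}
```

A response builder could then choose a `tool_calls` reply when `payload` is non-null and a plain `content` reply otherwise, which mirrors the decision described for `ParsedAssistantOutput`.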

method

ToolFormat.newStreamingDetector

pub fn newStreamingDetector(
    self: ToolFormat,
    allocator: std.mem.Allocator,
) anyerror!*StreamingDetector

Create a fresh streaming-mode detector for one chat completion stream.

Caller owns the returned pointer and must call `deinit` on it and then `destroy` it.

src/server/tool_format.zig:325

struct

NoopToolFormat

pub const NoopToolFormat = struct

Silent fallback `ToolFormat` for templates that don't have a tool-call dialect wired in (everything except ChatML today).

Definition rendering is a no-op, parsing returns the model output verbatim with no tool calls, and the streaming detector treats everything as content.

src/server/tool_format.zig:341

function

noopToolFormat

pub fn noopToolFormat() ToolFormat

Build a `ToolFormat` backed by `NoopToolFormat`.

Returned value is cheap to pass around and shares a single static instance.

src/server/tool_format.zig:371

function

forTemplate

pub fn forTemplate(template_kind: TemplateKind) ToolFormat

Pick the right `ToolFormat` for a chat template family.

ChatML maps to the Qwen3-style `<tool_call>` dialect; everything else falls through to the no-op format, in which case the request's `tools` field is accepted but silently ignored downstream.

src/server/tool_format.zig:385

function

chatmlToolFormat

pub fn chatmlToolFormat() ToolFormat

Build a `ToolFormat` that emits the Qwen3-family `<tool_call>...` dialect.

Used for any template detected as ChatML. Returned value is cheap to pass around and shares a single static instance.

src/server/tool_format.zig:595