Last updated: 2026-06-12

Scheduler

Request


Request lifecycle management for concurrent inference serving.

Each incoming API request maps to a Request that tracks its state through prefill, decode, and completion phases.

3 exports · 5 methods · src/scheduler/request.zig


enum

RequestState

pub const RequestState = enum

Request processing state machine.

Valid transitions: pending → prefilling → decoding → completed, with cancelled reachable from any active state and failed reachable from prefilling or decoding.
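The transition rules above can be written as a single predicate. This is a minimal sketch, not the code in request.zig: the State enum is a hypothetical mirror of RequestState, isValidTransition is an illustrative name, and it assumes that "active" covers pending, prefilling, and decoding.

```zig
const std = @import("std");

// Hypothetical mirror of RequestState. The variant names follow the
// transitions listed above; the real declaration may differ.
const State = enum { pending, prefilling, decoding, completed, cancelled, failed };

// Returns true when `from -> to` is allowed.
// Assumption: the "active" states are pending, prefilling, and decoding.
fn isValidTransition(from: State, to: State) bool {
    return switch (to) {
        .pending => false, // nothing returns to the initial state
        .prefilling => from == .pending,
        .decoding => from == .prefilling,
        .completed => from == .decoding,
        .cancelled => from == .pending or from == .prefilling or from == .decoding,
        .failed => from == .prefilling or from == .decoding,
    };
}

test "happy path is allowed, skipping a phase is not" {
    try std.testing.expect(isValidTransition(.pending, .prefilling));
    try std.testing.expect(!isValidTransition(.pending, .decoding));
}
```

Switching on the target state keeps each rule on one line, and the compiler rejects the switch if a variant is added without a rule.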

src/scheduler/request.zig:12

struct

GenerationParams

pub const GenerationParams = struct

Generation parameters from the API request.

src/scheduler/request.zig:28

struct

Request

pub const Request = struct

A single inference request with its lifecycle state.

src/scheduler/request.zig:44

Methods (5)

method

Request.init

pub fn init(allocator: std.mem.Allocator, id: u64, prompt_tokens: []const u32, params: GenerationParams) Request

Create a new request in the pending state with the given prompt and parameters.

Parameters
allocator
Allocator for the generated token buffer.
id
Unique request identifier.
prompt_tokens
Tokenized prompt (owned by the caller).
params
Generation parameters (max_tokens, temperature, etc.).
Returns

A Request ready to be submitted to the scheduler.

src/scheduler/request.zig:72

method

Request.transition

pub fn transition(self: *Request, new_state: RequestState) !void

Advance the request through the state machine.

Parameters
self
Request to transition.
new_state
Target state (must be a valid successor of the current state).
Returns

Nothing on success. Fails with error.InvalidTransition if new_state is not a valid successor of the current state.
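A caller-side sketch of the error behaviour. StateOnlyRequest is a hypothetical stand-in that holds only the state field, and its successor table assumes that cancellation is allowed from pending, prefilling, and decoding; the real Request carries more fields and may check transitions differently.

```zig
const std = @import("std");

// Hypothetical mirror of RequestState.
const State = enum { pending, prefilling, decoding, completed, cancelled, failed };

// Hypothetical stand-in for Request that tracks only the state.
const StateOnlyRequest = struct {
    state: State = .pending,

    fn transition(self: *StateOnlyRequest, new_state: State) error{InvalidTransition}!void {
        // Allowed successors of the current state (assumed table).
        const allowed: []const State = switch (self.state) {
            .pending => &[_]State{ .prefilling, .cancelled },
            .prefilling => &[_]State{ .decoding, .cancelled, .failed },
            .decoding => &[_]State{ .completed, .cancelled, .failed },
            .completed, .cancelled, .failed => &[_]State{},
        };
        for (allowed) |candidate| {
            if (candidate == new_state) {
                self.state = new_state;
                return;
            }
        }
        return error.InvalidTransition;
    }
};

test "a rejected transition leaves the state unchanged" {
    var req = StateOnlyRequest{};
    try req.transition(.prefilling);
    // Skipping decoding is rejected, and the request stays in prefilling.
    try std.testing.expectError(error.InvalidTransition, req.transition(.completed));
    try std.testing.expectEqual(State.prefilling, req.state);
}
```

In this sketch the state is written only after the check passes, so a failed call has no side effects.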

src/scheduler/request.zig:90

method

Request.appendToken

pub fn appendToken(self: *Request, token: u32) !void

Append a generated token and record the first-token timestamp if unset.

Parameters
self
Request to append to.
token
Generated token ID to add.
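The "record if unset" behaviour can be sketched with an optional timestamp. Everything here is illustrative: TokenLog and its field names are hypothetical, the allocator and clock reading are passed in explicitly (unlike the documented signature) so the example stays deterministic, and the buffer grows by one element per call for brevity where a real implementation would more likely use a growable list.

```zig
const std = @import("std");

// Hypothetical stand-in for the token-tracking part of Request.
const TokenLog = struct {
    tokens: []u32 = &[_]u32{},
    first_token_ns: ?i128 = null, // null until the first token arrives

    // Assumption: `now_ns` is supplied by the caller; the documented method
    // takes only the token and presumably reads the clock itself.
    fn appendToken(self: *TokenLog, allocator: std.mem.Allocator, token: u32, now_ns: i128) !void {
        // Grow by one slot; the slice is reassigned only if realloc succeeds.
        self.tokens = try allocator.realloc(self.tokens, self.tokens.len + 1);
        self.tokens[self.tokens.len - 1] = token;
        // Only the first successful append sets the timestamp.
        if (self.first_token_ns == null) self.first_token_ns = now_ns;
    }

    fn deinit(self: *TokenLog, allocator: std.mem.Allocator) void {
        allocator.free(self.tokens);
    }
};

test "only the first append sets the timestamp" {
    const allocator = std.testing.allocator;
    var log = TokenLog{};
    defer log.deinit(allocator);

    try log.appendToken(allocator, 42, 1_000);
    try log.appendToken(allocator, 43, 2_000);

    try std.testing.expectEqual(@as(?i128, 1_000), log.first_token_ns);
    try std.testing.expectEqualSlices(u32, &[_]u32{ 42, 43 }, log.tokens);
}
```

A first-token timestamp recorded this way is typically what a time-to-first-token metric is computed from.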

src/scheduler/request.zig:104

method

Request.shouldStop

pub fn shouldStop(self: *const Request, eos_token_id: u32) bool

Check if generation should stop (max_tokens reached or EOS token emitted).

Parameters
self
Request to check.
eos_token_id
End-of-sequence token ID.
Returns

True if generation should stop.
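A minimal sketch of the two stop conditions as a free function. It assumes that max_tokens (named in the GenerationParams description) is a token-count limit and that the EOS check looks only at the most recently generated token; shouldStopSketch and its parameter list are illustrative, not the signature in request.zig.

```zig
const std = @import("std");

// Illustrative free-function version of the stop check.
// Assumptions: `max_tokens` caps the number of generated tokens, and only
// the last generated token is compared against `eos_token_id`.
fn shouldStopSketch(generated: []const u32, max_tokens: usize, eos_token_id: u32) bool {
    if (generated.len >= max_tokens) return true; // length limit reached
    if (generated.len == 0) return false; // nothing emitted yet
    return generated[generated.len - 1] == eos_token_id; // EOS emitted last
}

test "stops on EOS or on the token limit" {
    const eos: u32 = 2;
    try std.testing.expect(!shouldStopSketch(&[_]u32{ 5, 7 }, 4, eos));
    try std.testing.expect(shouldStopSketch(&[_]u32{ 5, 2 }, 4, eos));
    try std.testing.expect(shouldStopSketch(&[_]u32{ 5, 7, 9, 11 }, 4, eos));
}
```

Checking the length first avoids indexing into an empty buffer when max_tokens is zero.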

src/scheduler/request.zig:115

method

Request.deinit

pub fn deinit(self: *Request) void

Release the generated token buffer owned by this request.

Parameters
self
Request to tear down.

src/scheduler/request.zig:126