Last updated: 2026-06-12

Scheduler

Continuous-batching scheduler groundwork for concurrent inference requests.

Today this module only tracks request slots. The HTTP serving hot path still serializes generation behind `ServerState.generation_mutex`, and the batched prefill/decode dispatch loop is not wired up yet.

1 export, 8 methods, defined in src/scheduler/scheduler.zig

struct

Scheduler

pub const Scheduler = struct

Fixed-capacity pool of request slots used to track concurrent inference requests.

Each slot holds at most one active `Request`; slots are reused once released.

src/scheduler/scheduler.zig:15

Methods (8)

method

Scheduler.init

pub fn init(allocator: std.mem.Allocator, max_parallel: u32) !Scheduler

Initialize the scheduler with a fixed number of concurrent request slots.

Parameters
allocator
Allocator for the slot array.
max_parallel
Maximum number of concurrent requests.
Returns

A Scheduler with all slots initially empty.

src/scheduler/scheduler.zig:29
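
To make the "all slots initially empty" guarantee concrete, here is a minimal sketch of how such an initializer could look. It is not the module's actual code: `SlotPool`, the `slots` field, and the `Request` stand-in are hypothetical names, and representing a free slot as `null` is an assumption suggested by the "non-null" wording under `activeCount`.

```zig
const std = @import("std");

// Hypothetical stand-in; the real `Request` type is not shown on this page.
const Request = struct {
    prompt_tokens: []const u32,
};

// Hypothetical sketch of the pool. The field names are assumptions.
const SlotPool = struct {
    allocator: std.mem.Allocator,
    slots: []?Request,

    fn init(allocator: std.mem.Allocator, max_parallel: u32) !SlotPool {
        // One optional entry per slot; an allocation failure propagates to the caller.
        const slots = try allocator.alloc(?Request, max_parallel);
        // `null` marks a free slot, so a fresh pool is entirely empty.
        @memset(slots, null);
        return .{ .allocator = allocator, .slots = slots };
    }

    fn deinit(self: *SlotPool) void {
        self.allocator.free(self.slots);
    }
};

test "a fresh pool has max_parallel empty slots" {
    var pool = try SlotPool.init(std.testing.allocator, 4);
    defer pool.deinit();

    try std.testing.expectEqual(@as(usize, 4), pool.slots.len);
    for (pool.slots) |slot| try std.testing.expect(slot == null);
}
```

Because the slot array is allocated once and never resized in this sketch, the capacity chosen at `init` is fixed for the lifetime of the pool.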

method

Scheduler.submit

pub fn submit(self: *Scheduler, prompt_tokens: []const u32, params: GenerationParams) !u32

Submit a new request and assign it to the first free slot.

Parameters
self
Scheduler to submit to.
prompt_tokens
Tokenized prompt for the request.
params
Generation parameters (max_tokens, temperature, etc.).
Returns

The slot index that was assigned; pass this value to `release` when the request completes.

Notes

Returns `error.AllSlotsBusy` if every slot is occupied.

src/scheduler/scheduler.zig:47
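
The first-free-slot rule and the `error.AllSlotsBusy` case can be illustrated with a small self-contained sketch. Everything except the `submit` signature and the error name is an assumption: `SlotPool`, the fields of `GenerationParams` and `Request`, and the choice to borrow `prompt_tokens` rather than copy it are hypothetical.

```zig
const std = @import("std");

// Hypothetical stand-ins; the real types are not shown on this page.
const GenerationParams = struct { max_tokens: u32 = 16, temperature: f32 = 1.0 };
const Request = struct { prompt_tokens: []const u32, params: GenerationParams };

const SlotPool = struct {
    slots: []?Request,

    fn submit(self: *SlotPool, prompt_tokens: []const u32, params: GenerationParams) !u32 {
        // Linear scan: the lowest-numbered free slot wins.
        for (self.slots, 0..) |*slot, i| {
            if (slot.* == null) {
                // Assumption: the token slice is borrowed, not copied.
                slot.* = .{ .prompt_tokens = prompt_tokens, .params = params };
                return @intCast(i);
            }
        }
        return error.AllSlotsBusy;
    }
};

test "submit fills slots in order, then reports AllSlotsBusy" {
    var storage = [_]?Request{ null, null };
    var pool = SlotPool{ .slots = &storage };
    const prompt = [_]u32{ 1, 2, 3 };

    try std.testing.expectEqual(@as(u32, 0), try pool.submit(&prompt, .{}));
    try std.testing.expectEqual(@as(u32, 1), try pool.submit(&prompt, .{}));
    try std.testing.expectError(error.AllSlotsBusy, pool.submit(&prompt, .{}));
}
```

A caller that cannot wait would typically translate `error.AllSlotsBusy` into a "server busy" response, while one that can wait would retry after a slot is released.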

method

Scheduler.isFull

pub fn isFull(self: *const Scheduler) bool

Check if all slots are occupied.

Parameters
self
Scheduler to query.
Returns

True if every slot holds an active request.

src/scheduler/scheduler.zig:66

method

Scheduler.activeCount

pub fn activeCount(self: *const Scheduler) u32

Get the number of active (non-null) requests.

Parameters
self
Scheduler to query.
Returns

Count of occupied slots.

src/scheduler/scheduler.zig:73
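
The two queries `isFull` and `activeCount` are closely related, and the sketch below shows one way they could be derived from the same slot array. It assumes free slots are stored as `null`. `SlotPool` and the `Request` placeholder are hypothetical, and the real module may keep a running counter instead of scanning.

```zig
const std = @import("std");

// Hypothetical placeholder for the real request type.
const Request = struct { id: u32 };

const SlotPool = struct {
    slots: []const ?Request,

    fn activeCount(self: *const SlotPool) u32 {
        // Count every slot that currently holds a request.
        var n: u32 = 0;
        for (self.slots) |slot| {
            if (slot != null) n += 1;
        }
        return n;
    }

    fn isFull(self: *const SlotPool) bool {
        // Full means no slot is left empty.
        return self.activeCount() == self.slots.len;
    }
};

test "counts track occupied slots" {
    const storage = [_]?Request{ .{ .id = 1 }, null, .{ .id = 2 } };
    const pool = SlotPool{ .slots = &storage };

    try std.testing.expectEqual(@as(u32, 2), pool.activeCount());
    try std.testing.expect(!pool.isFull());
}
```

Both calls are O(max_parallel) in this sketch, which is negligible for the small slot counts a fixed-capacity pool implies.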

method

Scheduler.pendingPrefill

pub fn pendingPrefill(self: *Scheduler) []u32

Intended to return the slot IDs of requests in the prefilling state. Currently a stub.

Parameters
self
Scheduler to query.
Returns

Always an empty slice for now; the allocation strategy for the result is still to be decided.

src/scheduler/scheduler.zig:84

method

Scheduler.activeDecoding

pub fn activeDecoding(self: *Scheduler) []u32

Intended to return the slot IDs of requests in the decoding state. Currently a stub.

Parameters
self
Scheduler to query.
Returns

Always an empty slice for now; the allocation strategy for the result is still to be decided.

src/scheduler/scheduler.zig:94
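
Since the allocation strategy for `pendingPrefill` and `activeDecoding` is still open, the following sketch shows one possible option, a caller-provided output buffer, purely for illustration. It is not the project's chosen design. The `State` enum, the `state` field, and the free function `slotsInState` are all hypothetical.

```zig
const std = @import("std");

// Hypothetical request states and request type.
const State = enum { prefilling, decoding };
const Request = struct { state: State };

// Writes the IDs of slots in state `wanted` into `out` and returns the
// filled prefix. No allocation happens here; the caller owns `out`.
fn slotsInState(slots: []const ?Request, wanted: State, out: []u32) []u32 {
    var n: usize = 0;
    for (slots, 0..) |slot, i| {
        const req = slot orelse continue; // skip empty slots
        if (req.state != wanted) continue;
        if (n == out.len) break; // buffer too small: result is truncated
        out[n] = @intCast(i);
        n += 1;
    }
    return out[0..n];
}

test "filter slot ids by state" {
    const slots = [_]?Request{
        .{ .state = .prefilling },
        null,
        .{ .state = .decoding },
        .{ .state = .prefilling },
    };
    var buf: [4]u32 = undefined;

    try std.testing.expectEqualSlices(u32, &[_]u32{ 0, 3 }, slotsInState(&slots, .prefilling, &buf));
    try std.testing.expectEqualSlices(u32, &[_]u32{2}, slotsInState(&slots, .decoding, &buf));
}
```

A buffer sized to `max_parallel` can never truncate, which is the main appeal of this option. The trade-off is that the signature would differ from the current `[]u32`-returning stubs.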

method

Scheduler.release

pub fn release(self: *Scheduler, slot_id: u32) void

Release a completed or cancelled request's slot, freeing its resources.

Parameters
self
Scheduler to release from.
slot_id
Slot index to free (the value returned by `submit`).
Notes

Silently does nothing if `slot_id` is out of range or the slot is already empty.

src/scheduler/scheduler.zig:103
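
The tolerant behavior described in the note for `release` (out-of-range and already-empty slots are ignored) can be sketched as follows. `SlotPool` and the `Request` placeholder are hypothetical. In this simplified version a request owns no resources, so releasing it is just clearing the slot.

```zig
const std = @import("std");

// Hypothetical placeholder for the real request type.
const Request = struct { id: u32 };

const SlotPool = struct {
    slots: []?Request,

    fn release(self: *SlotPool, slot_id: u32) void {
        // Out-of-range IDs are ignored rather than treated as errors.
        if (slot_id >= self.slots.len) return;
        // Clearing an already-empty slot changes nothing, so a double
        // release is harmless.
        self.slots[slot_id] = null;
    }
};

test "release ignores bad or repeated slot ids" {
    var storage = [_]?Request{ .{ .id = 7 }, null };
    var pool = SlotPool{ .slots = &storage };

    pool.release(0);
    pool.release(0); // already empty
    pool.release(99); // out of range
    try std.testing.expect(storage[0] == null);
    try std.testing.expect(storage[1] == null);
}
```

One consequence of silent no-ops is that a stale slot ID can free a slot that has since been reassigned to a different request, so callers should release each ID exactly once.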

method

Scheduler.deinit

pub fn deinit(self: *Scheduler) void

Tear down all active requests and free the slot array.

Parameters
self
Scheduler to destroy.

src/scheduler/scheduler.zig:115
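
The teardown order implied by `deinit` (active requests first, then the slot array) is sketched below. The ownership model is an assumption: this hypothetical `Request` owns a heap copy of its prompt, which may not match the real type. `SlotPool` and its fields are likewise illustrative names.

```zig
const std = @import("std");

// Hypothetical: assumes each request owns a heap copy of its prompt.
const Request = struct { prompt_tokens: []u32 };

const SlotPool = struct {
    allocator: std.mem.Allocator,
    slots: []?Request,

    fn deinit(self: *SlotPool) void {
        // First free whatever each still-active request owns.
        for (self.slots) |*slot| {
            if (slot.*) |req| self.allocator.free(req.prompt_tokens);
            slot.* = null;
        }
        // Then free the slot array itself.
        self.allocator.free(self.slots);
        // Poison the value so accidental reuse is easier to catch.
        self.* = undefined;
    }
};

test "deinit frees active requests and the slot array" {
    const gpa = std.testing.allocator;
    const slots = try gpa.alloc(?Request, 2);
    errdefer gpa.free(slots);
    @memset(slots, null);
    slots[1] = .{ .prompt_tokens = try gpa.dupe(u32, &[_]u32{ 1, 2, 3 }) };

    var pool = SlotPool{ .allocator = gpa, .slots = slots };
    pool.deinit(); // std.testing.allocator fails the test on any leak
}
```

Running this under `zig test` exercises the leak check in `std.testing.allocator`, which is a convenient way to confirm that a teardown path frees everything it should.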