Last updated: 2026-06-12


KV Cache


Paged KV cache manager for concurrent request serving.

Manages a pool of fixed-size pages that are allocated per request and freed when the request completes or is cancelled. Each page maps to a contiguous region of the GPU KV cache buffer, and a page belongs to at most one request at a time, so each request gets non-overlapping token storage.
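A minimal sketch of the page-to-token mapping, assuming page `p` covers token slots `[p * page_size, (p + 1) * page_size)`, the same formula `positionBase` documents. `TokenRange` and `pageTokenRange` are hypothetical names used only for illustration. Distinct page IDs give disjoint ranges, which is where the non-overlap property comes from.

```zig
const std = @import("std");

/// Half-open range of token slots [start, end) in the KV buffer.
const TokenRange = struct {
    start: u32,
    end: u32,

    fn overlaps(a: TokenRange, b: TokenRange) bool {
        return a.start < b.end and b.start < a.end;
    }
};

/// Hypothetical helper: token slots covered by `page_id`, assuming page p
/// starts at slot p * page_size (the formula positionBase documents).
fn pageTokenRange(page_id: u32, page_size: u32) TokenRange {
    const start = page_id * page_size;
    return .{ .start = start, .end = start + page_size };
}

pub fn main() void {
    const page_size: u32 = 16;
    const a = pageTokenRange(2, page_size); // slots [32, 48)
    const b = pageTokenRange(3, page_size); // slots [48, 64)
    std.debug.print("page 2: [{d}, {d})  page 3: [{d}, {d})  overlap: {s}\n", .{
        a.start, a.end, b.start, b.end,
        if (a.overlaps(b)) "yes" else "no",
    });
}
```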

2 exports, 7 methods (src/scheduler/kv_cache.zig)


struct

KvPage

pub const KvPage = struct

A single page in the KV cache pool.

Each page maps to a contiguous region of the GPU KV buffer.

src/scheduler/kv_cache.zig:13
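A hypothetical page record, for illustration only: the real `KvPage` fields are not documented on this page, so `PageRecord`, `id`, `owner`, and `bytes_per_token` are all assumptions. The sketch shows how a page ID could translate into a byte region of a flat KV buffer when pages are stored in ID order and every token slot takes a fixed number of bytes.

```zig
const std = @import("std");

/// Hypothetical page record; the real KvPage fields are not documented here.
const PageRecord = struct {
    id: u32,
    /// Request that owns the page, or null while the page is free.
    owner: ?u64 = null,

    fn isFree(self: PageRecord) bool {
        return self.owner == null;
    }

    /// Byte offset of this page's region in a flat KV buffer, assuming pages are
    /// stored in ID order and each token slot takes `bytes_per_token` bytes.
    fn byteOffset(self: PageRecord, page_size: u32, bytes_per_token: u64) u64 {
        return @as(u64, self.id) * @as(u64, page_size) * bytes_per_token;
    }

    /// Size in bytes of one page's region under the same assumptions.
    fn byteLen(page_size: u32, bytes_per_token: u64) u64 {
        return @as(u64, page_size) * bytes_per_token;
    }
};

pub fn main() void {
    const page = PageRecord{ .id = 3, .owner = 42 };
    const page_size: u32 = 16;
    const bytes_per_token: u64 = 4096; // hypothetical value
    const start = page.byteOffset(page_size, bytes_per_token);
    std.debug.print("page {d}: bytes [{d}, {d}), free={s}\n", .{
        page.id,
        start,
        start + PageRecord.byteLen(page_size, bytes_per_token),
        if (page.isFree()) "yes" else "no",
    });
}
```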

struct

KvPagePool

pub const KvPagePool = struct

Pool-based allocator for KV cache pages.

Tracks which pages are free and which are owned by active requests.

src/scheduler/kv_cache.zig:26
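A minimal free-list sketch of a pool with the same shape as the documented API (`init`, `allocPages`, `freePages`, `freeCount`). It is not the real `KvPagePool`: the type name `PagePool`, the per-page `owners` array, and the stack of free IDs are assumptions. Allocation pops IDs off the stack; freeing scans every page, matching the O(total_pages) cost documented for `freePages`.

```zig
const std = @import("std");

/// Hypothetical pool mirroring the documented KvPagePool API; internals are assumed.
const PagePool = struct {
    allocator: std.mem.Allocator,
    page_size: u32,
    owners: []?u64, // owners[id] == null means page `id` is free
    free_list: []u32, // stack of free page IDs; top is free_list[free_len - 1]
    free_len: u32,

    fn init(allocator: std.mem.Allocator, total_pages: u32, page_size: u32) !PagePool {
        const owners = try allocator.alloc(?u64, total_pages);
        errdefer allocator.free(owners);
        const free_list = try allocator.alloc(u32, total_pages);
        @memset(owners, null);
        // Push IDs in reverse so page 0 is handed out first.
        for (free_list, 0..) |*slot, i| slot.* = total_pages - 1 - @as(u32, @intCast(i));
        return .{ .allocator = allocator, .page_size = page_size, .owners = owners, .free_list = free_list, .free_len = total_pages };
    }

    fn deinit(self: *PagePool) void {
        self.allocator.free(self.owners);
        self.allocator.free(self.free_list);
    }

    /// Caller owns the returned slice and frees it with the pool's allocator.
    fn allocPages(self: *PagePool, request_id: u64, count: u32) ![]u32 {
        if (count > self.free_len) return error.KvCacheExhausted;
        const ids = try self.allocator.alloc(u32, count);
        for (ids) |*id| {
            self.free_len -= 1;
            id.* = self.free_list[self.free_len];
            self.owners[id.*] = request_id; // stamp the owner
        }
        return ids;
    }

    /// Linear scan over every page: O(total_pages).
    fn freePages(self: *PagePool, request_id: u64) void {
        for (self.owners, 0..) |*owner, i| {
            if (owner.*) |id| {
                if (id == request_id) {
                    owner.* = null;
                    self.free_list[self.free_len] = @intCast(i);
                    self.free_len += 1;
                }
            }
        }
    }

    fn freeCount(self: *const PagePool) u32 {
        return self.free_len;
    }
};

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    var pool = try PagePool.init(allocator, 8, 16);
    defer pool.deinit();

    const pages = try pool.allocPages(42, 3);
    defer allocator.free(pages);
    std.debug.print("request 42 got {d} pages, {d} still free\n", .{ pages.len, pool.freeCount() });
    pool.freePages(42);
}
```

Keeping owners in a flat array makes freeing a full scan but needs no per-request bookkeeping; tracking each request's page list instead would make freeing proportional to the pages it owns, at the cost of extra state.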

Methods (7)

method

KvPagePool.init

pub fn init(allocator: std.mem.Allocator, total_pages: u32, page_size: u32) !KvPagePool

Initialize a page pool with the given number of pages and tokens per page.

Parameters
allocator: Allocator for the page array and free list.
total_pages: Number of pages to create.
page_size: Number of tokens each page can hold.
Returns

A KvPagePool with all pages initially free.

src/scheduler/kv_cache.zig:43

method

KvPagePool.allocPages

pub fn allocPages(self: *KvPagePool, request_id: u64, count: u32) ![]u32

Allocate `count` pages for a request and stamp them with `request_id`.

Parameters
request_id: Owner request ID recorded on each allocated page.
count: Number of pages to allocate.
Returns

Slice of allocated page IDs; caller must free it with the pool's allocator.

Notes

Returns `error.KvCacheExhausted` if fewer than `count` free pages remain.

src/scheduler/kv_cache.zig:72
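`allocPages` takes a page count, not a token count. A caller that knows a request's context length could derive `count` by rounding up, giving the smallest count for which `maxContext(count)` covers the tokens. `pagesNeeded` is a hypothetical helper, not part of the documented API.

```zig
const std = @import("std");

/// Hypothetical helper: the smallest page count whose capacity
/// (count * page_size) holds `tokens` tokens, i.e. ceil(tokens / page_size).
fn pagesNeeded(tokens: u32, page_size: u32) u32 {
    std.debug.assert(page_size > 0);
    // Quotient plus one extra page for any remainder; avoids the overflow
    // that (tokens + page_size - 1) / page_size could hit near maxInt(u32).
    return tokens / page_size + @intFromBool(tokens % page_size != 0);
}

pub fn main() void {
    const page_size: u32 = 16;
    for ([_]u32{ 1, 16, 17, 100 }) |tokens| {
        std.debug.print("{d} tokens -> {d} pages\n", .{ tokens, pagesNeeded(tokens, page_size) });
    }
}
```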

method

KvPagePool.freePages

pub fn freePages(self: *KvPagePool, request_id: u64) void

Free all pages owned by a request, returning them to the free list.

Performs a linear scan over all pages, so the cost is O(total_pages) no matter how many pages the request owns.

Parameters
request_id: Request whose pages should be freed.

src/scheduler/kv_cache.zig:87

method

KvPagePool.positionBase

pub fn positionBase(self: *const KvPagePool, page_ids: []const u32) u32

Return the token position base for a request's first allocated page.

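Computed as `page_ids[0] * page_size`. Each page ID maps to its own disjoint range of `page_size` token slots and a page has at most one owner, so the bases of different requests never collide. If the request holds more than one page, a flat token offset from this base stays inside the request's own pages only when its page IDs are consecutive; otherwise positions past the first page fall into pages that may belong to other requests.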

Parameters
page_ids: Allocated page IDs for the request (must be non-empty for a meaningful result).
Returns

Token index of the first token slot owned by this request, or 0 if `page_ids` is empty.

src/scheduler/kv_cache.zig:102
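A sketch of the documented formula alongside a hypothetical `isContiguous` check (not part of the API). The base only says where the first page starts; the flat range `[base, base + page_ids.len * page_size)` covers exactly the request's pages only when the IDs are consecutive and ascending.

```zig
const std = @import("std");

/// The documented formula: page_ids[0] * page_size, or 0 for an empty slice.
fn positionBase(page_ids: []const u32, page_size: u32) u32 {
    if (page_ids.len == 0) return 0;
    return page_ids[0] * page_size;
}

/// Hypothetical check: true when the IDs are consecutive and ascending, so the
/// flat range [base, base + page_ids.len * page_size) covers only these pages.
fn isContiguous(page_ids: []const u32) bool {
    if (page_ids.len < 2) return true;
    for (page_ids[1..], page_ids[0 .. page_ids.len - 1]) |cur, prev| {
        // Widen to u64 so prev + 1 cannot overflow.
        if (@as(u64, cur) != @as(u64, prev) + 1) return false;
    }
    return true;
}

pub fn main() void {
    const page_size: u32 = 16;
    const contiguous = [_]u32{ 4, 5, 6 };
    const scattered = [_]u32{ 4, 9 };
    std.debug.print("contiguous: base={d} flat-safe={s}\n", .{
        positionBase(&contiguous, page_size),
        if (isContiguous(&contiguous)) "yes" else "no",
    });
    std.debug.print("scattered:  base={d} flat-safe={s}\n", .{
        positionBase(&scattered, page_size),
        if (isContiguous(&scattered)) "yes" else "no",
    });
}
```

Both requests get base 64, but for `{ 4, 9 }` the flat range `[64, 96)` spans pages 4 and 5, and page 5 is not owned by that request.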

method

KvPagePool.maxContext

pub fn maxContext(self: *const KvPagePool, page_count: u32) u32

Maximum context length (in tokens) that fits in `page_count` allocated pages.

Parameters
page_count: Number of pages allocated to the request.
Returns

`page_count * page_size`, the token capacity of those pages.

src/scheduler/kv_cache.zig:110
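A small worked example of the capacity formula. `unusedSlots` is a hypothetical helper that measures internal fragmentation: the token slots a request has reserved but not filled. With `page_size = 16`, three pages give 48 slots, so a 40-token request leaves 8 unused.

```zig
const std = @import("std");

/// The documented formula: page_count * page_size token slots.
fn maxContext(page_count: u32, page_size: u32) u32 {
    return page_count * page_size;
}

/// Hypothetical helper: slots reserved but left empty when a request holding
/// `page_count` pages stores `tokens` tokens (internal fragmentation).
fn unusedSlots(tokens: u32, page_count: u32, page_size: u32) u32 {
    const capacity = maxContext(page_count, page_size);
    std.debug.assert(tokens <= capacity);
    return capacity - tokens;
}

pub fn main() void {
    const page_size: u32 = 16;
    const page_count: u32 = 3;
    const tokens: u32 = 40;
    std.debug.print("capacity={d} unused={d}\n", .{
        maxContext(page_count, page_size),
        unusedSlots(tokens, page_count, page_size),
    });
}
```

When the page count is chosen by rounding `tokens / page_size` up, the unused slots are always fewer than `page_size`, so smaller pages waste less per request at the cost of more pages to track.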

method

KvPagePool.freeCount

pub fn freeCount(self: *const KvPagePool) u32

Number of free pages currently available for allocation.

Returns

Count of unallocated pages remaining in the pool.

src/scheduler/kv_cache.zig:116