Last updated: 2026-06-12

Model Manager Metal

Metal-backed active-model runtime state for the HTTP server.

The server uses this manager to load one Metal model at a time, track the tokenizer and runtime allocations that belong to it, and project fit/status information back into the OpenAI-compatible model-management endpoints.

struct

LoadSpec

pub const LoadSpec = struct

Identifies a model to load: a GGUF file path, an optional managed-catalog id, and an optional context-length override.

src/server/model_manager_metal.zig:22
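As a rough illustration of what a spec carries, the sketch below models the three inputs listed under the `spec` parameter of `ModelManager.init`: a path, an optional managed-catalog id, and an optional context-length override. The type `LoadSpecSketch`, its field names, and the `isManaged` helper are hypothetical; the real `LoadSpec` declaration is not reproduced on this page.

```zig
const std = @import("std");

/// Hypothetical stand-in for `LoadSpec`; field names are assumptions,
/// not the real declaration at model_manager_metal.zig:22.
pub const LoadSpecSketch = struct {
    /// Filesystem path to the GGUF weights.
    path: []const u8,
    /// Managed-catalog id, or null for a raw GGUF file.
    managed_id: ?[]const u8 = null,
    /// Context-length override; null lets the memory planner decide.
    context_length: ?u32 = null,

    /// In this sketch a spec counts as "managed" only when it carries a catalog id.
    pub fn isManaged(self: LoadSpecSketch) bool {
        return self.managed_id != null;
    }
};
```

Optional fields with `null` defaults let a raw GGUF path be described with a single field while still allowing both overrides.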

struct

ModelSummary

pub const ModelSummary = struct

Compact view of one catalog entry for the HTTP `/v1/models` response.

src/server/model_manager_metal.zig:29

struct

ModelCatalogView

pub const ModelCatalogView = struct

Snapshot of the model catalog, filtered for the current GPU profile unless all entries are requested.

src/server/model_manager_metal.zig:58

Methods (1): `deinit`

struct

LoadedResources

pub const LoadedResources = struct

All GPU and host resources for a loaded model: weights, tokenizer, and inference engine.

src/server/model_manager_metal.zig:70

struct

ModelManager

pub const ModelManager = struct

Thread-safe manager for the currently active model on the Metal backend.

Handles loading, hot-swapping, catalog queries, and VRAM budget enforcement.

src/server/model_manager_metal.zig:98
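A minimal sketch of the "one active model behind a lock" idea, assuming a plain `std.Thread.Mutex` around an optional slot. `ActiveSlot`, `Resources`, `swap`, and `clear` are hypothetical names. The real manager also coordinates with the shared generation lock and the per-device GPU process lock, which this sketch leaves out.

```zig
const std = @import("std");

/// Hypothetical resource bundle standing in for `LoadedResources`.
pub const Resources = struct {
    name: []const u8,
    bytes: u64,
};

/// Mutex-guarded slot that holds at most one active model.
pub const ActiveSlot = struct {
    mutex: std.Thread.Mutex = .{},
    active: ?Resources = null,

    /// Install `next` as the active model and return whatever it replaced,
    /// so the caller can release the old model's resources outside the lock.
    pub fn swap(self: *ActiveSlot, next: Resources) ?Resources {
        self.mutex.lock();
        defer self.mutex.unlock();
        const previous = self.active;
        self.active = next;
        return previous;
    }

    /// Empty the slot (for example before deleting a model) and return the old value.
    pub fn clear(self: *ActiveSlot) ?Resources {
        self.mutex.lock();
        defer self.mutex.unlock();
        const previous = self.active;
        self.active = null;
        return previous;
    }
};
```

Returning the displaced bundle instead of freeing it inside the critical section keeps the lock hold time short; whether the real hot-swap path works this way is not stated here.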

Methods (11)

method

ModelManager.init

pub fn init(spec: LoadSpec, device: *const MetalDevice, allocator: std.mem.Allocator) !ModelManager

Create a manager and eagerly load the model described by `spec`.

Acquires the per-device GPU process lock before touching Metal resources.

Parameters
spec
Path, optional managed-catalog id, and optional context-length override.
device
Metal device to load weights onto.
allocator
Used for all heap allocations; must outlive the returned manager.
Returns

A fully initialised manager with a loaded model, or an error if loading fails.

src/server/model_manager_metal.zig:145

method

ModelManager.initEmpty

pub fn initEmpty(device: *const MetalDevice, requested_context_length: ?u32, allocator: std.mem.Allocator) ModelManager

Create an idle manager with no model currently loaded.

The GPU process lock is not acquired until the first model is activated.

Parameters
requested_context_length
Token limit to apply when a model is later loaded; `null` lets the memory planner auto-size the context window.

src/server/model_manager_metal.zig:171
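The practical difference between `init` and `initEmpty` is when the GPU process lock is taken. The hypothetical `ManagerSketch` below models that with a counting stand-in lock (`ProcessLockSketch`): `init` loads immediately and so takes the lock at construction, while `initEmpty` waits for the first `activate` call. Unlike the real `init`, this sketch cannot fail, and no Metal work is performed.

```zig
const std = @import("std");

/// Hypothetical per-device process lock; it only counts acquisitions so the
/// sketch stays self-contained (no OS-level locking happens here).
pub const ProcessLockSketch = struct {
    acquisitions: u32 = 0,

    pub fn acquire(self: *ProcessLockSketch) void {
        self.acquisitions += 1;
    }
};

pub const ManagerSketch = struct {
    lock: *ProcessLockSketch,
    lock_held: bool = false,
    requested_context_length: ?u32,
    active_path: ?[]const u8 = null,

    /// Like `initEmpty`: no model loaded, and the lock is deliberately not taken.
    pub fn initEmpty(lock: *ProcessLockSketch, requested_context_length: ?u32) ManagerSketch {
        return .{ .lock = lock, .requested_context_length = requested_context_length };
    }

    /// Like `init`: an empty manager that immediately activates `path`.
    pub fn init(lock: *ProcessLockSketch, path: []const u8, requested_context_length: ?u32) ManagerSketch {
        var self = initEmpty(lock, requested_context_length);
        self.activate(path);
        return self;
    }

    /// Take the lock on the first activation only, then record the new model.
    pub fn activate(self: *ManagerSketch, path: []const u8) void {
        if (!self.lock_held) {
            self.lock.acquire();
            self.lock_held = true;
        }
        self.active_path = path;
    }
};
```

Deferring the lock lets an idle server start without claiming the device, which matters if another process may still be using it.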

method

ModelManager.currentResources

#
pub fn currentResources(self: *ModelManager) ?*LoadedResources

Return the currently loaded resource bundle, or null when idle.

src/server/model_manager_metal.zig:198

method

ModelManager.activeDisplayName

pub fn activeDisplayName(self: *ModelManager) []const u8

Return the active model display name, or `"none"` when no model is loaded.

src/server/model_manager_metal.zig:203
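Both accessors reduce to unwrapping an optional. A minimal sketch, assuming the manager stores the loaded bundle in an optional field; `IdleAwareManager` and `ResourcesSketch` are hypothetical stand-ins for `ModelManager` and `LoadedResources`.

```zig
const std = @import("std");

/// Hypothetical subset of the loaded-resource bundle.
pub const ResourcesSketch = struct {
    display_name: []const u8,
};

pub const IdleAwareManager = struct {
    loaded: ?ResourcesSketch = null,

    /// Mirrors `currentResources`: a pointer into the slot, or null when idle.
    pub fn currentResources(self: *IdleAwareManager) ?*ResourcesSketch {
        if (self.loaded) |*res| return res;
        return null;
    }

    /// Mirrors `activeDisplayName`: falls back to "none" when nothing is loaded.
    pub fn activeDisplayName(self: *IdleAwareManager) []const u8 {
        const res = self.currentResources() orelse return "none";
        return res.display_name;
    }
};
```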

method

ModelManager.catalogProfile

pub fn catalogProfile(self: *const ModelManager) []const u8

Return the catalog profile string used for the active Metal device.

src/server/model_manager_metal.zig:210

method

ModelManager.currentMemoryUsage

pub fn currentMemoryUsage(self: *ModelManager) MemoryUsage

Return a snapshot of memory usage for the active model, or all zeroes when idle.

src/server/model_manager_metal.zig:215
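Returning zeroes instead of an optional keeps callers such as status endpoints free of "no model loaded" special cases. The sketch assumes a `MemoryUsage`-like struct whose fields default to zero; `MemoryUsageSketch` and its two fields are hypothetical, since the real fields are not listed here.

```zig
const std = @import("std");

/// Hypothetical shape for `MemoryUsage`; every field defaults to zero.
pub const MemoryUsageSketch = struct {
    weights_bytes: u64 = 0,
    kv_cache_bytes: u64 = 0,

    pub fn total(self: MemoryUsageSketch) u64 {
        return self.weights_bytes + self.kv_cache_bytes;
    }
};

/// Return the active model's usage, or the all-zero default when idle.
pub fn currentMemoryUsage(active: ?MemoryUsageSketch) MemoryUsageSketch {
    return active orelse MemoryUsageSketch{};
}
```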

method

ModelManager.collectCatalogView

pub fn collectCatalogView(self: *ModelManager, allocator: std.mem.Allocator, include_all: bool) !ModelCatalogView

Build a catalog view annotated with install, fit, and active-model status.

Unless `include_all` is true, entries are filtered to those that are supported on the current GPU profile and fit within the VRAM budget. The currently active model is always included, even if its static VRAM estimate exceeds the live budget. Unrecognised loaded models (raw GGUF files with no catalog entry) appear as a synthetic entry with `managed = false`.

Parameters
allocator
Used to allocate the returned `ModelCatalogView.data` slice; caller must call `deinit`.
include_all
When true, unsupported and oversized entries are included in the result.
Returns

An owned `ModelCatalogView`; free with `ModelCatalogView.deinit`.

src/server/model_manager_metal.zig:249
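The inclusion rule can be expressed as a single predicate. This is a sketch under simplifying assumptions: `EntrySketch` carries only a supported flag and a static VRAM estimate, "fits" is taken to mean `vram_bytes <= budget`, and the synthetic entry for unrecognised GGUF files is omitted. The hypothetical `filterCatalog` returns an owned slice, in the same spirit as the `ModelCatalogView` ownership contract: the caller frees what it receives.

```zig
const std = @import("std");

/// Hypothetical catalog entry carrying only what the filter needs.
pub const EntrySketch = struct {
    id: []const u8,
    supported: bool, // supported on the current GPU profile
    vram_bytes: u64, // static VRAM estimate
};

/// Keep an entry when `include_all` is set, when it is the active model,
/// or when it is supported and fits within the budget.
fn keep(e: EntrySketch, budget: u64, active_id: ?[]const u8, include_all: bool) bool {
    if (include_all) return true;
    if (active_id) |id| {
        if (std.mem.eql(u8, id, e.id)) return true;
    }
    return e.supported and e.vram_bytes <= budget;
}

/// Return an owned slice of the kept entries; free it with `allocator.free`.
pub fn filterCatalog(
    allocator: std.mem.Allocator,
    entries: []const EntrySketch,
    budget: u64,
    active_id: ?[]const u8,
    include_all: bool,
) ![]EntrySketch {
    // First pass: count, so the result is allocated at exactly the right size.
    var n: usize = 0;
    for (entries) |e| {
        if (keep(e, budget, active_id, include_all)) n += 1;
    }
    const out = try allocator.alloc(EntrySketch, n);
    // Second pass: copy the kept entries, preserving catalog order.
    var i: usize = 0;
    for (entries) |e| {
        if (keep(e, budget, active_id, include_all)) {
            out[i] = e;
            i += 1;
        }
    }
    return out;
}
```

Counting before allocating avoids a growable list and guarantees the slice handed back has the exact length that `allocator.free` expects.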

method

ModelManager.supportsManagedEntry

pub fn supportsManagedEntry(self: *ModelManager, entry: catalog_mod.CatalogEntry, allocator: std.mem.Allocator) bool

Return whether a managed catalog entry is supported and fits on the active device.

src/server/model_manager_metal.zig:341

method

ModelManager.activateManagedModel

pub fn activateManagedModel(self: *ModelManager, model_id: []const u8, persist_active: bool) !void

Load the specified catalog model and make it the active inference target.

Hot-swaps the previous model if one is loaded. Validates that the model is installed, supported on this GPU profile, and fits within the VRAM budget before touching Metal.

Parameters
model_id
Catalog identifier of the managed model to activate.
persist_active
When true, writes the selection to the active-model config file.
Notes

Caller must already hold the shared generation lock.

src/server/model_manager_metal.zig:359
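The ordering guarantee (validate first, touch Metal last) can be sketched as a chain of early returns. The error names and `CandidateSketch` are hypothetical, `metal_loads` is a counter standing in for the actual unload and load work, and neither `persist_active` nor the shared generation lock is modelled.

```zig
const std = @import("std");

/// Hypothetical error set; the real function's error names are not documented here.
pub const ActivateError = error{ NotInstalled, Unsupported, ExceedsBudget };

pub const CandidateSketch = struct {
    installed: bool,
    supported: bool,
    vram_bytes: u64,
};

/// Run the documented pre-checks in order and fail before any GPU work.
/// `metal_loads` is only incremented once every check has passed.
pub fn activate(c: CandidateSketch, budget: u64, metal_loads: *u32) ActivateError!void {
    if (!c.installed) return error.NotInstalled;
    if (!c.supported) return error.Unsupported;
    if (c.vram_bytes > budget) return error.ExceedsBudget;
    metal_loads.* += 1; // placeholder for evicting the old model and loading the new one
}
```

Failing before any Metal call means a rejected request leaves the previously active model untouched.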

method

ModelManager.removeManagedModel

pub fn removeManagedModel(self: *ModelManager, model_id: []const u8, force: bool) !RemoveResult

Unload a managed model from GPU memory and delete it from the on-disk model store.

If the model is currently loaded and `force` is false, the call returns `error.ModelLoadedInGpu`. With `force` set to true, the model is evicted from GPU memory before deletion.

Parameters
model_id
Catalog identifier of the model to remove.
force
When true, evict the model from GPU memory even if it is active.
Returns

A `RemoveResult` describing what was unloaded and what was deleted.

Notes

Caller must already hold the shared generation lock.

src/server/model_manager_metal.zig:414
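A sketch of the `force` guard, assuming the active model is identified by its catalog id. `RemoveResultSketch` and its fields are hypothetical, and disk deletion is simulated rather than performed; only `error.ModelLoadedInGpu` comes from the documentation above.

```zig
const std = @import("std");

/// Hypothetical shape of `RemoveResult`.
pub const RemoveResultSketch = struct {
    unloaded: bool,
    deleted: bool,
};

/// Refuse to remove the loaded model unless `force` is set; with `force`,
/// evict it first and then (in a real implementation) delete it from disk.
pub fn remove(active_id: *?[]const u8, model_id: []const u8, force: bool) error{ModelLoadedInGpu}!RemoveResultSketch {
    var unloaded = false;
    if (active_id.*) |id| {
        if (std.mem.eql(u8, id, model_id)) {
            if (!force) return error.ModelLoadedInGpu;
            active_id.* = null; // evict from GPU memory
            unloaded = true;
        }
    }
    // Deletion is simulated here; no files are touched.
    return .{ .unloaded = unloaded, .deleted = true };
}
```

Removing a model that is not active never needs `force`, since there is nothing to evict.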