Last updated: 2026-06-12
API Server
Model Manager Runtime
Backend-selected model manager for the HTTP server.
This thin shim lets the HTTP server code import one stable manager type, while build-time backend selection decides whether the implementation comes from the Vulkan runtime or the Apple Silicon Metal runtime.
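The re-export pattern can be sketched roughly as follows. This is a minimal illustration, not the module's actual source: the `vulkan_impl` and `metal_impl` stubs, the `backend_name` declaration, and the target check on `builtin` are all assumptions standing in for the real runtimes and for however the build actually selects a backend.

```zig
const std = @import("std");
const builtin = @import("builtin");

// Hypothetical stand-in for the Vulkan runtime's manager module.
const vulkan_impl = struct {
    pub const LoadSpec = struct { path: []const u8 };
    pub const ModelManager = struct {
        pub const backend_name = "vulkan";
    };
};

// Hypothetical stand-in for the Apple Silicon Metal runtime's manager module.
const metal_impl = struct {
    pub const LoadSpec = struct { path: []const u8 };
    pub const ModelManager = struct {
        pub const backend_name = "metal";
    };
};

// Assumption: the backend is chosen from the compile target. The real
// project may use a build option instead. Either way the choice is
// resolved at compile time, so only one implementation is referenced.
const impl = if (builtin.os.tag == .macos and builtin.cpu.arch == .aarch64)
    metal_impl
else
    vulkan_impl;

// The shim itself: callers import these two names and never see `impl`.
pub const LoadSpec = impl.LoadSpec;
pub const ModelManager = impl.ModelManager;

pub fn main() void {
    const spec = LoadSpec{ .path = "/models/example.gguf" };
    std.debug.print("backend={s} path={s}\n", .{ ModelManager.backend_name, spec.path });
}
```

Because both names are plain aliases, server code written against `ModelManager` compiles unchanged whichever backend is selected, provided the two implementations expose the same declarations.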
constant
LoadSpec
pub const LoadSpec = impl.LoadSpec
Specification describing which model to load: a filesystem path, an optional managed-catalog ID, and an optional context-length override.
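As an illustration of the shape described here, a load specification could look something like the sketch below. The field names (`path`, `catalog_id`, `context_length`), the `u32` type, and the `effectiveContextLength` helper are hypothetical; the page only states which three pieces of information the spec carries.

```zig
const std = @import("std");

// Hypothetical layout: a required path plus two optional fields that
// default to null when the caller does not set them.
const LoadSpec = struct {
    path: []const u8,
    catalog_id: ?[]const u8 = null,
    context_length: ?u32 = null,
};

// Hypothetical helper: use the override when present, otherwise fall
// back to the model's own default context length.
fn effectiveContextLength(spec: LoadSpec, model_default: u32) u32 {
    return spec.context_length orelse model_default;
}

pub fn main() void {
    // Path only: no catalog entry, no override.
    const plain = LoadSpec{ .path = "/models/example.gguf" };

    // Catalog-managed model with an explicit context-length override.
    const tuned = LoadSpec{
        .path = "/models/example.gguf",
        .catalog_id = "example-7b",
        .context_length = 8192,
    };

    std.debug.print("plain={d} tuned={d}\n", .{
        effectiveContextLength(plain, 4096),
        effectiveContextLength(tuned, 4096),
    });
}
```

Modeling the two optional values as Zig optionals keeps "not specified" distinct from any real value, so a caller that supplies only a path does not have to invent placeholder settings.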
constant
ModelManager
pub const ModelManager = impl.ModelManager
Thread-safe manager for the currently active model and inference engine; handles loading, hot-swapping, catalog queries, and VRAM budget enforcement.