Last updated: 2026-06-12

Model Format & Loading

Loader Metal

Metal-specific model loading: zero-copy via mmap and `newBufferWithBytesNoCopy`.

This replaces the Vulkan loader's staging-buffer DMA with direct mmap wrapping.
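
One practical detail of wrapping an mmap'd region without copying is alignment: Metal's documentation asks for a page-aligned pointer and a page-aligned region when calling `newBufferWithBytesNoCopy`, while tensors inside a GGUF file start at arbitrary offsets. The following is a minimal sketch of the arithmetic only, not the loader's actual code; `AlignedSpan` and `alignedSpan` are hypothetical names, and the 16 KiB page size is the value used on Apple Silicon Macs.

```zig
const std = @import("std");

/// Hypothetical helper type: the page-aligned window that would be wrapped
/// by a no-copy buffer, plus where the tensor starts inside that window.
const AlignedSpan = struct {
    base_offset: usize, // file offset rounded down to a page boundary
    length: usize, // window length, rounded up to whole pages
    inner_offset: usize, // tensor start relative to base_offset
};

/// Compute the page-aligned window covering `size` bytes at `offset`.
/// `page_size` must be a nonzero power of two.
fn alignedSpan(offset: usize, size: usize, page_size: usize) AlignedSpan {
    std.debug.assert(page_size != 0 and (page_size & (page_size - 1)) == 0);
    const mask = page_size - 1;
    const base = offset & ~mask; // round start down
    const end = (offset + size + mask) & ~mask; // round end up
    return AlignedSpan{
        .base_offset = base,
        .length = end - base,
        .inner_offset = offset - base,
    };
}

pub fn main() void {
    // A 5000-byte tensor starting at file offset 20000, with 16 KiB pages.
    const span = alignedSpan(20000, 5000, 16384);
    std.debug.print("base={d} length={d} inner={d}\n", .{
        span.base_offset, span.length, span.inner_offset,
    });
}
```

With these inputs the window starts at offset 16384, spans one 16384-byte page, and the tensor begins 3616 bytes into it. A kernel binding would then need that inner offset alongside the buffer handle.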

7 exports, 1 method in src/model/loader_metal.zig

struct

ModelInspection

pub const ModelInspection = struct

Summary returned by `inspectModel`: config plus file and tensor size statistics.

src/model/loader_metal.zig:16

struct

LoadedTensor

pub const LoadedTensor = struct

A tensor descriptor paired with a Metal buffer holding its weight data (mmap-wrapped or copied).

src/model/loader_metal.zig:29

struct

Model

pub const Model = struct

Runtime model state backed by a memory-mapped GGUF file and zero-copy Metal buffers.

src/model/loader_metal.zig:36

Methods

function

residentWeightBytes

pub fn residentWeightBytes(model: *const Model) u64

Returns the total byte count of model weights that are resident as Metal resources.

Copied tensor arenas replace their mmap-backed tensors in the GPU-visible working set, so arena bytes are counted once and aliased per-tensor handles are skipped to avoid double-counting.

Parameters

model
The loaded model whose resident weight size to measure.

Returns

Total bytes across all Metal-resident weight buffers (arenas + owned tensors).

src/model/loader_metal.zig:85
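
The counting rule described for `residentWeightBytes` can be illustrated with a small sketch. The `Arena` and `Tensor` types and the `arena_index` field below are hypothetical stand-ins, since the real fields of `Model` are not shown on this page; only the rule itself (arena bytes once, aliased tensors skipped) comes from the description above.

```zig
const std = @import("std");

/// Hypothetical stand-in for a copied Metal arena.
const Arena = struct { size_bytes: u64 };

/// Hypothetical stand-in for a loaded tensor.
const Tensor = struct {
    size_bytes: u64,
    /// Non-null when the tensor's handle aliases a region of a copied arena.
    arena_index: ?usize = null,
};

/// Sum arena bytes once, then add only tensors that own their buffer.
fn residentBytes(arenas: []const Arena, tensors: []const Tensor) u64 {
    var total: u64 = 0;
    for (arenas) |arena| total += arena.size_bytes;
    for (tensors) |tensor| {
        if (tensor.arena_index != null) continue; // already counted via its arena
        total += tensor.size_bytes;
    }
    return total;
}

pub fn main() void {
    const arenas = [_]Arena{.{ .size_bytes = 4096 }};
    const tensors = [_]Tensor{
        .{ .size_bytes = 1024, .arena_index = 0 }, // aliased, skipped
        .{ .size_bytes = 3072, .arena_index = 0 }, // aliased, skipped
        .{ .size_bytes = 2048 }, // owns its buffer, counted
    };
    std.debug.print("{d}\n", .{residentBytes(&arenas, &tensors)});
}
```

Here the two aliased tensors contribute nothing on their own, so the total is the 4096-byte arena plus the 2048-byte owned tensor, or 6144 bytes.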

function

inspectConfig

pub fn inspectConfig(path: []const u8, allocator: std.mem.Allocator) !ModelConfig

Parse a GGUF file's metadata and return the derived `ModelConfig` without touching the GPU.

The file is memory-mapped and unmapped before returning; no Metal resources are created.

Parameters

path
Filesystem path to the `.gguf` model file.
allocator
Allocator used for GGUF metadata parsing (freed before return).

Returns

Parsed `ModelConfig` or an error if the file cannot be opened or parsed.

src/model/loader_metal.zig:440

function

inspectModel

pub fn inspectModel(path: []const u8, allocator: std.mem.Allocator) !ModelInspection

Parse a GGUF file and return a `ModelInspection` with size statistics and the derived config.

Computes the total raw byte size of all tensor payloads stored in the file. No GPU resources are created; the file mapping is released before returning.

Parameters

path
Filesystem path to the `.gguf` model file.
allocator
Allocator used for GGUF metadata parsing (freed before return).

Returns

`ModelInspection` containing file size, tensor byte count, and `ModelConfig`.

src/model/loader_metal.zig:470
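
As a rough illustration of what "total raw byte size of all tensor payloads" involves, the sketch below sums payload sizes for a handful of GGML tensor types using their block layouts (for example, a `Q4_0` block packs 32 elements into 18 bytes). This is an assumption about the general approach, not the code behind `inspectModel`; `TensorType`, `TensorInfo`, and `totalTensorBytes` are hypothetical names, and only four types are covered.

```zig
const std = @import("std");

/// A small subset of GGML tensor types, chosen for illustration only.
const TensorType = enum { F32, F16, Q8_0, Q4_0 };

const BlockLayout = struct { elems: u64, bytes: u64 };

/// Elements per block and bytes per block for each type in the subset.
fn blockLayout(kind: TensorType) BlockLayout {
    return switch (kind) {
        .F32 => BlockLayout{ .elems = 1, .bytes = 4 },
        .F16 => BlockLayout{ .elems = 1, .bytes = 2 },
        .Q8_0 => BlockLayout{ .elems = 32, .bytes = 34 },
        .Q4_0 => BlockLayout{ .elems = 32, .bytes = 18 },
    };
}

/// Hypothetical tensor descriptor; not the loader's actual type.
const TensorInfo = struct { kind: TensorType, n_elements: u64 };

/// Sum payload sizes, assuming each element count is a whole number of blocks.
fn totalTensorBytes(tensors: []const TensorInfo) u64 {
    var total: u64 = 0;
    for (tensors) |tensor| {
        const layout = blockLayout(tensor.kind);
        total += (tensor.n_elements / layout.elems) * layout.bytes;
    }
    return total;
}

pub fn main() void {
    const tensors = [_]TensorInfo{
        .{ .kind = .F32, .n_elements = 8 }, // 8 * 4 bytes
        .{ .kind = .Q4_0, .n_elements = 64 }, // 2 blocks * 18 bytes
        .{ .kind = .Q8_0, .n_elements = 32 }, // 1 block * 34 bytes
    };
    std.debug.print("{d}\n", .{totalTensorBytes(&tensors)});
}
```

A total computed this way can differ from the file size, which also covers metadata and alignment padding; that may be why the inspection summary reports the two figures separately.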

function

load

pub fn load(path: []const u8, metal_ctx: ?*shim.MetalCtx, allocator: std.mem.Allocator) !Model

Load a GGUF model file and return a `Model` backed by zero-copy Metal buffers.

Each tensor's data is wrapped in a `newBufferWithBytesNoCopy` Metal buffer over the mmap'd file region. For model architectures that benefit from it (e.g. dense Gemma layers), select tensors are copied into pre-allocated Metal arenas to avoid UMA pressure from mixed mmap/Metal page-fault patterns. All weight buffers are registered with an `MTLResidencySet` on macOS 15+ to prevent paging between layers.

Parameters

path
Filesystem path to the `.gguf` model file.
metal_ctx
Active Metal context used to create and wrap GPU buffers; must be non-null.
allocator
Allocator for tensor and arena bookkeeping (retained in the returned `Model`).

Returns

Initialized `Model` or an error if the file cannot be mapped, parsed, or lacks a supported architecture (qwen2, qwen2_moe, qwen35, mistral, mamba, jamba, gemma).

src/model/loader_metal.zig:518