Last updated: 2026-06-12

Primary Reference

ZINC Zig API

This is the main documentation surface for the codebase: generated from src/**/*.zig, grouped by functionality, and linked with stable module and symbol anchors.

15 sections · 89 modules · 616 exports · 399 methods

API Section

CLI & Entrypoints

16 symbols

Startup, argument parsing, and the top-level process path that wires model loading, tokenization, and generation together.

API Section

Model Format & Loading

59 symbols

GGUF parsing, metadata normalization, and the runtime structures that move weights from disk into GPU-resident buffers.

API Section

Tokenization

15 symbols

Prompt and output text conversion between UTF-8 strings and token IDs used by the decode loop.

API Section

Decode Planning

57 symbols

Static graph construction and dependency ordering for the per-token compute work that the runtime records and submits.
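
To make "dependency ordering" concrete, here is a minimal sketch of Kahn's algorithm over a small fixed-size graph. It is not taken from the codebase: `Edge`, `topoOrder`, and the caller-provided scratch buffers are hypothetical names chosen for the example, and the real planner may represent nodes and edges quite differently.

```zig
const std = @import("std");

// Hypothetical edge type for illustration: `from` must run before `to`.
const Edge = struct { from: usize, to: usize };

/// Kahn's algorithm over `n` nodes. Writes a valid execution order into
/// `out` and returns it, or returns error.Cycle if the graph is not a DAG.
/// `scratch` holds per-node in-degrees; both slices need length >= n.
fn topoOrder(n: usize, edges: []const Edge, scratch: []usize, out: []usize) ![]usize {
    const indeg = scratch[0..n];
    for (indeg) |*d| d.* = 0;
    for (edges) |e| indeg[e.to] += 1;

    // `out` doubles as the ready queue: entries in [head, tail) are ready.
    var head: usize = 0;
    var tail: usize = 0;
    var node: usize = 0;
    while (node < n) : (node += 1) {
        if (indeg[node] == 0) {
            out[tail] = node;
            tail += 1;
        }
    }

    while (head < tail) : (head += 1) {
        const current = out[head];
        for (edges) |e| {
            if (e.from != current) continue;
            indeg[e.to] -= 1;
            if (indeg[e.to] == 0) {
                out[tail] = e.to;
                tail += 1;
            }
        }
    }

    // Any node never emitted still has an unmet dependency, i.e. a cycle.
    if (tail != n) return error.Cycle;
    return out[0..n];
}

pub fn main() !void {
    // Diamond: node 0 feeds 1 and 2, which both feed 3.
    const edges = [_]Edge{
        .{ .from = 0, .to = 1 },
        .{ .from = 0, .to = 2 },
        .{ .from = 1, .to = 3 },
        .{ .from = 2, .to = 3 },
    };
    var scratch: [4]usize = undefined;
    var out: [4]usize = undefined;
    const order = try topoOrder(4, &edges, &scratch, &out);
    std.debug.print("order: {any}\n", .{order});
}
```

The sketch scans the full edge list for every emitted node, which is fine for a handful of ops but quadratic in general; an adjacency list would be the usual choice for larger graphs.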

API Section

Inference Runtime

400 symbols

Decode state, pipeline ownership, command recording, and token sampling inside the active inference loop.

src/bench_hot_decode.zig 1 export · 0 methods

Bench Hot Decode

Hot-path decode kernel microbenchmarks.

  • main
src/bench_support.zig 12 exports · 0 methods

Bench Support

Shared helpers for benchmark and standalone runner entrypoints.

  • metal_device
  • metal_loader
  • metal_buffer
  • metal_command
src/compute/forward_cuda_gemma.zig 1 export · 9 methods

Forward Cuda Gemma

CUDA forward pass for the dense gemma4 transformer (Gemma 4 31B-it).

  • ForwardGemma
src/compute/forward_cuda.zig 1 export · 9 methods

Forward Cuda

CUDA forward pass for the dense `qwen35` hybrid-SSM model (Qwen 3.5 9B).

  • ForwardCuda
src/compute/forward_metal.zig 17 exports · 15 methods

Forward Metal

Metal inference engine — decode loop for Apple Silicon.

  • CommandEncoderMode
  • runtime_context_cap
  • DecodeState
  • GenerateMetrics
src/compute/forward_zinc_rt.zig 11 exports · 11 methods

Forward Zinc Rt

ZINC_RT forward-pass bring-up.

  • m0_max_decode_tokens_default
  • Model
  • DecodeGraphSummary
  • DirectComputeKind
src/compute/forward.zig 4 exports · 14 methods

Forward

Run the inference runtime: decode state, pipeline ownership, and token generation.

  • DecodeState
  • SamplingParams
  • InferenceEngine
  • generate
src/gpu/interface.zig 5 exports · 0 methods

Interface

GPU backend abstraction — comptime-resolved, zero runtime overhead.

  • is_metal
  • is_cuda
  • is_vulkan
  • backend
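
As a rough illustration of what "comptime-resolved" can mean in Zig, the sketch below derives boolean flags from a compile-time backend constant. Only the names `is_metal`, `is_cuda`, `is_vulkan`, and `backend` come from this page. The `Backend` enum, the OS-based selection rule, and `backendName` are assumptions made for the example; the real module may select its backend differently, for instance through a build option.

```zig
const std = @import("std");
const builtin = @import("builtin");

// Hypothetical backend tag; the real module's type is not shown on this page.
pub const Backend = enum { metal, cuda, vulkan };

// Assumption for this sketch only: Metal on macOS, Vulkan everywhere else.
pub const backend: Backend = if (builtin.os.tag == .macos) .metal else .vulkan;

pub const is_metal = backend == .metal;
pub const is_cuda = backend == .cuda;
pub const is_vulkan = backend == .vulkan;

/// `backend` is known at compile time, so the compiler resolves this switch
/// during compilation and no runtime branch on the backend remains.
pub fn backendName() []const u8 {
    return switch (backend) {
        .metal => "metal",
        .cuda => "cuda",
        .vulkan => "vulkan",
    };
}

pub fn main() void {
    std.debug.print("backend: {s}\n", .{backendName()});
}
```

Callers can then write `if (is_metal) { ... }` and have the untaken branch dropped at compile time, which is one way to get a backend abstraction without a vtable or runtime dispatch.
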
src/gpu/memory_plan.zig 12 exports · 9 methods

Memory Plan

Shared runtime memory accounting helpers for Vulkan and Metal backends.

  • RuntimeMemoryProfile
  • effectiveContextCeiling
  • applyRequestedContextLimit
  • requestedContextTokens
src/gpu/process_lock.zig 5 exports · 2 methods

Process Lock

Cross-process GPU reservation lock keyed by backend and selected device.

  • Backend
  • ProcessLock
  • AcquireError
  • lockPath
src/zinc_rt/engine.zig 6 exports · 2 methods

Engine

ZINC_RT — the ZINC Runtime.

  • Tier
  • Options
  • Engine
  • parseTier
src/zinc_rt/fast_pool.zig 3 exports · 4 methods

Fast Pool

Low-overhead worker pool for the T-CPU decode matvec fan-out.

  • max_workers
  • Task
  • FastPool
src/zinc_rt/isa/cpu_zig/dequant.zig 13 exports · 0 methods

Dequant

Shared scalar GGML dequantization helpers for T-CPU kernels.

  • row
  • dotRow
  • dotF32Row
  • dotF16Row
src/zinc_rt/isa/cpu_zig/embed.zig 2 exports · 0 methods

Embed

T-CPU EMBED implementation.

  • Params
  • run
src/zinc_rt/isa/cpu_zig/flash_attn.zig 2 exports · 0 methods

Flash Attn

T-CPU flash attention (single-query decode) implementation.

  • Params
  • run
src/zinc_rt/isa/cpu_zig/lm_head.zig 2 exports · 0 methods

Lm Head

T-CPU LM_HEAD implementation.

  • Params
  • run
src/zinc_rt/isa/cpu_zig/matvec.zig 2 exports · 0 methods

Matvec

T-CPU matrix-vector projection implementation.

  • Params
  • run
src/zinc_rt/isa/cpu_zig/mod.zig 6 exports · 0 methods

Mod

Pure Zig T-CPU opcode implementations.

  • rms_norm
  • swiglu
  • argmax
  • dequant
src/zinc_rt/isa/cpu_zig/moe_gate_topk.zig 3 exports · 0 methods

Moe Gate Topk

T-CPU MOE_GATE_TOPK implementation.

  • RoutingRule
  • Params
  • run
src/zinc_rt/isa/cpu_zig/residual_rms_norm.zig 2 exports · 0 methods

Residual Rms Norm

T-CPU residual add + RMS norm implementation.

  • Params
  • run
src/zinc_rt/isa/cpu_zig/rms_norm.zig 2 exports · 0 methods

Rms Norm

T-CPU RMS_NORM implementation.

  • Params
  • run
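
For readers unfamiliar with the op, a scalar RMS norm can be sketched as below: each element is divided by the root mean square of the input and scaled by a per-element weight. The `rmsNorm` signature is hypothetical and written for Zig 0.11 or newer; the actual `Params` layout and `run` entry point of this module are not shown on this page.

```zig
const std = @import("std");

/// Reference RMS norm: out[i] = x[i] / sqrt(mean(x^2) + eps) * weight[i].
/// Hypothetical signature for illustration; assumes x is non-empty.
fn rmsNorm(out: []f32, x: []const f32, weight: []const f32, eps: f32) void {
    std.debug.assert(x.len > 0);
    std.debug.assert(out.len == x.len and weight.len == x.len);

    // Mean of squares over the whole row.
    var sum_sq: f32 = 0;
    for (x) |v| sum_sq += v * v;
    const mean_sq = sum_sq / @as(f32, @floatFromInt(x.len));

    // One shared scale factor, then a per-element weight.
    const scale = 1.0 / @sqrt(mean_sq + eps);
    var i: usize = 0;
    while (i < x.len) : (i += 1) {
        out[i] = x[i] * scale * weight[i];
    }
}

pub fn main() void {
    const x = [_]f32{ 3.0, 4.0 };
    const w = [_]f32{ 1.0, 1.0 };
    var out: [2]f32 = undefined;
    rmsNorm(&out, &x, &w, 1e-6);
    std.debug.print("{d} {d}\n", .{ out[0], out[1] });
}
```
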
src/zinc_rt/isa/cpu_zig/rope.zig 2 exports · 0 methods

Rope

T-CPU RoPE (Rotary Positional Embedding) implementation.

  • Params
  • run
src/zinc_rt/isa/cpu_zig/sigmoid_mul.zig 2 exports · 0 methods

Sigmoid Mul

T-CPU sigmoid-gated multiply implementation.

  • Params
  • run
src/zinc_rt/isa/cpu_zig/swiglu.zig 2 exports · 0 methods

Swiglu

T-CPU SwiGLU implementation.

  • Params
  • run
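
As a reminder of the math behind this op, SwiGLU in its common form multiplies the SiLU-activated gate projection by the up projection, element by element. The sketch below assumes that common form; `silu` and `swiglu` here are hypothetical helpers, not the module's actual `Params`/`run` interface.

```zig
const std = @import("std");

/// SiLU (swish): x * sigmoid(x), written as x / (1 + e^-x).
fn silu(x: f32) f32 {
    return x / (1.0 + @exp(-x));
}

/// Element-wise SwiGLU: out[i] = silu(gate[i]) * up[i].
/// Hypothetical signature for illustration only.
fn swiglu(out: []f32, gate: []const f32, up: []const f32) void {
    std.debug.assert(out.len == gate.len and up.len == gate.len);
    var i: usize = 0;
    while (i < gate.len) : (i += 1) {
        out[i] = silu(gate[i]) * up[i];
    }
}

pub fn main() void {
    const gate = [_]f32{ 0.0, 1.0 };
    const up = [_]f32{ 5.0, 2.0 };
    var out: [2]f32 = undefined;
    swiglu(&out, &gate, &up);
    std.debug.print("{d} {d}\n", .{ out[0], out[1] });
}
```
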
src/zinc_rt/isa/cpu_zig/vadd.zig 2 exports · 0 methods

Vadd

T-CPU element-wise vector addition implementation.

  • Params
  • run
src/zinc_rt/kmd.zig 42 exports · 0 methods

Kmd

Thin AMDGPU kernel-driver queries used by direct ZINC_RT tiers.

  • QueryStatus
  • ComputeUserqInfo
  • QueryResult
  • AMDGPU_HW_IP_COMPUTE
src/zinc_rt/lib.zig 12 exports · 0 methods

Lib

ZINC_RT reference-runtime module.

  • engine
  • ir_op
  • ir_graph
  • kmd
src/zinc_rt/ring/cpu.zig 1 export · 2 methods

Cpu

T-CPU ring backend.

  • CpuRing
src/zinc_rt/ring/cs.zig 36 exports · 17 methods

Cs

AMDGPU DRM command-submission (CS) path — bring-up of the RADV / radeonsi PM4 submission foundation.

  • default_render_node
  • AMDGPU_HW_IP_GFX
  • AMDGPU_HW_IP_COMPUTE
  • AMDGPU_CTX_OP_ALLOC_CTX
src/zinc_rt/ring/kfd.zig 40 exports · 2 methods

Kfd

AMDGPU KFD (`/dev/kfd`) bring-up for the T1 PM4-direct tier.

  • default_render_node
  • kfd_device_node
  • topology_nodes_dir
  • min_kfd_major
src/zinc_rt/ring/mod.zig 2 exports · 0 methods

Mod

Backend-neutral packet batch types for ZINC_RT rings.

  • Packet
  • PacketBatch
src/zinc_rt/ring/packet_list.zig 1 export · 7 methods

Packet List

Dynamic packet list for building per-token decode sequences.

  • PacketList
src/zinc_rt/ring/packet.zig 11 exports · 13 methods

Packet

PM4 packet builder shared by direct AMD ZINC_RT tiers.

  • Error
  • sh_reg_num_thread_x
  • sh_reg_pgm_lo
  • sh_reg_pgm_rsrc1
src/zinc_rt/ring/umq.zig 17 exports · 2 methods

Umq

AMDGPU user-mode queue (T2) availability and create/free smoke gate.

  • min_linux_major
  • min_linux_minor
  • default_render_node
  • user_queue_param_path

API Section

Sampling

8 symbols

Logit post-processing, argmax helpers, and token-selection controls layered on top of the decode runtime.
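
The simplest token-selection rule in this family is greedy argmax: pick the index of the largest logit. A minimal sketch follows; the `argmax` helper is written for this page and its signature is an assumption, not the exported API.

```zig
const std = @import("std");

/// Returns the index of the largest logit. Ties resolve to the lowest
/// index. Hypothetical helper for illustration; assumes a non-empty slice.
fn argmax(logits: []const f32) usize {
    std.debug.assert(logits.len > 0);
    var best: usize = 0;
    var i: usize = 1;
    while (i < logits.len) : (i += 1) {
        if (logits[i] > logits[best]) best = i;
    }
    return best;
}

pub fn main() void {
    const logits = [_]f32{ 0.1, 2.5, -1.0, 2.5 };
    std.debug.print("next token id: {d}\n", .{argmax(&logits)});
}
```

Greedy selection is deterministic, which makes it a convenient baseline when comparing backends; temperature or top-k style controls would be applied to the logits before a selection step like this one.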

API Section

Shader Dispatch

137 symbols

Typed wrappers around the compute shaders that prepare push constants, descriptor layouts, and per-op dispatch dimensions.

API Section

Hardware Detection

15 symbols

Vendor and architecture heuristics that translate raw Vulkan properties into tuning defaults for AMD, NVIDIA, and Intel GPUs.

API Section

Vulkan Runtime

47 symbols

Low-level Vulkan setup, memory allocation, buffers, pipelines, and command submission primitives used throughout the engine.

API Section

Metal Runtime

47 symbols

Low-level Metal device discovery, buffers, pipelines, and command submission primitives used by the Apple Silicon backend.

API Section

Managed Models

64 symbols

Catalog metadata, cache management, model downloads, and active-selection helpers used by the CLI and server.

API Section

Scheduler

26 symbols

Continuous batching scheduler, paged KV cache management, and request lifecycle tracking for concurrent inference serving.

API Section

API Server

65 symbols

OpenAI-compatible HTTP server, route dispatch, SSE streaming, and session management for serving inference over the network.

API Section

Tool Calling

19 symbols

Tool-use protocol helpers: chat-template-aware tool definitions, argument parsing, and response formatting for function-calling-capable models.

API Section

CUDA Runtime

40 symbols

CUDA device discovery, context management, device buffers, NVRTC-compiled pipelines, and stream-based command submission for the NVIDIA backend.

src/cuda/buffer.zig 8 exports · 2 methods

Buffer

CUDA buffer wrapper — device-local allocations with optional pinned staging.

  • CudaBuffer
  • createBuffer
  • createBufferStaged
  • uploadMmap
src/cuda/c.zig 1 export · 0 methods

C

Shared C import for the CUDA shim — all CUDA backend modules import from here to ensure type identity across compilation units (mirrors src/metal/c.zig).

  • shim
src/cuda/command.zig 2 exports · 6 methods

Command

CUDA command wrapper — kernel dispatch and stream/event synchronization (mirrors src/metal/command.zig).

  • CudaCommand
  • beginCommand
src/cuda/device.zig 2 exports · 10 methods

Device

CUDA device wrapper — NVIDIA GPU backend (mirrors src/metal/device.zig).

  • CudaCapabilities
  • CudaDevice
src/cuda/pipeline.zig 5 exports · 0 methods

Pipeline

CUDA compute pipeline wrapper — NVRTC-compiled CUfunction (mirrors src/metal/pipeline.zig).

  • CudaPipeline
  • createPipeline
  • createPipelineFromImage
  • setMaxDynamicShared
src/cuda/smoke.zig 1 export · 0 methods

Smoke

Standalone smoke test for the Zig wrapper layer of the ZINC CUDA backend. It drives the GPU entirely through device.zig, buffer.zig, pipeline.zig, and command.zig (which wrap cuda_shim.c), proving the Zig<->CUDA seam: device selection, staged buffers with H2D/D2H copies, NVRTC runtime compilation, the buffers+push dispatch ABI, and both the sync and async commit paths.

  • main
src/dbg_cuda.zig 1 export · 0 methods

Dbg Cuda

CUDA forward-pass debug harness for the qwen35 hybrid-SSM model, with two modes.

  • main
src/loadtest_cuda.zig 1 export · 0 methods

Loadtest Cuda

Standalone load-test for src/model/loader_cuda.zig.

  • main
src/run_cuda.zig 1 export · 0 methods

Run Cuda

Standalone CUDA greedy-decode driver for the qwen35 forward pass.

  • main

Developer Entry Points

Start with the hot paths

  • src/compute/forward.zig (Forward): Run the inference runtime: decode state, pipeline ownership, and token generation.
  • src/compute/forward_metal.zig (Forward Metal): Metal inference engine: decode loop for Apple Silicon.
  • src/compute/graph.zig (Graph): Represent decode work as a dependency graph that can be topologically ordered.
  • src/model/gguf.zig (GGUF): Parse GGUF container files and expose the metadata needed by the loader.
  • src/model/tokenizer.zig (Tokenizer): Native BPE tokenizer that reads vocabulary and merge rules from GGUF metadata.
  • src/compute/dmmv.zig (DMMV): Wrap the decode-time matrix-vector shader family used for projection ops.
  • src/compute/elementwise.zig (Elementwise): Wrap the fused element-wise shader family used by the decode loop.
  • src/server/routes.zig (Routes): Route dispatcher and endpoint handlers for the OpenAI-compatible API.

Agent Access

Machine-Readable Entry Points

Agents should prefer the generated Zig API exports instead of scraping HTML. The JSON export is the canonical structured surface; llms.txt points callers at the right docs in the right order.

Supplemental

Architecture and Narrative Docs

These pages support the API reference with higher-level specs, tuning notes, and protocol descriptions. They are intentionally secondary to the generated Zig reference.