Documentation
Start with how to actually use ZINC: confirm the hardware, build the binary, run one model, then go deeper into tuning and internals.
# Fastest path: build ZINC, see which supported models fit,
# pull one, and run it end to end.
git clone https://github.com/zolotukhin/zinc.git
cd zinc
zig build -Doptimize=ReleaseFast
export RADV_PERFTEST=coop_matrix
# Verify Vulkan, shaders, and the runtime path first
./zig-out/bin/zinc --check
# Show supported models for this machine
./zig-out/bin/zinc model list
# Pull one managed model from the built-in catalog
./zig-out/bin/zinc model pull qwen35-9b-q4k-m
# Check the managed model before the first run
./zig-out/bin/zinc --check --model-id qwen35-9b-q4k-m
# Run a prompt with that managed model
./zig-out/bin/zinc \
  --model-id qwen35-9b-q4k-m \
  --prompt "The capital of France is"
# If you already have a local GGUF file, use it directly instead
./zig-out/bin/zinc --check -m /path/to/model.gguf
./zig-out/bin/zinc \
  -m /path/to/model.gguf \
  --prompt "The capital of France is"
Start Here
The practical docs for getting ZINC onto a real machine and actually running it.
Getting Started
First: Build ZINC, verify Vulkan, and run the first prompt.
Hardware Requirements
Linux, AMD/Intel GPUs, Vulkan, VRAM, macOS, and Apple Silicon expectations.
Running ZINC
CLI usage, server mode, KV quantization, and graph exports.
Serving HTTP API
OpenAI-compatible request and response shapes.
Development Guide
Build, test, debug, benchmark, and contribute.
OpenCode Setup
Configure OpenCode for local coding with ZINC and Qwen through the OpenAI-compatible API, with guidance on thinking, tools, and the trace proxy.
Deep Docs
Architecture, tuning, and reference material to read once the runtime path is clear.
Technical Specification
Architecture, kernels, model support, and scheduler behavior.
RDNA4 Tuning Guide
Cooperative matrix, bandwidth tuning, and toolchain notes.
AMD GPU Reference
RDNA3 and RDNA4 hardware specs, ISA, and memory hierarchy.
NVIDIA GPU Reference
RTX 30/40/50 (Ampere/Ada/Blackwell) specs, SM architecture, PTX/SASS opcodes, tensor cores, and CUDA inference tuning.
Intel GPU Reference
Arc B-series specs, Xe2 opcodes, memory bandwidth, and inference tuning notes.
TurboQuant Compression
Two-stage vector quantization and KV cache reduction.
Apple Silicon Reference
M1–M5 generations, Metal GPU families, and capability progression.
Apple Metal Reference
MSL kernels, simdgroup ops, threadgroup memory, and pipeline optimization.
Metal Enablement Notes
Implementation notes for the Vulkan-to-Metal port.
Benchmarking ZINC
CUDA backend for ZINC — design & implementation plan
GPU Memory Scaling Plan
Metal Performance Plan
ZINC Roadmap
ZINC_RT — The ZINC Runtime
Zig API Reference
Generated: Reference built from src/**/*.zig, with module pages, symbol anchors, and source links.