Last updated: 2026-05-31

Getting started with ZINC#

Experimental software: ZINC is under active development. The CLI path is the best-supported way to start. Server mode, model coverage, and performance tuning are still moving quickly.

ZINC runs local LLMs on AMD GPUs without ROCm by using Vulkan, and it runs the same managed GGUF model flow on Apple Silicon through Metal. The fastest way to check if it works on your machine is:

  1. Install Zig.
  2. Build the binary.
  3. Run one prompt from the terminal.

If that works, move on to the hardware requirements, running ZINC, and the lower-level tuning docs.

Fast path#

git clone https://github.com/zolotukhin/zinc.git
cd zinc
zig build -Doptimize=ReleaseFast
./zig-out/bin/zinc model pull qwen35-9b-q4k-m
./zig-out/bin/zinc --model-id qwen35-9b-q4k-m --prompt "What is the capital of France?" --chat

On RDNA4 Linux, set the cooperative matrix fast path before the check or first prompt:

export RADV_PERFTEST=coop_matrix

Before you start#

ZINC currently targets:

  • Linux with AMD RDNA3/RDNA4 GPUs through Vulkan 1.3
  • Linux with Intel Arc Xe2 / Battlemage GPUs through Vulkan 1.3 (experimental)
  • macOS with Apple Silicon (M1 through M5) through Metal
  • GGUF models (Q4_K, Q5_K, Q6_K, Q8_0, Q5_0, MXFP4, F16, F32 quantizations)

Windows is not supported. There is no native Windows build today, and WSL is not on the test matrix. If you are on Windows, you will need a Linux host (or VM with GPU passthrough) for the AMD/Intel paths, or a Mac for the Metal path.

Supported models#

This list is intentionally narrow. It shows the exact GGUFs that have been validated end-to-end.

Model Model ID (zinc model pull <id>) Exact GGUF Fits on Status
Qwen 3.5 9B qwen35-9b-q4k-m Qwen3.5-9B-Q4_K_M.gguf 8+ GB VRAM or unified supported
Gemma 4 26B-A4B MoE gemma4-26b-a4b-q4k-m gemma-4-26B-A4B-it-UD-Q4_K_M.gguf 16+ GB VRAM or unified supported
Gemma 4 31B gemma4-31b-q4k-m gemma-4-31B-it-Q4_K_M.gguf 24+ GB VRAM or unified supported
Qwen3.6 27B Dense qwen36-27b-q4k-m Qwen3.6-27B-Q4_K_M.gguf 24+ GB VRAM or unified experimental
Qwen3.6 35B-A3B UD qwen36-35b-a3b-q4k-xl Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf 24+ GB VRAM or unified supported

Install dependencies#

macOS (Apple Silicon)#

brew install zig
xcode-select --install

That is all you need. No Vulkan, no glslc, no Python, no MLX.

Linux (AMD GPU)#

sudo apt update
sudo apt install -y git libvulkan-dev vulkan-tools glslc

Then install Zig 0.15.2 or newer from ziglang.org/download.

Clone and build#

Same on both platforms:

git clone https://github.com/zolotukhin/zinc.git
cd zinc
zig build -Doptimize=ReleaseFast

The compiled binary ends up at ./zig-out/bin/zinc. On Linux, the build also compiles GLSL shaders to SPIR-V. On macOS, Metal shaders are compiled at runtime from MSL source.

Run the preflight check#

Before running a prompt, verify the machine and GPU:

./zig-out/bin/zinc --check

On RDNA4 Linux, enable cooperative matrix first:

export RADV_PERFTEST=coop_matrix
./zig-out/bin/zinc --check

The check verifies GPU detection, shader assets, and runtime initialization. If it reports READY [OK], you are good to go.

Browse the model catalog#

See what ZINC supports on your machine:

./zig-out/bin/zinc model list

The catalog auto-detects your GPU profile (amd-rdna4-32gb, apple-silicon, etc.) and shows which models fit.

Download a model#

./zig-out/bin/zinc model pull qwen35-9b-q4k-m

This downloads the model into a local cache and verifies the SHA-256 hash.

Run your first prompt#

The --chat flag wraps your prompt in the model's chat template (system prompt, role tags, etc.), which is required for instruct-tuned models to produce proper answers. Without --chat, the model treats the input as raw text completion, which still works but produces less focused output.

On RDNA4 Linux, the env var below is required — cooperative matrix is the fast path, and without it you may see slow or incorrect output. macOS users skip the export line.

# RDNA4 Linux only — required, not optional. Skip this line on macOS.
export RADV_PERFTEST=coop_matrix

./zig-out/bin/zinc --model-id qwen35-9b-q4k-m --prompt "What is the capital of France?" --chat

Good first-run signals in the logs:

info(loader): Loading model: ...
info(forward): Prefill complete: ...
info(forward): Generated 256 tokens in ... ms — XX.XX tok/s
info(zinc): Output text: Paris...

If you see a tok/s number and coherent output, the core path is working.

Start the chat UI#

./zig-out/bin/zinc chat

This starts the server (default port 9090) and opens the built-in chat UI in your browser. The server also exposes an OpenAI-compatible API at http://localhost:9090/v1.

You can also start the server manually:

./zig-out/bin/zinc --model-id qwen35-9b-q4k-m -p 8080

Then open http://localhost:8080/ in your browser.

Manage models#

# Set a default model for future runs
./zig-out/bin/zinc model use qwen35-9b-q4k-m

# Check the active default
./zig-out/bin/zinc model active

# Remove a cached model
./zig-out/bin/zinc model rm qwen35-9b-q4k-m