Evergreen Guides

Local LLM Inference Topics

These pages are the stable entry points for the technical clusters behind ZINC. Start here when you want the practical answer, then follow the links into the deeper engineering posts.

OpenCode OpenCode Local Coding with Qwen and ZINC

OpenCode can use ZINC and Qwen through the same OpenAI-compatible `/v1/chat/completions` API that powers the browser chat UI. The useful setup is local and boring: run ZINC, point OpenCode at localhost, keep tools enabled, set honest context limits, and use the trace proxy while testing coding workflows.

Gemma 4 Gemma 4 Local Inference

Gemma 4 is a useful local inference target because it stresses the parts of an engine that simple Llama-shaped models do not: sliding-window attention, asymmetric grouped-query attention, Gemma-specific normalization, and MoE routing on the A4B variant.

Qwen3.6 Qwen3.6 Local Inference

Qwen3.6 is the core search cluster for ZINC because it combines model-architecture curiosity with practical local inference intent. Readers want to know what the model is, whether it exists as GGUF, and what an engine has to do to run it well.

AMD RDNA4 AMD RDNA4 LLM Inference

RDNA4 is the default hardware story for ZINC: useful consumer and workstation AMD GPUs, strong memory bandwidth, Vulkan support, and no dependence on ROCm for local LLM inference.

KV Cache KV Cache Quantization for Local LLMs

KV cache quantization is the long-context memory lever. Once a model fits, the prompt length and concurrent sessions are usually limited by K/V bytes per token, not by the static weight file.