About
I'm Stepan Zolotukhin. I'm building a local LLM inference engine for AMD GPUs and Apple Silicon.
Right now I spend most of my time on GPU compute, Zig, Vulkan, Metal, and local AI inference.
I spent seven years building infrastructure for storage systems. For the past six years, I've worked at AWS on low-level systems problems across Linux kernels, userland, cross-compilation, glibc, and adjacent platform work.
The engine is called ZINC. It uses Zig with Vulkan compute shaders on AMD and native Metal shaders on Apple Silicon, with no ROCm and no MLX. The goal is to make the hardware people already own useful for local LLM inference.
I write about the work here: how the GPU kernels are tuned, what the quantization tradeoffs look like, and the systems engineering decisions behind ZINC.