Last updated: 2026-06-12
Inference Runtime
Kmd
Thin AMDGPU kernel-driver queries used by direct ZINC_RT tiers.
This file intentionally starts with capability discovery only. T2 UMQ creation needs the same UAPI definitions, but tier selection must first prove that the kernel exposes compute user queues instead of relying on kernel version alone.
enum
QueryStatus
pub const QueryStatus = enum Outcome of probing the AMDGPU render node for compute user-queue support.
Each variant maps to a specific failure mode when discovering whether the kernel exposes the UMQ surface ZINC_RT tier 2 needs.
struct
ComputeUserqInfo
pub const ComputeUserqInfo = struct Compute user-queue capability metadata reported by the kernel.
Captures the slot count and the EOP (end-of-pipe) scratch buffer sizing the driver requires when creating a compute UMQ.
struct
QueryResult
pub const QueryResult = struct Combined result returned by `queryComputeUserq`.
Carries the discovery status plus optional capability info and the errno captured from the failing ioctl, when applicable.
constant
AMDGPU_HW_IP_COMPUTE
pub const AMDGPU_HW_IP_COMPUTE: u32 = 1 AMDGPU HW IP type selector for the compute pipe used by `AMDGPU_INFO` queries.
constant
AMDGPU_INFO_HW_IP_INFO
pub const AMDGPU_INFO_HW_IP_INFO: u32 = 0x02 `AMDGPU_INFO` sub-query that returns `DrmAmdgpuInfoHwIp` for a given HW IP.
constant
AMDGPU_INFO_UQ_FW_AREAS
pub const AMDGPU_INFO_UQ_FW_AREAS: u32 = 0x24 `AMDGPU_INFO` sub-query that returns user-queue firmware area metadata (`DrmAmdgpuInfoUqMetadata`).
constant
AMDGPU_USERQ_OP_CREATE
pub const AMDGPU_USERQ_OP_CREATE: u32 = 1 `DRM_AMDGPU_USERQ` op code that allocates a new user-mode queue.
constant
AMDGPU_USERQ_OP_FREE
pub const AMDGPU_USERQ_OP_FREE: u32 = 2 `DRM_AMDGPU_USERQ` op code that releases a previously created user-mode queue.
constant
AMDGPU_GEM_DOMAIN_GTT
pub const AMDGPU_GEM_DOMAIN_GTT: u64 = 0x2 GEM domain flag requesting allocation in GTT, that is, GPU-accessible system memory.
constant
AMDGPU_GEM_DOMAIN_VRAM
pub const AMDGPU_GEM_DOMAIN_VRAM: u64 = 0x4 GEM domain flag requesting allocation in device VRAM.
constant
AMDGPU_GEM_DOMAIN_DOORBELL
pub const AMDGPU_GEM_DOMAIN_DOORBELL: u64 = 0x40 GEM domain flag requesting allocation in the MMIO doorbell aperture.
constant
AMDGPU_GEM_CREATE_CPU_GTT_USWC
pub const AMDGPU_GEM_CREATE_CPU_GTT_USWC: u64 = 1 << 2 GEM creation flag asking the kernel to map GTT memory as CPU write-combined for fast streaming writes.
constant
AMDGPU_GEM_CREATE_VRAM_CLEARED
pub const AMDGPU_GEM_CREATE_VRAM_CLEARED: u64 = 1 << 3 GEM creation flag asking the kernel to zero-fill VRAM allocations before returning the BO.
constant
AMDGPU_GEM_CREATE_VM_ALWAYS_VALID
pub const AMDGPU_GEM_CREATE_VM_ALWAYS_VALID: u64 = 1 << 6 GEM creation flag keeping the BO permanently mapped in the device VM so it never needs revalidation.
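The domain and creation flags above are plain bit masks that callers OR together before a GEM-create call. The following is a minimal sketch of that composition, using the constant values listed above. `GemRequest`, `vramScratchRequest`, and `gttUploadRequest` are hypothetical names introduced for illustration only and are not part of this module.

```zig
const std = @import("std");

// Values copied from the constants documented above.
const AMDGPU_GEM_DOMAIN_GTT: u64 = 0x2;
const AMDGPU_GEM_DOMAIN_VRAM: u64 = 0x4;
const AMDGPU_GEM_CREATE_CPU_GTT_USWC: u64 = 1 << 2;
const AMDGPU_GEM_CREATE_VRAM_CLEARED: u64 = 1 << 3;
const AMDGPU_GEM_CREATE_VM_ALWAYS_VALID: u64 = 1 << 6;

// Hypothetical pairing of the two masks a GEM-create call takes.
const GemRequest = struct { domains: u64, flags: u64 };

// Zero-filled VRAM buffer that stays mapped in the device VM.
fn vramScratchRequest() GemRequest {
    return .{
        .domains = AMDGPU_GEM_DOMAIN_VRAM,
        .flags = AMDGPU_GEM_CREATE_VRAM_CLEARED | AMDGPU_GEM_CREATE_VM_ALWAYS_VALID,
    };
}

// Write-combined GTT buffer for CPU-to-GPU streaming writes.
fn gttUploadRequest() GemRequest {
    return .{
        .domains = AMDGPU_GEM_DOMAIN_GTT,
        .flags = AMDGPU_GEM_CREATE_CPU_GTT_USWC | AMDGPU_GEM_CREATE_VM_ALWAYS_VALID,
    };
}

pub fn main() void {
    const vram = vramScratchRequest();
    const gtt = gttUploadRequest();
    std.debug.print("vram: domains=0x{x} flags=0x{x}\n", .{ vram.domains, vram.flags });
    std.debug.print("gtt:  domains=0x{x} flags=0x{x}\n", .{ gtt.domains, gtt.flags });
}
```

Whether a given combination is accepted depends on the kernel and device; the sketch only shows how the masks combine.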
constant
AMDGPU_VA_OP_MAP
pub const AMDGPU_VA_OP_MAP: u32 = 1 `DRM_AMDGPU_GEM_VA` operation that binds a BO into the device virtual address space.
constant
AMDGPU_VM_PAGE_READABLE
pub const AMDGPU_VM_PAGE_READABLE: u32 = 1 << 1 VA mapping flag granting GPU read access to the mapped range.
constant
AMDGPU_VM_PAGE_WRITEABLE
pub const AMDGPU_VM_PAGE_WRITEABLE: u32 = 1 << 2 VA mapping flag granting GPU write access to the mapped range.
constant
AMDGPU_VM_PAGE_EXECUTABLE
pub const AMDGPU_VM_PAGE_EXECUTABLE: u32 = 1 << 3 VA mapping flag granting GPU shader-execute access to the mapped range.
constant
AMDGPU_VM_MTYPE_DEFAULT
pub const AMDGPU_VM_MTYPE_DEFAULT: u32 = 0 << 5 VA mapping flag selecting the default memory type (MTYPE) for the GPU page table entry.
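The VA mapping flags combine the same way: permission bits are ORed with an MTYPE selector to form the `flags` argument of a VA map. A small sketch, assuming the values listed above; `pageFlags` is a hypothetical helper, not an export of this module.

```zig
const std = @import("std");

// Values copied from the constants documented above.
const AMDGPU_VM_PAGE_READABLE: u32 = 1 << 1;
const AMDGPU_VM_PAGE_WRITEABLE: u32 = 1 << 2;
const AMDGPU_VM_PAGE_EXECUTABLE: u32 = 1 << 3;
const AMDGPU_VM_MTYPE_DEFAULT: u32 = 0 << 5;

// Hypothetical helper: build a VA mapping flag word from three permissions,
// always selecting the default memory type.
fn pageFlags(read: bool, write: bool, exec: bool) u32 {
    var flags: u32 = AMDGPU_VM_MTYPE_DEFAULT;
    if (read) flags |= AMDGPU_VM_PAGE_READABLE;
    if (write) flags |= AMDGPU_VM_PAGE_WRITEABLE;
    if (exec) flags |= AMDGPU_VM_PAGE_EXECUTABLE;
    return flags;
}

pub fn main() void {
    // Data buffer: GPU may read and write, but not execute.
    std.debug.print("data   = 0x{x}\n", .{pageFlags(true, true, false)});
    // Shader code: GPU may read and execute, but not write.
    std.debug.print("shader = 0x{x}\n", .{pageFlags(true, false, true)});
}
```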
struct
DrmAmdgpuGemCreateIn
pub const DrmAmdgpuGemCreateIn = extern struct Input layout for the `DRM_IOCTL_AMDGPU_GEM_CREATE` ioctl.
Mirrors the kernel UAPI struct describing the requested buffer size, alignment, domain mask, and creation flags.
struct
DrmAmdgpuGemCreateOut
pub const DrmAmdgpuGemCreateOut = extern struct Output layout returned by `DRM_IOCTL_AMDGPU_GEM_CREATE`, holding the freshly allocated BO handle.
union
DrmAmdgpuGemCreate
pub const DrmAmdgpuGemCreate = extern union Untagged union overlaying the in/out forms of the GEM-create ioctl in the same buffer the kernel reads and writes.
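The in/out overlay can be illustrated without touching a device. The sketch below mirrors the kernel's `drm_amdgpu_gem_create_in` and `drm_amdgpu_gem_create_out` field order; the Zig field names used by this module are an assumption, and `fakeIoctl` is a hypothetical stand-in for the kernel that reads the input form and then overwrites the same storage with the output form.

```zig
const std = @import("std");

// Field order follows the kernel UAPI; names here are assumptions.
const GemCreateIn = extern struct {
    bo_size: u64,
    alignment: u64,
    domains: u64,
    domain_flags: u64,
};

const GemCreateOut = extern struct {
    handle: u32,
    _pad: u32,
};

// Both forms share one buffer; an extern union carries no tag.
const GemCreate = extern union {
    in: GemCreateIn,
    out: GemCreateOut,
};

comptime {
    // The union is as large as its largest member (the input form).
    std.debug.assert(@sizeOf(GemCreate) == @sizeOf(GemCreateIn));
}

// Hypothetical stand-in for the kernel side of the ioctl: read the request,
// then write the reply into the same storage.
fn fakeIoctl(arg: *GemCreate) void {
    const handle: u32 = if (arg.in.bo_size == 0) 0 else 7;
    arg.* = .{ .out = .{ .handle = handle, ._pad = 0 } };
}

pub fn main() void {
    var arg = GemCreate{ .in = .{
        .bo_size = 4096,
        .alignment = 4096,
        .domains = 0x4, // AMDGPU_GEM_DOMAIN_VRAM
        .domain_flags = 0,
    } };
    fakeIoctl(&arg);
    std.debug.print("handle={d}\n", .{arg.out.handle});
}
```

Because the reply overwrites the request, any input field needed after the call (such as the size kept in `Bo`) has to be saved before the ioctl is issued.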
struct
DrmAmdgpuGemMmapIn
pub const DrmAmdgpuGemMmapIn = extern struct Input layout for `DRM_IOCTL_AMDGPU_GEM_MMAP`, identifying the BO to expose to userspace.
struct
DrmAmdgpuGemMmapOut
pub const DrmAmdgpuGemMmapOut = extern struct Output layout returned by `DRM_IOCTL_AMDGPU_GEM_MMAP` with the file offset to pass to `mmap`.
union
DrmAmdgpuGemMmap
pub const DrmAmdgpuGemMmap = extern union Untagged union overlaying the in/out forms of the GEM-mmap ioctl in one shared buffer.
struct
DrmAmdgpuGemVa
pub const DrmAmdgpuGemVa = extern struct Argument layout for `DRM_IOCTL_AMDGPU_GEM_VA`, the ioctl that maps a BO into the GPU virtual address space.
Encodes the BO handle, VA operation, page-permission flags, target VA range, and any syncobj fence handles.
struct
DrmAmdgpuInfo
pub const DrmAmdgpuInfo = extern struct Argument layout for `DRM_IOCTL_AMDGPU_INFO`, the generic info-query ioctl.
`return_pointer`/`return_size` describe a userspace output buffer; `query` selects a sub-query, and that sub-query's parameters live in `query_data`.
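As a rough illustration of how these fields fit together for an `AMDGPU_INFO_HW_IP_INFO` query, the sketch below fills a reduced request. It assumes Zig 0.11 or newer (for `@intFromPtr`). `InfoRequest` models `query_data` as only its HW-IP variant padded to 16 bytes, which is an assumption about the layout rather than the module's actual `DrmAmdgpuInfo` definition, and `hwIpRequest` is a hypothetical helper.

```zig
const std = @import("std");

// Values copied from the constants documented above.
const AMDGPU_HW_IP_COMPUTE: u32 = 1;
const AMDGPU_INFO_HW_IP_INFO: u32 = 0x02;

// HW-IP variant of the query parameters (assumed field names).
const QueryHwIp = extern struct { ip_type: u32, ip_instance: u32 };

// Reduced mirror of the info request: output buffer description, sub-query
// selector, then the sub-query parameters padded to an assumed 16 bytes.
const InfoRequest = extern struct {
    return_pointer: u64,
    return_size: u32,
    query: u32,
    query_hw_ip: QueryHwIp,
    _reserved: [2]u32,
};

// Hypothetical helper: describe `out` as the buffer the kernel should fill
// with HW IP info for `ip_type`, instance 0.
fn hwIpRequest(comptime Out: type, out: *Out, ip_type: u32) InfoRequest {
    return .{
        .return_pointer = @intFromPtr(out),
        .return_size = @sizeOf(Out),
        .query = AMDGPU_INFO_HW_IP_INFO,
        .query_hw_ip = .{ .ip_type = ip_type, .ip_instance = 0 },
        ._reserved = .{ 0, 0 },
    };
}

pub fn main() void {
    var out = [_]u8{0} ** 64; // placeholder output buffer
    const req = hwIpRequest([64]u8, &out, AMDGPU_HW_IP_COMPUTE);
    std.debug.print("query=0x{x} return_size={d} ip_type={d}\n", .{
        req.query, req.return_size, req.query_hw_ip.ip_type,
    });
}
```

The sketch stops at building the request; issuing the ioctl and checking its result is what `queryHwIp` is documented to do.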
struct
DrmAmdgpuInfoHwIp
pub const DrmAmdgpuInfoHwIp = extern struct Output buffer for `AMDGPU_INFO_HW_IP_INFO`.
Reports the HW IP version and capabilities, ring-buffer alignment requirements, the bitmask of available kernel rings, and the count of user-queue slots. Tier 2 checks both `available_rings` and `userq_num_slots` to confirm UMQ support.
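One plausible reading of that check is sketched below. The exact rule tier 2 applies is not stated here, so the decision logic is an assumption: a nonzero slot count is treated as "user queues exposed", and a nonzero ring mask with no slots as "kernel rings only". `HwIpSummary`, `ComputeSubmitPath`, and `classify` are hypothetical names.

```zig
const std = @import("std");

// Reduced stand-in for DrmAmdgpuInfoHwIp holding only the two fields
// discussed above.
const HwIpSummary = struct {
    available_rings: u32,
    userq_num_slots: u32,
};

// Hypothetical classification of how compute work could be submitted.
const ComputeSubmitPath = enum { none, kernel_rings_only, user_queues };

// Assumed rule: any user-queue slot means the UMQ surface is exposed;
// otherwise fall back to whatever kernel rings are reported.
fn classify(info: HwIpSummary) ComputeSubmitPath {
    if (info.userq_num_slots != 0) return .user_queues;
    if (info.available_rings != 0) return .kernel_rings_only;
    return .none;
}

pub fn main() void {
    const samples = [_]HwIpSummary{
        .{ .available_rings = 0x0, .userq_num_slots = 0 },
        .{ .available_rings = 0xf, .userq_num_slots = 0 },
        .{ .available_rings = 0xf, .userq_num_slots = 8 },
    };
    for (samples) |s| {
        std.debug.print("rings=0x{x} slots={d} -> {s}\n", .{
            s.available_rings, s.userq_num_slots, @tagName(classify(s)),
        });
    }
}
```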
union
DrmAmdgpuInfoUqMetadata
pub const DrmAmdgpuInfoUqMetadata = extern union Output buffer for `AMDGPU_INFO_UQ_FW_AREAS`, sized per IP type.
Reports the per-queue scratch buffers the firmware needs: shadow/CSA areas for GFX, the EOP buffer for compute, and the CSA area for SDMA.
struct
DrmAmdgpuUserqIn
pub const DrmAmdgpuUserqIn = extern struct Input layout for the `DRM_IOCTL_AMDGPU_USERQ` ioctl.
Describes the create/free op, the target IP (compute, gfx, sdma), the doorbell BO handle and slot offset within it, the VA ranges of the queue ring buffer plus its read/write pointers, and a pointer to the IP-specific MQD blob.
struct
DrmAmdgpuUserqOut
pub const DrmAmdgpuUserqOut = extern struct Output layout returned by a successful `AMDGPU_USERQ_OP_CREATE`, containing the kernel-assigned queue id used by subsequent ioctls.
union
DrmAmdgpuUserq
pub const DrmAmdgpuUserq = extern union Untagged union overlaying the in/out forms of the user-queue ioctl in the same buffer.
struct
DrmAmdgpuUserqMqdComputeGfx11
pub const DrmAmdgpuUserqMqdComputeGfx11 = extern struct GFX11 compute MQD payload pointed at by `DrmAmdgpuUserqIn.mqd`.
Currently only the EOP scratch VA is required; matches the kernel's `drm_amdgpu_userq_mqd_compute_gfx11` layout.
struct
Bo
pub const Bo = struct Thin handle for a kernel-managed GEM buffer object.
Pairs the kernel GEM handle with the allocation size so callers can re-issue ioctls (mmap, VA map) without re-querying the size.
function
queryComputeUserq
pub fn queryComputeUserq(render_node: []const u8) QueryResult Probe an AMDGPU render node to decide whether the compute user-mode-queue surface is usable.
Opens the render node, asks the kernel for compute HW IP info plus user-queue firmware areas, and validates that the queue slots and EOP buffer parameters look sane.
function
queryHwIp
pub fn queryHwIp(file: std.fs.File, ip_type: u32) !DrmAmdgpuInfoHwIp Issue `AMDGPU_INFO_HW_IP_INFO` for the given HW IP type on an open render-node file.
function
createGem
pub fn createGem(file: std.fs.File, size: u64, alignment: u64, domains: u64, flags: u64) !Bo Allocate a GEM buffer object via `DRM_IOCTL_AMDGPU_GEM_CREATE`.
function
mmapGem
pub fn mmapGem(file: std.fs.File, bo: Bo, prot: u32) ![]align(std.heap.page_size_min) u8 Map a previously created GEM BO into the calling process's address space.
First queries the kernel for the BO's mmap offset, then issues a shared `mmap` against the render-node fd at that offset.
function
mapGemVa
pub fn mapGemVa(file: std.fs.File, bo: Bo, va: u64, flags: u32) !void Bind a GEM BO into the device virtual address space at a caller-chosen VA.
Performs an `AMDGPU_VA_OP_MAP` over the full BO size starting at offset 0.
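To make "full BO size starting at offset 0" concrete, the sketch below fills the leading fields of a VA-map argument from a BO handle and size. `GemVaArgs` mirrors only the leading fields of the kernel's `drm_amdgpu_gem_va` and omits the trailing syncobj fields, so it is illustrative and not suitable for a real ioctl. The `Bo` field names and `fullBoMapArgs` are assumptions.

```zig
const std = @import("std");

// Values copied from the constants documented above.
const AMDGPU_VA_OP_MAP: u32 = 1;
const AMDGPU_VM_PAGE_READABLE: u32 = 1 << 1;
const AMDGPU_VM_PAGE_WRITEABLE: u32 = 1 << 2;

// Leading fields of the VA ioctl argument; trailing fields omitted.
const GemVaArgs = extern struct {
    handle: u32,
    _pad: u32,
    operation: u32,
    flags: u32,
    va_address: u64,
    offset_in_bo: u64,
    map_size: u64,
};

// Assumed shape of the module's Bo handle: GEM handle plus allocation size.
const Bo = struct { handle: u32, size: u64 };

// Hypothetical helper: map the whole BO at `va`, starting at BO offset 0.
fn fullBoMapArgs(bo: Bo, va: u64, flags: u32) GemVaArgs {
    return .{
        .handle = bo.handle,
        ._pad = 0,
        .operation = AMDGPU_VA_OP_MAP,
        .flags = flags,
        .va_address = va,
        .offset_in_bo = 0,
        .map_size = bo.size,
    };
}

pub fn main() void {
    const bo = Bo{ .handle = 3, .size = 64 * 1024 };
    const args = fullBoMapArgs(bo, 0x1_0000_0000, AMDGPU_VM_PAGE_READABLE | AMDGPU_VM_PAGE_WRITEABLE);
    std.debug.print("map handle={d} va=0x{x} size={d} flags=0x{x}\n", .{
        args.handle, args.va_address, args.map_size, args.flags,
    });
}
```

Keeping the size inside `Bo` is what lets a helper like this derive `map_size` without another query, as the `Bo` description notes.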
function
createComputeUserq
pub fn createComputeUserq(file: std.fs.File, doorbell_handle: u32, doorbell_offset: u32, queue_va: u64, queue_size: u64, rptr_va: u64, wptr_va: u64, eop_va: u64, flags: u32) !u32 Create a compute user-mode queue via `DRM_IOCTL_AMDGPU_USERQ`.
Builds a GFX11 compute MQD pointing at the caller-supplied EOP scratch VA and submits an `AMDGPU_USERQ_OP_CREATE`.
function
freeUserq
pub fn freeUserq(file: std.fs.File, queue_id: u32) !void Release a user-mode queue previously returned by `createComputeUserq` via `AMDGPU_USERQ_OP_FREE`.
function
lastErrno
pub fn lastErrno() ?linux.E Return the errno captured by the most recent ioctl performed through this module.