Last updated: 2026-06-12
Inference Runtime
Packet
PM4 packet builder shared by direct AMD ZINC_RT tiers.
This is intentionally syntax-only: it does not know about model shapes or IR op semantics. M1 lowering hands already-decided register writes and dispatch dimensions to this builder, then T2/T1 copy the resulting dwords into their user queue rings.
11 exports shown
constant
Error
pub const Error = error{OutOfSpace} Failure modes returned by `PacketBuilder` operations.
`OutOfSpace` means the caller-provided dword buffer cannot fit another PM4 packet without overrunning its bounds.
constant
sh_reg_num_thread_x
pub const sh_reg_num_thread_x: u32 = 0x207 SH register offset for `COMPUTE_NUM_THREAD_X` (workgroup X dimension).
constant
sh_reg_pgm_lo
pub const sh_reg_pgm_lo: u32 = 0x20c SH register offset for `COMPUTE_PGM_LO` (low 32 bits of the shader address).
constant
sh_reg_pgm_rsrc1
pub const sh_reg_pgm_rsrc1: u32 = 0x212 SH register offset for `COMPUTE_PGM_RSRC1` (VGPR/SGPR counts and float mode).
constant
sh_reg_resource_limits
pub const sh_reg_resource_limits: u32 = 0x215 SH register offset for `COMPUTE_RESOURCE_LIMITS` (waves-per-CU and locking).
constant
sh_reg_pgm_rsrc3
pub const sh_reg_pgm_rsrc3: u32 = 0x228 SH register offset for `COMPUTE_PGM_RSRC3` (extra GFX11+ shader resource bits).
constant
compute_user_data_0
pub const compute_user_data_0: u32 = 0x240 SH register offset for `COMPUTE_USER_DATA_0`; subsequent slots are contiguous and used to pass kernel argument pointers.
constant
dispatch_initiator_compute
pub const dispatch_initiator_compute: u32 = 5 Default `DISPATCH_INITIATOR` value enabling the compute pipeline.
struct
PacketBuilder
pub const PacketBuilder = struct Cursor-style writer that emits PM4 type-3 packets into a caller-owned dword buffer.
The builder is allocation-free and stateless beyond a write cursor, so callers can reuse the same backing buffer across submissions by calling `reset`.
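The cursor-style design described above can be sketched as follows. This is a minimal illustration, not the actual `PacketBuilder` source: `CursorWriter`, its `len` field, and the `push` helper are hypothetical names, and only the cursor bookkeeping is modelled.

```zig
const std = @import("std");

// Mirrors the documented error set; the real definition lives in the module.
const Error = error{OutOfSpace};

// Hypothetical stand-in for PacketBuilder: a slice plus a write cursor.
const CursorWriter = struct {
    words: []u32, // caller-owned backing buffer, never reallocated
    len: usize = 0, // number of dwords written so far

    fn init(words: []u32) CursorWriter {
        return .{ .words = words };
    }

    // Rewind the cursor; the old contents stay in the buffer untouched.
    fn reset(self: *CursorWriter) void {
        self.len = 0;
    }

    // Borrowed view of the dwords emitted so far.
    fn written(self: *const CursorWriter) []const u32 {
        return self.words[0..self.len];
    }

    // Append one dword, failing instead of overrunning the buffer.
    fn push(self: *CursorWriter, dword: u32) Error!void {
        if (self.len >= self.words.len) return Error.OutOfSpace;
        self.words[self.len] = dword;
        self.len += 1;
    }
};
```

Because the only state is the cursor, reusing a buffer for the next submission costs a single `reset` call and no allocation.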
Methods (13)
method
PacketBuilder.init
pub fn init(words: []u32) PacketBuilder Wrap a pre-allocated dword buffer.
Writing starts at index 0 and never grows the slice.
method
PacketBuilder.reset
pub fn reset(self: *PacketBuilder) void Rewind the write cursor without touching the backing buffer.
method
PacketBuilder.written
pub fn written(self: *const PacketBuilder) []const u32 Borrowed view of the dwords emitted so far.
method
PacketBuilder.writeNop
pub fn writeNop(self: *PacketBuilder, payload_dwords: u32) Error!void Emit a PM4 `NOP` packet that consumes `payload_dwords` body dwords.
`payload_dwords` is raised to a minimum of 1 to satisfy the PKT3 body-size encoding.
method
PacketBuilder.setShReg
pub fn setShReg(self: *PacketBuilder, reg_offset: u32, values: []const u32) Error!void Emit `SET_SH_REG` writing `values` into consecutive SH register slots.
method
PacketBuilder.setShRegOne
pub fn setShRegOne(self: *PacketBuilder, reg_offset: u32, value: u32) Error!void Convenience helper that writes a single SH register.
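As an illustration of what a `SET_SH_REG` packet looks like on the wire, the sketch below builds one into a caller-provided slice. It assumes the common PM4 type-3 header layout (packet type in bits 31:30, body dword count minus one in bits 29:16, opcode in bits 15:8) and the opcode value `0x76` used by the Linux amdgpu headers. `pkt3Header` and `encodeSetShReg` are hypothetical helper names, not part of this module, and the low header flag bits (predicate, shader type) are left at zero, which a real compute submission may need to change. Builtin syntax assumes Zig 0.11 or newer.

```zig
const std = @import("std");

// SET_SH_REG opcode as found in the Linux amdgpu PM4 headers (assumption).
const op_set_sh_reg: u32 = 0x76;

// Type-3 header. `body_dwords` must be at least 1 because the count field
// stores the body length minus one.
fn pkt3Header(opcode: u32, body_dwords: u32) u32 {
    return (@as(u32, 3) << 30) | (((body_dwords - 1) & 0x3fff) << 16) | ((opcode & 0xff) << 8);
}

// Hypothetical free function: header, then the SH register offset, then one
// dword per consecutive register.
fn encodeSetShReg(out: []u32, reg_offset: u32, values: []const u32) error{OutOfSpace}![]const u32 {
    const total = 2 + values.len; // header + offset + payload
    if (total > out.len) return error.OutOfSpace;
    out[0] = pkt3Header(op_set_sh_reg, @intCast(1 + values.len));
    out[1] = reg_offset;
    @memcpy(out[2..total], values);
    return out[0..total];
}
```

Writing a single register is then the special case of a one-element `values` slice, which is presumably what `setShRegOne` wraps.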
method
PacketBuilder.setUserData64
pub fn setUserData64(self: *PacketBuilder, slot: u32, value: u64) Error!void Write a 64-bit value into a pair of contiguous `COMPUTE_USER_DATA_*` slots, little-endian (low dword first).
method
PacketBuilder.dispatchDirect
pub fn dispatchDirect(self: *PacketBuilder, dim_x: u32, dim_y: u32, dim_z: u32) Error!void Emit `DISPATCH_DIRECT` with a zero dispatch-initiator field.
Prefer `dispatchDirectInitiator` when `COMPUTE_DISPATCH_INITIATOR` bits must be set explicitly (e.g. `dispatch_initiator_compute`).
method
PacketBuilder.dispatchDirectInitiator
pub fn dispatchDirectInitiator(self: *PacketBuilder, dim_x: u32, dim_y: u32, dim_z: u32, dispatch_initiator: u32) Error!void Emit `DISPATCH_DIRECT` with a caller-supplied dispatch initiator value.
Pass 0 to take the firmware default, or use `dispatch_initiator_compute` to force-enable the compute pipeline.
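The two dispatch methods differ only in the last body dword. A rough sketch of the five-dword packet follows, assuming the usual type-3 header layout and the `DISPATCH_DIRECT` opcode `0x15` from the Linux amdgpu headers. `pkt3Header` and `encodeDispatchDirect` are hypothetical names, and the header flag bits (including the compute shader-type bit that some drivers set) are left at zero here.

```zig
const std = @import("std");

// DISPATCH_DIRECT opcode as found in the Linux amdgpu PM4 headers (assumption).
const op_dispatch_direct: u32 = 0x15;

// Value documented for this module's `dispatch_initiator_compute` constant.
const dispatch_initiator_compute: u32 = 5;

// Type-3 header; the count field stores the body length minus one.
fn pkt3Header(opcode: u32, body_dwords: u32) u32 {
    return (@as(u32, 3) << 30) | (((body_dwords - 1) & 0x3fff) << 16) | ((opcode & 0xff) << 8);
}

// Hypothetical encoder: header, three workgroup-count dimensions, initiator.
fn encodeDispatchDirect(dim_x: u32, dim_y: u32, dim_z: u32, initiator: u32) [5]u32 {
    return .{ pkt3Header(op_dispatch_direct, 4), dim_x, dim_y, dim_z, initiator };
}
```

Under these assumptions, `dispatchDirect(x, y, z)` corresponds to `encodeDispatchDirect(x, y, z, 0)`, and passing `dispatch_initiator_compute` fills the final dword with 5.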
method
PacketBuilder.releaseMemSignal
pub fn releaseMemSignal(self: *PacketBuilder, gpu_addr: u64, value: u64) Error!void Emit a GFX10+ `RELEASE_MEM` end-of-pipe fence that writes `value` to `gpu_addr` after prior shader work and global-memory writes complete.
Only the GFX10+ encoding is emitted for compute-ring fences; older ASICs are not a ZINC_RT direct target.
method
PacketBuilder.writeData64
pub fn writeData64(self: *PacketBuilder, gpu_addr: u64, value: u64) Error!void Emit `WRITE_DATA` that stores `value` (64 bits) at `gpu_addr` via the ME (micro-engine) with WR_CONFIRM set.
The ME stalls until the write lands in memory (WR_CONFIRM=1), so callers observe the value as soon as the packet retires.
method
PacketBuilder.copyData32
pub fn copyData32(self: *PacketBuilder, src_gpu_addr: u64, dst_gpu_addr: u64) Error!void Emit `COPY_DATA` that copies a single 32-bit dword from one GPU memory address to another with WR_CONFIRM set.
The command processor stalls until the destination write completes.
method
PacketBuilder.padToAlignment
pub fn padToAlignment(self: *PacketBuilder, dword_alignment: usize) Error!void Pad the buffer with `NOP` packets until the current write cursor is a multiple of `dword_alignment` dwords.
The gap is filled in a single packet rather than spinning out many minimum-size NOPs.
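The size of the gap that padding has to cover is simple modular arithmetic. The helper below is a hypothetical illustration of that calculation only; it does not emit any packet, and it assumes `dword_alignment` is nonzero.

```zig
const std = @import("std");

// Number of dwords needed to advance `cursor` to the next multiple of
// `dword_alignment`. Returns 0 when the cursor is already aligned.
// `dword_alignment` must be nonzero, otherwise the remainder is undefined.
fn paddingDwords(cursor: usize, dword_alignment: usize) usize {
    const rem = cursor % dword_alignment;
    return if (rem == 0) 0 else dword_alignment - rem;
}
```

For example, a cursor at dword 5 with an alignment of 8 leaves a 3-dword gap, while a cursor at 16 needs no padding.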
function
lo32
pub fn lo32(value: u64) u32 Extract the low 32 bits of a 64-bit value for little-endian dword writes.
function
hi32
pub fn hi32(value: u64) u32 Extract the high 32 bits of a 64-bit value for little-endian dword writes.
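A plausible implementation of the two split helpers, together with the low-dword-first ordering that `setUserData64` is documented to use, is sketched below. The function bodies are assumptions (only the signatures appear in this reference), `splitLowFirst` is a hypothetical name, and the `@truncate` form shown requires Zig 0.11 or newer.

```zig
const std = @import("std");

// Low 32 bits: truncation keeps the least significant dword.
fn lo32(value: u64) u32 {
    return @truncate(value);
}

// High 32 bits: shift the upper dword down, then truncate.
fn hi32(value: u64) u32 {
    return @truncate(value >> 32);
}

// Hypothetical helper showing the dword order for a 64-bit write:
// low dword in the first slot, high dword in the next.
fn splitLowFirst(value: u64) [2]u32 {
    return .{ lo32(value), hi32(value) };
}
```

With this ordering, a GPU address such as `0x0000_7f12_3456_8000` lands as `0x3456_8000` in the first slot and `0x0000_7f12` in the second.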