Last updated: 2026-06-12

Inference Runtime

Packet

PM4 packet builder shared by direct AMD ZINC_RT tiers.

This is intentionally syntax-only: it does not know about model shapes or IR op semantics. M1 lowering hands already-decided register writes and dispatch dimensions to this builder, then T2/T1 copy the resulting dwords into their user queue rings.

11 exports 13 methods src/zinc_rt/ring/packet.zig

constant

Error

pub const Error = error{OutOfSpace}

Failure modes returned by `PacketBuilder` operations.

`OutOfSpace` means the caller-provided dword buffer cannot fit another PM4 packet without overrunning its bounds.

src/zinc_rt/ring/packet.zig:13

constant

sh_reg_num_thread_x

pub const sh_reg_num_thread_x: u32 = 0x207

SH register offset for `COMPUTE_NUM_THREAD_X` (workgroup X dimension).

src/zinc_rt/ring/packet.zig:34

constant

sh_reg_pgm_lo

pub const sh_reg_pgm_lo: u32 = 0x20c

SH register offset for `COMPUTE_PGM_LO` (low 32 bits of the shader address).

src/zinc_rt/ring/packet.zig:36

constant

sh_reg_pgm_rsrc1

pub const sh_reg_pgm_rsrc1: u32 = 0x212

SH register offset for `COMPUTE_PGM_RSRC1` (VGPR/SGPR counts and float mode).

src/zinc_rt/ring/packet.zig:38

constant

sh_reg_resource_limits

pub const sh_reg_resource_limits: u32 = 0x215

SH register offset for `COMPUTE_RESOURCE_LIMITS` (waves-per-CU and locking).

src/zinc_rt/ring/packet.zig:40

constant

sh_reg_pgm_rsrc3

pub const sh_reg_pgm_rsrc3: u32 = 0x228

SH register offset for `COMPUTE_PGM_RSRC3` (extra GFX11+ shader resource bits).

src/zinc_rt/ring/packet.zig:42

constant

compute_user_data_0

pub const compute_user_data_0: u32 = 0x240

SH register offset for `COMPUTE_USER_DATA_0`; subsequent slots are contiguous and used to pass kernel argument pointers.

src/zinc_rt/ring/packet.zig:45

constant

dispatch_initiator_compute

pub const dispatch_initiator_compute: u32 = 5

Default `DISPATCH_INITIATOR` value enabling the compute pipeline.

src/zinc_rt/ring/packet.zig:47

struct

PacketBuilder

pub const PacketBuilder = struct

Cursor-style writer that emits PM4 type-3 packets into a caller-owned dword buffer.

The builder is allocation-free and stateless beyond a write cursor, so callers can reuse the same backing buffer across submissions by calling `reset`.

src/zinc_rt/ring/packet.zig:54
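
As a rough illustration of the cursor-style design described above, the sketch below implements a hypothetical `MiniBuilder` with the same `init`/`reset`/`written` shape. It is not the ZINC_RT source: the `emit` helper and the field names are assumptions, and the code assumes Zig 0.11 or newer (two-argument `@memcpy`).

```zig
const std = @import("std");

// Hypothetical stand-in for the documented PacketBuilder; names are illustrative.
const Error = error{OutOfSpace};

const MiniBuilder = struct {
    words: []u32, // caller-owned backing storage
    len: usize = 0, // write cursor, in dwords

    fn init(words: []u32) MiniBuilder {
        return .{ .words = words };
    }

    // Rewind the cursor; old dwords stay in the buffer until overwritten.
    fn reset(self: *MiniBuilder) void {
        self.len = 0;
    }

    // Borrowed view of everything emitted since init or the last reset.
    fn written(self: *const MiniBuilder) []const u32 {
        return self.words[0..self.len];
    }

    // Append raw dwords, failing before any write if they would not fit.
    fn emit(self: *MiniBuilder, dwords: []const u32) Error!void {
        if (dwords.len > self.words.len - self.len) return error.OutOfSpace;
        @memcpy(self.words[self.len..][0..dwords.len], dwords);
        self.len += dwords.len;
    }
};

test "cursor advances, rewinds, and reports OutOfSpace" {
    var storage: [4]u32 = undefined;
    var b = MiniBuilder.init(&storage);
    try b.emit(&[_]u32{ 1, 2, 3 });
    try std.testing.expectEqual(@as(usize, 3), b.written().len);
    try std.testing.expectError(error.OutOfSpace, b.emit(&[_]u32{ 4, 5 }));
    b.reset();
    try std.testing.expectEqual(@as(usize, 0), b.written().len);
}
```

Because the only state is the cursor, reusing one buffer across submissions costs a single `reset` call and no allocation.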

Methods

13

method

PacketBuilder.init

pub fn init(words: []u32) PacketBuilder

Wrap a pre-allocated dword buffer.

Parameters
words
Backing storage; the builder writes packets starting at index 0 and never grows the slice.
Returns

A builder pointing at `words` with an empty write cursor.

src/zinc_rt/ring/packet.zig:62

method

PacketBuilder.reset

pub fn reset(self: *PacketBuilder) void

Rewind the write cursor without touching the backing buffer.

Parameters
self
Builder to reset; subsequent writes overwrite previous dwords.

src/zinc_rt/ring/packet.zig:68

method

PacketBuilder.written

pub fn written(self: *const PacketBuilder) []const u32

Borrowed view of the dwords emitted so far.

Parameters
self
Builder to inspect.
Returns

Slice of finalized packet words, ready to copy into a ring.

src/zinc_rt/ring/packet.zig:75

method

PacketBuilder.writeNop

pub fn writeNop(self: *PacketBuilder, payload_dwords: u32) Error!void

Emit a PM4 `NOP` packet that consumes `payload_dwords` body dwords.

Parameters
self
Builder to append to.
payload_dwords
Number of zero payload dwords; clamped to a minimum of 1 to satisfy the PKT3 body-size encoding.

src/zinc_rt/ring/packet.zig:83
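
The clamp to one payload dword follows from how a type-3 header stores its length: the count field holds the number of body dwords minus one. Below is a minimal sketch of that encoding, assuming the standard PM4 type-3 bit layout and the `NOP` opcode `0x10` as defined in the Linux amdgpu headers. `pkt3Header` and `nopHeader` are hypothetical names, and the real builder may also set bits (such as the shader-type bit) that this sketch leaves at zero.

```zig
const std = @import("std");

// Assumed opcode value, taken from the Linux amdgpu driver headers.
const op_nop: u8 = 0x10;

// PM4 type-3 header: bits [31:30] = 3, [29:16] = body dwords - 1, [15:8] = opcode.
// body_dwords must be at least 1 because the count field stores body_dwords - 1.
fn pkt3Header(opcode: u8, body_dwords: u14) u32 {
    std.debug.assert(body_dwords >= 1);
    const count: u32 = body_dwords - 1;
    return (@as(u32, 3) << 30) | (count << 16) | (@as(u32, opcode) << 8);
}

// Header for a NOP with the payload clamped to at least one dword.
// Payloads above the 14-bit range trip the @intCast safety check.
fn nopHeader(payload_dwords: u32) u32 {
    const body: u14 = @intCast(@max(payload_dwords, 1));
    return pkt3Header(op_nop, body);
}

test "a zero-payload request is clamped to one dword" {
    try std.testing.expectEqual(@as(u32, 0xC0001000), nopHeader(0));
    try std.testing.expectEqual(@as(u32, 0xC0001000), nopHeader(1));
}
```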

method

PacketBuilder.setShReg

pub fn setShReg(self: *PacketBuilder, reg_offset: u32, values: []const u32) Error!void

Emit `SET_SH_REG` writing `values` into consecutive SH register slots.

Parameters
self
Builder to append to.
reg_offset
Starting SH register offset, in dwords relative to the SH register base (byte address 0xB000).
values
Register values written in order; an empty slice is a no-op.

src/zinc_rt/ring/packet.zig:94
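
To make the packet shape concrete, here is a sketch of how a `SET_SH_REG` packet could be laid out: one header dword, one register-offset dword, then the values. It assumes the amdgpu opcode `0x76` and the standard type-3 header layout. `encodeSetShReg` is a hypothetical free function, not the builder's method, and it does not mask the count to 14 bits.

```zig
const std = @import("std");

const op_set_sh_reg: u32 = 0x76; // assumed opcode (amdgpu's PACKET3_SET_SH_REG)

// Encode SET_SH_REG into `out` and return the dwords used.
// Body = 1 offset dword + values, so the header count field equals values.len.
fn encodeSetShReg(out: []u32, reg_offset: u32, values: []const u32) error{OutOfSpace}![]const u32 {
    if (values.len == 0) return out[0..0]; // empty write is a no-op
    const total = 2 + values.len;
    if (total > out.len) return error.OutOfSpace;
    const count: u32 = @intCast(values.len);
    out[0] = (@as(u32, 3) << 30) | (count << 16) | (op_set_sh_reg << 8);
    out[1] = reg_offset;
    @memcpy(out[2..total], values);
    return out[0..total];
}

test "three consecutive registers starting at sh_reg_num_thread_x" {
    var buf: [8]u32 = undefined;
    const pkt = try encodeSetShReg(&buf, 0x207, &[_]u32{ 64, 1, 1 });
    try std.testing.expectEqualSlices(u32, &[_]u32{ 0xC0037600, 0x207, 64, 1, 1 }, pkt);
}
```

Writing X, Y, and Z thread counts in one packet works because the three registers occupy consecutive offsets.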

method

PacketBuilder.setShRegOne

pub fn setShRegOne(self: *PacketBuilder, reg_offset: u32, value: u32) Error!void

Convenience helper that writes a single SH register.

Parameters
self
Builder to append to.
reg_offset
SH register offset.
value
Value to write into that register.

src/zinc_rt/ring/packet.zig:107

method

PacketBuilder.setUserData64

pub fn setUserData64(self: *PacketBuilder, slot: u32, value: u64) Error!void

Write a 64-bit value into a pair of contiguous `COMPUTE_USER_DATA_*` slots, little-endian (low dword first).

Parameters
self
Builder to append to.
slot
Zero-based index added to `compute_user_data_0`.
value
64-bit kernel argument (typically a GPU virtual address).

src/zinc_rt/ring/packet.zig:117
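
The slot arithmetic and dword ordering can be sketched as follows. `UserDataWrite` and `userData64` are hypothetical names used only for illustration; the sketch assumes `slot` is a dword index, as the parameter description suggests, so a 64-bit argument occupies `slot` and `slot + 1`.

```zig
const std = @import("std");

const compute_user_data_0: u32 = 0x240; // value of the constant documented above

// Hypothetical result type: the register offset and the two dwords to write there.
const UserDataWrite = struct { reg_offset: u32, values: [2]u32 };

fn userData64(slot: u32, value: u64) UserDataWrite {
    return .{
        .reg_offset = compute_user_data_0 + slot,
        .values = .{
            @as(u32, @truncate(value)), // low dword first
            @as(u32, @truncate(value >> 32)), // high dword second
        },
    };
}

test "pointer argument placed at slot 2" {
    const w = userData64(2, 0x0000_7F12_3456_8000);
    try std.testing.expectEqual(@as(u32, 0x242), w.reg_offset);
    try std.testing.expectEqualSlices(u32, &[_]u32{ 0x3456_8000, 0x0000_7F12 }, &w.values);
}
```

Under that assumption, consecutive 64-bit arguments would use slots 0, 2, 4, and so on.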

method

PacketBuilder.dispatchDirect

pub fn dispatchDirect(self: *PacketBuilder, dim_x: u32, dim_y: u32, dim_z: u32) Error!void

Emit `DISPATCH_DIRECT` with a zero dispatch-initiator field.

Prefer `dispatchDirectInitiator` when `COMPUTE_DISPATCH_INITIATOR` bits must be set explicitly (e.g. `dispatch_initiator_compute`).

Parameters
self
Builder to append to.
dim_x
Workgroup count on X.
dim_y
Workgroup count on Y.
dim_z
Workgroup count on Z.

src/zinc_rt/ring/packet.zig:129

method

PacketBuilder.dispatchDirectInitiator

pub fn dispatchDirectInitiator(self: *PacketBuilder, dim_x: u32, dim_y: u32, dim_z: u32, dispatch_initiator: u32) Error!void

Emit `DISPATCH_DIRECT` with a caller-supplied dispatch initiator value.

Parameters
self
Builder to append to.
dim_x
Workgroup count on X.
dim_y
Workgroup count on Y.
dim_z
Workgroup count on Z.
dispatch_initiator
Raw `COMPUTE_DISPATCH_INITIATOR` bits; pass 0 to take the firmware default or use `dispatch_initiator_compute` to force-enable the compute pipeline.

src/zinc_rt/ring/packet.zig:141
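
Both dispatch methods produce the same five-dword packet and differ only in the last dword. The sketch below shows that layout, assuming the amdgpu opcode `0x15` and the standard type-3 header. `encodeDispatchDirect` is a hypothetical helper, not part of the documented API.

```zig
const std = @import("std");

const op_dispatch_direct: u32 = 0x15; // assumed opcode (amdgpu's PACKET3_DISPATCH_DIRECT)
const dispatch_initiator_compute: u32 = 5; // value of the constant documented above

// Header plus four body dwords: X, Y, Z workgroup counts and the initiator.
// Four body dwords means the header count field is 3.
fn encodeDispatchDirect(dim_x: u32, dim_y: u32, dim_z: u32, initiator: u32) [5]u32 {
    const header = (@as(u32, 3) << 30) | (@as(u32, 3) << 16) | (op_dispatch_direct << 8);
    return .{ header, dim_x, dim_y, dim_z, initiator };
}

test "8x1x1 dispatch with the compute initiator" {
    const pkt = encodeDispatchDirect(8, 1, 1, dispatch_initiator_compute);
    try std.testing.expectEqualSlices(u32, &[_]u32{ 0xC0031500, 8, 1, 1, 5 }, &pkt);
}
```

In this sketch, the behaviour documented for `dispatchDirect` corresponds to passing `0` as the initiator.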

method

PacketBuilder.releaseMemSignal

pub fn releaseMemSignal(self: *PacketBuilder, gpu_addr: u64, value: u64) Error!void

Emit a GFX10+ `RELEASE_MEM` end-of-pipe fence that writes `value` to `gpu_addr` after prior shader work and global-memory writes complete.

Parameters
self
Builder to append to.
gpu_addr
64-bit GPU virtual address to receive the fence value.
value
Fence payload (typically a monotonically increasing seqno).
Notes

This uses the GFX10+ release-mem packet layout used by amdgpu for compute-ring fences; older ASICs are not a ZINC_RT direct target.

src/zinc_rt/ring/packet.zig:163

method

PacketBuilder.writeData64

pub fn writeData64(self: *PacketBuilder, gpu_addr: u64, value: u64) Error!void

Emit `WRITE_DATA` that stores `value` (64 bits) at `gpu_addr` via the ME (micro-engine) with WR_CONFIRM set.

The ME stalls until the write lands in memory (WR_CONFIRM=1), so callers observe the value as soon as the packet retires.

Parameters
self
Builder to append to.
gpu_addr
Destination GPU virtual address.
value
64-bit payload written little-endian.
Notes

Uses `dst_sel=5` (memory async/direct) and `engine_sel=0` (ME).

src/zinc_rt/ring/packet.zig:206
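
One plausible layout for this packet is sketched below. The field positions (`dst_sel` at bit 8, `WR_CONFIRM` at bit 20, `engine_sel` at bit 30) and the opcode `0x37` are assumptions taken from the Linux amdgpu packet macros, not from the ZINC_RT source, and `encodeWriteData64` is a hypothetical helper.

```zig
const std = @import("std");

const op_write_data: u32 = 0x37; // assumed opcode (amdgpu's PACKET3_WRITE_DATA)

// Assumed control-dword fields, following the amdgpu packet macros.
const dst_sel_memory: u32 = @as(u32, 5) << 8; // dst_sel = 5
const wr_confirm: u32 = @as(u32, 1) << 20; // WR_CONFIRM = 1
const engine_sel_me: u32 = @as(u32, 0) << 30; // engine_sel = 0 (ME)

// Header, control, address (lo, hi), then the 64-bit payload (lo, hi).
// Five body dwords means the header count field is 4.
fn encodeWriteData64(gpu_addr: u64, value: u64) [6]u32 {
    const header = (@as(u32, 3) << 30) | (@as(u32, 4) << 16) | (op_write_data << 8);
    const control = dst_sel_memory | wr_confirm | engine_sel_me;
    return .{
        header,
        control,
        @as(u32, @truncate(gpu_addr)),
        @as(u32, @truncate(gpu_addr >> 32)),
        @as(u32, @truncate(value)),
        @as(u32, @truncate(value >> 32)),
    };
}

test "write 42 to a 64-bit GPU address" {
    const pkt = encodeWriteData64(0x0000_0001_0000_2000, 42);
    try std.testing.expectEqualSlices(
        u32,
        &[_]u32{ 0xC0043700, 0x00100500, 0x0000_2000, 0x0000_0001, 42, 0 },
        &pkt,
    );
}
```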

method

PacketBuilder.copyData32

pub fn copyData32(self: *PacketBuilder, src_gpu_addr: u64, dst_gpu_addr: u64) Error!void

Emit `COPY_DATA` that copies a single 32-bit dword from one GPU memory address to another with WR_CONFIRM set.

Parameters
self
Builder to append to.
src_gpu_addr
Source GPU virtual address (memory, src_sel=1).
dst_gpu_addr
Destination GPU virtual address (memory, dst_sel=5).
Notes

Copies exactly 32 bits (count_sel=0). WR_CONFIRM=1 means the command processor stalls until the destination write completes.

src/zinc_rt/ring/packet.zig:229

method

PacketBuilder.padToAlignment

pub fn padToAlignment(self: *PacketBuilder, dword_alignment: usize) Error!void

Pad the buffer with `NOP` packets until the current write cursor is a multiple of `dword_alignment` dwords.

Parameters
self
Builder to pad.
dword_alignment
Required alignment in dwords (0 is a no-op).
Notes

Each emitted NOP carries enough payload to land on the alignment in a single packet rather than spinning out many minimum-size NOPs.

src/zinc_rt/ring/packet.zig:253
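
The size of the gap to fill is a plain modular computation. The sketch below covers only that arithmetic; `paddingDwords` is a hypothetical helper, and how the gap is split into a NOP header and payload is left to the builder.

```zig
const std = @import("std");

// Number of dwords needed to advance `cursor` to the next multiple of
// `dword_alignment`. An alignment of 0 is treated as "no padding".
fn paddingDwords(cursor: usize, dword_alignment: usize) usize {
    if (dword_alignment == 0) return 0;
    const rem = cursor % dword_alignment;
    return if (rem == 0) 0 else dword_alignment - rem;
}

test "pad a 5-dword stream to an 8-dword boundary" {
    try std.testing.expectEqual(@as(usize, 3), paddingDwords(5, 8));
    try std.testing.expectEqual(@as(usize, 0), paddingDwords(8, 8));
}
```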

function

lo32

pub fn lo32(value: u64) u32

Extract the low 32 bits of a 64-bit value for little-endian dword writes.

Parameters

value
64-bit input.

Returns

Bits [31:0] of `value`.

src/zinc_rt/ring/packet.zig:283

function

hi32

pub fn hi32(value: u64) u32

Extract the high 32 bits of a 64-bit value for little-endian dword writes.

Parameters

value
64-bit input.

Returns

Bits [63:32] of `value`.

src/zinc_rt/ring/packet.zig:290
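
A minimal sketch of how `lo32` and `hi32` could be written, with a round trip to show that the two halves reassemble the original value. The bodies are an assumption (the page shows only the signatures), `join64` is a hypothetical helper, and the code assumes Zig 0.11 or newer, where `@truncate` infers its result type.

```zig
const std = @import("std");

// Bits [31:0] of `value`.
fn lo32(value: u64) u32 {
    return @truncate(value);
}

// Bits [63:32] of `value`.
fn hi32(value: u64) u32 {
    return @truncate(value >> 32);
}

// Hypothetical inverse, used here only to check the split.
fn join64(lo: u32, hi: u32) u64 {
    return (@as(u64, hi) << 32) | lo;
}

test "split and rejoin a 64-bit value" {
    const v: u64 = 0x1122_3344_5566_7788;
    try std.testing.expectEqual(@as(u32, 0x5566_7788), lo32(v));
    try std.testing.expectEqual(@as(u32, 0x1122_3344), hi32(v));
    try std.testing.expectEqual(v, join64(lo32(v), hi32(v)));
}
```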