Last updated: 2026-06-12

Inference Runtime

Cs

AMDGPU DRM command-submission (CS) path — bring-up of the RADV / radeonsi PM4 submission foundation.

T1 PM4-direct reaches the AMD command processor through three Linux ABIs:

* `DRM_IOCTL_AMDGPU_USERQ`: the user-mode-queue ABI. The bench-node firmware reports zero compute USERQ slots, so it is unusable here (see `umq.zig`).
* `/dev/kfd` `AMDKFD_IOC_CREATE_QUEUE` plus a doorbell ring: this creates a raw `QUEUE_TYPE_COMPUTE` queue, but on this kernel the MES never retires the PM4 we stage in it (see `kfd.zig`).
* `DRM_IOCTL_AMDGPU_CS`: the kernel-managed command-submission UAPI that every AMD userspace driver (RADV, radeonsi, amdvlk) uses. The kernel owns the ring / doorbell / MES bookkeeping; userspace hands it an indirect buffer (IB) of PM4 and waits on the retired fence. This is the reliable foundation the GPU compute dispatch lowers onto.

This module brings the CS path's first retired PM4 batch up as a benchmark-visible gate: open the render node, query the compute HW IP, allocate an amdgpu context, create a persistent BO list for a GTT indirect-buffer BO plus data/signal/shader BOs, map them into the GPU VM at low VAs, submit PM4 streams through `DRM_IOCTL_AMDGPU_CS` using the same context/BO list, and wait for the returned fences with `DRM_IOCTL_AMDGPU_WAIT_CS`.

This is not the final T1/T2 ring from the design; it is the kernel-managed CS baseline used to validate packet bytes, BO residency, VM mapping, and fence retirement before lowering real decode slices onto the direct tiers.

36 exports 17 methods src/zinc_rt/ring/cs.zig

constant

default_render_node

#
pub const default_render_node = "/dev/dri/renderD128"

Default DRM render node used by the CS bring-up gate when no path is provided.

src/zinc_rt/ring/cs.zig:37

constant

AMDGPU_HW_IP_GFX

#
pub const AMDGPU_HW_IP_GFX: u32 = 0

amdgpu HW IP block id for the graphics ring (uapi/drm/amdgpu_drm.h).

src/zinc_rt/ring/cs.zig:41

constant

AMDGPU_HW_IP_COMPUTE

#
pub const AMDGPU_HW_IP_COMPUTE: u32 = 1

amdgpu HW IP block id for the async compute ring used by ZINC submissions.

src/zinc_rt/ring/cs.zig:43

constant

AMDGPU_CTX_OP_ALLOC_CTX

#
pub const AMDGPU_CTX_OP_ALLOC_CTX: u32 = 1

`DRM_AMDGPU_CTX` op selector for allocating a new submission context.

src/zinc_rt/ring/cs.zig:47

constant

AMDGPU_CTX_OP_FREE_CTX

#
pub const AMDGPU_CTX_OP_FREE_CTX: u32 = 2

`DRM_AMDGPU_CTX` op selector for releasing a previously allocated context.

src/zinc_rt/ring/cs.zig:49

constant

AMDGPU_BO_LIST_OP_CREATE

#
pub const AMDGPU_BO_LIST_OP_CREATE: u32 = 0

`DRM_AMDGPU_BO_LIST` op selector to create a residency BO list handle.

src/zinc_rt/ring/cs.zig:53

constant

AMDGPU_BO_LIST_OP_DESTROY

#
pub const AMDGPU_BO_LIST_OP_DESTROY: u32 = 1

`DRM_AMDGPU_BO_LIST` op selector to destroy a previously created BO list.

src/zinc_rt/ring/cs.zig:55

constant

AMDGPU_CHUNK_ID_IB

#
pub const AMDGPU_CHUNK_ID_IB: u32 = 0x01

CS chunk id for an indirect-buffer descriptor (`DrmAmdgpuCsChunkIb`).

src/zinc_rt/ring/cs.zig:59

constant

AMDGPU_CHUNK_ID_BO_HANDLES

#
pub const AMDGPU_CHUNK_ID_BO_HANDLES: u32 = 0x06

CS chunk id for an inline BO-handles list, an alternative to a pre-created BO list.

src/zinc_rt/ring/cs.zig:61

constant

AMDGPU_IB_FLAG_EMIT_MEM_SYNC

#
pub const AMDGPU_IB_FLAG_EMIT_MEM_SYNC: u32 = 1 << 6

IB flag instructing the kernel to emit a memory-sync packet around the IB so writes from the BO list reach DRAM before/after the dispatch.

src/zinc_rt/ring/cs.zig:66

struct

DrmAmdgpuCtxIn

#
pub const DrmAmdgpuCtxIn = extern struct

Input payload of `DRM_IOCTL_AMDGPU_CTX`: selects an op and carries the caller-supplied `ctx_id` and submission priority for that op.

src/zinc_rt/ring/cs.zig:77

struct

DrmAmdgpuCtxOutAlloc

#
pub const DrmAmdgpuCtxOutAlloc = extern struct

Output payload of `AMDGPU_CTX_OP_ALLOC_CTX`: the kernel-assigned context id returned in the same `DrmAmdgpuCtx` union after a successful allocation.

src/zinc_rt/ring/cs.zig:86

struct

DrmAmdgpuCtxOutState

#
pub const DrmAmdgpuCtxOutState = extern struct

Output payload of the `AMDGPU_CTX_OP_QUERY_STATE` op: GPU reset state and hang counter for the queried context (unused on the bring-up path).

src/zinc_rt/ring/cs.zig:93

union

DrmAmdgpuCtx

#
pub const DrmAmdgpuCtx = extern union

Untagged `extern` union passed to `DRM_IOCTL_AMDGPU_CTX`; the same bytes carry the input request and, after the call, one of the two output shapes (alloc / query-state).
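
The request/reply overlay can be sketched as follows. This is a minimal illustration that mirrors `union drm_amdgpu_ctx` from the Linux UAPI header; the type names `CtxIn`, `CtxOut` and `Ctx` are hypothetical stand-ins, and the field names in `cs.zig` itself may differ. No ioctl is issued; the code only shows that the first dword of the request and the returned `ctx_id` occupy the same bytes.

```zig
const std = @import("std");

// Hypothetical mirror of `union drm_amdgpu_ctx` (uapi/drm/amdgpu_drm.h).
// Names are illustrative; the real cs.zig definitions may differ.
const CtxIn = extern struct {
    op: u32,
    flags: u32,
    ctx_id: u32,
    priority: i32,
};

const CtxOut = extern union {
    alloc: extern struct { ctx_id: u32, _pad: u32 },
    state: extern struct { flags: u64, hangs: u32, reset_status: u32 },
};

const Ctx = extern union {
    in: CtxIn,
    out: CtxOut,
};

comptime {
    // Request and reply share one 16-byte buffer.
    std.debug.assert(@sizeOf(Ctx) == 16);
}

pub fn main() void {
    // AMDGPU_CTX_OP_ALLOC_CTX is 1 on this page.
    var ctx = Ctx{ .in = .{ .op = 1, .flags = 0, .ctx_id = 0, .priority = 0 } };
    // The ioctl would run here and overwrite the buffer with the reply.
    // Without it, the first dword still holds `op`, which shows the overlap.
    std.debug.print("out.alloc.ctx_id aliases in.op: {d}\n", .{ctx.out.alloc.ctx_id});
    ctx.out.alloc.ctx_id = 7; // stand-in for a kernel-assigned id
    std.debug.print("ctx_id = {d}\n", .{ctx.out.alloc.ctx_id});
}
```

Because the reply overwrites the request, a caller has to copy any input field it still needs before issuing the ioctl.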

src/zinc_rt/ring/cs.zig:101

struct

DrmAmdgpuBoListIn

#
pub const DrmAmdgpuBoListIn = extern struct

Input payload of `DRM_IOCTL_AMDGPU_BO_LIST`: op selector plus a pointer to an array of `DrmAmdgpuBoListEntry` describing the BOs the submission must keep resident.

src/zinc_rt/ring/cs.zig:110

struct

DrmAmdgpuBoListOut

#
pub const DrmAmdgpuBoListOut = extern struct

Output payload of `AMDGPU_BO_LIST_OP_CREATE`: the kernel-assigned BO list handle referenced from subsequent CS submissions.

src/zinc_rt/ring/cs.zig:120

union

DrmAmdgpuBoList

#
pub const DrmAmdgpuBoList = extern union

Untagged `extern` union passed to `DRM_IOCTL_AMDGPU_BO_LIST` covering input and output.

src/zinc_rt/ring/cs.zig:126

struct

DrmAmdgpuBoListEntry

#
pub const DrmAmdgpuBoListEntry = extern struct

Single residency entry inside a BO list: the GEM handle to make resident and a kernel-visible priority hint for eviction.

src/zinc_rt/ring/cs.zig:133

struct

DrmAmdgpuCsChunk

#
pub const DrmAmdgpuCsChunk = extern struct

One chunk inside a `DRM_IOCTL_AMDGPU_CS` submission: a typed sub-payload (`chunk_id`, length in dwords, pointer to the chunk data).

src/zinc_rt/ring/cs.zig:140

struct

DrmAmdgpuCsIn

#
pub const DrmAmdgpuCsIn = extern struct

Input payload of `DRM_IOCTL_AMDGPU_CS`: binds a context, BO list and an array of typed chunks (the IB descriptor lives in one of those chunks).

src/zinc_rt/ring/cs.zig:148

struct

DrmAmdgpuCsOut

#
pub const DrmAmdgpuCsOut = extern struct

Output payload of `DRM_IOCTL_AMDGPU_CS`: the fence handle the caller waits on via `DRM_IOCTL_AMDGPU_WAIT_CS` for the submission to retire.

src/zinc_rt/ring/cs.zig:158

union

DrmAmdgpuCs

#
pub const DrmAmdgpuCs = extern union

Untagged `extern` union passed to `DRM_IOCTL_AMDGPU_CS` covering input and output.

src/zinc_rt/ring/cs.zig:163

struct

DrmAmdgpuCsChunkIb

#
pub const DrmAmdgpuCsChunkIb = extern struct

Chunk payload for `AMDGPU_CHUNK_ID_IB`: describes the indirect-buffer VA, its size in bytes, the target IP type/ring, and submission flags such as `AMDGPU_IB_FLAG_EMIT_MEM_SYNC`.
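
A minimal sketch of how an IB descriptor and its enclosing chunk fit together, assuming the layouts of `struct drm_amdgpu_cs_chunk_ib` and `struct drm_amdgpu_cs_chunk` in the Linux UAPI header. The type names `ChunkIb` and `Chunk`, the helpers `describeIb` and `wrapIb`, and the GPU VA are hypothetical; only the three constants are taken from this page.

```zig
const std = @import("std");

// Values quoted from this page.
const AMDGPU_HW_IP_COMPUTE: u32 = 1;
const AMDGPU_CHUNK_ID_IB: u32 = 0x01;
const AMDGPU_IB_FLAG_EMIT_MEM_SYNC: u32 = 1 << 6;

// Hypothetical mirror of `struct drm_amdgpu_cs_chunk_ib`.
const ChunkIb = extern struct {
    _pad: u32 = 0,
    flags: u32,
    va_start: u64,
    ib_bytes: u32,
    ip_type: u32,
    ip_instance: u32 = 0,
    ring: u32 = 0,
};

// Hypothetical mirror of `struct drm_amdgpu_cs_chunk`.
const Chunk = extern struct {
    chunk_id: u32,
    length_dw: u32,
    chunk_data: u64,
};

/// Describe an IB holding `dwords` PM4 dwords mapped at GPU VA `va`.
fn describeIb(va: u64, dwords: u32) ChunkIb {
    return .{
        .flags = AMDGPU_IB_FLAG_EMIT_MEM_SYNC,
        .va_start = va,
        .ib_bytes = dwords * 4, // the descriptor carries bytes, not dwords
        .ip_type = AMDGPU_HW_IP_COMPUTE,
    };
}

/// Wrap an IB descriptor in the typed chunk the CS ioctl consumes.
fn wrapIb(ib: *const ChunkIb) Chunk {
    return .{
        .chunk_id = AMDGPU_CHUNK_ID_IB,
        .length_dw = @sizeOf(ChunkIb) / 4, // chunk length is in dwords
        .chunk_data = @intFromPtr(ib),
    };
}

pub fn main() void {
    const ib = describeIb(0x0010_0000, 8); // VA chosen for illustration only
    const chunk = wrapIb(&ib);
    std.debug.print("ib_bytes={d} length_dw={d}\n", .{ ib.ib_bytes, chunk.length_dw });
}
```

Note the two different units: the IB size is given in bytes, while the chunk length is given in dwords.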

src/zinc_rt/ring/cs.zig:171

struct

DrmAmdgpuWaitCsIn

#
pub const DrmAmdgpuWaitCsIn = extern struct

Input payload of `DRM_IOCTL_AMDGPU_WAIT_CS`: identifies the fence to wait on (by `handle`/`ctx_id` against a specific IP/ring) and the timeout.

src/zinc_rt/ring/cs.zig:183

struct

DrmAmdgpuWaitCsOut

#
pub const DrmAmdgpuWaitCsOut = extern struct

Output payload of `DRM_IOCTL_AMDGPU_WAIT_CS`: zero on successful retirement, nonzero if the submission is still busy when the timeout expires.

src/zinc_rt/ring/cs.zig:194

union

DrmAmdgpuWaitCs

#
pub const DrmAmdgpuWaitCs = extern union

Untagged `extern` union passed to `DRM_IOCTL_AMDGPU_WAIT_CS` covering input and output.

src/zinc_rt/ring/cs.zig:199

struct

ArgmaxRangeResult

#
pub const ArgmaxRangeResult = struct

Result produced by the ordered-score argmax row-range kernel.

src/zinc_rt/ring/cs.zig:765

struct

DmmvArgmaxResult

#
pub const DmmvArgmaxResult = struct

Result produced by a quantized DMMV row-range kernel that performs its own in-kernel argmax over the computed rows.

src/zinc_rt/ring/cs.zig:772

enum

SmokeStatus

#
pub const SmokeStatus = enum

Outcome classification for the CS bring-up smoke gate.

Each variant maps to a specific failure point in the open → submit → wait pipeline, so the benchmark UI can attribute a regression to render-node access, kernel ABI mismatch, BO/VA setup, submission, or fence retirement.

src/zinc_rt/ring/cs.zig:781

struct

SmokeResult

#
pub const SmokeResult = struct

Structured result returned by the CS bring-up smoke gate.

Captures the rendezvous addresses, kernel-assigned handles, observed signal value, fence handles, and the final `SmokeStatus` so benchmark output can surface a precise failure mode without re-running the path.

src/zinc_rt/ring/cs.zig:806

Methods

1

method

SmokeResult.ok

#
pub fn ok(self: SmokeResult) bool

Returns true when both PM4 submissions retired and the signal BO read back the expected sentinel value.

src/zinc_rt/ring/cs.zig:825

struct

TokenBoundary

#
pub const TokenBoundary = struct

Per-token CS submission context for the PM4 bring-up tiers.

Owns the long-lived amdgpu context, BO list, and the GPU-mapped indirect-buffer / input / output / signal / shader buffers used by the `copyU32`, `argmaxTop2`, `rmsNormElement0` and `dmmvF32RowRange` dispatches. Reused across many submissions so each decode step only re-records PM4 into the existing IB and re-submits via `DRM_IOCTL_AMDGPU_CS`.

src/zinc_rt/ring/cs.zig:837

Methods

16

method

TokenBoundary.initDefault

#
pub fn initDefault() !TokenBoundary

Open the canonical render node (`default_render_node`) and finish the full CS bring-up: context, BO list, IB / input / output / signal / shader buffers, all mapped into a low GPU VA range.

Returns

A ready `TokenBoundary` whose `builder` can record PM4 immediately.

src/zinc_rt/ring/cs.zig:862

method

TokenBoundary.initPath

#
pub fn initPath(render_node: []const u8) !TokenBoundary

Open the given render node and bring up the full CS submission state.

Allocates an amdgpu context, creates GTT-backed BOs for the indirect buffer, input scratch (~2 MiB), output, signal and shader pages, maps each into a fixed low GPU VA so the kernel does not need to re-bind them per submission, uploads the gfx1201 PM4 kernels into the shader page, and creates a persistent BO list referencing all five BOs.

Parameters
render_node
Absolute path to the amdgpu DRM render node (e.g. `/dev/dri/renderD128`).
Returns

A ready `TokenBoundary` on success; the relevant `error.*Failed` variant otherwise.

Notes

Linux-only; returns `error.UnsupportedOs` on other platforms.

src/zinc_rt/ring/cs.zig:876

method

TokenBoundary.deinit

#
pub fn deinit(self: *TokenBoundary) void

Tear down every kernel resource the `init*` paths created: destroy the BO list, free the amdgpu context, `munmap` each CPU mapping, and close the render-node file descriptor.

Notes

Leaves the struct in an `undefined` state; do not reuse it.

src/zinc_rt/ring/cs.zig:984

method

TokenBoundary.copyU32

#
pub fn copyU32(self: *TokenBoundary, value: u32) !u32

Round-trip one `u32` through the GPU as the simplest end-to-end gate: PM4 `COPY_DATA` from the input page to the output page, plus a `WRITE_DATA` of a per-submission sentinel into the signal page.

Parameters
value
32-bit payload to copy.
Returns

The value the GPU wrote into `output_map[0]`.

Notes

Returns `error.SignalMismatch` if the post-fence signal value does not match the expected sentinel.

src/zinc_rt/ring/cs.zig:1002

method

TokenBoundary.produceToken

#
pub fn produceToken(self: *TokenBoundary, token_id: u32) !u32

Alias for `copyU32` framed as the per-token decode pulse: prove the GPU produced a token by round-tripping its id through a real PM4 submission and fence wait.

Parameters
token_id
Token id to round-trip through the GPU.
Returns

The id the GPU echoed back into the output page.

src/zinc_rt/ring/cs.zig:1057

method

TokenBoundary.argmaxTop2

#
pub fn argmaxTop2( self: *TokenBoundary, token0: u32, score0: f32, token1: u32, score1: f32, ) !u32

Dispatch the gfx1201 top-2 argmax kernel and return the selected token.

Loads the argmax program into the compute SGPRs, packs the output VA, two ordered scores, and two token ids into `compute_user_data_2..7`, fires one workgroup, then waits on the signal sentinel before reading the kernel-chosen token from the output page.

Parameters
token0
First candidate token id.
score0
Logit/score for `token0` (compared via ordered f32 bits).
token1
Second candidate token id.
score1
Logit/score for `token1`.
Returns

Whichever of `token0`/`token1` the kernel selected.

Notes

Returns `error.SignalMismatch` if the signal sentinel does not match, or `error.ArgmaxTop2InvalidToken` if the kernel writes a token other than `token0` or `token1`.

src/zinc_rt/ring/cs.zig:1073

method

TokenBoundary.argmaxF32Range

#
pub fn argmaxF32Range( self: *TokenBoundary, scores: []const f32, start_row: u32, ) !ArgmaxRangeResult

Dispatch the gfx1201 ordered-score row-range argmax kernel.

Converts `scores` into sortable u32 keys, copies them into the shared input page, then lets the compute ring select the max row. The returned token id is absolute: `start_row + local_best`.

Parameters
scores
F32 logit/score row range to select from.
start_row
Absolute token row corresponding to `scores[0]`.
Returns

The selected absolute token id and the ordered score key the GPU stored.
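
The "sortable u32 key" conversion can be illustrated with the usual order-preserving float-to-integer mapping. This is a CPU-side sketch under the assumption that the module uses this common mapping; `orderedKey`, `argmaxRange` and `Best` are hypothetical names, NaN handling is ignored, and ties resolve to the first maximum here, which may not match the kernel.

```zig
const std = @import("std");

/// Map an f32 to a u32 whose unsigned order matches the float order
/// (NaNs excluded): negatives get every bit flipped, others get the sign bit set.
fn orderedKey(x: f32) u32 {
    const bits: u32 = @bitCast(x);
    return if ((bits >> 31) == 1) ~bits else bits | 0x8000_0000;
}

const Best = struct { token: u32, key: u32 };

/// CPU reference: pick the largest key and convert the local index
/// into an absolute row with `start_row + local_best`.
fn argmaxRange(scores: []const f32, start_row: u32) ?Best {
    if (scores.len == 0) return null;
    var best_idx: usize = 0;
    var best_key = orderedKey(scores[0]);
    for (scores, 0..) |s, i| {
        const k = orderedKey(s);
        if (k > best_key) {
            best_key = k;
            best_idx = i;
        }
    }
    return Best{ .token = start_row + @as(u32, @intCast(best_idx)), .key = best_key };
}

pub fn main() void {
    const scores = [_]f32{ -1.0, 2.5, 0.25 };
    if (argmaxRange(&scores, 100)) |best| {
        std.debug.print("token={d} key=0x{x}\n", .{ best.token, best.key });
    }
}
```

Comparing keys as plain unsigned integers lets a kernel select the maximum without floating-point compares, which is presumably why the scores are converted on the host first.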

src/zinc_rt/ring/cs.zig:1166

method

TokenBoundary.rmsNormElement0

#
pub fn rmsNormElement0( self: *TokenBoundary, hidden0: f32, inv_rms: f32, weight0: f32, ) !f32

Dispatch the single-element gfx1201 final-RMS-norm kernel.

Stores `hidden0 * inv_rms * weight0` into `output_map[0]` via a real PM4 dispatch on the compute ring, with a signal sentinel verifying retirement.

Parameters
hidden0
First hidden-state element.
inv_rms
Pre-computed inverse RMS scale.
weight0
First RMS-norm weight.
Returns

The fused `hidden0 * inv_rms * weight0` value the GPU produced.

Notes

Returns `error.SignalMismatch` if the signal sentinel does not match the expected per-submission value.

src/zinc_rt/ring/cs.zig:1272

method

TokenBoundary.dmmvF32RowRange

#
pub fn dmmvF32RowRange( self: *TokenBoundary, input: []const f32, weights_f32: []const u8, rows: u32, cols: u32, output: []f32, ) !void

Dispatch the gfx1201 row-range f32 dense matrix-vector kernel.

Copies the input vector and the row-major f32 weight block into the shared input page (64-byte aligned), records PM4 that points the kernel at the input/weights/output pages and the `rows`/`cols` arguments, and waits on the signal sentinel before reading `output`. This is the first row-oriented dense compute kernel the CS path runs; `cols` must be a multiple of 64 and `output` must hold at least `rows` elements.

Parameters
input
Input activation vector of length `cols`.
weights_f32
Row-major weight bytes; must hold at least `rows*cols*4` bytes.
rows
Number of output rows to compute.
cols
Inner dimension; must be a multiple of 64.
output
Output slice receiving `rows` f32 values.
Notes

Returns `error.ShapeMismatch`, `error.InputTooLarge`, `error.OutputTooLarge`, or `error.SignalMismatch` on invalid shapes or signal-readback failure.
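
A host-side reference for checking the GPU result might look like the sketch below. It applies the shape rules stated above (`cols` a multiple of 64, at least `rows*cols*4` weight bytes, at least `rows` output elements) to a plain row-major matrix-vector product. `dmmvF32Reference` and `loadF32` are hypothetical helpers, little-endian weight bytes are assumed, and every shape error is collapsed into `error.ShapeMismatch` for brevity.

```zig
const std = @import("std");

const ShapeError = error{ShapeMismatch};

/// Read one little-endian f32 from a byte slice without alignment requirements.
fn loadF32(bytes: []const u8, index: usize) f32 {
    const o = index * 4;
    const bits = @as(u32, bytes[o]) | (@as(u32, bytes[o + 1]) << 8) |
        (@as(u32, bytes[o + 2]) << 16) | (@as(u32, bytes[o + 3]) << 24);
    return @bitCast(bits);
}

/// CPU reference for a row-major f32 matrix-vector product over `rows` rows.
fn dmmvF32Reference(input: []const f32, weights: []const u8, rows: u32, cols: u32, output: []f32) ShapeError!void {
    const r: usize = rows;
    const c: usize = cols;
    if (c % 64 != 0 or input.len != c) return error.ShapeMismatch;
    if (weights.len < r * c * 4 or output.len < r) return error.ShapeMismatch;
    for (0..r) |row| {
        var acc: f32 = 0;
        for (input, 0..) |x, col| acc += x * loadF32(weights, row * c + col);
        output[row] = acc;
    }
}

pub fn main() !void {
    var input: [64]f32 = undefined;
    for (&input) |*x| x.* = 1.0;
    const weights = [_]u8{0} ** (64 * 4); // one row of zero weights
    var out: [1]f32 = undefined;
    try dmmvF32Reference(&input, &weights, 1, 64, &out);
    std.debug.print("row 0 = {d}\n", .{out[0]});
}
```

The accumulation order on the GPU may differ from this serial loop, so a comparison against real kernel output should allow a small tolerance.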

src/zinc_rt/ring/cs.zig:1376

method

TokenBoundary.dmmvQ4_0RowRange

#
pub fn dmmvQ4_0RowRange( self: *TokenBoundary, input: []const f32, weights_q4_0: []const u8, rows: u32, cols: u32, output: []f32, ) !void

Dispatch the gfx1201 row-range Q4_0 matrix-vector kernel.

Copies the input vector and raw GGML Q4_0 rows into the shared input page, records PM4 for one serial workitem over `rows`, and reads back one f32 result per row. This intentionally validates real quantized model bytes through the native CS path while the full K-parallel DMMV kernel is still under construction.

Parameters
input
Input activation vector of length `cols`.
weights_q4_0
Row-major GGML Q4_0 row bytes; must hold at least `rows * (cols/32*18)` bytes.
rows
Number of output rows to compute.
cols
Inner dimension; must be a multiple of 32.
output
Output slice receiving `rows` f32 values.
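
The `rows * (cols/32*18)` size rule follows from the GGML Q4_0 block layout: 32 elements per block, stored as a 2-byte f16 scale followed by 16 bytes of packed 4-bit quants. The sketch below computes the row size and a CPU reference dot product for one row; `q4_0RowBytes` and `q4_0RowDot` are hypothetical helpers, not part of `cs.zig`.

```zig
const std = @import("std");

const q4_0_block_elems = 32;
const q4_0_block_bytes = 18; // 2-byte f16 scale + 16 bytes of packed nibbles

/// Bytes occupied by one Q4_0 row, or null if `cols` is not a multiple of 32.
fn q4_0RowBytes(cols: usize) ?usize {
    if (cols % q4_0_block_elems != 0) return null;
    return (cols / q4_0_block_elems) * q4_0_block_bytes;
}

/// CPU reference dot product of one Q4_0 row with an f32 input vector.
/// The caller must pass a row of at least q4_0RowBytes(input.len) bytes.
fn q4_0RowDot(row: []const u8, input: []const f32) f32 {
    var acc: f32 = 0;
    const blocks = input.len / q4_0_block_elems;
    for (0..blocks) |b| {
        const blk = row[b * q4_0_block_bytes ..][0..q4_0_block_bytes];
        const scale_bits = @as(u16, blk[0]) | (@as(u16, blk[1]) << 8);
        const d: f32 = @as(f16, @bitCast(scale_bits));
        const x = input[b * q4_0_block_elems ..][0..q4_0_block_elems];
        for (blk[2..], 0..) |byte, i| {
            // Low nibble is element i, high nibble is element i + 16; both are biased by 8.
            const lo: f32 = @floatFromInt(@as(i32, byte & 0x0F) - 8);
            const hi: f32 = @floatFromInt(@as(i32, byte >> 4) - 8);
            acc += d * (lo * x[i] + hi * x[i + 16]);
        }
    }
    return acc;
}

pub fn main() void {
    const cols: usize = 4096;
    std.debug.print("Q4_0 bytes per {d}-column row: {d}\n", .{ cols, q4_0RowBytes(cols) orelse 0 });
}
```

A reference like this gives the serial per-row kernel something exact to be compared against on small inputs.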

src/zinc_rt/ring/cs.zig:1492

method

TokenBoundary.dmmvQ4_0RowRangeParallel

#
pub fn dmmvQ4_0RowRangeParallel( self: *TokenBoundary, input: []const f32, weights_q4_0: []const u8, rows: u32, cols: u32, output: []f32, ) !void

Dispatch the wave-lane gfx1201 Q4_0 matrix-vector kernel for exactly 64 rows in parallel.

Stages the same source-format rows as `dmmvQ4_0RowRange`, but launches one wave64 workgroup where each lane computes one row. Intended for 64-row LM-head prefix/window ranges where the GPU row values participate in choosing the sampled token.

Parameters
input
Input activation vector of length `cols`.
weights_q4_0
Row-major GGML Q4_0 row bytes; must hold exactly 64 rows.
rows
Must be exactly 64; any other value returns `error.ShapeMismatch`.
cols
Inner dimension; must be a multiple of 32.
output
Output slice receiving 64 f32 values (one per row).
Notes

Returns `error.SignalMismatch` if the post-fence signal value does not match the expected sentinel.

src/zinc_rt/ring/cs.zig:1608

method

TokenBoundary.dmmvQ4_0TwoRows

#
pub fn dmmvQ4_0TwoRows( self: *TokenBoundary, input: []const f32, row_a_q4_0: []const u8, row_b_q4_0: []const u8, cols: u32, output: []f32, ) !void

Dispatch Q4_0 DMMV for two arbitrary model rows staged back-to-back.

The caller supplies two individual source-format rows, which are packed into the shared staging page as a compact two-row matrix. This lets the current forward path obtain both LM-head top-2 scores from one real DMMV row-range submission even when the rows are not adjacent in vocab.

Parameters
input
Input activation vector of length `cols`.
row_a_q4_0
Raw GGML Q4_0 bytes for the first row; must hold at least `(cols/32)*18` bytes.
row_b_q4_0
Raw GGML Q4_0 bytes for the second row; same size requirement as `row_a_q4_0`.
cols
Inner dimension; must be a multiple of 32.
output
Output slice receiving 2 f32 values: `output[0]` for row A, `output[1]` for row B.
Notes

Returns `error.SignalMismatch` if the post-fence signal sentinel does not match.

src/zinc_rt/ring/cs.zig:1724

method

TokenBoundary.dmmvQ4_0ArgmaxRowRange

#
pub fn dmmvQ4_0ArgmaxRowRange( self: *TokenBoundary, input: []const f32, weights_q4_0: []const u8, rows: u32, cols: u32, ) !DmmvArgmaxResult

Dispatch the gfx1201 Q4_0 row-range DMMV kernel that performs argmax in the same submission.

The method stages the exact same source-format input and Q4_0 rows as `dmmvQ4_0RowRange`, but the kernel only stores the local best row and score. The forward path uses this for LM-head prefix/window candidates so a GPU-produced model value can directly participate in sampling without a follow-up direct argmax dispatch over copied logits.

src/zinc_rt/ring/cs.zig:1839

method

TokenBoundary.dmmvQ8_0RowRange

#
pub fn dmmvQ8_0RowRange( self: *TokenBoundary, input: []const f32, weights_q8_0: []const u8, rows: u32, cols: u32, output: []f32, ) !void

Dispatch the gfx1201 row-range Q8_0 matrix-vector kernel.

Copies the input vector and raw GGML Q8_0 rows into the shared input page, records PM4 for one serial workitem over `rows`, and reads back one f32 result per row. This keeps source-format Q8_0 model-slice validation exact while the final K-parallel DMMV kernel is still under construction.

Parameters
input
Input activation vector of length `cols`.
weights_q8_0
Row-major GGML Q8_0 row bytes; must hold at least `rows * (cols/32*34)` bytes.
rows
Number of output rows to compute.
cols
Inner dimension; must be a multiple of 32.
output
Output slice receiving `rows` f32 values.
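
For Q8_0 the `cols/32*34` figure comes from a 2-byte f16 scale plus 32 signed 8-bit quants per block. A CPU reference for one row, assuming that GGML layout, could be sketched as below; `q8_0RowDot` is a hypothetical helper.

```zig
const std = @import("std");

/// CPU reference dot product of one Q8_0 row with an f32 input vector.
/// Each 34-byte block holds an f16 scale followed by 32 signed 8-bit quants.
fn q8_0RowDot(row: []const u8, input: []const f32) f32 {
    var acc: f32 = 0;
    const blocks = input.len / 32;
    for (0..blocks) |b| {
        const blk = row[b * 34 ..][0..34];
        const scale_bits = @as(u16, blk[0]) | (@as(u16, blk[1]) << 8);
        const d: f32 = @as(f16, @bitCast(scale_bits));
        const x = input[b * 32 ..][0..32];
        for (blk[2..], 0..) |byte, i| {
            const q: i8 = @bitCast(byte); // reinterpret the raw byte as signed
            acc += d * @as(f32, @floatFromInt(q)) * x[i];
        }
    }
    return acc;
}

pub fn main() void {
    var row = [_]u8{0} ** 34;
    row[1] = 0x3C; // f16 scale of 1.0, little-endian
    row[2] = 5; // first quant
    var input = [_]f32{0} ** 32;
    input[0] = 2.0;
    std.debug.print("dot = {d}\n", .{q8_0RowDot(&row, &input)});
}
```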

src/zinc_rt/ring/cs.zig:1958

method

TokenBoundary.dmmvQ8_0TwoRowRanges

#
pub fn dmmvQ8_0TwoRowRanges( self: *TokenBoundary, input: []const f32, weights_a_q8_0: []const u8, rows_a: u32, weights_b_q8_0: []const u8, rows_b: u32, cols: u32, output: []f32, ) !void

Dispatch one gfx1201 Q8_0 DMMV kernel over two adjacent logical row ranges that share the same input vector.

The method packs `weights_a_q8_0` followed by `weights_b_q8_0` into the staging page, then runs the same compact Q8_0 row-range kernel over `rows_a + rows_b` rows. The output slice receives A's rows first and B's rows second. This is used by the M1 bridge to consume paired SSM alpha/beta projections without paying two CS submissions for the same activation vector.

Parameters
input
Input activation vector of length `cols`.
weights_a_q8_0
Row-major GGML Q8_0 bytes for range A; must hold at least `rows_a * (cols/32*34)` bytes.
rows_a
Number of rows in the A range.
weights_b_q8_0
Row-major GGML Q8_0 bytes for range B; must hold at least `rows_b * (cols/32*34)` bytes.
rows_b
Number of rows in the B range.
cols
Inner dimension; must be a multiple of 32.
output
Output slice receiving `rows_a + rows_b` f32 values: A rows first, then B rows.
Notes

Returns `error.SignalMismatch` if the post-fence signal sentinel does not match.
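
The A-then-B packing described above can be sketched on the host as follows. This is an illustration only: `packTwoRanges`, `q8_0RowBytes` and the `StagingTooSmall` error are hypothetical, and the real staging page has alignment and size limits that are not modeled here.

```zig
const std = @import("std");

const PackError = error{ ShapeMismatch, StagingTooSmall };

/// Bytes per Q8_0 row: 34 bytes for every 32 columns.
fn q8_0RowBytes(cols: u32) PackError!usize {
    if (cols % 32 != 0) return error.ShapeMismatch;
    return @as(usize, cols / 32) * 34;
}

/// Copy range A, then range B, into `staging` as one compact matrix.
/// Returns the total number of rows the kernel would iterate over.
fn packTwoRanges(staging: []u8, a: []const u8, rows_a: u32, b: []const u8, rows_b: u32, cols: u32) PackError!u32 {
    const row_bytes = try q8_0RowBytes(cols);
    const a_len = @as(usize, rows_a) * row_bytes;
    const b_len = @as(usize, rows_b) * row_bytes;
    if (a.len < a_len or b.len < b_len) return error.ShapeMismatch;
    if (staging.len < a_len + b_len) return error.StagingTooSmall;
    @memcpy(staging[0..a_len], a[0..a_len]);
    @memcpy(staging[a_len..][0..b_len], b[0..b_len]);
    return rows_a + rows_b;
}

pub fn main() !void {
    const a = [_]u8{1} ** 34;
    const b = [_]u8{2} ** 34;
    var staging: [68]u8 = undefined;
    const rows = try packTwoRanges(&staging, &a, 1, &b, 1, 32);
    std.debug.print("packed {d} rows; first B byte = {d}\n", .{ rows, staging[34] });
}
```

Since the packed block is indistinguishable from a single `rows_a + rows_b` matrix, output row `i` belongs to A when `i < rows_a` and to B otherwise.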

src/zinc_rt/ring/cs.zig:2079

method

TokenBoundary.dmmvQ8_0TwoRowRangesParallel64

#
pub fn dmmvQ8_0TwoRowRangesParallel64( self: *TokenBoundary, input: []const f32, weights_a_q8_0: []const u8, rows_a: u32, weights_b_q8_0: []const u8, rows_b: u32, cols: u32, output: []f32, ) !void

Dispatch one wave64 Q8_0 DMMV kernel over two packed row ranges totalling exactly 64 rows.

This is the row-parallel companion to `dmmvQ8_0TwoRowRanges` for the current SSM alpha+beta shape: 32 alpha rows plus 32 beta rows. Each lane computes one row from the packed staging block, eliminating the serial per-row loop of the scalar variant.

Parameters
input
Input activation vector of length `cols`.
weights_a_q8_0
Row-major GGML Q8_0 bytes for range A; must hold at least `rows_a * (cols/32*34)` bytes.
rows_a
Number of rows in the A range; `rows_a + rows_b` must equal 64.
weights_b_q8_0
Row-major GGML Q8_0 bytes for range B; must hold at least `rows_b * (cols/32*34)` bytes.
rows_b
Number of rows in the B range.
cols
Inner dimension; must be a multiple of 32.
output
Output slice receiving exactly 64 f32 values: A rows first, then B rows.
Notes

Returns `error.ShapeMismatch` if `rows_a + rows_b != 64`. Returns `error.SignalMismatch` on sentinel mismatch.

src/zinc_rt/ring/cs.zig:2106

function

lastErrno

#
pub fn lastErrno() ?linux.E

Errno captured from the most recent `ioctl` issued by this module, or null if the call succeeded.

Useful for surfacing a precise reason after a `SmokeResult.status` indicates a kernel-side failure.

Returns

The latest captured `linux.E` value, or null when there was no error.

src/zinc_rt/ring/cs.zig:2255

function

setupSmokeDefault

#
pub fn setupSmokeDefault() SmokeResult

Run the bring-up smoke gate against `default_render_node`.

Returns

A `SmokeResult` summarizing whether the two PM4 submissions retired and the signal sentinel matched.

src/zinc_rt/ring/cs.zig:2261

function

setupSmokePath

#
pub fn setupSmokePath(render_node: []const u8) SmokeResult

Run the bring-up smoke gate against the given DRM render node path.

Parameters

render_node
Absolute path to the amdgpu DRM render node to test.

Returns

A `SmokeResult` describing the open → submit → wait outcome.

src/zinc_rt/ring/cs.zig:2268

function

submitNopSmokeDefault

#
pub fn submitNopSmokeDefault() SmokeResult

Backwards-compatible alias for `setupSmokeDefault` named for the underlying PM4 NOP+WRITE_DATA stream that exercises the CS path.

Returns

A `SmokeResult` describing the bring-up outcome on `default_render_node`.

src/zinc_rt/ring/cs.zig:2275

function

submitNopSmokePath

#
pub fn submitNopSmokePath(render_node: []const u8) SmokeResult

Full bring-up smoke implementation: open the render node, query compute IP, allocate a context, create GTT-backed IB + signal BOs and map them at fixed low GPU VAs, build a PM4 NOP + `WRITE_DATA` stream, submit it twice through `DRM_IOCTL_AMDGPU_CS`, and verify each fence retires with the expected signal sentinel in the signal BO.

Parameters

render_node
Absolute path to the amdgpu DRM render node to exercise.

Returns

A `SmokeResult` whose `status` pinpoints the failure stage, or `.ok` on success.

Notes

Returns `.unsupported_os` immediately on non-Linux hosts. The function never returns a Zig error; every failure is reported through `status`.
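
The NOP + `WRITE_DATA` stream can be sketched as below. This is not the module's recorder: `packet3` and `buildStream` are hypothetical helpers, the opcodes are taken from the amdgpu kernel headers, and the `WRITE_DATA` control dword (destination select, write confirm) is left as a caller-supplied placeholder because its encoding is not described on this page.

```zig
const std = @import("std");

// Opcodes as defined in the amdgpu kernel headers.
const PACKET3_NOP: u8 = 0x10;
const PACKET3_WRITE_DATA: u8 = 0x37;

/// Type-3 PM4 header. `body_dwords` is the number of dwords that follow
/// the header; the count field stores that number minus one.
fn packet3(opcode: u8, body_dwords: u14) u32 {
    std.debug.assert(body_dwords >= 1);
    const count: u32 = body_dwords - 1;
    return (@as(u32, 3) << 30) | (count << 16) | (@as(u32, opcode) << 8);
}

/// Record a NOP followed by a WRITE_DATA of `sentinel` to `signal_va`.
/// `control` is a placeholder for the WRITE_DATA control dword.
fn buildStream(buf: []u32, control: u32, signal_va: u64, sentinel: u32) []const u32 {
    buf[0] = packet3(PACKET3_NOP, 1);
    buf[1] = 0; // NOP payload
    buf[2] = packet3(PACKET3_WRITE_DATA, 4);
    buf[3] = control;
    buf[4] = @truncate(signal_va); // address low dword
    buf[5] = @truncate(signal_va >> 32); // address high dword
    buf[6] = sentinel;
    return buf[0..7];
}

pub fn main() void {
    var buf: [7]u32 = undefined;
    // VA and sentinel are illustrative values only.
    const stream = buildStream(&buf, 0, 0x0020_0000, 0xC0FFEE);
    for (stream) |dw| std.debug.print("0x{x:0>8}\n", .{dw});
}
```

Using a different sentinel for each of the two submissions lets the readback distinguish a fresh write from a stale value left by the previous pass.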

src/zinc_rt/ring/cs.zig:2287