Last updated: 2026-06-12

CUDA Runtime

Pipeline

CUDA compute pipeline wrapper around an NVRTC-compiled `CUfunction` (mirrors src/metal/pipeline.zig).

Compiles a `.cu` source string for the running device's arch (sm_XY) or loads a precompiled cubin/PTX image. Each pipeline holds a `CUmodule` + `CUfunction` pair obtained from the C shim. Use `createPipeline` for JIT compilation via NVRTC or `createPipelineFromImage` when an offline-compiled cubin/PTX blob is available. Free with `freePipeline` when done.

5 exports 0 methods src/cuda/pipeline.zig

struct

CudaPipeline

pub const CudaPipeline = struct

A compiled CUDA kernel ready for dispatch.

src/cuda/pipeline.zig:13

function

createPipeline

pub fn createPipeline(ctx: ?*shim.CudaCtx, cu_source: [*:0]const u8, fn_name: [*:0]const u8) !CudaPipeline

NVRTC-compile `cu_source` and resolve `fn_name` for dispatch.

Parameters

ctx
Active CUDA context; must not be null.
cu_source
Null-terminated CUDA C source string passed directly to NVRTC.
fn_name
Null-terminated name of the kernel function to extract from the compiled module.

Returns

A `CudaPipeline` with populated `max_threads` and `shared_mem` fields, or `error.CudaPipelineCreateFailed`.

src/cuda/pipeline.zig:29
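
As a rough usage sketch (not taken from the project's sources), the JIT path might look like the following. The import paths `cuda/pipeline.zig` and `cuda/shim.zig`, the `saxpy` kernel, and the `buildSaxpy` helper are assumptions made for illustration, and the context is presumed to be created elsewhere. The kernel is declared `extern "C"` so that its name is not C++-mangled and `fn_name` can match it literally.

```zig
const std = @import("std");
// Assumed import paths; adjust to however the module is exposed in your build.
const pipeline = @import("cuda/pipeline.zig");
const shim = @import("cuda/shim.zig"); // hypothetical location of the C shim bindings

// CUDA C source for a trivial kernel. String literals are null-terminated,
// so this coerces to the `[*:0]const u8` that `createPipeline` expects.
const saxpy_src: [*:0]const u8 =
    \\extern "C" __global__ void saxpy(float a, const float* x, float* y, int n) {
    \\    int i = blockIdx.x * blockDim.x + threadIdx.x;
    \\    if (i < n) y[i] = a * x[i] + y[i];
    \\}
;

/// Hypothetical helper: compile the kernel above and return the pipeline.
/// The caller owns the result and should release it with `freePipeline`.
pub fn buildSaxpy(ctx: ?*shim.CudaCtx) !pipeline.CudaPipeline {
    // Fails with error.CudaPipelineCreateFailed if compilation or lookup fails.
    const pipe = try pipeline.createPipeline(ctx, saxpy_src, "saxpy");

    // The field types are not documented here, so print them generically.
    std.debug.print("max_threads={any} shared_mem={any}\n", .{ pipe.max_threads, pipe.shared_mem });
    return pipe;
}
```

A caller would typically pair this with `defer pipeline.freePipeline(&pipe);` right after a successful `try buildSaxpy(ctx)`, with `pipe` declared as `var` so that a mutable pointer can be taken.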

function

createPipelineFromImage

pub fn createPipelineFromImage(ctx: ?*shim.CudaCtx, image: [*]const u8, image_size: usize, fn_name: [*:0]const u8) !CudaPipeline

Load a kernel from a precompiled cubin/PTX image (offline nvcc path).

Parameters

ctx
Active CUDA context; must not be null.
image
Pointer to the raw cubin or PTX image bytes.
image_size
Byte length of `image`.
fn_name
Null-terminated name of the kernel function to locate in the loaded module.

Returns

A `CudaPipeline` with populated `max_threads` and `shared_mem` fields, or `error.CudaPipelineCreateFailed`.

src/cuda/pipeline.zig:45
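
A minimal sketch of the offline path, assuming a cubin was produced ahead of time (for example with something like `nvcc -cubin -arch=sm_89 saxpy.cu -o saxpy.cubin`) and embedded into the binary. The file name `saxpy.cubin`, the import paths, and the `loadSaxpy` helper are hypothetical; only the `createPipelineFromImage` signature comes from this page.

```zig
// Assumed import paths; adjust to however the module is exposed in your build.
const pipeline = @import("cuda/pipeline.zig");
const shim = @import("cuda/shim.zig"); // hypothetical location of the C shim bindings

// Hypothetical precompiled image, embedded at compile time.
// The path is resolved relative to this source file.
const saxpy_image: []const u8 = @embedFile("saxpy.cubin");

/// Hypothetical helper: load the `saxpy` kernel from the embedded image.
/// The caller owns the result and should release it with `freePipeline`.
pub fn loadSaxpy(ctx: ?*shim.CudaCtx) !pipeline.CudaPipeline {
    return pipeline.createPipelineFromImage(
        ctx,
        saxpy_image.ptr, // raw image bytes
        saxpy_image.len, // byte length of the image
        "saxpy", // must match the (unmangled) kernel name in the image
    );
}
```

Since a cubin targets a specific architecture, an image built for one `sm_XY` may fail to load on another device; in that case the JIT path through `createPipeline` is the more portable option.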

function

setMaxDynamicShared

pub fn setMaxDynamicShared(pipe: *CudaPipeline, bytes: u32) void

Opt this kernel into a larger dynamic shared-memory cap (Ada/Blackwell).

Parameters

pipe
Pipeline whose dynamic shared-memory limit to raise.
bytes
New maximum dynamic shared memory per block in bytes.

Notes

No-op when `pipe.handle` is null. On Ada/Blackwell this lifts the default 48 KB barrier by calling `cuFuncSetAttribute` on the shim side.

src/cuda/pipeline.zig:60
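
For illustration, raising the cap before dispatch could be sketched as below. The 64 KB value, the import path, and the `allowLargeSharedMem` helper are assumptions; the real upper bound depends on the device's opt-in shared-memory limit, which this page does not document.

```zig
// Assumed import path; adjust to however the module is exposed in your build.
const pipeline = @import("cuda/pipeline.zig");

/// Hypothetical helper: opt an already-created pipeline into more than the
/// default 48 KB of dynamic shared memory per block.
pub fn allowLargeSharedMem(pipe: *pipeline.CudaPipeline) void {
    // Illustrative value only: 64 KB, above the 48 KB default.
    const wanted_bytes: u32 = 64 * 1024;

    // Documented as a no-op when `pipe.handle` is null, so no guard is needed here.
    pipeline.setMaxDynamicShared(pipe, wanted_bytes);
}
```

In CUDA generally, this attribute only raises the ceiling; each launch still has to request its dynamic shared-memory size explicitly. How this project's dispatch call exposes that is outside the scope of this page.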

function

freePipeline

pub fn freePipeline(pipe: *CudaPipeline) void

Release the pipeline handle.

Safe to call with a null handle.

src/cuda/pipeline.zig:65