Last updated: 2026-06-12

CUDA Runtime

Device

CUDA device wrapper — NVIDIA GPU backend (mirrors src/metal/device.zig).

Owns CUDA context init and capability queries used by the loader, diagnostics, and CUDA inference runtime.

2 exports · 10 methods · src/cuda/device.zig

struct

CudaCapabilities

pub const CudaCapabilities = struct

Capability snapshot queried once from the active CUDA device.

src/cuda/device.zig:10

struct

CudaDevice

pub const CudaDevice = struct

Active CUDA device wrapper plus capability metadata used by the backend.

src/cuda/device.zig:21

Methods

10

method

CudaDevice.init

pub fn init(allocator: std.mem.Allocator, device_index: u32) !CudaDevice

Initialize a specific CUDA device by index and query its capabilities.

Parameters
allocator
Allocator stored for future use by the backend.
device_index
Zero-based CUDA device ordinal.
Returns

Initialized `CudaDevice` with a live context, or `error.CudaInitFailed` if the shim rejects the index.

src/cuda/device.zig:31

method

CudaDevice.initBest

pub fn initBest(allocator: std.mem.Allocator) !CudaDevice

Initialize the device with the highest compute capability (for example, an RTX 5090 is preferred over an RTX 4090).

Probes up to 16 device indices, selects the one with the largest compute capability value, then opens a final context on that device via `init`.

Parameters
allocator
Forwarded to `init` for the selected device.
Returns

Initialized `CudaDevice` for the best device, or `error.CudaNoDevice` if no device is found.

src/cuda/device.zig:47
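
The selection step described for `initBest` can be sketched as a simple maximum search over probed indices. This is a minimal illustration, not the code in `src/cuda/device.zig`: `Probe`, `probeDevice`, and `selectBest` are hypothetical names, the probe is a stub in place of the real shim query, and skipping missing indices (rather than stopping at the first gap) is an assumption.

```zig
const std = @import("std");

/// Hypothetical result type; the real shim types are not shown in this reference.
const Probe = struct { index: u32, compute_capability: u32 };

/// Upper bound on probed indices, matching the "up to 16" described above.
const max_probe: u32 = 16;

/// Hypothetical stand-in for the per-index capability query.
/// Returns null when no device exists at `index`.
fn probeDevice(index: u32) ?u32 {
    // Pretend the machine has an sm_89 device at 0 and an sm_120 device at 1.
    return switch (index) {
        0 => 89,
        1 => 120,
        else => null,
    };
}

/// Pick the index with the largest compute capability value.
/// Ties keep the lowest index because the comparison is strict.
fn selectBest(probe: *const fn (u32) ?u32) error{CudaNoDevice}!Probe {
    var best: ?Probe = null;
    var i: u32 = 0;
    while (i < max_probe) : (i += 1) {
        // Assumption: a missing index is skipped rather than ending the scan.
        const cc = probe(i) orelse continue;
        if (best == null or cc > best.?.compute_capability) {
            best = .{ .index = i, .compute_capability = cc };
        }
    }
    return best orelse error.CudaNoDevice;
}

pub fn main() !void {
    const best = try selectBest(&probeDevice);
    std.debug.print("best device: index={d} cc={d}\n", .{ best.index, best.compute_capability });
}
```

In the documented method, the winning index would then be passed to `init` to open the final context.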

method

CudaDevice.totalMemory

pub fn totalMemory(self: *const CudaDevice) u64

Total device memory (VRAM capacity) in bytes.

src/cuda/device.zig:85

method

CudaDevice.freeMemory

pub fn freeMemory(self: *const CudaDevice) u64

Currently free device memory in bytes; 0 if the context has been destroyed.

src/cuda/device.zig:90
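
As a rough illustration of how the two memory readings might be combined by a caller, the sketch below derives bytes in use and checks whether a request fits. `usedBytes` and `fits` are hypothetical helpers, not part of the documented API, and the byte values in `main` are made up for the example.

```zig
const std = @import("std");

/// Hypothetical helper: bytes in use, given totalMemory() and freeMemory() readings.
/// Note that a destroyed context reports free = 0, which makes the device look full.
fn usedBytes(total: u64, free: u64) u64 {
    // Saturating subtraction avoids underflow if free ever exceeds total.
    return total -| free;
}

/// Hypothetical helper: does a request of `want` bytes plus `headroom` fit in free memory?
fn fits(free: u64, want: u64, headroom: u64) bool {
    // Saturating addition avoids overflow for very large requests.
    return free >= want +| headroom;
}

pub fn main() void {
    const gib: u64 = 1 << 30;
    // Example readings only; real values come from the device.
    const total = 32 * gib;
    const free = 20 * gib;
    std.debug.print("used={d} bytes, 16 GiB fits={}\n", .{
        usedBytes(total, free),
        fits(free, 16 * gib, 1 * gib),
    });
}
```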

method

CudaDevice.computeCapability

pub fn computeCapability(self: *const CudaDevice) u32

Compute capability encoded as `major*10 + minor` (e.g. 120 = sm_120).

src/cuda/device.zig:96
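
The `major*10 + minor` encoding can be reversed with integer division and remainder. The sketch below is illustrative only: `SmVersion`, `encode`, and `decode` are hypothetical names that do not appear in the documented API.

```zig
const std = @import("std");

/// Hypothetical pair type for illustration.
const SmVersion = struct { major: u32, minor: u32 };

/// Encode as major*10 + minor, the scheme described above.
fn encode(v: SmVersion) u32 {
    return v.major * 10 + v.minor;
}

/// Split an encoded value back into its parts.
/// Assumes minor is a single digit (0-9), which the encoding requires.
fn decode(cc: u32) SmVersion {
    return .{ .major = cc / 10, .minor = cc % 10 };
}

pub fn main() void {
    const v = decode(120);
    std.debug.print("sm_{d}{d}: major={d} minor={d}\n", .{ v.major, v.minor, v.major, v.minor });
}
```

Because the encoded value increases with both major and minor versions, comparing two encoded values directly orders devices by compute capability.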

method

CudaDevice.smCount

pub fn smCount(self: *const CudaDevice) u32

Number of streaming multiprocessors (SMs) on the device.

src/cuda/device.zig:101

method

CudaDevice.warpSize

pub fn warpSize(self: *const CudaDevice) u32

Threads per warp (32 on all current NVIDIA hardware).

src/cuda/device.zig:106

method

CudaDevice.maxSharedMemPerBlock

pub fn maxSharedMemPerBlock(self: *const CudaDevice) u64

Maximum shared memory per block available with opt-in dynamic allocation, in bytes.

src/cuda/device.zig:111

method

CudaDevice.name

pub fn name(self: *const CudaDevice, buf: []u8) []const u8

Copy the device name into `buf` and return the NUL-trimmed slice.

Parameters
buf
Caller-supplied scratch buffer; 64–256 bytes is typically sufficient.
Returns

Slice into `buf` containing the device name without a trailing NUL, or an empty slice if the context has been destroyed.

src/cuda/device.zig:118
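
The copy-then-trim behavior described for `name` might look roughly like the following. This is a sketch under assumptions: `copyName` is a hypothetical helper, the raw bytes stand in for whatever the shim returns, and truncating when `buf` is too short is an assumed policy that the reference does not confirm.

```zig
const std = @import("std");

/// Copy a possibly NUL-padded name into `buf` and return the slice before the first NUL.
fn copyName(raw: []const u8, buf: []u8) []const u8 {
    // Copy at most buf.len bytes so a short buffer truncates instead of overflowing.
    const n = @min(raw.len, buf.len);
    @memcpy(buf[0..n], raw[0..n]);
    // Trim at the first NUL, if any; otherwise keep everything that was copied.
    const end = std.mem.indexOfScalar(u8, buf[0..n], 0) orelse n;
    return buf[0..end];
}

pub fn main() void {
    var buf: [64]u8 = undefined;
    // Example input only; the trailing NULs mimic a fixed-size C string field.
    const name = copyName("NVIDIA GeForce RTX 5090\x00\x00", &buf);
    std.debug.print("'{s}' ({d} bytes)\n", .{ name, name.len });
}
```

The returned slice points into `buf`, so it is only valid while the caller keeps that buffer alive.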