Last updated: 2026-06-12
CUDA Runtime
Buffer
CUDA buffer wrapper — device-local allocations with optional pinned staging.
Unlike Metal (Apple unified memory), CUDA device memory is NOT CPU-visible; host<->device transfers are explicit (`upload`/`download`), staged through pinned host memory. Mirrors src/metal/buffer.zig.
struct
CudaBuffer
pub const CudaBuffer = struct
CUDA device buffer handle plus optional pinned-host staging mirror.
Methods: 2
function
createBuffer
pub fn createBuffer(ctx: ?*shim.CudaCtx, size: usize) !CudaBuffer
Allocate a device-local buffer (the common case for weights, activations, and state).
function
createBufferStaged
pub fn createBufferStaged(ctx: ?*shim.CudaCtx, size: usize) !CudaBuffer
Allocate a device buffer paired with a pinned-host staging mirror for fast `upload`/`download`. The host pointer is exposed via `contents()`.
function
uploadMmap
pub fn uploadMmap(ctx: ?*shim.CudaCtx, host_ptr: *const anyopaque, size: usize) !CudaBuffer
Copy an existing host mapping (e.g. mmap'd weights) to a new device-local buffer. Unlike Metal's zero-copy wrapMmap, this performs a full host-to-device copy.
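A minimal sketch of calling `uploadMmap` on a host region that is already mapped. The import path `buffer.zig`, the helper name `weightsToDevice`, and the use of `anytype` for the context parameter are assumptions made for illustration; only the `uploadMmap` signature comes from the reference above.

```zig
// Assumption: the module documented on this page is importable under this path.
const cuda = @import("buffer.zig");

/// Hypothetical helper: copies an already-mapped host region (for example,
/// mmap'd weights) into a newly allocated device-local buffer.
/// `ctx` is whatever `?*shim.CudaCtx` value the caller already holds.
pub fn weightsToDevice(ctx: anytype, mapped: []const u8) !cuda.CudaBuffer {
    // `mapped.ptr` is a `[*]const u8`, which coerces to `*const anyopaque`.
    // This is a full host-to-device copy, not a zero-copy wrap.
    return cuda.uploadMmap(ctx, mapped.ptr, mapped.len);
}
```

The caller owns the returned `CudaBuffer` and would be expected to release it with `freeBuffer` once it is no longer needed.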
function
aliasBuffer
pub fn aliasBuffer(base: *const CudaBuffer, offset: usize, size: usize) !CudaBuffer
Create a lightweight view into an existing buffer's device allocation.
function
freeBuffer
pub fn freeBuffer(buf: *CudaBuffer) void
Free a buffer handle. The shim releases device memory only if this buffer owns it; for an alias, only the wrapper is freed. Safe with a null handle.
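The ownership rule for `aliasBuffer` and `freeBuffer` can be sketched as follows. The import path, the helper name `withView`, the sizes, and the assumption that a transfer through a view addresses the aliased byte range of the base allocation are all illustrative and not confirmed by this page.

```zig
// Assumption: the module documented on this page is importable under this path.
const cuda = @import("buffer.zig");

/// Hypothetical helper showing an owning buffer and a view into it.
pub fn withView(ctx: anytype) !void {
    // Owning allocation: freeing it releases the device memory.
    var base = try cuda.createBuffer(ctx, 4096);
    defer cuda.freeBuffer(&base);

    // Non-owning view of bytes 1024..1536; freeing it releases only the wrapper.
    // `defer` runs in reverse order, so the view is freed before `base`.
    var view = try cuda.aliasBuffer(&base, 1024, 512);
    defer cuda.freeBuffer(&view);

    // Assumption: uploading through the view writes into the aliased range.
    const zeros = [_]u8{0} ** 512;
    cuda.upload(ctx, &view, &zeros);
}
```

Freeing the view before its base keeps every live handle pointing at valid device memory, which is why the two `defer` statements are declared in this order.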
function
upload
pub fn upload(ctx: ?*shim.CudaCtx, buf: *const CudaBuffer, data: []const u8) void
Copy bytes from host to device (synchronous on the context stream).
function
download
pub fn download(ctx: ?*shim.CudaCtx, buf: *const CudaBuffer, dst: []u8) void
Copy bytes from device to host (synchronous on the context stream).
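Because device memory is not CPU-visible, a typical round trip combines `createBuffer`, `upload`, `download`, and `freeBuffer`. The sketch below assumes the module is importable as `buffer.zig`; the helper name `roundTrip` and the `anytype` context parameter are hypothetical.

```zig
const std = @import("std");
// Assumption: the module documented on this page is importable under this path.
const cuda = @import("buffer.zig");

/// Hypothetical helper: copies `src` to the device and reads it back into `dst`.
/// `ctx` is whatever `?*shim.CudaCtx` value the caller already holds.
pub fn roundTrip(ctx: anytype, src: []const u8, dst: []u8) !void {
    std.debug.assert(dst.len >= src.len);

    var buf = try cuda.createBuffer(ctx, src.len);
    defer cuda.freeBuffer(&buf);

    // Both calls return `void`, so only allocation failure surfaces as an error here.
    cuda.upload(ctx, &buf, src); // host -> device
    cuda.download(ctx, &buf, dst[0..src.len]); // device -> host
}
```

Since both transfers are documented as synchronous on the context stream, `dst` should hold the downloaded bytes by the time `download` returns.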