Last updated: 2026-06-12

CUDA Runtime

Buffer

CUDA buffer wrapper — device-local allocations with optional pinned staging.

Unlike Metal on Apple unified memory, CUDA device memory is not CPU-visible. Host-to-device and device-to-host transfers are explicit (`upload`/`download`) and can be staged through pinned host memory. Mirrors src/metal/buffer.zig.

8 exports, 2 methods (src/cuda/buffer.zig)

struct

CudaBuffer

pub const CudaBuffer = struct

CUDA device buffer handle plus optional pinned-host staging mirror.

src/cuda/buffer.zig:11

Methods (2)

method

CudaBuffer.devicePtr

pub fn devicePtr(self: *const CudaBuffer) u64

Raw device pointer (CUdeviceptr as u64) for kernel arg packing / aliasing.

src/cuda/buffer.zig:20

method

CudaBuffer.contents

pub fn contents(self: *const CudaBuffer) ?[*]u8

Pinned host staging pointer, if this buffer was created staged.

src/cuda/buffer.zig:26

function

createBuffer

pub fn createBuffer(ctx: ?*shim.CudaCtx, size: usize) !CudaBuffer

Allocate a device-local buffer (the common case for weights/activations/state).

Parameters

ctx
CUDA context that owns the allocation.
size
Number of bytes to allocate on the device.

Returns

A `CudaBuffer` with no host staging pointer; use `createBufferStaged` when CPU access is needed.

Notes

Returns `error.CudaBufferAllocFailed` if the shim returns a null handle.

src/cuda/buffer.zig:36

function

createBufferStaged

pub fn createBufferStaged(ctx: ?*shim.CudaCtx, size: usize) !CudaBuffer

Allocate a device buffer paired with a pinned-host staging mirror for fast `upload`/`download`.

The host pointer is exposed via `contents()`.

Parameters

ctx
CUDA context that owns the allocation.
size
Number of bytes to allocate on both the device and in pinned host memory.

Returns

A `CudaBuffer` whose `host_ptr` field is non-null; `contents()` returns the pinned staging address.

Notes

Returns `error.CudaBufferAllocFailed` if the shim returns a null handle.

src/cuda/buffer.zig:48
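
Because `contents()` returns an optional many-item pointer (`?[*]u8`) with no length attached, callers have to pair it with the size they allocated. The sketch below shows one way to do that. `stagingSlice` is a hypothetical helper, not part of this module, and it assumes the caller passes a length no larger than the `size` given to `createBufferStaged`.

```zig
/// Hypothetical helper: turn the pinned staging pointer into a bounded slice.
/// `buf` is a `*const CudaBuffer` (or anything with a matching `contents()`).
/// `len` must not exceed the size the buffer was created with.
pub fn stagingSlice(buf: anytype, len: usize) ?[]u8 {
    // `contents()` is null for buffers that were not created staged.
    const ptr = buf.contents() orelse return null;
    // A many-item pointer carries no length, so the caller supplies it.
    return ptr[0..len];
}
```

Returning an optional slice keeps the "not staged" case visible at the call site, for example `if (stagingSlice(&buf, n)) |host| { ... }`.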

function

uploadMmap

pub fn uploadMmap(ctx: ?*shim.CudaCtx, host_ptr: *const anyopaque, size: usize) !CudaBuffer

Copy an existing host mapping (e.g. mmap'd weights) to a new device-local buffer. Unlike Metal's zero-copy wrapMmap, this performs a full host-to-device copy.

Parameters

ctx
CUDA context that will own the resulting device allocation.
host_ptr
Pointer to the host memory region to copy from (typically an mmap'd file mapping).
size
Number of bytes to transfer.

Returns

A device-local `CudaBuffer`; `host_ptr` is null (data lives only on device after this call).

Notes

Returns `error.CudaMmapUploadFailed` if the shim returns a null handle.

src/cuda/buffer.zig:62
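
Since `uploadMmap` takes an untyped pointer and a byte count, a mapping held as a Zig slice has to be split into its `.ptr` and `.len`. The sketch below shows that call shape. `uploadMapping` and the `B` parameter are hypothetical: `B` stands for this buffer module, passed in as a type, and `ctx` is an already-created context whose setup is outside the scope of this page.

```zig
/// Hypothetical wrapper: copy an already-mapped region into a new
/// device-local buffer. `B` is the buffer module, e.g. the result of
/// importing src/cuda/buffer.zig (import path is an assumption).
pub fn uploadMapping(comptime B: type, ctx: anytype, mapping: []const u8) !B.CudaBuffer {
    // A slice's `.ptr` ([*]const u8) coerces to `*const anyopaque`.
    return B.uploadMmap(ctx, mapping.ptr, mapping.len);
}
```

The returned buffer is released with `freeBuffer` like any other device-local allocation.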

function

aliasBuffer

pub fn aliasBuffer(base: *const CudaBuffer, offset: usize, size: usize) !CudaBuffer

Create a lightweight view into an existing buffer's device allocation.

Parameters

base
Parent buffer whose device memory is aliased.
offset
Byte offset from the start of `base`'s device allocation.
size
Number of bytes the alias covers.

Returns

A `CudaBuffer` with `owns_handle = false`; calling `freeBuffer` on it releases only the wrapper, not the parent's device memory.

Notes

Returns `error.CudaBufferAllocFailed` if the shim returns a null handle.

src/cuda/buffer.zig:74
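
One use for aliases is carving a single allocation into several views. The sketch below splits a parent buffer into two halves. `splitInHalf` and the `B` parameter are hypothetical (`B` stands for this buffer module passed in as a type), and the caller is assumed to know the parent's total size and to free both views before freeing the parent.

```zig
/// Hypothetical helper: create two non-owning views covering the lower
/// and upper halves of `base`. `total` is the parent's size in bytes.
pub fn splitInHalf(comptime B: type, base: *const B.CudaBuffer, total: usize) ![2]B.CudaBuffer {
    const half = total / 2;

    var lo = try B.aliasBuffer(base, 0, half);
    // If the second alias fails, release the first wrapper before returning.
    errdefer B.freeBuffer(&lo);

    // The upper view takes the remainder so odd sizes are fully covered.
    const hi = try B.aliasBuffer(base, half, total - half);

    return [2]B.CudaBuffer{ lo, hi };
}
```

Each returned view still needs its own `freeBuffer` call, which releases only the wrapper and leaves the parent's device memory intact.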

function

freeBuffer

pub fn freeBuffer(buf: *CudaBuffer) void

Free a buffer handle. The shim releases device memory only if this buffer owns it; for an alias, only the wrapper is freed. Safe to call with a null handle.

src/cuda/buffer.zig:82

function

upload

pub fn upload(ctx: ?*shim.CudaCtx, buf: *const CudaBuffer, data: []const u8) void

Copy bytes from host to device (synchronous on the context stream).

Parameters

ctx
CUDA context whose stream is used for the transfer.
buf
Destination device buffer; must be at least `data.len` bytes.
data
Source slice on the host.

src/cuda/buffer.zig:93

function

download

pub fn download(ctx: ?*shim.CudaCtx, buf: *const CudaBuffer, dst: []u8) void

Copy bytes from device to host (synchronous on the context stream).

Parameters

ctx
CUDA context whose stream is used for the transfer.
buf
Source device buffer; must be at least `dst.len` bytes.
dst
Destination slice on the host.

src/cuda/buffer.zig:101
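
Taken together, the functions above support a simple allocate, upload, download, free cycle. The sketch below round-trips a byte slice through device memory. `roundTrip` and the `B` parameter are hypothetical: `B` stands for this buffer module passed in as a type, and `ctx` is an already-created context whose setup is not covered on this page.

```zig
const std = @import("std");

/// Hypothetical helper: copy `src` to the device and read it back into `dst`.
/// `B` is the buffer module, e.g. the result of importing
/// src/cuda/buffer.zig (import path is an assumption).
pub fn roundTrip(comptime B: type, ctx: anytype, src: []const u8, dst: []u8) !void {
    // Both copies use the same device buffer, so the lengths must agree.
    std.debug.assert(dst.len == src.len);

    var buf = try B.createBuffer(ctx, src.len);
    // Runs on every exit path, including early error returns added later.
    defer B.freeBuffer(&buf);

    B.upload(ctx, &buf, src); // host -> device
    B.download(ctx, &buf, dst); // device -> host
}
```

Note that `upload` and `download` return `void`, so only the allocation step can surface an error through the return value in this sketch.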