Last updated: 2026-06-12

Metal Runtime

Kernel Timing


Per-kernel Metal dispatch timing probe. Off by default and gated behind an environment flag.

When `ZINC_METAL_KERNEL_TIMING=1` is set at engine init, every compute dispatch inside `MetalCommand.dispatch*` is wrapped in a commit, wait, restart sequence so that the end-to-end time of each dispatch can be measured from the CPU side, in nanoseconds. The probe deliberately destroys throughput, because each dispatch becomes a GPU sync point. It is intended ONLY for `--profile` runs, where knowing which kernels dominate dispatch cost matters more than absolute tok/s.
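The wrapping described above can be sketched roughly as follows. This is a minimal illustration, not the code in `MetalCommand.dispatch*`: `dispatchTimed`, `ClockFn`, `SyncFn`, and the sample counters are hypothetical names, and the clock is injected by the caller so the sketch stays deterministic. A real probe would read a monotonic timer and forward the result to `record`.

```zig
const std = @import("std");

/// Stand-in for the documented `enabled` flag.
var enabled: bool = false;

/// Hypothetical sample sink; the real probe would call `record` instead.
var sample_count: usize = 0;
var sample_total_ns: u64 = 0;

/// Monotonic clock supplied by the caller (assumption made for this sketch).
const ClockFn = *const fn () u64;
/// Stand-in for "commit the command buffer, wait for the GPU, restart encoding".
const SyncFn = *const fn () void;

fn dispatchTimed(now_ns: ClockFn, commit_wait_restart: SyncFn) void {
    // Probe off: nothing is measured and no sync point is introduced.
    if (!enabled) return;

    const start = now_ns();
    commit_wait_restart(); // turns this one dispatch into a GPU sync point
    const elapsed = now_ns() - start;

    sample_count += 1;
    sample_total_ns += elapsed;
}

// Fake clock that advances 250 ns on every reading, for demonstration only.
var fake_now: u64 = 0;
fn fakeClock() u64 {
    fake_now += 250;
    return fake_now;
}

// Fake sync that returns immediately, for demonstration only.
fn fakeSync() void {}
```

Because every timed dispatch waits for the GPU before the next one is encoded, the measured time includes submission and synchronization overhead, which is why the numbers are useful for ranking kernels rather than for estimating production throughput.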

Aggregation is keyed by pipeline pointer; the human-readable label comes from `MetalPipeline.name`, which is set at shader load time in `forward_metal.zig`.
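One way to picture pointer-keyed aggregation is a small fixed-size table searched linearly. The sketch below is an assumption about the general shape, not the module's actual storage: `Slot`, `max_slots`, `recordSample`, and the field names are hypothetical, and only the parameter list mirrors the documented `record` signature.

```zig
const std = @import("std");

// Hypothetical per-pipeline accumulator; field names are assumptions.
const Slot = struct {
    key: usize = 0, // pipeline pointer address, used as the identity
    name: []const u8 = "",
    calls: u64 = 0,
    total_ns: u64 = 0,
};

const max_slots = 64; // assumed capacity for this sketch
var slots: [max_slots]Slot = [_]Slot{.{}} ** max_slots;
var used: usize = 0;

fn recordSample(pipe_handle: ?*const anyopaque, name: ?[]const u8, elapsed_ns: u64) void {
    // Without a pipeline pointer there is nothing to key on.
    const handle = pipe_handle orelse return;
    const key = @intFromPtr(handle);

    // Existing pipeline: fold the sample into its running totals.
    for (slots[0..used]) |*slot| {
        if (slot.key == key) {
            slot.calls += 1;
            slot.total_ns += elapsed_ns;
            return;
        }
    }

    // New pipeline: claim the next free slot, or drop the sample if full.
    if (used == max_slots) return;
    slots[used] = .{
        .key = key,
        .name = name orelse "<unnamed>",
        .calls = 1,
        .total_ns = elapsed_ns,
    };
    used += 1;
}
```

Keying on the pointer rather than the name means two pipelines that happen to share a label are still counted separately, and the name is only stored once, when the pipeline is first seen.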

7 exports, 0 methods. Source: src/metal/kernel_timing.zig


variable

enabled

pub var enabled: bool = false

Set to `true` at engine init when `ZINC_METAL_KERNEL_TIMING=1`.

src/metal/kernel_timing.zig:26

struct

Entry

pub const Entry = struct

Snapshot view of one pipeline's aggregated dispatch cost.

src/metal/kernel_timing.zig:33

function

enable

pub fn enable() void

Enable the probe for the rest of the process.

Idempotent.

src/metal/kernel_timing.zig:47

function

reset

pub fn reset() void

Clear accumulated stats.

Typically called at the start of a profile request.

src/metal/kernel_timing.zig:52

function

record

pub fn record(pipe_handle: ?*const anyopaque, name: ?[]const u8, elapsed_ns: u64) void

Record the elapsed nanoseconds of one dispatch against a pipeline.

Cheap when `enabled` is false: call sites check the flag and skip the call early.

src/metal/kernel_timing.zig:61

function

topByTotalNs

pub fn topByTotalNs(buf: []Entry) []Entry

Fill `buf` with up to `buf.len` entries ranked by descending `total_ns`.

Returns the populated prefix slice.

src/metal/kernel_timing.zig:132

function

topByAvgNs

pub fn topByAvgNs(buf: []Entry) []Entry

Fill `buf` with up to `buf.len` entries ranked by descending `avg_ns`.

Returns the populated prefix slice.

src/metal/kernel_timing.zig:138
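Both ranking functions follow the same caller-supplied-buffer pattern: the caller owns the storage and receives a slice covering only the filled part. The sketch below illustrates that pattern for the `total_ns` case. It is not the module's implementation: the `Entry` fields other than `total_ns` and `avg_ns` are guesses, `topByTotal` is a hypothetical name, and the `all` parameter stands in for the module's internal stats so the example is self-contained.

```zig
const std = @import("std");

// Hypothetical snapshot shape; the real `Entry` fields may differ.
const Entry = struct {
    name: []const u8,
    calls: u64,
    total_ns: u64,
    avg_ns: u64,
};

/// Copies the entries with the highest `total_ns` from `all` into `buf`,
/// largest first, and returns the populated prefix of `buf`.
fn topByTotal(all: []const Entry, buf: []Entry) []Entry {
    var n: usize = 0; // number of entries currently kept in buf
    for (all) |e| {
        // Find the position that keeps buf[0..n] in descending order.
        var pos: usize = n;
        while (pos > 0 and buf[pos - 1].total_ns < e.total_ns) pos -= 1;

        // Buffer is full and `e` ranks below everything kept: skip it.
        if (pos == buf.len) continue;

        // Grow the kept prefix if there is room; otherwise the last entry
        // falls off the end during the shift below.
        if (n < buf.len) n += 1;

        var i: usize = n - 1;
        while (i > pos) : (i -= 1) buf[i] = buf[i - 1];
        buf[pos] = e;
    }
    return buf[0..n];
}
```

Given the documented signatures, a caller would typically declare a small stack array such as `var buf: [8]Entry = undefined;` and pass `&buf`, then iterate over the returned slice rather than over `buf` itself, since fewer than `buf.len` pipelines may have been recorded.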