Last updated: 2026-06-12
Metal Runtime
Kernel Timing
Per-kernel Metal dispatch timing probe: default-off, gated by an environment flag.
When `ZINC_METAL_KERNEL_TIMING=1` is set at engine init, every compute dispatch inside `MetalCommand.dispatch*` is wrapped in commit+wait+restart, so that the CPU-side end-to-end time of each dispatch can be measured in nanoseconds. The probe is intentionally destructive to throughput, because each dispatch becomes a GPU sync point. It is intended ONLY for `--profile` runs, where evidence about which kernels dominate dispatch cost matters more than absolute tok/s.
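A minimal sketch of the measurement itself, assuming the dispatch call blocks until the GPU has finished (which is what the commit+wait step provides). `timeDispatch`, `DispatchFn` and the callback are hypothetical names; the real wrapping lives inside `MetalCommand.dispatch*` and is not shown on this page.

```zig
const std = @import("std");

// Hypothetical stand-in for "encode + commit + wait until completed" on a
// Metal command buffer. The real probe wraps MetalCommand.dispatch* instead.
pub const DispatchFn = *const fn (ctx: *anyopaque) void;

/// Run one synchronous dispatch and return its CPU-side wall-clock cost in ns.
/// Because `dispatch` is assumed to block until the GPU finishes, the measured
/// span covers encode, submit, GPU execution and the completion wait.
pub fn timeDispatch(dispatch: DispatchFn, ctx: *anyopaque) !u64 {
    var timer = try std.time.Timer.start();
    dispatch(ctx);
    return timer.read();
}
```

The blocking wait is exactly what makes the number meaningful per kernel, and also what serializes the GPU pipeline and costs throughput.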
Aggregation is keyed by pipeline pointer; the human-readable label comes from `MetalPipeline.name`, which is set at shader load time in `forward_metal.zig`.
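One way to aggregate by pipeline pointer, sketched as a fixed-capacity table with linear search. `Table`, `Slot` and the 64-entry capacity are assumptions for illustration, not the module's actual data structure.

```zig
const std = @import("std");

// Hypothetical per-pipeline slot. The name slice is stored as given (not
// copied), on the assumption that it outlives the probe, as a pipeline name
// set at shader load time would.
pub const Slot = struct {
    handle: *const anyopaque,
    name: []const u8,
    total_ns: u64,
    count: u64,
};

pub const Table = struct {
    slots: [64]Slot = undefined,
    len: usize = 0,

    /// Add `elapsed_ns` to the slot for `handle`, creating it on first sight.
    /// Returns false if the table is full and the sample was dropped.
    pub fn add(self: *Table, handle: *const anyopaque, name: []const u8, elapsed_ns: u64) bool {
        for (self.slots[0..self.len]) |*slot| {
            if (slot.handle == handle) {
                slot.total_ns += elapsed_ns;
                slot.count += 1;
                return true;
            }
        }
        if (self.len == self.slots.len) return false;
        self.slots[self.len] = .{ .handle = handle, .name = name, .total_ns = elapsed_ns, .count = 1 };
        self.len += 1;
        return true;
    }
};
```

Keying on the pointer rather than the name keeps lookups to a single pointer comparison; linear search keeps the sketch short, and a hash map would be the natural swap if the pipeline count were large.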
enabled (variable)
`pub var enabled: bool = false`
Set to true at engine init when `ZINC_METAL_KERNEL_TIMING=1`.
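A sketch of the init-time decision, assuming only the exact value `1` turns the probe on (the page documents `=1` but not how other values are treated). `shouldEnable` is a hypothetical helper; reading the variable itself is left out because the environment API differs between Zig releases.

```zig
const std = @import("std");

// Hypothetical helper: decide from the raw value of ZINC_METAL_KERNEL_TIMING
// (null when the variable is unset) whether to turn the probe on.
// Assumption: only the exact string "1" enables it.
pub fn shouldEnable(env_value: ?[]const u8) bool {
    const v = env_value orelse return false;
    return std.mem.eql(u8, v, "1");
}
```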
Entry (struct)
`pub const Entry = struct`
Snapshot view of one pipeline's aggregated dispatch cost.
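The fields of `Entry` are not listed on this page. A plausible shape, assuming it carries the two ranking keys `total_ns` and `avg_ns` plus a label and a dispatch count, is sketched below; every field name and the `fromTotals` constructor are hypothetical.

```zig
const std = @import("std");

// Hypothetical layout: only total_ns and avg_ns are implied by the docs
// (they are the keys topByTotalNs / topByAvgNs rank by).
pub const Entry = struct {
    name: []const u8,
    count: u64,
    total_ns: u64,
    avg_ns: u64,

    /// Build a snapshot from running totals; avg_ns is 0 when count is 0
    /// so an empty slot never divides by zero.
    pub fn fromTotals(name: []const u8, count: u64, total_ns: u64) Entry {
        return .{
            .name = name,
            .count = count,
            .total_ns = total_ns,
            .avg_ns = if (count == 0) 0 else total_ns / count,
        };
    }
};
```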
enable (function)
`pub fn enable() void`
Enable the probe for the rest of the process. Idempotent.
reset (function)
`pub fn reset() void`
Clear accumulated stats. Typically called at the start of a profile request.
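The `enable` / `reset` lifecycle can be sketched with module-level state. This assumes `reset` clears only the stats and leaves the probe on (consistent with "for the rest of the process"); the per-pipeline table is collapsed into a single hypothetical counter, and `recordSample` is a simplified stand-in for `record`.

```zig
const std = @import("std");

// Hypothetical module-level state mirroring the documented exports.
var enabled: bool = false;
var sample_total_ns: u64 = 0;
var sample_count: u64 = 0;

/// Idempotent: calling it again simply leaves the probe on.
pub fn enable() void {
    enabled = true;
}

/// Clear accumulated stats. Assumption: `enabled` is not touched.
pub fn reset() void {
    sample_total_ns = 0;
    sample_count = 0;
}

/// Simplified record: one global counter instead of per-pipeline slots.
pub fn recordSample(elapsed_ns: u64) void {
    if (!enabled) return;
    sample_total_ns += elapsed_ns;
    sample_count += 1;
}
```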
record (function)
`pub fn record(pipe_handle: ?*const anyopaque, name: ?[]const u8, elapsed_ns: u64) void`
Record one dispatch's worth of elapsed nanoseconds against a pipeline. Cheap when `enabled` is false: the call site checks the flag and skips early.
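A sketch of the call-site guard, assuming the flag is checked before any timer starts so that a disabled probe costs one branch per dispatch. The `dispatch` wrapper and the work callback are hypothetical; `record` here only counts calls, and ignoring a null handle is an assumption of this sketch rather than documented behavior.

```zig
const std = @import("std");

var enabled: bool = false;
var recorded: u64 = 0;

// Stand-in with the documented signature; it only counts calls.
pub fn record(pipe_handle: ?*const anyopaque, name: ?[]const u8, elapsed_ns: u64) void {
    _ = name;
    _ = elapsed_ns;
    if (pipe_handle == null) return; // assumption: nothing to key on, drop it
    recorded += 1;
}

/// Hypothetical call site: when the probe is off, the work runs untimed
/// and record() is never reached.
pub fn dispatch(pipe_handle: ?*const anyopaque, name: ?[]const u8, work: *const fn () void) !void {
    if (!enabled) {
        work();
        return;
    }
    var timer = try std.time.Timer.start();
    work();
    record(pipe_handle, name, timer.read());
}
```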
topByTotalNs (function)
`pub fn topByTotalNs(buf: []Entry) []Entry`
Fill `buf` with up to `buf.len` entries ranked by descending `total_ns`. Returns the populated prefix slice.
topByAvgNs (function)
`pub fn topByAvgNs(buf: []Entry) []Entry`
Fill `buf` with up to `buf.len` entries ranked by descending `avg_ns`. Returns the populated prefix slice.
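Both ranking functions follow the same pattern: sort a snapshot by one key, copy the head into `buf`, and return the filled prefix. The sketch below assumes a minimal hypothetical `Entry` and, unlike the documented API (which takes only `buf` and reads the stats internally), passes the snapshot explicitly as `all`; `topBy` and the comparators are hypothetical names.

```zig
const std = @import("std");

// Hypothetical Entry with just the fields the rankings need.
pub const Entry = struct {
    name: []const u8,
    total_ns: u64,
    avg_ns: u64,
};

fn byTotalDesc(_: void, a: Entry, b: Entry) bool {
    return a.total_ns > b.total_ns;
}

fn byAvgDesc(_: void, a: Entry, b: Entry) bool {
    return a.avg_ns > b.avg_ns;
}

/// Sort `all` in place with `lessThan`, copy up to buf.len entries into
/// `buf`, and return the populated prefix of `buf`.
fn topBy(all: []Entry, buf: []Entry, comptime lessThan: fn (void, Entry, Entry) bool) []Entry {
    std.mem.sort(Entry, all, {}, lessThan);
    const n = @min(all.len, buf.len);
    @memcpy(buf[0..n], all[0..n]);
    return buf[0..n];
}

pub fn topByTotalNs(all: []Entry, buf: []Entry) []Entry {
    return topBy(all, buf, byTotalDesc);
}

pub fn topByAvgNs(all: []Entry, buf: []Entry) []Entry {
    return topBy(all, buf, byAvgDesc);
}
```

Returning a prefix of the caller's buffer avoids any allocation in the reporting path; the caller decides how many rows it wants by sizing `buf`.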