Last updated: 2026-06-12

Inference Runtime

Lm Head

T-CPU LM_HEAD implementation.

Projects hidden state through a GGUF output matrix and writes logits.

struct

Params

pub const Params = struct

Inputs and outputs for one LM_HEAD call.

Parameters

raw_data
Raw GGUF tensor bytes for the output matrix `[vocab_size, hidden_dim]`.
tensor_type
GGML quantization format of `raw_data` (forwarded to `dequant.row`).
hidden
Final hidden state of length `hidden_dim`.
row_scratch
Caller-owned scratch buffer of length exactly `hidden_dim` for one dequantized row.
logits
Destination vector of length `vocab_size`; row `i` of the matrix maps to `logits[i]`.

src/zinc_rt/isa/cpu_zig/lm_head.zig:14
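
Read together, the field shapes above describe a matrix-vector product. As a sketch, write $W$ for the dequantized output matrix, $h$ for `hidden`, $V$ for `vocab_size`, and $D$ for `hidden_dim`. These symbols are introduced here for illustration only, and the formula assumes no bias term, since the field descriptions mention none:

```latex
\mathrm{logits}_i \;=\; \sum_{j=0}^{D-1} W_{i,j}\, h_j , \qquad 0 \le i < V
```

Each row $W_{i,\cdot}$ is what `row_scratch` holds while `logits[i]` is being computed, which is why the scratch buffer needs exactly `hidden_dim` elements.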

function

run

pub fn run(params: Params) !void

Project the hidden state through every row of the GGUF output matrix to produce vocab logits.

Rows are dequantized one at a time into `row_scratch` and dot-multiplied with `hidden`.

Parameters

params
Tensor data, hidden state, scratch row, and logits slice; see `Params`.

Returns

`error.EmptyInput` when either `hidden` or `logits` is empty, `error.ShapeMismatch` when the length of `row_scratch` is not exactly `hidden.len`, otherwise void.

src/zinc_rt/isa/cpu_zig/lm_head.zig:27
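
The row-by-row projection and the two documented error cases can be illustrated with a minimal sketch. This is not the code in `lm_head.zig`: it assumes Zig 0.11 or later, takes plain slices instead of `Params`, and uses a hypothetical `dequantRowF32` helper that copies from an already-f32 matrix in place of `dequant.row`. The names `lmHeadSketch` and `SketchError` and the extra matrix-size check are also assumptions made for the example.

```zig
const std = @import("std");

// Hypothetical error set mirroring the two errors named in the docs.
const SketchError = error{ EmptyInput, ShapeMismatch };

/// Hypothetical stand-in for `dequant.row`: copies row `row` of a
/// row-major f32 matrix into `out`. The real routine decodes GGML formats.
fn dequantRowF32(matrix: []const f32, row: usize, out: []f32) void {
    const start = row * out.len;
    @memcpy(out, matrix[start .. start + out.len]);
}

/// Sketch of the projection: one dequantized row at a time, dotted with `hidden`.
fn lmHeadSketch(
    matrix: []const f32,
    hidden: []const f32,
    row_scratch: []f32,
    logits: []f32,
) SketchError!void {
    if (hidden.len == 0 or logits.len == 0) return error.EmptyInput;
    if (row_scratch.len != hidden.len) return error.ShapeMismatch;
    // Assumption (not documented): reject a matrix whose size disagrees
    // with vocab_size * hidden_dim so the row slices stay in bounds.
    if (matrix.len != logits.len * hidden.len) return error.ShapeMismatch;

    for (logits, 0..) |*logit, i| {
        dequantRowF32(matrix, i, row_scratch);
        var acc: f32 = 0.0;
        for (row_scratch, hidden) |w, h| acc += w * h;
        logit.* = acc; // row i of the matrix maps to logits[i]
    }
}

test "usage: 3-token vocabulary, hidden_dim of 2" {
    const matrix = [_]f32{ 1, 0, 0, 1, 1, 1 }; // 3 rows x 2 columns
    const hidden = [_]f32{ 2, 3 };
    var scratch: [2]f32 = undefined;
    var logits: [3]f32 = undefined;

    try lmHeadSketch(&matrix, &hidden, &scratch, &logits);
    try std.testing.expectEqual(@as(f32, 5), logits[2]);
}
```

Reusing a single caller-owned scratch row keeps the working memory at `hidden_dim` floats regardless of `vocab_size`, which appears to be the reason `Params` asks the caller to supply `row_scratch`.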