Last updated: 2026-06-12

Inference Runtime

Embed


T-CPU EMBED implementation.

Reads one token row from a GGUF tensor into f32 hidden state.

2 exports shown

struct

Params

pub const Params = struct

Inputs and outputs for one EMBED call.

Parameters

raw_data
Raw GGUF tensor bytes for the embedding matrix `[vocab_size, hidden_dim]`.
tensor_type
GGML quantization format of `raw_data` (forwarded to `dequant.row`).
token_id
Row index to fetch; must be `< vocab_size`.
hidden_dim
Number of columns per row; also the required length of `output`.
vocab_size
Number of embedding rows in `raw_data`.
output
Destination hidden-state slice of length exactly `hidden_dim`.

src/zinc_rt/isa/cpu_zig/embed.zig:14

function

run

pub fn run(params: Params) !void

Dequantize the row at `params.token_id` of the embedding matrix into `params.output`.

Thin wrapper over `dequant.row` that validates the token index and output shape.

Parameters

params
Token id, matrix shape, and destination slice; see `Params`.

Returns

`error.TokenOutOfRange` when `token_id` is not less than `vocab_size`, `error.ShapeMismatch` when the length of `output` does not match `hidden_dim`, otherwise void.

src/zinc_rt/isa/cpu_zig/embed.zig:28
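
Example

The validation and row lookup described above can be sketched as follows. This is a minimal illustration, not the actual source: the field types in `Params`, the `TensorType` enum, the `EmbedError` set, the order of the two checks, and the plain f32 row copy standing in for `dequant.row` are all assumptions. Only the field names and the two error names come from this page. The sketch assumes Zig 0.12 or later and little-endian f32 data.

```zig
const std = @import("std");

// Hypothetical stand-in for the GGML tensor type; only unquantized f32 is handled here.
const TensorType = enum { f32 };

// Field names follow the documented `Params`; the field types are assumptions.
const Params = struct {
    raw_data: []const u8,
    tensor_type: TensorType,
    token_id: u32,
    hidden_dim: u32,
    vocab_size: u32,
    output: []f32,
};

// Hypothetical error set holding the two documented errors.
const EmbedError = error{ TokenOutOfRange, ShapeMismatch };

// Sketch of the documented checks. The real function forwards to `dequant.row`;
// this version copies one little-endian f32 row instead.
fn run(params: Params) EmbedError!void {
    if (params.token_id >= params.vocab_size) return error.TokenOutOfRange;
    if (params.output.len != params.hidden_dim) return error.ShapeMismatch;

    // Row `token_id` of a row-major [vocab_size, hidden_dim] f32 matrix.
    const row_bytes = @as(usize, params.hidden_dim) * @sizeOf(f32);
    const start = @as(usize, params.token_id) * row_bytes;
    const row = params.raw_data[start .. start + row_bytes];

    for (params.output, 0..) |*dst, i| {
        dst.* = @bitCast(std.mem.readInt(u32, row[i * 4 ..][0..4], .little));
    }
}

test "fetch row 1 of a 2x2 f32 matrix" {
    // Assumes a little-endian host so the in-memory bytes match the read order.
    const matrix = [_]f32{ 1.0, 2.0, 3.0, 4.0 };
    var out: [2]f32 = undefined;

    try run(.{
        .raw_data = std.mem.sliceAsBytes(&matrix),
        .tensor_type = .f32,
        .token_id = 1,
        .hidden_dim = 2,
        .vocab_size = 2,
        .output = &out,
    });
    try std.testing.expectEqual(@as(f32, 3.0), out[0]);
    try std.testing.expectEqual(@as(f32, 4.0), out[1]);

    // A token id equal to `vocab_size` is rejected.
    try std.testing.expectError(error.TokenOutOfRange, run(.{
        .raw_data = std.mem.sliceAsBytes(&matrix),
        .tensor_type = .f32,
        .token_id = 2,
        .hidden_dim = 2,
        .vocab_size = 2,
        .output = &out,
    }));
}
```

Checking both conditions before touching `raw_data` means a bad token id or a wrongly sized destination is reported as an error rather than surfacing as an out-of-bounds read or a partially written `output`.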