Last updated: 2026-06-12
Inference Runtime
Moe Gate Topk
T-CPU MOE_GATE_TOPK implementation.
Computes router logits from a GGUF gate matrix, then selects and normalizes the active expert weights using the same routing rules as the Vulkan path.
enum
RoutingRule
`pub const RoutingRule = enum`
Selection rule applied after the router projection to convert logits into expert weights.
struct
Params
`pub const Params = struct`
Inputs and outputs for one MoE gate + top-k call.
function
run
`pub fn run(params: Params) !void`
Project the hidden state through the router matrix, then select and normalize the top-k experts.
`run` first fills `logits` row by row (a matrix-vector product computed via `dequant.row`), then dispatches on `rule`: `softmax_all` applies softmax across all experts, picks the top-k, and renormalizes; `softmax_selected` picks the top-k by raw logit and applies softmax across that subset. Returns `error.InvalidTopK` when `k` is zero or larger than the expert count, and `error.ShapeMismatch` when the scratch or output slices are too small; otherwise returns void.
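The two routing rules described above can be sketched as follows. This is a minimal illustration, not the module's actual code: `route`, `topK`, and `softmaxInPlace` are hypothetical names, the enum is a local stand-in for `RoutingRule`, and the fields of `Params` are not shown on this page, so plain slices are used instead. The router projection through `dequant.row` is omitted; `logits` is assumed to be filled already, and `k` is taken from the length of the output index slice.

```zig
const std = @import("std");

// Local stand-in for the documented enum; variant names are taken from the text above.
const RoutingRule = enum { softmax_all, softmax_selected };

/// Numerically stable softmax, computed in place.
fn softmaxInPlace(xs: []f32) void {
    var max = xs[0];
    for (xs) |x| max = @max(max, x);
    var sum: f32 = 0;
    for (xs) |*x| {
        x.* = @exp(x.* - max);
        sum += x.*;
    }
    for (xs) |*x| x.* /= sum;
}

/// Write the indices of the ids.len largest scores into ids, largest first.
/// Simple O(n * k) selection; assumes ids.len <= scores.len.
fn topK(scores: []const f32, ids: []usize) void {
    for (ids, 0..) |*slot, n| {
        var best: ?usize = null;
        for (scores, 0..) |s, i| {
            // Skip experts that were already selected.
            if (std.mem.indexOfScalar(usize, ids[0..n], i) != null) continue;
            if (best == null or s > scores[best.?]) best = i;
        }
        slot.* = best.?;
    }
}

/// Hypothetical routing step: logits are already filled, k = ids.len.
/// softmax_all overwrites logits with probabilities.
fn route(
    rule: RoutingRule,
    logits: []f32,
    ids: []usize,
    weights: []f32,
) error{ InvalidTopK, ShapeMismatch }!void {
    const k = ids.len;
    if (k == 0 or k > logits.len) return error.InvalidTopK;
    if (weights.len < k) return error.ShapeMismatch;
    switch (rule) {
        .softmax_all => {
            // Softmax over every expert, pick top-k, renormalize the picks.
            softmaxInPlace(logits);
            topK(logits, ids);
            var sum: f32 = 0;
            for (ids) |id| sum += logits[id];
            for (ids, 0..) |id, j| weights[j] = logits[id] / sum;
        },
        .softmax_selected => {
            // Pick top-k by raw logit, then softmax over that subset only.
            topK(logits, ids);
            for (ids, 0..) |id, j| weights[j] = logits[id];
            softmaxInPlace(weights[0..k]);
        },
    }
}

pub fn main() !void {
    var logits = [_]f32{ 1.0, 3.0, 2.0, 0.5 };
    var ids: [2]usize = undefined;
    var weights: [2]f32 = undefined;
    try route(.softmax_selected, &logits, &ids, &weights);
    for (ids, 0..) |id, j| {
        std.debug.print("expert {d}: weight {d:.4}\n", .{ id, weights[j] });
    }
}
```

In exact arithmetic both branches of this sketch yield the same weights, because the full-softmax denominator cancels when the selected probabilities are renormalized; the two orderings differ only in floating-point rounding. Whether the real implementation behaves identically in edge cases (ties, non-finite logits) is not stated here.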