multiple tokens, and a verifier filters them using the main model’s confidence. The project focuses on speed–accuracy tradeoffs, visualization, and a modular design for easy benchmarking and research.
visualization benchmarking acceleration research rejection-sampling modular-design llm-inference speculative-decoding token-verification verifier-guided-decoding draft-model efficient-generation speed-accuracy-tradeoff
Updated Nov 9, 2025 - Jupyter Notebook
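
The verification step named in the description (a verifier filtering drafted tokens by the main model's confidence) might look roughly like the sketch below. This is not the repository's code. The function `verify_draft`, the `threshold` parameter, the `toy_target` lookup table, and the rule "accept while the main model assigns at least `threshold` probability" are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical type: maps a token context to a next-token probability table.
NextTokenProbs = Callable[[Sequence[str]], Dict[str, float]]


def verify_draft(
    context: Sequence[str],
    draft_tokens: Sequence[str],
    target_probs: NextTokenProbs,
    threshold: float = 0.3,
) -> List[str]:
    """Return the longest prefix of draft_tokens the main model is confident in.

    Assumed rule: a drafted token is kept only if the main (target) model
    gives it probability >= threshold; the first failure discards the rest.
    """
    accepted: List[str] = []
    for token in draft_tokens:
        # Score the token against the context plus everything accepted so far.
        probs = target_probs(list(context) + accepted)
        if probs.get(token, 0.0) < threshold:
            break  # later drafted tokens depend on this one, so stop here
        accepted.append(token)
    return accepted


def toy_target(context: Sequence[str]) -> Dict[str, float]:
    """Toy stand-in for the main model: a bigram table keyed on the last token."""
    table = {
        "the": {"cat": 0.6, "dog": 0.4},
        "cat": {"sat": 0.7, "ran": 0.3},
        "sat": {"on": 0.9, "up": 0.1},
    }
    return table.get(context[-1], {}) if context else {}


# The draft proposes four tokens; the third falls below the threshold.
accepted = verify_draft(["the"], ["cat", "sat", "up", "high"], toy_target, threshold=0.3)
print(accepted)  # ['cat', 'sat']
```

Lowering `threshold` accepts more drafted tokens per step (faster, less faithful to the main model), while raising it rejects more (slower, closer to the main model), which is one way to read the speed–accuracy tradeoff. The loop scores positions one at a time for readability; practical implementations typically score all drafted positions in a single forward pass of the main model.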