What Is KV Cache and Why Does It Make LLM Inference Fast?
Every token an LLM generates reuses Keys and Values from everything that came before. The KV cache is what makes that reuse cheap. Here's how it works — and why inference slows down with longer context.
Johannes Hayer
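To make the teaser concrete before diving in, here is a minimal sketch of that reuse in a toy decode loop. It is not code from any real model: the weight matrices, dimensions, and the `decode_step` helper are illustrative assumptions. The point is simply that each new token's Key and Value are computed once, appended to a cache, and every later step attends over the cached rows instead of recomputing them.

```python
# Minimal NumPy sketch of KV caching (toy dimensions, illustrative only).
import numpy as np

d = 8                                          # toy head dimension (assumption)
rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

K_cache = np.empty((0, d))                     # grows by one row per generated token
V_cache = np.empty((0, d))

def decode_step(x):
    """Attend the new token embedding x (shape [d]) over all cached K/V."""
    global K_cache, V_cache
    q = x @ W_q
    K_cache = np.vstack([K_cache, x @ W_k])    # reuse: old rows are never recomputed
    V_cache = np.vstack([V_cache, x @ W_v])
    scores = K_cache @ q / np.sqrt(d)          # one score per cached position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V_cache                   # attention output for this step

for t in range(5):
    out = decode_step(rng.standard_normal(d))
    print(t, K_cache.shape)                    # cache length grows with the context
```

The cache trades memory for compute: per step, the work that remains is one attention pass over a cache that grows linearly with context length, which is also why long contexts slow generation down.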