What Is KV Cache and Why Does It Make LLM Inference Fast?
Every token an LLM generates reuses Keys and Values from everything that came before. The KV cache is what makes that reuse cheap. Here's how it works — and why inference slows down with longer context.
Johannes Hayer
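To make the teaser concrete before diving in, here is a minimal sketch of that reuse in a toy decode loop. It is not code from any real model: the weight matrices, dimensions, and the `decode_step` helper are illustrative assumptions. The point is simply that each new token's Key and Value are computed once, appended to a cache, and every later step attends over the cached rows instead of recomputing them.

```python
# Minimal NumPy sketch of KV caching (toy dimensions, illustrative only).
import numpy as np

d = 8                                          # toy head dimension (assumption)
rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

K_cache = np.empty((0, d))                     # grows by one row per generated token
V_cache = np.empty((0, d))

def decode_step(x):
    """Attend the new token embedding x (shape [d]) over all cached K/V."""
    global K_cache, V_cache
    q = x @ W_q
    K_cache = np.vstack([K_cache, x @ W_k])    # reuse: old rows are never recomputed
    V_cache = np.vstack([V_cache, x @ W_v])
    scores = K_cache @ q / np.sqrt(d)          # one score per cached position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V_cache                   # attention output for this step

for t in range(5):
    out = decode_step(rng.standard_normal(d))
    print(t, K_cache.shape)                    # cache length grows with the context
```

The cache trades memory for compute: per step, the work that remains is one attention pass over a cache that grows linearly with context length, which is also why long contexts slow generation down.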