Back to Blog

What Actually Happens Inside a Transformer Block

Attention gets all the press. But a transformer block is more than attention — there's a feedforward network that holds most of the parameters, two residual connections, and a normalisation design that determines whether large-scale training is stable. Here's all of it, in order.

Johannes Hayer

Johannes Hayer

johanneshayer

    What Actually Happens Inside a Transformer Block