
    What an LLM Does

    Nina ships her first LLM feature in a weekend. It answers questions, summarizes docs, writes code. Then a user asks: "What's the weather in Berlin?" — and the model confidently invents yesterday's forecast.

    Nina didn't break anything. That is how LLMs work. You would have built exactly the same thing.

    An LLM does one thing: take text in, predict the next token. Not browse the internet. Not query a database. Just apply patterns learned during training from a massive slice of the internet, one token at a time.

    That's why:

    • Hallucination = lossy compression — trillions of words squeezed into billions of parameters, gaps filled with plausible fabrications
    • Streaming responses = causal, autoregressive generation: the model predicts one token at a time, left to right, and each token can be shown as soon as it is produced
    • Uncanny understanding = contextual embeddings — words shift meaning based on surrounding context
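The core loop (text in, one predicted token out, repeat) can be sketched without any neural network. This is a minimal illustration under loud assumptions: a toy bigram counter stands in for the model, tokens are whitespace-split words rather than subword tokens, and `train_bigram` and `generate` are hypothetical names. A real LLM replaces the counting with a Transformer that scores every token in its vocabulary, but the generation loop has the same shape.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Toy 'model': for every token, count which tokens follow it."""
    tokens = text.split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        follows[prev][nxt] += 1
    return follows

def generate(follows, start, max_tokens=5):
    """Yield predicted tokens one at a time (greedy: most frequent follower)."""
    token = start
    for _ in range(max_tokens):
        if token not in follows:
            break  # this toy has no pattern for the token, so it stops
        token = follows[token].most_common(1)[0][0]
        yield token  # each token is available before the next one exists

corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram(corpus)
print(" ".join(generate(model, "the", max_tokens=4)))  # cat sat on the
```

Because `generate` is a Python generator, each token can be consumed as soon as it is predicted, which mirrors why LLM responses stream. Greedy selection is another simplification; deployed systems often sample from the predicted distribution instead.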

    None of this is magic. It's engineering — built by solving one concrete failure at a time. It starts with a surprisingly dumb idea: counting words.

    Term note: Token, parameters, compression, attention, and embeddings are introduced here; they get deeper treatment in Module 2, Tokens & Embeddings, and Transformer Internals.

    From Word Counts to Meaning Vectors

    Before 2013, many language systems represented text by counting words (a "bag of words"). "Bank" plus "loan" meant finance. "Bank" plus "river" meant geography. Counting powered spam filters and search engines for years.

    Fatal flaw: word order disappears entirely. "The dog bit the man" and "The man bit the dog" produce identical vectors — same words, same counts, opposite meaning.
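A bag-of-words vector is just a table of counts, which makes the flaw easy to demonstrate. A minimal sketch using Python's `collections.Counter`; the lowercase-and-split tokenizer and the `bag_of_words` name are simplifying assumptions.

```python
from collections import Counter

def bag_of_words(sentence):
    """Represent a sentence by word counts only; word positions are discarded."""
    return Counter(sentence.lower().split())

a = bag_of_words("The dog bit the man")
b = bag_of_words("The man bit the dog")
print(a == b)  # True: same counts, opposite meaning
```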

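    Word2Vec (2013) didn't restore word order; that problem waited for sequence models. It fixed a different flaw: in a count vector, every word is its own unrelated dimension, so "loan" is no closer to "mortgage" than to "banana." Word2Vec places words in a vector space based on co-occurrence, so words that appear in similar contexts end up nearby: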

    king - man + woman ≈ queen
    

    Each word gets an embedding: a dense vector of a few hundred numbers (commonly 300) that captures aspects of its meaning. These aren't hand-coded; Word2Vec learns them by training a shallow network to predict words from their neighbors across billions of words of text.
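The `king - man + woman ≈ queen` analogy is plain vector arithmetic followed by a nearest-neighbor search using cosine similarity. The sketch below uses hand-picked 3-dimensional vectors whose dimensions are labeled for readability; these numbers are hypothetical, since real Word2Vec vectors are learned and their dimensions have no assigned meaning. Excluding the three input words from the candidates follows common practice for analogy queries.

```python
import math

# Hypothetical toy embeddings: dimensions loosely mean (royalty, male-vs-female, food).
EMB = {
    "king":   [0.9,  0.8, 0.0],
    "queen":  [0.9, -0.8, 0.0],
    "man":    [0.1,  0.8, 0.0],
    "woman":  [0.1, -0.8, 0.0],
    "prince": [0.7,  0.9, 0.1],
    "apple":  [0.0,  0.0, 0.9],
}

def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(EMB[a], EMB[b], EMB[c])]
    candidates = [w for w in EMB if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(EMB[w], target))

print(analogy("king", "man", "woman"))  # queen
```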

    But Word2Vec assigns one fixed embedding per word. "Bank" gets the same vector next to "river" as next to "deposit." That's not a corner case — every polysemous word in English is frozen in the same ambiguity. Your model can't tell which meaning is active. Context has to come from somewhere.

    Term note: Vectors, Word2Vec, embeddings, and semantic similarity are introduced here; they are explained deeply in Tokens & Embeddings and Embeddings & Vector Databases.

    Checkpoint

    Word2Vec encodes `king − man + woman ≈ queen`. But it assigns one embedding to 'bank' regardless of context. What does this make it unable to do?

    Attention → Transformer → Scale

    Word2Vec gave "bank" one fixed vector — same whether it sits next to "river" or "deposit." Real language doesn't work that way.

    Attention (first used for machine translation in 2014) pointed to the fix: make embeddings dynamic by letting each word's representation weigh the words around it. In "I deposited money at the bank," attention shifts "bank" toward finance because it draws on "money" and "deposited." Same word, different vector per context: contextual embeddings.
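Here is a minimal sketch of that shift using scaled dot-product self-attention. Assumptions, stated loudly: the 2-dimensional vectors are invented, function words are dropped from the sentences for brevity, and there are no learned query/key/value projections (real Transformers multiply by learned matrices first). `STATIC` and `self_attention` are hypothetical names.

```python
import math

# Hypothetical static embeddings: dimension 0 ~ "finance", dimension 1 ~ "rivers".
# Like Word2Vec, "bank" has ONE vector, halfway between its two meanings.
STATIC = {
    "money":     [1.0, 0.0],
    "deposited": [0.9, 0.1],
    "river":     [0.0, 1.0],
    "muddy":     [0.2, 0.8],
    "bank":      [0.5, 0.5],
}

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections (a simplification)."""
    vecs = [STATIC[t] for t in tokens]
    d = len(vecs[0])
    out = []
    for q in vecs:  # each token scores its relevance to every token
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vecs]
        weights = softmax(scores)
        # New vector = weighted average of all token vectors in the sentence.
        out.append([sum(w * v[i] for w, v in zip(weights, vecs)) for i in range(d)])
    return out

finance = self_attention(["deposited", "money", "bank"])[-1]
rivers = self_attention(["muddy", "river", "bank"])[-1]
print(finance, rivers)  # same word "bank", two different contextual vectors
```

With these symmetric toy numbers the attention weights come out essentially equal, so the shift comes from mixing in the neighbors' vectors; in trained models, the learned projections are what make the weights themselves selective.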

    But attention was bolted onto RNNs — sequential models, one token at a time. Token 50 can't start until tokens 1–49 finish. No GPU count fixes that.
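The bottleneck lives in the recurrence itself: each hidden state is computed from the previous one. A toy scalar RNN makes the dependency explicit. The weights `w_x` and `w_h` are made-up constants and `rnn_hidden_states` is a hypothetical name; real RNNs use learned weight matrices over vectors.

```python
import math

def rnn_hidden_states(inputs, w_x=0.5, w_h=0.8):
    """Toy 1-d RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1}), starting from h = 0."""
    h = 0.0
    states = []
    for x in inputs:  # strictly left to right
        h = math.tanh(w_x * x + w_h * h)  # needs the previous h, so no skipping ahead
        states.append(h)
    return states

states = rnn_hidden_states([1.0, 0.0, -1.0, 0.5])
print(states)
```

There is no way to compute `states[3]` without first computing `states[0]` through `states[2]`, and adding hardware cannot break that chain within a single sequence.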

    The Transformer (2017) cut the RNN entirely. Its paper's title, "Attention Is All You Need," states the thesis: attention alone is sufficient. Every token attends to every other in one parallel pass, which made training on internet-scale text practical.

    The original Transformer paired an encoder with a decoder. Later models took one half each:

    • Encoder-only (BERT): Bidirectional — understands. Cannot generate.
    • Decoder-only (GPT): Causal — generates left to right.
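The encoder-only versus decoder-only split comes down to a mask on attention. A toy sketch under stated assumptions: scalar "embeddings" (so the usual scaling by the square root of the dimension is 1), no learned projections, and a hypothetical `attend` function written as a loop for readability. Production implementations compute every position at once with matrix products, setting masked scores to negative infinity before the softmax, which is what makes the single parallel pass possible.

```python
import math

def attend(vecs, causal):
    """Self-attention over scalar embeddings. If causal, position i sees only 0..i."""
    out = []
    for i, q in enumerate(vecs):
        visible = vecs[: i + 1] if causal else vecs  # the mask
        scores = [q * k for k in visible]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, visible)) / total)
    return out

a = attend([1.0, 2.0, 3.0], causal=True)
b = attend([1.0, 2.0, -5.0], causal=True)  # change only the LAST token
print(a[:2] == b[:2])  # True: earlier positions never saw the future
```

Under the causal mask, editing the last token leaves every earlier output unchanged, so a decoder can generate left to right. Drop the mask (`causal=False`) and position 0 changes too, which is the bidirectional behavior an encoder like BERT relies on.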

    Most modern LLMs are decoder-only. Scale then did something no one fully predicted: abilities like multi-step reasoning, writing code, and basic arithmetic emerged without being explicitly trained in.

    Next: what IS an LLM physically — not a cloud, but two files on a laptop.

    Term note: Attention, RNNs, Transformers, encoder/decoder models, BERT, and GPT are introduced here; the mechanics are explained deeply in Transformer Internals and LLM Types & Modalities.

    Checkpoint

    Why did the Transformer replace RNNs even when both could use attention?

    Discuss

    Every step in this arc solved a concrete failure of the previous one. Current LLMs still struggle with exact arithmetic, real-time data, and grounding in facts. What do you think the next breakthrough will solve, and what architectural change might it require?