Embeddings & Semantic Search
Understand embeddings, semantic similarity, and vector databases — the foundation for RAG and semantic search.
Why take this course?
Before you can retrieve, you need to represent. Learn how text becomes vectors, how vector databases make retrieval fast at scale, and how to build a production semantic search pipeline from corpus to query.
Prerequisites
This course builds on concepts from prerequisite courses; we recommend completing those first.
Course Modules
Text has two layers: surface form (words, characters) and semantic meaning. Learn how embeddings bridge them — converting text into dense vectors where proximity encodes similarity, enabling search that understands intent rather than matching keywords.
Learning Goals
- Explain the semantic gap and why keyword search fails for conceptual queries.
- Describe what an embedding is and how high-dimensional vectors encode semantic meaning.
- Understand cosine similarity and why comparing the angle between vectors, rather than their Euclidean distance, is the right way to compare text embeddings.
- Distinguish static embeddings (Word2Vec) from contextual embeddings (BERT/SBERT) and explain why context matters.
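The cosine-similarity goal above is easy to make concrete. Here is a minimal sketch in numpy; the three toy 3-dimensional "embeddings" and their names are illustrative stand-ins for real model output, which has hundreds of dimensions:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors; real embedding models produce hundreds of dimensions.
cheap_housing = np.array([0.9, 0.1, 0.2])
affordable_homes = np.array([1.8, 0.2, 0.4])  # same direction, twice the length
sports_scores = np.array([0.1, 0.9, 0.1])

# Scaling a vector changes its Euclidean distance to everything,
# but not its angle -- which is why cosine is preferred for text.
assert abs(cosine_similarity(cheap_housing, affordable_homes) - 1.0) < 1e-6
print(cosine_similarity(cheap_housing, sports_scores))  # low: unrelated topics
```

The second vector is just the first scaled by two, so Euclidean distance says they differ while cosine correctly says they point the same way.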
Concept Card Preview
Visuals, diagrams, and micro-interactions you'll see in this module.

The Semantic Gap
Nina's users keep complaining: "I searched for 'affordable places to live' and got nothing." She checks the database — t…

What Are Embeddings?
So how do you represent meaning as numbers? An embedding is a dense vector of floating-point numbers — typically 768…
Measuring Similarity
Embeddings live in high-dimensional space. To compare them, you need a distance metric — and the right one isn't the obv…
With millions of embeddings, brute-force similarity search becomes impractical. Explore the algorithms (HNSW, IVF) and databases (FAISS, Pinecone, Weaviate, Milvus) that make semantic search fast at scale — and the metadata filtering and hybrid search capabilities that make it useful in real applications.
Learning Goals
- Explain why exact nearest neighbor search fails at scale and what Approximate Nearest Neighbor (ANN) algorithms trade for speed.
- Describe how HNSW and IVF indexes work and when to use each.
- Compare major vector databases (FAISS, Pinecone, Weaviate, Milvus, Chroma) on operational tradeoffs.
- Understand metadata filtering and hybrid search (dense + sparse) as essential production features.
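The IVF idea from the goals above fits in a few lines of numpy. This is a toy sketch, not FAISS: the "centroids" are randomly sampled corpus vectors standing in for real k-means clusters, but the mechanics — partition into inverted lists, probe only a few lists per query — are the same:

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.normal(size=(10_000, 64)).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)  # unit vectors: dot = cosine

# Build an IVF-style index: assign every vector to its best-matching centroid.
k = 32
centroids = corpus[rng.choice(len(corpus), k, replace=False)]  # crude k-means stand-in
assignments = np.argmax(corpus @ centroids.T, axis=1)
inverted_lists = {c: np.where(assignments == c)[0] for c in range(k)}

def ivf_search(query: np.ndarray, nprobe: int = 4, top_k: int = 5) -> np.ndarray:
    """Scan only the `nprobe` lists whose centroids best match the query,
    trading a little recall for a ~k/nprobe reduction in work."""
    probe = np.argsort(query @ centroids.T)[-nprobe:]
    candidates = np.concatenate([inverted_lists[c] for c in probe])
    scores = corpus[candidates] @ query
    return candidates[np.argsort(scores)[-top_k:][::-1]]

query = corpus[123]          # a known corpus vector: its own list gets probed
results = ivf_search(query)
print(results)
```

With nprobe=4 of 32 lists, each query scans roughly an eighth of the corpus; recall drops only for true neighbors that landed in unprobed lists, which is exactly the speed-for-recall trade the goals describe.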
Concept Card Preview
Visuals, diagrams, and micro-interactions you'll see in this module.
Why Exact Search Doesn't Scale
Nina's embedding prototype searches 1,000 documents in 2 milliseconds. She's thrilled — until the production dataset arr…

HNSW — Navigating a Graph of Neighbors
Hierarchical Navigable Small World (HNSW) is the most popular ANN algorithm in production vector databases. Think of…
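The core move inside HNSW is a greedy walk over a proximity graph. This sketch builds a single flat layer (real HNSW constructs the graph incrementally and adds sparser upper layers for long-range hops); the graph-construction shortcut here is purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
vectors = rng.normal(size=(500, 32))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

# Crude proximity graph: link each node to its M nearest neighbors.
M = 8
sims = vectors @ vectors.T
neighbors = np.argsort(sims, axis=1)[:, -M - 1:-1]  # M nearest, excluding self

def greedy_search(query: np.ndarray, entry: int = 0):
    """Hop to whichever neighbor is closest to the query; stop at a local
    optimum. HNSW runs this walk layer by layer, coarse to fine."""
    current = entry
    best = vectors[current] @ query
    while True:
        improved = False
        for n in neighbors[current]:
            score = vectors[n] @ query
            if score > best:
                best, current, improved = score, int(n), True
        if not improved:
            return current, best

query = vectors[42]
found, score = greedy_search(query)
print(found, round(float(score), 3))
```

Greedy search can stall in a local optimum on a single layer; the hierarchy of layers is precisely HNSW's fix, letting the walk take long hops near the entry point before refining locally.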
The Vector Database Landscape
Raw ANN libraries like FAISS give you speed but not persistence, sharding, or filtering. Vector databases handle the ful…
Not all embedding models are equal. Learn how Sentence-BERT bi-encoders enable fast semantic search, how commercial APIs (OpenAI, Cohere) compare to open-source alternatives, what Matryoshka embeddings unlock for two-stage retrieval, and when domain-specific fine-tuning is worth the effort.
Learning Goals
- Explain how SBERT bi-encoder architecture enables pre-computed corpus embeddings for fast semantic search.
- Compare commercial embedding APIs (OpenAI, Cohere) vs. open-source SBERT on cost, privacy, and customizability.
- Describe Matryoshka Representation Learning and how truncated embeddings enable two-stage retrieval.
- Identify when domain-specific fine-tuning is necessary and what training data strategies support it.
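The bi-encoder advantage in the first goal is architectural, so a toy can show it. Here a hash-based bag-of-words function stands in for an SBERT encoder (the vectors are meaningless compared to real embeddings); what matters is the shape of the computation — the corpus is encoded once, and each query then costs one encoder call plus one matrix multiply:

```python
import zlib
import numpy as np

DIM = 256

def embed(text: str) -> np.ndarray:
    """Stand-in for an SBERT encoder: hash words into a fixed-size vector.
    A real bi-encoder runs the transformer once per text, never per pair."""
    v = np.zeros(DIM)
    for word in text.lower().split():
        v[zlib.crc32(word.encode()) % DIM] += 1.0
    return v / (np.linalg.norm(v) or 1.0)

corpus = [
    "affordable apartments for rent downtown",
    "cheap housing near the city center",
    "championship game final score",
]
# Embed the corpus ONCE at index time...
corpus_matrix = np.stack([embed(d) for d in corpus])

# ...then each query is one encode plus one matrix multiply, instead of a
# full cross-encoder forward pass for every (query, document) pair.
query_vec = embed("affordable apartments downtown")
scores = corpus_matrix @ query_vec
print(corpus[int(np.argmax(scores))])
```

Swapping in a real model means replacing `embed` with a single `model.encode(...)` call; the precompute-then-multiply structure is unchanged.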
Concept Card Preview
Visuals, diagrams, and micro-interactions you'll see in this module.

Sentence-BERT and Bi-Encoders
Nina tries to use BERT for semantic search. To compare a query against 10,000 documents, BERT needs to process the query…
Commercial vs. Open-Source Embeddings
SBERT showed that sentence embeddings could be practical. Then commercial APIs made them trivial.
OpenAI text-embeddi…
Matryoshka Embeddings
Traditional embeddings are fixed-size. A 1536-dim model always produces 1536-dim vectors, whether you need that precisio…
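The two-stage retrieval pattern Matryoshka enables can be sketched directly. The random vectors below stand in for real Matryoshka embeddings (which are trained so a prefix of the vector retains most of the information); the coarse-then-rerank mechanics are the same either way:

```python
import numpy as np

rng = np.random.default_rng(2)
FULL_DIM, COARSE_DIM = 512, 64

corpus = rng.normal(size=(20_000, FULL_DIM)).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

def truncate(vecs: np.ndarray, dim: int) -> np.ndarray:
    """Take a prefix of each vector and renormalize it. Matryoshka models
    front-load information so this prefix is itself a usable embedding."""
    v = vecs[..., :dim]
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

query = corpus[7]

# Stage 1: cheap coarse pass over 64-dim prefixes to collect candidates.
coarse_scores = truncate(corpus, COARSE_DIM) @ truncate(query, COARSE_DIM)
candidates = np.argsort(coarse_scores)[-200:]

# Stage 2: exact rerank of the shortlist with the full 512-dim vectors.
rerank_scores = corpus[candidates] @ query
top = candidates[np.argsort(rerank_scores)[-5:][::-1]]
print(top)
```

Stage 1 touches only an eighth of each vector across the whole corpus; stage 2 pays full-dimension cost on just 200 candidates, which is the point of the two-stage design.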
Building a working semantic search system requires decisions about chunking strategy, chunk size, metadata design, and operational pipeline consistency. Learn how these choices interact and what a production embedding pipeline looks like end-to-end.
Learning Goals
- Compare chunking strategies (fixed-size, sentence-window, recursive, semantic) and match each to appropriate use cases.
- Explain the precision-context tradeoff in chunk size selection and the parent-child pattern that resolves it.
- Design an end-to-end embedding pipeline: load → clean → chunk → embed → upsert, with metadata for provenance.
- Understand why preprocessing consistency between ingestion and query time is critical for retrieval quality.
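Fixed-size chunking with overlap, the baseline strategy in the goals above, is short enough to show whole. This sketch splits on words for simplicity (production systems usually count tokens with the embedding model's tokenizer):

```python
def chunk_fixed(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Fixed-size chunks over words, with overlap so a sentence cut at one
    boundary still appears whole inside the neighboring chunk."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

doc = " ".join(f"word{i}" for i in range(500))
chunks = chunk_fixed(doc)
print(len(chunks), len(chunks[0].split()))
```

The last 40 words of each chunk reappear as the first 40 of the next, which is the overlap guarantee; the other strategies in the goals (sentence-window, recursive, semantic) replace the fixed `step` with structure-aware boundaries.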
Concept Card Preview
Visuals, diagrams, and micro-interactions you'll see in this module.
Chunking — The First Decision
Nina has her embedding model picked and her vector database ready. But before she can embed anything, she faces a questi…
The Precision-Context Tradeoff
Chunk size controls a fundamental tension: precision versus context.
Small chunks (64-256 tokens) are semantically…
The Embedding Pipeline
A semantic search system isn't a vector database — it's a pipeline. Nina learns this when her first prototype works but…
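The load → clean → chunk → embed → upsert pipeline from the goals fits in one small sketch. The in-memory list stands in for a vector database, and the hash-based `embed` for a real model; the names and the metadata fields are illustrative:

```python
import re
import zlib
import numpy as np

def clean(text: str) -> str:
    """Normalization applied at BOTH ingestion and query time; if the two
    paths diverge, corpus and query vectors drift apart and recall drops."""
    return re.sub(r"\s+", " ", text).strip().lower()

def chunk(text: str, size: int = 50) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str, dim: int = 128) -> np.ndarray:
    v = np.zeros(dim)                      # stand-in for a real embedding model
    for w in text.split():
        v[zlib.crc32(w.encode()) % dim] += 1.0
    return v / (np.linalg.norm(v) or 1.0)

store: list[dict] = []                     # stand-in for a vector DB

def upsert(doc_id: str, text: str) -> None:
    cleaned = clean(text)                  # load -> clean -> chunk -> embed -> upsert
    for i, ch in enumerate(chunk(cleaned)):
        store.append({
            "id": f"{doc_id}#chunk{i}",    # provenance: which doc, which chunk
            "vector": embed(ch),
            "text": ch,
            "source": doc_id,
        })

upsert("contract-001", "This  AGREEMENT is made between the parties ...")
print(store[0]["id"], store[0]["vector"].shape)
```

Queries must pass through the same `clean` and `embed` functions before searching `store` — that shared path is the consistency requirement the last learning goal calls critical.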
Put everything together. Nina builds semantic search over 2 million legal contracts — choosing an embedding model, chunking strategy, vector database, and search architecture under real constraints.
Learning Goals
- Apply embedding model selection to a real constraint set (VPC compliance, legal vocabulary).
- Choose a chunking strategy that matches document structure (clause-boundary splitting).
- Diagnose production failures (exact ID search, stale data) and select targeted fixes.
- Design a complete semantic search architecture from constraints to deployment.
Concept Card Preview
Visuals, diagrams, and micro-interactions you'll see in this module.
The Brief
Nina's phone buzzes during standup. It's the CTO: "We just signed Hargrove & Associates — 400 attorneys, 2 million contr…
The Constraints
Nina opens a notebook and maps the problem. The corpus is large — 2 million contracts at ~10,000 tokens each means rough…
The Chunking Challenge
Nina deploys the fine-tuned SBERT model. Embeddings look good on test queries. Now she faces her second decision: how to…