Embeddings & Semantic Search
Understand embeddings, semantic similarity, and vector databases — the foundation for RAG and semantic search.
Why take this course?
Before you can retrieve, you need to represent. Learn how text becomes vectors, how vector databases make retrieval fast at scale, and how to build a production semantic search pipeline from corpus to query.
Prerequisites
This course builds on concepts from prerequisite courses; we recommend completing those first.
Course Modules
Text has two layers: surface form (words, characters) and semantic meaning. Learn how embeddings bridge them — converting text into dense vectors where proximity encodes similarity, enabling search that understands intent rather than matching keywords.
Learning Goals
- Explain the semantic gap and why keyword search fails for conceptual queries.
- Describe what an embedding is and how high-dimensional vectors encode semantic meaning.
- Understand cosine similarity and why comparing the angle between vectors, rather than their Euclidean distance, is the right way to compare text embeddings.
- Distinguish static embeddings (Word2Vec) from contextual embeddings (BERT/SBERT) and explain why context matters.
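The cosine-similarity goal above is easy to make concrete. Here is a minimal sketch in numpy; the three toy 3-dimensional "embeddings" and their names are illustrative stand-ins for real model output, which has hundreds of dimensions:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors; real embedding models produce hundreds of dimensions.
cheap_housing = np.array([0.9, 0.1, 0.2])
affordable_homes = np.array([1.8, 0.2, 0.4])  # same direction, twice the length
sports_scores = np.array([0.1, 0.9, 0.1])

# Scaling a vector changes its Euclidean distance to everything,
# but not its angle -- which is why cosine is preferred for text.
assert abs(cosine_similarity(cheap_housing, affordable_homes) - 1.0) < 1e-6
print(cosine_similarity(cheap_housing, sports_scores))  # low: unrelated topics
```

The second vector is just the first scaled by two, so Euclidean distance says they differ while cosine correctly says they point the same way.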
Concept Card Preview
Visuals, diagrams, and micro-interactions you'll see in this module.

The Semantic Gap
Nina's users keep complaining: "I searched for 'affordable places to live' and got nothing." She checks the database — t…

What Are Embeddings?
So how do you represent meaning as numbers? An embedding is a dense vector of floating-point numbers — typically 768…
Measuring Similarity
Embeddings live in high-dimensional space. To compare them, you need a distance metric — and the right one isn't the obv…
With millions of embeddings, brute-force similarity search becomes impractical. Explore the algorithms (HNSW, IVF) and databases (FAISS, Pinecone, Weaviate, Milvus) that make semantic search fast at scale — and the metadata filtering and hybrid search capabilities that make it useful in real applications.
Learning Goals
- Explain why exact nearest neighbor search fails at scale and what Approximate Nearest Neighbor (ANN) algorithms trade for speed.
- Describe how HNSW and IVF indexes work and when to use each.
- Compare major vector databases (FAISS, Pinecone, Weaviate, Milvus, Chroma) on operational tradeoffs.
- Understand metadata filtering and hybrid search (dense + sparse) as essential production features.
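The IVF idea from the goals above fits in a few lines of numpy. This is a toy sketch, not FAISS: the "centroids" are randomly sampled corpus vectors standing in for real k-means clusters, but the mechanics — partition into inverted lists, probe only a few lists per query — are the same:

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.normal(size=(10_000, 64)).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)  # unit vectors: dot = cosine

# Build an IVF-style index: assign every vector to its best-matching centroid.
k = 32
centroids = corpus[rng.choice(len(corpus), k, replace=False)]  # crude k-means stand-in
assignments = np.argmax(corpus @ centroids.T, axis=1)
inverted_lists = {c: np.where(assignments == c)[0] for c in range(k)}

def ivf_search(query: np.ndarray, nprobe: int = 4, top_k: int = 5) -> np.ndarray:
    """Scan only the `nprobe` lists whose centroids best match the query,
    trading a little recall for a ~k/nprobe reduction in work."""
    probe = np.argsort(query @ centroids.T)[-nprobe:]
    candidates = np.concatenate([inverted_lists[c] for c in probe])
    scores = corpus[candidates] @ query
    return candidates[np.argsort(scores)[-top_k:][::-1]]

query = corpus[123]          # a known corpus vector: its own list gets probed
results = ivf_search(query)
print(results)
```

With nprobe=4 of 32 lists, each query scans roughly an eighth of the corpus; recall drops only for true neighbors that landed in unprobed lists, which is exactly the speed-for-recall trade the goals describe.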
Concept Card Preview
Visuals, diagrams, and micro-interactions you'll see in this module.
Why Exact Search Doesn't Scale
Nina's embedding prototype searches 1,000 documents in 2 milliseconds. She's thrilled — until the production dataset arr…

HNSW — Navigating a Graph of Neighbors
Hierarchical Navigable Small World (HNSW) is the most popular ANN algorithm in production vector databases. Think of…
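The core move inside HNSW is a greedy walk over a proximity graph. This sketch builds a single flat layer (real HNSW constructs the graph incrementally and adds sparser upper layers for long-range hops); the graph-construction shortcut here is purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
vectors = rng.normal(size=(500, 32))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

# Crude proximity graph: link each node to its M nearest neighbors.
M = 8
sims = vectors @ vectors.T
neighbors = np.argsort(sims, axis=1)[:, -M - 1:-1]  # M nearest, excluding self

def greedy_search(query: np.ndarray, entry: int = 0):
    """Hop to whichever neighbor is closest to the query; stop at a local
    optimum. HNSW runs this walk layer by layer, coarse to fine."""
    current = entry
    best = vectors[current] @ query
    while True:
        improved = False
        for n in neighbors[current]:
            score = vectors[n] @ query
            if score > best:
                best, current, improved = score, int(n), True
        if not improved:
            return current, best

query = vectors[42]
found, score = greedy_search(query)
print(found, round(float(score), 3))
```

Greedy search can stall in a local optimum on a single layer; the hierarchy of layers is precisely HNSW's fix, letting the walk take long hops near the entry point before refining locally.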
The Vector Database Landscape
Raw ANN libraries like FAISS give you speed but not persistence, sharding, or filtering. Vector databases handle the ful…
Not all embedding models are equal. Learn how Sentence-BERT bi-encoders enable fast semantic search, how commercial APIs (OpenAI, Cohere) compare to open-source alternatives, what Matryoshka embeddings unlock for two-stage retrieval, and when domain-specific fine-tuning is worth the effort.
Learning Goals
- Explain how SBERT bi-encoder architecture enables pre-computed corpus embeddings for fast semantic search.
- Compare commercial embedding APIs (OpenAI, Cohere) vs. open-source SBERT on cost, privacy, and customizability.
- Describe Matryoshka Representation Learning and how truncated embeddings enable two-stage retrieval.
- Identify when domain-specific fine-tuning is necessary and what training data strategies support it.
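The bi-encoder advantage in the first goal is architectural, so a toy can show it. Here a hash-based bag-of-words function stands in for an SBERT encoder (the vectors are meaningless compared to real embeddings); what matters is the shape of the computation — the corpus is encoded once, and each query then costs one encoder call plus one matrix multiply:

```python
import zlib
import numpy as np

DIM = 256

def embed(text: str) -> np.ndarray:
    """Stand-in for an SBERT encoder: hash words into a fixed-size vector.
    A real bi-encoder runs the transformer once per text, never per pair."""
    v = np.zeros(DIM)
    for word in text.lower().split():
        v[zlib.crc32(word.encode()) % DIM] += 1.0
    return v / (np.linalg.norm(v) or 1.0)

corpus = [
    "affordable apartments for rent downtown",
    "cheap housing near the city center",
    "championship game final score",
]
# Embed the corpus ONCE at index time...
corpus_matrix = np.stack([embed(d) for d in corpus])

# ...then each query is one encode plus one matrix multiply, instead of a
# full cross-encoder forward pass for every (query, document) pair.
query_vec = embed("affordable apartments downtown")
scores = corpus_matrix @ query_vec
print(corpus[int(np.argmax(scores))])
```

Swapping in a real model means replacing `embed` with a single `model.encode(...)` call; the precompute-then-multiply structure is unchanged.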
Concept Card Preview
Visuals, diagrams, and micro-interactions you'll see in this module.

Sentence-BERT and Bi-Encoders
Nina tries to use BERT for semantic search. To compare a query against 10,000 documents, BERT needs to process the query…
Commercial vs. Open-Source Embeddings
SBERT showed that sentence embeddings could be practical. Then commercial APIs made them trivial.
OpenAI text-embeddi…
Matryoshka Embeddings
Traditional embeddings are fixed-size. A 1536-dim model always produces 1536-dim vectors, whether you need that precisio…
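The two-stage retrieval pattern Matryoshka enables can be sketched directly. The random vectors below stand in for real Matryoshka embeddings (which are trained so a prefix of the vector retains most of the information); the coarse-then-rerank mechanics are the same either way:

```python
import numpy as np

rng = np.random.default_rng(2)
FULL_DIM, COARSE_DIM = 512, 64

corpus = rng.normal(size=(20_000, FULL_DIM)).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

def truncate(vecs: np.ndarray, dim: int) -> np.ndarray:
    """Take a prefix of each vector and renormalize it. Matryoshka models
    front-load information so this prefix is itself a usable embedding."""
    v = vecs[..., :dim]
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

query = corpus[7]

# Stage 1: cheap coarse pass over 64-dim prefixes to collect candidates.
coarse_scores = truncate(corpus, COARSE_DIM) @ truncate(query, COARSE_DIM)
candidates = np.argsort(coarse_scores)[-200:]

# Stage 2: exact rerank of the shortlist with the full 512-dim vectors.
rerank_scores = corpus[candidates] @ query
top = candidates[np.argsort(rerank_scores)[-5:][::-1]]
print(top)
```

Stage 1 touches only an eighth of each vector across the whole corpus; stage 2 pays full-dimension cost on just 200 candidates, which is the point of the two-stage design.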
Building a working semantic search system requires decisions about chunking strategy, chunk size, metadata design, and operational pipeline consistency. Learn how these choices interact and what a production embedding pipeline looks like end-to-end.
Learning Goals
- Compare chunking strategies (fixed-size, sentence-window, recursive, semantic) and match each to appropriate use cases.
- Explain the precision-context tradeoff in chunk size selection and the parent-child pattern that resolves it.
- Design an end-to-end embedding pipeline: load → clean → chunk → embed → upsert, with metadata for provenance.
- Understand why preprocessing consistency between ingestion and query time is critical for retrieval quality.
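Fixed-size chunking with overlap, the baseline strategy in the goals above, is short enough to show whole. This sketch splits on words for simplicity (production systems usually count tokens with the embedding model's tokenizer):

```python
def chunk_fixed(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Fixed-size chunks over words, with overlap so a sentence cut at one
    boundary still appears whole inside the neighboring chunk."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

doc = " ".join(f"word{i}" for i in range(500))
chunks = chunk_fixed(doc)
print(len(chunks), len(chunks[0].split()))
```

The last 40 words of each chunk reappear as the first 40 of the next, which is the overlap guarantee; the other strategies in the goals (sentence-window, recursive, semantic) replace the fixed `step` with structure-aware boundaries.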
Concept Card Preview
Visuals, diagrams, and micro-interactions you'll see in this module.
Chunking — The First Decision
Nina has her embedding model picked and her vector database ready. But before she can embed anything, she faces a questi…
The Precision-Context Tradeoff
Chunk size controls a fundamental tension: precision versus context.
Small chunks (64-256 tokens) are semantically…
The Embedding Pipeline
A semantic search system isn't a vector database — it's a pipeline. Nina learns this when her first prototype works but…
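The load → clean → chunk → embed → upsert pipeline from the goals fits in one small sketch. The in-memory list stands in for a vector database, and the hash-based `embed` for a real model; the names and the metadata fields are illustrative:

```python
import re
import zlib
import numpy as np

def clean(text: str) -> str:
    """Normalization applied at BOTH ingestion and query time; if the two
    paths diverge, corpus and query vectors drift apart and recall drops."""
    return re.sub(r"\s+", " ", text).strip().lower()

def chunk(text: str, size: int = 50) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str, dim: int = 128) -> np.ndarray:
    v = np.zeros(dim)                      # stand-in for a real embedding model
    for w in text.split():
        v[zlib.crc32(w.encode()) % dim] += 1.0
    return v / (np.linalg.norm(v) or 1.0)

store: list[dict] = []                     # stand-in for a vector DB

def upsert(doc_id: str, text: str) -> None:
    cleaned = clean(text)                  # load -> clean -> chunk -> embed -> upsert
    for i, ch in enumerate(chunk(cleaned)):
        store.append({
            "id": f"{doc_id}#chunk{i}",    # provenance: which doc, which chunk
            "vector": embed(ch),
            "text": ch,
            "source": doc_id,
        })

upsert("contract-001", "This  AGREEMENT is made between the parties ...")
print(store[0]["id"], store[0]["vector"].shape)
```

Queries must pass through the same `clean` and `embed` functions before searching `store` — that shared path is the consistency requirement the last learning goal calls critical.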
Put everything together. Nina builds semantic search over 2 million legal contracts — choosing an embedding model, chunking strategy, vector database, and search architecture under real constraints.
Learning Goals
- Apply embedding model selection to a real constraint set (VPC compliance, legal vocabulary).
- Choose a chunking strategy that matches document structure (clause-boundary splitting).
- Diagnose production failures (exact ID search, stale data) and select targeted fixes.
- Design a complete semantic search architecture from constraints to deployment.
Concept Card Preview
Visuals, diagrams, and micro-interactions you'll see in this module.
The Brief
Nina's phone buzzes during standup. It's the CTO: "We just signed Hargrove & Associates — 400 attorneys, 2 million contr…
The Constraints
Nina opens a notebook and maps the problem. The corpus is large — 2 million contracts at ~10,000 tokens each means rough…
The Chunking Challenge
Nina deploys the fine-tuned SBERT model. Embeddings look good on test queries. Now she faces her second decision: how to…