
Fine-Tuning & Alignment

Go beyond prompting — adapt generative models to your domain using SFT, QLoRA, RLHF, and DPO.

Why take this course?

When prompting is not enough, fine-tuning lets you shape model behavior at the weight level. This course covers the full pipeline: supervised fine-tuning, parameter-efficient methods (LoRA/QLoRA), evaluation strategies, and preference alignment with RLHF and DPO.

Prerequisites

This course builds on concepts from the following courses; completing them first is recommended:

Course Modules

Module 1: When to Fine-Tune

Fine-tuning is powerful but expensive. Learn the decision framework: when prompting is enough, when RAG is better, and when fine-tuning is the right tool. Covers cost-benefit analysis and the "prompting → RAG → fine-tuning" escalation ladder.

Learning Goals

  • Apply a decision framework to choose between prompting, RAG, and fine-tuning for a given task.
  • Estimate the cost and effort of fine-tuning vs. alternative approaches.
  • Identify tasks where fine-tuning provides irreplaceable value (style, domain language, latency).
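The escalation ladder above can be sketched as a small decision function. This is an illustrative outline only; the question names and the ordering are assumptions standing in for a fuller cost-benefit analysis.

```python
def choose_adaptation(prompt_baseline_good_enough: bool,
                      needs_private_knowledge: bool,
                      needs_style_or_domain_language: bool) -> str:
    """Sketch of the prompting -> RAG -> fine-tuning escalation ladder.

    Inputs are hypothetical yes/no answers from an initial evaluation;
    a real decision would also weigh cost, latency, and data availability.
    """
    if prompt_baseline_good_enough:
        return "prompting"      # cheapest: iterate on instructions and examples
    if needs_private_knowledge:
        return "RAG"            # retrieve facts at inference time
    if needs_style_or_domain_language:
        return "fine-tuning"    # weight-level change: style, domain language, latency
    return "prompting"          # default to the cheapest rung and re-evaluate
```

The point of the ordering is that each rung is cheaper to try and to undo than the one below it, so you only descend when the current rung demonstrably fails.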
Module 2: Supervised Fine-Tuning (SFT)

The most common fine-tuning approach: teach a model new behaviors by training on curated input-output pairs. Covers dataset preparation, training pipelines, hyperparameter selection, and evaluation.

Learning Goals

  • Prepare training datasets with quality filtering, deduplication, and format requirements.
  • Configure SFT training runs with appropriate learning rates, batch sizes, and epoch counts.
  • Evaluate fine-tuned models against held-out test sets and production baselines.
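The dataset-preparation steps above (quality filtering, deduplication, format requirements) can be sketched in plain Python. The field names and length thresholds here are illustrative assumptions, not a standard format.

```python
def prepare_sft_dataset(examples):
    """Minimal SFT data-prep sketch: filter, deduplicate, normalize format.

    `examples` is a list of dicts with hypothetical "prompt" and
    "completion" keys; real pipelines would add near-duplicate detection,
    PII scrubbing, and tokenizer-aware length limits.
    """
    seen_prompts = set()
    cleaned = []
    for ex in examples:
        prompt = ex.get("prompt", "").strip()
        completion = ex.get("completion", "").strip()
        # Quality filter: drop empty or very short pairs (thresholds are arbitrary).
        if len(prompt) < 10 or len(completion) < 5:
            continue
        # Exact-match deduplication on normalized prompt text.
        key = prompt.lower()
        if key in seen_prompts:
            continue
        seen_prompts.add(key)
        # Format requirement: one uniform record per example.
        cleaned.append({"prompt": prompt, "completion": completion})
    return cleaned
```

A held-out split for evaluation should be carved out *after* deduplication, so that near-identical pairs cannot leak across the train/test boundary.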
Module 3: Parameter-Efficient Methods (LoRA & QLoRA)

Full fine-tuning requires massive compute. LoRA and QLoRA let you adapt models by training a tiny fraction of parameters — making fine-tuning accessible on consumer hardware.

Learning Goals

  • Explain how LoRA reduces trainable parameters via low-rank decomposition.
  • Describe QLoRA and how quantization enables fine-tuning on consumer GPUs.
  • Choose appropriate rank, alpha, and target modules for LoRA configurations.
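The low-rank decomposition behind LoRA can be made concrete with a parameter count. For one weight matrix, LoRA freezes the original weights and trains an update factored as B @ A, where B is d_out x r and A is r x d_in; the example dimensions below are hypothetical.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int):
    """Trainable parameters for one weight matrix: full fine-tuning vs. LoRA.

    Full fine-tuning updates all d_out * d_in entries; LoRA trains only
    the factors B (d_out x r) and A (r x d_in).
    """
    full = d_out * d_in
    lora = rank * (d_in + d_out)
    return full, lora

# A 4096x4096 projection with rank r=8: 16,777,216 full parameters vs.
# 65,536 LoRA parameters, a 256x reduction for this matrix.
full, lora = lora_param_counts(4096, 4096, 8)
```

QLoRA pushes this further by storing the frozen base weights in 4-bit precision, so the memory cost is dominated by the quantized base model rather than the small trainable factors.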
Module 4: Preference Alignment (RLHF & DPO)

SFT teaches what to say; alignment teaches what not to say. Learn how RLHF and DPO shape model behavior toward human preferences, safety, and helpfulness — and the alignment tax you pay.

Learning Goals

  • Explain the RLHF pipeline: reward model training, PPO optimization, and the alignment tax.
  • Describe DPO as a simpler alternative to RLHF that eliminates the reward model.
  • Evaluate alignment quality and detect when alignment degrades task performance.
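The core of DPO is a single closed-form loss over preference pairs, with no separate reward model. The sketch below computes it for one pair from scalar summed log-probabilities; the variable names are illustrative, and real implementations work on batched per-token log-probs.

```python
import math

def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    pi_* are log-probabilities under the policy being trained, ref_*
    under the frozen reference model; beta scales how strongly the
    policy is pushed away from the reference.

    loss = -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))
    """
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy favors the chosen response more than the reference does, the margin grows and the loss falls; a policy identical to the reference sits at the neutral point, loss = log 2. Monitoring how far training moves beyond that point is one way to watch for the alignment tax degrading task performance.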