
AI Security & Guardrails

Protect AI systems from adversarial attacks, implement input/output guardrails, and scope permissions for tool-using agents.

Why take this course?

AI systems face unique security challenges — from prompt injection to tool misuse. This course teaches you to build defense in depth: adversarial resilience, content guardrails, and least-privilege tool access.

Prerequisites

This course builds on concepts from earlier courses; completing them first is recommended.

Course Modules

Module 1: Adversarial Security (Defense in Depth)

The LLM OS has its own class of vulnerabilities. Explore jailbreaking, prompt injection, and data poisoning. Understand why alignment is a statistical tendency, not a wall — and build defense in depth around your model.

Learning Goals

  • Classify the three attack vectors: jailbreaking, prompt injection, and data poisoning.
  • Explain why alignment is a statistical tendency that attackers can circumvent.
  • Design defense-in-depth architectures with input sanitization, permission scoping, and output validation.
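The defense-in-depth idea above can be sketched as independent layers, each able to reject a request on its own, so bypassing one layer is not enough. A minimal illustration (all names and patterns here are hypothetical, not from the course materials):

```python
import re

# Layered checks: input sanitization, permission scoping, output validation.
# Each raises on failure, so a caller composes them in sequence.

INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"disregard your system prompt",
]

def sanitize_input(prompt: str) -> str:
    """Layer 1: reject prompts matching known injection phrasings."""
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, prompt, re.IGNORECASE):
            raise ValueError("possible prompt injection detected")
    return prompt

def check_permissions(tool_name: str, allowed_tools: set[str]) -> None:
    """Layer 2: only explicitly allow-listed tools may run."""
    if tool_name not in allowed_tools:
        raise PermissionError(f"tool {tool_name!r} not permitted")

def validate_output(response: str, max_len: int = 2000) -> str:
    """Layer 3: bound the model's response before it reaches users."""
    if len(response) > max_len:
        raise ValueError("response exceeds output limit")
    return response
```

Because alignment is only a statistical tendency, pattern lists like this will never catch everything; the point is that the permission and output layers still hold when the input layer is fooled.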
Module 2: Input Validation & Output Guardrails

Production AI systems need sanitization on both ends. Learn patterns for input filtering, content moderation, PII detection, and output validation that prevent harmful or incorrect responses from reaching users.

Learning Goals

  • Design input sanitization pipelines that detect and neutralize injection attempts.
  • Implement output guardrails: content filtering, PII redaction, and format validation.
  • Choose between rule-based, classifier-based, and LLM-based moderation approaches.
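As a taste of the rule-based end of that spectrum, here is a sketch of PII redaction on the output side. The pattern set is illustrative only; production systems typically combine rules like these with classifier- or LLM-based moderation:

```python
import re

# Rule-based output guardrail: redact common PII patterns with typed
# placeholders before a response is shown to the user.

PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Replace each PII match with its label, e.g. [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Rule-based filters like this are cheap and auditable but brittle; the trade-off against classifier-based approaches is one of the choices this module examines.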
Module 3: Permission Scoping & Least Privilege

When AI agents have tools, every tool is an attack surface. Learn to scope permissions, sandbox execution, implement approval flows, and design trust boundaries that limit blast radius.

Learning Goals

  • Apply least-privilege principles to AI agent tool access and data permissions.
  • Design trust boundaries and sandbox patterns for tool-using agents.
  • Implement human-in-the-loop approval flows for high-risk actions.
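One way to combine those three goals is a tool registry that denies unregistered tools by default and gates high-risk tools behind an approval callback. A minimal sketch, with all names my own:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Tool:
    name: str
    fn: Callable[..., str]
    high_risk: bool = False  # high-risk tools need human approval

@dataclass
class ToolRegistry:
    tools: dict[str, Tool] = field(default_factory=dict)

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def call(self, name: str, approver: Callable[[str], bool], *args) -> str:
        tool = self.tools.get(name)
        if tool is None:
            # Deny by default: anything not registered is outside the trust boundary.
            raise PermissionError(f"tool {name!r} is not registered")
        if tool.high_risk and not approver(name):
            raise PermissionError(f"approval denied for {name!r}")
        return tool.fn(*args)
```

The `approver` callback is where a human-in-the-loop flow plugs in; in a real agent it might prompt an operator, while destructive tools would additionally run in a sandbox to limit blast radius if approval logic is ever bypassed.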
AI Security & Guardrails | Synapse