
AI Security & Guardrails

Protect AI systems from adversarial attacks, implement input/output guardrails, and scope permissions for tool-using agents.

Why take this course?

AI systems face unique security challenges — from prompt injection to tool misuse. This course teaches you to build defense in depth: adversarial resilience, content guardrails, and least-privilege tool access.

Prerequisites

This course builds on concepts from earlier courses; completing them first is recommended.

Course Modules

Module 1: Adversarial Security (Defense in Depth)

The LLM OS has its own class of vulnerabilities. Explore jailbreaking, prompt injection, and data poisoning. Understand why alignment is a statistical tendency, not a wall — and build defense in depth around your model.

Learning Goals

  • Classify the three attack vectors: jailbreaking, prompt injection, and data poisoning.
  • Explain why alignment is a statistical tendency that attackers can circumvent.
  • Design defense-in-depth architectures with input sanitization, permission scoping, and output validation.
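The defense-in-depth idea above can be sketched as independent layers, each able to reject a request on its own, so bypassing one layer is not enough. A minimal illustration (all names and patterns here are hypothetical, not from the course materials):

```python
import re

# Layered checks: input sanitization, permission scoping, output validation.
# Each raises on failure, so a caller composes them in sequence.

INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"disregard your system prompt",
]

def sanitize_input(prompt: str) -> str:
    """Layer 1: reject prompts matching known injection phrasings."""
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, prompt, re.IGNORECASE):
            raise ValueError("possible prompt injection detected")
    return prompt

def check_permissions(tool_name: str, allowed_tools: set[str]) -> None:
    """Layer 2: only explicitly allow-listed tools may run."""
    if tool_name not in allowed_tools:
        raise PermissionError(f"tool {tool_name!r} not permitted")

def validate_output(response: str, max_len: int = 2000) -> str:
    """Layer 3: bound the model's response before it reaches users."""
    if len(response) > max_len:
        raise ValueError("response exceeds output limit")
    return response
```

Because alignment is only a statistical tendency, pattern lists like this will never catch everything; the point is that the permission and output layers still hold when the input layer is fooled.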
Module 2: Input Validation & Output Guardrails

Production AI systems need sanitization on both ends. Learn patterns for input filtering, content moderation, PII detection, and output validation that prevent harmful or incorrect responses from reaching users.

Learning Goals

  • Design input sanitization pipelines that detect and neutralize injection attempts.
  • Implement output guardrails: content filtering, PII redaction, and format validation.
  • Choose between rule-based, classifier-based, and LLM-based moderation approaches.
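As a taste of the rule-based end of that spectrum, here is a sketch of PII redaction on the output side. The pattern set is illustrative only; production systems typically combine rules like these with classifier- or LLM-based moderation:

```python
import re

# Rule-based output guardrail: redact common PII patterns with typed
# placeholders before a response is shown to the user.

PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Replace each PII match with its label, e.g. [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Rule-based filters like this are cheap and auditable but brittle; the trade-off against classifier-based approaches is one of the choices this module examines.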
Module 3: Permission Scoping & Least Privilege

When AI agents have tools, every tool is an attack surface. Learn to scope permissions, sandbox execution, implement approval flows, and design trust boundaries that limit blast radius.

Learning Goals

  • Apply least-privilege principles to AI agent tool access and data permissions.
  • Design trust boundaries and sandbox patterns for tool-using agents.
  • Implement human-in-the-loop approval flows for high-risk actions.
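One way to combine those three goals is a tool registry that denies unregistered tools by default and gates high-risk tools behind an approval callback. A minimal sketch, with all names my own:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Tool:
    name: str
    fn: Callable[..., str]
    high_risk: bool = False  # high-risk tools need human approval

@dataclass
class ToolRegistry:
    tools: dict[str, Tool] = field(default_factory=dict)

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def call(self, name: str, approver: Callable[[str], bool], *args) -> str:
        tool = self.tools.get(name)
        if tool is None:
            # Deny by default: anything not registered is outside the trust boundary.
            raise PermissionError(f"tool {name!r} is not registered")
        if tool.high_risk and not approver(name):
            raise PermissionError(f"approval denied for {name!r}")
        return tool.fn(*args)
```

The `approver` callback is where a human-in-the-loop flow plugs in; in a real agent it might prompt an operator, while destructive tools would additionally run in a sandbox to limit blast radius if approval logic is ever bypassed.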
AI Security & Guardrails | Synapse