Overview

Guardrails are security policies that control how AI assistants interact with tools and data. Unlike traditional access controls that simply grant or deny access, guardrails provide nuanced, context-aware rules that make AI systems safer and more predictable.

Why Guardrails Matter

When AI assistants use tools, they need different constraints than human users:

  • Content Limits: An AI shouldn’t read a 10 GB file that would overwhelm its context window
  • Semantic Filtering: Block requests that attempt to access sensitive data patterns
  • Behavioral Rules: Prevent actions that make sense for humans but not for AI
  • Dynamic Policies: Adjust permissions based on the conversation context

Types of Guardrails

Input Guardrails

Control what goes into the AI system:

  • Domain Filtering: Limit which websites can be accessed
  • File Type Restrictions: Block binary files or specific formats
  • Size Limits: Cap file sizes and directory traversal depth
  • Pattern Matching: Detect and block sensitive data patterns
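
As a concrete illustration, an input guardrail can be a plain predicate that runs before a tool call. The sketch below is hypothetical; the ToolInput and InputRules shapes are not from any particular SDK:

// Hypothetical input guardrail; the rule and input shapes are illustrative.
interface InputRules {
  allowedDomains: string[];     // Domain Filtering
  blockedExtensions: string[];  // File Type Restrictions
  maxFileSizeBytes: number;     // Size Limits
  blockedPatterns: RegExp[];    // Pattern Matching
}

interface ToolInput {
  url?: string;
  path?: string;
  sizeBytes?: number;
}

// Returns a human-readable violation, or null if the input is allowed.
function checkInput(input: ToolInput, rules: InputRules): string | null {
  if (input.url !== undefined) {
    const host = new URL(input.url).hostname;
    const allowed = rules.allowedDomains.some(
      (d) => host === d || host.endsWith("." + d),
    );
    if (!allowed) return `domain not allowed: ${host}`;
  }
  if (input.path !== undefined) {
    const path = input.path;
    const ext = path.slice(path.lastIndexOf("."));
    if (rules.blockedExtensions.includes(ext)) {
      return `file type blocked: ${ext}`;
    }
    if (rules.blockedPatterns.some((p) => p.test(path))) {
      return "path matches a blocked pattern";
    }
  }
  if (input.sizeBytes !== undefined && input.sizeBytes > rules.maxFileSizeBytes) {
    return `file too large: ${input.sizeBytes} bytes`;
  }
  return null;
}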

Output Guardrails

Control what comes out of tools:

  • Data Sanitization: Remove sensitive information from responses
  • Format Enforcement: Ensure outputs match expected schemas
  • Content Filtering: Block inappropriate or harmful content
  • Response Limits: Prevent overwhelming the AI with data
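
For instance, a simple output guardrail might redact known secret formats and cap the response size before anything reaches the model. The patterns and limits below are illustrative, not exhaustive:

// Hypothetical output guardrail: redact secret-looking text, cap length.
const SECRET_PATTERNS: RegExp[] = [
  /AKIA[0-9A-Z]{16}/g,                    // AWS access key IDs
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/g,  // PEM private key headers
];

function sanitizeOutput(text: string, maxChars: number): string {
  // Data sanitization: replace matches with a redaction marker.
  let clean = text;
  for (const pattern of SECRET_PATTERNS) {
    clean = clean.replace(pattern, "[REDACTED]");
  }
  // Response limits: truncate instead of flooding the model's context.
  if (clean.length > maxChars) {
    clean = clean.slice(0, maxChars) + "\n[output truncated]";
  }
  return clean;
}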

Behavioral Guardrails

Control how the AI uses tools:

  • Rate Limiting: Prevent excessive API calls
  • Sequence Controls: Enforce proper tool usage order
  • State Validation: Ensure operations happen in valid states
  • Audit Requirements: Force logging of certain operations
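
Rate limiting is the easiest of these to sketch. The sliding-window limiter below is a minimal illustration, not a production implementation:

// Hypothetical sliding-window rate limiter, keyed by tool name.
class RateLimiter {
  private calls = new Map<string, number[]>();

  constructor(
    private limit: number,     // max calls per window
    private windowMs: number,  // window length in milliseconds
  ) {}

  allow(tool: string): boolean {
    const now = Date.now();
    // Keep only the timestamps that still fall inside the window.
    const recent = (this.calls.get(tool) ?? []).filter(
      (t) => now - t < this.windowMs,
    );
    if (recent.length >= this.limit) return false;
    recent.push(now);
    this.calls.set(tool, recent);
    return true;
  }
}

// Usage: allow at most 10 web_search calls per minute.
const limiter = new RateLimiter(10, 60_000);
if (!limiter.allow("web_search")) {
  throw new Error("rate limit exceeded for web_search");
}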

Implementation Approaches

1. Proxy-Based

Insert guardrails between the AI and tools:

AI → Guardrail Proxy → Tool

Benefits: No tool modification needed, centralized control
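
In code, the proxy can be as simple as a wrapper around the tool function. This sketch assumes tools are async functions from a JSON-like argument object to a string; the names are illustrative:

// Hypothetical guardrail proxy: pre-check inputs, post-process outputs.
type Tool = (args: Record<string, unknown>) => Promise<string>;

function withGuardrails(
  tool: Tool,
  pre: (args: Record<string, unknown>) => string | null,  // input guardrail
  post: (output: string) => string,                       // output guardrail
): Tool {
  return async (args) => {
    const violation = pre(args);
    if (violation !== null) {
      throw new Error(`guardrail blocked call: ${violation}`);
    }
    const output = await tool(args);  // forward to the real tool
    return post(output);
  };
}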

2. SDK-Based

Build guardrails into tool implementations:

class FileTools {
  @guardrail({ maxFileSize: '10MB' })
  async readFile(path: string) { ... }
}

Benefits: Fine-grained control, better performance
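
One way such a guardrail decorator might be implemented, using TypeScript 5 standard decorators and assuming the decorated method takes the file path as its first argument (parseSize is an illustrative helper):

import { stat } from "node:fs/promises";

// Hypothetical decorator factory: enforce a size cap before the method runs.
function guardrail(opts: { maxFileSize: string }) {
  const maxBytes = parseSize(opts.maxFileSize);
  return function <This, Args extends [string, ...unknown[]], R>(
    method: (this: This, ...args: Args) => Promise<R>,
    _ctx: ClassMethodDecoratorContext<This>,
  ) {
    return async function (this: This, ...args: Args): Promise<R> {
      const info = await stat(args[0]);  // assumes args[0] is the file path
      if (info.size > maxBytes) {
        throw new Error(`guardrail: ${args[0]} exceeds ${opts.maxFileSize}`);
      }
      return method.apply(this, args);   // size OK: run the real method
    };
  };
}

// Illustrative helper: convert a size string like '10MB' to bytes.
function parseSize(s: string): number {
  const units: Record<string, number> = { KB: 1024, MB: 1024 ** 2, GB: 1024 ** 3 };
  const match = /^(\d+)(KB|MB|GB)$/.exec(s);
  if (!match) throw new Error(`bad size: ${s}`);
  return Number(match[1]) * units[match[2]];
}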

3. Policy Engine

Define rules in a central policy system:

policies:
  - resource: "filesystem"
    rules:
      - deny: { path_prefix: "/etc" }
      - allow: { file_size: { max: "10MB" } }

Benefits: Declarative, auditable, version-controlled
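
A policy engine ultimately compiles rules like these into an evaluation function. The sketch below uses first-match-wins, fail-closed semantics over a simplified rule shape that mirrors the YAML above:

// Hypothetical policy evaluation: first matching rule wins, default deny.
interface Rule {
  effect: "allow" | "deny";
  pathPrefix?: string;   // matches when the path starts with this prefix
  maxFileSize?: number;  // matches when the file is at most this many bytes
}

interface AccessRequest {
  path: string;
  fileSize: number;
}

function evaluate(rules: Rule[], req: AccessRequest): "allow" | "deny" {
  for (const rule of rules) {
    const prefixOk =
      rule.pathPrefix === undefined || req.path.startsWith(rule.pathPrefix);
    const sizeOk =
      rule.maxFileSize === undefined || req.fileSize <= rule.maxFileSize;
    if (prefixOk && sizeOk) return rule.effect;  // first match wins
  }
  return "deny";  // nothing matched: fail closed
}

// Mirrors the YAML policy: deny /etc, then allow files up to 10 MB.
const rules: Rule[] = [
  { effect: "deny", pathPrefix: "/etc" },
  { effect: "allow", maxFileSize: 10 * 1024 * 1024 },
];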

Best Practices

  1. Start Restrictive: Begin with tight controls and loosen as needed
  2. Layer Defense: Combine multiple guardrail types for robust protection
  3. Monitor and Adapt: Track guardrail triggers to refine policies
  4. Document Policies: Make it clear why each guardrail exists
  5. Test Thoroughly: Verify guardrails work without breaking functionality

In Civic Labs

Our Guardrail Proxy implements these concepts for MCP servers, allowing you to:

  • Add LLM-specific safety rules to any MCP tool
  • Chain multiple guardrails for defense in depth
  • Customize policies without modifying tools
  • Monitor and audit all AI-tool interactions

Next Steps