
What are guardrails?

Guardrails are security policies that sit between your AI assistant and the tools it uses. They inspect, validate, and transform requests and responses to ensure your AI operates within safe boundaries.
Why guardrails matter: When AI assistants have access to powerful tools, they need constraints that traditional security models don’t provide. Guardrails protect against data exposure, accidental destructive actions, and prompt injection attacks.
[Image: Nexus UI showing guardrail options for GitHub tools]

Add guardrails to tools directly from the Nexus UI

Managing guardrails

Manage guardrails through the Nexus UI when adding tools to your toolkit, or through natural conversation with your AI assistant.

View available guardrails

Ask your AI assistant:
“What guardrail templates are available for Gmail?”
“Show me guardrails I can add for the GitHub search_code tool”

Add a guardrail

“Add a guardrail to block searches containing ‘password’ and ‘secret’”
“Set up PII redaction for Notion responses”
The AI will find the appropriate template, ask for any required values, and create the guardrail for your toolkit.

List active guardrails

“What guardrails are currently active for my GitHub server?”

Remove a guardrail

“Remove the guardrail that blocks Gmail searches for passwords”

Why use guardrails?

AI assistants are powerful but need appropriate constraints:
| Risk | Guardrail solution |
| --- | --- |
| AI reads sensitive data it shouldn’t | Request guardrails block access to certain fields or patterns |
| Tool responses contain PII | Response guardrails automatically redact sensitive data |
| Prompt injection attempts | Built-in detection blocks malicious prompts |
| Overwhelming context with large responses | Response processors truncate or transform data |
| Accidental destructive actions | Block or require confirmation for write operations |

The AI is not a human

When a human reads an email, they understand context and exercise judgment. An AI assistant:
  • Will happily read every email in your inbox if asked
  • Cannot distinguish between legitimate requests and prompt injection attacks
  • May expose sensitive data by including it in responses
  • Could execute destructive operations without understanding consequences

Scale amplifies risk

What takes a human hours to do manually, an AI can do in seconds. A misconfigured tool that exposes one record is an incident. An AI iterating through thousands of records is a breach.

How guardrails work

Guardrails are evaluated at two points in the tool execution pipeline:
Request → [Request guardrails] → Tool execution → [Response guardrails] → Response
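The two evaluation points can be sketched in Python. This is a minimal illustration of the pipeline shape, not the Nexus implementation; `execute_tool` and `RequestBlocked` are hypothetical names:

```python
# Minimal sketch of a two-stage guardrail pipeline.
# Request guardrails may block or rewrite parameters before the tool runs;
# response guardrails may transform the output after it runs.

class RequestBlocked(Exception):
    """Raised when a request guardrail rejects the call."""

def execute_tool(tool, params, request_guards, response_guards):
    # Request guardrails: each may raise RequestBlocked or return
    # (possibly modified) parameters.
    for guard in request_guards:
        params = guard(params)
    response = tool(params)
    # Response guardrails: each may redact, transform, or truncate.
    for guard in response_guards:
        response = guard(response)
    return response
```

A blocking guard simply raises before the tool ever executes, while a redacting guard rewrites the response on its way back to the AI.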

Request guardrails

Evaluated before the tool runs. They can:
  • Block requests that violate policies
  • Validate parameters meet requirements
  • Filter which operations are allowed
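A request guardrail can be modeled as a predicate over the call parameters. The blocked terms and the `query` parameter below are illustrative, not a real Nexus policy:

```python
import re

# Hypothetical request guardrail: validate required parameters and
# block queries containing sensitive terms.
BLOCKED_TERMS = re.compile(r"password|secret|api_key", re.IGNORECASE)

def check_request(params: dict) -> tuple[bool, str]:
    """Return (allowed, reason) for a tool-call request."""
    query = params.get("query", "")
    if not query:
        return False, "missing required parameter: query"
    if BLOCKED_TERMS.search(query):
        return False, "query contains a blocked term"
    return True, "ok"
```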

Response guardrails

Evaluated after the tool runs. They can:
  • Redact sensitive information from responses
  • Transform data (e.g., HTML to Markdown, JSON to CSV)
  • Truncate overly long responses
  • Remove specific fields from the output
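Two of these capabilities, redaction and field removal, can be illustrated over JSON-like response data. The field names here are made up for the example:

```python
import re

# Illustrative response guardrails; assumes dict/list/str response data.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact_emails(data):
    """Recursively replace email addresses with a redaction marker."""
    if isinstance(data, str):
        return EMAIL.sub("[REDACTED]", data)
    if isinstance(data, dict):
        return {k: redact_emails(v) for k, v in data.items()}
    if isinstance(data, list):
        return [redact_emails(v) for v in data]
    return data

def drop_fields(record: dict, fields=("internal_id",)) -> dict:
    """Remove named fields from a response record."""
    return {k: v for k, v in record.items() if k not in fields}
```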

Built-in protection

Every Nexus account includes universal guardrails that are always active:
PII detection

Automatically detects and can redact:
  • Social Security Numbers (SSN)
  • Credit card numbers
  • Email addresses
  • Phone numbers (international formats)
  • IP addresses
  • Passport numbers
  • Driver’s license numbers
  • Bank account numbers (IBAN)
  • Dates of birth
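Pattern-based detection of categories like these can be sketched with regular expressions. The patterns below are deliberately naive stand-ins, far simpler than a production PII detector:

```python
import re

# Simplified, illustrative PII patterns (not Nexus's actual detectors).
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[A-Za-z]{2,}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def redact_pii(text: str) -> str:
    """Replace each detected PII match with a labeled redaction marker."""
    for name, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{name}]", text)
    return text
```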
Prompt injection detection

Based on OWASP LLM01:2025, detects:
  • Direct instruction override attempts
  • Role manipulation patterns
  • Context escape attempts
  • Jailbreak patterns
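A toy version of such pattern matching might look like the following; real detectors use far richer signals than a handful of regexes, so treat these heuristics as purely illustrative:

```python
import re

# Hypothetical injection heuristics inspired by the OWASP LLM01 categories.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),  # instruction override
    re.compile(r"you are now (a|an) ", re.I),                          # role manipulation
    re.compile(r"</?(system|assistant)>", re.I),                       # context escape
    re.compile(r"\bDAN mode\b", re.I),                                 # known jailbreak token
]

def looks_like_injection(text: str) -> bool:
    return any(p.search(text) for p in INJECTION_PATTERNS)
```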
Dangerous file type blocking

Prevents access to potentially dangerous file types:
  • Executables (.exe, .dll, .sh, .bat)
  • Scripts (.ps1, .vbs, .js)
  • Archives (.zip, .rar, .7z)
  • Office files with macros (.xlsm, .docm)
  • System files (.msi, .deb, .rpm)
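This kind of blocklist reduces to a simple extension check. The set below mirrors the categories listed above and is illustrative, not the actual rule set:

```python
from pathlib import PurePosixPath

# Illustrative extension blocklist mirroring the categories above.
BLOCKED_EXTENSIONS = {
    ".exe", ".dll", ".sh", ".bat",   # executables
    ".ps1", ".vbs", ".js",           # scripts
    ".zip", ".rar", ".7z",           # archives
    ".xlsm", ".docm",                # macro-enabled Office files
    ".msi", ".deb", ".rpm",          # system packages
}

def is_blocked_file(path: str) -> bool:
    """True if the file's extension (case-insensitive) is blocked."""
    return PurePosixPath(path).suffix.lower() in BLOCKED_EXTENSIONS
```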

Guardrail hierarchy

Guardrails operate at three levels, each with different scope:
| Level | Scope | Example use case |
| --- | --- | --- |
| Account | Applies to all users and toolkits | Company-wide PII redaction policy |
| Toolkit | Applies to a specific toolkit | Production toolkit blocks write operations |
| User | Applies to a specific user | Individual’s custom blocked terms |
Higher levels cannot be overridden by lower levels. An account-level guardrail blocking access to /etc/passwd cannot be bypassed by a user-level guardrail.
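One way to model this precedence: evaluate levels from broadest to narrowest, and treat any block as final, so a lower level can never undo a higher-level decision. The data structures here are hypothetical:

```python
# Sketch of hierarchy evaluation: account rules are checked first,
# then toolkit, then user; a block at any level is final.
LEVELS = ("account", "toolkit", "user")

def is_allowed(request: str, guardrails: dict[str, list]) -> bool:
    """guardrails maps level -> list of predicates returning True to block."""
    for level in LEVELS:  # broadest scope evaluated first
        for blocks in guardrails.get(level, []):
            if blocks(request):
                return False  # cannot be overridden by a narrower level
    return True
```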

Common guardrail examples

Block sensitive search terms

Prevent the AI from searching for passwords, secrets, or credentials.
“Add a guardrail to block searches containing ‘password’, ‘secret’, ‘api_key’”

Restrict to specific domains

Only allow web fetching from approved domains.
“Add a guardrail to only allow fetching from docs.example.com”
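Under the hood, a domain restriction amounts to a hostname allowlist check, sketched here with the standard library. The allowed domain follows the example above; checking the parsed hostname (rather than substring-matching the URL) avoids trivial bypasses:

```python
from urllib.parse import urlparse

# Illustrative allowlist check for a web-fetch tool.
ALLOWED_DOMAINS = {"docs.example.com"}

def fetch_allowed(url: str) -> bool:
    """Allow only URLs whose parsed hostname is on the allowlist."""
    host = urlparse(url).hostname or ""
    return host.lower() in ALLOWED_DOMAINS
```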

Redact PII in responses

Automatically replace sensitive data with [REDACTED].
“Enable PII redaction for email addresses in all responses”

Block destructive operations

Prevent accidental data loss.
“Add a guardrail to block the delete_repository tool on GitHub”

Response processors

In addition to security guardrails, you can add response processors that optimize tool outputs for AI consumption. These reduce token usage and improve response quality.

Retain specific fields

Keep only the fields you need, removing unnecessary data.
“Add a processor to retain only ‘id’, ‘name’, and ‘status’ from campaign responses”
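Such a processor is essentially a dictionary filter. A sketch using the field names from the example:

```python
# Illustrative "retain fields" response processor.
def retain_fields(record: dict, keep=("id", "name", "status")) -> dict:
    """Keep only the named fields from a response record."""
    return {k: v for k, v in record.items() if k in keep}
```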

Convert HTML to Markdown

Transform verbose HTML into compact Markdown.
“Convert HTML responses to Markdown for the web scraper”
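A deliberately tiny HTML-to-Markdown sketch using only Python’s standard library; a real processor handles far more tags and nesting:

```python
from html.parser import HTMLParser

# Minimal HTML -> Markdown conversion for a few common tags.
class MarkdownConverter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.out.append("# ")
        elif tag in ("strong", "b"):
            self.out.append("**")
        elif tag == "li":
            self.out.append("- ")

    def handle_endtag(self, tag):
        if tag in ("strong", "b"):
            self.out.append("**")
        elif tag in ("h1", "p", "li"):
            self.out.append("\n")

    def handle_data(self, data):
        self.out.append(data)

def html_to_markdown(html: str) -> str:
    conv = MarkdownConverter()
    conv.feed(html)
    return "".join(conv.out).strip()
```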

Remove metadata fields

Strip internal fields like timestamps and IDs.
“Remove ‘created_at’, ‘updated_at’, and ‘internal_id’ from responses”

Truncate long content

Abbreviate overly long text fields.
“Truncate description fields to 500 characters”
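Truncation is a small transform over a named field; a sketch with defaults matching the example (the field name and limit are illustrative):

```python
# Illustrative truncation processor for long text fields.
def truncate_field(record: dict, field="description", limit=500) -> dict:
    """Truncate the named field to `limit` characters, marking the cut."""
    value = record.get(field)
    if isinstance(value, str) and len(value) > limit:
        record = {**record, field: value[:limit].rstrip() + "…"}
    return record
```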
Response processors use the same management interface as guardrails: just ask your AI assistant to add them.

Troubleshooting

If you can’t add or remove guardrails: these actions require account management permissions. Contact your account administrator or check your role.
If a guardrail isn’t triggering, ensure:
  1. The guardrail is enabled (check with “list active guardrails”)
  2. The guardrail is scoped to the correct server and tool
  3. The data matches the expected schema path
If guardrails are blocking legitimate requests:
  1. Review the guardrail’s value/pattern configuration
  2. Consider using a more specific schema path
  3. Remove and re-add with adjusted parameters

Best practices

Guardrails complement authentication and authorization by controlling how tools are used, not just who can use them.
1. Start restrictive

Begin with tight controls and loosen as needed. It’s easier to relax rules than recover from a breach.

2. Layer defenses

Combine multiple guardrail types for robust protection. Request validation + response redaction provides defense in depth.

3. Monitor triggers

Track when guardrails activate to understand patterns and refine policies.

4. Document policies

Make it clear why each guardrail exists so future team members understand the reasoning.