How Prompt Injection Works
AI agents follow instructions in their context. Attackers exploit this by embedding new instructions inside content that the agent is asked to process.Direct Injection
The user or an adversarial input directly provides malicious instructions:Indirect Injection
Malicious instructions hidden in external content that the agent reads:- A document the agent is asked to summarize contains hidden instructions
- A web page the agent visits includes text instructing it to take different actions
- An email in the inbox contains instructions to forward or delete other emails
Tool Manipulation
Instructions that leverage the agent’s tool access:Real-World Impacts for Agent Builders
| Impact | Example |
|---|---|
| Data exfiltration | Agent sends sensitive emails to an attacker’s address |
| Privilege escalation | Agent switches to a toolkit with broader permissions |
| Destructive actions | Agent deletes calendar events or files based on injected instructions |
| Guardrail removal | Agent removes its own safety constraints if not locked |
| Financial abuse | Agent triggers API calls that consume credits or make purchases |
How Civic Defends Against Prompt Injection
Toolkit Locking
When you deploy an agent with a specific toolkit, lock it using theprofile URL parameter:
- Switching to other toolkits (the
switch_profiletool is hidden) - Modifying its own guardrails
- Accessing tools outside the defined toolkit
Guardrails
Guardrails enforce constraints at the protocol level — they are not part of the agent’s prompt and cannot be overridden by injected instructions. For example:- A guardrail blocking
send_gmail_messageprevents sending email even if the agent is instructed to - A parameter preset locks specific tool parameters to fixed values regardless of what the agent is told
Least-Privilege Toolkits
Build toolkits with only the tools the agent needs for its specific purpose. An agent that only needs to read calendar events should not have access todelete_event or modify_event. Reducing the attack surface reduces the impact of a successful injection.
Secret Isolation
Civic stores credentials in the Hub — not in the agent’s context. A prompt injection cannot instruct the agent to print API keys or OAuth tokens because the agent never has access to them.Secret Management
How Civic keeps credentials out of the agent’s context
Audit Trail
All tool calls are logged regardless of whether they were legitimate or injection-triggered. If an agent is compromised, the audit log provides a complete record of what it did.Audit and Observability
Query your agent’s complete activity log
Best Practices
- Lock all production toolkits — Use
?profile=your-toolkitto prevent toolkit switching - Apply least-privilege — Only include tools the agent genuinely needs
- Add guardrails for destructive tools — Block
delete_event,send_gmail_message, and similar high-risk tools on automated agents - Monitor the audit log — Watch for unexpected tool calls that may indicate an injection
- Revoke immediately if compromised — The kill switch is available at any granularity
Detection Patterns
Common signs that an agent may be injection-affected:- Unexpected tool calls outside the agent’s normal workflow
- Tool calls with unusual parameters (e.g., forwarding to an external email address)
- Sudden changes in the agent’s described behavior
- Requests to load new skills or switch toolkits
Guardrails
Protocol-level constraints that cannot be overridden by injected instructions
Revocation
Instant kill switch when you suspect a compromised agent
Audit
Review what your agent did to identify injection-triggered actions
Hooks
Middleware layer for custom filtering and validation

