Basic Prompt Injection Defense
Detect and filter obvious prompt injection patterns using regex-based detection.
Warning: This plugin provides basic protection only and is NOT suitable for production use. Regex-based detection can be bypassed by sophisticated attacks. For production environments, implement AI-based detection or human review processes.
Quick Reference
| Property | Value |
|---|---|
| Handler | basic_prompt_injection_defense |
| Type | Security |
| Scope | Global (can be configured globally or per-server) |
Detection Categories
The plugin detects three categories of prompt injection patterns:
| Category | Description | Examples |
|---|---|---|
delimiter_injection |
Attempts to break out of context using delimiters | Triple quotes, markdown code blocks, XML tags with system/admin keywords |
role_manipulation |
Attempts to change the AI's role or persona | "you are now an admin", "act as system", "ignore previous role" |
context_hijacking |
Attempts to reset or override context | "ignore all previous instructions", "forget everything", "reset context" |
Configuration Reference
action
What to do when a prompt injection is detected.
| Value | Description |
|---|---|
block |
Reject the request/response entirely |
redact |
Replace injection patterns with [PROMPT INJECTION REDACTED by Gatekit] |
audit_only |
Log detection but allow through unchanged |
Default: redact
sensitivity
Controls pattern matching aggressiveness.
| Value | Description |
|---|---|
relaxed |
Fewer patterns, only the most obvious attacks. Minimizes false positives. |
standard |
Balanced detection with good coverage and acceptable false positive rate. |
strict |
Maximum protection with more patterns. May have higher false positive rate. |
Default: standard
detection_methods
Configure which detection categories to enable. Each has an enabled boolean.
| Method | Description | Default |
|---|---|---|
delimiter_injection |
Detect delimiter-based attacks | Enabled |
role_manipulation |
Detect role/persona manipulation | Enabled |
context_hijacking |
Detect context reset attempts | Enabled |
YAML Configuration
Minimal Configuration
plugins:
- handler: basic_prompt_injection_defense
enabled: true
Full Configuration
plugins:
- handler: basic_prompt_injection_defense
enabled: true
priority: 15 # Run early, after PII/secrets filters
critical: true # Block requests if plugin fails
action: redact # block | redact | audit_only
sensitivity: standard # relaxed | standard | strict
detection_methods:
delimiter_injection:
enabled: true
role_manipulation:
enabled: true
context_hijacking:
enabled: true
High-Security Example (Block Mode)
plugins:
- handler: basic_prompt_injection_defense
enabled: true
priority: 5
action: block # Never allow injections through
sensitivity: strict # Maximum detection (accept more false positives)
detection_methods:
delimiter_injection:
enabled: true
role_manipulation:
enabled: true
context_hijacking:
enabled: true
Low False-Positive Example
plugins:
- handler: basic_prompt_injection_defense
enabled: true
action: audit_only # Log but don't interfere
sensitivity: relaxed # Only the most obvious attacks
detection_methods:
delimiter_injection:
enabled: true
role_manipulation:
enabled: true
context_hijacking:
enabled: false # Many legitimate uses of "start fresh"
Pattern Examples by Sensitivity
Relaxed Mode
[SYSTEM]...[/SYSTEM]delimiters<system>...</system>XML tags- "you are now an admin/administrator/root/superuser/DAN"
- "you are now operating as root/admin/system"
- "ignore all previous instructions"
- "forget everything you were told"
- "override/bypass all safety protocols"
Standard Mode (adds)
- Triple quotes with injection keywords
- Markdown code blocks with system/admin keywords
- "ignore/disregard previous instructions/commands"
- "act as admin" patterns
- "with elevated privileges"
- "reset context/conversation/session"
- "start fresh and ignore"
Strict Mode (adds)
- Double quotes with bypass keywords
- Shorter "you are admin" patterns
- "admin mode/access/override"
- "new conversation/session" (may catch legitimate uses)
Limitations
This plugin will NOT detect:
- Semantic injections - Attacks that use meaning rather than keywords
- Encoded attacks - Base64, ROT13, or other encoding schemes (ROT13 was removed due to false positives)
- Synonym-based evasion - Using alternative words with same meaning
- Multi-turn attacks - Gradual context manipulation across messages
- Context-dependent manipulation - Attacks that depend on prior conversation
- Advanced jailbreaking - Sophisticated techniques that don't match simple patterns
- Injections in responses - Detection works on both requests and responses, but file contents from upstream servers may contain injection patterns that trigger false positives
Redaction Format
When action: redact, detected patterns are replaced with:
[PROMPT INJECTION REDACTED by Gatekit]
Security Note
Matched injection text is intentionally excluded from audit logs to prevent "log replay attacks" where an AI reviewing logs could be affected by injection patterns stored in the logs.