# Guardrails

Configure content filtering, PII detection, and safety enforcement.

The `[features.guardrails]` section configures content filtering for both input (pre-request) and output (post-response). It supports multiple providers, execution modes, and fine-grained per-category actions.
```toml
[features.guardrails]
enabled = true
```
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `enabled` | boolean | `true` | Enable guardrails globally |
## Input Guardrails

Evaluate user messages before sending them to the LLM:
```toml
[features.guardrails.input]
enabled = true
mode = "blocking"
timeout_ms = 5000
on_timeout = "block"
on_error = "block"
default_action = "block"

[features.guardrails.input.actions]
HATE = "block"
VIOLENCE = "warn"
SEXUAL = "log"

[features.guardrails.input.provider]
type = "openai_moderation"
```
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `enabled` | boolean | `true` | Enable input guardrails |
| `mode` | string | `"blocking"` | `blocking` or `concurrent` |
| `timeout_ms` | integer | `5000` | Evaluation timeout in milliseconds |
| `on_timeout` | string | `"block"` | `block` or `allow` |
| `on_error` | string | `"block"` | `block`, `allow`, or `log_and_allow` |
| `default_action` | string | `"block"` | Default action for unconfigured categories |
| Mode | Behavior | Latency |
|------|----------|---------|
| `blocking` | Wait for guardrails before the LLM call | Adds a round-trip |
| `concurrent` | Race guardrails against the LLM; cancel on violation | Minimal for passing requests |
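Concurrent mode trades a small exposure window for lower latency: the request is sent to the LLM while guardrails evaluate, and the call is cancelled if a violation is found. A minimal sketch (timeout value is illustrative):

```toml
[features.guardrails.input]
enabled = true
mode = "concurrent"  # evaluate guardrails in parallel with the LLM call
timeout_ms = 3000    # illustrative: the race hides most evaluation latency
on_timeout = "block" # fail closed if evaluation does not finish in time
```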
## Output Guardrails

Evaluate LLM responses before returning them to users:
```toml
[features.guardrails.output]
enabled = true
timeout_ms = 5000
on_error = "block"
default_action = "block"
streaming_mode = "final_only"

[features.guardrails.output.provider]
type = "openai_moderation"
```
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `enabled` | boolean | `true` | Enable output guardrails |
| `timeout_ms` | integer | `5000` | Evaluation timeout in milliseconds |
| `on_error` | string | `"block"` | Error handling action |
| `default_action` | string | `"block"` | Default action |
| `streaming_mode` | string | `"final_only"` | Streaming evaluation mode |
| Mode | Behavior | Trade-off |
|------|----------|-----------|
| `final_only` | Evaluate the complete response after streaming | Fastest; harmful content may stream |
| `buffered` | Evaluate every N tokens | Balances latency and safety |
| `per_chunk` | Evaluate each chunk | Safest; highest latency |
```toml
# Buffered mode configuration
streaming_mode = { buffered = { buffer_tokens = 100 } }
```
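The other two modes are plain strings. For example, the safest (and slowest) option, assuming `per_chunk` follows the same string form as `final_only`:

```toml
# Per-chunk evaluation: each streamed chunk is evaluated individually
# (safest, highest latency)
streaming_mode = "per_chunk"
```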
## Providers

### OpenAI Moderation

Free, fast, general-purpose content moderation:
```toml
[features.guardrails.input.provider]
type = "openai_moderation"
api_key = "${OPENAI_API_KEY}"  # Optional, uses default provider key
base_url = "https://api.openai.com/v1"
model = "text-moderation-latest"
```
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `api_key` | string | none | OpenAI API key (optional) |
| `base_url` | string | `"https://api.openai.com/v1"` | API base URL |
| `model` | string | `"text-moderation-latest"` | Moderation model |
### AWS Bedrock

Enterprise-grade with configurable policies:
```toml
[features.guardrails.input.provider]
type = "bedrock"
guardrail_id = "abc123"
guardrail_version = "1"
region = "us-east-1"
access_key_id = "${AWS_ACCESS_KEY_ID}"
secret_access_key = "${AWS_SECRET_ACCESS_KEY}"
trace_enabled = false
```
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `guardrail_id` | string | required | Bedrock guardrail ID |
| `guardrail_version` | string | required | Guardrail version |
| `region` | string | none | AWS region |
| `access_key_id` | string | none | AWS access key (uses environment credentials if not set) |
| `secret_access_key` | string | none | AWS secret key (uses environment credentials if not set) |
| `trace_enabled` | boolean | `false` | Enable debug tracing |
### Azure Content Safety

Enterprise-grade with severity thresholds:
```toml
[features.guardrails.input.provider]
type = "azure_content_safety"
endpoint = "https://your-resource.cognitiveservices.azure.com"
api_key = "${AZURE_CONTENT_SAFETY_KEY}"
api_version = "2024-09-01"
blocklist_names = ["custom-blocklist"]

[features.guardrails.input.provider.thresholds]
Hate = 2
Violence = 4
Sexual = 2
SelfHarm = 0
```
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `endpoint` | string | required | Azure endpoint URL |
| `api_key` | string | required | Azure API key |
| `api_version` | string | `"2024-09-01"` | API version |
| `thresholds` | map | none | Category severity thresholds (0-6) |
| `blocklist_names` | array | `[]` | Custom blocklist names |
### Blocklist

Fast, local pattern matching:
```toml
[features.guardrails.input.provider]
type = "blocklist"
case_insensitive = true

[[features.guardrails.input.provider.patterns]]
pattern = "competitor-name"
is_regex = false
category = "blocked_content"
severity = "high"
message = "Mentions of competitors are not allowed"

[[features.guardrails.input.provider.patterns]]
pattern = "\\b(hack|exploit)\\b"
is_regex = true
category = "security"
severity = "medium"
```
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `case_insensitive` | boolean | `true` | Case-insensitive matching |
| `patterns` | array | required | List of patterns |
Pattern fields:
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `pattern` | string | required | Pattern to match |
| `is_regex` | boolean | `false` | Treat the pattern as a regex |
| `category` | string | `"blocked_content"` | Category reported on match |
| `severity` | string | `"high"` | Severity level |
| `message` | string | none | Human-readable explanation |
### PII Regex

Fast, local PII detection:
```toml
[features.guardrails.input.provider]
type = "pii_regex"
email = true
phone = true
ssn = true
credit_card = true
ip_address = true
date_of_birth = true
```
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `email` | boolean | `true` | Detect email addresses |
| `phone` | boolean | `true` | Detect phone numbers |
| `ssn` | boolean | `true` | Detect Social Security numbers |
| `credit_card` | boolean | `true` | Detect credit card numbers (Luhn validation) |
| `ip_address` | boolean | `true` | Detect IP addresses |
| `date_of_birth` | boolean | `true` | Detect potential dates of birth |
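Every detector defaults to `true`, so in practice you only list the ones you want to turn off. For example, a minimal sketch that disables two detectors while keeping the rest (which detectors to disable is purely illustrative):

```toml
[features.guardrails.input.provider]
type = "pii_regex"
ip_address = false     # illustrative: skip IP-address detection
date_of_birth = false  # illustrative: skip date-of-birth detection
```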
### Content Limits

Enforce size constraints:
```toml
[features.guardrails.input.provider]
type = "content_limits"
max_characters = 10000
max_words = 2000
max_lines = 500
```
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `max_characters` | integer | none | Maximum characters |
| `max_words` | integer | none | Maximum words |
| `max_lines` | integer | none | Maximum lines |
### Custom

Bring your own guardrails:
```toml
[features.guardrails.input.provider]
type = "custom"
url = "https://my-guardrails.example.com/evaluate"
api_key = "${CUSTOM_GUARDRAILS_KEY}"
timeout_ms = 5000
retry_enabled = true
max_retries = 2

[features.guardrails.input.provider.headers]
X-Custom-Header = "value"
```
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `url` | string | required | Guardrails service URL |
| `api_key` | string | none | API key for authentication |
| `timeout_ms` | integer | `5000` | Request timeout in milliseconds |
| `retry_enabled` | boolean | `false` | Enable retries |
| `max_retries` | integer | `2` | Maximum retries |
| `headers` | map | `{}` | Custom headers |
## PII Detection

Dedicated PII handling with `[features.guardrails.pii]`:
```toml
[features.guardrails.pii]
enabled = true
action = "redact"
replacement = "[PII REDACTED]"
apply_to = "both"
types = ["EMAIL", "PHONE", "SSN", "CREDIT_CARD"]

[features.guardrails.pii.provider]
type = "regex"
```
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `enabled` | boolean | `true` | Enable PII detection |
| `action` | string | `"redact"` | `block`, `redact`, `anonymize`, or `log` |
| `replacement` | string | `"[PII REDACTED]"` | Replacement text |
| `apply_to` | string | `"both"` | `input`, `output`, or `both` |
| `types` | array | common types | PII types to detect |
Supported PII types: `EMAIL`, `PHONE`, `SSN`, `CREDIT_CARD`, `ADDRESS`, `NAME`, `DATE_OF_BIRTH`, `DRIVERS_LICENSE`, `PASSPORT`, `BANK_ACCOUNT`, `IP_ADDRESS`, `MAC_ADDRESS`, `URL`, `USERNAME`, `PASSWORD`, `AWS_ACCESS_KEY`, `AWS_SECRET_KEY`, `API_KEY`.
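These types compose with `apply_to`. For example, a minimal sketch that blocks (rather than redacts) leaked credentials in model output only; the particular type selection is illustrative:

```toml
[features.guardrails.pii]
enabled = true
action = "block"     # reject the response instead of redacting it
apply_to = "output"  # scan LLM responses only
types = ["API_KEY", "AWS_ACCESS_KEY", "AWS_SECRET_KEY", "PASSWORD"]

[features.guardrails.pii.provider]
type = "regex"
```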
## Actions

| Action | Behavior |
|--------|----------|
| `block` | Reject the request/response with an error |
| `warn` | Allow, but attach warning headers |
| `log` | Allow silently; log the violation |
| `redact` | Replace the violating content |
| `modify` | Provider-specific transformation |
```toml
# Per-category action configuration
[features.guardrails.input.actions]
HATE = "block"
VIOLENCE = "block"
SEXUAL = "warn"
HARASSMENT = "log"
SELF_HARM = "block"
```
## Audit Logging

```toml
[features.guardrails.audit]
enabled = true
log_all_evaluations = false
log_blocked = true
log_violations = true
log_redacted = true
```
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `enabled` | boolean | `true` | Enable audit logging |
| `log_all_evaluations` | boolean | `false` | Log every evaluation (high volume) |
| `log_blocked` | boolean | `true` | Log blocked requests |
| `log_violations` | boolean | `true` | Log policy violations |
| `log_redacted` | boolean | `true` | Log redaction events |
## Complete Example

```toml
[features.guardrails]
enabled = true

# Input guardrails with OpenAI
[features.guardrails.input]
enabled = true
mode = "blocking"
timeout_ms = 5000
on_timeout = "block"
on_error = "block"
default_action = "block"

[features.guardrails.input.provider]
type = "openai_moderation"

[features.guardrails.input.actions]
HATE = "block"
VIOLENCE = "block"
SEXUAL = "warn"
HARASSMENT = "log"

# Output guardrails with Bedrock
[features.guardrails.output]
enabled = true
timeout_ms = 10000
on_error = "log_and_allow"
default_action = "block"
streaming_mode = "final_only"

[features.guardrails.output.provider]
type = "bedrock"
guardrail_id = "abc123"
guardrail_version = "1"
region = "us-east-1"

# PII redaction
[features.guardrails.pii]
enabled = true
action = "redact"
replacement = "[REDACTED]"
apply_to = "both"
types = ["EMAIL", "PHONE", "SSN", "CREDIT_CARD"]

[features.guardrails.pii.provider]
type = "regex"

# Audit logging
[features.guardrails.audit]
enabled = true
log_all_evaluations = false
log_blocked = true
log_violations = true
log_redacted = true
```