Hadrian is experimental alpha software. Do not use in production.

Guardrails

Configure content filtering, PII detection, and safety enforcement

The [features.guardrails] section configures content filtering for both input (pre-request) and output (post-response). It supports multiple providers, configurable execution modes, and fine-grained per-category actions.

Configuration Reference

Main Settings

[features.guardrails]
enabled = true

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| enabled | boolean | true | Enable guardrails globally |

Input Guardrails

Evaluate user messages before sending to the LLM:

[features.guardrails.input]
enabled = true
mode = "blocking"
timeout_ms = 5000
on_timeout = "block"
on_error = "block"
default_action = "block"

[features.guardrails.input.actions]
HATE = "block"
VIOLENCE = "warn"
SEXUAL = "log"

[features.guardrails.input.provider]
type = "openai_moderation"

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| enabled | boolean | true | Enable input guardrails |
| mode | string | "blocking" | blocking or concurrent |
| timeout_ms | integer | 5000 | Evaluation timeout in milliseconds |
| on_timeout | string | "block" | block or allow |
| on_error | string | "block" | block, allow, or log_and_allow |
| default_action | string | "block" | Default action for unconfigured categories |

Execution Modes

| Mode | Behavior | Latency |
|------|----------|---------|
| blocking | Wait for guardrails before the LLM call | Adds a round-trip |
| concurrent | Race guardrails against the LLM, cancel on violation | Minimal for passing requests |
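
For instance, a latency-sensitive deployment might run input evaluation concurrently with the LLM call and fail open on timeouts. The sketch below uses only keys documented above; the timeout and action choices are illustrative, not recommendations.

# Concurrent input guardrails that fail open on timeout (illustrative values)
[features.guardrails.input]
enabled = true
mode = "concurrent"         # race evaluation against the LLM call
timeout_ms = 3000
on_timeout = "allow"        # let the response through if evaluation times out
on_error = "log_and_allow"  # log provider failures instead of blocking
default_action = "block"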

Output Guardrails

Evaluate LLM responses before returning to users:

[features.guardrails.output]
enabled = true
timeout_ms = 5000
on_error = "block"
default_action = "block"
streaming_mode = "final_only"

[features.guardrails.output.provider]
type = "openai_moderation"

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| enabled | boolean | true | Enable output guardrails |
| timeout_ms | integer | 5000 | Evaluation timeout in milliseconds |
| on_error | string | "block" | Error handling action |
| default_action | string | "block" | Default action |
| streaming_mode | string | "final_only" | Streaming evaluation mode |

Streaming Modes

| Mode | Behavior | Trade-off |
|------|----------|-----------|
| final_only | Evaluate the complete response after streaming | Fastest; harmful content may stream |
| buffered | Evaluate every N tokens | Balance of latency and safety |
| per_chunk | Evaluate each chunk | Safest, highest latency |

# Buffered mode configuration
streaming_mode = { buffered = { buffer_tokens = 100 } }
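
The buffered form uses an inline table as shown above; final_only is a plain string, and the sketch below assumes per_chunk follows the same string form.

# Per-chunk evaluation (assumed string form): safest, highest latency
streaming_mode = "per_chunk"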

Providers

OpenAI Moderation

Free, fast, general-purpose content moderation:

[features.guardrails.input.provider]
type = "openai_moderation"
api_key = "${OPENAI_API_KEY}"      # Optional, uses default provider key
base_url = "https://api.openai.com/v1"
model = "text-moderation-latest"

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| api_key | string | none | OpenAI API key (optional) |
| base_url | string | "https://api.openai.com/v1" | API base URL |
| model | string | "text-moderation-latest" | Moderation model |

AWS Bedrock Guardrails

Enterprise-grade with configurable policies:

[features.guardrails.input.provider]
type = "bedrock"
guardrail_id = "abc123"
guardrail_version = "1"
region = "us-east-1"
access_key_id = "${AWS_ACCESS_KEY_ID}"
secret_access_key = "${AWS_SECRET_ACCESS_KEY}"
trace_enabled = false

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| guardrail_id | string | required | Bedrock guardrail ID |
| guardrail_version | string | required | Guardrail version |
| region | string | none | AWS region |
| access_key_id | string | none | AWS access key (uses env if not set) |
| secret_access_key | string | none | AWS secret key (uses env if not set) |
| trace_enabled | boolean | false | Enable debug tracing |
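
Because access_key_id and secret_access_key fall back to the environment when unset, a minimal configuration can omit them. This sketch assumes standard AWS credentials are already available to the gateway process and uses a placeholder guardrail ID.

# Minimal Bedrock provider relying on environment credentials
[features.guardrails.input.provider]
type = "bedrock"
guardrail_id = "abc123"    # placeholder; use your guardrail ID
guardrail_version = "1"
region = "us-east-1"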

Azure Content Safety

Enterprise-grade with severity thresholds:

[features.guardrails.input.provider]
type = "azure_content_safety"
endpoint = "https://your-resource.cognitiveservices.azure.com"
api_key = "${AZURE_CONTENT_SAFETY_KEY}"
api_version = "2024-09-01"
blocklist_names = ["custom-blocklist"]

[features.guardrails.input.provider.thresholds]
Hate = 2
Violence = 4
Sexual = 2
SelfHarm = 0

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| endpoint | string | required | Azure endpoint URL |
| api_key | string | required | Azure API key |
| api_version | string | "2024-09-01" | API version |
| thresholds | map | none | Category severity thresholds (0-6) |
| blocklist_names | array | [] | Custom blocklist names |
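
Only endpoint and api_key are required; api_version, thresholds, and blocklist_names all have defaults, so a minimal sketch can omit them (the resource name below is a placeholder).

# Minimal Azure Content Safety provider using service defaults
[features.guardrails.input.provider]
type = "azure_content_safety"
endpoint = "https://your-resource.cognitiveservices.azure.com"
api_key = "${AZURE_CONTENT_SAFETY_KEY}"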

Blocklist (Built-in)

Fast, local pattern matching:

[features.guardrails.input.provider]
type = "blocklist"
case_insensitive = true

[[features.guardrails.input.provider.patterns]]
pattern = "competitor-name"
is_regex = false
category = "blocked_content"
severity = "high"
message = "Mentions of competitors are not allowed"

[[features.guardrails.input.provider.patterns]]
pattern = "\\b(hack|exploit)\\b"
is_regex = true
category = "security"
severity = "medium"

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| case_insensitive | boolean | true | Case-insensitive matching |
| patterns | array | required | List of patterns |

Pattern fields:

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| pattern | string | required | Pattern to match |
| is_regex | boolean | false | Treat as regex |
| category | string | "blocked_content" | Category on match |
| severity | string | "high" | Severity level |
| message | string | none | Human-readable explanation |

PII Regex (Built-in)

Fast, local PII detection:

[features.guardrails.input.provider]
type = "pii_regex"
email = true
phone = true
ssn = true
credit_card = true
ip_address = true
date_of_birth = true

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| email | boolean | true | Detect email addresses |
| phone | boolean | true | Detect phone numbers |
| ssn | boolean | true | Detect Social Security Numbers |
| credit_card | boolean | true | Detect credit cards (Luhn validation) |
| ip_address | boolean | true | Detect IP addresses |
| date_of_birth | boolean | true | Detect potential DOBs |
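
Every detector is enabled by default, so a configuration only needs to name the detectors it wants to turn off. The sketch below disables the two that tend to be noisiest for technical traffic; which detectors are noisy depends on your workload, so treat this as illustrative.

# Disable detectors that are prone to false positives in technical payloads
[features.guardrails.input.provider]
type = "pii_regex"
ip_address = false       # IPs are common in logs and request bodies
date_of_birth = false    # ordinary dates can be mistaken for DOBs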

Content Limits (Built-in)

Enforce size constraints:

[features.guardrails.input.provider]
type = "content_limits"
max_characters = 10000
max_words = 2000
max_lines = 500

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| max_characters | integer | none | Maximum characters |
| max_words | integer | none | Maximum words |
| max_lines | integer | none | Maximum lines |
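
All three limits default to none, so you can enforce only the ones you need; for example, a character cap on its own (the value is illustrative):

# Character cap only; word and line limits stay unset (unlimited)
[features.guardrails.input.provider]
type = "content_limits"
max_characters = 20000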

Custom HTTP

Bring your own guardrails:

[features.guardrails.input.provider]
type = "custom"
url = "https://my-guardrails.example.com/evaluate"
api_key = "${CUSTOM_GUARDRAILS_KEY}"
timeout_ms = 5000
retry_enabled = true
max_retries = 2

[features.guardrails.input.provider.headers]
X-Custom-Header = "value"

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| url | string | required | Guardrails service URL |
| api_key | string | none | API key for authentication |
| timeout_ms | integer | 5000 | Request timeout in milliseconds |
| retry_enabled | boolean | false | Enable retries |
| max_retries | integer | 2 | Maximum retries |
| headers | map | {} | Custom headers |

PII Detection

Dedicated PII handling with [features.guardrails.pii]:

[features.guardrails.pii]
enabled = true
action = "redact"
replacement = "[PII REDACTED]"
apply_to = "both"
types = ["EMAIL", "PHONE", "SSN", "CREDIT_CARD"]

[features.guardrails.pii.provider]
type = "regex"

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| enabled | boolean | true | Enable PII detection |
| action | string | "redact" | block, redact, anonymize, or log |
| replacement | string | "[PII REDACTED]" | Replacement text |
| apply_to | string | "both" | input, output, or both |
| types | array | common types | PII types to detect |

PII types: EMAIL, PHONE, SSN, CREDIT_CARD, ADDRESS, NAME, DATE_OF_BIRTH, DRIVERS_LICENSE, PASSPORT, BANK_ACCOUNT, IP_ADDRESS, MAC_ADDRESS, URL, USERNAME, PASSWORD, AWS_ACCESS_KEY, AWS_SECRET_KEY, API_KEY
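
As a variation on the example above, the same block can anonymize instead of redact, apply only to model output, and target credential-shaped types from the list. This is a sketch using only values documented on this page.

# Anonymize credentials and contact details in model output only
[features.guardrails.pii]
enabled = true
action = "anonymize"
apply_to = "output"
types = ["EMAIL", "PHONE", "PASSWORD", "API_KEY", "AWS_ACCESS_KEY", "AWS_SECRET_KEY"]

[features.guardrails.pii.provider]
type = "regex"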

Actions

| Action | Behavior |
|--------|----------|
| block | Reject request/response with error |
| warn | Allow with warning headers |
| log | Allow silently, log violation |
| redact | Replace violating content |
| modify | Provider-specific transformation |

# Per-category action configuration
[features.guardrails.input.actions]
HATE = "block"
VIOLENCE = "block"
SEXUAL = "warn"
HARASSMENT = "log"
SELF_HARM = "block"

Audit Logging

[features.guardrails.audit]
enabled = true
log_all_evaluations = false
log_blocked = true
log_violations = true
log_redacted = true

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| enabled | boolean | true | Enable audit logging |
| log_all_evaluations | boolean | false | Log all evaluations (high volume) |
| log_blocked | boolean | true | Log blocked requests |
| log_violations | boolean | true | Log policy violations |
| log_redacted | boolean | true | Log redaction events |
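
While tuning a new policy it can help to temporarily log every evaluation rather than only violations; a sketch, keeping in mind the volume log_all_evaluations produces:

# Verbose auditing while tuning guardrail policies (high log volume)
[features.guardrails.audit]
enabled = true
log_all_evaluations = true
log_blocked = true
log_violations = true
log_redacted = true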

Complete Example

[features.guardrails]
enabled = true

# Input guardrails with OpenAI
[features.guardrails.input]
enabled = true
mode = "blocking"
timeout_ms = 5000
on_timeout = "block"
on_error = "block"
default_action = "block"

[features.guardrails.input.provider]
type = "openai_moderation"

[features.guardrails.input.actions]
HATE = "block"
VIOLENCE = "block"
SEXUAL = "warn"
HARASSMENT = "log"

# Output guardrails with Bedrock
[features.guardrails.output]
enabled = true
timeout_ms = 10000
on_error = "log_and_allow"
default_action = "block"
streaming_mode = "final_only"

[features.guardrails.output.provider]
type = "bedrock"
guardrail_id = "abc123"
guardrail_version = "1"
region = "us-east-1"

# PII redaction
[features.guardrails.pii]
enabled = true
action = "redact"
replacement = "[REDACTED]"
apply_to = "both"
types = ["EMAIL", "PHONE", "SSN", "CREDIT_CARD"]

[features.guardrails.pii.provider]
type = "regex"

# Audit logging
[features.guardrails.audit]
enabled = true
log_all_evaluations = false
log_blocked = true
log_violations = true
log_redacted = true
