# Features
Explore Hadrian Gateway's comprehensive feature set
Hadrian Gateway ships a full feature set for production AI deployments. Every feature is free and included in the open-source release, dual-licensed under Apache 2.0 and MIT.
## LLM Providers
Route requests to any major LLM provider through a unified OpenAI-compatible API.
| Provider | Streaming | Embeddings | Function Calling | Thinking/Reasoning |
|---|---|---|---|---|
| OpenAI | Yes | Yes | Yes | Yes (o1/o3) |
| Anthropic | Yes | No | Yes | Yes (extended thinking) |
| AWS Bedrock | Yes | Yes (Titan) | Yes | Yes (Claude) |
| Google Vertex | Yes | Yes | Yes | Yes (Gemini) |
| Azure OpenAI | Yes | Yes | Yes | Yes |
| OpenRouter | Yes | No | Yes | Varies |
| Any OpenAI-compatible | Yes | Varies | Varies | Varies |
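This page doesn't reproduce the provider configuration schema; as a rough sketch, registering providers with your own keys might look like the following, where the `[[providers]]` table and all of its keys are illustrative assumptions rather than documented schema:

```toml
# Illustrative sketch: the [[providers]] table and key names are
# assumptions, not Hadrian's documented configuration schema.
[[providers]]
name = "openai-primary"
kind = "openai"
api_key = "${OPENAI_API_KEY}"   # resolved from the environment

[[providers]]
name = "bedrock"
kind = "aws-bedrock"
region = "us-east-1"
```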
Provider capabilities:
- Circuit breaker - Automatically disable unhealthy providers after repeated failures
- Automatic retry - Exponential backoff for transient errors (429, 5xx)
- Provider fallbacks - Chain providers: try A, then B, then C
- Model fallbacks - Graceful degradation: `gpt-4o` → `gpt-4o-mini` → `claude-sonnet`
- Health checks - Background monitoring to detect issues before user requests fail
- Model aliases - Create shortcuts like `sonnet` → `claude-sonnet-4-20250514`
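As a hedged sketch of how the retry, fallback, and alias behavior above might be expressed in the gateway's TOML configuration (the `[routing]` section and every key in it are assumptions for illustration, not confirmed schema):

```toml
# Illustrative only: section and key names are assumed, not confirmed.
[routing]
fallbacks = ["gpt-4o", "gpt-4o-mini", "claude-sonnet"]  # try in order

[routing.retry]
max_attempts = 3          # exponential backoff on 429/5xx
backoff = "exponential"

[routing.aliases]
sonnet = "claude-sonnet-4-20250514"   # shortcut resolved before routing
```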
## Multi-Tenancy
Hadrian supports a flexible multi-tenancy hierarchy for organizations of any size.
```
Organizations
└── Teams (optional)
    └── Projects
        └── Users
            └── API Keys
```

Each level in the hierarchy can have:
| Capability | Description |
|---|---|
| Dynamic providers | Bring your own API keys at any scope |
| Model pricing | Override pricing for cost calculations |
| Budget limits | Daily/monthly spending caps with enforcement |
| Rate limits | Requests and tokens per minute/day |
| Guardrails | Scope-specific content policies |
Resource ownership:
Resources can be owned at different levels depending on your organization's needs:
- Organization-level: Shared across all teams and projects
- Team-level: Shared within a team
- Project-level: Isolated to a specific project
- User-level: Personal resources (conversations, API keys)
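A minimal sketch of how scope-specific settings could be declared, assuming a nested TOML layout; the section paths and keys below are hypothetical:

```toml
# Hypothetical scoped settings; section paths and keys are illustrative.
[orgs.acme]
budget.monthly_usd = 5000                 # org-wide spending cap

[orgs.acme.projects.support-bot]
budget.monthly_usd = 500                  # tighter cap for one project
rate_limit.requests_per_minute = 120
rate_limit.tokens_per_day = 2_000_000
```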
## Authentication & Authorization
Flexible authentication supporting multiple methods for API and UI access.
| Method | Use Case | Description |
|---|---|---|
| API Key | Programmatic access | `gw_live_...` key format, budget limits |
| OIDC/OAuth | SSO with identity providers | Keycloak, Auth0, Okta, Azure AD |
| JWT | Service-to-service auth | JWKS validation, custom claims |
| Per-Org SSO | Multi-tenant SaaS | Self-service SSO per organization |
| Proxy Auth | Zero-trust networks | Cloudflare Access, Tailscale |
CEL-based authorization:
Use Common Expression Language (CEL) policies for fine-grained access control:
```toml
[[auth.rbac.policies]]
name = "org-admin"
condition = "'admin' in subject.roles && context.org_id in subject.org_ids"
effect = "allow"
```

## Guardrails
Block, warn, log, or redact content using configurable guardrails on both input and output.
### Guardrail Providers
| Provider | Features | Best For |
|---|---|---|
| OpenAI Moderation | Hate, violence, sexual, self-harm categories | Free, fast, general-purpose |
| AWS Bedrock Guardrails | PII detection, topic filters, word filters, denied topics | Enterprise compliance |
| Azure Content Safety | Configurable severity thresholds, custom blocklists | Azure environments |
| Custom HTTP | Your own moderation service | Custom requirements |
| Regex patterns | PII patterns, blocklist terms | Simple rules |
### Execution Modes
| Mode | Behavior | Latency Impact |
|---|---|---|
| Blocking | Evaluate before sending to LLM | Adds round-trip to guardrail |
| Concurrent | Race guardrails against LLM, cancel if violations | Minimal for passing requests |
| Post-response | Filter LLM output before returning | Adds round-trip after LLM |
Actions on violation:
- `block` - Reject the request with an error
- `warn` - Log a warning, allow the request
- `log` - Silent logging only
- `redact` - Remove or mask the violating content
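Putting the pieces together, a guardrail definition might combine a provider, an execution mode, and a violation action roughly like this. This is a sketch; the `[[guardrails]]` table and key names are assumptions:

```toml
# Sketch only: table and key names are assumed, not documented schema.
[[guardrails]]
name = "pii-redactor"
provider = "regex"                      # one of the providers above
mode = "concurrent"                     # blocking | concurrent | post-response
action = "redact"                       # block | warn | log | redact
patterns = ['\b\d{3}-\d{2}-\d{4}\b']    # e.g. US SSN-shaped strings
```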
## Budget Enforcement
Prevent cost overruns with atomic budget reservation and real-time enforcement.
How it works:
1. Request arrives → Reserve estimated cost ($0.10)
2. Forward to LLM provider
3. Request completes with actual cost
4. Adjust: Replace estimate with actual cost

This atomic reservation pattern prevents overspend even with concurrent requests.
Budget scopes:
- Organization-level budgets
- Team-level budgets
- Project-level budgets
- User-level budgets
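Assuming budgets are declared alongside each scope, a configuration sketch could look like this (key names are illustrative, not confirmed):

```toml
# Illustrative budget caps; key names are assumptions.
[budget.org]
daily_usd = 200.0
monthly_usd = 4000.0
enforcement = "hard"      # reject requests once the cap is hit

[budget.user]
monthly_usd = 50.0        # per-user cap nested inside the org budget
```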
Forecasting:
Built-in time-series forecasting with augurs:
- Projected spend for current period
- Days until budget exhaustion
- 95% confidence intervals
## Vector Stores & RAG
OpenAI-compatible Vector Stores API for building RAG (Retrieval-Augmented Generation) applications.
Capabilities:
- Upload and process files (PDF, DOCX, TXT, Markdown, HTML, etc.)
- Automatic text extraction with OCR support via Kreuzberg
- Configurable chunking strategies (auto, fixed-size)
- Vector search with similarity scoring
- LLM-based re-ranking for improved relevance
- File search tool integration for Responses API
Vector backends:
| Backend | Use Case |
|---|---|
| pgvector | Simple setup, uses existing PostgreSQL |
| Qdrant | Dedicated vector DB, high performance |
| Pinecone | Managed service, serverless |
| Weaviate | Hybrid search, schema-based |
| ChromaDB | Lightweight, embedded |
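Selecting a backend is presumably a configuration choice; as a sketch, with section and key names assumed:

```toml
# Hypothetical backend selection; keys are illustrative.
[vector_store]
backend = "pgvector"            # or qdrant, pinecone, weaviate, chromadb
chunking = "auto"               # auto | fixed-size

[vector_store.qdrant]
url = "http://localhost:6333"   # only read when backend = "qdrant"
```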
## Chat UI
Built-in React UI for multi-model conversations and administration.
Chat features:
- Multi-model comparison in single conversation
- Model instances (compare same model with different settings)
- Streaming markdown with syntax highlighting
- File uploads (images, PDFs)
- Conversation history with IndexedDB persistence
- Per-model settings (temperature, max tokens)
Chat modes:
| Mode | Description |
|---|---|
| Synthesized | Gather all responses, synthesize final answer |
| Chained | Sequential relay (output of one becomes input to next) |
| Debated | Multi-round argumentation between models |
| Council | Collaborative discussion with voting/consensus |
| Hierarchical | Coordinator delegates subtasks to workers |
## Frontend Tools
Client-side tool execution in the browser via WebAssembly.
| Tool | Runtime | Capabilities |
|---|---|---|
| Python | Pyodide | numpy, pandas, matplotlib, scipy |
| JavaScript | QuickJS | Sandboxed JS execution |
| SQL | DuckDB | Query CSV/Parquet files |
| Charts | Vega-Lite | Interactive visualizations |
| HTML | iframe | Sandboxed preview |
Tool results are displayed inline as interactive artifacts and sent back to the LLM to continue the conversation.
## MCP Integration
Connect to external tool servers using the Model Context Protocol (MCP).
Capabilities:
- Connect to MCP servers via Streamable HTTP transport
- Automatic tool discovery from connected servers
- Tool execution with result streaming
- Persistent server connections across conversations
Use cases:
- File system access and manipulation
- Database queries
- External API integrations
- Custom enterprise tools
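Connecting a server is likely a one-time registration; a sketch under assumed key names:

```toml
# Hypothetical MCP server registration; key names are assumptions.
[[mcp.servers]]
name = "filesystem"
transport = "streamable-http"     # the transport named above
url = "http://localhost:8931/mcp"
```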
## Response Caching
Cache LLM responses to reduce costs and latency.
| Cache Type | Matching | Use Case |
|---|---|---|
| Exact match | SHA-256 hash of request | Identical requests |
| Semantic | Embedding similarity | Similar questions |
| Prompt (Anthropic) | Provider-side caching | Long system prompts |
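A sketch of how the cache might be tuned; the section and keys are illustrative assumptions:

```toml
# Illustrative cache settings; key names are assumed.
[cache]
mode = "semantic"               # exact | semantic
ttl_seconds = 3600
similarity_threshold = 0.95     # semantic mode: minimum match score
```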
## Observability
Comprehensive monitoring and debugging capabilities.
| Feature | Endpoint/Format | Description |
|---|---|---|
| Metrics | /metrics (Prometheus) | Request latency, token counts, costs, errors |
| Tracing | OTLP export | Distributed traces to Jaeger, Tempo, etc. |
| Logging | JSON or compact | Structured logs with configurable levels |
| Usage | Database + OTLP | Token usage, costs per user/project/org |
```toml
[observability]
logging.format = "json"
logging.level = "info"

[observability.tracing]
enabled = true
exporter = "otlp"
endpoint = "http://localhost:4317"

[observability.metrics]
enabled = true
```

## Data Privacy & GDPR
Built-in compliance features for data protection regulations.
Capabilities:
- Self-service data export (GDPR Article 15 - Right of Access)
- Self-service account deletion (GDPR Article 17 - Right to Erasure)
- Configurable data retention policies
- CSV export reports for compliance audits
- Audit logging for all privacy operations