# Features
Explore Hadrian Gateway's comprehensive feature set
Hadrian Gateway ships a full feature set for production AI deployments. Every feature is free and included in the open-source release, dual-licensed under Apache 2.0 and MIT.
## LLM Providers
Route requests to any major LLM provider through a unified OpenAI-compatible API.
| Provider | Streaming | Embeddings | Function Calling | Thinking/Reasoning |
|---|---|---|---|---|
| OpenAI | Yes | Yes | Yes | Yes (o1/o3) |
| Anthropic | Yes | No | Yes | Yes (extended thinking) |
| AWS Bedrock | Yes | Yes (Titan) | Yes | Yes (Claude) |
| Google Vertex | Yes | Yes | Yes | Yes (Gemini) |
| Azure OpenAI | Yes | Yes | Yes | Yes |
| OpenRouter | Yes | No | Yes | Varies |
| Any OpenAI-compatible | Yes | Varies | Varies | Varies |
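This page doesn't reproduce the provider configuration schema; as a rough sketch, registering providers with your own keys might look like the following, where the `[[providers]]` table and all of its keys are illustrative assumptions rather than documented schema:

```toml
# Illustrative sketch: the [[providers]] table and key names are
# assumptions, not Hadrian's documented configuration schema.
[[providers]]
name = "openai-primary"
kind = "openai"
api_key = "${OPENAI_API_KEY}"   # resolved from the environment

[[providers]]
name = "bedrock"
kind = "aws-bedrock"
region = "us-east-1"
```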
Provider capabilities:
- Circuit breaker - Automatically disable unhealthy providers after repeated failures
- Automatic retry - Exponential backoff for transient errors (429, 5xx)
- Provider fallbacks - Chain providers: try A, then B, then C
- Model fallbacks - Graceful degradation: `gpt-4o` → `gpt-4o-mini` → `claude-sonnet`
- Health checks - Background monitoring to detect issues before user requests fail
- Model aliases - Create shortcuts like `sonnet` → `claude-sonnet-4-20250514`
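As a hedged sketch of how the retry, fallback, and alias behavior above might be expressed in the gateway's TOML configuration (the `[routing]` section and every key in it are assumptions for illustration, not confirmed schema):

```toml
# Illustrative only: section and key names are assumed, not confirmed.
[routing]
fallbacks = ["gpt-4o", "gpt-4o-mini", "claude-sonnet"]  # try in order

[routing.retry]
max_attempts = 3          # exponential backoff on 429/5xx
backoff = "exponential"

[routing.aliases]
sonnet = "claude-sonnet-4-20250514"   # shortcut resolved before routing
```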
## Multi-Tenancy
Hadrian supports a flexible multi-tenancy hierarchy for organizations of any size.
```
Organizations
└── Teams (optional)
    └── Projects
        └── Users
            └── API Keys
```

Each level in the hierarchy can have:
| Capability | Description |
|---|---|
| Dynamic providers | Bring your own API keys at any scope |
| Model pricing | Override pricing for cost calculations |
| Budget limits | Daily/monthly spending caps with enforcement |
| Rate limits | Requests and tokens per minute/day |
| Guardrails | Scope-specific content policies |
Resource ownership:
Resources can be owned at different levels depending on your organization's needs:
- Organization-level: Shared across all teams and projects
- Team-level: Shared within a team
- Project-level: Isolated to a specific project
- User-level: Personal resources (conversations, API keys)
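A minimal sketch of how scope-specific settings could be declared, assuming a nested TOML layout; the section paths and keys below are hypothetical:

```toml
# Hypothetical scoped settings; section paths and keys are illustrative.
[orgs.acme]
budget.monthly_usd = 5000                 # org-wide spending cap

[orgs.acme.projects.support-bot]
budget.monthly_usd = 500                  # tighter cap for one project
rate_limit.requests_per_minute = 120
rate_limit.tokens_per_day = 2_000_000
```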
## Authentication & Authorization
Flexible authentication supporting multiple methods for API and UI access.
| Method | Use Case | Description |
|---|---|---|
| API Key | Programmatic access | `gw_live_...` key format, budget limits |
| OIDC/OAuth | SSO with identity providers | Keycloak, Auth0, Okta, Azure AD |
| JWT | Service-to-service auth | JWKS validation, custom claims |
| Per-Org SSO | Multi-tenant SaaS | Self-service SSO per organization |
| Proxy Auth | Zero-trust networks | Cloudflare Access, Tailscale |
CEL-based authorization:
Use Common Expression Language (CEL) policies for fine-grained access control:
```toml
[[auth.rbac.policies]]
name = "org-admin"
condition = "'admin' in subject.roles && context.org_id in subject.org_ids"
effect = "allow"
```

## Guardrails
Block, warn, log, or redact content using configurable guardrails on both input and output.
### Guardrail Providers
| Provider | Features | Best For |
|---|---|---|
| OpenAI Moderation | Hate, violence, sexual, self-harm categories | Free, fast, general-purpose |
| AWS Bedrock Guardrails | PII detection, topic filters, word filters, denied topics | Enterprise compliance |
| Azure Content Safety | Configurable severity thresholds, custom blocklists | Azure environments |
| Custom HTTP | Your own moderation service | Custom requirements |
| Regex patterns | PII patterns, blocklist terms | Simple rules |
### Execution Modes
| Mode | Behavior | Latency Impact |
|---|---|---|
| Blocking | Evaluate before sending to LLM | Adds round-trip to guardrail |
| Concurrent | Race guardrails against LLM, cancel if violations | Minimal for passing requests |
| Post-response | Filter LLM output before returning | Adds round-trip after LLM |
Actions on violation:
- `block` - Reject the request with an error
- `warn` - Log a warning, allow the request
- `log` - Silent logging only
- `redact` - Remove or mask the violating content
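Putting the pieces together, a guardrail definition might combine a provider, an execution mode, and a violation action roughly like this. This is a sketch; the `[[guardrails]]` table and key names are assumptions:

```toml
# Sketch only: table and key names are assumed, not documented schema.
[[guardrails]]
name = "pii-redactor"
provider = "regex"                      # one of the providers above
mode = "concurrent"                     # blocking | concurrent | post-response
action = "redact"                       # block | warn | log | redact
patterns = ['\b\d{3}-\d{2}-\d{4}\b']    # e.g. US SSN-shaped strings
```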
## Budget Enforcement
Prevent cost overruns with atomic budget reservation and real-time enforcement.
How it works:
1. Request arrives → Reserve estimated cost ($0.10)
2. Forward to LLM provider
3. Request completes with actual cost
4. Adjust: Replace estimate with actual cost

This atomic reservation pattern prevents overspend even with concurrent requests.
Budget scopes:
- Organization-level budgets
- Team-level budgets
- Project-level budgets
- User-level budgets
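Assuming budgets are declared alongside each scope, a configuration sketch could look like this (key names are illustrative, not confirmed):

```toml
# Illustrative budget caps; key names are assumptions.
[budget.org]
daily_usd = 200.0
monthly_usd = 4000.0
enforcement = "hard"      # reject requests once the cap is hit

[budget.user]
monthly_usd = 50.0        # per-user cap nested inside the org budget
```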
Forecasting:
Built-in time-series forecasting with augurs:
- Projected spend for current period
- Days until budget exhaustion
- 95% confidence intervals
## Vector Stores & RAG
OpenAI-compatible Vector Stores API for building RAG (Retrieval-Augmented Generation) applications.
Capabilities:
- Upload and process files (PDF, DOCX, TXT, Markdown, HTML, etc.)
- Automatic text extraction with OCR support via Kreuzberg
- Configurable chunking strategies (auto, fixed-size)
- Vector search with similarity scoring
- LLM-based re-ranking for improved relevance
- File search tool integration for Responses API
Vector backends:
| Backend | Use Case |
|---|---|
| pgvector | Simple setup, uses existing PostgreSQL |
| Qdrant | Dedicated vector DB, high performance |
| Pinecone | Managed service, serverless |
| Weaviate | Hybrid search, schema-based |
| ChromaDB | Lightweight, embedded |
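Selecting a backend is presumably a configuration choice; as a sketch, with section and key names assumed:

```toml
# Hypothetical backend selection; keys are illustrative.
[vector_store]
backend = "pgvector"            # or qdrant, pinecone, weaviate, chromadb
chunking = "auto"               # auto | fixed-size

[vector_store.qdrant]
url = "http://localhost:6333"   # only read when backend = "qdrant"
```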
## Chat UI
Built-in React UI for multi-model conversations and administration.
Chat features:
- Multi-model comparison in single conversation
- Model instances (compare same model with different settings)
- Streaming markdown with syntax highlighting
- File uploads (images, PDFs)
- Conversation history with IndexedDB persistence
- Per-model settings (temperature, max tokens)
Chat modes:
| Mode | Description |
|---|---|
| Synthesized | Gather all responses, synthesize final answer |
| Chained | Sequential relay (output of one becomes input to next) |
| Debated | Multi-round argumentation between models |
| Council | Collaborative discussion with voting/consensus |
| Hierarchical | Coordinator delegates subtasks to workers |
## Frontend Tools
Client-side tool execution in the browser via WebAssembly.
| Tool | Runtime | Capabilities |
|---|---|---|
| Python | Pyodide | numpy, pandas, matplotlib, scipy |
| JavaScript | QuickJS | Sandboxed JS execution |
| SQL | DuckDB | Query CSV/Parquet files |
| Charts | Vega-Lite | Interactive visualizations |
| HTML | iframe | Sandboxed preview |
Tool results are displayed inline as interactive artifacts and sent back to the LLM to continue the conversation.
## MCP Integration
Connect to external tool servers using the Model Context Protocol (MCP).
Capabilities:
- Connect to MCP servers via Streamable HTTP transport
- Automatic tool discovery from connected servers
- Tool execution with result streaming
- Persistent server connections across conversations
Use cases:
- File system access and manipulation
- Database queries
- External API integrations
- Custom enterprise tools
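Connecting a server is likely a one-time registration; a sketch under assumed key names:

```toml
# Hypothetical MCP server registration; key names are assumptions.
[[mcp.servers]]
name = "filesystem"
transport = "streamable-http"     # the transport named above
url = "http://localhost:8931/mcp"
```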
## Response Caching
Cache LLM responses to reduce costs and latency.
| Cache Type | Matching | Use Case |
|---|---|---|
| Exact match | SHA-256 hash of request | Identical requests |
| Semantic | Embedding similarity | Similar questions |
| Prompt (Anthropic) | Provider-side caching | Long system prompts |
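A sketch of how the cache might be tuned; the section and keys are illustrative assumptions:

```toml
# Illustrative cache settings; key names are assumed.
[cache]
mode = "semantic"               # exact | semantic
ttl_seconds = 3600
similarity_threshold = 0.95     # semantic mode: minimum match score
```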
## Observability
Comprehensive monitoring and debugging capabilities.
| Feature | Endpoint/Format | Description |
|---|---|---|
| Metrics | /metrics (Prometheus) | Request latency, token counts, costs, errors |
| Tracing | OTLP export | Distributed traces to Jaeger, Tempo, etc. |
| Logging | JSON or compact | Structured logs with configurable levels |
| Usage | Database + OTLP | Token usage, costs per user/project/org |
```toml
[observability]
logging.format = "json"
logging.level = "info"

[observability.tracing]
enabled = true
exporter = "otlp"
endpoint = "http://localhost:4317"

[observability.metrics]
enabled = true
```

## Data Privacy & GDPR
Built-in compliance features for data protection regulations.
Capabilities:
- Self-service data export (GDPR Article 15 - Right of Access)
- Self-service account deletion (GDPR Article 17 - Right to Erasure)
- Configurable data retention policies
- CSV export reports for compliance audits
- Audit logging for all privacy operations