Hadrian is experimental alpha software. Do not use in production.

Features

Explore Hadrian Gateway's comprehensive feature set

Hadrian Gateway ships everything needed for production AI deployments, from provider routing to compliance tooling. All features are free and included in the open-source release, which is dual-licensed under Apache 2.0 and MIT.

LLM Providers

Route requests to any major LLM provider through a unified OpenAI-compatible API.

| Provider | Streaming | Embeddings | Function Calling | Thinking/Reasoning |
|---|---|---|---|---|
| OpenAI | Yes | Yes | Yes | Yes (o1/o3) |
| Anthropic | Yes | No | Yes | Yes (extended thinking) |
| AWS Bedrock | Yes | Yes (Titan) | Yes | Yes (Claude) |
| Google Vertex | Yes | Yes | Yes | Yes (Gemini) |
| Azure OpenAI | Yes | Yes | Yes | Yes |
| OpenRouter | Yes | No | Yes | Varies |
| Any OpenAI-compatible | Yes | Varies | Varies | Varies |

Provider capabilities:

  • Circuit breaker - Automatically disable unhealthy providers after repeated failures
  • Automatic retry - Exponential backoff for transient errors (429, 5xx)
  • Provider fallbacks - Chain providers: try A, then B, then C
  • Model fallbacks - Graceful degradation: gpt-4o → gpt-4o-mini → claude-sonnet
  • Health checks - Background monitoring to detect issues before user requests fail
  • Model aliases - Create shortcuts like sonnet → claude-sonnet-4-20250514
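
The sketch below shows how these capabilities might be wired up in TOML; the table and key names (providers, routing, models) are illustrative assumptions, not Hadrian's documented schema.

# Hypothetical configuration; key names are assumptions
[providers.openai]
type = "openai"
api_key = "${OPENAI_API_KEY}"    # env-var interpolation assumed

[providers.anthropic]
type = "anthropic"
api_key = "${ANTHROPIC_API_KEY}"

# Provider fallback chain: try OpenAI first, then Anthropic
[routing]
provider_fallbacks = ["openai", "anthropic"]

# Model fallback chain and a short alias
[models]
fallbacks = { "gpt-4o" = ["gpt-4o-mini", "claude-sonnet"] }
aliases = { "sonnet" = "claude-sonnet-4-20250514" }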

Multi-Tenancy

Hadrian supports a flexible multi-tenancy hierarchy for organizations of any size.

Organizations
  └── Teams (optional)
        └── Projects
              └── Users
                    └── API Keys

Each level in the hierarchy can have:

| Capability | Description |
|---|---|
| Dynamic providers | Bring your own API keys at any scope |
| Model pricing | Override pricing for cost calculations |
| Budget limits | Daily/monthly spending caps with enforcement |
| Rate limits | Requests and tokens per minute/day |
| Guardrails | Scope-specific content policies |

Resource ownership:

Resources can be owned at different levels depending on your organization's needs:

  • Organization-level: Shared across all teams and projects
  • Team-level: Shared within a team
  • Project-level: Isolated to a specific project
  • User-level: Personal resources (conversations, API keys)
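
As a minimal sketch of scoped ownership, assuming hypothetical table names for the hierarchy:

# Hypothetical scope-level overrides; table and key names are assumptions
[orgs.acme]
budget.monthly_usd = 5000                  # org-wide spending cap

[orgs.acme.teams.platform.projects.chatbot]
rate_limit.requests_per_minute = 600       # project-level override
openai_api_key = "${PROJECT_OPENAI_KEY}"   # BYO provider key at project scope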

Authentication & Authorization

Flexible authentication supporting multiple methods for API and UI access.

| Method | Use Case | Description |
|---|---|---|
| API Key | Programmatic access | gw_live_... format, budget limits |
| OIDC/OAuth | SSO with identity providers | Keycloak, Auth0, Okta, Azure AD |
| JWT | Service-to-service auth | JWKS validation, custom claims |
| Per-Org SSO | Multi-tenant SaaS | Self-service SSO per organization |
| Proxy Auth | Zero-trust networks | Cloudflare Access, Tailscale |
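
Here is a minimal sketch of enabling two of these methods side by side; aside from the auth namespace visible in the policy example below, the keys are assumptions:

# Hypothetical auth configuration; key names are assumptions
[auth.oidc]
issuer = "https://keycloak.example.com/realms/main"
client_id = "hadrian"

[auth.api_keys]
enabled = true    # keys issued in the gw_live_... format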

CEL-based authorization:

Use Common Expression Language (CEL) policies for fine-grained access control:

[[auth.rbac.policies]]
name = "org-admin"
condition = "'admin' in subject.roles && context.org_id in subject.org_ids"
effect = "allow"

Guardrails

Block, warn, log, or redact content using configurable guardrails on both input and output.

Guardrail Providers

| Provider | Features | Best For |
|---|---|---|
| OpenAI Moderation | Hate, violence, sexual, self-harm categories | Free, fast, general-purpose |
| AWS Bedrock Guardrails | PII detection, topic filters, word filters, denied topics | Enterprise compliance |
| Azure Content Safety | Configurable severity thresholds, custom blocklists | Azure environments |
| Custom HTTP | Your own moderation service | Custom requirements |
| Regex patterns | PII patterns, blocklist terms | Simple rules |

Execution Modes

| Mode | Behavior | Latency Impact |
|---|---|---|
| Blocking | Evaluate before sending to LLM | Adds round-trip to guardrail |
| Concurrent | Race guardrails against the LLM call, cancel on violation | Minimal for passing requests |
| Post-response | Filter LLM output before returning | Adds round-trip after LLM |

Actions on violation:

  • block - Reject request with error
  • warn - Log warning, allow request
  • log - Silent logging only
  • redact - Remove/mask violating content
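
A hypothetical guardrail configuration combining a moderation provider with a regex redaction rule; provider identifiers and key names are assumptions:

# Hypothetical guardrail definitions; key names are assumptions
[guardrails.moderation]
provider = "openai-moderation"
mode = "concurrent"              # race against the LLM call
action = "block"

[guardrails.pii]
provider = "regex"
patterns = ['\b\d{3}-\d{2}-\d{4}\b']   # e.g. US Social Security numbers
action = "redact"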

Budget Enforcement

Prevent cost overruns with atomic budget reservation and real-time enforcement.

How it works:

1. Request arrives → Reserve estimated cost ($0.10)
2. Forward to LLM provider
3. Request completes with actual cost
4. Adjust: Replace estimate with actual cost

This atomic reservation pattern prevents overspend even with concurrent requests.

Budget scopes:

  • Organization-level budgets
  • Team-level budgets
  • Project-level budgets
  • User-level budgets
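
A minimal sketch of caps at two of these scopes, assuming hypothetical budgets tables:

# Hypothetical budget limits; key names are assumptions
[budgets.org]
monthly_usd = 10000
on_exhaustion = "block"    # reject requests once the cap is reached

[budgets.user]
daily_usd = 25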

Forecasting:

Built-in time-series forecasting with augurs:

  • Projected spend for current period
  • Days until budget exhaustion
  • 95% confidence intervals

Vector Stores & RAG

OpenAI-compatible Vector Stores API for building RAG (Retrieval-Augmented Generation) applications.

Capabilities:

  • Upload and process files (PDF, DOCX, TXT, Markdown, HTML, etc.)
  • Automatic text extraction with OCR support via Kreuzberg
  • Configurable chunking strategies (auto, fixed-size)
  • Vector search with similarity scoring
  • LLM-based re-ranking for improved relevance
  • File search tool integration for Responses API

Vector backends:

| Backend | Use Case |
|---|---|
| pgvector | Simple setup, uses existing PostgreSQL |
| Qdrant | Dedicated vector DB, high performance |
| Pinecone | Managed service, serverless |
| Weaviate | Hybrid search, schema-based |
| ChromaDB | Lightweight, embedded |
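
A hypothetical configuration selecting a backend and enabling re-ranking; key names are assumptions:

# Hypothetical vector store settings; key names are assumptions
[vector_store]
backend = "pgvector"       # or "qdrant", "pinecone", "weaviate", "chromadb"
chunking = "auto"          # auto | fixed-size

[vector_store.rerank]
enabled = true             # LLM-based re-ranking of search results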

Chat UI

Built-in React UI for multi-model conversations and administration.

Chat features:

  • Multi-model comparison in single conversation
  • Model instances (compare same model with different settings)
  • Streaming markdown with syntax highlighting
  • File uploads (images, PDFs)
  • Conversation history with IndexedDB persistence
  • Per-model settings (temperature, max tokens)

Chat modes:

| Mode | Description |
|---|---|
| Synthesized | Gather all responses, synthesize final answer |
| Chained | Sequential relay (output of one becomes input to next) |
| Debated | Multi-round argumentation between models |
| Council | Collaborative discussion with voting/consensus |
| Hierarchical | Coordinator delegates subtasks to workers |

Frontend Tools

Client-side tool execution in the browser via WebAssembly.

| Tool | Runtime | Capabilities |
|---|---|---|
| Python | Pyodide | numpy, pandas, matplotlib, scipy |
| JavaScript | QuickJS | Sandboxed JS execution |
| SQL | DuckDB | Query CSV/Parquet files |
| Charts | Vega-Lite | Interactive visualizations |
| HTML | iframe | Sandboxed preview |

Tool results are displayed inline as interactive artifacts and sent back to the LLM to continue the conversation.

MCP Integration

Connect to external tool servers using the Model Context Protocol (MCP).

Capabilities:

  • Connect to MCP servers via Streamable HTTP transport
  • Automatic tool discovery from connected servers
  • Tool execution with result streaming
  • Persistent server connections across conversations
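
A minimal sketch of registering an MCP server, assuming a hypothetical mcp.servers table:

# Hypothetical MCP server registration; key names are assumptions
[[mcp.servers]]
name = "internal-tools"
url = "https://tools.example.com/mcp"   # Streamable HTTP endpoint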

Use cases:

  • File system access and manipulation
  • Database queries
  • External API integrations
  • Custom enterprise tools

Response Caching

Cache LLM responses to reduce costs and latency.

| Cache Type | Matching | Use Case |
|---|---|---|
| Exact match | SHA-256 hash of request | Identical requests |
| Semantic | Embedding similarity | Similar questions |
| Prompt (Anthropic) | Provider-side caching | Long system prompts |
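
A hypothetical cache configuration illustrating the semantic mode; the key names and threshold value are assumptions:

# Hypothetical cache settings; key names are assumptions
[cache]
mode = "semantic"               # exact | semantic
similarity_threshold = 0.92     # minimum embedding similarity for a hit
ttl_seconds = 3600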

Observability

Comprehensive monitoring and debugging capabilities.

| Feature | Endpoint/Format | Description |
|---|---|---|
| Metrics | /metrics (Prometheus) | Request latency, token counts, costs, errors |
| Tracing | OTLP export | Distributed traces to Jaeger, Tempo, etc. |
| Logging | JSON or compact | Structured logs with configurable levels |
| Usage | Database + OTLP | Token usage, costs per user/project/org |

Example configuration:

[observability]
logging.format = "json"
logging.level = "info"

[observability.tracing]
enabled = true
exporter = "otlp"
endpoint = "http://localhost:4317"

[observability.metrics]
enabled = true

Data Privacy & GDPR

Built-in compliance features for data protection regulations.

Capabilities:

  • Self-service data export (GDPR Article 15 - Right of Access)
  • Self-service account deletion (GDPR Article 17 - Right to Erasure)
  • Configurable data retention policies
  • CSV export reports for compliance audits
  • Audit logging for all privacy operations
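
A minimal sketch of a retention policy, assuming a hypothetical privacy.retention table:

# Hypothetical retention policy; key names are assumptions
[privacy.retention]
conversations_days = 90
audit_logs_days = 365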
