Provider Features
Provider-specific features including thinking/reasoning, prompt caching, and vision support
Hadrian Gateway supports advanced features that vary by provider. This page documents provider-specific capabilities and how to use them through the unified OpenAI-compatible API.
Thinking & Reasoning
Enable models to "think" before responding, improving accuracy on complex tasks. The gateway translates the unified reasoning parameter to each provider's native format.
Feature Support
| Provider | Models | Parameter | Notes |
|---|---|---|---|
| OpenAI | o1, o3, o4-mini | reasoning_effort | Native support |
| Anthropic | Claude 3.5+, Claude 4 | thinking.budget_tokens | Extended thinking |
| Bedrock (Claude) | Claude 3.5+, Claude 4 | reasoning_config | Via additionalModelRequestFields |
| Bedrock (Nova) | Nova Pro, Nova Premier | reasoningConfig | maxReasoningEffort |
| Vertex (Gemini) | Gemini 2.5, 3+ | thinking_config | thinking_budget or thinking_level |
Using the Reasoning Parameter
Set the reasoning parameter in a Chat Completions or Responses API request:
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-API-Key: $API_KEY" \
-d '{
"model": "anthropic/claude-sonnet-4-20250514",
"messages": [
{"role": "user", "content": "Solve this step by step: If a train travels 120 miles in 2 hours, then stops for 30 minutes, then travels 90 miles in 1.5 hours, what is its average speed for the entire journey?"}
],
"reasoning": {
"effort": "high"
}
}'
Effort Levels
The effort parameter controls how much computational budget to allocate for reasoning:
| Effort | Description | Use Case |
|---|---|---|
| none | Disable reasoning | Fast responses, simple queries |
| minimal | Light reasoning | Basic logic, simple math |
| low | Moderate reasoning | Multi-step problems |
| medium | Substantial reasoning | Complex analysis |
| high | Maximum reasoning | Mathematical proofs, complex code |
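As a small illustration of the effort levels above, the sketch below builds a request payload and rejects unknown effort values before anything is sent. The helper name `build_reasoning_request` is hypothetical and not part of the gateway; only the `reasoning.effort` field and the level names come from this page.

```python
import json

# Effort levels as listed in the table above.
EFFORT_LEVELS = ("none", "minimal", "low", "medium", "high")


def build_reasoning_request(model: str, prompt: str, effort: str) -> dict:
    """Build a Chat Completions payload that carries the unified reasoning parameter."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort {effort!r}; expected one of {EFFORT_LEVELS}")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "reasoning": {"effort": effort},
    }


payload = build_reasoning_request(
    "anthropic/claude-sonnet-4-20250514", "What is 17 * 23?", "low"
)
print(json.dumps(payload, indent=2))
```

Validating on the client side is optional; it simply surfaces typos such as "hi" or "max" before a request is made.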
Provider Translation
The gateway automatically translates effort levels to provider-native formats:
Anthropic Claude (budget_tokens):
- none → disabled
- minimal → 2,048 tokens
- low → 8,000 tokens
- medium → 16,000 tokens
- high → 32,000 tokens
Bedrock Nova (maxReasoningEffort):
- Maps directly to "minimal", "low", "medium", "high"
Vertex Gemini 3+ (thinking_level):
- Maps to MINIMAL, LOW, MEDIUM, HIGH
Vertex Gemini 2.5 (thinking_budget):
- none → 0
- minimal → 1,024
- low → 4,096
- medium → 8,192
- high → -1 (dynamic)
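The mappings above can be pictured as a lookup from effort level to a provider-specific fragment. The following is a rough sketch, not the gateway's actual implementation: the target labels (`"anthropic"`, `"bedrock-nova"`, `"vertex-gemini-3"`, `"vertex-gemini-2.5"`) and the exact nesting of each returned fragment are assumptions made for illustration, and returning an empty fragment for `none` on Nova and Gemini 3+ is also an assumption, since this page does not say how that case is handled. The numeric values are the ones listed above.

```python
# Values copied from the mappings listed above.
EFFORT_LEVELS = ("none", "minimal", "low", "medium", "high")
ANTHROPIC_BUDGET_TOKENS = {"minimal": 2048, "low": 8000, "medium": 16000, "high": 32000}
GEMINI_25_THINKING_BUDGET = {"none": 0, "minimal": 1024, "low": 4096, "medium": 8192, "high": -1}


def translate_effort(target: str, effort: str) -> dict:
    """Return an illustrative provider-native fragment for a unified effort level."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort {effort!r}")
    if target == "anthropic":
        if effort == "none":
            return {"thinking": {"type": "disabled"}}
        return {"thinking": {"type": "enabled", "budget_tokens": ANTHROPIC_BUDGET_TOKENS[effort]}}
    if target == "vertex-gemini-2.5":
        return {"thinking_config": {"thinking_budget": GEMINI_25_THINKING_BUDGET[effort]}}
    if target in ("bedrock-nova", "vertex-gemini-3"):
        if effort == "none":
            # Assumption: omit the reasoning field entirely when effort is "none".
            return {}
        if target == "bedrock-nova":
            return {"reasoningConfig": {"maxReasoningEffort": effort}}
        return {"thinking_config": {"thinking_level": effort.upper()}}
    raise ValueError(f"unknown target {target!r}")


print(translate_effort("anthropic", "medium"))
print(translate_effort("vertex-gemini-2.5", "high"))
```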
Reasoning in Responses
When reasoning is enabled, the model's thinking process appears in the response:
{
"choices": [
{
"message": {
"role": "assistant",
"content": "The average speed is 52.5 mph.",
"reasoning": "Let me break this down step by step:\n1. First leg: 120 miles in 2 hours\n2. Stop: 30 minutes = 0.5 hours\n3. Second leg: 90 miles in 1.5 hours\n\nTotal distance = 120 + 90 = 210 miles\nTotal time = 2 + 0.5 + 1.5 = 4 hours\nAverage speed = 210 / 4 = 52.5 mph"
}
}
]
}
When thinking is enabled for Anthropic models, temperature is automatically set to 1.0 as required by the Anthropic API.
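On the client side, the answer and the reasoning trace can be read from the first choice of the response shape shown earlier. This is a minimal sketch; the helper name `split_reasoning` is hypothetical, and it assumes the `reasoning` field may be absent when reasoning is disabled.

```python
from typing import Optional, Tuple


def split_reasoning(response: dict) -> Tuple[str, Optional[str]]:
    """Return (content, reasoning) from the first choice of a chat completion."""
    message = response["choices"][0]["message"]
    # "reasoning" is treated as optional so the helper also works without thinking.
    return message.get("content") or "", message.get("reasoning")


example = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "The average speed is 52.5 mph.",
                "reasoning": "Total distance = 210 miles, total time = 4 hours.",
            }
        }
    ]
}

content, reasoning = split_reasoning(example)
print(content)
if reasoning is not None:
    print("Reasoning:", reasoning)
```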
Prompt Caching
Reduce costs and latency by caching frequently-used prompt content. Currently supported by Anthropic Claude models.
How It Works
- Mark content blocks with cache_control: {"type": "ephemeral"}
- The first request caches the content (cache creation tokens are charged)
- Subsequent requests read from the cache (cache read tokens, ~90% cheaper)
Enabling Prompt Caching
Add cache_control to system messages, user messages, or tools:
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-API-Key: $API_KEY" \
-d '{
"model": "anthropic/claude-sonnet-4-20250514",
"messages": [
{
"role": "system",
"content": [
{
"type": "text",
"text": "You are an expert assistant with access to the following documentation:\n\n[... large documentation block ...]",
"cache_control": {"type": "ephemeral"}
}
]
},
{"role": "user", "content": "How do I configure authentication?"}
]
}'
Caching Tools
Cache tool definitions for function calling:
{
"tools": [
{
"type": "function",
"function": {
"name": "search_docs",
"description": "Search the documentation",
"parameters": {
"type": "object",
"properties": {
"query": { "type": "string" }
}
}
},
"cache_control": { "type": "ephemeral" }
}
]
}
Cache Usage in Response
The response includes cache statistics in the usage object:
{
"usage": {
"prompt_tokens": 1500,
"completion_tokens": 250,
"total_tokens": 1750,
"prompt_tokens_details": {
"cached_tokens": 1200
}
}
}
Prompt caching requires the content to be identical across requests. Even small changes will result in a cache miss.
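Because the cached content must be identical between requests, it can help to build the cacheable block from one fixed string and vary only the user message. The sketch below shows that pattern; the helper names `cached_text_block` and `build_messages` are hypothetical, and only the `cache_control` shape comes from this page.

```python
import json


def cached_text_block(text: str) -> dict:
    """Wrap text in a content block marked for caching."""
    return {"type": "text", "text": text, "cache_control": {"type": "ephemeral"}}


def build_messages(stable_context: str, question: str) -> list:
    """Put the stable, cacheable context first and the changing question last."""
    return [
        {"role": "system", "content": [cached_text_block(stable_context)]},
        {"role": "user", "content": question},
    ]


DOCS = "You are an expert assistant with access to the following documentation: ..."

first = build_messages(DOCS, "How do I configure authentication?")
second = build_messages(DOCS, "How do I rotate API keys?")

# The system message serializes identically in both requests, so only the
# user message differs between them.
print(json.dumps(first[0], sort_keys=True) == json.dumps(second[0], sort_keys=True))
```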
Best Practices
- Cache stable content first - Put cacheable content (system prompts, documentation, tool definitions) at the beginning of your messages
- Use for repetitive workloads - Best ROI when the same context is used across many requests
- Monitor cache hit rates - Track cached_tokens vs prompt_tokens to measure effectiveness
- Minimum size - Anthropic requires cached content to be at least 1,024 tokens
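One simple way to track effectiveness is the share of prompt tokens served from cache, computed from the usage object. The function name `cache_hit_ratio` is a hypothetical helper; the field names are the ones shown in the usage example on this page.

```python
def cache_hit_ratio(usage: dict) -> float:
    """Return cached_tokens / prompt_tokens, or 0.0 when there are no prompt tokens."""
    prompt_tokens = usage.get("prompt_tokens", 0)
    details = usage.get("prompt_tokens_details") or {}
    cached_tokens = details.get("cached_tokens", 0)
    return cached_tokens / prompt_tokens if prompt_tokens else 0.0


usage = {
    "prompt_tokens": 1500,
    "completion_tokens": 250,
    "total_tokens": 1750,
    "prompt_tokens_details": {"cached_tokens": 1200},
}

ratio = cache_hit_ratio(usage)
print(f"cache hit ratio: {ratio:.0%}")
```

A ratio that stays near zero across repeated requests suggests the cached prefix is changing between calls or is below the minimum size.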
Vision & Image Support
Send images to vision-capable models through the unified API. The gateway handles format differences between providers.
Image Input Methods
| Method | Format | Provider Support |
|---|---|---|
| Base64 | data:image/png;base64,... | All providers |
| HTTPS URL | https://example.com/image.png | Anthropic (native), others (with fetching) |
| HTTP URL | http://example.com/image.png | Requires image fetching |
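For the Base64 method, the image bytes are embedded in a data URL and placed in an `image_url` content part. A minimal sketch follows, with hypothetical helper names (`to_data_url`, `image_part`); the eight bytes used in the example are only the PNG file signature, standing in for real image data.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_part(url: str) -> dict:
    """Build an image_url content part for a chat message."""
    return {"type": "image_url", "image_url": {"url": url}}


# Placeholder bytes: the PNG signature only, not a complete image.
png_signature = b"\x89PNG\r\n\x1a\n"

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        image_part(to_data_url(png_signature)),
    ],
}
print(message["content"][1]["image_url"]["url"][:22])
```

In practice the bytes would come from reading a file in binary mode, for example `open(path, "rb").read()`.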
Sending Images
Base64 (works everywhere):
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-API-Key: $API_KEY" \
-d '{
"model": "anthropic/claude-sonnet-4-20250514",
"messages": [{
"role": "user",
"content": [
{"type": "text", "text": "What is in this image?"},
{
"type": "image_url",
"image_url": {
"url": "..."
}
}
]
}]
}'
HTTPS URL (Anthropic native):
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-API-Key: $API_KEY" \
-d '{
"model": "anthropic/claude-sonnet-4-20250514",
"messages": [{
"role": "user",
"content": [
{"type": "text", "text": "Describe this diagram"},
{
"type": "image_url",
"image_url": {
"url": "https://example.com/diagram.png"
}
}
]
}]
}'
Image Fetching
For providers that don't support URL references natively, enable image fetching:
[features.image_fetching]
enabled = true
timeout_secs = 30
max_size_bytes = 20971520 # 20 MB
allowed_domains = ["example.com", "cdn.example.com"] # Optional allowlist
When enabled, the gateway:
- Detects image URLs in requests
- Fetches the image content
- Converts to base64 for providers that require it
- Caches fetched images to reduce latency
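The checks implied by this configuration can be sketched as below. This is an illustration under stated assumptions, not the gateway's code: it assumes the allowlist matches hostnames exactly, that an empty allowlist permits every domain, and that only http and https URLs are eligible. The function names are hypothetical.

```python
from urllib.parse import urlparse

MAX_SIZE_BYTES = 20 * 1024 * 1024  # 20971520, matching max_size_bytes above


def is_fetch_allowed(url: str, allowed_domains: list) -> bool:
    """Check the URL scheme and, if an allowlist is set, the hostname."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    # Assumption: an empty allowlist means every domain is permitted.
    return not allowed_domains or host in allowed_domains


def within_size_limit(content_length: int, max_size_bytes: int = MAX_SIZE_BYTES) -> bool:
    """Reject images larger than the configured limit."""
    return 0 <= content_length <= max_size_bytes


allowed = ["example.com", "cdn.example.com"]
print(is_fetch_allowed("https://cdn.example.com/diagram.png", allowed))
print(is_fetch_allowed("https://other.test/diagram.png", allowed))
print(within_size_limit(25 * 1024 * 1024))
```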
Provider-Specific Notes
Anthropic:
- Supports HTTPS URLs natively (passed through directly)
- HTTP URLs are rejected (HTTPS only)
- Supports caching images with cache_control
OpenAI:
- Supports both URLs and base64
- Gateway passes through without modification
Bedrock/Vertex:
- Require base64 format
- Enable image_fetching for URL support
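Read literally, the notes above amount to a small decision table. The sketch below encodes that reading; the provider labels, the strategy names (`"passthrough"`, `"fetch"`, `"reject"`), and the function itself are hypothetical. In particular, treating HTTP URLs on Anthropic as always rejected is an assumption taken from the note above, since this page does not state whether image fetching changes that.

```python
def image_url_strategy(provider: str, url: str, image_fetching_enabled: bool) -> str:
    """Return how an image URL would be handled for a provider, per the notes above."""
    if url.startswith("data:"):
        return "passthrough"  # Base64 data URLs work with all providers
    if provider == "openai":
        return "passthrough"  # URLs and base64 are passed through unmodified
    if provider == "anthropic":
        # Assumption: HTTP is rejected regardless of the image fetching setting.
        return "passthrough" if url.startswith("https://") else "reject"
    if provider in ("bedrock", "vertex"):
        # These providers need base64, so a URL must be fetched and converted.
        return "fetch" if image_fetching_enabled else "reject"
    raise ValueError(f"unknown provider {provider!r}")


print(image_url_strategy("anthropic", "https://example.com/diagram.png", False))
print(image_url_strategy("bedrock", "https://example.com/diagram.png", True))
```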
Supported Image Formats
| Format | MIME Type | Max Size (typical) |
|---|---|---|
| PNG | image/png | 20 MB |
| JPEG | image/jpeg | 20 MB |
| GIF | image/gif | 20 MB |
| WebP | image/webp | 20 MB |
Audio Support
Generate speech and transcribe audio through supported providers.
Text-to-Speech
curl http://localhost:8080/v1/audio/speech \
-H "Content-Type: application/json" \
-H "X-API-Key: $API_KEY" \
-d '{
"model": "openai/tts-1",
"input": "Hello, welcome to Hadrian Gateway!",
"voice": "alloy"
}' \
--output speech.mp3
Transcription
curl http://localhost:8080/v1/audio/transcriptions \
-H "X-API-Key: $API_KEY" \
-F "file=@audio.mp3" \
-F "model=openai/whisper-1"
Translation
curl http://localhost:8080/v1/audio/translations \
-H "X-API-Key: $API_KEY" \
-F "file=@french_audio.mp3" \
-F "model=openai/whisper-1"
Audio endpoints are currently supported through OpenAI and OpenAI-compatible providers.
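The text-to-speech call can also be made from Python's standard library. The sketch below only constructs the request, mirroring the curl example; the helper name `build_speech_request` and the `"test-key"` value are placeholders, and the request is not sent here.

```python
import json
import urllib.request


def build_speech_request(base_url: str, api_key: str, text: str,
                         voice: str = "alloy",
                         model: str = "openai/tts-1") -> urllib.request.Request:
    """Build a POST request for the /v1/audio/speech endpoint."""
    body = json.dumps({"model": model, "input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


req = build_speech_request(
    "http://localhost:8080", "test-key", "Hello, welcome to Hadrian Gateway!"
)
print(req.get_method(), req.full_url)

# To send it against a running gateway and save the audio:
# with urllib.request.urlopen(req) as resp, open("speech.mp3", "wb") as out:
#     out.write(resp.read())
```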
Streaming Buffer Configuration
For providers that require stream transformation (Anthropic, Bedrock, Vertex), configure buffer limits to protect against DoS attacks:
[providers.anthropic]
type = "anthropic"
api_key = "${ANTHROPIC_API_KEY}"
[providers.anthropic.streaming_buffer]
max_input_buffer_bytes = 4194304 # 4 MB (default)
max_output_buffer_chunks = 1000 # Max buffered chunks (default)
When Buffers Apply
| Provider | Stream Transformation | Buffer Config |
|---|---|---|
| OpenAI | Pass-through | Not applicable |
| Azure OpenAI | Pass-through | Not applicable |
| Anthropic | SSE → OpenAI format | Configurable |
| Bedrock | Binary → OpenAI format | Configurable |
| Vertex | JSON → OpenAI format | Configurable |
OpenAI and Azure OpenAI streams are passed through directly without transformation, so buffer configuration doesn't apply.
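To illustrate why such limits matter, the sketch below shows a buffer that refuses to grow past a byte limit on input and a chunk limit on output. It is a conceptual example only and makes no claim about how the gateway implements its buffers; the class and method names are hypothetical, and the defaults simply reuse the values from the configuration above.

```python
from collections import deque


class BufferLimitExceeded(Exception):
    """Raised when a configured buffer limit would be exceeded."""


class BoundedStreamBuffer:
    """Hold untransformed input bytes and transformed output chunks, both capped."""

    def __init__(self, max_input_buffer_bytes: int = 4 * 1024 * 1024,
                 max_output_buffer_chunks: int = 1000) -> None:
        self.max_input_buffer_bytes = max_input_buffer_bytes
        self.max_output_buffer_chunks = max_output_buffer_chunks
        self._input = bytearray()
        self._output = deque()

    def feed(self, data: bytes) -> None:
        """Append raw upstream bytes, failing instead of growing without bound."""
        if len(self._input) + len(data) > self.max_input_buffer_bytes:
            raise BufferLimitExceeded("input buffer limit exceeded")
        self._input.extend(data)

    def push_chunk(self, chunk: str) -> None:
        """Queue a transformed chunk for the client, up to the chunk limit."""
        if len(self._output) >= self.max_output_buffer_chunks:
            raise BufferLimitExceeded("output chunk limit exceeded")
        self._output.append(chunk)

    def pop_chunk(self) -> str:
        """Remove and return the oldest queued chunk."""
        return self._output.popleft()


buf = BoundedStreamBuffer(max_input_buffer_bytes=8, max_output_buffer_chunks=2)
buf.feed(b"12345678")
try:
    buf.feed(b"9")
except BufferLimitExceeded as exc:
    print("rejected:", exc)
```

Failing fast on an oversized or never-terminated upstream event keeps memory use per stream bounded, which is the protection the configuration is meant to provide.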
Feature Comparison Matrix
| Feature | OpenAI | Anthropic | Bedrock | Vertex | Azure |
|---|---|---|---|---|---|
| Streaming | Yes | Yes | Yes | Yes | Yes |
| Function Calling | Yes | Yes | Yes | Yes | Yes |
| Thinking/Reasoning | o1/o3/o4-mini | Claude 3.5+ | Claude, Nova | Gemini 2.5+ | o1/o3 |
| Prompt Caching | No | Yes | No | No | No |
| Vision | Yes | Yes | Yes | Yes | Yes |
| Image URLs | Yes | HTTPS only | Via fetching | Via fetching | Yes |
| Embeddings | Yes | No | Titan | Yes | Yes |
| TTS | Yes | No | No | No | Yes |
| Transcription | Yes | No | No | No | Yes |