Knowledge Bases
Build RAG applications with OpenAI-compatible Vector Stores API
Knowledge Bases provide retrieval-augmented generation (RAG) capabilities through an OpenAI-compatible Vector Stores API. Upload documents, and the gateway automatically extracts and chunks text, generates embeddings, and serves search via vector similarity, keyword matching, or hybrid approaches.
Knowledge Bases are called "Vector Stores" in the API to maintain OpenAI compatibility. The UI uses "Knowledge Bases" for clarity.
Overview
The Knowledge Bases feature provides:
- OpenAI-compatible API - Drop-in replacement for OpenAI's Vector Stores and Files APIs
- Automatic document processing - Extract text from PDF, DOCX, HTML, and more via Kreuzberg
- OCR support - Extract text from scanned documents and images
- Flexible chunking - Auto or fixed-size chunking strategies
- Multiple vector backends - pgvector (PostgreSQL) or Qdrant
- Hybrid search - Combine vector similarity with keyword matching
- LLM re-ranking - Improve relevance with a second-stage LLM scorer
- File search tool - Integrate with Responses API for automatic retrieval
Quick Start
1. Enable the Feature
[features.file_search]
enabled = true
[features.file_search.embedding]
provider = "openai"
model = "text-embedding-3-small"
dimensions = 1536
[features.file_search.vector_backend]
type = "pgvector"
2. Create a Knowledge Base
curl -X POST http://localhost:8080/v1/vector_stores \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "Product Documentation"
}'
Response:
{
"id": "vs_abc123",
"object": "vector_store",
"name": "Product Documentation",
"status": "completed",
"file_counts": {
"in_progress": 0,
"completed": 0,
"failed": 0,
"cancelled": 0,
"total": 0
},
"usage_bytes": 0,
"created_at": 1704672000
}
3. Upload and Add a File
# Upload file
FILE_ID=$(curl -X POST http://localhost:8080/v1/files \
-H "Authorization: Bearer $API_KEY" \
-F "file=@documentation.pdf" \
-F "purpose=assistants" | jq -r '.id')
# Add to knowledge base (triggers processing)
curl -X POST http://localhost:8080/v1/vector_stores/vs_abc123/files \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
  -d "{\"file_id\": \"$FILE_ID\"}"
4. Search
curl -X POST http://localhost:8080/v1/vector_stores/vs_abc123/search \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"query": "How do I configure authentication?",
"max_num_results": 5
}'
Configuration
Embedding Settings
Configure the embedding model used to vectorize documents:
[features.file_search.embedding]
# Embedding provider (must be configured in [providers])
provider = "openai"
# Embedding model
model = "text-embedding-3-small"
# Vector dimensions (must match model output)
dimensions = 1536
Embedding model and dimensions are immutable after creating a knowledge base. All files in a knowledge base must use the same embedding model.
Vector Backend
PostgreSQL with pgvector
Best for simple deployments using existing PostgreSQL:
[features.file_search.vector_backend]
type = "pgvector"
# Table name for embeddings (default: "semantic_cache_embeddings")
table_name = "semantic_cache_embeddings"
# Index type: "ivfflat" or "hnsw"
index_type = "hnsw"
# Distance metric: "cosine", "dot_product", or "euclidean"
distance_metric = "cosine"
Qdrant
Best for high-performance vector search at scale:
[features.file_search.vector_backend]
type = "qdrant"
# Qdrant server URL
url = "http://localhost:6333"
# Optional API key
api_key = "${QDRANT_API_KEY}"
# Collection name
collection_name = "hadrian_vectors"
# Distance metric
distance_metric = "cosine"
Distance Metrics
| Metric | Use Case |
|---|---|
cosine (default) | Text similarity, semantic search. Works with most embedding models. |
dot_product | When embedding magnitude matters. Requires normalized vectors. |
euclidean | Absolute distances. Common for image embeddings. |
Document Extraction
Configure text extraction from documents:
[features.file_search]
# Enable OCR for scanned documents and images
enable_ocr = false
# Force OCR even for text-based PDFs
force_ocr = false
# Tesseract language code for OCR
ocr_language = "eng"
# Extract images from PDFs for OCR
pdf_extract_images = false
# DPI for extracted PDF images
pdf_image_dpi = 300
Chunking Strategies
Auto Chunking (Default)
Intelligently chunks based on content structure (paragraphs, sections, semantic boundaries):
{
"chunking_strategy": {
"type": "auto"
}
}
Static Chunking
Fixed-size chunks with configurable overlap:
{
"chunking_strategy": {
"type": "static",
"static": {
"max_chunk_size_tokens": 800,
"chunk_overlap_tokens": 400
}
}
}
| Parameter | Default | Description |
|---|---|---|
max_chunk_size_tokens | 800 | Maximum tokens per chunk |
chunk_overlap_tokens | 400 | Overlap between consecutive chunks |
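The static strategy is a sliding window: each chunk starts where the previous one ended minus the overlap. An illustrative sketch that operates on a pre-tokenized list (the gateway counts model tokens; here any token list works):

```python
def static_chunks(tokens, max_chunk_size_tokens=800, chunk_overlap_tokens=400):
    """Split a token list into fixed-size chunks with overlap, so
    consecutive chunks share context across their boundary."""
    step = max_chunk_size_tokens - chunk_overlap_tokens
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_chunk_size_tokens])
        if start + max_chunk_size_tokens >= len(tokens):
            break  # last chunk already reaches the end of the document
    return chunks
```

With the defaults, every token (except at the edges) appears in two chunks, which doubles storage but reduces the chance a relevant passage is split mid-thought.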
Re-ranking
Enable LLM-based re-ranking for improved relevance:
[features.file_search.rerank]
enabled = true
# LLM model for re-ranking
model = "gpt-4o-mini"
provider = "openai"
# Top N results to re-rank (default: 20)
max_results_to_rerank = 20
# Batch size for parallel re-ranking
batch_size = 5
# Timeout in seconds
timeout_secs = 30
# Fall back to vector scores on error
fallback_on_error = true
Re-ranking flow:
- Initial search returns top N results (e.g., 20)
- Results sent to LLM in batches for relevance scoring
- Results re-sorted by LLM scores
- Top M results returned to user (e.g., 5)
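The flow above can be sketched as follows; `score_fn` is a stand-in for the LLM relevance call, and the names are illustrative rather than the gateway's internals:

```python
def rerank(results, score_fn, max_results_to_rerank=20, top_m=5,
           fallback_on_error=True):
    """Re-score the top-N hits with an LLM judge and return the best M.
    On scorer failure, optionally fall back to the original
    vector-similarity order instead of failing the whole search."""
    head = results[:max_results_to_rerank]
    try:
        scored = [(score_fn(r["text"]), r) for r in head]
    except Exception:
        if not fallback_on_error:
            raise
        return head[:top_m]  # keep the vector-score order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in scored[:top_m]]
```

This is why `fallback_on_error = true` is a sensible default: a flaky re-ranker degrades to plain vector search rather than breaking retrieval.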
Search Settings
[features.file_search]
# Max file_search calls per request (prevents loops)
max_iterations = 5
# Max results per search
max_results_per_search = 10
# Search timeout in seconds
timeout_secs = 30
# Minimum similarity threshold (0.0-1.0)
score_threshold = 0.7
# Max characters per result
max_search_result_chars = 4000
Supported File Types
Text Files
| Extension | Description |
|---|---|
.txt, .md, .markdown | Plain text and Markdown |
.json, .csv, .xml | Structured data |
.html, .htm | Web pages |
Code Files
| Extension | Languages |
|---|---|
.rs, .py, .js, .ts | Rust, Python, JavaScript, TypeScript |
.java, .c, .cpp, .go | Java, C, C++, Go |
.rb, .php, .sh | Ruby, PHP, Shell |
Rich Documents
| Extension | Description |
|---|---|
.pdf | PDF documents (via Kreuzberg) |
.docx, .doc | Microsoft Word |
.xlsx, .xls | Microsoft Excel |
.pptx, .ppt | Microsoft PowerPoint |
.rtf, .odt, .ods, .odp | Rich text and OpenDocument |
Images (OCR Required)
| Extension | Description |
|---|---|
.png, .jpg, .jpeg | Common image formats |
.gif, .bmp, .tiff, .webp | Additional formats |
Image OCR requires enable_ocr = true in configuration and Tesseract installed on the system.
Search Types
Vector Search
Semantic similarity using embeddings. Best for conceptual queries:
curl -X POST http://localhost:8080/v1/vector_stores/vs_abc123/search \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"query": "How do I handle authentication errors?",
"max_num_results": 5,
"ranking_options": {
"ranker": "auto"
}
}'
Keyword Search
Full-text search using BM25/TF-IDF. Best for exact term matching:
curl -X POST http://localhost:8080/v1/vector_stores/vs_abc123/search \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"query": "AuthenticationError 401",
"max_num_results": 5,
"ranking_options": {
"ranker": "keyword"
}
}'
Hybrid Search
Combines vector and keyword search using Reciprocal Rank Fusion (RRF):
curl -X POST http://localhost:8080/v1/vector_stores/vs_abc123/search \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"query": "AuthenticationError handling",
"max_num_results": 5,
"ranking_options": {
"ranker": "hybrid",
"rrf_k": 60,
"vector_weight": 1.0,
"keyword_weight": 0.5
}
}'
| Parameter | Default | Description |
|---|---|---|
rrf_k | 60 | RRF smoothing constant |
vector_weight | 1.0 | Weight for vector results |
keyword_weight | 1.0 | Weight for keyword results |
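RRF itself is simple to state: a document scores `weight / (rrf_k + rank)` in each ranked list it appears in, and the summed scores decide the fused order. A minimal sketch:

```python
def rrf_fuse(vector_ids, keyword_ids, rrf_k=60,
             vector_weight=1.0, keyword_weight=1.0):
    """Fuse two ranked ID lists with weighted Reciprocal Rank Fusion.
    Documents ranked highly in both lists accumulate the largest scores."""
    scores = {}
    for weight, ranking in ((vector_weight, vector_ids),
                            (keyword_weight, keyword_ids)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (rrf_k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A larger `rrf_k` flattens the contribution of rank differences, pulling the fused order toward a tie; setting one weight to zero reduces hybrid search to the other mode.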
Attribute Filtering
Filter search results by file attributes using OpenAI-compatible filters:
curl -X POST http://localhost:8080/v1/vector_stores/vs_abc123/search \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"query": "deployment guide",
"max_num_results": 5,
"filters": {
"type": "and",
"filters": [
{"type": "eq", "key": "category", "value": "documentation"},
{"type": "gte", "key": "version", "value": 2}
]
}
}'
Comparison Operators
| Operator | Description |
|---|---|
eq | Equal to |
ne | Not equal to |
gt | Greater than |
gte | Greater than or equal to |
lt | Less than |
lte | Less than or equal to |
Logical Operators
| Operator | Description |
|---|---|
and | All filters must match |
or | At least one filter must match |
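The filter grammar evaluates naturally by recursion: logical nodes combine their children, leaf nodes compare one attribute. A minimal sketch, with dictionary shapes mirroring the request payloads above:

```python
import operator

# Leaf comparison operators from the table above
_OPS = {"eq": operator.eq, "ne": operator.ne,
        "gt": operator.gt, "gte": operator.ge,
        "lt": operator.lt, "lte": operator.le}

def matches(filt, attributes):
    """Evaluate an OpenAI-style attribute filter against one file's
    attributes. Missing keys never match a comparison."""
    if filt["type"] in ("and", "or"):
        combine = all if filt["type"] == "and" else any
        return combine(matches(f, attributes) for f in filt["filters"])
    op = _OPS[filt["type"]]
    return filt["key"] in attributes and op(attributes[filt["key"]], filt["value"])
```

Because `and`/`or` nodes can contain other logical nodes, arbitrarily nested filters evaluate with the same two branches.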
Setting Attributes
Set attributes when adding a file to a knowledge base:
curl -X POST http://localhost:8080/v1/vector_stores/vs_abc123/files \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"file_id": "file-abc123",
"attributes": {
"category": "documentation",
"version": 2,
"author": "engineering"
}
}'
API Reference
Files API
| Endpoint | Method | Description |
|---|---|---|
/v1/files | POST | Upload a file |
/v1/files | GET | List files |
/v1/files/{file_id} | GET | Get file metadata |
/v1/files/{file_id} | DELETE | Delete a file |
/v1/files/{file_id}/content | GET | Download file content |
Vector Stores API
| Endpoint | Method | Description |
|---|---|---|
/v1/vector_stores | POST | Create a knowledge base |
/v1/vector_stores | GET | List knowledge bases |
/v1/vector_stores/{id} | GET | Get knowledge base details |
/v1/vector_stores/{id} | POST | Update knowledge base |
/v1/vector_stores/{id} | DELETE | Delete knowledge base |
/v1/vector_stores/{id}/files | POST | Add file to knowledge base |
/v1/vector_stores/{id}/files | GET | List files in knowledge base |
/v1/vector_stores/{id}/files/{file_id} | GET | Get file details |
/v1/vector_stores/{id}/files/{file_id} | DELETE | Remove file |
/v1/vector_stores/{id}/files/{file_id}/chunks | GET | List chunks for a file |
/v1/vector_stores/{id}/search | POST | Search knowledge base |
File Batches API
| Endpoint | Method | Description |
|---|---|---|
/v1/vector_stores/{id}/file_batches | POST | Create file batch |
/v1/vector_stores/{id}/file_batches/{batch_id} | GET | Get batch status |
/v1/vector_stores/{id}/file_batches/{batch_id} | DELETE | Cancel batch |
/v1/vector_stores/{id}/file_batches/{batch_id}/files | GET | List files in batch |
Document Processing
When a file is added to a knowledge base, the following pipeline executes:
1. File Upload
└─ Store raw file in database/storage
2. Add to Knowledge Base
└─ Trigger document processing
3. Text Extraction (Kreuzberg)
├─ PDF: Extract text, optionally OCR images
├─ Office: DOCX, XLSX, PPTX conversion
├─ HTML: Parse and extract content
└─ Images: OCR if enabled
4. Chunking
├─ Auto: Semantic boundaries, paragraphs
└─ Static: Fixed size with overlap
5. Embedding
└─ Generate vectors for each chunk
6. Storage
└─ Store chunks with processing_version
7. Cleanup
└─ Delete old chunks (shadow-copy pattern)
8. Status Update
   └─ Mark file as "completed" or "failed"
Shadow-Copy Pattern
The gateway uses a shadow-copy pattern for safe document reprocessing:
- New chunks are stored with an incremented processing_version
- Old chunks are deleted only after successful completion
- Failed processing leaves old chunks intact
This ensures documents remain searchable even if reprocessing fails.
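The pattern can be sketched with an in-memory chunk store; `ChunkStore` and its method names are illustrative stand-ins, not the gateway's actual storage layer:

```python
class ChunkStore:
    """Minimal in-memory stand-in for the chunk table."""
    def __init__(self):
        self.chunks = {}     # (file_id, version) -> list of chunks
        self.versions = {}   # file_id -> current processing_version
    def current_version(self, fid): return self.versions.get(fid, 0)
    def set_current_version(self, fid, v): self.versions[fid] = v
    def write_chunks(self, fid, v, chunks): self.chunks[(fid, v)] = list(chunks)
    def delete_chunks(self, fid, v): self.chunks.pop((fid, v), None)

def reprocess(store, file_id, new_chunks):
    """Shadow-copy reprocessing: write chunks under a new
    processing_version, switch over, and only then delete the old
    version. A failure anywhere leaves the old chunks searchable."""
    old_version = store.current_version(file_id)
    new_version = old_version + 1
    try:
        store.write_chunks(file_id, new_version, new_chunks)
    except Exception:
        store.delete_chunks(file_id, new_version)  # roll back partial write
        raise  # old chunks remain the current version
    store.set_current_version(file_id, new_version)
    store.delete_chunks(file_id, old_version)
```

The key ordering is that the version pointer moves only after the new chunks are fully written, so readers never observe a half-processed document.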
File Status
| Status | Description |
|---|---|
in_progress | File is being processed |
completed | Processing succeeded, file is searchable |
failed | Processing failed (see last_error) |
cancelled | Processing was cancelled |
Stale Detection
Files stuck in in_progress for longer than the timeout (default 30 minutes) are automatically reset and can be reprocessed.
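A sketch of the stale check; the status a stale file is reset to is an assumption here (`failed` is used for illustration, so it can be retried):

```python
import time

def reset_stale(files, timeout_secs=30 * 60, now=None):
    """Find files stuck in in_progress past the timeout and reset them
    so they become eligible for reprocessing. Returns the reset IDs."""
    now = time.time() if now is None else now
    reset = []
    for f in files:
        if f["status"] == "in_progress" and now - f["started_at"] > timeout_secs:
            f["status"] = "failed"  # assumption: stale files are marked failed, then retried
            reset.append(f["id"])
    return reset
```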
File Search Tool Integration
Knowledge bases integrate with the Responses API via the file_search tool:
curl -X POST http://localhost:8080/v1/responses \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"input": "What does the documentation say about rate limits?",
"tools": [
{
"type": "file_search",
"vector_store_ids": ["vs_abc123"]
}
]
}'
The gateway automatically:
- Intercepts file_search tool calls from the LLM
- Executes vector search against the specified knowledge bases
- Returns results to the LLM for answer synthesis
- Limits iterations to prevent infinite loops (max_iterations)
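The interception loop can be sketched as follows; `call_llm` and `search` are stand-ins for the provider call and the vector search, not the gateway's real interfaces:

```python
def run_file_search_loop(call_llm, search, max_iterations=5):
    """Drive the model until it produces an answer, executing
    file_search tool calls and capping the number of searches so a
    model that keeps searching cannot loop forever.
    `call_llm(context)` returns ("search", query) or ("answer", text)."""
    context = []
    for _ in range(max_iterations):
        kind, payload = call_llm(context)
        if kind == "answer":
            return payload
        context.append(search(payload))  # feed results back for synthesis
    return None  # iteration budget exhausted
```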
Multi-Tenancy
Knowledge bases support the full multi-tenancy hierarchy:
| Owner Type | Description |
|---|---|
| Organization | Shared across all teams and projects |
| Team | Shared within a team |
| Project | Isolated to a specific project |
| User | Personal knowledge base |
Access control is enforced on all API operations. Users can only access knowledge bases they own or have permissions for.
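In essence, the ownership check matches the store's owner scope against the requester's context at the same level of the hierarchy. A deliberately simplified sketch (real permission grants are richer than this):

```python
def can_access(store_owner, requester):
    """Illustrative access check: a knowledge base owned at a given
    scope (organization, team, project, or user) is visible when the
    requester belongs to that same scope."""
    owner_type, owner_id = store_owner
    return requester.get(owner_type) == owner_id
```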
Error Responses
Processing Errors
{
"id": "vsf_abc123",
"status": "failed",
"last_error": {
"code": "extraction_failed",
"message": "Failed to extract text from PDF: encrypted document"
}
}Search Errors
{
"error": {
"type": "invalid_request_error",
"message": "Vector store not found",
"code": "resource_not_found"
}
}Common Error Codes
| Code | Description |
|---|---|
extraction_failed | Text extraction failed |
embedding_failed | Embedding generation failed |
chunking_failed | Document chunking failed |
timeout | Processing or search timed out |
resource_not_found | Knowledge base or file not found |
permission_denied | Insufficient permissions |
Complete Configuration Example
[features.file_search]
enabled = true
max_iterations = 5
max_results_per_search = 10
timeout_secs = 30
score_threshold = 0.7
max_search_result_chars = 4000
# Document extraction
enable_ocr = true
force_ocr = false
ocr_language = "eng"
pdf_extract_images = true
pdf_image_dpi = 300
# Embedding configuration
[features.file_search.embedding]
provider = "openai"
model = "text-embedding-3-small"
dimensions = 1536
# Re-ranking configuration
[features.file_search.rerank]
enabled = true
model = "gpt-4o-mini"
provider = "openai"
max_results_to_rerank = 20
batch_size = 5
timeout_secs = 30
fallback_on_error = true
# Vector backend (PostgreSQL)
[features.file_search.vector_backend]
type = "pgvector"
table_name = "hadrian_vectors"
index_type = "hnsw"
distance_metric = "cosine"
Best Practices
- Choose the right chunking strategy - Use auto for documents with clear structure, static for uniform content like logs or code.
- Set appropriate chunk sizes - Smaller chunks (400-800 tokens) give more precise retrieval; larger chunks (1000-1600 tokens) give more context per result.
- Enable re-ranking for quality - LLM re-ranking significantly improves relevance at the cost of latency.
- Use hybrid search - Combining vector and keyword search often outperforms either alone.
- Set score thresholds - Filter out low-confidence results to improve answer quality.
- Use attributes for filtering - Tag files with metadata to enable filtered searches.
- Monitor processing status - Check for failed files and investigate extraction issues.
- Use the HNSW index for pgvector - Faster queries than IVFFlat at the cost of longer index build time.