ADR-007: Pluggable LLM Provider

Why Owlat uses the Vercel AI SDK with a provider abstraction layer instead of hardcoding a single LLM vendor.

  • Status: Accepted
  • Date: 2026-03-24

Context

The Agent Pipeline, Knowledge Graph, and semantic file system all require LLM capabilities — text generation, structured output, embeddings, and tool calling. Owlat needs to support multiple deployment scenarios:

  1. Cloud users who want the best available models (GPT-4o, Claude, etc.) via API keys
  2. Self-hosters who need fully offline operation with local models (via Ollama, vLLM, or similar)
  3. Enterprise users who route through internal API gateways with custom endpoints

The options considered:

  1. Hardcode OpenAI — simplest, but locks out self-hosters who cannot or will not use cloud APIs
  2. LangChain/LlamaIndex — heavy frameworks with large dependency trees, complex abstractions, and features Owlat does not need (chains, memory management, vector store adapters)
  3. Vercel AI SDK with provider abstraction — lightweight, already in the dependency tree (@ai-sdk/openai), provider-agnostic, supports structured output and tool calling natively

Decision

Use the Vercel AI SDK as the LLM orchestration layer, wrapped in a thin provider abstraction configured via environment variables.

LLM_PROVIDER=openai          # or: anthropic, ollama, custom
LLM_BASE_URL=                 # for ollama: http://localhost:11434/v1
LLM_API_KEY=                  # not needed for ollama
LLM_MODEL=gpt-4o             # or: claude-sonnet-4-20250514, llama3, etc.
LLM_EMBEDDING_MODEL=          # optional, defaults to provider's default

The AI SDK's createOpenAI() factory accepts a baseURL parameter, which means any OpenAI-compatible API (Ollama, vLLM, LiteLLM, Azure OpenAI) works without additional provider code. For Anthropic, the AI SDK has a dedicated @ai-sdk/anthropic provider.
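As a sketch of how the OpenAI-compatible path might be wired (the helper name and option shapes below are illustrative, not from the Owlat codebase), the settings handed to createOpenAI() could be derived per deployment scenario like this:

```typescript
// Illustrative helper (assumed, not actual Owlat code): derive the settings
// object that would be spread into createOpenAI({ ... }) from @ai-sdk/openai.
type OpenAICompatibleSettings = { baseURL?: string; apiKey: string };

function openAICompatibleSettings(
  provider: "openai" | "ollama" | "custom",
  opts: { baseURL?: string; apiKey?: string } = {}
): OpenAICompatibleSettings {
  switch (provider) {
    case "ollama":
      // Ollama serves an OpenAI-compatible API; it ignores the key, but the
      // SDK expects a non-empty string, so a placeholder is supplied.
      return {
        baseURL: opts.baseURL ?? "http://localhost:11434/v1",
        apiKey: opts.apiKey ?? "ollama",
      };
    case "custom":
      // Enterprise gateways, vLLM, LiteLLM, Azure OpenAI: base URL is mandatory.
      if (!opts.baseURL) {
        throw new Error("LLM_BASE_URL is required for a custom provider");
      }
      return { baseURL: opts.baseURL, apiKey: opts.apiKey ?? "" };
    default:
      // Plain OpenAI: default base URL, key required.
      if (!opts.apiKey) {
        throw new Error("LLM_API_KEY is required for openai");
      }
      return { apiKey: opts.apiKey };
  }
}

// Usage (sketch): const llm = createOpenAI(openAICompatibleSettings("ollama"));
```

The point of the sketch is that only the settings object changes per scenario; no new provider code is needed for any OpenAI-compatible backend.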

All LLM calls go through a single getLLMProvider() function in apps/api/convex/lib/llmProvider.ts that reads these environment variables and returns a configured provider instance.
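A minimal sketch of the configuration-parsing half of that function, assuming it reads the five documented variables (field names and defaults beyond those variables are assumptions, not the contents of the real llmProvider.ts):

```typescript
// Sketch only: parse the LLM_* environment variables into a typed config.
// The real apps/api/convex/lib/llmProvider.ts would then pass this to the
// matching AI SDK factory (createOpenAI for OpenAI-compatible endpoints,
// the @ai-sdk/anthropic provider for Anthropic).
type LLMProviderName = "openai" | "anthropic" | "ollama" | "custom";

interface LLMConfig {
  provider: LLMProviderName;
  model: string;
  baseURL?: string;
  apiKey?: string;
  embeddingModel?: string; // falls back to the provider's default when unset
}

// Call as readLLMConfig(process.env); taking env as a parameter keeps it testable.
function readLLMConfig(env: Record<string, string | undefined>): LLMConfig {
  const provider = (env.LLM_PROVIDER ?? "openai") as LLMProviderName;
  const config: LLMConfig = {
    provider,
    model: env.LLM_MODEL ?? "gpt-4o",
    baseURL: env.LLM_BASE_URL || undefined,
    apiKey: env.LLM_API_KEY || undefined,
    embeddingModel: env.LLM_EMBEDDING_MODEL || undefined,
  };
  // Per the config comments above: Ollama needs no key, cloud providers do.
  if (provider !== "ollama" && !config.apiKey) {
    throw new Error(`LLM_API_KEY is required for provider "${provider}"`);
  }
  return config;
}
```

Failing fast on a missing API key at startup, rather than on the first LLM call, keeps misconfiguration errors close to their cause.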

Consequences

Enables:

  • Self-hosters run Ollama locally for fully offline, zero-cost AI features
  • Cloud users choose their preferred provider (OpenAI, Anthropic, or any compatible API)
  • Enterprise users point at internal gateways or proxy endpoints via LLM_BASE_URL
  • Single configuration surface — five environment variables control all LLM behavior
  • AI SDK is already a dependency (@ai-sdk/openai in both apps/api and apps/web)

Trade-offs:

  • Quality varies significantly between providers — local models may produce lower-quality classifications and drafts than GPT-4o or Claude
  • Embedding dimensions differ across models — vector indexes need to be configured for the chosen embedding model's dimensions
  • No built-in RAG chains — retrieval-augmented generation is implemented as explicit Convex function steps (query vector index, pass results to prompt), which is more verbose but more debuggable
  • AI SDK updates may introduce breaking changes, though the thin abstraction layer isolates application code from them
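The explicit-steps approach in the RAG trade-off can be illustrated with a hedged sketch. The vector query itself is a Convex-specific action step and is elided here; the second step, assembling retrieved chunks into a grounded prompt, is plain string work (all names and fields below are illustrative, not actual Owlat code):

```typescript
// Sketch of the "explicit Convex function steps" RAG flow (names assumed).
// Step 1 (elided): a Convex action queries the vector index for the top-k
// chunks most relevant to the question.
// Step 2 (shown): assemble those chunks into a grounded prompt for the LLM.
interface RetrievedChunk {
  text: string;
  score: number; // similarity score returned by the vector index
}

function buildRAGPrompt(question: string, chunks: RetrievedChunk[]): string {
  const context = chunks
    .map((c, i) => `[${i + 1}] (score ${c.score.toFixed(2)}) ${c.text}`)
    .join("\n");
  return [
    "Answer using only the context below. Cite sources as [n].",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Because each step is an ordinary function, the retrieved chunks and the final prompt can be logged and inspected independently, which is the debuggability benefit the trade-off describes.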