ADR-007: Pluggable LLM Provider

Why Owlat uses the Vercel AI SDK with a provider abstraction layer instead of hardcoding a single LLM vendor.

  • Status: Accepted
  • Date: 2026-03-24

Context

The Agent Pipeline, Knowledge Graph, and semantic file system all require LLM capabilities — text generation, structured output, embeddings, and tool calling. Owlat needs to support multiple deployment scenarios:

  1. Cloud users who want the best available models (GPT-4o, Claude, etc.) via API keys
  2. Self-hosters who need fully offline operation with local models (via Ollama, vLLM, or similar)
  3. Enterprise users who route through internal API gateways with custom endpoints

The options considered:

  1. Hardcode OpenAI — simplest, but locks out self-hosters who cannot or will not use cloud APIs
  2. LangChain/LlamaIndex — heavy frameworks with large dependency trees, complex abstractions, and features Owlat does not need (chains, memory management, vector store adapters)
  3. Vercel AI SDK with provider abstraction — lightweight, already in the dependency tree (@ai-sdk/openai), provider-agnostic, supports structured output and tool calling natively

Decision

Use the Vercel AI SDK as the LLM orchestration layer, wrapped in a thin provider abstraction configured via environment variables.

LLM_PROVIDER=openai          # or: anthropic, ollama, custom
LLM_BASE_URL=                 # for ollama: http://localhost:11434/v1
LLM_API_KEY=                  # not needed for ollama
LLM_MODEL=gpt-4o             # or: claude-sonnet-4-20250514, llama3, etc.
LLM_EMBEDDING_MODEL=          # optional, defaults to provider's default

The AI SDK's createOpenAI() factory accepts a baseURL parameter, which means any OpenAI-compatible API (Ollama, vLLM, LiteLLM, Azure OpenAI) works without additional provider code. For Anthropic, the AI SDK has a dedicated @ai-sdk/anthropic provider.
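As a sketch of how the OpenAI-compatible path might be wired (the helper name and option shapes below are illustrative, not from the Owlat codebase), the settings handed to createOpenAI() could be derived per deployment scenario like this:

```typescript
// Illustrative helper (assumed, not actual Owlat code): derive the settings
// object that would be spread into createOpenAI({ ... }) from @ai-sdk/openai.
type OpenAICompatibleSettings = { baseURL?: string; apiKey: string };

function openAICompatibleSettings(
  provider: "openai" | "ollama" | "custom",
  opts: { baseURL?: string; apiKey?: string } = {}
): OpenAICompatibleSettings {
  switch (provider) {
    case "ollama":
      // Ollama serves an OpenAI-compatible API; it ignores the key, but the
      // SDK expects a non-empty string, so a placeholder is supplied.
      return {
        baseURL: opts.baseURL ?? "http://localhost:11434/v1",
        apiKey: opts.apiKey ?? "ollama",
      };
    case "custom":
      // Enterprise gateways, vLLM, LiteLLM, Azure OpenAI: base URL is mandatory.
      if (!opts.baseURL) {
        throw new Error("LLM_BASE_URL is required for a custom provider");
      }
      return { baseURL: opts.baseURL, apiKey: opts.apiKey ?? "" };
    default:
      // Plain OpenAI: default base URL, key required.
      if (!opts.apiKey) {
        throw new Error("LLM_API_KEY is required for openai");
      }
      return { apiKey: opts.apiKey };
  }
}

// Usage (sketch): const llm = createOpenAI(openAICompatibleSettings("ollama"));
```

The point of the sketch is that only the settings object changes per scenario; no new provider code is needed for any OpenAI-compatible backend.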

All LLM calls go through a single getLLMProvider() function in apps/api/convex/lib/llmProvider.ts that reads these environment variables and returns a configured provider instance.
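A minimal sketch of the configuration-parsing half of that function, assuming it reads the five documented variables (field names and defaults beyond those variables are assumptions, not the contents of the real llmProvider.ts):

```typescript
// Sketch only: parse the LLM_* environment variables into a typed config.
// The real apps/api/convex/lib/llmProvider.ts would then pass this to the
// matching AI SDK factory (createOpenAI for OpenAI-compatible endpoints,
// the @ai-sdk/anthropic provider for Anthropic).
type LLMProviderName = "openai" | "anthropic" | "ollama" | "custom";

interface LLMConfig {
  provider: LLMProviderName;
  model: string;
  baseURL?: string;
  apiKey?: string;
  embeddingModel?: string; // falls back to the provider's default when unset
}

// Call as readLLMConfig(process.env); taking env as a parameter keeps it testable.
function readLLMConfig(env: Record<string, string | undefined>): LLMConfig {
  const provider = (env.LLM_PROVIDER ?? "openai") as LLMProviderName;
  const config: LLMConfig = {
    provider,
    model: env.LLM_MODEL ?? "gpt-4o",
    baseURL: env.LLM_BASE_URL || undefined,
    apiKey: env.LLM_API_KEY || undefined,
    embeddingModel: env.LLM_EMBEDDING_MODEL || undefined,
  };
  // Per the config comments above: Ollama needs no key, cloud providers do.
  if (provider !== "ollama" && !config.apiKey) {
    throw new Error(`LLM_API_KEY is required for provider "${provider}"`);
  }
  return config;
}
```

Failing fast on a missing API key at startup, rather than on the first LLM call, keeps misconfiguration errors close to their cause.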

Consequences

Enables:

  • Self-hosters run Ollama locally for fully offline, zero-cost AI features
  • Cloud users choose their preferred provider (OpenAI, Anthropic, or any compatible API)
  • Enterprise users point at internal gateways or proxy endpoints via LLM_BASE_URL
  • Single configuration surface — five environment variables control all LLM behavior
  • AI SDK is already a dependency (@ai-sdk/openai in both apps/api and apps/web)

Trade-offs:

  • Quality varies significantly between providers — local models may produce lower-quality classifications and drafts than GPT-4o or Claude
  • Embedding dimensions differ across models — vector indexes need to be configured for the chosen embedding model's dimensions
  • No built-in RAG chains — retrieval-augmented generation is implemented as explicit Convex function steps (query vector index, pass results to prompt), which is more verbose but more debuggable
  • AI SDK updates may introduce breaking changes, though the thin abstraction layer isolates application code from them
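The explicit-steps approach in the RAG trade-off can be illustrated with a hedged sketch. The vector query itself is a Convex-specific action step and is elided here; the second step, assembling retrieved chunks into a grounded prompt, is plain string work (all names and fields below are illustrative, not actual Owlat code):

```typescript
// Sketch of the "explicit Convex function steps" RAG flow (names assumed).
// Step 1 (elided): a Convex action queries the vector index for the top-k
// chunks most relevant to the question.
// Step 2 (shown): assemble those chunks into a grounded prompt for the LLM.
interface RetrievedChunk {
  text: string;
  score: number; // similarity score returned by the vector index
}

function buildRAGPrompt(question: string, chunks: RetrievedChunk[]): string {
  const context = chunks
    .map((c, i) => `[${i + 1}] (score ${c.score.toFixed(2)}) ${c.text}`)
    .join("\n");
  return [
    "Answer using only the context below. Cite sources as [n].",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Because each step is an ordinary function, the retrieved chunks and the final prompt can be logged and inspected independently, which is the debuggability benefit the trade-off describes.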