Knowledge Graph
Technical architecture for Owlat's typed knowledge storage — how organizational knowledge is extracted, stored, searched, and maintained.
Knowledge Graph Architecture
Every organization accumulates knowledge through communication: customer preferences, internal decisions, project context, relationship history. The Knowledge Graph captures this knowledge as typed, searchable, decaying entries — not a write-only log, but a living system that stays accurate as the organization evolves.
Each organization's knowledge graph is completely isolated. Agent context windows never mix data from different organizations. Every query, every vector search, every knowledge retrieval is scoped by organizationId. This extends the same multi-tenancy model Owlat uses today.
Storage model
The Knowledge Graph is built on Convex tables — not a separate graph database. Convex's native vector indexes enable semantic search, and indexed joins handle relationship traversal. This keeps the self-hosted stack simple: no Neo4j, no Pinecone, no additional services.
Knowledge entries
Every piece of organizational knowledge is a typed entry:
| Type | Description | Example |
|---|---|---|
| Fact | Verifiable information about an entity | "Acme Corp uses our Enterprise plan" |
| Decision | A choice that was made with reasoning | "Decided to extend Acme's trial by 2 weeks (approved by Sarah)" |
| Event | Something that happened at a point in time | "Met Acme's CTO at SaaStr conference on March 5" |
| Preference | How someone likes things done | "Acme prefers email over phone for support" |
| Goal | An objective someone is working toward | "Acme wants to launch their email program by September" |
| Relationship | A connection between entities | "Alice at Acme reports to Bob" |
Each entry has:
- Content — the knowledge itself (title + detailed content)
- Source attribution — where this knowledge came from (email, chat, manual entry, file, agent-extracted)
- Entity links — connections to contacts, conversation threads, and other entries
- Embedding — vector representation for semantic search
- Confidence score — how reliable this knowledge is (0–1)
- Expiration — optional TTL for time-sensitive facts
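Taken together, an entry might look like the following sketch. The interface is illustrative (field names mirror the schema defined later in this document), and the sample values are invented:

```typescript
// Illustrative entry shape; mirrors the knowledgeEntries schema below.
interface KnowledgeEntry {
  organizationId: string
  entryType: 'fact' | 'decision' | 'event' | 'preference' | 'goal' | 'relationship'
  title: string
  content: string
  sourceType: 'email' | 'chat' | 'manual' | 'file' | 'agent_extracted'
  contactIds?: string[]
  embedding: number[]      // vector representation for semantic search
  confidence: number       // 0–1 reliability score
  lastValidatedAt: number  // epoch ms
  expiresAt?: number       // optional TTL for time-sensitive facts
}

// Invented sample values for illustration only
const entry: KnowledgeEntry = {
  organizationId: 'org_123',
  entryType: 'preference',
  title: 'Acme support channel preference',
  content: 'Acme prefers email over phone for support.',
  sourceType: 'agent_extracted',
  contactIds: ['contact_456'],
  embedding: [],           // filled in by the embedding step
  confidence: 0.9,
  lastValidatedAt: Date.now(),
}
```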
Knowledge relations
Entries connect to each other through typed edges:
| Relation | Meaning |
|---|---|
| supports | One entry provides evidence for another |
| contradicts | One entry conflicts with another (triggers resolution) |
| supersedes | One entry replaces another (newer information) |
| relates_to | General association |
| causes | Causal relationship |
| blocks | One entry prevents another |
Relations enable traversal: when the agent retrieves knowledge about a customer, it follows relations to find supporting context, flag contradictions, and surface the most recent information.
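Traversal over these edges can be sketched with plain in-memory data. In production the lookups would be indexed Convex queries, but the shape of the logic is the same; the `expandContext` helper below is a hypothetical name:

```typescript
// Sketch of one-hop relation traversal from a retrieved entry:
// collect supporting evidence, flag contradictions, and note whether
// a newer entry supersedes this one.
type RelationType =
  | 'supports' | 'contradicts' | 'supersedes'
  | 'relates_to' | 'causes' | 'blocks'

interface Relation {
  fromEntryId: string
  toEntryId: string
  relationType: RelationType
}

function expandContext(entryId: string, relations: Relation[]) {
  const supporting: string[] = []
  const contradicting: string[] = []
  let supersededBy: string | undefined
  for (const r of relations) {
    if (r.toEntryId !== entryId) continue
    if (r.relationType === 'supports') supporting.push(r.fromEntryId)
    if (r.relationType === 'contradicts') contradicting.push(r.fromEntryId)
    if (r.relationType === 'supersedes') supersededBy = r.fromEntryId
  }
  return { supporting, contradicting, supersededBy }
}
```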
Extraction pipeline
Knowledge extraction runs automatically after each inbound message is processed by the Agent Pipeline:
Inbound message processed
→ Knowledge extractor (Convex internalAction)
1. Entity extraction: people, organizations, dates, amounts
2. Fact extraction: structured output via AI SDK generateObject()
3. Deduplication: vector search for similar existing entries
4. Contradiction check: find entries that conflict with new knowledge
5. Store entries with embeddings and entity links
6. Create relations (supports, contradicts, supersedes)
Entity extraction
Uses AI SDK structured output to extract entities:
const { object: entities } = await generateObject({
  model: getLLMProvider(),
  schema: z.object({
    people: z.array(z.object({
      name: z.string(),
      role: z.string().optional(),
      email: z.string().optional(),
    })),
    organizations: z.array(z.string()),
    dates: z.array(z.object({
      date: z.string(),
      context: z.string(),
    })),
    amounts: z.array(z.object({
      value: z.number(),
      currency: z.string(),
      context: z.string(),
    })),
  }),
  prompt: `Extract entities from this message...\n\n${messageContent}`,
})
Deduplication
Before storing a new entry, the pipeline runs a vector search against existing entries for the same organization. If a semantically similar entry exists (cosine similarity > 0.92), the pipeline takes one of three paths:
- Merges — combines content, updates confidence, and keeps the more recent timestamp
- Links — creates a `supports` relation if the entries are complementary
- Supersedes — creates a `supersedes` relation if the new entry is a clear update
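A minimal sketch of that decision, assuming the 0.92 threshold from the text; the `newerAndSameType` and `complementary` signals are assumptions standing in for whatever heuristics the real pipeline uses:

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    na += a[i] * a[i]
    nb += b[i] * b[i]
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb))
}

type DedupAction = 'merge' | 'link' | 'supersede' | 'insert'

// Hypothetical decision rule: the inputs beyond the similarity score are
// placeholder signals, not part of the documented design.
function dedupDecision(
  similarity: number,
  newerAndSameType: boolean,
  complementary: boolean,
): DedupAction {
  if (similarity <= 0.92) return 'insert'   // no near-duplicate found
  if (newerAndSameType) return 'supersede'  // clear update to prior knowledge
  if (complementary) return 'link'          // related, but adds information
  return 'merge'                            // effectively the same entry
}
```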
Retrieval
The Knowledge Graph serves two retrieval patterns:
Semantic search (vector)
Used by the Agent Pipeline's context retrieval step:
// Find knowledge relevant to an inbound message
const results = await ctx.vectorSearch('knowledgeEntries', 'vector_knowledge', {
  vector: await generateEmbedding(messageContent),
  limit: 20,
  filter: (q) => q.eq('organizationId', orgId),
})
Returns the most semantically relevant entries regardless of keyword matches. The agent uses these to build its context briefing.
Full-text search
Used by the UI for manual knowledge browsing:
const results = await ctx.db
  .query('knowledgeEntries')
  .withSearchIndex('search_knowledge', (q) =>
    q.search('searchableText', searchQuery).eq('organizationId', orgId)
  )
  .take(25)
Contact-scoped retrieval
When preparing context for a specific contact interaction:
// All knowledge linked to this contact. Convex indexes compare whole
// values, so an index on the contactIds array can't answer membership
// queries directly; scope by organization, then filter in code.
const orgEntries = await ctx.db
  .query('knowledgeEntries')
  .withIndex('by_organization', (q) => q.eq('organizationId', orgId))
  .collect()
const contactKnowledge = orgEntries.filter((e) => e.contactIds?.includes(contactId))
Memory as tools
The extraction pipeline described above is passive — it runs after the agent pipeline processes a message. But agents also need to actively save and recall knowledge during pipeline execution. The Knowledge Graph exposes tool definitions that the agent can call during the action planning step (Step 3).
Active save
When the agent discovers something important during a conversation — a new fact, an updated preference, a commitment — it can persist it immediately:
// Tool definition available to the action planning step
const saveKnowledge = tool({
  description: 'Save a piece of organizational knowledge discovered during this conversation',
  parameters: z.object({
    type: z.enum(['fact', 'decision', 'event', 'preference', 'goal', 'relationship', 'action_item']),
    title: z.string(),
    content: z.string(),
    contactId: z.string().optional(),
    confidence: z.number().min(0).max(1),
    expiresInDays: z.number().optional(),
  }),
  execute: async ({ type, title, content, contactId, confidence, expiresInDays }) => {
    // Runs deduplication + contradiction check before storing
    return await ctx.runMutation(internal.knowledgeGraph.saveEntry, {
      organizationId, type, title, content, contactId, confidence, expiresInDays,
    })
  },
})
Active recall
During draft generation (Step 4), the agent can explicitly query the Knowledge Graph for relevant context beyond what was retrieved in Step 1:
const recallKnowledge = tool({
  description: 'Search organizational knowledge for information relevant to the current task',
  parameters: z.object({
    query: z.string(),
    contactId: z.string().optional(),
    type: z.enum(['fact', 'decision', 'event', 'preference', 'goal', 'relationship', 'action_item']).optional(),
    limit: z.number().default(5),
  }),
  execute: async ({ query, contactId, type, limit }) => {
    return await ctx.runAction(internal.knowledgeGraph.semanticSearch, {
      organizationId, query, contactId, type, limit,
    })
  },
})
Action items
A new knowledge type — action_item — captures commitments and tasks extracted from conversations:
| Type | Description | Example |
|---|---|---|
| Action Item | A commitment or task identified in conversation | "Send Acme the updated proposal by Friday" |
Action items have fast decay (like goals) and can trigger reminders when their deadline approaches. The agent extracts them during pipeline processing and also when a human explicitly mentions a commitment in conversation.
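A sketch of how a deadline-driven reminder might be scheduled; the 24-hour lead time is an assumed default, not something the design above specifies:

```typescript
// Assumed default: remind 24 hours before an action item's deadline.
const REMINDER_LEAD_MS = 24 * 60 * 60 * 1000

// Returns the timestamp (epoch ms) to schedule a reminder at, or null
// when the deadline is too close or already past.
function reminderAt(deadlineMs: number, nowMs: number): number | null {
  const at = deadlineMs - REMINDER_LEAD_MS
  return at > nowMs ? at : null
}
```

In Convex this timestamp could feed a scheduled function, but the scheduling mechanism itself is out of scope here.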
Scoped isolation
Knowledge tool access is scoped to prevent cross-contamination:
- Organization boundary — tools can only read/write knowledge within the current organization (enforced by `organizationId` filtering on every query)
- Contact scope — when processing a message from Contact A, the agent can access organization-wide knowledge and Contact A's specific knowledge, but queries are weighted toward the relevant contact
- Branch isolation — when the pipeline forks for multi-intent messages, each branch has an isolated view of newly saved knowledge until the branches merge. This prevents one branch's speculative saves from affecting another branch's reasoning
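Contact weighting can be sketched as a post-retrieval score boost; the 1.25 factor and the `ScoredEntry` shape are assumptions for illustration:

```typescript
// Hypothetical shape of a scored retrieval result.
interface ScoredEntry {
  id: string
  score: number          // similarity score from vector search
  contactIds?: string[]  // contacts this entry is linked to
}

// Keep organization-wide results, but boost entries linked to the
// active contact so contact-specific knowledge ranks higher.
function weightTowardContact(results: ScoredEntry[], contactId: string): ScoredEntry[] {
  return results
    .map((r) => ({
      ...r,
      score: r.contactIds?.includes(contactId) ? r.score * 1.25 : r.score,
    }))
    .sort((a, b) => b.score - a.score)
}
```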
Decay and maintenance
The Knowledge Graph is not append-only. Stale knowledge degrades over time:
Confidence decay
Every entry has a confidence score (0–1) and a lastValidatedAt timestamp. A scheduled Convex cron job runs daily:
- Time decay — reduce confidence by a small factor for entries not validated recently
- Contradiction resolution — when two entries have a `contradicts` relation, flag the older one for review
- Expiration — delete entries past their `expiresAt` timestamp
- Validation boost — when an agent retrieves and uses an entry successfully (the human approves the draft), boost the entry's confidence
Knowledge types decay at different rates
| Type | Decay rate | Rationale |
|---|---|---|
| Fact | Slow | Facts like "customer's plan" change infrequently |
| Decision | Very slow | Decisions persist unless explicitly reversed |
| Event | None (historical) | Events don't become less true over time |
| Preference | Medium | Preferences evolve as relationships develop |
| Goal | Fast | Goals have deadlines and shift frequently |
| Relationship | Medium | Org structures change |
| Action Item | Fast | Commitments have deadlines and resolve quickly |
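One way to implement type-dependent decay is exponential decay with per-type half-lives. The half-life values below are illustrative choices, not part of the design; events never decay, matching the table above:

```typescript
// Illustrative half-lives in days per entry type; null means no decay.
const HALF_LIFE_DAYS: Record<string, number | null> = {
  fact: 365,         // slow
  decision: 730,     // very slow
  event: null,       // historical, never decays
  preference: 120,   // medium
  goal: 30,          // fast
  relationship: 120, // medium
  action_item: 14,   // fast
}

// Exponential decay: confidence halves once per half-life elapsed
// since the entry was last validated.
function decayedConfidence(
  confidence: number,
  entryType: string,
  daysSinceValidated: number,
): number {
  const halfLife = HALF_LIFE_DAYS[entryType]
  if (halfLife == null) return confidence
  return confidence * Math.pow(0.5, daysSinceValidated / halfLife)
}
```

A daily cron job could recompute this from `lastValidatedAt` rather than mutating confidence in place, which keeps decay idempotent.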
Schema
knowledgeEntries
knowledgeEntries: defineTable({
  organizationId: v.string(),
  entryType: v.union(
    v.literal('fact'),
    v.literal('decision'),
    v.literal('event'),
    v.literal('preference'),
    v.literal('goal'),
    v.literal('relationship'),
    v.literal('action_item')
  ),
  title: v.string(),
  content: v.string(),
  sourceType: v.union(
    v.literal('email'),
    v.literal('chat'),
    v.literal('manual'),
    v.literal('file'),
    v.literal('agent_extracted')
  ),
  sourceId: v.optional(v.string()),
  contactIds: v.optional(v.array(v.id('contacts'))),
  threadId: v.optional(v.id('conversationThreads')),
  embedding: v.array(v.float64()),
  confidence: v.number(),
  lastValidatedAt: v.number(),
  expiresAt: v.optional(v.number()),
  tags: v.optional(v.array(v.string())),
  searchableText: v.optional(v.string()),
  createdAt: v.number(),
  updatedAt: v.number(),
})
  .index('by_organization', ['organizationId'])
  .index('by_organization_and_type', ['organizationId', 'entryType'])
  .index('by_contact', ['contactIds'])
  .index('by_thread', ['threadId'])
  .searchIndex('search_knowledge', {
    searchField: 'searchableText',
    filterFields: ['organizationId', 'entryType'],
  })
  .vectorIndex('vector_knowledge', {
    vectorField: 'embedding',
    dimensions: 1536,
    filterFields: ['organizationId', 'entryType'],
  })
knowledgeRelations
knowledgeRelations: defineTable({
  organizationId: v.string(),
  fromEntryId: v.id('knowledgeEntries'),
  toEntryId: v.id('knowledgeEntries'),
  relationType: v.union(
    v.literal('supports'),
    v.literal('contradicts'),
    v.literal('supersedes'),
    v.literal('relates_to'),
    v.literal('causes'),
    v.literal('blocks')
  ),
  createdAt: v.number(),
})
  .index('by_from', ['fromEntryId'])
  .index('by_to', ['toEntryId'])
  .index('by_organization', ['organizationId'])