Knowledge Graph

Technical architecture for Owlat's typed knowledge storage — how organizational knowledge is extracted, stored, searched, and maintained.

Knowledge Graph Architecture

Every organization accumulates knowledge through communication: customer preferences, internal decisions, project context, relationship history. The Knowledge Graph captures this knowledge as typed, searchable, decaying entries — not a write-only log, but a living system that stays accurate as the organization evolves.

Data isolation is non-negotiable

Each organization's knowledge graph is completely isolated. Agent context windows never mix data from different organizations. Every query, every vector search, every knowledge retrieval is scoped by organizationId. This extends the same multi-tenancy model Owlat uses today.

Storage model

The Knowledge Graph is built on Convex tables — not a separate graph database. Convex's native vector indexes enable semantic search, and indexed joins handle relationship traversal. This keeps the self-hosted stack simple: no Neo4j, no Pinecone, no additional services.

Knowledge entries

Every piece of organizational knowledge is a typed entry:

| Type | Description | Example |
|------|-------------|---------|
| Fact | Verifiable information about an entity | "Acme Corp uses our Enterprise plan" |
| Decision | A choice that was made, with reasoning | "Decided to extend Acme's trial by 2 weeks (approved by Sarah)" |
| Event | Something that happened at a point in time | "Met Acme's CTO at SaaStr conference on March 5" |
| Preference | How someone likes things done | "Acme prefers email over phone for support" |
| Goal | An objective someone is working toward | "Acme wants to launch their email program by September" |
| Relationship | A connection between entities | "Alice at Acme reports to Bob" |

Each entry has:

  • Content — the knowledge itself (title + detailed content)
  • Source attribution — where this knowledge came from (email, chat, manual entry, file, agent-extracted)
  • Entity links — connections to contacts, conversation threads, and other entries
  • Embedding — vector representation for semantic search
  • Confidence score — how reliable this knowledge is (0–1)
  • Expiration — optional TTL for time-sensitive facts

Knowledge relations

Entries connect to each other through typed edges:

| Relation | Meaning |
|----------|---------|
| supports | One entry provides evidence for another |
| contradicts | One entry conflicts with another (triggers resolution) |
| supersedes | One entry replaces another (newer information) |
| relates_to | General association |
| causes | Causal relationship |
| blocks | One entry prevents another |

Relations enable traversal: when the agent retrieves knowledge about a customer, it follows relations to find supporting context, flag contradictions, and surface the most recent information.
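
As an in-memory sketch, traversal along supersedes edges might look like the following — the `Relation` type and `resolveLatest` helper are illustrative, not the actual Convex schema or API:

```typescript
// Illustrative relation shape: `from` supersedes `to` (newer replaces older).
type Relation = { from: string; to: string; type: 'supports' | 'contradicts' | 'supersedes' }

// Follow the supersedes chain from an entry to its most recent replacement.
function resolveLatest(entryId: string, relations: Relation[]): string {
  // Index: entry id -> id of the entry that supersedes it
  const supersededBy = new Map<string, string>()
  for (const r of relations) {
    if (r.type === 'supersedes') supersededBy.set(r.to, r.from)
  }
  let current = entryId
  const seen = new Set<string>([current])
  while (supersededBy.has(current)) {
    const next = supersededBy.get(current)!
    if (seen.has(next)) break // guard against accidental cycles
    seen.add(next)
    current = next
  }
  return current
}
```

In the real system this walk would run over `knowledgeRelations` via the `by_to` index rather than an in-memory array.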

Extraction pipeline

Knowledge extraction runs automatically after each inbound message is processed by the Agent Pipeline:

Inbound message processed
  → Knowledge extractor (Convex internalAction)
    1. Entity extraction: people, organizations, dates, amounts
    2. Fact extraction: structured output via AI SDK generateObject()
    3. Deduplication: vector search for similar existing entries
    4. Contradiction check: find entries that conflict with new knowledge
    5. Store entries with embeddings and entity links
    6. Create relations (supports, contradicts, supersedes)
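
The six steps above can be sketched as a plain async function — every extractor here is a stub, standing in for the real LLM calls and vector searches inside the Convex internalAction:

```typescript
// Illustrative skeleton of the extraction pipeline; all steps are stubbed.
type Entry = { title: string; content: string; confidence: number }

// Stubs standing in for LLM extraction and vector search (steps 1–4).
const extractFacts = async (message: string): Promise<Entry[]> =>
  [{ title: 'stub', content: message, confidence: 0.8 }]
const findSimilar = async (_e: Entry): Promise<Entry | null> => null
const findContradictions = async (_e: Entry): Promise<Entry[]> => []

async function extractKnowledge(message: string): Promise<Entry[]> {
  const candidates = await extractFacts(message)       // steps 1–2: entities + facts
  const stored: Entry[] = []
  for (const entry of candidates) {
    if (await findSimilar(entry)) continue             // step 3: dedup (merge/link/supersede in reality)
    const conflicts = await findContradictions(entry)  // step 4: contradiction check
    stored.push(entry)                                 // step 5: store (embedding omitted here)
    void conflicts                                     // step 6: would create contradicts relations
  }
  return stored
}
```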

Entity extraction

Uses AI SDK structured output to extract entities:

import { generateObject } from 'ai'
import { z } from 'zod'

// generateObject resolves to { object, ... } — destructure the typed result
const { object: entities } = await generateObject({
  model: getLLMProvider(),
  schema: z.object({
    people: z.array(z.object({
      name: z.string(),
      role: z.string().optional(),
      email: z.string().optional(),
    })),
    organizations: z.array(z.string()),
    dates: z.array(z.object({
      date: z.string(),
      context: z.string(),
    })),
    amounts: z.array(z.object({
      value: z.number(),
      currency: z.string(),
      context: z.string(),
    })),
  }),
  prompt: `Extract entities from this message...\n\n${messageContent}`,
})

Deduplication

Before storing a new entry, the pipeline runs a vector search against existing entries for the same organization. If a semantically similar entry exists (cosine similarity > 0.92), the pipeline takes one of three actions:

  • Merges — combines content, updates confidence, keeps the more recent timestamp
  • Links — creates a supports relation if the entries are complementary
  • Supersedes — creates a supersedes relation if the new entry is a clear update
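
The similarity gate itself is plain cosine similarity against the 0.92 threshold. In practice the vector index computes this; `DEDUP_THRESHOLD` and `isDuplicate` below are illustrative names, not the real API:

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

const DEDUP_THRESHOLD = 0.92

// True when a candidate is close enough to an existing entry to trigger
// merge / link / supersede handling instead of a plain insert.
function isDuplicate(candidate: number[], existing: number[]): boolean {
  return cosineSimilarity(candidate, existing) > DEDUP_THRESHOLD
}
```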

Retrieval

The Knowledge Graph serves three retrieval patterns:

Semantic search (vector)

Used by the Agent Pipeline's context retrieval step:

// Find knowledge relevant to an inbound message
const results = await ctx.vectorSearch('knowledgeEntries', 'vector_knowledge', {
  vector: await generateEmbedding(messageContent),
  limit: 20,
  filter: (q) =>
    q.eq('organizationId', orgId),
})

Returns the most semantically relevant entries regardless of keyword matches. The agent uses these to build its context briefing.

Full-text search (keyword)

Used by the UI for manual knowledge browsing:

const results = await ctx.db
  .query('knowledgeEntries')
  .withSearchIndex('search_knowledge', (q) =>
    q.search('searchableText', searchQuery)
      .eq('organizationId', orgId)
  )
  .take(25)

Contact-scoped retrieval

When preparing context for a specific contact interaction:

// All knowledge linked to this contact
const contactKnowledge = await ctx.db
  .query('knowledgeEntries')
  .withIndex('by_contact', (q) => q.eq('contactIds', contactId))
  .collect()

Memory as tools

The extraction pipeline described above is passive — it runs after the agent pipeline processes a message. But agents also need to actively save and recall knowledge during pipeline execution. The Knowledge Graph exposes tool definitions that the agent can call during the action planning step (Step 3).

Active save

When the agent discovers something important during a conversation — a new fact, an updated preference, a commitment — it can persist it immediately:

// Tool definition available to the action planning step
const saveKnowledge = tool({
  description: 'Save a piece of organizational knowledge discovered during this conversation',
  parameters: z.object({
    type: z.enum(['fact', 'decision', 'event', 'preference', 'goal', 'relationship', 'action_item']),
    title: z.string(),
    content: z.string(),
    contactId: z.string().optional(),
    confidence: z.number().min(0).max(1),
    expiresInDays: z.number().optional(),
  }),
  execute: async ({ type, title, content, contactId, confidence, expiresInDays }) => {
    // Runs deduplication + contradiction check before storing
    return await ctx.runMutation(internal.knowledgeGraph.saveEntry, {
      organizationId, type, title, content, contactId, confidence, expiresInDays,
    })
  },
})

Active recall

During draft generation (Step 4), the agent can explicitly query the Knowledge Graph for relevant context beyond what was retrieved in Step 1:

const recallKnowledge = tool({
  description: 'Search organizational knowledge for information relevant to the current task',
  parameters: z.object({
    query: z.string(),
    contactId: z.string().optional(),
    type: z.enum(['fact', 'decision', 'event', 'preference', 'goal', 'relationship', 'action_item']).optional(),
    limit: z.number().default(5),
  }),
  execute: async ({ query, contactId, type, limit }) => {
    return await ctx.runAction(internal.knowledgeGraph.semanticSearch, {
      organizationId, query, contactId, type, limit,
    })
  },
})

Action items

A new knowledge type — action_item — captures commitments and tasks extracted from conversations:

| Type | Description | Example |
|------|-------------|---------|
| Action Item | A commitment or task identified in conversation | "Send Acme the updated proposal by Friday" |

Action items have fast decay (like goals) and can trigger reminders when their deadline approaches. The agent extracts them during pipeline processing and also when a human explicitly mentions a commitment in conversation.
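
A minimal sketch of the reminder check, assuming action items carry their deadline in `expiresAt` — the two-day default window is an assumption, not Owlat's actual setting:

```typescript
const DAY_MS = 24 * 60 * 60 * 1000

// Remind only for deadlines that are still ahead but within the window.
function shouldRemind(expiresAt: number, now: number, windowDays = 2): boolean {
  const remaining = expiresAt - now
  return remaining > 0 && remaining <= windowDays * DAY_MS
}
```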

Scoped isolation

Knowledge tool access is scoped to prevent cross-contamination:

  • Organization boundary — tools can only read/write knowledge within the current organization (enforced by organizationId filtering on every query)
  • Contact scope — when processing a message from Contact A, the agent can access organization-wide knowledge and Contact A's specific knowledge, but queries are weighted toward the relevant contact
  • Branch isolation — when the pipeline forks for multi-intent messages, each branch has an isolated view of newly saved knowledge until the branches merge. This prevents one branch's speculative saves from affecting another branch's reasoning
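
Branch isolation can be sketched as per-branch save buffers that only fold into shared knowledge at merge time — `KnowledgeBranch` and `mergeBranches` are illustrative names, not the actual pipeline API:

```typescript
type Entry = { title: string; content: string }

// Each pipeline branch buffers its own speculative saves.
class KnowledgeBranch {
  private pending: Entry[] = []
  save(entry: Entry) { this.pending.push(entry) }            // visible only to this branch
  view(shared: Entry[]): Entry[] { return [...shared, ...this.pending] }
  drain(): Entry[] { const out = this.pending; this.pending = []; return out }
}

// Committed knowledge only grows once all branches complete and merge.
function mergeBranches(shared: Entry[], branches: KnowledgeBranch[]): Entry[] {
  return branches.reduce((acc, b) => [...acc, ...b.drain()], [...shared])
}
```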

Decay and maintenance

The Knowledge Graph is not append-only. Stale knowledge degrades over time:

Confidence decay

Every entry has a confidence score (0–1) and a lastValidatedAt timestamp. A scheduled Convex cron job runs daily:

  1. Time decay — reduce confidence by a small factor for entries not validated recently
  2. Contradiction resolution — when two entries have a contradicts relation, flag the older one for review
  3. Expiration — delete entries past their expiresAt timestamp
  4. Validation boost — when an agent retrieves and uses an entry successfully (the human approves the draft), boost the entry's confidence

Knowledge types decay at different rates

| Type | Decay rate | Rationale |
|------|------------|-----------|
| Fact | Slow | Facts like "customer's plan" change infrequently |
| Decision | Very slow | Decisions persist unless explicitly reversed |
| Event | None (historical) | Events don't become less true over time |
| Preference | Medium | Preferences evolve as relationships develop |
| Goal | Fast | Goals have deadlines and shift frequently |
| Relationship | Medium | Org structures change |
| Action Item | Fast | Commitments have deadlines and resolve quickly |
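
This schedule might be expressed as daily multiplicative factors applied by the cron job — the specific numbers below are assumptions for illustration, not Owlat's actual configuration:

```typescript
// Assumed daily decay factors per entry type (1.0 = no decay).
const DAILY_DECAY: Record<string, number> = {
  fact: 0.999,        // slow
  decision: 0.9999,   // very slow
  event: 1.0,         // none (historical)
  preference: 0.995,  // medium
  goal: 0.98,         // fast
  relationship: 0.995, // medium
  action_item: 0.98,  // fast
}

// Confidence after `days` without validation.
function decayedConfidence(confidence: number, type: string, days: number): number {
  const factor = DAILY_DECAY[type] ?? 0.995
  return confidence * Math.pow(factor, days)
}
```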

Schema

knowledgeEntries

knowledgeEntries: defineTable({
  organizationId: v.string(),
  entryType: v.union(
    v.literal('fact'),
    v.literal('decision'),
    v.literal('event'),
    v.literal('preference'),
    v.literal('goal'),
    v.literal('relationship'),
    v.literal('action_item')
  ),
  title: v.string(),
  content: v.string(),
  sourceType: v.union(
    v.literal('email'),
    v.literal('chat'),
    v.literal('manual'),
    v.literal('file'),
    v.literal('agent_extracted')
  ),
  sourceId: v.optional(v.string()),
  contactIds: v.optional(v.array(v.id('contacts'))),
  threadId: v.optional(v.id('conversationThreads')),
  embedding: v.array(v.float64()),
  confidence: v.number(),
  lastValidatedAt: v.number(),
  expiresAt: v.optional(v.number()),
  tags: v.optional(v.array(v.string())),
  searchableText: v.optional(v.string()),
  createdAt: v.number(),
  updatedAt: v.number(),
})
  .index('by_organization', ['organizationId'])
  .index('by_organization_and_type', ['organizationId', 'entryType'])
  .index('by_contact', ['contactIds'])
  .index('by_thread', ['threadId'])
  .searchIndex('search_knowledge', {
    searchField: 'searchableText',
    filterFields: ['organizationId', 'entryType'],
  })
  .vectorIndex('vector_knowledge', {
    vectorField: 'embedding',
    dimensions: 1536,
    filterFields: ['organizationId', 'entryType'],
  })

knowledgeRelations

knowledgeRelations: defineTable({
  organizationId: v.string(),
  fromEntryId: v.id('knowledgeEntries'),
  toEntryId: v.id('knowledgeEntries'),
  relationType: v.union(
    v.literal('supports'),
    v.literal('contradicts'),
    v.literal('supersedes'),
    v.literal('relates_to'),
    v.literal('causes'),
    v.literal('blocks')
  ),
  createdAt: v.number(),
})
  .index('by_from', ['fromEntryId'])
  .index('by_to', ['toEntryId'])
  .index('by_organization', ['organizationId'])