Platform Operations
Operator reference for abuse status and the sending gate, the platform-admin roster, content review, org deletion, in-app self-update, dev endpoints, crons, and migrations.
This page documents the operational surface that sits behind the product: the abuse-status lifecycle that gates sending, the platform-admin role and its queries, the content review queue, the organization-deletion walker, system health and in-app self-update, and the dev-mode endpoints, crons, and migrations that keep a deployment maintainable. It is written for operators and contributors working on a self-hosted Owlat instance.
Owlat is single-org-per-deployment. The data plane belongs to exactly one organization, and the abuse-status, settings, and stats described here all live on the singleton instanceSettings row. There is no multi-tenant org switcher.
This OSS repository ships one platform-admin-gated page: Settings → System & Updates (/dashboard/settings/system). The richer control-plane UI that consumed the queries below was extracted into a separate private repo. Treat the roster, abuse-status, and content-review functions documented here as a backend/API surface you call from your own tooling or scripts — not as an in-product dashboard.
Abuse status lifecycle and the sending gate
Every deployment carries a single abuse status on instanceSettings.abuseStatus. It drives whether the instance is allowed to send. The four statuses, defined in apps/api/convex/organizations/abuseStatus.ts, form a severity ladder:
| Status | Severity | Sending | Meaning |
|---|---|---|---|
| clean | 0 | Allowed | Normal operation (also the default when no status was ever written) |
| warned | 1 | Allowed | Advisory warning issued; still fully operational |
| suspended | 2 | Blocked | All sending blocked; the account stays accessible |
| banned | 3 | Blocked | Account fully disabled; terminal for internal writers |
The read and write halves are deliberately split into sibling modules (ADR-0011, docs/adr/0011-abuse-status-modules.md):
- Abuse gate (apps/api/convex/organizations/abuseGate.ts) is the home of the sending predicate. It exports two surfaces: isSendingAllowed(status), the pure predicate that the live send paths call (the transactional dispatch path in transactional/dispatch.ts), and requireSendingAllowed(ctx), a mutation-context helper that fetches instanceSettings and throws on suspended/banned. The campaign send path enforces the same predicate inline through its pre-flight query (campaigns/preflight.ts, which checks abuseStatus against suspended/banned directly) rather than calling the helper. Either way the gate always runs; it is not behind a feature flag.
- Abuse status (apps/api/convex/organizations/abuseStatus.ts) is the only writer of abuseStatus and its companion fields (abuseStatusReason, abuseStatusChangedAt, abuseStatusChangedBy).
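The severity ladder and the sending predicate can be pictured with a minimal sketch. It is not the code in abuseGate.ts: the SEVERITY map and requireSendingAllowedFor (which takes the settings row directly instead of a mutation context) are assumptions made for illustration.

```typescript
// Hypothetical sketch; the real modules live in abuseGate.ts / abuseStatus.ts.
type AbuseStatus = "clean" | "warned" | "suspended" | "banned";

// Severity ladder from the status table.
const SEVERITY: Record<AbuseStatus, number> = {
  clean: 0,
  warned: 1,
  suspended: 2,
  banned: 3,
};

// Pure predicate: sending is allowed strictly below `suspended`.
// A status that was never written is treated as `clean`.
function isSendingAllowed(status?: AbuseStatus): boolean {
  return SEVERITY[status ?? "clean"] < SEVERITY.suspended;
}

// Simplified stand-in for requireSendingAllowed(ctx): it receives the
// settings row instead of fetching it from a mutation context.
function requireSendingAllowedFor(settings: { abuseStatus?: AbuseStatus }): void {
  if (!isSendingAllowed(settings.abuseStatus)) {
    throw new Error(`Sending is blocked: abuse status is ${settings.abuseStatus}`);
  }
}
```

Keeping the predicate pure means both the helper and an inline pre-flight check can share one definition of "blocked".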
Transition rules
There are two write entry points. The internal-writer path enforces severity rules; the admin path bypasses them.
| Entry point | Used by | Rules |
|---|---|---|
| transition (internal mutation) | MTA circuit breaker, reputation auto-enforcement | banned is terminal; downgrades are refused except down to clean (the auto-recover path); a same-state attempt is recorded but not patched |
| adminOverride (internal mutation) | The platform-admin setOrganizationStatus mutation | Bypasses all severity rules: an admin can demote a banned org back to clean to resolve an appeal |
Both paths write an abuse_status_changed audit-log row on every call, including same-state no-ops (so "circuit breaker tripped again while already warned" is observable). A missing instanceSettings row returns { ok: false, reason: 'no_settings_row' } rather than throwing — an early-deployment edge case.
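The transition rules can be sketched as a pure decision function. Only the no_settings_row reason comes from the text above; the other reason strings, the patched flag, the check ordering, and the single-function shape are assumptions for illustration, and the audit-log write is left out.

```typescript
// Hypothetical sketch of the two write paths as one pure decision function.
type AbuseStatus = "clean" | "warned" | "suspended" | "banned";

const SEVERITY: Record<AbuseStatus, number> = {
  clean: 0,
  warned: 1,
  suspended: 2,
  banned: 3,
};

type Decision =
  | { ok: true; patched: boolean }
  | { ok: false; reason: string };

function decideTransition(
  current: AbuseStatus | null, // null models a missing instanceSettings row
  target: AbuseStatus,
  adminOverride = false,
): Decision {
  if (current === null) return { ok: false, reason: "no_settings_row" };
  // Same-state attempts are recorded (audit row) but nothing is patched.
  if (current === target) return { ok: true, patched: false };
  // The admin path bypasses every severity rule.
  if (adminOverride) return { ok: true, patched: true };
  // Internal writers can never leave `banned`.
  if (current === "banned") return { ok: false, reason: "banned_is_terminal" };
  // Downgrades are refused unless they go all the way to `clean`.
  const isDowngrade = SEVERITY[target] < SEVERITY[current];
  if (isDowngrade && target !== "clean") {
    return { ok: false, reason: "downgrade_refused" };
  }
  return { ok: true, patched: true };
}
```

Under these assumptions, decideTransition("suspended", "warned") is refused, while decideTransition("banned", "clean", true) succeeds because it models the admin path.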
What flips the status automatically
| Trigger | Source | Target status |
|---|---|---|
| Reputation reaches critical risk | analytics/sendingReputation.ts → evaluateAutoEnforce (hourly cron) → autoEnforceReputation | suspended |
| Reputation reaches high risk | analytics/sendingReputation.ts → evaluateAutoEnforce (hourly cron) → autoEnforceReputation | warned |
| MTA circuit breaker trips | webhooks/dispatcher.ts (internal.circuit_breaker_tripped) | warned |
Reputation is derived on read (a rolling 30-day window; see Deliverability Infrastructure), but auto-enforcement is deliberately kept off the read hot path: an hourly evaluate reputation auto-enforce cron (apps/api/convex/crons.ts) calls evaluateAutoEnforce, which summarizes the org window and schedules autoEnforceReputation when risk is high/critical. A separate hourly cleanup sending reputation cron (recalculateAll) ages out buckets older than 60 days. Because severity downgrades are refused, a drop in risk from critical to high (which targets warned) never silently relaxes an existing suspension.
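As a rough illustration, the risk-to-status mapping and the 60-day ageing rule could look like the sketch below. The low and medium risk levels, the bucketStart field, and both function names are assumptions; only the high/critical targets and the 60-day cutoff come from the text.

```typescript
// Hypothetical sketch of the hourly auto-enforce mapping and bucket cleanup.
type RiskLevel = "low" | "medium" | "high" | "critical"; // low/medium are assumed
type EnforcedStatus = "warned" | "suspended";

// Only high and critical risk schedule an enforcement write.
function autoEnforceTarget(risk: RiskLevel): EnforcedStatus | null {
  if (risk === "critical") return "suspended";
  if (risk === "high") return "warned";
  return null;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Keep only buckets that started within the last 60 days.
function ageOutBuckets<T extends { bucketStart: number }>(buckets: T[], now: number): T[] {
  const cutoff = now - 60 * DAY_MS;
  return buckets.filter((bucket) => bucket.bucketStart >= cutoff);
}
```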
Platform admin role, roster, and operational queries
Platform admins are rows in the platformAdmins table, keyed by BetterAuth user id. Two roles exist: admin and superadmin. The first admin is created with seedPlatformAdmin (an internal mutation that only succeeds while the table is empty and always seeds a superadmin); after that, superadmins manage the roster.
Gating is centralized in apps/api/convex/platformAdmin/platformAdmin.ts:
| Function | Kind | Purpose |
|---|---|---|
| requirePlatformAdmin(ctx) | Helper | Throws FORBIDDEN unless the caller is in platformAdmins; returns { authUserId, email, role } |
| isPlatformAdmin | Public query | Boolean nav helper, safe for anonymous callers |
| isPlatformAdminByUserId | Internal query | Admin check by user id, for HTTP/action contexts that lack a QueryCtx |
| seedPlatformAdmin | Internal mutation | One-shot bootstrap of the first superadmin |
The web app gates the single admin page with the platform-admin route middleware (apps/web/app/middleware/platform-admin.ts), which calls isPlatformAdmin and redirects non-admins to /dashboard.
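The gating helper and the one-shot bootstrap can be approximated over an in-memory roster. This is a sketch under assumptions: the real functions take a Convex context rather than an array, and whether a second seed attempt throws or returns a failure value is not stated above.

```typescript
// Hypothetical in-memory model of the platformAdmins table.
type Role = "admin" | "superadmin";

interface PlatformAdmin {
  authUserId: string;
  email: string;
  role: Role;
}

// Throws FORBIDDEN unless the caller has a roster row; returns that row.
function requirePlatformAdmin(roster: PlatformAdmin[], authUserId: string | null): PlatformAdmin {
  const row = roster.find((admin) => admin.authUserId === authUserId);
  if (!row) throw new Error("FORBIDDEN");
  return row;
}

// Succeeds only while the roster is empty and always seeds a superadmin.
// Throwing on a non-empty roster is an assumption of this sketch.
function seedPlatformAdmin(roster: PlatformAdmin[], authUserId: string, email: string): PlatformAdmin {
  if (roster.length > 0) {
    throw new Error("platformAdmins is not empty; seeding refused");
  }
  const first: PlatformAdmin = { authUserId, email, role: "superadmin" };
  roster.push(first);
  return first;
}
```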
Operational queries
All of these live in apps/api/convex/platformAdmin/queries.ts and start with requirePlatformAdmin. They read the singleton instanceSettings row plus derived reputation/stats:
| Query | Returns |
|---|---|
| getPlatformStats | Contact count, abuse status, 30-day send/delivery/bounce/complaint aggregates, signups-by-day |
| getDeliveryStats | In-flight sending campaigns, scheduled count, aggregate delivery metrics, last-7-day blocked-email counts by reason |
| getOrganizationDetail | Settings, reputation summary, blocked-email counts (bounced/complained/manual), recent content scans and campaigns |
| listFlaggedOrganizations | The instance entry, but only when warned/suspended/banned or reputation risk is high/critical |
| listAllOrganizations | The instance entry with optional search/statusFilter |
| listAllUsers | User profiles with optional search |
| listAllDomains | Domains with optional statusFilter/search |
| listRecentAbuse | Flagged content scans (suspicious/blocked) plus pending_review campaigns |
| getAdminAuditLog | Up to 200 platform_admin-resource audit-log rows, with admin-email resolution |
| listPlatformAdmins | The full admin roster |
getBillingOverview exists for backward compatibility and returns empty data — invoice-based billing was removed during the billing simplification and now lives only in the separate private repo. Do not build against it.
Roster + status mutations
These live in apps/api/convex/platformAdmin/mutations.ts:
| Mutation | Effect |
|---|---|
| setOrganizationStatus | Sets abuse status via adminOverride (bypassing severity rules); writes both abuse_status_changed and the legacy platform_admin.org_status_changed audit rows |
| addPlatformAdmin | Adds an admin (superadmin-only; rejects duplicates) |
| removePlatformAdmin | Removes an admin (superadmin-only; cannot remove yourself) |
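The roster rules in the table (superadmin-only, no duplicates, no self-removal) can be expressed as a short sketch. The array-based signatures and error messages are assumptions; the real mutations operate on the platformAdmins table through a Convex context.

```typescript
// Hypothetical sketch of the roster mutation rules over an in-memory list.
type Role = "admin" | "superadmin";

interface PlatformAdmin {
  authUserId: string;
  email: string;
  role: Role;
}

function assertSuperadmin(caller: PlatformAdmin): void {
  if (caller.role !== "superadmin") throw new Error("FORBIDDEN");
}

// Superadmin-only; rejects a user who is already on the roster.
function addPlatformAdmin(roster: PlatformAdmin[], caller: PlatformAdmin, candidate: PlatformAdmin): PlatformAdmin[] {
  assertSuperadmin(caller);
  if (roster.some((admin) => admin.authUserId === candidate.authUserId)) {
    throw new Error("already a platform admin");
  }
  return [...roster, candidate];
}

// Superadmin-only; a caller cannot remove their own row.
function removePlatformAdmin(roster: PlatformAdmin[], caller: PlatformAdmin, targetUserId: string): PlatformAdmin[] {
  assertSuperadmin(caller);
  if (targetUserId === caller.authUserId) throw new Error("cannot remove yourself");
  return roster.filter((admin) => admin.authUserId !== targetUserId);
}
```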
Content review queue (approve/reject pending content)
Campaigns and transactional emails can land in a pending_review status (for example when content scanning flags them — see Email Security). The review surface is platform-admin only.
getContentReviewQueue (in platformAdmin/queries.ts) returns the pending items joined with their latest content-scan result, plus the recently-reviewed log:
{
pending: [{ type: 'campaign' | 'transactional', id, name, subject, scan: { score, level } | null, ... }],
pendingCount: number,
recentlyReviewed: [{ action, details, userId, createdAt }],
}
The three review mutations (in platformAdmin/mutations.ts) all require pending_review as the current status and write a platform_admin.content_approved or platform_admin.content_rejected audit row:
| Mutation | Effect on the resource |
|---|---|
| approveCampaign | Campaign → draft (the owner can then send it) |
| approveTransactional | Transactional email → published (sets publishedAt) |
| rejectContent | Campaign → draft, or transactional email → draft; reason is recorded |
Both approval paths and the rejection path move the resource out of the queue; there is no separate "blocked" terminal state for reviewed content — rejection reverts to draft with the reason captured in the audit log.
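The review outcomes described above can be modelled as pure state transitions. In this sketch the three mutations are folded into two hypothetical functions (approveContent and rejectReviewedContent), and the Reviewable and ReviewOutcome shapes are assumptions; only the status names and audit actions come from the text.

```typescript
// Hypothetical model of the review transitions; not the real mutations.
type ReviewType = "campaign" | "transactional";

interface Reviewable {
  type: ReviewType;
  status: string;
  publishedAt?: number;
}

interface ReviewOutcome {
  item: Reviewable;
  auditAction: "platform_admin.content_approved" | "platform_admin.content_rejected";
  auditDetails: string | null;
}

// Every review mutation requires pending_review as the current status.
function assertPending(item: Reviewable): void {
  if (item.status !== "pending_review") {
    throw new Error(`expected pending_review, got ${item.status}`);
  }
}

// Approval: campaigns return to draft, transactional emails are published.
function approveContent(item: Reviewable, now: number): ReviewOutcome {
  assertPending(item);
  const next: Reviewable =
    item.type === "campaign"
      ? { ...item, status: "draft" }
      : { ...item, status: "published", publishedAt: now };
  return { item: next, auditAction: "platform_admin.content_approved", auditDetails: null };
}

// Rejection: both resource types revert to draft; the reason goes to the audit row.
function rejectReviewedContent(item: Reviewable, reason: string): ReviewOutcome {
  assertPending(item);
  return {
    item: { ...item, status: "draft" },
    auditAction: "platform_admin.content_rejected",
    auditDetails: reason,
  };
}
```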
Organization deletion walker
Deleting the organization is a hard, irreversible wipe of the entire data plane. It is owner-only: organizationSettings.remove (in apps/api/convex/organizations/settings.ts) checks session.role === 'owner' and schedules internal.organizations.deletion.walker.start. The walker lives in apps/api/convex/organizations/deletion/walker.ts and implements ADR-0025 (docs/adr/0025-organization-deletion-module-family.md).
remove deletes every per-organization table, including storage blobs, audit logs, and finally the singleton instanceSettings row that owns the org's existence. There is no soft-delete and no recovery short of a database restore.
How it works:
- Ordered cascade. STEPS is an ordered list of the full per-organization table set, ~90 tables. Children are deleted before parents; storage-bearing tables (e.g. mediaAssets, semanticFiles, mailMessages) purge their blobs before the row delete; auditLogs is second-to-last (it keeps accumulating from delegated lifecycle calls during the wipe); instanceSettings is the terminal step.
- One module per table. Each entry has a sibling step module under deletion/steps/<table>.ts implementing the deleteBatch(ctx) contract from steps/_common.ts (default batch size 100). A few tables delegate: for example, the contacts step sweeps five child tables (contactTopics, contactPropertyValues, contactActivities, contactIdentities, contactRelationships) that are not standalone steps.
- Self-scheduled hop. runStep runs one batch, re-fires itself while hasMore is true, advances to nextTable when a step drains, and terminates when there is no next step. The table argument is validated against the literal union in _common.ts, so a typo is a compile-time and boot-time error rather than a silent no-op.
Any new per-organization table added to the schema must also add a literal to OrganizationDeletionTable and a sibling step module, or the deletion cascade will leave it orphaned.
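The walker's control flow can be sketched with in-memory steps. This is an approximation under assumptions: the real runStep re-schedules itself through the Convex scheduler rather than looping, and makeStep, runWalker, and the StepResult shape are hypothetical names.

```typescript
// Hypothetical in-memory model of the deletion walker's batch loop.
interface StepResult {
  deleted: number;
  hasMore: boolean;
}

interface Step {
  table: string;
  deleteBatch: () => StepResult;
}

// Builds a fake step that "deletes" up to batchSize rows per call.
function makeStep(table: string, rows: number, batchSize = 100): Step {
  let remaining = rows;
  return {
    table,
    deleteBatch(): StepResult {
      const deleted = Math.min(batchSize, remaining);
      remaining -= deleted;
      return { deleted, hasMore: remaining > 0 };
    },
  };
}

// Runs steps in order; each step repeats until it reports hasMore = false,
// then the walker advances to the next table and stops after the last one.
function runWalker(steps: Step[]): string[] {
  const log: string[] = [];
  for (const step of steps) {
    let hasMore = true;
    while (hasMore) {
      const result = step.deleteBatch();
      log.push(`${step.table}:${result.deleted}`);
      hasMore = result.hasMore;
    }
  }
  return log;
}
```

Because each hop handles at most one batch, a large table is drained across many small transactions instead of one oversized delete.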
System health and in-app self-update (updater sidecar)
The System & Updates page (/dashboard/settings/system, platform-admin only) surfaces the current version, container health, the GitHub release check, the update flow, and update history.
Update check
apps/api/convex/systemUpdates.ts polls the GitHub Releases API (https://api.github.com/repos/wolvesdotink/owlat/releases/latest), caches the result in a singleton systemUpdates row (kind: 'latestCheck') with a 1-hour TTL, and computes updateAvailable by comparing the cached latest version to the running OWLAT_VERSION.
| Function | Kind | Notes |
|---|---|---|
| checkForUpdates | Action | Admin-gated; honours the 1-hour cache unless force: true; returns cache on GitHub 403/429 rate-limit |
| getLatestRelease | Query | Reads the cached latest-check row |
| listUpdateHistory | Query | Lists kind: 'updateRun' rows (most recent first; capped at 200) |
A local build reporting OWLAT_VERSION=dev (or a non-semver value) is always treated as "no update available", because Owlat cannot tell whether a dev build is ahead of or behind the release tag.
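A minimal sketch of the cache and comparison logic, assuming plain MAJOR.MINOR.PATCH versions with an optional leading v. The function names are hypothetical, and treating pre-release tags as non-comparable is a simplification of this sketch rather than documented behaviour.

```typescript
// Hypothetical sketch of the update check; not the code in systemUpdates.ts.
const CACHE_TTL_MS = 60 * 60 * 1000; // 1-hour TTL

// Refetch when forced, when nothing is cached, or when the cache is stale.
function shouldRefetch(checkedAt: number | null, now: number, force = false): boolean {
  return force || checkedAt === null || now - checkedAt >= CACHE_TTL_MS;
}

// Parses "1.2.3" or "v1.2.3"; anything else (e.g. "dev") yields null.
function parseVersion(value: string): [number, number, number] | null {
  const match = /^v?(\d+)\.(\d+)\.(\d+)$/.exec(value.trim());
  if (!match) return null;
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// A non-comparable running or latest version always means "no update available".
function isUpdateAvailable(running: string, latest: string): boolean {
  const a = parseVersion(running);
  const b = parseVersion(latest);
  if (a === null || b === null) return false;
  const [aMajor, aMinor, aPatch] = a;
  const [bMajor, bMinor, bPatch] = b;
  if (bMajor !== aMajor) return bMajor > aMajor;
  if (bMinor !== aMinor) return bMinor > aMinor;
  return bPatch > aPatch;
}
```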
Applying an update
Clicking Update now posts to the Nitro route apps/web/server/api/system/update.post.ts, which:
1. Verify the caller. The route re-checks platform-admin status via the session cookie and requires INSTANCE_SECRET to be configured.
2. Download and verify the pinned compose file. It fetches docker-compose-<version>.yml and its .sha256 manifest from the GitHub Release, verifies the SHA-256 hash, and confirms the body references ghcr.io/wolvesdotink/web:<version>. A hash mismatch aborts the update.
3. Record the attempt. It records an updateRun row via recordUpdateStart, capturing versionFrom/versionTo.
4. Dispatch to the updater sidecar. It POSTs the verified compose template to http://updater:3200/update with the X-Instance-Secret header.
5. Record the outcome. It calls recordUpdateFinish with success/failed and the step output. Both record mutations also emit a structured JSON log line for external log sinks.
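The compose-file verification step can be sketched with Node's built-in crypto module. The manifest layout (a hex digest followed by a filename, as sha256sum prints it) and the function names are assumptions; the route's real parsing may differ.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 of the downloaded compose body.
function sha256Hex(body: string): string {
  return createHash("sha256").update(body, "utf8").digest("hex");
}

// Hypothetical verifier: throws on a hash mismatch or a missing image reference.
// Assumes the manifest's first whitespace-separated token is the hex digest.
function verifyComposeTemplate(body: string, manifest: string, version: string): void {
  const [expected = ""] = manifest.trim().split(/\s+/);
  if (expected.toLowerCase() !== sha256Hex(body)) {
    throw new Error("SHA-256 mismatch: aborting update");
  }
  if (!body.includes(`ghcr.io/wolvesdotink/web:${version}`)) {
    throw new Error(`compose file does not reference web:${version}`);
  }
}
```

Checking the image reference in addition to the hash guards against a valid but mislabelled release asset being applied under the wrong version.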
The updater sidecar
The updater (apps/updater/src/index.ts, image ghcr.io/wolvesdotink/updater) is the only component that touches Docker. It listens on port 3200 and is exposed only on an internal Docker network with no host port mapping, so only the web container can reach it, at http://updater:3200. It authenticates every request with a timing-safe X-Instance-Secret compare and reaches Docker through a least-privilege socket proxy: it can pull, recreate, and list containers, but cannot exec, build, or touch volumes.
| Endpoint | Method | Purpose |
|---|---|---|
| /update | POST | Validate the compose template against an image allowlist plus dangerous-mount/privileged-mode rules, write it, run docker compose pull, run a one-shot convex-deploy (function deploy before restart; a bad schema aborts here and the old containers keep serving), then run docker compose up -d |
| /health | GET | Per-service container state, image tag, and health (auth-required to prevent enumeration) |
| /configure-ip | POST | Attach/detach a floating IPv4 to eth0 and the campaign IP pool, then restart the MTA |
| /rotate-env | POST | Rewrite rotated secrets in .env and force-recreate containers (requires all secret fields; tightest rate limit) |
The System & Updates page reads /health (proxied by apps/web/server/api/internal/updater-health.get.ts) to render the container-health table. The convex-deploy-before-restart ordering is the safety property: an incompatible schema fails the deploy step and the running Web/MTA containers are never restarted against a half-deployed backend.
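A timing-safe header check of the kind the sidecar performs can be sketched with Node's crypto.timingSafeEqual. Hashing both values first is a choice made in this sketch so the compared buffers always have equal length (timingSafeEqual throws on unequal lengths); how the updater itself handles length is not stated above, and secretsMatch is a hypothetical name.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical constant-time comparison for the X-Instance-Secret header.
function secretsMatch(provided: string | undefined, expected: string): boolean {
  // Fail closed when either side is missing or empty.
  if (!provided || !expected) return false;
  // Digest both values so the buffers are the same length (32 bytes).
  const a = createHash("sha256").update(provided, "utf8").digest();
  const b = createHash("sha256").update(expected, "utf8").digest();
  return timingSafeEqual(a, b);
}
```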
The in-app updater applies images and compose templates from the GitHub Release pipeline; it is the supported self-update path. The desktop app's auto-update key is an empty placeholder, so desktop auto-update is not production-ready and desktop distribution is operator-provided — see Desktop App.
For the operator-facing update playbook (rollback, recovery, automation), see Maintenance & Updates.
Dev-mode endpoints, crons, and migrations
Dev-mode endpoints
A small set of destructive shortcuts exist for local development. They are fail-closed: the guard in apps/api/convex/devShortcuts/_guard.ts treats the deployment as production unless the operator explicitly sets OWLAT_DEV_MODE=true in the Convex backend runtime env (npx convex env set OWLAT_DEV_MODE true). The CLI-side CONVEX_DEPLOYMENT env var is not propagated into the function runtime, so it cannot be used as a security boundary.
| Endpoint | Guard | Purpose |
|---|---|---|
| POST /dev/reset | OWLAT_DEV_MODE + X-Instance-Secret | Wipe the instance back to a blank slate (tenant tables, BetterAuth tables, and local auth tables) so the signup flow can be re-exercised without docker compose down -v; idempotent |
| POST /seed/demo | OWLAT_DEV_MODE + X-Instance-Secret | Seed realistic demo content (idempotent) |
| forceVerifyDomain | OWLAT_DEV_MODE + organization:manage | Force a domain to verified with synthesised DNS results, bypassing live DNS lookups |
/dev/reset deletes everything tenant-side — not just seed-tagged rows. Leave OWLAT_DEV_MODE unset on any deployment that holds real data.
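The fail-closed behaviour of the dev-mode guard can be illustrated as follows. This is a sketch, not the contents of _guard.ts: the function names are hypothetical, and accepting only the exact string "true" is an assumption based on the CLI command shown above.

```typescript
// Hypothetical fail-closed guard: anything other than an explicit "true"
// is treated as production.
type RuntimeEnv = Record<string, string | undefined>;

function isDevMode(env: RuntimeEnv): boolean {
  return env["OWLAT_DEV_MODE"] === "true";
}

// Called at the top of every dev shortcut before any destructive work.
function assertDevMode(env: RuntimeEnv): void {
  if (!isDevMode(env)) {
    throw new Error("dev shortcuts are disabled on this deployment");
  }
}
```

Note that a CONVEX_DEPLOYMENT value plays no part in the decision, which matches the point that it is not a security boundary.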
Crons
Scheduled jobs are registered in apps/api/convex/crons.ts. Operationally relevant ones include:
| Cron | Interval | Job |
|---|---|---|
| process scheduled campaigns | 1 min | Backup dispatch for campaigns whose scheduledAt has passed |
| reconcile sending campaigns | 1 min | Advance sending → sent when no queued sends remain |
| process account deletions | 24 h | Process accounts past their 30-day grace period |
| cleanup webhook logs | weekly | Remove delivery logs older than 30 days |
| cleanup sending reputation | 1 h | Age out reputation buckets older than 60 days (both scopes) |
| cleanup soft-deleted contacts | 24 h | Permanently delete contacts past their 30-day retention |
| knowledge graph maintenance | 24 h | Confidence decay + expiry cleanup |
| channel health checks | 5 min | Probe channel connectivity |
| agent metrics rollup | 5 min | Compute queue depth/latency/error rates and evaluate circuit breakers |
| report analytics | 15 min | Report instance analytics to the control plane |
Counter-reconciliation crons (reconcile contact counts, reconcile topic member counts, refresh segment counts, reconcile transactional send counts) keep denormalized counters honest against any drift from partial failures.
Migrations
Schema/data migrations are one-shot internal mutations under apps/api/convex/migrations/, each exporting a run mutation. They follow the pre-prod "atomic breaking change" pattern: each migration is idempotent (re-running is a no-op once every row is rewritten) and, because deployments are single-org and resettable, most are written to run synchronously against .collect().
You invoke one by name through the Convex CLI, for example:
npx convex run migrations/0033_campaign_audience:run
Owlat's convention is clean breaking changes plus a data reset rather than two-phase backward-compat migration ceremony. Several migrations note that on a freshly-seeded deployment they see no legacy rows and are a no-op record of intent — if a deployment ever does carry legacy rows that a migration cannot map, the migration throws loudly instead of silently dropping data.
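The idempotent, throw-loudly pattern can be sketched over an in-memory table. Everything here is hypothetical: the Row shape, the legacyFlag and enabled fields, and the mapping are invented for illustration and do not describe any real migration in the repository.

```typescript
// Hypothetical migration: rewrite a legacy string flag into a boolean field.
interface Row {
  id: string;
  legacyFlag?: string;
  enabled?: boolean;
}

// Unmappable legacy values abort the migration instead of being dropped.
function mapLegacyFlag(value: string): boolean {
  if (value === "yes") return true;
  if (value === "no") return false;
  throw new Error(`cannot map legacy value: ${value}`);
}

// Idempotent: rows without legacyFlag are skipped, so a second run is a no-op.
function run(rows: Row[]): { rewritten: number } {
  // Plan first so an unmappable row throws before anything is changed,
  // mimicking the all-or-nothing behaviour of a single mutation.
  const plan = rows
    .filter((row) => row.legacyFlag !== undefined)
    .map((row) => ({ row, enabled: mapLegacyFlag(row.legacyFlag as string) }));
  for (const { row, enabled } of plan) {
    row.enabled = enabled;
    delete row.legacyFlag;
  }
  return { rewritten: plan.length };
}
```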