Deliverability Infrastructure

The Convex-side deliverability backend: provider routing, health-aware failover, sending reputation with auto-enforcement, IP warming cache, the blocklist, and the content-scan gate.

This page maps the deliverability machinery that lives in the Convex backend: how a send picks a provider, how provider failures feed failover, how delivery events accumulate into a reputation score that can auto-warn or auto-suspend the deployment, the cached view of IP-warming state, the address blocklist, and the daily send counters plus the pre-send content-scan thresholds.

The sending-side intelligence — per-ISP throttling, the actual IP warming schedule, circuit breakers, and DNSBL monitoring — lives in the MTA, not Convex. See MTA System for that. This page covers everything Convex owns; the two meet at the MTA's /ip-reputation endpoint and its delivery webhooks.

Per-org provider routing and strategies

A deployment can route each message type to a different email provider, or split a single type across several. Routes are stored in the providerRoutes table (apps/api/convex/schema/delivery.ts) and managed through apps/api/convex/providerRoutes.ts.

Each route row keys on one message type and carries a strategy, an ordered provider list, and an optional IP-pool override:

Field        Type                                           Meaning
messageType  campaign | transactional | automation          One route per message type
strategy     single | priority_failover | workload_split    Selection algorithm
providers    array of { providerType, weight?, isEnabled }  Ordered candidate set; providerType is mta / ses / resend
ipPool       string (optional)                              Override the MTA IP pool for sends on this route

setRoute upserts a route and removeRoute deletes one (reverting that message type to the global default). Both require the organization:manage permission. The public reader is listRoutes (an authedQuery); send paths resolve routes through resolveSendRoute / resolveSendRouteFromDb (apps/api/convex/lib/sendProviders/route.ts).

The three strategies

Selection is a pure function. The thin dispatcher resolveRoute (apps/api/convex/lib/sendProviders/routing.ts) looks up a strategy module keyed by the route's strategy field and calls its select() with the enabled providers, the route's ipPool, and the current provider-health snapshot. Each strategy lives in its own folder under lib/sendProviders/strategies/:

Strategy           Behaviour
single             Always use the first enabled provider. Ignores health.
priority_failover  Walk enabled providers in order; pick the first that is not down. Falls back to the first enabled provider if all are down or no health data exists.
workload_split     Weighted-random pick across enabled providers, excluding any that are down. Weights default to 100 (uniform). If every provider is down, it still picks one rather than blocking the send.

When there is no route, no enabled provider, or the strategy returns nothing, resolveRoute falls through to the EMAIL_PROVIDER env var, and otherwise returns null (unconfigured). Resolution is fail-closed — there is no implicit mta default, so an unconfigured deployment never silently dispatches to a phantom MTA. The returned ResolvedRoute records its source (org_config / env_fallback) for observability.
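The three strategies and the fail-closed fallback described above can be sketched as a pair of pure functions. This is a minimal illustration, not the code in routing.ts: the type names, the HealthSnapshot shape, the envProvider parameter (standing in for the EMAIL_PROVIDER env var), and the injectable random source are all assumptions made for the example.

```typescript
type ProviderType = "mta" | "ses" | "resend";
type HealthStatus = "healthy" | "degraded" | "down";
type Strategy = "single" | "priority_failover" | "workload_split";

interface RouteProvider { providerType: ProviderType; weight?: number; isEnabled: boolean }
interface Route { strategy: Strategy; providers: RouteProvider[]; ipPool?: string }
// Assumed shape: one status per provider kind; missing means "no health data".
type HealthSnapshot = Partial<Record<ProviderType, HealthStatus>>;
interface ResolvedRoute {
  providerType: ProviderType;
  ipPool?: string;
  source: "org_config" | "env_fallback";
}

const isDown = (p: RouteProvider, health: HealthSnapshot): boolean =>
  health[p.providerType] === "down";

function select(
  strategy: Strategy,
  enabled: RouteProvider[],
  health: HealthSnapshot,
  random: () => number,
): RouteProvider | undefined {
  if (enabled.length === 0) return undefined;
  switch (strategy) {
    case "single":
      return enabled[0]; // ignores health entirely
    case "priority_failover":
      // First provider that is not down, else the first enabled one.
      return enabled.find((p) => !isDown(p, health)) ?? enabled[0];
    case "workload_split": {
      const up = enabled.filter((p) => !isDown(p, health));
      const pool = up.length > 0 ? up : enabled; // never block the send
      const total = pool.reduce((sum, p) => sum + (p.weight ?? 100), 0);
      let ticket = random() * total;
      for (const p of pool) {
        ticket -= p.weight ?? 100;
        if (ticket < 0) return p;
      }
      return pool[pool.length - 1];
    }
  }
}

function resolveRoute(
  route: Route | null,
  health: HealthSnapshot,
  envProvider: string | undefined,
  random: () => number = Math.random,
): ResolvedRoute | null {
  if (route) {
    const enabled = route.providers.filter((p) => p.isEnabled);
    const picked = select(route.strategy, enabled, health, random);
    if (picked) {
      return { providerType: picked.providerType, ipPool: route.ipPool, source: "org_config" };
    }
  }
  if (envProvider === "mta" || envProvider === "ses" || envProvider === "resend") {
    return { providerType: envProvider, source: "env_fallback" };
  }
  return null; // fail-closed: no implicit default provider
}
```

Passing the random source as a parameter keeps the selection deterministic under test, which is one way to preserve the "pure function" property the page describes.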

IP pool plumbing

The route's ipPool is threaded to the MTA adapter via MtaExtras (apps/api/convex/lib/sendProviders/mta/index.ts); when unset, the adapter defaults to the transactional pool. Transactional, test, and one-off sends pass it directly. The per-recipient campaign workpool worker (apps/api/convex/delivery/worker.ts) passes MtaExtras carrying only a messageId idempotency key (worker.ts:441), with no ipPool, so campaign-level pool routing is selected at orchestration time rather than re-derived per message.

Send dispatch and provider health-aware failover

Every send producer funnels through one helper, sendProviderDispatch (apps/api/convex/lib/sendProviders/dispatch.ts), described by ADR-0020 (docs/adr/0020-send-provider-adapter-modules.md). Six producers route through it: the workpool worker, the campaign orchestrator's test send, the post-send resend, the automation email step, the transactional HTTP send, and the system/auth mail sender (sendSystemEmail in systemMail.ts). The dispatcher does three things uniformly:

  1. Retry loop driven by each provider module's retryDelays and categorizeError. Each attempt calls the module's single-attempt sendEmail.
  2. Health recording — after every terminal outcome (success or exhausted retries) it schedules recordSendResult on the Send provider health module, so even bypass callers (test sends, automation steps) record health.
  3. Error categorization — the result carries a typed EmailErrorCode, not a raw string.
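The three responsibilities above can be sketched as follows. This is a simplified, synchronous model for illustration only: the real dispatcher is asynchronous and sleeps between attempts, and the EmailErrorCode members, the categorizeError return shape, and the recordHealth callback (standing in for the scheduled recordSendResult) are assumptions rather than the actual signatures in dispatch.ts.

```typescript
// Hypothetical error codes; the real EmailErrorCode union is not shown on this page.
type EmailErrorCode = "provider_unavailable" | "invalid_recipient" | "unknown";

interface Categorized { code: EmailErrorCode; retryable: boolean }

interface ProviderModuleSketch {
  retryDelays: number[]; // delay in ms before retry 1, retry 2, ...
  categorizeError(error: unknown): Categorized;
}

type AttemptResult = { ok: true } | { ok: false; error: unknown };

type DispatchResult =
  | { success: true; attempts: number; waitedMs: number }
  | { success: false; attempts: number; waitedMs: number; errorCode: EmailErrorCode };

function dispatchSketch(
  mod: ProviderModuleSketch,
  attempt: (attemptIndex: number) => AttemptResult, // single-attempt send
  recordHealth: (success: boolean) => void,
): DispatchResult {
  let waitedMs = 0;
  for (let n = 0; ; n++) {
    const result = attempt(n);
    if (result.ok) {
      recordHealth(true); // terminal outcome: success
      return { success: true, attempts: n + 1, waitedMs };
    }
    const { code, retryable } = mod.categorizeError(result.error);
    const delay = mod.retryDelays[n];
    if (!retryable || delay === undefined) {
      recordHealth(false); // terminal outcome: non-retryable or retries exhausted
      return { success: false, attempts: n + 1, waitedMs, errorCode: code };
    }
    waitedMs += delay; // a real dispatcher would wait here before retrying
  }
}
```

Health is recorded once per terminal outcome rather than once per attempt, matching the description above; intermediate failures that are later retried successfully do not count against the provider in this sketch.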

Provider health

recordSendResult (apps/api/convex/lib/sendProviders/health.ts) maintains one providerHealth row per provider kind, using exponentially-decayed rolling success/failure counts, an EMA latency, and a consecutive-failure counter. Status is derived from those:

Status    Condition
healthy   success rate ≥ 90%
degraded  success rate ≥ 50% and < 90%
down      success rate < 50%, or ≥ 5 consecutive failures
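As a rough sketch of how decayed counts could map onto these statuses: the thresholds below come from the table, while the decay factor, the EMA smoothing constant, the field names, and the choice to treat "no data" as healthy are assumptions made for this example, not values read from health.ts.

```typescript
type HealthStatus = "healthy" | "degraded" | "down";

interface HealthRow {
  successes: number; // exponentially decayed success count
  failures: number; // exponentially decayed failure count
  consecutiveFailures: number;
  avgLatencyMs: number; // EMA of observed latency
}

const DECAY = 0.9; // assumed decay applied to both counts on every result
const LATENCY_ALPHA = 0.2; // assumed EMA smoothing factor

function recordResult(row: HealthRow, success: boolean, latencyMs: number): HealthRow {
  return {
    successes: row.successes * DECAY + (success ? 1 : 0),
    failures: row.failures * DECAY + (success ? 0 : 1),
    consecutiveFailures: success ? 0 : row.consecutiveFailures + 1,
    avgLatencyMs: row.avgLatencyMs * (1 - LATENCY_ALPHA) + latencyMs * LATENCY_ALPHA,
  };
}

function deriveStatus(row: HealthRow): HealthStatus {
  if (row.consecutiveFailures >= 5) return "down";
  const total = row.successes + row.failures;
  if (total === 0) return "healthy"; // assumption: no data is not treated as down
  const rate = row.successes / total;
  if (rate < 0.5) return "down";
  if (rate < 0.9) return "degraded";
  return "healthy";
}
```

The consecutive-failure check lets a provider flip to down quickly after an outage begins, even while its longer-run success rate still looks acceptable.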

The providerHealth rows are collected by resolveSendRouteFromDb (apps/api/convex/lib/sendProviders/route.ts) and passed to the pure resolveRoute before each dispatch, closing the loop: a provider that starts failing flips to down and priority_failover / workload_split route around it automatically.

Sending reputation (org + per-domain, derived risk, auto-enforcement)

Delivery outcomes accumulate into the sendingReputation table, owned exclusively by the Sending reputation module (apps/api/convex/analytics/sendingReputation.ts), per ADR-0042 (docs/adr/0042-sending-reputation-module.md). It is a scope-discriminated table: scope: 'org' rows track the whole deployment; scope: 'domain' rows track one sending domain. Bounce rate, complaint rate, and risk level are never stored — they are derived on read.

How events arrive

The Send lifecycle (apps/api/convex/delivery/sendLifecycle.ts) emits a reputation_update effect on each delivery transition, which schedules recordEvent with an event type and (when known) the sending domain. recordEvent is the single writer: it always bumps today's org day-bucket, and also bumps the domain day-bucket when a domain is present.

Event type   Counters bumped
send         totalSent
deliver      totalDelivered
bounce       totalBounced
hard_bounce  totalBounced and totalHardBounced
complaint    totalComplaints
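The event-to-counter mapping lends itself to a lookup table. The sketch below uses an in-memory Map in place of the sendingReputation table; the bucket key format and the function signature are assumptions for illustration, while the counter names and the mapping come from the table above.

```typescript
type ReputationEvent = "send" | "deliver" | "bounce" | "hard_bounce" | "complaint";

interface Counters {
  totalSent: number;
  totalDelivered: number;
  totalBounced: number;
  totalHardBounced: number;
  totalComplaints: number;
}

const COUNTERS_FOR_EVENT: Record<ReputationEvent, (keyof Counters)[]> = {
  send: ["totalSent"],
  deliver: ["totalDelivered"],
  bounce: ["totalBounced"],
  hard_bounce: ["totalBounced", "totalHardBounced"],
  complaint: ["totalComplaints"],
};

const emptyCounters = (): Counters => ({
  totalSent: 0,
  totalDelivered: 0,
  totalBounced: 0,
  totalHardBounced: 0,
  totalComplaints: 0,
});

// `buckets` stands in for the table; keys are a hypothetical "scope:...:day" format.
function recordEventSketch(
  buckets: Map<string, Counters>,
  event: ReputationEvent,
  day: string, // e.g. "2024-01-31" (UTC)
  domain?: string,
): void {
  const keys = [`org:${day}`]; // the org day-bucket is always bumped
  if (domain) keys.push(`domain:${domain}:${day}`); // domain bucket only when known
  for (const key of keys) {
    const bucket = buckets.get(key) ?? emptyCounters();
    for (const counter of COUNTERS_FOR_EVENT[event]) bucket[counter] += 1;
    buckets.set(key, bucket);
  }
}
```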

Derived risk

summarize (and summarizeDomains for the per-domain view) is the only place the rolling 30-day window is summed; it is reader-typed so the writer, the session-auth queries, the platform-admin queries, and the control-plane reporter all derive the identical number. Risk is computed from industry-standard thresholds (Gmail/Yahoo reject above a 0.3% complaint rate). Senders below the minimum sample size (fewer than 100 sends in the window) are always rated low:

Risk      Trigger (with ≥ 100 sends in window)
low       below the medium thresholds
medium    complaint rate ≥ 0.1% or bounce rate ≥ 2%
high      complaint rate ≥ 0.2% or bounce rate ≥ 5%
critical  complaint rate ≥ 0.3% or bounce rate ≥ 10%
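Because risk is derived on read, it reduces to a small pure function over the summed window. The thresholds and the 100-send minimum below come from the table; the input shape and the choice of totalSent as the denominator for both rates are assumptions, since this page does not state which denominator summarize uses.

```typescript
type Risk = "low" | "medium" | "high" | "critical";

interface WindowTotals {
  totalSent: number;
  totalBounced: number;
  totalComplaints: number;
}

const MIN_SAMPLE = 100;

function deriveRisk(t: WindowTotals): Risk {
  if (t.totalSent < MIN_SAMPLE) return "low"; // too little data to judge
  // Assumption: both rates are measured against totalSent.
  const complaintRate = t.totalComplaints / t.totalSent;
  const bounceRate = t.totalBounced / t.totalSent;
  if (complaintRate >= 0.003 || bounceRate >= 0.1) return "critical";
  if (complaintRate >= 0.002 || bounceRate >= 0.05) return "high";
  if (complaintRate >= 0.001 || bounceRate >= 0.02) return "medium";
  return "low";
}
```

For example, 10,000 sends with 300 bounces (3%) and no complaints lands in medium, while the same volume with 25 complaints (0.25%) lands in high regardless of bounce rate.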

Auto-enforcement

Auto-enforcement no longer runs inside recordEvent, which now only bumps the sharded counters. It runs hourly via the evaluateAutoEnforce cron, which summarizes the org window once and, at high or critical, schedules autoEnforceReputation. That function picks a target Abuse status (high → warned, critical → suspended) and delegates the transition to the Abuse status module (ADR-0011), which dedupes idempotently and refuses severity downgrades. Domain buckets feed the per-domain dashboard only; Abuse status is a deployment-level state.
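The risk-to-status mapping and the "no downgrade, no duplicate" rule can be illustrated with a small transition function. The warned and suspended statuses come from the text above; the "active" baseline status, the numeric severity ranking, and the function names are hypothetical and are not taken from the Abuse status module.

```typescript
type Risk = "low" | "medium" | "high" | "critical";
// "active" is a hypothetical name for the unenforced baseline state.
type AbuseStatus = "active" | "warned" | "suspended";

const SEVERITY: Record<AbuseStatus, number> = { active: 0, warned: 1, suspended: 2 };

function targetStatusFor(risk: Risk): AbuseStatus | null {
  if (risk === "critical") return "suspended";
  if (risk === "high") return "warned";
  return null; // low and medium trigger no enforcement
}

function applyEnforcement(
  current: AbuseStatus,
  risk: Risk,
): { status: AbuseStatus; changed: boolean } {
  const target = targetStatusFor(risk);
  // Same severity is a no-op (idempotent); lower severity is refused (no downgrade).
  if (target === null || SEVERITY[target] <= SEVERITY[current]) {
    return { status: current, changed: false };
  }
  return { status: target, changed: true };
}
```

Because an hourly cron re-evaluates the same window repeatedly, idempotence matters: a deployment that stays at high risk for several hours should be warned once, not once per run.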

recalculateAll is a cleanup-only hourly cron (wired in apps/api/convex/crons.ts); it ages out day-buckets older than 60 days across both scopes. Risk no longer needs periodic recalculation because it is derived on read.

Where the dashboards live

Session-scoped reads are in apps/api/convex/analytics/reputationQueries.ts (getSendingOverview, getDomainReputations). The deployment-wide platform-admin reputation surface (roster, abuse status, content-review) is a backend/API surface in this OSS repo, not a bundled product dashboard — the rich control-plane UI was extracted to a separate private repo.

IP warming state (Convex-cached) and send estimates

The MTA owns the real warming schedule and per-IP state in Redis. Convex keeps a cached, reactive copy so queries can subscribe to it without hitting the MTA on every read. syncWarmingState (apps/api/convex/delivery/warmingSync.ts) runs every 5 minutes (cron in crons.ts), fetches GET /ip-reputation from the MTA, filters to the campaign pool (transactional IPs have no warming limits), aggregates the per-IP rows, and upserts the singleton warmingState row. If MTA_INTERNAL_URL / MTA_API_KEY are unset, it silently skips.

The cached row carries an overall phase (ramp / plateau / graduated), the summed daily cap, today's send count, an IP count, and a per-IP breakdown (phase, warming day, daily cap, sent today, bounce/deferral rate, pool, active flag).
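The filter-and-aggregate step could look roughly like the sketch below. The row shape is a guess based on the fields listed above, not the MTA's actual /ip-reputation response schema, and the literal pool name "campaign" is assumed. How the overall phase is derived from the per-IP phases is not described here, so the sketch leaves it out.

```typescript
interface IpRow {
  ip: string;
  pool: string;
  phase: "ramp" | "plateau" | "graduated";
  dailyCap: number;
  sentToday: number;
  active: boolean;
}

interface WarmingSummary {
  ipCount: number;
  dailyCap: number; // summed across campaign-pool IPs
  sentToday: number; // summed across campaign-pool IPs
  ips: IpRow[]; // per-IP breakdown kept for the dashboard
}

function aggregateWarming(rows: IpRow[]): WarmingSummary {
  // Transactional IPs have no warming limits, so only the campaign pool is kept.
  const campaign = rows.filter((r) => r.pool === "campaign");
  return {
    ipCount: campaign.length,
    dailyCap: campaign.reduce((sum, r) => sum + r.dailyCap, 0),
    sentToday: campaign.reduce((sum, r) => sum + r.sentToday, 0),
    ips: campaign,
  };
}
```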

Two client-facing queries read it (reputationQueries.ts):

  • getSendingOverview — combines warming state, daily send volume, the rolling 30-day org reputation summary, and the current abuse status into one card.
  • getCampaignSendEstimate — given a recipient count, estimates how many days a campaign will take based on remaining daily capacity, projecting forward conservatively (~1.5× cap growth per day) when IPs are still warming. Fully-warmed deployments report a single-day estimate.

Note: the estimate is a projection. getCampaignSendEstimate is a UI-facing projection, not a scheduler. The actual pacing is enforced by the MTA's warming throttle at delivery time; this query only sets the sender's expectations.
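To make the projection concrete, here is one way such an estimate could be computed. Only the roughly 1.5× daily cap growth and the single-day result for fully-warmed deployments come from this page; the input shape, the rounding, and the one-year safety limit are assumptions for the example.

```typescript
interface EstimateInput {
  recipients: number;
  dailyCap: number; // summed daily cap across warming IPs
  sentToday: number;
  warming: boolean; // false once every IP has graduated
}

const GROWTH_PER_DAY = 1.5;
const MAX_DAYS = 365; // assumed safety limit so the loop always terminates

function estimateSendDays(input: EstimateInput): number | null {
  if (!input.warming) return 1; // fully warmed: single-day estimate
  // Day 1 can use whatever is left of today's cap.
  let remaining = input.recipients - Math.max(0, input.dailyCap - input.sentToday);
  let cap = input.dailyCap;
  let days = 1;
  while (remaining > 0) {
    if (days >= MAX_DAYS || cap <= 0) return null; // cannot produce a useful estimate
    cap = Math.floor(cap * GROWTH_PER_DAY); // projected cap for the next day
    remaining -= cap;
    days++;
  }
  return days;
}
```

With 10,000 recipients, a daily cap of 1,000, and 200 already sent today, the projection runs 800, 1,500, 2,250, 3,375, then 5,062 sends per day, so the campaign finishes on day 5.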

Suppression list and blocklist

The blockedEmails table is the address-level suppression list — the last line of defense for sender reputation. It is managed in apps/api/convex/blockedEmails.ts. Each row stores a normalized (lowercased, trimmed) address and a reason:

Reason      Source
bounced     Hard bounce: the address doesn't exist
complained  Recipient marked the email as spam
manual      Operator added it by hand

Blocked addresses are excluded from sends as part of the campaign audience eligibility predicate (soft-delete + email-present + suppression + DOI-if-topic). Auto-blocking happens through addFromEvent, the internal writer the bounce/complaint handlers call; it is idempotent (re-blocking an existing address returns the existing record). Operator-facing surface:

  • add / bulkAdd / remove — require the contacts:manage permission.
  • listByTeam (optionally filtered by reason), get, getByEmail, getCountsByReason — reads.
  • isBlocked (session) and isBlockedInternal (used by other Convex functions, no access check) — point lookups.

All lookups go through the by_email index on the normalized address; by_reason backs the filtered list and counts.
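The normalization and idempotent-insert behaviour can be modelled with an in-memory map standing in for the by_email index. The class and its method signatures are illustrative assumptions; only the normalization rule (lowercase, trim), the three reasons, and the "return the existing record" semantics come from the text above.

```typescript
type BlockReason = "bounced" | "complained" | "manual";

interface BlockedEmail {
  email: string; // normalized address
  reason: BlockReason;
  createdAt: number;
}

const normalizeEmail = (raw: string): string => raw.trim().toLowerCase();

class BlocklistSketch {
  // Stands in for the blockedEmails table and its by_email index.
  private byEmail = new Map<string, BlockedEmail>();

  addFromEvent(rawEmail: string, reason: BlockReason, now: number = Date.now()): BlockedEmail {
    const email = normalizeEmail(rawEmail);
    const existing = this.byEmail.get(email);
    if (existing) return existing; // idempotent: keep the first record and reason
    const row: BlockedEmail = { email, reason, createdAt: now };
    this.byEmail.set(email, row);
    return row;
  }

  isBlocked(rawEmail: string): boolean {
    return this.byEmail.has(normalizeEmail(rawEmail));
  }

  countsByReason(): Record<BlockReason, number> {
    const counts: Record<BlockReason, number> = { bounced: 0, complained: 0, manual: 0 };
    this.byEmail.forEach((row) => {
      counts[row.reason] += 1;
    });
    return counts;
  }
}
```

Normalizing on both write and read is what makes the point lookup reliable: " User@Example.COM " and "user@example.com" resolve to the same row.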

Daily send stats and content-scan gate thresholds

Daily counters

Two separate daily counters exist, for different purposes:

  • instanceSettings.dailySendCount — a single running counter for the current UTC day, bumped via nextDailySendCount (apps/api/convex/lib/sendingLimits.ts), which folds the increment into the instanceSettings patch (the bulk-send writer is incrementDailySendCountInternal in campaigns/sendQueries.ts). It is display-only; tier-based limits were removed and pacing is the MTA's job. getDailySendVolume resets it lazily on the first read of a new UTC day.
  • sendDailyStats — one row per UTC day with sent / delivered / opened / clicked counters, written by the Send lifecycle's daily_stats_bump effect through bumpSendDailyStat (apps/api/convex/lib/sendDailyStats.ts). The dashboard summary card reads the last 30 rows of this table instead of scanning every send.
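The lazy UTC-day rollover on the running counter can be sketched as a pair of pure helpers. The dailySendDate field, the function names, and the signatures are hypothetical; this page names dailySendCount but does not show how the current day is tracked, so treat this as one plausible shape rather than the code in sendingLimits.ts.

```typescript
interface DailyCounter {
  dailySendCount: number;
  dailySendDate: string; // hypothetical field: the UTC day the count belongs to
}

// "2024-01-31T23:59:00.000Z" -> "2024-01-31"
const utcDay = (ms: number): string => new Date(ms).toISOString().slice(0, 10);

// Returns the values to fold into the settings patch.
function nextDailySendCountSketch(
  current: DailyCounter,
  increment: number,
  nowMs: number,
): DailyCounter {
  const today = utcDay(nowMs);
  const base = current.dailySendDate === today ? current.dailySendCount : 0;
  return { dailySendCount: base + increment, dailySendDate: today };
}

// Read side: a count left over from a previous UTC day reads as zero.
function readDailySendVolume(current: DailyCounter, nowMs: number): number {
  return current.dailySendDate === utcDay(nowMs) ? current.dailySendCount : 0;
}
```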

The content-scan gate

Before a campaign fans out, the orchestrator (apps/api/convex/campaigns/send.ts) runs the campaign content through the email scanner. It combines the local content score (scanContent from @owlat/email-scanner) with an optional Google Safe Browsing URL-reputation pass (only when GOOGLE_SAFE_BROWSING_API_KEY is set; URL-check failures never block a send). The combined 0–100 score maps to three levels:

Score  Level       Outcome
≥ 40   blocked     Campaign reverts to draft with a contentBlockReason; send aborts
15–39  suspicious  Campaign transitions to pending_review for platform-admin review
< 15   clean       Send proceeds
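The threshold mapping is simple enough to express directly. The cut-offs, the draft and pending_review statuses, and the audit-trail rule for non-clean results come from this section; the function names and the shape of the outcome object are assumptions for illustration. How the local and URL-reputation scores are combined into the single 0–100 score is not specified here, so the sketch starts from the combined score.

```typescript
type ScanLevel = "clean" | "suspicious" | "blocked";

function classifyScore(combinedScore: number): ScanLevel {
  if (combinedScore >= 40) return "blocked";
  if (combinedScore >= 15) return "suspicious";
  return "clean";
}

interface GateOutcome {
  proceed: boolean;
  nextCampaignStatus: "draft" | "pending_review" | null; // null: status unchanged by the gate
  persistAuditRow: boolean; // non-clean results are written to contentScanResults
}

function gateOutcome(level: ScanLevel): GateOutcome {
  switch (level) {
    case "blocked":
      return { proceed: false, nextCampaignStatus: "draft", persistAuditRow: true };
    case "suspicious":
      return { proceed: false, nextCampaignStatus: "pending_review", persistAuditRow: true };
    case "clean":
      return { proceed: true, nextCampaignStatus: null, persistAuditRow: false };
  }
}
```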

Non-clean results are persisted to contentScanResults as an audit trail (keyed by resourceType + resourceId). The same scanner backs attachment and media-upload validation elsewhere; URL verdicts are cached in urlReputationCache (24h for clean, 1h for flagged). For the security-scanning internals, see Email Security.

Custom tracking domains

Open/click links point at the deployment's own tracking host — the /t/o and /t/c HTTP actions, served at CONVEX_SITE_URL. A deployment can register and DNS-verify a branded subdomain so that, eventually, tracked links carry its own domain rather than a shared host. Registration, admin-gated DNS verification, and the Settings surface are in place; the send-time link rewrite is not yet wired (see the second bullet). The trackingDomains table is managed in apps/api/convex/domains/trackingDomains.ts and surfaced under Settings → Domains (TrackingDomainsSection.vue):

  • addTrackingDomain records the subdomain with a cnameTarget derived from CONVEX_SITE_URL's hostname — the host that actually serves the tracking handlers, never an external SaaS host. verifyTrackingDomain schedules a DNS-over-HTTPS (Cloudflare) CNAME check (verifyTrackingDomainDns) that flips the row to verified only when the CNAME resolves to that target. All three mutations require requireAdminContext.
  • The internal getActiveTrackingDomain query (trackingDomains.ts) exposes the first verified row to the send pipeline, but no producer consumes it yet. delivery/worker.ts derives trackingBaseUrl as envelopeInput.trackingBaseUrl ?? convexSiteUrl, and nothing on the campaign send path populates trackingBaseUrl from a tracking domain (apps/api/convex/campaigns/ has zero references to it). So tracked links still default to CONVEX_SITE_URL regardless of any verified domain — the branding half of this feature is registered and verified but not yet rendered into links.
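The two mechanics described in these bullets, deriving the CNAME target and falling back to CONVEX_SITE_URL for the tracking base, can be sketched as follows. The helper names and the example hostnames are hypothetical; the normalization of a trailing dot and letter case is an assumption about how a DNS answer would typically be compared, not a description of verifyTrackingDomainDns.

```typescript
// The CNAME target is the hostname of the site that serves /t/o and /t/c.
function cnameTargetFor(convexSiteUrl: string): string {
  return new URL(convexSiteUrl).hostname;
}

// DNS answers are often fully qualified (trailing dot) and case-insensitive.
const normalizeHost = (host: string): string => host.trim().toLowerCase().replace(/\.$/, "");

function cnameMatches(resolvedCname: string, expectedTarget: string): boolean {
  return normalizeHost(resolvedCname) === normalizeHost(expectedTarget);
}

// Mirrors the fallback quoted above: envelopeInput.trackingBaseUrl ?? convexSiteUrl.
function trackingBaseUrlFor(
  envelopeInput: { trackingBaseUrl?: string },
  convexSiteUrl: string,
): string {
  return envelopeInput.trackingBaseUrl ?? convexSiteUrl;
}
```

Since no producer currently sets trackingBaseUrl from a verified tracking domain, the fallback branch is the one that always runs today, which is why tracked links still point at CONVEX_SITE_URL.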