ADR-005: Custom MTA

Why Owlat built a custom Mail Transfer Agent instead of relying solely on third-party email providers.

  • Status: Accepted
  • Date: 2025-03-20

Context

Owlat previously depended entirely on AWS SES and Resend for email delivery. While these providers offer reliable APIs and managed infrastructure, they introduce constraints at scale:

  1. Cost: per-email pricing scales linearly with volume, so high-volume senders pay significantly more than direct SMTP delivery from dedicated IPs would cost.
  2. Deliverability control: third-party providers manage shared and dedicated IP reputation on behalf of many customers, so Owlat cannot implement ISP-specific throttling, IP warming schedules, or engagement-based sending priority.
  3. Bounce latency: bounce and complaint data arrives via provider webhooks with variable delay, which makes it harder to react quickly to reputation issues.
  4. Rate limits: provider-imposed sending quotas (especially in the SES sandbox) constrain campaign throughput, and Owlat cannot independently manage backpressure per ISP.

The main options:

  1. Keep SES/Resend only — simplest operationally, but cost and control limitations remain.
  2. Build a custom MTA — direct SMTP delivery with full control over IPs, throttling, warming, and bounce processing. Higher infrastructure complexity but significantly lower per-email cost and better deliverability tuning.
  3. Use an open-source MTA (Postfix, Haraka) — avoids building from scratch but requires adapting general-purpose software to Owlat's specific needs (GroupMQ integration, per-org circuit breakers, engagement priority).

Decision

Build a custom MTA as a standalone service (apps/mta/) that sends email via direct SMTP delivery to recipient mail servers. The MTA is an optional email provider — selected via EMAIL_PROVIDER=mta — alongside the existing SES and Resend providers.
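
As a rough illustration of the provider switch, the sketch below resolves EMAIL_PROVIDER from an environment-like map. Only the value mta is confirmed by this ADR; the values ses and resend, the function name resolveEmailProvider, and the default of ses are assumptions made for the example.

```typescript
// Only "mta" is named in the ADR; "ses" and "resend" are assumed spellings.
type EmailProvider = "mta" | "ses" | "resend";

const KNOWN_PROVIDERS: readonly EmailProvider[] = ["mta", "ses", "resend"];

// Resolve the active provider from an environment-like map.
// The fallback default ("ses") is an assumption, not documented behavior.
function resolveEmailProvider(
  env: Record<string, string | undefined>,
  fallback: EmailProvider = "ses",
): EmailProvider {
  const raw = env["EMAIL_PROVIDER"]?.trim().toLowerCase();
  if (raw === undefined || raw === "") {
    return fallback;
  }
  const match = KNOWN_PROVIDERS.find((p) => p === raw);
  if (match === undefined) {
    // Fail loudly on typos instead of silently sending through the wrong provider.
    throw new Error(`Unsupported EMAIL_PROVIDER: ${raw}`);
  }
  return match;
}
```

Rejecting unknown values at startup, rather than falling back silently, is one way to keep a misconfigured deployment from sending through an unintended provider.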

Key design choices:

  • Hono HTTP API — lightweight, standard HTTP server for receiving send requests from the Convex backend. Endpoints: /send, /send/batch, /health, /metrics.
  • GroupMQ + Redis — job queue with group-based processing. Jobs are grouped by {ipPool}:{recipientDomain} so emails to the same ISP from the same IP pool are processed sequentially, respecting per-domain rate limits.
  • Intelligence pipeline — six pre-send checks run before every delivery attempt: circuit breaker (per-org bounce protection), domain throttle (adaptive per-ISP rate limiting), SMTP response tracking, DNSBL checking, IP warming cap, and engagement-based priority.
  • VERP return-path — Variable Envelope Return Path encoding correlates bounce messages back to original sends without maintaining a lookup table.
  • Webhook feedback loop — delivery events (sent, bounced, complained) are posted back to the Convex backend via authenticated webhooks, reusing the existing bounce/complaint processing pipeline.
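
Two of the choices above, the {ipPool}:{recipientDomain} group key and the VERP return-path, can be sketched in a few lines. The group key format comes from this ADR. The VERP layout (bounce+<messageId>+<local>=<domain>@<bounceDomain>), the function names, and the example domains are assumptions for illustration; the encoding the MTA actually uses is not specified here.

```typescript
// Split an address at the last "@"; the domain is lowercased for grouping.
function splitAddress(address: string): { local: string; domain: string } {
  const at = address.lastIndexOf("@");
  if (at < 1 || at === address.length - 1) {
    throw new Error(`Invalid address: ${address}`);
  }
  return { local: address.slice(0, at), domain: address.slice(at + 1).toLowerCase() };
}

// Group key as described in the ADR: {ipPool}:{recipientDomain}.
function groupKey(ipPool: string, recipient: string): string {
  return `${ipPool}:${splitAddress(recipient).domain}`;
}

// Assumed VERP layout: the recipient's "@" becomes "=" inside the local part.
// messageId must not contain "+" or "@" for decoding to work.
function encodeVerp(messageId: string, recipient: string, bounceDomain: string): string {
  const { local, domain } = splitAddress(recipient);
  return `bounce+${messageId}+${local}=${domain}@${bounceDomain}`;
}

// Recover the original send from a bounce's envelope recipient, with no lookup table.
function decodeVerp(returnPath: string): { messageId: string; recipient: string } | null {
  const m = /^bounce\+([^+@]+)\+(.+)=([^=@]+)@[^@]+$/.exec(returnPath);
  if (m === null) {
    return null;
  }
  return { messageId: m[1], recipient: `${m[2]}@${m[3]}` };
}

const returnPath = encodeVerp("msg123", "user+tag@gmail.com", "bounces.example.com");
const decoded = decodeVerp(returnPath);
```

Because the message ID and recipient travel inside the return-path itself, a bounce can be attributed to its original send by parsing the address alone.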

Consequences

Enables:

  • Per-ISP adaptive rate limiting (Gmail 100/min, Outlook 80/min, Yahoo 50/min) with automatic backoff on 4xx responses
  • Automated IP warming over 30 days with adaptive acceleration/deceleration based on deliverability signals
  • Engagement-based sending priority — high-engagement contacts are delivered first
  • Per-organization circuit breaker — automatically pauses sending when bounce rates exceed thresholds
  • DNS blocklist monitoring with automatic IP removal from active pool
  • Direct cost savings at scale (no per-email API fees beyond infrastructure)
  • Full bounce/complaint processing pipeline with DSN parsing and ARF/FBL support
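
The per-ISP rate limiting with backoff listed above could look roughly like the following. The per-minute limits for Gmail, Outlook, and Yahoo come from this ADR. The class name IspThrottle, the fixed one-minute window, halving on 4xx, and recovering by 10% of the base limit on 2xx are assumptions chosen to keep the sketch small, not the MTA's actual algorithm.

```typescript
// Base limits per minute, taken from the ADR.
const BASE_LIMITS = { gmail: 100, outlook: 80, yahoo: 50 };

class IspThrottle {
  private readonly baseLimit: number;
  private limit: number;
  private windowStart = 0;
  private sentInWindow = 0;

  constructor(baseLimit: number) {
    this.baseLimit = baseLimit;
    this.limit = baseLimit;
  }

  // Returns true and counts the send if the current one-minute window has capacity.
  tryAcquire(nowMs: number): boolean {
    if (nowMs - this.windowStart >= 60_000) {
      this.windowStart = nowMs;
      this.sentInWindow = 0;
    }
    if (this.sentInWindow >= this.limit) {
      return false;
    }
    this.sentInWindow += 1;
    return true;
  }

  // Feed the SMTP reply code back in.
  // Assumed policy: a 4xx halves the limit, a 2xx restores 10% of the base limit.
  recordResponse(smtpCode: number): void {
    if (smtpCode >= 400 && smtpCode < 500) {
      this.limit = Math.max(1, Math.floor(this.limit / 2));
    } else if (smtpCode >= 200 && smtpCode < 300) {
      const step = Math.max(1, Math.floor(this.baseLimit / 10));
      this.limit = Math.min(this.baseLimit, this.limit + step);
    }
  }

  get currentLimit(): number {
    return this.limit;
  }
}
```

A caller would keep one throttle per group, for example new IspThrottle(BASE_LIMITS.yahoo), call tryAcquire(Date.now()) before each delivery attempt, and pass every SMTP reply code to recordResponse.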

Trade-offs:

  • Requires dedicated IPs with proper rDNS/PTR records and DKIM key management
  • Infrastructure complexity: Redis for state, SMTP port 25 access, separate containerized service
  • Operational burden: monitoring DNSBL listings, managing IP warming, investigating deliverability issues
  • Falls back gracefully to SES/Resend — the MTA is additive, not a replacement

Comparison with Alternatives

| Capability                  | Owlat MTA                 | Postal                     | SES/Resend       |
|-----------------------------|---------------------------|----------------------------|------------------|
| Adaptive per-ISP throttling | Yes (10+ ISP profiles)    | No                         | Provider-managed |
| IP warming                  | Yes (30-day adaptive)     | No                         | Provider-managed |
| Per-org circuit breaker     | Yes (3-state)             | No                         | No               |
| Engagement-based priority   | Yes                       | No                         | No               |
| DNSBL auto-remediation      | Yes (15-min checks)       | No                         | Provider-managed |
| Message storage/search      | No                        | Yes (per-server MySQL)     | No               |
| Web admin UI                | No                        | Yes (Rails dashboard)      | Provider console |
| Spam content scoring        | No                        | Yes (pluggable inspectors) | Provider-managed |
| Click/open tracking         | No (platform layer)       | Yes (built-in)             | Provider-managed |
| Cost at scale               | Low (infrastructure only) | Low (self-hosted)          | High (per-email) |