Email Security

Content scanning, attachment validation, URL reputation checking, and malware detection for outbound emails.

All outbound email passes through multi-layered security scanning before delivery. The @owlat/email-scanner package provides all scanning logic as a shared library, consumed by both apps/api (Convex) and apps/mta.

Content Scanning

Analyzes the email subject and HTML body for malicious or unwanted content. All scanners are pure TypeScript with zero dependencies, which makes them safe to run in the Convex serverless runtime.

Spam Keywords

40+ weighted patterns detect common spam phrases:

| Category | Examples | Severity |
| --- | --- | --- |
| Financial scams | "free money", "million dollars", "wire transfer" | High (20 pts) |
| Urgency tactics | "act now", "limited time", "expires today" | Medium (10 pts) |
| Suspicious phrases | "click here", "no obligation", "satisfaction guaranteed" | Low (3 pts) |
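
As a rough illustration of weighted pattern matching, the sketch below checks a handful of phrases taken from the examples above. The type names, category labels, and `findSpamKeywords` function are hypothetical and are not the actual contents of spamKeywords.ts.

```typescript
// Hypothetical types and pattern list, for illustration only. The real
// spamKeywords.ts defines 40+ patterns that are not reproduced here.
type SpamSeverity = "high" | "medium" | "low";

interface SpamPattern {
  pattern: RegExp;
  category: string;
  severity: SpamSeverity;
}

interface SpamFlag {
  category: string;
  severity: SpamSeverity;
  match: string;
}

const SPAM_PATTERNS: SpamPattern[] = [
  { pattern: /\bfree money\b/i, category: "financial_scam", severity: "high" },
  { pattern: /\bwire transfer\b/i, category: "financial_scam", severity: "high" },
  { pattern: /\bact now\b/i, category: "urgency", severity: "medium" },
  { pattern: /\blimited time\b/i, category: "urgency", severity: "medium" },
  { pattern: /\bclick here\b/i, category: "suspicious_phrase", severity: "low" },
];

// Returns one flag per pattern that matches anywhere in the text.
function findSpamKeywords(text: string): SpamFlag[] {
  const flags: SpamFlag[] = [];
  for (const { pattern, category, severity } of SPAM_PATTERNS) {
    const m = pattern.exec(text);
    if (m) flags.push({ category, severity, match: m[0] });
  }
  return flags;
}
```

Each pattern contributes at most one flag in this sketch; whether the real scanner counts repeated occurrences is not specified here.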

Phishing URL Detection

  • URL shortener detection (bit.ly, t.co, goo.gl, etc.)
  • Anchor/href domain mismatch (link text says paypal.com but links to evil.com)
  • Suspicious URL patterns (IP addresses in URLs, excessive subdomains)
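
The three checks above could be sketched roughly as follows. The shortener list, the function names, and the regex-based hostname extraction are assumptions made for this example; phishingUrls.ts may implement them differently.

```typescript
// Illustrative shortener list; the real list in phishingUrls.ts may differ.
const URL_SHORTENERS = ["bit.ly", "t.co", "goo.gl"];

// Pull the hostname out of an absolute http(s) URL, skipping any userinfo part.
function hostnameOf(url: string): string | null {
  const m = /^https?:\/\/(?:[^\/?#@]*@)?([^\/?#:]+)/i.exec(url.trim());
  return m ? m[1].toLowerCase() : null;
}

function isShortener(url: string): boolean {
  const host = hostnameOf(url);
  return host !== null && URL_SHORTENERS.indexOf(host) !== -1;
}

// True when the visible link text names a domain that differs from the
// domain the href actually points to. Subdomains of the named domain pass.
function isAnchorMismatch(linkText: string, href: string): boolean {
  const hrefHost = hostnameOf(href);
  const m = /([a-z0-9-]+(?:\.[a-z0-9-]+)+)/i.exec(linkText);
  if (!hrefHost || !m) return false;
  const textHost = m[1].toLowerCase();
  return hrefHost !== textHost && !hrefHost.endsWith("." + textHost);
}

// True when the host is a raw IPv4 address rather than a domain name.
function isIpHost(url: string): boolean {
  const host = hostnameOf(url);
  return host !== null && /^\d{1,3}(\.\d{1,3}){3}$/.test(host);
}
```

Link text without a domain-like token (for example "Read more") is never treated as a mismatch in this sketch.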

Homoglyph / Unicode Spoofing

Detects mixed-script characters used to impersonate legitimate domains:

  • ~50 confusable character mappings (Cyrillic а U+0430 vs Latin a U+0061, Greek ο U+03BF vs Latin o, etc.)
  • Mixed-script detection in link text and URL hostnames
  • Severity: high (20 pts) — homoglyph spoofing is a strong phishing indicator
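
A minimal sketch of mixed-script detection, assuming simple code point ranges for Latin, Cyrillic, and Greek and a tiny confusables table. The names below are hypothetical, and the real homoglyphs.ts uses roughly 50 mappings that are not reproduced here.

```typescript
type ScriptName = "latin" | "cyrillic" | "greek" | "other";

// Classify a character by code point range (basic blocks only, an assumption).
function scriptOf(ch: string): ScriptName {
  const cp = ch.codePointAt(0) ?? 0;
  if ((cp >= 0x41 && cp <= 0x5a) || (cp >= 0x61 && cp <= 0x7a)) return "latin";
  if (cp >= 0x0400 && cp <= 0x04ff) return "cyrillic";
  if (cp >= 0x0370 && cp <= 0x03ff) return "greek";
  return "other";
}

// A label is mixed-script when it combines letters from more than one script.
function isMixedScript(label: string): boolean {
  const seen = new Set<ScriptName>();
  for (const ch of Array.from(label)) {
    const s = scriptOf(ch);
    if (s !== "other") seen.add(s);
  }
  return seen.size > 1;
}

// Small illustrative subset of confusable mappings (lookalike -> Latin).
const CONFUSABLES: Record<string, string> = {
  "\u0430": "a", // Cyrillic a
  "\u0435": "e", // Cyrillic ie
  "\u043e": "o", // Cyrillic o
  "\u0440": "p", // Cyrillic er
  "\u0441": "c", // Cyrillic es
  "\u03bf": "o", // Greek omicron
};

// Replace known lookalikes with their Latin counterparts so the result can
// be compared against a list of legitimate domains.
function toLatinSkeleton(label: string): string {
  return Array.from(label).map((ch) => CONFUSABLES[ch] ?? ch).join("");
}
```

For example, `isMixedScript("p\u0430ypal")` is true because the second character is Cyrillic, and `toLatinSkeleton` maps the same string back to "paypal".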

Prohibited Content

Pattern matching for high-severity scam content:

  • Advance fee fraud patterns ("beneficiary", "next of kin", "unclaimed funds")
  • Credential phishing ("verify your account", "confirm your password", "update your payment")

Subject Line Analysis

  • ALL CAPS abuse (>50% uppercase characters)
  • Excessive punctuation (3+ consecutive ! or ?)
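
Both subject checks are simple enough to sketch directly. This example assumes the uppercase ratio is computed over ASCII letters only and that a run may mix ! and ?; the source does not specify either detail, and the function names are hypothetical.

```typescript
// True when more than half of the ASCII letters in the subject are uppercase.
function isAllCapsAbuse(subject: string): boolean {
  const letters = subject.replace(/[^A-Za-z]/g, "");
  if (letters.length === 0) return false;
  const upper = letters.replace(/[^A-Z]/g, "").length;
  return upper / letters.length > 0.5;
}

// True when the subject contains a run of three or more ! or ? characters.
function hasExcessivePunctuation(subject: string): boolean {
  return /[!?]{3,}/.test(subject);
}
```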

Scoring

All flags contribute to a composite score:

| Severity | Points | Example |
| --- | --- | --- |
| High | 20 | Homoglyph spoofing, credential phishing |
| Medium | 10 | URL shorteners, spam keywords |
| Low | 3 | Excessive punctuation, minor spam phrases |

Thresholds:

| Score | Level | Action |
| --- | --- | --- |
| 0–14 | Clean | Allowed |
| 15–39 | Suspicious | Allowed with warning stored |
| 40+ | Blocked | Send rejected |
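
The point values and thresholds above translate into a short calculation. The type and function names in this sketch are hypothetical; only the numbers come from the tables.

```typescript
type FlagSeverity = "high" | "medium" | "low";
type ScanLevel = "clean" | "suspicious" | "blocked";

const SEVERITY_POINTS: Record<FlagSeverity, number> = { high: 20, medium: 10, low: 3 };

// Sum the points contributed by every flag.
function scoreFlags(severities: FlagSeverity[]): number {
  return severities.reduce((total, s) => total + SEVERITY_POINTS[s], 0);
}

// Map a composite score onto the three levels from the thresholds table.
function levelForScore(score: number): ScanLevel {
  if (score >= 40) return "blocked";
  if (score >= 15) return "suspicious";
  return "clean";
}
```

As a worked example, one high, one medium, and one low flag give 20 + 10 + 3 = 33 points, which falls in the suspicious range, while two high flags reach 40 and block the send.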

File Validation

Validates attachments and media uploads before storage or sending. Pure TypeScript, no external dependencies.

Magic Bytes Detection

Identifies real file type from binary headers (first 16 bytes), regardless of file extension:

| File Type | Magic Bytes |
| --- | --- |
| PE executable (.exe, .dll) | 4D 5A (MZ) |
| ELF binary | 7F 45 4C 46 |
| MSI installer | D0 CF 11 E0 (OLE compound) |
| PDF | 25 50 44 46 (%PDF) |
| PNG | 89 50 4E 47 |
| JPEG | FF D8 FF |
| ZIP/DOCX/XLSX | 50 4B 03 04 (PK) |
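
A minimal sketch of header-based detection using the signatures from the table. The `detectFileType` name and the short type labels are assumptions for this example and may not match magicBytes.ts.

```typescript
interface MagicSignature {
  type: string;
  bytes: number[];
}

// Signatures copied from the table above.
const MAGIC_SIGNATURES: MagicSignature[] = [
  { type: "pe", bytes: [0x4d, 0x5a] },
  { type: "elf", bytes: [0x7f, 0x45, 0x4c, 0x46] },
  { type: "ole", bytes: [0xd0, 0xcf, 0x11, 0xe0] },
  { type: "pdf", bytes: [0x25, 0x50, 0x44, 0x46] },
  { type: "png", bytes: [0x89, 0x50, 0x4e, 0x47] },
  { type: "jpeg", bytes: [0xff, 0xd8, 0xff] },
  { type: "zip", bytes: [0x50, 0x4b, 0x03, 0x04] },
];

// Compare the first 16 bytes against each known signature.
// Returns null when no signature matches.
function detectFileType(data: Uint8Array): string | null {
  const header = data.subarray(0, 16);
  for (const sig of MAGIC_SIGNATURES) {
    const fits = sig.bytes.length <= header.length;
    if (fits && sig.bytes.every((b, i) => header[i] === b)) return sig.type;
  }
  return null;
}
```

A file named invoice.pdf whose first two bytes are 4D 5A would be reported as "pe" here, which is the mismatch this check exists to catch.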

Double Extension Detection

Catches attacks that hide executable extensions after document extensions:

  • invoice.pdf.exe — detected and blocked
  • report.docx.js — detected and blocked
  • photo.jpg.scr — detected and blocked
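
One way to sketch this check is to compare the last two dot-separated segments of the filename. Both extension lists below are illustrative assumptions and are not the lists used by doubleExtension.ts.

```typescript
// Illustrative lists only; the real detector may cover more extensions.
const EXECUTABLE_EXTENSIONS = ["exe", "js", "scr", "bat", "cmd", "com", "vbs"];
const DECOY_EXTENSIONS = ["pdf", "doc", "docx", "xls", "xlsx", "jpg", "jpeg", "png", "txt"];

// True when an executable extension directly follows a document-style extension.
function hasDoubleExtension(filename: string): boolean {
  const parts = filename.toLowerCase().split(".");
  if (parts.length < 3) return false;
  const last = parts[parts.length - 1];
  const previous = parts[parts.length - 2];
  return EXECUTABLE_EXTENSIONS.indexOf(last) !== -1 && DECOY_EXTENSIONS.indexOf(previous) !== -1;
}
```

Legitimate compound extensions such as archive.tar.gz are not flagged, because neither segment appears in the executable list.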

Extension Allowlist

Permitted file types (everything else is blocked):

| Category | Extensions |
| --- | --- |
| Images | .jpg, .jpeg, .png, .gif, .webp, .svg, .ico, .bmp, .tiff |
| Documents | .pdf, .doc, .docx, .odt, .rtf, .txt |
| Spreadsheets | .xls, .xlsx, .csv, .ods |
| Archives | .zip, .gz, .tar |
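
The allowlist can be expressed as a set lookup on the final extension. This sketch uses the extensions from the table; the `isAllowedExtension` name and the handling of files without an extension are assumptions.

```typescript
// Extensions copied from the table above, without the leading dot.
const ALLOWED_EXTENSIONS = new Set<string>([
  "jpg", "jpeg", "png", "gif", "webp", "svg", "ico", "bmp", "tiff",
  "pdf", "doc", "docx", "odt", "rtf", "txt",
  "xls", "xlsx", "csv", "ods",
  "zip", "gz", "tar",
]);

// Default-deny: files with no extension, or an unknown one, are rejected.
function isAllowedExtension(filename: string): boolean {
  const dot = filename.lastIndexOf(".");
  if (dot <= 0 || dot === filename.length - 1) return false;
  return ALLOWED_EXTENSIONS.has(filename.slice(dot + 1).toLowerCase());
}
```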

MIME Type Allowlist

Validates declared content types against an allowlist of safe MIME types (e.g., image/*, application/pdf, text/plain).
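
A sketch of wildcard-aware MIME matching, using only the three example patterns named above. The full allowlist is not listed in this document, so `ALLOWED_MIME_PATTERNS` is deliberately incomplete and the function name is hypothetical.

```typescript
// Only the examples given in the text; the real allowlist is longer.
const ALLOWED_MIME_PATTERNS = ["image/*", "application/pdf", "text/plain"];

// Strips parameters such as "; charset=utf-8" before matching.
function isAllowedMimeType(declared: string): boolean {
  const mime = declared.split(";")[0].trim().toLowerCase();
  return ALLOWED_MIME_PATTERNS.some((pattern) => {
    if (pattern.endsWith("/*")) {
      const prefix = pattern.slice(0, -1); // "image/*" -> "image/"
      return mime.length > prefix.length && mime.startsWith(prefix);
    }
    return mime === pattern;
  });
}
```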

Integration Points

  • emailWorker.ts — validates each attachment buffer before sending
  • mediaAssets.ts — validates uploads before storing to Convex file storage

URL Reputation

Checks URLs in email content against the Google Safe Browsing API v4.

How It Works

  1. Extract all URLs from the email HTML content
  2. Normalize and hash URLs (SHA-256)
  3. Check cache (urlReputationCache table) for known verdicts
  4. Batch-check uncached URLs against Safe Browsing API (up to 500 per request)
  5. Cache results (24h for clean, 1h for flagged)
  6. Convert flagged URLs to ContentFlag entries with severity 'high'
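
Parts of the flow above can be sketched as pure helpers: URL extraction (step 1), batching (step 4), and cache expiry (step 5). The function names and the extraction regex are assumptions for illustration. Hashing and the actual Safe Browsing request are left out because they depend on the runtime's crypto and network APIs.

```typescript
const MAX_URLS_PER_REQUEST = 500;
const CLEAN_TTL_MS = 24 * 60 * 60 * 1000; // 24h for clean verdicts
const FLAGGED_TTL_MS = 60 * 60 * 1000; // 1h for flagged verdicts

// Step 1: collect unique http(s) URLs from the HTML (simplified regex).
function extractUrls(html: string): string[] {
  const matches = html.match(/https?:\/\/[^\s"'<>]+/gi) ?? [];
  return Array.from(new Set(matches));
}

// Step 4: split uncached URLs into batches the API will accept.
function toBatches<T>(items: T[], size: number = MAX_URLS_PER_REQUEST): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Step 5: compute when a cached verdict should expire.
function cacheExpiry(flagged: boolean, now: number): number {
  return now + (flagged ? FLAGGED_TTL_MS : CLEAN_TTL_MS);
}
```

The shorter lifetime for flagged entries means a URL that has since been cleaned up is rechecked within an hour, while clean verdicts are reused for a day.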

Threat Types

| Threat | Description |
| --- | --- |
| MALWARE | Sites hosting malicious software |
| SOCIAL_ENGINEERING | Phishing and deceptive sites |
| UNWANTED_SOFTWARE | Sites distributing unwanted software |
| POTENTIALLY_HARMFUL_APPLICATION | Mobile app threats |

Campaign vs Transactional

| Email Type | Behavior |
| --- | --- |
| Campaign sends | Blocking gate: flagged URLs prevent the campaign from sending |
| Transactional sends | Graceful: flags stored for review, delivery not blocked |
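
The difference between the two send types reduces to a single decision. The names in this sketch are hypothetical and it only models the blocking behavior from the table, not how flags are stored.

```typescript
type EmailType = "campaign" | "transactional";

// Flagged URLs block campaigns; transactional email is never blocked by them.
function shouldBlockForUrlReputation(type: EmailType, flaggedUrlCount: number): boolean {
  return type === "campaign" && flaggedUrlCount > 0;
}
```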

Configuration

URL reputation checking requires the GOOGLE_SAFE_BROWSING_API_KEY environment variable, which is set in the Convex dashboard. The free tier allows 10,000 requests per day. When the API key is not configured, URL reputation checking is silently skipped.

ClamAV Malware Scanning

Scans attachment binary data for known malware signatures using ClamAV running as a Docker sidecar alongside the MTA.

Architecture

Convex emailWorker
    │
    │  POST /scan/attachment
    │  (binary data + X-Filename header)
    ▼
MTA scan endpoint (src/routes/scan.ts)
    │
    ├─ File type validation (magic bytes, extensions)
    │
    └─ ClamAV scan (TCP INSTREAM protocol)
        │
        ▼
    clamd (port 3310)

INSTREAM Protocol

The @owlat/email-scanner ClamAV client communicates with clamd over TCP using the INSTREAM protocol:

  1. Send zINSTREAM\0
  2. Send chunked binary data (4-byte big-endian length prefix + data)
  3. Send zero-length terminator
  4. Read verdict: stream: OK\0 or stream: <virus_name> FOUND\0
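
The framing in the steps above can be sketched without any socket code. The functions below only build the byte sequences to write and parse the reply; the names and the 64 KiB default chunk size are assumptions, not the actual client.ts implementation.

```typescript
// Encode an ASCII command string, including its trailing NUL, as bytes.
function asciiBytes(s: string): Uint8Array {
  const out = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) out[i] = s.charCodeAt(i) & 0x7f;
  return out;
}

// Prefix a chunk with its length as a 4-byte big-endian unsigned integer.
function frameChunk(chunk: Uint8Array): Uint8Array {
  const framed = new Uint8Array(4 + chunk.length);
  new DataView(framed.buffer).setUint32(0, chunk.length, false); // false = big-endian
  framed.set(chunk, 4);
  return framed;
}

// Steps 1-3: command, length-prefixed chunks, zero-length terminator.
function buildInstreamFrames(data: Uint8Array, chunkSize: number = 64 * 1024): Uint8Array[] {
  const frames: Uint8Array[] = [asciiBytes("zINSTREAM\0")];
  for (let offset = 0; offset < data.length; offset += chunkSize) {
    frames.push(frameChunk(data.subarray(offset, offset + chunkSize)));
  }
  frames.push(new Uint8Array(4)); // four zero bytes = length 0
  return frames;
}

interface ClamVerdict {
  clean: boolean;
  virus?: string;
}

// Step 4: interpret the NUL-terminated reply from clamd.
function parseVerdict(reply: string): ClamVerdict {
  const text = reply.replace(/\0+$/, "").trim();
  if (text === "stream: OK") return { clean: true };
  const m = /^stream: (.+) FOUND$/.exec(text);
  if (m) return { clean: false, virus: m[1] };
  throw new Error("Unexpected clamd reply: " + text);
}
```

In a real client each frame would be written to the TCP socket in order, and the reply would be read until the terminating NUL byte arrives.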

Fail-Open Design

If ClamAV is unavailable (container not running, network error, timeout), the scan returns { clean: true, skipped: true } and logs a warning. This prevents ClamAV outages from blocking all email delivery.
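
A fail-open wrapper along these lines might look like the sketch below. `scanFailOpen`, the result interface, and the 10-second default timeout are hypothetical; only the `{ clean: true, skipped: true }` fallback comes from the description above.

```typescript
interface AttachmentScanResult {
  clean: boolean;
  skipped: boolean;
  virus?: string;
}

// Runs the given scan, but treats any error or timeout as "skipped" rather
// than "infected". A real FOUND verdict resolves normally and is passed through.
async function scanFailOpen(
  scan: () => Promise<AttachmentScanResult>,
  timeoutMs: number = 10_000,
): Promise<AttachmentScanResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("ClamAV scan timed out")), timeoutMs);
  });
  try {
    return await Promise.race([scan(), timeout]);
  } catch (err) {
    console.warn("ClamAV unavailable, skipping scan:", err);
    return { clean: true, skipped: true };
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Callers can check `skipped` to tell a genuinely clean result from one where no scan took place.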

Health Check

GET /scan/health returns the ClamAV connection status:

{ "clamav": "connected", "version": "ClamAV 1.3.0" }

or

{ "clamav": "unavailable", "error": "Connection refused" }

Configuration

See MTA System > ClamAV Sidecar for Docker setup and environment variables.

Feedback Loops

Spam complaints from ISP feedback loops are linked back to campaign content scan results:

  1. Resend/MTA webhook delivers a complaint event
  2. resendWebhook.ts processes the complaint and looks up the campaign's content scan result
  3. Complaint count is incremented on the contentScanResults record
  4. This data enables future pattern learning and complaint rate tracking per campaign

Integration Summary

| Scanner | Where Called | Blocking? | On Failure |
| --- | --- | --- | --- |
| Content (spam + homoglyphs) | emails.ts, transactionalEmails.ts | Yes | N/A (pure TS, always runs) |
| File type validation | emailWorker.ts, mediaAssets.ts | Yes | Block (safe default) |
| URL reputation (Safe Browsing) | emails.ts | Campaigns: yes, Transactional: no | Allow, skip silently |
| ClamAV malware scan | emailWorker.ts via MTA /scan/attachment | Yes | Allow, log warning (fail-open) |

Package Structure

packages/email-scanner/src/
├── content/              # Content analysis (pure TS, Convex-safe)
│   ├── index.ts          # scanContent() orchestrator
│   ├── spamKeywords.ts   # 40+ weighted spam patterns
│   ├── phishingUrls.ts   # URL shorteners, anchor/href mismatch
│   ├── homoglyphs.ts     # Unicode spoofing detection
│   ├── prohibitedContent.ts  # Advance fee fraud, credential phishing
│   └── subjectAnalysis.ts    # ALL CAPS, excessive punctuation
├── files/                # File type validation (pure TS)
│   ├── index.ts          # validateFile() orchestrator
│   ├── magicBytes.ts     # Binary header detection
│   ├── doubleExtension.ts    # invoice.pdf.exe detection
│   └── filePolicy.ts     # Allowlist/blocklist engine
├── urls/                 # URL reputation (uses fetch)
│   ├── index.ts          # checkUrlReputation() orchestrator
│   ├── safeBrowsing.ts   # Google Safe Browsing API v4 client
│   └── cache.ts          # Abstract cache interface
├── clamav/               # ClamAV TCP client (Node.js net, MTA only)
│   ├── index.ts          # createClamClient() factory
│   ├── client.ts         # clamd INSTREAM protocol implementation
│   └── pool.ts           # Connection pooling
├── types.ts              # Shared types (ContentFlag, ScanResult, etc.)
└── index.ts              # Barrel export (excludes clamav/)

ClamAV is Node.js only

The clamav/ module uses Node.js net for TCP connections and can only be imported from the MTA. The content/, files/, and urls/ modules are pure TS and work in both Convex and Node.js environments.

Key Files

| File | Purpose |
| --- | --- |
| packages/email-scanner/src/content/index.ts | scanContent(): main content scanning orchestrator |
| packages/email-scanner/src/files/index.ts | validateFile(): file validation orchestrator |
| packages/email-scanner/src/urls/index.ts | checkUrlReputation(): URL reputation orchestrator |
| packages/email-scanner/src/clamav/index.ts | createClamClient(): ClamAV client factory |
| apps/api/convex/lib/contentScanner.ts | Thin re-export wrapper from @owlat/email-scanner |
| apps/api/convex/emailWorker.ts | Attachment validation + ClamAV scan before sending |
| apps/api/convex/mediaAssets.ts | Upload file validation (extension + MIME type) |
| apps/api/convex/emails.ts | Content scanning + URL reputation in campaign send flow |
| apps/api/convex/schema.ts | urlReputationCache table, extended contentScanResults |
| apps/api/convex/resendWebhook.ts | Complaint feedback loop integration |
| apps/mta/src/routes/scan.ts | MTA /scan/attachment and /scan/health endpoints |
| apps/mta/docker-compose.yml | ClamAV sidecar configuration |