Email Security | Owlat Docs

Content scanning, attachment validation, URL reputation checking, and malware detection for outbound emails.

All outbound email passes through multi-layered security scanning before delivery. The @owlat/email-scanner package provides all scanning logic as a shared library, consumed by both apps/api (Convex) and apps/mta.

Content Scanning

Analyzes the email subject and HTML body for malicious or unwanted content. All content scanners are pure TypeScript with zero dependencies, which makes them safe to run in the Convex serverless runtime.

Spam Keywords

40+ weighted patterns detect common spam phrases:

Category	Examples	Severity
Financial scams	"free money", "million dollars", "wire transfer"	High (20 pts)
Urgency tactics	"act now", "limited time", "expires today"	Medium (10 pts)
Suspicious phrases	"click here", "no obligation", "satisfaction guaranteed"	Low (3 pts)
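Weighted matching can be pictured as a list of patterns, each carrying the points of its severity tier, tested against the subject and the tag-stripped body and then summed. This is an illustration only: `KeywordRule`, `scoreSpamKeywords`, and the five sample patterns are hypothetical and do not reproduce `spamKeywords.ts`.

```typescript
// Hypothetical rule shape; the real spamKeywords.ts may differ.
interface KeywordRule {
  pattern: RegExp;
  category: string;
  points: number;
}

// A few sample rules mirroring the table above (not the full 40+ list).
const SPAM_RULES: KeywordRule[] = [
  { pattern: /\bfree money\b/i, category: "financial", points: 20 },
  { pattern: /\bwire transfer\b/i, category: "financial", points: 20 },
  { pattern: /\bact now\b/i, category: "urgency", points: 10 },
  { pattern: /\blimited time\b/i, category: "urgency", points: 10 },
  { pattern: /\bclick here\b/i, category: "suspicious", points: 3 },
];

interface KeywordHit {
  category: string;
  match: string;
  points: number;
}

// Tags are replaced with spaces and whitespace is collapsed, so markup such
// as "<b>act</b> now" cannot split a phrase. Each rule contributes at most once.
function scoreSpamKeywords(subject: string, html: string): { score: number; hits: KeywordHit[] } {
  const text = `${subject}\n${html.replace(/<[^>]*>/g, " ")}`.replace(/\s+/g, " ");
  const hits: KeywordHit[] = [];
  for (const rule of SPAM_RULES) {
    const m = text.match(rule.pattern);
    if (m) hits.push({ category: rule.category, match: m[0], points: rule.points });
  }
  return { score: hits.reduce((sum, h) => sum + h.points, 0), hits };
}
```

Because each rule fires at most once here, repeating a phrase does not inflate the score; whether the package counts repeats is not stated in these docs.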

Phishing URL Detection

URL shortener detection (bit.ly, t.co, goo.gl, etc.)
Anchor/href domain mismatch (link text says paypal.com but links to evil.com)
Suspicious URL patterns (IP addresses in URLs, excessive subdomains)
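A rough sketch of these three URL checks follows, assuming links are pulled out of the HTML with a regular expression and parsed with the standard `URL` class. `checkPhishingUrls`, the shortener set, and the `MAX_SUBDOMAIN_LABELS` threshold of 4 are hypothetical; the docs do not say how `phishingUrls.ts` defines "excessive".

```typescript
// Illustrative shortener list; the real phishingUrls.ts list is longer.
const SHORTENERS = new Set(["bit.ly", "t.co", "goo.gl", "tinyurl.com"]);
const MAX_SUBDOMAIN_LABELS = 4; // assumed threshold for "excessive subdomains"

interface UrlFlag {
  type: "shortener" | "anchor_mismatch" | "ip_host" | "excessive_subdomains";
  href: string;
}

// <a ... href="...">text</a>; a regex suffices for a sketch, a real scanner
// may prefer an HTML parser.
const ANCHOR_RE = /<a\b[^>]*\bhref\s*=\s*["']([^"']+)["'][^>]*>([\s\S]*?)<\/a>/gi;
const DOMAIN_IN_TEXT_RE = /\b((?:[a-z0-9-]+\.)+[a-z]{2,})\b/i;

function hostMatches(host: string, domain: string): boolean {
  return host === domain || host.endsWith(`.${domain}`);
}

function checkPhishingUrls(html: string): UrlFlag[] {
  const flags: UrlFlag[] = [];
  for (const [, href, rawText] of html.matchAll(ANCHOR_RE)) {
    let url: URL;
    try {
      url = new URL(href);
    } catch {
      continue; // relative or malformed hrefs are skipped in this sketch
    }
    const host = url.hostname.toLowerCase();
    if (SHORTENERS.has(host)) flags.push({ type: "shortener", href });
    // IPv4 literals, or IPv6 literals (which URL keeps in brackets)
    if (/^\d{1,3}(\.\d{1,3}){3}$/.test(host) || host.startsWith("[")) {
      flags.push({ type: "ip_host", href });
    }
    if (host.split(".").length > MAX_SUBDOMAIN_LABELS) {
      flags.push({ type: "excessive_subdomains", href });
    }
    // If the visible link text names a domain, the href must point at it.
    const text = rawText.replace(/<[^>]*>/g, "").trim();
    const shown = text.match(DOMAIN_IN_TEXT_RE)?.[1].toLowerCase();
    if (shown && !hostMatches(host, shown.replace(/^www\./, ""))) {
      flags.push({ type: "anchor_mismatch", href });
    }
  }
  return flags;
}
```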

Homoglyph / Unicode Spoofing

Detects mixed-script characters used to impersonate legitimate domains:

~50 confusable character mappings (Cyrillic а U+0430 vs Latin a U+0061, Greek ο U+03BF vs Latin o, etc.)
Mixed-script detection in link text and URL hostnames
Severity: high (20 pts) — homoglyph spoofing is a strong phishing indicator
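One way to implement mixed-script detection is to tag each character with its Unicode script using regex escapes such as `\p{Script=Cyrillic}`, flag any token that mixes Latin with Cyrillic or Greek, and then map confusables back to Latin to show what the token impersonates. `detectHomoglyphs`, `toLatinSkeleton`, and the eight-entry `CONFUSABLES` table are hypothetical stand-ins for the roughly 50 mappings in `homoglyphs.ts`.

```typescript
// Small subset of confusables, mapped to the Latin letter they imitate.
const CONFUSABLES: Record<string, string> = {
  "\u0430": "a", // Cyrillic a
  "\u0435": "e", // Cyrillic ie
  "\u043e": "o", // Cyrillic o
  "\u0440": "p", // Cyrillic er
  "\u0441": "c", // Cyrillic es
  "\u0445": "x", // Cyrillic ha
  "\u03b1": "a", // Greek alpha
  "\u03bf": "o", // Greek omicron
};

type Script = "Latin" | "Cyrillic" | "Greek";
const SCRIPT_PATTERNS: [Script, RegExp][] = [
  ["Latin", /\p{Script=Latin}/u],
  ["Cyrillic", /\p{Script=Cyrillic}/u],
  ["Greek", /\p{Script=Greek}/u],
];

function scriptsIn(token: string): Set<Script> {
  const found = new Set<Script>();
  for (const ch of token) {
    for (const [script, re] of SCRIPT_PATTERNS) if (re.test(ch)) found.add(script);
  }
  return found;
}

function toLatinSkeleton(token: string): string {
  return Array.from(token, (ch) => CONFUSABLES[ch] ?? ch).join("");
}

interface HomoglyphFlag {
  token: string;
  looksLike: string;
  severity: "high";
  points: number;
}

// Flags every whitespace- or slash-separated token (a word in link text, or a
// hostname inside a raw href) that mixes Latin with Cyrillic or Greek.
function detectHomoglyphs(text: string): HomoglyphFlag[] {
  const flags: HomoglyphFlag[] = [];
  for (const token of text.split(/[\s/]+/)) {
    if (scriptsIn(token).size > 1) {
      flags.push({ token, looksLike: toLatinSkeleton(token), severity: "high", points: 20 });
    }
  }
  return flags;
}
```

Hostnames need to be checked before punycode conversion: the WHATWG `URL` parser turns a Cyrillic-containing host into an `xn--` label that no longer contains any Cyrillic characters.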

Prohibited Content

Pattern matching for high-severity scam content:

Advance fee fraud patterns ("beneficiary", "next of kin", "unclaimed funds")
Credential phishing ("verify your account", "confirm your password", "update your payment")

Subject Line Analysis

ALL CAPS abuse (>50% uppercase characters)
Excessive punctuation (3+ consecutive ! or ?)
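Both subject checks reduce to short regular expressions. In this hypothetical `analyzeSubject`, the uppercase ratio is computed over letters only and a mixed run such as `?!?` counts as excessive punctuation; both are assumptions, since the docs only say ">50% uppercase characters" and "3+ consecutive ! or ?".

```typescript
type SubjectFlag = "all_caps" | "excessive_punctuation";

function analyzeSubject(subject: string): SubjectFlag[] {
  const flags: SubjectFlag[] = [];
  // Ratio over letters only (assumption); Unicode categories let accented
  // capitals count too. A real implementation may also require a minimum length.
  const letters = subject.match(/\p{L}/gu)?.length ?? 0;
  const upper = subject.match(/\p{Lu}/gu)?.length ?? 0;
  if (letters > 0 && upper / letters > 0.5) flags.push("all_caps");
  // Three or more consecutive '!' or '?' characters, mixed runs included.
  if (/[!?]{3,}/.test(subject)) flags.push("excessive_punctuation");
  return flags;
}
```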

Scoring

All flags contribute to a composite score:

Severity	Points	Example
High	20	Homoglyph spoofing, credential phishing
Medium	10	URL shorteners, spam keywords
Low	3	Excessive punctuation, minor spam phrases

Thresholds:

Score	Level	Action
0–14	Clean	Allowed
15–39	Suspicious	Allowed with warning stored
40+	Blocked	Send rejected
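The points and thresholds combine as shown below. `classify` and `ContentFlagLike` are hypothetical; they are not the package's `ContentFlag` type from `types.ts`, whose fields are not documented here.

```typescript
type Severity = "high" | "medium" | "low";
type Level = "clean" | "suspicious" | "blocked";

// Points per severity, from the table above.
const SEVERITY_POINTS: Record<Severity, number> = { high: 20, medium: 10, low: 3 };

// Hypothetical minimal flag shape for this sketch.
interface ContentFlagLike {
  severity: Severity;
  reason: string;
}

// Sum all flags, then map the total onto the three bands:
// 0-14 clean, 15-39 suspicious, 40+ blocked.
function classify(flags: ContentFlagLike[]): { score: number; level: Level } {
  const score = flags.reduce((sum, f) => sum + SEVERITY_POINTS[f.severity], 0);
  const level: Level = score >= 40 ? "blocked" : score >= 15 ? "suspicious" : "clean";
  return { score, level };
}
```

With these numbers a single high-severity flag (20 points) only marks a message suspicious, two high flags or four medium flags (40 points) block it, and five low flags (15 points) are enough to leave the clean band.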

File Validation

Validates attachments and media uploads before storage or sending. Pure TypeScript, no external dependencies.

Magic Bytes Detection

Identifies the real file type from the binary header (the first 16 bytes), regardless of the file extension:

File Type	Magic Bytes
PE executable (.exe, .dll)	`4D 5A` (MZ)
ELF binary	`7F 45 4C 46`
MSI installer	`D0 CF 11 E0` (OLE compound)
PDF	`25 50 44 46` (%PDF)
PNG	`89 50 4E 47`
JPEG	`FF D8 FF`
ZIP/DOCX/XLSX	`50 4B 03 04` (PK)
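Signature matching is a prefix comparison against the start of the buffer. The function names below are hypothetical. One caveat: `D0 CF 11 E0` is the generic OLE compound-file header, which legacy `.doc` and `.xls` files share with MSI installers, so that signature alone cannot distinguish an installer from an allowed document; this sketch labels it `ole` rather than `msi`.

```typescript
interface Signature {
  type: string;
  bytes: number[];
}

// Signatures from the table above, checked as prefixes of the header.
const SIGNATURES: Signature[] = [
  { type: "pe", bytes: [0x4d, 0x5a] }, // "MZ"
  { type: "elf", bytes: [0x7f, 0x45, 0x4c, 0x46] },
  { type: "ole", bytes: [0xd0, 0xcf, 0x11, 0xe0] }, // MSI, but also .doc/.xls
  { type: "pdf", bytes: [0x25, 0x50, 0x44, 0x46] }, // "%PDF"
  { type: "png", bytes: [0x89, 0x50, 0x4e, 0x47] },
  { type: "jpeg", bytes: [0xff, 0xd8, 0xff] },
  { type: "zip", bytes: [0x50, 0x4b, 0x03, 0x04] }, // "PK", also DOCX/XLSX
];

const EXECUTABLE_TYPES = new Set(["pe", "elf"]);

// Only the first 16 bytes are inspected; the filename is ignored entirely.
function detectFileType(data: Uint8Array): string | null {
  const header = data.subarray(0, 16);
  for (const sig of SIGNATURES) {
    if (sig.bytes.every((b, i) => header[i] === b)) return sig.type;
  }
  return null;
}

// True when the content is a native executable, whatever the extension claims.
function isExecutableContent(data: Uint8Array): boolean {
  const type = detectFileType(data);
  return type !== null && EXECUTABLE_TYPES.has(type);
}
```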

Double Extension Detection

Catches attacks that hide executable extensions after document extensions:

invoice.pdf.exe — detected and blocked
report.docx.js — detected and blocked
photo.jpg.scr — detected and blocked
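A double-extension check only needs the dot-separated segments of the name. `hasDoubleExtension` and its executable-extension set are hypothetical. A plain `setup.exe` is deliberately not flagged by this function, because the extension allowlist already rejects it; trailing dots and spaces are stripped first since Windows ignores them when resolving a filename.

```typescript
// Hypothetical set of extensions treated as executable.
const EXECUTABLE_EXTENSIONS = new Set([
  "exe", "dll", "scr", "com", "bat", "cmd", "msi", "js", "vbs", "ps1", "jar",
]);

function hasDoubleExtension(filename: string): boolean {
  // "a.pdf.exe." would still run as an .exe on Windows, so strip
  // trailing dots and whitespace before splitting.
  const name = filename.trim().replace(/[.\s]+$/, "").toLowerCase();
  const parts = name.split(".");
  // Need a base name plus at least two extensions, e.g. ["invoice", "pdf", "exe"].
  if (parts.length < 3) return false;
  return EXECUTABLE_EXTENSIONS.has(parts[parts.length - 1]);
}
```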

Extension Allowlist

Permitted file types (everything else is blocked):

Category	Extensions
Images	`.jpg`, `.jpeg`, `.png`, `.gif`, `.webp`, `.svg`, `.ico`, `.bmp`, `.tiff`
Documents	`.pdf`, `.doc`, `.docx`, `.odt`, `.rtf`, `.txt`
Spreadsheets	`.xls`, `.xlsx`, `.csv`, `.ods`
Archives	`.zip`, `.gz`, `.tar`

MIME Type Allowlist

Validates declared content types against an allowlist of safe MIME types (e.g., image/*, application/pdf, text/plain).
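Both allowlists reduce to set lookups. The extension set mirrors the table above; the MIME list is an abbreviated example in which only `image/*`, `application/pdf`, and `text/plain` come from these docs and the other entries are assumptions. `extensionAllowed` and `mimeAllowed` are hypothetical names.

```typescript
// Extensions from the allowlist table above.
const ALLOWED_EXTENSIONS = new Set([
  "jpg", "jpeg", "png", "gif", "webp", "svg", "ico", "bmp", "tiff",
  "pdf", "doc", "docx", "odt", "rtf", "txt",
  "xls", "xlsx", "csv", "ods",
  "zip", "gz", "tar",
]);

// Abbreviated MIME allowlist; a "type/*" entry matches the whole top-level type.
const ALLOWED_MIME = ["image/*", "application/pdf", "text/plain", "text/csv", "application/zip"];

function extensionAllowed(filename: string): boolean {
  const dot = filename.lastIndexOf(".");
  if (dot <= 0) return false; // no extension, or a dotfile such as ".env"
  return ALLOWED_EXTENSIONS.has(filename.slice(dot + 1).toLowerCase());
}

function mimeAllowed(contentType: string): boolean {
  // Drop parameters such as "; charset=utf-8" before comparing.
  const mime = contentType.split(";")[0].trim().toLowerCase();
  return ALLOWED_MIME.some((rule) =>
    rule.endsWith("/*") ? mime.startsWith(rule.slice(0, -1)) : mime === rule,
  );
}
```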

Integration Points

emailWorker.ts — validates each attachment buffer before sending
mediaAssets.ts — validates uploads before storing to Convex file storage

URL Reputation

Checks URLs in email content against the Google Safe Browsing API v4.

How It Works

Extract all URLs from the email HTML content
Normalize and hash URLs (SHA-256)
Check cache (urlReputationCache table) for known verdicts
Batch-check uncached URLs against Safe Browsing API (up to 500 per request)
Cache results (24h for clean, 1h for flagged)
Convert flagged URLs to ContentFlag entries with severity 'high'
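The batching, request, and cache-lifetime parts of this flow might look like the sketch below, written against the public Safe Browsing v4 `threatMatches:find` endpoint. URL extraction, normalization, SHA-256 hashing, and the `urlReputationCache` lookup are omitted; `findThreats`, `buildFindRequest`, `cacheExpiry`, and the client identity values are hypothetical.

```typescript
// Threat types from the table that follows; ANY_PLATFORM and URL are the
// v4 enum values for a platform-agnostic URL lookup.
const THREAT_TYPES = [
  "MALWARE",
  "SOCIAL_ENGINEERING",
  "UNWANTED_SOFTWARE",
  "POTENTIALLY_HARMFUL_APPLICATION",
];
const MAX_URLS_PER_REQUEST = 500; // threatEntries per request
const CLEAN_TTL_MS = 24 * 60 * 60 * 1000; // clean verdicts cached 24h
const FLAGGED_TTL_MS = 60 * 60 * 1000; // flagged verdicts cached 1h

function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Request body for POST /v4/threatMatches:find.
function buildFindRequest(urls: string[]) {
  return {
    client: { clientId: "example-client", clientVersion: "0.1.0" }, // placeholder identity
    threatInfo: {
      threatTypes: THREAT_TYPES,
      platformTypes: ["ANY_PLATFORM"],
      threatEntryTypes: ["URL"],
      threatEntries: urls.map((url) => ({ url })),
    },
  };
}

// Expiry timestamp for a cache entry, following the 24h/1h policy above.
function cacheExpiry(flagged: boolean, now: number = Date.now()): number {
  return now + (flagged ? FLAGGED_TTL_MS : CLEAN_TTL_MS);
}

interface FindResponse {
  matches?: { threatType: string; threat: { url: string } }[];
}

// Maps each flagged URL to the threat type reported for it.
async function findThreats(urls: string[], apiKey: string): Promise<Map<string, string>> {
  const flagged = new Map<string, string>();
  for (const batch of chunk(urls, MAX_URLS_PER_REQUEST)) {
    const res = await fetch(
      `https://safebrowsing.googleapis.com/v4/threatMatches:find?key=${encodeURIComponent(apiKey)}`,
      {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(buildFindRequest(batch)),
      },
    );
    if (!res.ok) throw new Error(`Safe Browsing request failed: HTTP ${res.status}`);
    const data = (await res.json()) as FindResponse; // {} when nothing matched
    for (const m of data.matches ?? []) flagged.set(m.threat.url, m.threatType);
  }
  return flagged;
}
```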

Threat Types

Threat	Description
`MALWARE`	Sites hosting malicious software
`SOCIAL_ENGINEERING`	Phishing and deceptive sites
`UNWANTED_SOFTWARE`	Sites distributing unwanted software
`POTENTIALLY_HARMFUL_APPLICATION`	Mobile app threats

Campaign vs Transactional

Email Type	Behavior
Campaign sends	Blocking gate — flagged URLs prevent the campaign from sending
Transactional sends	Graceful — flags stored for review, delivery not blocked

Configuration

Requires the GOOGLE_SAFE_BROWSING_API_KEY environment variable set in the Convex dashboard. The free tier allows 10,000 requests per day. When the API key is not configured, URL reputation checking is silently skipped.

ClamAV Malware Scanning

Scans attachment binary data for known malware signatures using ClamAV running as a Docker sidecar alongside the MTA.

Architecture

Convex emailWorker
    │
    │  POST /scan/attachment
    │  (binary data + X-Filename header)
    ▼
MTA scan endpoint (src/routes/scan.ts)
    │
    ├─ File type validation (magic bytes, extensions)
    │
    └─ ClamAV scan (TCP INSTREAM protocol)
        │
        ▼
    clamd (port 3310)

INSTREAM Protocol

The @owlat/email-scanner ClamAV client communicates with clamd over TCP using the INSTREAM protocol:

Send zINSTREAM\0
Send chunked binary data (4-byte big-endian length prefix + data)
Send zero-length terminator
Read verdict: stream: OK\0 or stream: <virus_name> FOUND\0
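A minimal INSTREAM client built on Node's `net` module is sketched below. It is not the package's `client.ts`, which also pools connections; `encodeInstream`, `parseVerdict`, `scanBuffer`, and the 64 KiB chunk size are assumptions. Outside an `IDSESSION`, clamd sends one reply and closes the connection, which is why the sketch resolves on the socket's `end` event.

```typescript
import * as net from "node:net";

// Frame a payload for INSTREAM: command, then chunks each prefixed with a
// 4-byte big-endian length, then a zero-length terminator.
function encodeInstream(data: Uint8Array, chunkSize = 64 * 1024): Buffer {
  const parts: Buffer[] = [Buffer.from("zINSTREAM\0")];
  for (let off = 0; off < data.length; off += chunkSize) {
    const chunk = data.subarray(off, off + chunkSize);
    const len = Buffer.alloc(4);
    len.writeUInt32BE(chunk.length, 0);
    parts.push(len, Buffer.from(chunk));
  }
  parts.push(Buffer.alloc(4)); // zero-length terminator
  return Buffer.concat(parts);
}

type Verdict = { clean: true } | { clean: false; virus: string };

// Parse "stream: OK\0" or "stream: <virus_name> FOUND\0"; anything else throws.
function parseVerdict(reply: string): Verdict {
  const text = reply.replace(/\0+$/, "").trim();
  if (text === "stream: OK") return { clean: true };
  const m = /^stream: (.+) FOUND$/.exec(text);
  if (m) return { clean: false, virus: m[1] };
  throw new Error(`Unexpected clamd reply: ${text}`);
}

// Send one buffer to clamd and resolve with its verdict.
function scanBuffer(data: Uint8Array, host = "127.0.0.1", port = 3310, timeoutMs = 10_000): Promise<Verdict> {
  return new Promise((resolve, reject) => {
    const socket = net.createConnection({ host, port });
    const replies: Buffer[] = [];
    socket.setTimeout(timeoutMs, () => socket.destroy(new Error("clamd timeout")));
    socket.on("connect", () => socket.write(encodeInstream(data)));
    socket.on("data", (buf: Buffer) => replies.push(buf));
    socket.on("end", () => {
      try {
        resolve(parseVerdict(Buffer.concat(replies).toString("utf8")));
      } catch (err) {
        reject(err);
      }
    });
    socket.on("error", reject);
  });
}
```

If the total stream exceeds clamd's `StreamMaxLength` setting, clamd replies with a size-limit error instead of a verdict, which `parseVerdict` surfaces as an exception.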

Fail-Open Design

If ClamAV is unavailable (container not running, network error, timeout), the scan returns { clean: true, skipped: true } and logs a warning. This prevents ClamAV outages from blocking all email delivery.
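Fail-open behavior amounts to catching every transport failure, including a timeout, and converting it into a skipped result, while a genuine detection is returned as a normal value and still blocks. `scanFailOpen`, `withTimeout`, and the 10-second default are hypothetical.

```typescript
interface ScanOutcome {
  clean: boolean;
  skipped?: boolean;
  virus?: string;
}

// Race a promise against a timer so a hung connection cannot stall sends.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([p, timeout]).finally(() => clearTimeout(timer));
}

// Any failure (refused connection, network error, timeout) is treated as
// "not scanned" rather than "infected". Detections are returned, not thrown.
async function scanFailOpen(
  scan: () => Promise<ScanOutcome>,
  timeoutMs = 10_000,
  log: (msg: string) => void = console.warn,
): Promise<ScanOutcome> {
  try {
    return await withTimeout(scan(), timeoutMs);
  } catch (err) {
    log(`ClamAV unavailable, skipping scan: ${err instanceof Error ? err.message : String(err)}`);
    return { clean: true, skipped: true };
  }
}
```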

Health Check

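GET /scan/health returns the ClamAV connection status: connected (with the clamd version) when clamd is reachable, or unavailable (with the error) when it is not: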

{ "clamav": "connected", "version": "ClamAV 1.3.0" }

{ "clamav": "unavailable", "error": "Connection refused" }

Configuration

See MTA System > ClamAV Sidecar for Docker setup and environment variables.

Feedback Loops

Spam complaints from ISP feedback loops are linked back to campaign content scan results:

Resend/MTA webhook delivers a complaint event
resendWebhook.ts processes the complaint and looks up the campaign's content scan result
Complaint count is incremented on the contentScanResults record
This data enables future pattern learning and complaint rate tracking per campaign

Integration Summary

Scanner	Where Called	Blocking?	On Failure
Content (spam + homoglyphs)	`emails.ts`, `transactionalEmails.ts`	Yes	N/A (pure TS, always runs)
File type validation	`emailWorker.ts`, `mediaAssets.ts`	Yes	Block (safe default)
URL reputation (Safe Browsing)	`emails.ts`	Campaigns: yes, Transactional: no	Allow, skip silently
ClamAV malware scan	`emailWorker.ts` via MTA `/scan/attachment`	Yes	Allow, log warning (fail-open)

Package Structure

packages/email-scanner/src/
├── content/              # Content analysis (pure TS, Convex-safe)
│   ├── index.ts          # scanContent() orchestrator
│   ├── spamKeywords.ts   # 40+ weighted spam patterns
│   ├── phishingUrls.ts   # URL shorteners, anchor/href mismatch
│   ├── homoglyphs.ts     # Unicode spoofing detection
│   ├── prohibitedContent.ts  # Advance fee fraud, credential phishing
│   └── subjectAnalysis.ts    # ALL CAPS, excessive punctuation
├── files/                # File type validation (pure TS)
│   ├── index.ts          # validateFile() orchestrator
│   ├── magicBytes.ts     # Binary header detection
│   ├── doubleExtension.ts    # invoice.pdf.exe detection
│   └── filePolicy.ts     # Allowlist/blocklist engine
├── urls/                 # URL reputation (uses fetch)
│   ├── index.ts          # checkUrlReputation() orchestrator
│   ├── safeBrowsing.ts   # Google Safe Browsing API v4 client
│   └── cache.ts          # Abstract cache interface
├── clamav/               # ClamAV TCP client (Node.js net, MTA only)
│   ├── index.ts          # createClamClient() factory
│   ├── client.ts         # clamd INSTREAM protocol implementation
│   └── pool.ts           # Connection pooling
├── types.ts              # Shared types (ContentFlag, ScanResult, etc.)
└── index.ts              # Barrel export (excludes clamav/)

ClamAV is Node.js only

The clamav/ module uses Node.js net for TCP connections and is only importable from the MTA. The content/, files/, and urls/ modules are pure TS and work in both Convex and Node.js environments.

Key Files

File	Purpose
`packages/email-scanner/src/content/index.ts`	`scanContent()` — main content scanning orchestrator
`packages/email-scanner/src/files/index.ts`	`validateFile()` — file validation orchestrator
`packages/email-scanner/src/urls/index.ts`	`checkUrlReputation()` — URL reputation orchestrator
`packages/email-scanner/src/clamav/index.ts`	`createClamClient()` — ClamAV client factory
`apps/api/convex/lib/contentScanner.ts`	Thin re-export wrapper from `@owlat/email-scanner`
`apps/api/convex/emailWorker.ts`	Attachment validation + ClamAV scan before sending
`apps/api/convex/mediaAssets.ts`	Upload file validation (extension + MIME type)
`apps/api/convex/emails.ts`	Content scanning + URL reputation in campaign send flow
`apps/api/convex/schema.ts`	`urlReputationCache` table, extended `contentScanResults`
`apps/api/convex/resendWebhook.ts`	Complaint feedback loop integration
`apps/mta/src/routes/scan.ts`	MTA `/scan/attachment` and `/scan/health` endpoints
`apps/mta/docker-compose.yml`	ClamAV sidecar configuration