Improve Maildir performance for large messages #479

Open
supfeer wants to merge 6 commits into mailhog:master from supfeer:performance-boost

supfeer commented May 22, 2026

Summary

This PR improves Maildir-backed MailHog performance for large messages and long-running instances.

The main change is to keep the existing /api/v1 and /api/v2 contracts compatible, while adding optimized /api/v3 endpoints for the web UI and high-volume polling use cases:

  • /api/v1 and /api/v2 continue returning full legacy message payloads for API clients.
  • /api/v3/messages and /api/v3/search return compact metadata only.
  • /api/v3/messages/{id} returns a bounded preview for large messages.
  • /api/v3/messages/{id}/body returns body preview chunks for progressive UI loading.
  • /api/v3/messages/{id}/download streams the original message without loading the whole message into memory.
  • DELETE /api/v3/messages?older_than=... deletes messages older than a relative age.
  • DELETE /api/v3/messages?created_before=... deletes messages created before an absolute cutoff.

The web UI now uses /api/v3 for list/search/websocket/preview/download paths, so large attachments do not inflate list/search responses.

Why

Polling workloads that wait for test emails can generate many repeated list/search requests. With large messages or attachments in Maildir, the previous behavior could repeatedly read and serialize full message bodies and MIME payloads for list/search responses. That makes responses much larger than the data needed for polling and puts pressure on memory in long-running containers.

What Changed

  • Build Docker images from the current checkout instead of go install github.com/mailhog/MailHog@latest.
  • Add streaming Maildir writes for SMTP DATA so large messages do not need to be retained in memory during storage.
  • Add a Maildir metadata cache with background refresh for list/search.
  • Add compact Maildir list/search entries while preserving full loads where legacy APIs require them.
  • Add bounded message preview, body chunk preview, and streaming download storage interfaces.
  • Add API v3 compact endpoints and keep v1/v2 compatibility by hydrating compact storage entries before returning legacy responses.
  • Add old-message deletion filters on v3:
    • DELETE /api/v3/messages?older_than=1h
    • DELETE /api/v3/messages?older_than=1d
    • DELETE /api/v3/messages?older_than=1w
    • DELETE /api/v3/messages?created_before=2026-05-22T12:34:56Z
    • DELETE /api/v3/messages?created_before=1700000000000
  • Update embedded UI assets to use v3 list/search/websocket/preview/download paths and show loading state while previews load.
  • Add GitHub Actions CI:
    • gofmt check
    • first-party Go tests in GOPATH mode
    • Docker build
    • 600MB memory-limit performance smoke
    • GHCR publish on master and v* tags

Before And After Examples

Before: list/search could include large raw payloads

Legacy list/search responses could include full message content that is not needed for polling:

{
  "total": 50,
  "count": 50,
  "items": [
    {
      "ID": "message-id",
      "Raw": {
        "From": "...",
        "To": ["..."],
        "Data": "full SMTP DATA including large MIME attachments"
      },
      "Content": {
        "Headers": {
          "Subject": ["Large attachment"]
        },
        "Body": "full body / MIME payload",
        "Size": 104857600
      },
      "MIME": {
        "Parts": ["large parsed MIME tree"]
      }
    }
  ]
}

That shape is still available from /api/v1 and /api/v2 for compatibility.

After: compact v3 list/search for polling and UI

GET /api/v3/messages?limit=50
GET /api/v3/search?kind=to&query=user@example.com&limit=50
{
  "total": 23,
  "count": 23,
  "start": 0,
  "items": [
    {
      "ID": "message-id",
      "From": {
        "Mailbox": "sender",
        "Domain": "example.com"
      },
      "To": [
        {
          "Mailbox": "user",
          "Domain": "example.com"
        }
      ],
      "Created": "2026-05-22T12:34:56Z",
      "Content": {
        "Headers": {
          "Subject": ["Large attachment"]
        },
        "Size": 104857600
      }
    }
  ]
}

The v3 compact contract intentionally omits:

  • Raw
  • Content.Body
  • MIME
  • attachment payloads
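
For API clients that poll for test emails, a minimal Go sketch of decoding the compact shape could look like the following. The JSON field names come from the example response above; the type names (Address, CompactMessage, CompactList) and the helpers decodeCompactList and findBySubject are illustrative and are not part of this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Address mirrors the From/To objects in the compact v3 example.
type Address struct {
	Mailbox string
	Domain  string
}

// CompactMessage is a hypothetical client-side type for one list/search item.
// It has no Raw, Content.Body, or MIME fields because v3 omits them.
type CompactMessage struct {
	ID      string
	From    Address
	To      []Address
	Created string // kept as a string; the example shows an RFC 3339 timestamp
	Content struct {
		Headers map[string][]string
		Size    int64
	}
}

// CompactList is a hypothetical envelope type for list/search responses.
type CompactList struct {
	Total int              `json:"total"`
	Count int              `json:"count"`
	Start int              `json:"start"`
	Items []CompactMessage `json:"items"`
}

// decodeCompactList parses a compact v3 list or search response body.
func decodeCompactList(data []byte) (CompactList, error) {
	var list CompactList
	err := json.Unmarshal(data, &list)
	return list, err
}

// findBySubject returns the ID of the first item whose Subject header
// contains substr, which is the typical check in a polling test.
func findBySubject(list CompactList, substr string) (string, bool) {
	for _, m := range list.Items {
		for _, s := range m.Content.Headers["Subject"] {
			if strings.Contains(s, substr) {
				return m.ID, true
			}
		}
	}
	return "", false
}

// sample is the example response above, reduced to a single item.
const sample = `{"total":1,"count":1,"start":0,"items":[{"ID":"message-id",
"From":{"Mailbox":"sender","Domain":"example.com"},
"To":[{"Mailbox":"user","Domain":"example.com"}],
"Created":"2026-05-22T12:34:56Z",
"Content":{"Headers":{"Subject":["Large attachment"]},"Size":104857600}}]}`

func main() {
	list, err := decodeCompactList([]byte(sample))
	if err != nil {
		panic(err)
	}
	id, ok := findBySubject(list, "Large")
	fmt.Println(id, ok, list.Items[0].Content.Size)
}
```

A polling client would call decodeCompactList on each response and only fetch the preview or download endpoint once findBySubject reports a match.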

After: progressive body preview

GET /api/v3/messages/{id}/body?offset=0&limit=1048576
{
  "ID": "message-id",
  "Content": {
    "Headers": {
      "Content-Type": ["text/html; charset=utf-8"]
    },
    "Body": "<preview chunk up to the requested limit>",
    "Size": 12582912
  },
  "Offset": 0,
  "NextOffset": 1048576,
  "Limit": 1048576,
  "MaxSize": 10485760,
  "HasMore": true,
  "Truncated": false,
  "Source": "mime"
}
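
As a rough illustration of how a client might consume this endpoint, the Go sketch below follows NextOffset until HasMore is false. It assumes that each chunk's Content.Body continues where the previous chunk ended; that assumption, the BodyChunk type, and the helper names (chunkFetcher, httpFetcher, readPreview, fakeFetcher) are illustrative and not taken from the PR code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// BodyChunk mirrors the fields the example response needs for paging.
type BodyChunk struct {
	ID      string
	Content struct {
		Body string
		Size int64
	}
	Offset     int64
	NextOffset int64
	Limit      int64
	MaxSize    int64
	HasMore    bool
	Truncated  bool
	Source     string
}

// chunkFetcher abstracts one GET .../body?offset=&limit= round trip.
type chunkFetcher func(offset, limit int64) (BodyChunk, error)

// httpFetcher builds a fetcher for a real server (base URL is caller-supplied).
func httpFetcher(baseURL, id string) chunkFetcher {
	return func(offset, limit int64) (BodyChunk, error) {
		var chunk BodyChunk
		u := fmt.Sprintf("%s/api/v3/messages/%s/body?offset=%d&limit=%d",
			baseURL, url.PathEscape(id), offset, limit)
		resp, err := http.Get(u)
		if err != nil {
			return chunk, err
		}
		defer resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			return chunk, fmt.Errorf("unexpected status %s", resp.Status)
		}
		err = json.NewDecoder(resp.Body).Decode(&chunk)
		return chunk, err
	}
}

// readPreview concatenates chunks until the server reports HasMore=false.
// The returned bool is the Truncated flag of the last chunk.
func readPreview(fetch chunkFetcher, limit int64) (string, bool, error) {
	var sb strings.Builder
	offset := int64(0)
	for {
		chunk, err := fetch(offset, limit)
		if err != nil {
			return sb.String(), false, err
		}
		sb.WriteString(chunk.Content.Body)
		if !chunk.HasMore {
			return sb.String(), chunk.Truncated, nil
		}
		if chunk.NextOffset <= offset {
			// Guard against looping forever on a non-advancing offset.
			return sb.String(), chunk.Truncated,
				fmt.Errorf("offset did not advance past %d", offset)
		}
		offset = chunk.NextOffset
	}
}

// fakeFetcher serves chunks of an in-memory body so the loop runs offline.
func fakeFetcher(body string) chunkFetcher {
	return func(offset, limit int64) (BodyChunk, error) {
		var c BodyChunk
		end := offset + limit
		if end > int64(len(body)) {
			end = int64(len(body))
		}
		c.Content.Body = body[offset:end]
		c.Offset, c.NextOffset, c.Limit = offset, end, limit
		c.HasMore = end < int64(len(body))
		return c, nil
	}
}

func main() {
	text, truncated, err := readPreview(fakeFetcher("hello progressive preview"), 8)
	fmt.Printf("%q truncated=%v err=%v\n", text, truncated, err)
}
```

Against a running instance, replace fakeFetcher with httpFetcher(baseURL, id). A UI would normally render each chunk as it arrives rather than concatenating them first.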

After: streaming download

GET /api/v3/messages/{id}/download

The response is streamed as message/rfc822 from storage, without loading the full message into memory.
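
On the client side the same property can be kept by copying the response straight into its destination. The Go sketch below is illustrative only: downloadMessage, stubTransport, and demo are hypothetical names, and the stub transport stands in for a MailHog server so the example runs without a network.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"
)

// downloadMessage streams GET /api/v3/messages/{id}/download into dst.
// io.Copy moves the body through a small fixed buffer, so the client
// never holds the whole message in memory either.
func downloadMessage(client *http.Client, baseURL, id string, dst io.Writer) (int64, error) {
	endpoint := baseURL + "/api/v3/messages/" + url.PathEscape(id) + "/download"
	resp, err := client.Get(endpoint)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("download %s: unexpected status %s", id, resp.Status)
	}
	return io.Copy(dst, resp.Body)
}

// stubTransport is a test double that answers every request locally.
type stubTransport struct {
	status int
	body   string
}

func (s stubTransport) RoundTrip(r *http.Request) (*http.Response, error) {
	return &http.Response{
		StatusCode: s.status,
		Status:     fmt.Sprintf("%d %s", s.status, http.StatusText(s.status)),
		Header:     http.Header{"Content-Type": []string{"message/rfc822"}},
		Body:       io.NopCloser(strings.NewReader(s.body)),
		Request:    r,
	}, nil
}

// demo wires downloadMessage to the stub and discards the bytes.
// The host name is a placeholder; the stub never resolves it.
func demo(status int, body string) (int64, error) {
	client := &http.Client{Transport: stubTransport{status: status, body: body}}
	return downloadMessage(client, "http://mailhog.invalid", "message-id", io.Discard)
}

func main() {
	n, err := demo(http.StatusOK, "Subject: hi\r\n\r\nbody\r\n")
	fmt.Println(n, err)
}
```

In real use, dst would typically be an *os.File and client would be a plain http.Client with a timeout suited to the largest expected message.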

After: deletion filters

Relative age:

DELETE /api/v3/messages?older_than=1h
DELETE /api/v3/messages?older_than=24h
DELETE /api/v3/messages?older_than=1d
DELETE /api/v3/messages?older_than=7d
DELETE /api/v3/messages?older_than=1w

Absolute cutoff:

DELETE /api/v3/messages?created_before=2026-05-22T12:34:56Z
DELETE /api/v3/messages?created_before=1700000000000

older_than intentionally does not accept timestamps; timestamp-like values belong in created_before.
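
To make the split between the two parameters concrete, here is a minimal Go sketch of a relative-age parser. It is not the parser from this PR: parseOlderThan is a hypothetical name, and it assumes only the h, d, and w units shown in the examples above. Go's time.ParseDuration does not accept d or w, which is why the unit is handled by hand.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// maxDuration is the largest value a time.Duration can hold.
const maxDuration = time.Duration(1<<63 - 1)

// parseOlderThan converts values such as "1h", "7d", or "1w" to a duration.
// Anything without a known unit suffix (RFC 3339 timestamps, bare epoch
// numbers) is rejected so that it can be sent as created_before instead.
func parseOlderThan(s string) (time.Duration, error) {
	if len(s) < 2 {
		return 0, fmt.Errorf("older_than %q: want <number><unit>", s)
	}
	var unit time.Duration
	switch s[len(s)-1] {
	case 'h':
		unit = time.Hour
	case 'd':
		unit = 24 * time.Hour
	case 'w':
		unit = 7 * 24 * time.Hour
	default:
		return 0, fmt.Errorf("older_than %q: unknown unit; timestamps belong in created_before", s)
	}
	n, err := strconv.ParseInt(s[:len(s)-1], 10, 64)
	if err != nil || n <= 0 {
		return 0, fmt.Errorf("older_than %q: count must be a positive integer", s)
	}
	if n > int64(maxDuration/unit) {
		return 0, fmt.Errorf("older_than %q: value too large", s)
	}
	return time.Duration(n) * unit, nil
}

func main() {
	for _, v := range []string{"1h", "24h", "1d", "7d", "1w", "2026-05-22T12:34:56Z", "1700000000000"} {
		d, err := parseOlderThan(v)
		fmt.Println(v, d, err)
	}
}
```

With this shape, the deletion cutoff for older_than is the current time minus the parsed duration, while created_before supplies the cutoff directly.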

Compatibility Notes

  • Existing /api/v1/messages behavior is preserved.
  • Existing /api/v2/messages, /api/v2/search, and /api/v2/websocket continue returning full legacy messages.
  • Optimized compact responses are introduced under /api/v3.
  • The web UI intentionally uses /api/v3.

Docker Image Naming

The workflow publishes to:

ghcr.io/${{ github.repository_owner }}/mailhog

For the upstream repository this resolves to:

ghcr.io/mailhog/mailhog

The workflow also publishes tags for the branch/ref name, the Git tag, latest (for master), and sha-<commit>.

Local Action Log

Implemented and split into commits:

9ec2255 Build Docker image from checkout
517ee72 Stream and cache Maildir messages
122220c Add API v3 compact message endpoints
cf1aac2 Update UI to use API v3 previews
6c25579 Add CI pipeline and performance smoke

Key implementation steps:

Created branch performance-boost
Updated Dockerfile and .dockerignore so Docker builds the current checkout
Added streaming Maildir message writer and cache refresh loop
Added Maildir preview, body chunk, streaming download, and delete-old support
Kept v1/v2 full-message API compatibility through hydration helpers
Added v3 compact API for list/search/websocket/preview/body/download/delete
Updated embedded UI assets to use v3 endpoints
Added GitHub Actions CI and stdlib Go performance smoke runner
Committed the changes as supfeer <supfeer@gmail.com>

Validation Log

GOPATH and unit test validation:

GOPATH=/tmp/mailhog-ci-verify GO111MODULE=off go test -timeout=60s ./...
?    github.com/mailhog/MailHog                                      [no test files]
?    github.com/mailhog/MailHog/config                               [no test files]
?    github.com/mailhog/MailHog/test/perf                            [no test files]
ok   github.com/mailhog/MailHog/vendor/github.com/mailhog/storage
ok   github.com/mailhog/MailHog/vendor/github.com/mailhog/MailHog-Server/api
?    github.com/mailhog/MailHog/vendor/github.com/mailhog/smtp        [no test files]
ok   github.com/mailhog/MailHog/vendor/github.com/mailhog/MailHog-Server/smtp

Docker build:

docker build -t mailhog-ci-test .
...
naming to docker.io/library/mailhog-ci-test done

600MB memory-limit performance smoke:

docker run -d --name mailhog-ci-smoke \
  --memory=600m \
  --memory-swap=600m \
  -p 127.0.0.1:18027:8025 \
  -p 127.0.0.1:11027:1025 \
  -e MH_STORAGE=maildir \
  -e MH_MAILDIR_PATH=/home/mailhog/maildir \
  -v mailhog-ci-smoke-maildir:/home/mailhog/maildir \
  mailhog-ci-test

MAILHOG_HTTP_URL=http://127.0.0.1:18027 \
MAILHOG_SMTP_ADDR=127.0.0.1:11027 \
MAILHOG_SMOKE_MAX_COMPACT_BYTES=2097152 \
go run ./test/perf/smoke.go

Smoke output:

sent dataset: small=20 large=3
validated compact list response: messages=23 total=23 bytes=13414
validated compact search response: messages=20 bytes=11563
validated body preview: id=... bodyBytes=48 hasMore=false truncated=false source=mime
load complete: sent=100 counts=map[body:25 download:25 list:25 search:25]
perf smoke passed

Observed memory sample during smoke:

mailhog-ci-smoke 135MiB / 600MiB

Also checked:

git diff --check
gofmt -l main.go config test/perf vendor/github.com/mailhog

Both passed with no reported issues.

supfeer marked this pull request as ready for review May 22, 2026 19:49
Comment thread test/perf/smoke.go
resp, err := client.Get(endpoint(cfg.HTTPURL, "/api/v3/messages?limit=1"))
if err == nil {
	resp.Body.Close()
	if resp.StatusCode >= 200 && resp.StatusCode < 500 {

The readiness check passes on any status from 200 to 499, including 403 and 404, which can mean the server is up but the API route is not registered yet. Restrict the condition to 2xx.
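
Something along these lines would do it. This is a sketch, not a drop-in patch for smoke.go: isReady and waitForReady are hypothetical names, and the context-based polling loop is my assumption about how the wait could be structured.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// isReady reports whether a status code counts as "API is serving".
// Only 2xx qualifies; 403/404 mean the listener is up but the route is not.
func isReady(status int) bool {
	return status >= 200 && status < 300
}

// waitForReady polls url until it answers with a 2xx status or ctx ends.
func waitForReady(ctx context.Context, client *http.Client, url string, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		if err != nil {
			return err
		}
		resp, err := client.Do(req)
		if err == nil {
			resp.Body.Close()
			if isReady(resp.StatusCode) {
				return nil
			}
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("server not ready: %w", ctx.Err())
		case <-ticker.C:
			// Try again on the next tick.
		}
	}
}

func main() {
	for _, status := range []int{200, 204, 403, 404, 500} {
		fmt.Println(status, isReady(status))
	}
}
```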

Comment thread test/perf/smoke.go
b.WriteString("Content-Transfer-Encoding: base64\r\n")
b.WriteString("Content-Disposition: attachment; filename=\"large.bin\"\r\n")
b.WriteString("\r\n")

// This allocates the full raw payload (~8 MiB) and then a separate base64-encoded
// slice (~11 MiB) per message. Since the raw data is just one repeated byte, we don't
// need to materialise it at all — a repeatReader + io.LimitedReader + base64.NewEncoder
// streaming into a lineWrapper gets this down to ~32 KiB regardless of attachment size.
raw := bytes.Repeat([]byte{byte('A' + index%26)}, attachmentBytes)
encoded := make([]byte, base64.StdEncoding.EncodedLen(len(raw)))
base64.StdEncoding.Encode(encoded, raw)
writeWrappedBase64(&b, encoded)
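
A rough sketch of the streaming variant is below. The names repeatReader and lineWrapper are the ones suggested above; writeBase64Attachment and encodeToString are hypothetical helpers, and the 76-column width with no CRLF after a partial last line is an assumption that may differ from what writeWrappedBase64 does today.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"io"
	"strings"
)

// repeatReader yields the same byte forever; pair it with io.LimitReader.
type repeatReader byte

func (r repeatReader) Read(p []byte) (int, error) {
	for i := range p {
		p[i] = byte(r)
	}
	return len(p), nil
}

// lineWrapper writes CRLF after every width bytes passed through it.
type lineWrapper struct {
	w     io.Writer
	width int
	col   int // bytes written on the current line
}

func (lw *lineWrapper) Write(p []byte) (int, error) {
	written := 0
	for len(p) > 0 {
		n := lw.width - lw.col
		if n > len(p) {
			n = len(p)
		}
		m, err := lw.w.Write(p[:n])
		written += m
		lw.col += m
		if err != nil {
			return written, err
		}
		p = p[n:]
		if lw.col == lw.width {
			if _, err := io.WriteString(lw.w, "\r\n"); err != nil {
				return written, err
			}
			lw.col = 0
		}
	}
	return written, nil
}

// writeBase64Attachment streams size copies of fill, base64-encoded and
// line-wrapped, into dst without materialising the raw or encoded payload.
func writeBase64Attachment(dst io.Writer, fill byte, size int64) error {
	enc := base64.NewEncoder(base64.StdEncoding, &lineWrapper{w: dst, width: 76})
	if _, err := io.Copy(enc, io.LimitReader(repeatReader(fill), size)); err != nil {
		return err
	}
	return enc.Close() // flushes the final partial group and padding
}

// encodeToString is a convenience wrapper used only for the demo below.
func encodeToString(fill byte, size int64) (string, error) {
	var sb strings.Builder
	err := writeBase64Attachment(&sb, fill, size)
	return sb.String(), err
}

func main() {
	s, err := encodeToString('A', 4)
	fmt.Printf("%q %v\n", s, err)
}
```

In buildLargeMessage the call would become something like writeBase64Attachment(&b, byte('A'+index%26), int64(attachmentBytes)). Note that b itself still holds the encoded message unless the SMTP DATA writer is also fed as a stream.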

Comment thread test/perf/smoke.go
return checkDownload(client, cfg, smallID)
}},
}

counts and firstErr share the same mutex mu, but they have different
access patterns — firstErr is checked on every tick in the main loop
(under mu.Lock), while counts is only read after wg.Wait().
Consider a separate mutex or atomic for firstErr to avoid the main
loop contending with workers on every tick.
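
For example, a small first-error holder built on sync.Once and an atomic flag keeps the per-tick check off mu entirely. This is a sketch under assumptions: firstError and runWorkers are hypothetical names, and the worker body is a stand-in for the real list/search/body/download calls.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// firstError records the first non-nil error and lets readers poll it
// without taking the mutex that guards the counters.
type firstError struct {
	once sync.Once
	flag int32 // set to 1 after err has been written
	err  error
}

// Set stores err if it is the first non-nil error seen.
func (f *firstError) Set(err error) {
	if err == nil {
		return
	}
	f.once.Do(func() {
		f.err = err
		atomic.StoreInt32(&f.flag, 1) // publish after the write to f.err
	})
}

// Get returns the recorded error, or nil if none has been set yet.
func (f *firstError) Get() error {
	if atomic.LoadInt32(&f.flag) == 0 {
		return nil
	}
	return f.err
}

// runWorkers starts n goroutines; worker 3 fails, the rest bump a counter.
// counts stays under mu, while the error path never touches mu.
func runWorkers(n int) (map[string]int, error) {
	var (
		mu     sync.Mutex
		counts = map[string]int{}
		ferr   firstError
		wg     sync.WaitGroup
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			if i == 3 {
				ferr.Set(fmt.Errorf("worker %d failed", i))
				return
			}
			mu.Lock()
			counts["list"]++
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	return counts, ferr.Get()
}

func main() {
	counts, err := runWorkers(8)
	fmt.Println(counts, err)
}
```

The main loop can then call ferr.Get() on every tick, and counts is read only after wg.Wait() as it is today.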
