Improve Maildir performance for large messages by supfeer · Pull Request #479 · mailhog/MailHog

supfeer · 2026-05-22T18:38:27Z

Summary

This PR improves Maildir-backed MailHog performance for large messages and long-running instances.

The main change is to keep the existing /api/v1 and /api/v2 contracts compatible, while adding optimized /api/v3 endpoints for the web UI and high-volume polling use cases:

/api/v1 and /api/v2 continue returning full legacy message payloads for API clients.
/api/v3/messages and /api/v3/search return compact metadata only.
/api/v3/messages/{id} returns a bounded preview for large messages.
/api/v3/messages/{id}/body returns body preview chunks for progressive UI loading.
/api/v3/messages/{id}/download streams the original message without loading the whole message into memory.
DELETE /api/v3/messages?older_than=... deletes messages older than a relative age.
DELETE /api/v3/messages?created_before=... deletes messages created before an absolute cutoff.

The web UI now uses /api/v3 for list/search/websocket/preview/download paths, so large attachments do not inflate list/search responses.

Why

Polling workloads that wait for test emails can generate many repeated list/search requests. With large messages or attachments in Maildir, the previous behavior could repeatedly read and serialize full message bodies and MIME payloads for list/search responses. That makes responses much larger than the data needed for polling and puts pressure on memory in long-running containers.

What Changed

Build Docker images from the current checkout instead of go install github.com/mailhog/MailHog@latest.
Add streaming Maildir writes for SMTP DATA so large messages do not need to be retained in memory during storage.
Add a Maildir metadata cache with background refresh for list/search.
Add compact Maildir list/search entries while preserving full loads where legacy APIs require them.
Add bounded message preview, body chunk preview, and streaming download storage interfaces.
Add API v3 compact endpoints and keep v1/v2 compatibility by hydrating compact storage entries before returning legacy responses.
Add old-message deletion filters on v3:
- DELETE /api/v3/messages?older_than=1h
- DELETE /api/v3/messages?older_than=1d
- DELETE /api/v3/messages?older_than=1w
- DELETE /api/v3/messages?created_before=2026-05-22T12:34:56Z
- DELETE /api/v3/messages?created_before=1700000000000
Update embedded UI assets to use v3 list/search/websocket/preview/download paths and show loading state while previews load.
Add GitHub Actions CI:
- gofmt check
- first-party Go tests in GOPATH mode
- Docker build
- 600MB memory-limit performance smoke
- GHCR publish on master and v* tags

Before And After Examples

Before: list/search could include large raw payloads

Legacy list/search responses could include full message content that is not needed for polling:

{
  "total": 50,
  "count": 50,
  "items": [
    {
      "ID": "message-id",
      "Raw": {
        "From": "...",
        "To": ["..."],
        "Data": "full SMTP DATA including large MIME attachments"
      },
      "Content": {
        "Headers": {
          "Subject": ["Large attachment"]
        },
        "Body": "full body / MIME payload",
        "Size": 104857600
      },
      "MIME": {
        "Parts": ["large parsed MIME tree"]
      }
    }
  ]
}

That shape is still available from /api/v1 and /api/v2 for compatibility.

After: compact v3 list/search for polling and UI

GET /api/v3/messages?limit=50
GET /api/v3/search?kind=to&query=user@example.com&limit=50

{
  "total": 23,
  "count": 23,
  "start": 0,
  "items": [
    {
      "ID": "message-id",
      "From": {
        "Mailbox": "sender",
        "Domain": "example.com"
      },
      "To": [
        {
          "Mailbox": "user",
          "Domain": "example.com"
        }
      ],
      "Created": "2026-05-22T12:34:56Z",
      "Content": {
        "Headers": {
          "Subject": ["Large attachment"]
        },
        "Size": 104857600
      }
    }
  ]
}
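
For a polling client, the compact shape above could be decoded with types along these lines. This is a sketch based only on the example response; the type names (Address, CompactMessage, CompactList) and the decodeCompactList helper are illustrative and are not taken from the PR's code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Address mirrors the Mailbox/Domain pair shown in the compact example.
type Address struct {
	Mailbox string
	Domain  string
}

// CompactMessage holds only the fields present in the example v3 item.
type CompactMessage struct {
	ID      string
	From    Address
	To      []Address
	Created time.Time
	Content struct {
		Headers map[string][]string
		Size    int64
	}
}

// CompactList is the envelope around the compact items.
type CompactList struct {
	Total int              `json:"total"`
	Count int              `json:"count"`
	Start int              `json:"start"`
	Items []CompactMessage `json:"items"`
}

// decodeCompactList parses a v3 list/search response body.
func decodeCompactList(body []byte) (CompactList, error) {
	var list CompactList
	err := json.Unmarshal(body, &list)
	return list, err
}

func main() {
	sample := []byte(`{"total":1,"count":1,"start":0,"items":[{"ID":"message-id",
		"From":{"Mailbox":"sender","Domain":"example.com"},
		"To":[{"Mailbox":"user","Domain":"example.com"}],
		"Created":"2026-05-22T12:34:56Z",
		"Content":{"Headers":{"Subject":["Large attachment"]},"Size":104857600}}]}`)

	list, err := decodeCompactList(sample)
	if err != nil {
		panic(err)
	}
	for _, m := range list.Items {
		fmt.Println(m.ID, m.Content.Headers["Subject"], m.Content.Size)
	}
}
```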

The v3 compact contract intentionally omits:

Raw
Content.Body
MIME
attachment payloads

After: progressive body preview

GET /api/v3/messages/{id}/body?offset=0&limit=1048576

{
  "ID": "message-id",
  "Content": {
    "Headers": {
      "Content-Type": ["text/html; charset=utf-8"]
    },
    "Body": "<preview chunk up to the requested limit>",
    "Size": 12582912
  },
  "Offset": 0,
  "NextOffset": 1048576,
  "Limit": 1048576,
  "MaxSize": 10485760,
  "HasMore": true,
  "Truncated": false,
  "Source": "mime"
}
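
A client could page through the preview by following NextOffset until HasMore is false. The sketch below assumes only the fields visible in the example response; fetchChunk and readPreview are hypothetical helper names, and main uses an in-memory stand-in so the loop runs without a MailHog instance.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// BodyChunk keeps the subset of the example response that paging needs.
type BodyChunk struct {
	Content struct {
		Body string
	}
	NextOffset int64
	HasMore    bool
}

// fetchChunk requests one preview chunk from the endpoint shown above.
func fetchChunk(client *http.Client, baseURL, id string, offset, limit int64) (BodyChunk, error) {
	var chunk BodyChunk
	u := fmt.Sprintf("%s/api/v3/messages/%s/body?offset=%d&limit=%d",
		baseURL, url.PathEscape(id), offset, limit)
	resp, err := client.Get(u)
	if err != nil {
		return chunk, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return chunk, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	err = json.NewDecoder(resp.Body).Decode(&chunk)
	return chunk, err
}

// readPreview calls fetch repeatedly, following NextOffset until HasMore
// is false. It stops with an error if the offset fails to advance.
func readPreview(fetch func(offset int64) (BodyChunk, error)) (string, error) {
	var body strings.Builder
	var offset int64
	for {
		chunk, err := fetch(offset)
		if err != nil {
			return "", err
		}
		body.WriteString(chunk.Content.Body)
		if !chunk.HasMore {
			return body.String(), nil
		}
		if chunk.NextOffset <= offset {
			return "", fmt.Errorf("offset did not advance past %d", offset)
		}
		offset = chunk.NextOffset
	}
}

func main() {
	// In-memory stand-in for the server so the sketch runs on its own.
	data := "hello, progressive preview"
	const limit = 8
	fake := func(offset int64) (BodyChunk, error) {
		end := offset + limit
		if end > int64(len(data)) {
			end = int64(len(data))
		}
		var c BodyChunk
		c.Content.Body = data[offset:end]
		c.NextOffset = end
		c.HasMore = end < int64(len(data))
		return c, nil
	}
	body, err := readPreview(fake)
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

Against a real server, the fetch argument would wrap fetchChunk with a fixed limit, for example 1048576 as in the request above.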

After: streaming download

GET /api/v3/messages/{id}/download

The response is streamed as message/rfc822 from storage, without loading the full message into memory.

After: deletion filters

Relative age:

DELETE /api/v3/messages?older_than=1h
DELETE /api/v3/messages?older_than=24h
DELETE /api/v3/messages?older_than=1d
DELETE /api/v3/messages?older_than=7d
DELETE /api/v3/messages?older_than=1w

Absolute cutoff:

DELETE /api/v3/messages?created_before=2026-05-22T12:34:56Z
DELETE /api/v3/messages?created_before=1700000000000

older_than intentionally does not accept timestamps; timestamp-like values belong in created_before.

Compatibility Notes

Existing /api/v1/messages behavior is preserved.
Existing /api/v2/messages, /api/v2/search, and /api/v2/websocket continue returning full legacy messages.
Optimized compact responses are introduced under /api/v3.
The web UI intentionally uses /api/v3.

Docker Image Naming

The workflow publishes to:

ghcr.io/${{ github.repository_owner }}/mailhog

For the upstream repository this resolves to:

ghcr.io/mailhog/mailhog

In addition to that image name, the workflow publishes branch/ref tags, version tags, latest (for master), and sha-<commit> tags.

Local Action Log

Implemented and split into commits:

9ec2255 Build Docker image from checkout
517ee72 Stream and cache Maildir messages
122220c Add API v3 compact message endpoints
cf1aac2 Update UI to use API v3 previews
6c25579 Add CI pipeline and performance smoke

Key implementation steps:

Created branch performance-boost
Updated Dockerfile and .dockerignore so Docker builds the current checkout
Added streaming Maildir message writer and cache refresh loop
Added Maildir preview, body chunk, streaming download, and delete-old support
Kept v1/v2 full-message API compatibility through hydration helpers
Added v3 compact API for list/search/websocket/preview/body/download/delete
Updated embedded UI assets to use v3 endpoints
Added GitHub Actions CI and stdlib Go performance smoke runner
Committed the changes as supfeer <supfeer@gmail.com>

Validation Log

GOPATH and unit test validation:

GOPATH=/tmp/mailhog-ci-verify GO111MODULE=off go test -timeout=60s ./...
?    github.com/mailhog/MailHog                                      [no test files]
?    github.com/mailhog/MailHog/config                               [no test files]
?    github.com/mailhog/MailHog/test/perf                            [no test files]
ok   github.com/mailhog/MailHog/vendor/github.com/mailhog/storage
ok   github.com/mailhog/MailHog/vendor/github.com/mailhog/MailHog-Server/api
?    github.com/mailhog/MailHog/vendor/github.com/mailhog/smtp        [no test files]
ok   github.com/mailhog/MailHog/vendor/github.com/mailhog/MailHog-Server/smtp

Docker build:

docker build -t mailhog-ci-test .
...
naming to docker.io/library/mailhog-ci-test done

600MB memory-limit performance smoke:

docker run -d --name mailhog-ci-smoke \
  --memory=600m \
  --memory-swap=600m \
  -p 127.0.0.1:18027:8025 \
  -p 127.0.0.1:11027:1025 \
  -e MH_STORAGE=maildir \
  -e MH_MAILDIR_PATH=/home/mailhog/maildir \
  -v mailhog-ci-smoke-maildir:/home/mailhog/maildir \
  mailhog-ci-test

MAILHOG_HTTP_URL=http://127.0.0.1:18027 \
MAILHOG_SMTP_ADDR=127.0.0.1:11027 \
MAILHOG_SMOKE_MAX_COMPACT_BYTES=2097152 \
go run ./test/perf/smoke.go

Smoke output:

sent dataset: small=20 large=3
validated compact list response: messages=23 total=23 bytes=13414
validated compact search response: messages=20 bytes=11563
validated body preview: id=... bodyBytes=48 hasMore=false truncated=false source=mime
load complete: sent=100 counts=map[body:25 download:25 list:25 search:25]
perf smoke passed

Observed memory sample during smoke:

mailhog-ci-smoke 135MiB / 600MiB

Also checked:

git diff --check
gofmt -l main.go config test/perf vendor/github.com/mailhog

Both passed with no reported issues.

codes-stories · 2026-05-28T17:30:32Z

+		resp, err := client.Get(endpoint(cfg.HTTPURL, "/api/v3/messages?limit=1"))
+		if err == nil {
+			resp.Body.Close()
+			if resp.StatusCode >= 200 && resp.StatusCode < 500 {


The readiness check passes on any status < 500, including 404 or 403, which could mean the server is up but the API path doesn't exist yet. Change the condition to only accept 2xx.

codes-stories · 2026-05-28T17:40:59Z

+	b.WriteString("Content-Transfer-Encoding: base64\r\n")
+	b.WriteString("Content-Disposition: attachment; filename=\"large.bin\"\r\n")
+	b.WriteString("\r\n")
+


// This allocates the full raw payload (~8 MiB) and then a separate base64-encoded
// slice (~11 MiB) per message. Since the raw data is just one repeated byte, we don't
// need to materialise it at all — a repeatReader + io.LimitedReader + base64.NewEncoder
// streaming into a lineWrapper gets this down to ~32 KiB regardless of attachment size.
raw := bytes.Repeat([]byte{byte('A' + index%26)}, attachmentBytes)
encoded := make([]byte, base64.StdEncoding.EncodedLen(len(raw)))
base64.StdEncoding.Encode(encoded, raw)
writeWrappedBase64(&b, encoded)

codes-stories · 2026-05-28T17:46:43Z

+			return checkDownload(client, cfg, smallID)
+		}},
+	}
+


counts and firstErr share the same mutex mu, but they have different
access patterns — firstErr is checked on every tick in the main loop
(under mu.Lock), while counts is only read after wg.Wait().
Consider a separate mutex or atomic for firstErr to avoid the main
loop contending with workers on every tick.

supfeer added 6 commits May 22, 2026 21:32

9ec2255 Build Docker image from checkout
517ee72 Stream and cache Maildir messages
122220c Add API v3 compact message endpoints
cf1aac2 Update UI to use API v3 previews
6c25579 Add CI pipeline and performance smoke
634aed2 Clarify delete age filters

supfeer marked this pull request as ready for review May 22, 2026 19:49

codes-stories reviewed May 28, 2026

supfeer wants to merge 6 commits into mailhog:master from supfeer:performance-boost
