Merged
16 changes: 13 additions & 3 deletions components/egress/Dockerfile
@@ -104,13 +104,22 @@ RUN apt-get update \
&& rm -rf /var/lib/apt/lists/*

# Python mitmproxy (transparent mode): mitmdump runs as user mitmproxy; iptables skips this uid.
# /var/lib/mitmproxy is the mitmproxy user's home; mitm's confdir is ~/.mitmproxy, which holds the CA and config.yaml.
RUN useradd -r -u 10042 -d /var/lib/mitmproxy -s /usr/sbin/nologin mitmproxy \
&& mkdir -p /var/lib/mitmproxy \
&& chown mitmproxy:mitmproxy /var/lib/mitmproxy \
&& mkdir -p /var/lib/mitmproxy/.mitmproxy \
&& chown -R mitmproxy:mitmproxy /var/lib/mitmproxy \
&& pip3 install --no-cache-dir --break-system-packages 'mitmproxy>=10,<11' \
&& (command -v mitmdump && mitmdump --version) \
&& mkdir -p /var/egress/mitmscripts

# Static mitmproxy options (mode, listen_host, connection_strategy, stream_large_bodies,
# ignore_hosts, ssl_verify_upstream_trusted_confdir). mitmdump auto-loads
# config.yaml from its confdir. Dynamic per-deployment options stay env-driven and
# are applied as --set by launch.go (which overrides values declared here).
COPY components/egress/mitmproxy/config.yaml /var/lib/mitmproxy/.mitmproxy/config.yaml
RUN chown mitmproxy:mitmproxy /var/lib/mitmproxy/.mitmproxy/config.yaml \
&& chmod 0644 /var/lib/mitmproxy/.mitmproxy/config.yaml

# All egress runtime artifacts live under one directory to keep paths grouped.
COPY --from=builder /out/egress /opt/opensandbox-egress/egress
COPY --from=builder /out/opensandbox-supervisor /opt/opensandbox-egress/supervisor
@@ -122,7 +131,8 @@ COPY --from=builder /out/opensandbox-supervisor /opt/opensandbox-egress/supervis
COPY components/egress/scripts/cleanup.sh /opt/opensandbox-egress/cleanup.sh
RUN chmod 0755 /opt/opensandbox-egress/cleanup.sh \
/opt/opensandbox-egress/egress \
/opt/opensandbox-egress/supervisor
/opt/opensandbox-egress/supervisor \
&& ln -s /opt/opensandbox-egress/egress /egress

COPY components/egress/mitmscripts /var/egress/mitmscripts

83 changes: 72 additions & 11 deletions components/egress/docs/mitmproxy-transparent.md
@@ -32,29 +32,83 @@ export OPENSANDBOX_EGRESS_MITMPROXY_PORT=18081

# Optional: load an additional user-defined mitm addon (loaded after the system addon)
export OPENSANDBOX_EGRESS_MITMPROXY_SCRIPT=/path/to/your/addon.py

# Optional: bypass decryption for selected domains (semicolon-separated regex list)
export OPENSANDBOX_EGRESS_MITMPROXY_IGNORE_HOSTS='.*\.log\.aliyuncs\.com;.*\.example\.internal'
```

To bypass decryption for selected domains, edit the baked-in
`components/egress/mitmproxy/config.yaml` and rebuild the image — see
"Static Configuration (config.yaml)" below.

## Configuration Reference

### Environment Variables (Per-Deployment Overrides)

| Variable | Required | Purpose | Default |
|------|----------|------|--------|
| `OPENSANDBOX_EGRESS_MITMPROXY_TRANSPARENT` | Yes | Enable transparent mitmproxy (`1/true/on`, etc.) | Disabled |
| `OPENSANDBOX_EGRESS_MITMPROXY_PORT` | No | mitmdump listen port; `iptables` redirects `80/443` here | `18081` |
| `OPENSANDBOX_EGRESS_MITMPROXY_SCRIPT` | No | Additional user mitm addon script path (`-s`); loaded after the system addon | Empty |
| `OPENSANDBOX_EGRESS_MITMPROXY_IGNORE_HOSTS` | No | Host/IP regex list for TLS pass-through (`;` separated) | Empty |
| `OPENSANDBOX_EGRESS_MITMPROXY_CONFDIR` | No | mitm config and CA directory (passed as `--set confdir=`, also used as `HOME`) | Default directory under `/var/lib/mitmproxy` |
| `OPENSANDBOX_EGRESS_MITMPROXY_UPSTREAM_TRUST_DIR` | No | Trust directory for upstream TLS verification (OpenSSL style) | `/etc/ssl/certs` |
| `OPENSANDBOX_EGRESS_MITMPROXY_SSL_INSECURE` | No | Skip upstream TLS certificate verification (`1/true/on`). Needed when clients connect by IP (no SNI → hostname mismatch). | Disabled |
| `OPENSANDBOX_EGRESS_MITMPROXY_UPSTREAM_TRUST_DIR` | No | Trust directory for upstream TLS verification (OpenSSL style); overrides the config.yaml default | `/etc/ssl/certs` |
| `OPENSANDBOX_EGRESS_MITMPROXY_SSL_INSECURE` | No | Skip upstream TLS verification (`1/true/on`); use when clients connect by IP and SNI is unavailable | Disabled |

Notes:

- `OPENSANDBOX_EGRESS_MITMPROXY_IGNORE_HOSTS` means **no decryption**, not “completely bypass mitm process”.
- In transparent mode, mitmproxy generally recommends matching by IP/range; verify SNI/resolve behavior if using domain regex only.
- Before mitm, `iptables`, and CA export are ready, `GET /healthz` returns `503 (mitm not ready)` to prevent premature readiness.

### Static Configuration (config.yaml)

Fleet-wide, rarely-changing mitm options live in
`components/egress/mitmproxy/config.yaml`, baked into the image at
`/var/lib/mitmproxy/.mitmproxy/config.yaml` and auto-loaded by mitmdump.
This is the single source of truth for:

- `mode` (`transparent`) — mitm default is `regular`
- `listen_host` (`127.0.0.1`) — mitm default is `0.0.0.0`
- `stream_large_bodies` (`1m`) — mitm default is unset (entire body buffered)
- `ssl_verify_upstream_trusted_confdir` (`/etc/ssl/certs`) — mitm default is unset; overridable per-deployment via env
- `connection_strategy` (`lazy`) — mitmproxy 10+ changed the default from `lazy` to `eager`; pinned explicitly to preserve the historical behavior of deferring upstream connections until the full request arrives
- `ignore_hosts` (`[]`) — matches the mitm default; kept in the file as a discoverable extension point for operators adding TLS pass-through entries

Only deviations from the mitm built-in defaults are declared in `config.yaml` (the `ignore_hosts` entry is a discoverability exception; `connection_strategy` is a compatibility pin against the upstream default change). Other options that happen to match the default (`http2=true`, etc.) are omitted — the file is the diff against upstream defaults, not a full enumeration.

Precedence: command-line `--set` (from env overrides) > `config.yaml` > mitmproxy built-in defaults.
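
The precedence rule can be illustrated with a small sketch. This is not mitmproxy's implementation; `resolve_options` and the sample dictionaries are hypothetical names used only to show how a `--set` value shadows a `config.yaml` value, which in turn shadows a built-in default. It also shows why a list option such as `ignore_hosts` ends up replaced rather than merged when it is set on the command line.

```python
from typing import Any, Dict


def resolve_options(
    defaults: Dict[str, Any],
    config_file: Dict[str, Any],
    cli_sets: Dict[str, Any],
) -> Dict[str, Any]:
    """Return effective options: cli_sets > config_file > defaults."""
    effective = dict(defaults)
    effective.update(config_file)  # config.yaml overrides built-in defaults
    effective.update(cli_sets)     # --set overrides config.yaml
    return effective


# Illustrative values only; the defaults mirror the ones listed above.
defaults = {"mode": ["regular"], "listen_host": "0.0.0.0", "ignore_hosts": []}
config_file = {
    "mode": ["transparent"],
    "listen_host": "127.0.0.1",
    "ignore_hosts": [r".*\.log\.aliyuncs\.com"],
}
# Hypothetical command-line override of a list option.
cli_sets = {"ignore_hosts": [r".*\.example\.internal"]}

effective = resolve_options(defaults, config_file, cli_sets)
print(effective["listen_host"])
print(effective["ignore_hosts"])
```

In this sketch the entry from `config_file` does not survive the override, which is the reason to append pass-through entries to `config.yaml` instead of passing them through `--set`.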

#### Overriding the built-in config.yaml

There is no env var to point mitm at an alternate config file. Operators who need different static defaults (e.g. a different `ignore_hosts` list, `connection_strategy`, or `stream_large_bodies`) should pick one of the following:

1. **Build a downstream image** that derives from the official egress image and replaces the file:

```dockerfile
FROM <opensandbox-egress-image>:<tag>
COPY my-config.yaml /var/lib/mitmproxy/.mitmproxy/config.yaml
RUN chown mitmproxy:mitmproxy /var/lib/mitmproxy/.mitmproxy/config.yaml \
&& chmod 0644 /var/lib/mitmproxy/.mitmproxy/config.yaml
```

This is the recommended path because the override is version-controlled, reviewable, and reproducible.

2. **Mount an override file at runtime** over the baked-in path. For Kubernetes, mount a `ConfigMap` as a file at `/var/lib/mitmproxy/.mitmproxy/config.yaml`. A `ConfigMap` file mount is read-only and is owned by root by default rather than by the mitmproxy user (UID 10042), so verify that the mode (`defaultMode`) lets the mitmproxy user read it:

```yaml
volumeMounts:
  - name: mitm-config
    mountPath: /var/lib/mitmproxy/.mitmproxy/config.yaml
    subPath: config.yaml
    readOnly: true
volumes:
  - name: mitm-config
    configMap:
      name: egress-mitm-config
      defaultMode: 0644
```

Useful for staged rollouts or per-environment overrides without rebuilding the image.

3. **Single-option escape hatch via env-driven `--set`** (already supported for the documented env variables above). This works only for options that already have an env variable, and each variable overrides a single option; it cannot replace the whole file.

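Do not edit `config.yaml` inside a running container: the file lives in the container layer, so edits are lost on restart, and it is meant to be treated as read-only at runtime. (The Dockerfile installs it as `0644` owned by `mitmproxy`, so this is a convention rather than an enforced permission.)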
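
For a mounted override (option 2 above), one way to sanity-check readability is to compare the file's mode bits with the mitmproxy user's identity. The sketch below is a simplified POSIX permission check that ignores ACLs, capabilities, and root's override. `can_read` is a hypothetical helper, UID 10042 comes from the Dockerfile's `useradd` line, and the group id is an assumption.

```python
import stat
from typing import Set


def can_read(mode: int, file_uid: int, file_gid: int, uid: int, gids: Set[int]) -> bool:
    """Simplified POSIX read check: owner bits, then group bits, then other bits."""
    if uid == file_uid:
        return bool(mode & stat.S_IRUSR)
    if file_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


MITM_UID = 10042     # from the Dockerfile: useradd -r -u 10042
MITM_GIDS = {10042}  # assumption: the primary group id equals the uid

# A root-owned file with mode 0644 is readable through the "other" bits.
print(can_read(0o644, 0, 0, MITM_UID, MITM_GIDS))
# The same file with mode 0600 is not readable by the mitmproxy user.
print(can_read(0o600, 0, 0, MITM_UID, MITM_GIDS))
```

Inside the container, the inputs could be taken from `os.stat(path)` (`st_mode`, `st_uid`, `st_gid`) for the mounted file.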

## Common Configuration Templates

### 1) Enable Transparent MITM Only
@@ -82,11 +136,18 @@ The user addon is loaded after the system addon (`-s system.py -s user.py`), so

### 4) Bypass Decryption for Specific Domains (e.g. log upload)

```bash
export OPENSANDBOX_EGRESS_MITMPROXY_TRANSPARENT=true
export OPENSANDBOX_EGRESS_MITMPROXY_IGNORE_HOSTS='.*\.log\.aliyuncs\.com'
Edit `components/egress/mitmproxy/config.yaml` and append to `ignore_hosts`,
then rebuild the egress image:

```yaml
ignore_hosts:
- '.*\.log\.aliyuncs\.com'
```

`ignore_hosts` means **no decryption**, not "completely bypass mitm process":
mitm still proxies the TCP connection, it just forwards bytes without
breaking TLS, and addons do not see request/response content.
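
A minimal sketch of how an `ignore_hosts` pattern is matched against a hostname, assuming the same `re.search` semantics that the system addon applies to the SNI. `matches_ignore_hosts` is a hypothetical helper for checking patterns offline; it is not part of mitmproxy or of the addon.

```python
import re
from typing import Iterable


def matches_ignore_hosts(patterns: Iterable[str], hostname: str) -> bool:
    """Return True if any pattern matches hostname via re.search."""
    for pattern in patterns:
        try:
            if re.search(pattern, hostname):
                return True
        except re.error:
            # Malformed patterns are skipped, as in the system addon.
            continue
    return False


patterns = [r".*\.log\.aliyuncs\.com"]
print(matches_ignore_hosts(patterns, "cn-hangzhou.log.aliyuncs.com"))
print(matches_ignore_hosts(patterns, "example.com"))
```

Because `re.search` is unanchored, a pattern without a trailing `$` also matches hostnames that merely contain the string (for example `a.log.aliyuncs.com.evil.test`), so anchoring patterns may be worth considering.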

### 5) Use a Fixed CA (consistent fingerprint across replicas)

If CA files already exist in `confdir`, mitmproxy reuses them instead of regenerating on each startup. Typical paths:
44 changes: 44 additions & 0 deletions components/egress/mitmproxy/config.yaml
@@ -0,0 +1,44 @@
# Static mitmproxy options that override mitm built-in defaults for the
# OpenSandbox egress sidecar. Loaded automatically by mitmdump from
# /var/lib/mitmproxy/.mitmproxy/config.yaml.
#
# Only deviations from mitm defaults are listed here. Options that
# happen to match the mitm default (http2=true, etc.) are intentionally
# omitted — the file is meant to be the diff against upstream defaults,
# not a full enumeration. Two intentional exceptions to this rule:
# ignore_hosts (kept as a discoverable extension point) and
# connection_strategy (mitmproxy 10+ changed the default from lazy to
# eager; we pin lazy explicitly to preserve the historical behavior).
#
# Per-deployment overrides remain env-driven and applied as --set by
# launch.go. Precedence: command-line --set > this file > mitm defaults.

mode:
- transparent

# mitm default changed from lazy to eager in mitmproxy 10+. We pin
# lazy explicitly: upstream connections are deferred until the full
# request arrives, avoiding unnecessary upstream opens for blocked
# or filtered requests.
connection_strategy: lazy

# mitm default 0.0.0.0; transparent mode must only accept loopback inside
# the netns (iptables REDIRECT pushes outbound traffic here, and exposing
# mitm on the LAN would route any inbound connection through it).
listen_host: 127.0.0.1

# mitm default None (whole body buffered in memory). 1m bounds RSS for
# the allow path; chunked / SSE responses are forced to stream regardless
# by the system addon's responseheaders hook.
stream_large_bodies: 1m

# mitm default None (Python certifi bundle). Match the OS trust store so
# private-CA additions land where mitm reads them.
ssl_verify_upstream_trusted_confdir: /etc/ssl/certs

# Hosts (Python regex) for TLS pass-through: mitm forwards bytes without
# decryption and addons do not see request/response content. Empty matches
# the mitm default; kept here as a discoverable extension point. Append
# entries here rather than passing --set on the command line, because
# --set on a list option REPLACES the entire list.
ignore_hosts: []
4 changes: 1 addition & 3 deletions components/egress/mitmproxy_transparent.go
@@ -106,7 +106,6 @@ func startMitmproxyTransparentIfEnabled() (*mitmTransparent, error) {
cfg := mitmproxy.Config{
ListenPort: mpPort,
UserName: mitmproxy.RunAsUser,
ConfDir: strings.TrimSpace(os.Getenv(constants.EnvMitmproxyConfDir)),
ScriptPath: strings.TrimSpace(os.Getenv(constants.EnvMitmproxyScript)),
}
// Buffer absorbs OnExit events from a retry storm so OnExit goroutines
@@ -131,8 +130,7 @@ func startMitmproxyTransparentIfEnabled() (*mitmTransparent, error) {
}
log.Infof("mitmproxy: transparent intercept active (OUTPUT tcp 80,443 -> %d; trust mitm CA in clients)", mpPort)

confDir := strings.TrimSpace(os.Getenv(constants.EnvMitmproxyConfDir))
if err := mitmproxy.SyncRootCA(confDir, mpHome); err != nil {
if err := mitmproxy.SyncRootCA("", mpHome); err != nil {
return nil, fmt.Errorf("mitm CA export: %w", err)
}
return &mitmTransparent{
40 changes: 38 additions & 2 deletions components/egress/mitmscripts/system.py
@@ -19,12 +19,48 @@
#
# Behavior:
# Forces streaming for SSE / chunked responses so each chunk is forwarded
# immediately, bypassing the stream_large_bodies=1m buffer set in launch.go
# immediately, bypassing the stream_large_bodies=1m buffer set in config.yaml
# (which otherwise stalls LLM-style small-chunk streams).
#
# Implements SNI-aware ignore_hosts for transparent mode. mitmproxy's
# built-in ignore_hosts check in transparent mode matches against the
# destination IP first; the SNI hostname is only available inside the TLS
# ClientHello, which arrives after the initial check. This addon re-checks
# the same ignore_hosts patterns against the SNI hostname at the
# tls_clienthello layer and sets ignore_connection=True when a match is
# found, ensuring domain-based TLS pass-through works reliably.
#
# User-defined addons can be loaded alongside this script via
# OPENSANDBOX_EGRESS_MITMPROXY_SCRIPT.
from mitmproxy import http
import re

from mitmproxy import ctx, http
from mitmproxy.tls import ClientHelloData


def tls_clienthello(data: ClientHelloData) -> None:
    """Re-check ignore_hosts patterns against SNI hostname.

    In transparent mode, mitmproxy checks ignore_hosts against the
    destination IP:port before the TLS handshake. If the check fails at
    that stage (SNI not yet available), we get a second chance here with
    the actual hostname from the ClientHello SNI extension.
    """
    sni = data.client_hello.sni
    if not sni:
        return

    patterns = ctx.options.ignore_hosts
    if not patterns:
        return

    for pattern in patterns:
        try:
            if re.search(pattern, sni):
                data.ignore_connection = True
                return
        except re.error:
            pass


def responseheaders(flow: http.HTTPFlow) -> None:
5 changes: 3 additions & 2 deletions components/egress/pkg/constants/configuration.go
@@ -36,12 +36,13 @@ const (
EnvNameserverExempt = "OPENSANDBOX_EGRESS_NAMESERVER_EXEMPT"

// MITM: mitmdump transparent; Linux + CAP_NET_ADMIN, runs as a dedicated user.
// Static mitm options (mode, connection_strategy, listen_host, stream_large_bodies,
// ignore_hosts, ssl_verify_upstream_trusted_confdir default) live in
// /var/lib/mitmproxy/.mitmproxy/config.yaml; only per-deployment overrides are env-driven.
EnvMitmproxyTransparent = "OPENSANDBOX_EGRESS_MITMPROXY_TRANSPARENT"
EnvMitmproxyPort = "OPENSANDBOX_EGRESS_MITMPROXY_PORT"
EnvMitmproxyConfDir = "OPENSANDBOX_EGRESS_MITMPROXY_CONFDIR"
EnvMitmproxyScript = "OPENSANDBOX_EGRESS_MITMPROXY_SCRIPT"
EnvMitmproxyUpstreamTrustDir = "OPENSANDBOX_EGRESS_MITMPROXY_UPSTREAM_TRUST_DIR"
EnvMitmproxyIgnoreHosts = "OPENSANDBOX_EGRESS_MITMPROXY_IGNORE_HOSTS"
EnvMitmproxySslInsecure = "OPENSANDBOX_EGRESS_MITMPROXY_SSL_INSECURE"

// Comma-separated upstream resolvers: literal IP only (optional :port) — no hostnames (see dnsproxy REDIRECT note).
52 changes: 21 additions & 31 deletions components/egress/pkg/mitmproxy/launch.go
@@ -32,17 +32,23 @@ import (
const RunAsUser = "mitmproxy"

// Loopback: transparent mode receives via REDIRECT; do not listen on 0.0.0.0 in the netns.
// Kept as a Go constant only for the startup log line; the actual listen_host is set in
// /var/lib/mitmproxy/.mitmproxy/config.yaml (shipped via the egress Dockerfile).
const listenHostLoopback = "127.0.0.1"

// systemScriptPath: bundled system addon shipped via the egress Dockerfile
// (COPY components/egress/mitmscripts /var/egress/mitmscripts). Always loaded.
const systemScriptPath = "/var/egress/mitmscripts/system.py"

// Config: mitmdump --mode transparent; UserName must match iptables ! --uid-owner, ConfDir is mitm state/CA.
// Config: mitmdump --mode transparent. Static options (mode, connection_strategy,
// listen_host, stream_large_bodies, ignore_hosts,
// ssl_verify_upstream_trusted_confdir) live in
// /var/lib/mitmproxy/.mitmproxy/config.yaml and are auto-loaded by mitmdump.
// This struct carries only per-launch dynamic values that override those
// defaults via `--set`.
type Config struct {
ListenPort int
UserName string
ConfDir string
// ScriptPath is an optional user-supplied addon, loaded after the system addon.
ScriptPath string
// OnExit is called (if non-nil) when mitmdump exits. Called from a background goroutine.
@@ -92,24 +98,21 @@ func Launch(cfg Config) (*Running, error) {
return nil, fmt.Errorf("mitmproxy: lookup user %q: %w", uname, err)
}

// Only per-launch dynamic values are passed on the command line. Static
// options (mode, listen_host, connection_strategy, stream_large_bodies,
// ignore_hosts, ssl_verify_upstream_trusted_confdir) come from
// /var/lib/mitmproxy/.mitmproxy/config.yaml shipped in the egress image.
// `--set` overrides config.yaml, so the env-driven overrides below take
// precedence at runtime without rebuilding the image.
args := []string{
"--mode", "transparent",
"--listen-host", listenHostLoopback,
"--listen-port", strconv.Itoa(cfg.ListenPort),
}

trustDir := strings.TrimSpace(os.Getenv(constants.EnvMitmproxyUpstreamTrustDir))
if trustDir == "" {
trustDir = "/etc/ssl/certs"
// Upstream cert trust path override. Default in config.yaml is /etc/ssl/certs;
// override per-deployment when the upstream uses a private CA bundle.
if trustDir := strings.TrimSpace(os.Getenv(constants.EnvMitmproxyUpstreamTrustDir)); trustDir != "" {
args = append(args, "--set", "ssl_verify_upstream_trusted_confdir="+trustDir)
}
args = append(args, "--set", "ssl_verify_upstream_trusted_confdir="+trustDir)

// Stream large bodies instead of buffering them in memory (OOM prevention).
args = append(args, "--set", "stream_large_bodies=1m")

// Lazy connection strategy: defer upstream connection until the request is fully received,
// which avoids unnecessary connections for blocked/filtered requests.
args = append(args, "--set", "connection_strategy=lazy")

// Transparent mode redirects TCP to IP addresses. Clients connecting to IPs
// do not send SNI, so upstream TLS cert hostname verification fails with
@@ -119,34 +122,21 @@ func Launch(cfg Config) (*Running, error) {
args = append(args, "--set", "ssl_insecure=true")
}

homeEnv := home
if strings.TrimSpace(cfg.ConfDir) != "" {
cd := strings.TrimSpace(cfg.ConfDir)
args = append(args, "--set", "confdir="+cd)
homeEnv = cd
}
// Load the system addon first so user addons can observe / override its hooks.
args = append(args, "-s", systemScriptPath)
if user := strings.TrimSpace(cfg.ScriptPath); user != "" {
args = append(args, "-s", user)
}

// Upstream passthrough: each pattern becomes --set ignore_hosts= (regex; IP ranges are practical in transparent mode).
for _, p := range strings.Split(os.Getenv(constants.EnvMitmproxyIgnoreHosts), ";") {
p = strings.TrimSpace(p)
if p == "" {
continue
}
args = append(args, "--set", "ignore_hosts="+p)
}

cmd := exec.Command("mitmdump", args...)
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
cmd.SysProcAttr = &syscall.SysProcAttr{
Credential: &syscall.Credential{Uid: uid, Gid: gid},
}
cmd.Env = append(os.Environ(), "HOME="+homeEnv)
// HOME determines mitm's confdir (~/.mitmproxy) which holds both the CA
// and the baked-in config.yaml.
cmd.Env = append(os.Environ(), "HOME="+home)

if err := cmd.Start(); err != nil {
return nil, fmt.Errorf("mitmproxy: start mitmdump: %w", err)