- Build: `make build` (builds for current TARGETOS/TARGETARCH)
- Full pipeline: `make all` (format → lint → test → build)
- Unit tests: `make test` (requires `CGO_ENABLED=1` for race detector)
- Test with coverage: `make test-coverage`
- Race detector: `make test-race` (linux/amd64 only)
- Lint: `make lint` (runs golangci-lint with `.golangci.yaml`)
- Format: `make fmt`
- Cross-compile: `make release` (darwin/linux/windows × amd64/arm64)
- Integration tests: `make integration-tests` (requires Docker, uses bats)
- All integration tests: `make integration-tests-all` (includes stress tests)
- Advanced Go integration tests: `make integration-tests-advanced` (Go-based tests in `tests/integration/`)
- Generate mocks: `make mocks` (uses mockery)
- Unit tests in sandbox: `CGO_ENABLED=0 go test ./...` (skips race detector, works without CGO toolchain)
Integration tests use bats:
- Tests are in `tests/*.bats` with helpers in `tests/test_helper.bash`
- Run all tests locally (recommended): `colima ssh -- sudo bats tests/*.bats tests/containerd_*.bats`
  - Colima VM has native Docker + containerd sockets; `sudo` is required for containerd sidecar tests
- Docker tests only (via Docker image): `docker run --rm -v /var/run/docker.sock:/var/run/docker.sock --entrypoint bats pumba:test tests/*.bats`
- Containerd tests only: `colima ssh -- sudo bats tests/containerd_*.bats`
- Podman tests only (macOS, inside podman machine VM): `podman machine ssh sudo bats tests/podman_*.bats`
- Podman tests only (Linux, rootful): `sudo bats tests/podman_*.bats`
- CI builds a Docker image (`pumba:test` target `integration-tests`) and runs bats inside it
- Rebuild test image after code changes: `docker build --target integration-tests -t pumba:test -f docker/Dockerfile .`
- Copy updated binary to Colima: `colima ssh -- sudo cp /Users/alexei/workspace/pumba/.bin/linux/pumba /usr/local/bin/pumba`
- Bats teardown: Use `sudo pkill -f "pumba.*<container-name>"` to stop background pumba processes; `kill %1` for job control fallback
- Go version: 1.26 (see go.mod)
- CLI framework: `github.com/urfave/cli` (v1)
- Docker SDK: `github.com/docker/docker` v28.5.2
- Containerd SDK: `github.com/containerd/containerd/v2` (containerd runtime support)
- Error handling: `github.com/pkg/errors` (deprecated; migrate to `fmt.Errorf` with `%w`)
- Logging: `github.com/sirupsen/logrus`
- Testing: `github.com/stretchr/testify` (assert, mock, require)
- Mocking: `github.com/vektra/mockery`
- Linting: golangci-lint with `.golangci.yaml`
- CI: GitHub Actions (build.yaml, release.yaml, codeql-analysis.yml, nettools-images.yaml)
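The `pkg/errors` to `fmt.Errorf` migration noted in the list above can be illustrated with the standard library alone. This is a minimal sketch, not code from this repository: `errContainerNotFound`, `inspect`, `stopContainer`, and `isNotFound` are hypothetical names. It shows that wrapping with `%w` keeps the original cause reachable through `errors.Is`.

```go
package main

import (
	"errors"
	"fmt"
)

// errContainerNotFound is a hypothetical sentinel error used only in this sketch.
var errContainerNotFound = errors.New("container not found")

// inspect stands in for a runtime call that can fail.
func inspect(name string) error {
	if name == "" {
		return errContainerNotFound
	}
	return nil
}

// stopContainer adds context with %w so the original cause stays in the chain.
func stopContainer(name string) error {
	if err := inspect(name); err != nil {
		return fmt.Errorf("failed to stop container %q: %w", name, err)
	}
	return nil
}

// isNotFound reports whether err wraps errContainerNotFound at any depth.
func isNotFound(err error) bool {
	return errors.Is(err, errContainerNotFound)
}

func main() {
	err := stopContainer("")
	fmt.Println(err)
	fmt.Println(isNotFound(err))
}
```

Callers that previously compared against an unwrapped cause can switch to `errors.Is` or `errors.As` without losing the added context message.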
cmd/
  main.go - CLI entry point: main(), init(), signal context, app construction
  runtime.go - createRuntimeClient, tlsConfig, runtime factory vars (newDockerClient/newContainerdClient/newPodmanClient)
  logging.go - setupLogging: log-level switch + slackrus hook wiring
  flags.go - globalFlags(rootCertPath) builder
  commands.go - initializeCLICommands wiring chaos cmd builders into urfave/cli commands
pkg/
  chaos/
    command.go - ChaosCommand interface, Runtime factory type, scheduling/interval runner
    runner.go - chaos.RunOnContainers / RunOnContainersAll fanout helper (the canonical list-then-random-then-parallel/serial pattern)
    cmd/ - Generic NewAction[P] CLI builder shared across all chaos packages
    cliflags/ - urfave/cli v1 adapter (Flags interface + V1, NewV1FromApp for app-level context) decoupling cmd builders from cli.Context
    lifecycle/ - Runtime-agnostic lifecycle chaos actions (kill, stop, pause, rm, exec, restart)
    lifecycle/cmd/ - CLI command builders for lifecycle chaos actions
    netem/ - Network emulation (delay, loss, corrupt, duplicate, rate, loss_ge, loss_state)
    netem/cmd/ - CLI command builders for netem
    iptables/ - iptables-based packet filtering
    iptables/cmd/ - CLI command builders for iptables
    stress/ - stress-ng based resource stress
    stress/cmd/ - CLI command builder for stress
  container/ - Container model, interfaces (Client, Lister, Lifecycle, Executor, Netem, etc.), filtering, NetemRequest/IPTablesRequest/StressRequest/StressResult/RemoveOpts value objects
  runtime/
    docker/ - Docker runtime implementation of container.Client
    containerd/ - Containerd runtime implementation of container.Client
    podman/ - Podman runtime implementation (embeds Docker client, overrides stress cgroup resolution + rootless guards)
  util/ - Shared utilities (IP/port parsing, ValidateInterfaceName for network interface name validation)
mocks/ - Generated mock files (mockery)
tests/ - Bats integration tests
docker/ - Dockerfiles (main, alpine-nettools, debian-nettools)
deploy/ - K8s/OpenShift deployment manifests
examples/ - Demo scripts
- Container interfaces (`pkg/container/client.go`): Focused sub-interfaces (Lister, Lifecycle, Executor, Netem, IPTables, Stressor) composed into a unified Client interface. Fat methods take request value objects from `pkg/container/requests.go`: `Netem`/`IPTables` take `*NetemRequest`/`*IPTablesRequest`, `Stressor.StressContainer` takes `*StressRequest` and returns `*StressResult` (`SidecarID` + `Output`/`Errors` channels), `Lifecycle.RemoveContainer` takes `RemoveOpts` (force/links/volumes/dryRun bools). `SidecarSpec` carries the implementation-hint `(Image, Pull)` pair that runtime adapters may ignore.
- Runtime factory injection (`pkg/chaos/command.go`): `chaos.Runtime func() container.Client` is a closure-based factory. Every CLI builder constructor takes `runtime chaos.Runtime` explicitly: no `chaos.DockerClient` global, no service locator. `cmd/main.go::before` constructs the closure once after global flag parsing and propagates it through `initializeCLICommands`.
- Canonical chaos fanout (`pkg/chaos/runner.go`): `RunOnContainers(ctx, lister, gp, limit, random, parallel, fn)` is the single helper every chaos action's `Run()` calls. It lists matching containers, optionally narrows to a random pick, and runs `fn` either via `errgroup` (parallel) or sequentially (serial). The `RunOnContainersAll` variant lists stopped + running containers for lifecycle ops that target stopped containers. New chaos actions MUST use this helper instead of hand-rolling the list-then-fanout loop.
- Interface name validation (`pkg/util/util.go::ValidateInterfaceName`): single source of truth for the `eth0`/`en0`/`lo`/`vlan.10` regex check; both `pkg/chaos/netem/parse.go` and `pkg/chaos/iptables/parse.go` call it. Never re-introduce a local `regexp.MustCompile` for interface names.
- Generic CLI builder (`pkg/chaos/cmd/builder.go`): `NewAction[P]` collapses all 17 chaos cmd files into the same shape: flag list + typed `ParamParser[P]` + `CommandFactory[P]`. Parsers receive `cliflags.Flags`, never `*cli.Context` directly.
- Shared cmd parsing (`pkg/chaos/{netem,iptables}/parse.go::ParseRequestBase`): per-action cmd parsers (`delay.go`, `loss.go`, …) call `ParseRequestBase` first to read parent-level flags via `c.Parent()` and build the populated base request (`*NetemRequest` for netem; iptables returns a small `RequestBase` carrying `*IPTablesRequest` + iface/protocol/limit). Per-action parsers then fill only their action-specific fields. New netem/iptables subcommands MUST follow this pattern; never re-parse the parent flag set inline.
- CLI flags adapter (`pkg/chaos/cliflags/`): `Flags` interface wraps `urfave/cli` v1 via `V1`. Future v3 migration is a one-file swap (`v3.go` + wiring change in `cmd/main.go`).
- Docker runtime (`pkg/runtime/docker/`): Docker SDK implementation of container.Client; split per-concern across `client.go`, `http_client.go`, `inspect.go`, `lifecycle.go`, `exec.go`, `sidecar.go`, `netem.go`, `iptables.go`, `stress.go`, `cgroup.go`, `pull.go` (no monolith; every file < 350 LOC)
- Containerd runtime (`pkg/runtime/containerd/`): Containerd implementation of container.Client (socket: `/run/containerd/containerd.sock`, namespace: `k8s.io`); split per-concern across `client.go`, `api.go`, `container.go`, `task.go`, `commands.go`, `cgroup.go`, `sidecar.go`, `netem.go`, `iptables.go`, `stress.go`, `stress_sidecar.go` (every production file ≤ 250 LOC)
- Podman runtime (`pkg/runtime/podman/`): Podman implementation of container.Client; reuses the Docker SDK against Podman's Docker-compat socket and overrides only what diverges (stress cgroup resolution + rootless guards). Socket auto-detected from `$CONTAINER_HOST`, `$PODMAN_SOCK`, `podman machine inspect`, `/run/podman/podman.sock`, and `$XDG_RUNTIME_DIR/podman/podman.sock`; override via `--podman-socket`. Cgroup parent/leaf derived host-side from `/proc/<pid>/cgroup` of the target container (see `pkg/runtime/podman/cgroup.go`), so pumba must run on the same kernel as the targets.
- Chaos commands: Each action implements the `ChaosCommand` interface with a `Run(ctx, random)` method
- Network emulation / iptables filtering: Both paths (`NetemContainer`, `IPTablesContainer`) execute commands inside an ephemeral sidecar container that joins the target's network namespace. The shared lifecycle (create, start, exec-per-args with exit-code check, force-remove) lives in `runSidecar`/`runSidecarExec` (`pkg/runtime/docker/sidecar.go`). Neither `netem.go` nor `iptables.go` manages sidecar lifecycle directly.
- Stress testing: Two modes: (1) default child-cgroup mode places the stress-ng sidecar in the target's cgroup via Docker's `--cgroup-parent`; (2) inject-cgroup mode (`--inject-cgroup`) uses the `cg-inject` binary (shipped in `ghcr.io/alexei-led/stress-ng`) to write the sidecar PID into the target's `cgroup.procs` for shared resource accounting
- Target selection: Container names (exact), comma-separated lists, or `re2:` prefixed regex patterns
- Label filtering: `--label key=value` flags for container selection
- Interval mode: `--interval` flag for recurring chaos on a schedule
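To make the "Canonical chaos fanout" shape concrete, here is a standalone sketch of the list → random-pick → parallel-or-serial flow. It is an illustration under stated assumptions, not the repository's helper: `Container`, `runOnContainers`, and `countTargets` are hypothetical, the signature is simplified, and the parallel branch uses `sync.WaitGroup` with `errors.Join` as a standard-library stand-in for the `errgroup` the real helper is described as using. New chaos actions should still call `chaos.RunOnContainers` rather than copy this.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"sync"
)

// Container is a hypothetical stand-in for the real container model.
type Container struct{ Name string }

// runOnContainers mirrors the list -> random-pick -> parallel-or-serial shape.
// Illustration only; the real helper has a different signature.
func runOnContainers(ctx context.Context, list func(context.Context) ([]*Container, error),
	random, parallel bool, fn func(context.Context, *Container) error) error {
	containers, err := list(ctx)
	if err != nil {
		return fmt.Errorf("listing containers: %w", err)
	}
	if len(containers) == 0 {
		return nil // nothing matched, nothing to do
	}
	if random {
		// Narrow the target set to a single random container.
		containers = []*Container{containers[rand.Intn(len(containers))]}
	}
	if !parallel {
		// Serial mode: stop at the first failure.
		for _, c := range containers {
			if err := fn(ctx, c); err != nil {
				return err
			}
		}
		return nil
	}
	// Parallel mode: run every action, then report all failures together.
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs []error
	)
	for _, c := range containers {
		wg.Add(1)
		go func(c *Container) {
			defer wg.Done()
			if err := fn(ctx, c); err != nil {
				mu.Lock()
				errs = append(errs, err)
				mu.Unlock()
			}
		}(c)
	}
	wg.Wait()
	return errors.Join(errs...)
}

// countTargets runs a no-op action against three fake containers and
// reports how many of them were hit.
func countTargets(random, parallel bool) int {
	list := func(context.Context) ([]*Container, error) {
		return []*Container{{Name: "a"}, {Name: "b"}, {Name: "c"}}, nil
	}
	var mu sync.Mutex
	hits := 0
	_ = runOnContainers(context.Background(), list, random, parallel,
		func(_ context.Context, _ *Container) error {
			mu.Lock()
			defer mu.Unlock()
			hits++
			return nil
		})
	return hits
}

func main() {
	fmt.Println(countTargets(false, true), countTargets(true, false))
}
```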
- Error wrapping: Currently uses `github.com/pkg/errors`; migrate to `fmt.Errorf("...: %w", err)`
- Interfaces: Define interfaces for testability (Client in `pkg/container/client.go`)
- Constructor injection over globals: Chaos cmd builders take `runtime chaos.Runtime` and produce a closure that reads it lazily. Never reintroduce a `chaos.DockerClient`-style global; visibility in the function signature is the rule.
- Request value objects for fat methods: Interface methods accept request structs from `pkg/container/requests.go`, not long positional arg lists. Examples: `Netem.NetemContainer(*NetemRequest)`, `IPTables.IPTablesContainer(*IPTablesRequest)`, `Stressor.StressContainer(*StressRequest) (*StressResult, error)`, `Lifecycle.RemoveContainer(*Container, RemoveOpts)`. New runtime methods that exceed 4 params must follow the same pattern. Pass pointer for ≥3-field requests; pass-by-value for tiny opts (like `RemoveOpts`'s 4 bools).
- Chaos fanout via `chaos.RunOnContainers`: New chaos actions MUST route through `chaos.RunOnContainers` (or `RunOnContainersAll` for stopped-container targeting); never hand-roll `container.ListNContainers` + manual `sync.WaitGroup`/`errgroup` fanout in a `Run()` body. The helper enforces the list → random-pick → parallel-or-serial → collect-errors shape uniformly. Action-specific timeouts (e.g. `context.WithTimeout(ctx, n.req.Duration)`) belong inside the closure, not at the caller.
- Interface name validation: call `util.ValidateInterfaceName(name)` for any new flag that takes a network interface name. Do not introduce a local `regexp.MustCompile` for the `eth0`/`en0`/`vlan.10` pattern.
- Mocking: testify mocks, generated by mockery. Mock files in `mocks/` and `pkg/container/mock_*.go`
- Mock constructor: Always use `container.NewMockClient(t)`, never `new(container.MockClient)`; the constructor auto-asserts expectations on test cleanup
- Mock request structs: EXPECT() calls for `NetemContainer`/`IPTablesContainer`/`StressContainer` pass the corresponding `*NetemRequest`/`*IPTablesRequest`/`*StressRequest` literal (or `mock.AnythingOfType("*container.NetemRequest")` etc.); `RemoveContainer` takes a `RemoveOpts{...}` literal (value, not pointer). Never the old positional form.
- Mock context: Use `mock.Anything` only for `context.Context` args; use exact values for all business params
- Mock random container: Use `mock.AnythingOfType("*container.Container")` + `.Once()` when only one of N containers is targeted
- Logging: logrus with structured fields. Log level set via `--log-level` flag
- Constants: Magic numbers are flagged by the `mnd` linter; use named constants
- Cleanup defer survives cancellation: Use `context.WithoutCancel(ctx)` with a timeout so defers (e.g. sidecar cleanup) run even when the caller cancels
- gocyclo violations (limit: 15): Extract loop bodies or complex branches into named helper methods
- funlen violations (limit: 105): Extract initialization blocks (flags, config setup) into named helper functions
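The "cleanup defer survives cancellation" rule above can be sketched as follows. This is a minimal illustration rather than the repository's code: `runWithCleanup`, `cleanupSeesLiveContext`, and the `cleanupTimeout` value are assumptions made for the example. It relies on `context.WithoutCancel` (Go 1.21+), which keeps the parent's values but drops its cancellation, wrapped in its own timeout so cleanup cannot hang forever.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// cleanupTimeout is an illustrative value chosen for this sketch.
const cleanupTimeout = 15 * time.Second

// runWithCleanup runs action and then always runs cleanup with a context that
// keeps the caller's values but ignores the caller's cancellation.
func runWithCleanup(ctx context.Context, action, cleanup func(context.Context) error) (err error) {
	defer func() {
		cleanupCtx, cancel := context.WithTimeout(context.WithoutCancel(ctx), cleanupTimeout)
		defer cancel()
		// Report a cleanup failure only if the action itself succeeded.
		if cerr := cleanup(cleanupCtx); cerr != nil && err == nil {
			err = fmt.Errorf("cleanup failed: %w", cerr)
		}
	}()
	return action(ctx)
}

// cleanupSeesLiveContext cancels the parent before cleanup runs and reports
// whether the cleanup context was still usable.
func cleanupSeesLiveContext() bool {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	live := false
	_ = runWithCleanup(ctx,
		func(ctx context.Context) error {
			cancel() // simulate SIGTERM arriving mid-action
			return ctx.Err()
		},
		func(ctx context.Context) error {
			live = ctx.Err() == nil
			return nil
		})
	return live
}

func main() {
	fmt.Println(cleanupSeesLiveContext())
}
```

Passing the already-cancelled `ctx` straight to the cleanup call would make a context-aware API return immediately with `context.Canceled`, which is the leak this convention guards against.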
- Default branch: `master`
- NEVER add AI co-author to git commits
- Skip from unit tests (covered by integration tests or untestable without real runtime): `cmd/main.go` and the rest of `cmd/*.go` except `cmd/main_test.go` (which exercises the `createRuntimeClient` seam in `cmd/runtime.go`), `mocks/`, `NewClient`/`Close`, `sidecar.go` (real container API for create/start/exec/remove), `*/cmd/` flag builders
- Run() method variants: Always add NoContainers + DryRun + WithRandom test cases
- Run unit tests in sandbox: `CGO_ENABLED=0 go test ./...` (no CGO needed outside CI)
- PID 1 signal handling: `sleep`/`tail -f /dev/null` as container PID 1 ignores SIGTERM; use `top` in bats tests that kill with SIGTERM
- iptables flag ordering: `--source`, `--destination`, `--src-port`, `--dst-port` are on the `iptables` parent command, NOT on the `loss` subcommand
- exec command parsing: `--command "touch /tmp/foo"` is wrong (treated as binary name with spaces); use `--command "touch" --args "/tmp/foo"`
- Containerd sidecar requires root: netem/iptables tests on containerd need `sudo pumba` because overlayfs mounts for sidecar creation require root in the Colima VM
- Containerd namespaces: Docker-managed containers live in the `moby` namespace; pure containerd in `default`; Kubernetes in `k8s.io`
- Podman requires rootful for netem/iptables/stress: rootless Podman is detected at `NewClient` time from `Info.SecurityOptions` and every netem/iptables/stress call fails fast with a message pointing at `podman machine set --rootful` (macOS) or the rootful systemd unit (Linux). Rootless support is out of scope; it would need slirp4netns/pasta netns handling and user-namespace cgroup math.
- Podman cgroup leaf naming: Podman uses `libpod-<id>.scope` (or `libpod-<id>.scope/container` with init sub-cgroup) under systemd, vs Docker's `docker-<id>.scope`. Pointing `--runtime docker` at Podman's compat socket silently places stress-ng sidecars in the wrong cgroup; `--runtime podman` derives the correct path.
- Podman cgroup resolution reads host-side `/proc/<pid>/cgroup`: containers launched under Podman's default `--cgroupns=private` see only `0::/` or `0::/container` from inside the container, so we read from pumba's own view of `/proc` (requires shared kernel with targets). On macOS this means running pumba inside the `podman machine` VM, the same pattern used for containerd testing in Colima. See `pkg/runtime/podman/cgroup.go` and the `cgroupReader` var for the test-injectable hook.
- `ContainerExecStart` empty options breaks on Podman: Docker's `ContainerExecStart(ctx, id, ExecStartOptions{})` with no attach/detach flags is accepted by Docker (implicit sync via HTTP hijack) but rejected by Podman's compat API with "must provide at least one stream to attach to". The `runExecAttached` helper in `pkg/runtime/docker/exec.go` goes through `ContainerExecAttach` + drain + inspect, which works on both. When writing new exec-driven code, never call `ContainerExecStart` with empty options; use the helper.
- tc/iptables sidecar cleanup must survive ctx cancel: `runSidecar` (in `pkg/runtime/docker/sidecar.go`) calls `removeSidecar` (wraps `ContainerRemove` with `context.WithoutCancel(ctx)` + 15 s timeout). Without this, SIGTERM to pumba during the narrow window after tc exec and before sidecar reap leaks the sidecar AND leaves the netem qdisc on the target's netns.
- Sidecar `StopSignal: "SIGKILL"`: `tail -f /dev/null` as PID 1 ignores SIGTERM. Podman's `DELETE ?force=1` sends SIGTERM and waits the full `StopTimeout` (10 s) before escalating, which costs 10 s per sidecar reap on every netem/iptables call. Config sets `StopSignal: "SIGKILL"` so force-remove is immediate.
- Podman inject-cgroup needs SYS_ADMIN + `label=disable` + nested leaf: cg-inject writes the sidecar's PID into the target's `cgroup.procs`. Three gotchas stack on cgroup v2 + systemd: (1) the target's scope may have a nested `container/` leaf (Podman's libpod init sub-cgroup); cgroup v2's "no internal processes" rule means we must target `libpod-<id>.scope/container/cgroup.procs`, not the outer scope. `pkg/runtime/podman/stress.go::resolveCgroup` reads `/proc/<pid>/cgroup` plus `os.Stat` to pick between them. (2) Writing across sibling scopes requires `CAP_SYS_ADMIN` in the initial user namespace. (3) SELinux `container_t` blocks cgroup writes even with SYS_ADMIN; `SecurityOpt: ["label=disable"]` on the sidecar bypasses this. All three are required on Fedora CoreOS / RHEL hosts.
- Podman 4.9.x transient `/container` sub-cgroup races inject-cgroup: Stock Podman 4.9.x on Ubuntu 24.04 creates `<scope>/container` during libpod init, migrates PID 1 to `<scope>` shortly after, and `rmdir`s `/container` mid-flight. The resolver's `os.Stat` check can pass and then `/container` is gone by the time cg-inject opens/writes `cgroup.procs`, yielding ENOENT on write (documented Podman behavior; see podman#20910). Podman 5.x (podman machine, Fedora CoreOS) keeps `/container` stable. The inject-cgroup bats test lives in `tests/skip_ci/podman_stress_inject_cgroup.bats` and is excluded from the GH CI glob `tests/podman_*.bats` for this reason. Proper fix: retry-on-ENOENT inside cg-inject (stress-ng sibling repo), then move the test back.
- Ephemeral tc sidecar breaks poll-based bats assertions: the tc sidecar lives only for the duration of tc exec (sub-second) before `removeSidecar` reaps it. Bats tests that `podman ps`-polled for the sidecar (to verify the skip label) race the lifecycle. Rewrote `tests/podman_sidecar.bats` tests 68/69 to assert invariants instead: (a) create a fake container with the skip label manually and confirm pumba's `re2:` regex doesn't target it; (b) verify netem rules are removed from the TARGET netns after SIGTERM (via `nsenter`), not whether the sidecar itself was tracked.
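The host-side `/proc/<pid>/cgroup` lookup described in the Podman cgroup notes above comes down to finding the cgroup v2 line, which has the form `0::<path>`, and splitting the path into parent and leaf. The sketch below illustrates only that parsing step. `cgroupV2Path` and `splitParentLeaf` are hypothetical names, the sample path is made up, and the `os.Stat` probe for a nested `container` leaf is omitted, so this should not be read as the logic in `pkg/runtime/podman/`.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

// cgroupV2Path extracts the unified-hierarchy path from the contents of a
// /proc/<pid>/cgroup file. On cgroup v2 the relevant line is "0::<path>".
func cgroupV2Path(content string) (string, error) {
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		// Each line is "hierarchy-ID:controller-list:path"; split into at
		// most 3 parts so colons inside the path are preserved.
		parts := strings.SplitN(sc.Text(), ":", 3)
		if len(parts) == 3 && parts[0] == "0" && parts[1] == "" {
			return parts[2], nil
		}
	}
	if err := sc.Err(); err != nil {
		return "", fmt.Errorf("scanning cgroup file: %w", err)
	}
	return "", errors.New("no cgroup v2 entry found")
}

// splitParentLeaf splits a cgroup path into its parent and final component.
func splitParentLeaf(path string) (parent, leaf string) {
	path = strings.TrimSuffix(path, "/")
	i := strings.LastIndex(path, "/")
	if i < 0 {
		return "", path
	}
	return path[:i], path[i+1:]
}

func main() {
	// Made-up sample of what the host-side file might contain for a
	// Podman container under systemd.
	sample := "0::/machine.slice/libpod-abc123.scope/container\n"
	p, err := cgroupV2Path(sample)
	if err != nil {
		panic(err)
	}
	parent, leaf := splitParentLeaf(p)
	fmt.Println(parent, leaf)
}
```

Taking the file contents as a string rather than a PID keeps the parser testable without a real `/proc`, which is the same motivation the document gives for the injectable `cgroupReader` hook.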