feat(llm-challenge): run solve phase inside Podman container#780

Open
dqn wants to merge 9 commits into main from feat/llm-challenge-podman-isolation

dqn commented Mar 18, 2026

Replace the solve phase's software-level isolation with OS-level container isolation via Podman.

Benchmark Results

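  | Agent  | Model   | Effort | Score         | Cost  | Time   |
  |--------|---------|--------|---------------|-------|--------|
  | Claude | sonnet  | N/A    | 291/300 (97%) | $1.92 | 11m14s |
  | Codex  | gpt-5.4 | high   | 273/300 (91%) | $1.30 | 7m53s  |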

Main Changes

  • Solve phase always runs inside a Podman container (llm-challenge-runner image, auto-built on first run)
  • Container isolation prevents agents from accessing host filesystem, global CLI configs (~/.claude/settings.json, ~/.codex/config.toml), and other repositories
  • Auth: Claude uses CLAUDE_CODE_OAUTH_TOKEN env var (claude setup-token), Codex uses ~/.codex/auth.json mount
  • Container runs as non-root node user (Claude Code rejects bypassPermissions under root)
  • Codex uses --dangerously-bypass-approvals-and-sandbox (bubblewrap cannot nest inside rootless Podman)
  • Host workDir mounted at /workspace inside container (avoids macOS path leaking into Codex sandbox config)
  • SDK tarball copied into workDir for container access
  • Remove ~700 lines of software isolation code (cleanEnv, denylist rules, claude-settings.json, AGENTS.md)
  • Codex default model changed from gpt-5.1-codex-mini to gpt-5.4, timeout increased to 20 minutes
  • Containerfile includes ca-certificates for Codex TLS
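
For readers who want a concrete picture of the mounts and auth handling listed above, here is a minimal sketch of how the `podman run` arguments might be assembled. It is not the code in container.ts: the function name, the option names, and the read-only flag on the auth.json mount are assumptions made for illustration. Only the image name, the `node` user, the `/workspace` mount point, and the two auth mechanisms come from the description.

```typescript
// Hypothetical sketch; none of these names are taken from the PR's container.ts.
type SolveAgent = "claude" | "codex";

interface RunArgsOptions {
  agent: SolveAgent;
  workDir: string;        // host directory holding the challenge workspace
  image?: string;         // defaults to the image name given in the PR description
  codexAuthPath?: string; // host path to ~/.codex/auth.json (Codex only)
  agentArgs: string[];    // arguments passed through to the agent CLI
}

function sketchContainerRunArgs(opts: RunArgsOptions): string[] {
  const image = opts.image ?? "llm-challenge-runner";
  const args = [
    "run", "--rm",
    "--user", "node",                         // non-root user
    "--volume", `${opts.workDir}:/workspace`, // fixed container path, no host path inside
    "--workdir", "/workspace",
  ];
  if (opts.agent === "claude") {
    // `--env NAME` without a value asks Podman to take the value from the
    // caller's environment, so the token never appears in the argument list.
    args.push("--env", "CLAUDE_CODE_OAUTH_TOKEN");
  } else {
    if (!opts.codexAuthPath) {
      throw new Error("codexAuthPath is required for the codex agent");
    }
    // Mount only auth.json, not the whole ~/.codex directory.
    // The `:ro` suffix is an assumption of this sketch.
    args.push("--volume", `${opts.codexAuthPath}:/home/node/.codex/auth.json:ro`);
  }
  // Both agent names match their CLI binary names.
  args.push(image, opts.agent, ...opts.agentArgs);
  return args;
}
```

The resulting array would be passed to `spawn("podman", args)`. Keeping the construction in a pure function makes the mount and env decisions easy to unit-test without Podman installed.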

Notes

  • Podman must be installed and running before solve mode (podman machine start on macOS)
  • --use-solution and --impl modes are unaffected (no Podman required)
  • Verify phase still runs on the host
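
The prerequisite in the first note can be checked before solve mode starts. The sketch below is one way to do it, assuming that `podman --version` succeeding means Podman is installed and that `podman info` failing means the machine or service is not reachable. The function and type names are hypothetical and are not taken from run.ts.

```typescript
import { spawnSync } from "node:child_process";

// Result of running one probe command; mirrors the fields spawnSync returns.
type ProbeResult = { status: number | null; error?: Error };
type Probe = (command: string, args: string[]) => ProbeResult;

// Default probe: run the command synchronously and discard its output.
const defaultProbe: Probe = (command, args) => {
  const result = spawnSync(command, args, { stdio: "ignore" });
  return { status: result.status, error: result.error };
};

// Hypothetical helper: classify the Podman setup before entering solve mode.
function checkPodman(probe: Probe = defaultProbe): "ok" | "not-installed" | "not-running" {
  const version = probe("podman", ["--version"]);
  if (version.error || version.status !== 0) return "not-installed";
  // `podman info` needs a reachable Podman service, so on macOS it fails
  // until `podman machine start` has been run.
  const info = probe("podman", ["info"]);
  return !info.error && info.status === 0 ? "ok" : "not-running";
}
```

A caller could map `"not-running"` to a hint such as "run `podman machine start`" and skip the check entirely for `--use-solution` and `--impl`, which do not need Podman.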

dqn added 9 commits March 17, 2026 20:38
Replace software-level isolation (cleanEnv, claude-settings.json deny
rules, Codex denylist rules) with OS-level container isolation via
Podman. The solve phase now always runs inside a container, providing
filesystem isolation without needing path obfuscation or env scrubbing.

Key changes:
- Add container.ts: Podman management (availability check, image build,
  container run args construction)
- Modify claude.ts/codex.ts: spawn podman instead of direct CLI
- Auth via CLAUDE_CODE_OAUTH_TOKEN / OPENAI_API_KEY env var passthrough
- Run container as non-root user (Claude Code rejects bypassPermissions
  under root)
- Remove ~700 lines of software isolation code (cleanEnv, denylist,
  claude-settings.json, AGENTS.md generation)
- Add Podman availability check and auth hints to run.ts
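
The "image build" part of container.ts is described elsewhere in this PR as an auto-build on first run. A rough sketch of that idea, assuming the image is looked up with `podman image exists` and built from a `Containerfile` in a known directory, is shown here. The helper names and the directory layout are hypothetical.

```typescript
import { spawnSync } from "node:child_process";

// Runs `podman <args>` and returns its exit status.
type Run = (args: string[]) => number;

// Real runner; a null status (killed by signal, or spawn failure) is reported as 1.
const podman: Run = (args) =>
  spawnSync("podman", args, { stdio: "inherit" }).status ?? 1;

// Hypothetical helper: build the runner image only when it is missing.
function ensureImage(run: Run, image: string, contextDir: string): "present" | "built" {
  // `podman image exists` exits 0 when the image is in local storage.
  if (run(["image", "exists", image]) === 0) return "present";
  const status = run([
    "build",
    "--tag", image,
    "--file", `${contextDir}/Containerfile`,
    contextDir,
  ]);
  if (status !== 0) throw new Error(`podman build failed with status ${status}`);
  return "built";
}
```

Calling `ensureImage(podman, "llm-challenge-runner", dir)` before each solve run would make the first run pay the build cost and later runs return immediately.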
…tials

- Claude: use CLAUDE_CODE_OAUTH_TOKEN env var (from `claude setup-token`)
- Codex: mount ~/.codex/ read-only (contains auth.json with ChatGPT OAuth)
- Remove all OPENAI_API_KEY / ANTHROPIC_API_KEY references
- Update error hints to guide users to login-based auth
- Update llm-challenge skill with Podman prerequisites

Codex CLI (Rust binary) uses native TLS and requires system CA
certificates. node:22-slim does not include them, causing
"no native root CA certificates found" errors on HTTPS/WSS
connections.

- Mount workDir at /workspace instead of host path to avoid macOS path
  leaking into Codex sandbox config (writable_roots)
- Mount only ~/.codex/auth.json instead of entire ~/.codex/ to prevent
  host config.toml from injecting invalid writable_roots
- Pre-create /home/node/.codex and .claude dirs in Containerfile
- Replace --sandbox workspace-write with
  --dangerously-bypass-approvals-and-sandbox (bubblewrap cannot create
  mount namespaces inside rootless Podman containers)
…ase timeout

- Default model: gpt-5.1-codex-mini -> gpt-5.4
- Timeout: 10 minutes -> 20 minutes (matches Claude)
…s, fix Codex auth docs

- copy SDK tarball into workDir/.sdk/ with relative file: ref so pnpm install works
  inside Podman container (host tarball path is not mounted in container)
- exclude .sdk/ directory from file listing shown to solve agent
- document credential exposure trade-off in container.ts security model comment
- correct Codex auth setup instructions: codex login / auth.json, not OPENAI_API_KEY
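
The first two bullets of this commit (copy the tarball under workDir/.sdk/, reference it with a relative `file:` spec, and hide `.sdk/` from the agent's file listing) can be sketched as follows. The function names and the idea of rewriting `dependencies` in package.json directly are assumptions for illustration; the PR may wire the reference differently.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: copy the SDK tarball into the mounted workDir and point
// package.json at it with a relative `file:` reference, so the path resolves
// the same way on the host and under /workspace inside the container.
function stageSdkTarball(workDir: string, tarballPath: string, packageName: string): string {
  const sdkDir = path.join(workDir, ".sdk");
  fs.mkdirSync(sdkDir, { recursive: true });
  const fileName = path.basename(tarballPath);
  fs.copyFileSync(tarballPath, path.join(sdkDir, fileName));

  const ref = `file:./.sdk/${fileName}`;
  const pkgPath = path.join(workDir, "package.json");
  const pkg = JSON.parse(fs.readFileSync(pkgPath, "utf8"));
  pkg.dependencies = { ...pkg.dependencies, [packageName]: ref };
  fs.writeFileSync(pkgPath, JSON.stringify(pkg, null, 2) + "\n");
  return ref;
}

// Hypothetical helper: top-level listing shown to the solve agent, without .sdk/.
function listVisibleFiles(workDir: string): string[] {
  return fs.readdirSync(workDir).filter((name) => name !== ".sdk").sort();
}
```

An absolute host path would not work here because only workDir is mounted into the container, which is the reason the commit gives for the relative reference.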
…dContainerRunArgs

SolveAgent is "claude" | "codex" and both values match their CLI binary names,
so the ternary `agent === "claude" ? "claude" : "codex"` is redundant.

detached: true was carried over from the old host-process invocation but is
incorrect for Podman. SIGTERM may not reach container child processes when
detached, leaving orphaned processes and keeping Node's event loop alive.
checkCodexAuthStatus correctly omits detached.
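
To illustrate the `detached` point, here is a minimal sketch of spawning a child without `detached: true` and terminating it on timeout. It is a generic pattern, not the code in claude.ts or codex.ts, and the function name and return shape are hypothetical.

```typescript
import { spawn } from "node:child_process";

interface RunResult {
  code: number | null; // null when the child was terminated by a signal
  timedOut: boolean;
}

// Hypothetical helper: run a command and send SIGTERM if it exceeds timeoutMs.
function runWithTimeout(command: string, args: string[], timeoutMs: number): Promise<RunResult> {
  return new Promise((resolve, reject) => {
    // No `detached: true`: the child stays in the parent's process group
    // instead of becoming the leader of a new one.
    const child = spawn(command, args, { stdio: "inherit" });
    let timedOut = false;
    const timer = setTimeout(() => {
      timedOut = true;
      child.kill("SIGTERM");
    }, timeoutMs);
    child.on("error", (err) => {
      clearTimeout(timer);
      reject(err);
    });
    child.on("close", (code) => {
      // Clearing the timer keeps it from holding Node's event loop open.
      clearTimeout(timer);
      resolve({ code, timedOut });
    });
  });
}
```

With the 20-minute solve timeout from this PR, a call might look like `runWithTimeout("podman", args, 20 * 60 * 1000)`.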

changeset-bot bot commented Mar 18, 2026

⚠️ No Changeset found

Latest commit: fe806e7

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

pkg-pr-new bot commented Mar 18, 2026

npm i https://pkg.pr.new/@tailor-platform/create-sdk@780

commit: fe806e7

@github-actions

Code Metrics Report (packages/sdk)

  |                    | main (4c8c74d) | #780 (7d62ad3) | +/-  |
  |--------------------|----------------|----------------|------|
  | Coverage           |          55.2% |          55.2% | 0.0% |
  | Code to Test Ratio |          1:0.3 |          1:0.3 |  0.0 |
Details
  |                    | main (4c8c74d) | #780 (7d62ad3) | +/-  |
  |--------------------|----------------|----------------|------|
  | Coverage           |          55.2% |          55.2% | 0.0% |
  |   Files            |            301 |            301 |    0 |
  |   Lines            |          10010 |          10010 |    0 |
  |   Covered          |           5532 |           5532 |    0 |
  | Code to Test Ratio |          1:0.3 |          1:0.3 |  0.0 |
  |   Code             |          58204 |          58204 |    0 |
  |   Test             |          23200 |          23200 |    0 |

SDK Configure Bundle Size

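  |                        | main (4c8c74d) | #780 (7d62ad3) | +/- |
  |------------------------|----------------|----------------|-----|
  | configure-index-size   |        10.74KB |        10.74KB | 0KB |
  | dependency-chunks-size |        33.76KB |        33.76KB | 0KB |
  | total-bundle-size      |        44.49KB |        44.49KB | 0KB |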

Runtime Performance

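  |                    | main (4c8c74d) | #780 (7d62ad3) | +/-   |
  |--------------------|----------------|----------------|-------|
  | Generate Median    |        2,521ms |        2,460ms | -61ms |
  | Generate Max       |        2,634ms |        2,539ms | -95ms |
  | Apply Build Median |        2,568ms |        2,498ms | -70ms |
  | Apply Build Max    |        2,593ms |        2,590ms |  -3ms |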

Type Performance (instantiations)

  |                             | main (4c8c74d) | #780 (7d62ad3) | +/- |
  |-----------------------------|----------------|----------------|-----|
  | tailordb-basic              |         42,977 |         42,977 |   0 |
  | tailordb-optional           |          3,927 |          3,927 |   0 |
  | tailordb-relation           |          4,071 |          4,071 |   0 |
  | tailordb-validate           |          2,925 |          2,925 |   0 |
  | tailordb-hooks              |          5,790 |          5,790 |   0 |
  | tailordb-object             |         11,571 |         11,571 |   0 |
  | tailordb-enum               |          2,793 |          2,793 |   0 |
  | resolver-basic              |          9,236 |          9,236 |   0 |
  | resolver-nested             |         25,623 |         25,623 |   0 |
  | resolver-array              |         17,859 |         17,859 |   0 |
  | executor-schedule           |          4,244 |          4,244 |   0 |
  | executor-webhook            |            883 |            883 |   0 |
  | executor-record             |          4,847 |          4,847 |   0 |
  | executor-resolver           |          4,270 |          4,270 |   0 |
  | executor-operation-function |            877 |            877 |   0 |
  | executor-operation-gql      |            879 |            879 |   0 |
  | executor-operation-webhook  |            898 |            898 |   0 |
  | executor-operation-workflow |          2,290 |          2,290 |   0 |

Reported by octocov

dqn marked this pull request as ready for review March 18, 2026 07:34
dqn requested review from remiposo and toiroakr as code owners March 18, 2026 07:34
devin-ai-integration bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 7 additional findings.
