feat(llm-challenge): run solve phase inside Podman container by dqn · Pull Request #780 · tailor-platform/sdk

dqn · 2026-03-18T05:45:42Z

Replace software-level isolation with OS-level container isolation via Podman

Benchmark Results

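| Agent  | Model   | Effort | Score         | Cost  | Time   |
|--------|---------|--------|---------------|-------|--------|
| Claude | sonnet  | N/A    | 291/300 (97%) | $1.92 | 11m14s |
| Codex  | gpt-5.4 | high   | 273/300 (91%) | $1.30 | 7m53s  |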

Main Changes

Solve phase always runs inside a Podman container (llm-challenge-runner image, auto-built on first run)
Container isolation prevents agents from accessing host filesystem, global CLI configs (~/.claude/settings.json, ~/.codex/config.toml), and other repositories
Auth: Claude uses the CLAUDE_CODE_OAUTH_TOKEN env var (from claude setup-token); Codex uses a mount of ~/.codex/auth.json
Container runs as non-root node user (Claude Code rejects bypassPermissions under root)
Codex uses --dangerously-bypass-approvals-and-sandbox (bubblewrap cannot nest inside rootless Podman)
Host workDir mounted at /workspace inside container (avoids macOS path leaking into Codex sandbox config)
SDK tarball copied into workDir for container access
Remove ~700 lines of software isolation code (cleanEnv, denylist rules, claude-settings.json, AGENTS.md)
Codex default model changed from gpt-5.1-codex-mini to gpt-5.4, timeout increased to 20 minutes
Containerfile includes ca-certificates for Codex TLS
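
The mounts, user, and auth handling listed above can be sketched as a function that assembles the `podman run` argument list. This is a minimal illustration under stated assumptions, not the code in container.ts: `sketchRunArgs`, `RunArgsOptions`, and the `:ro` flag on the auth.json mount are hypothetical, while the image name, the `/workspace` mount point, the `node` user, and the env var name come from the description.

```typescript
// Hypothetical sketch: function and option names are assumptions, not the PR's code.
type SolveAgent = "claude" | "codex";

interface RunArgsOptions {
  agent: SolveAgent;
  workDir: string; // host directory holding the challenge files
  codexAuthFile?: string; // host path to ~/.codex/auth.json (Codex only)
  agentArgs: string[]; // arguments passed through to the agent CLI
}

const IMAGE = "llm-challenge-runner";

function sketchRunArgs(opts: RunArgsOptions): string[] {
  const args = [
    "run",
    "--rm",
    "--user", "node", // non-root user inside the container
    "--volume", `${opts.workDir}:/workspace`, // fixed in-container path, no host path leaks
    "--workdir", "/workspace",
  ];
  if (opts.agent === "claude") {
    // A name-only --env makes Podman forward the host's value if it is set.
    args.push("--env", "CLAUDE_CODE_OAUTH_TOKEN");
  } else if (opts.codexAuthFile) {
    // Mount only auth.json, not the whole ~/.codex directory (read-only is an assumption).
    args.push("--volume", `${opts.codexAuthFile}:/home/node/.codex/auth.json:ro`);
  }
  // The agent name doubles as the CLI binary name.
  return [...args, IMAGE, opts.agent, ...opts.agentArgs];
}
```

Passing the token by name only keeps its value out of the argument list, and mounting the single auth file keeps a host config.toml out of the container.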

Notes

Podman must be installed and running before solve mode (podman machine start on macOS)
--use-solution and --impl modes are unaffected (no Podman required)
Verify phase still runs on the host
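
The Podman prerequisite could be checked before the solve phase roughly as follows. This is a sketch only: `checkPodman`, `Probe`, and the message wording are hypothetical, and the real check in run.ts / container.ts may differ. It relies on `podman --version` and `podman info`, both of which are standard Podman commands.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical probe type so the check can be exercised without Podman installed.
type Probe = (cmd: string, args: string[]) => { ok: boolean };

const defaultProbe: Probe = (cmd, args) => {
  const result = spawnSync(cmd, args, { stdio: "ignore" });
  // `error` is set when the binary cannot be spawned (for example, not on PATH).
  return { ok: result.error === undefined && result.status === 0 };
};

// Returns null when Podman is usable, otherwise a hint for the user.
function checkPodman(probe: Probe = defaultProbe, platform: string = process.platform): string | null {
  if (!probe("podman", ["--version"]).ok) {
    return "Podman is not installed or not on PATH.";
  }
  if (!probe("podman", ["info"]).ok) {
    return platform === "darwin"
      ? "Podman is installed but not running. Try `podman machine start`."
      : "Podman is installed but `podman info` failed.";
  }
  return null;
}
```

Only solve mode would need this check, since --use-solution and --impl do not use Podman.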

Replace software-level isolation (cleanEnv, claude-settings.json deny rules, Codex denylist rules) with OS-level container isolation via Podman. The solve phase now always runs inside a container, providing filesystem isolation without needing path obfuscation or env scrubbing.

Key changes:
- Add container.ts: Podman management (availability check, image build, container run args construction)
- Modify claude.ts/codex.ts: spawn podman instead of direct CLI
- Auth via CLAUDE_CODE_OAUTH_TOKEN / OPENAI_API_KEY env var passthrough
- Run container as non-root user (Claude Code rejects bypassPermissions under root)
- Remove ~700 lines of software isolation code (cleanEnv, denylist, claude-settings.json, AGENTS.md generation)
- Add Podman availability check and auth hints to run.ts

…tials
- Claude: use CLAUDE_CODE_OAUTH_TOKEN env var (from `claude setup-token`)
- Codex: mount ~/.codex/ read-only (contains auth.json with ChatGPT OAuth)
- Remove all OPENAI_API_KEY / ANTHROPIC_API_KEY references
- Update error hints to guide users to login-based auth
- Update llm-challenge skill with Podman prerequisites

Codex CLI (Rust binary) uses native TLS and requires system CA certificates. node:22-slim does not include them, causing "no native root CA certificates found" errors on HTTPS/WSS connections.

- Mount workDir at /workspace instead of host path to avoid macOS path leaking into Codex sandbox config (writable_roots)
- Mount only ~/.codex/auth.json instead of entire ~/.codex/ to prevent host config.toml from injecting invalid writable_roots
- Pre-create /home/node/.codex and .claude dirs in Containerfile
- Replace --sandbox workspace-write with --dangerously-bypass-approvals-and-sandbox (bubblewrap cannot create mount namespaces inside rootless Podman containers)

…ase timeout
- Default model: gpt-5.1-codex-mini -> gpt-5.4
- Timeout: 10 minutes -> 20 minutes (matches Claude)

…s, fix Codex auth docs
- copy SDK tarball into workDir/.sdk/ with relative file: ref so pnpm install works inside Podman container (host tarball path is not mounted in container)
- exclude .sdk/ directory from file listing shown to solve agent
- document credential exposure trade-off in container.ts security model comment
- correct Codex auth setup instructions: codex login / auth.json, not OPENAI_API_KEY
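
The tarball staging described in this commit might look like the sketch below. All function names (`sdkDependencySpec`, `stageSdkTarball`, `visibleToAgent`) are hypothetical; only the `.sdk/` directory, the relative `file:` reference, and the exclusion from the file listing come from the commit message.

```typescript
import { copyFileSync, mkdirSync } from "node:fs";
import { basename, join } from "node:path";

const SDK_DIR = ".sdk";

// Relative spec: resolves the same way on the host workDir and at /workspace in the container.
function sdkDependencySpec(hostTarball: string): string {
  return `file:./${SDK_DIR}/${basename(hostTarball)}`;
}

// Copies the tarball under workDir so it is reachable through the workDir mount.
function stageSdkTarball(hostTarball: string, workDir: string): string {
  const targetDir = join(workDir, SDK_DIR);
  mkdirSync(targetDir, { recursive: true });
  copyFileSync(hostTarball, join(targetDir, basename(hostTarball)));
  return sdkDependencySpec(hostTarball);
}

// Hides the staging directory from the file listing given to the solve agent.
function visibleToAgent(relativePaths: string[]): string[] {
  return relativePaths.filter((p) => p !== SDK_DIR && !p.startsWith(`${SDK_DIR}/`));
}
```

An absolute host path in package.json would point outside the mounted directory, which is why the reference has to be relative to workDir.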

…dContainerRunArgs

SolveAgent is "claude" | "codex" and both values match their CLI binary names, so the ternary `agent === "claude" ? "claude" : "codex"` is redundant.
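
A small illustration of why the ternary adds nothing: with a two-member string union whose members equal the binary names, the mapping is the identity. The function names below are hypothetical and exist only to compare the two forms.

```typescript
type SolveAgent = "claude" | "codex";

// Before: each branch returns the same string the agent already holds.
function cliBefore(agent: SolveAgent): string {
  return agent === "claude" ? "claude" : "codex";
}

// After: the agent value is used directly as the CLI binary name.
function cliAfter(agent: SolveAgent): string {
  return agent;
}
```

If a future agent's binary name ever differed from its identifier, an explicit lookup table would be needed again.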

detached: true was carried over from the old host-process invocation but is incorrect for Podman. SIGTERM may not reach container child processes when detached, leaving orphaned processes and keeping Node's event loop alive. checkCodexAuthStatus correctly omits detached.
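
A sketch of spawning a child without `detached` and terminating it on timeout, using Node's `child_process.spawn`. `runWithTimeout` is a hypothetical helper, not the PR's code; in the PR the command would be `podman` with the container run arguments.

```typescript
import { spawn } from "node:child_process";

interface RunResult {
  code: number | null; // null when the child was ended by a signal
  timedOut: boolean;
}

function runWithTimeout(cmd: string, args: string[], timeoutMs: number): Promise<RunResult> {
  return new Promise((resolve, reject) => {
    // No `detached: true`: the child is not made the leader of a new process group.
    const child = spawn(cmd, args, { stdio: "inherit" });
    let timedOut = false;
    const timer = setTimeout(() => {
      timedOut = true;
      child.kill("SIGTERM");
    }, timeoutMs);
    child.on("error", (err) => {
      clearTimeout(timer);
      reject(err);
    });
    child.on("close", (code) => {
      clearTimeout(timer);
      resolve({ code, timedOut });
    });
  });
}
```

Clearing the timer in both handlers keeps a finished run from holding Node's event loop open until the timeout fires.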

changeset-bot · 2026-03-18T05:45:47Z

⚠️ No Changeset found

Latest commit: fe806e7

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

pkg-pr-new · 2026-03-18T05:46:24Z

npm i https://pkg.pr.new/@tailor-platform/create-sdk@780

commit: fe806e7

github-actions · 2026-03-18T05:48:24Z

Code Metrics Report (packages/sdk)

	main (4c8c74d)	#780 (7d62ad3)	+/-
Coverage	55.2%	55.2%	0.0%
Code to Test Ratio	1:0.3	1:0.3	0.0

Details

  |                    | main (4c8c74d) | #780 (7d62ad3) | +/-  |
  |--------------------|----------------|----------------|------|
  | Coverage           |          55.2% |          55.2% | 0.0% |
  |   Files            |            301 |            301 |    0 |
  |   Lines            |          10010 |          10010 |    0 |
  |   Covered          |           5532 |           5532 |    0 |
  | Code to Test Ratio |          1:0.3 |          1:0.3 |  0.0 |
  |   Code             |          58204 |          58204 |    0 |
  |   Test             |          23200 |          23200 |    0 |

SDK Configure Bundle Size

	main (4c8c74d)	#780 (7d62ad3)	+/-
configure-index-size	10.74KB	10.74KB	0KB
dependency-chunks-size	33.76KB	33.76KB	0KB
total-bundle-size	44.49KB	44.49KB	0KB

Runtime Performance

	main (4c8c74d)	#780 (7d62ad3)	+/-
Generate Median	2,521ms	2,460ms	-61ms
Generate Max	2,634ms	2,539ms	-95ms
Apply Build Median	2,568ms	2,498ms	-70ms
Apply Build Max	2,593ms	2,590ms	-3ms

Type Performance (instantiations)

	main (4c8c74d)	#780 (7d62ad3)
tailordb-basic	42,977	42,977
tailordb-optional	3,927	3,927
tailordb-relation	4,071	4,071
tailordb-validate	2,925	2,925
tailordb-hooks	5,790	5,790
tailordb-object	11,571	11,571
tailordb-enum	2,793	2,793
resolver-basic	9,236	9,236
resolver-nested	25,623	25,623
resolver-array	17,859	17,859
executor-schedule	4,244	4,244
executor-webhook	883	883
executor-record	4,847	4,847
executor-resolver	4,270	4,270
executor-operation-function	877	877
executor-operation-gql	879	879
executor-operation-webhook	898	898
executor-operation-workflow	2,290	2,290

Reported by octocov

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 7 additional findings.

dqn added 9 commits March 17, 2026 20:38

fix(llm-challenge): add ca-certificates to container image

2935617

Codex CLI (Rust binary) uses native TLS and requires system CA certificates. node:22-slim does not include them, causing "no native root CA certificates found" errors on HTTPS/WSS connections.

chore(llm-challenge): change Codex default model to gpt-5.4 and incre…

4547c18

…ase timeout
- Default model: gpt-5.1-codex-mini -> gpt-5.4
- Timeout: 10 minutes -> 20 minutes (matches Claude)

refactor(llm-challenge): simplify agent CLI command selection in buil…

7c24b1c

…dContainerRunArgs

SolveAgent is "claude" | "codex" and both values match their CLI binary names, so the ternary `agent === "claude" ? "claude" : "codex"` is redundant.

fix(llm-challenge): clear imagePromise on build failure to allow retry

fe806e7
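
The fix named in this commit title follows a common pattern: cache the in-flight build promise, but drop it when the build rejects so a later call can try again. The sketch below assumes that shape; `ensureImage` and the injected `build` callback are hypothetical, and only the `imagePromise` name comes from the commit title.

```typescript
// Cached promise for the one-time image build; undefined until the first call.
let imagePromise: Promise<string> | undefined;

function ensureImage(build: () => Promise<string>): Promise<string> {
  if (!imagePromise) {
    imagePromise = build().catch((err) => {
      // Forget the failed attempt so the next call starts a fresh build.
      imagePromise = undefined;
      throw err;
    });
  }
  return imagePromise;
}
```

Without the reset, every later call would receive the same rejected promise and the image could not be rebuilt within the process.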

dqn marked this pull request as ready for review March 18, 2026 07:34

dqn requested review from remiposo and toiroakr as code owners March 18, 2026 07:34

devin-ai-integration bot reviewed Mar 18, 2026

dqn wants to merge 9 commits into main from feat/llm-challenge-podman-isolation
