
adds docker-env eval config with 3 attempts and verifier enabled #1

Open: anirudhp26 wants to merge 1 commit into main from ani/docker-eval

Conversation

@anirudhp26 (Collaborator)

No description provided.

@parsewave-bot (bot) commented Feb 10, 2026

TerminalBench Bot Commands

Run tasks:
/bot tb run [--dataset, --dataset-path, --dataset-config, --registry-url, --local-registry-path, --output-path, --run-id, --upload-results, --task-id, --n-tasks, --exclude-task-id, --no-rebuild, --cleanup, --use-subscription, --model, --agent, --agent-import-path, --agent-kwarg, --log-level, --livestream, --n-concurrent, --n-attempts, --global-timeout-multiplier, --global-agent-timeout-sec, --global-test-timeout-sec, --contributionsCommit]

Check:
/bot tb tasks check [--task-id, --tasks-dir, --unit-test-relative-path, --dockerfile-relative-path, --model, --agent, --fix, --output-path, --contributionsCommit]

Debug:
/bot tb tasks debug [--task-id, --run-id, --runs-dir, --tb-run-job-id, --tasks-dir, --agent, --model, --n-trials, --output-path, --contributionsCommit]

Full Check:
/bot full-check-v2 [--task-id <id>] [--analyze-failure] [...]
/bot full-check-v2 --opus-only - Run only tb_run_large with Claude Opus 4.6
/bot full-check-v2 --sonnet-only - Run only tb_run_large with Claude Sonnet 4.5
/bot full-check --tb-run-large-agent terminus-2 --tb-run-large-model openrouter/openai/gpt-5.2

For detailed parameter descriptions, run tb --help or tb <command> --help locally.

Job Management:
/bot job list - List all running jobs
/bot job status <job_id> - Get status of a specific job
/bot job kill <job_id> - Kill a running job
/bot job restart <job_id> - Restart a failed job
/bot job info <job_id> - Show detailed information about a job
/bot job cleanup - Remove all failed-to-report jobs

Remove default flags: Use --no-{flag} to disable default flags (e.g., --no-use-subscription)
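The `--no-{flag}` convention is the same one Python's standard `argparse` module offers through `BooleanOptionalAction` (Python 3.9+). The sketch below only illustrates the convention; it is not the bot's code, and treating `--use-subscription` as a default-on flag is an assumption based on the example above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser illustrating default-on flags with --no-{flag} negation."""
    parser = argparse.ArgumentParser(prog="tb run")
    # Assumption: --use-subscription is on by default, so only
    # --no-use-subscription changes its value.
    parser.add_argument(
        "--use-subscription",
        action=argparse.BooleanOptionalAction,
        default=True,
    )
    parser.add_argument("--task-id")
    return parser


args = build_parser().parse_args(["--no-use-subscription", "--task-id", "t1"])
print(args.use_subscription, args.task_id)
```

`BooleanOptionalAction` registers both `--use-subscription` and `--no-use-subscription` from a single declaration, which keeps the default and its negation in one place.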

Aliases:
/bot /gpt-5-nano-attempts [--n, --task-id, --tasks-dir] → /bot /tb run --model openrouter/openai/gpt-oss-120b:exacto --agent terminus-2
/bot /codex-attempts [--n, --task-id, --tasks-dir] → /bot /tb run --agent codex
/bot /claude-attempts [--n, --task-id, --tasks-dir] → /bot /tb run --agent claude-code
/bot /tb-check [--task-id, --tasks-dir] → /bot /tb tasks check
/bot /oracle [--task-id, --tasks-dir] → /bot /tb run --agent oracle
/bot /nop [--task-id, --tasks-dir] → /bot /tb run --agent nop
/bot /tb-debug [--task-id, --tasks-dir] → /bot /tb tasks debug
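To show how each alias relates to its full command, here is a hypothetical Python sketch that rewrites an alias comment into its expansion using the mapping listed above. The `expand_alias` name and the choice to append remaining arguments unchanged are assumptions. The real bot may handle arguments differently, for example by translating `--n` into another flag.

```python
import shlex

# Alias -> expansion, copied from the alias list in the bot's help text.
ALIASES = {
    "/gpt-5-nano-attempts": "/tb run --model openrouter/openai/gpt-oss-120b:exacto --agent terminus-2",
    "/codex-attempts": "/tb run --agent codex",
    "/claude-attempts": "/tb run --agent claude-code",
    "/tb-check": "/tb tasks check",
    "/oracle": "/tb run --agent oracle",
    "/nop": "/tb run --agent nop",
    "/tb-debug": "/tb tasks debug",
}


def expand_alias(comment: str) -> str:
    """Hypothetical: rewrite '/bot <alias> [args...]' as '/bot <expansion> [args...]'.

    Comments that are not '/bot' commands, or that use no known alias,
    are returned unchanged. Extra arguments are passed through as-is (an assumption).
    """
    tokens = shlex.split(comment)
    if len(tokens) < 2 or tokens[0] != "/bot":
        return comment
    expansion = ALIASES.get(tokens[1])
    if expansion is None:
        return comment
    parts = ["/bot", expansion]
    if tokens[2:]:
        # Re-quote the remaining arguments so values with spaces survive.
        parts.append(shlex.join(tokens[2:]))
    return " ".join(parts)


print(expand_alias("/bot /oracle --task-id hello-world"))
```

Non-alias commands such as `/bot job list` pass through untouched, so a sketch like this could run on every comment before the main command parser.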

Get help: /help or /bot help
