Add eval watcher scoreboard artifacts #1708

Open
mimeding wants to merge 3 commits into osaurus-ai:main from mimeding:codex/eval-watcher-scoreboard

Conversation

@mimeding mimeding commented Jun 25, 2026

Contributor

Summary

  • Adds watcher-oriented eval scoreboard artifacts for stored local/frontier report bundles.
  • Adds a script and Makefile target for generating a watcher report, refreshing the latest scoreboard, and failing closed on invalid bundles.
  • Gates on the latest release-candidate regressions and run failures while preserving historical comparison totals.
  • Consumes unified evidence registry snapshots so watcher scoreboards share the same report source of truth as PR eval evidence.
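
To illustrate the two gating ideas in the bullets above (failing closed on invalid bundles, and gating only on the latest entry while keeping historical totals), here is a minimal shell sketch. It is not the contents of scripts/evals/eval-watcher-report.sh: the bundle layout (a directory holding a non-empty report.json), the input format of "<regressions> <run_failures>" per line, and the function names bundle_is_valid, check_bundles, and gate_on_latest are all assumptions made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only; names, file layout, and input format are assumptions.

# Return 0 only if the bundle directory exists and holds a non-empty report.json.
bundle_is_valid() {
  local bundle="$1"
  [ -d "$bundle" ] && [ -s "$bundle/report.json" ]
}

# Fail closed: any invalid bundle fails the whole check, and an empty
# bundle list is treated as a failure rather than a vacuous pass.
check_bundles() {
  if [ "$#" -eq 0 ]; then
    echo "no bundles supplied" >&2
    return 1
  fi
  local bundle status=0
  for bundle in "$@"; do
    if ! bundle_is_valid "$bundle"; then
      echo "invalid bundle: $bundle" >&2
      status=1
    fi
  done
  return "$status"
}

# Read lines of "<regressions> <run_failures>", oldest first (assumed format).
# Every line is summed into the printed historical totals, but the exit
# status depends only on the last (latest) line. No input also fails.
gate_on_latest() {
  awk '
    { total_reg += $1; total_fail += $2; last_reg = $1; last_fail = $2 }
    END {
      printf "historical regressions=%d run_failures=%d\n", total_reg, total_fail
      exit ((NR == 0 || last_reg > 0 || last_fail > 0) ? 1 : 0)
    }'
}
```

In this sketch, an older entry with regressions still shows up in the printed totals but does not fail the gate as long as the latest entry is clean; whether the actual scoreboard command behaves this way in every detail is not confirmed by this description.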

Rebase update

Validation

  • swift test --package-path Packages/OsaurusEvals --filter EvalReviewReportTests — 6 tests passed.
  • swift test --package-path Packages/OsaurusEvals --filter EvalScoreboardTests — 6 tests passed.
  • git diff --check
  • bash -n scripts/evals/eval-watcher-report.sh
  • swift run --package-path Packages/OsaurusEvals osaurus-evals scoreboard --help

Risk

  • This adds report generation/scoreboard artifacts only; it does not change agent-loop behavior, prompts, model routing, or runtime defaults.

@mimeding mimeding force-pushed the codex/eval-watcher-scoreboard branch 2 times, most recently from b18cd96 to d1201c1 on June 25, 2026 15:48
@mimeding mimeding marked this pull request as ready for review June 25, 2026 16:15
@mimeding

Contributor Author

Moving this back to draft because the branch is now dirty against current main even though the required checks were green. I will refresh the branch before asking for review again so the ready queue only contains merge-clean PRs.

@mimeding mimeding marked this pull request as draft June 26, 2026 21:56
@mimeding mimeding force-pushed the codex/eval-watcher-scoreboard branch from d1201c1 to aa49e4b on June 30, 2026 03:27
@mimeding mimeding marked this pull request as ready for review June 30, 2026 04:06
@mimeding mimeding force-pushed the codex/eval-watcher-scoreboard branch from aa49e4b to 5639a3f on July 2, 2026 06:26
@mimeding mimeding force-pushed the codex/eval-watcher-scoreboard branch from 5639a3f to a6be40b on July 2, 2026 06:27