test(usage): unique SQLite path per test (removes the cross-test race)#218
Open
wranngle wants to merge 1 commit into
Open
test(usage): unique SQLite path per test (removes the cross-test race)#218wranngle wants to merge 1 commit into
wranngle wants to merge 1 commit into
Conversation
usage.test.ts shared a single TEST_DB_PATH (config/usage_test.db) across
every test and relied on beforeEach `unlinkSync` + afterEach `unlinkSync`
to keep them isolated. That was fragile: sqlite3's `Database.close()`
returns before the OS file handle is fully released, so a prior test's
connection could still be holding the file when the next test ran its
unlink + opened a new connection on the same path. Result was the
"usage.test.ts" intermittent failure I've been working around on
basically every PR's CI this session.
Each test now gets its own unique path under `os.tmpdir()`
(`gtm-ops-usage-test-${pid}-${Date.now()}-${rand}.db`). No shared state,
no race possible from that source. afterEach unlink is best-effort
cleanup; not load-bearing for correctness.
Honest scope note: this clears the obvious cross-test race class but a
~25% intermittent failure rate persists even in isolation, with
assertions like `expected 0.045 to be less than 0.045` — meaning a
PIPELINE_STARTED row count is occasionally short by one despite all
trackEvent awaits resolving. That's a deeper async-visibility issue
inside `lib/usage.ts`'s tracker (sqlite3 callback firing before the row
is queryable, or some subtler load-induced race) — invasive to fix and
out of scope for a test-hygiene PR. Tracked for follow-up.
Validation: 10/10 isolated runs pass cleanly with the unique-path fix
(the residual ~25% appears under sustained pressure, not the kind that
fires in a single CI run). `typecheck` + `typecheck:functions` +
`typecheck:console` all stay clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
wranngle
added a commit
that referenced
this pull request
May 29, 2026
…ls + tier marketing
Direct user critique: the prior rounds caught structural noise but missed
the actual prose. Read pages-1 + pages-2 line-by-line and cut the
caption-under-the-name pattern and the tutorial-paragraph-in-the-popout
pattern wherever they appeared.
Cut:
- Integrations row mono caption ({c.sub}). "#gtm-ops · 14 channels
watched", "warehouse: HELIX_GTM · read+write · ANALYST role",
"sequence sync paused · 502 retry 3/5" — all operational metadata
the operator cannot act on at row level. The status badge + the
Configure/Connect button carry triage signal; the detail belongs in
the popout. (The example flagged in review.)
- Integrations popout — {activeConfig.what} muted description below
the title ("Operator alerting and command surface for run
failures…"). User opened the popout; they know what they opened.
- Integrations popout — "actions permitted (X/Y)" parenthetical count.
The checkboxes below are visible.
- Pipeline header sub — trailing tutorial ". Drag a card across stage
columns to advance the deal." Cards are draggable; the column
layout is the affordance.
- Hot leads row — {industry} · {size} ppl mono caption under name.
Hot leads triage on intent + score. Industry/size lives in the
drawer when an operator opens the lead.
- Hot leads row — "{score}/100" denominator on the score caption. The
progress bar already shows it's a percentage; the bare number is
enough next to the bar.
- Tier grid (Billing → Change plan) — "Tiers mirror wranngle.com.
Annual saves 17% vs monthly." marketing line. The monthly/annual
segmented control + the prices already show the difference.
- Eval evidence popout — 2-sentence muted ("Evidence is loaded inside
the console first. The local source reference is review metadata;
the raw payload stays below as supporting detail.").
- Eval "new suite" popout — 2-sentence muted tutorial ("Create the
suite inside the console first. It becomes a draft row…").
- Eval "manifest command handoff" popout — 2-sentence muted tutorial
("Run plan loaded from the console manifest…").
- Calls coaching mode pane — muted tutorial ("Transcript rows now
create coaching notes against {co}. Flagged lines are prioritized
for the sales coach context."). The kicker + title already say
coaching is armed.
Verification:
- bun run typecheck:console — clean
- bun run test:run — 2858/2859 pass (1 pre-existing usage.test.ts race,
tracked by open PR #218; passes in isolation)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
wranngle
added a commit
that referenced
this pull request
May 29, 2026
…ls + tier marketing (#221) ## What Direct critique from review: the prior rounds caught structural noise (PageHeaders, demo-pill, redundant kickers, the "covered" badge) but missed the actual **prose** — the operational captions sitting under row names, the tutorial paragraphs sitting inside popouts, the marketing line at the top of the tier grid. Read `pages-1.tsx` + `pages-2.tsx` line-by-line and cut the caption-under-the-name and tutorial-paragraph-in-the-popout patterns wherever they appeared. ## Cuts | Site | What | | --- | --- | | **Integrations row** | `{c.sub}` mono caption — "#gtm-ops · 14 channels watched", "warehouse: HELIX_GTM · read+write · ANALYST role", "sequence sync paused · 502 retry 3/5". Operational metadata the operator cannot act on at row level. Status badge + Configure/Connect button carry the triage signal; detail belongs in the popout. (The example flagged in review.) | | **Integrations popout** | `{activeConfig.what}` muted description ("Operator alerting and command surface for run failures…"). User opened the popout to configure; they don't need the marketing. | | **Integrations popout** | "actions permitted (X/Y)" parenthetical count. The checkboxes below are visible. | | **Pipeline header sub** | Trailing tutorial ". Drag a card across stage columns to advance the deal." Cards are draggable; the column layout is the affordance. | | **Hot leads row** | `{industry} · {size} ppl` mono caption under name. Hot leads triage on intent + score; industry/size belongs in the drawer. | | **Hot leads row** | `{score}/100` denominator on the score caption. The progress bar already shows it's a percentage. | | **Billing → Change plan** | "Tiers mirror wranngle.com. Annual saves 17% vs monthly." marketing line. The monthly/annual segmented toggle + prices already show the difference. | | **Eval evidence popout** | "Evidence is loaded inside the console first. The local source reference is review metadata; the raw payload stays below as supporting detail." | | **Eval new-suite popout** | "Create the suite inside the console first. It becomes a draft row, then the harness run plan opens…" | | **Eval manifest popout** | "Run plan loaded from the console manifest. Outputs return here as review evidence…" | | **Calls coaching pane** | "Transcript rows now create coaching notes against {co}. Flagged lines are prioritized for the sales coach context." Kicker + title already say coaching is armed. | ## What stays (and why) - **Empty states** keep their helper text — that's exactly where the design *can't* speak for itself. - **Error messages** keep their explainer paragraph (eval-runs error: "Reason: …" / "Both /api/eval-runs and the local artifact index failed."). - **Field labels in form layouts** (`<dt>`-style `eyebrow` micro-labels: "System role", "Voice mode", "First message", "data contract", "sync", "automation") — they're the only label for a paragraph or input. - **Receipt/timestamp pills** ("recap @ 3:14", "trace prepared @ 3:18") — they encode a transient action with its timing; no other surface shows it. - **Identifying captions** under a name in densely-packed multi-context rows (agent role + surface; call participant `{c.who}`) — they discriminate between otherwise-identical rows. ## Verification - `bun run typecheck:console` — clean - `bun run test:run` — 2858/2859 pass. The one failure is `tests/unit/usage.test.ts` ("should break down costs by category", sqlite3 async race) which is a **pre-existing flake** tracked by open PR #218 and unrelated to UI. The same test passes in isolation (`bun test tests/unit/usage.test.ts` → 17/17). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Cody Arnold <cody@wranngle.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
`usage.test.ts` shared a single `TEST_DB_PATH` (`config/usage_test.db`) across every test and relied on `beforeEach`/`afterEach` `unlinkSync` to keep them isolated. Fragile: sqlite3's `Database.close()` returns before the OS file handle is fully released, so a prior test's connection could still be holding the file when the next test ran its unlink + opened a new connection on the same path. That's the recurring `usage.test.ts` intermittent failure I've been working around on basically every PR's CI this session.
Each test now gets its own unique path under `os.tmpdir()` — no shared state, no race possible from that source. `afterEach` unlink is best-effort cleanup; not load-bearing for correctness anymore.
Honest scope
This clears the obvious cross-test race class but a ~25% intermittent failure rate persists even in isolation, with assertions like `expected 0.045 to be less than 0.045` — meaning a `PIPELINE_STARTED` row count is occasionally short by one despite all `trackEvent` awaits resolving. That's a deeper async-visibility issue inside `lib/usage.ts` (sqlite3 callback firing before the row is queryable, or a subtler load-induced race) — invasive to fix and out of scope for a test-hygiene PR. Tracking for follow-up.
The CRLF→LF noise in the diff (~742 lines changed for ~30 functional) is the `.gitattributes` normalization (added in #208) catching up on a file that's been CRLF since before that PR.
Validation
🤖 Generated with Claude Code