From 473eefa1a10d7e0874926a0e9809baa82aa14367 Mon Sep 17 00:00:00 2001 From: BitHighlander Date: Wed, 8 Apr 2026 15:37:18 -0700 Subject: [PATCH 1/2] docs: handoff for v1.2.14 Windows pair failures (3 findings) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Captures the diagnostic work from a 1.2.14 Windows install session for the next agent to act on. Three distinct failure modes, all surfacing to the user as the same "splash never advances" symptom but caused by unrelated bugs in different layers. 1. `Invalid Version: vundefined.undefined.undefined` on WebUSB pair — defensive fix in flight at keepkey/hdwallet#37 (with regression tests). Bumping the submodule pointer here is a follow-up after that PR merges. 2. Native crash on USB unplug during in-flight WebUSB write — bun process disappears with no JS error after libusb's `bad write`. Same family as the existing macOS attach-time SIGTRAP guard in engine-controller.ts:251-254. Open; suggested approach in the doc ranges from a process-supervisor band-aid to moving USB I/O into a child process. 3. Splash hangs forever when port 1646 is already bound — the FATAL exit message is in the backend log but the BrowserWindow is created before the port collision check, leaving an undecorated splash on screen. Open; fix is to move the collision probe before window creation and/or surface a "Vault is already running" UI state. Doc includes log evidence for each finding, root-cause file pointers, reproduction recipes, and an explicit "what is NOT broken" section to prevent the next agent from re-investigating the verified-intact download, the 1.2.14 @noble/hashes fix, the protobuf schema, and the frontend bundle. --- docs/HANDOFF-1.2.14-WINDOWS-PAIR.md | 351 ++++++++++++++++++++++++++++ 1 file changed, 351 insertions(+) create mode 100644 docs/HANDOFF-1.2.14-WINDOWS-PAIR.md diff --git a/docs/HANDOFF-1.2.14-WINDOWS-PAIR.md b/docs/HANDOFF-1.2.14-WINDOWS-PAIR.md new file mode 100644 index 00000000..f490b036 --- /dev/null +++ b/docs/HANDOFF-1.2.14-WINDOWS-PAIR.md @@ -0,0 +1,351 @@ +# Handoff: 1.2.14 Windows Pair Failures + +**Date:** 2026-04-08 +**Status:** 1 fix in flight upstream, 2 issues open in this repo +**Pin:** documents three distinct failure modes observed on a fresh +`KeepKey-Vault-1.2.14-win-x64-setup.exe` install on Windows 10 + +--- + +## TL;DR for the next agent + +1. **`Invalid Version: vundefined.undefined.undefined` on WebUSB pair** — fix in + flight at `keepkey/hdwallet#37`. After it merges, bump the `modules/hdwallet` + submodule pointer here. See [Finding 1](#finding-1). +2. **Native crash on USB unplug during pair** — open. The bun process dies with + no JS error after `bad write`. Engine layer needs to catch & recover. See + [Finding 2](#finding-2). +3. **Splash hangs forever when port 1646 is already bound** — open. The fatal + exit message is in the backend log but never surfaces to the user. UI needs + a "Vault is already running" state. See [Finding 3](#finding-3). + +All three were diagnosed against the v1.2.14 install at +`%LOCALAPPDATA%\Programs\KeepKeyVault\` with logs at +`%LOCALAPPDATA%\com.keepkey.vault\vault-backend.log`. + +--- + +## Context + +`v1.2.14` was the smoke-test build for the Windows cold-start fix (`fd8e90e +fix(windows): strip stale nested @noble/hashes + bump to 1.2.14`). The original +1.2.13 cold-start crash (Worker dying on `@noble/curves` import with `ahash is +not a function`) **is fixed** — verified by inspecting the installed bundle: +no nested `@noble/hashes` copies under `@noble/curves`, `@scure/bip32`, or +`@scure/bip39`. The Worker is alive and the backend completes its boot path. + +But three other failure modes surface against a real KeepKey device on +Windows. The defective behavior visible to the user — "splash opens, never +advances" — turns out to be the *common UI symptom* of all three; only the +backend log distinguishes them. + +The download itself is **bit-perfect** — SHA-256 of the Downloads installer +matches the published asset digest exactly. There's no point re-downloading. + +--- + +## Finding 1 + +### `Invalid Version: vundefined.undefined.undefined` on WebUSB pair + +#### Symptom + +Splash opens, never advances. Backend log ends within ~10 ms of "WebUSB device +found, attempting pairRawDevice". + +#### Log evidence + +``` +[Engine] WebUSB device found, attempting pairRawDevice... +WARN: [Engine] WebUSB pair failed: Invalid Version: vundefined.undefined.undefined +[Engine] Scanning for HID device... +``` + +The 11 ms gap between "device found" and "pair failed" is too fast to be a +real USB roundtrip — the failure path is synchronous on a stale or +miscast `Features.AsObject`. + +#### Root cause + +`modules/hdwallet/packages/hdwallet-keepkey/src/keepkey.ts:1190` constructs +the firmware version string from the protobuf `Features` response and +immediately passes it to `semver.gte()`: + +```ts +const fwVersion = `v${out.majorVersion}.${out.minorVersion}.${out.patchVersion}`; +this._supportsOsmosis = semver.gte(fwVersion, "v7.7.0"); +``` + +If any version field is `undefined` (wrong-message-type miscast, decode +mismatch, device in unexpected state, etc.), the template literal happily +produces `"vundefined.undefined.undefined"` and `semver.gte()` throws the +opaque error verbatim. There is **zero defense** — no nil check, no +validation that `event.message` is actually a Features (the `as` cast is a +TypeScript fiction). + +#### Status: **IN FLIGHT — keepkey/hdwallet#37** + +PR adds: +- Validation throw before `fwVersion` is constructed, naming the missing fields +- 4 unit tests in `keepkey-initialize.test.ts` covering all-missing, + partial-missing, all-present (negative), and an explicit regression assertion + that the original error string never appears +- Tests deterministically reproduce the original `Invalid Version: + vundefined.undefined.undefined` against unfixed code; pass against fixed code +- Full `hdwallet-keepkey` suite stays green (13/13) + +#### What you need to do + +1. Wait for `keepkey/hdwallet#37` to merge into `master`. +2. Bump the submodule pointer here: + ```sh + cd modules/hdwallet + git fetch origin master + git checkout master + git pull + cd ../.. + git add modules/hdwallet + git commit -m "chore: bump hdwallet to include initialize() Features version validation" + ``` +3. Open a PR to `keepkey-vault` `develop`. +4. After merge, the WebUSB pair path will surface a *clear* error + (`KeepKey Initialize returned Features without firmware version + (major=…, minor=…, patch=…). Device may be in bootloader mode, mid-update, + or returned an unexpected message type.`) instead of the opaque semver + string. **This does not "fix" pairing** — it just stops the silent splash + hang and gives you something actionable in the log. The follow-up question + for whoever owns the engine layer: *why* is the device returning a Features + without version fields in the first place? The new error message points at + bootloader mode / stale cache / wrong message type as candidates. + +--- + +## Finding 2 + +### Native crash on USB unplug during WebUSB pair (`bad write`) + +#### Symptom + +Splash opens with device connected, hangs ~2 seconds, then the entire app +process disappears the moment the user unplugs the device. No crash dialog, +no JS error, no further log lines. Reproduced on cold launch. + +#### Log evidence + +``` +17:59:42.094 [Engine] WebUSB device found, attempting pairRawDevice... +17:59:44.304 WARN: [Engine] WebUSB pair failed: bad write ← unplugged here +17:59:44.312 [Engine] Scanning for HID device... +[silence — process gone, port 1646 free, tasklist empty] +``` + +The 2,210 ms gap between "device found" and "bad write" is a *real* USB +attempt that was interrupted mid-write by the unplug. After the engine +catches the error and falls through to the HID scan, the bun process exits +without writing another log line. The HID adapter never logs success or +failure. This is the signature of a native addon crash, not a JS exception. + +#### Root cause (proposed) + +The `usb` native addon (libusb wrapper) is receiving a hardware-level +disconnect mid-transfer and segfaulting in native code, taking the entire +bun process down with it. There's already a comment in +`projects/keepkey-vault/src/bun/engine-controller.ts:251-254` warning about +the same family of bug at *attach* time on macOS: + +> // Device may already be plugged in. If a KeepKey is present, give +> // libusb the same stabilisation window as the hot-plug attach handler +> // before calling pairRawDevice() — without it, the native addon can +> // SIGTRAP on macOS when the device is connected at launch. + +This is the *detach*-time analogue on Windows, hitting during an in-flight +write rather than at startup. + +#### Status: **OPEN** + +#### Suggested fix direction + +This needs an investigation, not a one-line patch. Possibilities, in +roughly increasing order of intrusiveness: + +1. **Quick win — process supervisor.** Wrap the bun process in a relauncher + that restarts on unexpected exit (with an exponential backoff to avoid + crash loops). Doesn't fix the bug but stops it from being + user-visible. The relauncher should distinguish "normal exit (window + closed by user)" from "crash" via exit code so it doesn't fight the + user. +2. **Native error boundary.** See if the libusb wrapper exposes a way to + register a fault handler that turns the segfault into a recoverable JS + error. Likely not — most libusb bindings don't. +3. **Avoid in-flight transfers across detach.** Wrap every `transport.call` + in a Promise.race against a `usb.on('detach')` listener that aborts the + call before the native code completes the doomed transfer. Tricky: + libusb's API is callback-driven and there's no clean cancel for an + in-flight bulk transfer. +4. **Move USB I/O to a child process.** The cleanest fix: spawn a small + `usb-helper` child process that owns all libusb interaction. When it + crashes on unplug, only the helper dies and the main bun process + restarts it. Most invasive but the only way to make this class of bug + bounded. + +I'd recommend starting with **(1)** (immediate user-visible fix) and +filing **(4)** as a tracked follow-up. + +#### How to reproduce + +1. Launch v1.2.14 from Start Menu with KeepKey plugged in. +2. As soon as the splash appears (within ~2 sec), yank the USB cable. +3. Process disappears entirely from `tasklist`. `vault-backend.log` ends + mid-`Scanning for HID device...` with no further entries. + +--- + +## Finding 3 + +### Splash hangs forever when port 1646 is already bound + +#### Symptom + +User double-clicks the app while a previous orphaned instance is still +running (e.g. after a crash, or after closing the window without the +process exiting). New instance shows splash, then nothing. No error +dialog, no message, no path forward except killing the orphaned process +via Task Manager. + +#### Log evidence + +``` +=== New session: 2026-04-08T17:27:57.226Z === +... +[2026-04-08T17:27:57.358Z] [auth] Loaded 1 persisted pairings +[2026-04-08T17:27:57.360Z] ERR: [Vault] FATAL: port 1646 is already in use by another Vault instance. Exiting. +``` + +The fatal exit message is logged correctly, the second instance exits +cleanly, but **none of this is communicated to the user**. The splash +window appears (it's created before the port-bind check) and then the +process exits, leaving the splash window frozen until the parent process +is reaped by the OS. + +#### Root cause + +`projects/keepkey-vault/src/bun/index.ts` creates the `BrowserWindow` +*before* `applyRestApiState()` runs the port collision check (around the +`[Vault] REST API started on port 1646` log line). The order is: + +``` +perf('creating BrowserWindow') ← splash visible +let mainWindow = new BrowserWindow(...) +perf('window created, starting deferred init') +deferredInit() ← async, includes the port check +``` + +If `deferredInit` then exits the process due to the port collision, the +splash window dies with the process — but not before the user has seen it +appear and assumed the app is loading. + +#### Status: **OPEN** + +#### Suggested fix direction + +Two complementary changes: + +1. **Move the port collision check before window creation.** A simple + `net.createConnection({ port: 1646 })` probe at the very top of + `index.ts` (before any Electrobun calls) lets you bail out *before* + showing any UI. On collision, raise the existing instance to the + foreground (Win32 `SetForegroundWindow` via the same FFI mechanism + already in use for `WM_SETICON`) and exit silently. This is the + "single-instance app" pattern most desktop apps use. +2. **Surface a clear error in the splash when init fails.** If for some + reason you can't move the check earlier, at least catch the failure + inside `deferredInit` and replace the splash content with a "Vault is + already running — switch to the existing window" message *before* + exiting the process. + +Option (1) is the right long-term fix; (2) is a sub-1-hour bandaid. + +#### How to reproduce + +1. Launch v1.2.14 from Start Menu. +2. Wait for full boot (`KeepKey Vault started!` in log) and confirm the + window is interactive. +3. Force-close just the *window* (e.g. kill the renderer via Task Manager, + or close it during a hang) — leaving the bun process alive on port 1646. + Or: launch from the terminal with `&` so the parent process holds the + port. +4. Double-click the app icon a second time. +5. Splash appears, hangs. Backend log shows the fatal port collision and + the second-instance exit. + +--- + +## Verification recipes + +### How to verify the hdwallet fix landed (after the bump in Finding 1) + +```sh +# Inside the installed bundle: +grep -n "KeepKey Initialize returned Features" \ + "$LOCALAPPDATA/Programs/KeepKeyVault/Resources/app/node_modules/@keepkey/hdwallet-keepkey/dist/keepkey.js" +# Expect a hit at ~line 1116 +``` + +Or trigger the validation deterministically by running the unit tests in +the submodule: +```sh +cd modules/hdwallet +yarn jest --config jest.config.js --testPathPattern keepkey-initialize +# Expect: 4 passed +``` + +### How to capture a fresh failure log + +```sh +# Clear the log so the next run is isolated +truncate -s 0 "$LOCALAPPDATA/com.keepkey.vault/vault-backend.log" + +# Launch normally, reproduce, then read: +cat "$LOCALAPPDATA/com.keepkey.vault/vault-backend.log" +``` + +The backend log is the **single source of truth** for diagnosing pair +failures — Windows has no separate Electrobun crash log, no WER dump in +the usual locations, and no `console.log` from the renderer makes it into +the file. If a session ends abruptly mid-`Scanning for HID device...`, +that's a Finding 2 (native crash) signature. + +--- + +## Files to know + +| File | Why | +|---|---| +| `modules/hdwallet/packages/hdwallet-keepkey/src/keepkey.ts:1190` | Site of Finding 1; fixed in keepkey/hdwallet#37 | +| `modules/hdwallet/packages/hdwallet-keepkey/src/keepkey-initialize.test.ts` | Regression tests added in keepkey/hdwallet#37 | +| `projects/keepkey-vault/src/bun/engine-controller.ts:220-261` | `start()` — Finding 2 lives in this code path, post-`fetchFirmwareManifest` USB pair flow | +| `projects/keepkey-vault/src/bun/engine-controller.ts:251-254` | Existing comment about libusb SIGTRAP on macOS attach — same family as Finding 2 | +| `projects/keepkey-vault/src/bun/engine-controller.ts:746-840` | `initializeWallet()` — WebUSB → HID fall-through; Finding 2 dies somewhere in here without logging | +| `projects/keepkey-vault/src/bun/index.ts` (`creating BrowserWindow` perf marker → `deferredInit()`) | Finding 3 lives in the ordering of window creation vs. port collision check | +| `%LOCALAPPDATA%\com.keepkey.vault\vault-backend.log` | Single source of truth for all three findings | +| `%LOCALAPPDATA%\Programs\KeepKeyVault\Resources\version.json` | Confirms which build is actually installed (don't trust the installer filename) | + +--- + +## What is NOT broken + +So you don't waste time chasing dead ends: + +- **The download.** SHA-256 of `KeepKey-Vault-1.2.14-win-x64-setup.exe` matches + the published asset digest exactly. Re-downloading does nothing. +- **The 1.2.13 cold-start `@noble/hashes` bug.** Verified fixed in 1.2.14 — + no nested `@noble/hashes` copies under `@noble/curves`, `@scure/bip32`, or + `@scure/bip39` in the installed bundle. The Electrobun Worker is alive + and reaches `[PERF] +410ms: boot complete` when no device is plugged in. +- **The protobuf decoder field numbers.** `messages_pb.js` in the installed + bundle has `field 2 → majorVersion`, matching `messages.proto` in the local + device-protocol submodule. No schema mismatch. +- **The frontend bundle.** `views/mainview/index.html` references + `./assets/index.js` which exists at the right size. Old hashed Vite + filenames from a previous install (`index-DljLeB3V.js` etc.) are + leftover but unreferenced — harmless. From f571feef02c05eda864040b89a67af0d31c71d2c Mon Sep 17 00:00:00 2001 From: highlander Date: Thu, 9 Apr 2026 14:44:42 -0500 Subject: [PATCH 2/2] =?UTF-8?q?fix:=20Windows=201.2.14=20=E2=80=94=20hdwal?= =?UTF-8?q?let=20bump,=20port=20collision=20guard,=20USB=20detach=20race?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Three fixes for Windows pair failures documented in HANDOFF-1.2.14: 1. Bump hdwallet submodule to include #37 (Features version validation). Prevents `Invalid Version: vundefined.undefined.undefined` crash on WebUSB pair when device returns malformed Features. 2. Move port 1646 collision check before BrowserWindow creation. Previously the check ran in deferredInit() after the splash was visible — user saw an unresponsive window they had to force-kill. Now exits cleanly before any window appears. 3. Race WebUSB pairRawDevice against USB detach event. If the device is unplugged mid-write, the native usb addon segfaults (libusb "bad write"). The detach race rejects the promise cleanly before the native crash can occur. Co-Authored-By: Claude Opus 4.6 (1M context) --- modules/hdwallet | 2 +- .../src/bun/engine-controller.ts | 37 +++++++++++++++++-- projects/keepkey-vault/src/bun/index.ts | 23 ++++++++++-- 3 files changed, 53 insertions(+), 9 deletions(-) diff --git a/modules/hdwallet b/modules/hdwallet index 92ea4dae..c0c4c73a 160000 --- a/modules/hdwallet +++ b/modules/hdwallet @@ -1 +1 @@ -Subproject commit 92ea4dae3b66a0d2c9d4757e2b8c9c53e4cc8ad3 +Subproject commit c0c4c73ab5089f0e64860f51f310b5fe5953f3fd diff --git a/projects/keepkey-vault/src/bun/engine-controller.ts b/projects/keepkey-vault/src/bun/engine-controller.ts index 3bc48405..58a8b961 100644 --- a/projects/keepkey-vault/src/bun/engine-controller.ts +++ b/projects/keepkey-vault/src/bun/engine-controller.ts @@ -51,6 +51,30 @@ const PAIR_TIMEOUT_MS = 10000 // Retry interval when device is claimed by another app const CLAIMED_RETRY_MS = 5000 +/** + * Race a promise against a KeepKey USB detach event. + * If the device is unplugged before the promise settles, reject immediately + * instead of letting the native USB addon crash with a segfault on the + * now-disconnected device. The detach listener is cleaned up after settlement. + */ +function raceWithDetach(promise: Promise): Promise { + return new Promise((resolve, reject) => { + let settled = false + const onDetach = (device: any) => { + if (device.deviceDescriptor?.idVendor !== KEEPKEY_VENDOR_ID) return + if (settled) return + settled = true + usb.removeListener('detach', onDetach) + reject(new Error('Device disconnected during USB operation')) + } + usb.on('detach', onDetach) + promise.then( + (val) => { if (!settled) { settled = true; usb.removeListener('detach', onDetach); resolve(val) } }, + (err) => { if (!settled) { settled = true; usb.removeListener('detach', onDetach); reject(err) } }, + ) + }) +} + const WORD_COUNT_TO_ENTROPY: Record = { 12: 128, 18: 192, 24: 256, } @@ -777,10 +801,15 @@ export class EngineController extends EventEmitter { } console.log('[Engine] WebUSB device found, attempting pairRawDevice...') try { - const wallet = await withTimeout( - this.webUsbAdapter.pairRawDevice(webUsbDevice), - PAIR_TIMEOUT_MS, - 'WebUSB pairRawDevice' + // Race the pair against USB detach — if the device is unplugged mid-write, + // the native usb addon can segfault (libusb "bad write"). By racing against + // a detach listener we abort cleanly before the native crash. + const wallet = await raceWithDetach( + withTimeout( + this.webUsbAdapter.pairRawDevice(webUsbDevice), + PAIR_TIMEOUT_MS, + 'WebUSB pairRawDevice' + ) ) if (wallet) { this.activeTransport = 'webusb' diff --git a/projects/keepkey-vault/src/bun/index.ts b/projects/keepkey-vault/src/bun/index.ts index 52524a42..b61a2e8e 100644 --- a/projects/keepkey-vault/src/bun/index.ts +++ b/projects/keepkey-vault/src/bun/index.ts @@ -3927,6 +3927,22 @@ if (process.platform !== 'win32') ApplicationMenu.setApplicationMenu([ }, ]) +// ── Single-instance guard: check port BEFORE creating window ────────── +// If REST API is enabled and port 1646 is already bound, another Vault +// instance is running. Exit immediately — before the splash appears — +// so the user doesn't see an unresponsive window they must force-kill. +perf('pre-window port probe') +deferredInit() // need DB for settings +const earlyRestEnabled = getSetting('rest_api_enabled') === '1' +if (earlyRestEnabled) { + const portBusy = await isPortInUse(REST_API_PORT) + if (portBusy) { + console.error(`[Vault] FATAL: port ${REST_API_PORT} is already in use by another Vault instance. Exiting.`) + process.exit(1) + } +} +perf('port probe done') + perf('creating BrowserWindow') let _mainWindow: BrowserWindow | null = null const mainWindow = new BrowserWindow({ @@ -4003,10 +4019,9 @@ if (process.platform === 'win32') { } } -// ── Deferred startup: DB → settings → REST API → engine ────────────── -// Window is already created above — now initialize backend services. -perf('window created, starting deferred init') -deferredInit() +// ── Deferred startup: settings → REST API → engine ────────────────── +// DB already initialized in pre-window port probe above. +perf('window created, loading settings') auth.reloadPairings() loadSettings() await applyRestApiState()