
fix(api): add code and param fields to OpenAI error responses #144

Merged
ethenotethan merged 1 commit into Layr-Labs:swift-provider from hankbobtheresearchoor:fix/error-response-code-param on May 14, 2026

Conversation

@hankbobtheresearchoor
Contributor

Summary

Adds code and param fields to OpenAI-compatible error responses, fixing SDK error handling that currently breaks without them.

Closes #142

Problem

errorResponse() only populates type and message. The OpenAI error spec requires code and optionally param. Their absence breaks:

  • Python SDK: e.code returns None — can't program against error types
  • Node SDK: error.code is undefined — retry logic fails
  • Sentry/Datadog: group all errors into one bucket (no code to distinguish them)

Changes

Core: errorResponse now supports optional code and param

// Before
errorResponse("invalid_request_error", "model is required")
// → {"error": {"type": "invalid_request_error", "message": "model is required"}}

// After
errorResponse("invalid_request_error", "model is required", withParam("model"))
// → {"error": {"type": "invalid_request_error", "message": "model is required", "code": "invalid_request_error", "param": "model"}}
  • code defaults to errType — all 202 existing call sites are backward-compatible
  • withParam() / withCode() helpers for overrides (a sketch of the option plumbing follows)
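
For reference, a minimal sketch of that option plumbing (the JSON tags follow the response shape shown above; the field layout and helper bodies are illustrative, the actual internal/api definitions are authoritative):

// Sketch only: illustrates the variadic-option pattern described above.
type errorDetail struct {
    Type    string  `json:"type"`
    Message string  `json:"message"`
    Code    string  `json:"code"`
    Param   *string `json:"param,omitempty"`
}

type errorDetailOpt func(*errorDetail)

func withCode(code string) errorDetailOpt {
    return func(d *errorDetail) { d.Code = code }
}

func withParam(param string) errorDetailOpt {
    return func(d *errorDetail) { d.Param = &param }
}

func errorResponse(errType, message string, opts ...errorDetailOpt) map[string]errorDetail {
    d := errorDetail{Type: errType, Message: message, Code: errType} // code defaults to errType
    for _, opt := range opts {
        opt(&d)
    }
    return map[string]errorDetail{"error": d}
}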

Call-site updates for OpenAI-canonical codes

Error type                                    code                  param     Why
model_not_found                               model_not_found       "model"   SDKs check e.code === 'model_not_found' and e.param === 'model'
invalid_request_error ("model is required")   (default)             "model"   Param identifies the missing field
insufficient_funds                            insufficient_quota    -         OpenAI canonical code for 402
rate_limit_exceeded                           rate_limit_exceeded   -         Explicit for clarity (matches type)
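
Assuming the option helpers sketched above, the call-site updates in the table look roughly like this (illustrative messages, not the exact repo call sites):

// model not found: canonical code plus the offending parameter
errorResponse("model_not_found", "model 'x' does not exist", withParam("model"))

// missing field: code falls back to the type, param names the field
errorResponse("invalid_request_error", "model is required", withParam("model"))

// 402: override to the OpenAI-canonical code
errorResponse("insufficient_funds", "insufficient balance", withCode("insufficient_quota"))

// explicit code for clarity (same value the default would produce)
errorResponse("rate_limit_exceeded", "rate limit exceeded", withCode("rate_limit_exceeded"))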

Backward compatibility

  • Variadic opts ...errorDetailOpt — all existing calls compile unchanged
  • code defaults to errType — any SDK already pattern-matching on the type string gets the same value in code

Validation

go test ./internal/api/ -run "TestErrorResponse|TestEdge_ErrorResponseFormat" -v
# 8/8 PASS

go test ./internal/api/ -count=1
# ok  (72.7s)

New tests:

  • TestErrorResponse_CodeField — code defaults to errType
  • TestErrorResponse_WithCode — withCode overrides
  • TestErrorResponse_WithParam — withParam sets param
  • TestErrorResponse_WithCodeAndParam — both together
  • TestErrorResponse_JSONSerialization — full JSON round-trip
  • TestErrorResponse_CodeDefaultsToType — backward compat
  • TestErrorResponse_InsufficientFundsUsesCanonicalCode — canonical code
  • Updated TestEdge_ErrorResponseFormat — now asserts code and param

@vercel

vercel Bot commented May 9, 2026

@hankbobtheresearchoor is attempting to deploy a commit to the EigenLabs Team on Vercel.

A member of the Team first needs to authorize it.

@hankbobtheresearchoor hankbobtheresearchoor changed the base branch from master to swift-provider May 13, 2026 18:06
The errorResponse function only populated type and message, missing
code and param required by the OpenAI API spec. Without code, SDKs
cannot programmatically distinguish error types (e.g. Python SDK
e.code returns None, retry logic breaks, Sentry groups all errors
as one).

Changes:
- errorResponse now accepts optional errorDetailOpt variadic args
- code defaults to errType for backward compatibility
- withParam() and withCode() helpers for call-site overrides
- model-not-found errors include param="model"
- model-is-required errors include param="model"
- insufficient_funds uses OpenAI-canonical code "insufficient_quota"
- rate_limit_exceeded gets explicit withCode for clarity

All 202 existing call sites are backward-compatible: the variadic
signature means they compile unchanged, and the default code=errType
matches the implicit behavior SDKs already assumed.

Closes Layr-Labs#142
@hankbobtheresearchoor hankbobtheresearchoor force-pushed the fix/error-response-code-param branch from 94ab31c to dc009f1 on May 13, 2026 18:17
@ethenotethan ethenotethan requested a review from Gajesh2007 May 13, 2026 18:28
@ethenotethan ethenotethan merged commit 12ac05b into Layr-Labs:swift-provider May 14, 2026
1 of 5 checks passed
Gajesh2007 added a commit that referenced this pull request May 15, 2026
* Clarify provider trust diagnostics

* Add Swift provider runtime

* Remove unused e2e vector generator

* Continuous batching, GPU-only enforcement, rename to darkbloom, Layr-Labs forks

This is the v0.5.0 cutover commit on the Swift provider PR. It lands
true continuous batching as the production inference path, threads
per-row sampling through the request, hard-fails on CPU-only hosts,
renames the user-visible CLI surface from "eigeninference" to
"darkbloom" with backward compatibility, and re-homes the mlx-swift /
mlx-swift-lm submodules to Layr-Labs forks.

Continuous batching (default, no parallel implementations)
----------------------------------------------------------
Replaces the per-request BatchScheduler with one shared BatchGenerator
ported from upstream `mlx_lm.generate`. All concurrent requests are
merged into one batched forward pass per step. Bit-identical against
single-stream greedy on:
  - Qwen3 0.6B-8bit (dense), B=2 / B=4-ragged
  - Qwen3.5 0.8B-MLX-4bit (hybrid SSM + attention), B=2
  - Gemma 4 26B-A4B-it-8bit (MoE, 26 GB), B=2

The mlx-swift-lm side of this work is at
Layr-Labs/mlx-swift-lm@darkbloom-continuous-batching:
  - BatchKVCache + BatchedCache protocol
  - SequenceStateMachine, PromptProcessingBatch, GenerationBatch,
    BatchGenerator
  - RowSamplers (temperature / top-P / top-K / seed)
  - Gemma 4 MoE support + K=V branch fix in Gemma4Attention

Production scheduler in provider-swift/Sources/ProviderCore/Inference/
BatchScheduler.swift wraps the engine in an actor; detached worker
calls into the actor only for short critical sections so cancel/submit
never queue behind a long-running step. submit() builds a per-row
sampler from request.{temperature, top_p, top_k, seed}.

Validation also covers eviction-and-admission: row 0 finishes mid-batch,
row C is admitted into its slot, row C's tokens match a solo run, row B
(running through the eviction) also matches its solo run. This locks in
BatchKVCache.filterBatched + extendBatched correctness end-to-end.

Sampler unit tests cover greedy passthrough, top-K=1 determinism,
top-K masking, top-P collapse-to-dominant, top-P=1 identity, seeded
reproducibility, and different-seed divergence.

GPU-only enforcement
--------------------
ProviderCore/Inference/GPUEnforcement.swift:
  - probeMetal(): non-throwing Metal device probe
  - requireMetal(): throws on missing GPU; pins Device.setDefault(.gpu);
    idempotent

Wired into BatchScheduler.loadModel, StartCommand, BenchmarkCommand,
and `darkbloom doctor`. Doctor surfaces a `[PASS] metal gpu: <name>,
<N> GB working set` line; `[FAIL]` on Intel/Linux. CPU fallback for
inference is rejected up-front with a descriptive error.

Rename: eigeninference → darkbloom (Swift CLI surface)
------------------------------------------------------
Canonical names:
  - eigeninference-enclave  → darkbloom-enclave (binary + struct)
  - Sources/eigeninference-enclave-cli/ → Sources/darkbloom-enclave-cli/
  - SwiftPM target EigenInferenceEnclaveCLI → DarkbloomEnclaveCLI
  - eigeninference-bundle-macos-arm64.tar.gz →
    darkbloom-bundle-macos-arm64.tar.gz
  - ~/.config/eigeninference/ → ~/.config/darkbloom/ (preferred path)
  - Mobileconfig prefix: EigenInference-Enroll-* → Darkbloom-Enroll-*

Backward compatibility:
  - install.sh creates an `eigeninference-enclave` symlink to
    `darkbloom-enclave` so existing install scripts keep resolving.
  - Config loader still reads ~/.config/eigeninference/ and the App
    Support legacy paths as fallbacks; new writes always go to
    ~/.config/darkbloom/.
  - LocalDataCleanup.purge() removes both directories.
  - release-swift.yml publishes the latest tarball under both
    canonical and legacy filenames.
  - NodeKeyPair.legacyDirNames and SecurityHardening MDM-profile-name
    matchers still accept the old name.
  - Coordinator/Rust/UI surfaces (R2 buckets, Stripe descriptors,
    Solana memos, telemetry source attribution) intentionally
    untouched.

CLI subcommands shipped in v0.5.0
---------------------------------
darkbloom serve / start / stop, status, doctor, models {list, catalog,
download, remove}, enroll, unenroll, login, logout, logs, autoupdate,
benchmark, update, verify. start --foreground is the launchd
entrypoint; start --local --port N runs a standalone OpenAI-compatible
HTTP server. PID-file single-instance enforcement, caffeinate-based
sleep prevention, panic-hook telemetry, and metallib hash in
attestation are all wired in.

Submodule re-homing
-------------------
.gitmodules now points to Layr-Labs/mlx-swift and Layr-Labs/mlx-swift-lm.
The mlx-swift pointer is unchanged (clean `main`). The mlx-swift-lm
pointer advances from 3ec4b8a (codex/local-mlx-swift-dependency) to
91612d5 (darkbloom-continuous-batching) which carries the batching
engine + Gemma 4 MoE fork on Layr-Labs/mlx-swift-lm.

Tests
-----
135 / 135 tests pass in 16.5 s with DARKBLOOM_LIVE_MLX_TESTS=1 and
DARKBLOOM_LIVE_MLX_GEMMA=1 (live MLX inference against real models
plus the gated 27 GB Gemma generation test).

* Bump mlx-swift-lm submodule to main after re-homing to Layr-Labs

Layr-Labs/mlx-swift-lm@main now carries the continuous-batching engine,
per-row samplers, and Gemma 4 MoE port at 8d76944. Same tree as the
prior 91612d5 commit on the darkbloom-continuous-batching branch, but
without the local-path mlx-swift dep hack, so the fork is consumable
by URL outside this repo.

* Untrack .claude/ files and drop dangling cross-references

The .claude/ directory holds local agent state (cursor task files,
working notes, the in-progress migration plan). Those don't belong in
the repo. Untrack the two committed markdown files and broaden the
.gitignore from `.claude/worktrees/` to `.claude/` so future agent runs
don't add them back. Strip the dead links to .claude/swift-migration-plan.md
from CLAUDE.md, provider-swift/README.md, docs/ARCHITECTURE.md, and
scripts/fetch-metallib.sh -- the surrounding prose stands on its own.

The local files remain on disk for active reference; only the tracking
is removed.

* Idle-timeout unload + coordinator-driven model preload protocol

Two related additions to the provider's model lifecycle:

1) Idle-timeout unload
----------------------
ProviderLoop now runs a background monitor that polls every minute.
If `idleTimeoutMins` minutes (default 60) have elapsed since the last
inference activity AND no requests are in flight, the loaded
ModelContainer is dropped. The next inference request lazy-reloads.
`idleTimeoutMins == 0` disables the monitor; the model stays
resident forever.

The decision is extracted into `IdleTimeoutPolicy.shouldUnload(...)`
so the rule is unit-testable without spinning up the full ProviderLoop
actor (which depends on Secure Enclave, coordinator client, and
security posture). Five unit tests pin the policy: (a) unloads when
all conditions met, (b) never unloads with inflight requests,
(c) never unloads with no model loaded, (d) waits for the timeout to
elapse, (e) zero-timeout edge case is still defensive.

Activity tracking: `lastInferenceAt` updates on every request
admission and on every request finish (`removeInflightTask`). The
worker is a detached `Task` so cancel/submit on the actor never
queue behind the timer.

2) Coordinator-driven model preload
-----------------------------------
New WebSocket message `coordinator → provider: load_model`. The
provider has no inbound listener (security: a discovered IP can't
reach the GPU), so the coordinator pushes preload requests over the
existing outbound WebSocket connection that the provider opened.
Use case: the coordinator predicts demand for model X on machine Y
in the next hour and warms it ahead of time.

Provider behavior:
  - If model is already loaded: short-circuit, reply succeeded.
  - Otherwise: emit `load_model_status` "started" immediately,
    kick off `ensureModelLoaded` in a detached Task, then emit
    "succeeded" or "failed" (with an error string) when the load
    settles.

Wire surface added in three places (per AGENTS.md sync rule):
  - coordinator/internal/protocol/messages.go: `TypeLoadModel`,
    `TypeLoadModelStatus`, `LoadModelMessage`, `LoadModelStatusMessage`,
    plus the `LoadModelStatusStarted/Succeeded/Failed` constants.
  - provider-swift/.../Protocol/Messages.swift: new
    `CoordinatorMessage.loadModel(...)` case + `ProviderMessage
    .loadModelStatus(...)` case + Codable on both sides.
  - provider-swift/.../Coordinator/CoordinatorClient.swift: dispatch
    inbound `load_model` to a new `CoordinatorEvent.loadModel(modelId)`
    and add `OutboundMessage.loadModelStatus(...)` for the reply.

ProviderLoop wires `handleLoadModelRequest(modelId:send:)` for the
new event. Round-trip tests cover decoding a Go-style `load_model`
JSON and encoding all three lifecycle status replies (started /
succeeded / failed-with-error) with snake_case wire keys.
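
A rough Go sketch of the coordinator-side wire shapes named above (the type and status string values come from this commit message; the exact field sets in messages.go may differ):

// Sketch only; JSON keys are snake_case per the round-trip tests.
const (
    TypeLoadModel       = "load_model"
    TypeLoadModelStatus = "load_model_status"

    LoadModelStatusStarted   = "started"
    LoadModelStatusSucceeded = "succeeded"
    LoadModelStatusFailed    = "failed"
)

type LoadModelMessage struct {
    Type    string `json:"type"`
    ModelID string `json:"model_id"`
}

type LoadModelStatusMessage struct {
    Type    string `json:"type"`
    ModelID string `json:"model_id"`
    Status  string `json:"status"`
    Error   string `json:"error,omitempty"` // set only for "failed"
}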

Rust legacy provider intentionally untouched. The coordinator
should gate `load_model` dispatch on `backend == "mlx-swift"` so
the Rust path never receives an unknown message; that gate lives
on the coordinator side and isn't part of this commit.

Tests
-----
141 / 141 tests pass with DARKBLOOM_LIVE_MLX_TESTS=1 and
DARKBLOOM_LIVE_MLX_GEMMA=1 (live MLX inference + Gemma 4 26B-A4B-it-8bit
MoE batching included). New: 5 IdleTimeoutPolicy tests + 1
loadModel round-trip protocol test.

* Add end-to-end performance tests: TTFT, encryption, batching, model load

Four new live tests that produce reproducible numbers for the four
scenarios the operator asked about. Gated by DARKBLOOM_LIVE_MLX_TESTS=1;
all four target Qwen3 0.6B-8bit so the suite finishes in ~7 s.

  A) warm TTFT baseline -- pure inference TTFT with no encryption
     and the model already loaded.
  B) cold TTFT          -- spins up a fresh ModelContainer each
     iteration so the weights are re-paged from disk; reports
     load_time and load_time + first_token separately.
  C) encrypted TTFT     -- runs the request body through
     NodeKeyPair.encrypt (consumer side) and NodeKeyPair.decrypt
     (provider side) with real libsodium NaCl box, then submits.
     Reports encrypt-only, decrypt-only, warm TTFT, and total
     E2E first-token (enc + dec + TTFT) so each layer's cost is
     visible.
  D) batched TTFT       -- B=1, B=2, B=4 concurrent submissions on
     a single shared scheduler. Reports per-row TTFT and aggregate
     throughput so the continuous-batching scaling story is honest.

Headline numbers on M4 Max with Qwen3 0.6B-8bit:

  warm TTFT (plaintext):             ~20 ms
  encrypt (consumer side):           ~0.05 ms (libsodium NaCl box)
  decrypt (provider side):           ~0.02 ms
  E2E first-token (enc+dec+TTFT):    ~31 ms
  cold model load:                   ~856 ms
  cold load + first token:           ~1036 ms
  aggregate throughput B=1:          87.4 tok/s
  aggregate throughput B=2:          176.2 tok/s   (~2.0x)
  aggregate throughput B=4:          317.1 tok/s   (~3.6x)
  per-request TTFT B=1 -> B=4:       34 ms -> 36 ms (flat)

Encryption is essentially free, continuous batching scales
near-linearly to B=4, and per-request TTFT is invariant under
batching -- the key continuous-batching scheduler invariant.

The tests assert lower-bound liveness (durations > 0, all rows
complete) but don't pin absolute latencies, since those vary by
hardware. Numbers print to stderr in a "[perf]" prefix so they
land in the test log without polluting test stdout.

While here, fixed a `String(format:)` bug in the printRow helper
where `%s` was used with a Swift String (would have segfaulted
the test process via _platform_strlen on an unaligned pointer).

145 / 145 tests pass in 9 s with DARKBLOOM_LIVE_MLX_TESTS=1.

* Add Gemma 4 26B-A4B-it-8bit MoE tier to performance suite

Refactor PerformanceLiveTests so every scenario (warm TTFT, cold load,
encrypted E2E, batched throughput) is parameterised by a `ModelConfig`
struct (label, modelID, wired-memory budget, iteration counts, batch
sizes, max_tokens). Two configs ship in the suite:

  - Qwen3 0.6B-8bit         smoke tier (DARKBLOOM_LIVE_MLX_TESTS=1)
  - Gemma 4 26B-A4B-it-8bit production tier
                            (DARKBLOOM_LIVE_MLX_TESTS=1 +
                             DARKBLOOM_LIVE_MLX_GEMMA=1)

Both run all four scenarios. Total 8 @test methods (4 + 4).

Headline numbers on M4 Max with weights memory-mapped from local cache:

  Gemma 26B MoE:
    warm TTFT                     309 ms
    cold load                     2.63 s
    cold load + first token       3.07 s
    encrypt (consumer side)       0.05 ms
    decrypt (provider side)       0.03 ms
    E2E first-token               262 ms
    B=1 throughput                10.2 tok/s
    B=2 throughput                16.7 tok/s   (1.64x)
    B=4 throughput                23.9 tok/s   (2.34x)

  Qwen3 0.6B (for comparison):
    warm TTFT                     ~21 ms
    cold load                     ~887 ms
    E2E first-token               ~32 ms
    B=4 throughput                ~302 tok/s

Three things the Gemma tier surfaces that the smoke tier doesn't:

1. Encryption is *still* essentially free at 26B scale -- 70-80 us
   combined for encrypt + decrypt, dwarfed by the 200+ ms
   memory-bandwidth-bound prefill.
2. Per-row TTFT scales SUB-linearly with B for MoE (234 -> 344 -> 603
   ms at B=1/2/4) because each batched prefill processes a heavier
   forward. Aggregate throughput still wins (10 -> 17 -> 24 tok/s).
3. Cold load on a 26 GB MoE that's still in the OS page cache is
   ~2.6 s -- the relevant number for the idle-timeout-reload path.
   First-ever boot would be longer (NVMe-bound), but unmeasurable
   from a unit test without privileged page-cache flushing.

Also tighten the report formatting: column padding to 56 chars, "ms"
under 1 s and "s" above, max_tokens=8 for Gemma (vs 16 for Qwen) so
the suite finishes in ~30 s with all four scenarios run twice.

149 / 149 tests pass in 37 s with both env vars set.

* Performance audit vs mlx_lm: bracket the dispatch-overhead gap

The user noticed that "10.2 tok/s for Gemma 26B" looked too low. They
were right. Side-by-side with `mlx_lm` 0.31.3 Python on the same M4
Max + same checkpoints:

  Qwen3 0.6B-8bit              mlx_lm: 426 tok/s   us: ~84 tok/s   (5.0x)
  Gemma 4 26B-A4B-it-8bit MoE  mlx_lm:  84 tok/s   us: ~33 tok/s   (2.4x)

To localize the gap, this commit adds a "decode-tps bracket" test
that measures the same B=1 steady-state decode through three paths:

  1. Pure model loop  -- model.callAsFunction directly, no scheduler
  2. BatchGenerator   -- our continuous-batching engine, B=1
  3. BatchScheduler   -- production path (actor + AsyncStream)

Findings on Gemma 26B MoE (decode-only, 64 tokens):

  pure loop, sync eval        34.6 tok/s
  pure loop, async eval       34.4 tok/s    (no improvement -- not
                                              the issue)
  BatchGenerator B=1          32.6 tok/s    (-6%, noise-level)
  BatchScheduler.submit       32.5 tok/s    (-6%, noise-level)

  mlx_lm Python reference     84.0 tok/s    (2.4x faster)

Conclusion: the gap is at the **MLX-Swift dispatch layer**, not in
our scheduler or batched-cache code. The pure model loop is already
2.4x slower than Python. Adding our BatchScheduler + actor + worker
adds < 6% on top -- not the bottleneck.

The 8-13 ms per-step CPU overhead is consistent with kernel-launch
latency in mlx-swift bindings. mlx_lm Python uses `mx.compile` on
the decode step to amortize this; mlx-swift-lm does not. Closing
the gap is a separate workstream on the upstream library.

Other improvements in this commit:

* Bump Gemma's batched max_tokens from 8 -> 32 so steady-state
  decode dominates the aggregate TPS metric.
* Add steady-state decode TPS reporting alongside aggregate (subtract
  prefill so it compares like-for-like with mlx_lm's "Generation:
  X tokens-per-sec" headline).
* Switch the throughput tests to a long-output prompt ("write a 200
  word story...") so the model decodes to max_tokens instead of
  hitting EOS at ~12 tokens. The B=1 number was misleadingly low
  before because the prior prompt asked for "a single word".
* Add async-eval pipelining variant to the bracket -- confirms
  mx.async_eval alone doesn't close the gap (which means the missing
  optimization is `mx.compile`, not just async dispatch).
* Add Qwen3 bracket test alongside the Gemma one.
* Document the gap explicitly in the file header so future
  optimisation work has a clear target.

Honest headline numbers (M4 Max, weights memory-mapped from cache):

  Gemma 26B MoE warm TTFT             280-352 ms
  Gemma 26B MoE cold load             3.32 s   (re-page from cache)
  Gemma 26B MoE encrypt+decrypt       0.10 ms  (free)
  Gemma 26B MoE steady-state decode   32-40 tok/s   B=1
                                      35-39 tok/s   B=4 aggregate
  Qwen3 0.6B steady-state decode      84 tok/s      B=1
                                      323 tok/s     B=4 aggregate

Continuous batching itself works correctly: B=4 aggregate is 2.9x
B=1 (Gemma) and 3.8x B=1 (Qwen). The dispatch-overhead headwind
applies equally to all batch sizes.

151 / 151 tests pass in 71 s with both env vars set.

* Compare against mlx_lm batched + greedy fast-path in BatchScheduler

The previous perf audit only compared B=1 against mlx_lm. This commit
extends the comparison to B=1, B=2, B=4 by adding a Python benchmark
script (scripts/mlx_lm_batch_bench.py) that drives mlx_lm's upstream
BatchGenerator, and applies one targeted Swift-side optimization
based on what the comparison surfaced.

Reference numbers (mlx_lm 0.31.3, M4 Max, decode-only tok/s):

  Qwen3 0.6B-8bit              B=1: 265   B=2: 694   B=4: 1119
  Gemma 4 26B-A4B-it-8bit MoE  B=1:  74   B=2: 126   B=4:  181

The gap WIDENS with batch size, which pointed at an O(B) overhead in
our per-row sampling path. Smoking gun: GenerationBatch.step takes a
slow path whenever ANY row's sampler is non-nil, doing B separate
slice + sample + concat ops (=> 9 kernel launches per token at B=4)
instead of the vectorized fallback (=> 1 kernel launch). Our
BatchScheduler.submit was passing a non-nil greedy closure even when
temperature == 0, forcing every batch through the slow path.

Fix: when temperature <= 0, pass `nil` so the row falls through to
the vectorized fallback. The fallback is also greedy, so the result
is identical -- only the dispatch path changes. Per-row temperature
/ top-P / top-K / seed all still work for non-greedy rows.

Swift numbers after the fix (decode-only):

  Qwen3 0.6B-8bit              B=1:  88   B=2: 181   B=4:  351   (was 84 / 174 / 323)
  Gemma 4 26B-A4B-it-8bit MoE  B=1:  37   B=2:  23   B=4:   42   (was 33 / 21 / 39)

Modest +6-13% across the board. The remaining 3-4x gap to Python is
at the MLX-Swift dispatch layer (per-step kernel-launch overhead);
mlx_lm closes it via `mx.compile` on the decode step, which isn't
applied in mlx-swift-lm. That's a separate workstream.

Continuous batching scaling is still healthy:
  Qwen B=4 / B=1 = 4.0x   (matches mlx_lm's 4.2x exactly)
  Gemma B=4 / B=1 = 1.1x  (mlx_lm's is 2.4x; gap reflects MoE expert
                           dispatch where Python's compile pays off most)

Other changes:
* scripts/mlx_lm_batch_bench.py -- runnable apples-to-apples bench
  for future regression checks. Reproduces the reference numbers in
  the file header.
* Update PerformanceLiveTests.swift docstring with the side-by-side
  table so the gap is visible to anyone reading the test.

151 / 151 tests pass.

* Perf compare mlx_lm batching and bump mlx-swift-lm decode optimizations

The user called out that our Gemma 26B throughput looked too low, so this
commit makes the comparison apples-to-apples against mlx_lm Python's
BatchGenerator and bumps the mlx-swift-lm submodule to the optimized main
commit.

New reference script:
  scripts/mlx_lm_batch_bench.py

It runs mlx_lm.generate.BatchGenerator at B=1/2/4 over the same long-output
prompt used by PerformanceLiveTests and reports prefill+1, decode-only TPS,
and aggregate TPS. Reference numbers on M4 Max:

  Qwen3 0.6B-8bit              B=1: 265   B=2: 694   B=4: 1119 tok/s
  Gemma 4 26B-A4B-it-8bit MoE  B=1:  74   B=2: 126   B=4:  181 tok/s

Swift improvements landed in Layr-Labs/mlx-swift-lm@b02ea5b:

  - mlx_lm-style double buffering in GenerationBatch: constructor primes
    the first token, next() returns current token while async-evaluating
    the following token.
  - Greedy fast path avoids logSumExp: argMax(logits) == argMax(logprobs),
    and we don't expose logprobs downstream today.
  - BatchScheduler now passes nil for temperature=0 samplers so batches
    use the vectorized greedy fallback instead of per-row slice/sample/concat.
  - Token tensors are UInt32 to match mlx_lm.
  - BatchKVCache now exposes innerState and KVCache conforms to Updatable,
    which fixes the cache state surface needed for future compile work.

Measured Swift deltas:

  Qwen3 0.6B:
    B=1 decode      ~84 -> ~104 tok/s
    B=4 aggregate   ~323 -> ~363 tok/s

  Gemma 26B MoE:
    B=1 decode      ~32 -> ~37 tok/s
    B=4 aggregate   ~39 -> ~40 tok/s

This closes the avoidable scheduler/batching overhead we found, but does
not fully close the remaining 2-4x gap to Python. The bracket test shows
BatchGenerator/BatchScheduler are now within noise of the pure model loop;
the remaining gap is in mlx-swift model dispatch / lack of stateful
mx.compile support. Attempting to compile the batched-cache decode graph
still fails in mlx-swift with "uncaptured inputs", so that remains an
upstream library workstream rather than a provider scheduler bug.

* Clarify release-mode batch performance measurements

The previous perf notes mixed debug-mode Swift numbers with mlx_lm Python
reference numbers, which made the Swift engine look far worse than it is.
This test-only cleanup makes the performance suite report the data needed
to keep comparisons honest.

Changes:
- Update the PerformanceLiveTests header to state explicitly that mlx_lm
  comparisons must use `swift test -c release`; debug Swift is several
  times slower and not a valid reference.
- Add direct BatchGenerator B=2/B=4 decode-only measurements to the
  bracket test, in addition to pure loop and BatchScheduler.submit.
- Add "model-side scheduler" TPS in the public batched test so we can
  distinguish model decode speed from public text streaming / AsyncStream /
  detokenization costs.

Release-mode checks on this machine:
- Qwen3 0.6B direct BatchGenerator B=4: ~1130 tok/s, matching mlx_lm's
  ~1119 tok/s reference.
- Gemma 4 26B-A4B-it-8bit direct BatchGenerator B=4: ~186 tok/s,
  matching mlx_lm's ~181 tok/s reference.
- BatchScheduler.submit B=1 decode bracket also lands at the direct model
  rate in release mode (~402 tok/s Qwen, ~79 tok/s Gemma); public streaming
  tests report separate model-side and aggregate numbers so regressions are
  localizable.

No production code changes in this commit.

* Complete Swift provider runtime verification

* Bridge Rust updater to Swift provider bundles

* Add Rust to Swift updater E2E tests

* Add Rust bridge release workflow

* E2E testbed: integration tests, profiling, and benchmarking infrastructure (#136)

* Flatten coordinator/internal/ to coordinator/, add E2E integration test suite

Promote Go module root from coordinator/ to repo root so the e2e
test suite can import coordinator packages. Flatten
coordinator/internal/ to coordinator/ to remove the Go internal
package restriction.

All import paths change from
github.com/eigeninference/coordinator/internal/X to
github.com/eigeninference/d-inference/coordinator/X.
The module path is now github.com/eigeninference/d-inference.

12 E2E integration tests using the Swift provider (mlx-swift backend):
- NonStreamingInference, StreamingInference
- MultipleRequestsAccounting, E2EEncryptionCorrectness
- BillingBalanceDeduction, ProviderPayoutSplit, ReferralRewardDistribution
- InsufficientBalance, InvalidModel
- StreamingContentValidation, ConcurrentRequests, AttestationHeaders

Each test gets its own isolated suite (Postgres + coordinator + provider)
via startSuite(t). A semaphore serializes suite lifecycles to prevent
GPU contention from concurrent MLX model loads.

Update CI workflows to reference go.mod at repo root, exclude e2e/
from unit tests, and use swift build for the provider.

* Move coordinator e2e back to coordinator/internal/e2e/

The coordinator's own e2e package was incorrectly flattened into
coordinator/e2e/ alongside the repo-root e2e/ testbed suite.
Restore it to coordinator/internal/e2e/ where it belongs.

* Run integration tests on any PR, not just master/main

* Fix CI: install Docker on macos-15, increase timeout to 30m, serial tests

* Use colima for Docker on macOS CI

* Remove invalid --no-mount flag from colima start

* Add native Postgres fallback, drop Docker/colima from CI

Docker Desktop and colima both fail on macOS CI runners due to
virtualization restrictions. Add a native Postgres lifecycle that
uses initdb + postgres directly (installed via Homebrew).

The Start() method tries Docker first, falls back to native.
CI now installs postgresql@16 via brew instead of Docker.

* Download MLX model in CI before running integration tests

* Use Python API for model download (huggingface-cli is deprecated)

* Use shared suite across all integration tests

Instead of starting a new suite (Postgres + coordinator + provider +
model load) per test, use a single shared suite initialized on first
access. This cuts total test time from ~18min to ~3min since the
expensive model load only happens once.
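
A sketch of the shared-suite accessor as described (sync.Once on first access; names are illustrative, and this pattern is walked back to per-test suites in a later commit below):

// Lazily build one shared suite; subsequent tests reuse it.
var (
    sharedOnce  sync.Once
    sharedSuite *Suite
)

func sharedSuiteFor() *Suite {
    sharedOnce.Do(func() {
        sharedSuite = newSuite() // expensive part: the one-time MLX model load
    })
    return sharedSuite
}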

* Build provider in debug mode for CI (skips SIP/security checks)

CI macOS runners have SIP disabled, which causes the provider to
exit with 'System Integrity Protection is disabled'. Debug builds
skip verifySecurityPosture() via #if !DEBUG, allowing tests to
run on CI.

Add TESTBED_PROVIDER_CONFIG env var (default: release) to control
the Swift build configuration from testbed.

* Force-trust provider in tests, disable frequent challenges

CI macOS runners have SIP disabled, which causes the provider to
fail attestation challenges. Add ForceTrustProvider() to override
status/trust/SIP verification for testing, set challenge interval
to 1h, and add a 3s delay after registration to let the initial
challenge fire before overriding.

* Force all privacy capabilities in ForceTrustProvider for testing

The private-text routing gate checks PythonRuntimeLocked and
DangerousModulesBlocked which are always false on the Swift
backend (no Python runtime). ForceTrustProvider now sets all
privacy capabilities to true and drains queued requests
immediately after trust promotion.

* Restore per-test isolated suites

Each test gets its own Postgres + coordinator + provider.
With debug builds, ForceTrustProvider, native Postgres, and
model pre-download, each suite starts in ~15-20s.

* Add load generator, profiling tests, multi-provider support

- Suite.Providers is now []*Provider; TESTBED_NUM_PROVIDERS env var
  controls how many provider subprocesses start per suite
- New LoadGenerator in testbed/load.go with configurable concurrency,
  total requests, streaming, max_tokens, temperature
- New profile tests: SingleProviderStreaming, SingleProviderNonStreaming,
  HighConcurrency — each prints segment tables with mean/p50/p95/max
- Existing integration tests (NonStreaming, Streaming, Concurrent) now
  emit Instrument events and print profile tables
- Profile SummaryTable uses millisecond resolution instead of microsecond

* Add multi-model provider specs, user pool, and latency decomposition headers

SuiteConfig now takes ModelSpecs (model ID + provider count per model) and
NumUsers. Providers are started per-spec with unique PID files (fixes
single-instance lock killing sibling providers). A user pool with round-robin
API key rotation is created at startup.

Coordinator sets X-Queue-Wait-Ms and X-Provider-Latency-Ms response headers
from PendingRequest timing fields (QueuedAt, DispatchedAt, FirstChunkAt).
LoadGenerator parses these and emits per-segment stats:
client_to_coordinator, queue_wait, coordinator_to_provider, provider_to_client.

Provider ProcessLifecycle respects DARKBLOOM_PID_FILE env var for
multi-instance testing. Add SetSkipChallenge to Server for test runs.

* Rename SegmentClientToCoordinator to SegmentTotalE2E

The segment measures full end-to-end wall clock time, not just
client-to-coordinator latency. The old name was misleading.

* Decompose X-Timing header into per-phase microsecond breakdown

Replace X-Queue-Wait-Ms / X-Provider-Latency-Ms with a single X-Timing
JSON header containing parse_us, reserve_us, route_us, queue_us,
encrypt_us, dispatch_us, provider_us. Move timing fields onto a
RequestTiming struct in PendingRequest. LoadGenerator parses the JSON
and emits per-segment stats with auto ms/µs precision.
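
Since the header is plain JSON, emitting and parsing it reduces to a marshal/unmarshal pair; a sketch assuming encoding/json and net/http (only the seven *_us keys come from this commit, the struct and helper names are illustrative):

// Sketch of the X-Timing breakdown; field names mirror the listed keys.
type timingBreakdown struct {
    ParseUs    int64 `json:"parse_us"`
    ReserveUs  int64 `json:"reserve_us"`
    RouteUs    int64 `json:"route_us"`
    QueueUs    int64 `json:"queue_us"`
    EncryptUs  int64 `json:"encrypt_us"`
    DispatchUs int64 `json:"dispatch_us"`
    ProviderUs int64 `json:"provider_us"`
}

// coordinator side: attach the breakdown to the response
func setTimingHeader(w http.ResponseWriter, tb timingBreakdown) {
    b, _ := json.Marshal(tb)
    w.Header().Set("X-Timing", string(b))
}

// load-generator side: read it back for per-segment stats
func parseTimingHeader(resp *http.Response) (timingBreakdown, error) {
    var tb timingBreakdown
    err := json.Unmarshal([]byte(resp.Header.Get("X-Timing")), &tb)
    return tb, err
}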

* Add latency regression assertions, SegmentStatsMap, and heavy-load benchmark

- Add SegmentStatsMap() to LoadResult for per-segment mean/p50/p95/p99/max
- Wire coordinator overhead assertions into all benchmark and profile tests
- Update DefaultThresholds with realistic values based on benchmark data
- Add CoordinatorOverheadThresholds() alias
- Deduplicate SegmentStatsView (assert package uses type alias to testbed)
- Clean up profile_test.go: remove redundant second load loop, use assertions
- Add PromptBytes field to RequestConfig for large-payload testing
- Add HeavyLoad 100-concurrent 10KB benchmark
- Replace bubble sort with sort.Slice in computeStats

* Split CI into eval + benchmark jobs, post benchmark results as PR comment

Integration tests (TestIntegration|TestProfile) run on every push/PR.
Benchmarks (TestBenchmark) run only on PRs and post a markdown summary
as a PR comment via gh pr comment. LoadResult and AssertionReport gain
SummaryMarkdown() methods for markdown table formatting. A TestMain in
benchmark_test.go writes the aggregated markdown to BENCHMARK_MD_PATH
when set.

* Skip multi-model benchmark in CI (gemma model not downloaded)

The M1 Virtual CI runner only downloads Qwen3.5-0.8B; the gemma
multi-model test requires a second model that isn't available.

* Download gemma-3-270m-4bit in CI, remove multi-model skip

* Include model IDs and RAM sizes in benchmark PR comment

* address feedback

* fix: soft-fail Swift tests on dev + download full model for CI

* feat: environment-scoped R2 + coordinator secrets for dev/prod release isolation

- Move R2_BUCKET from vars to secrets so it participates in GitHub
  environment scoping (dev vs prod get different buckets/credentials)
- Add documentation header listing all environment-scoped secrets
  required per environment
- Soft-fail Swift unit tests on dev releases (live MLX model cache
  may be incomplete on CI)
- Download full model (remove --include filter) for deterministic
  CI cache seeding

* feat: DEV_/PROD_ prefixed repo secrets for R2 + coordinator env isolation

Both release workflows now resolve DEV_ or PROD_ prefixed repo secrets
in a resolve-env step using bash indirection — no GitHub environments
needed. The environment: gate is removed since secrets live at repo
level with prefixes.

Required repo secrets:
  DEV_R2_ACCESS_KEY_ID, PROD_R2_ACCESS_KEY_ID
  DEV_R2_SECRET_ACCESS_KEY, PROD_R2_SECRET_ACCESS_KEY
  DEV_R2_ENDPOINT, PROD_R2_ENDPOINT
  DEV_R2_BUCKET, PROD_R2_BUCKET
  DEV_R2_PUBLIC_URL, PROD_R2_PUBLIC_URL
  DEV_COORDINATOR_URL, PROD_COORDINATOR_URL
  DEV_RELEASE_KEY, PROD_RELEASE_KEY

* fix: RELEASE_KEY is shared, not env-prefixed

* fix: resolve env secrets inline to avoid GitHub cross-job output masking

* fix: add DEV_RELEASE_KEY/PROD_RELEASE_KEY to env-prefixed secrets

* Add STRIDE threat model for runtime security review

40 threats across 9 trust boundaries (coordinator/provider WebSocket,
provider operator vs process, browser/UI, Apple MDM/MDA, admin API,
inference engine, payments, Apple attestation chain). Adversaries:
malicious provider, malicious consumer, external attacker. Each threat
includes affected_files globs, mitigations with status, open_findings
links to the existing security audit, and a detection_hint for
automated PR review.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* Expand threat model trust boundaries with implementation detail

Each of the 9 trust boundaries now documents how_it_works (exact code
paths, line numbers, auth mechanisms, data flows) and current_limitations
(specific open gaps with SEC-* references). Sources: coordinator/internal/
api/{server,provider,release_handlers,device_auth,billing_handlers}.go,
registry/registry.go, attestation/, mdm/, provider-swift/Sources/
ProviderCore/Security/{AntiDebug,BinaryHasher,SecureEnclaveIdentity,
SecurityHardening}.swift, Crypto/NodeKeyPair.swift, Inference/
{BatchScheduler,IdleTimeoutPolicy,InferenceCancellation}.swift,
ProviderLoop.swift, console-ui/src/{hooks/useAuth,lib/{api,store,
encryption}}.ts, next.config.ts.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* Add threat model PR review workflow

On every PR against master/main, the workflow:
1. Gets the PR diff via gh pr diff
2. Matches changed files against affected_files globs in docs/threat-model.yaml
3. Calls Claude API (claude-sonnet-4-6) with the focused diff + full threat model
4. Posts (or updates) a single PR comment with STRIDE-based security analysis

Uses prompt caching on the static threat model block to minimise API cost
on repeated pushes. The comment marker <!-- threat-model-review --> lets
the workflow update rather than append on each push.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* Persistent Secure Enclave key with keychain access group enforcement (#146)

* Add persistent Secure Enclave attestation key with keychain access group enforcement

Replace ephemeral CryptoKit SE keys with persistent Security framework keys stored
in the macOS data protection keychain. The key is bound to the signing team's
keychain access group (SLDQ2GJ6TL.io.darkbloom.provider), enforced by securityd
at the kernel level. A patched binary re-signed with codesign -s - gets
errSecMissingEntitlement and cannot access the key.

- PersistentEnclaveKey: Security framework SE key with SecKeyCreateRandomKey,
  kSecAttrIsPermanent, and team-scoped access group
- AttestationSigner protocol: abstracts over both ephemeral and persistent keys
- ProviderLoop: tries persistent key first, falls back to ephemeral with warning
- Entitlements plist with keychain-access-groups for production signing
- 8 tests covering creation, persistence, signing, deletion, protocol conformance

* Embed provisioning profile in .app bundle for persistent SE key

The data protection keychain requires a provisioning profile to authorize
the keychain-access-groups entitlement. Wrap the CLI binaries in a minimal
Darkbloom.app bundle with embedded.provisionprofile so the persistent SE
attestation key works on provider machines.

- release-swift.yml: new step decodes PROVISIONING_PROFILE_BASE64 secret,
  builds Darkbloom.app/Contents/ structure, signs bundle + individual binaries
- install.sh: detects .app bundle layout, symlinks bin/ into the app bundle
- Backward-compatible: falls back gracefully if secret is not set or if
  provider receives a flat (pre-.app) bundle

* Add com.apple.application-identifier to provider entitlements

Required for data protection keychain access. Must match the bundle ID
in the provisioning profile (SLDQ2GJ6TL.io.darkbloom.provider).

* Address review: data protection keychain flag, tighter error handling, real SE probe

Codex P1 / hank P1:
- coordinator/api/install.sh: restore __DARKBLOOM_COORD_URL__ placeholder
  (the coordinator templates this at serve time via server.go;
  hardcoding the URL broke dev/self-hosted coordinators)
- PersistentEnclaveKey: add kSecUseDataProtectionKeychain: true to all
  Security framework calls. Without it, queries may hit the legacy
  file-based keychain where access group enforcement is silently ignored.

hank P2:
- loadOrCreate: catch only errSecItemNotFound before falling through to
  createNew. Auth failures, locked keychain, and missing entitlement
  now propagate to the caller instead of racing with key creation.
- isAvailable: probe real SE capability via CryptoKit's
  SecureEnclave.isAvailable instead of just checking macOS version.
  Now returns false on Intel Macs without T2 and macOS VMs without
  virtualized SE. Added doc comment noting the entitlement dependency.

* fix(api): add code and param fields to OpenAI error responses (#144)

The errorResponse function only populated type and message, missing
code and param required by the OpenAI API spec. Without code, SDKs
cannot programmatically distinguish error types (e.g. Python SDK
e.code returns None, retry logic breaks, Sentry groups all errors
as one).

Changes:
- errorResponse now accepts optional errorDetailOpt variadic args
- code defaults to errType for backward compatibility
- withParam() and withCode() helpers for call-site overrides
- model-not-found errors include param="model"
- model-is-required errors include param="model"
- insufficient_funds uses OpenAI-canonical code "insufficient_quota"
- rate_limit_exceeded gets explicit withCode for clarity

All 202 existing call sites are backward-compatible: the variadic
signature means they compile unchanged, and the default code=errType
matches the implicit behavior SDKs already assumed.

Closes #142

* feat: add Datadog observability stack for dev coordinator (#143)

* Fix Darkbloom analytics tracking

* Harden release workflow protections (#103)

* Harden release registration and binary hash policy (#99)

* Harden release registration and binary hash policy

* derive release download URL from allowlist

* Stabilize provider coordinator test

---------

Co-authored-by: Gajesh Naik <26431906+Gajesh2007@users.noreply.github.com>

* Remove stale Python integration test (#109)

* e2e: add local simulation environment skeleton

Introduces scripts/e2e-runner.py, a Python orchestrator that spins up the
real coordinator binary with test-friendly configuration (in-memory store,
mock billing, no trust requirements) alongside a simulated or real
provider, and runs HTTP/WebSocket-level assertions against the live stack.

Key components:
- Coordinator class: builds and spawns coordinator with EIGENINFERENCE_MIN_TRUST=none,
  EIGENINFERENCE_BILLING_MOCK=true, and in-memory store
- SimulatedProvider: pure-Python WebSocket client speaking the full provider protocol
  (register, attestation challenge/response, heartbeat, inference request/response)
- Test framework: decorator-based test registration, pass/fail summary, signal-safe
  cleanup via atexit + signal handlers
- Test stubs: test_basic (registration + discovery), test_inference (consumer
  request routing), test_multi_provider (two providers, same model)

TODO:
- RealProvider wrapper around darkbloom serve --coordinator
- Coordination between provider challenge cycle and consumer request timing
- API key handling for consumer vs admin routes
- Python dependency management (websockets, cryptography)

* Revert "e2e: add local simulation environment skeleton"

This reverts commit d02074e. The Python E2E runner adds noise on top of
the existing Go integration tests (internal/api/integration_test.go +
fullstack_integration_test.go) which already cover the full coordinator
protocol surface. The cross-language orchestration doesn't buy anything
over what httptest.Server + simulated providers already provide.

* Remove stale Python integration test

@ethenotethan

tests/integration_test.py is superseded by the Go-based coordinator
integration tests at coordinator/internal/api/:

- Test coverage for coordinator protocol (register, challenge, heartbeat,
  inference) is covered by integration_test.go using httptest.Server +
  Go simulated providers — same coverage, no binary build needed
- Full-stack GPU inference is covered by fullstack_integration_test.go
  with real vllm-mlx backends (gated behind LIVE_FULLSTACK_TEST=1)
- The Python test uses stale binary names ('eigeninference-provider'),
  old flags ('--backend mlx-lm'), and predates attestation challenges,
  E2E encryption, and the vllm-mlx backend migration
- No external dependency coverage (Postgres, Stripe, etc.) is lost — the
  coordinator main.go wiring for those is trivially tested elsewhere
- The Python SDK tests (4.5.x) belong in the SDK repo, not the infra repo

---------

Co-authored-by: Hank Bob <hankbob@researchoors.com>

* chore: remove unused dependencies (#112)

* chore: remove unused dependencies

* test: fix console ui test isolation

* chore: prune repo-wide dead code findings

* ci: run CI on any PR, not just master/main (#119)

* ci: remove racing deploy-dev-coordinator workflow (#137)

Cloud Build (deploy/gcp/cloudbuild.yaml) already deploys the coordinator
on the same trigger (push to master touching coordinator/** or deploy/gcp/**).
Having both paths active creates a race condition where two CI systems
simultaneously deploy to the same dev VM — see #115.

* feat: add Datadog observability stack for dev coordinator

Install Datadog Agent on the dev GCE VM (DogStatsD, APM, journald logs)
and wire the coordinator to emit structured metrics, split attestation
counters, model_type tags, reactive provider-count gauges, and a
completion-tokens counter. Rebuild the dev dashboard with 7 sections
covering metrics, logs, traces, and system health.
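
For orientation, DogStatsD emission from Go is a thin client call per metric; a sketch assuming the github.com/DataDog/datadog-go/v5/statsd client (metric names, tags, and the onlineCount/tokens values are illustrative, not the coordinator's actual ones):

// Illustrative DogStatsD wiring; the agent listens on localhost:8125 by default.
client, err := statsd.New("127.0.0.1:8125", statsd.WithNamespace("coordinator."))
if err != nil {
    log.Fatal(err)
}
// split counter with a model_type tag
client.Incr("inference.requests", []string{"model_type:moe"}, 1)
// reactive provider-count gauge
client.Gauge("providers.online", float64(onlineCount), nil, 1)
// completion-tokens counter
client.Count("inference.completion_tokens", int64(tokens), []string{"model:qwen3-0.6b"}, 1)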

* fix: prevent double-decrement when untrusted provider disconnects

Disconnect now checks StatusUntrusted before decrementing the online
counter and model-provider gauges, since MarkUntrusted already
decremented them.

* feat: add fleet version and binary hash observability

New metrics:
- providers.per_version gauge (per provider binary version)
- providers.per_binary_hash gauge (per attested binary hash)
- coordinator.min_provider_version_set gauge (1 when configured)
- provider_version_below_minimum counter (tagged by gate and version)

Gates instrumented:
- registration (provider.go)
- challenge revalidation (provider.go)
- manifest sync (server.go)

Registry additions:
- ProviderCountByVersion()
- ProviderCountByBinaryHash()

Dashboard: Fleet Version & Binary Hash group with providers by version,
providers by binary hash, min provider version, below-minimum events,
and top binary hashes toplist.

* fix: update Dockerfile + cloudbuild for go.mod at repo root

go.mod moved from coordinator/ to repo root during the swift-provider
merge. Build context is now repo root, Dockerfile copies coordinator/
subdir explicitly.

* fix: chmod +x coordinator binary in Dockerfile

* fix: ensure coordinator binary is executable in builder stage

* fix: rename coordinator source dir in builder to avoid colliding with binary path

* fix: copy full repo in Dockerfile builder so go.mod resolves all packages

* fix: remove unused modelTypeTag and format Go files for CI

* fix: skip python/dangerous-modules check for swift runtime in private text gate

* billing telemetry + MarkUntrusted race fix + Swift routing tests

- Add Datadog histogram metrics for reservation amounts, settlement
  refunds, provider credits, and platform fees
- Add store.debit/credit.latency_ms histograms for DB operation timing
- Add billing.cost_clamped and billing.reservation_refunds counters
- Fix race in MarkUntrusted: hold r.mu write lock through counter
  decrement to prevent double-decrement with Disconnect
- Add unit tests for Swift provider privacy caps (with/without Python)
- Add E2E test for Swift provider routing via challenge-verified path
- Update dev-network-dashboard.json with Billing & Store group

* fix Heartbeat reviving untrusted providers causing onlineCount double-decrement

* revert orthogonal landing/console-ui/provider changes

* remove unbounded binary_hash cardinality, add input token metrics + store latency, fix dashboard group-by

* fix review feedback: ModelType() untrusted filter, routing.cost_ms by provider, billing in cents, dead comment

---------

Co-authored-by: Gajesh Naik <26431906+Gajesh2007@users.noreply.github.com>
Co-authored-by: anupsv <6407789+anupsv@users.noreply.github.com>
Co-authored-by: hankbob <hankbobtheresearchoor@gmail.com>
Co-authored-by: Hank Bob <hankbob@researchoors.com>

* migration: harden Rust→Swift cutover end-to-end

Twelve fixes informed by three reviewer subagents (codex-rescue,
independent Claude, full pipeline audit) to ensure the bridge release
→ Swift release cutover works on first try, with no silent breakage.

Coordinator:
- accept darkbloom-bundle-<platform>.tar.gz (was eigeninference-bundle-)
- restore TestProviderRegistrationWithoutAttestationRejectedWhenBinaryHashPolicyConfigured
  (dropped during the master→swift-provider merge)

release-swift.yml:
- ship bin/{darkbloom,darkbloom-enclave,mlx.metallib} as real-file copies
  (was symlinks) so coordinator's tar.TypeReg verifier accepts them and
  hashes the actual bytes
- staple both bin/ AND .app/Contents/MacOS/ paths now that they're
  independent files
- post-codesign verification: fail build if signed CLI is missing the
  keychain-access-groups entitlement or the access group
  SLDQ2GJ6TL.io.darkbloom.provider, or if embedded.provisionprofile
  is absent from the .app
- PROVISIONING_PROFILE_BASE64 is now hard-required (no silent ephemeral
  fallback). Profile is decoded + parsed with plutil/python: verifies
  TeamIdentifier, keychain-access-groups, application-identifier, and
  ExpirationDate >= 30 days out
- pin MLX python wheel to 0.31.1 to match libs/mlx-swift Cmlx version
  (was 0.31.2 — patch-level metallib ABI risk)
- prod releases now hard-fail Swift tests (was soft-fail for all)

release-rust-bridge.yml:
- rename bridge bundle to darkbloom-bundle-<platform>.tar.gz uniformly
  so coordinator accepts the registration

Both release workflows:
- PROD_* secrets fall back to legacy unprefixed (R2_ACCESS_KEY_ID,
  RELEASE_KEY, COORDINATOR_URL) + vars.R2_BUCKET when PROD_* empty.
  Fails hard if neither resolves.

provider/src/main.rs (bridge auto-update):
- new rewrite_launchd_plist_for_swift: extracts ProgramArguments from
  the Rust plist (`serve --coordinator URL --model M`), converts to
  Swift shape (`start --foreground --coordinator-url URL --model M`),
  atomic rename
- install_swift_update_bundle_at: if Darkbloom.app/Contents/MacOS/
  exists in the extracted bundle, replace bin/{darkbloom,darkbloom-
  enclave,mlx.metallib} with symlinks into .app/MacOS and route the
  launchd plist's ProgramArguments[0] at the .app's MacOS binary path.
  This puts the embedded provisioning profile in scope at runtime, so
  the persistent SE key (PR #146) doesn't get errSecMissingEntitlement
  on first attestation post-cutover
- plist_path is now an Option<&Path> so tests can avoid touching the
  developer machine's real ~/Library/LaunchAgents

Tests added (all passing):
- 6 plist-rewrite unit tests: extract / convert / rewrite / install-
  with-plist / .app-aware install / hash-only install
- 1 ported coordinator attestation policy test
- existing 7 auto-update integration tests still pass (302 → 303 total)

Verified by audit:
- macos-26-xlarge has Xcode 26.2 / Swift 6.2, satisfies all
  swift-tools-version requirements
- LatestProviderVersion ordering: semver THEN created_at in both memory
  and Postgres stores
- /api/version JSON shape matches what auto_update_check_with_install_dir
  expects
- StartCommand --foreground doesn't recurse into launchAgent.installAndStart
- Swift ModelScanner reads ~/.cache/huggingface/hub (same as Rust)
- AuthTokenStore path parity (~/.darkbloom/auth_token)

Deployment prerequisite: coordinator changes must be deployed (master
→ dev Cloud Build, then human ecloud deploy to prod) BEFORE tagging
any release. Bridge registration will 400 against an older coordinator
that doesn't know about the darkbloom-bundle- filename.

* chore: cargo fmt on plist-migration code

Post-rustfmt: long format!() args wrapped, with_context closure pulled
onto one line, ternary-style assignment broken into if/else. No
behavior change — `cargo test --bin darkbloom` still 303 pass / 0 fail.

---------

Co-authored-by: ethenotethan <42627790+ethenotethan@users.noreply.github.com>
Co-authored-by: anupsv <6407789+anupsv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: hankbob <hankbobtheresearchoor@gmail.com>
Co-authored-by: Hank Bob <hankbob@researchoors.com>


Development

Successfully merging this pull request may close these issues.

Error responses missing param and code fields — OpenAI SDK error handling broken
