fix(api): add code and param fields to OpenAI error responses (#144)
Merged -- ethenotethan merged 1 commit on May 14, 2026
The errorResponse function only populated type and message, missing the code and param fields required by the OpenAI API spec. Without code, SDKs cannot programmatically distinguish error types (e.g. the Python SDK's e.code returns None, retry logic breaks, and Sentry groups all errors as one).

Changes:
- errorResponse now accepts optional errorDetailOpt variadic args
- code defaults to errType for backward compatibility
- withParam() and withCode() helpers for call-site overrides
- model-not-found errors include param="model"
- model-is-required errors include param="model"
- insufficient_funds uses the OpenAI-canonical code "insufficient_quota"
- rate_limit_exceeded gets an explicit withCode for clarity

All 202 existing call sites are backward-compatible: the variadic signature means they compile unchanged, and the default code=errType matches the implicit behavior SDKs already assumed.

Closes Layr-Labs#142
94ab31c to dc009f1
Gajesh2007 approved these changes May 14, 2026
Gajesh2007 added a commit that referenced this pull request May 15, 2026
* Clarify provider trust diagnostics
* Add Swift provider runtime
* Remove unused e2e vector generator
* Continuous batching, GPU-only enforcement, rename to darkbloom, Layr-Labs forks
This is the v0.5.0 cutover commit on the Swift provider PR. It lands
true continuous batching as the production inference path, threads
per-row sampling through the request, hard-fails on CPU-only hosts,
renames the user-visible CLI surface from "eigeninference" to
"darkbloom" with backward compatibility, and re-homes the mlx-swift /
mlx-swift-lm submodules to Layr-Labs forks.
Continuous batching (default, no parallel implementations)
----------------------------------------------------------
Replaces the per-request BatchScheduler with one shared BatchGenerator
ported from upstream `mlx_lm.generate`. All concurrent requests are
merged into one batched forward pass per step. Bit-identical against
single-stream greedy on:
- Qwen3 0.6B-8bit (dense), B=2 / B=4-ragged
- Qwen3.5 0.8B-MLX-4bit (hybrid SSM + attention), B=2
- Gemma 4 26B-A4B-it-8bit (MoE, 26 GB), B=2
The mlx-swift-lm side of this work is at
Layr-Labs/mlx-swift-lm@darkbloom-continuous-batching:
- BatchKVCache + BatchedCache protocol
- SequenceStateMachine, PromptProcessingBatch, GenerationBatch,
BatchGenerator
- RowSamplers (temperature / top-P / top-K / seed)
- Gemma 4 MoE support + K=V branch fix in Gemma4Attention
Production scheduler in provider-swift/Sources/ProviderCore/Inference/
BatchScheduler.swift wraps the engine in an actor; detached worker
calls into the actor only for short critical sections so cancel/submit
never queue behind a long-running step. submit() builds a per-row
sampler from request.{temperature, top_p, top_k, seed}.
Validation also covers eviction-and-admission: row 0 finishes mid-batch,
row C is admitted into its slot, row C's tokens match a solo run, row B
(running through the eviction) also matches its solo run. This locks in
BatchKVCache.filterBatched + extendBatched correctness end-to-end.
Sampler unit tests cover greedy passthrough, top-K=1 determinism,
top-K masking, top-P collapse-to-dominant, top-P=1 identity, seeded
reproducibility, and different-seed divergence.
GPU-only enforcement
--------------------
ProviderCore/Inference/GPUEnforcement.swift:
- probeMetal(): non-throwing Metal device probe
- requireMetal(): throws on missing GPU; pins Device.setDefault(.gpu);
idempotent
Wired into BatchScheduler.loadModel, StartCommand, BenchmarkCommand,
and `darkbloom doctor`. Doctor surfaces a `[PASS] metal gpu: <name>,
<N> GB working set` line; `[FAIL]` on Intel/Linux. CPU fallback for
inference is rejected up-front with a descriptive error.
Rename: eigeninference → darkbloom (Swift CLI surface)
------------------------------------------------------
Canonical names:
- eigeninference-enclave → darkbloom-enclave (binary + struct)
- Sources/eigeninference-enclave-cli/ → Sources/darkbloom-enclave-cli/
- SwiftPM target EigenInferenceEnclaveCLI → DarkbloomEnclaveCLI
- eigeninference-bundle-macos-arm64.tar.gz →
darkbloom-bundle-macos-arm64.tar.gz
- ~/.config/eigeninference/ → ~/.config/darkbloom/ (preferred path)
- Mobileconfig prefix: EigenInference-Enroll-* → Darkbloom-Enroll-*
Backward compatibility:
- install.sh creates an `eigeninference-enclave` symlink to
  `darkbloom-enclave` so existing install scripts keep resolving.
- Config loader still reads ~/.config/eigeninference/ and the App
Support legacy paths as fallbacks; new writes always go to
~/.config/darkbloom/.
- LocalDataCleanup.purge() removes both directories.
- release-swift.yml publishes the latest tarball under both
canonical and legacy filenames.
- NodeKeyPair.legacyDirNames and SecurityHardening MDM-profile-name
matchers still accept the old name.
- Coordinator/Rust/UI surfaces (R2 buckets, Stripe descriptors,
Solana memos, telemetry source attribution) intentionally
untouched.
CLI subcommands shipped in v0.5.0
---------------------------------
darkbloom serve / start / stop, status, doctor, models {list, catalog,
download, remove}, enroll, unenroll, login, logout, logs, autoupdate,
benchmark, update, verify. start --foreground is the launchd
entrypoint; start --local --port N runs a standalone OpenAI-compatible
HTTP server. PID-file single-instance enforcement, caffeinate-based
sleep prevention, panic-hook telemetry, and metallib hash in
attestation are all wired in.
Submodule re-homing
-------------------
.gitmodules now points to Layr-Labs/mlx-swift and Layr-Labs/mlx-swift-lm.
The mlx-swift pointer is unchanged (clean `main`). The mlx-swift-lm
pointer advances from 3ec4b8a (codex/local-mlx-swift-dependency) to
91612d5 (darkbloom-continuous-batching) which carries the batching
engine + Gemma 4 MoE fork on Layr-Labs/mlx-swift-lm.
Tests
-----
135 / 135 tests pass in 16.5 s with DARKBLOOM_LIVE_MLX_TESTS=1 and
DARKBLOOM_LIVE_MLX_GEMMA=1 (live MLX inference against real models
plus the gated 27 GB Gemma generation test).
* Bump mlx-swift-lm submodule to main after re-homing to Layr-Labs
Layr-Labs/mlx-swift-lm@main now carries the continuous-batching engine,
per-row samplers, and Gemma 4 MoE port at 8d76944. Same tree as the
prior 91612d5 commit on the darkbloom-continuous-batching branch, but
without the local-path mlx-swift dep hack, so the fork is consumable
by URL outside this repo.
* Untrack .claude/ files and drop dangling cross-references
The .claude/ directory holds local agent state (cursor task files,
working notes, the in-progress migration plan). Those don't belong in
the repo. Untrack the two committed markdown files and broaden the
.gitignore from `.claude/worktrees/` to `.claude/` so future agent runs
don't add them back. Strip the dead links to .claude/swift-migration-plan.md
from CLAUDE.md, provider-swift/README.md, docs/ARCHITECTURE.md, and
scripts/fetch-metallib.sh -- the surrounding prose stands on its own.
The local files remain on disk for active reference; only the tracking
is removed.
* Idle-timeout unload + coordinator-driven model preload protocol
Two related additions to the provider's model lifecycle:
1) Idle-timeout unload
----------------------
ProviderLoop now runs a background monitor that polls every minute.
If `idleTimeoutMins` minutes (default 60) have elapsed since the last
inference activity AND no requests are in flight, the loaded
ModelContainer is dropped. The next inference request lazy-reloads.
`idleTimeoutMins == 0` disables the monitor; the model stays
resident forever.
The decision is extracted into `IdleTimeoutPolicy.shouldUnload(...)`
so the rule is unit-testable without spinning up the full ProviderLoop
actor (which depends on Secure Enclave, coordinator client, and
security posture). Five unit tests pin the policy: (a) unloads when
all conditions met, (b) never unloads with inflight requests,
(c) never unloads with no model loaded, (d) waits for the timeout to
elapse, (e) zero-timeout edge case is still defensive.
Activity tracking: `lastInferenceAt` updates on every request
admission and on every request finish (`removeInflightTask`). The
worker is a detached `Task` so cancel/submit on the actor never
queue behind the timer.
2) Coordinator-driven model preload
-----------------------------------
New WebSocket message `coordinator → provider: load_model`. The
provider has no inbound listener (security: a discovered IP can't
reach the GPU), so the coordinator pushes preload requests over the
existing outbound WebSocket connection that the provider opened.
Use case: the coordinator predicts demand for model X on machine Y
in the next hour and warms it ahead of time.
Provider behavior:
- If model is already loaded: short-circuit, reply succeeded.
- Otherwise: emit `load_model_status` "started" immediately,
kick off `ensureModelLoaded` in a detached Task, then emit
"succeeded" or "failed" (with an error string) when the load
settles.
Wire surface added in three places (per AGENTS.md sync rule):
- coordinator/internal/protocol/messages.go: `TypeLoadModel`,
`TypeLoadModelStatus`, `LoadModelMessage`, `LoadModelStatusMessage`,
plus the `LoadModelStatusStarted/Succeeded/Failed` constants.
- provider-swift/.../Protocol/Messages.swift: new
`CoordinatorMessage.loadModel(...)` case + `ProviderMessage
.loadModelStatus(...)` case + Codable on both sides.
- provider-swift/.../Coordinator/CoordinatorClient.swift: dispatch
inbound `load_model` to a new `CoordinatorEvent.loadModel(modelId)`
and add `OutboundMessage.loadModelStatus(...)` for the reply.
ProviderLoop wires `handleLoadModelRequest(modelId:send:)` for the
new event. Round-trip tests cover decoding a Go-style `load_model`
JSON and encoding all three lifecycle status replies (started /
succeeded / failed-with-error) with snake_case wire keys.
Rust legacy provider intentionally untouched. The coordinator
should gate `load_model` dispatch on `backend == "mlx-swift"` so
the Rust path never receives an unknown message; that gate lives
on the coordinator side and isn't part of this commit.
Tests
-----
141 / 141 tests pass with DARKBLOOM_LIVE_MLX_TESTS=1 and
DARKBLOOM_LIVE_MLX_GEMMA=1 (live MLX inference + Gemma 4 26B-A4B-it-8bit
MoE batching included). New: 5 IdleTimeoutPolicy tests + 1
loadModel round-trip protocol test.
* Add end-to-end performance tests: TTFT, encryption, batching, model load
Four new live tests that produce reproducible numbers for the four
scenarios the operator asked about. Gated by DARKBLOOM_LIVE_MLX_TESTS=1;
all four target Qwen3 0.6B-8bit so the suite finishes in ~7 s.
A) warm TTFT baseline -- pure inference TTFT with no encryption
and the model already loaded.
B) cold TTFT -- spins up a fresh ModelContainer each
iteration so the weights are re-paged from disk; reports
load_time and load_time + first_token separately.
C) encrypted TTFT -- runs the request body through
NodeKeyPair.encrypt (consumer side) and NodeKeyPair.decrypt
(provider side) with real libsodium NaCl box, then submits.
Reports encrypt-only, decrypt-only, warm TTFT, and total
E2E first-token (enc + dec + TTFT) so each layer's cost is
visible.
D) batched TTFT -- B=1, B=2, B=4 concurrent submissions on
a single shared scheduler. Reports per-row TTFT and aggregate
throughput so the continuous-batching scaling story is honest.
Headline numbers on M4 Max with Qwen3 0.6B-8bit:
warm TTFT (plaintext): ~20 ms
encrypt (consumer side): ~0.05 ms (libsodium NaCl box)
decrypt (provider side): ~0.02 ms
E2E first-token (enc+dec+TTFT): ~31 ms
cold model load: ~856 ms
cold load + first token: ~1036 ms
aggregate throughput B=1: 87.4 tok/s
aggregate throughput B=2: 176.2 tok/s (~2.0x)
aggregate throughput B=4: 317.1 tok/s (~3.6x)
per-request TTFT B=1 -> B=4: 34 ms -> 36 ms (flat)
Encryption is essentially free, continuous batching scales
near-linearly to B=4, and per-request TTFT is invariant under
batching -- the key continuous-batching scheduler invariant.
The tests assert lower-bound liveness (durations > 0, all rows
complete) but don't pin absolute latencies, since those vary by
hardware. Numbers print to stderr in a "[perf]" prefix so they
land in the test log without polluting test stdout.
While here, fixed a `String(format:)` bug in the printRow helper
where `%s` was used with a Swift String (would have segfaulted
the test process via _platform_strlen on an unaligned pointer).
145 / 145 tests pass in 9 s with DARKBLOOM_LIVE_MLX_TESTS=1.
* Add Gemma 4 26B-A4B-it-8bit MoE tier to performance suite
Refactor PerformanceLiveTests so every scenario (warm TTFT, cold load,
encrypted E2E, batched throughput) is parameterised by a `ModelConfig`
struct (label, modelID, wired-memory budget, iteration counts, batch
sizes, max_tokens). Two configs ship in the suite:
- Qwen3 0.6B-8bit smoke tier (DARKBLOOM_LIVE_MLX_TESTS=1)
- Gemma 4 26B-A4B-it-8bit production tier
(DARKBLOOM_LIVE_MLX_TESTS=1 +
DARKBLOOM_LIVE_MLX_GEMMA=1)
Both run all four scenarios. Total 8 @test methods (4 + 4).
Headline numbers on M4 Max with weights memory-mapped from local cache:
Gemma 26B MoE:
warm TTFT 309 ms
cold load 2.63 s
cold load + first token 3.07 s
encrypt (consumer side) 0.05 ms
decrypt (provider side) 0.03 ms
E2E first-token 262 ms
B=1 throughput 10.2 tok/s
B=2 throughput 16.7 tok/s (1.64x)
B=4 throughput 23.9 tok/s (2.34x)
Qwen3 0.6B (for comparison):
warm TTFT ~21 ms
cold load ~887 ms
E2E first-token ~32 ms
B=4 throughput ~302 tok/s
Three things the Gemma tier surfaces that the smoke tier doesn't:
1. Encryption is *still* essentially free at 26B scale -- 70-80 us
combined for encrypt + decrypt, dwarfed by the 200+ ms
memory-bandwidth-bound prefill.
2. Per-row TTFT scales SUB-linearly with B for MoE (234 -> 344 -> 603
ms at B=1/2/4) because each batched prefill processes a heavier
forward. Aggregate throughput still wins (10 -> 17 -> 24 tok/s).
3. Cold load on a 26 GB MoE that's still in the OS page cache is
~2.6 s -- the relevant number for the idle-timeout-reload path.
First-ever boot would be longer (NVMe-bound), but unmeasurable
from a unit test without privileged page-cache flushing.
Also tighten the report formatting: column padding to 56 chars, "ms"
under 1 s and "s" above, max_tokens=8 for Gemma (vs 16 for Qwen) so
the suite finishes in ~30 s with all four scenarios run twice.
149 / 149 tests pass in 37 s with both env vars set.
* Performance audit vs mlx_lm: bracket the dispatch-overhead gap
The user noticed that "10.2 tok/s for Gemma 26B" looked too low. They
were right. Side-by-side with `mlx_lm` 0.31.3 Python on the same M4
Max + same checkpoints:
Qwen3 0.6B-8bit mlx_lm: 426 tok/s us: ~84 tok/s (5.0x)
Gemma 4 26B-A4B-it-8bit MoE mlx_lm: 84 tok/s us: ~33 tok/s (2.4x)
To localize the gap, this commit adds a "decode-tps bracket" test
that measures the same B=1 steady-state decode through three paths:
1. Pure model loop -- model.callAsFunction directly, no scheduler
2. BatchGenerator -- our continuous-batching engine, B=1
3. BatchScheduler -- production path (actor + AsyncStream)
Findings on Gemma 26B MoE (decode-only, 64 tokens):
pure loop, sync eval 34.6 tok/s
pure loop, async eval 34.4 tok/s (no improvement -- not the issue)
BatchGenerator B=1 32.6 tok/s (-6%, noise-level)
BatchScheduler.submit 32.5 tok/s (-6%, noise-level)
mlx_lm Python reference 84.0 tok/s (2.4x faster)
Conclusion: the gap is at the **MLX-Swift dispatch layer**, not in
our scheduler or batched-cache code. The pure model loop is already
2.4x slower than Python. Adding our BatchScheduler + actor + worker
adds < 6% on top -- not the bottleneck.
The 8-13 ms per-step CPU overhead is consistent with kernel-launch
latency in mlx-swift bindings. mlx_lm Python uses `mx.compile` on
the decode step to amortize this; mlx-swift-lm does not. Closing
the gap is a separate workstream on the upstream library.
Other improvements in this commit:
* Bump Gemma's batched max_tokens from 8 -> 32 so steady-state
decode dominates the aggregate TPS metric.
* Add steady-state decode TPS reporting alongside aggregate (subtract
prefill so it compares like-for-like with mlx_lm's "Generation:
X tokens-per-sec" headline).
* Switch the throughput tests to a long-output prompt ("write a 200
word story...") so the model decodes to max_tokens instead of
hitting EOS at ~12 tokens. The B=1 number was misleadingly low
before because the prior prompt asked for "a single word".
* Add async-eval pipelining variant to the bracket -- confirms
mx.async_eval alone doesn't close the gap (which means the missing
optimization is `mx.compile`, not just async dispatch).
* Add Qwen3 bracket test alongside the Gemma one.
* Document the gap explicitly in the file header so future
optimisation work has a clear target.
Honest headline numbers (M4 Max, weights memory-mapped from cache):
Gemma 26B MoE warm TTFT 280-352 ms
Gemma 26B MoE cold load 3.32 s (re-page from cache)
Gemma 26B MoE encrypt+decrypt 0.10 ms (free)
Gemma 26B MoE steady-state decode 32-40 tok/s B=1
35-39 tok/s B=4 aggregate
Qwen3 0.6B steady-state decode 84 tok/s B=1
323 tok/s B=4 aggregate
Continuous batching itself works correctly: B=4 aggregate is 2.9x
B=1 (Gemma) and 3.8x B=1 (Qwen). The dispatch-overhead headwind
applies equally to all batch sizes.
151 / 151 tests pass in 71 s with both env vars set.
* Compare against mlx_lm batched + greedy fast-path in BatchScheduler
The previous perf audit only compared B=1 against mlx_lm. This commit
extends the comparison to B=1, B=2, B=4 by adding a Python benchmark
script (scripts/mlx_lm_batch_bench.py) that drives mlx_lm's upstream
BatchGenerator, and applies one targeted Swift-side optimization
based on what the comparison surfaced.
Reference numbers (mlx_lm 0.31.3, M4 Max, decode-only tok/s):
Qwen3 0.6B-8bit B=1: 265 B=2: 694 B=4: 1119
Gemma 4 26B-A4B-it-8bit MoE B=1: 74 B=2: 126 B=4: 181
The gap WIDENS with batch size, which pointed at an O(B) overhead in
our per-row sampling path. Smoking gun: GenerationBatch.step takes a
slow path whenever ANY row's sampler is non-nil, doing B separate
slice + sample + concat ops (=> 9 kernel launches per token at B=4)
instead of the vectorized fallback (=> 1 kernel launch). Our
BatchScheduler.submit was passing a non-nil greedy closure even when
temperature == 0, forcing every batch through the slow path.
Fix: when temperature <= 0, pass `nil` so the row falls through to
the vectorized fallback. The fallback is also greedy, so the result
is identical -- only the dispatch path changes. Per-row temperature
/ top-P / top-K / seed all still work for non-greedy rows.
Swift numbers after the fix (decode-only):
Qwen3 0.6B-8bit B=1: 88 B=2: 181 B=4: 351 (was 84 / 174 / 323)
Gemma 4 26B-A4B-it-8bit MoE B=1: 37 B=2: 23 B=4: 42 (was 33 / 21 / 39)
Modest +6-13% across the board. The remaining 3-4x gap to Python is
at the MLX-Swift dispatch layer (per-step kernel-launch overhead);
mlx_lm closes it via `mx.compile` on the decode step, which isn't
applied in mlx-swift-lm. That's a separate workstream.
Continuous batching scaling is still healthy:
Qwen B=4 / B=1 = 4.0x (matches mlx_lm's 4.2x exactly)
Gemma B=4 / B=1 = 1.1x (mlx_lm's is 2.4x; gap reflects MoE expert
dispatch where Python's compile pays off most)
Other changes:
* scripts/mlx_lm_batch_bench.py -- runnable apples-to-apples bench
for future regression checks. Reproduces the reference numbers in
the file header.
* Update PerformanceLiveTests.swift docstring with the side-by-side
table so the gap is visible to anyone reading the test.
151 / 151 tests pass.
* Perf compare mlx_lm batching and bump mlx-swift-lm decode optimizations
The user called out that our Gemma 26B throughput looked too low, so this
commit makes the comparison apples-to-apples against mlx_lm Python's
BatchGenerator and bumps the mlx-swift-lm submodule to the optimized main
commit.
New reference script:
scripts/mlx_lm_batch_bench.py
It runs mlx_lm.generate.BatchGenerator at B=1/2/4 over the same long-output
prompt used by PerformanceLiveTests and reports prefill+1, decode-only TPS,
and aggregate TPS. Reference numbers on M4 Max:
Qwen3 0.6B-8bit B=1: 265 B=2: 694 B=4: 1119 tok/s
Gemma 4 26B-A4B-it-8bit MoE B=1: 74 B=2: 126 B=4: 181 tok/s
Swift improvements landed in Layr-Labs/mlx-swift-lm@b02ea5b:
- mlx_lm-style double buffering in GenerationBatch: constructor primes
the first token, next() returns current token while async-evaluating
the following token.
- Greedy fast path avoids logSumExp: argMax(logits) == argMax(logprobs),
and we don't expose logprobs downstream today.
- BatchScheduler now passes nil for temperature=0 samplers so batches
use the vectorized greedy fallback instead of per-row slice/sample/concat.
- Token tensors are UInt32 to match mlx_lm.
- BatchKVCache now exposes innerState and KVCache conforms to Updatable,
which fixes the cache state surface needed for future compile work.
Measured Swift deltas:
Qwen3 0.6B:
B=1 decode ~84 -> ~104 tok/s
B=4 aggregate ~323 -> ~363 tok/s
Gemma 26B MoE:
B=1 decode ~32 -> ~37 tok/s
B=4 aggregate ~39 -> ~40 tok/s
This closes the avoidable scheduler/batching overhead we found, but does
not fully close the remaining 2-4x gap to Python. The bracket test shows
BatchGenerator/BatchScheduler are now within noise of the pure model loop;
the remaining gap is in mlx-swift model dispatch / lack of stateful
mx.compile support. Attempting to compile the batched-cache decode graph
still fails in mlx-swift with "uncaptured inputs", so that remains an
upstream library workstream rather than a provider scheduler bug.
* Clarify release-mode batch performance measurements
The previous perf notes mixed debug-mode Swift numbers with mlx_lm Python
reference numbers, which made the Swift engine look far worse than it is.
This test-only cleanup makes the performance suite report the data needed
to keep comparisons honest.
Changes:
- Update the PerformanceLiveTests header to state explicitly that mlx_lm
comparisons must use `swift test -c release`; debug Swift is several
times slower and not a valid reference.
- Add direct BatchGenerator B=2/B=4 decode-only measurements to the
bracket test, in addition to pure loop and BatchScheduler.submit.
- Add "model-side scheduler" TPS in the public batched test so we can
distinguish model decode speed from public text streaming / AsyncStream /
detokenization costs.
Release-mode checks on this machine:
- Qwen3 0.6B direct BatchGenerator B=4: ~1130 tok/s, matching mlx_lm's
~1119 tok/s reference.
- Gemma 4 26B-A4B-it-8bit direct BatchGenerator B=4: ~186 tok/s,
matching mlx_lm's ~181 tok/s reference.
- BatchScheduler.submit B=1 decode bracket also lands at the direct model
rate in release mode (~402 tok/s Qwen, ~79 tok/s Gemma); public streaming
tests report separate model-side and aggregate numbers so regressions are
localizable.
No production code changes in this commit.
* Complete Swift provider runtime verification
* Bridge Rust updater to Swift provider bundles
* Add Rust to Swift updater E2E tests
* Add Rust bridge release workflow
* E2E testbed: integration tests, profiling, and benchmarking infrastructure (#136)
* Flatten coordinator/internal/ to coordinator/, add E2E integration test suite
Promote Go module root from coordinator/ to repo root so the e2e
test suite can import coordinator packages. Flatten
coordinator/internal/ to coordinator/ to remove the Go internal
package restriction.
All import paths change from
github.com/eigeninference/coordinator/internal/X to
github.com/eigeninference/d-inference/coordinator/X.
The module path is now github.com/eigeninference/d-inference.
12 E2E integration tests using the Swift provider (mlx-swift backend):
- NonStreamingInference, StreamingInference
- MultipleRequestsAccounting, E2EEncryptionCorrectness
- BillingBalanceDeduction, ProviderPayoutSplit, ReferralRewardDistribution
- InsufficientBalance, InvalidModel
- StreamingContentValidation, ConcurrentRequests, AttestationHeaders
Each test gets its own isolated suite (Postgres + coordinator + provider)
via startSuite(t). A semaphore serializes suite lifecycles to prevent
GPU contention from concurrent MLX model loads.
Update CI workflows to reference go.mod at repo root, exclude e2e/
from unit tests, and use swift build for the provider.
* Move coordinator e2e back to coordinator/internal/e2e/
The coordinator's own e2e package was incorrectly flattened into
coordinator/e2e/ alongside the repo-root e2e/ testbed suite.
Restore it to coordinator/internal/e2e/ where it belongs.
* Run integration tests on any PR, not just master/main
* Fix CI: install Docker on macos-15, increase timeout to 30m, serial tests
* Use colima for Docker on macOS CI
* Remove invalid --no-mount flag from colima start
* Add native Postgres fallback, drop Docker/colima from CI
Docker Desktop and colima both fail on macOS CI runners due to
virtualization restrictions. Add a native Postgres lifecycle that
uses initdb + postgres directly (installed via Homebrew).
The Start() method tries Docker first, falls back to native.
CI now installs postgresql@16 via brew instead of Docker.
* Download MLX model in CI before running integration tests
* Use Python API for model download (huggingface-cli is deprecated)
* Use shared suite across all integration tests
Instead of starting a new suite (Postgres + coordinator + provider +
model load) per test, use a single shared suite initialized on first
access. This cuts total test time from ~18min to ~3min since the
expensive model load only happens once.
* Build provider in debug mode for CI (skips SIP/security checks)
CI macOS runners have SIP disabled, which causes the provider to
exit with 'System Integrity Protection is disabled'. Debug builds
skip verifySecurityPosture() via #if !DEBUG, allowing tests to
run on CI.
Add TESTBED_PROVIDER_CONFIG env var (default: release) to control
the Swift build configuration from testbed.
* Force-trust provider in tests, disable frequent challenges
CI macOS runners have SIP disabled, which causes the provider to
fail attestation challenges. Add ForceTrustProvider() to override
status/trust/SIP verification for testing, set challenge interval
to 1h, and add a 3s delay after registration to let the initial
challenge fire before overriding.
* Force all privacy capabilities in ForceTrustProvider for testing
The private-text routing gate checks PythonRuntimeLocked and
DangerousModulesBlocked which are always false on the Swift
backend (no Python runtime). ForceTrustProvider now sets all
privacy capabilities to true and drains queued requests
immediately after trust promotion.
* Restore per-test isolated suites
Each test gets its own Postgres + coordinator + provider.
With debug builds, ForceTrustProvider, native Postgres, and
model pre-download, each suite starts in ~15-20s.
* Add load generator, profiling tests, multi-provider support
- Suite.Providers is now []*Provider; TESTBED_NUM_PROVIDERS env var
controls how many provider subprocesses start per suite
- New LoadGenerator in testbed/load.go with configurable concurrency,
total requests, streaming, max_tokens, temperature
- New profile tests: SingleProviderStreaming, SingleProviderNonStreaming,
HighConcurrency — each prints segment tables with mean/p50/p95/max
- Existing integration tests (NonStreaming, Streaming, Concurrent) now
emit Instrument events and print profile tables
- Profile SummaryTable uses millisecond resolution instead of microsecond
* Add multi-model provider specs, user pool, and latency decomposition headers
SuiteConfig now takes ModelSpecs (model ID + provider count per model) and
NumUsers. Providers are started per-spec with unique PID files (fixes
single-instance lock killing sibling providers). A user pool with round-robin
API key rotation is created at startup.
Coordinator sets X-Queue-Wait-Ms and X-Provider-Latency-Ms response headers
from PendingRequest timing fields (QueuedAt, DispatchedAt, FirstChunkAt).
LoadGenerator parses these and emits per-segment stats:
client_to_coordinator, queue_wait, coordinator_to_provider, provider_to_client.
Provider ProcessLifecycle respects DARKBLOOM_PID_FILE env var for
multi-instance testing. Add SetSkipChallenge to Server for test runs.
* Rename SegmentClientToCoordinator to SegmentTotalE2E
The segment measures full end-to-end wall clock time, not just
client-to-coordinator latency. The old name was misleading.
* Decompose X-Timing header into per-phase microsecond breakdown
Replace X-Queue-Wait-Ms / X-Provider-Latency-Ms with a single X-Timing
JSON header containing parse_us, reserve_us, route_us, queue_us,
encrypt_us, dispatch_us, provider_us. Move timing fields onto a
RequestTiming struct in PendingRequest. LoadGenerator parses the JSON
and emits per-segment stats with auto ms/µs precision.
* Add latency regression assertions, SegmentStatsMap, and heavy-load benchmark
- Add SegmentStatsMap() to LoadResult for per-segment mean/p50/p95/p99/max
- Wire coordinator overhead assertions into all benchmark and profile tests
- Update DefaultThresholds with realistic values based on benchmark data
- Add CoordinatorOverheadThresholds() alias
- Deduplicate SegmentStatsView (assert package uses type alias to testbed)
- Clean up profile_test.go: remove redundant second load loop, use assertions
- Add PromptBytes field to RequestConfig for large-payload testing
- Add HeavyLoad 100-concurrent 10KB benchmark
- Replace bubble sort with sort.Slice in computeStats
* Split CI into eval + benchmark jobs, post benchmark results as PR comment
Integration tests (TestIntegration|TestProfile) run on every push/PR.
Benchmarks (TestBenchmark) run only on PRs and post a markdown summary
as a PR comment via gh pr comment. LoadResult and AssertionReport gain
SummaryMarkdown() methods for markdown table formatting. A TestMain in
benchmark_test.go writes the aggregated markdown to BENCHMARK_MD_PATH
when set.
* Skip multi-model benchmark in CI (gemma model not downloaded)
The M1 Virtual CI runner only downloads Qwen3.5-0.8B; the gemma
multi-model test requires a second model that isn't available.
* Download gemma-3-270m-4bit in CI, remove multi-model skip
* Include model IDs and RAM sizes in benchmark PR comment
* address feedback
* fix: soft-fail Swift tests on dev + download full model for CI
* feat: environment-scoped R2 + coordinator secrets for dev/prod release isolation
- Move R2_BUCKET from vars to secrets so it participates in GitHub
environment scoping (dev vs prod get different buckets/credentials)
- Add documentation header listing all environment-scoped secrets
required per environment
- Soft-fail Swift unit tests on dev releases (live MLX model cache
may be incomplete on CI)
- Download full model (remove --include filter) for deterministic
CI cache seeding
* feat: DEV_/PROD_ prefixed repo secrets for R2 + coordinator env isolation
Both release workflows now resolve DEV_ or PROD_ prefixed repo secrets
in a resolve-env step using bash indirection — no GitHub environments
needed. The environment: gate is removed since secrets live at repo
level with prefixes.
Required repo secrets:
DEV_R2_ACCESS_KEY_ID, PROD_R2_ACCESS_KEY_ID
DEV_R2_SECRET_ACCESS_KEY, PROD_R2_SECRET_ACCESS_KEY
DEV_R2_ENDPOINT, PROD_R2_ENDPOINT
DEV_R2_BUCKET, PROD_R2_BUCKET
DEV_R2_PUBLIC_URL, PROD_R2_PUBLIC_URL
DEV_COORDINATOR_URL, PROD_COORDINATOR_URL
DEV_RELEASE_KEY, PROD_RELEASE_KEY
* fix: RELEASE_KEY is shared, not env-prefixed
* fix: resolve env secrets inline to avoid GitHub cross-job output masking
* fix: add DEV_RELEASE_KEY/PROD_RELEASE_KEY to env-prefixed secrets
* Add STRIDE threat model for runtime security review
40 threats across 9 trust boundaries (coordinator/provider WebSocket,
provider operator vs process, browser/UI, Apple MDM/MDA, admin API,
inference engine, payments, Apple attestation chain). Adversaries:
malicious provider, malicious consumer, external attacker. Each threat
includes affected_files globs, mitigations with status, open_findings
links to the existing security audit, and a detection_hint for
automated PR review.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* Expand threat model trust boundaries with implementation detail
Each of the 9 trust boundaries now documents how_it_works (exact code
paths, line numbers, auth mechanisms, data flows) and current_limitations
(specific open gaps with SEC-* references). Sources: coordinator/internal/
api/{server,provider,release_handlers,device_auth,billing_handlers}.go,
registry/registry.go, attestation/, mdm/, provider-swift/Sources/
ProviderCore/Security/{AntiDebug,BinaryHasher,SecureEnclaveIdentity,
SecurityHardening}.swift, Crypto/NodeKeyPair.swift, Inference/
{BatchScheduler,IdleTimeoutPolicy,InferenceCancellation}.swift,
ProviderLoop.swift, console-ui/src/{hooks/useAuth,lib/{api,store,
encryption}}.ts, next.config.ts.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* Add threat model PR review workflow
On every PR against master/main, the workflow:
1. Gets the PR diff via gh pr diff
2. Matches changed files against affected_files globs in docs/threat-model.yaml
3. Calls Claude API (claude-sonnet-4-6) with the focused diff + full threat model
4. Posts (or updates) a single PR comment with STRIDE-based security analysis
Uses prompt caching on the static threat model block to minimise API cost
on repeated pushes. The comment marker <!-- threat-model-review --> lets
the workflow update rather than append on each push.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* Persistent Secure Enclave key with keychain access group enforcement (#146)
* Add persistent Secure Enclave attestation key with keychain access group enforcement
Replace ephemeral CryptoKit SE keys with persistent Security framework keys stored
in the macOS data protection keychain. The key is bound to the signing team's
keychain access group (SLDQ2GJ6TL.io.darkbloom.provider), enforced by securityd
at the kernel level. A patched binary re-signed with codesign -s - gets
errSecMissingEntitlement and cannot access the key.
- PersistentEnclaveKey: Security framework SE key with SecKeyCreateRandomKey,
kSecAttrIsPermanent, and team-scoped access group
- AttestationSigner protocol: abstracts over both ephemeral and persistent keys
- ProviderLoop: tries persistent key first, falls back to ephemeral with warning
- Entitlements plist with keychain-access-groups for production signing
- 8 tests covering creation, persistence, signing, deletion, protocol conformance
* Embed provisioning profile in .app bundle for persistent SE key
The data protection keychain requires a provisioning profile to authorize
the keychain-access-groups entitlement. Wrap the CLI binaries in a minimal
Darkbloom.app bundle with embedded.provisionprofile so the persistent SE
attestation key works on provider machines.
- release-swift.yml: new step decodes PROVISIONING_PROFILE_BASE64 secret,
builds Darkbloom.app/Contents/ structure, signs bundle + individual binaries
- install.sh: detects .app bundle layout, symlinks bin/ into the app bundle
- Backward-compatible: falls back gracefully if secret is not set or if
provider receives a flat (pre-.app) bundle
* Add com.apple.application-identifier to provider entitlements
Required for data protection keychain access. Must match the bundle ID
in the provisioning profile (SLDQ2GJ6TL.io.darkbloom.provider).
* Address review: data protection keychain flag, tighter error handling, real SE probe
Codex P1 / hank P1:
- coordinator/api/install.sh: restore __DARKBLOOM_COORD_URL__ placeholder
(the coordinator templates this at serve time via server.go;
hardcoding the URL broke dev/self-hosted coordinators)
- PersistentEnclaveKey: add kSecUseDataProtectionKeychain: true to all
Security framework calls. Without it, queries may hit the legacy
file-based keychain where access group enforcement is silently ignored.
hank P2:
- loadOrCreate: catch only errSecItemNotFound before falling through to
createNew. Auth failures, locked keychain, and missing entitlement
now propagate to the caller instead of racing with key creation.
- isAvailable: probe real SE capability via CryptoKit's
SecureEnclave.isAvailable instead of just checking macOS version.
Now returns false on Intel Macs without T2 and macOS VMs without
virtualized SE. Added doc comment noting the entitlement dependency.
* fix(api): add code and param fields to OpenAI error responses (#144)
The errorResponse function only populated type and message, missing
code and param required by the OpenAI API spec. Without code, SDKs
cannot programmatically distinguish error types (e.g. Python SDK
e.code returns None, retry logic breaks, Sentry groups all errors
as one).
Changes:
- errorResponse now accepts optional errorDetailOpt variadic args
- code defaults to errType for backward compatibility
- withParam() and withCode() helpers for call-site overrides
- model-not-found errors include param="model"
- model-is-required errors include param="model"
- insufficient_funds uses OpenAI-canonical code "insufficient_quota"
- rate_limit_exceeded gets explicit withCode for clarity
All 202 existing call sites are backward-compatible: the variadic
signature means they compile unchanged, and the default code=errType
matches the implicit behavior SDKs already assumed.
Closes #142
* feat: add Datadog observability stack for dev coordinator (#143)
* Fix Darkbloom analytics tracking
* Harden release workflow protections (#103)
* Harden release registration and binary hash policy (#99)
* Harden release registration and binary hash policy
* derive release download URL from allowlist
* Stabilize provider coordinator test
---------
Co-authored-by: Gajesh Naik <26431906+Gajesh2007@users.noreply.github.com>
* Remove stale Python integration test (#109)
* e2e: add local simulation environment skeleton
Introduces scripts/e2e-runner.py, a Python orchestrator that spins up the
real coordinator binary with test-friendly configuration (in-memory store,
mock billing, no trust requirements) alongside a simulated or real
provider, and runs HTTP/WebSocket-level assertions against the live stack.
Key components:
- Coordinator class: builds and spawns coordinator with EIGENINFERENCE_MIN_TRUST=none,
EIGENINFERENCE_BILLING_MOCK=true, and in-memory store
- SimulatedProvider: pure-Python WebSocket client speaking the full provider protocol
(register, attestation challenge/response, heartbeat, inference request/response)
- Test framework: decorator-based test registration, pass/fail summary, signal-safe
cleanup via atexit + signal handlers
- Test stubs: test_basic (registration + discovery), test_inference (consumer
request routing), test_multi_provider (two providers, same model)
TODO:
- RealProvider wrapper around darkbloom serve --coordinator
- Coordination between provider challenge cycle and consumer request timing
- API key handling for consumer vs admin routes
- Python dependency management (websockets, cryptography)
* Revert "e2e: add local simulation environment skeleton"
This reverts commit d02074e. The Python E2E runner adds noise on top of
the existing Go integration tests (internal/api/integration_test.go +
fullstack_integration_test.go) which already cover the full coordinator
protocol surface. The cross-language orchestration doesn't buy anything
over what httptest.Server + simulated providers already provide.
* Remove stale Python integration test
@ethenotethan
tests/integration_test.py is superseded by the Go-based coordinator
integration tests at coordinator/internal/api/:
- Test coverage for coordinator protocol (register, challenge, heartbeat,
inference) is covered by integration_test.go using httptest.Server +
Go simulated providers — same coverage, no binary build needed
- Full-stack GPU inference is covered by fullstack_integration_test.go
with real vllm-mlx backends (gated behind LIVE_FULLSTACK_TEST=1)
- The Python test uses stale binary names ('eigeninference-provider'),
old flags ('--backend mlx-lm'), and predates attestation challenges,
E2E encryption, and the vllm-mlx backend migration
- No external dependency coverage (Postgres, Stripe, etc.) is lost — the
coordinator main.go wiring for those is trivially tested elsewhere
- The Python SDK tests (4.5.x) belong in the SDK repo, not the infra repo
---------
Co-authored-by: Hank Bob <hankbob@researchoors.com>
* chore: remove unused dependencies (#112)
* chore: remove unused dependencies
* test: fix console ui test isolation
* chore: prune repo-wide dead code findings
* ci: run CI on any PR, not just master/main (#119)
* ci: remove racing deploy-dev-coordinator workflow (#137)
Cloud Build (deploy/gcp/cloudbuild.yaml) already deploys the coordinator
on the same trigger (push to master touching coordinator/** or deploy/gcp/**).
Having both paths active creates a race condition where two CI systems
simultaneously deploy to the same dev VM — see #115.
* feat: add Datadog observability stack for dev coordinator
Install Datadog Agent on the dev GCE VM (DogStatsD, APM, journald logs)
and wire the coordinator to emit structured metrics, split attestation
counters, model_type tags, reactive provider-count gauges, and a
completion-tokens counter. Rebuild the dev dashboard with 7 sections
covering metrics, logs, traces, and system health.
* fix: prevent double-decrement when untrusted provider disconnects
Disconnect now checks StatusUntrusted before decrementing the online
counter and model-provider gauges, since MarkUntrusted already
decremented them.
* feat: add fleet version and binary hash observability
New metrics:
- providers.per_version gauge (per provider binary version)
- providers.per_binary_hash gauge (per attested binary hash)
- coordinator.min_provider_version_set gauge (1 when configured)
- provider_version_below_minimum counter (tagged by gate and version)
Gates instrumented:
- registration (provider.go)
- challenge revalidation (provider.go)
- manifest sync (server.go)
Registry additions:
- ProviderCountByVersion()
- ProviderCountByBinaryHash()
Dashboard: Fleet Version & Binary Hash group with providers by version,
providers by binary hash, min provider version, below-minimum events,
and top binary hashes toplist.
* fix: update Dockerfile + cloudbuild for go.mod at repo root
go.mod moved from coordinator/ to repo root during the swift-provider
merge. Build context is now repo root, Dockerfile copies coordinator/
subdir explicitly.
* fix: chmod +x coordinator binary in Dockerfile
* fix: ensure coordinator binary is executable in builder stage
* fix: rename coordinator source dir in builder to avoid colliding with binary path
* fix: copy full repo in Dockerfile builder so go.mod resolves all packages
* fix: remove unused modelTypeTag and format Go files for CI
* fix: skip python/dangerous-modules check for swift runtime in private text gate
* billing telemetry + MarkUntrusted race fix + Swift routing tests
- Add Datadog histogram metrics for reservation amounts, settlement
refunds, provider credits, and platform fees
- Add store.debit/credit.latency_ms histograms for DB operation timing
- Add billing.cost_clamped and billing.reservation_refunds counters
- Fix race in MarkUntrusted: hold r.mu write lock through counter
decrement to prevent double-decrement with Disconnect
- Add unit tests for Swift provider privacy caps (with/without Python)
- Add E2E test for Swift provider routing via challenge-verified path
- Update dev-network-dashboard.json with Billing & Store group
* fix Heartbeat reviving untrusted providers causing onlineCount double-decrement
* revert orthogonal landing/console-ui/provider changes
* remove unbounded binary_hash cardinality, add input token metrics + store latency, fix dashboard group-by
* fix review feedback: ModelType() untrusted filter, routing.cost_ms by provider, billing in cents, dead comment
---------
Co-authored-by: Gajesh Naik <26431906+Gajesh2007@users.noreply.github.com>
Co-authored-by: anupsv <6407789+anupsv@users.noreply.github.com>
Co-authored-by: hankbob <hankbobtheresearchoor@gmail.com>
Co-authored-by: Hank Bob <hankbob@researchoors.com>
* migration: harden Rust→Swift cutover end-to-end
Twelve fixes informed by three reviewer subagents (codex-rescue,
independent Claude, full pipeline audit) to ensure the bridge release
→ Swift release cutover works on first try, with no silent breakage.
Coordinator:
- accept darkbloom-bundle-<platform>.tar.gz (was eigeninference-bundle-)
- restore TestProviderRegistrationWithoutAttestationRejectedWhenBinaryHashPolicyConfigured
(dropped during the master→swift-provider merge)
release-swift.yml:
- ship bin/{darkbloom,darkbloom-enclave,mlx.metallib} as real-file copies
(was symlinks) so coordinator's tar.TypeReg verifier accepts them and
hashes the actual bytes
- staple both bin/ AND .app/Contents/MacOS/ paths now that they're
independent files
- post-codesign verification: fail build if signed CLI is missing the
keychain-access-groups entitlement or the access group
SLDQ2GJ6TL.io.darkbloom.provider, or if embedded.provisionprofile
is absent from the .app
- PROVISIONING_PROFILE_BASE64 is now hard-required (no silent ephemeral
fallback). Profile is decoded + parsed with plutil/python: verifies
TeamIdentifier, keychain-access-groups, application-identifier, and
ExpirationDate >= 30 days out
- pin MLX python wheel to 0.31.1 to match libs/mlx-swift Cmlx version
(was 0.31.2 — patch-level metallib ABI risk)
- prod releases now hard-fail Swift tests (was soft-fail for all)
release-rust-bridge.yml:
- rename bridge bundle to darkbloom-bundle-<platform>.tar.gz uniformly
so coordinator accepts the registration
Both release workflows:
- PROD_* secrets fall back to legacy unprefixed (R2_ACCESS_KEY_ID,
RELEASE_KEY, COORDINATOR_URL) + vars.R2_BUCKET when PROD_* empty.
Fails hard if neither resolves.
provider/src/main.rs (bridge auto-update):
- new rewrite_launchd_plist_for_swift: extracts ProgramArguments from
the Rust plist (`serve --coordinator URL --model M`), converts to
Swift shape (`start --foreground --coordinator-url URL --model M`),
atomic rename
- install_swift_update_bundle_at: if Darkbloom.app/Contents/MacOS/
exists in the extracted bundle, replace bin/{darkbloom,darkbloom-
enclave,mlx.metallib} with symlinks into .app/MacOS and route the
launchd plist's ProgramArguments[0] at the .app's MacOS binary path.
This puts the embedded provisioning profile in scope at runtime, so
the persistent SE key (PR #146) doesn't get errSecMissingEntitlement
on first attestation post-cutover
- plist_path is now an Option<&Path> so tests can avoid touching the
developer machine's real ~/Library/LaunchAgents
Tests added (all passing):
- 6 plist-rewrite unit tests: extract / convert / rewrite / install-
with-plist / .app-aware install / hash-only install
- 1 ported coordinator attestation policy test
- existing 7 auto-update integration tests still pass (302 → 303 total)
Verified by audit:
- macos-26-xlarge has Xcode 26.2 / Swift 6.2, satisfies all
swift-tools-version requirements
- LatestProviderVersion ordering: semver THEN created_at in both memory
and Postgres stores
- /api/version JSON shape matches what auto_update_check_with_install_dir
expects
- StartCommand --foreground doesn't recurse into launchAgent.installAndStart
- Swift ModelScanner reads ~/.cache/huggingface/hub (same as Rust)
- AuthTokenStore path parity (~/.darkbloom/auth_token)
Deployment prerequisite: coordinator changes must be deployed (master
→ dev Cloud Build, then human ecloud deploy to prod) BEFORE tagging
any release. Bridge registration will 400 against an older coordinator
that doesn't know about the darkbloom-bundle- filename.
* chore: cargo fmt on plist-migration code
Post-rustfmt: long format!() args wrapped, with_context closure pulled
onto one line, ternary-style assignment broken into if/else. No
behavior change — `cargo test --bin darkbloom` still 303 pass / 0 fail.
---------
Co-authored-by: ethenotethan <42627790+ethenotethan@users.noreply.github.com>
Co-authored-by: anupsv <6407789+anupsv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: hankbob <hankbobtheresearchoor@gmail.com>
Co-authored-by: Hank Bob <hankbob@researchoors.com>
Summary

Adds code and param fields to OpenAI-compatible error responses, fixing SDK error handling that currently breaks without them. Closes #142

Problem

errorResponse() only populates type and message. The OpenAI error spec requires code and optionally param. Their absence breaks:
- Python SDK: e.code returns None — can't program against error types
- error.code is undefined — retry logic fails
- Sentry groups all errors as one (no code to distinguish)

Changes

Core:
- errorResponse now supports optional code and param
- code defaults to errType — all 202 existing call sites are backward-compatible
- withParam() / withCode() helpers for overrides

Call-site updates for OpenAI-canonical codes:

| error type | code | param |
| --- | --- | --- |
| model_not_found | model_not_found | "model" |
| invalid_request_error ("model is required") | | "model" |
| insufficient_funds | insufficient_quota | |
| rate_limit_exceeded | rate_limit_exceeded | |

SDKs can now check e.code === 'model_not_found' and e.param === 'model'.

Backward compatibility

- opts ...errorDetailOpt — all existing calls compile unchanged
- code defaults to errType — any SDK already pattern-matching on the type string gets the same value in code

Validation

New tests:
- TestErrorResponse_CodeField — code defaults to errType
- TestErrorResponse_WithCode — withCode overrides
- TestErrorResponse_WithParam — withParam sets param
- TestErrorResponse_WithCodeAndParam — both together
- TestErrorResponse_JSONSerialization — full JSON round-trip
- TestErrorResponse_CodeDefaultsToType — backward compat
- TestErrorResponse_InsufficientFundsUsesCanonicalCode — canonical code
- TestEdge_ErrorResponseFormat — now asserts code and param