
fix(api): add code and param fields to OpenAI error responses #144

Merged
ethenotethan merged 1 commit into Layr-Labs:swift-provider from hankbobtheresearchoor:fix/error-response-code-param on May 14, 2026

Conversation

@hankbobtheresearchoor
Contributor

Summary

Adds code and param fields to OpenAI-compatible error responses, fixing SDK error handling that currently breaks without them.

Closes #142

Problem

errorResponse() only populates type and message. The OpenAI error spec requires code and optionally param. Their absence breaks:

  • Python SDK: e.code returns None — can't program against error types
  • Node SDK: error.code is undefined — retry logic fails
  • Sentry/Datadog: group all errors into one bucket (no code to distinguish them)

Changes

Core: errorResponse now supports optional code and param

// Before
errorResponse("invalid_request_error", "model is required")
// → {"error": {"type": "invalid_request_error", "message": "model is required"}}

// After
errorResponse("invalid_request_error", "model is required", withParam("model"))
// → {"error": {"type": "invalid_request_error", "message": "model is required", "code": "invalid_request_error", "param": "model"}}
  • code defaults to errType — all 202 existing call sites are backward-compatible
  • withParam() / withCode() helpers for overrides (a sketch of the option plumbing follows)
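
For reference, a minimal sketch of that option plumbing (the JSON tags follow the response shape shown above; the field layout and helper bodies are illustrative, the actual internal/api definitions are authoritative):

// Sketch only: illustrates the variadic-option pattern described above.
type errorDetail struct {
    Type    string  `json:"type"`
    Message string  `json:"message"`
    Code    string  `json:"code"`
    Param   *string `json:"param,omitempty"`
}

type errorDetailOpt func(*errorDetail)

func withCode(code string) errorDetailOpt {
    return func(d *errorDetail) { d.Code = code }
}

func withParam(param string) errorDetailOpt {
    return func(d *errorDetail) { d.Param = &param }
}

func errorResponse(errType, message string, opts ...errorDetailOpt) map[string]errorDetail {
    d := errorDetail{Type: errType, Message: message, Code: errType} // code defaults to errType
    for _, opt := range opts {
        opt(&d)
    }
    return map[string]errorDetail{"error": d}
}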

Call-site updates for OpenAI-canonical codes

Error type                                    code                  param     Why
model_not_found                               model_not_found       "model"   SDKs check e.code === 'model_not_found' and e.param === 'model'
invalid_request_error ("model is required")   (default)             "model"   Param identifies the missing field
insufficient_funds                            insufficient_quota    -         OpenAI canonical code for 402
rate_limit_exceeded                           rate_limit_exceeded   -         Explicit for clarity (matches type)
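
Assuming the option helpers sketched above, the call-site updates in the table look roughly like this (illustrative messages, not the exact repo call sites):

// model not found: canonical code plus the offending parameter
errorResponse("model_not_found", "model 'x' does not exist", withParam("model"))

// missing field: code falls back to the type, param names the field
errorResponse("invalid_request_error", "model is required", withParam("model"))

// 402: override to the OpenAI-canonical code
errorResponse("insufficient_funds", "insufficient balance", withCode("insufficient_quota"))

// explicit code for clarity (same value the default would produce)
errorResponse("rate_limit_exceeded", "rate limit exceeded", withCode("rate_limit_exceeded"))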

Backward compatibility

  • Variadic opts ...errorDetailOpt — all existing calls compile unchanged
  • code defaults to errType — any SDK already pattern-matching on the type string gets the same value in code

Validation

go test ./internal/api/ -run "TestErrorResponse|TestEdge_ErrorResponseFormat" -v
# 8/8 PASS

go test ./internal/api/ -count=1
# ok  (72.7s)

New tests:

  • TestErrorResponse_CodeField — code defaults to errType
  • TestErrorResponse_WithCode — withCode overrides
  • TestErrorResponse_WithParam — withParam sets param
  • TestErrorResponse_WithCodeAndParam — both together
  • TestErrorResponse_JSONSerialization — full JSON round-trip
  • TestErrorResponse_CodeDefaultsToType — backward compat
  • TestErrorResponse_InsufficientFundsUsesCanonicalCode — canonical code
  • Updated TestEdge_ErrorResponseFormat — now asserts code and param

@vercel

vercel Bot commented May 9, 2026

@hankbobtheresearchoor is attempting to deploy a commit to the EigenLabs Team on Vercel.

A member of the Team first needs to authorize it.

@hankbobtheresearchoor hankbobtheresearchoor changed the base branch from master to swift-provider May 13, 2026 18:06
The errorResponse function only populated type and message, missing
code and param required by the OpenAI API spec. Without code, SDKs
cannot programmatically distinguish error types (e.g. Python SDK
e.code returns None, retry logic breaks, Sentry groups all errors
as one).

Changes:
- errorResponse now accepts optional errorDetailOpt variadic args
- code defaults to errType for backward compatibility
- withParam() and withCode() helpers for call-site overrides
- model-not-found errors include param="model"
- model-is-required errors include param="model"
- insufficient_funds uses OpenAI-canonical code "insufficient_quota"
- rate_limit_exceeded gets explicit withCode for clarity

All 202 existing call sites are backward-compatible: the variadic
signature means they compile unchanged, and the default code=errType
matches the implicit behavior SDKs already assumed.

Closes Layr-Labs#142
@hankbobtheresearchoor hankbobtheresearchoor force-pushed the fix/error-response-code-param branch from 94ab31c to dc009f1 on May 13, 2026 18:17
@ethenotethan ethenotethan requested a review from Gajesh2007 May 13, 2026 18:28
@ethenotethan ethenotethan merged commit 12ac05b into Layr-Labs:swift-provider May 14, 2026
1 of 5 checks passed
Gajesh2007 added a commit that referenced this pull request May 15, 2026
* Clarify provider trust diagnostics

* Add Swift provider runtime

* Remove unused e2e vector generator

* Continuous batching, GPU-only enforcement, rename to darkbloom, Layr-Labs forks

This is the v0.5.0 cutover commit on the Swift provider PR. It lands
true continuous batching as the production inference path, threads
per-row sampling through the request, hard-fails on CPU-only hosts,
renames the user-visible CLI surface from "eigeninference" to
"darkbloom" with backward compatibility, and re-homes the mlx-swift /
mlx-swift-lm submodules to Layr-Labs forks.

Continuous batching (default, no parallel implementations)
----------------------------------------------------------
Replaces the per-request BatchScheduler with one shared BatchGenerator
ported from upstream `mlx_lm.generate`. All concurrent requests are
merged into one batched forward pass per step. Bit-identical against
single-stream greedy on:
  - Qwen3 0.6B-8bit (dense), B=2 / B=4-ragged
  - Qwen3.5 0.8B-MLX-4bit (hybrid SSM + attention), B=2
  - Gemma 4 26B-A4B-it-8bit (MoE, 26 GB), B=2

The mlx-swift-lm side of this work is at
Layr-Labs/mlx-swift-lm@darkbloom-continuous-batching:
  - BatchKVCache + BatchedCache protocol
  - SequenceStateMachine, PromptProcessingBatch, GenerationBatch,
    BatchGenerator
  - RowSamplers (temperature / top-P / top-K / seed)
  - Gemma 4 MoE support + K=V branch fix in Gemma4Attention

Production scheduler in provider-swift/Sources/ProviderCore/Inference/
BatchScheduler.swift wraps the engine in an actor; detached worker
calls into the actor only for short critical sections so cancel/submit
never queue behind a long-running step. submit() builds a per-row
sampler from request.{temperature, top_p, top_k, seed}.

Validation also covers eviction-and-admission: row 0 finishes mid-batch,
row C is admitted into its slot, row C's tokens match a solo run, row B
(running through the eviction) also matches its solo run. This locks in
BatchKVCache.filterBatched + extendBatched correctness end-to-end.

Sampler unit tests cover greedy passthrough, top-K=1 determinism,
top-K masking, top-P collapse-to-dominant, top-P=1 identity, seeded
reproducibility, and different-seed divergence.

GPU-only enforcement
--------------------
ProviderCore/Inference/GPUEnforcement.swift:
  - probeMetal(): non-throwing Metal device probe
  - requireMetal(): throws on missing GPU; pins Device.setDefault(.gpu);
    idempotent

Wired into BatchScheduler.loadModel, StartCommand, BenchmarkCommand,
and `darkbloom doctor`. Doctor surfaces a `[PASS] metal gpu: <name>,
<N> GB working set` line; `[FAIL]` on Intel/Linux. CPU fallback for
inference is rejected up-front with a descriptive error.

Rename: eigeninference → darkbloom (Swift CLI surface)
------------------------------------------------------
Canonical names:
  - eigeninference-enclave  → darkbloom-enclave (binary + struct)
  - Sources/eigeninference-enclave-cli/ → Sources/darkbloom-enclave-cli/
  - SwiftPM target EigenInferenceEnclaveCLI → DarkbloomEnclaveCLI
  - eigeninference-bundle-macos-arm64.tar.gz →
    darkbloom-bundle-macos-arm64.tar.gz
  - ~/.config/eigeninference/ → ~/.config/darkbloom/ (preferred path)
  - Mobileconfig prefix: EigenInference-Enroll-* → Darkbloom-Enroll-*

Backward compatibility:
  - install.sh creates an `eigeninference-enclave` symlink to
    `darkbloom-enclave` so existing install scripts keep resolving.
  - Config loader still reads ~/.config/eigeninference/ and the App
    Support legacy paths as fallbacks; new writes always go to
    ~/.config/darkbloom/.
  - LocalDataCleanup.purge() removes both directories.
  - release-swift.yml publishes the latest tarball under both
    canonical and legacy filenames.
  - NodeKeyPair.legacyDirNames and SecurityHardening MDM-profile-name
    matchers still accept the old name.
  - Coordinator/Rust/UI surfaces (R2 buckets, Stripe descriptors,
    Solana memos, telemetry source attribution) intentionally
    untouched.

CLI subcommands shipped in v0.5.0
---------------------------------
darkbloom serve / start / stop, status, doctor, models {list, catalog,
download, remove}, enroll, unenroll, login, logout, logs, autoupdate,
benchmark, update, verify. start --foreground is the launchd
entrypoint; start --local --port N runs a standalone OpenAI-compatible
HTTP server. PID-file single-instance enforcement, caffeinate-based
sleep prevention, panic-hook telemetry, and metallib hash in
attestation are all wired in.

Submodule re-homing
-------------------
.gitmodules now points to Layr-Labs/mlx-swift and Layr-Labs/mlx-swift-lm.
The mlx-swift pointer is unchanged (clean `main`). The mlx-swift-lm
pointer advances from 3ec4b8a (codex/local-mlx-swift-dependency) to
91612d5 (darkbloom-continuous-batching) which carries the batching
engine + Gemma 4 MoE fork on Layr-Labs/mlx-swift-lm.

Tests
-----
135 / 135 tests pass in 16.5 s with DARKBLOOM_LIVE_MLX_TESTS=1 and
DARKBLOOM_LIVE_MLX_GEMMA=1 (live MLX inference against real models
plus the gated 27 GB Gemma generation test).

* Bump mlx-swift-lm submodule to main after re-homing to Layr-Labs

Layr-Labs/mlx-swift-lm@main now carries the continuous-batching engine,
per-row samplers, and Gemma 4 MoE port at 8d76944. Same tree as the
prior 91612d5 commit on the darkbloom-continuous-batching branch, but
without the local-path mlx-swift dep hack, so the fork is consumable
by URL outside this repo.

* Untrack .claude/ files and drop dangling cross-references

The .claude/ directory holds local agent state (cursor task files,
working notes, the in-progress migration plan). Those don't belong in
the repo. Untrack the two committed markdown files and broaden the
.gitignore from `.claude/worktrees/` to `.claude/` so future agent runs
don't add them back. Strip the dead links to .claude/swift-migration-plan.md
from CLAUDE.md, provider-swift/README.md, docs/ARCHITECTURE.md, and
scripts/fetch-metallib.sh -- the surrounding prose stands on its own.

The local files remain on disk for active reference; only the tracking
is removed.

* Idle-timeout unload + coordinator-driven model preload protocol

Two related additions to the provider's model lifecycle:

1) Idle-timeout unload
----------------------
ProviderLoop now runs a background monitor that polls every minute.
If `idleTimeoutMins` minutes (default 60) have elapsed since the last
inference activity AND no requests are in flight, the loaded
ModelContainer is dropped. The next inference request lazy-reloads.
`idleTimeoutMins == 0` disables the monitor; the model stays
resident forever.

The decision is extracted into `IdleTimeoutPolicy.shouldUnload(...)`
so the rule is unit-testable without spinning up the full ProviderLoop
actor (which depends on Secure Enclave, coordinator client, and
security posture). Five unit tests pin the policy: (a) unloads when
all conditions met, (b) never unloads with inflight requests,
(c) never unloads with no model loaded, (d) waits for the timeout to
elapse, (e) zero-timeout edge case is still defensive.

Activity tracking: `lastInferenceAt` updates on every request
admission and on every request finish (`removeInflightTask`). The
worker is a detached `Task` so cancel/submit on the actor never
queue behind the timer.

2) Coordinator-driven model preload
-----------------------------------
New WebSocket message `coordinator → provider: load_model`. The
provider has no inbound listener (security: a discovered IP can't
reach the GPU), so the coordinator pushes preload requests over the
existing outbound WebSocket connection that the provider opened.
Use case: the coordinator predicts demand for model X on machine Y
in the next hour and warms it ahead of time.

Provider behavior:
  - If model is already loaded: short-circuit, reply succeeded.
  - Otherwise: emit `load_model_status` "started" immediately,
    kick off `ensureModelLoaded` in a detached Task, then emit
    "succeeded" or "failed" (with an error string) when the load
    settles.

Wire surface added in three places (per AGENTS.md sync rule):
  - coordinator/internal/protocol/messages.go: `TypeLoadModel`,
    `TypeLoadModelStatus`, `LoadModelMessage`, `LoadModelStatusMessage`,
    plus the `LoadModelStatusStarted/Succeeded/Failed` constants.
  - provider-swift/.../Protocol/Messages.swift: new
    `CoordinatorMessage.loadModel(...)` case + `ProviderMessage
    .loadModelStatus(...)` case + Codable on both sides.
  - provider-swift/.../Coordinator/CoordinatorClient.swift: dispatch
    inbound `load_model` to a new `CoordinatorEvent.loadModel(modelId)`
    and add `OutboundMessage.loadModelStatus(...)` for the reply.

ProviderLoop wires `handleLoadModelRequest(modelId:send:)` for the
new event. Round-trip tests cover decoding a Go-style `load_model`
JSON and encoding all three lifecycle status replies (started /
succeeded / failed-with-error) with snake_case wire keys.
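
A rough Go sketch of the coordinator-side wire shapes named above (the type and status string values come from this commit message; the exact field sets in messages.go may differ):

// Sketch only; JSON keys are snake_case per the round-trip tests.
const (
    TypeLoadModel       = "load_model"
    TypeLoadModelStatus = "load_model_status"

    LoadModelStatusStarted   = "started"
    LoadModelStatusSucceeded = "succeeded"
    LoadModelStatusFailed    = "failed"
)

type LoadModelMessage struct {
    Type    string `json:"type"`
    ModelID string `json:"model_id"`
}

type LoadModelStatusMessage struct {
    Type    string `json:"type"`
    ModelID string `json:"model_id"`
    Status  string `json:"status"`
    Error   string `json:"error,omitempty"` // set only for "failed"
}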

Rust legacy provider intentionally untouched. The coordinator
should gate `load_model` dispatch on `backend == "mlx-swift"` so
the Rust path never receives an unknown message; that gate lives
on the coordinator side and isn't part of this commit.

Tests
-----
141 / 141 tests pass with DARKBLOOM_LIVE_MLX_TESTS=1 and
DARKBLOOM_LIVE_MLX_GEMMA=1 (live MLX inference + Gemma 4 26B-A4B-it-8bit
MoE batching included). New: 5 IdleTimeoutPolicy tests + 1
loadModel round-trip protocol test.

* Add end-to-end performance tests: TTFT, encryption, batching, model load

Four new live tests that produce reproducible numbers for the four
scenarios the operator asked about. Gated by DARKBLOOM_LIVE_MLX_TESTS=1;
all four target Qwen3 0.6B-8bit so the suite finishes in ~7 s.

  A) warm TTFT baseline -- pure inference TTFT with no encryption
     and the model already loaded.
  B) cold TTFT          -- spins up a fresh ModelContainer each
     iteration so the weights are re-paged from disk; reports
     load_time and load_time + first_token separately.
  C) encrypted TTFT     -- runs the request body through
     NodeKeyPair.encrypt (consumer side) and NodeKeyPair.decrypt
     (provider side) with real libsodium NaCl box, then submits.
     Reports encrypt-only, decrypt-only, warm TTFT, and total
     E2E first-token (enc + dec + TTFT) so each layer's cost is
     visible.
  D) batched TTFT       -- B=1, B=2, B=4 concurrent submissions on
     a single shared scheduler. Reports per-row TTFT and aggregate
     throughput so the continuous-batching scaling story is honest.

Headline numbers on M4 Max with Qwen3 0.6B-8bit:

  warm TTFT (plaintext):             ~20 ms
  encrypt (consumer side):           ~0.05 ms (libsodium NaCl box)
  decrypt (provider side):           ~0.02 ms
  E2E first-token (enc+dec+TTFT):    ~31 ms
  cold model load:                   ~856 ms
  cold load + first token:           ~1036 ms
  aggregate throughput B=1:          87.4 tok/s
  aggregate throughput B=2:          176.2 tok/s   (~2.0x)
  aggregate throughput B=4:          317.1 tok/s   (~3.6x)
  per-request TTFT B=1 -> B=4:       34 ms -> 36 ms (flat)

Encryption is essentially free, continuous batching scales
near-linearly to B=4, and per-request TTFT is invariant under
batching -- the key continuous-batching scheduler invariant.

The tests assert lower-bound liveness (durations > 0, all rows
complete) but don't pin absolute latencies, since those vary by
hardware. Numbers print to stderr in a "[perf]" prefix so they
land in the test log without polluting test stdout.

While here, fixed a `String(format:)` bug in the printRow helper
where `%s` was used with a Swift String (would have segfaulted
the test process via _platform_strlen on an unaligned pointer).

145 / 145 tests pass in 9 s with DARKBLOOM_LIVE_MLX_TESTS=1.

* Add Gemma 4 26B-A4B-it-8bit MoE tier to performance suite

Refactor PerformanceLiveTests so every scenario (warm TTFT, cold load,
encrypted E2E, batched throughput) is parameterised by a `ModelConfig`
struct (label, modelID, wired-memory budget, iteration counts, batch
sizes, max_tokens). Two configs ship in the suite:

  - Qwen3 0.6B-8bit         smoke tier (DARKBLOOM_LIVE_MLX_TESTS=1)
  - Gemma 4 26B-A4B-it-8bit production tier
                            (DARKBLOOM_LIVE_MLX_TESTS=1 +
                             DARKBLOOM_LIVE_MLX_GEMMA=1)

Both run all four scenarios. Total 8 @test methods (4 + 4).

Headline numbers on M4 Max with weights memory-mapped from local cache:

  Gemma 26B MoE:
    warm TTFT                     309 ms
    cold load                     2.63 s
    cold load + first token       3.07 s
    encrypt (consumer side)       0.05 ms
    decrypt (provider side)       0.03 ms
    E2E first-token               262 ms
    B=1 throughput                10.2 tok/s
    B=2 throughput                16.7 tok/s   (1.64x)
    B=4 throughput                23.9 tok/s   (2.34x)

  Qwen3 0.6B (for comparison):
    warm TTFT                     ~21 ms
    cold load                     ~887 ms
    E2E first-token               ~32 ms
    B=4 throughput                ~302 tok/s

Three things the Gemma tier surfaces that the smoke tier doesn't:

1. Encryption is *still* essentially free at 26B scale -- 70-80 us
   combined for encrypt + decrypt, dwarfed by the 200+ ms
   memory-bandwidth-bound prefill.
2. Per-row TTFT scales SUB-linearly with B for MoE (234 -> 344 -> 603
   ms at B=1/2/4) because each batched prefill processes a heavier
   forward. Aggregate throughput still wins (10 -> 17 -> 24 tok/s).
3. Cold load on a 26 GB MoE that's still in the OS page cache is
   ~2.6 s -- the relevant number for the idle-timeout-reload path.
   First-ever boot would be longer (NVMe-bound), but unmeasurable
   from a unit test without privileged page-cache flushing.

Also tighten the report formatting: column padding to 56 chars, "ms"
under 1 s and "s" above, max_tokens=8 for Gemma (vs 16 for Qwen) so
the suite finishes in ~30 s with all four scenarios run twice.

149 / 149 tests pass in 37 s with both env vars set.

* Performance audit vs mlx_lm: bracket the dispatch-overhead gap

The user noticed that "10.2 tok/s for Gemma 26B" looked too low. They
were right. Side-by-side with `mlx_lm` 0.31.3 Python on the same M4
Max + same checkpoints:

  Qwen3 0.6B-8bit              mlx_lm: 426 tok/s   us: ~84 tok/s   (5.0x)
  Gemma 4 26B-A4B-it-8bit MoE  mlx_lm:  84 tok/s   us: ~33 tok/s   (2.4x)

To localize the gap, this commit adds a "decode-tps bracket" test
that measures the same B=1 steady-state decode through three paths:

  1. Pure model loop  -- model.callAsFunction directly, no scheduler
  2. BatchGenerator   -- our continuous-batching engine, B=1
  3. BatchScheduler   -- production path (actor + AsyncStream)

Findings on Gemma 26B MoE (decode-only, 64 tokens):

  pure loop, sync eval        34.6 tok/s
  pure loop, async eval       34.4 tok/s    (no improvement -- not
                                              the issue)
  BatchGenerator B=1          32.6 tok/s    (-6%, noise-level)
  BatchScheduler.submit       32.5 tok/s    (-6%, noise-level)

  mlx_lm Python reference     84.0 tok/s    (2.4x faster)

Conclusion: the gap is at the **MLX-Swift dispatch layer**, not in
our scheduler or batched-cache code. The pure model loop is already
2.4x slower than Python. Adding our BatchScheduler + actor + worker
adds < 6% on top -- not the bottleneck.

The 8-13 ms per-step CPU overhead is consistent with kernel-launch
latency in mlx-swift bindings. mlx_lm Python uses `mx.compile` on
the decode step to amortize this; mlx-swift-lm does not. Closing
the gap is a separate workstream on the upstream library.

Other improvements in this commit:

* Bump Gemma's batched max_tokens from 8 -> 32 so steady-state
  decode dominates the aggregate TPS metric.
* Add steady-state decode TPS reporting alongside aggregate (subtract
  prefill so it compares like-for-like with mlx_lm's "Generation:
  X tokens-per-sec" headline).
* Switch the throughput tests to a long-output prompt ("write a 200
  word story...") so the model decodes to max_tokens instead of
  hitting EOS at ~12 tokens. The B=1 number was misleadingly low
  before because the prior prompt asked for "a single word".
* Add async-eval pipelining variant to the bracket -- confirms
  mx.async_eval alone doesn't close the gap (which means the missing
  optimization is `mx.compile`, not just async dispatch).
* Add Qwen3 bracket test alongside the Gemma one.
* Document the gap explicitly in the file header so future
  optimisation work has a clear target.

Honest headline numbers (M4 Max, weights memory-mapped from cache):

  Gemma 26B MoE warm TTFT             280-352 ms
  Gemma 26B MoE cold load             3.32 s   (re-page from cache)
  Gemma 26B MoE encrypt+decrypt       0.10 ms  (free)
  Gemma 26B MoE steady-state decode   32-40 tok/s   B=1
                                      35-39 tok/s   B=4 aggregate
  Qwen3 0.6B steady-state decode      84 tok/s      B=1
                                      323 tok/s     B=4 aggregate

Continuous batching itself works correctly: B=4 aggregate is 2.9x
B=1 (Gemma) and 3.8x B=1 (Qwen). The dispatch-overhead headwind
applies equally to all batch sizes.

151 / 151 tests pass in 71 s with both env vars set.

* Compare against mlx_lm batched + greedy fast-path in BatchScheduler

The previous perf audit only compared B=1 against mlx_lm. This commit
extends the comparison to B=1, B=2, B=4 by adding a Python benchmark
script (scripts/mlx_lm_batch_bench.py) that drives mlx_lm's upstream
BatchGenerator, and applies one targeted Swift-side optimization
based on what the comparison surfaced.

Reference numbers (mlx_lm 0.31.3, M4 Max, decode-only tok/s):

  Qwen3 0.6B-8bit              B=1: 265   B=2: 694   B=4: 1119
  Gemma 4 26B-A4B-it-8bit MoE  B=1:  74   B=2: 126   B=4:  181

The gap WIDENS with batch size, which pointed at an O(B) overhead in
our per-row sampling path. Smoking gun: GenerationBatch.step takes a
slow path whenever ANY row's sampler is non-nil, doing B separate
slice + sample + concat ops (=> 9 kernel launches per token at B=4)
instead of the vectorized fallback (=> 1 kernel launch). Our
BatchScheduler.submit was passing a non-nil greedy closure even when
temperature == 0, forcing every batch through the slow path.

Fix: when temperature <= 0, pass `nil` so the row falls through to
the vectorized fallback. The fallback is also greedy, so the result
is identical -- only the dispatch path changes. Per-row temperature
/ top-P / top-K / seed all still work for non-greedy rows.

Swift numbers after the fix (decode-only):

  Qwen3 0.6B-8bit              B=1:  88   B=2: 181   B=4:  351   (was 84 / 174 / 323)
  Gemma 4 26B-A4B-it-8bit MoE  B=1:  37   B=2:  23   B=4:   42   (was 33 / 21 / 39)

Modest +6-13% across the board. The remaining 3-4x gap to Python is
at the MLX-Swift dispatch layer (per-step kernel-launch overhead);
mlx_lm closes it via `mx.compile` on the decode step, which isn't
applied in mlx-swift-lm. That's a separate workstream.

Continuous batching scaling is still healthy:
  Qwen B=4 / B=1 = 4.0x   (matches mlx_lm's 4.2x exactly)
  Gemma B=4 / B=1 = 1.1x  (mlx_lm's is 2.4x; gap reflects MoE expert
                           dispatch where Python's compile pays off most)

Other changes:
* scripts/mlx_lm_batch_bench.py -- runnable apples-to-apples bench
  for future regression checks. Reproduces the reference numbers in
  the file header.
* Update PerformanceLiveTests.swift docstring with the side-by-side
  table so the gap is visible to anyone reading the test.

151 / 151 tests pass.

* Perf compare mlx_lm batching and bump mlx-swift-lm decode optimizations

The user called out that our Gemma 26B throughput looked too low, so this
commit makes the comparison apples-to-apples against mlx_lm Python's
BatchGenerator and bumps the mlx-swift-lm submodule to the optimized main
commit.

New reference script:
  scripts/mlx_lm_batch_bench.py

It runs mlx_lm.generate.BatchGenerator at B=1/2/4 over the same long-output
prompt used by PerformanceLiveTests and reports prefill+1, decode-only TPS,
and aggregate TPS. Reference numbers on M4 Max:

  Qwen3 0.6B-8bit              B=1: 265   B=2: 694   B=4: 1119 tok/s
  Gemma 4 26B-A4B-it-8bit MoE  B=1:  74   B=2: 126   B=4:  181 tok/s

Swift improvements landed in Layr-Labs/mlx-swift-lm@b02ea5b:

  - mlx_lm-style double buffering in GenerationBatch: constructor primes
    the first token, next() returns current token while async-evaluating
    the following token.
  - Greedy fast path avoids logSumExp: argMax(logits) == argMax(logprobs),
    and we don't expose logprobs downstream today.
  - BatchScheduler now passes nil for temperature=0 samplers so batches
    use the vectorized greedy fallback instead of per-row slice/sample/concat.
  - Token tensors are UInt32 to match mlx_lm.
  - BatchKVCache now exposes innerState and KVCache conforms to Updatable,
    which fixes the cache state surface needed for future compile work.

Measured Swift deltas:

  Qwen3 0.6B:
    B=1 decode      ~84 -> ~104 tok/s
    B=4 aggregate   ~323 -> ~363 tok/s

  Gemma 26B MoE:
    B=1 decode      ~32 -> ~37 tok/s
    B=4 aggregate   ~39 -> ~40 tok/s

This closes the avoidable scheduler/batching overhead we found, but does
not fully close the remaining 2-4x gap to Python. The bracket test shows
BatchGenerator/BatchScheduler are now within noise of the pure model loop;
the remaining gap is in mlx-swift model dispatch / lack of stateful
mx.compile support. Attempting to compile the batched-cache decode graph
still fails in mlx-swift with "uncaptured inputs", so that remains an
upstream library workstream rather than a provider scheduler bug.

* Clarify release-mode batch performance measurements

The previous perf notes mixed debug-mode Swift numbers with mlx_lm Python
reference numbers, which made the Swift engine look far worse than it is.
This test-only cleanup makes the performance suite report the data needed
to keep comparisons honest.

Changes:
- Update the PerformanceLiveTests header to state explicitly that mlx_lm
  comparisons must use `swift test -c release`; debug Swift is several
  times slower and not a valid reference.
- Add direct BatchGenerator B=2/B=4 decode-only measurements to the
  bracket test, in addition to pure loop and BatchScheduler.submit.
- Add "model-side scheduler" TPS in the public batched test so we can
  distinguish model decode speed from public text streaming / AsyncStream /
  detokenization costs.

Release-mode checks on this machine:
- Qwen3 0.6B direct BatchGenerator B=4: ~1130 tok/s, matching mlx_lm's
  ~1119 tok/s reference.
- Gemma 4 26B-A4B-it-8bit direct BatchGenerator B=4: ~186 tok/s,
  matching mlx_lm's ~181 tok/s reference.
- BatchScheduler.submit B=1 decode bracket also lands at the direct model
  rate in release mode (~402 tok/s Qwen, ~79 tok/s Gemma); public streaming
  tests report separate model-side and aggregate numbers so regressions are
  localizable.

No production code changes in this commit.

* Complete Swift provider runtime verification

* Bridge Rust updater to Swift provider bundles

* Add Rust to Swift updater E2E tests

* Add Rust bridge release workflow

* E2E testbed: integration tests, profiling, and benchmarking infrastructure (#136)

* Flatten coordinator/internal/ to coordinator/, add E2E integration test suite

Promote Go module root from coordinator/ to repo root so the e2e
test suite can import coordinator packages. Flatten
coordinator/internal/ to coordinator/ to remove the Go internal
package restriction.

All import paths change from
github.com/eigeninference/coordinator/internal/X to
github.com/eigeninference/d-inference/coordinator/X.
The module path is now github.com/eigeninference/d-inference.

12 E2E integration tests using the Swift provider (mlx-swift backend):
- NonStreamingInference, StreamingInference
- MultipleRequestsAccounting, E2EEncryptionCorrectness
- BillingBalanceDeduction, ProviderPayoutSplit, ReferralRewardDistribution
- InsufficientBalance, InvalidModel
- StreamingContentValidation, ConcurrentRequests, AttestationHeaders

Each test gets its own isolated suite (Postgres + coordinator + provider)
via startSuite(t). A semaphore serializes suite lifecycles to prevent
GPU contention from concurrent MLX model loads.

Update CI workflows to reference go.mod at repo root, exclude e2e/
from unit tests, and use swift build for the provider.

* Move coordinator e2e back to coordinator/internal/e2e/

The coordinator's own e2e package was incorrectly flattened into
coordinator/e2e/ alongside the repo-root e2e/ testbed suite.
Restore it to coordinator/internal/e2e/ where it belongs.

* Run integration tests on any PR, not just master/main

* Fix CI: install Docker on macos-15, increase timeout to 30m, serial tests

* Use colima for Docker on macOS CI

* Remove invalid --no-mount flag from colima start

* Add native Postgres fallback, drop Docker/colima from CI

Docker Desktop and colima both fail on macOS CI runners due to
virtualization restrictions. Add a native Postgres lifecycle that
uses initdb + postgres directly (installed via Homebrew).

The Start() method tries Docker first, falls back to native.
CI now installs postgresql@16 via brew instead of Docker.

* Download MLX model in CI before running integration tests

* Use Python API for model download (huggingface-cli is deprecated)

* Use shared suite across all integration tests

Instead of starting a new suite (Postgres + coordinator + provider +
model load) per test, use a single shared suite initialized on first
access. This cuts total test time from ~18min to ~3min since the
expensive model load only happens once.
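
A sketch of the shared-suite accessor as described (sync.Once on first access; names are illustrative, and this pattern is walked back to per-test suites in a later commit below):

// Lazily build one shared suite; subsequent tests reuse it.
var (
    sharedOnce  sync.Once
    sharedSuite *Suite
)

func sharedSuiteFor() *Suite {
    sharedOnce.Do(func() {
        sharedSuite = newSuite() // expensive part: the one-time MLX model load
    })
    return sharedSuite
}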

* Build provider in debug mode for CI (skips SIP/security checks)

CI macOS runners have SIP disabled, which causes the provider to
exit with 'System Integrity Protection is disabled'. Debug builds
skip verifySecurityPosture() via #if !DEBUG, allowing tests to
run on CI.

Add TESTBED_PROVIDER_CONFIG env var (default: release) to control
the Swift build configuration from testbed.

* Force-trust provider in tests, disable frequent challenges

CI macOS runners have SIP disabled, which causes the provider to
fail attestation challenges. Add ForceTrustProvider() to override
status/trust/SIP verification for testing, set challenge interval
to 1h, and add a 3s delay after registration to let the initial
challenge fire before overriding.

* Force all privacy capabilities in ForceTrustProvider for testing

The private-text routing gate checks PythonRuntimeLocked and
DangerousModulesBlocked which are always false on the Swift
backend (no Python runtime). ForceTrustProvider now sets all
privacy capabilities to true and drains queued requests
immediately after trust promotion.

* Restore per-test isolated suites

Each test gets its own Postgres + coordinator + provider.
With debug builds, ForceTrustProvider, native Postgres, and
model pre-download, each suite starts in ~15-20s.

* Add load generator, profiling tests, multi-provider support

- Suite.Providers is now []*Provider; TESTBED_NUM_PROVIDERS env var
  controls how many provider subprocesses start per suite
- New LoadGenerator in testbed/load.go with configurable concurrency,
  total requests, streaming, max_tokens, temperature
- New profile tests: SingleProviderStreaming, SingleProviderNonStreaming,
  HighConcurrency — each prints segment tables with mean/p50/p95/max
- Existing integration tests (NonStreaming, Streaming, Concurrent) now
  emit Instrument events and print profile tables
- Profile SummaryTable uses millisecond resolution instead of microsecond

* Add multi-model provider specs, user pool, and latency decomposition headers

SuiteConfig now takes ModelSpecs (model ID + provider count per model) and
NumUsers. Providers are started per-spec with unique PID files (fixes
single-instance lock killing sibling providers). A user pool with round-robin
API key rotation is created at startup.

Coordinator sets X-Queue-Wait-Ms and X-Provider-Latency-Ms response headers
from PendingRequest timing fields (QueuedAt, DispatchedAt, FirstChunkAt).
LoadGenerator parses these and emits per-segment stats:
client_to_coordinator, queue_wait, coordinator_to_provider, provider_to_client.

Provider ProcessLifecycle respects DARKBLOOM_PID_FILE env var for
multi-instance testing. Add SetSkipChallenge to Server for test runs.

* Rename SegmentClientToCoordinator to SegmentTotalE2E

The segment measures full end-to-end wall clock time, not just
client-to-coordinator latency. The old name was misleading.

* Decompose X-Timing header into per-phase microsecond breakdown

Replace X-Queue-Wait-Ms / X-Provider-Latency-Ms with a single X-Timing
JSON header containing parse_us, reserve_us, route_us, queue_us,
encrypt_us, dispatch_us, provider_us. Move timing fields onto a
RequestTiming struct in PendingRequest. LoadGenerator parses the JSON
and emits per-segment stats with auto ms/µs precision.
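
Since the header is plain JSON, emitting and parsing it reduces to a marshal/unmarshal pair; a sketch assuming encoding/json and net/http (only the seven *_us keys come from this commit, the struct and helper names are illustrative):

// Sketch of the X-Timing breakdown; field names mirror the listed keys.
type timingBreakdown struct {
    ParseUs    int64 `json:"parse_us"`
    ReserveUs  int64 `json:"reserve_us"`
    RouteUs    int64 `json:"route_us"`
    QueueUs    int64 `json:"queue_us"`
    EncryptUs  int64 `json:"encrypt_us"`
    DispatchUs int64 `json:"dispatch_us"`
    ProviderUs int64 `json:"provider_us"`
}

// coordinator side: attach the breakdown to the response
func setTimingHeader(w http.ResponseWriter, tb timingBreakdown) {
    b, _ := json.Marshal(tb)
    w.Header().Set("X-Timing", string(b))
}

// load-generator side: read it back for per-segment stats
func parseTimingHeader(resp *http.Response) (timingBreakdown, error) {
    var tb timingBreakdown
    err := json.Unmarshal([]byte(resp.Header.Get("X-Timing")), &tb)
    return tb, err
}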

* Add latency regression assertions, SegmentStatsMap, and heavy-load benchmark

- Add SegmentStatsMap() to LoadResult for per-segment mean/p50/p95/p99/max
- Wire coordinator overhead assertions into all benchmark and profile tests
- Update DefaultThresholds with realistic values based on benchmark data
- Add CoordinatorOverheadThresholds() alias
- Deduplicate SegmentStatsView (assert package uses type alias to testbed)
- Clean up profile_test.go: remove redundant second load loop, use assertions
- Add PromptBytes field to RequestConfig for large-payload testing
- Add HeavyLoad 100-concurrent 10KB benchmark
- Replace bubble sort with sort.Slice in computeStats

* Split CI into eval + benchmark jobs, post benchmark results as PR comment

Integration tests (TestIntegration|TestProfile) run on every push/PR.
Benchmarks (TestBenchmark) run only on PRs and post a markdown summary
as a PR comment via gh pr comment. LoadResult and AssertionReport gain
SummaryMarkdown() methods for markdown table formatting. A TestMain in
benchmark_test.go writes the aggregated markdown to BENCHMARK_MD_PATH
when set.

* Skip multi-model benchmark in CI (gemma model not downloaded)

The M1 Virtual CI runner only downloads Qwen3.5-0.8B; the gemma
multi-model test requires a second model that isn't available.

* Download gemma-3-270m-4bit in CI, remove multi-model skip

* Include model IDs and RAM sizes in benchmark PR comment

* address feedback

* fix: soft-fail Swift tests on dev + download full model for CI

* feat: environment-scoped R2 + coordinator secrets for dev/prod release isolation

- Move R2_BUCKET from vars to secrets so it participates in GitHub
  environment scoping (dev vs prod get different buckets/credentials)
- Add documentation header listing all environment-scoped secrets
  required per environment
- Soft-fail Swift unit tests on dev releases (live MLX model cache
  may be incomplete on CI)
- Download full model (remove --include filter) for deterministic
  CI cache seeding

* feat: DEV_/PROD_ prefixed repo secrets for R2 + coordinator env isolation

Both release workflows now resolve DEV_ or PROD_ prefixed repo secrets
in a resolve-env step using bash indirection — no GitHub environments
needed. The environment: gate is removed since secrets live at repo
level with prefixes.

Required repo secrets:
  DEV_R2_ACCESS_KEY_ID, PROD_R2_ACCESS_KEY_ID
  DEV_R2_SECRET_ACCESS_KEY, PROD_R2_SECRET_ACCESS_KEY
  DEV_R2_ENDPOINT, PROD_R2_ENDPOINT
  DEV_R2_BUCKET, PROD_R2_BUCKET
  DEV_R2_PUBLIC_URL, PROD_R2_PUBLIC_URL
  DEV_COORDINATOR_URL, PROD_COORDINATOR_URL
  DEV_RELEASE_KEY, PROD_RELEASE_KEY

* fix: RELEASE_KEY is shared, not env-prefixed

* fix: resolve env secrets inline to avoid GitHub cross-job output masking

* fix: add DEV_RELEASE_KEY/PROD_RELEASE_KEY to env-prefixed secrets

* Add STRIDE threat model for runtime security review

40 threats across 9 trust boundaries (coordinator/provider WebSocket,
provider operator vs process, browser/UI, Apple MDM/MDA, admin API,
inference engine, payments, Apple attestation chain). Adversaries:
malicious provider, malicious consumer, external attacker. Each threat
includes affected_files globs, mitigations with status, open_findings
links to the existing security audit, and a detection_hint for
automated PR review.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* Expand threat model trust boundaries with implementation detail

Each of the 9 trust boundaries now documents how_it_works (exact code
paths, line numbers, auth mechanisms, data flows) and current_limitations
(specific open gaps with SEC-* references). Sources: coordinator/internal/
api/{server,provider,release_handlers,device_auth,billing_handlers}.go,
registry/registry.go, attestation/, mdm/, provider-swift/Sources/
ProviderCore/Security/{AntiDebug,BinaryHasher,SecureEnclaveIdentity,
SecurityHardening}.swift, Crypto/NodeKeyPair.swift, Inference/
{BatchScheduler,IdleTimeoutPolicy,InferenceCancellation}.swift,
ProviderLoop.swift, console-ui/src/{hooks/useAuth,lib/{api,store,
encryption}}.ts, next.config.ts.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* Add threat model PR review workflow

On every PR against master/main, the workflow:
1. Gets the PR diff via gh pr diff
2. Matches changed files against affected_files globs in docs/threat-model.yaml
3. Calls Claude API (claude-sonnet-4-6) with the focused diff + full threat model
4. Posts (or updates) a single PR comment with STRIDE-based security analysis

Uses prompt caching on the static threat model block to minimise API cost
on repeated pushes. The comment marker <!-- threat-model-review --> lets
the workflow update rather than append on each push.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* Persistent Secure Enclave key with keychain access group enforcement (#146)

* Add persistent Secure Enclave attestation key with keychain access group enforcement

Replace ephemeral CryptoKit SE keys with persistent Security framework keys stored
in the macOS data protection keychain. The key is bound to the signing team's
keychain access group (SLDQ2GJ6TL.io.darkbloom.provider), enforced by securityd
at the kernel level. A patched binary re-signed with codesign -s - gets
errSecMissingEntitlement and cannot access the key.

- PersistentEnclaveKey: Security framework SE key with SecKeyCreateRandomKey,
  kSecAttrIsPermanent, and team-scoped access group
- AttestationSigner protocol: abstracts over both ephemeral and persistent keys
- ProviderLoop: tries persistent key first, falls back to ephemeral with warning
- Entitlements plist with keychain-access-groups for production signing
- 8 tests covering creation, persistence, signing, deletion, protocol conformance

* Embed provisioning profile in .app bundle for persistent SE key

The data protection keychain requires a provisioning profile to authorize
the keychain-access-groups entitlement. Wrap the CLI binaries in a minimal
Darkbloom.app bundle with embedded.provisionprofile so the persistent SE
attestation key works on provider machines.

- release-swift.yml: new step decodes PROVISIONING_PROFILE_BASE64 secret,
  builds Darkbloom.app/Contents/ structure, signs bundle + individual binaries
- install.sh: detects .app bundle layout, symlinks bin/ into the app bundle
- Backward-compatible: falls back gracefully if secret is not set or if
  provider receives a flat (pre-.app) bundle

* Add com.apple.application-identifier to provider entitlements

Required for data protection keychain access. Must match the bundle ID
in the provisioning profile (SLDQ2GJ6TL.io.darkbloom.provider).

* Address review: data protection keychain flag, tighter error handling, real SE probe

Codex P1 / hank P1:
- coordinator/api/install.sh: restore __DARKBLOOM_COORD_URL__ placeholder
  (the coordinator templates this at serve time via server.go;
  hardcoding the URL broke dev/self-hosted coordinators)
- PersistentEnclaveKey: add kSecUseDataProtectionKeychain: true to all
  Security framework calls. Without it, queries may hit the legacy
  file-based keychain where access group enforcement is silently ignored.

hank P2:
- loadOrCreate: catch only errSecItemNotFound before falling through to
  createNew. Auth failures, locked keychain, and missing entitlement
  now propagate to the caller instead of racing with key creation.
- isAvailable: probe real SE capability via CryptoKit's
  SecureEnclave.isAvailable instead of just checking macOS version.
  Now returns false on Intel Macs without T2 and macOS VMs without
  virtualized SE. Added doc comment noting the entitlement dependency.

* fix(api): add code and param fields to OpenAI error responses (#144)

The errorResponse function only populated type and message, missing
code and param required by the OpenAI API spec. Without code, SDKs
cannot programmatically distinguish error types (e.g. Python SDK
e.code returns None, retry logic breaks, Sentry groups all errors
as one).

Changes:
- errorResponse now accepts optional errorDetailOpt variadic args
- code defaults to errType for backward compatibility
- withParam() and withCode() helpers for call-site overrides
- model-not-found errors include param="model"
- model-is-required errors include param="model"
- insufficient_funds uses OpenAI-canonical code "insufficient_quota"
- rate_limit_exceeded gets explicit withCode for clarity

All 202 existing call sites are backward-compatible: the variadic
signature means they compile unchanged, and the default code=errType
matches the implicit behavior SDKs already assumed.

Closes #142

* feat: add Datadog observability stack for dev coordinator (#143)

* Fix Darkbloom analytics tracking

* Harden release workflow protections (#103)

* Harden release registration and binary hash policy (#99)

* Harden release registration and binary hash policy

* derive release download URL from allowlist

* Stabilize provider coordinator test

---------

Co-authored-by: Gajesh Naik <26431906+Gajesh2007@users.noreply.github.com>

* Remove stale Python integration test (#109)

* e2e: add local simulation environment skeleton

Introduces scripts/e2e-runner.py, a Python orchestrator that spins up the
real coordinator binary with test-friendly configuration (in-memory store,
mock billing, no trust requirements) alongside a simulated or real
provider, and runs HTTP/WebSocket-level assertions against the live stack.

Key components:
- Coordinator class: builds and spawns coordinator with EIGENINFERENCE_MIN_TRUST=none,
  EIGENINFERENCE_BILLING_MOCK=true, and in-memory store
- SimulatedProvider: pure-Python WebSocket client speaking the full provider protocol
  (register, attestation challenge/response, heartbeat, inference request/response)
- Test framework: decorator-based test registration, pass/fail summary, signal-safe
  cleanup via atexit + signal handlers
- Test stubs: test_basic (registration + discovery), test_inference (consumer
  request routing), test_multi_provider (two providers, same model)

TODO:
- RealProvider wrapper around darkbloom serve --coordinator
- Coordination between provider challenge cycle and consumer request timing
- API key handling for consumer vs admin routes
- Python dependency management (websockets, cryptography)

* Revert "e2e: add local simulation environment skeleton"

This reverts commit d02074e. The Python E2E runner adds noise on top of
the existing Go integration tests (internal/api/integration_test.go +
fullstack_integration_test.go) which already cover the full coordinator
protocol surface. The cross-language orchestration doesn't buy anything
over what httptest.Server + simulated providers already provide.

* Remove stale Python integration test

@ethenotethan

tests/integration_test.py is superseded by the Go-based coordinator
integration tests at coordinator/internal/api/:

- Test coverage for coordinator protocol (register, challenge, heartbeat,
  inference) is covered by integration_test.go using httptest.Server +
  Go simulated providers — same coverage, no binary build needed
- Full-stack GPU inference is covered by fullstack_integration_test.go
  with real vllm-mlx backends (gated behind LIVE_FULLSTACK_TEST=1)
- The Python test uses stale binary names ('eigeninference-provider'),
  old flags ('--backend mlx-lm'), and predates attestation challenges,
  E2E encryption, and the vllm-mlx backend migration
- No external dependency coverage (Postgres, Stripe, etc.) is lost — the
  coordinator main.go wiring for those is trivially tested elsewhere
- The Python SDK tests (4.5.x) belong in the SDK repo, not the infra repo

---------

Co-authored-by: Hank Bob <hankbob@researchoors.com>

* chore: remove unused dependencies (#112)

* chore: remove unused dependencies

* test: fix console ui test isolation

* chore: prune repo-wide dead code findings

* ci: run CI on any PR, not just master/main (#119)

* ci: remove racing deploy-dev-coordinator workflow (#137)

Cloud Build (deploy/gcp/cloudbuild.yaml) already deploys the coordinator
on the same trigger (push to master touching coordinator/** or deploy/gcp/**).
Having both paths active creates a race condition where two CI systems
simultaneously deploy to the same dev VM — see #115.

* feat: add Datadog observability stack for dev coordinator

Install Datadog Agent on the dev GCE VM (DogStatsD, APM, journald logs)
and wire the coordinator to emit structured metrics, split attestation
counters, model_type tags, reactive provider-count gauges, and a
completion-tokens counter. Rebuild the dev dashboard with 7 sections
covering metrics, logs, traces, and system health.
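
For orientation, DogStatsD emission from Go is a thin client call per metric; a sketch assuming the github.com/DataDog/datadog-go/v5/statsd client (metric names, tags, and the onlineCount/tokens values are illustrative, not the coordinator's actual ones):

// Illustrative DogStatsD wiring; the agent listens on localhost:8125 by default.
client, err := statsd.New("127.0.0.1:8125", statsd.WithNamespace("coordinator."))
if err != nil {
    log.Fatal(err)
}
// split counter with a model_type tag
client.Incr("inference.requests", []string{"model_type:moe"}, 1)
// reactive provider-count gauge
client.Gauge("providers.online", float64(onlineCount), nil, 1)
// completion-tokens counter
client.Count("inference.completion_tokens", int64(tokens), []string{"model:qwen3-0.6b"}, 1)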

* fix: prevent double-decrement when untrusted provider disconnects

Disconnect now checks StatusUntrusted before decrementing the online
counter and model-provider gauges, since MarkUntrusted already
decremented them.

* feat: add fleet version and binary hash observability

New metrics:
- providers.per_version gauge (per provider binary version)
- providers.per_binary_hash gauge (per attested binary hash)
- coordinator.min_provider_version_set gauge (1 when configured)
- provider_version_below_minimum counter (tagged by gate and version)

Gates instrumented:
- registration (provider.go)
- challenge revalidation (provider.go)
- manifest sync (server.go)

Registry additions:
- ProviderCountByVersion()
- ProviderCountByBinaryHash()

Dashboard: Fleet Version & Binary Hash group with providers by version,
providers by binary hash, min provider version, below-minimum events,
and top binary hashes toplist.

* fix: update Dockerfile + cloudbuild for go.mod at repo root

go.mod moved from coordinator/ to repo root during the swift-provider
merge. Build context is now repo root, Dockerfile copies coordinator/
subdir explicitly.

* fix: chmod +x coordinator binary in Dockerfile

* fix: ensure coordinator binary is executable in builder stage

* fix: rename coordinator source dir in builder to avoid colliding with binary path

* fix: copy full repo in Dockerfile builder so go.mod resolves all packages

* fix: remove unused modelTypeTag and format Go files for CI

* fix: skip python/dangerous-modules check for swift runtime in private text gate

* billing telemetry + MarkUntrusted race fix + Swift routing tests

- Add Datadog histogram metrics for reservation amounts, settlement
  refunds, provider credits, and platform fees
- Add store.debit/credit.latency_ms histograms for DB operation timing
- Add billing.cost_clamped and billing.reservation_refunds counters
- Fix race in MarkUntrusted: hold r.mu write lock through counter
  decrement to prevent double-decrement with Disconnect
- Add unit tests for Swift provider privacy caps (with/without Python)
- Add E2E test for Swift provider routing via challenge-verified path
- Update dev-network-dashboard.json with Billing & Store group

* fix Heartbeat reviving untrusted providers causing onlineCount double-decrement

* revert orthogonal landing/console-ui/provider changes

* remove unbounded binary_hash cardinality, add input token metrics + store latency, fix dashboard group-by

* fix review feedback: ModelType() untrusted filter, routing.cost_ms by provider, billing in cents, dead comment

---------

Co-authored-by: Gajesh Naik <26431906+Gajesh2007@users.noreply.github.com>
Co-authored-by: anupsv <6407789+anupsv@users.noreply.github.com>
Co-authored-by: hankbob <hankbobtheresearchoor@gmail.com>
Co-authored-by: Hank Bob <hankbob@researchoors.com>

* migration: harden Rust→Swift cutover end-to-end

Twelve fixes informed by three reviewer subagents (codex-rescue,
independent Claude, full pipeline audit) to ensure the bridge release
→ Swift release cutover works on first try, with no silent breakage.

Coordinator:
- accept darkbloom-bundle-<platform>.tar.gz (was eigeninference-bundle-)
- restore TestProviderRegistrationWithoutAttestationRejectedWhenBinaryHashPolicyConfigured
  (dropped during the master→swift-provider merge)

release-swift.yml:
- ship bin/{darkbloom,darkbloom-enclave,mlx.metallib} as real-file copies
  (was symlinks) so coordinator's tar.TypeReg verifier accepts them and
  hashes the actual bytes
- staple both bin/ AND .app/Contents/MacOS/ paths now that they're
  independent files
- post-codesign verification: fail build if signed CLI is missing the
  keychain-access-groups entitlement or the access group
  SLDQ2GJ6TL.io.darkbloom.provider, or if embedded.provisionprofile
  is absent from the .app
- PROVISIONING_PROFILE_BASE64 is now hard-required (no silent ephemeral
  fallback). Profile is decoded + parsed with plutil/python: verifies
  TeamIdentifier, keychain-access-groups, application-identifier, and
  ExpirationDate >= 30 days out
- pin MLX python wheel to 0.31.1 to match libs/mlx-swift Cmlx version
  (was 0.31.2 — patch-level metallib ABI risk)
- prod releases now hard-fail Swift tests (was soft-fail for all)

release-rust-bridge.yml:
- rename bridge bundle to darkbloom-bundle-<platform>.tar.gz uniformly
  so coordinator accepts the registration

Both release workflows:
- PROD_* secrets fall back to legacy unprefixed (R2_ACCESS_KEY_ID,
  RELEASE_KEY, COORDINATOR_URL) + vars.R2_BUCKET when PROD_* empty.
  Fails hard if neither resolves.

provider/src/main.rs (bridge auto-update):
- new rewrite_launchd_plist_for_swift: extracts ProgramArguments from
  the Rust plist (`serve --coordinator URL --model M`), converts to
  Swift shape (`start --foreground --coordinator-url URL --model M`),
  atomic rename
- install_swift_update_bundle_at: if Darkbloom.app/Contents/MacOS/
  exists in the extracted bundle, replace bin/{darkbloom,darkbloom-
  enclave,mlx.metallib} with symlinks into .app/MacOS and route the
  launchd plist's ProgramArguments[0] at the .app's MacOS binary path.
  This puts the embedded provisioning profile in scope at runtime, so
  the persistent SE key (PR #146) doesn't get errSecMissingEntitlement
  on first attestation post-cutover
- plist_path is now an Option<&Path> so tests can avoid touching the
  developer machine's real ~/Library/LaunchAgents

Tests added (all passing):
- 6 plist-rewrite unit tests: extract / convert / rewrite / install-
  with-plist / .app-aware install / hash-only install
- 1 ported coordinator attestation policy test
- existing 7 auto-update integration tests still pass (302 → 303 total)

Verified by audit:
- macos-26-xlarge has Xcode 26.2 / Swift 6.2, satisfies all
  swift-tools-version requirements
- LatestProviderVersion ordering: semver THEN created_at in both memory
  and Postgres stores
- /api/version JSON shape matches what auto_update_check_with_install_dir
  expects
- StartCommand --foreground doesn't recurse into launchAgent.installAndStart
- Swift ModelScanner reads ~/.cache/huggingface/hub (same as Rust)
- AuthTokenStore path parity (~/.darkbloom/auth_token)

Deployment prerequisite: coordinator changes must be deployed (master
→ dev Cloud Build, then human ecloud deploy to prod) BEFORE tagging
any release. Bridge registration will 400 against an older coordinator
that doesn't know about the darkbloom-bundle- filename.

* chore: cargo fmt on plist-migration code

Post-rustfmt: long format!() args wrapped, with_context closure pulled
onto one line, ternary-style assignment broken into if/else. No
behavior change — `cargo test --bin darkbloom` still 303 pass / 0 fail.

---------

Co-authored-by: ethenotethan <42627790+ethenotethan@users.noreply.github.com>
Co-authored-by: anupsv <6407789+anupsv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: hankbob <hankbobtheresearchoor@gmail.com>
Co-authored-by: Hank Bob <hankbob@researchoors.com>


Development

Successfully merging this pull request may close these issues.

Error responses missing param and code fields — OpenAI SDK error handling broken
