
Persistent Secure Enclave key with keychain access group enforcement #146

Merged
Gajesh2007 merged 5 commits into swift-provider from persistent-se-key on May 14, 2026

Conversation

@Gajesh2007 (Member) commented May 10, 2026

Summary

  • Adds PersistentEnclaveKey — a Secure Enclave P-256 signing key stored in the macOS data protection keychain with team-scoped access group (SLDQ2GJ6TL.io.darkbloom.provider)
  • Only binaries signed by team SLDQ2GJ6TL can access the key — enforced by securityd (the userspace keychain daemon), backed by AMFI code-signature checks. A patched binary re-signed with codesign -s - gets errSecMissingEntitlement
  • Introduces an AttestationSigner protocol so ProviderLoop tries the persistent key first and falls back gracefully to the ephemeral CryptoKit key

Security model

The key is created once via SecKeyCreateRandomKey with kSecAttrTokenIDSecureEnclave + kSecAttrIsPermanent: true and persists in the data protection keychain. The private key never leaves the Secure Enclave hardware — SecKeyCopyExternalRepresentation fails by design.
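As a rough sketch of that creation path (not the PR's exact code; attribute names are from Apple's Security framework, the access group is this PR's):

```swift
import Security

// Sketch: a P-256 key generated inside the Secure Enclave and persisted
// in the data protection keychain, scoped to the team's access group.
let access = SecAccessControlCreateWithFlags(
    kCFAllocatorDefault,
    kSecAttrAccessibleWhenUnlockedThisDeviceOnly,
    .privateKeyUsage,
    nil
)!

let attributes: [String: Any] = [
    kSecAttrKeyType as String: kSecAttrKeyTypeECSECPrimeRandom, // P-256
    kSecAttrKeySizeInBits as String: 256,
    kSecAttrTokenID as String: kSecAttrTokenIDSecureEnclave,
    kSecUseDataProtectionKeychain as String: true,
    kSecPrivateKeyAttrs as String: [
        kSecAttrIsPermanent as String: true,
        kSecAttrAccessControl as String: access,
        kSecAttrAccessGroup as String: "SLDQ2GJ6TL.io.darkbloom.provider",
    ] as [String: Any],
]

var error: Unmanaged<CFError>?
guard let key = SecKeyCreateRandomKey(attributes as CFDictionary, &error) else {
    fatalError("SE key creation failed: \(error!.takeRetainedValue())")
}
// SecKeyCopyExternalRepresentation(key, &error) fails here by design:
// the private half never leaves the enclave.
```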

Access enforcement chain:

Secure Boot → kernel → AMFI (code signing) → securityd (keychain ACL)
  → access group check: caller must be signed by team SLDQ2GJ6TL
  → Secure Enclave: signs only if securityd approves

Verified via POC — 16 attack vectors tested:

  • Ad-hoc signed binary: BLOCKED
  • Different team ID: BLOCKED
  • Root + security CLI: BLOCKED
  • Raw keychain DB extraction: encrypted blob only
  • Private key export: hardware enforced

Required for production

A Developer ID provisioning profile from Apple Developer Portal authorizing the keychain-access-groups entitlement for the provider's App ID. Without it, the code gracefully falls back to the existing ephemeral CryptoKit key.

Test plan

  • swift build passes
  • swift test — 67/67 tests pass (8 new persistent enclave key tests)
  • POC verified: key creation, persistence, signing, access denial for ad-hoc/cross-team binaries
  • Production test with Eigen Labs Developer ID provisioning profile

🤖 Generated with Claude Code

…oup enforcement

Replace ephemeral CryptoKit SE keys with persistent Security framework keys stored
in the macOS data protection keychain. The key is bound to the signing team's
keychain access group (SLDQ2GJ6TL.io.darkbloom.provider), enforced by securityd
against the caller's code signature. A patched binary re-signed with codesign -s - gets
errSecMissingEntitlement and cannot access the key.

- PersistentEnclaveKey: Security framework SE key with SecKeyCreateRandomKey,
  kSecAttrIsPermanent, and team-scoped access group
- AttestationSigner protocol: abstracts over both ephemeral and persistent keys
- ProviderLoop: tries persistent key first, falls back to ephemeral with warning
- Entitlements plist with keychain-access-groups for production signing
- 8 tests covering creation, persistence, signing, deletion, protocol conformance
@vercel

vercel Bot commented May 10, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
d-inference Ready Ready Preview May 14, 2026 9:45pm
d-inference-console-ui-dev Ready Ready Preview May 14, 2026 9:45pm
d-inference-landing Ready Ready Preview May 14, 2026 9:45pm


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3e3635753e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +282 to +284
if #available(macOS 13.0, *) {
return true
}

P2: Gate persistent-key path on actual SE availability

PersistentEnclaveKey.isAvailable currently returns true for any non-simulator macOS 13+ host, which does not actually verify Secure Enclave support. On environments like macOS VMs or unsupported hardware, callers will enter the persistent-key flow and then fail later with keychain/SecKey errors instead of cleanly treating SE as unavailable; this also makes the new tests' availability guards unreliable. Use a real probe (for example SecureEnclave.isAvailable or a Security-framework capability check) so this flag reflects hardware capability.

Useful? React with 👍 / 👎.
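A hedged sketch of the suggested probe (the enclosing type name here is illustrative):

```swift
import CryptoKit

enum SEAvailability {
    /// Reflects actual Secure Enclave capability, not just OS version:
    /// CryptoKit's SecureEnclave.isAvailable returns false on hosts
    /// without SE hardware (e.g. VMs without a virtualized enclave).
    static var isAvailable: Bool {
        #if targetEnvironment(simulator)
        return false
        #else
        guard #available(macOS 13.0, *) else { return false }
        return SecureEnclave.isAvailable
        #endif
    }
}
```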

@hankbobtheresearchoor (Contributor) left a comment

Review Summary

Clean abstraction and good graceful fallback logic. Two real issues: the entitlements plist has the wrong key name (🔴), and the loadOrCreate error handling should be tighter (🟡). Several observations about the security model implications of persistent identity.

1 🔴 Must Fix · 1 🟡 Should Fix · 4 🔵 Observations

<key>com.apple.security.network.server</key>
<true/>
<key>keychain-access-groups</key>
<array>

🔴 Wrong entitlement key name. The key should be com.apple.security.keychain-access-groups, not keychain-access-groups. Apple's hardened runtime entitlements all live under the com.apple.security.* namespace.

The existing scripts/entitlements.plist already has this right:

<key>com.apple.security.keychain-access-groups</key>

With the bare keychain-access-groups, codesign --entitlements will embed an entitlement that securityd doesn't recognize, and keychain access will fail with -34018 (errSecMissingEntitlement).

Additionally, this file is a duplicate of scripts/entitlements.plist (which CI already references). Either fix the key name and update release-swift.yml to use this file instead, or remove this file and keep the single source of truth in scripts/.
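For reference, the corrected fragment would read (assuming the array contents stay the same):

```xml
<key>com.apple.security.keychain-access-groups</key>
<array>
    <string>SLDQ2GJ6TL.io.darkbloom.provider</string>
</array>
```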

let keyLabel = label ?? defaultLabel

if let existing = try? findExisting(accessGroup: group, label: keyLabel) {
logger.info("Loaded existing persistent Secure Enclave key")

🟡 Silent fallthrough on transient keychain errors. try? swallows all errors from findExisting, including transient ones like errSecAuthFailed (keychain locked) or errSecDeviceUnavailable. These should probably be surfaced rather than masked by a fallthrough to createNew, which then races with the existing key.

Suggest:

if let existing = try? findExisting(accessGroup: group, label: keyLabel) {
    return existing
}
// Only fall through on not-found, propagate entitlement/auth errors

Or catch PersistentEnclaveKeyError.keyLookupFailed(status: errSecItemNotFound) explicitly and re-throw everything else.

public static let defaultAccessGroup = "SLDQ2GJ6TL.io.darkbloom.provider"

public static let defaultLabel = "io.darkbloom.provider.attestation-signing.v1"

The reason will be displayed to describe this comment to others. Learn more.

🔵 No rotation path. The v1 suffix in defaultLabel implies future versions, but there's no mechanism to discover or migrate to v2. If the key is compromised or needs rotation:

  1. Old key stays orphaned in keychain (no cleanup)
  2. Coordinator still has the old SE pubkey stored
  3. New provider instances create v1 keys with different pubkeys, but the label is the same so they'd overwrite

Consider adding a rotateKey() static method or a darkbloom verify --rotate-key CLI command, and document the coordinator-side SE pubkey update flow.
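A minimal sketch of what such a helper might look like (rotateKey and the bumped-label scheme are hypothetical, not part of this PR):

```swift
import Security

enum KeyRotation {
    /// Hypothetical: delete the old labeled key so the next
    /// loadOrCreate() can mint a fresh one under a bumped label
    /// (e.g. ...attestation-signing.v2). The coordinator-side
    /// SE pubkey update is a separate, out-of-band step.
    static func rotateKey(accessGroup: String, oldLabel: String) throws {
        let query: [String: Any] = [
            kSecClass as String: kSecClassKey,
            kSecAttrLabel as String: oldLabel,
            kSecAttrAccessGroup as String: accessGroup,
            kSecUseDataProtectionKeychain as String: true,
        ]
        let status = SecItemDelete(query as CFDictionary)
        guard status == errSecSuccess || status == errSecItemNotFound else {
            throw NSError(domain: NSOSStatusErrorDomain, code: Int(status))
        }
    }
}
```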


/// Whether the Secure Enclave is available on this device.
public static var isAvailable: Bool {
#if targetEnvironment(simulator)

🔵 isAvailable doesn't check entitlement. Returns true on any macOS 13+ Apple Silicon machine, but persistent SE key also requires the keychain-access-groups entitlement. Callers who check isAvailable before calling loadOrCreate() will get a misleading true on unsigned debug builds, then hit -34018.

This is consistent with SecureEnclaveIdentity.isAvailable (which also doesn't check entitlements), so not blocking — but worth a doc comment noting the entitlement dependency:

/// - Note: Also requires `com.apple.security.keychain-access-groups` entitlement
///   in the binary's code signature. Without it, `loadOrCreate()` will throw
///   `PersistentEnclaveKeyError.missingEntitlement`.

@hankbobtheresearchoor (Contributor)

Security model observations (not line-anchored)

Persistent identity tradeoff: This PR changes the SE key from ephemeral (session-scoped) to persistent (survives restarts). Previously, if userspace was compromised between attestation checkpoints, the damage was limited to one session — the key was gone on restart. With persistent keys, the same compromised key can sign attestation challenges across restarts until explicitly deleted. The PR body should acknowledge this tradeoff explicitly. The access-group enforcement mitigates this (only signed binaries can use the key), but it doesn't eliminate the window.

Coordinator-side implications: The coordinator already stores SEPublicKey in Postgres and builds a lookup map "sekey:" + SEPublicKey. With persistent keys, the same SE public key appears across reconnects, enabling identity correlation that wasn't possible with ephemeral keys. The coordinator doesn't currently USE SE keys for identity binding (it uses serial number for DisconnectDuplicatesBySerial), but this PR creates the prerequisites. The PR should explicitly state whether SE-key-based identity tracking is intended, to avoid implicit feature drift.

DARKBLOOM_KEYCHAIN_ACCESS_GROUP env var: Fine for dev/testing, but worth a code comment marking it as non-production. In production, the access group should always come from the entitlement — env var overrides could allow pointing at an attacker-controlled keychain group (though the entitlement still gates access).

@github-actions

github-actions Bot commented May 10, 2026

Benchmark Results

Runner: macos-15 (M1 Virtual) | Date: 2026-05-14 21:51 UTC

1-provider-streaming

1 providers, 1 users, 30 requests, concurrency=5, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 1 0.5 GB
Metric Value
Total Requests 30
Success 30
Errors 0
Total Duration 12.673s
Throughput 2.4 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 30 917ms 11ms 3.774s 8.256s
parse 30 39µs 18µs 182µs 232µs
reserve 30 3ms 1ms 9ms 12ms
route 30 407ms 0s 930ms 8.215s
queue_wait 9 1.356s 495ms 8.215s 8.215s
encrypt 30 196µs 145µs 425µs 452µs
dispatch 30 44µs 25µs 194µs 271µs
coordinator_to_provider 30 505ms 4ms 3.76s 3.763s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=38.533µs (threshold=1ms)
parse:p95<=5ms PASS p95=182µs (threshold=5ms)
reserve:mean<=50ms PASS mean=2.6149ms (threshold=50ms)
reserve:p95<=200ms PASS p95=9.095ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=195.566µs (threshold=5ms)
encrypt:p95<=50ms PASS p95=425µs (threshold=50ms)
dispatch:mean<=5ms PASS mean=43.6µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=194µs (threshold=50ms)

1-provider-non-streaming

1 providers, 1 users, 20 requests, concurrency=5, streaming=false

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 1 0.5 GB
Metric Value
Total Requests 20
Success 20
Errors 0
Total Duration 5.913s
Throughput 3.4 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 20 1.447s 613ms 4.38s 4.38s
parse 20 23µs 17µs 81µs 81µs
reserve 20 2ms 1ms 5ms 5ms
route 20 263ms 0s 3.991s 3.991s
queue_wait 4 1.317s 474ms 3.991s 3.991s
encrypt 20 143µs 138µs 191µs 191µs
dispatch 20 21µs 18µs 57µs 57µs
coordinator_to_provider 20 659ms 3ms 3.284s 3.284s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=23.15µs (threshold=1ms)
parse:p95<=5ms PASS p95=81µs (threshold=5ms)
reserve:mean<=50ms PASS mean=1.6772ms (threshold=50ms)
reserve:p95<=200ms PASS p95=5.42ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=143.4µs (threshold=5ms)
encrypt:p95<=50ms PASS p95=191µs (threshold=50ms)
dispatch:mean<=5ms PASS mean=21.45µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=57µs (threshold=50ms)

7-provider-multi-model

7 providers, 5 users, 50 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 4 0.5 GB
mlx-community/gemma-3-270m-4bit 3 0.2 GB
Metric Value
Total Requests 50
Success 48
Errors 2
Total Duration 1m5.791s
Throughput 0.7 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 48 4.861s 31ms 33.875s 34.306s
parse 48 70µs 41µs 232µs 266µs
reserve 48 5ms 3ms 20ms 26ms
route 48 630ms 0s 4.253s 10.004s
queue_wait 7 2.89s 3.444s 4.456s 4.456s
encrypt 48 0s 0s 0s 1ms
dispatch 48 65µs 46µs 172µs 327µs
coordinator_to_provider 48 4.221s 10ms 33.859s 34.294s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=69.833µs (threshold=1ms)
parse:p95<=5ms PASS p95=232µs (threshold=5ms)
reserve:mean<=50ms PASS mean=4.814333ms (threshold=50ms)
reserve:p95<=200ms PASS p95=19.857ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=240.562µs (threshold=5ms)
encrypt:p95<=50ms PASS p95=398µs (threshold=50ms)
dispatch:mean<=5ms PASS mean=65.02µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=172µs (threshold=50ms)

3-provider-high-concurrency

3 providers, 10 users, 60 requests, concurrency=20, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 3 0.5 GB
Metric Value
Total Requests 60
Success 60
Errors 0
Total Duration 13.335s
Throughput 4.5 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 60 2.632s 1.404s 8.362s 8.433s
parse 60 0s 0s 0s 6ms
reserve 60 12ms 4ms 43ms 50ms
route 60 1.63s 848ms 8.246s 8.322s
queue_wait 42 2.329s 1.186s 8.246s 8.322s
encrypt 60 0s 0s 1ms 2ms
dispatch 60 47µs 27µs 194µs 273µs
coordinator_to_provider 60 973ms 18ms 4.864s 4.894s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=170.233µs (threshold=1ms)
parse:p95<=5ms PASS p95=293µs (threshold=5ms)
reserve:mean<=50ms PASS mean=11.6219ms (threshold=50ms)
reserve:p95<=200ms PASS p95=42.603ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=245.433µs (threshold=5ms)
encrypt:p95<=50ms PASS p95=561µs (threshold=50ms)
dispatch:mean<=5ms PASS mean=46.5µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=194µs (threshold=50ms)

1-provider-queue-saturation

1 providers, 10 users, 40 requests, concurrency=15, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 1 0.5 GB
Metric Value
Total Requests 40
Success 40
Errors 0
Total Duration 11.263s
Throughput 3.6 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 40 3.226s 2.77s 5.865s 5.865s
parse 40 72µs 26µs 297µs 682µs
reserve 40 13ms 2ms 47ms 54ms
route 40 2.802s 2.632s 5.768s 5.77s
queue_wait 35 3.203s 2.636s 5.768s 5.77s
encrypt 40 174µs 148µs 338µs 395µs
dispatch 40 32µs 24µs 105µs 160µs
coordinator_to_provider 40 395ms 3ms 3.918s 3.919s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=72.45µs (threshold=1ms)
parse:p95<=5ms PASS p95=297µs (threshold=5ms)
reserve:mean<=50ms PASS mean=12.659025ms (threshold=50ms)
reserve:p95<=200ms PASS p95=47.095ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=174.3µs (threshold=5ms)
encrypt:p95<=50ms PASS p95=338µs (threshold=50ms)
dispatch:mean<=5ms PASS mean=31.5µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=105µs (threshold=50ms)

3-provider-20-users

3 providers, 20 users, 60 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 3 0.5 GB
Metric Value
Total Requests 60
Success 60
Errors 0
Total Duration 12.126s
Throughput 4.9 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 60 843ms 8ms 4.968s 4.969s
parse 60 32µs 27µs 73µs 201µs
reserve 60 4ms 2ms 21ms 23ms
route 60 11ms 0s 0s 514ms
queue_wait 2 319ms 514ms 514ms 514ms
encrypt 60 172µs 141µs 404µs 702µs
dispatch 60 35µs 27µs 116µs 161µs
coordinator_to_provider 60 824ms 3ms 4.946s 4.949s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=31.65µs (threshold=1ms)
parse:p95<=5ms PASS p95=73µs (threshold=5ms)
reserve:mean<=50ms PASS mean=4.006433ms (threshold=50ms)
reserve:p95<=200ms PASS p95=20.584ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=171.766µs (threshold=5ms)
encrypt:p95<=50ms PASS p95=404µs (threshold=50ms)
dispatch:mean<=5ms PASS mean=35.133µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=116µs (threshold=50ms)

1-provider-scaling

1 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 1 0.5 GB
Metric Value
Total Requests 30
Success 30
Errors 0
Total Duration 10.714s
Throughput 2.8 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 30 2.599s 1.888s 6.019s 6.02s
parse 30 35µs 28µs 78µs 82µs
reserve 30 4ms 2ms 12ms 13ms
route 30 2.047s 1.866s 5.993s 5.993s
queue_wait 23 2.671s 1.881s 5.993s 5.993s
encrypt 30 249µs 167µs 577µs 666µs
dispatch 30 44µs 33µs 90µs 177µs
coordinator_to_provider 30 543ms 5ms 4.038s 4.039s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=35.233µs (threshold=1ms)
parse:p95<=5ms PASS p95=78µs (threshold=5ms)
reserve:mean<=50ms PASS mean=3.8423ms (threshold=50ms)
reserve:p95<=200ms PASS p95=11.872ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=249.466µs (threshold=5ms)
encrypt:p95<=50ms PASS p95=577µs (threshold=50ms)
dispatch:mean<=5ms PASS mean=43.566µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=90µs (threshold=50ms)

3-provider-scaling

3 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 3 0.5 GB
Metric Value
Total Requests 30
Success 30
Errors 0
Total Duration 8.408s
Throughput 3.6 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 30 1.562s 30ms 4.731s 4.735s
parse 30 63µs 32µs 211µs 534µs
reserve 30 13ms 11ms 41ms 47ms
route 30 42µs 22µs 202µs 315µs
encrypt 30 0s 0s 0s 2ms
dispatch 30 90µs 40µs 315µs 904µs
coordinator_to_provider 30 1.53s 14ms 4.636s 4.664s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=63.033µs (threshold=1ms)
parse:p95<=5ms PASS p95=211µs (threshold=5ms)
reserve:mean<=50ms PASS mean=13.2937ms (threshold=50ms)
reserve:p95<=200ms PASS p95=41.403ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=255.466µs (threshold=5ms)
encrypt:p95<=50ms PASS p95=441µs (threshold=50ms)
dispatch:mean<=5ms PASS mean=89.8µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=315µs (threshold=50ms)

5-provider-scaling

5 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 5 0.5 GB
Metric Value
Total Requests 30
Success 30
Errors 0
Total Duration 17.95s
Throughput 1.7 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 30 3.725s 11ms 11.211s 11.212s
parse 30 53µs 36µs 174µs 235µs
reserve 30 8ms 2ms 35ms 36ms
route 30 1.667s 0s 10.003s 10.003s
encrypt 30 172µs 164µs 272µs 386µs
dispatch 30 68µs 39µs 303µs 383µs
coordinator_to_provider 30 2.045s 4ms 11.185s 11.198s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=52.6µs (threshold=1ms)
parse:p95<=5ms PASS p95=174µs (threshold=5ms)
reserve:mean<=50ms PASS mean=7.577133ms (threshold=50ms)
reserve:p95<=200ms PASS p95=34.816ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=172.4µs (threshold=5ms)
encrypt:p95<=50ms PASS p95=272µs (threshold=50ms)
dispatch:mean<=5ms PASS mean=67.966µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=303µs (threshold=50ms)

3-provider-heavy-100conc-10kb

3 providers, 20 users, 100 requests, concurrency=100, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 3 0.5 GB
Metric Value
Total Requests 100
Success 100
Errors 0
Total Duration 15.481s
Throughput 6.5 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 100 10.317s 10.36s 14.514s 15.03s
parse 100 0s 0s 1ms 1ms
reserve 100 49ms 56ms 61ms 66ms
route 100 9.567s 10.237s 14.388s 14.901s
queue_wait 88 10.872s 10.93s 14.388s 14.902s
encrypt 100 269µs 229µs 569µs 872µs
dispatch 100 0s 0s 0s 2ms
coordinator_to_provider 100 640ms 7ms 5.413s 5.443s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=227.52µs (threshold=1ms)
parse:p95<=5ms PASS p95=1.051ms (threshold=5ms)
reserve:mean<=50ms PASS mean=49.01349ms (threshold=50ms)
reserve:p95<=200ms PASS p95=61.455ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=268.76µs (threshold=5ms)
encrypt:p95<=50ms PASS p95=569µs (threshold=50ms)
dispatch:mean<=5ms PASS mean=99.03µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=227µs (threshold=50ms)

The data protection keychain requires a provisioning profile to authorize
the keychain-access-groups entitlement. Wrap the CLI binaries in a minimal
Darkbloom.app bundle with embedded.provisionprofile so the persistent SE
attestation key works on provider machines.

- release-swift.yml: new step decodes PROVISIONING_PROFILE_BASE64 secret,
  builds Darkbloom.app/Contents/ structure, signs bundle + individual binaries
- install.sh: detects .app bundle layout, symlinks bin/ into the app bundle
- Backward-compatible: falls back gracefully if secret is not set or if
  provider receives a flat (pre-.app) bundle

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fdb113fa70


Comment thread coordinator/api/install.sh Outdated
@@ -22,7 +22,7 @@ set -euo pipefail
# Direct-fetch copy: no serve-time templating applied. Override with
# curl ... | COORD_URL=https://api.dev.darkbloom.xyz bash
# Or fetch the coordinator-served copy at $COORD_URL/install.sh for templating.
COORD_URL="${COORD_URL:-__DARKBLOOM_COORD_URL__}"
COORD_URL="${COORD_URL:-https://api.darkbloom.dev}"

P1: Restore coordinator URL placeholder in embedded installer

Reintroduce __DARKBLOOM_COORD_URL__ here instead of hardcoding https://api.darkbloom.dev, because /install.sh is still templated at serve time in server.go via strings.ReplaceAll(..., installScriptPlaceholder, s.resolveBaseURL(r)). With the placeholder removed, any dev/self-hosted coordinator now serves an installer that always talks to production, so release lookup and subsequent install traffic are routed to the wrong environment unless users manually override COORD_URL.

Useful? React with 👍 / 👎.
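The placeholder-plus-fallback pattern being asked for can be sketched as (placeholder name and fallback URL are from the diff above):

```shell
# Sketch of the fix Codex describes: keep the serve-time placeholder so
# server.go can substitute the coordinator's own base URL, and fall back
# to production only when the placeholder was never templated (i.e. the
# script was fetched raw from the repo, not served by a coordinator).
unset COORD_URL   # start clean for this standalone sketch; the real
                  # script honors a caller-provided COORD_URL override
COORD_URL="${COORD_URL:-__DARKBLOOM_COORD_URL__}"
case "$COORD_URL" in
  __DARKBLOOM_COORD_URL__) COORD_URL="https://api.darkbloom.dev" ;;
esac
```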

Required for data protection keychain access. Must match the bundle ID
in the provisioning profile (SLDQ2GJ6TL.io.darkbloom.provider).

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4f404c1aef


Comment on lines +180 to +185
let privateKeyAttrs: [String: Any] = [
kSecAttrIsPermanent as String: true,
kSecAttrAccessControl as String: accessControl,
kSecAttrLabel as String: label,
kSecAttrAccessGroup as String: accessGroup,
]

P1: Set data-protection keychain flag on persistent key ops

Add kSecUseDataProtectionKeychain: true to the persistent-key create/query/delete dictionaries; on macOS, kSecAttrAccessGroup and kSecAttrAccessible* only apply in data-protection-keychain mode. As written, the access-group constraint can be ignored, so the key may be created/looked up in the wrong keychain class and the intended team-scoped enforcement can silently fail (especially when rotating access groups or relying on entitlement-gated access behavior).

Useful? React with 👍 / 👎.
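Concretely, each create/query/delete dictionary would gain the flag (a sketch; label and group values are this PR's):

```swift
import Security

// Without kSecUseDataProtectionKeychain, macOS routes the query to the
// legacy file-based keychain, where kSecAttrAccessGroup is ignored.
let query: [String: Any] = [
    kSecClass as String: kSecClassKey,
    kSecAttrLabel as String: "io.darkbloom.provider.attestation-signing.v1",
    kSecAttrAccessGroup as String: "SLDQ2GJ6TL.io.darkbloom.provider",
    kSecUseDataProtectionKeychain as String: true, // the missing flag
    kSecReturnRef as String: true,
]
```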

…, real SE probe

Codex P1 / hank P1:
- coordinator/api/install.sh: restore __DARKBLOOM_COORD_URL__ placeholder
  (the coordinator templates this at serve time via server.go;
  hardcoding the URL broke dev/self-hosted coordinators)
- PersistentEnclaveKey: add kSecUseDataProtectionKeychain: true to all
  Security framework calls. Without it, queries may hit the legacy
  file-based keychain where access group enforcement is silently ignored.

hank P2:
- loadOrCreate: catch only errSecItemNotFound before falling through to
  createNew. Auth failures, locked keychain, and missing entitlement
  now propagate to the caller instead of racing with key creation.
- isAvailable: probe real SE capability via CryptoKit's
  SecureEnclave.isAvailable instead of just checking macOS version.
  Now returns false on Intel Macs without T2 and macOS VMs without
  virtualized SE. Added doc comment noting the entitlement dependency.
@Gajesh2007 (Member, Author)

@codex can you review this?


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 84664d04ba


Comment thread scripts/install.sh
# The bundle ships as Darkbloom.app/ (contains provisioning profile for
# keychain-access-groups) with bin/ symlinks for backward compatibility.
# Older flat bundles (bin/darkbloom directly) are also handled.
tar xzf "$TARBALL" -C "$INSTALL_DIR"

P2: Remove existing app bundle before extracting update

The installer now untars directly into ~/.darkbloom and then reuses Darkbloom.app in place, but it never deletes a previous Darkbloom.app first. If a future release removes or renames files, stale files can remain inside the bundle, which can invalidate sealed app resources (unsealed contents) and cause signature/entitlement checks to fail at runtime. This is especially risky here because the new persistent-key path depends on valid app signing + embedded provisioning profile; stale bundle contents can force unexpected fallback or launch failures on upgrade.

Useful? React with 👍 / 👎.
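The fix can be sketched as a small helper (function name is illustrative; the Darkbloom.app path follows the review text):

```shell
# Sketch of the suggested fix: remove any previous bundle before
# untarring, so stale files inside Darkbloom.app can't unseal the
# code signature and break entitlement checks on upgrade.
install_bundle() {
    local tarball="$1" install_dir="$2"
    rm -rf "$install_dir/Darkbloom.app"   # drop the old sealed bundle whole
    mkdir -p "$install_dir"
    tar xzf "$tarball" -C "$install_dir"
}
```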

@Gajesh2007 Gajesh2007 merged commit 4a0dae5 into swift-provider May 14, 2026
10 of 11 checks passed
Gajesh2007 added a commit that referenced this pull request May 15, 2026
Twelve fixes informed by three reviewer subagents (codex-rescue,
independent Claude, full pipeline audit) to ensure the bridge release
→ Swift release cutover works on first try, with no silent breakage.

Coordinator:
- accept darkbloom-bundle-<platform>.tar.gz (was eigeninference-bundle-)
- restore TestProviderRegistrationWithoutAttestationRejectedWhenBinaryHashPolicyConfigured
  (dropped during the master→swift-provider merge)

release-swift.yml:
- ship bin/{darkbloom,darkbloom-enclave,mlx.metallib} as real-file copies
  (was symlinks) so coordinator's tar.TypeReg verifier accepts them and
  hashes the actual bytes
- staple both bin/ AND .app/Contents/MacOS/ paths now that they're
  independent files
- post-codesign verification: fail build if signed CLI is missing the
  keychain-access-groups entitlement or the access group
  SLDQ2GJ6TL.io.darkbloom.provider, or if embedded.provisionprofile
  is absent from the .app
- PROVISIONING_PROFILE_BASE64 is now hard-required (no silent ephemeral
  fallback). Profile is decoded + parsed with plutil/python: verifies
  TeamIdentifier, keychain-access-groups, application-identifier, and
  ExpirationDate >= 30 days out
- pin MLX python wheel to 0.31.1 to match libs/mlx-swift Cmlx version
  (was 0.31.2 — patch-level metallib ABI risk)
- prod releases now hard-fail Swift tests (was soft-fail for all)

release-rust-bridge.yml:
- rename bridge bundle to darkbloom-bundle-<platform>.tar.gz uniformly
  so coordinator accepts the registration

Both release workflows:
- PROD_* secrets fall back to legacy unprefixed (R2_ACCESS_KEY_ID,
  RELEASE_KEY, COORDINATOR_URL) + vars.R2_BUCKET when PROD_* empty.
  Fails hard if neither resolves.

provider/src/main.rs (bridge auto-update):
- new rewrite_launchd_plist_for_swift: extracts ProgramArguments from
  the Rust plist (`serve --coordinator URL --model M`), converts to
  Swift shape (`start --foreground --coordinator-url URL --model M`),
  atomic rename
- install_swift_update_bundle_at: if Darkbloom.app/Contents/MacOS/
  exists in the extracted bundle, replace bin/{darkbloom,darkbloom-
  enclave,mlx.metallib} with symlinks into .app/MacOS and route the
  launchd plist's ProgramArguments[0] at the .app's MacOS binary path.
  This puts the embedded provisioning profile in scope at runtime, so
  the persistent SE key (PR #146) doesn't get errSecMissingEntitlement
  on first attestation post-cutover
- plist_path is now an Option<&Path> so tests can avoid touching the
  developer machine's real ~/Library/LaunchAgents

Tests added (all passing):
- 6 plist-rewrite unit tests: extract / convert / rewrite / install-
  with-plist / .app-aware install / hash-only install
- 1 ported coordinator attestation policy test
- existing 7 auto-update integration tests still pass (302 → 303 total)

Verified by audit:
- macos-26-xlarge has Xcode 26.2 / Swift 6.2, satisfies all
  swift-tools-version requirements
- LatestProviderVersion ordering: semver THEN created_at in both memory
  and Postgres stores
- /api/version JSON shape matches what auto_update_check_with_install_dir
  expects
- StartCommand --foreground doesn't recurse into launchAgent.installAndStart
- Swift ModelScanner reads ~/.cache/huggingface/hub (same as Rust)
- AuthTokenStore path parity (~/.darkbloom/auth_token)
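The semver-then-created_at ordering can be pinned with a sketch like the
following (row shape hypothetical; the real comparison lives in the Go
memory and Postgres stores):

```python
def version_sort_key(row):
    """Order by semver components numerically first, created_at as the
    tie-breaker — a plain string sort would rank 0.5.0 above 0.10.0."""
    major, minor, patch = (int(p) for p in row["version"].split("."))
    return (major, minor, patch, row["created_at"])

def latest_provider_version(rows):
    return max(rows, key=version_sort_key)
```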

Deployment prerequisite: coordinator changes must be deployed (master
→ dev Cloud Build, then human ecloud deploy to prod) BEFORE tagging
any release. Bridge registration will 400 against an older coordinator
that doesn't know about the darkbloom-bundle- filename.
Gajesh2007 added a commit that referenced this pull request May 15, 2026
* Clarify provider trust diagnostics

* Add Swift provider runtime

* Remove unused e2e vector generator

* Continuous batching, GPU-only enforcement, rename to darkbloom, Layr-Labs forks

This is the v0.5.0 cutover commit on the Swift provider PR. It lands
true continuous batching as the production inference path, threads
per-row sampling through the request, hard-fails on CPU-only hosts,
renames the user-visible CLI surface from "eigeninference" to
"darkbloom" with backward compatibility, and re-homes the mlx-swift /
mlx-swift-lm submodules to Layr-Labs forks.

Continuous batching (default, no parallel implementations)
----------------------------------------------------------
Replaces the per-request BatchScheduler with one shared BatchGenerator
ported from upstream `mlx_lm.generate`. All concurrent requests are
merged into one batched forward pass per step. Bit-identical against
single-stream greedy on:
  - Qwen3 0.6B-8bit (dense), B=2 / B=4-ragged
  - Qwen3.5 0.8B-MLX-4bit (hybrid SSM + attention), B=2
  - Gemma 4 26B-A4B-it-8bit (MoE, 26 GB), B=2

The mlx-swift-lm side of this work is at
Layr-Labs/mlx-swift-lm@darkbloom-continuous-batching:
  - BatchKVCache + BatchedCache protocol
  - SequenceStateMachine, PromptProcessingBatch, GenerationBatch,
    BatchGenerator
  - RowSamplers (temperature / top-P / top-K / seed)
  - Gemma 4 MoE support + K=V branch fix in Gemma4Attention

Production scheduler in provider-swift/Sources/ProviderCore/Inference/
BatchScheduler.swift wraps the engine in an actor; detached worker
calls into the actor only for short critical sections so cancel/submit
never queue behind a long-running step. submit() builds a per-row
sampler from request.{temperature, top_p, top_k, seed}.

Validation also covers eviction-and-admission: row 0 finishes mid-batch,
row C is admitted into its slot, row C's tokens match a solo run, row B
(running through the eviction) also matches its solo run. This locks in
BatchKVCache.filterBatched + extendBatched correctness end-to-end.

Sampler unit tests cover greedy passthrough, top-K=1 determinism,
top-K masking, top-P collapse-to-dominant, top-P=1 identity, seeded
reproducibility, and different-seed divergence.

GPU-only enforcement
--------------------
ProviderCore/Inference/GPUEnforcement.swift:
  - probeMetal(): non-throwing Metal device probe
  - requireMetal(): throws on missing GPU; pins Device.setDefault(.gpu);
    idempotent

Wired into BatchScheduler.loadModel, StartCommand, BenchmarkCommand,
and `darkbloom doctor`. Doctor surfaces a `[PASS] metal gpu: <name>,
<N> GB working set` line; `[FAIL]` on Intel/Linux. CPU fallback for
inference is rejected up-front with a descriptive error.

Rename: eigeninference → darkbloom (Swift CLI surface)
------------------------------------------------------
Canonical names:
  - eigeninference-enclave  → darkbloom-enclave (binary + struct)
  - Sources/eigeninference-enclave-cli/ → Sources/darkbloom-enclave-cli/
  - SwiftPM target EigenInferenceEnclaveCLI → DarkbloomEnclaveCLI
  - eigeninference-bundle-macos-arm64.tar.gz →
    darkbloom-bundle-macos-arm64.tar.gz
  - ~/.config/eigeninference/ → ~/.config/darkbloom/ (preferred path)
  - Mobileconfig prefix: EigenInference-Enroll-* → Darkbloom-Enroll-*

Backward compatibility:
  - install.sh creates an `eigeninference-enclave` symlink to
    `darkbloom-enclave` so existing install scripts keep resolving.
  - Config loader still reads ~/.config/eigeninference/ and the App
    Support legacy paths as fallbacks; new writes always go to
    ~/.config/darkbloom/.
  - LocalDataCleanup.purge() removes both directories.
  - release-swift.yml publishes the latest tarball under both
    canonical and legacy filenames.
  - NodeKeyPair.legacyDirNames and SecurityHardening MDM-profile-name
    matchers still accept the old name.
  - Coordinator/Rust/UI surfaces (R2 buckets, Stripe descriptors,
    Solana memos, telemetry source attribution) intentionally
    untouched.

CLI subcommands shipped in v0.5.0
---------------------------------
darkbloom serve / start / stop, status, doctor, models {list, catalog,
download, remove}, enroll, unenroll, login, logout, logs, autoupdate,
benchmark, update, verify. start --foreground is the launchd
entrypoint; start --local --port N runs a standalone OpenAI-compatible
HTTP server. PID-file single-instance enforcement, caffeinate-based
sleep prevention, panic-hook telemetry, and metallib hash in
attestation are all wired in.

Submodule re-homing
-------------------
.gitmodules now points to Layr-Labs/mlx-swift and Layr-Labs/mlx-swift-lm.
The mlx-swift pointer is unchanged (clean `main`). The mlx-swift-lm
pointer advances from 3ec4b8a (codex/local-mlx-swift-dependency) to
91612d5 (darkbloom-continuous-batching) which carries the batching
engine + Gemma 4 MoE fork on Layr-Labs/mlx-swift-lm.

Tests
-----
135 / 135 tests pass in 16.5 s with DARKBLOOM_LIVE_MLX_TESTS=1 and
DARKBLOOM_LIVE_MLX_GEMMA=1 (live MLX inference against real models
plus the gated 27 GB Gemma generation test).

* Bump mlx-swift-lm submodule to main after re-homing to Layr-Labs

Layr-Labs/mlx-swift-lm@main now carries the continuous-batching engine,
per-row samplers, and Gemma 4 MoE port at 8d76944. Same tree as the
prior 91612d5 commit on the darkbloom-continuous-batching branch, but
without the local-path mlx-swift dep hack, so the fork is consumable
by URL outside this repo.

* Untrack .claude/ files and drop dangling cross-references

The .claude/ directory holds local agent state (cursor task files,
working notes, the in-progress migration plan). Those don't belong in
the repo. Untrack the two committed markdown files and broaden the
.gitignore from `.claude/worktrees/` to `.claude/` so future agent runs
don't add them back. Strip the dead links to .claude/swift-migration-plan.md
from CLAUDE.md, provider-swift/README.md, docs/ARCHITECTURE.md, and
scripts/fetch-metallib.sh -- the surrounding prose stands on its own.

The local files remain on disk for active reference; only the tracking
is removed.

* Idle-timeout unload + coordinator-driven model preload protocol

Two related additions to the provider's model lifecycle:

1) Idle-timeout unload
----------------------
ProviderLoop now runs a background monitor that polls every minute.
If `idleTimeoutMins` minutes (default 60) have elapsed since the last
inference activity AND no requests are in flight, the loaded
ModelContainer is dropped. The next inference request lazy-reloads.
`idleTimeoutMins == 0` disables the monitor; the model stays
resident forever.

The decision is extracted into `IdleTimeoutPolicy.shouldUnload(...)`
so the rule is unit-testable without spinning up the full ProviderLoop
actor (which depends on Secure Enclave, coordinator client, and
security posture). Five unit tests pin the policy: (a) unloads when
all conditions met, (b) never unloads with inflight requests,
(c) never unloads with no model loaded, (d) waits for the timeout to
elapse, (e) zero-timeout edge case is still defensive.

Activity tracking: `lastInferenceAt` updates on every request
admission and on every request finish (`removeInflightTask`). The
worker is a detached `Task` so cancel/submit on the actor never
queue behind the timer.
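A minimal Python sketch of the decision rule (parameter names
hypothetical; the real policy is IdleTimeoutPolicy.shouldUnload in Swift):

```python
def should_unload(now_s, last_inference_at_s, inflight_count,
                  model_loaded, idle_timeout_mins):
    """Unload only when a model is resident, nothing is in flight,
    the feature is enabled, and the idle window has fully elapsed."""
    if idle_timeout_mins <= 0:
        return False  # 0 disables the monitor; model stays resident
    if not model_loaded or inflight_count > 0:
        return False
    return (now_s - last_inference_at_s) >= idle_timeout_mins * 60
```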

2) Coordinator-driven model preload
-----------------------------------
New WebSocket message `coordinator → provider: load_model`. The
provider has no inbound listener (security: a discovered IP can't
reach the GPU), so the coordinator pushes preload requests over the
existing outbound WebSocket connection that the provider opened.
Use case: the coordinator predicts demand for model X on machine Y
in the next hour and warms it ahead of time.

Provider behavior:
  - If model is already loaded: short-circuit, reply succeeded.
  - Otherwise: emit `load_model_status` "started" immediately,
    kick off `ensureModelLoaded` in a detached Task, then emit
    "succeeded" or "failed" (with an error string) when the load
    settles.

Wire surface added in three places (per AGENTS.md sync rule):
  - coordinator/internal/protocol/messages.go: `TypeLoadModel`,
    `TypeLoadModelStatus`, `LoadModelMessage`, `LoadModelStatusMessage`,
    plus the `LoadModelStatusStarted/Succeeded/Failed` constants.
  - provider-swift/.../Protocol/Messages.swift: new
    `CoordinatorMessage.loadModel(...)` case + `ProviderMessage
    .loadModelStatus(...)` case + Codable on both sides.
  - provider-swift/.../Coordinator/CoordinatorClient.swift: dispatch
    inbound `load_model` to a new `CoordinatorEvent.loadModel(modelId)`
    and add `OutboundMessage.loadModelStatus(...)` for the reply.

ProviderLoop wires `handleLoadModelRequest(modelId:send:)` for the
new event. Round-trip tests cover decoding a Go-style `load_model`
JSON and encoding all three lifecycle status replies (started /
succeeded / failed-with-error) with snake_case wire keys.

Rust legacy provider intentionally untouched. The coordinator
should gate `load_model` dispatch on `backend == "mlx-swift"` so
the Rust path never receives an unknown message; that gate lives
on the coordinator side and isn't part of this commit.
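The wire round-trip can be sketched as follows (Python stand-in; the
exact snake_case keys — model_id, load_model_status, the three status
strings — are assumed from the names in this commit):

```python
import json

def decode_coordinator_message(raw):
    """Decode an inbound coordinator frame; only the new load_model
    case is handled in this sketch."""
    msg = json.loads(raw)
    if msg["type"] == "load_model":
        return ("load_model", msg["model_id"])
    raise ValueError(f"unhandled message type: {msg['type']!r}")

def encode_load_model_status(status, error=None):
    """Encode the provider's lifecycle reply: started / succeeded /
    failed (failed carries an error string)."""
    assert status in ("started", "succeeded", "failed")
    body = {"type": "load_model_status", "status": status}
    if error is not None:
        body["error"] = error
    return json.dumps(body)
```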

Tests
-----
141 / 141 tests pass with DARKBLOOM_LIVE_MLX_TESTS=1 and
DARKBLOOM_LIVE_MLX_GEMMA=1 (live MLX inference + Gemma 4 26B-A4B-it-8bit
MoE batching included). New: 5 IdleTimeoutPolicy tests + 1
loadModel round-trip protocol test.

* Add end-to-end performance tests: TTFT, encryption, batching, model load

Four new live tests that produce reproducible numbers for the four
scenarios the operator asked about. Gated by DARKBLOOM_LIVE_MLX_TESTS=1;
all four target Qwen3 0.6B-8bit so the suite finishes in ~7 s.

  A) warm TTFT baseline -- pure inference TTFT with no encryption
     and the model already loaded.
  B) cold TTFT          -- spins up a fresh ModelContainer each
     iteration so the weights are re-paged from disk; reports
     load_time and load_time + first_token separately.
  C) encrypted TTFT     -- runs the request body through
     NodeKeyPair.encrypt (consumer side) and NodeKeyPair.decrypt
     (provider side) with real libsodium NaCl box, then submits.
     Reports encrypt-only, decrypt-only, warm TTFT, and total
     E2E first-token (enc + dec + TTFT) so each layer's cost is
     visible.
  D) batched TTFT       -- B=1, B=2, B=4 concurrent submissions on
     a single shared scheduler. Reports per-row TTFT and aggregate
     throughput so the continuous-batching scaling story is honest.

Headline numbers on M4 Max with Qwen3 0.6B-8bit:

  warm TTFT (plaintext):             ~20 ms
  encrypt (consumer side):           ~0.05 ms (libsodium NaCl box)
  decrypt (provider side):           ~0.02 ms
  E2E first-token (enc+dec+TTFT):    ~31 ms
  cold model load:                   ~856 ms
  cold load + first token:           ~1036 ms
  aggregate throughput B=1:          87.4 tok/s
  aggregate throughput B=2:          176.2 tok/s   (~2.0x)
  aggregate throughput B=4:          317.1 tok/s   (~3.6x)
  per-request TTFT B=1 -> B=4:       34 ms -> 36 ms (flat)

Encryption is essentially free, continuous batching scales
near-linearly to B=4, and per-request TTFT is invariant under
batching -- the key continuous-batching scheduler invariant.

The tests assert lower-bound liveness (durations > 0, all rows
complete) but don't pin absolute latencies, since those vary by
hardware. Numbers print to stderr in a "[perf]" prefix so they
land in the test log without polluting test stdout.

While here, fixed a `String(format:)` bug in the printRow helper
where `%s` (which expects a C string pointer) was used with a Swift
String — it would have segfaulted the test process via
_platform_strlen on an unaligned pointer.

145 / 145 tests pass in 9 s with DARKBLOOM_LIVE_MLX_TESTS=1.

* Add Gemma 4 26B-A4B-it-8bit MoE tier to performance suite

Refactor PerformanceLiveTests so every scenario (warm TTFT, cold load,
encrypted E2E, batched throughput) is parameterised by a `ModelConfig`
struct (label, modelID, wired-memory budget, iteration counts, batch
sizes, max_tokens). Two configs ship in the suite:

  - Qwen3 0.6B-8bit         smoke tier (DARKBLOOM_LIVE_MLX_TESTS=1)
  - Gemma 4 26B-A4B-it-8bit production tier
                            (DARKBLOOM_LIVE_MLX_TESTS=1 +
                             DARKBLOOM_LIVE_MLX_GEMMA=1)

Both run all four scenarios. Total 8 @test methods (4 + 4).

Headline numbers on M4 Max with weights memory-mapped from local cache:

  Gemma 26B MoE:
    warm TTFT                     309 ms
    cold load                     2.63 s
    cold load + first token       3.07 s
    encrypt (consumer side)       0.05 ms
    decrypt (provider side)       0.03 ms
    E2E first-token               262 ms
    B=1 throughput                10.2 tok/s
    B=2 throughput                16.7 tok/s   (1.64x)
    B=4 throughput                23.9 tok/s   (2.34x)

  Qwen3 0.6B (for comparison):
    warm TTFT                     ~21 ms
    cold load                     ~887 ms
    E2E first-token               ~32 ms
    B=4 throughput                ~302 tok/s

Three things the Gemma tier surfaces that the smoke tier doesn't:

1. Encryption is *still* essentially free at 26B scale -- 70-80 us
   combined for encrypt + decrypt, dwarfed by the 200+ ms
   memory-bandwidth-bound prefill.
2. Per-row TTFT scales SUB-linearly with B for MoE (234 -> 344 -> 603
   ms at B=1/2/4) because each batched prefill processes a heavier
   forward. Aggregate throughput still wins (10 -> 17 -> 24 tok/s).
3. Cold load on a 26 GB MoE that's still in the OS page cache is
   ~2.6 s -- the relevant number for the idle-timeout-reload path.
   First-ever boot would be longer (NVMe-bound), but unmeasurable
   from a unit test without privileged page-cache flushing.

Also tighten the report formatting: column padding to 56 chars, "ms"
under 1 s and "s" above, max_tokens=8 for Gemma (vs 16 for Qwen) so
the suite finishes in ~30 s with all four scenarios run twice.

149 / 149 tests pass in 37 s with both env vars set.

* Performance audit vs mlx_lm: bracket the dispatch-overhead gap

The user noticed that "10.2 tok/s for Gemma 26B" looked too low. They
were right. Side-by-side with `mlx_lm` 0.31.3 Python on the same M4
Max + same checkpoints:

  Qwen3 0.6B-8bit              mlx_lm: 426 tok/s   us: ~84 tok/s   (5.0x)
  Gemma 4 26B-A4B-it-8bit MoE  mlx_lm:  84 tok/s   us: ~33 tok/s   (2.4x)

To localize the gap, this commit adds a "decode-tps bracket" test
that measures the same B=1 steady-state decode through three paths:

  1. Pure model loop  -- model.callAsFunction directly, no scheduler
  2. BatchGenerator   -- our continuous-batching engine, B=1
  3. BatchScheduler   -- production path (actor + AsyncStream)

Findings on Gemma 26B MoE (decode-only, 64 tokens):

  pure loop, sync eval        34.6 tok/s
  pure loop, async eval       34.4 tok/s    (no improvement -- not
                                              the issue)
  BatchGenerator B=1          32.6 tok/s    (-6%, noise-level)
  BatchScheduler.submit       32.5 tok/s    (-6%, noise-level)

  mlx_lm Python reference     84.0 tok/s    (2.4x faster)

Conclusion: the gap is at the **MLX-Swift dispatch layer**, not in
our scheduler or batched-cache code. The pure model loop is already
2.4x slower than Python. Adding our BatchScheduler + actor + worker
adds < 6% on top -- not the bottleneck.

The 8-13 ms per-step CPU overhead is consistent with kernel-launch
latency in mlx-swift bindings. mlx_lm Python uses `mx.compile` on
the decode step to amortize this; mlx-swift-lm does not. Closing
the gap is a separate workstream on the upstream library.

Other improvements in this commit:

* Bump Gemma's batched max_tokens from 8 -> 32 so steady-state
  decode dominates the aggregate TPS metric.
* Add steady-state decode TPS reporting alongside aggregate (subtract
  prefill so it compares like-for-like with mlx_lm's "Generation:
  X tokens-per-sec" headline).
* Switch the throughput tests to a long-output prompt ("write a 200
  word story...") so the model decodes to max_tokens instead of
  hitting EOS at ~12 tokens. The B=1 number was misleadingly low
  before because the prior prompt asked for "a single word".
* Add async-eval pipelining variant to the bracket -- confirms
  mx.async_eval alone doesn't close the gap (which means the missing
  optimization is `mx.compile`, not just async dispatch).
* Add Qwen3 bracket test alongside the Gemma one.
* Document the gap explicitly in the file header so future
  optimisation work has a clear target.
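The steady-state subtraction is just (sketch; exact token accounting —
e.g. whether the first token is attributed to prefill — is an assumption):

```python
def steady_state_decode_tps(generated_tokens, total_s, prefill_s):
    """Decode-only tok/s: subtract the prefill phase from total wall
    time so the number compares like-for-like with mlx_lm's
    'Generation: X tokens-per-sec' headline."""
    return generated_tokens / (total_s - prefill_s)
```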

Honest headline numbers (M4 Max, weights memory-mapped from cache):

  Gemma 26B MoE warm TTFT             280-352 ms
  Gemma 26B MoE cold load             3.32 s   (re-page from cache)
  Gemma 26B MoE encrypt+decrypt       0.10 ms  (free)
  Gemma 26B MoE steady-state decode   32-40 tok/s   B=1
                                      35-39 tok/s   B=4 aggregate
  Qwen3 0.6B steady-state decode      84 tok/s      B=1
                                      323 tok/s     B=4 aggregate

Continuous batching itself works correctly: B=4 aggregate is 2.9x
B=1 (Gemma) and 3.8x B=1 (Qwen). The dispatch-overhead headwind
applies equally to all batch sizes.

151 / 151 tests pass in 71 s with both env vars set.

* Compare against mlx_lm batched + greedy fast-path in BatchScheduler

The previous perf audit only compared B=1 against mlx_lm. This commit
extends the comparison to B=1, B=2, B=4 by adding a Python benchmark
script (scripts/mlx_lm_batch_bench.py) that drives mlx_lm's upstream
BatchGenerator, and applies one targeted Swift-side optimization
based on what the comparison surfaced.

Reference numbers (mlx_lm 0.31.3, M4 Max, decode-only tok/s):

  Qwen3 0.6B-8bit              B=1: 265   B=2: 694   B=4: 1119
  Gemma 4 26B-A4B-it-8bit MoE  B=1:  74   B=2: 126   B=4:  181

The gap WIDENS with batch size, which pointed at an O(B) overhead in
our per-row sampling path. Smoking gun: GenerationBatch.step takes a
slow path whenever ANY row's sampler is non-nil, doing B separate
slice + sample + concat ops (=> 9 kernel launches per token at B=4)
instead of the vectorized fallback (=> 1 kernel launch). Our
BatchScheduler.submit was passing a non-nil greedy closure even when
temperature == 0, forcing every batch through the slow path.

Fix: when temperature <= 0, pass `nil` so the row falls through to
the vectorized fallback. The fallback is also greedy, so the result
is identical -- only the dispatch path changes. Per-row temperature
/ top-P / top-K / seed all still work for non-greedy rows.
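The fix and its dispatch-cost rationale, sketched in Python
(make_sampler is a hypothetical stand-in for the per-row sampler
constructor; the 2B+1 launch count is inferred from the "9 at B=4"
figure above — B slices + B samples + 1 concat):

```python
def row_sampler_for(temperature, make_sampler):
    """temperature <= 0 means greedy: return None so the row falls
    through to the vectorized greedy fallback (identical result,
    cheaper dispatch path)."""
    return None if temperature <= 0 else make_sampler(temperature)

def kernel_launches_per_step(row_samplers):
    """If ANY row has a sampler, the step does per-row
    slice + sample + concat (2B+1 launches); an all-None batch
    takes the single vectorized launch."""
    b = len(row_samplers)
    return 2 * b + 1 if any(s is not None for s in row_samplers) else 1
```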

Swift numbers after the fix (decode-only):

  Qwen3 0.6B-8bit              B=1:  88   B=2: 181   B=4:  351   (was 84 / 174 / 323)
  Gemma 4 26B-A4B-it-8bit MoE  B=1:  37   B=2:  23   B=4:   42   (was 33 / 21 / 39)

Modest +6-13% across the board. The remaining 3-4x gap to Python is
at the MLX-Swift dispatch layer (per-step kernel-launch overhead);
mlx_lm closes it via `mx.compile` on the decode step, which isn't
applied in mlx-swift-lm. That's a separate workstream.

Continuous batching scaling is still healthy:
  Qwen B=4 / B=1 = 4.0x   (matches mlx_lm's 4.2x exactly)
  Gemma B=4 / B=1 = 1.1x  (mlx_lm's is 2.4x; gap reflects MoE expert
                           dispatch where Python's compile pays off most)

Other changes:
* scripts/mlx_lm_batch_bench.py -- runnable apples-to-apples bench
  for future regression checks. Reproduces the reference numbers in
  the file header.
* Update PerformanceLiveTests.swift docstring with the side-by-side
  table so the gap is visible to anyone reading the test.

151 / 151 tests pass.

* Perf compare mlx_lm batching and bump mlx-swift-lm decode optimizations

The user called out that our Gemma 26B throughput looked too low, so this
commit makes the comparison apples-to-apples against mlx_lm Python's
BatchGenerator and bumps the mlx-swift-lm submodule to the optimized main
commit.

New reference script:
  scripts/mlx_lm_batch_bench.py

It runs mlx_lm.generate.BatchGenerator at B=1/2/4 over the same long-output
prompt used by PerformanceLiveTests and reports prefill+1, decode-only TPS,
and aggregate TPS. Reference numbers on M4 Max:

  Qwen3 0.6B-8bit              B=1: 265   B=2: 694   B=4: 1119 tok/s
  Gemma 4 26B-A4B-it-8bit MoE  B=1:  74   B=2: 126   B=4:  181 tok/s

Swift improvements landed in Layr-Labs/mlx-swift-lm@b02ea5b:

  - mlx_lm-style double buffering in GenerationBatch: constructor primes
    the first token, next() returns current token while async-evaluating
    the following token.
  - Greedy fast path avoids logSumExp: argMax(logits) == argMax(logprobs),
    and we don't expose logprobs downstream today.
  - BatchScheduler now passes nil for temperature=0 samplers so batches
    use the vectorized greedy fallback instead of per-row slice/sample/concat.
  - Token tensors are UInt32 to match mlx_lm.
  - BatchKVCache now exposes innerState and KVCache conforms to Updatable,
    which fixes the cache state surface needed for future compile work.

Measured Swift deltas:

  Qwen3 0.6B:
    B=1 decode      ~84 -> ~104 tok/s
    B=4 aggregate   ~323 -> ~363 tok/s

  Gemma 26B MoE:
    B=1 decode      ~32 -> ~37 tok/s
    B=4 aggregate   ~39 -> ~40 tok/s

This closes the avoidable scheduler/batching overhead we found, but does
not fully close the remaining 2-4x gap to Python. The bracket test shows
BatchGenerator/BatchScheduler are now within noise of the pure model loop;
the remaining gap is in mlx-swift model dispatch / lack of stateful
mx.compile support. Attempting to compile the batched-cache decode graph
still fails in mlx-swift with "uncaptured inputs", so that remains an
upstream library workstream rather than a provider scheduler bug.

* Clarify release-mode batch performance measurements

The previous perf notes mixed debug-mode Swift numbers with mlx_lm Python
reference numbers, which made the Swift engine look far worse than it is.
This test-only cleanup makes the performance suite report the data needed
to keep comparisons honest.

Changes:
- Update the PerformanceLiveTests header to state explicitly that mlx_lm
  comparisons must use `swift test -c release`; debug Swift is several
  times slower and not a valid reference.
- Add direct BatchGenerator B=2/B=4 decode-only measurements to the
  bracket test, in addition to pure loop and BatchScheduler.submit.
- Add "model-side scheduler" TPS in the public batched test so we can
  distinguish model decode speed from public text streaming / AsyncStream /
  detokenization costs.

Release-mode checks on this machine:
- Qwen3 0.6B direct BatchGenerator B=4: ~1130 tok/s, matching mlx_lm's
  ~1119 tok/s reference.
- Gemma 4 26B-A4B-it-8bit direct BatchGenerator B=4: ~186 tok/s,
  matching mlx_lm's ~181 tok/s reference.
- BatchScheduler.submit B=1 decode bracket also lands at the direct model
  rate in release mode (~402 tok/s Qwen, ~79 tok/s Gemma); public streaming
  tests report separate model-side and aggregate numbers so regressions are
  localizable.

No production code changes in this commit.

* Complete Swift provider runtime verification

* Bridge Rust updater to Swift provider bundles

* Add Rust to Swift updater E2E tests

* Add Rust bridge release workflow

* E2E testbed: integration tests, profiling, and benchmarking infrastructure (#136)

* Flatten coordinator/internal/ to coordinator/, add E2E integration test suite

Promote Go module root from coordinator/ to repo root so the e2e
test suite can import coordinator packages. Flatten
coordinator/internal/ to coordinator/ to remove the Go internal
package restriction.

All import paths change from
github.com/eigeninference/coordinator/internal/X to
github.com/eigeninference/d-inference/coordinator/X.
The module path is now github.com/eigeninference/d-inference.

12 E2E integration tests using the Swift provider (mlx-swift backend):
- NonStreamingInference, StreamingInference
- MultipleRequestsAccounting, E2EEncryptionCorrectness
- BillingBalanceDeduction, ProviderPayoutSplit, ReferralRewardDistribution
- InsufficientBalance, InvalidModel
- StreamingContentValidation, ConcurrentRequests, AttestationHeaders

Each test gets its own isolated suite (Postgres + coordinator + provider)
via startSuite(t). A semaphore serializes suite lifecycles to prevent
GPU contention from concurrent MLX model loads.

Update CI workflows to reference go.mod at repo root, exclude e2e/
from unit tests, and use swift build for the provider.

* Move coordinator e2e back to coordinator/internal/e2e/

The coordinator's own e2e package was incorrectly flattened into
coordinator/e2e/ alongside the repo-root e2e/ testbed suite.
Restore it to coordinator/internal/e2e/ where it belongs.

* Run integration tests on any PR, not just master/main

* Fix CI: install Docker on macos-15, increase timeout to 30m, serial tests

* Use colima for Docker on macOS CI

* Remove invalid --no-mount flag from colima start

* Add native Postgres fallback, drop Docker/colima from CI

Docker Desktop and colima both fail on macOS CI runners due to
virtualization restrictions. Add a native Postgres lifecycle that
uses initdb + postgres directly (installed via Homebrew).

The Start() method tries Docker first, falls back to native.
CI now installs postgresql@16 via brew instead of Docker.

* Download MLX model in CI before running integration tests

* Use Python API for model download (huggingface-cli is deprecated)

* Use shared suite across all integration tests

Instead of starting a new suite (Postgres + coordinator + provider +
model load) per test, use a single shared suite initialized on first
access. This cuts total test time from ~18min to ~3min since the
expensive model load only happens once.

* Build provider in debug mode for CI (skips SIP/security checks)

CI macOS runners have SIP disabled, which causes the provider to
exit with 'System Integrity Protection is disabled'. Debug builds
skip verifySecurityPosture() via #if !DEBUG, allowing tests to
run on CI.

Add TESTBED_PROVIDER_CONFIG env var (default: release) to control
the Swift build configuration from testbed.

* Force-trust provider in tests, disable frequent challenges

CI macOS runners have SIP disabled, which causes the provider to
fail attestation challenges. Add ForceTrustProvider() to override
status/trust/SIP verification for testing, set challenge interval
to 1h, and add a 3s delay after registration to let the initial
challenge fire before overriding.

* Force all privacy capabilities in ForceTrustProvider for testing

The private-text routing gate checks PythonRuntimeLocked and
DangerousModulesBlocked which are always false on the Swift
backend (no Python runtime). ForceTrustProvider now sets all
privacy capabilities to true and drains queued requests
immediately after trust promotion.

* Restore per-test isolated suites

Each test gets its own Postgres + coordinator + provider.
With debug builds, ForceTrustProvider, native Postgres, and
model pre-download, each suite starts in ~15-20s.

* Add load generator, profiling tests, multi-provider support

- Suite.Providers is now []*Provider; TESTBED_NUM_PROVIDERS env var
  controls how many provider subprocesses start per suite
- New LoadGenerator in testbed/load.go with configurable concurrency,
  total requests, streaming, max_tokens, temperature
- New profile tests: SingleProviderStreaming, SingleProviderNonStreaming,
  HighConcurrency — each prints segment tables with mean/p50/p95/max
- Existing integration tests (NonStreaming, Streaming, Concurrent) now
  emit Instrument events and print profile tables
- Profile SummaryTable uses millisecond resolution instead of microsecond

* Add multi-model provider specs, user pool, and latency decomposition headers

SuiteConfig now takes ModelSpecs (model ID + provider count per model) and
NumUsers. Providers are started per-spec with unique PID files (fixes
single-instance lock killing sibling providers). A user pool with round-robin
API key rotation is created at startup.

Coordinator sets X-Queue-Wait-Ms and X-Provider-Latency-Ms response headers
from PendingRequest timing fields (QueuedAt, DispatchedAt, FirstChunkAt).
LoadGenerator parses these and emits per-segment stats:
client_to_coordinator, queue_wait, coordinator_to_provider, provider_to_client.

Provider ProcessLifecycle respects DARKBLOOM_PID_FILE env var for
multi-instance testing. Add SetSkipChallenge to Server for test runs.

* Rename SegmentClientToCoordinator to SegmentTotalE2E

The segment measures full end-to-end wall clock time, not just
client-to-coordinator latency. The old name was misleading.

* Decompose X-Timing header into per-phase microsecond breakdown

Replace X-Queue-Wait-Ms / X-Provider-Latency-Ms with a single X-Timing
JSON header containing parse_us, reserve_us, route_us, queue_us,
encrypt_us, dispatch_us, provider_us. Move timing fields onto a
RequestTiming struct in PendingRequest. LoadGenerator parses the JSON
and emits per-segment stats with auto ms/µs precision.
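Roughly what the LoadGenerator side does, as a Python sketch (the Go
code's exact formatting rules are assumptions; the *_us field names
are from the header spec above):

```python
import json

def parse_x_timing(header_value):
    """Parse the X-Timing JSON header into per-phase durations and
    render each with auto precision: >= 1000 us prints as ms."""
    phases = json.loads(header_value)

    def fmt(us):
        return f"{us / 1000:.1f}ms" if us >= 1000 else f"{us}us"

    return {name: fmt(us) for name, us in phases.items()}
```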

* Add latency regression assertions, SegmentStatsMap, and heavy-load benchmark

- Add SegmentStatsMap() to LoadResult for per-segment mean/p50/p95/p99/max
- Wire coordinator overhead assertions into all benchmark and profile tests
- Update DefaultThresholds with realistic values based on benchmark data
- Add CoordinatorOverheadThresholds() alias
- Deduplicate SegmentStatsView (assert package uses type alias to testbed)
- Clean up profile_test.go: remove redundant second load loop, use assertions
- Add PromptBytes field to RequestConfig for large-payload testing
- Add HeavyLoad 100-concurrent 10KB benchmark
- Replace bubble sort with sort.Slice in computeStats
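The per-segment stats shape, sketched in Python (nearest-rank
percentiles are an assumption; the point is the proper sort replacing
the bubble sort):

```python
def segment_stats(durations):
    """mean / p50 / p95 / p99 / max over one segment's samples."""
    xs = sorted(durations)  # sort.Slice equivalent
    n = len(xs)

    def pct(p):  # nearest-rank percentile
        return xs[min(n - 1, int(p / 100 * n))]

    return {"mean": sum(xs) / n, "p50": pct(50),
            "p95": pct(95), "p99": pct(99), "max": xs[-1]}
```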

* Split CI into eval + benchmark jobs, post benchmark results as PR comment

Integration tests (TestIntegration|TestProfile) run on every push/PR.
Benchmarks (TestBenchmark) run only on PRs and post a markdown summary
as a PR comment via gh pr comment. LoadResult and AssertionReport gain
SummaryMarkdown() methods for markdown table formatting. A TestMain in
benchmark_test.go writes the aggregated markdown to BENCHMARK_MD_PATH
when set.

* Skip multi-model benchmark in CI (gemma model not downloaded)

The M1 Virtual CI runner only downloads Qwen3.5-0.8B; the gemma
multi-model test requires a second model that isn't available.

* Download gemma-3-270m-4bit in CI, remove multi-model skip

* Include model IDs and RAM sizes in benchmark PR comment

* address feedback

* fix: soft-fail Swift tests on dev + download full model for CI

* feat: environment-scoped R2 + coordinator secrets for dev/prod release isolation

- Move R2_BUCKET from vars to secrets so it participates in GitHub
  environment scoping (dev vs prod get different buckets/credentials)
- Add documentation header listing all environment-scoped secrets
  required per environment
- Soft-fail Swift unit tests on dev releases (live MLX model cache
  may be incomplete on CI)
- Download full model (remove --include filter) for deterministic
  CI cache seeding

* feat: DEV_/PROD_ prefixed repo secrets for R2 + coordinator env isolation

Both release workflows now resolve DEV_ or PROD_ prefixed repo secrets
in a resolve-env step using bash indirection — no GitHub environments
needed. The environment: gate is removed since secrets live at repo
level with prefixes.

Required repo secrets:
  DEV_R2_ACCESS_KEY_ID, PROD_R2_ACCESS_KEY_ID
  DEV_R2_SECRET_ACCESS_KEY, PROD_R2_SECRET_ACCESS_KEY
  DEV_R2_ENDPOINT, PROD_R2_ENDPOINT
  DEV_R2_BUCKET, PROD_R2_BUCKET
  DEV_R2_PUBLIC_URL, PROD_R2_PUBLIC_URL
  DEV_COORDINATOR_URL, PROD_COORDINATOR_URL
  DEV_RELEASE_KEY, PROD_RELEASE_KEY

* fix: RELEASE_KEY is shared, not env-prefixed

* fix: resolve env secrets inline to avoid GitHub cross-job output masking

* fix: add DEV_RELEASE_KEY/PROD_RELEASE_KEY to env-prefixed secrets

* Add STRIDE threat model for runtime security review

40 threats across 9 trust boundaries (coordinator/provider WebSocket,
provider operator vs process, browser/UI, Apple MDM/MDA, admin API,
inference engine, payments, Apple attestation chain). Adversaries:
malicious provider, malicious consumer, external attacker. Each threat
includes affected_files globs, mitigations with status, open_findings
links to the existing security audit, and a detection_hint for
automated PR review.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* Expand threat model trust boundaries with implementation detail

Each of the 9 trust boundaries now documents how_it_works (exact code
paths, line numbers, auth mechanisms, data flows) and current_limitations
(specific open gaps with SEC-* references). Sources: coordinator/internal/
api/{server,provider,release_handlers,device_auth,billing_handlers}.go,
registry/registry.go, attestation/, mdm/, provider-swift/Sources/
ProviderCore/Security/{AntiDebug,BinaryHasher,SecureEnclaveIdentity,
SecurityHardening}.swift, Crypto/NodeKeyPair.swift, Inference/
{BatchScheduler,IdleTimeoutPolicy,InferenceCancellation}.swift,
ProviderLoop.swift, console-ui/src/{hooks/useAuth,lib/{api,store,
encryption}}.ts, next.config.ts.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* Add threat model PR review workflow

On every PR against master/main, the workflow:
1. Gets the PR diff via gh pr diff
2. Matches changed files against affected_files globs in docs/threat-model.yaml
3. Calls Claude API (claude-sonnet-4-6) with the focused diff + full threat model
4. Posts (or updates) a single PR comment with STRIDE-based security analysis

Uses prompt caching on the static threat model block to minimise API cost
on repeated pushes. The comment marker <!-- threat-model-review --> lets
the workflow update rather than append on each push.
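Step 2's glob matching can be sketched as follows (a simplified Python illustration; fnmatch-style matching is an assumption about how the workflow interprets the affected_files patterns):

```python
from fnmatch import fnmatch

def threats_touched(changed_files, threats):
    """Return IDs of threats whose affected_files globs match any
    file in the PR diff. `threats` maps threat-id -> glob list."""
    hits = []
    for tid, globs in threats.items():
        if any(fnmatch(f, g) for f in changed_files for g in globs):
            hits.append(tid)
    return hits
```

Only the matched threats (plus the full model for context) are sent to the API, keeping the focused diff small.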

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* Persistent Secure Enclave key with keychain access group enforcement (#146)

* Add persistent Secure Enclave attestation key with keychain access group enforcement

Replace ephemeral CryptoKit SE keys with persistent Security framework keys stored
in the macOS data protection keychain. The key is bound to the signing team's
keychain access group (SLDQ2GJ6TL.io.darkbloom.provider), enforced by securityd
on top of the kernel's code-signing checks (AMFI). A patched binary re-signed
with codesign -s - gets errSecMissingEntitlement and cannot access the key.

- PersistentEnclaveKey: Security framework SE key with SecKeyCreateRandomKey,
  kSecAttrIsPermanent, and team-scoped access group
- AttestationSigner protocol: abstracts over both ephemeral and persistent keys
- ProviderLoop: tries persistent key first, falls back to ephemeral with warning
- Entitlements plist with keychain-access-groups for production signing
- 8 tests covering creation, persistence, signing, deletion, protocol conformance

* Embed provisioning profile in .app bundle for persistent SE key

The data protection keychain requires a provisioning profile to authorize
the keychain-access-groups entitlement. Wrap the CLI binaries in a minimal
Darkbloom.app bundle with embedded.provisionprofile so the persistent SE
attestation key works on provider machines.

- release-swift.yml: new step decodes PROVISIONING_PROFILE_BASE64 secret,
  builds Darkbloom.app/Contents/ structure, signs bundle + individual binaries
- install.sh: detects .app bundle layout, symlinks bin/ into the app bundle
- Backward-compatible: falls back gracefully if secret is not set or if
  provider receives a flat (pre-.app) bundle

* Add com.apple.application-identifier to provider entitlements

Required for data protection keychain access. Must match the bundle ID
in the provisioning profile (SLDQ2GJ6TL.io.darkbloom.provider).

* Address review: data protection keychain flag, tighter error handling, real SE probe

Codex P1 / hank P1:
- coordinator/api/install.sh: restore __DARKBLOOM_COORD_URL__ placeholder
  (the coordinator templates this at serve time via server.go;
  hardcoding the URL broke dev/self-hosted coordinators)
- PersistentEnclaveKey: add kSecUseDataProtectionKeychain: true to all
  Security framework calls. Without it, queries may hit the legacy
  file-based keychain where access group enforcement is silently ignored.

hank P2:
- loadOrCreate: catch only errSecItemNotFound before falling through to
  createNew. Auth failures, locked keychain, and missing entitlement
  now propagate to the caller instead of racing with key creation.
- isAvailable: probe real SE capability via CryptoKit's
  SecureEnclave.isAvailable instead of just checking macOS version.
  Now returns false on Intel Macs without T2 and macOS VMs without
  virtualized SE. Added doc comment noting the entitlement dependency.
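The loadOrCreate error-handling rule reduces to this pattern (Python stand-in; the real code is Swift catching errSecItemNotFound, and the exception name here is hypothetical):

```python
class ItemNotFound(Exception):
    """Stand-in for errSecItemNotFound."""

def load_or_create(load, create_new):
    """Fall through to key creation only when the key is absent.
    Any other failure (auth denied, locked keychain, missing
    entitlement) propagates instead of racing with creation."""
    try:
        return load()
    except ItemNotFound:
        return create_new()
```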

* fix(api): add code and param fields to OpenAI error responses (#144)

The errorResponse function only populated type and message, missing
code and param required by the OpenAI API spec. Without code, SDKs
cannot programmatically distinguish error types (e.g. Python SDK
e.code returns None, retry logic breaks, Sentry groups all errors
as one).

Changes:
- errorResponse now accepts optional errorDetailOpt variadic args
- code defaults to errType for backward compatibility
- withParam() and withCode() helpers for call-site overrides
- model-not-found errors include param="model"
- model-is-required errors include param="model"
- insufficient_funds uses OpenAI-canonical code "insufficient_quota"
- rate_limit_exceeded gets explicit withCode for clarity

All 202 existing call sites are backward-compatible: the variadic
signature means they compile unchanged, and the default code=errType
matches the implicit behavior SDKs already assumed.

Closes #142
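The defaulting behavior can be sketched like this (Python illustration of the Go variadic-option pattern; the real signature differs):

```python
def error_response(err_type, message, *, code=None, param=None):
    """Build an OpenAI-style error body. `code` defaults to the
    error type, matching the implicit behavior SDKs already
    assumed; `param` names the offending request field, if any."""
    return {"error": {
        "type": err_type,
        "message": message,
        "code": code if code is not None else err_type,
        "param": param,
    }}
```

Call sites that pass nothing extra keep their old behavior; specific sites override code or param as listed above.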

* feat: add Datadog observability stack for dev coordinator (#143)

* Fix Darkbloom analytics tracking

* Harden release workflow protections (#103)

* Harden release registration and binary hash policy (#99)

* Harden release registration and binary hash policy

* derive release download URL from allowlist

* Stabilize provider coordinator test

---------

Co-authored-by: Gajesh Naik <26431906+Gajesh2007@users.noreply.github.com>

* Remove stale Python integration test (#109)

* e2e: add local simulation environment skeleton

Introduces scripts/e2e-runner.py, a Python orchestrator that spins up the
real coordinator binary with test-friendly configuration (in-memory store,
mock billing, no trust requirements) alongside a simulated or real
provider, and runs HTTP/WebSocket-level assertions against the live stack.

Key components:
- Coordinator class: builds and spawns coordinator with EIGENINFERENCE_MIN_TRUST=none,
  EIGENINFERENCE_BILLING_MOCK=true, and in-memory store
- SimulatedProvider: pure-Python WebSocket client speaking the full provider protocol
  (register, attestation challenge/response, heartbeat, inference request/response)
- Test framework: decorator-based test registration, pass/fail summary, signal-safe
  cleanup via atexit + signal handlers
- Test stubs: test_basic (registration + discovery), test_inference (consumer
  request routing), test_multi_provider (two providers, same model)

TODO:
- RealProvider wrapper around darkbloom serve --coordinator
- Coordination between provider challenge cycle and consumer request timing
- API key handling for consumer vs admin routes
- Python dependency management (websockets, cryptography)

* Revert "e2e: add local simulation environment skeleton"

This reverts commit d02074e. The Python E2E runner adds noise on top of
the existing Go integration tests (internal/api/integration_test.go +
fullstack_integration_test.go) which already cover the full coordinator
protocol surface. The cross-language orchestration doesn't buy anything
over what httptest.Server + simulated providers already provide.

* Remove stale Python integration test

@ethenotethan

tests/integration_test.py is superseded by the Go-based coordinator
integration tests at coordinator/internal/api/:

- Coordinator protocol coverage (register, challenge, heartbeat,
  inference) is provided by integration_test.go using httptest.Server +
  Go simulated providers — same coverage, no binary build needed
- Full-stack GPU inference is covered by fullstack_integration_test.go
  with real vllm-mlx backends (gated behind LIVE_FULLSTACK_TEST=1)
- The Python test uses stale binary names ('eigeninference-provider'),
  old flags ('--backend mlx-lm'), and predates attestation challenges,
  E2E encryption, and the vllm-mlx backend migration
- No external dependency coverage (Postgres, Stripe, etc.) is lost — the
  coordinator main.go wiring for those is trivially tested elsewhere
- The Python SDK tests (4.5.x) belong in the SDK repo, not the infra repo

---------

Co-authored-by: Hank Bob <hankbob@researchoors.com>

* chore: remove unused dependencies (#112)

* chore: remove unused dependencies

* test: fix console ui test isolation

* chore: prune repo-wide dead code findings

* ci: run CI on any PR, not just master/main (#119)

* ci: remove racing deploy-dev-coordinator workflow (#137)

Cloud Build (deploy/gcp/cloudbuild.yaml) already deploys the coordinator
on the same trigger (push to master touching coordinator/** or deploy/gcp/**).
Having both paths active creates a race condition where two CI systems
simultaneously deploy to the same dev VM — see #115.

* feat: add Datadog observability stack for dev coordinator

Install Datadog Agent on the dev GCE VM (DogStatsD, APM, journald logs)
and wire the coordinator to emit structured metrics, split attestation
counters, model_type tags, reactive provider-count gauges, and a
completion-tokens counter. Rebuild the dev dashboard with 7 sections
covering metrics, logs, traces, and system health.

* fix: prevent double-decrement when untrusted provider disconnects

Disconnect now checks StatusUntrusted before decrementing the online
counter and model-provider gauges, since MarkUntrusted already
decremented them.
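The invariant is that each provider decrements the online counter exactly once, whichever path fires first. A minimal Python sketch (the real registry is Go; names mirror MarkUntrusted/Disconnect):

```python
import threading

class Registry:
    """Online-count bookkeeping: MarkUntrusted already decrements,
    so Disconnect must skip providers already marked untrusted."""
    def __init__(self):
        self.mu = threading.Lock()
        self.online = 0
        self.status = {}

    def connect(self, pid):
        with self.mu:
            self.status[pid] = "trusted"
            self.online += 1

    def mark_untrusted(self, pid):
        with self.mu:
            if self.status.get(pid) == "trusted":
                self.status[pid] = "untrusted"
                self.online -= 1  # the one and only decrement

    def disconnect(self, pid):
        with self.mu:
            st = self.status.pop(pid, None)
            if st == "trusted":   # skip if MarkUntrusted already
                self.online -= 1  # decremented the counter
```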

* feat: add fleet version and binary hash observability

New metrics:
- providers.per_version gauge (per provider binary version)
- providers.per_binary_hash gauge (per attested binary hash)
- coordinator.min_provider_version_set gauge (1 when configured)
- provider_version_below_minimum counter (tagged by gate and version)

Gates instrumented:
- registration (provider.go)
- challenge revalidation (provider.go)
- manifest sync (server.go)

Registry additions:
- ProviderCountByVersion()
- ProviderCountByBinaryHash()

Dashboard: Fleet Version & Binary Hash group with providers by version,
providers by binary hash, min provider version, below-minimum events,
and top binary hashes toplist.
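ProviderCountByVersion amounts to a grouped count over the registry (Python sketch; the exclusion of offline/untrusted providers is an assumption about the gauge semantics):

```python
from collections import Counter

def provider_count_by_version(providers):
    """Per-version gauge values: count online, trusted providers
    grouped by reported binary version."""
    return Counter(
        p["version"] for p in providers
        if p["online"] and p["status"] == "trusted"
    )
```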

* fix: update Dockerfile + cloudbuild for go.mod at repo root

go.mod moved from coordinator/ to repo root during the swift-provider
merge. Build context is now repo root, Dockerfile copies coordinator/
subdir explicitly.

* fix: chmod +x coordinator binary in Dockerfile

* fix: ensure coordinator binary is executable in builder stage

* fix: rename coordinator source dir in builder to avoid colliding with binary path

* fix: copy full repo in Dockerfile builder so go.mod resolves all packages

* fix: remove unused modelTypeTag and format Go files for CI

* fix: skip python/dangerous-modules check for swift runtime in private test gate

* billing telemetry + MarkUntrusted race fix + Swift routing tests

- Add Datadog histogram metrics for reservation amounts, settlement
  refunds, provider credits, and platform fees
- Add store.debit/credit.latency_ms histograms for DB operation timing
- Add billing.cost_clamped and billing.reservation_refunds counters
- Fix race in MarkUntrusted: hold r.mu write lock through counter
  decrement to prevent double-decrement with Disconnect
- Add unit tests for Swift provider privacy caps (with/without Python)
- Add E2E test for Swift provider routing via challenge-verified path
- Update dev-network-dashboard.json with Billing & Store group

* fix Heartbeat reviving untrusted providers causing onlineCount double-decrement

* revert orthogonal landing/console-ui/provider changes

* remove unbounded binary_hash cardinality, add input token metrics + store latency, fix dashboard group-by

* fix review feedback: ModelType() untrusted filter, routing.cost_ms by provider, billing in cents, dead comment

---------

Co-authored-by: Gajesh Naik <26431906+Gajesh2007@users.noreply.github.com>
Co-authored-by: anupsv <6407789+anupsv@users.noreply.github.com>
Co-authored-by: hankbob <hankbobtheresearchoor@gmail.com>
Co-authored-by: Hank Bob <hankbob@researchoors.com>

* migration: harden Rust→Swift cutover end-to-end

Twelve fixes informed by three reviewer subagents (codex-rescue,
independent Claude, full pipeline audit) to ensure the bridge release
→ Swift release cutover works on first try, with no silent breakage.

Coordinator:
- accept darkbloom-bundle-<platform>.tar.gz (was eigeninference-bundle-)
- restore TestProviderRegistrationWithoutAttestationRejectedWhenBinaryHashPolicyConfigured
  (dropped during the master→swift-provider merge)

release-swift.yml:
- ship bin/{darkbloom,darkbloom-enclave,mlx.metallib} as real-file copies
  (was symlinks) so the coordinator's tar.TypeReg verifier accepts them and
  hashes the actual bytes
- staple both bin/ AND .app/Contents/MacOS/ paths now that they're
  independent files
- post-codesign verification: fail build if signed CLI is missing the
  keychain-access-groups entitlement or the access group
  SLDQ2GJ6TL.io.darkbloom.provider, or if embedded.provisionprofile
  is absent from the .app
- PROVISIONING_PROFILE_BASE64 is now hard-required (no silent ephemeral
  fallback). Profile is decoded + parsed with plutil/python: verifies
  TeamIdentifier, keychain-access-groups, application-identifier, and
  ExpirationDate >= 30 days out
- pin MLX python wheel to 0.31.1 to match libs/mlx-swift Cmlx version
  (was 0.31.2 — patch-level metallib ABI risk)
- prod releases now hard-fail Swift tests (was soft-fail for all)

release-rust-bridge.yml:
- rename bridge bundle to darkbloom-bundle-<platform>.tar.gz uniformly
  so coordinator accepts the registration

Both release workflows:
- PROD_* secrets fall back to legacy unprefixed (R2_ACCESS_KEY_ID,
  RELEASE_KEY, COORDINATOR_URL) + vars.R2_BUCKET when PROD_* empty.
  Fails hard if neither resolves.

provider/src/main.rs (bridge auto-update):
- new rewrite_launchd_plist_for_swift: extracts ProgramArguments from
  the Rust plist (`serve --coordinator URL --model M`), converts to
  Swift shape (`start --foreground --coordinator-url URL --model M`),
  atomic rename
- install_swift_update_bundle_at: if Darkbloom.app/Contents/MacOS/
  exists in the extracted bundle, replace bin/{darkbloom,darkbloom-
  enclave,mlx.metallib} with symlinks into .app/MacOS and point the
  launchd plist's ProgramArguments[0] at the .app's MacOS binary path.
  This puts the embedded provisioning profile in scope at runtime, so
  the persistent SE key (PR #146) doesn't get errSecMissingEntitlement
  on first attestation post-cutover
- plist_path is now an Option<&Path> so tests can avoid touching the
  developer machine's real ~/Library/LaunchAgents
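The ProgramArguments conversion can be sketched with plistlib (Python for illustration; the real implementation is Rust in provider/src/main.rs, and flags beyond --coordinator are simply carried over):

```python
import plistlib

def rewrite_program_arguments(plist_bytes):
    """Convert a Rust-shape launchd plist
    (`serve --coordinator URL --model M`) to the Swift shape
    (`start --foreground --coordinator-url URL --model M`)."""
    d = plistlib.loads(plist_bytes)
    args = d["ProgramArguments"]
    new = [args[0], "start", "--foreground"]
    it = iter(args[1:])
    for tok in it:
        if tok == "serve":
            continue                      # subcommand is renamed
        if tok == "--coordinator":
            new += ["--coordinator-url", next(it)]
        else:
            new.append(tok)               # pass other flags through
    d["ProgramArguments"] = new
    return plistlib.dumps(d)
```

The real code additionally writes the result via an atomic rename so a crash mid-rewrite cannot leave a truncated plist.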

Tests added (all passing):
- 6 plist-rewrite unit tests: extract / convert / rewrite / install-
  with-plist / .app-aware install / hash-only install
- 1 ported coordinator attestation policy test
- existing 7 auto-update integration tests still pass (302 → 303 total)

Verified by audit:
- macos-26-xlarge has Xcode 26.2 / Swift 6.2, satisfies all
  swift-tools-version requirements
- LatestProviderVersion ordering: semver THEN created_at in both memory
  and Postgres stores
- /api/version JSON shape matches what auto_update_check_with_install_dir
  expects
- StartCommand --foreground doesn't recurse into launchAgent.installAndStart
- Swift ModelScanner reads ~/.cache/huggingface/hub (same as Rust)
- AuthTokenStore path parity (~/.darkbloom/auth_token)
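The semver-then-created_at ordering verified above can be sketched as a sort key (Python illustration; assumes plain MAJOR.MINOR.PATCH version strings):

```python
def latest_provider_version(releases):
    """Pick the latest release: highest semver wins, with
    created_at as the tie-break within a version."""
    def key(r):
        major, minor, patch = (int(x) for x in r["version"].split("."))
        return (major, minor, patch, r["created_at"])
    return max(releases, key=key)
```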

Deployment prerequisite: coordinator changes must be deployed (master
→ dev Cloud Build, then human ecloud deploy to prod) BEFORE tagging
any release. Bridge registration will 400 against an older coordinator
that doesn't know about the darkbloom-bundle- filename.

* chore: cargo fmt on plist-migration code

Post-rustfmt: long format!() args wrapped, with_context closure pulled
onto one line, ternary-style assignment broken into if/else. No
behavior change — `cargo test --bin darkbloom` still 303 pass / 0 fail.

---------

Co-authored-by: ethenotethan <42627790+ethenotethan@users.noreply.github.com>
Co-authored-by: anupsv <6407789+anupsv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: hankbob <hankbobtheresearchoor@gmail.com>
Co-authored-by: Hank Bob <hankbob@researchoors.com>