
Persistent Secure Enclave key with keychain access group enforcement #146

Merged
Gajesh2007 merged 5 commits into swift-provider from persistent-se-key on May 14, 2026

Conversation

@Gajesh2007 (Member) commented May 10, 2026

Summary

  • Adds PersistentEnclaveKey — a Secure Enclave P-256 signing key stored in the macOS data protection keychain with team-scoped access group (SLDQ2GJ6TL.io.darkbloom.provider)
  • Only binaries signed by team SLDQ2GJ6TL can access the key — enforced by securityd (the userspace keychain daemon), backed by AMFI code-signature checks. A patched binary re-signed with codesign -s - gets errSecMissingEntitlement
  • Introduces an AttestationSigner protocol so ProviderLoop tries the persistent key first and falls back gracefully to the ephemeral CryptoKit key

Security model

The key is created once via SecKeyCreateRandomKey with kSecAttrTokenIDSecureEnclave + kSecAttrIsPermanent: true and persists in the data protection keychain. The private key never leaves the Secure Enclave hardware — SecKeyCopyExternalRepresentation fails by design.
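As a rough sketch of that creation path (not the PR's exact code; attribute names are from Apple's Security framework, the access group is this PR's):

```swift
import Security

// Sketch: a P-256 key generated inside the Secure Enclave and persisted
// in the data protection keychain, scoped to the team's access group.
let access = SecAccessControlCreateWithFlags(
    kCFAllocatorDefault,
    kSecAttrAccessibleWhenUnlockedThisDeviceOnly,
    .privateKeyUsage,
    nil
)!

let attributes: [String: Any] = [
    kSecAttrKeyType as String: kSecAttrKeyTypeECSECPrimeRandom, // P-256
    kSecAttrKeySizeInBits as String: 256,
    kSecAttrTokenID as String: kSecAttrTokenIDSecureEnclave,
    kSecUseDataProtectionKeychain as String: true,
    kSecPrivateKeyAttrs as String: [
        kSecAttrIsPermanent as String: true,
        kSecAttrAccessControl as String: access,
        kSecAttrAccessGroup as String: "SLDQ2GJ6TL.io.darkbloom.provider",
    ] as [String: Any],
]

var error: Unmanaged<CFError>?
guard let key = SecKeyCreateRandomKey(attributes as CFDictionary, &error) else {
    fatalError("SE key creation failed: \(error!.takeRetainedValue())")
}
// SecKeyCopyExternalRepresentation(key, &error) fails here by design:
// the private half never leaves the enclave.
```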

Access enforcement chain:

Secure Boot → kernel → AMFI (code signing) → securityd (keychain ACL)
  → access group check: caller must be signed by team SLDQ2GJ6TL
  → Secure Enclave: signs only if securityd approves

Verified via POC — 16 attack vectors tested:

  • Ad-hoc signed binary: BLOCKED
  • Different team ID: BLOCKED
  • Root + security CLI: BLOCKED
  • Raw keychain DB extraction: encrypted blob only
  • Private key export: hardware enforced

Required for production

A Developer ID provisioning profile from Apple Developer Portal authorizing the keychain-access-groups entitlement for the provider's App ID. Without it, the code gracefully falls back to the existing ephemeral CryptoKit key.

Test plan

  • swift build passes
  • swift test — 67/67 tests pass (8 new persistent enclave key tests)
  • POC verified: key creation, persistence, signing, access denial for ad-hoc/cross-team binaries
  • Production test with Eigen Labs Developer ID provisioning profile

🤖 Generated with Claude Code

…oup enforcement

Replace ephemeral CryptoKit SE keys with persistent Security framework keys stored
in the macOS data protection keychain. The key is bound to the signing team's
keychain access group (SLDQ2GJ6TL.io.darkbloom.provider), enforced by securityd
against the caller's code signature. A patched binary re-signed with codesign -s - gets
errSecMissingEntitlement and cannot access the key.

- PersistentEnclaveKey: Security framework SE key with SecKeyCreateRandomKey,
  kSecAttrIsPermanent, and team-scoped access group
- AttestationSigner protocol: abstracts over both ephemeral and persistent keys
- ProviderLoop: tries persistent key first, falls back to ephemeral with warning
- Entitlements plist with keychain-access-groups for production signing
- 8 tests covering creation, persistence, signing, deletion, protocol conformance
@vercel

vercel Bot commented May 10, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
d-inference Ready Ready Preview May 14, 2026 9:45pm
d-inference-console-ui-dev Ready Ready Preview May 14, 2026 9:45pm
d-inference-landing Ready Ready Preview May 14, 2026 9:45pm


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3e3635753e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +282 to +284
if #available(macOS 13.0, *) {
return true
}

P2: Gate persistent-key path on actual SE availability

PersistentEnclaveKey.isAvailable currently returns true for any non-simulator macOS 13+ host, which does not actually verify Secure Enclave support. On environments like macOS VMs or unsupported hardware, callers will enter the persistent-key flow and then fail later with keychain/SecKey errors instead of cleanly treating SE as unavailable; this also makes the new tests' availability guards unreliable. Use a real probe (for example SecureEnclave.isAvailable or a Security-framework capability check) so this flag reflects hardware capability.

Useful? React with 👍 / 👎.
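A hedged sketch of the suggested probe (the enclosing type name here is illustrative):

```swift
import CryptoKit

enum SEAvailability {
    /// Reflects actual Secure Enclave capability, not just OS version:
    /// CryptoKit's SecureEnclave.isAvailable returns false on hosts
    /// without SE hardware (e.g. VMs without a virtualized enclave).
    static var isAvailable: Bool {
        #if targetEnvironment(simulator)
        return false
        #else
        guard #available(macOS 13.0, *) else { return false }
        return SecureEnclave.isAvailable
        #endif
    }
}
```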

@hankbobtheresearchoor (Contributor) left a comment

Review Summary

Clean abstraction and good graceful fallback logic. Two real issues: the entitlements plist has the wrong key name (🔴), and the loadOrCreate error handling should be tighter (🟡). Several observations about the security model implications of persistent identity.

1 🔴 Must Fix · 1 🟡 Should Fix · 4 🔵 Observations

<key>com.apple.security.network.server</key>
<true/>
<key>keychain-access-groups</key>
<array>

🔴 Wrong entitlement key name. The key should be com.apple.security.keychain-access-groups, not keychain-access-groups. Apple's hardened runtime entitlements all live under the com.apple.security.* namespace.

The existing scripts/entitlements.plist already has this right:

<key>com.apple.security.keychain-access-groups</key>

With the bare keychain-access-groups, codesign --entitlements will embed an entitlement that securityd doesn't recognize, and keychain access will fail with -34018 (errSecMissingEntitlement).

Additionally, this file is a duplicate of scripts/entitlements.plist (which CI already references). Either fix the key name and update release-swift.yml to use this file instead, or remove this file and keep the single source of truth in scripts/.
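For reference, the corrected fragment would read (assuming the array contents stay the same):

```xml
<key>com.apple.security.keychain-access-groups</key>
<array>
    <string>SLDQ2GJ6TL.io.darkbloom.provider</string>
</array>
```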

let keyLabel = label ?? defaultLabel

if let existing = try? findExisting(accessGroup: group, label: keyLabel) {
logger.info("Loaded existing persistent Secure Enclave key")

🟡 Silent fallthrough on transient keychain errors. try? swallows all errors from findExisting, including transient ones like errSecAuthFailed (keychain locked) or errSecDeviceUnavailable. These should probably be surfaced rather than masked by a fallthrough to createNew, which then races with the existing key.

Suggest:

if let existing = try? findExisting(accessGroup: group, label: keyLabel) {
    return existing
}
// Only fall through on not-found, propagate entitlement/auth errors

Or catch PersistentEnclaveKeyError.keyLookupFailed(status: errSecItemNotFound) explicitly and re-throw everything else.

public static let defaultAccessGroup = "SLDQ2GJ6TL.io.darkbloom.provider"

public static let defaultLabel = "io.darkbloom.provider.attestation-signing.v1"

The reason will be displayed to describe this comment to others. Learn more.

🔵 No rotation path. The v1 suffix in defaultLabel implies future versions, but there's no mechanism to discover or migrate to v2. If the key is compromised or needs rotation:

  1. Old key stays orphaned in keychain (no cleanup)
  2. Coordinator still has the old SE pubkey stored
  3. New provider instances create v1 keys with different pubkeys, but the label is the same so they'd overwrite

Consider adding a rotateKey() static method or a darkbloom verify --rotate-key CLI command, and document the coordinator-side SE pubkey update flow.
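A minimal sketch of what such a helper might look like (rotateKey and the bumped-label scheme are hypothetical, not part of this PR):

```swift
import Security

enum KeyRotation {
    /// Hypothetical: delete the old labeled key so the next
    /// loadOrCreate() can mint a fresh one under a bumped label
    /// (e.g. ...attestation-signing.v2). The coordinator-side
    /// SE pubkey update is a separate, out-of-band step.
    static func rotateKey(accessGroup: String, oldLabel: String) throws {
        let query: [String: Any] = [
            kSecClass as String: kSecClassKey,
            kSecAttrLabel as String: oldLabel,
            kSecAttrAccessGroup as String: accessGroup,
            kSecUseDataProtectionKeychain as String: true,
        ]
        let status = SecItemDelete(query as CFDictionary)
        guard status == errSecSuccess || status == errSecItemNotFound else {
            throw NSError(domain: NSOSStatusErrorDomain, code: Int(status))
        }
    }
}
```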


/// Whether the Secure Enclave is available on this device.
public static var isAvailable: Bool {
#if targetEnvironment(simulator)

🔵 isAvailable doesn't check entitlement. Returns true on any macOS 13+ Apple Silicon machine, but persistent SE key also requires the keychain-access-groups entitlement. Callers who check isAvailable before calling loadOrCreate() will get a misleading true on unsigned debug builds, then hit -34018.

This is consistent with SecureEnclaveIdentity.isAvailable (which also doesn't check entitlements), so not blocking — but worth a doc comment noting the entitlement dependency:

/// - Note: Also requires `com.apple.security.keychain-access-groups` entitlement
///   in the binary's code signature. Without it, `loadOrCreate()` will throw
///   `PersistentEnclaveKeyError.missingEntitlement`.

@hankbobtheresearchoor (Contributor)

Security model observations (not line-anchored)

Persistent identity tradeoff: This PR changes the SE key from ephemeral (session-scoped) to persistent (survives restarts). Previously, if userspace was compromised between attestation checkpoints, the damage was limited to one session — the key was gone on restart. With persistent keys, the same compromised key can sign attestation challenges across restarts until explicitly deleted. The PR body should acknowledge this tradeoff explicitly. The access-group enforcement mitigates this (only signed binaries can use the key), but it doesn't eliminate the window.

Coordinator-side implications: The coordinator already stores SEPublicKey in Postgres and builds a lookup map "sekey:" + SEPublicKey. With persistent keys, the same SE public key appears across reconnects, enabling identity correlation that wasn't possible with ephemeral keys. The coordinator doesn't currently USE SE keys for identity binding (it uses serial number for DisconnectDuplicatesBySerial), but this PR creates the prerequisites. The PR should explicitly state whether SE-key-based identity tracking is intended, to avoid implicit feature drift.

DARKBLOOM_KEYCHAIN_ACCESS_GROUP env var: Fine for dev/testing, but worth a code comment marking it as non-production. In production, the access group should always come from the entitlement — env var overrides could allow pointing at an attacker-controlled keychain group (though the entitlement still gates access).

@github-actions

github-actions Bot commented May 10, 2026

Benchmark Results

Runner: macos-15 (M1 Virtual) | Date: 2026-05-14 21:51 UTC

1-provider-streaming

1 providers, 1 users, 30 requests, concurrency=5, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 1 0.5 GB
Metric Value
Total Requests 30
Success 30
Errors 0
Total Duration 12.673s
Throughput 2.4 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 30 917ms 11ms 3.774s 8.256s
parse 30 39µs 18µs 182µs 232µs
reserve 30 3ms 1ms 9ms 12ms
route 30 407ms 0s 930ms 8.215s
queue_wait 9 1.356s 495ms 8.215s 8.215s
encrypt 30 196µs 145µs 425µs 452µs
dispatch 30 44µs 25µs 194µs 271µs
coordinator_to_provider 30 505ms 4ms 3.76s 3.763s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=38.533µs (threshold=1ms)
parse:p95<=5ms PASS p95=182µs (threshold=5ms)
reserve:mean<=50ms PASS mean=2.6149ms (threshold=50ms)
reserve:p95<=200ms PASS p95=9.095ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=195.566µs (threshold=5ms)
encrypt:p95<=50ms PASS p95=425µs (threshold=50ms)
dispatch:mean<=5ms PASS mean=43.6µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=194µs (threshold=50ms)

1-provider-non-streaming

1 providers, 1 users, 20 requests, concurrency=5, streaming=false

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 1 0.5 GB
Metric Value
Total Requests 20
Success 20
Errors 0
Total Duration 5.913s
Throughput 3.4 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 20 1.447s 613ms 4.38s 4.38s
parse 20 23µs 17µs 81µs 81µs
reserve 20 2ms 1ms 5ms 5ms
route 20 263ms 0s 3.991s 3.991s
queue_wait 4 1.317s 474ms 3.991s 3.991s
encrypt 20 143µs 138µs 191µs 191µs
dispatch 20 21µs 18µs 57µs 57µs
coordinator_to_provider 20 659ms 3ms 3.284s 3.284s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=23.15µs (threshold=1ms)
parse:p95<=5ms PASS p95=81µs (threshold=5ms)
reserve:mean<=50ms PASS mean=1.6772ms (threshold=50ms)
reserve:p95<=200ms PASS p95=5.42ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=143.4µs (threshold=5ms)
encrypt:p95<=50ms PASS p95=191µs (threshold=50ms)
dispatch:mean<=5ms PASS mean=21.45µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=57µs (threshold=50ms)

7-provider-multi-model

7 providers, 5 users, 50 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 4 0.5 GB
mlx-community/gemma-3-270m-4bit 3 0.2 GB
Metric Value
Total Requests 50
Success 48
Errors 2
Total Duration 1m5.791s
Throughput 0.7 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 48 4.861s 31ms 33.875s 34.306s
parse 48 70µs 41µs 232µs 266µs
reserve 48 5ms 3ms 20ms 26ms
route 48 630ms 0s 4.253s 10.004s
queue_wait 7 2.89s 3.444s 4.456s 4.456s
encrypt 48 0s 0s 0s 1ms
dispatch 48 65µs 46µs 172µs 327µs
coordinator_to_provider 48 4.221s 10ms 33.859s 34.294s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=69.833µs (threshold=1ms)
parse:p95<=5ms PASS p95=232µs (threshold=5ms)
reserve:mean<=50ms PASS mean=4.814333ms (threshold=50ms)
reserve:p95<=200ms PASS p95=19.857ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=240.562µs (threshold=5ms)
encrypt:p95<=50ms PASS p95=398µs (threshold=50ms)
dispatch:mean<=5ms PASS mean=65.02µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=172µs (threshold=50ms)

3-provider-high-concurrency

3 providers, 10 users, 60 requests, concurrency=20, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 3 0.5 GB
Metric Value
Total Requests 60
Success 60
Errors 0
Total Duration 13.335s
Throughput 4.5 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 60 2.632s 1.404s 8.362s 8.433s
parse 60 0s 0s 0s 6ms
reserve 60 12ms 4ms 43ms 50ms
route 60 1.63s 848ms 8.246s 8.322s
queue_wait 42 2.329s 1.186s 8.246s 8.322s
encrypt 60 0s 0s 1ms 2ms
dispatch 60 47µs 27µs 194µs 273µs
coordinator_to_provider 60 973ms 18ms 4.864s 4.894s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=170.233µs (threshold=1ms)
parse:p95<=5ms PASS p95=293µs (threshold=5ms)
reserve:mean<=50ms PASS mean=11.6219ms (threshold=50ms)
reserve:p95<=200ms PASS p95=42.603ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=245.433µs (threshold=5ms)
encrypt:p95<=50ms PASS p95=561µs (threshold=50ms)
dispatch:mean<=5ms PASS mean=46.5µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=194µs (threshold=50ms)

1-provider-queue-saturation

1 providers, 10 users, 40 requests, concurrency=15, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 1 0.5 GB
Metric Value
Total Requests 40
Success 40
Errors 0
Total Duration 11.263s
Throughput 3.6 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 40 3.226s 2.77s 5.865s 5.865s
parse 40 72µs 26µs 297µs 682µs
reserve 40 13ms 2ms 47ms 54ms
route 40 2.802s 2.632s 5.768s 5.77s
queue_wait 35 3.203s 2.636s 5.768s 5.77s
encrypt 40 174µs 148µs 338µs 395µs
dispatch 40 32µs 24µs 105µs 160µs
coordinator_to_provider 40 395ms 3ms 3.918s 3.919s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=72.45µs (threshold=1ms)
parse:p95<=5ms PASS p95=297µs (threshold=5ms)
reserve:mean<=50ms PASS mean=12.659025ms (threshold=50ms)
reserve:p95<=200ms PASS p95=47.095ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=174.3µs (threshold=5ms)
encrypt:p95<=50ms PASS p95=338µs (threshold=50ms)
dispatch:mean<=5ms PASS mean=31.5µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=105µs (threshold=50ms)

3-provider-20-users

3 providers, 20 users, 60 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 3 0.5 GB
Metric Value
Total Requests 60
Success 60
Errors 0
Total Duration 12.126s
Throughput 4.9 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 60 843ms 8ms 4.968s 4.969s
parse 60 32µs 27µs 73µs 201µs
reserve 60 4ms 2ms 21ms 23ms
route 60 11ms 0s 0s 514ms
queue_wait 2 319ms 514ms 514ms 514ms
encrypt 60 172µs 141µs 404µs 702µs
dispatch 60 35µs 27µs 116µs 161µs
coordinator_to_provider 60 824ms 3ms 4.946s 4.949s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=31.65µs (threshold=1ms)
parse:p95<=5ms PASS p95=73µs (threshold=5ms)
reserve:mean<=50ms PASS mean=4.006433ms (threshold=50ms)
reserve:p95<=200ms PASS p95=20.584ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=171.766µs (threshold=5ms)
encrypt:p95<=50ms PASS p95=404µs (threshold=50ms)
dispatch:mean<=5ms PASS mean=35.133µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=116µs (threshold=50ms)

1-provider-scaling

1 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 1 0.5 GB
Metric Value
Total Requests 30
Success 30
Errors 0
Total Duration 10.714s
Throughput 2.8 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 30 2.599s 1.888s 6.019s 6.02s
parse 30 35µs 28µs 78µs 82µs
reserve 30 4ms 2ms 12ms 13ms
route 30 2.047s 1.866s 5.993s 5.993s
queue_wait 23 2.671s 1.881s 5.993s 5.993s
encrypt 30 249µs 167µs 577µs 666µs
dispatch 30 44µs 33µs 90µs 177µs
coordinator_to_provider 30 543ms 5ms 4.038s 4.039s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=35.233µs (threshold=1ms)
parse:p95<=5ms PASS p95=78µs (threshold=5ms)
reserve:mean<=50ms PASS mean=3.8423ms (threshold=50ms)
reserve:p95<=200ms PASS p95=11.872ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=249.466µs (threshold=5ms)
encrypt:p95<=50ms PASS p95=577µs (threshold=50ms)
dispatch:mean<=5ms PASS mean=43.566µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=90µs (threshold=50ms)

3-provider-scaling

3 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 3 0.5 GB
Metric Value
Total Requests 30
Success 30
Errors 0
Total Duration 8.408s
Throughput 3.6 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 30 1.562s 30ms 4.731s 4.735s
parse 30 63µs 32µs 211µs 534µs
reserve 30 13ms 11ms 41ms 47ms
route 30 42µs 22µs 202µs 315µs
encrypt 30 0s 0s 0s 2ms
dispatch 30 90µs 40µs 315µs 904µs
coordinator_to_provider 30 1.53s 14ms 4.636s 4.664s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=63.033µs (threshold=1ms)
parse:p95<=5ms PASS p95=211µs (threshold=5ms)
reserve:mean<=50ms PASS mean=13.2937ms (threshold=50ms)
reserve:p95<=200ms PASS p95=41.403ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=255.466µs (threshold=5ms)
encrypt:p95<=50ms PASS p95=441µs (threshold=50ms)
dispatch:mean<=5ms PASS mean=89.8µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=315µs (threshold=50ms)

5-provider-scaling

5 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 5 0.5 GB
Metric Value
Total Requests 30
Success 30
Errors 0
Total Duration 17.95s
Throughput 1.7 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 30 3.725s 11ms 11.211s 11.212s
parse 30 53µs 36µs 174µs 235µs
reserve 30 8ms 2ms 35ms 36ms
route 30 1.667s 0s 10.003s 10.003s
encrypt 30 172µs 164µs 272µs 386µs
dispatch 30 68µs 39µs 303µs 383µs
coordinator_to_provider 30 2.045s 4ms 11.185s 11.198s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=52.6µs (threshold=1ms)
parse:p95<=5ms PASS p95=174µs (threshold=5ms)
reserve:mean<=50ms PASS mean=7.577133ms (threshold=50ms)
reserve:p95<=200ms PASS p95=34.816ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=172.4µs (threshold=5ms)
encrypt:p95<=50ms PASS p95=272µs (threshold=50ms)
dispatch:mean<=5ms PASS mean=67.966µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=303µs (threshold=50ms)

3-provider-heavy-100conc-10kb

3 providers, 20 users, 100 requests, concurrency=100, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 3 0.5 GB
Metric Value
Total Requests 100
Success 100
Errors 0
Total Duration 15.481s
Throughput 6.5 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 100 10.317s 10.36s 14.514s 15.03s
parse 100 0s 0s 1ms 1ms
reserve 100 49ms 56ms 61ms 66ms
route 100 9.567s 10.237s 14.388s 14.901s
queue_wait 88 10.872s 10.93s 14.388s 14.902s
encrypt 100 269µs 229µs 569µs 872µs
dispatch 100 0s 0s 0s 2ms
coordinator_to_provider 100 640ms 7ms 5.413s 5.443s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=227.52µs (threshold=1ms)
parse:p95<=5ms PASS p95=1.051ms (threshold=5ms)
reserve:mean<=50ms PASS mean=49.01349ms (threshold=50ms)
reserve:p95<=200ms PASS p95=61.455ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=268.76µs (threshold=5ms)
encrypt:p95<=50ms PASS p95=569µs (threshold=50ms)
dispatch:mean<=5ms PASS mean=99.03µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=227µs (threshold=50ms)

The data protection keychain requires a provisioning profile to authorize
the keychain-access-groups entitlement. Wrap the CLI binaries in a minimal
Darkbloom.app bundle with embedded.provisionprofile so the persistent SE
attestation key works on provider machines.

- release-swift.yml: new step decodes PROVISIONING_PROFILE_BASE64 secret,
  builds Darkbloom.app/Contents/ structure, signs bundle + individual binaries
- install.sh: detects .app bundle layout, symlinks bin/ into the app bundle
- Backward-compatible: falls back gracefully if secret is not set or if
  provider receives a flat (pre-.app) bundle

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fdb113fa70


Comment thread coordinator/api/install.sh Outdated
@@ -22,7 +22,7 @@ set -euo pipefail
# Direct-fetch copy: no serve-time templating applied. Override with
# curl ... | COORD_URL=https://api.dev.darkbloom.xyz bash
# Or fetch the coordinator-served copy at $COORD_URL/install.sh for templating.
COORD_URL="${COORD_URL:-__DARKBLOOM_COORD_URL__}"
COORD_URL="${COORD_URL:-https://api.darkbloom.dev}"

P1: Restore coordinator URL placeholder in embedded installer

Reintroduce __DARKBLOOM_COORD_URL__ here instead of hardcoding https://api.darkbloom.dev, because /install.sh is still templated at serve time in server.go via strings.ReplaceAll(..., installScriptPlaceholder, s.resolveBaseURL(r)). With the placeholder removed, any dev/self-hosted coordinator now serves an installer that always talks to production, so release lookup and subsequent install traffic are routed to the wrong environment unless users manually override COORD_URL.

Useful? React with 👍 / 👎.
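The placeholder-plus-fallback pattern being asked for can be sketched as (placeholder name and fallback URL are from the diff above):

```shell
# Sketch of the fix Codex describes: keep the serve-time placeholder so
# server.go can substitute the coordinator's own base URL, and fall back
# to production only when the placeholder was never templated (i.e. the
# script was fetched raw from the repo, not served by a coordinator).
unset COORD_URL   # start clean for this standalone sketch; the real
                  # script honors a caller-provided COORD_URL override
COORD_URL="${COORD_URL:-__DARKBLOOM_COORD_URL__}"
case "$COORD_URL" in
  __DARKBLOOM_COORD_URL__) COORD_URL="https://api.darkbloom.dev" ;;
esac
```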

Required for data protection keychain access. Must match the bundle ID
in the provisioning profile (SLDQ2GJ6TL.io.darkbloom.provider).

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4f404c1aef


Comment on lines +180 to +185
let privateKeyAttrs: [String: Any] = [
kSecAttrIsPermanent as String: true,
kSecAttrAccessControl as String: accessControl,
kSecAttrLabel as String: label,
kSecAttrAccessGroup as String: accessGroup,
]

P1: Set data-protection keychain flag on persistent key ops

Add kSecUseDataProtectionKeychain: true to the persistent-key create/query/delete dictionaries; on macOS, kSecAttrAccessGroup and kSecAttrAccessible* only apply in data-protection-keychain mode. As written, the access-group constraint can be ignored, so the key may be created/looked up in the wrong keychain class and the intended team-scoped enforcement can silently fail (especially when rotating access groups or relying on entitlement-gated access behavior).

Useful? React with 👍 / 👎.
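Concretely, each create/query/delete dictionary would gain the flag (a sketch; label and group values are this PR's):

```swift
import Security

// Without kSecUseDataProtectionKeychain, macOS routes the query to the
// legacy file-based keychain, where kSecAttrAccessGroup is ignored.
let query: [String: Any] = [
    kSecClass as String: kSecClassKey,
    kSecAttrLabel as String: "io.darkbloom.provider.attestation-signing.v1",
    kSecAttrAccessGroup as String: "SLDQ2GJ6TL.io.darkbloom.provider",
    kSecUseDataProtectionKeychain as String: true, // the missing flag
    kSecReturnRef as String: true,
]
```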

…, real SE probe

Codex P1 / hank P1:
- coordinator/api/install.sh: restore __DARKBLOOM_COORD_URL__ placeholder
  (the coordinator templates this at serve time via server.go;
  hardcoding the URL broke dev/self-hosted coordinators)
- PersistentEnclaveKey: add kSecUseDataProtectionKeychain: true to all
  Security framework calls. Without it, queries may hit the legacy
  file-based keychain where access group enforcement is silently ignored.

hank P2:
- loadOrCreate: catch only errSecItemNotFound before falling through to
  createNew. Auth failures, locked keychain, and missing entitlement
  now propagate to the caller instead of racing with key creation.
- isAvailable: probe real SE capability via CryptoKit's
  SecureEnclave.isAvailable instead of just checking macOS version.
  Now returns false on Intel Macs without T2 and macOS VMs without
  virtualized SE. Added doc comment noting the entitlement dependency.
@Gajesh2007 (Member, Author)

@codex can you review this?


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 84664d04ba


Comment thread scripts/install.sh
# The bundle ships as Darkbloom.app/ (contains provisioning profile for
# keychain-access-groups) with bin/ symlinks for backward compatibility.
# Older flat bundles (bin/darkbloom directly) are also handled.
tar xzf "$TARBALL" -C "$INSTALL_DIR"

P2: Remove existing app bundle before extracting update

The installer now untars directly into ~/.darkbloom and then reuses Darkbloom.app in place, but it never deletes a previous Darkbloom.app first. If a future release removes or renames files, stale files can remain inside the bundle, which can invalidate sealed app resources (unsealed contents) and cause signature/entitlement checks to fail at runtime. This is especially risky here because the new persistent-key path depends on valid app signing + embedded provisioning profile; stale bundle contents can force unexpected fallback or launch failures on upgrade.

Useful? React with 👍 / 👎.
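The fix can be sketched as a small helper (function name is illustrative; the Darkbloom.app path follows the review text):

```shell
# Sketch of the suggested fix: remove any previous bundle before
# untarring, so stale files inside Darkbloom.app can't unseal the
# code signature and break entitlement checks on upgrade.
install_bundle() {
    local tarball="$1" install_dir="$2"
    rm -rf "$install_dir/Darkbloom.app"   # drop the old sealed bundle whole
    mkdir -p "$install_dir"
    tar xzf "$tarball" -C "$install_dir"
}
```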

@Gajesh2007 Gajesh2007 merged commit 4a0dae5 into swift-provider May 14, 2026
10 of 11 checks passed
Gajesh2007 added a commit that referenced this pull request May 15, 2026
Twelve fixes informed by three reviewer subagents (codex-rescue,
independent Claude, full pipeline audit) to ensure the bridge release
→ Swift release cutover works on first try, with no silent breakage.

Coordinator:
- accept darkbloom-bundle-<platform>.tar.gz (was eigeninference-bundle-)
- restore TestProviderRegistrationWithoutAttestationRejectedWhenBinaryHashPolicyConfigured
  (dropped during the master→swift-provider merge)

release-swift.yml:
- ship bin/{darkbloom,darkbloom-enclave,mlx.metallib} as real-file copies
  (was symlinks) so coordinator's tar.TypeReg verifier accepts them and
  hashes the actual bytes
- staple both bin/ AND .app/Contents/MacOS/ paths now that they're
  independent files
- post-codesign verification: fail build if signed CLI is missing the
  keychain-access-groups entitlement or the access group
  SLDQ2GJ6TL.io.darkbloom.provider, or if embedded.provisionprofile
  is absent from the .app
- PROVISIONING_PROFILE_BASE64 is now hard-required (no silent ephemeral
  fallback). Profile is decoded + parsed with plutil/python: verifies
  TeamIdentifier, keychain-access-groups, application-identifier, and
  ExpirationDate >= 30 days out
- pin MLX python wheel to 0.31.1 to match libs/mlx-swift Cmlx version
  (was 0.31.2 — patch-level metallib ABI risk)
- prod releases now hard-fail Swift tests (was soft-fail for all)

release-rust-bridge.yml:
- rename bridge bundle to darkbloom-bundle-<platform>.tar.gz uniformly
  so coordinator accepts the registration

Both release workflows:
- PROD_* secrets fall back to legacy unprefixed (R2_ACCESS_KEY_ID,
  RELEASE_KEY, COORDINATOR_URL) + vars.R2_BUCKET when PROD_* empty.
  Fails hard if neither resolves.

provider/src/main.rs (bridge auto-update):
- new rewrite_launchd_plist_for_swift: extracts ProgramArguments from
  the Rust plist (`serve --coordinator URL --model M`), converts to
  Swift shape (`start --foreground --coordinator-url URL --model M`),
  atomic rename
- install_swift_update_bundle_at: if Darkbloom.app/Contents/MacOS/
  exists in the extracted bundle, replace bin/{darkbloom,darkbloom-
  enclave,mlx.metallib} with symlinks into .app/MacOS and route the
  launchd plist's ProgramArguments[0] at the .app's MacOS binary path.
  This puts the embedded provisioning profile in scope at runtime, so
  the persistent SE key (PR #146) doesn't get errSecMissingEntitlement
  on first attestation post-cutover
- plist_path is now an Option<&Path> so tests can avoid touching the
  developer machine's real ~/Library/LaunchAgents

Tests added (all passing):
- 6 plist-rewrite unit tests: extract / convert / rewrite / install-
  with-plist / .app-aware install / hash-only install
- 1 ported coordinator attestation policy test
- existing 7 auto-update integration tests still pass (302 → 303 total)

Verified by audit:
- macos-26-xlarge has Xcode 26.2 / Swift 6.2, satisfies all
  swift-tools-version requirements
- LatestProviderVersion ordering: semver THEN created_at in both memory
  and Postgres stores
- /api/version JSON shape matches what auto_update_check_with_install_dir
  expects
- StartCommand --foreground doesn't recurse into launchAgent.installAndStart
- Swift ModelScanner reads ~/.cache/huggingface/hub (same as Rust)
- AuthTokenStore path parity (~/.darkbloom/auth_token)
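The semver-then-created_at ordering can be pinned with a sketch like the
following (row shape hypothetical; the real comparison lives in the Go
memory and Postgres stores):

```python
def version_sort_key(row):
    """Order by semver components numerically first, created_at as the
    tie-breaker — a plain string sort would rank 0.5.0 above 0.10.0."""
    major, minor, patch = (int(p) for p in row["version"].split("."))
    return (major, minor, patch, row["created_at"])

def latest_provider_version(rows):
    return max(rows, key=version_sort_key)
```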

Deployment prerequisite: coordinator changes must be deployed (master
→ dev Cloud Build, then human ecloud deploy to prod) BEFORE tagging
any release. Bridge registration will 400 against an older coordinator
that doesn't know about the darkbloom-bundle- filename.
Gajesh2007 added a commit that referenced this pull request May 15, 2026
* Clarify provider trust diagnostics

* Add Swift provider runtime

* Remove unused e2e vector generator

* Continuous batching, GPU-only enforcement, rename to darkbloom, Layr-Labs forks

This is the v0.5.0 cutover commit on the Swift provider PR. It lands
true continuous batching as the production inference path, threads
per-row sampling through the request, hard-fails on CPU-only hosts,
renames the user-visible CLI surface from "eigeninference" to
"darkbloom" with backward compatibility, and re-homes the mlx-swift /
mlx-swift-lm submodules to Layr-Labs forks.

Continuous batching (default, no parallel implementations)
----------------------------------------------------------
Replaces the per-request BatchScheduler with one shared BatchGenerator
ported from upstream `mlx_lm.generate`. All concurrent requests are
merged into one batched forward pass per step. Bit-identical against
single-stream greedy on:
  - Qwen3 0.6B-8bit (dense), B=2 / B=4-ragged
  - Qwen3.5 0.8B-MLX-4bit (hybrid SSM + attention), B=2
  - Gemma 4 26B-A4B-it-8bit (MoE, 26 GB), B=2

The mlx-swift-lm side of this work is at
Layr-Labs/mlx-swift-lm@darkbloom-continuous-batching:
  - BatchKVCache + BatchedCache protocol
  - SequenceStateMachine, PromptProcessingBatch, GenerationBatch,
    BatchGenerator
  - RowSamplers (temperature / top-P / top-K / seed)
  - Gemma 4 MoE support + K=V branch fix in Gemma4Attention

Production scheduler in provider-swift/Sources/ProviderCore/Inference/
BatchScheduler.swift wraps the engine in an actor; detached worker
calls into the actor only for short critical sections so cancel/submit
never queue behind a long-running step. submit() builds a per-row
sampler from request.{temperature, top_p, top_k, seed}.

Validation also covers eviction-and-admission: row 0 finishes mid-batch,
row C is admitted into its slot, row C's tokens match a solo run, row B
(running through the eviction) also matches its solo run. This locks in
BatchKVCache.filterBatched + extendBatched correctness end-to-end.

Sampler unit tests cover greedy passthrough, top-K=1 determinism,
top-K masking, top-P collapse-to-dominant, top-P=1 identity, seeded
reproducibility, and different-seed divergence.

GPU-only enforcement
--------------------
ProviderCore/Inference/GPUEnforcement.swift:
  - probeMetal(): non-throwing Metal device probe
  - requireMetal(): throws on missing GPU; pins Device.setDefault(.gpu);
    idempotent

Wired into BatchScheduler.loadModel, StartCommand, BenchmarkCommand,
and `darkbloom doctor`. Doctor surfaces a `[PASS] metal gpu: <name>,
<N> GB working set` line; `[FAIL]` on Intel/Linux. CPU fallback for
inference is rejected up-front with a descriptive error.

Rename: eigeninference → darkbloom (Swift CLI surface)
------------------------------------------------------
Canonical names:
  - eigeninference-enclave  → darkbloom-enclave (binary + struct)
  - Sources/eigeninference-enclave-cli/ → Sources/darkbloom-enclave-cli/
  - SwiftPM target EigenInferenceEnclaveCLI → DarkbloomEnclaveCLI
  - eigeninference-bundle-macos-arm64.tar.gz →
    darkbloom-bundle-macos-arm64.tar.gz
  - ~/.config/eigeninference/ → ~/.config/darkbloom/ (preferred path)
  - Mobileconfig prefix: EigenInference-Enroll-* → Darkbloom-Enroll-*

Backward compatibility:
  - install.sh creates an `eigeninference-enclave` symlink to
    `darkbloom-enclave` so existing install scripts keep resolving.
  - Config loader still reads ~/.config/eigeninference/ and the App
    Support legacy paths as fallbacks; new writes always go to
    ~/.config/darkbloom/.
  - LocalDataCleanup.purge() removes both directories.
  - release-swift.yml publishes the latest tarball under both
    canonical and legacy filenames.
  - NodeKeyPair.legacyDirNames and SecurityHardening MDM-profile-name
    matchers still accept the old name.
  - Coordinator/Rust/UI surfaces (R2 buckets, Stripe descriptors,
    Solana memos, telemetry source attribution) intentionally
    untouched.

CLI subcommands shipped in v0.5.0
---------------------------------
darkbloom serve / start / stop, status, doctor, models {list, catalog,
download, remove}, enroll, unenroll, login, logout, logs, autoupdate,
benchmark, update, verify. start --foreground is the launchd
entrypoint; start --local --port N runs a standalone OpenAI-compatible
HTTP server. PID-file single-instance enforcement, caffeinate-based
sleep prevention, panic-hook telemetry, and metallib hash in
attestation are all wired in.

Submodule re-homing
-------------------
.gitmodules now points to Layr-Labs/mlx-swift and Layr-Labs/mlx-swift-lm.
The mlx-swift pointer is unchanged (clean `main`). The mlx-swift-lm
pointer advances from 3ec4b8a (codex/local-mlx-swift-dependency) to
91612d5 (darkbloom-continuous-batching) which carries the batching
engine + Gemma 4 MoE fork on Layr-Labs/mlx-swift-lm.

Tests
-----
135 / 135 tests pass in 16.5 s with DARKBLOOM_LIVE_MLX_TESTS=1 and
DARKBLOOM_LIVE_MLX_GEMMA=1 (live MLX inference against real models
plus the gated 27 GB Gemma generation test).

* Bump mlx-swift-lm submodule to main after re-homing to Layr-Labs

Layr-Labs/mlx-swift-lm@main now carries the continuous-batching engine,
per-row samplers, and Gemma 4 MoE port at 8d76944. Same tree as the
prior 91612d5 commit on the darkbloom-continuous-batching branch, but
without the local-path mlx-swift dep hack, so the fork is consumable
by URL outside this repo.

* Untrack .claude/ files and drop dangling cross-references

The .claude/ directory holds local agent state (cursor task files,
working notes, the in-progress migration plan). Those don't belong in
the repo. Untrack the two committed markdown files and broaden the
.gitignore from `.claude/worktrees/` to `.claude/` so future agent runs
don't add them back. Strip the dead links to .claude/swift-migration-plan.md
from CLAUDE.md, provider-swift/README.md, docs/ARCHITECTURE.md, and
scripts/fetch-metallib.sh -- the surrounding prose stands on its own.

The local files remain on disk for active reference; only the tracking
is removed.

* Idle-timeout unload + coordinator-driven model preload protocol

Two related additions to the provider's model lifecycle:

1) Idle-timeout unload
----------------------
ProviderLoop now runs a background monitor that polls every minute.
If `idleTimeoutMins` minutes (default 60) have elapsed since the last
inference activity AND no requests are in flight, the loaded
ModelContainer is dropped. The next inference request lazy-reloads.
`idleTimeoutMins == 0` disables the monitor; the model stays
resident forever.

The decision is extracted into `IdleTimeoutPolicy.shouldUnload(...)`
so the rule is unit-testable without spinning up the full ProviderLoop
actor (which depends on Secure Enclave, coordinator client, and
security posture). Five unit tests pin the policy: (a) unloads when
all conditions met, (b) never unloads with inflight requests,
(c) never unloads with no model loaded, (d) waits for the timeout to
elapse, (e) zero-timeout edge case is still defensive.

Activity tracking: `lastInferenceAt` updates on every request
admission and on every request finish (`removeInflightTask`). The
worker is a detached `Task` so cancel/submit on the actor never
queue behind the timer.
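A minimal Python sketch of the decision rule (parameter names
hypothetical; the real policy is IdleTimeoutPolicy.shouldUnload in Swift):

```python
def should_unload(now_s, last_inference_at_s, inflight_count,
                  model_loaded, idle_timeout_mins):
    """Unload only when a model is resident, nothing is in flight,
    the feature is enabled, and the idle window has fully elapsed."""
    if idle_timeout_mins <= 0:
        return False  # 0 disables the monitor; model stays resident
    if not model_loaded or inflight_count > 0:
        return False
    return (now_s - last_inference_at_s) >= idle_timeout_mins * 60
```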

2) Coordinator-driven model preload
-----------------------------------
New WebSocket message `coordinator → provider: load_model`. The
provider has no inbound listener (security: a discovered IP can't
reach the GPU), so the coordinator pushes preload requests over the
existing outbound WebSocket connection that the provider opened.
Use case: the coordinator predicts demand for model X on machine Y
in the next hour and warms it ahead of time.

Provider behavior:
  - If model is already loaded: short-circuit, reply succeeded.
  - Otherwise: emit `load_model_status` "started" immediately,
    kick off `ensureModelLoaded` in a detached Task, then emit
    "succeeded" or "failed" (with an error string) when the load
    settles.

Wire surface added in three places (per AGENTS.md sync rule):
  - coordinator/internal/protocol/messages.go: `TypeLoadModel`,
    `TypeLoadModelStatus`, `LoadModelMessage`, `LoadModelStatusMessage`,
    plus the `LoadModelStatusStarted/Succeeded/Failed` constants.
  - provider-swift/.../Protocol/Messages.swift: new
    `CoordinatorMessage.loadModel(...)` case + `ProviderMessage
    .loadModelStatus(...)` case + Codable on both sides.
  - provider-swift/.../Coordinator/CoordinatorClient.swift: dispatch
    inbound `load_model` to a new `CoordinatorEvent.loadModel(modelId)`
    and add `OutboundMessage.loadModelStatus(...)` for the reply.

ProviderLoop wires `handleLoadModelRequest(modelId:send:)` for the
new event. Round-trip tests cover decoding a Go-style `load_model`
JSON and encoding all three lifecycle status replies (started /
succeeded / failed-with-error) with snake_case wire keys.

Rust legacy provider intentionally untouched. The coordinator
should gate `load_model` dispatch on `backend == "mlx-swift"` so
the Rust path never receives an unknown message; that gate lives
on the coordinator side and isn't part of this commit.
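The wire round-trip can be sketched as follows (Python stand-in; the
exact snake_case keys — model_id, load_model_status, the three status
strings — are assumed from the names in this commit):

```python
import json

def decode_coordinator_message(raw):
    """Decode an inbound coordinator frame; only the new load_model
    case is handled in this sketch."""
    msg = json.loads(raw)
    if msg["type"] == "load_model":
        return ("load_model", msg["model_id"])
    raise ValueError(f"unhandled message type: {msg['type']!r}")

def encode_load_model_status(status, error=None):
    """Encode the provider's lifecycle reply: started / succeeded /
    failed (failed carries an error string)."""
    assert status in ("started", "succeeded", "failed")
    body = {"type": "load_model_status", "status": status}
    if error is not None:
        body["error"] = error
    return json.dumps(body)
```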

Tests
-----
141 / 141 tests pass with DARKBLOOM_LIVE_MLX_TESTS=1 and
DARKBLOOM_LIVE_MLX_GEMMA=1 (live MLX inference + Gemma 4 26B-A4B-it-8bit
MoE batching included). New: 5 IdleTimeoutPolicy tests + 1
loadModel round-trip protocol test.

* Add end-to-end performance tests: TTFT, encryption, batching, model load

Four new live tests that produce reproducible numbers for the four
scenarios the operator asked about. Gated by DARKBLOOM_LIVE_MLX_TESTS=1;
all four target Qwen3 0.6B-8bit so the suite finishes in ~7 s.

  A) warm TTFT baseline -- pure inference TTFT with no encryption
     and the model already loaded.
  B) cold TTFT          -- spins up a fresh ModelContainer each
     iteration so the weights are re-paged from disk; reports
     load_time and load_time + first_token separately.
  C) encrypted TTFT     -- runs the request body through
     NodeKeyPair.encrypt (consumer side) and NodeKeyPair.decrypt
     (provider side) with real libsodium NaCl box, then submits.
     Reports encrypt-only, decrypt-only, warm TTFT, and total
     E2E first-token (enc + dec + TTFT) so each layer's cost is
     visible.
  D) batched TTFT       -- B=1, B=2, B=4 concurrent submissions on
     a single shared scheduler. Reports per-row TTFT and aggregate
     throughput so the continuous-batching scaling story is honest.

Headline numbers on M4 Max with Qwen3 0.6B-8bit:

  warm TTFT (plaintext):             ~20 ms
  encrypt (consumer side):           ~0.05 ms (libsodium NaCl box)
  decrypt (provider side):           ~0.02 ms
  E2E first-token (enc+dec+TTFT):    ~31 ms
  cold model load:                   ~856 ms
  cold load + first token:           ~1036 ms
  aggregate throughput B=1:          87.4 tok/s
  aggregate throughput B=2:          176.2 tok/s   (~2.0x)
  aggregate throughput B=4:          317.1 tok/s   (~3.6x)
  per-request TTFT B=1 -> B=4:       34 ms -> 36 ms (flat)

Encryption is essentially free, continuous batching scales
near-linearly to B=4, and per-request TTFT is invariant under
batching -- the key continuous-batching scheduler invariant.

The tests assert lower-bound liveness (durations > 0, all rows
complete) but don't pin absolute latencies, since those vary by
hardware. Numbers print to stderr in a "[perf]" prefix so they
land in the test log without polluting test stdout.

While here, fixed a `String(format:)` bug in the printRow helper
where `%s` (which expects a C string pointer) was used with a Swift
String — it would have segfaulted the test process via
_platform_strlen on an unaligned pointer.

145 / 145 tests pass in 9 s with DARKBLOOM_LIVE_MLX_TESTS=1.

* Add Gemma 4 26B-A4B-it-8bit MoE tier to performance suite

Refactor PerformanceLiveTests so every scenario (warm TTFT, cold load,
encrypted E2E, batched throughput) is parameterised by a `ModelConfig`
struct (label, modelID, wired-memory budget, iteration counts, batch
sizes, max_tokens). Two configs ship in the suite:

  - Qwen3 0.6B-8bit         smoke tier (DARKBLOOM_LIVE_MLX_TESTS=1)
  - Gemma 4 26B-A4B-it-8bit production tier
                            (DARKBLOOM_LIVE_MLX_TESTS=1 +
                             DARKBLOOM_LIVE_MLX_GEMMA=1)

Both run all four scenarios. Total 8 @test methods (4 + 4).

Headline numbers on M4 Max with weights memory-mapped from local cache:

  Gemma 26B MoE:
    warm TTFT                     309 ms
    cold load                     2.63 s
    cold load + first token       3.07 s
    encrypt (consumer side)       0.05 ms
    decrypt (provider side)       0.03 ms
    E2E first-token               262 ms
    B=1 throughput                10.2 tok/s
    B=2 throughput                16.7 tok/s   (1.64x)
    B=4 throughput                23.9 tok/s   (2.34x)

  Qwen3 0.6B (for comparison):
    warm TTFT                     ~21 ms
    cold load                     ~887 ms
    E2E first-token               ~32 ms
    B=4 throughput                ~302 tok/s

Three things the Gemma tier surfaces that the smoke tier doesn't:

1. Encryption is *still* essentially free at 26B scale -- 70-80 us
   combined for encrypt + decrypt, dwarfed by the 200+ ms
   memory-bandwidth-bound prefill.
2. Per-row TTFT scales SUB-linearly with B for MoE (234 -> 344 -> 603
   ms at B=1/2/4) because each batched prefill processes a heavier
   forward. Aggregate throughput still wins (10 -> 17 -> 24 tok/s).
3. Cold load on a 26 GB MoE that's still in the OS page cache is
   ~2.6 s -- the relevant number for the idle-timeout-reload path.
   First-ever boot would be longer (NVMe-bound), but unmeasurable
   from a unit test without privileged page-cache flushing.

Also tighten the report formatting: column padding to 56 chars, "ms"
under 1 s and "s" above, max_tokens=8 for Gemma (vs 16 for Qwen) so
the suite finishes in ~30 s with all four scenarios run twice.

149 / 149 tests pass in 37 s with both env vars set.

* Performance audit vs mlx_lm: bracket the dispatch-overhead gap

The user noticed that "10.2 tok/s for Gemma 26B" looked too low. They
were right. Side-by-side with `mlx_lm` 0.31.3 Python on the same M4
Max + same checkpoints:

  Qwen3 0.6B-8bit              mlx_lm: 426 tok/s   us: ~84 tok/s   (5.0x)
  Gemma 4 26B-A4B-it-8bit MoE  mlx_lm:  84 tok/s   us: ~33 tok/s   (2.4x)

To localize the gap, this commit adds a "decode-tps bracket" test
that measures the same B=1 steady-state decode through three paths:

  1. Pure model loop  -- model.callAsFunction directly, no scheduler
  2. BatchGenerator   -- our continuous-batching engine, B=1
  3. BatchScheduler   -- production path (actor + AsyncStream)

Findings on Gemma 26B MoE (decode-only, 64 tokens):

  pure loop, sync eval        34.6 tok/s
  pure loop, async eval       34.4 tok/s    (no improvement -- not
                                              the issue)
  BatchGenerator B=1          32.6 tok/s    (-6%, noise-level)
  BatchScheduler.submit       32.5 tok/s    (-6%, noise-level)

  mlx_lm Python reference     84.0 tok/s    (2.4x faster)

Conclusion: the gap is at the **MLX-Swift dispatch layer**, not in
our scheduler or batched-cache code. The pure model loop is already
2.4x slower than Python. Adding our BatchScheduler + actor + worker
adds < 6% on top -- not the bottleneck.

The 8-13 ms per-step CPU overhead is consistent with kernel-launch
latency in mlx-swift bindings. mlx_lm Python uses `mx.compile` on
the decode step to amortize this; mlx-swift-lm does not. Closing
the gap is a separate workstream on the upstream library.

Other improvements in this commit:

* Bump Gemma's batched max_tokens from 8 -> 32 so steady-state
  decode dominates the aggregate TPS metric.
* Add steady-state decode TPS reporting alongside aggregate (subtract
  prefill so it compares like-for-like with mlx_lm's "Generation:
  X tokens-per-sec" headline).
* Switch the throughput tests to a long-output prompt ("write a 200
  word story...") so the model decodes to max_tokens instead of
  hitting EOS at ~12 tokens. The B=1 number was misleadingly low
  before because the prior prompt asked for "a single word".
* Add async-eval pipelining variant to the bracket -- confirms
  mx.async_eval alone doesn't close the gap (which means the missing
  optimization is `mx.compile`, not just async dispatch).
* Add Qwen3 bracket test alongside the Gemma one.
* Document the gap explicitly in the file header so future
  optimisation work has a clear target.
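The steady-state subtraction is just (sketch; exact token accounting —
e.g. whether the first token is attributed to prefill — is an assumption):

```python
def steady_state_decode_tps(generated_tokens, total_s, prefill_s):
    """Decode-only tok/s: subtract the prefill phase from total wall
    time so the number compares like-for-like with mlx_lm's
    'Generation: X tokens-per-sec' headline."""
    return generated_tokens / (total_s - prefill_s)
```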

Honest headline numbers (M4 Max, weights memory-mapped from cache):

  Gemma 26B MoE warm TTFT             280-352 ms
  Gemma 26B MoE cold load             3.32 s   (re-page from cache)
  Gemma 26B MoE encrypt+decrypt       0.10 ms  (free)
  Gemma 26B MoE steady-state decode   32-40 tok/s   B=1
                                      35-39 tok/s   B=4 aggregate
  Qwen3 0.6B steady-state decode      84 tok/s      B=1
                                      323 tok/s     B=4 aggregate

Continuous batching itself works correctly: B=4 aggregate is 2.9x
B=1 (Gemma) and 3.8x B=1 (Qwen). The dispatch-overhead headwind
applies equally to all batch sizes.

151 / 151 tests pass in 71 s with both env vars set.

* Compare against mlx_lm batched + greedy fast-path in BatchScheduler

The previous perf audit only compared B=1 against mlx_lm. This commit
extends the comparison to B=1, B=2, B=4 by adding a Python benchmark
script (scripts/mlx_lm_batch_bench.py) that drives mlx_lm's upstream
BatchGenerator, and applies one targeted Swift-side optimization
based on what the comparison surfaced.

Reference numbers (mlx_lm 0.31.3, M4 Max, decode-only tok/s):

  Qwen3 0.6B-8bit              B=1: 265   B=2: 694   B=4: 1119
  Gemma 4 26B-A4B-it-8bit MoE  B=1:  74   B=2: 126   B=4:  181

The gap WIDENS with batch size, which pointed at an O(B) overhead in
our per-row sampling path. Smoking gun: GenerationBatch.step takes a
slow path whenever ANY row's sampler is non-nil, doing B separate
slice + sample + concat ops (=> 9 kernel launches per token at B=4)
instead of the vectorized fallback (=> 1 kernel launch). Our
BatchScheduler.submit was passing a non-nil greedy closure even when
temperature == 0, forcing every batch through the slow path.

Fix: when temperature <= 0, pass `nil` so the row falls through to
the vectorized fallback. The fallback is also greedy, so the result
is identical -- only the dispatch path changes. Per-row temperature
/ top-P / top-K / seed all still work for non-greedy rows.
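The fix and its dispatch-cost rationale, sketched in Python
(make_sampler is a hypothetical stand-in for the per-row sampler
constructor; the 2B+1 launch count is inferred from the "9 at B=4"
figure above — B slices + B samples + 1 concat):

```python
def row_sampler_for(temperature, make_sampler):
    """temperature <= 0 means greedy: return None so the row falls
    through to the vectorized greedy fallback (identical result,
    cheaper dispatch path)."""
    return None if temperature <= 0 else make_sampler(temperature)

def kernel_launches_per_step(row_samplers):
    """If ANY row has a sampler, the step does per-row
    slice + sample + concat (2B+1 launches); an all-None batch
    takes the single vectorized launch."""
    b = len(row_samplers)
    return 2 * b + 1 if any(s is not None for s in row_samplers) else 1
```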

Swift numbers after the fix (decode-only):

  Qwen3 0.6B-8bit              B=1:  88   B=2: 181   B=4:  351   (was 84 / 174 / 323)
  Gemma 4 26B-A4B-it-8bit MoE  B=1:  37   B=2:  23   B=4:   42   (was 33 / 21 / 39)

Modest +6-13% across the board. The remaining 3-4x gap to Python is
at the MLX-Swift dispatch layer (per-step kernel-launch overhead);
mlx_lm closes it via `mx.compile` on the decode step, which isn't
applied in mlx-swift-lm. That's a separate workstream.

Continuous batching scaling is still healthy:
  Qwen B=4 / B=1 = 4.0x   (matches mlx_lm's 4.2x exactly)
  Gemma B=4 / B=1 = 1.1x  (mlx_lm's is 2.4x; gap reflects MoE expert
                           dispatch where Python's compile pays off most)

Other changes:
* scripts/mlx_lm_batch_bench.py -- runnable apples-to-apples bench
  for future regression checks. Reproduces the reference numbers in
  the file header.
* Update PerformanceLiveTests.swift docstring with the side-by-side
  table so the gap is visible to anyone reading the test.

151 / 151 tests pass.

* Perf compare mlx_lm batching and bump mlx-swift-lm decode optimizations

The user called out that our Gemma 26B throughput looked too low, so this
commit makes the comparison apples-to-apples against mlx_lm Python's
BatchGenerator and bumps the mlx-swift-lm submodule to the optimized main
commit.

New reference script:
  scripts/mlx_lm_batch_bench.py

It runs mlx_lm.generate.BatchGenerator at B=1/2/4 over the same long-output
prompt used by PerformanceLiveTests and reports prefill+1, decode-only TPS,
and aggregate TPS. Reference numbers on M4 Max:

  Qwen3 0.6B-8bit              B=1: 265   B=2: 694   B=4: 1119 tok/s
  Gemma 4 26B-A4B-it-8bit MoE  B=1:  74   B=2: 126   B=4:  181 tok/s

Swift improvements landed in Layr-Labs/mlx-swift-lm@b02ea5b:

  - mlx_lm-style double buffering in GenerationBatch: constructor primes
    the first token, next() returns current token while async-evaluating
    the following token.
  - Greedy fast path avoids logSumExp: argMax(logits) == argMax(logprobs),
    and we don't expose logprobs downstream today.
  - BatchScheduler now passes nil for temperature=0 samplers so batches
    use the vectorized greedy fallback instead of per-row slice/sample/concat.
  - Token tensors are UInt32 to match mlx_lm.
  - BatchKVCache now exposes innerState and KVCache conforms to Updatable,
    which fixes the cache state surface needed for future compile work.

Measured Swift deltas:

  Qwen3 0.6B:
    B=1 decode      ~84 -> ~104 tok/s
    B=4 aggregate   ~323 -> ~363 tok/s

  Gemma 26B MoE:
    B=1 decode      ~32 -> ~37 tok/s
    B=4 aggregate   ~39 -> ~40 tok/s

This closes the avoidable scheduler/batching overhead we found, but does
not fully close the remaining 2-4x gap to Python. The bracket test shows
BatchGenerator/BatchScheduler are now within noise of the pure model loop;
the remaining gap is in mlx-swift model dispatch / lack of stateful
mx.compile support. Attempting to compile the batched-cache decode graph
still fails in mlx-swift with "uncaptured inputs", so that remains an
upstream library workstream rather than a provider scheduler bug.

* Clarify release-mode batch performance measurements

The previous perf notes mixed debug-mode Swift numbers with mlx_lm Python
reference numbers, which made the Swift engine look far worse than it is.
This test-only cleanup makes the performance suite report the data needed
to keep comparisons honest.

Changes:
- Update the PerformanceLiveTests header to state explicitly that mlx_lm
  comparisons must use `swift test -c release`; debug Swift is several
  times slower and not a valid reference.
- Add direct BatchGenerator B=2/B=4 decode-only measurements to the
  bracket test, in addition to pure loop and BatchScheduler.submit.
- Add "model-side scheduler" TPS in the public batched test so we can
  distinguish model decode speed from public text streaming / AsyncStream /
  detokenization costs.

Release-mode checks on this machine:
- Qwen3 0.6B direct BatchGenerator B=4: ~1130 tok/s, matching mlx_lm's
  ~1119 tok/s reference.
- Gemma 4 26B-A4B-it-8bit direct BatchGenerator B=4: ~186 tok/s,
  matching mlx_lm's ~181 tok/s reference.
- BatchScheduler.submit B=1 decode bracket also lands at the direct model
  rate in release mode (~402 tok/s Qwen, ~79 tok/s Gemma); public streaming
  tests report separate model-side and aggregate numbers so regressions are
  localizable.

No production code changes in this commit.

* Complete Swift provider runtime verification

* Bridge Rust updater to Swift provider bundles

* Add Rust to Swift updater E2E tests

* Add Rust bridge release workflow

* E2E testbed: integration tests, profiling, and benchmarking infrastructure (#136)

* Flatten coordinator/internal/ to coordinator/, add E2E integration test suite

Promote Go module root from coordinator/ to repo root so the e2e
test suite can import coordinator packages. Flatten
coordinator/internal/ to coordinator/ to remove the Go internal
package restriction.

All import paths change from
github.com/eigeninference/coordinator/internal/X to
github.com/eigeninference/d-inference/coordinator/X.
The module path is now github.com/eigeninference/d-inference.

12 E2E integration tests using the Swift provider (mlx-swift backend):
- NonStreamingInference, StreamingInference
- MultipleRequestsAccounting, E2EEncryptionCorrectness
- BillingBalanceDeduction, ProviderPayoutSplit, ReferralRewardDistribution
- InsufficientBalance, InvalidModel
- StreamingContentValidation, ConcurrentRequests, AttestationHeaders

Each test gets its own isolated suite (Postgres + coordinator + provider)
via startSuite(t). A semaphore serializes suite lifecycles to prevent
GPU contention from concurrent MLX model loads.

Update CI workflows to reference go.mod at repo root, exclude e2e/
from unit tests, and use swift build for the provider.

* Move coordinator e2e back to coordinator/internal/e2e/

The coordinator's own e2e package was incorrectly flattened into
coordinator/e2e/ alongside the repo-root e2e/ testbed suite.
Restore it to coordinator/internal/e2e/ where it belongs.

* Run integration tests on any PR, not just master/main

* Fix CI: install Docker on macos-15, increase timeout to 30m, serial tests

* Use colima for Docker on macOS CI

* Remove invalid --no-mount flag from colima start

* Add native Postgres fallback, drop Docker/colima from CI

Docker Desktop and colima both fail on macOS CI runners due to
virtualization restrictions. Add a native Postgres lifecycle that
uses initdb + postgres directly (installed via Homebrew).

The Start() method tries Docker first, falls back to native.
CI now installs postgresql@16 via brew instead of Docker.

* Download MLX model in CI before running integration tests

* Use Python API for model download (huggingface-cli is deprecated)

* Use shared suite across all integration tests

Instead of starting a new suite (Postgres + coordinator + provider +
model load) per test, use a single shared suite initialized on first
access. This cuts total test time from ~18min to ~3min since the
expensive model load only happens once.

* Build provider in debug mode for CI (skips SIP/security checks)

CI macOS runners have SIP disabled, which causes the provider to
exit with 'System Integrity Protection is disabled'. Debug builds
skip verifySecurityPosture() via #if !DEBUG, allowing tests to
run on CI.

Add TESTBED_PROVIDER_CONFIG env var (default: release) to control
the Swift build configuration from testbed.

* Force-trust provider in tests, disable frequent challenges

CI macOS runners have SIP disabled, which causes the provider to
fail attestation challenges. Add ForceTrustProvider() to override
status/trust/SIP verification for testing, set challenge interval
to 1h, and add a 3s delay after registration to let the initial
challenge fire before overriding.

* Force all privacy capabilities in ForceTrustProvider for testing

The private-text routing gate checks PythonRuntimeLocked and
DangerousModulesBlocked which are always false on the Swift
backend (no Python runtime). ForceTrustProvider now sets all
privacy capabilities to true and drains queued requests
immediately after trust promotion.

* Restore per-test isolated suites

Each test gets its own Postgres + coordinator + provider.
With debug builds, ForceTrustProvider, native Postgres, and
model pre-download, each suite starts in ~15-20s.

* Add load generator, profiling tests, multi-provider support

- Suite.Providers is now []*Provider; TESTBED_NUM_PROVIDERS env var
  controls how many provider subprocesses start per suite
- New LoadGenerator in testbed/load.go with configurable concurrency,
  total requests, streaming, max_tokens, temperature
- New profile tests: SingleProviderStreaming, SingleProviderNonStreaming,
  HighConcurrency — each prints segment tables with mean/p50/p95/max
- Existing integration tests (NonStreaming, Streaming, Concurrent) now
  emit Instrument events and print profile tables
- Profile SummaryTable uses millisecond resolution instead of microsecond

* Add multi-model provider specs, user pool, and latency decomposition headers

SuiteConfig now takes ModelSpecs (model ID + provider count per model) and
NumUsers. Providers are started per-spec with unique PID files (fixes
single-instance lock killing sibling providers). A user pool with round-robin
API key rotation is created at startup.

Coordinator sets X-Queue-Wait-Ms and X-Provider-Latency-Ms response headers
from PendingRequest timing fields (QueuedAt, DispatchedAt, FirstChunkAt).
LoadGenerator parses these and emits per-segment stats:
client_to_coordinator, queue_wait, coordinator_to_provider, provider_to_client.

Provider ProcessLifecycle respects DARKBLOOM_PID_FILE env var for
multi-instance testing. Add SetSkipChallenge to Server for test runs.

* Rename SegmentClientToCoordinator to SegmentTotalE2E

The segment measures full end-to-end wall clock time, not just
client-to-coordinator latency. The old name was misleading.

* Decompose X-Timing header into per-phase microsecond breakdown

Replace X-Queue-Wait-Ms / X-Provider-Latency-Ms with a single X-Timing
JSON header containing parse_us, reserve_us, route_us, queue_us,
encrypt_us, dispatch_us, provider_us. Move timing fields onto a
RequestTiming struct in PendingRequest. LoadGenerator parses the JSON
and emits per-segment stats with auto ms/µs precision.
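Roughly what the LoadGenerator side does, as a Python sketch (the Go
code's exact formatting rules are assumptions; the *_us field names
are from the header spec above):

```python
import json

def parse_x_timing(header_value):
    """Parse the X-Timing JSON header into per-phase durations and
    render each with auto precision: >= 1000 us prints as ms."""
    phases = json.loads(header_value)

    def fmt(us):
        return f"{us / 1000:.1f}ms" if us >= 1000 else f"{us}us"

    return {name: fmt(us) for name, us in phases.items()}
```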

* Add latency regression assertions, SegmentStatsMap, and heavy-load benchmark

- Add SegmentStatsMap() to LoadResult for per-segment mean/p50/p95/p99/max
- Wire coordinator overhead assertions into all benchmark and profile tests
- Update DefaultThresholds with realistic values based on benchmark data
- Add CoordinatorOverheadThresholds() alias
- Deduplicate SegmentStatsView (assert package uses type alias to testbed)
- Clean up profile_test.go: remove redundant second load loop, use assertions
- Add PromptBytes field to RequestConfig for large-payload testing
- Add HeavyLoad 100-concurrent 10KB benchmark
- Replace bubble sort with sort.Slice in computeStats
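The per-segment stats shape, sketched in Python (nearest-rank
percentiles are an assumption; the point is the proper sort replacing
the bubble sort):

```python
def segment_stats(durations):
    """mean / p50 / p95 / p99 / max over one segment's samples."""
    xs = sorted(durations)  # sort.Slice equivalent
    n = len(xs)

    def pct(p):  # nearest-rank percentile
        return xs[min(n - 1, int(p / 100 * n))]

    return {"mean": sum(xs) / n, "p50": pct(50),
            "p95": pct(95), "p99": pct(99), "max": xs[-1]}
```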

* Split CI into eval + benchmark jobs, post benchmark results as PR comment

Integration tests (TestIntegration|TestProfile) run on every push/PR.
Benchmarks (TestBenchmark) run only on PRs and post a markdown summary
as a PR comment via gh pr comment. LoadResult and AssertionReport gain
SummaryMarkdown() methods for markdown table formatting. A TestMain in
benchmark_test.go writes the aggregated markdown to BENCHMARK_MD_PATH
when set.

* Skip multi-model benchmark in CI (gemma model not downloaded)

The M1 Virtual CI runner only downloads Qwen3.5-0.8B; the gemma
multi-model test requires a second model that isn't available.

* Download gemma-3-270m-4bit in CI, remove multi-model skip

* Include model IDs and RAM sizes in benchmark PR comment

* address feedback

* fix: soft-fail Swift tests on dev + download full model for CI

* feat: environment-scoped R2 + coordinator secrets for dev/prod release isolation

- Move R2_BUCKET from vars to secrets so it participates in GitHub
  environment scoping (dev vs prod get different buckets/credentials)
- Add documentation header listing all environment-scoped secrets
  required per environment
- Soft-fail Swift unit tests on dev releases (live MLX model cache
  may be incomplete on CI)
- Download full model (remove --include filter) for deterministic
  CI cache seeding

* feat: DEV_/PROD_ prefixed repo secrets for R2 + coordinator env isolation

Both release workflows now resolve DEV_ or PROD_ prefixed repo secrets
in a resolve-env step using bash indirection — no GitHub environments
needed. The environment: gate is removed since secrets live at repo
level with prefixes.

Required repo secrets:
  DEV_R2_ACCESS_KEY_ID, PROD_R2_ACCESS_KEY_ID
  DEV_R2_SECRET_ACCESS_KEY, PROD_R2_SECRET_ACCESS_KEY
  DEV_R2_ENDPOINT, PROD_R2_ENDPOINT
  DEV_R2_BUCKET, PROD_R2_BUCKET
  DEV_R2_PUBLIC_URL, PROD_R2_PUBLIC_URL
  DEV_COORDINATOR_URL, PROD_COORDINATOR_URL
  DEV_RELEASE_KEY, PROD_RELEASE_KEY

* fix: RELEASE_KEY is shared, not env-prefixed

* fix: resolve env secrets inline to avoid GitHub cross-job output masking

* fix: add DEV_RELEASE_KEY/PROD_RELEASE_KEY to env-prefixed secrets

* Add STRIDE threat model for runtime security review

40 threats across 9 trust boundaries (coordinator/provider WebSocket,
provider operator vs process, browser/UI, Apple MDM/MDA, admin API,
inference engine, payments, Apple attestation chain). Adversaries:
malicious provider, malicious consumer, external attacker. Each threat
includes affected_files globs, mitigations with status, open_findings
links to the existing security audit, and a detection_hint for
automated PR review.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* Expand threat model trust boundaries with implementation detail

Each of the 9 trust boundaries now documents how_it_works (exact code
paths, line numbers, auth mechanisms, data flows) and current_limitations
(specific open gaps with SEC-* references). Sources: coordinator/internal/
api/{server,provider,release_handlers,device_auth,billing_handlers}.go,
registry/registry.go, attestation/, mdm/, provider-swift/Sources/
ProviderCore/Security/{AntiDebug,BinaryHasher,SecureEnclaveIdentity,
SecurityHardening}.swift, Crypto/NodeKeyPair.swift, Inference/
{BatchScheduler,IdleTimeoutPolicy,InferenceCancellation}.swift,
ProviderLoop.swift, console-ui/src/{hooks/useAuth,lib/{api,store,
encryption}}.ts, next.config.ts.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* Add threat model PR review workflow

On every PR against master/main, the workflow:
1. Gets the PR diff via gh pr diff
2. Matches changed files against affected_files globs in docs/threat-model.yaml
3. Calls Claude API (claude-sonnet-4-6) with the focused diff + full threat model
4. Posts (or updates) a single PR comment with STRIDE-based security analysis

Uses prompt caching on the static threat model block to minimise API cost
on repeated pushes. The comment marker <!-- threat-model-review --> lets
the workflow update rather than append on each push.
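Step 2's glob matching can be sketched as follows (a simplified Python illustration; fnmatch-style matching is an assumption about how the workflow interprets the affected_files patterns):

```python
from fnmatch import fnmatch

def threats_touched(changed_files, threats):
    """Return IDs of threats whose affected_files globs match any
    file in the PR diff. `threats` maps threat-id -> glob list."""
    hits = []
    for tid, globs in threats.items():
        if any(fnmatch(f, g) for f in changed_files for g in globs):
            hits.append(tid)
    return hits
```

Only the matched threats (plus the full model for context) are sent to the API, keeping the focused diff small.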

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* Persistent Secure Enclave key with keychain access group enforcement (#146)

* Add persistent Secure Enclave attestation key with keychain access group enforcement

Replace ephemeral CryptoKit SE keys with persistent Security framework keys stored
in the macOS data protection keychain. The key is bound to the signing team's
keychain access group (SLDQ2GJ6TL.io.darkbloom.provider), enforced by securityd
on top of the kernel's code-signing checks (AMFI). A patched binary re-signed
with codesign -s - gets errSecMissingEntitlement and cannot access the key.

- PersistentEnclaveKey: Security framework SE key with SecKeyCreateRandomKey,
  kSecAttrIsPermanent, and team-scoped access group
- AttestationSigner protocol: abstracts over both ephemeral and persistent keys
- ProviderLoop: tries persistent key first, falls back to ephemeral with warning
- Entitlements plist with keychain-access-groups for production signing
- 8 tests covering creation, persistence, signing, deletion, protocol conformance

* Embed provisioning profile in .app bundle for persistent SE key

The data protection keychain requires a provisioning profile to authorize
the keychain-access-groups entitlement. Wrap the CLI binaries in a minimal
Darkbloom.app bundle with embedded.provisionprofile so the persistent SE
attestation key works on provider machines.

- release-swift.yml: new step decodes PROVISIONING_PROFILE_BASE64 secret,
  builds Darkbloom.app/Contents/ structure, signs bundle + individual binaries
- install.sh: detects .app bundle layout, symlinks bin/ into the app bundle
- Backward-compatible: falls back gracefully if secret is not set or if
  provider receives a flat (pre-.app) bundle

* Add com.apple.application-identifier to provider entitlements

Required for data protection keychain access. Must match the bundle ID
in the provisioning profile (SLDQ2GJ6TL.io.darkbloom.provider).

* Address review: data protection keychain flag, tighter error handling, real SE probe

Codex P1 / hank P1:
- coordinator/api/install.sh: restore __DARKBLOOM_COORD_URL__ placeholder
  (the coordinator templates this at serve time via server.go;
  hardcoding the URL broke dev/self-hosted coordinators)
- PersistentEnclaveKey: add kSecUseDataProtectionKeychain: true to all
  Security framework calls. Without it, queries may hit the legacy
  file-based keychain where access group enforcement is silently ignored.

hank P2:
- loadOrCreate: catch only errSecItemNotFound before falling through to
  createNew. Auth failures, locked keychain, and missing entitlement
  now propagate to the caller instead of racing with key creation.
- isAvailable: probe real SE capability via CryptoKit's
  SecureEnclave.isAvailable instead of just checking macOS version.
  Now returns false on Intel Macs without T2 and macOS VMs without
  virtualized SE. Added doc comment noting the entitlement dependency.
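The loadOrCreate error-handling rule reduces to this pattern (Python stand-in; the real code is Swift catching errSecItemNotFound, and the exception name here is hypothetical):

```python
class ItemNotFound(Exception):
    """Stand-in for errSecItemNotFound."""

def load_or_create(load, create_new):
    """Fall through to key creation only when the key is absent.
    Any other failure (auth denied, locked keychain, missing
    entitlement) propagates instead of racing with creation."""
    try:
        return load()
    except ItemNotFound:
        return create_new()
```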

* fix(api): add code and param fields to OpenAI error responses (#144)

The errorResponse function only populated type and message, missing
code and param required by the OpenAI API spec. Without code, SDKs
cannot programmatically distinguish error types (e.g. Python SDK
e.code returns None, retry logic breaks, Sentry groups all errors
as one).

Changes:
- errorResponse now accepts optional errorDetailOpt variadic args
- code defaults to errType for backward compatibility
- withParam() and withCode() helpers for call-site overrides
- model-not-found errors include param="model"
- model-is-required errors include param="model"
- insufficient_funds uses OpenAI-canonical code "insufficient_quota"
- rate_limit_exceeded gets explicit withCode for clarity

All 202 existing call sites are backward-compatible: the variadic
signature means they compile unchanged, and the default code=errType
matches the implicit behavior SDKs already assumed.

Closes #142
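The defaulting behavior can be sketched like this (Python illustration of the Go variadic-option pattern; the real signature differs):

```python
def error_response(err_type, message, *, code=None, param=None):
    """Build an OpenAI-style error body. `code` defaults to the
    error type, matching the implicit behavior SDKs already
    assumed; `param` names the offending request field, if any."""
    return {"error": {
        "type": err_type,
        "message": message,
        "code": code if code is not None else err_type,
        "param": param,
    }}
```

Call sites that pass nothing extra keep their old behavior; specific sites override code or param as listed above.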

* feat: add Datadog observability stack for dev coordinator (#143)

* Fix Darkbloom analytics tracking

* Harden release workflow protections (#103)

* Harden release registration and binary hash policy (#99)

* Harden release registration and binary hash policy

* derive release download URL from allowlist

* Stabilize provider coordinator test

---------

Co-authored-by: Gajesh Naik <26431906+Gajesh2007@users.noreply.github.com>

* Remove stale Python integration test (#109)

* e2e: add local simulation environment skeleton

Introduces scripts/e2e-runner.py, a Python orchestrator that spins up the
real coordinator binary with test-friendly configuration (in-memory store,
mock billing, no trust requirements) alongside a simulated or real
provider, and runs HTTP/WebSocket-level assertions against the live stack.

Key components:
- Coordinator class: builds and spawns coordinator with EIGENINFERENCE_MIN_TRUST=none,
  EIGENINFERENCE_BILLING_MOCK=true, and in-memory store
- SimulatedProvider: pure-Python WebSocket client speaking the full provider protocol
  (register, attestation challenge/response, heartbeat, inference request/response)
- Test framework: decorator-based test registration, pass/fail summary, signal-safe
  cleanup via atexit + signal handlers
- Test stubs: test_basic (registration + discovery), test_inference (consumer
  request routing), test_multi_provider (two providers, same model)

TODO:
- RealProvider wrapper around darkbloom serve --coordinator
- Coordination between provider challenge cycle and consumer request timing
- API key handling for consumer vs admin routes
- Python dependency management (websockets, cryptography)

* Revert "e2e: add local simulation environment skeleton"

This reverts commit d02074e. The Python E2E runner adds noise on top of
the existing Go integration tests (internal/api/integration_test.go +
fullstack_integration_test.go) which already cover the full coordinator
protocol surface. The cross-language orchestration doesn't buy anything
over what httptest.Server + simulated providers already provide.

* Remove stale Python integration test

@ethenotethan

tests/integration_test.py is superseded by the Go-based coordinator
integration tests at coordinator/internal/api/:

- Coordinator protocol coverage (register, challenge, heartbeat,
  inference) is provided by integration_test.go using httptest.Server +
  Go simulated providers — same coverage, no binary build needed
- Full-stack GPU inference is covered by fullstack_integration_test.go
  with real vllm-mlx backends (gated behind LIVE_FULLSTACK_TEST=1)
- The Python test uses stale binary names ('eigeninference-provider'),
  old flags ('--backend mlx-lm'), and predates attestation challenges,
  E2E encryption, and the vllm-mlx backend migration
- No external dependency coverage (Postgres, Stripe, etc.) is lost — the
  coordinator main.go wiring for those is trivially tested elsewhere
- The Python SDK tests (4.5.x) belong in the SDK repo, not the infra repo

---------

Co-authored-by: Hank Bob <hankbob@researchoors.com>

* chore: remove unused dependencies (#112)

* chore: remove unused dependencies

* test: fix console ui test isolation

* chore: prune repo-wide dead code findings

* ci: run CI on any PR, not just master/main (#119)

* ci: remove racing deploy-dev-coordinator workflow (#137)

Cloud Build (deploy/gcp/cloudbuild.yaml) already deploys the coordinator
on the same trigger (push to master touching coordinator/** or deploy/gcp/**).
Having both paths active creates a race condition where two CI systems
simultaneously deploy to the same dev VM — see #115.

* feat: add Datadog observability stack for dev coordinator

Install Datadog Agent on the dev GCE VM (DogStatsD, APM, journald logs)
and wire the coordinator to emit structured metrics, split attestation
counters, model_type tags, reactive provider-count gauges, and a
completion-tokens counter. Rebuild the dev dashboard with 7 sections
covering metrics, logs, traces, and system health.

* fix: prevent double-decrement when untrusted provider disconnects

Disconnect now checks StatusUntrusted before decrementing the online
counter and model-provider gauges, since MarkUntrusted already
decremented them.
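The invariant is that each provider decrements the online counter exactly once, whichever path fires first. A minimal Python sketch (the real registry is Go; names mirror MarkUntrusted/Disconnect):

```python
import threading

class Registry:
    """Online-count bookkeeping: MarkUntrusted already decrements,
    so Disconnect must skip providers already marked untrusted."""
    def __init__(self):
        self.mu = threading.Lock()
        self.online = 0
        self.status = {}

    def connect(self, pid):
        with self.mu:
            self.status[pid] = "trusted"
            self.online += 1

    def mark_untrusted(self, pid):
        with self.mu:
            if self.status.get(pid) == "trusted":
                self.status[pid] = "untrusted"
                self.online -= 1  # the one and only decrement

    def disconnect(self, pid):
        with self.mu:
            st = self.status.pop(pid, None)
            if st == "trusted":   # skip if MarkUntrusted already
                self.online -= 1  # decremented the counter
```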

* feat: add fleet version and binary hash observability

New metrics:
- providers.per_version gauge (per provider binary version)
- providers.per_binary_hash gauge (per attested binary hash)
- coordinator.min_provider_version_set gauge (1 when configured)
- provider_version_below_minimum counter (tagged by gate and version)

Gates instrumented:
- registration (provider.go)
- challenge revalidation (provider.go)
- manifest sync (server.go)

Registry additions:
- ProviderCountByVersion()
- ProviderCountByBinaryHash()

Dashboard: Fleet Version & Binary Hash group with providers by version,
providers by binary hash, min provider version, below-minimum events,
and top binary hashes toplist.
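ProviderCountByVersion amounts to a grouped count over the registry (Python sketch; the exclusion of offline/untrusted providers is an assumption about the gauge semantics):

```python
from collections import Counter

def provider_count_by_version(providers):
    """Per-version gauge values: count online, trusted providers
    grouped by reported binary version."""
    return Counter(
        p["version"] for p in providers
        if p["online"] and p["status"] == "trusted"
    )
```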

* fix: update Dockerfile + cloudbuild for go.mod at repo root

go.mod moved from coordinator/ to repo root during the swift-provider
merge. Build context is now repo root, Dockerfile copies coordinator/
subdir explicitly.

* fix: chmod +x coordinator binary in Dockerfile

* fix: ensure coordinator binary is executable in builder stage

* fix: rename coordinator source dir in builder to avoid colliding with binary path

* fix: copy full repo in Dockerfile builder so go.mod resolves all packages

* fix: remove unused modelTypeTag and format Go files for CI

* fix: skip python/dangerous-modules check for swift runtime in private test gate

* billing telemetry + MarkUntrusted race fix + Swift routing tests

- Add Datadog histogram metrics for reservation amounts, settlement
  refunds, provider credits, and platform fees
- Add store.debit/credit.latency_ms histograms for DB operation timing
- Add billing.cost_clamped and billing.reservation_refunds counters
- Fix race in MarkUntrusted: hold r.mu write lock through counter
  decrement to prevent double-decrement with Disconnect
- Add unit tests for Swift provider privacy caps (with/without Python)
- Add E2E test for Swift provider routing via challenge-verified path
- Update dev-network-dashboard.json with Billing & Store group

* fix Heartbeat reviving untrusted providers causing onlineCount double-decrement

* revert orthogonal landing/console-ui/provider changes

* remove unbounded binary_hash cardinality, add input token metrics + store latency, fix dashboard group-by

* fix review feedback: ModelType() untrusted filter, routing.cost_ms by provider, billing in cents, dead comment

---------

Co-authored-by: Gajesh Naik <26431906+Gajesh2007@users.noreply.github.com>
Co-authored-by: anupsv <6407789+anupsv@users.noreply.github.com>
Co-authored-by: hankbob <hankbobtheresearchoor@gmail.com>
Co-authored-by: Hank Bob <hankbob@researchoors.com>

* migration: harden Rust→Swift cutover end-to-end

Twelve fixes informed by three reviewer subagents (codex-rescue,
independent Claude, full pipeline audit) to ensure the bridge release
→ Swift release cutover works on first try, with no silent breakage.

Coordinator:
- accept darkbloom-bundle-<platform>.tar.gz (was eigeninference-bundle-)
- restore TestProviderRegistrationWithoutAttestationRejectedWhenBinaryHashPolicyConfigured
  (dropped during the master→swift-provider merge)

release-swift.yml:
- ship bin/{darkbloom,darkbloom-enclave,mlx.metallib} as real-file copies
  (was symlinks) so the coordinator's tar.TypeReg verifier accepts them and
  hashes the actual bytes
- staple both bin/ AND .app/Contents/MacOS/ paths now that they're
  independent files
- post-codesign verification: fail build if signed CLI is missing the
  keychain-access-groups entitlement or the access group
  SLDQ2GJ6TL.io.darkbloom.provider, or if embedded.provisionprofile
  is absent from the .app
- PROVISIONING_PROFILE_BASE64 is now hard-required (no silent ephemeral
  fallback). Profile is decoded + parsed with plutil/python: verifies
  TeamIdentifier, keychain-access-groups, application-identifier, and
  ExpirationDate >= 30 days out
- pin MLX python wheel to 0.31.1 to match libs/mlx-swift Cmlx version
  (was 0.31.2 — patch-level metallib ABI risk)
- prod releases now hard-fail Swift tests (was soft-fail for all)

release-rust-bridge.yml:
- rename bridge bundle to darkbloom-bundle-<platform>.tar.gz uniformly
  so coordinator accepts the registration

Both release workflows:
- PROD_* secrets fall back to legacy unprefixed (R2_ACCESS_KEY_ID,
  RELEASE_KEY, COORDINATOR_URL) + vars.R2_BUCKET when PROD_* empty.
  Fails hard if neither resolves.

provider/src/main.rs (bridge auto-update):
- new rewrite_launchd_plist_for_swift: extracts ProgramArguments from
  the Rust plist (`serve --coordinator URL --model M`), converts to
  Swift shape (`start --foreground --coordinator-url URL --model M`),
  atomic rename
- install_swift_update_bundle_at: if Darkbloom.app/Contents/MacOS/
  exists in the extracted bundle, replace bin/{darkbloom,darkbloom-
  enclave,mlx.metallib} with symlinks into .app/MacOS and point the
  launchd plist's ProgramArguments[0] at the .app's MacOS binary path.
  This puts the embedded provisioning profile in scope at runtime, so
  the persistent SE key (PR #146) doesn't get errSecMissingEntitlement
  on first attestation post-cutover
- plist_path is now an Option<&Path> so tests can avoid touching the
  developer machine's real ~/Library/LaunchAgents
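The ProgramArguments conversion can be sketched with plistlib (Python for illustration; the real implementation is Rust in provider/src/main.rs, and flags beyond --coordinator are simply carried over):

```python
import plistlib

def rewrite_program_arguments(plist_bytes):
    """Convert a Rust-shape launchd plist
    (`serve --coordinator URL --model M`) to the Swift shape
    (`start --foreground --coordinator-url URL --model M`)."""
    d = plistlib.loads(plist_bytes)
    args = d["ProgramArguments"]
    new = [args[0], "start", "--foreground"]
    it = iter(args[1:])
    for tok in it:
        if tok == "serve":
            continue                      # subcommand is renamed
        if tok == "--coordinator":
            new += ["--coordinator-url", next(it)]
        else:
            new.append(tok)               # pass other flags through
    d["ProgramArguments"] = new
    return plistlib.dumps(d)
```

The real code additionally writes the result via an atomic rename so a crash mid-rewrite cannot leave a truncated plist.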

Tests added (all passing):
- 6 plist-rewrite unit tests: extract / convert / rewrite / install-
  with-plist / .app-aware install / hash-only install
- 1 ported coordinator attestation policy test
- existing 7 auto-update integration tests still pass (302 → 303 total)

Verified by audit:
- macos-26-xlarge has Xcode 26.2 / Swift 6.2, satisfies all
  swift-tools-version requirements
- LatestProviderVersion ordering: semver THEN created_at in both memory
  and Postgres stores
- /api/version JSON shape matches what auto_update_check_with_install_dir
  expects
- StartCommand --foreground doesn't recurse into launchAgent.installAndStart
- Swift ModelScanner reads ~/.cache/huggingface/hub (same as Rust)
- AuthTokenStore path parity (~/.darkbloom/auth_token)
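The semver-then-created_at ordering verified above can be sketched as a sort key (Python illustration; assumes plain MAJOR.MINOR.PATCH version strings):

```python
def latest_provider_version(releases):
    """Pick the latest release: highest semver wins, with
    created_at as the tie-break within a version."""
    def key(r):
        major, minor, patch = (int(x) for x in r["version"].split("."))
        return (major, minor, patch, r["created_at"])
    return max(releases, key=key)
```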

Deployment prerequisite: coordinator changes must be deployed (master
→ dev Cloud Build, then human ecloud deploy to prod) BEFORE tagging
any release. Bridge registration will 400 against an older coordinator
that doesn't know about the darkbloom-bundle- filename.

* chore: cargo fmt on plist-migration code

Post-rustfmt: long format!() args wrapped, with_context closure pulled
onto one line, ternary-style assignment broken into if/else. No
behavior change — `cargo test --bin darkbloom` still 303 pass / 0 fail.

---------

Co-authored-by: ethenotethan <42627790+ethenotethan@users.noreply.github.com>
Co-authored-by: anupsv <6407789+anupsv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: hankbob <hankbobtheresearchoor@gmail.com>
Co-authored-by: Hank Bob <hankbob@researchoors.com>