Clarify provider trust diagnostics#104
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
| .timeout(std::time::Duration::from_secs(5)) | ||
| .build()?; | ||
| let resp = client | ||
| .get(format!("{base_url}/v1/providers/attestation")) |
There was a problem hiding this comment.
🔴 Critical: This endpoint leaks all providers' attestation data to every caller
The client fetches the entire provider list from /v1/providers/attestation and filters client-side by serial_number. This means any unauthenticated caller gets every provider's serial_number, provider_id, trust_level, status, sip_enabled, etc.
Serial numbers are PII-equivalent — they uniquely identify hardware and can't be rotated.
Fix: Add a query param ?serial_number=X on the server side and only return records matching that serial. The endpoint should not expose other providers' data.
There was a problem hiding this comment.
eh this isn't really leaking since the endpoint is publicly accessible. the riskiest field is the serial number which is embedded as an OIN in the x509 cert signed by apple attesting to the device's identity and the p256 key that's generated within the enclave. chatted with @anupsv who also agreed its a bit risky to expose publicly.
|
|
||
| if already_enrolled { | ||
| println!(" ✓ Already enrolled in MDM — skipping"); | ||
| println!(" ✓ Local MDM profile present"); |
There was a problem hiding this comment.
already_enrolled branch skips re-enrollment nudge
When MDM is already present locally, the code just prints "Local MDM profile present" and moves on. But the old profile might be the legacy static one (pre-ACME). The user should be nudged to re-enroll to get the ACME device-attest-01 payload.
Suggestion: Add a one-liner like:
println!(" Consider re-enrolling with `darkbloom enroll` to update your attestation profile.");| b.is_hardware_verified() | ||
| .cmp(&a.is_hardware_verified()) | ||
| .then_with(|| b.is_online().cmp(&a.is_online())) | ||
| .then_with(|| a.provider_id.cmp(&b.provider_id)) |
There was a problem hiding this comment.
💡 Suggestion: Sorting by UUID string comparison gives no meaningful ordering
provider_id values are UUIDs, so a.provider_id.cmp(&b.provider_id) is essentially random. Consider sorting by most-recent registration time, or just taking the first hardware-verified online record without sorting the rest.
| if trimmed.is_empty() { | ||
| "<no output>".to_string() | ||
| } else { | ||
| trimmed.lines().take(12).collect::<Vec<_>>().join("\n ") |
There was a problem hiding this comment.
💡 Suggestion: take(12) truncates silently
For profiles status -type enrollment, 12 lines is probably fine, but for future commands added to --support, this could silently drop important output. Consider adding an indication like (truncated, X more lines) when output is cut.
|
|
||
| let mut issues: Vec<String> = Vec::new(); | ||
| let mut passed = 0; | ||
| let total_checks = 9; |
There was a problem hiding this comment.
💡 Suggestion: total_checks is hardcoded to 9
If someone adds/removes a check, total_checks won't auto-update. Consider deriving it from passed + issues.len() at the end, or at least adding a comment like // Update this when adding/removing checks.
| .build()?; | ||
| let resp = client | ||
| .get(format!("{base_url}/v1/providers/attestation")) | ||
| .send() |
There was a problem hiding this comment.
🔴 Critical: This endpoint leaks all providers' attestation data to every caller
The client fetches the entire provider list from /v1/providers/attestation and filters client-side by serial_number. This means any unauthenticated caller gets every provider's serial_number, provider_id, trust_level, status, sip_enabled, etc.
Serial numbers are PII-equivalent — they uniquely identify hardware and can't be rotated.
Fix: Add a query param ?serial_number=X on the server side and only return records matching that serial.
There was a problem hiding this comment.
@claude how frequent are queries being made to the coordinator server by the inference provider on /v1/providers/attestation ?
There was a problem hiding this comment.
Claude finished @ethenotethan's task in 38s —— View job
Query frequency for /v1/providers/attestation
- Understand the question
- Find all call sites in provider code
- Determine call frequency/trigger conditions
The provider CLI calls /v1/providers/attestation in exactly two places, both triggered only by manual user commands — there is no automatic/periodic polling:
| Trigger | Location | Frequency |
|---|---|---|
darkbloom status |
main.rs:4910 |
Once per invocation (on-demand) |
darkbloom doctor (check #9) |
main.rs:5571 |
Once per invocation (on-demand) |
The running serve process does not call this endpoint at all. No timer, ticker, or background loop hits it. Frequency is entirely user-driven.
Relevant to the serial number leakage debate: since both call sites are one-shot CLI commands invoked by the provider operator themselves (not automated), this is not a high-frequency background polling pattern. That said, the endpoint is also called by the console UI (from the browser):
PreSendTrustBanner.tsx:25— fetches on chat page load (once per page load)VerificationPanel.tsx:511— fetches when the user clicks "Verify" (on-demand)
Those are also infrequent, but they're in the web client, meaning any browser visiting the console triggers a full provider list dump. The serial number exposure concern from the review is still valid — but the blast radius is limited by the low call frequency and the fact that provider-side calls are self-inflicted (the provider operator is querying their own record).
There was a problem hiding this comment.
got it, So:
Only when a user manually runs status or doctor
In that case this is a good callout but is better suited as a follow up issue (not blocking merge).
There was a problem hiding this comment.
hmm still curious about what the request lifecycle looks like for the /attestation endpoint and if there could be DoS... not an issue for this PR though
| Ok(serial) => { | ||
| println!(" Serial: {serial}"); | ||
| match fetch_coordinator_provider_trust(DEFAULT_COORDINATOR_HTTP_URL, &serial).await { | ||
| Ok(records) if records.is_empty() => { |
There was a problem hiding this comment.
🔴 Critical: cmd_status hardcodes DEFAULT_COORDINATOR_HTTP_URL
If a provider is running against staging or a local coordinator, darkbloom status will query prod and show wrong trust info. The Status subcommand doesn't accept a --coordinator flag.
Fix: Add --coordinator to Status (like Doctor has), or read it from the existing config file.
There was a problem hiding this comment.
@hankbobtheresearchoor I agree with your deduction but disagree with --coordinator being the proper user entrypoint, perhaps something like dev and prod distinctions would make the most sense for user comprehensibility.
There was a problem hiding this comment.
actually second guessing myself, for spinning up the network in a local env being able to pass endpoint is critical. probably makes sense to:
- default to prod coordinator if no coordinator URL is provided to binary env
| #[serde(default)] | ||
| trust_level: String, | ||
| #[serde(default)] | ||
| status: String, |
There was a problem hiding this comment.
serde(default) on bool fields gives silent wrong defaults
#[serde(default)] defaults bools to false. If the coordinator response is missing a field, the provider silently interprets "missing" as "not verified" — safe direction, but masks API breakage.
Consider: Option<bool> so a deserialization mismatch surfaces as None rather than a silent false.
There was a problem hiding this comment.
this is more or less dynamic resolution and doesn't ensure adequate compatibility if this data type were to get updated by the coordinator in the future.
| if already_enrolled { | ||
| println!(" ✓ Already enrolled in MDM — skipping"); | ||
| println!(" ✓ Local MDM profile present"); | ||
| println!(" Coordinator hardware trust will be verified after provider registration."); |
There was a problem hiding this comment.
already_enrolled branch skips re-enrollment nudge
When MDM is already present locally, the code just prints "Local MDM profile present". But the old profile might be the legacy static one (pre-ACME). The user should be nudged to re-enroll to get the ACME device-attest-01 payload.
println!(" Consider re-enrolling with `darkbloom enroll` to update your attestation profile.");| .cmp(&a.is_hardware_verified()) | ||
| .then_with(|| b.is_online().cmp(&a.is_online())) | ||
| .then_with(|| a.provider_id.cmp(&b.provider_id)) | ||
| } |
There was a problem hiding this comment.
💡 Suggestion: Sorting by UUID string comparison gives no meaningful ordering
provider_id values are UUIDs, so a.provider_id.cmp(&b.provider_id) is essentially random. Consider sorting by most-recent registration time, or just taking the first hardware-verified online record.
|
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cdd03d6fa2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if record.is_hardware_verified() { | ||
| println!( | ||
| "✓ hardware ({}, provider {})", | ||
| record.status, | ||
| record.short_provider_id() | ||
| ); | ||
| passed += 1; |
There was a problem hiding this comment.
Require online provider before passing trust check
cmd_doctor marks check 9 as passed whenever the first serial-matched record has trust_level=hardware, even if that record is offline. Because records are sorted to prefer hardware first, a stale offline hardware record can hide an active online self-signed provider for the same serial, producing a false “passed” trust result and misleading routing diagnostics.
Useful? React with 👍 / 👎.
| None => { | ||
| println!("✗ Could not read local serial number"); | ||
| issues | ||
| .push("Could not read Mac serial number for coordinator trust lookup".to_string()); |
There was a problem hiding this comment.
Skip trust failure on non-macOS serial lookup
The new coordinator-trust check unconditionally fails when get_serial_number() is unavailable, but non-macOS paths in earlier checks are explicitly treated as “not applicable.” On Linux/CI environments this now always adds an issue (Could not read Mac serial number...), regressing doctor output from a conditional skip to a hard failure.
Useful? React with 👍 / 👎.
Summary
Verification