Skip to content

Clarify provider trust diagnostics#104

Open
Gajesh2007 wants to merge 2 commits into
masterfrom
codex/provider-trust-diagnostics-pr
Open

Clarify provider trust diagnostics#104
Gajesh2007 wants to merge 2 commits into
masterfrom
codex/provider-trust-diagnostics-pr

Conversation

@Gajesh2007
Copy link
Copy Markdown
Member

Summary

  • move the local-only provider trust diagnostics changes onto a clean branch from current master
  • improve provider trust/diagnostics behavior in provider/src/main.rs

Verification

  • cd provider && cargo fmt --check
  • pre-push hook passed

@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 30, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
d-inference Ready Ready Preview May 1, 2026 3:17pm
d-inference-console-ui-dev Ready Ready Preview May 1, 2026 3:17pm
d-inference-landing Ready Ready Preview May 1, 2026 3:17pm

Request Review

hankbobtheresearchoor

This comment was marked as low quality.

Comment thread provider/src/main.rs
.timeout(std::time::Duration::from_secs(5))
.build()?;
let resp = client
.get(format!("{base_url}/v1/providers/attestation"))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Critical: This endpoint leaks all providers' attestation data to every caller

The client fetches the entire provider list from /v1/providers/attestation and filters client-side by serial_number. This means any unauthenticated caller gets every provider's serial_number, provider_id, trust_level, status, sip_enabled, etc.

Serial numbers are PII-equivalent — they uniquely identify hardware and can't be rotated.

Fix: Add a query param ?serial_number=X on the server side and only return records matching that serial. The endpoint should not expose other providers' data.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eh this isn't really leaking since the endpoint is publicly accessible. the riskiest field is the serial number which is embedded as an OIN in the x509 cert signed by apple attesting to the device's identity and the p256 key that's generated within the enclave. chatted with @anupsv who also agreed its a bit risky to expose publicly.

Comment thread provider/src/main.rs

if already_enrolled {
println!(" ✓ Already enrolled in MDM — skipping");
println!(" ✓ Local MDM profile present");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Warning: already_enrolled branch skips re-enrollment nudge

When MDM is already present locally, the code just prints "Local MDM profile present" and moves on. But the old profile might be the legacy static one (pre-ACME). The user should be nudged to re-enroll to get the ACME device-attest-01 payload.

Suggestion: Add a one-liner like:

println!("  Consider re-enrolling with `darkbloom enroll` to update your attestation profile.");

Comment thread provider/src/main.rs
b.is_hardware_verified()
.cmp(&a.is_hardware_verified())
.then_with(|| b.is_online().cmp(&a.is_online()))
.then_with(|| a.provider_id.cmp(&b.provider_id))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: Sorting by UUID string comparison gives no meaningful ordering

provider_id values are UUIDs, so a.provider_id.cmp(&b.provider_id) is essentially random. Consider sorting by most-recent registration time, or just taking the first hardware-verified online record without sorting the rest.

Comment thread provider/src/main.rs
if trimmed.is_empty() {
"<no output>".to_string()
} else {
trimmed.lines().take(12).collect::<Vec<_>>().join("\n ")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: take(12) truncates silently

For profiles status -type enrollment, 12 lines is probably fine, but for future commands added to --support, this could silently drop important output. Consider adding an indication like (truncated, X more lines) when output is cut.

Comment thread provider/src/main.rs

let mut issues: Vec<String> = Vec::new();
let mut passed = 0;
let total_checks = 9;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: total_checks is hardcoded to 9

If someone adds/removes a check, total_checks won't auto-update. Consider deriving it from passed + issues.len() at the end, or at least adding a comment like // Update this when adding/removing checks.

Comment thread provider/src/main.rs
.build()?;
let resp = client
.get(format!("{base_url}/v1/providers/attestation"))
.send()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Critical: This endpoint leaks all providers' attestation data to every caller

The client fetches the entire provider list from /v1/providers/attestation and filters client-side by serial_number. This means any unauthenticated caller gets every provider's serial_number, provider_id, trust_level, status, sip_enabled, etc.

Serial numbers are PII-equivalent — they uniquely identify hardware and can't be rotated.

Fix: Add a query param ?serial_number=X on the server side and only return records matching that serial.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@claude how frequent are queries being made to the coordinator server by the inference provider on /v1/providers/attestation ?

Copy link
Copy Markdown

@claude claude Bot Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude finished @ethenotethan's task in 38s —— View job


Query frequency for /v1/providers/attestation

  • Understand the question
  • Find all call sites in provider code
  • Determine call frequency/trigger conditions

The provider CLI calls /v1/providers/attestation in exactly two places, both triggered only by manual user commands — there is no automatic/periodic polling:

Trigger Location Frequency
darkbloom status main.rs:4910 Once per invocation (on-demand)
darkbloom doctor (check #9) main.rs:5571 Once per invocation (on-demand)

The running serve process does not call this endpoint at all. No timer, ticker, or background loop hits it. Frequency is entirely user-driven.

Relevant to the serial number leakage debate: since both call sites are one-shot CLI commands invoked by the provider operator themselves (not automated), this is not a high-frequency background polling pattern. That said, the endpoint is also called by the console UI (from the browser):

  • PreSendTrustBanner.tsx:25 — fetches on chat page load (once per page load)
  • VerificationPanel.tsx:511 — fetches when the user clicks "Verify" (on-demand)

Those are also infrequent, but they're in the web client, meaning any browser visiting the console triggers a full provider list dump. The serial number exposure concern from the review is still valid — but the blast radius is limited by the low call frequency and the fact that provider-side calls are self-inflicted (the provider operator is querying their own record).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, So:

Only when a user manually runs status or doctor

In that case this is a good callout but is better suited as a follow up issue (not blocking merge).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm still curious about what the request lifecycle looks like for the /attestation endpoint and if there could be DoS... not an issue for this PR though

Comment thread provider/src/main.rs
Ok(serial) => {
println!(" Serial: {serial}");
match fetch_coordinator_provider_trust(DEFAULT_COORDINATOR_HTTP_URL, &serial).await {
Ok(records) if records.is_empty() => {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Critical: cmd_status hardcodes DEFAULT_COORDINATOR_HTTP_URL

If a provider is running against staging or a local coordinator, darkbloom status will query prod and show wrong trust info. The Status subcommand doesn't accept a --coordinator flag.

Fix: Add --coordinator to Status (like Doctor has), or read it from the existing config file.

Copy link
Copy Markdown
Contributor

@ethenotethan ethenotethan Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hankbobtheresearchoor I agree with your deduction but disagree with --coordinator being the proper user entrypoint, perhaps something like dev and prod distinctions would make the most sense for user comprehensibility.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually second guessing myself, for spinning up the network in a local env being able to pass endpoint is critical. probably makes sense to:

  1. default to prod coordinator if no coordinator URL is provided to binary env

Comment thread provider/src/main.rs
#[serde(default)]
trust_level: String,
#[serde(default)]
status: String,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Warning: serde(default) on bool fields gives silent wrong defaults

#[serde(default)] defaults bools to false. If the coordinator response is missing a field, the provider silently interprets "missing" as "not verified" — safe direction, but masks API breakage.

Consider: Option<bool> so a deserialization mismatch surfaces as None rather than a silent false.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is more or less dynamic resolution and doesn't ensure adequate compatibility if this data type were to get updated by the coordinator in the future.

Comment thread provider/src/main.rs
if already_enrolled {
println!(" ✓ Already enrolled in MDM — skipping");
println!(" ✓ Local MDM profile present");
println!(" Coordinator hardware trust will be verified after provider registration.");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Warning: already_enrolled branch skips re-enrollment nudge

When MDM is already present locally, the code just prints "Local MDM profile present". But the old profile might be the legacy static one (pre-ACME). The user should be nudged to re-enroll to get the ACME device-attest-01 payload.

println!("  Consider re-enrolling with `darkbloom enroll` to update your attestation profile.");

Comment thread provider/src/main.rs
.cmp(&a.is_hardware_verified())
.then_with(|| b.is_online().cmp(&a.is_online()))
.then_with(|| a.provider_id.cmp(&b.provider_id))
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: Sorting by UUID string comparison gives no meaningful ordering

provider_id values are UUIDs, so a.provider_id.cmp(&b.provider_id) is essentially random. Consider sorting by most-recent registration time, or just taking the first hardware-verified online record.

@hankbobtheresearchoor
Copy link
Copy Markdown
Contributor

⚠️ Warning (out-of-diff): build.rs still emits DARKBLOOM_ENROLL_PROFILE_URL but it's no longer consumed

provider/build.rs:39 still does println!("cargo:rustc-env=DARKBLOOM_ENROLL_PROFILE_URL={enroll_url}") but main.rs no longer has the option_env!("DARKBLOOM_ENROLL_PROFILE_URL") that reads it. This is dead code in the build script — should be cleaned up or it'll confuse future maintainers.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cdd03d6fa2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread provider/src/main.rs
Comment on lines +5582 to +5588
if record.is_hardware_verified() {
println!(
"✓ hardware ({}, provider {})",
record.status,
record.short_provider_id()
);
passed += 1;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Require online provider before passing trust check

cmd_doctor marks check 9 as passed whenever the first serial-matched record has trust_level=hardware, even if that record is offline. Because records are sorted to prefer hardware first, a stale offline hardware record can hide an active online self-signed provider for the same serial, producing a false “passed” trust result and misleading routing diagnostics.

Useful? React with 👍 / 👎.

Comment thread provider/src/main.rs
Comment on lines +5617 to +5620
None => {
println!("✗ Could not read local serial number");
issues
.push("Could not read Mac serial number for coordinator trust lookup".to_string());
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Skip trust failure on non-macOS serial lookup

The new coordinator-trust check unconditionally fails when get_serial_number() is unavailable, but non-macOS paths in earlier checks are explicitly treated as “not applicable.” On Linux/CI environments this now always adds an issue (Could not read Mac serial number...), regressing doctor output from a conditional skip to a hard failure.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants