Add proper readiness checks to health interceptor#121

Merged
smoreinis merged 2 commits into main from perf/health-check-readiness-probes
Dec 29, 2025

@smoreinis
Collaborator

Summary

  • Separates liveness (/healthz) from readiness (/readyz, /healthcheck) probes
  • Liveness probe returns 200 immediately (sub-millisecond) without checking dependencies
  • Readiness probes check PostgreSQL, Redis, and MongoDB connectivity
  • Returns 503 with detailed status when dependencies are unhealthy or not initialized
  • Adds per-check timeout (2s) and overall timeout (5s) to prevent hung health checks

Why this matters

Previously, all health check paths returned 200 regardless of dependency health. This meant Kubernetes would continue routing traffic to pods that couldn't actually serve requests. Now:

  • /healthz (liveness): Fast response for Kubernetes to detect stuck processes
  • /readyz and /healthcheck (readiness): Actual dependency validation before accepting traffic
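
As a rough illustration of the split described above, the path routing could look something like the sketch below. The names `ProbeKind` and `classify_probe` are hypothetical and are not taken from this PR; only the three paths come from the description.

```python
from enum import Enum
from typing import Optional


class ProbeKind(Enum):
    LIVENESS = "liveness"
    READINESS = "readiness"


# Paths taken from the PR description; the constant names are assumptions.
LIVENESS_PATHS = frozenset({"/healthz"})
READINESS_PATHS = frozenset({"/readyz", "/healthcheck"})


def classify_probe(path: str) -> Optional[ProbeKind]:
    """Return the probe kind for a request path, or None for ordinary traffic."""
    if path in LIVENESS_PATHS:
        return ProbeKind.LIVENESS  # answer 200 right away, no dependency checks
    if path in READINESS_PATHS:
        return ProbeKind.READINESS  # run dependency checks before answering
    return None
```

Keeping the liveness branch free of I/O is what lets it stay fast even when a dependency is hanging.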

Response format (readiness)

{
  "status": "ok",  // or "degraded"
  "checks": {
    "postgres": {"healthy": true},
    "redis": {"healthy": true},
    "mongodb": {"healthy": true}
  }
}
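
A minimal sketch of how a body in this shape might be assembled from per-dependency results. The helper name `build_readiness_response` is hypothetical, and pairing "degraded" with 503 (and treating an empty result set as not ready) is an assumption based on the summary, not a copy of the merged code.

```python
import json
from typing import Dict, Tuple


def build_readiness_response(checks: Dict[str, bool]) -> Tuple[int, str]:
    """Map per-dependency health flags to an (HTTP status, JSON body) pair."""
    # Assumption: no results at all means dependencies are not initialized,
    # so that case is reported as not ready rather than vacuously healthy.
    all_healthy = bool(checks) and all(checks.values())
    body = {
        "status": "ok" if all_healthy else "degraded",
        "checks": {name: {"healthy": ok} for name, ok in checks.items()},
    }
    return (200 if all_healthy else 503), json.dumps(body)
```

For example, `build_readiness_response({"postgres": True, "redis": False})` yields a 503 whose body has `"status": "degraded"`.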

Test plan

  • Unit tests verify liveness returns 200 without checking deps
  • Unit tests verify readiness returns 503 when deps unavailable
  • Unit tests verify readiness returns 200 when all deps healthy
  • All 188 unit tests pass

- Separate liveness (/healthz) from readiness (/readyz, /healthcheck)
- Liveness probe returns 200 immediately (sub-millisecond, no deps)
- Readiness probes check PostgreSQL, Redis, and MongoDB connectivity
- Add per-check timeout (2s) and overall timeout (5s)
- Return 503 when dependencies are unhealthy or not initialized
- Run all dependency checks concurrently with asyncio.gather()
- Update tests to mock GlobalDependencies and verify behavior
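
The concurrency and timeout behavior listed above could be sketched roughly as follows. This is an illustration under assumptions, not the merged implementation: `run_check`, `run_all_checks`, and the constant names are hypothetical, and `asyncio.wait_for` is used here in place of `asyncio.timeout` so the sketch also runs on Python versions before 3.11.

```python
import asyncio
from typing import Awaitable, Callable, Dict

PER_CHECK_TIMEOUT = 2.0  # seconds, per the PR description
OVERALL_TIMEOUT = 5.0    # seconds, per the PR description

CheckFn = Callable[[], Awaitable[None]]


async def run_check(check: CheckFn, timeout: float) -> bool:
    """Run one dependency check; any exception or timeout counts as unhealthy."""
    try:
        await asyncio.wait_for(check(), timeout=timeout)
        return True
    except Exception:
        return False


async def run_all_checks(
    checks: Dict[str, CheckFn],
    per_check_timeout: float = PER_CHECK_TIMEOUT,
    overall_timeout: float = OVERALL_TIMEOUT,
) -> Dict[str, bool]:
    """Run all checks concurrently, bounded by an overall deadline."""
    names = list(checks)
    try:
        results = await asyncio.wait_for(
            asyncio.gather(*(run_check(checks[n], per_check_timeout) for n in names)),
            timeout=overall_timeout,
        )
    except asyncio.TimeoutError:
        # Overall deadline hit: report every dependency as unhealthy.
        return {name: False for name in names}
    return dict(zip(names, results))
```

Because each check swallows its own failure and returns a flag, one slow or broken dependency does not prevent the others from reporting.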
@smoreinis smoreinis requested a review from a team as a code owner December 24, 2025 22:29
Contributor

@RoxyFarhad RoxyFarhad left a comment

🥳 wooo thanks for catching this!!


async with asyncio.timeout(DEPENDENCY_CHECK_TIMEOUT):
    async with engine.connect() as conn:
        await conn.execute(text("SELECT 1"))
Contributor

random q: no pings for PostgreSQL?

Collaborator Author

not sure what you mean by "pings" here? connecting to the engine feels similar (making sure the instance is reachable) and trying to run this basic query helps us confirm that the instance is actually handling requests as expected (as opposed to, say, being unresponsive)

Contributor

sorry, I meant is there no equivalent of a DB PING (like we do for mongo or redis) for postgresql? this is obviously good too

@smoreinis smoreinis merged commit 8bd501a into main Dec 29, 2025
6 checks passed
@smoreinis smoreinis deleted the perf/health-check-readiness-probes branch December 29, 2025 21:08