fix: Change attempt_number to 1-based indexing #740
Merged
Conversation
First attempt now starts at 1 (retries: 2, 3, 4...) to align with the OpenAPI spec which already documents 1-based indexing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
alexbouchardd
approved these changes
Mar 10, 2026
alexluong
added a commit
that referenced
this pull request
Mar 13, 2026
PR #740 changed attempt_number from 0-based to 1-based. Update all metrics query logic, test data, seed scripts, and bench seeds to match. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
alexluong
added a commit
that referenced
this pull request
Mar 13, 2026
* feat: add Metrics interface to logstore driver
  Split LogStore into Records + Metrics sub-interfaces. LogStore is now the combined interface so all existing consumers are unaffected. Typed responses per endpoint (EventMetricsResponse, AttemptMetricsResponse) with all fields as pointers. Stub implementations for CH, PG, and mem drivers return errNotImplemented.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* test: add metrics conformance tests with 300-row dataset
  Add a comprehensive metrics test dataset spanning January 2000 with 300 tenant-1 events (50 sparse across 5 days + 250 dense bell-curve on Jan 15) and 5 tenant-2 events for isolation. All dimension cycling produces round numbers (100/topic, 150/dest, 180 success, 120 failed, error_rate=0.4, etc). Covers all granularities (1m, 1h, 1d, 1w, 1M), dimensions, filters, and measures for both event and attempt metrics.
* feat: implement Postgres logstore metrics queries
  Dynamic SQL builder for event and attempt metrics with parameterized queries, time bucketing (date_bin/date_trunc), dimension grouping, conditional aggregates (FILTER WHERE), 30s query timeout fallback, and row limit enforcement with a truncation flag.
* feat: implement ClickHouse logstore metrics queries
  Naive implementation using uniqExact/uniqExactIf for dedup-safe aggregation over ReplacingMergeTree without FINAL. Includes 30s query timeout fallback and 100k row limit with truncation detection.
* refactor: remove unimplemented check from metrics conformance tests
  All three drivers (mem, pg, ch) now implement the metrics interface.
* feat: add metrics API endpoints for events and attempts
  Wire up QueryEventMetrics and QueryAttemptMetrics as GET /api/v1/metrics/events and GET /api/v1/metrics/attempts with query param parsing, allowlist validation, JWT tenant scoping, and response transformation.
* style: gofmt metrics handlers
* refactor: remove duplicate derefIntFromIntPtr helper
* fix: add missing tenant_id and attempt_number dimensions to memlogstore, add metrics OpenAPI spec
  memlogstore QueryEventMetrics was missing the tenant_id dimension, causing empty results when grouping by tenant. QueryAttemptMetrics was missing both tenant_id and attempt_number dimensions. Also adds MetricsResponse schemas and /metrics/events, /metrics/attempts paths to the OpenAPI spec.
* test: add metrics time-series characteristics conformance suite
  Split metrics drivertest into DataCorrectness (existing value assertions) and Characteristics (structural contract tests for dense bucket filling, ordering, alignment, deterministic count, zero measures, no-data ranges, and dimension × time filling). Shared dataset setup, single provisioning. Characteristics tests will fail until bucket filling is implemented.
* chore: bump Go from 1.24 to 1.26
  Enables new(expr) syntax for cleaner pointer initialization.
* feat: add post-query bucket filling for metrics time series
  Sparse time-series responses caused dashboard charts to render fat bars instead of slim bars across all time slots. This adds a shared bucket filling layer (internal/logstore/bucket/) called by all 3 backends after query, producing dense responses with zero-filled gaps.
  - Extract TruncateTime into shared bucket package
  - FillEventBuckets / FillAttemptBuckets with dimension-aware filling
  - Update drivertest assertions for dense bucket counts
  - All characteristics tests now pass
* fix: wire missing attempt metric filters (code, manual, attempt_number)
  The API allowlist accepted these filters but all three query builders (pg, ch, mem) silently ignored them, causing filters like attempt_number=0 to return unfiltered results. Add WHERE clauses in all drivers and conformance tests to prevent regression.
* refactor: rename date_range to time in metrics API query params and DateRange to TimeRange in Go code
* fix: remove eligible_for_retry from event metrics dimensions and filters
* refactor: adopt bracket notation array params for metrics API
  Merge main (PR #732) and update metrics handlers to use ParseArrayQueryParam and resolveTenantIDsFilter. Update OpenAPI spec filters to a oneOf string/array schema with bracket notation. Update test query strings to indexed bracket format.
* feat: add rate throughput measures to metrics API
  Add per-second throughput rate measures computed as count / bucket_duration_seconds. The events endpoint gets `rate`; attempts gets `rate`, `successful_rate`, `failed_rate`. Rate computation lives in shared driver/rate.go, called by each driver after bucket filling. Dependency measures are auto-enriched (e.g. requesting `rate` without `count` internally adds `count` for SQL but omits it from the API response).
* test: add rate measure coverage to drivertest
* feat: add ClickHouse SETTINGS constraints to metrics queries
  Adds server-side resource limits (max_execution_time, max_rows_to_group_by) to CH metrics queries and surfaces overflow errors as 400 instead of 500.
* fix: validate time range in metrics queries to prevent div-by-zero in rate computation
* fix: add missing tenant_id and attempt_number dimensions to CH and PG metrics queries
* refactor: remove dead MetricsMetadata.Granularity field
  No driver ever sets this field; the handler already has the request granularity. Removes the dead if-branch in buildAPIMetricsResponse.
* refactor: move tenant scoping from TenantID field to filters[tenant_id]
  Replace MetricsRequest.TenantID with filters[tenant_id] across all 3 drivers so tenant scoping uses the same filter mechanism as other dimensions. The handler now validates/injects filters[tenant_id] for JWT callers instead of using resolveTenantIDsFilter.
* fix: add defensive guards and validation to metrics time bucketing
  - Add forward-progress guard in GenerateTimeBuckets to prevent infinite loops if AdvanceTime doesn't advance
  - Cap bucket count at 100k to prevent OOM; returns ErrTooManyBuckets wrapped as ErrResourceLimit (→ 400 "query too broad")
  - Validate granularity value ranges per spec (s:1-60, m:1-60, h:1-24, d:1-31, w:1-4, M:1-12)
  - Document Granularity alignment behavior for calendar units (d/w/M)
  - Update GenerateTimeBuckets and Fill*Buckets signatures to return error
  - Update all driver callers (ch, pg, mem) to propagate bucket errors
  - Add drivertest characteristics for too-many-buckets → ErrResourceLimit
  - Add handler tests for query-too-broad and granularity-out-of-range
* test: add failing test for 2d granularity data loss
  2d granularity silently drops events on non-slot days. SQL groups by individual day but bucket fill generates every-2-day slots, so data on days that don't align with a slot boundary is orphaned. Expected 300, actual 280 (Jan 22 and Jan 28 events lost).
* fix: use epoch-anchored alignment for multi-value d/w/M granularities
  TruncateTime, CH SQL, and PG SQL all ignored g.Value for d/w/M units, always truncating to the natural period (day/week/month). This caused FillEventBuckets to silently drop data on non-slot days when Value > 1 (e.g. 2d granularity lost ~7% of events).
  Fix: epoch-anchored N-interval alignment for all three backends:
  - d: anchor to 1970-01-01, align to N-day boundaries
  - w: anchor to 1970-01-04 (Sunday), align to N*7-day boundaries
  - M: align month number to nearest multiple of N from Jan 1970
  - Value=1 retains existing behavior (no functional change)
* feat(portal): add metrics dashboard to destination overview
  6-chart metrics dashboard: delivery volume (stacked bar), error rate (line, 0-100%), retries (multi-line), avg attempt number, status code breakdown, and topic breakdown. Shared timeframe selector (1h/24h/7d/30d); all charts use the /metrics/attempts endpoint. Includes dataviz CSS vars for info (blue) and warning (orange) themes.
* fix(portal): replace ResponsiveContainer to fix -1px dimension warnings
* feat(portal): redesign metrics dashboard layout and add event count chart
  Restructure the metrics grid to 3 rows: events/deliveries, error breakdown (3-col), and retry pressure. Add a new "Events / count" chart using the attempt_number=0 filter. Support a title/subtitle pattern with muted subtitles and add a filters param to the useMetrics hook.
* feat(portal): add events 24h sparkline column to destinations list
  Add a sparkline + event count to each destination row using 4h granularity with the attempt_number=0 filter. Includes a Sparkline component with stacked success/failed bars, empty-bar rendering, and a granularity override for the useMetrics hook.
* fix(portal): clean up import order and simplify useMetrics URL building
  Move the Loading import to the top of MetricsChart.tsx. Use arrays/objects directly in useMetrics instead of serializing to strings and re-splitting.
* fix: Fix metrics table histogram display
* chore: Destination page metrics improvements
* chore: Update time range picker styling
* fix: Fix metric breakdown height and dimension ellipsis
* chore: add metrics seed & QA scripts
  - seed_metrics.sh: generates realistic event→attempt chains with configurable error rates, retry chains, and time distribution
  - qa_metrics.sh: 11 named scenarios (healthy, failing, spike, empty, single, all-fail, all-success, recent, many-topics, many-codes, retry-heavy) with verification checklists
  - README documenting usage and scenarios
* fix(portal): exclude manual retries from event count metrics
  Add a manual=false filter alongside attempt_number=0 to prevent manual retries (which also start at attempt_number=0) from inflating event counts in the destination metrics chart and destinations list sparkline.
* fix: exclude manual retries from first_attempt_count measure
  Manual retries start a new chain at attempt_number=0, inflating first_attempt_count. Add AND NOT manual to CH, PG, and memlogstore queries. Add a FIXME for the test dataset, which assigns manual and attempt_number independently.
* bench: add metrics benchmarking suite for ClickHouse and PostgreSQL
* bench: update metrics bench for new API and add coverage for rate, granularity, filters
  Migrate the bench suite to the current metrics API (TimeRange, tenant via filters) and add bench cases for rate measures, multi-value granularities (2d/w/M), new dimensions (code, attempt_number), and new filters (code, manual, attempt_number, multi-filter).
* fix: align metrics with 1-based attempt_number indexing
  PR #740 changed attempt_number from 0-based to 1-based. Update all metrics query logic, test data, seed scripts, and bench seeds to match.

---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Alexandre Bouchard <alexbouchardd@gmail.com>
alexluong
added a commit
that referenced
this pull request
Mar 13, 2026
Missed in #740 — portal still filtered for attempt_number=0. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
alexluong
added a commit
that referenced
this pull request
Mar 15, 2026
* feat: batch destination metrics requests in portal list view
  Collapse N per-row metrics API calls into a single batched request using dimensions[]=destination_id grouping.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: update attempt_number filter to 1-based indexing in portal
  Missed in #740 — portal still filtered for attempt_number=0.

---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Summary
attempt_number was 0-based (first attempt = 0, retries = 1, 2, 3...). This changes it to 1-based (first attempt = 1, retries = 2, 3, 4...) to align with the OpenAPI spec, which already documents 1-based indexing.

Side effects
No data migration is included. This means:
- Existing attempts keep their 0-based attempt_number values. New attempts will be 1-based. Users querying across old and new data will see mixed values. We are ok with this.
- In-flight attempts at attempt=0 that hit retry after deployment will pass through a max(attempt-1, 0) guard to prevent panics/infinite loops in the backoff calculation. 0 <= retryMaxLimit passes the retry check. Only affects tasks in-flight during deployment.

Optional: data migration for users who care about consistency
Users who want clean 1-based data across the board can run the following after upgrading. We encourage taking a timestamp of when the upgrade happens and scoping the update to records created before that point:
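One possible shape for that update, as a hedged sketch: it assumes a Postgres log store, a table named delivery_attempts, and a created_at column, all of which are illustrative names rather than the project's actual schema. Only attempt_number comes from the discussion above.

```sql
-- Illustrative sketch, not the project's shipped migration.
-- delivery_attempts and created_at are assumed names; adjust to your schema.
-- Replace the timestamp with the moment you deployed the upgrade.
UPDATE delivery_attempts
SET attempt_number = attempt_number + 1
WHERE created_at < '2026-03-10T00:00:00Z';
```

Scoping the WHERE clause to the upgrade timestamp avoids double-incrementing rows that were already written 1-based after the deploy.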
This does not cover in-flight tasks at the time of upgrade. Users who care about full data correctness can determine the appropriate handling for their setup.