@drernie drernie commented Dec 30, 2025

Summary

Enhances EventBridge documentation with comprehensive troubleshooting guidance based on real customer issues encountered when setting up EventBridge routing for S3 events.

Problem

Customers have encountered several non-obvious failure modes when setting up EventBridge routing:

  1. Disabled EventBridge rules - Rules exist but are disabled (silent failure)
  2. Missing PackagerQueue subscriptions - Files index correctly, but packages don't appear in UI
  3. Missing Input Transformer - Events sent in CloudTrail format instead of S3 notification format
  4. General difficulty diagnosing event flow issues

These issues are not well-documented and lead to extended troubleshooting sessions.
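
The disabled-rule case can be confirmed from the command line. The sketch below assumes the rule is named `quilt-s3-events-rule` and lives in `us-east-1` (both values appear in commands quoted elsewhere in this conversation; substitute your own), and that the AWS CLI is installed with credentials configured. The function name `check_rule_state` is illustrative, not part of any Quilt tooling.

```shell
#!/usr/bin/env bash
# Assumed defaults; override via the environment for your deployment.
RULE_NAME="${RULE_NAME:-quilt-s3-events-rule}"
REGION="${REGION:-us-east-1}"

# Print the rule state; succeed only if the state is exactly ENABLED.
# Returns 2 if the AWS CLI call itself fails (missing rule, no credentials).
check_rule_state() {
  local state
  state="$(aws events describe-rule \
    --name "$RULE_NAME" \
    --region "$REGION" \
    --query 'State' \
    --output text)" || return 2
  echo "Rule $RULE_NAME is $state"
  [ "$state" = "ENABLED" ]
}
```

If the function reports `DISABLED`, `aws events enable-rule --name "$RULE_NAME" --region "$REGION"` should turn the rule back on.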

Changes

Documentation Updates (docs/EventBridge.md)

  • Automated validation script reference - Point to diagnostic tooling (via support)
  • Quick diagnostic checklist - Copy-paste CLI commands for rapid diagnosis
  • Troubleshooting flowchart - Step-by-step decision tree for systematic debugging
  • 6 detailed failure mode sections:
    1. Disabled EventBridge rule (with enable commands)
    2. Files index but packages don't appear (PackagerQueue issue)
    3. Events in wrong format (Input Transformer missing)
    4. General event flow issues
    5. Permission errors
    6. Duplicate events

Key Improvements

  1. Emphasize rule state checking - Many issues stem from disabled rules
  2. Document PackagerQueue requirement - Previously undocumented, commonly missing
  3. Clarify Input Transformer is CRITICAL - Not optional, required for proper format
  4. Provide actionable commands - All diagnostics include exact CLI commands
  5. Self-service validation - Customers can diagnose without support escalation
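
To make the format distinction concrete: an event that EventBridge forwards unchanged carries top-level `detail-type` and `detail` fields, with the bucket and key under `detail.requestParameters`, whereas the S3 notification format wraps everything in a top-level `Records` array. The sketch below is a rough string heuristic for telling the two apart in a message body read from stdin. It is not a JSON parser, and the function name `classify_event` is hypothetical.

```shell
#!/usr/bin/env bash
# Classify a message body read from stdin by looking for marker field names.
# Heuristic only: an object key that happens to contain these strings
# would confuse it.
classify_event() {
  local body
  body="$(cat)"
  case "$body" in
    *detail-type*) echo "cloudtrail-via-eventbridge" ;;  # untransformed event
    *Records*)     echo "s3-notification" ;;             # expected format
    *)             echo "unknown" ;;
  esac
}
```

One way to use it is to pipe in a body fetched with `aws sqs receive-message --queue-url "$QUEUE_URL" --region us-east-1 --query 'Messages[0].Body' --output text`, where `QUEUE_URL` is a placeholder for one of your queues. Received messages stay hidden from other consumers for the visibility timeout, so a test queue is the safer target.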

Impact

  • Reduces support load - Customers can self-diagnose common issues
  • Faster resolution - Clear troubleshooting paths eliminate trial-and-error
  • Better onboarding - New customers can validate setup immediately
  • Prevents common mistakes - Documents known failure modes proactively

Related

  • Validation script PR: https://github.com/quiltdata/scripts/pull/8
  • Customer incident: FL109 (Flagship Pioneering) - Packages not appearing in UI after EventBridge setup
  • Root causes: Missing PackagerQueue subscription + wrong event format
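
The missing-subscription root cause can be checked by listing the SQS endpoints subscribed to the SNS topic and looking for the PackagerQueue ARN. This is a minimal sketch, assuming you already know both ARNs (queue names in a real stack are generated, so look them up first). The function name `queue_is_subscribed` and the default region are assumptions.

```shell
#!/usr/bin/env bash
# Succeed if the SQS queue ARN ($2) is subscribed to the SNS topic ARN ($1).
# Optional $3 is the region (assumed default: us-east-1).
# Returns 2 if the AWS CLI call itself fails.
queue_is_subscribed() {
  local topic_arn="$1" queue_arn="$2" region="${3:-us-east-1}"
  local endpoints endpoint
  endpoints="$(aws sns list-subscriptions-by-topic \
    --topic-arn "$topic_arn" \
    --region "$region" \
    --query 'Subscriptions[?Protocol==`sqs`].Endpoint' \
    --output text)" || return 2
  # Text output is whitespace-separated; compare each endpoint in turn.
  for endpoint in $endpoints; do
    [ "$endpoint" = "$queue_arn" ] && return 0
  done
  return 1
}
```

A non-zero result for the PackagerQueue ARN matches the symptom described above: files index, but packages do not appear.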

🤖 Generated with Claude Code

Greptile Summary

This PR significantly enhances the EventBridge documentation by adding comprehensive troubleshooting guidance based on real customer issues. The additions include an automated validation script reference, a quick diagnostic checklist with CLI commands, a troubleshooting flowchart, and six detailed failure mode sections covering common issues like disabled EventBridge rules, missing PackagerQueue subscriptions, and incorrect Input Transformer configurations.

Key Improvements:

  • Actionable diagnostics - Every troubleshooting scenario includes specific AWS CLI commands for diagnosis and resolution
  • Structured troubleshooting flow - Flowchart provides systematic approach to debugging event flow issues
  • Critical insights documented - Highlights previously undocumented requirements like PackagerQueue subscription and Input Transformer necessity
  • Self-service enablement - Customers can validate setup and diagnose issues without support escalation

Documentation Quality:

  • Clear symptom/cause/solution structure for each issue
  • Code examples properly marked with <!-- pytest-codeblocks:skip --> to prevent test execution
  • Proper markdown formatting and consistent structure
  • References existing setup steps (e.g., "Step 6 above") for context

Confidence Score: 5/5

  • This PR is safe to merge with no risk - it only adds documentation content
  • Score reflects that this is a documentation-only change with no code modifications. The troubleshooting guidance is well-structured, based on real customer issues, and provides actionable solutions. All code examples are properly formatted and marked to skip pytest execution. The content is technically accurate and aligns with AWS best practices
  • No files require special attention

Important Files Changed

Filename | Overview
docs/EventBridge.md | Added comprehensive troubleshooting guide with diagnostic commands, flowchart, and 6 detailed failure scenarios

Sequence Diagram

sequenceDiagram
    participant User
    participant S3 as S3 Bucket
    participant CT as CloudTrail
    participant EB as EventBridge Rule
    participant IT as Input Transformer
    participant SNS as SNS Topic
    participant IQ as IndexerQueue (SQS)
    participant PQ as PackagerQueue (SQS)
    participant S3Q as S3SNSToEventBridgeQueue (SQS)
    participant Lambda as Lambda Processors
    participant ES as Elasticsearch
    
    User->>S3: Upload file (PutObject)
    S3->>CT: Log S3 event
    CT->>EB: Trigger EventBridge rule
    
    alt Rule is DISABLED
        EB--xSNS: No event sent (Issue #1)
        Note over EB,SNS: Silent failure - rule exists but disabled
    else Rule is ENABLED
        EB->>IT: Send CloudTrail format event
        
        alt Input Transformer Missing
            IT--xSNS: Wrong format sent (Issue #3)
            Note over IT,SNS: CloudTrail format instead of S3 notification
        else Input Transformer Configured
            IT->>SNS: Transform to S3 notification format
            
            alt PackagerQueue NOT Subscribed
                SNS->>IQ: Send to IndexerQueue
                SNS->>S3Q: Send to S3SNSToEventBridgeQueue
                SNS--xPQ: PackagerQueue missing (Issue #2)
                Note over SNS,PQ: Files index, packages don't appear in UI
            else All Queues Subscribed
                SNS->>IQ: Send to IndexerQueue
                SNS->>PQ: Send to PackagerQueue
                SNS->>S3Q: Send to S3SNSToEventBridgeQueue
                
                IQ->>Lambda: Process file indexing
                PQ->>Lambda: Process package creation
                Lambda->>ES: Update index
                Note over User,ES: Files AND packages appear in Quilt UI
            end
        end
    end
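
The branching in the diagram above can be read as an ordered checklist: rule state first, then the Input Transformer, then the PackagerQueue subscription. The sketch below encodes only that ordering. The function name and the yes/no arguments are illustrative, and it does not query AWS itself.

```shell
#!/usr/bin/env bash
# Arguments, each "yes" or "no":
#   $1 rule enabled, $2 input transformer configured, $3 PackagerQueue subscribed
# Prints the first failing check, in the same order as the diagram.
diagnose_event_flow() {
  local rule_enabled="$1" transformer="$2" packager="$3"
  if [ "$rule_enabled" != "yes" ]; then
    echo "Issue 1: rule disabled, no events reach SNS"
  elif [ "$transformer" != "yes" ]; then
    echo "Issue 3: events arrive in CloudTrail format"
  elif [ "$packager" != "yes" ]; then
    echo "Issue 2: files index, packages do not appear"
  else
    echo "OK: files and packages appear"
  fi
}
```

For example, `diagnose_event_flow yes yes no` prints the Issue 2 line, which corresponds to the "PackagerQueue NOT Subscribed" branch.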

This update adds detailed troubleshooting guidance based on real customer
issues encountered when setting up EventBridge routing for S3 events.

Changes:
- Add troubleshooting flowchart with step-by-step diagnostics
- Document 6 common failure modes:
  1. Disabled EventBridge rule (silent failure)
  2. Missing PackagerQueue subscription (files index, packages don't)
  3. Missing Input Transformer (wrong event format)
  4. General event flow issues
  5. Permission errors
  6. Duplicate events
- Add quick diagnostic checklist with CLI commands
- Include prevention strategies and validation steps
- Reference to validation script (available via support)

Key improvements:
- Emphasize checking rule state FIRST (common silent failure)
- Document PackagerQueue requirement (often missing)
- Clarify Input Transformer is CRITICAL (not optional)
- Provide copy-paste validation commands

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

codecov bot commented Dec 30, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 43.22%. Comparing base (2abb847) to head (3e4e1bc).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #4680   +/-   ##
=======================================
  Coverage   43.22%   43.22%           
=======================================
  Files         797      797           
  Lines       32003    32003           
  Branches     5699     5699           
=======================================
  Hits        13833    13833           
  Misses      16176    16176           
  Partials     1994     1994           
Flag | Coverage Δ
api-python | 91.96% <ø> (ø)
catalog | 19.43% <ø> (ø)
lambda | 96.62% <ø> (ø)
py-shared | 98.18% <ø> (ø)

@drernie drernie self-assigned this Dec 30, 2025
@drernie drernie requested a review from kevinemoore December 30, 2025 00:41

drernie commented Dec 30, 2025

@kevinemoore The docs are correct, but we identified a few more failure modes.

Two bash code blocks were missing skip markers and causing
test-testdocs to fail when pytest tried to execute them.

Fixes:
- Line 401: Verification command for queue subscriptions
- Line 431: Validation command for event transformation

Ensures all diagnostic and troubleshooting commands include
--region us-east-1 for consistency with setup commands and to
prevent 'You must specify a region' errors.

Changes:
- Quick diagnostic checklist: added --region to all 3 checks
- Issue 1 (Disabled rule): added --region to describe commands
- Issue 2 (PackagerQueue): added --region to all SNS/SQS commands
- Issue 3 (Input Transformer): added --region to S3/SQS commands

Maintains consistency with prerequisite that emphasizes 'same region
as your S3 bucket' throughout the guide.
@drernie drernie requested a review from Copilot December 30, 2025 00:51

Copilot AI left a comment

Pull request overview

This PR enhances the EventBridge documentation with comprehensive troubleshooting guidance based on real customer issues, specifically addressing non-obvious failure modes when setting up EventBridge routing for S3 events.

Key Changes:

  • Added automated validation script reference and quick diagnostic checklist with actionable CLI commands
  • Introduced systematic troubleshooting flowchart for debugging event flow issues
  • Documented six detailed failure scenarios including disabled rules, missing PackagerQueue subscriptions, and incorrect Input Transformer configurations

--namespace AWS/Events \
--metric-name TriggeredRules \
--dimensions Name=RuleName,Value=quilt-s3-events-rule \
--start-time $(date -u -d '5 minutes ago' '+%Y-%m-%dT%H:%M:%S') \

Copilot AI Dec 30, 2025

The date -u -d '5 minutes ago' command syntax is GNU-specific and won't work on macOS (BSD date). Consider documenting that users on macOS should use date -u -v-5M instead, or provide a more portable alternative.

Suggested change
- --start-time $(date -u -d '5 minutes ago' '+%Y-%m-%dT%H:%M:%S') \
+ --start-time $(date -u -v-5M '+%Y-%m-%dT%H:%M:%S' 2>/dev/null || date -u -d '5 minutes ago' '+%Y-%m-%dT%H:%M:%S') \
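
The same fallback can be wrapped in a small helper so the long expression is not repeated in every command. This is a sketch, assuming either BSD/macOS or GNU `date` is on the path; other implementations such as BusyBox may accept neither form. The function name `minutes_ago_utc` is hypothetical.

```shell
#!/usr/bin/env bash
# Print a UTC timestamp N minutes in the past (default 5)
# in the form YYYY-MM-DDTHH:MM:SS.
minutes_ago_utc() {
  local minutes="${1:-5}" fmt='+%Y-%m-%dT%H:%M:%S'
  # Try BSD/macOS syntax first, then fall back to GNU date.
  date -u -v-"${minutes}"M "$fmt" 2>/dev/null \
    || date -u -d "${minutes} minutes ago" "$fmt"
}
```

Its output can then be passed as `--start-time "$(minutes_ago_utc 5)"`.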

├─ 4. Is Input Transformer configured correctly?
│ ├─ Check EventBridge rule → Targets → Input transformer
│ ├─ ❌ "Matched event" selected → Events in wrong format!
│ ├─ ❌ Missing transformer → Configure per Step 6 above

Copilot AI Dec 30, 2025

The reference 'Step 6 above' may be ambiguous as the documentation structure evolves. Consider referencing a specific section title (e.g., 'Configure per the Input Transformer Configuration section') for more durable documentation.

Suggested change
- │ ├─ ❌ Missing transformer → Configure per Step 6 above
+ │ ├─ ❌ Missing transformer → Configure the EventBridge rule input transformer as described in the configuration instructions above in this guide

@drernie drernie marked this pull request as draft December 30, 2025 05:40
@drernie
Copy link
Member Author

drernie commented Dec 30, 2025

Whoops, found a real bug. Lacking policies and event morphing.
