forked from EvolutionAPI/evolution-api
Open
Labels
bug (Something isn't working)
Description
Context
Phase 1.5 image delivery RCA identified that intermittent image failures are caused by a race condition: Chatwoot fires the message_created webhook before the ActiveStorage blob upload job (Sidekiq) completes.
Current mitigation (Fix 3):
- 3 attempts (1 initial + 2 retries) with 1s/2s backoff (~3s total retry window)
- Handles most cases where blob commits within 3 seconds
- If blob takes >3s to commit, all retries fail → yellow error note posted to agent
Observed in production:
- Some blob uploads exceed the 3s retry window
- Particularly affects larger images or when Sidekiq workers are slow
Proposed Enhancement
Increase retry parameters in chatwoot.service.ts:
- MAX_BLOB_RETRIES: 2 → 4 (or 5)
- BLOB_RETRY_DELAY_MS: Consider exponential backoff (1s, 2s, 4s, 8s)
This would extend the retry window from ~3s to ~15s, covering more edge cases.
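A minimal sketch of what the extended retry loop could look like, assuming the blob fetch returns null while the ActiveStorage upload has not yet committed. The names fetchBlobWithRetry, backoffDelaysMs, BLOB_RETRY_BASE_DELAY_MS, and the injected sleep parameter are hypothetical and not taken from chatwoot.service.ts; the real code at lines 1238-1282 may be structured differently.

```typescript
// Hypothetical names throughout; the actual code in chatwoot.service.ts may differ.
const MAX_BLOB_RETRIES = 4; // proposed value (currently 2)
const BLOB_RETRY_BASE_DELAY_MS = 1000; // assumed base: first retry waits 1s

// Delay before each retry, doubling every time: 1s, 2s, 4s, 8s for four retries.
function backoffDelaysMs(retries: number, baseMs: number): number[] {
  return Array.from({ length: retries }, (_, i) => baseMs * 2 ** i);
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Calls fetchBlob once, then retries with exponential backoff while it returns null.
// Returns null if every attempt fails, so the caller can post the error note.
async function fetchBlobWithRetry<T>(
  fetchBlob: () => Promise<T | null>,
  retries: number = MAX_BLOB_RETRIES,
  baseMs: number = BLOB_RETRY_BASE_DELAY_MS,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<T | null> {
  const first = await fetchBlob();
  if (first !== null) return first;

  for (const delayMs of backoffDelaysMs(retries, baseMs)) {
    await sleep(delayMs);
    const blob = await fetchBlob();
    if (blob !== null) return blob;
  }
  return null;
}
```

With four retries the delays sum to 1 + 2 + 4 + 8 = 15s, which is where the ~15s window comes from. Going to 5 retries on the same schedule would add a 16s wait and push the worst case to ~31s, so a cap on the per-retry delay may be worth considering if 5 is chosen.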
Trade-offs
Pro:
- Reduces intermittent image delivery failures
- Better UX for agents (fewer false failures)
Con:
- Longer webhook processing time if blob genuinely doesn't exist
- May mask underlying Sidekiq performance issues
Related
- Root cause analysis: Prospek/docs/whisper/image-delivery-rca-v2.md (Section 2, RC4)
- Code: chatwoot.service.ts:1238-1282
- Current branch: fix/image-delivery-reliability (commit 8037754c)
Priority
P2 - Enhancement (Phase 1.5 mitigates most cases; this extends coverage for edge cases)
Monitor real-world blob 404 rates in production after Phase 1.5 closes. If failure rate remains >1%, prioritize this enhancement.
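The 1% trigger can be written down as a small check, sketched here under the assumption that counts of blob 404 failures and total image deliveries are available from logs or metrics. The helper names and the counters are hypothetical; nothing like this exists in the codebase as far as this issue states.

```typescript
// Hypothetical helpers: the counts would come from whatever logs or metrics exist.
const BLOB_404_RATE_THRESHOLD = 0.01; // the 1% threshold from the priority note

// Fraction of image deliveries that failed after all blob retries were exhausted.
function blobFailureRate(failedDeliveries: number, totalDeliveries: number): number {
  return totalDeliveries === 0 ? 0 : failedDeliveries / totalDeliveries;
}

// True only when the observed rate is strictly above the threshold.
function shouldPrioritizeRetryIncrease(failed: number, total: number): boolean {
  return blobFailureRate(failed, total) > BLOB_404_RATE_THRESHOLD;
}
```

For example, 15 failures out of 1000 deliveries (1.5%) would trip the check, while exactly 10 out of 1000 (1%) would not, since the rule is "greater than 1%".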