
Incoming-call orchestration refactor plan #1

Description

@thstyl2000

Current checked-in incoming-call path

The checked-in incoming-call push ingress is Play-only:

  • app/src/gplay/AndroidManifest.xml registers NCFirebaseMessagingService
  • app/src/gplay/java/com/nextcloud/talk/services/firebase/NCFirebaseMessagingService.kt receives the FCM push and enqueues NotificationWorker
  • app/src/main/java/com/nextcloud/talk/jobs/NotificationWorker.kt decrypts and routes the push, fetches room data, posts the incoming-call notification, and polls call state
  • app/src/main/java/com/nextcloud/talk/callnotification/CallNotificationActivity.kt renders the incoming-call screen and starts CallActivity on answer
  • app/src/main/java/com/nextcloud/talk/activities/CallActivity.kt owns permission gating, room join, call join, signaling, media, and the transition into the active-call foreground service

The checked-in active-call notification path is already separate:

  • app/src/main/java/com/nextcloud/talk/services/CallForegroundService.kt owns the ongoing-call foreground notification once the app has moved past incoming-call presentation

The repo does not currently contain a self-managed Telecom integration:

  • no MANAGE_OWN_CALLS
  • no ConnectionService
  • no android.permission.BIND_TELECOM_CONNECTION_SERVICE
  • no registered PhoneAccount

The app currently uses:

  • POST_NOTIFICATIONS
  • USE_FULL_SCREEN_INTENT
  • FOREGROUND_SERVICE
  • FOREGROUND_SERVICE_CAMERA
  • FOREGROUND_SERVICE_MICROPHONE

The main current failure modes in this path are:

  • NotificationWorker returns Result.success() before its async call flow completes. Room fetch and getPeersForCall() polling continue after WorkManager considers the job finished, and room fetch is explicitly marshaled back onto the main thread before incoming-call presentation continues.
  • Incoming-call behavior is driven by notification visibility in two places: NotificationWorker uses visibility to decide missed-call or end behavior, and CallNotificationActivity polls visibility and finishes when the notification disappears. That activity poller is not lifecycle-cleaned up today, so stale pollers can outlive an activity instance.
  • Duplicate pushes are not idempotent. Ringing notification ids and full-screen PendingIntent request codes are timestamp-based, so the same call can create parallel ringing notifications and parallel polling loops.
  • The current call-ingress push model exposes id and nid; objectId is only available after NC notification enrichment, and the checked-in call path does not perform that enrichment before ringing. The ringing path therefore still does not choose a canonical orchestration identity, and room token alone is not assumed sufficient.
  • Incoming-call presentation is hard-gated on switching the globally active account. If setUserAsActive() fails, the current path drops the call without a retry path, fallback, or explicit failure state.
  • Normalization failures are not a clean terminal path today. initDecryptedData() swallows malformed or undecryptable payload errors, but later code still dereferences signatureVerification and pushMessage, so invalid call pushes can crash the worker instead of terminating as explicit normalization failures.
  • The incoming-call account switch is also thread-hostile today. handleCallPushMessage() observes room fetch on the main thread and then calls setUserAsActive(...).blockingGet() there before presentation continues.
  • Server delete pushes are handled outside the call-specific path, and the ringing notification bypasses createNotificationBuilder(), so it does not include KEY_INTERNAL_USER_ID, KEY_ROOM_TOKEN, or KEY_NOTIFICATION_ID. Existing cancel helpers therefore do not see incoming ringing notifications.
  • Notification metadata gaps are wider than the ringing path alone. The missed-call notification and the active-call foreground notification also do not participate in one shared account-scoped metadata contract, so deterministic cleanup cannot reason uniformly about ringing, missed, and active notifications.
  • Decline is not a real pre-join action path today. Closing CallNotificationActivity just finishes the activity, and onStop() cancels the notification.
  • The answer path tears down ringing before join is confirmed. Starting CallActivity from CallNotificationActivity triggers onStop(), which cancels the ringing notification before CallForegroundService has taken over.
  • CallActivity and CallNotificationActivity are both singleTask in the same .call task, but neither handles onNewIntent(), and CallNotificationActivity assumes fresh extras and a resolvable account, so stale-task reuse is a crash risk as well as a routing risk.
  • CallActivity can start the active-call foreground service before the join path has reached a valid joining or active boundary.
  • The active-call notification boundary is still partially activity-driven today. CallActivity starts CallForegroundService early and unconditionally stops it in onDestroy(), so activity destroy or recreate paths still affect notification ownership, and START_STICKY null-intent restart can resurrect a stale ongoing-call notification with a dead tap path.
  • Incoming-call routing is not fully account-scoped. The current path switches the globally active account before presenting the call, and CallActivity later resolves the conversation user from that global state instead of the explicit internal user id.
  • Incoming-call presentation can fail or degrade differently depending on which gate blocks it. POST_NOTIFICATIONS denied, app-wide notifications disabled, or the calls notification channel disabled can suppress notification surfacing entirely. Full-screen intent unavailable or denied, or Android 15/16 background-activity-launch restrictions on the full-screen PendingIntent path, can still leave a heads-up notification surface even when full-screen escalation is blocked.
  • The generic and qa flavors have no push ingress, so the long-term coordinator cannot depend on Firebase-specific types or lifecycle assumptions.
  • Other app surfaces already depend on call lifecycle flags and active-account snapshots outside the incoming-call stack. ChatActivity, ConversationsListActivity, ApplicationWideCurrentRoomHolder, BaseActivity Talk deep-link interception, and providers or viewmodels that snapshot the active user all depend on current call or account ownership, so the refactor needs an explicit preserve-or-replace plan for them.
  • The current pre-join call-state path conflates timeout, caller disappearance, and transport failure. Transport failure must therefore be modeled explicitly instead of being treated as missed by default.

Architectural target

One owner for incoming-call state, one owner for ringing presentation, and one owner for join/media/in-call lifecycle.

Build toward one deterministic incoming-call pipeline with explicit ownership:

  1. Push ingress validates and normalizes the event.
  2. An early ingress gate serializes duplicate work before later persisted dedupe exists.
  3. A coordinator/store persists and deduplicates the incoming call.
  4. A notification controller renders or updates the ringing notification.
  5. The incoming-call UI reflects coordinator state.
  6. Notification actions and UI actions go through one shared answer or decline path.
  7. CallActivity takes over only when the call is actually joining or active.
  8. CallForegroundService remains the owner of the ongoing-call foreground notification unless that contract is intentionally replaced in a dedicated change.
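
Steps 2 and 3 above can be illustrated with a minimal in-memory sketch. It assumes a server-provided call identity exists; `IncomingCallKey`, `serverCallId`, and `IngressGate` are hypothetical names, not code from the repo, and the real canonical identity is still a phase 1 decision.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Hypothetical account-scoped key. `serverCallId` stands in for whatever
// server-side call identity phase 1 settles on.
data class IncomingCallKey(val internalUserId: Long, val serverCallId: String) {
    // Derived from the key, so duplicate pushes for the same call map to the
    // same notification id instead of a timestamp-based one.
    // Hash collisions are possible; a persisted store would have to handle them.
    fun notificationId(): Int = "$internalUserId:$serverCallId".hashCode()
}

// Early ingress gate: serializes duplicate work before persisted dedupe exists.
class IngressGate {
    private val inFlight = ConcurrentHashMap<IncomingCallKey, Long>()

    // Returns true only for the first caller per key; later duplicates get false.
    fun tryAcquire(key: IncomingCallKey, nowMillis: Long): Boolean =
        inFlight.putIfAbsent(key, nowMillis) == null

    // Called when the call reaches a terminal state or is handed to the store.
    fun release(key: IncomingCallKey) {
        inFlight.remove(key)
    }
}
```

In this sketch a second push for the same account and call is rejected at the gate, while the same `serverCallId` on a different account is treated as a separate call. The gate is process-local, so it does not replace the persisted dedupe in step 3.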

Rules that apply across the whole refactor:

  • The coordinator is ingress-agnostic. FCM is the current ingress, not the architectural boundary.
  • Canonical orchestration identity is account-scoped and uses server-side call identity. Do not use timestamp-generated ids, and do not assume room token alone is sufficient until that is confirmed.
  • Presentation-state tracking distinguishes notification surfacing from full-screen escalation. Notification permission denial, app-wide notifications disabled, and calls channel disabled are blocked-notification outcomes. Full-screen intent unavailable or denied and background-activity-launch restrictions are blocked-full-screen outcomes unless notification surfacing also failed.
  • Blocked-presentation classification must be callable from worker, receiver, coordinator, and UI code. It cannot live only in activities or settings screens.
  • Pre-join decline is a real call-control action with a defined remote or signaling effect, not UI teardown.
  • Delete pushes map into the same call-state model and can cancel the matching ringing notification deterministically.
  • singleTask re-entry handling is a correctness requirement, not optional hardening.
  • Multi-account correctness is mandatory for dedupe, answer, decline, join routing, and delete handling.
  • Notification action flow must not use a notification trampoline. If an action leads to UI, the activity launch must come from a PendingIntent, not startActivity() from a receiver or service.
  • Full-screen and other activity-launching PendingIntent paths must satisfy the required Android 14+ background-activity-launch contract, including creator-side opt-in or an explicitly documented equivalent when the platform requires it.
  • Any path that creates microphone or camera foreground-service state must do so from a visible activity or another platform-allowed interaction boundary. Do not move active-call foreground-service startup into background coordinator work.
  • Existing consumers of CallActivity.active, ApplicationWideCurrentRoomHolder.isDialing, ApplicationWideCurrentRoomHolder.isInCall, and currentRoomToken need an explicit preserve-or-replace plan.
  • Transport failure semantics are explicit. Room lookup failure, polling failure, timeout, and caller disappearance must not collapse into one ambiguous outcome.
  • Telecom, if added later, must reuse the same coordinator and the same active-call lifecycle model instead of creating a second control path.
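
The blocked-presentation rule above could be expressed as a pure function so that worker, receiver, coordinator, and UI code can all call it. This is a sketch under assumptions: `PresentationGates`, `PresentationOutcome`, and `classifyPresentation` are hypothetical names, and the caller is assumed to have already read the platform state into plain booleans.

```kotlin
// Hypothetical snapshot of the platform gates. A real implementation would
// fill this from notification permission, app-wide, channel, full-screen
// intent, and background-activity-launch state.
data class PresentationGates(
    val postNotificationsGranted: Boolean,
    val appNotificationsEnabled: Boolean,
    val callsChannelEnabled: Boolean,
    val fullScreenIntentAllowed: Boolean,
    val backgroundActivityLaunchAllowed: Boolean
)

enum class PresentationOutcome { PRESENTABLE, BLOCKED_FULL_SCREEN, BLOCKED_NOTIFICATION }

fun classifyPresentation(gates: PresentationGates): PresentationOutcome {
    // Notification surfacing is checked first: if it fails, the outcome is
    // blocked-notification regardless of full-screen capability.
    val notificationCanSurface = gates.postNotificationsGranted &&
        gates.appNotificationsEnabled &&
        gates.callsChannelEnabled
    if (!notificationCanSurface) return PresentationOutcome.BLOCKED_NOTIFICATION

    // The notification can still surface, so only escalation is blocked here.
    val fullScreenCanEscalate = gates.fullScreenIntentAllowed &&
        gates.backgroundActivityLaunchAllowed
    return if (fullScreenCanEscalate) {
        PresentationOutcome.PRESENTABLE
    } else {
        PresentationOutcome.BLOCKED_FULL_SCREEN
    }
}
```

Keeping the classifier free of Android types is one way to make it callable outside activities and settings screens, and it keeps the classification unit-testable without a device.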

Sequence matters: fix the ingress lifetime boundary and duplicate race first, freeze canonical identity and terminal-state mapping early, remove notification-visibility-driven control flow and singleTask re-entry bugs before changing notification style, and tighten the incoming-to-active handoff before any Telecom work. Creator-side background-activity-launch opt-in for the current full-screen PendingIntent path must land no later than the first phase that still depends on that path on targetSdk 35+ / API 35+ devices. Otherwise the API 35/36 verification matrix is not representative.

Stress-test additions

The plan also needs explicit coverage for these cross-cutting risks before implementation starts:

  • All call launch and recovery surfaces need the same explicit account-scoped routing contract, not only the incoming answer path. That includes outgoing or manual launch from ChatActivity, the ongoing-call notification content intent, missed-call entry, and blocked-presentation recovery entry.
  • The plan needs an explicit ring-latency contract. NotificationWorker currently performs a network getRoom() fetch before the first ringing notification is posted, so phase ordering alone does not remove the cold-start or bad-network latency risk unless ringing becomes renderable from normalized push or other locally available state.
  • The plan needs an explicit FCM ingress execution contract. NCFirebaseMessagingService currently hands call pushes to a plain OneTimeWorkRequest; for urgent ringing this is not equivalent to immediate ingress, so decide whether call ingress stays in onMessageReceived(), moves to expedited work, or proves that ordinary WorkManager scheduling still meets the incoming-call latency budget.
  • The plan needs an explicit threading contract for normalization, account lookup or switch, store transitions, and notification or UI handoff. The checked-in path still uses blockingGet() and main-thread callbacks across NotificationWorker, CallNotificationActivity, CallActivity, and BaseActivity, so ownership cleanup alone will not remove lock-screen latency or ANR risk unless blocking work is kept off the main thread and off the pre-ring critical path.
  • The plan needs an explicit server transport contract for call and delete pushes: FCM priority, TTL, and collapse behavior. Client-side dedupe alone will not prevent stale or missing rings if backend delivery semantics can collapse, defer, or outlive the call.
  • The coordinator also needs a lost-message recovery contract. NCFirebaseMessagingService does not implement onDeletedMessages(), so persisted ringing or blocked states need a deterministic resync path when FCM reports dropped messages or when call/delete pushes are missed upstream.
  • Define the local concurrent-call policy before dedupe and coordinator work begins: what happens when another call is already dialing or active on this device, in the same room or a different room, on the same account or a different account.
  • CallActivity.active is only an activity-visibility signal today, not durable call ownership. Any gating that still depends on it needs an explicit replacement once CallForegroundService becomes the long-lived owner.
  • ApplicationWideCurrentRoomHolder is global and room-token keyed today. Multi-account dedupe, session reuse, and room gating cannot rely on it unchanged.
  • ApplicationWideCurrentRoomHolder is also mutated by ordinary ChatActivity room joins, not only by call flows. A backgrounded active call can therefore lose its room or session marker if the user opens another conversation after CallActivity.active has dropped false.
  • Preserve-or-replace work around ApplicationWideCurrentRoomHolder needs reset semantics for every field it owns, not only room token, session, isDialing, and isInCall. callStartTime is also stored there today and clear() does not reset it.
  • ApplicationWideCurrentRoomHolder.isDialing is set optimistically before CallActivity proves it can own the join path. Permission denial, stale-intent reuse, or other early-abort paths need deterministic rollback so dialing state does not stick and block later cleanup or routing.
  • Breakout-room or room-switch handoff during an active call is a separate lifecycle path that needs an explicit preserve-or-replace plan.
  • Shared notification metadata cannot be added in isolation. Existing room-based cleanup helpers need notification-kind or ownership scoping first, or they will start deleting ringing, missed, or active-call notifications opportunistically.
  • Shared notification metadata also needs account-wide ownership scoping before active-call notifications adopt the same extras. cancelAllNotificationsForAccount() currently cancels every notification carrying KEY_INTERNAL_USER_ID, so a later deleteAll push would also tear down the active-call foreground notification unless notification kind or ownership is encoded first.
  • CallForegroundService currently returns START_STICKY and rebuilds from nullable intent extras. If later phases make it the long-lived owner, the plan needs an explicit service-restart and rehydration contract for null-intent restarts, active-call notification content-intent recovery, and stale-notification cleanup.
  • CallForegroundService is started from CallActivity with a snapshot of pre-join launch extras today. Later phases need an explicit rule for when the ongoing-call notification is refreshed with authoritative post-join routing state instead of keeping stale pre-join data.
  • Any call-entry or resume path that still depends on CurrentUserProviderOld or other app-global active-account state is still vulnerable to stale-account routing after account changes. The cutover to explicit account-scoped routing has to cover incoming, outgoing, ongoing-notification resume, and recovery surfaces.
  • MainActivity is part of that same risk surface and still does getUserWithId(...).blockingGet() plus setUserAsActive(...).blockingGet() on the main thread for notification-driven routing. Missed-call entry and blocked-presentation recovery cannot keep that path without explicitly pulling it into the threading and latency contract.
  • The server transport-loss contract needs a named implementation phase, not only a prerequisite note. FCM priority, TTL, collapse behavior, and client dropped-message recovery together determine whether persisted incoming-call state can be trusted at all.
  • The canonical-identity rule also needs an explicit backend-prerequisite gate. The checked-in push path exposes id and nid, and the plan intentionally does not assume room token alone is sufficient. If call and delete pushes still do not carry one confirmed stable per-call identity before room fetch, phase 1 has to stop and link the required backend change instead of silently inventing a client-local surrogate.
  • The transport contract needs dependency ownership as well as semantics. FCM priority, TTL, collapse behavior, and delete-delivery behavior are not fully repo-local decisions, so the plan should name the backend owner or linked issue and state which client phases can proceed before those server-side prerequisites land.
  • The outgoing or manual call path needs an explicit migration slice, not only a phase-1 decision. ChatActivity currently owns several direct CallActivity launch paths and speculative dialing state, so leaving those outside the coordinator would preserve a second call-control path.
  • singleTask re-entry work needs to cover activity-scoped ViewModel and current-user snapshot reset or rebind semantics, not only stale intent extras. CallActivity, CallRecordingViewModel, and RaiseHandViewModel currently cache routing state at construction time.
  • The local answer mode needs to become first-class coordinator state. Voice-only versus video answer is currently transient intent data, so process death or stale-task reuse cannot deterministically reconstruct how the user answered unless that choice is persisted or recomputed by contract.
  • Runtime permission denial needs an explicit join-semantics contract. The checked-in path can continue toward joinCall() even when microphone, camera, or Bluetooth permission requests are denied, so the refactor must decide whether denial blocks answer, downgrades it, or allows a degraded join, and how join flags derive from granted device permissions instead of publish capability alone.
  • Post-answer, pre-join dependency failures need their own terminal-state and teardown contract. Recording-consent room lookup, signaling settings fetch, capabilities fetch, room join, and call join can all fail after the ringing UI has handed off but before the call has reached a valid joining boundary.
  • The first phase that introduces shared notification metadata also needs an explicit legacy-notification cleanup rule, so ringing or ongoing-call notifications posted before the new metadata contract do not survive invisibly beside coordinator-owned notifications.
  • Active-account switch failure semantics need to cover missed-call entry, blocked-presentation recovery, and any other MainActivity-routed call recovery surface, not only the worker-time incoming-ring path.
  • The plan needs an explicit startup reconciliation owner for persisted incoming-call state. Once phase 3 lands, some component must rehydrate coordinator records on app start and reconcile them against currently posted notifications, live service state, and authoritative server state so stale ringing, blocked, answered, or failed rows do not survive process death, app restart, or upgrade as active work.
  • The plan needs an explicit expiry and pruning contract for persisted incoming-call records and legacy call notifications. CallTimeoutWorker can model live timeout, but it does not replace startup pruning of expired rows and orphaned notifications left behind after process death, transport loss, or version upgrade.
  • The plan needs an explicit owner for the AnswerRequested -> Joining boundary. Today CallNotificationActivity hands off immediately and CallActivity can start CallForegroundService before join is valid, so the refactor needs one owner for post-answer timeout, retry, teardown, and notification ownership until the call has actually crossed a valid joining boundary.
  • The plan also needs the active-call foreground-service-type contract to be explicit. CallForegroundService currently derives microphone/camera types from pre-join extras and publish capability; later phases should require service-type selection to follow granted runtime permissions and actual local capture state, not only answer mode or stale launch extras.
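
The last point, service-type selection following granted permissions and actual capture state, might look like the sketch below. `CaptureType`, `LocalCaptureState`, and `selectServiceTypes` are hypothetical names. Mapping the result onto the platform foreground-service type constants, and deciding what the service declares when the set is empty, are left to the phase 1 contract.

```kotlin
enum class CaptureType { MICROPHONE, CAMERA }

// Hypothetical snapshot taken at the moment the service (re)declares its types,
// not at pre-join launch time.
data class LocalCaptureState(
    val micPermissionGranted: Boolean,
    val cameraPermissionGranted: Boolean,
    val micCaptureActive: Boolean,
    val cameraCaptureActive: Boolean
)

// A type is selected only when the runtime permission is granted AND local
// capture is actually running. Answer mode and launch extras are not inputs.
fun selectServiceTypes(state: LocalCaptureState): Set<CaptureType> {
    val types = mutableSetOf<CaptureType>()
    if (state.micPermissionGranted && state.micCaptureActive) {
        types.add(CaptureType.MICROPHONE)
    }
    if (state.cameraPermissionGranted && state.cameraCaptureActive) {
        types.add(CaptureType.CAMERA)
    }
    return types
}
```

Under this sketch a video answer with camera permission denied yields only `MICROPHONE`, which is the kind of divergence from stale launch extras the point above is concerned with.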

Implementation plan

Ship this work as one PR per phase, in order.

Phase 0: baseline, instrumentation, and test seams

Work from Android Studio with the gplayDebug build variant.

Also keep the refactor buildable in genericDebug and qaDebug. Coordinator, action, and notification ownership code belongs in main, not gplay.

Reproduce and document:

  • cold-start first incoming call getting discarded (nextcloud/talk-android#1011, "No Notification because MagicFirebaseMessagingService.onDestroy is called")
  • locked-screen incoming call then unlock or crash (nextcloud/talk-android#3957, "Fullscreen notification disappears after unlocking screen while receiving a call")
  • Android 16 incoming call surfacing as a message-like push (nextcloud/talk-android#5922, "Call notification is not actually call")
  • duplicate push for the same call
  • second incoming call while this device is already dialing
  • second incoming call while this device is already in an active call
  • active call backgrounded, then open a different conversation and verify room or session ownership does not drift
  • malformed or unverifiable push payload
  • invalid signature or missing key material during normalization
  • answer on another device while this device is still ringing
  • locked-screen answer with microphone permission missing
  • locked-screen answer with camera permission missing
  • answer with BLUETOOTH_CONNECT missing on API 31+
  • voice-only answer versus video answer
  • room fetch failure or server unreachable while the push is otherwise valid
  • process death between ringing and answer
  • process death or CallForegroundService restart after an active call has taken notification ownership, including null-intent START_STICKY restart and ongoing-call notification resume
  • outgoing or manual call launch after an active-account change or stale current-user cache
  • outgoing or manual call launch aborted before join completes, including permission denial or immediate finish, and verify speculative dialing state is cleared deterministically
  • ongoing-call notification tap while the call UI is backgrounded or in picture-in-picture
  • breakout-room transfer during an active call
  • stale singleTask task reuse
  • transient CallNotificationActivity.onStop() interruptions such as permission dialogs, home, recents, and unlock transitions
  • notification permission denied
  • app-wide notifications disabled
  • calls notification channel disabled
  • full-screen intent unavailable or denied
  • Android 15/16 background-activity-launch blocking on the full-screen PendingIntent path

Keep a concrete verification matrix during this phase:

  • build genericDebug, gplayDebug, and qaDebug
  • verify incoming-call surfacing behavior on API 33, 34, 35, and 36 devices or emulators
  • enable StrictMode.VmPolicy.Builder.detectBlockedBackgroundActivityLaunch() on API 36 verification builds and treat hits as launch-contract regressions until explained
  • enable a targeted StrictMode.ThreadPolicy on verification builds and treat main-thread disk, database, or network work in normalization, account lookup or switch, and answer launch as regressions until explained
  • verify outgoing or manual call launch, ongoing-call notification resume, and room-list gating while the call UI is backgrounded or in picture-in-picture
  • verify breakout-room transfer and room-session reuse behavior across the same matrix where feasible

Add correlation-id logging at:

  • NextcloudTalkApplication.onCreate
  • NCFirebaseMessagingService.onMessageReceived
  • NotificationWorker.doWork
  • active-account switch start, success, and failure
  • normalization start, success, and failure
  • room lookup start, success, and failure
  • ringing notification post, update, and cancel
  • answer and decline action target
  • delete-push handling
  • CallNotificationActivity.onCreate, onResume, and onDestroy
  • CallActivity.onCreate, onResume, and onDestroy
  • join-start, joined, answered elsewhere, declined, missed, and disconnected

Also log:

  • FCM priority and original priority
  • cold-start startup-to-first-ring duration and process-death recovery-to-first-ring duration
  • canonical call-id inputs and the chosen key
  • notification id
  • room token and call-id correlation data
  • delete-push target identifiers versus posted notification identifiers
  • notification permission state
  • app-wide notification state
  • calls notification channel state
  • full-screen intent capability state
  • background-activity-launch opt-in or capability state for the full-screen PendingIntent path

Add narrow tests or test seams for:

  • push normalization
  • canonical call identity selection
  • terminal-state mapping
  • blocked-presentation classification
  • delete-push mapping
  • dedupe rules
  • worker execution and deterministic handoff behavior
  • notification post, update, and cancel ownership
  • activity re-entry and stale-intent reconciliation
  • time, polling, and notification-state abstraction
  • a fake incoming-call or push harness, or an explicitly documented manual device matrix if automation is not feasible
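
The time abstraction in the list above can be as small as the following sketch. `CallClock`, `FakeCallClock`, and `RingDeadline` are hypothetical names, and the timeout value is whatever the ring-timeout contract ends up specifying.

```kotlin
// Seam for time: production code would wrap a monotonic platform clock,
// tests inject the fake below.
fun interface CallClock {
    fun nowMillis(): Long
}

class FakeCallClock(private var now: Long = 0L) : CallClock {
    override fun nowMillis(): Long = now

    fun advanceBy(millis: Long) {
        now += millis
    }
}

// Ring timeout expressed against the seam, so tests never sleep or poll.
class RingDeadline(private val clock: CallClock, private val timeoutMillis: Long) {
    private val startedAt: Long = clock.nowMillis()

    fun isExpired(): Boolean = clock.nowMillis() - startedAt >= timeoutMillis
}
```

With the fake clock a test can step exactly to the timeout boundary, which makes timeout versus caller-disappearance ordering reproducible instead of dependent on wall-clock timing.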

Phase 1: freeze the orchestration contract

Lock the contract before moving state into a persisted store.

Decide and document:

  • the canonical incoming-call identity using server-provided values already available in push and room-fetch flows
  • whether room token alone is sufficient or whether nid, later-enriched objectId, callStartTime, or another server-side value is required
  • that account or internal user id is part of every orchestration identity, dedupe key, and action-routing key
  • the persisted store technology before phase 3. If the incoming-call store uses Room, include schema bump, exported schema update, migration, and migration-test coverage. If it does not use Room, document why its atomic-transition and process-death guarantees are sufficient for dedupe and terminal-state ownership
  • whether the incoming-call path is allowed to switch the app-wide active account, and if so where that ownership lives
  • how active-account switch failure is classified and surfaced instead of silently dropping the call
  • the answer-path launch-data contract: which conversation fields must be persisted versus recomputed before CallActivity can start after process death, lock-screen answer, or stale task reuse
  • whether local answer mode (voice-only versus video) is persisted as part of answer-path launch data or guaranteed to be recomputable without ambiguity after process death and stale-task reuse
  • how runtime permission denial changes answer and join semantics, including whether denied microphone or camera permission blocks answer, downgrades it, or allows join without local capture, and how join flags derive from granted device permissions versus publish capability
  • whether outgoing or manual call launch, the ongoing-call notification content intent, missed-call entry, and blocked-presentation recovery all move to the same explicit internal-user or coordinator routing contract as incoming answer, and whether the existing explicit internal user id becomes the authoritative routing key end-to-end instead of being dropped in favor of the globally active account
  • how active-account switch failure is surfaced for missed-call entry, blocked-presentation recovery, and any other MainActivity-routed call recovery surface instead of silently falling back to generic navigation
  • whether the canonical call identity is already available in current call and delete pushes; if not, which backend change is the prerequisite for phase 1 and which client work can still proceed before it lands
  • the local concurrent-call policy when another call is already dialing or active on this device, including same-room versus different-room and same-account versus different-account behavior
  • the server transport contract for call and delete pushes: FCM priority, TTL, and collapse behavior, plus the stale-state or user-visible outcome when delivery is delayed, collapsed, or dropped
  • which backend owner or linked issue owns those transport guarantees, and which client acceptance criteria stay blocked until that cross-repo dependency is resolved
  • the dropped-message recovery contract when FCM reports deleted messages or the client must resync before trusting persisted incoming-call state
  • whether first-ring must be renderable from push or other locally available data without a blocking room fetch, and if not, the explicit latency budget and user-visible failure outcome when enrichment or room lookup misses it
  • whether ApplicationWideCurrentRoomHolder becomes account-scoped or is replaced before multi-account dedupe, session reuse, or room gating rely on it
  • which component owns room or session identity while a backgrounded active call exists, since ordinary ChatActivity joins also mutate ApplicationWideCurrentRoomHolder
  • how ChatActivity room-switch signaling and ordinary non-call room joins are prevented from mutating or consuming shared call-session ownership while another call is dialing or active
  • the preserve-or-replace contract for ApplicationWideCurrentRoomHolder.callStartTime, including reset semantics when room or session ownership is cleared
  • whether speculative dialing state is still allowed before CallActivity reaches a real joining boundary, and if so which owner clears it on permission denial, stale-intent reuse, or other early-abort paths
  • whether outgoing or manual call launch moves to the same coordinator action contract as incoming answer, and where speculative dialing state is cleared if that launch aborts before join
  • how stale current-user-cache reads are prevented on any call-entry or resume path that still depends on setUserAsActive(), CurrentUserProviderOld, or other app-global active-account state before launching or resuming a call surface
  • whether BaseActivity Talk deep-link interception is moved onto the same explicit account-scoped routing contract as call entry, or explicitly fenced away from call routing
  • which viewmodels, providers, or other long-lived components snapshot the active user at construction time, and how those snapshots are invalidated or rebound when call-related routing changes account
  • how activity-scoped call ViewModels and any cached current-user state are rebound or recreated on singleTask re-entry, stale task reuse, and process recreation
  • the preserve-or-replace contract for breakout-room or room-switch handoff during an active call
  • whether existing room-based cleanup callers are replaced or notification-kind scoping is added before ringing, missed, and active-call notifications participate in one shared metadata contract
  • the cleanup rule for ringing or ongoing-call notifications that were posted before the shared metadata contract exists, so the migration does not leave legacy notifications behind that deterministic cleanup cannot see
  • terminal-state mapping for delete pushes
  • whether the canonical call identity is derivable from both ringing pushes and delete pushes before room fetch; if not, the separate stable delete-routing key and mapping lifecycle that keep pre-enrichment delete handling deterministic
  • whether the store keeps both the server notification id (nid) and the posted system notification id, because delete routing and posted-notification cleanup are different contracts
  • the threading contract for normalization, account lookup or switch, store transitions, notification rendering, and answer handoff, including which owner or dispatcher is allowed to block and which boundaries must stay off the main thread
  • terminal-state mapping for room lookup failure, polling failure, timeout, and caller disappearance
  • terminal-state mapping and teardown behavior for post-answer, pre-join dependency failures: recording-consent room lookup, signaling settings fetch, capabilities fetch, room join, and call join
  • behavior when presentation is blocked by permission, app-wide settings, channel settings, full-screen intent capability, or background-activity-launch restrictions
  • the durable user-visible recovery surface for blocked incoming calls
  • whether blocked full-screen recovery explicitly uses NotificationManager.canUseFullScreenIntent() plus ACTION_MANAGE_APP_USE_FULL_SCREEN_INTENT on Android 14+ when full-screen escalation is denied
  • the pre-join decline contract and required remote or signaling effect
  • the ownership boundary between incoming-call presentation and the ongoing-call foreground service
  • the active-call notification identity contract, including whether CallForegroundService remains a process-wide singleton notification or becomes account-scoped, and which cleanup metadata contract later phases are allowed to rely on
  • how and when the active-call notification content intent and extras are refreshed after join establishes authoritative account, room, or session state instead of freezing pre-join launch data
  • whether CallForegroundService remains START_STICKY, and if so how null-intent restarts rehydrate account-scoped active-call state, notification extras, and resume routing instead of surfacing a generic stale ongoing-call notification
  • which component rehydrates persisted incoming-call state on app start, how it reconciles that state against posted notifications, live service state, and server truth, and how expired or orphaned rows and notifications are pruned
  • how CallForegroundService foreground-service types are derived from granted runtime permissions and actual local capture state instead of pre-join answer mode, publish capability, or stale extras
  • which component owns the AnswerRequested -> Joining transition, including post-answer timeout, retry, teardown, and notification ownership until a valid joining or active boundary exists
  • whether missed calls stay on the current channel or move elsewhere

This phase is not complete until the plan has:

  • a confirmed canonical call identity
  • a chosen persisted store technology with the required migration plan or documented durability guarantees
  • explicit blocked-presentation states
  • explicit failed-versus-missed semantics
  • explicit delete-push mapping
  • an explicit pre-enrichment delete-routing contract
  • a real pre-join decline semantic
  • an explicit active-call notification identity contract
  • no timestamp-based orchestration identity

Phase 2: fix ingress lifetime, add early serialization, and restore delete parity

Fix the current ring-critical ingress boundary before introducing a larger coordinator layer.

End-state for this phase:

  • no incoming-call mutation continues from a logically finished NotificationWorker unless ownership is explicitly handed off
  • one deterministic handoff from ingress into orchestration
  • one early ingress gate prevents duplicate pre-store ringing flows
  • an invalid signature, malformed payload, or missing key material terminates in an explicit normalization outcome before any state creation, notification work, or delete-routing logic runs
  • delete pushes and call pushes pass through the same normalization and ownership boundary
  • room-based cleanup helpers are notification-kind or ownership scoped before shared room-token metadata makes ringing, missed, and active notifications mutually visible to existing cancel paths
  • the first phase that introduces shared metadata also cleans up or migrates still-posted legacy ringing and ongoing-call notifications that lack that metadata, so old notifications cannot outlive coordinator ownership invisibly
  • every ringing notification carries stable delete-mappable metadata
  • every ringing, missed-call, and active-call notification carries the account id, room token, and server notification id metadata needed by deterministic cleanup, or a deliberately replaced equivalent contract
  • call and delete push loss is handled explicitly through the chosen transport or resync contract before persisted incoming-call state is treated as authoritative
  • the chosen transport-loss recovery contract is wired to a concrete ingress hook such as FirebaseMessagingService.onDeletedMessages() before persisted incoming-call state is treated as trustworthy after transport loss

If WorkManager stays in the path, use expedited unique work or an equivalent gate keyed by the best canonical identity candidate so duplicate call pushes cannot fork parallel ringing flows before the store exists. That gate must also preserve ordering for later delete or terminal events on the same identity; naive dedupe or ExistingWorkPolicy.KEEP semantics that can drop a newer delete behind an older call-ingress task are incorrect.
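
The ordering requirement can be illustrated independently of WorkManager. The following is a minimal in-memory sketch, not a proposed implementation: IngressGate, IngressEvent, and PushKind are hypothetical names, and the identity string stands in for whatever canonical identity candidate phase 1 selects. Duplicate call pushes are dropped, but a delete is always queued behind the call it follows.

```kotlin
// Hypothetical sketch: none of these types exist in the repo today.
enum class PushKind { CALL, DELETE }

data class IngressEvent(val identity: String, val kind: PushKind)

class IngressGate {
    private val queues = mutableMapOf<String, ArrayDeque<IngressEvent>>()
    private val lastAccepted = mutableMapOf<String, PushKind>()

    /** Returns true when the event was queued, false when it was dropped as a duplicate call push. */
    @Synchronized
    fun offer(event: IngressEvent): Boolean {
        // Only a CALL that directly repeats an accepted CALL is dropped.
        // DELETE events are never deduplicated away, so they cannot be lost behind older work.
        if (event.kind == PushKind.CALL && lastAccepted[event.identity] == PushKind.CALL) return false
        lastAccepted[event.identity] = event.kind
        queues.getOrPut(event.identity) { ArrayDeque() }.addLast(event)
        return true
    }

    /** Hands out events for one identity strictly in arrival order. */
    @Synchronized
    fun poll(identity: String): IngressEvent? = queues[identity]?.removeFirstOrNull()
}
```

The sketch keeps per-identity bookkeeping forever and lives only in memory, so a real gate would still need pruning and a story for process death.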

Phase 3: add persisted incoming-call identity, dedupe, and terminal states

Introduce a dedicated incoming-call orchestration slice, for example:

app/src/main/java/com/nextcloud/talk/calls/incoming/
 IncomingCallCoordinator.kt
 IncomingCallState.kt
 IncomingCallRecord.kt
 IncomingCallStore.kt
 CallNotificationController.kt
 CallActionReceiver.kt
 CallTimeoutWorker.kt

Its first job is stable call identity, persisted minimum state, duplicate-push dedupe, and explicit terminal states.

Persist the minimum state needed to survive process death and duplicate ingress:

  • canonical call id
  • room token
  • caller identity
  • internal user id
  • current incoming-call state
  • presentation substate
  • server notification id
  • posted system notification id
  • enriched answer-path inputs required to launch or re-launch CallActivity deterministically, or an explicit recompute-before-launch contract: conversation display name, call flag, local answer mode (voice only versus video), publish-permission flags, moderator flag, one-to-one flag, recording state, and base URL or equivalent lookup inputs
  • timestamps
  • last transition cause
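
As a rough sketch of what that record could look like, assuming plain Kotlin data classes and leaving the store technology open: every field name and type below is an assumption derived from the list above, not an existing class in the repo.

```kotlin
// Sketch only: field names and types are assumptions, not checked-in code.
enum class AnswerMode { VOICE_ONLY, VIDEO }

data class AnswerInputs(
    val conversationDisplayName: String,
    val callFlag: Int,
    val answerMode: AnswerMode,
    val canPublishAudio: Boolean,
    val canPublishVideo: Boolean,
    val isModerator: Boolean,
    val isOneToOne: Boolean,
    val recordingState: Int,
    val baseUrl: String
)

data class IncomingCallRecord(
    val callId: String,                 // canonical call id, never timestamp-derived
    val roomToken: String,
    val callerId: String,
    val internalUserId: Long,
    val state: String,                  // persisted name of the incoming-call state
    val presentation: String,           // persisted name of the presentation substate
    val serverNotificationId: Long?,    // nid from the push, used for delete routing
    val postedNotificationId: Int?,     // system notification id, used for cleanup
    val answerInputs: AnswerInputs?,    // null means "recompute before launch"
    val createdAtMillis: Long,
    val updatedAtMillis: Long,
    val lastTransitionCause: String
)
```

Keeping the server notification id and the posted system notification id as separate nullable fields reflects the open question above about delete routing and posted-notification cleanup being different contracts.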

Model terminal and blocked outcomes early:

  • Declined
  • Missed
  • AnsweredElsewhere
  • Failed
  • Ended
  • PresentationBlocked

Blocked outcomes must be durably surfaced outside the transient notification attempt so the user still sees what happened when presentation never succeeded.

Define idempotency rules here:

  • duplicate pushes do not create duplicate ringing notifications
  • answer tapped twice does not join twice
  • answer and decline transitions are serialized atomically so duplicate taps or duplicate broadcasts cannot fork concurrent state mutations
  • timeout after answer cannot demote an active call to missed
  • disconnect after decline cannot relaunch UI
  • delete after answer cannot tear down an already-active call through the incoming-call path
  • delete before answer cancels the still-ringing notification exactly once
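
These rules can be read as a pure transition function plus an atomic apply step. The sketch below is a simplified illustration under stated assumptions: CallState, CallEvent, reduce, and CallStateHolder are hypothetical names, the state set is trimmed, and mapping a pre-answer delete push to ENDED is a placeholder for the terminal-state mapping that phase 1 still has to decide.

```kotlin
import java.util.concurrent.atomic.AtomicReference

// Hypothetical, trimmed state and event sets for illustration only.
enum class CallState { RINGING, ANSWER_REQUESTED, DECLINED, MISSED, ENDED }
enum class CallEvent { ANSWER, DECLINE, TIMEOUT, DELETE_PUSH }

// Only RINGING reacts to these events; every other state ignores them,
// which is what makes repeated or late events harmless.
fun reduce(state: CallState, event: CallEvent): CallState = when (state) {
    CallState.RINGING -> when (event) {
        CallEvent.ANSWER -> CallState.ANSWER_REQUESTED
        CallEvent.DECLINE -> CallState.DECLINED
        CallEvent.TIMEOUT -> CallState.MISSED
        CallEvent.DELETE_PUSH -> CallState.ENDED // assumed mapping, not decided by the plan
    }
    else -> state
}

class CallStateHolder(initial: CallState) {
    private val ref = AtomicReference(initial)
    val current: CallState get() = ref.get()

    /** Returns true only for the caller whose event actually changed the state. */
    fun dispatch(event: CallEvent): Boolean {
        while (true) {
            val before = ref.get()
            val after = reduce(before, event)
            if (after == before) return false
            if (ref.compareAndSet(before, after)) return true
        }
    }
}
```

Side effects such as joining or cancelling a notification would run only when dispatch returns true, so a second answer tap or a duplicate delete becomes a no-op. A persisted store would need an equivalent transactional compare-and-set; the in-memory AtomicReference only demonstrates the shape.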

Phase 4: move incoming-call control to coordinator state and land the shared action path

Use one state machine as the only owner of incoming-call state:

Received -> Normalized -> Enriching -> Ringing -> Presented -> AnswerRequested -> Joining -> Active
 -> Declined / Missed / AnsweredElsewhere / Ended / Failed / PresentationBlocked

Presentation should be modeled orthogonally to lifecycle state. Notification surfaced, full-screen launched, full-screen blocked, and notification blocked cannot all collapse into one Presented node if the blocked-presentation rules above remain requirements.
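
One possible shape for that separation, as a sketch with hypothetical names (Lifecycle, Presentation, CoordinatorState): lifecycle and presentation are two independent fields, so a call can be, for example, missed and notification-blocked at the same time.

```kotlin
// Hypothetical sketch: lifecycle and presentation tracked as two independent axes.
enum class Lifecycle {
    RECEIVED, NORMALIZED, ENRICHING, RINGING, ANSWER_REQUESTED, JOINING, ACTIVE,
    DECLINED, MISSED, ANSWERED_ELSEWHERE, ENDED, FAILED
}

enum class Presentation {
    NOT_ATTEMPTED, NOTIFICATION_SHOWN, FULL_SCREEN_LAUNCHED,
    FULL_SCREEN_BLOCKED, NOTIFICATION_BLOCKED
}

data class CoordinatorState(val lifecycle: Lifecycle, val presentation: Presentation) {
    /** True once the lifecycle axis has reached an outcome that accepts no further call events. */
    val isTerminal: Boolean
        get() = lifecycle in setOf(
            Lifecycle.DECLINED, Lifecycle.MISSED, Lifecycle.ANSWERED_ELSEWHERE,
            Lifecycle.ENDED, Lifecycle.FAILED
        )

    /** True when the user may never have seen the call and a recovery surface is needed. */
    val presentationBlocked: Boolean
        get() = presentation == Presentation.FULL_SCREEN_BLOCKED ||
            presentation == Presentation.NOTIFICATION_BLOCKED
}
```

In this shape the Presented and PresentationBlocked nodes from the diagram become values on the presentation axis rather than lifecycle states; whether the plan adopts that is a design decision the sketch does not settle.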

This phase must land as one slice:

  • remove NotificationWorker use of isNotificationVisible() as state input
  • remove CallNotificationActivity polling of isNotificationVisible()
  • stop using CallNotificationActivity.onStop() as the decline implementation
  • make CallNotificationActivity a thin presentation surface that renders coordinator state, dispatches answer or decline, and finishes when coordinator state says presentation is no longer valid
  • add one dedicated answer and decline action target
  • route both notification actions and the full-screen UI through that same action path
  • audit or remove other notification-triggered call starts such as ChatActivity handling KEY_FROM_NOTIFICATION_START_CALL, so only one notification-driven answer path remains
  • move outgoing or manual ChatActivity call launch onto the same coordinator action contract, or explicitly prove it is safe to keep a second direct CallActivity path
  • persist coordinator state before launching or resuming UI
  • move explicit account-scoped CallActivity launch and user resolution into this phase, so answer-path routing, stale-intent reconciliation, and multi-account correctness are all keyed by the explicit internal user id or persisted coordinator state instead of the globally active account
  • handle onCreate, onNewIntent, unlock after full-screen presentation, recreation after process pressure, duplicate launch intents into existing singleTask activities, and stale-extra reconciliation when an existing call task is reused
  • reset or rebind activity-scoped ViewModels and current-user snapshots on onNewIntent() and stale-task reuse so re-entry does not retain old account or room state
  • do not launch activities directly from a broadcast receiver or service in response to a notification tap or action
  • implement and validate creator-side background-activity-launch requirements for full-screen and other activity-launching PendingIntent paths on targetSdk 35+ / API 35+, including the Android 16 mode choice when visibility-scoped opt-in is sufficient

Do not move incoming notifications to NotificationCompat.CallStyle in this phase. CallStyle alone does not solve the permission problem in the current non-Telecom architecture, and it must not land before the shared action path and re-entry handling are correct.

Phase 5: narrow CallActivity to joining and active-call ownership

After incoming-call ownership has moved out, CallActivity should own:

  • permission gating and user-visible permission recovery needed to answer or join
  • room join and call join
  • signaling and media setup
  • transition into the active-call foreground service
  • in-call UI
  • reporting lifecycle transitions back to incoming-call coordination

It should not remain the source of truth for:

  • whether the app is still ringing
  • whether incoming-call presentation should still exist
  • whether decline, missed, answered-elsewhere, or failed has already happened

This phase must also:

  • start CallForegroundService only when the join path has reached a valid joining or active boundary, without creating a notification-ownership gap on answer
  • move local microphone and camera activation behind that same boundary, so permission and recording-consent flows do not enable capture before join ownership is established
  • derive call-join flags from the final answer mode plus granted runtime permissions, not only from publish capability or stale pre-permission intent state
  • make CallForegroundService the real owner of active-call notification lifetime, so activity recreation does not implicitly tear it down
  • define CallForegroundService restart semantics so system kill or null-intent sticky restart cannot lose account-scoped resume data or resurrect a stale ongoing-call notification
  • ensure ordinary CallActivity destroy or recreate paths do not stop the active call or CallForegroundService unless the call state has actually transitioned to teardown
  • when CallForegroundService needs microphone or camera foreground-service types, create it only from a visible activity or another platform-allowed user interaction boundary
  • keep CallActivity on the explicit internal user id or coordinator state contract established in phase 4 instead of regressing to the globally active account
  • give the active-call notification content intent the same account-scoped metadata, request-code or update semantics, and stale-intent handling contract as the incoming-call surface
  • refresh or rebuild the active-call notification once join establishes authoritative routing state, so the ongoing-call content intent does not keep stale pre-join extras from the initial CallActivity launch
  • define teardown, retry, or recovery ownership when recording-consent room lookup, signaling settings fetch, capabilities fetch, room join, or call join fails after answer but before the call reaches a valid joining or active boundary
  • preserve the existing session-reuse behavior coupled through ApplicationWideCurrentRoomHolder until there is an explicit replacement
  • preserve or explicitly replace existing consumers of CallActivity.active and ApplicationWideCurrentRoomHolder call-state flags during the handoff
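
The rule about deriving foreground-service types from granted permissions and actual capture state can be sketched as a small pure function. This is an illustration only: CaptureSnapshot and foregroundServiceTypes are hypothetical names, and the two constants mirror the values of ServiceInfo.FOREGROUND_SERVICE_TYPE_CAMERA and ServiceInfo.FOREGROUND_SERVICE_TYPE_MICROPHONE so the sketch runs without the Android SDK.

```kotlin
// Values copied from android.content.pm.ServiceInfo so the sketch runs off-device.
val FGS_TYPE_CAMERA = 1 shl 6      // ServiceInfo.FOREGROUND_SERVICE_TYPE_CAMERA
val FGS_TYPE_MICROPHONE = 1 shl 7  // ServiceInfo.FOREGROUND_SERVICE_TYPE_MICROPHONE

// Hypothetical snapshot of what is granted and what is actually capturing right now.
data class CaptureSnapshot(
    val micPermissionGranted: Boolean,
    val cameraPermissionGranted: Boolean,
    val micCapturing: Boolean,
    val cameraCapturing: Boolean
)

// A type is requested only when the permission is granted AND capture is really running,
// never from the pre-join answer mode or stale intent extras.
fun foregroundServiceTypes(snapshot: CaptureSnapshot): Int {
    var types = 0
    if (snapshot.micPermissionGranted && snapshot.micCapturing) types = types or FGS_TYPE_MICROPHONE
    if (snapshot.cameraPermissionGranted && snapshot.cameraCapturing) types = types or FGS_TYPE_CAMERA
    return types
}
```

A result of 0 means neither capture type applies; which foreground-service type the service should use in that case is left open here, as it is in the plan.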

Phase 6: move incoming ringing notifications to NotificationCompat.CallStyle

Only do this after the shared action path and the active-call handoff are already correct.

Requirements for this phase:

  • define the API 26-30 compatibility and ranking strategy for CallStyle, because pre-31 devices need foreground-service or equivalent treatment if this migration is expected to preserve incoming-call prominence there
  • keep one owner for ringing notification rendering
  • keep one shared answer path and one shared decline path
  • carry caller identity in the notification model
  • do not silently change missed-call channel behavior without an explicit product decision
  • do not merge ongoing-call foreground-notification ownership into the incoming-call notification controller as a side effect of the migration

Phase 7: evaluate Telecom as a separate milestone

Only do this after phases 0 through 6 are stable.

At the start of this phase:

  • re-check current Android guidance for self-managed calling
  • evaluate androidx.core:core-telecom first, and compare it against a raw self-managed Telecom implementation
  • if androidx.core:core-telecom is chosen, account for its operational timing contract up front: post a valid Notification.CallStyle notification within 5 seconds of addCall, and make onAnswer, onDisconnect, onSetActive, and onSetInactive complete within the documented 5-second timeout
  • choose the integration target based on the state of the platform and the codebase at that time

If Telecom is added, add the required manifest permissions, service registration, phone-account registration, and coordinator integration then. The integration must reuse the same incoming-call coordinator and the same active-call lifecycle model, not create a second call-control path.
