
clock module freezes at suspend timestamp; no recovery after resume (intermittent) #5036

@moskovich

Summary

The clock module's displayed time froze on the exact minute the system entered s2idle suspend, and did not recover for ~15 hours after resume. Restarting waybar cleared it.

This appears related to the missed-wake-up race fixed by #4889, but the duration of the freeze suggests there may be a second issue: even with a missed PrepareForSleep signal, the monotonic-clock fallback in sleep_for should expire within ~60s of resume.

Environment

  • Waybar 0.15.0 (Arch Linux package waybar 0.15.0-2)
  • Compositor: Hyprland
  • Kernel: 7.0.3-arch1-2
  • Clock module config: default interval: 60, format: "{:L%A %H:%M}"

Observation

Displayed time: Wed 16:39. Actual time when I noticed: Thu 07:41 (~15h drift).

Waybar process was alive, single thread parked in futex_do_wait, uptime 4d 22h. The same process had survived many earlier suspend/resume cycles with the clock recovering correctly.

journalctl output for the moment of the freeze:

May 13 16:39:48 systemd-logind[964]: The system will suspend now!
May 13 16:39:49 systemd[1]:           Starting System Suspend...
May 13 16:39:49 kernel:               PM: suspend entry (s2idle)

The suspend entry timestamp (16:39) matches the frozen display to the minute.

Restarting waybar (the equivalent of systemctl --user restart waybar) restored normal behavior.

Why I think this is more than #4889

#4889 fixes a race where the worker thread's unsynchronized signal_ = false at the top of the loop body could clobber a concurrent signal_ = true from wake_up(), missing the resume signal from org.freedesktop.login1.PrepareForSleep.
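To make the clobber concrete, here is a minimal single-threaded sketch that replays the bad interleaving in order. LostWakeupDemo, racy_reset and consume are hypothetical names for illustration; this is not Waybar's actual SleeperThread, and consume() is only one possible way to avoid the race, not necessarily what #4889 does.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the worker's wake-up state (not Waybar's class).
struct LostWakeupDemo {
  std::mutex mutex_;
  std::condition_variable condvar_;
  bool signal_ = false;

  // Called from another thread (e.g. on resume) to wake the worker.
  void wake_up() {
    {
      std::lock_guard<std::mutex> lk(mutex_);
      signal_ = true;
    }
    condvar_.notify_all();
  }

  // Racy pattern: the flag is cleared without the lock and without looking
  // at its value, so a wake_up() that landed just before this is discarded.
  void racy_reset() { signal_ = false; }

  // One possible alternative (an assumption): read and clear the flag
  // atomically under the lock, so a pending wake-up is observed, not erased.
  bool consume() {
    std::lock_guard<std::mutex> lk(mutex_);
    const bool was_set = signal_;
    signal_ = false;
    return was_set;
  }
};
```

Calling wake_up() and then racy_reset() leaves nothing for the next wait to see, which is the missed resume signal. Calling wake_up() and then consume() returns true, so the worker knows it was woken.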

But SleeperThread::sleep_for also schedules a fallback timeout via condvar_.wait_until(steady_clock::now() + dur, ...). On Linux, steady_clock is backed by CLOCK_MONOTONIC, which does not advance during suspend, so after resume the wait has at most dur left to run. The worker should therefore wake within one interval_ (60s) of resume regardless of whether the D-Bus signal landed.
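A minimal sketch of the fallback I have in mind, assuming the wait looks roughly like the call quoted above. TimedSleeper is a hypothetical name and the millisecond parameter is chosen for the example; the real sleep_for may differ in its signature and details.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical reduction of a sleeper with a monotonic-clock deadline.
struct TimedSleeper {
  std::mutex mutex_;
  std::condition_variable condvar_;
  bool signal_ = false;

  // Returns true if woken by wake_up(), false if the deadline expired.
  bool sleep_for(std::chrono::milliseconds dur) {
    std::unique_lock<std::mutex> lk(mutex_);
    // steady_clock deadline: unaffected by wall-clock jumps.
    const auto deadline = std::chrono::steady_clock::now() + dur;
    const bool signalled =
        condvar_.wait_until(lk, deadline, [this] { return signal_; });
    signal_ = false;  // cleared while still holding the lock
    return signalled;
  }

  void wake_up() {
    {
      std::lock_guard<std::mutex> lk(mutex_);
      signal_ = true;
    }
    condvar_.notify_all();
  }
};
```

With no wake_up() at all, sleep_for still returns once dur has elapsed on the monotonic clock. That is why a lost signal alone should cost at most one interval, not 15 hours.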

So a pure missed-signal race shouldn't produce a 15-hour freeze. Candidates for what's actually stuck:

  • Worker thread blocked in dp.emit() because Glib::Dispatcher's pipe is full or the main loop is wedged.
  • A deadlock against the main thread holding something the worker waits on (or vice versa).
  • wait_until actually using CLOCK_REALTIME with a pathological deadline (less likely with current libstdc++ + glibc).

I did not capture a stack trace before restarting, so this is speculation. If anyone else hits the freeze, gdb -p $(pidof waybar) then thread apply all bt would distinguish these.

Reproduction

Not deterministic. The same waybar process survived dozens of suspend cycles before this one (per Reached target Suspend entries in journal). It happened once across ~5 days of uptime.

Asks
