fix: survive psutil host_statistics64 failures on macOS 26 #2161

Open
jasonpaulso wants to merge 1 commit into exo-explore:main from jasonpaulso:fix/psutil-darwin27-memory

@jasonpaulso

Summary

On macOS 26 (Darwin 27), the kernel's vm_statistics64 struct grew, so psutil (≤ 7.2.2, including the latest release) calls host_statistics64 with a too-small buffer. Most calls then raise RuntimeError: host_statistics64(HOST_VM_INFO64) syscall failed: (ipc/mig) array not large enough. In our testing the failure rate was roughly 87–100%, and psutil 7.2.2 failed 30/30 calls. psutil.swap_memory() is affected the same way.

In exo this crashed the MLX runner at import time, because cache.py computes its memory threshold at module load. Model instances therefore failed one or more times before deploying (the worker retried until a call happened to succeed), and MemoryUsage profiling was degraded.

Change

Adds exo/utils/virtual_memory.py with virtual_memory_statistics() / swap_memory_statistics():

  • Try psutil first, so non-Darwin platforms and fixed psutil versions keep the exact current behavior.
  • On RuntimeError (Darwin only), fall back to calling host_statistics64 directly via ctypes with a generously sized buffer (1024 naturals). The fallback reads free/inactive/speculative at their stable indices and computes available = inactive + free − speculative, matching psutil's formula. Swap falls back to sysctlbyname("vm.swapusage").

Callers in cache.py and profiling.py switch to the helpers; no behavior change on healthy systems.
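
The Darwin fallback described in the bullets above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the function names are hypothetical, the HOST_VM_INFO64 value is taken from <mach/host_info.h>, and os.sysconf is assumed as the page-size source. The buffer size, indices, and formula follow the description.

```python
import ctypes
import ctypes.util
import os
import sys
from typing import Sequence

HOST_VM_INFO64 = 4      # flavor constant from <mach/host_info.h>
KERN_SUCCESS = 0
BUFFER_NATURALS = 1024  # oversized on purpose so a grown struct still fits

# Indices into the struct viewed as an array of 32-bit naturals.
FREE_INDEX = 0
INACTIVE_INDEX = 2
SPECULATIVE_INDEX = 23


def available_bytes(naturals: Sequence[int], page_size: int) -> int:
    """Apply available = inactive + free - speculative, in bytes."""
    pages = (
        naturals[INACTIVE_INDEX]
        + naturals[FREE_INDEX]
        - naturals[SPECULATIVE_INDEX]
    )
    return pages * page_size


def read_vm_naturals() -> list[int]:
    """Call host_statistics64 directly (Darwin only; hypothetical helper)."""
    if sys.platform != "darwin":
        raise OSError("host_statistics64 is only available on Darwin")
    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    libc.mach_host_self.restype = ctypes.c_uint32
    libc.host_statistics64.restype = ctypes.c_int
    libc.host_statistics64.argtypes = [
        ctypes.c_uint32,                  # host port
        ctypes.c_int,                     # flavor
        ctypes.c_void_p,                  # output buffer
        ctypes.POINTER(ctypes.c_uint32),  # in: capacity, out: naturals written
    ]
    buffer = (ctypes.c_uint32 * BUFFER_NATURALS)()
    count = ctypes.c_uint32(BUFFER_NATURALS)
    status = libc.host_statistics64(
        libc.mach_host_self(), HOST_VM_INFO64, buffer, ctypes.byref(count)
    )
    if status != KERN_SUCCESS:
        raise OSError(f"host_statistics64 failed with kern_return_t {status}")
    return list(buffer[: count.value])


if __name__ == "__main__" and sys.platform == "darwin":
    print(available_bytes(read_vm_naturals(), os.sysconf("SC_PAGE_SIZE")))
```

Keeping the arithmetic in a pure function separate from the kernel call means the formula can be checked on any platform, while only read_vm_naturals() needs a Darwin host.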

Validation

  • New tests in src/exo/utils/tests/test_virtual_memory.py, including a 30-iteration never-raises loop (which reliably fails against bare psutil on Darwin 27).
  • On the affected 2-node cluster (both on macOS 26): model deployments went from 1–3 failures per launch to 3/3 first-attempt successes, with zero host_statistics64 errors in the logs since.

Upstream context: psutil's struct-size assumption is the root cause; until a fixed psutil release is available and pinned, this keeps exo working on macOS 26.

🤖 Generated with Claude Code

Darwin 27 grew the kernel's vm_statistics64 struct, so psutil (<= 7.2.2)
calls host_statistics64 with a too-small buffer and raises "(ipc/mig)
array not large enough" on most calls. psutil.swap_memory is affected
the same way.

The runner imports exo.worker.engines.mlx.cache, which calls
psutil.virtual_memory() at module import to derive the default KV-cache
eviction threshold, so runners crashed at startup with ~90% probability
and instances only deployed after one or more supervisor restarts. The
same error crashed cache eviction mid-inference and flooded the memory
monitor with warnings.

Add exo.utils.virtual_memory with psutil-compatible helpers that try
psutil first and, on Darwin, fall back to calling host_statistics64
directly with a generously sized buffer (the struct is append-only, so
the leading field offsets are stable) and sysctlbyname for totals and
swap usage. Route the runner cache module, MemoryUsage, and the memory
monitor through the helpers.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 10, 2026 21:06

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a psutil-backed memory stats helper with a Darwin fallback to work around macOS 26/Darwin 27 vm_statistics64 incompatibility, and wires it into places that previously called psutil directly.

Changes:

  • Replaced direct psutil.virtual_memory() usage in MLX cache + profiling paths with virtual_memory_statistics().
  • Introduced exo.utils.virtual_memory with a Darwin kernel-call fallback (Mach + sysctl).
  • Added pytest coverage for basic invariants and Darwin-only fallback shape.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/exo/worker/engines/mlx/cache.py | Switches memory-threshold + available/used calculations to the new virtual memory helper. |
| src/exo/utils/virtual_memory.py | New module implementing psutil-first memory stats with Darwin fallback via host_statistics64/sysctlbyname. |
| src/exo/utils/tests/test_virtual_memory.py | Adds sanity tests and a Darwin-only test for the fallback helpers. |
| src/exo/utils/info_gatherer/info_gatherer.py | Updates memory monitoring to use the renamed MemoryUsage.from_system. |
| src/exo/shared/types/profiling.py | Renames MemoryUsage.from_psutil to from_system and routes through the new helper. |

Comment on lines +60 to +83
def virtual_memory_statistics() -> VirtualMemoryStatistics:
    try:
        virtual_memory = psutil.virtual_memory()
        return VirtualMemoryStatistics(
            total_bytes=virtual_memory.total,
            available_bytes=virtual_memory.available,
        )
    except RuntimeError:
        if sys.platform != "darwin":
            raise
        return _darwin_virtual_memory_statistics()


def swap_memory_statistics() -> SwapMemoryStatistics:
    try:
        swap_memory = psutil.swap_memory()
        return SwapMemoryStatistics(
            total_bytes=swap_memory.total,
            free_bytes=swap_memory.free,
        )
    except RuntimeError:
        if sys.platform != "darwin":
            raise
        return _darwin_swap_memory_statistics()
Comment on lines +29 to +36
# Indices into the vm_statistics64 struct viewed as an array of 32-bit
# naturals. The struct is pragma pack(4): four leading naturals, then nine
# 64-bit counters (zero_fill_count .. purges), then purgeable_count and
# speculative_count. purgeable_count sits at 4 + 9 * 2 = 22, so
# speculative_count sits at 23.
_FREE_PAGES_INDEX = 0
_INACTIVE_PAGES_INDEX = 2
_SPECULATIVE_PAGES_INDEX = 23
_MINIMUM_NATURALS = _SPECULATIVE_PAGES_INDEX + 1
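
The index arithmetic in the comment above can be cross-checked by mirroring the leading fields in a ctypes structure and dividing each byte offset by the size of a natural. This is a sketch only: the class and helper names are hypothetical, and the field order is assumed from the vm_statistics64 definition in XNU's <mach/vm_statistics.h>. No explicit packing is set because these leading fields are already naturally aligned.

```python
import ctypes


class VmStatistics64Prefix(ctypes.Structure):
    """Leading fields of vm_statistics64 (assumed order; later fields omitted)."""

    _fields_ = [
        ("free_count", ctypes.c_uint32),
        ("active_count", ctypes.c_uint32),
        ("inactive_count", ctypes.c_uint32),
        ("wire_count", ctypes.c_uint32),
        ("zero_fill_count", ctypes.c_uint64),
        ("reactivations", ctypes.c_uint64),
        ("pageins", ctypes.c_uint64),
        ("pageouts", ctypes.c_uint64),
        ("faults", ctypes.c_uint64),
        ("cow_faults", ctypes.c_uint64),
        ("lookups", ctypes.c_uint64),
        ("hits", ctypes.c_uint64),
        ("purges", ctypes.c_uint64),
        ("purgeable_count", ctypes.c_uint32),
        ("speculative_count", ctypes.c_uint32),
    ]


def natural_index(field_name: str) -> int:
    """Return the field's position when the struct is read as uint32 slots."""
    offset = getattr(VmStatistics64Prefix, field_name).offset
    return offset // ctypes.sizeof(ctypes.c_uint32)


for name in ("free_count", "inactive_count", "purgeable_count", "speculative_count"):
    print(name, natural_index(name))
```

Under these assumptions the offsets agree with the constants: the nine 64-bit counters occupy slots 4 through 21, leaving purgeable_count at 22 and speculative_count at 23.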
Comment on lines +117 to +127
def _sysctl_by_name(name: str, buffer: ctypes.c_uint64 | _SwapUsage) -> None:
    libc = _libc()
    size = ctypes.c_size_t(ctypes.sizeof(buffer))
    result = cast(
        int,
        libc.sysctlbyname(
            name.encode(), ctypes.byref(buffer), ctypes.byref(size), None, 0
        ),
    )
    if result != 0:
        raise OSError(f"sysctlbyname({name!r}) failed")
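
For context, a swap read through sysctlbyname("vm.swapusage") might look like the sketch below. It is not the PR's implementation: XswUsage, read_swap_usage, and swap_total_and_free are hypothetical names, the field layout is assumed from struct xsw_usage in <sys/sysctl.h>, and mapping xsu_avail to "free" is an assumption. Unlike the helper above, this variant declares argtypes and surfaces errno, which is one possible way to make failures easier to diagnose.

```python
import ctypes
import ctypes.util
import os
import sys


class XswUsage(ctypes.Structure):
    """Assumed mirror of struct xsw_usage from <sys/sysctl.h>."""

    _fields_ = [
        ("xsu_total", ctypes.c_uint64),
        ("xsu_avail", ctypes.c_uint64),
        ("xsu_used", ctypes.c_uint64),
        ("xsu_pagesize", ctypes.c_uint32),
        ("xsu_encrypted", ctypes.c_int),
    ]


def swap_total_and_free(raw: bytes) -> tuple[int, int]:
    """Decode raw xsw_usage bytes into (total_bytes, free_bytes)."""
    usage = XswUsage.from_buffer_copy(raw)
    return usage.xsu_total, usage.xsu_avail


def read_swap_usage() -> bytes:
    """Fetch vm.swapusage via sysctlbyname (Darwin only; hypothetical helper)."""
    if sys.platform != "darwin":
        raise OSError("vm.swapusage is only available on Darwin")
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.sysctlbyname.restype = ctypes.c_int
    libc.sysctlbyname.argtypes = [
        ctypes.c_char_p,                  # name
        ctypes.c_void_p,                  # oldp: output buffer
        ctypes.POINTER(ctypes.c_size_t),  # oldlenp: in/out buffer size
        ctypes.c_void_p,                  # newp: unused for reads
        ctypes.c_size_t,                  # newlen
    ]
    usage = XswUsage()
    size = ctypes.c_size_t(ctypes.sizeof(usage))
    if libc.sysctlbyname(
        b"vm.swapusage", ctypes.byref(usage), ctypes.byref(size), None, 0
    ) != 0:
        error_number = ctypes.get_errno()
        raise OSError(error_number, os.strerror(error_number))
    return bytes(usage)


if __name__ == "__main__" and sys.platform == "darwin":
    print(swap_total_and_free(read_swap_usage()))
```
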
Comment on lines 29 to 41
     @classmethod
-    def from_psutil(cls, *, override_memory: int | None) -> Self:
-        vm = psutil.virtual_memory()
-        sm = psutil.swap_memory()
+    def from_system(cls, *, override_memory: int | None) -> Self:
+        virtual_memory = virtual_memory_statistics()
+        swap_memory = swap_memory_statistics()

         return cls.from_bytes(
-            ram_total=vm.total,
-            ram_available=vm.available if override_memory is None else override_memory,
-            swap_total=sm.total,
-            swap_available=sm.free,
+            ram_total=virtual_memory.total_bytes,
+            ram_available=virtual_memory.available_bytes
+            if override_memory is None
+            else override_memory,
+            swap_total=swap_memory.total_bytes,
+            swap_available=swap_memory.free_bytes,
         )
Comment on lines +20 to +25
def test_virtual_memory_statistics_never_raises():
    # On Darwin 27 psutil.virtual_memory() fails intermittently with
    # "host_statistics64 ... (ipc/mig) array not large enough"; the
    # fallback must absorb that on every call.
    for _ in range(30):
        virtual_memory_statistics()
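
The loop above only exercises the fallback on a host where psutil actually fails. A deterministic complement, sketched here with hypothetical names that are not part of this PR, is to inject the primary and fallback callables so the failure can be forced on any platform:

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_darwin_fallback(
    primary: Callable[[], T], fallback: Callable[[], T], platform: str
) -> T:
    """Same try/except shape as the helpers, with injectable parts."""
    try:
        return primary()
    except RuntimeError:
        if platform != "darwin":
            raise
        return fallback()


def always_fails() -> int:
    # Stand-in for psutil.virtual_memory() on an affected host.
    raise RuntimeError("host_statistics64(HOST_VM_INFO64) syscall failed")


def test_fallback_absorbs_runtime_error() -> None:
    assert call_with_darwin_fallback(always_fails, lambda: 42, "darwin") == 42


def test_other_platforms_still_raise() -> None:
    try:
        call_with_darwin_fallback(always_fails, lambda: 42, "linux")
    except RuntimeError:
        return
    raise AssertionError("expected RuntimeError to propagate")
```

With this shape, both branches are covered regardless of the operating system or psutil version running the tests.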