fix: survive psutil host_statistics64 failures on macOS 26 #2161
Open
jasonpaulso wants to merge 1 commit into
Darwin 27 grew the kernel's vm_statistics64 struct, so psutil (<= 7.2.2) calls host_statistics64 with a too-small buffer and raises "(ipc/mig) array not large enough" on most calls. psutil.swap_memory is affected the same way.

The runner imports exo.worker.engines.mlx.cache, which calls psutil.virtual_memory() at module import to derive the default KV-cache eviction threshold, so runners crashed at startup with ~90% probability and instances only deployed after one or more supervisor restarts. The same error crashed cache eviction mid-inference and flooded the memory monitor with warnings.

Add exo.utils.virtual_memory with psutil-compatible helpers that try psutil first and, on Darwin, fall back to calling host_statistics64 directly with a generously sized buffer (the struct is append-only, so the leading field offsets are stable) and sysctlbyname for totals and swap usage. Route the runner cache module, MemoryUsage, and the memory monitor through the helpers.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
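The fallback relies only on the struct's leading fields staying where they are. Here is a minimal sketch of that idea. It assumes the buffer has already been filled by `host_statistics64` and is viewed as 32-bit naturals. The indices 0, 2 and 23 come from the PR's index constants, and the 1024-natural size and the `inactive + free - speculative` formula come from the PR description. `available_bytes` and `BUFFER_NATURALS` are hypothetical names. Because only leading indices are read, counters a newer kernel appends further along the buffer do not change the result.

```python
import array
from collections.abc import Sequence

# Indices into vm_statistics64 viewed as 32-bit naturals (taken from the PR's
# _FREE_PAGES_INDEX / _INACTIVE_PAGES_INDEX / _SPECULATIVE_PAGES_INDEX).
FREE_INDEX = 0
INACTIVE_INDEX = 2
SPECULATIVE_INDEX = 23

# Hypothetical constant: the PR description mentions a 1024-natural buffer.
BUFFER_NATURALS = 1024


def available_bytes(naturals: Sequence[int], page_size: int) -> int:
    """Derive available memory from a host_statistics64-style buffer.

    Only the leading counters are read, so any fields a newer kernel
    appends after speculative_count are ignored.
    """
    if len(naturals) <= SPECULATIVE_INDEX:
        raise ValueError("buffer too short to contain speculative_count")
    free = naturals[FREE_INDEX]
    inactive = naturals[INACTIVE_INDEX]
    speculative = naturals[SPECULATIVE_INDEX]
    return (inactive + free - speculative) * page_size


# Simulated buffer: a real one would be filled by the kernel call.
buffer = array.array("I", [0] * BUFFER_NATURALS)
buffer[FREE_INDEX] = 1000
buffer[INACTIVE_INDEX] = 500
buffer[SPECULATIVE_INDEX] = 200
buffer[100] = 123_456  # a trailing counter from a newer kernel: never read
print(available_bytes(buffer, page_size=16384))  # (500 + 1000 - 200) * 16384 = 21299200
```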
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a psutil-backed memory stats helper with a Darwin fallback to work around macOS 26/Darwin 27 vm_statistics64 incompatibility, and wires it into places that previously called psutil directly.
Changes:
- Replaced direct `psutil.virtual_memory()` usage in MLX cache + profiling paths with `virtual_memory_statistics()`.
- Introduced `exo.utils.virtual_memory` with a Darwin kernel-call fallback (Mach + sysctl).
- Added pytest coverage for basic invariants and Darwin-only fallback shape.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/exo/worker/engines/mlx/cache.py | Switches memory-threshold + available/used calculations to the new virtual memory helper. |
| src/exo/utils/virtual_memory.py | New module implementing psutil-first memory stats with Darwin fallback via host_statistics64/sysctlbyname. |
| src/exo/utils/tests/test_virtual_memory.py | Adds sanity tests and a Darwin-only test for the fallback helpers. |
| src/exo/utils/info_gatherer/info_gatherer.py | Updates memory monitoring to use the renamed MemoryUsage.from_system. |
| src/exo/shared/types/profiling.py | Renames MemoryUsage.from_psutil to from_system and routes through the new helper. |
Comment on lines +60 to +83

```python
def virtual_memory_statistics() -> VirtualMemoryStatistics:
    try:
        virtual_memory = psutil.virtual_memory()
        return VirtualMemoryStatistics(
            total_bytes=virtual_memory.total,
            available_bytes=virtual_memory.available,
        )
    except RuntimeError:
        if sys.platform != "darwin":
            raise
        return _darwin_virtual_memory_statistics()


def swap_memory_statistics() -> SwapMemoryStatistics:
    try:
        swap_memory = psutil.swap_memory()
        return SwapMemoryStatistics(
            total_bytes=swap_memory.total,
            free_bytes=swap_memory.free,
        )
    except RuntimeError:
        if sys.platform != "darwin":
            raise
        return _darwin_swap_memory_statistics()
```
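Both helpers have the same shape: try psutil, and absorb `RuntimeError` only on Darwin. One way to exercise the fallback branch on any CI host is to pass the primary and fallback readers in explicitly. The sketch below is illustrative, not the PR's code. `statistics_with_fallback`, the injectable `platform` argument, and this stand-in `VirtualMemoryStatistics` dataclass (reusing the field names from the excerpt) are assumptions.

```python
import sys
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualMemoryStatistics:
    # Stand-in with the field names used in the excerpt; the real class may differ.
    total_bytes: int
    available_bytes: int


def statistics_with_fallback(
    primary: Callable[[], VirtualMemoryStatistics],
    fallback: Callable[[], VirtualMemoryStatistics],
    platform: str = sys.platform,
) -> VirtualMemoryStatistics:
    """Try `primary`; on RuntimeError use `fallback`, but only on Darwin."""
    try:
        return primary()
    except RuntimeError:
        if platform != "darwin":
            raise  # elsewhere, a psutil failure is still a real error
        return fallback()


def simulated_psutil_failure() -> VirtualMemoryStatistics:
    raise RuntimeError(
        "host_statistics64(HOST_VM_INFO64) syscall failed: "
        "(ipc/mig) array not large enough"
    )


def fake_kernel_reader() -> VirtualMemoryStatistics:
    return VirtualMemoryStatistics(total_bytes=8, available_bytes=4)


print(statistics_with_fallback(simulated_psutil_failure, fake_kernel_reader, platform="darwin"))
```

Keeping the platform check means a `RuntimeError` on Linux still propagates, as in the excerpt, instead of being silently masked by a fallback that cannot work there.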
Comment on lines +29 to +36

```python
# Indices into the vm_statistics64 struct viewed as an array of 32-bit
# naturals. The struct is pragma pack(4): four leading naturals, then nine
# 64-bit counters (zero_fill_count .. purges), then purgeable_count and
# speculative_count. 4 + 9 * 2 = 22.
_FREE_PAGES_INDEX = 0
_INACTIVE_PAGES_INDEX = 2
_SPECULATIVE_PAGES_INDEX = 23
_MINIMUM_NATURALS = _SPECULATIVE_PAGES_INDEX + 1
```
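The comment's arithmetic gives the index of purgeable_count (4 + 9 * 2 = 22), so speculative_count sits at 23, which matches `_SPECULATIVE_PAGES_INDEX`. A small ctypes mirror of the leading fields can check those indices mechanically. This is a sketch. The class and helper names are hypothetical, and the field list is the prefix of `vm_statistics64` as described in the comment and declared in `<mach/vm_statistics.h>`. The header uses `#pragma pack(4)`, but the first 64-bit counter already starts at byte 16, so the default ctypes layout produces the same offsets for this prefix and `_pack_` is left out.

```python
import ctypes


class VMStatistics64Prefix(ctypes.Structure):
    """Leading fields of Darwin's vm_statistics64 (trailing fields omitted)."""

    _fields_ = [
        ("free_count", ctypes.c_uint32),
        ("active_count", ctypes.c_uint32),
        ("inactive_count", ctypes.c_uint32),
        ("wire_count", ctypes.c_uint32),
        ("zero_fill_count", ctypes.c_uint64),  # starts at byte 16: 8-aligned either way
        ("reactivations", ctypes.c_uint64),
        ("pageins", ctypes.c_uint64),
        ("pageouts", ctypes.c_uint64),
        ("faults", ctypes.c_uint64),
        ("cow_faults", ctypes.c_uint64),
        ("lookups", ctypes.c_uint64),
        ("hits", ctypes.c_uint64),
        ("purges", ctypes.c_uint64),
        ("purgeable_count", ctypes.c_uint32),
        ("speculative_count", ctypes.c_uint32),
    ]


NATURAL_SIZE = ctypes.sizeof(ctypes.c_uint32)


def natural_index(field: str) -> int:
    """Index of `field` when the struct is viewed as an array of 32-bit naturals."""
    return getattr(VMStatistics64Prefix, field).offset // NATURAL_SIZE


for name in ("free_count", "inactive_count", "purgeable_count", "speculative_count"):
    print(name, natural_index(name))
```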
Comment on lines +117 to +127

```python
def _sysctl_by_name(name: str, buffer: ctypes.c_uint64 | _SwapUsage) -> None:
    libc = _libc()
    size = ctypes.c_size_t(ctypes.sizeof(buffer))
    result = cast(
        int,
        libc.sysctlbyname(
            name.encode(), ctypes.byref(buffer), ctypes.byref(size), None, 0
        ),
    )
    if result != 0:
        raise OSError(f"sysctlbyname({name!r}) failed")
```
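For `vm.swapusage`, the buffer passed to `_sysctl_by_name` has to match the kernel's `struct xsw_usage`. The sketch below assumes the layout from `<sys/sysctl.h>`: three 64-bit counters, a 32-bit page size, and a 4-byte `boolean_t`. `XswUsage` and `read_swap_usage` are hypothetical names, not the PR's `_SwapUsage`. Unlike the excerpt, it declares `argtypes`/`restype` instead of casting the result, reports `errno` on failure, and rejects a reply whose size differs from the struct.

```python
import ctypes
import ctypes.util
import os
import sys


class XswUsage(ctypes.Structure):
    # Assumed mirror of Darwin's struct xsw_usage from <sys/sysctl.h>.
    _fields_ = [
        ("xsu_total", ctypes.c_uint64),
        ("xsu_avail", ctypes.c_uint64),
        ("xsu_used", ctypes.c_uint64),
        ("xsu_pagesize", ctypes.c_uint32),
        ("xsu_encrypted", ctypes.c_int32),  # boolean_t is 4 bytes
    ]


def read_swap_usage() -> XswUsage:
    """Read vm.swapusage via sysctlbyname (Darwin only)."""
    if sys.platform != "darwin":
        raise OSError("vm.swapusage is only available on Darwin")
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    sysctlbyname = libc.sysctlbyname
    # int sysctlbyname(const char *name, void *oldp, size_t *oldlenp,
    #                  void *newp, size_t newlen);
    sysctlbyname.argtypes = [
        ctypes.c_char_p,
        ctypes.c_void_p,
        ctypes.POINTER(ctypes.c_size_t),
        ctypes.c_void_p,
        ctypes.c_size_t,
    ]
    sysctlbyname.restype = ctypes.c_int

    usage = XswUsage()
    size = ctypes.c_size_t(ctypes.sizeof(usage))
    if sysctlbyname(b"vm.swapusage", ctypes.byref(usage), ctypes.byref(size), None, 0) != 0:
        errno = ctypes.get_errno()
        raise OSError(errno, os.strerror(errno))
    if size.value != ctypes.sizeof(usage):
        raise OSError(f"vm.swapusage returned {size.value} bytes")
    return usage
```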
Comment on lines 29 to 41

```diff
     @classmethod
-    def from_psutil(cls, *, override_memory: int | None) -> Self:
-        vm = psutil.virtual_memory()
-        sm = psutil.swap_memory()
+    def from_system(cls, *, override_memory: int | None) -> Self:
+        virtual_memory = virtual_memory_statistics()
+        swap_memory = swap_memory_statistics()
 
         return cls.from_bytes(
-            ram_total=vm.total,
-            ram_available=vm.available if override_memory is None else override_memory,
-            swap_total=sm.total,
-            swap_available=sm.free,
+            ram_total=virtual_memory.total_bytes,
+            ram_available=virtual_memory.available_bytes
+            if override_memory is None
+            else override_memory,
+            swap_total=swap_memory.total_bytes,
+            swap_available=swap_memory.free_bytes,
         )
```
Comment on lines +20 to +25

```python
def test_virtual_memory_statistics_never_raises():
    # On Darwin 27 psutil.virtual_memory() fails intermittently with
    # "host_statistics64 ... (ipc/mig) array not large enough"; the
    # fallback must absorb that on every call.
    for _ in range(30):
        virtual_memory_statistics()
```
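The PR description reports psutil 7.2.2 failing 30 of 30 calls on Darwin 27. A loop that counts failures, rather than stopping at the first one, gives that rate directly, and the same harness can confirm that the new helper reports zero. `count_failures` is a hypothetical helper, not part of the PR. Pass it `psutil.virtual_memory` or the PR's `virtual_memory_statistics` as the probe.

```python
from collections.abc import Callable


def count_failures(probe: Callable[[], object], attempts: int = 30) -> int:
    """Call `probe` `attempts` times and count RuntimeErrors.

    psutil surfaces the host_statistics64 failure as RuntimeError, so other
    exception types are deliberately not caught.
    """
    failures = 0
    for _ in range(attempts):
        try:
            probe()
        except RuntimeError:
            failures += 1
    return failures


if __name__ == "__main__":
    try:
        import psutil
    except ImportError:
        print("psutil not installed")
    else:
        print(f"psutil.virtual_memory: {count_failures(psutil.virtual_memory)}/30 failed")
```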
Summary
On macOS 26 (Darwin 27), the kernel's `vm_statistics64` struct grew, so psutil (≤ 7.2.2, including the latest release) calls `host_statistics64` with a too-small buffer and raises `RuntimeError: host_statistics64(HOST_VM_INFO64) syscall failed: (ipc/mig) array not large enough` on most calls (~87–100% failure rate in our testing; psutil 7.2.2 failed 30/30). `psutil.swap_memory()` is affected the same way.

In exo this crashed the MLX runner at import time, because `cache.py` computes its memory threshold at module load. Model instances therefore reliably failed one or more times before deploying (the worker retried until a call happened to succeed), and `MemoryUsage` profiling was degraded.

Change
Adds `exo/utils/virtual_memory.py` with `virtual_memory_statistics()` / `swap_memory_statistics()`:

- Try psutil first.
- On `RuntimeError` (Darwin only), fall back to calling `host_statistics64` directly via ctypes with a generously sized buffer (1024 naturals), reading `free`/`inactive`/`speculative` at their stable indices and computing `available = inactive + free − speculative`, matching psutil's formula. Swap falls back to `sysctlbyname("vm.swapusage")`.

Callers in `cache.py` and `profiling.py` switch to the helpers; no behavior change on healthy systems.

Validation
- `src/exo/utils/tests/test_virtual_memory.py`, including a 30-iteration never-raises loop (which reliably fails against bare psutil on Darwin 27).
- … `host_statistics64` errors in the logs since.

Upstream context: psutil's struct-size assumption is the root cause; until a fixed psutil release is available and pinned, this keeps exo working on macOS 26.
🤖 Generated with Claude Code