[Perf] Enable forcing perf dispatch use specific implementation by hughperkins · Pull Request #401 · Genesis-Embodied-AI/quadrants

hughperkins · 2026-03-06T00:52:51Z

Issue: #

Brief Summary


Walkthrough


Document the @qd.perf_dispatch auto-tuning decorator: basic usage, decorator order, compatibility filtering, geometry hashing, tuning parameters, and benchmarking internals.

Two env vars allow overriding the auto-tuning:

- QD_PERFDISPATCH_FORCE=dispatcher:impl[,dispatcher2:impl2,...] forces specific implementations for named dispatchers.
- QD_PERFDISPATCH_FORCE_INDEX=N forces the Nth registered implementation for all dispatchers (the per-dispatcher override takes priority).

When either env var is set, all dispatchers print their name and registered implementations on first call, making it easy to discover valid values.
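A minimal sketch of the precedence rule described above, assuming both env vars have already been parsed into a dict and an optional integer. The function name `pick_forced_impl`, the 0-based index, and the choice to fall back to auto-tuning when a named implementation is not found are all assumptions for illustration; QD_PERFDISPATCH_FORCE_INDEX was later removed from this PR.

```python
from typing import Optional, Sequence


def pick_forced_impl(
    dispatcher_name: str,
    impl_names: Sequence[str],
    force_map: dict[str, str],
    force_index: Optional[int],
) -> Optional[str]:
    """Hypothetical helper: return the forced impl name, or None to auto-tune."""
    # A per-dispatcher entry (QD_PERFDISPATCH_FORCE) takes priority.
    if dispatcher_name in force_map:
        wanted = force_map[dispatcher_name]
        # Assumption: an unknown impl name falls back to normal auto-tuning.
        return wanted if wanted in impl_names else None
    # Otherwise QD_PERFDISPATCH_FORCE_INDEX=N picks the Nth registered impl (0-based here).
    if force_index is not None and 0 <= force_index < len(impl_names):
        return impl_names[force_index]
    return None


print(pick_forced_impl("other_op", ["a", "b"], {"my_op": "my_op_v2"}, 1))  # b
```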

Only keep the per-dispatcher QD_PERFDISPATCH_FORCE env var.


Add section on forcing specific implementations and discovering available dispatcher/implementation names.

Use larger arrays, marker array c with IntEnum, do_work_py, and .to_numpy() for assertions — matching the patterns that work reliably on metal/vulkan GPU backends.

duburcqa · 2026-03-09T12:26:23Z

python/quadrants/lang/_perf_dispatch.py

        self._dispatch_impl_set: set[DispatchImpl] = set()
+        self._dispatch_impl_list: list[DispatchImpl] = []
+        self._forced_impl: DispatchImpl | None = None


Why are you storing both a list and a set? This is weird.

duburcqa · 2026-03-09T12:28:17Z

tests/python/test_perf_dispatch.py

@@ -1,9 +1,11 @@
 from enum import IntEnum
 from typing import cast
+from unittest import mock


Don't use this. Use pytest.
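For reference, a minimal sketch of the pytest-native replacement for `mock.patch.object`: the built-in `monkeypatch` fixture's `setattr` swaps an attribute for one test and restores it afterwards. The `perf` namespace and the `forced_name` helper are hypothetical stand-ins for the real `_perf_dispatch` module; with the real module, the first argument to `setattr` would be the imported module object.

```python
import types

# Hypothetical stand-in for python/quadrants/lang/_perf_dispatch.py.
perf = types.SimpleNamespace(_FORCE_MAP={})


def forced_name(dispatcher_name: str):
    # Reads the module-level map at call time, so a patched value is visible.
    return perf._FORCE_MAP.get(dispatcher_name)


def test_force_by_name(monkeypatch):
    # pytest undoes this setattr automatically when the test finishes.
    monkeypatch.setattr(perf, "_FORCE_MAP", {"my_op": "my_op_v2"})
    assert forced_name("my_op") == "my_op_v2"
    assert forced_name("other_op") is None
```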

duburcqa · 2026-03-09T12:32:01Z

python/quadrants/lang/_perf_dispatch.py

+        if ":" not in pair:
+            print(
+                f"[perf_dispatch] WARNING: ignoring malformed QD_PERFDISPATCH_FORCE entry '{pair}' (expected 'dispatcher:impl')"
+            )


I don't think a simple print is ok. It should be an error. Why would you tolerate this?

duburcqa · 2026-03-09T12:32:41Z

python/quadrants/lang/_perf_dispatch.py

+                f"[perf_dispatch] WARNING: ignoring malformed QD_PERFDISPATCH_FORCE entry '{pair}' (expected 'dispatcher:impl')"
+            )
+            continue
+        dispatcher_name, impl_name = pair.split(":", 1)


dispatcher_name, impl_name = pair.split(":") would guard against extra colons for free, since unpacking into two names raises ValueError.
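A quick illustration of the difference in plain Python (no quadrants code involved): with a maxsplit of 1, an entry containing an extra colon is silently accepted, whereas two-name unpacking of `split(":")` raises `ValueError`, and the same unpacking also catches an entry with no colon at all.

```python
entry = "my_op:impl:extra"

# With maxsplit=1 there are at most two parts, so the extra colon
# silently ends up inside the implementation name.
dispatcher_name, impl_name = entry.split(":", 1)
print(dispatcher_name, impl_name)  # my_op impl:extra

# Without maxsplit there are three parts, so unpacking into two names fails.
try:
    dispatcher_name, impl_name = entry.split(":")
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True

# An entry with no colon yields one part and is rejected the same way.
try:
    dispatcher_name, impl_name = "my_op".split(":")
    missing_colon_rejected = False
except ValueError:
    missing_colon_rejected = True
print(missing_colon_rejected)  # True
```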

duburcqa · 2026-03-09T12:33:56Z

python/quadrants/lang/_perf_dispatch.py

+            )
+            continue
+        dispatcher_name, impl_name = pair.split(":", 1)
+        result[dispatcher_name.strip()] = impl_name.strip()


You could just do dispatcher_name, impl_name = map(str.strip, pair.split(":")) to avoid all these strip if you want.

duburcqa · 2026-03-09T12:34:32Z

python/quadrants/lang/_perf_dispatch.py

+    return result
+
+
+_FORCE_MAP: dict[str, str] = _parse_force_map(_QD_PERFDISPATCH_FORCE_RAW)


What about a frozendict? (If it is already a dependency; otherwise it's not worth it.)

frozendict is not a dependency of quadrants: https://github.com/Genesis-Embodied-AI/quadrants/blob/main/pyproject.toml


hughperkins · 2026-03-12T02:23:47Z

Opus 4.6 review:

Summary

The branch adds two things:

A QD_PERFDISPATCH_FORCE env var that lets users bypass auto-tuning and pin a specific implementation for a given dispatcher (e.g. QD_PERFDISPATCH_FORCE=my_op:my_op_v2). Useful for debugging/profiling.
A comprehensive user guide for perf_dispatch (docs/source/user_guide/user_guide/perf_dispatch.md).

What's done well

Clean parsing/resolution pattern. _parse_force_map() handles malformed input gracefully with warnings. The lazy _resolve_force() pattern — deferring resolution to first call rather than registration time — is the right approach since all @register calls may not have happened yet.
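The lazy pattern can be sketched as a dispatcher that records implementations in registration order and resolves the forced one on its first call, by which time every `@register` decorator has run. The class and attribute names are hypothetical and benchmarking is replaced by "use the first implementation"; this is not the quadrants implementation.

```python
from typing import Callable, Optional


class Dispatcher:
    def __init__(self, name: str, force_map: dict[str, str]):
        self.name = name
        self._force_map = force_map
        self._impls: list[Callable] = []  # registration order preserved
        self._forced_impl: Optional[Callable] = None
        self._force_resolved = False

    def register(self, fn: Callable) -> Callable:
        self._impls.append(fn)
        return fn

    def _resolve_force(self) -> None:
        if self._force_resolved:
            return  # cheap early return on every call after the first
        self._force_resolved = True
        wanted = self._force_map.get(self.name)
        for impl in self._impls:
            if impl.__name__ == wanted:
                self._forced_impl = impl
                return
        # No entry or unmatched name: _forced_impl stays None (normal auto-tuning).

    def __call__(self, *args, **kwargs):
        self._resolve_force()
        if self._forced_impl is not None:
            return self._forced_impl(*args, **kwargs)
        # Placeholder for benchmarking: just use the first registered implementation.
        return self._impls[0](*args, **kwargs)


# Implementations are registered after the dispatcher exists, hence the lazy resolution.
op = Dispatcher("my_op", {"my_op": "my_op_v2"})


@op.register
def my_op_v1(x):
    return x + 1


@op.register
def my_op_v2(x):
    return x * 2


print(op(10))  # 20
```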
Good test coverage. Three tests cover the key cases: force by name, unmatched name fallback, and multiple dispatchers. Using mock.patch.object to inject _FORCE_MAP / _ANY_FORCE_ACTIVE is clean and avoids environment variable side effects.
Excellent documentation. The user guide is thorough — covers basic usage, decorator order, compatibility filtering, geometry hash, tuning parameters, the forcing mechanism, and a complete example. Well-structured.
Improved log messages. The switch from `perf dispatch chose X` to `perf_dispatch 'my_op': chose 'my_op_v2'` adds the dispatcher name, making multi-dispatcher logs actually debuggable.

Issues to address

1. `_ANY_FORCE_ACTIVE` triggers verbose prints for all dispatchers

When QD_PERFDISPATCH_FORCE=op_a:v1 is set, every dispatcher — even unrelated ones — prints registration and available-implementations messages. This is intentional for discoverability (via QD_PERFDISPATCH_FORCE=?) but will be noisy in production scripts with many dispatchers.

Consider gating the extra prints on the dispatcher actually being in the force map, and having a separate QD_PERFDISPATCH_LIST=1 mode for discovery.

2. Parallel `_dispatch_impl_set` and `_dispatch_impl_list`

The branch adds _dispatch_impl_list alongside the existing _dispatch_impl_set — both are appended to during registration. The list is needed for ordered name lookup during force resolution, but maintaining two parallel collections is a maintenance risk.

Consider whether the set can be replaced by the list (using set() views where needed), or at least add a comment explaining why both exist.
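A sketch of the single-collection alternative: keep one ordered list and derive a set only where one is needed. `Registry` and its methods are illustrative names (only `_dispatch_impls` matches the name the PR later adopted), and ignoring duplicate registrations to mimic the old set semantics is an assumption.

```python
from typing import Callable, Optional


class Registry:
    def __init__(self) -> None:
        # One ordered collection: order matters for name lookup and listings.
        self._dispatch_impls: list[Callable] = []

    def add(self, impl: Callable) -> None:
        # Assumption: mirror set semantics by ignoring duplicates (O(n), fine for a few impls).
        if impl not in self._dispatch_impls:
            self._dispatch_impls.append(impl)

    def as_set(self) -> set[Callable]:
        # Derived on demand instead of being maintained in parallel.
        return set(self._dispatch_impls)

    def find(self, name: str) -> Optional[Callable]:
        return next((f for f in self._dispatch_impls if f.__name__ == name), None)
```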

3. `_resolve_force()` called on every `__call__`

After the first call, _force_resolved is True and it short-circuits, but the branch check still runs on every invocation. This is cheap (a boolean check), but in a hot path it's an extra branch per call. Could be eliminated by swapping __call__ to a different method after resolution, though that's arguably over-engineering.
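One detail if the method-swapping idea is ever pursued: Python looks up `__call__` on the type, not the instance, so assigning `obj.__call__ = ...` does not change what `obj(...)` runs. A sketch of a workable variant that routes through a rebindable instance attribute (all names hypothetical); it still pays one attribute lookup per call, which arguably supports leaving the boolean check in place.

```python
class SwappingDispatcher:
    def __init__(self, impl):
        self._impl = impl
        self._call = self._first_call  # plain instance attribute, safe to rebind

    def __call__(self, *args, **kwargs):
        return self._call(*args, **kwargs)

    def _first_call(self, *args, **kwargs):
        # One-time resolution would go here; later calls go straight to the impl.
        self._call = self._impl
        return self._impl(*args, **kwargs)


d = SwappingDispatcher(lambda x: x + 1)
d.__call__ = lambda x: "ignored"  # instance attribute: ignored by d(...)
print(d(1), d(2))  # 2 3
```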

4. Stray SPIRV-Cross submodule bump

The diff includes a SPIRV-Cross submodule pointer change (2dcbe8d → d8e3e2b), likely pulled in from the merge commit. Probably unintentional — should be excluded or called out.

5. No negative test for `_parse_force_map`

The parser warns on malformed entries (no colon), but there's no test exercising that warning path directly, or testing edge cases like trailing commas, empty entries, or whitespace-only entries.


hughperkins · 2026-03-12T02:33:37Z

Additional thoughts from reviewer Opus on concern 1:

This one I'd actually leave as-is. A few reasons:

- The noise is bounded and useful. In practice, a project probably has a handful of perf_dispatch dispatchers, not hundreds. Seeing all of them when you set the env var gives you immediate orientation: you know what's available without having to grep the codebase.
- The discovery and forcing use cases are intertwined. If you only printed info for the targeted dispatcher, you'd need to already know the exact dispatcher name to use the env var, but the whole point of the verbose output is to help you discover names in the first place. A separate QD_PERFDISPATCH_LIST=1 env var would work but adds another knob for marginal benefit.
- It's already documented. The user guide explicitly describes this as the discovery mechanism (QD_PERFDISPATCH_FORCE=?). Changing the behavior would mean the docs and the code diverge.

If it ever does become noisy (many dispatchers), it could be refined later by only printing the full listing once at process exit or by adding a filter. But that's a problem for when it actually materializes. I'd drop this concern from the review or relabel it as "future consideration."
Want me to update the review doc?

hughperkins · 2026-03-12T02:34:36Z

Update from reviewer Opus on concern 5:

Honestly, this one is borderline. _parse_force_map is a simple string parser that's only called once at module load with an env var value. The happy paths (valid entries, empty string) are already exercised indirectly by the force tests. The edge cases (no colon, trailing commas, whitespace) are trivial enough that you can verify correctness by reading the code: split(","), strip(), and the ":" not in pair check handle them obviously.

Adding a dedicated unit test for a 15-line parser that's unlikely to be modified feels like test bloat. If the parser were more complex or had tricky corner cases, it'd be different. I'd drop this concern.

hughperkins · 2026-03-12T02:35:32Z

Reviewer Opus update on concern 3:

This is technically correct: there's a self._resolve_force() call plus a self._forced_impl is not None check on every call. But it's not worth fixing. Two boolean checks (the self._force_resolved early return, then self._forced_impl is not None) are essentially free compared to the actual kernel execution that follows. Python function call overhead on _resolve_force() is in the tens of nanoseconds.

The alternative (swapping __call__ to a different method after resolution, or using a flag to replace the method) adds real complexity for an unmeasurable performance gain. I'd drop this one too.
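The overhead claim can be checked locally with `timeit`. The sketch below compares a bare call against one that first calls an early-returning `_resolve_force`-style method and then checks `_forced_impl`. The class names are hypothetical, and the result depends on the machine and Python version, so no figure is given here.

```python
import timeit


class WithCheck:
    def __init__(self):
        self._force_resolved = True  # state after the first call
        self._forced_impl = None

    def _resolve_force(self):
        if self._force_resolved:
            return

    def __call__(self):
        self._resolve_force()
        if self._forced_impl is not None:
            return self._forced_impl()
        return None


class WithoutCheck:
    def __call__(self):
        return None


n = 200_000
with_check = timeit.timeit(WithCheck(), number=n)  # timeit accepts any callable as stmt
without_check = timeit.timeit(WithoutCheck(), number=n)
print(f"extra cost per call: {(with_check - without_check) / n * 1e9:.1f} ns")
```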


hughperkins added 6 commits March 5, 2026 18:08

[Docs] Add user guide for perf_dispatch

3b4212d

Document the @qd.perf_dispatch auto-tuning decorator: basic usage, decorator order, compatibility filtering, geometry hashing, tuning parameters, and benchmarking internals.

Remove QD_PERFDISPATCH_FORCE_INDEX

fb5d154

Only keep the per-dispatcher QD_PERFDISPATCH_FORCE env var.

Merge branch 'hp/perf-dispatch-doc' into hp/perf-dispatch-force-implementation

1526d3e

[Docs] Document QD_PERFDISPATCH_FORCE env var

17bc7e4

Add section on forcing specific implementations and discovering available dispatcher/implementation names.

[Misc] Apply black formatting to perf_dispatch files

8d24fff

hughperkins changed the title ~~[Perf] Enable perf dispatch force implementation via env var~~ [Perf] Enable forcing perf dispatch use specific implementation Mar 6, 2026

[Fix] Rewrite force tests to match existing test patterns

ed93882

Use larger arrays, marker array c with IntEnum, do_work_py, and .to_numpy() for assertions — matching the patterns that work reliably on metal/vulkan GPU backends.

hughperkins marked this pull request as draft March 6, 2026 02:39

duburcqa reviewed Mar 9, 2026


Merge remote-tracking branch 'origin/main' into hp/perf-dispatch-force-implementation

c96fc22

# Conflicts: # docs/source/user_guide/index.md

hughperkins added 2 commits March 11, 2026 19:31

[Refactor] Consolidate _dispatch_impl_set and _dispatch_impl_list into single _dispatch_impls list

b851fb0

[Fix] Revert accidental SPIRV-Cross submodule bump

5bc67c3

hughperkins marked this pull request as ready for review March 12, 2026 02:35

hughperkins added 4 commits March 11, 2026 19:41

[Refactor] Replace unittest.mock with pytest monkeypatch in force tests

19cc338

Consistent with the rest of the test suite.

[Fix] Use split(":") instead of split(":", 1) in force map parser

89bfa55

Rejects entries with multiple colons via unpacking ValueError rather than silently accepting them.

[Fix] Raise ValueError on malformed QD_PERFDISPATCH_FORCE entries instead of warning

799a8e5

Fail loudly so users don't silently get normal benchmarking when they intended to force an implementation.

[Misc] Use map(str.strip, ...) to simplify force map parsing

82ea3fd

hughperkins assigned duburcqa Mar 12, 2026

[Misc] Apply black/ruff formatting fixes

9d9445b
