Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
68 changes: 68 additions & 0 deletions benchmark/TTFX_REPORT.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
# TTFX investigation: operator-product specialization

This note records a time-to-first-execution (TTFX) investigation of `SecondQuantizedAlgebra`, motivated by the first-call latency of `QuantumCumulants.meanfield` + `complete`. It documents where the compile-time cost goes, the change made in this branch, and the larger architectural option for future work.

## Methodology

TTFX is reported as **wall-clock time of the first call in a fresh `julia --startup-file=no` process, median of 5 runs**, after precompilation is already done. This isolates first-call latency (inference, native codegen, execution) from precompilation. The `@snoop_inference` instrumented time is used only for *ranking* MethodInstances and packages; its absolute numbers carry roughly ±1.5 s of noise on this machine and are not reliable for A/B magnitude. Measurements were taken on Julia 1.12.6 using the Jaynes-Cummings model (one Fock mode, one two-level atom) as the workload.

## Where the first-call time goes

Breakdown of first-call inference for `meanfield` + `complete` (JC model), grouped by the package owning each inferred method:

| Bucket | Share | What it is |
| :-- | --: | :-- |
| Base / stdlib | ~46% | `Dict` hashing/rehash, array growth, `Broadcast`, specialized on symbolic key/value types |
| SymbolicUtils | ~28% | v4 `Add`/`Mul` worker buffers specialized on many argument-tuple types |
| SecondQuantizedAlgebra | ~16-19% | operator algebra (`*` and the canonicalization passes) |
| Moshi | ~2.5% | the sum-type machinery the SymbolicUtils v4 backend uses |

There is no single hotspot. About three quarters of the cost is the symbolic coefficient/expression stack (Base plus SymbolicUtils), which is upstream of this package and independent of how operators are represented. The remaining ~16-19% is this package's own operator algebra, which is the part addressable here.

## Operator specialization inventory

There are 7 concrete `QSym` types: `Destroy`, `Create`, `Transition`, `Pauli`, `Spin`, `Position`, `Momentum`. The per-pair methods grow as O(n^2):

| Function | Methods | Growth |
| :-- | --: | :-- |
| `_site_compare` | 12 | per operator pair |
| `_can_commute` | 12 | per operator pair |
| `_commute_pair` | 5 | per operator pair |
| `_reduce_pair` | 3 | per operator pair |
| `adjoint`, `_ground_state_expand`, `_type_order` | ~9 | per operator type |

Operator products flow through `Vector{QSym}` (in `QTerm.ops` and the canonicalization pipeline) and `QTermDict = Dict{QTerm, CNum}`. Because the element type is the abstract `QSym`, every `Base` container method specializes on the abstract type and element access dispatches dynamically.

## Change in this branch

Two source-level changes, both verified to preserve results:

1. **Function-barrier return-type assertions** in the canonicalization passes (`src/algebra/passes.jl`, `src/algebra/pipelines.jl`, `src/expressions/index.jl`). The leaf functions `_site_compare`, `_can_commute`, `_commute_pair`, `_reduce_pair`, and the `.index` field access each have a concrete return type, but inference widens them to `Any` because they are called on elements of the abstract `Vector{QSym}`. Pinning the true return type at each call site (`::SiteCmp`, `::Bool`, `::Tuple{...}`, `::Index`) stops `Any` from cascading into the downstream code. This does not remove the dynamic dispatch into those functions (that is inherent to the heterogeneous product), but it removes the secondary dispatch cascade: `@report_opt` on a product goes from 15 to 5 runtime-dispatch sites.

2. **`@nospecialize` / `Base.@nospecializeinfer`** on the three operator-product methods `*(::QSym, ::QSym)`, `*(::QAdd, ::QSym)`, `*(::QSym, ::QAdd)`. These are single methods that Julia specialized once per concrete operator pair (`*(::Destroy, ::QAdd)`, `*(::Create, ::Create)`, and so on), even though their bodies only box the operand into a `Vector{QSym}` and run the pipeline, so the concrete operator type is irrelevant to the body. Collapsing those specializations cuts native codegen for the product entry points.

### Measured effect

Wall-clock first-call (`meanfield` + `complete`, JC model), median of 5 fresh runs:

| State | build | meanfield | complete | total first-exec |
| :-- | --: | --: | --: | --: |
| baseline | 2.49 s | 13.10 s | 1.01 s | **17.13 s** |
| this branch | 1.92 s | 12.29 s | 0.70 s | **15.40 s** |

A reproducible **~1.7 s (~10%)** reduction in first-call latency (run-to-run spread ~0.4 s). Operator-product results and runtime are unchanged (for example `a*a'` still gives `1 + a'*a`, product runtime ~1.5 us). Note that `@nospecialize` reduces codegen rather than inference, so this win shows in wall-clock first-call but not in `@snoop_inference` totals.

## Architectural option (future work)

The largest remaining lever on this package's share is to collapse the 7 `QSym` subtypes into one concrete sum type, for example via Moshi's `@data` (the same mechanism SymbolicUtils v4 uses for `BasicSymbolicImpl`). This would:

- make `Vector{QSym}` a concrete-element vector, so `Base` container methods specialize once and element access stops being dynamic dispatch; and
- collapse the ~32 per-pair O(n^2) methods into a handful of functions that branch on a tag (via `@match`), turning `_site_compare(::QSym, ...)` runtime dispatch into static calls.

This targets the SecondQuantizedAlgebra share fully plus a slice of the Base share. It does not touch the SymbolicUtils coefficient stack, which is the remaining floor. It is a substantial refactor (rewriting the 7 operator types and the per-pair methods as tagged variants and `@match` branches, with a closed-world variant set), and the per-pair methods encode real algebraic correctness (Pauli epsilon, Transition composition, Spin commutators), so each branch must be preserved and guarded by the test suite. A reference datapoint for the magnitude: a 100-variant stress test compiles in ~6.7 s as a sum type versus ~56 s as a `Union`.

## Summary

- The dominant first-call cost is the upstream symbolic stack (~74%), not the operator algebra.
- This branch removes the operator-product codegen explosion for a measured ~10% first-call reduction, with no change to results or runtime.
- A sum-type collapse of the operator hierarchy is the principled next step for this package's remaining share, scoped as its own project.
13 changes: 8 additions & 5 deletions src/algebra/passes.jl
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,9 @@ function _partial_sort!(ops::Vector{QSym}, ne::Vector{NonEqualPair})
for i in 2:n
j = i
while j > 1
cmp = _site_compare(ops[j - 1], ops[j], ne)
# ops is Vector{QSym} (abstract elt), so leaf-pass dispatch is dynamic;
# pin the concrete return type so it cannot widen to Any and cascade.
cmp = _site_compare(ops[j - 1], ops[j], ne)::SiteCmp
cmp === Greater || break
ops[j - 1], ops[j] = ops[j], ops[j - 1]
j -= 1
Expand Down Expand Up @@ -55,7 +57,7 @@ same-site pair `(a, b)` for which `_reduce_pair(a, b)` returns non-`nothing`.
i = 1
while i < length(ops)
a, b = ops[i], ops[i + 1]
kind, new_op, factor = _reduce_pair(a, b)
kind, new_op, factor = _reduce_pair(a, b)::Tuple{ReduceKind, QSym, CNum}
if kind === NoReduction
i += 1
else
Expand Down Expand Up @@ -96,9 +98,10 @@ function _commute_ops_at(sink::F, ops::Vector{QSym}, c::CNum, start::Int) where
i = start
while i < length(ops)
a, b = ops[i], ops[i + 1]
cmp = _site_compare(a, b, _EMPTY_NE)
if cmp === Equal && !_can_commute(a, b)
sw_b, sw_a, residual_coeff, residual_ops = _commute_pair(a, b)
cmp = _site_compare(a, b, _EMPTY_NE)::SiteCmp
if cmp === Equal && !(_can_commute(a, b)::Bool)
sw_b, sw_a, residual_coeff, residual_ops =
_commute_pair(a, b)::Tuple{QSym, QSym, CNum, Vector{QSym}}
# Swap branch: stepping back to i-1 because ops[i-1] may now form
# a new non-canonical pair with sw_b.
swapped = copy(ops)
Expand Down
12 changes: 6 additions & 6 deletions src/algebra/pipelines.jl
Original file line number Diff line number Diff line change
Expand Up @@ -74,7 +74,7 @@ end

function _depends_on_index_ops(c::CNum, ops::Vector{QSym}, idx::Index)
for op in ops
op.index == idx && return true
op.index::Index == idx && return true
end
return _depends_on_index_term(c, ops, idx)
end
Expand Down Expand Up @@ -129,7 +129,7 @@ function _accumulate_with_diag!(
_canonicalize!(out, copy(ops), c, augmented_ne)

for (sum_idx, ext_idx) in diag_pairs
sub_ops = QSym[change_index(o, sum_idx, ext_idx) for o in ops]
sub_ops = QSym[change_index(o, sum_idx, ext_idx)::QSym for o in ops]
sub_c = change_index(c, sum_idx, ext_idx)
sub_ne = _drop_ne_with(ne, sum_idx)
# Sibling diagonal slices for the same sum must be mutually exclusive;
Expand Down Expand Up @@ -178,7 +178,7 @@ function _emit_scaled_by_scope!(
return nothing
end

function Base.:*(a::QSym, b::QSym)
Base.@nospecializeinfer function Base.:*(@nospecialize(a::QSym), @nospecialize(b::QSym))
out = QTermDict()
_emit_product!(
out,
Expand All @@ -189,7 +189,7 @@ function Base.:*(a::QSym, b::QSym)
return QAdd(out, _EMPTY_INDICES)
end

function Base.:*(a::QAdd, b::QSym)
Base.@nospecializeinfer function Base.:*(a::QAdd, @nospecialize(b::QSym))
out = QTermDict()
needs = !isempty(a.indices)
for (ta, ca) in a
Expand All @@ -202,7 +202,7 @@ function Base.:*(a::QAdd, b::QSym)
return QAdd(out, _absorb_pinned_sums(a.indices, a, b))
end

function Base.:*(a::QSym, b::QAdd)
Base.@nospecializeinfer function Base.:*(@nospecialize(a::QSym), b::QAdd)
out = QTermDict()
needs = !isempty(b.indices)
for (tb, cb) in b
Expand Down Expand Up @@ -270,7 +270,7 @@ function _every_term_has_op_index(q::QAdd, idx::Index)
@inbounds for t in keys(q.arguments)
found = false
for op in t.ops
if op.index == idx
if op.index::Index == idx
found = true
break
end
Expand Down
2 changes: 1 addition & 1 deletion src/expressions/index.jl
Original file line number Diff line number Diff line change
Expand Up @@ -270,7 +270,7 @@ _check_not_identical(result::Number) = Num(result)

function _depends_on_index_term(c::CNum, ops::Vector{QSym}, idx::Index)
for op in ops
op.index == idx && return true
op.index::Index == idx && return true
end
isym = SymbolicUtils.unwrap(idx.sym)
for part in (real(c), imag(c))
Expand Down
Loading