
BatchExaModel: batch optimization with GPU support#216

Open
amontoison wants to merge 52 commits into main from am/batch

Conversation

@amontoison
Member

@amontoison amontoison commented Feb 5, 2026

Summary

Adds BatchExaModel for solving multiple independent optimization instances that share identical structure but differ in parameter values. All instances share one compiled expression pattern and are fused into a single model for efficient SIMD/GPU evaluation.

Core features

  • BatchExaCore(ns) — create a core for ns batch instances; variables, parameters, objectives, and constraints are defined once and replicated automatically
  • Batch NLP API — obj!, grad!, cons!, jac_coord!, hess_coord! operate on matrix-valued (dim, ns) inputs
  • FlattenNLPModel — wraps a BatchExaModel as a standard AbstractNLPModel{T, Vector{T}} for use with any NLPModels-compatible solver (e.g. Ipopt)
  • get_model(model) — returns the solver-ready flat model for batch, identity for regular models
  • var_indices(model, i) — extract per-instance solution from the flat solver output
  • GPU support — batch KA kernels for obj, grad, cons, jac, hess; race-free obj kernel via fill+reduce
  • Batched two-stage — TwoStageExaCore(ns; nbatch = Val(N)) combines batch and two-stage models (see the sketch below)
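
A minimal end-to-end sketch of the intended workflow, assembled from the features above. The macro argument forms and the parameter data are assumptions for illustration; exact syntax may differ in the final merge.

    using ExaModels, NLPModelsIpopt

    ns = 4                                       # number of batch instances
    c = BatchExaCore(ns)                         # one core; definitions replicate automatically
    θ = @add_par(c, rand(2, ns))                 # per-instance parameter values (assumed form)
    x = @add_var(c, 2, start = 0.0)              # two variables per instance (assumed form)
    @add_obj(c, (x[1] - θ[1])^2 + (x[2] - θ[2])^2)
    @add_con(c, x[1] + x[2] == 1.0)

    bm = ExaModel(c)                             # fused BatchExaModel
    fm = get_model(bm)                           # FlattenNLPModel, a standard AbstractNLPModel
    stats = ipopt(fm)                            # any NLPModels-compatible solver works
    x3 = stats.solution[var_indices(bm, 3)]      # per-instance solution of instance 3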

API additions

  • BatchExaCore, BatchExaModel, FlattenNLPModel
  • get_nbatch, get_model, var_indices, cons_block_indices
  • @add_var, @add_par, @add_obj, @add_con, @add_con!, @add_expr all work with batch cores
  • set_value! / get_value for updating parameters on the model (batch-aware, broadcasts across instances; see the example after this list)
  • set_parameter! deprecated in favor of set_value!
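
For example, updating a batch parameter (a sketch; the broadcast form follows the bullet above, the matrix form follows a commit later in this thread):

    set_value!(model, θ, [1.0, 2.0])        # broadcast: same values for every instance
    set_value!(model, θ, rand(npar, ns))    # per-instance: an npar × ns matrix, one column each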

Other changes

  • TwoStageExaCore / TwoStageExaModel with EachScenario() API, getter/setter helpers
  • add_con consolidated to ns... varargs (single method handles both explicit dims and generator)
  • append! fixed for N-dimensional arrays (fixes juliac COPSApp)
  • build_extension widened from AbstractVector to AbstractArray so batch models get KA acceleration
  • Pretty-print improvements: operator precedence parenthesization, bracket notation for integer indexing
  • Benchmark noise reduction with minimum-based estimator

Tests

  • BatchTest: construction, obj/grad/cons/jac/hess, bounds, error guards, Ipopt, set_value!, multidim vars, add_con!, add_expr, per-instance accessors, GPU backend loop
  • TwoStageTest: construction, evaluation, Ipopt (inequality/equality/multi-var), batched two-stage (construction/eval/Ipopt), getters/setters
  • GetterSetterTest: parameter/variable/constraint get/set API

Docs

  • New docs/src/batch.jl tutorial with worked examples
  • BatchNLPModels autodocs in API manual
  • Updated docs/src/parameters.jl to use set_value!

cc @michel2323

@michel2323 michel2323 marked this pull request as ready for review February 13, 2026 21:18
@github-actions
Contributor

github-actions Bot commented Feb 13, 2026

Your PR requires formatting changes to meet the project's style guidelines.

Please run:

julia --project=@runic -e 'using Pkg; Pkg.add("Runic")'
julia --project=@runic -e "using Runic; exit(Runic.main(ARGS))" -- --fix <files>

(or git runic main if you have the git wrapper installed)

Note: the full diff is omitted because it can exceed GitHub Actions input limits.

@amontoison amontoison changed the title from "First draft of BatchExaModel" to "BatchExaModel" on Feb 14, 2026
Comment thread on src/batch.jl (outdated)
@amontoison
Member Author

@michel2323 Do you know how we can adapt the objective function to return a vector of nscenario values instead of a scalar that sums all the objectives?
That is what we need for the new NLPModels.jl API.

@michel2323
Member

michel2323 commented Feb 18, 2026

Right. Forgot about the primal. How about the constraints?

I'll give it a try. Should be easy.

@amontoison
Member Author

For the constraints, we want a strided API where the constraints of each scenario are stored in the same vector.
That is like TwoStageExaModel and exactly what we already do.

@michel2323
Member

For the objective, I've added the API below. This should be aligned with what we discussed back here.

    obj(model::TwoStageExaModel, x_global, s) -> scalar

Evaluate the objective contribution of scenario s (for 1 ≤ s ≤ ns), or
the shared/design-only objective terms (for s = 0).

Per-scenario terms are Objective nodes whose iterator length is evenly divisible
by ns. Design-only terms (where length(itr) % ns != 0) are returned when s = 0.

The invariant obj(m, x) ≈ sum(obj(m, x, s) for s in 1:ns) + obj(m, x, 0) holds.
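
A quick check of that invariant, where m and x stand for a built TwoStageExaModel and a point of matching size (the getter name is assumed from the API additions above):

    ns = get_nbatch(m)   # scenario count (assumed getter)
    @assert obj(m, x) ≈ sum(obj(m, x, s) for s in 1:ns) + obj(m, x, 0)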

@michel2323
Member

We decided on a vectorized API obj(::BatchExaModel, x, result) where result is a vector of size ns.
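
In sketch form, with X the (nvar, ns) primal matrix and the signature taken from the comment above:

    result = zeros(ns)      # one objective value per instance
    obj(model, X, result)   # fills result in place; no scalar reduction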

@michel2323 michel2323 force-pushed the am/batch branch 6 times, most recently from a17fa88 to 019aa55 on March 3, 2026 14:52
Comment thread on src/batch.jl (outdated)
sshin23 and others added 4 commits April 18, 2026 00:10
# Conflicts:
#	Project.toml
#	docs/make.jl
#	src/ExaModels.jl
#	test/runtests.jl
- BatchExaCore(nbatch) creates ExaCore with Matrix storage
- EachInstance() marker for per-instance add_var/add_par/add_con
- Unified append! for Vector and Matrix (no separate methods)
- ExaModel(c) handles both batch and non-batch via _meta_dims dispatch
- Removed BatchExt: no fused_model, objbuffer, or hess_buffer storage
- get_model() constructs flat model on-the-fly for solver compatibility
- hess_perm computed on-the-fly in hess_coord!
- BatchNLPModelMeta stores per-instance nvar/ncon/nnzj/nnzh
- Relaxed ExaCore VT constraint from AbstractVector to AbstractArray

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nels, fix two-stage tag propagation

- Remove batch.jl; merge all batch logic into nlp.jl
- Remove LinAlgTest (dead code)
- Batch s-functions (gradient!, sjacobian!, shessian!) use double for-loop (batch × itr)
- Thread `backend` through batch evaluation chain for KA dispatch
- Add OffsetVector for zero-allocation batch offset indexing in KA kernels
- Add batch KA kernels (kerf_batch, kerg_batch, kerj_batch, kerh_batch, kerh2_batch)
  launching single kernel over nb*nitr work items
- Support per-instance lvar/uvar/start via matrix args to @add_var
- Add append! method for AbstractMatrix
- Fix two-stage tag propagation: capture append! return value and rebuild tag
- Move BatchExaModel getters/setters after type alias definition

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
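
A schematic of the single-launch pattern this commit describes: one work item per (instance, iterator) pair, nb * nitr items in total. The kernel name and body below are illustrative only.

    using KernelAbstractions

    @kernel function kerf_batch_sketch(y, @Const(X), nitr)
        I = @index(Global, Linear)      # I ranges over 1:nb*nitr
        b = (I - 1) ÷ nitr + 1          # batch instance
        i = (I - 1) % nitr + 1          # iterator element within the instance
        @inbounds y[i, b] = X[i, b]^2   # placeholder per-element evaluation
    end

    # launched as: kerf_batch_sketch(backend)(y, X, nitr; ndrange = nb * nitr)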
@github-actions
Contributor

github-actions Bot commented Apr 18, 2026

Benchmark Results


sshin23 and others added 3 commits April 18, 2026 23:28
BatchExaModel.jac_structure! and hess_structure! now pass getbackend(m)
to backend-aware _jac_structure!/_obj_hess_structure!/_con_hess_structure!
so the KA extension can intercept and use GPU kernels instead of
scalar-indexing f.itr arrays.

Also converts obj_weight to model type T in batch hess_coord! to prevent
Metal InvalidIRError from Float64 values on Float32 models.

FlatNLPModel structure queries now use device-native temp arrays for the
per-instance query, then copy to CPU for replication.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The KA extension's _jac_structure!, _obj_hess_structure!, and
_con_hess_structure! were local functions, not methods on ExaModels.
BatchExaModel.jac_structure! now calls ExaModels._jac_structure! with
getbackend(m), so the extension must add methods to ExaModels' functions
for GPU dispatch to work.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three calls in the KAExtension constructor at lines 61/64/65 still used
unqualified _jac_structure!, _obj_hess_structure!, _con_hess_structure!
instead of ExaModels._ prefixed versions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add _reshape_to_match to correctly handle array bounds regardless of
input shape. For non-batch (Vector target), matrices are vecd. For
batch (Matrix target), matrices are reshaped to match trailing dims.
This fixes the DimensionMismatch when passing matrix lvar/uvar to
batch models, while preserving the existing behavior for non-batch
models that receive matrix-shaped comprehension results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sshin23 and others added 24 commits April 21, 2026 19:27
- Rename _batch suffix from KA kernels (kerg, kerj, kerh, kerh2, kerg_sparse)
- Remove AbstractVector constraints from GPU dispatch functions so NaNSource works for structure detection
- Add GPU obj! override for BatchExaModel
- Add batched CPU fallback for sgradient!
- Add kerg_sparse kernel for GPU sparse gradient

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
OffsetVector is not <:AbstractVector, so grpass/jrpass/hrpass/hdrpass
structure-detection specializations were not matching — the generic
value-accumulation methods fired instead, causing Tuple += Float64
MethodError in GPU kernels.

Fix: parameterize OffsetVector{T,V} so T is the element type, then add
matching specializations for each structure-detection pass function.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
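
For reference, the fix in miniature: the element type becomes a leading parameter so the pass functions can dispatch on it (field layout assumed):

    struct OffsetVector{T, V <: AbstractVector{T}}   # T is the element type
        data::V
        offset::Int
    end
    Base.getindex(v::OffsetVector, i::Integer) = v.data[v.offset + i]
    Base.eltype(::Type{OffsetVector{T, V}}) where {T, V} = T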
Null, SumNode, ProdNode use eltype(T) where T is the inner type of
AdjointNodeSource/SecondAdjointNodeSource. With batch kernels wrapping
x/θ in OffsetVector, T becomes OffsetVector{Float64,...} and eltype(T)
was undefined, causing jl_f_throw_methoderror in GPU compilation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…spatch

The GPU obj! for BatchExaModel was ambiguous with the CPU fallback in
nlp.jl because neither constrained both VT and E. Adding VT<:AbstractMatrix{T}
makes the GPU method strictly more specific.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…erModels, SpecialFunctions to weakdeps

These should not be hard dependencies of ExaModels core.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…elsJuMP, NLPModelsTest, Percival, PowerModels)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Delete src/BatchNLPModels.jl; move FlatNLPModel and generic get_nbatch
  overloads into utils.jl
- Remove abstract type AbstractExaModel; ExaModel now directly subtypes
  NLPModels.AbstractNLPModel
- Update ExaModelsKernelAbstractions.jl and test/BatchTest accordingly
- Remove BatchNLPModels section from docs/src/core.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…h = 1

ExaCore and BatchExaCore now take `batch = Val(false)` (non-batch, default)
or `batch = Val(true)` (batch) as a boolean flag, with `nbatch` as a plain
integer instead of Val-wrapped. _make_exacore dispatches on Val{false}/Val{true}.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
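
The dispatch pattern in miniature (the storage choice below is illustrative, not the actual _make_exacore body):

    _storage(::Val{false}, ::Type{T}, n, nbatch) where {T} = Vector{T}(undef, n)
    _storage(::Val{true},  ::Type{T}, n, nbatch) where {T} = Matrix{T}(undef, n, nbatch)

    _storage(Val(false), Float64, 4, 3)   # 4-element Vector{Float64}
    _storage(Val(true),  Float64, 4, 3)   # 4×3 Matrix{Float64}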
…shadowing

Adds set_value!(model, param, values::AbstractMatrix) for batch models so
per-instance parameters can be set with an npar×ns matrix. Uses Base.size
explicitly to avoid shadowing by the local size(ns) function.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
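
The shadowing workaround in schematic form (the body is illustrative, not the actual method):

    function set_value_sketch!(dest::AbstractMatrix, values::AbstractMatrix)
        # Base.size is qualified because a local binding named `size`
        # in the enclosing scope would otherwise shadow it.
        Base.size(values) == Base.size(dest) || throw(DimensionMismatch("expected npar × ns"))
        copyto!(dest, values)
    end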
…e specializations

Output arrays (y, y1) in KA batch kernels now use view(arr, off+1:off+n) instead
of OffsetVector. Since SubArray <: AbstractVector, the existing AbstractVector
dispatch in gradient/jacobian/hessian covers them without extra overloads.
OffsetVector is kept only for input sources (x, θ) which may be NaNSource.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Delete OffsetVector struct from graph.jl
- Make NaNSource <: AbstractVector{T} with size (typemax(Int),) and
  unconstrained getindex so view() wraps it safely under @inbounds
- Replace all OffsetVector(x/θ, off) in KA kernels with view(arr, off+1:off+n)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
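
The trick in sketch form (definition assumed; the real NaNSource lives in ExaModels' graph code):

    struct NaNSource{T} <: AbstractVector{T} end
    Base.size(::NaNSource) = (typemax(Int),)              # any positive index is in-bounds
    Base.getindex(::NaNSource{T}, i::Integer) where {T} = T(NaN)

    v = view(NaNSource{Float64}(), 11:15)   # bounds check passes for any range
    v[1]                                    # NaN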
Add ExaModel{T, <:AbstractVector} specializations for obj, cons_nln!,
grad!, jac_coord!, hess_coord! that bypass the batch view-creating loop
(which was introduced for the batched ExaModel{T, <:AbstractMatrix} path).

The batch loop created SubArray views even for nb=1, degrading performance
3–7× on jac (elec) and 1.2–4× on obj/grad/hess. Non-batch models now
dispatch to the original direct-vector kernel paths with no overhead.
Batch (AbstractMatrix) models are unaffected.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…A ambiguity

The <:AbstractVector dispatch for obj/cons_nln!/grad!/jac_coord!/hess_coord!
was too broad: it matched non-batch GPU models (CuVector<:AbstractVector),
creating method ambiguity with the KA extension's E<:KAExtension dispatches,
and silently dropping the GPU backend for jac/hess on those models.

Restrict all fast direct-path specializations to ExaModel{T,<:AbstractVector,Nothing}
(CPU-only). GPU non-batch models (E<:KAExtension, VT<:AbstractVector) now
fall through to the general ExaModel{T} path which passes the backend through,
hitting the KA GPU kernels as before.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
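
The specificity issue in miniature, using a stand-in type (the real ExaModel has more type parameters):

    struct M{T, VT, E} end   # stand-in for ExaModel{T, VT, E, ...}

    path(::M{T, <:AbstractVector, Nothing}) where {T} = :cpu_direct   # CPU, no extension
    path(::M{T}) where {T} = :backend_aware                           # KAExtension etc.

    path(M{Float64, Vector{Float64}, Nothing}())   # :cpu_direct
    path(M{Float64, Vector{Float64}, Symbol}())    # :backend_aware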
…s_coord!

These were dropped in a previous edit, causing MethodAmbiguity between
ExaModel{T}(AbstractVecOrMat, AbstractVecOrMat) and NLPModels'
AbstractNLPModel(AbstractVector, AbstractVector) when called with
plain vector arguments.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rray

KAExtension's internal buffers are always 1D (created via similar(c.x0, n)),
but c.x0 can be a matrix for batch models. AbstractArray{T} is the correct
constraint since similar(::AbstractMatrix, n::Int) returns a 1D array which
satisfies AbstractArray{T} but the old AbstractVector{T} was misleadingly
restrictive.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… up docs/tests

- Remove set_parameter! (deprecated); replace with set_value! in tests and docs
- TwoStageExaCore now accepts nbatch kwarg for batch scenario support
- Update parameters.md to use @add_par/@add_var/@add_obj/@add_con macros
- Move MadNLP/PowerModels/Percival/NLPModelsTest to main deps in Project.toml
- Whitespace cleanup in LuksanVlcekApp.jl

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tension

- Add jac_structure!/hess_structure! overrides for KAExtension models to route
  to backend-aware GPU kernels instead of CPU scalar iteration over GPU arrays
- Relax jrpass/hrpass/hdrpass dispatch from y1::V,y2::V (same type) to
  y1::V1,y2::V2 (same eltype) so view+array pairs dispatch correctly in kerj/kerh
- Add obj/grad! overrides for VT<:AbstractMatrix + KAExtension to resolve method
  ambiguity and throw ArgumentError for batch models receiving vector input
- Revert KAExtension VT param to AbstractVector (buffers are always 1D)
- Fix kersyspmv2 parse error: idx = @index(Global)0 -> idx = @index(Global)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Use offset0(f, itr, k) in kerf_con_aug_batch to compute the correct
constraint index for ConstraintAugmentation. The previous approach
passed con.oa (a scalar conbuffer offset) as an array and indexed into
it, producing wrong indices for nb > 1 batches and causing solver
restoration failures.

Also add test_batch_opf_flat: a 3-batch AC OPF test (case3_lmbd) with
augmented power balance constraints solved via FlatNLPModel + MadNLP.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
MadNLP's default MakeParameter treatment calls findall on GPU boolean
arrays which triggers scalar indexing errors. RelaxBound skips that
entirely and is GPU-compatible.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_classify_bounds: collect lb/ub to CPU (Array()) before findall to
avoid GPU scalar indexing (Metal InvalidIRError, PoCL findall error).

test_batch_opf_flat: wrap FlatNLPModel in WrapperNLPModel so MadNLP
gets CPU arrays for its internals (force_lower_triangular!, etc.);
GPU model computations still happen on-device via the inner model.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The per-batch loop called BatchExaModel.jtprod_nln! with 1D GPU slices
of one-batch length, but the KA prodhelper sparsity structures are built
for the full nb-batch flat vector. Pass the full flat x/v/Jtv vectors
directly to m.batch.jtprod_nln! so the KA kernel uses the correct extents.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>