
BatchExaModel: batch optimization with GPU support#216

Open
amontoison wants to merge 52 commits into main from am/batch

Conversation

@amontoison
Member

@amontoison amontoison commented Feb 5, 2026

Summary

Adds BatchExaModel for solving multiple independent optimization instances that share identical structure but differ in parameter values. All instances share one compiled expression pattern and are fused into a single model for efficient SIMD/GPU evaluation.

Core features

  • BatchExaCore(ns) — create a core for ns batch instances; variables, parameters, objectives, and constraints are defined once and replicated automatically
  • Batch NLP API — obj!, grad!, cons!, jac_coord!, hess_coord! operate on matrix-valued (dim, ns) inputs
  • FlattenNLPModel — wraps a BatchExaModel as a standard AbstractNLPModel{T, Vector{T}} for use with any NLPModels-compatible solver (e.g. Ipopt)
  • get_model(model) — returns the solver-ready flat model for batch, identity for regular models
  • var_indices(model, i) — extract per-instance solution from the flat solver output
  • GPU support — batch KA kernels for obj, grad, cons, jac, hess; race-free obj kernel via fill+reduce
  • Batched two-stage — TwoStageExaCore(ns; nbatch = Val(N)) combines batch and two-stage models (see the sketch below)
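
A minimal end-to-end sketch of the intended workflow, assembled from the features above. The macro argument forms and the parameter data are assumptions for illustration; exact syntax may differ in the final merge.

    using ExaModels, NLPModelsIpopt

    ns = 4                                       # number of batch instances
    c = BatchExaCore(ns)                         # one core; definitions replicate automatically
    θ = @add_par(c, rand(2, ns))                 # per-instance parameter values (assumed form)
    x = @add_var(c, 2, start = 0.0)              # two variables per instance (assumed form)
    @add_obj(c, (x[1] - θ[1])^2 + (x[2] - θ[2])^2)
    @add_con(c, x[1] + x[2] == 1.0)

    bm = ExaModel(c)                             # fused BatchExaModel
    fm = get_model(bm)                           # FlattenNLPModel, a standard AbstractNLPModel
    stats = ipopt(fm)                            # any NLPModels-compatible solver works
    x3 = stats.solution[var_indices(bm, 3)]      # per-instance solution of instance 3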

API additions

  • BatchExaCore, BatchExaModel, FlattenNLPModel
  • get_nbatch, get_model, var_indices, cons_block_indices
  • @add_var, @add_par, @add_obj, @add_con, @add_con!, @add_expr all work with batch cores
  • set_value! / get_value for updating parameters on the model (batch-aware, broadcasts across instances; see the example after this list)
  • set_parameter! deprecated in favor of set_value!
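
For example, updating a batch parameter (a sketch; the broadcast form follows the bullet above, the matrix form follows a commit later in this thread):

    set_value!(model, θ, [1.0, 2.0])        # broadcast: same values for every instance
    set_value!(model, θ, rand(npar, ns))    # per-instance: an npar × ns matrix, one column each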

Other changes

  • TwoStageExaCore / TwoStageExaModel with EachScenario() API, getter/setter helpers
  • add_con consolidated to ns... varargs (single method handles both explicit dims and generator)
  • append! fixed for N-dimensional arrays (fixes juliac COPSApp)
  • build_extension widened from AbstractVector to AbstractArray so batch models get KA acceleration
  • Pretty-print improvements: operator precedence parenthesization, bracket notation for integer indexing
  • Benchmark noise reduction with minimum-based estimator

Tests

  • BatchTest: construction, obj/grad/cons/jac/hess, bounds, error guards, Ipopt, set_value!, multidim vars, add_con!, add_expr, per-instance accessors, GPU backend loop
  • TwoStageTest: construction, evaluation, Ipopt (inequality/equality/multi-var), batched two-stage (construction/eval/Ipopt), getters/setters
  • GetterSetterTest: parameter/variable/constraint get/set API

Docs

  • New docs/src/batch.jl tutorial with worked examples
  • BatchNLPModels autodocs in API manual
  • Updated docs/src/parameters.jl to use set_value!

cc @michel2323

@michel2323 michel2323 marked this pull request as ready for review February 13, 2026 21:18
@github-actions
Contributor

github-actions Bot commented Feb 13, 2026

Your PR requires formatting changes to meet the project's style guidelines.

Please run:

julia --project=@runic -e 'using Pkg; Pkg.add("Runic")'
julia --project=@runic -e "using Runic; exit(Runic.main(ARGS))" -- --fix <files>

(or git runic main if you have the git wrapper installed)

Note: the full diff is omitted because it can exceed GitHub Actions input limits.

@amontoison amontoison changed the title from "First draft of BatchExaModel" to "BatchExaModel" on Feb 14, 2026
Comment thread on src/batch.jl (outdated)
@amontoison
Member Author

@michel2323 Do you know how we can adapt the objective function to return a vector of nscenario values instead of a scalar that sums all the objectives?
That is what we need for the new NLPModels.jl API.

@michel2323
Member

michel2323 commented Feb 18, 2026

Right. Forgot about the primal. How about the constraints?

I'll give it a try. Should be easy.

@amontoison
Member Author

For the constraints, we want a strided API where the constraints of each scenario are stored in the same vector.
That is like TwoStageExaModel and exactly what we already do.

@michel2323
Member

For the objective, I've added the API below. This should be aligned with what we discussed back here.

    obj(model::TwoStageExaModel, x_global, s) -> scalar

Evaluate the objective contribution of scenario s (for 1 ≤ s ≤ ns), or
the shared/design-only objective terms (for s = 0).

Per-scenario terms are Objective nodes whose iterator length is evenly divisible
by ns. Design-only terms (where length(itr) % ns != 0) are returned when s = 0.

The invariant obj(m, x) ≈ sum(obj(m, x, s) for s in 1:ns) + obj(m, x, 0) holds.
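
A quick check of that invariant, where m and x stand for a built TwoStageExaModel and a point of matching size (the getter name is assumed from the API additions above):

    ns = get_nbatch(m)   # scenario count (assumed getter)
    @assert obj(m, x) ≈ sum(obj(m, x, s) for s in 1:ns) + obj(m, x, 0)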

@michel2323
Member

We decided on a vectorized API obj(::BatchExaModel, x, result) where result is a vector of size ns.
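
In sketch form, with X the (nvar, ns) primal matrix and the signature taken from the comment above:

    result = zeros(ns)      # one objective value per instance
    obj(model, X, result)   # fills result in place; no scalar reduction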

@michel2323 michel2323 force-pushed the am/batch branch 6 times, most recently from a17fa88 to 019aa55 on March 3, 2026 14:52
Comment thread on src/batch.jl (outdated)
sshin23 and others added 4 commits April 18, 2026 00:10
# Conflicts:
#	Project.toml
#	docs/make.jl
#	src/ExaModels.jl
#	test/runtests.jl
- BatchExaCore(nbatch) creates ExaCore with Matrix storage
- EachInstance() marker for per-instance add_var/add_par/add_con
- Unified append! for Vector and Matrix (no separate methods)
- ExaModel(c) handles both batch and non-batch via _meta_dims dispatch
- Removed BatchExt: no fused_model, objbuffer, or hess_buffer storage
- get_model() constructs flat model on-the-fly for solver compatibility
- hess_perm computed on-the-fly in hess_coord!
- BatchNLPModelMeta stores per-instance nvar/ncon/nnzj/nnzh
- Relaxed ExaCore VT constraint from AbstractVector to AbstractArray

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nels, fix two-stage tag propagation

- Remove batch.jl; merge all batch logic into nlp.jl
- Remove LinAlgTest (dead code)
- Batch s-functions (gradient!, sjacobian!, shessian!) use double for-loop (batch × itr)
- Thread `backend` through batch evaluation chain for KA dispatch
- Add OffsetVector for zero-allocation batch offset indexing in KA kernels
- Add batch KA kernels (kerf_batch, kerg_batch, kerj_batch, kerh_batch, kerh2_batch)
  launching single kernel over nb*nitr work items
- Support per-instance lvar/uvar/start via matrix args to @add_var
- Add append! method for AbstractMatrix
- Fix two-stage tag propagation: capture append! return value and rebuild tag
- Move BatchExaModel getters/setters after type alias definition

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
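
A schematic of the single-launch pattern this commit describes: one work item per (instance, iterator) pair, nb * nitr items in total. The kernel name and body below are illustrative only.

    using KernelAbstractions

    @kernel function kerf_batch_sketch(y, @Const(X), nitr)
        I = @index(Global, Linear)      # I ranges over 1:nb*nitr
        b = (I - 1) ÷ nitr + 1          # batch instance
        i = (I - 1) % nitr + 1          # iterator element within the instance
        @inbounds y[i, b] = X[i, b]^2   # placeholder per-element evaluation
    end

    # launched as: kerf_batch_sketch(backend)(y, X, nitr; ndrange = nb * nitr)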
@github-actions
Contributor

github-actions Bot commented Apr 18, 2026

Benchmark Results


sshin23 and others added 3 commits April 18, 2026 23:28
BatchExaModel.jac_structure! and hess_structure! now pass getbackend(m)
to backend-aware _jac_structure!/_obj_hess_structure!/_con_hess_structure!
so the KA extension can intercept and use GPU kernels instead of
scalar-indexing f.itr arrays.

Also converts obj_weight to model type T in batch hess_coord! to prevent
Metal InvalidIRError from Float64 values on Float32 models.

FlatNLPModel structure queries now use device-native temp arrays for the
per-instance query, then copy to CPU for replication.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The KA extension's _jac_structure!, _obj_hess_structure!, and
_con_hess_structure! were local functions, not methods on ExaModels.
BatchExaModel.jac_structure! now calls ExaModels._jac_structure! with
getbackend(m), so the extension must add methods to ExaModels' functions
for GPU dispatch to work.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three calls in the KAExtension constructor at lines 61/64/65 still used
unqualified _jac_structure!, _obj_hess_structure!, _con_hess_structure!
instead of ExaModels._ prefixed versions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add _reshape_to_match to correctly handle array bounds regardless of
input shape. For non-batch (Vector target), matrices are vecd. For
batch (Matrix target), matrices are reshaped to match trailing dims.
This fixes the DimensionMismatch when passing matrix lvar/uvar to
batch models, while preserving the existing behavior for non-batch
models that receive matrix-shaped comprehension results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sshin23 and others added 24 commits April 21, 2026 19:27
- Rename _batch suffix from KA kernels (kerg, kerj, kerh, kerh2, kerg_sparse)
- Remove AbstractVector constraints from GPU dispatch functions so NaNSource works for structure detection
- Add GPU obj! override for BatchExaModel
- Add batched CPU fallback for sgradient!
- Add kerg_sparse kernel for GPU sparse gradient

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
OffsetVector is not <:AbstractVector, so grpass/jrpass/hrpass/hdrpass
structure-detection specializations were not matching — the generic
value-accumulation methods fired instead, causing Tuple += Float64
MethodError in GPU kernels.

Fix: parameterize OffsetVector{T,V} so T is the element type, then add
matching specializations for each structure-detection pass function.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
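
For reference, the fix in miniature: the element type becomes a leading parameter so the pass functions can dispatch on it (field layout assumed):

    struct OffsetVector{T, V <: AbstractVector{T}}   # T is the element type
        data::V
        offset::Int
    end
    Base.getindex(v::OffsetVector, i::Integer) = v.data[v.offset + i]
    Base.eltype(::Type{OffsetVector{T, V}}) where {T, V} = T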
Null, SumNode, ProdNode use eltype(T) where T is the inner type of
AdjointNodeSource/SecondAdjointNodeSource. With batch kernels wrapping
x/θ in OffsetVector, T becomes OffsetVector{Float64,...} and eltype(T)
was undefined, causing jl_f_throw_methoderror in GPU compilation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…spatch

The GPU obj! for BatchExaModel was ambiguous with the CPU fallback in
nlp.jl because neither constrained both VT and E. Adding VT<:AbstractMatrix{T}
makes the GPU method strictly more specific.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…erModels, SpecialFunctions to weakdeps

These should not be hard dependencies of ExaModels core.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…elsJuMP, NLPModelsTest, Percival, PowerModels)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Delete src/BatchNLPModels.jl; move FlatNLPModel and generic get_nbatch
  overloads into utils.jl
- Remove abstract type AbstractExaModel; ExaModel now directly subtypes
  NLPModels.AbstractNLPModel
- Update ExaModelsKernelAbstractions.jl and test/BatchTest accordingly
- Remove BatchNLPModels section from docs/src/core.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…h = 1

ExaCore and BatchExaCore now take `batch = Val(false)` (non-batch, default)
or `batch = Val(true)` (batch) as a boolean flag, with `nbatch` as a plain
integer instead of Val-wrapped. _make_exacore dispatches on Val{false}/Val{true}.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
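
The dispatch pattern in miniature (the storage choice below is illustrative, not the actual _make_exacore body):

    _storage(::Val{false}, ::Type{T}, n, nbatch) where {T} = Vector{T}(undef, n)
    _storage(::Val{true},  ::Type{T}, n, nbatch) where {T} = Matrix{T}(undef, n, nbatch)

    _storage(Val(false), Float64, 4, 3)   # 4-element Vector{Float64}
    _storage(Val(true),  Float64, 4, 3)   # 4×3 Matrix{Float64}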
…shadowing

Adds set_value!(model, param, values::AbstractMatrix) for batch models so
per-instance parameters can be set with an npar×ns matrix. Uses Base.size
explicitly to avoid shadowing by the local size(ns) function.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
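
The shadowing workaround in schematic form (the body is illustrative, not the actual method):

    function set_value_sketch!(dest::AbstractMatrix, values::AbstractMatrix)
        # Base.size is qualified because a local binding named `size`
        # in the enclosing scope would otherwise shadow it.
        Base.size(values) == Base.size(dest) || throw(DimensionMismatch("expected npar × ns"))
        copyto!(dest, values)
    end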
…e specializations

Output arrays (y, y1) in KA batch kernels now use view(arr, off+1:off+n) instead
of OffsetVector. Since SubArray <: AbstractVector, the existing AbstractVector
dispatch in gradient/jacobian/hessian covers them without extra overloads.
OffsetVector is kept only for input sources (x, θ) which may be NaNSource.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Delete OffsetVector struct from graph.jl
- Make NaNSource <: AbstractVector{T} with size (typemax(Int),) and
  unconstrained getindex so view() wraps it safely under @inbounds
- Replace all OffsetVector(x/θ, off) in KA kernels with view(arr, off+1:off+n)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
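
The trick in sketch form (definition assumed; the real NaNSource lives in ExaModels' graph code):

    struct NaNSource{T} <: AbstractVector{T} end
    Base.size(::NaNSource) = (typemax(Int),)              # any positive index is in-bounds
    Base.getindex(::NaNSource{T}, i::Integer) where {T} = T(NaN)

    v = view(NaNSource{Float64}(), 11:15)   # bounds check passes for any range
    v[1]                                    # NaN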
Add ExaModel{T, <:AbstractVector} specializations for obj, cons_nln!,
grad!, jac_coord!, hess_coord! that bypass the batch view-creating loop
(which was introduced for the batched ExaModel{T, <:AbstractMatrix} path).

The batch loop created SubArray views even for nb=1, degrading performance
3–7× on jac (elec) and 1.2–4× on obj/grad/hess. Non-batch models now
dispatch to the original direct-vector kernel paths with no overhead.
Batch (AbstractMatrix) models are unaffected.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…A ambiguity

The <:AbstractVector dispatch for obj/cons_nln!/grad!/jac_coord!/hess_coord!
was too broad: it matched non-batch GPU models (CuVector<:AbstractVector),
creating method ambiguity with the KA extension's E<:KAExtension dispatches,
and silently dropping the GPU backend for jac/hess on those models.

Restrict all fast direct-path specializations to ExaModel{T,<:AbstractVector,Nothing}
(CPU-only). GPU non-batch models (E<:KAExtension, VT<:AbstractVector) now
fall through to the general ExaModel{T} path which passes the backend through,
hitting the KA GPU kernels as before.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
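
The specificity issue in miniature, using a stand-in type (the real ExaModel has more type parameters):

    struct M{T, VT, E} end   # stand-in for ExaModel{T, VT, E, ...}

    path(::M{T, <:AbstractVector, Nothing}) where {T} = :cpu_direct   # CPU, no extension
    path(::M{T}) where {T} = :backend_aware                           # KAExtension etc.

    path(M{Float64, Vector{Float64}, Nothing}())   # :cpu_direct
    path(M{Float64, Vector{Float64}, Symbol}())    # :backend_aware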
…s_coord!

These were dropped in a previous edit, causing MethodAmbiguity between
ExaModel{T}(AbstractVecOrMat, AbstractVecOrMat) and NLPModels'
AbstractNLPModel(AbstractVector, AbstractVector) when called with
plain vector arguments.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rray

KAExtension's internal buffers are always 1D (created via similar(c.x0, n)),
but c.x0 can be a matrix for batch models. AbstractArray{T} is the correct
constraint since similar(::AbstractMatrix, n::Int) returns a 1D array which
satisfies AbstractArray{T} but the old AbstractVector{T} was misleadingly
restrictive.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… up docs/tests

- Remove set_parameter! (deprecated); replace with set_value! in tests and docs
- TwoStageExaCore now accepts nbatch kwarg for batch scenario support
- Update parameters.md to use @add_par/@add_var/@add_obj/@add_con macros
- Move MadNLP/PowerModels/Percival/NLPModelsTest to main deps in Project.toml
- Whitespace cleanup in LuksanVlcekApp.jl

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tension

- Add jac_structure!/hess_structure! overrides for KAExtension models to route
  to backend-aware GPU kernels instead of CPU scalar iteration over GPU arrays
- Relax jrpass/hrpass/hdrpass dispatch from y1::V,y2::V (same type) to
  y1::V1,y2::V2 (same eltype) so view+array pairs dispatch correctly in kerj/kerh
- Add obj/grad! overrides for VT<:AbstractMatrix + KAExtension to resolve method
  ambiguity and throw ArgumentError for batch models receiving vector input
- Revert KAExtension VT param to AbstractVector (buffers are always 1D)
- Fix kersyspmv2 parse error: idx = @index(Global)0 -> idx = @index(Global)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Use offset0(f, itr, k) in kerf_con_aug_batch to compute the correct
constraint index for ConstraintAugmentation. The previous approach
passed con.oa (a scalar conbuffer offset) as an array and indexed into
it, producing wrong indices for nb > 1 batches and causing solver
restoration failures.

Also add test_batch_opf_flat: a 3-batch AC OPF test (case3_lmbd) with
augmented power balance constraints solved via FlatNLPModel + MadNLP.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
MadNLP's default MakeParameter treatment calls findall on GPU boolean
arrays which triggers scalar indexing errors. RelaxBound skips that
entirely and is GPU-compatible.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_classify_bounds: collect lb/ub to CPU (Array()) before findall to
avoid GPU scalar indexing (Metal InvalidIRError, PoCL findall error).

test_batch_opf_flat: wrap FlatNLPModel in WrapperNLPModel so MadNLP
gets CPU arrays for its internals (force_lower_triangular!, etc.);
GPU model computations still happen on-device via the inner model.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The per-batch loop called BatchExaModel.jtprod_nln! with 1D GPU slices
of one-batch length, but the KA prodhelper sparsity structures are built
for the full nb-batch flat vector. Pass the full flat x/v/Jtv vectors
directly to m.batch.jtprod_nln! so the KA kernel uses the correct extents.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>