BatchExaModel: batch optimization with GPU support #216
amontoison wants to merge 52 commits into main
Conversation
Your PR requires formatting changes to meet the project's style guidelines. Please run:
julia --project=@runic -e 'using Pkg; Pkg.add("Runic")'
julia --project=@runic -e "using Runic; exit(Runic.main(ARGS))" -- --fix <files>
Note: the full diff is omitted because it can exceed GitHub Actions input limits.
@michel2323 Do you know how we can adapt the objective function to return a vector of objective values?
Right. Forgot about the primal. How about the constraints? I'll give it a try. Should be easy.
For the constraints, we want a strided API where the constraints of each scenario are stored in the same vector.
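One way to picture the strided layout (a minimal sketch; `con_range`, `cons_batch!`, and `toycon!` are hypothetical names, not the ExaModels API):

```julia
# Strided constraint storage: the constraint values of all scenarios live
# in one flat vector; scenario s occupies the contiguous block
# (s-1)*ncon+1 : s*ncon.
con_range(ncon::Int, s::Int) = (s - 1) * ncon + 1 : s * ncon

function cons_batch!(c_flat::AbstractVector, xs::AbstractMatrix, con!)
    ncon = length(c_flat) ÷ size(xs, 2)
    for s in axes(xs, 2)
        con!(view(c_flat, con_range(ncon, s)), view(xs, :, s))
    end
    return c_flat
end

# toy constraint: c = [x1 + x2, x1 * x2]
toycon!(c, x) = (c[1] = x[1] + x[2]; c[2] = x[1] * x[2]; c)

xs = [1.0 3.0; 2.0 4.0]            # 2 variables × 2 scenarios
c = cons_batch!(zeros(4), xs, toycon!)  # → [3.0, 2.0, 7.0, 12.0]
```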
For the objective, I've added the API below. This should be aligned with what was discussed back here: `obj(model::TwoStageExaModel, x_global, s) → scalar` evaluates the objective contribution of scenario `s`.
We decided on a vectorized API.
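A sketch of what "vectorized" means here (`obj_batch!` is a hypothetical name, not the agreed ExaModels signature): instead of one scalar per call, evaluate every scenario into a preallocated vector.

```julia
# Vectorized objective evaluation: one entry of fs per scenario (column of xs).
function obj_batch!(fs::AbstractVector, xs::AbstractMatrix, f)
    for s in axes(xs, 2)
        fs[s] = f(view(xs, :, s))
    end
    return fs
end

xs = [1.0 2.0 3.0; 1.0 2.0 3.0]                    # 2 variables × 3 scenarios
fs = obj_batch!(zeros(3), xs, x -> sum(abs2, x))   # → [2.0, 8.0, 18.0]
```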
Force-pushed a17fa88 to 019aa55.
# Conflicts:
#   Project.toml
#   docs/make.jl
#   src/ExaModels.jl
#   test/runtests.jl
- BatchExaCore(nbatch) creates ExaCore with Matrix storage
- EachInstance() marker for per-instance add_var/add_par/add_con
- Unified append! for Vector and Matrix (no separate methods)
- ExaModel(c) handles both batch and non-batch via _meta_dims dispatch
- Removed BatchExt: no fused_model, objbuffer, or hess_buffer storage
- get_model() constructs the flat model on-the-fly for solver compatibility
- hess_perm computed on-the-fly in hess_coord!
- BatchNLPModelMeta stores per-instance nvar/ncon/nnzj/nnzh
- Relaxed ExaCore VT constraint from AbstractVector to AbstractArray

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nels, fix two-stage tag propagation

- Remove batch.jl; merge all batch logic into nlp.jl
- Remove LinAlgTest (dead code)
- Batch s-functions (gradient!, sjacobian!, shessian!) use a double for-loop (batch × itr)
- Thread `backend` through the batch evaluation chain for KA dispatch
- Add OffsetVector for zero-allocation batch offset indexing in KA kernels
- Add batch KA kernels (kerf_batch, kerg_batch, kerj_batch, kerh_batch, kerh2_batch) launching a single kernel over nb*nitr work items
- Support per-instance lvar/uvar/start via matrix args to @add_var
- Add append! method for AbstractMatrix
- Fix two-stage tag propagation: capture the append! return value and rebuild the tag
- Move BatchExaModel getters/setters after the type alias definition

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
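The double for-loop (batch × itr) the commit mentions can be sketched as follows (a minimal CPU fallback; `sgradient_batch!` and `toygrad!` are hypothetical names, not the ExaModels internals):

```julia
# Outer loop over batch instances (columns), inner loop over the iterator
# of expression instantiations; each term accumulates into its instance's
# gradient column.
function sgradient_batch!(G::AbstractMatrix, X::AbstractMatrix, itr, grad!)
    for b in axes(X, 2)       # batch loop
        for p in itr          # per-expression loop
            grad!(view(G, :, b), view(X, :, b), p)
        end
    end
    return G
end

# toy term gradient: term p contributes 2*x[p] to gradient component p
toygrad!(g, x, p) = (g[p] += 2 * x[p]; g)

X = [1.0 4.0; 2.0 5.0; 3.0 6.0]         # 3 variables × 2 batch instances
G = sgradient_batch!(zeros(3, 2), X, 1:3, toygrad!)   # G == 2 .* X
```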
Benchmark Results
BatchExaModel.jac_structure! and hess_structure! now pass getbackend(m) to backend-aware _jac_structure!/_obj_hess_structure!/_con_hess_structure! so the KA extension can intercept and use GPU kernels instead of scalar-indexing f.itr arrays. Also converts obj_weight to model type T in batch hess_coord! to prevent Metal InvalidIRError from Float64 values on Float32 models. FlatNLPModel structure queries now use device-native temp arrays for the per-instance query, then copy to CPU for replication. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The KA extension's _jac_structure!, _obj_hess_structure!, and _con_hess_structure! were local functions, not methods on ExaModels. BatchExaModel.jac_structure! now calls ExaModels._jac_structure! with getbackend(m), so the extension must add methods to ExaModels' functions for GPU dispatch to work. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three calls in the KAExtension constructor at lines 61/64/65 still used unqualified _jac_structure!, _obj_hess_structure!, _con_hess_structure! instead of ExaModels._ prefixed versions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add _reshape_to_match to correctly handle array bounds regardless of input shape. For non-batch (Vector target), matrices are flattened with vec. For batch (Matrix target), matrices are reshaped to match the trailing dims. This fixes the DimensionMismatch when passing matrix lvar/uvar to batch models, while preserving the existing behavior for non-batch models that receive matrix-shaped comprehension results. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
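A minimal sketch of the shape normalization described above (this mirrors the stated behavior only; the real `_reshape_to_match` lives in ExaModels and may differ):

```julia
# Non-batch target (Vector): flatten matrix input with vec.
# Batch target (Matrix): reshape input so the trailing (instance) dim matches.
_reshape_to_match(x::AbstractArray, ::AbstractVector) = vec(x)
_reshape_to_match(x::AbstractArray, target::AbstractMatrix) =
    reshape(x, :, size(target, 2))

lvar = reshape(collect(1.0:6.0), 2, 3)
s1 = size(_reshape_to_match(lvar, zeros(6)))     # (6,)  for a non-batch model
s2 = size(_reshape_to_match(lvar, zeros(2, 3)))  # (2, 3) for a 3-instance batch
```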
- Remove the _batch suffix from KA kernels (kerg, kerj, kerh, kerh2, kerg_sparse)
- Remove AbstractVector constraints from GPU dispatch functions so NaNSource works for structure detection
- Add GPU obj! override for BatchExaModel
- Add batched CPU fallback for sgradient!
- Add kerg_sparse kernel for GPU sparse gradient

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
OffsetVector is not <:AbstractVector, so grpass/jrpass/hrpass/hdrpass
structure-detection specializations were not matching — the generic
value-accumulation methods fired instead, causing Tuple += Float64
MethodError in GPU kernels.
Fix: parameterize OffsetVector{T,V} so T is the element type, then add
matching specializations for each structure-detection pass function.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
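The parameterization fix can be sketched as follows (a simplified stand-in; the real `OffsetVector` lived in ExaModels' graph.jl and was deliberately not a subtype of `AbstractVector`):

```julia
# With T as the first type parameter, eltype and element-type-based
# specializations can be recovered from the wrapper's type, fixing the
# undefined eltype(T) hit by the adjoint-node sources in GPU compilation.
struct OffsetVector{T, V <: AbstractVector{T}}
    data::V
    offset::Int
end
Base.getindex(v::OffsetVector, i::Int) = v.data[v.offset + i]
Base.eltype(::Type{OffsetVector{T, V}}) where {T, V} = T

v = OffsetVector([10.0, 20.0, 30.0, 40.0], 1)
eltype(typeof(v))   # Float64, recovered from the type parameter
v[2]                # 30.0: index shifted by the offset
```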
Null, SumNode, ProdNode use eltype(T) where T is the inner type of
AdjointNodeSource/SecondAdjointNodeSource. With batch kernels wrapping
x/θ in OffsetVector, T becomes OffsetVector{Float64,...} and eltype(T)
was undefined, causing jl_f_throw_methoderror in GPU compilation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…spatch
The GPU obj! for BatchExaModel was ambiguous with the CPU fallback in
nlp.jl because neither constrained both VT and E. Adding VT<:AbstractMatrix{T}
makes the GPU method strictly more specific.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…erModels, SpecialFunctions to weakdeps

These should not be hard dependencies of ExaModels core.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…elsJuMP, NLPModelsTest, Percival, PowerModels)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Delete src/BatchNLPModels.jl; move FlatNLPModel and generic get_nbatch overloads into utils.jl
- Remove abstract type AbstractExaModel; ExaModel now directly subtypes NLPModels.AbstractNLPModel
- Update ExaModelsKernelAbstractions.jl and test/BatchTest accordingly
- Remove BatchNLPModels section from docs/src/core.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…h = 1
ExaCore and BatchExaCore now take `batch = Val(false)` (non-batch, default)
or `batch = Val(true)` (batch) as a Val-wrapped flag, with `nbatch` as a plain
integer instead of Val-wrapped. _make_exacore dispatches on Val{false}/Val{true}.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
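The Val-based dispatch described above can be sketched like this (return values are placeholders for illustration; only the dispatch pattern matches the commit):

```julia
# Dispatching a constructor helper on a Val-wrapped batch flag: the flag
# selects the method at compile time, while nbatch stays a plain integer.
_make_exacore(::Val{false}, nbatch::Int) = "non-batch core"
_make_exacore(::Val{true},  nbatch::Int) = "batch core with $nbatch instances"

a = _make_exacore(Val(false), 1)   # → "non-batch core"
b = _make_exacore(Val(true), 8)    # → "batch core with 8 instances"
```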
…shadowing

Adds set_value!(model, param, values::AbstractMatrix) for batch models so per-instance parameters can be set with an npar×ns matrix. Uses Base.size explicitly to avoid shadowing by the local size(ns) function.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e specializations

Output arrays (y, y1) in KA batch kernels now use view(arr, off+1:off+n) instead of OffsetVector. Since SubArray <: AbstractVector, the existing AbstractVector dispatch in gradient/jacobian/hessian covers them without extra overloads. OffsetVector is kept only for input sources (x, θ), which may be NaNSource.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Delete OffsetVector struct from graph.jl
- Make NaNSource <: AbstractVector{T} with size (typemax(Int),) and
unconstrained getindex so view() wraps it safely under @inbounds
- Replace all OffsetVector(x/θ, off) in KA kernels with view(arr, off+1:off+n)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
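The key property the view-based replacement relies on, in a minimal sketch:

```julia
# A view into an instance's slice of the flat output array is a SubArray,
# which already subtypes AbstractVector, so the existing vector dispatch
# applies; writes through the view land in the parent array.
y = zeros(12)
off, n = 4, 4
yk = view(y, off + 1 : off + n)   # instance k's slice of the flat output
isvec = yk isa AbstractVector     # true: SubArray <: AbstractVector
yk .= 1.0
total = sum(y)                    # 4.0: only the slice was written
```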
Add ExaModel{T, <:AbstractVector} specializations for obj, cons_nln!,
grad!, jac_coord!, hess_coord! that bypass the batch view-creating loop
(which was introduced for the batched ExaModel{T, <:AbstractMatrix} path).
The batch loop created SubArray views even for nb=1, degrading performance
3–7× on jac (elec) and 1.2–4× on obj/grad/hess. Non-batch models now
dispatch to the original direct-vector kernel paths with no overhead.
Batch (AbstractMatrix) models are unaffected.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…A ambiguity
The <:AbstractVector dispatch for obj/cons_nln!/grad!/jac_coord!/hess_coord!
was too broad: it matched non-batch GPU models (CuVector<:AbstractVector),
creating method ambiguity with the KA extension's E<:KAExtension dispatches,
and silently dropping the GPU backend for jac/hess on those models.
Restrict all fast direct-path specializations to ExaModel{T,<:AbstractVector,Nothing}
(CPU-only). GPU non-batch models (E<:KAExtension, VT<:AbstractVector) now
fall through to the general ExaModel{T} path which passes the backend through,
hitting the KA GPU kernels as before.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
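The dispatch layering described above can be sketched with stand-in types (`ToyModel` and `objpath` are hypothetical; only the specificity pattern mirrors the commit):

```julia
# The CPU fast path must pin the extension parameter to Nothing so that
# GPU models (which carry a KA extension in E) fall through to the
# backend-aware general method instead of silently taking the CPU path.
struct ToyModel{T, VT, E} end    # stand-in for ExaModel{T, VT, E}

objpath(::ToyModel{T, <:AbstractVector, Nothing}) where {T} = "CPU fast path"
objpath(::ToyModel{T}) where {T} = "general path (backend-aware)"

p1 = objpath(ToyModel{Float64, Vector{Float64}, Nothing}())  # CPU fast path
p2 = objpath(ToyModel{Float64, Vector{Float64}, Int}())      # general path
```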
…s_coord!
These were dropped in a previous edit, causing MethodAmbiguity between
ExaModel{T}(AbstractVecOrMat, AbstractVecOrMat) and NLPModels'
AbstractNLPModel(AbstractVector, AbstractVector) when called with
plain vector arguments.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rray
KAExtension's internal buffers are always 1D (created via similar(c.x0, n)),
but c.x0 can be a matrix for batch models. AbstractArray{T} is the correct
constraint: similar(::AbstractMatrix, n::Int) returns a 1-D array, which
satisfies AbstractArray{T}, so the old AbstractVector{T} bound was
misleadingly restrictive.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
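The `similar` behavior the commit relies on, shown in isolation:

```julia
# similar with an explicit length returns a 1-D array of the same element
# type even when the source array is a matrix (e.g. a batch c.x0).
x0 = zeros(Float32, 3, 4)   # batch x0: a matrix
buf = similar(x0, 5)        # internal buffer: 1-D, Float32
nd = ndims(buf)             # 1
et = eltype(buf)            # Float32
```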
… up docs/tests

- Remove set_parameter! (deprecated); replace with set_value! in tests and docs
- TwoStageExaCore now accepts nbatch kwarg for batch scenario support
- Update parameters.md to use @add_par/@add_var/@add_obj/@add_con macros
- Move MadNLP/PowerModels/Percival/NLPModelsTest to main deps in Project.toml
- Whitespace cleanup in LuksanVlcekApp.jl

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tension

- Add jac_structure!/hess_structure! overrides for KAExtension models to route to backend-aware GPU kernels instead of CPU scalar iteration over GPU arrays
- Relax jrpass/hrpass/hdrpass dispatch from y1::V, y2::V (same type) to y1::V1, y2::V2 (same eltype) so view+array pairs dispatch correctly in kerj/kerh
- Add obj/grad! overrides for VT<:AbstractMatrix + KAExtension to resolve method ambiguity and throw ArgumentError for batch models receiving vector input
- Revert KAExtension VT param to AbstractVector (buffers are always 1D)
- Fix kersyspmv2 parse error: idx = @index(Global)0 -> idx = @index(Global)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Use offset0(f, itr, k) in kerf_con_aug_batch to compute the correct constraint index for ConstraintAugmentation. The previous approach passed con.oa (a scalar conbuffer offset) as an array and indexed into it, producing wrong indices for nb > 1 batches and causing solver restoration failures. Also add test_batch_opf_flat: a 3-batch AC OPF test (case3_lmbd) with augmented power balance constraints solved via FlatNLPModel + MadNLP. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
MadNLP's default MakeParameter treatment calls findall on GPU boolean arrays which triggers scalar indexing errors. RelaxBound skips that entirely and is GPU-compatible. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_classify_bounds: collect lb/ub to CPU (Array()) before findall to avoid GPU scalar indexing (Metal InvalidIRError, PoCL findall error). test_batch_opf_flat: wrap FlatNLPModel in WrapperNLPModel so MadNLP gets CPU arrays for its internals (force_lower_triangular!, etc.); GPU model computations still happen on-device via the inner model. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The per-batch loop called BatchExaModel.jtprod_nln! with 1D GPU slices of one-batch length, but the KA prodhelper sparsity structures are built for the full nb-batch flat vector. Pass the full flat x/v/Jtv vectors directly to m.batch.jtprod_nln! so the KA kernel uses the correct extents. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary

Adds `BatchExaModel` for solving multiple independent optimization instances that share identical structure but differ in parameter values. All instances share one compiled expression pattern and are fused into a single model for efficient SIMD/GPU evaluation.

Core features

- `BatchExaCore(ns)`: create a core for `ns` batch instances; variables, parameters, objectives, and constraints are defined once and replicated automatically
- `obj!`, `grad!`, `cons!`, `jac_coord!`, `hess_coord!` operate on matrix-valued `(dim, ns)` inputs
- `FlattenNLPModel`: wraps a `BatchExaModel` as a standard `AbstractNLPModel{T, Vector{T}}` for use with any NLPModels-compatible solver (e.g. Ipopt)
- `get_model(model)`: returns the solver-ready flat model for batch, identity for regular models
- `var_indices(model, i)`: extract the per-instance solution from the flat solver output
- `TwoStageExaCore(ns; nbatch = Val(N))` combines batch and two-stage models

API additions

- `BatchExaCore`, `BatchExaModel`, `FlattenNLPModel`
- `get_nbatch`, `get_model`, `var_indices`, `cons_block_indices`
- `@add_var`, `@add_par`, `@add_obj`, `@add_con`, `@add_con!`, `@add_expr` all work with batch cores
- `set_value!`/`get_value` for updating parameters on the model (batch-aware, broadcasts across instances)
- `set_parameter!` deprecated in favor of `set_value!`

Other changes

- `TwoStageExaCore`/`TwoStageExaModel` with `EachScenario()` API, getter/setter helpers
- `add_con` consolidated to `ns...` varargs (a single method handles both explicit dims and generator)
- `append!` fixed for N-dimensional arrays (fixes juliac COPSApp)
- `build_extension` widened from `AbstractVector` to `AbstractArray` so batch models get KA acceleration

Tests

- BatchTest: construction, obj/grad/cons/jac/hess, bounds, error guards, Ipopt, `set_value!`, multidim vars, `add_con!`, `add_expr`, per-instance accessors, GPU backend loop
- TwoStageTest: construction, evaluation, Ipopt (inequality/equality/multi-var), batched two-stage (construction/eval/Ipopt), getters/setters
- GetterSetterTest: parameter/variable/constraint get/set API

Docs

- `docs/src/batch.jl` tutorial with worked examples
- `BatchNLPModels` autodocs in the API manual
- `docs/src/parameters.jl` updated to use `set_value!`

cc @michel2323
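As a sketch of the flat layout that `FlattenNLPModel`/`var_indices` imply (`flat_indices` is a hypothetical helper, and the contiguous per-instance, column-major layout is an assumption for illustration):

```julia
# Batch variables as a (nvar, ns) matrix flatten column-major into one
# vector; instance i's block is then recovered by a contiguous index range.
flat_indices(nvar::Int, i::Int) = (i - 1) * nvar + 1 : i * nvar

X = [1.0 4.0; 2.0 5.0; 3.0 6.0]   # 3 variables × 2 instances
x_flat = vec(X)                    # column-major flattening
sol2 = x_flat[flat_indices(3, 2)]  # → [4.0, 5.0, 6.0], instance 2's block
```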