Added submatrices support for the SCE method #651
denisilie94 wants to merge 2 commits into arcee-ai:main from
# Handle shape mismatch - resize to base dimensions
if t.shape != base_tensor.shape:
    # Slice tensor to match base_tensor dimensions
    t = t[: base_tensor.shape[0], : base_tensor.shape[1]]
Bug: Submatrix slicing crashes on 1D tensors
The submatrix slicing t[: base_tensor.shape[0], : base_tensor.shape[1]] assumes tensors are 2D. If a 1D tensor (like a bias vector or layer norm weight) has a shape mismatch, accessing base_tensor.shape[1] will raise an IndexError. The reference implementation in generalized_task_arithmetic.py avoids this by only applying 2D slicing to is_embed weights, which are guaranteed to be 2D embedding matrices. The SCE implementation lacks this guard and applies 2D indexing unconditionally to any shape-mismatched tensor.
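The guard this comment asks for can be sketched as follows, assuming PyTorch tensors as in the diff. `unguarded_slice` and `guarded_slice` are hypothetical names. `is_embed` stands in for the flag the reference implementation checks. The sketch covers only the 1D case raised here, not tensors that are smaller than the base.

```python
import torch


def unguarded_slice(t: torch.Tensor, base_tensor: torch.Tensor) -> torch.Tensor:
    # Same indexing as the diff: assumes both tensors are 2D.
    return t[: base_tensor.shape[0], : base_tensor.shape[1]]


def guarded_slice(t: torch.Tensor, base_tensor: torch.Tensor, is_embed: bool):
    # Hypothetical helper, intended for tensors whose shape differs from the
    # base: only 2D embedding matrices are sliced, and anything else is
    # returned as None so the caller can skip it.
    if is_embed and t.ndim == 2 and base_tensor.ndim == 2:
        return t[: base_tensor.shape[0], : base_tensor.shape[1]]
    return None


bias = torch.zeros(10)      # e.g. a bias vector with a mismatched length
base_bias = torch.zeros(8)

try:
    unguarded_slice(bias, base_bias)
except IndexError as exc:
    # base_tensor.shape has a single entry, so shape[1] is out of range.
    print("IndexError:", exc)

print(guarded_slice(bias, base_bias, is_embed=False))  # None
```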
    logging.warning(f"Using submatrix of tensor {idx}")

# Compute task vector (delta)
task_vector = t - base_tensor
Bug: Slicing fails when tensor is smaller than base
The submatrix slicing t[: base_tensor.shape[0], : base_tensor.shape[1]] only works when t is at least as large as base_tensor in every dimension. Slicing can only trim, never pad. If t is smaller in any dimension (e.g., a model with a smaller vocabulary), it keeps its smaller size there. The subsequent subtraction t - base_tensor on line 45 then raises a broadcasting error because the shapes still differ. The reference implementation in generalized_task_arithmetic.py handles this by skipping non-embedding tensors with mismatches entirely, whereas the SCE implementation unconditionally attempts to proceed.
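Here is a minimal reproduction and guard, assuming PyTorch tensors as in the diff. A tensor that is smaller than base_tensor keeps its size after slicing, and the subtraction then fails. `can_slice_to` is a hypothetical helper name. A caller would skip tensors for which it returns False instead of attempting the subtraction.

```python
import torch


def can_slice_to(t: torch.Tensor, base_tensor: torch.Tensor) -> bool:
    # Hypothetical helper. Slicing can only shrink a tensor, so every dimension
    # of t must be at least as large as the matching dimension of base_tensor.
    return t.ndim == base_tensor.ndim and all(
        a >= b for a, b in zip(t.shape, base_tensor.shape)
    )


base_tensor = torch.zeros(10, 4)   # e.g. base embedding: vocab 10, hidden 4
small = torch.ones(8, 4)           # a model with a smaller vocabulary

sliced = small[: base_tensor.shape[0], : base_tensor.shape[1]]
print(sliced.shape)                # torch.Size([8, 4]): unchanged by the slice

try:
    sliced - base_tensor
except RuntimeError as exc:
    # The shapes (8, 4) and (10, 4) do not broadcast.
    print("RuntimeError:", exc)

print(can_slice_to(small, base_tensor))  # False, so the tensor should be skipped
```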
Summary
Standardizes task vector extraction in SCE to use the same logic as the get_task_vectors function in generalized_task_arithmetic.py, adding submatrix support that was previously missing.
Problem
SCE currently uses its own independent implementation for extracting task vectors.
This independent implementation lacks submatrix support.
The get_task_vectors function in generalized_task_arithmetic.py already provides this functionality.
Code duplication creates maintenance overhead and feature inconsistency; at some point, SCE should reuse the generalized_task_arithmetic.py implementation directly.
Solution
Added submatrix support to SCE, mirroring the existing get_task_vectors function.
Ensures consistent logic across both implementations.
Adds submatrix support to SCE operations.
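The sketch below makes the intended shared behavior concrete. It follows the reference behavior the review comments attribute to get_task_vectors:

- cast to the base dtype;
- slice 2D embedding matrices that are at least as large as the base;
- skip other mismatches with a warning;
- stack only the task vectors that remain.

The function name `extract_task_vectors`, its signature, and passing `is_embed` as a plain boolean are assumptions for illustration. This is not mergekit's API.

```python
import logging
from typing import List

import torch


def extract_task_vectors(
    tensors: List[torch.Tensor], base_tensor: torch.Tensor, is_embed: bool
) -> torch.Tensor:
    """Hypothetical helper: stack deltas (t - base_tensor) for usable tensors."""
    task_vectors = []
    for idx, t in enumerate(tensors):
        # Normalize dtype before computing the delta.
        t = t.to(base_tensor.dtype)
        if t.shape != base_tensor.shape:
            # Only 2D embeddings at least as large as the base can be sliced.
            fits = (
                is_embed
                and t.ndim == 2
                and base_tensor.ndim == 2
                and t.shape[0] >= base_tensor.shape[0]
                and t.shape[1] >= base_tensor.shape[1]
            )
            if not fits:
                logging.warning(f"Skipping tensor {idx} due to size mismatch")
                continue
            t = t[: base_tensor.shape[0], : base_tensor.shape[1]]
            logging.warning(f"Using submatrix of tensor {idx}")
        task_vectors.append(t - base_tensor)
    if not task_vectors:
        # Nothing usable: return an empty stack with the base shape.
        return base_tensor.new_zeros((0, *base_tensor.shape))
    return torch.stack(task_vectors, dim=0)


base = torch.zeros(10, 4, dtype=torch.float32)
models = [
    torch.ones(10, 4, dtype=torch.bfloat16),  # same shape, different dtype
    torch.ones(12, 4),                        # larger vocab: sliced
    torch.ones(8, 4),                         # smaller vocab: skipped
]
deltas = extract_task_vectors(models, base, is_embed=True)
print(deltas.shape)  # torch.Size([2, 10, 4])
```

Returning an empty stack when no tensor is usable is a choice made for this sketch only. The actual sce_merge may handle that case differently.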
Note
Adds submatrix handling and dtype normalization to sce_merge, stacking only valid task vectors and warning when tensors are sliced.
sce (mergekit/merge_methods/sce.py):
Slices task tensors to base_tensor when shapes differ, with a warning per tensor.
Casts task tensors to base_tensor.dtype before computing task vectors.
Preserves the existing SCE masking (sce_mask), sign-consensus erasing, weighting, and merge logic.
Written by Cursor Bugbot for commit b318151.