fix problems in the FOEM data processing pipeline #2659
Qubitium merged 1 commit into ModelCloud:main
Conversation
…teraction with GPTAQ
@Xingyu-Zheng LGTM! Thanks.
@Xingyu-Zheng I just remembered why GPTAQ had issues with MoE. Calibration data is fed to the model serially and becomes ordered input to each module, which generates output. GPTAQ's processing assumed that the captured input arrives in that same order. The problem with MoE routing is that an input of [a, b, c, e] may be seen by an MoE expert module as only [b, e], so there was no safe way to actually match the captured inputs back to their original positions.
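To make the ordering problem concrete, here is a minimal illustration (not GPTQModel code; the router and token names are made up) of how an MoE router hands each expert only a sub-sequence of the calibration batch, so positional matching against the full batch fails:

```python
# Illustration: a router dispatches each token to a subset of experts, so an
# expert module sees only a sub-sequence of the original calibration input.

def route(tokens, router):
    """Return, per expert id, the ordered sub-list of tokens it receives."""
    per_expert = {}
    for tok in tokens:
        for expert in router(tok):
            per_expert.setdefault(expert, []).append(tok)
    return per_expert

# Hypothetical 2-expert router, chosen to mirror the example in the comment:
tokens = ["a", "b", "c", "e"]
router = lambda tok: [0] if tok in ("b", "e") else [1]

seen = route(tokens, router)
print(seen[0])  # expert 0 sees only ['b', 'e'], not the full [a, b, c, e]
```

Because expert 0's captured inputs are `["b", "e"]` while the batch was `["a", "b", "c", "e"]`, index `i` of the capture does not correspond to index `i` of the batch, which is exactly the assumption GPTAQ's pipeline relied on.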
@Qubitium I haven’t studied MoE models in depth, nor have I carefully gone through the implementation details in GPTQModel. However, here is my current hypothesis. GPTAQ assumes a dual-stream data flow, where one stream corresponds to the FP model and the other to the progressively quantized model. As earlier layers become quantized, the routing decisions in later MoE layers may start to diverge between the two streams. For example, the FP model might route tokens {a, c} to expert 1, while the quantized model routes {b, d, e} to the same expert. As a result, when GPTAQ performs calibration on expert 1, the inputs collected from the two streams no longer correspond. If this hypothesis is correct, there may be several possible solutions:
I should note that I am not deeply familiar with MoE mechanisms, so these are only preliminary thoughts. I hope they might still provide some useful insights.
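The dual-stream divergence described above can be sketched as follows. The two routers here are hypothetical stand-ins (not GPTAQ internals), hard-coded to reproduce the {a, c} vs {b, d, e} example:

```python
# Illustration of the dual-stream hypothesis: once earlier layers are
# quantized, the quantized stream's router may dispatch different tokens to
# an expert than the FP stream's router, so the per-expert calibration pairs
# no longer line up.

def tokens_for_expert(tokens, router, expert):
    return [t for t in tokens if router(t) == expert]

tokens = ["a", "b", "c", "d", "e"]
fp_router = lambda t: 1 if t in ("a", "c") else 2       # FP-stream routing
q_router = lambda t: 1 if t in ("b", "d", "e") else 2   # diverged routing

fp_inputs = tokens_for_expert(tokens, fp_router, 1)
q_inputs = tokens_for_expert(tokens, q_router, 1)

# Expert 1's calibration pairs mismatch in both content and length:
print(fp_inputs, q_inputs)  # ['a', 'c'] ['b', 'd', 'e']
```

Under this hypothesis, any correction term that pairs the FP-stream capture with the quantized-stream capture element-by-element would be comparing unrelated tokens.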
When testing FOEM on Qwen3.5-35B-A3B, the error caused by reusing GPTAQ’s data processing pipeline occurs even earlier than the previous issue I encountered in `gptqmodel/quantization/foem.py`. In this case, simply setting `alpha = 0` does not resolve the problem. To address this, I added special handling of `alpha` in the processor to ensure that FOEM, when used alone, achieves generalization consistent with GPTQ.