I’m testing gemma4:26b locally in a bounded Project Phoenix evaluation lane and wanted to report a concrete behavior difference versus gemma3:27b.
On an RTX 3090, gemma4:26b loads and runs cleanly at 100% GPU, and it is fast:
- total bundle time: 99.406s
- proxy stage: 43.158s
- protocol stage: 56.127s
However, on a bounded machine-facing protocol lane, the model failed all 6 of 6 protocol probes, with every response classified as non_json:
- strict: 0/6
- wrapper: 0/6
- safe_repair: 0/6
By comparison, our current gemma3:27b row on the same lane is materially stronger (strict / wrapper / safe_repair):
- desktop: 3/6, 5/6, 5/6
- laptop: 2/6, 5/6, 5/6
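To make the tier names concrete, here is a simplified sketch of how strict / wrapper / safe_repair acceptance could be defined. It is a hypothetical illustration, not the exact Phoenix harness: the regexes, the "must be a JSON object" rule, and the names `classify` and `score` are assumptions. It also assumes the tier counts are cumulative (a strict pass also counts as a wrapper and safe_repair pass), which is consistent with the non-decreasing gemma3 rows above.

```python
import json
import re

# Hypothetical acceptance tiers for a machine-facing JSON protocol probe.
FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL)
TRAILING_COMMA_RE = re.compile(r",\s*([}\]])")
TIERS = ("strict", "wrapper", "safe_repair")


def _load_object(text):
    """Return the parsed value if text is exactly one JSON object, else None."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None


def _unwrap(text):
    """Pull a candidate JSON body out of a markdown fence or surrounding prose."""
    match = FENCE_RE.search(text)
    if match:
        return match.group(1)
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        return text[start:end + 1]
    return text


def classify(output):
    """Return the strictest tier that accepts the raw output, or 'non_json'."""
    if _load_object(output) is not None:
        return "strict"
    body = _unwrap(output)
    if _load_object(body) is not None:
        return "wrapper"
    # Minimal repair: drop trailing commas before } or ]. This naive regex can
    # also touch commas inside string literals; a real harness would tokenize.
    if _load_object(TRAILING_COMMA_RE.sub(r"\1", body)) is not None:
        return "safe_repair"
    return "non_json"


def score(outputs):
    """Cumulative pass counts: a pass at one tier also counts for looser tiers."""
    counts = {tier: 0 for tier in TIERS}
    for out in outputs:
        tier = classify(out)
        if tier in counts:
            for looser in TIERS[TIERS.index(tier):]:
                counts[looser] += 1
    return counts
```

Under this sketch, a model that wraps valid JSON in a markdown fence fails strict but passes wrapper, while a model that answers in prose with no JSON object at all lands in non_json at every tier, which is the gemma4:26b pattern reported above.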
So the current early read is:
- Gemma 4 appears faster and stronger for general reasoning and HITL (human-in-the-loop) use, but
- Gemma 3 is currently much safer in a strict machine-facing protocol / handoff setting.
Questions:
- Is this weaker protocol-following / JSON discipline, relative to Gemma 3, expected in the current release?
- Is there a recommended patch, prompt pattern, runtime setting, or updated checkpoint that would improve it?
I’m happy to provide more exact artifact details if useful.