Skip to content

Gemma 4 26B: strong protocol-following regression vs Gemma 3 in bounded machine-facing evaluation #604

@blue-az

Description

@blue-az

I’m testing gemma4:26b locally in a bounded Project Phoenix evaluation lane and wanted to report a concrete behavior difference versus gemma3:27b.

On an RTX 3090, gemma4:26b loads and runs cleanly at 100% GPU, and it is fast:

  • total bundle time: 99.406s
  • proxy stage: 43.158s
  • protocol stage: 56.127s

However, on a bounded machine-facing protocol lane, the model failed all 6/6 protocol probes as non_json:

  • strict: 0/6
  • wrapper: 0/6
  • safe_repair: 0/6

By comparison, our current gemma3:27b row on the same lane is materially stronger:

  • desktop: 3/6, 5/6, 5/6
  • laptop: 2/6, 5/6, 5/6

So the current early read is:

  • Gemma 4 appears faster and stronger in general reasoning / HITL use
  • but Gemma 3 is currently much safer in a strict machine-facing protocol / handoff setting

Question:

  • Is this kind of weak protocol-following / JSON-discipline behavior versus Gemma 3 expected in the current release?
  • Is there a recommended patch, prompt pattern, runtime setting, or updated checkpoint that would improve it?

I’m happy to provide more exact artifact details if useful.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions