fix(forwarder): qoe_downshift_overshoot false-positive on startup ramp (no actual downshift) #792

Description

@jonathaneoliver

Summary

qoe_downshift_overshoot fires during the startup ramp even when the player never downshifts — it climbs from the conservative startup rung (360p) straight up to the top. The label is meant for rampdown / pyramid-descent over-correction (#669), not the initial ascent, so this is a false positive.

Evidence

Observed in the Android TV / HLS / live_offset=6 startup sweep (experiment seed-config-androidtv-hls-live-offset-startup and its rep batch rep-8c0d9e-*).

Play b104c934-dae9-4790-85fa-cee6e6424afa label histogram:

critical=*qoe_cirr_breach
info=*qoe_tier_premium
warning=stall_segment
info=shift_up                  <- the only rendition-direction event
warning=*qoe_cirr_concerning
warning=*qoe_downshift_overshoot   <- fired anyway
info=first_frame

There is no shift_down in the whole play. The rendition path is purely upward: first frame at 640x360 (~1 Mbps, the floor rung) → shift_up → 3840x2160 (qoe_tier_premium). So nothing downshifted, yet qoe_downshift_overshoot is present. The single labelled row lands at the startup boundary (before a steady rendition is established).

Root cause

downshiftOvershootLabel in analytics/go-forwarder/qoe_labels.go:439 is purely instantaneous — it fires whenever the selected rung sits DownshiftOvershootRungs (=2) or more rungs below the rung the effective cap supports:

if ceilingIdx-curIdx >= cfg.ABR.DownshiftOvershootRungs {
    return qoeLabel(SevWarning, "qoe_downshift_overshoot")
}

It has no notion of direction or history. At startup the player legitimately sits at the floor rung (curIdx≈0) while it ramps up; with a high cap-supported ceiling the ceilingIdx - curIdx >= 2 gap trips immediately — indistinguishable from a genuine over-downshift. The doc comment even says the intent is "the overshoot we want flagged on rampdown / pyramid-descent runs," i.e. descent, not ascent.
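The false positive can be reproduced in isolation. The sketch below is illustrative only: the function name, the plain-int signature, and the six-rung ladder (index 0 = 360p floor, index 5 = 2160p top) are assumptions, not code from qoe_labels.go. Only the comparison itself and the threshold of 2 come from the snippet above.

```go
package main

import "fmt"

// instantaneousOvershoot mirrors the shape of the check quoted above: it
// compares the current rung with the cap-supported ceiling rung and nothing
// else. The name and signature are hypothetical.
func instantaneousOvershoot(ceilingIdx, curIdx, thresholdRungs int) bool {
	return ceilingIdx-curIdx >= thresholdRungs
}

func main() {
	const thresholdRungs = 2 // DownshiftOvershootRungs, per the issue text
	const ceilingIdx = 5     // assumed ladder: 0 = 360p floor, 5 = 2160p top

	// Startup: first frame on the floor rung, before any shift has happened.
	fmt.Println("startup at floor:", instantaneousOvershoot(ceilingIdx, 0, thresholdRungs))

	// After the single shift_up to the top rung the gap is closed.
	fmt.Println("after shift_up:", instantaneousOvershoot(ceilingIdx, 5, thresholdRungs))
}
```

With these assumed indices the check trips on the very first sample, although the player has not moved down at all, which matches the single labelled row at the startup boundary.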

Proposed fix

Gate the label on an actual downshift / directionality, so the conservative startup ascent can't trip it. Options (any that satisfies the acceptance test):

  • Only fire if the player has previously held a higher rung earlier in this play (a real decrease), or after a recent shift_down; suppress while still in the initial upshift ramp from the startup rung.
  • Alternatively: don't fire until the play has reached the cap-supported ceiling / steady state at least once.

Keep the existing rampdown / pyramid-descent behaviour intact.
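One way the first option could look, as a minimal sketch rather than a proposed patch: track the highest rung held so far in the play and fire only when the current rung is below that peak. The rungHistory type, its fields, the replay helper, and the rung indices are all hypothetical; the real forwarder's per-play state is not shown in this issue.

```go
package main

import "fmt"

// rungHistory is a hypothetical per-play tracker, not a go-forwarder type.
type rungHistory struct {
	seen    bool // true once at least one sample has been observed
	peakIdx int  // highest rung index held so far in this play
}

// gatedOvershoot applies the existing gap check, but only after the play has
// held a higher rung than the current one (a real decrease).
func (h *rungHistory) gatedOvershoot(ceilingIdx, curIdx, thresholdRungs int) bool {
	heldHigher := h.seen && h.peakIdx > curIdx
	if !h.seen || curIdx > h.peakIdx {
		h.seen, h.peakIdx = true, curIdx
	}
	return heldHigher && ceilingIdx-curIdx >= thresholdRungs
}

// replay feeds a sequence of rung indices through a fresh tracker and reports
// whether any sample would have been labelled.
func replay(rungs []int, ceilingIdx, thresholdRungs int) bool {
	var h rungHistory
	fired := false
	for _, cur := range rungs {
		if h.gatedOvershoot(ceilingIdx, cur, thresholdRungs) {
			fired = true
		}
	}
	return fired
}

func main() {
	// Assumed ladder: index 0 = 360p floor, index 5 = 2160p top; threshold 2.
	fmt.Println("startup ascent 0 -> 5:", replay([]int{0, 5}, 5, 2))
	fmt.Println("over-downshift 0 -> 5 -> 1:", replay([]int{0, 5, 1}, 5, 2))
}
```

In this sketch the pure ascent never fires because no sample sits below the running peak, while a drop from the top rung to rung 1 still does. A time-bounded variant ("after a recent shift_down") would need a timestamp or sample window in the tracker, which is left out here.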

Acceptance

  • A startup ramp 360p → top with no shift_down must NOT emit qoe_downshift_overshoot.
  • Genuine rampdown / pyramid-descent over-correction still fires (existing labels_test.go overshoot cases keep passing).
  • Add a regression case in analytics/go-forwarder/labels_test.go covering the startup-ascent-no-downshift scenario.

Found via the #772 automated sweep (config class).

Labels: bug (Something isn't working), points:3 (Fibonacci story points: 3), priority:P2 (Next sprint), value:medium (Medium product value)