perf: skip H2 for full-tunnel batch requests (#1040)
Conversation
Full-tunnel batches already coalesce N ops into one HTTP request, so H2 stream multiplexing adds no benefit — there's nothing to multiplex. Worse, H2 introduces measurable regressions:

- Long-poll batches complete at 16–17s (LONGPOLL_DEADLINE + latency) instead of timing out at 10s on H1. Each idle poll holds an Apps Script execution slot 60% longer, reducing available concurrency.
- The NonRetryable error path (RequestSent::Maybe) silently drops batches with no retry — data loss the H1 path doesn't have.
- POOL_MIN_H2_FALLBACK trims the H1 pool from 8 to 2 once H2 lands, starving tunnel batches that still need the H1 pool.

A/B tested on Pixel 6 Pro (30 batch samples each):

| Metric         | H2 (stock v1.9.20) | H1 (this PR) | v1.9.14  |
|----------------|--------------------|--------------|----------|
| 16–17s batches | 8–10/30            | 0/30         | 0/30     |
| 10s timeouts   | 0                  | 4/30         | 5/30     |
| Active RTTs    | 1.4–2.4s           | 1.3–2.2s     | 1.4–2.3s |

Changes:

- tunnel_batch_request_to: skip h2_relay_request, go straight to H1 pool acquire(). Removes the H2 try/fallback/NonRetryable block (see the sketch after this description).
- run_pool_refill: always maintain POOL_MIN (8) connections. The H2-era POOL_MIN_H2_FALLBACK (2) trim starved tunnel batches; with tunnel traffic on H1 the pool must stay at full capacity.

H2 multiplexing remains active for relay mode (non-full), where it genuinely helps — each browser request is a separate HTTP call that benefits from stream multiplexing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
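A minimal sketch of the new control flow described above, with stub types standing in for the real pool and connection (`H1Pool`, `H1Conn`, and `send_batch` are illustrative names, not the project's actual API):

```rust
use std::io;

// Stub types standing in for the real H1 connection pool; illustrative only.
struct H1Pool;
struct H1Conn;

impl H1Pool {
    async fn acquire(&self) -> io::Result<H1Conn> {
        // Real code would pop an idle pooled connection or open a new one.
        Ok(H1Conn)
    }
}

impl H1Conn {
    async fn send_batch(&mut self, _body: &[u8]) -> io::Result<Vec<u8>> {
        // Real code would write the coalesced batch request and read the response.
        Ok(Vec::new())
    }
}

// After this PR: no h2_relay_request attempt, no fallback, no
// RequestSent::Maybe -> NonRetryable mapping. The coalesced batch is a
// single HTTP request, so it goes straight to the H1 pool.
async fn tunnel_batch_request_to(pool: &H1Pool, body: &[u8]) -> io::Result<Vec<u8>> {
    let mut conn = pool.acquire().await?;
    conn.send_batch(body).await
}
```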
Reviewed via Anthropic Claude. Verified locally.

The architectural reasoning is sound: tunnel batches are already-coalesced single HTTP requests where H2 stream multiplexing has no value (nothing to multiplex). H2 multiplexing genuinely shines in relay mode (one HTTP call per browser request, dozens in parallel) — and you've explicitly kept H2 there. The A/B data on Pixel 6 Pro is concrete (8–10/30 stalls → 0/30).

Two things I want to verify before merging:

1. Interaction with PR #1029 (just merged in v1.9.20): that PR was @rezaisrad's fix for the 3-week #924 cluster — same …
2. Cross-check vs #962 (r0ar's data): that issue's controlled A/B showed h2 enabled was strictly better than …

Plan: leaving open for 3–5 days of community testing. Specifically asking testers: …

If 2–3 testers confirm "feels like v1.9.14 again" without surprise regressions, I'll squash-merge as v1.9.21. Excellent perf work — the empirical bisect-into-h2-overhead is the right framing for these kinds of low-level transport regressions.

[reply via Anthropic Claude | reviewed by @therealaleph]
Bumps Cargo.toml v1.9.20 → v1.9.21 and ships the changelog.

Headline: PR #1040 (@yyoyoian-pixel) skips H2 multiplexing for tunnel_batch_request_to (Full-mode batch path). Tunnel batches already coalesce N ops into one HTTP request, so H2 stream multiplexing has nothing to multiplex there; the H2 path was actively hurting via long-poll stalls, silent batch drops, and pool starvation. H2 remains active for relay mode (apps_script), where r0ar's #962 A/B data confirmed it's strictly better. A/B on Pixel 6 Pro: 0/30 vs 8–10/30 long-poll stalls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolve conflict in src/domain_fronter.rs (tunnel_batch_request_with_timeout).

main (therealaleph#1040) intentionally removed the h2 fast path from tunnel batches because coalesced batches see no h2-multiplexing benefit. The branch's contribution — threading caller-supplied `batch_timeout` (so pipelined batches honor `max(configured, PIPELINED_BATCH_TIMEOUT_FLOOR)`) — is preserved on the h1 header-read path, which is the only path now.

Build: clean on both crates. Tests: 217/217 client lib, 57/57 tunnel-node.
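A one-function sketch of the preserved timeout-floor behavior; only the constant's name comes from this commit message, and its value here is an assumption for illustration:

```rust
use std::time::Duration;

// Name from the commit message above; the 10s value is an assumption
// for illustration, not the project's actual constant.
const PIPELINED_BATCH_TIMEOUT_FLOOR: Duration = Duration::from_secs(10);

/// Pipelined batches honor whichever is larger: the caller-supplied
/// (operator-configured) timeout or the pipelining floor.
fn effective_batch_timeout(configured: Duration) -> Duration {
    configured.max(PIPELINED_BATCH_TIMEOUT_FLOOR)
}
```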
Completes [#1040](#1040) (v1.9.21). #1040 skipped H2 for `tunnel_batch_request_to` but missed `tunnel_request` — the single-op path used for plain `connect` ops. The 16–17s long-poll stalls persisted on full-tunnel sessions that go through the single-op path; this PR closes that gap.

Same fix shape: remove the H2 try/fallback/NonRetryable block from `tunnel_request`, go straight to H1 pool `acquire()`. H2 remains active for relay-mode paths (`do_relay_once_with`, exit-node `relay()`).

## All h2_relay_request call sites audited

| Call site | Function | Mode | H2 skipped? |
|---|---|---|---|
| `do_relay_once_with` | relay | Relay | No (correct — relay benefits from H2) |
| `relay()` exit-node | relay | Relay | No (correct) |
| `tunnel_request` | tunnel single op | Full tunnel | **Yes — this PR** |
| `tunnel_batch_request_to` | tunnel batch | Full tunnel | Yes (PR #1040) |
| `tunnel_batch_request_with_timeout` | tunnel batch | Full tunnel | Yes (PR #1040) |

No other full-tunnel paths use H2 after this fix.

## Verified locally on top of v1.9.21

- `cargo test --lib --release`: 209/209 ✅
- `cargo build --release --features ui --bin mhrv-rs-ui`: clean ✅

Reviewed via Anthropic Claude.

Co-Authored-By: yyoyoian-pixel <noreply@github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ops (#1041)

Bumps Cargo.toml v1.9.21 → v1.9.22. Ships @yyoyoian-pixel's PR #1041, which completes #1040 — v1.9.21 skipped H2 for tunnel_batch_request_to but missed tunnel_request (the single-op connect path). 5/5 h2_relay_request call sites now audited; all full-tunnel paths use H1, relay paths keep H2. 209 lib tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixes #1088 — under Full mode, a single slow Apps Script edge cascade-killed every in-flight tunnel session sharing its batch. Users on 1.9.21+ saw frequent 10s "batch timeout" errors and lost download progress on Telegram / browser sessions.

## Root cause

`read_http_response` in `domain_fronter.rs` had a **hardcoded 10s header-read timeout** that ran *inside* `tunnel_batch_request_to` — independent of, and shorter than, the outer `tokio::time::timeout(batch_timeout, …)` in `fire_batch`. Apps Script cold starts routinely land in the 8–12s range (PR #1040's A/B recorded 4/30 H1 batches timing out at exactly 10s after the H2→H1 switch), so the inner cliff fired as a false-positive batch timeout well before `request_timeout_secs` (default 30s) could.

Secondary: even with a parameterized timeout, the per-read `timeout(d, stream.read(...))` form would silently extend its budget if a peer drip-fed bytes just under `d` apart — a slow edge could keep the loop alive past the outer `batch_timeout` and defeat the whole wiring.

## Fix (two changes in `domain_fronter.rs`)

1. **`tunnel_batch_request_to` passes `batch_timeout` to the header read** via the new `read_http_response_with_header_timeout` helper. `Config::request_timeout_secs` is now the only knob controlling how long we wait for an Apps Script edge to start responding. Other callers (relay path, exit-node) keep the historical 10s value.
2. **Header read uses a single absolute deadline** (`tokio::time::timeout_at(deadline, …)`) instead of per-read `timeout()`. Total elapsed time across all header reads is bounded by `header_read_timeout`, regardless of read cadence (see the sketch after this description).

## Bonus (in `tunnel_client.rs`)

3. **`TunnelMux::reply_timeout` co-varies with `batch_timeout`**: computed at construction as `fronter.batch_timeout() + 5s slack` instead of the fixed 35s const. Operators raising `request_timeout_secs` no longer have sessions abandon `reply_rx` just before `fire_batch`'s HTTP round-trip would complete.

## Verified locally (on top of v1.9.23 / main after #1117 merge)

- `cargo test --lib --release`: **231/231** ✅ (was 209 in the v1.9.23 baseline; this PR adds 22 new tests covering the deadline/co-variance behavior)
- `cargo build --release --features ui --bin mhrv-rs-ui`: clean ✅

## Interaction with v1.9.20 (PR #1029)

PR #1029 added `H1_OPEN_TIMEOUT_SECS = 8` to bound the TCP+TLS handshake in `open()`. That bound is **separate** from the header-read timeout this PR addresses — both bounds exist in the same call chain. Issue #1131 (BuffOvrFlw, just opened) reports `h1 open timed out after 8s` errors, which are the `open()` bound firing, not the header-read bound. Worth a follow-up to make `H1_OPEN_TIMEOUT_SECS` parameterized too, but that's a separate change.

Reviewed via Anthropic Claude.

Co-Authored-By: dazzling-no-more <noreply@github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
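A minimal sketch of the deadline change, assuming a simplified header-read loop (the real helper is `read_http_response_with_header_timeout` in `domain_fronter.rs`; the loop body, buffer handling, and the `reply_timeout` helper below are illustrative):

```rust
use std::io;
use std::time::Duration;
use tokio::io::{AsyncRead, AsyncReadExt};
use tokio::time::{timeout_at, Instant};

// One absolute deadline for the whole header section. A per-read
// `timeout(d, stream.read(..))` would reset its budget on every chunk,
// letting a drip-feeding peer run past the outer batch_timeout.
async fn read_headers<S: AsyncRead + Unpin>(
    stream: &mut S,
    header_read_timeout: Duration,
) -> io::Result<Vec<u8>> {
    let deadline = Instant::now() + header_read_timeout;
    let mut buf = Vec::new();
    let mut chunk = [0u8; 4096];
    loop {
        // timeout_at measures against the fixed deadline, so total elapsed
        // time is bounded regardless of how the peer paces its bytes.
        let n = timeout_at(deadline, stream.read(&mut chunk))
            .await
            .map_err(|_| io::Error::new(io::ErrorKind::TimedOut, "header read deadline exceeded"))??;
        if n == 0 {
            return Err(io::ErrorKind::UnexpectedEof.into());
        }
        buf.extend_from_slice(&chunk[..n]);
        // End of the header section: CRLF CRLF.
        if buf.windows(4).any(|w| w == b"\r\n\r\n") {
            return Ok(buf);
        }
    }
}

/// Sketch of the reply-timeout co-variance: derived from the batch timeout
/// plus slack, instead of a fixed 35s constant.
fn reply_timeout(batch_timeout: Duration) -> Duration {
    batch_timeout + Duration::from_secs(5)
}
```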
Summary
Full-tunnel batches already coalesce N ops into one HTTP request — H2 stream multiplexing has nothing to multiplex. This PR skips the H2 path for `tunnel_batch_request_to` and keeps the H1 pool at full capacity.

H2 multiplexing remains active for relay mode (non-full), where each browser request is a separate HTTP call that genuinely benefits from stream multiplexing.
Problem
H2 for tunnel batches introduced three regressions vs v1.9.14:
- Long-poll batches complete at 16–17s (LONGPOLL_DEADLINE + latency) instead of timing out at 10s on H1, holding Apps Script execution slots 60% longer.
- `RequestSent::Maybe` failures drop the entire batch with no retry — a failure mode H1 doesn't have.
- `POOL_MIN_H2_FALLBACK` trims the H1 pool from 8→2 once H2 connects, but tunnel batches still use H1 and need the full pool.

A/B test results (Pixel 6 Pro, 30 batch samples each)

| Metric         | H2 (stock v1.9.20) | H1 (this PR) | v1.9.14  |
|----------------|--------------------|--------------|----------|
| 16–17s batches | 8–10/30            | 0/30         | 0/30     |
| 10s timeouts   | 0                  | 4/30         | 5/30     |
| Active RTTs    | 1.4–2.4s           | 1.3–2.2s     | 1.4–2.3s |
This PR restores v1.9.14 tunnel performance while keeping all v1.9.15+ improvements (H2 for relay, zero-copy mux, block DoH/QUIC, TLS pool tuning).
Changes
- `tunnel_batch_request_to`: remove the H2 try/fallback/NonRetryable block, go straight to H1 pool `acquire()` (-54 lines)
- `run_pool_refill`: always maintain `POOL_MIN` (8). Remove the `POOL_MIN_H2_FALLBACK` (2) trim that starved tunnel batches (-12 lines; see the sketch below)
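A sketch of the refill invariant with a stub pool; the atomic counter stands in for the real idle-connection list, and `open_one` elides the actual TCP+TLS handshake:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::time::Duration;

const POOL_MIN: usize = 8;

// Stub pool; the counter is a stand-in for the real idle-connection list.
struct H1Pool {
    idle: AtomicUsize,
}

impl H1Pool {
    fn idle_len(&self) -> usize {
        self.idle.load(Ordering::Relaxed)
    }
    async fn open_one(&self) {
        // TCP + TLS handshake elided in this sketch.
        self.idle.fetch_add(1, Ordering::Relaxed);
    }
}

async fn run_pool_refill(pool: &H1Pool) {
    loop {
        // v1.9.20 trimmed the target to POOL_MIN_H2_FALLBACK (2) once H2
        // connected, starving the tunnel batches that still ride H1.
        // The target is now unconditionally POOL_MIN.
        while pool.idle_len() < POOL_MIN {
            pool.open_one().await;
        }
        tokio::time::sleep(Duration::from_secs(1)).await;
    }
}
```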
Test plan

- `cargo check` clean

🤖 Generated with Claude Code