perf: skip H2 for full-tunnel batch requests (#1040)
Conversation
Full-tunnel batches already coalesce N ops into one HTTP request, so H2 stream multiplexing adds no benefit — there's nothing to multiplex. Worse, H2 introduces measurable regressions:

- Long-poll batches complete at 16–17s (LONGPOLL_DEADLINE + latency) instead of timing out at 10s on H1. Each idle poll holds an Apps Script execution slot 60% longer, reducing available concurrency.
- The NonRetryable error path (RequestSent::Maybe) silently drops batches with no retry — data loss the H1 path doesn't have.
- POOL_MIN_H2_FALLBACK trims the H1 pool from 8 to 2 once H2 lands, starving tunnel batches that still need the H1 pool.

A/B tested on Pixel 6 Pro (30 batch samples each):

| Metric         | H2 (stock v1.9.20) | H1 (this PR) | v1.9.14  |
|----------------|--------------------|--------------|----------|
| 16–17s batches | 8–10/30            | 0/30         | 0/30     |
| 10s timeouts   | 0                  | 4/30         | 5/30     |
| Active RTTs    | 1.4–2.4s           | 1.3–2.2s     | 1.4–2.3s |

Changes:

- tunnel_batch_request_to: skip h2_relay_request, go straight to H1 pool acquire(). Removes the H2 try/fallback/NonRetryable block (see the sketch after this description).
- run_pool_refill: always maintain POOL_MIN (8) connections. The H2-era POOL_MIN_H2_FALLBACK (2) trim starved tunnel batches; with tunnel traffic on H1 the pool must stay at full capacity.

H2 multiplexing remains active for relay mode (non-full), where it genuinely helps — each browser request is a separate HTTP call that benefits from stream multiplexing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
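A minimal sketch of the new control flow described above, with stub types standing in for the real pool and connection (`H1Pool`, `H1Conn`, and `send_batch` are illustrative names, not the project's actual API):

```rust
use std::io;

// Stub types standing in for the real H1 connection pool; illustrative only.
struct H1Pool;
struct H1Conn;

impl H1Pool {
    async fn acquire(&self) -> io::Result<H1Conn> {
        // Real code would pop an idle pooled connection or open a new one.
        Ok(H1Conn)
    }
}

impl H1Conn {
    async fn send_batch(&mut self, _body: &[u8]) -> io::Result<Vec<u8>> {
        // Real code would write the coalesced batch request and read the response.
        Ok(Vec::new())
    }
}

// After this PR: no h2_relay_request attempt, no fallback, no
// RequestSent::Maybe -> NonRetryable mapping. The coalesced batch is a
// single HTTP request, so it goes straight to the H1 pool.
async fn tunnel_batch_request_to(pool: &H1Pool, body: &[u8]) -> io::Result<Vec<u8>> {
    let mut conn = pool.acquire().await?;
    conn.send_batch(body).await
}
```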
Reviewed via Anthropic Claude. Verified locally.

The architectural reasoning is sound: tunnel batches are already-coalesced single HTTP requests where H2 stream multiplexing has no value (nothing to multiplex). H2 multiplexing genuinely shines in relay mode (one HTTP call per browser request, dozens in parallel) — and you've explicitly kept H2 there. The A/B data on Pixel 6 Pro is concrete (8–10/30 stalls → 0/30).

Two things I want to verify before merging:

1. Interaction with PR #1029 (just merged in v1.9.20): that PR was @rezaisrad's fix for the 3-week #924 cluster — same …
2. Cross-check vs #962 (r0ar's data): that issue's controlled A/B showed h2 enabled was strictly better than …

Plan: leaving open for 3–5 days of community testing. Specifically asking testers: …

If 2–3 testers confirm "feels like v1.9.14 again" without surprise regressions, I'll squash-merge as v1.9.21. Excellent perf work — the empirical bisect-into-h2-overhead is the right framing for these kinds of low-level transport regressions.

[reply via Anthropic Claude | reviewed by @therealaleph]
Bumps Cargo.toml v1.9.20 → v1.9.21 and ships the changelog.

Headline: PR #1040 (@yyoyoian-pixel) skips H2 multiplexing for tunnel_batch_request_to (Full-mode batch path). Tunnel batches already coalesce N ops into one HTTP request, so H2 stream multiplexing has nothing to multiplex there; the H2 path was actively hurting via long-poll stalls, silent batch drops, and pool starvation. H2 remains active for relay mode (apps_script), where r0ar's #962 A/B data confirmed it's strictly better. A/B on Pixel 6 Pro: 0/30 vs 8–10/30 long-poll stalls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolve conflict in src/domain_fronter.rs (tunnel_batch_request_with_timeout).

main (therealaleph#1040) intentionally removed the h2 fast path from tunnel batches because coalesced batches see no h2-multiplexing benefit. The branch's contribution — threading caller-supplied `batch_timeout` (so pipelined batches honor `max(configured, PIPELINED_BATCH_TIMEOUT_FLOOR)`) — is preserved on the h1 header-read path, which is the only path now.

Build: clean on both crates. Tests: 217/217 client lib, 57/57 tunnel-node.
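A one-function sketch of the preserved timeout-floor behavior; only the constant's name comes from this commit message, and its value here is an assumption for illustration:

```rust
use std::time::Duration;

// Name from the commit message above; the 10s value is an assumption
// for illustration, not the project's actual constant.
const PIPELINED_BATCH_TIMEOUT_FLOOR: Duration = Duration::from_secs(10);

/// Pipelined batches honor whichever is larger: the caller-supplied
/// (operator-configured) timeout or the pipelining floor.
fn effective_batch_timeout(configured: Duration) -> Duration {
    configured.max(PIPELINED_BATCH_TIMEOUT_FLOOR)
}
```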
Completes [#1040](#1040) (v1.9.21). #1040 skipped H2 for `tunnel_batch_request_to` but missed `tunnel_request` — the single-op path used for plain `connect` ops. The 16–17s long-poll stalls persisted on full-tunnel sessions that go through the single-op path; this PR closes that gap.

Same fix shape: remove the H2 try/fallback/NonRetryable block from `tunnel_request`, go straight to H1 pool `acquire()`. H2 remains active for relay-mode paths (`do_relay_once_with`, exit-node `relay()`).

## All h2_relay_request call sites audited

| Call site | Function | Mode | H2 skipped? |
|---|---|---|---|
| `do_relay_once_with` | relay | Relay | No (correct — relay benefits from H2) |
| `relay()` exit-node | relay | Relay | No (correct) |
| `tunnel_request` | tunnel single op | Full tunnel | **Yes — this PR** |
| `tunnel_batch_request_to` | tunnel batch | Full tunnel | Yes (PR #1040) |
| `tunnel_batch_request_with_timeout` | tunnel batch | Full tunnel | Yes (PR #1040) |

No other full-tunnel paths use H2 after this fix.

## Verified locally on top of v1.9.21

- `cargo test --lib --release`: 209/209 ✅
- `cargo build --release --features ui --bin mhrv-rs-ui`: clean ✅

Reviewed via Anthropic Claude.

Co-Authored-By: yyoyoian-pixel <noreply@github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ops (#1041)

Bumps Cargo.toml v1.9.21 → v1.9.22. Ships @yyoyoian-pixel's PR #1041, which completes #1040 — v1.9.21 skipped H2 for tunnel_batch_request_to but missed tunnel_request (the single-op connect path). 5/5 h2_relay_request call sites now audited; all full-tunnel paths use H1, relay paths keep H2. 209 lib tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixes #1088 — under Full mode, a single slow Apps Script edge cascade-killed every in-flight tunnel session sharing its batch. Users on 1.9.21+ saw frequent 10s "batch timeout" errors and lost download progress on Telegram / browser sessions.

## Root cause

`read_http_response` in `domain_fronter.rs` had a **hardcoded 10s header-read timeout** that ran *inside* `tunnel_batch_request_to` — independent of, and shorter than, the outer `tokio::time::timeout(batch_timeout, …)` in `fire_batch`. Apps Script cold starts routinely land in the 8–12s range (PR #1040's A/B recorded 4/30 H1 batches timing out at exactly 10s after the H2→H1 switch), so the inner cliff fired as a false-positive batch timeout well before `request_timeout_secs` (default 30s) could.

Secondary: even with a parameterized timeout, the per-read `timeout(d, stream.read(...))` form would silently extend its budget if a peer drip-fed bytes just under `d` apart — a slow edge could keep the loop alive past the outer `batch_timeout` and defeat the whole wiring.

## Fix (two changes in `domain_fronter.rs`)

1. **`tunnel_batch_request_to` passes `batch_timeout` to the header read** via the new `read_http_response_with_header_timeout` helper. `Config::request_timeout_secs` is now the only knob controlling how long we wait for an Apps Script edge to start responding. Other callers (relay path, exit-node) keep the historical 10s value.
2. **Header read uses a single absolute deadline** (`tokio::time::timeout_at(deadline, …)`) instead of per-read `timeout()`. Total elapsed time across all header reads is bounded by `header_read_timeout`, regardless of read cadence (see the sketch after this description).

## Bonus (in `tunnel_client.rs`)

3. **`TunnelMux::reply_timeout` co-varies with `batch_timeout`**: computed at construction as `fronter.batch_timeout() + 5s slack` instead of the fixed 35s const. Operators raising `request_timeout_secs` no longer have sessions abandon `reply_rx` just before `fire_batch`'s HTTP round-trip would complete.

## Verified locally (on top of v1.9.23 / main after #1117 merge)

- `cargo test --lib --release`: **231/231** ✅ (was 209 in the v1.9.23 baseline; this PR adds 22 new tests covering the deadline/co-variance behavior)
- `cargo build --release --features ui --bin mhrv-rs-ui`: clean ✅

## Interaction with v1.9.20 (PR #1029)

PR #1029 added `H1_OPEN_TIMEOUT_SECS = 8` to bound the TCP+TLS handshake in `open()`. That bound is **separate** from the header-read timeout this PR addresses — both bounds exist in the same call chain. Issue #1131 (BuffOvrFlw, just opened) reports `h1 open timed out after 8s` errors, which are the `open()` bound firing, not the header-read bound. Worth a follow-up to make `H1_OPEN_TIMEOUT_SECS` parameterized too, but that's a separate change.

Reviewed via Anthropic Claude.

Co-Authored-By: dazzling-no-more <noreply@github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
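A minimal sketch of the deadline change, assuming a simplified header-read loop (the real helper is `read_http_response_with_header_timeout` in `domain_fronter.rs`; the loop body, buffer handling, and the `reply_timeout` helper below are illustrative):

```rust
use std::io;
use std::time::Duration;
use tokio::io::{AsyncRead, AsyncReadExt};
use tokio::time::{timeout_at, Instant};

// One absolute deadline for the whole header section. A per-read
// `timeout(d, stream.read(..))` would reset its budget on every chunk,
// letting a drip-feeding peer run past the outer batch_timeout.
async fn read_headers<S: AsyncRead + Unpin>(
    stream: &mut S,
    header_read_timeout: Duration,
) -> io::Result<Vec<u8>> {
    let deadline = Instant::now() + header_read_timeout;
    let mut buf = Vec::new();
    let mut chunk = [0u8; 4096];
    loop {
        // timeout_at measures against the fixed deadline, so total elapsed
        // time is bounded regardless of how the peer paces its bytes.
        let n = timeout_at(deadline, stream.read(&mut chunk))
            .await
            .map_err(|_| io::Error::new(io::ErrorKind::TimedOut, "header read deadline exceeded"))??;
        if n == 0 {
            return Err(io::ErrorKind::UnexpectedEof.into());
        }
        buf.extend_from_slice(&chunk[..n]);
        // End of the header section: CRLF CRLF.
        if buf.windows(4).any(|w| w == b"\r\n\r\n") {
            return Ok(buf);
        }
    }
}

/// Sketch of the reply-timeout co-variance: derived from the batch timeout
/// plus slack, instead of a fixed 35s constant.
fn reply_timeout(batch_timeout: Duration) -> Duration {
    batch_timeout + Duration::from_secs(5)
}
```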
Summary
Full-tunnel batches already coalesce N ops into one HTTP request — H2 stream multiplexing has nothing to multiplex. This PR skips the H2 path for `tunnel_batch_request_to` and keeps the H1 pool at full capacity.

H2 multiplexing remains active for relay mode (non-full), where each browser request is a separate HTTP call that genuinely benefits from stream multiplexing.
Problem
H2 for tunnel batches introduced three regressions vs v1.9.14:
- Long-poll batches complete at 16–17s (LONGPOLL_DEADLINE + latency) instead of timing out at 10s on H1, holding Apps Script execution slots 60% longer.
- `RequestSent::Maybe` failures drop the entire batch with no retry — a failure mode H1 doesn't have.
- `POOL_MIN_H2_FALLBACK` trims the H1 pool from 8→2 once H2 connects, but tunnel batches still use H1 and need the full pool.

A/B test results (Pixel 6 Pro, 30 batch samples each)

| Metric         | H2 (stock v1.9.20) | H1 (this PR) | v1.9.14  |
|----------------|--------------------|--------------|----------|
| 16–17s batches | 8–10/30            | 0/30         | 0/30     |
| 10s timeouts   | 0                  | 4/30         | 5/30     |
| Active RTTs    | 1.4–2.4s           | 1.3–2.2s     | 1.4–2.3s |
This PR restores v1.9.14 tunnel performance while keeping all v1.9.15+ improvements (H2 for relay, zero-copy mux, block DoH/QUIC, TLS pool tuning).
Changes
- `tunnel_batch_request_to`: remove the H2 try/fallback/NonRetryable block, go straight to H1 pool `acquire()` (-54 lines)
- `run_pool_refill`: always maintain `POOL_MIN` (8). Remove the `POOL_MIN_H2_FALLBACK` (2) trim that starved tunnel batches (-12 lines; see the sketch below)
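A sketch of the refill invariant with a stub pool; the atomic counter stands in for the real idle-connection list, and `open_one` elides the actual TCP+TLS handshake:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::time::Duration;

const POOL_MIN: usize = 8;

// Stub pool; the counter is a stand-in for the real idle-connection list.
struct H1Pool {
    idle: AtomicUsize,
}

impl H1Pool {
    fn idle_len(&self) -> usize {
        self.idle.load(Ordering::Relaxed)
    }
    async fn open_one(&self) {
        // TCP + TLS handshake elided in this sketch.
        self.idle.fetch_add(1, Ordering::Relaxed);
    }
}

async fn run_pool_refill(pool: &H1Pool) {
    loop {
        // v1.9.20 trimmed the target to POOL_MIN_H2_FALLBACK (2) once H2
        // connected, starving the tunnel batches that still ride H1.
        // The target is now unconditionally POOL_MIN.
        while pool.idle_len() < POOL_MIN {
            pool.open_one().await;
        }
        tokio::time::sleep(Duration::from_secs(1)).await;
    }
}
```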
Test plan

- `cargo check` clean

🤖 Generated with Claude Code