Skip to content

fix: skip subrequests in LOG_PHASE handler and avoid cache lock nesting (P0 SIGSEGV + P1 deadlock risk)#344

Open
yaoge123 wants to merge 1 commit into
vozlt:masterfrom
yaoge123:fix/subrequest-cache-lock
Open

fix: skip subrequests in LOG_PHASE handler and avoid cache lock nesting (P0 SIGSEGV + P1 deadlock risk)#344
yaoge123 wants to merge 1 commit into
vozlt:masterfrom
yaoge123:fix/subrequest-cache-lock

Conversation

@yaoge123

Copy link
Copy Markdown

Problem

When nginx processes requests that generate a large number of internal background subrequests (via NGX_HTTP_SUBREQUEST_BACKGROUND), each subrequest triggers VTS's NGX_HTTP_LOG_PHASE handler and the filter ACCESS_PHASE handler. This causes three bugs:

Bug 1 (P0 — SIGSEGV): r->upstream NULL dereference in shm_add_upstream()

File: src/ngx_http_vhost_traffic_status_shm.c

// Before fix — r->upstream is checked for NULL *after* being dereferenced:
if (r->upstream_states == NULL || r->upstream_states->nelts == 0
    || r->upstream->state == NULL)   // ← SIGSEGV if r->upstream == NULL

A subrequest can have r->upstream_states != NULL (partially initialized or inherited) while r->upstream == NULL. Dereferencing r->upstream->state crashes the worker.

Observed symptom: 3 nginx worker processes crash with SIGSEGV (signal 11) after a batch background subrequest operation completes (e.g. 100,000+ subrequests).

Bug 2 (P0 — incorrect stats): subrequests counted as independent requests

Files: src/ngx_http_vhost_traffic_status_module.c, src/ngx_http_vhost_traffic_status_set.c

VTS's NGX_HTTP_LOG_PHASE handler has no guard against subrequests. Every background subrequest is counted as a separate request in shm_add_server(), shm_add_upstream(), shm_add_filter(), and shm_add_cache(), producing severely inflated statistics and unnecessary shared-memory mutex contention.

Bug 3 (P1 — deadlock risk): nested mutex acquisition in shm_add_node_cache()

File: src/ngx_http_vhost_traffic_status_shm.c

shm_add_node_cache() is called while the VTS slab mutex (shpool->mutex) is already held. Inside it, it acquires cache->shpool->mutex — a different shared-memory zone's mutex:

// shm_add_node() holds shpool->mutex here
ngx_shmtx_lock(&cache->shpool->mutex);   // ← nested lock, different shpool
vtsn->stat_cache_used_size = ...;
ngx_shmtx_unlock(&cache->shpool->mutex);

If any other code path (e.g. the nginx cache manager) acquires cache->shpool->mutex first and then tries to interact with VTS, the reversed lock order can deadlock the worker.

Fix

Bug 1 + Bug 2: add subrequest guard

// In both module.c and set.c, immediately after the enable check:
if (r != r->main) {
    return NGX_DECLINED;
}

Also add the missing NULL check in shm_add_upstream():

if (r->upstream_states == NULL || r->upstream_states->nelts == 0
    || r->upstream == NULL || r->upstream->state == NULL)

Bug 3: read cache sizes before acquiring VTS slab mutex

Extract cache size reading into a new shm_get_cache_size() helper that is called before ngx_shmtx_lock(&shpool->mutex). The VTS node update (shm_add_node_cache()) then receives the already-read values as parameters and no longer needs to acquire any lock:

[Before] shm_add_node() locks VTS mutex → calls shm_add_node_cache() → locks cache mutex  (nested!)
[After]  shm_get_cache_size() locks+unlocks cache mutex → shm_add_node() locks VTS mutex → shm_add_node_cache() uses pre-read values (no lock)

Changes

  • src/ngx_http_vhost_traffic_status_module.c — subrequest guard (r != r->main)
  • src/ngx_http_vhost_traffic_status_set.c — subrequest guard (r != r->main)
  • src/ngx_http_vhost_traffic_status_shm.cr->upstream NULL check; extract shm_get_cache_size(); pass sizes as parameters to shm_add_node_cache()

Copilot AI review requested due to automatic review settings April 12, 2026 14:23

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes VTS stats collection for workloads that generate many internal/background subrequests by (1) skipping subrequests in the LOG_PHASE and filter/set handlers, (2) preventing a NULL dereference in upstream accounting, and (3) avoiding nested shared-memory mutex acquisition when reading cache sizes.

Changes:

  • Skip subrequests via if (r != r->main) return NGX_DECLINED; in both the LOG_PHASE handler and the filter/set handler.
  • Fix potential SIGSEGV by checking r->upstream == NULL before dereferencing r->upstream->state.
  • Refactor cache size reads into a pre-lock helper to avoid cache mutex acquisition while holding the VTS slab mutex.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
src/ngx_http_vhost_traffic_status_shm.c Adds upstream NULL guard; refactors cache size reading to avoid nested mutex locking.
src/ngx_http_vhost_traffic_status_set.c Skips subrequests in the set/filter handler to avoid inflated stats and contention.
src/ngx_http_vhost_traffic_status_module.c Skips subrequests in the LOG_PHASE handler to avoid inflated stats and contention.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 90 to 95
size_t size;
unsigned init;
uint32_t hash;
ngx_int_t rc;
ngx_int_t status_code_slot;
ngx_slab_pool_t *shpool;

Copilot AI Apr 12, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rc is declared unconditionally but only used inside #if (NGX_HTTP_CACHE). When nginx is built without NGX_HTTP_CACHE, this becomes an unused local and may fail builds that treat warnings as errors. Consider moving the rc declaration inside the #if (NGX_HTTP_CACHE) block (or otherwise compiling it out when cache is disabled).

Copilot uses AI. Check for mistakes.
Comment on lines 293 to +298
if (!ctx->enable || !vtscf->enable || vtscf->bypass_stats) {
return NGX_DECLINED;
}
if (r != r->main) {
return NGX_DECLINED;
}

Copilot AI Apr 12, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change intentionally skips subrequests (r != r->main), which can significantly affect accounting behavior. There are existing Test::Nginx tests in t/ for request counters/filters; consider adding a regression test that triggers an internal subrequest (e.g., auth_request or another subrequest-producing directive) and asserts VTS counters only increment for the main request.

Copilot uses AI. Check for mistakes.
Comment on lines 40 to +45
if (!ctx->enable || !vtscf->filter) {
return NGX_DECLINED;
}
if (r != r->main) {
return NGX_DECLINED;
}

Copilot AI Apr 12, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like the LOG_PHASE handler, the filter/set handler now skips subrequests (r != r->main). To prevent regressions (and to validate the intended behavior), consider extending the existing Test::Nginx coverage to include a scenario where a request triggers subrequests and verifies vhost_traffic_status_set_by_filter variables are only updated for the main request.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants