fix: skip subrequests in LOG_PHASE handler and avoid cache lock nesting (P0 SIGSEGV + P1 deadlock risk)#344
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes VTS stats collection for workloads that generate many internal/background subrequests by (1) skipping subrequests in the LOG_PHASE and filter/set handlers, (2) preventing a NULL dereference in upstream accounting, and (3) avoiding nested shared-memory mutex acquisition when reading cache sizes.
Changes:
- Skip subrequests via
if (r != r->main) return NGX_DECLINED;in both the LOG_PHASE handler and the filter/set handler. - Fix potential SIGSEGV by checking
r->upstream == NULLbefore dereferencingr->upstream->state. - Refactor cache size reads into a pre-lock helper to avoid cache mutex acquisition while holding the VTS slab mutex.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/ngx_http_vhost_traffic_status_shm.c | Adds upstream NULL guard; refactors cache size reading to avoid nested mutex locking. |
| src/ngx_http_vhost_traffic_status_set.c | Skips subrequests in the set/filter handler to avoid inflated stats and contention. |
| src/ngx_http_vhost_traffic_status_module.c | Skips subrequests in the LOG_PHASE handler to avoid inflated stats and contention. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| size_t size; | ||
| unsigned init; | ||
| uint32_t hash; | ||
| ngx_int_t rc; | ||
| ngx_int_t status_code_slot; | ||
| ngx_slab_pool_t *shpool; |
There was a problem hiding this comment.
rc is declared unconditionally but only used inside #if (NGX_HTTP_CACHE). When nginx is built without NGX_HTTP_CACHE, this becomes an unused local and may fail builds that treat warnings as errors. Consider moving the rc declaration inside the #if (NGX_HTTP_CACHE) block (or otherwise compiling it out when cache is disabled).
| if (!ctx->enable || !vtscf->enable || vtscf->bypass_stats) { | ||
| return NGX_DECLINED; | ||
| } | ||
| if (r != r->main) { | ||
| return NGX_DECLINED; | ||
| } |
There was a problem hiding this comment.
This change intentionally skips subrequests (r != r->main), which can significantly affect accounting behavior. There are existing Test::Nginx tests in t/ for request counters/filters; consider adding a regression test that triggers an internal subrequest (e.g., auth_request or another subrequest-producing directive) and asserts VTS counters only increment for the main request.
| if (!ctx->enable || !vtscf->filter) { | ||
| return NGX_DECLINED; | ||
| } | ||
| if (r != r->main) { | ||
| return NGX_DECLINED; | ||
| } |
There was a problem hiding this comment.
Like the LOG_PHASE handler, the filter/set handler now skips subrequests (r != r->main). To prevent regressions (and to validate the intended behavior), consider extending the existing Test::Nginx coverage to include a scenario where a request triggers subrequests and verifies vhost_traffic_status_set_by_filter variables are only updated for the main request.
Problem
When nginx processes requests that generate a large number of internal background subrequests (via
NGX_HTTP_SUBREQUEST_BACKGROUND), each subrequest triggers VTS'sNGX_HTTP_LOG_PHASEhandler and the filterACCESS_PHASEhandler. This causes three bugs:Bug 1 (P0 — SIGSEGV):
r->upstreamNULL dereference inshm_add_upstream()File:
src/ngx_http_vhost_traffic_status_shm.cA subrequest can have
r->upstream_states != NULL(partially initialized or inherited) whiler->upstream == NULL. Dereferencingr->upstream->statecrashes the worker.Observed symptom: 3 nginx worker processes crash with SIGSEGV (signal 11) after a batch background subrequest operation completes (e.g. 100,000+ subrequests).
Bug 2 (P0 — incorrect stats): subrequests counted as independent requests
Files:
src/ngx_http_vhost_traffic_status_module.c,src/ngx_http_vhost_traffic_status_set.cVTS's
NGX_HTTP_LOG_PHASEhandler has no guard against subrequests. Every background subrequest is counted as a separate request inshm_add_server(),shm_add_upstream(),shm_add_filter(), andshm_add_cache(), producing severely inflated statistics and unnecessary shared-memory mutex contention.Bug 3 (P1 — deadlock risk): nested mutex acquisition in
shm_add_node_cache()File:
src/ngx_http_vhost_traffic_status_shm.cshm_add_node_cache()is called while the VTS slab mutex (shpool->mutex) is already held. Inside it, it acquirescache->shpool->mutex— a different shared-memory zone's mutex:If any other code path (e.g. the nginx cache manager) acquires
cache->shpool->mutexfirst and then tries to interact with VTS, the reversed lock order can deadlock the worker.Fix
Bug 1 + Bug 2: add subrequest guard
Also add the missing NULL check in
shm_add_upstream():Bug 3: read cache sizes before acquiring VTS slab mutex
Extract cache size reading into a new
shm_get_cache_size()helper that is called beforengx_shmtx_lock(&shpool->mutex). The VTS node update (shm_add_node_cache()) then receives the already-read values as parameters and no longer needs to acquire any lock:Changes
src/ngx_http_vhost_traffic_status_module.c— subrequest guard (r != r->main)src/ngx_http_vhost_traffic_status_set.c— subrequest guard (r != r->main)src/ngx_http_vhost_traffic_status_shm.c—r->upstreamNULL check; extractshm_get_cache_size(); pass sizes as parameters toshm_add_node_cache()