Skip to content

fix(node_loader): plug 48-byte trampoline leak per metacall_await call (#578)#641

Draft
k5602 wants to merge 1 commit intometacall:developfrom
k5602:fix-node-loader
Draft

fix(node_loader): plug 48-byte trampoline leak per metacall_await call (#578)#641
k5602 wants to merge 1 commit intometacall:developfrom
k5602:fix-node-loader

Conversation

@k5602
Copy link

@k5602 k5602 commented Feb 24, 2026

Description

Each call to metacall_await / metacall_await_future allocates a loader_impl_async_future_await_trampoline_type struct (48 bytes) on the C++ heap and passes it into JavaScript as an napi_wrap-ped object (trampoline_ptr). When the JS promise settles, bootstrap.js calls back into node_loader_trampoline_resolve or node_loader_trampoline_reject, which extract the pointer via napi_remove_wrap.

napi_remove_wrap detaches the native pointer from the JS object and suppresses the NAPI finalizer — transferring full ownership to C. Before this fix, the pointer was used and then silently abandoned, producing a 48-byte leak per settled await call. Across the 3 allocations reported in #578 that accounts for the 144-byte figure observed under both ASAN and Heaptrack.

Note: the function-await path (loader_impl_async_func_await_trampoline_type in node_loader_impl.cpp) is not affected — it uses napi_wrap with an explicit node_loader_impl_async_func_await_finalize finalizer, so the GC owns that allocation. The leak is isolated to the future/direct-await path handled by node_loader_trampoline.cpp.

Fix

Wrap the detached pointer in a std::unique_ptr immediately after napi_remove_wrap in both node_loader_trampoline_resolve and node_loader_trampoline_reject:

loader_impl_async_future_await_trampoline trampoline =
    static_cast<loader_impl_async_future_await_trampoline>(result);
std::unique_ptr<loader_impl_async_future_await_trampoline_type> trampoline_guard(trampoline);

// trampoline_guard destructs at end of block, after return value is fully materialised
return trampoline->resolve_trampoline(trampoline->node_impl, env, trampoline->resolve_callback, recv, args[1], trampoline->context);

No behaviour change. The return expression evaluates fully before the unique_ptr destructor runs (end-of-block), so there is no use-after-free.

Fixes #578

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation. (internal implementation detail; no public API surface changed)
  • My changes generate no new warnings.
  • I have added tests/screenshots (if any) that prove my fix is effective or that my feature works.
  • I have tested the tests implicated (if any) by my own code and they pass (make test or ctest -VV -R <test-name>).
  • If my change is significant or breaking, I have passed all tests with ./docker-compose.sh test &> output and attached the output. (non-breaking single-file fix; full Docker suite not run locally)
  • I have tested my code with OPTION_BUILD_ADDRESS_SANITIZER or ./docker-compose.sh test-address-sanitizer &> output and OPTION_TEST_MEMORYCHECK. (built with -fsanitize=address,leak; AwaitTrampolineOwnershipRegression passes with zero Direct leak / Indirect leak reports)
  • I have tested my code with OPTION_BUILD_THREAD_SANITIZER or ./docker-compose.sh test-thread-sanitizer &> output. (fix is single-threaded ownership; no new synchronisation primitives introduced)
  • I have tested with Helgrind in case my code works with threading. (no threading changes; unique_ptr destructs on the same thread that called napi_remove_wrap)
  • I have run make clang-format in order to format my code and my code follows the style guidelines.

Test: AwaitTrampolineOwnershipRegression

Added to metacall-node-async-test, exercising all three trampoline settlement paths:

# Function JS behaviour Expected C callback
1 f(x)new Promise(r => r(x)) resolves resolve_cb fired
2 g(x)new Promise((_, r) => r(x)) rejects reject_cb fired
3 i()async throw exception → reject reject_cb fired

Post-metacall_destroy() assertions confirm exactly 1 resolve and 2 reject callbacks fired. Under ASAN with detect_leaks=1 the test produces a clean report with the fix applied.

ctest -VV -R metacall-node-async-test
...
[ OK ] metacall_node_async_test.AwaitTrampolineOwnershipRegression (243 ms)
[  PASSED  ] 1 test.

Additional info

this is my first pr in my GSoC campaign so i hope it's good and very welcoming to reviews from mentors.

Fix ownership of loader_impl_async_future_await_trampoline used by
metacall_await handlers. Previously trampoline allocations could leak
(regression metacall#578) after resolve/reject callbacks completed. Add a
std::unique_ptr guard in both resolve and reject trampoline entry points
to ensure the trampoline is deallocated once the callback executes.

Also add AwaitTrampolineOwnershipRegression unit test to exercise resolve,
reject and exception paths and assert expected callback counts.
@k5602 k5602 force-pushed the fix-node-loader branch 2 times, most recently from a9e936b to d977527 Compare February 26, 2026 15:32
@viferga
Copy link
Member

viferga commented Mar 2, 2026

@k5602 could you reproduce this bug? Because I have had it inside the linux distributable but it does not happen inside the CI. Does your test reproduce the problem?

@viferga viferga marked this pull request as draft March 4, 2026 14:14
@k5602 k5602 closed this Mar 4, 2026
@k5602 k5602 reopened this Mar 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Issue on MetaCall CLI Linux

2 participants