Add ModuleAllocProfiler#50820
Conversation
Abstract commonalities with IntrusiveAllocProfiler to AllocProfilerData
…ent stacktrace entries
|
cms-bot internal usage |
|
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50820/49146
Code check has found code style and quality issues which could be resolved by applying following patch(s)
|
Without relying on specific functions being part of the stack trace.
…or/scripts/edmAllocProfilerFoldStacks.py
ddf7e01 to
25f5ead
Compare
|
A new Pull Request was created by @makortel for master. It involves the following packages:
@Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
|
@cmsbuild, please test for CMSSW_17_0_CPP23_X |
|
Pull request #50820 was updated. @Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please check and sign again. |
|
@cmsbuild, please test |
| // TODO: I'm not sure what to do with ClearEvent here | ||
| // On one hand it would be useful in some cases, but I'm not sure if it should be always enabled, or if it should be | ||
| // optional, and if so how to configure it (since it doesn't have a module description or calling context). In any | ||
| // case it should not be tied to source. | ||
| //iAR.watchPreClearEvent(sourceStart); | ||
| //iAR.watchPostClearEvent(sourceStop); |
There was a problem hiding this comment.
Treatment of ClearEvent signal is still an open question
|
|
||
| ### ModuleAllocProfiler (C++23) | ||
|
|
||
| The `ModuleAllocProfiler` combines the stack-trace recording of [`IntrusiveAllocProfiler`](#intrusiveallocprofiler-c23) to the module-based monitoring of [`ModuleAllocMonitor`](#moduleallocmonitor). In contrast to `ModuleAllocMonitor`, the set of modules to be profiled must be set explicitly. This is because the profiling is substantially slower, and produces substantially more data, than the counting monitoring of `ModuleAllocMonitor`. The stack traces are reported either via MessageLogger (default), or to per-module-per-transition files. |
There was a problem hiding this comment.
For longer term I think the approach of having a file per module per signal call per measurement type results in too many files. On the other hand, already the present way is useful, and the profile file format might be a deeper rabbit hole than initially thought.
|
+1 Size: This PR adds an extra 24KB to repository Comparison SummarySummary:
|
|
@cmsbuild, please test for CMSSW_17_0_CPP23_X |
|
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0cf159/52920/summary.html Comparison SummarySummary:
|
| // Compute longest list of common trace entries fron the top across al recorded traces (both alloc and dealloc) so | ||
| // that a single consistent measurement context is stripped from every section of the report. | ||
| auto allAllocTraces = | ||
| uniqueAllocTraces_ | std::views::transform([](auto const& at) -> std::stacktrace const& { return at.trace_; }); | ||
| auto allDeallocTraces = uniqueDeallocTraces_ | | ||
| std::views::transform([](auto const& dt) -> std::stacktrace const& { return dt.trace_; }); | ||
| std::size_t const commonTopEntries = computeCommonTopEntries(allAllocTraces, allDeallocTraces); |
There was a problem hiding this comment.
This computation of the common part of the stack traces is not exactly cheap. In a test with a random module that resulted ~2.5k unique allocation and deallocation traces in total for one measurement period the computation took ~1.8 seconds. It took slightly less time than the churn analysis (for ~100 traces), and in the bigger picture the stack unwinding in the first place seems to be at least an order of magnitude slower, so maybe it's ok.
The previous way of trying to construct the common part by excluding specific functions from the traces just felt brittle to extend.
|
Pull request #50820 was updated. @Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please check and sign again. |
|
@cmsbuild, please test for CMSSW_17_0_CPP23_X |
|
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0cf159/52959/summary.html Comparison SummarySummary:
|
|
|
||
| #include "AllocProfilerCommon.h" | ||
|
|
||
| namespace cms::perftools::allocMon::profiler { |
There was a problem hiding this comment.
do we normally use camel case for namespaces?
| struct DeallocationTrace { | ||
| DeallocationTrace(DeallocationRecord const& o, std::stacktrace trace) : total_(o), trace_(trace) {} | ||
|
|
||
| void recordDeallocation(DeallocationRecord const& o) { total_.add(o); } |
| // Compute the longest common set of topmost stack frames shared by all traces in the range. | ||
| // Returns 0 if the range is empty or traces share no common frames. | ||
| template <typename TraceRange1, typename TraceRange2> | ||
| static std::size_t computeCommonTopEntries(TraceRange1&& first, TraceRange2&& second) { |
There was a problem hiding this comment.
It would really be nice to have a unit test for this as it is quite complex.
PR description:
This PR adds an
ModuleAllocProfilerthat is an amalgamation of the module transition monitoring ofModuleAllocMonitorand the stack trace based profiling ofIntrusiveAllocProfiler. See the updated README for more information.This PR alters the file name pattern of the
IntrusiveAllocProfiler. It also makes theSleepingServerdestructor to stop the thread if it wasn't stopped yet (in case the job terminates earlier than endJob).Resolves cms-sw/framework-team#2022
PR validation:
Unit tests pass