Skip to content

Add ModuleAllocProfiler#50820

Open
makortel wants to merge 15 commits intocms-sw:masterfrom
makortel:moduleAllocProfiler
Open

Add ModuleAllocProfiler#50820
makortel wants to merge 15 commits intocms-sw:masterfrom
makortel:moduleAllocProfiler

Conversation

@makortel
Copy link
Copy Markdown
Contributor

PR description:

This PR adds an ModuleAllocProfiler that is an amalgamation of the module transition monitoring of ModuleAllocMonitor and the stack trace based profiling of IntrusiveAllocProfiler. See the updated README for more information.

This PR alters the file name pattern of the IntrusiveAllocProfiler. It also makes the SleepingServer destructor to stop the thread if it wasn't stopped yet (in case the job terminates earlier than endJob).

Resolves cms-sw/framework-team#2022

PR validation:

Unit tests pass

@cmsbuild
Copy link
Copy Markdown
Contributor

cmsbuild commented Apr 27, 2026

cms-bot internal usage

@cmsbuild
Copy link
Copy Markdown
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50820/49146

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@cmsbuild
Copy link
Copy Markdown
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50820/49147

@cmsbuild
Copy link
Copy Markdown
Contributor

A new Pull Request was created by @makortel for master.

It involves the following packages:

  • FWCore/Modules (core)
  • PerfTools/AllocMonitor (core)

@Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please review it and eventually sign? Thanks.
@felicepantaleo, @wddgit this is something you requested to watch as well.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@makortel
Copy link
Copy Markdown
Contributor Author

@cmsbuild, please test for CMSSW_17_0_CPP23_X

@cmsbuild
Copy link
Copy Markdown
Contributor

Pull request #50820 was updated. @Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please check and sign again.

@makortel
Copy link
Copy Markdown
Contributor Author

@cmsbuild, please test

Comment on lines +415 to +420
// TODO: I'm not sure what to do with ClearEvent here
// On one hand it would be useful in some cases, but I'm not sure if it should be always enabled, or if it should be
// optional, and if so how to configure it (since it doesn't have a module description or calling context). In any
// case it should not be tied to source.
//iAR.watchPreClearEvent(sourceStart);
//iAR.watchPostClearEvent(sourceStop);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Treatment of ClearEvent signal is still an open question


### ModuleAllocProfiler (C++23)

The `ModuleAllocProfiler` combines the stack-trace recording of [`IntrusiveAllocProfiler`](#intrusiveallocprofiler-c23) to the module-based monitoring of [`ModuleAllocMonitor`](#moduleallocmonitor). In contrast to `ModuleAllocMonitor`, the set of modules to be profiled must be set explicitly. This is because the profiling is substantially slower, and produces substantially more data, than the counting monitoring of `ModuleAllocMonitor`. The stack traces are reported either via MessageLogger (default), or to per-module-per-transition files.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For longer term I think the approach of having a file per module per signal call per measurement type results in too many files. On the other hand, already the present way is useful, and the profile file format might be a deeper rabbit hole than initially thought.

@cmsbuild
Copy link
Copy Markdown
Contributor

+1

Size: This PR adds an extra 24KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0cf159/52919/summary.html
COMMIT: 29b38da
CMSSW: CMSSW_17_0_X_2026-04-28-1100/el8_amd64_gcc13
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/50820/52919/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 1 lines from the logs
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 53
  • DQMHistoTests: Total histograms compared: 4186963
  • DQMHistoTests: Total failures: 32
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4186911
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 52 files compared)
  • Checked 227 log files, 197 edm output root files, 53 DQM output files
  • TriggerResults: no differences found

@makortel
Copy link
Copy Markdown
Contributor Author

@cmsbuild, please test for CMSSW_17_0_CPP23_X

@cmsbuild
Copy link
Copy Markdown
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0cf159/52920/summary.html
COMMIT: 29b38da
CMSSW: CMSSW_17_0_CPP23_X_2026-04-28-1100/el8_amd64_gcc14
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/50820/52920/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 551 lines to the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 101560 differences found in the comparisons
  • DQMHistoTests: Total files compared: 53
  • DQMHistoTests: Total histograms compared: 4186963
  • DQMHistoTests: Total failures: 529139
  • DQMHistoTests: Total nulls: 329
  • DQMHistoTests: Total successes: 3657475
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -11.712 KiB( 52 files compared)
  • DQMHistoSizes: changed ( 13034.0 ): 0.176 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 17034.0 ): -0.527 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 18634.0 ): -6.084 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 2022.0010001,... ): -0.008 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 2023.0020001 ): 0.004 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 2024.0050001 ): -0.004 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 2024.0060001 ): -0.023 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 2025.0000002 ): -0.027 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 2025.0010001 ): 0.023 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 250202.181 ): 0.293 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 25202.0 ): ...
  • Checked 227 log files, 197 edm output root files, 53 DQM output files
  • TriggerResults: found differences in 18 / 51 workflows

Comment on lines +76 to +82
// Compute longest list of common trace entries fron the top across al recorded traces (both alloc and dealloc) so
// that a single consistent measurement context is stripped from every section of the report.
auto allAllocTraces =
uniqueAllocTraces_ | std::views::transform([](auto const& at) -> std::stacktrace const& { return at.trace_; });
auto allDeallocTraces = uniqueDeallocTraces_ |
std::views::transform([](auto const& dt) -> std::stacktrace const& { return dt.trace_; });
std::size_t const commonTopEntries = computeCommonTopEntries(allAllocTraces, allDeallocTraces);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This computation of the common part of the stack traces is not exactly cheap. In a test with a random module that resulted ~2.5k unique allocation and deallocation traces in total for one measurement period the computation took ~1.8 seconds. It took slightly less time than the churn analysis (for ~100 traces), and in the bigger picture the stack unwinding in the first place seems to be at least an order of magnitude slower, so maybe it's ok.

The previous way of trying to construct the common part by excluding specific functions from the traces just felt brittle to extend.

@cmsbuild
Copy link
Copy Markdown
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50820/49177

@cmsbuild
Copy link
Copy Markdown
Contributor

Pull request #50820 was updated. @Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please check and sign again.

@makortel
Copy link
Copy Markdown
Contributor Author

@cmsbuild, please test for CMSSW_17_0_CPP23_X

@cmsbuild
Copy link
Copy Markdown
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0cf159/52959/summary.html
COMMIT: 004fc67
CMSSW: CMSSW_17_0_CPP23_X_2026-04-28-1100/el8_amd64_gcc14
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/50820/52959/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 570 lines to the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 101560 differences found in the comparisons
  • DQMHistoTests: Total files compared: 53
  • DQMHistoTests: Total histograms compared: 4186963
  • DQMHistoTests: Total failures: 529139
  • DQMHistoTests: Total nulls: 329
  • DQMHistoTests: Total successes: 3657475
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -11.712 KiB( 52 files compared)
  • DQMHistoSizes: changed ( 13034.0 ): 0.176 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 17034.0 ): -0.527 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 18634.0 ): -6.084 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 2022.0010001,... ): -0.008 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 2023.0020001 ): 0.004 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 2024.0050001 ): -0.004 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 2024.0060001 ): -0.023 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 2025.0000002 ): -0.027 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 2025.0010001 ): 0.023 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 250202.181 ): 0.293 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 25202.0 ): ...
  • Checked 227 log files, 197 edm output root files, 53 DQM output files
  • TriggerResults: found differences in 18 / 51 workflows


#include "AllocProfilerCommon.h"

namespace cms::perftools::allocMon::profiler {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we normally use camel case for namespaces?

struct DeallocationTrace {
DeallocationTrace(DeallocationRecord const& o, std::stacktrace trace) : total_(o), trace_(trace) {}

void recordDeallocation(DeallocationRecord const& o) { total_.add(o); }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be subtract?

// Compute the longest common set of topmost stack frames shared by all traces in the range.
// Returns 0 if the range is empty or traces share no common frames.
template <typename TraceRange1, typename TraceRange2>
static std::size_t computeCommonTopEntries(TraceRange1&& first, TraceRange2&& second) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would really be nice to have a unit test for this as it is quite complex.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Develop ModuleAllocProfiler

3 participants