Add ccache troubleshooting docs and analysis scripts#5185

Open
ScottTodd wants to merge 2 commits into
ROCm:mainfrom
ScottTodd:ccache-docs

@ScottTodd

Motivation

Follow-up to #5141, documenting what was learned while debugging and contributing a few (hacky) scripts that helped my coding agent in its analysis.

See also:

Technical Details

The documentation and scripts here were mostly generated by Claude Code following a multi-day debugging session. The intended audience is both humans and coding agents, so that neither has to rediscover as much the next time we debug poor cache behavior.

Test Plan

Ran the new scripts myself:

Analyze "baseline"

(.venv) λ python build_tools/hack/ccache/analyze_ccache_logs.py --run-id 25588131640 --stage math-libs --gfx gfx110X-all --output-dir=D:\scratch\therock\ccache
  Downloading: https://therock-ci-artifacts.s3.amazonaws.com/25588131640-windows/logs/math-libs/gfx110X-all/ccache_logs.tar.zst
  Extracted to: D:\scratch\therock\ccache\run_25588131640\math-libs_gfx110X-all

======================================================================
  Run 25588131640 / math-libs / gfx110X-all
======================================================================

## Overall Summary
  Total result entries:   23877
  Cacheable:              7977
    Hits:                 127 (1.6%)
      Direct:             54
      Preprocessed:       73
    Misses:               7850 (98.4%)
  Uncacheable:            79

## Path Issues
  'Can't be read' entries: 104012
  Unique GUIDs in paths:   88
  (Stale entries from runners with different workspace GUIDs)

## Per-Compiler Breakdown
  amd-llvm/clang++                    0 /    12 hits (0.0%)
  cl.exe (MSVC)                      51 /    56 hits (91.1%)
  clr/clang                           0 /    34 hits (0.0%)
  clr/clang++                        76 /  7875 hits (1.0%)

## Top Projects by Cache Misses
  rocsparse                        1148 misses
  rocwmma                           972 misses
  composablekernel                  923 misses
  miopen                            815 misses
  composable_kernel                 763 misses
  rocblas                           635 misses
  rocsolver                         533 misses
  hipdnn                            423 misses
  rocthrust                         257 misses
  hipsparse                         195 misses
  hipcub                            165 misses
  hipblaslt                         156 misses
  rocrand                           132 misses
  rocfft                            110 misses
  hipblas                           100 misses
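
As a sanity check on the baseline report above, the percentages can be recomputed from the raw counts. The sketch below is not taken from analyze_ccache_logs.py; the helper name `hit_rate` is hypothetical, and the numbers are copied by hand from the "Overall Summary" and "Per-Compiler Breakdown" sections.

```python
def hit_rate(hits: int, total: int) -> float:
    """Return hits as a percentage of total, or 0.0 when total is zero."""
    return 100.0 * hits / total if total else 0.0


# Figures copied from the baseline "Overall Summary".
direct, preprocessed, misses = 54, 73, 7850
hits = direct + preprocessed   # direct and preprocessed hits add up to total hits
cacheable = hits + misses      # cacheable compilations are hits plus misses

# (hits, total) per compiler, copied from the "Per-Compiler Breakdown".
per_compiler = {
    "amd-llvm/clang++": (0, 12),
    "cl.exe (MSVC)": (51, 56),
    "clr/clang": (0, 34),
    "clr/clang++": (76, 7875),
}

print(f"hits={hits} cacheable={cacheable} rate={hit_rate(hits, cacheable):.1f}%")
for name, (h, t) in per_compiler.items():
    print(f"{name:<20} {h:>5} / {t:>5} ({hit_rate(h, t):.1f}%)")
```

The per-compiler rows sum to the overall totals (127 hits out of 7977 cacheable), which suggests the two sections of the report count the same set of compilations.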

Analyze "with fixes"

(.venv) λ python build_tools/hack/ccache/analyze_ccache_logs.py --run-id 25585430330 --stage math-libs --gfx gfx110X-all --output-dir=D:\scratch\therock\ccache
  Using cached: D:\scratch\therock\ccache\run_25585430330\math-libs_gfx110X-all\ccache.log

======================================================================
  Run 25585430330 / math-libs / gfx110X-all
======================================================================

## Overall Summary
  Total result entries:   9547
  Cacheable:              7977
    Hits:                 7658 (96.0%)
      Direct:             6853
      Preprocessed:       805
    Misses:               319 (4.0%)
  Uncacheable:            79

## Per-Compiler Breakdown
  amd-llvm/clang++                    6 /    12 hits (50.0%)
  cl.exe (MSVC)                      51 /    56 hits (91.1%)
  clr/clang                           3 /    34 hits (8.8%)
  clr/clang++                      7598 /  7875 hits (96.5%)

## Top Projects by Cache Misses
  cmake-3.31/modules                 39 misses
  rocsparse                          29 misses
  rocthrust                          25 misses
  rocrand                            24 misses
  hipcub                             22 misses
  rocwmma                            20 misses
  rocblas                            18 misses
  rocfft                             15 misses
  hipfft                             13 misses
  hipblaslt                          12 misses
  miopen                             10 misses
  composable_kernel                   9 misses
  hipblas                             9 misses
  support                             8 misses
  rocsolver                           7 misses

Compare "baseline" to "with fixes"

(.venv) λ python build_tools/hack/ccache/compare_ccache_by_project.py D:\scratch\therock\ccache\run_25588131640\math-libs_gfx110X-all\ccache.log D:\scratch\therock\ccache\run_25585430330\math-libs_gfx110X-all\ccache.log
Parsing log 1...
Parsing log 2...

Project                                     Log 1              Log 2      Gap
                                hits/total   rate  hits/total   rate
------------------------------------------------------------------------------
  3p-spdlog                             0/2    0%          3/3  100%    +100%
  ?                                   2/177    1%      122/131   93%     +92%
  composable_kernel                   0/508    0%      303/303  100%    +100%
  composablekernel                    0/596    0%      391/391  100%    +100%
  hipblas                              0/84    0%        65/65  100%    +100%
  hipblaslt                            0/74    0%        46/46  100%    +100%
  hipcub                               0/92    0%        54/54  100%    +100%
  hipdnn                              2/239    1%      133/133  100%     +99%
  hipdnn_integration_tests              0/1    0%                 --
  hipfft                                0/9    0%          9/9  100%    +100%
  hiprand                               0/3    0%          3/3  100%    +100%
  hipsolver                            0/50    0%        53/53  100%    +100%
  hipsparse                           0/182    0%      187/187  100%    +100%
  miopen                              4/781    1%      397/409   97%     +97%
  rocblas                             1/447    0%      304/305  100%     +99%
  rocfft                               0/59    0%        53/53  100%    +100%
  rocprim                              0/44    0%        23/23  100%    +100%
  rocprim_tests                        0/29    0%        14/14  100%    +100%
  rocrand                              1/60    2%        23/23  100%     +98%
  rocsolver                           2/394    1%      312/312  100%     +99%
  rocsparse                           3/958    0%      903/903  100%    +100%
  rocthrust                           1/148    1%        77/77  100%     +99%
  rocwmma                             6/828    1%      641/641  100%     +99%
------------------------------------------------------------------------------
  TOTAL                        22/5765  0.4%       4116/4138 99.5%
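
The "Gap" column appears to be the difference between the two hit rates in percentage points, computed before rounding. That is an inference from the table, not something read from compare_ccache_by_project.py, and the function names below are hypothetical. A minimal sketch that reproduces a few rows:

```python
def rate(hits: int, total: int) -> float:
    """Hit rate as a percentage."""
    return 100.0 * hits / total


def gap_points(log1: tuple[int, int], log2: tuple[int, int]) -> int:
    """Percentage-point change from log 1 to log 2, rounded to a whole number.

    Assumption: the gap is taken between unrounded rates and rounded afterwards.
    """
    return round(rate(*log2) - rate(*log1))


# (hits, total) pairs copied from the comparison table.
rows = {
    "hipdnn": ((2, 239), (133, 133)),
    "miopen": ((4, 781), (397, 409)),
    "rocrand": ((1, 60), (23, 23)),
}
for name, (a, b) in rows.items():
    print(f"{name:<10} {rate(*a):3.0f}% -> {rate(*b):3.0f}%  gap {gap_points(a, b):+d}%")
```

The miopen row shows why the order matters: subtracting the displayed rates (97% minus 1%) would give +96%, while the unrounded difference rounds to the +97% shown in the table.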

Submission Checklist

Add docs/development/ccache_troubleshooting.md covering:
- CI infrastructure (remote cache servers, namespace versioning,
  platform differences, Windows drive mounts)
- How to download and inspect ccache logs from CI
- Symptoms of poor cache hit rates and their fixes
- Validation procedures for compiler reproducibility, path stability,
  and flag stability

Add two analysis scripts to build_tools/hack/ccache/:
- analyze_ccache_logs.py: Downloads ccache logs from S3 and reports
  hit/miss rates broken down by compiler and subproject
- compare_ccache_by_project.py: Compares per-subproject hit rates
  across two log files (e.g. Linux vs Windows)
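
For orientation, the download URL printed in the Test Plan output looks like it is assembled from the run ID, platform, stage, and GPU family. The sketch below reconstructs that pattern from the single URL shown there; the function name `ccache_log_url` is hypothetical, the layout is an assumption rather than the script's actual code, and only the `windows` platform suffix is confirmed by the output in this PR.

```python
def ccache_log_url(run_id: int, stage: str, gfx: str, platform: str = "windows") -> str:
    """Build the artifact URL for a run's ccache logs.

    Assumption: the bucket layout matches the one URL printed in the Test Plan.
    """
    base = "https://therock-ci-artifacts.s3.amazonaws.com"
    return f"{base}/{run_id}-{platform}/logs/{stage}/{gfx}/ccache_logs.tar.zst"


print(ccache_log_url(25588131640, "math-libs", "gfx110X-all"))
```
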

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

ScottTodd added the ci:skip label ("Skip all CI builds/tests for this PR") on May 11, 2026