
Profile code #6

Merged
transientlunatic merged 4 commits into v0.4-preview from update-profiling on Aug 20, 2024


@transientlunatic

Owner

This PR adds tests to profile the code more effectively.

@transientlunatic transientlunatic marked this pull request as draft August 20, 2024 10:09
@transientlunatic

Owner Author
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                       model_likelihood        51.52%     693.273ms        99.88%        1.344s        1.344s       0.000us         0.00%     278.159ms     278.159ms             1
                                  aten::linalg_solve_ex         0.00%      54.000us        23.12%     311.050ms     155.525ms       0.000us         0.00%     231.637ms     115.819ms             2
                                 aten::_linalg_solve_ex         0.02%     223.000us        23.11%     310.964ms     155.482ms       0.000us         0.00%     231.637ms     115.819ms             2
                              aten::linalg_lu_factor_ex         0.96%      12.924ms         9.46%     127.295ms      42.432ms     172.744ms        62.10%     173.101ms      57.700ms             3
                                       aten::linalg_inv         0.01%      70.000us        19.12%     257.323ms     257.323ms       0.000us         0.00%     159.835ms     159.835ms             1
                                    aten::linalg_inv_ex         0.02%     205.000us        18.36%     247.055ms     247.055ms       0.000us         0.00%     159.832ms     159.832ms             1
                                  aten::linalg_lu_solve         0.03%     367.000us        13.69%     184.271ms      92.135ms       3.025ms         1.09%      95.305ms      47.653ms             2
                          aten::linalg_solve_triangular         0.09%       1.188ms         9.22%     124.011ms      31.003ms      92.038ms        33.09%      92.038ms      23.009ms             4
void cutlass::Kernel<cutlass_80_tensorop_d884gemm_64...         0.00%       0.000us         0.00%       0.000us       0.000us      84.977ms        30.55%      84.977ms       2.360ms            36
                                     aten::linalg_solve         0.00%      14.000us         5.69%      76.628ms      76.628ms       0.000us         0.00%      71.864ms      71.864ms             1
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 1.346s
Self CUDA time total: 278.159ms
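
For readers who want to produce a table in this format, here is a minimal sketch using `torch.profiler`. It is not the test code added in this PR: `toy_likelihood` and `profile_likelihood` are hypothetical stand-ins, and the only name taken from the output above is the `model_likelihood` label, which is assumed here to come from a `record_function` block. The sketch sorts by total CPU time so that it also runs on machines without a GPU, whereas the table above appears to be ordered by total CUDA time.

```python
import torch
from torch.profiler import ProfilerActivity, profile, record_function


def toy_likelihood(n: int = 64, device: str = "cpu") -> torch.Tensor:
    """Hypothetical stand-in for the real likelihood: a Gaussian-style
    log-likelihood that exercises torch.linalg.solve."""
    a = torch.randn(n, n, dtype=torch.float64, device=device)
    # Build a symmetric positive-definite matrix so the solve is well posed.
    cov = a @ a.T + n * torch.eye(n, dtype=torch.float64, device=device)
    resid = torch.randn(n, 1, dtype=torch.float64, device=device)
    weighted = torch.linalg.solve(cov, resid)
    _, logdet = torch.linalg.slogdet(cov)
    return -0.5 * ((resid.T @ weighted).squeeze() + logdet)


def profile_likelihood(n: int = 64) -> str:
    """Profile one call to toy_likelihood and return the summary table."""
    use_cuda = torch.cuda.is_available()
    device = "cuda" if use_cuda else "cpu"
    activities = [ProfilerActivity.CPU]
    if use_cuda:
        # Only request CUDA timings when a GPU is actually present.
        activities.append(ProfilerActivity.CUDA)

    with profile(activities=activities) as prof:
        # The label below becomes a row name in the profiler table.
        with record_function("model_likelihood"):
            toy_likelihood(n, device)

    return prof.key_averages().table(sort_by="cpu_time_total", row_limit=10)


report = profile_likelihood()
print(report)
```

Wrapping the call in `record_function` gives the whole likelihood evaluation a single named row, which makes it easy to compare its total against the `aten::` operators it calls.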

@transientlunatic transientlunatic self-assigned this Aug 20, 2024
@transientlunatic transientlunatic marked this pull request as ready for review August 20, 2024 13:09
@transientlunatic transientlunatic merged commit 5bf0cdc into v0.4-preview Aug 20, 2024