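GPU performance on a Mac M2 is worse than CPU for small models (the log below reports slow downs of 2.6x to 18.5x), though most large models see some benefit (speed ups of 37% to 76%). The exception is flexible_cylinder, which takes about 0.1 ms on the CPU and is slower on the GPU at every size.
Speed check of CPU (double precision) vs. GPU (single precision) using OpenCL on the Mac, after setting gpu_speed_check and speed_only in explore/check_adaptive.py: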
$ python explore/check_adaptive.py
== Speed and accuracy tests for all adaptive integration models ==
* target evaluation time is 2 s (running on a mac M2 chip)
* q in [1e-5, 1] with 40 points per decade for 201 points total
* warns if the adaptive model is 2x slower than a 76-point gaussian
* large models tested against 5000 point gaussian integration
* q=[5e-4, 1e-3, 2e-3] with tol=1e-5 relative (measured q)
* q=[0.01, 0.1] with tol=0.2 relative (slit resolution limits)
* small models tested against 5000 point gaussian integration
* q in [1e-3, 1] with 1 points per decade
!!!! These tests run very slowly --- don't use as part of CI !!!!
=== small rods: a=20 b=40 c=200 ===
core_shell_bicelle background=0 radius=18.0 thick_rim=2.0 thick_face=2.0 length=196.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0
cpu double: 0.5 ms gpu single: 3.9 ms [*** 6.1x slow down]
core_shell_cylinder background=0 radius=18.0 thickness=2.0 length=196.0 sld_core=0 sld_shell=1 sld_solvent=0
cpu double: 0.3 ms gpu single: 3.9 ms [*** 12.0x slow down]
core_shell_ellipsoid background=0 radius_equat_core=18.0 x_core=5.444444444444445 thick_shell=2.0 x_polar_shell=1 sld_core=0 sld_shell=1 sld_solvent=0
cpu double: 0.2 ms gpu single: 2.2 ms [*** 11.2x slow down]
cylinder background=0 sld=1 sld_solvent=0 radius=20.0 length=200
cpu double: 0.2 ms gpu single: 2.6 ms [*** 11.1x slow down]
ellipsoid background=0 sld=1 sld_solvent=0 radius_equatorial=20.0 radius_polar=100.0
cpu double: 0.1 ms gpu single: 1.6 ms [*** 12.8x slow down]
flexible_cylinder background=0 length=2000 kuhn_length=200 radius=40 sld=1 sld_solvent=0
cpu double: 0.1 ms gpu single: 0.7 ms [*** 7.1x slow down]
hollow_cylinder background=0 radius=18.0 thickness=2.0 length=200 sld=1 sld_solvent=0
cpu double: 0.3 ms gpu single: 3.1 ms [*** 10.1x slow down]
core_shell_bicelle_elliptical_belt_rough background=0 radius=9.0 x_core=2.111111111111111 thick_rim=1.0 thick_face=1.0 length=198.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0 sigma=0.1
cpu double: 6.8 ms gpu single: 45.9 ms [*** 5.7x slow down]
core_shell_bicelle_elliptical background=0 radius=9.0 x_core=2.111111111111111 thick_rim=1.0 thick_face=1.0 length=198.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0
cpu double: 6.9 ms gpu single: 45.8 ms [*** 5.6x slow down]
core_shell_parallelepiped background=0 sld_core=0 sld_a=1 sld_b=1 sld_c=1 sld_solvent=0 length_a=18.0 length_b=38.0 length_c=198.0 thick_rim_a=1.0 thick_rim_b=1.0 thick_rim_c=1.0
cpu double: 4.8 ms gpu single: 41.4 ms [*** 7.7x slow down]
elliptical_cylinder background=0 radius_minor=10.0 axis_ratio=0.2 length=200 sld=1 sld_solvent=0
cpu double: 2.5 ms gpu single: 29.2 ms [*** 10.5x slow down]
hollow_rectangular_prism_thin_walls background=0 sld=1 sld_solvent=0 length_a=20 b2a_ratio=2.0 c2a_ratio=10.0
cpu double: 3.0 ms gpu single: 24.5 ms [*** 7.1x slow down]
hollow_rectangular_prism background=0 sld=1 sld_solvent=0 length_a=20 b2a_ratio=2.0 c2a_ratio=10.0 thickness=1.0
cpu double: 4.2 ms gpu single: 39.5 ms [*** 8.3x slow down]
parallelepiped background=0 sld=1 sld_solvent=0 length_a=20 length_b=40 length_c=200
cpu double: 2.9 ms gpu single: 25.8 ms [*** 7.8x slow down]
rectangular_prism background=0 sld=1 sld_solvent=0 length_a=20 b2a_ratio=2.0 c2a_ratio=10.0
cpu double: 2.8 ms gpu single: 24.4 ms [*** 7.6x slow down]
=== small disks: a=180 b=200 c=40 ===
core_shell_ellipsoid background=0 radius_equat_core=90.0 x_core=0.1111111111111111 thick_shell=10.0 x_polar_shell=1 sld_core=0 sld_shell=1 sld_solvent=0
cpu double: 0.2 ms gpu single: 1.2 ms [*** 6.1x slow down]
cylinder background=0 sld=1 sld_solvent=0 radius=100.0 length=40
cpu double: 0.2 ms gpu single: 1.3 ms [*** 5.6x slow down]
ellipsoid background=0 sld=1 sld_solvent=0 radius_equatorial=100.0 radius_polar=20.0
cpu double: 0.1 ms gpu single: 0.9 ms [*** 8.3x slow down]
flexible_cylinder background=0 length=400 kuhn_length=40 radius=200 sld=1 sld_solvent=0
cpu double: 0.1 ms gpu single: 0.6 ms [*** 7.3x slow down]
triaxial_ellipsoid background=0 sld=1 sld_solvent=0 radius_equat_minor=180 radius_equat_major=200 radius_polar=40
cpu double: 29.0 ms gpu single: 110.6 ms [*** 2.8x slow down]
=== small cubes: a=200 b=200 c=200 ===
core_shell_ellipsoid background=0 radius_equat_core=90.0 x_core=1.0 thick_shell=10.0 x_polar_shell=1 sld_core=0 sld_shell=1 sld_solvent=0
cpu double: 0.2 ms gpu single: 1.2 ms [*** 6.6x slow down]
cylinder background=0 sld=1 sld_solvent=0 radius=100.0 length=200
cpu double: 0.2 ms gpu single: 1.3 ms [*** 5.5x slow down]
ellipsoid background=0 sld=1 sld_solvent=0 radius_equatorial=100.0 radius_polar=100.0
cpu double: 0.1 ms gpu single: 0.9 ms [*** 8.8x slow down]
flexible_cylinder background=0 length=2000 kuhn_length=200 radius=200 sld=1 sld_solvent=0
cpu double: 0.1 ms gpu single: 0.5 ms [*** 7.2x slow down]
elliptical_cylinder background=0 radius_minor=100.0 axis_ratio=1.0 length=200 sld=1 sld_solvent=0
cpu double: 11.0 ms gpu single: 214.6 ms [*** 18.5x slow down]
triaxial_ellipsoid background=0 sld=1 sld_solvent=0 radius_equat_minor=200 radius_equat_major=200 radius_polar=200
cpu double: 30.2 ms gpu single: 110.0 ms [*** 2.6x slow down]
=== big rods: a=1000 b=2000 c=200000 ===
flexible_cylinder background=0 length=2000000 kuhn_length=200000 radius=2000 sld=1 sld_solvent=0
cpu double: 0.1 ms gpu single: 0.6 ms [*** 8.1x slow down]
barbell background=0 radius=500.0 radius_bell=1000.0 length=196267.94919243112 sld=1 sld_solvent=0
cpu double: 778.3 ms gpu single: 334.8 ms [56% speed up]
capped_cylinder background=0 radius=500.0 radius_cap=1000.0 length=199732.05080756888 sld=1 sld_solvent=0
cpu double: 771.3 ms gpu single: 319.3 ms [58% speed up]
core_shell_bicelle_elliptical_belt_rough background=0 radius=450.0 x_core=2.111111111111111 thick_rim=50.0 thick_face=50.0 length=199900.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0 sigma=5.0
cpu double: 1297.7 ms gpu single: 549.2 ms [57% speed up]
core_shell_bicelle_elliptical background=0 radius=450.0 x_core=2.111111111111111 thick_rim=50.0 thick_face=50.0 length=199900.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0
cpu double: 1302.1 ms gpu single: 549.4 ms [57% speed up]
core_shell_parallelepiped background=0 sld_core=0 sld_a=1 sld_b=1 sld_c=1 sld_solvent=0 length_a=900.0 length_b=1900.0 length_c=199900.0 thick_rim_a=50.0 thick_rim_b=50.0 thick_rim_c=50.0
cpu double: 891.7 ms gpu single: 497.2 ms [44% speed up]
elliptical_cylinder background=0 radius_minor=500.0 axis_ratio=0.01 length=200000 sld=1 sld_solvent=0
cpu double: 599.2 ms gpu single: 335.3 ms [44% speed up]
hollow_rectangular_prism background=0 sld=1 sld_solvent=0 length_a=1000 b2a_ratio=2.0 c2a_ratio=200.0 thickness=50.0
cpu double: 778.6 ms gpu single: 485.6 ms [37% speed up]
parallelepiped background=0 sld=1 sld_solvent=0 length_a=1000 length_b=2000 length_c=200000
cpu double: 546.8 ms gpu single: 309.3 ms [43% speed up]
rectangular_prism background=0 sld=1 sld_solvent=0 length_a=1000 b2a_ratio=2.0 c2a_ratio=200.0
cpu double: 512.8 ms gpu single: 296.2 ms [42% speed up]
=== big disks: a=180000 b=200000 c=1000 ===
flexible_cylinder background=0 length=10000 kuhn_length=1000 radius=200000 sld=1 sld_solvent=0
cpu double: 0.1 ms gpu single: 0.6 ms [*** 8.1x slow down]
barbell background=0 radius=90000.0 radius_bell=100000.0 length=0 sld=1 sld_solvent=0
cpu double: 1046.0 ms gpu single: 337.2 ms [67% speed up]
capped_cylinder background=0 radius=90000.0 radius_cap=100000.0 length=0 sld=1 sld_solvent=0
cpu double: 936.5 ms gpu single: 337.3 ms [63% speed up]
core_shell_bicelle_elliptical_belt_rough background=0 radius=81000.0 x_core=1.123456790123457 thick_rim=9000.0 thick_face=9000.0 length=-17000.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0 sigma=900.0
cpu double: 1874.4 ms gpu single: 836.5 ms [55% speed up]
core_shell_bicelle_elliptical background=0 radius=81000.0 x_core=1.123456790123457 thick_rim=9000.0 thick_face=9000.0 length=-17000.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0
cpu double: 1874.6 ms gpu single: 839.9 ms [55% speed up]
core_shell_parallelepiped background=0 sld_core=0 sld_a=1 sld_b=1 sld_c=1 sld_solvent=0 length_a=162000.0 length_b=182000.0 length_c=-17000.0 thick_rim_a=9000.0 thick_rim_b=9000.0 thick_rim_c=9000.0
cpu double: 1721.5 ms gpu single: 749.8 ms [56% speed up]
elliptical_cylinder background=0 radius_minor=90000.0 axis_ratio=200.0 length=1000 sld=1 sld_solvent=0
! ** elliptical_cylinder is slow: 2.2 s for 201 points in [1e-05, 1.0]
cpu double: 2225.0 ms gpu single: 523.9 ms [76% speed up]
hollow_rectangular_prism_thin_walls background=0 sld=1 sld_solvent=0 length_a=180000 b2a_ratio=1.1111111111111112 c2a_ratio=0.005555555555555556
cpu double: 883.0 ms gpu single: 453.4 ms [48% speed up]
hollow_rectangular_prism background=0 sld=1 sld_solvent=0 length_a=180000 b2a_ratio=1.1111111111111112 c2a_ratio=0.005555555555555556 thickness=9000.0
cpu double: 1633.8 ms gpu single: 715.9 ms [56% speed up]
parallelepiped background=0 sld=1 sld_solvent=0 length_a=180000 length_b=200000 length_c=1000
cpu double: 1113.5 ms gpu single: 463.5 ms [58% speed up]
rectangular_prism background=0 sld=1 sld_solvent=0 length_a=180000 b2a_ratio=1.1111111111111112 c2a_ratio=0.005555555555555556
cpu double: 1075.5 ms gpu single: 437.0 ms [59% speed up]
triaxial_ellipsoid background=0 sld=1 sld_solvent=0 radius_equat_minor=180000 radius_equat_major=200000 radius_polar=1000
cpu double: 600.6 ms gpu single: 259.9 ms [56% speed up]
=== big cubes: a=200000 b=200000 c=200000 ===
flexible_cylinder background=0 length=2000000 kuhn_length=200000 radius=200000 sld=1 sld_solvent=0
cpu double: 0.1 ms gpu single: 0.6 ms [*** 8.2x slow down]
barbell background=0 radius=100000.0 radius_bell=100000.0 length=0.0 sld=1 sld_solvent=0
cpu double: 1000.5 ms gpu single: 336.3 ms [66% speed up]
capped_cylinder background=0 radius=100000.0 radius_cap=100000.0 length=0.0 sld=1 sld_solvent=0
cpu double: 1000.8 ms gpu single: 335.6 ms [66% speed up]
core_shell_bicelle_elliptical_belt_rough background=0 radius=90000.0 x_core=1.0 thick_rim=10000.0 thick_face=10000.0 length=180000.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0 sigma=1000.0
cpu double: 1765.2 ms gpu single: 835.1 ms [52% speed up]
core_shell_bicelle_elliptical background=0 radius=90000.0 x_core=1.0 thick_rim=10000.0 thick_face=10000.0 length=180000.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0
cpu double: 1764.4 ms gpu single: 839.7 ms [52% speed up]
core_shell_parallelepiped background=0 sld_core=0 sld_a=1 sld_b=1 sld_c=1 sld_solvent=0 length_a=180000.0 length_b=180000.0 length_c=180000.0 thick_rim_a=10000.0 thick_rim_b=10000.0 thick_rim_c=10000.0
cpu double: 1726.7 ms gpu single: 751.9 ms [56% speed up]
elliptical_cylinder background=0 radius_minor=100000.0 axis_ratio=1.0 length=200000 sld=1 sld_solvent=0
cpu double: 912.3 ms gpu single: 523.2 ms [42% speed up]
hollow_rectangular_prism_thin_walls background=0 sld=1 sld_solvent=0 length_a=200000 b2a_ratio=1.0 c2a_ratio=1.0
cpu double: 888.5 ms gpu single: 453.8 ms [48% speed up]
hollow_rectangular_prism background=0 sld=1 sld_solvent=0 length_a=200000 b2a_ratio=1.0 c2a_ratio=1.0 thickness=10000.0
cpu double: 1641.9 ms gpu single: 714.1 ms [56% speed up]
parallelepiped background=0 sld=1 sld_solvent=0 length_a=200000 length_b=200000 length_c=200000
cpu double: 1112.4 ms gpu single: 464.0 ms [58% speed up]
rectangular_prism background=0 sld=1 sld_solvent=0 length_a=200000 b2a_ratio=1.0 c2a_ratio=1.0
cpu double: 1076.6 ms gpu single: 436.8 ms [59% speed up]
Originally posted by @pkienzle in #658 (comment)
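The output does not define the bracketed figures. Judging from the printed times, "N% speed up" appears to be the fraction of CPU time saved, truncated to a whole percent, and "Nx slow down" appears to be the extra GPU time relative to the CPU time (so "2.8x slow down" would mean the GPU takes about 3.8 times as long). The sketch below reproduces several of the logged labels under that assumption. It is inferred from the rounded numbers in the log, not taken from explore/check_adaptive.py, and the helper name `compare` is hypothetical.

```python
def compare(cpu_ms: float, gpu_ms: float) -> str:
    """Format a cpu-vs-gpu timing label in the style of the log above.

    Assumption: the formulas are inferred from the printed times and are
    not copied from explore/check_adaptive.py.
    """
    if gpu_ms < cpu_ms:
        # Fraction of CPU time saved, truncated to a whole percent.
        percent = int(100 * (1 - gpu_ms / cpu_ms))
        return f"[{percent}% speed up]"
    # Extra time on the GPU, as a multiple of the CPU time.
    factor = gpu_ms / cpu_ms - 1
    return f"[*** {factor:.1f}x slow down]"


# Big rods, barbell: cpu 778.3 ms, gpu 334.8 ms
print(compare(778.3, 334.8))  # [56% speed up]
# Small disks, triaxial_ellipsoid: cpu 29.0 ms, gpu 110.6 ms
print(compare(29.0, 110.6))   # [*** 2.8x slow down]
```

Entries with sub-millisecond CPU times (for example 0.3 ms vs. 3.9 ms) cannot be checked this way, because the printed times are rounded to 0.1 ms while the logged ratio was presumably computed from the unrounded values.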