Problem with GEMM benchmark results for NVIDIA Volta #104

Your NVIDIA GEMM benchmark appears to have a problem. gemm_bench.cu uses uint16_t, an integer type, instead of __half to represent half-precision floating-point numbers. As a result, when rand() in tensor.h fills matrices A and B, the random floating-point values between 0 and 1 are converted to integers, so most entries become zero rather than random floating-point values.

This makes the benchmark timings unrepresentative on Volta GPUs with power/frequency throttling enabled, because computing on zeros draws much less power than computing on random numbers. I've confirmed this with nvidia-smi while running your benchmark. With your GEMM benchmark, I measured reported performance up to ~15% higher on these mostly-zero inputs, an unrepresentative use case, than on realistic nonzero inputs.

The fix seems to be replacing uint16_t with __half in the code.

Thank you for your assistance.
