
Problem with GEMM benchmark results for NVIDIA Volta #104

@wdj


Your NVIDIA GEMM benchmark appears to have a problem. gemm_bench.cu uses uint16_t, an integer type, instead of __half to represent half-precision floating-point numbers. As a result, rand() in tensor.h fills the matrices A and B with random floating-point numbers between 0 and 1 that are then converted to integers, so most of the entries are zeros rather than random floating-point values. This produces unrepresentative benchmark timings on Volta GPUs that have power/frequency throttling enabled, because computing on zeros draws much less power than computing on random numbers. I've confirmed this with nvidia-smi using your benchmark. For your GEMM benchmark, I've measured reported performance up to ~15% higher when computing on zeros, an unrepresentative use case, than when computing on realistic, nonzero inputs.

The fix seems to be replacing uint16_t with __half in the code.
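
To illustrate the truncation described above, here is a minimal host-side C++ sketch. It is not the code from tensor.h: `fill_uniform` and `count_zeros` are hypothetical helpers written for this example, and `float` stands in for `__half` so the snippet builds without the CUDA toolkit. The point is only that storing a value drawn from [0, 1) in an integer element type truncates it to 0, while a floating-point element type keeps the fractional part.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for a random fill routine; NOT the code in tensor.h.
// Draws values uniformly from [0, 1) and stores each one as element type T.
template <typename T>
std::vector<T> fill_uniform(std::size_t n, unsigned seed = 0) {
    std::mt19937 gen(seed);
    std::vector<T> out(n);
    for (auto& x : out) {
        // Keep 24 random bits and scale by 2^-24: exact in float, always < 1.
        float r = static_cast<float>(gen() >> 8) * (1.0f / 16777216.0f);
        // For an integer T this conversion truncates toward zero,
        // so every value in [0, 1) is stored as 0.
        x = static_cast<T>(r);
    }
    return out;
}

// Counts the entries that compare equal to zero.
template <typename T>
std::size_t count_zeros(const std::vector<T>& v) {
    std::size_t zeros = 0;
    for (const auto& x : v) {
        if (x == T(0)) ++zeros;
    }
    return zeros;
}
```

Calling `count_zeros(fill_uniform<std::uint16_t>(1024))` counts every entry as zero in this sketch, whereas the same call with `float` leaves essentially none. In the actual benchmark the floating-point element type would presumably be `__half` from `cuda_fp16.h`, as suggested above.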

Thank you for your assistance.
