library for accelerating mixed-precision matrix multiply-accumulate operations
https://github.com/ROCm/rocm-libraries/tree/develop/projects/rocwmma