matmul benchmark
This page runs several WebGPU matmul programs on 4096x4096 matrices:
naive-1
naive-16
naive-32
shmem-tiling-16
shmem-tiling-32
unroll4-8-8
unroll4-8-16
unroll4-16-16
unroll4x2-8-8
unroll4x2-8-16
unroll4x2-16-16
unroll4x4-8-8
unroll4x4-8-16
unroll4x4-16-16
tfjs
jax-js
jax-js-fp16
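
To compare the variants above on a common scale, a measured runtime can be converted into throughput. The sketch below is not taken from the benchmark's source. It assumes square n x n matrices, the conventional 2n^3 floating-point operation count for a dense matmul, and a runtime in milliseconds. The names `matmulFlops`, `gflopsPerSecond`, and `matmulReference` are hypothetical helpers introduced here for illustration. The last one is a plain CPU reference that could be used to sanity-check a GPU kernel's output on small inputs.

```typescript
// Hypothetical helpers for interpreting matmul timings (not part of the benchmark itself).

// Conventional operation count for multiplying two n x n matrices:
// about n^3 multiplications plus n^3 additions.
function matmulFlops(n: number): number {
  return 2 * n * n * n;
}

// Convert a measured runtime in milliseconds into GFLOP/s.
function gflopsPerSecond(n: number, elapsedMs: number): number {
  const seconds = elapsedMs / 1000;
  return matmulFlops(n) / seconds / 1e9;
}

// Naive CPU matmul over row-major Float32Array buffers.
// Intended only for checking results on small matrices.
function matmulReference(a: Float32Array, b: Float32Array, n: number): Float32Array {
  const c = new Float32Array(n * n); // zero-initialized
  for (let i = 0; i < n; i++) {
    for (let k = 0; k < n; k++) {
      const aik = a[i * n + k];
      for (let j = 0; j < n; j++) {
        c[i * n + j] += aik * b[k * n + j];
      }
    }
  }
  return c;
}
```

For n = 4096 the operation count is 2 * 4096^3 = 137,438,953,472, or about 137.4 GFLOP per multiplication. As a made-up illustration rather than a measurement, a run taking 100 ms would correspond to roughly 1,374 GFLOP/s.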