Speaker
Hans Hacker
(LRZ)
Description
As part of PRACE, a few kernels from the EuroBen benchmark
had to be ported to all available PRACE prototype architectures.
This presentation focuses on the Nvidia/CUDA port and gives an
overview of the Nvidia hardware used, the experience gained with CUDA, and the
available toolkit. It illustrates the porting effort, the problems
encountered, and the results, using three of those kernels as examples:
a dense matrix-matrix multiplication, a sparse matrix-vector multiplication,
and a 1D fast Fourier transform.