CUDA-MTL4 manual

Many things can be realized on a computer very elegantly and efficiently today thanks to progress in software and programming languages. One thing that cannot be done elegantly on a computer is computing. At least not computing fast.

In the Matrix Template Library 4 we aim for a natural mathematical notation without sacrificing performance. You can write an expression like x = y * z and the library will perform the corresponding operation: scaling a vector, multiplying a sparse matrix with a dense vector, or multiplying two sparse matrices. Some operations, such as the dense matrix product, use a tuned BLAS implementation. In addition, all operations described in this manual are also implemented in C++, so the library can be used without BLAS and is not limited to the types supported by BLAS. In short, general applicability is combined with maximal available performance. We developed new techniques to allow for:

- Unrolling of dynamically sized data with user-defined block and tile sizes;
- Combining multiple vector assignments in a single statement (and, more importantly, performing them in one single loop);
- Storing matrices recursively in a never-before-realized generality;
- Performing operations on recursive and non-recursive matrices recursively;
- Filling compressed sparse matrices efficiently;

and much more.
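
To make the idea behind an expression like x = y * z more concrete, the following is a minimal, self-contained sketch of how one notation can select different operations depending on the operand types. It is not the library's implementation and does not use its API: the types Vec and DenseMat and all three operators are hypothetical stand-ins written only for this illustration, and sparse formats, BLAS dispatch, and expression templates are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in types for illustration only; NOT the MTL4 types.
struct Vec {
    std::vector<double> data;
};

struct DenseMat {
    std::size_t rows, cols;
    std::vector<double> data;  // row-major storage, rows * cols entries

    double operator()(std::size_t r, std::size_t c) const {
        return data[r * cols + c];
    }
};

// scalar * vector: scales every entry of the vector.
Vec operator*(double alpha, const Vec& v) {
    Vec result{v.data};
    for (double& entry : result.data) entry *= alpha;
    return result;
}

// matrix * vector: plain matrix-vector product.
Vec operator*(const DenseMat& A, const Vec& v) {
    assert(A.cols == v.data.size());
    Vec result{std::vector<double>(A.rows, 0.0)};
    for (std::size_t i = 0; i < A.rows; ++i)
        for (std::size_t j = 0; j < A.cols; ++j)
            result.data[i] += A(i, j) * v.data[j];
    return result;
}

// matrix * matrix: plain triple-loop matrix product.
DenseMat operator*(const DenseMat& A, const DenseMat& B) {
    assert(A.cols == B.rows);
    DenseMat C{A.rows, B.cols, std::vector<double>(A.rows * B.cols, 0.0)};
    for (std::size_t i = 0; i < A.rows; ++i)
        for (std::size_t k = 0; k < A.cols; ++k)
            for (std::size_t j = 0; j < B.cols; ++j)
                C.data[i * C.cols + j] += A(i, k) * B(k, j);
    return C;
}
```

With these overloads, the statement x = y * z compiles to a vector scaling, a matrix-vector product, or a matrix-matrix product purely on the basis of the static types of y and z, so the calling code stays the same in all three cases.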

The manual does not yet cover all features and techniques of the library, but it should give you enough information to get started.
