Abstract
An MPI implementation of the Basic Linear Algebra Communication Subprograms (BLACS), an underlying layer of the ScaLAPACK library, is presented. A wide spectrum of MPI functionality is used to realize BLACS as succinctly as possible, keeping the implementation concise while still yielding good performance. Some implementation details are discussed, and benchmark results for the ScaLAPACK LU factorization on several parallel architectures with different MPI libraries are presented. The performance is compared with that of other existing BLACS implementations, and some conclusions are drawn from the results.