2015 IEEE International Conference on Cluster Computing (CLUSTER)

Abstract

Noncontiguous data communication has been heavily adopted in scientific applications, especially those written with MPI. Common strategies for handling noncontiguous data, such as packing/unpacking, incur significant performance overhead during communication, which can become a barrier to using MPI derived datatypes. Recently, a novel feature of Mellanox InfiniBand, called User-mode Memory Registration (UMR), has been introduced for noncontiguous data communication. UMR has the potential to support MPI derived datatype communication efficiently without the overhead of packing/unpacking. In this paper, we analyze the UMR feature and study its basic performance with InfiniBand verbs-level micro-benchmarks. With this knowledge, we propose UMR-based schemes to support zero-copy datatype communication at the MPI level. We show that a naive integration of UMR with an MPI stack does not bring performance benefits over existing schemes. Thus we propose two schemes -- UMR Pool and UMR Cache -- to enable high performance MPI datatype communication with UMR. To the best of our knowledge, this is the first paper to study, analyze, and design MPI noncontiguous data communication using the UMR feature. We propose and implement UMR-based designs on top of the MVAPICH2 library. The experimental results at the micro-benchmark level show that the proposed UMR-based design delivers a 4X improvement in latency for large message vector benchmarks over the packing/unpacking scheme. At the application level, for a 3D stencil communication kernel with MPI derived datatypes on 512 processes, the optimized UMR-based design outperforms the packing/unpacking scheme by 27% in execution time.
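
To make the packing/unpacking overhead concrete, the following is a minimal sketch in plain Python, not the paper's implementation and not MPI or verbs code. It models a vector layout with the same three parameters that MPI_Type_vector takes (count, blocklength, stride); the function names `pack_vector` and `unpack_vector` and the `offset` argument are hypothetical, introduced only for illustration. The point is that both sender and receiver perform an extra element-by-element copy through a contiguous staging buffer, which is the cost a zero-copy scheme aims to avoid.

```python
def pack_vector(buf, count, blocklength, stride, offset=0):
    """Copy `count` blocks of `blocklength` elements, spaced `stride`
    elements apart in `buf`, into a new contiguous list (sender side)."""
    packed = []
    for i in range(count):
        start = offset + i * stride
        packed.extend(buf[start:start + blocklength])
    return packed


def unpack_vector(packed, buf, count, blocklength, stride, offset=0):
    """Scatter a contiguous list back into the strided layout of `buf`
    (receiver side). This is the second extra copy."""
    for i in range(count):
        start = offset + i * stride
        buf[start:start + blocklength] = packed[i * blocklength:(i + 1) * blocklength]


# Illustrative data: a 4x5 row-major matrix; send column 2.
rows, cols = 4, 5
matrix = list(range(rows * cols))

# One element per row, rows are `cols` elements apart.
column = pack_vector(matrix, count=rows, blocklength=1, stride=cols, offset=2)

# The receiver scatters the contiguous message into its own strided buffer.
recv = [0] * (rows * cols)
unpack_vector(column, recv, count=rows, blocklength=1, stride=cols, offset=2)
```

Here `column` stands in for the staging buffer that would be handed to the network. Each transfer touches every element twice in software (once to pack, once to unpack) in addition to the transfer itself.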