Abstract
The performance of Message Passing Interface (MPI) collective communications is a critical and widely discussed issue in high-performance computing. In this paper we propose a mechanism that dynamically selects the most efficient MPI Alltoall algorithm for a given system and workload. The method first groups the fastest algorithms according to their performance prediction models, which were derived from the point-to-point model P-LogP. Experiments on different parallel machines equipped with InfiniBand and Gigabit Ethernet interconnects produced encouraging results, with negligible overhead for finding the most appropriate algorithm to carry out the operation. In most cases, the dynamic Alltoall substantially outperforms the traditional MPI implementations across these platforms.