Abstract
With the rapid increase in the size and scale of modern systems, topology-aware process mapping has become an important approach for improving system efficiency. A poor placement of processes across compute nodes can cause significant congestion within the interconnect. In this paper, we propose a new greedy mapping heuristic as well as a mapping refinement algorithm. The heuristic attempts to minimize a hybrid metric that we use for evaluating various mappings, whereas the refinement algorithm attempts to reduce maximum congestion directly. We exploit parallelism in the design and implementation of both algorithms to achieve scalability. We also use the underlying routing information, in addition to the topology of the system, to evaluate congestion more accurately. Our experimental results with 4096 processes show that the proposed approach can provide more than 60% improvement in various mapping metrics compared to an initial in-order mapping of processes. Communication time is also improved by 50%. Finally, we compare our proposed algorithms with four other heuristics from the LibTopoMap library, and show that we can achieve better mappings at a significantly lower cost.