2018 International Conference on Computational Science and Computational Intelligence (CSCI)

Abstract

A supercomputer may provide a heterogeneous environment that includes both host processors and accelerators, so programmers can execute applications on both. To achieve the combined performance potential, software must partition the workload of parallel applications effectively, maximizing the computation overlap between host processors and accelerators. However, it is hard to determine the right data partition and task parallelism for a new application on a heterogeneous platform: the number of possible ways to partition data between host processors and accelerators is huge, and an imbalanced data and task partition can seriously hurt performance. In this paper, we present an approach to determining the workload partition and the task granularity for any given application, targeting Intel Xeon Phi accelerated heterogeneous systems. We employ machine learning techniques to train a predictive model off-line and then use the trained model to predict the data partition and task granularity for unseen programs at runtime. We apply our approach to 21 representative parallel applications and evaluate it on a mixed Xeon-Xeon Phi heterogeneous platform. Compared with the optimized, non-asynchronous method, our approach achieves average speedups of 1.6× and 1.9× when using a single MIC with one and two CPU processors, respectively.
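
The off-line training and runtime prediction flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the model, the program features, or the configuration encoding, so a 1-nearest-neighbour lookup stands in for the predictive model, and the feature vectors, the `(host_fraction, num_tasks)` labels, and the helper names are hypothetical.

```python
from math import dist

# Hypothetical "off-line" training set: each entry pairs a program feature
# vector with the best configuration found for that program, encoded as
# (fraction of data kept on the host, number of tasks on the accelerator).
# All values are made up for illustration; none come from the paper.
TRAINING = [
    ((0.9, 0.1), (0.2, 16)),
    ((0.5, 0.5), (0.4, 8)),
    ((0.1, 0.8), (0.7, 4)),
]


def predict_config(features, training=TRAINING):
    """Return the configuration of the nearest training program (1-NN).

    This is a stand-in for whatever predictive model is actually trained.
    """
    _, config = min(training, key=lambda row: dist(row[0], features))
    return config


def split_workload(n_items, host_fraction):
    """Split n_items into (host share, accelerator share)."""
    host = round(n_items * host_fraction)
    return host, n_items - host


def task_ranges(n_items, num_tasks):
    """Cut n_items into num_tasks contiguous (start, end) chunks."""
    base, extra = divmod(n_items, num_tasks)
    ranges, start = [], 0
    for i in range(num_tasks):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # "Runtime": features of an unseen program, again purely illustrative.
    host_fraction, num_tasks = predict_config((0.85, 0.15))
    host_items, accel_items = split_workload(1000, host_fraction)
    print(host_items, accel_items, task_ranges(accel_items, num_tasks)[:2])
```

The sketch separates the two predicted quantities: the host fraction decides how much data stays on the CPU, and the task count decides how finely the accelerator share is chunked so that transfers and computation can overlap.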