2017 IEEE International Parallel and Distributed Processing Symposium: Workshops (IPDPSW)

Abstract

The K-means algorithm is one of the simplest and most widely used clustering algorithms. Significant work has been carried out over several years to improve its performance in both academic and industrial applications. Researchers have optimized K-means not only at the algorithm level but also at the architecture level. Notably, GEMM, a rigorously studied matrix multiplication operation, has been used to speed up the Euclidean-distance calculations in the K-means algorithm. The Intel DAAL library currently provides a fast K-means implementation based on the Intel Math Kernel Library (MKL) GEMM subroutine and low-level architecture information. However, despite using the MKL GEMM subroutine and architecture properties, the state-of-the-art K-means implementation still performs far below the hardware's peak. This paper presents a faster fused-matrix K-means kernel that outperforms current K-means designs. In our experiments, the fused-matrix K-means kernel runs around 76% faster than the state-of-the-art Intel DAAL K-means algorithm and achieves nearly double the floating-point performance on Intel x86-64 Ivy micro-architectures.
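To illustrate how GEMM enters the Euclidean-distance step, the sketch below uses the standard identity ||x - c||^2 = ||x||^2 - 2 x·c + ||c||^2, in which the cross terms for all points and centroids form a single matrix product. It is a minimal illustration of the general GEMM-based formulation, not the fused-matrix kernel presented in the paper and not the Intel DAAL implementation. The function names (`gemm`, `squared_distances`, `assign`) are hypothetical, and the naive `gemm` merely stands in for an optimized BLAS routine.

```python
def gemm(a, b_rows):
    """Naive stand-in for a BLAS GEMM call (hypothetical helper).

    Computes A * B^T, where `a` and `b_rows` are lists of equal-length rows.
    """
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b_rows]
            for row_a in a]


def squared_distances(points, centroids):
    """Squared Euclidean distance from every point to every centroid.

    Uses ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, so the dominant cost
    is the single matrix product that yields all x.c cross terms.
    """
    cross = gemm(points, centroids)                       # all x.c at once
    p_norms = [sum(v * v for v in p) for p in points]     # ||x||^2 per point
    c_norms = [sum(v * v for v in c) for c in centroids]  # ||c||^2 per centroid
    return [[p_norms[i] - 2.0 * cross[i][j] + c_norms[j]
             for j in range(len(centroids))]
            for i in range(len(points))]


def assign(points, centroids):
    """K-means assignment step: index of the nearest centroid per point."""
    dists = squared_distances(points, centroids)
    return [min(range(len(row)), key=row.__getitem__) for row in dists]


if __name__ == "__main__":
    pts = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]
    cts = [[0.0, 0.0], [10.0, 10.0]]
    print(assign(pts, cts))
```

In this formulation the distance matrix is materialized in full before the nearest centroid is selected; with a tuned GEMM, that extra pass over the intermediate matrix is one plausible source of overhead in the assignment step.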