Abstract
The use of Graphics Processing Units (GPUs) as computing accelerators has become widespread. Nevertheless, writing efficient GPU programs is a difficult and time-consuming task. In this paper we present the Linear Performance Breakdown Model (LPBM), an analytic model that breaks the execution time of a GPU kernel program down into the three major components that determine its running time. The model can be used as a tool to guide the detection of performance bottlenecks. Our approach incorporates three elements: the Global-to-Shared Memory Time slice, the Shared-to-Private Time slice, and the Processing Units Time slice. These three factors are integrated into a performance model formula by applying the Normalized Least Squares Method (NLSM). The resulting coefficients are used to construct a performance breakdown graph that reveals the effect of each element on the total execution time of the kernel program. We demonstrate the proposed model on two common numeric routines, Single-Precision General Matrix Multiplication (SGMM) and Fast Fourier Transform (FFT), and apply it to results obtained from two GPU devices: an AMD A8-3870 Accelerated Processing Unit (APU) and an Nvidia GTX 660 GPU.
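To make the idea of a linear breakdown concrete, the following minimal sketch fits a three-term linear model to measured kernel times and reports each term's share of the predicted time. It rests on several assumptions: the abstract does not specify the NLSM, so plain ordinary least squares via the normal equations stands in for it; the function names, the feature values, and the "true" coefficients are hypothetical and synthetic, chosen only so that the fit can be checked.

```python
# Hypothetical sketch: fit T ~ a*g2s + b*s2p + c*pu with ordinary least squares
# (a stand-in for the paper's NLSM, which the abstract does not detail).

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_coefficients(features, times):
    """Least-squares fit via the normal equations (X^T X) w = X^T t."""
    k = len(features[0])
    xtx = [[sum(row[i] * row[j] for row in features) for j in range(k)]
           for i in range(k)]
    xtt = [sum(row[i] * t for row, t in zip(features, times)) for i in range(k)]
    return solve_linear_system(xtx, xtt)

def breakdown(coeffs, feature_row):
    """Fraction of the predicted time attributed to each of the three terms."""
    parts = [c * f for c, f in zip(coeffs, feature_row)]
    total = sum(parts)
    return [p / total for p in parts]

# Synthetic per-run counts (assumed): global-to-shared transfers,
# shared-to-private transfers, processing-unit operations.
features = [
    [1.0, 2.0, 10.0],
    [2.0, 1.0, 20.0],
    [3.0, 5.0, 15.0],
    [4.0, 3.0, 40.0],
    [5.0, 8.0, 25.0],
]
true_coeffs = [2.0, 0.5, 0.1]  # made-up cost per unit of each feature
times = [sum(c * f for c, f in zip(true_coeffs, row)) for row in features]

coeffs = fit_coefficients(features, times)
shares = breakdown(coeffs, features[0])
print("fitted coefficients:", coeffs)
print("time shares for first run:", shares)
```

In this sketch the largest share would point at the component most worth optimizing for that run; with real measurements the fit would not be exact, and a constrained or normalized variant of least squares may be needed to keep the coefficients physically meaningful.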