Abstract
The increasing demand for high-quality graphics requires a significant increase in the computational power of modern GPUs. The common approach is to increase the number of compute units (i.e., shader cores). However, this can leave resources underutilized if the workload is not properly balanced. Balancing is particularly challenging for graphics applications on Tile-Based Rendering (TBR) GPUs, the predominant architecture in mobile GPUs, because the workload of each tile is limited. This work proposes parallel tile rendering to efficiently increase the computational capabilities of TBR GPUs. Rendering tiles in parallel provides enough work to utilize the additional compute units, but it causes memory-intensive applications to underperform due to the increased memory pressure. To address this, we introduce LIBRA, a parallel tile rendering architecture that includes a novel locality-aware approach for scheduling tiles to Raster Units so that memory requests are evenly distributed throughout the rendering of each frame. This alleviates memory congestion and thereby reduces memory access time. LIBRA leverages frame-to-frame coherence to predict the memory pressure of each tile of a frame without penalizing the hit ratio of the cache memories. Evaluations over a wide range of commercial gaming applications show that LIBRA reduces the average memory latency by 13.5% and achieves an average speedup of 20.9%. It also provides an 11.4% improvement in throughput (frames per second) and a 9.2% reduction in total GPU energy, while adding negligible overhead.