Abstract
We address the problem of optimizing the use of global shared memory in deeply heterogeneous accelerators, in the context of HPC systems that run multiple applications with different quality-of-service levels. We explore predictive memory allocation algorithms, which allow up to 28% more high-priority requests to be served when a moving-average-based prediction is used in a low-workload scenario.