Abstract
The replacement policies commonly used in modern processors perform, on average, 57% worse than an optimal replacement policy for commercial applications running on the large, shared caches of a chip multiprocessor (CMP). Recent proposals that improve the performance of smaller, uniprocessor caches on SPEC CPU workloads do not achieve similar benefits for commercial workloads and larger caches, even though these caches still perform worse than optimal. The recently proposed Shepherd Cache replacement policy reduces miss ratios by 7.3% on average, but it relies on an impractical LRU policy and requires overhead equal to 5.3% of the total cache capacity. We propose two new, practical, low-overhead replacement policies that mimic the Shepherd Cache with significantly less meta-data overhead. First, we propose a lightweight shepherd cache design that reduces miss ratios by 8% on average and by up to 19%, while requiring only 1.9% meta-data overhead. Second, we propose an extra-lightweight shepherd cache design that, when combined with a practical clock replacement policy, reduces overhead to only 0.5% while reducing miss ratios by 5.4% on average and by up to 14%.
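
For readers unfamiliar with the clock replacement policy mentioned above, the following is a minimal sketch of the textbook clock (second-chance) algorithm applied to a single cache set. It is an illustration only and not the shepherd cache designs proposed here. The class name `ClockSet`, the method `access`, and the choice to set the reference bit when a line is filled are assumptions made for this example.

```python
class ClockSet:
    """One cache set managed by a generic clock (second-chance) policy.

    Hypothetical illustration; not the design evaluated in this paper.
    """

    def __init__(self, ways):
        self.tags = [None] * ways   # tag held by each way (None = empty)
        self.ref = [False] * ways   # one reference bit per way
        self.hand = 0               # next way the clock hand examines

    def access(self, tag):
        """Return True on a hit, False on a miss (the line is then filled)."""
        if tag in self.tags:
            # Hit: mark the line as recently referenced.
            self.ref[self.tags.index(tag)] = True
            return True
        # Miss: sweep the hand forward, clearing reference bits, until a
        # way with a cleared bit is found. That way becomes the victim.
        while self.ref[self.hand]:
            self.ref[self.hand] = False
            self.hand = (self.hand + 1) % len(self.tags)
        self.tags[self.hand] = tag
        self.ref[self.hand] = True  # assumption: fills start as referenced
        self.hand = (self.hand + 1) % len(self.tags)
        return False


if __name__ == "__main__":
    s = ClockSet(ways=2)
    print([s.access(t) for t in ["A", "B", "A", "C", "B"]])
```

The sketch needs only one reference bit per line plus a per-set hand pointer, whereas true LRU must maintain a full recency ordering of the ways in each set. This is the sense in which clock is usually considered the more practical baseline.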