Abstract
Even with today's large caches, the increasing performance gap between processors and memory systems imposes a memory bottleneck for many important scientific and commercial applications. This bottleneck is intensified in shared-memory multiprocessors by contention and the effects of cache coherency. Under heavy memory contention, memory latency may increase by a factor of two or three. Nonetheless, as more sophisticated techniques are used to hide latency and increase bandwidth, measuring memory performance has become increasingly difficult. Previous simple methods of measuring memory performance can overestimate uniprocessor memory latency and underestimate bandwidth by tens of percent. We introduce a microbenchmark suite that measures memory hierarchy performance in light of both uniprocessor optimizations and the contention and coherence effects of multiprocessors. The benchmark suite has been used to improve the memory system performance of the SGI Origin multiprocessor.