Proceedings of 3rd International Conference on High Performance Computing (HiPC)

Abstract

In this paper, we investigate techniques for incorporating fault tolerance into superscalar processors, the de facto execution model for building processors today. We first analyze the different ways in which errors can manifest when faults occur in various parts of a superscalar processor. We then describe different ways of detecting and recovering from these errors, along with the merits and demerits of each scheme. Finally, we present the results of a simulation study conducted to determine the performance loss incurred by introducing these fault-tolerance schemes. These results suggest that fault tolerance can be incorporated into superscalar processors with low hardware overhead, low performance overhead, and good error coverage.