Abstract
Current semiconductor technologies have become susceptible to high-energy neutrons from space. Following the trends in smaller transistors, lower supply voltage, and higher clock frequency, current microprocessors are susceptible to soft errors, which constitute the vast majority of hardware failures. Based on these trends, it is expected that the quality with respect to reliability becomes important as well as performance for microprocessors. In light of this, a lot of fault-tolerance microarchitectures are recently proposed (see Section 2 in [1]). These studies mainly focus on detecting transient faults, and hence almost every previous study evaluated processor performance in the absence of faults. This analysis only presents the performance impact of constraints introduced by fault detection mechanism. While several papers propose fault recovery mechanisms, only a few evaluated their impact on performance [2, 3]. One of the reasons why this evaluation methodology is widely selected is that faults are expected to be rare enough that the overall performance will be determined by fault-free behavior. However, evaluating recovery cost of fault tolerant execution is also important, because it is predicted that transient hardware faults occur more frequently as semiconductor technology is improved. Therefore, this paper focuses on recovery from faults.