Proceedings International Conference on Parallel Processing
Download PDF

Abstract

In this paper, we formulate the array robustness theorems (ARTs) for efficient computation and communication on faulty arrays. No hardware redundancy is required and no assumption is made about the availability of a complete submesh or subtorus. Based on ARTs, a very wide variety of problems, including sorting, FFT, total exchange, permutation, and some matrix operations, can be solved with a slowdown factor of 1 +o(1). The number of faults tolerated by ARTs ranges from o(\min(n^{1-\frac{n}{d},\frac{n}{h})) for n-ary d-cubes with worst-case faults to as large as o(N) for most N-node 2-D meshes or tori with random faults, where h is the number of data otems per processor. The resultant running times are the best results reported thus far for solving many problems on faulty arrays. Based on ARTs and several other components such as robust libraries, the priority emulation discipline, and X1Y1 routing, we introduce the robust adaptation interface layer (RAIL) as a middleware between ordinary algorithms/programs (that are originally developed for fault-free arrays) and the faulty network/hardware. In effect, RAIL provides a virtual fault-free network to higher layers, while ordinary algorithms/programs are transformed through RAIL into corresponding robust algorithms/programs that can run on faulty networks.
Like what you’re reading?
Already a member?
Get this article FREE with a new membership!

Related Articles