Abstract
Two fault-tolerant architectural concepts have been developed for a computer node intended for use in a distributed embedded environment. Both are designed to meet the requirement that the system sustain at least two independent, nonsimultaneous hardware failures and remain operational. The two architectures differ in how their fault-tolerant algorithm hardware is organized. Both architectures are analyzed, and several issues in the reliability analysis of such complex architectures are addressed. Techniques are developed to reduce the complexity of the reliability model. The interrelationship between the number of retries and their effect on system reliability is also analyzed for different average transient lifetimes.
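The abstract does not state the retry model used. As an illustration only, the following sketch assumes that transient fault durations are exponentially distributed with a given mean lifetime and that retries are issued at a fixed interval. Under those assumptions, a transient is cleared by the last of `k` retries with probability 1 − exp(−kΔ/τ), where Δ is the retry interval and τ the mean transient lifetime. The function names and parameters are hypothetical and are not taken from the paper.

```python
import math


def p_transient_cleared(retries: int, retry_interval: float, mean_lifetime: float) -> float:
    """Probability that a transient fault has expired by the final retry.

    Hypothetical model: transient durations are exponentially distributed
    with mean `mean_lifetime`, and retries occur every `retry_interval`, so
    the last retry happens at time retries * retry_interval. Because a
    transient that has disappeared stays gone, at least one retry succeeds
    exactly when the transient is over by the last retry.
    """
    if retries < 0 or retry_interval < 0 or mean_lifetime <= 0:
        raise ValueError("invalid parameters")
    elapsed = retries * retry_interval
    return 1.0 - math.exp(-elapsed / mean_lifetime)


def retries_needed(target: float, retry_interval: float, mean_lifetime: float) -> int:
    """Smallest number of retries whose clearing probability reaches `target`
    under the same hypothetical exponential model."""
    if not 0.0 <= target < 1.0 or retry_interval <= 0 or mean_lifetime <= 0:
        raise ValueError("invalid parameters")
    return math.ceil(-mean_lifetime * math.log(1.0 - target) / retry_interval)


if __name__ == "__main__":
    # Longer average transient lifetimes require more retries to reach the
    # same probability of riding out the transient without reconfiguration.
    for tau in (0.5, 2.0, 10.0):
        k = retries_needed(0.95, retry_interval=1.0, mean_lifetime=tau)
        print(f"mean lifetime {tau}: {k} retries")
```

In a real reliability model, each additional retry also delays the isolation of a permanent fault, so the retry count is a trade-off rather than a value to maximize.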