Abstract
Topology changes, such as switches being turned on or off, hot expansion, hot replacement, or link re-mapping, are very likely to occur in networks of workstations (NOWs) and clusters. Moreover, topology changes are much more frequent than faults. However, their impact on real-time communications has not been considered a major problem up to now, mostly because such changes are not feasible in traditional environments, such as massively parallel processors (MPPs), which have fixed topologies. Topology changes are supported and handled by some current and future interconnects, such as Myrinet or InfiniBand. Unfortunately, these interconnects do not include support for real-time communications in the presence of topology changes. In this paper, we propose and evaluate a new protocol that provides real-time communication services on NOWs and clusters that tolerate both topology changes and faults. This protocol overcomes the main drawback of our previously proposed protocol, called Dynamically Re-established Real-Time Channels (DRRTC), in which the number of real-time channels is physically limited by the number of virtual channels per port. The new protocol allows different real-time channels to share the same virtual channel. In this way, it can establish a greater number of real-time channels than DRRTC, and its only limitation is the bandwidth devoted to real-time traffic. However, sharing virtual channels introduces two new problems, both of which the new protocol manages successfully: cyclic dependencies among different real-time channels and the increased complexity of deadline requirements. We present and analyze performance evaluation results for the deactivation or activation of a single switch or a single link, under different topologies and workloads. The new protocol outperforms DRRTC while guaranteeing deadline requirements and channel recovery.
Keywords - NOWs, clusters, real-time services, topology change tolerance, dynamic reconfiguration