Abstract
Multicast is an important collective operation for parallel programs. Some Network Interface Cards (NICs), such as those used in Myrinet, have programmable processors that can be programmed to support multicast. This paper proposes a high-performance and reliable NIC-based multicast scheme, in which a NIC-based multi-send mechanism is used to send multiple replicas of a message to different destinations, and a NIC-based forwarding mechanism is used to forward received packets without intermediate host involvement. We have explored different design alternatives and implemented the proposed scheme with the best set of alternatives over Myrinet/GM-2. MPICH-GM has also been modified to take advantage of this scheme. At the GM level, NIC-based multicast improves multicast latency over 16 nodes by a factor of up to 1.48 for messages of at most 512 bytes, and by a factor of up to 1.86 for 16 KB messages, compared to traditional host-based multicast. Similar improvements are also achieved at the MPI level. In addition, we demonstrate that NIC-based multicast is tolerant of process skew and has significant benefits for large systems.