2024 IEEE 31st International Conference on High Performance Computing, Data and Analytics Workshop (HiPCW)

Abstract

A crucial component of a cluster-based HPC system is its high-performance interconnect. Popular HPC interconnects include InfiniBand (IB), Gigabit Ethernet, EFA, Slingshot and Omni-Path. Beyond raw network performance, efficient network management is essential for a large HPC cluster. In an IB subnet, this management role is performed by OpenSM.

Trinetra-A is a proprietary, high-performance, 100 Gbps, switchless torus interconnect developed by the Centre for Development of Advanced Computing (C-DAC). Instead of developing a new network manager for Trinetra-A, OpenSM has been adapted to manage it. OpenSM typically runs on a switched network and discovers the entire network through its switches. Because Trinetra-A is switchless, OpenSM cannot discover the complete network on its own.

This work presents a method to port OpenSM onto a switchless torus fabric by emulating an IB switch in software, without modifying the network hardware or OpenSM. The emulation intercepts subnet management query packets in the Trinetra-A IB interface driver and answers them as an IB switch would. Thus, although the physical topology is a switchless torus, it appears to OpenSM as a switched topology. The mechanism has been validated on 24 server nodes connected in a 3D torus using the Trinetra-A interconnect. Observations show that OpenSM discovers the entire subnet, including the virtual IB switch, and then configures the subnet successfully with the required network management parameters. Hence, the proposed method can enable OpenSM on any network that supports the IB software interface but has a topology that OpenSM does not support.
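The interception idea can be illustrated with a minimal Python sketch. It is not the Trinetra-A driver code, which the abstract does not show. The management class, method, attribute and node type values follow InfiniBand MAD conventions; the `Smp` record, the `intercept` function, the `targets_virtual_switch` flag and the `num_ports` value are hypothetical names and parameters introduced only for illustration. How the real driver decides which queries belong to the virtual switch is not stated in the abstract and is simply assumed here as an input.

```python
from dataclasses import dataclass, field
from typing import Optional

# Constants following InfiniBand MAD conventions.
SUBN_LID_ROUTED = 0x01      # subnet management class, LID routed
SUBN_DIRECTED_ROUTE = 0x81  # subnet management class, directed route
METHOD_GET = 0x01
METHOD_GET_RESP = 0x81
ATTR_NODE_INFO = 0x0011
NODE_TYPE_SWITCH = 2        # NodeInfo.NodeType value for a switch


@dataclass
class Smp:
    """Hypothetical, heavily simplified subnet management packet."""
    mgmt_class: int
    method: int
    attr_id: int
    payload: dict = field(default_factory=dict)


def is_subnet_management(packet: Smp) -> bool:
    """True if the packet belongs to one of the two subnet management classes."""
    return packet.mgmt_class in (SUBN_LID_ROUTED, SUBN_DIRECTED_ROUTE)


def intercept(packet: Smp, targets_virtual_switch: bool,
              num_ports: int = 4) -> Optional[Smp]:
    """Return an emulated switch reply, or None to let the packet pass through.

    targets_virtual_switch and num_ports are assumptions of this sketch,
    not details taken from the paper.
    """
    if not is_subnet_management(packet) or not targets_virtual_switch:
        return None
    if packet.method == METHOD_GET and packet.attr_id == ATTR_NODE_INFO:
        # Answer as if the reply came from an IB switch.
        return Smp(packet.mgmt_class, METHOD_GET_RESP, ATTR_NODE_INFO,
                   {"node_type": NODE_TYPE_SWITCH, "num_ports": num_ports})
    return None  # other attributes are outside this sketch


query = Smp(SUBN_DIRECTED_ROUTE, METHOD_GET, ATTR_NODE_INFO)
reply = intercept(query, targets_virtual_switch=True)
```

In this sketch a `NodeInfo` query aimed at the virtual switch yields a `GetResp` reporting a switch node type, while all other traffic returns `None` and would continue along the normal path. A real emulation would also have to answer further attributes that a subnet manager queries on a switch, such as `SwitchInfo` and `PortInfo`.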