Distributed Shared Abstractions (DSA) on Multiprocessors

Christian Clémençon; Bodhisattwa Mukherjee; Karsten Schwan

doi:10.1109/32.485223

Abstract

Any parallel program has abstractions that are shared by the program's multiple processes, including data structures containing shared data, code implementing operations like global sums or minima, and type instances used for process synchronization or communication. Such shared abstractions can considerably affect the performance of parallel programs on both distributed and shared memory multiprocessors. As a result, their implementation must be efficient, and such efficiency should be achieved without unduly compromising program portability and maintainability. Unfortunately, efficiency and portability can be at cross-purposes, since high performance typically requires changes in the representation of shared abstractions across different parallel machines.

The primary contribution of the DSA library presented and evaluated in this paper is its representation of shared abstractions as objects that may be internally distributed across different nodes of a parallel machine. Such distributed shared abstractions (DSA) are encapsulated so that their implementations are easily changed while maintaining program portability across parallel architectures ranging from small-scale multiprocessors, to medium-scale shared and distributed memory machines, and potentially to networks of computer workstations. The principal results presented in this paper are 1) a demonstration that the fragmentation of object state across different nodes of a multiprocessor machine can significantly improve program performance, and 2) a demonstration that such object fragmentation can be achieved without changing object interfaces, and therefore without compromising portability. These results are demonstrated using implementations of the DSA library on several medium-scale multiprocessors, including the BBN Butterfly, Kendall Square Research, and SGI shared memory multiprocessors. The DSA library's evaluation uses synthetic workloads and a parallel implementation of a branch-and-bound algorithm for solving the Traveling Salesperson Problem (TSP).
