Abstract—Any parallel program has abstractions that are shared by the program's multiple processes, including data structures containing shared data, code implementing operations such as global sums or minima, and type instances used for process synchronization or communication. Such shared abstractions can considerably affect the performance of parallel programs on both distributed and shared memory multiprocessors. As a result, their implementation must be efficient, and such efficiency should be achieved without unduly compromising program portability and maintainability. Unfortunately, efficiency and portability can be at cross-purposes, since high performance typically requires changes in the representation of shared abstractions across different parallel machines.
The primary contribution of the DSA library presented and evaluated in this paper is its representation of shared abstractions as objects that may be internally distributed across different nodes of a parallel machine. Such distributed shared abstractions (DSA) are encapsulated so that their implementations are easily changed while maintaining program portability across parallel architectures ranging from small-scale multiprocessors, to medium-scale shared and distributed memory machines, and potentially to networks of computer workstations. The principal results presented in this paper are 1) a demonstration that fragmenting object state across different nodes of a multiprocessor can significantly improve program performance, and 2) a demonstration that such fragmentation can be achieved without changing object interfaces, and therefore without compromising portability. These results are demonstrated using implementations of the DSA library on several medium-scale multiprocessors, including the BBN Butterfly, Kendall Square Research, and SGI shared memory multiprocessors.
The DSA library's evaluation uses synthetic workloads and a parallel implementation of a branch-and-bound algorithm for solving the Traveling Salesperson Problem (TSP).
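As an illustration of the fragmentation idea, the following minimal sketch shows a shared minimum (such as the best tour cost in a branch-and-bound search) implemented twice behind the same interface: once with centralized state and once with state split into per-node fragments. It is not the DSA library's API. The class names, the `update` and `read` methods, and the mapping of threads to fragments are all assumptions made for this example, and Python threads stand in for the processes of a parallel machine.

```python
import threading


class CentralizedMin:
    """Hypothetical shared minimum whose state lives in one place."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = float("inf")

    def update(self, candidate):
        # Every caller contends for the same lock and the same location.
        with self._lock:
            if candidate < self._value:
                self._value = candidate

    def read(self):
        with self._lock:
            return self._value


class FragmentedMin:
    """Same interface, but the state is split into one fragment per node."""

    def __init__(self, num_nodes):
        self._fragments = [float("inf")] * num_nodes
        self._locks = [threading.Lock() for _ in range(num_nodes)]

    def _node(self):
        # Assumption: the calling thread's id selects its "local" fragment.
        return threading.get_ident() % len(self._fragments)

    def update(self, candidate):
        # An update touches only the caller's own fragment.
        i = self._node()
        with self._locks[i]:
            if candidate < self._fragments[i]:
                self._fragments[i] = candidate

    def read(self):
        # A read combines all fragments into the global minimum.
        best = float("inf")
        for i, lock in enumerate(self._locks):
            with lock:
                best = min(best, self._fragments[i])
        return best


def run_workers(shared_min, costs_per_worker):
    """Run one thread per cost list; the caller code is representation-agnostic."""

    def worker(costs):
        for cost in costs:
            shared_min.update(cost)

    threads = [threading.Thread(target=worker, args=(costs,))
               for costs in costs_per_worker]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared_min.read()
```

Because `run_workers` depends only on `update` and `read`, either representation can be substituted without touching the calling code, which is the portability property the abstract describes. Whether fragmentation pays off on a given machine depends on the relative cost of local and remote accesses, which this sketch does not model.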