Synchronization Trade-Offs in GPU Implementations of Graph Algorithms

Rashid Kaleem; Anand Venkat; Sreepathi Pai; Mary Hall; Keshav Pingali

doi:10.1109/IPDPS.2016.106

2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Year: 2016, Pages: 514-523

Abstract

Although there is an extensive literature on GPU implementations of graph algorithms, we do not yet have a clear understanding of how implementation choices impact performance. As a step towards this goal, we studied how the choice of synchronization mechanism affects the end-to-end performance of complex graph algorithms, using stochastic gradient descent (SGD) as an exemplar. We implemented seven synchronization strategies for this application and evaluated them on two GPU platforms, using both road networks and social network graphs as inputs. Our experiments showed that although none of the seven strategies dominates the rest, it is possible to use properties of the platform and input graph to predict the best strategy.
