Abstract
Although there is an extensive literature on GPU implementations of graph algorithms, we do not yet have a clear understanding of how implementation choices impact performance. As a step towards this goal, we studied how the choice of synchronization mechanism affects the end-to-end performance of complex graph algorithms, using stochastic gradient descent (SGD) as an exemplar. We implemented seven synchronization strategies for this application and evaluated them on two GPU platforms, using both road networks and social network graphs as inputs. Our experiments showed that although none of the seven strategies dominates the rest, it is possible to use properties of the platform and input graph to predict the best strategy.