2010 IEEE International Conference On Cluster Computing Workshops and Posters (CLUSTER WORKSHOPS)

Abstract

This paper describes a new parallel execution model motivated by 1) the idea that computation should move to, and execute near, the global data which it accesses, 2) a set of extended memory semantics to provide fine-grained global synchronization, 3) matching shared-memory architecture research, and 4) the need for high-performance languages to provide protected system transparency. We compare this new model to MPI, Chapel, X10, and UPC in terms of 1) expressibility of parallel structures, 2) shared-memory synchronization, and 3) performance tuning. Initial simulation results of a graph traversal kernel on a research architecture show good speedup up to 256 multicore nodes supporting over 1 million simultaneous threadlets.
