Abstract
This paper presents our experience with exploiting fine-grained pipeline parallelism for wavefront computations on a multicore platform. Wavefront computations are widely used in areas such as scientific computing and dynamic programming. To exploit fine-grained parallelism on multicore platforms, programmers must address synchronization, scheduling strategy, and data locality. This paper shows the impact of fine-grained synchronization methods, scheduling strategies, and data tile sizes on performance. We propose a low-cost, lock-free, lightweight synchronization method that can fully exploit pipeline parallelism. Our evaluation shows that RNAfold, an application for RNA secondary structure prediction, achieves a speedup of up to 3.88 on four cores under our framework.