Parallel Architectures, Algorithms and Programming, International Symposium on

Abstract

The landscape of distributed computing is rapidly evolving, with computers exhibiting ever-greater processing capability through many-core architectures. Almost every field of science is now data-driven and requires the analysis of massive datasets. Analytics algorithms such as machine learning can be used to discover properties of a given dataset and make predictions based on it. However, there is still a lack of simple and unified programming frameworks for these data-intensive applications, and many existing efforts rely on specialized means to speed up individual algorithms. In this thesis research, a distributed programming model, MapCollective, is defined so that it can be easily applied to many machine learning algorithms. Specifically, algorithms that fit the iterative computation model can be easily parallelized with a unique collective communication layer for efficient synchronization. In contrast to traditional parallelization strategies that focus on handling high-volume input data, a lesser-known challenge is that the model data shared among parallel workers is equally high in volume and dimensionality, and must be communicated continually throughout execution. This extends the understanding of data aspects in computation from in-memory caching of input data (e.g., the iterative MapReduce model) to fine-grained synchronization on model data (e.g., the MapCollective model). A library called Harp is developed as a Hadoop plugin to demonstrate that sophisticated machine learning algorithms can be simply abstracted with the MapCollective model and conveniently developed on top of the MapReduce framework. K-means and Multi-Dimensional Scaling (MDS) are tested over 4096 threads on the IU Big Red II supercomputer. The results show linear speedup with an increasing number of parallel units.
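The iterative pattern the abstract describes — a map phase that computes partial results on each worker's data partition, followed by a collective step that synchronizes the shared model every iteration — can be illustrated with a minimal K-means sketch. This is a single-process illustration of the idea, not Harp's actual API; the names (`local_stats`, `allreduce`, `kmeans_mapcollective`) are hypothetical, and the allreduce is simulated in-process rather than over a network.

```python
def assign(point, centroids):
    # Index of the nearest centroid by squared Euclidean distance.
    return min(range(len(centroids)),
               key=lambda k: sum((p - c) ** 2
                                 for p, c in zip(point, centroids[k])))

def local_stats(partition, centroids):
    # Map phase: one worker's partial sums and counts per centroid.
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for point in partition:
        c = assign(point, centroids)
        counts[c] += 1
        for j, x in enumerate(point):
            sums[c][j] += x
    return sums, counts

def allreduce(all_stats):
    # Collective phase: merge every worker's partial model data so each
    # worker ends up with identical global sums/counts. In a real
    # MapCollective runtime this would be a network collective.
    k = len(all_stats[0][0])
    d = len(all_stats[0][0][0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for s, c in all_stats:
        for i in range(k):
            counts[i] += c[i]
            for j in range(d):
                sums[i][j] += s[i][j]
    return sums, counts

def kmeans_mapcollective(partitions, centroids, iterations=10):
    # Each iteration: independent map over partitions, then one collective
    # synchronization of the shared model (the centroids).
    for _ in range(iterations):
        stats = [local_stats(p, centroids) for p in partitions]
        sums, counts = allreduce(stats)
        new_centroids = []
        for i in range(len(centroids)):
            if counts[i]:
                new_centroids.append([s / counts[i] for s in sums[i]])
            else:
                new_centroids.append(centroids[i])  # keep empty clusters
        centroids = new_centroids
    return centroids

# Example: two workers, 1-D points clustered near 0.5 and 9.5.
parts = [[[0.0], [1.0]], [[9.0], [10.0]]]
final = kmeans_mapcollective(parts, [[0.5], [9.5]], iterations=5)
```

The key contrast with plain MapReduce is that the model (centroid) data is synchronized between workers on every iteration via the collective, rather than being written out and re-read through the filesystem between jobs.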
