Cluster Computing and the Grid, IEEE International Symposium on

Abstract

Sequence alignment tools based on the Burrows-Wheeler Transform (BWT) are widely used in next-generation sequencing (NGS) analysis. However, despite extensive optimization efforts, the performance of these tools still cannot keep up with the explosive growth of sequencing data. Through an in-depth performance analysis of BWA, a popular BWT-based aligner, on multicore architectures, we demonstrate that such tools are limited by memory bandwidth due to their irregular memory access patterns. We then propose a locality-aware implementation of BWA that aims to improve its performance by better exploiting the caching mechanisms of modern multicore processors. Experimental results show that our improved BWA implementation reduces last-level cache (LLC) misses by 30% and translation lookaside buffer (TLB) misses by 20%, resulting in a speedup of up to 2.6-fold over the original BWA implementation.
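To illustrate why BWT-based search tends to produce irregular memory accesses, the following is a minimal sketch of FM-index backward search in Python. It is not the paper's locality-aware implementation and not BWA's actual code; the function names (`build_fm_index`, `backward_search`), the naive suffix sort, and the dense occurrence table are all assumptions chosen for readability. The point to notice is that each step computes the next interval bounds from the occurrence table at positions that depend on the previous step, so consecutive lookups can land far apart in memory.

```python
def build_fm_index(text):
    """Build a naive FM-index for `text` (illustrative only; quadratic space)."""
    text += "$"  # sentinel, assumed smaller than every other character
    # Naive suffix array: sort suffix start positions lexicographically.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to '$' for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c]: number of characters in the text strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return C, occ, len(bwt)


def backward_search(pattern, C, occ, n):
    """Count exact occurrences of `pattern`; also return the intervals visited."""
    lo, hi = 0, n  # half-open interval [lo, hi) over the suffix array
    visited = []
    for ch in reversed(pattern):
        if ch not in C:
            return 0, visited
        visited.append((lo, hi))
        # Data-dependent lookups: the indices lo and hi jump around the table.
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, visited
    return hi - lo, visited


C, occ, n = build_fm_index("GATTACAGATTACA")
count, visited = backward_search("ATTA", C, occ, n)
print(count, visited)
```

In a real aligner the occurrence table is sampled and compressed rather than stored densely, but the data-dependent jumps between lookups remain, which is the access pattern the abstract identifies as the source of cache and TLB misses.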