Parallel and Distributed Processing Symposium, International

Abstract

In this paper, we first develop a novel architecture for fixed-point LU decomposition of streaming input matrices on FPGAs. Our architecture, based on a circular linear array, achieves minimal latency and is resource-efficient. We then extend it, using a stacked-matrices approach, to a floating-point architecture that achieves minimal effective latency. Our design objective was to develop high-throughput, energy-efficient architectures for applications that require LU decomposition. We analyze (1) the impact of high-throughput, pipelined floating-point units (with different pipeline depths and different performance) on the architecture's performance, and (2) the impact of algorithm-level design on system-wide energy dissipation. We analyze energy dissipation by capturing both algorithmic and architectural details of the target FPGA device. We compare our architecture with a state-of-the-art FPGA implementation with respect to latency, area, and energy. Our designs achieve a 10%-60% reduction in energy compared with the state-of-the-art architecture.
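For reference, here is a minimal software sketch of the computation being accelerated. It is LU decomposition in Doolittle form without pivoting, and it assumes every pivot is nonzero. It is not the circular-linear-array architecture or the stacked-matrices floating-point design described above. It only shows the arithmetic such hardware must perform. The names `lu_decompose` and `matmul` are hypothetical helpers introduced for illustration. A model like this could serve as a golden reference when checking the factors produced by a hardware implementation.

```python
def lu_decompose(a):
    """Doolittle LU decomposition without pivoting (illustrative sketch).

    Returns (l, u) with l unit lower-triangular and u upper-triangular
    such that l * u == a. Assumes all pivots are nonzero; no row
    exchanges are performed.
    """
    n = len(a)
    u = [row[:] for row in a]  # working copy that is reduced to U in place
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        pivot = u[k][k]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        for i in range(k + 1, n):
            factor = u[i][k] / pivot
            l[i][k] = factor          # multiplier stored in L
            u[i][k] = 0.0             # eliminated entry, set exactly to zero
            for j in range(k + 1, n):
                u[i][j] -= factor * u[k][j]  # row update: row_i -= factor * row_k
    return l, u


def matmul(x, y):
    """Plain triple-loop matrix product, used only to check l * u == a."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


if __name__ == "__main__":
    a = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
    l, u = lu_decompose(a)
    print(l)  # unit lower-triangular factor
    print(u)  # upper-triangular factor
```

In hardware, the inner row-update loop is the part that is parallelized and pipelined. In this software model, each elimination step `k` depends on the results of step `k - 1`.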