2022 25th Euromicro Conference on Digital System Design (DSD)

Abstract

The main computation in any CNN is the convolution operation. This computation shows significant potential for massively parallel implementation on an FPGA. Systolic arrays, with their intrinsic pipelining, have been explored for CNN inference. In this paper, we present a systolic array architecture designed for a novel method of performing convolution. We implement an image-kernel convolution and test it with representative image inputs to several models: LeNet-5, AlexNet, VGG-16, and ResNet-34. We compare the proposed design with conventional convolution and with HLS-based designs. We limit our implementation to a resource-constrained FPGA, the AMD-Xilinx Zynq 7020 platform. We observe that the proposed architecture outperforms the direct convolution method and HLS pipelined designs by 2× and 2.1×, respectively, on average. Since DSP blocks are scarce resources, we constrain our implementation to avoid DSP blocks and use LUTs instead. As a result, our implementation uses nearly 9× more LUTs than the baseline convolution but 8× fewer LUTs than the HLS pipelined implementation. We further accelerate the convolution throughput by 11× by implementing a tiled systolic architecture that fully utilises the parallel computing resources of the FPGA.
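For orientation, the conventional (direct) convolution used as a baseline can be sketched in software as a sliding-window multiply-accumulate loop. This is only an illustrative sketch, not the proposed systolic method or the authors' hardware design: the function names `direct_conv2d` and `mac_count`, the stride of 1, the absence of padding, and the unflipped kernel (the usual CNN convention) are all assumptions made here.

```python
def direct_conv2d(image, kernel):
    """Direct sliding-window convolution of a 2D image with a 2D kernel.

    Assumptions (not taken from the paper): stride 1, no padding,
    kernel applied without flipping, inputs given as lists of lists.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for r in range(oh):
        for c in range(ow):
            acc = 0
            # One multiply-accumulate (MAC) per kernel element.
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            out[r][c] = acc
    return out


def mac_count(ih, iw, kh, kw):
    """Number of MACs the direct method performs for one image and one kernel."""
    return (ih - kh + 1) * (iw - kw + 1) * kh * kw


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kernel = [[1, 0], [0, 1]]
    print(direct_conv2d(image, kernel))
    print(mac_count(32, 32, 5, 5))
```

Every output pixel's MACs are independent of every other pixel's, which is the parallelism that a systolic or tiled hardware implementation can exploit.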