2023 3rd International Conference on Electronic Information Engineering and Computer Science (EIECS)

Abstract

Implementing target detection algorithms such as YOLO on an FPGA while meeting the strict low-latency requirement of real-time detection calls for optimizations that range from model quantization to hardware design. First, layer fusion and a bit-width quantization strategy are used to reduce computational complexity. Then, a column-based fine-grained pipeline architecture with a padding-skip technique is used to reduce the start-up time. Next, a double symbol multiplication correction circuit is introduced to shorten the CNN computation time. Finally, a design space exploration algorithm is used to solve the resource allocation problem in the FPGA-based convolutional neural network hardware accelerator and to improve DSP efficiency. To verify the accelerator architecture, we implement the YOLOv2-tiny network on a Xilinx ZCU104. Compared with previous accelerators, latency is reduced by a factor of 1.88 to 2.07, and DSP efficiency reaches 90.9%.
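
The abstract does not spell out how layer fusion is performed. One common form, assumed here purely for illustration, is folding a batch-normalization layer into the preceding convolution so that inference needs only a single multiply-accumulate pass. The sketch below shows that folding in plain Python; the function names (`fold_batchnorm`, `conv_point`, `batchnorm_point`) and the flat per-channel weight layout are hypothetical and are not taken from the paper.

```python
import math


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel batch-norm parameters into conv weights and bias.

    weights: one flat list of weights per output channel (assumed layout)
    bias, gamma, beta, mean, var: one value per output channel
    """
    fused_w, fused_b = [], []
    for w_c, b_c, g, bt, mu, v in zip(weights, bias, gamma, beta, mean, var):
        # Batch norm is an affine map per channel, so it collapses into a scale
        # on the weights and an adjusted bias.
        scale = g / math.sqrt(v + eps)
        fused_w.append([w * scale for w in w_c])
        fused_b.append((b_c - mu) * scale + bt)
    return fused_w, fused_b


def conv_point(weights_c, bias_c, patch):
    """One convolution output: dot product of a flattened input patch plus bias."""
    return sum(w * x for w, x in zip(weights_c, patch)) + bias_c


def batchnorm_point(y, g, bt, mu, v, eps=1e-5):
    """Inference-time batch norm applied to a single value."""
    return g * (y - mu) / math.sqrt(v + eps) + bt
```

With the folded parameters, `conv_point(fused_w[c], fused_b[c], patch)` gives the same value as applying `batchnorm_point` to the output of the original convolution, up to floating-point rounding. The batch-norm stage can therefore be dropped from the datapath before quantization.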