Efficient Hardware Optimization for CNN

Seda Güzel Aydın; Hasan Şakir Bilge

Research Article

Year 2022, Volume: 6 Issue: 1, 38 - 44, 20.07.2022

https://izlik.org/JA85DY35WP

Abstract

Convolutional Neural Network (CNN) architectures are increasingly used in image processing applications such as object detection and remote sensing. Some of these applications require CNN methods to run in real time. Embedded devices such as Field Programmable Gate Arrays (FPGAs) are a favorable platform for implementing CNN-based algorithms. However, FPGAs have drawbacks such as limited resources and bottlenecks, so it is difficult to map an entire CNN with a large number of layers onto an FPGA without any optimization. Hardware optimization techniques are therefore necessary. In this study, an FPGA-based CNN architecture designed with high-level synthesis (HLS) is demonstrated, and a synthesis report is generated for the Xilinx Zynq-7000 xc7z020-clg484-1 target FPGA. Implementing the CNN architecture on an FPGA platform accelerates it. To improve throughput, the proposed design is optimized for the convolutional layers. The main contribution of this study is to optimize the convolution layer by unrolling kernels and input feature maps and to examine the effects on throughput, latency, and hardware resources. The throughput is 15.6 GOP/s for the first convolution layer. Compared to the baseline design, the proposed method achieves an acceleration of approximately 2.6× in terms of latency and throughput.
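The unrolling idea described in the abstract can be sketched in HLS-style C++. This is a minimal illustration under stated assumptions, not the authors' implementation: the layer dimensions, the function names `conv_layer`, `conv_ops`, and `throughput_gops`, and the loop ordering are all hypothetical, and the count of 2 operations per multiply-accumulate is a common convention that the abstract does not confirm. The `#pragma HLS UNROLL` directive is a Vivado HLS directive; a regular C++ compiler ignores it, so the same code can be checked in software.

```cpp
#include <cassert>

// Hypothetical layer dimensions for illustration only (not taken from the paper).
constexpr int IN_CH  = 3;              // input feature maps
constexpr int OUT_CH = 4;              // kernels (output feature maps)
constexpr int IN_H   = 8, IN_W = 8;    // input height and width
constexpr int K      = 3;              // kernel size (K x K)
constexpr int OUT_H  = IN_H - K + 1;   // stride 1, no padding
constexpr int OUT_W  = IN_W - K + 1;

// Convolution with the kernel and input-feature-map loops placed innermost
// and marked for full unrolling. A regular C++ compiler ignores the HLS pragmas.
void conv_layer(float in[IN_CH][IN_H][IN_W],
                float w[OUT_CH][IN_CH][K][K],
                float out[OUT_CH][OUT_H][OUT_W]) {
    for (int r = 0; r < OUT_H; ++r) {
        for (int c = 0; c < OUT_W; ++c) {
            float acc[OUT_CH] = {0.0f};  // one accumulator per kernel
            for (int ky = 0; ky < K; ++ky) {
                for (int kx = 0; kx < K; ++kx) {
                    for (int oc = 0; oc < OUT_CH; ++oc) {
#pragma HLS UNROLL
                        for (int ic = 0; ic < IN_CH; ++ic) {
#pragma HLS UNROLL
                            acc[oc] += w[oc][ic][ky][kx] * in[ic][r + ky][c + kx];
                        }
                    }
                }
            }
            for (int oc = 0; oc < OUT_CH; ++oc) {
                out[oc][r][c] = acc[oc];
            }
        }
    }
}

// Operation count, assuming 2 operations (multiply + add) per multiply-accumulate.
constexpr long long conv_ops() {
    return 2LL * OUT_CH * IN_CH * OUT_H * OUT_W * K * K;
}

// Throughput in GOP/s from a latency in clock cycles and a clock frequency in MHz.
double throughput_gops(long long ops, long long latency_cycles, double clock_mhz) {
    assert(latency_cycles > 0 && clock_mhz > 0.0);
    const double seconds = static_cast<double>(latency_cycles) / (clock_mhz * 1e6);
    return static_cast<double>(ops) / seconds / 1e9;
}
```

With these hypothetical dimensions, an all-ones input convolved with all-ones weights gives `IN_CH * K * K = 27` at every output position, which is a quick software check of the loop nest. Under this throughput definition, at a fixed clock frequency and operation count, reducing latency by some factor raises GOP/s by the same factor, which is consistent with the abstract reporting a single speedup figure for both latency and throughput. In practice, unrolled loops typically also need the accessed arrays to be partitioned so that parallel reads are possible, at the cost of additional hardware resources.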

Keywords

FPGA, Deep learning, Convolutional neural networks, Hardware optimization, HLS

Supporting Institution

TUBITAK

Project Number

121E393

Acknowledgements

This research was supported by TUBITAK (Türkiye Bilimsel ve Teknolojik Araştırma Kurumu) under grant 121E393. We thank TUBITAK for its support of our research.

Details

Primary Language	English
Subjects	Engineering
Section	Research Article
Authors	Seda Güzel Aydın 0000-0001-8875-9705; Hasan Şakir Bilge 0000-0002-4945-0884
Project Number	121E393
Submission Date	June 1, 2022
Publication Date	July 20, 2022
IZ	https://izlik.org/JA85DY35WP
Published in Issue	Year 2022 Volume: 6 Issue: 1

Cite

IEEE	[1] S. Güzel Aydın and H. Ş. Bilge, “Efficient Hardware Optimization for CNN,” IJMSIT, vol. 6, no. 1, pp. 38–44, Jul. 2022. [Online]. Available: https://izlik.org/JA85DY35WP
