Research Article

Impact of GPU Optimization Methods on Convolution Operation

Year 2025, Volume: 25 Issue: 4, 804 - 815, 04.08.2025
https://doi.org/10.35414/akufemubid.1491786

Abstract

Convolution is a core operation in many image processing algorithms. Its importance has grown because Convolutional Neural Networks (CNNs) and many similar neural network models use it in their input layers. Since these models operate on very large datasets, even a minor improvement in the convolution operation can greatly affect overall performance. This study examines the effect of several optimization methods on the efficiency of a GPU-based convolution algorithm. The focus is on two of them: processing more data per thread to reduce the number of memory accesses, and using dedicated memory to lower the cost of the remaining accesses. Measurements show that processing varying amounts of data per thread yields a speedup of 2.33x to 2.45x, and that adding dedicated memory raises this to 2.50x-2.60x. Packing memory accesses into larger data structures (vectorized memory access) when writing the output image to memory raises the speedup further, to 2.95x-3.22x. In the best case, the proposed method is 2.72x-2.96x faster than the corresponding OpenCV library function and 4.23x-4.68x faster than the ArrayFire one.
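
The first idea in the abstract, processing more data per thread to reduce memory accesses, can be illustrated with a small CPU-side sketch. This is not the authors' kernel, which this page does not show. The function names, the `per_thread` parameter, and the read counter are assumptions made for illustration, and Python list reads only stand in for GPU global-memory loads. In the second function, each simulated thread loads a row segment once and reuses it for several adjacent output pixels.

```python
def conv2d_naive(image, kernel):
    """One output pixel per simulated thread ('valid' region, kernel not flipped)."""
    k = len(kernel)
    out_h, out_w = len(image) - k + 1, len(image[0]) - k + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    reads = 0  # hypothetical counter standing in for global-memory loads
    for y in range(out_h):
        for x in range(out_w):  # each (y, x) pair plays the role of one thread
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    acc += image[y + i][x + j] * kernel[i][j]
                    reads += 1
            out[y][x] = acc
    return out, reads


def conv2d_coarsened(image, kernel, per_thread=4):
    """Each simulated thread computes up to `per_thread` adjacent outputs in a row."""
    k = len(kernel)
    out_h, out_w = len(image) - k + 1, len(image[0]) - k + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    reads = 0
    for y in range(out_h):
        for x0 in range(0, out_w, per_thread):  # one simulated thread per x0
            n = min(per_thread, out_w - x0)     # the last thread may do less work
            acc = [0.0] * n
            for i in range(k):
                # Load the shared row segment once (a stand-in for registers).
                row = image[y + i][x0:x0 + n + k - 1]
                reads += len(row)
                for j in range(k):
                    for t in range(n):
                        acc[t] += row[t + j] * kernel[i][j]
            for t in range(n):
                out[y][x0 + t] = acc[t]
    return out, reads
```

For a 6x10 image and a 3x3 kernel with `per_thread=4`, both functions return the same output while the counted reads drop from 288 to 144. The count ignores caches, coalescing, and register pressure, so it says nothing about the speedups measured in the study.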


Original title (Turkish): Konvolüsyon İşlemi Üzerinde GPU Eniyileme Yöntemlerinin Etkisi


Details

Primary Language: Turkish
Subjects: Computer Software
Journal Section: Articles
Authors:
  • Kadir Emre Özer (ORCID 0009-0001-7853-3830)
  • Sercan Demirci (ORCID 0000-0001-6739-7653)
Early Pub Date: July 21, 2025
Publication Date: August 4, 2025
Submission Date: May 29, 2024
Acceptance Date: February 5, 2025
Published in Issue: Year 2025, Volume: 25, Issue: 4

Cite

APA Özer, K. E., & Demirci, S. (2025). Konvolüsyon İşlemi Üzerinde GPU Eniyileme Yöntemlerinin Etkisi. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi, 25(4), 804-815. https://doi.org/10.35414/akufemubid.1491786
AMA Özer KE, Demirci S. Konvolüsyon İşlemi Üzerinde GPU Eniyileme Yöntemlerinin Etkisi. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi. August 2025;25(4):804-815. doi:10.35414/akufemubid.1491786
Chicago Özer, Kadir Emre, and Sercan Demirci. “Konvolüsyon İşlemi Üzerinde GPU Eniyileme Yöntemlerinin Etkisi”. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi 25, no. 4 (August 2025): 804-15. https://doi.org/10.35414/akufemubid.1491786.
EndNote Özer KE, Demirci S (August 1, 2025) Konvolüsyon İşlemi Üzerinde GPU Eniyileme Yöntemlerinin Etkisi. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi 25 4 804–815.
IEEE K. E. Özer and S. Demirci, “Konvolüsyon İşlemi Üzerinde GPU Eniyileme Yöntemlerinin Etkisi”, Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi, vol. 25, no. 4, pp. 804–815, 2025, doi: 10.35414/akufemubid.1491786.
ISNAD Özer, Kadir Emre - Demirci, Sercan. “Konvolüsyon İşlemi Üzerinde GPU Eniyileme Yöntemlerinin Etkisi”. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi 25/4 (August 2025), 804-815. https://doi.org/10.35414/akufemubid.1491786.
JAMA Özer KE, Demirci S. Konvolüsyon İşlemi Üzerinde GPU Eniyileme Yöntemlerinin Etkisi. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi. 2025;25:804–815.
MLA Özer, Kadir Emre and Sercan Demirci. “Konvolüsyon İşlemi Üzerinde GPU Eniyileme Yöntemlerinin Etkisi”. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi, vol. 25, no. 4, 2025, pp. 804-15, doi:10.35414/akufemubid.1491786.
Vancouver Özer KE, Demirci S. Konvolüsyon İşlemi Üzerinde GPU Eniyileme Yöntemlerinin Etkisi. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi. 2025;25(4):804-15.