Konvolüsyon İşlemi Üzerinde GPU Eniyileme Yöntemlerinin Etkisi

Kadir Emre Özer; Sercan Demirci

doi:10.35414/akufemubid.1491786

Research Article

Impact of GPU Optimization Methods on Convolution Operation

Year 2025, Volume 25, Issue 4, pp. 804-815, 04.08.2025

Abstract

Convolution is a fundamental operation in many image processing algorithms. It has become even more important because Convolutional Neural Networks (CNNs) and many similar artificial neural network models use it in their input layers. Since these models operate on very large datasets, even a small improvement in the convolution operation can have a large effect on overall performance. This study examines the effects of several optimization methods aimed at improving the efficiency of a GPU-based convolution algorithm. Specifically, it focuses on processing more data per thread (thread coarsening) to reduce the number of memory accesses, and on using dedicated memory to lower the cost of the remaining memory accesses. Processing varying amounts of data per thread was measured to give speedups of 2.33x to 2.45x, and using dedicated memory raised this range to 2.50x-2.60x. Combining the memory accesses made while writing the output image into larger data types (vectorized memory access) further increased the speedup to 2.95x-3.22x. In the best case, the proposed method was 2.72x-2.96x faster than the OpenCV library functions and 4.23x-4.68x faster than the ArrayFire library functions.
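The first optimization named in the abstract, processing more data per thread (thread coarsening), can be illustrated with a small C++ sketch. It is not the authors' kernel: it emulates on the CPU the index mapping a coarsened CUDA convolution kernel might use, with the two outer loops standing in for the grid of GPU threads. Each emulated thread computes `C` adjacent output pixels of one row and, for every kernel row, loads the `C + 2r` input values it needs once and reuses them for all `C` outputs. Assumptions not taken from the paper: a row-major single-channel `float` image, a square `(2r+1)x(2r+1)` kernel applied as a correlation (no kernel flip, as is common in CNN code), zero padding at the borders, and the hypothetical function names `conv2d_naive` and `conv2d_coarsened`. The dedicated-memory and vectorized-store optimizations are not modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference version: one emulated thread per output pixel.
// Assumed layout: row-major single-channel image, (2r+1)x(2r+1) kernel,
// zero padding outside the image, correlation form (kernel not flipped).
std::vector<float> conv2d_naive(const std::vector<float>& in, int w, int h,
                                const std::vector<float>& k, int r) {
    const int kw = 2 * r + 1;
    std::vector<float> out(static_cast<std::size_t>(w) * h, 0.0f);
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float acc = 0.0f;
            for (int dy = -r; dy <= r; ++dy) {
                for (int dx = -r; dx <= r; ++dx) {
                    const int sx = x + dx, sy = y + dy;
                    if (sx < 0 || sx >= w || sy < 0 || sy >= h) continue;  // zero padding
                    acc += in[sy * w + sx] * k[(dy + r) * kw + (dx + r)];
                }
            }
            out[y * w + x] = acc;
        }
    }
    return out;
}

// Thread-coarsened version: each emulated thread (tx, ty) computes C adjacent
// outputs of row ty. For every kernel row it loads C + 2r input values once
// and reuses them for all C outputs, instead of loading C * (2r+1) values.
template <int C>
std::vector<float> conv2d_coarsened(const std::vector<float>& in, int w, int h,
                                    const std::vector<float>& k, int r) {
    static_assert(C > 0, "coarsening factor must be positive");
    assert(w > 0 && h > 0 && r >= 0);
    const int kw = 2 * r + 1;
    const int threadsX = (w + C - 1) / C;  // grid width shrinks by a factor of C
    std::vector<float> out(static_cast<std::size_t>(w) * h, 0.0f);
    // Stands in for the per-thread registers a CUDA kernel would use.
    std::vector<float> strip(static_cast<std::size_t>(C + 2 * r));
    // On a GPU these two loops are replaced by blockIdx/threadIdx.
    for (int ty = 0; ty < h; ++ty) {
        for (int tx = 0; tx < threadsX; ++tx) {
            const int x0 = tx * C;  // first output column of this thread
            float acc[C] = {};      // one accumulator per output pixel
            for (int dy = -r; dy <= r; ++dy) {
                const int sy = ty + dy;
                for (int i = 0; i < C + 2 * r; ++i) {  // each input value loaded once
                    const int sx = x0 - r + i;
                    const bool inside = sy >= 0 && sy < h && sx >= 0 && sx < w;
                    strip[i] = inside ? in[sy * w + sx] : 0.0f;
                }
                for (int j = 0; j < C; ++j)
                    for (int dx = 0; dx < kw; ++dx)
                        acc[j] += strip[j + dx] * k[(dy + r) * kw + dx];
            }
            for (int j = 0; j < C && x0 + j < w; ++j)  // last thread in a row may be partial
                out[ty * w + x0 + j] = acc[j];
        }
    }
    return out;
}
```

Per output pixel, the naive version reads `(2r+1)^2` input values while the coarsened one reads `(2r+1)(C+2r)/C`; for `r = 1` and `C = 4` that is 9 versus 4.5. On a GPU, a larger `C` also raises register use per thread, so the best coarsening factor is hardware dependent.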

Keywords

GPGPU, CUDA, Convolution, Thread Coarsening, Optimization, Parallel Programming

References

Aamodt, T.M., Fung, W.W.L., Rogers, T.G. and Martonosi, M., 2018. General-purpose Graphics Processor Architectures. Morgan & Claypool Publishers.
Brodtkorb, A.R., Hagen, T.R., Schulz, C. and Hasle, G., 2013. GPU Computing in discrete optimization. Part I: Introduction to the GPU. EURO Journal on Transportation and Logistics, 2, 1-2, 129-157. https://doi.org/10.1007/s13676-013-0025-1
Brodtkorb, A.R., Hagen, T.R. and Sætra, M.L., 2013. Graphics processing unit (GPU) programming strategies and trends in GPU computing. Journal of Parallel and Distributed Computing, 73, 1, 4-13. https://doi.org/10.1016/j.jpdc.2012.04.003
Chakrabarti, G., Grover, V., Aarts, B., Kong, X., Kudlur, M., Lin, Y., Marathe, J., Murphy, M. and Wang, J. Z., 2012. CUDA: Compiling and optimizing for a GPU platform. Procedia Computer Science, 9, 1910-1919. https://doi.org/10.1016/j.procs.2012.04.209
Chen, X., Chen, J., Chen, D.Z. and Hu, X.S., 2017. Optimizing memory efficiency for convolution kernels on Kepler GPUs. 54th Annual Design Automation Conference 2017. Texas, United States, 1-6. https://doi.org/10.48550/arXiv.1705.10591
Choi, J.W., Singh, A. and Vuduc, R.W., 2010. Model-driven autotuning of sparse matrix-vector multiply on GPUs. ACM SIGPLAN Notices, 45, 5, 115-126. https://doi.org/10.1145/1837853.1693471
Hijma, P., Heldens, S., Sclocco, A., Van Werkhoven, B. and Bal, H.E., 2023. Optimization techniques for GPU programming. ACM Computing Surveys, 55, 11, 1-81. https://doi.org/10.1145/3570638
Iandola, F.N., Sheffield, D., Anderson, M.J., Phothilimthana, P.M., and Keutzer, K., 2013. Communication-minimizing 2D convolution in GPU registers. 2013 IEEE International Conference on Image Processing. Melbourne, Australia, 2116-2120. https://doi.org/10.1109/ICIP.2013.6738436
Kalaiselvi, T., Sriramakrishnan, P. and Somasundaram, K., 2017. Survey of using GPU CUDA programming model in medical image analysis. Informatics in Medicine Unlocked, 9, 133-144. https://doi.org/10.1016/j.imu.2017.08.001
Kirk, D.B. and Hwu, W.W., 2016. Programming Massively Parallel Processors: A Hands-on Approach. Morgan Kaufmann.
Lu, G., Zhang, W. and Wang, Z., 2020. Optimizing GPU memory transactions for convolution operations. 2020 IEEE International Conference on Cluster Computing (CLUSTER). Kobe, Japan, 399-403. https://doi.org/10.1109/CLUSTER49012.2020.00050
Magni, A., Dubach, C. and O'Boyle, M.F., 2013. A large-scale cross-architecture evaluation of thread-coarsening. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. Colorado, United States, 1-11. https://doi.org/10.1145/2503210.2503268
Sanders, J. and Kandrot, E., 2010. CUDA by example: an introduction to general-purpose GPU programming. Addison-Wesley Professional.
Topçu, B. and Öz, I., 2022. Performance Evaluation of CUDA Optimizations for Convolution Operations. Yüksek Başarımlı Hesaplama Konferansı (BAŞARIM) 2022. İstanbul, Turkey, 37.
Smith, S.W., 1997. The Scientist and Engineer’s Guide to Digital Signal Processing. California Technical Pub.
van den Braak, G. J., Mesman, B. and Corporaal, H. 2010. Compile-time GPU memory access optimizations. 2010 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation. Samos, Greece, 200-207. https://doi.org/10.1109/ICSAMOS.2010.5642066
Van Werkhoven, B., Maassen, J., and Seinstra, F.J. 2011. Optimizing convolution operations in cuda with adaptive tiling. A4MMC’11: Proc. Workshop on Applications for Multi and Many Core Processors. California, United States. https://doi.org/10.1016/j.future.2013.09.003
Yang, Y., Xiang, P., Kong, J., Mantor, M. and Zhou, H., 2012. A unified optimizing compiler framework for different GPGPU architectures. ACM Transactions on Architecture and Code Optimization (TACO), 9, 2, 1-33. https://doi.org/10.1145/2207222.2207225
Ludwig, J., Image Convolution. https://web.pdx.edu/~jduh/courses/Archive/geog481w07/Students/Ludwig_ImageConvolution.pdf (accessed 24.01.2025)
O'Shea, K., An Introduction to Convolutional Neural Networks. https://arxiv.org/abs/1511.08458 (accessed 24.01.2025)
Podlozhnyuk, V., Image Convolution with CUDA. https://developer.download.nvidia.com/compute/cuda/1.1-Beta/x86_64_website/projects/convolutionSeparable/doc/convolutionSeparable.pdf (accessed 24.01.2025)
The User Guide for Nsight Compute. https://docs.nvidia.com/nsight-compute/NsightCompute (accessed 24.01.2025)

There are 18 citations in total.

Details

Primary Language	Turkish
Subjects	Computer Software
Journal Section	Articles
Authors	Kadir Emre Özer (ORCID 0009-0001-7853-3830), Sercan Demirci (ORCID 0000-0001-6739-7653)
Early Publication Date	July 21, 2025
Publication Date	August 4, 2025
Submission Date	May 29, 2024
Acceptance Date	February 5, 2025
Published in Issue	Year 2025 Volume: 25 Issue: 4

Cite

APA	Özer, K. E., & Demirci, S. (2025). Konvolüsyon İşlemi Üzerinde GPU Eniyileme Yöntemlerinin Etkisi. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi, 25(4), 804-815. https://doi.org/10.35414/akufemubid.1491786
AMA	Özer KE, Demirci S. Konvolüsyon İşlemi Üzerinde GPU Eniyileme Yöntemlerinin Etkisi. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi. August 2025;25(4):804-815. doi:10.35414/akufemubid.1491786
Chicago	Özer, Kadir Emre, and Sercan Demirci. “Konvolüsyon İşlemi Üzerinde GPU Eniyileme Yöntemlerinin Etkisi”. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi 25, no. 4 (August 2025): 804-15. https://doi.org/10.35414/akufemubid.1491786.
EndNote	Özer KE, Demirci S (August 1, 2025) Konvolüsyon İşlemi Üzerinde GPU Eniyileme Yöntemlerinin Etkisi. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi 25 4 804–815.
IEEE	K. E. Özer and S. Demirci, “Konvolüsyon İşlemi Üzerinde GPU Eniyileme Yöntemlerinin Etkisi”, Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi, vol. 25, no. 4, pp. 804–815, 2025, doi: 10.35414/akufemubid.1491786.
ISNAD	Özer, Kadir Emre - Demirci, Sercan. “Konvolüsyon İşlemi Üzerinde GPU Eniyileme Yöntemlerinin Etkisi”. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi 25/4 (August 2025), 804-815. https://doi.org/10.35414/akufemubid.1491786.
JAMA	Özer KE, Demirci S. Konvolüsyon İşlemi Üzerinde GPU Eniyileme Yöntemlerinin Etkisi. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi. 2025;25:804–815.
MLA	Özer, Kadir Emre and Sercan Demirci. “Konvolüsyon İşlemi Üzerinde GPU Eniyileme Yöntemlerinin Etkisi”. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi, vol. 25, no. 4, 2025, pp. 804-15, doi:10.35414/akufemubid.1491786.
Vancouver	Özer KE, Demirci S. Konvolüsyon İşlemi Üzerinde GPU Eniyileme Yöntemlerinin Etkisi. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi. 2025;25(4):804-15.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.