Research Article

Application of Deep Learning Algorithms to GPU and CPU Hardware Architectures and Performance Analysis: Experimental Research

Year 2022, Issue 33, 10-19, 31.01.2022
https://doi.org/10.31590/ejosat.937936

Abstract

With today's rapidly developing technology, the variety and volume of data are increasing. This growth has given rise to different designs in computer architecture. The number of cores available on CPU and GPU architectures can determine how quickly an application reaches its result, so processing performance and power consumption should both be considered when developing software. CPUs execute applications with longer processing times than GPUs, and this time directly affects the power consumed during execution. For deep learning algorithms, GPUs deliver faster and more successful results than CPUs. The size and diversity of the dataset, the most important criterion in the learning phase, increase learning success to a corresponding degree. In this study, experiments were carried out on processors with different architectures, taking dataset size and processing time as criteria, and the power consumed by the GPU architectures was measured. The CNN, RNN, and LSTM deep learning algorithms were applied to three datasets of different sizes. Six experiments were performed, and performance and energy consumption were evaluated and compared. Based on the results obtained, time and energy were adopted as the criteria for working with these algorithms. The findings indicate that deep learning algorithms can serve as a helpful tool for high-accuracy prediction on GPU systems. In addition to providing important information on CPU and GPU systems in terms of energy and time, the results of the study are also valuable for future applications in different sectors.
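
The study's own code is not reproduced here, but a minimal sketch of the kind of experiment described above (timing the training of a small CNN on CPU versus GPU and sampling GPU power draw) might look like the following. It assumes TensorFlow/Keras, the MNIST dataset listed in the references, and an NVIDIA GPU with nvidia-smi available; the model size, epoch count, and sampling interval are illustrative choices, not the authors' configuration.

```python
# Illustrative sketch (not the authors' code): time the training of a small
# CNN on MNIST on CPU vs. GPU and sample GPU board power via nvidia-smi.
import subprocess
import threading
import time

import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0  # shape (N, 28, 28, 1)


def build_cnn():
    # Small example network; layer sizes are arbitrary, not from the paper.
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu",
                               input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])


def sample_gpu_power(readings, stop_event, interval_s=0.5):
    # Poll the instantaneous board power (watts) reported by nvidia-smi.
    while not stop_event.is_set():
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=power.draw",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True)
        readings.append(float(out.stdout.strip().splitlines()[0]))
        time.sleep(interval_s)


def train_and_time(device, measure_power=False):
    readings, stop = [], threading.Event()
    if measure_power:
        threading.Thread(target=sample_gpu_power, args=(readings, stop),
                         daemon=True).start()
    with tf.device(device):  # placement hint; for a strict CPU-only run,
        model = build_cnn()  # hide the GPUs before TensorFlow initialises them
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        t0 = time.perf_counter()
        model.fit(x_train, y_train, epochs=1, batch_size=128, verbose=0)
        elapsed = time.perf_counter() - t0
    stop.set()
    avg_watts = sum(readings) / len(readings) if readings else None
    return elapsed, avg_watts


if __name__ == "__main__":
    cpu_s, _ = train_and_time("/CPU:0")
    gpu_s, gpu_w = train_and_time("/GPU:0", measure_power=True)
    print(f"CPU: {cpu_s:.1f} s | GPU: {gpu_s:.1f} s, ~{gpu_w:.0f} W average")
```

Multiplying the average power by the elapsed time gives an approximate energy figure in joules, which is the kind of time and energy comparison the study reports across its six experiments.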

References

  • Själander, M., Martonosi, M., & Kaxiras, S. (2014). Power-efficient computer architectures: Recent advances. Synthesis Lectures on Computer Architecture, 9(3), 1-96.
  • Huang, L., Lü, Y., Ma, S., Xiao, N., & Wang, Z. (2019). SIMD stealing: Architectural support for efficient data parallel execution on multicores. Microprocessors and Microsystems, 65, 136-147.
  • Li, T., Evans, A. T., Chiravuri, S., Gianchandani, R. Y., & Gianchandani, Y. B. (2012). Compact, power-efficient architectures using microvalves and microsensors, for intrathecal, insulin, and other drug delivery systems. Advanced drug delivery reviews, 64(14), 1639-1649.
  • Katreepalli, R., & Haniotakis, T. (2019). Power efficient synchronous counter design. Computers & Electrical Engineering, 75, 288-300.
  • Dehnavi, M., & Eshghi, M. (2018). Cost and power efficient FPGA based stereo vision system using directional graph transform. Journal of Visual Communication and Image Representation, 56, 106-115.
  • Huynh, T. V., Mücke, M., & Gansterer, W. N. (2012). Evaluation of the Stretch S6 Hybrid Reconfigurable Embedded CPU Architecture for Power-Efficient Scientific Computing. Procedia Computer Science, 9, 196-205.
  • Lautner, D., Hua, X., DeBates, S., Song, M., & Ren, S. (2018). Power efficient scheduling algorithms for real-time tasks on multi-mode microcontrollers. Procedia computer science, 130, 557-566.
  • Wyant, C. M., Cullinan, C. R., & Frattesi, T. R. (2012). Computing performance benchmarks among cpu, gpu, and fpga. Computing.
  • Mittal, S., & Vetter, J. S. (2014). A survey of methods for analyzing and improving GPU energy efficiency. ACM Computing Surveys (CSUR), 47(2), 1-23.
  • Betkaoui, B., Thomas, D. B., & Luk, W. (2010). Comparing performance and energy efficiency of FPGAs and GPUs for high productivity computing. In 2010 International Conference on Field-Programmable Technology (pp. 94-101). IEEE.
  • Mumcu, M. C., & Bayar, S. (2020). Parallel Implementation Of The GPR Techniques For Detecting And Mapping Ancient Buildings By Using CUDA. Avrupa Bilim ve Teknoloji Dergisi, 352-359.
  • Stratton, J. A., Anssari, N., Rodrigues, C., Sung, I. J., Obeid, N., Chang, L., ... & Hwu, W. M. (2012). Optimization and architecture effects on GPU computing workload performance. In 2012 Innovative Parallel Computing (InPar) (pp. 1-10). IEEE.
  • InAccel. (2018). CPU, GPU, FPGA or TPU: Which one to choose for my machine learning training. https://medium.com/@inaccel/cpu-gpu-fpga-or-tpu-which-one-to-choose-for-my-machine-learning-training-948902f058e0, last accessed: 15.03.2021.
  • Kahoul, A., Constantinides, G. A., Smith, A. M., & Cheung, P. Y. (2009). Heterogeneous architecture exploration: Analysis vs. parameter sweep. In International Workshop on Applied Reconfigurable Computing (pp. 133-144). Springer, Berlin, Heidelberg.
  • Qasaimeh, M., Denolf, K., Lo, J., Vissers, K., Zambreno, J., & Jones, P. H. (2019). Comparing energy efficiency of CPU, GPU and FPGA implementations for vision kernels. In 2019 IEEE International Conference on Embedded Software and Systems (ICESS) (pp. 1-8). IEEE.
  • Holm, H. H., Brodtkorb, A. R., & Sætra, M. L. (2020). GPU computing with Python: Performance, energy efficiency and usability. Computation, 8(1), 4.
  • Bandyopadhyay, A., (2019). Hands-On GPU Computing with Python: Explore the capabilities of GPUs for solving high performance computational problems, Packt Publishing, ISBN-13: 978-1789341072
  • Aydın, S., Samet, R., & Bay, Ö. F. (2020) Gpu Programlamada Cuda Platformu Kullanılan Paralel Görüntü İşleme Çalışmalarının İncelenmesi. Politeknik Dergisi, 23(3), 737-754.
  • Vaidya, B. (2018). Hands-On GPU-Accelerated Computer Vision with OpenCV and CUDA: Effective techniques for processing complex image data in real time using GPUs. Packt Publishing Ltd.
  • Goz, D., Ieronymakis, G., Papaefstathiou, V., Dimou, N., Bertocco, S., Simula, F., ... & Taffoni, G. (2020). Performance and energy footprint assessment of FPGAs and GPUs on HPC systems using Astrophysics application. Computation, 8(2), 34.
  • Lam, S. K., Pitrou, A., & Seibert, S. (2015). Numba: A llvm-based python jit compiler. In Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC (pp. 1-6).
  • Okuta, R., Unno, Y., Nishino, D., Hido, S., & Loomis, C. (2017). Cupy: A numpy-compatible library for nvidia gpu calculations. In Proceedings of Workshop on Machine Learning Systems (LearningSys) in The Thirty-first Annual Conference on Neural Information Processing Systems (NIPS) (p. 7).
  • Kaehler, A., & Bradski, G. (2016). Learning OpenCV 3: computer vision in C++ with the OpenCV library. O'Reilly Media, Inc.
  • Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. the Journal of machine Learning research, 12, 2825-2830.
  • Klöckner, A., Pinto, N., Lee, Y., Catanzaro, B., Ivanov, P., & Fasih, A. (2012). PyCUDA and PyOpenCL: A scripting- based approach to GPU run-time code generation. Parallel Computing, 38(3), 157-174.
  • Sriadibhatla, S., & Baboji, K. (2019). Design and implementation of area and power efficient reconfigurable fir filter with low complexity coefficients. Gazi University Journal of Science, 32(2), 494-507.
  • Sanapala, K. (2017). Two Novel Subthreshold Logic Families for Area and Ultra Low-Energy Efficient Applications: DTGDI & SBBGDI. Gazi University Journal of Science, 30(4), 283-294.
  • Demirbas, A. A., & Çınar, A. (2020) Nesne Sınıflandırma İşlemi İçin Tensor İşleme Birimi ve Cpu Performans Karşılaştırması. Bilgisayar Bilimleri ve Teknolojileri Dergisi, 1(1), 10-15.
  • Dodiu, E., & Gaitan, V. G. (2012). Custom designed CPU architecture based on a hardware scheduler and independent pipeline registers—Concept and theory of operation. In 2012 IEEE International Conference on Electro/Information Technology (pp. 1-5). IEEE.
  • Pereira, R., Couto, M., Ribeiro, F., Rua, R., Cunha, J., Fernandes, J. P., & Saraiva, J. (2017). Energy efficiency across programming languages: how do energy, time, and memory relate?. In Proceedings of the 10th ACM SIGPLAN International Conference on Software Language Engineering (pp. 256-267).
  • Çetin, N. M., & Hacıömeroğlu, M. (2013). Gpu Hızlandırmalı Veri Demetleme Algoritmalarının İncelenmesi. Ajıt-E: Bilişim Teknolojileri Online Dergisi, 4(11), 19-59.
  • NVIDIA Time Series Dataset, Modeling Time Series Data with Recurrent Neural Networks in Keras. Last accessed: 02.01.2021, https://courses.nvidia.com/courses/course-v1:DLI+L-FX-24+V1/about
  • TensorFlow MNIST Dataset, Loads the MNIST dataset. Last accessed: 05.01.2021, https://www.tensorflow.org/api_docs/python/tf/keras/datasets/mnist/load_data
  • TensorFlow IMDB Dataset, Loads the IMDB dataset. Last accessed: 09.01.2021, https://www.tensorflow.org/api_docs/python/tf/keras/datasets/imdb/load_data
  • Selvin, S., Vinayakumar, R., Gopalakrishnan, E. A., Menon, V. K., & Soman, K. P. (2017, September). Stock price prediction using LSTM, RNN and CNN-sliding window model. In 2017 international conference on advances in computing, communications and informatics (icacci) (pp. 1643-1647). IEEE.
  • Ullah, A., Ahmad, J., Muhammad, K., Sajjad, M., & Baik, S. W. (2017). Action recognition in video sequences using deep bi-directional LSTM with CNN features. IEEE access, 6, 1155-1166.
  • Zhu, F., Ye, F., Fu, Y., Liu, Q., & Shen, B. (2019). Electrocardiogram generation with a bidirectional LSTM-CNN generative adversarial network. Scientific reports, 9(1), 1-11.


Details

Primary Language: Turkish
Subjects: Engineering
Section: Articles
Authors

Tuğba Saray Çetinkaya (ORCID: 0000-0003-1639-553X)

Ahmet Sertbaş (ORCID: 0000-0001-8166-1211)

Publication Date: January 31, 2022
Published in Issue: Year 2022, Issue 33

How to Cite

APA: Saray Çetinkaya, T., & Sertbaş, A. (2022). Derin Öğrenme Algoritmalarının GPU ve CPU Donanım Mimarileri Üzerinde Uygulanması ve Performans Analizi: Deneysel Araştırma. Avrupa Bilim ve Teknoloji Dergisi, (33), 10-19. https://doi.org/10.31590/ejosat.937936