Research Article

AN ANALYSIS OF APACHE SPARK AND GPU PERFORMANCES ON DATABASE SQL QUERIES FOR DISTRIBUTED NETWORKS

Volume: 11, Issue: 24, 31 December 2024

Abstract

The successful use of GPUs in many fields has prompted efforts to apply them to database systems. GPUs are also effective in distributed systems and computer networks, where they accelerate computational tasks by exploiting parallel processing across multiple nodes, particularly for computationally demanding tasks such as network traffic analysis and real-time data processing. Digital transformation in all areas of life has created new demands, such as handling more diverse data and analyzing it faster. Possible ways to meet these demands include upgrading the system's hardware capacity or pursuing software-based approaches. This study examines the performance differences between Apache Spark and GPUs on commonly used SQL queries over big data. To this end, query types common in data analysis, such as grouping, sorting, and filtering, are used. For simple queries, the GPU performed similarly to Apache Spark, whereas for queries requiring computation, the GPU completed the queries 3x faster.
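
To make the three query categories named in the abstract concrete, the sketch below runs one filtering, one grouping, and one sorting query with Python's built-in sqlite3 module on a small in-memory table. It is illustrative only: the `events` table, its `service` and `latency_ms` columns, the sample rows, and the helper `run_examples` are hypothetical and not taken from the study, and sqlite3 is a single-node CPU engine, not Apache Spark or a GPU, so it says nothing about the performance results reported here.

```python
import sqlite3

# Hypothetical sample rows (service name, latency in ms); not the study's dataset.
ROWS = [
    ("web", 120.0),
    ("db", 300.0),
    ("web", 80.0),
    ("cache", 50.0),
    ("db", 100.0),
]


def run_examples(rows):
    """Run one filtering, one grouping, and one sorting query on an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (service TEXT, latency_ms REAL)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)

    # Filtering: keep only rows that satisfy a predicate (ordered for a stable result).
    filtered = conn.execute(
        "SELECT service, latency_ms FROM events "
        "WHERE latency_ms > 90 ORDER BY latency_ms"
    ).fetchall()

    # Grouping: compute a count and an average for each service.
    grouped = conn.execute(
        "SELECT service, COUNT(*), AVG(latency_ms) FROM events "
        "GROUP BY service ORDER BY service"
    ).fetchall()

    # Sorting: order the whole table by latency, largest first.
    ordered = conn.execute(
        "SELECT service, latency_ms FROM events ORDER BY latency_ms DESC"
    ).fetchall()

    conn.close()
    return filtered, grouped, ordered


if __name__ == "__main__":
    for result in run_examples(ROWS):
        print(result)
```

Assuming a Spark setup, the same SQL strings could in principle be submitted through `SparkSession.sql` after registering a DataFrame as a temporary view with `createOrReplaceTempView`; this is a suggestion, not a description of the authors' pipeline.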

Keywords

GPU, Apache Spark, Distributed Networking, Big Data, HPC

Cite

APA
Turan, M., Tenekeci, E., & Güner, K. (2024). AN ANALYSIS OF APACHE SPARK AND GPU PERFORMANCES ON DATABASE SQL QUERIES FOR DISTRIBUTED NETWORKS. Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi, 11(24), 428-437. https://doi.org/10.54365/adyumbd.1508182
AMA
1.Turan M, Tenekeci E, Güner K. AN ANALYSIS OF APACHE SPARK AND GPU PERFORMANCES ON DATABASE SQL QUERIES FOR DISTRIBUTED NETWORKS. Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi. 2024;11(24):428-437. doi:10.54365/adyumbd.1508182
Chicago
Turan, Mehmet, Emin Tenekeci, and Kemal Güner. 2024. “AN ANALYSIS OF APACHE SPARK AND GPU PERFORMANCES ON DATABASE SQL QUERIES FOR DISTRIBUTED NETWORKS”. Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi 11 (24): 428-37. https://doi.org/10.54365/adyumbd.1508182.
EndNote
Turan M, Tenekeci E, Güner K (01 December 2024) AN ANALYSIS OF APACHE SPARK AND GPU PERFORMANCES ON DATABASE SQL QUERIES FOR DISTRIBUTED NETWORKS. Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi 11 24 428–437.
IEEE
[1] M. Turan, E. Tenekeci, and K. Güner, “AN ANALYSIS OF APACHE SPARK AND GPU PERFORMANCES ON DATABASE SQL QUERIES FOR DISTRIBUTED NETWORKS”, Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi, vol. 11, no. 24, pp. 428–437, Dec. 2024, doi: 10.54365/adyumbd.1508182.
ISNAD
Turan, Mehmet - Tenekeci, Emin - Güner, Kemal. “AN ANALYSIS OF APACHE SPARK AND GPU PERFORMANCES ON DATABASE SQL QUERIES FOR DISTRIBUTED NETWORKS”. Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi 11/24 (01 December 2024): 428-437. https://doi.org/10.54365/adyumbd.1508182.
JAMA
1.Turan M, Tenekeci E, Güner K. AN ANALYSIS OF APACHE SPARK AND GPU PERFORMANCES ON DATABASE SQL QUERIES FOR DISTRIBUTED NETWORKS. Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi. 2024;11:428–437.
MLA
Turan, Mehmet, et al. “AN ANALYSIS OF APACHE SPARK AND GPU PERFORMANCES ON DATABASE SQL QUERIES FOR DISTRIBUTED NETWORKS”. Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi, vol. 11, no. 24, December 2024, pp. 428-37, doi:10.54365/adyumbd.1508182.
Vancouver
1. Mehmet Turan, Emin Tenekeci, Kemal Güner. AN ANALYSIS OF APACHE SPARK AND GPU PERFORMANCES ON DATABASE SQL QUERIES FOR DISTRIBUTED NETWORKS. Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi. 01 December 2024;11(24):428-37. doi:10.54365/adyumbd.1508182