Research Article

A comparison of Apache Solr and Elasticsearch technologies in support of large-scale data analysis

Year 2023, Volume: 13 Issue: 2, 386 - 404, 15.04.2023
https://doi.org/10.17714/gumusfenbil.1213317

Abstract

In the era of big data, data has never been more important, because it contains hidden insights. Extracting usable information from enormous volumes of data is both necessary and challenging, and developers of data-intensive systems encounter a wide range of difficulties when performing data processing and analytics across domains. Full-text search is one of the most significant components of big data processing and analytics, because it locates the required fragments within large volumes of data. Given the importance of the subject, this article first examines the characteristics, capabilities, and technical comparisons of full-text search technologies, and then systematically compares Apache Solr and Elasticsearch in terms of indexing time and query performance on three separate datasets. According to our findings, with both technologies in their default configurations, Apache Solr performs better on indexing time, measured on three machines with different hardware specifications. Likewise, Apache Solr outperforms Elasticsearch in seven out of ten search queries. Based on these results, we recommend Apache Solr over Elasticsearch on computers with restricted hardware resources. In addition, this study provides researchers and developers of data-intensive systems with a complete comparison and suggestions for choosing the most suitable full-text search technology for their tasks.
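
As a rough illustration of the kind of comparison the abstract describes (timing the same operations on two engines and counting which one is faster per query), a minimal Python sketch follows. The function names, the repeat count, and the use of the median are assumptions made for illustration and are not taken from the article. The callables that would actually index into or query Apache Solr and Elasticsearch are deliberately left out.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def median_seconds(action: Callable[[], object], repeats: int = 5) -> float:
    """Run `action` several times and return the median wall-clock duration.

    `action` is a hypothetical zero-argument callable, e.g. one that sends
    a single query or an indexing request to a search engine.
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def count_wins(times_a: Sequence[float], times_b: Sequence[float]) -> Dict[str, int]:
    """Tally, query by query, which engine was faster (lower time wins)."""
    if len(times_a) != len(times_b):
        raise ValueError("both engines must be timed on the same queries")
    tally = {"a": 0, "b": 0, "tie": 0}
    for a, b in zip(times_a, times_b):
        if a < b:
            tally["a"] += 1
        elif b < a:
            tally["b"] += 1
        else:
            tally["tie"] += 1
    return tally


if __name__ == "__main__":
    # Made-up timings in seconds, for illustration only (not the paper's data).
    engine_a = [0.12, 0.30, 0.08]
    engine_b = [0.15, 0.25, 0.08]
    print(count_wins(engine_a, engine_b))
```

Taking the median of several runs is one common way to dampen outliers such as cache warm-up. Other summary statistics would be equally defensible, and the article itself should be consulted for the measurement procedure the authors actually used.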

References

  • Anderson, K. M., Aydin, A. A., Barrenechea, M., Cardenas, A., Hakeem, M., & Jambi, S. (2015). Design challenges/solutions for environments supporting the analysis of social media data in crisis informatics research. 2015 48th Hawaii International Conference on System Sciences, 2015-March, 163–172. https://doi.org/10.1109/HICSS.2015.29
  • Apache Lucene. (2022). https://lucene.apache.org/
  • Barrenechea, M., Jambi, S., Aydin, A. A., Hakeem, M., & Anderson, K. M. (2017). Getting the query right for crisis informatics design issues for web-based analysis environments. Journal of Web Engineering, 16(5), 399–432. https://journals.riverpublishers.com/index.php/JWE/article/view/3269/2153
  • Bellini, P., Bugli, F., Nesi, P., Pantaleo, G., Paolucci, M., & Zaza, I. (2019). Data flow management and visual analytic for big data smart city/IOT. Proceedings - 2019 IEEE SmartWorld, Ubiquitous Intelligence and Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Internet of People and Smart City Innovation, SmartWorld/UIC/ATC/SCALCOM/IOP/SCI 2019, 1529–1536. https://doi.org/10.1109/SmartWorld-UIC-ATC-SCALCOM-IOP-SCI.2019.00276
  • DB-Engines. (2022). https://db-engines.com/en/
  • Domo Company. (2022). Data Never Sleeps 9.0. https://www.domo.com/learn/infographic/data-never-sleeps-9
  • Dota 2 Matches. (2022). https://www.kaggle.com/datasets/devinanzelmo/dota-2-matches
  • D.S., S. (2016). A quick search on the projects with a high loads and a large amount of data. Modern Technologies: Current Issues, Achievements and Innovations — Collection of Articles III International Scientific Conference / under the General Editorship of G. Yu Gulyaev — Penza MCNS « Science and Education », 23–32.
  • Elastic Installation and Upgrade Guide [8.4]. (2022). https://www.elastic.co/guide/en/elastic-stack/8.4/index.html
  • Elasticsearch vs. Solr Performance: Round 2. (2015). https://www.flax.co.uk/blog/2015/12/02/elasticsearch-vs-solr-performance-round-2/
  • Gonçalves, A. A. S., & Sunye, M. S. (2020). Comparison of search servers for use with digital repositories. ICEIS 2020 - Proceedings of the 22nd International Conference on Enterprise Information Systems, 1, 256–260. https://doi.org/10.5220/0009577102560260
  • Google Play Store Apps. (2022). https://www.kaggle.com/datasets/lava18/google-play-store-apps
  • Google Trends. (2022). https://trends.google.com/trends/
  • Halevi, G., & Moed, H. (2012). The evolution of big data as a research and scientific topic: overview of the literature. Research Trends, 30(36), 3–6.
  • Hansen, J., Porter, K., Shalaginov, A., & Franke, K. (2018). Comparing open source search engine functionality, efficiency and effectiveness with respect to digital forensic search. NISK 2018 - 11th Norwegian Information Security Conference, 108–121.
  • Kılıç, U., & Karabey, I. (2016). Comparison of solr and elasticsearch among popular full-text search engines and their security analysis. UBMK'16 - International Conference on Computer Science and Engineering, 2016 October. https://doi.org/10.13140/RG.2.2.24563.32803
  • Kowsari, K., Brown, D., Heidarysafa, M., Meimandi, K. J., Gerber, M., & Barnes, L. (2018). Web of science dataset. 6. Mendeley Data. https://doi.org/10.17632/9RW3VKCFY4.6
  • Lakhara, S., & Mishra, N. (2017). Desktop full-text searching based on Lucene: a review. 2017 IEEE International Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI), 2434–2438. https://doi.org/10.1109/ICPCSI.2017.8392154
  • Lashkaripour, Z. (2020). The era of big data: a thorough inspection in the building blocks of future generation data management. International Journal of Scientific and Technology Research, 9, 321–330.
  • Lokoč, J., Veselý, P., Mejzlík, F., Kovalčík, G., Souček, T., Rossetto, L., Schoeffmann, K., Bailer, W., Gurrin, C., Sauter, L., Song, J., Vrochidis, S., Wu, J., & Jónsson, B. Þ. (2021). Is the reign of interactive search eternal? findings from the video browser showdown 2020. ACM Transactions on Multimedia Computing, Communications, and Applications, 17(3), 1–26. https://doi.org/10.1145/3445031
  • Luburić, N., & Ivanovic, D. (2016). Comparing Apache Solr and Elasticsearch search servers. 6th International Conference on Information Society and Technology - ICIST 2016. http://www.eventiotic.com/eventiotic/files/Papers/URL/icist2016_54.pdf
  • Oussous, A., & Benjelloun, F. (2022). A comparative study of different search and indexing tools for big data. Jordanian Journal of Computers and Information Technology, 8(1), 1. https://doi.org/10.5455/jjcit.71-1637097759
  • Rao, T. R., Mitra, P., Bhatt, R., & Goswami, A. (2018). The big data system, components, tools, and technologies: a survey. Knowledge and Information Systems, 60(3), 1165. https://doi.org/10.1007/s10115-018-1248-0
  • Resources Apache Solr. (2022). https://solr.apache.org/resources.html
  • Voit, A., Stankus, A., Magomedov, S., & Ivanova, I. (2017). Big data processing for full-text search and visualization with elasticsearch. IJACSA - International Journal of Advanced Computer Science and Applications, 8(12). www.ijacsa.thesai.org
  • Wang, J.-F., Wang, X.-F., & Li, H. (2022). Design of multimedia distance teaching auxiliary system based on MOOC platform. ICMTMA 2022 - 14th International Conference on Measuring Technology and Mechatronics Automation, 1179–1186. https://doi.org/10.1109/ICMTMA54903.2022.00237
  • Y. Aldailamy, A., Abdul Hamid, N. A. W., & Abdulkarem, M. (2018). Distributed indexing: performance analysis of Solr, Terrier and Katta information retrievals. Malaysian Journal of Computer Science, 87–104. https://doi.org/10.22452/mjcs.sp2018no1.7
  • Yurtsever, M. M. E., Özcan, M., Taruz, Z., Eken, S., & Sayar, A. (2022). Figure search by text in large scale digital document collections. Concurrency and Computation: Practice and Experience, 34(1). https://doi.org/10.1002/CPE.6529
There are 27 citations in total.

Details

Primary Language English
Subjects Engineering
Journal Section Articles
Authors

Ayşenur Deniz 0000-0003-0895-9171

Muhammed Mehdi Elömer 0000-0003-4000-333X

Ahmet Arif Aydın 0000-0002-4124-7275

Publication Date April 15, 2023
Submission Date December 1, 2022
Acceptance Date March 10, 2023
Published in Issue Year 2023 Volume: 13 Issue: 2

Cite

APA Deniz, A., Elömer, M. M., & Aydın, A. A. (2023). A comparison of Apache Solr and Elasticsearch technologies in support of large-scale data analysis. Gümüşhane Üniversitesi Fen Bilimleri Dergisi, 13(2), 386-404. https://doi.org/10.17714/gumusfenbil.1213317