TY  - JOUR
T1  - Real-Time Big Data Processing and Analytics: Concepts, Technologies, and Domains
TT  - Gerçek Zamanlı Büyük Veri İşleme ve Analitiği: Kavramlar, Teknolojiler ve Etki Alanları
AU  - Aydın, Ahmet Arif
AU  - Kekevi, Uğur
PY  - 2022
DA  - December
Y2  - 2022
DO  - 10.53070/bbd.1204112
JF  - Computer Science
JO  - JCS
PB  - Ali KARCI
WT  - DergiPark
SN  - 2548-1304
SP  - 111
EP  - 123
VL  - 7
IS  - 2
LA  - en
AB  - In the digital era, data is one of the most important assets because it conceals valuable information. Developers of data-intensive systems face new challenges at every level of streaming, storing, and processing large quantities of data that arrive in a variety of forms and at varying speeds. Obtaining useful information at the right time and place is also crucial. Since the value of information is inversely proportional to time, real-time data processing and analytics are receiving more attention. Given this importance, this study focuses on real-time data processing concepts and terminology, popular technologies used in real-time data processing and analytics, popular NoSQL storage technologies used in real-time data processing, and the application areas of real-time data processing. The purpose of this paper is to provide researchers in real-time analysis and developers of data-intensive systems with a comparative perspective on real-time data processing by highlighting the key characteristics of real-time data processing technologies, NoSQL storage technologies, and their application domains, together with selected examples from previous studies.
KW  - Big data analysis
KW  - real-time data processing
KW  - streaming technologies
KW  - NoSQL
N2  - In the digital age, data is one of the most important assets because it conceals valuable information. Developers of data-intensive systems face new challenges at every level of streaming, storing, and processing large volumes of data in various forms and at various speeds. Obtaining useful information at the right time and place is also very important. Because the value of information is inversely proportional to time, real-time data processing and analytics are attracting more attention. Owing to the importance of real-time data processing and analytics, this study presents real-time data processing concepts and terminology, popular technologies used in real-time data processing and analytics, popular NoSQL storage technologies used in real-time data processing, and application areas of real-time data processing. The aim of this article is to provide researchers in real-time analysis and developers of data-intensive systems with a comparative perspective on real-time data processing, supported by selected examples from previous studies, by highlighting the key characteristics of real-time data processing technologies, NoSQL storage technologies, and their applications.
CR  - Abdul Ghani, N. B., Hamid, S., Ahmad, M., Saadi, Y., Jhanjhi, N. Z., Alzain, M. A., & Masud, M. (2021). Tracking Dengue on Twitter Using Hybrid Filtration-Polarity and Apache Flume. Computer Systems Science and Engineering, 40(3), 913–926. https://doi.org/10.32604/CSSE.2022.018467
CR  - Acharjya, D. P., & Ahmed P, K. (2016). A Survey on Big Data Analytics: Challenges, Open Research Issues and Tools. International Journal of Advanced Computer Science and Applications, 7(2), 511–518. www.ijacsa.thesai.org
CR  - Alhomsi, Y., Alsalemi, A., al Disi, M., Bensaali, F., Amira, A., & Alinier, G. (2019). CouchDB Based Real-Time Wireless Communication System for Clinical Simulation. Proceedings - 20th International Conference on High Performance Computing and Communications, 16th International Conference on Smart City and 4th International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2018, 1094–1098. https://doi.org/10.1109/HPCC/SmartCity/DSS.2018.00182
CR  - Apache Software Foundation. (2022a). Cassandra. https://cassandra.apache.org/_/index.html
CR  - Apache Software Foundation. (2022b). CouchDB. https://couchdb.apache.org/
CR  - Apache Software Foundation. (2022c). Flink. https://flink.apache.org/
CR  - Apache Software Foundation. (2022d). Flume. https://flume.apache.org/
CR  - Apache Software Foundation. (2022e). Hadoop. https://hadoop.apache.org/
CR  - Apache Software Foundation. (2022f). HBase. https://hbase.apache.org/
CR  - Apache Software Foundation. (2022g). Kafka. https://kafka.apache.org/
CR  - Apache Software Foundation. (2022h). Spark. https://spark.apache.org/
CR  - Apache Software Foundation. (2022i). Storm. https://storm.apache.org/
CR  - Aydin, A. A. (2016). Incremental Data Collection & Analytics: The Design of Next-Generation Crisis Informatics Software.
CR  - Aydin, A. A., & Anderson, K. M. (2017). Batch to Real-Time: Incremental Data Collection & Analytics Platform. Proceedings of the 50th Hawaii International Conference on System Sciences, 5911–5920.
CR  - Azzedin, F. (2013). Towards a scalable HDFS architecture. Proceedings of the 2013 International Conference on Collaboration Technologies and Systems, CTS 2013, 155–161. https://doi.org/10.1109/CTS.2013.6567222
CR  - Bagga, S., & Sharma, A. (2019). Big Data and Its Challenges: A Review. Proceedings - 4th International Conference on Computing Sciences, ICCS 2018, 183–187. https://doi.org/10.1109/ICCS.2018.00037
CR  - Bajaber, F., Elshawi, R., Batarfi, O., Altalhi, A., Barnawi, A., & Sakr, S. (2016). Big Data 2.0 Processing Systems: Taxonomy and Open Challenges. Journal of Grid Computing, 14(3), 379–405. https://doi.org/10.1007/s10723-016-9371-1
CR  - Baron, C. A. (2015). NoSQL Key-Value DBs Riak and Redis. Database Systems Journal, 6(4).
CR  - Beata, P. A., Jeffers, A. E., & Kamat, V. R. (2018). Real-Time Fire Monitoring and Visualization for the Post-Ignition Fire State in a Building. Fire Technology, 54(4), 995–1027. https://doi.org/10.1007/s10694-018-0723-1
CR  - Chatterjee, N., Chakraborty, S., Decosta, A., & Nath, A. (2018). Real-time Communication Application Based on Android Using Google Firebase. International Journal of Advance Research in Computer Science and Management Studies, 6(4). www.ijarcsms.com
CR  - Croushore, D., & Stark, T. (2001). A real-time data set for macroeconomists. Journal of Econometrics, 105.
CR  - DB-Engines. (2022). https://db-engines.com/en/
CR  - de Castro Martins, J., Mancilha Pinto, A. F., Junior, E. E. B., Goncalves, G. S., Louro, H. D. B., Gomes, J. M., Filho, L. A. L., da Silva, L. H. R. C., Rodrigues, R. A., Neto, W. C., da Cunha, A. M., & Dias, L. A. V. (2018). Using big data, internet of things, and agile for crises management. Advances in Intelligent Systems and Computing, 558, 373–382. https://doi.org/10.1007/978-3-319-54978-1_50
CR  - Diogo, M., Cabral, B., & Bernardino, J. (2019). Consistency models of NoSQL databases. Future Internet, 11(2). MDPI AG. https://doi.org/10.3390/fi11020043
CR  - Doğuç, T. B., & Aydin, A. A. (2019). CAP-based Examination of Popular NoSQL Database Technologies in Streaming Data Processing. 2019 International Artificial Intelligence and Data Processing Symposium (IDAP).
CR  - Dutta, K., & Jayapal, M. (2016). Big Data Analytics for Real Time Systems. https://www.researchgate.net/publication/304078196
CR  - Erzi, H. M., & Aydin, A. A. (2020). IoT Based Mobile Smart Home Surveillance Application. 4th International Symposium on Multidisciplinary Studies and Innovative Technologies, ISMSIT 2020 - Proceedings. https://doi.org/10.1109/ISMSIT50672.2020.9255303
CR  - Gavrilenko, I., Sharma, M., Litmaath, M., & Tikhomirova, T. (2019). Dynamic Apache Spark Cluster for Economic Modeling.
CR  - Gibadullin, R. F., Baimukhametova, G. A., & Perukhin, M. Y. (2019). Service-Oriented Distributed Energy Data Management Using Big Data Technologies. In 2019 International Conference on Industrial Engineering, Applications and Manufacturing (ICIEAM).
CR  - Google Trends. (2022). https://trends.google.com/trends/
CR  - Guo, D., & Onstein, E. (2020). State-of-the-art geospatial information processing in NoSQL databases. ISPRS International Journal of Geo-Information, 9(5). MDPI AG. https://doi.org/10.3390/ijgi9050331
CR  - Gürcan, F., & Berigel, M. (2018). Real-Time Processing of Big Data Streams: Lifecycle, Tools, Tasks, and Challenges. In 2018 2nd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT).
CR  - Hamadou, H. ben, Bach Pedersen, T., & Thomsen, C. (2020). The Danish National Energy Data Lake: Requirements, Technical Architecture, and Tool Selection. Proceedings - 2020 IEEE International Conference on Big Data, Big Data 2020, 1523–1532. https://doi.org/10.1109/BigData50022.2020.9378368
CR  - Hu, H., Wen, Y., Chua, T.-S., & Li, X. (2014). Toward Scalable Systems for Big Data Analytics: A Technology Tutorial. IEEE Access, 2, 652–687. https://doi.org/10.1109/ACCESS.2014.2332453
CR  - Hegde, G. P., Tech, M., Hegde, N., & Seetha, M. (2021). Smart City Data Generation for IoT Applications Using Essential Hadoop Frameworks. Embracing Change & Transformation-Breakthrough Innovation and Creativity, 153–160.
CR  - Jiang, S., Qian, X., Mei, T., & Fu, Y. (2016). Personalized Travel Sequence Recommendation on Multi-Source Big Social Media. IEEE Transactions on Big Data, 2(1), 43–56. https://doi.org/10.1109/tbdata.2016.2541160
CR  - Kejariwal, A., Kulkarni, S., & Ramasamy, K. (2017). Real Time Analytics: Algorithms and Systems. http://arxiv.org/abs/1708.02621
CR  - Khan, M. F., Azam, M., Khan, M. A., Algarni, F., Ashfaq, M., Ahmad, I., & Ullah, I. (2021). A Review of Big Data Resource Management: Using Smart Grid Systems as a Case Study. Wireless Communications and Mobile Computing, 2021. https://doi.org/10.1155/2021/3740476
CR  - Krishnamoorthy, R., & Udhayakumar, K. (2021). Futuristic bigdata framework with optimization techniques for wind energy resource assessment and management in smart grid. Proceedings of the 7th International Conference on Electrical Energy Systems, ICEES 2021, 507–514. https://doi.org/10.1109/ICEES51510.2021.9383710
CR  - Lakshman, A., & Malik, P. (2010). Cassandra: A Decentralized Structured Storage System. ACM SIGOPS Operating Systems Review, 44(2), 35–40. https://doi.org/10.1145/1773912.1773922
CR  - Lennon, J. (2009). Beginning CouchDB. Apress.
CR  - Li, W. J., Yen, C., Lin, Y. S., Tung, S. C., & Huang, S. M. (2018). JustIoT Internet of Things based on the Firebase real-time database. Proceedings - 2018 IEEE International Conference on Smart Manufacturing, Industrial and Logistics Engineering, SMILE 2018, 2018-January, 43–47. https://doi.org/10.1109/SMILE.2018.8353979
CR  - Liu, X., Iftikhar, N., & Xie, X. (2014). Survey of real-time processing systems for big data. ACM International Conference Proceeding Series, 356–361. https://doi.org/10.1145/2628194.2628251
CR  - Lv, Z., Chirivella, J., & Gagliardo, P. (2016). Bigdata oriented multimedia mobile health applications. Journal of Medical Systems, 40(5). https://doi.org/10.1007/s10916-016-0475-8
CR  - Lv, Z., Li, X., Zhang, B., Wang, W., Zhu, Y., Hu, J., & Feng, S. (2016). Managing Big City Information Based on WebVRGIS. IEEE Access, 4, 407–415. https://doi.org/10.1109/ACCESS.2016.2517076
CR  - Lv, Z., Song, H., Basanta-Val, P., Steed, A., & Jo, M. (2017). Next-Generation Big Data Analytics: State of the Art, Challenges, and Future Research Topics. IEEE Transactions on Industrial Informatics, 13(4), 1891–1899. https://doi.org/10.1109/TII.2017.2650204
CR  - Miler, M., Medak, D., & Odobasic, D. (2011). Two-Tier Architecture for Web Mapping with NoSQL Database CouchDB. 62–71. https://www.researchgate.net/publication/236951067
CR  - MongoDB. (2022). https://www.mongodb.com/
CR  - Moroney, L. (2017a). The Definitive Guide to Firebase. Apress. https://doi.org/10.1007/978-1-4842-2943-9
CR  - Moroney, L. (2017b). The Definitive Guide to Firebase. https://doi.org/10.1007/978-1-4842-2943-9
CR  - Nambiar, S., Kalambur, S., & Sitaram, D. (2020). Modeling Access Control on Streaming Data in Apache Storm. Procedia Computer Science, 171, 2734–2739. https://doi.org/10.1016/j.procs.2020.04.297
CR  - Nasiri, H., Nasehi, S., & Goudarzi, M. (2019). Evaluation of distributed stream processing frameworks for IoT applications in Smart Cities. Journal of Big Data, 6(1). https://doi.org/10.1186/s40537-019-0215-2
CR  - Nasr, K. (2021). Comparison of Popular Data Processing Systems (KTH Thesis Report). Degree Project in Computer Science and Engineering, 76. https://www.diva-portal.org/smash/record.jsf?dswid=6172&pid=diva2%3A1547503
CR  - Oussous, A., Benjelloun, F. Z., Ait Lahcen, A., & Belfkih, S. (2018). Big Data technologies: A survey. Journal of King Saud University - Computer and Information Sciences, 30(4), 431–448. King Saud bin Abdulaziz University. https://doi.org/10.1016/j.jksuci.2017.06.001
CR  - Philip Chen, C. L., & Zhang, C. Y. (2014). Data-intensive applications, challenges, techniques and technologies: A survey on Big Data. Information Sciences, 275, 314–347. https://doi.org/10.1016/j.ins.2014.01.015
CR  - Redis. (2022). https://redis.io/
CR  - Riak. (2022). https://riak.com/
CR  - Ryan, J. (2019). Big Data Velocity in Plain English. https://www.voltdb.com/wp-content/uploads/2018/02/VoltDB_BigData_eBook_Feb2018-v2.pdf
CR  - Saloot, M. A., & Pham, D. N. (2021). Real-time Text Stream Processing: A Dynamic and Distributed NLP Pipeline. ACM International Conference Proceeding Series, 575–584. https://doi.org/10.1145/3459104.3459198
CR  - Saranya, K., Chellammal, S., & Chelliah, P. R. (2020). Ontology-Based Information Retrieval for Healthcare Systems.
CR  - Schram, A., & Anderson, K. M. (2012). MySQL to NoSQL. 191. https://doi.org/10.1145/2384716.2384773
CR  - Singh, V. K., Taram, M., Agrawal, V., & Baghel, B. S. (2018). A Literature Review on Hadoop Ecosystem and Various Techniques of Big Data Optimization. In Lecture Notes in Networks and Systems (Vol. 38, pp. 231–240). Springer. https://doi.org/10.1007/978-981-10-8360-0_22
CR  - Splunk. (2022). https://www.splunk.com/
CR  - Sudhakar Yadav, N., Eswara Reddy, B., & Srinivasa, K. G. (2018). Cloud-Based Healthcare Monitoring System Using Storm and Kafka. In Towards Extensible and Adaptable Methods in Computing (pp. 99–106). Springer Singapore. https://doi.org/10.1007/978-981-13-2348-5_8
CR  - Sun, Z., Han, L., Huang, W., Wang, X., Zeng, X., Wang, M., & Yan, H. (2015). Recommender systems based on social networks. Journal of Systems and Software, 99, 109–119.
CR  - Syed, D., Zainab, A., Ghrayeb, A., Refaat, S. S., Abu-Rub, H., & Bouhali, O. (2021). Smart Grid Big Data Analytics: Survey of Technologies, Techniques, and Applications. IEEE Access, 9, 59564–59585. https://doi.org/10.1109/ACCESS.2020.3041178
CR  - Tang, L., Li, J., Du, H., Li, L., Wu, J., & Wang, S. (2022). Big Data in Forecasting Research: A Literature Review. Big Data Research, 27, 100289. https://doi.org/10.1016/j.bdr.2021.100289
CR  - Verma, S., Kawamoto, Y., Fadlullah, Z. M., Nishiyama, H., & Kato, N. (2017). A Survey on Network Methodologies for Real-Time Analytics of Massive IoT Data and Open Research Issues. IEEE Communications Surveys and Tutorials, 19(3), 1457–1477. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/COMST.2017.2694469
CR  - Vohra, D. (2016). Practical Hadoop Ecosystem. Apress. https://doi.org/10.1007/978-1-4842-2199-0
CR  - Xie, L., Zhou, W., & Li, Y. (2016). Application of improved recommendation system based on spark platform in big data analysis. Cybernetics and Information Technologies, 16(Special Issue 6), 245–255. https://doi.org/10.1515/cait-2016-0092
CR  - Yang, J., Wang, H., Lv, Z., Wei, W., Song, H., Erol-Kantarci, M., Kantarci, B., & He, S. (2017). Multimedia recommendation and transmission system based on cloud platform. Future Generation Computer Systems, 70, 94–103. https://doi.org/10.1016/j.future.2016.06.015
CR  - Yaqoob, I., Hashem, I. A. T., Gani, A., Mokhtar, S., Ahmed, E., Anuar, N. B., & Vasilakos, A. V. (2016). Big data: From beginning to future. International Journal of Information Management, 36(6), 1231–1247. Elsevier Ltd. https://doi.org/10.1016/j.ijinfomgt.2016.07.009
CR  - Zheng, Z., Wang, P., Liu, J., & Sun, S. (2015). Real-time big data processing framework: Challenges and solutions. Applied Mathematics and Information Sciences, 9(6), 3169–3190. https://doi.org/10.12785/amis/090646
UR  - https://doi.org/10.53070/bbd.1204112
L1  - https://dergipark.org.tr/en/download/article-file/2770544
ER  - 