Research Article

A comparison of tree data structures in the streaming data clustering issue

Year 2024, Volume 39, Issue 1, pp. 217-232, 21.08.2023
https://doi.org/10.17341/gazimmfd.1144533

Abstract

Advances in technology have made it possible to collect and analyze data produced by many different sources. Data collected from sensors, mobile devices, and the Internet of Things arrive as streaming data, and extracting useful information from such data is a hard problem. Clustering, one of the most frequently used methods for analyzing streaming data, divides the data into groups according to their distribution. In this study, two new streaming data clustering algorithms were developed and compared with another clustering algorithm from the literature. The developed algorithms gave successful results on different datasets.


Akan veri kümeleme probleminde ağaç veri yapılarının performans karşılaştırması (Performance comparison of tree data structures in the streaming data clustering problem)


Abstract

Advances in technology have enabled people to collect and analyze data produced by many different sources. Data produced by structures such as sensors, mobile devices, and the Internet of Things are in streaming format, and processing such data to obtain useful information is a difficult problem. In clustering, one of the methods frequently used to analyze streaming data, the data are analyzed by dividing them into various groups according to their distributions. In this study, two new algorithms were developed for the streaming data clustering problem and compared with another method from the literature. Experiments on different datasets showed that the developed algorithms give good results.

There are 37 citations in total.

Details

Primary Language: Turkish
Subjects: Engineering
Journal Section: Articles
Authors:

Ali Şenol (ORCID 0000-0003-0364-2837)

Mahmut Kaya (ORCID 0000-0002-7846-1769)

Yavuz Canbay (ORCID 0000-0003-2316-7893)

Early Pub Date: June 15, 2023
Publication Date: August 21, 2023
Submission Date: July 17, 2022
Acceptance Date: January 25, 2023
Published in Issue: Year 2024, Volume 39, Issue 1

Cite

APA Şenol, A., Kaya, M., & Canbay, Y. (2023). Akan veri kümeleme probleminde ağaç veri yapılarının performans karşılaştırması. Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi, 39(1), 217-232. https://doi.org/10.17341/gazimmfd.1144533
AMA Şenol A, Kaya M, Canbay Y. Akan veri kümeleme probleminde ağaç veri yapılarının performans karşılaştırması. GUMMFD. August 2023;39(1):217-232. doi:10.17341/gazimmfd.1144533
Chicago Şenol, Ali, Mahmut Kaya, and Yavuz Canbay. “Akan Veri kümeleme Probleminde ağaç Veri yapılarının Performans karşılaştırması”. Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi 39, no. 1 (August 2023): 217-32. https://doi.org/10.17341/gazimmfd.1144533.
EndNote Şenol A, Kaya M, Canbay Y (August 1, 2023) Akan veri kümeleme probleminde ağaç veri yapılarının performans karşılaştırması. Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi 39 1 217–232.
IEEE A. Şenol, M. Kaya, and Y. Canbay, “Akan veri kümeleme probleminde ağaç veri yapılarının performans karşılaştırması”, GUMMFD, vol. 39, no. 1, pp. 217–232, 2023, doi: 10.17341/gazimmfd.1144533.
ISNAD Şenol, Ali et al. “Akan Veri kümeleme Probleminde ağaç Veri yapılarının Performans karşılaştırması”. Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi 39/1 (August 2023), 217-232. https://doi.org/10.17341/gazimmfd.1144533.
JAMA Şenol A, Kaya M, Canbay Y. Akan veri kümeleme probleminde ağaç veri yapılarının performans karşılaştırması. GUMMFD. 2023;39:217–232.
MLA Şenol, Ali et al. “Akan Veri kümeleme Probleminde ağaç Veri yapılarının Performans karşılaştırması”. Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi, vol. 39, no. 1, 2023, pp. 217-32, doi:10.17341/gazimmfd.1144533.
Vancouver Şenol A, Kaya M, Canbay Y. Akan veri kümeleme probleminde ağaç veri yapılarının performans karşılaştırması. GUMMFD. 2023;39(1):217-32.