Year 2018, Issue 13, Pages 17-30, 2018-08-31

A Survey on Data Stream Clustering Techniques
(Original Turkish title: Akan Veri Kümeleme Teknikleri Üzerine Bir Derleme)

Ali ŞENOL [1] , Hacer KARACAN [2]


As technology develops, the amount of data transferred to computer environments has grown to enormous proportions and keeps increasing every day. As a result, data processing methods are also changing. In classical data clustering approaches the data is static. In today's technology, however, data arrives as a very fast stream. Applications are therefore needed that can cluster the data while it is streaming and present results whenever the user requests them. The demand for data stream clustering approaches grows accordingly, because these approaches read each data item only once, run fast, and can adapt themselves to newly arriving data. In other words, results are presented to the user while the data continues to stream. This study reviews the work proposed in the data stream clustering area in order to guide researchers who are interested in this field.
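The properties named above can be illustrated with a small sketch: a single pass over the data, fast incremental updates, adaptation to new data, and results available whenever the user asks. It assumes a simple leader-style scheme with exponentially fading cluster weights. It is not one of the specific algorithms surveyed in this study. The class and parameter names (`StreamClusterer`, `radius`, `decay`, `min_weight`) are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Cluster:
    center: List[float]  # running (weighted) mean of the absorbed points
    weight: float        # faded count of the points absorbed so far


class StreamClusterer:
    """Hypothetical one-pass clusterer: leader-style assignment with fading weights."""

    def __init__(self, radius: float, decay: float = 0.99, min_weight: float = 0.1):
        self.radius = radius          # max distance at which a point joins an existing cluster
        self.decay = decay            # fading factor applied per arriving point (1.0 = never forget)
        self.min_weight = min_weight  # clusters whose weight fades below this are discarded
        self.clusters: List[Cluster] = []

    def update(self, x: Sequence[float]) -> None:
        """Absorb one point; it is read exactly once and not stored afterwards."""
        # Fade all clusters so that recent points outweigh older ones.
        for c in self.clusters:
            c.weight *= self.decay
        self.clusters = [c for c in self.clusters if c.weight >= self.min_weight]

        # Locate the nearest existing cluster center.
        nearest, nearest_dist = None, math.inf
        for c in self.clusters:
            d = math.dist(c.center, x)
            if d < nearest_dist:
                nearest, nearest_dist = c, d

        if nearest is not None and nearest_dist <= self.radius:
            # Incremental weighted mean: shift the center toward x by 1/weight.
            nearest.weight += 1.0
            step = 1.0 / nearest.weight
            nearest.center = [ci + step * (xi - ci) for ci, xi in zip(nearest.center, x)]
        else:
            # No cluster is close enough: x becomes the "leader" of a new cluster.
            self.clusters.append(Cluster(center=list(x), weight=1.0))

    def snapshot(self) -> List[List[float]]:
        """Current cluster centers; can be requested at any time while the stream flows."""
        return [list(c.center) for c in self.clusters]


if __name__ == "__main__":
    model = StreamClusterer(radius=1.0, decay=0.99)
    # Two interleaved groups of points around (0, 0) and (10, 10).
    for i in range(200):
        model.update((0.1 * (i % 3), 0.0))
        model.update((10.0, 10.0 + 0.1 * (i % 3)))
    print(sorted(model.snapshot()))  # two centers, one close to each group
```

In this sketch, `update` is called once per arriving point and `snapshot` whenever a result is needed, so no point is ever revisited. The sketch fades weights per arriving point rather than per unit of time. This simplification keeps it short: a cluster that stops receiving points gradually loses weight and is dropped, which lets the model follow changes in the stream.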


Primary Language: Turkish
Subjects: Engineering
Journal Section: Articles

Orcid: 0000-0003-0364-2837
Author: Ali ŞENOL (Primary Author)
Country: Turkey

Orcid: 0000-0001-6788-008X
Author: Hacer KARACAN
Country: Turkey

BibTeX @review { ejosat446019, journal = {Avrupa Bilim ve Teknoloji Dergisi}, eissn = {2148-2683}, address = {Osman SAĞDIÇ}, year = {2018}, number = {13}, pages = {17--30}, doi = {10.31590/ejosat.446019}, title = {Akan Veri Kümeleme Teknikleri Üzerine Bir Derleme}, key = {cite}, author = {ŞENOL, Ali and KARACAN, Hacer} }
APA ŞENOL, A., KARACAN, H. (2018). Akan Veri Kümeleme Teknikleri Üzerine Bir Derleme. Avrupa Bilim ve Teknoloji Dergisi, (13), 17-30. DOI: 10.31590/ejosat.446019
MLA ŞENOL, A., KARACAN, H. "Akan Veri Kümeleme Teknikleri Üzerine Bir Derleme". Avrupa Bilim ve Teknoloji Dergisi (2018): 17-30
Chicago ŞENOL, A., KARACAN, H. "Akan Veri Kümeleme Teknikleri Üzerine Bir Derleme". Avrupa Bilim ve Teknoloji Dergisi (2018): 17-30
RIS TY - JOUR T1 - Akan Veri Kümeleme Teknikleri Üzerine Bir Derleme AU - Ali ŞENOL AU - Hacer KARACAN Y1 - 2018 PY - 2018 N1 - doi: 10.31590/ejosat.446019 DO - 10.31590/ejosat.446019 T2 - Avrupa Bilim ve Teknoloji Dergisi JF - Avrupa Bilim ve Teknoloji Dergisi SP - 17 EP - 30 IS - 13 SN - 2148-2683 M3 - doi: 10.31590/ejosat.446019 Y2 - 2018 ER -
EndNote %0 Journal Article %T Akan Veri Kümeleme Teknikleri Üzerine Bir Derleme %A Ali ŞENOL %A Hacer KARACAN %D 2018 %J Avrupa Bilim ve Teknoloji Dergisi (European Journal of Science and Technology) %P 17-30 %N 13 %@ 2148-2683 %R doi: 10.31590/ejosat.446019 %U 10.31590/ejosat.446019
ISNAD ŞENOL, Ali, KARACAN, Hacer. "Akan Veri Kümeleme Teknikleri Üzerine Bir Derleme". Avrupa Bilim ve Teknoloji Dergisi / 13 (August 2018): 17-30.