Research Article

Comparison of Different Algorithms for Clustering Categorical Data

Year 2017, Volume: 2 Issue: 2, 62 - 78, 29.12.2017

Abstract

Cluster analysis is a method used to find the natural groupings of objects. In
clustering, both within-cluster homogeneity and between-cluster heterogeneity
should be high. The literature offers few methods for clustering categorical
data, and there is no definitive information on which of the existing ones is
best. Each method has advantages and shortcomings relative to the others,
depending on the number of observations and the structure of the data. The
number of variables used is also of great importance for good clustering. This
study addresses the clustering of categorical data. Cluster analysis was
performed using the single linkage, complete linkage, and average linkage
techniques from the hierarchical clustering family and the K-modes algorithm
from the partitional clustering family, and the results were compared.
According to the results, as the number of observations grows, clustering
performance decreases for the hierarchical techniques and increases for the
K-modes algorithm.
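
As an illustration of the partitional method named in the abstract, the following is a minimal sketch of K-modes with the simple matching dissimilarity. It is not the author's implementation: the function names, the toy data set, and the choice of initial modes are assumptions made for this example only.

```python
from collections import Counter


def matching_distance(a, b):
    """Simple matching dissimilarity: count of attributes where a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent category in each column of a non-empty list of rows."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data, initial_modes, max_iter=100):
    """Assign rows to the nearest mode, recompute modes, repeat until stable."""
    modes = [tuple(m) for m in initial_modes]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(modes)), key=lambda j: matching_distance(row, modes[j]))
            for row in data
        ]
        if new_labels == labels:  # assignments no longer change
            break
        labels = new_labels
        for j in range(len(modes)):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:  # keep the previous mode if a cluster is empty
                modes[j] = column_modes(members)
    return labels, modes


# Hypothetical toy data: three categorical attributes per object.
data = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "oval"),
    ("blue", "small", "square"),
]

# Initial modes are picked by hand here; in practice they are often sampled.
labels, modes = k_modes(data, initial_modes=[data[0], data[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(modes)   # [('red', 'small', 'round'), ('blue', 'large', 'square')]
```

Each pass costs time proportional to the number of objects times the number of clusters, whereas hierarchical linkage methods work from a full pairwise dissimilarity matrix, which is one practical difference between the two families compared in the study.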

References

  • [1] Guo L, 2008. Clustering Categorical Response, Master Thesis, Office of Graduate Studies College of Arts and Sciences, Georgia State University, 1-2.
  • [2] Tripathy B K, Ghosh A, 2011. SSDR: An algorithm for Clustering Categorical Data Using Rough Set Theory, Advances in Applied Science Research, 2(3):314-326.
  • [3] Huang Z, 1998. Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values, Data Mining and Knowledge Discovery, 2(3):283-304.
  • [4] Gibson D, Kleinberg J, Raghavan P, 1998. Clustering categorical data: an approach based on dynamical systems, In Proceedings of the 24th VLDB Conference, New York, USA, 311-322.
  • [5] Ganti V, Gehrke J, Ramakrishnan R, 1999. CACTUS: Clustering categorical data using summaries, In Proceedings of ACM SIGKDD, International Conference on Knowledge Discovery & Data Mining, San Diego, CA, USA, 73-83.
  • [6] Guha S, Rastogi R, Shim K, 1999. ROCK: A robust clustering algorithm for categorical attributes, Proceedings of the IEEE International Conference on Data Engineering, Sydney, 345-366.
  • [7] He Z, Xu X, Deng S, 2002. Squeezer: An Efficient Algorithm for Clustering Categorical Data, Department of Computer Science and Engineering, Harbin Institute of Technology, 17(5):611-624.
  • [8] Rezankova H, 2009. Cluster Analysis and Categorical Data, Vysoka Skola Economicka v Praze, Praha, 223-234.
  • [9] Abdu E, 2009. Clustering Categorical Data Using Summaries and Spectral Techniques, PHD Thesis, Graduate Department of Computer Science, The University of New York, 1-3.
  • [10] Michel C, 2000. Cardinal Nominal or Similarity Measures in Comparative Evaluation of Information Retrieval Process, The 2nd International Conference on Language Resources & Evaluation (LREC), 367.
  • [11] Barbara D, Couto J, Yi L, 2002. COOLCAT: An Entropy-based Algorithm for Categorical Clustering, In Proceedings of the 11th International Conference on Information and Knowledge Management (CIKM), McLean, VA, USA, 582-589.
  • [12] Nemalhabib A, 2006. A Cohesion-based Clustering Technique for Categorical Data, Master Thesis, Graduate Department of Computer Science and Software Engineering, Concordia University, 1-37.
  • [13] Khan SS, 2007. Computation of Initial Modes for K-modes Clustering Algorithm Using Evidence Accumulation, IJCAI’07 Proceedings of the 20th International Joint Conference on Artificial Intelligence, 2784-2789.
  • [14] UCI Machine Learning Repository. http://www.ics.uci.edu/~mlear/MLRepository.html, 2013.
There are 14 citations in total.

Details

Subjects Engineering
Journal Section Research Articles
Authors

Ferhan Baş Kaman

Publication Date December 29, 2017
Submission Date March 15, 2017
Published in Issue Year 2017 Volume: 2 Issue: 2

Cite

APA Baş Kaman, F. (2017). Kategorik Verilerde Kümeleme İçin Farklı Algoritmaların Karşılaştırılması. Sinop Üniversitesi Fen Bilimleri Dergisi, 2(2), 62-78.


Articles published in Sinopjns are licensed under CC BY-NC 4.0.