Research Article

A New Clustering Algorithm of Hybrid Data According to Weights of Attributes

Year 2016, Volume: 5 Issue: 9, 28 - 37, 26.12.2016

Abstract

Separating large data sets into clusters of similar objects is one of the basic problems of data mining. The need to store large data in an organized way has increased the importance of clustering methods. Although hierarchical clustering methods give effective results, they are inadequate for large data because of their computational complexity. Non-hierarchical clustering methods cannot be used for all data types, because their cost functions cannot be evaluated on categorical data. Recently, some non-hierarchical clustering methods have been extended to categorical and hybrid data. In addition, the weights of attributes in clustering may differ depending on the nature of the data or the expected results. In this paper, we introduce an algorithm that clusters large hybrid data effectively and takes the weights of attributes into account. This algorithm, which is mainly based on the K-Prototypes algorithm, is called “W-K-Prototypes”. The computational results show that the algorithm can be used efficiently for clustering.
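The abstract does not state the W-K-Prototypes cost function, so the following is only a minimal sketch of how attribute weights could enter a K-Prototypes style dissimilarity. It assumes each weight simply multiplies its attribute's contribution: squared difference for numeric attributes, simple mismatch for categorical ones, with the categorical part scaled by gamma as in K-Prototypes. The function names, parameter names, and this weighting scheme are illustrative assumptions, not the authors' formulation.

```python
from typing import Sequence


def weighted_mixed_distance(
    x_num: Sequence[float],
    x_cat: Sequence[str],
    proto_num: Sequence[float],
    proto_cat: Sequence[str],
    w_num: Sequence[float],
    w_cat: Sequence[float],
    gamma: float = 1.0,
) -> float:
    """Weighted dissimilarity between one object and one prototype.

    Assumption (not taken from the paper): each attribute weight
    multiplies that attribute's term in the K-Prototypes measure.
    """
    # Numeric part: weighted squared Euclidean distance.
    numeric = sum(w * (a - b) ** 2 for w, a, b in zip(w_num, x_num, proto_num))
    # Categorical part: weighted simple matching (a mismatch costs its weight).
    categorical = sum(w for w, a, b in zip(w_cat, x_cat, proto_cat) if a != b)
    return numeric + gamma * categorical


def nearest_prototype(x_num, x_cat, prototypes, w_num, w_cat, gamma=1.0) -> int:
    """Index of the closest prototype; each prototype is a (numeric, categorical) pair."""
    distances = [
        weighted_mixed_distance(x_num, x_cat, p_num, p_cat, w_num, w_cat, gamma)
        for p_num, p_cat in prototypes
    ]
    return distances.index(min(distances))
```

With all weights set to 1, this reduces to the unweighted K-Prototypes dissimilarity, which is why it is a convenient baseline for experimenting with different weightings.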

References

  • Huang, Z. (1997a) Clustering Large Data Sets with Mixed Numeric and Categorical Values, In Proceedings of The First Pacific-Asia Conference on Knowledge Discovery and Data Mining, Singapore, World Scientific.
  • Huang, Z. (1997b) A fast clustering algorithm to cluster very large categorical data sets in data mining. Proceedings of the SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, Dept. of Computer Science, The University of British Columbia, Canada, pp. 1–8.
  • Huang, Z. (1998) “Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values”, Data Mining and Knowledge Discovery, Vol. 2, No. 3, pp. 283 – 304.
  • Anderberg, M. R. (1973) Cluster Analysis for Applications, Academic Press.
  • Boriah, S., Chandola, V. and Kumar, V. (2008) “Similarity Measures for Categorical Data: A Comparative Evaluation”, Proceedings of the 8th SIAM International Conference on Data Mining, pp. 243 – 254.
  • de Amorim, R. C. (2015) A survey on feature weighting based K-Means algorithms.
  • Kantardzic, M. (2003) Data Mining: Concepts, Models and Algorithms, IEEE Press and John Wiley, New York.
  • Everitt, B. (1974) Cluster Analysis, Heinemann Educational Books Ltd.

There are 8 references in total.

Details

Primary Language English
Subjects Engineering
Section Articles
Authors

Osman Çörekci

Ayla Saylı

Publication Date December 26, 2016
Published in Issue Year 2016 Volume: 5 Issue: 9

Cite

APA Çörekci, O., & Saylı, A. (2016). A New Clustering Algorithm of Hybrid Data According to Weights of Attributes. Avrupa Bilim Ve Teknoloji Dergisi, 5(9), 28-37.