Research Article

A New Clustering Algorithm of Hybrid Data According to Weights of Attributes

Year 2016, Volume: 5 Issue: 9, 28 - 37, 26.12.2016

Abstract

Partitioning large data sets into clusters of similar objects is one of the basic problems of data mining. The need to store large data in an organized way has increased the importance of clustering methods. Although hierarchical clustering methods give effective results, their computational complexity makes them inadequate for large data. Non-hierarchical clustering methods cannot be applied to all data types, because their cost functions cannot be evaluated on categorical data. Recently, some non-hierarchical clustering methods have been extended to categorical and hybrid data. In addition, attributes may carry different weights in clustering, depending on the nature of the data or the expected results. In this paper, we introduce an algorithm for clustering large hybrid data effectively that also takes the weights of attributes into account. This algorithm, based mainly on the K-Prototypes algorithm, is called “W-K-Prototypes”. The computational results show that the algorithm can be used efficiently for clustering.
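
The abstract does not state the dissimilarity measure used by W-K-Prototypes. As a rough illustration only, the sketch below assumes the usual K-Prototypes form (squared difference on numeric attributes plus a mismatch count on categorical attributes, scaled by a factor gamma) and multiplies each attribute's contribution by its weight. The function names, the parameter gamma, and the weighting scheme are assumptions made for this example, not the authors' published formulation.

```python
from typing import List, Sequence


def weighted_mixed_distance(
    x_num: Sequence[float],
    x_cat: Sequence[str],
    p_num: Sequence[float],
    p_cat: Sequence[str],
    w_num: Sequence[float],
    w_cat: Sequence[float],
    gamma: float = 1.0,
) -> float:
    """Hypothetical weighted dissimilarity between an object and a prototype.

    Numeric part: weighted sum of squared differences.
    Categorical part: sum of the weights of attributes whose values differ.
    gamma balances the two parts (an assumption borrowed from K-Prototypes).
    """
    numeric = sum(w * (a - b) ** 2 for w, a, b in zip(w_num, x_num, p_num))
    categorical = sum(w for w, a, b in zip(w_cat, x_cat, p_cat) if a != b)
    return numeric + gamma * categorical


def nearest_prototype(
    x_num: Sequence[float],
    x_cat: Sequence[str],
    prototypes: List[tuple],
    w_num: Sequence[float],
    w_cat: Sequence[float],
    gamma: float = 1.0,
) -> int:
    """Return the index of the prototype closest to the object.

    Each prototype is a (numeric_values, categorical_values) pair.
    """
    distances = [
        weighted_mixed_distance(x_num, x_cat, p_num, p_cat, w_num, w_cat, gamma)
        for p_num, p_cat in prototypes
    ]
    return distances.index(min(distances))
```

Under these assumptions, raising the weight of an attribute makes a disagreement on that attribute cost more, so objects are pulled toward prototypes that match them on the attributes considered important.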

References

  • Huang, Z. (1997a) “Clustering Large Data Sets with Mixed Numeric and Categorical Values”, Proceedings of the First Pacific-Asia Conference on Knowledge Discovery and Data Mining, Singapore, World Scientific.
  • Huang, Z. (1997b) “A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining”, Proceedings of the SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, Dept. of Computer Science, The University of British Columbia, Canada, pp. 1–8.
  • Huang, Z. (1998) “Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values”, Data Mining and Knowledge Discovery, Vol. 2, No. 3, pp. 283–304.
  • Anderberg, M. R. (1973) Cluster Analysis for Applications, Academic Press.
  • Boriah, S., Chandola, V. and Kumar, V. (2008) “Similarity Measures for Categorical Data: A Comparative Evaluation”, Proceedings of the 8th SIAM International Conference on Data Mining, pp. 243–254.
  • de Amorim, R. C. (2015) “A Survey on Feature Weighting Based K-Means Algorithms”.
  • Kantardzic, M. (2003) Data Mining: Concepts, Models and Algorithms, IEEE Press and John Wiley, New York.
  • Everitt, B. (1974) Cluster Analysis, Heinemann Educational Books Ltd.

There are 8 citations in total.

Details

Primary Language English
Subjects Engineering
Journal Section Articles
Authors

Osman Çörekci

Ayla Saylı

Publication Date December 26, 2016
Published in Issue Year 2016 Volume: 5 Issue: 9

Cite

APA Çörekci, O., & Saylı, A. (2016). A New Clustering Algorithm of Hybrid Data According to Weights of Attributes. Avrupa Bilim Ve Teknoloji Dergisi, 5(9), 28-37.