Comparison of the methods to determine optimal number of cluster
Abstract
Clustering is an unsupervised learning technique that divides observations into groups based on their similarity. The most widely used clustering algorithm is k-means, but it requires the number of clusters to be specified in advance. This study applies four of the most widely used methods for determining the number of clusters: Average Silhouette, Caliński-Harabasz, Davies-Bouldin and the Dunn Index. Their performance was compared using the Rand Index and Meila's Variation of Information (MVI) criteria on nine real data sets for which the number of clusters was known in advance. According to these criteria, Average Silhouette gave more successful results than the other methods.
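As an illustration of the two comparison criteria named in the abstract, the following is a minimal sketch of the Rand Index and Meila's Variation of Information for two labelings of the same observations. The function names, the use of the natural logarithm, and the toy label vectors are assumptions made for this example; they are not taken from the study itself.

```python
from collections import Counter
from itertools import combinations
from math import log


def rand_index(labels_a, labels_b):
    """Fraction of observation pairs on which the two partitions agree.

    A pair "agrees" when both partitions put the two observations in the
    same cluster, or both put them in different clusters.
    Requires at least two observations.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def variation_of_information(labels_a, labels_b):
    """Variation of Information, VI = H(A|B) + H(B|A), in nats.

    VI is 0 for identical partitions and grows as they diverge.
    The natural log is an assumption; other bases only rescale the value.
    """
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    vi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        # Each non-empty joint cell contributes
        # -p_ab * [log(p_ab / p_a) + log(p_ab / p_b)].
        vi -= p_ab * (log(n_ab / count_a[a]) + log(n_ab / count_b[b]))
    return vi


# Hypothetical toy labelings: known classes versus two clustering results.
truth = [0, 0, 0, 1, 1, 1]
relabeled = [1, 1, 1, 0, 0, 0]   # same partition, different label names
one_off = [0, 0, 1, 1, 1, 1]     # one observation assigned differently

print(rand_index(truth, relabeled), variation_of_information(truth, relabeled))
print(rand_index(truth, one_off), variation_of_information(truth, one_off))
```

Both criteria depend only on the partition, not on the label names, so the relabeled result scores as a perfect match. A higher Rand Index and a lower Variation of Information both indicate closer agreement with the known classes.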
Details
Primary Language: English
Journal Section: Research Article
Publication Date: June 30, 2023
Submission Date: March 5, 2023
Acceptance Date: May 15, 2023
Published in Issue: Year 2023, Volume 6, Number 1
APA
Öztürk, F. E., & Demirel, N. (2023). Comparison of the methods to determine optimal number of cluster. Veri Bilimi, 6(1), 34-45. https://izlik.org/JA52PR38GA
Chicago
Öztürk, Fatih Emre, and Neslihan Demirel. 2023. “Comparison of the Methods to Determine Optimal Number of Cluster”. Veri Bilimi 6 (1): 34-45. https://izlik.org/JA52PR38GA.