Research Article

Comparison of the methods to determine optimal number of cluster

Volume: 6 Number: 1 June 30, 2023

Abstract

Clustering is an unsupervised learning technique that divides observations into groups based on their similarity. The most widely used clustering algorithm is k-means. However, this algorithm requires the number of clusters to be specified in advance. In this study, four of the most widely used methods for determining the number of clusters were applied: Average Silhouette, Caliński-Harabasz, Davies-Bouldin, and the Dunn Index. The performance of these methods was compared using the Rand Index and Meila's Variation of Information (MVI) criteria on nine real data sets for which the number of clusters was known in advance. According to these criteria, Average Silhouette gave more successful results than the other methods.
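To make the evaluation idea concrete, the following is a minimal sketch, not the authors' implementation. It scores candidate partitions with the average silhouette and compares a partition against known labels using the Rand Index and the Variation of Information. The function names, the six toy points, and the hand-written candidate labelings are all assumptions made for illustration; they do not come from the nine data sets in the study, and no k-means run is involved.

```python
import math
from collections import Counter
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of observation pairs on which the two partitions agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def variation_of_information(labels_a, labels_b):
    """VI = H(A|B) + H(B|A) in nats; 0 means the partitions are identical."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    vi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        vi -= p_ab * (
            math.log(p_ab / (count_a[a] / n)) + math.log(p_ab / (count_b[b] / n))
        )
    return vi


def average_silhouette(points, labels):
    """Mean silhouette width with Euclidean distance (needs >= 2 clusters)."""
    n = len(points)
    sizes = Counter(labels)
    total = 0.0
    for i in range(n):
        if sizes[labels[i]] == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        dist_sum = Counter()  # summed distance from point i to each cluster
        for j in range(n):
            if j != i:
                dist_sum[labels[j]] += math.dist(points[i], points[j])
        a = dist_sum[labels[i]] / (sizes[labels[i]] - 1)  # own-cluster mean
        b = min(dist_sum[c] / sizes[c] for c in sizes if c != labels[i])
        total += (b - a) / max(a, b)
    return total / n


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
truth = [0, 0, 0, 1, 1, 1]

# Hypothetical candidate partitions, keyed by their number of clusters.
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 2, 2, 2],
}

scores = {k: average_silhouette(points, lab) for k, lab in candidates.items()}
best_k = max(scores, key=scores.get)
print("silhouette by k:", scores)
print("selected k:", best_k)
print("Rand Index vs truth:", rand_index(truth, candidates[best_k]))
print("VI vs truth:", variation_of_information(truth, candidates[best_k]))
```

A higher Rand Index and a lower Variation of Information both indicate closer agreement with the known labels, which is how a selection method's choice of k can be judged when the true grouping is available.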

Details

Primary Language

English

Journal Section

Research Article

Publication Date

June 30, 2023

Submission Date

March 5, 2023

Acceptance Date

May 15, 2023

Published in Issue

Year 2023 Volume: 6 Number: 1

APA
Öztürk, F. E., & Demirel, N. (2023). Comparison of the methods to determine optimal number of cluster. Veri Bilimi, 6(1), 34-45. https://izlik.org/JA52PR38GA
Chicago
Öztürk, Fatih Emre, and Neslihan Demirel. 2023. “Comparison of the Methods to Determine Optimal Number of Cluster”. Veri Bilimi 6 (1): 34-45. https://izlik.org/JA52PR38GA.