Double K Initializing Algorithm For K-Means Clustering Method
Year 2021, Issue: 23, 280-287, 30.04.2021
Aziz Mahmut Yücelen, Abdullah Baykal
Abstract
Clustering methods, one of the most striking subjects of data mining, form the most intensively researched area of this field, and many techniques and related methods have been developed for it. Some studies in this field have been obtained by updating previously available algorithms and evaluating their performance. The most widely studied clustering technique is the K-Means method. Each run of the K-Means algorithm returns different cluster outputs because the initial centers are selected at random. As a result, the reliability of the results is adversely affected and the number of iterations required for clustering accuracy increases. One of the methods that attempts to eliminate this problem is the K-Means++ method. In this study, the proposed method, which we call double k, was applied to a synthetic dataset. It has been observed that the double k method is more successful than the K-Means and K-Means++ methods at finding the final cluster labels.
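To illustrate the initialization sensitivity described in the abstract, the sketch below compares plain K-Means (random initial centers) with K-Means++ seeding on a synthetic dataset, scoring each run with the adjusted Rand index (one of the evaluation measures covered in the references). This is an illustration only: the proposed double k method is not specified in the abstract and is therefore not implemented here, and the dataset parameters and scikit-learn calls are assumptions rather than the authors' experimental setup.

```python
# A minimal sketch (not the authors' "double k" algorithm, which the abstract
# does not specify) of the initialization sensitivity that motivates the paper:
# plain K-Means with random seeding versus K-Means++ seeding, compared on a
# synthetic dataset via the adjusted Rand index.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Synthetic dataset with 5 clusters (illustrative parameters, not the paper's data).
X, y_true = make_blobs(n_samples=1000, centers=5, cluster_std=1.2, random_state=0)

for init in ("random", "k-means++"):
    scores = []
    for seed in range(20):
        # n_init=1 exposes the effect of a single initialization.
        km = KMeans(n_clusters=5, init=init, n_init=1, random_state=seed)
        labels = km.fit_predict(X)
        scores.append(adjusted_rand_score(y_true, labels))
    print(f"{init:10s}  mean ARI = {np.mean(scores):.3f}  min ARI = {np.min(scores):.3f}")
```

Across repeated runs, random seeding typically shows a lower worst-case adjusted Rand index than K-Means++ seeding, which is the instability the proposed method targets.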
References
- Rahim, M. S., & Ahmed, T. (2017). An initial centroid selection method based on radial and angular coordinates for K-means algorithm. In 2017 20th International Conference of Computer and Information Technology (ICCIT) (pp. 1-6). IEEE.
- Astrahan, M. M. (1970). Speech analysis by clustering, or the hyperphoneme method (Report No. AIM-124). Stanford University, Department of Computer Science.
- Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: a review. ACM computing surveys (CSUR), 31(3), 264-323.
- Pena, J. M., Lozano, J. A., & Larranaga, P. (1999). An empirical comparison of four initialization methods for the k-means algorithm. Pattern recognition letters, 20(10), 1027-1040.
- Redmond, S. J., & Heneghan, C. (2007). A method for initialising the K-means clustering algorithm using kd-trees. Pattern recognition letters, 28(8), 965-973.
- Xu, R., & Wunsch, D. (2008). Clustering (Vol. 10). John Wiley & Sons.
- Singhal, M., & Shukla, S. (2018, February). Centroid Selection in Kernel Extreme Learning Machine Using K-Means. In 2018 5th International Conference on Signal Processing and Integrated Networks (SPIN) (pp. 708-711). IEEE.
- Zahra, S., Ghazanfar, M. A., Khalid, A., Azam, M. A., Naeem, U., & Prugel-Bennett, A. (2015). Novel centroid selection approaches for KMeans-clustering based recommender systems. Information sciences, 320, 156-189.
- Ding, C., & He, X. (2004, July). K-means clustering via principal component analysis. In Proceedings of the twenty-first international conference on Machine learning (p. 29).
- Selim, S. Z., & Ismail, M. A. (1984). K-means-type algorithms: A generalized convergence theorem and characterization of local optimality. IEEE Transactions on pattern analysis and machine intelligence, (1), 81-87.
- MacQueen, J. (1967, June). Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability (Vol. 1, No. 14, pp. 281-297).
- Hand, D. J., & Krzanowski, W. J. (2005). Optimising k-means clustering results with standard software packages. Computational Statistics & Data Analysis, 49(4), 969-973.
- Arthur, D., & Vassilvitskii, S. (2006). k-means++: The advantages of careful seeding. Stanford.
- Lloyd, S. (1982). Least squares quantization in PCM. IEEE transactions on information theory, 28(2), 129-137.
- Katsavounidis, I., Kuo, C. C. J., & Zhang, Z. (1994). A new initialization technique for generalized Lloyd iteration. IEEE Signal processing letters, 1(10), 144-146.
- Bradley, P. S., & Fayyad, U. M. (1998, July). Refining initial points for k-means clustering. In ICML (Vol. 98, pp. 91-99).
- Likas, A., Vlassis, N., & Verbeek, J. J. (2003). The global k-means clustering algorithm. Pattern recognition, 36(2), 451-461.
- Gan, G., Ma, C., & Wu, J. (2007). Data Clustering: Theory, Algorithms, and Applications (Vol. 20, Series on Statistics and Applied Probability). Philadelphia, PA.
- Xie, J., Jiang, S., Xie, W., & Gao, X. (2011). An Efficient Global K-means Clustering Algorithm. JCP, 6(2), 271-279.
- Beigi, H. (2011). Speaker recognition. In Fundamentals of Speaker Recognition (pp. 543-559). Springer, Boston, MA.
- Agrawal, A., & Gupta, H. (2013). Global K-means (GKM) clustering algorithm: a survey. International journal of computer applications, 79(2).
- Rani, A. J. M., & Parthipan, L. (2012). Clustering Analysis by Improved Particle Swarm Optimization and KMeans Algorithm.
- Li, H., Yang, X., & Wei, W. (2014). The application of pattern recognition in electrofacies analysis. Journal of Applied Mathematics, 2014.
- Hennig, C. (2015). Clustering strategy and method selection. arXiv preprint arXiv:1503.02059.
- Kärkkäinen, I., & Fränti, P. (2002). Dynamic local search algorithm for the clustering problem. Joensuu, Finland: University of Joensuu.
- Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical association, 66(336), 846-850.
- Campello, R. J. (2007). A fuzzy extension of the Rand index and other related indexes for clustering and classification assessment. Pattern Recognition Letters, 28(7), 833-841.
- Santos, J. M., & Embrechts, M. (2009, September). On the use of the adjusted rand index as a metric for evaluating supervised classification. In International conference on artificial neural networks (pp. 175-184). Springer, Berlin, Heidelberg.
- Yeung, K. Y., & Ruzzo, W. L. (2001). Details of the adjusted rand index and clustering algorithms, supplement to the paper an empirical study on principal component analysis for clustering gene expression data. Bioinformatics, 17(9), 763-774.
- Priness, I., Maimon, O., & Ben-Gal, I. (2007). Evaluation of gene-expression clustering via mutual information distance measure. BMC bioinformatics, 8(1), 111.
- Kraskov, A., & Grassberger, P. (2009). MIC: Mutual information based hierarchical clustering. In Information theory and statistical learning (pp. 101-123). Springer, Boston, MA.
- Newman, M. E., Cantwell, G. T., & Young, J. G. (2020). Improved mutual information measure for clustering, classification, and community detection. Physical Review E, 101(4), 042304.