Jensen Shannon Mesafesi Temelli Uyarlanmış Bulanık C Ortalamalar Kümeleme Yöntemi

Naciye Aydın; Gökhan Kayhan

doi:10.31590/ejosat.1021473

Conference Paper

Modified FCM Clustering Method based on Jensen Shannon Distance

Year 2021, Issue: 29, 58-64, 01.12.2021


Abstract

Clustering methods, an important branch of unsupervised learning, are one of the popular research areas of computer science. In many clustering methods, the inability to predict the number of clusters is an important problem. In this study, a new Jensen Shannon Fuzzy C Means (JSFCM) algorithm is proposed by adapting the Jensen Shannon (JS) distance to the Fuzzy C Means (FCM) algorithm in order to estimate the number of clusters. The goal of the study is to improve performance in determining the correct number of clusters with a new algorithm based on FCM. For this purpose, the proposed JSFCM algorithm is compared with the FCM method used with Modified Partition Entropy (MPE-FCM) and with the pure FCM algorithm. The FCM algorithm was run on 6 different data sets with the real number of clusters defined in the database. The same data sets were also run with the JSFCM and MPE-FCM methods to predict their numbers of clusters. The results obtained were used to compare the JSFCM, MPE-FCM, and pure FCM methods. From this comparison, it is concluded that the JSFCM algorithm is more successful in estimating the number of clusters and in minimizing the objective function. It is also concluded that, in addition to its superiority in estimating the number of clusters, the JSFCM algorithm is more stable in that estimate than the MPE-FCM method. To demonstrate this stability, the results of 10 different runs of both the JSFCM and MPE-FCM algorithms on the Aggregation data set are presented. According to these results, the MPE-FCM method achieved 20% accuracy with 2 correct predictions in 10 runs, while the JSFCM method achieved 80% accuracy with 8 correct predictions in 10 runs.
In addition, the cluster number predictions obtained over 10 different runs for all data sets were compared for both methods, showing that the JSFCM algorithm remains stable as the number of clusters and the number of features increase. Finally, suggestions are made to guide future research on eliminating the disadvantages that the JSFCM algorithm inherits from the FCM algorithm.
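
The abstract names two building blocks, the Jensen Shannon distance and the FCM algorithm, without spelling out how JSFCM combines them. As a rough illustration only, the sketch below shows each block separately in its textbook form: the JS distance as the square root of the JS divergence (base-2 logarithms), and the usual FCM membership update and objective function with Euclidean distance. The function names, the fuzzifier default `m=2.0`, and the choice of Euclidean distance are assumptions made for this example; none of this is taken from the paper's method.

```python
import math


def js_distance(p, q):
    """Jensen Shannon distance between two non-negative weight vectors.

    Assumption: base-2 logarithms, so the result lies in [0, 1].
    """
    # Normalize so both inputs behave as probability distributions.
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    # Mixture distribution M = (P + Q) / 2.
    mix = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence; terms with a_i == 0 contribute 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    divergence = 0.5 * kl(p, mix) + 0.5 * kl(q, mix)
    # Clamp tiny negative rounding errors before taking the square root.
    return math.sqrt(max(0.0, divergence))


def fcm_memberships(points, centers, m=2.0):
    """Textbook FCM membership update with Euclidean distance."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        d = [math.dist(x, c) for c in centers]
        if any(v == 0 for v in d):
            # A point sitting on a center belongs fully to that center
            # (shared equally if several centers coincide with it).
            hits = [v == 0 for v in d]
            n = sum(hits)
            memberships.append([1.0 / n if h else 0.0 for h in hits])
            continue
        memberships.append(
            [1.0 / sum((di / dk) ** exponent for dk in d) for di in d]
        )
    return memberships


def fcm_objective(points, centers, memberships, m=2.0):
    """FCM objective: sum over points and centers of u^m * squared distance."""
    return sum(
        memberships[j][i] ** m * math.dist(x, c) ** 2
        for j, x in enumerate(points)
        for i, c in enumerate(centers)
    )
```

In a comparison like the one the abstract describes, a lower `fcm_objective` value for the same data would indicate a tighter fuzzy partition. How the JS distance enters the cluster number estimate is left to the full text.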

Keywords

Unsupervised Learning, Clustering, Fuzzy C Means, Jensen Shannon Distance, Modified Partition Entropy





Details

Primary Language	Turkish
Subjects	Engineering
Section	Articles
Authors	Naciye Aydın 0000-0002-6261-6121 Gökhan Kayhan 0000-0003-3391-0097
Early View Date	December 15, 2021
Publication Date	December 1, 2021
Published in Issue	Year 2021 Issue: 29

Cite

APA	Aydın, N., & Kayhan, G. (2021). Jensen Shannon Mesafesi Temelli Uyarlanmış Bulanık C Ortalamalar Kümeleme Yöntemi. Avrupa Bilim Ve Teknoloji Dergisi(29), 58-64. https://doi.org/10.31590/ejosat.1021473
