Research Article
BibTex RIS Cite

An incremental fuzzy algorithm for data clustering problems

Year 2019, Volume: 21 Issue: 1, 169 - 183, 15.03.2019
https://doi.org/10.25092/baunfbed.532619

Abstract

Data Cluster analysis is an important part of data mining. It can be handled as two types, hard and soft clustering. In hard clustering, a dataset is divided into distinct clusters and each data in the dataset belongs to exactly one cluster. On the contrary data can belong to more than one cluster in soft clustering and each data can be associated with each cluster by a membership degree. Incremental algorithms which are developed for hard clustering have two main advantages. They based on the nonsmooth-nonconvex mathematical model which allows significantly reduce the number of variables and they choose one cluster center for each step that leads to obtain better objective function. In this paper, we propose an incremental fuzzy algorithm for soft clustering problems and present results of numerical experiments on 11 real-world datasets. These results demonstrate that the proposed algorithm is efficient for solving the soft clustering problems.

References

  • Zadeh, A.L., Fuzzy sets, Information and Control, 8, 338-353, (1965).
  • Al-Sultan, K.S., A tabu search approach to the clustering problem, Pattern Recognition, 28(9), 1443-1451, (1995).
  • Brown, D.E., Entail, C.L., A practical application of simulated annealing to the clustering problem, Pattern Recognition, 25, 401-412, (1992).
  • Diehr, G., Evaluation of a branch and bound algorithm for clustering, SIAM J. Scientific and Statistical Computing, 6, 268-284, (1985).
  • Dubes, R., Jain, A.K., Clustering techniques: the user's dilemma, Pattern Recognition, 8, 247-260, (1976).
  • Hanjoul, P., Peeters, D., A comparison of two dual-based procedures for solving the p-median problem, European Journal of Operational Research, 20, 387-396, (1985).
  • Hansen, P., Jaumard, B., Cluster analysis and mathematical programming, Mathematical Programming, 79(1-3), 191-215, (1997).
  • Hansen, P., Mladenovic, N., J-means: a new heuristic for minimum sum-of-squares clustering, Pattern Recognition, 4, 405-413, (2001).
  • Hansen, P., Mladenovic, N., Variable neighborhood decomposition search, Journal of Heuristic, 7, 335-350, (2001).
  • Koontz, W.L.G., Narendra, P.M., Fukunaga, K., A branch and bound clustering algorithm, IEEE Transactions on Computers, 24, 908-915, (1975).
  • Du Merle, O., Hansen, P., Jaumard, B., Mladenovic, N., An interior point method for minimum sum-of-squares clustering, SIAM Journal on Scientific Computing, 21, 1485-1505, (2001).
  • Selim, S.Z., Al-Sultan, K.S., A simulated annealing algorithm for the clustering, Pattern Recognition, 24(10), 1003-1008, (1991).
  • Spath, H., Cluster Analysis Algorithms, Ellis Horwood Limited, Chichester, (1980).
  • Sun, L.X., Xie, Y.L., Song, X.H., Wang, J.H., Yu, R.Q., Cluster analysis by simulated annealing, Computers and Chemistry, 18, 103-108, (1994).
  • Likas, A., Vlassis, N. and Verbeek, J., The global k-means clustering algorithm, Pattern Recognition, 36, 451 – 461, (2001).
  • Bagirov, A.M., Modified global k-means algorithm for minimum sum-of-squares clustering problems, Pattern Recognition, 41, 3192- 3199, (2008).
  • Vanisri, D. and Loganathan, C., An efficient fuzzy clustering algorithm based on modified k-means, International Journal of Engineering Science and Technology, 5949-5958, (2010).
  • Ordin, B., Bagirov, A., A heuristic algorithm for solving the minimum sum of squares clustering problems, Journal of Global Optimization, 61, 341–361, (2015).
  • Bagirov, A., Ordin, B., Ozturk, G., Xavier, A.E., An ıncremental clustering algorithm based on hyperbolic smoothing, Computational Optimization and Applications, 61(1), 219-241, (2015).
  • Chuang, K., Tzeng, H., Chen, S., Wu, J. and Chen, T., Fuzzy c-means clustering with spatial information for image segmentation, Computerized Medical Imaging and Graphics, 30, 9–15, (2006).
  • Brandt, M.E., Bohant, T.P., Kramer, L.A. and Fletcher, J.M., Estimation of CSF, white and gray matter volumes in hydrocephalic children using fuzzy clustering of Mr Images, Computerized Medical Imaging and Graphics, 18(1), 25–34, (2004).
  • Staianoa, A., Tagliaferria, R. and Pedrycz, W., Improving RBF networks performance in regression tasks by means of a supervised fuzzy clustering, Neurocomputing, 69, 1570-1581, (2005).
  • Velmurugan, T. and Santhanam, T., Performance evaluation of k-means and fuzzy c-means clustering algorithms for statistical distributions of input data points, European Journal of Scientific Research, 3, 320-330, (2010).
  • Bagirov, A.M., Yearwood, J., A new nonsmooth optimization algorithm for minimum sum-of-squares clustering problems, European Journal of Operational Research, 170(2), 578–596, (2006).
  • Bezdek, J.C., Ehrlich, R., Full, W., FCM: The fuzzy c-means clustering algorithm, Computers & Geosciences, 10(2–3), 191–203, (1984).
  • UC Irvine Machine Learning Repository (http://archive.ics.uci.edu/ml/ )

Veri kümeleme problemleri için artımlı bir bulanık algoritma

Year 2019, Volume: 21 Issue: 1, 169 - 183, 15.03.2019
https://doi.org/10.25092/baunfbed.532619

Abstract

Veri kümeleme analizi veri madenciliğinin önemli bir parçasıdır. Kesin ve esnek kümeleme olmak üzere iki sınıfta ele alınabilir. Kesin kümelemede bir veri seti kümelere ayrıldığında herbir veriseti içerisindeki her eleman yalnız ve yalnız bir kümeye ait olabilir.Esnek kümelemede ise kesin kümelemenin aksine her bir eleman belirli bir üyelik derecesi ile birden fazla kümeye ait olabilir. Kesin kümeleme için geliştirilmiş olan artımlı algoritmalar iki ana avantaja sahiptir. 
Bunlardan birincisi sahip olduğu nonsmooth-nonconvex matematiksel model yapısı sayaseinde değişken sayısı önemli ölçüde azaltılabilir. Bunun yanısıra her bir adımda bir tek küme merkezi seçilerek belirli bir k değerine kadar bu devam ettiği için küresel anlamda amaç fonksiyonu için daha iyi değerler elde edilebilir. Bu çalışmada esnek kümeleme problemlerinin çözümü için artımlı bir bulanık algoritma hazırlanmıştır ve algoritmayla gerçek 11 veriseti üzerinde gerçekleştirilen hesaplama denemelerinin sonuçları sunulmuştur. Sonuçlar esnek kümeleme problemlerinin çözümü için hazırlanan algoritmanın yararlılığını göstermektedir.

References

  • Zadeh, A.L., Fuzzy sets, Information and Control, 8, 338-353, (1965).
  • Al-Sultan, K.S., A tabu search approach to the clustering problem, Pattern Recognition, 28(9), 1443-1451, (1995).
  • Brown, D.E., Entail, C.L., A practical application of simulated annealing to the clustering problem, Pattern Recognition, 25, 401-412, (1992).
  • Diehr, G., Evaluation of a branch and bound algorithm for clustering, SIAM J. Scientific and Statistical Computing, 6, 268-284, (1985).
  • Dubes, R., Jain, A.K., Clustering techniques: the user's dilemma, Pattern Recognition, 8, 247-260, (1976).
  • Hanjoul, P., Peeters, D., A comparison of two dual-based procedures for solving the p-median problem, European Journal of Operational Research, 20, 387-396, (1985).
  • Hansen, P., Jaumard, B., Cluster analysis and mathematical programming, Mathematical Programming, 79(1-3), 191-215, (1997).
  • Hansen, P., Mladenovic, N., J-means: a new heuristic for minimum sum-of-squares clustering, Pattern Recognition, 4, 405-413, (2001).
  • Hansen, P., Mladenovic, N., Variable neighborhood decomposition search, Journal of Heuristic, 7, 335-350, (2001).
  • Koontz, W.L.G., Narendra, P.M., Fukunaga, K., A branch and bound clustering algorithm, IEEE Transactions on Computers, 24, 908-915, (1975).
  • Du Merle, O., Hansen, P., Jaumard, B., Mladenovic, N., An interior point method for minimum sum-of-squares clustering, SIAM Journal on Scientific Computing, 21, 1485-1505, (2001).
  • Selim, S.Z., Al-Sultan, K.S., A simulated annealing algorithm for the clustering, Pattern Recognition, 24(10), 1003-1008, (1991).
  • Spath, H., Cluster Analysis Algorithms, Ellis Horwood Limited, Chichester, (1980).
  • Sun, L.X., Xie, Y.L., Song, X.H., Wang, J.H., Yu, R.Q., Cluster analysis by simulated annealing, Computers and Chemistry, 18, 103-108, (1994).
  • Likas, A., Vlassis, N. and Verbeek, J., The global k-means clustering algorithm, Pattern Recognition, 36, 451 – 461, (2001).
  • Bagirov, A.M., Modified global k-means algorithm for minimum sum-of-squares clustering problems, Pattern Recognition, 41, 3192- 3199, (2008).
  • Vanisri, D. and Loganathan, C., An efficient fuzzy clustering algorithm based on modified k-means, International Journal of Engineering Science and Technology, 5949-5958, (2010).
  • Ordin, B., Bagirov, A., A heuristic algorithm for solving the minimum sum of squares clustering problems, Journal of Global Optimization, 61, 341–361, (2015).
  • Bagirov, A., Ordin, B., Ozturk, G., Xavier, A.E., An ıncremental clustering algorithm based on hyperbolic smoothing, Computational Optimization and Applications, 61(1), 219-241, (2015).
  • Chuang, K., Tzeng, H., Chen, S., Wu, J. and Chen, T., Fuzzy c-means clustering with spatial information for image segmentation, Computerized Medical Imaging and Graphics, 30, 9–15, (2006).
  • Brandt, M.E., Bohant, T.P., Kramer, L.A. and Fletcher, J.M., Estimation of CSF, white and gray matter volumes in hydrocephalic children using fuzzy clustering of Mr Images, Computerized Medical Imaging and Graphics, 18(1), 25–34, (2004).
  • Staianoa, A., Tagliaferria, R. and Pedrycz, W., Improving RBF networks performance in regression tasks by means of a supervised fuzzy clustering, Neurocomputing, 69, 1570-1581, (2005).
  • Velmurugan, T. and Santhanam, T., Performance evaluation of k-means and fuzzy c-means clustering algorithms for statistical distributions of input data points, European Journal of Scientific Research, 3, 320-330, (2010).
  • Bagirov, A.M., Yearwood, J., A new nonsmooth optimization algorithm for minimum sum-of-squares clustering problems, European Journal of Operational Research, 170(2), 578–596, (2006).
  • Bezdek, J.C., Ehrlich, R., Full, W., FCM: The fuzzy c-means clustering algorithm, Computers & Geosciences, 10(2–3), 191–203, (1984).
  • UC Irvine Machine Learning Repository (http://archive.ics.uci.edu/ml/ )
There are 26 citations in total.

Details

Primary Language English
Subjects Engineering
Journal Section Research Articles
Authors

Elvin Nasibov This is me 0000-0002-0105-5447

Burak Ordin 0000-0001-7897-3265

Publication Date March 15, 2019
Submission Date March 28, 2018
Published in Issue Year 2019 Volume: 21 Issue: 1

Cite

APA Nasibov, E., & Ordin, B. (2019). An incremental fuzzy algorithm for data clustering problems. Balıkesir Üniversitesi Fen Bilimleri Enstitüsü Dergisi, 21(1), 169-183. https://doi.org/10.25092/baunfbed.532619
AMA Nasibov E, Ordin B. An incremental fuzzy algorithm for data clustering problems. BAUN Fen. Bil. Enst. Dergisi. March 2019;21(1):169-183. doi:10.25092/baunfbed.532619
Chicago Nasibov, Elvin, and Burak Ordin. “An Incremental Fuzzy Algorithm for Data Clustering Problems”. Balıkesir Üniversitesi Fen Bilimleri Enstitüsü Dergisi 21, no. 1 (March 2019): 169-83. https://doi.org/10.25092/baunfbed.532619.
EndNote Nasibov E, Ordin B (March 1, 2019) An incremental fuzzy algorithm for data clustering problems. Balıkesir Üniversitesi Fen Bilimleri Enstitüsü Dergisi 21 1 169–183.
IEEE E. Nasibov and B. Ordin, “An incremental fuzzy algorithm for data clustering problems”, BAUN Fen. Bil. Enst. Dergisi, vol. 21, no. 1, pp. 169–183, 2019, doi: 10.25092/baunfbed.532619.
ISNAD Nasibov, Elvin - Ordin, Burak. “An Incremental Fuzzy Algorithm for Data Clustering Problems”. Balıkesir Üniversitesi Fen Bilimleri Enstitüsü Dergisi 21/1 (March 2019), 169-183. https://doi.org/10.25092/baunfbed.532619.
JAMA Nasibov E, Ordin B. An incremental fuzzy algorithm for data clustering problems. BAUN Fen. Bil. Enst. Dergisi. 2019;21:169–183.
MLA Nasibov, Elvin and Burak Ordin. “An Incremental Fuzzy Algorithm for Data Clustering Problems”. Balıkesir Üniversitesi Fen Bilimleri Enstitüsü Dergisi, vol. 21, no. 1, 2019, pp. 169-83, doi:10.25092/baunfbed.532619.
Vancouver Nasibov E, Ordin B. An incremental fuzzy algorithm for data clustering problems. BAUN Fen. Bil. Enst. Dergisi. 2019;21(1):169-83.