Research Article

Comparison of The Performances of Clustering and Dimensionality Reduction Approaches in Collaborative Filtering

Year 2024, Volume: 4 Issue: 2, 96 - 110, 30.12.2024
https://doi.org/10.54569/aair.1597930

Abstract

Recommendation systems (RS) can be defined as systems that offer users personalized product and service recommendations based on their past product preferences and their similarities with other users, especially on platforms that provide e-commerce services. The main purpose of RS is to surface meaningful information from large-scale data for users and to simplify the analysis of user behaviors and product attributes. The techniques used in RS can be divided into two main categories, content-based filtering and collaborative filtering (CF), according to the information they receive as input. Content-based recommendation systems analyze the attributes of items such as articles, movies, or music to generate tailored recommendations. CF methods analyze the scores that users give to products and services to identify patterns and preferences. The success of CF techniques hinges on accurately identifying similarities between users. However, CF techniques operate on large-scale datasets consisting of many users and the scores those users have given to products, so identifying user similarities in such datasets poses significant challenges. Two different methods are used to overcome this problem. The first applies clustering analysis to divide the dataset into smaller subsets (by user or by product) and then applies CF techniques within each subset. The second performs dimensionality reduction on a product (object) basis using Singular Value Decomposition (SVD) and Principal Component Analysis (PCA). Many studies to date have used clustering analysis and dimensionality reduction methods. Despite this extensive research, a thorough comparison of clustering and dimensionality reduction methods on real-world datasets has not yet been carried out. This study aims to compare the performance in CF of eleven clustering techniques, four of which are non-hierarchical and seven of which are hierarchical, and two dimensionality reduction techniques, SVD and PCA.
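
The two strategies named in the abstract can be illustrated with a minimal sketch on a toy rating matrix. Everything here is an assumption made for illustration and is not taken from the article: the ratings, the function names, the item-mean imputation of missing scores, and the choice of k-means as one familiar non-hierarchical clustering algorithm (the abstract does not list the eleven algorithms it compares).

```python
import numpy as np

# Hypothetical toy ratings (rows: users, columns: items); np.nan marks an unrated item.
R = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, np.nan, 1.0],
    [1.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
    [5.0, 5.0, 2.0, 1.0],
])

def fill_with_item_means(ratings):
    """Replace missing ratings with the mean of the observed ratings for that item."""
    item_means = np.nanmean(ratings, axis=0)
    return np.where(np.isnan(ratings), item_means, ratings)

def kmeans_labels(X, init_rows, n_iter=20):
    """Lloyd's k-means; centroids start at the given rows so the run is deterministic."""
    centroids = X[list(init_rows)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every user to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(len(centroids)):
            members = X[labels == c]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return labels

def predict_in_cluster(ratings, labels, user, item):
    """Predict a rating as the mean observed score for the item among the user's cluster peers."""
    peers = labels == labels[user]
    peers[user] = False  # exclude the target user
    return float(np.nanmean(ratings[peers, item]))

def svd_reconstruct(X, k):
    """Rank-k approximation of X from its truncated singular value decomposition."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

def pca_scores(X, k):
    """Project users onto the top-k principal components of the item-centered matrix."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]

filled = fill_with_item_means(R)

# Strategy 1: cluster the users, then predict inside the target user's cluster.
labels = kmeans_labels(filled, init_rows=(0, 2))
cluster_pred = predict_in_cluster(R, labels, user=1, item=2)

# Strategy 2: reduce dimensionality, then read the prediction off the low-rank model.
svd_pred = float(svd_reconstruct(filled, k=2)[1, 2])
user_scores = pca_scores(filled, k=2)
```

In the clustering route, the similarity search for a user is restricted to that user's cluster. In the SVD and PCA routes, each user is represented by a small number of latent components instead of one value per product. A real comparison would also need an evaluation metric and held-out ratings, which this sketch leaves out.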

There are 29 citations in total.

Details

Primary Language English
Subjects Machine Learning (Other)
Journal Section Research Articles
Authors

Özge Taş 0000-0001-7220-5054

Publication Date December 30, 2024
Submission Date December 7, 2024
Acceptance Date December 28, 2024
Published in Issue Year 2024 Volume: 4 Issue: 2

Cite

IEEE Ö. Taş, “Comparison of The Performances of Clustering and Dimensionality Reduction Approaches in Collaborative Filtering”, Adv. Artif. Intell. Res., vol. 4, no. 2, pp. 96–110, 2024, doi: 10.54569/aair.1597930.

Advances in Artificial Intelligence Research is an open access journal which means that the content is freely available without charge to the user or his/her institution. All papers are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, which allows users to distribute, remix, adapt, and build upon the material in any medium or format for non-commercial purposes only, and only so long as attribution is given to the creator.