Research Article

Comparison of Clustering Methods for Mixed Data: A Case Study on Hypothetical Student Scholarship Data

Number: 59 December 12, 2025

Abstract

Clustering is a widely used technique for uncovering patterns and grouping individuals within complex datasets, particularly in fields like education where both academic and contextual variables are essential. This study introduces the basics of six clustering methods and examines how well they classify students into scholarship eligibility groups, using a hypothetical student scholarship dataset generated in R. The dataset consists of two numerical variables (GPA and Scholarship Exam Result) and four categorical variables (Financial Need, Number of Parents Employed, Employment Status, and Accommodation), reflecting typical criteria in educational funding decisions. Students were labeled as Primary, Secondary, or Rejected Candidates, and the six methods (K-Means, K-Modes, K-Prototypes, Partitioning Around Medoids (PAM), Latent Class Analysis (LCA), and Factor Analysis for Mixed Data (FAMD) followed by K-Means) were assessed on how accurately they reproduced these labels. Results indicate that hybrid approaches, particularly K-Prototypes (95.6%) and PAM (92.5%), were among the most accurate. FAMD + K-Means (93.9%) offered a robust alternative through dimensionality reduction, while LCA reached 85.9% accuracy. The findings highlight the value of categorical variables in clustering applications and demonstrate the importance of selecting suitable clustering techniques for mixed-type educational data, especially in high-stakes contexts such as scholarship selection.
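
The abstract reports accuracy as the share of students whose cluster matches their known label, but it does not say how cluster numbers were matched to the label names. The Python sketch below shows one common way to do this, assuming a best one-to-one mapping between clusters and labels. The function name `clustering_accuracy` and the toy data are hypothetical illustrations, not the authors' procedure or data.

```python
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids):
    """Share of cases matched under the best one-to-one cluster-to-label mapping.

    Assumption for illustration: cluster ids are arbitrary, so every
    assignment of distinct labels to clusters is tried and the best is kept.
    """
    labels = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    if len(clusters) > len(labels):
        raise ValueError("more clusters than labels; no one-to-one mapping")
    best_hits = 0
    for perm in permutations(labels, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == t for t, c in zip(true_labels, cluster_ids))
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)


# Hypothetical toy example: six students, three known labels.
true_labels = ["Primary", "Primary", "Secondary", "Secondary", "Rejected", "Rejected"]
cluster_ids = [2, 2, 0, 1, 1, 1]  # output of some clustering method

accuracy = clustering_accuracy(true_labels, cluster_ids)
print(f"accuracy = {accuracy:.3f}")
```

With three groups only six mappings need to be checked, so an exhaustive search is adequate here; for many clusters an assignment solver would be the more practical choice.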

Keywords

Clustering mixed data, K-Means, K-Prototypes, Latent Class Analysis, Factor Analysis with Mixed Data

APA
Ataseven, H., Çokluk Bökeoğlu, Ö., & Taşdemir, F. (2025). Comparison of Clustering Methods for Mixed Data: A Case Study on Hypothetical Student Scholarship Data. Educational Academic Research, 59, 1-14. https://doi.org/10.33418/education.1674501