A Hybrid Unsupervised ML Framework for Predicting Student Admissions in Higher Education: A Case Study
Abstract
In higher education admissions, particularly at resource-constrained institutions, predicting enrollment likelihood from post-entrance-exam visitor data is difficult because the available datasets are imbalanced or unlabeled. This paper proposes a hybrid unsupervised machine learning framework that integrates a One-Class Support Vector Machine (OCSVM) for anomaly detection, a Gaussian Mixture Model (GMM) for probabilistic clustering, and Nearest Neighbors (NN) for similarity-based scoring. We apply the framework to a real-world dataset of 721 admitted students from D. Y. Patil College of Engineering and Technology (DYPCET), Kolhapur, India. The framework engineers domain-specific features (academic performance, geographic proximity, and engagement indicators) and uses them to compute a composite likelihood score on a 0–1 scale. Evaluation yields an 86.1% inlier rate for OCSVM, a GMM silhouette score of up to 0.342 (mean 0.242 ± 0.061 across random initializations), and an average NN distance of 0.468. The system supports admission cells in prioritizing high-likelihood candidates for targeted faculty follow-up and counselor outreach. This deployable pipeline addresses a gap in unsupervised admission prediction for small colleges with limited labeled data.
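The abstract does not state how the three component signals are merged into the composite likelihood score. As a minimal sketch only, the snippet below assumes each signal is min-max normalized, the NN distance is inverted so that closer candidates score higher, and the three parts are averaged with equal weights. The function names (`min_max`, `composite_scores`), the weights, and the sample values are hypothetical illustrations, not the authors' method or data.

```python
from typing import List, Sequence, Tuple


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant input maps to 0.5 (assumed convention)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def composite_scores(
    ocsvm_decision: Sequence[float],
    gmm_posterior: Sequence[float],
    nn_distance: Sequence[float],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),  # assumed equal weighting
) -> List[float]:
    """Combine three per-candidate signals into one score in [0, 1]."""
    if not (len(ocsvm_decision) == len(gmm_posterior) == len(nn_distance)):
        raise ValueError("all three signals need one value per candidate")
    w_o, w_g, w_n = weights
    total = w_o + w_g + w_n
    o = min_max(ocsvm_decision)                   # higher = more inlier-like
    g = min_max(gmm_posterior)                    # higher = stronger cluster membership
    n = [1.0 - d for d in min_max(nn_distance)]   # smaller distance = more similar
    raw = [(w_o * a + w_g * b + w_n * c) / total for a, b, c in zip(o, g, n)]
    # Clamp to guard against floating-point drift just outside [0, 1].
    return [min(1.0, max(0.0, s)) for s in raw]


# Hypothetical signals for three visitors (not taken from the DYPCET dataset).
ocsvm = [0.8, -0.2, 0.3]     # signed OCSVM decision values
gmm = [0.95, 0.40, 0.70]     # maximum GMM posterior probability
dist = [0.20, 0.90, 0.47]    # mean distance to nearest admitted students
scores = composite_scores(ocsvm, gmm, dist)
ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Here `ranked` lists candidate indices from highest to lowest score, which is the order in which an admission cell might schedule follow-ups. Because min-max scaling is relative to the batch, scores from different batches are not directly comparable under this assumption.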
Details
Primary Language
English
Subjects
Machine Learning Algorithms, Data Engineering and Data Science
Journal Section
Research Article
Authors
Saiprasad Lendale
0009-0007-4194-4568
India
Sachin Takmare
0009-0008-4868-0670
India
Publication Date
May 19, 2026
Submission Date
January 28, 2026
Acceptance Date
May 17, 2026
Published in Issue
Year 2026 Number: 10