A Machine Learning Based Approach to Enhance Mooc Users’ Classification

Youssef Mourdi; Mohammed Sadgal; Wafa Berrada Fathi; Hamada El Kabtane

doi:10.17718/tojde.727976

Research Article

Year 2020, Volume: 21 Issue: 2, 47 - 68, 01.04.2020

Cited By: 9

Abstract

At the beginning of the 2010s, the world of education, and e-learning in particular, was transformed by the emergence of Massive Open Online Courses, better known by the acronym MOOCs. Increasingly offered by universities and training centers around the world, MOOCs have become an undeniable asset for any student or person seeking to supplement their initial training with free distance courses covering all fields. Despite the remarkable number of enrollees, MOOCs suffer from dropout rates of up to 90%. This rate significantly undermines the efforts moderators make toward the success of this pedagogical model and degrades learners’ experience and supervision. To address this problem and help instructors target their interventions, we present a solution that classifies MOOC learners into three distinct classes. The proposed approach combines filter methods, which select the most relevant attributes, with ensemble methods that aggregate several machine learning algorithms. The approach was validated on four MOOCs from Stanford University. To demonstrate the performance of the model (92.2%), we carried out a comparative study between the proposed model and other algorithms across several performance measures.
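The pipeline described in the abstract (filter-based attribute selection followed by an ensemble of classifiers over three learner classes) can be illustrated with a minimal sketch. Everything concrete here is an assumption, since the abstract does not name the filters, the base learners, or the features: the sketch uses scikit-learn's `SelectKBest` with the ANOVA F-score as a stand-in filter, a `StackingClassifier` over a random forest and Gaussian naive Bayes as a stand-in ensemble, and synthetic data from `make_classification` in place of the Stanford course logs. The function name `build_model` and the parameter `k_features` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_model(k_features=8):
    """Filter-based feature selection followed by a stacked ensemble.

    Assumption: the filter (ANOVA F-score) and the base learners are
    illustrative stand-ins, not the configuration used in the paper.
    """
    base_learners = [
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("nb", GaussianNB()),
    ]
    # The meta-learner is trained on cross-validated predictions of the base learners.
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=3,
    )
    # Filter step: keep the k features with the highest F-score against the label.
    return make_pipeline(StandardScaler(), SelectKBest(f_classif, k=k_features), stack)


# Synthetic three-class data standing in for learner activity features (assumption).
X, y = make_classification(
    n_samples=600,
    n_features=20,
    n_informative=6,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = build_model()
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

Placing the filter inside the pipeline means the feature scores are computed on the training split only, which avoids leaking information from the held-out learners into the selection step.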

Keywords

Distance Education, Dropout, Feature Selection


Details

Primary Language	English
Journal Section	Articles
Authors	Youssef Mourdi, Mohammed Sadgal, Wafa Berrada Fathi, Hamada El Kabtane
Publication Date	April 1, 2020
Submission Date	February 17, 2019
Published in Issue	Year 2020 Volume: 21 Issue: 2

Cite

APA	Mourdi, Y., Sadgal, M., Berrada Fathi, W., & El Kabtane, H. (2020). A Machine Learning Based Approach to Enhance Mooc Users’ Classification. Turkish Online Journal of Distance Education, 21(2), 47–68. https://doi.org/10.17718/tojde.727976