TY - JOUR
T1 - ÖĞRENCİLERİN STEM EĞİTİMİ TERCİHLERİNİN VERİ MADENCİLİĞİ YAKLAŞIMI ile TAHMİN EDİLMESİ
TT - PREDICTING STUDENTS’ STEM CAREER INTEREST WITH DATA MINING APPROACH
AU - Akçapınar, Gökhan
AU - Coşgun, Erdal
PY - 2019
DA - January
DO - 10.17943/etku.429785
JF - Eğitim Teknolojisi Kuram ve Uygulama
JO - Eğitim Teknolojisi Kuram ve Uygulama (ETKU)
PB - Tolga GÜYER
WT - DergiPark
SN - 2147-1908
SP - 73
EP - 88
VL - 9
IS - 1
LA - tr
AB - Bu çalışmada, ortaokul öğrencilerinin ASSISTments isimli zeki öğretim sistemindeki etkileşim verilerinden lisede STEM eğitimini tercih edip etmeyeceklerini tahmin edecek bir model oluşturulması amaçlanmıştır. Çalışmada 2017 yılında düzenlenen ASSISTments Veri Madenciliği Yarışması’nda (ASSISTments Data Mining Competition 2017) katılımcılara sunulan veri seti kullanılmıştır. Düzenlenen yarışmanın amacı; öğrencilerin ortaokul eğitimleri süresince sistemi kullanım verilerinden lisede STEM alanında kariyerlerine devam edip etmeyeceklerini tahmin etmeye yönelik bir tahmin modeli geliştirilmesidir. Bu amaçla 2004-2007 yılları arasında sistemi kullanan 1709 öğrenciye ilişkin yaklaşık 1 milyon satırlık tıklama verisi, öğrenciyi tanımlayan veriler silinerek, katılımcılara sunulmuştur. Katılımcılara aynı zamanda geliştirdikleri tahmin modellerini test edebilmeleri için veri setinde yer alan 514 öğrencinin STEM kariyerine devam edip etmedikleri bilgisini içeren bir eğitim veri seti verilmiştir. Bu çalışma kapsamında farklı ön işleme yöntemleri ve farklı sınıflama algoritmaları veri setinde karşılaştırmalı olarak test edilmiş ve sonuçları raporlanmıştır. Yapılan analizler sonucunda elde edilen en iyi sınıflama modeli öğrencilerin STEM eğitimi tercihlerini %89,1 oranında doğru olarak tahmin etmiştir. Aynı zamanda öğrencilerin STEM eğitimi tercihlerini belirlemede önemli olan değişkenler de analiz edilmiştir.
KW - Eğitsel veri madenciliği
KW - tahmin
KW - sınıflama
KW - makine öğrenmesi
KW - STEM
N2 - This study aimed to build a model that predicts whether middle school students will choose STEM education in high school, by applying data mining methods to their usage data from the intelligent tutoring system ASSISTments. The study used the data set given to participants in the ASSISTments Data Mining Competition 2017. The purpose of the competition was to develop a model for predicting, from students’ use of the system during middle school, whether they would continue on a STEM path in high school. For this purpose, approximately 1 million rows of click-stream data from 1709 students who used the system between 2004 and 2007 were given to the participants, with the data identifying the students removed. Participants were also given a training data set containing the STEM field choices of 514 of these students, so that they could test the predictive models they developed. In this study, different preprocessing methods and different classification algorithms were compared on the data set, and the results are reported. The best classification model predicted students’ STEM field choices correctly at a rate of 89.1%. The variables that are important for predicting students’ STEM preferences were also analyzed.
CR - Botelho, A. F., Baker, R. S., & Heffernan, N. T. (2017). Improving Sensor-Free Affect Detection Using Deep Learning. In E. André, R. Baker, X. Hu, M. M. T. Rodrigo & B. du Boulay (Eds.), Artificial Intelligence in Education: 18th International Conference, AIED 2017, Wuhan, China, June 28 – July 1, 2017, Proceedings (pp. 40-51). Cham: Springer International Publishing.
CR - Chawla, N. V. (2005). Data Mining for Imbalanced Datasets: An Overview. In O. Maimon & L. Rokach (Eds.), Data Mining and Knowledge Discovery Handbook (pp. 853-867). Boston, MA: Springer US.
CR - Desmarais, M. C., & Baker, R. S. (2012).
A review of recent advances in learner and skill modeling in intelligent learning environments. User Modeling and User-Adapted Interaction, 22(1-2), 9-38. doi: 10.1007/s11257-011-9106-8
CR - Feng, M., Heffernan, N., & Koedinger, K. (2009). Addressing the assessment challenge with an online system that tutors as it assesses. User Modeling and User-Adapted Interaction, 19(3), 243-266. doi: 10.1007/s11257-009-9063-7
CR - Flanagan, B., & Ogata, H. (2017). Integration of Learning Analytics Research and Production Systems While Protecting Privacy. Paper presented at the 25th International Conference on Computers in Education, ICCE 2017, New Zealand.
CR - Heffernan, N. T., & Heffernan, C. L. (2014). The ASSISTments Ecosystem: Building a Platform that Brings Scientists and Teachers Together for Minimally Invasive Research on Human Learning and Teaching. International Journal of Artificial Intelligence in Education, 24(4), 470-497. doi: 10.1007/s40593-014-0024-x
CR - Koedinger, K., Baker, R., Cunningham, K., Skogsholm, A., Leber, B., & Stamper, J. (2010). A data repository for the EDM community: The PSLC DataShop. Handbook of educational data mining, 43. doi: citeulike-article-id:13242329
CR - Kowarik, A., & Templ, M. (2016). Imputation with the R Package VIM. 2016, 74(7), 16. doi: 10.18637/jss.v074.i07
CR - Kuhn, M. (2008). Building Predictive Models in R Using the caret Package. 2008, 28(5), 26. doi: 10.18637/jss.v028.i05
CR - Kursa, M. B., & Rudnicki, W. R. (2010). Feature Selection with the Boruta Package. 2010, 36(11), 13. doi: 10.18637/jss.v036.i11
CR - Lunardon, N., Menardi, G., & Torelli, N. (2014). ROSE: A Package for Binary Imbalanced Learning. R Journal, 6(1).
CR - Ocumpaugh, J., Baker, R., Gowda, S., Heffernan, N., & Heffernan, C. (2014). Population validity for educational data mining models: A case study in affect detection. British Journal of Educational Technology, 45(3), 487-501. doi: 10.1111/bjet.12156
CR - Olmo, J.
L., Romero, C., Gibaja, E., & Ventura, S. (2015). Improving Meta-learning for Algorithm Selection by Using Multi-label Classification: A Case of Study with Educational Data Sets. International Journal of Computational Intelligence Systems, 8(6), 1144-1164. doi: 10.1080/18756891.2015.1113748
CR - Pardos, Z. A., Baker, R. S. J. D., San Pedro, M., Gowda, S. M., & Gowda, S. M. (2014). Affective States and State Tests: Investigating How Affect and Engagement during the School Year Predict End-of-Year Learning Outcomes. 2014, 1(1), 22. doi: 10.18608/jla.2014.11.6
CR - Pedro, M. O., Baker, R., Bowers, A., & Heffernan, N. (2013). Predicting college enrollment from student interaction with an intelligent tutoring system in middle school. Paper presented at the Educational Data Mining 2013.
CR - Pedro, M. O., Ocumpaugh, J., Baker, R., & Heffernan, N. (2014). Predicting STEM and non-STEM college major enrollment from middle school interaction with mathematics educational software. Paper presented at the Educational Data Mining 2014.
CR - Pedro, M. O. Z. S., Baker, R. S., Heffernan, N. T., & Ocumpaugh, J. L. (2015). Exploring college major choice and middle school student behavior, affect and learning: what happens to students who game the system? Paper presented at the Proceedings of the Fifth International Conference on Learning Analytics And Knowledge, Poughkeepsie, New York.
CR - Peña-Ayala, A. (2014). Educational data mining: A survey and a data mining-based analysis of recent works. Expert Systems with Applications, 41(4, Part 1), 1432-1462. doi: http://dx.doi.org/10.1016/j.eswa.2013.08.042
CR - R Core Team. (2017). R: A language and environment for statistical computing: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/
CR - Refaeilzadeh, P., Tang, L., & Liu, H. (2016). Cross-Validation. In L. Liu & M. T. Özsu (Eds.), Encyclopedia of Database Systems (pp. 1-7). New York, NY: Springer New York.
CR - San Pedro, M. O. C. Z., Baker, R. S. J.
d., & Rodrigo, M. M. T. (2011). Detecting Carelessness through Contextual Estimation of Slip Probabilities among Students Using an Intelligent Tutor for Mathematics, Berlin, Heidelberg.
CR - San Pedro, M. O. Z., Baker, R. S. J. d., Gowda, S. M., & Heffernan, N. T. (2013). Towards an Understanding of Affect and Knowledge from Student Interaction with an Intelligent Tutoring System. In H. C. Lane, K. Yacef, J. Mostow & P. Pavlik (Eds.), Artificial Intelligence in Education: 16th International Conference, AIED 2013, Memphis, TN, USA, July 9-13, 2013. Proceedings (pp. 41-50). Berlin, Heidelberg: Springer Berlin Heidelberg.
CR - Sim, J., & Wright, C. C. (2005). The Kappa Statistic in Reliability Studies: Use, Interpretation, and Sample Size Requirements. Physical Therapy, 85(3), 257-268. doi: 10.1093/ptj/85.3.257
CR - Stamper, J., Koedinger, K., Baker, R. S. J. d., Skogsholm, A., Leber, B., Rankin, J., & Demi, S. (2010). PSLC DataShop: A Data Analysis Service for the Learning Science Community, Berlin, Heidelberg.
CR - Wang, Y., Heffernan, N. T., & Heffernan, C. (2015). Towards better affect detectors: effect of missing skills, class features and common wrong answers. Paper presented at the Proceedings of the Fifth International Conference on Learning Analytics And Knowledge, Poughkeepsie, New York.
CR - Yu, H.-F., Lo, H.-Y., Hsieh, H.-P., Lou, J.-K., McKenzie, T. G., Chou, J.-W., . . . Lin, C.-J. (2010). Feature Engineering and Classifier Ensemble for KDD Cup 2010.
UR - https://doi.org/10.17943/etku.429785
L1 - https://dergipark.org.tr/en/download/article-file/641712
ER -