PREDICTING STUDENTS' STEM EDUCATION PREFERENCES WITH A DATA MINING APPROACH
Year 2019,
Volume: 9 Issue: 1, 73 - 88, 31.01.2019
Gökhan Akçapınar, Erdal Coşgun
Abstract
This study aimed to build a model that predicts, from middle school students'
interaction data in the intelligent tutoring system ASSISTments, whether they
will choose STEM education in high school. The study used the data set provided
to participants in the ASSISTments Data Mining Competition 2017. The aim of the
competition was to develop a predictive model that estimates, from students'
system usage data during middle school, whether they will continue their careers
in a STEM field in high school. For this purpose, approximately 1 million rows of
click data from 1709 students who used the system between 2004 and 2007 were
provided to participants, with student-identifying data removed. Participants
were also given a training data set stating, for 514 of the students in the data
set, whether or not they continued to a STEM career, so that they could test the
predictive models they developed. In this study, different preprocessing methods
and different classification algorithms were tested comparatively on the data set
and the results were reported. The best classification model obtained from the
analyses correctly predicted students' STEM education preferences 89.1% of the
time. The variables that are important in determining students' STEM education
preferences were also analyzed.
References
- Botelho, A. F., Baker, R. S., & Heffernan, N. T. (2017). Improving Sensor-Free Affect Detection Using Deep Learning. In E. André, R. Baker, X. Hu, M. M. T. Rodrigo & B. du Boulay (Eds.), Artificial Intelligence in Education: 18th International Conference, AIED 2017, Wuhan, China, June 28 – July 1, 2017, Proceedings (pp. 40-51). Cham: Springer International Publishing.
- Chawla, N. V. (2005). Data Mining for Imbalanced Datasets: An Overview. In O. Maimon & L. Rokach (Eds.), Data Mining and Knowledge Discovery Handbook (pp. 853-867). Boston, MA: Springer US.
- Desmarais, M. C., & Baker, R. S. (2012). A review of recent advances in learner and skill modeling in intelligent learning environments. User Modeling and User-Adapted Interaction, 22(1-2), 9-38. doi: 10.1007/s11257-011-9106-8
- Feng, M., Heffernan, N., & Koedinger, K. (2009). Addressing the assessment challenge with an online system that tutors as it assesses. User Modeling and User-Adapted Interaction, 19(3), 243-266. doi: 10.1007/s11257-009-9063-7
- Flanagan, B., & Ogata, H. (2017). Integration of Learning Analytics Research and Production Systems While Protecting Privacy. Paper presented at the 25th International Conference on Computers in Education, ICCE 2017, New Zealand.
- Heffernan, N. T., & Heffernan, C. L. (2014). The ASSISTments Ecosystem: Building a Platform that Brings Scientists and Teachers Together for Minimally Invasive Research on Human Learning and Teaching. International Journal of Artificial Intelligence in Education, 24(4), 470-497. doi: 10.1007/s40593-014-0024-x
- Koedinger, K., Baker, R., Cunningham, K., Skogsholm, A., Leber, B., & Stamper, J. (2010). A data repository for the EDM community: The PSLC DataShop. Handbook of educational data mining, 43.
- Kowarik, A., & Templ, M. (2016). Imputation with the R Package VIM. Journal of Statistical Software, 74(7), 1-16. doi: 10.18637/jss.v074.i07
- Kuhn, M. (2008). Building Predictive Models in R Using the caret Package. Journal of Statistical Software, 28(5), 1-26. doi: 10.18637/jss.v028.i05
- Kursa, M. B., & Rudnicki, W. R. (2010). Feature Selection with the Boruta Package. Journal of Statistical Software, 36(11), 1-13. doi: 10.18637/jss.v036.i11
- Lunardon, N., Menardi, G., & Torelli, N. (2014). ROSE: A Package for Binary Imbalanced Learning. R Journal, 6(1).
- Ocumpaugh, J., Baker, R., Gowda, S., Heffernan, N., & Heffernan, C. (2014). Population validity for educational data mining models: A case study in affect detection. British Journal of Educational Technology, 45(3), 487-501. doi: 10.1111/bjet.12156
- Olmo, J. L., Romero, C., Gibaja, E., & Ventura, S. (2015). Improving Meta-learning for Algorithm Selection by Using Multi-label Classification: A Case of Study with Educational Data Sets. International Journal of Computational Intelligence Systems, 8(6), 1144-1164. doi: 10.1080/18756891.2015.1113748
- Pardos, Z. A., Baker, R. S. J. D., San Pedro, M., Gowda, S. M., & Gowda, S. M. (2014). Affective States and State Tests: Investigating How Affect and Engagement during the School Year Predict End-of-Year Learning Outcomes. Journal of Learning Analytics, 1(1), 107-128. doi: 10.18608/jla.2014.11.6
- Pedro, M. O., Baker, R., Bowers, A., & Heffernan, N. (2013). Predicting college enrollment from student interaction with an intelligent tutoring system in middle school. Paper presented at the Educational Data Mining 2013.
- Pedro, M. O., Ocumpaugh, J., Baker, R., & Heffernan, N. (2014). Predicting STEM and non-STEM college major enrollment from middle school interaction with mathematics educational software. Paper presented at the Educational Data Mining 2014.
- Pedro, M. O. Z. S., Baker, R. S., Heffernan, N. T., & Ocumpaugh, J. L. (2015). Exploring college major choice and middle school student behavior, affect and learning: what happens to students who game the system? Paper presented at the Proceedings of the Fifth International Conference on Learning Analytics And Knowledge, Poughkeepsie, New York.
- Peña-Ayala, A. (2014). Educational data mining: A survey and a data mining-based analysis of recent works. Expert Systems with Applications, 41(4, Part 1), 1432-1462. doi: 10.1016/j.eswa.2013.08.042
- R Core Team. (2017). R: A language and environment for statistical computing: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/
- Refaeilzadeh, P., Tang, L., & Liu, H. (2016). Cross-Validation. In L. Liu & M. T. Özsu (Eds.), Encyclopedia of Database Systems (pp. 1-7). New York, NY: Springer New York.
- San Pedro, M. O. C. Z., Baker, R. S. J. d., & Rodrigo, M. M. T. (2011). Detecting Carelessness through Contextual Estimation of Slip Probabilities among Students Using an Intelligent Tutor for Mathematics, Berlin, Heidelberg.
- San Pedro, M. O. Z., Baker, R. S. J. d., Gowda, S. M., & Heffernan, N. T. (2013). Towards an Understanding of Affect and Knowledge from Student Interaction with an Intelligent Tutoring System. In H. C. Lane, K. Yacef, J. Mostow & P. Pavlik (Eds.), Artificial Intelligence in Education: 16th International Conference, AIED 2013, Memphis, TN, USA, July 9-13, 2013. Proceedings (pp. 41-50). Berlin, Heidelberg: Springer Berlin Heidelberg.
- Sim, J., & Wright, C. C. (2005). The Kappa Statistic in Reliability Studies: Use, Interpretation, and Sample Size Requirements. Physical Therapy, 85(3), 257-268. doi: 10.1093/ptj/85.3.257
- Stamper, J., Koedinger, K., Baker, R. S. J. d., Skogsholm, A., Leber, B., Rankin, J., & Demi, S. (2010). PSLC DataShop: A Data Analysis Service for the Learning Science Community, Berlin, Heidelberg.
- Wang, Y., Heffernan, N. T., & Heffernan, C. (2015). Towards better affect detectors: effect of missing skills, class features and common wrong answers. Paper presented at the Proceedings of the Fifth International Conference on Learning Analytics And Knowledge, Poughkeepsie, New York.
- Yu, H.-F., Lo, H.-Y., Hsieh, H.-P., Lou, J.-K., McKenzie, T. G., Chou, J.-W., . . . Lin, C.-J. (2010). Feature Engineering and Classifier Ensemble for KDD Cup 2010.
PREDICTING STUDENTS’ STEM CAREER INTEREST WITH DATA MINING APPROACH
Abstract
This study aimed to create a model that predicts whether secondary school
students will choose STEM education in high school by analyzing their usage data
from the intelligent tutoring system ASSISTments with data mining methods. The
study used the data set given to participants in the ASSISTments Data Mining
Competition 2017. The purpose of the competition was to develop a model for
predicting whether students will continue with STEM education in high school.
For this purpose, approximately 1 million lines of click-stream data from 1709
students who used the system between 2004 and 2007 were given to the
participants, with the personal data identifying the students deleted.
Participants were also given a training data set containing the STEM field
choices of 514 students so that they could test the predictive models they
developed. In this study, different preprocessing methods and different
classification algorithms were compared on the data set and the results are
reported. The best classification model correctly predicted students' STEM field
choices with 89.1% accuracy. The variables important for predicting students'
STEM preferences were also investigated.
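
The general workflow described in the abstract (action-level click-stream rows reduced to one record per student, then predictions scored against the labelled subset) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the column names, the toy data, and the threshold rule standing in for a trained classifier are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical click-stream rows: (student_id, correct, time_taken_seconds).
# The real competition data has many more columns; these names are illustrative.
clicks = [
    ("s1", 1, 12.0),
    ("s1", 0, 30.0),
    ("s2", 1, 8.0),
    ("s2", 1, 10.0),
    ("s3", 0, 45.0),
]

# Hypothetical labels for the labelled subset (1 = STEM, 0 = non-STEM).
labels = {"s1": 0, "s2": 1}


def aggregate(rows):
    """Collapse action-level rows into one feature record per student."""
    sums = defaultdict(lambda: {"n": 0, "correct": 0, "time": 0.0})
    for student_id, correct, seconds in rows:
        rec = sums[student_id]
        rec["n"] += 1
        rec["correct"] += correct
        rec["time"] += seconds
    return {
        sid: {
            "n_actions": rec["n"],
            "pct_correct": rec["correct"] / rec["n"],
            "mean_time": rec["time"] / rec["n"],
        }
        for sid, rec in sums.items()
    }


def accuracy(predicted, actual):
    """Share of labelled students whose predicted class matches the label."""
    hits = sum(1 for sid, y in actual.items() if predicted[sid] == y)
    return hits / len(actual)


features = aggregate(clicks)
# Toy rule standing in for a trained classifier (an assumption, not the paper's model):
# predict STEM when more than 75% of a student's answers were correct.
predicted = {sid: int(f["pct_correct"] > 0.75) for sid, f in features.items()}
score = accuracy(predicted, labels)
print(score)
```

In this sketch only students with a label contribute to the accuracy figure, mirroring the situation in which labels are available for a subset of the students in the log data.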