Research Article

Veri Madenciliğinde Kullanılan Öğrenme Yöntemlerinin Farklı Koşullar Altında Karşılaştırılması

Year 2018, Volume: 51 Issue: 3, 71 - 100, 01.12.2018
https://doi.org/10.30964/auebfd.464262

Abstract

This study examined the use of data mining and machine learning approaches in
the field of education and sought to determine the level of reliability and
validity of the results obtained with these algorithms. Students were
classified as successful or unsuccessful according to the PISA 2015 Turkey
average, and different learning methods were used to predict which class each
student would fall into in terms of science literacy; the reliability and
validity criteria of the results obtained at this stage were then examined. Of
the eight learning methods considered in the study, Random Forest produced the
best results in terms of the number of correct classifications, the correct
classification rate, the kappa statistic, the root mean square error, and the
relative root mean square error, while Ridge logistic regression, the Logistic
model, and the Hoeffding tree were the next most successful methods. When the
full data set was split into training and test sets without cross-validation,
the Logistic model, Random Forest, and Ridge regression methods yielded the
lowest error values on test sets of different sizes, whereas the Random Tree
and J.48 methods had the highest error values. The error values obtained by
Ridge regression, Random Forest, and the Logistic model were also found to be
highly consistent across test sets of different percentages. Finally, when each
learning method was both trained and tested on the same data set, without
splitting it into training and test data, the Random Tree and J.48 methods in
particular achieved correct classification rates higher than their true
performance.
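The abstract's final point — that training and testing on the same data inflates accuracy, especially for single decision trees — can be illustrated with a minimal sketch in Python using scikit-learn. This is not the authors' PISA 2015 analysis: the synthetic data set and the model choices are assumptions for illustration only, with scikit-learn's DecisionTreeClassifier standing in for J.48.

```python
# Compare resubstitution accuracy (train and test on the same data) with
# 10-fold cross-validated accuracy. An unpruned tree memorizes the training
# data, so its resubstitution score overstates its true performance.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data as a stand-in for the PISA successful/unsuccessful labels.
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

models = {
    "DecisionTree (J.48-like)": DecisionTreeClassifier(random_state=42),
    "RandomForest": RandomForestClassifier(random_state=42),
    "Logistic": LogisticRegression(max_iter=1000),
}

for name, model in models.items():
    resub = model.fit(X, y).score(X, y)              # evaluated on training data
    cv = cross_val_score(model, X, y, cv=10).mean()  # 10-fold cross-validation
    print(f"{name:25s} resubstitution={resub:.3f}  cross-val={cv:.3f}")
```

On data like this, the tree-based models reach perfect resubstitution accuracy while their cross-validated accuracy is noticeably lower, mirroring the pattern the study reports for Random Tree and J.48.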

References

  • Ahmed, A. B., & Elaraby, I. S. (2014). Data Mining: A prediction for student's performance using classification method. World Journal of Computer Application and Technology, 2 (2), 43-47.
  • Boos, D. D. (2003). Introduction to the Bootstrap World. Statistical Science, 18 (2), 168-174.
  • Bramer, M. (2013). Principles of Data Mining (2nd ed.), London: Springer-Verlag.
  • Brown, M. S. (2014). Data Mining For Dummies, Hoboken, New Jersey: John Wiley & Sons.
  • Chamatkar, A. J., & Butey, P. K. (2014). Importance of data mining with different types of data applications and challenging areas. Journal of Engineering Research and Applications, 4 (5), 38-41.
  • Chen, S. X., & Liu, J. S. (1997). Statistical applications of the Poisson-binomial and conditional Bernoulli distributions. Statistica Sinica, 7, 875–892.
  • Dekking, F. M., Kraaikamp, C., Lopuhaa, H. P., & Meester, L. E. (2005). A modern introduction to probability and statistics: Understanding why and how. United States of America: Springer Science+Business Media.
  • Domingos, P. (2012), A few useful things to know about machine learning, Communications of the ACM, 55 (10), 78–87.
  • Efron, B. (1979). Bootstrap methods: another look at the jackknife. Annals of Statistics, 7, 1–26.
  • Elayidom, S. M. (2012). Design and development of data mining models for the prediction of manpower (Unpublished Doctoral Thesis), Cochin University of Science and Technology Computer Science and Engineering, Kochi, India.
  • Elhamahmy, M. E., Elmahdy, H. N., & Saroit, I. A. (2010). A new approach for evaluating intrusion detection system, CiiT International Journal of Artificial Intelligent Systems and Machine Learning, 2 (11), 290-298.
  • Fernandez-Delgado, M., Cernadas, E., Barro, S., & Amorim, D. (2014). Do we need hundreds of classifiers to solve real world classification problems? Journal of Machine Learning Research, 15, 3133–3181.
  • Fraenkel, J. R., Wallen, N. E., & Hyun, H. H. (2012). How to design and evaluate research in education (8th ed.). New York: McGraw-Hill.
  • Friedman, J. H., & Fisher N. I. (1999). Bump hunting in high-dimensional data. Stat Comput, 9,123–143.
  • Galdi, P., & Tagliaferri, R. (2017). Data Mining: Accuracy and Error Measures for Classification and Prediction, in Reference Module in Life Sciences, Holland: Elsevier
  • Han, J., & Kamber, M. (2006). Data Mining: Concepts and Techniques (2nd ed.). San Francisco, CA: Morgan Kaufmann.
  • Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of statistical learning: Data mining, inference, and prediction. New York, NY: Springer.
  • Huang, S., & Fang, N. (2013). Predicting student academic performance in an engineering dynamics course: A comparison of four types of predictive mathematical models. Computers & Education, 61, 133–145.
  • Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. Proceedings of the 14th International Joint Conference on Artificial Intelligence, 2, 1137-1143.
  • Kuonen, D. (2018). An introduction to bootstrap methods and their application, WBL in Angewandter Statistik ETHZ 2017/19, 1-143.
  • Liaw A. & Wiener M. (2002), Classification and regression by random forest, R News, 2 (3), 18-22.
  • Lykourentzou, I., Giannoukos, I., Mpardis, G., Nikolopoulos, V., & Loumos, V. (2009). Early and dynamic student achievement prediction in e-learning courses using neural networks, Journal of the American Society for Information Science and Technology, 60 (2), 372–380.
  • Mavroforakis, C. (2011). Data mining with WEKA. Boston University. Retrieved from http://cs-people.bu.edu/cmav/cs105/files/lab12/intro_to_weka.pdf
  • MEB (2016). PISA 2015 Ulusal Ön Raporu. Ankara: MEB
  • Mehdiyev, N., Enke, D., Fettke, P., & Loos, P. (2016). Evaluating Forecasting Methods by Considering Different Accuracy Measures, Procedia Computer Science, 95, 264 – 271.
  • Ng, A.Y. (1997). Preventing "overfitting" of cross-validation data. In D.H. Fisher (Ed.), Proceedings of the 14th International Conference on Machine Learning, Nashville, TN, USA, July 8–12, San Francisco, CA: Morgan Kaufmann.
  • North, M. A. (2012). Data Mining for the Masses. USA: A Global Text Project Book.
  • Olmo, J.L. Romero, J.R. & Ventura, S. (2012). Classification rule mining using ant programming guided by grammar with multiple Pareto fronts. Soft Computing, 16 (12), 2143-2163.
  • Ramageri, M. B. (2010). Data mining techniques and applications, Indian Journal of Computer Science and Engineering, 1 (4), 301-305.
  • Refaeilzadeh, P., Tang, L., & Liu, H. (2009). Cross Validation. In M. T. Özsu & L. Liu (Eds.), Encyclopedia of Database Systems. New York, USA: Springer.
  • Schwenke, C., & Schering, A. (2007). True Positives, True Negatives, False Positives, False Negatives. New Jersey, USA: Wiley Encyclopedia of Clinical Trials.
  • Sinha, A. P., & May, J. H. (2005) Evaluating and tuning predictive data mining models using receiver operating characteristic curves, Journal of Management Information Systems, 21 (3), 249-280.
  • Sinharay, S. (2016). An NCME instructional module on data mining methods for classification and regression, Educational Measurement: Issues and Practice, 35 (3), 38–54.
  • Souza, J., Matwin, S., & Japkowicz, N. (2002). Evaluating data mining models: a pattern language. In: 9th Conference on Pattern Language of Programs (PLOP’02), Monticello, Illinois, 8–12 September 2002.
  • Strobl, C., Malley, J., & Tutz, G. (2009). An introduction to recursive partitioning: Rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychological Methods, 14, 323–348.
  • Svetnik, V., Liaw, A., Tong, C., & Wang, T. (2004). Application of Breiman's random forest to modeling structure-activity relationships of pharmaceutical molecules. In F. Roli, J. Kittler, & T. Windeatt (Eds.), Multiple classifier systems (vol. 3077, pp. 334–343). Cagliari, Italy: Springer.
  • Vanwinckelen, G., & Blockeel, H. (2012). On estimating model accuracy with repeated cross-validation. Proceedings of the 21st Belgian-Dutch Conference on Machine Learning (BeneLearn), 24-25 May 2012.
  • Weiss, S. M., & Kulikowski, C. A. (1991). Computer Systems that Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems. San Mateo, CA: Morgan Kaufmann.
  • Williams, G. (2011) Data mining with Rattle and R: The art of excavating data for knowledge discovery, New York, USA: Springer Science+Business Media
  • Witten, I. H., & Frank, E. (2005). Data mining: Practical machine learning tools and techniques. United States of America: Morgan Kaufmann.
  • Witten, I. H., Frank, E., & Hall, M. (2016). Data mining: Practical machine learning tools and techniques. United States of America: Morgan Kaufmann.
There are 41 citations in total.

Details

Primary Language Turkish
Subjects Studies on Education
Journal Section Research Article
Authors

Gökhan Aksu 0000-0003-2563-6112

Nuri Doğan 0000-0001-6274-2016

Publication Date December 1, 2018
Published in Issue Year 2018 Volume: 51 Issue: 3

Cite

APA Aksu, G., & Doğan, N. (2018). Veri Madenciliğinde Kullanılan Öğrenme Yöntemlerinin Farklı Koşullar Altında Karşılaştırılması. Ankara University Journal of Faculty of Educational Sciences (JFES), 51(3), 71-100. https://doi.org/10.30964/auebfd.464262

Creative Commons License: The content of the Journal of Faculty of Educational Sciences is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.