Research Article

THE DETERMINING OF OUTLIERS ON E-LEARNING DATA IN THE CONTEXT OF EDUCATIONAL DATA MINING AND LEARNING ANALYTICS

Year 2019, Volume: 9 Issue: 1, 292 - 309, 31.01.2019
https://doi.org/10.17943/etku.475149

Abstract

In learning analytics, detecting outliers and smoothing the data before the
analysis stage is important for reaching the right patterns. Outliers can be
detected in real time as well as at the end of the data collection process.
This study examines the use of outlier detection methods on educational data
from an e-learning environment. The methods were also tested on a real-time
system. Log records from the Moodle learning management system (LMS) were used
as the data set. The study group consisted of 65 students, and the total
interaction times in the hypertext, video, assessment, SCORM, and forum themes
were analyzed. The box plot, Z, Grubbs, Rosner, and Hampel methods were used to
detect outliers. Outliers were determined through manual calculations, without
existing packaged software. In addition, to evaluate how readily these methods
can be integrated into the e-learning environment, the researchers coded some
example PHP scripts. The analyses showed that the number of outliers varied by
method. Considering the experience gained and the database structure, the Z and
box plot methods are easier than the other methods to implement in e-learning
systems for real-time outlier detection. In other words, these methods are more
practical to teach to the machine. However, it should be noted that the other
methods have significant advantages in that they require a hypothesis test and
give more sensitive results. The positive and negative characteristics of these
methods are discussed in the context of machine learning.
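
As an illustration of the two methods the abstract singles out as easiest to implement, the following is a minimal Python sketch of a Z-score rule and a Tukey box plot rule. It is not the authors' PHP code. The function names, the cutoff of 3 standard deviations, the fence multiplier of 1.5, and the interaction times (in minutes) are all assumptions made for the example.

```python
import statistics


def z_score_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold.

    The threshold of 3 is a common convention, assumed here.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    if sd == 0:
        return []  # all values identical, nothing can be flagged
    return [v for v in values if abs(v - mean) / sd > threshold]


def box_plot_outliers(values, k=1.5):
    """Return the values outside the fences Q1 - k*IQR and Q3 + k*IQR.

    k = 1.5 is Tukey's usual fence multiplier, assumed here.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical total video interaction times for 12 students, in minutes.
times = [35, 42, 38, 40, 45, 37, 41, 39, 44, 36, 43, 400]

print(z_score_outliers(times))   # [400]
print(box_plot_outliers(times))  # [400]
```

Both rules need only a mean, a standard deviation, or quartiles, which is consistent with the abstract's point that they are simple to run against a log table. One caveat of the Z rule in small groups: the largest possible |z| in a sample of size n is (n - 1)/sqrt(n), so with 10 or fewer observations a cutoff of 3 can never be exceeded.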

References

  • ArcGIS Pro (2018). Box Plot. Accessed: 24.04.2018, https://pro.arcgis.com/en/pro-app/help/analysis/geoprocessing/charts/box-plot.htm.
  • Cantador, I., & Conde, J. M. (2010). Effects of competition in education: A case study in an e-learning environment. Proceedings of the IADIS International Conference E-learning 2010, Retrieved from https://pdfs.semanticscholar.org/95a0/4babb8841f3f644e2d7d497c98807eac3595.pdf
  • Chouldary, P. (2017). Introduction to Anomaly Detection. Retrieved 12.10.2018 from https://www.datascience.com/blog/python-anomaly-detection
  • Durivage, M. A. (2014). Practical engineering, process, and reliability statistics. ASQ Quality Press.
  • Ferguson, R. (2012). Learning analytics: drivers, developments and challenges. International Journal of Technology Enhanced Learning, 4(5/6), 304-317.
  • Grubbs, F. E. (1969). Procedures for detecting outlying observations in samples. Technometrics, 11(1), 1–21. https://doi.org/10.2307/1266761
  • Grubbs, F. E., & Beck, G. (1972). Extension of sample sizes and percentage points for significance tests of outlying observations. Technometrics, 14(4), 847-854.
  • Hampel, F. R. (1971). A general qualitative definition of robustness. The Annals of Mathematical Statistics, 42, 1887-1896.
  • Hampel, F. R. (1974). The influence curve and its role in robust estimation. Journal of the American Statistical Association, 69(346), 383-393.
  • Han, J., & Kamber, M. (2006). Data Mining: Concepts and Techniques. Morgan Kaufmann.
  • Hogo, M. A. (2010). Evaluation of e-learners behaviour using different fuzzy clustering models: a comparative study. arXiv preprint arXiv:1003.1499.
  • LAK. (2011) Learning Analytics & Knowledge. Retrieved from: https://tekri.athabascau.ca/analytics/
  • McGill, R., Tukey, J. W., & Larsen, W. A. (1978). Variations of box plots. The American Statistician, 32(1), 12-16.
  • Moore, D. S., & McCabe, G. P. (1999). Introduction to the Practice of Statistics, 3rd ed. New York: W. H. Freeman.
  • Moore, J. L., Dickson-Deane, C., & Galyen, K. (2011). e-Learning, online learning, and distance learning environments: Are they the same?. The Internet and Higher Education, 14(2), 129-135.
  • Orosz, G., Farkas, D., & Roland-Levy, C. (2013). Are competition and extrinsic motivation reliable predictors of academic cheating? Frontiers in Psychology, 4(87), 1-16. http://dx.doi.org/10.1080/10508422.2013.877393.
  • Rosner, B. (1983). Percentage points for a generalized ESD many-outlier procedure. Technometrics, 25(2), 165-172.
  • Siemens, G. (2013). Learning analytics: The emergence of a discipline. American Behavioral Scientist, 57(10), 1380-1400.
  • Tukey, J. W. (1977). Exploratory Data Analysis. Reading, MA: Addison-Wesley.

EĞİTSEL VERİ MADENCİLİĞİ VE ÖĞRENME ANALİTİKLERİ BAĞLAMINDA E-ÖĞRENME VERİLERİNDE AYKIRI GÖZLEMLERİN BELİRLENMESİ

Abstract

One of the most important benefits of e-learning technologies is that learning
data are recorded. These data are analyzed in the context of educational data
mining and are also used as learning analytics. However, not every recorded
data point is a sound learning record. For this reason, detecting outliers and
making corrections before the analysis stage is important for reaching correct
results. Outliers can be detected while the data are being generated
(real-time) as well as from the data sets obtained at the end of the process.
This study addresses the use of outlier detection methods on educational data
obtained from an e-learning environment. Log records of the Moodle learning
management system (LMS), used over one course term, served as the data set. The
data set consists of the total interaction times of 65 students for hypertext,
video, assessment, SCORM, and forum interactions. The Z, Grubbs, Rosner, box
plot, and Hampel methods were used to detect the outliers. The outliers were
determined by carrying out the calculations on spreadsheets, without ready-made
software packages. The analyses showed that the number of outlying (anomalous)
observations varied by method. Considering the experience gained and the
database structure, the Z and box plot methods were found to be easier than the
other methods to apply in an e-learning system for detecting outliers at run
time; in other words, these methods are more practical to teach to the machine.
Nevertheless, it should be kept in mind that the other methods have an
important advantage in that they require a hypothesis test and give more
sensitive results.
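
To illustrate why the number of flagged observations can differ between methods, here is a minimal Python sketch of a Hampel-style identifier based on the median and the median absolute deviation (MAD). It is an assumption-laden example, not the authors' procedure: the scale factor 1.4826 and the cutoff of 3 are one common formulation, the exact cutoff used in the article is not stated here, and the function name and data are hypothetical.

```python
import statistics


def hampel_outliers(values, k=3.0, scale=1.4826):
    """Return the values farther than k scaled MADs from the median.

    scale = 1.4826 makes the MAD comparable to a standard deviation
    under normality; k = 3 is an assumed cutoff.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # A zero MAD gives no usable scale; this sketch flags nothing
        # rather than flagging every value that differs from the median.
        return []
    cutoff = k * scale * mad
    return [v for v in values if abs(v - med) > cutoff]


# Hypothetical total interaction times for 12 students, in minutes,
# with two extreme records.
times = [35, 42, 38, 40, 45, 37, 41, 39, 44, 36, 300, 400]

print(hampel_outliers(times))  # [300, 400]
```

On this made-up sample, a Z-score rule with a cutoff of 3 would flag neither extreme value, because the two large records inflate the mean and standard deviation (the largest |z| is about 2.5). The median and MAD are barely affected by them, so the Hampel-style rule flags both.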

There are 19 citations in total.

Details

Primary Language Turkish
Journal Section Articles
Authors

Sinan Keskin 0000-0003-0483-3897

Furkan Aydın 0000-0003-2471-9725

Halil Yurdugül 0000-0001-7856-4664

Publication Date January 31, 2019
Published in Issue Year 2019 Volume: 9 Issue: 1

Cite

APA Keskin, S., Aydın, F., & Yurdugül, H. (2019). EĞİTSEL VERİ MADENCİLİĞİ VE ÖĞRENME ANALİTİKLERİ BAĞLAMINDA E-ÖĞRENME VERİLERİNDE AYKIRI GÖZLEMLERİN BELİRLENMESİ. Eğitim Teknolojisi Kuram Ve Uygulama, 9(1), 292-309. https://doi.org/10.17943/etku.475149