Research Article

An Up-to-Date Analysis Program Used in Data Mining: WEKA

Year 2019, Volume 10, Issue 1, pp. 80–95, 29.03.2019
https://doi.org/10.21031/epod.399832

Abstract

This study aims to introduce data mining methods and one of the most widely
used programs in the field. In today's technological age, the amount of
available information is constantly increasing, and extracting meaningful
results from this information is regarded as a highly valuable field of study.
Data mining aims to uncover, through a series of operations, information that
is hidden in large amounts of data and is highly useful to researchers. For this
approach, which is built mainly on prediction and classification, there are
many new software packages whose accuracy has not yet been fully tested. This
study explains what WEKA is (one of the first programs that comes up when
searching the literature on data mining), how the program is run, and what its
analyses and output files contain. The study also describes the strengths of
the software and offers suggestions to practitioners who want to carry out
analyses with it about what they can do.
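As a rough illustration of the figures typically found in WEKA's classifier output (overall accuracy, the kappa statistic, and per-class precision, recall and F-measure), the minimal Python sketch below computes them from a confusion matrix. It is not WEKA code: the function name `evaluation_summary` and the 2×2 matrix values are hypothetical, and it assumes the layout WEKA uses for its confusion matrix, with rows as actual classes and columns as predicted classes.

```python
from typing import List


def evaluation_summary(matrix: List[List[int]], labels: List[str]) -> dict:
    """Accuracy, Cohen's kappa and per-class precision/recall/F-measure.

    Hypothetical helper. matrix[i][j] counts instances whose actual class is
    labels[i] and whose predicted class is labels[j] (rows = actual,
    columns = predicted).
    """
    n = len(labels)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    accuracy = correct / total

    # Cohen's kappa: observed agreement corrected for the agreement expected
    # by chance from the row (actual) and column (predicted) marginals.
    row_totals = [sum(matrix[i]) for i in range(n)]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    expected = sum(row_totals[k] * col_totals[k] for k in range(n)) / (total * total)
    kappa = (accuracy - expected) / (1 - expected)

    per_class = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        precision = tp / col_totals[k] if col_totals[k] else 0.0
        recall = tp / row_totals[k] if row_totals[k] else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        per_class[label] = {"precision": precision, "recall": recall,
                            "f_measure": f_measure}

    return {"accuracy": accuracy, "kappa": kappa, "per_class": per_class}


# Hypothetical two-class example: 100 instances, 85 classified correctly.
summary = evaluation_summary([[40, 10], [5, 45]], ["yes", "no"])
print(round(summary["accuracy"], 3), round(summary["kappa"], 3))  # 0.85 0.7
```

Kappa is included alongside accuracy because it discounts the agreement a classifier would reach by chance given the class proportions, which matters when one class dominates the data.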

An Analysis Program Used in Data Mining: WEKA

Abstract

This study aims to introduce a data mining method that has become very popular
in recent years and is commonly used in the field. To this end, it presents the
WEKA program and decision trees, one of the methods used to predict a dependent
variable from independent variables. In today's technological age, the amount
of available information is constantly increasing, and deriving meaningful
results from it is regarded as a valuable field of study. Data mining aims to
reveal, through a series of operations, information that is hidden in large
amounts of data and is highly useful to researchers. For this approach, which is
based mainly on prediction and classification, many new software packages exist
whose accuracy has not yet been fully tested. This study describes WEKA, one of
the first programs encountered when searching the data mining literature,
explains how to run it, and describes the content of its analyses and output
files. It also offers suggestions to practitioners who want to use the program,
covering the strengths of the software and the kinds of analysis it supports.
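Decision trees predict the dependent (class) variable by repeatedly splitting the data on the independent variable that best separates the classes. The minimal Python sketch below, which is not WEKA code, shows two common split criteria: information gain, and the gain ratio that C4.5 (implemented in WEKA as J48) uses to reduce information gain's bias toward attributes with many values. The attribute names (`study`, `attend`), the target `pass` and the four-row data set are hypothetical, and the full C4.5 algorithm adds further steps (such as handling numeric attributes and pruning) that are omitted here.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def split_scores(rows: List[Dict[str, str]], attribute: str,
                 target: str) -> Dict[str, float]:
    """Information gain and gain ratio of splitting rows on a nominal attribute."""
    labels = [r[target] for r in rows]
    groups: Dict[str, List[str]] = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    total = len(rows)

    # Weighted entropy remaining after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder

    # Split information grows with the number of branches; dividing by it
    # penalises attributes that fragment the data into many small groups.
    split_info = -sum(len(g) / total * math.log2(len(g) / total)
                      for g in groups.values())
    ratio = gain / split_info if split_info > 0 else 0.0
    return {"gain": gain, "gain_ratio": ratio}


# Hypothetical toy data: "study" separates the classes perfectly, "attend" does not.
rows = [
    {"study": "high", "attend": "yes", "pass": "pass"},
    {"study": "high", "attend": "no", "pass": "pass"},
    {"study": "low", "attend": "yes", "pass": "fail"},
    {"study": "low", "attend": "no", "pass": "fail"},
]
best = max(["study", "attend"],
           key=lambda a: split_scores(rows, a, "pass")["gain_ratio"])
print(best)  # study
```

In a real tree the chosen attribute becomes a node, and the same scoring is applied recursively to each branch's subset of rows until the subsets are pure or too small to split.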

Details

Primary Language: Turkish
Journal Section: Articles
Authors: Gökhan Aksu (ORCID 0000-0003-2563-6112), Nuri Doğan (ORCID 0000-0001-6274-2016)
Publication Date: March 29, 2019
Acceptance Date: November 29, 2018
Published in Issue: Year 2019, Volume 10, Issue 1

Cite

APA Aksu, G., & Doğan, N. (2019). Veri Madenciliğinde Kullanılan Güncel Bir Analiz Programı: WEKA [An up-to-date analysis program used in data mining: WEKA]. Journal of Measurement and Evaluation in Education and Psychology, 10(1), 80–95. https://doi.org/10.21031/epod.399832