Year 2021, Volume 13, Issue 1, Pages 140-151, 2021-01-18

Decision Trees in Large Data Sets

Zeynep ÇETİNKAYA [1], Fahrettin HORASAN [2]


Data mining is the process of extracting information from data in order to identify and define the relationships between data of different characteristics. One of the important problems encountered in this process is classification in large data sets. Extensive research has been devoted to this classification problem, and a variety of solution methods have been proposed. Some decision tree algorithms are among the structures that can be used effectively in this field. This article discusses various decision tree structures and algorithms used for classification in large data sets. Along with the definitions of the algorithms, the similarities and differences between them are identified, and their advantages and disadvantages are examined.
Keywords: Decision trees, Decision tree algorithms, Big data sets, Scalable decision trees
Primary Language: en
Subjects: Engineering
Section: Articles
Authors

ORCID: 0000-0002-6815-5186
Author: Zeynep ÇETİNKAYA
Institution: KIRIKKALE UNIVERSITY, FACULTY OF ENGINEERING, DEPARTMENT OF COMPUTER ENGINEERING
Country: Turkey


ORCID: 0000-0003-4554-9083
Author: Fahrettin HORASAN (Corresponding Author)
Institution: Dr., Kırıkkale University
Country: Turkey


Dates

Publication Date: January 18, 2021

APA: Çetinkaya, Z., Horasan, F. (2021). Decision Trees in Large Data Sets. International Journal of Engineering Research and Development, 13(1), 140-151. DOI: 10.29137/umagd.763490