A comprehensive review on data preprocessing techniques in data analysis
Abstract
As technology advances, the amount of data stored in computer environments is increasing very rapidly. Data analysis has become an important research subject for evaluating these data correctly and transforming them into useful information. Model performance in data analysis, however, depends heavily on the characteristics of the data. For this reason, it is essential to preprocess the data before starting any analysis. Data preprocessing produces accurate and useful datasets by addressing erroneous, incomplete, or otherwise problematic data. In this study, papers on data preprocessing published in the last five years were reviewed systematically, and the widely used preprocessing methods were found to fall under three main branches: data cleaning, data transformation, and data reduction. These methods and their various algorithms are examined, their frequency of use is presented, and they are compared in terms of accuracy. The results of the study show that, when preprocessing is not applied to raw data or when unsuitable preprocessing methods are applied, data analysis methods alone cannot achieve sufficient performance.
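As a rough illustration of the three branches named above, the following minimal Python sketch applies one simple technique from each branch to a tiny numeric table. The specific techniques (mean imputation, min-max scaling, dropping constant columns), the function names, and the sample data are assumptions chosen for illustration; the abstract does not state which algorithms the study covers.

```python
from statistics import mean


def impute_mean(rows):
    """Data cleaning (illustrative): replace missing values (None) with the
    mean of the observed values in the same column.
    Assumes every column has at least one observed value."""
    cols = list(zip(*rows))
    means = [mean(v for v in col if v is not None) for col in cols]
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def min_max_scale(rows):
    """Data transformation (illustrative): rescale each column to [0, 1].
    A constant column is mapped to 0.0 to avoid division by zero."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)]
            for row in rows]


def drop_constant_columns(rows):
    """Data reduction (illustrative): drop columns whose values never vary,
    since they carry no information for a model."""
    cols = list(zip(*rows))
    keep = [j for j, c in enumerate(cols) if len(set(c)) > 1]
    return [[row[j] for j in keep] for row in rows]


def preprocess(rows):
    """Chain the three steps: cleaning, then transformation, then reduction."""
    return drop_constant_columns(min_max_scale(impute_mean(rows)))


# Hypothetical raw data: one missing value and one constant column.
raw = [
    [1.0, 10.0, 5.0],
    [2.0, None, 5.0],
    [3.0, 30.0, 5.0],
]
clean = preprocess(raw)
print(clean)
```

The order of the steps matters in this sketch: scaling requires that missing values have already been filled, and the constant third column is removed only at the end.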
Details
Primary Language
English
Subjects
Engineering
Section
Review
Publication Date
April 30, 2022
Submission Date
April 21, 2021
Acceptance Date
July 7, 2021
Published in Issue
Year 2022, Volume: 28, Issue: 2
APA
Çetin, V., & Yıldız, O. (2022). A comprehensive review on data preprocessing techniques in data analysis. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi, 28(2), 299-312. https://izlik.org/JA72KF92WU