Evaluation of Oversampling Methods (OVER, SMOTE, and ROSE) in Classifying Soil Liquefaction Dataset based on SVM, RF, and Naïve Bayes
Abstract
Class-imbalanced datasets are prevalent in real-world applications, including engineering, medicine, and finance. Machine learning (ML)-based prediction models have been applied successfully to a wide range of problems. However, their application to soil liquefaction prediction under class imbalance remains limited. This paper presents the prediction results of random forest (RF), support vector machine (SVM), and naïve Bayes (NB) algorithms trained with different training sample sizes for soil liquefaction. The effect of three oversampling methods, namely simple oversampling (OVER), random oversampling examples (ROSE), and synthetic minority oversampling technique (SMOTE), on the prediction performance of the classification algorithms is also investigated. Performance is evaluated using Accuracy, Kappa, Precision, Recall, and F-measure. The results demonstrate the effectiveness of applying oversampling to imbalanced data before the modeling phase: all of the oversampling methods improved the overall performance of the classification models. SMOTE performed slightly better than the other oversampling methods considered. Furthermore, the SVM model outperformed the RF and NB models when all algorithms were trained on data balanced with SMOTE.
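The ideas behind two of the oversampling methods and the five metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`simple_oversample`, `smote_like`, `binary_metrics`), the toy data, and the parameter choices are assumptions made for illustration only. ROSE, which draws synthetic samples from a smoothed bootstrap, is not covered here, and `smote_like` assumes the minority class has at least two samples.

```python
import math
import random
from collections import Counter


def simple_oversample(X, y, rng):
    """OVER: duplicate randomly drawn rows of each smaller class
    until every class is as large as the largest one."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def smote_like(minority, n_new, k, rng):
    """SMOTE-style sampling: interpolate between a minority point and
    one of its k nearest minority neighbours (Euclidean distance)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> partner
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, partner)))
    return synthetic


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, Cohen's kappa, precision, recall, and F-measure for two classes."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    n = len(pairs)
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    # Chance agreement expected from the row and column totals.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return {"accuracy": accuracy, "kappa": kappa, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# Made-up toy data (not from the paper): label 1 is the minority class.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
y = [0, 0, 0, 0, 1, 1]
rng = random.Random(42)

X_bal, y_bal = simple_oversample(X, y, rng)
print(Counter(y_bal))
print(smote_like([x for x, lab in zip(X, y) if lab == 1], n_new=2, k=1, rng=rng))
print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

In practice, oversampling of this kind is usually applied to the training split only, so that duplicated or synthetic samples do not leak into the data used for evaluation.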
Details
Primary Language
English
Subjects
Engineering
Journal Section
Research Article
Publication Date
March 31, 2022
Submission Date
February 23, 2022
Acceptance Date
February 23, 2022
Published in Issue
Year 2022 Number: 34
APA
Demir, S., & Şahin, E. K. (2022). Evaluation of Oversampling Methods (OVER, SMOTE, and ROSE) in Classifying Soil Liquefaction Dataset based on SVM, RF, and Naïve Bayes. Avrupa Bilim Ve Teknoloji Dergisi, 34, 142-147. https://doi.org/10.31590/ejosat.1077867
Cited By
Glacial lakes of Sikkim Himalaya: their dynamics, trends, and likely fate—a timeseries analysis through cloud-based geocomputing, and machine learning
Geomatics, Natural Hazards and Risk
https://doi.org/10.1080/19475705.2023.2286903
A new xG model for football analytics
Journal of the Operational Research Society
https://doi.org/10.1080/01605682.2024.2323669
Customised-sampling approach for pipe failure prediction in water distribution networks
Scientific Reports
https://doi.org/10.1038/s41598-024-69109-9
TÜKETİCİLERİN ONLİNE YEMEK SİPARİŞİ MEMNUNİYETİNİN VERİ MADENCİLİĞİ ALGORİTMALARIYLA SINIFLANDIRILMASI VE PERFORMANSLARININ KARŞILAŞTIRILMASI [Classification of Consumers' Online Food Ordering Satisfaction Using Data Mining Algorithms and Comparison of Their Performance]
International Review of Economics and Management
https://doi.org/10.18825/iremjournal.1478562
FairFML: fair federated machine learning with a case study on reducing gender disparities in cardiac arrest outcome prediction
npj Health Systems
https://doi.org/10.1038/s44401-025-00035-2
Implementation of SMOTE to Improve the Performance of Random Forest Classification in Credit Risk Assessment in Banking
INTENSIF: Jurnal Ilmiah Penelitian dan Penerapan Teknologi Sistem Informasi
https://doi.org/10.29407/intensif.v9i2.23930
Landslide Susceptibility Assessment via Imbalanced Data Augmentation with Tabular Variational Autoencoder and Quality–Diversity Post-Selection
Applied Sciences
https://doi.org/10.3390/app152211965
Interpretable liquefaction prediction model based on stacking algorithm
Earth Science Informatics
https://doi.org/10.1007/s12145-025-01999-3
Stacking model based on six base classifiers to improve prediction of soil liquefaction: a multi-dataset study
Georisk: Assessment and Management of Risk for Engineered Systems and Geohazards
https://doi.org/10.1080/17499518.2025.2573637