Research Article

Evaluation of Oversampling Methods (OVER, SMOTE, and ROSE) in Classifying Soil Liquefaction Dataset based on SVM, RF, and Naïve Bayes

Number: 34 March 31, 2022
EN TR

Evaluation of Oversampling Methods (OVER, SMOTE, and ROSE) in Classifying Soil Liquefaction Dataset based on SVM, RF, and Naïve Bayes

Abstract

Class imbalanced datasets are prevalent in real-world applications, including engineering, medical domain, financial sector, and others. Machine learning (ML)-based prediction models have successfully demonstrated the applicability of various algorithms for the solution of different problems. However, their application for the soil liquefaction issue considering the class imbalance situation is limited. This paper presents the prediction results of random forest (RF), support vector machine (SVM), and naïve bayes (NB) algorithms with different training sample sizes for soil liquefaction. The effect of oversampling methods, namely simple oversampling (OVER), random oversampling examples (ROSE), and synthetic minority oversampling technique (SMOTE), on the prediction performance of classification algorithms is also investigated. Performance results are evaluated by means of some metrics, including Accuracy, Kappa, Precision, Recall, and F-measure. The results concluded the effectiveness of applying oversampling methods on imbalanced data before the modeling phase. All of the oversampling methods helped to enhance the overall performances of the classification models. It is also observed that the SMOTE exhibited slightly better performance than other considered oversampling methods. Furthermore, the SVM model outperformed compared to RF and NB models when all algorithms were trained by the SMOTE algorithm.

Keywords

References

  1. Adalier, K., & Elgamal, A. (2004). Mitigation of liquefaction and associated ground deformations by stone columns. Engineering Geology, 72(3-4), 275-291.
  2. Allen, J. R. L. (1982). Sedimentary Structures: Their Character and Physical Basis. Volume II. Developments in Sedimentology, 30B, Amsterdam.
  3. Amiri, M., Bakhshandeh Amnieh, H., Hasanipanah, M., & Mohammad Khanli, L. (2016). A new combination of artificial neural network and K-nearest neighbors models to predict blast-induced ground vibration and air-overpressure. Engineering with Computers, 32(4), 631-644.
  4. Cetin, K. O., Seed, R. B., Der Kiureghian, A., Tokimatsu, K., Harder Jr, L. F., Kayen, R. E., & Moss, R. E. (2004). Standard penetration test-based probabilistic and deterministic assessment of seismic soil liquefaction potential. Journal of Geotechnical and Geoenvironmental Engineering, 130(12), 1314-1340.
  5. Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321-357.
  6. Chen, B., Xia, S., Chen, Z., Wang, B., & Wang, G. (2021). RSMOTE: A self-adaptive robust SMOTE for imbalanced problems with label noise. Information Sciences, 553, 397-428.
  7. Demir, S., & Sahin, E. K. (2022). Comparison of tree-based machine learning algorithms for predicting liquefaction potential using canonical correlation forest, rotation forest, and random forest based on CPT data. Soil Dynamics and Earthquake Engineering, 154, 107130.
  8. Douzas, G., & Bacao, F. (2017). Self-Organizing Map Oversampling (SOMO) for imbalanced data set learning. Expert Systems with Applications, 82, 40-52.

Details

Primary Language

English

Subjects

Engineering

Journal Section

Research Article

Publication Date

March 31, 2022

Submission Date

February 23, 2022

Acceptance Date

February 23, 2022

Published in Issue

Year 1970 Number: 34

APA
Demir, S., & Şahin, E. K. (2022). Evaluation of Oversampling Methods (OVER, SMOTE, and ROSE) in Classifying Soil Liquefaction Dataset based on SVM, RF, and Naïve Bayes. Avrupa Bilim Ve Teknoloji Dergisi, 34, 142-147. https://doi.org/10.31590/ejosat.1077867

Cited By