Research Article

Sampling Techniques and Application in Machine Learning in order to Analyse Crime Dataset

Number: 38 August 31, 2022

Abstract

Machine learning enables machines to learn from data and to make inferences using what they have learned. In this article, five years of crime data were analyzed and used to train the models. One-Hot Encoding, Min-Max Normalization and the Principal Component Analysis algorithm were used to prepare and analyze the data. The models were asked to predict whether the criminal could be caught, how secure the area is, and the type of crime committed, using the K-Nearest Neighbors, Random Forest and Extreme Gradient Boosting algorithms. However, on an imbalanced dataset the results can be misleading no matter how successful the model appears to be. Therefore, the main purpose of this article is to transform the imbalanced data into balanced data by various methods and to find the sampling method that is most accurate for the data and compatible with the classification method. For this purpose, one statistical sampling method (Stratify), three over-sampling methods (Random Over Sampler, Synthetic Minority Over-sampling Technique, Adaptive Synthetic), three under-sampling methods (Random Under Sampler, Near Miss, Neighborhood Cleaning Rule) and one mixed sampling method (SMOTE Tomek) were applied to reduce the imbalance in target variables such as Arrest, Crime Type and Security. The sampling methods applied yielded efficient and effective results.
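To make the idea of balancing concrete, the following is a minimal sketch of the two simplest methods named in the abstract, random over-sampling and random under-sampling, written with the Python standard library only. It is not the authors' implementation: the function names, the toy rows, and the 8-to-2 split of a hypothetical binary Arrest label are assumptions chosen for illustration.

```python
import random
from collections import Counter


def group_by_class(rows, labels):
    """Collect the rows belonging to each class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_over_sample(rows, labels, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest class."""
    rng = random.Random(seed)
    groups = group_by_class(rows, labels)
    target = max(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        # Draw the missing rows with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_under_sample(rows, labels, seed=0):
    """Keep a random subset of every class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = group_by_class(rows, labels)
    target = min(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        # Sample without replacement so no row is kept twice.
        out_rows.extend(rng.sample(group, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Hypothetical toy data: 8 records without an arrest, 2 with an arrest.
rows = [[i] for i in range(10)]
labels = [False] * 8 + [True] * 2

over_rows, over_labels = random_over_sample(rows, labels)
under_rows, under_labels = random_under_sample(rows, labels)

print("original:     ", Counter(labels))
print("over-sampled: ", Counter(over_labels))
print("under-sampled:", Counter(under_labels))
```

In practice, resampling of this kind is usually applied to the training split only, so that the evaluation data keeps the original class distribution. Whether the article follows that protocol is not stated in the abstract.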

Details

Primary Language

English

Subjects

Engineering

Journal Section

Research Article

Publication Date

August 31, 2022

Submission Date

May 11, 2022

Acceptance Date

June 14, 2022

Published in Issue

Year 2022 Number: 38

APA
Saylı, A., & Başarır, S. (2022). Sampling Techniques and Application in Machine Learning in order to Analyse Crime Dataset. Avrupa Bilim Ve Teknoloji Dergisi, 38, 296-310. https://doi.org/10.31590/ejosat.1115323
AMA
1. Saylı A, Başarır S. Sampling Techniques and Application in Machine Learning in order to Analyse Crime Dataset. EJOSAT. 2022;(38):296-310. doi:10.31590/ejosat.1115323
Chicago
Saylı, Ayla, and Sevil Başarır. 2022. “Sampling Techniques and Application in Machine Learning in Order to Analyse Crime Dataset”. Avrupa Bilim Ve Teknoloji Dergisi, nos. 38: 296-310. https://doi.org/10.31590/ejosat.1115323.
EndNote
Saylı A, Başarır S (August 1, 2022) Sampling Techniques and Application in Machine Learning in order to Analyse Crime Dataset. Avrupa Bilim ve Teknoloji Dergisi 38 296–310.
IEEE
[1] A. Saylı and S. Başarır, “Sampling Techniques and Application in Machine Learning in order to Analyse Crime Dataset”, EJOSAT, no. 38, pp. 296–310, Aug. 2022, doi: 10.31590/ejosat.1115323.
ISNAD
Saylı, Ayla - Başarır, Sevil. “Sampling Techniques and Application in Machine Learning in Order to Analyse Crime Dataset”. Avrupa Bilim ve Teknoloji Dergisi. 38 (August 1, 2022): 296-310. https://doi.org/10.31590/ejosat.1115323.
JAMA
1. Saylı A, Başarır S. Sampling Techniques and Application in Machine Learning in order to Analyse Crime Dataset. EJOSAT. 2022;(38):296–310.
MLA
Saylı, Ayla, and Sevil Başarır. “Sampling Techniques and Application in Machine Learning in Order to Analyse Crime Dataset”. Avrupa Bilim Ve Teknoloji Dergisi, no. 38, Aug. 2022, pp. 296-310, doi:10.31590/ejosat.1115323.
Vancouver
1. Ayla Saylı, Sevil Başarır. Sampling Techniques and Application in Machine Learning in order to Analyse Crime Dataset. EJOSAT. 2022 Aug. 1;(38):296-310. doi:10.31590/ejosat.1115323
