Research Article

Crime Prediction with DistilBERT-based Feature Extraction and Machine Learning

Year 2024, Volume: 39 Issue: 4, 1067 - 1079, 25.12.2024
https://doi.org/10.21605/cukurovaumfd.1606169

Abstract

Crime comprises all actions and behaviors that harm society and carry a legal, criminal penalty. Although fighting crime is primarily regarded as a duty of the state, studies such as this one are important for supporting that effort, because different analyses of crime data can reveal interpretable patterns. Additional measures taken on that basis then serve as a supporting element in the fight against crime. Predicting a crime that may occur makes it possible to prevent it before it happens. The analysis and prediction of crimes is therefore important for identifying and reducing future crimes. This study proposes a model in which features are extracted with DistilBERT and 8 different machine learning algorithms are used as classifiers. The dataset is the San Francisco crime dataset, which was used for an online competition run by Kaggle Inc. Unlike previous studies, all crime categories in the dataset (39 categories) are included. Extracting features with DistilBERT is another point that distinguishes the study. GridSearchCV was used for parameter optimization, and an overall improvement of 1-2% over the default parameters was observed. The highest accuracy, 99.78%, was achieved with the Support Vector Machine (SVM). In addition, with 10-fold cross-validation, higher accuracy values were again reached with the SVM and Logistic Regression (LR) classifiers.
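The parameter search and 10-fold cross-validation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code and it does not call scikit-learn's GridSearchCV; it reimplements the same idea (score every candidate parameter by k-fold cross-validation, keep the best) using only the Python standard library. The feature vectors are synthetic stand-ins for DistilBERT embeddings, a k-nearest-neighbours classifier is swapped in for brevity, the folds are not stratified, and all function names and the parameter grid are hypothetical.

```python
import random
from collections import Counter


def make_synthetic_features(n_per_class=30, n_classes=3, dim=8, seed=0):
    """Generate toy feature vectors (assumed stand-ins for DistilBERT embeddings)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(n_classes):
        center = [rng.uniform(-5, 5) for _ in range(dim)]
        for _ in range(n_per_class):
            X.append([c + rng.gauss(0, 1) for c in center])
            y.append(label)
    return X, y


def knn_predict(train_X, train_y, x, k):
    """Predict by majority vote among the k nearest training points."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k, n_folds=10, seed=0):
    """Mean accuracy of k-NN over n_folds shuffled (unstratified) folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)


def grid_search(X, y, k_grid, n_folds=10):
    """Score every candidate k by cross-validation and return the best one."""
    results = {k: cross_val_accuracy(X, y, k, n_folds) for k in k_grid}
    best_k = max(results, key=results.get)
    return best_k, results[best_k], results


X, y = make_synthetic_features()
best_k, best_score, results = grid_search(X, y, k_grid=[1, 3, 5, 7])
print(f"best k = {best_k}, mean 10-fold accuracy = {best_score:.3f}")
```

In a real pipeline the synthetic vectors would be replaced by embeddings computed from the crime records, and the hand-written loop by a library implementation of grid search over each classifier's own parameters. The scores here say nothing about the accuracy reported in the study.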

Details

Primary Language: English
Subjects: Natural Language Processing
Section: Articles
Authors

Emel Çolakoğlu 0000-0003-1755-3130

Serhat Hızlısoy 0000-0001-8440-5539

Recep Sinan Arslan 0000-0002-3028-0416

Publication Date: 25 December 2024
Submission Date: 26 July 2024
Acceptance Date: 23 December 2024
Published in Issue: Year 2024 Volume: 39 Issue: 4

Cite

APA Çolakoğlu, E., Hızlısoy, S., & Arslan, R. S. (2024). Crime Prediction with DistilBERT-based Feature Extraction and Machine Learning. Çukurova Üniversitesi Mühendislik Fakültesi Dergisi, 39(4), 1067-1079. https://doi.org/10.21605/cukurovaumfd.1606169