Visual Feature Extraction and Machine Learning for Graphical Violence Detection: A Deep Learning-Free, Efficient Approach
Year 2026, Volume: 14, Issue: 1, 216–224, 21.01.2026
Mehmet Osman Devrim, Serdar Kırışoğlu
Abstract
This study presents an efficient machine learning approach for the automatic detection of graphically violent images, a widespread problem on digital platforms. The study uses the "Graphical Violence and Safe Images" dataset, which exhibits a severe class imbalance of roughly 16:1. While the relatively small scale of the dataset is noted as a limitation, the data were first split into 75% training and 25% test sets, and data augmentation was then applied only to the minority class of the training set, in order to improve the model's generalization ability and mitigate the class imbalance.
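The split-then-augment order described above can be sketched as follows. The array sizes, the flip-based augmentations, and the use of scikit-learn's `train_test_split` are illustrative assumptions, not the study's exact configuration; the point is only that augmented copies are created after the split, so none can leak into the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Toy stand-in for the dataset: 160 "safe" vs 10 "violent" images (~16:1 imbalance).
X = rng.random((170, 32, 32, 3)).astype(np.float32)
y = np.array([0] * 160 + [1] * 10)

# Stratified 75/25 split FIRST, so augmented copies never enter the test set.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Augment only the minority class of the training set (simple flips as examples).
minority = X_tr[y_tr == 1]
flipped_h = minority[:, :, ::-1, :]   # horizontal flip (reverse width axis)
flipped_v = minority[:, ::-1, :, :]   # vertical flip (reverse height axis)
X_tr = np.concatenate([X_tr, flipped_h, flipped_v])
y_tr = np.concatenate([y_tr, np.ones(2 * len(minority), dtype=int)])
```

Because the test split is taken before augmentation, the reported metrics measure generalization to unmodified images rather than to near-duplicates of training samples.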
The core of the proposed approach is a hybrid feature extraction strategy that combines color, texture, and shape information. To improve computational efficiency, this rich feature space was reduced to the most discriminative attributes via ANOVA F-test-based feature selection and then evaluated with five different machine learning models. Experimental results show that the XGBoost model achieved the highest performance on the test set, reaching 96.55% overall accuracy and, despite the class imbalance, a strong 84.38% Macro F1-Score. These findings suggest that the proposed approach can serve as an effective alternative to deep learning, delivering high accuracy at a lower computational cost in critical applications such as automated content moderation on social media platforms, child protection filters, and digital safety systems.
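The extract-then-select pipeline above can be sketched with deliberately simplified descriptors: a 3-D color histogram and a global HOG-style orientation histogram stand in for the study's full color/texture/shape set (which also includes LBP and GLCM features), and scikit-learn's `SelectKBest(f_classif)` provides the ANOVA F-test selection. The image shapes, bin counts, and `k` are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

def color_histogram(img, bins=4):
    # 3-D RGB histogram over [0,1]^3, flattened to bins**3 features, normalized.
    h, _ = np.histogramdd(img.reshape(-1, 3), bins=(bins,) * 3,
                          range=((0, 1),) * 3)
    return h.ravel() / (img.shape[0] * img.shape[1])

def orientation_histogram(gray, bins=9):
    # HOG-style global gradient-orientation histogram, magnitude-weighted.
    # (Real HOG adds cell/block structure; this is only the core idea.)
    gy, gx = np.gradient(gray)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)
    h, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    return h / (h.sum() + 1e-9)

def extract_features(img):
    gray = img.mean(axis=2)  # simple luminance proxy
    return np.concatenate([color_histogram(img), orientation_histogram(gray)])

rng = np.random.default_rng(1)
X_img = rng.random((40, 32, 32, 3))          # toy image batch
y = np.array([0, 1] * 20)                    # toy binary labels
F = np.stack([extract_features(im) for im in X_img])   # 64 + 9 = 73 features
selector = SelectKBest(f_classif, k=20)      # ANOVA F-test feature selection
F_sel = selector.fit_transform(F, y)         # keep the 20 highest-F features
```

The selected matrix `F_sel` would then be fed to the classifiers under comparison; XGBoost is the model the study reports as strongest.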
Ethical Statement
The "Graphical Violence and Safe Images" dataset used in this study is publicly available on the Kaggle platform. The dataset contains no personal, clinical, or sensitive data; therefore, neither ethics committee approval nor participant consent was required for this study. The authors declare no conflict of interest.
Supporting Institution
Düzce University Scientific Research Projects Coordination Office
Project Number
2024.06.01.1550
Thanks
The authors thank the Düzce University Scientific Research Projects Coordination Office for its support of this project.
References
- Abundez, I. M., Alejo, R., Primero, F. P., Granda-Gutiérrez, E. E., Portillo-Rodríguez, O., & Velázquez, J. A. A. (2024). Threshold active learning approach for physical violence detection on images obtained from video (frame-level) using pre-trained deep learning neural network models. Algorithms, 17(7), Article 316. https://doi.org/10.3390/a17070316
- Azzakhnini, M., Saidi, H., Azough, A., Tairi, H., & Qjidaa, H. (2025). LAVID: A lightweight and autonomous smart camera system for urban violence detection and geolocation. Computers, 14(4), Article 140. https://doi.org/10.3390/computers14040140
- Bartwal, K. (2024). Graphical violence and safe images dataset [Data set]. Kaggle. https://doi.org/10.34740/KAGGLE/DSV/8534050
- Batista, G. E., Prati, R. C., & Monard, M. C. (2004). A study of the behavior of several methods for balancing machine learning training data. SIGKDD Explorations Newsletter, 6(1), 20–29. https://doi.org/10.1145/1007730.1007735
- Carreira, J., & Zisserman, A. (2017). Quo vadis, action recognition? A new model and the Kinetics dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 6299–6308). https://doi.org/10.1109/CVPR.2017.502
- Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357. https://doi.org/10.1613/jair.953
- Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR) (Vol. 1, pp. 886–893). https://doi.org/10.1109/CVPR.2005.177
- Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., & Houlsby, N. (2021). An image is worth 16×16 words: Transformers for image recognition at scale. In Proceedings of the International Conference on Learning Representations (ICLR). https://doi.org/10.48550/arXiv.2010.11929
- Hand, D. J., & Yu, K. (2001). Idiot's Bayes—not so stupid after all? International Statistical Review, 69(3), 385–398. https://doi.org/10.1111/j.1751-5823.2001.tb00465.x
- Haralick, R. M., Shanmugam, K., & Dinstein, I. (1973). Textural features for image classification. IEEE Transactions on Systems, Man, and Cybernetics, 3(6), 610–621. https://doi.org/10.1109/TSMC.1973.4309314
- He, H., Bai, Y., Garcia, E., & Li, S. (2008). ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN) (pp. 1322–1328). https://doi.org/10.1109/IJCNN.2008.4633969
- He, H., & Garcia, E. A. (2009). Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 21(9), 1263–1284. https://doi.org/10.1109/TKDE.2008.239
- He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 770–778). https://doi.org/10.1109/CVPR.2016.90
- Krizhevsky, A., Sutskever, I., & Hinton, G. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6), 84–90. https://doi.org/10.1145/3065386
- Kumar, A., & Pang, G. K. H. (2002). Defect detection in textured materials using Gabor filters. IEEE Transactions on Industry Applications, 38(2), 425–440. https://doi.org/10.1109/28.993164
- Nobata, C., Tetreault, J., Thomas, A., Mehdad, Y., & Chang, Y. (2016). Abusive language detection in online user content. In Proceedings of the 25th International Conference on World Wide Web (WWW) (pp. 145–153). https://doi.org/10.1145/2872427.2883062
- Ojala, T., Pietikäinen, M., & Mäenpää, T. (2002). Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7), 971–987. https://doi.org/10.1109/TPAMI.2002.1017623
- Pareek, P., & Thakkar, A. (2021). A survey on video-based human action recognition: Recent updates, datasets, challenges, and applications. Artificial Intelligence Review, 54(3), 2259–2322. https://doi.org/10.1007/s10462-020-09904-8
- Park, J.-H., Mahmoud, M., & Kang, H.-S. (2024). Conv3D-based video violence detection network using optical flow and RGB data. Sensors, 24(2), Article 317. https://doi.org/10.3390/s24020317
- Pathak, G., Kumar, A., Rawat, S., & Gupta, S. (2024). Streamlining video analysis for efficient violence detection. In Proceedings of the 15th Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP'24). https://doi.org/10.48550/arXiv.2412.02127
- Perez, L., & Wang, J. (2017). The effectiveness of data augmentation in image classification using deep learning. arXiv. https://doi.org/10.48550/arXiv.1712.04621
- Powers, D. M. (2011). Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. International Journal of Machine Learning Technology, 2(1), 37–63. https://doi.org/10.48550/arXiv.2010.16061
- Qi, B., Wu, B., & Sun, B. (2025). Automated violence monitoring system for real-time fistfight detection using deep learning-based temporal action localization. Scientific Reports, 15(1), Article 29497. https://doi.org/10.1038/s41598-025-12531-4
- Radford, A., Metz, L., & Chintala, S. (2016). Unsupervised representation learning with deep convolutional generative adversarial networks. In Proceedings of the International Conference on Learning Representations (ICLR). https://doi.org/10.48550/arXiv.1511.06434
- Razavian, A. S., Azizpour, H., Sullivan, J., & Carlsson, S. (2014). CNN features off-the-shelf: An astounding baseline for recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (pp. 512–519).
- Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206–215. https://doi.org/10.1038/s42256-019-0048-x
- Samek, W., Wiegand, T., & Müller, K. (2017). Explainable artificial intelligence: Understanding, visualising and interpreting deep learning models. In Proceedings of the IEEE Information Theory Workshop (ITW) (pp. 1–10). https://doi.org/10.48550/arXiv.1708.08296
- Shafizadegan, F., Naghsh-Nilchi, A. R., & Shabaninia, E. (2024). Multimodal vision-based human action recognition using deep learning: A review. Artificial Intelligence Review, 57, Article 178. https://doi.org/10.1007/s10462-024-10730-5
- Sharma, D. K., Singh, B., Agarwal, S., Garg, L., Kim, C., & Jung, K.-H. (2023). A survey of detection and mitigation for fake images on social media platforms. Applied Sciences, 13(19), Article 10980. https://doi.org/10.3390/app131910980
- Shorten, C., & Khoshgoftaar, T. (2019). A survey on image data augmentation for deep learning. Journal of Big Data, 6, Article 60. https://doi.org/10.1186/s40537-019-0197-0
- Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations (ICLR). https://doi.org/10.48550/arXiv.1409.1556
- Sun, C., Shrivastava, A., Singh, S., & Gupta, A. (2017). Revisiting unreasonable effectiveness of data in deep learning era. In Proceedings of the IEEE International Conference on Computer Vision (ICCV) (pp. 843–852).
- Swain, M. J., & Ballard, D. H. (1991). Color indexing. International Journal of Computer Vision, 7(1), 11–32. https://doi.org/10.1007/BF00130487
- Varghese, E. B., Elzein, A., Yang, Y., & Qaraqe, M. (2025). A temporal–spatial deep learning framework leveraging dynamic 3D attention maps for violence detection. Neural Computing and Applications, 37, 26689–26709. https://doi.org/10.1007/s00521-025-11641-4
- Yao, X., Wu, Y., Xie, G., Zhao, D., & Xia, D.-H. (2025). A comprehensive survey on multimodal computer vision (CV)-assisted corrosion assessment and corrosion prediction. Corrosion Engineering, Science and Technology, 1–38. https://doi.org/10.1177/1478422X251387524
Visual Feature Extraction and Machine Learning for Graphical Violence Detection: A Deep Learning-Free, Efficient Approach
Abstract
This study proposes an efficient, deep learning-free approach for detecting graphically violent images to address the challenges of high computational costs and class imbalance in digital content moderation. Using the "Graphical Violence and Safe Images" dataset, we employed a hybrid feature extraction strategy combining color (3D Histogram), texture (Local Binary Patterns [LBP], Gray-Level Co-occurrence Matrix [GLCM]), and shape (Histogram of Oriented Gradients [HOG]) descriptors, followed by Analysis of Variance (ANOVA)-based feature selection. Among five machine learning models evaluated, XGBoost achieved the highest performance with 96.55% accuracy and an 84.38% Macro F1-Score on the test set. Furthermore, the proposed method offers a processing time of approximately 33.85 ms per image on a standard CPU. The results demonstrate that the proposed method offers a computationally efficient and interpretable alternative to deep learning for real-time applications.
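A per-image CPU latency figure like the ~33.85 ms quoted above can be measured as sketched below. Here `extract_and_classify` is a hypothetical placeholder for the actual feature-extraction-plus-XGBoost pipeline, and the batch size and image shape are illustrative assumptions; `time.perf_counter` is the standard high-resolution timer for such benchmarks.

```python
import time
import numpy as np

def extract_and_classify(img):
    # Hypothetical stand-in for the real pipeline
    # (feature extraction + selected features + XGBoost inference).
    return img.mean() > 0.5

rng = np.random.default_rng(0)
images = rng.random((100, 224, 224, 3))  # toy batch of 100 RGB images

start = time.perf_counter()
preds = [extract_and_classify(im) for im in images]
elapsed_ms = (time.perf_counter() - start) * 1000 / len(images)
print(f"avg per-image latency: {elapsed_ms:.3f} ms")
```

Averaging over many images smooths out timer jitter; for a fair comparison with GPU-based deep models, the measurement should run on the same CPU-only hardware the paper targets.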
Ethical Statement
This study does not involve human or animal participants. All procedures followed scientific and ethical principles, and all referenced studies are appropriately cited.
Supporting Institution
This study was supported by the Düzce University Scientific Research Projects Coordination Office (BAP - 2024.06.01.1550) through the HIZDEP project.
Project Number
2024.06.01.1550
Thanks
The authors would like to thank Düzce University Scientific Research Projects Coordinatorship (BAP) for their support.