Comparative Analysis of Principle Component Analysis and Anova Feature Selection in Malware Detection

Taha Etem

doi:10.62520/fujece.1635121

Research Article

Year 2026, Volume: 5, Issue: 1, pp. 299-315, 28.02.2026

https://izlik.org/JA75WY33BW

Abstract

This study presents a comparative analysis of Principal Component Analysis (PCA) and ANOVA-based feature selection for Android malware detection, evaluating their impact on classification accuracy and computational efficiency. Three preprocessing scenarios were examined: the original dataset with 241 features; PCA-based feature extraction, which retained all components under the applied variance threshold; and ANOVA-based selection, which reduced the feature set to 120. Support Vector Machine (SVM), Wide Neural Network, and Logistic Regression classifiers were trained on each dataset, with hyperparameters optimized via 5-fold cross-validation. SVM achieved the highest accuracy in every scenario, peaking at 99.25% with PCA. However, PCA did not reduce dimensionality and increased SVM training time relative to the original dataset. ANOVA, in contrast, reduced the feature count and lowered SVM training time to 4.81 seconds while achieving 98.95% accuracy. These findings identify ANOVA as the more computationally efficient method, balancing high detection performance against reduced resource demands. Although PCA improved accuracy marginally, its computational cost makes it less practical for real-time applications. The study concludes that ANOVA-based feature selection offers a better trade-off between accuracy and efficiency for Android malware detection. Future work should explore more advanced feature selection techniques and validate the models on diverse datasets to improve generalizability and address evolving malware threats.
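ANOVA-based feature selection is commonly implemented as a one-way ANOVA F-test per feature: each feature is scored by F = (SS_between / (k - 1)) / (SS_within / (n - k)), where k is the number of classes and n the number of samples, and only the highest-scoring features are kept (here, 120 of 241). The study's toolchain and exact selection settings are not stated in this abstract, so the following is only a minimal pure-Python sketch of that technique; the helper names `anova_f_scores` and `select_top_features` and the toy data are hypothetical.

```python
# Minimal sketch of ANOVA F-test feature ranking (hypothetical helper names);
# not the study's own implementation.
from statistics import fmean


def anova_f_scores(X, y):
    """One-way ANOVA F-statistic of every feature (column of X) against labels y.

    F = (SS_between / (k - 1)) / (SS_within / (n - k)), with k classes, n samples.
    """
    n = len(X)
    classes = sorted(set(y))
    k = len(classes)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")

    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        grand_mean = fmean(column)
        ss_between = 0.0  # spread of class means around the grand mean
        ss_within = 0.0   # spread of samples around their own class mean
        for c in classes:
            group = [v for v, label in zip(column, y) if label == c]
            group_mean = fmean(group)
            ss_between += len(group) * (group_mean - grand_mean) ** 2
            ss_within += sum((v - group_mean) ** 2 for v in group)
        ms_between = ss_between / (k - 1)
        ms_within = ss_within / (n - k)
        if ms_within == 0:
            # No within-class spread: maximally discriminative if the class means
            # differ, otherwise a constant feature that carries no information.
            scores.append(float("inf") if ms_between > 0 else 0.0)
        else:
            scores.append(ms_between / ms_within)
    return scores


def select_top_features(X, y, n_keep):
    """Return the indices of the n_keep highest-scoring features and the reduced matrix."""
    scores = anova_f_scores(X, y)
    # Rank by descending F-score; ties go to the lower column index.
    ranked = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
    keep = sorted(ranked[:n_keep])
    return keep, [[row[j] for j in keep] for row in X]


# Hypothetical toy data: 4 samples, 3 features, labels 0 = benign, 1 = malware.
X = [[1, 3, 2], [2, 5, 3], [4, 4, 3], [5, 4, 4]]
y = [0, 0, 1, 1]
scores = anova_f_scores(X, y)                   # [18.0, 0.0, 2.0]
keep, X_reduced = select_top_features(X, y, 2)  # keep == [0, 2]
```

On a 241-column matrix, calling `select_top_features(X, y, 120)` would mirror the reduction described above. In scikit-learn, `SelectKBest(f_classif, k=120)` ranks features by the same ANOVA F-statistic. In a cross-validated setup, the scores would normally be computed on the training folds only, so that the held-out fold does not influence which features are kept.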

Keywords

Malware detection, Feature selection, Support vector machines, Android security

Ethics Statement

Ethics committee approval was not required for this article. The author declares no conflict of interest with any person or institution.

Details

Primary Language	English
Subjects	Software Engineering (Other)
Section	Research Article
Authors	Taha Etem (ORCID 0000-0003-1419-5008)
Submission Date	7 February 2025
Acceptance Date	6 January 2026
Publication Date	28 February 2026
DOI	https://doi.org/10.62520/fujece.1635121
IZ	https://izlik.org/JA75WY33BW
Issue	Year 2026, Volume: 5, Issue: 1

Cite

APA	Etem, T. (2026). Comparative Analysis of Principle Component Analysis and Anova Feature Selection in Malware Detection. Firat University Journal of Experimental and Computational Engineering, 5(1), 299-315. https://doi.org/10.62520/fujece.1635121
AMA	1. Etem T. Comparative Analysis of Principle Component Analysis and Anova Feature Selection in Malware Detection. Firat University Journal of Experimental and Computational Engineering. 2026;5(1):299-315. doi:10.62520/fujece.1635121
Chicago	Etem, Taha. 2026. “Comparative Analysis of Principle Component Analysis and Anova Feature Selection in Malware Detection”. Firat University Journal of Experimental and Computational Engineering 5 (1): 299-315. https://doi.org/10.62520/fujece.1635121.
EndNote	Etem T (01 February 2026) Comparative Analysis of Principle Component Analysis and Anova Feature Selection in Malware Detection. Firat University Journal of Experimental and Computational Engineering 5 1 299–315.
IEEE	[1] T. Etem, “Comparative Analysis of Principle Component Analysis and Anova Feature Selection in Malware Detection”, Firat University Journal of Experimental and Computational Engineering, vol. 5, no. 1, pp. 299–315, Feb. 2026, doi: 10.62520/fujece.1635121.
ISNAD	Etem, Taha. “Comparative Analysis of Principle Component Analysis and Anova Feature Selection in Malware Detection”. Firat University Journal of Experimental and Computational Engineering 5/1 (01 February 2026): 299-315. https://doi.org/10.62520/fujece.1635121.
JAMA	1. Etem T. Comparative Analysis of Principle Component Analysis and Anova Feature Selection in Malware Detection. Firat University Journal of Experimental and Computational Engineering. 2026;5:299–315.
MLA	Etem, Taha. “Comparative Analysis of Principle Component Analysis and Anova Feature Selection in Malware Detection”. Firat University Journal of Experimental and Computational Engineering, vol. 5, no. 1, February 2026, pp. 299-315, doi:10.62520/fujece.1635121.
Vancouver	1. Taha Etem. Comparative Analysis of Principle Component Analysis and Anova Feature Selection in Malware Detection. Firat University Journal of Experimental and Computational Engineering. 01 February 2026;5(1):299-315. doi:10.62520/fujece.1635121

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).