Semantic Detection of Turkish Phishing Emails: An Approach Based on Natural Language Processing and Deep Learning
Year 2025,
Volume: 1, Issue: 1, 29-42, 30.06.2025
Merve Gül Taş
Abstract
This study presents a text classification approach for detecting phishing attacks in Turkish email content at the semantic level. A balanced dataset of legitimate and fraudulent emails was used. Preprocessing consisted of lowercasing, punctuation removal, and TF-IDF-based vectorization, while contextual representations were obtained with the BERTurk model. Classification was performed with Naive Bayes, SVM, LSTM, ELM, and BERT. The models were trained on Google Colab and evaluated by accuracy, F1 score, and ROC-AUC. The results show that the BERT model successfully distinguishes the semantic differences in Turkish phishing emails. The study aims to fill a gap in the literature on phishing detection in morphologically rich languages and proposes a scalable model that can be integrated into real-time cybersecurity systems. The findings also demonstrate the effectiveness of contextual natural language processing methods for low-resource languages.
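As a rough illustration of the preprocessing steps named above (lowercasing, punctuation removal, TF-IDF weighting), the following Python sketch uses only the standard library. It is not the study's implementation: the function names, the Turkish-aware handling of "I" and "İ", the restriction to ASCII punctuation, and the particular TF-IDF formula are all assumptions made for this example.

```python
import math
import string
from collections import Counter

# Assumption: map the two Turkish-specific uppercase letters before calling
# str.lower(), which would otherwise turn "I" into "i" and "İ" into "i" + U+0307.
_TR_LOWER = str.maketrans({"I": "ı", "İ": "i"})
# Assumption: only ASCII punctuation is stripped; each mark becomes a space.
_PUNCT = str.maketrans({ch: " " for ch in string.punctuation})


def normalize(text):
    """Lowercase with Turkish casing rules and blank out ASCII punctuation."""
    return text.translate(_TR_LOWER).lower().translate(_PUNCT)


def tokenize(text):
    """Split normalized text on whitespace."""
    return normalize(text).split()


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * (ln(N / df) + 1),
    one of several common TF-IDF variants.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append({
            term: (count / total) * (math.log(n_docs / df[term]) + 1.0)
            for term, count in counts.items()
        })
    return vectors
```

The explicit mapping matters for Turkish because dotted and dotless "i" are distinct letters; a plain `str.lower()` would merge "I" with "i" and could split one word form into several features. The resulting sparse vectors are the kind of input a Naive Bayes or SVM classifier would consume.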
Ethical Statement
This article does not contain any studies involving human or animal subjects. Scientific and ethical principles were followed during its preparation, and all sources used are listed in the references section.
Supporting Institution
This research received no external funding.
Acknowledgments
I thank my advisor, who contributed generously throughout this study, and all the academic staff who provided support.
Semantic Detection of Turkish Phishing Emails: A Natural Language Processing and Deep Learning-Based Approach
Abstract
This study proposes a semantic-based text classification approach for detecting phishing attacks in Turkish email content. A balanced dataset consisting of legitimate and fraudulent emails was constructed. The preprocessing phase included case normalization, punctuation removal, and TF-IDF-based vectorization, while contextual embeddings were obtained using the BERTurk model. Classification was performed using Naive Bayes, SVM, LSTM, ELM, and BERT algorithms. All models were trained in the Google Colab environment, and their performance was assessed using accuracy, F1-score, and ROC-AUC metrics. Results indicate that the BERT model provides superior performance in identifying semantic differences in Turkish phishing emails. The study addresses the gap in phishing detection for morphologically rich languages such as Turkish and presents a scalable model suitable for integration into real-time cybersecurity systems. The findings also demonstrate the viability of contextual NLP techniques in resource-scarce language environments.
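The three evaluation metrics named in the abstract (accuracy, F1-score, ROC-AUC) can be computed directly from labels and scores. The sketch below is a minimal pure-Python version written for illustration; the function names and the toy data in the usage note are hypothetical and do not come from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores, positive=1):
    """ROC-AUC as the probability that a random positive example
    receives a higher score than a random negative one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one example of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

For example, with hypothetical labels `[1, 0, 1, 0]` (1 = phishing), scores `[0.9, 0.4, 0.35, 0.1]`, and predictions thresholded at 0.5, accuracy is 0.75, F1 is 2/3, and ROC-AUC is 0.75. Accuracy and F1 depend on the chosen threshold, whereas ROC-AUC depends only on how the scores rank the examples.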
Ethical Statement
This article does not involve any studies with human or animal participants. Scientific and ethical principles were followed throughout the preparation of this study, and all sources used have been duly cited in the references section.
Supporting Institution
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Acknowledgments
I would like to express my sincere gratitude to my academic advisor and all supporting academic staff for their valuable contributions throughout the conduct of this study.