Phishing E-mail Detection with Deep Learning Models

Original title: Derin Öğrenme Modelleri ile Kimlik Avı E-posta Tespiti

Şeydanur Ahi; İbrahim Soğukpınar

Research Article

Year 2020, Volume: 13 Issue: 2, 17 - 31, 16.12.2020

Abstract

Social engineering is the art of obtaining information from people through deception, with or without the use of technology. The vast majority of attacks encountered today are human in origin, and they likewise target the people who use systems rather than the systems themselves. Humans, the weakest link in the security chain, exhibit various vulnerabilities in the security process because their behavior varies over time. Phishing is a type of social engineering attack designed to capture consumers' financial or personal information. It is one of the biggest challenges facing the e-commerce world, and many companies and individuals lose billions of dollars to phishing attacks. Because this global impact will continue to grow, more effective phishing detection techniques need to be developed to reduce the threat. This work proposes a detection method against phishing e-mail attacks that is built on deep learning models. In the proposed method, various deep learning models were trained on features obtained from the header and body sections of incoming e-mails. In tests, the proposed detection method achieved a success rate of 96.84%.
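The abstract does not list the features that were extracted, so the following is only a minimal sketch of what pulling features from the header and body of an e-mail might look like. It uses Python's standard `email` package. The feature names (`has_reply_to_mismatch`, `num_urls`, and so on), the URL pattern, and the sample message are all hypothetical illustrations, not the authors' feature set.

```python
import re
from email import message_from_string
from email.message import Message
from email.utils import parseaddr

# Simple URL pattern; an assumption for illustration, not a full URL grammar.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
IP_URL_RE = re.compile(r"https?://\d{1,3}(\.\d{1,3}){3}")


def get_body_text(msg: Message) -> str:
    """Concatenate the decoded text parts of a (possibly multipart) message."""
    parts = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)  # bytes, or None for containers
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            parts.append(payload.decode(charset, errors="replace"))
        except LookupError:  # unknown charset label in the message
            parts.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(parts)


def extract_features(raw_email: str) -> dict:
    """Return a few hypothetical header and body features for one e-mail."""
    msg = message_from_string(raw_email)
    body = get_body_text(msg)
    urls = URL_RE.findall(body)
    from_addr = parseaddr(msg.get("From", ""))[1].lower()
    reply_to = parseaddr(msg.get("Reply-To", ""))[1].lower()
    return {
        # Header features
        "has_reply_to_mismatch": bool(reply_to) and reply_to != from_addr,
        "subject_length": len(msg.get("Subject", "")),
        # Body features
        "num_urls": len(urls),
        "has_ip_url": any(IP_URL_RE.match(u) for u in urls),
        "has_html": any(p.get_content_type() == "text/html" for p in msg.walk()),
        "body_length": len(body),
    }


# Made-up sample message using reserved example domains and addresses.
sample = (
    "From: Support <support@example.com>\n"
    "Reply-To: attacker@example.net\n"
    "Subject: Verify your account\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "Please log in at http://192.0.2.10/login to verify.\n"
)
features = extract_features(sample)
print(features)
```

A feature dictionary of this kind could be turned into a numeric vector for a classifier. The keywords also mention Word2Vec and LSTM, which suggests that the body text may be modeled as a word sequence as well, but the abstract gives no details on that step.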

Keywords

Social Engineering Attack, Phishing E-mail Detection, Deep Learning, Multi-Layer Perceptron, LSTM, Word2Vec

Details

Primary Language	Turkish
Subjects	Engineering
Section	Articles (Research)
Authors	Şeydanur Ahi (ORCID 0000-0001-8511-440X), İbrahim Soğukpınar (ORCID 0000-0002-0408-0277)
Publication Date	December 16, 2020
Published in Issue	Year 2020 Volume: 13 Issue: 2

Cite

APA	Ahi, Ş., & Soğukpınar, İ. (2020). Derin Öğrenme Modelleri ile Kimlik Avı E-posta Tespiti. Türkiye Bilişim Vakfı Bilgisayar Bilimleri ve Mühendisliği Dergisi, 13(2), 17-31.
AMA	Ahi Ş, Soğukpınar İ. Derin Öğrenme Modelleri ile Kimlik Avı E-posta Tespiti. TBV-BBMD. December 2020;13(2):17-31.
Chicago	Ahi, Şeydanur, and İbrahim Soğukpınar. “Derin Öğrenme Modelleri ile Kimlik Avı E-posta Tespiti”. Türkiye Bilişim Vakfı Bilgisayar Bilimleri ve Mühendisliği Dergisi 13, no. 2 (December 2020): 17-31.
EndNote	Ahi Ş, Soğukpınar İ (01 December 2020) Derin Öğrenme Modelleri ile Kimlik Avı E-posta Tespiti. Türkiye Bilişim Vakfı Bilgisayar Bilimleri ve Mühendisliği Dergisi 13 2 17–31.
IEEE	Ş. Ahi and İ. Soğukpınar, “Derin Öğrenme Modelleri ile Kimlik Avı E-posta Tespiti”, TBV-BBMD, vol. 13, no. 2, pp. 17–31, 2020.
ISNAD	Ahi, Şeydanur - Soğukpınar, İbrahim. “Derin Öğrenme Modelleri ile Kimlik Avı E-posta Tespiti”. Türkiye Bilişim Vakfı Bilgisayar Bilimleri ve Mühendisliği Dergisi 13/2 (December 2020), 17-31.
JAMA	Ahi Ş, Soğukpınar İ. Derin Öğrenme Modelleri ile Kimlik Avı E-posta Tespiti. TBV-BBMD. 2020;13:17–31.
MLA	Ahi, Şeydanur, and İbrahim Soğukpınar. “Derin Öğrenme Modelleri ile Kimlik Avı E-posta Tespiti”. Türkiye Bilişim Vakfı Bilgisayar Bilimleri ve Mühendisliği Dergisi, vol. 13, no. 2, 2020, pp. 17-31.
Vancouver	Ahi Ş, Soğukpınar İ. Derin Öğrenme Modelleri ile Kimlik Avı E-posta Tespiti. TBV-BBMD. 2020;13(2):17-31.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.