Research Article

Phishing E-mail Detection with Deep Learning Models

Year 2020, Volume: 13 Issue: 2, 17 - 31, 16.12.2020

Abstract

Social engineering is the art of obtaining information from people through deception, with or without the use of technology. The vast majority of today's attacks are of human origin and, likewise, target computer users. People are the weakest link in the security chain and exhibit various weaknesses in the security process because their behavior varies over time. Phishing is a type of social engineering attack designed to capture consumers' financial or personal information. It is one of the biggest challenges facing the e-commerce world, and many companies and individuals lose billions of dollars to phishing attacks. Because this global impact will continue to grow, more effective phishing detection techniques need to be developed to reduce the threat. This work proposes a detection method that uses deep learning models against phishing e-mail attacks. In the proposed method, various deep learning models were trained on features obtained from the header and body of incoming e-mails. In tests, the proposed detection method achieved a success rate of 96.84% against phishing attacks.
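
The abstract states that features are taken from the header and body of each incoming e-mail, but it does not list them. The following is a minimal Python sketch of what such an extraction step might look like, using only the standard library. The function names (`body_text`, `extract_features`) and the specific features (subject length, a Reply-To mismatch, URL counts) are illustrative assumptions, not the feature set used by the authors.

```python
import re
from email import message_from_string
from email.message import Message
from email.utils import parseaddr

# Simple URL pattern for illustration; real-world URL detection is more involved.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
# Matches URLs whose host is a dotted IPv4 address.
IP_URL_RE = re.compile(r"https?://\d{1,3}(\.\d{1,3}){3}", re.IGNORECASE)


def body_text(msg: Message) -> str:
    """Concatenate all text parts (plain or HTML) of a message."""
    parts = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            parts.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset label: fall back to UTF-8.
            parts.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(parts)


def extract_features(raw_email: str) -> dict:
    """Return a few hypothetical header and body features for one e-mail."""
    msg = message_from_string(raw_email)
    body = body_text(msg)
    urls = URL_RE.findall(body)
    from_addr = parseaddr(msg.get("From", ""))[1].lower()
    reply_addr = parseaddr(msg.get("Reply-To", ""))[1].lower()
    return {
        # Header features (assumed for illustration)
        "subject_length": len(msg.get("Subject", "")),
        "reply_to_differs": bool(reply_addr) and reply_addr != from_addr,
        # Body features (assumed for illustration)
        "num_urls": len(urls),
        "has_ip_url": any(IP_URL_RE.match(u) for u in urls),
        "body_length": len(body),
    }


SAMPLE = (
    "From: Support <support@example.com>\n"
    "Reply-To: attacker@example.net\n"
    "Subject: Verify your account\n"
    "\n"
    "Please confirm at http://192.0.2.10/login now.\n"
)

if __name__ == "__main__":
    print(extract_features(SAMPLE))
```

A dictionary like this could then be converted into a numeric vector and passed to a classifier; how the features are encoded and which model architectures consume them is left open here, since the abstract does not specify it.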

Derin Öğrenme Modelleri ile Kimlik Avı E-posta Tespiti


Abstract

Social engineering is the art of obtaining information from people (deception), with or without the use of technology. A very large share of the attacks we face today are of human origin and, likewise, target not systems but the people who use them. People, the weakest link in the security chain, can show various vulnerabilities in the security process because they behave differently at different times. Phishing is a type of social engineering attack technically designed to capture consumers' financial or personal information. Phishing is one of the biggest challenges facing the e-commerce world today. Many companies and individuals lose billions of dollars because of phishing attacks. This global impact of phishing attacks will continue to grow, and therefore more effective phishing detection techniques need to be developed to reduce threats. In this study, a detection method built with deep learning models is proposed against phishing e-mail attacks. In the proposed method, various deep learning models were trained using features obtained from the header and body sections of incoming e-mail messages. In the tests performed, the proposed detection method achieved a success rate of 96.84% against phishing attacks.

There are 36 citations in total.

Details

Primary Language Turkish
Subjects Engineering
Journal Section Articles (Research)
Authors

Şeydanur Ahi 0000-0001-8511-440X

İbrahim Soğukpınar 0000-0002-0408-0277

Publication Date December 16, 2020
Published in Issue Year 2020 Volume: 13 Issue: 2

Cite

APA Ahi, Ş., & Soğukpınar, İ. (2020). Derin Öğrenme Modelleri ile Kimlik Avı E-posta Tespiti. Türkiye Bilişim Vakfı Bilgisayar Bilimleri Ve Mühendisliği Dergisi, 13(2), 17-31.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.