
Comparison of Pattern Matching Techniques on Identification of Same Family Malware

Year 2015, Volume: 4 Issue: 3, 104 - 111, 29.09.2015

Abstract

Development in computing technology over the past decade has also given rise to threats against users, particularly in the form of malware. However, manual malware identification is being overwhelmed by the sheer number of new malware samples created every day. Most malware is not created from scratch; a large share consists of variants belonging to a particular malware family. This means that the same, or a slightly modified, solution can be applied to counter their threat. This paper analyzes string matching methods for identifying malware of the same family. We investigate and compare the effectiveness of three well-known pattern matching algorithms, namely Jaro, Longest Common Subsequence (LCS), and N-Gram. Our results show that similarity thresholds of 0.79 for Jaro, 0.79 for LCS, and 0.54 for N-Gram were effective for detecting string similarity between malware samples.
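The abstract names three similarity measures and a threshold for each, but does not spell out how each score is computed. The Python sketch below shows one plausible reading, assuming the strings have already been extracted from the samples (the extraction step is not shown). Jaro follows its standard definition. The LCS normalization (LCS length divided by the longer string's length), the N-Gram variant (Dice coefficient over sets of character bigrams), the `>=` decision rule, and all function names are assumptions made for illustration, not the paper's confirmed method, so the reported thresholds may not transfer exactly to these scores.

```python
"""Sketch of the three string-similarity measures named in the abstract.

Assumptions (not taken from the paper): LCS similarity is normalized by
the longer string's length, and N-gram similarity is the Dice coefficient
over sets of character bigrams. Function names are hypothetical.
"""


def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if not s1 and not s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    # Characters only count as matching within this distance window.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, c in enumerate(s1):
        lo, hi = max(0, i - window), min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the matched characters that appear out of order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3


def lcs_similarity(s1: str, s2: str) -> float:
    """LCS length divided by the longer string's length (assumed)."""
    if not s1 and not s2:
        return 1.0
    # Classic dynamic-programming table, kept to two rows.
    prev = [0] * (len(s2) + 1)
    for c1 in s1:
        curr = [0]
        for j, c2 in enumerate(s2, start=1):
            if c1 == c2:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1] / max(len(s1), len(s2))


def ngram_similarity(s1: str, s2: str, n: int = 2) -> float:
    """Dice coefficient over sets of character n-grams (n=2 assumed)."""
    g1 = {s1[i:i + n] for i in range(len(s1) - n + 1)}
    g2 = {s2[i:i + n] for i in range(len(s2) - n + 1)}
    if not g1 and not g2:
        # Both strings are shorter than n: fall back to exact equality.
        return 1.0 if s1 == s2 else 0.0
    return 2 * len(g1 & g2) / (len(g1) + len(g2))


# Thresholds reported in the abstract; scores at or above them are
# treated here as "same family" (the >= comparison is an assumption).
THRESHOLDS = {"jaro": 0.79, "lcs": 0.79, "ngram": 0.54}
MEASURES = {"jaro": jaro, "lcs": lcs_similarity, "ngram": ngram_similarity}


def same_family(s1: str, s2: str) -> dict:
    """Return, per measure, whether the score clears its threshold."""
    return {name: MEASURES[name](s1, s2) >= THRESHOLDS[name]
            for name in MEASURES}
```

For example, `same_family("MARTHA", "MARHTA")` gives `{"jaro": True, "lcs": True, "ngram": False}`: Jaro scores 17/18 and LCS 5/6, while only 2 of the 5 bigrams in each string are shared (Dice 0.4). The measures can therefore disagree on the same pair, which is why each one is given its own threshold.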

Index Terms— Jaro, Longest Common Subsequence, Malware Analysis, N-gram, String Similarity

References

  • Microsoft, "The evolution of malware and the threat landscape - a 10-year review: key findings," Feb. 2012. [Online; accessed September]. Available: http://download.microsoft.com/download/1/A/7/ A76A73B-6C5B-41CF-9E8C F7709B870F/Microsoft-Security-Intelligence-Report Special-Edition-10-Year-Review-Key-Findings- Summary.pdf
  • M. R. Islam, R. Tian, L. Batten, and S. Versteeg, "Classification of malware based on string and function feature selection," in 2010 Second Cybercrime and Trustworthy Computing Workshop (CTC), pp. 9-17. IEEE, 2010.
  • A. Walenstein, M. Venable, M. Hayes, C. Thompson, and A. Lakhotia, "Exploiting similarity between variants to defeat malware," in Proc. BlackHat DC Conf., 2007.
  • K. Kendall and C. McMillan, "Practical malware analysis," in Black Hat Conference, USA, 2007.
  • J. H. Park, M. Kim, B. Noh, and J. Joshi, "A Similarity based Technique for Detecting Malicious Executable files for Computer Forensics," in 2006 IEEE International Conference on Information Reuse and Integration, pp. 193. IEEE, 2006.
  • V. Levenshtein, "Binary codes capable of correcting deletions, insertions, and reversals," Soviet Physics Doklady, vol. 10, pp. 707-710, USSR, 1966.
  • J. Lee, C. Im, and H. Jeong, "A study of malware detection and classification by comparing extracted strings," in Proceedings of the 5th International Conference on Ubiquitous Information Management and Communication, pp. 75. ACM, 2011.
  • A. Sulaiman, S. Mandada, S. Mukkamala, and A. Sung, "Similarity Analysis of Malicious Executables," in Proceedings of the 2nd International Conference on Information Warfare & Security, pp. 225. Academic Conferences Limited, 2007.
  • M. Jaro, "Advances in record linkage methodology as applied to the 1985 census of Tampa, Florida," Journal of the American Statistical Association, vol. 84, pp. 414-, 1989.
  • L. Bergroth, H. Hakonen, and T. Raita, "A Survey of Longest Common Subsequence Algorithms," in SPIRE (IEEE Computer Society), pp. 39-48, 2000.
  • D. Plohman, "Portable Executable 101 - a windows executable," Aug. 2014. [Online; accessed August 2014]. Available: https://code.google.com/p/corkami/wiki/PE101?show=content
  • D. G. Altman and J. M. Bland, "Diagnostic tests. 1: Sensitivity and specificity," British Medical Journal, vol. 308, 1994.
  • D. Olson and D. Delen, Advanced Data Mining Techniques, 1st ed. Springer, 2008, p. 138.
There are 15 citations in total.

Details

Primary Language English
Journal Section Articles
Authors

Ferdiansyah Mastjik

Cihan Varol

Asaf Varol

Publication Date September 29, 2015
Submission Date January 30, 2016
Published in Issue Year 2015 Volume: 4 Issue: 3

Cite

IEEE: F. Mastjik, C. Varol, and A. Varol, "Comparison of Pattern Matching Techniques on Identification of Same Family Malware," IJISS, vol. 4, no. 3, pp. 104–111, 2015.