Comparison of Pattern Matching Techniques on Identification of Same Family Malware
Abstract
Development in computing technology over the past decade has also given rise to threats against users, particularly in the form of malware. Manual malware identification is being overwhelmed by the sheer number of malware samples created every day. Most malware is not created from scratch; a large share consists of variants derived from a particular malware family. This means that the same or a slightly modified countermeasure can be applied against them. This paper analyzes string matching methods for identifying malware that belongs to the same family. We investigate and compare the effectiveness of three well-known pattern matching algorithms, namely Jaro, Longest Common Subsequence (LCS), and N-Gram. We found that thresholds of 0.79 for Jaro, 0.79 for LCS, and 0.54 for N-Gram were effective for detecting string similarity between malware samples.
Index Terms— Jaro, Longest Common Subsequence, Malware Analysis, N-gram, String Similarity
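The three similarity measures named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state how LCS and N-Gram scores are normalized, so the sketch assumes LCS length divided by the longer string's length and a Dice coefficient over character bigrams. The function names, the `THRESHOLDS` table layout, and the "score at or above threshold" rule are likewise assumptions made for the example; only the threshold values themselves come from the abstract.

```python
from collections import Counter

# Threshold values as reported in the abstract; the dict itself is illustrative.
THRESHOLDS = {"jaro": 0.79, "lcs": 0.79, "ngram": 0.54}


def jaro(s1, s2):
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def lcs_length(a, b):
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        curr = [0]
        for j, bj in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if ch == bj else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a, b):
    """Assumed normalization: LCS length over the longer string's length."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


def ngram_similarity(a, b, n=2):
    """Assumed variant: Dice coefficient over multisets of character n-grams."""
    grams_a = Counter(a[i:i + n] for i in range(len(a) - n + 1))
    grams_b = Counter(b[i:i + n] for i in range(len(b) - n + 1))
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 1.0 if a == b else 0.0
    shared = sum((grams_a & grams_b).values())
    return 2 * shared / total


def compare(a, b):
    """Return, per measure, whether the score reaches its threshold."""
    scores = {
        "jaro": jaro(a, b),
        "lcs": lcs_similarity(a, b),
        "ngram": ngram_similarity(a, b),
    }
    return {name: score >= THRESHOLDS[name] for name, score in scores.items()}
```

Calling `compare(string_a, string_b)` on two strings extracted from different samples gives one boolean per measure. How those per-string decisions are aggregated into a family-level verdict is not described in the abstract and is left open here.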
Details
Primary Language
English
Publication Date
September 29, 2015
Submission Date
January 30, 2016
Published in Issue
Year 2015 Volume: 4 Number: 3