Advances in computing technology over the past decade have also given rise to threats against users, particularly in the form of malware. Manual malware identification is being overwhelmed by the sheer number of malware samples created every day. Most malware is not created from scratch; a large share consists of variants derived from a particular malware family. This means that the same or a slightly modified countermeasure can be applied against them. This paper analyzes string matching methods for identifying malware of the same family. We investigate and compare the effectiveness of three well-known pattern matching algorithms: Jaro, Longest Common Subsequence (LCS), and N-Gram. We found that thresholds of 0.79 for Jaro, 0.79 for LCS, and 0.54 for N-Gram were effective for detecting string similarity between malware samples.
Index Terms— Jaro, Longest Common Subsequence, Malware Analysis, N-gram, String Similarity
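As an illustration of the three measures named in the abstract, the following is a minimal Python sketch, not the authors' implementation. The abstract does not say how LCS and N-Gram scores are normalized or how strings are extracted from samples, so the choices here are assumptions: LCS length is divided by the longer string's length, N-Gram similarity is the Dice coefficient over character bigram sets, and the helper `exceeds_thresholds` is a hypothetical name that simply compares each score with the thresholds reported above.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def lcs_length(s1: str, s2: str) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(s2) + 1)
    for ch1 in s1:
        curr = [0]
        for j, ch2 in enumerate(s2, start=1):
            if ch1 == ch2:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(s1: str, s2: str) -> float:
    """ASSUMPTION: normalize LCS length by the longer string's length."""
    longest = max(len(s1), len(s2))
    return 1.0 if longest == 0 else lcs_length(s1, s2) / longest


def ngram_similarity(s1: str, s2: str, n: int = 2) -> float:
    """ASSUMPTION: Dice coefficient over sets of character n-grams (n=2)."""
    g1 = {s1[i:i + n] for i in range(len(s1) - n + 1)}
    g2 = {s2[i:i + n] for i in range(len(s2) - n + 1)}
    if not g1 and not g2:
        return 1.0 if s1 == s2 else 0.0
    return 2 * len(g1 & g2) / (len(g1) + len(g2))


# Thresholds as reported in the abstract.
THRESHOLDS = {"jaro": 0.79, "lcs": 0.79, "ngram": 0.54}


def exceeds_thresholds(s1: str, s2: str) -> dict:
    """Hypothetical helper: per-measure flag, True when the score meets its threshold."""
    scores = {
        "jaro": jaro(s1, s2),
        "lcs": lcs_similarity(s1, s2),
        "ngram": ngram_similarity(s1, s2),
    }
    return {name: scores[name] >= THRESHOLDS[name] for name in scores}
```

How the three flags would be combined into a single same-family decision is not stated in the abstract, so the sketch reports them separately.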
Primary Language | English |
---|---|
Journal Section | Articles |
Authors | |
Publication Date | September 29, 2015 |
Submission Date | January 30, 2016 |
Published in Issue | Year 2015 Volume: 4 Issue: 3 |