Advances in computing technology over the past decade have also given rise to threats against users, particularly in the form of malware. Manual malware identification, however, is being overwhelmed by the sheer number of malware samples created every day. Most malware is not written from scratch; a large share consists of variants derived from a particular malware family. This means that the same or a slightly modified countermeasure can be applied against them. This paper analyzes string matching methods for identifying malware of the same family. We investigate and compare the effectiveness of three well-known pattern matching algorithms, namely Jaro, Longest Common Subsequence (LCS), and N-Gram. We found that thresholds of 0.79 for Jaro, 0.79 for LCS, and 0.54 for N-Gram were effective for detecting string similarity between malware samples.
Index Terms— Jaro, Longest Common Subsequence, Malware Analysis, N-gram, String Similarity
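The three measures named in the abstract can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how the LCS and N-gram scores are normalized, so dividing the LCS length by the longer string's length, using a Dice coefficient over character bigrams, and treating the thresholds as inclusive lower bounds are all assumptions, and the function names are hypothetical.

```python
from collections import Counter


def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        curr = [0]
        for j, bj in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if ch == bj else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """ASSUMPTION: normalize the LCS length by the longer string's length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else lcs_length(a, b) / longest


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """ASSUMPTION: Dice coefficient over character n-grams (bigrams by default)."""
    grams_a = Counter(a[i:i + n] for i in range(len(a) - n + 1))
    grams_b = Counter(b[i:i + n] for i in range(len(b) - n + 1))
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 1.0 if a == b else 0.0
    shared = sum((grams_a & grams_b).values())
    return 2 * shared / total


# Thresholds reported in the abstract; applying them as ">=" is an assumption.
THRESHOLDS = {"jaro": 0.79, "lcs": 0.79, "ngram": 0.54}


def similarity_flags(a: str, b: str) -> dict:
    """Report, per measure, whether the score reaches the reported threshold."""
    scores = {
        "jaro": jaro(a, b),
        "lcs": lcs_similarity(a, b),
        "ngram": ngram_similarity(a, b),
    }
    return {name: score >= THRESHOLDS[name] for name, score in scores.items()}
```

Calling `similarity_flags(x, y)` on two strings extracted from malware samples returns one boolean per measure. Because the reported thresholds depend on the normalization the authors actually used, scores from this sketch may not be directly comparable to theirs.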
Primary Language | English |
---|---|
Section | Articles |
Authors | |
Publication Date | 29 September 2015 |
Submission Date | 30 January 2016 |
Published in Issue | Year 2015 Volume: 4 Issue: 3 |