Published January 1, 2011 | Version v1
Conference paper | Open access

Proposal of n-gram Based Algorithm for Malware Classification

  • 1. Scientific and Technological Research Council of Turkey, National Research Institute of Electronics and Cryptology, Gebze, Turkey
  • 2. Galatasaray University, Department of Computer Engineering, Istanbul, Turkey

Description

Obfuscation techniques degrade the n-gram features of the binary form of malware. This study presents a methodology for classifying malware instances using n-gram features of their disassembled code. The proposed statistical method uses these n-gram features to classify each instance by family. An n-gram is a fixed-size sliding window over a byte array, where n is the window size. The main contribution of the method is that it represents each malware subfamily with a single vector, called the subfamily centroid. Using only one vector per subfamily for classification reduces the dimension of the n-gram space. Experiments were performed on a fairly large data set, collected through Computer Emergency Response Team (CERT) activities at the National Research Institute of Electronics and Cryptology, to illustrate the effectiveness of the proposed malware classification methodology.
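The centroid idea described above can be illustrated with a small sketch. It is not the authors' implementation; several details are assumptions not stated in the abstract: tokens are opcode mnemonics taken from disassembled code, each sample's n-gram counts are normalized to relative frequencies, a subfamily centroid is the mean of its samples' vectors, and a new sample is assigned to the subfamily whose centroid has the highest cosine similarity. All function names and the toy training data are hypothetical.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Slide a window of size n over the token sequence and count each n-gram."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def normalize(counts):
    """Assumption: convert raw counts to relative frequencies (sum to 1)."""
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def centroid(samples, n):
    """Average the normalized n-gram vectors of one subfamily into a single vector."""
    acc = Counter()
    for tokens in samples:
        for g, f in normalize(ngrams(tokens, n)).items():
            acc[g] += f
    return {g: v / len(samples) for g, v in acc.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(g, 0.0) for g, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(tokens, centroids, n):
    """Assumption: assign the sample to the subfamily with the most similar centroid."""
    vec = normalize(ngrams(tokens, n))
    return max(centroids, key=lambda name: cosine(vec, centroids[name]))


# Hypothetical toy data: opcode sequences for two made-up subfamilies.
training = {
    "family_a.v1": [["push", "mov", "call", "pop", "ret"], ["push", "mov", "call", "ret"]],
    "family_b.v1": [["xor", "jmp", "xor", "jmp", "nop"], ["xor", "jmp", "nop", "nop"]],
}
N = 2
centroids = {name: centroid(samples, N) for name, samples in training.items()}
print(classify(["push", "mov", "call", "pop"], centroids, N))  # family_a.v1
```

Because each subfamily is reduced to one centroid, classifying a new sample needs one comparison per subfamily rather than one per training sample. The same `ngrams` function works unchanged on a raw byte sequence, which corresponds to the binary-level n-grams that the abstract says obfuscation degrades.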

Files

bib-537944f7-1eff-42dc-b955-96fcafc0aa65.txt (231 Bytes, md5:4e21ee4f936123943277b158e8325317)