Malware encyclopedias now play a vital role in disseminating information about security threats. Coupled with categorization and generalization capabilities, such encyclopedias might help better defend against both isolated and clustered specimens. In this paper, we present Malware Evaluator, a classification framework that treats malware categorization as a supervised learning task, builds learning models with both support vector machines and decision trees, and finally visualizes the classifications with self-organizing maps. Malware Evaluator refrains from using readily available taxonomic features to produce species classifications. Instead, we generate attributes of malware strains via a tokenization process and select the attributes to use according to their projected information gain. We also apply word stemming and stopword removal to reduce the dimensionality of the feature space. In contrast to existing approaches, Malware Evaluator defines its taxonomic features based on the behavior of species throughout their life cycle, allowing it to discover properties that might previously have gone unobserved. The learning and generalization capabilities of the framework also help detect and categorize zero-day attacks. Our prototype helps establish that malicious strains improve their penetration rate through multiple propagation channels as well as compact code footprints; moreover, they attempt to evade detection by resorting to code polymorphism and information encryption. Malware Evaluator also reveals that breeds in the categories of Trojan, Infector, Backdoor, and Worm contribute significantly to the malware population and pose critical risks to the Internet ecosystem.
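The attribute-generation step described in the abstract (tokenization, stopword removal, and selection by information gain) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the tokenizer, the stopword list, or the stemmer, so the `STOPWORDS` set and the function names `tokenize`, `entropy`, `information_gain`, and `select_features` below are hypothetical, and stemming is omitted for brevity. Each token is treated as a binary feature (present or absent in a description), and tokens are ranked by how much knowing their presence reduces the entropy of the class labels.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; the paper's actual list is not given in the abstract.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "via", "is", "it"}


def tokenize(text):
    """Lowercase the text, split on non-alphanumerics, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, token):
    """Entropy reduction from splitting the samples on presence of `token`."""
    with_tok = [y for d, y in zip(docs, labels) if token in d]
    without_tok = [y for d, y in zip(docs, labels) if token not in d]
    n = len(labels)
    conditional = sum(
        len(part) / n * entropy(part)
        for part in (with_tok, without_tok)
        if part  # skip an empty side of the split
    )
    return entropy(labels) - conditional


def select_features(texts, labels, k):
    """Return the k tokens with the highest information gain (ties broken alphabetically)."""
    docs = [tokenize(t) for t in texts]
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]
```

Under these assumptions, calling `select_features(texts, labels, k)` on a list of textual malware descriptions and their category labels returns the `k` most discriminative tokens, which could then serve as the feature vector for a classifier such as a support vector machine or a decision tree.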
Keywords
- Malware characteristics and categorization
- Malware propagation mechanisms and payloads
- Self-organizing maps
- Support vector machines
ASJC Scopus subject areas
- Information Systems
- Hardware and Architecture