TY - JOUR
T1 - Malware characteristics and threats on the internet ecosystem
AU - Chen, Zhongqiang
AU - Roussopoulos, Mema
AU - Liang, Zhanyan
AU - Zhang, Yuan
AU - Chen, Zhongrong
AU - Delis, Alex
N1 - Funding Information:
Mema Roussopoulos is an Assistant Professor of Computer Science at the Department of Informatics and Telecommunications at the University of Athens in Athens, Greece. She completed her PhD in Computer Science and was a Postdoctoral Fellow in the Computer Science Department at Stanford University. She was an Assistant Professor of Computer Science on the Gordon McKay Endowment at Harvard University. She then was a faculty member at the Department of Computer Science at the University of Crete and an Associated Researcher at the Institute of Computer Science at FORTH. Her interests are the areas of distributed systems, networking, mobile computing, and digital preservation. She is a recipient of the CAREER award from the National Science Foundation, a Starting Grant Award from the European Research Council, and the Best Paper Award at ACM SOSP 2003.
Funding Information:
We are grateful to the reviewers for their comments and to Peter Wei of Trend Micro Inc. for fruitful discussions on the proposed framework. This work has been partially supported by the European Commission D4Science II FP7 Project and the ERC Starting Grant Project (no. 279237).
PY - 2012/7
Y1 - 2012/7
N2 - Malware encyclopedias now play a vital role in disseminating information about security threats. Coupled with categorization and generalization capabilities, such encyclopedias might help better defend against both isolated and clustered specimens. In this paper, we present Malware Evaluator, a classification framework that treats malware categorization as a supervised learning task, builds learning models with both support vector machines and decision trees, and finally visualizes classifications with self-organizing maps. Malware Evaluator refrains from using readily available taxonomic features to produce species classifications. Instead, we generate attributes of malware strains via a tokenization process and select the attributes used according to their projected information gain. We also deploy word stemming and stopword removal techniques to reduce dimensions of the feature space. In contrast to existing approaches, Malware Evaluator defines its taxonomic features based on the behavior of species throughout their life-cycle, allowing it to discover properties that previously might have gone unobserved. The learning and generalization capabilities of the framework also help detect and categorize zero-day attacks. Our prototype helps establish that malicious strains improve their penetration rate through multiple propagation channels as well as compact code footprints; moreover, they attempt to evade detection by resorting to code polymorphism and information encryption. Malware Evaluator also reveals that breeds in the categories of Trojan, Infector, Backdoor, and Worm significantly contribute to the malware population and impose critical risks on the Internet ecosystem.
AB - Malware encyclopedias now play a vital role in disseminating information about security threats. Coupled with categorization and generalization capabilities, such encyclopedias might help better defend against both isolated and clustered specimens. In this paper, we present Malware Evaluator, a classification framework that treats malware categorization as a supervised learning task, builds learning models with both support vector machines and decision trees, and finally visualizes classifications with self-organizing maps. Malware Evaluator refrains from using readily available taxonomic features to produce species classifications. Instead, we generate attributes of malware strains via a tokenization process and select the attributes used according to their projected information gain. We also deploy word stemming and stopword removal techniques to reduce dimensions of the feature space. In contrast to existing approaches, Malware Evaluator defines its taxonomic features based on the behavior of species throughout their life-cycle, allowing it to discover properties that previously might have gone unobserved. The learning and generalization capabilities of the framework also help detect and categorize zero-day attacks. Our prototype helps establish that malicious strains improve their penetration rate through multiple propagation channels as well as compact code footprints; moreover, they attempt to evade detection by resorting to code polymorphism and information encryption. Malware Evaluator also reveals that breeds in the categories of Trojan, Infector, Backdoor, and Worm significantly contribute to the malware population and impose critical risks on the Internet ecosystem.
KW - Malware characteristics and categorization
KW - Malware propagation mechanisms and payloads
KW - Self-organizing maps
KW - Support vector machines
UR - http://www.scopus.com/inward/record.url?scp=84861096936&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84861096936&partnerID=8YFLogxK
U2 - 10.1016/j.jss.2012.02.015
DO - 10.1016/j.jss.2012.02.015
M3 - Article
AN - SCOPUS:84861096936
SN - 0164-1212
VL - 85
SP - 1650
EP - 1672
JO - Journal of Systems and Software
JF - Journal of Systems and Software
IS - 7
ER -