ViT4Mal: Lightweight Vision Transformer for Malware Detection on Edge Devices

Akshara Ravi, Vivek Chaturvedi, Muhammad Shafique

Research output: Contribution to journal, Article, peer-review


The number of edge devices connected to networks has grown tremendously in recent years. Although these devices make our lives simpler and smarter, they must perform computations under severe resource and energy constraints while remaining vulnerable to malware attacks. Once compromised, these devices are further exploited as attack vectors targeting critical infrastructure. Most existing malware detection solutions are resource- and compute-intensive and therefore perform poorly at protecting edge devices. In this paper, we propose ViT4Mal, a novel approach that uses a lightweight vision transformer (ViT) for malware detection on edge devices. ViT4Mal first converts executable byte-code into images to learn malware features and then uses a customized lightweight ViT to detect malware with high accuracy. We have performed extensive experiments comparing our model with state-of-the-art CNNs in the malware detection domain. The experimental results corroborate that ViTs do not require deeper networks to reach an accuracy of around 97%, comparable to that of heavily structured CNN models. We have also deployed the proposed lightweight ViT4Mal model on the Xilinx PYNQ Z1 FPGA board, applying specialized hardware optimizations such as quantization, loop pipelining, and array partitioning. ViT4Mal achieved an accuracy of approximately 94% and a 41× speedup compared to the original ViT model.
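The abstract does not state ViT4Mal's exact byte-to-image mapping, image width, patch size, or quantization scheme. The sketch below is therefore only an illustration under stated assumptions: it uses the common approach of mapping each byte of an executable to one grayscale pixel in a fixed-width image, splits that image into non-overlapping square patches (the tokens a ViT consumes), and applies a generic symmetric per-tensor int8 quantization as one example of the kind of quantization used for FPGA deployment. The function names, the default width of 64, and the patch size are hypothetical; loop pipelining and array partitioning are HLS-level optimizations and are not shown.

```python
import math


def bytes_to_grayscale(data: bytes, width: int = 64) -> list[list[int]]:
    """Assumed mapping: one byte (0-255) -> one grayscale pixel, row-major.

    The final row is zero-padded so every row has exactly `width` pixels.
    """
    rows = math.ceil(len(data) / width)
    padded = data + bytes(rows * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(rows)]


def to_patches(image: list[list[int]], patch: int) -> list[list[int]]:
    """Split an H x W image into non-overlapping patch x patch tiles.

    Each tile is flattened row by row into one vector (one ViT token).
    Tiles are emitted left to right, top to bottom.
    """
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image dimensions must be multiples of the patch size")
    tokens = []
    for r0 in range(0, h, patch):
        for c0 in range(0, w, patch):
            tokens.append([image[r][c]
                           for r in range(r0, r0 + patch)
                           for c in range(c0, c0 + patch)])
    return tokens


def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Generic symmetric per-tensor int8 quantization (not ViT4Mal-specific).

    scale = max|x| / 127 and q = clamp(round(x / scale), -127, 127);
    the original value is approximated by q * scale.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


# Usage: 1024 bytes -> 32 x 32 image -> sixteen 8 x 8 patches of 64 values each.
image = bytes_to_grayscale(bytes(range(256)) * 4, width=32)
tokens = to_patches(image, patch=8)
weights_q, weights_scale = quantize_int8([0.5, -1.0, 0.25])
```

Fixing the image width keeps every input the same shape, which a fixed-size transformer requires; real pipelines typically also resize or crop to a target resolution, which this sketch omits.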

Original language: English (US)
Article number: 117
Journal: ACM Transactions on Embedded Computing Systems
Issue number: 5s
State: Published - Sep 9 2023


Keywords

  • FPGA
  • IoT
  • hardware optimization
  • inference latency
  • malware
  • matrix multiplication
  • resource-constrained
  • vision transformer (ViT)

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture

