Abstract
We present FiFTy, a modern file-type identification tool for memory forensics and data carving. In contrast to previous approaches based on hand-crafted features, we design a compact neural network architecture that uses a trainable embedding space. Our approach dispenses with explicit feature extraction, which has been a bottleneck in legacy systems. We evaluate the proposed method on a novel dataset with 75 file types, the most diverse and balanced dataset reported to date. FiFTy consistently outperforms all baselines in terms of speed, accuracy, and individual misclassification rates. We achieved an average accuracy of 77.5% at a processing speed of ≈38 sec/GB, which is both more accurate and more than an order of magnitude faster than the previous state-of-the-art tool, Sceadan (69% at 9 min/GB, i.e. ≈540 sec/GB). Our tool and the corresponding dataset are open-source.
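The abstract does not spell out the layers, so the following is only a minimal sketch of the general idea (a byte-level classifier with a trainable embedding instead of hand-crafted features), assuming one embedding lookup per byte value, a single 1D convolution with ReLU, global average pooling, and a dense softmax layer over the 75 classes. The layer sizes, the random initialization, and the names `init_params` and `forward` are hypothetical; the weights are untrained, so the output is not a meaningful prediction and this is not FiFTy's actual network.

```python
import math
import random

NUM_CLASSES = 75   # number of file types in the dataset (from the abstract)
EMBED_DIM = 8      # hypothetical embedding size
NUM_FILTERS = 16   # hypothetical number of convolution filters
KERNEL = 3         # hypothetical convolution width (in bytes)


def init_params(seed=0):
    """Randomly initialise a toy model; real weights would be learned in training."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "embed": mat(256, EMBED_DIM),  # one trainable vector per possible byte value
        "conv": [mat(KERNEL, EMBED_DIM) for _ in range(NUM_FILTERS)],
        "conv_b": [0.0] * NUM_FILTERS,
        "dense": mat(NUM_FILTERS, NUM_CLASSES),
        "dense_b": [0.0] * NUM_CLASSES,
    }


def forward(fragment: bytes, p) -> list:
    """Return class probabilities for one raw file fragment."""
    if len(fragment) < KERNEL:
        raise ValueError("fragment shorter than the convolution kernel")
    # 1. Embedding lookup: raw bytes -> dense vectors (no explicit feature extraction).
    x = [p["embed"][b] for b in fragment]
    # 2. 1D convolution (valid padding) + ReLU, then 3. global average pooling.
    feats = []
    for f, kernel in enumerate(p["conv"]):
        acts = []
        for t in range(len(x) - KERNEL + 1):
            s = p["conv_b"][f]
            for k in range(KERNEL):
                s += sum(w * v for w, v in zip(kernel[k], x[t + k]))
            acts.append(max(0.0, s))
        feats.append(sum(acts) / len(acts))
    # 4. Dense layer -> logits, followed by a numerically stable softmax.
    logits = [
        p["dense_b"][c] + sum(feats[f] * p["dense"][f][c] for f in range(NUM_FILTERS))
        for c in range(NUM_CLASSES)
    ]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Usage: classify a 512-byte dummy fragment with the untrained toy model.
params = init_params()
probs = forward(bytes(range(256)) * 2, params)
predicted = max(range(NUM_CLASSES), key=probs.__getitem__)
```

Because the embedding is indexed directly by byte value, the model learns its own representation of each byte during training, which is the role the trainable embedding space plays in place of hand-crafted features such as byte histograms.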
Original language | English (US) |
---|---|
Article number | 9122499 |
Pages (from-to) | 28-41 |
Number of pages | 14 |
Journal | IEEE Transactions on Information Forensics and Security |
Volume | 16 |
State | Published - 2021 |
Keywords
- File-type classification
- carving
- convolutional neural network
- machine learning
- memory forensics
ASJC Scopus subject areas
- Safety, Risk, Reliability and Quality
- Computer Networks and Communications