FiFTy: Large-Scale File Fragment Type Identification Using Convolutional Neural Networks

Govind Mittal, Pawel Korus, Nasir Memon

Research output: Contribution to journal › Article › peer-review

Abstract

We present FiFTy, a modern file-type identification tool for memory forensics and data carving. In contrast to previous approaches based on hand-crafted features, we design a compact neural network architecture that uses a trainable embedding space. Our approach dispenses with explicit feature extraction, which has been a bottleneck in legacy systems. We evaluate the proposed method on a novel dataset with 75 file types, the most diverse and balanced dataset reported to date. FiFTy consistently outperforms all baselines in terms of speed, accuracy and individual misclassification rates. It achieves an average accuracy of 77.5% at a processing speed of ≈38 sec/GB, making it both more accurate and more than an order of magnitude faster than the previous state-of-the-art tool, Sceadan (69% at 9 min/GB). Our tool and the corresponding dataset are open-source.
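
The comparison with Sceadan can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the abstract (≈38 sec/GB and 77.5% for FiFTy, 9 min/GB and 69% for Sceadan). The helper names and the 1000 GB disk image are hypothetical, chosen for illustration, and are not part of the FiFTy tool.

```python
# Figures quoted in the abstract, with both speeds converted to seconds per GB.
FIFTY_SEC_PER_GB = 38.0         # "≈38 sec/GB"
SCEADAN_SEC_PER_GB = 9 * 60.0   # "9 min/GB"
FIFTY_ACCURACY = 77.5           # percent
SCEADAN_ACCURACY = 69.0         # percent


def speedup(baseline_sec_per_gb: float, new_sec_per_gb: float) -> float:
    """Return how many times faster the new tool is than the baseline."""
    return baseline_sec_per_gb / new_sec_per_gb


def hours_for(size_gb: float, sec_per_gb: float) -> float:
    """Return the processing time in hours for an image of the given size."""
    return size_gb * sec_per_gb / 3600.0


ratio = speedup(SCEADAN_SEC_PER_GB, FIFTY_SEC_PER_GB)
accuracy_gain = FIFTY_ACCURACY - SCEADAN_ACCURACY

# Hypothetical 1000 GB image, used only to put the per-GB rates in context.
image_gb = 1000.0
fifty_hours = hours_for(image_gb, FIFTY_SEC_PER_GB)
sceadan_hours = hours_for(image_gb, SCEADAN_SEC_PER_GB)

print(f"speedup: {ratio:.1f}x")
print(f"accuracy gain: {accuracy_gain:.1f} percentage points")
print(f"{image_gb:.0f} GB image: {fifty_hours:.1f} h vs {sceadan_hours:.1f} h")
```

At these rates the speed ratio is roughly 14, consistent with the "more than an order of magnitude" claim, and the accuracy gap is 8.5 percentage points.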

Original language: English (US)
Article number: 9122499
Pages (from-to): 28-41
Number of pages: 14
Journal: IEEE Transactions on Information Forensics and Security
Volume: 16
State: Published - 2021

Keywords

  • File-type classification
  • carving
  • convolutional neural network
  • machine learning
  • memory forensics

ASJC Scopus subject areas

  • Safety, Risk, Reliability and Quality
  • Computer Networks and Communications
