TY - JOUR
T1 - FiFTy
T2 - Large-Scale File Fragment Type Identification Using Convolutional Neural Networks
AU - Mittal, Govind
AU - Korus, Pawel
AU - Memon, Nasir
N1 - Funding Information:
Manuscript received August 9, 2019; revised January 1, 2020 and June 5, 2020; accepted June 6, 2020. Date of publication June 22, 2020; date of current version July 27, 2020. This work was supported in part by New York University (NYU) Abu Dhabi, United Arab Emirates. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Luisa Verdoliva. (Corresponding author: Paweł Korus.) Govind Mittal and Nasir Memon are with the Center for Cybersecurity, New York University Tandon School of Engineering, Brooklyn Borough, NY 11201 USA (e-mail: [email protected]; [email protected]).
Publisher Copyright:
© 2005-2012 IEEE.
PY - 2021
Y1 - 2021
N2 - We present FiFTy, a modern file-type identification tool for memory forensics and data carving. In contrast to previous approaches based on hand-crafted features, we design a compact neural network architecture, which uses a trainable embedding space. Our approach dispenses with the explicit feature extraction which has been a bottleneck in legacy systems. We evaluate the proposed method on a novel dataset with 75 file-types - the most diverse and balanced dataset reported to date. FiFTy consistently outperforms all baselines in terms of speed, accuracy and individual misclassification rates. We achieved an average accuracy of 77.5% with processing speed of ≈38 sec/GB, which is better and more than an order of magnitude faster than the previous state-of-the-art tool - Sceadan (69% at 9 min/GB). Our tool and the corresponding dataset are open-source.
AB - We present FiFTy, a modern file-type identification tool for memory forensics and data carving. In contrast to previous approaches based on hand-crafted features, we design a compact neural network architecture, which uses a trainable embedding space. Our approach dispenses with the explicit feature extraction which has been a bottleneck in legacy systems. We evaluate the proposed method on a novel dataset with 75 file-types - the most diverse and balanced dataset reported to date. FiFTy consistently outperforms all baselines in terms of speed, accuracy and individual misclassification rates. We achieved an average accuracy of 77.5% with processing speed of ≈38 sec/GB, which is better and more than an order of magnitude faster than the previous state-of-the-art tool - Sceadan (69% at 9 min/GB). Our tool and the corresponding dataset are open-source.
KW - File-type classification
KW - carving
KW - convolutional neural network
KW - machine learning
KW - memory forensics
UR - http://www.scopus.com/inward/record.url?scp=85089874236&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85089874236&partnerID=8YFLogxK
U2 - 10.1109/TIFS.2020.3004266
DO - 10.1109/TIFS.2020.3004266
M3 - Article
AN - SCOPUS:85089874236
SN - 1556-6013
VL - 16
SP - 28
EP - 41
JO - IEEE Transactions on Information Forensics and Security
JF - IEEE Transactions on Information Forensics and Security
M1 - 9122499
ER -