A Transformer-based Function Symbol Name Inference Model from an Assembly Language for Binary Reversing

Hyunjin Kim, Jinyeong Bak, Kyunghyun Cho, Hyungjoon Koo

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Reverse engineering of a stripped binary has a wide range of applications, yet it is challenging mainly due to the lack of contextually useful information within. Once debugging symbols (e.g., variable names, types, function names) are discarded, recovering such information is not technically viable with traditional approaches like static or dynamic binary analysis. We focus on a function symbol name recovery, which allows a reverse engineer to gain a quick overview of an unseen binary. The key insight is that a well-developed program labels a meaningful function name that describes its underlying semantics well. In this paper, we present AsmDepictor, the Transformer-based framework that generates a function symbol name from a set of assembly codes (i.e., machine instructions), which consists of three major components: binary code refinement, model training, and inference. To this end, we conduct systematic experiments on the effectiveness of code refinement that can enhance an overall performance. We introduce the per-layer positional embedding and Unique-softmax for AsmDepictor so that both can aid to capture a better relationship between tokens. Lastly, we devise a novel evaluation metric tailored for a short description length, the Jaccard∗score. Our empirical evaluation shows that the performance of AsmDepictor by far surpasses that of the state-of-the-art models up to around 400%. The best AsmDepictor model achieves an F1 of 71.5 and Jaccard∗of 75.4.

Original languageEnglish (US)
Title of host publicationASIA CCS 2023 - Proceedings of the 2023 ACM Asia Conference on Computer and Communications Security
PublisherAssociation for Computing Machinery
Pages951-965
Number of pages15
ISBN (Electronic)9798400700989
DOIs
StatePublished - Jul 10 2023
Event18th ACM ASIA Conference on Computer and Communications Security, ASIA CCS 2023 - Melbourne, Australia
Duration: Jul 10 2023Jul 14 2023

Publication series

NameProceedings of the ACM Conference on Computer and Communications Security
ISSN (Print)1543-7221

Conference

Conference18th ACM ASIA Conference on Computer and Communications Security, ASIA CCS 2023
Country/TerritoryAustralia
CityMelbourne
Period7/10/237/14/23

Keywords

  • assembly
  • function name
  • neural networks
  • reversing
  • Transformer

ASJC Scopus subject areas

  • Software
  • Computer Networks and Communications

Fingerprint

Dive into the research topics of 'A Transformer-based Function Symbol Name Inference Model from an Assembly Language for Binary Reversing'. Together they form a unique fingerprint.

Cite this