AcX: System, Techniques, and Experiments for Acronym Expansion

João L.M. Pereira, João Casanova, Helena Galhardas, Dennis Shasha

Research output: Contribution to journal, Conference article, peer-review


In this information-accumulating world, each of us must learn continuously. To participate in a new field, or even a sub-field, one must be aware of its terminology, including the acronyms that specialists know well but newcomers do not. Building on state-of-the-art acronym tools, our end-to-end acronym expansion system, AcX, takes a document, identifies its acronyms, and suggests expansions that are either found in the document or appropriate given the document's subject matter. As far as we know, AcX is the first open-source and extensible system for acronym expansion that allows mixing and matching of different inference modules. AcX currently works for English, French, and Portuguese, with support for other languages in progress. This paper describes the design and implementation of AcX, proposes three new acronym expansion benchmarks, compares state-of-the-art techniques on them, and proposes ensemble techniques that improve on any single technique. Finally, the paper evaluates the performance of AcX and of the related MadDog system in end-to-end experiments on a new human-annotated dataset of Wikipedia documents. Our experiments show that AcX outperforms MadDog but that human performance is still substantially better than the best automated approaches. Thus, achieving acronym expansion at a human level remains a rich and open challenge.
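The abstract describes a pipeline that identifies acronyms, proposes expansions from pluggable inference modules (some reading the document itself, some drawing on outside knowledge), and combines modules with ensembles. The Python sketch below illustrates that shape under stated assumptions; it is not AcX's code or API. All names (`find_acronyms`, `in_document_expander`, `make_dictionary_expander`, `ensemble`, `expand_document`) are hypothetical. The in-document module is a deliberately simplified "long form (ACRONYM)" initial-letter match, the dictionary module is a stand-in for subject-aware expansion, and the ensemble is plain majority voting, not necessarily the ensemble techniques the paper proposes.

```python
import re
from collections import Counter
from typing import Callable, Dict, List, Optional

# Hypothetical module interface: (document, acronym) -> expansion or None.
Expander = Callable[[str, str], Optional[str]]

# Simplifying assumption: an acronym is a token of two or more capital letters.
ACRONYM_RE = re.compile(r"\b[A-Z]{2,}\b")


def find_acronyms(doc: str) -> List[str]:
    """Return distinct acronym candidates in order of first appearance."""
    seen: List[str] = []
    for m in ACRONYM_RE.finditer(doc):
        if m.group() not in seen:
            seen.append(m.group())
    return seen


def in_document_expander(doc: str, acronym: str) -> Optional[str]:
    """Match 'long form (ACR)' where the preceding words' initials spell ACR."""
    for m in re.finditer(r"\(" + re.escape(acronym) + r"\)", doc):
        words = re.findall(r"[A-Za-z]+", doc[: m.start()])
        candidate = words[-len(acronym):]
        if len(candidate) == len(acronym) and all(
            w[0].lower() == c.lower() for w, c in zip(candidate, acronym)
        ):
            return " ".join(candidate)
    return None


def make_dictionary_expander(table: Dict[str, str]) -> Expander:
    """Stand-in for an out-of-document module: a fixed lookup table."""
    def expand(doc: str, acronym: str) -> Optional[str]:
        return table.get(acronym)
    return expand


def ensemble(doc: str, acronym: str, modules: List[Expander]) -> Optional[str]:
    """Majority vote over non-None answers; ties go to the earliest module."""
    votes = [e for e in (m(doc, acronym) for m in modules) if e is not None]
    if not votes:
        return None
    # Counter.most_common keeps first-encountered order among equal counts.
    return Counter(votes).most_common(1)[0][0]


def expand_document(doc: str, modules: List[Expander]) -> Dict[str, Optional[str]]:
    """End to end: find acronyms, then ask the ensemble for each one."""
    return {a: ensemble(doc, a, modules) for a in find_acronyms(doc)}
```

For example, with `doc = "We use a support vector machine (SVM) and train the SVM on data from NASA."` and the modules `[in_document_expander, make_dictionary_expander({"NASA": "National Aeronautics and Space Administration"})]`, `expand_document` returns `{"SVM": "support vector machine", "NASA": "National Aeronautics and Space Administration"}`. Because every module shares one signature, adding or swapping a module only changes the list passed in.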

Original language: English (US)
Pages (from-to): 2530-2544
Number of pages: 15
Journal: Proceedings of the VLDB Endowment
Issue number: 11
State: Published - 2022
Event: 48th International Conference on Very Large Data Bases, VLDB 2022 - Sydney, Australia
Duration: Sep 5 2022 to Sep 9 2022

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • General Computer Science

