TY - JOUR
T1 - AcX
T2 - 48th International Conference on Very Large Data Bases, VLDB 2022
AU - Pereira, João L.M.
AU - Casanova, João
AU - Galhardas, Helena
AU - Shasha, Dennis
N1 - Funding Information:
Pereira was supported through (i) FCT (Fundação para a Ciência e a Tecnologia), under the PhD Scholarship SFRH/BD/135719/2018 and (ii) the Graduate School of Informatics of the University of Amsterdam. Pereira and Galhardas were supported through FCT under the project UIDB/50021/2020. Shasha’s work has been supported by (i) the New York University Abu Dhabi Center for Interacting Urban Networks (CITIES), funded by Tamkeen under the NYUAD Research Institute Award CG001 and by the Swiss Re Institute under the Quantum Cities initiative, (ii) NYU WIRELESS, (iii) U.S. National Science Foundation grants 1934388, 1840761, and 1339362, and (iv) INRIA. This support is greatly appreciated. The server virtual machines used to run the experiments were supported by BioData.pt – Infraestrutura Portuguesa de Dados Biológicos, project 22231/01/SAICT/2016, funded by Portugal 2020, as well as Google Cloud and Dutch national e-infrastructure of the SURF Cooperative. We would like to thank the following individuals who contributed to previous designs of Acronym Expansion. They are listed here in order of participation: Ben Turtel, Kshitiz Sethia, Leah Bracken, Maria Beatriz Silva, and Maxime Prieur. We also would like to thank the annotators, mostly CS students from IST University of Lisbon, whose help was essential to create the end-to-end dataset. Finally, we would like to thank the reviewers for several excellent suggestions.
Publisher Copyright:
© 2022, VLDB Endowment. All rights reserved.
PY - 2022
Y1 - 2022
N2 - In this information-accumulating world, each of us must learn continuously. To participate in a new field, or even a sub-field, one must be aware of the terminology including the acronyms that specialists know so well, but newcomers do not. Building on state-of-the-art acronym tools, our end-to-end acronym expander system called AcX takes a document, identifies its acronyms, and suggests expansions that are either found in the document or appropriate given the subject matter of the document. As far as we know, AcX is the first open source and extensible system for acronym expansion that allows mixing and matching of different inference modules. As of now, AcX works for English, French, and Portuguese with other languages in progress. This paper describes the design and implementation of AcX, proposes three new acronym expansion benchmarks, compares state-of-the-art techniques on them, and proposes ensemble techniques that improve on any single technique. Finally, the paper evaluates the performance of AcX and related work MadDog system in end-to-end experiments on a new human-annotated dataset of Wikipedia documents. Our experiments show that AcX outperforms MadDog but that human performance is still substantially better than the best automated approaches. Thus, achieving Acronym Expansion at a human level is still a rich and open challenge.
AB - In this information-accumulating world, each of us must learn continuously. To participate in a new field, or even a sub-field, one must be aware of the terminology including the acronyms that specialists know so well, but newcomers do not. Building on state-of-the-art acronym tools, our end-to-end acronym expander system called AcX takes a document, identifies its acronyms, and suggests expansions that are either found in the document or appropriate given the subject matter of the document. As far as we know, AcX is the first open source and extensible system for acronym expansion that allows mixing and matching of different inference modules. As of now, AcX works for English, French, and Portuguese with other languages in progress. This paper describes the design and implementation of AcX, proposes three new acronym expansion benchmarks, compares state-of-the-art techniques on them, and proposes ensemble techniques that improve on any single technique. Finally, the paper evaluates the performance of AcX and related work MadDog system in end-to-end experiments on a new human-annotated dataset of Wikipedia documents. Our experiments show that AcX outperforms MadDog but that human performance is still substantially better than the best automated approaches. Thus, achieving Acronym Expansion at a human level is still a rich and open challenge.
UR - http://www.scopus.com/inward/record.url?scp=85137973617&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85137973617&partnerID=8YFLogxK
U2 - 10.14778/3551793.3551812
DO - 10.14778/3551793.3551812
M3 - Conference article
AN - SCOPUS:85137973617
SN - 2150-8097
VL - 15
SP - 2530
EP - 2544
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 11
Y2 - 5 September 2022 through 9 September 2022
ER -