TY - GEN
T1 - Contextual Embeddings Can Distinguish Homonymy from Polysemy in a Human-Like Way
AU - Wilson, Kyra
AU - Marantz, Alec
N1 - Funding Information:
This research was supported by the NYUAD Research Institute under Grant G1001 and was carried out on the High Performance Computing resources at New York University Abu Dhabi. We additionally thank Tal Linzen and anonymous reviewers for their guidance and feedback on earlier versions of this paper.
Publisher Copyright:
© ICNLSP 2022. All rights reserved.
PY - 2022
Y1 - 2022
N2 - Lexical ambiguity is a pervasive feature of natural language, and a major difficulty in understanding language is selecting the intended meaning when more than one is possible. Despite this difficulty, many studies of single word recognition have found a processing advantage for ambiguous words compared to unambiguous ones. This effect is not homogeneous, however: studies find consistent advantages for polysemes (words with multiple related meanings) and inconsistent results for homonyms (words with multiple unrelated meanings). Complicating this is the fact that most measures of ambiguity are derived from human-annotated or curated lexicographic resources, and their use is not consistent between studies. Our work investigates whether contextualized word embeddings are able to capture human-like distinctions between senses and meanings, and whether they can predict human behavior. We reanalyze data from previous experiments reporting ambiguity (dis)advantages using the lexical decision times reported in the English Lexicon Project. We find that our method replicates the polyseme advantage and homonym disadvantage previously reported, and that the predictors are superior to binary distinctions derived from lexicographic resources. Our findings point towards the benefits of using continuous-space representations of senses and meanings over more traditional measures. Additionally, we make our code publicly available for use in future research.
AB - Lexical ambiguity is a pervasive feature of natural language, and a major difficulty in understanding language is selecting the intended meaning when more than one is possible. Despite this difficulty, many studies of single word recognition have found a processing advantage for ambiguous words compared to unambiguous ones. This effect is not homogeneous, however: studies find consistent advantages for polysemes (words with multiple related meanings) and inconsistent results for homonyms (words with multiple unrelated meanings). Complicating this is the fact that most measures of ambiguity are derived from human-annotated or curated lexicographic resources, and their use is not consistent between studies. Our work investigates whether contextualized word embeddings are able to capture human-like distinctions between senses and meanings, and whether they can predict human behavior. We reanalyze data from previous experiments reporting ambiguity (dis)advantages using the lexical decision times reported in the English Lexicon Project. We find that our method replicates the polyseme advantage and homonym disadvantage previously reported, and that the predictors are superior to binary distinctions derived from lexicographic resources. Our findings point towards the benefits of using continuous-space representations of senses and meanings over more traditional measures. Additionally, we make our code publicly available for use in future research.
UR - http://www.scopus.com/inward/record.url?scp=85152141994&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85152141994&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85152141994
T3 - ICNLSP 2022 - Proceedings of the 5th International Conference on Natural Language and Speech Processing
SP - 144
EP - 155
BT - ICNLSP 2022 - Proceedings of the 5th International Conference on Natural Language and Speech Processing
A2 - Abbas, Mourad
A2 - Freihat, Abed Alhakim
PB - Association for Computational Linguistics (ACL)
T2 - 5th International Conference on Natural Language and Speech Processing, ICNLSP 2022
Y2 - 16 December 2022 through 17 December 2022
ER -