TY - JOUR
T1 - A benchmark for systematic generalization in grounded language understanding
AU - Ruis, Laura
AU - Andreas, Jacob
AU - Baroni, Marco
AU - Bouchacourt, Diane
AU - Lake, Brenden M.
N1 - Funding Information:
We are grateful to Adina Williams and Ev Fedorenko for very helpful discussions, to João Loula who did important initial work to explore compositional learning in a grid world, to Robin Vaaler for comments on an earlier version of this paper, and to Esther Vecht for important design advice and support. Through B. Lake’s position at NYU, this research was partially funded by NSF Award 1922658 NRT-HDR: FUTURE Foundations, Translation, and Responsibility for Data Science.
Publisher Copyright:
© 2020 Neural information processing systems foundation. All rights reserved.
PY - 2020
Y1 - 2020
AB - Humans easily interpret expressions that describe unfamiliar situations composed from familiar parts (“greet the pink brontosaurus by the ferris wheel”). Modern neural networks, by contrast, struggle to interpret novel compositions. In this paper, we introduce a new benchmark, gSCAN, for evaluating compositional generalization in situated language understanding. Going beyond a related benchmark that focused on syntactic aspects of generalization, gSCAN defines a language grounded in the states of a grid world, facilitating novel evaluations of acquiring linguistically motivated rules. For example, agents must understand how adjectives such as ‘small’ are interpreted relative to the current world state or how adverbs such as ‘cautiously’ combine with new verbs. We test a strong multi-modal baseline model and a state-of-the-art compositional method finding that, in most cases, they fail dramatically when generalization requires systematic compositional rules.
UR - http://www.scopus.com/inward/record.url?scp=85098442617&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85098442617&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85098442617
SN - 1049-5258
VL - 2020-December
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 34th Conference on Neural Information Processing Systems, NeurIPS 2020
Y2 - 6 December 2020 through 12 December 2020
ER -