TY - GEN
T1 - COGS: A Compositional Generalization Challenge Based on Semantic Interpretation
T2 - 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020
AU - Kim, Najoung
AU - Linzen, Tal
N1 - Publisher Copyright:
© 2020 Association for Computational Linguistics.
PY - 2020
Y1 - 2020
AB - Natural language is characterized by compositionality: the meaning of a complex expression is constructed from the meanings of its constituent parts. To facilitate the evaluation of the compositional abilities of language processing architectures, we introduce COGS, a semantic parsing dataset based on a fragment of English. The evaluation portion of COGS contains multiple systematic gaps that can only be addressed by compositional generalization; these include new combinations of familiar syntactic structures, or new combinations of familiar words and familiar structures. In experiments with Transformers and LSTMs, we found that in-distribution accuracy on the COGS test set was near-perfect (96-99%), but generalization accuracy was substantially lower (16-35%) and showed high sensitivity to random seed (±6-8%). These findings indicate that contemporary standard NLP models are limited in their compositional generalization capacity, and position COGS as a good way to measure progress.
UR - http://www.scopus.com/inward/record.url?scp=85101606681&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85101606681&partnerID=8YFLogxK
U2 - 10.18653/v1/2020.emnlp-main.731
DO - 10.18653/v1/2020.emnlp-main.731
M3 - Conference contribution
AN - SCOPUS:85101606681
T3 - EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
SP - 9087
EP - 9105
BT - EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
PB - Association for Computational Linguistics (ACL)
Y2 - 16 November 2020 through 20 November 2020
ER -