Neural unsupervised parsing beyond English

Katharina Kann, Anhad Mohananey, Kyunghyun Cho, Samuel R. Bowman

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Recently, neural network models that automatically infer syntactic structure from raw text have started to achieve promising results. However, earlier work on unsupervised parsing shows large performance differences between non-neural models trained on corpora in different languages, even for comparable amounts of data. With that in mind, we train instances of the PRPN architecture (Shen et al., 2018a), one of these unsupervised neural network parsers, for Arabic, Chinese, English, and German. We find that (i) the model strongly outperforms trivial baselines and thus acquires at least some parsing ability for all languages; (ii) good hyperparameter values seem to be universal; and (iii) how the model benefits from larger training set sizes depends on the corpus, with the model achieving the largest performance gains when increasing the number of sentences from 2,500 to 12,500 for English. In addition, we show that, by sharing parameters between the related languages German and English, we can improve the model's unsupervised parsing F1 score by up to 4% in the low-resource setting.

Original language: English (US)
Title of host publication: DeepLo@EMNLP-IJCNLP 2019 - Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource Natural Language Processing - Proceedings
Publisher: Association for Computational Linguistics (ACL)
Pages: 209-218
Number of pages: 10
ISBN (Electronic): 9781950737789
State: Published - 2021
Event: 2nd Workshop on Deep Learning Approaches for Low-Resource Natural Language Processing, DeepLo@EMNLP-IJCNLP 2019 - Hong Kong, China
Duration: Nov 3 2019 → …

Publication series

Name: DeepLo@EMNLP-IJCNLP 2019 - Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource Natural Language Processing - Proceedings

Conference

Conference: 2nd Workshop on Deep Learning Approaches for Low-Resource Natural Language Processing, DeepLo@EMNLP-IJCNLP 2019
Country/Territory: China
City: Hong Kong
Period: 11/3/19 → …

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Software
