The paradigm discovery problem

Alexander Erdmann, Micha Elsner, Shijie Wu, Ryan Cotterell, Nizar Habash

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

This work treats the paradigm discovery problem (PDP)-the task of learning an inflectional morphological system from unannotated sentences. We formalize the PDP and develop evaluation metrics for judging systems. Using currently available resources, we construct datasets for the task. We also devise a heuristic benchmark for the PDP and report empirical results on five diverse languages. Our benchmark system first makes use of word embeddings and string similarity to cluster forms by cell and by paradigm. Then, we bootstrap a neural transducer on top of the clustered data to predict words to realize the empty paradigm slots. An error analysis of our system suggests clustering by cell across different inflection classes is the most pressing challenge for future work. Our code and data are available at https://github.com/alexerdmann/ParadigmDiscovery.

Original languageEnglish (US)
Title of host publicationACL 2020 - 58th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
PublisherAssociation for Computational Linguistics (ACL)
Pages7778-7790
Number of pages13
ISBN (Electronic)9781952148255
StatePublished - 2020
Event58th Annual Meeting of the Association for Computational Linguistics, ACL 2020 - Virtual, Online, United States
Duration: Jul 5 2020Jul 10 2020

Publication series

NameProceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print)0736-587X

Conference

Conference58th Annual Meeting of the Association for Computational Linguistics, ACL 2020
Country/TerritoryUnited States
CityVirtual, Online
Period7/5/207/10/20

ASJC Scopus subject areas

  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics

Fingerprint

Dive into the research topics of 'The paradigm discovery problem'. Together they form a unique fingerprint.

Cite this