Automatic syntactic parsing for question constructions is a challenging task due to the paucity of training examples in most treebanks. The near absence of question constructions is due to the dominance of the news domain in treebanking efforts. In this paper, we compare two synthetic low-cost question treebank creation methods with a conventional manual high-cost annotation method in the context of three domains (news questions, political talk shows, and chatbots) for Modern Standard Arabic, a language with relatively low resources and rich morphology. Our results show that synthetic methods can be effective at significantly reducing parsing errors for a target domain without investing large resources in manual annotation, and that the combination of manual and synthetic methods is our best domain-independent performer.
|Title of host publication|Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)|
|Place of Publication|Santa Fe, New Mexico, USA|
|Publisher|Association for Computational Linguistics (ACL)|
|Number of pages|8|
|State|Published - Aug 1, 2018|
Boukaram, H-A., Habash, N., Ziadee, M., & Sakr, M. (2018). Improving Domain Independent Question Parsing with Synthetic Treebanks. In Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018) (pp. 214-221). Association for Computational Linguistics (ACL). https://www.aclweb.org/anthology/W18-4924