Improving Domain Independent Question Parsing with Synthetic Treebanks

Halim-Antoine Boukaram, Nizar Habash, Micheline Ziadee, Majd Sakr

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Automatic syntactic parsing for question constructions is a challenging task due to the paucity of training examples in most treebanks. The near absence of question constructions is due to the dominance of the news domain in treebanking efforts. In this paper, we compare two synthetic low-cost question treebank creation methods with a conventional manual high-cost annotation method in the context of three domains (news questions, political talk shows, and chatbots) for Modern Standard Arabic, a language with relatively low resources and rich morphology. Our results show that synthetic methods can be effective at significantly reducing parsing errors for a target domain without having to invest large resources in manual annotation, and that the combination of manual and synthetic methods is our best domain-independent performer.
Original language: Undefined
Title of host publication: Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)
Place of Publication: Santa Fe, New Mexico, USA
Publisher: Association for Computational Linguistics (ACL)
Number of pages: 8
State: Published - Aug 1 2018
Externally published: Yes
