Improving domain independent question parsing with synthetic treebanks

Halim Antoine Boukaram, Nizar Habash, Micheline Ziadee, Majd Sakr

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Automatic syntactic parsing for question constructions is a challenging task due to the paucity of training examples in most treebanks. The near absence of question constructions is due to the dominance of the news domain in treebanking efforts. In this paper, we compare two synthetic low-cost question treebank creation methods with a conventional manual high-cost annotation method in the context of three domains (news questions, political talk shows, and chatbots) for Modern Standard Arabic, a language with relatively low resources and rich morphology. Our results show that synthetic methods can be effective at significantly reducing parsing errors for a target domain without having to invest large resources in manual annotation, and that the combination of manual and synthetic methods is our best domain-independent performer.

Original language: English (US)
Title of host publication: LAW-MWE-CxG 2018 - Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, Proceedings of the Workshop
Publisher: Association for Computational Linguistics (ACL)
Pages: 214-221
Number of pages: 8
ISBN (Electronic): 9781948087513
State: Published - Jan 1 2018
Event: Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, LAW-MWE-CxG 2018, in conjunction with the 27th International Conference on Computational Linguistics, COLING 2018 - Santa Fe, United States
Duration: Aug 25 2018 to Aug 26 2018

Publication series

Name: LAW-MWE-CxG 2018 - Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, Proceedings of the Workshop

Conference

Conference: Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, LAW-MWE-CxG 2018, in conjunction with the 27th International Conference on Computational Linguistics, COLING 2018
Country/Territory: United States
City: Santa Fe
Period: 8/25/18 to 8/26/18

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
