TY - GEN
T1 - Improving domain independent question parsing with synthetic treebanks
AU - Boukaram, Halim Antoine
AU - Habash, Nizar
AU - Ziadee, Micheline
AU - Sakr, Majd
N1 - Publisher Copyright:
Copyright © LAW-MWE-CxG 2018 - Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, Proceedings of the Workshop. All rights reserved.
PY - 2018/1/1
Y1 - 2018/1/1
AB - Automatic syntactic parsing for question constructions is a challenging task due to the paucity of training examples in most treebanks. The near absence of question constructions is due to the dominance of the news domain in treebanking efforts. In this paper, we compare two synthetic low-cost question treebank creation methods with a conventional manual high-cost annotation method in the context of three domains (news questions, political talk shows, and chatbots) for Modern Standard Arabic, a language with relatively low resources and rich morphology. Our results show that synthetic methods can be effective at significantly reducing parsing errors for a target domain without having to invest large resources on manual annotation; and the combination of manual and synthetic methods is our best domain-independent performer.
UR - http://www.scopus.com/inward/record.url?scp=85084297062&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084297062&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85084297062
T3 - LAW-MWE-CxG 2018 - Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, Proceedings of the Workshop
SP - 214
EP - 221
BT - LAW-MWE-CxG 2018 - Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, Proceedings of the Workshop
PB - Association for Computational Linguistics (ACL)
T2 - Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, LAW-MWE-CxG 2018, in conjunction with the 27th International Conference on Computational Linguistics, COLING 2018
Y2 - 25 August 2018 through 26 August 2018
ER -