Probabilistic context-free grammar induction based on structural zeros

Mehryar Mohri, Brian Roark

Research output: Contribution to conference › Paper › peer-review

Abstract

We present a method for induction of concise and accurate probabilistic context-free grammars for efficient use in early stages of a multi-stage parsing technique. The method is based on the use of statistical tests to determine if a non-terminal combination is unobserved due to sparse data or hard syntactic constraints. Experimental results show that, using this method, high accuracies can be achieved with a non-terminal set that is orders of magnitude smaller than in typically induced probabilistic context-free grammars, leading to substantial speed-ups in parsing. The approach is further used in combination with an existing reranker to provide competitive WSJ parsing results.
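The idea of testing whether an unobserved combination is a structural zero can be illustrated with a minimal sketch. This is not the authors' procedure, which the abstract does not specify. It assumes a simple independence null hypothesis for an adjacent pair of non-terminals (A, B): if B followed A purely by chance, the probability of never seeing the pair in `count_a` occurrences of A would be `(1 - p_b) ** count_a`. The function names, the null model, and the `alpha` threshold are all hypothetical choices made for illustration.

```python
def prob_zero_by_chance(count_a: int, count_b: int, total: int) -> float:
    """Probability of observing the pair (A, B) zero times under independence.

    Assumption (not from the paper): each of the count_a occurrences of A
    is followed by B independently with probability count_b / total.
    """
    if total <= 0 or not (0 <= count_a <= total) or not (0 <= count_b <= total):
        raise ValueError("counts must satisfy 0 <= count <= total and total > 0")
    p_b = count_b / total
    return (1.0 - p_b) ** count_a


def is_structural_zero(count_a: int, count_b: int, total: int,
                       alpha: float = 0.05) -> bool:
    """Treat an unobserved pair as a structural zero when sparse data is an
    unlikely explanation, i.e. when the chance of seeing it zero times
    falls below the (hypothetical) significance level alpha."""
    return prob_zero_by_chance(count_a, count_b, total) < alpha


# Frequent categories that never co-occur: sparse data is an implausible cause.
frequent_pair = is_structural_zero(count_a=1000, count_b=500, total=10000)

# Rare categories that never co-occur: the zero is easily explained by sparsity.
rare_pair = is_structural_zero(count_a=2, count_b=3, total=10000)
```

In this sketch, pairs flagged as structural zeros would be excluded from the grammar, while the remaining unobserved pairs would be left to receive smoothed probability mass.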

Original language: English (US)
Pages: 312-319
Number of pages: 8
State: Published - 2006
Event: 2006 Human Language Technology Conference - North American Chapter of the Association for Computational Linguistics Annual Meeting, HLT-NAACL 2006 - New York, NY, United States
Duration: Jun 4 2006 to Jun 9 2006


ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
