Exploiting syntactic and distributional information for spelling correction with web-scale N-gram models

Wei Xu, Joel Tetreault, Martin Chodorow, Ralph Grishman, Le Zhao

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We propose a novel way of incorporating dependency parse and word co-occurrence information into a state-of-the-art web-scale n-gram model for spelling correction. The syntactic and distributional information provides extra evidence in addition to that provided by a web-scale n-gram corpus and especially helps with data sparsity problems. Experimental results show that introducing syntactic features into n-gram based models significantly reduces errors by up to 12.4% over the current state-of-the-art. The word co-occurrence information shows potential but only improves overall accuracy slightly.

Original language: English (US)
Title of host publication: EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Pages: 1291-1300
Number of pages: 10
State: Published - 2011
Event: Conference on Empirical Methods in Natural Language Processing, EMNLP 2011 - Edinburgh, United Kingdom
Duration: Jul 27 2011 to Jul 31 2011

Publication series

Name: EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

Other

Other: Conference on Empirical Methods in Natural Language Processing, EMNLP 2011
Country/Territory: United Kingdom
City: Edinburgh
Period: 7/27/11 to 7/31/11

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
