Robust Dictionary Lookup in Multiple Noisy Orthographies

Lingliang Zhang, Nizar Habash, Godfried Toussaint

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present the MultiScript Phonetic Search algorithm to address the problem of language learners looking up unfamiliar words that they heard. We apply it to Arabic dictionary lookup with noisy queries done using both the Arabic and Roman scripts. Our algorithm is based on a computational phonetic distance metric that can be optionally machine learned. To benchmark our performance, we created the ArabScribe dataset, containing 10,000 noisy transcriptions of random Arabic dictionary words. Our algorithm outperforms Google Translate’s “did you mean” feature, as well as the Yamli smart Arabic keyboard.

Original language: English (US)
Title of host publication: WANLP 2017, co-located with EACL 2017 - 3rd Arabic Natural Language Processing Workshop, Proceedings of the Workshop
Publisher: Association for Computational Linguistics (ACL)
Pages: 119-129
Number of pages: 11
ISBN (Electronic): 9781945626449
State: Published - 2017
Event: 3rd Arabic Natural Language Processing Workshop, WANLP 2017 held at EACL 2017 - Valencia, Spain
Duration: Apr 3 2017 → …

Publication series

Name: WANLP 2017, co-located with EACL 2017 - 3rd Arabic Natural Language Processing Workshop, Proceedings of the Workshop

Conference

Conference: 3rd Arabic Natural Language Processing Workshop, WANLP 2017 held at EACL 2017
Country/Territory: Spain
City: Valencia
Period: 4/3/17 → …

ASJC Scopus subject areas

  • Language and Linguistics
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Software
