Abstract
Arabic has an ambiguous mapping between words and pronunciations, making it a deep orthographic system. This ambiguity can be resolved through diacritics, which, if displayed, would make up 30% of the characters in a text. We investigate the different dimensions of lexical modeling, covering diacritics, pronunciation rules, and acoustic-based pronunciation modeling. We show the impact of explicitly modeling the different classes of diacritics (short vowels, geminates, and nunation). We further show that a phonetic lexicon, derived by applying simple pronunciation rules to diacritized words, offers the best gains in ASR performance. Finally, deriving pronunciations from acoustics yields improvements beyond a canonical lexicon.
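To make the diacritic classes concrete, here is a minimal Python sketch that groups the standard Unicode Arabic diacritic code points into the three classes named in the abstract, removes selected classes from a word, and measures the share of diacritic characters in a text. It is an illustration only, not the authors' code. The function names `strip_classes` and `diacritic_ratio` are hypothetical, and treating sukun as a separate fourth group is an assumption of this sketch.

```python
# Unicode code points for the three diacritic classes named in the abstract.
SHORT_VOWELS = {"\u064E", "\u064F", "\u0650"}  # fatha, damma, kasra
GEMINATION = {"\u0651"}                        # shadda
NUNATION = {"\u064B", "\u064C", "\u064D"}      # fathatan, dammatan, kasratan
# Assumption of this sketch: sukun (no vowel) is kept as its own group.
SUKUN = {"\u0652"}
ALL_DIACRITICS = SHORT_VOWELS | GEMINATION | NUNATION | SUKUN


def strip_classes(word, classes):
    """Return `word` with every character from the given diacritic classes removed."""
    drop = set().union(*classes)
    return "".join(ch for ch in word if ch not in drop)


def diacritic_ratio(text):
    """Return the fraction of non-whitespace characters that are diacritics."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    return sum(ch in ALL_DIACRITICS for ch in chars) / len(chars)


# Example word "mudarrisun" (teacher), written with explicit escapes:
# meem+damma, dal+fatha, ra+shadda+kasra, seen+dammatan.
word = "\u0645\u064F\u062F\u064E\u0631\u0651\u0650\u0633\u064C"
no_short_vowels = strip_classes(word, [SHORT_VOWELS])
undiacritized = strip_classes(word, [ALL_DIACRITICS])
ratio = diacritic_ratio(word)
```

Stripping one class at a time gives a simple way to build lexicon variants that differ only in which diacritic classes they keep, which is the kind of comparison the abstract describes.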
Original language | English (US) |
---|---|
Pages (from-to) | 2605-2609 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
State | Published - 2014 |
Event | 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 - Singapore, Singapore. Duration: Sep 14 2014 → Sep 18 2014 |
Keywords
- Arabic
- Automatic speech recognition
- Diacritics
- Joint sequence model
- Language model
- Lexical model
- Pronunciation mixture model
- Pronunciation rules
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modeling and Simulation