Abstract
We consider the problem of predicting the surface pronunciations of a word in conversational speech, using a model of pronunciation variation based on articulatory features. We build context-dependent decision trees for both phone-based and feature-based models, and compare their perplexities on conversational data from the Switchboard Transcription Project. We find that a fully-factored model, with separate decision trees for each articulatory feature, does not perform well, but a feature-based model using a smaller number of "feature bundles" outperforms both the fully-factored model and a phone-based model. The articulatory feature-based decision trees are also much more robust to reductions in training data. We also analyze the usefulness of various context variables.
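The abstract compares models by perplexity on held-out conversational data. As background, the perplexity of a model over a test sample can be sketched as follows; all probability values and model names below are hypothetical illustrations, not numbers from the paper:

```python
import math

def perplexity(probs):
    """Per-event perplexity: 2 raised to the average negative
    log2 probability the model assigns to the held-out events."""
    avg_neg_log2 = -sum(math.log2(p) for p in probs) / len(probs)
    return 2 ** avg_neg_log2

# Illustrative probabilities a pronunciation model might assign
# to four observed surface pronunciations.
phone_model_probs = [0.50, 0.25, 0.125, 0.125]
print(perplexity(phone_model_probs))  # lower is better
```

A model that predicts the observed pronunciations more sharply assigns them higher probabilities and therefore achieves lower perplexity, which is the sense in which the "feature bundle" model outperforms the others.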
| Original language | English (US) |
| --- | --- |
| Pages | 326-329 |
| Number of pages | 4 |
| State | Published - 2010 |
| Event | 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010 - Makuhari, Chiba, Japan |
| Duration | Sep 26 2010 → Sep 30 2010 |
Other
| Other | 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010 |
| --- | --- |
| Country/Territory | Japan |
| City | Makuhari, Chiba |
| Period | 9/26/10 → 9/30/10 |
Keywords
- Articulatory features
- Pronunciation modeling
ASJC Scopus subject areas
- Language and Linguistics
- Speech and Hearing