Rapid Unit Selection from a Large Speech Corpus for Concatenative Speech Synthesis

Mark Beutnagel, Mehryar Mohri, Michael Riley

Research output: Contribution to conference, Paper, peer-review

Abstract

Concatenative Text-to-Speech (TTS) systems such as those described by Hunt and Black [6] can select at synthesis time from a very large number of recorded units. The selected units are chosen to minimize a combination of target and join costs for a given sentence. However, the join costs, in particular, can be quite expensive to compute, even when this computation has been optimized. If possible, we would avoid this computation by precomputing and caching all the possible join costs, but their number is prohibitive. Although the search space of possible joins is large, we have found that only a small fraction are selected in practice. By synthesizing a large quantity of text and logging the units actually selected, we were able to gather usage statistics and construct a practical and efficient cache of concatenation costs. Use of this cache dramatically decreased the runtime of the AT&T Next-Generation TTS system [1] with negligible effect on speech quality. Experiments show that by caching 0.7% of the possible joins, 99% of the join cost computations can be avoided.
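The caching idea in the abstract can be illustrated with a minimal sketch. It is not the AT&T implementation: the names `build_join_cache`, `CachedJoinCost`, and `toy_join_cost`, the representation of a join as a pair of integer unit ids, and the choice to fall back to the full computation on a cache miss are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical representation: a join is (left unit id, right unit id).
Join = Tuple[int, int]


def build_join_cache(
    logged_selections: Iterable[List[int]],
    join_cost: Callable[[int, int], float],
    max_entries: int,
) -> Dict[Join, float]:
    """Precompute costs for the joins seen most often in the selection logs."""
    counts: Counter = Counter()
    for units in logged_selections:
        # Each adjacent pair of selected units is one join that was actually used.
        counts.update(zip(units, units[1:]))
    return {
        (left, right): join_cost(left, right)
        for (left, right), _ in counts.most_common(max_entries)
    }


class CachedJoinCost:
    """Looks a join up in the cache; on a miss, computes the cost (an assumption)."""

    def __init__(self, cache: Dict[Join, float], join_cost: Callable[[int, int], float]):
        self.cache = cache
        self.join_cost = join_cost
        self.hits = 0
        self.misses = 0

    def __call__(self, left: int, right: int) -> float:
        cost = self.cache.get((left, right))
        if cost is not None:
            self.hits += 1
            return cost
        self.misses += 1
        return self.join_cost(left, right)


def toy_join_cost(left: int, right: int) -> float:
    # Stand-in for the expensive acoustic join cost; purely illustrative.
    return abs(left - right) * 0.5


# Toy "logs": sequences of unit ids selected while synthesizing training text.
logs = [[1, 2, 3], [1, 2, 3], [1, 2, 4]]
cache = build_join_cache(logs, toy_join_cost, max_entries=2)
cached = CachedJoinCost(cache, toy_join_cost)
```

Capping the cache at `max_entries` mirrors the observation that a small fraction of the possible joins accounts for nearly all of the joins used in practice; the `hits` and `misses` counters give a simple way to measure how many computations a given cache size avoids.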

Original language: English (US)
Pages: 607-610
Number of pages: 4
State: Published - 1999
Event: 6th European Conference on Speech Communication and Technology, EUROSPEECH 1999 - Budapest, Hungary
Duration: Sep 5 1999 to Sep 9 1999

Conference

Conference: 6th European Conference on Speech Communication and Technology, EUROSPEECH 1999
Country/Territory: Hungary
City: Budapest
Period: 9/5/99 to 9/9/99

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
  • Linguistics and Language
  • Communication
