INTEGRATION OF TALKING HEADS AND TEXT-TO-SPEECH SYNTHESIZERS FOR VISUAL TTS

Jörn Ostermann, Mark Beutnagel, Ariel Fischer, Yao Wang

Research output: Contribution to conference › Paper › peer-review

Abstract

The integration of text-to-speech (TTS) synthesis and the animation of synthetic faces enables new applications such as visual human-computer interfaces using agents or avatars. The TTS informs the talking head when phonemes are spoken; the appropriate mouth shapes are animated and rendered while the TTS produces the sound. We call this integrated system of TTS and animation a Visual TTS (VTTS). This paper describes the architecture of an integrated VTTS synthesizer that allows facial expressions to be defined as bookmarks in the text and animated while the model is talking. The position of a bookmark in the text defines the start time of the facial expression. The bookmark itself names the expression, its amplitude, and the duration within which the face has to reach that amplitude. A bookmark-to-face-animation-parameter (FAP) converter creates a curve defining the amplitude of the given FAP over time using 3rd-order Hermite functions.
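
As an illustration of the converter described above, the following sketch (not the authors' code; the function name, frame rate, and zero-tangent endpoints are all assumptions) samples a 3rd-order Hermite amplitude curve for one FAP, ramping from the current amplitude to the amplitude named by a bookmark over the bookmark's duration:

# A minimal sketch, assuming a fixed animation frame rate and zero
# Hermite tangents at both endpoints (which reduces the cubic Hermite
# polynomial to the 3t^2 - 2t^3 ease-in/ease-out blend). None of these
# names come from the paper.

def hermite_fap_curve(a_start, a_target, duration, frame_rate=25.0):
    """Sample a 3rd-order Hermite amplitude curve for one FAP.

    a_start    -- FAP amplitude at the bookmark's start time
    a_target   -- amplitude named by the bookmark
    duration   -- seconds within which a_target must be reached
    frame_rate -- animation frames per second (assumed value)
    """
    n_frames = max(1, int(round(duration * frame_rate)))
    curve = []
    for i in range(n_frames + 1):
        t = i / n_frames                 # normalized time in [0, 1]
        h = 3.0 * t**2 - 2.0 * t**3      # cubic Hermite blend, zero tangents
        curve.append(a_start + (a_target - a_start) * h)
    return curve

# Example: a bookmark requesting a smile at amplitude 0.8 to be reached
# within 0.4 s, starting from a neutral face.
smile_curve = hermite_fap_curve(0.0, 0.8, 0.4)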

Original language: English (US)
State: Published - 1998
Event: 5th International Conference on Spoken Language Processing, ICSLP 1998 - Sydney, Australia
Duration: Nov 30, 1998 - Dec 4, 1998

Conference

Conference: 5th International Conference on Spoken Language Processing, ICSLP 1998
Country/Territory: Australia
City: Sydney
Period: 11/30/98 - 12/4/98

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
