Temporal supervised learning for inferring a dialog policy from example conversations

Lihong Li, He He, Jason D. Williams

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

This paper tackles the problem of learning a dialog policy from example dialogs, such as Wizard-of-Oz style dialogs in which a human expert plays the role of the system. Learning in this setting is challenging because dialog is a temporal process in which actions affect the future course of the conversation; that is, dialog requires planning. Past work has approached this problem with either conventional supervised learning or reinforcement learning. Reinforcement learning provides a principled approach to planning, but requires resources beyond a fixed corpus of examples, such as a dialog simulator or a reward function. Conventional supervised learning, by contrast, operates directly from example dialogs but does not take proper account of planning. We introduce a new algorithm, Temporal Supervised Learning, which learns directly from example dialogs while also taking proper account of planning. The key idea is to choose the next dialog action to maximize the expected discounted accuracy until the end of the dialog. In simulation on a dialog testbed in the calendar domain, we show that a dialog manager trained with temporal supervised learning substantially outperforms a baseline trained using conventional supervised learning.
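
To make the quantity named in the abstract concrete, the sketch below computes a discounted sum of per-turn accuracy from each turn to the end of a dialog. It is only an illustration under assumptions, not the paper's algorithm: the function name `discounted_accuracy`, the 0/1 encoding of whether each system action matched the expert's, and the discount factor `gamma` are all hypothetical choices made here.

```python
from typing import List


def discounted_accuracy(correct: List[int], gamma: float = 0.9) -> List[float]:
    """For each turn t, return sum over k >= t of gamma**(k - t) * correct[k].

    `correct[k]` is assumed to be 1 if the action taken at turn k matched
    the expert's action and 0 otherwise (an illustrative encoding).
    """
    returns = [0.0] * len(correct)
    running = 0.0
    # Walk backwards so each turn reuses the discounted total of later turns.
    for t in range(len(correct) - 1, -1, -1):
        running = correct[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical three-turn dialog: correct, incorrect, correct.
values = discounted_accuracy([1, 0, 1], gamma=0.5)
```

Under this reading, an action that is correct now but leads to later mistakes receives a lower value than one that keeps the rest of the dialog on track, which is the planning effect the abstract describes.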

Original language: English (US)
Title of host publication: 2014 IEEE Workshop on Spoken Language Technology, SLT 2014 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 312-317
Number of pages: 6
ISBN (Electronic): 9781479971299
State: Published - Apr 1 2014
Event: 2014 IEEE Workshop on Spoken Language Technology, SLT 2014 - South Lake Tahoe, United States
Duration: Dec 7 2014 to Dec 10 2014

Publication series

Name: 2014 IEEE Workshop on Spoken Language Technology, SLT 2014 - Proceedings

Conference

Conference: 2014 IEEE Workshop on Spoken Language Technology, SLT 2014
Country/Territory: United States
City: South Lake Tahoe
Period: 12/7/14 to 12/10/14

ASJC Scopus subject areas

  • Computer Science Applications
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence
  • Language and Linguistics
