TY - GEN
T1 - Temporal supervised learning for inferring a dialog policy from example conversations
AU - Li, Lihong
AU - He, He
AU - Williams, Jason D.
N1 - Publisher Copyright:
© 2014 IEEE.
PY - 2014/4/1
Y1 - 2014/4/1
AB - This paper tackles the problem of learning a dialog policy from example dialogs, such as Wizard-of-Oz dialogs in which a human expert plays the role of the system. Learning in this setting is challenging because dialog is a temporal process in which actions affect the future course of the conversation; that is, dialog requires planning. Past work has solved this problem with either conventional supervised learning or reinforcement learning. Reinforcement learning provides a principled approach to planning, but requires resources beyond a fixed corpus of examples, such as a dialog simulator or a reward function. Conventional supervised learning, by contrast, operates directly on example dialogs but does not take proper account of planning. We introduce a new algorithm, called Temporal Supervised Learning, which learns directly from example dialogs while also taking proper account of planning. The key idea is to choose the next dialog action so as to maximize the expected discounted accuracy through the end of the dialog. In simulation, on a dialog testbed in the calendar domain, we show that a dialog manager trained with temporal supervised learning substantially outperforms a baseline trained with conventional supervised learning.
UR - http://www.scopus.com/inward/record.url?scp=84946688303&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84946688303&partnerID=8YFLogxK
DO - 10.1109/SLT.2014.7078593
M3 - Conference contribution
AN - SCOPUS:84946688303
T3 - 2014 IEEE Workshop on Spoken Language Technology, SLT 2014 - Proceedings
SP - 312
EP - 317
BT - 2014 IEEE Workshop on Spoken Language Technology, SLT 2014 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2014 IEEE Workshop on Spoken Language Technology, SLT 2014
Y2 - 7 December 2014 through 10 December 2014
ER -