TY - GEN
T1 - Gated Feedback Recurrent Neural Networks
AU - Chung, Junyoung
AU - Gulcehre, Caglar
AU - Cho, Kyunghyun
AU - Bengio, Yoshua
N1 - Funding Information:
The authors would like to thank the developers of Theano (Bastien et al., 2012) and Pylearn2 (Goodfellow et al., 2013). Also, the authors thank Yann N. Dauphin and Laurent Dinh for insightful comments and discussion. We acknowledge the support of the following agencies for research funding and computing support: NSERC, Samsung, Calcul Québec, Compute Canada, the Canada Research Chairs and CIFAR.
Publisher Copyright:
© Copyright 2015 by International Machine Learning Society (IMLS). All rights reserved.
PY - 2015
Y1 - 2015
N2 - In this work, we propose a novel recurrent neural network (RNN) architecture. The proposed RNN, the gated-feedback RNN (GF-RNN), extends the existing approach of stacking multiple recurrent layers by allowing and controlling signals flowing from upper recurrent layers to lower layers, using a global gating unit for each pair of layers. The recurrent signals exchanged between layers are gated adaptively based on the previous hidden states and the current input. We evaluated the proposed GF-RNN with different types of recurrent units, such as tanh, long short-term memory, and gated recurrent units, on the tasks of character-level language modeling and Python program evaluation. Our empirical evaluation of different RNN units revealed that, in both tasks, the GF-RNN outperforms the conventional approaches to building deep stacked RNNs. We suggest that the improvement arises because the GF-RNN can adaptively assign different layers to different timescales and layer-to-layer interactions (including the top-down ones, which are not usually present in a stacked RNN) by learning to gate these interactions.
AB - In this work, we propose a novel recurrent neural network (RNN) architecture. The proposed RNN, the gated-feedback RNN (GF-RNN), extends the existing approach of stacking multiple recurrent layers by allowing and controlling signals flowing from upper recurrent layers to lower layers, using a global gating unit for each pair of layers. The recurrent signals exchanged between layers are gated adaptively based on the previous hidden states and the current input. We evaluated the proposed GF-RNN with different types of recurrent units, such as tanh, long short-term memory, and gated recurrent units, on the tasks of character-level language modeling and Python program evaluation. Our empirical evaluation of different RNN units revealed that, in both tasks, the GF-RNN outperforms the conventional approaches to building deep stacked RNNs. We suggest that the improvement arises because the GF-RNN can adaptively assign different layers to different timescales and layer-to-layer interactions (including the top-down ones, which are not usually present in a stacked RNN) by learning to gate these interactions.
UR - http://www.scopus.com/inward/record.url?scp=84969972350&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84969972350&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84969972350
T3 - 32nd International Conference on Machine Learning, ICML 2015
SP - 2067
EP - 2075
BT - 32nd International Conference on Machine Learning, ICML 2015
A2 - Bach, Francis
A2 - Blei, David
PB - International Machine Learning Society (IMLS)
T2 - 32nd International Conference on Machine Learning, ICML 2015
Y2 - 6 July 2015 through 11 July 2015
ER -