TY - GEN
T1 - A two-stage pretraining algorithm for deep Boltzmann machines
AU - Cho, Kyunghyun
AU - Raiko, Tapani
AU - Ilin, Alexander
AU - Karhunen, Juha
PY - 2013
Y1 - 2013
N2 - A deep Boltzmann machine (DBM) is a recently introduced Markov random field model that has multiple layers of hidden units. It has been shown empirically that it is difficult to train a DBM with approximate maximum-likelihood learning using the stochastic gradient, unlike its simpler special case, the restricted Boltzmann machine (RBM). In this paper, we propose a novel pretraining algorithm that consists of two stages: obtaining approximate posterior distributions over hidden units from a simpler model, and maximizing the variational lower bound given the fixed hidden posterior distributions. We show empirically that the proposed method overcomes the difficulty in training DBMs from randomly initialized parameters and results in a better, or comparable, generative model when compared to the conventional pretraining algorithm.
AB - A deep Boltzmann machine (DBM) is a recently introduced Markov random field model that has multiple layers of hidden units. It has been shown empirically that it is difficult to train a DBM with approximate maximum-likelihood learning using the stochastic gradient, unlike its simpler special case, the restricted Boltzmann machine (RBM). In this paper, we propose a novel pretraining algorithm that consists of two stages: obtaining approximate posterior distributions over hidden units from a simpler model, and maximizing the variational lower bound given the fixed hidden posterior distributions. We show empirically that the proposed method overcomes the difficulty in training DBMs from randomly initialized parameters and results in a better, or comparable, generative model when compared to the conventional pretraining algorithm.
KW - Deep Boltzmann Machine
KW - Deep Learning
KW - Pretraining
UR - http://www.scopus.com/inward/record.url?scp=84884941662&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84884941662&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-40728-4_14
DO - 10.1007/978-3-642-40728-4_14
M3 - Conference contribution
AN - SCOPUS:84884941662
SN - 9783642407277
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 106
EP - 113
BT - Artificial Neural Networks and Machine Learning, ICANN 2013 - 23rd International Conference on Artificial Neural Networks, Proceedings
T2 - 23rd International Conference on Artificial Neural Networks, ICANN 2013
Y2 - 10 September 2013 through 13 September 2013
ER -