TY - GEN
T1 - How to pretrain deep Boltzmann machines in two stages
AU - Cho, Kyunghyun
AU - Raiko, Tapani
AU - Ilin, Alexander
AU - Karhunen, Juha
PY - 2015
Y1 - 2015
N2 - A deep Boltzmann machine (DBM) is a recently introduced Markov random field model that has multiple layers of hidden units. It has been shown empirically that, unlike its simpler special case, the restricted Boltzmann machine (RBM), a DBM is difficult to train with approximate maximum-likelihood learning using the stochastic gradient. In this paper, we propose a novel pretraining algorithm that consists of two stages: obtaining approximate posterior distributions over hidden units from a simpler model, and maximizing the variational lower bound given the fixed hidden posterior distributions. We show empirically that the proposed method overcomes the difficulty in training DBMs from randomly initialized parameters and results in a better, or comparable, generative model when compared to the conventional pretraining algorithm.
AB - A deep Boltzmann machine (DBM) is a recently introduced Markov random field model that has multiple layers of hidden units. It has been shown empirically that, unlike its simpler special case, the restricted Boltzmann machine (RBM), a DBM is difficult to train with approximate maximum-likelihood learning using the stochastic gradient. In this paper, we propose a novel pretraining algorithm that consists of two stages: obtaining approximate posterior distributions over hidden units from a simpler model, and maximizing the variational lower bound given the fixed hidden posterior distributions. We show empirically that the proposed method overcomes the difficulty in training DBMs from randomly initialized parameters and results in a better, or comparable, generative model when compared to the conventional pretraining algorithm.
UR - http://www.scopus.com/inward/record.url?scp=85008402581&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85008402581&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-09903-3_10
DO - 10.1007/978-3-319-09903-3_10
M3 - Conference contribution
AN - SCOPUS:85008402581
SN - 9783319099026
T3 - Artificial Neural Networks - Methods and Applications in Bio-/Neuroinformatics
SP - 201
EP - 219
BT - Artificial Neural Networks - Methods and Applications in Bio-/Neuroinformatics
PB - Springer Verlag
T2 - 23rd International Conference on Artificial Neural Networks, ICANN 2013
Y2 - 10 September 2013 through 13 September 2013
ER -