A two-stage pretraining algorithm for deep Boltzmann machines

Kyunghyun Cho, Tapani Raiko, Alexander Ilin, Juha Karhunen

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

A deep Boltzmann machine (DBM) is a recently introduced Markov random field model that has multiple layers of hidden units. It has been shown empirically that a DBM is difficult to train with approximate maximum-likelihood learning using the stochastic gradient, unlike its simpler special case, the restricted Boltzmann machine (RBM). In this paper, we propose a novel pretraining algorithm that consists of two stages: obtaining approximate posterior distributions over hidden units from a simpler model, and maximizing the variational lower bound given the fixed hidden posterior distributions. We show empirically that the proposed method overcomes the difficulty in training DBMs from randomly initialized parameters and results in a better, or comparable, generative model when compared to the conventional pretraining algorithm.
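
The abstract does not state the variational lower bound itself. As a hedged illustration only, the standard form of that bound for a Boltzmann machine is sketched below. The symbols are assumptions not taken from this record: $\mathbf{v}$ for the visible units, $\mathbf{h}$ for all hidden units, $E$ for the energy function, $Z(\theta)$ for the partition function, $Q$ for the approximate posterior, and $\mathcal{H}(Q)$ for its entropy. With $Q$ held fixed, as in the second stage described above, the entropy term does not depend on $\theta$, so the gradient reduces to a difference of two expectations.

```latex
% Standard variational lower bound on the log-likelihood
% (notation assumed for illustration, not quoted from the paper)
\log p(\mathbf{v} \mid \theta)
  \;\ge\;
  \mathbb{E}_{Q(\mathbf{h})}\!\left[ -E(\mathbf{v}, \mathbf{h} \mid \theta) \right]
  + \mathcal{H}(Q)
  - \log Z(\theta)

% Gradient with respect to the parameters when Q is fixed:
% a data-dependent term under Q minus a model-dependent term under p
\frac{\partial}{\partial \theta}
  \Big( \mathbb{E}_{Q(\mathbf{h})}\!\left[ -E(\mathbf{v}, \mathbf{h} \mid \theta) \right]
        - \log Z(\theta) \Big)
  =
  \mathbb{E}_{Q(\mathbf{h})}\!\left[ -\frac{\partial E(\mathbf{v}, \mathbf{h} \mid \theta)}{\partial \theta} \right]
  -
  \mathbb{E}_{p(\mathbf{v}', \mathbf{h}' \mid \theta)}\!\left[ -\frac{\partial E(\mathbf{v}', \mathbf{h}' \mid \theta)}{\partial \theta} \right]
```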

Original language: English (US)
Title of host publication: Artificial Neural Networks and Machine Learning, ICANN 2013 - 23rd International Conference on Artificial Neural Networks, Proceedings
Pages: 106-113
Number of pages: 8
DOIs
State: Published - 2013
Event: 23rd International Conference on Artificial Neural Networks, ICANN 2013 - Sofia, Bulgaria
Duration: Sep 10 2013 to Sep 13 2013

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8131 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 23rd International Conference on Artificial Neural Networks, ICANN 2013
Country/Territory: Bulgaria
City: Sofia
Period: 9/10/13 to 9/13/13

Keywords

  • Deep Boltzmann Machine
  • Deep Learning
  • Pretraining

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
