Deep learning and the information bottleneck principle

Naftali Tishby, Noga Zaslavsky

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Deep Neural Networks (DNNs) are analyzed via the theoretical framework of the information bottleneck (IB) principle. We first show that any DNN can be quantified by the mutual information between the layers and the input and output variables. Using this representation we can calculate the optimal information theoretic limits of the DNN and obtain finite sample generalization bounds. The advantage of getting closer to the theoretical limit is quantifiable both by the generalization bound and by the network's simplicity. We argue that the optimal architecture, that is, the number of layers and the features/connections at each layer, is related to the bifurcation points of the information bottleneck tradeoff, namely, relevant compression of the input layer with respect to the output layer. The hierarchical representations at the layered network naturally correspond to the structural phase transitions along the information curve. We believe that this new insight can lead to new optimality bounds and deep learning algorithms.
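
To make the two quantities named in the abstract concrete, here is a minimal sketch of how the mutual information between the input and a representation, I(X;T), and between the representation and the output, I(T;Y), might be computed for a small discrete problem. The joint distribution `p_xy`, the deterministic `encoder`, and the value of `beta` are all hypothetical and chosen only for illustration. The objective I(X;T) - beta * I(T;Y) is the standard IB Lagrangian from the information bottleneck literature, not something stated on this page.

```python
from math import log2

def mutual_information(joint):
    """Mutual information I(A;B) in bits for a joint table p(a, b) given as a list of rows."""
    row = [sum(r) for r in joint]            # marginal p(a)
    col = [sum(c) for c in zip(*joint)]      # marginal p(b)
    return sum(p * log2(p / (row[i] * col[j]))
               for i, r in enumerate(joint)
               for j, p in enumerate(r) if p > 0)

# Hypothetical joint distribution p(x, y): four input values, binary label.
p_xy = [[0.20, 0.05],
        [0.20, 0.05],
        [0.05, 0.20],
        [0.05, 0.20]]
p_x = [sum(r) for r in p_xy]

# Hypothetical deterministic encoder p(t|x) that merges x in {0, 1} and x in {2, 3}.
encoder = [[1.0, 0.0],
           [1.0, 0.0],
           [0.0, 1.0],
           [0.0, 1.0]]

# p(x, t) = p(x) p(t|x)
p_xt = [[p_x[i] * encoder[i][t] for t in range(2)] for i in range(4)]
# p(t, y) = sum_x p(t|x) p(x, y), using the Markov chain Y - X - T
p_ty = [[sum(encoder[i][t] * p_xy[i][y] for i in range(4)) for y in range(2)]
        for t in range(2)]

i_xt = mutual_information(p_xt)   # compression term I(X;T)
i_ty = mutual_information(p_ty)   # relevance term I(T;Y)
i_xy = mutual_information(p_xy)   # upper limit on I(T;Y)

beta = 2.0                        # hypothetical tradeoff parameter
ib_objective = i_xt - beta * i_ty

print(f"I(X;T) = {i_xt:.4f} bits")
print(f"I(T;Y) = {i_ty:.4f} bits (at most I(X;Y) = {i_xy:.4f})")
print(f"IB objective at beta={beta}: {ib_objective:.4f}")
```

In this toy case the encoder keeps one bit about X (out of two) yet preserves all of the information X carries about Y, which is the kind of relevant compression the abstract refers to.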

Original language: English (US)
Title of host publication: 2015 IEEE Information Theory Workshop, ITW 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781479955268
State: Published - Jun 24 2015
Event: 2015 IEEE Information Theory Workshop, ITW 2015 - Jerusalem, Israel
Duration: Apr 26 2015 to May 1 2015

Publication series

Name: 2015 IEEE Information Theory Workshop, ITW 2015

Other

Other: 2015 IEEE Information Theory Workshop, ITW 2015
Country/Territory: Israel
City: Jerusalem
Period: 4/26/15 to 5/1/15

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Computer Networks and Communications
  • Information Systems
  • Computational Theory and Mathematics
