TY - GEN
T1 - Deep learning and the information bottleneck principle
AU - Tishby, Naftali
AU - Zaslavsky, Noga
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/6/24
Y1 - 2015/6/24
AB - Deep Neural Networks (DNNs) are analyzed via the theoretical framework of the information bottleneck (IB) principle. We first show that any DNN can be quantified by the mutual information between the layers and the input and output variables. Using this representation we can calculate the optimal information-theoretic limits of the DNN and obtain finite-sample generalization bounds. The advantage of getting closer to the theoretical limit is quantifiable both by the generalization bound and by the network's simplicity. We argue that the optimal architecture, the number of layers, and the features/connections at each layer are all related to the bifurcation points of the information bottleneck tradeoff, namely, relevant compression of the input layer with respect to the output layer. The hierarchical representations in the layered network naturally correspond to the structural phase transitions along the information curve. We believe that this new insight can lead to new optimality bounds and deep learning algorithms.
UR - http://www.scopus.com/inward/record.url?scp=84938946187&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84938946187&partnerID=8YFLogxK
U2 - 10.1109/ITW.2015.7133169
DO - 10.1109/ITW.2015.7133169
M3 - Conference contribution
AN - SCOPUS:84938946187
T3 - 2015 IEEE Information Theory Workshop, ITW 2015
BT - 2015 IEEE Information Theory Workshop, ITW 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2015 IEEE Information Theory Workshop, ITW 2015
Y2 - 26 April 2015 through 1 May 2015
ER -