TY - CONF
T1 - Normalizing the normalizers: Comparing and extending network normalization schemes
T2 - 5th International Conference on Learning Representations, ICLR 2017
AU - Ren, Mengye
AU - Liao, Renjie
AU - Urtasun, Raquel
AU - Sinz, Fabian H.
AU - Zemel, Richard S.
N1 - Funding Information:
Acknowledgements: RL is supported by Connaught International Scholarships. FS would like to thank Edgar Y. Walker, Shuang Li, Andreas Tolias and Alex Ecker for helpful discussions. Supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior/Interior Business Center (DoI/IBC) contract number D16PC00003. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/IBC, or the U.S. Government.
Publisher Copyright:
© ICLR 2017 - Conference Track Proceedings. All rights reserved.
PY - 2017
Y1 - 2017
AB - Normalization techniques have only recently begun to be exploited in supervised learning tasks. Batch normalization exploits mini-batch statistics to normalize the activations. This was shown to speed up training and result in better models. However, its success has been very limited when dealing with recurrent neural networks. On the other hand, layer normalization normalizes the activations across all activities within a layer. This was shown to work well in the recurrent setting. In this paper we propose a unified view of normalization techniques as forms of divisive normalization, which includes layer and batch normalization as special cases. Our second contribution is the finding that a small modification to these normalization schemes, in conjunction with a sparse regularizer on the activations, leads to significant benefits over standard normalization techniques. We demonstrate the effectiveness of our unified divisive normalization framework in the context of convolutional neural networks and recurrent neural networks, showing improvements over baselines in image classification, language modeling, and super-resolution.
UR - http://www.scopus.com/inward/record.url?scp=85088228386&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85088228386&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85088228386
Y2 - 24 April 2017 through 26 April 2017
ER -