TY - GEN
T1 - Tunable Efficient Unitary Neural Networks (EUNN) and their application to RNNs
AU - Jing, Li
AU - Shen, Yichen
AU - Dubcek, Tena
AU - Peurifoy, John
AU - Skirlo, Scott
AU - LeCun, Yann
AU - Tegmark, Max
AU - Soljačić, Marin
N1 - Funding Information:
We thank Hugo Larochelle and Yoshua Bengio for helpful discussions and comments. This work was partially supported by the Army Research Office through the Institute for Soldier Nanotechnologies under contract W911NF-13-D0001, the National Science Foundation under Grant No. CCF-1640012 and the Rothberg Family Fund for Cognitive Science.
Publisher Copyright:
© Copyright 2017 by the author(s).
PY - 2017
Y1 - 2017
N2 - Using unitary (instead of general) matrices in artificial neural networks (ANNs) is a promising way to solve the gradient explosion/vanishing problem, as well as to enable ANNs to learn long-term correlations in the data. This approach appears particularly promising for Recurrent Neural Networks (RNNs). In this work, we present a new architecture for implementing an Efficient Unitary Neural Network (EUNN); its main advantages can be summarized as follows. Firstly, the representation capacity of the unitary space in an EUNN is fully tunable, ranging from a subspace of SU(N) to the entire unitary space. Secondly, the computational complexity for training an EUNN is merely O(1) per parameter. Finally, we test the performance of EUNNs on the standard copying task, the pixel-permuted MNIST digit recognition benchmark as well as the Speech Prediction Test (TIMIT). We find that our architecture significantly outperforms both other state-of-the-art unitary RNNs and the LSTM architecture, in terms of the final performance and/or the wall-clock training speed. EUNNs are thus promising alternatives to RNNs and LSTMs for a wide variety of applications.
AB - Using unitary (instead of general) matrices in artificial neural networks (ANNs) is a promising way to solve the gradient explosion/vanishing problem, as well as to enable ANNs to learn long-term correlations in the data. This approach appears particularly promising for Recurrent Neural Networks (RNNs). In this work, we present a new architecture for implementing an Efficient Unitary Neural Network (EUNN); its main advantages can be summarized as follows. Firstly, the representation capacity of the unitary space in an EUNN is fully tunable, ranging from a subspace of SU(N) to the entire unitary space. Secondly, the computational complexity for training an EUNN is merely O(1) per parameter. Finally, we test the performance of EUNNs on the standard copying task, the pixel-permuted MNIST digit recognition benchmark as well as the Speech Prediction Test (TIMIT). We find that our architecture significantly outperforms both other state-of-the-art unitary RNNs and the LSTM architecture, in terms of the final performance and/or the wall-clock training speed. EUNNs are thus promising alternatives to RNNs and LSTMs for a wide variety of applications.
UR - http://www.scopus.com/inward/record.url?scp=85048389993&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85048389993&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85048389993
T3 - 34th International Conference on Machine Learning, ICML 2017
SP - 2753
EP - 2761
BT - 34th International Conference on Machine Learning, ICML 2017
PB - International Machine Learning Society (IMLS)
T2 - 34th International Conference on Machine Learning, ICML 2017
Y2 - 6 August 2017 through 11 August 2017
ER -