TY - GEN
T1 - Long-term prediction of μeCOG signals with a spatio-temporal pyramid of adversarial convolutional networks
AU - Wang, Ran
AU - Song, Yilin
AU - Wang, Yao
AU - Viventi, Jonathan
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/5/23
Y1 - 2018/5/23
N2 - Video prediction into a sufficiently long future has many potential applications. Modeling long-term dynamics for time series is challenging with convolutional neural network structures, which are usually good at capturing short-term dependencies. In this work, we propose to embed the convolutional neural network within a spatio-temporal pyramid structure, to exploit both long-term and short-term temporal dependencies and capture both macro-scale and micro-scale spatial structures. The prediction at a given scale is conditioned on the features extracted from a lower scale and past observations from the current scale. To overcome the blurring caused by the mean squared error (MSE) loss, we add a critic model with a Wasserstein-distance-based adversarial loss to complement the MSE. We compare our spatio-temporal pyramid model against a single-scale convolutional network as well as a model with multiple spatial scales only, and demonstrate that our pyramid structure performs better for predicting up to 24 future frames.
AB - Video prediction into a sufficiently long future has many potential applications. Modeling long-term dynamics for time series is challenging with convolutional neural network structures, which are usually good at capturing short-term dependencies. In this work, we propose to embed the convolutional neural network within a spatio-temporal pyramid structure, to exploit both long-term and short-term temporal dependencies and capture both macro-scale and micro-scale spatial structures. The prediction at a given scale is conditioned on the features extracted from a lower scale and past observations from the current scale. To overcome the blurring caused by the mean squared error (MSE) loss, we add a critic model with a Wasserstein-distance-based adversarial loss to complement the MSE. We compare our spatio-temporal pyramid model against a single-scale convolutional network as well as a model with multiple spatial scales only, and demonstrate that our pyramid structure performs better for predicting up to 24 future frames.
KW - ECoG
KW - Machine learning
KW - Video prediction
UR - http://www.scopus.com/inward/record.url?scp=85048092150&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85048092150&partnerID=8YFLogxK
U2 - 10.1109/ISBI.2018.8363813
DO - 10.1109/ISBI.2018.8363813
M3 - Conference contribution
AN - SCOPUS:85048092150
T3 - Proceedings - International Symposium on Biomedical Imaging
SP - 1313
EP - 1317
BT - 2018 IEEE 15th International Symposium on Biomedical Imaging, ISBI 2018
PB - IEEE Computer Society
T2 - 15th IEEE International Symposium on Biomedical Imaging, ISBI 2018
Y2 - 4 April 2018 through 7 April 2018
ER -