TY - GEN
T1 - Stimulus Speech Decoding from Human Cortex with Generative Adversarial Network Transfer Learning
AU - Wang, Ran
AU - Chen, Xupeng
AU - Khalilian-Gourtani, Amirhossein
AU - Chen, Zhaoxi
AU - Yu, Leyao
AU - Flinker, Adeen
AU - Wang, Yao
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/4
Y1 - 2020/4
N2 - Decoding auditory stimuli from neural activity can enable neuroprosthetics and direct communication with the brain. Some recent studies have shown successful speech decoding from intracranial recordings using deep learning models. However, scarcity of training data leads to low-quality speech reconstruction, which prevents a complete brain-computer-interface (BCI) application. In this work, we propose a transfer learning approach with a pre-trained GAN to disentangle representation and generation layers for decoding. We first pre-train a generator to produce spectrograms from a representation space using a large corpus of natural speech data. With a small amount of paired data containing the stimulus speech and corresponding ECoG signals, we then transfer it to a bigger network with an encoder attached in front of it, which maps the neural signal to the representation space. To further improve the network's generalization ability, we introduce a Gaussian prior distribution regularizer on the latent representation during the transfer phase. With at most 150 training samples for each tested subject, we achieve state-of-the-art decoding performance. By visualizing the attention mask embedded in the encoder, we observe brain dynamics that are consistent with findings from previous studies investigating dynamics in the superior temporal gyrus (STG), pre-central gyrus (motor) and inferior frontal gyrus (IFG). Our findings demonstrate high reconstruction accuracy using deep learning networks, together with the potential to elucidate interactions across different brain regions during a cognitive task.
KW - electrocorticographic (ECoG)
KW - generative adversarial networks (GAN)
KW - speech decoding
KW - superior temporal gyrus (STG)
KW - transfer learning
UR - http://www.scopus.com/inward/record.url?scp=85085860019&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85085860019&partnerID=8YFLogxK
U2 - 10.1109/ISBI45749.2020.9098589
DO - 10.1109/ISBI45749.2020.9098589
M3 - Conference contribution
AN - SCOPUS:85085860019
T3 - Proceedings - International Symposium on Biomedical Imaging
SP - 390
EP - 394
BT - ISBI 2020 - 2020 IEEE International Symposium on Biomedical Imaging
PB - IEEE Computer Society
T2 - 17th IEEE International Symposium on Biomedical Imaging, ISBI 2020
Y2 - 3 April 2020 through 7 April 2020
ER -