TY - JOUR
T1 - URCA-GAN: UpSample Residual Channel-wise Attention Generative Adversarial Network for image-to-image translation
AU - Nie, Xuan
AU - Ding, Haoxuan
AU - Qi, Manhua
AU - Wang, Yifei
AU - Wong, Edward K.
N1 - Funding Information:
This work was supported by the 2020 Key Research and Development Plan of Shaanxi Province under Project 2020ZDLSF04-02.
Publisher Copyright:
© 2021 Elsevier B.V.
PY - 2021/7/5
Y1 - 2021/7/5
AB - Multimodal image-to-image translation is a challenging topic in computer vision. In image-to-image translation, an image is translated from a source domain to different target domains. For many translation tasks, the difference between the source image and the target image lies only in the foreground. In this paper, we propose a novel deep-learning-based method for image-to-image translation. Our method, named URCA-GAN, is based on a generative adversarial network and generates images of higher quality and diversity than existing methods. We introduce Upsample Residual Channel-wise Attention Blocks (URCABs), based on ResNet and softmax channel-wise attention, to extract features associated with the foreground. The URCABs form a parallel-architecture module, named the Upsample Residual Channel-wise Attention Module (URCAM), which merges the features from the URCABs. URCAM is embedded after the decoder in the generator to regulate image generation. Experimental results and quantitative evaluations show that our model outperforms current state-of-the-art methods in both quality and diversity. In particular, the LPIPS, PSNR, and SSIM of URCA-GAN on the CelebA dataset increase by 1.31%, 1.66%, and 4.74%, respectively; the PSNR and SSIM on the RaFD dataset increase by 1.35% and 6.71%, respectively. In addition, visualization of the features from the URCABs demonstrates that our model emphasizes foreground features.
KW - Generative adversarial network (GAN)
KW - Multimodal image-to-image translation
KW - Softmax channel-wise attention mechanism
UR - http://www.scopus.com/inward/record.url?scp=85103128400&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85103128400&partnerID=8YFLogxK
U2 - 10.1016/j.neucom.2021.02.054
DO - 10.1016/j.neucom.2021.02.054
M3 - Article
AN - SCOPUS:85103128400
SN - 0925-2312
VL - 443
SP - 75
EP - 84
JO - Neurocomputing
JF - Neurocomputing
ER -