Multimodal image-to-image translation is a challenging topic in computer vision. In image-to-image translation, an image is translated from a source domain to different target domains. For many translation tasks, the difference between the source image and the target image lies only in the foreground. In this paper, we propose a novel deep-learning-based method for image-to-image translation. Our method, named URCA-GAN, is based on a generative adversarial network and generates images of higher quality and diversity than existing methods. We introduce Upsample Residual Channel-wise Attention Blocks (URCABs), based on ResNet and softmax channel-wise attention, to extract features associated with the foreground. The URCABs form a parallel architecture module named the Upsample Residual Channel-wise Attention Module (URCAM), which merges the features from the URCABs. URCAM is embedded after the decoder in the generator to regulate image generation. Experimental results and quantitative evaluations show that our model outperforms current state-of-the-art methods in both quality and diversity. In particular, the LPIPS, PSNR, and SSIM of URCA-GAN on the CelebA dataset increase by 1.31%, 1.66%, and 4.74%, respectively, and the PSNR and SSIM on the RaFD dataset increase by 1.35% and 6.71%, respectively. In addition, visualization of the features from the URCABs demonstrates that our model emphasizes foreground features.
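The abstract describes the URCABs as combining ResNet-style blocks with softmax channel-wise attention, i.e., reweighting feature-map channels by attention weights that sum to one. A minimal NumPy sketch of that reweighting idea is below; the function names and the use of global average pooling as the channel descriptor are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def channel_wise_attention(features):
    """Reweight the channels of a (C, H, W) feature map.

    A per-channel descriptor is computed by global average pooling
    (an assumed choice), softmax over the channel axis turns the
    descriptors into weights summing to 1, and each channel is then
    scaled by its weight.
    """
    descriptors = features.mean(axis=(1, 2))   # shape (C,)
    weights = softmax(descriptors, axis=0)     # shape (C,), sums to 1
    # Broadcast the per-channel weight over the spatial dimensions.
    return features * weights[:, None, None]

x = np.random.rand(8, 4, 4)
y = channel_wise_attention(x)
```

In the paper's architecture several such blocks run in parallel and their outputs are merged by URCAM after the decoder; this sketch only shows the channel-reweighting step that a single block would apply.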
- Generative adversarial network (GAN)
- Multimodal image-to-image translation
- Softmax channel-wise attention mechanism
ASJC Scopus subject areas
- Computer Science Applications
- Cognitive Neuroscience
- Artificial Intelligence