A Hybrid Conditional GAN Design for Image-to-Image Translation Integrating U-Net and ResNet
Year 2025, Volume: 4, Issue: 3, 557–579, 20.10.2025
Khaled Al Hariri, Muhammet Paşaoğlu, Erkut Arıcan
Abstract
Image-to-image translation is one of the major image processing tasks in computer vision, with applications such as style transfer, image enhancement, and more. This study introduces a novel approach to image-to-image translation based on a conditional generative adversarial network (cGAN) with a new hybrid generator architecture that combines U-Net and ResNet. Because the two architectures are highly compatible, this combination allows the model to benefit from the advantages of both. The discriminator uses the PatchGAN architecture for patch-wise discrimination. The model was evaluated using SSIM and PSNR, standard metrics for image quality assessment, and the results were compared with previous work that used the same evaluation criteria and datasets. Furthermore, a public survey was conducted in which participants were asked to choose, between the output of the proposed model and that of another study, the image that most closely resembled the target image. Both the evaluation metrics and the public survey demonstrate that the proposed image-to-image translation method outperforms those of previous studies.
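To make the described design concrete, the following is a minimal PyTorch-style sketch of a generator in this spirit: a U-Net encoder-decoder whose bottleneck stacks ResNet residual blocks and whose decoder concatenates skip connections from the encoder. The framework choice, layer counts, channel widths, normalization layers, and class names (ResidualBlock, HybridUNetResNetGenerator) are all illustrative assumptions, not the authors' exact architecture.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Standard pre-activation-free residual block (He et al., 2016)."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        return x + self.body(x)  # identity shortcut

class HybridUNetResNetGenerator(nn.Module):
    """U-Net encoder/decoder with a ResNet bottleneck (illustrative depths)."""
    def __init__(self, in_ch=3, out_ch=3, base=64, n_res=6):
        super().__init__()
        # U-Net encoder: two downsampling stages
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base, 4, 2, 1),
                                  nn.LeakyReLU(0.2))
        self.enc2 = nn.Sequential(nn.Conv2d(base, base * 2, 4, 2, 1),
                                  nn.InstanceNorm2d(base * 2),
                                  nn.LeakyReLU(0.2))
        # ResNet bottleneck: a stack of residual blocks
        self.res = nn.Sequential(*[ResidualBlock(base * 2) for _ in range(n_res)])
        # U-Net decoder: upsampling with concatenated skip connections
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(base * 4, base, 4, 2, 1),
                                  nn.InstanceNorm2d(base),
                                  nn.ReLU(inplace=True))
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(base * 2, out_ch, 4, 2, 1),
                                  nn.Tanh())

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        b = self.res(e2)
        d2 = self.dec2(torch.cat([b, e2], dim=1))    # skip from enc2
        return self.dec1(torch.cat([d2, e1], dim=1))  # skip from enc1

# Example: y = HybridUNetResNetGenerator()(torch.randn(1, 3, 256, 256))

In a pix2pix-style setup, such a generator would be trained adversarially against a PatchGAN discriminator, which classifies overlapping N×N patches as real or fake rather than scoring the whole image at once.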
Ethical Statement
This article did not require ethics committee approval. The authors declare no conflict of interest with any person or institution.
References
- H. Hoyez, C. Schockaert, J. Rambach, B. Mirbach, and D. Stricker, “Unsupervised image-to-image translation: A review,” Sensors, vol. 22, no. 21, p. 8540, 2022.
- P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Honolulu, HI, USA, 2017.
- E. U. R. Mohammed, N. R. Soora, and S. W. Mohammed, “A comprehensive literature review on convolutional neural networks,” Computer Science Publications, 2022.
- A. Kamil and T. Shaikh, “Literature review of generative models for image-to-image translation problems,” in Proc. Int. Conf. Comput. Intell. Knowl. Economy (ICCIKE), Dubai, United Arab Emirates, 2019.
- M. Mirza and S. Osindero, “Conditional generative adversarial nets,” arXiv preprint, arXiv:1411.1784, 2014.
- C. Koç and F. Özyurt, “An examination of synthetic images produced with DCGAN according to the size of data and epoch,” Firat Univ. J. Exp. Comput. Eng., vol. 2, no. 1, pp. 32–37, 2023.
- I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 27, pp. 2672–2680, 2014.
- G. Perarnau, J. Van De Weijer, B. Raducanu, and J. M. Álvarez, “Invertible conditional GANs for image editing,” arXiv preprint, arXiv:1611.06355, 2016.
- A. Odena, C. Olah, and J. Shlens, “Conditional image synthesis with auxiliary classifier GANs,” in Proc. Int. Conf. Mach. Learn. (ICML), Sydney, Australia, 2017.
- O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Med. Image Comput. Comput.-Assist. Interv. – MICCAI 2015, Munich, Germany, 2015.
- Y. Ji, H. Zhang, and Q. M. J. Wu, “Saliency detection via conditional adversarial image-to-image network,” Neurocomputing, vol. 316, pp. 357–368, 2018.
- X. Mao, S. Wang, L. Zheng, and Q. Huang, “Semantic invariant cross-domain image generation with generative adversarial networks,” Neurocomputing, vol. 293, pp. 55–63, 2018.
- Y. Gan, J. Gong, M. Ye, Y. Qian, and K. Liu, “Unpaired cross-domain image translation with augmented auxiliary domain information,” Neurocomputing, vol. 316, pp. 112–123, 2018.
- S. Mo, M. Cho, and J. Shin, “InstaGAN: Instance-aware image-to-image translation,” arXiv preprint, arXiv:1812.10889, 2018.
- Y. Cho, R. Malav, G. Pandey, and A. Kim, “DehazeGAN: Underwater haze image restoration using unpaired image-to-image translation,” IFAC-PapersOnLine, vol. 52, no. 21, pp. 82–85, 2019.
- D. Yang, S. Hong, Y. Jang, T. Zhao, and H. Lee, “Diversity-sensitive conditional generative adversarial networks,” arXiv preprint, arXiv:1901.09024, 2019.
- M.-Y. Liu, X. Huang, A. Mallya, T. Karras, T. Aila, J. Lehtinen, and J. Kautz, “Few-shot unsupervised image-to-image translation,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Seoul, South Korea, 2019.
- W. Xu, S. Keshmiri, and G. Wang, “Toward learning a unified many-to-many mapping for diverse image translation,” Pattern Recognit., vol. 93, pp. 570–580, 2019.
- L. Ye, B. Zhang, M. Yang, and W. Lian, “Triple-translation GAN with multi-layer sparse representation for face image synthesis,” Neurocomputing, vol. 358, pp. 294–308, 2019.
- Y. Choi, Y. Uh, J. Yoo, and J.-W. Ha, “StarGAN v2: Diverse image synthesis for multiple domains,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Seattle, WA, USA, 2020.
- P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Pix2Pix datasets,” UC Berkeley, Feb. 9, 2017. [Online]. Available: https://efrosgans.eecs.berkeley.edu/pix2pix/datasets/. [Accessed: Apr. 2, 2024].
- K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, NV, USA, 2016.
- Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004.
- S. Mallat, “Compression,” in A Wavelet Tour of Signal Processing, 3rd ed., Boston, MA, USA: Academic Press, 2009, pp. 481–533.
References
-
H. Hoyez, C. Schockaert, J. Rambach, B. Mirbach, and D. Stricker, “Unsupervised image-to-image translation: A review,” Sensors, vol. 22, no. 21, p. 8540, 2022.
-
P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Honolulu, HI, USA, 2017.
-
E. U. R. Mohammed, N. R. Soora, and S. W. Mohammed, “A comprehensive literature review on convolutional neural networks,” Computer Science Publications, 2022.
-
A. Kamil, and T. Shaikh, “Literature review of generative models for image-to-image translation problems,” in Proc. Int. Conf. Comput. Intell. Knowl. Economy (ICCIKE), Dubai, United Arab Emirates, 2019.
-
M. Mirza, and S. Osindero, “Conditional generative adversarial nets,” arXiv preprint, arXiv:1411.1784, 2014.
-
C. Koç, and F. Özyurt, “An examination of synthetic images produced with DCGAN according to the size of data and epoch,” Firat Univ. J. Exp. Comput. Eng., vol. 2, no. 1, pp. 32–37, 2023.
-
I. Goodfellow, J. Pouget-Abadie, M. Mirza, X. Bing, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 27, pp. 2672–2680, 2014.
-
G. Perarnau, J. Van De Weijer, B. Raducanu, and J. M. Álvarez, “Invertible conditional GANs for image editing,” arXiv preprint, arXiv:1611.06355, 2016.
-
A. Odena, C. Olah, and J. Shlens, “Conditional image synthesis with auxiliary classifier GANs,” in Proc. Int. Conf. Mach. Learn. (ICML), Sydney, Australia, 2017.
-
O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Med. Image Comput. Comput.-Assist. Interv. – MICCAI 2015, Munich, Germany, 2015.
-
Y. Ji, H. Zhang, and Q. M. J. Wu, “Saliency detection via conditional adversarial image-to-image network,” Neurocomputing, vol. 316, pp. 357–368, 2018.
-
X. Mao, S. Wang, L. Zheng, and Q. Huang, “Semantic invariant cross-domain image generation with generative adversarial networks,” Neurocomputing, vol. 293, pp. 55–63, 2018.
-
Y. Gan, J. Gong, M. Ye, Y. Qian, and K. Liu, “Unpaired cross-domain image translation with augmented auxiliary domain information,” Neurocomputing, vol. 316, pp. 112–123, 2018.
-
S. Mo, M. Cho, and J. Shin, “InstaGAN: Instance-aware image-to-image translation,” arXiv preprint, arXiv:1812.10889, 2018.
-
Y. Cho, R. Malav, G. Pandey, and A. Kim, “DehazeGAN: Underwater haze image restoration using unpaired image-to-image translation,” IFAC-PapersOnLine, vol. 52, no. 21, pp. 82–85, 2019.
-
D. Yang, S. Hong, Y. Jang, T. Zhao, and H. Lee, “Diversity-sensitive conditional generative adversarial networks,” arXiv preprint, arXiv:1901.09024, 2019.
-
M.-Y. Liu, X. Huang, A. Mallya, T. Karras, T. Aila, J. Lehtinen, and J. Kautz, “Few-shot unsupervised image-to-image translation,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Seoul, South Korea, 2019.
-
W. Xu, S. Keshmiri, and G. Wang, “Toward learning a unified many-to-many mapping for diverse image translation,” Pattern Recognit., vol. 93, pp. 570–580, 2019.
-
L. Ye, B. Zhang, M. Yang, and W. Lian, “Triple-translation GAN with multi-layer sparse representation for face image synthesis,” Neurocomputing, vol. 358, pp. 294–308, 2019.
-
Y. Choi, Y. Uh, J. Yoo, and J.-W. Ha, “StarGAN v2: Diverse image synthesis for multiple domains,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Seattle, WA, USA, 2020.
-
P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Pix2Pix datasets,” UC Berkeley, Feb. 9, 2017. [Online]. Available: https://efrosgans.eecs.berkeley.edu/pix2pix/datasets/. [Accessed: Apr. 2, 2024].
-
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, NV, USA, 2016.
-
Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004.
-
S. Mallat, “Compression,” in A Wavelet Tour of Signal Processing, 3rd ed., Boston, MA, USA: Academic Press, 2009, pp. 481–533.