Research Article

WY-NET: A NEW APPROACH TO IMAGE SYNTHESIS WITH GENERATIVE ADVERSARIAL NETWORKS

Year 2022, Issue: 050, 270 - 290, 30.09.2022

Abstract

Conditional image synthesis is the translation of images of the same dimensions between different domains. Generative Adversarial Networks (GANs) are widely used for such translation tasks. In the classical GAN approach, feature data are transferred between the encoder and the decoder of the generator network during image translation. While this data transfer increases the quality of the translated image, it also creates a data dependency. This dependency has two negative effects: first, it prevents determining whether the encoder or the decoder is responsible for errors in the translated images; second, it makes image synthesis quality dependent on increasing the network's parameter count. In this study, two architectures (dY-Net and uY-Net) are proposed. These architectures are built on the principle of equalizing high-level feature parameters in cross-domain image translation. The first architecture concentrates on the speed of image synthesis, the second on its quality. The dY-Net architecture, which targets speed, significantly reduces both the data dependency and the parameter space. The uY-Net architecture, which targets synthesis quality, aims to maximize quality metrics such as SSIM and PSNR. Three datasets (Maps, Cityscapes, and Denim2Mustache) were used to benchmark the proposed architectures against existing image synthesis approaches. The tests showed that the proposed architecture synthesized images with similar accuracy while using approximately 66% of the parameters of DiscoGAN, one of the existing approaches. These results show that the WY-Net architectures, which combine high performance with translation quality, can be used in image synthesis.
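The data dependency described above arises from skip connections that carry encoder features into the decoder, as in U-Net-style generators [28]. The following PyTorch sketch is illustrative only; the layer sizes and the use_skips switch are hypothetical choices for exposition, not the paper's dY-Net/uY-Net code. It shows how encoder-to-decoder feature transfer couples the two halves of the generator and inflates the decoder's parameter count:

    # Illustrative toy generator; NOT the paper's dY-Net/uY-Net implementation.
    import torch
    import torch.nn as nn

    def down(c_in, c_out):  # halves the spatial resolution
        return nn.Sequential(nn.Conv2d(c_in, c_out, 4, stride=2, padding=1),
                             nn.BatchNorm2d(c_out), nn.LeakyReLU(0.2))

    def up(c_in, c_out):    # doubles the spatial resolution
        return nn.Sequential(nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1),
                             nn.BatchNorm2d(c_out), nn.ReLU())

    class ToyGenerator(nn.Module):
        def __init__(self, use_skips=True):
            super().__init__()
            self.use_skips = use_skips
            self.enc1, self.enc2 = down(3, 64), down(64, 128)
            self.dec1 = up(128, 64)
            # With skips, the concatenated encoder features double the input
            # channels of the final layer, so the decoder grows accordingly.
            self.dec2 = nn.ConvTranspose2d(64 * (2 if use_skips else 1), 3,
                                           4, stride=2, padding=1)

        def forward(self, x):
            e1 = self.enc1(x)                 # 64  x H/2 x W/2
            e2 = self.enc2(e1)                # 128 x H/4 x W/4
            d1 = self.dec1(e2)                # 64  x H/2 x W/2
            if self.use_skips:                # encoder -> decoder data transfer
                d1 = torch.cat([d1, e1], dim=1)
            return torch.tanh(self.dec2(d1))  # 3 x H x W

    for skips in (True, False):
        g = ToyGenerator(use_skips=skips)
        print(f"use_skips={skips}: "
              f"{sum(p.numel() for p in g.parameters()):,} parameters")

The quality metrics named in the abstract can be computed with scikit-image's standard implementations [23, 26]; a minimal usage example on synthetic stand-in data (requires scikit-image >= 0.19 for the channel_axis argument):

    import numpy as np
    from skimage.metrics import structural_similarity, peak_signal_noise_ratio

    real = np.random.rand(256, 256, 3)  # placeholder for a reference image
    fake = np.clip(real + 0.02 * np.random.randn(256, 256, 3), 0, 1)
    print("SSIM:", structural_similarity(real, fake, channel_axis=-1, data_range=1.0))
    print("PSNR:", peak_signal_noise_ratio(real, fake, data_range=1.0))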

Supporting Institution

Inonu University and Baykan Denim A.Ş.

Project Number

FKP-2021-2144

Thanks

This study was funded by Baykan Denim A.Ş. and the Scientific Research Projects Department of Inonu University under project number FKP-2021-2144. We would like to thank Baykan Denim A.Ş. and Inonu University.

References

  • [1] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S. and Bengio, Y. (2014), Generative adversarial nets. Advances in neural information processing systems, 27.
  • [2] Radford, A., Metz, L. and Chintala, S. (2015), Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.
  • [3] Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A. and Shi, W. (2016), Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4681-4690).
  • [4] Özdemir, D. and Arslan, N. N. (2022), Analysis of Deep Transfer Learning Methods for Early Diagnosis of the Covid-19 Disease with Chest X-ray Images. Düzce Üniversitesi Bilim ve Teknoloji Dergisi, 10(2), 628-640.
  • [5] Karras, T., Aila, T., Laine, S. and Lehtinen, J. (2017), Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196.
  • [6] Zhang, H., Goodfellow, I., Metaxas, D. and Odena, A. (2018), Self-attention generative adversarial networks. In International conference on machine learning (pp. 7354-7363). PMLR.
  • [7] Karras, T., Laine, S. and Aila, T. (2018), A style-based generator architecture for generative adversarial networks. arXiv preprint arXiv:1812.04948.
  • [8] Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J. and Aila, T. (2019), Analyzing and improving the image quality of stylegan. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8110-8119).
  • [9] Karras, T., Aittala, M., Laine, S., Härkönen, E., Hellsten, J., Lehtinen, J. and Aila, T. (2021), Alias-free generative adversarial networks. arXiv preprint arXiv:2106.12423.
  • [10] Brock, A., Donahue, J. and Simonyan, K. (2018), Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096.
  • [11] Donahue, J. and Simonyan, K. (2019), Large scale adversarial representation learning. arXiv preprint arXiv:1907.02544.
  • [12] Li, B., Zhu, Y., Wang, Y., Lin, C.-W., Ghanem, B. and Shen, L. (2021), AniGAN: Style-guided generative adversarial networks for unsupervised anime face generation. IEEE Transactions on Multimedia. doi: 10.1109/TMM.2021.3113786.
  • [13] Isola, P., Zhu, J. Y., Zhou, T. and Efros, A. A. (2017), Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1125-1134).
  • [14] Wang, T. C., Liu, M. Y., Zhu, J. Y., Tao, A., Kautz, J. and Catanzaro, B. (2018), High-resolution image synthesis and semantic manipulation with conditional gans. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 8798-8807).
  • [15] Park, T., Liu, M. Y., Wang, T. C. and Zhu, J. Y. (2019), Semantic image synthesis with spatially-adaptive normalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2337-2346).
  • [16] Zhu, J. Y., Park, T., Isola, P. and Efros, A. A. (2017), Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision (pp. 2223-2232).
  • [17] Li, Y., Singh, K. K., Ojha, U. and Lee, Y. J. (2020), Mixnmatch: Multifactor disentanglement and encoding for conditional image generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8039-8048).
  • [18] Dundar, A., Sapra, K., Liu, G., Tao, A. and Catanzaro, B. (2020), Panoptic-based image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8070-8079).
  • [19] Wang, X., Li, Y., Zhang, H. and Shan, Y. (2021), Towards Real-World Blind Face Restoration with Generative Facial Prior. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 9168-9178).
  • [20] Wang, X., Xie, L., Dong, C. and Shan, Y. (2021), Real-esrgan: Training real-world blind super-resolution with pure synthetic data. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 1905-1914).
  • [21] Suvorov, R., Logacheva, E., Mashikhin, A., Remizova, A., Ashukha, A., Silvestrov, A. and Lempitsky, V. (2021), Resolution-robust Large Mask Inpainting with Fourier Convolutions. arXiv preprint arXiv:2109.07161.
  • [22] Kim, T., Cha, M., Kim, H., Lee, J. K., and Kim, J. (2017, July), Learning to discover cross-domain relations with generative adversarial networks. In International Conference on Machine Learning (pp. 1857-1865). PMLR.
  • [23] Nilsson, J. and Akenine-Möller, T. (2020), Understanding ssim. arXiv preprint arXiv:2006.13846.
  • [24] Mihelich, M., Dognin, C., Shu, Y., and Blot, M. (2020), A Characterization of Mean Squared Error for Estimator with Bagging. In International Conference on Artificial Intelligence and Statistics (pp. 288-297). PMLR.
  • [25] Willmott, C. J. and Matsuura, K. (2005), Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Climate research, 30(1), 79-82.
  • [26] Fardo, F. A., Conforto, V. H., de Oliveira, F. C. and Rodrigues, P. S. (2016), A formal evaluation of PSNR as quality measurement parameter for image segmentation algorithms. arXiv preprint arXiv:1605.07116.
  • [27] Bailer, C., Varanasi, K., and Stricker, D. (2017), CNN-based patch matching for optical flow with thresholded hinge embedding loss. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3250-3259).
  • [28] Ronneberger, O., Fischer, P. and Brox, T. (2015, October), U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 234-241). Springer, Cham.
  • [29] Lim, J. H., and Ye, J. C. (2017), Geometric gan. arXiv preprint arXiv:1705.02894.
  • [30] Şahin, E. and Talu, M. F. (2021), Bıyık Deseni Üretiminde Çekişmeli Üretici Ağların Performans Karşılaştırması [Performance Comparison of Generative Adversarial Networks in Mustache Pattern Generation]. Bitlis Eren Üniversitesi Fen Bilimleri Dergisi, 10(4), 1575-1589. doi: 10.17798/bitlisfen.98586.
  • [31] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R. and Schiele, B. (2016), The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3213-3223).

Details

Primary Language English
Subjects Engineering
Journal Section Research Articles
Authors

Emrullah Şahin 0000-0002-3390-6285

Muhammed Fatih Talu 0000-0003-1166-8404

Project Number FKP-2021-2144
Publication Date September 30, 2022
Submission Date July 21, 2022
Published in Issue Year 2022 Issue: 050

Cite

IEEE E. Şahin and M. F. Talu, “WY-NET: A NEW APPROACH TO IMAGE SYNTHESIS WITH GENERATIVE ADVERSARIAL NETWORKS”, JSR-A, no. 050, pp. 270–290, September 2022.