Research Article

An unconditional generative model with self-attention module for single image generation

Year 2024, Volume: 13, Issue: 1, 196-204, 15.01.2024
https://doi.org/10.28948/ngumuh.1367602

Abstract

Generative Adversarial Networks (GANs) have revolutionized the field of deep learning by enabling the production of high-quality synthetic data. However, the effectiveness of GANs largely depends on the size and quality of the training data. In many real-world applications, collecting large amounts of high-quality training data is time-consuming and expensive. Accordingly, GAN models that learn from limited data have been developed in recent years. In this study, we propose a GAN model that can learn from a single training image. Our model is based on the principle of multiple GANs operating sequentially at different scales: each GAN learns the features of the training image at its scale and transfers them to the next GAN, so that the final scale generates samples with diverse yet realistic structures. To increase the realism and quality of the generated images, we employ a self-attention module and a new scaling method. The experimental results show that our model performs image generation successfully, and we further demonstrate its robustness by testing it on different image manipulation applications. As a result, our model can produce realistic, high-quality, and diverse images from a single training image while providing short training time and good training stability.
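
For readers unfamiliar with the attention mechanism named in the abstract, the sketch below shows a minimal PyTorch rendering of a SAGAN-style self-attention layer (Zhang et al., cited in the references), offered as one plausible reading of the paper's "self-attention module". It is not the authors' released code: the class name, the channel-reduction factor of 8, and the learned residual weight gamma are SAGAN conventions assumed here, not details confirmed by this page.

    # Minimal SAGAN-style self-attention layer (illustrative sketch only).
    import torch
    import torch.nn as nn

    class SelfAttention(nn.Module):
        def __init__(self, channels: int):
            super().__init__()
            # 1x1 convolutions project the feature map to query/key/value spaces.
            self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
            self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
            self.value = nn.Conv2d(channels, channels, kernel_size=1)
            # gamma starts at 0, so training begins from the plain convolutional
            # features and gradually mixes in the attention output.
            self.gamma = nn.Parameter(torch.zeros(1))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, c, h, w = x.shape
            n = h * w  # number of spatial positions
            q = self.query(x).view(b, -1, n).permute(0, 2, 1)  # B x N x C/8
            k = self.key(x).view(b, -1, n)                     # B x C/8 x N
            attn = torch.softmax(torch.bmm(q, k), dim=-1)      # B x N x N
            v = self.value(x).view(b, -1, n)                   # B x C x N
            out = torch.bmm(v, attn.permute(0, 2, 1)).view(b, c, h, w)
            return self.gamma * out + x  # residual connection

    # Example: SelfAttention(32)(torch.randn(1, 32, 25, 25)) -> shape (1, 32, 25, 25)

In a SinGAN-like multi-scale pipeline, such a layer would typically be inserted into the generator at the coarser scales, where modeling long-range structure matters most; where exactly the authors place it is described in the full paper.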

References

  • S. J. Pan and Q. Yang, A survey on transfer learning. IEEE Transactions on knowledge and data engineering, 22(10), 1345-1359, 2010. https://doi.org/10.1109/TKDE.2009.191
  • C. Finn, P. Abbeel and S. Levine, Model-agnostic meta-learning for fast adaptation of deep networks. International conference on machine learning (PMLR), pp. 1126-1135, Sydney, Australia, 6-11 August 2017.
  • C. Shorten and T. M. Khoshgoftaar, A survey on image data augmentation for deep learning. Journal of big data, 6(1), 1-48, 2019. https://doi.org/10.1186/s40537-019-0197-0
  • S. R. Bowman, L. Vilnis, O. Vinyals, A. M. Dai, R. Jozefowicz and S. Bengio, Generating sentences from a continuous space. 20th SIGNLL conference on computational natural language learning (CoNLL 2016), Association for Computational Linguistics (ACL), pp. 10-21, Berlin, Germany, 11-12 August 2016.
  • F. Sung, Y. Yang, L. Zhang, T. Xiang, P. H. Torr and T. M. Hospedales, Learning to compare: relation network for few-shot learning. Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1199-1208, Salt Lake City, USA, 18-23 June 2018.
  • I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville and Y. Bengio, Generative adversarial networks. Communications of the ACM, 63(11), 139-144, 2020. https://doi.org/10.1145/3422622
  • T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford and X. Chen, Improved techniques for training GANs. Advances in neural information processing systems, Barcelona, Spain, 5-10 December 2016.
  • M. Arjovsky and L. Bottou, Towards principled methods for training generative adversarial networks. Advances in neural information processing systems, Barcelona, Spain, 5-10 December 2016.
  • Z. Zhang, M. Li and J. Yu, On the convergence and mode collapse of GAN. SIGGRAPH Asia 2018 technical briefs, pp. 1-4, Tokyo, Japan, 4-8 December 2018.
  • T. R. Shaham, T. Dekel and T. Michaeli, SinGAN: learning a generative model from a single natural image. Proceedings of the IEEE/CVF international conference on computer vision, pp. 4570-4580, Seoul, Korea (South), 27 October - 2 November 2019.
  • H. Zhang, I. Goodfellow, D. Metaxas and A. Odena, Self-attention generative adversarial networks. 36th international conference on machine learning (PMLR), pp. 7354-7363, Long Beach, California, USA, 9-15 June 2019.
  • E. Zakharov, A. Shysheya, E. Burkov and V. Lempitsky, Few-shot adversarial learning of realistic neural talking head models. Proceedings of the IEEE/CVF international conference on computer vision, pp. 9459-9468, Seoul, Korea (South), 27 October - 2 November 2019.
  • M. Lučić, M. Tschannen, M. Ritter, X. Zhai, O. Bachem and S. Gelly, High-fidelity image generation with fewer labels. 36th international conference on machine learning (PMLR), pp. 4183-4192, California, USA, 9-15 June 2019.
  • A. Noguchi and T. Harada, Image generation from small datasets via batch statistics adaptation. Proceedings of the IEEE/CVF international conference on computer vision, pp. 2750-2758, Seoul, Korea (South), 27 October - 2 November 2019.
  • A. Shocher, S. Bagon, P. Isola and M. Irani, InGAN: capturing and retargeting the ‘DNA’ of a natural image. Proceedings of the IEEE/CVF international conference on computer vision, pp. 4492-4501, Seoul, Korea (South), 27 October - 2 November 2019.
  • P. Isola, J. Y. Zhu, T. Zhou and A. A. Efros, Image-to-image translation with conditional adversarial networks. Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1125-1134, Honolulu, HI, USA, 21-26 July 2017.
  • T. Hinz, M. Fisher, O. Wang and S. Wermter, Improved techniques for training single-image GANs. Proceedings of the IEEE/CVF winter conference on applications of computer vision (WACV), pp. 1300-1309, 5-9 January 2021.
  • D. Bahdanau, K. Cho and Y. Bengio, Neural machine translation by jointly learning to align and translate. Proceedings of the 3rd international conference on learning representations, San Diego, CA, USA, 7-9 May 2015.
  • A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser and I. Polosukhin, Attention is all you need. Advances in neural information processing systems 30, pp. 6000-6010, Long Beach, CA, USA, 4-9 December 2017.
  • H. Zhao, J. Jia and V. Koltun, Exploring self-attention for image recognition. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 10076-10085, Seattle, WA, USA, 13-19 June 2020.
  • I. Bello, B. Zoph, A. Vaswani, J. Shlens and Q. V. Le, Attention augmented convolutional networks. Proceedings of the IEEE/CVF international conference on computer vision, pp. 3286-3295, Seoul, Korea (South), 27 October - 2 November 2019.
  • J. Fu, J. Liu, H. Tian, Y. Li, Y. Bao, Z. Fang and H. Lu, Dual attention network for scene segmentation. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 3146-3154, Long Beach, CA, USA, 15-20 June 2019.
  • I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin and A. C. Courville, Improved training of Wasserstein GANs. Advances in neural information processing systems 30, pp. 5769-5779, Long Beach, CA, USA, 4-9 December 2017.
  • D. P. Kingma and J. Ba, Adam: a method for stochastic optimization. Proceedings of the 3rd international conference on learning representations, San Diego, CA, USA, 7-9 May 2015.
  • B. Zhou, A. Lapedriza, J. Xiao, A. Torralba and A. Oliva, Learning deep features for scene recognition using places database. Advances in neural information processing systems, pp. 487-495, Montreal, Quebec, Canada, 8-13 December 2014.
  • Düden waterfall. https://www.kulturportali.gov.tr/contents/images/Yukar%c4%b1%20D%c3%bcden_Servet%20Uygun%20logolu.jpg, Accessed 10 September 2023.
  • SinGAN GitHub repository. https://github.com/tamarott/SinGAN/tree/master/Input/Images, Accessed 1 September 2023.
  • ConSinGAN GitHub repository. https://github.com/tohinz/ConSinGAN/tree/master/Images/Harmonization, Accessed 1 September 2023.
  • M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler and S. Hochreiter, GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in neural information processing systems 30, pp. 6629-6640, Long Beach, CA, USA, 4-9 December 2017.
  • Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli, Image quality assessment: from error visibility to structural similarity. IEEE Transactions on image processing, 13(4), 600-612, 2004. https://doi.org/10.1109/TIP.2003.819861


Details

Primary Language English
Subjects Artificial Intelligence (Other)
Section Research Articles
Authors

Eyyüp Yıldız 0000-0002-7051-3368

Erkan Yüksel 0000-0001-8976-9964

Selçuk Sevgen 0000-0003-1443-1779

Early View Date 12 January 2024
Publication Date 15 January 2024
Submission Date 27 September 2023
Acceptance Date 15 November 2023
Published in Issue Year 2024, Volume: 13, Issue: 1

Cite

APA Yıldız, E., Yüksel, E., & Sevgen, S. (2024). An unconditional generative model with self-attention module for single image generation. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi, 13(1), 196-204. https://doi.org/10.28948/ngumuh.1367602
AMA Yıldız E, Yüksel E, Sevgen S. An unconditional generative model with self-attention module for single image generation. NÖHÜ Müh. Bilim. Derg. January 2024;13(1):196-204. doi:10.28948/ngumuh.1367602
Chicago Yıldız, Eyyüp, Erkan Yüksel, and Selçuk Sevgen. “An Unconditional Generative Model With Self-Attention Module for Single Image Generation”. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi 13, no. 1 (January 2024): 196-204. https://doi.org/10.28948/ngumuh.1367602.
EndNote Yıldız E, Yüksel E, Sevgen S (01 January 2024) An unconditional generative model with self-attention module for single image generation. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi 13 1 196-204.
IEEE E. Yıldız, E. Yüksel, and S. Sevgen, “An unconditional generative model with self-attention module for single image generation”, NÖHÜ Müh. Bilim. Derg., vol. 13, no. 1, pp. 196-204, 2024, doi: 10.28948/ngumuh.1367602.
ISNAD Yıldız, Eyyüp et al. “An Unconditional Generative Model With Self-Attention Module for Single Image Generation”. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi 13/1 (January 2024), 196-204. https://doi.org/10.28948/ngumuh.1367602.
JAMA Yıldız E, Yüksel E, Sevgen S. An unconditional generative model with self-attention module for single image generation. NÖHÜ Müh. Bilim. Derg. 2024;13:196-204.
MLA Yıldız, Eyyüp et al. “An Unconditional Generative Model With Self-Attention Module for Single Image Generation”. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi, vol. 13, no. 1, 2024, pp. 196-204, doi:10.28948/ngumuh.1367602.
Vancouver Yıldız E, Yüksel E, Sevgen S. An unconditional generative model with self-attention module for single image generation. NÖHÜ Müh. Bilim. Derg. 2024;13(1):196-204.
