Research Article

Hierarchical Encoding for Image Inpainting with StyleGAN Inversion

Year 2024, Volume: 12 Issue: 4, 1091–1101, 31.12.2024
https://doi.org/10.29109/gujsc.1563933

Abstract

Image inpainting, the process of removing unwanted pixels and seamlessly replacing them with new ones, poses significant challenges: algorithms must understand image context and generate realistic replacements. With applications ranging from content generation to image editing, image inpainting has garnered considerable interest. Traditional approaches train deep neural network models from scratch, using binary masks to identify the regions to be inpainted. Recent advances have shown the feasibility of leveraging well-trained image generation models, such as StyleGANs, for inpainting tasks. However, effectively embedding images into StyleGAN's latent space and supporting diverse inpainting remain key obstacles. In this work, we propose a hierarchical encoder tailored to encode visible and missing features seamlessly. Additionally, we introduce a single-stage architecture that encodes both the low-rate and high-rate latent features used by StyleGAN. While low-rate latent features offer a comprehensive understanding of an image, high-rate latent features excel at transmitting intricate details to the generator. Through extensive experiments, we demonstrate significant improvements over state-of-the-art inpainting models, highlighting the efficacy of our approach.
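To make the encoder description above concrete, below is a minimal PyTorch sketch of a hierarchical encoder that emits both a low-rate W+ style code (one 512-d vector per generator layer) and a high-rate spatial feature map from a masked input. All module names, channel widths, the number of style vectors, and the masking convention are illustrative assumptions modeled on common StyleGAN-inversion encoders such as pSp [34]; this is not the paper's actual implementation.

```python
# Illustrative sketch only: module names, sizes, and the masking convention
# are assumptions, not the architecture proposed in the paper.
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    """Maps a masked RGB image to (a) low-rate W+ style vectors that summarize
    global content and (b) a high-rate spatial feature map that carries fine
    detail from the visible pixels toward the generator."""

    def __init__(self, n_styles: int = 14, style_dim: int = 512):
        super().__init__()
        # Shared convolutional trunk over image + binary mask (4 input channels).
        self.trunk = nn.Sequential(
            nn.Conv2d(4, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        # Low-rate branch: global pooling -> one style code per generator layer.
        self.to_styles = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(256, n_styles * style_dim),
        )
        self.n_styles, self.style_dim = n_styles, style_dim
        # High-rate branch: keeps spatial resolution so visible detail survives.
        self.to_features = nn.Conv2d(256, 512, 3, padding=1)

    def forward(self, image: torch.Tensor, mask: torch.Tensor):
        # Convention assumed here: mask is 1 on visible pixels, 0 on the hole.
        x = self.trunk(torch.cat([image * mask, mask], dim=1))
        styles = self.to_styles(x).view(-1, self.n_styles, self.style_dim)
        features = self.to_features(x)
        return styles, features

# Shape check with a 256x256 input.
enc = HierarchicalEncoder()
img = torch.randn(1, 3, 256, 256)
msk = torch.ones(1, 1, 256, 256)
w_plus, feats = enc(img, msk)
print(w_plus.shape, feats.shape)  # (1, 14, 512) and (1, 512, 32, 32)
```

In such pipelines, both outputs would be consumed by a pretrained, frozen StyleGAN generator: the low-rate style codes modulate every synthesis layer, while the high-rate spatial features are injected at the matching resolution to preserve detail from the visible region.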

References

  • [1] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros, “Context encoders: Feature learning by inpainting,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2536–2544, 2016.
  • [2] G. Liu, F. A. Reda, K. J. Shih, T.-C. Wang, A. Tao, and B. Catanzaro, “Image inpainting for irregular holes using partial convolutions,” in Proceedings of the European conference on computer vision (ECCV), pp. 85–100, 2018.
  • [3] J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. S. Huang, “Free-form image inpainting with gated convolution,” in Proceedings of the IEEE/CVF international conference on computer vision, pp. 4471–4480, 2019.
  • [4] J. Li, N. Wang, L. Zhang, B. Du, and D. Tao, “Recurrent feature reasoning for image inpainting,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7760–7768, 2020.
  • [5] G. Liu, A. Dundar, K. J. Shih, T.-C. Wang, F. A. Reda, K. Sapra, Z. Yu, X. Yang, A. Tao, and B. Catanzaro, “Partial convolution for padding, inpainting, and image synthesis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 5, pp. 6096–6110, 2022, https://doi.org/10.1109/TPAMI.2022.3209702.
  • [6] A. Lugmayr, M. Danelljan, A. Romero, F. Yu, R. Timofte, and L. Van Gool, “Repaint: Inpainting using denoising diffusion probabilistic models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11461–11471, 2022.
  • [7] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10684–10695, 2022.
  • [8] A. B. Yildirim, V. Baday, E. Erdem, A. Erdem, and A. Dundar, “Inst-inpaint: Instructing to remove objects with diffusion models,” arXiv preprint arXiv:2304.03246, 2023.
  • [9] A. B. Yildirim, H. Pehlivan, B. B. Bilecen, and A. Dundar, “Diverse inpainting and editing with gan inversion,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 23120–23130, 2023.
  • [10] H. Sivuk and A. Dundar, “Diverse semantic image editing with style codes,” arXiv preprint arXiv:2309.13975, 2023.
  • [11] T. Karras, S. Laine, and T. Aila, “A style-based generator architecture for generative adversarial networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4401–4410, 2019.
  • [12] H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena, “Self-attention generative adversarial networks,” in International conference on machine learning, pp. 7354–7363, PMLR, 2019.
  • [13] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila, “Analyzing and improving the image quality of stylegan,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8110–8119, 2020.
  • [14] N. Yu, G. Liu, A. Dundar, A. Tao, B. Catanzaro, L. S. Davis, and M. Fritz, “Dual contrastive loss and attention for gans,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6731–6742, 2021.
  • [15] C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. L. Denton, K. Ghasemipour, R. Gontijo Lopes, B. Karagol Ayan, T. Salimans, et al., “Photorealistic text-to-image diffusion models with deep language understanding,” Advances in Neural Information Processing Systems, vol. 35, pp. 36479–36494, 2022.
  • [16] O. Tov, Y. Alaluf, Y. Nitzan, O. Patashnik, and D. Cohen-Or, “Designing an encoder for stylegan image manipulation,” ACM Transactions on Graphics (TOG), vol. 40, no. 4, pp. 1–14, 2021, https://doi.org/10.1145/3450626.3459838.
  • [17] T. Wang, Y. Zhang, Y. Fan, J. Wang, and Q. Chen, “High-fidelity gan inversion for image attribute editing,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11379–11388, 2022.
  • [18] Y. Shen, J. Gu, X. Tang, and B. Zhou, “Interpreting the latent space of gans for semantic face editing,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9243–9252, 2020.
  • [19] E. Härkönen, A. Hertzmann, J. Lehtinen, and S. Paris, “Ganspace: Discovering interpretable gan controls,” Advances in Neural Information Processing Systems, vol. 33, 2020.
  • [20] O. Patashnik, Z. Wu, E. Shechtman, D. Cohen-Or, and D. Lischinski, “Styleclip: Text-driven manipulation of stylegan imagery,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2085–2094, 2021.
  • [21] Z. Chen, R. Jiang, B. Duke, H. Zhao, and P. Aarabi, “Exploring gradient-based multi-directional controls in gans,” in European Conference on Computer Vision, pp. 104–119, Springer, 2022.
  • [22] H. Pehlivan, Y. Dalva, and A. Dundar, “Styleres: Transforming the residuals for real image editing with stylegan,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1828–1837, 2023.
  • [23] A. B. Yildirim, H. Pehlivan, and A. Dundar, “Warping the residuals for image editing with stylegan,” arXiv preprint arXiv:2312.11422, 2023.
  • [24] Y. Yu, L. Zhang, H. Fan, and T. Luo, “High-fidelity image inpainting with gan inversion,” in Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XVI, pp. 242–258, Springer, 2022.
  • [25] W. Wang, L. Niu, J. Zhang, X. Yang, and L. Zhang, “Dual-path image inpainting with auxiliary gan inversion,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11421–11430, 2022.
  • [26] Y. Jo and J. Park, “Sc-fegan: Face editing generative adversarial network with user’s sketch and color,” in Proceedings of the IEEE/CVF international conference on computer vision, pp. 1745–1753, 2019.
  • [27] W. Luo, S. Yang, H. Wang, B. Long, and W. Zhang, “Context-consistent semantic image editing with style-preserved modulation,” in European Conference on Computer Vision, pp. 561–578, Springer, 2022.
  • [28] Y. Wang, X. Tao, X. Shen, and J. Jia, “Wide-context semantic image extrapolation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1399–1408, 2019.
  • [29] Y.-C. Cheng, C. H. Lin, H.-Y. Lee, J. Ren, S. Tulyakov, and M.-H. Yang, “Inout: Diverse image outpainting via gan inversion,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11431–11440, 2022.
  • [30] H. Liu, B. Jiang, Y. Xiao, and C. Yang, “Coherent semantic attention for image inpainting,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4170–4179, 2019.
  • [31] J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. S. Huang, “Generative image inpainting with contextual attention,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5505–5514, 2018.
  • [32] C. Xie, S. Liu, C. Li, M.-M. Cheng, W. Zuo, X. Liu, S. Wen, and E. Ding, “Image inpainting with learnable bidirectional attention maps,” in Proceedings of the IEEE/CVF international conference on computer vision, pp. 8858–8867, 2019.
  • [33] S. Zhao, J. Cui, Y. Sheng, Y. Dong, X. Liang, E. I. Chang, and Y. Xu, “Large scale image completion via co-modulated generative adversarial networks,” in International Conference on Learning Representations, 2021.
  • [34] E. Richardson, Y. Alaluf, O. Patashnik, Y. Nitzan, Y. Azar, S. Shapiro, and D. Cohen-Or, “Encoding in style: a stylegan encoder for image-to-image translation,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 2287–2296, 2021.
  • [35] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “Gans trained by a two time-scale update rule converge to a local nash equilibrium,” Advances in neural information processing systems, vol. 30, 2017.
  • [36] Y. Alaluf, O. Tov, R. Mokady, R. Gal, and A. Bermano, “Hyperstyle: Stylegan inversion with hypernetworks for real image editing,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 18511–18521, 2022.
  • [37] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 586–595, 2018.
There are 37 references in total.

Details

Primary Language English
Subjects Electrical Engineering (Other)
Section Design and Technology
Authors

Ayşegül Dündar 0000-0003-2014-6325

Early View Date December 26, 2024
Publication Date December 31, 2024
Submission Date October 11, 2024
Acceptance Date December 12, 2024
Published in Issue Year 2024 Volume: 12 Issue: 4

How to Cite

APA Dündar, A. (2024). Hierarchical Encoding for Image Inpainting with StyleGAN Inversion. Gazi Üniversitesi Fen Bilimleri Dergisi Part C: Tasarım Ve Teknoloji, 12(4), 1091-1101. https://doi.org/10.29109/gujsc.1563933



e-ISSN: 2147-9526