Attentive Sequential Auto-Encoding Towards Unsupervised Object-centric Scene Modeling

Yarkın Deniz Çetin; Ramazan Gökberk Cinbiş

doi:10.29109/gujsc.1139701

Research Article

Year 2022, Volume 10, Issue 4, 1127-1142, 30.12.2022

Abstract

This paper describes an unsupervised sequential auto-encoding model for multi-object scenes. The model uses an attention-based formulation trained with reconstruction-driven losses. At its core, it iteratively writes regions onto a canvas in a differentiable manner. To steer attention toward objects and/or parts, the model combines a convolutional localization network, a region-level bottleneck auto-encoder, and a loss term that encourages reconstruction within a limited number of iterations. An extended version adds a background modeling component aimed at scenes with complex backgrounds. The model is evaluated on two datasets: a synthetic dataset constructed by composing MNIST digit instances, and the MS-COCO dataset. It achieves high reconstruction quality on the MNIST-based scenes, and the extended model shows promising results on the complex and challenging MS-COCO scenes.
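
The idea of iteratively writing regions onto a canvas can be illustrated with a small sketch. This is not the authors' implementation: the localization network and the region auto-encoder are replaced here by hand-specified patches and placements, and every name (bilinear_sample, write_region, reconstruct, squared_error) is hypothetical. The sketch only shows why a write based on bilinear interpolation varies smoothly with the region's position and scale, which is what makes gradient-based training of the placement possible.

```python
import math


def bilinear_sample(patch, u, v):
    """Sample patch at continuous column u and row v; zero outside the patch."""
    p_h, p_w = len(patch), len(patch[0])
    u0, v0 = math.floor(u), math.floor(v)
    value = 0.0
    for dv in (0, 1):
        for du in (0, 1):
            r, c = v0 + dv, u0 + du
            if 0 <= r < p_h and 0 <= c < p_w:
                # Interpolation weights are continuous in (u, v).
                weight = (1 - abs(u - c)) * (1 - abs(v - r))
                value += weight * patch[r][c]
    return value


def write_region(canvas, patch, cx, cy, scale):
    """Additively write patch onto a copy of canvas, centred at (cx, cy)."""
    p_h, p_w = len(patch), len(patch[0])
    out = [row[:] for row in canvas]
    for i in range(len(canvas)):
        for j in range(len(canvas[0])):
            # Map each canvas pixel back into patch coordinates.
            u = (j - cx) / scale + (p_w - 1) / 2
            v = (i - cy) / scale + (p_h - 1) / 2
            out[i][j] += bilinear_sample(patch, u, v)
    return out


def reconstruct(steps, height, width):
    """Apply a sequence of (patch, cx, cy, scale) writes to an empty canvas."""
    canvas = [[0.0] * width for _ in range(height)]
    for patch, cx, cy, scale in steps:
        canvas = write_region(canvas, patch, cx, cy, scale)
    return canvas


def squared_error(image, canvas):
    """Sum of squared pixel differences, a simple reconstruction loss."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(image, canvas)
               for a, b in zip(row_a, row_b))


# Toy scene with two "objects"; in a trained model the patches and
# placements would come from learned networks rather than constants.
patch = [[1.0, 1.0], [1.0, 1.0]]
steps = [(patch, 1.5, 1.5, 1.0), (patch, 5.5, 5.5, 1.0)]
target = reconstruct(steps, 8, 8)
for t in range(len(steps) + 1):
    # Reconstruction error after t writing iterations.
    print(t, squared_error(target, reconstruct(steps[:t], 8, 8)))
```

In this toy setting the error drops with every write, which is the behaviour a loss favouring reconstruction within few iterations would reward. Placing the patch at a fractional position spreads its intensity over neighbouring pixels instead of jumping discretely.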

Keywords

Unsupervised learning, complex scene modeling, object discovery

Supporting Institution

TUBITAK

Project Number

116E445

Details

Primary Language	English
Subjects	Engineering
Section	Design and Technology
Authors	Yarkın Deniz Çetin (ORCID 0000-0003-1358-4247), Ramazan Gökberk Cinbiş (ORCID 0000-0003-0962-7101)
Project Number	116E445
Publication Date	December 30, 2022
Submission Date	July 2, 2022
Published in Issue	Year 2022

Cite

APA	Çetin, Y. D., & Cinbiş, R. G. (2022). Attentive Sequential Auto-Encoding Towards Unsupervised Object-centric Scene Modeling. Gazi Üniversitesi Fen Bilimleri Dergisi Part C: Tasarım Ve Teknoloji, 10(4), 1127-1142. https://doi.org/10.29109/gujsc.1139701

e-ISSN:2147-9526