Attentive Sequential Auto-Encoding Towards Unsupervised Object-centric Scene Modeling

Yarkın Deniz Çetin; Ramazan Gökberk Cinbiş

doi:10.29109/gujsc.1139701

Research Article

Year 2022, Volume: 10 Issue: 4, 1127 - 1142, 30.12.2022

Abstract

This paper describes an unsupervised sequential auto-encoding model for multi-object scenes. The model uses an attention-based formulation trained with reconstruction-driven losses, and it reconstructs a scene by iteratively writing regions onto a canvas in a differentiable manner. To steer attention toward objects, object parts, or both, the model combines a convolutional localization network, a region-level bottleneck auto-encoder, and a loss term that encourages reconstruction within a limited number of iterations. An extended version of the model adds a background modeling component aimed at scenes with complex backgrounds. The model is evaluated on two datasets: a synthetic dataset constructed by composing MNIST digit instances, and the MS-COCO dataset. The model achieves high reconstruction ability on the MNIST-based scenes, and the extended model shows promising results on the complex and challenging MS-COCO scenes.
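The iterative canvas-writing idea can be illustrated with a small toy sketch. This is not the authors' model: nothing here is learned or differentiable. The function names (`window_energy`, `reconstruct`), the fixed window size, and the greedy "largest residual" rule are all assumptions made for illustration. The greedy rule stands in for the localization network, and copying the residual patch stands in for the region-level auto-encoder. The iteration budget `max_steps` mirrors the idea of reconstructing within a limited number of iterations.

```python
# Toy illustration only: a greedy window search and a copy-based "decoder"
# stand in for the learned localization network and region auto-encoder.

def window_energy(residual, top, left, size):
    """Sum of squared residual values inside a size x size window."""
    return sum(
        residual[r][c] ** 2
        for r in range(top, top + size)
        for c in range(left, left + size)
    )

def reconstruct(image, size=2, max_steps=4, tol=1e-9):
    """Greedily write size x size regions onto an initially empty canvas."""
    h, w = len(image), len(image[0])
    canvas = [[0.0] * w for _ in range(h)]
    steps = []
    for _ in range(max_steps):
        residual = [[image[r][c] - canvas[r][c] for c in range(w)] for r in range(h)]
        # Stand-in for attention: pick the window with the largest residual energy.
        energy, top, left = max(
            (window_energy(residual, t, lft, size), t, lft)
            for t in range(h - size + 1)
            for lft in range(w - size + 1)
        )
        if energy <= tol:
            break  # nothing left to explain; stop before the budget is used up
        # Stand-in for the region auto-encoder: add the residual patch to the canvas.
        for r in range(top, top + size):
            for c in range(left, left + size):
                canvas[r][c] += residual[r][c]
        steps.append((top, left))
    return canvas, steps

# A 4x6 scene with two 2x2 "objects" of different brightness.
scene = [
    [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 2.0, 2.0],
    [0.0, 0.0, 0.0, 0.0, 2.0, 2.0],
]
canvas, steps = reconstruct(scene)
```

In this toy scene each write covers exactly one object, so the loop stops after two writes. In a learned model, the pressure toward few iterations would come from a loss term rather than from a hard-coded stopping rule.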

Keywords

Unsupervised learning, complex scene modeling, object discovery

Supporting Institution

TUBITAK

Project Number

116E445

References

Goodfellow I. J., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., Courville A., & Bengio Y. (2014). Generative Adversarial Networks. Advances in Neural Information Processing Systems.
Arjovsky M., Chintala S., & Bottou L. (2017). Wasserstein GAN. ArXiv:1701.07875 [Cs, Stat].
Karras T., Laine S., & Aila T. (2019). A Style-Based Generator Architecture for Generative Adversarial Networks. Proc. CVPR.
Kingma D. P., & Welling M. (2014). Auto-Encoding Variational Bayes. International Conference on Learning Representations.
Rezende D. J., Mohamed S., & Wierstra D. (2014). Stochastic Backpropagation and Approximate Inference in Deep Generative Models. ArXiv:1401.4082.
Li Y., Swersky K., & Zemel R. (2015). Generative Moment Matching Networks. PMLR.
Dinh L., Sohl-Dickstein J., & Bengio S. (2016). Density estimation using Real NVP.
Kobyzev I., Prince S. J., & Brubaker M. A. (2020). Normalizing flows: An introduction and review of current methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(11), 3964–3979.
Kingma D. P., & Dhariwal P. (2018). Glow: Generative Flow with Invertible 1x1 Convolutions. In Advances in Neural Information Processing Systems 31 (pp. 10215–10224).
Behrmann J., Grathwohl W., Chen R. T. Q., Duvenaud D., & Jacobsen J.-H. (2019). Invertible Residual Networks. ArXiv:1811.00995 [Cs, Stat].
Köhler J., Klein L., & Noé F. (2020). Equivariant Flows: exact likelihood generative learning for symmetric densities. ArXiv:2006.02425 [Physics, Stat].
San-Roman R., Nachmani E., & Wolf L. (2021). Noise estimation for generative diffusion models. ArXiv Preprint ArXiv:2104.02600.
Huang C.-W., Lim J. H., & Courville A. C. (2021). A variational perspective on diffusion-based generative models and score matching. Advances in Neural Information Processing Systems, 34.
Liu K., Tang W., Zhou F., & Qiu G. (2019, October). Spectral Regularization for Combating Mode Collapse in GANs. ICCV.
Brock A., Donahue J., & Simonyan K. (2018). Large Scale GAN Training for High Fidelity Natural Image Synthesis. ArXiv:1809.11096 [Cs, Stat].
Karras T., Laine S., Aittala M., Hellsten J., Lehtinen J., & Aila T. (2020). Analyzing and Improving the Image Quality of StyleGAN. Proc. CVPR.
Karnewar A., & Wang O. (2020). MSG-GAN: Multi-Scale Gradients for Generative Adversarial Networks. CVPR.
Karras T., Aittala M., Laine S., Härkönen E., Hellsten J., Lehtinen J., & Aila T. (2021). Alias-free generative adversarial networks. Advances in Neural Information Processing Systems, 34.
Casanova A., Careil M., Verbeek J., Drozdzal M., & Romero-Soriano A. (2021, November). Instance-Conditioned GAN. NeurIPS.
Zhang Y., Ling H., Gao J., Yin K., Lafleche J.-F., Barriuso A., Torralba A., & Fidler S. (2021). DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort. ArXiv:2104.06490 [Cs].
Gregor K., Danihelka I., Graves A., Rezende D. J., & Wierstra D. (2015). DRAW: A Recurrent Neural Network For Image Generation. ArXiv:1502.04623 [Cs].
Lin T.-Y., Maire M., Belongie S., Hays J., Perona P., Ramanan D., Dollár P., & Zitnick C. L. (2014). Microsoft COCO: Common Objects in Context. 740–755.
van der Maaten L., & Hinton G. (2008). Visualizing Data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.
Felzenszwalb P. F., & Huttenlocher D. P. (2004). Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2), 167–181.
Cour T., Benezit F., & Shi J. (2005). Spectral segmentation with multiscale graph decomposition. IEEE Conference on Computer Vision and Pattern Recognition, 2, 1124–1131.
Comaniciu D., & Meer P. (2002). Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5), 603–619.
Arbelaez P., Maire M., Fowlkes C., & Malik J. (2009). From contours to regions: An empirical evaluation. IEEE Conference on Computer Vision and Pattern Recognition, 2294–2301.
Pont-Tuset J., Arbelaez P., Barron J. T., Marques F., & Malik J. (2016). Multiscale combinatorial grouping for image segmentation and object proposal generation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(1), 128–140.
Xia X., & Kulis B. (2017). W-net: A deep model for fully unsupervised image segmentation. ArXiv Preprint ArXiv:1711.08506.
Karras T., Aila T., Laine S., & Lehtinen J. (2017). Progressive growing of GANs for improved quality, stability, and variation. Proc. Int. Conf. Learn. Represent.

There are 30 citations in total.

Details

Primary Language	English
Subjects	Engineering
Journal Section	Research Article
Authors	Yarkın Deniz Çetin 0000-0003-1358-4247 Ramazan Gökberk Cinbiş 0000-0003-0962-7101
Project Number	116E445
Publication Date	December 30, 2022
Submission Date	July 2, 2022
Published in Issue	Year 2022 Volume: 10 Issue: 4

Cite

APA	Çetin, Y. D., & Cinbiş, R. G. (2022). Attentive Sequential Auto-Encoding Towards Unsupervised Object-centric Scene Modeling. Gazi Üniversitesi Fen Bilimleri Dergisi Part C: Tasarım Ve Teknoloji, 10(4), 1127-1142. https://doi.org/10.29109/gujsc.1139701

e-ISSN:2147-9526