TY  - JOUR
T1  - Attentive Sequential Auto-Encoding Towards Unsupervised Object-centric Scene Modeling
AU  - Cinbiş, Ramazan Gökberk
AU  - Çetin, Yarkın Deniz
PY  - 2022
DA  - December
DO  - 10.29109/gujsc.1139701
JF  - Gazi Üniversitesi Fen Bilimleri Dergisi Part C: Tasarım ve Teknoloji
JO  - GUJS Part C
PB  - Gazi Üniversitesi
WT  - DergiPark
SN  - 2147-9526
SP  - 1127
EP  - 1142
VL  - 10
IS  - 4
LA  - en
AB  - This paper describes an unsupervised sequential auto-encoding model targeting multi-object scenes. The proposed model uses an attention-based formulation, with reconstruction-driven losses. The main model relies on iteratively writing regions onto a canvas, in a differentiable manner. To enforce attention to objects and/or parts, the model uses a convolutional localization network, a region level bottleneck auto-encoder and a loss term that encourages reconstruction within a limited number of iterations. An extended version of the model incorporates a background modeling component that aims at handling scenes with complex backgrounds. The model is evaluated on two separate datasets: a synthetic dataset that is constructed by composing MNIST digit instances together, and the MS-COCO dataset. The model achieves high reconstruction ability on MNIST based scenes. The extended model shows promising results on the complex and challenging MS-COCO scenes.
KW  - Unsupervised learning
KW  - complex scene modeling
KW  - object discovery
CR  - Goodfellow I. J., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., Courville A., & Bengio Y. (2014). Generative Adversarial Networks. Advances in Neural Information Processing Systems.
CR  - Arjovsky M., Chintala S., & Bottou L. (2017). Wasserstein GAN. ArXiv:1701.07875 [Cs, Stat].
CR  - Karras T., Laine S., & Aila T. (2019). A Style-Based Generator Architecture for Generative Adversarial Networks. Proc. CVPR.
CR  - Kingma Diederik P., & Welling, M. (2014). Auto-Encoding Variational Bayes. International Conference on Learning Representations.
CR  - Rezende D. J., Mohamed S., & Wierstra D. (2014). Stochastic Backpropagation and Approximate Inference in Deep Generative Models. ArXiv:1401.4082.
CR  - Li Y., Swersky K., & Zemel R. (2015). Generative Moment Matching Networks. PMLR.
CR  - Dinh L., Sohl-Dickstein J., & Bengio S. (2016). Density estimation using Real NVP.
CR  - Kobyzev I., Prince S. J., & Brubaker M. A. (2020). Normalizing flows: An introduction and review of current methods. IEEE transactions on pattern analysis and machine intelligence, 43(11), 3964-3979.
CR  - Kingma Durk P. & Dhariwal P. (2018). Glow: Generative Flow with Invertible 1x1 Convolutions. In Advances in Neural Information Processing Systems 31 (pp. 10215–10224).
CR  - Behrmann J., Grathwohl W., Chen R. T. Q., Duvenaud, D., & Jacobsen, J.-H. (2019). Invertible Residual Networks. ArXiv:1811.00995 [Cs, Stat].
CR  - Köhler J., Klein L. & Noé F. (2020). Equivariant Flows: exact likelihood generative learning for symmetric densities. ArXiv:2006.02425 [Physics, Stat].
CR  - San-Roman R., Nachmani E., & Wolf L. (2021). Noise estimation for generative diffusion models. ArXiv Preprint ArXiv:2104.02600.
CR  - Huang C.-W., Lim J. H., & Courville A. C. (2021). A variational perspective on diffusion-based generative models and score matching. Advances in Neural Information Processing Systems, 34.
CR  - Liu K., Tang W., Zhou F., & Qiu G. (2019, October). Spectral Regularization for Combating Mode Collapse in GANs. ICCV.
CR  - Brock A., Donahue J., & Simonyan K. (2018). Large Scale GAN Training for High Fidelity Natural Image Synthesis. ArXiv:1809.11096 [Cs, Stat].
CR  - Karras T., Laine S., Aittala M., Hellsten J., Lehtinen J., & Aila T. (2020). Analyzing and Improving the Image Quality of StyleGAN. Proc. CVPR.
CR  - Karnewar A., & Wang O. (2020). MSG-GAN: Multi-Scale Gradients for Generative Adversarial Networks. CVPR.
CR  - Karras T., Aittala M., Laine S., Härkönen E., Hellsten J., Lehtinen J., & Aila T. (2021). Alias-free generative adversarial networks. Advances in Neural Information Processing Systems, 34.
CR  - Casanova A., Careil M., Verbeek J., Drozdzal M., & Romero-Soriano, A. (2021, November). Instance-Conditioned GAN. NeurIPS.
CR  - Zhang Y., Ling H., Gao J., Yin K., Lafleche J.-F., Barriuso A., Torralba A., & Fidler S. (2021). DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort. ArXiv:2104.06490 [Cs].
CR  - Gregor K., Danihelka I., Graves A., Rezende D. J., & Wierstra D. (2015). DRAW: A Recurrent Neural Network For Image Generation. ArXiv:1502.04623 [Cs].
CR  - Lin T.-Y., Maire M., Belongie S., Hays J., Perona P., Ramanan D., Dollár P., & Zitnick C. L. (2014). Microsoft COCO: Common Objects in Context. 740–755.
CR  - van der Maaten L., & Hinton G. (2008). Visualizing Data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.
CR  - Felzenszwalb P. F., & Huttenlocher D. P. (2004). Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2), 167–181.
CR  - Cour T., Benezit F., & Shi J. (2005). Spectral segmentation with multiscale graph decomposition. IEEE Conference on Computer Vision and Pattern Recognition, 2, 1124–1131.
CR  - Comaniciu D., & Meer P. (2002). Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5), 603–619.
CR  - Arbelaez P., Maire M., Fowlkes C., & Malik J. (2009). From contours to regions: An empirical evaluation. IEEE Conference on Computer Vision and Pattern Recognition, 2294–2301.
CR  - Pont-Tuset J., Arbelaez P., Barron J. T., Marques F., & Malik J. (2016). Multiscale combinatorial grouping for image segmentation and object proposal generation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(1), 128–140.
CR  - Xia, X. & Kulis B. (2017). W-net: A deep model for fully unsupervised image segmentation. ArXiv Preprint ArXiv:1711.08506.
CR  - Karras T., Aila T., Laine S., & Lehtinen J. (2017). Progressive growing of GANs for improved quality, stability, and variation. Proc. Int. Conf. Learn. Represent.
UR  - https://doi.org/10.29109/gujsc.1139701
L1  - https://dergipark.org.tr/tr/download/article-file/2522001
ER  -