Review

A Review on Self-Supervised Learning Methods in Computer Vision

Year 2024, Volume 12, Issue 2, 1136 - 1165, 29.04.2024
https://doi.org/10.29130/dubited.1201292

Abstract

Although deep learning models have achieved great success in computer vision tasks such as image classification, object detection, and image segmentation over the last decade, training these models under the supervised learning paradigm requires a large amount of labeled data. Therefore, in recent years there has been growing interest in self-supervised learning methods, which can learn generalizable image representations from large-scale unlabeled data without the need for data manually labeled by humans. In this study, self-supervised learning methods used in computer vision tasks are comprehensively reviewed and a categorization of these methods is provided. Performance comparisons of the reviewed self-supervised learning methods on the image classification, object detection, and image segmentation target tasks are presented. Finally, problematic issues in current methods are discussed and potential research topics are suggested for future studies.
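
As a review, the paper contains no code of its own; the short sketch below is added here purely to illustrate the contrastive objective shared by several of the surveyed methods (e.g., SimCLR [25]). It is a minimal PyTorch implementation of the NT-Xent (InfoNCE-style) loss; the function name, default temperature, and toy inputs are illustrative assumptions, not details taken from the reviewed work.

    import torch
    import torch.nn.functional as F

    def nt_xent_loss(z1, z2, temperature=0.5):
        # z1, z2: (N, D) projection-head outputs for two augmented views
        # of the same N images (names and shapes are assumptions).
        n = z1.size(0)
        z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D) unit vectors
        sim = z @ z.t() / temperature                        # pairwise cosine similarities
        sim.fill_diagonal_(float("-inf"))                    # a view is not its own positive
        # The positive for row i is the other view of the same image: i+N or i-N.
        targets = torch.cat([torch.arange(n) + n, torch.arange(n)]).to(z1.device)
        return F.cross_entropy(sim, targets)

    # Toy usage with random embeddings standing in for encoder outputs:
    z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
    print(nt_xent_loss(z1, z2).item())

In practice z1 and z2 would come from an encoder and projection head applied to two random augmentations of the same batch; each embedding is pulled toward its positive pair and pushed away from the other 2N - 2 embeddings in the batch.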

References

  • [1] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the Inception Architecture for Computer Vision,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2818–2826.
  • [2] M. Tan and Q. V. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” in International Conference on Machine Learning (ICML), 2019, pp. 6105–6114.
  • [3] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
  • [4] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in International Conference on Learning Representations (ICLR), 2015, pp. 1–13.
  • [5] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
  • [6] C. Szegedy et al., “Going deeper with convolutions,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1–9.
  • [7] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 4700–4708.
  • [8] J. Redmon and A. Farhadi, “YOLOv3: An Incremental Improvement,” arXiv preprint arXiv:1804.02767, 2018.
  • [9] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 580–587.
  • [10] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” in Advances in Neural Information Processing Systems, 2015, pp. 91–99.
  • [11] R. Girshick, “Fast R-CNN,” in IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1440–1448.
  • [12] J. Long, E. Shelhamer, and T. Darrell, “Fully Convolutional Networks for Semantic Segmentation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 3431–3440.
  • [13] V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 12, pp. 2481–2495, 2017.
  • [14] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015, pp. 234–241.
  • [15] K. He, G. Gkioxari, P. Dollar, and R. Girshick, “Mask R-CNN,” in IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2961–2969.
  • [16] L. C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 4, pp. 834–848, 2018.
  • [17] C. Sun, A. Shrivastava, S. Singh, and A. Gupta, “Revisiting Unreasonable Effectiveness of Data in Deep Learning Era,” in IEEE International Conference on Computer Vision (ICCV), 2017, pp. 843–852.
  • [18] A. V. Joshi, “Amazon’s Machine Learning Toolkit: Sagemaker,” in Machine Learning and Artificial Intelligence, 2020, pp. 233–243.
  • [19] A. Chowdhury, J. Rosenthal, J. Waring, and R. Umeton, “Applying Self-Supervised Learning to Medicine: Review of the State of the Art and Medical Implementations,” Informatics, vol. 8, no. 3, p. 59, 2021.
  • [20] J. Deng, W. Dong, R. Socher, L.-J. Li, Kai Li, and Li Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 248–255.
  • [21] M. Oquab, L. Bottou, I. Laptev, and J. Sivic, “Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 1717–1724.
  • [22] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, “How transferable are features in deep neural networks?,” in Advances in Neural Information Processing Systems, 2014, pp. 3320–3328.
  • [23] S. Shurrab and R. Duwairi, “Self-supervised learning methods and applications in medical imaging analysis: a survey,” PeerJ Comput. Sci., vol. 8, p. e1045, 2022.
  • [24] A. Tendle and M. R. Hasan, “A study of the generalizability of self-supervised representations,” Mach. Learn. with Appl., vol. 6, p. 100124, 2021.
  • [25] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, “A Simple Framework for Contrastive Learning of Visual Representations,” in International Conference on Machine Learning (ICML), 2020, pp. 1597–1607.
  • [26] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, “Momentum Contrast for Unsupervised Visual Representation Learning,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 9729–9738.
  • [27] J.-B. Grill et al., “Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning,” in Advances in Neural Information Processing Systems, 2020, pp. 21271–21284.
  • [28] X. Chen and K. He, “Exploring Simple Siamese Representation Learning,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 15750–15758.
  • [29] J. Zbontar, L. Jing, I. Misra, Y. LeCun, and S. Deny, “Barlow Twins: Self-Supervised Learning via Redundancy Reduction,” in International Conference on Machine Learning (ICML), 2021, pp. 12310–12320.
  • [30] M. Caron, I. Misra, J. Mairal, P. Goyal, P. Bojanowski, and A. Joulin, “Unsupervised Learning of Visual Features by Contrasting Cluster Assignments,” in Advances in Neural Information Processing Systems, 2020, pp. 9912–9924.
  • [31] X. Liu et al., “Self-supervised Learning: Generative or Contrastive,” IEEE Trans. Knowl. Data Eng., vol. 35, no. 1, pp. 857-876, 2021.
  • [32] K. Ohri and M. Kumar, “Review on self-supervised image recognition using deep neural networks,” Knowledge-Based Syst., vol. 224, p. 107090, 2021.
  • [33] L. Jing and Y. Tian, “Self-Supervised Visual Feature Learning with Deep Neural Networks: A Survey,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 11, pp. 4037–4058, 2021.
  • [34] Y. Bastanlar and S. Orhan, “Self-Supervised Contrastive Representation Learning in Computer Vision,” in Applied Intelligence - Annual Volume 2022 [Working Title], London, United Kingdom: IntechOpen, 2022.
  • [35] R. Zhang, P. Isola, and A. A. Efros, “Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 1058–1067.
  • [36] L. Ericsson, H. Gouk, C. C. Loy, and T. M. Hospedales, “Self-Supervised Representation Learning: Introduction, advances, and challenges,” IEEE Signal Process. Mag., vol. 39, no. 3, pp. 42–62, 2022.
  • [37] P. Vincent, H. Larochelle, Y. Bengio, and P. A. Manzagol, “Extracting and composing robust features with denoising autoencoders,” in International Conference on Machine Learning (ICML), 2008, pp. 1096–1103.
  • [38] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros, “Context Encoders: Feature Learning by Inpainting,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2536–2544.
  • [39] R. Zhang, P. Isola, and A. A. Efros, “Colorful image colorization,” in European Conference on Computer Vision (ECCV), 2016, pp. 649–666.
  • [40] C. Doersch, A. Gupta, and A. A. Efros, “Unsupervised Visual Representation Learning by Context Prediction,” in IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1422–1430.
  • [41] M. Noroozi and P. Favaro, “Unsupervised learning of visual representations by solving jigsaw puzzles,” in European Conference on Computer Vision (ECCV), 2016, pp. 69–84.
  • [42] S. Gidaris, P. Singh, and N. Komodakis, “Unsupervised representation learning by predicting image rotations,” in International Conference on Learning Representations (ICLR), 2018.
  • [43] M. Caron, P. Bojanowski, A. Joulin, and M. Douze, “Deep clustering for unsupervised learning of visual features,” in European Conference on Computer Vision (ECCV), 2018, pp. 132–149.
  • [44] Y. M. Asano, C. Rupprecht, and A. Vedaldi, “Self-labelling via simultaneous clustering and representation learning,” arXiv preprint arXiv:1911.05371, 2019.
  • [45] M. Cuturi, “Sinkhorn distances: Lightspeed computation of optimal transport,” in Advances in Neural Information Processing Systems, 2013, pp. 2292–2300.
  • [46] R. Epstein. (2023, Aug. 11). The empty brain [Online]. Available: https://aeon.co/essays/your-brain-does-not-process-information-and-it-is-not-a-computer.
  • [47] R. Hadsell, S. Chopra, and Y. LeCun, “Dimensionality reduction by learning an invariant mapping,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2006, pp. 1735–1742.
  • [48] F. Schroff, D. Kalenichenko, and J. Philbin, “FaceNet: A unified embedding for face recognition and clustering,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 815–823.
  • [49] K. Sohn, “Improved deep metric learning with multi-class N-pair loss objective,” in Advances in Neural Information Processing Systems, 2016, pp. 1857–1865.
  • [50] Z. Wu, Y. Xiong, S. X. Yu, and D. Lin, “Unsupervised Feature Learning via Non-parametric Instance Discrimination,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 3733–3742.
  • [51] M. Gutmann and A. Hyvärinen, “Noise-contrastive estimation: A new estimation principle for unnormalized statistical models,” in International Conference on Artificial Intelligence and Statistics (AISTATS), 2010, pp. 297–304.
  • [52] A. van den Oord, Y. Li, and O. Vinyals, “Representation Learning with Contrastive Predictive Coding,” arXiv preprint arXiv:1807.03748, 2018.
  • [53] A. Dosovitskiy, P. Fischer, J. T. Springenberg, M. Riedmiller, and T. Brox, “Discriminative unsupervised feature learning with exemplar convolutional neural networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 9, pp. 1734–1747, 2016.
  • [54] A. Van Den Oord, N. Kalchbrenner, and K. Kavukcuoglu, “Pixel recurrent neural networks,” in International Conference on Machine Learning (ICML), 2016, pp. 2611–2620.
  • [55] O. J. Henaff et al., “Data-Efficient image recognition with contrastive predictive coding,” in International Conference on Machine Learning (ICML), 2020, pp. 4182–4192.
  • [56] R. Devon Hjelm et al., “Learning deep representations by mutual information estimation and maximization,” in International Conference on Learning Representations (ICLR), 2019, pp. 1–24.
  • [57] P. Bachman, R. Devon Hjelm, and W. Buchwalter, “Learning representations by maximizing mutual information across views,” in Advances in Neural Information Processing Systems, 2019, pp. 15535–15545.
  • [58] Y. Tian, D. Krishnan, and P. Isola, “Contrastive Multiview Coding,” in European Conference on Computer Vision (ECCV), 2020, pp. 776–794.
  • [59] I. Misra and L. van der Maaten, “Self-Supervised Learning of Pretext-Invariant Representations,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 6707–6717.
  • [60] X. Chen, H. Fan, R. Girshick, and K. He, “Improved Baselines with Momentum Contrastive Learning,” arXiv preprint arXiv:2003.04297, 2020.
  • [61] T. Chen, S. Kornblith, K. Swersky, M. Norouzi, and G. Hinton, “Big self-supervised models are strong semi-supervised learners,” in Advances in Neural Information Processing Systems, 2020, pp. 22243–22255.
  • [62] M. Caron et al., “Emerging Properties in Self-Supervised Vision Transformers,” in IEEE International Conference on Computer Vision (ICCV), 2021, pp. 9630–9640.
  • [63] A. Dosovitskiy et al., “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale,” arXiv preprint arXiv:2010.11929, 2020.
  • [64] M. Oquab et al., “DINOv2: Learning Robust Visual Features without Supervision,” arXiv preprint arXiv:2304.07193, 2023.
  • [65] P. Goyal, D. Mahajan, A. Gupta, and I. Misra, “Scaling and benchmarking self-supervised visual representation learning,” in IEEE International Conference on Computer Vision (ICCV), 2019, pp. 6390–6399.
  • [66] B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva, “Learning deep features for scene recognition using places database,” in Advances in Neural Information Processing Systems, 2014, pp. 487–495.
  • [67] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The pascal visual object classes (VOC) challenge,” Int. J. Comput. Vis., vol. 88, no. 2, pp. 303–338, 2010.
  • [68] M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The Pascal Visual Object Classes Challenge: A Retrospective,” Int. J. Comput. Vis., vol. 111, no. 1, pp. 98–136, 2015.
  • [69] P. Krähenbühl, C. Doersch, J. Donahue, and T. Darrell, “Data-dependent initializations of convolutional neural networks,” arXiv preprint arXiv:1511.06856, 2015.
  • [70] S. Arora, H. Khandeparkar, M. Khodak, O. Plevrakis, and N. Saunshi, “A theoretical analysis of contrastive unsupervised representation learning,” in International Conference on Machine Learning (ICML), 2019, pp. 9904–9923.
  • [71] F. Bordes, S. Lavoie, R. Balestriero, N. Ballas, and P. Vincent, “A surprisingly simple technique to control the pretraining bias for better transfer: Expand or Narrow your representation,” arXiv preprint arXiv:2304.05369, 2023.
  • [72] Z. Xie et al., “Self-Supervised Learning with Swin Transformers,” arXiv preprint arXiv:2105.04553, 2021.
  • [73] X. Chen, S. Xie, and K. He, “An Empirical Study of Training Self-Supervised Vision Transformers,” in IEEE International Conference on Computer Vision (ICCV), 2021, pp. 9620–9629.
  • [74] C. Li et al., “Efficient Self-supervised Vision Transformers for Representation Learning,” arXiv preprint arXiv:2106.09785, 2022.
  • [75] S. Albelwi, “Survey on Self-Supervised Learning: Auxiliary Pretext Tasks and Contrastive Learning Methods in Imaging,” Entropy, vol. 24, no. 4, p. 551, 2022.
  • [76] F. Bordes, R. Balestriero, and P. Vincent, “Towards Democratizing Joint-Embedding Self-Supervised Learning,” arXiv preprint arXiv:2303.01986, 2023.
  • [77] R. Balestriero et al., “A Cookbook of Self-Supervised Learning,” arXiv preprint arXiv:2304.12210, 2023.
  • [78] C. Zhang, Z. Hao, and Y. Gu, “Dive into the Details of Self-Supervised Learning for Medical Image Analysis,” Med. Image Anal., vol. 89, p. 102879, 2023.
  • [79] S. C. Huang, A. Pareek, M. Jensen, M. P. Lungren, S. Yeung, and A. S. Chaudhari, “Self-supervised learning for medical image classification: a systematic review and implementation guidelines,” NPJ Digit. Med., vol. 6, no. 1, p. 74, 2023.
  • [80] Y. Tian, C. Sun, B. Poole, D. Krishnan, C. Schmid, and P. Isola, “What makes for good views for contrastive learning?,” in Advances in Neural Information Processing Systems, 2020, pp. 6827–6839.
  • [81] X. Wang and G. J. Qi, “Contrastive Learning with Stronger Augmentations,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 5, pp. 5549–5560, 2022.
  • [82] Y. Kalantidis, M. B. Sariyildiz, N. Pion, P. Weinzaepfel, and D. Larlus, “Hard negative mixing for contrastive learning,” in Advances in Neural Information Processing Systems, 2020, pp. 21798–21809.
  • [83] J. Robinson, C.-Y. Chuang, S. Sra, and S. Jegelka, “Contrastive Learning with Hard Negative Samples,” arXiv preprint arXiv:2010.04592, 2020.
  • [84] Y. Tian, X. Chen, and S. Ganguli, “Understanding self-supervised Learning Dynamics without Contrastive Pairs,” in International Conference on Machine Learning (ICML), 2021, pp. 10268–10278.

There are 84 citations in total.

Details

Primary Language Turkish
Subjects Engineering
Journal Section Articles
Authors

Serdar Alasu 0000-0003-2267-9707

Muhammed Fatih Talu 0000-0003-1166-8404

Publication Date April 29, 2024
Published in Issue Year 2024

Cite

APA Alasu, S., & Talu, M. F. (2024). Bilgisayarlı Görüde Öz-Denetimli Öğrenme Yöntemleri Üzerine Bir İnceleme. Duzce University Journal of Science and Technology, 12(2), 1136-1165. https://doi.org/10.29130/dubited.1201292
AMA Alasu S, Talu MF. Bilgisayarlı Görüde Öz-Denetimli Öğrenme Yöntemleri Üzerine Bir İnceleme. DÜBİTED. April 2024;12(2):1136-1165. doi:10.29130/dubited.1201292
Chicago Alasu, Serdar, and Muhammed Fatih Talu. “Bilgisayarlı Görüde Öz-Denetimli Öğrenme Yöntemleri Üzerine Bir İnceleme”. Duzce University Journal of Science and Technology 12, no. 2 (April 2024): 1136-65. https://doi.org/10.29130/dubited.1201292.
EndNote Alasu S, Talu MF (April 1, 2024) Bilgisayarlı Görüde Öz-Denetimli Öğrenme Yöntemleri Üzerine Bir İnceleme. Duzce University Journal of Science and Technology 12 2 1136–1165.
IEEE S. Alasu and M. F. Talu, “Bilgisayarlı Görüde Öz-Denetimli Öğrenme Yöntemleri Üzerine Bir İnceleme”, DÜBİTED, vol. 12, no. 2, pp. 1136–1165, 2024, doi: 10.29130/dubited.1201292.
ISNAD Alasu, Serdar - Talu, Muhammed Fatih. “Bilgisayarlı Görüde Öz-Denetimli Öğrenme Yöntemleri Üzerine Bir İnceleme”. Duzce University Journal of Science and Technology 12/2 (April 2024), 1136-1165. https://doi.org/10.29130/dubited.1201292.
JAMA Alasu S, Talu MF. Bilgisayarlı Görüde Öz-Denetimli Öğrenme Yöntemleri Üzerine Bir İnceleme. DÜBİTED. 2024;12:1136–1165.
MLA Alasu, Serdar and Muhammed Fatih Talu. “Bilgisayarlı Görüde Öz-Denetimli Öğrenme Yöntemleri Üzerine Bir İnceleme”. Duzce University Journal of Science and Technology, vol. 12, no. 2, 2024, pp. 1136-65, doi:10.29130/dubited.1201292.
Vancouver Alasu S, Talu MF. Bilgisayarlı Görüde Öz-Denetimli Öğrenme Yöntemleri Üzerine Bir İnceleme. DÜBİTED. 2024;12(2):1136-65.