Research Article
Iterative ensemble pseudo-labeling for convolutional neural networks

Year 2024, Volume: 42, Issue: 3, 862–874, 12.06.2024

Abstract

As is well known, the success of a convolutional neural network (CNN) is largely determined by the quantity of labeled samples. However, creating a labeled dataset is a difficult and time-consuming process, whereas unlabeled data are cheap and easy to access. Semi-supervised methods incorporate unlabeled data into the training process, allowing the model to learn from unlabeled samples as well. We propose a semi-supervised method based on the ensemble approach and the pseudo-labeling method. By balancing the unlabeled dataset with the labeled dataset during training, our strategy keeps both the decision diversity among base-learner models and the individual success of each base learner high. We show that using multiple CNN models can result in both higher accuracy and a more robust model than training a single CNN model. For inference, we propose both stacking and voting methodologies, and we show that the most successful algorithm for the stacking approach is the support vector machine (SVM). In experiments on the STL-10 dataset, our method increases accuracy by 15.9% over training with labeled data alone. Since the proposed training method is based on the cross-entropy loss, it can be combined with state-of-the-art algorithms.
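To make the training loop concrete, the following is a minimal sketch of the iterative ensemble pseudo-labeling idea described in the abstract, together with the two inference options (soft voting and SVM stacking). It uses small scikit-learn classifiers as stand-ins for the CNN base learners; the ensemble size, round count, confidence threshold, and all variable names are illustrative assumptions, not the paper's actual settings.

```python
# Sketch of iterative ensemble pseudo-labeling (assumed hyperparameters).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Toy data: a scarce labeled set plus a large, cheap unlabeled pool.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
X_lab, y_lab = X[:100], y[:100]      # labeled set
X_unl = X[100:1800]                  # unlabeled pool (labels discarded)
X_test, y_test = X[1800:], y[1800:]  # held-out evaluation set

N_MODELS, N_ROUNDS, CONF_T = 3, 4, 0.90  # assumed, not the paper's values

X_tr, y_tr = X_lab, y_lab
for r in range(N_ROUNDS):
    # Train diverse base learners (different seeds encourage decision diversity).
    models = [MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                            random_state=10 * r + m).fit(X_tr, y_tr)
              for m in range(N_MODELS)]

    # Ensemble-averaged class probabilities on the unlabeled pool.
    proba = np.mean([m.predict_proba(X_unl) for m in models], axis=0)
    conf, pseudo = proba.max(axis=1), proba.argmax(axis=1)

    # Accept only confident pseudo-labels, capped at the labeled-set size so
    # pseudo-labeled samples never swamp the true labels (the balancing idea).
    keep = np.argsort(-conf)[:len(X_lab)]
    keep = keep[conf[keep] >= CONF_T]
    X_tr = np.vstack([X_lab, X_unl[keep]])
    y_tr = np.concatenate([y_lab, pseudo[keep]])

# Inference option 1: soft voting over the ensemble's probabilities.
vote = np.mean([m.predict_proba(X_test) for m in models], axis=0).argmax(axis=1)

# Inference option 2: stacking, with an SVM meta-learner trained on the
# concatenated base-learner probabilities.
svm = SVC().fit(np.hstack([m.predict_proba(X_tr) for m in models]), y_tr)
stack = svm.predict(np.hstack([m.predict_proba(X_test) for m in models]))

print("voting accuracy:  ", (vote == y_test).mean())
print("stacking accuracy:", (stack == y_test).mean())
```

In a real pipeline the stacking meta-learner would be fit on held-out base-learner predictions rather than on their own training data; the simplification above only keeps the sketch short.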

References

  • [1] Liu Z, Mao H, Wu CY, Feichtenhofer C, Darrell T, Xie S. A ConvNet for the 2020s. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2022. p. 11976–11986.
  • [2] Brock A, De S, Smith SL, Simonyan K. High-performance large-scale image recognition without normalization. In: International Conference on Machine Learning, PMLR; 2021. p. 1059–1071.
  • [3] Zhang H, Wu C, Zhang Z, Zhu Y, Lin H, Zhang Z, Sun Y, He T, Mueller J, Manmatha R, et al. ResNeSt: Split-attention networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2022. p. 2736–2746.
  • [4] Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L. ImageNet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition; 2009. p. 248–255.
  • [5] Memiş A, Varlı S, Bilgili F. Semantic segmentation of the multiform proximal femur and femoral head bones with the deep convolutional neural networks in low quality MRI sections acquired in different MRI protocols. Comput Med Imaging Graph 2020;81:101715.
  • [6] Fang S, Xie H, Wang Y, Mao Z, Zhang Y. Read like humans: Autonomous, bidirectional and iterative language modeling for scene text recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2021. p. 7098–7107.
  • [7] Yıldız S, Aydemir O, Memiş A, Varlı S. A turnaround control system to automatically detect and monitor the time stamps of ground service actions in airports: A deep learning and computer vision based approach. Eng Appl Artif Intell 2022;114:105032.
  • [8] Van Engelen JE, Hoos HH. A survey on semi-supervised learning. Mach Learn 2020;109:373–440.
  • [9] Yang X, Song Z, King I, Xu Z. A survey on deep semi-supervised learning. IEEE Trans Knowl Data Eng 2022;35:8934–8954.
  • [10] Chen T, Guestrin C. XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016. p. 785–794.
  • [11] Amasyali MF, Ersoy OK. Classifier ensembles with the extended space forest. IEEE Trans Knowl Data Eng 2014;26:549–562.
  • [12] Yildiz S, Aydemir O, Memiş A, Varlı S. Customer churn analysis. In: 2020 28th Signal Processing and Communications Applications Conference (SIU), IEEE; 2020. p. 1–4.
  • [13] Polikar R. Ensemble learning. In: Ensemble Machine Learning. Springer; 2012. p. 1–34.
  • [14] Dong X, Yu Z, Cao W, Shi Y, Ma Q. A survey on ensemble learning. Front Comput Sci 2020;14:241–258.
  • [15] Lee DH, et al. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In: Workshop on Challenges in Representation Learning, ICML; 2013. p. 896.
  • [16] Verma V, Kawaguchi K, Lamb A, Kannala J, Solin A, Bengio Y, Lopez-Paz D. Interpolation consistency training for semi-supervised learning. Neural Netw 2022;145:90–106.
  • [17] Berthelot D, Carlini N, Goodfellow I, Papernot N, Oliver A, Raffel CA. MixMatch: A holistic approach to semi-supervised learning. Adv Neural Inf Process Syst 2019;32.
  • [18] Berthelot D, Carlini N, Cubuk ED, Kurakin A, Sohn K, Zhang H, et al. ReMixMatch: Semi-supervised learning with distribution alignment and augmentation anchoring. arXiv preprint arXiv:1911.09785. 2019.
  • [19] Li J, Socher R, Hoi SC. DivideMix: Learning with noisy labels as semi-supervised learning. In: International Conference on Learning Representations; 2020.
  • [20] Sohn K, Berthelot D, Carlini N, Zhang Z, Zhang H, Raffel CA, et al. FixMatch: Simplifying semi-supervised learning with consistency and confidence. Adv Neural Inf Process Syst 2020;33:596–608.
  • [21] Grandvalet Y, Bengio Y. Semi-supervised learning by entropy minimization. In: Saul L, Weiss Y, Bottou L, editors. Advances in Neural Information Processing Systems. Cambridge, Massachusetts: MIT Press; 2004.
  • [22] Arazo E, Ortego D, Albert P, O'Connor NE, McGuinness K. Pseudo-labeling and confirmation bias in deep semi-supervised learning. In: 2020 International Joint Conference on Neural Networks (IJCNN); 2020. p. 1–8.
  • [23] Xie Q, Luong MT, Hovy E, Le QV. Self-training with noisy student improves ImageNet classification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2020. p. 10687–10698.
  • [24] Sajjadi M, Javanmardi M, Tasdizen T. Regularization with stochastic transformations and perturbations for deep semi-supervised learning. Adv Neural Inf Process Syst 2016;29.
  • [25] Tarvainen A, Valpola H. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Adv Neural Inf Process Syst 2017;30.
  • [26] Blum A, Mitchell T. Combining labeled and unlabeled data with co-training. In: Proceedings of the Eleventh Annual Conference on Computational Learning Theory; 1998. p. 92–100.
  • [27] Zhou ZH, Li M. Tri-training: Exploiting unlabeled data using three classifiers. IEEE Trans Knowl Data Eng 2005;17:1529–1541.
  • [28] Søgaard A. Simple semi-supervised training of part-of-speech taggers. In: Proceedings of the ACL 2010 Conference Short Papers; 2010. p. 205–208.
  • [29] Ghosh S, Kumar S, Verma J, Kumar A. Self training with ensemble of teacher models. arXiv preprint arXiv:2107.08211. 2021.
  • [30] Grandvalet Y, Bengio Y. Semi-supervised learning by entropy minimization. Adv Neural Inf Process Syst 2004;17.
  • [31] Smith LN. Cyclical learning rates for training neural networks. In: 2017 IEEE Winter Conference on Applications of Computer Vision (WACV); 2017. p. 464–472.
  • [32] Hearst M, Dumais S, Osuna E, Platt J, Scholkopf B. Support vector machines. IEEE Intelligent Syst Their Appl 1998;13:18–28.
  • [33] Coates A, Ng A, Lee H. An analysis of single-layer networks in unsupervised feature learning. In: Gordon G, Dunson D, Dudík M, editors. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics; 2011. p. 215–223.
  • [34] Huang G, Liu Z, Weinberger KQ. Densely connected convolutional networks. CoRR abs/1608.06993. 2016. arXiv:1608.06993.
  • [35] Loshchilov I, Hutter F. Decoupled weight decay regularization. In: International Conference on Learning Representations; 2019.
  • [36] Quinlan JR. Learning decision tree classifiers. ACM Comput Surv 1996;28:71–72.
  • [37] Rigatti SJ. Random forest. J Insur Med 2017;47:31–39.
  • [38] Peterson LE. K-nearest neighbor. Scholarpedia 2009;4:1883.
  • [39] Van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res 2008;9.
  • [40] Jaderberg M, Simonyan K, Zisserman A, et al. Spatial transformer networks. Adv Neural Inf Process Syst 2015;28.
  • [41] Kayhan OS, Van Gemert JC. On translation invariance in CNNs: Convolutional layers can exploit absolute spatial location. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2020. p. 14274–14285.
  • [42] Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision; 2017. p. 618–626.
  • [43] McHugh ML. Interrater reliability: The kappa statistic. Biochemia Medica 2012;22:276–282.

Details

Primary Language English
Subjects Biochemistry and Cell Biology (Other)
Section Research Articles
Authors

Serdar Yildiz 0000-0002-0430-690X

Mehmet Fatih Amasyali

Publication Date June 12, 2024
Submission Date July 19, 2022
Published in Issue Year 2024, Volume: 42, Issue: 3

Cite

Vancouver Yildiz S, Amasyali MF. Iterative ensemble pseudo-labeling for convolutional neural networks. SIGMA. 2024;42(3):862-74.
