Research Article

The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size

Year 2026, Volume: 9, Issue: 1, Pages 13-14
https://doi.org/10.34248/bsengineering.1739598

Abstract

Model complexity, dataset size, and optimizer choice critically influence machine learning model performance, especially in complex architectures such as Transformers. This study analyzes the impact of seven optimizers—Adam, AdamW, AdaBelief, RMSprop, Nadam, Adagrad, and SGD—across two Transformer configurations and three dataset sizes. Results show that adaptive optimizers generally outperform non-adaptive ones such as SGD, particularly as dataset size grows. On the smaller datasets (20K and 50K samples), Adam, AdamW, Nadam, and RMSprop perform best on low-complexity models, while AdaBelief, Adagrad, and SGD excel at higher complexity. On the largest dataset (∼140K samples), Nadam and RMSprop lead on low-complexity models, whereas Adam, AdaBelief, Adagrad, SGD, and AdamW do so on high-complexity models. Notably, low-complexity models train more than twice as fast and, in some cases, achieve better accuracy and lower loss than their high-complexity counterparts. These results highlight the trade-offs involved in balancing optimizer selection, dataset size, and model complexity for both efficiency and accuracy.
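To make the experimental setup concrete, below is a minimal sketch (an assumed PyTorch setup, not the authors' code) of how such an optimizer comparison can be wired up. Six of the seven optimizers ship with torch.optim; AdaBelief would come from the third-party adabelief-pytorch package and is shown commented out. The toy model, task, dummy data, and learning rates are illustrative assumptions; the paper's actual architecture and hyperparameters are not reproduced here.

```python
# Minimal sketch (assumed setup, not the authors' code): compare the seven
# optimizers from the abstract on a toy Transformer classifier.
import torch
import torch.nn as nn

def build_model(d_model: int, n_layers: int, seq_len: int) -> nn.Module:
    # "Low" vs "high" complexity is modeled here simply as depth/width.
    layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
    return nn.Sequential(
        nn.TransformerEncoder(layer, num_layers=n_layers),
        nn.Flatten(),                     # (batch, seq_len * d_model)
        nn.Linear(seq_len * d_model, 2),  # assumed binary classification head
    )

# Six optimizers ship with torch.optim; AdaBelief needs the third-party
# adabelief-pytorch package, so it is listed but commented out here.
OPTIMIZERS = {
    "Adam":    lambda p: torch.optim.Adam(p, lr=1e-3),
    "AdamW":   lambda p: torch.optim.AdamW(p, lr=1e-3),
    "Nadam":   lambda p: torch.optim.NAdam(p, lr=1e-3),
    "RMSprop": lambda p: torch.optim.RMSprop(p, lr=1e-3),
    "Adagrad": lambda p: torch.optim.Adagrad(p, lr=1e-2),
    "SGD":     lambda p: torch.optim.SGD(p, lr=1e-2, momentum=0.9),
    # "AdaBelief": lambda p: adabelief_pytorch.AdaBelief(p, lr=1e-3),
}

x = torch.randn(32, 16, 64)      # dummy batch: (batch, seq_len, d_model)
y = torch.randint(0, 2, (32,))   # dummy labels
loss_fn = nn.CrossEntropyLoss()

for name, make_opt in OPTIMIZERS.items():
    torch.manual_seed(0)  # identical initialization for a fair comparison
    model = build_model(d_model=64, n_layers=2, seq_len=16)  # "low-complexity" config
    opt = make_opt(model.parameters())
    loss = loss_fn(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"{name}: loss after one step = {loss.item():.4f}")
```

In a full experiment, each (optimizer, model configuration, dataset size) combination would be trained to convergence and compared on accuracy, loss, and wall-clock training time, mirroring the three axes the study varies.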

Ethical Statement

This study does not involve human participants or animals. All data used are publicly available, and no ethical approval was required.

Supporting Institution

This work has been supported by the Scientific Research Projects Coordination Unit of the Sivas University of Science and Technology.

Project Number

2024-DTP-Müh-0004

Thanks

The authors would like to thank the Scientific Research Projects Coordination Unit of the Sivas University of Science and Technology for supporting this work under Project Number: 2024-DTP-Müh-0004. Computing resources were provided by the National Center for High Performance Computing of Turkey (UHeM) under grant number 5020092024. Additional computational support was provided by the TUBITAK ULAKBIM High Performance and Grid Computing Center (TRUBA) and the Lütfi Albay Artificial Intelligence and Robotics Laboratory at the Sivas University of Science and Technology.

References

  • Abdulmumin, I., Galadanci, B. S., & Isa, A. (2021). Enhanced back-translation for low resource neural machine translation using self-training. Communications in Computer and Information Science, 1350, 355–371. https://doi.org/10.1007/978-3-030-69143-1_28
  • Ahmad, R., & Al-Ramahi, I. A. M. (2023). Optimization of deep learning models: Benchmark and analysis. Advances in Computational Intelligence, 3(2), 1–15. https://doi.org/10.1007/s43674-023-00055-1


Details

Primary Language: English
Subjects: Statistical Data Science
Journal Section: Research Article
Authors

Hilal Çelik (ORCID: 0000-0001-5428-3411)

Ramazan Katırcı (ORCID: 0000-0003-2448-011X)

Project Number: 2024-DTP-Müh-0004
Early Pub Date: December 3, 2025
Publication Date: December 3, 2025
Submission Date: July 11, 2025
Acceptance Date: November 22, 2025
Published in Issue: Year 2026, Volume: 9, Issue: 1

Cite

APA Çelik, H., & Katırcı, R. (2025). The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size. Black Sea Journal of Engineering and Science, 9(1), 13-14. https://doi.org/10.34248/bsengineering.1739598
AMA Çelik H, Katırcı R. The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size. BSJ Eng. Sci. December 2025;9(1):13-14. doi:10.34248/bsengineering.1739598
Chicago Çelik, Hilal, and Ramazan Katırcı. “The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size”. Black Sea Journal of Engineering and Science 9, no. 1 (December 2025): 13-14. https://doi.org/10.34248/bsengineering.1739598.
EndNote Çelik H, Katırcı R (December 1, 2025) The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size. Black Sea Journal of Engineering and Science 9 1 13–14.
IEEE H. Çelik and R. Katırcı, “The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size”, BSJ Eng. Sci., vol. 9, no. 1, pp. 13–14, 2025, doi: 10.34248/bsengineering.1739598.
ISNAD Çelik, Hilal - Katırcı, Ramazan. “The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size”. Black Sea Journal of Engineering and Science 9/1 (December 2025), 13-14. https://doi.org/10.34248/bsengineering.1739598.
JAMA Çelik H, Katırcı R. The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size. BSJ Eng. Sci. 2025;9:13–14.
MLA Çelik, Hilal and Ramazan Katırcı. “The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size”. Black Sea Journal of Engineering and Science, vol. 9, no. 1, 2025, pp. 13-14, doi:10.34248/bsengineering.1739598.
Vancouver Çelik H, Katırcı R. The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size. BSJ Eng. Sci. 2025;9(1):13-4.
