Research Article

The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size

Volume: 9 Number: 1 January 15, 2026

Abstract

Model complexity, dataset size and optimizer choice critically influence machine learning model performance, especially in complex architectures such as Transformers. This study analyzes the impact of seven optimizers (Adam, AdamW, AdaBelief, RMSprop, Nadam, Adagrad and SGD) across two Transformer configurations and three dataset sizes. The results show that adaptive optimizers generally outperform non-adaptive ones such as SGD, particularly as dataset size grows. For the smaller datasets (20K and 50K samples), Adam, AdamW, Nadam and RMSprop perform best on low-complexity models, while AdaBelief, Adagrad and SGD perform best on high-complexity models. On the largest dataset (∼140K samples), Nadam and RMSprop lead on low-complexity models, whereas Adam, AdaBelief, Adagrad, SGD and AdamW lead on high-complexity models. Notably, low-complexity models train more than twice as fast and, in some cases, achieve better accuracy and lower loss than their high-complexity counterparts. These results highlight the trade-offs between efficiency and accuracy that arise from the interplay of optimizer selection, dataset size and model complexity.
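
The experimental design summarized in the abstract (seven optimizers, two Transformer configurations, three dataset sizes) can be pictured as a full grid of training runs. The following is only a minimal sketch of how such a grid might be enumerated; it is not the authors' code. The configuration labels, the dictionary keys and the `build_grid` helper are hypothetical, and the largest dataset size is rounded to 140,000 because the abstract gives it only approximately.

```python
from itertools import product

# Optimizer names as listed in the abstract.
OPTIMIZERS = ["Adam", "AdamW", "AdaBelief", "RMSprop", "Nadam", "Adagrad", "SGD"]
# Hypothetical labels: the abstract only distinguishes low and high complexity.
MODEL_CONFIGS = ["low_complexity", "high_complexity"]
# Approximate sample counts; the largest is an assumption rounded from "~140K".
DATASET_SIZES = [20_000, 50_000, 140_000]


def build_grid():
    """Return one dict per (optimizer, model config, dataset size) combination."""
    return [
        {"optimizer": opt, "model": cfg, "n_samples": n}
        for opt, cfg, n in product(OPTIMIZERS, MODEL_CONFIGS, DATASET_SIZES)
    ]


grid = build_grid()
print(len(grid))  # 7 * 2 * 3 = 42 runs
```

Under these assumptions, a complete comparison requires 42 training runs, which is why the reported training-time difference between the low- and high-complexity models matters for the overall cost of such a study.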

Keywords

Supporting Institution

This work has been supported by the Scientific Research Projects Coordination Unit of the Sivas University of Science and Technology.

Project Number

2024-DTP-Müh-0004

Ethical Statement

Ethics committee approval was not required for this study because it did not involve animal or human subjects.

Thanks

This work has been supported by the Scientific Research Projects Coordination Unit of the Sivas University of Science and Technology. Project Number: 2024-DTP-Müh-0004. Computing resources used in this work were provided by the National Center for High Performance Computing of Türkiye (UHeM) under grant number 5020092024. The research utilized computational resources provided by the TUBITAK ULAKBIM High Performance and Grid Computing Center (TRUBA) and the Lütfi Albay Artificial Intelligence and Robotics Laboratory at Sivas University of Science and Technology.

Details

Primary Language

English

Subjects

Statistical Data Science

Journal Section

Research Article

Early Pub Date

December 3, 2025

Publication Date

January 15, 2026

Submission Date

July 11, 2025

Acceptance Date

November 22, 2025

Published in Issue

Year 2026 Volume: 9 Number: 1

APA
Çelik, H., & Katırcı, R. (2026). The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size. Black Sea Journal of Engineering and Science, 9(1), 180-191. https://doi.org/10.34248/bsengineering.1739598
AMA
1. Çelik H, Katırcı R. The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size. BSJ Eng. Sci. 2026;9(1):180-191. doi:10.34248/bsengineering.1739598
Chicago
Çelik, Hilal, and Ramazan Katırcı. 2026. “The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size”. Black Sea Journal of Engineering and Science 9 (1): 180-91. https://doi.org/10.34248/bsengineering.1739598.
EndNote
Çelik H, Katırcı R (January 1, 2026) The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size. Black Sea Journal of Engineering and Science 9 1 180–191.
IEEE
[1] H. Çelik and R. Katırcı, “The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size”, BSJ Eng. Sci., vol. 9, no. 1, pp. 180–191, Jan. 2026, doi: 10.34248/bsengineering.1739598.
ISNAD
Çelik, Hilal - Katırcı, Ramazan. “The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size”. Black Sea Journal of Engineering and Science 9/1 (January 1, 2026): 180-191. https://doi.org/10.34248/bsengineering.1739598.
JAMA
1. Çelik H, Katırcı R. The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size. BSJ Eng. Sci. 2026;9:180–191.
MLA
Çelik, Hilal, and Ramazan Katırcı. “The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size”. Black Sea Journal of Engineering and Science, vol. 9, no. 1, Jan. 2026, pp. 180-91, doi:10.34248/bsengineering.1739598.
Vancouver
1. Hilal Çelik, Ramazan Katırcı. The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size. BSJ Eng. Sci. 2026 Jan. 1;9(1):180-91. doi:10.34248/bsengineering.1739598
