Model complexity, dataset size and optimizer choice critically influence machine learning model performance, especially in complex architectures such as Transformers. This study analyzes the impact of seven optimizers (Adam, AdamW, AdaBelief, RMSprop, Nadam, Adagrad and SGD) across two Transformer configurations and three dataset sizes. Results show that adaptive optimizers generally outperform non-adaptive ones such as SGD, particularly as dataset size grows. For smaller datasets (20K and 50K samples), Adam, AdamW, Nadam and RMSprop perform best on low-complexity models, while AdaBelief, Adagrad and SGD excel with higher complexity. On the largest dataset (∼140K samples), Nadam and RMSprop lead on low-complexity models, whereas Adam, AdaBelief, Adagrad, SGD and AdamW do so on high-complexity models. Notably, low-complexity models train more than twice as fast and, in some cases, achieve better accuracy and lower loss than their high-complexity counterparts. These results highlight the trade-offs among optimizer selection, dataset size and model complexity when balancing training efficiency and accuracy.
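For concreteness, the sketch below shows how an optimizer sweep over two Transformer complexity levels, like the one described in the abstract, could be set up in PyTorch. It is not the authors' code: the architecture, layer counts, learning rates and dummy data are illustrative assumptions, and AdaBelief is assumed to come from the third-party adabelief-pytorch package rather than core PyTorch.

```python
import torch
import torch.nn as nn


def build_optimizer(name, params, lr=1e-3):
    """Map optimizer names to constructors. The learning rates here are
    illustrative assumptions, not the settings used in the paper."""
    if name == "adabelief":
        # Third-party package, assumed installed: pip install adabelief-pytorch
        from adabelief_pytorch import AdaBelief
        return AdaBelief(params, lr=lr)
    table = {
        "adam":    lambda: torch.optim.Adam(params, lr=lr),
        "adamw":   lambda: torch.optim.AdamW(params, lr=lr),
        "rmsprop": lambda: torch.optim.RMSprop(params, lr=lr),
        "nadam":   lambda: torch.optim.NAdam(params, lr=lr),
        "adagrad": lambda: torch.optim.Adagrad(params, lr=lr),
        "sgd":     lambda: torch.optim.SGD(params, lr=lr, momentum=0.9),
    }
    return table[name]()


class SmallTransformerClassifier(nn.Module):
    """Encoder-only classifier; the 'low' and 'high' configurations below differ
    only in width and depth, loosely mirroring two complexity levels."""

    def __init__(self, d_model=64, n_heads=4, n_layers=2, n_classes=5):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                    # x: (batch, seq_len, d_model)
        h = self.encoder(x).mean(dim=1)      # mean-pool over the sequence
        return self.head(h)


if __name__ == "__main__":
    configs = {"low": dict(d_model=64, n_layers=2),
               "high": dict(d_model=256, n_layers=6)}
    y = torch.randint(0, 5, (32,))           # dummy labels for a 5-class task
    for cfg_name, cfg in configs.items():
        x = torch.randn(32, 16, cfg["d_model"])  # dummy batch: 32 samples, 16 tokens
        for opt_name in ["adam", "adamw", "rmsprop", "nadam", "adagrad", "sgd"]:
            model = SmallTransformerClassifier(**cfg)
            opt = build_optimizer(opt_name, model.parameters())
            loss = nn.functional.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
            print(f"{cfg_name:>4}/{opt_name:<8} loss after one step: {loss.item():.3f}")
```

In a full experiment, the single gradient step above would be replaced by a training loop over each dataset size, with accuracy, loss and wall-clock time logged per optimizer and configuration.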
Transformer architecture, Optimizer comparison, Model complexity, Dataset size, Optimizers, Training efficiency
Ethics committee approval was not required for this study because it did not involve research on animals or humans.
This work has been supported by the Scientific Research Projects Coordination Unit of the Sivas University of Science and Technology.
2024-DTP-Müh-0004
This work has been supported by the Scientific Research Projects Coordination Unit of the Sivas University of Science and Technology. Project Number: 2024-DTP-Müh-0004. Computing resources used in this work were provided by the National Center for High Performance Computing of Türkiye (UHeM) under grant number 5020092024. The research utilized computational resources provided by the TUBITAK ULAKBIM High Performance and Grid Computing Center (TRUBA) and the Lütfi Albay Artificial Intelligence and Robotics Laboratory at Sivas University of Science and Technology.
| Primary Language | English |
|---|---|
| Subjects | Statistical Data Science |
| Section | Research Article |
| Authors | |
| Project Number | 2024-DTP-Müh-0004 |
| Submission Date | July 11, 2025 |
| Acceptance Date | November 22, 2025 |
| Early View Date | December 3, 2025 |
| Publication Date | January 15, 2026 |
| DOI | https://doi.org/10.34248/bsengineering.1739598 |
| IZ | https://izlik.org/JA77HY25WE |
| Published Issue | Year 2026 Volume: 9 Issue: 1 |