Research Article

The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size

Year 2026, Volume: 9, Issue: 1, 180-191, 15.01.2026
https://doi.org/10.34248/bsengineering.1739598
https://izlik.org/JA77HY25WE

Abstract

Model complexity, dataset size, and optimizer choice critically influence machine learning model performance, especially in complex architectures such as Transformers. This study analyzes the impact of seven optimizers (Adam, AdamW, AdaBelief, RMSprop, Nadam, Adagrad, and SGD) across two Transformer configurations and three dataset sizes. Results show that adaptive optimizers generally outperform non-adaptive ones such as SGD, particularly as dataset size grows. For smaller datasets (20K and 50K samples), Adam, AdamW, Nadam, and RMSprop perform best on low-complexity models, while AdaBelief, Adagrad, and SGD excel at higher complexity. On the largest dataset (∼140K samples), Nadam and RMSprop lead on low-complexity models, whereas Adam, AdaBelief, Adagrad, SGD, and AdamW do so on high-complexity models. Notably, low-complexity models train more than twice as fast and, in some cases, achieve better accuracy and lower loss than their high-complexity counterparts. These results highlight the trade-offs involved in balancing optimizer selection, dataset size, and model complexity to achieve both efficiency and accuracy.
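The adaptive-versus-non-adaptive distinction discussed in the abstract can be illustrated with a minimal, framework-free sketch (a hypothetical toy example, not the study's actual training code): plain SGD takes fixed-size steps along the negative gradient, while Adam rescales each step using running estimates of the gradient's first and second moments. Both are run here on a one-dimensional quadratic loss with minimum at w = 3.

```python
import math

def sgd_step(w, grad, state, lr=0.1):
    # Non-adaptive SGD: a fixed step in the negative gradient direction.
    return w - lr * grad, state

def adam_step(w, grad, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: per-parameter step scaled by bias-corrected running moments.
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad          # first-moment estimate
    v = b2 * v + (1 - b2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps), (m, v, t)

def run(step_fn, state, steps=50):
    # Minimize f(w) = (w - 3)^2 starting from w = 0; gradient is 2*(w - 3).
    w = 0.0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)
        w, state = step_fn(w, grad, state)
    return w

w_sgd = run(sgd_step, None)
w_adam = run(adam_step, (0.0, 0.0, 0))
print(w_sgd, w_adam)  # both approach the minimum at w = 3
```

On this convex toy problem both optimizers converge; the study's point is that on non-convex Transformer training, the relative ranking of such update rules shifts with model complexity and dataset size.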

Ethics Statement

Ethics committee approval was not required for this study because it did not involve research on animals or humans.

Supporting Institution

This work has been supported by the Scientific Research Projects Coordination Unit of the Sivas University of Science and Technology.

Project Number

2024-DTP-Müh-0004

Acknowledgments

This work has been supported by the Scientific Research Projects Coordination Unit of the Sivas University of Science and Technology, Project Number: 2024-DTP-Müh-0004. Computing resources used in this work were provided by the National Center for High Performance Computing of Türkiye (UHeM) under grant number 5020092024. The research also utilized computational resources provided by the TUBITAK ULAKBIM High Performance and Grid Computing Center (TRUBA) and the Lütfi Albay Artificial Intelligence and Robotics Laboratory at Sivas University of Science and Technology.

References

  • Abdulmumin, I., Galadanci, B. S., & Isa, A. (2021). Enhanced back-translation for low resource neural machine translation using self-training. Communications in Computer and Information Science, 1350, 355–371. https://doi.org/10.1007/978-3-030-69143-1_28
  • Ahmad, R., & Al-Ramahi, I. A. M. (2023). Optimization of deep learning models: Benchmark and analysis. Advances in Computational Intelligence, 3(2), 1–15. https://doi.org/10.1007/s43674-023-00055-1
  • Babaeianjelodar, M., Lorenz, S., Gordon, J., Matthews, J., & Freitag, E. (2020). Quantifying gender bias in different corpora. Companion Proceedings of the Web Conference 2020, 752–759. https://doi.org/10.1145/3366424.3383559
  • Baskakov, D. (2023). The computational complexity of machine learning (Issue January). Springer. https://doi.org/10.1007/978-981-33-6632-9
  • Çelik, H., Katırcı, R., & İşlek, B. (2024). Effect of parameters on performance in question-answer model with simple RNN deep learning method. International Conference on Scientific and Innovation Research, 161–169.
  • Chakrabarti, K., & Chopra, N. (2021). Generalized AdaGrad (G-AdaGrad) and Adam: A state-space perspective. Proceedings of the IEEE Conference on Decision and Control (CDC), 1496–1501. https://doi.org/10.1109/CDC45484.2021.9682994
  • Chen, Y., Song, X., Lee, C., Wang, Z., Zhang, Q., Dohan, D., Kawakami, K., Kochanski, G., Doucet, A., Ranzato, M., Perel, S., & de Freitas, N. (2022). Towards learning universal hyperparameter optimizers with transformers. Advances in Neural Information Processing Systems, 35, 1–16.
  • Choi, D., Shallue, C. J., Nado, Z., Lee, J., Maddison, C. J., & Dahl, G. E. (2019). On empirical comparisons of optimizers for deep learning. arXiv. https://arxiv.org/abs/1910.05446
  • NVIDIA Developer. (2025). CUDA zone. https://developer.nvidia.com/cuda-zone
  • Fang, H., Lee, J. U., Moosavi, N. S., & Gurevych, I. (2023). Transformers with learnable activation functions. Findings of the EACL 2023, 2337–2353. https://doi.org/10.18653/v1/2023.findings-eacl.181
  • Fehrman, B., & Gess, B. (2020). Convergence rates for the stochastic gradient descent method for non-convex objective functions. arXiv:1904.01517.
  • Guan, L. (2024). Adaplus: Integrating Nesterov momentum and precise stepsize adjustment on AdamW basis. ICASSP 2024, 5210–5214. https://doi.org/10.1109/ICASSP48485.2024.10447337
  • İşlek, B., Katırcı, R., & Çelik, H. (2024). Enhancing question answering systems through optimal hyperparameter tuning in GRU. 8th International Artificial Intelligence and Data Processing Symposium (IDAP 2024). https://doi.org/10.1109/IDAP64064.2024.10710732
  • Jelassi, S., & Li, Y. (2022). Towards understanding how momentum improves generalization in deep learning. Proceedings of Machine Learning Research, 162, 9965–10040.
  • Jurafsky, D., & Martin, J. H. (2008). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition (2nd ed.). Prentice Hall.
  • Katırcı, R., & Çelik, H. (2024). Transformer mimarisi. https://doi.org/10.5281/zenodo.13971609
  • Katırcı, R., & Çelik, H. (2025a). Evaluating the impact of activation functions on transformer architecture performance. International Science and Art Research Center, 626–639.
  • Katırcı, R., & Çelik, H. (2025b). Learning rate sensitivity in transformer models: A case study in neural machine translation. https://doi.org/10.5281/zenodo.15769066
  • Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., & Soricut, R. (2020). ALBERT: A lite BERT for self-supervised learning of language representations. 8th International Conference on Learning Representations (ICLR 2020).
  • Li, S. (2019). Getting started with distributed data parallel. https://pytorch.org/tutorials/intermediate/ddp_tutorial.html
  • Luo, L. (2019). Adaptive gradient methods with dynamic bound of learning rate. arXiv. https://arxiv.org/abs/1902.09843
  • Mahadevaswamy, U. B., & Swathi, P. (2022). Sentiment analysis using bidirectional LSTM network. Procedia Computer Science, 218, 45–56. https://doi.org/10.1016/j.procs.2022.12.400
  • Pan, Y., Li, X., Yang, Y., & Dong, R. (2020). Morphological word segmentation on agglutinative languages for neural machine translation. arXiv:2001.01589.
  • Pan, Y., & Li, Y. (2023). Toward understanding why Adam converges faster than SGD for transformers. arXiv:2306.00204.
  • Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Technical Report.
  • Razzhigaev, A., Mikhalchuk, M., Goncharova, E., Oseledets, I., Dimitrov, D., & Kuznetsov, A. (2024). The shape of learning: Anisotropy and intrinsic dimensions. arXiv.
  • Rithika. (2024). Recent advances in large language models: An upshot. Journal, 5(6), 137–143. https://doi.org/10.55248/gengpi.5.0624.1403
  • Saray, U., & Çavdar, U. (2024). Comparison of different optimization algorithms in the Fashion MNIST dataset. International Journal of Management Science and Information Technology, 8(2), 52–58. https://doi.org/10.36287/ijmsit.8.2.1
  • Sarvepalli, S. K., Sarat, S., & Sarvepalli, K. (2015). Deep learning in neural networks: The science behind an artificial brain. https://doi.org/10.13140/RG.2.2.22512.71682
  • Shazeer, N., & Stern, M. (2018). Adafactor: Adaptive learning rates with sublinear memory cost. ICML 2018, 7322–7330.
  • Shoeybi, M., Patwary, M., Puri, R., LeGresley, P., Casper, J., & Catanzaro, B. (2020). Megatron-LM: Training multi-billion parameter language models using model parallelism. https://arxiv.org/abs/1909.08053
  • Su, J., Ahmed, M., Lu, Y., Pan, S., Bo, W., & Liu, Y. (2024). RoFormer: Enhanced transformer with rotary position embedding. Neurocomputing, 568, Article 127063. https://doi.org/10.1016/j.neucom.2023.127063
  • Sun, Y., & Platoš, J. (2024). Abstractive text summarization model combining a hierarchical attention mechanism and multiobjective reinforcement learning. Expert Systems with Applications, 248. https://doi.org/10.1016/j.eswa.2024.123356
  • Tomihari, A., & Sato, I. (2025). Understanding why Adam outperforms SGD: Gradient heterogeneity in transformers. arXiv:2502.00213.
  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, 5999–6009.
  • Vradam, V. A. (2025). A physics-inspired optimizer: Velocity regularized Adam. (Missing: publication type)
  • Wang, X., & Aitchison, L. (2024). How to set AdamW’s weight decay as you scale model and dataset size. arXiv:2405.13698.
  • Yan, H., & Shao, D. (2024). Enhancing transformer training efficiency with dynamic dropout. arXiv:2411.03236.
  • Yehudai, G., Kaplan, H., Ghandeharioun, A., Geva, M., & Globerson, A. (2024). When can transformers count to n? arXiv:2407.15160.
  • You, Y., Li, J., Reddi, S., Hseu, J., Kumar, S., Bhojanapalli, S., Song, X., Demmel, J., Keutzer, K., & Hsieh, C. J. (2020). Large batch optimization for deep learning: Training BERT in 76 minutes. ICLR 2020, 1-15.
  • Zaheer, R., & Shaziya, H. (2019). A study of the optimization algorithms in deep learning. Proceedings of the ICISC 2019, 536–539. https://doi.org/10.1109/ICISC44355.2019.9036442
  • Zhang, G., Niwa, K., & Kleijn, W. B. (2022). A DNN optimizer that improves over AdaBelief by suppression of the adaptive stepsize range. arXiv:2203.13273.
  • Zhang, Y., Chen, C., Ding, T., Li, Z., Sun, R., & Luo, Z. (2024). Why transformers need Adam: A Hessian perspective. NeurIPS.
  • Zhao, R., Morwani, D., Brandfonbrener, D., Vyas, N., & Kakade, S. (2024). Deconstructing what makes a good optimizer for language models. arXiv:2407.07972.
  • Zheng, J., Rezagholizadeh, M., & Passban, P. (2022). Dynamic position encoding for transformers. Proceedings of COLING 2022, 5076–5084.
  • Zhou, Y., Huang, K., Cheng, C., Wang, X., Hussain, A., & Liu, X. (2021). FastAdaBelief: Improving convergence rate for belief-based adaptive optimizers by exploiting strong convexity. arXiv:2104.13790.
  • Zhuang, J., Tang, T., Ding, Y., Tatikonda, S., & Dvornek, N. (2020). AdaBelief optimizer: Adapting stepsizes by the belief in observed gradients. arXiv:2010.07468.


Details

Primary Language: English
Subjects: Statistical Data Science
Section: Research Article
Authors

Hilal Çelik 0000-0001-5428-3411

Ramazan Katırcı 0000-0003-2448-011X

Project Number: 2024-DTP-Müh-0004
Submission Date: 11 July 2025
Acceptance Date: 22 November 2025
Early View Date: 3 December 2025
Publication Date: 15 January 2026
DOI: https://doi.org/10.34248/bsengineering.1739598
IZ: https://izlik.org/JA77HY25WE
Published Issue: Year 2026, Volume: 9, Issue: 1

How to Cite

APA Çelik, H., & Katırcı, R. (2026). The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size. Black Sea Journal of Engineering and Science, 9(1), 180-191. https://doi.org/10.34248/bsengineering.1739598
AMA 1. Çelik H, Katırcı R. The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size. BSJ Eng. Sci. 2026;9(1):180-191. doi:10.34248/bsengineering.1739598
Chicago Çelik, Hilal, and Ramazan Katırcı. 2026. "The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size". Black Sea Journal of Engineering and Science 9 (1): 180-91. https://doi.org/10.34248/bsengineering.1739598.
EndNote Çelik H, Katırcı R (01 January 2026) The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size. Black Sea Journal of Engineering and Science 9 1 180–191.
IEEE [1] H. Çelik and R. Katırcı, "The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size", BSJ Eng. Sci., vol. 9, no. 1, pp. 180–191, Jan. 2026, doi: 10.34248/bsengineering.1739598.
ISNAD Çelik, Hilal - Katırcı, Ramazan. "The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size". Black Sea Journal of Engineering and Science 9/1 (01 January 2026): 180-191. https://doi.org/10.34248/bsengineering.1739598.
JAMA 1. Çelik H, Katırcı R. The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size. BSJ Eng. Sci. 2026;9:180–191.
MLA Çelik, Hilal, and Ramazan Katırcı. "The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size". Black Sea Journal of Engineering and Science, vol. 9, no. 1, January 2026, pp. 180-91, doi:10.34248/bsengineering.1739598.
Vancouver 1. Çelik H, Katırcı R. The Impact of Optimizer Selection on Transformer Performance: Analyzing the Role of Model Complexity and Dataset Size. BSJ Eng. Sci. 01 January 2026;9(1):180-91. doi:10.34248/bsengineering.1739598
