Research Article

Adapting the AlphaZero Algorithm to Pawn Dama: Implementation, Training, and Performance Evaluation

Year 2025, Volume: 37, Issue: 1, 27-35, 25.03.2025
https://doi.org/10.7240/jeps.1620319

Abstract

This research uses deep reinforcement learning techniques, notably the AlphaZero algorithm, to construct an artificial intelligence system that can play Pawn Dama at a level that surpasses human players. Pawn Dama, a simplified variant of Dama (Turkish draughts), is a well-suited platform for exploring an AI's ability to think strategically and make decisions. The primary goal is to develop an AI that learns the game's rules and dynamics and builds sophisticated strategies through self-play alone. The system incorporates Monte Carlo Tree Search (MCTS) to improve decision-making during games and uses a Convolutional Neural Network (CNN) to enhance learning. The development process consists of implementing the reinforcement learning algorithm, creating an intuitive graphical user interface, and testing the system against human players. The project contributes to research on strategic-game AI by providing insights that may be applied in other domains and by spurring further advances in AI-driven game strategies.
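For readers unfamiliar with the mechanics summarized above, AlphaZero-style MCTS couples the network's policy prior with observed simulation values through the PUCT selection rule. The Python sketch below is a minimal illustration of that rule only, not the authors' implementation; the names (Node, puct_score, select_child) and the constant c_puct = 1.5 are assumptions for the example.

import math

class Node:
    # One edge/state statistics bundle in an AlphaZero-style search tree.
    def __init__(self, prior):
        self.prior = prior        # P(s, a): prior from the network's policy head
        self.visit_count = 0      # N(s, a)
        self.value_sum = 0.0      # W(s, a): sum of backed-up values
        self.children = {}        # action -> Node

    def q_value(self):
        # Q(s, a): mean value of the simulations that passed through this edge.
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def puct_score(parent, child, c_puct=1.5):
    # PUCT: Q(s, a) + c_puct * P(s, a) * sqrt(N(s)) / (1 + N(s, a)).
    # Exploitation term plus a prior-weighted exploration bonus that
    # decays as the edge accumulates visits.
    u = c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return child.q_value() + u

def select_child(node):
    # Descend to the (action, child) pair maximizing the PUCT score.
    return max(node.children.items(), key=lambda kv: puct_score(node, kv[1]))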

References

  • Shannon, C.E. (1950). Programming a Computer for Playing Chess. Philosophical Magazine, 41(314), 256-275.
  • Knuth, D.E., & Moore, R.W. (1975). An Analysis of Alpha-Beta Pruning. Artificial Intelligence, 6(4), 293-326.
  • Newborn, M. (1997). Kasparov versus Deep Blue: Computer Chess Comes of Age. Springer.
  • Schaeffer, J. (1997). One Jump Ahead: Challenging Human Supremacy in Checkers. Springer-Verlag.
  • Tesauro, G. (1995). Temporal Difference Learning and TD-Gammon. Communications of the ACM, 38(3), 58-68.
  • Samuel, A.L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3(3), 210-229.
  • Coulom, R. (2006). Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. In Proceedings of the 5th International Conference on Computers and Games (pp. 72-83).
  • Kocsis, L., & Szepesvári, C. (2006). Bandit Based Monte-Carlo Planning. In Machine Learning: ECML 2006 (pp. 282-293). Springer.
  • Gelly, S., & Silver, D. (2007). Combining Online and Offline Knowledge in UCT. In Proceedings of the 24th International Conference on Machine Learning (pp. 273-280).
  • Ciancarini, P., & Favini, G.P. (2010). Monte Carlo tree search in Kriegspiel. Artificial Intelligence, 174, 670-684.
  • Browne, C.B., Powley, E., Whitehouse, D., Lucas, S.M., Cowling, P.I., Rohlfshagen, P., Tavener, S., Perez, D., Samothrakis, S., & Colton, S. (2012). A Survey of Monte Carlo Tree Search Methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1), 1-43.
  • Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., & Lanctot, M. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484-489.
  • Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., & Hassabis, D. (2017). Mastering the game of Go without human knowledge. Nature, 550(7676), 354-359.
  • Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., & Hassabis, D. (2018). A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362(6419), 1140-1144.
  • Vinyals, O., Babuschkin, I., Czarnecki, W.M., Mathieu, M., Dudzik, A., Chung, J., Choi, D., Powell, R., Ewalds, T., Georgiev, P., & Silver, D. (2019). Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575(7782), 350-354.
  • Schrittwieser, J., Antonoglou, I., Hubert, T., Simonyan, K., Sifre, L., Schmitt, S., Guez, A., Lockhart, E., Hassabis, D., Graepel, T., & Silver, D. (2020). Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model. Nature, 588(7839), 604-609.
  • Dong, H., Ding, Z., & Zhang, S. (Eds.). (2020). Deep Reinforcement Learning: Fundamentals, Research and Applications. Springer.
  • Thakoor, S., Nair, S., & Jhunjhunwala, M. (2016). Learning to play Othello without human knowledge. Stanford University.
  • Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K., & Hassabis, D. (2017). Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arXiv preprint arXiv:1712.01815. https://arxiv.org/abs/1712.01815
  • LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.
  • Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning.
  • Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1), 1929-1958.
  • Nair, V., & Hinton, G.E. (2010). Rectified linear units improve restricted boltzmann machines. In Proceedings of the International Conference on Machine Learning.
  • Tomašev, N., Paquet, U., Hassabis, D., & Kramnik, V. (2020). Assessing game balance with AlphaZero: Exploring alternative rule sets in chess. arXiv preprint arXiv:2009.04374. https://arxiv.org/abs/2009.04374
  • Ye, W., Liu, S., Kurutach, T., Abbeel, P., & Gao, Y. (2021). Mastering Atari games with limited data. Advances in Neural Information Processing Systems, 34, 14917–14929.
  • Schmid, M., Moravčík, M., Burch, N., Kadlec, R., Davidson, J., Waugh, K., Bard, N., Timbers, F., Lanctot, M., Holland, G. Z., Davoodi, E., Christianson, A., & Bowling, M. (2023). Student of Games: A unified learning algorithm for both perfect and imperfect information games. Science Advances, 9(45), eadg3256. https://doi.org/10.1126/sciadv.adg3256
  • Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction (2nd ed.). MIT Press.

Acknowledgments

We would like to thank the Department of Computer Engineering at Marmara University for providing us with a home for this research.


Details

Primary Language: English
Subjects: Artificial Intelligence (Other)
Section: Research Articles
Authors

Mehmet Kadir Baran 0000-0002-7973-2794

Erdem Pehlivanlar 0009-0009-7438-1647

Cem Güleç 0000-0003-0285-2795

Alperen Gönül 0009-0003-9066-1418

Muhammet Şeramet 0009-0005-9343-6202

Early View Date: March 19, 2025
Publication Date: March 25, 2025
Submission Date: January 15, 2025
Acceptance Date: February 20, 2025
Published Issue: Year 2025, Volume: 37, Issue: 1

How to Cite

APA Baran, M. K., Pehlivanlar, E., Güleç, C., Gönül, A., & Şeramet, M. (2025). Adapting the AlphaZero Algorithm to Pawn Dama: Implementation, Training, and Performance Evaluation. International Journal of Advances in Engineering and Pure Sciences, 37(1), 27-35. https://doi.org/10.7240/jeps.1620319
AMA Baran MK, Pehlivanlar E, Güleç C, Gönül A, Şeramet M. Adapting the AlphaZero Algorithm to Pawn Dama: Implementation, Training, and Performance Evaluation. JEPS. March 2025;37(1):27-35. doi:10.7240/jeps.1620319
Chicago Baran, Mehmet Kadir, Erdem Pehlivanlar, Cem Güleç, Alperen Gönül, and Muhammet Şeramet. “Adapting the AlphaZero Algorithm to Pawn Dama: Implementation, Training, and Performance Evaluation”. International Journal of Advances in Engineering and Pure Sciences 37, no. 1 (March 2025): 27-35. https://doi.org/10.7240/jeps.1620319.
EndNote Baran MK, Pehlivanlar E, Güleç C, Gönül A, Şeramet M (01 March 2025) Adapting the AlphaZero Algorithm to Pawn Dama: Implementation, Training, and Performance Evaluation. International Journal of Advances in Engineering and Pure Sciences 37 1 27–35.
IEEE M. K. Baran, E. Pehlivanlar, C. Güleç, A. Gönül, and M. Şeramet, “Adapting the AlphaZero Algorithm to Pawn Dama: Implementation, Training, and Performance Evaluation”, JEPS, vol. 37, no. 1, pp. 27–35, 2025, doi: 10.7240/jeps.1620319.
ISNAD Baran, Mehmet Kadir et al. “Adapting the AlphaZero Algorithm to Pawn Dama: Implementation, Training, and Performance Evaluation”. International Journal of Advances in Engineering and Pure Sciences 37/1 (March 2025), 27-35. https://doi.org/10.7240/jeps.1620319.
JAMA Baran MK, Pehlivanlar E, Güleç C, Gönül A, Şeramet M. Adapting the AlphaZero Algorithm to Pawn Dama: Implementation, Training, and Performance Evaluation. JEPS. 2025;37:27–35.
MLA Baran, Mehmet Kadir et al. “Adapting the AlphaZero Algorithm to Pawn Dama: Implementation, Training, and Performance Evaluation”. International Journal of Advances in Engineering and Pure Sciences, vol. 37, no. 1, 2025, pp. 27-35, doi:10.7240/jeps.1620319.
Vancouver Baran MK, Pehlivanlar E, Güleç C, Gönül A, Şeramet M. Adapting the AlphaZero Algorithm to Pawn Dama: Implementation, Training, and Performance Evaluation. JEPS. 2025;37(1):27-35.