Research Article

Solving an Order Batching and Sequencing Problem with Reinforcement Learning

Year: 2024, Volume: 36, Issue: 3, Pages: 235-246, Published: 26.09.2024
https://doi.org/10.7240/jeps.1475312

Abstract

This research investigates whether a deep reinforcement learning (DRL) approach is suitable for the order batching and sequencing problem (OBSP) and compares it with traditional methods. To this end, models trained with the Proximal Policy Optimization (PPO) algorithm were tested in a complex and realistic warehouse environment to assess whether they learn a strategy that reduces the number of late orders. A heuristic method was also applied in the same environment and on the same data, and the results were compared. The results show that the DRL approach, which combines heuristics with the PPO algorithm, outperforms the heuristic method in minimizing the percentage of tardy orders in all tested scenarios.
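The comparison metric named in the abstract, the tardy order percentage, can be illustrated with a minimal sketch. The abstract does not give the exact definition used in the study, so the version below assumes an order counts as tardy when its completion time is strictly later than its due time. The function name, the sample due times, and the two sets of completion times are hypothetical and are not taken from the article.

```python
from typing import Sequence


def tardy_order_percentage(completion_times: Sequence[float],
                           due_times: Sequence[float]) -> float:
    """Return the percentage of orders completed strictly after their due time.

    Assumption (not from the article): an order that finishes exactly at its
    due time is counted as on time.
    """
    if len(completion_times) != len(due_times):
        raise ValueError("completion_times and due_times must have equal length")
    if not completion_times:
        return 0.0
    tardy = sum(1 for done, due in zip(completion_times, due_times) if done > due)
    return 100.0 * tardy / len(completion_times)


# Hypothetical data: four orders with due times in minutes, and the completion
# times produced by two made-up batching policies.
due = [30.0, 45.0, 60.0, 90.0]
policy_a = [25.0, 50.0, 70.0, 85.0]  # orders 2 and 3 are late
policy_b = [28.0, 44.0, 65.0, 80.0]  # only order 3 is late

print(tardy_order_percentage(policy_a, due))  # 50.0
print(tardy_order_percentage(policy_b, due))  # 25.0
```

Under this definition, a lower value indicates a better policy, which is the sense in which the abstract reports that the DRL approach outperforms the heuristic.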

Thanks

This research was prepared within the scope of a Bahçeşehir University postgraduate thesis study. I would like to express my gratitude to my supervisor, Assoc. Prof. Ayla Gülcü, for her valuable guidance and advice, which made this study possible.


Turkish title: Pekiştirmeli Öğrenme ile Sipariş Yığınlama ve Sıralama Probleminin Çözülmesi (Solving an Order Batching and Sequencing Problem with Reinforcement Learning)


Details

Primary Language: English
Subjects: Software Engineering (Other)
Journal Section: Research Articles
Authors:

Begüm Canaslan (ORCID: 0009-0004-3662-7291)

Ayla Gülcü (ORCID: 0000-0003-3258-8681)

Early Pub Date: September 19, 2024
Publication Date: September 26, 2024
Submission Date: April 29, 2024
Acceptance Date: June 27, 2024
Published in Issue: Year 2024, Volume: 36, Issue: 3

Cite

APA: Canaslan, B., & Gülcü, A. (2024). Solving an Order Batching and Sequencing Problem with Reinforcement Learning. International Journal of Advances in Engineering and Pure Sciences, 36(3), 235-246. https://doi.org/10.7240/jeps.1475312
AMA: Canaslan B, Gülcü A. Solving an Order Batching and Sequencing Problem with Reinforcement Learning. JEPS. September 2024;36(3):235-246. doi:10.7240/jeps.1475312
Chicago: Canaslan, Begüm, and Ayla Gülcü. “Solving an Order Batching and Sequencing Problem With Reinforcement Learning”. International Journal of Advances in Engineering and Pure Sciences 36, no. 3 (September 2024): 235-46. https://doi.org/10.7240/jeps.1475312.
EndNote: Canaslan B, Gülcü A (September 1, 2024) Solving an Order Batching and Sequencing Problem with Reinforcement Learning. International Journal of Advances in Engineering and Pure Sciences 36 3 235–246.
IEEE: B. Canaslan and A. Gülcü, “Solving an Order Batching and Sequencing Problem with Reinforcement Learning”, JEPS, vol. 36, no. 3, pp. 235–246, 2024, doi: 10.7240/jeps.1475312.
ISNAD: Canaslan, Begüm - Gülcü, Ayla. “Solving an Order Batching and Sequencing Problem With Reinforcement Learning”. International Journal of Advances in Engineering and Pure Sciences 36/3 (September 2024), 235-246. https://doi.org/10.7240/jeps.1475312.
JAMA: Canaslan B, Gülcü A. Solving an Order Batching and Sequencing Problem with Reinforcement Learning. JEPS. 2024;36:235–246.
MLA: Canaslan, Begüm and Ayla Gülcü. “Solving an Order Batching and Sequencing Problem With Reinforcement Learning”. International Journal of Advances in Engineering and Pure Sciences, vol. 36, no. 3, 2024, pp. 235-46, doi:10.7240/jeps.1475312.
Vancouver: Canaslan B, Gülcü A. Solving an Order Batching and Sequencing Problem with Reinforcement Learning. JEPS. 2024;36(3):235-46.