Solving an Order Batching and Sequencing Problem with Reinforcement Learning

Begüm Canaslan; Ayla Gülcü

doi:10.7240/jeps.1475312

Research Article

Solving an Order Batching and Sequencing Problem with Reinforcement Learning

Year 2024, Volume: 36 Issue: 3, 235 - 246, 26.09.2024

Abstract

The aim of this research is to determine whether a DRL solution is suitable for the OBSP and to compare it with traditional methods. To this end, models trained with the PPO algorithm were tested in a complex and realistic warehouse environment, and the study attempted to measure whether they develop a strategy that reduces the number of late orders. A heuristic method was also applied to the same environment and data, and the results were compared. The results showed that the DRL approach, which combines the heuristic method with the PPO algorithm, performs better than heuristic methods at minimizing the percentage of late orders in all tested scenarios.

Keywords

Reinforcement Learning, Order Batching and Sequencing Problem, Proximal Policy Optimization, Warehouse Optimization

Abstract

This study examines whether a deep reinforcement learning (DRL) approach is suitable for the Order Batching and Sequencing Problem (OBSP) and compares it with traditional methods. To this end, models trained with the Proximal Policy Optimization (PPO) algorithm were tested in a complex, realistic warehouse environment to measure whether they learn a strategy that reduces the number of late orders. A heuristic method was applied to the same environment and data, and the results were compared. The results show that the DRL approach that combines heuristics with the PPO algorithm outperforms the heuristic methods in minimizing the percentage of tardy orders in all tested scenarios.

Keywords

Reinforcement Learning, Order Batching and Sequencing, Proximal Policy Optimization, Warehouse Optimization

Thanks

This research was carried out as part of a postgraduate thesis at Bahçeşehir University. I would like to express my gratitude to my supervisor, Assoc. Prof. Ayla Gülcü, for her valuable guidance and advice, which made this study possible.

There are 25 citations in total.

Details

Primary Language	English
Subjects	Software Engineering (Other)
Journal Section	Research Article
Authors	Begüm Canaslan (ORCID 0009-0004-3662-7291); Ayla Gülcü (ORCID 0000-0003-3258-8681)
Submission Date	April 29, 2024
Acceptance Date	June 27, 2024
Early Pub Date	September 19, 2024
Publication Date	September 26, 2024
Published in Issue	Year 2024 Volume: 36 Issue: 3

Cite

APA	Canaslan, B., & Gülcü, A. (2024). Solving an Order Batching and Sequencing Problem with Reinforcement Learning. International Journal of Advances in Engineering and Pure Sciences, 36(3), 235-246. https://doi.org/10.7240/jeps.1475312
AMA	1. Canaslan B, Gülcü A. Solving an Order Batching and Sequencing Problem with Reinforcement Learning. JEPS. 2024;36(3):235-246. doi:10.7240/jeps.1475312
Chicago	Canaslan, Begüm, and Ayla Gülcü. 2024. “Solving an Order Batching and Sequencing Problem With Reinforcement Learning”. International Journal of Advances in Engineering and Pure Sciences 36 (3): 235-46. https://doi.org/10.7240/jeps.1475312.
EndNote	Canaslan B, Gülcü A (September 1, 2024) Solving an Order Batching and Sequencing Problem with Reinforcement Learning. International Journal of Advances in Engineering and Pure Sciences 36 3 235–246.
IEEE	[1] B. Canaslan and A. Gülcü, “Solving an Order Batching and Sequencing Problem with Reinforcement Learning”, JEPS, vol. 36, no. 3, pp. 235–246, Sept. 2024, doi: 10.7240/jeps.1475312.
ISNAD	Canaslan, Begüm - Gülcü, Ayla. “Solving an Order Batching and Sequencing Problem With Reinforcement Learning”. International Journal of Advances in Engineering and Pure Sciences 36/3 (September 1, 2024): 235-246. https://doi.org/10.7240/jeps.1475312.
JAMA	1. Canaslan B, Gülcü A. Solving an Order Batching and Sequencing Problem with Reinforcement Learning. JEPS. 2024;36:235–246.
MLA	Canaslan, Begüm, and Ayla Gülcü. “Solving an Order Batching and Sequencing Problem With Reinforcement Learning”. International Journal of Advances in Engineering and Pure Sciences, vol. 36, no. 3, Sept. 2024, pp. 235-46, doi:10.7240/jeps.1475312.
Vancouver	1. Canaslan B, Gülcü A. Solving an Order Batching and Sequencing Problem with Reinforcement Learning. JEPS [Internet]. 2024 Sept. 1;36(3):235-46. Available from: https://izlik.org/JA82RU95GM
