Solving an Order Batching and Sequencing Problem with Reinforcement Learning
Year: 2024, Volume: 36, Issue: 3, pp. 235-246, 26.09.2024
Begüm Canaslan, Ayla Gülcü
Abstract
This study investigates whether deep reinforcement learning (DRL) is a suitable approach for the order batching and sequencing problem (OBSP) and compares it with traditional methods. Models trained with the Proximal Policy Optimization (PPO) algorithm were tested in a complex, realistic warehouse environment to measure whether they learn a strategy that reduces the number of late orders. A heuristic method was applied to the same environment and data, and the results were compared. The DRL approach that combines heuristics with the PPO algorithm outperformed the heuristics in minimizing the percentage of tardy orders in all tested scenarios.
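The comparison metric named in the abstract, the percentage of tardy orders, can be illustrated with a minimal sketch. The abstract does not define the metric, so the function name, the argument names, and the rule that an order is tardy when it completes strictly after its due time are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Sequence


def tardy_order_percentage(
    completion_times: Sequence[float],
    due_times: Sequence[float],
) -> float:
    """Return the share of orders (in percent) finished after their due time.

    Assumption: an order counts as tardy when its completion time is
    strictly greater than its due time. Names are hypothetical.
    """
    if len(completion_times) != len(due_times):
        raise ValueError("each order needs one completion time and one due time")
    if not completion_times:
        return 0.0  # no orders, so nothing can be late
    tardy = sum(
        1 for done, due in zip(completion_times, due_times) if done > due
    )
    return 100.0 * tardy / len(completion_times)


# Four orders, two of which finish after their due time.
print(tardy_order_percentage([10, 25, 40, 55], [15, 20, 45, 50]))  # 50.0
```

Under these assumptions, the same function could score both a heuristic and a trained policy on identical order data, which is the kind of like-for-like comparison the abstract describes.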
Thanks
This research was prepared within the scope of a postgraduate thesis study at Bahçeşehir University. I would like to express my gratitude to my supervisor, Assoc. Prof. Ayla Gülcü, for her valuable guidance and advice, which made this study possible.