Prediction of Wind Speed Using Tree-Based Ensemble Algorithms: CatBoost, HistGBM, and XGBoost

İlker Mert

Research Article

Year 2025, Volume: 9 Issue: 1, 145 - 150, 31.07.2025

Abstract

This study compares three tree-based ensemble machine learning models, XGBoost, HistGradientBoosting (HistGBM), and CatBoost, for predicting wind speed (V, in m/s) in an urban area. The models are trained on a dataset covering four years, and their performance is evaluated primarily on the test data. Model performance is assessed using the root mean square error (RMSE), mean absolute percentage error (MAPE), coefficient of determination (R^2), and p-value. XGBoost performs best of the three, with RMSE, MAPE, and R^2 values of 0.0416, 0.0089, and 0.9993, respectively. CatBoost ranks second, with an RMSE of 0.0843 and an R^2 of 0.9972. HistGBM ranks third, with an RMSE of 0.1174 and an R^2 of 0.9946. The p-values indicate that the estimates of all three models are statistically significant. The results indicate that ensemble algorithms perform very well on time-series problems such as wind speed estimation. Of the models compared, XGBoost is the most efficient and reliable for wind speed estimation.
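For readers unfamiliar with the three error metrics named in the abstract, the following is a minimal sketch of their textbook definitions in plain Python. It is not the author's code. The function names and the sample values are illustrative assumptions, and the sketch returns MAPE as a fraction; whether the article reports MAPE as a fraction or as a percentage is not stated here.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mape(y_true, y_pred):
    """Mean absolute percentage error, returned as a fraction (0.01 == 1 %).

    Undefined when an observed value is zero (e.g. a calm hour), so such
    samples would need separate handling in practice.
    """
    n = len(y_true)
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical wind speeds in m/s, chosen only to illustrate the arithmetic.
observed = [2.0, 4.0, 5.0, 8.0]
predicted = [2.5, 3.5, 5.0, 8.0]

scores = {
    "RMSE": rmse(observed, predicted),
    "MAPE": mape(observed, predicted),
    "R2": r2(observed, predicted),
}
print(scores)
```

Lower RMSE and MAPE and an R^2 closer to 1 indicate a better fit, which is the basis for the ranking of the three models given in the abstract.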

Keywords

Wind Speed Prediction, Tree-Based Ensemble Learning, XGBoost, CatBoost, Time Series Forecasting

References

[1] Antor, A. F., & Wollega, E. D. (2020, August). Comparison of machine learning algorithms for wind speed prediction. In Proceedings of the International Conference on Industrial Engineering and Operations Management (pp. 857-866).
[2] Shen, Z., Fan, X., Zhang, L., & Yu, H. (2022). Wind speed prediction of unmanned sailboat based on CNN and LSTM hybrid neural network. Ocean Engineering, 254, 111352, https://doi.org/10.1016/j.oceaneng.2022.111352.
[3] Chen, Q., & Folly, K. A. (2018, July). Comparison of three methods for short-term wind power forecasting. In 2018 International Joint Conference on Neural Networks (IJCNN) (pp. 1-8). IEEE.
[4] Mugware, F. W., Sigauke, C., & Ravele, T. (2024). Evaluating Wind Speed Forecasting Models: A Comparative Study of CNN, DAN2, Random Forest and XGBOOST in Diverse South African Weather Conditions. Forecasting, 6(3), 672-699. https://doi.org/10.3390/forecast6030035.
[5] National Aeronautics and Space Administration (NASA) Langley Research Center (LaRC), POWER Data Access Viewer, Single Point Data Access, 2020 online resource, accessed August and September 2020, https://power.larc.nasa.gov/data-access-viewer.
[6] Hancock, J. T., & Khoshgoftaar, T. M. (2020). CatBoost for big data: an interdisciplinary review. Journal of big data, 7(1), 94, https://doi.org/10.21203/rs.3.rs-54646/v1.
[7] Nhat-Duc, H., & Van-Duc, T. (2023). Comparison of histogram-based gradient boosting classification machine, random Forest, and deep convolutional neural network for pavement raveling severity classification. Automation in construction, 148, 104767, https://doi.org/10.1016/j.autcon.2023.104767.
[8] Zhang, L., & Jánošík, D. (2024). Enhanced short-term load forecasting with hybrid machine learning models: CatBoost and XGBoost approaches. Expert Systems with Applications, 241, 122686, https://doi.org/10.1016/j.eswa.2023.122686.
[9] Üstün, İ., Üneş, F., Mert, İ., & Karakuş, C. (2020). A comparative study of estimating solar radiation using machine learning approaches: DL, SMGRT, and ANFIS. Energy Sources, Part A: Recovery, Utilization, and Environmental Effects, 44(4), 10322–10345. https://doi.org/10.1080/15567036.2020.1781301.
[10] Zaman, T., & Alakuş, K. (2019). Bootstrap Tahminini Kullanarak Pearson Korelasyon Katsayısının Önemliliğinin Araştırılması. Süleyman Demirel University Faculty of Arts and Science Journal of Science, 14(1), 77-88, https://doi.org/10.29233/sdufeffd.460768.
[11] Van Laarhoven, T. (2017). L2 regularization versus batch and weight normalization. arXiv preprint arXiv:1706.05350, https://doi.org/10.48550/arXiv.1706.05350.
[12] Xiong, X., Guo, X., Zeng, P., Zou, R., & Wang, X. (2022). A short-term wind power forecast method via XGBoost hyper-parameters optimization. Frontiers in energy research, 10, 905155, https://doi.org/10.3389/fenrg.2022.905155.
[13] Mollick, T., Hashmi, G., & Sabuj, S. R. (2024). Wind speed prediction for site selection and reliable operation of wind power plants in coastal regions using machine learning algorithm variants. Sustainable Energy Research, 11, 5, https://doi.org/10.1186/s40807-024-00098-z.

There are 13 citations in total.

Details

Primary Language	English
Subjects	Deep Learning
Journal Section	Articles
Authors	İlker Mert 0000-0001-6864-2948
Early Pub Date	July 25, 2025
Publication Date	July 31, 2025
Submission Date	June 26, 2025
Acceptance Date	July 25, 2025
Published in Issue	Year 2025 Volume: 9 Issue: 1

Cite

IEEE	İ. Mert, “Prediction of Wind Speed Using Tree-Based Ensemble Algorithms: CatBoost, HistGBM, and XGBoost”, IJMSIT, vol. 9, no. 1, pp. 145–150, 2025.
