This study evaluates Linear Regression, Random Forest, XGBoost and CatBoost for forecasting global CO₂ emissions from 2001 to 2021 using the Global Carbon Project dataset (accessed via Our World in Data). A leakage-free pipeline standardizes preprocessing, prevents temporal spillover and applies a consistent train–test protocol. Performance is summarized with MSE, RMSE, MAE, MAPE and R² to enable fair, reproducible comparisons. Linear Regression delivers the strongest out-of-sample accuracy (R² = 0.94, RMSE = 3.81, MAPE = 12.9%), reflecting predominantly linear and autoregressive dynamics. The boosting models (XGBoost, CatBoost) follow closely (R² > 0.914) and capture nonlinear fluctuations, whereas Random Forest is comparatively weaker (R² = 0.879). Feature importance analysis identifies short-term lags (lag₁–lag₂) as the dominant predictors, a finding corroborated by autocorrelation, partial autocorrelation and Augmented Dickey–Fuller tests. Overall, the study provides a transparent global baseline and a standardized evaluation protocol that can be extended to country-level analyses and policy experiments. By clarifying when simple statistical models suffice and when ensemble approaches add value, the results offer evidence-based guidance for researchers and policymakers seeking interpretable, scalable tools for emissions monitoring, planning and policy-relevant scenario design.
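The abstract names the ingredients of the evaluation protocol (lagged predictors, a split that respects time order, and five error metrics) without showing them. The following is a minimal sketch of how such a protocol might look, assuming NumPy and an ordinary least-squares fit; the function names, the two-lag default and the 80/20 split are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def make_lagged(series, n_lags=2):
    """Build (X, y): column k-1 of X holds the value k steps before each target."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    X = np.column_stack(
        [series[n_lags - k : n - k] for k in range(1, n_lags + 1)]
    )
    y = series[n_lags:]
    return X, y

def chronological_split(X, y, test_fraction=0.2):
    """Split without shuffling, so every test row is later than every train row."""
    cut = int(round(len(y) * (1 - test_fraction)))
    return X[:cut], X[cut:], y[:cut], y[cut:]

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [intercept, coefficients...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]

def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE, MAPE (in percent) and R². MAPE assumes nonzero targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "MSE": mse,
        "RMSE": mse ** 0.5,
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(np.mean(np.abs(err / y_true)) * 100),
        "R2": 1 - ss_res / ss_tot,
    }

if __name__ == "__main__":
    # Synthetic 21-point series standing in for annual values; NOT the real dataset.
    rng = np.random.default_rng(0)
    toy = 25.0 + 0.5 * np.arange(21) + rng.normal(scale=0.3, size=21)
    X, y = make_lagged(toy, n_lags=2)
    X_tr, X_te, y_tr, y_te = chronological_split(X, y)
    coef = fit_linear(X_tr, y_tr)
    print(regression_metrics(y_te, predict_linear(coef, X_te)))
```

Building the lag columns only from past values and cutting the data at a single point in time, rather than shuffling, is one common way to avoid the temporal leakage the abstract refers to. Any preprocessing step fitted on data (for example scaling) would likewise be fitted on the training portion only.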
This study causes no harm to the environment and involves no animal or human subjects. An ethics committee report was therefore not required.
| Primary Language | English |
|---|---|
| Subjects | Air Pollution Modelling and Control |
| Journal Section | Research Article |
| Authors | |
| Submission Date | May 8, 2025 |
| Acceptance Date | September 26, 2025 |
| Publication Date | December 30, 2025 |
| Published in Issue | Year 2025 Volume: 11 Issue: 2 |

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.