In this study, sector-based methane emissions of European countries were modeled using a Random Forest–based machine learning approach applied to a panel dataset covering the period 2014–2023 with country–sector–year dimensions. The primary objective of the study is not to maximize predictive accuracy, but to evaluate how different validation strategies affect model performance and generalization behavior. Accordingly, three validation strategies—random training–test split, temporal (time-based) validation, and country-based group validation—were comparatively analyzed. The dataset, obtained from Eurostat, comprises 29 countries, 5 sectors, and 1,449 observations. Model performance was evaluated using root mean square error and the coefficient of determination. Under random splitting, the model achieved very low errors (mean RMSE = 0.0126 ± 0.0025; mean R² = 0.9993 ± 0.0003), although these results may be optimistic due to information leakage. Temporal validation yielded stable near-future performance (RMSE = 0.0225, R² = 0.9975). In contrast, country-based group validation resulted in a substantial performance decline (average RMSE = 0.3132 ± 0.4061), indicating strong cross-country heterogeneity. Overall, the findings demonstrate that, in panel data settings, the choice of validation strategy is as critical as the machine learning algorithm for realistic generalization assessment.
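The difference between the three validation strategies can be illustrated with a minimal sketch that only partitions a panel index and fits no model. It is not the study's code. The country and sector labels, the 20% hold-out fraction, the 2022 cutoff year, and all function names are assumptions made for illustration, and the synthetic index is a complete 29 × 5 × 10 grid (1,450 rows) rather than the 1,449 observations reported above.

```python
import random
from itertools import product

# Hypothetical panel index: 29 countries x 5 sectors x 10 years (2014-2023).
# Labels are placeholders, not the Eurostat country or sector codes.
countries = [f"C{i:02d}" for i in range(29)]
sectors = [f"S{i}" for i in range(5)]
years = list(range(2014, 2024))
rows = list(product(countries, sectors, years))  # (country, sector, year)


def random_split(rows, test_frac=0.2, seed=0):
    """Shuffle rows and hold out a fraction, ignoring panel structure."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def temporal_split(rows, first_test_year=2022):
    """Train on earlier years, test on the later years (assumed cutoff)."""
    train = [r for r in rows if r[2] < first_test_year]
    test = [r for r in rows if r[2] >= first_test_year]
    return train, test


def country_group_split(rows, test_frac=0.2, seed=0):
    """Hold out whole countries so that none appears on both sides."""
    rng = random.Random(seed)
    names = sorted({r[0] for r in rows})
    rng.shuffle(names)
    n_test = max(1, int(len(names) * test_frac))
    held_out = set(names[:n_test])
    train = [r for r in rows if r[0] not in held_out]
    test = [r for r in rows if r[0] in held_out]
    return train, test


def shared_countries(train, test):
    """Countries present in both the training and the test rows."""
    return {r[0] for r in train} & {r[0] for r in test}


splits = {
    "random": random_split(rows),
    "temporal": temporal_split(rows),
    "country_group": country_group_split(rows),
}
for name, (train, test) in splits.items():
    # Prints: strategy, train size, test size, number of overlapping countries.
    print(name, len(train), len(test), len(shared_countries(train, test)))
```

Under the random and temporal splits, test countries also occur in the training rows, so a model can reuse country-specific patterns it has already seen. Only the country-based split removes that overlap, which is consistent with the much larger error reported for group validation.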
No human-subjects data were collected; therefore, IRB/ethics committee approval was not required.
No external funding was received for this study.
| Primary Language | English |
|---|---|
| Subjects | Artificial Intelligence (Other) |
| Journal Section | Research Article |
| Authors | |
| Submission Date | February 17, 2026 |
| Acceptance Date | March 19, 2026 |
| Publication Date | March 27, 2026 |
| DOI | https://doi.org/10.18038/estubtda.1891746 |
| IZ | https://izlik.org/JA87BX42RJ |
| Published in Issue | Year 2026 Volume: 27 Issue: 1 |