<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.4 20241031//EN"
        "https://jats.nlm.nih.gov/publishing/1.4/JATS-journalpublishing1-4.dtd">
<article  article-type="research-article"        dtd-version="1.4">
            <front>

                <journal-meta>
                                                                <journal-id>estuscience - se</journal-id>
            <journal-title-group>
                                                                                    <journal-title>Eskişehir Technical University Journal of Science and Technology A - Applied Sciences and Engineering</journal-title>
            </journal-title-group>
                            <issn pub-type="ppub">2667-4211</issn>
                                                                                                        <publisher>
                    <publisher-name>Eskişehir Technical University</publisher-name>
                </publisher>
                    </journal-meta>
                <article-meta>
                                        <article-id pub-id-type="doi">10.18038/estubtda.1891746</article-id>
                                                                <article-categories>
                                            <subj-group  xml:lang="en">
                                                            <subject>Artificial Intelligence (Other)</subject>
                                                    </subj-group>
                                            <subj-group  xml:lang="tr">
                                                            <subject>Yapay Zeka (Diğer)</subject>
                                                    </subj-group>
                                    </article-categories>
                                                                                                                                                        <title-group>
                                                                                                                        <trans-title-group xml:lang="tr">
                                    <trans-title>MACHINE LEARNING AND VALIDATION STRATEGIES IN PANEL DATA-BASED GREENHOUSE GAS EMISSION MODELING</trans-title>
                                </trans-title-group>
                                                                                                                                                                                                <article-title>MACHINE LEARNING AND VALIDATION STRATEGIES IN PANEL DATA-BASED GREENHOUSE GAS EMISSION MODELING</article-title>
                                                                                                    </title-group>
            
                                                    <contrib-group content-type="authors">
                                                                        <contrib contrib-type="author">
                                                                    <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-4280-0394</contrib-id>
                                                                <name>
                                    <surname>Demircioğlu Diren</surname>
                                    <given-names>Deniz</given-names>
                                </name>
                                                                    <aff>Sakarya University</aff>
                                                            </contrib>
                                                                                </contrib-group>
                        
                                        <pub-date pub-type="pub" iso-8601-date="20260327">
                    <day>27</day>
                    <month>03</month>
                    <year>2026</year>
                </pub-date>
                                        <volume>27</volume>
                                        <issue>1</issue>
                                        <fpage>204</fpage>
                                        <lpage>219</lpage>
                        
                        <history>
                                    <date date-type="received" iso-8601-date="20260217">
                        <day>17</day>
                        <month>02</month>
                        <year>2026</year>
                    </date>
                                                    <date date-type="accepted" iso-8601-date="20260319">
                        <day>19</day>
                        <month>03</month>
                        <year>2026</year>
                    </date>
                            </history>
                                        <permissions>
                    <copyright-statement>Copyright © 2000, Eskişehir Technical University Journal of Science and Technology A - Applied Sciences and Engineering</copyright-statement>
                    <copyright-year>2000</copyright-year>
                    <copyright-holder>Eskişehir Technical University Journal of Science and Technology A - Applied Sciences and Engineering</copyright-holder>
                </permissions>
            
                                                                                                <trans-abstract xml:lang="tr">
                            <p>In this study, sector-based methane emissions of European countries were modeled using a Random Forest–based machine learning approach applied to a panel dataset covering the period 2014–2023 with country–sector–year dimensions. The primary objective of the study is not to maximize predictive accuracy, but to evaluate how different validation strategies affect model performance and generalization behavior. Accordingly, three validation strategies—random training–test split, temporal (time-based) validation, and country-based group validation—were comparatively analyzed. The dataset, obtained from Eurostat, comprises 29 countries, 5 sectors, and 1,449 observations. Model performance was evaluated using root mean square error and the coefficient of determination. Under random splitting, the model achieved very low errors (mean RMSE = 0.0126 ± 0.0025; mean R² = 0.9993 ± 0.0003), although these results may be optimistic due to information leakage. Temporal validation yielded stable near-future performance (RMSE = 0.0225, R² = 0.9975). In contrast, country-based group validation resulted in a substantial performance decline (average RMSE = 0.3132 ± 0.4061), indicating strong cross-country heterogeneity. Overall, the findings demonstrate that, in panel data settings, the choice of validation strategy is as critical as the machine learning algorithm for realistic generalization assessment.</p></trans-abstract>
                                                                                                                                    <abstract><p>In this study, sector-based methane emissions of European countries were modeled using a Random Forest–based machine learning approach applied to a panel dataset covering the period 2014–2023 with country–sector–year dimensions. The primary objective of the study is not to maximize predictive accuracy, but to evaluate how different validation strategies affect model performance and generalization behavior. Accordingly, three validation strategies—random training–test split, temporal (time-based) validation, and country-based group validation—were comparatively analyzed. The dataset, obtained from Eurostat, comprises 29 countries, 5 sectors, and 1,449 observations. Model performance was evaluated using root mean square error and the coefficient of determination. Under random splitting, the model achieved very low errors (mean RMSE = 0.0126 ± 0.0025; mean R² = 0.9993 ± 0.0003), although these results may be optimistic due to information leakage. Temporal validation yielded stable near-future performance (RMSE = 0.0225, R² = 0.9975). In contrast, country-based group validation resulted in a substantial performance decline (average RMSE = 0.3132 ± 0.4061), indicating strong cross-country heterogeneity. Overall, the findings demonstrate that, in panel data settings, the choice of validation strategy is as critical as the machine learning algorithm for realistic generalization assessment.</p></abstract>
                                                            
            
                                                                                        <kwd-group>
                                                    <kwd>Panel data</kwd>
                                                    <kwd>Machine learning</kwd>
                                                    <kwd>Validation strategies</kwd>
                                                    <kwd>Greenhouse gas emissions</kwd>
                                            </kwd-group>
                            
                                                <kwd-group xml:lang="tr">
                                                    <kwd>Panel data</kwd>
                                                    <kwd>Machine learning</kwd>
                                                    <kwd>Validation strategies</kwd>
                                                    <kwd>Greenhouse gas emissions</kwd>
                                            </kwd-group>
                                                                                                                                    <funding-group specific-use="FundRef">
                    <funding-statement>No external funding was received for this study.</funding-statement>
                </funding-group>
                                </article-meta>
    </front>
    <back>
                            <ref-list>
                                    <ref id="ref1">
                        <label>1</label>
                        <mixed-citation publication-type="journal">World Meteorological Organization. WMO Greenhouse Gas Bulletin No. 19: The State of Greenhouse Gases in the Atmosphere. World Meteorological Organization; 2023. Accessed December 14, 2025. https://bpb-us-w2.wpmucdn.com/blog.nus.edu.sg/dist/0/15540/files/2019/11/ghg_bulletin_en.pdf</mixed-citation>
                    </ref>
                                    <ref id="ref2">
                        <label>2</label>
                        <mixed-citation publication-type="journal">World Meteorological Organization. State of the Global Climate 2021. World Meteorological Organization; 2022. Accessed February 14, 2025. https://wmo.int/resources/publication-series/state-of-global-climate/state-of-global-climate-2021</mixed-citation>
                    </ref>
                                    <ref id="ref3">
                        <label>3</label>
                        <mixed-citation publication-type="journal">Gan N, Zhao S. Global greenhouse gas reduction forecasting via machine learning model in the scenario of energy transition. J Environ Manage 2024; 371: 123309.</mixed-citation>
                    </ref>
                                    <ref id="ref4">
                        <label>4</label>
                        <mixed-citation publication-type="journal">Eurostat. Greenhouse gas emissions by source sector. Eurostat; 2024. Accessed October 9, 2025. https://ec.europa.eu/eurostat</mixed-citation>
                    </ref>
                                    <ref id="ref5">
                        <label>5</label>
                        <mixed-citation publication-type="journal">UNFCCC. Greenhouse Gas Inventory Data – Time Series. UNFCCC; 2025. Accessed January 5, 2025. https://di.unfccc.int/time_series</mixed-citation>
                    </ref>
                                    <ref id="ref6">
                        <label>6</label>
                        <mixed-citation publication-type="journal">Crippa M, Solazzo E, Huang G, Guizzardi D, Koffi E, Muntean M, Schieberle C, Friedrich R, Janssens-Maenhout G. High resolution temporal profiles in the Emissions Database for Global Atmospheric Research. Sci Data 2020; 7(1): 121.</mixed-citation>
                    </ref>
                                    <ref id="ref7">
                        <label>7</label>
                        <mixed-citation publication-type="journal">Wood R, Neuhoff K, Moran D, Simas M, Grubb M, Stadler K. The structure, drivers and policy implications of the European carbon footprint. Clim Policy 2020; 20(1): S39-S57.</mixed-citation>
                    </ref>
                                    <ref id="ref8">
                        <label>8</label>
                        <mixed-citation publication-type="journal">Marotta A, Porras-Amores C, Rodríguez Sánchez A, Villoria Sáez P, Maser G. Greenhouse gas emissions forecasts in countries of the European Union by means of a multifactor algorithm. Appl Sci 2023; 13(14): 8520.</mixed-citation>
                    </ref>
                                    <ref id="ref9">
                        <label>9</label>
                        <mixed-citation publication-type="journal">Ene Yalçın S. Development of a Forecasting Framework Based on Advanced Machine Learning Algorithms for Greenhouse Gas Emissions. Systems 2024; 12(12): 528.</mixed-citation>
                    </ref>
                                    <ref id="ref10">
                        <label>10</label>
                        <mixed-citation publication-type="journal">Berrington A, Halpin B, Wiggins R. An overview of methods for the analysis of panel data. NCRM Methods Review Paper NCRM/007. National Centre for Research Methods; 2006. Accessed March 14, 2026. https://eprints.ncrm.ac.uk/id/eprint/415/1/MethodsReviewPaperNCRM-007.pdf</mixed-citation>
                    </ref>
                                    <ref id="ref11">
                        <label>11</label>
                        <mixed-citation publication-type="journal">Wooldridge JM. Econometric analysis of cross section and panel data. MIT Press; 2010.</mixed-citation>
                    </ref>
                                    <ref id="ref12">
                        <label>12</label>
                        <mixed-citation publication-type="journal">Athey S, Imbens GW. Machine learning methods that economists should know about. Annu Rev Econ 2019; 11(1): 685-725.</mixed-citation>
                    </ref>
                                    <ref id="ref13">
                        <label>13</label>
                        <mixed-citation publication-type="journal">Bakay MS, Ağbulut Ü. Electricity production based forecasting of greenhouse gas emissions in Turkey with deep learning, support vector machine and artificial neural network algorithms. J Clean Prod 2021; 285: 125324.</mixed-citation>
                    </ref>
                                    <ref id="ref14">
                        <label>14</label>
                        <mixed-citation publication-type="journal">Aksu İÖ, Demirdelen T. The new prediction methodology for CO2 emission to ensure energy sustainability with the hybrid artificial neural network approach. Sustainability 2022; 14(23): 15595.</mixed-citation>
                    </ref>
                                    <ref id="ref15">
                        <label>15</label>
                        <mixed-citation publication-type="journal">Uzel H, Alpsalaz F, Aslan E, Özüpak Y. A Comprehensive Benchmark of Linear and Ensemble Machine Learning Models for Global CO₂ Emission Forecasting. Middle East J Sci 2025; 11(2): 247-262.</mixed-citation>
                    </ref>
                                    <ref id="ref16">
                        <label>16</label>
                        <mixed-citation publication-type="journal">Cerqua A, Letta M, Pinto G. On the (Mis)Use of Machine Learning With Panel Data. Oxf Bull Econ Stat 2025; 0: 1-13. https://doi.org/10.1111/obes.70019</mixed-citation>
                    </ref>
                                    <ref id="ref17">
                        <label>17</label>
                        <mixed-citation publication-type="journal">Tian Y, Ren X, Li K, Li X. Carbon Dioxide Emission Forecast: A Review of Existing Models and Future Challenges. Sustainability 2025; 17(4): 1471.</mixed-citation>
                    </ref>
                                    <ref id="ref18">
                        <label>18</label>
                        <mixed-citation publication-type="journal">Breiman L. Random forests. Machine Learning 2001; 45(1): 5–32. https://doi.org/10.1023/A:1010933404324</mixed-citation>
                    </ref>
                                    <ref id="ref19">
                        <label>19</label>
                        <mixed-citation publication-type="journal">Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI) 1995; Montreal, Canada. 1137-1145.</mixed-citation>
                    </ref>
                                    <ref id="ref20">
                        <label>20</label>
                        <mixed-citation publication-type="journal">Roberts DR, Bahn V, Ciuti S, Boyce MS, Elith J, Guillera-Arroita G, Hauenstein S, Lahoz-Monfort JJ, Schröder B, Thuiller W, Warton DI, Wintle BA, Hartig F, Dormann CF. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography 2017; 40(8): 913–929. https://doi.org/10.1111/ecog.02881</mixed-citation>
                    </ref>
                                    <ref id="ref21">
                        <label>21</label>
                        <mixed-citation publication-type="journal">Eurostat. Greenhouse gas emissions by source sector (dataset env_air_gge). European Commission; Published 2025. Accessed February 14, 2025. https://ec.europa.eu/eurostat/databrowser/view/env_air_gge/default/table</mixed-citation>
                    </ref>
                                    <ref id="ref22">
                        <label>22</label>
                        <mixed-citation publication-type="journal">Hancock JT, Khoshgoftaar TM. Survey on categorical data for neural networks. J Big Data 2020; 7: 28. https://doi.org/10.1186/s40537-020-00305-w</mixed-citation>
                    </ref>
                                    <ref id="ref23">
                        <label>23</label>
                        <mixed-citation publication-type="journal">Bergmeir C, Benítez JM. On the use of cross-validation for time series predictor evaluation. Inf Sci 2012; 191: 192–213. https://doi.org/10.1016/j.ins.2011.12.028</mixed-citation>
                    </ref>
                                    <ref id="ref24">
                        <label>24</label>
                        <mixed-citation publication-type="journal">Joseph VR. Optimal ratio for data splitting. Stat Anal Data Min 2022; 15(4): 531-538.</mixed-citation>
                    </ref>
                                    <ref id="ref25">
                        <label>25</label>
                        <mixed-citation publication-type="journal">Tashman LJ. Out-of-sample tests of forecasting accuracy: An analysis and review. Int J Forecast 2000; 16(4): 437–450. https://doi.org/10.1016/S0169-2070(00)00065-0</mixed-citation>
                    </ref>
                                    <ref id="ref26">
                        <label>26</label>
                        <mixed-citation publication-type="journal">Valavi R, Elith J, Lahoz-Monfort JJ, Guillera-Arroita G. blockCV: An R package for generating spatially or environmentally separated folds for k-fold cross-validation. Methods Ecol Evol 2019; 10(2): 225–232. https://doi.org/10.1111/2041-210X.13107</mixed-citation>
                    </ref>
                                    <ref id="ref27">
                        <label>27</label>
                        <mixed-citation publication-type="journal">Chai T, Draxler RR. Root mean square error (RMSE) or mean absolute error (MAE)? Arguments against avoiding RMSE. Geosci Model Dev 2014; 7(3): 1247–1250. https://doi.org/10.5194/gmd-7-1247-2014</mixed-citation>
                    </ref>
                                    <ref id="ref28">
                        <label>28</label>
                        <mixed-citation publication-type="journal">Chicco D, Warrens MJ, Jurman G. The coefficient of determination R² is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Comput Sci 2021; 7: e623. https://doi.org/10.7717/peerj-cs.623</mixed-citation>
                    </ref>
                            </ref-list>
                    </back>
    </article>
