<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.4 20241031//EN"
        "https://jats.nlm.nih.gov/publishing/1.4/JATS-journalpublishing1-4.dtd">
<article  article-type="research-article"        dtd-version="1.4">
            <front>

                <journal-meta>
                                    <journal-id></journal-id>
            <journal-title-group>
                                                                                    <journal-title>Balkan Journal of Electrical and Computer Engineering</journal-title>
            </journal-title-group>
                            <issn pub-type="ppub">2147-284X</issn>
                                        <issn pub-type="epub">2147-284X</issn>
                                                                                            <publisher>
                    <publisher-name>MUSA YILMAZ</publisher-name>
                </publisher>
                    </journal-meta>
                <article-meta>
                                        <article-id pub-id-type="doi">10.17694/bajece.1577929</article-id>
                                                                <article-categories>
                                            <subj-group  xml:lang="en">
                                                            <subject>Computer Software</subject>
                                                    </subj-group>
                                            <subj-group  xml:lang="tr">
                                                            <subject>Bilgisayar Yazılımı</subject>
                                                    </subj-group>
                                    </article-categories>
                                                                                                                                                        <title-group>
                                                                                                                                                            <article-title>Leveraging SHAP for Interpretable Diabetes Prediction: A Study of Machine Learning Models on the Pima Indians Diabetes Dataset</article-title>
                                                                                                    </title-group>
            
                                                    <contrib-group content-type="authors">
                                                                        <contrib contrib-type="author">
                                                                    <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-1206-8294</contrib-id>
                                                                <name>
                                    <surname>Kırbaş</surname>
                                    <given-names>İsmail</given-names>
                                </name>
                                                                    <aff>Burdur Mehmet Akif Ersoy University</aff>
                                                            </contrib>
                                                    <contrib contrib-type="author">
                                                                    <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0001-7679-9945</contrib-id>
                                                                <name>
                                    <surname>Çifci</surname>
                                    <given-names>Ahmet</given-names>
                                </name>
                                                                    <aff>Burdur Mehmet Akif Ersoy University, Faculty of Engineering and Architecture</aff>
                                                            </contrib>
                                                                                </contrib-group>
                        
                                        <pub-date pub-type="pub" iso-8601-date="20250630">
                    <day>30</day>
                    <month>06</month>
                    <year>2025</year>
                </pub-date>
                                        <volume>13</volume>
                                        <issue>2</issue>
                                        <fpage>128</fpage>
                                        <lpage>139</lpage>
                        
                        <history>
                                    <date date-type="received" iso-8601-date="20241101">
                        <day>01</day>
                        <month>11</month>
                        <year>2024</year>
                    </date>
                                                    <date date-type="accepted" iso-8601-date="20241227">
                        <day>27</day>
                        <month>12</month>
                        <year>2024</year>
                    </date>
                            </history>
                                        <permissions>
                    <copyright-statement>Copyright © 2013, Balkan Journal of Electrical and Computer Engineering</copyright-statement>
                    <copyright-year>2013</copyright-year>
                    <copyright-holder>Balkan Journal of Electrical and Computer Engineering</copyright-holder>
                </permissions>
            
                                                                                                                        <abstract><p>This paper investigates machine learning (ML) models for predicting diabetes on the Pima Indians Diabetes Database, with a focus on improving model interpretability through SHapley Additive exPlanations (SHAP). The study evaluates eight ML models, namely Adaptive Boosting (AdaBoost), k-Nearest Neighbors (k-NN), Logistic Regression (LR), Multi-layer Perceptron (MLP), Naive Bayes (NB), Random Forest (RF), Support Vector Machine (SVM), and eXtreme Gradient Boosting (XGBoost), under both a train/test split and 10-fold cross-validation. The RF model performed best, achieving an accuracy of 82% and an F1-score of 0.83 on the train/test split, and an accuracy of 83% and an F1-score of 0.84 under 10-fold cross-validation. SHAP analysis identified glucose, BMI, pregnancies, and insulin levels as the most influential predictors of diabetes, consistent with established clinical markers. In addition, class balancing with the Synthetic Minority Over-sampling TEchnique (SMOTE), together with data scaling, contributes to robust model performance. The study underscores the need for interpretable ML in healthcare and proposes SHAP as a valuable tool for bridging predictive accuracy and clinical transparency in diabetes diagnostics.</p></abstract>
                                                            
            
                                                                                        <kwd-group>
                                                    <kwd>Diabetes Prediction</kwd>
                                                    <kwd>Explainable Artificial Intelligence</kwd>
                                                    <kwd>Machine Learning Models</kwd>
                                                    <kwd>Model Interpretability</kwd>
                                                    <kwd>SHapley Additive exPlanations</kwd>
                                            </kwd-group>
                            
                                                                                                                                                    </article-meta>
    </front>
    <back>
                            <ref-list>
                                    <ref id="ref1">
                        <label>1</label>
                        <mixed-citation publication-type="journal">[1]	P. David, S. Singh, R. Ankar, “A comprehensive overview of skin complications in diabetes and their prevention,” Cureus, vol. 15, no. 5, p. e38961, 2023.</mixed-citation>
                    </ref>
                                    <ref id="ref2">
                        <label>2</label>
                        <mixed-citation publication-type="journal">[2]	A. F. Walker et al., “Interventions to address global inequity in diabetes: international progress,” Lancet, vol. 402, no. 10397, 2023, pp. 250-264.</mixed-citation>
                    </ref>
                                    <ref id="ref3">
                        <label>3</label>
                        <mixed-citation publication-type="journal">[3]	M. Zakir et al., “Cardiovascular complications of diabetes: From microvascular to macrovascular pathways,” Cureus, vol. 15, no. 9, p. e45835, 2023.</mixed-citation>
                    </ref>
                                    <ref id="ref4">
                        <label>4</label>
                        <mixed-citation publication-type="journal">[4]	A. Avogaro, M. Rigato, E. di Brino, D. Bianco, I. Gianotto, G. Brusaporco, “The socio-environmental determinants of diabetes and their consequences,” Acta Diabetol., vol. 61, no. 10, 2024, pp. 1205-1210.</mixed-citation>
                    </ref>
                                    <ref id="ref5">
                        <label>5</label>
                        <mixed-citation publication-type="journal">[5]	S. Gowthami, R. Venkata Siva Reddy, M. R. Ahmed, “Exploring the effectiveness of machine learning algorithms for early detection of Type-2 Diabetes Mellitus,” Measur. Sens., vol. 31, Art. no. 100983, 2024.</mixed-citation>
                    </ref>
                                    <ref id="ref6">
                        <label>6</label>
                        <mixed-citation publication-type="book">[6]	A. A. L. Ahmad, A. A. Mohamed, “Artificial intelligence and machine learning techniques in the diagnosis of type I diabetes: Case studies,” in Studies in Computational Intelligence, Singapore: Springer Nature Singapore, 2024, pp. 289-302.</mixed-citation>
                    </ref>
                                    <ref id="ref7">
                        <label>7</label>
                        <mixed-citation publication-type="journal">[7]	T. Althobaiti, S. Althobaiti, M. M. Selim, “An optimized diabetes mellitus detection model for improved prediction of accuracy and clinical decision-making,” Alex. Eng. J., vol. 94, 2024, pp. 311-324.</mixed-citation>
                    </ref>
                                    <ref id="ref8">
                        <label>8</label>
                        <mixed-citation publication-type="journal">[8]	R. F. Albadri, S. M. Awad, A. S. Hameed, T. H. Mandeel, R. A. Jabbar, “A diabetes prediction model using hybrid machine learning algorithm,” Math. Model. Eng. Probl., vol. 11, no. 8, 2024, pp. 2119-2126.</mixed-citation>
                    </ref>
                                    <ref id="ref9">
                        <label>9</label>
                        <mixed-citation publication-type="journal">[9]	S. Buyrukoğlu, A. Akbaş, “Machine learning based early prediction of type 2 diabetes: A new hybrid feature selection approach using Correlation Matrix with Heatmap and SFS,” Balkan Journal of Electrical and Computer Engineering, vol. 10, no. 2, 2022, pp. 110-117.</mixed-citation>
                    </ref>
                                    <ref id="ref10">
                        <label>10</label>
                        <mixed-citation publication-type="journal">[10]	A. Adadi, M. Berrada, “Peeking inside the black-box: A survey on explainable artificial intelligence (XAI),” IEEE Access, vol. 6, 2018, pp. 52138-52160.</mixed-citation>
                    </ref>
                                    <ref id="ref11">
                        <label>11</label>
                        <mixed-citation publication-type="journal">[11]	F. Bodria, F. Giannotti, R. Guidotti, F. Naretto, D. Pedreschi, S. Rinzivillo, “Benchmarking and survey of explanation methods for black box models,” Data Min. Knowl. Discov., vol. 37, no. 5, 2023, pp. 1719-1778.</mixed-citation>
                    </ref>
                                    <ref id="ref12">
                        <label>12</label>
                        <mixed-citation publication-type="journal">[12]	A. Barredo Arrieta et al., “Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI,” Inf. Fusion, vol. 58, 2020, pp. 82-115.</mixed-citation>
                    </ref>
                                    <ref id="ref13">
                        <label>13</label>
                        <mixed-citation publication-type="journal">[13]	W. Ding, M. Abdel-Basset, H. Hawash, A. M. Ali, “Explainability of artificial intelligence methods, applications and challenges: A comprehensive survey,” Inf. Sci. (Ny), vol. 615, 2022, pp. 238-292.</mixed-citation>
                    </ref>
                                    <ref id="ref14">
                        <label>14</label>
                        <mixed-citation publication-type="journal">[14]	V. Hassija et al., “Interpreting black-box models: A review on explainable Artificial Intelligence,” Cognit. Comput., vol. 16, no. 1, 2024, pp. 45-74.</mixed-citation>
                    </ref>
                                    <ref id="ref15">
                        <label>15</label>
                        <mixed-citation publication-type="confproc">[15]	S. Lundberg, S.-I. Lee, “A unified approach to interpreting model predictions,” in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan et al., Eds. Curran Associates, Inc., 2017.</mixed-citation>
                    </ref>
                                    <ref id="ref16">
                        <label>16</label>
                        <mixed-citation publication-type="journal">[16]	L. S. Shapley, “Stochastic games,” Proc. Natl. Acad. Sci. U.S.A., vol. 39, no. 10, 1953, pp. 1095-1100.</mixed-citation>
                    </ref>
                                    <ref id="ref17">
                        <label>17</label>
                        <mixed-citation publication-type="journal">[17]	K. Aliyeva, N. Mehdiyev, “Uncertainty-aware multi-criteria decision analysis for evaluation of explainable artificial intelligence methods: A use case from the healthcare domain,” Information Sciences, vol. 657, Art. no. 119987, 2024.</mixed-citation>
                    </ref>
                                    <ref id="ref18">
                        <label>18</label>
                        <mixed-citation publication-type="data">[18]	Kaggle Dataset, “Pima Indians Diabetes Database,” 2017. [Online]. Available: https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database</mixed-citation>
                    </ref>
                                    <ref id="ref19">
                        <label>19</label>
                        <mixed-citation publication-type="confproc">[19]	P. Verma, A. Khatoon, “Data mining applications in healthcare: A comparative analysis of classification techniques for diabetes diagnosis using the PIMA Indian diabetes dataset,” in 2024 4th International Conference on Innovative Practices in Technology and Management (ICIPTM), 2024.</mixed-citation>
                    </ref>
                                    <ref id="ref20">
                        <label>20</label>
                        <mixed-citation publication-type="journal">[20]	L. Xie, “Pima Indian diabetes database and machine learning models for diabetes prediction,” Highlights in Science, Engineering and Technology, vol. 88, 2024, pp. 97-103.</mixed-citation>
                    </ref>
                                    <ref id="ref21">
                        <label>21</label>
                        <mixed-citation publication-type="journal">[21]	V. Chang, J. Bailey, Q. A. Xu, Z. Sun, “Pima Indians diabetes mellitus classification based on machine learning (ML) algorithms,” Neural Comput. Appl., vol. 35, no. 22, 2022, pp. 1-17.</mixed-citation>
                    </ref>
                                    <ref id="ref22">
                        <label>22</label>
                        <mixed-citation publication-type="journal">[22]	S. Sahoo, T. Mitra, A. K. Mohanty, B. J. R. Sahoo, S. Rath, “Diabetes prediction: A study of various classification based data mining techniques,” International Journal of Computer Science and Informatics, vol. 4, no. 3, 2022, pp. 1-13.</mixed-citation>
                    </ref>
                                    <ref id="ref23">
                        <label>23</label>
                        <mixed-citation publication-type="journal">[23]	S. You, M. Kang, “A Study on Methods to Prevent Pima Indians Diabetes using SVM,” Korean Journal of Artificial Intelligence, vol. 8, no. 2, 2020, pp. 7-10.</mixed-citation>
                    </ref>
                                    <ref id="ref24">
                        <label>24</label>
                        <mixed-citation publication-type="confproc">[24]	A. F. Ashour, M. M. Fouda, Z. M. Fadlullah, M. I. Ibrahem, “Optimized neural networks for diabetes classification using Pima Indians diabetes database,” in 2024 IEEE 3rd International Conference on Computing and Machine Intelligence (ICMI), 2024.</mixed-citation>
                    </ref>
                                    <ref id="ref25">
                        <label>25</label>
                        <mixed-citation publication-type="journal">[25]	K. Akyol, B. Şen, “Diabetes mellitus data classification by cascading of feature selection methods and ensemble learning algorithms,” Int. J. Mod. Educ. Comput. Sci., vol. 6, 2018, pp. 10-16.</mixed-citation>
                    </ref>
                                    <ref id="ref26">
                        <label>26</label>
                        <mixed-citation publication-type="journal">[26]	M. S. Reza, R. Amin, R. Yasmin, W. Kulsum, S. Ruhi, “Improving diabetes disease patients classification using stacking ensemble method with PIMA and local healthcare data,” Heliyon, vol. 10, p. e24536, 2024.</mixed-citation>
                    </ref>
                                    <ref id="ref26a">
                        <label>27</label>
                        <mixed-citation publication-type="confproc">[27]	A. Pyne, B. Chakraborty, “Artificial Neural Network based approach to Diabetes Prediction using Pima Indians Diabetes Dataset,” in 2023 International Conference on Control, Automation and Diagnosis (ICCAD), Rome, Italy, 2023.</mixed-citation>
                    </ref>
                                    <ref id="ref27">
                        <label>28</label>
                        <mixed-citation publication-type="confproc">[28]	A V. Jain, S. Shukla, N. Khare, “Analysis of various data imputation techniques for diabetes classification on PIMA dataset,” in 2024 IEEE International Students’ Conference on Electrical, Electronics and Computer Science (SCEECS), 2024, pp. 1–6.</mixed-citation>
                    </ref>
                                    <ref id="ref28">
                        <label>29</label>
                        <mixed-citation publication-type="confproc">[29]	S. Karatsiolis, C. N. Schizas, “Region based Support Vector Machine algorithm for medical diagnosis on Pima Indian Diabetes dataset,” in 2012 IEEE 12th International Conference on Bioinformatics &amp; Bioengineering (BIBE), 2012.</mixed-citation>
                    </ref>
                                    <ref id="ref29">
                        <label>30</label>
                        <mixed-citation publication-type="journal">[30]	M. Bilal, G. Ali, M. W. Iqbal, M. Anwar, M. S. A. Malik, R. A. Kadir, “Auto-Prep: Efficient and Automated Data Preprocessing Pipeline,” IEEE Access, vol. 10, 2022, pp. 107764-107784.</mixed-citation>
                    </ref>
                                    <ref id="ref30">
                        <label>31</label>
                        <mixed-citation publication-type="journal">[31]	L. B. V. de Amorim, G. D. C. Cavalcanti, R. M. O. Cruz, “The choice of scaling technique matters for classification performance,” Appl. Soft Comput., vol. 133, Art. no. 109924, 2023.</mixed-citation>
                    </ref>
                                    <ref id="ref31">
                        <label>32</label>
                        <mixed-citation publication-type="journal">[32]	A. D. Amirruddin, F. M. Muharam, M. H. Ismail, N. P. Tan, M. F. Ismail, “Synthetic Minority Over-sampling TEchnique (SMOTE) and Logistic Model Tree (LMT)-Adaptive Boosting algorithms for classifying imbalanced datasets of nutrient and chlorophyll sufficiency levels of oil palm (Elaeis guineensis) using spectroradiometers and unmanned aerial vehicles,” Comput. Electron. Agric., vol. 193, Art. no. 106646, 2022.</mixed-citation>
                    </ref>
                                    <ref id="ref32">
                        <label>33</label>
                        <mixed-citation publication-type="journal">[33]	N. V. Chawla, K. W. Bowyer, L. O. Hall, W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” J. Artif. Intell. Res., vol. 16, 2002, pp. 321-357.</mixed-citation>
                    </ref>
                                    <ref id="ref33">
                        <label>34</label>
                        <mixed-citation publication-type="journal">[34]	T. Yılmaz, “Microwave spectroscopy based classification of rat hepatic tissues: On the significance of dataset,” Balkan Journal of Electrical and Computer Engineering, vol. 8, no. 4, 2020, pp. 307-313.</mixed-citation>
                    </ref>
                                    <ref id="ref34">
                        <label>35</label>
                        <mixed-citation publication-type="journal">[35]	T. Tulgar, A. Haydar, İ. Erşan, “A distributed K Nearest Neighbor classifier for Big Data,” Balkan Journal of Electrical and Computer Engineering, vol. 6, no. 2, 2018, pp. 105-111.</mixed-citation>
                    </ref>
                                    <ref id="ref35">
                        <label>36</label>
                        <mixed-citation publication-type="journal">[36]	T. Pala, A. Y. Camurcu, “Design of decision support system in the metastatic colorectal cancer data set and its application,” Balkan Journal of Electrical and Computer Engineering, vol. 4, no. 1, 2016, pp. 12-16.</mixed-citation>
                    </ref>
                                    <ref id="ref36">
                        <label>37</label>
                        <mixed-citation publication-type="journal">[37]	C. Greco, P. Pace, S. Basagni, G. Fortino, “Jamming detection at the edge of drone networks using Multi-layer Perceptrons and Decision Trees,” Appl. Soft Comput., vol. 111, Art. no. 107806, 2021.</mixed-citation>
                    </ref>
                                    <ref id="ref37">
                        <label>38</label>
                        <mixed-citation publication-type="confproc">[38]	İ. Kırbaş, A. Çifci, “Machine Learning-Based Rice Grain Classification Through Numerical Feature Extraction from Rice Image Data,” in 9th International Zeugma Conference on Scientific Research, Gaziantep, Türkiye, 2023.</mixed-citation>
                    </ref>
                                    <ref id="ref37a">
                        <label>39</label>
                        <mixed-citation publication-type="journal">[39]	A. Çifci, M. İlkuçar, “Analysis of window sizes in prediction of daily COVID-19 cases using machine learning models,” International Journal of Mechatronics, Electrical and Computer Technology (IJMEC), vol. 12, no. 45, 2022, pp. 5208-5217.</mixed-citation>
                    </ref>
                                    <ref id="ref38">
                        <label>40</label>
                        <mixed-citation publication-type="journal">[40]	G. Bilgin, A. Çifci, “Eritematöz skuamöz hastalıkların teşhisinde makine öğrenme algoritmaları performanslarının değerlendirilmesi [Evaluation of the performance of machine learning algorithms in the diagnosis of erythemato-squamous diseases],” Journal of Intelligent Systems: Theory and Applications, vol. 4, no. 2, 2021, pp. 195-202.</mixed-citation>
                    </ref>
                                    <ref id="ref39">
                        <label>41</label>
                        <mixed-citation publication-type="journal">[41]	C. Bentéjac, A. Csörgő, G. Martínez-Muñoz, “A comparative analysis of gradient boosting algorithms,” Artif. Intell. Rev., vol. 54, no. 3, 2021, pp. 1937-1967.</mixed-citation>
                    </ref>
                                    <ref id="ref40">
                        <label>42</label>
                        <mixed-citation publication-type="book">[42]	C. Molnar, Interpretable Machine Learning: A Guide for Making Black Box Models Interpretable. Morrisville, NC, USA: Lulu, 2019.</mixed-citation>
                    </ref>
                                    <ref id="ref41">
                        <label>43</label>
                        <mixed-citation publication-type="journal">[43]	S. M. Lundberg et al., “From local explanations to global understanding with explainable AI for trees,” Nat. Mach. Intell., vol. 2, no. 1, 2020, pp. 56-67.</mixed-citation>
                    </ref>
                                    <ref id="ref42">
                        <label>44</label>
                        <mixed-citation publication-type="preprint">[44]	S. M. Lundberg, G. G. Erion, S.-I. Lee, “Consistent individualized feature attribution for tree ensembles,” arXiv [cs.LG], 2018.</mixed-citation>
                    </ref>
                            </ref-list>
                    </back>
    </article>
