Research Article

Year 2025, Issue: 71 (EYS'25 Special Issue), 21 - 40, 29.12.2025
https://doi.org/10.30794/pausbed.1775480

EEG-BASED SEIZURE CLASSIFICATION: THE EFFECT OF PCA AND A COMPARISON OF LOGISTIC REGRESSION, SVM, AND XGBOOST
(Original Turkish title: EEG TABANLI NÖBET SINIFLANDIRMASI: PCA’NIN ETKİSİ VE LOJİSTİK REGRESYON, SVM, XGBOOST KARŞILAŞTIRMASI)

Abstract

Epilepsy is a disorder that adversely affects individuals’ social relationships, social integration, and quality of life. In this study, Logistic Regression, Support Vector Machine (SVM), and Extreme Gradient Boosting (XGBoost) models were evaluated on the multi-feature (16-channel), multiclass (healthy, generalized seizure, focal seizure, seizure activity), and balanced Bangalore EEG Epilepsy Dataset in two processing pipelines, one with and one without Principal Component Analysis (PCA). The marginal contribution of PCA was quantified in terms of the accuracy–runtime trade-off, and the models were compared for consistency and efficiency in cases where the classes are harder to separate. The findings indicate that, on datasets with highly correlated features, XGBoost outperforms SVM and Logistic Regression in accuracy, F1, and ROC-AUC. This suggests that PCA does not improve model performance in every case in epileptic seizure detection, and that data pre-processing strategies should therefore be chosen carefully according to the model type. The results contribute a methodological comparison to the cognitive signal processing literature and offer practitioners guidance on model selection. One of the main contributions of the study is a benchmark based on standard metrics, ten-fold cross-validation, and test sets held out at different split ratios.

Ethical Statement

I declare that all stages of this study comply with research and publication ethics, and that I have adhered to ethical rules and the principles of scientific citation.
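The evaluation protocol named in the abstract (ten-fold cross-validation on a balanced multiclass dataset, scored by accuracy and F1, with runtime recorded) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the abstract does not give the study's software, hyperparameters, or F1 averaging scheme, so the function names, the macro-averaged F1, the synthetic 16-feature data, and the nearest-centroid stand-in classifier are all hypothetical and are not the authors' models or results. ROC-AUC is left out for brevity.

```python
import random
import time
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs with similar class shares in every fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin across the k folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train_idx, test_idx


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging; not stated in the abstract)."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cross_validate(fit_predict, X, y, k=10, seed=0):
    """Run k-fold CV; fit_predict(X_train, y_train, X_test) returns predicted labels."""
    accs, f1s = [], []
    start = time.perf_counter()
    for train_idx, test_idx in stratified_kfold(y, k, seed):
        X_tr, y_tr = [X[i] for i in train_idx], [y[i] for i in train_idx]
        X_te, y_te = [X[i] for i in test_idx], [y[i] for i in test_idx]
        y_hat = fit_predict(X_tr, y_tr, X_te)
        accs.append(accuracy(y_te, y_hat))
        f1s.append(macro_f1(y_te, y_hat))
    elapsed = time.perf_counter() - start
    return {"accuracy": sum(accs) / k, "macro_f1": sum(f1s) / k, "seconds": elapsed}


def nearest_centroid(X_train, y_train, X_test):
    """Stand-in classifier (NOT one of the study's models): predict the closest class mean."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X_train, y_train):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] += 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def predict(row):
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))

    return [predict(row) for row in X_test]


# Synthetic, balanced 4-class toy data with 16 features (placeholder, not EEG data).
rng = random.Random(42)
X, y = [], []
for label in range(4):
    for _ in range(50):
        X.append([rng.gauss(label * 3.0, 1.0) for _ in range(16)])
        y.append(label)

results = cross_validate(nearest_centroid, X, y, k=10)
print(results)
```

Because `cross_validate` accepts any `fit_predict` callable and reuses the same seeded folds, a pipeline with a dimensionality-reduction step and one without can be passed in turn and compared on identical splits, which is the kind of accuracy versus runtime comparison the abstract describes.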

References

  • Abdi, H., ve Williams, L. J. (2010). Principal Component Analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2(4), 433–459. https://doi.org/10.1002/wics.101.
  • Alpar, R. (2013). Çok Değişkenli İstatistiksel Yöntemler, Detay Yayıncılık: Ankara, Türkiye.
  • Berrich, Y. ve Guennoun, Z. (2025). EEG-Based Epilepsy Detection Using CNN-SVM and DNN-SVM with Feature Dimensionality Reduction By PCA. Sci Rep 15, 14313 (2025). https://doi.org/10.1038/s41598-025-95831-z.
  • Bishop, C. M. (2006). Pattern Recognition and Machine Learning, Springer.
  • Bradley, A. P. (1997). The Use of the Area Under the ROC Curve in the Evaluation of Machine Learning Algorithms. Pattern Recognition, 30(7), 1145–1159. https://doi.org/10.1016/S0031-3203(96)00142-2
  • Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324.
  • Carvajal-Dossman, J.P., Guio, L., García-Orjuela, D. vd. (2025). Retraining and Evaluation of Machine Learning and Deep Learning Models for Seizure Classification from EEG Data. Sci Rep 15, 15345 (2025). https://doi.org/10.1038/s41598-025-98389-y.
  • Chan, J.-L., Leow, S., Bea, K., Cheng, W., Phoong, S., Hong, Z.-W. ve Chen, Y. L. (2022). Mitigating the Multicollinearity Problem and its Machine Learning Approach: A Review, Mathematics, 10(8), 1283. https://doi.org/10.3390/math10081283.
  • Chen, T., ve Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System, In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794). https://dl.acm.org/doi/pdf/10.1145/2939672.2939785.
  • Chen, T., ve He, T. (2020). XGBoost: Extreme Gradient Boosting. R Package Version 1.0.0.2.
  • Cortes, C., ve Vapnik, V. (1995). Support-Vector Networks. Machine Learning, 20(3), 273–297. https://doi.org/10.1007/BF00994018.
  • Diler, S. ve Demir, Y. (2024). Çoklu Doğrusal Bağlantı Olması Durumunda Veri Madenciliği Algoritmaları Performanslarının Karşılaştırılması, Nicel Bilimler Dergisi, 6(1), 40-67. doi: 10.51541/nicel.1371834.
  • Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861–874. https://people.inf.elte.hu/kiss/13dwhdm/roc.pdf
  • Friedman, J., Hastie, T., ve Tibshirani, R. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer. https://tinyurl.com/4cpndh32.
  • Friedman, J. H. (2001). Greedy Function Approximation: A Gradient Boosting Machine, Annals of Statistics, 29(5), 1189–1232.
  • Geron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, Concepts, Tools, and Techniques to Build Intelligent Systems, 2nd Edition. O’Reilly Media, Inc., Canada. https://www.oreilly.com/catalog/errata.csp?isbn=9781492032649.
  • Gupta, A. ve Goel, S. (2023). Deep Learning Empowered Weather Image Classification for Accurate Analysis. IEEE 2nd International Conference on Industrial Electronics: Developments & Applications (ICIDeA), Imphal, India, 2023, pp. 567-572, doi: 10.1109/ICIDeA59866.2023.10295216.
  • Han, J., Kamber, M., ve Pei, J. (2012). Data Mining: Concepts and Techniques (3rd ed.). Morgan Kaufmann. https://homes.di.unimi.it/ceselli/IM/2012-13/slides/02-KnowYourData.pdf.
  • Hanley, J. A., ve McNeil, B. J. (1982). The Meaning and Use of the Area Under a Receiver Operating Characteristic (ROC) Curve. Radiology, 143(1), 29–36. https://doi.org/10.1148/radiology.143.1.7063747.
  • Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2019). Multivariate Data Analysis, (8th ed.). Cengage.
  • Hastie, T., Tibshirani, R., ve Friedman, J. (2009). The Elements of Statistical Learning, (2nd ed.). Springer.
  • Hosmer, D. W., Lemeshow, S. ve Sturdivant, R. X. (2013). Applied Logistic Regression, (Third Edition). John Wiley & Sons, Inc: New Jersey, USA.
  • Islam, M. S., Thapa, K., ve Yang, S.-H. (2022). Epileptic-Net: An Improved Epileptic Seizure Detection System Using Dense Convolutional Block with Attention Network from EEG. Sensors, 22(3), 728. https://doi.org/10.3390/s22030728.
  • Jolliffe, I. T. (2002). Principal Component Analysis (2nd ed.). Springer Series in Statistics. Springer-Verlag, New York.
  • Kecman, V. (2005). Support Vector Machines: Theory and Applications. In L.Wang (Eds). Support Vector Machines-An Introduction (pp. 1–47). Springer, Berlin Heidelberg. https://doi.org/10.1007/b95439.
  • Kendirkıran, G. ve Doğan, S. (2024). Makine Öğrenimi Teknikleri ile Kredi Risk Tahmininde Yeniden Örnekleme Yöntemlerinin Karşılaştırılması, Söke İşletme Fakültesi Dergisi, Yıl: 2024, Cilt: 1, Sayı: 2, ss.48-60 https://dergipark.org.tr/tr/download/article-file/4364826.
  • Kıvrak, O. (2025). Araç Fiyat Tahmininde Makine Öğrenmesi Algoritmalarının Karşılaştırılması ve Performans Analizi, İktisadi İdari ve Siyasal Araştırmalar Dergisi, Cilt: 10 Sayı: 27, 454-474, 29.06.2025 https://doi.org/10.25204/iktisad.1494020.
  • Kong, G., Ma, S., Zhao, W., Wang, H., Fu, Q., & Wang, J. (2024). A Novel Method for Optimizing Epilepsy Detection Features Through Multi-Domain Feature Fusion and Selection, Frontiers in Computational Neuroscience, 18, 1416838. doi:10.3389/fncom.2024.1416838.
  • Lewis, N. D. (2017). Machine Learning Made Easy with R: An Intuitive Step by Step Blueprint for Beginners, CreateSpace Independent Publishing Platform: Carolina, USA.
  • Li, Q., Cao, W., Zhang, A. (2025). Multi-Stream Feature Fusion of Vision Transformer and CNN for Precise Epileptic Seizure Detection from EEG Signals. J Transl Med, 23(1), 871. doi: 10.1186/s12967-025-06862-z. PMID: 40770757; PMCID: PMC12329966.
  • Mason, C. H. ve Perreault, W. D. (1991). Collinearity, Power, and Interpretation of Multiple Regression Analysis, Journal of Marketing Research, 28(3), 268–280. https://doi.org/10.2307/3172863.
  • Moguerza, J., ve Muñoz, A. (2006). Support Vector Machines with Applications, Statistical Science, 21, 322- 336. https://doi.org/10.1214/088342306000000493.
  • Montgomery, D. C., Peck, E. A. ve Vining, G. G. (2021). Introduction to Linear Regression Analysis. John Wiley & Sons.
  • Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. MIT Press.
  • Najmusseher ve Banu, P. K. N. (2024). BEED: Bangalore EEG Epilepsy Dataset. [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5K33B. https://archive.ics.uci.edu/dataset.
  • Najmusseher ve Banu, P. K. N. (2025). Feature Engineering for Epileptic Seizure Classification Using SeqBoostNet. International Journal of Computing and Digital Systems, VOL. 17, NO. 1, 1–15. http://dx.doi.org/10.12785/ijcds/1571020131.
  • Rahman, M. M., Ghasemi, Y., Suley, E., Zhou, Y., Wang, S. ve Rogers, J. (2021). Machine Learning Based Computer Aided Diagnosis of Breast Cancer Utilizing Anthropometric and Clinical Features, IRBM, 42(4), 215-226.
  • Powers, D. M. W. (2011). Evaluation: From Precision, Recall and F-measure to ROC, Informedness, Markedness and Correlation. Journal of Machine Learning Technologies, 2(1), 37–63. https://arxiv.org/abs/2010.16061.
  • Provost, F., ve Fawcett, T. (2013). Data Science for Business. O’Reilly Media. https://www.oreilly.com/library/view/data-science-for/9781449374273/.
  • Saito, T., ve Rehmsmeier, M. (2015). The Precision-Recall Plot is More Informative Than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets. PLOS ONE, 10(3), e0118432. https://doi.org/10.1371/journal.pone.0118432.
  • Shalev-Shwartz, S., ve Ben-David, S. (2014). Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press.
  • Shlens, J. (2014). A Tutorial on Principal Component Analysis. arXiv preprint arXiv:1404.1100.
  • Smola, A. J. ve Schölkopf, B. (2004). A Tutorial on Support Vector Regression. Statistics and Computing, 14, 199- 222. https://doi.org/10.1023/B:STCO.0000035301.49549.88.
  • Wei, L. ve Mooney, C. (2020). Epileptic Seizure Detection in Clinical EEGs Using an XGBoost-based Method, IEEE SPMB, v1.0:1-6. https://isip.piconepress.com/conferences/ieee_spmb/2020/papers/l02_03.pdf.
  • Wu, J., Zhou, T., Li, T. (2020). Detecting Epileptic Seizures in EEG Signals with Complementary Ensemble Empirical Mode Decomposition and Extreme Gradient Boosting. Entropy (Basel), 22(2), 140. doi: 10.3390/e22020140.
There are 44 citations in total.

Details

Primary Language Turkish
Subjects Econometric and Statistical Methods
Journal Section Research Article
Authors

Burcu Kocarık Gacar 0000-0001-5944-4456

Submission Date September 1, 2025
Acceptance Date October 16, 2025
Publication Date December 29, 2025
Published in Issue Year 2025 Issue: Sayı:71 (EYS'25 Özel Sayısı)

Cite

APA Kocarık Gacar, B. (2025). EEG TABANLI NÖBET SINIFLANDIRMASI: PCA’NIN ETKİSİ VE LOJİSTİK REGRESYON, SVM, XGBOOST KARŞILAŞTIRMASI. Pamukkale Üniversitesi Sosyal Bilimler Enstitüsü Dergisi (Sayı:71 (EYS’25 Özel Sayısı)), 21-40. https://doi.org/10.30794/pausbed.1775480
The articles in this journal are licensed under a Creative Commons Attribution 4.0 International License.