Cross-Validation and Normalization in EEG Sleep Staging: Impacts on Generalization, Calibration, and Clinical Validity

Ahmet Sertol Köksal

doi:10.29130/dubited.1773372

Abstract

Accurate and reliable sleep staging from electroencephalography (EEG) is essential for both research and clinical applications. However, evaluation practices differ widely, and subtle methodological choices can strongly influence reported results. In this study, we examined how cross-validation strategies and normalization protocols affect the reliability and generalizability of EEG-based sleep staging models. Two benchmark datasets, SleepEDF and ISRUC, were used to systematically compare common approaches. We found that record-wise evaluation, often used in the literature, leads to overly optimistic results, whereas subject-wise and leave-one-subject-out (LOSO) evaluations provide more realistic estimates. On SleepEDF and ISRUC, the record-wise median Macro-F1 was 0.70 and 0.71, respectively; under subject-wise evaluation it was lower by 9 and 7 percentage points. Normalization strategy matters as well: although fold-aware normalization performed better in standard tests, subject-aware normalization combined with test-time adaptation produced the most consistent and clinically relevant outcomes, improving calibration (lower expected calibration error, ECE) and supporting safer decisions. In particular, it reduced errors and improved both classification accuracy and probability reliability; for example, on ISRUC, subject-aware normalization improved Macro-F1 by 0.08, reduced ECE by 0.02, and increased kappa by 0.10 compared with fold-aware normalization. These results provide protocol-level, model-independent evidence that evaluation and normalization decisions can rival model selection in their effect on reported performance, particularly when datasets change. We therefore recommend subject-wise or LOSO evaluation for internal assessment and subject-aware normalization with test-time adaptation for deployment, which together yield better-calibrated predictions and safer clinical decisions.
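To make the two evaluation concepts in the abstract concrete, the following is a minimal sketch and not the author's implementation. It assumes that every 30-second epoch carries a subject identifier, that folds are formed by assigning whole subjects round-robin, and that ECE uses equal-width confidence bins; the function names `subject_wise_folds` and `expected_calibration_error` are hypothetical.

```python
from collections import defaultdict


def subject_wise_folds(subject_ids, n_folds):
    """Split epoch indices so that no subject appears in both train and test.

    subject_ids: one subject label per epoch (illustrative input format).
    Setting n_folds to the number of distinct subjects gives LOSO.
    """
    subjects = sorted(set(subject_ids))
    if not 2 <= n_folds <= len(subjects):
        raise ValueError("n_folds must be between 2 and the number of subjects")
    # Assign whole subjects (not individual epochs) to folds, round-robin.
    fold_of = {s: i % n_folds for i, s in enumerate(subjects)}
    folds = []
    for k in range(n_folds):
        test = [i for i, s in enumerate(subject_ids) if fold_of[s] == k]
        train = [i for i, s in enumerate(subject_ids) if fold_of[s] != k]
        folds.append((train, test))
    return folds


def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin-size-weighted mean of |accuracy - mean confidence| per bin.

    confidences: predicted probability of the chosen class, in [0, 1].
    correct: booleans, True where the prediction matched the label.
    Bins are [k/n_bins, (k+1)/n_bins), with 1.0 placed in the last bin
    (an assumed convention; other binning schemes exist).
    """
    n = len(confidences)
    bins = defaultdict(list)
    for conf, ok in zip(confidences, correct):
        b = min(int(conf * n_bins), n_bins - 1)
        bins[b].append((conf, ok))
    ece = 0.0
    for items in bins.values():
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += (len(items) / n) * abs(accuracy - mean_conf)
    return ece
```

A record-wise split, by contrast, would shuffle epochs or recordings without regard to the subject label, so data from the same person could land on both sides of the split; that overlap is the likely source of the optimism described above.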

Keywords

EEG, Sleep staging, Cross-validation, Normalization, Calibration, Domain Adaptation

Supporting Institution

This research received no external funding.

Ethical Statement

This study does not involve human or animal participants. All procedures followed scientific and ethical principles, and all referenced studies are appropriately cited.

Thanks

The author does not wish to acknowledge any individual or institution.

Cross-Validation and Normalization in EEG-Based Sleep Staging: Effects on Generalization, Calibration, and Clinical Validity

Abstract

Electroencephalography (EEG)-based sleep staging is critically important for both research and clinical applications. However, evaluation approaches vary widely in the literature, and methodological choices can strongly influence reported results. In this study, the effects of cross-validation strategies and normalization protocols on the reliability and generalizability of EEG-based sleep staging models were examined. Common approaches were systematically compared using two benchmark datasets (SleepEDF and ISRUC). The findings show that record-wise evaluation, which is frequently used in the literature, leads to overly optimistic results, whereas subject-wise and leave-one-subject-out approaches provide more realistic estimates. Similarly, normalization strategies play a critical role: while fold-level normalization performed better in standard tests, subject-level normalization combined with test-time adaptation yielded the most consistent and clinically meaningful results. In particular, test-time adaptation reduced errors and improved both classification accuracy and probability reliability across datasets. In conclusion, the choice of evaluation protocol requires as much care as the choice of algorithm. By adopting more rigorous and clinically oriented evaluation strategies, EEG-based sleep staging systems can become more reliable and better suited to real-world healthcare applications.
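The abstract does not define the two normalization protocols in detail, so the sketch below shows only one plausible reading. It assumes that fold-level (fold-aware) normalization standardizes all data with statistics from the training fold, and that subject-level (subject-aware) normalization standardizes each subject with that subject's own unlabeled statistics, which is a simple form of test-time adaptation. The function names and the scalar-feature input format are hypothetical.

```python
from statistics import mean, pstdev

EPS = 1e-8  # guards against division by zero for constant signals


def fold_aware_normalize(train_values, test_values):
    """Z-score both sets using mean/std estimated on the training fold only."""
    mu, sigma = mean(train_values), pstdev(train_values)
    scale = sigma + EPS
    train_z = [(v - mu) / scale for v in train_values]
    test_z = [(v - mu) / scale for v in test_values]
    return train_z, test_z


def subject_aware_normalize(values, subject_ids):
    """Z-score each value with the statistics of the subject it belongs to.

    No labels are used, so the same step can be applied to an unseen
    subject at test time (assumed interpretation of test-time adaptation).
    """
    by_subject = {}
    for v, s in zip(values, subject_ids):
        by_subject.setdefault(s, []).append(v)
    stats = {s: (mean(vs), pstdev(vs)) for s, vs in by_subject.items()}
    return [(v - stats[s][0]) / (stats[s][1] + EPS)
            for v, s in zip(values, subject_ids)]


# Two subjects with the same signal shape but different amplitude offsets.
values = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
subjects = ["a", "a", "a", "b", "b", "b"]
per_subject = subject_aware_normalize(values, subjects)
train_z, test_z = fold_aware_normalize(values[:3], values[3:])
```

In this toy case the per-subject version maps both subjects onto the same standardized values, while the fold-level version leaves the held-out subject far from zero mean; this illustrates how a between-subject offset can survive fold-level scaling.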

Keywords

Electroencephalography (EEG), Sleep staging, Cross-validation, Normalization, Calibration, Domain adaptation

Details

Primary Language

English

Subjects

Bioinformatics

Journal Section

Research Article

Authors

Ahmet Sertol Köksal*
0000-0002-3452-828X
Türkiye

Publication Date

January 21, 2026

Submission Date

August 28, 2025

Acceptance Date

October 23, 2025

Published in Issue

Year 2026 Volume: 14 Number: 1

DOI

https://doi.org/10.29130/dubited.1773372

IZ

https://izlik.org/JA37YL65XK

APA

Köksal, A. S. (2026). Cross-Validation and Normalization in EEG Sleep Staging: Impacts on Generalization, Calibration, and Clinical Validity. Duzce University Journal of Science and Technology, 14(1), 72-85. https://doi.org/10.29130/dubited.1773372

AMA

1. Köksal AS. Cross-Validation and Normalization in EEG Sleep Staging: Impacts on Generalization, Calibration, and Clinical Validity. DUBİTED. 2026;14(1):72-85. doi:10.29130/dubited.1773372

Chicago

Köksal, Ahmet Sertol. 2026. “Cross-Validation and Normalization in EEG Sleep Staging: Impacts on Generalization, Calibration, and Clinical Validity”. Duzce University Journal of Science and Technology 14 (1): 72-85. https://doi.org/10.29130/dubited.1773372.

EndNote

Köksal AS (January 1, 2026) Cross-Validation and Normalization in EEG Sleep Staging: Impacts on Generalization, Calibration, and Clinical Validity. Duzce University Journal of Science and Technology 14 1 72–85.

IEEE

[1] A. S. Köksal, “Cross-Validation and Normalization in EEG Sleep Staging: Impacts on Generalization, Calibration, and Clinical Validity”, DUBİTED, vol. 14, no. 1, pp. 72–85, Jan. 2026, doi: 10.29130/dubited.1773372.

ISNAD

Köksal, Ahmet Sertol. “Cross-Validation and Normalization in EEG Sleep Staging: Impacts on Generalization, Calibration, and Clinical Validity”. Duzce University Journal of Science and Technology 14/1 (January 1, 2026): 72-85. https://doi.org/10.29130/dubited.1773372.

JAMA

1. Köksal AS. Cross-Validation and Normalization in EEG Sleep Staging: Impacts on Generalization, Calibration, and Clinical Validity. DUBİTED. 2026;14:72–85.

MLA

Köksal, Ahmet Sertol. “Cross-Validation and Normalization in EEG Sleep Staging: Impacts on Generalization, Calibration, and Clinical Validity”. Duzce University Journal of Science and Technology, vol. 14, no. 1, Jan. 2026, pp. 72-85, doi:10.29130/dubited.1773372.

Vancouver

1. Ahmet Sertol Köksal. Cross-Validation and Normalization in EEG Sleep Staging: Impacts on Generalization, Calibration, and Clinical Validity. DUBİTED. 2026 Jan. 1;14(1):72-85. doi:10.29130/dubited.1773372
