Cross-Validation and Normalization in EEG Sleep Staging: Impacts on Generalization, Calibration, and Clinical Validity
Abstract
Accurate and reliable sleep staging from electroencephalography (EEG) is essential for both research and clinical applications. However, evaluation practices differ widely, and subtle methodological choices can strongly influence reported results. In this study, we examined how cross-validation strategies and normalization protocols affect the reliability and generalizability of EEG-based sleep staging models, using two benchmark datasets, SleepEDF and ISRUC, to compare common approaches systematically. We found that record-wise evaluation, which is common in the literature, yields overly optimistic results, whereas subject-wise and leave-one-subject-out (LOSO) evaluation provide more realistic estimates. On SleepEDF and ISRUC, the record-wise median Macro-F1 was 0.70 and 0.71, respectively; under subject-wise evaluation it was 9 and 7 percentage points lower. Normalization strategy matters as well: although fold-aware normalization performed better in standard tests, subject-aware normalization combined with test-time adaptation produced the most consistent and clinically relevant outcomes, improving calibration (lower expected calibration error, ECE) and supporting safer decisions. In particular, it reduced errors and improved both classification accuracy and the reliability of predicted probabilities; for example, on ISRUC, subject-aware normalization further improved Macro-F1 by 0.08, reduced ECE by 0.02, and increased kappa by 0.10 compared with fold-aware normalization. These results provide protocol-level, model-independent evidence that evaluation and normalization decisions can rival model selection in impact, particularly when datasets change. We therefore recommend subject-wise/LOSO evaluation for internal assessment and subject-aware normalization with test-time adaptation for deployment, which yield better-calibrated predictions and support safer clinical decisions.
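To make the compared protocols concrete, the following is a minimal Python sketch of how they differ at the level of data splitting, normalization, and calibration measurement. It is not the author's implementation, and every function name in it is hypothetical. It assumes each sample is a single scalar feature tagged with a subject ID (real pipelines operate on multichannel epochs and typically normalize per channel). It also assumes that z-scoring an unlabeled test subject with that subject's own statistics counts as one simple form of test-time adaptation; the paper's exact procedure may differ. Record-wise splitting, by contrast, can place different recordings of the same subject in both the training and test folds, which is one plausible source of optimistic estimates.

```python
# Hypothetical helper names; a simplified sketch, not the paper's implementation.
from collections import defaultdict
from statistics import mean, pstdev


def subject_wise_folds(subject_ids, k):
    """Assign each subject to one of k folds (round-robin over sorted unique IDs),
    so every sample of a subject ends up in the same fold."""
    subjects = sorted(set(subject_ids))
    fold_of = {s: i % k for i, s in enumerate(subjects)}
    return [fold_of[s] for s in subject_ids]


def loso_splits(subject_ids):
    """Leave-one-subject-out: yield (held_out_subject, train_idx, test_idx)."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def _z(value, mu, sigma):
    # Guard against zero variance: fall back to mean-centering only.
    return (value - mu) / sigma if sigma > 0 else value - mu


def fold_aware_normalize(train_vals, test_vals):
    """Statistics are estimated on the training fold only and reused for the test fold."""
    mu, sigma = mean(train_vals), pstdev(train_vals)
    return [_z(v, mu, sigma) for v in train_vals], [_z(v, mu, sigma) for v in test_vals]


def subject_aware_normalize(values, subject_ids):
    """Each subject is z-scored with its own statistics. Applied to an unlabeled
    test subject, this adapts to that subject at test time without using labels
    (assumed here as a simple stand-in for test-time adaptation)."""
    groups = defaultdict(list)
    for v, s in zip(values, subject_ids):
        groups[s].append(v)
    stats = {s: (mean(g), pstdev(g)) for s, g in groups.items()}
    return [_z(v, *stats[s]) for v, s in zip(values, subject_ids)]


def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE = sum over bins of (bin size / N) * |accuracy(bin) - mean confidence(bin)|,
    using equal-width confidence bins on [0, 1]."""
    bins = defaultdict(list)
    for c, ok in zip(confidences, correct):
        bins[min(int(c * n_bins), n_bins - 1)].append((c, ok))
    n = len(confidences)
    ece = 0.0
    for items in bins.values():
        acc = sum(1 for _, ok in items if ok) / len(items)
        conf = sum(c for c, _ in items) / len(items)
        ece += len(items) / n * abs(acc - conf)
    return ece
```

In this sketch, fold-aware statistics never see the test fold, while subject-aware statistics are computed per subject and therefore use the test subject's unlabeled signal but never its labels. For example, `subject_wise_folds(["s1", "s1", "s2", "s2", "s3", "s3"], k=3)` keeps both samples of each subject in the same fold.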
Supporting Institution
This research received no external funding.
Ethical Statement
This study does not involve human or animal participants. All procedures followed scientific and ethical principles, and all referenced studies are appropriately cited.
Thanks
The author does not wish to acknowledge any individual or institution.
Details
Primary Language: English
Subjects: Bioinformatics
Journal Section: Research Article
Authors: Ahmet Sertol Köksal
Publication Date: January 21, 2026
Submission Date: August 28, 2025
Acceptance Date: October 23, 2025
Published in Issue: Year 2026, Volume 14, Number 1
APA
Köksal, A. S. (2026). Cross-Validation and Normalization in EEG Sleep Staging: Impacts on Generalization, Calibration, and Clinical Validity. Duzce University Journal of Science and Technology, 14(1), 72-85. https://doi.org/10.29130/dubited.1773372
Chicago
Köksal, Ahmet Sertol. 2026. “Cross-Validation and Normalization in EEG Sleep Staging: Impacts on Generalization, Calibration, and Clinical Validity.” Duzce University Journal of Science and Technology 14 (1): 72–85. https://doi.org/10.29130/dubited.1773372.