Research Article

Year 2026, Volume: 14, Issue: 1, pp. 26–35, 31.01.2026
https://doi.org/10.21541/apjess.1714659

Enhancing the OhioT1DM Dataset for Predictive Modeling: Exploratory Analysis, Feature Engineering, and Correlation Insights

Abstract

Type 1 diabetes mellitus is a serious illness, and the long-standing scarcity of open datasets has hindered research on continuous glucose monitoring (CGM) and blood glucose level prediction. The OhioT1DM dataset provides a valuable resource for studying glucose variability and developing predictive models. In this study, we perform a thorough exploratory data analysis of the OhioT1DM dataset, focusing on data quality assessment, time-series stationarity, and data enhancement (i.e., feature augmentation and feature engineering). We begin by examining missing values and the time differences between consecutive timestamps to identify gaps and inconsistencies in the data. We then test each patient's glucose series for stationarity, that is, whether its statistical properties such as mean and variance remain constant over time, because this property is crucial for accurate forecasting. To enrich the dataset, we introduce new features that capture important physiological and behavioral patterns, such as insulin dose variability and daily exercise participation. Finally, we compute the correlation of every feature with glucose levels. Fingerstick measurements show the highest positive correlation with glucose, followed by basal and temporary basal insulin, while exercise intensity shows a strong negative correlation. Although correlation analysis provides useful preliminary insights, it is inherently limited in capturing nonlinear or lagged dependencies; future research should therefore extend this work with nonlinear and time-aware analytical methods. This study addresses key data challenges in the OhioT1DM dataset and introduces new features that can improve the performance of machine learning models for glucose prediction, supporting ongoing progress toward more accurate and reliable diabetes management.
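The abstract outlines four analysis steps: checking missing values and timestamp gaps, testing stationarity, engineering new features, and correlating features with glucose. The Python sketch below (standard library only) illustrates one possible way to implement each step on a single patient's series. It is not the authors' code. The 5-minute CGM sampling interval and 1-minute tolerance, the plain (non-augmented) Dickey-Fuller regression with its asymptotic 5% critical value of -2.86, the rolling-standard-deviation definition of "insulin dose variability", the per-day 0/1 "exercise participation" flag, and the choice of Pearson correlation are all illustrative assumptions, and every function name is hypothetical.

```python
import math
import statistics
from datetime import datetime, timedelta

# Assumption: CGM readings are nominally 5 minutes apart; the 1-minute tolerance is illustrative.
EXPECTED_STEP = timedelta(minutes=5)
TOLERANCE = timedelta(minutes=1)

# Asymptotic 5% critical value of the Dickey-Fuller test with a constant and no trend.
DF_CRITICAL_5PCT = -2.86


def count_missing(values: list) -> int:
    """Count entries recorded as None or NaN."""
    return sum(1 for v in values if v is None or (isinstance(v, float) and math.isnan(v)))


def timestamp_gaps(timestamps: list[datetime]) -> list[tuple[datetime, datetime, timedelta]]:
    """Return (previous, current, delta) for consecutive readings spaced wider than expected."""
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        delta = curr - prev
        if delta > EXPECTED_STEP + TOLERANCE:
            gaps.append((prev, curr, delta))
    return gaps


def dickey_fuller_stat(series: list[float]) -> float:
    """t-statistic of gamma in dy_t = alpha + gamma * y_{t-1} + e_t (simplified: no lagged differences)."""
    y_lag = series[:-1]
    dy = [curr - prev for prev, curr in zip(series, series[1:])]
    n = len(dy)
    mean_x = sum(y_lag) / n
    mean_d = sum(dy) / n
    sxx = sum((x - mean_x) ** 2 for x in y_lag)
    sxy = sum((x - mean_x) * (d - mean_d) for x, d in zip(y_lag, dy))
    gamma = sxy / sxx  # OLS slope on the lagged level
    alpha = mean_d - gamma * mean_x
    rss = sum((d - alpha - gamma * x) ** 2 for x, d in zip(y_lag, dy))
    se_gamma = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return gamma / se_gamma


def is_stationary(series: list[float]) -> bool:
    """Reject the unit-root null when the statistic falls below the 5% critical value."""
    return dickey_fuller_stat(series) < DF_CRITICAL_5PCT


def rolling_std(values: list[float], window: int) -> list:
    """Hypothetical 'insulin dose variability': trailing-window standard deviation (window >= 2)."""
    return [
        statistics.stdev(values[i + 1 - window:i + 1]) if i + 1 >= window else None
        for i in range(len(values))
    ]


def daily_exercise_flag(glucose_times: list[datetime], exercise_times: list[datetime]) -> list[int]:
    """Hypothetical 'daily exercise participation': 1 if any exercise was logged that calendar day."""
    exercise_days = {t.date() for t in exercise_times}
    return [1 if t.date() in exercise_days else 0 for t in glucose_times]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

In practice, a library implementation of the augmented Dickey-Fuller test (for example `adfuller` in statsmodels, which adds lagged differences and reports p-values) would normally replace the simplified statistic above. Likewise, a Pearson coefficient computed on aligned samples captures only linear, same-time association, which is exactly the limitation the abstract notes.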

References

There are 20 citations in total.

Details

Primary Language: English
Subjects: Machine Learning Algorithms, Bioinformatics
Journal Section: Research Article
Authors

Taofiq Olanrewaju Musa 0009-0003-2859-4816

Arsene Adjevi 0000-0003-4305-3914

Donaldo Omondi Jaccojwang 0009-0007-3552-9755

Raheem Nasirudeen Adeleye 0009-0000-2551-3775

Diyaolu Abdulmalik Opeyemi 0009-0003-2169-0805

Süleyman Uzun 0000-0001-8246-6733

Mustafa Zahid Yıldız 0000-0003-1870-288X

Ali Lazim 0009-0002-3431-1109

Rhobi Peter Mwita 0009-0002-1536-538X

Submission Date: June 5, 2025
Acceptance Date: September 13, 2025
Publication Date: January 31, 2026
Published in Issue: Year 2026, Volume: 14, Issue: 1

Cite

IEEE: [1] T. O. Musa et al., “Enhancing the OhioT1DM Dataset for Predictive Modeling: Exploratory Analysis, Feature Engineering, and Correlation Insights,” APJESS, vol. 14, no. 1, pp. 26–35, Jan. 2026, doi: 10.21541/apjess.1714659.

Academic Platform Journal of Engineering and Smart Systems