Research Article

Enhancing the OhioT1DM Dataset for Predictive Modeling: Exploratory Analysis, Feature Engineering, and Correlation Insights

Volume: 14 Number: 1 January 31, 2026

Abstract

Type 1 diabetes mellitus is a serious illness, and the long-standing lack of openly available datasets has hindered research on continuous glucose monitoring and blood glucose level prediction. The OhioT1DM dataset is a valuable resource for studying glucose variability and developing predictive models. In this study, we perform a thorough exploratory data analysis of the OhioT1DM dataset, focusing on data quality assessment, time-series stationarity, and data enhancement through feature engineering. We begin by examining missing values and the time differences between consecutive timestamps to identify gaps and inconsistencies in the data. We then test each patient's glucose series for stationarity, that is, whether its statistical properties remain constant over time, which is crucial for accurate forecasting. To enrich the dataset, we introduce new features that capture important physiological and behavioral trends, such as insulin dose variability and daily exercise participation. We also conduct a correlation analysis of all features against glucose levels. The findings indicate that fingerstick measurements exhibit the highest positive correlation with glucose, followed by basal and temporary basal insulin, while exercise intensity shows a strong negative correlation. Although correlation analysis provides useful preliminary insights, it is inherently limited in capturing nonlinear or lagged dependencies; future research should therefore extend this work with nonlinear and time-aware analytical methods. This study addresses key data challenges in the OhioT1DM dataset and introduces new features that can improve the performance of machine learning models for glucose prediction. The results support ongoing progress toward more accurate and reliable diabetes management solutions.
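As a rough illustration of two of the checks described above (time differences between consecutive timestamps, and correlation of a feature against glucose), the following minimal sketch uses only the Python standard library. It is not the authors' code: the function names `find_gaps` and `pearson`, the 5-minute expected sampling interval, and all sample values are assumptions made for illustration, and the data are synthetic rather than taken from OhioT1DM.

```python
from datetime import datetime, timedelta
from math import sqrt

# Assumed nominal CGM sampling interval; the abstract does not state it.
EXPECTED_STEP = timedelta(minutes=5)


def find_gaps(timestamps, expected=EXPECTED_STEP):
    """Return (start, end, delta) for consecutive readings spaced wider than `expected`."""
    ordered = sorted(timestamps)
    return [(a, b, b - a) for a, b in zip(ordered, ordered[1:]) if b - a > expected]


def pearson(xs, ys):
    """Pearson correlation over the pairs where both values are present (not None)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / sqrt(var_x * var_y)


# Synthetic illustration only; these are not values from OhioT1DM.
start = datetime(2025, 1, 1, 8, 0)
times = [start + timedelta(minutes=m) for m in (0, 5, 10, 25, 30)]
glucose = [110.0, 114.0, 120.0, 150.0, 149.0]
fingerstick = [112.0, None, 125.0, 148.0, None]  # sparse, like manual readings

gaps = find_gaps(times)            # one gap: the jump from minute 10 to minute 25
r = pearson(fingerstick, glucose)  # computed on the three complete pairs only

for gap_start, gap_end, delta in gaps:
    print(f"gap of {delta} between {gap_start} and {gap_end}")
print(f"Pearson r (fingerstick vs. glucose): {r:.3f}")
```

Dropping incomplete pairs before computing the coefficient is one simple way to handle sparse features such as fingerstick readings; it is a choice made for this sketch, not a description of how the study treated missing values.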

Details

Primary Language

English

Subjects

Machine Learning Algorithms, Bioinformatics

Journal Section

Research Article

Publication Date

January 31, 2026

Submission Date

June 5, 2025

Acceptance Date

September 13, 2025

Published in Issue

Year 2026 Volume: 14 Number: 1

APA
Musa, T. O., Adjevi, A., Jaccojwang, D. O., Adeleye, R. N., Opeyemi, D. A., Uzun, S., Yıldız, M. Z., Lazim, A., & Mwita, R. P. (2026). Enhancing the OhioT1DM Dataset for Predictive Modeling: Exploratory Analysis, Feature Engineering, and Correlation Insights. Academic Platform Journal of Engineering and Smart Systems, 14(1), 26-35. https://doi.org/10.21541/apjess.1714659
AMA
1. Musa TO, Adjevi A, Jaccojwang DO, et al. Enhancing the OhioT1DM Dataset for Predictive Modeling: Exploratory Analysis, Feature Engineering, and Correlation Insights. APJESS. 2026;14(1):26-35. doi:10.21541/apjess.1714659
Chicago
Musa, Taofiq Olanrewaju, Arsene Adjevi, Donaldo Omondi Jaccojwang, et al. 2026. “Enhancing the OhioT1DM Dataset for Predictive Modeling: Exploratory Analysis, Feature Engineering, and Correlation Insights”. Academic Platform Journal of Engineering and Smart Systems 14 (1): 26-35. https://doi.org/10.21541/apjess.1714659.
EndNote
Musa TO, Adjevi A, Jaccojwang DO, Adeleye RN, Opeyemi DA, Uzun S, Yıldız MZ, Lazim A, Mwita RP (January 1, 2026) Enhancing the OhioT1DM Dataset for Predictive Modeling: Exploratory Analysis, Feature Engineering, and Correlation Insights. Academic Platform Journal of Engineering and Smart Systems 14 1 26–35.
IEEE
[1] T. O. Musa et al., “Enhancing the OhioT1DM Dataset for Predictive Modeling: Exploratory Analysis, Feature Engineering, and Correlation Insights”, APJESS, vol. 14, no. 1, pp. 26–35, Jan. 2026, doi: 10.21541/apjess.1714659.
ISNAD
Musa, Taofiq Olanrewaju - Adjevi, Arsene - Jaccojwang, Donaldo Omondi - Adeleye, Raheem Nasirudeen - Opeyemi, Diyaolu Abdulmalik - Uzun, Süleyman - Yıldız, Mustafa Zahid - Lazim, Ali - Mwita, Rhobi Peter. “Enhancing the OhioT1DM Dataset for Predictive Modeling: Exploratory Analysis, Feature Engineering, and Correlation Insights”. Academic Platform Journal of Engineering and Smart Systems 14/1 (January 1, 2026): 26-35. https://doi.org/10.21541/apjess.1714659.
JAMA
1. Musa TO, Adjevi A, Jaccojwang DO, Adeleye RN, Opeyemi DA, Uzun S, Yıldız MZ, Lazim A, Mwita RP. Enhancing the OhioT1DM Dataset for Predictive Modeling: Exploratory Analysis, Feature Engineering, and Correlation Insights. APJESS. 2026;14:26–35.
MLA
Musa, Taofiq Olanrewaju, et al. “Enhancing the OhioT1DM Dataset for Predictive Modeling: Exploratory Analysis, Feature Engineering, and Correlation Insights”. Academic Platform Journal of Engineering and Smart Systems, vol. 14, no. 1, Jan. 2026, pp. 26-35, doi:10.21541/apjess.1714659.
Vancouver
1. Taofiq Olanrewaju Musa, Arsene Adjevi, Donaldo Omondi Jaccojwang, Raheem Nasirudeen Adeleye, Diyaolu Abdulmalik Opeyemi, Süleyman Uzun, Mustafa Zahid Yıldız, Ali Lazim, Rhobi Peter Mwita. Enhancing the OhioT1DM Dataset for Predictive Modeling: Exploratory Analysis, Feature Engineering, and Correlation Insights. APJESS. 2026 Jan. 1;14(1):26-35. doi:10.21541/apjess.1714659

Academic Platform Journal of Engineering and Smart Systems