Research Article

Discovery and Comparison of the Relationship between Sleep Health and Lifestyle with PCA, Naive Bayes and Random Forest Trees Methods

Year 2024, Volume: 8 Issue: 1, 41 - 56
https://doi.org/10.33461/uybisbbd.1415925

Abstract

Sleep is considered a fundamental element of daily life and plays a crucial role in maintaining overall health and well-being. This study aims to develop a predictive model using the "Sleep Health Lifestyle" dataset downloaded from the Kaggle platform. The model is constructed using Principal Component Analysis (PCA), Naive Bayes, and Random Forest methods, and its performance is evaluated. Additionally, the dataset undergoes dimensionality reduction through the PCA module in the KNIME platform, and the results are presented. Relationships between attributes that influence sleep quality are determined through correlation calculations. Furthermore, the dataset is analyzed using the Naive Bayes and Random Forest methods, and the prediction results are assessed using the KNIME platform. The results are presented in tabular form. The scatter matrices of these comparisons are visualized using the Scatter Plot module in the KNIME platform. The primary contribution of this study is to identify the most effective methodology for mining datasets containing sleep-related information. The findings are discussed in the conclusion section.
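
As a rough illustration of two of the steps named in the abstract (the correlation calculation and Naive Bayes prediction), the following is a minimal Python sketch using only the standard library. It is not the authors' KNIME workflow and does not use the Kaggle dataset: the data are synthetic, and the column names (sleep_hours, stress), the "good"/"poor" label, the 7-hour threshold, and the 150/50 split are all assumptions made for this example. PCA and Random Forest are left out to keep the sketch short.

```python
import math
import random
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_gaussian_nb(rows, labels):
    """Estimate a prior and per-feature (mean, std) pairs for each class."""
    model = {}
    for cls in set(labels):
        members = [r for r, l in zip(rows, labels) if l == cls]
        # Small constant keeps the std strictly positive.
        stats = [(mean(col), pstdev(col) + 1e-9) for col in zip(*members)]
        model[cls] = (len(members) / len(rows), stats)
    return model


def predict(model, row):
    """Return the class with the highest Gaussian log-posterior for one row."""
    def log_posterior(cls):
        prior, stats = model[cls]
        score = math.log(prior)
        for x, (mu, sigma) in zip(row, stats):
            score -= math.log(sigma * math.sqrt(2 * math.pi))
            score -= (x - mu) ** 2 / (2 * sigma ** 2)
        return score
    return max(model, key=log_posterior)


# Synthetic stand-in data (assumption: NOT the Kaggle dataset).
random.seed(0)
rows, labels = [], []
for _ in range(200):
    stress = random.uniform(1, 10)
    sleep_hours = 9.0 - 0.35 * stress + random.gauss(0, 0.4)
    rows.append((sleep_hours, stress))
    labels.append("good" if sleep_hours >= 7.0 else "poor")

# Correlation between the two hypothetical attributes.
r = pearson([row[0] for row in rows], [row[1] for row in rows])

# Simple hold-out evaluation: first 150 rows for training, last 50 for testing.
train_rows, test_rows = rows[:150], rows[150:]
train_labels, test_labels = labels[:150], labels[150:]
model = fit_gaussian_nb(train_rows, train_labels)
predictions = [predict(model, row) for row in test_rows]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)

print(f"correlation(sleep_hours, stress) = {r:.2f}")
print(f"naive Bayes hold-out accuracy = {accuracy:.2f}")
```

Because the synthetic sleep duration is generated as a decreasing function of stress, the printed correlation is negative by construction; the numbers say nothing about the results reported in the article.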

References

  • Breiman, L. (2001). Random forests. Machine learning, 45, 5-32.
  • Brink-Kjaer, A., Leary, E. B., Sun, H., Westover, M. B., Stone, K. L., Peppard, P. E., ..., Mignot, E. (2022). Age estimation from sleep studies using deep learning predicts life expectancy. NPJ digital medicine, 5(1), 103.
  • Bro, R., Smilde, A. K. (2014). Principal component analysis. Analytical methods, 6(9), 2812-2831.
  • Buysse, D. J. (2014). Sleep health: can we define it? Does it matter?. Sleep, 37(1), 9-17.
  • Dietz, C., Rueden, C. T., Helfrich, S., Dobson, E. T., Hom, M., Eglinger, J., ..., Eliceiri, K. W. (2020). Integration of the ImageJ ecosystem in Knime analytics platform. Frontiers in computer science, 2, 8.
  • Fillbrunn, A., Dietz, C., Pfeuffer, J., Rahn, R., Landrum, G. A., Berthold, M. R. (2017). KNIME for reproducible cross-domain analysis of life science data. Journal of biotechnology, 261, 149-156.
  • Ghose, S. M., Dzierzewski, J. M., Dautovich, N. D. (2023). Sleep and self-efficacy: The role of domain specificity in predicting sleep health. Sleep Health, 9(2), 190-195.
  • Hale, L., Troxel, W., Buysse, D. J. (2020). Sleep health: an opportunity for public health to address health equity. Annual review of public health, 41, 81-99.
  • Hastie, T., Tibshirani, R., Friedman, J. H. (2009). The elements of statistical learning: data mining, inference, and prediction (Vol. 2, pp. 1-758). New York: Springer.
  • Karamizadeh, S., Abdullah, S. M., Manaf, A. A., Zamani, M., Hooman, A. (2013). An overview of principal component analysis. Journal of Signal and Information Processing, 4(3B), 173-175.
  • Kaya, M., Özel, S. A. (2014). Açık kaynak kodlu veri madenciliği yazılımlarının karşılaştırılması. Akademik Bilişim, 1-8.
  • Maraza-Quispe, B., Valderrama-Chauca, E. D., Cari-Mogrovejo, L. H., Apaza-Huanca, J. M., Sanchez-Ilabaca, J. (2022). A predictive model implemented in knime based on learning analytics for timely decision making in virtual learning environments. International Journal of Information and Education Technology, 12(2), 91-99.
  • Pal, M. (2005). Random forest classifier for remote sensing classification. International journal of remote sensing, 26(1), 217-222.
  • Ricciardi, C., Valente, A. S., Edmund, K., Cantoni, V., Green, R., Fiorillo, A., ..., Cesarelli, M. (2020). Linear discriminant analysis and principal component analysis to predict coronary artery disease. Health informatics journal, 26(3), 2181-2192.
  • Rish, I. (2001). An empirical study of the naive Bayes classifier. In IJCAI 2001 workshop on empirical methods in artificial intelligence, 3(22), 41-46.
  • Tharmalingam, L. (2023). Sleep Health and Lifestyle Dataset. https://www.kaggle.com/datasets/uom190346a/sleep-health-and-lifestyle-dataset

Uyku Sağlığı ile Yaşam Tarzı Arasındaki İlişkinin PCA, Naive Bayes ve Rastgele Orman Ağaçları Yöntemleri ile İncelenmesi ve Karşılaştırılması

Abstract

Sleep is regarded as a fundamental element of daily life and plays an important role in maintaining overall health and well-being. This research aims to build a predictive model using the "Sleep Health Lifestyle" dataset obtained from the Kaggle platform, to evaluate this model with the Principal Component Analysis (PCA), Naive Bayes, and Random Forest methods, and to produce visualizations. The dataset was subjected to dimensionality reduction with the PCA module on the KNIME platform, and the resulting outputs are presented. Relationships between the attributes thought to affect sleep quality were determined through correlation calculations. In addition, the dataset was analyzed with the Naive Bayes and Random Forest methods, and the prediction results were evaluated in the KNIME environment. The results are presented in tables. The scatter matrices of these comparisons were visualized using the Scatter Plot module on the KNIME platform. The most important contribution of this study is to identify the most effective of the methodologies that can be used on datasets containing sleep data. The findings are addressed in detail in the discussion and conclusions section.

There are 16 citations in total.

Details

Primary Language: Turkish
Subjects: Data Quality, Data Mining and Knowledge Discovery
Journal Section: Research Paper
Authors

Serkan Ayan (ORCID: 0000-0003-3041-2324)

Turgay Tugay Bilgin (ORCID: 0000-0002-9245-5728)

Early Pub Date: March 17, 2024
Submission Date: January 7, 2024
Acceptance Date: February 29, 2024
Published in Issue: Year 2024, Volume: 8, Issue: 1

Cite

APA Ayan, S., & Bilgin, T. T. (2024). Uyku Sağlığı ile Yaşam Tarzı Arasındaki İlişkinin PCA, Naive Bayes ve Rastgele Orman Ağaçları Yöntemleri ile İncelenmesi ve Karşılaştırılması. Uluslararası Yönetim Bilişim Sistemleri Ve Bilgisayar Bilimleri Dergisi, 8(1), 41-56. https://doi.org/10.33461/uybisbbd.1415925