Research Article

Prediction of Hard Disk Failures in Data Centers: A Comparative Study with Decision Tree-Based Machine Learning Models

Year 2025, Volume: 8 Issue: 1, 34 - 44, 31.12.2025

Abstract

In an increasingly digital world, data centers serve as the backbone of information technology and are critical infrastructure worldwide. However, the hard disks used in these centers often have high failure rates, which lead to data loss, service interruptions, and increased operational costs. This study presents a comparative analysis of decision tree-based machine learning methods for predicting hard disk failures from SMART (Self-Monitoring, Analysis, and Reporting Technology) data. Extensive SMART data from active data centers is gathered, preprocessed for modeling, and used to evaluate four machine learning models: decision trees, random forests, XGBoost, and LightGBM. Model performance is assessed using metrics such as accuracy, false alarm rate (FAR), missed alarm rate (MAR), and a general metric (GM). Predictions for long-term (one month in advance) and short-term (one week in advance) horizons are analyzed separately. Following a comprehensive model selection process, the selected LightGBM model achieves 85.2% accuracy and 82.9% GM for short-term predictions, while the selected random forest model performs better for long-term predictions, with 88.7% accuracy and 87.6% GM.
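To make the evaluation metrics concrete, the following is a minimal Python sketch of how accuracy, FAR, and MAR might be computed from binary predictions. It assumes the definitions commonly used in the disk failure prediction literature (FAR as the share of healthy disks flagged as failing, MAR as the share of failing disks that are missed); the abstract does not state the exact formulas, so these may differ from the ones used in the study. The GM is left out because its definition is not given here. The names `AlarmMetrics` and `alarm_metrics` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AlarmMetrics:
    accuracy: float
    far: float  # false alarm rate (assumed: FP / (FP + TN))
    mar: float  # missed alarm rate (assumed: FN / (FN + TP))


def alarm_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> AlarmMetrics:
    """Compute accuracy, FAR and MAR for binary labels (1 = disk fails, 0 = healthy)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    # A rate is undefined when its class is absent; this sketch reports 0.0 then.
    far = fp / (fp + tn) if (fp + tn) else 0.0
    mar = fn / (fn + tp) if (fn + tp) else 0.0
    return AlarmMetrics(accuracy=(tp + tn) / total, far=far, mar=mar)


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the study.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
    print(alarm_metrics(y_true, y_pred))
```

Because failed disks are typically a small minority of the samples, accuracy alone can look high even when most failures are missed, which is one reason FAR and MAR are reported alongside it.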

Project Number

AGTMPR94351

Supporting Institution

Ministry of Industry and Technology of the Republic of Türkiye (T.C. Sanayi ve Teknoloji Bakanlığı)

Details

Primary Language: Turkish
Subjects: Information Systems Development Methodologies and Practice
Section: Research Article
Authors

Aykut Müderrisoğlu 0009-0003-4447-1971

Mustafa Özkan 0009-0000-0286-6092

Project Number: AGTMPR94351
Submission Date: 23 December 2024
Acceptance Date: 24 March 2025
Publication Date: 31 December 2025
Published in Issue: Year 2025 Volume: 8 Issue: 1

Cite

APA Müderrisoğlu, A., & Özkan, M. (2025). Veri Merkezlerinde Kullanılan Sabit Disklerdeki Arızaların Önceden Kestirimi: Karar Ağacı Tabanlı Makine Öğrenmesi Modelleri ile Karşılaştırmalı Bir Çalışma. Journal of Investigations on Engineering and Technology, 8(1), 34-44.