Cloud computing provides scalable computing and storage resources for big healthcare data. Efficient resource utilisation is the most critical factor in processing large-scale data in a reasonable time. However, because distributed computing frameworks are complex and heterogeneous, resource utilisation is often lower than expected. Moreover, predicting resource usage under real-world faults in such large and complex systems is challenging. In this study, we propose an online resource utilisation prediction model that combines machine learning (ML) methods with an automated log data preprocessing technique. The model forecasts future resource consumption so that resources can be provisioned automatically for cloud-based big data systems in which common faults occur, including CPU, memory, network, and data locality faults. Our experiments using the Hadoop framework in a cloud environment show that our ML-based models predict resource usage with high accuracy in environments where different faults occur at the same time. The model can also locate the resource bottlenecks that cause inefficient resource utilisation in big data systems.
Keywords: Big data, MapReduce, Machine learning, Resource utilisation, Prediction
| Primary Language | English |
|---|---|
| Subjects | Information Systems User Experience Design and Development, Decision Support and Group Support Systems, Information Systems (Other) |
| Section | Research Article |
| Authors | |
| Submission Date | September 10, 2024 |
| Acceptance Date | April 12, 2025 |
| Publication Date | June 27, 2025 |
| Published in Issue | Year 2025 Volume: 14 Issue: 2 |
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.