Cloud computing provides scalable computing and storage resources for big healthcare data. Efficient resource utilisation is the most critical factor in processing large-scale data within a reasonable time. Because distributed computing frameworks are complex and heterogeneous, resource utilisation is often lower than expected. Moreover, predicting resource usage in such large and complex systems is challenging when real-world faults occur. In this study, we propose an online resource utilisation prediction model that combines machine learning (ML) methods with an automated log data preprocessing technique. The model forecasts future resource consumption so that resources can be provisioned automatically for cloud-based big data systems subject to common faults, including CPU, memory, network, and data locality faults. Our experiments with the Hadoop framework in a cloud environment show that our ML-based models predict resource usage with high accuracy in environments where different faults occur simultaneously. The model can also easily and accurately locate the resource bottlenecks that cause inefficient resource utilisation in big data systems.
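The abstract does not specify which ML models or log formats the study uses, so the following is only a minimal sketch of the general pipeline it describes: raw log lines are preprocessed into utilisation samples, an online model forecasts the next value per resource, and resources whose forecast exceeds a threshold are flagged as likely bottlenecks. Everything here is an assumption for illustration: the log format, the names `preprocess`, `NLMSForecaster` and `flag_bottlenecks`, and the 90% threshold are hypothetical. The forecaster is a normalised least-mean-squares (NLMS) adaptive filter, a simple stand-in for an online ML model, not the authors' method.

```python
import re
from collections import deque

# Hypothetical log format assumed for this sketch (not taken from the study):
#   "2024-09-10T12:00:00 node=worker1 cpu=73.5 mem=61.2"
LOG_PATTERN = re.compile(r"cpu=(?P<cpu>\d+(?:\.\d+)?)\s+mem=(?P<mem>\d+(?:\.\d+)?)")


def preprocess(lines):
    """Turn raw log lines into (cpu, mem) utilisation samples in percent.

    Lines that do not match the pattern (stack traces, banners, truncated
    records) are dropped rather than imputed.
    """
    samples = []
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match is not None:
            samples.append((float(match["cpu"]), float(match["mem"])))
    return samples


class NLMSForecaster:
    """One-step-ahead forecaster trained online with normalised LMS.

    The input vector is the last `lags` observations plus a constant 1.0
    (bias term); the weights are adjusted after every new observation.
    """

    def __init__(self, lags=3, step=0.5, eps=1e-6):
        self.lags = lags
        self.step = step  # NLMS step size; stable for 0 < step < 2
        self.eps = eps    # guards against division by zero
        self.weights = [0.0] * (lags + 1)
        self.history = deque(maxlen=lags)

    def _features(self):
        return list(self.history) + [1.0]

    def predict(self):
        """Forecast the next value, or None until `lags` values are seen."""
        if len(self.history) < self.lags:
            return None
        x = self._features()
        return sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, observed):
        """Learn from the newly observed value, then append it to history."""
        predicted = self.predict()
        if predicted is not None:
            x = self._features()
            error = observed - predicted
            norm = self.eps + sum(xi * xi for xi in x)
            self.weights = [w + self.step * error * xi / norm
                            for w, xi in zip(self.weights, x)]
        self.history.append(observed)


def flag_bottlenecks(forecasts, threshold=90.0):
    """Return the resources whose forecast utilisation exceeds `threshold`."""
    return sorted(name for name, value in forecasts.items()
                  if value is not None and value > threshold)


if __name__ == "__main__":
    raw = ["java.lang.OutOfMemoryError: Java heap space"]  # non-metric line
    raw += [f"node=worker1 cpu={95.0 - (i % 2)} mem=40.0" for i in range(40)]
    cpu_model, mem_model = NLMSForecaster(), NLMSForecaster()
    for cpu, mem in preprocess(raw):
        cpu_model.update(cpu)
        mem_model.update(mem)
    forecasts = {"cpu": cpu_model.predict(), "mem": mem_model.predict()}
    print(forecasts, flag_bottlenecks(forecasts))
```

NLMS was chosen here because it updates in constant time per observation and normalises each step by the input energy, so it stays stable on raw percentage values without a separate scaling step; a real deployment would more likely use richer features (network, data locality) and a stronger model.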
Primary Language | English |
---|---|
Subjects | Information Systems User Experience Design and Development, Decision Support and Group Support Systems, Information Systems (Other) |
Journal Section | Articles |
Authors | |
Publication Date | June 27, 2025 |
Submission Date | September 10, 2024 |
Acceptance Date | April 12, 2025 |
Published in Issue | 2025, Volume 14, Issue 2 |
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.