Research Article

Prediction of the pH Value of Water Quality Using Random Forest and LightGBM Algorithms

Year 2025, Volume: 11 Issue: 1, 42 - 49, 28.03.2025
https://doi.org/10.58626/memba.1667338

Abstract

This study aims to compare the Random Forest Regression and LightGBM algorithms for predicting pH, an important parameter in water quality assessment. In analyses carried out on a large dataset obtained from the Kaggle platform, the performance of both algorithms was evaluated with metrics such as RMSE, R-squared and AUC (Area Under Curve). The results show that LightGBM, with an AUC of 0.86, outperformed Random Forest (0.84) and provided better prediction accuracy, especially on large and complex datasets. These findings demonstrate the applicability of machine learning techniques to environmental monitoring and their potential for the effective management of water quality. While the results highlight the superiority of LightGBM for environmental problems such as pH prediction, they also offer suggestions for more comprehensive approaches. To extend the findings, the study recommends applying hybrid modeling techniques, carrying out generalizable analyses with datasets from different water sources, and developing real-time monitoring systems. By underlining once more the importance of machine learning algorithms in environmental monitoring and water quality management, the study contributes to the literature.

References

  • Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
  • Elsenety, M. M., Mohamed, M. B. I., Sultan, M. E., & Elsayed, B. A. (2022). Facile and highly precise pH-value estimation using common pH paper based on machine learning techniques and supported mobile devices. Scientific Reports, 12(22584). https://doi.org/10.1038/s41598-022-27054-5
  • Ganapa, J. R., Choudari, S., & Rao, M. K. (2024). Gold price prediction using random forest regression. Educational Administration: Theory and Practice, 30(1), 1052–1055. https://doi.org/10.53555/kuey.v30i1.5928
  • Gao, B., & Balyan, V. (2022). Construction of a financial default risk prediction model based on the LightGBM algorithm. Journal of Intelligent Systems, 31(767–779). https://doi.org/10.1515/jisys-2022-0036
  • Iyer, S., Kaushik, S., & Nandal, P. (2023). Water quality prediction using machine learning. Manav Rachna International Journal of Engineering and Technology, 10(1), 59-68. https://doi.org/10.58864/mrijet.2023.10.1.8
  • Kaggle, https://www.kaggle.com/datasets/somasreemajumder/waterdataset , (30.12.2024).
  • Karaatlı, M., Helvacıoğlu, Ö. C., Ömürbek, N., & Tokgöz, G. (2012). Yapay sinir ağlari yöntemi ile otomobil satiş tahmini. Uluslararası Yönetim İktisat ve İşletme Dergisi, 8(17), 87-100.
  • Koranga, M., et al. (2022). Machine learning algorithms for water quality prediction for Nanital Lake, Uttarakhand. International Journal of Advanced Research, 10(2), 103-114.
  • Li, Y., Zou, C., Berecibar, M., Nanini-Maury, E., Chand, J. C.-W., van den Bossche, P., Van Mierlo, J., & Omar, N. (2018). Random forest regression for online capacity estimation of lithium-ion batteries. Applied Energy, 232, 197–210. https://doi.org/10.1016/j.apenergy.2018.09.182
  • Liang, Y., Wu, J., Wang, W., Cao, Y., Zhong, B., & Chen, Z. (2019). Product marketing prediction based on XGBoost and LightGBM algorithm. AIPR 2019, ACM. https://doi.org/10.1145/3357254.3357290
  • Meybeck, M., Peters, N. E., & Chapman, D. V. (2006). Water quality. Encyclopedia of hydrological sciences.
  • Nayak, B., & Panda, P. K. (2024). A Comprehensive Review of Water Quality Analysis. International Journal of Image and Graphics, 2650033.
  • Nguyen, X. C., Nguyen, T. T. H., La, D. D., Kumar, G., Rene, E. R., Nguyen, D. D., ... & Nguyen, V. K. (2021). Development of machine learning-based models to forecast solid waste generation in residential areas: A case study from Vietnam. Resources, Conservation and Recycling, 167, 105381.
  • Omer, N. H. (2019). Water quality parameters. Water quality-science, assessments and policy, 18, 1-34.
  • Rogers III, O. N., & Ambili, P. S. (2024). Water Quality Prediction with Machine Learning Algorithms. EPRA International Journal of Multidisciplinary Research (IJMR), 10(4), 82-86.
  • Segal, M. R. (2003). Machine learning benchmarks and random forest regression. Biostatistics Division, University of California, San Francisco.
  • Song, J., Liu, G., Jiang, J., Zhang, P., & Liang, Y. (2021). Prediction of protein–ATP binding residues based on ensemble of deep convolutional neural networks and LightGBM algorithm. International Journal of Molecular Sciences, 22(939). https://doi.org/10.3390/ijms22020939
  • Stackelberg, P. E., Belitz, K., Brown, C. J., Erickson, M. L., Elliott, S. M., Kauffman, L. J., Ransom, K. M., & Reddy, J. E. (2021). Machine learning predictions of pH in the glacial aquifer system, northern USA. Groundwater, 59(3), 352-368. https://doi.org/10.1111/gwat.13063
  • Tziachris, P., Aschonitis, V., Chatzistathis, T., Papadopoulou, M., & Doukas, I. D. (2020). Comparing machine learning models and hybrid geostatistical methods using environmental and soil covariates for soil pH prediction. International Journal of Geo-Information, 9(4), 276. https://doi.org/10.3390/ijgi9040276.
  • Yang, Y., Wu, Y., Wang, P., & Xu, J. (2021). Stock price prediction based on XGBoost and LightGBM. E3S Web of Conferences, 275 (01040). https://doi.org/10.1051/e3sconf/20212750104

Prediction of Water Quality’s pH value using Random Forest and LightGBM Algorithms


Abstract

This study compares the Random Forest Regression and LightGBM algorithms for predicting pH, an important parameter in water quality assessment. Both algorithms were run on a large dataset obtained from the Kaggle platform, and their performance was evaluated with metrics such as RMSE, R-squared and AUC (Area Under Curve). LightGBM reached an AUC of 0.86 against 0.84 for Random Forest and gave better prediction accuracy, especially on large and complex datasets. These findings demonstrate the applicability of machine learning techniques to environmental monitoring and their potential for the effective management of water quality. The results highlight the superiority of LightGBM for environmental problems such as pH prediction, and they also point to more comprehensive approaches. To extend the findings, the study suggests applying hybrid modeling techniques, carrying out generalizable analyses with datasets from different water sources, and developing real-time monitoring systems. The study contributes to the literature by demonstrating the importance of machine learning algorithms in environmental monitoring and water quality management.
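The three metrics named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the author's code: the pH values are made-up illustrative numbers, and because AUC is defined for binary classification while the abstract does not say how it was obtained for a pH regression task, the sketch assumes (hypothetically) that observed pH is turned into a binary label with a cut-off named `PH_THRESHOLD`.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half (rank-based definition)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative values only; they are NOT taken from the study's dataset.
y_true = [6.8, 7.2, 7.9, 8.4, 6.1, 7.0]   # observed pH
y_pred = [6.9, 7.0, 8.1, 8.0, 6.5, 7.1]   # pH predicted by some model

# Hypothetical cut-off used only to obtain binary labels for AUC.
PH_THRESHOLD = 7.5
labels = [1 if t >= PH_THRESHOLD else 0 for t in y_true]

model_rmse = rmse(y_true, y_pred)
model_r2 = r_squared(y_true, y_pred)
model_auc = auc(labels, y_pred)  # predicted pH serves as the ranking score
```

Running the same three functions on the predictions of each model, over the same held-out samples, gives the kind of side-by-side comparison the abstract reports; lower RMSE and higher R-squared and AUC indicate the better model.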
There are 20 citations in total.

Details

Primary Language: English
Subjects: Ecology (Other)
Journal Section: Research Articles
Authors: İbrahim Budak (ORCID 0000-0001-7762-6114)
Publication Date: March 28, 2025
Submission Date: January 29, 2025
Acceptance Date: March 24, 2025
Published in Issue: Year 2025, Volume 11, Issue 1

Cite

APA Budak, İ. (2025). Prediction of Water Quality’s pH value using Random Forest and LightGBM Algorithms. MEMBA Su Bilimleri Dergisi, 11(1), 42-49. https://doi.org/10.58626/memba.1667338
AMA Budak İ. Prediction of Water Quality’s pH value using Random Forest and LightGBM Algorithms. MEMBA Su Bilimleri Dergisi. March 2025;11(1):42-49. doi:10.58626/memba.1667338
Chicago Budak, İbrahim. “Prediction of Water Quality’s pH Value Using Random Forest and LightGBM Algorithms”. MEMBA Su Bilimleri Dergisi 11, no. 1 (March 2025): 42-49. https://doi.org/10.58626/memba.1667338.
EndNote Budak İ (March 1, 2025) Prediction of Water Quality’s pH value using Random Forest and LightGBM Algorithms. MEMBA Su Bilimleri Dergisi 11 1 42–49.
IEEE İ. Budak, “Prediction of Water Quality’s pH value using Random Forest and LightGBM Algorithms”, MEMBA Su Bilimleri Dergisi, vol. 11, no. 1, pp. 42–49, 2025, doi: 10.58626/memba.1667338.
ISNAD Budak, İbrahim. “Prediction of Water Quality’s pH Value Using Random Forest and LightGBM Algorithms”. MEMBA Su Bilimleri Dergisi 11/1 (March 2025), 42-49. https://doi.org/10.58626/memba.1667338.
JAMA Budak İ. Prediction of Water Quality’s pH value using Random Forest and LightGBM Algorithms. MEMBA Su Bilimleri Dergisi. 2025;11:42–49.
MLA Budak, İbrahim. “Prediction of Water Quality’s pH Value Using Random Forest and LightGBM Algorithms”. MEMBA Su Bilimleri Dergisi, vol. 11, no. 1, 2025, pp. 42-49, doi:10.58626/memba.1667338.
Vancouver Budak İ. Prediction of Water Quality’s pH value using Random Forest and LightGBM Algorithms. MEMBA Su Bilimleri Dergisi. 2025;11(1):42-9.

Founded in 2013 as Menba Kastamonu Üniversitesi Su Ürünleri Fakültesi Dergisi, our journal continues publication as MEMBA Su Bilimleri Dergisi.
-----------
It is an online, open-access, international peer-reviewed journal published four times a year (March, June, September, December). It publishes original articles, short notes, technical notes, reports and reviews in English and Turkish in the aquatic sciences, covering biology, ecology, inland waters, fish nutrition, fishing, fisheries technology, fisheries economics and management, seafood processing technologies, water chemistry, microbiology, algal biotechnology, conservation of marine organisms, brackish and freshwater habitats and pollution, ecotoxicology, agricultural and environmental sustainability, climate and plant growth models, climate change, natural disasters, hydrometeorological disasters, remote sensing, geographic information technologies, coastal areas, arid and semi-arid topographies, spatial analysis and modeling, biogeography, physical geography, human and economic geography, geomorphology, environmental problems, animal and plant biotechnology, and animal and plant production.

MEMBA Su Bilimleri Dergisi is indexed in
TRDizin, SOBIAD, ASCI, CAB Direct, Google Scholar, Paperity, Asosindex, Academic Journal Index, and CNKI Scholar.
----------
A plagiarism similarity report must be uploaded when submitting a manuscript to the journal, and the similarity rate in that report must be below 30%. Authors must provide this report at the time of submission.
Manuscripts submitted in Turkish or English must include both Turkish and English abstracts.