TY - JOUR
T1 - Application of machine learning algorithms in the investigation of groundwater quality parameters over YSR district, India
AU - Mogaraju, Jagadish Kumar
PY - 2023
DA - January
DO - 10.31127/tuje.1032314
JF - Turkish Journal of Engineering
JO - TUJE
PB - Murat YAKAR
WT - DergiPark
SN - 2587-1366
SP - 64
EP - 72
VL - 7
IS - 1
LA - en
AB - Human life has been sustained through the availability of basic needs, and freshwater is one of them. However, groundwater quality is constantly under pressure. This can be attributed to anthropogenic activities, which are not limited to urban areas but extend to rural zones. Machine learning methods, namely Linear Discriminant Analysis (LDA), Classification and Regression Trees (CART), k-Nearest Neighbour (KNN), Support Vector Machines (SVM), and Random Forest (RF), were used to analyse groundwater quality variables. The mean accuracy of each classifier was calculated, and the obtained mean accuracies were 77.5% (LDA), 87% (CART), 96% (KNN), 93.5% (SVM) and 96% (RF). RF and KNN were selected as the optimal models because they had the highest mean accuracy (96%). This study made it apparent that machine learning algorithms can estimate and predict water quality variables with high accuracy. In this study, the observations and variables were compared with the water quality index and the drinking water limits provided by the Bureau of Indian Standards. The water quality index for each observation was calculated. An observation was assigned a value of 1 if at least four variables exceeded the prescribed limits, and a value of 2 if more than four variables did.
KW - Groundwater
KW - Machine learning
KW - Support Vector Machines
KW - Classification Trees
KW - Random Forest
CR - Aytaç, E. (2020). Unsupervised learning approach in defining the similarity of catchments: Hydrological response unit-based k-means clustering, a demonstration on Western Black Sea Region of Turkey. International Soil and Water Conservation Research, 8(3), 321–331. https://doi.org/10.1016/j.iswcr.2020.05.002
CR - Singha, S., Pasupuleti, S., Singha, S. S., Singh, R., & Kumar, S. (2021). Prediction of groundwater quality using efficient machine learning technique. Chemosphere, 276. https://doi.org/10.1016/j.chemosphere.2021.130265
CR - Bilali, A., Taleb, A., & Brouziyne, Y. (2021). Groundwater quality forecasting using machine learning algorithms for irrigation purposes. Agricultural Water Management, 245. https://doi.org/10.1016/j.agwat.2020.106625
CR - Yenugu, S. R., Vangala, S., & Badri, S. (2020a). Groundwater quality evaluation using GIS and water quality index in and around inactive mines, Southwestern parts of Cuddapah basin, Andhra Pradesh, South India. HydroResearch, 3, 146–157. https://doi.org/10.1016/j.hydres.2020.11.001
CR - Brindha, K., Pavelic, P., Sotoukee, T., Douangsavanh, S., & Elango, L. (2017). Geochemical Characteristics and Groundwater Quality in the Vientiane Plain, Laos. Exposure and Health, 9(2), 89–104. https://doi.org/10.1007/s12403-016-0224-8
CR - Reddy, B. M., Sunitha, V., Prasad, M., Reddy, Y. S., & Reddy, M. R. (2019). Evaluation of groundwater suitability for domestic and agricultural utility in semi-arid region of Anantapur, Andhra Pradesh State, South India. Groundwater for Sustainable Development, 9, 100262. https://doi.org/10.1016/j.gsd.2019.100262
CR - Datta, P. S., & Tyagi, S. K. (1996). Major Ion Chemistry of Groundwater in Delhi Area: Chemical Weathering Processes and Groundwater Flow Regime. Journal of Geological Society of India (Online Archive from Vol 1 to Vol 78), 47(2), 179–188.
CR - Raju, N. J. (2007). Hydrogeochemical parameters for assessment of groundwater quality in the upper Gunjanaeru River basin, Cuddapah District, Andhra Pradesh, South India. Environmental Geology, 52(6), 1067–1074. https://doi.org/10.1007/s00254-006-0546-0
CR - Ramakrishna Reddy, M., Janardhana Raju, N., Venkatarami Reddy, Y., & Reddy, T. V. K. (2000). Water resources development and management in the Cuddapah district, India. Environmental Geology, 39(3), 342–352. https://doi.org/10.1007/s002540050013
CR - Sreedevi, P. D. (2004a). Groundwater Quality of Pageru River Basin, Cuddapah District, Andhra Pradesh. Journal of Geological Society of India (Online Archive from Vol 1 to Vol 78), 64(5), 619–636.
CR - Bedi, S., Samal, A., Ray, C., & Snow, D. (2020). Comparative evaluation of machine learning models for groundwater quality assessment. Environmental Monitoring and Assessment, 192(12), 776. https://doi.org/10.1007/s10661-020-08695-3
CR - Mosavi, A., Hosseini, F. S., Choubin, B., Abdolshahnejad, M., Gharechaee, H., Lahijanzadeh, A., & Dineva, A. A. (2020). Susceptibility Prediction of Groundwater Hardness Using Ensemble Machine Learning Models. Water, 12(10), 2770. https://doi.org/10.3390/w12102770
CR - Sajedi-Hosseini, F., Malekian, A., Choubin, B., Rahmati, O., Cipullo, S., Coulon, F., & Pradhan, B. (2018). A novel machine learning-based approach for the risk assessment of nitrate groundwater contamination. Science of The Total Environment, 644, 954–962. https://doi.org/10.1016/j.scitotenv.2018.07.054
CR - Agrawal, P., Sinha, A., Kumar, S., Agarwal, A., Banerjee, A., Villuri, V. G. K., … Pasupuleti, S. (2021). Exploring Artificial Intelligence Techniques for Groundwater Quality Assessment. Water, 13(9), 1172. https://doi.org/10.3390/w13091172
CR - Tamiru, H., & Wagari, M. (2021). Comparison of ANN model and GIS tools for delineation of groundwater potential zones, Fincha Catchment, Abay Basin, Ethiopia. Geocarto International, 0(0), 1–19. https://doi.org/10.1080/10106049.2021.1946171
CR - Naghibi, S. A., Pourghasemi, H. R., & Abbaspour, K. (2018). A comparison between ten advanced and soft computing models for groundwater qanat potential assessment in Iran using R and GIS. Theoretical and Applied Climatology, 131(3), 967–984. https://doi.org/10.1007/s00704-016-2022-4
CR - Golkarian, A., Naghibi, S. A., Kalantar, B., & Pradhan, B. (2018). Groundwater potential mapping using C5.0, random forest, and multivariate adaptive regression spline models in GIS. Environmental Monitoring and Assessment, 190(3), 149. https://doi.org/10.1007/s10661-018-6507-8
CR - Acar, E., & Özerdem, M. S. (2020). On a yearly basis prediction of soil water content utilizing SAR data: A machine learning and feature selection approach. Turkish Journal of Electrical Engineering & Computer Sciences, 28(4), 2316–2330. Retrieved from https://online-journals.tubitak.gov.tr/publishedManuscriptDetails.htm?id=27563
CR - Acar, E., Ozerdem, M. S., & Ustundag, B. B. (2019). Machine Learning based Regression Model for Prediction of Soil Surface Humidity over Moderately Vegetated Fields. 2019 8th International Conference on Agro-Geoinformatics (Agro-Geoinformatics), 1–4. https://doi.org/10.1109/AgroGeoinformatics.2019.8820461
CR - Al-Adhaileh, M. H., & Alsaade, F. W. (2021). Modelling and prediction of water quality by using artificial intelligence. Sustainability, 13. https://doi.org/10.3390/su13084259
CR - https://indiawris.gov.in/wris/#/GWQuality
CR - http://cgwb.gov.in/GW-data-access.html
CR - Districts, India, 2016—University of Texas Libraries GeoData. (n.d.). Retrieved November 21, 2021, from https://geodata.lib.utexas.edu/catalog/stanford-sh819zz8121
CR - Yenugu, S. R., Vangala, S., & Badri, S. (2020b). Monitoring of groundwater quality for drinking purposes using the WQI method and its health implications around inactive mines in Vemula-Vempalli region, Kadapa District, South India. Applied Water Science, 10(8), 202. https://doi.org/10.1007/s13201-020-01284-2
CR - Sreedevi, P. D. (2004b). Groundwater quality of Pageru River basin, Cuddapah District, Andhra Pradesh. Journal of Geological Society of India, 64.
CR - Castro, C. L., & Braga, A. P. (2013). Novel cost-sensitive approach to improve the multilayer perceptron performance on imbalanced data. IEEE Transactions on Neural Networks and Learning Systems, 24. https://doi.org/10.1109/TNNLS.2013.2246188
CR - Collins, R., & Jerkins, A. (1996). The impact of agriculture land use on stream chemistry in the middle Hills of the Himalayas, Nepal. Journal of Hydrology, 185. https://doi.org/10.1016/0022-1694(95)03008-5
CR - Ako, A. A., Eyong, G. E. T., Shimada, J., Koike, K., Hosono, T., Ichiyanagi, K., … Roger, N. N. (2014). Nitrate contamination of groundwater in two areas of the Cameroon Volcanic Line (Banana Plain and Mount Cameroon area). Applied Water Science, 4(2), 99–113. https://doi.org/10.1007/s13201-013-0134-x
CR - Cateni, S., Colla, V., & Vannucci, M. (2014). A method for resampling imbalanced datasets in binary classification tasks for real-world problems. Neurocomputing, 135. https://doi.org/10.1016/J.NEUCOM.2013.05.059
CR - Ajmera, T. K., & Goyal, M. K. (2012). Development of stage discharge rating curve using model tree and neural networks: An application to Peachtree Creek in Atlanta. Expert Systems with Applications, 39. https://doi.org/10.1016/j.eswa.2011.11.101
CR - Zhou, Z. H., & Liu, X. Y. (2006). Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Transactions on Knowledge and Data Engineering, 18. https://doi.org/10.1109/TKDE.2006.17
CR - Zhang, C., Tang, Y., Xu, X., & Kiely, G. (2011). Towards spatial geochemical modelling: Use of geographically weighted regression for mapping soil organic carbon contents in Ireland. Applied Geochemistry, 26.
CR - Cunningham, P., & Delany, S. J. (2021). k-Nearest Neighbour Classifiers—A Tutorial. Conference Papers. https://doi.org/10.1145/3459665
CR - Celestino, A. E. M., Cruz, D. A. M., Sánchez, E. M. O., & Reyes, F. G. (n.d.). Groundwater Quality Assessment: An Improved Approach to K-Means Clustering, Principal Component Analysis and Spatial Analysis: A Case Study. Retrieved from https://core.ac.uk/display/156977871
CR - Biau, G. (2012). Analysis of a Random Forests Model. Journal of Machine Learning Research, 13(38), 1063–1095. Retrieved from http://jmlr.org/papers/v13/biau12a.html
CR - Hastie, T., Tibshirani, R., & Friedman, J. (2009). Random Forests. In T. Hastie, R. Tibshirani, & J. Friedman (Eds.), The Elements of Statistical Learning: Data Mining, Inference, and Prediction (pp. 587–604). New York, NY: Springer. https://doi.org/10.1007/978-0-387-84858-7_15
CR - Gislason, P. O., Benediktsson, J. A., & Sveinsson, J. R. (2006). Random Forests for land cover classification. Pattern Recognition Letters, 27(4), 294–300. https://doi.org/10.1016/j.patrec.2005.08.011
UR - https://doi.org/10.31127/tuje.1032314
L1 - https://dergipark.org.tr/en/download/article-file/2114776
ER -