TY  - JOUR
T1  - Spatiotemporal analysis and machine learning-based prediction of air quality in Indian urban cities
AU  - Jain, Rituraj
AU  - Singh, Sitesh Kumar
AU  - Palaniappan, Damodharan
AU  - Parmar, Kumar
AU  - T, Premavathi
AU  - Gothania, Jaishri
PY  - 2025
DA  - December
Y2  - 2024
DO  - 10.35208/ert.1587308
JF  - Environmental Research and Technology
JO  - ERT
PB  - Mehmet Sinan Bilgili
WT  - DergiPark
SN  - 2636-8498
SP  - 809
EP  - 822
VL  - 8
IS  - 4
LA  - en
AB  - Air pollution, more specifically fine particulate matter (PM2.5, particles with diameters of less than 2.5 micrometers), poses the most critical threat to public health in urban Indian cities, among which Delhi presents the most acute challenge. This study predicts PM2.5 concentrations with machine learning models using data from 2010 to 2023 and assesses model fit with the R², RMSE, MAE, and MAPE metrics. The models tested are Random Forest, Gradient Boosting, AdaBoost, Histogram-Based Gradient Boosting, and XGBoost. Random Forest fits the training set extremely well (R² = 0.99) but shows the highest degree of overfitting, with an R² of only 0.35 on the test set. Gradient Boosting gives a more balanced result, with R² values of 0.54 and 0.48 on the training and test sets, respectively, and lower errors (RMSE: 56.46, MAE: 39.60, MAPE: 0.50), which makes it a good predictor. AdaBoost performs worst, with a test-set R² of 0.28 and the highest errors (RMSE: 66.86, MAE: 52.34, MAPE: 0.94). Histogram-Based Gradient Boosting and XGBoost both achieve moderate accuracy, but Gradient Boosting is still slightly better than both in terms of RMSE and MAE. Gradient Boosting is therefore the most accurate model for predicting PM2.5 concentrations when both generalization and accuracy are considered. These results will be highly beneficial to policymakers in adopting machine learning-based air quality forecasting for better environmental management and the protection of public health.
KW  - Air quality prediction
KW  - gradient boosting
KW  - machine learning models
KW  - particulate matter (PM2.5)
KW  - random forest
KW  - spatiotemporal analysis
CR  - H. Liu, Q. Han, H. Sun, J. Sheng, and Z. Yang, “Spatiotemporal adaptive attention graph convolution network for city-level air quality prediction,” Scientific Reports, vol. 13(1), pp. 13335, 2023, doi: 10.1038/s41598-023-39286-0.
CR  - J. Duan, Y. Gong, J. Luo, and Z. Zhao, “Air-quality prediction based on the ARIMA-CNN-LSTM combination model optimized by dung beetle optimizer,” Scientific Reports, vol. 13(1), pp. 12127, 2023, doi: 10.1038/s41598-023-36620-4.
CR  - D. M and R. V, “Novel Regression and Least Square Support Vector Machine Learning Technique for Air Pollution Forecasting,” International Journal of Engineering Trends and Technology, vol. 71(4), pp. 147–158, 2023, doi: 10.14445/22315381/IJETT-V71I4P214.
CR  - X. Zhang, X. Jiang, and Y. Li, “Prediction of air quality index based on the SSA-BiLSTM-LightGBM model,” Scientific Reports, vol. 13(1), pp. 5550, 2023, doi: 10.1038/s41598-023-32775-2.
CR  - M. Bonas and S. Castruccio, “Calibration of SpatioTemporal forecasts from citizen science urban air pollution data with sparse recurrent neural networks,” The Annals of Applied Statistics, vol. 17(3), 2023, doi: 10.1214/22-AOAS1683.
CR  - R. López-Blanco, M. Chaveinte García, R. S. Alonso, J. Prieto, and J. M. Corchado, “Pollutant Time Series Analysis for Improving Air-Quality in Smart Cities,” International Journal of Interactive Multimedia and Artificial Intelligence, vol. 8(3), pp. 98, 2023, doi: 10.9781/ijimai.2023.08.005.
CR  - R. Guo, Y. Qi, B. Zhao, Z. Pei, F. Wen, S. Wu, and Q. Zhang, “High-Resolution Urban Air Quality Mapping for Multiple Pollutants Based on Dense Monitoring Data and Machine Learning,” International Journal of Environmental Research and Public Health, vol. 19(13), pp. 8005, 2022, doi: 10.3390/ijerph19138005.
CR  - M. Méndez, M. G. Merayo, and M. Núñez, “Machine learning algorithms to forecast air quality: a survey,” Artificial Intelligence Review, vol. 56(9), pp. 10031–10066, 2023, doi: 10.1007/s10462-023-10424-4.
CR  - S. Chowdhury, A. Pillarisetti, A. Oberholzer, J. Jetter, J. Mitchell, E. Cappuccilli, B. Aamaas, K. Aunan, A. Pozzer, and D. Alexander, “A global review of the state of the evidence of household air pollution’s contribution to ambient fine particulate matter and their related health impacts,” Environment International, vol. 173, pp. 107835, 2023, doi: 10.1016/j.envint.2023.107835.
CR  - J. Cheng, F. Li, L. Liu, H. Jiao, and L. Cui, “Spatiotemporal Variation Air Quality Index Characteristics in China’s Major Cities During 2014–2020,” Water, Air, & Soil Pollution, vol. 234(5), pp. 292, 2023, doi: 10.1007/s11270-023-06304-w.
CR  - R. R. Behera, D. R. Satapathy, A. Majhi, and C. R. Panda, “Spatiotemporal variation of atmospheric pollution and its plausible sources in an industrial populated city, Bay of Bengal, Paradip, India,” Urban Climate, vol. 37, pp. 100860, 2021, doi: 10.1016/j.uclim.2021.100860.
CR  - A. A. Khan, K. Garsa, P. Jindal, P. C. S. Devara, S. Tiwari, and P. B. Sharma, “Demographic Evaluation and Parametric Assessment of Air Pollutants over Delhi NCR,” Atmosphere (Basel), vol. 14(9), pp. 1390, 2023, doi: 10.3390/atmos14091390.
CR  - Vaishali, G. Verma, and R. M. Das, “Influence of Temperature and Relative Humidity on PM2.5 Concentration over Delhi,” MAPAN, vol. 38(3), pp. 759–769, 2023, doi: 10.1007/s12647-023-00656-8.
CR  - K. K. Rani Samal, K. Sathya Babu, A. Acharya, and S. K. Das, “Long Term Forecasting of Ambient Air Quality Using Deep Learning Approach,” in IEEE 17th India Council International Conference (INDICON), IEEE, Dec. 2020, pp. 1–6, doi: 10.1109/INDICON49873.2020.9342529.
CR  - M. Ansari and M. Alam, “An Intelligent IoT-Cloud-Based Air Pollution Forecasting Model Using Univariate Time-Series Analysis,” Arabian Journal for Science and Engineering, vol. 49(3), pp. 3135–3162, 2024, doi: 10.1007/s13369-023-07876-9.
CR  - K. Kumar and B. P. Pande, “Air pollution prediction with machine learning: a case study of Indian cities,” International Journal of Environmental Science and Technology, vol. 20(5), pp. 5333–5348, 2023, doi: 10.1007/s13762-022-04241-5.
CR  - S. M. Selvi, K. Ravikumar, A. D. Rajendran, A. B. Bagavathi, N. Narayanan, and V. Mangottiri, “Assessment of Air Quality Index in major cities of India - Lessons from Lockdown,” IOP Conference Series: Materials Science and Engineering, vol. 955(1), pp. 012079, 2020, doi: 10.1088/1757-899X/955/1/012079.
CR  - V. Sharma, S. Ghosh, S. Dey, and S. Singh, “Modelling PM2.5 for Data-Scarce Zone of Northwestern India using Multi Linear Regression and Random Forest Approaches,” Annals of GIS, vol. 29(3), pp. 415–427, 2023, doi: 10.1080/19475683.2023.2183523.
CR  - A. Masood and K. Ahmad, “Prediction of PM2.5 concentrations using soft computing techniques for the megacity Delhi, India,” Stochastic Environmental Research and Risk Assessment, vol. 37(2), pp. 625–638, 2023, doi: 10.1007/s00477-022-02291-2.
CR  - Central Pollution Control Board, “CPCB|Central Pollution Control Board,” cpcb.nic.in, 2019. https://cpcb.nic.in/
CR  - D. Kothandaraman, N. Praveena, K. Varadarajkumar, B. M. Rao, D. Dhabliya, S. Satla, and W. Abera, “Intelligent Forecasting of Air Quality and Pollution Prediction Using Machine Learning,” Adsorption Science & Technology, vol. 2022, pp. 1–15, 2022, doi: 10.1155/2022/5086622.
CR  - T. Toharudin, R. E. Caraka, I. R. Pratiwi, Y. Kim, P. U. Gio, A. D. Sakti, M. Noh, F. A. L. Nugraha, R. S. Pontoh, T. H. Putri, T. S. Azzahra, J. J. Cerelia, G. Darmawan, and B. Pardamean, “Boosting Algorithm to Handle Unbalanced Classification of PM2.5 Concentration Levels by Observing Meteorological Parameters in Jakarta-Indonesia Using AdaBoost, XGBoost, CatBoost, and LightGBM,” IEEE Access, vol. 11, pp. 35680–35696, 2023, doi: 10.1109/ACCESS.2023.3265019.
CR  - N. Doreswamy, H. K. S, Y. Km, and I. Gad, “Forecasting Air Pollution Particulate Matter (PM2.5) Using Machine Learning Regression Models,” Procedia Computer Science, vol. 171, pp. 2057–2066, 2020, doi: 10.1016/j.procs.2020.04.221.
CR  - A. Sarkar, S. S. Ray, A. Prasad, and C. Pradhan, “A Novel Detection Approach of Ground Level Ozone using Machine Learning Classifiers,” in Fifth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), Palladam, India, 2021, pp. 428–432, doi: 10.1109/I-SMAC52330.2021.9640852.
UR  - https://doi.org/10.35208/ert.1587308
L1  - https://dergipark.org.tr/en/download/article-file/4375405
ER  - 
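The abstract in the record above compares models by R², RMSE, MAE, and MAPE. As a minimal sketch (not the authors' code), the standard definitions of these four metrics can be computed as below. The function name `regression_metrics` and the sample values are hypothetical, and MAPE is returned as a fraction on the assumption that figures such as 0.50 and 0.94 in the abstract are fractions rather than percentages (the convention used by scikit-learn's `mean_absolute_percentage_error`).

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute R², RMSE, MAE and MAPE for paired observed/predicted values.

    Hypothetical helper for illustration. MAPE is returned as a fraction
    (0.50 means 50%), which is an assumption about the abstract's convention.
    """
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n

    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    rmse = math.sqrt(ss_res / n)                        # root mean squared error
    mae = sum(abs(r) for r in residuals) / n            # mean absolute error
    # Mean absolute percentage error; undefined if any observed value is zero.
    mape = sum(abs(r) / abs(t) for r, t in zip(residuals, y_true)) / n

    return {"r2": r2, "rmse": rmse, "mae": mae, "mape": mape}


if __name__ == "__main__":
    # Hypothetical PM2.5 observations and predictions, for illustration only.
    observed = [100.0, 200.0, 300.0]
    predicted = [110.0, 190.0, 330.0]
    m = regression_metrics(observed, predicted)
    # r2 ≈ 0.945, rmse ≈ 19.15, mae ≈ 16.67, mape ≈ 0.083
    print({k: round(v, 3) for k, v in m.items()})
```

Computing the same metrics on both the training and the test split is what exposes overfitting: with the figures reported in the abstract, the R² drop from training to test is 0.64 for Random Forest (0.99 to 0.35) but only 0.06 for Gradient Boosting (0.54 to 0.48).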