TY - JOUR
T1 - A Meta-Heuristic Algorithm-Based Feature Selection Approach to Improve Prediction Success for Salmonella Occurrence in Agricultural Waters
AU - Topalcengiz, Zeynal
AU - Demir, Murat
AU - Canayaz, Murat
PY - 2024
DA - January
Y2 - 2023
DO - 10.15832/ankutbd.1302050
JF - Journal of Agricultural Sciences
JO - J Agr Sci-Tarim Bili
PB - Ankara University
WT - DergiPark
SN - 1300-7580
SP - 118
EP - 130
VL - 30
IS - 1
LA - en
AB - The presence of Salmonella in agricultural waters may be a source of produce contamination. Recently, the performance of various algorithms has been tested for the prediction of indicator bacteria populations and pathogen occurrence in agricultural water sources. The purpose of this study was to evaluate the performance of meta-heuristic optimization algorithms for feature selection to increase the success of commonly used algorithms in predicting Salmonella occurrence in agricultural waters. Previously collected datasets from six agricultural ponds in Central Florida included the populations of indicator microorganisms, physicochemical water attributes, and weather station measurements. Salmonella presence, confirmed by PCR, was also reported in the data set. Features were selected by using binary meta-heuristic optimization methods including differential evolution optimization (DEO), grey wolf optimization (GWO), Harris hawks optimization (HHO) and particle swarm optimization (PSO). Each meta-heuristic method was run 100 times for the extraction of features before classification analysis. The features selected after optimization were used in the k-nearest neighbor (kNN), support vector machine (SVM) and decision tree (DT) classification methods. Microbiological indicators were ranked as the first or second features by all optimization algorithms. Generic Escherichia coli was selected as the first feature 81 and 91 times out of 100 using GWO and DEO, respectively.
Feature selection with the meta-heuristic optimization algorithms, followed by machine learning classification, yielded prediction accuracies between 93.57% and 95.55%. Meta-heuristic optimization algorithms improved Salmonella prediction success in agricultural waters despite spatio-temporal variations. This study indicates that computer-based tools built on improved meta-heuristic optimization algorithms can help growers assess the risk of Salmonella occurrence in specific agricultural water sources with greater prediction success.
KW - Optimization
KW - Support Vector Machine
KW - kNN
KW - Decision tree
KW - Water quality
CR - Abimbola O P, Mittelstet A R, Messer T L, Berry E D, Bartelt-Hunt S L & Hansen S P (2020). Predicting Escherichia coli loads in cascading dams with machine learning: An integration of hydrometeorology, animal density and grazing pattern. The Science of the Total Environment 722: 137894. https://doi.org/10.1016/j.scitotenv.2020.137894
CR - Akinola O O, Ezugwu A E, Agushaka J O, Zitar R A & Abualigah L (2022). Multiclass feature selection with metaheuristic optimization algorithms: a review. Neural Computing and Applications 34: 19751-19790. https://doi.org/10.1007/s00521-022-07705-4
CR - Agrawal P, Abutarboush H F, Ganesh T & Mohamed A W (2021). Metaheuristic algorithms on feature selection: A survey of one decade of research (2009-2019). IEEE Access 9: 26766-26791. https://doi.org/10.1109/ACCESS.2021.3056407
CR - Ashbolt N, Grabow W O K & Snozzi M (2001). Indicators of microbial water quality. In: L Fewtrell & J Bartram (Eds.), Water Quality: Guidelines, Standards and Health, World Health Organization (WHO) IWA Publishing pp. 289-316
CR - Ayhan S & Erdoğmuş Ş (2014). Kernel function selection for the solution of classification problems via support vector machines. Destek vektör makineleriyle sınıflandırma problemlerinin çözümü için çekirdek fonksiyonu seçimi (In Turkish).
Eskişehir Osmangazi University Journal of Economics and Administrative Sciences 9: 175-201
CR - Benjamin L, Atwill E R, Jay-Russell M, Cooley M, Carychao D, Gorski L & Mandrell R E (2013). Occurrence of generic Escherichia coli, E. coli O157 and Salmonella spp. in water and sediment from leafy green produce farms and streams on the Central California coast. International Journal of Food Microbiology 165(1): 65-76. https://doi.org/10.1016/j.ijfoodmicro.2013.04.003
CR - Blum C & Roli A (2003). Metaheuristics in combinatorial optimization: Overview and conceptual comparison. ACM Computing Surveys 35: 268-308. https://doi.org/10.1145/937503.937505
CR - Bradley A P (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition 30: 1145-1159. https://doi.org/10.1016/S0031-3203(96)00142-2
CR - Bradshaw J K, Snyder B J, Oladeinde A, Spidle D, Berrang M E, Meinersmann R J, Oakley B, Sidle R C, Sullivan K & Molina M (2016). Characterizing relationships among fecal indicator bacteria, microbial source tracking markers, and associated waterborne pathogen occurrence in stream water and sediments in a mixed land use watershed. Water Research 101: 498-509. https://doi.org/10.1016/j.watres.2016.05.014
CR - Budak H (2018). Feature selection methods and a new approach. Özellik seçim yöntemleri ve yeni bir yaklaşım (In Turkish). Süleyman Demirel University Journal of Natural and Applied Sciences 22: 21-31. https://doi.org/10.19113/sdufbed.01653
CR - Buyrukoğlu S (2021). New hybrid data mining model for prediction of Salmonella presence in agricultural waters based on ensemble feature selection and machine learning algorithms. Journal of Food Safety 41: 12903. https://doi.org/10.1111/jfs.12903
CR - Buyrukoğlu G, Buyrukoğlu S & Topalcengiz Z (2021).
Comparing regression models with count data to artificial neural network and ensemble models for prediction of generic Escherichia coli population in agricultural ponds based on weather station measurements. Microbial Risk Analysis 19: 100171. https://doi.org/10.1016/j.mran.2021.100171
CR - Buyrukoğlu S, Yılmaz Y & Topalcengiz Z (2022). Correlation value determined to increase Salmonella prediction success of deep neural network for agricultural waters. Environmental Monitoring and Assessment 194: 373. https://doi.org/10.1007/s10661-022-10050-7
CR - Canayaz M (2021). MH-COVIDNet: Diagnosis of COVID-19 using deep neural networks and meta-heuristic-based feature selection on X-ray images. Biomedical Signal Processing and Control 64: 102257. https://doi.org/10.1016/j.bspc.2020.102257
CR - Centers for Disease Control and Prevention (CDC) (2007). Multistate outbreaks of Salmonella infections associated with raw tomatoes eaten in restaurants--United States, 2005-2006. MMWR. Morbidity and Mortality Weekly Report 56(35): 909-911.
CR - Cortes C & Vapnik V (1995). Support-vector networks. Machine Learning 20: 273-297. https://doi.org/10.1007/BF00994018
CR - Çelik Y, Yıldız İ & Karadeniz A T (2019). A brief review of metaheuristic algorithms improved in the last three years. European Journal of Science and Technology pp. 463-477. https://doi.org/10.31590/ejosat.638431
CR - Das S & Suganthan P N (2011). Differential Evolution: A Survey of the State-of-the-Art. IEEE Transactions on Evolutionary Computation 15: 4-31. https://doi.org/10.1109/TEVC.2010.2059031
CR - Dokeroglu T, Deniz A & Kiziloz H E (2022). A comprehensive survey on recent metaheuristics for feature selection. Neurocomputing 494: 269-296. https://doi.org/10.1016/j.neucom.2022.04.083
CR - Emary E, Zawbaa H M & Hassanien A E (2016). Binary grey wolf optimization approaches for feature selection. Neurocomputing 172: 371-381. https://doi.org/10.1016/j.neucom.2015.06.083
CR - Food and Drug Administration (FDA) (2015).
Federal Register Notice: Standards for the Growing, Harvesting, Packing, and Holding of Produce for Human Consumption; Final Rule. Available at: https://www.gpo.gov/fdsys/pkg/FR-2015-11-27/pdf/2015-28159.pdf. Accessed 12 July 2022
CR - Grandini M, Bagli E & Visani G (2020). Metrics for Multi-Class Classification: An Overview. ArXiv, https://doi.org/10.48550/arXiv.2008.05756
CR - Greene S K, Daly E R, Talbot E A, Demma L J, Holzbauer S, Patel N J, Hill T A, Walderhaug M O, Hoekstra R M, Lynch M F & Painter J A (2008). Recurrent multistate outbreak of Salmonella Newport associated with tomatoes from contaminated fields, 2005. Epidemiology and Infection 136(2): 157-165. https://doi.org/10.1017/S095026880700859X
CR - Guo G, Wang H, Bell D, Bi Y & Greer K (2003). KNN model-based approach in classification. In: R Meersman et al (Eds.), On the move to meaningful internet systems 2003: CoopIS, DOA, and ODBASE. OTM 2003. Lecture Notes in Computer Science, Springer, pp. 986-996. https://doi.org/10.1007/978-3-540-39964-3_62
CR - Hand D, Mannila H & Smyth P (2001). Principles of data mining. A Bradford Book the MIT Press.
CR - Havelaar A H, Vazquez K M, Topalcengiz Z, Muñoz-Carpena R & Danyluk M D (2017). Evaluating the U.S. Food Safety Modernization Act Produce Safety Rule standard for microbial quality of agricultural water for growing produce. Journal of Food Protection 80: 1832-1841. https://doi.org/10.4315/0362-028X.JFP-17-122
CR - Heidari A A, Mirjalili S, Faris H, Aljarah I, Mafarja M & Chen H (2019). Harris hawks optimization: Algorithm and applications. Future Generation Computer Systems 97: 849-872. https://doi.org/10.1016/j.future.2019.02.028
CR - Imandoust S B & Bolandraftar M (2013). Application of K-nearest neighbor (KNN) approach for predicting economic events: Theoretical background. International Journal of Engineering Research and Applications 3: 605-610.
CR - Kennedy J & Eberhart R (1995). Particle swarm optimization.
Proceedings of ICNN'95 - International Conference on Neural Networks, 4: 1942-1948. https://doi.org/10.1109/ICNN.1995.488968
CR - Liang Y, Liao B & Zhu W (2017). An improved binary differential evolution algorithm to infer tumor phylogenetic trees. BioMed Research International 2017: 5482750. https://doi.org/10.1155/2017/5482750
CR - McEgan R, Mootian G, Goodridge L D, Schaffner D W & Danyluk M D (2013). Predicting Salmonella populations from biological, chemical, and physical indicators in Florida surface waters. Applied and Environmental Microbiology 79(13): 4094-4105. https://doi.org/10.1128/AEM.00777-13
CR - Mirjalili S, Mirjalili S M & Lewis A (2014). Grey wolf optimizer. Advances in Engineering Software 69: 46-61. https://doi.org/10.1016/j.advengsoft.2013.12.007
CR - Nitze I, Schulthess U & Asche H (2012). Comparison of machine learning algorithms random forest, artificial neural network and support vector machine to maximum likelihood for supervised crop type classification. Proceedings of the 4th GEOBIA 35-40.
CR - Osowski S, Siwek K & Markiewicz T (2004). MLP and SVM networks - a comparative study. Proceedings of the 6th Nordic Signal Processing Symposium pp. 37-40
CR - Phyu T Z & Oo N N (2016). Performance comparison of feature selection methods. MATEC Web of Conferences 42: 06002. https://doi.org/10.1051/matecconf/20164206002
CR - Polat H, Topalcengiz Z & Danyluk M D (2020). Prediction of Salmonella presence and absence in agricultural surface waters by artificial intelligence approaches. Journal of Food Safety 40: e12733. https://doi.org/10.1111/jfs.12733
CR - Price K V, Storn R M & Lampinen J A (2005). Differential evolution: A practical approach to global optimization, Springer https://doi.org/10.1007/3-540-
CR - Steele M, Mahdi A & Odumeru J (2005). Microbial assessment of irrigation water used for production of fruit and vegetables in Ontario, Canada. Journal of Food Protection 68(7): 1388-1392.
https://doi.org/10.4315/0362-028X-68.7.1388
CR - Storn R & Price K (1997). Differential evolution - A simple and efficient adaptive scheme for global optimization over continuous spaces. Journal of Global Optimization 11: 341-359. https://doi.org/10.1023/A:1008202821328
CR - Tharwat A (2018). Classification assessment methods. Applied Computing and Informatics 17: 168-192. https://doi.org/10.1016/j.aci.2018.08.003
CR - Too J, Abdullah A R, Mohd Saad N M, Ali N M & Tee W (2018). A new competitive binary grey wolf optimizer to solve the feature selection problem in EMG signals classification. Computers 7: 58. https://doi.org/10.3390/computers7040058
CR - Too J, Abdullah A R, Mohd Saad N M & Tee W (2019). EMG feature selection and classification using a Pbest-guide binary particle swarm optimization. Computation 7(1): 12. https://doi.org/10.3390/computation7010012
CR - Topalcengiz Z & Danyluk M D (2019). Fate of generic and Shiga toxin-producing Escherichia coli (STEC) in Central Florida surface waters and evaluation of EPA Worst Case water as standard medium. Food Research International 120: 322-329. https://doi.org/10.1016/j.foodres.2019.02.045
CR - Topalcengiz Z, McEgan R & Danyluk M D (2019). Fate of Salmonella in Central Florida surface waters and evaluation of EPA Worst Case Water as a standard medium. Journal of Food Protection 82(6): 916-925. https://doi.org/10.4315/0362-028X.JFP-18-331
CR - Topalcengiz Z, Strawn L K & Danyluk M D (2017). Microbial quality of agricultural water in Central Florida. PLoS ONE 12(4): e0174889. https://doi.org/10.1371/journal.pone.0174889
CR - Truchado P, Hernandez N, Gil M I, Ivanek R & Allende A (2018). Correlation between E. coli levels and the presence of foodborne pathogens in surface irrigation water: Establishment of a sampling program. Water Research 128: 226-233. https://doi.org/10.1016/j.watres.2017.10.041
CR - Weller D L, Love T, Belias A & Wiedmann M (2020).
Predictive models may complement or provide an alternative to existing strategies for assessing the enteric pathogen contamination status of northeastern streams used to provide water for produce production. Frontiers in Sustainable Food Systems 4: 561517. https://doi.org/10.3389/fsufs.2020.561517
CR - Yang X S (2011). Review of metaheuristics and generalized evolutionary walk algorithm. International Journal of Bio-Inspired Computation 3: 77-84. https://doi.org/10.1504/IJBIC.2011.039907
CR - Zhang Y, Liu R, Wang X, Chen H & Li C (2021). Boosted binary Harris hawks optimizer and feature selection. Engineering with Computers 37: 3741-3770. https://doi.org/10.1007/s00366-020-01028-5
UR - https://doi.org/10.15832/ankutbd.1302050
L1 - https://dergipark.org.tr/en/download/article-file/3163714
ER -