TY  - JOUR
T1  - Effect of Imputation Methods in the Classifier Performance
AU  - Cihan, Pinar
AU  - Kalıpsız, Oya
AU  - Gökçe, Erhan
PY  - 2019
DA  - December
Y2  - 2019
JF  - Sakarya University Journal of Science
JO  - SAUJS
PB  - Sakarya University
WT  - DergiPark
SN  - 2147-835X
SP  - 1225
EP  - 1236
VL  - 23
IS  - 6
LA  - en
AB  - Missing values in a data set present an important problem for almost any traditional or modern statistical method, since most of these methods were developed under the assumption that the data set is complete. However, in the real world no complete datasets are available, and missing data is encountered as frequently in veterinary field studies as in other fields. In veterinary field studies, where data mining is only beginning to be applied, it matters not only that missing data are imputed but also how they are imputed. This is because in many studies observations with missing values in any variable are removed or completed by traditional methods. Although alternative approaches that avoid removing such observations have become widely available in recent years, they are rarely used. The aim of this study is to examine the mean, median, nearest neighbors, mice and missForest methods for imputing simulated missing data, created by randomly removing values at varying rates (5% to 25% in steps of 5%) from the original veterinary dataset. The most accurate methods were then selected to impute the original dataset, in order to observe their influence on classifier performance and to determine the optimal imputation method for the original dataset.
KW  - missing value
KW  - multiple imputation
KW  - classification
KW  - naive bayes
KW  - decision tree
KW  - machine learning
KW  - veterinary
CR  - [1] J. L. Schafer, Analysis of incomplete multivariate data: Chapman and Hall/CRC, 1997.
CR  - [2] I. R. Dohoo, C. R. Nielsen, and U. Emanuelson, "Multiple imputation in veterinary epidemiological studies: a case study and simulation," Preventive veterinary medicine, vol. 129, pp. 35-47, 2016.
CR  - [3] G. Ser and S. Keskin, "EXAMINING OF MULTIPLE IMPUTATION METHOD IN TWO MISSING OBSERVATION MECHANISMS," JAPS, Journal of Animal and Plant Sciences, vol. 26, pp. 594-598, 2016.
CR  - [4] P. Cihan, E. Gökçe, and O. Kalıpsız, "A review of machine learning applications in veterinary field," Kafkas Univ Vet Fak Derg, vol. 23, pp. 673-680, 2017.
CR  - [5] O. Troyanskaya, M. Cantor, G. Sherlock, P. Brown, T. Hastie, R. Tibshirani, et al., "Missing value estimation methods for DNA microarrays," Bioinformatics, vol. 17, pp. 520-525, 2001.
CR  - [6] S. Van Buuren, H. C. Boshuizen, and D. L. Knook, "Multiple imputation of missing blood pressure covariates in survival analysis," Statistics in medicine, vol. 18, pp. 681-694, 1999.
CR  - [7] D. J. Stekhoven and P. Bühlmann, "MissForest—non-parametric missing value imputation for mixed-type data," Bioinformatics, vol. 28, pp. 112-118, 2011.
CR  - [8] E. HM, "An epidemiological study on neonatal lamb health," Kafkas Üniversitesi Veteriner Fakültesi Dergisi, vol. 15, 2009.
CR  - [9] K. AH and E. HM, "Risk Factors Associated with Passive Immunity, Health, Birth Weight And Growth Performance in Lambs: III. The Relationship among Passive Immunity, Birth Weight Gender, Birth Type, Parity, Dam," Kafkas Üniversitesi Veteriner Fakültesi Dergisi, vol. 19, 2013.
CR  - [10] R. J. Little and D. B. Rubin, Statistical analysis with missing data vol. 333: John Wiley & Sons, 2014.
CR  - [11] E. Alpaydin, Introduction to machine learning: MIT press, 2009.
CR  - [12] A. J. Viera and J. M. Garrett, "Understanding interobserver agreement: the kappa statistic," Fam Med, vol. 37, pp. 360-363, 2005.
UR  - https://dergipark.org.tr/en/pub/saufenbilder/issue//515716
L1  - https://dergipark.org.tr/en/download/article-file/828462
ER  - 