Missing values in a dataset are an important problem for almost all traditional
and modern statistical methods, since most of these methods were developed under
the assumption that the dataset is complete. In practice, however, complete
datasets are rarely available, and missing data are encountered as frequently in
veterinary field studies as in other fields. Imputation of missing data is
important in veterinary field studies, where data mining has only recently begun
to be applied, but an equally important question is how the data should be
imputed. In many studies, observations with a missing value in any variable are
either removed or completed using traditional methods. Although alternative
approaches that avoid removing observations with missing values have become
widely available in recent years, they are rarely used. The
aim of this study is to examine the mean, median, nearest neighbors, mice and
missForest methods for imputing simulated missing data, created by randomly
removing values from the original veterinary dataset at frequencies of 5% to
25% in steps of 5%. The most accurate methods were then used to impute the
original dataset in order to observe their influence on classifier performance
and to determine the optimal imputation method for that dataset.
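
The simulation design described above (remove values at random at rates of 5% to 25%, impute, then compare against the known true values) can be illustrated with a minimal sketch. It is not the study's implementation: the data are synthetic, the function names are hypothetical, RMSE is assumed as the accuracy measure because the abstract does not name one, and only mean and median imputation are shown. The nearest neighbors, mice and missForest methods need third-party packages and are left out.

```python
import math
import random
import statistics


def make_data(n_rows=200, n_cols=4, seed=0):
    """Synthetic stand-in for a complete numeric dataset (not the study's data)."""
    rng = random.Random(seed)
    return [[rng.gauss(10.0 * (j + 1), 2.0) for j in range(n_cols)]
            for _ in range(n_rows)]


def remove_at_random(data, rate, seed=0):
    """Return a copy with a fraction `rate` of all cells set to None,
    chosen uniformly at random, plus the set of removed (row, col) cells."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(data)) for j in range(len(data[0]))]
    masked = set(rng.sample(cells, round(rate * len(cells))))
    holed = [[None if (i, j) in masked else v for j, v in enumerate(row)]
             for i, row in enumerate(data)]
    return holed, masked


def impute_columns(holed, stat):
    """Fill each missing cell with a statistic (e.g. mean) of its column's
    observed values."""
    fills = []
    for j in range(len(holed[0])):
        observed = [row[j] for row in holed if row[j] is not None]
        fills.append(stat(observed))
    return [[fills[j] if v is None else v for j, v in enumerate(row)]
            for row in holed]


def rmse(truth, imputed, masked):
    """Root mean squared error over the removed cells only."""
    squared = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in masked]
    return math.sqrt(sum(squared) / len(squared))


def compare(rates=(0.05, 0.10, 0.15, 0.20, 0.25)):
    """Score mean and median imputation at each missingness rate."""
    data = make_data()
    results = {}
    for rate in rates:
        holed, masked = remove_at_random(data, rate, seed=int(rate * 100))
        for name, stat in (("mean", statistics.mean),
                           ("median", statistics.median)):
            imputed = impute_columns(holed, stat)
            results[(rate, name)] = rmse(data, imputed, masked)
    return results


if __name__ == "__main__":
    for (rate, name), err in sorted(compare().items()):
        print(f"{rate:.2f} {name:<6} RMSE={err:.3f}")
```

Because the removed values are known, each method can be scored directly on the cells it had to fill. The same loop would extend to other imputers by adding entries alongside the mean and median.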
Keywords: missing value, multiple imputation, classification, naive Bayes, decision tree, machine learning, veterinary
Primary Language | English |
---|---|
Subjects | Computer Software |
Journal Section | Research Articles |
Authors | |
Publication Date | December 1, 2019 |
Submission Date | January 22, 2019 |
Acceptance Date | July 23, 2019 |
Published in Issue | Year 2019 Volume: 23 Issue: 6 |
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.