Detection and Diagnostic Methods of Multiple Influential Points in Binary Logistic Regression Model in Animal Breeding
Abstract
Multiple influential points adversely affect parameter estimation in binary logistic regression models and lead to misinterpretation of results. An influential point is a data point that does not follow the overall trend of the remaining data and has an extreme value of x. Since even about 10% of influential points in a dataset can affect parameter estimates, detecting and diagnosing these points matters greatly. Both graphical methods (such as scatter plots and box plots) and analytical methods are used to detect and diagnose multiple influential points. Commonly used diagnostic methods include Pearson residuals, Standardized Pearson Residuals (SPR), Cook's Distance (CD), the hat matrix, DFFITS, and DFBETA. However, these methods suffer from masking and fail to diagnose the points when more than one influential point is present. To overcome this problem, statisticians have proposed new diagnostic methods, such as the Generalized Standardized Pearson Residual (GSPR) and Generalized Weights (GW). This study used a dataset containing multiple influential points (15%) for weaning weight (WW), yearling weight (YW), fleece weight (FW), and fertility rate (FR) of Romney ewes, and modelled the effects of WW, YW, and FW on FR with a binary logistic regression model. The aims were to determine the multiple influential points by graphical methods and to compare the performance of the commonly used and the newly developed methods in diagnosing these points. The commonly used methods were observed to mask multiple influential points, whereas the newly proposed methods identified them competently.
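As a rough illustration of the classical single-case diagnostics named in the abstract (Pearson residuals, hat-matrix leverage, SPR, and Cook's Distance), the following minimal Python sketch fits a one-predictor logistic model by Newton-Raphson and computes each measure. It is not the study's code or data: the function names, the toy values, and the particular form of Cook's Distance used here (squared SPR times h/(k(1-h)), one common variant) are all assumptions made for illustration, and the generalized methods (GSPR, GW) are not implemented.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def info_matrix(x, w):
    """Entries (a, b, c) and determinant of X'WX for design columns (1, x)."""
    a = sum(w)
    b = sum(wi * xi for wi, xi in zip(w, x))
    c = sum(wi * xi * xi for wi, xi in zip(w, x))
    return a, b, c, a * c - b * b

def fit_logistic(x, y, iters=50):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1*x (assumes non-separable data)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        g0 = sum(yi - pi for yi, pi in zip(y, p))            # score for b0
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))  # score for b1
        a, b, c, det = info_matrix(x, w)
        # Newton step: (X'WX)^{-1} * score, using the closed-form 2x2 inverse
        b0 += (c * g0 - b * g1) / det
        b1 += (a * g1 - b * g0) / det
    return b0, b1

def logistic_diagnostics(x, y):
    """Classical case diagnostics for the fitted one-predictor model."""
    b0, b1 = fit_logistic(x, y)
    p = [sigmoid(b0 + b1 * xi) for xi in x]
    w = [pi * (1.0 - pi) for pi in p]
    a, b, c, det = info_matrix(x, w)
    k = 2  # number of estimated parameters (intercept and slope)
    pearson = [(yi - pi) / math.sqrt(wi) for yi, pi, wi in zip(y, p, w)]
    # Leverage: diagonal of the hat matrix W^(1/2) X (X'WX)^(-1) X' W^(1/2)
    leverage = [wi * (c - 2.0 * b * xi + a * xi * xi) / det for wi, xi in zip(w, x)]
    spr = [ri / math.sqrt(1.0 - hi) for ri, hi in zip(pearson, leverage)]
    cook = [si * si * hi / (k * (1.0 - hi)) for si, hi in zip(spr, leverage)]
    return {"coef": (b0, b1), "pearson": pearson, "leverage": leverage,
            "spr": spr, "cook": cook}

# Hypothetical toy data: the last case has an extreme x and goes against the trend.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20]
y = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0]
diag = logistic_diagnostics(x, y)
for i, (h, s, d) in enumerate(zip(diag["leverage"], diag["spr"], diag["cook"])):
    print(f"case {i}: leverage={h:.3f}  SPR={s:.3f}  Cook={d:.3f}")
```

With a single unusual case, as in this toy data, the classical measures are expected to single it out. The masking problem described in the abstract concerns datasets with several such cases, which this sketch does not attempt to reproduce.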
Details
Primary Language: English
Subjects: Zootechny (Other)
Journal Section: Research Article
Authors: Burcu Mestav* (ORCID: 0000-0003-0864-5279), Türkiye
Publication Date: December 31, 2019
Submission Date: October 25, 2019
Acceptance Date: November 28, 2019
Published in Issue: Year 2019, Volume 29, Number 4
