HANDLING MISSING VALUES IN MIXED PANEL FINANCIAL DATA: A COMPARISON OF DIFFERENT TECHNIQUES

Cumhur Ekinci; Mustafa Abdullah Hakkoz; Ünsal Kıran; Sirma Seker

doi:10.17261/Pressacademia.2023.1869

Abstract

Purpose- This study compares the performance of alternative imputation techniques for missing data. It differs from the existing literature by proposing an appropriate technique for mixed data on companies' financial performance and environmental, social and governance (ESG) metrics. In addition to simple imputation techniques, we use machine learning techniques that can handle more complex data.

Methodology- We first employ ad-hoc methods such as mean, median, mode (most frequent), constant and regression imputation. We then turn to multivariate imputation techniques such as multiple imputation by chained equations (MICE). Finally, we run imputation methods based on machine learning (ML) models such as K-Nearest Neighbors (KNN), Ridge and Random Forest. To address the assumptions about missing data, we first check the normality of the variables with the Kolmogorov-Smirnov test and apply Rubin’s classification, which defines the relationship between the variables and the probability of missing data. The performance of an imputation technique changes depending on how the missing data are classified by randomness under Rubin’s scheme. Accordingly, we apply listwise deletion at various levels as well as alternative imputation techniques, and then compare their performance. The raw data contain numerical as well as categorical variables (binary and others). These include yearly time series such as sales and total assets obtained from financial statements, ESG scores, and float ratios for firms from several countries and industries. Imputation is carried out on a randomly selected sample ranging from 5% to 30% of the dataset, and the results are compared with the true values using accuracy or other measures such as root mean square error (RMSE) and mean absolute percentage error (MAPE). Several robustness checks supplement the analysis.

Findings- Results show that ML methods such as KNN outperform the others. Moreover, prediction performance improves when the multidimensional nature of the data is taken into account. Hence, an optimum can be reached depending on the parameters.

Conclusion- Based on the analysis, we conclude that both the choice of imputation technique and the way it is applied matter for achieving higher accuracy and better prediction of missing values in the selected mixed panel data in finance.
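The evaluation design described in the abstract (hide a random 5% to 30% of known values, impute them, and score the result against the true values with RMSE and MAPE) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' code or data: the panel is synthetic, the helper names (`make_panel`, `mask_at_random`, `impute_mean`, `impute_knn`, `evaluate`) are hypothetical, values are hidden completely at random, and the KNN imputer is a simplified hand-written stand-in rather than any library's implementation. RMSE is computed in units of each column's standard deviation so that variables on different scales can be pooled.

```python
import numpy as np

def make_panel(n, rng):
    """Synthetic stand-in for firm data: four positive, correlated columns."""
    base = rng.normal(10.0, 1.0, n)                      # latent firm size (assumption)
    sales = base + rng.normal(0.0, 0.2, n)
    assets = base + 0.5 + rng.normal(0.0, 0.2, n)
    esg = 50.0 + 5.0 * (base - 10.0) + rng.normal(0.0, 1.0, n)
    float_ratio = 0.4 + 0.05 * (base - 10.0) + rng.normal(0.0, 0.02, n)
    return np.column_stack([sales, assets, esg, float_ratio])

def mask_at_random(X, rate, rng):
    """Hide roughly a fraction `rate` of the cells completely at random."""
    mask = rng.random(X.shape) < rate
    X_missing = X.copy()
    X_missing[mask] = np.nan
    return X_missing, mask

def impute_mean(X_missing):
    """Replace each missing cell with the observed mean of its column."""
    col_means = np.nanmean(X_missing, axis=0)
    return np.where(np.isnan(X_missing), col_means, X_missing)

def impute_knn(X_missing, k=5):
    """Replace each missing cell with the column mean over the k nearest rows.

    Distances use standardized values on the columns both rows have observed.
    """
    mu = np.nanmean(X_missing, axis=0)
    sd = np.nanstd(X_missing, axis=0)
    sd[sd == 0] = 1.0
    Z = (X_missing - mu) / sd
    filled = X_missing.copy()
    for i in np.where(np.isnan(X_missing).any(axis=1))[0]:
        diff = Z - Z[i]                                  # NaN where either row is missing
        shared = (~np.isnan(diff)).sum(axis=1)           # number of jointly observed columns
        with np.errstate(invalid="ignore", divide="ignore"):
            dist = np.sqrt(np.nansum(diff ** 2, axis=1) / shared)
        dist[shared == 0] = np.inf                       # no common columns: not comparable
        dist[i] = np.inf                                 # a row is not its own neighbour
        order = np.argsort(dist)
        for j in np.where(np.isnan(X_missing[i]))[0]:
            donors = [r for r in order
                      if np.isfinite(dist[r]) and not np.isnan(X_missing[r, j])][:k]
            # Fall back to the column mean when no usable neighbour exists.
            filled[i, j] = X_missing[donors, j].mean() if donors else mu[j]
    return filled

def evaluate(X_true, X_imputed, mask):
    """Return (standardized RMSE, MAPE in percent) over the hidden cells only."""
    scaled_err = ((X_imputed - X_true) / X_true.std(axis=0))[mask]
    rel_err = ((X_imputed - X_true) / X_true)[mask]
    rmse = float(np.sqrt(np.mean(scaled_err ** 2)))
    mape = float(np.mean(np.abs(rel_err)) * 100.0)
    return rmse, mape

rng = np.random.default_rng(42)
X = make_panel(300, rng)
results = {}
for rate in (0.05, 0.10, 0.20, 0.30):
    X_missing, mask = mask_at_random(X, rate, rng)
    for name, imputer in (("mean", impute_mean), ("knn", impute_knn)):
        results[(rate, name)] = evaluate(X, imputer(X_missing), mask)

for (rate, name), (rmse, mape) in results.items():
    print(f"{rate:.0%} missing, {name:>4}: RMSE={rmse:.3f}  MAPE={mape:.2f}%")
```

Scoring only the hidden cells keeps the comparison fair, since observed cells are reproduced exactly by any imputer. Categorical variables, the panel (firm by year) structure, and missingness that depends on other variables are deliberately left out of this sketch.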

References

Demirtas, H., Freels, S. A., & Yucel, R. M. (2008). Plausibility of multivariate normality assumption when multiply imputing non-Gaussian continuous outcomes: A simulation assessment. Journal of Statistical Computation and Simulation, 78(1), 69–84.
Lin, W. C., & Tsai, C. F. (2020). Missing value imputation: a review and analysis of the literature (2006–2017). Artificial Intelligence Review, 53, 1487–1509.
Little, R. J., & Rubin, D. B. (2020). Statistical Analysis with Missing Data. 3rd ed., John Wiley & Sons.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3), 581–592.
Sahin, Ö., Bax, K., Czado, C., & Paterlini, S. (2022). Environmental, Social, Governance scores and the Missing pillar—Why does missing information matter? Corporate Social Responsibility and Environmental Management, 29(5), 1782–1798.
Schafer, J. L. (1997). Analysis of Incomplete Multivariate Data. CRC Press.
Serafeim, G. (2015). Integrated reporting and investor clientele. Journal of Applied Corporate Finance, 27(2), 34–51.
Song, Q., & Shepperd, M. (2007). Missing data imputation techniques. International Journal of Business Intelligence and Data Mining, 2(3), 261–291.

Details

Primary Language

English

Subjects

Business Administration

Journal Section

Research Article

Authors

Cumhur Ekinci *
0000-0002-0475-2272
Türkiye

Mustafa Abdullah Hakkoz
0000-0002-2963-8513
Türkiye

Ünsal Kıran
0000-0003-1813-8748
Türkiye

Sirma Seker
0000-0002-2823-9078
Türkiye

Publication Date

January 15, 2024

Submission Date

November 15, 2023

Acceptance Date

January 15, 2024

Published in Issue

Year 2023 Volume: 18 Number: 1

DOI

https://doi.org/10.17261/Pressacademia.2023.1869

IZ

https://izlik.org/JA36HW43CD

Cite

APA

Ekinci, C., Hakkoz, M. A., Kıran, Ü., & Seker, S. (2024). HANDLING MISSING VALUES IN MIXED PANEL FINANCIAL DATA: A COMPARISON OF DIFFERENT TECHNIQUES. PressAcademia Procedia, 18(1), 103-104. https://doi.org/10.17261/Pressacademia.2023.1869
