Hybrid Analytic Method for Missing Data Imputation in Medical Big Data

Karima Benhamza; Nadjette Benhamıda; Mohamed Ilyes Bourahdoun; Bilel Boudjahem

doi:10.53508/ijiam.1118198

Research Article

Year 2022, Volume: 5 Issue: 2, 1 - 11, 16.01.2023

Abstract

Compared to traditional datasets, medical data poses several hidden challenges. In particular, missing values for certain attributes are a major obstacle for data mining researchers who aim to support correct medical decisions. In this paper, a hybrid scheme combining the k-means method and regression analysis is proposed. Combining these two analytical methods makes it possible to find the best distributional model of the numerical data in space and helps to predict the missing data. Applied to medical data (a diabetes dataset), the proposed model predicts the missing values with a low error rate, which is considered very satisfactory.
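To make the idea in the abstract concrete, the following is a minimal sketch of one way k-means and regression can be combined for imputation. It is not the authors' implementation: the abstract does not specify the procedure, so the steps here are assumptions (cluster the complete records on a single predictor, fit one least-squares line per cluster, then fill each missing value with the line of the nearest cluster). The function names, the single-predictor setup, the centroid initialisation, and the toy data are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def nearest(centroids, x):
    """Index of the centroid closest to the scalar x."""
    return min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))


def kmeans_1d(xs, k, iters=50):
    """Lloyd's k-means on scalar values with a deterministic start."""
    ordered = sorted(xs)
    # Assumption: start from evenly spaced order statistics, not random picks.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[nearest(centroids, x)].append(x)
        # An empty cluster keeps its previous centroid.
        updated = [mean(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def fit_line(points):
    """Ordinary least squares for y = a * x + b; returns (a, b)."""
    mx = mean(x for x, _ in points)
    my = mean(y for _, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:  # degenerate cluster: fall back to the mean of y
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in points) / sxx
    return a, my - a * mx


def impute_missing(rows, k=2):
    """rows: list of (x, y) pairs where y may be None. Returns filled rows."""
    complete = [(x, y) for x, y in rows if y is not None]
    centroids = kmeans_1d([x for x, _ in complete], k)
    models = []
    for j in range(k):
        members = [(x, y) for x, y in complete if nearest(centroids, x) == j]
        # An empty cluster falls back to a single global regression line.
        models.append(fit_line(members if members else complete))
    filled = []
    for x, y in rows:
        if y is None:
            a, b = models[nearest(centroids, x)]
            y = a * x + b
        filled.append((x, y))
    return filled


# Toy data (made up): two groups that follow different linear trends.
rows = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0),
        (20.0, 30.0), (21.0, 29.0), (22.0, 28.0), (23.0, 27.0),
        (2.5, None), (21.5, None)]
for x, y in impute_missing(rows, k=2)[-2:]:
    print(x, y)
```

Fitting a separate line per cluster is what distinguishes this from plain regression imputation: in the toy data the two groups have opposite slopes, so a single global line would fit neither group well. A realistic version would use several predictors and an evaluation metric such as mean squared error on held-out values.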

Keywords

Medical Big data, Missing data, Imputation, K-means, Regression

Details

Primary Language	English
Subjects	Software Engineering (Other)
Journal Section	Articles
Authors	Karima Benhamza, Nadjette Benhamıda, Mohamed Ilyes Bourahdoun, Bilel Boudjahem
Publication Date	January 16, 2023
Acceptance Date	October 31, 2022
Published in Issue	Year 2022

Cite

APA	Benhamza, K., Benhamıda, N., Bourahdoun, M. I., Boudjahem, B. (2023). Hybrid Analytic Method for Missing Data Imputation in Medical Big Data. International Journal of Informatics and Applied Mathematics, 5(2), 1-11. https://doi.org/10.53508/ijiam.1118198
AMA	Benhamza K, Benhamıda N, Bourahdoun MI, Boudjahem B. Hybrid Analytic Method for Missing Data Imputation in Medical Big Data. IJIAM. January 2023;5(2):1-11. doi:10.53508/ijiam.1118198
Chicago	Benhamza, Karima, Nadjette Benhamıda, Mohamed Ilyes Bourahdoun, and Bilel Boudjahem. “Hybrid Analytic Method for Missing Data Imputation in Medical Big Data”. International Journal of Informatics and Applied Mathematics 5, no. 2 (January 2023): 1-11. https://doi.org/10.53508/ijiam.1118198.
EndNote	Benhamza K, Benhamıda N, Bourahdoun MI, Boudjahem B (January 1, 2023) Hybrid Analytic Method for Missing Data Imputation in Medical Big Data. International Journal of Informatics and Applied Mathematics 5 2 1–11.
IEEE	K. Benhamza, N. Benhamıda, M. I. Bourahdoun, and B. Boudjahem, “Hybrid Analytic Method for Missing Data Imputation in Medical Big Data”, IJIAM, vol. 5, no. 2, pp. 1–11, 2023, doi: 10.53508/ijiam.1118198.
ISNAD	Benhamza, Karima et al. “Hybrid Analytic Method for Missing Data Imputation in Medical Big Data”. International Journal of Informatics and Applied Mathematics 5/2 (January 2023), 1-11. https://doi.org/10.53508/ijiam.1118198.
JAMA	Benhamza K, Benhamıda N, Bourahdoun MI, Boudjahem B. Hybrid Analytic Method for Missing Data Imputation in Medical Big Data. IJIAM. 2023;5:1–11.
MLA	Benhamza, Karima et al. “Hybrid Analytic Method for Missing Data Imputation in Medical Big Data”. International Journal of Informatics and Applied Mathematics, vol. 5, no. 2, 2023, pp. 1-11, doi:10.53508/ijiam.1118198.
Vancouver	Benhamza K, Benhamıda N, Bourahdoun MI, Boudjahem B. Hybrid Analytic Method for Missing Data Imputation in Medical Big Data. IJIAM. 2023;5(2):1-11.

International Journal of Informatics and Applied Mathematics