Research Article

Hepatitis C Disease Detection Based on PCA–SVM Model

Year 2022, Volume: 9 Issue: 2, 111 - 116, 30.06.2022
https://doi.org/10.17350/HJSE19030000261

Abstract

Hepatitis C is a liver disease caused by the hepatitis C virus (HCV), which is transmitted through blood. The infection can cause illness ranging in severity from a mild form to a serious, lifelong condition. Efforts to detect the disease early and reduce its impact are ongoing. This study proposes a support vector machine (SVM) model, supported by principal component analysis (PCA), for detecting hepatitis C. The dataset consisted of 582 samples described by twelve independent variables, which were used as inputs to two classifiers: an SVM and an artificial neural network (ANN). Accuracy, sensitivity, specificity, the Matthews correlation coefficient (MCC) and the Kappa statistic were calculated for both classification models. In addition, the classifiers were compared in two cases: with and without PCA applied to the inputs. For the binary class label, the SVM with PCA achieved the highest accuracy (98.7%), sensitivity (99.1%), specificity (95.2%), MCC (92.3%) and Kappa (92.3%). For the four-class label, the same model achieved the highest accuracy, 95.7%. The results suggest that an SVM classifier whose inputs are PCA-reduced independent variables is a candidate for an accurate model for predicting hepatitis C.
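The five evaluation measures listed in the abstract are all functions of a confusion matrix. The sketch below assumes their usual textbook definitions (the abstract does not spell them out) and covers only the binary case; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Hypothetical helper: standard metrics for a 2x2 confusion matrix.

    tp/fn: diseased samples predicted positive/negative.
    fp/tn: healthy samples predicted positive/negative.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in the ground truth")
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate

    # Matthews correlation coefficient; by convention 0 when every
    # prediction falls in a single class (zero denominator).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement,
    # where chance agreement multiplies predicted and actual class shares.
    p_o = accuracy
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (p_o - p_e) / (1 - p_e)  # 1 - p_e > 0 since both classes exist

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
        "kappa": kappa,
    }
```

For example, `binary_metrics(tp=90, fn=10, fp=5, tn=95)` gives an accuracy of 0.925, a sensitivity of 0.9, a specificity of 0.95, an MCC of about 0.851 and a Kappa of 0.85; the abstract reports such values as percentages. The four-class results would instead require a multi-class confusion matrix.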


Details

Primary Language English
Subjects Engineering
Journal Section Research Articles
Authors

Serdar Gündoğdu (ORCID: 0000-0003-2549-5284)

Publication Date June 30, 2022
Submission Date January 1, 2022
Published in Issue Year 2022 Volume: 9 Issue: 2

Cite

Vancouver: Gündoğdu S. Hepatitis C Disease Detection Based on PCA–SVM Model. Hittite J Sci Eng. 2022;9(2):111-6.

Hittite Journal of Science and Engineering is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).